# Livestock Informatics Toolkit: A Case Study in Visually Characterizing Complex Behavioral Patterns across Multiple Sensor Platforms, Using Novel Unsupervised Machine Learning and Information Theoretic Approaches


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Description of Data

™ rotary parlor (DeLaval, Tumba, Sweden). At each morning milking, raw milking logs were exported from the parlor software, and the data were processed to extract the single-file order in which cows entered the rotary [23]. A total of 80 milk order records (26 recorded while cows remained overnight in a free-stall barn, and 54 following the transition to overnight access to spring pasture) were used to create discrete encodings for parlor entry patterns via data mechanics clustering (see McVey et al. [7] for further analytical details). The dendrograms summarizing the distribution of cow entry-order patterns, and the corresponding heatmap visualizations, are analyzed further here without modification to the previously reported encodings.
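The entry-order extraction described above can be illustrated with a minimal sketch. It assumes each exported log row carries a cow identifier and a timestamp; the field names `cow_id` and `attach_time`, and the rows themselves, are hypothetical and do not reflect the parlor software's actual export schema or the authors' processing code.

```python
from datetime import datetime


def entry_order(log_rows):
    """Map each cow to her entry position for one milking (1 = first on).

    `log_rows` is an iterable of dicts with the hypothetical keys
    'cow_id' and 'attach_time' (ISO 8601 string). If a cow appears more
    than once, only her earliest record counts.
    """
    first_seen = {}
    for row in log_rows:
        stamp = datetime.fromisoformat(row["attach_time"])
        cow = row["cow_id"]
        if cow not in first_seen or stamp < first_seen[cow]:
            first_seen[cow] = stamp
    # Sort cows by earliest timestamp, then number them from 1.
    ranked = sorted(first_seen, key=first_seen.get)
    return {cow: position for position, cow in enumerate(ranked, start=1)}


# Invented rows for illustration only.
example_log = [
    {"cow_id": "B", "attach_time": "2000-01-01T05:01:40"},
    {"cow_id": "A", "attach_time": "2000-01-01T05:00:55"},
    {"cow_id": "C", "attach_time": "2000-01-01T05:02:15"},
    {"cow_id": "A", "attach_time": "2000-01-01T05:03:00"},  # duplicate read
]
order = entry_order(example_log)
```

Repeating this for each morning milking would give one entry-order vector per record, which is the kind of input a clustering step over the 80 records would need.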

#### 2.2. Improving Empirical Encodings of Overall Time Budget through Simulation

#### 2.3. Improving Tree Pruning Decisions through Simulation

#### 2.4. An Information Theoretic Framework for Cross-Sensor Inferences

## 3. Results and Discussion

#### 3.1. Improving Empirical Encodings of Overall Time Budget through Simulation

#### 3.2. Improving Tree Pruning Decisions through Simulation

#### 3.3. An Information Theoretic Framework for Cross-Sensor Inferences

## 4. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Stygar, A.H.; Gómez, Y.; Berteselli, G.V.; Costa, E.D.; Canali, E.; Niemi, J.K.; Llonch, P.; Pastell, M. A Systematic Review on Commercially Available and Validated Sensor Technologies for Welfare Assessment of Dairy Cattle. Front. Vet. Sci. **2021**, 8, 634338.
2. Pinheiro, J.; Bates, D. Mixed-Effects Models in S and S-PLUS; Springer Science & Business Media: Berlin, Germany, 2006.
3. Farine, D.R. A guide to null models for animal social network analysis. Methods Ecol. Evol. **2017**, 8, 1309–1320.
4. McCowan, B.; Beisner, B.; Bliss-Moreau, E.; Vandeleest, J.; Jin, J.; Hannibal, D.; Hsieh, F. Connections Matter: Social Networks and Lifespan Health in Primate Translational Models. Front. Psychol. **2016**, 7, 433.
5. Cooper, M.D.; Arney, D.R.; Webb, C.R.; Phillips, C.J. Interactions between housed dairy cows during feeding, lying, and standing. J. Vet. Behav. **2008**, 3, 218–227.
6. Valletta, J.J.; Torney, C.; Kings, M.; Thornton, A.; Madden, J. Applications of machine learning in animal behaviour studies. Anim. Behav. **2017**, 124, 203–220.
7. McVey, C.; Hsieh, F.; Manriquez, D.; Pinedo, P.; Horback, K. Mind the Queue: A Case Study in Visualizing Heterogeneous Behavioral Patterns in Livestock Sensor Data Using Unsupervised Machine Learning Techniques. Front. Vet. Sci. **2020**, 7, 523.
8. Kirby, M. Geometric Data Analysis: An Empirical Approach to Dimensionality Reduction and the Study of Patterns; John Wiley & Sons: New York, NY, USA, 2001.
9. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013; Volume 103.
10. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Science & Business Media: New York, NY, USA, 2009.
11. Hsieh, F.; Chou, E.; Chen, T.-L. Mimicking Complexity of Structured Data Matrix’s Information Content: Categorical Exploratory Data Analysis. Entropy **2021**, 23, 594.
12. MacKay, D.J.C. Information Theory, Inference, and Learning Algorithms; Cambridge University Press: Cambridge, UK, 2003.
13. Adamczyk, K.; Cywicka, D.; Herbut, P.; Trześniowska, E. The application of cluster analysis methods in assessment of daily physical activity of dairy cows milked in the Voluntary Milking System. Comput. Electron. Agric. **2017**, 141, 65–72.
14. Schwager, M.; Anderson, D.M.; Butler, Z.; Rus, D. Robust classification of animal tracking data. Comput. Electron. Agric. **2007**, 56, 46–59.
15. Dutta, R.; Smith, D.; Rawnsley, R.; Bishop-Hurley, G.; Hills, J.; Timms, G.; Henry, D. Dynamic cattle behavioural classification using supervised ensemble classifiers. Comput. Electron. Agric. **2015**, 111, 18–28.
16. Xu, H.; Li, S.; Lee, C.; Ni, W.; Abbott, D.; Johnson, M.; Lea, J.M.; Yuan, J.; Campbell, D.L.M. Analysis of Cattle Social Transitional Behaviour: Attraction and Repulsion. Sensors **2020**, 20, 5340.
17. Brenninkmeyer, C.; Dippel, S.; Brinkmann, J.; March, S.; Winckler, C.; Knierim, U. Investigating integument alterations in cubicle housed dairy cows: Which types and locations can be combined? Animal **2016**, 10, 342–348.
18. Lee, M.; Lee, S.; Park, J.; Seo, S. Clustering and Characterization of the Lactation Curves of Dairy Cows Using K-Medoids Clustering Algorithm. Animals **2020**, 10, 1348.
19. Fushing, H.; Liu, S.-Y.; Hsieh, Y.-C.; McCowan, B. From patterned response dependency to structured covariate dependency: Entropy based categorical-pattern-matching. PLoS ONE **2018**, 13, e0198253.
20. Guan, J.; Fushing, H. Coupling Geometry on Binary Bipartite Networks: Hypotheses Testing on Pattern Geometry and Nestedness. Front. Appl. Math. Stat. **2018**, 4, 38.
21. Manriquez, D.; Chen, L.; Albornoz, G.; Velez, J.; Pinedo, P. Case Study: Assessment of human-conditioned sorting behavior in dairy cows in farm research trials. Prof. Anim. Sci. **2018**, 34, 664–670.
22. Manriquez, D.; Chen, L.; Melendez, P.; Pinedo, P. The effect of an organic rumen-protected fat supplement on performance, metabolic status, and health of dairy cows. BMC Vet. Res. **2019**, 15, 1–14.
23. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. 2018. Available online: https://www.R-project.org/ (accessed on 17 August 2018).
24. Bikker, J.; Van Laar, H.; Rump, P.; Doorenbos, J.; Van Meurs, K.; Griffioen, G.; Dijkstra, J. Technical note: Evaluation of an ear-attached movement sensor to record cow feeding behavior and activity. J. Dairy Sci. **2014**, 97, 2974–2979.
25. Pereira, G.; Heins, B.; Endres, M. Technical note: Validation of an ear-tag accelerometer sensor to determine rumination, eating, and activity behaviors of grazing dairy cattle. J. Dairy Sci. **2018**, 101, 2492–2495.
26. Agresti, A. Categorical Data Analysis, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2013.
27. Shirkhorshidi, A.S.; Aghabozorgi, S.; Wah, T.Y. A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data. PLoS ONE **2015**, 10, e0144059.
28. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Springer: New York, NY, USA, 1993.
29. Papadakis, M.; Tsagris, M.; Dimitriadis, M.; Fafalios, S.; Tsamardinos, I.; Fasiolo, M.; Borboudakis, G.; Burkardt, J.; Zou, C.; Lakiotaki, K.; et al. Rfast: A Collection of Efficient and Extremely Fast R Functions (1.9.9). 2020. Available online: https://CRAN.R-project.org/package=Rfast (accessed on 10 March 2020).
30. Kolde, R. pheatmap: Pretty Heatmaps. 2019. Available online: https://CRAN.R-project.org/package=pheatmap (accessed on 4 January 2019).
31. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016; Available online: http://ggplot2.org (accessed on 6 July 2020).
32. Kassambara, A. ggpubr: “ggplot2” Based Publication Ready Plots (0.4.0). 2020. Available online: https://CRAN.R-project.org/package=ggpubr (accessed on 6 July 2020).
33. Tucker, C.B.; Jensen, M.B.; De Passillé, A.M.; Hänninen, L.; Rushen, J. Invited review: Lying time and the welfare of dairy cows. J. Dairy Sci. **2021**, 104, 20–46.
34. Rand, W.M. Objective Criteria for the Evaluation of Clustering Methods. J. Am. Stat. Assoc. **1971**, 66, 846–850.
35. Hausser, J.; Strimmer, K. entropy: Estimation of Entropy, Mutual Information and Related Quantities (1.3.0). 2021. Available online: https://CRAN.R-project.org/package=entropy (accessed on 9 August 2021).
36. Hausser, J.; Strimmer, K. Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks. J. Mach. Learn. Res. **2009**, 10, 1469–1484.
37. Johnson, M. Confidence Intervals on Likelihood Estimates for Estimating Association Strength. Technical Report. 1999. Available online: http://web.science.mq.edu.au/~mjohnson/papers/sigdiff.pdf (accessed on 10 September 2021).
38. Rathore, A. Order of cow entry at milking and its relationships with milk yield and consistency of the order. Appl. Anim. Ethol. **1982**, 8, 45–52.
39. Gadbury, J.C. Some preliminary field observations on the order of entry of cows into herringbone parlours. Appl. Anim. Ethol. **1975**, 1, 275–281.
40. Berry, D.; McCarthy, J. Genetic and non-genetic factors associated with milking order in lactating dairy cows. Appl. Anim. Behav. Sci. **2012**, 136, 15–19.
41. Beggs, D.; Jongman, E.; Hemsworth, P.; Fisher, A. Short communication: Milking order consistency of dairy cows in large Australian herds. J. Dairy Sci. **2018**, 101, 603–608.
42. Soffié, M.; Thinès, G.; De Marneffe, G. Relation between milking order and dominance value in a group of dairy cows. Appl. Anim. Ethol. **1976**, 2, 271–276.
43. Kilgour, R.; Scott, T.H. Leadership in a Herd of Dairy Cows. Proc. N. Z. Soc. Anim. Prod. **1959**, 19, 36–43.
44. Reinhardt, V. Movement Orders and Leadership in a Semi-Wild Cattle Herd. Behaviour **1983**, 83, 251–264.

**Figure 1.** Visualization of the test–branch results for the first bifurcation of the Euclidean distance time budget dendrogram, cut using the noise-penalized ensemble of data mimicries. In simulations under the alternative hypothesis, the addition of noise intended to mimic measurement error has destabilized the tree, causing it to “flip-flop” between first isolating cows with more moderate time budgets and first isolating animals at the two extremes of the tradeoff between eating and ruminating. Although both branches are distinguishable from measurement error, this ambiguity in bifurcation order has produced bimodality in the distribution of mutual information estimates against the encoding for the observed data. Retesting with more clusters allows the algorithm to “look down the branch” to produce better separation between encodings under the null and the alternative, and thereby avoid spurious over-pruning.
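The mutual information between a simulated encoding and the observed encoding, as referenced in the Figure 1 caption, can be sketched as follows. This uses the simple plug-in (maximum likelihood) estimator, which is an assumption made for brevity and may differ from the estimator used in the toolkit. The function name and the label vectors are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Plug-in mutual information (in nats) between two discrete encodings."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("encodings must be non-empty and of equal length")
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Sum p(a, b) * log( p(a, b) / (p(a) * p(b)) ) over the observed cells.
    return sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


# Invented cluster labels for nine animals.
observed = [1, 1, 1, 2, 2, 2, 3, 3, 3]
same_partition = [2, 2, 2, 1, 1, 1, 3, 3, 3]   # clusters relabeled only
unrelated = [1, 2, 3, 1, 2, 3, 1, 2, 3]        # independent of `observed`

mi_same = mutual_information(observed, same_partition)
mi_unrelated = mutual_information(observed, unrelated)
```

Because the estimate depends only on how animals are grouped, not on cluster labels, a tree that alternates between two bifurcation orders across simulations would yield two distinct groups of values, consistent with the bimodality described in the caption.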

**Figure 2.** Comparison of overall time budget encodings derived from different dissimilarity metrics. In each heatmap cows are arranged along the row axis, and the mutually exclusive behaviors along the column axis, such that each cell is colored to represent the proportion of time that a given cow is recorded engaging in a specific behavior. Row gaps have been added within each heatmap to reflect the first 10 branches of the corresponding dendrogram, which here are numerically indexed reading from top to bottom. (**A**) Euclidean norm encoding with row annotations representing cow-level attributes. (**B**) KL Divergence encoding with row annotations representing log-scaled variance in observed daily time budgets. (**C**) Noise-penalized ensemble-weighted Euclidean distance encoding with row annotations representing the log-scaled ensemble variances. (**D**) Plasticity-penalized ensemble-weighted Euclidean distance encoding with row annotations representing the log-scaled ensemble variances. See Supplemental Materials for full-scale versions of these images.
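To illustrate why the Euclidean and KL divergence encodings in Figure 2 can differ, the sketch below compares the two dissimilarities on invented time budgets. The behavior categories, the proportions, and the choice of a symmetrized, smoothed KL divergence are all assumptions made for illustration, not the toolkit's implementation.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two time budgets (proportion vectors)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def symmetric_kl(p, q, eps=1e-9):
    """Symmetrized KL divergence, KL(p||q) + KL(q||p), in nats.

    A small eps is added before renormalizing so that zero proportions do
    not produce log(0); this smoothing is an assumption of the sketch.
    """
    ps = [a + eps for a in p]
    qs = [b + eps for b in q]
    total_p, total_q = sum(ps), sum(qs)
    ps = [a / total_p for a in ps]
    qs = [b / total_q for b in qs]
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return sum((a - b) * math.log(a / b) for a, b in zip(ps, qs))


# Hypothetical budgets: [eating, ruminating, nonactive, active].
cow_a = [0.30, 0.40, 0.25, 0.05]
cow_b = [0.30, 0.40, 0.29, 0.01]   # differs from cow_a in a rare behavior
cow_c = [0.34, 0.36, 0.25, 0.05]   # differs from cow_a in common behaviors

d_euc_ab, d_euc_ac = euclidean(cow_a, cow_b), euclidean(cow_a, cow_c)
d_kl_ab, d_kl_ac = symmetric_kl(cow_a, cow_b), symmetric_kl(cow_a, cow_c)
```

In this example the two pairs are equally far apart in Euclidean terms, while the KL divergence is several times larger for the pair that differs in the rare behavior, so a dendrogram built on KL divergence would weight such differences more heavily.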

**Figure 3.** Visualization produced using the compareEncoding utility. The noise-penalized encoding is represented on the row axis and the plasticity-penalized encoding is represented on the column axis. Clusters in either heatmap are numbered from top to bottom, and so align directly with the corresponding row and column margins of the contingency table reading up-down and left-right, respectively. Cell counts show that these two encodings are quite similar at the extremes of the time budget distribution, but differ slightly in cutoffs amongst the more moderate time budget clusters. See Supplemental Materials for larger versions of these images.
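The contingency table at the center of Figure 3 is a cross-tabulation of two cluster encodings of the same animals. The sketch below builds one from scratch. It does not reproduce the compareEncoding utility, and the function name and label vectors are hypothetical.

```python
from collections import Counter


def contingency_table(row_encoding, col_encoding):
    """Cross-tabulate two cluster encodings of the same animals.

    Returns (row_levels, col_levels, counts), where counts[i][j] is the
    number of animals in row cluster i and column cluster j.
    """
    if len(row_encoding) != len(col_encoding):
        raise ValueError("encodings must cover the same animals")
    pair_counts = Counter(zip(row_encoding, col_encoding))
    row_levels = sorted(set(row_encoding))
    col_levels = sorted(set(col_encoding))
    counts = [[pair_counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, counts


# Invented cluster assignments for eight cows under two encodings.
noise_penalized = [1, 1, 2, 2, 2, 3, 3, 3]
plasticity_penalized = [1, 1, 1, 2, 2, 2, 3, 3]

rows, cols, table = contingency_table(noise_penalized, plasticity_penalized)
for level, row_counts in zip(rows, table):
    print(level, row_counts)
```

Counts concentrated on the diagonal indicate that the two encodings agree, while off-diagonal counts show where cluster cutoffs differ.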

**Figure 4.** Encodings produced by the cutreeEnsemble algorithm. (**A**) Dendrogram produced by the noise-penalized ensemble-weighted dissimilarity metric cut using the noise-penalized data mimicry. The extremely fine encoding with 38 stochastically validated clusters demonstrates that, with so many recorded observations over this extended observation window, the accuracy of the sensor itself should impose few constraints on our behavioral inferences. (**B**) Dendrogram produced by the plasticity-penalized ensemble-weighted dissimilarity metric cut using the plasticity-penalized data mimicry. A coarser encoding is returned when uncertainty in time budget observations attributable to the behavioral plasticity of the animal itself is taken into consideration.

**Figure 5.** Visualization produced using the compareEncoding utility with cells colored by pointwise mutual information estimates significant at the alpha = 0.05 significance level after simulations using multinomial resampling. Data mechanics encoding of parlor entry position is represented on the row axis of the contingency table, wherein the heatmap contains row annotations representing days on trial and the observation period, such that the pen period corresponds with the observation window of the overall time budget. The noise-penalized encoding of overall time budget is represented on the column axis of the contingency table. Pointwise mutual information values reveal that the significant MI test between these two encodings is driven predominantly by behavioral patterns amongst cows in the latter half of the milking queue.
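The cell-level significance screening described in the caption can be sketched as a Monte Carlo test. The null model here, a multinomial draw with cell probabilities equal to the product of the observed margins, and the two-sided empirical p-value are assumptions of this sketch. The caption does not specify the resampling scheme, and the function names are hypothetical.

```python
import math
import random


def pmi_table(table):
    """Pointwise mutual information (nats) for each cell of a count table.

    Empty cells get -inf, since log(0) is undefined.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [
        [math.log(x * n / (row_tot[i] * col_tot[j])) if x > 0 else -math.inf
         for j, x in enumerate(row)]
        for i, row in enumerate(table)
    ]


def pmi_p_values(table, n_sim=500, seed=0):
    """Two-sided Monte Carlo p-value for each cell's PMI under independence."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Assumed null: cell probability proportional to row total * column total.
    weights = [row_tot[i] * col_tot[j] for i, j in cells]
    observed = pmi_table(table)
    at_or_below = [[0] * n_cols for _ in range(n_rows)]
    at_or_above = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(n_sim):
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            sim[i][j] += 1
        null = pmi_table(sim)
        for i, j in cells:
            at_or_below[i][j] += null[i][j] <= observed[i][j]
            at_or_above[i][j] += null[i][j] >= observed[i][j]
    # Add-one correction keeps the empirical p-value strictly positive.
    return [
        [min(1.0, 2 * (1 + min(at_or_below[i][j], at_or_above[i][j]))
             / (n_sim + 1))
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Invented tables: perfectly associated versus perfectly balanced encodings.
associated = [[40, 0], [0, 40]]
balanced = [[20, 20], [20, 20]]
p_associated = pmi_p_values(associated)
p_balanced = pmi_p_values(balanced)
```

Cells whose p-value falls below the chosen alpha would be the ones colored in a display like Figure 5. Positive PMI marks overrepresentation and negative PMI marks underrepresentation relative to independence.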

**Figure 6.** Visualization produced using the compareEncoding utility with cells colored by pointwise mutual information estimates significant at the alpha = 0.05 significance level after simulations using multinomial resampling. Cows with the most moderate time budgets were overrepresented among animals with no recorded health events, while sick cows were overrepresented in the overall time budget cluster characterized by relatively low time spent eating and low-to-moderate amounts of time spent nonactive.

**Figure 7.** Visualization produced using the compareEncoding utility with cells colored by pointwise mutual information estimates significant at the alpha = 0.05 significance level after simulations using multinomial resampling. Data mechanics encoding of time budget data using only cows with no recorded health events is represented on the row axis, and the plasticity-penalized encoding of overall time budget is represented on the column axis. Among cows with no acute illness, cows at the very end of the queue are now overrepresented in the time budget cluster characterized by fairly high time spent eating (cluster 8). Cows entering just ahead of them are not only underrepresented in this high eating time cluster, but are also overrepresented in the cluster with relatively low eating time and low-to-moderate nonactivity (cluster 3) that was independently associated with higher rates of clinical illness.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

McVey, C.; Hsieh, F.; Manriquez, D.; Pinedo, P.; Horback, K.
Livestock Informatics Toolkit: A Case Study in Visually Characterizing Complex Behavioral Patterns across Multiple Sensor Platforms, Using Novel Unsupervised Machine Learning and Information Theoretic Approaches. *Sensors* **2022**, *22*, 1.
https://doi.org/10.3390/s22010001
