Farm-Specific Effects in Predicting Mastitis by Applying Machine Learning Models to Automated Milking System and Other Farm Management Data
Simple Summary
Abstract
1. Introduction
- (1) Combined-farm effects (evaluating ML models on pooled data from all farms);
- (2) Combined-to-individual farm effects (evaluating generalization ability of models for individual farms);
- (3) Individual-farm effects (understanding model performance within a farm and comparing it between farms);
- (4) Farm-to-farm effects (evaluating model generalization to yet unobserved farms).
2. Materials and Methods
2.1. Data Collection and Preprocessing
2.2. Data Processing
2.3. Data Transformation
2.4. Procedure of ML Application
2.4.1. Data Splitting
- (1) Combined training and testing (mixed-farm effects): ML models were trained on the combined data of all four farms for the period 2019–2022 and tested on the combined data of all farms from 2023 up to May 2024.
- (2) Combined training and individual testing (mixed-to-individual farm effects): ML models were trained on the combined data of all farms for the period 2019–2022 but tested on the individual data of each farm from 2023 up to May 2024.
- (3) Individual training and testing (individual-farm effects): ML models were trained separately on the data of each individual farm for the period 2019–2022 and tested separately on each farm’s own data from 2023 up to May 2024.
- (4) Farm-to-farm training and testing (farm-to-farm effects): ML models were trained on the complete data of three farms and tested on the complete data of the fourth farm, which was not included in model training (leave-one-out cross-validation). The same procedure was applied to all four farms.
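
The four splitting scenarios above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`farm`, `day`, `label`), the helper names `temporal_split` and `leave_one_farm_out`, the exact boundary dates, and the toy records are all hypothetical and are not taken from the study's code or data.

```python
from datetime import date

# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"farm": "B", "day": date(2020, 3, 1), "label": 0},
    {"farm": "G", "day": date(2022, 12, 31), "label": 1},
    {"farm": "H", "day": date(2023, 1, 1), "label": 0},
    {"farm": "M", "day": date(2024, 5, 15), "label": 1},
    {"farm": "M", "day": date(2021, 7, 9), "label": 0},
]

# Assumed boundaries for "2019-2022" and "2023 up to May 2024".
TRAIN_START = date(2019, 1, 1)
TRAIN_END = date(2022, 12, 31)
TEST_END = date(2024, 5, 31)


def temporal_split(rows, train_farms, test_farms):
    """Scenarios 1-3: train on the earlier period, test on the later one."""
    train = [r for r in rows
             if r["farm"] in train_farms and TRAIN_START <= r["day"] <= TRAIN_END]
    test = [r for r in rows
            if r["farm"] in test_farms and TRAIN_END < r["day"] <= TEST_END]
    return train, test


def leave_one_farm_out(rows, farms):
    """Scenario 4: hold out the complete data of one farm at a time."""
    for held_out in farms:
        train = [r for r in rows if r["farm"] != held_out]
        test = [r for r in rows if r["farm"] == held_out]
        yield held_out, train, test


farms = ["B", "G", "H", "M"]
combined = temporal_split(records, farms, farms)        # scenario 1
combined_to_h = temporal_split(records, farms, ["H"])   # scenario 2, farm H
individual_m = temporal_split(records, ["M"], ["M"])    # scenario 3, farm M
folds = list(leave_one_farm_out(records, farms))        # scenario 4
```

Scenarios 1 to 3 differ only in which farms feed the training and test sets, whereas scenario 4 ignores the calendar split and separates the data by farm.
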
2.4.2. Cross Validation and Hyperparameter Tuning
2.4.3. Resampling Technique
2.5. ML Models Evaluated
2.5.1. Logistic Regression (LR)
2.5.2. Support Vector Machine (SVM)
2.5.3. Decision Tree (DT)
2.5.4. Random Forest (RF)
2.5.5. Gradient-Boosting Decision Tree (GBDT)
2.5.6. Multi-Layer Perceptron Neural Network (MLP-NN)
3. Results
3.1. Mixed-Farm Effects (Training Combined and Testing Combined)
3.2. Mixed-to-Individual Farm Effects (Training Combined, Testing Separate)
3.3. Individual-Farm Effects (Training and Testing Separately)
3.4. Farm-to-Farm Effects (Training on Data of Three Farms and Testing on Data of Fourth Farm)
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
IMI | Intra-mammary infection |
AMS | Automatic milking system |
ML | Machine learning |
LR | Logistic regression |
SVM | Support vector machine |
DT | Decision tree |
RF | Random forest |
MLP-NN | Multi-layer perceptron neural network |
GBDT | Gradient-boosting decision tree |
SCC | Somatic cell count |
EC | Electrical conductivity |
MF | Milk flow |
MT | Milk temperature |
MY | Milk yield |
FM | Fat content in milk |
PM | Protein content in milk |
NL | Number of lactations |
DIM | Days in milk |
ET | Environmental temperature |
EH | Environmental humidity |
Farm Name | Farm B | Farm G | Farm H | Farm M |
---|---|---|---|---|
German Federal State | Brandenburg | Brandenburg | Brandenburg | Saxony |
Farm Size (ha) | 1200 | 940 | 3590 | 1370 |
Herd Size | 220 | 230 | 560 | 820 |
AMS Type | Lely Astronaut * | Lely Astronaut | GEA Mione ** | Lely Astronaut |
Average Daily Milk Yield per Cow (L) | 31 | 30 | 29 | 31 |
ML Model | Method | Hyperparameters Tested | Hyperparameters Selected |
---|---|---|---|
Logistic Regression | Binary | C = range (1 to 30) | C = 10 |
Support Vector Machine | Linear | C = range (1 to 30) | C = 10 |
Decision Tree | Gini | Max depth: range (1 to 25) | Max depth = 12 |
Random Forest | Gini | Number of estimators: 5 to 50; Max depth: 5 to 20 | Number of estimators = 25; Max depth = 12 |
Gradient-Boosting Decision Tree | Log loss | Number of estimators: 5 to 50; Max depth: 5 to 20 | Number of estimators = 25; Max depth = 12 |
Multi-Layer Perceptron Neural Network | Input: ReLU; Output: Sigmoid | Hidden layer sizes: 10 to 50 | Hidden layer sizes = 20 |
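
As a rough illustration of the size of the search space in the table above, the tested ranges can be written as plain Python dictionaries. The parameter names (`C`, `max_depth`, `n_estimators`, `hidden_layer_sizes`) follow common scikit-learn naming, which is an assumption here, as are the inclusive bounds and the step size of 1. The helper `candidates` is hypothetical.

```python
from itertools import product

# Tested ranges transcribed from the table; inclusive bounds and unit steps
# are assumptions, since the table only gives the end points.
search_space = {
    "LR": {"C": range(1, 31)},
    "SVM": {"C": range(1, 31)},
    "DT": {"max_depth": range(1, 26)},
    "RF": {"n_estimators": range(5, 51), "max_depth": range(5, 21)},
    "GBDT": {"n_estimators": range(5, 51), "max_depth": range(5, 21)},
    "MLP-NN": {"hidden_layer_sizes": range(10, 51)},
}

# Selected values as reported in the table.
selected = {
    "LR": {"C": 10},
    "SVM": {"C": 10},
    "DT": {"max_depth": 12},
    "RF": {"n_estimators": 25, "max_depth": 12},
    "GBDT": {"n_estimators": 25, "max_depth": 12},
    "MLP-NN": {"hidden_layer_sizes": 20},
}


def candidates(space):
    """Yield every parameter combination of one model's search space."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Number of combinations an exhaustive search would have to evaluate per model.
grid_sizes = {model: sum(1 for _ in candidates(space))
              for model, space in search_space.items()}
```

Under these assumptions the two-parameter spaces of RF and GBDT are far larger than the single-parameter ones, which is the usual reason for sampling candidates rather than enumerating all of them.
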
Farm | Total Observations | Negative | Positive | Positive (%) |
---|---|---|---|---|
Overall | 1,886,947 | 1,875,568 | 11,379 | 0.60 |
Farm B | 282,291 | 281,921 | 370 | 0.13 |
Farm G | 297,073 | 295,964 | 1109 | 0.37 |
Farm H | 395,622 | 394,428 | 1194 | 0.30 |
Farm M | 911,961 | 903,255 | 8706 | 0.95 |
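
The class distribution in the table above can be checked with simple arithmetic. The sketch below recomputes the positive share per farm from the reported counts and adds one derived quantity that is not reported in the table: the accuracy a trivial classifier would reach by predicting "negative" for every observation. The names `counts`, `prevalence_pct`, and `all_negative_accuracy_pct` are hypothetical.

```python
# (negative, positive) observation counts as reported in the table.
counts = {
    "Farm B": (281_921, 370),
    "Farm G": (295_964, 1_109),
    "Farm H": (394_428, 1_194),
    "Farm M": (903_255, 8_706),
}


def prevalence_pct(negative, positive):
    """Share of positive observations in percent."""
    return 100 * positive / (negative + positive)


def all_negative_accuracy_pct(negative, positive):
    """Accuracy (%) of a classifier that always predicts the negative class."""
    return 100 * negative / (negative + positive)


overall_negative = sum(neg for neg, _ in counts.values())
overall_positive = sum(pos for _, pos in counts.values())

per_farm = {farm: round(prevalence_pct(neg, pos), 2)
            for farm, (neg, pos) in counts.items()}
overall = round(prevalence_pct(overall_negative, overall_positive), 2)
baseline = all_negative_accuracy_pct(overall_negative, overall_positive)
```

Because fewer than 1% of observations are positive on every farm, the always-negative baseline exceeds 99% accuracy while detecting no cases at all, so accuracy alone says little about a model's ability to find mastitis.
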
Farm | ML Models | Accuracy (%) | Sensitivity (%) | Specificity (%) | Area Under Curve (%) |
---|---|---|---|---|---|
All | LR | 91 | 83 | 91 | 95 |
All | SVM | 92 | 81 | 92 | 95 |
All | DT | 89 | 80 | 89 | 91 |
All | RF | 89 | 87 | 89 | 96 |
All | MLP-NN | 83 | 93 | 83 | 96 |
All | GBDT | 92 | 78 | 92 | 96 |
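
The accuracy and specificity columns in the table above are nearly identical, which follows from the definitions of the metrics when positives are rare. The sketch below uses the standard confusion-matrix formulas; the counts in the example are hypothetical, and applying the overall positive share of 0.60% to the test period is an assumption, since the test-set prevalence is not given here.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as the prevalence-weighted mean of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Hypothetical counts: 100 positive and 10,000 negative observations.
acc, sens, spec = confusion_metrics(tp=93, fn=7, tn=8_300, fp=1_700)
prev = 100 / 10_100

# Both routes give the same accuracy.
same = abs(acc - accuracy_from_rates(sens, spec, prev)) < 1e-12

# With sensitivity 0.93, specificity 0.83, and an assumed prevalence of 0.6%,
# accuracy is about 0.83, i.e. almost exactly the specificity.
approx = accuracy_from_rates(0.93, 0.83, 0.006)
```

At such low prevalence the sensitivity term contributes well under one percentage point, so a model can gain or lose a large share of detected cases without a visible change in accuracy.
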
Farm | ML Models | Accuracy (%) | Sensitivity (%) | Specificity (%) | Area Under Curve (%) |
---|---|---|---|---|---|
B | LR | 94 | 82 | 94 | 96 |
B | SVM | 92 | 83 | 92 | 96 |
B | DT | 88 | 73 | 88 | 86 |
B | RF | 89 | 79 | 89 | 94 |
B | MLP-NN | 84 | 94 | 84 | 94 |
B | GBDT | 91 | 68 | 91 | 93 |
G | LR | 97 | 64 | 97 | 96 |
G | SVM | 98 | 59 | 98 | 97 |
G | DT | 95 | 68 | 95 | 89 |
G | RF | 96 | 74 | 96 | 96 |
G | MLP-NN | 94 | 85 | 94 | 96 |
G | GBDT | 97 | 61 | 97 | 96 |
H | LR | 91 | 43 | 91 | 83 |
H | SVM | 92 | 41 | 92 | 84 |
H | DT | 89 | 41 | 89 | 79 |
H | RF | 89 | 48 | 89 | 85 |
H | MLP-NN | 84 | 63 | 84 | 83 |
H | GBDT | 91 | 43 | 91 | 84 |
M | LR | 93 | 77 | 93 | 95 |
M | SVM | 93 | 76 | 93 | 95 |
M | DT | 88 | 75 | 88 | 90 |
M | RF | 88 | 73 | 88 | 93 |
M | MLP-NN | 85 | 89 | 85 | 93 |
M | GBDT | 90 | 71 | 90 | 93 |
Farm | ML Models | Accuracy (%) | Sensitivity (%) | Specificity (%) | Area Under Curve (%) |
---|---|---|---|---|---|
B | LR | 92 | 88 | 92 | 96 |
B | SVM | 92 | 87 | 92 | 96 |
B | DT | 95 | 68 | 95 | 82 |
B | RF | 97 | 54 | 98 | 96 |
B | MLP-NN | 94 | 80 | 94 | 95 |
B | GBDT | 97 | 61 | 97 | 93 |
G | LR | 95 | 86 | 95 | 98 |
G | SVM | 96 | 85 | 96 | 98 |
G | DT | 96 | 80 | 96 | 82 |
G | RF | 96 | 82 | 96 | 97 |
G | MLP-NN | 93 | 93 | 93 | 96 |
G | GBDT | 97 | 77 | 97 | 94 |
H | LR | 81 | 71 | 81 | 83 |
H | SVM | 82 | 69 | 82 | 83 |
H | DT | 87 | 55 | 88 | 74 |
H | RF | 91 | 49 | 91 | 87 |
H | MLP-NN | 73 | 85 | 73 | 84 |
H | GBDT | 93 | 45 | 93 | 85 |
M | LR | 92 | 85 | 92 | 96 |
M | SVM | 93 | 84 | 93 | 96 |
M | DT | 90 | 82 | 90 | 96 |
M | RF | 90 | 87 | 90 | 89 |
M | MLP-NN | 86 | 93 | 86 | 96 |
M | GBDT | 93 | 81 | 93 | 96 |
Farm | ML Models | Accuracy (%) | Sensitivity (%) | Specificity (%) | Area Under Curve (%) |
---|---|---|---|---|---|
B | LR | 88 | 89 | 88 | 95 |
B | SVM | 89 | 87 | 89 | 95 |
B | DT | 86 | 83 | 86 | 92 |
B | RF | 86 | 88 | 86 | 95 |
B | MLP-NN | 80 | 97 | 80 | 94 |
B | GBDT | 89 | 79 | 89 | 94 |
G | LR | 97 | 74 | 97 | 96 |
G | SVM | 97 | 72 | 97 | 97 |
G | DT | 95 | 68 | 95 | 90 |
G | RF | 96 | 74 | 96 | 97 |
G | MLP-NN | 94 | 91 | 94 | 96 |
G | GBDT | 97 | 60 | 97 | 97 |
H | LR | 89 | 45 | 90 | 81 |
H | SVM | 90 | 44 | 90 | 82 |
H | DT | 85 | 51 | 85 | 80 |
H | RF | 86 | 55 | 86 | 85 |
H | MLP-NN | 82 | 64 | 82 | 83 |
H | GBDT | 89 | 44 | 89 | 85 |
M | LR | 90 | 87 | 90 | 96 |
M | SVM | 90 | 86 | 90 | 96 |
M | DT | 88 | 63 | 88 | 76 |
M | RF | 89 | 72 | 89 | 91 |
M | MLP-NN | 83 | 93 | 83 | 91 |
M | GBDT | 91 | 58 | 91 | 90 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dharejo, M.N.; Kashongwe, O.; Amon, T.; Kabelitz, T.; Doherr, M.G. Farm-Specific Effects in Predicting Mastitis by Applying Machine Learning Models to Automated Milking System and Other Farm Management Data. Animals 2025, 15, 2825. https://doi.org/10.3390/ani15192825