Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Data Preparation
2.2.1. Libraries and Packages Used for Data Preparation and Modeling
2.2.2. Training, Validation and Test Data Split
2.3. Missing-Value Imputation
2.4. Resampling Methods
2.5. Model Building, Evaluation and Parameter Tuning
3. Results
3.1. Performance Metrics of ML Models Trained on Data with Different Missing-Value Imputation Techniques
3.2. Performance Metrics of ML Models Trained on Data with Different Resampling Techniques
3.3. Ranking of the Prediction Scores of Individual Machine Learning Models with Both Resampling and Imputation Methods
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variable | Total Cases | Missing Values |
---|---|---|
Alarm * | 75,217 | 0 |
EC_FL | 39,189 | 36,028 |
EC_FR | 39,189 | 36,028 |
EC_BR | 38,902 | 36,315 |
EC_BL | 39,075 | 36,142 |
EC_ALL | 37,891 | 37,326 |
SCC | 16,548 | 58,669 |
Milk yield | 75,188 | 29 |
Ma_mlk | 75,217 | 0 |
Ma_EC | 41,201 | 3220 |
Ma_SCC | 23,691 | 36,449 |
Std_EC | 40,304 | 10,581 |
Std_mlk | 75,215 | 2 |
Std_scc | 17,370 | 57,847 |
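
The missing-value counts above are what the imputation step (Section 2.3) has to deal with. As a rough illustration only, the sketch below contrasts two of the simpler strategies named in this article, a single-value fill and linear interpolation, on a made-up conductivity series. The function names, the toy values, and the choice of the mean as the fill value are assumptions for illustration; the settings the authors actually used are not given here.

```python
def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_interpolate(values):
    """Fill gaps between two observed entries on a straight line.

    Leading or trailing gaps have no second anchor point and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical electrical-conductivity readings over consecutive milkings.
ec_series = [70.0, None, None, 76.0, 74.0, None, 78.0]

mean_filled = mean_impute(ec_series)
interpolated = linear_interpolate(ec_series)

print(mean_filled)   # [70.0, 74.5, 74.5, 76.0, 74.0, 74.5, 78.0]
print(interpolated)  # [70.0, 72.0, 74.0, 76.0, 74.0, 76.0, 78.0]
```

The mean fill ignores the order of the readings, whereas interpolation follows the local trend, which is one reason the two can lead to different model inputs for time-ordered sensor data.
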

Dataset | Imbalance Ratio | Negative Cases | Positive Cases |
---|---|---|---|
Complete cases | |||
Training set | 38.31 | 4061 | 106 |
Validation | 39.07 | 1016 | 26 |
Data with replaced missing values | |||
Training set | 158.92 | 47,837 | 301 |
Validation | 151.34 | 11,956 | 79 |
Test set | 42.93 | 601 | 14 |
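
The imbalance ratios in this table appear to be the number of negative cases divided by the number of positive cases. The short sketch below recomputes three of them under that assumption; the function name and dictionary keys are illustrative only.

```python
def imbalance_ratio(negative, positive):
    """Majority-to-minority ratio: negative cases per positive case."""
    return negative / positive


# (negative cases, positive cases) taken from the table above.
splits = {
    "complete cases, training": (4061, 106),
    "imputed data, validation": (11956, 79),
    "test set": (601, 14),
}

ratios = {
    name: round(imbalance_ratio(neg, pos), 2)
    for name, (neg, pos) in splits.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
# complete cases, training: 38.31
# imputed data, validation: 151.34
# test set: 42.93
```

A ratio near 150 means roughly one positive case per 150 negative cases, which is the situation the resampling methods are meant to address.
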

Resampling Method | Imbalance Ratio | Negative | Positive |
---|---|---|---|
Complete cases | |||
No resampling | 38.31 | 4061 | 106 |
SMOTE | 1.00 | 4061 | 4061 |
SMOTEEN | 1.005 | 4021 | 4000 |
SVMSMOTE | 1.00 | 4061 | 4061 |
Simple Imputer | |||
No resampling | 158.92 | 47,837 | 301 |
SMOTE | 1.00 | 47,837 | 47,837 |
SMOTEEN | 0.99 | 47,157 | 47,834 |
SVMSMOTE | 1.00 | 47,837 | 47,837 |
Multiple Imputer | |||
No resampling | 158.92 | 47,837 | 301 |
SMOTE | 1.00 | 47,837 | 47,837 |
SMOTEEN | 1.01 | 47,830 | 47,159 |
SVMSMOTE | 1.00 | 47,837 | 47,837 |
Linear Interpolation | |||
No resampling | 158.92 | 47,837 | 301 |
SMOTE | 1.00 | 59,859 | 59,859 |
SMOTEEN | 1.03 | 59,834 | 58,599 |
SVMSMOTE | 1.00 | 59,859 | 59,859 |
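
The SMOTE rows above end with equal class counts because synthetic positive cases are generated until the minority class matches the majority class. The sketch below is a deliberately simplified, standard-library version of that idea (interpolating between a minority sample and one of its nearest minority neighbours). It is not the implementation of any particular library, the data are made up, and the function name and the parameters `k` and `seed` are assumptions for illustration. The cleaning step of SMOTE combined with edited nearest neighbours and the SVM-guided sample selection of SVMSMOTE are not shown.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` (squared Euclidean distance),
        # excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, mate)))
    return synthetic


# Toy data (made up): 12 negative and 4 positive cases with two features each.
negatives = [(float(i), float(i % 3)) for i in range(12)]
positives = [(20.0, 5.0), (21.0, 5.5), (22.0, 6.0), (23.0, 5.0)]

new_points = smote_like_oversample(positives, len(negatives) - len(positives))
balanced_positives = positives + new_points
ratio_after = len(negatives) / len(balanced_positives)

print(len(negatives), len(balanced_positives), ratio_after)  # 12 12 1.0
```

Because every synthetic point lies between two existing positive cases, oversampling of this kind adds no information from outside the observed minority region; it only changes the class balance seen during training.
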

Actual Class | Predicted Positive | Predicted Negative |
---|---|---|
Actual Positive | True positive, TP | False negative, FN |
Actual Negative | False positive, FP | True negative, TN |
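
The performance metrics reported in this article (accuracy, precision, recall, F1 score and kappa) can all be derived from the four cells of this confusion matrix. The sketch below uses the standard definitions, with kappa taken to be Cohen's kappa. The example counts are an assumption chosen to fit a test set of 14 positive and 601 negative cases; they are not a confusion matrix reported by the authors.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = (
        (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
    ) / total ** 2
    kappa = (
        (p_observed - p_expected) / (1 - p_expected)
        if p_expected != 1
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Assumed counts: 13 of 14 positives detected, no false alarms.
metrics = classification_metrics(tp=13, fn=1, fp=0, tn=601)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
# {'accuracy': 0.998, 'precision': 1.0, 'recall': 0.929, 'f1': 0.963, 'kappa': 0.962}
```

With these assumed counts the rounded values coincide with the RF row for MI without resampling in the ranking table. The example also shows why accuracy alone is uninformative on imbalanced data: a model that predicts every case as negative would still reach 601/615, about 0.977, with a recall of zero.
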

Model | CC | SI | MI | LI |
---|---|---|---|---|
Accuracy | ||||
LR | 0.877 | 0.859 | 0.857 | 0.815 |
DT | 0.968 | 0.98 | 0.974 | 0.98 |
RF | 0.991 | 0.981 | 0.983 | 0.984 |
MLP | 0.817 | 0.966 | 0.959 | 0.951 |
Precision | ||||
LR | 0.157 | 0.25 | 0.236 | 0.232 |
DT | 0.575 | 0.571 | 0.489 | 0.584 |
RF | 0.837 | 0.6 | 0.632 | 0.634 |
MLP | 0.166 | 0.462 | 0.396 | 0.462 |
Recall | ||||
LR | 1 | 0.893 | 0.804 | 0.821 |
DT | 0.786 | 0.857 | 0.857 | 0.768 |
RF | 0.75 | 0.821 | 0.929 | 0.857 |
MLP | 0.661 | 0.946 | 0.875 | 0.75 |
F1 Score | ||||
LR | 0.271 | 0.305 | 0.257 | 0.22 |
DT | 0.613 | 0.677 | 0.615 | 0.652 |
RF | 0.787 | 0.674 | 0.733 | 0.717 |
MLP | 0.366 | 0.588 | 0.5 | 0.36 |
Kappa | ||||
LR | 0.241 | 0.277 | 0.229 | 0.191 |
DT | 0.602 | 0.668 | 0.603 | 0.642 |
RF | 0.782 | 0.665 | 0.725 | 0.71 |
MLP | 0.224 | 0.574 | 0.484 | 0.344 |

Imputation | Resampling | Accuracy | F1 Score | Precision | Recall | Kappa | Overall Rank |
---|---|---|---|---|---|---|---|
LR models | |||||||
SI | None | 0.984 | 0.615 | 0.667 | 0.571 | 0.607 | 27 |
MI | None | 0.980 | 0.455 | 0.625 | 0.357 | 0.445 | 42 |
LI | None | 0.980 | 0.400 | 0.667 | 0.286 | 0.392 | 46 |
CC | SVMSMOTE | 0.886 | 0.286 | 0.167 | 1.000 | 0.257 | 49 |
CC | None | 0.876 | 0.269 | 0.156 | 1.000 | 0.239 | 59 |
MLP models | |||||||
SI | None | 0.990 | 0.786 | 0.786 | 0.786 | 0.781 | 7 |
SI | SVMSMOTE | 0.966 | 0.571 | 0.400 | 1.000 | 0.557 | 29 |
MI | None | 0.982 | 0.560 | 0.636 | 0.500 | 0.551 | 30 |
SI | SMOTE | 0.958 | 0.519 | 0.350 | 1.000 | 0.502 | 36 |
MI | SMOTE | 0.954 | 0.500 | 0.333 | 1.000 | 0.482 | 36 |
DT models | |||||||
LI | None | 0.992 | 0.815 | 0.846 | 0.786 | 0.811 | 4 |
SI | None | 0.990 | 0.800 | 0.750 | 0.857 | 0.795 | 6 |
CC | None | 0.990 | 0.750 | 0.900 | 0.643 | 0.745 | 9 |
MI | None | 0.987 | 0.750 | 0.667 | 0.857 | 0.743 | 10 |
CC | SVMSMOTE | 0.985 | 0.743 | 0.619 | 0.929 | 0.735 | 11 |
RF models | |||||||
MI | None | 0.998 | 0.963 | 1.000 | 0.929 | 0.962 | 1 |
CC | None | 0.993 | 0.833 | 1.000 | 0.714 | 0.830 | 2 |
CC | SMOTEEN | 0.992 | 0.815 | 0.846 | 0.786 | 0.811 | 3 |
LI | SVMSMOTE | 0.990 | 0.813 | 0.722 | 0.929 | 0.808 | 5 |
CC | SVMSMOTE | 0.989 | 0.759 | 0.733 | 0.786 | 0.753 | 8 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kashongwe, O.; Kabelitz, T.; Ammon, C.; Minogue, L.; Doherr, M.; Silva Boloña, P.; Amon, T.; Amon, B. Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models. AgriEngineering 2024, 6, 3427–3442. https://doi.org/10.3390/agriengineering6030195