Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Data
2.2. Data-Cleaning Techniques
3. Results
3.1. Comparison between Manufacturer’s and Externally Predicted Phenotypes
3.2. Deletion of Extreme Predicted Phenotypes (Method 1)
3.3. GH-Based Data-Cleaning (Method 2)
3.4. Data Cleaning Based on the Absolute Fat Residual Limit (Method 3)
3.5. Comparison of the Three Tested Data-Cleaning Methods
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

Trait (g/dL of Milk) | N 1 | Mean ± SD 2 | Cross-Validation RMSE * | Cross-Validation R² |
---|---|---|---|---|
Fat | 1799 | 3.93 ± 1.00 | 0.0086 | 0.9999 |
Protein | 4305 | 3.36 ± 0.41 | 0.0200 | 0.9976 |
Monounsaturated fatty acids | 1793 | 1.08 ± 0.34 | 0.0581 | 0.9705 |
Saturated fatty acids | 1790 | 2.69 ± 0.74 | 0.0719 | 0.9904 |
Unsaturated fatty acids | 1788 | 1.24 ± 0.37 | 0.0648 | 0.9698 |

N = 346,818 | Parameters | Fat | Protein | MFA 1 | SFA 1 | UFA 1 |
---|---|---|---|---|---|---|
Manufacturer’s values | Mean | 3.93 | 3.41 | 0.85 | 2.58 | 0.92 |
Manufacturer’s values | SD 2 | 1.10 | 0.42 | 0.32 | 0.76 | 0.38 |
Manufacturer’s values | CV 3 | 27.99 | 12.32 | 37.65 | 29.46 | 41.30 |
Manufacturer’s values | Minimum | 1.01 | 1.01 | 0.04 | 0.10 | 0.01 |
Manufacturer’s values | Maximum | 8.99 | 6.99 | 4.61 | 8.00 | 5.19 |
Manufacturer’s values | Skewness | 0.65 | 0.40 | 1.59 | 0.79 | 1.44 |
Manufacturer’s values | Kurtosis | 1.52 | 2.62 | 6.00 | 2.12 | 5.07 |
Externally predicted values | Mean | 3.94 | 3.53 | 1.13 | 2.62 | 1.28 |
Externally predicted values | SD | 1.10 | 0.47 | 0.42 | 0.75 | 0.46 |
Externally predicted values | CV | 27.92 | 13.31 | 37.17 | 28.63 | 35.94 |
Externally predicted values | Minimum | 0.72 | 0.52 | −5.62 | 0.13 | −6.44 |
Externally predicted values | Maximum | 9.92 | 7.10 | 5.26 | 7.32 | 5.66 |
Externally predicted values | Skewness | 0.69 | 0.28 | 1.22 | 0.72 | 1.12 |
Externally predicted values | Kurtosis | 1.63 | 1.88 | 5.56 | 1.80 | 5.39 |
Prediction relationship 4 | RMSD | 0.173 | 0.187 | 0.327 | 0.199 | 0.397 |
Prediction relationship 4 | r | 0.99 | 0.95 | 0.94 | 0.97 | 0.94 |
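
The summary statistics in the table above (CV, RMSD and r) can be recomputed from paired predictions. The following is a minimal sketch, assuming CV is the SD expressed as a percentage of the mean and RMSD is the root mean squared difference between the manufacturer's and the externally predicted values; the function names and the toy vectors are illustrative and do not come from the article.

```python
import math
from statistics import mean


def coefficient_of_variation(mean_value, sd_value):
    """CV in percent: the SD expressed relative to the mean."""
    return 100.0 * sd_value / mean_value


def rmsd(a, b):
    """Root mean squared difference between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Pearson correlation between paired values."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Check against the manufacturer's fat column (mean 3.93, SD 1.10).
cv_fat = round(coefficient_of_variation(3.93, 1.10), 2)

# Hypothetical paired fat predictions (g/dL of milk), for illustration only.
manufacturer = [3.90, 4.10, 3.50, 4.60]
external = [3.95, 4.00, 3.60, 4.70]
toy_rmsd = rmsd(manufacturer, external)
toy_r = pearson_r(manufacturer, external)
```

With the tabulated mean and SD for fat, `cv_fat` reproduces the reported CV of 27.99.
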

Traits | Threshold 1% | Threshold 99% | N | Data loss (%) | Mean (g/dL of milk) | SD (g/dL of milk) | RMSD 1 (g/dL of milk) | GainRMSD 2 (%) |
---|---|---|---|---|---|---|---|---|
Fat | 1.59 | 7.35 | 339,909 | 1.99 | 3.93 | 0.99 | 0.17 | 1.29 |
Protein | 2.49 | 4.78 | 339,941 | 1.98 | 3.52 | 0.42 | 0.185 | 0.97 |
MFA 3 | 0.35 | 2.51 | 339,708 | 2.05 | 1.12 | 0.36 | 0.319 | 2.62 |
SFA 3 | 1.06 | 4.92 | 339,852 | 2.00 | 2.61 | 0.68 | 0.195 | 1.95 |
UFA 3 | 0.4 | 2.78 | 339,892 | 2.00 | 1.27 | 0.40 | 0.389 | 2.16 |
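
The 1% and 99% thresholds in the table above suggest a percentile-based trimming step. A minimal sketch of such a step follows, assuming records outside the 1st to 99th percentile range are dropped and the bounds themselves are kept; the percentile interpolation method and the function names are assumptions, not details taken from the article.

```python
from statistics import quantiles


def percentile_thresholds(values, lower_pct=1, upper_pct=99):
    """Return the (lower, upper) cut-offs at the given percentiles."""
    # quantiles(..., n=100) yields 99 cut points: index 0 is the 1st percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def trim_extremes(values, low, high):
    """Keep values inside [low, high]; inclusive bounds are an assumption."""
    return [v for v in values if low <= v <= high]


def data_loss_percent(n_before, n_after):
    """Share of records removed, in percent."""
    return 100.0 * (n_before - n_after) / n_before


# Toy data: 101 evenly spaced values, for illustration only.
toy = [float(v) for v in range(101)]
low, high = percentile_thresholds(toy)
kept = trim_extremes(toy, low, high)

# Data loss for fat, using the record counts reported in the tables.
fat_loss = round(data_loss_percent(346_818, 339_909), 2)
```

Applied to the reported counts (346,818 records before and 339,909 after trimming fat), `data_loss_percent` gives the tabulated 1.99%.
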

Traits | GH mean | GH SD | GH max | GH ≤ 3 (%) | 3 < GH ≤ 5 (%) | GH > 5 (%) | r (GH, e2) |
---|---|---|---|---|---|---|---|
Fat | 1.84 | 2.66 | 182.5 | 86.61 | 8.27 | 5.12 | 0.35 |
Protein | 1.90 | 4.11 | 182.6 | 86.36 | 7.99 | 5.65 | 0.07 |
Monounsaturated FAs 1 | 2.37 | 3.23 | 211.3 | 79.73 | 11.42 | 8.85 | 0.48 |
Saturated FAs | 1.78 | 2.48 | 170.3 | 87.52 | 7.84 | 4.64 | 0.47 |
Unsaturated FAs | 2.37 | 3.23 | 209.7 | 79.70 | 11.46 | 8.85 | 0.47 |
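
The GH classes in the table above (GH ≤ 3, 3 < GH ≤ 5, GH > 5) can be illustrated with a small sketch. It assumes the common near-infrared convention in which GH is the squared Mahalanobis distance to the calibration centroid divided by the number of principal-component scores, and it assumes the scores are uncorrelated so that only per-score variances are needed. The article's exact computation is not shown here, so the function names and data are hypothetical.

```python
from statistics import mean, variance


def gh_distances(calibration_scores, new_scores):
    """Standardized Mahalanobis distance (GH) of new samples.

    Assumes the columns are uncorrelated principal-component scores, so the
    covariance matrix is diagonal and only per-column variances are used.
    """
    k = len(calibration_scores[0])
    columns = list(zip(*calibration_scores))
    centers = [mean(c) for c in columns]
    variances = [variance(c) for c in columns]  # sample variance
    result = []
    for row in new_scores:
        d2 = sum((x - m) ** 2 / v for x, m, v in zip(row, centers, variances))
        result.append(d2 / k)
    return result


def gh_class(gh):
    """Assign a GH value to one of the three classes used in the table."""
    if gh <= 3:
        return "GH <= 3"
    if gh <= 5:
        return "3 < GH <= 5"
    return "GH > 5"


# Hypothetical two-component calibration scores (uncorrelated by design).
calibration = [(-1.0, -2.0), (-1.0, 2.0), (1.0, -2.0), (1.0, 2.0)]
new_samples = [(2.0, 0.0), (0.0, 6.5), (0.0, 8.0)]
gh_values = gh_distances(calibration, new_samples)
classes = [gh_class(g) for g in gh_values]
```

A spectrum in the last class would be the candidate for removal under a GH-based rule; the cut-off actually applied should be taken from the article's methods rather than from this sketch.
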

Traits | N | Data Loss (%) | Mean (g/dL of Milk) | SD (g/dL of Milk) | RMSD 1 (g/dL of Milk) | GainRMSD 2 (%) |
---|---|---|---|---|---|---|
Fat | 329,064 | 5.12 | 3.89 | 1.01 | 0.159 | 8.88 |
Protein | 327,230 | 5.65 | 3.52 | 0.42 | 0.185 | 1.29 |
Monounsaturated fatty acids | 316,128 | 8.85 | 1.1 | 0.37 | 0.303 | 7.79 |
Saturated fatty acids | 330,739 | 4.64 | 2.59 | 0.70 | 0.188 | 5.97 |
Unsaturated fatty acids | 316,128 | 8.85 | 1.25 | 0.40 | 0.376 | 5.72 |

N = 316,025 | Mean (g/dL of Milk) | SD (g/dL of Milk) | RMSD 1 (g/dL of Milk) | GainRMSD 2 (%) |
---|---|---|---|---|
Fat | 3.88 | 1.06 | 0.125 | 38.82 |
Protein | 3.53 | 0.46 | 0.184 | 1.49 |
Monounsaturated fatty acids | 1.1 | 0.38 | 0.304 | 7.52 |
Saturated fatty acids | 2.58 | 0.73 | 0.185 | 7.64 |
Unsaturated fatty acids | 1.24 | 0.42 | 0.373 | 6.35 |
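
Method 3 filters on the absolute fat residual, and the single N in the table above suggests that the same records are dropped for every trait. A minimal sketch of that idea follows, assuming the residual is the difference between the manufacturer's and the externally predicted fat content. The limit of 0.25 g/dL and the field names are hypothetical placeholders, since the limit used in the article is not given in this section.

```python
def filter_by_fat_residual(records, limit):
    """Keep records whose absolute fat residual does not exceed the limit.

    Each record is a dict holding both fat values; the whole record (and so
    every trait predicted from that spectrum) is dropped when it fails.
    """
    return [
        r for r in records
        if abs(r["fat_manufacturer"] - r["fat_external"]) <= limit
    ]


# Hypothetical records (g/dL of milk), for illustration only.
records = [
    {"id": 1, "fat_manufacturer": 3.90, "fat_external": 3.95},
    {"id": 2, "fat_manufacturer": 4.10, "fat_external": 4.50},
    {"id": 3, "fat_manufacturer": 3.50, "fat_external": 3.60},
]
HYPOTHETICAL_LIMIT = 0.25  # placeholder, not the article's value
kept = filter_by_fat_residual(records, HYPOTHETICAL_LIMIT)
kept_ids = [r["id"] for r in kept]

# Data loss implied by the reported record counts (346,818 to 316,025).
loss_m3 = round(100.0 * (346_818 - 316_025) / 346_818, 2)
```

The reported counts imply a data loss of 8.88%, which matches the N loss listed for M3 in the comparison table.
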

Traits | Parameters | M1 1 | M2 1 | M3 1 | M1 and M2 | M1 and M3 | M2 and M3 | M1 and M2 and M3 | M1 or M2 | M1 or M3 | M2 or M3 | M1 or M2 or M3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Fat | GainRMSD 3 (%) | 1.29 | 8.88 | 38.82 | 38.82 | 1.70 | 8.39 | 1.40 | 8.88 | 38.77 | 41.39 | 41.08 |
Fat | N loss 4 (%) | 1.99 | 5.12 | 8.88 | 0.76 | 0.33 | 2.30 | 0.26 | 6.35 | 10.54 | 11.69 | 12.85 |
Fat | Gain:loss 5 | 0.65 | 1.73 | 4.37 | 51.23 | 5.09 | 3.64 | 5.48 | 1.40 | 3.68 | 3.54 | 3.20 |
Protein | GainRMSD (%) | 0.97 | 1.29 | 1.49 | 1.49 | 0.35 | 0.66 | 0.32 | 1.60 | 2.21 | 2.22 | 2.54 |
Protein | N loss (%) | 1.98 | 5.65 | 8.88 | 1.41 | 0.36 | 1.42 | 0.29 | 6.22 | 10.50 | 13.11 | 13.61 |
Protein | Gain:loss | 0.49 | 0.23 | 0.17 | 1.06 | 0.97 | 0.47 | 1.12 | 0.26 | 0.21 | 0.17 | 0.19 |
MFA 2 | GainRMSD (%) | 2.62 | 7.79 | 7.52 | 7.52 | 1.74 | 4.69 | 1.62 | 7.96 | 8.77 | 11.36 | 11.42 |
MFA 2 | N loss (%) | 2.05 | 8.85 | 8.88 | 0.93 | 0.57 | 2.98 | 0.48 | 9.97 | 10.36 | 14.75 | 15.78 |
MFA 2 | Gain:loss | 1.28 | 0.88 | 0.85 | 8.07 | 3.07 | 1.58 | 3.38 | 0.80 | 0.85 | 0.77 | 0.72 |
SFA 2 | GainRMSD (%) | 1.95 | 5.97 | 7.64 | 7.64 | 2.27 | 3.34 | 0.65 | 6.63 | 9.29 | 10.99 | 11.60 |
SFA 2 | N loss (%) | 2.01 | 4.64 | 8.88 | 0.58 | 1.02 | 2.13 | 0.15 | 6.07 | 10.63 | 11.38 | 12.71 |
SFA 2 | Gain:loss | 0.97 | 1.29 | 0.86 | 13.22 | 2.22 | 1.57 | 4.20 | 1.09 | 0.87 | 0.97 | 0.91 |
UFA 2 | GainRMSD (%) | 2.16 | 5.72 | 6.35 | 6.35 | 1.57 | 3.82 | 1.48 | 5.74 | 7.18 | 8.74 | 8.67 |
UFA 2 | N loss (%) | 2.00 | 8.85 | 8.88 | 0.92 | 0.56 | 2.95 | 0.47 | 9.92 | 10.32 | 14.77 | 15.76 |
UFA 2 | Gain:loss | 1.08 | 0.65 | 0.72 | 6.87 | 2.82 | 1.29 | 3.17 | 0.58 | 0.70 | 0.59 | 0.55 |
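
The Gain:loss rows in the table above appear consistent with dividing GainRMSD (%) by N loss (%). The sketch below works under that assumption, which is inferred from the tabulated numbers rather than from a stated definition, and checks it against the single-method fat entries before ranking the three methods.

```python
def gain_loss_ratio(gain_rmsd_pct, n_loss_pct):
    """RMSD gain obtained per percentage point of records removed."""
    return gain_rmsd_pct / n_loss_pct


# GainRMSD (%) and N loss (%) for fat, copied from the comparison table.
fat = {
    "M1": (1.29, 1.99),
    "M2": (8.88, 5.12),
    "M3": (38.82, 8.88),
}

ratios = {
    method: round(gain_loss_ratio(gain, loss), 2)
    for method, (gain, loss) in fat.items()
}

# Rank the methods from the most to the least efficient for fat.
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

For fat, the recomputed ratios match the tabulated 0.65, 1.73 and 4.37, so under this reading M3 removes the most records but yields the largest RMSD gain per record lost.
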
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, L.; Li, C.; Dehareng, F.; Grelet, C.; Colinet, F.; Gengler, N.; Brostaux, Y.; Soyeurt, H. Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra. Animals 2021, 11, 533. https://doi.org/10.3390/ani11020533