Predictions of Milk Fatty Acid Contents by Mid-Infrared Spectroscopy in Chinese Holstein Cows

Genetic improvement of milk fatty acid content traits in dairy cattle is of great significance. However, chromatography-based methods for measuring milk fatty acid content have several disadvantages. Quick and accurate predictions of milk fatty acid contents from the mid-infrared spectrum (MIRS) recorded in dairy herd improvement (DHI) programs are therefore essential for expanding the amount of available phenotypic data. In this study, the concentrations of 24 milk fatty acids were measured by gas chromatography (GC) in milk samples from 336 Holstein cows in Shandong Province, China, and MIRS data were collected from the same cows for fatty acid prediction. Milk fatty acid contents, expressed as g/100 g of milk (milk-basis) and g/100 g of fat (fat-basis), were predicted from spectra processed by five pre-processing algorithms: first-order derivative (DER1), second-order derivative (DER2), multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky–Golay convolution smoothing (SG). These were combined with four regression models: random forest regression (RFR), partial least squares regression (PLSR), least absolute shrinkage and selection operator regression (LassoR), and ridge regression (RidgeR). Two waveband ranges (4000~400 cm−1 and 3017~2823 cm−1/1805~1734 cm−1) were also compared. Prediction accuracy was evaluated by 10-fold cross-validation with a 3:1 ratio of training to test set, using the coefficient of determination (R2) and the residual predictive deviation (RPD). Of the 31 milk fatty acids, 17 were accurately predicted from MIRS, with RPD values higher than 2 and R2 values higher than 0.75. In addition, 16 of the 31 fatty acids were accurately predicted by RFR, indicating that this ensemble learning model can yield higher prediction accuracy. The DER1, DER2, and SG pre-processing algorithms led to high prediction accuracy for most fatty acids. In summary, these results imply that predicting milk fatty acid contents from MIRS is feasible.


Molecules 2023, 28, 666

Introduction
Lipids in milk provide a major source of energy and essential structural components of cell membranes for newborns in all mammalian species. They also confer distinctive properties on dairy foods that affect further processing [1]. Milk fat is rich in many fatty acids that are important to human health [2–4]. More than 400 different fatty acids have been identified in milk fat, but most of them occur only in trace amounts [5]. In bovine milk fat, around 12 fatty acids are present at concentrations above 1% [6]. Moreover, changes in milk fatty acids may also be related to cow health and energy status [7].
Currently, several techniques have been developed to measure fatty acids in milk. These include high-performance liquid chromatography (HPLC), gas chromatography (GC), near-infrared spectroscopy (NIRS), and mid-infrared spectrum (MIRS) analysis [8–10]. Chemical methods (e.g., HPLC and GC) measure the fatty acid contents of bovine milk with high accuracy. However, their sample pretreatment is laborious and costly, which makes high-throughput measurement difficult [11,12]. In contrast, infrared spectroscopy-based methods provide rapid and low-cost predictions of milk fatty acid contents [13]. They have therefore become promising technologies for high-throughput measurement, although their prediction accuracy still needs to be improved.
The use of infrared spectroscopy to predict milk fatty acid contents in dairy cattle has been reported in many studies. Coppa et al. (2010) established NIRS-based prediction equations from 468 milk samples. These equations predicted total milk fatty acids, saturated fatty acids (SFA), monounsaturated fatty acids (MUFA), polyunsaturated fatty acids (PUFA), C18:1, and conjugated linoleic acid (CLA) with R2 values greater than 0.88. Soyeurt et al. (2006) developed MIRS-based fatty acid prediction models using 600 milk samples from 275 cows of 6 breeds. The models predicted C10:0, C12:0, C14:0, C16:0, C16:1cis-9, C18:1, C18:2cis-9, SFA, and MUFA, with cross-validated coefficients of determination (R2) of 0.62~0.94. Subsequently, Soyeurt et al. (2011) investigated MIRS prediction of fatty acids across cattle breeds, production systems, and countries. They concluded that, for most individual fatty acid models, prediction equations were useful for animal breeding when R2 ≥ 0.75 and for milk payment systems when R2 ≥ 0.95 [4]. In addition, genetic correlations among milk fatty acids predicted by MIRS were explored in a large-scale sample (n = 34,141) of New Zealand dairy cattle. These results supported the use of MIRS predictions as phenotypic proxies for the genetic selection of fatty acid contents [14]. In the Chinese Holstein population, Du et al. (2020) estimated the heritability of MIRS and of several milk production traits (protein, fat, and lactose percentages), along with their genetic correlations. MIRS heritability ranged from 0 to 0.11, and the genetic correlations varied widely [15]. MIRS has also been used to establish and validate predictive models for fatty acid profiles in sheep and goats [16–18].
Previous studies used partial least squares regression, a non-ensemble learning model [19,20], to investigate how different spectral pre-processing methods affect the accuracy of prediction equations [4,5,21–23]. However, the combined effects of regression models and spectral pre-processing methods on prediction accuracy for different fatty acids have rarely been explored, especially for the milk fat of Chinese Holstein cows. The objective of this study was therefore to identify an optimal strategy for predicting milk fatty acids with high accuracy from the MIRS data in the dairy herd improvement (DHI) database of Chinese Holstein cattle. Such a strategy could provide high-throughput measurement of large amounts of milk fatty acid phenotypic data and thereby allow milk fatty acid traits to be recorded for genetic evaluation in dairy cattle breeding programs in China. To the best of our knowledge, this is the first study in Chinese Holstein cattle to investigate MIRS predictions of fatty acids expressed in two ways (g/100 g of milk and g/100 g of fat). The study combines five pre-processing algorithms, two waveband ranges (4000~400 cm−1 and 3017~2823 cm−1/1805~1734 cm−1), and four regression models.

Milk Samples and Fatty Acids
Milk samples were collected from 336 Holstein cows on a farm in Shandong Province, China, with one small tube (30 mL) and one large tube (50 mL) taken from each cow. After sampling, all tubes were immediately stored in liquid nitrogen (−196 ℃) and delivered to our laboratory for analysis within 6 h. To maintain analytical consistency, no preservative was added to any of the 672 collected milk samples. The milk in the 30 mL and 50 mL tubes was used to measure fatty acid contents and MIRS, respectively.

The GC methodology used to quantify fatty acid contents in this study was similar to that of other studies [4,25]. Fatty acid methyl esters prepared from milk fat were analyzed following ISO Standard 15884 (ISO-IDF (International Organization for Standardization-International Dairy Federation), 2002). GC is generally regarded as the gold standard for fatty acid measurement because of its high accuracy, even at low concentrations [26,27]. The MIRS method, by comparison, is faster and less expensive [13,21].

Predictions of Milk Fatty Acid Contents Using MIRS Data
Each fatty acid content quantified by GC was converted from g/100 g of milk (milk-basis) to g/100 g of fat (fat-basis) using the fat content determined by MIRS. The final MIRS values were the average of two transformed MIRS replicates from the same milk sample. These values were processed using five spectral pre-processing algorithms: first-order derivative (DER1), second-order derivative (DER2), multiplicative scatter correction (MSC), standard normal variate (SNV), and Savitzky–Golay convolution smoothing (SG). To compare the influence of each pre-processing algorithm, each one was applied individually to the final MIRS values. Both types of fatty acid measurement (g/100 g of milk and g/100 g of fat) were combined with the five pre-processed spectra and the two waveband ranges (4000~400 cm−1 and 3017~2823 cm−1/1805~1734 cm−1). These data were analyzed using four regression models: random forest regression (RFR), partial least squares regression (PLSR), least absolute shrinkage and selection operator regression (LassoR), and ridge regression (RidgeR). The coefficient of determination (R2) and the residual predictive deviation (RPD) served as evaluation metrics for the four regression models. Prediction accuracy was assessed by 10-fold cross-validation with a 3:1 ratio of training to test set. The GC quantification, fatty acid measurements, spectral pre-processing algorithms, fatty acid prediction methods, and prediction accuracy assessments are summarized in Figure 2.
In Figure 2, GC, MIRS, DER1, DER2, MSC, SNV, SG, RFR, PLSR, LassoR, RidgeR, R2, and RPD indicate gas chromatography, mid-infrared spectrum, first-order derivative, second-order derivative, multiplicative scatter correction, standard normal variate, Savitzky–Golay convolution smoothing, random forest regression, partial least squares regression, least absolute shrinkage and selection operator regression, ridge regression, coefficient of determination, and residual predictive deviation, respectively.

Discussion
The concentrations of milk fatty acids in our study (Table 1) appear slightly lower than those reported in other studies [5,28–30]. This difference may be caused mainly by differences in diet and milk-collection times. The farm supplied its own total mixed ration (TMR) three times per day, less often than other comparable Chinese Holstein farms (four or five times per day). On a fat basis (g/100 g of fat), the coefficients of variation of the fatty acids ranged from 12.978% to 44.207%. On average, these values were slightly lower than those reported by Soyeurt et al. (2011) and Fleming et al. (2017). The higher variation of fatty acids on a fat basis (g/100 g of fat) than on a milk basis (g/100 g of milk) may reflect a tendency in Table 1 for fat-basis fatty acids to show both higher mean values and higher coefficients of variation.
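The two measurement bases and the coefficient of variation discussed above follow from simple arithmetic. The sketch below assumes the conversion divides the milk-basis content by the milk fat content (g/100 g of milk, taken from MIRS as in the methods) and multiplies by 100. It also assumes the CV uses the sample standard deviation. The helper names and the three cows' values are hypothetical.

```python
import numpy as np


def milk_to_fat_basis(fa_milk, fat_milk):
    """g/100 g of milk -> g/100 g of fat: divide by milk fat (g/100 g of milk), times 100."""
    return np.asarray(fa_milk, dtype=float) / np.asarray(fat_milk, dtype=float) * 100.0


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean() * 100.0


# Hypothetical values for one fatty acid in three cows.
fa_milk = np.array([1.2, 1.0, 1.4])    # fatty acid, g/100 g of milk
fat_milk = np.array([4.0, 4.2, 3.8])   # milk fat, g/100 g of milk
fa_fat = milk_to_fat_basis(fa_milk, fat_milk)  # fatty acid, g/100 g of fat

print(round(coefficient_of_variation(fa_milk), 2))  # 16.67
print(round(coefficient_of_variation(fa_fat), 2))   # higher than the milk-basis CV here
```

In this made-up example, cows with more of the fatty acid per 100 g of milk also have less total fat. Dividing by fat content therefore spreads the values further apart, so the fat-basis CV exceeds the milk-basis CV.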

Many previous studies have evaluated the accuracy and applicability of prediction models based on R2 values. Soyeurt et al. (2011) suggested that models with R2 > 0.75 might be used for animal breeding. However, Zaalberg et al. (2021) used prediction models with R2 > 0.6 for mineral elements in animal breeding [31]. Cecchinato et al. (2009) reported low R2 values for curd characteristics predicted by MIRS, yet found high genetic correlations between the measured and predicted values [32]. In our study, 17 fatty acids (C8:0, C10:0, C12:0, C14:0, C18:0, C20:0, C22:0, C24:0, C18:1n9c, C20:1, C20:5n3, SFA, UFA, MUFA, SCFA, MCFA, and LCFA) showed RPD ≥ 2 and R2 ≥ 0.75 (Table 2), which is consistent with the results of Soyeurt et al. (2006). This suggests that these 17 fatty acids can be accurately predicted using MIRS and that the method has potential for fat trait selection in animal breeding. Furthermore, six fatty acids (C12:0, C20:0, C22:0, C20:5n3, UFA, and LCFA) had R2 > 0.8 and were well predicted by MIRS, so they could also be used for breeding selection. For the grouped fatty acids, the R2 values of the test set were greater than 0.7 (Table 3). Six individual fatty acids on a fat basis (g/100 g of fat), namely C20:0, C22:0, C24:0, C20:1, C18:3n6, and C20:5n3, showed R2 values greater than 0.7 in both the training and test sets. Other studies reported inconsistent results on this point [4,5]. Fleming et al. (2017) obtained higher accuracy (R2) for fatty acids expressed on a milk basis than on a fat basis. Soyeurt et al. (2011) derived fat-basis contents from fatty acids predicted in milk. This indirect approach outperformed direct prediction in fat only for C6:0, C12:0, C18:2 cis-9,cis-12, SFA, and SCFA. RPD is also used to measure the predictive performance and accuracy of models [33,34].
RPD values are commonly divided into three classes. High prediction accuracy (RPD ≥ 2) is suitable for quantitative prediction of substances. Good prediction (1.4 ≤ RPD < 2) is suitable for rough quantitative prediction or qualitative analysis. Low prediction accuracy (RPD < 1.4) is unsuitable for quantitative prediction. Generally, higher accuracy (R2 and RPD) was also observed for MIRS predictions of fatty acids on a milk basis (n = 22) than on a fat basis (n = 9) (Tables 2 and 3), which is consistent with the results of other studies [4,5,21,35].
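The three RPD classes above reduce to simple threshold checks, as does the R2 ≥ 0.75 and RPD ≥ 2 criterion applied earlier in this discussion. The following is a minimal sketch; the function names and example values are hypothetical.

```python
def classify_rpd(rpd):
    """Three-level RPD classification described in the text."""
    if rpd >= 2.0:
        return "high"  # usable for quantitative prediction
    if rpd >= 1.4:
        return "good"  # rough quantitative prediction or qualitative analysis
    return "low"       # not usable for quantitative prediction


def meets_accuracy_criterion(r2, rpd):
    """Criterion used in this discussion: R2 >= 0.75 and RPD >= 2."""
    return r2 >= 0.75 and rpd >= 2.0


print([classify_rpd(v) for v in (2.5, 2.0, 1.7, 1.4, 1.1)])
# ['high', 'high', 'good', 'good', 'low']
print(meets_accuracy_criterion(0.80, 2.3), meets_accuracy_criterion(0.70, 2.3))
# True False
```

The lower bound of each class is inclusive (RPD = 2 counts as high, RPD = 1.4 as good), matching the ≥ and < signs in the text.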
Different spectral pre-processing algorithms influence the prediction accuracy of fatty acids. Soyeurt et al. (2012) used MIRS to predict lactoferrin content in bovine milk and obtained the highest prediction accuracy using PLSR with DER1. Our study also found that the derivative (DER1 and DER2) and SG smoothing algorithms gave high prediction accuracy for most fatty acids (Table 2). The derivative algorithm calculates each derivative value from the absorbance values at two adjacent wave points. Processing the spectrum in this way emphasizes wave points with large differences in absorbance and reduces signal/noise interference. The calculation then moves sequentially to the next wave point, which retains the spectral information and strengthens spectral continuity (Figure 1).
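The adjacent-point derivative described above can be illustrated with NumPy on a synthetic band. This sketch assumes an evenly spaced wavenumber grid and a hypothetical Gaussian absorbance band. It shows that DER1 cancels a constant baseline offset and that DER2 also cancels a linear baseline slope. Plain differencing tends to amplify random noise, so the sketch also compares the result with differencing after Savitzky–Golay smoothing via SciPy's `savgol_filter`. The window length and polynomial order are assumed values, not the study's settings.

```python
import numpy as np
from scipy.signal import savgol_filter


def first_derivative(absorbance, wavenumbers):
    """DER1: absorbance difference between each pair of adjacent wave points,
    divided by their wavenumber spacing (returns one point fewer)."""
    a = np.asarray(absorbance, dtype=float)
    return np.diff(a, axis=-1) / np.diff(np.asarray(wavenumbers, dtype=float))


def second_derivative(absorbance, step):
    """DER2 on an evenly spaced grid: second difference divided by step**2."""
    return np.diff(np.asarray(absorbance, dtype=float), n=2, axis=-1) / step ** 2


def sg_smooth(absorbance, window=11, polyorder=2):
    """SG smoothing: local polynomial fit (window and order are assumed values)."""
    return savgol_filter(absorbance, window_length=window, polyorder=polyorder, axis=-1)


# Hypothetical absorbance band near 1750 cm-1 on a 1 cm-1 grid.
wn = np.linspace(1800, 1700, 101)
step = wn[1] - wn[0]
band = np.exp(-(((wn - 1750) / 8.0) ** 2))
offset = band + 0.3                        # same band on a raised, flat baseline
sloped = band + 0.3 + 0.002 * (wn - 1700)  # same band on a tilted baseline
noisy = band + np.random.default_rng(0).normal(scale=0.01, size=wn.size)

# DER1 cancels a constant offset; DER2 also cancels a linear slope.
print(np.allclose(first_derivative(offset, wn), first_derivative(band, wn)))  # True
print(np.allclose(second_derivative(sloped, step), second_derivative(band, step)))  # True

# Compare derivative error with and without smoothing first.
true_d1 = first_derivative(band, wn)
raw_err = np.std(first_derivative(noisy, wn) - true_d1)
smooth_err = np.std(first_derivative(sg_smooth(noisy), wn) - true_d1)
print(smooth_err < raw_err)
```

Pairing SG smoothing with derivatives is a common way to keep the baseline-removal benefit of differencing while limiting noise amplification.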

Conclusions
In this study, different regression models led to varying accuracy in predicting fatty acid contents, and different spectral pre-processing algorithms also influenced prediction accuracy. Higher accuracy for most fatty acids was achieved when derivative and SG pre-processing algorithms were combined with RFR models. Based on these evaluations in Chinese Holstein cows, the results suggest that predicting milk fatty acid contents from MIRS is feasible.