Variation of Detailed Protein Composition of Cow Milk Predicted from a Large Database of Mid-Infrared Spectra

Simple Summary Milk proteins are one of the most valuable milk components. The objective of the present study was to assess sources of variation of detailed protein composition predicted from infrared spectra in milk of dairy and dual-purpose cattle breeds. Results showed that protein fractions were primarily influenced by days in milk, and the relative proportion of each fraction through lactation was not constant. Protein fractions correlated with crude protein, total casein, fat and milk urea nitrogen. In perspective, mid-infrared predictions of milk fractions could be useful for the dairy sector to improve nutritional and technological properties of milk. Abstract This study aimed to investigate factors affecting protein fractions, namely α-casein (α-CN), β-casein (β-CN), κ-casein (κ-CN), β-lactoglobulin (β-LG) and α-lactalbumin (α-LA) predicted from milk infrared spectra in milk of dairy and dual-purpose cattle breeds. The dataset comprised 735,328 observations from 49,049 cows in 1782 herds. Results highlighted significant differences of protein fractions in milk of the studied breeds. Significant variations of protein fractions were found also through parities and lactation, with the latter thoroughly influencing protein fractions percentage. Interesting correlations (r) were estimated between β-CN, κ-CN and β-LG, expressed as percentage of crude protein, and milk urea nitrogen (r = 0.31, −0.20 and −0.26, respectively) and between α-LA and fat percentage (r = 0.41). The present study paves the way for future studies on the associations between protein fractions and milk technological properties, and for the estimation of genetic parameters of predicted protein composition.


Introduction
Milk proteins are one of the most valuable components among milk constituents.This is mainly due to the wide array of nutritional, nutraceutical and technological properties they are endowed with.First, milk and dairy products are major sources of proteins in the human diet, both in terms of recommended daily intake and biological value [1].Second, milk and whey proteins, and peptides derived from their metabolic hydrolysis, have nutraceutical properties, such as antibacterial, antiviral, antifungal and antioxidant activity [2].Adequate milk protein intake, together with calcium and vitamin D, results in decreased bone fracture and osteoporosis risk [3].Third, caseins are primarily involved in the cheese-making process, since they are the only milk constituents reacting to rennet and are mainly responsible for milk coagulation properties and yield, retaining also other milk components and water in the caseinate complex [4].Casein fractions influence milk coagulation properties; in particular, κ-casein (κ-CN) and α-casein (α-CN) proportions have positive effects on curd firming time and curd firmness [5].At the same time, whey proteins have been reported to influence curd properties, for example, α-lactalbumin (α-LA) has been demonstrated to improve the rate of firming and curd firmness, contrary to β-lactoglobulin (β-LG) [6].
For all these reasons, milk protein content is included in the quality-based payment systems of many dairy companies [7] as well as in the selection indexes of different breeds and countries [8].Accordingly, the possibility of characterizing not only total protein or total casein content but also specific protein fractions at population level could be of great interest in order to genetically improve the milk aptitude to coagulate, considering the influence that milk proteins have on milk coagulation properties and cheese yield [5]. Quantification of total milk proteins and caseins is based on the Kjeldahl method, whereas qualitative and quantitative analyses of detailed milk protein composition are based on High Performance Liquid Chromatography (HPLC) [9][10][11].Such techniques, commonly recognised as reference or gold standard methods, are not adequate for the acquisition of phenotypic information at population level due to their high demand in terms of costs, time and trained personnel [12].For these reasons, large-scale collection of protein fractions is still partially hampered, thus preventing their inclusion in breeding programmes and in quality-based payment systems.Mid-infrared spectroscopy (MIRS) has been recognized as a reliable, fast and cost-effective tool for the prediction of milk phenotypes, including total protein and casein content [13].Moreover, an advantage of MIRS is the possibility to retroactively apply calibration models and thus study the temporal variation of novel traits when spectra are properly stored and standardized [14].Recently, the feasibility of characterizing detailed milk protein composition using mid-infrared prediction models has been investigated [11], and population-level studies have been conducted [15,16].
To our knowledge, there is a paucity of information on the fine protein composition of cow milk predicted from mid-infrared spectra at population level.Therefore, the objectives of the present study were to (i) assess sources of variation of detailed milk protein composition predicted by MIRS in a large database of dairy and dual-purpose cattle breeds, and (ii) estimate the correlations between the milk content of protein fractions and other milk traits.

Data Collection
Data and spectra information of 2,119,143 milk analyses of fat, crude protein (CP) and casein percentage, and milk urea nitrogen (MUN, mg/dL) collected between January 2011 and December 2017 were provided by the South Tyrolean Dairy Association (Bolzano, Italy).Milk yield (kg/day) and somatic cell count (SCC, cells/µL) were also available.Information on herds and cows were provided by the Breeders Association of Bolzano Province (Bolzano, Italy).Milk samples were from 128,328 Holstein-Friesian (HF), Brown Swiss (BS), Simmental (SI), Alpine Grey (AG) and Pinzgauer (PI) cows farmed in 4453 single-breed herds.The average size of herds under milk recording in this mountainous area is small and animals are fed forage or hay and concentrates.Between 15% and 20% of the farms move their cows to highland pastures in late spring or early summer, and during the highland sojourn animals have access to grazing.
Immediately after collection, 50 mL of milk samples were added with 200 µL of preservative (Bronysolv; ANA.LI.TIK Austria, Vienna, Austria) and processed in the laboratory of the South Tyrolean Dairy Association according to the guidelines of the International Committee for Animal Recording for milk quality analyses.Fat, CP and casein percentages, and MUN content were determined using MilkoScan FT6000 or MilkoScan FT7 (FOSS Electric A/S, Hillerød, Denmark).To offset changes in instrumental response and ensure the comparability of spectra between MilkoScan FT6000 and MilkoScan FT7, the two instruments were routinely calibrated using a standard sample, according to the manufacturer instructions [17].Principal component analysis on spectra did not show significant differences between the two instruments.Somatic cell count was determined using a Cell Fossomatic (FOSS Electric A/S, Hillerød, Denmark) and transformed to somatic cell score (SCS) with the following formula: SCS = log 2 (SCC/100) + 3. Spectral data from 5000 to 900 cm −1 were used to develop MIRS models to predict detailed milk protein composition.

Data Editing and Statistical Analyses
In the present study, days in milk (DIM) between 5 and 305 days, and parity between 1 and 15 were considered.Lactations with less than three test day records were discarded from the dataset.Observations from cows that changed herd during the investigated period were removed.The final dataset consisted of 735,328 records from 49,049 cows and 1782 single-breed herds, collected between January 2011 and December 2017 during the official monthly test day recording.Records were from two dairy (HF, n = 6271 cows; BS, n = 15,556 cows) and three dual-purpose cattle breeds (SI, n = 16,836 cows; AG, n = 9202 cows; PI, n = 1184 cows).Spectra outliers were identified by calculating the Mahalanobis distance between the data point (spectrum) and the centroid of the spectra cluster.Predicted milk protein fractions were set to missing if outside the range of the reference data used for calibrations.For all studied traits, values deviating more than 3 standard deviations from the corresponding trait mean were set to missing.
Sources of variation of detailed milk protein composition and traditional milk traits were investigated using the HPMIXED procedure of SAS software ver.9.4 (SAS Institute Inc., Cary, NC, USA), according to the following linear model: where y ijklmno is the analysed trait; µ is the overall intercept of the model; B i is the fixed effect of the ith breed (i = HF, BS, AG, SI, PI); M j is the fixed effect of the jth month of sampling (j = 1 to 12); Y k is the fixed effect of the kth year of sampling (k = 2011 to 2017); S l is the fixed effect of the lth DIM class of the cow (l = 1 to 30; 10-day classes); P m is the fixed effect of the mth parity of the cow (m = 1 to 5, with class 5 including cows of parity ≥ 5); (B × M) ij is the fixed interaction effect between breed and month of sampling; (B × S) il is the fixed interaction effect between breed and DIM class; (B × P) im is the fixed interaction effect between breed and parity; (S × P) lm is the fixed interaction effect between DIM class and parity; H n (B i ) is the random effect of the nth herd nested within the ith breed ~N(0,σ 2 H(B) ); C o (B i ) is the random effect of the oth cow nested within the ith breed ~N(0,σ 2 C(B) ); and e ijklmno is the random residual ~N(0,σ 2 e ).Because of the data structure (herd nested within breed), the significance of the breed effect was tested on herd within breed variance.A multiple comparison of means was performed for the main effect of breed, using Bonferroni's test (p < 0.05).Finally, Pearson correlations between residuals of milk production traits and detailed protein composition were assessed using the CORR procedure of SAS.

Results and Discussion
In the present study, only data from single-breed herds were available for statistical investigation.No detailed information on diet and management of the cows was available; however, feeding strategies of the herds were based on requirements and production levels of their breeds, and thus the breed-estimated effect could also include a part of the farming conditions (herd) effect.For this reason, a nested approach has been used, similarly to previous papers [20,21].

Descriptive Statistics
Descriptive statistics and proportion of phenotypic variance accounted by cow and herd effects for milk yield, composition, SCS, MUN and detailed milk protein composition are reported in Table 1 Means for protein fractions were 14.30, 10.45, 7.30, 1.82 and 0.70 mg/mL of milk for α-CN, β-CN, κ-CN, β-LG and α-LA, respectively, and the corresponding means for protein fractions expressed as percentage of CP were 41.36%, 30.48%, 21.25%, 5.23% and 2.02%, respectively (Table 1).Quantifications of milk proteins obtained in the present study were consistent with detailed protein composition determined by HPLC in the milk of HF and Jersey breeds [23].The amount of total κ-CN was slightly greater than that reported by other authors, and this was probably due to the high incidence of BS cows in the present study (32% of total animals) and to farming systems that favoured milk composition rather than milk yield [24].
The greatest proportion of phenotypic variance explained by herd effect was estimated for milk yield (35.09%) and MUN (20.27%), meaning that farm management and feeding system were important for these features.For all other traits, the cow was more important than the herd effect in explaining the phenotypic variation; in particular, values ranged from 25.58% (fat percentage) to 41.78% (casein percentage) for milk quality traits, and among milk protein fractions they were lowest for α-LA and greatest for β-LG, regardless of the unit of measurement (Table 1).Overall, the result for β-LG reflects the fact that protein and its fractions are only partially affected by variations in nutrition and management [25].Considering that this protein fraction has been identified as one of the major milk allergens, strategies such as genetic selection might be of particular interest to decrease its content in milk and develop hypoallergenic milk and functional foods [26].

Breed Effect
To our knowledge, this is one of the first studies that has used historical spectra information to predict protein composition in different dairy and dual-purpose cattle breeds.Bonfatti et al. (2017) [9] studied milk protein composition using predicted protein phenotypes from a large spectra database of Italian SI cows.Even if some studies about milk protein composition have been published recently, all of them investigated the phenotypic and genetic variation of milk protein composition using HPLC on a limited number of samples.Moreover, concerning the two dual-purpose breeds (AG and PI), their detailed protein composition has been characterized for the first time in the present study.
Table 2 reports the least squares means (LSMs) of milk yield, composition, SCS, MUN and detailed protein fractions for HF, BS, SI, AG and PI breeds.Alpine Grey and HF had the lowest (17.10 kg/day) and the highest milk yield (28.43 kg/day), respectively.Regarding chemical composition, fat, CP and casein percentages were greater for BS cows than for other breeds, and SI cows had significantly lower SCS (2.45) than other breeds, with SCS from 2.62 (AG) to 2.85 (BS).Milk urea nitrogen ranged from 19.04 mg/dL (HF) to 21.74 mg/dL (AG).Overall, detailed milk protein composition varied significantly across breeds.In particular, BS cows showed the greatest amount of all casein fractions and the lowest amount of β-LG when expressed as mg/mL (p < 0.05), whereas HF exhibited the lowest amount of caseins, even if not significantly different from PI, and α-LA.The greatest β-LG content (mg/mL) was observed in the milk of SI cows (p < 0.05).
Cipolat-Gotet et al. (2018) [27] determined detailed milk protein composition of 1264 Italian BS samples through reversed phase HPLC and results showed that protein fraction contents were similar to those reported in the current study, except for κ-CN, which will be discussed more in details later on, and β-LG.Differences in the latter were probably determined by the wider lactation range in the study of Cipolat-Gotet et al. (2018) [27] compared with the present work.
In order to investigate differences in the relative proportion of protein fractions, LSMs were estimated for proteins expressed as g/100 g of CP.As a result, α-CN differed slightly among breeds, with values between 41.12% (AG) and 41.75% (SI), whereas β-CN, κ-CN and α-LA were significantly greater in BS (31.81%, 21.99% and 2.10%, respectively) compared with other breeds.The lowest concentration of β-CN (29.28%) was estimated for SI, and the lowest concentration of κ-CN was obtained for HF (20.76%) and SI (20.81%).Finally, β-LG ranged from 4.34% (BS) to 5.91% (HF).
Relative proportions of α-CN and β-LG percentage in HF breed (41.64% and 5.91%, respectively) were lower compared with results of Schopen et al. (2009) [28] in first-parity Dutch HF cows, whereas β-CN was higher compared with the same study (30.44% and 27.17%, respectively).Such differences can be attributed to the different cow parities and lactation stages included in the sampling, to diversities in farming system and area, and to the lower relative amount of κ-CN observed in the study of Schopen et al. (2009) [28].Those authors determined only non-glycosylated mono-phosphorylated κ-CN using capillary zone electrophoresis, and this can explain the lower κ-CN percentage compared with that obtained in our study.Such hypothesis is corroborated by κ-CN Animals 2019, 9, 176 6 of 14 determined in the study of McDermott et al. (2017) [15], which is consistent with the κ-CN reported in the present study.Previous reports predicted protein fractions content of the SI cattle breed from infrared spectra.Compared with Bonfatti et al. (2017) [9], lower α-CN and β-LG and higher β-CN and κ-CN were found in the present study.Such differences could be attributed to the same factors already discussed for HF.

Effects of Parity, Lactation Stage and Season
Variations of protein fractions across different parities and breeds are depicted in Figure 1.All caseins and α-LA, expressed as mg/mL of milk, followed a trend similar to that of CP (Supplementary Figure S1), with the greatest amount in second-parity cows and a decreasing content in later parities.The same trend was not so clear for β-LG, which showed only slight variations across different parities.Switching to protein fractions expressed as percentage of CP, α-CN and κ-CN increased in milk of older compared with first-and second-parity cows, with a more obvious trend for specialized dairy breeds (HF and BS).Conversely, β-CN and α-LA decreased with parity order, and β-LG remained almost stable.
Figure 2 depicts the LSMs of predicted protein composition across lactation for HF, BS, AG, SI and PI breeds.Overall, the trend of milk protein composition measured as mg/mL across DIM mirrored that of CP (Supplementary Figure S2).Interestingly, protein fractions percentages showed important variations across DIM.In particular, α-CN decreased from 5 to 45 DIM and then slightly increased until 305 DIM, with different trends among breeds, and β-CN increased until 125 to 155 DIM and slightly decreased thereafter.A constant decrease of κ-CN was observed through the entire lactation, with a more gradual slope for HF.The variation of milk protein fractions across lactation may explain the trend of milk technological properties described in previous reports on the same breeds and study area [29,30].Finally, β-LG and α-LA decreased until 75 DIM and increased during the remaining part of the lactation.Such trends for β-LG and α-LA resemble those recorded by Niero et al. (2016) [24] and Maurmayr et al. 2018 [31] who measured β-LG and α-LA using HPLC.Higher percentage of β-LG in early lactation could be associated with the biological function of this protein fraction in newborn calves, with particular regard to its ability to increase the absorption of small hydrophobic ligands such as retinol and fatty acids [32]. Figure 2 depicts the LSMs of predicted protein composition across lactation for HF, BS, AG, SI and PI breeds.Overall, the trend of milk protein composition measured as mg/mL across DIM mirrored that of CP (Supplementary Figure S2).Interestingly, protein fractions percentages showed important variations across DIM.In particular, α-CN decreased from 5 to 45 DIM and then slightly increased until 305 DIM, with different trends among breeds, and β-CN increased until 125 to 155 DIM and slightly decreased thereafter.A constant decrease of κ-CN was observed through the entire lactation, with a more gradual slope for HF.The variation of milk protein fractions across lactation may explain the trend of milk technological properties described in previous reports on the same breeds and study area [29,30].Finally, β-LG and α-LA decreased until 75 DIM and increased during the remaining part of the lactation.Such trends for β-LG and α-LA resemble those recorded by Niero et al. (2016) [24] and Maurmayr et al. 2018 [31] who measured β-LG and α-LA using HPLC.Higher percentage of β-LG in early lactation could be associated with the biological function of this protein fraction in newborn calves, with particular regard to its ability to increase the absorption of small hydrophobic ligands such as retinol and fatty acids [32].Regarding monthly variation of the amount of protein fractions (mg/mL of milk), caseins and α-LA followed the same trend as CP (Supplementary Figure S3), with a general decrease during the summer period and the minimum in June-July (Figure 3).Such trend was previously reported by Bernabucci et al. (2015) [33] and was correlated to heat stress affecting cows during summer.On the Regarding monthly variation of the amount of protein fractions (mg/mL of milk), caseins and α-LA followed the same trend as CP (Supplementary Figure S3), with a general decrease during the summer period and the minimum in June-July (Figure 3).Such trend was previously reported by Bernabucci et al. (2015) [33] and was correlated to heat stress affecting cows during summer.On the contrary, β-LG increased during the summer period, probably due to its immunomodulatory role.Percentage of α-CN showed two major peaks in April and July, whereas κ-CN (%) slightly decreased during summer, with a minimum in July, and β-CN (%) exhibited only small variations across months of sampling.Finally, β-LG (%) slightly increased between May and September, and α-LA (%) had an erratic trend, with the greatest percentage in November.Similar trends for protein fractions were reported by Bernabucci et al. (2015) [33].Seasonal impacts on protein fractions could be the result of the pasture system applied in alpine areas during summer [34].
Animals 2019, 9, x 9 of 14 contrary, β-LG increased during the summer period, probably due to its immunomodulatory role.Percentage of α-CN showed two major peaks in April and July, whereas κ-CN (%) slightly decreased during summer, with a minimum in July, and β-CN (%) exhibited only small variations across months of sampling.Finally, β-LG (%) slightly increased between May and September, and α-LA (%) had an erratic trend, with the greatest percentage in November.Similar trends for protein fractions were reported by Bernabucci et al. (2015) [33].Seasonal impacts on protein fractions could be the result of the pasture system applied in alpine areas during summer [34].
Protein fractions expressed as mg/mL were moderately to strongly associated with CP and total casein (r = 0.32 to 0.89), and weakly negatively associated with milk yield (r = −0.16 to −0.06; Table 3), which is consistent with a dilution effect of milk components at higher milk yield [35]. Overall, milk protein fractions were also weakly associated with fat percentage, SCS and MUN (r = −0.24 to 0.26), except for a moderate correlation between α-LA and fat percentage (r = 0.41).Correlations of protein fractions, expressed on CP, with milk yield, composition, SCS and MUN were generally weak (r = −0.26 to 0.31), except for a moderate relationship between α-LA and fat percentage (r = 0.41; Table 3).Differences in the magnitude of correlations between MUN and milk protein fractions probably underline that each protein fraction has a different impact on nitrogen conversion efficiency [36].

Table 1 .
[21]lk yield averaged 23.45 kg/day, and means of fat, CP, casein, SCS and MUN were 4.03%, 3.46%, 2.72%, 2.48 and 21.19 mg/dL, respectively.Averages of milk yield and composition traits observed in the present study were comparable with values reported by Penasa et al. (2014) [22], who studied milk coagulation properties of HF, BS and SI cows in multi-breed herds, and Visentin et al. (2018)[21], who assessed the phenotypic variation of major milk mineral content in HF, BS, AG and SI cows in single-breed herds.Mean, standard deviation (SD), range, coefficient of variation (CV) and percentage of phenotypic variance accounted by cow (σ 2 c ) and herd (σ 2 h ) for milk yield, milk composition, somatic cell score (SCS), milk urea nitrogen (MUN) and detailed protein composition of cow milk.

Table 2 .
Least squares means (SE in parentheses) of milk yield, milk composition, somatic cell score (SCS), milk urea nitrogen (MUN) and detailed protein composition of different cow breeds 1 .
1Least squares means with different superscript letters within a row are significantly different (p < 0.05).