3.1. Reference Dataset
The descriptive statistics of the entire dataset are reported in Table 2. The number of samples differs between minerals because of the erroneous ICP-AES analyses mentioned previously. The main purpose of this study was to establish robust models that can be applied on a large scale. Compared to the current study, the models published in the literature [19,23] were built on data with a different mean and standard deviation. Indeed, these metrics vary according to the sampling.
The models presented by Toffanin et al. [20] showed relatively small differences in mean concentration for Ca and P (0.84% and −5.84%, respectively) compared with our models. Higher averages were found in particular for the mineral models presented by Visentin et al. [23] (12.45%, 17.42%, 29.60%, 1.93%, and −0.86% for Na, Ca, Mg, P, and K, respectively). The sampling was carried out in the South Tyrol region, where cattle are mainly used for cheese production.
Another difference is reported by Zaalberg et al. [26]: compared to our data, the mean differed by −4.09%, 5.07%, 8.08%, −29.32%, and −5.49% for Danish Holstein and by 12.35%, 23.71%, 17.91%, −13.61%, and −14.49% for Danish Jersey, for Na, Ca, Mg, P, and K, respectively.
3.2. Creation of Models Predicting Mineral Content in Milk
The prediction performances of the two sets of PLS models built to determine the major mineral contents in milk are listed in Table 3. A maximum of 1181 and 1211 samples, respectively, were used to build the two sets of calibration models. To ensure that the results of the two sets of models could be compared, the number of LV used for the second set of models was the same as that fixed for the first set. The
has shown that the T test is required to obtain reliable models. Indeed, the T test decreased the RMSE, especially for Ca and P. This decrease was probably related to the removal of samples affected by poor sampling or by precipitation after a long freezing time. The validation RPD value allowed us to assess the accuracy and robustness of each prediction equation. Both sets of models had relatively similar cross-validation performances, although the cross-validation RPD decreased by 1% to 13% when 30 Austrian samples were added to the training set. However, the validation RPD increased by 6% to 39%, except for Na, where it decreased by 16%. As stated in [27], this confirms that including variability in the dataset can decrease the apparent accuracy (i.e., in cross-validation) but improve the robustness and real performance of the models in validation. It consequently seems important to cover the variability of the new samples to be predicted in order to limit extrapolation and ensure better predictions in routine use. Indeed, when the Austrian samples were included, the RMSEP decreased by 1.1%, 12.8%, 11.1%, 36.7%, and 18.7% for Na, Ca, Mg, P, and K, respectively.
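The percentage changes reported for RPD and RMSEP can be read as relative changes between the two sets of models. The following minimal Python sketch illustrates the calculation, assuming that the change is expressed relative to the first set of models (calibrated without Austrian samples). The function name and the numerical values are hypothetical and are not taken from Table 3.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`; a negative value is a decrease."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting value is zero")
    return (after - before) / before * 100.0


# Hypothetical RMSEP values (mg/kg), used only to illustrate the arithmetic.
rmsep_without_local_samples = 100.0
rmsep_with_local_samples = 80.0

change = relative_change(rmsep_without_local_samples, rmsep_with_local_samples)
print(f"RMSEP change: {change:.1f}%")
```

With these illustrative values, the RMSEP change is −20%, i.e., a 20% decrease in the validation error.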
The main purpose of using different calibration sets was not to evaluate the performance of the models but rather to compare the influence of the data. Indeed, the data showed a completely different distribution for the two sets of models. Within this study, the main purpose was to establish robust models, which is not classically the main criterion from a statistical point of view. In this specific case, the priority was a large coverage of spectral variability, so that the models can be applied in a wide range of cases. This sought-after robustness allows the models to be used on a larger scale for smart farming. The standard deviation computed from all the samples analysed by ICP-AES (Table 2) was higher than the SD of the datasets used for the models in Table 3. When some values are too different from the others, removing these samples (considered as outliers) helps to minimize prediction errors.
The residual analysis performed during the calibration process served this purpose. The models were built over several sample collection phases, in which samples were selected if the value predicted by the previous model was of interest for improving the robustness of the newly created model. However, this data collection method is known to increase the prediction errors of a model [27].
The opposite results obtained in cross-validation and validation show the relevance of performing a fully independent validation. The validation RMSEP reported in Table 3 confirms the importance of seeking variability to obtain the most robust model, as the RMSEP improved when only 30 Austrian samples were added to the training set. For cattle coming from a totally different environment, with different feed rations, climatic conditions, and breeds, the milk composition could change radically. For Na, Ca, and Mg, only a slight difference was found between the two external validations (Table 3), which indicates that the models were robust enough to predict the Austrian samples well.
For the P model including Austrian samples in the calibration dataset, a large decrease in the validation RMSEP was observed, confirming a better performance of the prediction model (i.e., higher RPD). The relatively high prediction errors (for both external validations) of the K model revealed its inability to accurately predict the Austrian samples.
When 80% of the samples were randomly chosen for the calibration model, the RMSEP for predicting K was 106.7 mg/kg. This prediction error seems more consistent (than the validation with only Austrian samples) and confirms that the K content of the Austrian milk samples differs. These results may be explained by the difficulty of predicting the K content in milk.
Even with a robust model including samples from four different countries, the external validation showed some deficiencies in predicting the Austrian samples for P and K. The RMSEP decreased when 30 samples from the Austrian validation set were included in the calibration set. The results obtained in this study highlight the need for data collaboration and the value of including local variability to obtain the most robust models. The best way to collaborate is to have a common database and standardized spectra that take machine-dependent errors into account [29].
Compared to the current study, the models published in the literature [15,19] showed similar performance (RPD) but higher RMSE. Indeed, the RMSECV obtained in the current study were globally lower than those of previous studies, by −30%, −77%, −60%, +20%, and −49% [19] and by −40%, −124%, −79%, −34%, and −31% [23] for Na, Ca, Mg, P, and K, respectively. However, it is important to note that the data had a completely different distribution in the other studies. Indeed, the results reported in the literature [19,23] present a higher mean and, consequently, a greater standard error of prediction. It is therefore more relevant to compare the results using the RPD, which is the ratio of SD to RMSE.
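To illustrate why the RPD is less sensitive than the RMSE to the distribution of the data, the following Python sketch computes both metrics on hypothetical values. It assumes the usual definition RPD = SD/RMSE with the sample standard deviation of the reference values; the function names and the data are illustrative only and do not reproduce the calculations of this study.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(reference, predicted):
    """Ratio of the SD of the reference values to the RMSE (assumed definition)."""
    return statistics.stdev(reference) / rmse(reference, predicted)


# Hypothetical mineral contents (mg/kg) and their predictions.
reference = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
predicted = [1010.0, 1090.0, 1210.0, 1290.0, 1410.0]

# Same data on a doubled scale: the mean, the SD, and the errors are all doubled.
reference_x2 = [2 * value for value in reference]
predicted_x2 = [2 * value for value in predicted]

print(rmse(reference, predicted), rpd(reference, predicted))
print(rmse(reference_x2, predicted_x2), rpd(reference_x2, predicted_x2))
```

In this example the RMSE doubles with the scale of the data while the RPD is unchanged, which is why datasets with different means and SD are more fairly compared on the RPD.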
Another approach would be to compare the models using the CCC metric, but Soyeurt et al. [19] and Visentin et al. [23] did not evaluate their models with this parameter. When the CCC was applied to the cross-validations of our three models, the performance was approximately the same, except for Na from the first models (8.6%). The CCC indicated that our models were not subject to bias or slope issues.
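The sensitivity of the CCC to bias and slope issues can be illustrated with a short Python sketch. It assumes that CCC denotes Lin's concordance correlation coefficient, computed with population (1/n) variances and covariance; the function name and the data are hypothetical.

```python
import statistics


def concordance_correlation(reference, predicted):
    """Lin's concordance correlation coefficient (assumed meaning of CCC)."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("inputs must have equal length and at least two values")
    mean_r = statistics.fmean(reference)
    mean_p = statistics.fmean(predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted)) / n
    denominator = var_r + var_p + (mean_r - mean_p) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * cov / denominator


reference = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [value + 1.0 for value in reference]      # constant offset (bias)
stretched = [2.0 * value for value in reference]   # slope different from 1

print(concordance_correlation(reference, reference))  # perfect agreement
print(concordance_correlation(reference, biased))     # penalised by the bias
print(concordance_correlation(reference, stretched))  # penalised by the slope
```

In all three cases the Pearson correlation equals 1, whereas the CCC drops below 1 as soon as the predictions are shifted or scaled relative to the reference values.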
Based on the RPD values [36], we can conclude that all models created for Ca and P have the potential for approximate quantitative predictions and screening, whereas screening for Na, Mg, and K seems to be impossible because of the low RPD values of their models. However, the accuracy of those models was sufficient to discriminate between groups of cows producing milk with high or low mineral contents. The poor RPD performance of these models is related to the fact that infrared spectroscopy does not directly detect minerals in milk. Indeed, since infrared measures the vibrations of molecular bonds, it cannot directly detect ions in solution. The models presented are therefore indirect models.
3.3. Correlation between Mineral Traits and Traits Related to Animal Health and Milk Technological Properties
Pearson correlations were estimated to achieve two objectives. The first was to compare the reference mineral dataset with the MIR-predicted traits, and the second was to compare the predicted mineral dataset with the MIR-predicted traits (Table 4). Since the samples used to develop the different models were not randomly selected, the dataset was not representative of the population, and the correlations given in this study cannot be generalized to the population. However, it was interesting to observe whether the relationships between traits remained similar when using reference or predicted mineral contents, even if the prediction accuracy was not high. Moreover, this makes it possible to confirm the relationships with other predicted traits known to be related to animal health or milk technological properties.
When comparing the correlations obtained between reference or predicted mineral traits and the other MIR-predicted traits listed in Table 4, the correlations showed the same trend but not the same amplitude. Indeed, the correlations involving the predicted mineral data were generally higher than those using the reference values. This amplification could mainly be explained by the spectral information shared by all the predicted traits. To avoid overstating the relationships, the results and discussion focus on the correlations between the measured minerals and the predicted traits.
Pearson’s correlations between the reference mineral contents of milk ranged from −0.27 to 0.55. Moderate relationships were found between Mg and P, Ca and Mg, as well as Ca and P, with correlation values of 0.55, 0.54, and 0.49, respectively. Weak negative relationships were found for Na and K, Ca and K, as well as Mg and K, with −0.27, −0.15, and −0.14, respectively. It is interesting to note that the correlations between mineral contents predicted from the calibration set showed the same trend (Table 4).
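As an illustration of how such pairwise correlations can be obtained, the following Python sketch computes Pearson's r for every pair of traits. The function name, the trait labels, and the values are hypothetical and do not correspond to the data behind Table 4.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length and at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mineral contents (mg/kg) for five milk samples.
traits = {
    "Ca": [1150.0, 1200.0, 1180.0, 1250.0, 1300.0],
    "P": [900.0, 950.0, 940.0, 990.0, 1020.0],
    "K": [1600.0, 1550.0, 1580.0, 1500.0, 1490.0],
}

for name_a, name_b in combinations(traits, 2):
    r = pearson(traits[name_a], traits[name_b])
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```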
Caseins are phosphoproteins containing 0.85% phosphorus in the form of phosphates. Unlike whey proteins, which do not contain any phosphorus, caseins can bind a large amount of calcium cations through these phosphates. This type of phosphate, which results from the esterification of the hydroxyl group of serine, is called organic phosphate [37]. The presence of calcium ions associated with casein leads to the formation of casein micelles. This ionic calcium interaction is strongly linked to protein concentration [37]. The analysed minerals are initially found as calcium or magnesium phosphate. As these are counterions (ionic interaction), a strong correlation exists between these minerals (Ca and P, Mg and P; Table 4). Magnesium is also involved in many enzymatic reactions. For example, in milk, Mg is involved with alkaline phosphatase, which acts as a dephosphorylating agent [38].
These phosphorus salts are mainly found inside large colloidal casein particles in the form of colloidal phosphate (nano-clusters). Indeed, phosphorus is distributed as 38.5% colloidal inorganic phosphate, 33% inorganic salts soluble in milk, 20% organic phosphate (casein phosphate), 7% organic esters, and 1.5% phospholipids [39]. The proportion of salt included in the micelle is around 6% (w/w) [37,40], explaining the link between pProteins and Mg and P found in Table 4. To better explain this phenomenon, the method and models developed by Franzoi et al. [24,40] could be used to calculate the correlation between casein and the micellar minerals.
In the aqueous phase of milk, bivalent cations can be chelated by citrate to form undissociated complexes. The positive correlations between pCitrate and Ca and Mg (r = 0.43 and 0.48, respectively) could reflect this chelation.
Increased sodium in milk is a sign of udder infection in cows [41]. In that case, the sodium comes from a transfer from blood to milk. To maintain a constant osmotic pressure, the synthesis of lactose is reduced [39,42]. The Na content was negatively correlated with pLactose (Table 4), which is consistent with this transfer. It has been demonstrated that NAGase is a mastitis marker [43,44]. In this study, positive correlations between Na and both pNAGase and pLDH were found (Table 4), supporting the link between sodium and mastitis. Moreover, a clear positive correlation was found with pLactoferrin (r = 0.36 to 0.62), which is also known as a mastitis indicator [34,45].
High Na and low K contents in milk have been reported as a mastitis indicator [46]. This is in line with the negative correlation observed between these two minerals. Summer et al. [47] revealed the links between Na, K, and somatic cells, confirming that low potassium content is an indicator of clinical mastitis. When milk exhibits a high somatic cell count (>400,000), the Na concentration tends to increase, while the K content tends to drop.
The milk coagulation ability (k20, a30, and Fresh ILCY) was mainly correlated with Ca, Mg, and P (Table 4), which highlights the importance of mineral composition for cheese manufacturing [48]. Sanchez et al. [49] reported moderate genetic correlations of Ca, Mg, and P with fresh curd yield and coagulation traits. No genetic correlation was found with Na and K. These findings are consistent with the correlations observed in our study.
3.5. Month Influence on Milk-Predicted Mineral Composition
The descriptive statistics derived from the large-scale predictions on the Walloon DHI database are shown in Table 5. By averaging those results by month, the annual evolution of the concentrations of the five main predicted minerals in milk was computed and is shown in Figure 1. Three types of annual tendencies were observed for the predicted minerals: (1) sodium; (2) calcium, magnesium, and phosphorus; and (3) potassium. Predicted sodium increased from February to September, with a slight decrease between May and July. During the year, pNa was lowest around February and highest around September. A local minimum was observed in July.
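The monthly averaging step can be sketched in Python as follows. This is only an illustration under simple assumptions: each record is a hypothetical pair (test date, predicted content), the values are invented, and the actual processing of the DHI database may differ.

```python
from collections import defaultdict
from datetime import date


def monthly_means(records):
    """Average predicted values per calendar month.

    `records` is an iterable of (test_date, value) pairs; the result maps
    the month number (1 to 12) to the mean value observed in that month.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for test_date, value in records:
        totals[test_date.month] += value
        counts[test_date.month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Hypothetical predicted Na contents (mg/kg) with their test dates.
records = [
    (date(2020, 2, 3), 400.0),
    (date(2020, 2, 17), 420.0),
    (date(2020, 7, 6), 430.0),
    (date(2020, 9, 14), 450.0),
]

means = monthly_means(records)
lowest_month = min(means, key=means.get)
highest_month = max(means, key=means.get)
print(means, lowest_month, highest_month)
```

The months with the lowest and highest averages can then be read directly from the resulting dictionary, as done here with `min` and `max`.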
The second tendency showed an annual pattern with a characteristic minimum during the summer and a significant increase between August and September. Although the concentration of pMg remained mostly stable between October and January, a sharp decrease was observed between December and January. For pCa, a similar plateau was observed between October and December.
The last tendency, for predicted K, showed a sharp decline over a long period of the year, between January and October. During this period, two local minima were observed in March and May, with the strongest decline occurring between April and May.
For pCa and pP, the concentration followed the same monthly trend as the fat concentration, as shown by Zhang et al. [50], while pMg followed a trend more similar to that of protein. Bittante et al. [51] have also shown that the protein and fat contents of milk decrease during the summer. Thus, the predicted minerals Ca, Mg, and P followed a similar pattern.
Diet is a key factor in explaining this behavior: throughout the year, the diet can change radically. In the studied region, during the summer, the livestock were mainly on pasture, and little supplementary feed (concentrate, silage, etc.) was given, resulting in a lower proportion of fat and protein. Grazing can also modify the cation/anion balance of the diet, which affects the body mineral balance [52]. Another important change during the summer is the increase in temperature. Heat stress has a negative impact on the cow’s well-being, leading to underfeeding and therefore to a modification of milk composition [50].
In many countries, farmers group calvings at the same time of the year. As a result, cows are at the same stage of lactation at the same time, which may explain some of the differences in mineral content throughout the year.
3.6. Effect of Lactation on Milk-Predicted Minerals
The mineral content of milk according to DIM and parity is shown in Figure 2. Two behaviors were observed. The first, for pNa, pCa, pMg, and pP, corresponds to an exponential decrease until the lactation peak, followed by a slight increase in mineral content over the days. This trend is the opposite of that of milk production. The second corresponds to a slight increase in K content up to a maximum (which differs according to parity), followed by a slight decrease over the DIM. In agreement with Table 4, potassium was positively correlated with milk yield (r = 0.32). A decrease across parities was observed for the mineral elements (except for pNa and pCa), with a larger gap between the first and second parity.
Analyses of DIM and parity have already been carried out by different authors [53,54], but these studies focused on the breed and not on a global aspect. Our study provides a new observation of the mineral concentration depending on DIM and parity, based on around 10,000 spectra per day in milk over the first five parities. The sharp decrease in mineral traits (except pK) at the beginning of lactation was probably the result of a dilution effect, as milk yield is very low at the onset of lactation. The increase in minerals, except for pK, throughout lactation could also be derived from a dilution effect in milk. Moreover, Holt et al. [55] mentioned a rise in casein level in late lactation, leading to a higher content of Ca and Mg, and Kume and Tanabe [56] and Visentin et al. [53] mentioned a similar decreasing trend for pCa, pMg, and pP content across parity. Alternatively, as the skeleton of first-parity cows may not be completely developed, calcium could be mobilized more for bone formation than for calf feeding, which would explain the minimum at the beginning and the progressive increase until it exceeds the Ca present in the milk of multiparous cows over the DIM.
A link between mastitis and sodium was clearly identified based on the estimated correlations with mastitis indicators (Table 4). The increase in sodium across parity could be explained by the fact that, as cows age, the SCC tends to increase and they are more likely to have an udder infection [57,58]. Finally, there was an increase in sodium across lactation. As the SCC increases because of the innate immune response of the animal in preparation for calving, which reinforces the defense mechanism of the mammary gland at this critical period [59], sodium would follow the trend of the SCC. The correlation between minerals and NAGase could be a good indicator to detect mastitis in dairy cattle.