Article

Detection of Dairy Herd Management Issues Using Fatty Acid Profiles Predicted by Mid-Infrared Spectrometry

1 TERRA Research and Teaching Centre, Gembloux Agro-Bio Tech, University of Liège, 5030 Gembloux, Belgium
2 Lactanet, Saint-Anne-de-Bellevue, QC H9X 3R4, Canada
3 Walloon Breeders Association, 5590 Ciney, Belgium
4 Comité du Lait, 4651 Herve, Belgium
5 Walloon Agricultural Research Centre, 5030 Gembloux, Belgium
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Animals 2025, 15(11), 1575; https://doi.org/10.3390/ani15111575
Submission received: 27 March 2025 / Revised: 9 May 2025 / Accepted: 19 May 2025 / Published: 28 May 2025

Simple Summary

Farms generate increasing amounts of data each year. One example is the bulk tank milk composition, predicted through spectrometry, which is routinely measured for milk payment purposes. Among the different milk components, the fatty acid profile is of utmost importance because it is closely linked to animal status and farm management practices. This research aims to explore a novel application of these fatty acid profiles by developing a practical herd-monitoring tool for farmers and advisors. The methodology developed consists of an unsupervised learning method that identifies meaningful patterns in fatty acid profiles, combined with an expert-driven interpretation of those patterns. The analysis was performed using a Belgian bulk tank milk database. Seven distinct patterns were identified; among these, three were associated with good management practices, three indicated potential risks, and one highlighted a metabolic disorder probably related to management practices. Most of these patterns were also observed in a Canadian bulk tank milk dataset, demonstrating that the method is generalizable across different regions and farming conditions. Moreover, the probabilities associated with each pattern can serve as a reliable foundation for creating practical alerts to support on-farm decision making, thereby enhancing farm management and contributing positively to animal welfare.

Abstract

This article focuses on the creation of a monitoring tool using routinely collected data from milk payment analyses. Milk samples were analyzed through Fourier Transform mid-infrared spectrometry every 1 to 3 days, and their compositions were predicted using machine learning models. Among the predicted parameters, fatty acid profiles appear to be effective indicators of animal status and management practices. In this research, these profiles were summarized using 31 fatty acids or groups of fatty acids. The methodology consists of four steps: hierarchical clustering to detect patterns in a Belgian spectral dataset (N = 774,781), interpretation of the identified seven clusters, development of predictive models applied to a Canadian dataset (N = 670,165), and validation using management information collected from Canadian farms. The identified clusters revealed significant relationships with feeding management strategies and temporal evolutions, highlighting the potential to develop automated alert systems that assist farmers and advisors in herd monitoring.

1. Introduction

In 2023, the Food and Agriculture Organization (FAO) emphasized the critical role of milk in enhancing nutrition and health due to its nutritional composition and significance in global food systems [1]. Cow milk typically comprises 86.9% water, 4.6% lactose, 4.2% fat, 3.4% protein, 0.8% minerals, and 0.1% vitamins. About 98% of the fat consists of triglycerides, which are built from fatty acids (FAs) [2]. FAs matter because their composition directly impacts milk quality, influencing physical, nutritional, and technological properties important for human consumption and dairy processing [3]. Moreover, the milk FA profile is a valuable proxy for the cow’s metabolic status [4]. Although approximately 400 different FAs exist in milk, only 12 FAs each constitute more than 1% of milk fat [2,5].
Based on their origins, FAs can be categorized into three main groups: preformed FAs (approximately 41%), mixed FAs (about 31%), and de novo FAs (around 28%). Understanding variation among these FA groups is crucial for evaluating the animal’s metabolic and nutritional status, since they originate from distinct metabolic pathways which are sensitive to management practices and health conditions. Preformed and de novo FA pathways are summarized in Figure 1.
De novo FAs are synthesized directly in the epithelial cells of the mammary gland from residues of ruminal microbial fermentation [3]. The main primers are acetate (two carbons) and butyrate (four carbons), onto which two-carbon molecules are successively added to produce FAs up to fifteen carbons long, which are immediately excreted in milk [6].
Preformed FAs originate from three sources: dietary intake, microbial FA synthesis in the rumen, and body fat mobilization. Dietary FAs, once digested and absorbed into the bloodstream, can enter milk unchanged or undergo microbial biohydrogenation in the rumen if polyunsaturated. Biohydrogenation saturates FA chains through microbial action in the rumen, forming intermediate unsaturated FAs before becoming fully saturated. Occasionally, intermediate products bypass the final saturation steps and are absorbed directly in their unsaturated form [7]. These ruminal processes greatly influence milk FA profiles and depend heavily on microbial populations present in the rumen.
These microbial populations can also synthesize FAs from ruminal fermentation residues [8]. The importance of each source depends on the microbial population and the proportion of lipids in the ration [9]. Biosynthesis mainly produces C16:0 and C18:0 from acetate and butyrate, but also odd- or branched-chain FAs from other primers. For example, using propionate (three carbons), microorganisms can synthesize odd-chain FAs, mainly C15:0 and C17:0. By using amino acid catabolism residues, the final products are branched-chain FAs [9]. These microbial FAs are then absorbed in the intestine, pass into the bloodstream, and are incorporated into milk fat.
The final source of preformed FAs is body fat, which can be mobilized by the animal. This source generally accounts for less than 10% of milk fatty acids, except after calving, when lactation begins and the animal’s energy deficit is greatest [10,11]. This deficit is due to the high energy demand for calving and the onset of lactation, as well as the reduced dry matter intake by the animal at this stage of lactation [12]. At the same time, the mammary gland loses its capacity to produce FAs de novo. To compensate for these phenomena, the animal will mobilize its body fat. In the first weeks of lactation, there are fewer de novo and more preformed FAs in the milk [13].
These pathways are known to be influenced by many factors [2,11,14] such as breed and genetics [15]; herd management, including animal density and ration [16,17]; stage of lactation [13]; temperature [18,19]; animal health and physiology [20]; animal nutrition; and the main fermentation processes in the rumen, themselves linked to diet [21,22].
Thus, analyzing changes in the FA profile of milk is a powerful way to monitor a dairy herd routinely, whether to assess the impact of management practices or to detect external factors impairing the herd, such as feed storage problems or heat stress events during periods of hot weather. However, a monitoring tool requires easy, cheap, and robust data acquisition. In this context, the Fourier Transform mid-infrared (FT-MIR) spectrometry already applied to bulk tank milk for milk payment is a major opportunity. FT-MIR spectrometry is a rapid and non-destructive method that can predict several key milk FAs accurately [15,23].
However, interpreting complex FA profiles routinely remains challenging due to the numerous FAs and their interactions. Traditional multivariate analyses provide global overviews but lack detail for specific patterns. Unsupervised machine learning techniques, specifically clustering and pattern recognition, may provide deeper insights into specific FA profile groupings, revealing less frequent yet significant patterns. These methods do not require a target to predict, which is often difficult to obtain at a large scale; instead, they aim to find variables that interact and that form groups of homogeneous observations. Another advantage of an unsupervised approach is that patterns can be recognized on the entire spectral database, and not only on the few records for which the target is observed, as is the case with supervised learning.
Recently, Franceschini et al. [24] demonstrated this by applying hierarchical clustering to FT-MIR predictions related to animal health, obtained from individual cow milk samples in Dairy Herd Improvement (DHI) records. In the present study, we extend this approach by utilizing spectral data obtained from milk analyses used for milk payment determination. Compared to DHI records, bulk tank milk records offer the advantage of a higher sampling frequency (every 1 to 3 days) versus traditional milk recording (every 4 to 6 weeks). Furthermore, such data are available for all herds delivering milk to dairy processors, whereas DHI data are limited to herds enrolled in performance recording programs.
In summary, this study aims to address the complexity of interpreting the numerous FA profiles by identifying the most representative and informative ones, providing a biological interpretation, and developing a predictive model. From those results, the final aim is to estimate the feasibility of establishing a decision support tool for dairy herd management, based on the FA profile of milk predicted by mid-infrared spectrometry.

2. Materials and Methods

Data processing and analysis were performed in R, version 4.2.3 [25]. The entire workflow from databases to results is summarized in Figure 2.

2.1. Belgian Dataset

The first database used in this study comes from the analysis of milk samples collected under the milk payment scheme in Wallonia, the southern region of Belgium. Access to this database is governed by the “Futurospectre” agreement between ULiège—Gembloux Agro-Bio Tech (Gembloux, Belgium), the Walloon Agricultural Research Centre (CRA-W, Gembloux, Belgium), the Walloon Breeders Association (AWé, Ciney, Belgium), and the milk laboratory Comité du Lait (Battice, Belgium). No ethical approval was needed, as the milk samples were collected as part of routine milk payment analyses.
The milk samples, taken by the dairies every 1 to 3 days from the bulk tanks of the collected farms, were analyzed using FT-MIR spectrometry by the Comité du Lait (Battice, Belgium). All samples were analyzed with Foss MilkoScan spectrometers (Foss, Hillerod, Denmark). During this analysis, in addition to the fat and protein contents predicted by the spectrometer, spectra were also recorded in a database. The spectra were standardized using the method developed by Grelet et al. [26]. Different prediction equations from Grelet et al. [27] were then applied to these spectra to extend the number of phenotypes available. These correspond to 31 phenotypes relating to FAs or FA groups (g/dL milk), milk production (kg/day), estimated fat content (g/dL milk), β-hydroxybutyrate concentration in milk (BHB; log, mmol/L plasma), energy balance, protein efficiency, free FA in blood (FFA; log, µEq/L plasma), and dry matter intake (kg/day).
The predicted FA concentrations were then converted from g/dL of milk to g/100 g of fat by dividing them by the predicted fat content. This unit has the advantage of reducing the correlation with milk production and fat content, and it better reflects the relative importance of the metabolic pathways that produce these milk compounds.
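As an illustration of this unit conversion, the short Python sketch below divides an FA concentration by the fat content. It is not the authors' code (the study was run in R); the function name and the example values are hypothetical.

```python
def to_g_per_100g_fat(fa_g_per_dl, fat_g_per_dl):
    """Express an FA concentration (g/dL milk) relative to fat (g/100 g fat)."""
    if fat_g_per_dl <= 0:
        raise ValueError("fat content must be positive")
    return 100.0 * fa_g_per_dl / fat_g_per_dl


# Hypothetical record: 1.2 g/dL of C16:0 in milk containing 4.0 g/dL of fat.
c16_share = to_g_per_100g_fat(1.2, 4.0)
print(f"C16:0 = {c16_share:.1f} g/100 g fat")
```

Expressed this way, two tanks with different fat contents but the same FA proportions give the same value.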
To isolate potential heat stress effects, temperature and humidity measurements were added to the database. The zip codes of Walloon municipalities were linked to the nearest meteorological station. The map of Belgian municipalities was superimposed on that of the 30 Walloon weather stations in the AGROMET network [28] using the Voronoi method as presented by Nickmilder et al. [29]. This method involves assigning to each station a polygon of a given area on the map. Municipalities covering more than one polygon were linked to the polygon on which most of their surface area was located. Finally, the temperature humidity index (THI) was calculated using the following formula:
$$\mathrm{THI} = 0.8 \times t_{sa} + \frac{h_{ra}}{100} \times (t_{sa} - 14.4) + 46.4$$
where $t_{sa}$ is the ambient dry temperature (°C) and $h_{ra}$ is the ambient relative humidity (%) [30]. Thanks to this link between dairy records and weather stations, meteorological information could be merged with the spectral database.
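The THI formula can be written as a one-line function. The Python sketch below is only an illustration; the function name and the example weather values are hypothetical.

```python
def temperature_humidity_index(t_sa, h_ra):
    """THI from ambient dry temperature (deg C) and relative humidity (%)."""
    return 0.8 * t_sa + (h_ra / 100.0) * (t_sa - 14.4) + 46.4


# Hypothetical summer day: 25 deg C and 60% relative humidity.
thi_value = temperature_humidity_index(25.0, 60.0)
print(round(thi_value, 2))
```

At 14.4 °C the humidity term vanishes, so the index depends on temperature alone at that point.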
Then, the resulting database was cleaned according to ICAR’s recommendations for the recording of dairy cattle milk data [31]. These recommendations were created for the cow level, but they are expected to apply at the bulk tank milk level, as bulk tank milk is a weighted average of the herd. Only records with a fat content between 1.5 and 9 g/dL of milk and a protein content between 1 and 7 g/dL of milk were retained. Rows containing outliers for fat or protein, negative values for fatty acids, or missing values for any of the traits studied were discarded.
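The filtering rules above could be sketched in Python as follows. The record layout (a dictionary per sample), the field names, and the choice of inclusive bounds are assumptions made for this example only.

```python
def is_valid_record(record, fa_names):
    """Return True when a bulk tank record passes the filters described above."""
    fat = record.get("fat")
    protein = record.get("protein")
    if fat is None or protein is None:
        return False
    # Fat and protein bounds in g/dL of milk (inclusive bounds are assumed).
    if not (1.5 <= fat <= 9.0) or not (1.0 <= protein <= 7.0):
        return False
    # Discard missing or negative fatty acid predictions.
    for name in fa_names:
        value = record.get(name)
        if value is None or value < 0:
            return False
    return True


fa_names = ["C16:0", "C18:1cis9"]  # hypothetical subset of the 31 traits
records = [
    {"fat": 4.1, "protein": 3.4, "C16:0": 1.2, "C18:1cis9": 0.8},   # kept
    {"fat": 0.9, "protein": 3.4, "C16:0": 1.2, "C18:1cis9": 0.8},   # fat out of range
    {"fat": 4.1, "protein": 3.4, "C16:0": -0.1, "C18:1cis9": 0.8},  # negative FA
    {"fat": 4.1, "protein": 3.4, "C16:0": 1.2},                     # missing FA
]
cleaned = [r for r in records if is_valid_record(r, fa_names)]
print(len(cleaned))
```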
Then, the standardized Mahalanobis distance (GH) was calculated for each record to detect potential extreme data and compare the datasets from Belgium and Canada. To calculate it, it was necessary to perform a principal component analysis (PCA) on the 31 studied phenotypes, given the high correlations existing between some of them. Following this PCA, six principal components (PCs) were retained, as they explained 95.48% of the variability in the data. This analysis was performed using the FactoMineR package, version 2.4 [32]. Next, the GH distance was calculated as follows [33]:
$$\mathrm{GH} = \frac{(\bar{x} - \bar{\mu})^{T} S^{-1} (\bar{x} - \bar{\mu})}{n_{PC}}$$
where $\bar{x}$ is the vector of PC scores of the observation, $\bar{\mu}$ is the vector containing the mean of each PC, $T$ denotes the transpose, $S^{-1}$ is the inverse of the variance–covariance matrix of the retained PCs, and $n_{PC}$ is the number of PCs. No highly extreme samples (GH > 5) were observed, so all samples were kept, as the aim was to observe samples with abnormal behaviors. The cleaned database included 774,781 records collected from 2835 Walloon farms between December 2018 and December 2021.
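A minimal Python sketch of the GH computation is given below, assuming a PCA on centered and scaled traits computed by singular value decomposition. It does not reproduce the FactoMineR workflow of the study. The function names, the random data, and the choice of three components are hypothetical.

```python
import numpy as np


def fit_pca(X, n_pc):
    """Fit a PCA on centered, scaled data and keep the first n_pc components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[:n_pc].T
    scores = Z @ loadings
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    return {"mean": mean, "std": std, "loadings": loadings,
            "score_mean": scores.mean(axis=0), "cov_inv": np.linalg.inv(cov)}


def gh_distance(X, pca):
    """Squared Mahalanobis distance in PC space, divided by the number of PCs."""
    Z = (X - pca["mean"]) / pca["std"]
    d = Z @ pca["loadings"] - pca["score_mean"]
    n_pc = pca["loadings"].shape[1]
    return np.einsum("ij,jk,ik->i", d, pca["cov_inv"], d) / n_pc


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))   # stand-in for the predicted phenotypes
pca = fit_pca(X, n_pc=3)
gh = gh_distance(X, pca)
print(gh.mean(), gh.max())
```

With this standardization, the mean GH of the records used to fit the PCA equals (n − 1)/n, so values of 3 or more point to records far from the average composition. Records from another dataset can be scored with the same fitted object by passing them to `gh_distance`.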

2.2. Canadian Dataset

The second dataset used in the present work comes from Canadian milk recording. Lactanet (Ste Anne de Bellevue, QC, Canada) provided access to spectral data from tank samples collected in Quebec from January 2020 to March 2022 from 4676 farms. The samples, taken every 1 to 3 days from the tank, were analyzed with Foss MilkoScan spectrometers (Foss, Hillerod, Denmark). Canadian spectral data were used to predict FT-MIR phenotypes using the Belgian prediction models. Next, the predicted FT-MIR phenotypes were cleaned using the same methodology as the one applied to the Belgian dataset. Prior to any analysis, the Canadian data were projected onto the Belgian PCA used to detect outliers, to ensure that their variability was indeed included in the Walloon variability and that the Walloon results were applicable.
The cleaned Canadian dataset contains 670,165 records. The descriptive statistics of the Belgian and Canadian datasets, along with the performances of the prediction models, are given in Table 1. The differences observed are mostly related to the contrast between the pasture-based system in Wallonia (Belgium) and the more productive system in Canada, where fat supplementation is more common [34].
In addition to milk composition information, Lactanet has made additional information available to better understand herd management. All of these data are average values per herd, calculated for the 12 months prior to April 2022, and are available for 3006 farms. These traits and their related descriptive statistics are presented in Table 2. The transition index corresponds to the difference between expected animal production, based on production in the previous lactation, and the projected production of these animals, based on the production in the first test. This index is used to evaluate the strategy implemented for cows during the transition period. The management index provides information at the herd level from the average animal’s environment and management effect [35]. The index at the animal level is based on the difference between the phenotype and the genetic effect.
Information on rations given in 2021 was also available for 540 herds for which Lactanet advisors provided feeding advice. This information includes the theoretical percentage of dry matter in the ration (49.97 ± 28.97) and the theoretical percentage of corn silage in the ration (22.85 ± 17.45). The last dataset used, also supplied by Lactanet, concerns the presence of ventilation in the barn. These data were collected by survey on 2113 farms and indicate the presence or absence of additional ventilation in summer (Yes = 75.96%, No = 21.44%, and NA = 2.60%).

2.3. Unsupervised Learning

Since many of the problems that can be detected are linked to multiple FA deviations, and a single FA irregularity can be associated with various problems, a multivariate approach was necessary for a more comprehensive understanding of the relationships between FAs, and consequently, to better assess the overall situation of the herd. Given the lack of diagnostic data for Walloon herds, the innovation of this work lies in studying the relationships between variables to identify groups of herds with distinct FA profiles and to explain them. This study extends the work of Franceschini et al. [24] on individual cow milk samples. Based on the predicted phenotypes, the literature, and the more extensive Canadian data, the identified clusters were interpreted, and their practical usefulness was assessed. From those results, the final objective was to study the feasibility of developing a specific decision-making tool, based on predicted FAs, to assist dairy farmers in their daily decision making.

2.4. Hierarchical Clustering

A hierarchical clustering algorithm was applied to the 31 Belgian FA predictions to group samples with similar characteristics. The function used was hclust from the stats package, version 3.4.1 [25]. This algorithm groups data according to their distance from each other. In the present study, the “ward.D2” method was used to calculate the distances between groups: at each step, the algorithm merges the two groups whose fusion yields the smallest increase in the within-group sum of squared distances, so that intra-group dispersion is minimized [36]. This method requires too much computer memory to be applied to the entire dataset. Therefore, a representative subset was created in two steps. First, records with the greatest GH distances (GH ≥ 3) were kept, as they are the most extreme ones, reflecting milk compositions that differ from the average. This represented 12,322 samples. Second, to balance normal and more extreme samples, 5000 samples were randomly selected from each of the remaining GH strata (from 0 to 1, from 1 to 2, and from 2 to 3). The final subset was therefore composed of 27,322 records. The number of clusters was chosen from the cluster heights visible on the dendrogram and from the difference in merging distance relative to the previous merge. The first two PCs were used to visualize the position of the various clusters. The 8 other indicators available in the database were added as supplementary variables in this PCA to help interpret the results.
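The two-step construction of the subset can be sketched in Python as below. This illustrates only the sampling step, not the clustering itself. The function name, the half-open strata boundaries, and the seed are assumptions.

```python
import random


def stratified_subset(gh_values, per_stratum=5000, extreme_threshold=3.0, seed=42):
    """Keep every record with GH >= threshold, plus a random draw per lower stratum.

    Strata are assumed to be [0, 1), [1, 2), and [2, 3). Returns sorted indices.
    """
    rng = random.Random(seed)
    selected = []
    strata = {0: [], 1: [], 2: []}
    for index, gh in enumerate(gh_values):
        if gh >= extreme_threshold:
            selected.append(index)          # step 1: all extreme records
        else:
            strata[int(gh)].append(index)   # GH is non-negative by construction
    for members in strata.values():
        # Step 2: random draw without replacement within each stratum.
        selected.extend(rng.sample(members, min(per_stratum, len(members))))
    return sorted(selected)


# Toy data: 10 records in each lower stratum and 4 extreme records.
gh_values = [0.5] * 10 + [1.5] * 10 + [2.5] * 10 + [3.5] * 4
subset = stratified_subset(gh_values, per_stratum=3)
print(len(subset))
```

Applied with 5000 records per stratum to a dataset containing 12,322 extreme records, the same logic gives the 27,322 records reported above.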

2.5. Cluster Prediction

To facilitate the prediction of clusters in the available datasets, partial least squares discriminant analysis (PLS-DA) and random forest (RF) models were fitted to the subset for which the cluster labels were known. The PLS-DA was estimated using the caret package, version 6.0-90 [37], while the RF model was estimated using the randomForest package, version 4.7-1.1 [38]. PLS-DA was selected because some FA traits are correlated. The analysis was performed on 31 centered and scaled FAs, with a maximum of 30 components. The predicted cluster is the one with the highest probability of membership. A ten-fold cross-validation was used to determine the optimal number of PLS components and to assess the classification’s performance. Performance was measured using global accuracy and Cohen’s Kappa coefficient, which accounts for agreement occurring by chance. The random forest model was chosen to capture potential non-linear relationships. The Gini index was used as the splitting criterion, and the number of trees was set to 500. The maximum number of features was optimized through cross-validation, following the same methodology used for PLS-DA.
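The two performance measures and the rule for assigning a cluster can be illustrated with the Python sketch below. It is independent of the caret and randomForest implementations used in the study; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def predicted_cluster(probabilities):
    """Return the cluster with the highest membership probability."""
    return max(probabilities, key=probabilities.get)


def accuracy_and_kappa(true_labels, predicted_labels):
    """Global accuracy and Cohen's Kappa for two label sequences of equal length."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Kappa is undefined when expected == 1 (a single label everywhere).
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


true_labels = [1, 1, 2, 2]
predicted_labels = [1, 1, 2, 1]
accuracy, kappa = accuracy_and_kappa(true_labels, predicted_labels)
print(accuracy, kappa)
```

In this toy case, three of four labels agree (accuracy 0.75), but half of that agreement is expected by chance, which gives a Kappa of 0.5.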
Since the 31 FAs were predicted by FT-MIR spectrometry, PLS-DA and RF models were also created, directly using the spectra to predict the clusters. This approach enhances the model’s transferability by eliminating the FA prediction step, which may vary depending on the lab or equation model used. A first derivative was applied to the spectra, and 212 spectral points were utilized as suggested by Grelet et al. [27]. The modeling methodology was the same as that used for the model based on the 31 FAs. After cross-validation, cluster predictions using PLS-DA and RF from FAs and from spectra were performed on all available Belgian and Canadian datasets.
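As a simple illustration of the spectral pre-treatment, the Python sketch below computes a first derivative as a finite difference between neighboring spectral points. The exact derivative settings of the study are not stated here, so the gap parameter and the toy absorbance values are assumptions.

```python
def first_derivative(spectrum, gap=1):
    """Finite-difference first derivative of a spectrum (list of absorbances)."""
    if gap < 1 or gap >= len(spectrum):
        raise ValueError("gap must be at least 1 and smaller than the spectrum length")
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]


# Hypothetical absorbances at four consecutive wavenumbers.
derivative = first_derivative([1.0, 2.0, 4.0, 7.0])
print(derivative)
```

The derivative removes constant baseline offsets between spectra, at the cost of shortening each spectrum by `gap` points.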

2.6. Interpretation

To interpret the clusters, the means and standard deviations of each FA and other available sources of information were calculated according to the predicted clusters. This allowed us to compare the clusters.
The means associated with each cluster are essential for understanding the underlying patterns, but the frequency of each cluster and the frequency of transitions between clusters are also crucial for interpretation. For this reason, the transition matrix between clusters was estimated. For each observation in a specific cluster at time t, we measured the probabilities of transitioning to every cluster at time t + 1. In this approach, transitions between clusters were considered in a binary manner: from cluster 1 to 1, from 1 to 2, and so on. This allows for consideration of the dynamics between clusters.
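The transition matrix can be estimated by counting consecutive pairs of cluster labels within each herd and normalizing each row. The Python sketch below assumes that each herd's records are already sorted by date; the function name and the toy sequences are hypothetical.

```python
def transition_matrix(sequences, clusters):
    """Row-normalized proportions of moves from a cluster at t to a cluster at t + 1."""
    counts = {a: {b: 0 for b in clusters} for a in clusters}
    for sequence in sequences:
        # Pair each record with the next record of the same herd.
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    matrix = {}
    for a in clusters:
        total = sum(counts[a].values())
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in clusters}
    return matrix


# Two hypothetical herds with time-ordered predicted clusters.
sequences = [[1, 1, 4, 1], [5, 1, 1]]
matrix = transition_matrix(sequences, clusters=[1, 4, 5])
print(matrix[1])
```

Transitions are counted only within a herd, so the last record of one herd is never paired with the first record of the next.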
Since the emergence of a problem in a herd is rarely spontaneous, the transition from one cluster to another may be too abrupt as a monitoring tool. Therefore, it is important to use quantitative rather than qualitative information. In this study, the probabilities of cluster membership obtained from PLS-DA and RF were also considered as a monitoring tool. To validate their usefulness, we calculated the correlations between cluster probabilities and FAs and indicators available in all databases. These probabilities were further used to visualize the evolution of herd conditions over time.
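One possible way to turn membership probabilities into a gradual signal is to smooth them over consecutive records and flag the records where the smoothed value crosses a threshold. The Python sketch below is purely illustrative: the window length, the threshold, and the probability series are hypothetical and were not evaluated in this study.

```python
def rolling_mean(values, window):
    """Trailing mean over at most `window` consecutive values."""
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


def alert_indices(risk_probabilities, window=3, threshold=0.5):
    """Indices of records whose smoothed risk probability reaches the threshold."""
    smoothed = rolling_mean(risk_probabilities, window)
    return [i for i, p in enumerate(smoothed) if p >= threshold]


# Hypothetical probabilities of belonging to an at-risk cluster for one herd.
risk_probabilities = [0.1, 0.2, 0.6, 0.8, 0.9, 0.3]
alerts = alert_indices(risk_probabilities)
print(alerts)
```

Smoothing delays the first alert by one record in this example but avoids reacting to a single isolated value.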
Finally, using additional validation data provided by Lactanet on animals, herd breakdown, and feeding, the herds found in each cluster were analyzed to determine whether it was possible to identify any trends in the type of animals or management. This was carried out to explain certain clusters, but also to confirm hypotheses about the herd status in the different clusters.

3. Results and Discussion

3.1. Clustering on Belgian Dataset

Seven clusters were retained after clustering the selected subset (N = 27,322) based on the separation observed in the dendrogram (Figure 3). The distribution of observations across the clusters is strongly influenced by the GH value (Table 3), confirming the hypothesis that rare observations with high GH values must be considered to understand the overall picture. Moreover, the less frequent clusters associated with high GH values could be linked to stress within the herd. Indeed, it is reasonable to assume that stressed herds would exhibit more extreme patterns with lower frequencies in the dataset. However, further analysis is necessary to confirm this hypothesis.
The records for this subset were projected onto the first two PCs which together explained 74.3% of the whole dataset’s variance, as shown in Figure 4. The first axis is associated with milk saturation. The top-left quadrant corresponds to de novo FAs, except C4, which, along with C16, is associated with the bottom-left quadrant. The top-right quadrant is related to trans long-chain FAs, while the bottom-right is associated with cis long-chain FAs. The scatterplot illustrates an overlap among clusters with a visible transition from one cluster to another. This configuration was expected, as the underlying natural processes of metabolic issues and stress are continuous.
To facilitate cluster interpretation and to work on the complete dataset, supervised prediction models—PLS-DA and RF—were applied to the subset with known cluster labels. An advantage of this approach is its ability to estimate cluster membership probabilities, providing quantitative information that may be particularly valuable for decision support tools. In the past, Franceschini et al. [24] adopted a similar methodology.
Based on the 31 FT-MIR-predicted FAs, a cross-validated PLS-DA predicted clusters with an average accuracy of 66.01% and a Cohen’s Kappa coefficient of 60.13%, using 16 components. On the same dataset, a cross-validated RF achieved an average accuracy of 91.81% and a Cohen’s Kappa coefficient of 90.14%. The same prediction methods (PLS-DA and RF) were applied directly to the standardized spectra to predict clusters, yielding relatively strong performance. The cross-validated global accuracies were 68.84% for PLS-DA with 29 components and 79% for the RF. The gap between PLS-DA and RF may reflect non-linear relationships between some FAs and the clusters. The decrease in accuracy for RF when switching to spectra could be linked to the high dimensionality of the spectra and to the stronger correlations among spectral absorbances than among FAs.
For the remainder of this paper, the RF model based on the 31 FAs was used to predict the cluster for the whole dataset, as it is the model with the highest global accuracy (91.81%). Most observations are correctly classified, and misclassifications primarily occur in adjacent clusters, which can be explained by the cluster continuum shown in Figure 4 and Table 4.

3.2. Cluster Prediction on Belgian and Canadian Datasets

Before making any predictions using the Canadian dataset, it was essential to verify that the variability in the Canadian data is encompassed within the variability in the Belgian data used for clustering. Figure 5 illustrates the projection of the Canadian data onto the first two PCs initially developed from the Belgian database, confirming that the variability in the Canadian data is included within the Belgian FA data’s variability. Consequently, the RF model developed using the Belgian FA data can be applied to the Canadian data.
Using the developed RF algorithm, cluster membership was predicted for all Belgian (N = 774,781) and Canadian (N = 670,165) records. Table 5 presents the proportion of records by cluster for both countries, illustrating differences not only between Belgium and Canada but also compared to the GH (Table 4). We assumed that higher GH milk spectra are over-represented in the Belgian subset and that differences between countries may be due to the variability differences displayed in Figure 5.
First, there is minimal or no representation of Canadian data in Clusters 2, 3, 4, and 6. Second, the primary clusters in Belgium are Clusters 1 and 4, with Clusters 5 and 7 also occurring frequently, whereas Cluster 7 is predominant in Canada, with Clusters 1 and 5 following. One hypothesis, therefore, is that Clusters 1 and 4 represent an average situation in Belgium, while Cluster 7 represents an average situation in Canada. As most herds in a population are expected to be well managed, this would be the normal situation. At this stage, more information, such as the FA profiles, is required to deepen the interpretation.
To further explore the dynamics between clusters, transitions from one cluster to another were computed (Table 6). This table gives the proportion of records going from Cluster A at time t to Cluster B at time t + 1. In Belgium, Clusters 1, 4, and 5 have the highest proportion (around 65%) of successive records remaining within the same cluster. Clusters 1 and 4 appear to be interrelated, as approximately 20% of the records transition between them. A significant proportion of records from Cluster 5 moves to Cluster 1. In contrast, only a small proportion of records from Clusters 1 or 4 transition to Cluster 5. Instead, records in Cluster 5 predominantly come from Clusters 2, 3, and 6. Cluster 7 is less stable and less represented than Clusters 1 and 4, but it seems to be more related to these clusters than to others. In contrast, Cluster 3 rarely transitions to Clusters 1 or 4.
Finally, Clusters 2 and 6 are the least stable, likely associated with temporary events. This could suggest that herds in Cluster 3 have experienced a deterioration in animal welfare, with herds remaining in this cluster longer than in others. This may indicate a problematic state and greater difficulty returning to a normal status. There are more transitions to Cluster 7 from Clusters 1 and 4, which are considered healthy clusters, than from Clusters 2, 3, 5, and 6, which are considered problematic. These observations suggest that Clusters 1 and 4 represent typical conditions in Wallonia, with Clusters 5 and 7 as intermediate clusters, and Clusters 2, 3, and 6 as potential problem clusters related to stress. If these hypotheses are correct and these clusters indeed reflect stress states, this dynamic would be expected.
In Canada, Cluster 7, which seems to be the standard, is also related to Clusters 1 and 4, even though Cluster 4 is poorly represented in this dataset. Cluster 5 observations mostly stay in Cluster 5, and observations from Clusters 2, 3, and 6 go back to Cluster 5.

3.3. Cluster Interpretation

To deepen the interpretation of clusters, the means by cluster were calculated for phenotypes predicted by FT-MIR from bulk tank milk collected in Belgium and Canada (Table 7). To facilitate reading of the results, a color code was applied: the best and second-best values are in green, and the worst values are in red. Overall, the results are similar for Belgium and Canada. The major differences between the two countries for FAs concern C18, polyunsaturated FAs, branched FAs, and odd FAs. For the supplementary variables, protein, protein efficiency, energy balance, and blood free FAs are generally lower in Canada, while milk yield and dry matter intake are higher. However, the performance of the equations for energy balance and blood free FAs is low, so these differences might be irrelevant. The maximum fat is also higher in Canada. Generally, Cluster 4 presents the best values in Wallonia, followed by the de-novo-FA-dependent Cluster 7 and Cluster 1, which have good values on average. Cluster 3 is the worst, followed by Cluster 6 and Cluster 2. Cluster 5 appears to be intermediate, much as Cluster 1 is among the “healthy” clusters. The same trends are observed for Canada. However, Clusters 2, 3, 4, and 6 are poorly represented in Canada. In a more controlled production and management system, these clusters may occur very rarely but may indicate more severe cases. Among Clusters 1, 5, and 7, which together represent 99.2% of the population, Cluster 5 would be the problematic cluster, with a potential transition to severe clusters.
Cluster 1 is thus the most represented in Belgium, followed by Cluster 4. These are the “standards”, with better values for Cluster 4. Cluster 7, the most represented in Canada, also has good overall values. The difference in “standard” clusters between Canada and Belgium, and the small representation of some clusters, can probably be explained by the differences in farming systems between the two regions. Indeed, Quebec has few farms practicing grazing, unlike Wallonia. In Quebec, most farms use stanchion barns for lactating cows, whereas Walloon farms generally use loose housing when the animals are not on pasture [39,40]. The herd size in Quebec is 73 cows, and each animal gives an average of over 9300 L of milk per year [41]. In Wallonia, there are 64 cows per herd, and milk production is only 6600 L per cow per year on average [42]. Moreover, those clusters are probably related to different types of feeding management. The interpretation is complex because many factors, such as the type of diet, the amount of concentrates, and fat supplementation, interact with each other. The hypothesis is that Clusters 1, 4, and 7 correspond respectively to fresh grass, grass silage, and maize silage, potentially with concentrates and fat supplementation. Indeed, the grass silage diet is associated with higher C14 and C16 and lower C18 mono- and polyunsaturated FAs. The maize silage diet tends to increase the C6, C8, C10, and C12 found in Cluster 7, while the fresh grass diet favors an increase in C4 and C18:1cis9 [43]. Among the “healthy” clusters, Cluster 1 has the highest values for those traits. Concentrates are potentially found in Clusters 4 and 7 because they increase the percentages of C10 and C12 but also trans monounsaturated FAs and C18:2cis9cis12 [43]. The effect of fat supplementation differs depending on the type of lipid. However, it is associated with an increase in polyunsaturated FAs and an inhibition of de novo FAs [43], which can explain the difference between Wallonia and Quebec for PUFAs and the inversion of the cluster order for de novo FAs between Clusters 4 and 7.
The hypothesis for Clusters 2, 3, 5, and 6 is that they include herds in a state of stress or poor health. Indeed, these clusters generally present the worst values for at least five traits. However, there are differences between these four clusters. First, Cluster 3 is the worst cluster, and is associated with extreme values for most of the traits. It is difficult to interpret the exact issue, as all the traits are extreme, but it is probably related to energy balance or ruminal acidosis. Moreover, Cluster 3 also presents a low value for the predicted milk yield.
Cluster 1, which is considered related to pasture or fresh grass, is similar to Clusters 2 and 6. However, Cluster 1 presents good values, unlike Clusters 2 and 6. After Cluster 3, Cluster 6 has the worst values for short-chain FAs, C16:1, and protein, and high values of THI. The low value for de novo FAs could indicate an issue with rumen health, such as a dietary imbalance, but this concerns rare events, involving around 2.5% of the Belgian observations. Cluster 5, which has low values for C16 and high values for branched FAs, trans FAs, and odd-chain FAs, is also probably related to rumen health affected by the feeding strategy, such as a high-concentrate diet. This cluster is more common, involving around 10% of the Belgian observations, and has low values for milk yield and energy balance, so it could reflect a combination with an energy balance issue. For those two clusters, farmers should monitor rumen health and adapt accordingly. Cluster 2 has low values for fat, protein, unsaturated FAs, C14, C14:1c9, and DMI. It also has high values for blood free FAs and for long-chain FAs, more specifically C18 and C18:1c9, which is a sign of fat mobilization and negative energy balance. These clusters are important to monitor because herds can transition from them to Cluster 3, which is an abnormal herd status. At this step, on-farm information is required to obtain a deeper interpretation.
Although the high percentage of missing information makes it difficult to draw general conclusions, the management indicators provided by Lactanet are useful for further understanding the clusters. Table 8 shows a higher percentage of organic farms and fewer herds with additional ventilation in Clusters 2 and 6 than in Clusters 1, 3, and 5. There is a significant difference between clusters in feed intake, with a very low proportion of corn silage in the rations of Clusters 2 and 3 (Table 9).
Table 10 shows herd indicators related to management, production, reproduction, and sanitary status averaged over twelve months. More than 50% of the Canadian herds in each cluster had data available for these different indicators. Cluster 6 was not found in this dataset. The data for Clusters 1 and 5 are very close or identical for most indicators, and they present the best values. The values for Clusters 4 and 7 are also close to each other, but worse.
If we look at the transition index of these clusters, which represents the average herd’s performance at the start of lactation compared with what was expected of it, we observe that Clusters 2 and 3 have the lowest values, with negative values for Cluster 3. Cluster 5 is also slightly lower than the others. In other words, these herds showed lower performance in early lactation than expected, suggesting a poor transition strategy between dry-off and early lactation. The milk management index, which estimates the environmental effect, shows negative values for all clusters, but with very large standard deviations. These indices are less negative for Clusters 1, 4, and 7. This trend is also found for the fat management index, suggesting that Clusters 2, 3, and 5 represent undesirable situations in Canada.

3.4. From Clusters to Probabilities

We showed that the clusters are related to herd management and could be of interest as a monitoring tool. However, we can improve the quality of the information given by using the probability of belonging to each cluster instead of a binary membership. These probabilities are already a by-product of the RF algorithm: for each observation, they are the proportions of trees that assign it to each cluster. These proportions satisfy the properties of a probability, as they are non-negative and sum to one. The probabilities are correlated, which is expected, as the clusters overlap in Figure 4 and the underlying biological process is not discrete. Nevertheless, the information they provide is useful for developing a monitoring tool on a routine basis. This information is mainly related to feeding management practices at the herd level and is available every one to three days. The FT-MIR predictions of the clusters and their probabilities become time series in which extreme events can be detected and reported to the farmer.
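The idea of reading cluster probabilities from tree votes can be sketched as follows. This is a minimal illustration that assumes the per-tree votes for one observation are available as a list of cluster labels; the function name `vote_proportions` and the vote counts are hypothetical, and the sketch does not reproduce the study's RF implementation.

```python
from collections import Counter

def vote_proportions(tree_votes, clusters):
    """Proportion of trees voting for each cluster for a single observation."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {cluster: counts[cluster] / n_trees for cluster in clusters}

# Toy forest of 100 trees (hypothetical): 60 vote Cluster 1, 30 Cluster 4, 10 Cluster 7.
votes = [1] * 60 + [4] * 30 + [7] * 10
probabilities = vote_proportions(votes, clusters=range(1, 8))
print(probabilities)
```

Because every tree casts exactly one vote, the proportions are non-negative and sum to one, and the predicted cluster is simply the one with the largest proportion.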
An example of such a time series is represented in Figure 6. Every probability varies over time, and a switch between clusters happens when the cluster with the highest probability changes. This is typically the case when a peak occurs for a specific issue. However, a peak is rarely based on a single milk sample, so the increase in probability that precedes it could be used for the early detection of management issues. In Belgium, the most observed clusters are Clusters 1 and 4, with some potential issues detected with Clusters 2 and 5. For Canada, it is mostly Cluster 7, then Cluster 1, with Cluster 5 indicating some potential issues. Those probabilities could serve as a source of feedback for management.
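The two signals described above, a switch of the most probable cluster and an earlier rise in one cluster's probability, can be sketched as follows. The series is assumed to be a chronological list of dictionaries mapping cluster to probability; the function names, the toy values, and the 0.3 threshold are hypothetical choices for illustration, not values from the study.

```python
def dominant_cluster(probabilities):
    """Cluster with the highest probability for one sample."""
    return max(probabilities, key=probabilities.get)

def cluster_switches(series):
    """List (index, previous, new) each time the most probable cluster changes."""
    switches = []
    previous = dominant_cluster(series[0])
    for index, probabilities in enumerate(series[1:], start=1):
        current = dominant_cluster(probabilities)
        if current != previous:
            switches.append((index, previous, current))
            previous = current
    return switches

def first_rise(series, cluster, threshold):
    """Index of the first sample where a cluster's probability reaches the threshold."""
    for index, probabilities in enumerate(series):
        if probabilities.get(cluster, 0.0) >= threshold:
            return index
    return None

# Toy series for one herd (hypothetical probabilities for Clusters 1, 4, and 5).
series = [
    {1: 0.7, 4: 0.1, 5: 0.2},
    {1: 0.6, 4: 0.1, 5: 0.3},
    {1: 0.5, 4: 0.1, 5: 0.4},
    {1: 0.3, 4: 0.1, 5: 0.6},
]
switches = cluster_switches(series)
early = first_rise(series, cluster=5, threshold=0.3)
print(switches, early)
```

In this toy series, the probability of Cluster 5 crosses the threshold two samples before the herd is actually assigned to Cluster 5, which is the kind of lead time an early alert would rely on.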

3.5. Practical Implications and Study Limitations

The findings of this study suggest that FA profiles, summarized through a clustering approach, could be implemented as a herd-level monitoring tool for farmers. This tool would function as an alert system integrated into herd-monitoring software, where an alert would indicate the detection of an anomaly without necessarily identifying a precise cause. The alert system would not replace an expert opinion but could encourage farmers to ask for help. Based on the cluster interpretation in this study, it appears that, even though the standard clusters differed between Belgium and Canada, likely due to different farming systems, Cluster 3 was consistently associated with abnormal FA profiles in both countries, even if Cluster 3 is not common in Canada. In Belgium, the observed dynamics between clusters showed transitions from intermediate clusters to Cluster 3, suggesting that monitoring the probability of belonging to these intermediate clusters could be used to detect early warning signals and mitigate health issues before escalation. The simplest solution would be to consider the cluster membership: Cluster 3 could be a red flag, Clusters 2, 5, and 6 could be orange flags, and the others green flags. The red flag, which is severe, suggests that the farmer should contact an advisor or a veterinarian, while an orange flag suggests that they should evaluate the feed storage, the diet composition, or the fat supplementation, or check for a heat stress event and check the farm mitigation system.
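The flag scheme described above can be written as a simple lookup. This is a minimal sketch: the cluster-to-flag mapping follows the text, whereas the names `FLAG_BY_CLUSTER`, `ADVICE`, and `alert_message`, the wording of the messages, and the green-flag message in particular are assumptions made for illustration.

```python
# Mapping taken from the text: Cluster 3 is red, Clusters 2, 5, and 6 are orange.
FLAG_BY_CLUSTER = {3: "red", 2: "orange", 5: "orange", 6: "orange"}

# Messages paraphrase the suggested actions; the green message is an assumption.
ADVICE = {
    "red": "Contact an advisor or a veterinarian.",
    "orange": ("Evaluate feed storage, diet composition, and fat supplementation; "
               "check for heat stress and the farm mitigation system."),
    "green": "No action suggested.",
}

def flag_for_cluster(cluster):
    """Return the flag colour for a cluster; unlisted clusters are green."""
    return FLAG_BY_CLUSTER.get(cluster, "green")

def alert_message(cluster):
    """Build a short alert text for the predicted cluster."""
    flag = flag_for_cluster(cluster)
    return f"{flag} flag: {ADVICE[flag]}"

for cluster in range(1, 8):
    print(cluster, alert_message(cluster))
```

A probability-based variant would replace the hard cluster assignment with thresholds on the cluster probabilities, which would need to be tuned during the field validation discussed below.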
However, the study presents several limitations. First, the interpretation of clusters remains complex due to the numerous interactions between FA profiles and management practices. Second, cluster validation is required using on-farm data and real-world health observations. While pattern detection from large FT-MIR prediction databases is a powerful tool, it cannot replace validation using reference measurements and expert field knowledge. Third, the alert methodology itself must be further fine-tuned to optimize the signal derived from the time series data. Fourth, practical deployment of such a system depends on a data infrastructure capable of delivering alerts rapidly enough to inform timely decision making at the farm level.
These limitations could be addressed through pilot trials on commercial farms, where feedback from farmers is available when the clustering approach flags potential issues. Such trials would support better interpretation of the clusters, confirm their utility, fine-tune the alert system, and help develop an experimental real-time data pipeline that could later be scaled for routine use. This would also allow for a first use case and for an assessment of its usefulness.

4. Conclusions

This study extends a previously developed clustering methodology, originally designed for individual milk samples, to bulk tank milk samples. The results suggest that the FT-MIR variability in Belgian herds encompasses that of Canadian herds, potentially reflecting a broader range of feeding systems in Belgium. Our findings indicate that this methodology could serve as a cost-effective tool for generating actionable indicators from existing milk payment data without imposing additional burdens on farmers. Notably, the transition from intermediate clusters to Cluster 3 appears promising as a reliable health-monitoring indicator. Integrating this approach into routine herd management may enhance the capacity to respond proactively to management issues. The final automated alert system should be combined with an individual-level alert to improve the feedback given to farmers. However, field validation is required before widespread implementation, particularly to adapt the approach across diverse farming systems and geographical contexts.

Author Contributions

Conceptualization, S.F., C.F. and H.S.; methodology, S.F., C.F., C.N., N.G. and H.S.; software, S.F., C.F., C.N. and H.S.; validation, S.F., C.F. and H.S.; formal analysis, S.F., C.F. and H.S.; investigation, S.F., C.F. and H.S.; resources, F.D., M.B., C.B. and D.V.; data curation, M.B., D.W., D.V., C.B. and C.N.; writing—original draft preparation, S.F., C.F. and H.S.; writing—review and editing, S.F., D.W. and H.S.; visualization, S.F., D.W. and C.F.; supervision, D.E.S. and H.S.; project administration, H.S.; funding acquisition, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from “Comité du lait” and “Lactanet” and are available upon request with the permission of the provider.

Acknowledgments

The authors acknowledge the support of Lactanet and Université Laval for hosting one co-author during their internship in Canada. During the preparation of this manuscript, the authors used ChatGPT 4.5 as a writing assistant to refine phrasing and improve readability. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BHB: β-hydroxybutyrate
DHI: Dairy Herd Improvement
FA: Fatty acid
FFA: Free fatty acid
FT-MIR: Fourier Transform mid-infrared
GH: Standardized Mahalanobis distance
PC: Principal component
PCA: Principal component analysis
THI: Temperature humidity index

References

  1. FAO. Contribution of Terrestrial Animal Source Food to Healthy Diets for Improved Nutrition and Health Outcomes: An Evidence and Policy Overview on the State of Knowledge and Gaps; FAO: Rome, Italy, 2023.
  2. Jensen, R.G. The Composition of Bovine Milk Lipids: January 1995 to December 2000. J. Dairy Sci. 2002, 85, 295–350.
  3. Chilliard, Y.; Ferlay, A.; Mansbridge, R.M.; Doreau, M. Ruminant Milk Fat Plasticity: Nutritional Control of Saturated, Polyunsaturated, Trans and Conjugated Fatty Acids. Ann. Zootech. 2000, 49, 181–205.
  4. Giannuzzi, D.; Toscano, A.; Pegolo, S.; Gallo, L.; Tagliapietra, F.; Mele, M.; Minuti, A.; Trevisi, E.; Marsan, P.A.; Schiavon, S.; et al. Associations between Milk Fatty Acid Profile and Body Condition Score, Ultrasound Hepatic Measurements and Blood Metabolites in Holstein Cows. Animals 2022, 12, 1202.
  5. Riuzzi, G.; Davis, H.; Lanza, I.; Butler, G.; Contiero, B.; Gottardo, F.; Segato, S. Multivariate Modelling of Milk Fatty Acid Profile to Discriminate the Forages in Dairy Cows’ Ration. Sci. Rep. 2021, 11, 23201.
  6. Lindmark Månsson, H. Fatty Acids in Bovine Milk Fat. Food Nutr. Res. 2008, 52, 1821.
  7. Bauman, D.E.; Mather, I.H.; Wall, R.J.; Lock, A.L. Major Advances Associated with the Biosynthesis of Milk. J. Dairy Sci. 2006, 89, 1235–1243.
  8. Jenkins, T.C. Lipid Metabolism in the Rumen. J. Dairy Sci. 1993, 76, 3851–3863.
  9. Vlaeminck, B.; Fievez, V.; Cabrita, A.R.J.; Fonseca, A.J.M.; Dewhurst, R.J. Factors Affecting Odd- and Branched-Chain Fatty Acids in Milk: A Review. Anim. Feed Sci. Technol. 2006, 131, 389–417.
  10. Fiore, E.; Blasi, F.; Morgante, M.; Cossignani, L.; Badon, T.; Gianesella, M.; Contiero, B.; Berlanda, M. Changes of Milk Fatty Acid Composition in Four Lipid Classes as Biomarkers for the Diagnosis of Bovine Ketosis Using Bioanalytical Thin Layer Chromatography and Gas Chromatographic Techniques (TLC-GC). J. Pharm. Biomed. Anal. 2020, 188, 113372.
  11. Palmquist, D.L.; Beaulieu, A.D.; Barbano, D.M. Feed and Animal Factors Influencing Milk Fat Composition. J. Dairy Sci. 1993, 76, 1753–1771.
  12. Jorjong, S.; van Knegsel, A.T.M.; Verwaeren, J.; Lahoz, M.V.; Bruckmaier, R.M.; De Baets, B.; Kemp, B.; Fievez, V. Milk Fatty Acids as Possible Biomarkers to Early Diagnose Elevated Concentrations of Blood Plasma Nonesterified Fatty Acids in Dairy Cows. J. Dairy Sci. 2014, 97, 7054–7064.
  13. Craninx, M.; Steen, A.; Van Laar, H.; Van Nespen, T.; Martín-Tereso, J.; de Baets, B.; Fievez, V. Effect of Lactation Stage on the Odd- and Branched-Chain Milk Fatty Acids of Dairy Cattle Under Grazing and Indoor Conditions. J. Dairy Sci. 2008, 91, 2662–2677.
  14. Mansbridge, R.J.; Blake, J.S. Nutritional Factors Affecting the Fatty Acid Composition of Bovine Milk Fat. Livest. Prod. Sci. 1997, 50, 95–110.
  15. Soyeurt, H.; Dehareng, F.; Gengler, N.; McParland, S.; Wall, E.; Berry, D.P.; Coffey, M.; Dardenne, P. Mid-Infrared Prediction of Bovine Milk Fatty Acids across Multiple Breeds, Production Systems, and Countries. J. Dairy Sci. 2011, 94, 1657–1667.
  16. Woolpert, M.E.; Dann, H.M.; Cotanch, K.W.; Melilli, C.; Chase, L.E.; Grant, R.; Barbano, D. Management, Nutrition, and Lactation Performance Are Related to Bulk Tank Milk de Novo Fatty Acid Concentration on Northeastern US Dairy Farms. J. Dairy Sci. 2016, 99, 8486–8497.
  17. Woolpert, M.E.; Dann, H.M.; Cotanch, K.W.; Melilli, C.; Chase, L.E.; Grant, R.; Barbano, D. Management Practices, Physically Effective Fiber, and Ether Extract Are Related to Bulk Tank Milk de Novo Fatty Acid Concentration on Holstein Dairy Farms. J. Dairy Sci. 2017, 100, 5097–5106.
  18. Hammami, H.; Vandenplas, J.; Vanrobays, M.-L.; Rekik, B.; Bastin, C.; Gengler, N. Genetic Analysis of Heat Stress Effects on Yield Traits, Udder Health, and Fatty Acids of Walloon Holstein Cows. J. Dairy Sci. 2015, 98, 4956–4968.
  19. Liu, Z.; Ezernieks, V.; Wang, J.; Arachchillage, N.W.; Garner, J.B.; Wales, W.J.; Cocks, B.G.; Rochfort, S. Heat Stress in Dairy Cattle Alters Lipid Composition of Milk. Sci. Rep. 2017, 7, 961.
  20. Mann, S.; Nydam, D.V.; Lock, A.L.; Overton, T.R.; McArt, J.A.A. Short Communication: Association of Milk Fatty Acids with Early Lactation Hyperketonemia and Elevated Concentration of Nonesterified Fatty Acids. J. Dairy Sci. 2016, 99, 5851–5857.
  21. Vlaeminck, B.; Fievez, V.; Demeyer, D.; Dewhurst, R.J. Effect of Forage:Concentrate Ratio on Fatty Acid Composition of Rumen Bacteria Isolated From Ruminal and Duodenal Digesta. J. Dairy Sci. 2006, 89, 2668–2678.
  22. Vlaeminck, B.; Fievez, V.; Tamminga, S.; Dewhurst, R.J.; Van Vuuren, A.; De Brabander, D.; Demeyer, D. Milk Odd- and Branched-Chain Fatty Acids in Relation to the Rumen Fermentation Pattern. J. Dairy Sci. 2006, 89, 3954–3964.
  23. Soyeurt, H.; Dardenne, P.; Dehareng, F.; Lognay, G.; Veselko, D.; Marlier, M.; Bertozzi, C.; Mayeres, P.; Gengler, N. Estimating Fatty Acid Content in Cow Milk Using Mid-Infrared Spectrometry. J. Dairy Sci. 2006, 89, 3690–3695.
  24. Franceschini, S.; Grelet, C.; Leblois, J.; Gengler, N.; Soyeurt, H. Can Unsupervised Learning Methods Applied to Milk Recording Big Data Provide New Insights into Dairy Cow Health? J. Dairy Sci. 2022, 105, 6760–6772.
  25. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2022.
  26. Grelet, C.; Fernández Pierna, J.A.; Dardenne, P.; Baeten, V.; Dehareng, F. Standardization of Milk Mid-Infrared Spectra from a European Dairy Network. J. Dairy Sci. 2015, 98, 2150–2160.
  27. Grelet, C.; Dardenne, P.; Soyeurt, H.; Fernandez, J.A.; Vanlierde, A.; Steevens, F.; Gengler, N.; Dehareng, F. Large-Scale Phenotyping in Dairy Sector Using Milk MIR Spectra: Key Factors Affecting the Quality of Predictions. Methods 2021, 186, 97–111.
  28. Agromet.Be. Available online: https://agromet.be/fr/pages/home/ (accessed on 25 March 2025).
  29. Nickmilder, C.; Tedde, A.; Dufrasne, I.; Lessire, F.; Glesner, N.; Tychon, B.; Bindelle, J.; Soyeurt, H. Creation of a Walloon Pasture Monitoring Platform Based on Machine Learning Models and Remote Sensing. Remote Sens. 2023, 15, 1890.
  30. Jubb, T.; Perkins, N.R. Veterinary Handbook for Cattle, Sheep and Goats; Australian Livestock Export Corporation Limited: Sydney, Australia, 2015.
  31. International Council on Animal Recording (ICAR). Cattle Milk Recording: Standards and Guidelines. Available online: https://www.icar.org/Guidelines/02-Overview-Cattle-Milk-Recording.pdf (accessed on 27 March 2025).
  32. Lê, S.; Josse, J.; Husson, F. FactoMineR: An R Package for Multivariate Analysis. J. Stat. Softw. 2008, 25, 1–18.
  33. Zhang, L.; Li, C.; Dehareng, F.; Grelet, C.; Colinet, F.; Gengler, N.; Brostaux, Y.; Soyeurt, H. Appropriate Data Quality Checks Improve the Reliability of Values Predicted from Milk Mid-Infrared Spectra. Animals 2021, 11, 533.
  34. Music, J.; Charlebois, S.; Marangoni, A.G.; Ghazani, S.M.; Burgess, J.; Proulx, A.; Somogyi, S.; Patelli, Y. Data Deficits and Transparency: What Led to Canada’s ‘Buttergate’. Trends Food Sci. Technol. 2022, 123, 334–342.
  35. Lactanet Guide to the Sustainability Index. Available online: https://lactanet.ca/en/guide-to-the-sustainability-index/ (accessed on 21 March 2024).
  36. Murtagh, F.; Legendre, P. Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion? J. Classif. 2014, 31, 274–295.
  37. Kuhn, M. Building Predictive Models in R Using the Caret Package. J. Stat. Softw. 2008, 28, 1–26.
  38. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
  39. National Farm Animal Care Council (NFACC). Code of Practice for the Care and Handling of Dairy Cattle: Review of Scientific Research on Priority Issues; National Farm Animal Care Council (NFACC): Lacombe, AB, Canada, 2020.
  40. Arnott, G.; Ferris, C.; O’Connell, N. A Comparison of Confinement and Pasture Systems for Dairy Cows: What Does the Science Say? AgriSearch: Lisburn, UK, 2015.
  41. Production Laitière (Lait de Vache). Available online: https://www.quebec.ca/agriculture-environnement-et-ressources-naturelles/agriculture/industrie-agricole-au-quebec/productions-agricoles/production-lait-vache (accessed on 27 March 2025).
  42. Service Publique de Wallonie. État de l’Agriculture Wallonne; SPW ARNE: Wallonia, Belgium, 2021.
  43. Chilliard, Y.; Ferlay, A.; Doreau, M. Effect of Different Types of Forages, Animal Fat or Marine Oils in Cow’s Diet on Milk Fat Secretion and Composition, Especially Conjugated Linoleic Acid (CLA) and Polyunsaturated Fatty Acids. Livest. Prod. Sci. 2001, 70, 31–48.
Figure 1. Diagram of fatty acid synthesis pathways (FAs: fatty acids, µorganism: microorganisms).
Figure 2. Workflow of the analysis. The colors indicate the origin of the dataset: green for Belgium and red for Canada. (Bel = Belgian, DB = database, Can = Canadian, FT-MIR = Fourier Transform mid-infrared, Pred = predictions).
Figure 3. A dendrogram of the seven clusters found from the selected subset.
Figure 4. Projection of subset samples on the first two principal components (N = 27,322) (SCFA = short-chain fatty acids, MCFA = medium-chain fatty acids, LCFA = long-chain fatty acids, Sat = saturated fatty acids, Mono = monounsaturated fatty acids, Poly = polyunsaturated fatty acids, Insat = total unsaturated fatty acids, BFA = branched fatty acids, TotT = total trans fatty acids).
Figure 5. Projection of Canadian data (blue) onto graph of individuals (black) on first two principal components estimated based on whole Belgian database.
Figure 6. Evolution of cluster probabilities in time series for one farm in Belgium (A) and one farm in Canada (B).
Table 1. Descriptive statistics for the 39 FT-MIR-predicted traits used in the study calculated from the available Belgian and Canadian spectra.

| FT-MIR Predicted Traits ¹ | Unit | Belgium (N = 774,781) | Canada (N = 670,165) | R²cv | RMSE |
|---|---|---|---|---|---|
| Fat | g/dL of milk | 4.17 ± 0.34 | 4.14 ± 0.27 | / | / |
| Protein | g/dL of milk | 3.47 ± 0.19 | 2.62 ± 0.13 | / | / |
| Milk yield | kg/day | 26.85 ± 2.85 | 29.97 ± 1.54 | 0.69 | 3.48 |
| Energy balance | | −3.12 ± 3.36 | −7.05 ± 1.56 | 0.43 | 1.33 |
| Nitrogen efficiency | | 53.67 ± 14.12 | 17.54 ± 1.25 | 0.52 | 1.44 |
| Blood BHB | mmol/L plasma (log) | −0.82 ± 0.09 | −0.78 ± 0.06 | 0.7 | 1.85 |
| Blood free FA | µeq/L of plasma | 495.15 ± 131.57 | 421.89 ± 84.79 | 0.39 | 344.2 |
| Dry matter intake | kg/day | 22.81 ± 2.14 | 24.18 ± 1.41 | 0.45 | 1.35 |
| C4 | g/100 g of fat | 2.68 ± 0.19 | 2.63 ± 0.13 | 0.93 | 0.008 |
| C6 | g/100 g of fat | 1.81 ± 0.12 | 1.83 ± 0.08 | 0.91 | 0.006 |
| C8 | g/100 g of fat | 1.18 ± 0.10 | 1.28 ± 0.07 | 0.91 | 0.004 |
| C10 | g/100 g of fat | 2.65 ± 0.36 | 3.26 ± 0.25 | 0.92 | 0.01 |
| C12 | g/100 g of fat | 3.32 ± 0.44 | 4.09 ± 0.33 | 0.93 | 0.011 |
| C14 | g/100 g of fat | 11.43 ± 0.87 | 12.32 ± 0.65 | 0.94 | 0.03 |
| C14:1cis9 | g/100 g of fat | 1.06 ± 0.11 | 1.17 ± 0.08 | 0.71 | 0.008 |
| C16 | g/100 g of fat | 31.33 ± 3.25 | 28.90 ± 1.52 | 0.95 | 0.091 |
| C16:1 | g/100 g of fat | 1.61 ± 0.17 | 1.56 ± 0.09 | 0.73 | 0.013 |
| C17 | g/100 g of fat | 0.64 ± 0.05 | 0.66 ± 0.02 | 0.81 | 0.003 |
| C18 | g/100 g of fat | 9.53 ± 1.03 | 9.05 ± 0.61 | 0.84 | 0.056 |
| Total C18:1trans | g/100 g of fat | 3.12 ± 0.75 | 3.62 ± 0.37 | 0.8 | 0.025 |
| C18:1cis9 | g/100 g of fat | 18.56 ± 2.63 | 19.98 ± 1.74 | 0.95 | 0.063 |
| Total C18:1cis | g/100 g of fat | 20.04 ± 2.78 | 21.57 ± 1.86 | 0.95 | 0.061 |
| Total C18:2 | g/100 g of fat | 2.10 ± 0.22 | 2.54 ± 0.12 | 0.71 | 0.014 |
| C18:2cis9cis12 | g/100 g of fat | 1.25 ± 0.15 | 1.49 ± 0.11 | 0.75 | 0.011 |
| C18:2cis9trans11 | g/100 g of fat | 0.47 ± 0.10 | 0.61 ± 0.05 | 0.74 | 0.01 |
| C18:3cis9cis12cis15 | g/100 g of fat | 0.76 ± 0.33 | 0.96 ± 0.13 | 0.69 | 0.004 |
| Saturated FAs | g/100 g of fat | 68.49 ± 4.09 | 66.88 ± 2.06 | 0.99 | 0.072 |
| Monounsaturated FAs | g/100 g of fat | 26.78 ± 3.11 | 27.51 ± 1.97 | 0.97 | 0.059 |
| Polyunsaturated FAs | g/100 g of fat | 3.46 ± 0.68 | 4.44 ± 0.28 | 0.79 | 0.021 |
| Unsaturated FAs | g/100 g of fat | 30.33 ± 3.57 | 31.87 ± 2.15 | 0.97 | 0.064 |
| Short-chain FAs | g/100 g of fat | 8.77 ± 0.60 | 9.32 ± 0.42 | 0.93 | 0.025 |
| Medium-chain FAs | g/100 g of fat | 51.68 ± 3.95 | 51.89 ± 2.5 | 0.97 | 0.104 |
| Long-chain FAs | g/100 g of fat | 38.50 ± 4.12 | 39.04 ± 2.72 | 0.95 | 0.11 |
| Branched FAs | g/100 g of fat | 2.27 ± 0.26 | 2.67 ± 0.08 | 0.77 | 0.013 |
| Omega3 | g/100 g of fat | 0.58 ± 0.12 | 0.71 ± 0.05 | 0.68 | 0.006 |
| Omega6 | g/100 g of fat | 2.13 ± 0.24 | 2.58 ± 0.14 | 0.74 | 0.014 |
| Odd-chain FAs | g/100 g of fat | 3.82 ± 0.35 | 4.30 ± 0.12 | 0.84 | 0.016 |
| Total Trans FAs | g/100 g of fat | 3.91 ± 0.93 | 4.59 ± 0.46 | 0.82 | 0.029 |
| Total C18:1 | g/100 g of fat | 23.15 ± 3.08 | 23.7 ± 1.99 | 0.96 | 0.06 |

¹ Traits in bold will be used in the following unsupervised analysis. FAs = fatty acids, BHB = beta-hydroxybutyrate.
Table 2. Herd characteristics for 3006 Canadian farms.

| Characteristic | Unit | Mean ± SD |
|---|---|---|
| Number of lactation cows | Cows | 66.65 ± 50.87 |
| Days in milk | days | 176.99 ± 23.64 |
| Margin on feed costs | $CA/cow/year | 5009.77 ± 1254.85 |
| Margin on feed costs per kg of fat | $CA/cow/year/kg | 12.53 ± 1.52 |
| Milk yield | L/day | 26.59 ± 5.05 |
| Fat | kg/cow/day | 1.06 ± 0.30 |
| Protein | kg/cow/day | 0.86 ± 0.24 |
| Milk production at lactation peak | L/day | 39.72 ± 5.76 |
| Days in milk at lactation peak | days | 44.75 ± 4.54 |
| Somatic cells in milk | ×10³ cells/mL | 182.02 ± 116.21 |
| % of cows in the herd with somatic cells count > 200,000 cells/mL | % | 18.95 ± 7.92 |
| Urea in milk | g/mL | 5.01 ± 5.90 |
| % of cows in the herd with a urea concentration < 5 or > 12 | % | 7.75 ± 9.90 |
| Transition index | | 236.12 ± 445.85 |
| % of cows in the herd with a negative transition index | % | 37.98 ± 18.25 |
| Age at first calving | month | 25.30 ± 2.28 |
| Calving interval | days | 409.75 ± 31.72 |
| Management index | | −303.55 ± 1218.80 |
| Management index for fat | | −14.99 ± 52.65 |
| % of the 3006 farms in conventional farming system | % | 60.98 |
| % of the 3006 farms in organic farming system | % | 13.11 |
| % of the 3006 farms with no information about their farming system | % | 25.91 |
Table 3. Proportions of the seven clusters for the observations from the subset (N = 27,322).

| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 |
|---|---|---|---|---|---|---|---|
| All data | 23.55 | 7.90 | 11.72 | 22.75 | 15.89 | 10.81 | 8.71 |
| GH < 3 | 27.53 | 2.29 | 4.07 | 28.51 | 15.64 | 5.48 | 13.47 |
| GH > 3 | 17.98 | 12.51 | 21.04 | 15.73 | 12.54 | 17.29 | 2.91 |
Table 4. Confusion matrix related to random forest algorithm applied on 31 predicted fatty acids (N = 27,322). Rows give the predicted cluster and columns the reference cluster.

| Prediction | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 |
|---|---|---|---|---|---|---|---|
| Cluster 1 | 5781 | 122 | 0 | 279 | 100 | 113 | 59 |
| Cluster 2 | 78 | 1651 | 24 | 0 | 30 | 57 | 0 |
| Cluster 3 | 0 | 43 | 2979 | 0 | 105 | 72 | 0 |
| Cluster 4 | 287 | 0 | 0 | 5875 | 0 | 36 | 46 |
| Cluster 5 | 50 | 36 | 111 | 1 | 4001 | 48 | 62 |
| Cluster 6 | 96 | 33 | 90 | 36 | 46 | 2623 | 8 |
| Cluster 7 | 53 | 0 | 0 | 24 | 59 | 4 | 2204 |
Table 5. Percentages of Belgian and Canadian records by cluster.

| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 |
|---|---|---|---|---|---|---|---|
| Belgium | 38.47 | 1.18 | 1.04 | 36.60 | 10.40 | 2.51 | 9.80 |
| Canada | 12.00 | 0.28 | 0.04 | 0.48 | 13.22 | <0.01 | 73.98 |
Table 6. Cluster transitions between 2 successive controls, expressed in %, for data collected in Belgium and Canada ¹.

| Country | Cluster at time t | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 |
|---|---|---|---|---|---|---|---|---|
| Belgium | Cluster 1 | 65.63 | 1.26 | 0.09 | 20.19 | 5.97 | 1.69 | 5.17 |
| Belgium | Cluster 2 | 40.24 | 32.52 | 2.32 | 4.97 | 10.86 | 8.90 | 0.20 |
| Belgium | Cluster 3 | 2.78 | 2.85 | 55.75 | 0.45 | 23.67 | 14.33 | 0.17 |
| Belgium | Cluster 4 | 22.02 | 0.14 | 0.01 | 69.81 | 0.43 | 0.53 | 7.06 |
| Belgium | Cluster 5 | 20.68 | 1.23 | 2.25 | 1.62 | 64.65 | 3.00 | 6.56 |
| Belgium | Cluster 6 | 24.01 | 4.21 | 6.55 | 8.02 | 13.18 | 43.42 | 0.62 |
| Belgium | Cluster 7 | 19.83 | 0.02 | 0.01 | 27.80 | 5.54 | 0.12 | 46.68 |
| Canada | Cluster 1 | 34.39 | 0.28 | 0.01 | 0.31 | 11.10 | 0.00 | 53.92 |
| Canada | Cluster 2 | 11.51 | 18.63 | 0.70 | 0.00 | 60.87 | 0.00 | 8.30 |
| Canada | Cluster 3 | 0.83 | 7.44 | 23.55 | 0.00 | 62.81 | 0.00 | 5.37 |
| Canada | Cluster 4 | 7.10 | 0.00 | 0.00 | 35.09 | 0.69 | 0.00 | 57.12 |
| Canada | Cluster 5 | 10.27 | 1.24 | 0.17 | 0.02 | 63.08 | 0.00 | 25.22 |
| Canada | Cluster 6 | 0.00 | 0.00 | 0.00 | 0.00 | 83.33 | 0.00 | 16.67 |
| Canada | Cluster 7 | 8.79 | 0.04 | 0.00 | 0.38 | 4.43 | 0.00 | 86.35 |

¹ Each line is first defined by a cluster at time t, and then the other columns represent the probabilities of moving into the cluster at time t + 1. Clusters in bold represent less than 1% of the dataset.
Table 7. Means by cluster of the phenotypes predicted by FT-MIR from bulk tank milk collected in Belgium and Canada as well as temperature humidity index (THI) for Belgium. Two best values are highlighted in shades of green, and worst in shades of red according to their rankings. Clusters for Quebec, in bold, represent less than 1% each.
Table 7. Means by cluster of the phenotypes predicted by FT-MIR from bulk tank milk collected in Belgium and Canada as well as temperature humidity index (THI) for Belgium. Two best values are highlighted in shades of green, and worst in shades of red according to their rankings. Clusters for Quebec, in bold, represent less than 1% each.
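| Phenotype | Belgium C1 | Belgium C2 | Belgium C3 | Belgium C4 | Belgium C5 | Belgium C6 | Belgium C7 | Canada C1 | Canada C2 | Canada C3 | Canada C4 | Canada C5 | Canada C6 | Canada C7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C4 | 2.78 | 2.87 | 2.49 | 2.69 | 2.55 | 2.60 | 2.50 | 2.79 | 2.85 | 2.63 | 2.75 | 2.64 | 2.55 | 2.61 |
| C6 | 1.82 | 1.7 | 1.46 | 1.86 | 1.70 | 1.56 | 1.84 | 1.80 | 1.68 | 1.53 | 1.97 | 1.74 | 1.53 | 1.85 |
| C8 | 1.16 | 1.01 | 0.88 | 1.23 | 1.11 | 0.94 | 1.26 | 1.22 | 1.08 | 0.97 | 1.39 | 1.18 | 0.95 | 1.30 |
| C10 | 2.53 | 2.06 | 1.78 | 2.77 | 2.51 | 1.87 | 3.06 | 3.07 | 2.53 | 2.23 | 3.60 | 2.90 | 2.38 | 3.36 |
| C12 | 3.15 | 2.51 | 2.30 | 3.51 | 3.13 | 2.44 | 3.78 | 3.83 | 3.10 | 2.80 | 4.53 | 3.62 | 3.19 | 4.22 |
| C14 | 11.13 | 9.56 | 9.05 | 12.07 | 10.55 | 9.92 | 11.9 | 11.80 | 10.27 | 9.66 | 13.28 | 11.36 | 11.06 | 12.58 |
| C14:1c9 | 1.01 | 0.84 | 1.00 | 1.12 | 1.05 | 1.05 | 1.09 | 1.09 | 0.96 | 1.02 | 1.24 | 1.11 | 1.20 | 1.19 |
| C16 | 31.04 | 27.16 | 24.26 | 34.38 | 25.78 | 30.01 | 29.31 | 28.79 | 25.54 | 23.53 | 32.62 | 26.58 | 29.84 | 29.33 |
| C16:1 | 1.58 | 1.69 | 1.95 | 1.59 | 1.70 | 1.92 | 1.60 | 1.53 | 1.65 | 1.83 | 1.45 | 1.65 | 1.72 | 1.55 |
| C17 | 0.64 | 0.66 | 0.72 | 0.61 | 0.70 | 0.66 | 0.70 | 0.64 | 0.66 | 0.70 | 0.62 | 0.68 | 0.68 | 0.66 |
| C18 | 10.02 | 11.48 | 10.44 | 9.04 | 9.56 | 9.93 | 9.18 | 9.49 | 10.66 | 10.57 | 8.36 | 9.67 | 8.73 | 8.87 |
| Total C18:1t | 3.21 | 3.71 | 4.44 | 2.50 | 4.21 | 3.13 | 3.58 | 3.49 | 4.01 | 4.52 | 2.97 | 4.05 | 3.60 | 3.57 |
| C18:1c9 | 19.06 | 23.82 | 26.28 | 16.56 | 21.81 | 23.26 | 17.98 | 21.18 | 25.25 | 27.26 | 16.46 | 22.62 | 23.21 | 19.31 |
| Total C18:1c | 20.61 | 25.69 | 28.17 | 17.97 | 23.39 | 25.07 | 19.27 | 22.9 | 27.22 | 29.29 | 17.80 | 24.36 | 24.78 | 20.85 |
| Total C18:2 | 2.12 | 2.4 | 2.42 | 1.92 | 2.39 | 2.08 | 2.28 | 2.55 | 2.70 | 2.8 | 2.23 | 2.65 | 2.33 | 2.52 |
| C18:2c9c12 | 1.28 | 1.47 | 1.20 | 1.22 | 1.20 | 1.21 | 1.26 | 1.56 | 1.61 | 1.55 | 1.34 | 1.5 | 1.41 | 1.47 |
| C18:2c9t11 | 0.46 | 0.53 | 0.63 | 0.38 | 0.63 | 0.45 | 0.57 | 0.57 | 0.62 | 0.69 | 0.52 | 0.65 | 0.50 | 0.60 |
| C18:3c9c12c15 | 0.72 | 0.80 | 1.38 | 0.50 | 1.33 | 0.79 | 1.08 | 0.85 | 1.00 | 1.26 | 0.76 | 1.12 | 1.05 | 0.96 |
| SFAs | 68.37 | 63.15 | 57.54 | 71.93 | 61.53 | 64.06 | 67.24 | 66.25 | 61.39 | 58 | 71.84 | 63.48 | 64.81 | 67.59 |
| MUFAs | 27.38 | 32.76 | 36.15 | 24.32 | 30.99 | 32.12 | 26.2 | 28.43 | 33.08 | 36.03 | 23.24 | 30.71 | 30.53 | 26.78 |
| PUFAs | 3.46 | 4.01 | 4.60 | 2.86 | 4.56 | 3.36 | 4.19 | 4.29 | 4.67 | 5.11 | 3.78 | 4.76 | 4.17 | 4.41 |
| UFAs | 30.91 | 36.81 | 40.92 | 27.29 | 35.65 | 35.71 | 30.4 | 32.67 | 37.69 | 41.11 | 27.00 | 35.41 | 34.67 | 31.11 |
| SCFAs | 8.71 | 7.92 | 6.88 | 9.06 | 8.24 | 7.35 | 9.04 | 9.12 | 8.30 | 7.54 | 10.08 | 8.75 | 7.59 | 9.46 |
| MCFAs | 50.62 | 44.07 | 41.79 | 55.53 | 45.67 | 48.2 | 51 | 50.44 | 44.60 | 42.28 | 56.92 | 48.08 | 50.26 | 52.81 |
| LCFAs | 39.84 | 47.81 | 49.86 | 34.8 | 43.67 | 44.21 | 38 | 40.64 | 47.12 | 50.03 | 33.48 | 43.3 | 42.18 | 38.01 |
| BFAs | 2.21 | 2.20 | 2.66 | 2.11 | 2.70 | 2.26 | 2.54 | 2.58 | 2.60 | 2.77 | 2.49 | 2.73 | 2.60 | 2.67 |
| Omega3 | 0.59 | 0.68 | 0.77 | 0.47 | 0.76 | 0.56 | 0.70 | 0.68 | 0.76 | 0.82 | 0.61 | 0.77 | 0.60 | 0.71 |
| Omega6 | 2.17 | 2.48 | 2.36 | 1.95 | 2.36 | 2.03 | 2.34 | 2.61 | 2.73 | 2.78 | 2.30 | 2.66 | 2.34 | 2.56 |
| Odd-chain FAs | 3.74 | 3.73 | 4.29 | 3.59 | 4.34 | 3.75 | 4.26 | 4.14 | 4.10 | 4.35 | 4.10 | 4.34 | 4.41 | 4.32 |
| Total trans FAs | 4.00 | 4.58 | 5.54 | 3.16 | 5.32 | 3.88 | 4.56 | 4.31 | 4.92 | 5.62 | 3.78 | 5.11 | 4.33 | 4.54 |
| Total C18:1 | 23.85 | 29.4 | 32.28 | 20.64 | 27.24 | 28.27 | 22.49 | 24.76 | 29.51 | 32.18 | 19.44 | 26.90 | 26.35 | 22.94 |
| Fat | 4.11 | 3.94 | 4.01 | 4.23 | 4.10 | 4.05 | 4.25 | 4.17 | 4.02 | 3.99 | 4.84 | 4.02 | 3.83 | 4.15 |
| Protein | 3.44 | 3.31 | 3.43 | 3.50 | 3.5 | 3.37 | 3.55 | 2.55 | 2.44 | 2.50 | 2.90 | 2.56 | 2.55 | 2.65 |
| Milk yield | 26.87 | 26.10 | 22.45 | 28.11 | 24.49 | 24.81 | 25.52 | 30.83 | 29.82 | 26.55 | 30.90 | 28.56 | 24.77 | 30.08 |
| EB | −2.75 | −5.40 | −8.50 | −1.43 | −7.40 | −4.83 | −4.04 | −7.87 | −9.51 | −10.14 | −6.95 | −8.22 | −6.94 | −6.69 |
| NUE | 56.67 | 56.29 | 31.04 | 58.10 | 38.73 | 41.78 | 49.67 | 18.70 | 19.76 | 19.75 | 17.96 | 18.24 | 20.14 | 17.22 |
| Blood BHB | −0.81 | −0.74 | −0.71 | −0.87 | −0.73 | −0.77 | −0.79 | −0.75 | −0.67 | −0.68 | −0.89 | −0.73 | −0.77 | −0.80 |
| Blood free FAs | 526.9 | 678.60 | 714.60 | 407.70 | 590.40 | 629.00 | 520.30 | 449.3 | 597.8 | 622.60 | 293.8 | 514.4 | 496.00 | 400.8 |
| DMI | 22.24 | 19.93 | 19.78 | 23.51 | 22.03 | 20.84 | 24.17 | 23.49 | 21.45 | 21.27 | 27.31 | 23.03 | 24.67 | 24.49 |
| THI | 52.27 | 52.45 | 56.71 | 50.28 | 54.67 | 56.31 | 49.12 | / | / | / | / | / | / | / |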
SFAs = saturated FAs, MUFAs = monounsaturated FAs, PUFAs = polyunsaturated FAs, UFAs = unsaturated FAs, SCFAs = short-chain FAs, MCFAs = medium-chain FAs, LCFAs = long-chain FAs, BFAs = branched FAs, EB = energy balance, NUE = Nitrogen Use Efficiency, BHB = beta-hydroxybutyrate, FFAs = free FAs, DMI = dry matter intake. THI was not calculated for the Canadian dataset, as meteorological data were not available.
Table 8. Percentages of organic and conventional farms in the various clusters, in % of herds present in the cluster.
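| Variable | Level | Cluster 1 (N = 4330) | Cluster 2 (N = 520) | Cluster 3 (N = 83) | Cluster 4 (N = 577) | Cluster 5 (N = 3250) | Cluster 6 (N = 4) | Cluster 7 (N = 4639) |
|---|---|---|---|---|---|---|---|---|
| Farming system | Organic | 8.43 | 11.92 | 3.61 | 12.13 | 8.77 | 0.00 | 8.26 |
| Farming system | Conventional | 39.33 | 25.77 | 16.87 | 40.56 | 36.15 | 25.00 | 39.21 |
| Farming system | Unknown | 52.24 | 62.31 | 79.52 | 47.31 | 55.08 | 75.00 | 52.53 |
| Additional ventilation | No | 75.83 | 84.75 | 71.43 | 77.91 | 77.19 | / | 75.95 |
| Additional ventilation | Yes | 21.55 | 13.56 | 28.57 | 18.99 | 20.48 | / | 21.44 |
| Additional ventilation | Unknown | 2.62 | 1.69 | 0.00 | 3.10 | 2.33 | / | 2.61 |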
Data available for 2227 herds out of 4675 herds for Canada, N = 13,403 herds × cluster.
Table 9. Percentages of different feeding in the various clusters, in % of herds present in the cluster.
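| | | Cluster 1 (N = 498) | Cluster 2 (N = 27) | Cluster 3 (N = 2) | Cluster 4 (N = 66) | Cluster 5 (N = 319) | Cluster 6 (N = 0) | Cluster 7 (N = 529) |
|---|---|---|---|---|---|---|---|---|
| Feeding | % of dry matter | 50.03 | 31.33 | 0.74 | 49.53 | 46.56 | / | 49.83 |
| Feeding | % of corn silage | 22.44 | 8.63 | 2.04 | 33.98 | 18.22 | / | 22.79 |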
Data available for 2227 herds out of 4675 herds for Canada, N = 13,403 herds × cluster.
Table 10. Means and standard deviations by cluster of management related data for 3006 Quebec herds. No observations were available for cluster 6.
| Traits | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 7 |
|---|---|---|---|---|---|---|
| Number of cows in lactation | 65.32 ± 41.63 | 46.04 ± 17.1 | 40.55 ± 28.32 | 66.22 ± 34.38 | 58.13 ± 33.89 | 66.06 ± 42.37 |
| Days in milk | 174.41 ± 20.3 | 177.82 ± 31.35 | 234.75 ± 74.5 | 174.26 ± 19.91 | 176.55 ± 22.29 | 174.94 ± 20.31 |
| MFEED ($CA/cow/year) | 5134.67 ± 959.26 | 4643.00 ± 1056.39 | 3074.75 ± 1120.77 | 4932.20 ± 1056.97 | 5011.12 ± 1005.99 | 5129.53 ± 945.2 |
| MFEED per fat yield ($CA/cow/year/kg of fat) | 12.59 ± 1.17 | 12.83 ± 1.8 | 11.16 ± 0.87 | 12.42 ± 1.30 | 12.59 ± 1.35 | 12.58 ± 1.14 |
| Milk yield (L/cow/day) | 26.88 ± 4.58 | 24.61 ± 4.68 | 18.57 ± 5.99 | 25.69 ± 5.26 | 26.32 ± 4.66 | 26.83 ± 4.56 |
| Fat (kg/cow/day) | 1.08 ± 0.27 | 0.96 ± 0.25 | 0.72 ± 0.18 | 1.07 ± 0.28 | 1.04 ± 0.28 | 1.08 ± 0.26 |
| Protein (kg/cow/day) | 0.89 ± 0.22 | 0.79 ± 0.21 | 0.62 ± 0.19 | 0.87 ± 0.23 | 0.86 ± 0.23 | 0.89 ± 0.22 |
| Milk at lactation peak (L/day) | 40.09 ± 5.02 | 37.12 ± 5.21 | 32.35 ± 5.92 | 38.46 ± 6.35 | 39.39 ± 5.08 | 40.05 ± 5.00 |
| Days in milk at lactation peak | 44.30 ± 4.37 | 42.85 ± 5.92 | 37.5 ± 10.47 | 42.43 ± 4.70 | 44.41 ± 4.73 | 44.25 ± 4.34 |
| Somatic cells (10³ cells/mL) | 175.49 ± 106.42 | 183.50 ± 108.72 | 278.25 ± 83.43 | 177.64 ± 110.73 | 181.34 ± 111.68 | 177.34 ± 106.55 |
| Somatic cells > 200,000/mL (% of cows/herd) | 18.57 ± 7.44 | 20.80 ± 8.34 | 31.27 ± 9.15 | 19.54 ± 8.33 | 19.41 ± 7.52 | 18.69 ± 7.49 |
| Urea (g/mL) | 9.36 ± 5.33 | 7.47 ± 5.48 | 5.08 ± 5.89 | 9.10 ± 5.71 | 9.22 ± 5.59 | 9.42 ± 5.33 |
| Urea < 5 or > 12 g/mL (% of cows/herd) | 7.50 ± 7.29 | 7.81 ± 7.00 | 6.95 ± 5.25 | 9.26 ± 8.72 | 7.98 ± 7.63 | 7.50 ± 7.22 |
| Transition index | 275.94 ± 406.05 | 53.90 ± 469.31 | −406.75 ± 512.39 | 274.55 ± 429.99 | 200.7 ± 413.45 | 279.08 ± 401.74 |
| Transition index < 0 | 35.90 ± 16.25 | 45.88 ± 19.16 | 67.5 ± 17.71 | 36.06 ± 17.17 | 38.75 ± 16.95 | 35.72 ± 16.05 |
| Age at first calving (month) | 24.99 ± 1.94 | 26.09 ± 3.27 | 30.15 ± 8.28 | 25.12 ± 2.17 | 25.23 ± 2.21 | 24.99 ± 1.92 |
| Calving interval (days) | 403.79 ± 26.51 | 409.22 ± 37.63 | 457.50 ± 95.24 | 404.47 ± 23.5 | 406.06 ± 30.01 | 404.24 ± 26.79 |
| % of involuntary culling | 19.43 ± 8.89 | 18.46 ± 8.19 | 20.94 ± 14.2 | 17.20 ± 7.79 | 19.83 ± 9.43 | 19.51 ± 8.90 |
| % of dead cows | 5.25 ± 4.63 | 4.78 ± 4.46 | 3.69 ± 4.9 | 5.91 ± 5.59 | 5.37 ± 4.94 | 5.33 ± 4.61 |
| Milk MI | −289.96 ± 1223.41 | −798.47 ± 1456.19 | −2423.77 ± 1503.39 | −358.1 ± 1256.43 | −466.23 ± 1284.08 | −299.49 ± 1215.2 |
| Fat MI | −14.49 ± 52.79 | −40.84 ± 62.6 | −107.18 ± 55.86 | −14.57 ± 55.53 | −23.05 ± 54.50 | −14.86 ± 52.25 |
MI = management index, MFEED = margin on feed costs.
Franceschini, S., Fastré, C., Nickmilder, C., Santschi, D. E., Warner, D., Bahadi, M., Bertozzi, C., Veselko, D., Dehareng, F., Gengler, N., & Soyeurt, H. (2025). Detection of Dairy Herd Management Issues Using Fatty Acid Profiles Predicted by Mid-Infrared Spectrometry. Animals, 15(11), 1575. https://doi.org/10.3390/ani15111575
