Validation of Dairy Cow Bodyweight Prediction Using Traits Easily Recorded by Dairy Herd Improvement Organizations and Its Potential Improvement Using Feature Selection Algorithms

Simple Summary First, the current work consisted of validating the feasibility of large-scale dairy cow bodyweight prediction from models involving the day in milk, milk yield, parity, and milk mid-infrared spectrum. Second, it aimed to improve the accuracy of predictive models by using feature selection algorithms to decrease the number of predictors to limit overfitting. The models, using accessible and low-cost measurements, provided highly reproducible predictions. These could be easily obtained on an individual basis throughout a cow’s productive life by dairy herd improvement organizations, thus providing potentially relevant information for the dairy farmer at three levels: economics (reproductive performance), animal welfare (disease detection), and environment (methane production). Abstract Knowing the body weight (BW) of a cow at a specific moment or measuring its changes through time is of interest for management purposes. The current work aimed to validate the feasibility of predicting BW using the day in milk, parity, milk yield, and milk mid-infrared (MIR) spectrum from a multiple-country dataset and reduce the number of predictors to limit the risk of over-fitting and potentially improve its accuracy. The BW modeling procedure involved feature selections and herd-independent validation in identifying the most interesting subsets of predictors and then external validation of the models. From 1849 records collected in 9 herds from 360 Holstein cows, the best performing models achieved a root mean square error (RMSE) for the herd-independent validation between 52 ± 2.34 kg to 56 ± 3.16 kg, including from 5 to 62 predictors. Among these models, three performed remarkably well in external validation using an independent dataset (N = 4067), resulting in RMSE ranging from 52 to 56 kg. The results suggest that multiple optimal BW predictive models coexist due to the high correlations between adjacent spectral points.


Introduction
Knowing the live bodyweight (BW) of a dairy cow is an important source of information related to farm management [1], disease detection [2], quantification of dry matter intake [3], estimation of methane production [4,5], or even assessing the reproductive performance of a given cow through its BW changes [6]. Capturing BW can be achieved by moving the animal on a weighing scale or using an automatic walkover weighing system [7]. However, smaller farms cannot always afford these expensive, challenging, and costly to maintain investments [8]. Moreover, even if BW is measured at farms, the data often stay unshared, limiting its use by Dairy Herd Improvement (DHI) organizations. Previous studies showed the value of using observations from linear morphological classification in isolation [9,10], or in combination [11][12][13] in predicting BW, with accuracy ranging from 36 to 105 kg, and R 2 varying between 0.53 and 0.99. However, BW should be known regularly to monitor effectively, either at the farm or cow level. Therefore, accurate repeated measurements of morphological traits would be required, which could be costly and labor-intensive. Furthermore, linear morphological classification estimations are seldom taken during an animal's lifetime [8,14], which would prevent controlling BW over time.
To overcome some of these difficulties, researchers have developed automated methods making full use of computer vision techniques either through 2 [15,16], or 3-dimensional [17] devices. R 2 ranged from 0.75 to 0.98. More recently, Song et al. (2018) [18] obtained an RMSE of 41kg using a model whose predictors came from measurements estimated from a 3-D camera. Although a professional classifier's intervention would no longer be required with such images, the devices may be expensive and require fine-tuning and maintenance, adding cost for the dairy. These devices are also cumbersome, as shown by their representations [16][17][18].
The current study aimed to expand on the range of BW prediction equations using mid-infrared (MIR) spectra. These data are routinely obtained (often every four weeks) on a large scale by DHI organizations. The advantages include the low cost of the routine collection, reproducibility of measurement from standardized, automated acquisition protocols for spectral data and animal characteristics, and the time saved by the operators who do not need to spend time on measurement. With many years of data stored by some DHI centers, a backward BW prediction is of interest for genetic purposes. The rationale is that milk composition assessed through milk MIR data can be linked to body condition changes associated with mobilization of fat reserves. A previous study featured a performance of Root Mean Square Error (RMSE) of cross-validation (CV) of 53 kg, and RMSE of prediction ranging between 37 to 64 kg for BW using MIR spectra, days in milk (DIM), parity, the month of test, and milk yield [8]. As the high number of predictors in a model can harm its robustness, notably for partial least squares (PLS) regression [19], reducing the number of spectral points to be considered in the predictive model from 1060 to 277 based on relevance in explaining the composition of milk (i.e., the regions between 950 and 1600, 1750 and 1800 cm −1 and between 2600 and 3000 cm −1 ) has been recommended [8].
In addition to validating the feasibility of predicting BW from the day in milk (DIM), parity, milk yield (MY), and milk MIR spectrum, this study also aimed to reduce the number of predictors further with the objective of removing irrelevant features [20]. The working hypotheses were (i) MIR spectral data are valuable for predicting BW together with DIM, parity, and MY, and (ii) multiple optimal models should coexist in predicting BW using MIR spectrum and animal characteristics. To better determine the performance of the approach, (i) we compared model performance, using MIR, parity, DIM, and MY with another using linear morphological traits predicting BW, and (ii) we interpreted the shape of the averaged large-scale predicted BW regarding the period of lactation.

Materials and Methods
We used R software [21] for statistical analysis, the plsr function of the pls package [22] to calibrate the PLS regression models, and the caret [23] package as an interface to perform all model training and parametrization.

Training Data
The training dataset contained 1915 records collected between 2007 and 2016 made up of 360 Holstein cows from 9 herds (h01 to h09 in Table 1). A total of 1,161 records were collected from 93 Holstein cows (herds h03 and h06-h09) as part of the European Interreg Genotype plus Environment project (GplusE) (http://www.gpluse.eu, last visited 29 April 2021). Each sample included DIM, parity, test-day BW, MY, and milk MIR spectrum. Ragsdale (1934) reported that female Holstein reached a mature weight between 5 to 7 years [24]. Matthews and Fohrman (1954) exhibited similar observations [25]. Based on these considerations, the parity variable was divided into four classes (1st to 4th + ), with 4th + gathering cows within or beyond their fourth parity. For herds h01 to h05, the MIR spectra were obtained using a MilkoScan FT6000 spectrometer (Foss Analytics, Hillerød, Denmark) or a Standard Lactoscope FT-MIR automatic (PerkinElmer, Waltham, MA, United States). Regarding the GplusE herds (h03, h06-h09), MIR spectral data formerly came from FT2 and FT6000 spectrometers (Foss Analytics, Hillerød, Denmark) or a Standard Lactoscope FT-MIR automatic (PerkinElmer, Waltham, MA, United States) [26]. Those spectra were then standardized according to the standardization procedure given in [27]. We applied a first derivative transformation with a gap of 5 wavelengths to the original spectra to flatten the spectral baseline drift [8]. Only spectral points covered by Foss, PerkinElmer, and Bentley instruments were kept to ensure the model transferability between instruments. Those were located from 925.66 cm −1 to  [27], representing 797 points. Moreover, another preselection of 277 spectral points was also possible [8]: [950 cm −1 , 1600 cm −1 ], [1750 cm −1 , 1800 cm −1 ], and [2600 cm −1 , 3000 cm −1 ]. Consequently, we decided in this study to start the variables selection processes by considering two different starting points, one comprising all of the 797 initial spectral points, denoted by ALL, and the other having only the 277 points selected by empirical knowledge from Soyeurt et al. (2019) [8], subsequently expressed by HSO (for H. Soyeurt). Spectral records with a global H distance (GH), calculated using principal components explaining 99% of the spectral variability, higher than three were dropped [8,28,29].
The cows were weighed after milking using different walkover weighing pieces of equipment. A Fullwood branded system (Fullwood, Shropshire, UK) was used for herds h01-h03, for herd h04 Myscale Pro-W810 device (Gallagher, Canley, UK), an Autoweight (Dairymaster, Causeway, Co. Kerry, Ireland) system for h06-h09, and an incorporated scale gear inside an Astronaut A4 milking robot were employed for herds h05. Due to the variability of BW between age groups [24], 4 BW intervals were created and corresponded to the mean plus or minus the standard deviation of BW records calculated by parity class: (376,705), (449,767), (487,838), and (492,875) for cows in first to fourth+ parity, respectively. Records out of those ranges were discarded.
Finally, to ensure the consistency between BW and spectral records, a residual analysis based on predictions built from a partial least square (PLS) regression revealed other outliers. All samples with a residual value deviating by more than three standard deviations from the mean of all residuals were considered outliers and removed from the dataset. For each newly cleaned dataset, a new residual analysis was performed. This iterative procedure stopped when no more outliers were observed. The three cleaning steps (i.e., BW outliers, GH distances, and residual analysis) led to the deletion of 3.45% of the data, decreasing the number of records from 1915 to 1849, representing 360 different cows. Table 2 shows the descriptive statistics associated with this cleaned calibration dataset. All predictors were centered and scaled before modeling to give them the same importance in the regression.

Features Selection
Depending on the underlying induction model, the presence of irrelevant or redundant predictors could lead to performance degradation [20]. Therefore, we conducted variable selection for both datasets (i.e., HSO and ALL) to reduce the number of predictors. We used a univariate filter method, known to be tolerant in its selection [30]. This first technique, which did not consider the relationships between the variables, was a preselection step allowing only irrelevant predictors to be removed. Second, the recursive feature elimination (RFE) [31], based on the wrapper approach backward selection [32], enabled a more drastic selection by only picking a subset of the weakly relevant features.

Filter Selection
For each dataset, each predictor X was successively used to calibrate a univariate generalized additive model (GAM) resumed as BW = β 0 + β 1 s(X) + ε, where s is the smoothing splines function indicating any functional relationship between X and BW [32], β 1 , β 0 being respectively the model's slope and intercept, and ε the error. The benefit of using a GAM model over a simple linear regression was that it could point out even non-linear relationships between the target and predictor variables [33]. For each fitted model, the null hypothesis (H0) asserts that no functional relationship exists between BW and X, which was tested through a nonparametric F statistic with a significance level set to 5%. Consequently, any model associated with a p-value lower than 5% means the rejection of H0, indicating the relevance of maintaining the predictor (X) in the dataset. At the end of that selection step, two new datasets with fewer predictors than the former ones arose, namely ALL_SBF and HSO_SBF, with SBF standing for "selected by filter" (Table 3).  1 Original subsets of predictors before any selection step; All the mid-infrared spectral points originally considered (ALL); Only the mid-infrared spectral points at wavenumbers (950 cm −1 , 1600 cm −1 ), (1750 cm −1 , 1800 cm −1 ), and (2600 cm −1 , 3000 cm −1 ) originally considered before selection steps (HSO). 2 Corresponding number of features to the subset. 3 Subsets of predictors after the first selection step, i.e., selection by filter (SBF). 4 Subsets of predictors after the recursive feature elimination (RFE) selection step; SBF means that a first selection was made before RFE using selection by filter; VIP (variables in projection) and BETA (beta coefficients) refers to the ranking methods chosen for RFE. 5 Number of created subsets by the recursive feature elimination selection step. 6 Maximum number of selected features. 7 Stratified 10-fold cross-validation root mean squared error (kg).

Wrapper Selection
A wrapper is a multivariate selection method involving the induction algorithm in the variable evaluation mechanism [20,34], thus adding consistency in the variables' ranking regarding the final objective. The wrapper method chosen was RFE, which was composed of four steps. First, the predictors were ranked in descending order according to the relevance to predict BW. We used variable importance in the projection (VIP) and BETA scores on the entire training dataset to assess the relevance to predict BW [35]. Second, we split each ranked set of features (i.e., using VIP or BETA) into subsets. The number of subsets depended on the number of features in the ranked set. To limit the computational cost, we did not use all combinations of ranked features. To be clear, the first subset encompassed the total number of features (Nf), while those following included Nf-gap features. The different values taken by the variable gap depended on the number of features remaining (i.e., gap = 5 for (Nf, 255), gap = 2 for (252,200), and gap = 1 for (199,2) features). When a subset was created, a PLSR was built to re-rank the features following their information relevancy as explained previously and to evaluate the model by calculating a stratified 10-fold cross-validation RMSE (RMSEscv), as recommended by Kohavi (1995) [36]. For the stratification, we defined 4 BW intervals based on the minimum, maximum BW values and the first three quantiles: (428,564), (564,614), (614,671), and (671,820). Each cross-validation fold contained the same number of records in each BW interval. The wrapper selection was conducted on HSO, ALL_SBF, and HSO_SBF. Therefore, a total of 1,280 different subsets of features were created, as described in Table 3. In order to limit the computational cost, by still focusing on the most generalizable and parsimonious models, we kept for the further validation procedures, only the 13 models including an equal or lower number of features compared to the model having the lowest RMSE SCV + (tolerance/100) * lowest RMSE SCV where the tolerance ranged from 0 to 12. Consequently, we ended with a total of 78 subsets (i.e., each set ALL_SBF_VIP, ALL_SBF_BETA, HSO_VIP, HSO_BETA, HSO_SBF_VIP, and HSO_SBF_BETA contained 13 subsets of predictors).

Herd Independent Internal Validation
The stratified cross-validation, as proposed in this study, could have led to over-fitting of the models. Therefore, a herd-independent internal validation (iv) was used to define the optimal number of PLS factors and to identify the best subsets of features. The creation of the folds for this validation respected the following rules: herds in the calibration set must not be in the validation set, and the validation set could contain more than one herd to have a validation size comprised between 10 and 30% of the initial dataset. The partitioning was repeated 101 times for the 78 previously selected subsets. To make these models parsimonious, we decided to limit the maximum number of PLS factors to 10. This procedure was done for the 78 selected subsets of features.
Consequently, 7878 PLS models were built. For each kind of model, we computed the average herd-independent RMSE (RMSE iv ) and the associated standard deviation (RMSESD iv ). Then, we performed 10-fold stratified cross-validations on the same 7878 models to evaluate the gap between herd-independent and stratified cross-validation performance. To do so, we calculated the average RMSE per kind of models (RMSE SCV ) and its standard deviation (RMSESD SCV ). Ultimately, we selected the six best models (one per kind of model) based on these two averages RMSE and their respective standard deviation.

External Validation
The external validation dataset included 4067 records collected from 231 Australian Holstein cows between October 2015 and December 2017 at the Ellinbank Research Farm belonging to Agriculture Victoria Research (h10 in Table 1). A walkover scale DeLaval Automatic Weigh System AWS100 (DeLaval International, Tumba, Sweden) was used for herd h10 [37] to measure test-day BW. The MIR spectra were provided by a spectrometer model 2000 (Bentley Instruments, Chaska, MN, USA). We approximated the standardization of the spectra by applying the technique developed by Grelet et al. (2015) [27] but using a deferred standardization table.
Before processing the external validation, we calibrated the six selected final models over the entire training dataset (N = 1849 records). Their external validation performance was assessed by calculating the RMSE of validation (RMSE v ) on this fully independent h10 dataset.

Bodyweight Prediction from DHI Dataset
Ultimately, we applied the models offering the best external validation performance to the DHI dataset managed by the Walloon Breeding Association (Awé, Ciney, Belgium), comprising 3,769,477 records collected from 276,728 Holstein cows, with The MIR spectral points provided by FT6000 and FT+ spectrometers (Foss Analytics, Hillerød, Denmark). Although no observed reference BW measurements were present in this dataset, the idea was to verify if the average trends of predicted BW within and across lactations were those expected. As a link exists between BW and milk composition, we compared the average predicted BW's evolution to the observed milk production and predicted fatty acids curves. The fatty acids content was predicted using the equations developed by Soyeurt et al. (2011) [38].

Features Selection
The role of the selection algorithm was to maintain spectral points that had useful information and sufficient variability and eliminate spectral areas containing noises. A first bare selection consisted of picking the most relevant features by using SBF. From the results presented in Table 3, this selection reduced variables number of ALL by 52.62% and HSO by 43.21%. For both datasets, the parity, DIM, and MY were still present. The selection made by Soyeurt et al. (2019) [8] based on previous knowledge allowed a consistent skimming on irrelevant predictors. Indeed, the areas excluded from HSO were also excluded from ALL at 57.61% using SBF.
The wrapper selection enabled a more drastic selection by only picking a subset of the weakly relevant features from the analysis of 1280 different subsets (Table 3). To limit the computational cost and to ensure the use of a parsimonious model, we selected the 13 models for which the RMSE SCV were equal to the lowest observed RMSE SCV + (tolerance/100) * lowest observed RMSE SCV with a tolerance ranging from 0 to 12. The models associated with the lowest RMSE SCV included the maximum number of informative features, ranging from 141 to 282 features (Table 3). It was interesting to notice that the HSO_BETA conserved all HSO features, confirming the interest of the variables selected by experience.

Herd-Independent Internal Validation
For each of the 13 selected models (i.e., one model per tolerance value ranging from 0 to 12), we performed a herd independent internal validation repeated 101 times. Figure 1 summarizes the results by profile and dataset for each tolerance value. RMSE iv ranged between 51.6 ± 2.25 and 60.3 ± 4.83 kg. HSO_VIP did the best with a RMSE iv of 53.7 ± 3.03 kg. Regardless of the profile, ALL_SBF was the worst performer dataset with RMSE iv of 56.7 ± 3.95. These results showed how relevant was the preselection made by Soyeurt et al. (2019) [8]. By taking into account RMSE iv , RMSESD iv , RMSE SCV , and RMSESD SCV in Figure 1, six models can be kept for the external validation. Their performance results are summarized in Table 4. The best number of features to optimize RMSE iv and RMSE SCV were bound between 8 to 20, associated with HSO_VIP, HSO_SBF_BETA, and HSO_SBF_VIP.
Before performing the Australian data's external validation (h10, Table 2), we build the six selected models in Table 4 using the entire calibration dataset (N = 1849 records). RMSE SCV ranged from 48 to 52 kg, which was the range of accuracy mentioned by Soyeurt et al. (2019) [8]. Therefore, the current study confirmed the feasibility of predicting an indicator of BW using milk MIR spectrum, MY, DIM, and parity. 1 Subsets of predictors after the recursive feature elimination (RFE) selection step; SBF means that a first selection was made before RFE using selection by filter; VIP (variables in projection) and BETA (beta coefficients) refers to the ranking methods chosen for RFE; ALL means that originally all the mid-infrared spectral points were considered; HSO means that only the mid-infrared spectral points at wavenumbers (950 cm −1 , 1600 cm −1 ), (1750 cm −1 , 1800 cm −1 ), and (2600 cm −1 , 3000 cm −1 ) were initially considered before selection steps. 2 Corresponding number of features to the subset. 3 Root mean squared error from the herd-independent validation repeated 101 times. 4 Root mean squared error of 10-folds stratified cross-validation. 5 Number of latent variables kept by the partial least square regression. 6 Coefficient of determination. 7 Root mean squared error of validation using Australian dataset.

Figure 1.
Root mean squared error for herd independent validation repeated 101 times (RMSEiv) and its standard deviation from the 13 models selected from the recursive feature elimination (RFE) tolerance. RMSE for the stratified 10-fold crossvalidation (RMSESCV) for the same models and its standard deviation are also plotted.
Before performing the Australian data's external validation (h10, Table 2), we build the six selected models in Table 4 using the entire calibration dataset (N = 1849 records). RMSESCV ranged from 48 to 52 kg, which was the range of accuracy mentioned by Soyeurt et al. (2019) [8]. Therefore, the current study confirmed the feasibility of predicting an indicator of BW using milk MIR spectrum, MY, DIM, and parity. Figure 1. Root mean squared error for herd independent validation repeated 101 times (RMSE iv ) and its standard deviation from the 13 models selected from the recursive feature elimination (RFE) tolerance. RMSE for the stratified 10-fold cross-validation (RMSE SCV ) for the same models and its standard deviation are also plotted. 1 Subsets of predictors after the recursive feature elimination (RFE) selection step; SBF means that a first selection was made before RFE using selection by filter; VIP (variables in projection) and BETA (beta coefficients) refers to the ranking methods chosen for RFE; ALL means that originally all the mid-infrared spectral points were considered; HSO means that only the mid-infrared spectral points at wavenumbers (950 cm −1 , 1600 cm −1 ), (1750 cm −1 , 1800 cm −1 ), and (2600 cm −1 , 3000 cm −1 ) were initially considered before selection steps. 2 Corresponding number of features to the subset. 3 Root mean squared error from the herd-independent validation repeated 101 times. 4 Root mean squared error of 10-folds stratified cross-validation. 5 Number of latent variables kept by the partial least square regression. 6 Coefficient of determination. 7 Root mean squared error of validation using Australian dataset.

External Validation
From Table 4, it can be seen that these six selected models did not always match with the best externally performing models in light of results obtained in the external validation. One of the probable causes could be that the standardization of the validation set (h10) was only approached via a standardization table remote in time, suggesting that some spectral zones would be less sensitive to artifacts and, consequently, to standardization. However, some of these models performed reasonably well in external validation, with good RMSE v values ranging from 52 to 56 kg (Table 4).

Model Interpretation
The remainder of the analysis focused on the three models with the best RMSEv statistics (52 and 56 kg): HSO_VIP, HSO_BETA, and HSO_SBF_BETA. The obtained models using those subsets of features were quite similar. HSO_VIP kept 11 features distributed into 3 PLS components. HSO_BETA involved 7 features for 3 components, where HSO_SBF_BETA contained 8 features summarized in 7 PLS components ( Table 4). All subsets of features had the MY, DIM, and parity in common but had selected different spectral points. Despite some slight disparities between RMSEv values of those three models (52 vs. 56 kg), the predictions given by those three models were highly correlated (R > 95%). Consequently, even if the predictors selected were different for these models, their respective combinations within the corresponding PLS model brought the same level of usefulness in predicting BW. The difference occurred at the spectral level, which would be explained by the high correlations between adjacent spectral points. Therefore, neighboring spectral points shared similar information. Figure 2 shows that the spectral zones of interest bounded by the interval (926 cm −1 , 1120 cm −1 ), which is part of the so-called fingerprint region (926 cm −1 , 1500 cm −1 ) [39]. Moreover, this zone relates to a high heritability value, meaning that transmission is made across generations and confirms that this spectral area was not noisy [40]. The absorbance of the spectral regions between 900 and 1153 cm −1 could be explained by the C-O and C-C stretching modes [39], related to the chemical structure of several major milk components such as fat, sugar, and protein [40]. Lactose association could be extended to the spectral ranges (950 cm −1 , 1200 cm −1 ), stressing that the interval (1000 cm −1 , 1150 cm −1 ) was related to lactic acid (lactate) [41]. Conclusively, the region of interest (926 cm −1 , 1200 cm −1 ) could be stated as the lactose-region [42].
with good RMSEv values ranging from 52 to 56 kg (Table 4).

Model Interpretation
The remainder of the analysis focused on the three models with the best RMSEv statistics (52 and 56 kg): HSO_VIP, HSO_BETA, and HSO_SBF_BETA. The obtained models using those subsets of features were quite similar. HSO_VIP kept 11 features distributed into 3 PLS components. HSO_BETA involved 7 features for 3 components, where HSO_SBF_BETA contained 8 features summarized in 7 PLS components (Table 4). All subsets of features had the MY, DIM, and parity in common but had selected different spectral points. Despite some slight disparities between RMSEv values of those three models (52 vs. 56 kg), the predictions given by those three models were highly correlated (R>95%). Consequently, even if the predictors selected were different for these models, their respective combinations within the corresponding PLS model brought the same level of usefulness in predicting BW. The difference occurred at the spectral level, which would be explained by the high correlations between adjacent spectral points. Therefore, neighboring spectral points shared similar information. Figure 2 shows that the spectral zones of interest bounded by the interval (926 cm −1 , 1120 cm −1 ), which is part of the so-called fingerprint region (926 cm −1 , 1500 cm −1 ) [39]. Moreover, this zone relates to a high heritability value, meaning that transmission is made across generations and confirms that this spectral area was not noisy [40]. The absorbance of the spectral regions between 900 and 1153 cm −1 could be explained by the C-O and C-C stretching modes [39], related to the chemical structure of several major milk components such as fat, sugar, and protein [40]. Lactose association could be extended to the spectral ranges (950 cm −1 , 1200 cm −1 ), stressing that the interval (1000 cm −1 , 1150 cm −1 ) was related to lactic acid (lactate) [41]. Conclusively, the region of interest (926 cm −1 , 1200 cm −1 ) could be stated as the lactose-region [42].   Figure 2 shows how closely some spectral points were gathered when involved in a prediction model. The first cluster included the spectral points at 949 cm −1 and 953 cm −1 . This zone (946 cm −1 , 956 cm −1 ) was activated twice by the three PLS models. The interval (1000 cm −1 , 1040 cm −1 ) corresponded to the second grouping area whose spectral points were attributed to lactate [43]. This zone had the largest number of activated points when involved in BW prediction. The third zone included points in (1040 cm −1 , 1100 cm −1 ). This area was mainly associated with lactose measurement [43]. The fourth zone, such as the second zone, was also connected to lactate. Lactic acid showed absorption bands, notably at 1115 cm −1 [41], which corresponds to one of the points activated when this zone was involved in the modeling. Therefore, a spectral zone was then defined by a set of contiguous and delimited spectral points providing useful information explaining BW's variance. All three selected models used a various mix of these spectral zones in different proportions and combinations. This selection depended on (i) the initial subset of predictors and (ii) the ranking function associated with RFE. Figure 2 points out that the spectral areas activated were quite different for the same ranking function (HSO_BETA vs. HSO_SBF_BETA), thus suggesting zones 2 and 4 as reciprocal substitutes. Furthermore, through HSO_VIP and HSO_BETA, Figure 2 shows that the two ranking methods can order the most important variables differently from the same initial subset of predictors. The dataset's particularity that adjacent variables were highly correlated may have played a crucial role in that discrepancy between ranking profiles. Zone 4 operated as a pivot for the substitutions between zone 1 and zones 2 and 3. This discussion should be put into perspective because the standardization of the outer-validation dataset (h10) was not perfect, and the analyses to be made on an external validation with a standardization carried out in better circumstances may be different.
Nevertheless, it was clear that there is no one-of-a-kind best model. There are instead many models, each of them looking for relevant information in distinct spectral regions. This claim is in line with the analysis carried out by Kohavi and John (1997) [20], who showed that some weakly relevant predictors might coexist in the sense that, although providing useful information to the model, they could be interchanged without affecting its predictive quality.
Besides, it appeared that spectral information brought to the predictive model was useful and relevant when associated with the variables MY, DIM, and parity in BW prediction. The added value of using spectral points was quantified by comparing the three aforementioned models with another, including DIM, MY, and parity features. This model with three features for two PLS components had an RMSEv of 70 kg against 52 kg for the best results obtained using the spectral data as additional features.

Implementation of BW Predictive Models on DHI Database
We applied the best models on the Walloon DHI dataset. Figure 3A shows the behavior of the different BW predictions' averages, computed by parity and DIM on this DHI dataset. For the sake of clarity, only the first parity is given, while the other parity showed similar results. It could be observed that the average predicted BW evolutions across lactation revealed the expected behavior, especially the BW loss in early lactation caused by the concurrent effects of energy demand increase for milk production ( Figure 3C) and feed intake partially prevented due to appetite drop and physical limitation in rumen capacity [44]. The contrast of BW evolutions between HSO, HSO_BETA, and other plotted models was remarkable. Dettmann et al. (2020) showed similar weight curves for Holstein cows to what was observed through the HSO_SBF_BETA equation, both in behavior and size [45].
Furthermore, changes occur in milk fat composition in early lactation [46]. As the milk fatty acids content can be predicted by MIR technology [38], we assumed that indicators of their variations would also be present in the MIR spectral data. To that end, we predicted the milk FA content of the DHI records from milk MIR spectra and then averaged by DIM and parity. Figure 3B and D respectively show the evolution of FA groups' content in fat (i.e., short, medium, and long-chain FAs) and the FA drivers of change (C10:0, C14:0, C16:0, C18:0, and C18:1 cis-9 expressed in fat) within lactation. We were interested in measuring the linear correlation between each MIR spectral point kept as predictors by the HSO_SBF_BETA, HSO_BETA, HSO_VIP models, and the aforementioned FA. Figure 2 highlights those points of interest in green arranged in four spectral zones. The analysis showed that C10:0 was moderately correlated with the spectral points at 948.81 cm −1 , and 952.66 cm −1 (zone1), 1029.80 cm −1 (zone2), and 1110.80 cm −1 (zone4) with a respective correlation of −0.55, −0.56, −0.42, and 0.63. The medium-chain FAs were weakly to moderately correlated with points in the first, second, and third spectral zones, featuring correlations of −0.70, −0.67, −0.38, and 0.53 for the points at 948.81 cm −1 , and 952.66 cm −1 (zone1), 1029.80 cm −1 (zone2), and at 1110.80 cm −1 (zone3), respectively. The long-chain FAs happened to be moderately correlated with points belonging to the first zone (948.81 cm −1 and 952.66 cm −1 resp. −0.61, −0.51) and weakly correlated with the point at 1114.65 cm −1 located in zone 4 (−0.24). Furthermore, changes occur in milk fat composition in early lactation [46]. As the milk fatty acids content can be predicted by MIR technology [38], we assumed that indicators of their variations would also be present in the MIR spectral data. To that end, we predicted the milk FA content of the DHI records from milk MIR spectra and then averaged by DIM and parity. Figure 3B and D respectively show the evolution of FA groups' content in fat (i.e., short, medium, and long-chain FAs) and the FA drivers of change (C10:0, C14:0, C16:0, C18:0, and C18:1 cis-9 expressed in fat) within lactation. We were interested in measuring the linear correlation between each MIR spectral point kept as predictors by the HSO_SBF_BETA, HSO_BETA, HSO_VIP models, and the aforementioned FA. Figure 2 highlights those points of interest in green arranged in four spectral zones. The analysis showed that C10:0 was moderately correlated with the spectral points at 948.81 cm −1 , and 952.66 cm −1 (zone1), 1029.80 cm −1 (zone2), and 1110.80 cm −1 (zone4) with a respective correlation of −0.55, −0.56, −0.42, and 0.63. The medium-chain FAs were Knowledge about FA content and their evolution through lactation could be a differentiator for BW prediction. Although their daily changes are not explicitly found in the MIR, predictive models may find signals in the spectral data that explain some aspects of the animal BW. This kind of a posteriori analysis is valuable for model validation. Although Table 4 shows that three models stood out, we found out that the average predicted BW evolution model HSO_BETA exhibited a completely unexpected behavior since it tends to underestimate the average predicted BW in the very early lactation in comparison with the other models. While the external validation showed similar performance for the models HSO_VIP and HSO_SBF_BETA (RMSEv = 51 kg), we favor HSO_SBF_BETA, with a trend of averaged predicted BW across lactation that is consistent with the literature. Nevertheless, a predictive model is dependent on the quality and representativeness of its training data. Therefore, these models may need to evolve in light of the emergence of new data that may highlight other MIR spectral predictors' relevancy in the future.
In the calibration dataset, 39 records collected from 39 different cows belonging to the h02 herd (Walloon Breeding Association, Belgium, May-07) had morphological information. We used those records to predict BW using linear morphological classification proposed by Saussez (2017) [47] for the Walloon Breeding Association. The obtained RMSE was 71 kg for the equation using linear classification vs. 65, 60, and 54 kg for HSO_BETA, HSO_VIP, HSO_SBF_BETA, respectively. These results highlighted a positive interest in developing prediction models using MIR data. Although those models' predictions cannot replace the actual BW measurements, they can be considered an indicator that smoothed out the measurement's variability. Besides, thanks to a large amount of repeated data and the homogeneity of the phenotypes needed to perform the BW prediction proposed in this study, the predicted BW phenotypes are expected to be easily comparable between cows and herds, providing relevant information for breeding and management.
Furthermore, DHI organizations could potentially provide these predictions as a new service. The development of a multi-trait approach combining linear morphological classification, actual BW measurement, and MIR predicted BW could improve BW's accuracy. The repeated measurements of MIR predicted BW for the same cows on a routine basis would also define BW changes that can be interesting in improving dairy cows' feed efficiency.

Conclusions
Although multiple optimal models coexisted in predicting BW using MIR spectra, it was shown that the usage of these spectra within a modeling process to predict BW is relevant, and the performance achieved were comparable with such offered by methods using linear morphological traits as input variables. However, the advantage of the method proposed in this paper is to obtain highly reproducible BW predictions obtained at low cost and every 4-6 weeks for the productive cows participating in the routine milk recording. Moreover, thanks to the predictive model's features, it is possible to rapidly operate a past BW prediction to obtain a large-scale database for genetic purposes.