1. Introduction
Replacement heifers are an important investment that will pay off through milk production. Productive cows are distinguished by their persistence in milk yield and longevity [1]. However, the costs incurred (financial and environmental) during heifer rearing must be offset by the future milk yield. Therefore, it is crucial to identify and select heifers with good productive potential as soon as possible [2]. Higher efficiency would result from efforts to achieve at least 88% of the mature milk yield in the first lactation and calve heifers younger than 22 months [1]. It is suggested that heifers reaching between 73% and 77% of their mature weight at the first calving can produce more milk in their first lactation without compromising the long-term milk yield and herd life [1]. However, farm-specific targets may be more useful than a “one-size-fits-all” approach [3]. In addition to attaining targets, an early retention strategy for heifers can help reduce feed costs, as opposed to retaining all the calves [2]; this requires suitable rearing management and selection criteria [4,5,6].
The first lactation milk yield is a performance indicator that is associated with other measures of lifetime performance [7,8,9]. Several studies have reported predictive models and correlations between the first lactation milk yield and body weight, average daily growth rate (ADG) or body condition score [10,11,12]. Particularly interesting is the use of derivatives of the weight curve models of growing heifers to detect negative breakpoints, where the number of breakpoints detected explains differences in the first lactation milk yield [13]. Informative breakpoints could occur at any time during the growing period. It is also possible that the body weight at a specific time better explains the first lactation milk yield on its own [11]. Detection of positive and negative breakpoints [13] could explain how grazing heifers with different growth trajectories (seasonal or linear) achieve similar outcomes [14].
Heifer growth during rearing is influenced by fluctuations in health and environmental conditions. Weighing heifers is a common practice used to monitor growth, calculate the ADG and time management practices according to specific weight thresholds, such as the first insemination. The target growth trajectory between successive weight targets is predominantly linear, giving a mostly linear trajectory from 6 to 15 and from 15 to 22 months of age [11,14]. Heifer growth curve models that predict the body weight at a specific point in the future can assist in cattle selection, although growth during the rearing period has been characterized as linear rather than curved [11,14].
Nonlinear growth models can represent patterns of acceleration and deceleration, such as periods of restriction and overfeeding; such oscillations have been reported to increase the amount of secretory mammary tissue and its activity [15]. The effects of pre-pubertal and post-pubertal ADG on the future milk production and reproduction are inconsistent; while accelerated ADG reduces mammary development, it remains unclear whether periods of restriction and re-alimentation increase the amount and activity of mammary secretory tissue and whether this is advantageous for grazing dairy cattle [15]. Therefore, derivatives of Fourier series or splines would better represent fluctuations and changes in the ADG than growth curve models such as the Logistic, Brody, or Gompertz [16]. Additionally, these growth curve functions may not be entirely relevant for heifers, because the asymptotic value of the body weight is not reached until two or three lactations [13].
The approach of this study was to use the weight data of heifers during rearing to reveal oscillations that may influence future milk performance. Fourier series are functions that consist of sums of trigonometric functions and are used to represent oscillatory components in a time series [17]. Transient or instantaneous growth rates may reveal growth spurts due to environmental factors [13]. The instantaneous ADG is the first derivative of a growth curve at a given time [18]. The first derivative of a nonlinear function can have maximum or minimum values, which correspond to inflection points of the growth curve at specific times. Changes in the heifer’s instantaneous growth rate could result from illness, changes in the nutritional plane or many management practices. Positive and negative changes in the instantaneous growth rate would accumulate and impact future performance. Because a derivative is itself a function, it can be differentiated again to obtain a second derivative or higher-order derivatives [19]. The second derivative provides information about an inflection point and also indicates how the first derivative will change as the input varies. Higher-order derivatives are used to avoid local minima in multivariate optimization functions. They are especially valuable in machine learning for guiding the optimization search [20].
We proposed that the first and second derivatives of the growth curve provide valuable information for predicting the future milk yield. As far as we know, there are no reports on the use of the derivatives of a growth curve as explanatory variables in a predictive model. The closest approach used the first derivative to detect breakpoints in the growth curves of heifers; the number of breakpoints accumulated during heifer rearing was then associated with milk production in the first lactation [13]. Handcock [21] used the weight of heifers up to 21 months of age to forecast the milk yield in their first lactation, and the first derivative proved valuable in determining the optimal weight for maximizing the milk yield. It also revealed that lighter heifers had a greater capacity to increase their milk yield by gaining weight than heavier heifers. In that study, the first derivative was not included as an explanatory variable in the model.
To generate the predictive model, we used artificial intelligence algorithms, specifically machine-learning algorithms. These algorithms can identify and associate patterns in the data, including nonlinear relationships that ordinary least squares regression may miss. This approach is supported by an increasing use of artificial intelligence algorithms to predict milk yields [22]. Machine learning (ML) takes advantage of parallel processing and a range of algorithms such as neural networks, random forest, and generalized linear models, among many others. There are ML frameworks such as H2O that solve classification or regression problems with streamlined workflows and easy model deployment [23]. Within the H2O platform, AutoML is a fully automated supervised learning procedure that trains and cross-validates base models. It generates a set of different models based on ML algorithms, as well as stacked ensembles. Model stacking is an ensemble modeling technique that involves training a model to combine the outputs of many of the base models.
The present work used a small dataset of grazing heifers to model the first lactation milk yield based on a limited number of measurable variables early in the heifer’s life. The main objective was to determine if there was a specific age at which the heifer’s milk yield in her first lactation could be more accurately predicted. The hypothesis was that variables derived from a Fourier function representing the heifer’s growth during rearing were important in the modeling, particularly the first and second derivatives.
2. Materials and Methods
This exploratory and retrospective observational study was conducted at the National Autonomous University of Mexico station in Tequisquiapan, Mexico (20°36′13.88″ N, 99°55′02.91″ W), at an altitude of 1913 m above sea level. Dairy production was in a year-round milk system of Holstein-Friesian, Jersey and their crosses on irrigated pasture with rotational grazing. The climate was semi-dry temperate, with 512 mm of precipitation and an average annual temperature of 17.5 °C. The predominant pasture species were alfalfa (Medicago sativa), cocksfoot (Dactylis glomerata), ryegrass (Lolium multiflorum), and tall fescue (Festuca arundinacea). The grazing area was allocated based on cuttings of the available forage before each grazing period, as well as the monthly values of the proximate analysis of the forage [24].
The heifers were managed as follows: they were weighed at birth, given repeated doses of good-quality colostrum, and kept in individual housing. The heifers were fed whole milk, complemented with varying amounts of concentrate and ad libitum alfalfa hay, until they started grazing at around three months of age. After 12 months of age, their feeding consisted exclusively of grazing. We considered that the heifers between 12 and 18 months of age had altered feed intake, probably because they spent a significant amount of time in management pens when heat detection, insemination procedures, and regrouping took place. After 18 months of age, when pregnancy was confirmed, the heifers were managed as a single group.
The body weight was recorded between 9 and 19 times during each heifer’s rearing. The individual growth curve was modeled using a 2 × 2 Fourier series (Equation (1)). Estimated values of the body weight (P, kg), first derivative (1D, kg d⁻¹), and second derivative (2D, kg d⁻²) were calculated at a three-month interval from the third to the twenty-first month of age. P was used in the modeling instead of the measured body weight because the age of weighing was irregular among the heifers.
where y is the response variable, a0, a and b are the Fourier coefficients, t is the time, and i is the period and the π constant.
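To illustrate how P, 1D and 2D follow from a fitted curve, the sketch below evaluates a two-harmonic Fourier series and its analytic first and second derivatives at the seven ages used in the study. The functional form y(t) = a0 + Σ [a_i cos(iωt) + b_i sin(iωt)] for i = 1, 2, the angular frequency ω = 2π/T, the period T, the 30.4-day month and all coefficient values are assumptions made for this example; they are not taken from the fitted models.

```python
import math

def fourier_eval(t, a0, a, b, omega):
    """Return (P, D1, D2) of a Fourier series at time t (days).

    Assumed form: y(t) = a0 + sum_i [a_i cos(i*omega*t) + b_i sin(i*omega*t)].
    """
    p, d1, d2 = a0, 0.0, 0.0
    for i, (ai, bi) in enumerate(zip(a, b), start=1):
        w = i * omega
        c, s = math.cos(w * t), math.sin(w * t)
        p += ai * c + bi * s              # body weight, kg
        d1 += w * (-ai * s + bi * c)      # instantaneous growth rate, kg/d
        d2 += -w * w * (ai * c + bi * s)  # change in growth rate, kg/d^2
    return p, d1, d2

# Hypothetical coefficients for one heifer (illustrative values only).
A0, A, B = 300.0, (-220.0, -25.0), (60.0, 10.0)
OMEGA = 2 * math.pi / 1500.0  # assumed period of 1500 days
DAYS_PER_MONTH = 30.4         # assumed conversion from months to days

for month in range(3, 22, 3):
    t = month * DAYS_PER_MONTH
    p, d1, d2 = fourier_eval(t, A0, A, B, OMEGA)
    print(f"{month:2d} mo  P={p:7.1f} kg  1D={d1:6.3f} kg/d  2D={d2:9.6f} kg/d^2")
```

Because the derivatives are obtained analytically from the same coefficients, 1D and 2D can be evaluated at any age without additional weighings.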
We used data from 78 Holstein-Friesian heifers with at least 293 days in milk. After calving, between 72 and 94 daily records were collected using proportional milk meters from Waikato LTD Co., Hamilton, New Zealand. Records of the 305-day milk yield were estimated from the daily milk weights by integrating a 2 × 2 Fourier function from day 1 to 305 (L, kg). The models were obtained using Table Curve 2D v5.01 software (Grafiti LLC, Palo Alto, CA, USA), and the normality of the residuals was tested using the Anderson–Darling test in the R program [25].
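The 305-day yield described above is the area under the fitted lactation curve. A minimal sketch of that integration is given below, assuming a two-harmonic Fourier form with angular frequency ω; the coefficients and the 400-day period are hypothetical and serve only to make the example runnable.

```python
import math

def fourier_integral(t0, t1, a0, a, b, omega):
    """Analytic integral of a0 + sum_i [a_i cos(i*w*t) + b_i sin(i*w*t)] from t0 to t1."""
    total = a0 * (t1 - t0)
    for i, (ai, bi) in enumerate(zip(a, b), start=1):
        w = i * omega
        total += ai / w * (math.sin(w * t1) - math.sin(w * t0))  # cosine terms
        total -= bi / w * (math.cos(w * t1) - math.cos(w * t0))  # sine terms
    return total

# Hypothetical daily-yield curve (kg/d); values are illustrative only.
A0, A, B = 18.0, (2.0, -0.5), (3.0, 0.8)
OMEGA = 2 * math.pi / 400.0  # assumed period of 400 days

L_305 = fourier_integral(1.0, 305.0, A0, A, B, OMEGA)
print(f"Estimated 305-day yield: {L_305:.0f} kg")
```

Integrating the fitted function in closed form avoids summing interpolated daily values when only a subset of test days was recorded.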
2.2. Modeling variables and scenarios
To model the future milk production, L was used as the response variable, while P, 1D, 2D, the age at effective artificial insemination (AI), and the season of the year in which the heifer was born (E) were used as explanatory variables. The milk yield in the first lactation, L, is an indicator of performance for other variables related to the productive life of the cow [7,8,9]. We hypothesized that P, 1D and 2D would predict L better at certain heifer ages, because the heifer weight, ADG, and body condition are positively correlated with L [10,11,12]. The inclusion of 1D and 2D in the models was supported by the usefulness of higher-order derivatives in guiding the optimization search of multivariate functions solved by artificial intelligence [20]. In particular, we were interested in exploring whether the 1D and 2D variables contribute to improving the modeling compared to the variables P, E, or AI.
The following scenarios were generated based on the explanatory variables used: (a) all the variables, (b) all except P, (c) all except 1D and 2D, (d) all except 1D, (e) all except 2D, and (f) all except AI. These scenarios were explored using estimates obtained at heifer ages of 3, 6, 9, 12, 15, 18, and 21 months. The database was divided into 85% of the records for training the models and 15% for testing them. Using bootstrapping [26], a sampling technique with replacement, six samples were taken from the database. Thus, there were 252 runs (6 scenarios × 7 heifer ages × 6 samples).
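The run design can be sketched as a simple grid. The example below enumerates the scenario, age and sample combinations and draws one train/test split per sample. How the bootstrap and the 85/15 split were combined is not detailed in the text, so the `make_split` helper (hold out 15% of the records, then resample the remainder with replacement) is an assumption, as are the seed and variable names.

```python
import itertools
import random

VARIABLES = ["P", "1D", "2D", "AI", "E"]
SCENARIOS = {            # variables dropped in each scenario
    "a": [],             # all variables
    "b": ["P"],
    "c": ["1D", "2D"],
    "d": ["1D"],
    "e": ["2D"],
    "f": ["AI"],
}
AGES_MONTHS = [3, 6, 9, 12, 15, 18, 21]
N_SAMPLES = 6
N_HEIFERS = 78

def make_split(n_records, test_frac, rng):
    """Hold out a test set, then bootstrap the remaining records for training."""
    idx = list(range(n_records))
    rng.shuffle(idx)
    n_test = round(test_frac * n_records)
    test, pool = idx[:n_test], idx[n_test:]
    train = [rng.choice(pool) for _ in pool]  # sampling with replacement
    return train, test

rng = random.Random(0)
splits = [make_split(N_HEIFERS, 0.15, rng) for _ in range(N_SAMPLES)]

runs = []
for (name, dropped), age, sample in itertools.product(
        SCENARIOS.items(), AGES_MONTHS, range(N_SAMPLES)):
    predictors = [v for v in VARIABLES if v not in dropped]
    runs.append({"scenario": name, "age": age, "sample": sample,
                 "predictors": predictors, "split": splits[sample]})

print(len(runs))  # 6 scenarios x 7 ages x 6 samples = 252
```

Holding out the test records before resampling keeps duplicated records from appearing in both the training and the test sets.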
2.3. Algorithms
Modeling was explored using multiple linear regression with the stepwise procedure; however, no model was significant at the 0.05 level for any of the scenarios. Therefore, we only report the modeling results obtained with the AutoML function of the H2O package v.2.32.14 in R [27]. AutoML allows for the exploration of the database and the identification of patterns that enable the creation of a predictive model using various machine-learning algorithms, such as the random forest, neural networks, deep learning, and generalized linear models, among others. For each of the 252 runs, hundreds of predictive models were obtained, from which AutoML could generate an ensemble of models. Such ensembles often have better predictive ability than the best model obtained using a single machine-learning algorithm. Within each run, the models were ordered according to their deviance, a likelihood-based measure of the goodness of fit [28]. A model is considered a better fit if it has a lower numerical value for this statistic.
Overfitting and multicollinearity are conditions that limit the quality and usefulness of regression models. These issues can be addressed in machine learning [29]; the AutoML procedure reduces them by thoroughly exploring the parameter values and conducting cross-validation through multiple runs of the algorithm [30]. Deviance was also used as a criterion to interrupt the processing of a run when no improvement was obtained in the cross-validation process. Only the best model from each run was chosen for further analysis.
2.4. Importance of variables and model interpretability
Deviance and the coefficient of determination (r²) were used to test the models by comparing the estimated values with the corresponding observed values from the records reserved for validation. For these two measures, the average and standard error (n = 6) were calculated for each scenario and heifer age. At the heifer age with the lowest deviance, we tested the hypothesis that the deviance was lower in the scenario including all the available variables than in the scenario without the derivatives. Only this test was performed, to control the type I error at a significance level of 0.05, using a one-tailed non-parametric Wilcoxon test.
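The two test statistics and their summary over the six samples can be sketched as follows. The example assumes a Gaussian response, for which the mean residual deviance equals the mean squared error, and takes r² as the squared Pearson correlation between observed and predicted values; both choices, and all the numbers, are assumptions for illustration rather than results of the study.

```python
import math
import statistics

def mean_residual_deviance(observed, predicted):
    """Gaussian deviance averaged over records (equal to the mean squared error)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = statistics.fmean(observed), statistics.fmean(predicted)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

def mean_and_se(values):
    """Mean and standard error of the mean."""
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))

# Made-up observed and predicted 305-day yields (kg) for one test set.
observed = [5200.0, 6100.0, 4800.0, 5600.0, 6400.0]
predicted = [5000.0, 6300.0, 5100.0, 5500.0, 6000.0]
print(mean_residual_deviance(observed, predicted), r_squared(observed, predicted))

# Made-up deviances from six samples of one scenario at one heifer age.
deviances = [410000.0, 365000.0, 390000.0, 402000.0, 371000.0, 388000.0]
print(mean_and_se(deviances))
```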
The contribution of the explanatory variables to reducing the model’s deviance is measured by the variable importance reported for each model, but this measure is not available for ensembles. Although machine-learning algorithms often yield more accurate predictions, they are frequently referred to as black box models because they lack an explanation of the underlying process that led to those predictions. However, there is currently a growing focus on interpreting the models generated by machine learning [31]. The SHapley Additive exPlanations (SHAP) values [32] are an option for model interpretability; they assist in the post hoc, model-agnostic interpretation of the variable importance. Each observed value is assigned a SHAP value, which represents its contribution to the value predicted by the model [33].
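The idea behind these values can be illustrated with a brute-force Shapley computation on a toy model. This is a sketch of the underlying definition, not the algorithm used by SHAP implementations, which rely on faster model-specific or approximate methods. The toy model, its coefficients, and the use of a single baseline record to represent an "absent" variable are all assumptions.

```python
import itertools
import math

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    A variable that is absent from a coalition takes its baseline value.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    for order in itertools.permutations(names):
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]    # add the variable to the coalition
            new = model(current)
            phi[name] += new - prev    # its marginal contribution
            prev = new
    n_perms = math.factorial(len(names))
    return {name: total / n_perms for name, total in phi.items()}

# Hypothetical toy model of L (kg); coefficients are invented for illustration.
def toy_model(f):
    return 5000.0 + 800.0 * f["1D"] - 2.0 * f["AI"] + 1.5 * f["P"] * f["1D"]

x = {"P": 160.0, "1D": 0.9, "AI": 480.0}         # one heifer
baseline = {"P": 150.0, "1D": 0.7, "AI": 450.0}  # reference record
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the heifer and the prediction for the reference record, which is the property that makes a summary plot of these values readable as a decomposition of each prediction.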
3. Results
The deviance was lower at six months of age when 1D was included as an explanatory variable in the model (Figure 1a,b,e,f). When 2D was used, but not 1D, the deviance decreased at 12 and 15 months compared to the other heifer ages (Figure 1d). When the derivatives were not included in the model, the deviance was high at any age (Figure 1c). When P was not included in the model, the deviance was only low at six months of age (Figure 1b). At six months of age, the deviance was lower in the scenario that included all the variables compared to the scenario where the derivatives were not included (p = 0.022). Interestingly, using all the variables, or excluding only AI, gave low deviance when using data from six and nine months of age (Figure 1a,f). Furthermore, the models including all the variables consistently had the lowest deviance at any age.
The r² values were the lowest when 1D and 2D were unavailable for model building (Figure 2c). The r² was consistently close to or slightly higher than 0.6 from three to twelve months of age when all the variables were included. At six months of age, the r² value exceeded 0.6 only in the scenarios where P was not included (Figure 2b), 2D was not included (Figure 2e) or AI was not included (Figure 2f). Modeling without AI would be viable at early stages of heifer rearing since the r² values were high when using data from 6 and 9 months of age.
The obtained deviance and r² values indicated that using all the variables from three to twelve months of age resulted in better models. In this case, 1D was the most important variable at six and nine months, while 2D was the most important at three and twelve months (Table 1). It is noteworthy that E was not utilized in the optimal model in any scenario or at any age. This result caught our attention because the production system under examination exhibits seasonal fluctuations in terms of forage availability. Additionally, heifers born in the autumn showed a slightly higher numerical value of L compared to those born in the spring (Table 2).
Generally, the deep learning (DL) algorithm (70%) and XGBoost (16%) were the most commonly used in all the scenarios. The ensembles only produced a superior model in 4.7% of runs. Neural networks are well known for classification problems, while XGBoost can be used directly for regression. Although machine-learning algorithms primarily focus on classification or regression problems, there is an overlap in their utility for solving these two types of problems [27]. The AutoML implementation does not provide SHAP values for DL models, but they are available for XGBoost and could offer insight into how the variables were important. The summary plot of the SHAP values (Figure 3) shows that 1D and AI were the most influential variables in the XGBoost model using all the variables at six months of age. High values for these variables had a negative impact on L; low values would contribute positively to L in the range of 200 to 300 kg. Although P had an average importance of 0.16 (Table 1), in this XGBoost model it was the least important.
4. Discussion
The models using 1D and 2D had lower deviance at younger heifer ages, indicating the usefulness of these variables. None of the explored models included interactions or quadratic effects to improve the relationship with L. Zanton and Heinrichs [34] showed that the average daily gain (ADG) before puberty explained L in a quadratic model, y = ADG + ADG² with r² = 0.61; however, in a linear model, y = ADG, it was not significant (r² = 0.06). With quadratic models, the relationship between the heifer weight and L was linear for weights at three months of age, but it was curvilinear for weights at six months or older for certain dairy breed groups, excluding Holstein-Friesian [11]. In a preliminary study, we used multiple regression analysis, but only identified non-significant relationships between the heifer weight and L as the main effect. Gelsiger et al. [35] used the ADG as a quadratic term, but they found that different management practices between barns had a greater effect on L than nutrition or the ADG. In our study, the database was limited in terms of the number of heifers with complete records and the number of variables included. A predictive model for commercial applications would be more accurate if it incorporated a larger number of records and variables, including those related to the environment, animal activity, management and parentage, among others. Aquilani et al. [36] reviewed the practical challenges of developing precision approaches to the control, monitoring, and observation of livestock in grassland agriculture. Thus, there is potential to build more comprehensive databases as an alternative to exploring variable transformations that increase the model complexity.
AI was an important variable when using data from different ages and when all the variables were considered (Table 1). When relying on this kind of model, the decision to cull the heifer would have to wait until it reached 17 months of age and the pregnancy was confirmed. For models like the one shown in Figure 3, the low SHAP values as the age at AI increased indicate a negative contribution to milk production. This finding conflicts with the accepted practice of not exposing very young heifers to breeding [10]. However, the low deviance and high r² of the models that used all the variables except AI at 6 and 9 months of age support our hypothesis that P, 1D and 2D can better predict L.
The naturally changing forage resource results in shifts in the feed intake and feed nutritional quality, but pasture-based conditions also lead to lower stress, improved hygiene, and better animal welfare compared to pen-fed conditions [37]. Overall, environmental conditions and their impact on phenotype can lead to fluctuations in the ADG. These fluctuations can be observed as changing values of 1D and 2D during the growth phase of heifers. In pen-fed animals, the fluctuations in the ADG may differ from those of grazing animals in terms of the magnitude, temporal occurrence, or frequency. In this sense, the modeling importance of 1D, 2D, and potentially higher-order derivatives was encouraging, but their usefulness needs to be confirmed in other rearing conditions for replacement heifers.
The modeling could be improved by increasing the execution time and reducing the number of algorithms included in the construction phase. It is possible that the solution space was not thoroughly explored due to the limited execution time of 200 s for completing a model. If the time limit was exceeded, the model would fail and be discarded.
Deep learning was the algorithm that provided the best solution in most runs. However, there were two main variants depending on how the model’s hyperparameters were optimized. The traditional method involved using a Cartesian grid, while the alternative method involved using a random search grid [27]. The average deviance for the Cartesian grid was 377,733, while for the random search grid it was 571,712. A similar pattern occurred for XGBoost, with deviance values of 491,663 and 748,663, respectively. Possibly, the random search for hyperparameters was not efficient within the allocated time, and the optimization of hyperparameters should be restricted to a Cartesian grid search.
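The difference between the two search strategies can be sketched in a few lines. A Cartesian grid evaluates every combination of the listed values, whereas a random search evaluates only a random subset limited by a budget. The hyperparameter names, their values and the budget below are hypothetical and do not reproduce the grids used by AutoML.

```python
import itertools
import random

# Hypothetical hyperparameter grid; names and values are illustrative only.
GRID = {
    "hidden_units": [10, 50, 100],
    "epochs": [10, 50],
    "l1": [0.0, 1e-4, 1e-3],
}

def cartesian_grid(grid):
    """Every combination of the listed values."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in itertools.product(*grid.values())]

def random_grid(grid, max_models, rng):
    """A random subset of the combinations, capped by a model budget."""
    combos = cartesian_grid(grid)
    return rng.sample(combos, min(max_models, len(combos)))

all_combos = cartesian_grid(GRID)
subset = random_grid(GRID, max_models=5, rng=random.Random(1))
print(len(all_combos), len(subset))  # 18 5
```

With a fixed time limit per model, a random search over a large space may sample few promising combinations, which is one possible reading of the higher deviances reported for that variant.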