The Use of Artificial Neural Networks and a General Discriminant Analysis for Predicting Culling Reasons in Holstein-Friesian Cows Based on First-Lactation Performance Records

Simple Summary Routinely collected data on the performance of dairy cows are a valuable source of information on the beginning, course, and completion of their productive life. As a result, when using sufficiently accurate methods, one can analyze and optimize the milk production process at a herd level from the breeding and economic point-of-view. In this context, it is important to have a possibility to early predict culling reasons for cows, since, in the case of finding an effective method, it would be possible to modify breeding actions and farm management practices without anticipating the end of the animals’ productive lives. Therefore, the aim of the present study was to verify whether artificial neural networks and a general discriminant analysis may be an effective tool for predicting the culling reasons in cows based on routinely collected first-lactation data. It turned out that they were most effective in predicting culling due to old age and reproductive problems. It is significant because infertility is one of the conditions that are the most difficult to eliminate in dairy herds. Abstract The aim of the present study was to verify whether artificial neural networks (ANN) may be an effective tool for predicting the culling reasons in cows based on routinely collected first-lactation records. Data on Holstein-Friesian cows culled in Poland between 2017 and 2018 were used in the present study. A general discriminant analysis (GDA) was applied as a reference method for ANN. Considering all predictive performance measures, ANN were the most effective in predicting the culling of cows due to old age (99.76–99.88% of correctly classified cases). In addition, a very high correct classification rate (99.24–99.98%) was obtained for culling the animals due to reproductive problems. It is significant because infertility is one of the conditions that are the most difficult to eliminate in dairy herds. The correct classification rate for individual culling reasons obtained with GDA (0.00–97.63%) was, in general, lower than that for multilayer perceptrons (MLP). The obtained results indicated that, in order to effectively predict the previously mentioned culling reasons, the following first-lactation parameters should be used: calving age, calving difficulty, and the characteristics of the lactation curve based on Wood’s model parameters.


Introduction
Routinely collected data on longevity and survivability of cows and their culling reasons may be used for the analysis of dairy cattle management and milk production profitability in individual animals, herds, or populations [1]. Therefore, these data are more frequently monitored in real time and relationships among them are analyzed in the context of animal performance prediction [2]. dairy cows, or the estimation of carcass weight in beef cattle [31], among others. Another, more traditional statistical approach (also belonging to data mining methods) is a general discriminant analysis (GDA). In this method, discriminant function analysis problems are solved using a general multivariate linear model, in which the dependent variables are binary vectors that reflect the class membership of each case (animal) [32,33]. GDA offers more possibilities than traditional discriminant function analysis, based on a classification rule, which allows for the correct classification of cows and the evaluation of classification accuracy depending on the adopted division criteria. In animal science, GDA has been used for dystocia detection in dairy cows [34] or the examination of factors affecting beef tenderness [35], among others.
The following research hypothesis was adopted in the present study. ANN may be an effective tool for predicting culling reasons in cows, based on routinely collected first-lactation data. Moreover, the effects of the prediction made by ANN were compared with those obtained using the GDA.

Animals
The analysis included data on Holstein-Friesian cows (from 466 herds) culled in Poland between 2017 and 2018. The animals were performance recorded. The data were obtained from the SYMLEK Polish National Milk Recording System. SYMLEK is a system of databases (including the results of data analysis for breeding purposes) on the population of dairy cattle under milk recording in Poland. At the breeding level, the system is managed by the Polish Federation of Cattle Breeders and Dairy Farmers, while ZETO Software is responsible for its technical (IT) side.

Estimation of Wood's Model Parameters
Based on milk yield from test-day records, the first-lactation curve parameters were estimated separately for each group (age at first calving × culling reason). For this purpose, the mean values of milk yield from test-day records were determined for each lactation stage. Ten lactation stages were distinguished at 30-day intervals (the first lactation stage from 5 to 30 days of lactation, the second lactation stage from 31 to 60 days, the third lactation stage from 61 to 90 days, the fourth lactation stage from 91 to 120 days, the fifth lactation stage from 121 to 150 days, the sixth lactation stage from 151 to 180 days, the seventh lactation stage from 181 to 210 days, the eighth lactation stage from 211 to 240 days, the ninth lactation stage from 241 to 270 days, and the 10 th lactation stage from 271 to 305 days). For the description of the lactation curve, the gamma function proposed by Wood [36] was used: where y is the milk production (kg) at time t (days), e is Napier's constant, a is the initial milk yield, b is the rate of increase until the peak is reached, and c is the rate of decline after peak production. The regression model parameters were estimated with the quasi-Newton method [37]. A total of 948,010 test-day records (for the first 305-day lactation) from 163,369 cows were used for estimating Wood's model parameters. The estimated Wood's model parameters (a, b, c) were used as explanatory (input) variables for further analysis.

Data Editing
When preparing the training set for classification using neural networks, only cows with a complete set of information were used. Records with less than 1 kg of milk, incomplete or erroneous ones (e.g., improbable minimal and maximal values of variables) were removed. In addition, only cows with at least nine test-day records were included in the analysis. The final dataset contained 50,879 cows.
In this dataset, the following explanatory (input) variables were included: X 1 -herdsize (from 3 to 1644 cows), X 2 -age at first calving (from 17 to 34 months), X 3 -lactation length (in days), X 4 -the number of first-lactation test-day records, X 5 -X 7 -Wood's model parameters (a, b, c, respectively) for individual categories (age at first calving × culling reason). Additionally, the following production traits for the first lactation were used as predictors (minimum, maximum, mean values, and standard deviations, respectively): X 8 -X 11 -daily milk yield (kg), X 12 -X 15 -fat content (%), X 16 -X 19 -protein content (%), X 20 -X 23 -lactose content (%), X 24 -X 27 -dry matter content (%), X 28 -X 31 -urea content (mg/L), and X 32 -X 35 -somatic cell count (thousand/mL). Moreover, first-lactation nominal variables such as X 36 -calving difficulty (according to the scale used for performance recording: easy, spontaneous, difficult, very difficult, abortion, and cesarean section) and X 37 -calving season (spring from 21 March to 20 June, summer from 21 June to 21 September, autumn from 22 September to 22 December, and winter from 23 December to 20 March) were included in the model.
Ultimately, the dataset was randomly divided into a training set (33,071 culling records, 65% of all observations), a validation set (used for controlling the network training process, 7632 records, 15% of all observations) and a test set (used for verifying the predictive performance of the models, 10,176 records, and 20% of all observations). The distribution of continuous and nominal predictors in individual sets is presented in Tables 2 and 3.   R1-infectious diseases, R2-respiratory system diseases, R3-low milk yield, R4-nutritive and metabolic diseases, R5-leg diseases, R6-udder diseases, R7-infertility and reproduction problems, R8-old age, R9-accidents, and R10-others.

Neural Network Analysis
Different multilayer perceptrons (MLP) with one hidden layer were analyzed. The hidden layer consisted of 5 to 30 neurons (the number of neurons was selected empirically). The number of neurons in the input layer was 45 and the calving season and calving difficulty variables were coded by four and six neurons, respectively (one-of-n encoding) (Figure 1). In the input layer, the min-max transformation was used for continuous variables. In the hidden and output neurons, different types of activation functions were verified (linear, logistic, hyperbolic tangent, and exponential). The networks were trained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is a powerful second order training algorithm with very fast convergence but high memory requirements due to storing the Hessian matrix [38]. For each analyzed network, a given number of iterations was carried out until reaching the minimum misclassification rate on the validation set. For the evaluation of the network during its training, two error functions were considered, i.e., the sum of squares and cross-entropy. The latter is calculated as the sum of the products of real values and error logarithms for each output neuron. Based on the obtained results, the classification matrix was created on the test set and predictive performance measures were calculated (the percentage of correctly classified cases from each category and the overall accuracy). In addition, the positive predictive values (PPV) were calculated, which showed the reliability of predictions made by the neural models. Finally, a sensitivity analysis was carried out for ANN, which allowed for the ordering of predictors (input variables) according to their relative importance. This analysis was based on two criteria: an error ratio, i.e., error when input was set to mean divided by the error when input was used (for continuous predictors) or an average error when input was set to all other categorical levels divided by the error when input was used (for categorical predictors), and a rank, which ordered predictors according to their decreasing importance from one (the most important predictor) to 37 (the least important predictor).

Training of the Neural Model with the Most Discriminative Predictors
Based on the results of sensitivity analysis (Table 4), the set of predictors was limited to the five most discriminative ones, adopting the value of an error ratio above 1.5. The entire procedure was the same as for the networks with the full set of predictors (the selection of the best network out of 10 initial networks, classification matrix, sensitivity analysis, and gains charts). Clearly, there were differences in the number of input neurons (ten, Figure 2). Based on the obtained results, the classification matrix was created on the test set and predictive performance measures were calculated (the percentage of correctly classified cases from each category and the overall accuracy). In addition, the positive predictive values (PPV) were calculated, which showed the reliability of predictions made by the neural models. Finally, a sensitivity analysis was carried out for ANN, which allowed for the ordering of predictors (input variables) according to their relative importance. This analysis was based on two criteria: an error ratio, i.e., error when input was set to mean divided by the error when input was used (for continuous predictors) or an average error when input was set to all other categorical levels divided by the error when input was used (for categorical predictors), and a rank, which ordered predictors according to their decreasing importance from one (the most important predictor) to 37 (the least important predictor).

Training of the Neural Model with the Most Discriminative Predictors
Based on the results of sensitivity analysis (Table 4), the set of predictors was limited to the five most discriminative ones, adopting the value of an error ratio above 1.5. The entire procedure was the same as for the networks with the full set of predictors (the selection of the best network out of 10 initial networks, classification matrix, sensitivity analysis, and gains charts). Clearly, there were differences in the number of input neurons (ten, Figure 2).

Discriminant Analysis
Based on the same dataset as for ANN, the GDA was carried out and the classification matrix was created on the test set for the models including all 37 initial or the five most discriminative predictors. The method of the GDA model building was described in more

Discriminant Analysis
Based on the same dataset as for ANN, the GDA was carried out and the classification matrix was created on the test set for the models including all 37 initial or the five most discriminative predictors. The method of the GDA model building was described in more detail by Zaborski et al. [34]. In addition, the predictive performance measures were calculated (the percentage of correctly classified cases from each culling category, the overall accuracy, and PPV).

Gains Charts
In order to better illustrate the predictive abilities of the neural and GDA models, cumulative gains charts were also plotted. These charts show the relationship between the cumulative gains (the proportion of cases from a given culling category among all the cases belonging to this category) and the percentage of cases predicted by the model as belonging to this category in the whole data set [39]. A diagonal crossing the (0,0) and (1,1) points (the baseline) indicates a random model (without any predictive capabilities). Therefore, the curves located above the diagonal are preferred [the closer the line to the (0,1) point, the better the model] [40].

Results
The most effective ANN with 37 predictors had a relatively, highly correct classification rate on the training and validation set (86.73%-96.17% and 87.33%-95.96%, respectively) ( Table 5). From among the analyzed ANN, the MLP with one hidden layer and a 45-29-10 structure (the number of neurons in the input, hidden and output layer, respectively) was selected ( Figure 1). This perceptron (denoted as MLP37) had the highest correct classification rate on the validation set. The applied training algorithm included 320 iterations. The cross-entropy error function was applied together with the SoftMax activation function in the network output layer. A hyperbolic tangent activation function was used in the hidden layer. The sensitivity analysis of MLP37 showed that the greatest influence on the output variable was exerted by lactation curve parameters (a, b, c), age at first calving, and calving difficulty. Their error ratio ranged from 8.870 (calving difficulty) to 188.901 (the a parameter). Therefore, these variables were used as the only input variables for the network with a reduced set of predictors. The remaining input variables for MLP37 had a much lower error ratio, i.e., below 2 (Table 4).
In comparison with the best networks containing 37 predictors, the networks with a lower number of predictors were characterized by the lower values of a correct classification rate both on the training (71.46%-83.01%) and validation (71.40%-83.52%) set. Among these networks, the MLP with one hidden layer and a 10-19-10 structure (denoted as MLP5) was the most effective ( Figure 2). The applied training algorithm included 227 iterations. To evaluate the network performance during its training (like for MLP37), an entropy error function was used, which was applied together with the SoftMax activation function in the network output layer. Similarly, a hyperbolic tangent activation function was used in the hidden layer ( Table 5).
As can be seen from Table 3, the most frequent culling reasons in the test set were: reproductive problems (4055 records), udder diseases (1314 records), and accidents and leg diseases (1047 and 1025 records, respectively). Nevertheless, both MLP37 and MLP5 almost always correctly classified culling records from an old age category (R8) ( Table 6). A very high correct classification rate (at least 99%) was also found for cows culled due to reproductive problems (R7). In other cases, the percentage of correct classification was 91%-97% for MLP37 (except for low milk yield-R3, for which it was 77%) and 51%-88% for MLP5. On the other hand, the lowest correct classification rate (77% and 55% for MLP37 and MLP5, respectively) was observed for low milk yield (R3). The percentage of correct classification for individual culling reasons obtained with GDA37 and GDA5 was, in general, lower than that for MLP37 and MLP5 (Table 7).  R1-infectious diseases, R2-respiratory system diseases, R3-low milk yield, R4nutritive and metabolic diseases, R5-leg diseases, R6-udder diseases, R7-infertility and reproduction problems, R8-old age, R9-accidents, and R10-other. The numbers of correctly classified cases are shown on the diagonal. A significant indicator of the predictive abilities of ANN and GDA was also the reliability of prediction. In the present study, PPV were used for this purpose (Table 7). In general, these values were quite high for both neural models (88.31-100% for MLP37 and 68.15-100% for MLP5). For GDA, they ranged from 0% to 83.33% (GDA37) and from 0% to 100% (GDA5). In order to get an even better insight into the prediction reliability of ANN and GDA, the cumulative gains charts were plotted and analyzed, which illustrated the relationship between the percentage of correctly classified cases from a given category and the percentage of records from the dataset ordered, according to the predicted probability of the class assignment. It should be emphasized that the gain curves for most culling reasons predicted by ANN were located as much higher than the baseline [near the (0, 100%) point], which indicates a high prediction reliability of the neural models (Figures 3 and 4). However, this time, MLP37 was (like for the previously reported results) more effective than MLP5. When interpreting gains charts for individual categories, one should also consider the percentage of cases from a given category in the whole dataset. Consequently, the course of the curve for udder diseases was not optimal (from the first 13% of observations classified with the highest probability to this category by the model, which about 80% belonged to this class). On the other hand, the curve for reproductive problems passed much closer to the baseline, even though the prediction reliability was very high. This resulted from the fact that the percentage of cases from this category in the whole dataset was about 40%. The gains obtained with both MLP37 and MLP5 were the highest (besides reproductive problems) for such culling reasons as: old age, accidents, or leg diseases. The gains for the GDA models with 37 and five predictors were much lower (Figures 5 and 6). In principle, the gains curves for individual categories (except for old age) were located very close to the baseline, and some curves were even below this line, which shows the uselessness of such models, since better results can be obtained in a purely random manner (without any model). n-number of records, MLP37-multilayer perceptron with one hidden layer and 37 input variables, MLP5-multilayer perceptron with one hidden layer and five input variables, GDA37-general discriminant analysis with 37 predictors, GDA5-general discriminant analysis with five predictors, Cor.-correct, Incor.-incorrect, PPV-positive predictive value, R1-infectious diseases, R2-respiratory system diseases, R3-low milk yield, R4-nutritive and metabolic diseases, R5-leg diseases, R6-udder diseases, R7-infertility and reproduction problems, R8-old age, R9-accidents, and R10-others.
the whole dataset was about 40%. The gains obtained with both MLP37 and MLP5 were the highest (besides reproductive problems) for such culling reasons as: old age, accidents, or leg diseases. The gains for the GDA models with 37 and five predictors were much lower ( Figures 5 and 6). In principle, the gains curves for individual categories (except for old age) were located very close to the baseline, and some curves were even below this line, which shows the uselessness of such models, since better results can be obtained in a purely random manner (without any model).

Discussion
In the present study, old age was the most accurately predicted culling reason by both MLP37 and MLP5. In addition, prediction reliability for ANN was high for this category, which indicates that almost all animals predicted by ANN to be culled due to

Discussion
In the present study, old age was the most accurately predicted culling reason by both MLP37 and MLP5. In addition, prediction reliability for ANN was high for this category, which indicates that almost all animals predicted by ANN to be culled due to old age, really belonged to this category. The gains charts were also nearly optimal. Similar values of individual performance indicators were obtained for GDA. Therefore, in this case, it is really possible to include only lactation curve parameters (from Wood's model), age at first calving, and calving difficulty as the only predictors in primiparous cows. It may be of great importance from the production practice point-of-view, since, among all culling reasons, old age directly indicates the productive lifespan of dairy cows, considering the longest-living animals. The possibility of an early prediction of the maximum lifespan of cows (based on first-lactation parameters) may provide information required for the assessment of milk production profitability [41] and the modification of breeding programs for dairy cattle in terms of their longevity [42]. It should also be noted that many authors [1,[42][43][44] have indicated cow longevity as one of the most important measures of animal welfare. It is highly significant from both a breeding and production point-of-view and due to the increasingly higher sensitivity of dairy product consumers to the human-animal relationship [45]. Therefore, it seems that the prediction of the maximum length of productive life in high-yielding cows should be interesting for both breeders/milk producers and the food industry.
On the other hand, reproductive problems, as the second culling reason (after old age), most accurately predicted by MLP37 and MLP5, belong to the most frequent difficulties encountered in production practice. It is estimated that they account for approximately 20%-40% of culled dairy cows [1,46]. A highly correct classification rate (at least 99% in the case of ANN) for this category was also accompanied by high prediction reliability and high gains (considering the fact that this category was the most frequent one). Cows with reproductive problems were sometimes incorrectly classified to such categories as: leg diseases, udder diseases, accidents, other reasons, and (to a lesser extent) low yield, respiratory system diseases, metabolic diseases, and old age. It may have resulted from the relationship between reproductive problems in cows and other culling reasons. An association between cow fertility and respiratory system diseases [47,48], milk yield level [49,50], metabolic diseases [51][52][53], leg diseases [54][55][56], and udder diseases [57,58] has been shown. In the present study, ANN incorrectly classified reproductive problems in cows in these cases. At the same time, this result supports the suggestion made by Adamczyk et al. [59], who recommended to consider not only the ultimate culling reasons but also the mutual relationships among individual reasons and the life-history of cows when analyzing longevity and indicating culling reasons for these animals. Cows culled due to reproductive problems were also accurately predicted by GDA. However, PPV and gains were lower than those for ANN considering the proportion of this class in the whole dataset. Similar results were obtained for GDA with a reduced set of predictors.
For the potential application of ANN in dairy production, prediction of culling reasons in cows should be considered in a broader context, i.e., concerning the lifetime performance of animals. In this regard, Kumar and Hooda [60] stated that artificial intelligence may be successfully applied to the prediction of lifetime milk yield of cows based on age at first calving, calving interval, and some parameters of the first and second lactation (service period, lactation milk yield, lactation length, and dry period), whereas Bhosale and Singh [61] reported that, for the effective prediction of lifetime milk yield in crossbred cows with a proportion of Holstein-Friesian genes, it is sufficient to include only the first-lactation parameters in the ANN input layer (lactation length, peak yield, and lactation total milk yield). Moreover, ANN was very effective in this case for both smaller (fewer than 10 cows) and larger herds. Considering the results reported by Bhosale and Singh [61], it should be noted that, in the present study, the predictive abilities of ANN were confirmed based on the first-lactation data, including Wood's model parameters. Consequently, the effectiveness of ANN in predicting phenotypic milk performance traits is even more important due to the fact that it corresponds to the significant abilities of ANN to predict breeding value of dairy cattle [62].
In this context, a more traditional method such as a discriminant function analysis was much less effective when compared with ANN including 37 or five predictors. In addition, the application of GDA is associated with certain assumptions about predictors, especially multicollinearity, which limits its applicability [33]. Predictors should not be correlated with each other since this causes computational problems. These assumptions, however, are not so important for ANN.

Conclusions
In the present study, it was shown that artificial neural networks may be an effective method of classifying cows culled due to old age based on routinely collected first-lactation data. Among the remaining culling reasons, a highly correct classification rate was observed for reproductive problems. An association between this culling reason and low milk yield, udder diseases, metabolic diseases, leg diseases, and respiratory system diseases was also confirmed in our study. It should be emphasized that, for the effective prediction of culling reasons, it was sufficient to include such first-lactation traits as calving age, calving difficulty, and the characteristics of the lactation curve (Wood's model parameters). The confirmed abilities of ANN may constitute a valuable source of information that can be used for breeding programs' modification in Holstein-Friesian cattle and economic model optimization for dairy herds.