Application of Machine Learning Algorithms to Predict Body Condition Score from Liveweight Records of Mature Romney Ewes

Body condition score (BCS) in sheep (Ovis aries) is a widely used subjective measure of the degree of soft tissue coverage. Body condition score and liveweight are statistically related in ewes; therefore, it was hypothesized that BCS could be accurately predicted from liveweight using machine learning models. Individual ewe liveweight and body condition score data at each stage of the annual cycle (pre-breeding, pregnancy diagnosis, pre-lambing and weaning) at 43 to 54 months of age were used. Nine machine learning (ML) algorithms (ordinal logistic regression, multinomial regression, linear discriminant analysis, classification and regression tree, random forest, k-nearest neighbors, support vector machine, neural networks and gradient boosting decision trees) were applied to predict BCS from a ewe’s current and previous liveweight record. A three class BCS (1.0–2.0, 2.5–3.5, >3.5) scale was used due to high-class imbalance in the five-scale BCS data. The results showed that using ML to predict ewe BCS at 43 to 54 months of age from current and previous liveweight could be achieved with high accuracy (>85%) across all stages of the annual cycle. The gradient boosting decision tree algorithm (XGB) was the most efficient for BCS prediction regardless of season. All models had balanced specificity and sensitivity. The findings suggest that there is potential for predicting ewe BCS from liveweight using classification machine learning algorithms.


Introduction
Body condition score (BCS) in sheep (Ovis aries) is a widely used subjective measure of the degree of soft tissue coverage (predominantly fat and muscle) of the lumbar vertebrae region [1,2]. Body condition score is based on a 1-5 scale using half units or quarter units and is conducted by palpation of the lumbar vertebrae immediately caudal to the last rib above the kidneys [2]. Unlike liveweight (LW), BCS is not affected by fluctuations in gut-fill, fleece weight and frame size, which confound liveweight as a measure of animal size to give an indication of body condition [3]. BCS can be easily learned and is cost-effective and requires no specialist equipment [2]. The optimal BCS range for ewe performance is 2.5 to 3.5 [2]; outside this range performance is either adversely affected or it is inefficient in terms of performance per kilogram of feed eaten [4]. Farmers can use targeted feeding based on this optimal range to optimize overall performance.
Despite the advantages of using BCS over liveweight (LW) for flock management, many farmers in extensive farming systems do not regularly do so. For instance, only 7% and 40% of the farmers indicated that they conducted hands-on BCS in Australia and New Zealand, respectively [5,6]. Farmers often rely on visual inspection, a method which is inaccurate, or they use only liveweight measurement [7], which is influenced by factors including gut fill variation, frame size, physiological stage and fleece weight [2]. The low uptake of BCS among farmers may in some part be due to challenges such as assessor subjectivity and extra labor requirements [2]. Attempts to increase the uptake of BCS among farmers, including the use of promotional training workshops and regular training, have not yielded the desired outcome, likely because they do not directly alleviate the labor burden related to hands-on BCS [2]. Therefore, accurate and reliable alternative methods to estimate body condition score with less hands-on measurement would be advantageous and would likely improve the uptake of BCS technology, especially for large flocks.
Ewe BCS and LW are correlated [2,8,9]. This relationship varies by age, stage of the annual cycle and breed of animal [8,9]. Semakula et al. [9] reported that in Romney ewes, both LW and BCS plateaued after they reached 43-54 months of age, thereby establishing a stable base BCS-LW relationship. This means that, as a ewe ages, future liveweights, based on BCS-LW prediction equations, could potentially be used to predict a BCS with a degree of accuracy and reduce the need for hands-on BCS measurement.
Modern automated weighing systems with individual electronic identification offer an opportunity to collect lifetime data relatively easily and quickly. With such large datasets, it has become possible to process and extract valuable information. Semakula et al. [10] applied multivariate regression models to predict ewe BCS from lifetime liveweight data as a ewe aged from eight to sixty-seven months. At best, these multivariate models explained 49% and 21% of the variability in BCS using the five-scale (nine points) and three-scale (three points), respectively. Further, BCS was skewed with little variability due to the limited nature of the BCS scale used (1-5, in increments of 0.5). Using only discrete values such as BCS can lead to the heaping or grouping of all possible values (i.e., noncontinuous) at isolated points, affecting the resolution and ultimately the accuracy of any prediction model.
Approaches that circumvent the challenges of treating discrete data as continuous are required for BCS prediction. Classification-based models are recommended for discrete and categorical data analysis [11-14]. Among these classification approaches, machine learning (ML) classification models have been used with greater success compared to traditional statistical methods in sheep production for early estimation of the growth and quality of wool in adult Australian Merino sheep [15] and sheep carcass traits [16] from early-life data. Machine learning utilizes algorithms whose logic can be learned directly from unique patterns in the data or inexplicitly through pre-programmed classical statistical methods [17]. The successful use of ML algorithms in various fields of science warrants their application in animal production problem solving [18,19]. Ideally, it should be possible to install this computer-acquired intelligence into modern weighing systems to automatically explore patterns in lifetime liveweights and predict BCS. The aim of this study was to investigate the use of machine learning algorithms to predict ewe BCS from current and previous liveweight data. In the present study, ewe BCS was predicted for the ewes in their fourth year of life (43-54 months) at four stages of the annual cycle using previous liveweight measurements.

Farms and Animals Used and Data Collection
The current study was a follow-up of the previous two studies [9,10]. In their study, Semakula et al. [9] only determined the nature of the relationship between LW and BCS (linear) and the factors affecting their relationship (ewe age, stage of annual cycle and pregnancy rank). In the subsequent study, Semakula et al. [10] demonstrated the potential of predicting ewe BCS as a continuous variable from liveweight and previous BCS records. The resulting linear models had high prediction error (>10%), and a greater part of the variability in BCS (from 39 to 89%) remained unexplained. The current study attempts to predict BCS from LW records in a more precise way, using machine learning algorithms. The details on how the animals were managed and how the data were collected were reported in Semakula et al. [9].

Statistical Analyses
Data were analyzed using R program version 4.3.4 [20] with caret package extensions [21]. Data were initially explored to identify completeness and were summarized by BCS to determine class distribution. Missing values (n = 26) were imputed using the bagImpute method from the caret package. This method constructs a "bagging" model for a given variable based on regression trees, using all other variables as predictors while maintaining the original data distribution structure [21]. Liveweight data were normalized and centered during analysis using the preProcess function from the caret package. The distribution of BCS at all stages of the annual cycle showed that on a full BCS scale (1-5), there were high-class imbalances (more than 1:50 for any two classes). The average ratios of the class frequencies (minimum:maximum) were 1:216, 1:1336, 1:498 and 1:97 for pre-breeding, pregnancy diagnosis, pre-lambing and weaning, respectively (Figure 1A). The high-class or extreme imbalance was due to too few extreme BCS cases, with the majority of individual BCS measurements ranging from 2.5 to 3.5.
Triguero et al. [22] categorized class imbalances above 50:1 for any two outcomes as high-class imbalance. Body condition score data are both discrete and ordered in nature, which makes multiclass classification regression approaches more suitable for their analysis. However, when the underlying assumptions are grossly violated or when classes are extremely imbalanced [23], classification statistical methods become less accurate [24]. Strategies to overcome the challenge of class imbalance may include resampling techniques such as oversampling, undersampling and synthetic minority oversampling [25]. Such methods of circumventing class imbalances hold in cases below 50:1 imbalance. In the case of high-class imbalance, the samples generated become less representative of the true sample distribution, leading to underfitting or overfitting of the model.
To improve the balance of the BCS class distribution, a new but narrower three-class BCS scale was devised (BCS 1.0-2.0: 1, 2.5-3.5: 2 and >3.5: 3) (Figure 1B). The selection of a new scale was guided by literature, where BCS of 2.5 to 3.5 is considered to be the range for optimal performance [2]. Below this BCS range, there is reduced performance; above this range, energy is used inefficiently. In addition, the resulting classes were resampled through minority class oversampling to create "synthetic" data, a method popularly known as SMOTE [25], using the SmoteClassif function in the UBL package [26]. Resampling improves the class-level distribution (balances the number of per class observations) of a categorical variable so that the assumptions of classification models can hold.
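The regrouping and resampling steps can be illustrated with a minimal Python sketch. It is not the SmoteClassif routine used in the study: the function names, the liveweight records and the simplified interpolation rule (any same-class record may serve as the partner, rather than one of the k nearest neighbours as in SMOTE proper) are assumptions made for illustration, and the mapping assumes half-unit scores.

```python
import random

def bcs_to_class(bcs):
    """Map a half-unit BCS on the 1-5 scale to the three-class scale."""
    if bcs <= 2.0:
        return 1  # BCS 1.0-2.0
    if bcs <= 3.5:
        return 2  # BCS 2.5-3.5
    return 3      # BCS >3.5

def smote_like_oversample(rows, labels, seed=0):
    """Balance classes by interpolating between random same-class records.

    Simplification (assumption): SMOTE proper interpolates towards one of
    the k nearest neighbours; here the partner is any same-class record.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = list(rows), list(labels)
    for lab, members in by_class.items():
        for _ in range(target - len(members)):
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            # Synthetic record lies on the segment between a and b.
            out_rows.append([x + t * (y - x) for x, y in zip(a, b)])
            out_labels.append(lab)
    return out_rows, out_labels

# Hypothetical records: [current liveweight, previous liveweight] in kg.
liveweights = [[58.0, 56.5], [61.2, 60.0], [66.4, 65.1], [70.3, 69.0],
               [72.8, 71.5], [80.1, 78.4], [63.0, 62.2], [68.9, 67.0]]
bcs = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 2.5, 3.0]

classes = [bcs_to_class(score) for score in bcs]
X_bal, y_bal = smote_like_oversample(liveweights, classes)
counts = {c: y_bal.count(c) for c in sorted(set(y_bal))}
print(counts)
```

After resampling, every class holds as many records as the largest original class, which is the balancing effect described above.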

Variable Selection and Model Building
The best variable combinations for prediction of BCS (1, 2 or 3) at each stage of the annual cycle using liveweight were selected through the regularization and variable selection technique utilizing the elastic net method in the glmnet extension [27] in the caret package [21]. The elastic net method combines the power of two penalized-regularization methods (ridge and lasso regression) to search for significant predictors and to handle collinearity [28].
All models were fitted and validated using four steps as described by Semakula et al. [9]. The steps included: (i) data partitioning, (ii) resampling, (iii) model training and (iv) validation. Data were partitioned with stratification into training and testing datasets in a ratio of 3:1, with replacement. Resampling was done using the bootstrapping and aggregation [29] procedures in the caret package [21]. During resampling, 10 equal-sized subsamples, repeated three times, were selected from the dataset. Prediction models were trained on nine subsample sets which were used to compute the parameters, and the 10th was used to evaluate the model as well as compute the error. The procedure was run 30 times (10-folds repeated three times), and the average parameter values and their probabilities were computed as described by Semakula et al. [9].
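The partitioning and resampling steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the caret code used in the study: the function names and class sizes are hypothetical, the split is drawn without replacement, and the resampling follows the fold-based description in the text (nine subsamples for fitting, the tenth held out, repeated three times).

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.75, seed=1):
    """Return (train_idx, test_idx) with a 3:1 split inside each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return sorted(train), sorted(test)

def repeated_kfold(indices, k=10, repeats=3, seed=1):
    """Yield (fit_idx, holdout_idx) pairs: k folds, repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for held_out in range(k):
            fit = [i for j, fold in enumerate(folds) if j != held_out
                   for i in fold]
            yield fit, folds[held_out]

# Hypothetical balanced three-class labels (40 records per class).
labels = [1] * 40 + [2] * 40 + [3] * 40
train_idx, test_idx = stratified_split(labels)
resamples = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(resamples))
```

Each of the 30 resamples would be used to fit a model on the nine retained subsamples and to score it on the held-out one; the test partition is kept aside for the final evaluation.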
The algorithms used for this work were selected from a range of probabilistic and nonprobabilistic methods in order to cover the most commonly used machine learning algorithms [17,30]. A summary of the concepts, advantages and disadvantages of each algorithm is given in Table A1 in Appendix A. Further, the criteria for selecting these methods included (i) successful application in other animal science studies [16,19,20] and (ii) ability to handle multiclass categorization [24]. Three traditional statistical models (white box or low-level machine learning models: ordinal logistic regression, multinomial regression [31,32] and linear discriminant analysis (LDA) [33]), two low-level black box models (random forest (RF) [34] and classification and regression trees (CART) [35]) and four high-level black box models (support vector machines (SVM) [36], k-nearest neighbors (K-NN) [37,38], neural networks (ANN) and gradient boosting decision trees (XGB) [39]) were compared. Machine learning models can be categorized in two main ways: (i) by whether the data provide labels that classify variables (supervised) or not (unsupervised) [40]; and (ii) by whether a clear description of how covariates and the target variable are related can be given (classical statistical methods or white boxes), only a partial description or blueprint can be given (low-level or semi-black boxes), or no description can be given (high-level black boxes) [17]. All algorithms were implemented in R using several caret package extensions (nnet, multinom, polr, lda, rpart, svmLinear, xgbLinear, rf and knn (http://topepo.github.io/caret/index.html, accessed on 19 January 2021)). A chart summarizing the model building and evaluation procedures is given in the appendices (Figure A1).

Model Performance Evaluation
Using a three-class BCS scale (1.0-2.0, 2.5-3.5, >3.5), model fit and ranking between models were assessed using overall accuracy, balanced accuracy, precision, F-measure, sensitivity and specificity. The metrics were computed from the number of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) predictions as described by Tharwat [24]. In addition, Cohen's kappa statistic [41], a common measure of agreement between classifications of qualitative observations, was calculated as described by McHugh [42] and Botchkarev [43]. To evaluate the power of the algorithms to correctly classify ewe BCS, measures of the balance (authenticity and prediction power) between sensitivity and specificity were computed. These indicators of model power and authenticity (positive likelihood ratio, negative likelihood ratio and Youden's index) combine sensitivity and specificity to emphasize how well a model can predict the outcome [44].
A detailed description of the metrics (accuracy and authenticity) used in model assessment is given in Table 1.
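As an illustration of how such metrics follow from TP, TN, FP and FN counts in a three-class problem, a minimal Python sketch is given here. The confusion matrix is hypothetical, the function names are not from the study, and per-class balanced accuracy is assumed to be the mean of sensitivity and specificity (the one-vs-rest convention reported by caret's confusionMatrix).

```python
def class_metrics(confusion, k):
    """One-vs-rest metrics for class index k.

    confusion[i][j] = number of records of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        # Assumption: balanced accuracy as the mean of the two rates.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

def cohen_kappa(confusion):
    """Agreement between true and predicted classes beyond chance."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    po = sum(confusion[i][i] for i in range(n)) / total
    pe = sum(sum(confusion[i]) * sum(row[i] for row in confusion)
             for i in range(n)) / total ** 2
    return (po - pe) / (1 - pe)

# Hypothetical confusion matrix for BCS classes 1, 2 and 3 (rows = truth).
confusion = [[45, 4, 1],
             [5, 40, 5],
             [0, 6, 44]]

class_1 = class_metrics(confusion, 0)
kappa = cohen_kappa(confusion)
print(class_1, round(kappa, 3))
```

Overall accuracy is the diagonal sum divided by the total (129/150 for this matrix), and kappa rescales it against the agreement expected by chance.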
Table 1. Model performance evaluation metrics.

Balanced accuracy
The proportion of correctly classified subjects for each class. Useful especially when there is class imbalance.

Negative likelihood ratio (NLR)
The ratio between the false negative and true negative rates; it mirrors the probability for "negative" events to be detected by a model.
NLR = (1 − Sensitivity) / Specificity

Youden's index (YI)
The sum of sensitivity and specificity minus one.
YI = (Sensitivity + Specificity) − 1

Cohen's kappa (κ)
Measures the degree of agreement between two raters or ratings (inter-rater or intra-rater reliability).
κ = (po − pe) / (1 − pe), where po is the observed agreement and pe is the agreement expected by chance.

The analysis generated a dataset of 108 records (4 time points, 3 BCS classes and 9 models) for two groups of model performance evaluation metrics: firstly, the indicators of accuracy (balanced accuracy, precision and F-measure), and secondly, the measures of model authenticity (sensitivity and specificity). To obtain a holistic picture of the overall model performance, the two groups of performance metrics were examined. Initially, each group of variables was explored using principal component analysis (PCA) to determine the appropriate number of components or dimensions, where the eigenvalues associated with each component were compared with those generated through a probabilistic process based on Monte Carlo PCA for parallel analysis simulation [45,46]. Monte Carlo PCA simulated eigenvalues allow comparisons based on the same sample size and number of variables. If the eigenvalue of a component from real data is greater than the simulated one, then that component is important. Otherwise, if it is equal to or less than the simulated one, the component is considered not important. Consequently, one component was considered important from each group of variables (indicators of accuracy: explained variance = 87%; indicators of sensitivity-specificity: explained variance = 61%), having explained most of the variability in the group data.
Principal component analysis is limited to continuous data. In order to decipher the patterns in the relationship between the categorical variable (BCS) and each model regarding their overall performance, a correspondence analysis was required. Therefore, the FAMD function in the FactoMineR package [47] was used to analyze both groups of variables. The FAMD extension combines PCA and multiple correspondence analysis (MCA) to conduct factor analysis. Each group of variables then resulted in a single dimension (latent variable). A scatterplot of accuracy and sensitivity-specificity latent variables was constructed for each of the four stages of the annual sheep weighing cycle. Models were ranked on a scale of 1 to 9 (where 1 is best and 9 is the poorest) at each stage of the annual cycle, to obtain the overall performance rank.
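The ranking step can be sketched as follows. The scores are hypothetical latent-variable values for four of the nine models at two stages, and combining stages by averaging the per-stage ranks is an assumption made for illustration; the text does not state how the stage ranks were aggregated, and ties are not handled.

```python
def rank_models(scores):
    """Rank models from 1 (best) to n (poorest) by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position for position, model in enumerate(ordered, start=1)}

def overall_rank(stage_scores):
    """Average the per-stage ranks, then rank the averages (lower is better)."""
    per_stage = [rank_models(scores) for scores in stage_scores.values()]
    models = list(per_stage[0])
    mean_rank = {m: sum(r[m] for r in per_stage) / len(per_stage)
                 for m in models}
    ordered = sorted(mean_rank, key=mean_rank.get)
    final = {m: position for position, m in enumerate(ordered, start=1)}
    return final, mean_rank

# Hypothetical latent-variable scores (higher = better overall performance).
stage_scores = {
    "pre-breeding": {"XGB": 2.1, "RF": 1.6, "K-NN": 1.2, "CART": -2.4},
    "weaning": {"XGB": 2.3, "RF": 1.4, "K-NN": 1.3, "CART": -2.0},
}

final, mean_rank = overall_rank(stage_scores)
print(final)
```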

Overall Performance of Machine Learning Models
This section presents results for the accuracy in a broad sense, sensitivity and specificity of nine models in predicting ewe BCS based on the testing dataset (Tables 2 and 3).
Additionally, Table A2 in the appendix shows the comparisons between model accuracy across stages of the annual sheep weighing cycle in New Zealand.
Model: XGB: gradient boosting decision trees; RF: random forest; K-NN: k-nearest neighbors; SVM: support vector machines; ANN: neural networks; Multinom: multinomial regression; LDA: linear discriminant analysis; Ordinal: ordinal logistic regression; CART: classification and regression tree. The superscripts 1, 2 and 3 indicate the BCS class from which a value was observed (1: 1.0-2.0; 2: 2.5-3.5; 3: >3.5). In their sequence, the first superscript indicates the class from which the minimum estimate was observed, while the second indicates the class from which the maximum estimate was achieved. All ewe BCS predictions were based on current and previous liveweight.
Results showed that there were significant (p < 0.05) differences in model prediction performance based on the Bonferroni p-value adjustment method for pairwise comparisons (Table A2, Appendix A). The gradient boosting decision tree algorithm (XGB) had the highest (p < 0.05) accuracy (average = 90.3%) and kappa statistic (κ = 82.1%) at pre-breeding, pregnancy diagnosis, pre-lambing and weaning, making it the most accurate algorithm for ewe BCS prediction on the one to three (1.0-2.0; 2.5-3.5; >3.5) scale (Table 2). The RF (Figure A2, Appendix A) algorithm had a slightly lower but still good accuracy, making it the best alternative to XGB. The multinom, LDA, ordinal and CART algorithms had moderate to fair accuracies. At pre-lambing, XGB and RF were comparable and had the highest accuracies. The random forest and k-nearest neighbors (K-NN), in decreasing order, were also considered good prediction models, having scored above 80% accuracy and 70% kappa statistics at all times of the year. The CART algorithm consistently gave the lowest (p > 0.05) accuracy except at pre-lambing, where its accuracy was (p = 0.047; Table A2) comparable to that of ordinal logistic regression. The lowest average accuracy was 66.6%, seen for the CART model at weaning (Table 2, parenthesis). Overall, all algorithms had greater accuracy than a random guess (i.e., accuracy = 33.3%) in classifying BCS.
In terms of overall authenticity, models were biased towards being more specific than sensitive (Table 3). The ranking of model authenticity followed a trend like that of accuracy. The gradient boosting decision tree algorithm (XGB) had the highest sensitivity (average = 87.7%) as well as specificity (average = 93.9%) across all stages of the annual sheep weighing cycle, making it the most authentic and powerful algorithm for categorizing ewes into the correct BCS classes on the three-point scale (1.0-2.0; 2.5-3.5; >3.5) (Table 3). The XGB model was closely followed by RF (average sensitivity = 85.5%, average specificity = 92.8%), while CART (average sensitivity = 58.7%, average specificity = 79.5%) was the poorest.
In the following section we present results for the construct or latent variables which are representative of the three specific measures of model accuracy (class-level or balanced accuracy, precision and F-measure) together with two indicators of predictive power/authenticity (sensitivity, specificity) across four stages of the annual sheep weighing cycles (Figures 2-5).A summary of the indicators of accuracy and authenticity was provided in Tables 2 and 3. Additionally, Table A3 provides two extra measures of accuracy (precision and F-measure) used in the construction of the accuracy latent variable.The results show the patterns in the relationship between the latent variables with BCS class prediction for each model.The CART model had the lowest accuracy and power measures across all stages of the annual sheep weighing cycle and was selected as the reference for comparisons.

[Figure: accuracy latent variable (30.35%) plotted against sensitivity-specificity latent variable (23.48%); caption truncated.]

Pre-Lambing
At pre-lambing, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 4). It was worth noting that the K-NN model, which had been among the best four models at pre-breeding and pregnancy diagnosis, was downgraded to a moderate model. The K-NN, multinom and LDA models had overlapping overall performance. The XGB was the best algorithm with 23% more accuracy than CART, which was the least accurate in predicting ewe BCS (Table 2). All models were biased, with XGB, RF, SVM and ANN inclined towards accuracy, while K-NN, multinom, LDA, ordinal and CART were inclined towards authenticity (Figure 4). The best overall accuracy was achieved in the >3.5 BCS class and the lowest in the 2.5-3.5 class (Table 2, parenthesis). Regarding BCS class-level model accuracy, there was no clear pattern. The majority of the models (RF, K-NN, ANN, multinom, LDA and ordinal) were most accurate in the >3.5 BCS class and least accurate in the 2.5-3.5 class. The least accuracy for the majority of the models (XGB, RF, K-NN, SVM and ordinal) was observed in the 2.5-3.5 class. The highest accuracy (92%) was achieved using the RF model in the >3.5 BCS class and the lowest (63%) was observed using the CART algorithm in the 1.0-2.0 class (Table 2, parenthesis).
All models were most sensitive to the >3.5 class and least sensitive to the 1.0-2.0 class, except K-NN and CART, which had their highest sensitivity in the 2.5-3.5 class, and ordinal, which had its lowest sensitivity in the 2.5-3.5 class. The XGB was the best algorithm, being 31% more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using XGB models (88.8%) in the >3.5 BCS class while CART (37.9%) had the lowest in the 1.0-2.0 class (Table 3, parenthesis). All models had the highest specificity observed in the 1.0-2.0 BCS class. The XGB was the best algorithm with 16% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (97.5%) was observed in the 1.0-2.0 class for XGB and the lowest (71.2%) in the 2.5-3.5 class for CART model (Table 3, parenthesis).
Figure 4. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS at pre-lambing. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better is its prediction power. The variance explained by each extracted first dimension for each latent variable is given in parenthesis along the axes (accuracy latent variable: 29.25%; sensitivity-specificity latent variable: 19.65%).
Agriculture 2021, 11, x FOR PEER REVIEW

Weaning
At weaning, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 5). The RF and K-NN models had overlapping overall performance. The XGB was the best algorithm with 33% more accuracy than [...] the >3.5 BCS class and the least in the 2.5-3.5 class, except for the CART, whose specificity arrangement was the opposite, and for ANN and multinom, which had their highest specificity in the 1.0-2.0 class. The XGB was the best algorithm with 17% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (96.5%) was observed in the 1.0-2.0 class for XGB and the lowest (72.4%) in the 2.5-3.5 class for CART model (Table 3, parenthesis).

The Balance between Sensitivity and Specificity
The data showed that the overall specificity, 86% (67-98%), was higher than the overall sensitivity, 74% (37-96%), across all algorithms (Table 3). An assessment of the indicators of the balance between sensitivity and specificity was undertaken and the indices are summarized in Table 4. The positive likelihood ratio (PLR) for all models was greater than 1.0 while the negative likelihood ratio (NLR) was less than 1.0 across stages of the annual cycle. The XGB model had the highest PLR and lowest NLR, while CART had the lowest PLR and highest NLR across stages of the annual cycle. Similarly, Youden's index (YI) was consistently highest for the XGB model and lowest for the CART model.
Figure 5. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS at weaning. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better is its prediction power. The variance explained by each extracted first dimension for each latent variable is given in parenthesis along the axes (accuracy latent variable: 29.75%; sensitivity-specificity latent variable: 20.68%).
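The balance indices follow directly from sensitivity and specificity. The short Python sketch below applies the formulas to the average sensitivity and specificity reported for XGB and CART in the overall performance results. It is a worked illustration only: the function name is hypothetical, and the results need not match Table 4, where the indices were summarized across classes and stages.

```python
def balance_indices(sensitivity, specificity):
    """Return (PLR, NLR, Youden's index) from sensitivity and specificity."""
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    yi = sensitivity + specificity - 1      # Youden's index
    return plr, nlr, yi

# Average sensitivity and specificity reported for the two extreme models.
reported = {"XGB": (0.877, 0.939), "CART": (0.587, 0.795)}

indices = {model: balance_indices(sens, spec)
           for model, (sens, spec) in reported.items()}
for model, (plr, nlr, yi) in indices.items():
    print(f"{model}: PLR = {plr:.2f}, NLR = {nlr:.2f}, YI = {yi:.2f}")
```

A PLR above 1.0 together with an NLR below 1.0 indicates that a model's class assignment carries information in both directions, which is the pattern described for all models above.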

Pre-Breeding
At pre-breeding, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 2). The XGB was the best algorithm with 17% more accuracy than CART, which was the least accurate in predicting ewe BCS (Table 2). The best balance between accuracy and authenticity (points along or touching the diagonal line) was observed in the moderate performing models including ANN, multinom, LDA and ordinal (Figure 2). The best performing models (XGB, RF, SVM and K-NN) were biased towards accuracy while the poorest (CART) was biased towards authenticity. In terms of BCS, the best accuracy was achieved in the 1.0-2.0 class and the lowest in the 2.5-3.5 class for all models except for XGB, which was least accurate in the >3.5 class. The best accuracy (97.5%) was achieved using the XGB in the 1.0-2.0 BCS class, and the lowest (58.6%) was observed using the CART algorithm in the 2.5-3.5 class (Table 2, parenthesis).
All models were most sensitive to the 1.0-2.0 class and least sensitive to the 2.5-3.5 class except XGB, which was least sensitive to the >3.5 class. The XGB was the best algorithm, being 23% more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using XGB and K-NN models (96.3%) in the 1.0-2.0 BCS class while CART (37.0%) had the lowest in the 2.5-3.5 class (Table 3, parenthesis). All models had the highest specificity observed in the 1.0-2.0 BCS class except for SVM, which had the highest specificity in the >3.5 class, and both K-NN and CART, which had their lowest in the >3.5 class. The XGB was the best algorithm with 12% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (98.9%) was observed in the 1.0-2.0 class for XGB and the lowest (72.6%) in the >3.5 class for CART model (Table 3, parenthesis).

Pregnancy Diagnosis
At pregnancy diagnosis, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 3). The multinom and LDA models were closely juxtaposed, indicating that they had comparable performance. XGB was 21% more accurate than CART, which was the least accurate algorithm in predicting ewe BCS (Table 2). The best balance between accuracy and authenticity was observed in the ANN model. The XGB, RF, SVM and K-NN models were biased towards accuracy, while the multinom, LDA, ordinal and CART models were biased towards authenticity (Figure 3). In terms of BCS, the best accuracy was achieved in the 1.0-2.0 class and the lowest in the >3.5 class for all models except SVM, ANN and ordinal, which were least accurate in the 2.5-3.5 class. The highest accuracy (93.4%) was achieved using XGB in the 1.0-2.0 BCS class, and the lowest (64.0%) was observed using the CART algorithm in the >3.5 class (Table 2, values in parentheses).
There was no clear pattern in class-level model sensitivity at pregnancy diagnosis. XGB was the best algorithm, with 29% more sensitivity than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the K-NN model (91.8%) in the 1.0-2.0 BCS class, while CART (41.1%) had the lowest in the >3.5 class (Table 3, values in parentheses). All models had their highest specificity in the 1.0-2.0 BCS class, except for CART, which had its highest in the >3.5 class. XGB was the best algorithm, with 14% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (96.3%) was observed in the 1.0-2.0 class for XGB and the lowest (67.1%) in the 2.5-3.5 class for the CART model.

Pre-Lambing
At pre-lambing, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 4). Notably, the K-NN model, which had been among the best four models at pre-breeding and pregnancy diagnosis, dropped to moderate performance. The K-NN, multinom and LDA models had overlapping overall performance. XGB was 23% more accurate than CART, which was the least accurate algorithm in predicting ewe BCS (Table 2). All models were biased: XGB, RF, SVM and ANN were inclined towards accuracy, while K-NN, multinom, LDA, ordinal and CART were inclined towards authenticity (Figure 4). The best overall accuracy was achieved in the >3.5 BCS class and the lowest in the 2.5-3.5 class (Table 2, values in parentheses).
Regarding BCS class-level model accuracy, there was no clear pattern. The majority of the models (RF, K-NN, ANN, multinom, LDA and ordinal) were most accurate in the >3.5 BCS class, and the majority (XGB, RF, K-NN, SVM and ordinal) were least accurate in the 2.5-3.5 class. The highest accuracy (92%) was achieved using the RF model in the >3.5 BCS class, and the lowest (63%) was observed using the CART algorithm in the 1.0-2.0 class (Table 2, values in parentheses).
All models were most sensitive to the >3.5 class and least sensitive to the 1.0-2.0 class, except K-NN and CART, which had their highest sensitivity in the 2.5-3.5 class, and ordinal, which had its lowest sensitivity in the 2.5-3.5 class. XGB was the best algorithm, being 31% more sensitive than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the XGB model (88.8%) in the >3.5 BCS class, while CART (37.9%) had the lowest in the 1.0-2.0 class (Table 3, values in parentheses). All models had their highest specificity in the 1.0-2.0 BCS class. XGB was the best algorithm, with 16% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (97.5%) was observed in the 1.0-2.0 class for XGB and the lowest (71.2%) in the 2.5-3.5 class for the CART model (Table 3, values in parentheses).

Weaning
At weaning, the models had a clear-cut hierarchy in performance, with XGB being the best and CART the poorest (Figure 5). The RF and K-NN models had overlapping overall performance. XGB was 33% more accurate than CART, which was the least accurate algorithm in predicting ewe BCS (Table 2). The majority of the models were biased towards accuracy, except for multinom, LDA, ordinal and CART, which were inclined towards authenticity (Figure 5). The best overall accuracy was achieved in the >3.5 BCS class and the lowest in the 2.5-3.5 class. Regarding BCS class-level model accuracy, there was no clear pattern. However, the majority of the models (XGB, RF, SVM, K-NN and ANN) were most accurate in the >3.5 BCS class. The lowest model accuracy was observed equally in the 1.0-2.0 and 2.5-3.5 BCS classes across models. The highest accuracy (93.2%) was achieved using the RF model in the >3.5 BCS class, and the lowest (61.4%) was observed using the CART algorithm in the 2.5-3.5 class (Table 2, values in parentheses).
There was no clear pattern in class-level model sensitivity at weaning. XGB was the best algorithm, with 34% more sensitivity than CART, which was the least sensitive in predicting ewe BCS (Table 3). The highest BCS classification sensitivity was observed using the XGB model (92.3%) in the 2.5-3.5 BCS class, while CART (39.2%) had the lowest in the 2.5-3.5 class (Table 3, values in parentheses). All models had their highest specificity in the >3.5 BCS class and their lowest in the 2.5-3.5 class, except for CART, whose specificity arrangement was the opposite, and for ANN and multinom, which had their highest specificity in the 1.0-2.0 class. XGB was the best algorithm, with 17% more specificity than CART, which had the least specificity in predicting ewe BCS (Table 3). The highest specificity (96.5%) was observed in the 1.0-2.0 class for XGB and the lowest (72.4%) in the 2.5-3.5 class for the CART model (Table 3, values in parentheses).

The Balance between Sensitivity and Specificity
The data showed that overall specificity, 86% (67-98%), was higher than overall sensitivity, 74% (37-96%), across all algorithms (Table 3). An assessment of the indicators of the balance between sensitivity and specificity was undertaken, and the indices are summarized in Table 4. The positive likelihood ratio (PLR) for all models was greater than 1.0, while the negative likelihood ratio (NLR) was less than 1.0 across stages of the annual cycle. The XGB model had the highest PLR and lowest NLR, while CART had the lowest PLR and highest NLR across stages of the annual cycle. Similarly, Youden's index (YI) was consistently highest for the XGB model and lowest for the CART model. A good model has a PLR greater than 1.0 (the larger the better) and an NLR less than 1.0 (the smaller the better); YI ranges from 0 to 1.0, and values that approach 1.0 show higher authenticity and prediction power.
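As a rough illustration of how these three indices relate to sensitivity and specificity, the sketch below uses the standard textbook definitions (PLR = sensitivity / (1 - specificity), NLR = (1 - sensitivity) / specificity, YI = sensitivity + specificity - 1), which are assumed to match those used for Table 4. The function name `balance_indices` is hypothetical, and the inputs are the overall averages quoted above, so the output is not expected to reproduce any individual Table 4 entry.

```python
def balance_indices(sensitivity, specificity):
    """Return (PLR, NLR, YI) for a sensitivity/specificity pair given as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    # Positive likelihood ratio: true positive rate over false positive rate.
    plr = sensitivity / (1.0 - specificity) if specificity < 1.0 else float("inf")
    # Negative likelihood ratio: false negative rate over true negative rate.
    nlr = (1.0 - sensitivity) / specificity if specificity > 0.0 else float("inf")
    # Youden's index: 0 means no better than chance, 1 means perfect separation.
    yi = sensitivity + specificity - 1.0
    return plr, nlr, yi


# Overall averages quoted in the text (sensitivity 74%, specificity 86%).
plr, nlr, yi = balance_indices(0.74, 0.86)
print(f"PLR={plr:.2f}, NLR={nlr:.2f}, YI={yi:.2f}")
```

With these averaged inputs the PLR is above 1.0, the NLR is below 1.0 and YI is positive, which is the pattern the text describes for all models.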

Overall Model Ranking
Overall, black box models were better than low-level white box models (Table 5). XGB was consistently the best performing model, while CART was the poorest. Model rankings changed across stages of the annual cycle, except for XGB, LDA, ordinal and CART.

Discussion
The present study utilized machine learning classification algorithms to explore the possibility of predicting BCS from current and previous liveweight in mature ewes (at approximately 43-54 months of age). Body condition score was treated as a categorical variable with three levels (1.0-2.0, 2.5-3.5, >3.5). Nine of the most recognized machine learning models (XGB, ANN, RF, K-NN, SVM, ordinal, multinom, LDA and CART) were applied to preprocessed datasets.
We applied a strategy to reduce the accuracy and authenticity measures into two dimensions in order to generate latent variables or constructs that were plotted to give a visual summary of model performance. This technique gave a holistic visual display of overall model performance, which made it easier to decipher the patterns in the relationship between the accuracy and authenticity of models in BCS prediction. Previous studies have suggested the use of several metrics to give an indication of a model's accuracy and authenticity [24,43,48,49]. These have, however, been piecemeal, with no unifying interface. Bringing together both accuracy and authenticity measures in a single display appears to address that gap. This approach could serve as a platform for exploring even better ways of evaluating model performance.
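The three-level grouping described above can be sketched as a simple mapping. This is a minimal illustration only: the function name `bcs_class` is hypothetical, and it assumes scores recorded in half units on the 1.0-5.0 scale, so that no score falls between the stated class limits.

```python
def bcs_class(bcs):
    """Map a half-unit body condition score (1.0-5.0) to one of three grouped classes."""
    # Assumption: scores are recorded in half units, so doubling gives a whole number.
    if not 1.0 <= bcs <= 5.0 or (bcs * 2) % 1 != 0:
        raise ValueError("expected a half-unit BCS between 1.0 and 5.0")
    if bcs <= 2.0:
        return "1.0-2.0"
    if bcs <= 3.5:
        return "2.5-3.5"
    return ">3.5"


# Example: grouping a handful of individual scores.
grouped = [bcs_class(score) for score in (1.5, 2.0, 2.5, 3.5, 4.0)]
print(grouped)
```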

Overall Accuracy
The findings suggest that ewe BCS prediction from current and previous liveweight can be achieved using machine learning classification algorithms within the limited BCS range used in the present study. The results indicated that XGB was the most efficient and robust model (overall accuracy = 87.6%; sensitivity = 87.7%; specificity = 93.9%). Other good alternatives to XGB for predicting ewe BCS were three algorithms (K-NN, RF and SVM) with accuracies > 80% and kappas > 70%, while the remaining four (CART, ordinal, LDA and multinomial) were weak algorithms (accuracies < 70%, kappas < 60%). All models performed better than a random guess, with the most efficient models giving prediction errors as low as 11% and 38%. According to Galdi and Tagliaferri [50], a perfect classifier has an accuracy of 100%, while a random guess would give 33.3% accuracy for three-level classifiers [50,51]. The weakest algorithms outperformed a random guess by only 8, 11, 15 and 20%, respectively, using the current study data. Whereas accuracy measures can be interpreted arbitrarily, Cohen's kappa statistic has been classified [42,52] into six categories: no agreement (values ≤ 0), none to slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80) and almost perfect agreement (0.81-1.00). Further, Fleiss et al. [53] suggested that kappa values greater than 0.75 may be taken to represent excellent agreement beyond chance, values below 0.40 poor agreement and values between 0.40 and 0.75 fair to good agreement. The findings in this study suggest that, using the top performing algorithms (XGB and RF), ewe BCS can be predicted with high accuracy across the four phases of the annual cycle.
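To make the kappa scale concrete, the following sketch computes Cohen's kappa from observed and chance agreement using the standard formula κ = (p_o - p_e) / (1 - p_e) and maps the result onto the six categories listed above. The function names are hypothetical, and the example inputs (80% observed agreement, chance agreement of one third for three equally likely classes) are illustrative assumptions rather than values from this study; in practice p_e depends on the observed class proportions.

```python
def cohen_kappa(p_o, p_e):
    """Cohen's kappa from observed agreement p_o and chance agreement p_e (both proportions)."""
    if p_e >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (p_o - p_e) / (1.0 - p_e)


def kappa_category(kappa):
    """Label a kappa value using the six agreement categories quoted in the text."""
    if kappa <= 0.0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative inputs only: 80% observed agreement, chance agreement of 1/3.
kappa = cohen_kappa(0.80, 1.0 / 3.0)
print(f"kappa={kappa:.2f} ({kappa_category(kappa)})")
```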

Class-Level Accuracy
Results also showed that the class-level accuracy-related metrics, including accuracy, precision and F-measure, were highest for XGB, making it the most efficient and robust model for ewe BCS prediction. Further, there appeared to be variability in all metrics across stages of the annual sheep weighing cycle and BCS classes. This variation in accuracy across the stages of the annual cycle suggests that, with the exception of XGB, different models may be required to predict BCS at different stages of the annual cycle. Similarly, different models may be required if there is a need for greater accuracy in one BCS class than in others. This is especially important when great accuracy is required for management decisions with far-reaching consequences, such as when limited resources must be allocated only to target classes. Further, results indicated that the higher-level (black box) machine learning models such as XGB and RF were better at separating BCS into distinct classes than the lower-level (white box) models such as multinomial or ordinal logistic regression.
In the current study, the best balance between accuracy and authenticity (sensitivity-specificity) was achieved during pre-breeding compared to other stages of the annual cycle. This observation could have been due to the "relative ease" of condition scoring ewes pre-breeding compared with other stages of the annual cycle [2,54]. Prior to breeding, most farmers enhance ewe feeding in a process known as flushing [55,56], which likely resulted in uniform tissue (fat and muscle) distribution around the body. In addition, the weight measurements recorded pre-breeding are not confounded by the conceptus mass, which is the case at pregnancy diagnosis and pre-lambing. The conceptus mass influences ewe liveweight from pregnancy through the pre-lambing stage [54,57], which coincides with the two time-point weight measurements during those stages of the annual cycle. Further, during lactation a ewe has its greatest nutrient requirements for energy and protein [58], and at weaning a ewe is drained by the lactation process, leading to variability in fat deposition around the body; consequently, the ewes are lighter. Using the same ewe population, we have previously reported a decreasing trend in ewe BCS as a ewe aged, plateauing after 43-54 months [9]. This was attributed to a likelihood that farmers were underfeeding their aging ewes at certain stages or periods of the annual cycle. The lactation period could be one such period, resulting in failure to meet ewe dietary energy and protein requirements and consequently leading to thinner animals. The management conditions at pregnancy diagnosis, pre-lambing and weaning, therefore, could lead to differences in fat deposition around the body, resulting in variability in BCS.

Class-Level Model Authenticity
Among the indicators of model authenticity, the models had apparently greater specificity than sensitivity, which could point to unbalanced distinguishing power to make predictions. An examination of three indicators of the balance between sensitivity and specificity, or model authenticity/power (PLR, NLR and YI), indicated that all models had values within acceptable authenticity and power (PLR > 1.0, NLR < 1.0 and YI > 0) across the four stages of the annual cycle, indicating that all models had balanced sensitivity and specificity. Results also showed that XGB had the highest PLR and YI and the lowest NLR. Combined with the results from the measures of accuracy, these results rank XGB as the most robust model for BCS prediction. Sensitivity is defined as the proportion of individuals or items that belong to a given BCS class and are correctly identified, while specificity is the proportion that do not belong to a given class and are excluded by the test. There exists an inverse relationship between the sensitivity and specificity of a test or prediction model [59,60]. If a model has high sensitivity, it is capable of detecting "real" BCS classes, but it also faces losses from consuming more resources due to mandatory confirmatory tests (to rule out the false positives) or when the limited resources have to be given to only the right candidates. However, if a model has high specificity, the system benefits from a significant reduction in the consumption of resources and time, but it has a decreased capacity to detect "real" BCS classes, which can lead to failure to detect many events of importance [44]. The higher specificity would not be advantageous, as failure to detect ewes inside or outside the BCS range (2.5-3.5) for optimum productivity would affect management decisions negatively. Therefore, a good model needs to achieve a balance between sensitivity and specificity [55].
This study suggests that ewe BCS prediction from current and previous liveweight can usefully be achieved using machine learning classification algorithms within the limited BCS range used in the present study. This study used unadjusted liveweight records alone (i.e., confounded by factors such as fleece length variations and fetal mass from pregnancy to lambing) to assign BCS to one of three classes, achieving accuracies of up to 89%. It is likely that if adjusted liveweights were used together with other key variables that affect BCS, optimum accuracy would be achieved from these BCS prediction algorithms. Semakula et al. [10] suggested that the accuracy of BCS prediction could be improved if all key variables affecting the relationship between liveweight and BCS were accounted for. If this was the case, the efficiency of the machine learning models tested could also be enhanced.
Although not directly comparable, having used different scale ranges and different measures of model performance, the best ML model (XGB) in the current study had great efficiency (based on liveweight predictors alone, it achieved accuracies greater than 90%) and was stable (accuracy: 86-93%) across stages of the annual cycle. In their previous study based on linear regression models, Semakula et al. [10] achieved only weak to moderate goodness of fit (R² = 50%) using more resources (both LW and BCS records combined). Further, the model goodness of fit and accuracy varied greatly (R²: 28-64%) across stages of the annual cycle, making the linear regression models less stable. Taken together, therefore, this suggests that machine learning models would offer better BCS predictions than the linear regression models.
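The class-level definitions of sensitivity and specificity given above can be illustrated with a one-vs-rest reading of a three-class confusion matrix. This is a minimal sketch, not the authors' code: the function name `per_class_sens_spec` is hypothetical and the counts are invented purely for illustration.

```python
def per_class_sens_spec(cm, labels):
    """Per-class sensitivity and specificity from a square confusion matrix.

    cm[i][j] is the number of records whose true class is labels[i] and whose
    predicted class is labels[j]. Assumes every class has at least one member
    and at least one non-member, so no denominator is zero.
    """
    total = sum(sum(row) for row in cm)
    out = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]                          # class k predicted as k
        fn = sum(cm[k]) - tp                   # class k predicted as something else
        fp = sum(row[k] for row in cm) - tp    # other classes predicted as k
        tn = total - tp - fn - fp              # other classes not predicted as k
        out[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return out


# Invented counts for illustration only (rows: true class, columns: predicted class).
labels = ["1.0-2.0", "2.5-3.5", ">3.5"]
cm = [
    [45, 5, 0],
    [10, 80, 10],
    [0, 10, 40],
]
result = per_class_sens_spec(cm, labels)
for label in labels:
    print(label, result[label])
```

In this made-up example the middle class has the lowest specificity because misclassified records from both neighbouring classes fall into it, which shows how one class can pull sensitivity and specificity apart.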

Conclusions
The results of the present study showed that grouped ewe BCS can be predicted with great accuracy on a narrow BCS scale (1.0-2.0, 2.5-3.5, >3.5) from a ewe's current and previous liveweight using machine learning algorithms. The gradient boosting decision trees algorithm was the most efficient for ewe BCS prediction. The results of this study, therefore, support the hypothesis that BCS can be accurately predicted from a ewe's current and previous liveweights. The algorithms, having been trained on a large representative dataset, should be able to give accurate ewe BCS predictions. These algorithms (acquired intelligence) could be incorporated into weighing systems to easily and quickly give farmers ewe BCS without the need for the hands-on burden. Future studies should investigate how to improve the accuracy of BCS prediction and the possibility of individual BCS prediction on the full range (1-5).

Figure 1. Distribution of ewe body condition scores by stage of the annual cycle from 18,354 individual records of 5761 ewes during their fourth year (43-54 months) of age. Bar colors (grey, yellow, blue and green) indicate BCS proportions

Precision: correctly classified subjects for a given class, given that they truly belonged to that class. $\text{Precision} = \frac{TP}{TP + FP}$
F-measure: the harmonic mean of the precision and sensitivity; best if there is some sort of balance between precision and sensitivity. $F\text{-measure} = 2 \times \frac{\text{sensitivity} \times \text{precision}}{\text{sensitivity} + \text{precision}}$
Sensitivity: the proportion of correctly classified subjects for a given class to those who truly belong to that class. $\text{Sensitivity} = \frac{TP}{TP + FN}$
Specificity: the proportion of subjects correctly classified as not belonging to a given class to those that truly do not belong to that class. $\text{Specificity} = \frac{TN}{TN + FP}$
Positive likelihood ratio (PLR): the ratio between the true positive and the false positive rates for "positive" events that are detected by a model. $\text{PLR} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}$
TP = true positive, TN = true negative, FP = false positive, FN = false negative, κ = Cohen's kappa statistic, $p_o$ = actual observed agreement, and $p_e$ = chance agreement.

Figure 2. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS during pre-breeding. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity-specificity) is given in parentheses along the axes.


Figure 3. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS during pregnancy diagnosis. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity-specificity) is given in parentheses along the axes.


Figure 4. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS at pre-lambing. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity-specificity) is given in parentheses along the axes.

Figure 5. A plot of the accuracy and sensitivity-specificity latent variables from their first dimension/component obtained through a factor analysis of mixed variables (a combination of principal component and multiple correspondence analyses) procedure on measures of performance for the prediction of ewe BCS at weaning. Dots (red sphere: model, blue square: BCS class). Dotted diagonal line indicates a balance between accuracy and sensitivity-specificity. If a dot is above the line, then the model or BCS class was more accurate than sensitive-specific, while the reverse indicates that the model was more sensitive than accurate. The further and more positive a model is along the diagonal line, the greater and better its prediction power. The variance explained by each extracted first dimension for each latent variable (accuracy, sensitivity-specificity) is given in parentheses along the axes.

Table 2. Accuracy and kappa statistics of nine predictive models for ewe BCS at 43-54 months of age at different stages of the annual cycle. Values in parentheses denote the minimum and maximum accuracy, in ascending order.
ANN: neural networks, Multinorm: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree. The superscripts 1, 2, 3 (1: 1.0-2.0, 2: 2.5-3.5 and 3: >3.5) indicate the BCS class from which the value was observed. The first superscript indicates the class from which the minimum estimate was observed, while the second indicates the class from which the maximum estimate was achieved. All models were significant (p < 0.05) and better than a random guess (i.e., accuracy = 33.3%). All ewe BCS predictions were based on current and previous liveweight.

Table 3. Indicators of authenticity (sensitivity and specificity) of nine predictive models for ewe BCS at 43-54 months of age at different stages of the annual cycle. Values in parentheses denote the minimum and maximum sensitivity or specificity, in ascending order.

Table 4. Measures of the balance between sensitivity and specificity of the BCS prediction models by stage of the annual cycle. ANN: neural networks, multinorm: multinomial regression, LDA: linear discriminant analysis, Ordinal: ordinal logistic regression, CART: classification and regression tree. Measures of the balance between sensitivity and specificity (PLR: positive likelihood ratio, NLR: negative likelihood ratio and YI: Youden's index).

Table 5. Model ranking by stage of annual cycle and overall. Overall (overall rank with means in parentheses). The lower the rank, the greater the BCS prediction performance.