Applying Machine Learning Algorithms for the Classification of Mink Infected with Aleutian Disease Using Different Data Sources

Simple Summary: Aleutian disease (AD) is a major infectious disease found in mink farms, and it causes financial losses to the mink industry. Controlling AD often requires a counterimmunoelectrophoresis (CIEP) method, which is relatively expensive for mink farmers. Therefore, predicting AD-infected mink without using CIEP records will be important for controlling AD in mink farms. In the current study, we applied nine machine learning algorithms to classify AD-infected mink. We showed that the random forest could accurately classify AD-infected mink (accuracy of 0.962). This result could support the implementation of machine learning for controlling AD in mink farms.

Abstract: American mink (Neogale vison) is one of the major sources of fur for the fur industries worldwide, whereas Aleutian disease (AD) causes severe financial losses to the mink industry. A counterimmunoelectrophoresis (CIEP) method is commonly employed in a test-and-remove strategy and has been considered the gold standard for AD tests. Although machine learning is widely used in livestock species, little has been implemented in the mink industry. Therefore, predicting AD without using CIEP records will be important for controlling AD in mink farms. This research presents assessments of the CIEP classification using machine learning algorithms. Aleutian disease was tested on 1157 individuals using CIEP in an AD-positive mink farm (Nova Scotia, Canada). A comprehensive collection of 33 different features was used for the classification of AD-infected mink. The specificity, sensitivity, accuracy, and F1 measure of nine machine learning algorithms were evaluated for the classification of AD-infected mink. The nine models were artificial neural networks, decision tree, extreme gradient boosting, gradient boosting, K-nearest neighbors, linear discriminant analysis, support vector machines, naive Bayes, and random forest.
Among the 33 tested features, the Aleutian mink disease virus capsid protein-based enzyme-linked immunosorbent assay was found to be the most important feature for classifying AD-infected mink. Overall, random forest was the best-performing algorithm for the current dataset, with a mean sensitivity of 0.938 ± 0.003, specificity of 0.986 ± 0.005, accuracy of 0.962 ± 0.002, and F1 value of 0.961 ± 0.088 across tenfold cross-validation. Our work demonstrated that it is possible to use the random forest algorithm to classify AD-infected mink accurately. It is recommended that the model be further tested in other farms and that genomic information be incorporated to optimize it for implementing machine learning methods for AD detection.


Introduction
Mink is the major source of fur for the fur industry worldwide [1], and Aleutian disease (AD), which is caused by the Aleutian mink disease virus (AMDV), brings tremendous financial losses to the mink industry [2]. AD is associated with several important traits of farmed mink. The CIEP tests were conducted at the Animal Health Laboratory at the University of Guelph (Guelph, ON, Canada) to detect the existence of anti-AMDV antibodies in the blood samples, and the results were recorded as negative or positive. The IATs were conducted at the CCFAR to measure the gamma globulin level in the serum, and the results were scored into four categories from 0 (low) to 4 (high). All bodyweight (BW), growth parameter, and feed intake data were collected using the established protocols described by Do et al. [34] and Davoudi et al. [35], respectively. The mink were housed individually in single cages, and feed was distributed to each pen every morning. The amount of allocated feed was regulated based on the previous day's leftover records in order to avoid unnecessary feed waste and to meet the mink's appetite. The daily feed intake (DFI) was obtained by subtracting the amount of leftover feed from the quantity of feed supplied. The average daily feed intake (ADFI) was calculated by averaging the DFI records obtained during the test period. The average daily gain (ADG), feed conversion ratio (FCR), Kleiber ratio (KR), residual feed intake (RFI), residual gain (RG), and residual intake and gain (RIG) were derived from the body weight and daily feed intake data [35]. The growth curve parameters, including asymptotic weight (α), growth rate at maturity (k), shape parameter (m), weight at the inflection point (WIP), and age at the inflection point (AIP), were obtained from the body weights of the mink using the Richards growth model [34]. A total of 33 features were examined for the development of the ML algorithms to classify animals for AD.
The numbers of CIEP-positive and -negative mink, as well as the mean values of these features, are given in Table 1. A simplified workflow of the current study is shown in Figure 1.
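To make the derived feed-efficiency traits concrete, the sketch below computes DFI, ADFI, ADG, FCR, and KR from the definitions above. The function names and numbers are illustrative, not from the study's dataset, and the Kleiber ratio is assumed here to be ADG divided by metabolic body weight (BW^0.75); the residual traits (RFI, RG, RIG) require a fitted regression and are omitted.

```python
# Hedged sketch of the feed-trait derivations described in the text.
# All helper names and input values are illustrative.

def daily_feed_intake(feed_supplied, leftover):
    """DFI = feed supplied minus the leftover recorded afterwards (g/day)."""
    return [s - l for s, l in zip(feed_supplied, leftover)]

def derived_traits(dfi, start_weight, end_weight, days):
    adfi = sum(dfi) / len(dfi)                 # average daily feed intake
    adg = (end_weight - start_weight) / days   # average daily gain
    fcr = adfi / adg                           # feed conversion ratio
    mid_bw = (start_weight + end_weight) / 2
    kr = adg / mid_bw ** 0.75                  # Kleiber ratio (assumed ADG per metabolic BW)
    return {"ADFI": adfi, "ADG": adg, "FCR": fcr, "KR": kr}

supplied = [200.0, 210.0, 205.0]   # g/day, illustrative
leftover = [20.0, 15.0, 25.0]
dfi = daily_feed_intake(supplied, leftover)
print(derived_traits(dfi, start_weight=1000.0, end_weight=1090.0, days=3))
```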

Algorithm Selection and the Data Preparation
The classification of AD-infected mink was constructed using the following algorithms: artificial neural networks, decision tree, extreme gradient boosting, gradient boosting, K-nearest neighbors, linear discriminant analysis, support vector machines (linear form), naive Bayes, and random forest. These algorithms were selected as they have been widely used for the diagnosis of human diseases, e.g., cancers [36][37][38], as well as for predicting phenotypes in livestock [17,19,39]. All calculations were performed in R using the caret package [40], and CIEP was used as the response variable in the models. Since the CCFAR was infected with AD in 2012, more animals were positive (954) than negative (203) using CIEP in the current dataset. Missing feature values (Table 1) were imputed using the mice R package [41]. If the imbalance ratio is high, the decision function favors the majority group (positive CIEP group). For non-probabilistic classifiers, such as the logistic regression, neural network, and support vector machine algorithms, an imbalanced data structure can affect their parameters [39]. We used the over-sampling function in the ROSE package [42] to create balanced data. In this function, the minority group (negative CIEP, n = 203) was over-sampled from 203 to 954 to match the sample size of the majority group (positive CIEP, n = 954). The preProcess function from the caret package [40] was used to scale and center the variables in the training dataset. The relative importance of the features and feature selection were examined using the Boruta package [43]. The Boruta algorithm is a wrapper approach built on the random forest. This algorithm creates shadow features as replicas of the actual features and then randomly shuffles them to remove any correlation with the response variable. In the next step, a random forest classifier is run, and a Z-score is computed by dividing the average accuracy loss by its standard deviation.
The maximum Z-score of the randomized shadow features is used to set a threshold for the selection of important features [43]. If the Z-score computed for an actual feature is significantly higher than the maximum Z-score of the shadow features, the feature is considered important [43].
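The shadow-feature screening described above can be sketched as follows, using scikit-learn's random forest as a stand-in for the Boruta R package. This is a single screening pass on simulated data, comparing raw importances against the best shadow importance, rather than Boruta's iterative Z-score test.

```python
# Minimal single-pass sketch of the Boruta idea: shuffle copies of the real
# features ("shadows"), fit a random forest on the augmented matrix, and keep
# features whose importance exceeds the best shadow importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 400
informative = rng.normal(size=(n, 2))          # two features that drive the label
noise = rng.normal(size=(n, 3))                # three irrelevant features
y = (informative.sum(axis=1) > 0).astype(int)
X = np.hstack([informative, noise])

# Shadows: column-wise shuffled copies, which breaks any link to the response
shadows = rng.permuted(X, axis=0)
X_aug = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
imp = rf.feature_importances_
real, shadow = imp[: X.shape[1]], imp[X.shape[1]:]
selected = [i for i, v in enumerate(real) if v > shadow.max()]
print("features kept:", selected)
```

The real Boruta algorithm repeats this comparison over many forests and uses a statistical test on the Z-scores; the one-shot comparison here only illustrates the shadow-feature mechanism.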

Model Training and Performance Assessment
Following the over-sampling, the data were randomly divided into training (80%) and testing (20%) datasets. We created ten different training-testing splits using the createDataPartition function. The models were built on each training dataset and evaluated on the corresponding testing dataset. Within the training dataset, we used the trainControl function to select the hyperparameters for model building, applying the repeated cross-validation method implemented in that function. In this method, for each of the ten iterations, the hyperparameters were selected by searching within a 10-fold cross-validation structure on a random 70% subset of the training dataset. Each algorithm was run separately using its default initial hyperparameters and the train function of the caret package. The confusionMatrix function was used to evaluate the performance of the best model built for each training dataset on the corresponding testing dataset.
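As a hedged sketch of this pipeline, the example below uses scikit-learn in place of caret's trainControl/train, plain random over-sampling with replacement in place of ROSE's smoothed bootstrap, and simulated data with illustrative class sizes. Hyperparameters are chosen by 10-fold cross-validation on the training split (the 70%-subset detail of caret's repeated CV is simplified away), and the best model is then scored once on the held-out test split.

```python
# Hedged sketch: over-sample the minority class, split 80/20, tune by 10-fold
# cross-validation, and evaluate on the held-out test split. Simulated data;
# grid values are illustrative, not the study's settings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

rng = np.random.default_rng(1)
X_pos = rng.normal(0.5, 1.0, size=(300, 8))    # majority class (e.g., CIEP positive)
X_neg = rng.normal(-0.5, 1.0, size=(80, 8))    # minority class (e.g., CIEP negative)

# Random over-sampling with replacement up to the majority size
X_neg_bal = X_neg[rng.integers(0, len(X_neg), size=len(X_pos))]
X = np.vstack([X_pos, X_neg_bal])
y = np.array([1] * len(X_pos) + [0] * len(X_pos))

# 80/20 training-testing split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_features": [2, 4], "n_estimators": [50, 100]},
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X_tr, y_tr)
print("best params:", search.best_params_)
print("test accuracy: %.3f" % search.score(X_te, y_te))
```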
The model fit and ranking of the models were assessed using several scores computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The following formulas were used to calculate accuracy (Equation (1)), which is the fraction of correct predictions; specificity (Equation (2)); sensitivity (recall; Equation (3)), which measures the fraction of correct predictions per true number of samples; and the F-measure (F1; Equation (4)), which is a goodness-of-fit assessment for a classification analysis that balances precision and recall:

Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)

Specificity = TN/(TN + FP) (2)

Sensitivity = TP/(TP + FN) (3)

F1 = 2TP/(2TP + FP + FN) (4)

A receiver operating characteristic (ROC) curve was used to depict the sensitivity against 1 − specificity over all possible decision thresholds for classifying the predicted AD-infected mink and was computed using the pROC package [44]. The accuracy of the models was further assessed by calculating the area under the curve (AUC). The AUC values were interpreted as non-accurate (AUC = 0.5), less accurate (0.5 < AUC ≤ 0.7), moderately accurate (0.7 < AUC ≤ 0.9), highly accurate (0.9 < AUC < 1), and perfect (AUC = 1) [45]. Moreover, the pairwise differences in the accuracies of the models were compared using the Wilcoxon test.
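Written out from the confusion-matrix counts, the four scores are straightforward to compute. As a worked example, the counts below match the random forest confusion matrix reported in Table 3 (186 of 190 positives and 184 of 190 negatives correctly classified); they are a single illustrative split, not the study's cross-validated means.

```python
# The four classification scores, computed from confusion-matrix counts.

def scores(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)              # recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, specificity, sensitivity, f1

acc, spec, sens, f1 = scores(tp=186, tn=184, fp=6, fn=4)
print(f"accuracy={acc:.3f} specificity={spec:.3f} sensitivity={sens:.3f} F1={f1:.3f}")
```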

Feature Importance and the Model Performance
The descriptive statistics of all numerical features are shown in Table 1. A total of 33 different features were collected and used as input in the Boruta package. The relative importance of the features based on the random forest from the Boruta package is shown in Figure 2.
The ELISA-P was identified as the most important feature for the classification of AD-infected mink. Other important features, based on the ranking by the Boruta package, were those related to bodyweight measures, growth curve parameters, and DFI. Sex, birth year, and color were the least important features for the classification of AD-infected mink. The importance of the ELISA is expected, as ELISA systems are alternative methods for the diagnosis of AD [46]. Previously, we also reported that the ELISA tests had significant phenotypic and genetic correlations with CIEP [33]. The age at sampling might be an important feature for CIEP since animals that stayed on the farm for a longer period of time might have had a higher chance of being infected by the AMDV. Bodyweights, growth curve parameters, and DFI were important traits for the growth of the animals. Since AD harms the animal's health and growth [47], these features were expected to be important for classifying AD-infected animals. Interestingly, the variation in feed intake was important for the CIEP classification, which might be because of the inconsistency in the diet of infected mink. Sex and color type were less important for the CIEP classification, which is supported by our previous finding that these effects did not significantly affect CIEP [33].
Table 2 presents the sensitivity, specificity, F1, and accuracy of the nine ML algorithms using four different sub-sampling procedures. Overall, the sensitivity, specificity, F1, and accuracy varied among the algorithms. The specificity ranged from 0.588 (K-nearest neighbors) to 0.938 (random forest), while the sensitivity ranged from 0.841 (naive Bayes) to 0.987 (extreme gradient boosting). All algorithms obtained higher sensitivity values than specificity values. All algorithms had F1 and accuracy values above 0.7, indicating that they could be used for the classification of AD-infected mink with acceptable accuracy. The random forest algorithm had an excellent performance considering both the F1 measure and accuracy (>0.95).
Table 3 shows the confusion matrix obtained from the random forest algorithm. The random forest correctly classified 186 out of 190 CIEP-positive mink and 184 out of 190 CIEP-negative mink. The Friedman test indicated significant differences in the accuracies obtained from the different algorithms across the subsampling procedures (p < 2.2 × 10−16). The paired-sample Wilcoxon tests indicated that all algorithms had significant differences in their accuracies, except for the K-nearest neighbors compared with the linear discriminant analysis (p = 0.18) and the naive Bayes (p = 0.51) (Table 4). The average AUC values (Figure 3) indicated that the random forest and the extreme gradient boosting were the best algorithms, with AUC values > 0.90. Other algorithms had a moderate AUC, ranging from 0.75 (naive Bayes and K-nearest neighbors) to 0.88 (gradient boosting).
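The pairwise accuracy comparisons described above can be sketched with SciPy's paired Wilcoxon signed-rank test as a stand-in for the R implementation. The per-replicate accuracy vectors below are made up for illustration and are not the study's results.

```python
# Hedged sketch: paired Wilcoxon signed-rank test on per-replicate accuracies
# of two algorithms. Accuracy vectors are illustrative, not the study's data.
from scipy.stats import wilcoxon

rf_acc  = [0.96, 0.95, 0.97, 0.96, 0.96, 0.95, 0.97, 0.96, 0.95, 0.96]
knn_acc = [0.80, 0.79, 0.82, 0.81, 0.80, 0.78, 0.83, 0.80, 0.79, 0.81]

stat, p = wilcoxon(rf_acc, knn_acc)   # paired: same ten replicates per model
print(f"W={stat}, p={p:.4f}")         # a small p indicates the accuracies differ
```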

Performance Assessment
All of the tested algorithms in this study have been used for the diagnosis of diseases [17,39,45]. The decision tree, gradient boosting, random forest, and extreme gradient boosting are all tree-based models, and their performance is influenced by class imbalance via the leaf impurity. To address the class imbalance, the over-sampling method was chosen, as it does not lead to any information loss. In the current study, the random forest outperformed the other methods, which is consistent with a previous study [39] that implemented the random forest to predict leg weakness in pigs. However, the random forest was not the best method in the study by Shao et al. [48], who showed that support vector machines outperformed neural networks, random forest, and linear regression in predicting the corrected inventory decision for the market using China's hog inventory data. The random forest was also less accurate than support vector machines, kernel ridge regression, and Adaboost.R2 in predicting reproductive performance traits in pigs using genomic data [49]. The random forest approach is known to be fairly stable in the presence of outliers and noise and can handle correlations between the predictors [49,50].
Extreme gradient boosting was the second-best method for the classification of AD-infected mink, which might be because this method can perform implicit variable selection and capture non-linear relationships [51,52]. Both the random forest and the extreme gradient boosting showed great potential for the classification of CIEP in the current study, with high accuracies, F1 values, and AUCs. In particular, these algorithms were close to perfect (sensitivity > 0.99) in classifying AD-infected mink. Both algorithms also succeeded in detecting posture and behavior in dairy cows [51]: the extreme gradient boosting algorithm predicted posture with an accuracy of 0.99, and the random forest had the highest overall accuracy (0.76) in predicting behavior. The artificial neural network (NNET) performed fairly well but was not the best, which could be due to the limited fine-tuning of its hyperparameters or the small sample size in the current study. Nevertheless, the performance of ML algorithms depends on the data, and therefore, it is necessary to test different algorithms to find the most suitable one.
The current study had some limitations. Although CIEP has been adopted by mink farmers for controlling AD, it is important to mention that the CIEP test results are binary outcomes while AD is a chronic disease. The CIEP measure is sensitive to the status of the disease; therefore, the results of the current study might be limited by the lack of repeated measurements of CIEP. More frequent measures of CIEP are required to better confirm the AD status and, consequently, to apply the ML algorithms to the classification of AD-infected mink. Additionally, even though the random forest reached a very high accuracy, specificity, and sensitivity, some individuals were still wrongly classified. Therefore, larger sample sizes with more features, or better hyperparameter tuning, are required to correctly classify these individuals. The results of the current study are also limited to AD-positive mink farms. However, the majority of mink farms are infected with AD; thus, the results are still beneficial for most mink farmers. In the meantime, these results could be helpful for farmers who want to cull animals based on the AD-infected mink classified by the ML algorithms.
Finally, although it is considered the gold standard for AD testing, CIEP is a relatively expensive test and requires a large labor force, as many steps in the CIEP test are performed manually. Moreover, the CIEP results are prone to false positives, as their accuracy depends on the experience of the readers in visualizing the bands. These drawbacks limit the application of CIEP in large mink farms. Alternatively, the ELISA test can be used for high-throughput assays, and the ELISA results can be used in ML approaches (e.g., random forest or extreme gradient boosting) to accurately classify AD-infected mink. Therefore, mink farmers might not need to perform the CIEP test, but could instead use the information from the ELISA test to predict AD risk and to decide which animals need to be culled to control AD.

Conclusions
In summary, among the nine ML algorithms, the random forest was the best method for the classification of AD-infected mink in the current dataset. This study indicated that it is possible to classify AD-infected mink with high accuracy, specificity, and sensitivity using the random forest algorithm. Therefore, the random forest algorithm might be used for classifying AD-infected mink in other AD-positive farms. Given that the current study used data from only one AD-positive farm and that the performance of the ML algorithms is sensitive to the input data, it is recommended that the models be further tested in other AD-positive farms. Since AD is a chronic disease, it is also recommended to collect disease records more frequently for better disease monitoring. Finally, it is recommended to incorporate genomic information to optimize the model for the implementation of machine learning methods in controlling AD.