Investigation of the Relationships between Coat Colour, Sex, and Morphological Characteristics in Donkeys Using Data Mining Algorithms

Simple Summary
The donkey (Equus asinus) is an odd-toed ungulate and the smallest species in the Equidae family. It is characteristically short-legged with extremely long ears. The wild ancestor of the donkey is equally Equus asinus, which is generally known as the “African wild ass” and is reportedly still extant. Donkeys are the only ungulate domesticated exclusively in Africa. By nature, donkeys are very companionable, calm, enduring, intelligent, prudent, playful, and keen to learn, and they enjoy the company of humans. In Turkey, donkeys are used for pack transport and riding in order to lessen the physical load on humans. This study was conducted to assess the prediction performance of various algorithms using the morphological traits, body coat colour distribution, and body measurements of donkeys raised in Turkey.

Abstract
This study was carried out in order to determine the morphological characteristics, body coat colour distribution, and body dimensions of donkeys raised in Turkey, as well as to determine the relationships between these factors. For this reason, the predictive performance of various machine learning algorithms (i.e., CHAID, Random Forest, ALM, MARS, and Bagging MARS) was compared, utilising the biometric data of donkeys. Measurements were taken from a total of 371 donkeys (252 male and 119 female), with mean values as follows: height at withers, 100.7 cm; rump height, 103.1 cm; body length, 103.8 cm; chest circumference, 112.8 cm; chest depth, 45.7 cm; chest width, 29.1 cm; front shin circumference, 13.5 cm; head length, 55 cm; and ear length, 22 cm. The body colour distribution of the donkeys considered in this study was calculated as 39.35% grey, 19.95% white, 21.83% black, and 18.87% brown.
Model fit statistics, including the coefficient of determination (R²), mean square error, root-mean-square error (RMSE), mean absolute percentage error (MAPE), and standard deviation ratio (SD ratio), were calculated to measure the predictive ability of the fitted models. The MARS algorithm was found to be the best model for defining the body length of donkeys, with the highest R² value (0.916) and the lowest RMSE, MAPE, and SD ratio values (2.173, 1.615, and 0.291, respectively). The experimental results indicate that the most suitable model is the MARS algorithm, which provides a good alternative to other data mining algorithms for predicting the body length of donkeys.


Introduction
Donkeys played a very important role in agricultural practices until recently, and they are still used for transportation in rural areas of Turkey, as well as in other countries where cultivation is still carried out [1,2]. They have traditionally been used as beasts of burden and, even though the world has moved toward mechanisation, this ancient animal is still used as a biological vehicle. In dry and semi-dry areas, they serve humans […] testicular ultrasonographic measurements in donkeys, precision calliper measurements after orchiectomy in some of the donkeys, and sperm quality parameters.
To the best of our knowledge, there is no research on the use of CHAID, Automatic Linear Model, Random Forest, MARS, and Bagging MARS algorithms for body length prediction using different biometric measurements from donkeys. This study aims to determine the body length of donkeys through a comparative analysis of data mining methods, utilising gender, age, coat colour, and morphological characteristics. The results are expected to be useful in determining the relationships between various morphological features of donkeys.

Materials and Methods
For this study, 371 donkeys aged 1-15 years were analysed, with data collected from Agri (39° […] [19,20]. This section draws mainly on physical description results from studies carried out by the second author between October 2013 and February 2014 in 13 provinces in Turkey. The donkeys were distributed by colour as follows: 70 brown donkeys (18.87%), 146 grey donkeys (39.35%), 81 black donkeys (21.83%), and 74 white donkeys (19.95%). In terms of sex, there were 119 female donkeys (32%) and 252 male donkeys (68%).
Live weights and various body measurements were collected from the native donkey populations in different provinces of Turkey in order to determine their morphological features.
A total of 12 different body measurements were collected from the donkeys. The body measurements, including withers height (WH), height at rump (HR), chest depth (CD), chest width (CW), and body length (BL), were taken with a measuring stick. Other body measurements, including chest circumference (CC), head length (HL), front shank circumference (CAC), and ear length (EL), were obtained with a tape measure [21,22]. The ages of the donkeys were determined by their owners.

Chi-Square Automatic Interaction Detection (CHAID)
CHAID is a technique based on a criterion variable with two or more categories. This allows investigators to determine the segmentation with respect to that variable in accordance with the combination of a range of independent variables [23,24]. CHAID was originally proposed in [25]. The CHAID algorithm applies the F significance test to a scale response variable [26], and Bonferroni adjustment is utilised to calculate adjusted p values in the tree structure [27].
Suitable independent variables are selected from the set of input variables in such a way that, in the resulting hierarchically arranged structure, the first independent variable used to partition the input data is the one with the lowest p value, which is therefore most strongly associated with the dependent variable. In hypothesis testing, if the p value is equal to or lower than the predefined level of significance α, then the alternative hypothesis, which suggests a dependency between variables, is accepted; in the context of tree development, this denotes node splitting using the given independent variable. Otherwise, the node is considered to be a terminal node. Tree building ends when the p values of all the observed independent variables are higher than a certain split threshold [28].
The values of each independent variable are merged so that a certain number of nodes, with statistically significant differences between them, appear on the tree. The algorithm identifies the pairs of values of an independent variable that differ least with respect to the dependent variable, so that the number of categories of the predictor variables depends on the Chi-square test results and p value. If the obtained p value is higher than a certain merge threshold, the algorithm merges the particular categories with no statistically significant differences. Next, the search for a new merging pair continues until only pairs for which the p value is smaller than the prescribed level of significance α remain [28].
Animals 2023, 13, 2366
Two key functions of statistical tests in CHAID analysis can be identified: the merging of individual values to determine the categories of the predictor variables, and the selection of predictor variables according to the statistical significance of their association with the dependent variable [29]. If nonbinary predictor variables are concerned, then the test value increases along with the number of branches into which they are split. Consequently, variables with more categories are more likely to be identified as statistically significant in relation to the dependent variable than independent variables with fewer categories [24].
In classification problems with ordinal variables, the Chi-square test is used to determine the significance of the relationship and the best split at each tree building level. For regression-type problems, the F-test is used as the splitting criterion for numerical variables [30]. Such applicability to both classification and regression problems is one of the key advantages of this algorithm. Conversely, one of the key disadvantages of the CHAID method is that it requires large amounts of data, because the data are split into several groups at every tree level, and these groups may become too small for reliable analysis [31].
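The F-test criterion described above can be sketched as follows. This is a minimal illustration rather than the SPSS implementation used in the study: the body length values and the two candidate groupings are hypothetical, the function names are invented for the example, and the conversion from F to a p value (which needs the F distribution) is omitted. The `n_comparisons` argument is a placeholder for the Bonferroni multiplier, which in CHAID depends on the predictor type and the number of categories.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of response values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of candidate child nodes
    n = len(all_values)    # observations in the parent node
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1 (n_comparisons is a placeholder)."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical body lengths (cm) grouped by two candidate predictors.
split_by_rump_height = [[98.0, 100.0, 99.0], [106.0, 108.0, 107.0]]
split_by_ear_length = [[98.0, 107.0, 100.0], [106.0, 99.0, 108.0]]

f_hr = f_statistic(split_by_rump_height)
f_el = f_statistic(split_by_ear_length)
# With equal degrees of freedom, the larger F corresponds to the smaller p value,
# so the first grouping would be preferred for the split in this toy example.
```

In this toy case both candidate splits have the same degrees of freedom, so comparing F values is equivalent to comparing p values; in general CHAID compares adjusted p values.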

Random Forest (RF) Algorithm
The random forest (RF) model is a prominent method for regression and classification in the field of decision tree learning. It is very effective, as its regression accuracy is often better than that of other regression methods. The RF model was proposed by L. Breiman in 2001 [32].
Instead of splitting each node using the best split among all assessed variables, RF splits each node using the best among a subset of predictors randomly chosen at that node. A new training data set is obtained from the original data set by sampling with replacement. Then, a tree is grown using random attribute selection [33].
RF is very fast and robust against overfitting, and it is possible to form as many trees as the user wants [34]. After developing a number of decision trees, the output of the model is obtained by averaging the output values of all of the individual trees. After training single trees, the bagging algorithm is applied to the RF model. Bagging repeatedly selects bootstrap samples from the training set and fits trees $t_b$ considering the Gini impurity of these samples. After the training process, the predicted values for unseen instances x are calculated by averaging the prediction results from all regression trees, as follows:

$$\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} t_b(x)$$

where B is the number of trees. Here, a random forest (RF) was learnt from an ensemble of 500 regression trees. All variables were included as predictors, and body length (BL) served as the response variable [35]. The RF method was built using the R function "ranger", with all hyperparameters kept at the default values. The "ranger" function was applied as a faster and more memory-efficient random forest implementation for the analysis of data, compared to other commonly used random forest packages in R [36].
To model the relationship between body morphological features in this study, the RF regression model was fitted to the training input-output data as follows:
• From the initial training dataset, ntree bootstrap sample sets X_i (where i is the bootstrap iteration, restricted to the range [1, ntree]) were drawn at random with replacement. For each bootstrap sample set, the elements that are absent from X_i are referred to as out-of-bag data.
• Morphological features were randomly chosen at each node of each tree, and the feature with the lowest Gini Index that best partitioned the data was chosen.
• For each tree, the data splitting process in each internal node was repeated from the root node until a predetermined stop condition was met.
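The bootstrap and out-of-bag steps listed above can be sketched in a few lines. This is an illustrative simplification, not the R workflow used in the study: each "tree" here is a root-only stump that predicts the mean of its bootstrap sample, the body length values are hypothetical, and the function names are invented for the example.

```python
import random
from statistics import mean

def bootstrap_indices(n, rng):
    """Draw n row indices at random with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def bagged_prediction(y, ntree=500, seed=1):
    """Average the predictions of ntree stand-in trees fitted to bootstrap samples.

    Each stand-in tree predicts the mean of its bootstrap sample, which replaces
    a fully grown regression tree for the purpose of this sketch.
    Returns the averaged prediction and the mean out-of-bag fraction.
    """
    rng = random.Random(seed)
    n = len(y)
    tree_predictions = []
    oob_sizes = []
    for _ in range(ntree):
        in_bag = bootstrap_indices(n, rng)
        out_of_bag = set(range(n)) - set(in_bag)  # rows never drawn for this tree
        tree_predictions.append(mean(y[i] for i in in_bag))
        oob_sizes.append(len(out_of_bag))
    return mean(tree_predictions), mean(oob_sizes) / n

# Hypothetical body lengths (cm).
body_length = [98.0, 101.5, 103.0, 104.5, 107.0, 110.0, 99.5, 105.0]
prediction, oob_fraction = bagged_prediction(body_length)
```

The out-of-bag rows of each bootstrap sample are the ones a real random forest would use to estimate prediction error without a separate test set.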

Multivariate Adaptive Regression Spline (MARS)
Multivariate adaptive regression splines (MARS) is a nonparametric regression method put forward by Friedman in 1991 [37,38]. A regression data pair is typically denoted by $(X_i, Y_i)$, where $X_i$ represents the independent variable(s), and $Y_i$ represents the dependent variable(s). In the MARS model, there are one or more split point(s) for every independent variable, denoted as $t_i$. For $X_i \ge t_i$, there is an equation named the right-side basis function (BF), while for $X_i < t_i$ there is another equation named the left-side basis function. These two basis functions (spline functions) relate $X_i$ to the dependent variable $Y_i$. The following equations provide the mathematical representation of the right and left basis functions [39]:

$$(X_i - t_i)_+^q = \begin{cases} (X_i - t_i)^q, & X_i \ge t_i \\ 0, & \text{otherwise} \end{cases} \qquad (t_i - X_i)_+^q = \begin{cases} (t_i - X_i)^q, & X_i < t_i \\ 0, & \text{otherwise} \end{cases}$$

where q (≥0) is the power to which the splines are raised, which defines the degree of smoothness of the outcome function estimate. In a MARS algorithm, the approximated MARS function is composed of a linear combination of basis functions, each defined as a product of univariate functions. The MARS model can be written as follows [40]:

$$\hat{f}(x) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x), \qquad B_m(x) = \prod_{k=1}^{k_m} b_{km}\left(x_{v(k,m)}\right)$$

where $\beta_0$ is the intercept, $\beta_m$ are the coefficients of the M basis functions, $x_{v(k,m)}$ is the predictor entering the kth factor of $B_m(x)$, $b_{km}$ is the kth univariate function in $B_m(x)$, and $k_m$ denotes the total number of univariate terms multiplied in $B_m(x)$. When $k_m > 1$, then $k_m$ is the degree of the interaction term. Conversely, if $k_m = 1$, then the basis function is univariate. In each basis function, the break points are the knots of the basis function. The simplest forms for $b_{km}$ are truncated linear functions of the form:

$$[x - t]_+ = \max(0, x - t), \qquad [t - x]_+ = \max(0, t - x)$$

where the location t is called the knot of the basis function.
In order to eliminate redundant BFs, MARS uses the generalised cross-validation (GCV) criterion, which is stated in the following way [41]:

$$\mathrm{GCV}(B) = \frac{\frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{f}_B(x_i)\right)^2}{\left[1 - \frac{C(B)}{N}\right]^2}$$

Here, N represents the total number of points in the data, while C(B) is a complexity penalty that increases with the number of BFs in the model, determined as follows [43]:

$$C(B) = (B + 1) + dB$$

where B is the number of BFs and d is the penalty for each BF included in the model. The MARS predictive model with interaction terms used in this paper was constructed based on the lowest GCV [42]. Ten-fold cross-validation was used as the resampling technique in the MARS model.
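As an illustration of the basis functions and the GCV criterion, the following sketch evaluates right- and left-side hinge functions and computes GCV for a set of predictions. It assumes the commonly used penalty form C(B) = (B + 1) + dB; the default d = 3 and all function names are assumptions made for the example and are not taken from the study.

```python
def hinge_right(x, t, q=1):
    """Right-side basis function: (x - t)^q when x > t, otherwise 0."""
    return (x - t) ** q if x > t else 0.0

def hinge_left(x, t, q=1):
    """Left-side basis function: (t - x)^q when x < t, otherwise 0."""
    return (t - x) ** q if x < t else 0.0

def gcv(observed, predicted, n_basis, d=3.0):
    """Generalised cross-validation with penalty C(B) = (B + 1) + d * B.

    The penalty form and the default d are assumptions for this sketch.
    """
    n = len(observed)
    mse = sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n
    complexity = (n_basis + 1) + d * n_basis
    return mse / (1.0 - complexity / n) ** 2

# Hypothetical data: every prediction is off by exactly 1 cm.
observed = [100.0 + i for i in range(20)]
predicted = [y + 1.0 for y in observed]
gcv_one_bf = gcv(observed, predicted, n_basis=1)
gcv_two_bf = gcv(observed, predicted, n_basis=2)
```

With the residual error held fixed, adding a basis function raises GCV, which is why the backward pass of MARS removes terms that do not reduce the error enough to offset the penalty.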

Bootstrap Aggregating Multivariate Adaptive Regression Splines (Bagging MARS)
The Bagging MARS algorithm is a technique put forward in [44], which is used to reduce the variance of estimators in classification and regression. The use of this method is not limited to improving the estimator; it may also be used to improve accuracy and predictive power. The Bagging MARS algorithm uses bootstrapping, which is one of the resampling methods. Bagging models may provide their own internal estimate of predictive accuracy, which correlates well with either cross-validation estimates or test set estimates [45].
Theoretically, the bagging estimator is described as $f_{\mathrm{Bagging}}(x) = E[\hat{f}(x)]$ [44]. In practice, the bootstrap expectation is obtained through the use of a Monte Carlo method. For every bootstrap simulation $b \in \{1, 2, \ldots, B\}$, the MARS estimate $\hat{f}_b(x)$ is calculated to approximate the bagging expectation as follows:

$$\hat{f}_{\mathrm{Bagging}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$

where the number B determines the accuracy of the Monte Carlo approximation. Its value is generally taken as 100, depending on the sample size. A bootstrap sample of size n is a sample drawn at random with replacement from the studied data, so some data points are selected multiple times in the bootstrap sample. Bagging MARS is a useful tool for enhancing the predictive accuracy of the MARS model [46]. Here, the number of bootstrap samples was set to three.

Automatic Linear Model (ALM)
Automatic Linear Modelling (ALM) is an improved version of the linear regression method, which is used to process and analyse data and make predictions. The term ALM refers to a data mining approach similar to regression trees, which utilises a machine learning approach to determine the best predictive model using the existent data. ALM is carried out in several steps, including preliminary data processing, replacing missing data values, determining the quality predictor, identifying outliers, and calculating the stepwise model and coefficient of determination (R²). The underlying linear model can be written as:

$$y = c + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

where y is the dependent variable; c is a constant; $b_1, b_2, \ldots, b_n$ are the parameter coefficients; and $x_1, x_2, \ldots, x_n$ are the independent variables [47].
Since the process of evaluating all possible subsets can provide the best subsets after taking into account all possible regression models, the researcher can then select an appropriate final model from the most promising subsets [48].
ALM is considered a new method, introduced in SPSS software (version 19 and higher), and allows researchers to automatically select the best subset when there are generally large numbers of variables. In ALM, prediction variables are automatically transformed to provide an improved data fit, and SPSS uses time and other metrics rescaling, outlier correction, and other methods for this purpose [49].
To compare the predictive performances of the CHAID, RF, MARS, and Bagging MARS models in the 10-fold cross-validation, the following model evaluation criteria were calculated [50-53]. The data mining techniques were evaluated using an 80:20 split of training and test data.
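Root-mean-square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

Mean absolute percentage error (MAPE):

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100$$

Standard deviation ratio:

$$\mathrm{SD}_{\mathrm{ratio}} = \sqrt{\frac{\frac{1}{n-1} \sum_{i=1}^{n} \left(\varepsilon_i - \bar{\varepsilon}\right)^2}{\frac{1}{n-1} \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}}$$

where n is the number of observations, $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $\varepsilon_i = y_i - \hat{y}_i$ is the residual, and $\bar{\varepsilon}$ and $\bar{y}$ are the means of the residuals and of the observed values, respectively.

The R software version 4.2.0 was used for the analyses, taking the number of folds in the cross-validation as 10 [54]. Results were obtained using the RF algorithm in the "randomForest" package and the MARS and Bagging MARS algorithms in the "earth" package, while the model evaluation performance criteria for the data mining algorithms were calculated using the "ehaGoF" package [55].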
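The evaluation criteria can be computed directly from observed and predicted values. The sketch below uses the standard textbook definitions (sample standard deviations with n − 1, and R² as one minus the ratio of residual to total sum of squares); these are assumed to match the "ehaGoF" output but that was not verified here, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def rmse(observed, predicted):
    """Root-mean-square error."""
    return sqrt(mean((y - f) ** 2 for y, f in zip(observed, predicted)))

def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs((y - f) / y) for y, f in zip(observed, predicted))

def sd_ratio(observed, predicted):
    """Standard deviation of the residuals divided by that of the observations."""
    residuals = [y - f for y, f in zip(observed, predicted)]
    return stdev(residuals) / stdev(observed)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted body lengths (cm).
observed = [100.0, 104.0, 108.0, 112.0]
predicted = [101.0, 103.0, 109.0, 111.0]
```

A lower RMSE, MAPE, and SD ratio together with a higher R² indicate a better fit, which is the ordering logic used when the algorithms are ranked.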
Using the R "corrplot" package, the Pearson correlation coefficients between BL and body characteristics were calculated. Furthermore, the multicollinearity problem between the independent variables was assessed at the outset of the analysis, and it was discovered that there was no issue. CHAID and ALM techniques were carried out using relevant packages in the SPSS V.26.0 software (2019) [56].

Descriptive Statistics
Descriptive statistics, including the morphological characteristics, of donkeys aged 1-15 years with 4 different hair colours and bred in 13 different cities in Turkey are given in Table 1.

Correlation Matrix and Principal Component Analysis (PCA) Results
The correlation matrix for the body morphological characteristics of donkeys is presented in Table 2.
Examining the correlation coefficients in Table 2, the correlation coefficients among all morphological features were found to be positive. The highest correlations were between WH and HR (0.951), WH and HL (0.783), and BL and HR (0.767); meanwhile, the weakest correlations were between EL and HL (0.045), CD and HL (0.096), and TL and […] (0.126). This information is also confirmed in the Principal Component Analysis graph shown in Figure 1. According to the PCA, the contribution of principal component 1 (PC1) was 53.11%, while that of principal component 2 (PC2) was 10.53%, for a total of 63.64%. In the PCA, it was determined that all variables were in the same direction, and the correlation coefficients between all variables were positive. As a result of the PCA, it was found that CAC and LL, HW and CW, HR and WH, CD and […], and EL and HL were closely related to each other. In other words, the correlation coefficients between closely related variables were high and important. TL and H[…] were very distantly related, and therefore, the correlation coefficient between them was the smallest. Similarly, CD and HL and EL and TL were also distantly related, thus presenting low correlation coefficients.

The CHAID, random forest (RF), automatic linear modelling (ALM), MARS, and Bagging MARS methods were analysed to determine the effects of other morphological features on body length in donkeys. Their respective results are summarised in the following.

Result of the CHAID Algorithm
In order to determine the effects of variables on body length, the parent node/child node ratio was set as 32:16 in the CHAID algorithm. The number of folds in the cross-validation was set as 10, and the regression tree obtained by the CHAID algorithm is presented as a diagram in Figure 2. Examining the CHAID diagram (Figure 2), it can be determined that the first-order effective independent variable affecting the body length of donkeys was HR (Adj. p value < 0.001, F = 132.422); the second-order independent variables were TL (Adj. p value = 0.001, F = 17.781), CW (Adj. p value < 0.001, F = 30.249), and HW (Adj. p value < 0.001, F = 23.959); and the third-order independent variables were CAC (Adj. p value < 0.001, F = 23.827), LL (Adj. p value = 0.003, F = 14.527), and HW (Adj. p value = 0.048, F = 6.925). Branches generated by independent variables in the whole tree construction were statistically significant (p < 0.05). The performance of the CHAID algorithm was calculated as 0.728, 0.521, 3.89, and 3.03 in terms of R², standard deviation ratio (SD ratio), RMSE, and MAPE, respectively. The results of the CHAID algorithm are generally summarised as follows.

Random Forest (RF) Algorithm Results
The RF algorithm results are summarised as follows. The random forest trees created to obtain the smallest error value are presented in Figure 3.

The model was constructed using an RF algorithm with body length (BL) as the dependent variable. In the RF model, the linear traits of animals were included as predictors, namely, WH, HR, CD, CW, CC, HL, CAC, EL, province, sex, and coat colour. The random forest algorithm included 500 trees. The model described 82.95% of the variation of the dependent variable, with MSE = 8.911, RMSE = 2.985, MAE = 2.4, and Bias = 0.6. In the constructed model, the most important predictor of body length was province, followed by withers height (WH), HR, and HL, respectively (Table 3 and Figure 4). The regression tree depicting the morphological features affecting BL in the random forest (RF) algorithm is shown in Figure 5. Examining Figure 5, the following explanations can be derived from the RF algorithm (n = 297 nodes).

Automatic Linear Modelling (ALM) Results
The model prediction coefficients and significance values obtained from the ALM are provided in Table 4. The ALM evaluated the predictability of the body length mean score. The morphological features which contributed most to the model are shown in Table 4. Notably, the variables EL and sex were not statistically significant in the ALM procedure. Table 4 also shows estimates for the parameters included in the overall model and their individual effects on the target variable. The coefficients focused on the relationship that each predictor had with the mean body length, holding the values of other predictor variables constant. The importance values of the predictors, as defined by the ALM procedure, are also given in Table 4. These values were normalised, such that the importance values summed to 1. The accuracy value of this model was 71.3% (i.e., the adjusted R² of the model multiplied by 100).
The predictor importance graph (Figure 6) indicates the relative importance of each predictor in estimating the model, where the values for the province, HR, CC, HL, HW, TL, and CAC were 0.415, 0.231, 0.171, 0.065, 0.051, 0.034 and 0.033, respectively. Overall, the results indicate that province was the most important predictor of body length.
[…] (Table 5). The predictor values further indicated that the province, HR, CC, HL, HW, TL, and CAC were positively correlated with BL (Figure 8).

MARS Algorithm Results
The model estimation coefficients obtained by the MARS algorithm for the prediction of body length are given in Table 6.
According to the results presented in Table 6, all of the coefficients for the MARS predictive model were statistically significant (p < 0.001). The desirable predictive quality of the MARS equation produced here was obtained while ensuring the smallest GCV (7.862). The observed values of body length in donkeys were very strongly correlated with those predicted by the MARS model (p < 0.001). For the prediction equation of the MARS model with 50 terms, no over-fitting was observed, as the R² estimate (0.916) was close to the cross-validated R² (CVR²) estimate (0.859). The SD ratio of 0.291, RMSE of 2.173, and MAPE of 1.615 indicate that the MARS model, which captures influential factors such as morphological characteristics and age, had an excellent fit.

From the MARS algorithm results, some terms and their coefficients can be interpreted as follows: When HR > 108 in donkeys, the effect on body length was positive (coefficient 0.286); if HR ≤ 108, the corresponding coefficient was negative (−1.063314), resulting in an adverse effect on body length. If CC > 115, the effect on body length (BL) was positive (4.494783 […]

The greatest positive effect on body length in donkeys was 13.999 cm when CC > 118 cm; the second-largest positive effect occurred when HR ≤ 108, TL ≤ 46, and CAC > 13.5, in which case body length increased by 7.389 cm. The third-largest positive effect occurred when CC > 115 cm, where body length increased by 4.495 cm. As for the greatest negative effect, body length decreased by 8[…]

[…] − 0.01671139 * max(0, 108 − HR) * max(0, CC − 111) * max(0, HW − 36) * max(0, LL − 48)

It is also possible to estimate the body length by assigning various values to the morphological features that express the independent variables in the equation obtained using the MARS algorithm. For example, with Age = 12, WH = 107, HR = 106, CC = 124, CD = 47.5, CW = 34, HW = 41.5, TL = 59.5, HL = 55.5, CAC = 14.5, LL = 51, and EL = 22.5, when Colour = "Grey", Province = "Konya", and Sex = "Male", we obtain BL = 114.932 cm.
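To show how a single term of the MARS equation contributes to the prediction, the sketch below evaluates the one interaction term that survives in the text for the example donkey (HR = 106, CC = 124, HW = 41.5, LL = 51). It assumes that the hinge on HW has its knot at 36; the function names are invented for the example. Because the remaining terms of the equation are not available here, the full prediction of 114.932 cm cannot be reproduced from this term alone.

```python
def hinge(value):
    """Truncated linear function max(0, value)."""
    return max(0.0, value)

def interaction_term(HR, CC, HW, LL):
    """Contribution (cm) of the four-way interaction term to predicted body length.

    Assumption: the hinge on HW has its knot at 36, i.e. max(0, HW - 36).
    """
    return (-0.01671139
            * hinge(108 - HR)
            * hinge(CC - 111)
            * hinge(HW - 36)
            * hinge(LL - 48))

# Example donkey from the text: the hinges evaluate to 2, 13, 5.5 and 3.
contribution = interaction_term(HR=106, CC=124, HW=41.5, LL=51)
# contribution is about -7.169 cm (-0.01671139 * 429)
```

The term is active only when all four hinges are positive; for a donkey with HR above 108, for instance, the first hinge is zero and the whole product vanishes.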
The relative importance of the variables predicting body length according to the MARS algorithm is shown in Figure 9. The predictor importance graph displays the relative importance of each predictor in estimating the model, where the values for HR, WH, HL, province, HW, CW, LL, CC, CD, CAC, TL, EL, and sex were […], respectively. Overall, the results indicate that the HR variable was the most important predictor of body length.

The estimated values obtained by the MARS algorithm are presented together with the observed values in Figure 10.

Bagging MARS Algorithm Results
The

Discussion
In a previous study [16], the mean body length (BL) was 131 […] [57]. These body morphological features are consistent with those observed in this study. In another study, the average weight of West African donkeys was 126 kg, with an average height at the withers of 99.5 cm and a body length of 104.4 cm [58]. This was also very close to the results obtained in this study.
The mean body length of Turkish native breed male and female donkeys with different coat colours was 101-109 cm [14], higher than that observed in this study. In one study [13], the average height at withers calculated in adult female Amiata donkeys reared in Tuscany was 125.8 cm, and their front shank length was 16.9 cm, again higher than the values obtained in this study. The mean chest circumference was found to be 108.42 cm in 6-month-old Pêga donkeys [59]. In this study, in which different body colours were studied, we obtained similar values.
In the study [14], one-way ANOVA and Tukey's tests were used to assess the statistical significance of differences between morphological characteristics of the studied donkey groups-Banat donkey, hybrid individuals, and two sub-populations of Balkan donkeys delineated based on their nuclear genetic profiles (BalkD-BGP and BalkD-RGP). Tukey's test in the one-way ANOVA analysis indicated that statistically significant differences between the two subpopulations of the Balkan donkey were obtained for characteristics such as body length, chest circumference, chest depth, chest width, and height at withers. The body measurements obtained from Banat donkeys were different from the results presented in the current study, while those in hybrid individuals, BalkD-BGP, and BalkD-RGP donkeys were similar.
For adult donkeys, withers height was 131.1 cm [60]. The minimum withers height for females as determined by [61] was 120 cm. In the study of [62], the mean withers height for male Pêga donkeys was reported as 131 cm. The authors of [63,64] investigated the Nordestino donkey breed and reported withers height values of 117 cm and 106 cm, respectively, noting that Pêga donkeys were taller. These values were higher than those obtained in this study. The differences can be attributed to different breeds, environmental conditions, and breeding in different regions.
When the morphological features of donkeys were examined in the study of [65], the average body length was 64 cm, chest circumference 113.2 cm, height at withers 102.4 cm, tail length 60.7 cm, and ear length 26.7 cm. Compared with the body measurements in this study, the chest circumference and height at withers measurements were very close, the tail length and ear length values were higher, and the body length values were lower. While the correlations between these variables in the authors' study were in the range of −0.50-0.85, in this study, the correlation coefficients between the same variables were in the range of 0.126-0.951. Climate differences, such as temperature and precipitation, between the continents or regions where the animals were raised played an important role in this difference. In addition, the difference in donkey populations in the studies can be considered as another factor in the different results.

Conclusions
In this study, the performance indicators of the CHAID, RF, MARS, Bagging MARS, and ALM methods were analysed, in terms of their donkey body length prediction ability. A total of 11 morphological variables, as well as province, age, sex, and coat colour, were taken as inputs to build the models. The results were compared through several comparative statistics, including coefficient of determination (R 2 ), root-mean-squared error (RMSE), mean absolute percentage error (MAPE), and standard deviation ratio (SD ratio). The outcomes of this study are as follows: According to the MARS algorithm results, fourteen predictor variables affect body length in donkeys: namely, height at withers, height at the rump, chest circumference, chest depth, chest width, haunch width, ear length, head length, front shank circumference, limb length, tail length, age, sex, and coat colour. The variables that presented the largest contributions were chest circumference, height at rump, ear length, and front shank circumference.
The number of bootstrap samples was taken as three in Bagging MARS, which is a useful tool that can be used to improve the predictive accuracy of MARS models. However, the MARS algorithm obtained better donkey body length prediction results.
The RF method was effective in predicting the body length of donkeys, capturing 82.95% of the variation. Meanwhile, the accuracy value of ALM was 73.1%, lower than that of the RF model.
In order of importance, the variables affecting the body length in donkeys were Province, WH, and HR for the RF Algorithm; HR, WH, and HL for the MARS Algorithm; and, HR followed by TL, CW, and HW for the CHAID algorithm.
In terms of the performance results, the algorithms followed the order MARS > Bagging MARS > Random Forest > CHAID > ALM (best to worst).
Through the use of livestock data, it was concluded that data mining methods are very useful for determination of the relationships between body morphological properties, potentially allowing for the estimation of any variable.

Informed Consent Statement: Informed consent was obtained from all farm owners involved in the study. As this study did not involve sensitive data collection from human participants, no human ethical approval was necessary to be obtained in Turkey.