Bayesian Linear Regression Modelling for Sperm Quality Parameters Using Age, Body Weight, Testicular Morphometry, and Combined Biometric Indices in Donkeys

Simple Summary The prediction of sperm output and other reproductive traits based on testicular biometry is an important tool in the reproductive management of stallions. Nevertheless, corresponding research in donkeys remains scarce. Several donkey breeds in Europe face a compromising threat of extinction, which has been accelerated by the low renovation of populations and their inbreeding levels. Although research on female reproductive physiology has made crucial advances, much less is known about the physiology of the male. In the present work, two Bayesian models were built to predict for sperm output and quality parameters in donkeys. Models included combinations of age as a covariate and biometric and testicular measurements as independent factors. Results evidenced that the goodness-of-fit was similar for both models—hence, the combination of biometry and testicular factors presented improved predictive power. The application of these models may assist in the process of making decisions in respect to the reproductive/biological, clinical, and selection handling of the animals. Abstract The aim of the present study is to define and compare the predictive power of two different Bayesian models for donkey sperm quality after the evaluation of linear and combined testicular biometry indices and their relationship with age and body weight (BW). Testicular morphometry was ultrasonographically obtained from 23 donkeys (six juveniles and 17 adults), while 40 ejaculates from eight mature donkeys were analyzed for sperm output and quality assessment. Bayesian linear regression analyses were considered to build two statistical models using gel-free volume, concentration, total sperm number, motility, total motile sperm, and morphology as dependent variables. Predictive model 1 comprised the covariate of age and the independent factors testicular measurements (length, height and width), while model 2 included the covariate of age and the factors of BW, testicular volume, and gonadosomatic ratio. Although goodness-of-fit was similar, the combination of predictors in model 1 evidenced higher likelihood to predict gel-free volume (mL), concentration (×106/mL), and motility (%). Alternatively, the combination of predictors in model 2 evidenced higher predictive power for total sperm number (×109), morphologically normal spermatozoa (%), and total motile sperm count (×109). The application of the present models may be useful to gather relevant information that could be used hereafter for assisted reproductive technologies.


Introduction
Measuring testicular size reports an approximate measurement of the amount of testicular parenchyma present in a certain individual, which in turn determines the potential for sperm production [1,2]. Studies on equine testicular biometry are fairly common, and the correlation between testicular dimensions and the capacity for sperm production has frequently been addressed in the literature, allowing the establishment of a predictive formula for daily sperm output (DSO) [2][3][4]. Still, contextually, corresponding publications on male donkey remain scarce. Besides biometric testicular data such as length, width, and volume, other factors might be investigated as independent variables (covariates) in DSO prediction, like age, body weight (BW), or the gonadosomatic ratio (GSI), which is the gonadal weight/BW ratio [5]. Previous studies evidenced that knowledge from stallions could not be directly assumed for donkeys, as their reproductive physiology may present some specific particularities. For instance, donkeys present spermatogenesis cycles that last for 47.2 days, with each spermatogenic stage lasting for 10.5 days [6].
The numbers of the population of the Burro de Miranda's (Equus asinus), as it occurs in other donkey breeds in southern Europe countries [7], dramatically fall, with a mature male population of around 40 jackasses and 300 breeding females. Besides, reproduction rates in such populations are low, with an excessive overuse of the same males in the past, which derived in the occurrence of genetic bottlenecks. Although the reproductive physiology of Miranda donkey females has already been studied [8], no research focusing on the biometry or reproductive physiology of Miranda donkey males has been reported to date. Considering the actual population structure, the interest in applying assisted reproductive technologies (ARTs) have raised. However, the implementation of these techniques requires consistent knowledge of reproductive biology in order to select the best males for ARTs programs to be successful. Testicular biometry, due to its correlation with sperm production, is a valuable tool to estimate male fertility and is also an essential element of the breeding soundness evaluation (BSE). In light of the aforementioned points, the accurate prediction of sperm output and quality parameters obtained from biometrical and testicular morphometric parameters, will improve the effectiveness of reproductive management in male donkeys.
In this context and bearing in mind the small but diverse mature male population, some issues in regards the statistical approach to follow may arise. For these reasons, the statistical tools used in this study were chosen to fit the characteristics of the data to be analyzed. According to Oravecz and Muth [9], the popularity of growth curve modelling (GCM) lies in its flexibility to simultaneously analyze within-individual changes (e.g., changes with age, change due to intervention, due to natural changes occurring along the life of the individuals, etc.) and between-individual effects (i.e., individual differences). In other words, GCM may be useful to model inter-individual differences and intra-individual variation. GCM has been successfully used to model the evolution of semen parameters in males from other species, such as boars [10].
An individual's specific growth trajectory, specified as a mathematical function that describes how variables reciprocally relate over time, captures how an individual uniquely changes. GCM covers situations that range from those for which the change function is linear to other occasions when curvilinear polynomial functions are fitted (for instance, quadratic, cubic, etc.), which means that modelling is not limited to consider straight-line functional growth. Beyond handling varying growth functions, GCM can flexibly handle unbalanced designs, meaning study individuals may be measured at different occasions and need not be excluded from the analysis, even if some of their measurements are missing [8].
In these regards, Bayesian inference potentiates the flexibility of GCM, given Bayesian analyses do not assume large samples, as it would happen in maximum likelihood estimation (either it is nonparametric or parametric inference). Besides, smaller data sets can be evaluated preventing power loss and retaining precision, as suggested by Hox, et al. [11] and Lee and Song [12]. In small sample size conditions, the probability of finding significant results decreases [13]. Given power issues, this limitation often translates into an increased hardness to obtain meaningful results [14].
According to Stoltzfus [15], the basic assumptions that must be met for the outputs of regression analyses to be valid include independence of errors, linearity in continuous variables, the absence of multicollinearity, and a lack of strongly influential outliers. Additionally, there should be an adequate number of events per independent variable (covariate) to avoid an overfit model. Commonly recommended minimum "rules of thumb" range from 10 to 20 events per covariate. This would be supported by Chen, et al. [16], who suggested that the usual minimum number of observations for running a linear regression to be 30 to obtain statistically significant estimates. The same authors would even state that sometimes this requirement cannot be met, for instance when the number of individuals in the sample is limited, which is common to all donkey breeds [17]. Consequently, the general rule of thumb explains that, to succeed when conducting a linear regression analysis, the number of observations must not be smaller than 30 or 3× (k + 1), where k represents the number of independent variables (covariates); hence, the sample size used in the present study fulfils all the assumptions to be used in linear regression analyses.
Contextually, Bayesian estimation methods have been reported to require a much smaller ratio of parameters to observations (1:3 instead of 1:5); that is, Bayesian inference maximizes the ability to determine significant effects for relatively limited sample sizes. These sample limitations are reflected in the broadening of confidence intervals, which must be accompanied by an acceptable Bayes factor value.
To the best of our knowledge, no previous study has reported an estimation of donkey sperm output and quality traits using a Bayesian approach. In this context, the aims of the present study are to define and compare the predictive power of two Bayesian predictive models for sperm quality parameters using linear testicular measurements, combined biometrical indices, and their relationship with age and BW as predictive factors.

Animals
The study was carried out in the Veterinary Teaching Hospital of University of Trás-os-Montes and Alto Douro (VTH-UTAD, Vila Real, Portugal). Animals have been evaluated with approval and in collaboration with the Association for Study and Protection of the Donkey Breed Burro de Miranda (AEPGA), in the behalf of a scientific protocol of caooperation signed between both institutions. All animal procedures were conducted in accordance with national laws for animal welfare and experimentation as with the EU Directive 2010/63/EU for animal experiments and the approval of the Directive Hospital Committee (Approval Ref. 408/VTH-UTAD).
Animals were clinically examined, and the genital tract was palpated previous to ultrasonographic (US) evaluation. Body weight (BW) (kg) was assessed using an equine digital floor weighting scale. For the testicular morphometric evaluation, 23 Miranda donkeys were considered. Animals were allocated to two groups by age; juvenile to prepubertal (n = 6) (≤14 months) and mature (n = 17) (≥24 months) ( Table 1). Only clinically healthy animals with normal size and consistency symmetrical testis and epididymis showing no echogenic changes in the testicular parenchyma were included. Epididymis and spermatic cords were included. Either in the juvenile or adult group, only animals with both testicles at scrotal position were considered. For the assessment of sperm, a sub-group of eight Miranda mature donkeys was selected from the mature group for further sperm collection and evaluation. Exams were performed during the autumn-winter season in 2018-2019 (US evaluations of the juveniles); and spring-summer in 2019 (semen collections and US examination of adult males).

Testicular Morphometry Evaluation
US testicular measurements were obtained from 23 donkeys aged from 7 to 259 months old and with 120 to 400 kg of weight (juvenile donkeys means: 11.17 ± 2.77 months old and 160.33 ± 24.66 kg; mature donkeys means: 79.94 ± 50.25 months old and 279.47 ± 54.43 kg, respectively). US measurements were performed with Philips ® CX30 Portable Ultrasound (Philips ® , Amsterdam, Holland) with a sectorial 3.0-7.0 MHz transducer, following the previously described technique for stallion measurements [4]. Longitudinal and transversal plans were performed in each testicle, being the epididymis excluded from testicular US measurements. The electronic cursors were placed at the limit of tunica albuginea, and after three consecutive scans, the following parameters were obtained considering the largest measurement (cm): right and left length (L), height (H), and width (W) (cm). Right and left testicular volume (TV) were calculated using the Lambert formula, TV = L × W × H × 0.5233, used to measure the volume of an ellipsoid [4]. Total testicular volume (TTV), which represents the sum of the right and left TV, was obtained for each donkey. To compute gonadosomatic ratio (GSI) (%), i.e., testicular weight/BW, TTV (cm 3 ) was directly converted into grams, based on the fact that testis volume density in mammals is very close to one [18]. After US measurements, routine orchiectomy was performed on five juvenile and two adult donkeys. After surgery, the extirpated testis (n = 14) were measured-the same measurements as in vivo-using precision sliding calipers.

Semen Collection and Evaluation
A sub-group of eight jackasses, ranging between 34 and 259 months of age (214-400 kg), was selected from the mature group for semen collection and further evaluation. A total of 40 ejaculates (five ejaculates per jackass) was collected. Donkeys had been successfully used in previous natural services. Before starting the experiment, sperm collections were performed for three consecutive days to minimize the number of sperm from extra-gonadal reserves, as it has been previously reported for donkeys [19]. Collections were performed at two-day intervals and using an artificial vagina (AV) (Hannover model-Minitub Iberica S.L., Tarragona, Spain) lubricated with non-spermicidal gel (ReproJelly-Minitub Iberica S.L., Tarragona, Spain), using a jenny in heat as a mount. The AV was filled with warm water to reach and maintain an inner temperature of 50-55 • C. A sterile semen collection bottle was used in each collection. The gel fraction was removed by filtering the whole ejaculate with a nylon filter (Minitub Iberica S.L., Tarragona, Spain). Gel-free ejaculate was immediately evaluated for volume (mL), motility (%), concentration (×10 6 /mL), and percentage of morphologically normal (%). Volume was measured in a graduated semen collection bottle. Then, each collected ejaculate was evaluated for sperm motility and concentration. For sperm motility evaluation, an aliquot of gel-free ejaculate was immediately extended 1:1 (vol/vol) with INRA 96 extender at 37 • C. Sperm motility was blind and subjectively estimated by the same experienced operator after the evaluation of motile spermatozoa (%) considering five different fields under light microscopy (×200), placing a semen droplet in a prewarmed (37 • C) slide covered by a cover slip. Concentration was determined using an improved Neubauer hemocytometer. Total sperm number (TSN, ×10 9 ) was computed considering the product between the volume of gel-free ejaculates and sperm concentration, whereas total motile sperm count (TMS, ×10 9 ) was obtained by computing the product between motility and TSN. Sperm morphology defects (head, intermediary piece, tail) were evaluated in eosin-nigrosin stained smears using light microscopy in oil immersion objective lens (×1000), counting a total of 200 sperm cells [20].

Parametric Assumptions Testing and Approach Decision
Since sample size was a limitation in this study, parametric assumptions were tested to decide on the most appropriate statistical approach to follow to analyse the present data. The Shapiro-Francia W' test (for 50 < n < 2500 samples), Shapiro-Wilk test (for n < 50 samples), and Levene's test were used to discard gross violations of parametric assumptions (normality and homoscedasticity). The Shapiro-Francia W' test was performed using the Shapiro-Francia normality routine of the test and distribution graphics package of the Stata Version 15.0 software (StataCorp [21]. Supplementary Tables S1 and S2 report a gross violation of normality assumption occurred in all variables of testicular biometry and sperm parameters (p < 0.01), respectively, except for gel-free volume (mL) and sperm concentration (×10 6 /mL). Homoscedasticity was violated as well (p < 0.01); hence, a nonparametric approach was suggested.
All statistical tests, including all Bayesian procedures, were performed using the explore procedure of the descriptive statistics package in SPSS Statistics (Version 25.0, IBM Corp., Armonk, NY, USA) [22].

Comparative Analysis of US and Caliper Testicular Morphometry between Juvenile and Mature Jacks
Bayesian one-way ANOVA procedure was used to detect differences in the means for testicular measurements between juvenile and mature jackstocks using the Bayesian ANOVA task from the Bayesian statistics procedure of SPSS Statistics, Version 25.0, IBM Corp. [22].

Analysis of US Testicular Morphometry, Age and BW
Bayesian inference of Pearson's correlation was used to characterize the posterior distribution of the linear correlation between age and BW, US testicular measurements, and composite indices using the Pearson correlation task from the Bayesian statistics procedure of SPSS Statistics, Version 25.0, IBM Corp. [22]. The correlation methods used and discussed in this paper can be validly used even if we work with repeated measures as we tested independent data [23]. Furthermore, in case variable pairs tested held a perfect linear correlation r xy = 1, the integral equation to perform Bayesian inference for Pearson's correlation would not have converged [24].

Analysis of US and Caliper Testicular Morphometry
Bayesian inference of Pearson's correlation was used to characterize the posterior distribution of the linear correlation between caliper testicular biometry variables (n = 14 testis) and US testicular biometry variables (n = 46 testis). The Pearson's correlation coefficient measures the pairwise linear relation between the dependent variable y and the independent variable x. When r xy = |1|, the dependent variable y is perfectly linearly correlated with the independent variable x. Then, following a decreasing order, a coefficient of |0.8| < r xy <|1| suggests a strong linear correlation; a coefficient of |0.3| < r xy <|0.6| suggests a moderate correlation; and a coefficient of 0 < r xy < |0.3| suggests a weak correlation, respectivelyProfillidis and Botzoris [25].
The two methods were compared to decide on whether to use US or real biometric parameters or a combination of both to build Bayesian regression models. Bayesian inference for Pearson correlation was performed using the Pearson correlation task from the Bayesian statistics procedure of SPSS Statistics, Version 25.0, IBM Corp. [22]. The aforementioned test evidenced that US and caliper measuring methods were significantly correlated.
Supplementary Table S3 summarizes the estimated Pearson's correlation pairwise coefficients and respective Bayes factors. For all measurement pairs, the estimated Pearson's correlation coefficient was always higher than 0.938, with corresponding Bayes factor of <0.001. As a result, the use of US measurements was exclusively selected to integrate the models for predicting sperm output, provided measurements were taken in vivo, hence they had a higher clinical applicability.
According to Dogan [26], although correlation analyses may erroneously detect the occurrence of incidental relationships instead of meaningful clinical/biological association, these may be a preferable choice under certain contexts. For instance, the same authors reported that one of the critical problems in other presumably more robust techniques such as the Bland-Altman analysis relies on the need for the data to meet the assumption of a normal distribution. Contrastingly, when testing Pearson's correlations, the pairs of continuous variables need not be normally distributed, although their differences should.
To determine the violation of this assumption, data may be tested against the normal distribution using classical methods such as the Shapiro-Wilk test or the Kolmogorov-Smirnov test.
Additionally, the same authors reported the fact that the Bland-Altman analysis is not an appropriate method to compare items for which repeated measurements were considered, as in the present study. In these regards, Batterham [27] would suggest that in a spreadsheet-based simulation of calibration and validity studies, a Bland-Altman plot of difference versus mean values for the instrument and criterion may show a systematic proportional bias in the instrument's readings, even though none is present. This artifactual bias arises in a Bland-Altman plot of any measures with substantial random error. In contrast, a regression analysis of the criterion versus the instrument shows no bias. In this context, a regression analysis also provides complete statistics for recalibrating the instrument, if bias develops, or if random error changes since the last calibration. Consequently, the Bland-Altman analysis of validity should therefore be abandoned in favor of regression, as was performed in our study.
Each of the regression models used in this study followed the general equation y i = X 1 β 1 + . . . X i β i + ε i , where i = 1,2, . . . i is the ith number of factors; y i is the vector of records for the aforementioned dependent variables with dimension n (217 records belonging to 31 jacks); X i is the appropriate incidence matrix for factors; and β i are the standardized regression coefficients for the ith number of factors and covariates considered, respectively. The general regression equation for model 1 was Y = Intercept + β age (months) ·age (months) Oppositely, the general regression equation for model 2 was Y = Intercept + β Age (months) ·Age (months) + β BW (kg) · BW (kg) + β TTV (cm3) ·TTV (cm 3 ) + β GSI ·GSI, except for gonadosomatic ratio (GSI) (%), for which the last term in the equation was not included, provided this term refers to gonadosomatic ratio (GSI) (%) itself (β GSI ·GSI).
According to Carlin [28], Bayesian inferences are sensitive to the dependence of variables on time (conditional on θ and x). If such dependence is large, it needs to be modeled, or the inferences will not be appropriate. For this reason, age was considered in the models. Under this design, time (age) plays a similar role to a blocking variable or covariable. For example, suppose that E(y|x, θ) has a linear trend in time (age) but that this dependence is not modeled (that is, suppose that a model is fit ignoring time (age)). Then, posterior means of factors or covariables in the model will tend to be reasonable, but posterior standard deviations will be too large, because this design yields treatment assignments that, compared to complete randomization, tend to be more balanced for time (age).
As Brewer [29] suggested, in our case, the use of an intercept was necessary as an empirical need for it was detected (for instance when unstandardized coefficients are used as in the present study). In these regards, confidence intervals for the estimated intercept were used as empirical indicators for the need of the intercept. Residual effects (ε i ) were assumed to follow a normal distribution as ε i |X i N 0, σ 2 εi , where X εi is an identity matrix and σ 2 εi is residual variance, respectively. For a continuous predictor variable, such as those in the present study, unstandardized coefficients are produced by the linear regression model using the independent variables measured in their original scales.
Unstandardized coefficients β i can be interpreted considering what was stated by Hayes, et al. [30]-that all other variables being held constant, an increase of one unit in X i is associated with an average increase of β i units in Y. In the sections below, a detailed summary of the priors and posterior distributions used in this study is reported. A full description of the algorithms used by SPSS to perform Bayesian Inference on Multiple Linear Regression Models in this study can be found in the public document IBM SPSS Statistics Algorithms v. 25.0. by IBM Corp. [24].
When large number of parameters are being considered in a model, quadratic approximation has been reported to be computationally faster in terms of discretization and computing the likelihood over all possible parameter combinations compared to other approximations such as the Markov Chain Monte Carlo (MCMC) methods used in this study. However, the use of this quadratic approach was not feasible given it assumes the posterior distribution follows a normal distribution. In the context of our data, this assumption cannot be presumed provided the gross violation reported for the distribution properties reported at previous assumption testing stage.
After Bayesian Pearson's correlation coefficients across variables had been performed, two distinct combinations of factors were evaluated. First, model 1 comprised the covariate of age (months) and the independent factors of LLT (length of left testicle) (cm), LRT (length of right testicle) (cm), HLT (length of left testicle) (cm), HRT (height of right testicle) (cm), WLT (cm) (width of left testicle), and WRT (width of right testicle) (cm). Second, model 2 comprised the covariate of age (months) and the factors BW (kg), TTV (total testicular volume) (cm 3 ) and GSI (%). Lowest correlations were found for age and any of the rest of variables, hence, the covariate was retained in both models. The value of almost 1 for the correlation found between VLT (cm 3 ) (volume of left testicle) and VRT (volume of right testicle) (cm 3 ) and TTV (cm 3 ) was the basis to decide on using composite TTV (cm 3 ), given the reduced number of variables in model 2. BW was only considered in model 2, given the high generalized close to or above 0.9 correlations that it held with biometric caliper measurements. Bayesian linear regression analyses were performed using the linear regression package from the Bayesian statistics task of SPSS Statistics, Version 25.0, IBM Corp. [22]. The Bayesian Linear Regression test routine of the linear regression and related package of the Stata Version 16.0 software process was used to compute posterior distribution statistics for each factor included in each model to predict for each dependent variable. Once the analysis had been performed, we interpreted the estimated effect of the factors considered in the resulting predictive models, their confidence intervals, and the posterior distribution statistics.

Jeffrey-Zellner-Siow (JZS) Mixture of g-Priors
For the present analyses, the Jeffrey-Zellner-Siow mixture of g-priors [31] was used. Jeffrey-Zellner-Siow's prior somehow appears as a data-dependent prior through its dependence on X i , but this has been reported not to be a drawback since regression models are conditional on X i . As suggested by Heck [32], JZS prior could be an alternative that may satisfy several theoretical requirements such as the equality constraint on the testrelevant parameters, for instance of β, which leads to the null hypothesis H 0 = β = β 0 [33]. The benefits of JSZ prior distribution had also been reported by Rouder, et al. [34] and Liang, et al. [31]. Contextually, conditional on the residual variance (σ 2 εi ), the JZS prior defines a multivariate Cauchy distribution for the slope parameters of the full model, as follows (β i |σ 2 εi ) ∼ MVC 0 P , γ i 2 σ 2 εi C i −1 , which is defined by a P-dimensional zero vector (location vector) and a scale matrix. The constant γ i determines the amount of scaling, which is chosen by the user a priori, the residual variance σ 2 εi , and the matrix C i = X i X i /N i , which is the covariance matrix of the centred design matrix X i .
There are qualities of the JZS prior [34] that make it especially appropriate when performing linear regression analyses. Among these, the prior is symmetric and centered at zero in line with the predictive matching criterion as reported by Bayarri, et al. [35]. As a result, positive and negative values of the slope parameters have a priori the same probability to occur. Furthermore, JZS prior is scale invariant, thus the resulting Bayes factor does not depend on the scale of both the dependent variable and factors or covariates, hence results do not change when different unit variables are evaluated together, which is common in field conditions studies.
This independence from the measurements of model elements is achieved by scaling the multivariate Cauchy distribution by the residual variance σ 2 εi (a priori, a larger residual variance implies larger slopes) and by the inverse of the covariance matrix C i (a priori, a covariate with a larger variance implies smaller slopes). It may be worth considering that the procedure of defining a scaled prior for unstandardized coefficients (β i ) equals the process of defining a prior for standardized coefficients (β * i ) [34]. Third, the scale parameter γ is fixed to a constant by the user, which allows prior beliefs to be specified about the expected effect size. IBM Corp.
[24] algorithm guide reports that the algorithm of JZS prior for linear regression analyses, to compute the Bayes Factor uses the default value of γ = 2 √ π = 3.5, which reflects a prior belief of a medium effect size. For a single covariate x, this choice implies that the standardized regression slope Authors such as Rouder and Morey [36] also reported additional theoretical advantages of the JZS prior, such as its consistency in model selection (the fact that the Bayes factor, goes to infinity as the number of observations N increases without bound-favoring the data-generating model) or consistency in information (the Bayes factor for a certain effect goes to infinity as the proportion of explained variance or R Squared (R 2 ) increases to 1). Additionally, Bayes factors for JZS prior can be relatively easily and highly precisely computed [37] and has been adapted for the default t-test [38], ANOVA [34], and linear regression [32].

Factor and Covariate Effects Bayesian Modelling (FCEBM)
Being y i , any of the effects of any of the independent variables (covariates) considered in this study, the posterior distribution of y i in the context of the data D is This is an average of the posterior distributions of each model, weighted by the corresponding posterior model probabilities. In the previous equation, the posterior predictive distribution of y i given a particular model M i is and the posterior probability of model M i is given by is the likelihood, and p(M i ) is the prior probability that M i is the true model. For a problem with P potential covariates, the number of models, K, can be enormous (K = 2 P in the absence of other constraints). Only a small number of these models will have much support from the data, thus be selected by SPSS for each covariate. Marginal posterior distributions of all unknowns were estimated using the Gibbs sampling algorithm.

Factors and Covariate Effect Bayesian Interpretation (CEBI)
The checklist proposed by Depaoli and Van de Schoot [39] was used to detect issues to check before estimating the model, (b) issues to check after estimating the model but before interpreting results, (c) understanding the influence of priors, and (d) actions to take after interpreting results.
Interpreting the effect of each particular covariate (independent variables used in this study) can be made as follows.
First, the posterior probability p[β * i = 0/D] expresses the likelihood that the factor or covariate has an effect on each particular variable. Standard rules of thumb [40] for interpreting this posterior probability are as follows: <50% evidence against the effect; 50-75% weak evidence for the effect; 75-95% positive evidence; 95-99% strong evidence; >99% very strong evidence, whose results are comparable to commonly used thresholds to define significance of evidence through Bayes factor (BF) as reported in Supplementary  Table S4.
Second, posterior distribution estimates (means) are used to measure the magnitude of the effect of a particular factor and covariate. For continuous predictor variables (metric covariates), such as the numeric variables used in this study, the regression coefficient represents the difference in the predicted value of the response variable for each one-unit change in the predictor variable, assuming all other predictor variables are held constant. When response variables are metric and can readily be interpreted in terms of impact, such as the ones in our study, β regression coefficients effect sizes by themselves.
Third, the 95% credibility interval shows that there is a 95% probability that these regression coefficients (posterior distribution mean value for each covariate and factor) in the population lie within the corresponding credibility intervals. When 0 is not contained in the credibility interval, a significant effect for such factor is detected.
Supplementary Table S5 report a summary of posterior distribution statistics from Bayesian unstandardized linear (β) regression coefficients for each of the aforementioned variables considered in the analyses and a summary of Bayesian ANOVA outputs to test for differences in the means for US and caliper testicular measurements between juvenile (n = 6) and mature donkeys (n = 17).

Convergence Criterion
The rounds of iteration continued until a tolerance convergence criterion of 10 −8 was reached as suggested in literature [41]. Once the convergence criterion was reached, initial parameters were set, and model fitting properties were evaluated. The maximum number of iteration rounds used for each analysis was 2000 as suggested in IBM SPSS Statistics Algorithms version 25.0 by IBM Corp.
[24]. This convergence criterion was chosen provided it has been used in Bayesian ANOVA and linear regression analyses in research contexts of limited sample sizes [42].

Model Validity, Explanatory Power of Present Data, and Predictive Power of Future Data
The process of validation and comparison of Bayesian model is fully mathematically described in Geweke [43]. In this context, some authors [44] have suggested a correct proof for model validation should be based on the mean square error (MSE) of the models being evaluated. Additionally, although mean square residual or error (MSE) and minimum mean-square residual or error (MMSE) have been used and widely reported to measure how close a regression line is to a set of points (how good a certain model fits the data being observed), mean square prediction error or MSPE (= RSS/no. of observations) was chosen to measure error variation given MSE has been reported to be influenced by the number of predictors [26] in cases of reduced sample sizes [45,46].
Residual sum of squares (RSS) measures the amount of variance in a data set that is not explained by a regression model. That is, if we consider a regression to be a measure of the strength of the relationship between a dependent variable and an independent variable from a set of independent variables, then the RSS measures the amount of error remaining between the regression function and the data set-hence, it essentially determines how well a regression model explains or represents the data in the model. A smaller RSS figure represents a better suitability of the regression function to model for the data that it is intended to model.
In Bayesian inference, Monte Carlo Standard Error (MCSE) is another measure of accuracy of the chains. It is defined as the standard deviation of the chains divided by their effective sample size. MCSE has been reported to be the nonparametric or Bayesian counterpart of MSPE, and has been suggested to be used as the validation criteria in Bayesian Linear Regression model comparison studies [47].
Bayes factor (BF) provides an indirect measure of the explanatory power of the model to describe presently observed data (in our study). Larger BFs imply higher likelihoods for the combination of factors considered to explain the response variables being modelled. Commonly used thresholds to define significance of evidence following the premises by Jeffreys [48] and Lee and Wagenmakers [49] are reported in Table S4. Intrinsically related to BF, Bayesian R 2 can be considered as a data-based estimate of the proportion of variance explained for data. Additionally, acceptance rate, efficiency, and Monte Carlo standard error (MCSE) were used to determine the validity of the Bayesian methods implemented. Supplementary Table S4 reports a summary of the description and interpretation of each model validity parameter used. Bayesian statistics predictive accuracy of the model [50] can be estimated through posterior predictive checking [51] (Supplementary Table S6). BIC was then calculated, as it explains how well the model will predict on new data. Bayesian information criterion (BIC) or Schwarz information criterion (also SIC, SBC, SBIC) was computed as follows: where MSPE is the mean squared prediction error, N is the number of observations or records, and K is the number of independent parameters of the model. BIC was evaluated to compare predictive power across models. To summarize, BIC considers both the statistical goodness of fit and the number of parameters that have to be estimated to achieve this particular degree of fit, by imposing a penalty the number of parameters is increased [52,53]. BIC measures the trade-off between model fit and complexity of the model to determine [54]. Lower BIC values suggest that a particular model should have improved prediction properties in comparison to models for which higher values have been reported. In these regards, Bayesian R 2 answers a different question as Bayesian R 2 estimates the explanatory power of observed data, when the model is regression and non-adjusted R 2 is used.
Frequently, when more variables are added, model predictive accuracy decreases. Consequently, a model with higher R 2 will have higher-hence, worse-BIC values. The addition of "noise" variables to the fit (for which a relationship has not been suggested) will increase R 2 values, but it will also decrease predictive power of the model. Hence, the model with more "noise" variables will have higher R 2 and higher BIC. Table 2 reports a summary of the descriptive statistics for age, BW, ultrasonographic (US) and caliper testicular morphometry (cm); L, H, W, TTV, GSI for juvenile and mature donkeys (n = 46 testis). Descriptive statistics of US and caliper measurements were computed for each group to perform a comparative analysis (Supplementary Table S7a,b). In the juvenile group, mean TTV (cm 3 ) was 17.74 ± 9.89 (n = 12 testis), while in mature group, TTV (cm 3 ) was 271.69 ± 133.21 (n = 34 testis).

Descriptive Analysis for US Testicular Morphometry, Combined Biometric Indices and Sperm Output
In the juvenile group, there was a progressive increase in all US testicular measurements, namely in TTV, from seven to 24 months, which was especially noticeable after 11 months of age. At 12-14 months, mean TTV was 21.05 ± 9.30 cm 3 (n = 10 testis), and at 24-26 months, TTV was 85.27 ± 18.66 cm 3 (n = 4 testis) (P < 0.01). Additionally, an increase in TTV was described after 150 kg of BW had been attained, which was verified in all donkeys after 12 months. On the other hand, after 168 months of age, a gradual decrease in TTV was noted. No difference between left and right testicle was found. Gonadosomatic ratio (GSI) (%) means in juveniles was 0.11 ± 0.06 and in matures 0.95 ± 0.39. Significant differences (p < 0.001) were found between juvenile and mature groups for all testicular biometrical parameters (Table S5).

Bayesian Pearson's Correlation Coefficients Preliminary Testing
Following a probabilistic view of regression, it can be assumed that any dependent variable (Y) has a certain associated variance σ 2 . Linear regression bases on identifying the weight vector from observed data of a dependent variable to then use it to make predictions. For the model to be stable enough, the variance of the weight vector (W ls ) should be low. If weight vectors variance is high, it means that the model is very sensitive to data. The weights differ largely with observed data if the variance is high. This means that the model might not perform well with observed data. When highly correlated covariables are used in regression models, the variance of the weight vector will be large. This occurs because when highly correlated features (covariates or factors) are considered, the values in the Singular Value Decomposition "S" matrix will be small. Hence inverse square of "S" matrix (S −2 ) will be large which makes the variance of W ls large. For these reasons, Pearson's correlation coefficients must be tested prior to performing regression analyses. Table 3 summarizes the estimated sample Pearson's correlation coefficient and the Bayes factors for BW (kg), age (months), US testicular biometric parameters and composite indices. For all variable pairs, the estimated Pearson's correlation coefficient was always higher than 0.461, with a corresponding Bayes factor of <0.001, in all cases. Besides, moderate to high Bayesian inference Pearson's correlation coefficients were found between age, BW, and testicular biometric variables. Pearson's correlation coefficients between testicular biometry and BW were always >0.778, whereas Pearson's correlation coefficients between testicular biometry and age were >0.467.

Bayesian Linear Regression Modelling for Sperm Quality and Output Predictions Model Explicative Power
Bayesian determination coefficients (R 2 ) or percentages of variance captured for each of the two models and their respective Bayes factors are provided in Table 4. Both models were considerably more likely than others comprising just the intercept.
Bayesian estimates of linear regression coefficients for predictive models 1 and 2 for gel-free volume, concentration, morphologically normal or abnormal, TSN, GSI, motility, and TMS are presented in Tables 5-7. The intercept term in the regression evidences the average expected value for the response variable when all of the predictor variables are equal to zero.

Predictive Power and Model Validity
Posterior predictive P values for models 1 and 2 were around 0.331. The combination of predictors in model 1 evidenced a higher likelihood to predict for gel-free volume (mL), concentration (×10 6 /mL), and motility (%) (BIC: 387.587 to 534.480). Yet, the combination of predictors in M=model 2 evidenced a higher likelihood to predict for TSN (×10 9 ), morphologically normal and abnormal spermatozoa (%), TMS (×10 9 ) and gonadosomatic ratio (GSI) (%), (BIC: −40.559 to 34,635.240). Age-related effects were verified on the following parameters: gel-free volume, morphologically abnormal spermatozoa (%), and TSN (Tables 5 and 6). The summary of the results for the parameters of validity of both models is reported in Table 8.   TTV-total testicular volume; TSN-total sperm number; TMS-total motile sperm count; GSI-gonadosomatic ratio.

Testicular Morphometry (Juveniles and Matures) and Sperm Quality Parameters in Miranda Donkey Breed
The indication that testis biometry could provide a quantitative indication of sperm production has been previously reported in bulls [55,56], bucks [57,58], and dogs [59,60]. In the horse, morphometric, ultrasonographic-echotextural, and histomorphometric studies have been carried out [3,59,60], which evidenced the relation between testicular dimensions and sperm outputs [2] and the contribution of the ultrasonographic (US) evaluation in the accurate evaluation of the testicular functional status [61].
Albeit less than in horses, some studies on testicular morphometry have been conducted in donkey breeds such as Brazilian Pêga [6,62]; Ethiopian [63]; Egyptian [64,65]; and in the Italian breeds, Ragusano [19] and Martina Franca [66,67]. These studies have addressed the considerable existing variation among donkey breeds, which led to the need of investigating testicular dimensions in Miranda donkey breed. Besides, no previous works on Bayesian approaches to predict for sperm output and quality in donkeys has been conducted to the knowledge of the authors.
Mean US values of TTV of 271.69 cm 3 (±133.21) obtained in mature Miranda donkeys were higher than those found in Egyptian donkeys [64], similar to those in Brazilian Pêga donkeys [62], and lower than those reported for Ethiopian donkeys [63]. In comparison to other morphologically similar breeds to Miranda donkey, our values were similar to slightly lower than TTV values found in Ragusano and Martina Franca donkeys [19,66]. Even if all the aforementioned breeds were medium to large-sized, differences of TTV could still be attributed to BW, age and management conditions of the males selected for the studies.
In the juveniles, studies are still scarce, but the values in our study (17.74 ± 9.89 cm 3 ) were similar to those described for the prepubertal Egyptian [68] and Amiata donkeys [67]. Donkeys between 10 to 14 months are still in their pubertal transition period and, which reaches its end at 19-20 months of age, when testis have presumably completed their descent into the scrotum [37]. In the present work, a rapid increment of TV was verified after 11-12 months, which, besides, was simultaneous to the increase of BW. According to the work by Rota, et al. [67], a progressive increase of testicular width was noted after 10 months, and notably after 16 months of age; however, puberty-defined by the first presence in the ejaculate of 50 ×10 6 sperm with at least 10% of motility-, was not attained in donkeys before 19-20 months. A previous histological work by our group evidenced that although a rapid increase of TV could be observed after 12-14 months, spermatogenesis was still incipient at that age [69]. Still, further studies should be carried out to precisely determine the age of Miranda donkey at puberty.
The comparative analysis of US measurements with those obtained using a precision caliper after orchiectomy evidenced that the former were very accurate. The precise position and orientation of the probe during US examination and the correct handling of the testicle, avoiding excessive tension during the exam, may have additionally contributed to the obtention of reliable US measurements.
The quantitative and qualitative sperm parameters obtained; total sperm number (TSN) per ejaculate, volume, concentration and morphology were within the range found for other European Donkey breeds such as Zamorano-Leonese [70], Catalonian [71], Andalusian [72][73][74], and Amiata donkey [67]. The values of GSI obtained for mature donkeys (0.9494) were higher than those reported in other domestic species [5]. This finding is of great interest and application when implementing ARTs' strategies and is consistent with previous studies that observed the comparatively greater efficiency for sperm production of donkeys among mammals, characterizing by a high spermatogenic efficiency and short length of spermatogenesis [6,75].

Bayesian Approach and Predictive Models
Comparative observations were taken at different time points, from a population whose membership changes over time, but retains some constant members. This sample condition is known as partially overlapping samples and for this study, it implies the fact that not all the animals which were measured for morphometric parameters were evaluated for semen parameters. As reported by Kay, et al. [76] in studies working with partially overlapping samples, when there has been a gross violation of normality, as in our study, samples should be considered independent. In a nonparametric context, the strong subdivision of samples across the different experiments may condition results, hence, a Bayesian approach was followed given smaller data sets that can be evaluated avoiding power loss and retaining precision.
Posterior predictive p values (total model probability) for models 1 and 2 of around 0.331. Indicated moderately plausible good-fitting models. Similarly, the difference of more than 3 log likelihood units can be considered as strong evidence against models 1 or 2 depending on the parameters considered. The higher value reported for this parameter may suggest the acceptance of a more parameter-rich or simpler model accordingly. BIC explains how well the model will fit for new data (instead of explaining the existing data, which is measured by Adjusted R 2 ). Models presenting lower BIC values evidence improved predictions for the dependent variable or variables that they model for. Frequently, adding more variables decreases predictive accuracy, and in that case, the model with even higher Adjusted R 2 will display higher BIC, decreasing its predictive power [52,76,77]. However, considering the higher Adjusted R 2 and the lower BIC, model 1 performs better at explaining and predicting than model 2 for gel-free volume (mL) and concentration (×10 6 /mL). For motility (%), model 1 was more precise to predict for future data while slightly worse at explaining present data (0.01 lower R 2 ). The opposite situation was reported for TSN (×10 9 ) and TMS (×10 9 ), for which model 2 was more precise to predict for future data, although it may be slightly worse at explaining present data (0.13 lower R 2 ). For morphologically normal and abnormal spermatozoa (%) and gonadosomatic ratio (GSI) (%), model 2 suggested a higher ability to explain for present and predict for future data.
In the present research, when comparatively analyzing testis' biometry predictive power on spermatic parameters, some differences were found. The left testicle seems to exert a higher influence on gel-free volume, while, on the other hand, the biometry of the testicles seems to affect TSN differently. Concretely, the length and width of the left testicle and the height of the right testicle seem to increase in parallel with sperm quantitative parameters. Oppositely, as length and width of right testicle and height of the left testicle increase, sperms output parameters seem to decrease. Hence, the negative/positive balance between linear regression coefficients of morphometry variables (length, width and height) suggest that testis may reciprocally react to changes in the contralateral testicle, which affects almost all sperm outputs variables.
A previous work purposes the "compensation hypothesis" in birds, that states that one of the testis could serve as a "back-up" for any reduced function of the other and provides a mechanism to explain intraspecific variation in degree and direction of gonad asymmetry [77]. Another work relates that the degree of testicular asymmetry was positively correlated with inbreeding coefficient and negatively correlated with the proportion of normal sperm [78]. However, in the present work, testicular asymmetry was not found in both clinical and morphometric evaluation, as both features do not meet the inclusion criterion.
Mahmoud Ali Omar, et al. [79] reported a similar compensatory effect in the right testicle after the removal of the contralateral testicle in donkeys. Other authors have ascribed this compensation to the increase in serum LH and FSH concentrations and, potentially higher intratesticular testosterone [80]. Unilateral orchiectomy has been reported to increase the mean diameter of seminiferous tubules by 21% and of their lumina by 51% [81]. Additionally, two events in line with our results were described. A weight compensation was reported for the remaining testis, which has been already described [82]. Also, the histological examination of the testis of donkeys after unilateral orchiectomy with scrotum suture revealed hyperplasia of Leydig and Sertoli cells [79]. This had also been reported by Putra and Blackshaw [83], who suggested an increase in the number of Sertoli cells and germ cells occupying the seminiferous epithelium after unilateral orchiectomy. Our results may evidence that compensation may occur physiologically without these events, as it has also been reported in other species [78]. Still, future works are necessary in order to confirm these findings in donkeys.
In the present study, the age covariate, included in both predictive models, was significantly and positively correlated with several parameters, namely with gel-free volume and sperm output (TSN). The significant age-related positive effects on gel-free volume and TSN agreed those in previous works in stallions [84,85]. For instance, the influence of age in testicular dimensions of juvenile and peripubertal donkeys was verified by Rota, et al. [67], who suggested that age markedly influenced testicular width.
On the other hand, age shows a linear association with abnormal sperm morphology in model 1. Morphologically abnormal spermatozoa percentage slightly increases with age; while sperm concentration and morphologically normal spermatozoa linearly decrease. The negative impact of advanced age on morphology has been already described in stallions and has been ascribed to testicular degeneration, abnormal epididymal function [86] or to age-related testicular dysfunction associated with deterioration in DNA sperm motility [87]. A study in Egyptian donkeys reports that from six years onward, histological features were indicative of spermatogenic efficiency starting to decrease [65]; however, more studies should be performed before concluding that the same occurs in Miranda donkey breed.
In general, stronger correlations between BW and testicular biometry than between age and testicular biometry were verified in the present study. This agrees with the findings in a previous study conducted in stallions which emphasized the influence of body size in testicular measurements and sperm output [2]. However, the analysis of regression coefficients evidenced that the association of motility and total motile sperm (TMS) with TTV was not always constant. On the contrary, sperm motility, as well as TMS and concentration, were positively and linearly associated with gonodasomatic ratio (GSI). Overall, this supports the fact that even if the measurements of the testicular parameters could provide useful information about the potential sperm production, when it comes to predict motility, these parameters should be adjusted for the BW of the donkey, as reported by [Woodall and Johnstone [88]] when predicting for fertility in dogs. Contextually, further investigations should allow to determine and confirm the relationship between BW and TV in donkeys.

Conclusions
The results of the present work evidence the reliability of ultrasonographic measurements of testis, which emphasizes its importance and value to obtain reference values of donkey testicular volumes. Values of testicular volume and sperm output in the Miranda donkey breed are similar to those in other affine European donkey breeds. Gonadomatic index (GSI) is higher in the donkey than in other domestic species as previously described, which confirms the great reproductive potential of male donkeys.
Combinations of biometrical and testicular morphometric factors (age, body weight, testicular volume and GSI) will likely improve the predictive accuracy of Models than using factors separately. Besides biometry, considering data such as BW and age, testicular volume, and GSI may be systematically taken into consideration and integrated on BSE of donkeys. The present study provides new insights into donkey reproductive biology, which may be transferred to ARS strategies. Appropriate use of both models may be useful to further improve knowledge on the reproductive characteristics of donkey breeds, which may reinforce clinical purposes and maximize the outcomes from direct conservation or selection strategies.