Estimation of Heritability under Correlated Errors Using the Full-Sib Model

In plant and animal breeding, observations are sometimes not independently distributed; they may be correlated with one another. When observations are highly correlated, the classical assumption of independence between observations is violated. Plant and animal breeders are particularly interested in studying the genetic components of important traits. In general, estimating heritability requires the random components of the model, including the errors, to satisfy specific assumptions: they must be normally distributed and independently and identically distributed. However, in many real-world situations, not all of these assumptions are fulfilled. In this study, correlated error structures are considered when estimating heritability under the full-sib model. The order of an autoregressive model is defined as the number of immediately preceding observations in the series that are used to predict the value at the current observation. First-order and second-order autoregressive error structures, i.e., AR(1) and AR(2), have been considered. For the full-sib model, the expected mean sum of squares (EMS) under the AR(1) structure has been derived theoretically, and a numerical illustration of the derived EMS is provided. The expected mean square error (MSE) is obtained after including the AR(1) error structure in the model, and heritability is estimated using the resulting equations. Correlated errors are found to have a major influence on heritability estimation. Different correlation patterns, such as AR(1) and AR(2), change the heritability estimates and MSE values. Several combinations are suggested for various scenarios to attain better results.


Introduction
Genetic improvement in plants and animals is primarily determined by the degree to which desirable characteristics are inherited, and it depends on the accuracy and intensity of selection as well as on the amount of genetic variation [1]. The availability of genetic diversity, the relationships among yield and yield-related variables, and heritability are critical for identifying potential genotypes for the development of crop varieties and animal breeds. In real plant and animal breeding data, observations are often not independently distributed, and some form of dependence exists between them [2]. The classical assumption of independence among observations is violated when the data are highly correlated. Plant and animal breeders are particularly interested in the genetic components of variation in important traits [3]. Therefore, from the perspective of plant and animal breeding programs, it is crucial to estimate the different genetic variances and to draw inferences about inheritance from estimates of the genetic parameters [2]. To meet these requirements and to improve crop and animal breeding programs, extensive research on statistical methods for genetic advancement is required.
From a statistical modelling perspective, the prediction of phenotypic variability is influenced by several factors [2]. In some cases, the general linear model suffers from violations of its assumptions, and it is of interest to investigate the impact of correlated error components, in addition to correlated observations, on the random error component. As a result, there is a need to investigate models in which structured variation in the errors is explicitly accounted for. In some real-world scenarios, response variables do not adhere to the normality assumption; Roy et al. [3] implemented a Bayesian linear mixed model for the estimation of heritability using pedigree data. Such work also encourages the use and development of statistical techniques that allow the amount of variability due to different causes, at both the genetic and phenotypic levels, to be assessed scientifically, factors to be compared with a relatively high degree of accuracy, and breeding values to be predicted more efficiently. A wide range of statistical methods for analyzing random and fixed effects models is now available in the literature. The development of methods for estimating variance components began in the early twentieth century. Fisher [4] made a significant contribution to variance component models by proposing the analysis of variance technique of estimation. Cochran [5] pioneered the use of unbalanced data. Henderson [6] described a method for estimating variance components in the challenging case of unbalanced data. Because of the shortcomings of ANOVA (Analysis of Variance) estimators (negative estimates, lack of distributional properties), other techniques such as ML (Maximum Likelihood), REML (Restricted Maximum Likelihood), MINQUE (Minimum Norm Quadratic Unbiased Estimation), and others evolved.
Numerous approaches are also available in the literature for assessing the correlation between observations. Durbin and Watson [7] provided a method for detecting first-order autocorrelation in the error term. Diblasi and Bowman [8] provided a test statistic and a graphical approach for assessing evidence of spatial correlation in data. For half-sib data, Singh et al. [9] estimated variance components under correlated errors following an autoregressive process of order one, i.e., AR(1). Costa et al. [10] estimated genetic parameters of test-day records for fat and protein yields using autoregressive multiple-lactation animal models. The term autoregression refers to a regression of a variable on itself. In time series analysis, an autoregression regresses a value on previous values from the same series. The order of an autoregressive series is defined as the number of immediately preceding observations in the series that are used to predict the value at the current observation. In this study, we have used first-order and second-order autoregressive models, denoted AR(1) and AR(2), respectively. Orunmuyi et al. [11] used the maternal half-sib (dam variance) and full-sib (sire + dam variance) components to assess the genetic parameters of fertility and hatchability in two strains of Rhode Island Red (RIR) chickens, denoted Strain A and Strain B. Rameez et al. [12] evaluated the performance of Magur (Clarias magur) raised in a two-year class, estimated heritabilities at stocking and harvest, and determined the genetic and phenotypic correlations between them. Using information from very small family samples, Ødegård and Meuwissen [13] explored how identity-by-descent sharing across the entire genome among close relatives can be used to quantify additive genetic variation originating entirely from within-family variation.
It was assumed that information from genome-wide markers could be used to accurately reconstruct genomic identity-by-descent relationships when estimating genetic variation from phenotypic data. The outcomes were contrasted with those of conventional pedigree-based genetic analysis. Estaghvirou et al. [14] explored how outliers affect the accuracy and robustness of genomic prediction systems in plant breeding. Lourenço et al. [15] explored robust estimation of heritability and prediction accuracy in plant breeding, using simulated and empirical data. Hodge and Acosta [16] proposed an algorithm for analyzing full-sib genetic datasets with extended mixed-model software. The variance component estimates, genetic parameter estimates, and BLUP solutions for genetic values provided by the method are essentially the same as those produced by specialized genetic software programs. Prabhakaran and Sharma [17] described a method for determining the theoretical probability that heritability ($h^2$) estimates from full-sib analysis exceed unity. Marker and Joshi [18] examined 256 full-sib families of maize (Zea mays L.) at two distinct levels of fertility and found that, for grain yield per plant, the additive genetic variance was more important than the variance due to dominance deviations. Given the impact of a correlated error structure, the development of statistical techniques for estimating genetic parameters when errors are correlated is an important research topic in statistical genetics. The correlation in the error structure is overlooked in the standard analysis of field experimental data, yet it cannot be ignored when the data are highly correlated. It is therefore vital to identify such cases and to develop methods for analyzing them.
Because theory for estimating variance components is available in the literature only for uncorrelated errors, the current study was undertaken with the goal of investigating the effect of correlated errors on the inheritance of quantitative traits under a full-sib model. This paper is organized as follows. Section 2 describes the statistical approach to estimating heritability under a two-way nested model and presents the derivation of the MSE for a full-sib model under correlated errors with an AR(1) structure. Section 3 presents and discusses the results of applying the proposed methodology to simulated datasets and compares ANOVA (Analysis of Variance) estimators with other techniques, such as ML (Maximum Likelihood), REML (Restricted Maximum Likelihood), and MINQUE (Minimum Norm Quadratic Unbiased Estimation). The paper ends with a Conclusions section.

Materials and Methods
One of the primary requirements for studying the statistical properties of genetic parameter estimates is the simulation of statistical-biological models with known population parameters. Different procedures can be used to estimate heritability; those based on offspring-parent regression and on sib correlation (sib analysis) each have advantages and disadvantages. The accuracy and bias of the estimators depend on the type of relationship among relatives used in the study. Here, we are solely interested in using sib analysis to estimate heritability. As stated below, data generation for the correlated and uncorrelated cases is performed using a two-way classification model. In this work, following Roningen [19], simulation models, specifically two-way nested models (full-sib models), were employed for estimating heritability. The following is a brief overview of the Monte Carlo method:

Two-Way Nested Model
This is commonly referred to as the full-sib analysis model and may be expressed as

$$Y_{ijk} = \mu + s_i + d_{ij} + e_{ijk},$$

where $s_i$ is the effect of the $i$th sire, $d_{ij}$ is the effect of the $j$th dam mated to the $i$th sire, and $e_{ijk}$ is the random effect associated with the $k$th member of the $ij$th full-sib group. The simulation model used for generating full-sib data is given as follows:

$$Y_{ijk} = \mu + \sigma_s a_i + \sigma_d a_{ij} + \sigma_e a_{ijk},$$

where $\mu$ is the general mean; $\sigma_s$, $\sigma_d$, and $\sigma_e$ are the standard deviations of the sire component, dam component, and error component, respectively; and $a_i$, $a_{ij}$, and $a_{ijk}$ are standard normal variates.
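The simulation model above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `simulate_full_sib`, its arguments, and the nested-list layout `data[i][j][k]` are assumptions made here, and a balanced design (the same number of dams per sire and progeny per dam) is assumed throughout.

```python
import random


def simulate_full_sib(n_sires, n_dams, n_prog, mu, sd_s, sd_d, sd_e, seed=None):
    """Simulate balanced full-sib data with independent errors.

    Returns a nested list data[i][j][k] holding
    Y_ijk = mu + sd_s * a_i + sd_d * a_ij + sd_e * a_ijk,
    where a_i, a_ij and a_ijk are independent standard normal variates.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_sires):
        a_i = rng.gauss(0.0, 1.0)           # standard normal variate for the sire
        sire_block = []
        for _ in range(n_dams):
            a_ij = rng.gauss(0.0, 1.0)      # standard normal variate for the dam within sire
            family = [
                mu + sd_s * a_i + sd_d * a_ij + sd_e * rng.gauss(0.0, 1.0)
                for _ in range(n_prog)      # one a_ijk per progeny
            ]
            sire_block.append(family)
        data.append(sire_block)
    return data
```

For example, `simulate_full_sib(10, 5, 2, mu=0.0, sd_s=0.35, sd_d=0.35, sd_e=0.87, seed=1)` would produce 100 records in 50 full-sib families; these numeric values are purely illustrative and are not taken from the paper.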

Estimation of Heritability by Full-Sib Correlation
Let $t$ be the estimated intra-class correlation coefficient between full sibs; an estimate of heritability can then be derived from this correlation as $h^2 = 2t$ (Falconer [2]).
The genetic composition of this estimate of heritability is

$$h^2 = 2t = \frac{V_A + \tfrac{1}{2}V_D + \tfrac{1}{2}V_{AA} + \tfrac{1}{4}V_{AD} + \tfrac{1}{8}V_{DD}}{V_P}.$$

Here, $V_A$, $V_D$, $V_{AA}$, $V_{AD}$, and $V_{DD}$ are the variance due to additive effects, the variance due to dominance effects, the variance due to the interaction of additive components, the variance due to the interaction of additive and dominance effects, and the variance due to the interaction of dominance components, respectively, and $V_P$ is the phenotypic variance. Thus, this estimate is subject to bias from dominance deviations, as well as from non-allelic interactions. It is the least reliable of all the estimates of heritability.
The statistical model for estimation of heritability based on the full-sib mating design with $n$ offspring per dam is

$$Y_{ijk} = \mu + s_i + d_{ij} + e_{ijk},$$

where $Y_{ijk}$ is the measurement of a character on the $k$th progeny of the $j$th dam mated to the $i$th sire; $\mu$ is the general mean; $s_i$ is the sire effect common to all the progeny of the $i$th sire; $d_{ij}$ is the dam effect common to all the progeny of the $j$th dam mated to the $i$th sire; and $e_{ijk}$ is the random deviation. All effects except $\mu$ are random and independent, with expectations of zero and variances $\sigma_s^2$, $\sigma_d^2$, and $\sigma_e^2$, respectively. The data on progeny are then subjected to a hierarchical analysis of variance (between sires, between dams within sires, and between progeny within dams within sires) to obtain estimates of the sire and dam components of variance. The form of the analysis, along with the mean squares and their composition in terms of the observational components of variance, is shown in Table 1.
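and $\sigma_s^2 = \mathrm{Cov}(HS)$. Noting the genetic composition of the variance components $\sigma_s^2$, $\sigma_d^2$, and $\sigma_e^2$, three different estimates of heritability can be obtained from these variance components as follows:

$$h_s^2 = \frac{4\sigma_s^2}{\sigma_s^2 + \sigma_d^2 + \sigma_e^2}, \qquad h_d^2 = \frac{4\sigma_d^2}{\sigma_s^2 + \sigma_d^2 + \sigma_e^2}, \qquad h_{s+d}^2 = \frac{2(\sigma_s^2 + \sigma_d^2)}{\sigma_s^2 + \sigma_d^2 + \sigma_e^2}.$$

The estimate of heritability by full-sib correlation was obtained by taking the average of the sire and dam components, and its sampling variance was determined by the approximate formula for the sampling variance of an intra-class correlation coefficient.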
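The hierarchical analysis of variance and the three heritability estimates can be sketched as follows. This is a minimal sketch under stated assumptions rather than the procedure used in the study (which relied on SAS): it assumes a balanced design stored as a nested list `data[i][j][k]`, uses the standard balanced-design expectations $E(MS_{sire}) = \sigma_e^2 + n\sigma_d^2 + nd\sigma_s^2$, $E(MS_{dam}) = \sigma_e^2 + n\sigma_d^2$ and $E(MS_{error}) = \sigma_e^2$, and the names `nested_anova_heritability` and `_mean` are hypothetical.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def nested_anova_heritability(data):
    """ANOVA estimates for a balanced full-sib design stored as data[i][j][k]."""
    s, d, n = len(data), len(data[0]), len(data[0][0])
    fam_means = [[_mean(fam) for fam in sire] for sire in data]
    sire_means = [_mean(row) for row in fam_means]   # valid because the design is balanced
    grand_mean = _mean(sire_means)

    # Sums of squares: between sires, between dams within sires, within full-sib families
    ss_sire = n * d * sum((m - grand_mean) ** 2 for m in sire_means)
    ss_dam = n * sum((fam_means[i][j] - sire_means[i]) ** 2
                     for i in range(s) for j in range(d))
    ss_err = sum((y - fam_means[i][j]) ** 2
                 for i in range(s) for j in range(d) for y in data[i][j])

    ms_sire = ss_sire / (s - 1)
    ms_dam = ss_dam / (s * (d - 1))
    ms_err = ss_err / (s * d * (n - 1))

    # Equate mean squares to their expectations (ANOVA method); estimates can be negative
    var_e = ms_err
    var_d = (ms_dam - ms_err) / n
    var_s = (ms_sire - ms_dam) / (n * d)
    var_p = var_s + var_d + var_e

    return {
        "ms_sire": ms_sire, "ms_dam": ms_dam, "ms_err": ms_err,
        "var_s": var_s, "var_d": var_d, "var_e": var_e,
        "h2_sire": 4 * var_s / var_p,
        "h2_dam": 4 * var_d / var_p,
        "h2_sire_dam": 2 * (var_s + var_d) / var_p,
    }
```

Because the ANOVA method simply equates mean squares to their expectations, a variance component estimate (and hence a heritability estimate) can be negative or exceed unity in small samples, which is one of the shortcomings of ANOVA estimators noted in the Introduction.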

Correlated Case
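Suppose that sires are independent but that, within a sire, progenies are correlated. Further, assume that the correlated errors follow an AR(1) process, i.e.,

$$e_{ij} = \rho\, e_{i,j-1} + \eta_{ij}.$$

From the above equation, the $e_{ij}$ are generated.
In the case of AR(2),

$$e_{ij} = \rho_1 e_{i,j-1} + \rho_2 e_{i,j-2} + \eta_{ij},$$

where $\eta_{ij}$ is the random error component.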
In a similar fashion, we have generated correlated data for error structures other than AR(1), e.g., AR(2) or correlation as a function of distance.
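A minimal sketch of how autoregressive errors of order one or two could be generated is given below. The function name `ar_errors`, the burn-in length, and the zero starting values are assumptions made for illustration; they are not details reported in the study. The innovations are drawn as independent normal variates, and the coefficients are supplied as a list, `[rho]` for AR(1) or `[rho1, rho2]` for AR(2).

```python
import random


def ar_errors(n, coeffs, sd_eta=1.0, burn_in=50, rng=None):
    """Generate n errors from an AR(p) process, with p = len(coeffs).

    e_t = coeffs[0] * e_{t-1} + ... + coeffs[p-1] * e_{t-p} + eta_t,
    where eta_t ~ N(0, sd_eta^2). The series starts from zeros, and the
    first `burn_in` values are discarded to reduce the start-up effect.
    """
    rng = rng or random.Random()
    history = [0.0] * len(coeffs)            # most recent value first
    errors = []
    for t in range(burn_in + n):
        e_t = sum(c * h for c, h in zip(coeffs, history)) + rng.gauss(0.0, sd_eta)
        history = [e_t] + history[:-1]       # shift the lag window
        if t >= burn_in:
            errors.append(e_t)
    return errors
```

For instance, `ar_errors(10, [0.5], rng=random.Random(1))` gives an AR(1) series and `ar_errors(10, [0.0, -0.5], rng=random.Random(1))` an AR(2) series. Note that coefficient values at or near the boundaries (for example, an AR(1) coefficient of exactly ±1) do not yield a stationary series, so the variance of the generated errors then depends on the length of the series.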

Derivation of MSE for Full-Sib Model under Correlated Errors (AR(1))
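The usual full-sib model is as follows:

$$Y_{ijk} = \mu + s_i + d_{ij} + e_{ijk},$$

with

$$\mathrm{Cov}(e_{ijk}, e_{i'jk}) = 0 \quad \forall\ i \neq i', \qquad \mathrm{Cov}(e_{ijk}, e_{ijk'}) = \rho^{|k-k'|}\sigma_e^2 \quad \forall\ k \text{ and } k'.$$

From first principles, we have obtained the following equation for the above model.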
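To illustrate how the covariance structure above affects the error mean square, the following sketch computes the ratio $E(MS_{error})/\sigma_e^2$ for a single full-sib family of $n$ progeny. It is not the authors' derived equation; it is a standard result obtained from $E\left[\sum_k (e_k - \bar{e})^2\right] = n\sigma_e^2 - n\,\mathrm{Var}(\bar{e})$, which gives $1 - \frac{2}{n(n-1)}\sum_{h=1}^{n-1}(n-h)\rho^h$. The function names are hypothetical, and the second function checks the closed form against a direct sum over the correlation matrix.

```python
def ar1_within_ms_factor(n, rho):
    """Closed-form E(MS_error) / sigma_e^2 for n sibs with Corr(e_k, e_k') = rho**|k - k'|."""
    weighted = sum((n - h) * rho ** h for h in range(1, n))
    return 1.0 - 2.0 * weighted / (n * (n - 1))


def ar1_within_ms_factor_bruteforce(n, rho):
    """Same quantity computed as trace((I - J/n) R) / (n - 1), with R[k][l] = rho**|k - l|."""
    total = sum(rho ** abs(k - l) for k in range(n) for l in range(n))
    return (n - total / n) / (n - 1)
```

Under these assumptions the ratio equals 1 at $\rho = 0$, exceeds 1 for negative $\rho$ (for example, `ar1_within_ms_factor(2, -0.5)` is 1.5), and falls to 0 as $\rho$ tends to unity, which agrees with the behaviour of the error mean square reported in the results.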

Estimation of Heritability and MSE Values in Case of Correlated Errors (AR(1)) and Different Sample Sizes for the Different Parametric Values of Heritability
The data were generated from populations with low and high heritability for various sample sizes and family structures. Specifically, data were generated under the full-sib model for low and high heritability values (0.1 and 0.5), sample sizes of 100 and 500, and AR(1) and AR(2) error correlations, with ρ ranging from −1 to +1. After generating the data, variance components were estimated using SAS Proc Varcomp with the ANOVA, ML, REML, and MIVQUE methods. The heritability estimates, along with their MSE (mean square error), are given in Tables 2 and 3. In almost all cases, biased estimates are obtained. Estimates based on the sire component alone are better than those based on both sire and dam components or on the dam component alone.
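The bias and MSE of a heritability estimator over repeated simulated datasets can be summarized as sketched below. The function name `bias_and_mse` and the example numbers are hypothetical and are not results from the study; the sketch only shows the usual Monte Carlo definitions, with the MSE taken as the average squared deviation of the estimates from the true (parametric) heritability.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and mean square error of a list of estimates."""
    r = len(estimates)
    mean_estimate = sum(estimates) / r
    bias = mean_estimate - true_value
    mse = sum((est - true_value) ** 2 for est in estimates) / r
    return bias, mse


# Hypothetical replicate estimates for a true heritability of 0.5
bias, mse = bias_and_mse([0.42, 0.61, 0.55, 0.38], true_value=0.5)
```

Since the MSE equals the variance of the estimates plus the squared bias, a biased estimator can still have a low MSE if its sampling variance is small, which is why both quantities are worth reporting.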

Estimation of Heritability and MSE Values in Case of Correlated Errors (AR(2)) and Different Sample Sizes for the Different Parametric Values of Heritability
The data were generated from populations with low and high heritability for various sample sizes and family structures. The heritability estimates, along with their MSE (mean square error), are shown in Tables 4 and 5. In almost all cases, biased estimates are obtained. Estimates based on the sire component alone are better than those based on both sire and dam components or on the dam component alone. The correlation combinations (0, −0.5) and (0, 0.5) provide better results than any other combination. Increasing the sample size decreases the MSE values.

Estimation of Heritability and MSE Values in Case of Correlated Errors (AR(1)) and Different Sample Sizes and Different Parametric Values of Heritability Using Derived Formulae
The heritability estimates, along with their MSE (mean square error), are given in Table 6. The expected mean sum of squares due to error is overestimated when the correlation is negative, and the overestimation increases with the degree of correlation. However, this expected mean sum of squares is underestimated if the errors are positively correlated; it decreases as the degree of correlation increases and approaches 0 as ρ tends to unity. The reverse is obtained for the mean sum of squares due to sire: the expected mean sum of squares is underestimated when ρ is negative and overestimated when the correlation is positive. As ρ tends to unity, the expected mean sum of squares due to sire approaches its maximum value. Heritability values are overestimated if the correlation is positive. The same trend holds for all levels of heritability. Also, the heritability estimate increases from zero to nearly four as the autoregressive coefficient increases.
In the present work, a full-sib model is used to generate data, with AR(1) and AR(2) errors. For the full-sib model, equations for E(MSE) and for heritability estimation in the presence of AR(1) correlation in the errors are derived. AR(1) coefficients ranging from −1 to +1 are investigated for heritability values between 0.1 and 0.5. When the correlation is negative, the expected mean sum of squares due to error is overestimated, and this overestimation increases as the degree of correlation increases.
However, if the errors are positively correlated, the expected mean sum of squares due to error is underestimated; it drops as the degree of correlation increases and approaches 0 as ρ tends to unity. On the other hand, the expected mean sum of squares due to sire is underestimated when ρ is negative and overestimated when the correlation is positive. As ρ tends to unity, it approaches its maximum value. If the correlation is positive, the heritability values are overestimated. A similar pattern is seen at all levels of heritability. As the autoregressive coefficient rises from minus one to almost one, the heritability estimate rises from zero to nearly four. In the case of AR(2), if the AR(1) coefficient is fixed while the AR(2) coefficient is varied, the MSE generally decreases as the correlation increases, although irregular patterns are occasionally observed. We found that specific combinations of AR(1) and AR(2) coefficients give lower MSE values. Estimates based only on the sire component outperform estimates based on both sire and dam components. The correlation combinations (0, −0.5) and (0, 0.5) produce better results than any other combination. The MSE values drop as the sample size increases. Different correlation patterns, such as AR(1) and AR(2), thus alter the heritability estimates and mean square error values. Various combinations are proposed for different circumstances to achieve better results.

Conclusions
In many plant and animal breeding experiments, the observations are not independently distributed, and some type of correlation exists between them. In the presence of correlated error structures, the classical assumption of independence between observations is violated. Plant and animal breeders are particularly interested in studying the genetic components and the underlying causes of variation in traits. Thus, estimating the different genetic variances and inferring inheritance from estimates of the genetic parameters is crucial from the perspective of plant and animal breeding programs. For the full-sib model, the expected mean sum of squares (EMS) under the AR(1) structure has been derived theoretically, and a numerical illustration of the derived EMS is provided. The MSE is found to increase in tandem with the correlation coefficient. For the combined AR(1) and AR(2) structure, the AR(1) coefficient is kept fixed and the AR(2) coefficient is varied; the trend in the MSE is the same. When the AR(1) and AR(2) coefficients are both varied, a suitable combination of correlation structures is found that reduces the MSE. According to the simulation study, estimating heritability using only the sire component produced better results than using both the dam and sire components or only the dam component.
Author Contributions: A.K.P., H.S.R., R.K.P. and P.K. conceived the research. A.K.P., H.S.R. and M.Y. collected the data and designed the methodology. R.K.P., A.K.P., M.Y. and H.S.R. supported the empirical analysis. H.S.R., R.K.P. and P.K. prepared the draft and edited the manuscript. A.K.P. supervised all activity. A.K.P., H.S.R., R.K.P., P.K. and M.Y. reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.