An Algorithm for Genetic Analysis of Full-Sib Datasets with Mixed-Model Software Lacking a Numerator Relationship Matrix Function, and a Comparison with Results from a Dedicated Genetic Software Package

: Research Highlights: An algorithm is presented that allows for the analysis of full-sib genetic datasets using generalized mixed-model software programs. The algorithm produces variance component estimates, genetic parameter estimates, and Best Linear Unbiased Prediction (BLUP) solutions for genetic values that are, for all practical purposes, identical to those produced by dedicated genetic software packages. Background and Objectives: The objective of this manuscript is to demonstrate an approach with a simulated full-sib dataset representing a typical forest tree breeding population (40 parents, 80 full-sib crosses, 4 tests, and 6000 trees) using two widely available mixed-model packages. Materials and Methods: The algorithm involves artiﬁcially doubling the dataset, so that each observation is in the dataset twice, once with the original female and male parent identiﬁcation, and once with the female and male parent identities switched. Five linear models were examined: two models using a dedicated genetic software program (ASREML) with the capacity to specify A or other pedigree-related functions, and three models with the doubled dataset and a parent (or sire) linear model (ASREML, SAS Proc Mixed, and R lme4). Results: The variance components, genetic parameters, and BLUPs of the parental breeding values, progeny breeding values, and full-sib family-speciﬁc combining abilities were compared. Genetic parameter estimates were essentially the same across all the analyses (e.g., the heritability ranged from h 2 = 0.220 to 0.223, and the proportion of dominance variance ranged from d 2 = 0.057 to 0.058). The correlations between the BLUPs from the baseline analysis (ASREML with an individual tree model) and the doubled-dataset / parent models using SAS Proc Mixed or R lme4 were never lower than R = 0.99997. Conclusions: The algorithm can be useful for analysts who need to analyze full-sib genetic datasets and who are familiar with general-purpose statistical packages, but less familiar with or lacking access to other software.


Introduction
The use of linear mixed models is widespread in the analysis of genetic data, in particular, the progeny test data of animals and plants, including forest trees [1][2][3][4]. The use of the term "mixed models" refers to linear models with a combination of fixed and random effects as explanatory factors that account in varying degrees for the variation observed in a trait of interest to the breeder, e.g., milk yield in dairy cows or volume growth in forest tees. Breeders are primarily interested in predicting how specific genotypes will perform in future environments, under some type of commercial production system for an animal product or plant crop, and therefore typically treat the underlying genetic factors affecting the trait of interest as random effects to be predicted. By contrast, breeders are typically less interested in the specific effects of the test location, or experimental replication within the test location, and they often treat these as fixed effects to be estimated [5]. Thus, the use of mixed models is well-suited for genetic data analysis, using fixed-effect estimates to adjust phenotypic data during the prediction of the random genetic effects. In particular, it is common to use software programs that combine the Restricted Maximum Likelihood (REML) estimation of variance components associated with the random effects in the model, along with use of those variance component estimates, to both estimate the fixed effects (with Best Linear Unbiased Estimates, BLUEs) and predict the random effects (Best Linear Unbiased Predictions, BLUPs).
Almost all genetic improvement programs make extensive use of designed experiments known as progeny tests. Progeny tests contain offspring arising from known or controlled matings among a set of parents, and these offspring or progeny are measured for the commercial traits of interest. In the analysis of progeny test data, a type of genetic effect of particular interest to breeders is the "additive" genetic effect. In a given population, the additive genetic effects will have a particular "additive genetic variance" associated with the distribution of those effects. This additive genetic variance (σ 2 A ) represents the amount of genetic variance that can be passed from parent to progeny; thus, it is of great interest to the breeder. Alternatively, one can think of additive genetic variance as what can be inherited by the progeny from its parents; thus, the proportion of additive genetic variance relative to the total phenotypic variance is defined as the heritability (h 2 ) of a particular trait. The amount of genetic variance and the heritability have a huge impact on the amount of genetic gain that a breeder is likely to achieve, and on the difficulty or ease of achieving that gain [6,7].
Normally, different causal random effects in the linear model are assumed to be independently distributed, each with its own unique variance. An unusual feature of linear models for progeny test data arises from the fact that most breeding programs are focused on organisms that are diploid, which means that individuals have two copies of each gene, one inherited from the mother and one from the father. If the progeny test is a full-sib test (i.e., for all progeny, both parents are known), then an appropriately constructed linear model must somehow account for the fact that female parents and male parents are both sampling the same distribution of additive genetic effects (assuming no maternal effects or mitochondrial effects). In addition, many plant species are monoecious; that is, individual plant or tree genotypes can produce both female and male reproductive structures. Thus, in a given progeny dataset, the same parental genotype could be the female parent of some progeny and the male parent of other progeny.
A solution to deal with this data structure involves the use of a matrix Aσ 2 A , which is the variance-covariance structure of the additive genetic effects for all parental and progeny genotypes in the dataset. The matrix A is known as the numerator relationship matrix, and the elements are estimates of the proportion of alleles IBD (identical by descent) in pairs of genotypes in the study [8,9]. Dedicated genetic analysis software programs typically make use of a pedigree matrix (containing a list of all progeny genotypes and their female and male parents) to form the A matrix used in the variance component estimation and BLUP of the random effects [8]. Some examples of such programs include ASREML [10], Echidna [11], MTDF-REML [12], SeleGen [13], WOMBAT [14], and others.
However, mixed-model packages or procedures are also available in a number of general-purpose statistical software packages, both commercial packages, such as SAS Proc Mixed [15], and open-source programs, such as the R lme4 package [16,17]. Most of these packages do not have a pedigree matrix or numerator relationship option, and do not have the capacity to equate female and male random terms in the linear model to represent the same additive genetic effect. The objective of this manuscript is to describe an algorithm for the genetic analysis of a full-sib progeny test dataset using general-purpose linear mixed-model software to produce variance component estimates, genetic parameter estimates, and random-effect BLUP solutions equivalent to those that are obtained using dedicated genetic data analysis software with the capacity to model the numerator relationship matrix. Such an algorithm Forests 2020, 11,1169 3 of 21 might be very useful for analysts familiar with general-purpose statistical packages, but less familiar with, or lacking access to, other software. The algorithm will be demonstrated with SAS Proc Mixed and R lme4

Overview of Analytical Approaches and Description of the Algorithm
All linear mixed models can be expressed with the following general equation: where y is an n × 1 vector of observations, where n is the number of records; X is an n × p design matrix for fixed effects, where p is the number of levels of fixed effects; b is the p × 1 vector of fixed effects; Z is an n × q design matrix for random effects, where q is the number of levels of random effects; u is the q × 1 vector of random tree effects; e is an n × 1 vector of random residuals.
All models in this study included fixed effects for test (different field locations where progeny tests were established) and replication(test) (different replications within a test), and also included random terms for accounting for all or some proportion of the variance associated with different kinds of genetic effects: additive, additive x environment, dominance, and dominance x environment.

Tree Model, ASREML = ASR_ind
For this manuscript, a full-sib progeny dataset was analyzed using ASREML, with a linear model using a numerator relationship matrix. Animal breeders commonly refer to this type of analysis as the animal model, given that the numerator relationship matrix models the variance and covariances of the additive effects of individual animal genotypes in the dataset. In this manuscript, the animal model (or tree model, if one is a forest tree breeder) is denoted as ASR_ind. This was considered the baseline analysis, and four other approaches were compared to this analysis, looking at variance components associated with random effects, genetic parameters (e.g., heritability = h 2 ) calculated from those variance components, and BLUP solutions for random effects. Specifically, we focus on breeding value predictions (BV, the additive effect associated with a given genotype) and predictions of full-sib family Specific Combining Ability (SCA, a measure of the average dominance effect observed in the progeny of a given full-sib cross). Four other linear mixed models were compared to ASR_ind.

Parent Model, ASREML = ASR_parent
This model does not utilize a numerator relationship matrix; rather, it includes terms for the female and male parents to account for the additive effects. This model is commonly called a sire model by animal breeders [8] and will be referred to in this manuscript as a parent model, and denoted ASR_parent. The parent model accounts for the pedigree (parentage) of the progeny, but does not define the pedigree of the parents since they are assumed to be an unrelated base population, with their A matrix equal to an identity matrix. The ASR_parent model makes use of special options within ASREML to overlay the female and male design matrices, thus equating the female and male random effects. In other words, the program recognizes that those two different terms draw from the same additive genetic effect distribution, and uses them together to estimate a single variance component associated with the parent. It also recognizes that the female k and male k have the same additive effect, the one associated with parental genotype k. This model directly predicts the family General Combining Ability (GCA) [4], among the other random effect solutions. The GCA of a half-sib family k is defined as 1 2 the BV of parent k, and represents the average additive effect inherited by all progeny of parent k.

Progeny Breeding Value Predictions for Parent Models
The ASR_parent model (and the three subsequent parent models, described below) was also used to predict the BV of all progeny in the dataset as a function of the mid-parent breeding value, plus an adjustment for the Mendelian sampling of the additive genetic alleles affecting the trait of interest: where â ijk = the BV ith progeny of parent j × parent k; â j = the BV of parent j; â k = the BV of parent k; w ijk = the within-family Mendelian additive effect of the individual ijk.
For each tree,ŵ was calculated asŵ ijk = h 2 w (e ijk ), where e ijk = the residual from the linear mixed model associated with progeny from parent j × parent k, and h 2 w = the within-family heritability = 1 2 σ 2 A /σ 2 e , i.e., the ratio of the additive genetic variance within a full-sib family to the residual variance. This approach is effectively equivalent to the Reduced Animal Model developed by Quaas and Pollack [18], where the predicted BVs for progeny are obtained by back-solving from the parental BVs [8].

Algorithm for Use with General-Purpose Mixed Linear Model Software
In addition to the ASR_ind and ASR_parent models, three additional linear mixed-model scenarios were examined, all using the same linear models with exactly the same random-effect terms, but run with three different software programs (ASREML, R, and SAS). The three models are parental models, as described above, but without the use of any options or modules to force the program to equate male and female random effects. Normally, such an analysis would estimate separate and different variance components for female and male random effects, and predict different GCA values for the same parent k when used as female k or male k. The proposed algorithm for performing a proper analysis of these data does not involve modifying the mixed-model analysis programmed in the software; rather, it involves manipulating the dataset. Specifically, for any given full-sib dataset with n progeny observations, the dataset is doubled to include 2n total observations. Every observation is included twice, once with the true female and male parent identifiers, and again with a unique tree identifier and the female and male identifiers swapped. For example, consider two observations in a .csv data format: Tree 1241 is a progeny of female 4 × male 6 in test 1, rep 1 with a standardized volume = 128.43. This observation is also included in the dataset with a different TreeID (21241), as a progeny of female 6 × male 4. All the other identification variables are unchanged, as is the phenotypic measurement. The dataset is then analyzed normally with female and male random effects in the model. The program does not recognize that these effects are the same, and it will estimate separate variances for the female and male effects, but the final variance estimates produced by the software will be the same (σ 2 female = σ 2 male ), and the BLUPs for the GCA for parent k as a female will be equal to the BLUP for GCA for parent k as a male. These results arise because in the doubled dataset, every progeny of parent k, whether truly derived with k as the female parent or k as the male parent, is included in the dataset as progeny with k listed as the female parent (and also in another observation where k is listed as the male parent). Whether this doubling of the dataset has any other effect on the variance component estimates, genetic parameter estimates, or other BLUP solutions will be demonstrated using an example analysis.
Using this doubled dataset, parent linear mixed models were run with three software packages: ASREML, and the model will be denoted ASR_parDouble; R lme4, and the model will be denoted R_parDouble; SAS Proc Mixed, and the model will be denoted SAS_parDouble.

Summary of Different Linear Mixed Models
The five different linear mixed models are summarized in Table 1, along with the genetic interpretations of the model variance components. Note that all the models included additive x environment interaction terms such as female × environment and male × environment in order to facilitate comparisons among the results. Additionally, note that ASR_ind uses the numerator relationship matrix A, so the variance associated with that component is the full additive genetic variance. This also means that the residual variance from that model will be lower than the residual variance in all the other models by an amount equal to 1 2 σ 2 A . For all the linear models, the following genetic parameters were then estimated as functions of the variance components: where σ 2 A = the additive genetic variance, σ 2 AE = the additive × environment variance, σ 2 D = the dominance genetic variance, and σ 2 DE = the dominance × environment variance. The Type B genetic correlations [19] both provide a measure of the genotype × environment interaction variance, and range from r B = 0, indicating no consistent genetic effect across environments, to r B = 1, indicating a perfectly consistent genetic effect across all environments (or in other words, the absence of any genotype x environment variance).
The specific ASREML code, R code, and SAS code for the different linear mixed models are presented as supplementary materials in Archive S1. The code and the example dataset are also available online at https://camcore.cnr.ncsu.edu/software/.

Precision of Variance Component Estimates
For all the ASREML models, the standard errors of the variance component estimates were produced directly by the program. The genetic parameters were calculated using the !VPREDICT option, and the standard errors calculated by the program. For the SAS_parDouble model, the ASYCOV option of Proc Mixed produces an asymptotic variance-covariance matrix of the variance component estimates. Elements of this matrix were then used to calculate the standard errors of the genetic parameters. The standard errors of h 2 were calculated using the standard error of the additive variance estimate and treating the phenotypic variance estimate as a constant according to Dickerson's approximation [20]. The standard errors of d 2 were estimated in a similar manner. The standard errors of the Type B correlations (r Bg and r Bd ) were calculated as a first-order approximation of a Taylor-expansion series [21].
The R package lme4 does not provide standard error estimates for the variance components, but does provide confidence interval estimates around the variance component estimates. This is a philosophical decision by the program developer, reflecting the view that standard errors should be restricted for use with normal or nearly normal distributions, and thus are not appropriate for variance component estimates that are asymmetrically distributed [22]. Using the confint function of lme4, a 68.3% confidence interval was calculated for the variance component estimates of the R_parDouble model. This confidence interval covers the same range as a ±1.0 standard deviation range would for a normally distributed variable.

Precision of BLUP Estimates
The standard errors of prediction for the BLUPs for the parental BVs (SE(a − â)) and full-sib SCA effects (SE(s −ŝ)) were directly calculated from the software for all the linear models. The standard errors of prediction for the BLUPs for the progeny BVs were directly calculated only for the ASR_ind model. For the other models, it is theoretically possible to estimate SE(a − â) for progeny (e.g., as functions of the SE(a − â) of the parents, covariances among the â of the parents, the within-family heritability, and the within-family additive genetic variance), but these are not produced directly by the software.

Correlations among BLUPs from Different Models
To compare BLUPs from the different linear models, we examined the correlations among the genetic value predictions Corr(â k ,â ASR ind ) and developed linear regression models relating the predicted genetic values from the four parent models (â k ) to those from ASR_ind (â ASR ind ).

Description of the Dataset
A simulated dataset representative of a typical forest tree-breeding scenario was used to compare the different linear mixed models. Specifically, the initial genetic parameters were assumed based on results typical of the 8-year stem volume for tropical pines [23]: h 2 = 0.20, d 2 = 0.10, r Bg = 0.75, r Bd = 0.75, and total phenotypic variance = 2500. Using these genetic values, a population of 40 parents was generated, and these 40 parents were crossed in an incomplete diallel to form 80 full-sib families. The different parents were included in different numbers of crosses: of the 40 parents, 4 parents were included in only 2 crosses, 8 parents were used in 3 crosses, 16 parents in 4 crosses, 8 parents in 5 crosses, and 4 parents were used in 6 crosses. The 80 full-sib families were divided into four sets (A, B, C, and D) and established in four field tests. Each field test was assumed as a randomized complete block with 25 replications, with single-tree plots (one tree/family/replication). All the full-sib families were included in three of the four tests, with each test containing three of the sets (i.e., ABC, ABD, ACD, and BCD) in Tests 1 to 4, respectively. For the purposes of the simulation, 100% survival was assumed, so there were 6000 progeny from 40 parents and 80 full-sib families in the total dataset. All the simulated data were generated using a modification of the algorithm described by Lstiburek et al. [24].

Variance Component Estimates
Throughout the presentation of the results and the following discussion, for convenience, we will use the estimates from the ASR_ind model as the "best" estimates, and compare the other models to these. Estimates of the variance components from the five linear models are presented in Table 2. The two ASREML models using the original dataset (ASR_ind and ASR_parent) gave identical variance component estimates for all the terms (up to the fifth significant digit). All of the models using the doubled dataset produced very similar estimates for all model terms. For the additive genetic variance, ASR_ind produced an estimate ofσ 2 A = 557.5, and the range of the three doubled-dataset models was fromσ 2 A = 555.5 to 557.4. For the female x male variance (σ 2 fm ), which can be interpreted as the variance of the full-sib SCA effects, ASR_ind produced an estimate ofσ 2 f m = 36.5, and the range of the three doubled-dataset models was fromσ 2 f m = 35.6 to 35.9. The female × environment and female × male × environment interaction variances (σ 2 fe and σ 2 fme ) were also very similar across the models. ASR_ind produced an estimate ofσ 2 f e = 47.3, and the range of the three doubled-dataset models wasσ 2 f e = 44.7 to 45.9. ASR_ind produced an estimate ofσ 2 f me = 25.0, and the range of the three doubled-dataset models wasσ 2 f me = 25.9 to 27.1. Finally, ASR_ind gave an estimate of the phenotypic variance ofσ 2 phen = 2534.5, and the range of the three doubled-dataset models wasσ 2 phen = 2495.5 to 2513.4. The ASR_ind model included a term for an individual tree, so the software directly estimated additive variance (σ 2 A ), and female parent variance (σ 2 f = 1 4 σ 2 A ) was derived. All other models included terms for female and male parents, so the software directly estimated female parent variance (σ 2 f = 1 4 σ 2 A ), and additive variance (σ 2 A ) was derived. Other model terms were female x environment (σ 2 fe ), female x male (σ 2 fm ), female x male x environment (σ 2 fme ), residual (σ 2 resid ), and total phenotypic variance (σ 2 phen ). 2 The R package lme4 does not provide standard error estimates for variance component estimates; rather, it provides confidence intervals around the estimate. Values presented are for a 68.3% confidence interval, equivalent to ± 1.0 standard deviation for a normally distributed variable. 3 The analyses varied according to the software (ASREML, SAS Proc Mixed, or R lme4), linear model terms (individual tree or parent model), and/or the use of a normal or doubled dataset.

Precision of Variance Component Estimates
As expected, the estimated standard errors of the variance components from ASR_ind and ASR_parent were identical (with the exception of se(σ 2 resid ), which makes sense since σ 2 resid is different in the two models). Comparing the standard errors of the variance component estimates from the doubled-dataset models to those from the ASR_ind and ASR_par models, there are lower standard error estimates for some of the random effects ( Table 2). The standard error estimates from ASR_ind and the doubled-dataset models (ASR_parDouble and SAS_parDouble) were similar for the additive variance, with se(σ 2 A ) = 167.2 compared to se(σ 2 A ) = 163.7 and 163.7, respectively. This was also observed for the female × environment variance, with se(σ 2 f e ) estimates of 13.8 compared to 13.5 and 13.7 from ASR_ind, ASR_parDouble, and SAS_parDouble, respectively. For the additive variance, R_parDouble estimatedσ 2 . This is comparable to the range of estimatedσ 2 A = 557.5 ± 167.2 from the ASR_ind model (Table 2). There was, however, a notable difference in the standard error estimates for the dominance-related effects, the female × maleand female × male × environmentvariance, and to a lesser extent, the residual variance ( Table 2). For the female × male effect, the se(σ 2 f m ) from ASR_ind was se(σ 2 f m ) = 18.3 versus se(σ 2 f m ) = 12.8 and 12.9 from ASR_parDouble and SAS_parDouble, respectively. Similarly, for the female × male × environmenteffect, ASR_ind gave se(σ 2 f me ) = 19.2 versus se(σ 2 f me ) = 13.5 and 13.7 from ASR_parDouble and SAS_parDouble, respectively. Finally, for the residual variance, the se(σ 2 resid ) from ASR_par was se(σ 2 resid ) = 39.5, while the estimate from the doubled-dataset analysis with ASR_parDouble was se(σ 2 resid ) = 27.5. For these variance component estimates from the R_parDouble model, the confidence intervals were also more narrow than would be expected compared to the results from the ASR_ind model (Table 2).
This apparent difference in the precision of these variance component estimates from the doubled-dataset analyses is due to fact that the software "counts" twice as many degrees of freedom to estimate the female × male, female × male × environment, and residual variances as are actually present in the original dataset. Specifically, for this example, there are 80 full-sib (female × male) crosses in the dataset, and 240 full-sib × environment combinations (each full-sib family tested on three sites). However, because of the doubling of the dataset, each unique female × male combination is also a unique male × female when the parents are reversed. Thus, the software "counts" 160 full-sib (female × male) crosses in the dataset, and 480 full-sib × environment combinations, and this leads to lower standard errors (or a smaller confidence interval, in the case of R_parDouble using R lme4) for these variance components.

Genetic Parameter Estimates
Although there was some slight variation among the variance component estimates from the different models, there was no important variation in the genetic parameter estimates from the different models (Table 3). ASR_ind and ASR_parent produced identical genetic parameter estimates. For most of the genetic parameters, the three doubled-dataset models produced estimates that differed from the ASR_ind estimates only at the third significant figure. For example, for heritability, ASR_ind gave an estimate ofĥ 2 = 0.220, and the range of the three doubled-dataset models wasĥ 2 = 0.221 to 0.223. For the proportion of dominance, ASR_ind gave an estimate ofd 2 = 0.058, and all three doubled-dataset models produced an estimate ofĥ 2 = 0.057. For the Type B genetic correlation, ASR_ind gave an estimate ofr Bg = 0.747, and the range of the three doubled-dataset models wasr Bg = 0.752 to 0.757. For the Type B dominance correlation, ASR_ind gave an estimate ofr Bd = 0.594, and the range of the three doubled-dataset models wasr Bd = 0.568 to 0.581. Among the genetic parameter estimates, this was the largest deviation observed from the ASR_ind parameter estimate. Finally, for the within-family heritability (h 2 w ), ASR_ind gave an estimate ofĥ 2 w = 0.133, and the range of the three doubled-dataset models wasĥ 2 w = 0.133 to 0.135. This is an important parameter, because it is used to calculate the within-family additive genetic deviation from the mid-parent breeding value, the Mendelian genetic sampling term.

BLUPs of Genetic Effects
Tables 3-5 present a subset of the predicted breeding values for the parents, the SCA values for the full-sib families, and the breeding values for the progeny. The selected subsets cover the range of the genetic value predictions, sampling approximately every 1 2 standard deviation in the population from minimum to maximum (based on the z-scores for the ASR_ind predictions).
For the parental breeding value predictions for the parents (â), the five different linear models produced nearly identical predictions for particular genotypes (Table 4). For any genotype with a breeding value prediction more than 1 2 a standard deviation away from the population mean of zero, the predictions from the different linear models differed only at the third significant figure. For example, for Parent 7, which had a z-score around 2.2 among the parental breeding value predictions, the predictions from the five linear models were â = 44.53, 44.54, 44.62, 44.76, and 44.60.  For the full-sib SCA predictions (ŝ), the five different linear models produced nearly identical predictions for particular genotypes (Table 5). For any full-sib cross with a z-score more than 0.50, the predictions from the different linear models differed only at the third significant figure. For example, for the full-sib family of female 28 × male 21, with a z-score = −2.17 for the predicted SCA effect, the predictions from the five linear models wereŝ 28_21 = −6.92, −6.91, −6.86, −6.83, and −6.87 (female_male = 28_21 in Table 5). Finally, as with the parental breeding value predictions, for the progeny breeding value predictions (â), the five different linear models produced nearly identical predictions for particular genotypes ( Table 6). For any progeny with a z-score more than 0.50, the predictions from the different linear models differed only at the third significant figure. For example, for Tree 1692, with a z-score = +3.00 for the predicted BV, the predictions from the five linear models were â = 46.70, 46.70, 46.82, 46.97, and 46.80. Since there were 6000 progeny, the z-score of the predicted breeding values covered a wider range (z = −2.80 to +3.86) than was observed among the 40 parents (z = −2.30 to +2.30). Table 6. Individual tree-breeding value predictions from five linear mixed models 1 for a simulated full-sib progeny dataset with 40 parents, 80 families from an unbalanced mating design, 4 tests, and 25 trees/family/test. SD(â) = the observed standard deviation of the population of 6000 breeding value predictions. SE(a − â) = the average standard error of prediction 2 . Selected trees have breeding values that span the distribution of 6000 trees; standardized z-values are listed 3 .

Precision of the BLUPs
Similar to the results with the BLUP solutions from the different linear models, the estimated standard errors of the predictions from the different linear models were very similar. For example, the average standard error of the predictions for the 40 parents from the ASR_ind model was SE(a − â) = 11.87, compared to SE(a − â) = 11.71 for the three doubled-dataset models (Table 4). For the SCA effects, the average standard error of the predictions for the 80 full-sib families from the ASR_ind model was SE(a − â) = 5.15, while SE(a − â) ranged from 5.09 to 5.11 for the three doubled-dataset models (Table 5). Comparing the SE(a − â) for the parents and progeny, the average SE(a − â) = 17.37 for the progeny (Table 6) compared to the average SE(a − â) = 11.87 for the parents (Table 4), and every progeny had a larger SE(a − â) than every parent (see supplemental materials, Archive S1).
As data increase in quality and quantity, BLUP predictions for a given random effect will have not only lower errors of prediction (e.g., SE(a − â)), but also a higher variance among predictions, approaching the estimated variance for that random effect [8,25,26]. The breeding value predictions for both parents and progeny are scaled based on the additive genetic variance. From the ASR_ind model, the estimated additive variance wasσ 2 A = 557.5, which corresponds toσ A = 23.6, which estimates the spread among the "true" breeding values in the population. Since, in this dataset, the parental breeding values were based on many progeny, they were precisely predicted, and the distribution of the parental breeding value predictions approached the distribution of the true breeding values. In fact, the standard deviation of the parent breeding value predictions was SD(â) = 20.65 for ASR_ind, with similar values for the other models (Table 4). For the progeny, the standard deviation of the breeding value predictions was smaller, SD(â) = 15.45 for ASR_ind, with similar values for the other models (Table 6). For the progeny, one-half of the breeding value arises from Mendelian sampling within the full-sib family, and the only datum for predicting that component is the single phenotypic measurement on that tree, which in this study, was relatively imprecise (i.e., h 2 w = 0.133, Table 3). Thus, it makes sense that individual tree BVs are predicted with less precision than parent BVs and therefore span a smaller range. There is a larger observed range among the progeny BVs (from −42.99 to 60.02, Table 6) compared to the parental BVs (−47.41 to 47.50, Table 5); however, this is entirely due to the fact that there were only 40 parents sampled compared to 6000 progeny. With such a large number of progeny, a few extreme outliers were found (z-scores of z = −2.80 and z = 3.86, Table 6), but most of the progeny BVs were clustered closer around the mean of zero.
The predictions of the SCA effects are scaled based on 1 4 σ 2 D , which is estimated byσ 2 f m in the linear models of Table 2. For model ASR_ind,σ 2 f m = 36.5, corresponding toσ f m = 6.0, which estimates the spread among the "true" SCA effects in the population. In this simulated experiment, full-sib families were tested on three sites with 25 progeny per site, and the SD(ŝ) was around 3.2 for all five models.

Correlations among Genetic Value Predictions from Different Models
Perhaps most importantly, there was an extremely high correlation among the predicted genetic values from all five linear models, for the parental breeding value predictions, progeny breeding value predictions, and full-sib family SCA predictions ( Table 7). The correlation between the genetic values predicted from the four parent linear models and the ASR_ind model was never below Corr(â m ,â ASR ind = 0.99997) and was mostly 1.00000. The linear regression models relatingâ m toâ ASR ind ( orŝ m toŝ ASR ind ) had slopes very near to 1.00, and intercepts very near to zero (Table 6). It is also worth noting that the differences betweenâ m andâ ASR ind (orŝ m andŝ ASR ind ) are quite small. For example, for the breeding values of the progeny, the model R_parDouble had the largest differences betweenâ m andâ ASR ind , with a standard error of the differences of SD(â m −â ASR ind ) = 0.1284. The range among the progeny breeding value predictions of SD(â) = 15.45 (Table 7), so in this case, the ratio of SD(â m −â ASR ind )/SD(â m ) is 0.0083. In other words, relative to the variation among the predicted genetic values, the difference in the predictions from the different methods is practically negligible.

Genetic Value Predictions
Perhaps the most important objective in the analysis of genetic datasets is the prediction of genetic values to guide selection. In progeny test analysis, accurate and precise predictions of breeding values are critically important. Breeding values represent additive genetic value, so these predictions guide selection decisions among parents, progeny, or both, in the formation of a base population to begin the next cycle of breeding and improvement. Breeding value predictions are used to make selection decisions in the formation of a production population, that is, those genotypes that will be used to produce progeny for commercial deployment [5]. In forest trees, the most common form of the production population is a clonal seed orchard (CSO) (e.g., see [27,28]). In a CSO, a small number of selected genotypes are multiplied through some type of vegetative propagation and established in a single location. Following open pollination due to wind or insects, seed collected from the orchard is used to establish commercial plantings. In some cases, rather than relying on open pollination, specific full-sib families are selected as the deployment option [29]. The full-sib families are produced by controlled pollination, producing a large amount of seed, with seedlings as the deployment unit, or a small amount of seed that is multiplied vegetatively to produce rooted cuttings as the deployment unit. In this case, the genetic value of the full-sib family would be equal to the average of the two parental breeding values, plus the SCA effect for that family. In this case, accurate and precise predictions of the SCA effects would be important for making optimal selection decisions among all the possible full-sib families.
The doubled-dataset/parent-model algorithm described in this manuscript produced essentially identical breeding value predictions to an individual tree analysis for parental and progeny breeding value predictions, and SCA effects for full-sib families. In every case, the correlation among the predictions from the ASR_ind model and the doubled-dataset models was essentially unity, the linear regression coefficients were essentially one, and the linear regression intercept was essentially zero. These data indicate that the breeding values for parents, SCA effects for full-sib families, and breeding values for progeny from the four parent models are essentially identical to those from ASR_ind; thus, regardless of which linear model is used, there will be no differences in terms of selection decisions, and no differences in genetic gain predictions for a population of selected genotypes.

Variance Component Estimates
Some small differences in the variance component estimates produced by the different linear models and methods might be expected due to differences among the software in the convergence algorithms. ASREML uses the Average Information algorithm and sparse matrix methods [30,31] and assesses convergence by looking at the change in the REML log-likelihood [32]. SAS Proc Mixed uses the Newton-Raphson algorithm and assesses convergence by looking at the change in the REML log-likelihood [15,32]. The R lme4 package uses a hybrid approach, beginning with a number of iterations using the Expectation-Maximization algorithm [33], followed by a number of Newton-Raphson iterations, and assesses convergence by looking at the relative change in the variance parameter estimates [32]. These differences in convergence algorithm among the three software packages (and perhaps other differences such as the precision of the calculations) could account for the small differences in the variance component estimates observed among the three doubled-dataset parental models in Table 2.
There would likely also be some small differences in the variance component estimates in the doubled-dataset/parent models compared to ASR_ind and ASR_par, due to the artificial doubling of the data. As noted above, for the additive and additive × environment interaction terms, the number of females (and the number of males) in the doubled-dataset analysis equals the number of parents in ASR_ind or ASR_par, so we expect no bias in the variance component estimates associated with these random effects. However, for the random effects female × male, female × male × environment, and residual, we expect there to be a very small downward bias in the variance estimate compared to the true underlying variance. For a simple random variable y~N(µ σ 2 ), a sample variance estimator s 2 is an unbiased estimator of σ 2 E s 2 = σ 2 It can be demonstrated that with a doubled dataset, the sample variance estimator s 2 is a biased estimator of σ 2 (see Appendix A).
However, for mating designs of any size at all (i.e., any n of a moderate size, e.g., greater than 30), the effect of the bias should be negligible.
Once a mixed-model software package converges to a REML solution for the variance components associated with the random effects, the estimated variance components are then used to properly weight the observed data in the calculation of the solutions of random effects. The doubled-dataset/parent models gave essentially the same variance component estimates and genetic parameter estimates (functions of the variance components associated with the random terms in the models). There were some small differences among the software packages and/or linear models, but any differences had no important effect on the solutions of the random effects (breeding values and SCA predictions), as mentioned above.
It is important to note that any inferences regarding fixed effects (e.g., F-tests for differences among sites) would likely be affected by the doubling of the apparent number of degrees of freedom for many of the random effects. If such inferences are important, the researcher should be careful to test for these using a separate analysis with a non-doubled dataset.

Precision of Variance Component Estimates
It is also possible to demonstrate that the use of a doubled dataset of a random sample of y will produce a very small downward bias in the estimates of the precision for the variance component estimates, but again, this downward bias is negligible (see Appendix A). In other words, the expected value of the standard error estimate with a doubled dataset is approximately equal to the standard error estimate of the normal dataset: However, the use of the doubled-dataset/parent model algorithm for mixed-model analysis produces notably lower standard error estimates from the software output than is correct for some of the random effects (female × male, female × male × environment, and residual). This is due to a doubling of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction factor r mating designs of any size at all (i.e., any n of a moderate size, e.g., greater than he bias should be negligible. ed-model software package converges to a REML solution for the variance ciated with the random effects, the estimated variance components are then used to he observed data in the calculation of the solutions of random effects. The doubledodels gave essentially the same variance component estimates and genetic tes (functions of the variance components associated with the random terms in the ere some small differences among the software packages and/or linear models, but ad no important effect on the solutions of the random effects (breeding values and , as mentioned above. nt to note that any inferences regarding fixed effects (e.g., F-tests for differences ld likely be affected by the doubling of the apparent number of degrees of freedom ndom effects. If such inferences are important, the researcher should be careful to g a separate analysis with a non-doubled dataset. riance Component Estimates sible to demonstrate that the use of a doubled dataset of a random sample of y will mall downward bias in the estimates of the precision for the variance component in, this downward bias is negligible (see Appendix A). In other words, the expected ard error estimate with a doubled dataset is approximately equal to the standard the normal dataset: e use of the doubled-dataset/parent model algorithm for mixed-model analysis lower standard error estimates from the software output than is correct for some ects (female × male, female × male × environment, and residual). This is due to a doubling egrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction 1)/( − 1), where n = the degrees of freedom, should adjust the estimated standard ses, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be a very good or example, for the female × male variance, multiplying the estimated standard errors d-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and 18.2 from nd SAS_parDouble, respectively, similar to the se(̂2 ) = 18.3 from ASR_ind. female × male × environment variance, multiplying the estimated standard errors from et analyses by the correction factor Ɲ = √2 gives an adjusted se(̂2 ) = 19.2 and 19.4 ble and SAS_parDouble, respectively, similar to the se(̂2 ) = 19.2 from ASR_ind. ce interval for the female × male variance arising from the R_parDouble analysis restimates the range (or overestimates the precision) due to the apparent doubling e degrees of freedom. We suggest that an approximate adjustment of the range can e same adjustment factor Ɲ mentioned above. For example, for the female x male uble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interval from 23.8 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ = √2 gives an ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of the estimated ̂2 the ASR_ind model (Table 2). We believe that the proposed method for adjusting estimates of the variance components will be a very good approximation; however, = (2n − 1)/(n − 1), where n = the degrees of freedom, should adjust the estimated standard errors. As n increases, the factor However, for mating designs of any size at all (i.e., any n of a moderate size, e.g., greater than 30), the effect of the bias should be negligible.
Once a mixed-model software package converges to a REML solution for the variance components associated with the random effects, the estimated variance components are then used to properly weight the observed data in the calculation of the solutions of random effects. The doubleddataset/parent models gave essentially the same variance component estimates and genetic parameter estimates (functions of the variance components associated with the random terms in the models). There were some small differences among the software packages and/or linear models, but any differences had no important effect on the solutions of the random effects (breeding values and SCA predictions), as mentioned above.
It is important to note that any inferences regarding fixed effects (e.g., F-tests for differences among sites) would likely be affected by the doubling of the apparent number of degrees of freedom for many of the random effects. If such inferences are important, the researcher should be careful to test for these using a separate analysis with a non-doubled dataset.

Precision of Variance Component Estimates
It is also possible to demonstrate that the use of a doubled dataset of a random sample of y will produce a very small downward bias in the estimates of the precision for the variance component estimates, but again, this downward bias is negligible (see Appendix A). In other words, the expected value of the standard error estimate with a doubled dataset is approximately equal to the standard error estimate of the normal dataset: However, the use of the doubled-dataset/parent model algorithm for mixed-model analysis produces notably lower standard error estimates from the software output than is correct for some of the random effects (female × male, female × male × environment, and residual). This is due to a doubling of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the estimated standard errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be a very good approximation. For example, for the female × male variance, multiplying the estimated standard errors from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and 18.2 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 18.3 from ASR_ind. Similarly, for the female × male × environment variance, multiplying the estimated standard errors from the doubled dataset analyses by the correction factor Ɲ = √2 gives an adjusted se(̂2 ) = 19.2 and 19.4 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 19.2 from ASR_ind.
The confidence interval for the female × male variance arising from the R_parDouble analysis output also underestimates the range (or overestimates the precision) due to the apparent doubling of the female x male degrees of freedom. We suggest that an approximate adjustment of the range can be made using the same adjustment factor Ɲ mentioned above. For example, for the female x male variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interval from 23.8 to 49.8 (̂2 −11.9 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ = √2 gives an adjusted range of ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of the estimated ̂2 = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed method for adjusting the standard error estimates of the variance components will be a very good approximation; however, approaches √ 2, so the general use of  A).
However, for mating designs of any size at all (i.e., any n of a moderate size, e 30), the effect of the bias should be negligible.
Once a mixed-model software package converges to a REML solution f components associated with the random effects, the estimated variance components properly weight the observed data in the calculation of the solutions of random effec dataset/parent models gave essentially the same variance component estima parameter estimates (functions of the variance components associated with the rand models). There were some small differences among the software packages and/or lin any differences had no important effect on the solutions of the random effects (bree SCA predictions), as mentioned above.
It is important to note that any inferences regarding fixed effects (e.g., F-test among sites) would likely be affected by the doubling of the apparent number of de for many of the random effects. If such inferences are important, the researcher sho test for these using a separate analysis with a non-doubled dataset.

Precision of Variance Component Estimates
It is also possible to demonstrate that the use of a doubled dataset of a random produce a very small downward bias in the estimates of the precision for the vari estimates, but again, this downward bias is negligible (see Appendix A). In other wo value of the standard error estimate with a doubled dataset is approximately equa error estimate of the normal dataset: However, the use of the doubled-dataset/parent model algorithm for mixed produces notably lower standard error estimates from the software output than is of the random effects (female × male, female × male × environment, and residual). This is d of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the est errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 shoul approximation. For example, for the female × male variance, multiplying the estimated from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 18.3 Similarly, for the female × male × environment variance, multiplying the estimated stan the doubled dataset analyses by the correction factor Ɲ = √2 gives an adjusted se(̂2 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 19.2 fr The confidence interval for the female × male variance arising from the R_pa output also underestimates the range (or overestimates the precision) due to the ap of the female x male degrees of freedom. We suggest that an approximate adjustment be made using the same adjustment factor Ɲ mentioned above. For example, for variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence in to 49.8 (̂2 −11.9 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ adjusted range of ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of t = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed meth the standard error estimates of the variance components will be a very good approxim = √ 2 should be a very good approximation. For example, for the female × male variance, multiplying the estimated standard errors from the doubled-dataset analyses by  A).
However, for mating designs of any size at all (i.e., any n of a moderate size, e.g., greater than 30), the effect of the bias should be negligible.
Once a mixed-model software package converges to a REML solution for the variance components associated with the random effects, the estimated variance components are then used to properly weight the observed data in the calculation of the solutions of random effects. The doubleddataset/parent models gave essentially the same variance component estimates and genetic parameter estimates (functions of the variance components associated with the random terms in the models). There were some small differences among the software packages and/or linear models, but any differences had no important effect on the solutions of the random effects (breeding values and SCA predictions), as mentioned above.
It is important to note that any inferences regarding fixed effects (e.g., F-tests for differences among sites) would likely be affected by the doubling of the apparent number of degrees of freedom for many of the random effects. If such inferences are important, the researcher should be careful to test for these using a separate analysis with a non-doubled dataset.

Precision of Variance Component Estimates
It is also possible to demonstrate that the use of a doubled dataset of a random sample of y will produce a very small downward bias in the estimates of the precision for the variance component estimates, but again, this downward bias is negligible (see Appendix A). In other words, the expected value of the standard error estimate with a doubled dataset is approximately equal to the standard error estimate of the normal dataset: However, the use of the doubled-dataset/parent model algorithm for mixed-model analysis produces notably lower standard error estimates from the software output than is correct for some of the random effects (female × male, female × male × environment, and residual). This is due to a doubling of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the estimated standard errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be a very good approximation. For example, for the female × male variance, multiplying the estimated standard errors from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and 18. The confidence interval for the female × male variance arising from the R_parDouble analysis output also underestimates the range (or overestimates the precision) due to the apparent doubling of the female x male degrees of freedom. We suggest that an approximate adjustment of the range can be made using the same adjustment factor Ɲ mentioned above. For example, for the female x male variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interval from 23.8 to 49.8 (̂2 −11.9 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ = √2 gives an adjusted range of ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of the estimated ̂2 = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed method for adjusting  .
However, for mating designs of any size at all (i.e., any n of a moderate size, e.g., greater than 30), the effect of the bias should be negligible.
Once a mixed-model software package converges to a REML solution for the variance components associated with the random effects, the estimated variance components are then used to properly weight the observed data in the calculation of the solutions of random effects. The doubleddataset/parent models gave essentially the same variance component estimates and genetic parameter estimates (functions of the variance components associated with the random terms in the models). There were some small differences among the software packages and/or linear models, but any differences had no important effect on the solutions of the random effects (breeding values and SCA predictions), as mentioned above.
It is important to note that any inferences regarding fixed effects (e.g., F-tests for differences among sites) would likely be affected by the doubling of the apparent number of degrees of freedom for many of the random effects. If such inferences are important, the researcher should be careful to test for these using a separate analysis with a non-doubled dataset.

Precision of Variance Component Estimates
It is also possible to demonstrate that the use of a doubled dataset of a random sample of y will produce a very small downward bias in the estimates of the precision for the variance component estimates, but again, this downward bias is negligible (see Appendix A). In other words, the expected value of the standard error estimate with a doubled dataset is approximately equal to the standard error estimate of the normal dataset: However, the use of the doubled-dataset/parent model algorithm for mixed-model analysis produces notably lower standard error estimates from the software output than is correct for some of the random effects (female × male, female × male × environment, and residual). This is due to a doubling of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the estimated standard errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be a very good approximation. For example, for the female × male variance, multiplying the estimated standard errors from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and 18. The confidence interval for the female × male variance arising from the R_parDouble analysis output also underestimates the range (or overestimates the precision) due to the apparent doubling of the female x male degrees of freedom. We suggest that an approximate adjustment of the range can be made using the same adjustment factor Ɲ mentioned above. For example, for the female x male variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interval from 23.8 = √ 2 gives an adjusted se(σ 2 f m ) = 19.2 and 19.4 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(σ 2 f m ) = 19.2 from ASR_ind. The confidence interval for the female × male variance arising from the R_parDouble analysis output also underestimates the range (or overestimates the precision) due to the apparent doubling of the female x male degrees of freedom. We suggest that an approximate adjustment of the range can be made using the same adjustment factor produces notably lower standard error estimates from the software output than is correct for some of the random effects (female × male, female × male × environment, and residual). This is due to a doubling of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a correction factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the estimated standard errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be a very good approximation. For example, for the female × male variance, multiplying the estimated standard errors from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and 18.2 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 18.3 from ASR_ind. Similarly, for the female × male × environment variance, multiplying the estimated standard errors from the doubled dataset analyses by the correction factor Ɲ = √2 gives an adjusted se(̂2 ) = 19.2 and 19.4 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 19.2 from ASR_ind.
The confidence interval for the female × male variance arising from the R_parDouble analysis output also underestimates the range (or overestimates the precision) due to the apparent doubling of the female x male degrees of freedom. We suggest that an approximate adjustment of the range can be made using the same adjustment factor Ɲ mentioned above. For example, for the female x male variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interval from 23.8 to 49.8 (̂2 −11.9 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ = √2 gives an adjusted range of ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of the estimated ̂2 = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed method for adjusting the standard error estimates of the variance components will be a very good approximation; however, However, the use of the doubled-dataset/parent model algorithm for mixed-mod produces notably lower standard error estimates from the software output than is corre of the random effects (female × male, female × male × environment, and residual). This is due to of the apparent degrees of freedom, from (n − 1) to (2n − 1), and thus, we suggest that a factor Ɲ = √(2 − 1)/( − 1), where n = the degrees of freedom, should adjust the estimate errors. As n increases, the factor Ɲ approaches √2, so the general use of Ɲ = √2 should be approximation. For example, for the female × male variance, multiplying the estimated stan from the doubled-dataset analyses by Ɲ = √2 gives an adjusted se(̂2 ) = 18.1 and ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 18.3 from Similarly, for the female × male × environment variance, multiplying the estimated standard the doubled dataset analyses by the correction factor Ɲ = √2 gives an adjusted se(̂2 ) = 1 from ASR_parDouble and SAS_parDouble, respectively, similar to the se(̂2 ) = 19.2 from A The confidence interval for the female × male variance arising from the R_parDou output also underestimates the range (or overestimates the precision) due to the apparen of the female x male degrees of freedom. We suggest that an approximate adjustment of th be made using the same adjustment factor Ɲ mentioned above. For example, for the fe variance, R_parDouble estimates ̂2 = 35.6, with an asymmetric 68.3% confidence interv to 49.8 (̂2 −11.9 to ̂2 +14.1). Multiplying the lower and upper deviations by Ɲ = √ adjusted range of ̂2 −16.8 to ̂2 +19.9. This is roughly comparable to the range of the es = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed method fo the standard error estimates of the variance components will be a very good approximatio = √ 2 gives an adjusted range ofσ 2 f m −16.8 toσ 2 f m +19.9. This is roughly comparable to the range of the estimatedσ 2 A = 36.5 ± 18.3 from the ASR_ind model (Table 2). We believe that the proposed method for adjusting the standard error estimates of the variance components will be a very good approximation; however, the accuracy and robustness of the method should be confirmed by theoretical derivation or simulation studies.

Genetic Parameter Estimates
Breeders might also use genetic parameter estimates to guide breeding or testing strategies. For example, a breeder might want to examine how the number of progeny per site and the number of test sites would affect the precision of the breeding value predictions for the parents or progeny. Alternatively, a breeder might want to understand the size of the SCA variance relative to the additive genetic variance to determine if the marginal gain from the deployment of the best full-sib families would justify the cost of full-sib family testing versus half-sib family testing. Since the individual tree model and the doubled-dataset/parent models give the same genetic parameter estimates, the breeder will come to the same conclusions regarding breeding strategy alternatives.

Calculation of Individual Tree-Breeding Values with Half-Sib Progeny Test Data
This manuscript has demonstrated an approach to calculating the individual tree-breeding values of progeny using a doubled full-sib dataset and linear mixed model with female and male parents as random effects (see supplementary materials, Archive S1). The individual tree Mendelian deviation is calculated from the individual tree residual multiplied by the within-family heritability, and this is added to the mid-parent breeding value to calculate the individual tree-breeding value. The same basic approach will work with half-sib progeny test data with an appropriate adjustment of the linear model and the software code. Note that with a half-sib model, only one parent (typically the female or seed parent with forest trees) is included in the linear model, so there is no need to double the dataset. It is also important to note that the numerator of the within-family heritability for the half-sib case represents 3 4 of the additive variance, rather than 1 2 of the additive variance, as in the full-sib case.

Limitations: Multiple Generations
The doubled-dataset/parent model produces appropriate results with a single-generation full-sib dataset, i.e., a dataset with one distinct generation of parents and one distinct generation of progeny. For datasets with multiple generations, where some genotypes occur in the dataset as both parents and progeny, a parent model will not work. Multiple-generation datasets must be analyzed with software that allows for the pedigree function to calculate an appropriate numerator relationship matrix. For some organisms that typically produce small numbers of progeny (e.g., those in many animal-breeding programs), this will be a significant limitation. For some organisms that produce many progeny, this limitation is less problematic. For example, with forest trees (and many other plants), it is common for the parents of any given generation g to be represented by hundreds of progeny tested across a number of environments in replicated designs. Thus, each parental breeding value will be predicted quite accurately. Any progeny selected to be the parents of generation g + 1 will likewise be represented by many progeny, so those data can be analyzed as a stand-alone dataset to provide accurate breeding value predictions for generation g + 1's parents.

Datasets with Both Full-Sib and Half-Sib Observations
Sometimes, a breeder might want to analyze a dataset that contains a mix of full-sib and half-sib progeny test data. Half-sib families (or open-pollinated families, derived from random mating in a seed orchard, native stand, or commercial planting) are easy to collect and, assuming a large population of male (pollen) parents, can provide a good measure of the maternal breeding value. With a full-sib progeny dataset, each individual tree observation would be associated with two parental random effects, one for the female parent and one for the male parent. With half-sib data, each individual tree observation would be associated with only parental random effects, typically for the female parent. This is theoretically easy with a pedigree function that creates a Z design matrix for parents: a full-sib observation would have values in two columns, and a half-sib observation would have values in only one column. However, general linear mixed-model software packages do not allow a missing value for a random-effect classification variable (the male parent, in this case), and typically, observations with missing class variables are deleted from the dataset. In the case of a mixed full-sib and half-sib dataset, one option might be to code all the half-sib progeny as having a common unknown male parent. This might be appropriate if all of the half-sib families were collected from a common seed orchard. Another option might be to code each half-sib family with a unique unknown male parent. This might be a safer and more robust approach, which could account for different compositions of male parents, e.g., due to different seed collection sites, differences in flowering phenology, etc. If the proportion of half-sib progeny in the dataset is small, we believe there would be minimal effect on the variance component and genetic parameter estimates, but this is an area deserving further research, and breeders should exercise some caution.

Heterogeneous Variance Components
The doubled-dataset/parent model algorithm outlined here has been presented with a homogeneous variance example; that is, each random effect is assumed to have a single variance parameter that applies to the entire population of observations. This assumption may not always be valid, and a breeder might wish to model some heterogeneity in some of the random effects. For example, different test environments might have different residual variances depending on the inherent uniformity of the field site, or on differences in management. Another example could be tests of different classes or types of environments that produce differences in growth or different levels of genetic expression of a trait. If these situations exist, a breeder may want to include heterogeneity in the linear model. Whatever flexibility exists in a particular software program to allow for complex heterogeneous variance components in the G matrix or the R matrix could easily be used with the algorithm presented here.

Conclusions
Breeders who need to analyze a full-sib progeny test dataset, and wish to do so using a general linear mixed-model software package lacking a pedigree function, can accomplish this analysis using a doubled dataset and a model including female and male parents as random effects. This approach will provide variance component estimates, genetic parameter estimates, breeding value predictions for both the parents and progeny, and solutions to all other important random effects that are, for all practical purposes, identical to the results that would be obtained from an analysis using individual tree data and a pedigree function.

Supplementary Materials:
The following archive is available online at http://www.mdpi.com/1999-4907/11/11/ 1169/s1. Archive S1: Algorithm FS MM.zip: This is a zip file with five folders, one for each software/linear model described here: ASR_ind, ASR_parent, ASR_parDouble, SAS_parDouble, and R_parDouble. Each folder contains one file with the code to run the model, and an "Output" subfolder with relevant files of parameter estimates and BLUP solutions. All files are either in text or .CSV format. Acknowledgments: Thanks go to Luis Ibarra for assistance with generating the simulated dataset, Fikret Isik for helpful tips on coding ASREML, and Salvador Gezan for discussions about the precision of the variance component estimates. Additionally, many thanks go to Camcore members for their long-term commitment to the tree improvement of tropical and sub-tropical pines and eucalypts. Now define a parameter s 2 d , a variance estimator with a doubled dataset.