Genomic Prediction of Growth and Stem Quality Traits in Eucalyptus globulus Labill . at Its Southernmost Distribution Limit in Chile

The present study was undertaken to examine the ability of different genomic selection (GS) models to predict growth traits (diameter at breast height, tree height and wood volume), stem straightness and branching quality of Eucalyptus globulus Labill. trees using a genome-wide Single Nucleotide Polymorphism (SNP) chip (60 K), in one of the southernmost progeny trials of the species, close to its southern distribution limit in Chile. The GS methods examined were Ridge Regression-BLUP (RRBLUP), Bayes-A, Bayes-B, Bayesian least absolute shrinkage and selection operator (BLASSO), principal component regression (PCR), supervised PCR and a variant of the RRBLUP method that involves the previous selection of predictor variables (RRBLUP-B). RRBLUP-B and supervised PCR models presented the greatest predictive ability (PA), followed by the PCR method, for most of the traits studied. The highest PA was obtained for the branching quality (~0.7). For the growth traits, the maximum values of PA varied from 0.43 to 0.54, while for stem straightness, the maximum value of PA reached 0.62 (supervised PCR). The study population presented a more extended linkage disequilibrium (LD) than other populations of E. globulus previously studied. The genome-wide LD decayed rapidly within 0.76 Mbp (threshold value of r2 = 0.1). The average LD on all chromosomes was r2 = 0.09. In addition, the 0.15% of total pairs of linked SNPs were in a complete LD (r2 = 1), and the 3% had an r2 value >0.5. Genomic prediction, which is based on the reduction in dimensionality and variable selection may be a promising method, considering the early growth of the trees and the low-to-moderate values of heritability found in the traits evaluated. These findings provide new understanding of how develop novel breeding strategies for tree improvement of E. globulus at its southernmost range limit in Chile, which could represent new opportunities for forest planting that can benefit the local economy.


Introduction
The genus Eucalyptus L'Her.comprises more than 700 species that are distributed mainly in Australia, and these species are planted in a wide variety of environments, such as the Mediterranean, tropical, subtropical and temperate climates [1,2].Both the species and hybrids of the genus are among the main sources of biomass worldwide and among the main hardwoods used for the production of pulp and wood [3].For example, Eucalyptus globulus Labill. is one of the most planted hardwood species for industrial uses in various countries with temperate zones (e.g., Portugal, Spain, Uruguay and Chile), which has been the target of breeding in several programs around the world to improve economically important traits such as tree growth and wood quality.E. globulus is naturally distributed in coastal south-eastern Australia [4], between 38 • and 43 • S latitude, where oceanic and subpolar oceanic climates are predominant.However, the species has been extensively planted under temperate conditions [1,2].Notably, E. globulus exhibits physiological plasticity and has been successfully grown in a broad range of environmental conditions that are characterized by moderate abiotic stresses [5][6][7].In Chile, for instance, several studies have been conducted to provide an understanding of the mechanisms through which E. globulus trees respond to cold conditions [8][9][10].In fact, severe cold winter periods restrict the majority of the E. globulus plantations to coastal and central Chile [7].In fact, the southern limit of E. globulus in Chile occurs in central Los Lagos (administrative region) at approximately latitude 41 • S, accounting for only 4% of the total E. globulus plantations in Chile.
Individual selection based on phenotypic data of the traits of interest, which includes pedigree records, has been the most commonly used strategy to genetically improve forest tree species.In this sense, the estimation of quantitative genetic parameters in tree breeding programs is important to ensure efficient selection [11].On the other hand, the growth rate is relatively long, and low juvenile-mature correlations in forest trees have stimulated interest in marker-assisted selection (MAS) to accelerate breeding through early selection [12].In MAS, individual tree breeding values are predicted based on the effects of selected markers, and is most effective for traits controlled by few quantitative trait loci (QTLs), each of which controls a relatively large proportion of the total phenotypic variation.In this context, efforts to increase genetic gains using MAS in different breeding programs of Eucalyptus have been performed [11][12][13][14][15][16][17].However, the benefits of MAS may be diminished by the low genetic variance that can be explained by a given QTL [18][19][20].This principle is particularly important when the selection processes focus mainly on complex traits (controlled by multiple genes), such as stem straightness, wood volume, wood quality and growth traits [21][22][23].Moreover, many QTLs underlying a complex trait are useful within the same or related families and environments (e.g., Ukrainetz et al. [24], Mamani et al. [25]), and the use of small population sizes and conventional statistical methods have been inadequate to accurately detect small effects of QTLs [26].
To overcome the limitations of MAS in polygenic traits, Meuwissen et al. [27] proposed the genomic prediction/selection (GS) method; in which environmental factors are important, and it focuses on traits that are affected by a large number of genes.In their study, these researchers compared several frequentist analyses (e.g., best linear unbiased prediction (BLUP) and least squares) and their Bayesian counterpart models based on the predictive ability of breeding values.The GS principles agree with the infinitesimal model, which is based on the fact that breeding values result from the small and additive effects of alleles in a large number of loci [28].GS is based on the simultaneous prediction of the effects of thousands of DNA markers (e.g., SNPs) scattered throughout the genome of an organism, which disregards the use of significance tests for individual markers.Daetwyler et al. [29] highlighted that GS can increase genetic gain through greater precision in the estimation of breeding values, the reduction in generational intervals and an optimal use of available genetic resources.In the context of the genetic improvement of forest tree species, GS was originally proposed as a promising way to capture a greater proportion of the genetic variation for growth traits [30].Moreover, according to Suontama et al. [31], Grattapaglia [32], Iwata et al. [33] and Zhong et al. [34], genomic selection is expected to enhance the genetic improvement of plants, including forest tree species by providing more accurate estimates of breeding values compared with pedigree-based methodologies.
Several models and methodological variants have been proposed for a better use of the benefits of GS because this method requires the development of statistical models that usually contain a massive number of markers (predictor variables) and a limited number of phenotypic data.The statistical methods commonly used in GS correspond to Bayes-A, Bayes-B, Bayes Cπ, Bayesian Least Absolute Shrinkage and Selection Operator (LASSO; BLASSO) [27,35,36], Genomic-BLUP (GBLUP) [36] and Ridge Regression BLUP (RRBLUP) [27].In each GS model, genetic merits are predicted under different analytical assumptions.Therefore, there is no universally optimal statistical method for all the traits in every population.For instance, Suontama et al. [31] and Durán et al. [37] showed a significant improvement in breeding value accuracy for wood properties in Eucalyptus by implementing GBLUP method compared to pedigree based prediction.On the other hand, Resende-Junior et al. [38] determined the SNP effects based on five predictive models in a training population of Pinus taeda L., in which the Bayes Cπ, Bayes-A and RRBLUB-B methods (using a subset of selected markers) showed greater predictive ability than RRBLUP and BLASSO methods.In turn, some studies have shown that the predictive accuracy of GS models can be enhanced by previously selecting a subset of predictor variables based on the individual effects of each marker on the study traits [38][39][40].Likewise, Long et al. [41], Solberg et al. [42], Du et al. [43] and Azevedo et al. [44] demonstrated the potential to combine the dimension reduction and variable selection for accurate and cost-effective prediction of genomic breeding values.Among the methods proposed for these objectives, the principal component regression (PCR), partial least squares (PLS) and their extensions supervised PCR and sparse PLS stand out.The reduction in dimensionality becomes more relevant when considering the new genomic data platforms, which have allowed for high-density genomic data to be obtained, thereby increasing the possibility of finding genomic regions controlling the variation of a trait.Considering the large size of the SNP platforms currently available in Eucalyptus (e.g., EUchip60K), a preselection of SNP and/or the use of dimensionality reduction methods may improve the prediction ability of the traits of interest for the forestry industry.
The present study was undertaken to examine the ability of different GS models to predict growth traits (diameter at breast height, tree height and wood volume), stem straightness and branching quality of E. globulus trees using 60,000 SNP markers, in one of the southernmost progeny trials of the species, close to its southern distribution limit in Chile.A better understanding of how develop novel breeding strategies for tree improvement of E. globulus could represent new opportunities for forest planting at its southernmost range limit in Chile.

Genetic Material and Phenotypic Measurements
The breeding population consisted of a mixture of E. globulus Labill.families composed of 62 full-sib and 3 half-sib families (1968 individuals), established in 2012 in La Poza, municipality of Purranque, in the administrative region of Los Lagos; the southernmost distribution of E. globulus in Chile [7].The local conditions of La Poza are shown in Table 1.A randomized complete block design was used in this experiment, with 30 blocks, single-tree plots, and a spacing of 2.5 m between the trees within a block.A total of 1968 trees were measured at four years of age for the following five phenotypic traits: total tree height (H), diameter at breast height (DBH), total wood volume (VOL), stem straightness (ST) and branching quality (BQ).The ST was evaluated in the first two-thirds of the total height of the tree using a categorical scale of 6 levels (1 represents trees with curvature in the first stretch of the total height of the tree; and 6 represents trees without problems that may show a slight curvature in the upper third of the tree without loss of productivity).BQ was evaluated according to different criteria that define the quality of branches (diameter, angle and distribution in the tree) using a categorical scale of 6 levels (1 represents trees with extreme deficiency in branch diameter and any other variable; and 6 represents trees that present an optimal combination of all the variables that qualify the quality of branches without loss of productivity).Nuclear DNA was isolated from leaf tissue of 647 individuals randomly selected (approximately 10 individuals per family) from the breeding population according to Doyle and Doyle [45] and Porebsky et al. [46].The sample was genotyped using the EUChip60K SNP system (GeneSeek, Lincoln, NE, USA) developed by Silva-Junior et al. [47].The genotyping quality was evaluated using Genome Studio software (Illumina, San Diego, CA, USA).Subsequently, the genotyping matrix was filtered considering a minor allele frequency (MAF) ≥0.05 and a maximum proportion of 10% of lost data.The genotyped sample was used to estimate the LD pattern in the breeding population studied.The LD between marker pairs (expressed in terms of r 2 ) was corrected by relatedness and estimated using LDcorSV package version 1.3.2[48].The LD decay curve was plotted using R software version 3.4 according to the method of Hill and Weir [49], and it was based on the physical distance of the genome of Eucalyptus grandis [50].

Estimation of Pedigree-Based Breeding Values
Estimated breeding values (EBV) were obtained using the Best Linear Unbiased Prediction (BLUP) with estimates of variance components based on the Restricted Maximum Likelihood (REML) method by the ASREML program version 4 [51].This pedigree-based model was as follows: where y represents the vector of phenotypic data, β is the vector of fixed effects (general mean and block effect), a is the vector of additive effects, which ∼ N(0, Aσ 2 a ) where A is the average numerator relationship matrix from pedigree information and σ 2 a is the additive genetic variance, X and Z correspond to the incidence matrices associated with the fixed and random effects, respectively, and ε represents the vector of residual effects, which ∼ N(0, Iσ 2 e ) where I is an identity matrix and σ 2 e is the residual variance.Due to the ordinal nature of the traits BQ and ST, their estimated breeding values (EBVs) were obtained using a Generalized Linear Mixed Model, evaluated with a logistic regression model in the ASREML program v4 [51].Additionally, coefficients of additive genetic variation (CVa) were calculated using variance component estimates and the means of each trait (X) as:

Genomic Prediction Models
We compared the following statistical methods with regard to their predictive performance: Ridge Regression BLUP (RRBLUP), Bayes-A, Bayes-B, Bayesian Least Absolute Shrinkage and Selection Operator (BLASSO), PCR, supervised PCR and a variant of RRBLUP, which involves the previous selection of predictive variables (RRBLUP-B) [27,35,38,43].The genotypic information is usually fitted using a basic linear model that includes a vector β (the overall mean) and a vector m with the marker effects [38].
In RRBLUP and RRBLUP-B, a mixed model was fitted in which the vector of fixed effect (β), i.e., the overall mean, and the vector of random marker effects (m) were estimated simultaneously using the following mixed model equations: where σ 2 m and σ 2 e correspond to the variance components (estimated by REML) of marker and residual effects, respectively, and the variance ratio σ 2 e /σ 2 m corresponds to the shrinkage parameter for the random SNP marker effects.RRBLUP-B corresponds to a variant of the RRBLUP method [38], which utilizes a selected subset of markers.The number of markers in the subset was defined according to the criteria of Resende-Junior et al. [38].In a first step, the marker effects from the RRBLUP were ranked in decreasing order by their absolute values and grouped in multiples of 50 markers (50, 100, 150 . . ., n).The group that maximized the predictive ability was selected as the optimum number of marker effects to be used in predictive model.Finally, markers effects were re-estimated in a new RRBLUP analysis, using selected markers (more details in Resende-Junior et al. [38]).
The methods Bayes A, Bayes B and BLASSO relax the assumption of common prior variance to all marker effects [30].In the Bayes-A method [27], it is assumed that each ith marker effect (m i ) follows a normal prior distribution ∼ N(0, Iσ 2 m i ).The prior distribution of marker variances is assumed to be scaled inverted chi-square, , where S 2 and v are a scale parameter and the number of degrees of freedom, respectively.The Bayes-B method [27] uses a prior distribution that has a high density, π, at σ 2 m i = 0, and an inverted chi-square distribution for σ 2 m i > 0 with probability 1-π.The BLASSO method, a double exponential prior distribution (DE) is assumed for marker effects, p(m i |λ, σ 2 e = DE(m i |0, λ/ σ 2 e , where λ corresponds to a regularization parameter.The DE distribution generates a strong contraction (closer to zero) to estimate the effect of the markers [52].
PCR is a statistical method that addresses the problem of multicollinearity when there are a large number of explanatory variables in a prediction model [53].Importantly, dimension reduction techniques have recently gained much attention in the analysis of high dimensional genomic data [54][55][56].This method consists of finding combinations of the predictor variables that best represent the variability of the data (main components).In the first step, the components that mainly explain the variation of the response variable (e.g., deregressed EBV) are identified.Subsequently, the model is validated by reducing the matrix dimension of the predictor variables in such a way that only relevant components are incorporated in the prediction model.Considering that matrix X (matrix of markers or predictor variables) is centered and scaled (X*) with n × m dimensions (n is the sample size and m is the number of markers), its singular decomposition value is as follows [54]: where U, D and V T are matrices of n × m, m × m and m × p, respectively, n is the number of observations (n = 647), p is the total number of predictors (SNPs); m is the number of X "ranked" elements; D is a diagonal matrix of singular values d j ; and U contains the main components (u 1 , u 2 . . .u m ) in the order of d 1 ≥ d 2 ≥ . . .≥ 0. The PCR model corresponds to a regression expressed as follows: where β * corresponds to the regression coefficients that allow selecting the main components that explain the variation of y.The matrix V m × m is also known as the load matrix of X.When the number of components (p) is defined, the load matrix is truncated to a matrix of m × p.Therefore, the matrix containing the selected components is represented by T nxp = X nxm V mxp .The regression model using these new terms is y = Tb + e.Therefore, the estimated score coefficients can be expressed as follows: (further details in Du et al. [49]).The supervised PCR method consists of two steps [44].In the first step, the individual coefficients of each predictor variable (effects of the SNPs) are obtained to "rank" each SNP according to the magnitude of its effects.Subsequently, a set of variables ranked in the first positions are selected to perform a PCR analysis.The variable selection in supervised PCR model was based upon association with the phenotype of each SNP, using a single-marker regression (Long et al. [41]).A value of significance p < 0.01 for the regression between SNP and phenotypic response was considered as threshold for SNP selection in the supervised PCR method.
The optimal number of components for the PCR and supervised PCR methods was determined by selecting the minimum prediction error value (predicted residual error sum of squares-PRESS) for each component.These methods were implemented using the PLS package in R [57].The RRBLUP, Bayes-A, Bayes-B and BLASSO methods were implemented in R software using the RRBLUP version 4.6 [58] and BLR version 1.5 [59] packages.

Cross-Validation and Prediction Ability
The sample of 647 trees was divided in two subsets of individuals.A total of 582 individuals (~90%) were used to train the models and estimate the effects of the markers, while the rest of the individuals (~10%) were used to evaluate and validate the prediction values.In the case of the RRBLUP, BAYES-A, BAYES-B, BLASSO and RRBLUP-B methods, 100 cycles were used to obtain the average prediction ability of each method.The PCR and supervised PCR methods were validated according to Du et al. [43] using 10 folds in the cross-validation analysis.The prediction ability of all the methods was calculated as the correlation coefficient between genomic estimated breeding values (GEBVs) and EBVs of the validation subsample.

Validation of Pedigree Data
In an additional analysis, a genomic relationship matrix was calculated to evaluate the consistency between the relatedness coefficient based on pedigree information and genomic information based on SNPs data.The genomic relationship matrix was obtained in R software using the RRBLUP version 4.6 [58] package.

Estimates of Variance Components and Heritability of Growth Traits, Branching Quality and Stem Straightness
The estimates of the variance components (REML) and heritability for branching quality (BQ), stem straightness (ST), total tree height (H), diameter at breast height (DBH) and wood volume (VOL) are presented in Table 2 (BLUP analysis).According to these results, the study traits showed low-to-moderate heritability, with estimates varying from 0.07 to 0.19 for BQ and DBH.The heritability values of all traits were consistent with other breeding populations of E. globulus [60][61][62][63].Furthermore, these findings were supported by the coefficients of additive genetic variation, which were similar to those found in other studies with E. globulus and Eucalyptus spp.[63][64][65].The results evidenced that growth traits such (H, DBH, VOL) and stem form traits (ST and BQ) had a low additive genetic control in early stages of E. globulus, which can be usual for populations younger than 5 years [61,62].In this context, the low heritability found is supporting the use of genomic prediction methods that have the potential to capture a greater proportion of the phenotypic variation during selection cycles.In this study, the consistence between the frequency distribution of coefficients of genomic and pedigree-based relationship data was additionally evaluated.As expected, the analysis of genomic data revealed that the most of individuals are not related.This result is in agreement with the analysis of pedigree data (Figure 1).However, both relationship matrix (based on genomic and pedigree), showed that several individuals are related with others, and the coefficients were between 0.2-0.3,which reflects the mating design implemented in this breeding population of E. globulus.
Forests 2018, 9, x FOR PEER REVIEW 7 of 18 data revealed that the most of individuals are not related.This result is in agreement with the analysis of pedigree data (Figure 1).However, both relationship matrix (based on genomic and pedigree), showed that several individuals are related with others, and the coefficients were between 0.2-0.3,which reflects the mating design implemented in this breeding population of E. globulus.

Final Set of Qualified SNPs and Linkage Disequilibrium Decay
After applying the corresponding filters, the final set of markers consisted of 14,442 high-quality SNPs, which were distributed on the 11 chromosomes of E. globulus with an average of 1356 SNPs per chromosome and an average distance of approximately 4000 bp.The genome-wide linkage disequilibrium (LD) decay pattern for the study population is shown in Figure 2. Decay of LD showed a clear nonlinear trend with physical distance.According to these results, the genome-wide LD decayed rapidly within 0.76 Mbp, regarding a threshold value of r 2 = 0.1 (p < 0.05).The average LD on all chromosomes was r 2 = 0.09.In addition, the 0.15% of total pairs of linked SNPs were in a complete LD (r 2 = 1), and the 3% had an r 2 value >0.5.Particularly, the LD of chromosome 2 decayed faster than the other linkage groups, and chromosome 11 had the greatest extension of LD, decaying at 1 Mbp (r 2 = 0.11) (Table 3).Approximately 3.8% of the marker pairs of each chromosome registered

Final Set of Qualified SNPs and Linkage Disequilibrium Decay
After applying the corresponding filters, the final set of markers consisted of 14,442 high-quality SNPs, which were distributed on the 11 chromosomes of E. globulus with an average of 1356 SNPs per chromosome and an average distance of approximately 4000 bp.The genome-wide linkage disequilibrium (LD) decay pattern for the study population is shown in Figure 2. Decay of LD showed a clear nonlinear trend with physical distance.According to these results, the genome-wide LD decayed rapidly within 0.76 Mbp, regarding a threshold value of r 2 = 0.1 (p < 0.05).The average LD on all chromosomes was r 2 = 0.09.In addition, the 0.15% of total pairs of linked SNPs were in a complete LD (r 2 = 1), and the 3% had an r 2 value >0.5.Particularly, the LD of chromosome 2 decayed faster than the other linkage groups, and chromosome 11 had the greatest extension of LD, decaying at 1 Mbp (r 2 = 0.11) (Table 3).Approximately 3.8% of the marker pairs of each chromosome registered a value of LD (r 2 ) > 0.5.Particularly, 5.6% of the marker pairs of chromosome 11 were found in strong LD (r 2 > 0.5).

Figure 2.
Genome-wide linkage disequilibrium (LD) decay pattern for the study population of Eucalyptus globulus.Labill.Pairwise LD values expressed by r 2 ; LD (R 2 ), as adjusted according to Hill and Weir [53] (curve colored by black).Distance (bp) is the distance between SNP pairs.The LD threshold of r 2 = 0.1 is indicated with a dotted line.Table 3. LD estimates (corrected by the relatedness) for each chromosome.Distance (Mbp) and r 2 express the average values of genomic distance and LD between pairs of markers.Max.r 2 and Min.r 2 correspond to the maximum and minimum LD values, respectively, found between each pair of markers.Dist.Max and Dist.Min represent the maximum and minimum distance between markers found in LD, respectively.CH is the number of chromosome of Eucalyptus.

Predictive Ability of Frequentist, Bayesian and Dimension Reduction Methods
In Table 4 and Figure 3, the predictive ability (PA) values of all the methods under study and each evaluated trait are shown.As expected, PA varied between the seven prediction methods explored here, as well as between the traits under study [43].The highest PA values were obtained by means of methods of dimension reduction and variable selection (RRBLUP-B and supervised PCR).In the case of RRBLUP-B, PA was increased in all traits by reducing between 94 and 97% the total number of SNP markers (Figure 4).The highest predictive ability values for DBH, VOL, H, ST and BQ were obtained using models based on 450, 900, 850, 800 and 950 SNPs, respectively.In particular, the maximum PA values reached by the RRBLUP, Bayes-A, Bayes-B and BLASSO methods were significantly lower than the rest of the methods, with maximum values corresponding to 0.14 (ST), 0.17 (BQ and ST), 0.24 (BQ) and 0.21 (BQ), respectively.According to the results, more parsimonious models provided a higher prediction of growth and stem form than other predictive Table 3. LD estimates (corrected by the relatedness) for each chromosome.Distance (Mbp) and r 2 express the average values of genomic distance and LD between pairs of markers.Max.r 2 and Min.r 2 correspond to the maximum and minimum LD values, respectively, found between each pair of markers.Dist.Max and Dist.Min represent the maximum and minimum distance between markers found in LD, respectively.CH is the number of chromosome of Eucalyptus.

Predictive Ability of Frequentist, Bayesian and Dimension Reduction Methods
In Table 4 and Figure 3, the predictive ability (PA) values of all the methods under study and each evaluated trait are shown.As expected, PA varied between the seven prediction methods explored here, as well as between the traits under study [43].The highest PA values were obtained by means of methods of dimension reduction and variable selection (RRBLUP-B and supervised PCR).In the case of RRBLUP-B, PA was increased in all traits by reducing between 94 and 97% the total number of SNP markers (Figure 4).The highest predictive ability values for DBH, VOL, H, ST and BQ were obtained using models based on 450, 900, 850, 800 and 950 SNPs, respectively.In particular, the maximum PA values reached by the RRBLUP, Bayes-A, Bayes-B and BLASSO methods were significantly lower than the rest of the methods, with maximum values corresponding to 0.14 (ST), 0.17 (BQ and ST), 0.24 (BQ) and 0.21 (BQ), respectively.According to the results, more parsimonious models provided a higher prediction of growth and stem form than other predictive models in this population of E. globulus.On the other hand, the RRBLUP and BAYES-A methods performed a greater predictive ability than the BAYES-B and BLASSO methods for growth traits (VOL, DHB and H), and vice versa for stem form traits (BQ and ST).models in this population of E. globulus.On the other hand, the RRBLUP and BAYES-A methods performed a greater predictive ability than the BAYES-B and BLASSO methods for growth traits (VOL, DHB and H), and vice versa for stem form traits (BQ and ST).The importance of each SNP on the studied traits was evaluated according to their effects and individual regression coefficients (Figure 5).The 950, 800, 900, 450 and 850 SNPs that were used to predict the traits under study through the RRBLUP-B model were located mainly in chromosomes 2 (BQ), 2 (ST), 6 (VOL), 11 (DBH), and 8 (H), respectively.The variability of SNPs effects was higher for BQ and ST, and lower for H and DBH.The SNP effects for H and DBH yielded a more homogeneous distribution than the other studied traits, while some SNPs generated an effect at least 75% above the mean of SNP effects for BQ and ST.The importance of each SNP on the studied traits was evaluated according to their effects and individual regression coefficients (Figure 5).The 950, 800, 900, 450 and 850 SNPs that were used to predict the traits under study through the RRBLUP-B model were located mainly in chromosomes 2 (BQ), 2 (ST), 6 (VOL), 11 (DBH), and 8 (H), respectively.The variability of SNPs effects was higher for BQ and ST, and lower for H and DBH.The SNP effects for H and DBH yielded a more homogeneous distribution than the other studied traits, while some SNPs generated an effect at least 75% above the mean of SNP effects for BQ and ST.

Discussion
The study population presented a more extended LD than other populations of E. globulus previously studied.For example, Durán et al. [37] found that LD decays within 3000 bp in a clonal population of the species.In contrast, Thavamanikumar et al. [67] found that the LD decays rapidly

Discussion
The study population presented a more extended LD than other populations of E. globulus previously studied.For example, Durán et al. [37] found that LD decays within 3000 bp in a clonal population of the species.In contrast, Thavamanikumar et al. [67] found that the LD decays rapidly within 1000 bp in natural populations of E. globulus.The high LD found in the present study may be explained by the fact that most of the families included in the studied population (62/65) were derived from controlled crosses (from an incomplete factorial mating design).The genomic relationship matrix showed that several individuals were tightly related with others; as revealed by the values of relatedness coefficients (estimated between 0.2-0.3),which reflect the mating design implemented in this population of E. globulus.The LD pattern is one of the main factors that contribute to the success of breeding programs that involve GS [27,68,69].The greater extent of LD in a population results in fewer markers that are needed to find QTLs associated with phenotypic traits, and a more accurate genomic prediction can be obtained.According to Liu et al. [70], populations that present extensive LD in long genomic distances allow for predictions with higher stability.In this context, genomic prediction models may benefit from the genetic structure of the studied population in such a way that the high LD values in their chromosomes could contribute in obtaining models with a relatively high predictive ability.
The results of the present study revealed that the reduction in dimensionality and selection of variables could contribute to obtain predictive models with greater predictive accuracy.Several studies have shown that a subset of selected variables can reasonably increase PA and the reliability of BV estimates predicted by GS [40,71,72].For example, Long et al. [71], Usai et al. [72] and Weigel et al. [73] found that the maximum accuracy value of a predictive model was obtained by using only 1% to 3% of the total variables.In agreement with present results, Liu et al. [70] reported that closer genetic relationships between the training and validation populations as well as relatively high LD result in the need for fewer markers to achieve greater predictive accuracy.Moreover, the genetic architecture that underlies the study trait also determines the most appropriate method for predicting BV [74,75].Growth-related traits, such as H, DBH and VOL, have been described as less heritable (h 2 < 0.3; in narrow-sense) than other complex traits in E. globulus (e.g., wood quality and chemical properties; h 2 > 0.5 [76,77]), which explain the small number of genomic regions or QTLs detected for these traits, in contrast to other traits.
The predictive ability for growth-related traits was maximized by dimension reduction and variable selection methods, while all tested Bayesian methods performed similarly, with the lowest PA values for these traits.Bayes-B and BLASSO are also considered to be methods of variable selection, in which the predictor variables with an effect value closer to zero are removed from the prediction model [59].Unexpectedly, the Bayes-A method was slightly superior to the previous methods, suggesting that a large number of markers have an effect on the study trait (similar to RRBLUP).In general, the BLASSO and Bayes-B methods have been suggested for the prediction of traits that are controlled by a small number of QTLs and relatively large effects [78,79], while Bayes-A method exploits the prior knowledge that many SNPs have small individual effects on the trait [80].For the present study, the BLASSO and Bayes-B methods would not be adequate to predict breeding values of DBH, H and VOL because these traits are better represented by an infinitesimal model where the effects are relatively small and homogeneous.On the other hand, the predictive ability of DBH, VOL and H obtained by the PCR method was superior to the Bayesian methods.The PCR method develops the components and/or latent variables identifying linear combinations that present a correlation degree [41], which would explain the response variable considerably.Therefore, these methods can deal with the multicollinearity effects of the predictor variables.Based on the LD pattern results of the study population, approximately 3% of the predictor variables were highly correlated, which may indicate that the main components consisted of SNPs found in LD because their frequencies in the population would be correlated.Notably, the SNPs that play a relatively important role for growth-related traits (RRBLUP-B) were located on all 11 chromosomes, which was consistent with the fact that QTLs detected for growth traits, such as DBH and H of Eucalyptus, have previously been identified in different linkage groups [81,82].For instance, Bundock et al. [83], Thumma et al. [82] and Gion et al. [84] identified QTLs for DBH and H on chromosomes 2, 5, 6, 8 and 11 in Eucalyptus trees of 4 and 5 years of age, which is consistent with the findings of the present study.In addition, when we ranked the top ten SNPs explaining the phenotypic variation of each trait, three SNPs performed a high effect on DBH and VOL, which is consistent with previous studies of QTL for growth traits in Eucalyptus (e.g., Arriagada et al. [13]).These background and results confirm that the methods involving dimensionality reduction and variable selection (RRBLUP-B, supervised PCR) perform a better predictive ability than the PCR method.
The PA for BQ and ST traits was significantly increased with the use of predictive methods based on dimension reduction and variable selection (RRBLUP-B, supervised PCR) as well as Bayes-B and BLASSO.Bayes-B and BLASSO methods assume that the effects of some SNPs on phenotypic variation tend to be zero; therefore, these SNPs could be removed from regression model [27].The effects of the markers (expressed in absolute value) of BQ and ST were highly variable, indicating that the variation of these traits may be explained by a combination of QTLs of greater effects and others of lesser effects.Further, the BLASSO and Bayes-B methods were slightly superior to RRBLUP and Bayes-A, supporting the previous hypothesis of that some SNPs with an effect value closer to zero are not important to explain the phenotypic variation of these traits, and they can be removed from the prediction model.To date, no study on Eucalyptus species has reported QTLs associated with BQ.However, in rice [85] and, more recently, in peach [86], the angle of branches and spikes has been found to be controlled by a QTL of greater effect known as TAC1, which provides an estimate of the possible oligogenic inheritance of BQ.Regarding ST, a limited number of QTL studies associated with this trait have been reported for Eucalyptus [13,14].In agreement with the present findings, Arriagada et al. [13] in a breeding population of Eucalyptus cladocalyx, reported microsatellite markers associated with ST, which were located on chromosome 11.
The results of the genomic prediction of the present study were comparable to previous reports in Eucalyptus species.For instance, in hybrids of E. gradis Hill ex Maiden, E. urophylla S.T. Blake and E. globulus of 3-4 years of age, Resende et al. [30] obtained PA values higher than 0.5 in growth traits using the RRBLUP method.However, the present results involved a smaller number of predictor variables to obtain maximum accuracy.In other Eucalyptus species, Müller et al. [87] obtained a PA for growth-related traits (DBH and VOL) lower than 0.5 in open pollinated families of E. benthamii Maiden & Cambage and E. pellita F. Muell.Regarding other traits relevant to wood production, such as BQ and ST, Isik et al. [88] in another species of interest for the forestry industry (Pinus pinaster Aiton), reported a PA for stem sweep over 0.55, and they obtained the maximum predictive accuracy by non-Bayesian methods.
According to the present results, the methods based on dimensionality reduction in the matrix of predictor variables (i.e., PCR) performed better than traditional prediction methods, such as RRBLUP and Bayesian methods.An advantage of PCR is that no assumptions of prior distribution of marker affects are made [42].This quality allows PCR to have broad applications in the prediction of phenotypic traits with different genetic architectures.Furthermore, supervised methods tend to be computationally more efficient than unsupervised methods [44] and allow the removal of variables correlated with other predictor variables, but they are irrelevant to explain the variation of the variable response.Prediction models based on variable reduction have been widely used in animal genomic prediction, but their use in plants is scarce [38,89].Therefore, taking advantage of prediction models that involve dimensionality reduction and variable selection is beneficial.
The results of this study provide new knowledge to assist breeding programs of E. globulus based on growth and wood form traits.We evaluated a progeny trial established under unusual conditions for E. globulus to generate accurate genomic prediction models.In fact, the southern limit of E. globulus in Chile occurs close to the study location, at the central part of the Chilean administrative region of Los Lagos; a region that account for only 4% of the total E. globulus plantations in Chile.In the global context, several regions worldwide will be affected by the climate change, and therefore, it is expected a significant reduction of temperate forests [90,91].The diversification of plantation areas could represent new opportunities for forest planting that can benefit the local economy.

Figure 1 .
Figure 1.Distribution of pedigree (a) and genetic relationship values (b) for the breeding population studied.

Figure 1 .
Figure 1.Distribution of pedigree (a) and genetic relationship values (b) for the breeding population studied.

Figure 2 .
Figure 2. Genome-wide linkage disequilibrium (LD) decay pattern for the study population of Eucalyptus globulus.Labill.Pairwise LD values expressed by r 2 ; LD (R 2 ), as adjusted according to Hill and Weir[53] (curve colored by black).Distance (bp) is the distance between SNP pairs.The LD threshold of r 2 = 0.1 is indicated with a dotted line.

Figure 3 .
Figure 3. Box plot of adjusted means for the predictive ability (PA) of seven GS methods for branching quality (BQ), stem straightness (ST), wood volume (VOL), diameter at breast height (DBH) and tree height (H).

Figure 3 .
Figure 3. Box plot of adjusted means for the predictive ability (PA) of seven GS methods for branching quality (BQ), stem straightness (ST), wood volume (VOL), diameter at breast height (DBH) and tree (H).

Table 1 .
Site conditions of La Poza, municipality of Purranque, Chilean administrative region of Los Lagos.

Table 2 .
Restricted Maximum Likelihood (REML) estimates of variance components and narrow-sense heritability (standard error in parenthesis) for stem straightness (ST), branching quality (BQ), total tree height (H), diameter at breast height (DBH) and stem volume (VOL), evaluated in full-and half-sib families of Eucalyptus globulus Labill.REML: Restricted Maximum Likelihood, CVa: coefficients of additive genetic variation.

Table 2 .
Restricted Maximum Likelihood (REML) estimates of variance components and narrowsense heritability (standard error in parenthesis) for stem straightness (ST), branching quality (BQ), total tree height (H), diameter at breast height (DBH) and stem volume (VOL), evaluated in full-and half-sib families of Eucalyptus globulus Labill.REML: Restricted Maximum Likelihood, CVa: coefficients of additive genetic variation.

Table 4 .
Predictive ability of seven prediction methods (with cross-validation) for branching quality (BQ), stem straightness (ST), wood volume (VOL), diameter at breast height (DBH) and tree height (H) in Eucalyptus globulus families, evaluated under oceanic climate conditions.NPV corresponds to number of predictor variables, SNPs or Components (PCR and Supervised PCR (S PCR)) used.

Table 4 .
Predictive ability of seven prediction methods (with cross-validation) for branching quality (BQ), stem straightness (ST), wood volume (VOL), diameter at breast height (DBH) and tree height (H) in Eucalyptus globulus families, evaluated under oceanic climate conditions.NPV corresponds to number of predictor variables, SNPs or Components (PCR and Supervised PCR (S PCR)) used.