Association of Allelic Variation in PtoXET16A with Growth and Wood Properties in Populus tomentosa

Xyloglucan endo-transglycosylases (XETs) modify the xyloglucan-cellulose framework of plant cell walls and, thus, affect cell wall expansion and strength. Dissecting the mechanism by which natural variation in XETs affects wood properties can inform breeding efforts to improve wood quality and yield traits. To this end, we isolated a full-length PtoXET16A cDNA clone from Populus tomentosa. Real-time PCR analysis showed that PtoXET16A was maximally expressed in the root, followed by phloem, cambium, and developing xylem, suggesting that PtoXET16A plays important roles in the development of vascular tissues. Nucleotide diversity and linkage disequilibrium analysis revealed that PtoXET16A has high single nucleotide polymorphism (SNP) diversity (π = 0.01266 and θw = 0.01392) and low linkage disequilibrium (r2 ≥ 0.1, within 900 bp). SNP- and haplotype-based association analyses of 426 individuals from a natural population indicated that nine SNPs (including two non-synonymous markers and one splicing variant) (p ≤ 0.05, false discovery rate Q ≤ 0.01), and nine haplotypes (p ≤ 0.05) were significantly associated with growth and wood properties, each explaining from 3.40%–10.95% of phenotypic variance. This work shows that examination of allelic variation and linkage disequilibrium by a candidate-gene-based approach can help to decipher the genetic basis of wood formation. Moreover, the SNP markers identified in this study can potentially be applied for marker-assisted selection to improve growth and wood-property traits in Populus.

as either wall-loosening or wall-strengthening agents, depending on the isoform, but no isoform-specific properties have been identified that could explain the differences in their effects [21][22][23][24].
Wood quality and yield are key traits for cultivation of trees and many wood-and yield-related traits vary quantitatively; to improve these traits, researchers have shown tremendous interest in using association mapping to identify the genes responsible for this quantitative variation. Association mapping identifies quantitative trait loci by examining the marker-trait associations that can be attributed to the strength of linkage disequilibrium (LD) between markers and functional polymorphisms across a set of diverse germplasm. Combined with exploiting natural diversity and development of robust statistical analysis methods, association mapping is becoming one of the main methods for dissecting the genetic architecture of key traits and has been widely used in forest tree species [25][26][27]. Of these tree species, Populus tomentosa, as one of the main commercial tree species used for timber production in China, plays an indispensable role in ecological and environmental protection along the Yellow River. Therefore, association studies of single nucleotide polymorphisms (SNPs) associated with growth and wood properties of P. tomentosa are essential to detect functional allelic variation for marker-assisted selection in breeding programs that aim to improve the quality and quantity of wood products in P. tomentosa.
Here, we used a candidate gene based approach to examine genetic variation in the P. tomentosa XET homolog PtoXET16A. We used a combination of single-marker-and haplotype-based association methods in an association population to identify several associations underlying natural variation of complex growth and wood properties. Our study provides a necessary foundation for improving the quantity and quality of wood by breeding in the tree crop species P. tomentosa.

Isolation of PtoXET16A from P. tomentosa
A full-length cDNA encoding a XET16A-like protein was isolated from a cDNA library prepared from mature xylem zone of P. tomentosa. The cDNA clone (GenBank Accession No. KM267530) is 1141 bp in length, and contains a full-length open reading frame (870 bp), encoding a polypeptide of 290 amino acids with an estimated molecular mass of 33.70 kD and a pI of 7.62 (http://web.expasy.org/protparam/), flanked by 146 bp of 5' untranslated region (5'UTR) and 122 bp of 3'UTR ( Figure 1). Alignment of the full-length cDNA sequence to the genomic sequence showed that PtoXET16A has three introns and four exons ( Figure 1). Identification of protein domains, families and functional sites by matches to the Prosite database (http://prosite.expasy.org/prosite.html) and analysis of the protein sequence for Pfam matches (http://pfam.sanger.ac.uk/) showed that the predicted protein has the active site of glycosyl hydrolase family 16 EIDFEFLGNRT (at residues 107-117) ( Figure 1) and an XET C-terminal sequence in the fourth exon (at residues 234-284) ( Figure 1).
The molecular phylogeny of XTH gene products includes three major branches (I/II, IIIA and IIIB) ( Figure 2). Of these, the largest cluster confirmed previous studies that suggested merging groups I and II. This analysis indicates that PtoXET16A belongs to group I. A BLASTP search with PtoXET16A as the query sequence revealed that the PtoXET16A protein shares 98% identity with PttXET16-34 (AAN87142), 79% identity with AtXTH5 (AT5G13870) and 76% with OsXTH2 (Os11g0539200) ( Figure 2, Table S1). The alignment shows that PtoXET16A lacks four amino acids (YIIV) that are present in the XET16As from other species. The tertiary structure predicted using Swissmodel (http://swissmodel.expasy.org/), showed that PtoXET16A and PttXET16-34 have similar structures. However, the amino acids missing in PtoXET16A but present in PttXET16-34 did produce a structural difference in one region ( Figure 2).

Analysis of PtoXET16A Expression
We determined to what extent PtoXET16A exhibits tissue-specific expression in P. tomentosa. Levels of PtoXET16A mRNA in various poplar tissues, including apical meristem, root, phloem, cambium, developing xylem, mature xylem, young leaf and mature leaf, were measured by quantitative real time-PCR (RT-PCR) with gene-specific primers and Actin as an internal control ( Figure 3a). PtoXET16A mRNA was the most abundant in root (5.033 ± 0.012), followed by phloem (1.573 ± 0.002), cambium (1.471 ± 0.009), and developing xylem (1.392 ± 0.006). In contrast, relatively lower abundances of PtoXET16A mRNA were detected in mature leaf (0.647 ± 0.013), young leaf (0.637 ± 0.002) and mature xylem (0.530 ± 0.016). These observations indicated that PtoXET16A shows preferential expression in vascular tissues, suggesting that PtoXET16A plays an important role in wood formation.
We further tested whether hormone treatments induced PtoXET16A expression, testing abscisic acid (ABA), indoleacetic acid (IAA), GA, and naphthylacetic acetic acid (NAA) (Figure 3b) in P. tomentosa. The results revealed that the expression of PtoXET16A was induced by treatment with most plant hormones, except for IAA (Figure 3b). Of these treatments, the expression level of PtoXET16A following GA treatment was more than four times higher than the controls, indicating that the expression of PtoXET16A could be strongly regulated by GA.

Figure 2.
A rooted phylogenetic tree and three-dimensional structures of XTH gene products. (a) A rooted phylogenetic tree of PtoXET16A and other predicted products of XTH genes. The similarity to other XTH gene products was calculated using the UPGMA program. Full-length protein sequences were used for the comparison and the gene models used are listed in Table S1. The phylogenetic tree presents predicted protein sequences for the XTH family of P. trichocarpa, numbered according to Geisler-Lee et al. [9], Arabidopsis thaliana XTH proteins, numbered according to Yokoyama and Nishitani [7], and Oryza sativa XTH gene products, numbered according to Yokoyama et al. [8]; (b) Three-dimensional structures of PtoXET16A constructed using Swissmodel (http://swissmodel.expasy.org/); (c) Three-dimensional structures of PttXET16-34, constructed using Swissmodel (http://swissmodel.expasy.org/). The polypeptide chain is colored from blue (N terminus) to red (C terminus). The red circle shows the location of four missing amino acids (YIIV) compared with PttXET16-34.
We further tested whether PtoXET16A was inducible by different abiotic stresses (Figure 3c). Similar expression patterns were observed in plants exposed to freezing, heat and high-salinity stresses, in which the relative PtoXET16A mRNA levels gradually increased over the course of the stress treatment (Figure 3c). Compared with the control, the relative expression of PtoXET16A was repressed in freezing and high-salinity stresses; conversely, PtoXET16A expression was significantly induced in heat and drought conditions. When the plants recovered, PtoXET16A expression returned to the level of the control (Figure 3c). We also found that the strongest relative expression of PtoXET16A was in response to high-temperature stress (Figure 3c). Under drought conditions, the highest expression was observed at 10% soil water content (4.260 ± 0.011), followed by 30% soil water content (2.171 ± 0.007) (Figure 3c). These results indicated that PtoXET16A expression is sensitive to heat and drought stimuli. (c) Expression analysis in P. tomentosa of PtoXET16A in response to abiotic stresses; CK, control check; Re, recovered condition; F, freezing stress; HT, high-temperature stress; D, drought condition; S, high-salinity stress. Samples were exposed to 150 mM NaCl, 4 °C and 42 °C for 3, 6, and 24 h for high-salinity, freezing stress and high temperature stress treatments, respectively. Drought condition was induced by withholding soil water content to 30%, 20%, and 10% of their original content at room temperature. The treated plants were then transferred to pots under normal growing conditions for 24 h to recover from cold, heat, drought and high salinity, which were denoted as Re-F, Re-HT, Re-D, Re-S, respectively. As control, samples without treatments were used.

SNP Diversity and Genotyping
To identify SNPs in PtoXET16A, the approximately 2266 bp genomic region of PtoXET16A was amplified and sequenced from 43 unrelated individuals, representing almost the entire natural range of P. tomentosa. Table 1 summarizes the statistical analysis of nucleotide polymorphisms over different regions of PtoXET16A. Across the samples, 134 SNPs were detected in the whole gene at a frequency of approximately one SNP every 17 bp (Table 1). Forty-three of these SNPs occurred in exons, and included 18 missense and 25 nonsense mutations ( Table 1). All together, 49 of 134 SNPs (38.1%) were considered as common (frequency > 0.10). In general, the PtoXET16A locus has high nucleotide diversity (π), with π = 0.01266 and θw = 0.01392, respectively (Table 1). More specifically, estimates of nucleotide diversity, π, for the different gene regions ranged from 0.00239 (intron 2) to 0.02461 (intron 1), and θw varied between 0.00142 (intron 2) and 0.01985 (intron 1). Within coding regions, the non-synonymous nucleotide substitution rate (πnonsyn) was markedly lower than πsyn, with a πnonsyn/πsyn ratio of 0.2554 < 1.0, suggesting that diversity at the synonymous sites of exon regions resulted from strong purifying selection ( Table 1). The 49 common SNPs were successfully genotyped across 426 trees in the association population by using locked nucleic acid technology.
Genetic differentiation within and among three geographically independent climatic regions was examined using the nucleotide diversity data from PtoXET16A ( Table 2). Levels of nucleotide variation (measured using π) in the three climatic regions varied, but showed similar patterns of πtot, πsil, πs and πn ( Table 2). These observations suggested that the level of selective constraint was similar between the three climatic regions. Tajima's D was positive in the southern, northeastern and northwestern climatic regions but negative in the P. tomentosa population as a whole; however, no significant departures from the neutral expectation were observed ( Table 2). The Fu and Li's D statistical tests were positive for the northeastern and northwestern populations, but were negative for the southern region and the P. tomentosa population as a whole, revealing the existence of an excess of low-frequency mutations for this gene region in the P. tomentosa species-wide samples ( Table 2).  N, number of sequences sampled; π tot , average nucleotide diversity in full gene; π sil , average nucleotide diversity in synonymous and noncoding sites; π s , average nucleotide diversity of synonymous mutation; π n , average nucleotide diversity of non-synonymous mutation.

Linkage Disequilibrium and Phenotype-Genotype Associations
The nonlinear regression shows a clear and rapid decline of LD with distance in base pairs within PtoXET16A (r 2 ≥ 0.1, within 900 bp), indicating that LD of the SNP loci did not extend over the entire gene region ( Figure 4). Within-group analyses of LD showed a similar decline in samples from the southern region, with the r 2 values declining to 0.1 within 900 bp. Nevertheless, we observed a higher level of LD within samples from the northeastern and northwestern regions, with the r 2 values declining to 0.1 within approximately 1700 bp. These results revealed that the northeastern and northwestern regions seem to have experienced similar histories and the southern region had a higher evolutionary rate. Associations between 30 SNPs and 10 growth and wood quality traits were tested by using the mixed linear model (MLM) in TASSEL version 2.1 (Buckler lab, New York, NY, USA, 2010). The MLM identified 37 significant markers (p < 0.05), but correction for false discovery rate (FDR) (FDR < 0.05) reduced this to 13. These associations were identified in the exon, intron, and 3'UTR regions of PtoXET16A (Table 3).

Single-SNP-Based Association
In total, twelve associations representing nine significant SNPs were identified (Table 3). Of these, SNP15, a missense mutation in exon 3, resulted in an encoded amino acid change from Tyr to His, and was significantly associated with two traits, including diameter at breast height and stem volume (Table 3). SNP6, another missense mutation in exon 1, resulted in an encoded amino acid change from Ala to Pro, was significantly associated with lignin content and explained 10.95% of the phenotypic variance (Table 3). SNP21 in intron 3, a synonymous mutation, was found to be associated with fiber length and microfiber angle, and SNP29 from the 3'UTR showed genetic association with fiber length and lignin content (Table 3). Markedly, SNP16, one splice variation in the second base of intron 3, resulted in splice junction (splice donor) from GT to GC, was significantly associated with lignin content and explained 5.37% of the phenotypic variance. All together, these SNP loci explained a small proportion of the phenotypic variance, with the individual effects ranging from 3.40% to 10.95% (Table 3). These small SNP effects are in accordance with polygenic quantitative models of complex traits.
Most of the associations were consistent with modes of gene action other than codominance ( Table 4). Five of the 12 marker-trait pairs for which dominance and additive effects could be calculated were consistent with over-or underdominance (|d/a| > 1.25). For example, heterozygotes for SNP6 had higher lignin content, on average, than either homozygote class (18.09% for GG, 21.02% for GC, 17.92% for CC) (Figure 5a). The remaining seven marker-trait pairs were split between modes of gene action that were partially to fully dominant (0.50 < |d/a| < 1.25) (Figure 5a), such as SNP14 (20.96 for CC, 17.59 for CT, 17.26 for TT in microfiber angle) or codominant (|d/a| ≤ 0.5), such as SNP15 (22.14 for CC, 21.44 for CT, 21.14 for TT in diameter at breast height) (Figure 5a).
where G is the overall trait mean, G ij is the trait mean in the ijth genotypic class and p i is the frequency of the ith marker allele. These values were always calculated with respect to the minor allele.

Haplotype-Based Associations
Haplotype analysis by Haploview, using genotype data for 30 SNPs from 426 individuals in the association population, showed four distinct haplotype blocks within PtoXET16A ( Figure 6). We used a haplotype trend regression test [28] to identify the haplotypes significantly associated with the 10 growth and wood quality traits. Three common haplotypes (allele frequency > 5%, p-value < 0.05) were observed with significant effect on these traits. These haplotypes span exon 1, intron 3, exon 4 and the 3'UTR ( Figure 6). The proportion of phenotypic variation explained by these haplotypes ranged from 1.63% to 10.46% (Table 5). Among them, three haplotypes from SNP5-6 and SNP16-19 were associated with diameter at breast height, with the individual effects ranging from 1.63% to 2.71%. In addition, association between the three haplotypes and stem volume were observed, which explained 3.05%-4.41% of the phenotypic variance (Table 5). Of these, two haplotypes from SNP5-6 were associated with lignin content and one haplotype from SNP27-30 was associated with microfiber angle (Table 5).

Figure 5.
Genotypic effect on SNP markers in the association population (a) Three modes of gene action quantified using the ratio of dominant to additive effects estimated from least-square means for each genotypic class. Three modes of gene action were observed as over-dominance (SNP6), full dominance (SNP14) and co-dominance (SNP15); (b) Genotypic effect on SNP6 with lignin content in the discovery population and three subsets (Subset A, 170 individuals from the southern region; Subset B, 91 individuals from the northwestern region; Subset C, 165 from the northeastern region).

Validation of Association Testing
Three smaller subsets derived from the association population were used to validate the significant single-SNP associations identified in the association population ( Table 6). Eight of the twelve significant marker-trait associations were validated in at least one of the three smaller subsets. In total, six SNP markers (SNP6, SNP14, SNP15, SNP16, SNP21, and SNP29) were significantly associated with five traits, including fiber length, lignin content, D, V and MFA at the threshold of p < 0.05. The proportion of phenotypic variation explained by the six SNP markers varied from 2.39% to 13.07% ( Table 6). Associations of SNP22, SNP23 and SNP27 with fiber length/width were not validated in any of the three smaller subsets. The failure to validate these three significant associations in this study may arise from the complexity of quantitative traits, the small sample size or other factors.

Transcript Analysis of SNP Genotypes
To identify whether these significant associations affect gene expression at the mRNA level, we validated SNP associations via gene expression analyses. Transcript levels among the different genotypic classes for nine significantly associated SNPs (Table 3) were compared by RT-PCR with gene-specific primers. The assays used secondary xylem from the 20-year-old trees to quantify the mRNA levels in 30 trees (including almost all genotypes). Measurement of different transcript abundance across three (using analysis of variance, ANOVA) or two genotypic classes (using t-test) indicated that two markers (SNP16 and SNP29) showed significant differences in the RNA transcript levels in the association population. For the marker SNP16 (intron 3, IVS3 + 2T>C), the higher abundance of the mRNA (1.4922 ± 0.4271) was found in the TT group, and the transcript level of the CT group was 0.9175 ± 0.0828 (Figure 7). In examining genotype-specific transcript levels for SNP29 (3'UTR), the heterozygous trees (1.1116 ± 0.1042 in GC group) for this marker showed higher relative mRNA abundance than the homozygous trees (0.7390 ± 0.1389 in GG group) (Figure 7).

Nucleotide Diversity and LD in PtoXET16A
SNP-based association mapping requires a comprehensive investigation of the patterns of SNP distribution and frequency within the full-length PtoXET16A locus in a natural population of P. tomentosa. Within coding regions, the πnonsyn:πsyn ratio (0.2554) was significantly less than 1 for PtoXET16A, as commonly observed for other genes in natural populations of forest trees [29]. Synonymous mutations occurring during evolution may be fixed with a higher probability than neutral ones because of purifying selection. Interestingly, levels of average nucleotide diversity in noncoding regions were lower than in the coding regions (Table 1) due to conservation of intron 2, in which just one SNP locus (SNP14) was observed. Some introns experience higher selective constraints than synonymous coding sites, possibly because they harbor key regions for regulation of expression [30]. This suggests that these conserved intron sequences may affect gene regulation [31]. In this study, a significantly higher frequency of polymorphisms was found in intron 1 than in the other regions in PtoXET16A (π = 0.02461) ( Table 1). This finding suggests that this noncoding region may be relatively unstable and a "hotspot" for genetic change [32].
LD describes a key aspect of genetic variation in natural populations. We examined LD across tree climatic regions and observed a rapid decay of LD within just a few hundred base pairs, indicating that association genetics has the potential to identify the genes responsible for variation in these traits. Previous studies in P. tremula (five genes) [33], P. nigra (nine genes) [34] and P. trichocarpa (39 genes) [35] showed a similarly rapid decay of LD. In this study, the level of LD decay in PtoXET16A was analyzed separately within each of the three climatic regions and for the complete natural population (Figure 4). The results showed that the northeastern region had higher LD than the northwestern and southern regions, consistent with the higher frequency of exclusive SNPs observed in this region (Table 2). Our results show that the LD in PtoXET16A within three climatic regions may be more extensive than the LD found in our range-wide P. tomentosa samples (Figure 4), consistent with previous studies [36,37]. However, a recent genome-wide study of the extensive LD in P. trichocarpa (r 2 > 0.2, within 3-6 kb) suggests that genome-wide association studies and genomic selection in natural populations may be more feasible in Populus than previously assumed [38]. Therefore, our future work will focus on estimation of LD decay with greater genomic coverage and exploration of the variability of haplotype structure across the entire genome. Such studies will also help to elucidate how Populus managed to adapt to a wide variety of environmental conditions [39].

The Putative Functional Roles of XETs
Xyloglucan is incorporated and modified in the cell wall network by various enzymes encoded by XTHs, and forty-one XTH gene models can be found in the P. trichocarpa genome. Genome sequencing projects have revealed the presence of XTHs in various plant species and their transcription in various plant tissues [10,40,41]. For example, PtXTH10 and PtXTH24 are expressed in shoot tips, young leaves, mature leaves, and roots [42]. OsXTH8 is expressed in rice leaf sheath, root, leaf blade and callus [18]. AtXTH4 and AtXTH27 are highly and ubiquitously expressed in most organs including leaf, stem, flower and silique [7]. We found PtoXET16A expression in most organs and tissues, with the most abundant expression in root, followed by phloem, cambium and developing xylem (Figure 3a). These results are consistent with studies of its ortholog AtXTH5, which exhibits root-specific expression [7]. These results also support the finding that XTH genes expressed in the cambial region are also strongly expressed in the developing phloem [5].
The existence of at least 16 XET genes in Populus suggests that individual XETs may exhibit distinct patterns of expression in different developmental stages and in response to hormonal and environmental stimuli [5,15,40]. The expression of PtXTH22 shows up-regulation in response to various hormones including 6-benzylaminopurine, IAA, salicylic acid, GA, brassinosteroids, jasmonic acid, and ABA [42]. In our study, the expression of PtoXET16A was up-regulated by most of the plant hormone treatments, except for IAA (Figure 3b). Of these, PtoXET16A was most strongly induced by GA, to four times higher mRNA levels than the controls (Figure 3b). Transcriptional up-regulation of XTH expression by GA has been demonstrated in several instances [43]. For example, OsXTH8 expression increased in a dose-and time-dependent manner with GA3 treatment [18]. In P. tomentosa (Figure 3b), similar expression patterns were also observed, indicating a general interaction between XETs and GA. Products of XET-encoding genes may be responsible for the resulting cell wall changes in response to plant hormones. As signals that mediate plant systemic responses [44], interactions between plant hormones also modulate the expression of stress-responsive genes [45]. Hence, it is expected that PtoXET16A is up-regulated in response to abiotic stresses in P. tomentosa (Figure 3c). The relative expression levels increased gradually over time in response to freezing, heat and high-salinity stresses (Figure 3c). After recovery, PtoXET16A expression returned to the level of control plants (Figure 3c), suggesting that PtoXET16A maintained a certain level of expression under all four stimuli. Because the cell wall is a primary determinant of cell and organ shape, alterations in morphogenesis would most likely require cell wall modification. Since XETs modify a major component of the plant cell wall, they may have important functions in altering wall properties in response to environmental conditions. Moreover, the level was significantly up-regulated under heat and drought conditions, indicating that PtoXET16A expression is sensitive to heat and drought stimuli [19,46]. XETs have been detected in plants responding to stresses including flooding [47], salt tolerance [46] and others.

Dissecting Allelic Polymorphisms Underlying Growth and Wood Properties
XETs encode key enzymes acting in the formation and modification of the carbohydrate matrix of wood cell walls, justifying selection of an XET as a targeted candidate gene for examination of wood properties [1]. XETs probably play crucial roles in assimilating the products of photosynthesis into sugars and starch [48], in synthesizing cell wall biopolymers and in creating various glycosylated compounds [9]. Hence, we identified allelic polymorphisms in PtoXET16A, and examined their association with underlying growth and wood properties, using LD-based association in P. tomentosa. Our study identified 12 significant associations, which accounted for a small proportion of the phenotypic variance and were within the range of those published previously for forest trees [26]. Because of the low LD in P. tomentosa (Figure 4), once a marker-trait association has been discovered and validated, it is likely that such a marker is located in close proximity to the causal polymorphism or is the functional variant [49]. In total, eight of the 12 significant marker-trait associations representing six SNP markers were validated in at least one of the three subset populations. These SNPs, thus, provide powerful molecular markers for efficient marker-assisted breeding in P. tomentosa.
Of these, five SNP markers (SNP21, SNP22, SNP23, SNP27, and SNP29) associated with fiber length/width were observed using MLM, each explaining 3.40%-5.70% of the phenotypic variance (Table 3), supporting the idea that XET may have a strong effect on cell expansion in fibers [5]. Fibers, as the most abundant secondary wall-containing cells in the wood of dicot species, are affected by maturation (growth) stresses, particularly in tension wood [50]. One association between SNP15 and diameter at breast height was observed in the association population (Table 3) and validated in the subset populations. SNP15 is a missense mutation in exon 3 and results in an encoded amino acid change from Tyr to His. Many functional analyses of SNPs have examined coding regions in candidate genes related to wood traits, identifying SNPs that can alter proteins. These SNPs may have major effects on protein function, and thus on plant phenotype. Our study is supported by an analysis of PtxtXET16-34 overexpression, in which xyloglucan distribution was higher in radial walls than in tangential walls, suggesting that XETs affect the expansion of radial walls especially during meristematic stage [5].
Intuitively, haplotypes may be more powerful than individual, non-ordered markers [35]. Of the markers identified in this study, SNP6, a missense mutation in exon 1, was significantly associated with lignin content and two haplotypes around SNP6 were also associated with lignin content traits (Table 3) suggesting that SNP6 may be a functional polymorphism that is in or near a locus that affects lignin content. Genotypic effect analysis showed that the trees heterozygous (GC) for this marker showed higher average lignin content than the homozygous trees ( Figure 5), indicating over-dominance. Although there is no experimental evidence supporting a direct interaction, an indirect interaction between lignin and xyloglucan via connections with other wall components is possible. Pattathil et al. [51] suggested that lignin has a central role in overall wall structure, perhaps through direct covalent connections to diverse wall polysaccharides or through strong non-covalent interactions with these polymers. XETs, which are believed to be important regulators of cell wall expansion, specifically cutting the backbone of xyloglucan and re-forming a glycosidic bond with the free end of another xyloglucan chain [52].
Introns in a wide range of organisms including plants, animals, and fungi are able to increase the expression of the gene that they are contained in. In this study, we found a splicing site mutation, SNP16 (IVS3 + 2T>C) in intron3 of PtoXET16A. MLM testing indicated that it was closely associated with lignin content (p < 0.001, FDR < 0.05) ( Table 3), suggesting that this splicing variant contributes to the variation in lignin content. SNPs in introns could affect phenotypic traits because those particular introns may play an important role in regulating gene expression [53]. Previous studies have demonstrated that introns can enhance eukaryotic gene expression and affect gene expression in diverse organisms including plants, insects, mice, and humans [54,55]. Some of this stimulation is due to a splicing-dependent increase in mRNA levels. To know if the expression level of alternatively spliced mRNA varied between the two groups (CT and TT), we used real-time PCR to quantify the mRNA levels in 30 trees, and observed significant differences in the RNA transcript levels in the association population. The mean relative expression levels of mRNA products for TT group and CT group were 1.4922 ± 0.4271 and 0.9175 ± 0.0828, respectively (Figure 7a). This indicated that this mutation affects the expression of PtoXET16A. Interestingly, in the association population, just two genotypes of SNP16 were observed (CT and TT). The number of genotypes was 391 and 34, respectively, suggesting that SNP16 may be a recessive lethal mutant (CC) or trees have a better wood phenotype with the T allele. Two haplotypes from SNP16-19 were associated with diameter at breast height and stem volume, indicating that the haplotypes containing the polymorphisms could cause variation in growth and wood quality traits.

Plant Materials
The association population consisted of 426 unrelated P. tomentosa individuals grown in Guan Xian County, Shandong Province, where root segments of 1047 native individuals collected from the entire natural distribution range of P. tomentosa were used to establish a clonal arboretum in 1982 using a randomized complete block design with three replications [56]. On the basis of principal component analysis and ISODATA fuzzy clustering of 16 meteorological factors [57], the total climatic zone covered by these individuals can be divided into three large climatic regions: southern, northwestern, and northeastern. In the present study, 426 unrelated individuals representing almost the entire geographic distribution of P. tomentosa (170 from the southern region, 91 from the northwestern region, and 165 from the northeastern region) were used for the SNP association analysis. In addition, a panel of 43 unrelated individuals (15 from the southern region, 14 from the northwestern region, and 14 from the northeastern region) was sequenced to identify SNPs within PtoXET16A.

Phenotypic Data
Phenotypic data of ten traits were measured, including lignin content, holocellulose content, alpha-cellulose content, fiber length, fiber width, microfibril angle, tree height, diameter at breast height, volume of wood, and tree height/tree diameter. The data of tree height and diameter at breast height were derived from field surveys in 2011 and used to evaluate the volume of wood. Wood cores were collected from each tree at a height 1.35 m above ground level, in which the variation in microfibril was characterized using an X-ray power diffractometer (Philips, Eindhoven, The Netherlands). These wood cores were then ground into wood meal, in which the holocellulose, α-cellulose, and lignin contents were determined using near-infrared reflectance spectroscopy according to Schimleck et al. [58]. Fiber length and width were measured using the Color CCTV Camera (Panasonic SD, Beijing, China). Furthermore, the SAS for Windows version 8.2 (SAS Institute, Cary, NC, USA) was used for ANOVA and correlation analysis for these phenotypic traits.

Isolation of PtoXET16A cDNA
The P. tomentosa cDNA library from xylem was constructed using the Superscript λ System according to the manufacturer's instructions (Life Technologies, Rockville, MD, USA). The constructed cDNA library consisted of 5.0 × 10 6 pfu with an insert size of 1.0-4.0 kb. Random end sequencing of 5000 cDNA clones and comparison with all available XTH sequences revealed that a full-length cDNA with high similarity to PttXET16-34 was isolated and named PtoXET16A.

RNA Extraction
For RNA extraction, fresh tissue samples of root, leaf, and apex were collected from the 1-year-old P. tomentosa clone LM 50. The wood-forming tissues of upright stems, including developing and mature xylem tissues, were collected by scraping the thin (approximately 1.0 mm) and the deep layer on the exposed xylem surface at breast height; the other wood forming tissues, including phloem and cambium, were collected as described [59]. All tissues were immediately frozen in liquid nitrogen and stored at −80 °C. Total RNA was extracted from the various tissues using the Plant Qiagen RNAeasy kit (Qiagen, Shanghai, China) according to the manufacturer's instructions. Additional on-column DNase digestions were performed three times during the RNA purification, using the RNase-Free DNase Set (Qiagen, Shanghai, China). RNA was then quantified and reverse transcribed into cDNA using the Superscript First-Strand Synthesis system and the supplied poly (T) primers (Invitrogen, Life Technologies, New York, NY, USA) [60].

Phylogenetic Analysis and Three-Dimensional Structures of PtoXET16A
To analyze the phylogenetic relationships between PtoXET16A and the XTHs from other species, the amino acid sequences of XTH family members, including the XTH family of P. trichocarpa, numbered according to Geisler-Lee et al. [9], A. thaliana XTH proteins, numbered according to Yokoyama and Nishitani [5]. O. sativa XTH gene products, numbered according to Yokoyama et al. [7], were identified by searching public databases available at NCBI (http://www.ncbi.nlm.nih.gov). Full-length protein sequences were used for the comparison and the gene models used are listed in Table S1. Phylogenetic analyses were conducted using MEGA version 4.0 (Arizona State University Campus, Phoenix, AZ, USA, 2007), and the neighbor-joining method was used to build phylogenetic trees [61]. Bootstrap analysis was performed using 1000 replicates. Three-dimensional structures of PtoXET16A and PttXET16-34 used Swissmodel (http://swissmodel.expasy.org/).

SNP Discovery and Genotyping
To identify SNPs within PtoXET16A, the entire gene was sequenced and analyzed in 43 unrelated individuals from the association population, without considering locations of insertions/deletions, using the software DnaSP4.90.1 (UB Web, Barcelona, Spain, 2010). All of these 43 sequences have been deposited in the GenBank database (GenBank Accession No. KM267487-KM267529). Thirty common SNPs were genotyped in 426 trees in the association population by amplification using locked nucleic acids [62,63]. Amplification was performed in a final reaction volume of 25 mL containing 20 ng genomic DNA, 0.8 U Taq DNA polymerase (Promega, Beijing, China), 50 ng forward primer, 50 ng reverse primer, 1× PCR buffer (Promega, Beijing, China), and 0.2 mM dNTPs (Promega, Beijing, China). The PCR conditions were: 94 °C for 3 min, 30 cycles of 94 °C denaturation for 30 s, annealing at 56-60 °C (depending on the primers) for 15 s, and extension at 72 °C for 1 min, with a final extension at 72 °C for 5 min.

Linkage Disequilibrium Analysis
To assess the pattern of linkage disequilibrium in the sequenced candidate gene region, the decay of LD with physical distance (base pairs) between SNP sites within PtoXET16A was estimated by linear regression analysis of linkage disequilibrium, using the DnaSP program version 4.90.1 (UB Web, Barcelona, Spain, 2010). The squared correlation of allele frequencies r 2 [64] was used to test the LD between pairs of SNP markers using the software package HAPLOVIEW (http://www.broad.mit.edu/ mpg/haploview.html). The interval value of the parameter varies from 0 to 1. The significance (p-values) of r 2 for each SNP locus was calculated using 100,000 permutations.

Association Testing
All trait-SNP association tests between 49 SNP markers and 10 traits were conducted using MLM with 10 4 permutations in the software package TASSEL, version 2.0.1 (http://www.maizegenetics.net/) [65]. The MLM can be described as follows: y = μ + Qυ + Zu + e, where y is a vector of phenotype observation, μ is a vector of intercepts; υ is a vector of population effects; u is a vector of random polygene background effects; e is a vector of random experimental errors; Q is a matrix defining the population structure from STRUCTURE; and Z is a matrix relating y to u. Var (u) = G = σ 2 aκ with σ 2 a as the unknown additive genetic variance and κ as the kinship matrix. In this Q + K model, the relative kinship matrix (K) was obtained using the method proposed by Ritland [66], which is built into the program SPAGeDi, Version 1.2 [67], and the population structure matrix (Q) was identified based on the significant subpopulations (K = 11) [68], as assessed according to the statistical model described by Evanno et al. [69], using 20 neutral genomic simple sequence repeat markers. The positive FDR method was applied to correct for multiple testing by using QVALUE software (University of Washington, Washington, DC, USA) [70].

Haplotype Analysis
Haplotype frequencies from genotype data were estimated and haplotype association tests were done on a three-marker sliding window, using haplotype trend regression software. The significances of the haplotype associations were based on 1000 permutation tests. The modes of gene action were quantified using the ratio of dominant to additive effects estimated from least-square means for each genotypic class. Partial or complete dominance was defined as values in the range 0.50 < |d/a| < 1.25, whereas additive effects were defined as values in the range |d/a| ≤ 0.50. Values of |d/a| ≥ 1.25 were equated with under-or over-dominance. Details of the algorithm and formulas for calculating gene action were previously described [31,35].

Conclusions
In this study, we used association analysis to examine the relationship between polymorphisms in PtoXET16A and wood quality and yield in a P. tomentosa association population. Nucleotide diversity and LD patterns of PtoXET16A suggested differences between the geographical subsets P. tomentosa (Figure 4). We observed that nine SNPs, including two non-synonymous markers and one splicing variation, showed significant associations with growth and wood properties ( Table 3). These SNPs explained a small proportion of the phenotypic variance, from 3.40% to 10.95%, consistent with a polygenic quantitative model. Haplotype analysis also indicated that the haplotypes containing the polymorphisms could cause variation in growth and wood quality traits. We also found that PtoXET16A is highly expressed in vascular tissues. Our study of PtoXET16A expression and genetic variation suggested that this XET may affect the expansion of radial walls. The SNP markers identified in this study can be used for marker-assisted selection to improve growth and wood-property traits in P. tomentosa.   [8]; c P. trichocarpa XTH gene products, numbered according to Geisler-Lee et al. [9].