Genome Wide Association Study Identiﬁes Candidate Genes Related to the Earlywood Tracheid Properties in Picea crassifolia Kom.

: Picea crassifolia Kom. is one of the timber and ecological conifers in China and its wood tracheid traits directly affect wood formation and adaptability under harsh environment. Molecular studies on P. crassifolia remain inadequate because relatively few genes have been associated with these traits. To identify markers and candidate genes that can potentially be used for genetic improvement of wood tracheid traits, we examined 106 clones of P. crassifolia , and investigated phenotypic data for 14 wood tracheid traits before speciﬁc-locus ampliﬁed fragment sequencing (SLAF-seq) was employed to perform a genome wide association study (GWAS). Subsequently, the results were used to screen single nucleotide polymorphism (SNP) loci and candidate genes that exhibited a signiﬁcant correlation with the studied traits. We developed 4,058,883 SLAF-tags and 12,275,765 SNP loci, and our analyses identiﬁed a total of 96 SNP loci that showed signiﬁcant correlations with three earlywood tracheid traits using a mixed linear model (MLM). Next, candidate genes were screened in the 100 kb zone (50 kb upstream, 50 kb downstream) of each of the SNP loci, whereby 67 candidate genes were obtained in earlywood tracheid traits, including 34 genes of known function and 33 genes of unknown function. We provide the most signiﬁcant SNP for each trait-locus combination and candidate genes occurring within the GWAS hits. These resources provide a foundation for the development of markers that could be used in wood traits improvement and candidate genes for the development of earlywood tracheid in P. crassifolia .


Introduction
Picea crassifolia Kom. (Qinghai spruce) is an evergreen conifer native to the Qilian Mountain areas of northwestern China that are known to be among the world's arid and semi-arid mountains. P. crassifolia is the most dominant tree species in this mountainous area and acts as a natural green reservoir for regulating and conserving water resources [1][2][3]. Drought is the primary factor for limiting tree growth in arid and semi-arid regions, greatly influencing xylem anatomical traits [4]. P. crassifolia is well adapted to these environmental conditions and produces high quality wood; however, the mechanism of wood formation under these conditions is still unclear. Studies conducted on P. crassifolia sampled from different altitudes of the Tibetan plateau, northwestern China, found that the development of tracheid radial diameter is closely related to temperature and precipitation, and trees could change their internal characteristics to adapt to changing climate [1,5]. Gymnosperms DHK: Da Hekou, QL: Qi Lian, XYH: Xi Yinghe, and SDL: Shi Dalong. Within each orchard, clones were planted at 5 × 5 m within and between rows on the same soil type and managed similarly in a complete randomized block design with 18 replications.

Wood Tracheid Traits Phenotyping
A set of 14 wood tracheid traits were phenotyped and used as the research targeted traits (Table 1). From every clone, bark to pith 5 mm wood cores were extracted from the south and north directions at 1.3 m height. Tracheid length, diameter, and lumen diameter were measured by Motic 2.0 software under light microscope. A total of 15-20 tracheids were measured for each clone. Additionally, tracheid wall thickness, tracheid lumen-diameter ratio, tracheid wall-lumen ratio, and tracheid length-diameter ratio were calculated using software EXCEL 2020. Phenotypic traits indexes (mean value, standard deviation, coefficient of variation, skewness, kurtosis, and other parameters) were analyzed by SPSS 19.0 statistical software. The Origin software (version 8.2; https://www.originlab.com/https://www.originlab. com/, accessed on 20 August 2021) was used to display the frequency distribution of each trait. Correlations among traits were analyzed and visualized by psych and ggplot2 packages in R software 4.1.0.

DNA Isolation, Construction of SLAF Library, and Genome Sequencing
Total DNA was extracted from young needle tissues of each clone using a modified CTAB method [25]. Extracted genomic DNA concentration exceeded 20 ng/L, meeting the required quality for database construction. Due to the lack of P. crassifolia (L.) H.Karst. genome sequence, Picea abies genome was used as a reference genome (ftp://plantgenie.org/Data/ ConGenIE/Picea_abies/v1.0/, accessed on 12 December 2020). The reference genome was 12 G in size with 37.88% GC content. The digestion of genomic DNA and the construction of SLAF-seq library according to the protocol by Biomarker Technologies Co. (Beijing, China). Clusters of libraries were loaded into an Illumina HiSeq for paired-end sequencing. The obtained clean reads were compared to the reference genome by BWA software [26], and the number of SLAF-tags and polymorphic SLAF-tags were counted. SNP markers were developed by GATK [27] and Samtools [28]. Afterwards, SNP heterozygosity was calculated by PLINK software [29], the final SNP data with MAF < 0.05, heterozygosity > 0.02, and SNPs with more than two alleles were used to filter out.

Population Genetic Analyses and Linkage Disequilibrium
Based on the high-quality SNP markers, the software MEGA X [30] was used to construct the NJ tree among the 106 P. crassifolia studied clones. Principal component analysis (PCA) and calculation of a relative kinship matrix were performed using the software EIGENSTRAT [31] and GCTA [32], respectively. Additionally, population structure analysis using 12,275,765 SNPs to infer the genetic background of clonal cluster membership under a given number of populations (K) was carried out. The number of genetic clusters was predefined as K = 1-10 for all clones and was calculated using Admixture software [33,34].
Linkage Disequilibrium (LD) was estimated by calculating the squared allele frequency correlation coefficient (r 2 ) between pairs of all SNP markers distributed throughout the genome using the vcftools software package [35]. The r 2 values were plotted against corresponding genetic distances in pairwise distance (bp). A nonlinear fitted-curve was drawn using second-degree locally weighted polynomial regression (LOESS) by applying the "loess" function in the R statistical program (http://www.r-project.org, accessed on 20 August 2021).

Association Genetics Analysis
A total of 12,275,765 SNPs from 106 clones were used in the GWAS. The efficient model was performed with both general linear model (GLM) and mixed linear model (MLM) using TASSEL [36], FaSTLMM [37], and EMMAX [38]. The population structure matrix generated from Admixture was used as the Q matrix for the GLM model, while the Q values and the K values of the kinship coefficient matrix were calculated by SPAGeDi [33] software using MLM model. p-values of p ≤ 1.268 × 10 −7 (p = 0.01/n; n = total markers used, which is roughly a Bonferroni correction, corresponding to −log 10 (p) = 7) and p ≤ 1.268 × 10 −8 (p = 0.1/n; n = total markers used, which is roughly a Bonferroni correction, corresponding to −log 10 (p) = 8) were defined as the genome wide control threshold and suggestive threshold, respectively. A standard interval of 100 kb (50 kb upstream and downstream) was explored for each candidate locus and adjusted according to the extent of local linkage disequilibrium with the candidate SNP (R 2 < 0.8). All candidate genes were annotated by GO, KEGG, NR, KOG, Pfam, and Swissprot databases. Manhattan plots were generated using the R package "CMplot" (https://github.com/YinLiLin/R-CMplot accessed on 20 August 2021).

Sequencing Results
The SLAF-seq sequencing, resulted in a total of 1375.57 Mb of clean data from the 106 P. crassifolia clones. GC content ranged from 39.19 to 43.62%, with an average of 40.27%. The sequencing quality value Q30 ranged from 90.97 to 97.82%, with an average of 95.51% (Table S2). SNP detection was performed, based on 4,058,883 SLAF tags detected in the sequencing, producing a total of 12,275,765 SNP loci with a minor allele frequency >0.05 were generated, and SNP integrity of each sample ranged from 15.40 to 36.56%, and the SNP heterozygosity ranged from 5.41 to 10.99% (Table S3). Our analysis yielded a total of 1,573,899 polymorphic SLAF markers with a mean depth of 21.21 (Table S4). After quality control, a total of 12,275,765 SNPs were used for subsequent GWAS analyses.

Phenotypic Characterization of 14 Wood Tracheid Traits
Tracheid phenotypic data exhibited abundant variation among clones as the 106 clones originated from different forest regions ( Figure 1). Coefficients of variation (CVs) of the 14 traits exhibited values ranged between 5.62 and 89.47%, with the largest and smallest CV values were observed for ELDR, and EWLR, respectively, and average CV values across the 14 traits was 17.54% ( Table 2). The average CVs of earlywood tracheid properties were relatively large, indicating that earlywood tracheid properties are easily affected by the environment. In general, the coefficient of variation in a half of 14 wood tracheid traits were more than 10%, indicating the present of large phenotypic variation among the studied clones.
Forests 2022, 13, x FOR PEER REVIEW 5 of 16 tracheid traits were more than 10%, indicating the present of large phenotypic variation among the studied clones.    The absolute values of skewness and kurtosis of most tracheid traits in the studied population were less than 1, and most traits were normally distributed, indicating that these traits were quantitative traits controlled by multiple genes. Correlation analysis showed a highly significant relationship existed mainly among earlywood tracheid traits ( Figure 2). In earlywood, significant correlations were observed among ED, EL, ELD, EWLR, ELWR, and EWT, ranging from −0.57 to 0.94 (p < 0.01), whereas ED was significantly positively correlated with ELD (traits correlation coefficient (r t ) = 0.94, p < 0.01), indicating that the diameter and lumen diameter of tracheid in development were increased simultaneously. Additionally, EWT was significantly negatively correlated with EWLR (r t = −0.86, p < 0.01), suggesting that tracheid wall thickness decreased as the lumen diameter of tracheid in earlywood increased. Finally, similar correlation results were observed among latewood tracheid traits. The absolute values of skewness and kurtosis of most tracheid traits in the studied population were less than 1, and most traits were normally distributed, indicating that these traits were quantitative traits controlled by multiple genes. Correlation analysis showed a highly significant relationship existed mainly among earlywood tracheid traits ( Figure 2). In earlywood, significant correlations were observed among ED, EL, ELD, EWLR, ELWR, and EWT, ranging from −0.57 to 0.94 (p < 0.01), whereas ED was significantly positively correlated with ELD (traits correlation coefficient (rt) = 0.94, p < 0.01), indicating that the diameter and lumen diameter of tracheid in development were increased simultaneously. Additionally, EWT was significantly negatively correlated with EWLR (rt = −0.86, p < 0.01), suggesting that tracheid wall thickness decreased as the lumen diameter of tracheid in earlywood increased. Finally, similar correlation results were observed among latewood tracheid traits.

Analysis of Population Structure and Linkage Disequilibrium
The Admixture software was used to analyze population clustering and structure of the 106 clones (Figure 3a,b). Specifically, clustering was first performed assuming that the number of clusters (K) was between 1 and 10. Then, the results were cross-validated to determine that the optimal K-value was 1 (according to the valley of the error rates of cross-validation). In other words, our results implied that the collection most likely originated from the same ancestors. Principal component analysis showed that the first two principal components explained only 2.86% and 2.63% of the total variance ( Figure 3c). The top 18 PCA components cumulatively contribute about 30% of the total marker variation. This means that the population structure of the association panel was weak and the population structure cannot be explained by a few principal components, which might be attributed to the extensive exchanges of P. crassifolia breeding materials. Additionally, we

Analysis of Population Structure and Linkage Disequilibrium
The Admixture software was used to analyze population clustering and structure of the 106 clones (Figure 3a,b). Specifically, clustering was first performed assuming that the number of clusters (K) was between 1 and 10. Then, the results were cross-validated to determine that the optimal K-value was 1 (according to the valley of the error rates of crossvalidation). In other words, our results implied that the collection most likely originated from the same ancestors. Principal component analysis showed that the first two principal components explained only 2.86% and 2.63% of the total variance (Figure 3c). The top 18 PCA components cumulatively contribute about 30% of the total marker variation. This means that the population structure of the association panel was weak and the population structure cannot be explained by a few principal components, which might be attributed to the extensive exchanges of P. crassifolia breeding materials. Additionally, we observed family relationships along the diagonal with a scattered distribution of closely related individuals, and the remained part of the relationship matrix indicated low kinship (Figure 3d).  The LD in P. crassifolia was estimated using all squared correlation coefficient (r 2 ) values and the physical distances between the same SNP pair. The nonlinear fitted-curve indicated that the LD is low in P. crassifolia, rapidly decaying by over 50% (from 0.50 to 0.10) (Figure 4). The average distance associated with the LD decline for r 2 = 0.02 varied roughly from 50 to 3,000 bp. This result is expected and consistent with the trend of rapid LD decay in conifers, including Norway spruce [14] and loblolly pine [39]. Additionally, we detected high LD extending up to 100 kb. In woody plants, especially conifers, there are relatively high LD exist, for example, LD extend up to 140 kb and >145 kb in Norway spruce [14] and Shorea platyclados [40], respectively. This indicates that associations between phenotypic traits and markers in LD can be more easily and feasibly detected with GWAS than with analysis of quantitative trait loci (QTLs) [40]. The LD in P. crassifolia was estimated using all squared correlation coefficient (r 2 ) values and the physical distances between the same SNP pair. The nonlinear fitted-curve indicated that the LD is low in P. crassifolia, rapidly decaying by over 50% (from 0.50 to 0.10) (Figure 4). The average distance associated with the LD decline for r 2 = 0.02 varied roughly from 50 to 3000 bp. This result is expected and consistent with the trend of rapid LD decay in conifers, including Norway spruce [14] and loblolly pine [39]. Additionally, we detected high LD extending up to 100 kb. In woody plants, especially conifers, there are relatively high LD exist, for example, LD extend up to 140 kb and >145 kb in Norway spruce [14] and Shorea platyclados Slooten ex x Endert [40], respectively. This indicates that associations between phenotypic traits and markers in LD can be more easily and feasibly detected with GWAS than with analysis of quantitative trait loci (QTLs) [40].

Association Analysis of 14 Wood Tracheid Traits
The GLM and MLM models identified more significant SNPs for the former than the later with some SNPs overlapped between the two models. However, the accuracy of MLM was better than that of GLM. A total of 96 SNP loci (p < 1.11 ×10 −8 ) randomly distributed on 887,836 loci (Tables S5 and S6) of Picea abies genome [41] were identified on the MLM as significantly associated with three wood tracheid traits (ELDR, EWLR, and EWT). All SNP loci contributed to more than 10% to phenotypic variation. Among them, 9 were simultaneously detected for ELDR, EWLR, and EWT, and 11 SNP were simultaneously detected for EWLR and EWT. These significant associations could reflect that the genetic basis of the observed correlations among these traits, supporting the observed phenotypic correlation and pleiotropic effect between phenotypes.
The regional Manhattan plots and quantile-quantile plots (QQ plots) of GLM and MLM for three earlywood tracheid traits (ELDR, EWLR, and EWT) and one latewood tracheid trait (LWLR) are presented in Figure 5. These plots include candidate genes within 100 kb of the significant SNP marker (50 kb upstream, 50 kb downstream), yielding a total of 67 candidate genes (Table 3), including 33 with unknown functions and 34 with predicted function annotation. Among these SNPs, some were highly associated with LWLR for the GLM, but not with the MLM. Three earlywood tracheid traits (ELDR, EWLR, and EWT) were associated with seven candidate genes (MA_10430313g0010, MA_10429843g0020, MA_10434936g0010, MA_11137g0010, MA_119933g0010, MA_17692g0010, and MA_19953g0020). The synonymous SNP MA_119933 is located on the gene MA_119933g0010 homologous with wall-associated receptor kinase, this ability to bind and respond to several types of pectins correlates with a demonstrated role for WAKs in both the pathogen response and cell expansion during plant development [42]. Therefore, the gene homologous with MA_119933g0010 in P. crassifolia may be involved in the cell wall development in earlywood tracheid.

Association Analysis of 14 Wood Tracheid Traits
The GLM and MLM models identified more significant SNPs for the former than the later with some SNPs overlapped between the two models. However, the accuracy of MLM was better than that of GLM. A total of 96 SNP loci (p < 1.11 ×10 −8 ) randomly distributed on 887,836 loci (Tables S5 and S6) of Picea abies genome [41] were identified on the MLM as significantly associated with three wood tracheid traits (ELDR, EWLR, and EWT). All SNP loci contributed to more than 10% to phenotypic variation. Among them, 9 were simultaneously detected for ELDR, EWLR, and EWT, and 11 SNP were simultaneously detected for EWLR and EWT. These significant associations could reflect that the genetic basis of the observed correlations among these traits, supporting the observed phenotypic correlation and pleiotropic effect between phenotypes.
The regional Manhattan plots and quantile-quantile plots (QQ plots) of GLM and MLM for three earlywood tracheid traits (ELDR, EWLR, and EWT) and one latewood tracheid trait (LWLR) are presented in Figure 5. These plots include candidate genes within 100 kb of the significant SNP marker (50 kb upstream, 50 kb downstream), yielding a total of 67 candidate genes (Table 3), including 33 with unknown functions and 34 with predicted function annotation. Among these SNPs, some were highly associated with LWLR for the GLM, but not with the MLM. Three earlywood tracheid traits (ELDR, EWLR, and EWT) were associated with seven candidate genes (MA_10430313g0010, MA_10429843g0020, MA_10434936g0010, MA_11137g0010, MA_119933g0010, MA_17692g0010, and MA_19953g0020). The synonymous SNP MA_119933 is located on the gene MA_119933g0010 homologous with wall-associated receptor kinase, this ability to bind and respond to several types of pectins correlates with a demonstrated role for WAKs in both the pathogen response and cell expansion during plant development [42]. Therefore, the gene homologous with MA_119933g0010 in P. crassifolia may be involved in the cell wall development in earlywood tracheid.   Two earlywood tracheid traits, EWLR and EWT, were associated with two candidate genes (MA_10303051g0010 and MA_10883g0010). The gene MA_10883g0010 was homologous with EXORDIUM-like 2 (EXL2), which may play a role in a brassinosteroid-dependent regulation of growth and development, and the extracellular EXORDIUM protein mediates cell expansion in Arabidopsis leaves [43,44]. Therefore, the homologous gene of MA_10883g0010 in P. crassifolia may be involved in earlywood tracheid cell expansion. Additionally, a number of genes related to metabolism and transportation were associated highly with EWLR, including carboxylase, reductases, phosphatase, and transporters. This suggests that various physiological and biochemical reactions and material transportation in tracheid probably affect the earlywood tracheid wall-lumen ratio.

Discussion
In typical GWAS, phenotype and genotype data are collected for a large sample of assembled individuals [45]. However, a representative sample size combined with highthroughput sequencing and appropriate algorithms is sufficient to generate a relatively rich set of SNP and association loci in forest trees [46,47]. While it is possible that a detected genetic marker resides within a causative gene for the phenotype of interest, this is often not the case. Instead, GWAS rely on linkage disequilibrium (LD) between markers under testing and functional polymorphisms of causative genes [48,49]. The large number of SNPs in GWAS of conifers species is a reflection of their large and complex genomes and dramatically increased marker density would enable said markers to better track LD with causal variants in these large, genetically diverse genomes [50]. A large number of SNP loci can be identified in relatively small number of samples in conifers. For example, in lodgepole pine, more than 95,000 SNPs were obtained in 98 serotinous and nonserotinous samples from three populations [51]. De la Torre et al. [52] identified 799 significant associations of cold-related traits by GWAS in 217 samples in Douglas-fir. GWAS in 194 maritime pines from different families provided the map position of 1671 SNPs corresponding to 1192 different loci [39]. In our study, the studied populations were mostly composed of individuals representing a species with narrow distribution range [53], we intentionally selected 106 individuals representing different natural forest populations that could represent the core germplasm of Qilian Mountain as determined by the little genetic relationship among them. The results of GWAS showed that there are abundant phenotypic variation and associated SNP loci. Thus, these obtained results demonstrate that sample size is not the most important factor in GWAS and the genetic relationship among samples and LD effects should be the focus of sample selection. Therefore, samples representing the core germplasm resources can be used as the materials for association analysis of the quantitative traits under the limited materials (i.e., small sample size).
Genomes of conifers are huge and complex and only few species have their genomes sequenced including Norway spruce [41], white spruce [54], and loblolly pine [55]. Traditional methods in developing molecular markers are usually cumbersome, time-consuming, and in most cases are unable to meet the experiment requirements. SLAF-seq-based GWAS is a fast and cost-effective approach to develop a large number of SNP markers in the absence of reference genomes. It is an effective method to determine molecular markers that influence essential traits in the absence of whole-genome markers [56,57]. As a consequence, SLAF-seq-based GWAS has been used in several conifers, for example, Bai et al. [58] used the SLAF-seq technique to analyze Pinus massoniana Lamb. germplasm resources in Guangdong Province, and identified 471,660 SNP markers in 599,164 SLAF polymorphic markers. Yang et al. [59] also used this approach and detected 524,662 high-quality SLAFs and identified 249,619 SNPs in hybrid clones of Taxodium species. In this study, a total of 4,058,883 SLAF-tags were detected and 12,275,765 SNP markers were developed and employed in GWAS to identify SNP loci associated with important traits and determined their candidate genes.
GLM and MLM are the most used algorithmic models in GWAS. The advantage of GLM is that it is more comprehensive and can identify more SNPs associations with the traits; however, its accuracy is lower than that of MLM [60][61][62][63]. In our study, the associated analysis results showed that the significantly associated SNPs in MLM (p < 1.11 × 10 −7 ) were actually less than that of the GLM. Meanwhile, in the analysis of associated candidate gene, latewood tracheid trait LWLR was associated with few candidate genes in GLM model, but not in MLM model ( Figure 5). This is because MLM model is more stringent than GLM model, and MLM model considers the influence of population structure (Q value) and genetic relationship (K value) and removes possible false associations [64]. Therefore, MLM can improve the accuracy of the analysis but can also miss some important SNP loci due to the strict screening conditions. Thus, multiple algorithmic models should be used to conduct GWAS data analysis to overcome this limitation [65,66].
Additionally, a total of 14 wood tracheid traits of P. crassifolia were used in GWAS. Due to the large amount of genome data and high levels of genome wide heterozygosity, the Norway spruce genome we used as reference genome did not assign identified association to the chromosome level. As the majority of the SNP loci were associated with earlywood tracheid traits, this is an indication of extensive variations in earlywood formation and maybe the presence of a more complex regulatory network. In temperate zones, earlywood forms in the spring, and the activity of cell cambium and physiological metabolism are vigorous due to suitable temperature and sufficient water [67,68]. As a result, the development of earlywood tracheid in P. crassifolia is expected to harbor more variation and thus SNP loci were easily associated with earlywood tracheid traits. Hence, QQ plots of the four highly associated earlywood tracheid traits (ELDR, EWLR, EWT, and LWLR) were generated to validate the accuracy of the population correction. The results of the QQ plots showed that, overall, the observed values did not match the expected values except for a few outliers at the beginnings. In other words, the highly associated SNP loci do not show normal distribution in the three earlywood tracheid traits. If a SNP locus deviate from expectation, it is considered that the deviation of the SNP observed value is caused by the genetic effect caused by the SNP mutation (i.e., true association) [69].
Next, candidate genes were screened in the 100 kb (50 kb upstream, 50 kb downstream) zone of each of the observed highly significant SNP loci. The MLM generated 67 potential candidate genes associated with three earlywood tracheid traits, ELDR, EWLR, and EWT ( Figure 5). The multiple annotation databases results showed 34 candidate genes while the function of additional 33 candidate genes were unknown. The candidate gene MA_119933g0010 was associated with three earlywood tracheid traits (ELDR, EWLR, and EWT), and it is homologous with wall-associated receptor kinase. WAKs plays an important role in cell development, and our results indicated that the candidate SNP loci of gene MA_119933g0010 has a similar function in earlywood tracheid. Likewise, the gene MA_373300g0010 identified by the GWAS of the tracheid traits in Norway spruce is likely similar to WAKs [7]. In Arabidopsis, WAKs proteins are involved in cell wall expansion of leaf and regulation of leaf senescence [70][71][72]. Therefore, WAKs are probably key proteins and regulators in tracheid development of the spruce. The candidate gene MA_10883g0010 was associated with EWLR and EWT, and is homologous with EXORDIUM-like 2 that is active in mediating regulation of growth and development [43,44]. EXOORDIUM and WAKs proteins in P. crassifolia may have a significant effect in the development of earlywood tracheid. In addition, several candidate genes related to various enzymes and transporters were highly associated with EWLR, including carboxylesterase, phosphoenolpyruvate carboxylase, 2-alkenal reductase, ABC transporter D family member, sodium/metabolite cotransporter, and Proline transporter. The association of these enzymes and transporters also showed that metabolism and material transport are active in earlywood tracheid, and that the growth and development regulation of earlywood is important for P. crassifolia. We expect that these associated SNP loci with 67 candidate genes will provide essential genetic basis for earlywood tracheid traits improvement in P. crassifolia.

Conclusions
This work presents the first genome wide dissection of wood tracheid traits in P. crassifolia. A total of 67 highly significant SNP loci were associated with three earlywood tracheid traits, ELDR, EWLR, and EWT. These SNP loci have identified a set of candidate genes that could be exploited to improve earlywood tracheid traits for regulating development of earlywood tracheid for ecological adaptability of P. crassifolia under arid and semi-arid conditions. Previous research on the environmental effects on the development of tracheid in conifers mainly focused on morphology, and our study provided a major opportunity for understanding the underpinning of wood tracheid traits through using 12,275,765 SNPs for GWAS and extending the work to functional mapping approach. It is also worth mentioning that no associations were detected in latewood tracheid traits. The magnitude of variation earlywood tracheid traits was also higher than in latewood tracheid traits. These results showed that SNP mutations are more likely to affect growth and development of earlywood tracheid. Therefore, studies of wood development or drought adaptability of P. crassifolia, are more likely to benefit if focus is directed to the regulation and improvement of earlywood tracheid traits. Multiple SNP loci associated with the candidate genes related to cell and cell wall development are expected to provide a genetic basis for exploring and verifying the key mechanisms regulating earlywood tracheid development in P. crassifolia.

Supplementary Materials:
The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/f13020332/s1, Table S1: The information of 106 clones in P. crassifolia; Table S2: The sequencing result of P. crassifolia; Table S3: The SNP statistics of P. crassifolia; Table S4: The SLAF-tag statistics of P. crassifolia; Table S5: The information of SNP loci in P. crassifolia; Table S6: The locus information of reference genome; Table S7: The SRA metadata of SLAF-seq in P. crassifolia. Data Availability Statement: The raw sequencing data and the SLAF-sequencing data (SRA accession: SUB10523923), are available in the NCBI Sequence Read Archive (SRA) database under BioProject accession number PRJNA771805, the detailed information of SRA metadata is shown in Table S7. Other datasets supporting the conclusions of this article are included within the article and its additional files.