Nucleotide Diversities and Genetic Relationship in the Three Japanese Pine Species; Pinus thunbergii, Pinus densiflora, and Pinus luchuensis

The nucleotide diversities and genetic relationships of three Japanese pine species, P. thunbergii, P. densiflora, and P. luchuensis, were measured using low-copy anchor loci in Pinaceae. Average nucleotide diversity was highest in P. thunbergii (6.05 × 10⁻³), followed by P. densiflora (5.27 × 10⁻³) and P. luchuensis (5.02 × 10⁻³). In comparison to other conifer species, the pines possessed an intermediate level of nucleotide diversity. The Heat shock protein (HSP) gene in P. thunbergii, and the Phenylalanine tRNA synthetase, RuBP carboxylase, and Disease resistance response protein 206 genes in P. densiflora, deviated significantly from standard neutral models. Some of these genes are related to stress or pathogen/defense response. As the samples used in this study were collected from natural populations that showed resistance to pine wilt nematode, it was hypothesized that this initial selection was an important factor in the deviation from neutrality models. Phylogenetic reconstruction revealed that the three Japanese pines split into two lineages corresponding to P. densiflora and P. thunbergii–P. luchuensis. The latter lineage was further differentiated into two clades, P. thunbergii and P. luchuensis. These results indicate that the three Japanese pines are closely related and that P. thunbergii is genetically closer to P. luchuensis than to P. densiflora.
OPEN ACCESS Diversity 2011, 3


Introduction
Analyzing the amounts and patterns of nucleotide diversity within and between species is important for understanding the evolutionary mechanisms by which genetic diversity is maintained and the processes by which genetic polymorphisms within species become transformed into genetic divergence between species [1]. Such diversity is influenced by evolutionary processes such as mutation, recombination, and selection, and by population structure.
Single nucleotide polymorphisms (SNPs) are co-dominant, typically bi-allelic markers with high abundance and stability in many eukaryotic organisms [2,3]. Studies of nucleotide diversity using SNPs have been conducted in several model species, such as Drosophila [4], Arabidopsis [5,6], and maize [7]. However, studies of the amounts and patterns of genetic diversity in non-model higher plants, such as forest tree species, are still relatively scarce [8]. Hamrick and Godt [9] reported that nucleotide diversity in higher plants is strongly affected by life history traits, such as generation time, pollination mechanisms, and mating systems. Therefore, there is a need to investigate nucleotide diversity in higher plants whose life history traits differ from those of the annual selfing Arabidopsis and the annual outcrossing maize.
Based on morphological and molecular data, Pinus has been divided into two monophyletic subgenera: Haploxylon (subgenus Strobus, with one fibrovascular bundle in a needle) and Diploxylon (subgenus Pinus, with two fibrovascular bundles in a needle) [10,11]. These subgenera have been further divided into sections and subsections [12]. In this study, we focused on the Japanese species of subgenus Pinus: Pinus thunbergii Parl., P. densiflora Sieb. and Zucc., and P. luchuensis Mayr. These three Japanese pine species are grouped into the same subsection of Pinus [13]. P. thunbergii and P. densiflora are widely distributed across the Japanese archipelago except Hokkaido and the Ryukyu Islands. P. luchuensis, however, is naturally distributed in coastal forests on the Ryukyu Islands [14,15]. These three Japanese pines are considered an attractive model of forest trees with respect to their nucleotide diversities and phylogenetic relationships.
Recent studies using low-copy anchor loci [16,17] demonstrated the usefulness of multiple markers for unraveling nucleotide diversity and for phylogenetic analysis at low taxonomic levels. These studies suggest that independent markers that sample a range of substitution rates and patterns can provide greater resolution than chloroplast and nuclear ribosomal DNA for resolving relationships among closely related species. The large genome size of pines (22–37 pg/2C; [18]) hinders the development and application of low-copy anchor loci for nucleotide diversity and phylogenetic analysis [19]. Despite this problem, efforts to develop low-copy anchor loci in P. taeda L. and P. pinaster Aiton have revealed a large number of such loci that show orthology across pine species [20–23]. Because these conifer anchor loci show strong evidence for positional orthology, they are promising candidates for nucleotide diversity and phylogenetic analysis in other pine species [21].
In this study, we investigated the potential of fifteen conifer anchor loci to resolve nucleotide diversities and phylogenetic relationships among the three Japanese pine species P. thunbergii, P. densiflora, and P. luchuensis.

Sample and DNA Extraction
Sixteen individuals of each of P. thunbergii, P. densiflora, and P. luchuensis were used in this study (Table 1). P. thunbergii and P. densiflora samples were selected from the germplasm collection of the Kyushu Regional Breeding Office, Forest Tree Breeding Center (FTBC), Japan. P. luchuensis samples were collected from the Okinawa Prefectural Forest Resources Research Center, Japan. These samples were initially collected from natural populations that showed favorable characteristics of resistance to pine wilt nematode (PWN). Additionally, we collected two samples each of P. taeda and P. palustris Mill. as an out-group. In contrast to the three Japanese pine species, P. taeda and P. palustris are classified into a different subsection, Australes [13]. P. taeda samples were obtained from the Kyushu Regional Breeding Office, FTBC, whereas P. palustris samples were taken from the arboretum of Kyushu University, Japan. Total genomic DNA of each individual was extracted from needle tissue using a cetyltrimethyl ammonium bromide (CTAB) method [24] modified from that of Murray and Thompson [25]. The extracted DNA was further purified with MagneSil-Red (Promega) according to the manufacturer's instructions.
Based on previously published low-copy anchored loci in Pinaceae [20–22,26], 21 loci were initially screened. PCR reactions were performed in a total volume of 10 µL containing 1 × PCR buffer (Invitrogen), 1.5 mM MgCl2, 0.2 mM each dNTP, approximately 20 ng of template DNA, and 0.25 U Platinum Taq DNA polymerase (Invitrogen).
PCR amplifications were conducted in a Biometra T1 thermocycler (Biometra) using a modified 'touchdown' PCR [27] as follows: 94 °C for 1 min; 10 touchdown cycles of 94 °C for 30 s, 60 to 55 °C (decreasing 0.5 °C per cycle) or 56 to 50 °C (decreasing 0.6 °C per cycle) for 30 s, and 72 °C for 1 min; 25 cycles of 94 °C for 30 s, 55 °C or 50 °C for 30 s, and 72 °C for 1 min; and a final extension of 2 min at 72 °C. Annealing temperatures were optimized for each primer pair. The primer sequences and annealing temperatures are given in Table 2. Five microliters of each PCR product were electrophoresed on a 1.2% agarose gel to check the success of PCR amplification.
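The two touchdown programmes described above can be written out cycle by cycle. The sketch below is only an illustration: the function name `touchdown_schedule` and its parameters are hypothetical, and it assumes that the first touchdown cycle anneals at the starting temperature and that the plateau temperature is reached in the first of the 25 remaining cycles.

```python
def touchdown_schedule(start_c, step_c, touchdown_cycles=10, plateau_cycles=25):
    """Return the annealing temperature (deg C) used in each PCR cycle.

    Assumption: cycle 1 anneals at start_c and every later touchdown
    cycle is step_c cooler than the one before it.
    """
    # Touchdown phase: one temperature per cycle, decreasing by step_c.
    temps = [round(start_c - i * step_c, 1) for i in range(touchdown_cycles)]
    # Plateau phase: hold the temperature reached after the touchdown.
    plateau = round(start_c - touchdown_cycles * step_c, 1)
    temps += [plateau] * plateau_cycles
    return temps


high = touchdown_schedule(60.0, 0.5)  # 60 -> 55 deg C programme
low = touchdown_schedule(56.0, 0.6)   # 56 -> 50 deg C programme
print(high[:10], high[10])
print(low[:10], low[10])
```

Under these assumptions both programmes run 35 cycles in total, and the step sizes of 0.5 and 0.6 °C land exactly on the plateau temperatures of 55 and 50 °C.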
Primers that produced single, clear PCR products in all 52 samples were used for direct sequencing (Table 2). PCR products of the 15 primer pairs selected through preliminary screening were purified using 2.0 U exonuclease I (Exo-I; New England Biolabs) and 2.0 U Antarctic phosphatase (AP; New England Biolabs) to remove excess primers and dNTPs. Exo-AP-treated PCR products were directly sequenced using the BigDye Terminator Sequencing v3.1 kit (Applied Biosystems). Capillary electrophoresis was conducted on an ABI 3130 automated sequencer (Applied Biosystems). All samples were sequenced in both directions at least once. For each locus, the forward and reverse reads were analyzed using Sequencher v4.2 (GeneCodes Corp). A putative SNP was accepted as a genuine polymorphism if all chromatograms were unambiguous and all quality scores exceeded 25 at that site. Resequencing was performed when necessary to meet these criteria. Only sequences that met the criteria were used for further analysis. The confounding effect of indels, which produce overlapping phase shifts under direct sequencing, was overcome by resequencing megagametophytic DNA of the corresponding sample. Four to eight megagametophytes were used to identify the indels. We also designed an additional primer pair (Pt_2763 2F and 2R in Table 2) for the sequencing reaction at locus Pt_2763 because of its longer PCR product. All sequences obtained were deposited in the DNA Data Bank of Japan database (Accession numbers AB605617–AB605618, AB605666–AB605707, AB605731–AB605761).

Data Analyses
Nucleotide diversity (π) at the level of the gene fragment, that is, the proportion of nucleotides that differ between two sequences averaged over all pairwise sample comparisons, was computed from the number of polymorphic SNP sites on a per-base-pair basis [28]. Insertions and deletions were excluded from this analysis. To test for departures from the standard neutral model of evolution, three statistics, Tajima's D [29] and Fu and Li's D* and F* [30], were computed. A non-significant value indicates no evidence for selection. The above analyses were performed using DnaSP v5 [31]. To control for an elevated rate of false positives resulting from multiple testing, the standard Bonferroni correction was applied. The distribution of SNPs along gene fragments was examined to understand whether nucleotide diversity was distributed randomly or organized in haplotypes. Because diploid samples were used for sequencing, direct determination of allelic haplotypes was complicated. Thus, haplotypes were inferred using PHASE v2.1.1 [32,33] as implemented in DnaSP v5. A similar situation was encountered in grapevine [34] and cassava [35]. Haplotype-based gene diversity (Hd) was estimated for each polymorphic gene fragment using DnaSP v5.
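To make the two central quantities concrete, the following sketch computes nucleotide diversity (π in the usual notation) and Tajima's D from a small gap-free alignment of phased sequences. It is a minimal illustration using the standard formulas of Tajima (1989), not a reproduction of the DnaSP implementation; the function names and the four toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def tajimas_d(seqs):
    """Tajima's D for equal-length, gap-free sequences (None if no SNPs)."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) alignment columns.
    s = sum(len(set(column)) > 1 for column in zip(*seqs))
    if s == 0:
        return None
    k = pairwise_diversity(seqs) * length  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares the pairwise estimate k with the estimate from S.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: four phased sequences carrying two singleton SNPs.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = pairwise_diversity(toy)
d = tajimas_d(toy)
print(pi, d)
```

In the toy alignment both SNPs are singletons, so the pairwise estimate falls below the estimate based on the number of segregating sites and D is negative, the pattern associated in the text with purifying selection or a recent bottleneck.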
Polymorphic information content (PIC) value for each identified SNP site was calculated as described by Botstein et al. [36]. Observed heterozygosity (Ho) was calculated as the proportion of heterozygous individuals at each polymorphic site. PIC, Ho, and expected heterozygosity (He) were calculated using Cervus v3.0.3 [37].
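The three per-site statistics can be sketched as follows for a single SNP genotyped in 16 diploid individuals. The PIC formula is that of Botstein et al.; the use of the sample-size-corrected (unbiased) estimator for He is an assumption, chosen here because it gives 0.516 for equal allele frequencies in 16 individuals. The function names and toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """genotypes: one (allele, allele) tuple per diploid individual."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Unbiased estimate (assumed): 2N / (2N - 1) * (1 - sum of p_i^2)."""
    n_alleles = 2 * len(genotypes)
    freqs = allele_frequencies(genotypes)
    return n_alleles / (n_alleles - 1) * (1 - sum(p * p for p in freqs))


def pic(genotypes):
    """PIC = 1 - sum p_i^2 - sum over i<j of 2 * p_i^2 * p_j^2."""
    freqs = allele_frequencies(genotypes)
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# 16 individuals with equal allele frequencies (minor allele frequency 0.5).
balanced = [("A", "G")] * 8 + [("A", "A")] * 4 + [("G", "G")] * 4
# 16 individuals with a single copy of the minor allele (frequency 1/32).
singleton = [("A", "G")] + [("A", "A")] * 15

print(expected_heterozygosity(balanced), pic(balanced))
print(expected_heterozygosity(singleton), pic(singleton))
```

For a bi-allelic site, PIC cannot exceed 0.375, which is reached when both alleles have a frequency of 0.5; a minor allele present in only one of 32 chromosomes gives He = 0.0625 and PIC of about 0.059.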
To construct a large phylogenetic data set, the sequences of all loci were concatenated for each individual. Sequence alignment was carried out using ClustalW with gap opening and extension penalties of 15 and 6.66, respectively. Phylogenetic analysis was conducted using the maximum likelihood method [38] with the Tamura-Nei model and 1,000 bootstrap replications. The analysis was conducted using PhyML v3.0 [39]. P. taeda and P. palustris sequences were included in the concatenation and used as an out-group in the analysis.
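The concatenation step can be sketched as below. This is a minimal illustration only: the function name, locus names, sample names, and sequences are hypothetical, and the resulting per-sample sequences would still need to be aligned and analyzed with the external tools named above.

```python
def concatenate_loci(sequences, locus_order):
    """Join per-locus sequences into one sequence per sample.

    sequences: {locus name: {sample name: sequence}}
    Only samples sequenced at every locus are kept, and loci are joined
    in the same order for every sample.
    """
    shared = set.intersection(*(set(sequences[locus]) for locus in locus_order))
    return {
        sample: "".join(sequences[locus][sample] for locus in locus_order)
        for sample in sorted(shared)
    }


# Hypothetical toy input with two loci and three samples.
toy = {
    "locus_A": {"thunbergii_01": "ACGTTA", "densiflora_01": "ACGTCA", "taeda_01": "ACCTTA"},
    "locus_B": {"thunbergii_01": "GGC", "densiflora_01": "GAC", "taeda_01": "GAC"},
}
matrix = concatenate_loci(toy, ["locus_A", "locus_B"])
for sample, sequence in matrix.items():
    print(f">{sample}\n{sequence}")
```

Keeping the locus order fixed matters because every column of the concatenated data set must correspond to the same locus in all samples.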

Results
The approach used here was based on direct sequencing of low-copy anchored loci in Pinaceae. PCR amplification was performed using primer pairs designed from these loci [20–22,26], which targeted the intron and/or 3' untranslated region [20] of the selected gene fragments. Of the 21 primer pairs screened, 15 (71%) produced single, clear PCR products ranging from 211 to 918 bp (Table 2), while the remaining 6 (29%) produced multiple PCR products or no amplification. In total, 5,825 bp of sequence were obtained from the 15 loci in each pine species, or 279,552 bp overall (5,825 bp × 32 samples and 5,822 bp × 16 samples; excluding the out-group samples). Every amplicon was sequenced from all 16 samples of each pine species.
The number of haplotypes varied among loci in each species. In P. thunbergii, the number of haplotypes per locus varied from 2 in PtIFG_2358 to 9 in PtIFG_2274, with an average of 5.1. In P. densiflora, it ranged from 3 in PtIFG_606 and PtIFG_9151 to 13 in PtIFG_2274, with an average of 6.3; in P. luchuensis, it ranged from 1 in PtIFG_2358 to 7 in Pt_2763, with an average of 4.5 (Table 3). Haplotype diversity (Hd) in P. thunbergii and P. densiflora was highest in locus PtIFG_2274 (0.891 and 0.897, respectively) and lowest in locus PtIFG_2358 (0.063 and 0.236, respectively). In P. luchuensis, however, the highest Hd was observed in locus PtIFG_606 (0.776) and the lowest in locus PtIFG_2358 (0.000). Average haplotype diversities (0.586, 0.586, and 0.521 in P. thunbergii, P. densiflora, and P. luchuensis, respectively) were roughly similar in the three pine species, indicating an intermediate level of diversity. According to Clarke et al. [40], the maximum number of haplotypes expected from s segregating sites is s + 1. In our results, however, only five loci, PtIFG_1643 (in P. densiflora), PtIFG_2358 (in P. luchuensis), PtIFG_8702 (in P. densiflora and P. luchuensis), PtIFG_9151 (in P. thunbergii and P. luchuensis), and PpINR_AS01H04 (in P. thunbergii), were consistent with this rule (Table 3). In most loci, the major haplotypes were accompanied by a series of low-frequency haplotypes (data not shown). The frequency of the minor allele (q in Supplementary Data) varied from 0.031 to 0.500 in P. thunbergii and P. densiflora, and from 0.031 to 0.469 in P. luchuensis. Although the numbers of discovered SNPs and haplotypes varied among loci, the number of SNP sites identified for a locus was positively correlated with its number of inferred haplotypes (r = 0.831 in P. thunbergii, r = 0.925 in P. densiflora, and r = 0.886 in P. luchuensis).
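Haplotype diversity and the s + 1 rule can be illustrated with a short sketch. It assumes the usual sample-size-corrected gene diversity, Hd = n/(n − 1) × (1 − Σ p_i²), applied to the 32 phased chromosomes of 16 diploid individuals; the function names and the haplotype counts are hypothetical examples, not values taken from Table 3.

```python
def haplotype_diversity(counts):
    """Gene diversity from haplotype counts: n/(n-1) * (1 - sum p_i^2)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))


def within_s_plus_one(n_haplotypes, n_segregating_sites):
    """True if the haplotype count does not exceed s + 1."""
    return n_haplotypes <= n_segregating_sites + 1


# 32 chromosomes: one common haplotype plus a single rare copy.
rare = haplotype_diversity([31, 1])
# 32 chromosomes that all carry the same haplotype.
fixed = haplotype_diversity([32])
print(rare, fixed)
print(within_s_plus_one(2, 1), within_s_plus_one(5, 3))
```

Under these assumptions, a locus with two haplotypes of which one occurs once gives Hd = 0.0625, and a locus with a single haplotype gives Hd = 0, matching the order of magnitude of the lowest values reported for PtIFG_2358. More haplotypes than s + 1 at a locus is usually read as a sign of recombination or recurrent mutation.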
The observed heterozygosity (Ho) of each SNP site ranged from 0.000 to 0.563 in P. thunbergii, from 0.000 to 0.625 in P. densiflora, and from 0.000 to 0.688 in P. luchuensis (see Supplementary Data). The expected heterozygosity (He) ranged from 0.063 to 0.516 in both P. thunbergii and P. densiflora, and from 0.063 to 0.514 in P. luchuensis. Likewise, PIC ranged from 0.059 to 0.375 in both P. thunbergii and P. densiflora, and from 0.059 to 0.374 in P. luchuensis.
We employed three neutrality tests (Tajima's D, Fu and Li's D*, and Fu and Li's F*) to assess whether signatures of selection were present in the loci. The results of these tests are shown in Table 4. In P. thunbergii, no locus deviated significantly from neutrality under any of the three tests, with the exception of locus PpINR_AS01C7 (Fu and Li's D* = 1.499; Fu and Li's F* = 1.687). In P. densiflora, significant deviations were also detected in loci PtIFG_2358, PpINR_AS01H04, and Pt_3113 (Table 4).
Phylogenetic reconstruction revealed that the three Japanese pines split into two lineages: P. densiflora, with high bootstrap support (BS > 85%), and P. thunbergii–P. luchuensis, with low support (BS < 50%; Figure 1). The P. thunbergii–P. luchuensis lineage was further differentiated into two clades, one comprising P. thunbergii (BS = 98%) and the other P. luchuensis (BS = 92%) (Figure 1). The low bootstrap support for the P. thunbergii–P. luchuensis lineage was consistent with the result of the neighbor-joining method (data not shown).

Discussion
The major objective of this study was to assess the nucleotide diversities and to reconstruct the phylogenetic relationships of P. thunbergii, P. densiflora, and P. luchuensis using SNPs characterized in selected low-copy gene fragments. This study identified 122 SNPs (an average of one SNP per 48 bp), 140 SNPs (one SNP per 42 bp), and 115 SNPs (one SNP per 51 bp) in P. thunbergii, P. densiflora, and P. luchuensis, respectively. In comparison to other conifer species, the SNP frequencies in P. thunbergii and P. densiflora were roughly similar to that found in Pseudotsuga menziesii Mirb. (one SNP per 46 bp) [41]. In P. luchuensis, the SNP frequency was similar to that found in P. taeda (one SNP per 50 bp) [42]. Populus tremula L. showed a lower SNP frequency, with one SNP every 60 bp [43]. In Populus nigra L., however, the SNP frequency was higher, with one SNP every 26 bp over the nine sequenced genes [44].
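The SNP spacings quoted above follow from dividing the sequenced length by the number of SNPs. The short check below assumes that the 5,822 bp data set belongs to P. densiflora (the text reports 5,825 bp for two species and 5,822 bp for the third without naming it); the variable names are hypothetical.

```python
# (number of SNPs, sequenced length in bp) per species.
# Assumption: the 5,822 bp total is the P. densiflora data set.
snp_data = {
    "P. thunbergii": (122, 5825),
    "P. densiflora": (140, 5822),
    "P. luchuensis": (115, 5825),
}

# Average spacing between SNPs, rounded to the nearest base pair.
spacing = {
    species: round(length / n_snps)
    for species, (n_snps, length) in snp_data.items()
}
for species, bp in spacing.items():
    print(f"{species}: one SNP per {bp} bp")
```

The rounded result for P. densiflora is 42 bp whether 5,822 or 5,825 bp is used, so the assumption does not affect the comparison.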
Average nucleotide diversity was highest in P. thunbergii (6.05 × 10⁻³), followed by P. densiflora (5.27 × 10⁻³) and P. luchuensis (5.02 × 10⁻³) (Table 3). In comparison to Pseudotsuga menziesii [41], Populus nigra [44], Arabidopsis [45], and Drosophila [46], as compiled in [47], the nucleotide diversities in these Japanese pine species were 1.1- to 1.3-fold lower. In comparison with P. taeda [42], however, the nucleotide diversities were similar. These values were 2.5- to 3.0-fold higher than those in Cryptomeria japonica D.Don [48,41], P. pinaster, and Pinus radiata D.Don [49]. Based on these comparisons, we concluded that the level of nucleotide diversity in the three Japanese pine species is intermediate within the range of published conifer values. Such differences in diversity level are not surprising, as genetic differences are generally present among species. Further investigation of nucleotide diversity in more diverse populations, including P. thunbergii and P. densiflora from the Korean peninsula, P. densiflora from Shandong and eastern Manchuria in China and from southern Ussuriland in Russia, and P. luchuensis from Taiwan and a wider area of the Ryukyu Islands, would be useful for a better understanding of overall genetic diversity.
In general, the total nucleotide diversities in conifers are similar to or lower than those of Arabidopsis (total nucleotide diversity = 7.0 × 10⁻³ [45]) and Zea mays L. (total nucleotide diversity = 9.60 × 10⁻³ [7]). If mutation rates per year were constant across taxa, mutation rates per generation in conifers would be expected to be high because of their long generation time; similar or lower diversity would then imply that conifer population sizes are smaller than those of Arabidopsis and Z. mays. However, estimated mutation rates per year are generally higher in angiosperms than in conifers [50]. The mutation rates per year are estimated to be 1.5 × 10⁻⁸ in Arabidopsis [5] and 5.99–7.00 × 10⁻⁹ in Z. mays [51], while the corresponding estimates are 0.70–1.31 × 10⁻⁹ in Pinus spp. [52] and 1.9 × 10⁻⁹ in Cupressaceae [53]. Assuming a divergence time of, for example, 100 mya, the effect of the long generation time in conifers may be balanced by the low mutation rate.
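The reasoning in this paragraph can be made explicit with the standard neutral expectation for a diploid population, in which nucleotide diversity approximates θ = 4N_e μ_g. The symbols below (N_e for effective population size, μ_y and μ_g for the mutation rate per site per year and per generation, g for generation time in years) are notation introduced here for illustration, and the numerical line uses a round mid-range value of μ_y for Pinus purely as an example.

```latex
\begin{align}
  \pi \approx \theta &= 4 N_e \mu_g, \qquad \mu_g = \mu_y \, g \\
  \Rightarrow\quad N_e &\approx \frac{\pi}{4 \mu_y g} \\
  \text{e.g.}\quad N_e \, g &\approx \frac{5.27 \times 10^{-3}}{4 \times 1.0 \times 10^{-9}}
    \approx 1.3 \times 10^{6} \quad \text{(P. densiflora, assuming } \mu_y = 1.0 \times 10^{-9}\text{)}
\end{align}
```

For a given π, a longer generation time raises μ_g and lowers the inferred N_e, whereas a lower μ_y raises it, which is why the two effects can offset each other.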
Although nucleotide diversity was higher in P. thunbergii than in P. densiflora, haplotype diversity in P. thunbergii was similar to that in P. densiflora (Table 3). This discrepancy seems to be due to deviations from the standard neutral models (which assume neutrality and random mating) in P. thunbergii, as detected in PpINR_AS01C7 (Heat shock protein (HSP) gene), and in P. densiflora, as indicated in PtIFG_2358 (Phenylalanine tRNA synthetase gene), PpINR_AS01H04 (RuBP carboxylase gene), and Pt_3113 (Disease resistance response protein 206 gene) (Table 4). These deviations from the standard neutral model may be due to either past population structure or selection [29,54]. The Heat shock protein (HSP) and Disease resistance response protein 206 genes are thought to be related to stress or pathogen/defense responses [55,56]. As previously stated (see Experimental Section), the pine samples used here were initially collected from natural populations that showed resistance to PWN. This initial selection might have contributed to the deviation from the neutral model. However, the deviation could also be an artifact of breeding. Therefore, further investigation of genes conferring resistance to PWN is necessary.
The informativeness of a genetic marker depends on the number of alleles detected and their frequencies; this is quantified by the polymorphic information content (PIC).
The phylogeny inferred using low-copy anchor loci clearly showed that the three Japanese pines split into two lineages, one comprising P. densiflora and the other P. thunbergii–P. luchuensis (Figure 1). This result indicates that the three Japanese pines are closely related and that P. thunbergii is genetically closer to P. luchuensis than to P. densiflora. It is in accordance with a previously published karyotype study of P. thunbergii, P. densiflora, and P. luchuensis, which revealed that the karyotype of P. luchuensis resembled that of P. thunbergii more than that of P. densiflora [57]. Hizume et al. [14] compared fluorescent banding patterns of these three Japanese pines and also suggested that the three species are closely related and that P. luchuensis is more closely related to P. thunbergii than to P. densiflora. Our result is thus in accordance with these previous cytogenetic studies.

Conclusions
We present the first analysis of the amount and patterns of nucleotide diversity in the three Japanese pines, P. thunbergii, P. densiflora, and P. luchuensis. Based on low-copy anchor loci in Pinaceae, the nucleotide diversity in the three Japanese pines was at an intermediate level compared with published nucleotide diversities in conifers. Phylogenetic reconstruction resolved two lineages corresponding to P. densiflora and P. thunbergii–P. luchuensis. In conclusion, the three Japanese pines are genetically closely related, and P. thunbergii is genetically closer to P. luchuensis than to P. densiflora.
In P. densiflora, significant deviations from neutrality were detected in loci PtIFG_2358 (Tajima's D = −2.068; Fu and Li's D* = −3.371; Fu and Li's F* = −3.474), PpINR_AS01H04 (Tajima's D = 2.393; Fu and Li's F* = 1.880), and Pt_3113 (Tajima's D = −1.838), although almost all of the significant deviations disappeared after Bonferroni correction (the exceptions being Fu and Li's D* and F* in PtIFG_2358, and Fu and Li's F* in PpINR_AS01H04). Significantly positive values from the neutrality tests are indicative of balancing or diversifying selection for two or more alleles, whereas significantly negative values are indicative of negative or purifying selection against genotypes carrying the less frequent alleles, and/or of a recent population bottleneck eliminating less frequent alleles [1,29].

Figure 1. Phylogenetic relationship of Pinus thunbergii, Pinus densiflora, and Pinus luchuensis based on sequence variations derived from 15 low-copy anchor loci. Numbers beside the branches represent bootstrap (BS) values (%) from the maximum likelihood analysis. BS values < 50% are not shown. Pinus taeda and Pinus palustris were used as an out-group.

Table 1. Pinus thunbergii, Pinus densiflora, and Pinus luchuensis samples used in this study.

Table 2. Low-copy anchor loci used in this study.
a Touchdown PCR protocol (see Experimental Section); b as reported in the previously published paper describing the primer source; * numbers in parentheses indicate the PCR product length detected in Pinus densiflora.

Table 3. Summary of polymorphism within each species. P. thu, P. den, and P. luc represent Pinus thunbergii, Pinus densiflora, and Pinus luchuensis, respectively. Hd, haplotype diversity; π, nucleotide diversity. Nucleotide diversity values are × 10³. Numbers in parentheses indicate the number of haplotypes.

Table 4. Summary of the results of neutrality tests of fifteen loci. P. thu, P. den, and P. luc represent Pinus thunbergii, Pinus densiflora, and Pinus luchuensis, respectively. Significant deviations from neutrality are indicated in bold; *, significant at P < 0.05. N/A, not applicable.
The highest PIC value of an individual SNP in P. thunbergii was 0.375, at polymorphic sites 441 bp, 456 bp, 466 bp, and 471 bp of Pt_3113. In P. densiflora, the highest individual PIC value was 0.375, at polymorphic sites 48 bp of PtIFG_1584, and 182 bp and 237 bp of PpINR_AS01H04. The highest PIC value of an individual SNP in P. luchuensis was 0.374, detected at polymorphic sites 403 bp, 414 bp, 459 bp, and 464 bp of PtIFG_2274; 34 bp and 164 bp of PpINR_Pp.ap9; and 441 bp, 456 bp, 466 bp, and 471 bp of Pt_3113.