Development of Genomic SSR for the Subtropical Hardwood Tree Dalbergia hupeana and Assessment of Their Transferability to Other Related Species

Abstract: Dalbergia hupeana Hance (D. hupeana) is a precious hardwood tree of the genus Dalbergia. It is one of the few species of the genus widely distributed within subtropical areas and is important for timber production and forest restoration. At present, there is little published genetic information on D. hupeana. Therefore, we performed a genome survey using next generation sequencing (NGS), developed a set of novel genomic SSR (gSSR) markers from the assembled data, and assessed the transferability of these markers to other Dalbergia species in Asia. The genome survey indicated that the D. hupeana genome is about 664 Mb in size and highly heterozygous. The assembly of sequencing data produced 2,431,997 contigs, and the initial assembly of the NGS data alone resulted in a contig N50 of 393 kb with a total length of 720 Mb. A total of 127,742 perfect SSR markers were found in the assembled contigs. A total of 37 highly polymorphic and easily genotyped gSSR markers were developed in D. hupeana, and the majority of these could be successfully transferred to nine other Dalbergia species in Asia. The transferability rate of gSSR markers was highest in D. balansae, which is more closely related to D. hupeana. Seven gSSR markers could be amplified in all tested species. In addition, a preliminary assessment of the genetic diversity of three tree species in the Dalbergia genus suggested a high level of genetic diversity within populations distributed in the subtropical area of China. However, determining the global status of their genetic variation still requires further and more comprehensive assessment. Our findings will enable further studies on the genetic diversity, phylogenetics, germplasm characterization, and taxonomy of various Dalbergia species.


Introduction
The genus Dalbergia, a member of the Fabaceae family, contains over 250 species distributed across pan-tropical areas of the world [1]. Most species are economically important because their hardwood is a precious timber and because their trunk root extracts are used in medicine, e.g., as a source of flavonoids and terpenoids. For these reasons, the natural resources of Dalbergia species have been decreasing in recent decades [2]. The genus Dalbergia has been listed in CITES (Convention on International Trade in Endangered Species of Wild Fauna and Flora), and more than one-third of Dalbergia species are considered endangered or critically endangered and are included on the red list of the IUCN (International Union for Conservation of Nature), suggesting that these species face a range of threats [3,4]. In China, 29 species of the genus Dalbergia are recorded in the Flora of China; 14 of these species are endemic, and one has been introduced [5]. A few studies have focused on the genetic diversity of some Dalbergia species [6-11]; however, many species have not been studied, and the limited availability of molecular markers may be one of the primary reasons for this.
Dalbergia hupeana Hance (D. hupeana) is a large-sized tree species widely distributed in the subtropical areas of China and South Korea [12,13]. This species is a precious hardwood and a broad-leaved tree of economic and ecological value, with applications in timber production and landscape restoration. However, the natural resources of this species are decreasing, and its habitat is also becoming increasingly fragmented, according to a previous report [14]. Genomic knowledge on D. hupeana and an assessment of its genetic resources are lacking.
Molecular markers are a powerful tool for characterizing interspecific and intraspecific genetic variation [15-19]. Where the genetic diversity of a tree species has not been assessed, the main reason is often a lack of effective molecular markers [20]. The development of molecular markers is therefore a key component of the fundamental work required for forest genetic resources (FGR) conservation and management. Simple sequence repeat (SSR) markers are powerful, effective, and economical for such assessments because they are co-dominantly inherited and show good reproducibility [21,22]. The development of next generation sequencing (NGS) techniques has revolutionized de novo whole genome assembly and multi-omics in plants [23], and developing genomic or genic SSRs from NGS data is both feasible and affordable [24-26].
In this study, we aimed to obtain genomic knowledge and develop effective SSR markers for D. hupeana by using de novo sequencing to survey and characterize the D. hupeana genome. We then screened and verified a set of genomic SSR (gSSR) markers by PCR and capillary electrophoresis. Additionally, to broaden the potential scope of application of these markers, their transferability and degree of polymorphism were examined in other related species of the Dalbergia genus and in representative populations.

Plant Materials
A diploid seedling of D. hupeana collected in Anhui province was used for de novo sequencing. Information on the other Dalbergia plant materials is shown in Table 1. Fresh leaves were collected, dried, and stored in silica gel at 4 °C. Voucher specimens of the Dalbergia species were preserved at the Research Institute of Forestry, Chinese Academy of Forestry. Photographs of D. hupeana, including its leaves and fruits, are shown in Figure 1. Total genomic DNA was extracted from leaf tissue using a DNA extraction kit (NuClean Plant Genomic DNA Kit, CWBIO) according to the manufacturer's protocol. The concentration of total genomic DNA was measured using a microplate spectrophotometer (Molecular Devices, Sunnyvale, CA, USA). The quality of total genomic DNA was verified by electrophoresis on 0.8% agarose gels.

De Novo Genome Sequencing
Three paired-end genomic libraries with 350 bp inserts were constructed from a diploid seedling of D. hupeana and sequenced on the Illumina HiSeq platform (Illumina, San Diego, CA, USA), following the standard procedure of Biomarker Technologies Co., Ltd. (Beijing, China). Clean reads were obtained after filtering and correcting the sequence data, which allowed a relatively accurate estimate of the genome size, heterozygosity rate, and percentage of repetitive content.

Genome Survey and Assembly
Genome size was estimated as the total length of the sequence reads divided by the sequencing depth. The sequencing depth of D. hupeana was calculated from a k-mer (k = 21) analysis using a Lander-Waterman algorithm, and the genome size, heterozygosity rate, and percentage of repetitive content were estimated using the k-mer method [27]. The genome was assembled from the refined NGS reads using SOAPdenovo (k-mer = 55) with otherwise default parameters [28].
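The k-mer estimate can be illustrated with a short calculation. This is only a sketch of the general relationship (genome size is approximately the total number of k-mers divided by the depth of the main k-mer peak), not the pipeline actually used; the function names are hypothetical, and the input values are the k-mer count, peak depth, clean data volume, and genome size reported in the Genome Survey results. The raw ratio comes out slightly above the reported 664 Mb, presumably because the published estimate includes corrections that are not described here.

```python
def estimate_genome_size(total_kmers: int, peak_depth: int) -> float:
    """Estimated genome size in bp: total k-mer count / depth of the main k-mer peak."""
    if peak_depth <= 0:
        raise ValueError("peak_depth must be positive")
    return total_kmers / peak_depth


def fold_coverage(clean_bases: float, genome_size_bp: float) -> float:
    """Average sequencing depth: sequenced bases / genome size."""
    return clean_bases / genome_size_bp


# Values reported in the Genome Survey results (21-mer analysis).
total_kmers = 100_590_170_915
peak_depth = 151

size_bp = estimate_genome_size(total_kmers, peak_depth)
coverage = fold_coverage(120.24e9, 664e6)  # 120.24 Gb of clean data, 664 Mb genome

print(f"Raw k-mer estimate: {size_bp / 1e6:.1f} Mb")
print(f"Coverage of the 664 Mb estimate: {coverage:.0f}x")
```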

Identification and Verification of gSSR Markers
The sequence-repeat search mode in MISA software (Saxony-Anhalt, Germany. http://pgrc.ipk-gatersleben.de/misa/misa.html, accessed on 12 February 2021) was used to mine candidate SSR markers from all contigs of the draft D. hupeana genome. The following search parameters were set for discovering SSR markers: di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with a minimum of 6, 5, 5, 5, and 5 repeats, respectively. Then, primer pairs flanking the SSRs were designed using Primer Premier v5.0 (Palo Alto, CA, USA) with the following parameters: a final product length of 100-300 bp, a primer size of 18-22 bp, a GC content of 40-60%, and an annealing temperature of 55-62 °C, with an optimum annealing temperature of 60 °C. A total of 200 randomly selected candidate gSSR loci were used for preliminary screening, and the corresponding primers were synthesized by Majorbio Biotech (Beijing, China). Eight individuals of D. hupeana from different locations were selected to test the gSSR markers for polymorphism. PCR amplifications were performed in 20 µL reaction volumes containing 10 µL of 2× Taq MasterMix (Aid Lab, Beijing, China), approximately 50 ng of DNA, 5 pmol of reverse primer, 5 pmol of forward primer with the 5′ end labeled with a fluorescent dye (FAM, HEX, ROX, or TAMRA; Ruibiotech, Beijing, China), and sterile double-distilled water added to 20 µL. PCR reactions were performed with a touchdown profile: an initial denaturation at 94 °C for 5 min; 10 cycles of 94 °C for 30 s, 65 °C for 30 s, and 72 °C for 30 s, with a 1 °C decrease in the annealing temperature in each cycle; 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 10 min. The PCR products were detected on an ABI 3730XL capillary electrophoresis analyzer with a GeneScan-500LIZ size standard. Fragments were genotyped for their presence/absence at each locus, and the allele sizes were scored using GeneMarker v2.2.0 (SoftGenetics LLC, State College, PA, USA) and manually checked twice to reduce genotyping errors.
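The SSR search thresholds described above (di- to hexa-nucleotide motifs with a minimum of 6, 5, 5, 5, and 5 repeats) can be illustrated with a small regular-expression search. This is a minimal sketch and not MISA itself: the function names `find_perfect_ssrs` and `is_primitive` and the example sequence are hypothetical, and details such as compound SSRs and overlapping hits are ignored.

```python
import re

# Minimum repeat number per motif length, as stated in the text.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n))


def find_perfect_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. poly-A reported as 'AA' repeats
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical contig fragment: (AT)6 and (GAA)5 separated by spacer bases.
example = "GG" + "AT" * 6 + "CC" + "GAA" * 5 + "TT"
ssrs = find_perfect_ssrs(example)
print(ssrs)
```

Filtering out non-primitive motifs keeps a long dinucleotide run from being reported again as a tetra- or hexa-nucleotide repeat.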

Transferability to Other Dalbergia Species
Cross-species amplification was tested in another nine Dalbergia species (Table 1) using the polymorphic gSSR markers. Total genomic DNA was extracted from two individuals of each species. PCR amplification and genotyping followed the methods described in the section "Identification and Verification of gSSR Markers".

Assessment of Genetic Diversity in Different Species
Five populations in the Dalbergia genus (three representing D. hupeana, and one each for D. balansae and D. polyadelpha) were used to test the effectiveness of the developed gSSR markers in the assessment of genetic diversity. The following genetic diversity parameters were calculated using GenAlEx v6.502 (Canberra, Australia) [29,30]: the number of observed alleles (Na), the number of effective alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Shannon's information index (I), and unbiased expected heterozygosity (uHe). The polymorphism information content (PIC) of gSSR markers was calculated using CERVUS v3.0.4 (London, UK) [31].
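For readers unfamiliar with these parameters, the sketch below computes them for a single locus from diploid genotypes using their standard textbook definitions. It is an illustration only, not a reimplementation of GenAlEx or CERVUS: the function name `locus_diversity` and the example genotypes are hypothetical, and missing data are not handled.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_diversity(genotypes):
    """Per-locus diversity statistics from a list of diploid genotypes (allele pairs)."""
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = [count / (2 * n) for count in counts.values()]
    homozygosity = sum(p * p for p in freqs)          # sum of squared allele frequencies
    he = 1 - homozygosity
    return {
        "Na": len(freqs),                              # number of observed alleles
        "Ne": 1 / homozygosity,                        # number of effective alleles
        "Ho": sum(a != b for a, b in genotypes) / n,   # observed heterozygosity
        "He": he,                                      # expected heterozygosity
        "uHe": 2 * n / (2 * n - 1) * he,               # unbiased expected heterozygosity
        "I": -sum(p * log(p) for p in freqs),          # Shannon's information index
        # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
        "PIC": he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2)),
    }


# Hypothetical allele sizes (bp) for four individuals at one locus.
example_genotypes = [(150, 150), (150, 154), (154, 158), (150, 154)]
stats = locus_diversity(example_genotypes)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Averaging these per-locus values over all loci gives population-level means of the kind reported in the Results.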

Genome Survey
Three paired-end libraries with 350 bp inserts were constructed from a single individual of D. hupeana. A total of 120.24 Gb of clean data were obtained, corresponding to approximately 180-fold coverage of the estimated genome size. The Q20 and Q30 values were 97.11% and 92.16%, respectively. All clean data were used for assembly and k-mer analysis; the total number of k-mers was 100,590,170,915, and the peak of the depth distribution was at 151 for the 21-mer frequency distribution (Figure 2a). The genome size of D. hupeana was estimated as 664 Mbp, the heterozygosity rate as 1.00%, the repeat rate as 42.84%, and the average GC content as 34.93% (Figure 2b). Based on the k-mer curve distribution, a k-mer size of 55 was chosen for assembly with the default parameters in SOAPdenovo; a total of 2,431,997 raw contigs were produced and assembled into a final draft genome of 720 Mb with a contig N50 of 393 kb.
Figure 2. For the scatter plots, the X-axis represents GC content and the Y-axis represents the average depth. For the histograms, the X-axis represents the GC content and the Y-axis represents the frequency distribution of sequencing depth.

Development and Verification of gSSR Markers
After filtering, a total of 127,742 perfect SSR markers were found in the assembled contigs. The most abundant motifs were di-nucleotide repeats (88,999; 69.67%), followed by tri-nucleotide repeats (34,536; 27.03%). Tetra-nucleotide and penta-nucleotide repeats accounted for 2.41% and 0.61%, respectively. Then, 200 primer pairs were randomly selected to test the success rate of amplification, and eight individuals of D. hupeana from different sites were randomly chosen for the detection of polymorphisms. Most of the primers yielded clear amplification, with a success rate of nearly 57%. Of the 200 primer pairs, 37 revealed abundant polymorphism with clear allelic bands; the characteristics of these gSSRs are shown in Table 2. These gSSR markers were further used for population genetic analysis of three wild populations of D. hupeana, and their transferability to other Dalbergia species was tested.
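The proportions in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the helper name `percent` is hypothetical. Note that the tri-nucleotide share rounds to 27.04% here, so the 27.03% in the text may reflect truncation rather than rounding.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


total_ssrs = 127_742
motif_counts = {"di": 88_999, "tri": 34_536}  # counts given in the text

shares = {motif: percent(count, total_ssrs) for motif, count in motif_counts.items()}
polymorphic_rate = percent(37, 200)  # polymorphic primer pairs among those screened

print(shares)
print(f"Polymorphic primer pairs: {polymorphic_rate}%")
```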

Transferability of Developed gSSR Markers to Other Dalbergia Species
A total of 37 polymorphic gSSR markers were developed and verified in D. hupeana. All of these gSSR markers were then tested for amplification in nine other Dalbergia species to assess cross-species transferability. Among the 37 tested primer pairs, seven gSSR markers (Dhup14, Dhup61, Dhup114, Dhup120, Dhup139, Dhup141, Dhup164) could be amplified in all nine tested species, and two gSSR markers failed to yield an amplicon in any of the other tested Dalbergia species (Table S1). The rates of successful amplification in the other Dalbergia species ranged from 40.54% to 70.27%. The largest numbers of gSSR markers were successfully amplified in D. balansae (26), D. oliveri (23), D. polyadelpha (21), and D. cochinchinensis (20), and the smallest in D. yunnanensis (15) (Figure 3, Table S1). (Table S2).
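The transferability rates follow directly from the number of markers amplified in each species out of the 37 tested. A short sketch using only the counts given above (the variable names are illustrative):

```python
TESTED_MARKERS = 37

# Number of gSSR markers amplified per species, as listed in the text.
amplified = {
    "D. balansae": 26,
    "D. oliveri": 23,
    "D. polyadelpha": 21,
    "D. cochinchinensis": 20,
    "D. yunnanensis": 15,
}

rates = {species: round(100 * n / TESTED_MARKERS, 2) for species, n in amplified.items()}

for species, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{species}: {rate:.2f}%")
```

The highest and lowest values reproduce the reported range of 40.54% to 70.27%.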
A total of 26 of the newly developed gSSR markers were effectively amplified in the D. balansae population (n = 27), and all of them were polymorphic. The number of alleles (Na) varied from 3 (Dhup164) to 15 (Dhup108 and Dhup104), with an average of 7.038. The mean PIC value was 0.604, ranging from 0.278 (Dhup14) to 0.904 (Dhup108); 18 gSSR markers were highly informative (Table S3). The observed heterozygosity (Ho) ranged from 0.074 to 0.885, and the expected heterozygosity (He) ranged from 0.294 to 0.911, with means of 0.467 and 0.635, respectively (Table S3).
A total of 22 of the newly developed gSSR markers were effectively amplified in the D. polyadelpha population (n = 27). Of these, 21 were polymorphic; the exception was Dhup120. The number of alleles (Na) varied from 2 (Dhup78, Dhup144, Dhup181 and Dhup183) to 14 (Dhup122), with an average of 5.091 per locus. The mean PIC value was 0.471, ranging from 0.071 (Dhup70) to 0.879 (Dhup122). A total of 11 gSSR markers were highly informative in D. polyadelpha. The observed heterozygosity (Ho) ranged from 0.037 to 0.889, and the expected heterozygosity (He) varied from 0.072 to 0.889, with means of 0.399 and 0.512, respectively (Table S3).

Discussion
The Dalbergia species are good sources of precious timber; a few tree species have been identified as rosewood, and genetic research has been performed on them. The genetic diversity of endangered species and their phylogenetic relationships have been unveiled through studies using molecular markers and DNA sequences [6-11]. However, many species in the Dalbergia genus have not been investigated, especially subtropical species, which are good sources of hardwood and also face threats from climate change and human activity [32-35]. The development of effective and powerful molecular markers will facilitate research on these species. SSR markers are among the most powerful DNA markers for genetic research because of their co-dominant inheritance and their amenability to genotyping. The development of NGS techniques has made it easier to acquire genetic information and develop SSR markers. In the current study, genomic information on D. hupeana was obtained using NGS. The results indicate that the D. hupeana genome is of medium size, with high heterozygosity and a moderate proportion of repeats. A high-quality, chromosome-level reference genome could potentially be assembled by combining Hi-C technology with the most up-to-date third-generation sequencing (Circular Consensus Sequencing, CCS) [36,37]. Using the acquired sequence information, genomic SSRs were mined, characterized, and evaluated in D. hupeana. Dinucleotide repeats were the most abundant type of repeat motif in the genome. Trinucleotide repeats are also among the most abundant repeat motifs in the genes of tree and fruit species because they are less likely to lead to frameshifts in the coding sequence [38]. The PIC value is an important parameter indicating the power of a molecular marker. It is generally acknowledged that molecular markers with PIC > 0.5 are highly polymorphic [39]. Most of the developed gSSR markers show high polymorphism in D. hupeana according to the PIC value of each locus. This indicates that these gSSR markers have good potential to detect genetic variation and will be useful for further population genetic analyses of this species.
Previous reports indicate that SSR markers are transferable among closely related plant species [40,41]. In the Dalbergia genus, several SSR markers from D. nigra and D. monticola were transferable to other Dalbergia species in South America [42]. It is generally known that the amplification conditions affect the efficacy and fidelity of PCR [43]. In this study, the majority of the highly polymorphic gSSR markers could be successfully transferred from D. hupeana to other Dalbergia species in Asia. Only two gSSR markers from D. hupeana could not be amplified in any of the other nine species, and seven gSSR markers could be amplified in all tested species. The transferability rates ranged from 40% (D. yunnanensis) to 70% (D. balansae). The transfer success rate appears to be related to how closely the species are related, in accordance with the inferred phylogenetic relationships in the Dalbergia genus [34,44-46]. These results indicate high conservation of the primer binding sites in the genomes of the 10 selected Dalbergia taxa assayed in the present study, and these gSSR markers may have great potential in phylogenetic and evolutionary studies of Dalbergia populations. Finally, we selected natural populations from three Dalbergia species to assess their genetic diversity using the developed gSSRs. The results show that, compared to D. hupeana, the average number of alleles (Na) in D. balansae was similar, while that of D. polyadelpha was lower. A higher expected heterozygosity (He) may indicate abundant genetic diversity within populations of D. hupeana and D. balansae; these values were similar to those of D. oliveri (mean He = 0.73) [47]. However, Ho was lower than He, suggesting that these populations may have experienced isolation, genetic drift, or inbreeding, though further population genetic studies would be needed before drawing any conclusions.
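The gap between Ho and He can be summarized with the fixation index, F = 1 - Ho/He, where positive values indicate a heterozygote deficit. The sketch below applies this standard formula to the mean Ho and He values reported in the Results for D. balansae and D. polyadelpha. It is an illustration under the assumption that the population means are adequate inputs; a proper estimate would average per-locus values and test for significance, and the function name is hypothetical.

```python
def fixation_index(ho: float, he: float) -> float:
    """Fixation index F = 1 - Ho/He; positive values indicate a heterozygote deficit."""
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1 - ho / he


# Mean (Ho, He) per population as reported in the Results.
mean_heterozygosity = {
    "D. balansae": (0.467, 0.635),
    "D. polyadelpha": (0.399, 0.512),
}

f_values = {sp: fixation_index(ho, he) for sp, (ho, he) in mean_heterozygosity.items()}

for species, f in f_values.items():
    print(f"{species}: F = {f:.3f}")
```

Both values are positive, which is consistent with the heterozygote deficit discussed above but does not by itself distinguish between isolation, drift, inbreeding, or technical causes such as null alleles.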

Conclusions
We performed a genome survey in D. hupeana to gain knowledge on the structure of its genome. The k-mer analysis suggested that the genome of D. hupeana is medium sized and highly heterozygous. The assembled data were also utilized to screen and develop a set of novel gSSR markers, which showed high polymorphism and ease of genotyping. The majority of gSSR markers could be successfully transferred to other Dalbergia species and their populations in Asia. These findings will enable further studies on genetic diversity, phylogenetics, germplasm characterization, and taxonomy of various species of the genus Dalbergia.