Molecular Identification and Evaluation of the Genetic Diversity of Dendrobium Species Collected in Southern Vietnam

Dendrobium species have been widely used not only as ornamental plants but also as food and medicine. Identifying Dendrobium species and evaluating their genetic diversity support the conservation of the genetic resources of endemic species. Accurately identifying Dendrobium species used as medicines helps avoid misuse of medicinal herbs. However, it is challenging to identify Dendrobium species morphologically during their immature stage. DNA barcoding now makes it possible to identify species efficiently and in less time. In this study, the genetic diversity of 76 Dendrobium samples from Southern Vietnam was investigated based on the ITS (internal transcribed spacer), ITS2, matK (maturase K), rbcL (ribulose-bisphosphate carboxylase large subunit) and trnH-psbA (the intergenic spacer between the gene coding for histidine transfer RNA (trnH) and the gene coding for protein D1, a polypeptide of the photosystem II reaction center (psbA)) regions. The ITS region was found to have the best identification potential. Nineteen out of 24 Dendrobium species were identified based on the phylogenetic tree and indel information of this region. Among these, seven identified species are used as medicinal herbs. The results of this research contribute to the conservation, propagation, and hybridization of indigenous Dendrobium species in Southern Vietnam.


Introduction
Dendrobium is among the most abundant genera of flowering plants, with over 1148 known species, and ranks second in the orchid family after the genus Bulbophyllum [1]. Dendrobium is diverse in shape, color, and size, and is hence considered a favorite ornamental plant. Some Dendrobium species, such as D. densiflorum and D. chrysotoxum, are also used as medicinal herbs [2]. Many regional studies of Dendrobium diversity have been published, covering Australia [3,4], mainland Asia [5,6], China [7], Thailand [8,9], etc. These studies again confirm the rich diversity of these orchids.
The living environment of indigenous Dendrobium species in Vietnam is declining due to climate change and over-exploitation. An evaluation of genetic diversity and identification of Dendrobium

Data Analysis
FinchTV software [22] was used to read and adjust nucleotide sequences. Forward and reverse sequences were combined into consensus sequences and aligned using Seaview 4.0 [23]. The ITS2 sequence was then extracted from the ITS sequence (based on accession number JN388570.1) for analyses. The phylogenetic tree and variability parameters were calculated in MEGA 7.0 software [24] using the Maximum Likelihood algorithm with the Kimura 2-parameter model. The sequence of the orchid species Paphiopedilum delenatii was used as an outgroup to root the tree.
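As an illustration of the substitution model named above, the Kimura 2-parameter distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The following is a minimal Python sketch of that formula only; it is not the MEGA implementation, the function name is hypothetical, and skipping gapped or ambiguous sites (pairwise deletion) is an assumption rather than a setting reported in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration; assumes pairwise deletion of
    gaps and ambiguous bases.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip alignment gaps and ambiguity codes.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # For saturated sequences the logarithm argument can be <= 0,
    # in which case math.log raises ValueError (distance undefined).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For two sequences that differ by a single transition over ten compared sites, P = 0.1 and Q = 0, so the distance is -1/2 ln(0.8), slightly larger than the raw proportion of differences because the model corrects for multiple substitutions at the same site.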

Sample Collection, Amplification, and Sequencing
The 76 Dendrobium samples (Appendix A) were collected and divided into two groups: the collection of Biotechnology Center Ho Chi Minh (coded as TT) and the commercial samples (coded as DT and PN). For ITS and matK, all 76 collected samples were amplified. Since rbcL is a conserved region, only 35 samples from 30 species were amplified.
PCR success rates for the ITS and matK regions were 94.73% and 97.26%, respectively. The rbcL region had the best rate, 100%. The trnH-psbA region had the lowest PCR success rate, 82.19%, and its amplification and sequencing efficiency was low. Therefore, data from the trnH-psbA region were not included in further analyses in this study.

Genetic Diversity Based on Nucleotide Polymorphism and Phylogenetic Analyses
Seventy-six samples of 30 collected Dendrobium species were included in the survey (Appendix A). For phylogenetic analysis, sequences of Dendrobium species from our study were compared with GenBank accessions (accession numbers of GenBank sequences are shown in Appendix B). On a phylogenetic tree, individuals of the same species should cluster on the same branch, separate from other species. In general, there was no conflict among the three constructed trees. However, the ITS tree gave the best-separated branches. The ITS2 trees showed the same clusters as the ITS trees. Hence, the ITS region was analyzed as the representative marker for the divergence of Dendrobium species in Southern Vietnam.
On the ITS tree, samples of some species were grouped with their conspecific accessions from GenBank without mixing with other species, i.e., D. aloifolium, D. amabile, D. capillipes, D. chrysotoxum, D. crumenatum, D. crystallinum, D. densiflorum, D. farmeri, D. intricatum, D. parishii, D. secundum, D. sulcatum, and D. venustum. D. superbum is a synonym of D. anosmum. Hence, their sequences were intermixed for both our samples and GenBank accessions and were closely related to their sister species D. parishii. As a result, the hybrid samples of D. anosmum × D. parishii and D. anosmum × D. aphyllum were also included in the phylogenetic branch of these species. D. anosmum × D. parishii is named D. nestor, and D. anosmum × D. aphyllum is named Adastra. The separation of D. parishii from D. anosmum was also reported by Tran et al. (2018) [15].
In both the ITS and matK phylogenetic trees, our sample of D. salaccense did not cluster with the accessions of this species from GenBank. Interestingly, after searching for similar sequences in GenBank using the BLAST tool, our sample 24DT was homologous with D. hancockii at 99.71% in the ITS data and 100% in the matK data (data not shown). These two species have the same Vietnamese name, "Hoang Thao Truc". Hence, species confusion might have happened during the sampling process. The scientific name of sample 24DT was then corrected to D. hancockii.
Among the three samples of D. fimbriatum, two (22DT and 22DT2) were grouped with other D. fimbriatum accessions from GenBank, but sample 22TT was completely separated from this group. However, when compared to GenBank sequences, sample 22TT matched another conspecific accession, D. fimbriatum (MK522230.1), and was closely related to D. devonianum (Figure 1). Further inspection of the original alignment showed that the sequences of 22TT and D. fimbriatum (MK522230.1) were highly similar along their whole length and could be divided into regions: some fragments were similar to other D. fimbriatum accessions, some were similar to D. devonianum sequences, and some were distinct from all others. This result suggests that the 22TT sample is a hybrid of D. fimbriatum and D. devonianum, as these two species share the same local habitat (Appendix A). Alternatively, D. fimbriatum might have diverged along different evolutionary paths. The variety D. gatton sunray was located on the same branch as D. pulchellum in both the ITS and matK trees. D. pulchellum was crossed with D. chrysotoxum to form D. illustre; D. illustre was then backcrossed with D. pulchellum to create D. gatton sunray. As a result, the hybrid, which carries much of the genetic character of D. pulchellum, was grouped with its parent in the phylogenetic trees.
Sequences of two species, D. signatum and D. tortile, were intermixed on the same branch. In terms of floral morphology, their flowers are remarkably similar, except that the petals of D. tortile are not yellow, are more purple, and are more twisted. Hence, the molecular result was consistent with the morphological features. D. signatum is sometimes called by the synonym D. tortile var. hildebrandi (Rolfe) T. Tang and F.T. Wang (1951), which reflects their very close genetic relationship. D. hercoglossum and D. linguella are two synonymous names of one species. On all phylogenetic trees, this species was closely related to D. nobile, D. signatum, and D. tortile and could not be completely distinguished from them.
Two species, D. primulinum and D. cretaceum, which have similar morphological features, were also close in genetic characters. The most divergent species was D. devonianum: our three conspecific samples, and even the sequences of this species from GenBank, were separated into clearly different branches on all ITS, matK, and rbcL trees. Although there were not enough data to clarify this issue, the results suggest the hypothesis that D. devonianum hybridizes with other species in nature.
In summary, 28 Dendrobium species from Southern Vietnam, including three hybrids, were investigated in this study. Divergence was also found among conspecific samples, shown by different branch lengths within the same cluster, i.e., in D. amabile, D. secundum, D. capillipes, D. chrysotoxum, and D. crystallinum (Figure 1).

Potential Sequences for Identification of Dendrobium Species in Southern Vietnam
Investigating the genetic diversity of Dendrobium populations not only provides information for species management but also helps distinguish herbal materials from their adulterants, and significantly supports conservation by identifying valuable and endangered species and limiting their illegal trade. In this study, we assessed the potential of each sequence region for species identification in practical conservation. In this analysis, 24 original species were included; the three hybrids and the undetermined species D. devonianum were excluded. Twenty-three species were analyzed using matK and rbcL data since D. parishii could not be amplified. The most critical measure for evaluation was the species resolution of each region. Therefore, tree-based methods and indel information were combined to maximize resolution (Appendix C). Criteria such as variable sites, parsimony-informative sites, and singleton sites were also recorded.
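The site categories recorded above can be illustrated with a short Python sketch. It follows the usual definitions: a variable site has at least two kinds of nucleotides, a parsimony-informative site has at least two kinds that each occur at least twice, and a singleton site is a variable site that is not parsimony-informative. The function name is hypothetical, and ignoring gaps and ambiguous bases when counting is an assumption, not a setting reported in this study.

```python
from collections import Counter


def classify_sites(alignment):
    """Count conserved, variable, parsimony-informative and singleton sites.

    `alignment` is a list of equal-length aligned DNA strings.
    Hypothetical helper for illustration only.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    counts = {
        "conserved": 0,
        "variable": 0,
        "parsimony_informative": 0,
        "singleton": 0,
    }
    for column in zip(*alignment):
        # Count only unambiguous bases; gaps and N are ignored (assumption).
        bases = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if len(bases) < 2:
            # Includes columns made up entirely of gaps.
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Number of nucleotide kinds that occur in at least two sequences.
        frequent_kinds = sum(1 for n in bases.values() if n >= 2)
        if frequent_kinds >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts
```

With these definitions, every variable site is either parsimony-informative or a singleton, so the two counts sum to the number of variable sites.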
In both ITS trees, three pairs of species were not separated, i.e., D. cretaceum and D. primulinum; D. hercoglossum and D. nobile; D. tortile and D. signatum. Our examination of insertion and deletion information from their full ITS sequences revealed differences between D. cretaceum and D. primulinum at sites 86, 89, and 221-222 (aligned with the complete ITS of Dendrobium primulinum HM054747.1; Figure 2), which were absent from the shorter ITS2. D. primulinum in this study had three deletions, at sites 86, 221, and 222, and one insertion at site 89. Therefore, these two species were distinguished, and ITS could identify 19 out of 24 species (79.17%). Although less divergent, the long ITS contained more indel sites (15) than the short ITS2 (12) and was proven to be useful in previous studies [28,29]. Combining multiple loci into a single marker did not provide more species resolution. With the best match/best close match methods, ITS2 gave the highest number of correct matches, followed by ITS and matK; rbcL performed worst (Table 3).

Table 3. The identification results of the "best match/best close match" method.
The "best match/best close match" methods [30] are based on comparing the genetic distances of the analyzed sequences. A query sequence is classified as correct when its smallest distance is to a sequence of the same species. If the same smallest distance is also found to a sequence of another species, the query is classified as ambiguous. Sequences whose closest match belongs to a different species are categorized as incorrect. For the "best close match" method, a threshold value (%) is calculated from all intraspecific distances to determine the similarity of sequences. Sequences whose closest match does not fall within this threshold are classified as "no match" and are excluded before identification.
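The classification rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in this study: the function name, the data layout (a list of reference species and distances for one query), and the example distances are hypothetical. Passing no threshold gives "best match"; passing a threshold such as 0.03 (3%) gives "best close match".

```python
def classify_query(query_species, reference_distances, threshold=None):
    """Classify one query sequence by best match / best close match.

    `reference_distances` is a list of (species_name, distance) pairs giving
    the genetic distance from the query to each reference sequence.
    Hypothetical helper for illustration only.
    """
    if not reference_distances:
        return "no match"

    smallest = min(distance for _, distance in reference_distances)

    # Best close match: reject queries whose closest reference is too far away.
    if threshold is not None and smallest > threshold:
        return "no match"

    # All species that share the smallest distance to the query.
    closest_species = {
        species for species, distance in reference_distances
        if distance == smallest
    }
    if len(closest_species) > 1:
        return "ambiguous"
    if closest_species == {query_species}:
        return "correct"
    return "incorrect"
```

For example, with illustrative distances, `classify_query("D. nobile", [("D. nobile", 0.004), ("D. signatum", 0.012)])` returns `"correct"`, while the same references with the query labeled `"D. tortile"` return `"incorrect"`.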
Both the matK and rbcL regions are fairly conserved [31], with similarity levels higher than 97%, so when the 3% threshold was set, no sequence was classified as "no match". In contrast, the ITS and ITS2 sequences are highly diverse, so their results (50 and 53, respectively) were higher than those of matK and rbcL. Using "best close match" with a 3% threshold, the ITS2 region gave the highest result (48 sequences), indicating that ITS2 had the greatest identification potential among the studied regions. Therefore, the ITS and ITS2 regions were identified as potential barcodes.
In general, the results derived from the best match/best close match methods (Appendix D) were consistent with the branching of each sample on the phylogenetic trees. For instance, on the tree (Figure 1), 30PN was placed on a different branch from the group of 30DT and 30TT. The best match calculation from ITS data also reported sample 30PN (D. nobile) as incorrect, while the two remaining samples of that species, 30DT and 30TT, were correct. However, this method does not visualize the relationships among species as well as the tree-based method. For instance, we could not recognize that D. anosmum and D. superbum clustered on the same branch because they are synonymous names of the same species, nor could we recognize the closeness of D. primulinum and D. cretaceum. Hence, the best match/best close match methods were used only for a general evaluation of the identification potential of a sequence.

Discussion
ITS was also used in previous studies on the identification of Dendrobium species, some of which focused on medicinal species to distinguish herbal materials from their adulterants [13,14]. In a previous study by Tran et al. (2018) [15], 19 out of 23 Vietnamese Dendrobium species (82.61%) were identified using the ITS marker (Appendix E). In our study, 28 species were considered, of which 19 species (67.86%) were identified using the same ITS marker. Some species were identified in the study by Tran et al. (2018) but not in ours, i.e., D. anosmum and D. nobile. In contrast, two species, D. amabile and D. farmeri, were clearly separated on monophyletic branches in our study but not in the previous research. Unidentified species were those whose sequences grouped with sequences of other species, forming paraphyletic or polyphyletic branches [28]. In neither study could ITS resolve 100% of Dendrobium species. However, it was the best marker in comparison with matK and rbcL in our study. The difference in resolution largely depends on the composition of the sample data. Sixteen species from our study were not included in the study of Tran et al. (2018) and, vice versa, 11 species in their study were not in our collection. Tran and colleagues collected samples from the whole of Vietnam, mostly from the northern areas, while our study collected species from southern regions. Besides, in the study of Tran et al. (2018), the sample size was small, with 32 specimens, and most of the sampled species (15 out of 23) were examined with only one representative sample. Therefore, genetic diversity among conspecific individuals was not investigated in their study. In our study, 2 to 3 samples of each species, except for five species (D. aphyllum, D. parishii, D. salaccense, D. sulcatum, and D. tortile), were included for intra- and inter-specific genetic analyses. In short, our results and the report of Tran et al. do not contradict each other; both contribute substantially to the sequence library of native Vietnamese Dendrobium diversity.
The intergenic spacer trnH-psbA was recommended by Yao et al. (2009) for the identification of 15 Dendrobium species [32] due to the high divergence of its sequences. In our study, this region was more difficult to amplify than the other regions. The amplification rate was just 82.19% after repetition. This problem was consistent with the previous report of Gigot et al. (2007). trnH-psbA is thought to contain many tandem mononucleotide repeats, which result in high levels of length variation and cause problems in amplification, bidirectional sequencing, and alignment [33].
The matK and rbcL markers were used for this orchid group by Asahina  Among those barcoding regions, ITS was the most commonly used [2,8,9,14,15,25,36-39]. Our results again confirmed the effectiveness of ITS in the evaluation of genetic diversity and the identification of Dendrobium species not only in Southern Vietnam but also in other habitats.

Conclusions
The ITS2 region has the highest level of genetic diversity among the surveyed regions. The ITS region, in particular, has more indels, which increases its ability to identify species. In general, both ITS and ITS2 have the most potential for the assessment of genetic diversity and the identification of Dendrobium species in Southern Vietnam. In this study, 19 Dendrobium species were recognized, many of which have high levels of intraspecific diversity. Some species with easily confused morphological characteristics have also been redefined for accuracy based on molecular sequences. This research has contributed to increasing the data in the sequence library of Dendrobium of Vietnam and the world. Also, two species with very similar morphologies, D. primulinum (used as a medicinal herb) and D. cretaceum, can be distinguished, which avoids confusion when using these species as medicinal herbs.

Appendix D
Table A4. Sequence identification results based on best match/best close match methods.