Screening of the Candidate DNA Barcodes for Three Important Amorphophallus Species Identification

Amorphophallus is widely distributed in Southeast Asia, Africa, and other places, with more than 170 species. Amorphophallus has high medicinal value and is commonly used in medicine. However, morphology-based classification of Amorphophallus and its closely related species is challenging. This study evaluated the ability of six barcodes, namely ITS2, matK, rbcL, nad1, trnH-psbA, and trnL-trnF, to identify three important Amorphophallus species: A. konjac, A. albus, and A. muelleri. We recommend trnH-psbA for use in the Amorphophallus trade to quickly assess the purity of A. konjac and A. albus material, and to distinguish A. muelleri from its related species for the genetic improvement of A. konjac and A. albus.


Introduction
The genus Amorphophallus belongs to the Araceae and consists of perennial herbs. It contains more than 170 species, mainly distributed throughout Asia and Africa [1]. Its tubers produce large amounts of konjac glucomannan (KGM), which is widely used in medical fields [2]. Studies have found that KGM has therapeutic effects on many diseases, such as gastric cancer [3], inflammatory bowel disease [4], hyperglycemia and hyperlipidemia [5,6], and skin inflammation [7]. KGM also has positive health effects, such as controlling body weight [8] and maintaining a healthy intestinal flora [9,10]. In the Amorphophallus market, the most widely traded species is Amorphophallus konjac, followed by Amorphophallus albus. A. konjac has the highest glucomannan content, followed by A. albus [11]. Amorphophallus is usually sold as dry chips, and because the chips of different Amorphophallus varieties show no morphological differences, material from different species may be mixed during processing. This can lead to several problems: (1) mixed raw materials can cause economic losses to buyers or sellers; (2) mixed raw materials used in drug production can lead to insufficient efficacy or toxic effects; (3) mixed raw materials used in drug research can lead to wrong conclusions and affect the development of new drugs. Therefore, it is vital to identify the purity of Amorphophallus raw materials. The traditional classification of Amorphophallus is mainly based on the

Sampling Strategy
A total of 25 individuals of five Amorphophallus species (four A. bulbifer accessions, five A. konjac accessions, five A. albus accessions, five A. yuloensis accessions, and six A. muelleri accessions) were used in this study. A. konjac and A. albus were collected from Enshi, Hubei, China; A. bulbifer and A. muelleri were collected from Indonesia; and A. yuloensis was collected from Yunan, China. All of these plants were transplanted to the Amorphophallus Resource Nursery of Wuhan University.
Healthy, fresh leaves of each plant were collected and dried immediately in silica gel for DNA extraction.

DNA Extraction, PCR Amplification, and Sequencing
Total genomic DNA was extracted from the dried leaves using the Plant Genomic DNA kit (Tiangen, Beijing, China) following the manufacturer's protocol. DNA concentrations were estimated and standardized on 1.0% (w/v) agarose gels. Six barcodes were used: one nuclear genome fragment (ITS2), four chloroplast genome fragments (matK, rbcL, trnL-trnF, and trnH-psbA), and one mitochondrial genome fragment (nad1). Each PCR reaction contained 10 µL of 2× TSINGKE Master Mix (TSINGKE, Beijing, China), 1.5 µL of template DNA (50 ng/µL), 6.9 µL of ddH2O, and 0.8 µL of each primer. All primers except those for nad1 had been published previously, and their amplification procedures were optimized (Table 1). PCR amplification was performed on a LifePro Thermal Cycler (BIOER, Hangzhou, China). Amplicons were sequenced from both ends using first-generation sequencing to ensure the accuracy of the data. Purification and sequencing were carried out by TSINGKE Biology Company (TSINGKE, Beijing, China).
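The reaction recipe above can be checked and scaled with a few lines of Python. This is only an illustrative sketch: reading "0.8 µL of each primer" as one forward and one reverse primer, the 10% pipetting overage, and the function names are all assumptions that do not come from the protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text.
# ASSUMPTION: "each primer" means one forward and one reverse primer.
REACTION_UL = {
    "2x master mix": 10.0,
    "template DNA": 1.5,
    "ddH2O": 6.9,
    "forward primer": 0.8,
    "reverse primer": 0.8,
}


def reaction_volume(components):
    """Total volume of a single reaction in microlitres."""
    return sum(components.values())


def scale_shared_mix(components, n_reactions, overage=0.10,
                     per_sample=("template DNA",)):
    """Volumes of the shared components for n reactions.

    The template is added per sample, so it is excluded from the shared mix.
    ASSUMPTION: a 10% overage for pipetting loss (not stated in the protocol).
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in components.items()
        if name not in per_sample
    }


if __name__ == "__main__":
    print("Single reaction (uL):", round(reaction_volume(REACTION_UL), 2))
    print("Shared mix for 25 samples (uL):", scale_shared_mix(REACTION_UL, 25))
```

Under these assumptions, the listed components sum to a 20 µL reaction, and the shared mix for the 25 samples of one barcode can be prepared once and dispensed before the template is added.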

Data Analysis
Raw sequences for each region were assembled and edited with SeqMan™, part of the LASERGENE software package (DNASTAR, Inc. https://www.dnastar.com/), and verified by a BLASTN search against GenBank. The sequences used for subsequent analysis were uploaded to GenBank, and accession numbers were obtained (Supplementary Table S1). Sequences were aligned using the ClustalW method in MEGA-X [42]. After alignment, we removed the low-quality regions at the 5′ and 3′ ends. The sequence characteristics of the six regions were analyzed with DnaSP6 [43]. We evaluated the classification efficiency of the six barcodes using a similarity-based method, a distance-based method, and a tree-based method. In the similarity-based method, we used the "Best match" and "Best close match" algorithms in TaxonDNA to test whether each query sequence was matched to a sequence from the same species; the threshold was calculated with the "Pairwise Summary" function of this software [44]. In an ideal barcode, the minimum interspecific distance should be greater than the maximum intraspecific distance, forming an obvious "barcoding gap" [45]. Therefore, we calculated pairwise genetic distances in MEGA-X [42] based on the Kimura 2-parameter (K2P) nucleotide substitution model. The Automatic Barcode Gap Discovery (ABGD) program [46] was employed to test the identification ability of the barcodes, based on K2P distance; all other settings were left at their default values. Neighbor-joining (NJ) and maximum-likelihood (ML) trees were constructed for each barcode in MEGA-X [42], based on a K2P distance model [47]. Relative support for the branches of the NJ and ML trees was assessed with 1000 bootstrap replicates. A species was considered successfully identified only if its multiple individuals formed a monophyletic clade with a bootstrap value above 70%. SPSS 22.0 was used for all statistical calculations.
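For reference, the K2P distance between two aligned sequences has the closed form d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites that differ by a transition and by a transversion, respectively. The following Python sketch illustrates the formula only; it is not the MEGA-X implementation, and the handling of gaps and ambiguous bases (pairwise deletion) and the function name are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # ASSUMPTION: sites with a gap or ambiguous base in either
        # sequence are skipped (pairwise deletion).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A <-> G and C <-> T are transitions; all other changes
        # are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term = (1 - 2 * p - q) * math.sqrt(max(1 - 2 * q, 0.0))
    if term <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return abs(-0.5 * math.log(term))


if __name__ == "__main__":
    # Toy sequences for illustration only: one transition in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

Because the correction separates transitions from transversions, a single transition and a single transversion over the same number of sites give slightly different distances.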

Barcode Universality
With the universal primers, all six barcodes were successfully amplified in all samples. A total of 150 amplicons (25 samples × 6 barcodes) were submitted for sequencing, and all were sequenced successfully except the trnH-psbA amplicon of one sample.

Species Discrimination Based on Different Analytical Methods
We used three methods to evaluate the identification ability of these barcodes, focusing on the identification of A. konjac, A. albus, and A. muelleri.
Distance-based method: In this method, we calculated the genetic distance between sequences to assess whether a barcode forms a "barcoding gap" between intraspecific and interspecific distances, thereby determining the barcode's ability to identify the species [21]. For each of A. konjac, A. albus, and A. muelleri, we calculated the pairwise distances to the other four Amorphophallus species. The maximum intraspecific distance and minimum interspecific distance of all pairs are listed in Table 4. The maximum intraspecific distance of trnH-psbA, ITS2, and trnL-trnF was less than the minimum interspecific distance in all groups. The relative distributions of intraspecific and interspecific distances for the six DNA barcodes in four groups (A. konjac; A. albus; A. muelleri; A. bulbifer and A. yuloensis) were also compared (Figure 1). The results showed that trnH-psbA and trnL-trnF formed a "barcoding gap" (Figure 1e,f), whereas the remaining barcodes could not separate intraspecific and interspecific distances. To verify the ability of the different barcodes to identify species based on genetic distance, the Automatic Barcode Gap Discovery (ABGD) program [46] was employed (Figure 2). In the initial partition, the program consistently divided all trnH-psbA sequences into four groups: (1) A. konjac; (2) A. albus; (3) A. muelleri; (4) A. bulbifer and A. yuloensis (Figure 2e).
Tree-based method [48]: The neighbor-joining (NJ) and maximum-likelihood (ML) trees constructed from Kimura 2-parameter (K2P) distances were almost the same (Figures 3 and 4). In both trees, trnH-psbA divided A. konjac, A. albus, and A. muelleri into three independent clades (bootstrap value > 70%) (Figures 3e and 4e), while matK divided A. konjac and A. albus into two independent clades (Figures 3b and 4b). In the ML tree, although trnL-trnF formed three independent clades representing A. muelleri, A. bulbifer, and A. yuloensis, respectively, it could not distinguish A. konjac from A. albus (Figure 4f).
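The barcoding-gap criterion used in the distance-based method (minimum interspecific distance greater than maximum intraspecific distance) can be expressed as a short check over a pairwise distance matrix. The sketch below is a minimal illustration, not the procedure used to produce Table 4; the function name and the toy distance values are invented for the example and are not data from this study.

```python
from itertools import combinations


def barcoding_gap(dist, labels):
    """Return (max_intraspecific, min_interspecific, has_gap).

    dist   : square, symmetric matrix of pairwise distances (list of lists)
    labels : species label of each sample, in the same order as dist
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra.append(dist[i][j])
        else:
            inter.append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need both intra- and interspecific pairs")

    max_intra = max(intra)
    min_inter = min(inter)
    # A gap exists only if every interspecific distance exceeds
    # every intraspecific distance.
    return max_intra, min_inter, min_inter > max_intra


if __name__ == "__main__":
    # Toy values for illustration only (NOT results from this study).
    labels = ["A. konjac", "A. konjac", "A. albus", "A. albus"]
    dist = [
        [0.00, 0.01, 0.05, 0.06],
        [0.01, 0.00, 0.05, 0.06],
        [0.05, 0.05, 0.00, 0.02],
        [0.06, 0.06, 0.02, 0.00],
    ]
    print(barcoding_gap(dist, labels))
```

A barcode passes this check only when the two distance distributions do not overlap at all, which is a stricter condition than a difference in their means.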

Discussion
The 100% PCR amplification rate showed that the primer regions of these six barcodes, previously found to be conserved in other plants [19,21], are also relatively conserved in Amorphophallus, which made it possible to test these barcodes.
Among the six barcodes, trnH-psbA had the largest sequence length variation, ranging from 687 to 914 bp; this may be due to the (AT) dinucleotide repeats that we observed in the trnH-psbA sequence. A similar phenomenon was reported by Zhang et al. [17], who found that the length of trnH-psbA in Lysimachia also varied widely, from 378 to 473 bp, owing to the presence of poly A/T. trnH-psbA had the highest proportion of variable sites (25.61%) and parsimony-informative sites (25.61%) in Amorphophallus, which is consistent with the reports of Azuma et al. and Hamilton et al. [49,50], who found that trnH-psbA had a high percentage of variable sites in Magnoliaceae and Lecythidaceae. Many other reports have also shown that trnH-psbA is highly variable in many plants, such as the genus Rhododendron [21], Labiatae [19], and Myristicaceae [51]. We speculate that there may be two reasons for the high variation of trnH-psbA: (1) As mentioned above, the trnH-psbA sequence has been found to contain many dinucleotide repeats in this and other studies. Because of this dinucleotide repeat region, replication slippage is more likely to occur during DNA replication [52], which makes the sequence more prone to mutation; (2) Unlike the barcodes located in coding regions (rbcL, nad1, matK), trnH-psbA is subject to neutral selection and can tolerate more variation without being eliminated, so it accumulates more DNA variation during evolution. Correspondingly, trnH-psbA also had the largest interspecific divergence (0-22.74%). This result agrees with Kress et al. [53], who demonstrated that trnH-psbA is highly variable in flowering plants, suggesting that trnH-psbA may be an ideal barcode for distinguishing species because it shows great genetic divergence among different species. Our results also differed from those of other studies.
Many researchers have found that ITS2 shows greater variation than trnH-psbA in many plants, such as Lysimachia L. [17], Parnassia [48], and Apiaceae [54]. In Amorphophallus, however, trnH-psbA was more variable than ITS2, which may indicate that trnH-psbA has evolved faster than ITS2 and the other four regions in this genus. Our study found that the noncoding sequences (ITS2, trnH-psbA, trnL-trnF) have more variable sites than the coding sequences (matK, nad1, rbcL). This is consistent with the classical hypothesis that DNA variation occurs more in noncoding regions than in coding regions, and the same phenomenon was also found by Zuo et al. [16].
We used three methods to evaluate the identification ability of the barcodes; whichever method was used, trnH-psbA always separated A. konjac, A. albus, and A. muelleri from each other. Previous studies have found that trnH-psbA also identifies species well in other plants, such as Sabia parviflora [55], Labiatae [19], Acacia [56], and Polygonaceae [27]. By contrast, matK and rbcL, the core barcodes recommended by CBOL, did not show good identification ability in this genus or in other plants [16,17,48,57,58], which may indicate that coding sequences are not suitable as general plant barcodes. Therefore, more studies are needed to find more appropriate DNA barcodes for plants.
The classification of the three Amorphophallus species in this study has two applications. First, it can be used to assess the purity of A. konjac and A. albus raw materials, which protects trade interests and ensures the purity of Amorphophallus pharmaceutical products. Second, A. muelleri can be quickly and accurately distinguished from its related species, which supports its use in the genetic improvement of A. konjac and A. albus.
Although DNA barcodes have been used in species classification and identification, some problems remain: (1) in this study, although trnH-psbA achieved our classification purpose, difficulties arose during sequencing (one trnH-psbA sample failed to be sequenced) and alignment because the region contains many (AT) repeats; (2) there is currently no standard method for assessing the classification ability of DNA barcodes [59]. Several methods often have to be used together, and their results may not be consistent. Therefore, future work should look for more suitable DNA barcodes and develop a standard method for screening them. In practical applications (such as authentication in the trade of medicinal materials), barcodes can also be combined with other molecular biology techniques, such as the Amplification Refractory Mutation System (ARMS) [60], to identify species faster.
Funding: This research received no external funding.