Chemotaxonomic Classification of Peucedanum japonicum and Its Chemical Correlation with Peucedanum praeruptorum, Angelica decursiva, and Saposhnikovia divaricata by Liquid Chromatography Combined with Chemometrics

The roots of Peucedanum japonicum (Apiaceae) have been used as an alternative to the roots of Saposhnikovia divaricata (Apiaceae) to treat common cold-related symptoms in Korea. However, a variety of Peucedanum species, including the roots of P. praeruptorum or Angelica decursiva (=P. decursivum), have been used to treat phlegm–heat-induced symptoms in China. Hence, as there are differences in the medicinal application of P. japonicum roots between Korea and China, chemotaxonomic classification of P. japonicum was evaluated. Sixty samples derived from P. japonicum, P. praeruptorum, A. decursiva, and S. divaricata were phylogenetically identified using DNA barcoding tools, and chemotaxonomic correlations among the samples were evaluated using chromatographic profiling with chemometric analyses. P. japonicum samples were phylogenetically grouped into the same cluster as P. praeruptorum samples, followed by S. divaricata samples at the next cluster level, whereas A. decursiva samples were widely separated from the other species. Moreover, P. japonicum samples showed higher chemical correlations with P. praeruptorum samples or A. decursiva samples, but lower or negative chemical correlations with S. divaricata samples. These results demonstrate that P. japonicum is more genetically and chemically relevant to P. praeruptorum or A. decursiva and, accordingly, the medicinal application of P. japonicum might be closer to the therapeutic category of these two species than that of S. divaricata.


Introduction
Bang-Pung (Saposhnikoviae Radix), a traditional herbal medicine, has been used to treat common cold-induced disorders and relieve pain, and it originates from the roots of Saposhnikovia divaricata (Turcz.) Schischk. (Apiaceae) in Korean and Chinese pharmacopieas [1,2]. The roots of Peucedanum japonicum Thunb. (Apiaceae) are exclusively registered as "Sik-Bang-Pung" (Peucedani Japonici Radix; "Sik" means cultivated) in the Korean herbal pharmacopeia [3]. As the roots of P. japonicum are recognized as an alternative to Bang-Pung (the roots of S. divaricata), it is interesting that two different herbs belonging to two separate genera are used for the same medicinal purpose. Another notable issue is that the plant P. japonicum and its roots are botanically and medicinally named "Bin-Hae-Jeon-Ho" (bin hai qian hu; "Bin-Hae" means the plant lives near coastal areas), a type of "Jeon-Ho" in the Chinese literature [4,5]. Jeon-Ho, a medicinal name of Peucedani Radix, has been used to treat phlegm-and heat-induced respiratory disorders profiling' method for chemically distinguishing between Atractylodes species and Amomum species, using ITS sequencing, HPLC analysis, and chemometrics [24][25][26].
In this research, we collected samples of the aforementioned four species and confirmed their genetic authentication using ITS sequence-based DNA barcoding analysis to establish the chemotaxonomic relevance of P. japonicum (PJ), S. divaricata (SD), P. praeruptorum (PP), and A. decursiva (AD). Thereafter, chemical profiling of the samples was carried out using an HPLC-diode array detector (HPLC-DAD) and chemotaxonomic classification was performed by chemometric analysis.

Internal Transcribed Spacer Regions of Nuclear Ribosomal Cistron
The nucleotide sequences of the ITS (ITS1, ITS2, include 5.8 s) were used for identification of the species of distributed Peucedani Radix, Peucedami Japonici radix and Saposhnikoviae Radix. Around 689-693 bp amplified nucleotide sequences were determined, based on the samples listed in Tables 1 and 2. The determined sequences were confirmed using the Blast in NCBI GenBank data. In total, 44 site differences were observed among the four species shown in Tables 1 and 2 (Table 3). A difference in 37 sites was observed between two species of Peucedani Radix (PJ and AD) with a 0.94 sequence identity matrix (Table S1). The two same genus samples, PP and PJ, had 16 site differences (sequence identity matrix 0.975; Table S1). SD and PJ had nine different sites (sequence identity matrix 0.986; Table S1). The results indicated that the ITS could be used as a clear and useful tool to identify the species of medicinal herbs in Tables 1 and 2.

Chloroplast Genome-Based DNA Barcode Sequence Analysis
Four chloroplast DNA barcode regions (rbcL, marK, psbA-trnH, and trnL-F intergenic spacer) were analyzed to supplement the results of the ITS. The psbA-trnH region had the most variable sites among the four plastid barcode regions (except the ITS region). On the other hand, the trnL-F intergenic spacer was the most conserved region among all of the analyzed DNA barcode regions (including ITS).
Although four analyzed plastid loci had less abundant variable sites and were highly conserved compared to the ITS region, it could be useful to separate the species used in this study.
To obtain more detail, a 390F/1326R primer set was used to amplify the matK, and the 933-base partial nucleotide sequences were determined. The total number of variable sites was 13; however, some of them were highly conserved. In particular, there were only two site differences between PP and PJ. Nevertheless, matK could distinguish four species sufficiently.
For the rbcL with a rbcL a-f/724R primer set, 670-base partial nucleotide sequences were determined in all samples listed in Tables 1 and 2. Six variable sites were found, which was a well-conserved region among the five DNA barcode regions. The sequence identity matrix of rbcL was 0.992-1.000. In the rbcL region, PP and PJ showed identical nucleotide sequences (sequence identity matrix 1, Supplementary Materials Table S1). Except for this case, other species could be identified by comparing rbcL nucleotide sequences.
The psbA-trnH with a trnH2/psbAF primer set showed different lengths of amplified product dependent on the species, and around 311-345 base partial nucleotide sequences were determined. The psbA-trnH had the shortest aligned length among the five DNA barcode regions but showed the most variable sites (17) among the four chloroplast barcode regions, and had several indel sequences. Therefore, the sequence identity matrix ranged from 0.904 to 0.944. Between PP and PJ, the closest result was 0.944.
A trnL-e/trnL-f primer set was used for the trnL-F intergenic spacer, and 444-base nucleotide sequences were identified. As mentioned earlier, the trnL-F intergenic spacer was the most conserved region, and the sequence identity matrix ranged from 0.99 to 1.00. Four variable sites were observed in AD alone, and the other three species showed the same sequence.
A single region chloroplast DNA barcode analysis approach and be inaccurate in determining the species' identity. Therefore, the Consortium for the Barcode of Life (CBOL)-Plant working group also recommends two-or more locus combinations to initiate the barcode process for plant species. The recommended standard combination locus is rbcL-matK, because this combination is a practical solution to the complex balance between universality, sequence quality classification, and cost. [27]. However, this combination is not always efficient and, in this case, three-or more locus analyses are necessary. Therefore, the psbA-trnH and trnL-F intergenic spacer regions were additionally selected for analysis in this study. Both loci were quite short, which suggests that they could practically be used for the analysis of processed distributed samples. Unfortunately, the trnL-F intergenic spacer regions of the four species shown in Tables 1 and 2 were highly conserved. In contrast, the psbA-trnH is already known as one of the most variable genome segments in the angiosperm chloroplasts. As expected, it had most variable sites in this study, but it also had many indel sequences. Therefore, single-locus analysis was not efficient.

Phylogenetic Analysis
To analyze the genetic relationship among the four species used in this study, the PhyML + SMS (Maximum likelihood-based inference of phylogenetic trees with Smart Model Selection) program was performed with concatenated nucleotide sequences of ITS and four chloroplast DNA barcodes. Four species commonly located in the Selineae tribe of the Apiacae family are clustered into two groups, genus Angelica and genus Peucedanum (including genus Saposhnikovia) ( Figure 1). The results of genetic flexibility analysis of the four species show that PP and PJ are very close. Comparing the results with the plants of genus Peucedanum, PP and PJ, shows that they are not only the same genus, but genetically closer to other species. AD is located in the genus Angelica group and is relatively far in genetic distance compared to the other three species.

Chromatographic Profiling of PJ, PP, AD, and SD Samples
HPLC analytical conditions, including the mobile phase modifier, mobile phase composition, and UV detection wavelength, were optimized for the chromatographic analysis of the samples. The addition of 0.1% trifluoroacetic acid (TFA; v/v) in water with acetonitrile was chosen owing to better inter-peak separation and clearer detection of peaks under the acidic mobile phase in the mobile phase without TFA. The UV detection wavelengths were selected based on the optimal absorbance of each peak as follows: 4 peaks at UV 235 nm, 13 peaks at UV 250 nm, 14 peaks at UV 275 nm, 4 peaks at UV 300 nm, 10 peaks at UV 310 nm, 46 peaks at UV 325 nm, 7 peaks at UV 323 nm, and 2 peaks at UV 350 nm (Supplementary Materials Table S2).

Chromatographic Profiling of PJ, PP, AD, and SD Samples
HPLC analytical conditions, including the mobile phase modifier, mobile phase composition, and UV detection wavelength, were optimized for the chromatographic analysis of the samples. The addition of 0.1% trifluoroacetic acid (TFA; v/v) in water with acetonitrile was chosen owing to better inter-peak separation and clearer detection of peaks under the acidic mobile phase in the mobile phase without TFA. The UV detection wavelengths were selected based on the optimal absorbance of each peak as follows: 4 peaks at UV 235 nm, 13 peaks at UV 250 nm, 14 peaks at UV 275 nm, 4 peaks at UV 300 nm, 10 peaks at UV 310 nm, 46 peaks at UV 325 nm, 7 peaks at UV 323 nm, and 2 peaks at UV 350 nm (Supplementary Materials Table S2).
The intraday precision of the sample was <0.2% for retention time and <4.0% for the absolute peak area, and interday precision was <0.1% for retention time and <3.0% for the peak area (Supplementary Materials Table S3).
The overlapping intra-species chromatograms showed similar patterns within the samples of each species, except for a few outliers in the AD and SD samples, whereas those of inter-species comparisons showed slightly different patterns among PJ, PP, and AD samples, with a few identical peaks for PJ and AD samples after a retention time of The intraday precision of the sample was <0.2% for retention time and <4.0% for the absolute peak area, and interday precision was <0.1% for retention time and <3.0% for the peak area (Supplementary Materials Table S3).
The overlapping intra-species chromatograms showed similar patterns within the samples of each species, except for a few outliers in the AD and SD samples, whereas those of inter-species comparisons showed slightly different patterns among PJ, PP, and AD samples, with a few identical peaks for PJ and AD samples after a retention time of 40 min. The chromatographic patterns of the SD samples were distinguishable from those of the PJ, PP, and AD samples over the entire retention time (Figures 2 and S1). Among the peaks commonly occurring in more than two species, the average peak areas of many peaks were significantly different between two species as follows: 26   Among the peaks commonly occurring in more than two species, the average peak areas of many peaks were significantly different between two species as follows: 26 peaks between PJ and AD samples (peaks 1, 3, 4, 14 Figure S2).

Clustering Analysis of the Samples Using Chemometric Statistical Methods
The chemotaxonomic correlations between PJ samples and PP, AD, and SD samples were measured using chemometric clustering tools, principal component analysis, and k-means clustering analysis. As shown in Figure 3, the samples of each species formed distinct clusters based on their own origins in the principal component (PC) scores, PC1 and PC2, which explained 21.7% and 18.7% of the total variance, respectively. PJ, PP, and AD samples were densely located in their own species groups which were clearly separated from each other. Although the SD samples also formed clusters, they were distributed widely in the SD group, resulting in the interruption of a few samples (SD02 and SD03) in the PP group. The PJ samples were located closer to the PP samples than to the SD samples, whereas the AD samples were exceptionally different from the other sample groups in terms of PC1 and PC2 scores.

Clustering Analysis of the Samples Using Chemometric Statistical Methods
The chemotaxonomic correlations between PJ samples and PP, AD, and SD samples were measured using chemometric clustering tools, principal component analysis, and kmeans clustering analysis. As shown in Figure 3, the samples of each species formed distinct clusters based on their own origins in the principal component (PC) scores, PC1 and PC2, which explained 21.7% and 18.7% of the total variance, respectively. PJ, PP, and AD samples were densely located in their own species groups which were clearly separated from each other. Although the SD samples also formed clusters, they were distributed widely in the SD group, resulting in the interruption of a few samples (SD02 and SD03) in the PP group. The PJ samples were located closer to the PP samples than to the SD samples, whereas the AD samples were exceptionally different from the other sample groups in terms of PC1 and PC2 scores. In the k-means clustering analysis, four clusters were selected as the optimal number of clusters, which was determined using the silhouette method ( Figure S3). The distribution of samples according to their dimension scores was consistent with those in the PC plot. However, the grouping of samples was different, as follows: PP01, PP26, SD02, and SD05 samples were contained in the PJ cluster and PP27 was contained in the SD cluster. Hence, the PJ cluster was intruded by the PP cluster which also slightly overlapped with the SD cluster. The AD cluster also showed a distinct distance from the other clusters, as shown in the PC plot ( Figure 4). In the k-means clustering analysis, four clusters were selected as the optimal number of clusters, which was determined using the silhouette method ( Figure S3). The distribution of samples according to their dimension scores was consistent with those in the PC plot. However, the grouping of samples was different, as follows: PP01, PP26, SD02, and SD05 samples were contained in the PJ cluster and PP27 was contained in the SD cluster. Hence, the PJ cluster was intruded by the PP cluster which also slightly overlapped with the SD cluster. The AD cluster also showed a distinct distance from the other clusters, as shown in the PC plot ( Figure 4). The results of clustering analyses demonstrate that the PJ samples showed better chemical proximity with the PP samples than the SD samples due to the PC (=dimension score in k-means) scores of the PJ samples being closer to those of the PP samples than the SD samples [28,29].

Evaluation of Similarity between the Samples Using Pearson's Correlation Coefficient
The similarity among the PJ, PP, AD, and SD samples was evaluated using Pearson's correlation coefficient (r), which ranged from −1 to +1 (Table S4). In the correlation plot, the PJ samples showed a higher correlation with the AD samples (r = 0.791 for mean and 0.976 for median), followed by most PP samples (r = 0.029 for mean and 0.021 for median), and were negatively correlated with the SD samples (r = −0.099 for mean and −0.097 for median). The correlations between the PP and AD samples (r = −0.011 for mean and −0.014 for median), PP and SD samples (r = −0.038 for mean and median), and AD and SD samples (r = −0.061 for mean and −0.060 for median) were not different. The mean and median r values also showed that the intra-species relationships were as follows: PJ-PJ > PP-PP = AD-AD > SD-SD. Meanwhile, inter-species relationships were as follows: PJ-AD > PJ-PP > PP-AD > PP-SD > AD × SD > PJ-SD ( Figure 5 and Table 4). The results of clustering analyses demonstrate that the PJ samples showed better chemical proximity with the PP samples than the SD samples due to the PC (=dimension score in k-means) scores of the PJ samples being closer to those of the PP samples than the SD samples [28,29].

Evaluation of Similarity between the Samples Using Pearson's Correlation Coefficient
The similarity among the PJ, PP, AD, and SD samples was evaluated using Pearson's correlation coefficient (r), which ranged from −1 to +1 (Table S4). In the correlation plot, the PJ samples showed a higher correlation with the AD samples (r = 0.791 for mean and 0.976 for median), followed by most PP samples (r = 0.029 for mean and 0.021 for median), and were negatively correlated with the SD samples (r = −0.099 for mean and −0.097 for median). The correlations between the PP and AD samples (r = −0.011 for mean and −0.014 for median), PP and SD samples (r = −0.038 for mean and median), and AD and SD samples (r = −0.061 for mean and −0.060 for median) were not different. The mean and median r values also showed that the intra-species relationships were as follows: PJ-PJ > PP-PP = AD-AD > SD-SD. Meanwhile, inter-species relationships were as follows: PJ-AD > PJ-PP > PP-AD > PP-SD > AD × SD > PJ-SD ( Figure 5 and Table 4). Molecules 2022, 26, x FOR PEER REVIEW 11 of 17    The average r values of the samples of each species relative to the individual samples of other independent species exhibited diverse intra-and inter-sample variations compared to the rest of the species samples, that is, higher correlations for PP01-PJ samples, PP01-and PP27-AD samples, and PP26-SD samples and lower correlation for AD01-PJ samples, PP01-PP samples, AD01-PP samples, AD01-AD samples, and SD10-SD samples (Figures S4-S7).
Positive and higher coefficient (r) values between the PJ and the AD samples indicate a stronger inter-species correlation, as compared with the weaker inter-species correlation between the PJ and the PP samples, with lower r values. Negative and weaker r values between the PJ and the SD samples indicated that their inter-species similarity was lower than those with positive r values [30,31].
The chemotaxonomic classification of PJ, PP, AD, and SD samples based on phylogenetic authentication was successfully evaluated using chromatographic profiling combined with chemometric analysis, and their chemical characteristics were clearly reflected by their own species. The chemical correlation between PJ and the other three species was also investigated, and it was observed that PJ had a strong chemical relationship with PP and AD, depending on the measured chemometric analyses; however, its chemical relationship with SD was less or even negligible.
As the therapeutic activity of herbal medicines can possibly be explained by their chemical constituents, the chemotaxonomic closeness between PJ and PP or AD might provide evidence for their analogous medicinal application to reduce phlegm and heatinduced respiratory symptoms. Meanwhile, the chemotaxonomic distance between PJ and SD presumably indicates their therapeutic dissimilarity with respect to relieving common cold-induced disorders. Further pharmacological and clinical studies are required to confirm these chemotaxonomic results.
Nonetheless, there are a few limitations to this study, as follows: (1) sample numbers for AD or SD were insufficient, mainly owing to difficulties in collecting genuine species; (2) intra-species chemical variation in SD samples was prominent, presumably owing to diverse habitats [32,33], and this could be caused by environmental factors, including temperature, dryness/humidity, rainfall, soil conditions, and altitude, which definitely affect the production of secondary metabolites in herbal plants [34][35][36]; (3) insufficient research on chemical classifications among the aforementioned four species.
Sixteen samples of PJ, twenty-seven samples of PP, seven samples of AD, and ten samples of SD were collected from agricultural plantations, natural habitats, and markets in Korea and China, and were also supplied by the Korea Institute of Oriental Medicine ( Table 1). The collected samples were morphologically authenticated by the authors (J.H. Kim and G. Lee). For genetic identification, 20 additional voucher samples were used as standard reference samples ( Table 2). All voucher specimens (code No. PNUKM-2021-PJ01-PJ16, PP01-PP27, AD01-AD07, and SD01-SD10) and extracted genomic DNA were deposited at the herbarium of the College of Korean Medicine in Wonkwang University and at the School of Korean Medicine, Pusan National University.

Preparation of Genomic DNA
A NucleoSpin ® Plant II kit (Macherey-Nagel, Dueren, Germany) with PL1 lysis buffer was used to extract the genomic DNA of the samples in Tables 1 and 2. Some samples required extra steps to treat 10% cetyltrimethylammonium bromide with 0.7 M NaCl added to remove high levels of phenolic compounds and/or polysaccharides.

Determination of DNA Sequences of PCR Product
All of the PCR products were sub-cloned after being separated from agarose gels by use of a TOPcloner™ TA Kit (Enzynomics, Daejeon, Korea). The DNA sequences were then determined through an interpretation performed by Bioneer (Daejeon, Korea). To improve the accuracy of the results, the DNA barcode analysis was repeated three times, independently from the genomic DNA extraction stage.

Analysis of DNA Sequences and Preparation of Dendrogram
The determined DNA sequences were first analyzed using Bioedit's ClustalW multiple sequence alignment (Bioedit, v7.0.9; available from http://www.mbio.ncsu.edu/BioEdit/ page2.html, accessed on 28 November 2021) and reconfirmed with multiple sequence alignment using MAFFT (MAFFT, v7; available from https://mafft.cbrc.jp/alignment/server/, accessed on 29 November 2021) [42]. To confirm the polymorphism represented by IUPAC symbols, all sequences were identified at least twice using a chromatogram of nucleotide sequences provided by the Bioneer sequencing service. The phylogenetic tree for the ITS region was analyzed using MAFFT (multiple alignment, v7.407_1), BMGE (alignment curation, v.1.12_1) [43] and PhyML (tree inference based on the maximum-likelihood, v.3.1_1) [44,45] workflow (PhyML/OneClick, available from https://ngphylogeny.fr/, accessed on 3 December 2021). Phylogenetic analysis of the combined five DNA barcodes (ITS and four plastid regions) was followed by the PhyML+SMS/OneClick method, which was performed in accordance with MAFFT, BMGE, and PhyML+SMS (maximum likelihood-based inference of phylogenetic trees with Smart Model Selection, available from https://ngphylogeny.fr/, accessed on 8 December 2021) [46]. All sequence results completed in the analysis were compared and confirmed with the NCBI GenBank using Blast [47]. The reference data ITS nucleotide sequences for analysis of the genetic relationships comes from NCBI Genbank and were represented by the accession number in phylogenetic tree. The Chloroplast barcode data were also collected from NCBI Genbank, including MT671397.

Preparation of Samples for HPLC Analysis
The pulverized PJ, SD, PP, and AD samples were homogenized using a 500 µm testing sieve (Chunggyesanggong-sa; Gunpo, Gyeonggi, Korea). The extraction was performed using a powder of 500 mg with 5 mL of methanol using an ultrasonic extractor (Power Sonic 520; Hwashin Tech, Daegu, Korea) for 30 min. The extract, filtered through a 0.2 µm syringe filter (BioFact, Daejeon, Korea), was evaporated using a nitrogen-blowing concentrator (MGS2200; Eyela, Miyagi, Japan) and was re-dissolved in HPLC-grade methanol at a concentration of 20,000 µg/mL prior to HPLC injection.
The precision of the analytical methods was determined by analyzing the retention times and absolute areas of selected peaks of PJ samples within one day (intraday precision) and over three consecutive days (interday precision). Precision was represented as relative standard deviations (RSDs), where RSD (%) = ((standard deviation/mean value) × 100).

Chemometric Statistical Analysis
The genetically identified samples of PJ, PP, AD, and SD were analyzed using HPLC, and the chemical relationship between the samples was evaluated using chemometric tools, that is, principal component analysis, k-means cluster analysis, and Pearson's correlation analysis. In total, 100 peaks were selected as profiling peaks, which were >1.0% of the total peak area. Sixty samples and the absolute area of each profiling peak were used as a matrix for construction of the PC plot, the k-means cluster plot, and for the calculation of Pearson's correlation coefficient. The difference between the absolute area of each peak from the samples of independent species was evaluated using Tukey's test, with significance at p < 0.05, p < 0.01, and p < 0.001. Chemometric analyses and Tukey's test were conducted using the open source software R (v. 4.1.2; The R Foundation for Statistical Computing).

Conclusions
Overall, 60 samples of PJ, PP, AD, and SD were phylogenetically authenticated using ITS and chloroplast genome-based DNA barcoding analysis at the species level. Chemotaxonomic classification of PJ and its chemical correlation with the remaining three species were investigated using chromatographic profiling with chemometric analyses. PJ samples, which showed the closest phylogenetic relationship with PP samples, showed a stronger chemical correlation with PP or AD samples but a weaker or even negative chemical correlation with SD samples. The results from the phylogenetic analysis-hyphenated chemotaxonomic correlation suggested the transfer of PJ from SD to the category of PP or AD for medicinal applications. Further pharmacological and clinical study would be necessary to support the chemical re-categorization of PJ.
Supplementary Materials: The following supporting information can be downloaded online, Figure S1: Chromatograms of the samples of Peucedanum japonicum, P. praeruptorum, Angelica decursiva, and Saposhnikovia divaricata; Figure S2: Multiple comparison of absolute peak areas among the samples; Figure S3: Silhouette plot; Figures S4-S7: Average Pearson's correlation coefficients among the samples; Table S1: Sequence identity matrix; Table S2: Retention times and detection wavelengths of profiling peaks; Table S3: Precision of profiling peaks; Table S4: Pearson's correlation coefficients among the samples.