Classification and Authentication of Lonicerae Japonicae Flos and Lonicerae Flos by Using 1H-NMR Spectroscopy and Chemical Pattern Recognition Analysis

Lonicerae japonicae flos and Lonicerae flos are increasingly used in food and traditional medicine products around the world. Because of their high demand and similar appearance, they are often confused or adulterated; therefore, a rapid and comprehensive analytical method is required. In this study, a comparative analysis of 100 samples of different species, growth modes, and processing methods was carried out by proton nuclear magnetic resonance (1H-NMR) spectroscopy and chemical pattern recognition analysis. The obtained 1H-NMR spectra were analyzed by principal component analysis (PCA), partial least-squares discriminant analysis (PLS-DA), orthogonal partial least-squares discriminant analysis (OPLS-DA), and linear discriminant analysis (LDA). After dimensionality reduction of the data, LDA exhibited good classification ability for the species, growth modes, and processing methods. The sample prediction accuracy for the testing set and the cross-validation accuracy of the LDA models were at least 95.65% and 98.1%, respectively. In addition, the results showed that macranthoidin A, macranthoidin B, and dipsacoside B could be considered the main differential components of Lonicerae japonicae flos and Lonicerae flos, while secoxyloganin, secologanoside, and sweroside could be responsible for distinguishing cultivated and wild Lonicerae japonicae flos. Accordingly, 1H-NMR spectroscopy combined with chemical pattern recognition gives a comprehensive overview and provides new insight into the quality control and evaluation of Lonicerae japonicae flos.


Introduction
Lonicerae japonicae flos (LJF), the dried bud or newly opened flower of Lonicera japonica Thunb., is not only one of the most widely used traditional Chinese medicines (TCMs) but has also been used as a food and dietary supplement in China for thousands of years [1,2]. In clinical settings, LJF is commonly used in the treatment of carbuncles, furuncles, fever, and so on for its heat-clearing and detoxifying functions [3]. Moreover, it is an important constituent of various traditional Chinese medicine preparations [4]. More than 500 frequently used preparations contain LJF [5,6]. Many of them, such as the Lianhuaqingwen capsule/granule, have shown satisfactory efficacy against severe acute respiratory syndrome in 2003, influenza A/H1N1 in 2009, and especially the novel coronavirus disease (COVID-19) that broke out in 2019 [7,8,9].
Drug and food adulteration refers to the fraudulent and purposeful substitution or addition of other substances to a product to enhance its apparent value or reduce its cost. With the increasing prevalence of adulteration, modern detection and quality control methods have become more and more important [10,11]. Owing to the excellent pharmacological activities of LJF, its consumption is increasing considerably. The phytochemical constituents of LJF are species-dependent and can vary significantly with geographical location, growth mode (wild-harvested or cultivated), processing method, and cultivation method. However, Lonicerae flos (LF, Lonicera macrantha (D.Don) Spreng.) is often found as a substitute or adulterant in LJF for historical or commercial reasons. Since 2005, the Chinese Pharmacopoeia has listed LJF and LF as two independent items based on their different plant morphology, chemical composition, and medicinal properties [12]. It has been reported that LF is rich in saponins, which can cause an immediate hypersensitivity reaction when administered by injection [13]. Thus, for the safety of clinical medication, especially injections, LJF must not be substituted or adulterated with LF [14,15]. However, because the two medicines are morphologically similar, they cannot be easily and clearly distinguished. Even worse, LF is often used instead of LJF because of its high yield and low price. It is therefore essential to solve the problem of the mixed use of LJF and LF to ensure efficacy and safety in clinical settings.
Over the past few decades, there have been many advances in the quality control of LJF and LF. For example, Shi et al. established a quality control method for LJF and LF using the chemical fingerprints of seven major compounds and antibacterial effects based on ultra-high performance liquid chromatography (UHPLC) and microcalorimetry [1]. Zhao et al. developed rapid screening and quantitative analysis methods for adulterant LF in LJF by Fourier-transform near-infrared spectroscopy (FT-NIR) and chemometrics [16]. Cai et al. provided a quality evaluation approach for LJF and LF based on the contents of 50 bioactive constituents determined by UFLC-QTRAP-MS/MS and multivariate statistical analysis [17]. Moreover, Gu et al. [18] and Xie et al. [19] provided tools to discriminate LJF according to species, growth modes, processing methods, and geographical origin with UHPLC and chemical pattern recognition. Most of the methods mentioned above used the contents of several active substances as the evaluation standard, which limits their ability to reflect the overall quality of herbal medicines. However, a traditional Chinese medicine should be treated as a whole rather than evaluated by one or a few main marker compounds. Thus, a new comprehensive, effective, and environmentally friendly method for the quality control of TCM is needed. Nuclear magnetic resonance (NMR) spectroscopy, a modern and accurate analytical technique, relies on the response of all proton-bearing compounds in the sample to achieve qualitative and quantitative analysis. It offers several advantages over other methods: analysis is fast, sample preparation is simple, the technique is highly robust, reproducible, and specific, and it requires neither specific reference substances nor chromatographic separation [20]. NMR can collect the chemical information of a TCM sample as a whole, which is consistent with the holistic idea of TCM and makes it a more comprehensive and effective tool for quality control [21]. In the past few years, NMR has been widely used in the quality control [22] and authenticity identification of many kinds of TCM [23,24]. However, to the best of our knowledge, there are no reports on the identification of LJF and LF by combining 1H-NMR spectroscopy with chemical pattern recognition.
In this study, the application of 1H-NMR spectroscopy coupled with chemical pattern recognition analysis to the profiling of LF and of LJF samples with different growth modes and processing methods is reported. All of the data obtained from the 1H-NMR spectra were used to establish the chemometric models. Partial least-squares discriminant analysis (PLS-DA) and orthogonal partial least-squares discriminant analysis (OPLS-DA) were applied to screen out the specific variation. The S-lines helped to identify the main differential components of the different species and growth modes. Linear discriminant analysis (LDA) based on the specific variation exhibited good classification ability for samples of different species, growth modes, and processing methods. The established method is fast, environmentally friendly, and reproducible and can serve as a comprehensive quality evaluation method for LJF and LF without specific reference substances.

Optimization of Sample Preparation
To obtain more useful chemical information, the conditions for the cultivated Lonicerae japonicae flos (cLJF) samples were optimized by comparing NMR solvents (CD3OD, (CD3)2CO, (CD3)2SO, and C5D5N). The results suggested that CD3OD yielded the most chemical information among the solvents tested. Therefore, CD3OD was selected as the optimum solvent and applied equally to wild Lonicerae japonicae flos (wLJF) and LF (Figure 1). Six reference standards (macranthoidin A, macranthoidin B, dipsacoside B, secologanoside, secoxyloganin, and sweroside) were also measured under the same experimental conditions as the samples. Their 1H-NMR spectra are shown in Figure 2.
Molecules 2023, 28, x FOR PEER REVIEW

Identification and Analysis of Different Species and Growth Modes

Principal Component Analysis (PCA)
To reduce the dimensionality of the multivariate data while preserving most of the variance within them, PCA was used; it is an unsupervised method that requires no prior knowledge of the class membership of the samples. In this study, PCA was performed on the data matrix of 100 (samples) × 200 (integrated values) using SIMCA-P 14.0 software. The normalized integrated-value matrix was transformed into principal components (PCs), and 13 PCs were obtained, which explained 95.7% of the variance (R2X = 0.957); the predictive ability of the model was 86.9% (Q2 = 0.869). The PCA score plot revealed that the samples fell into three groups in which LJF and LF could be distinguished, indicating that species is the main factor affecting Lonicera quality (Figure 3). However, wLJF lay closer to LF, which suggested that growth mode may also affect Lonicera quality. Because several wLJF samples were mixed with the cLJF and LF samples in the PCA model, supervised pattern recognition methods were needed to find the specific variation accurately.
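The PCA step can be illustrated with a minimal sketch. It assumes plain mean-centering followed by a singular value decomposition; the scaling actually applied in SIMCA-P is not stated here, the matrix below is random synthetic data rather than the study data, and the function name is hypothetical. The cumulative explained-variance fraction plays the role of R2X.

```python
import numpy as np

def pca_explained_variance(X, n_components):
    """Return PC scores and the cumulative explained-variance fraction."""
    Xc = X - X.mean(axis=0)                      # mean-center each bucket variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)          # variance fraction per component
    return scores, np.cumsum(explained)[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for the 100 (samples) x 200 (integrated values) matrix.
X = rng.normal(size=(100, 200))
scores, cum_r2x = pca_explained_variance(X, n_components=13)
print(scores.shape, float(cum_r2x[-1]))
```

With real spectra, the first two columns of `scores` would be the coordinates drawn in a score plot such as Figure 3. Q2 additionally requires cross-validation and is not computed in this sketch.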

Extraction of the Specific Variation
When supervised pattern recognition is carried out, samples are generally divided into a training set and a testing set. The training set is used to establish the models, and the testing set is used to verify their recognition accuracy and predictive ability [25]. In this study, the 100 batches of samples were randomly divided: 47 batches of cLJF, 6 batches of wLJF, and 8 batches of LF were used as the training set, and the remaining 39 batches were used as the testing set. To discriminate LJF and LF accurately, PLS-DA and OPLS-DA were combined, for the first time, for the dimensionality reduction of the data. Figure 4 displays the details of the dimensionality reduction process.
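The random division described above can be sketched as a per-class draw. The class sizes (77 cLJF, 10 wLJF, 13 LF) and the training counts come from the text; the function name, the seed, and the use of Python's `random` module are assumptions made for illustration, and the sketch does not reproduce the actual assignment of batches.

```python
import random

def stratified_split(labels, n_train, seed=0):
    """Randomly pick n_train[label] sample indices per class for training."""
    rng = random.Random(seed)
    train = []
    for label, k in n_train.items():
        members = [i for i, lab in enumerate(labels) if lab == label]
        train.extend(rng.sample(members, k))
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test

labels = ["cLJF"] * 77 + ["wLJF"] * 10 + ["LF"] * 13
train_idx, test_idx = stratified_split(labels, {"cLJF": 47, "wLJF": 6, "LF": 8})
print(len(train_idx), len(test_idx))
```

The remaining 39 batches then comprise 30 cLJF, 4 wLJF, and 5 LF samples.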

PLS-DA was performed in SIMCA-P 14.0 software on the 61 (samples) × 200 (integrated values) data matrix of the training set. Variable importance in projection (VIP), a commonly used measure of variable importance, was selected to evaluate the contribution of the variables. The larger the VIP value of a variable, the greater its contribution to the classification. Variables with VIP values greater than 1.0 can be regarded as feature markers [26,27]. With VIP > 1 as the screening criterion, 75 variables were selected in the PLS-DA model. After that, the 61 (samples) × 75 (integrated values) data matrix was used to build PLS-DA and OPLS-DA models, from which 42 and 43 variables were obtained, respectively. The 35 variables at the intersection of the two sets were selected as the characteristic variables, indicating their contribution to the accuracy of discrimination.
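The screening logic (keep variables with VIP > 1 in each model, then take the intersection) can be sketched as follows. The VIP values and variable names are invented placeholders, and `select_by_vip` is a hypothetical helper; in the study the VIP values come from the SIMCA-P models.

```python
def select_by_vip(vip, threshold=1.0):
    """Return the set of variable names whose VIP exceeds the threshold."""
    return {name for name, value in vip.items() if value > threshold}

# Hypothetical VIP values for five bucket variables (illustration only).
vip_plsda = {"Va": 1.8, "Vb": 1.2, "Vc": 0.9, "Vd": 1.5, "Ve": 2.1}
vip_oplsda = {"Va": 1.6, "Vb": 0.8, "Vc": 1.1, "Vd": 1.4, "Ve": 2.3}

# Characteristic variables: selected by both models.
shared = select_by_vip(vip_plsda) & select_by_vip(vip_oplsda)
print(sorted(shared))
```

Taking the intersection keeps only variables that both models rank as important, which is how the 42 and 43 variables reported above reduce to 35.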

Identification of the Characteristic Variables
To identify the characteristic variables that differ between the groups, S-line analyses of the OPLS-DA models based on the 35 selected characteristic variables were performed between LJF and LF (Figure 5a) and between cLJF and wLJF (Figure 5b). The results in Figure 5a indicate that the main difference between LJF and LF lies in the region between 0.65 and 1.80 ppm, which corresponds to the signals of macranthoidin A, macranthoidin B, and dipsacoside B; these signals show positive intensities in LF relative to LJF (Figures 1 and 2). Interestingly, when wLJF was compared with cLJF in a separate model, the region between 0.65 and 1.80 ppm was again the main region of difference, as shown in the S-line (Figure 5b). However, in the current Chinese Pharmacopoeia [28] and most of the research [29], macranthoidin B and dipsacoside B are considered characteristic components for evaluating the chemical quality of LF, whereas they are present only in trace amounts in LJF. Thus, our results indicate that when macranthoidin B and dipsacoside B are used as quality markers for LF, reasonable limits need to be set, since they can also be detected in most wLJF. In addition, to avoid immediate hypersensitivity reactions, cLJF would be more suitable than wLJF as the raw material for the production of injections.
Previous research has shown that, compared with cLJF, wLJF contains more secoxyloganin and secologanoside [19]. This can also be seen from the 1H-NMR spectra in our study. For the growth modes, other characteristic variables are also found in the regions of 2.65-2.95 ppm, 4.60-4.70 ppm, and 5.20-5.30 ppm (Figure 5b), which represent part of the signals of secoxyloganin and secologanoside (Figure 2). The region of 5.50-5.55 ppm corresponds to part of the signal of sweroside.

Linear Discriminant Analysis (LDA)
Linear discriminant analysis (LDA) is a robust classification method that can also be used for dimensionality reduction and data visualization. It is a supervised method that computes decision boundaries to enhance the separation between multiple classes, unlike PCA, which maximizes the variance. Classes are distinguished by maximizing the distance between the projected class means while minimizing the projected within-class variance [25].
In this study, stepwise LDA was performed in SPSS 26.0 software on the 100 (samples) × 35 (integrated values) data matrix, using the 61 batches of the training set and the 39 batches of the testing set. To generate the discriminant functions, the LDA model finally selected seven characteristic variables, namely the integrated values of variables V186, V184, V147, V145, V66, V31, and V17. Among these, V186, V184, V147, and V145 lie in the regions of 0.65-1.80 ppm and 2.65-2.95 ppm. This further indicated that macranthoidin A, macranthoidin B, dipsacoside B, secoxyloganin, and secologanoside were the quality markers of the species and growth modes. Three discriminant functions of identification, A, B, and C, were obtained, where A is the classification function of cLJF, B is the classification function of LF, and C is the classification function of wLJF.
All the training set samples were divided into three separate regions (Figure 6a), indicating significant differences between the samples of different species and growth modes. Leave-one-out cross-validation was used to estimate the accuracy of the model; the LDA model correctly classified 100.0% of the samples. To validate the prediction performance of the established model, the three discriminant functions were applied to the 39 batches of testing set samples. The discriminant function values (Supplementary Tables S1 and S2) and the score plot (Figure 6b) showed that all samples were assigned to their correct categories. This indicated that the established LDA model is reliable in distinguishing cLJF, wLJF, and LF samples.
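How such classification functions are typically applied can be sketched as follows: each class has a linear function of the selected variables, and a sample is assigned to the class whose function gives the highest value. The coefficients, constants, and sample values below are invented placeholders, not the functions obtained in this study, and only two of the seven variables are used to keep the example short.

```python
def classify(sample, functions):
    """Assign the class whose linear classification function scores highest."""
    scores = {
        label: f["const"] + sum(f["coef"][v] * sample[v] for v in f["coef"])
        for label, f in functions.items()
    }
    return max(scores, key=scores.get), scores

# Placeholder coefficients (invented for illustration only).
functions = {
    "cLJF": {"const": -1.0, "coef": {"V145": 2.0, "V186": 0.5}},  # function A
    "LF":   {"const": -2.0, "coef": {"V145": 0.5, "V186": 4.0}},  # function B
    "wLJF": {"const": -1.5, "coef": {"V145": 1.5, "V186": 2.0}},  # function C
}

sample = {"V145": 0.2, "V186": 0.9}   # hypothetical normalized integrals
label, scores = classify(sample, functions)
print(label)
```

In this toy case the large V186 value, which falls in the 0.65-1.80 ppm region, drives the assignment to LF.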

Verification of the Distinguishing Ability of the Characteristic Variables
To verify whether the seven variables can discriminate LJF and LF, PLS-DA and OPLS-DA models were built in SIMCA-P 14.0 software using the data matrix of 100 (samples) × 7 (integrated values). A clearer separation was achieved between the cLJF, wLJF, and LF samples, as can be seen in the score plots (Figure 7a,b). The validation results of the PLS-DA and OPLS-DA models show that all the testing set samples were correctly classified into the corresponding categories (Figure 7c,d). The models showed high values of R2X, R2Y, and Q2 (Table 1): R2X and R2Y values close to 1 indicated the good fit of the models, whereas the high Q2 values (>0.5) showed their good predictive ability.
The PLS-DA and OPLS-DA models were validated using permutation tests. Two hundred permutations were performed, and the vertical intercepts of R2 and Q2 were (−0.0153, −0.238) for PLS-DA and (−0.0247, −0.188) for OPLS-DA, indicating that the models were not over-fitted (Figure 7e,f). Overall, the accuracies of the LDA, PLS-DA, and OPLS-DA models were all 100%, indicating their ability to distinguish the different species and growth modes of the samples based on the 1H-NMR spectra.
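The idea behind the permutation test can be sketched as follows. This is a simplified stand-in: it uses an ordinary least-squares fit and its R2 on synthetic data instead of PLS-DA/OPLS-DA with R2 and Q2, so it only illustrates how the vertical intercept is read from the regression line of the permuted results. All names and data are hypothetical.

```python
import numpy as np

def r2_of_fit(X, y):
    """R2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                       # synthetic predictors
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=40)

r2_real = r2_of_fit(X, y)
corr, r2_perm = [1.0], [r2_real]                   # original model at correlation 1
for _ in range(200):
    y_perm = rng.permutation(y)                    # break the X-y relationship
    corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
    r2_perm.append(r2_of_fit(X, y_perm))

# Regression line of R2 against correlation; its intercept is the value at zero correlation.
slope, intercept = np.polyfit(corr, r2_perm, 1)
print(float(r2_real), float(intercept))
```

An intercept far below the R2 of the original model suggests that the fit does not arise by chance, which is the reasoning applied to the intercepts reported above.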

Identification and Analysis of Two Processing Methods
The processing method of a TCM is also an important factor influencing its quality and yield [15]. Many studies have suggested that the plucking time and processing methods, which are influenced by natural conditions, determine the quality and pharmacological effects of cLJF [30]. Therefore, it was necessary to examine the influence of the processing methods on cLJF.
To investigate the effects of different processing methods on cLJF, OPLS-DA and LDA models were used to classify the cLJF samples. First, the cLJF samples were divided into a training set, which contained 26 batches of hot-air-dried and 28 batches of sun-dried samples, and a testing set comprising the remaining 23 batches. Second, the OPLS-DA model was built. In the training step of OPLS-DA, two hot-air-dried samples were misclassified as sun-dried and three sun-dried samples were misclassified as hot-air-dried (Figure 8a). In the testing step, one sun-dried sample was misclassified as hot-air-dried (Figure 8b). The R2X and Q2 values of the OPLS-DA model indicated a good fit but poor predictive ability (Table 2). Finally, 200 permutations were performed, and the vertical intercepts of R2 and Q2 of the OPLS-DA model were 0.331 and −0.612, respectively, indicating that the model was not over-fitted (Figure 8c).
Moreover, based on the variables with VIP > 1 in the OPLS-DA model, we attempted to establish a stepwise LDA model to further examine the influence of the processing methods (Figure 9). In the LDA analysis of the two processing methods, the prediction accuracy was 100% for the training set and 95.65% for the testing set, in which sample J47 was misclassified. The cross-validation accuracy was 98.1%. Therefore, compared with OPLS-DA, the LDA model could be a more appropriate method for classifying the processing methods of cLJF.
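The reported percentages can be checked against the sample counts with a short calculation. The set sizes and the OPLS-DA misclassification counts are taken from the text; reading the 98.1% cross-validation accuracy as one error among the 54 training samples is an inference, not something stated explicitly.

```python
def accuracy_percent(n_correct, n_total):
    """Classification accuracy as a percentage."""
    return 100.0 * n_correct / n_total

n_train = 26 + 28   # hot-air-dried + sun-dried training batches
n_test = 23         # remaining cLJF batches

lda_test = accuracy_percent(n_test - 1, n_test)        # one testing sample (J47) misclassified
lda_cv = accuracy_percent(n_train - 1, n_train)        # assumed: one leave-one-out error
oplsda_train = accuracy_percent(n_train - 5, n_train)  # 2 + 3 misclassified training samples

print(round(lda_test, 2), round(lda_cv, 1), round(oplsda_train, 1))
# prints: 95.65 98.1 90.7
```

Under these counts, the OPLS-DA model reaches about 90.7% on its own training set, compared with 100% for LDA.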

Materials and Reagents
A total of 100 batches of LJF and LF samples were collected from all over China, including 77 batches of cultivated Lonicerae japonicae flos (cLJF) with different processing methods and different origins, 10 batches of wild Lonicerae japonicae flos (wLJF), and 13 batches of Lonicerae flos (LF). The detailed sample information is shown in Table 3. All the samples were authenticated by Chief Pharmacist Ji Zhang, the former Director of the Traditional Chinese Medicine Herbarium at the China National Institute for Food and Drug Control. Voucher samples were preserved in the cold sample room of the Shenzhen Institute for Drug Control.

Sample Preparation
The aqueous extract of the samples was prepared according to a previously reported method [18]. Briefly, 6.00 g of each sample was accurately weighed, soaked in 120 mL of water for 1 h, and extracted by reflux twice for 1 h each time. Then, the extract solution, filter, and

1H-NMR Spectra Measurement
The 1H-NMR spectra were acquired on a Bruker AV III HD-500 NMR spectrometer with experimental parameters based on a previous report with slight modification [31]. The instrument was operated at a proton frequency of 500.13 MHz, and the spectra were acquired under automation at a constant temperature of 300 K. A presaturation sequence (ZGPR) was applied to suppress the residual solvent signal, with the transmitter frequency offset set to 4.842 ppm. For each sample, the 1H-NMR spectrum consisted of 128 scans with a spectral width of 8012.820 Hz and an acquisition time of 2.0447 s. The 1H-NMR spectra of the 100 batches of samples and the six reference standards were acquired under the above experimental conditions and were automatically Fourier transformed (Figures 1 and 2).
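The acquisition parameters above can be cross-checked with a short calculation. It assumes the usual Bruker relation AQ = TD / (2 × SW), where TD is the number of acquired points; TD itself is not stated in the text, so the value derived below is an inference.

```python
SFO1_MHZ = 500.13     # proton frequency (MHz)
SW_HZ = 8012.820      # spectral width (Hz)
AQ_S = 2.0447         # acquisition time (s)

sw_ppm = SW_HZ / SFO1_MHZ             # Hz divided by MHz gives ppm
td_points = round(2 * SW_HZ * AQ_S)   # assumed relation: AQ = TD / (2 * SW)
resolution_hz = 1.0 / AQ_S            # nominal digital resolution before zero filling

print(round(sw_ppm, 2), td_points, round(resolution_hz, 3))
```

The spectral width corresponds to about 16.02 ppm, and the derived point count equals 32,768 (2 to the power of 15), a typical acquisition size, which suggests that the reported width and acquisition time are mutually consistent.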

Data Processing of the NMR Spectra
Automatic phase and baseline corrections were applied during spectral processing. The spectra were calibrated by setting the TMS signal to 0.0 ppm using MestReNova software (version 14.2.0, Mestrelab Research, Santiago, Spain). The region of δ −0.025 to 9.975 ppm of each spectrum was integrated in segments of 0.05 ppm, producing 200 discrete bucketed regions. To eliminate the dimensional influence of each variable, all of the integrated values were normalized relative to the intensity of the TMS signal, which was scaled to 1.0. A data set consisting of a 100 × 200 matrix was thus obtained for further chemometric analyses, in which rows represent samples and columns represent the integrated values determined by NMR.
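The bucketing scheme can be sketched as a mapping between variable indices and chemical shifts. The numbering convention is an assumption: variables are taken to run from V1 at the high-ppm end to V200 at the TMS end, which is consistent with V186 and V184 falling in 0.65-1.80 ppm and V147 and V145 in 2.65-2.95 ppm, but it is not stated explicitly. Function names are hypothetical, and integer arithmetic in thousandths of a ppm is used to avoid floating-point edge effects.

```python
BUCKET_WIDTH_MILLI = 50      # 0.05 ppm, in thousandths of a ppm
FIRST_CENTER_MILLI = 9950    # assumed center of V1 (bucket 9.925-9.975 ppm)
N_BUCKETS = 200

def bucket_center_ppm(index):
    """Center (ppm) of bucket V<index>, assuming V1 is the highest-ppm bucket."""
    if not 1 <= index <= N_BUCKETS:
        raise ValueError("bucket index out of range")
    return (FIRST_CENTER_MILLI - BUCKET_WIDTH_MILLI * (index - 1)) / 1000

def bucket_index(ppm):
    """Index of the bucket containing a chemical shift in [-0.025, 9.975) ppm."""
    milli = round(ppm * 1000)
    if not -25 <= milli < 9975:
        raise ValueError("chemical shift outside the integrated region")
    return N_BUCKETS - (milli + 25) // BUCKET_WIDTH_MILLI

for v in (186, 184, 147, 145):
    print(f"V{v}: {bucket_center_ppm(v):.2f} ppm")
```

Under this convention, V200 is centered on the TMS signal at 0.00 ppm, the signal used for calibration and normalization.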

Multivariate Statistical Analysis
Chemical pattern recognition analysis can be described as the use of mathematical and statistical techniques to analyze several types of chemical data [32,33]. It includes unsupervised and supervised pattern recognition [34]. The most widely used method for unsupervised pattern recognition is principal component analysis (PCA), which extracts the important information from the data and is used to provide an intrinsic overview of the data set and reveal possible groups and outliers [35]. The commonly applied supervised pattern recognition methods are partial least-squares discriminant analysis (PLS-DA), orthogonal partial least-squares discriminant analysis (OPLS-DA), and linear discriminant analysis (LDA). In supervised pattern recognition methods, samples with known class information are usually divided into a training set and a testing set.
The 1H-NMR spectra were processed by MestReNova 14.2.0 software (Mestrelab Research, Santiago, Spain). The normalized data matrix obtained from the samples was analyzed by SIMCA-P 14.0 software (Umetrics AB, Umeå, Sweden) and SPSS 26.0 software (IBM, Chicago, IL, USA) for chemical pattern recognition. SIMCA-P 14.0 software was used for the PCA, PLS-DA, and OPLS-DA models, and SPSS 26.0 software was used for the LDA models.

Conclusions
In this study, the 1H-NMR spectra of LJF and LF were obtained, and the possibility of discriminating the species and processing methods of LJF was investigated systematically with PCA, PLS-DA, OPLS-DA, and LDA models. The LDA results were highly satisfactory, with good cross-validated predictions (at least 98.1%) and low classification errors (at most 4.35%). This demonstrated that the constructed method can serve as a powerful tool for distinguishing LJF and LF and for classifying LJF from different processing modes without reference substances or accurate determination of the contents. The S-line further indicated that the regions of 0.65-1.80 ppm and 2.65-2.95 ppm represented the 1H-NMR spectral profiles of macranthoidin A, macranthoidin B, dipsacoside B, secoxyloganin, and secologanoside, which can serve as significant characteristics for distinguishing LJF and LF and for classifying LJF from different processing modes. Therefore, 1H-NMR spectral profiles coupled with chemical pattern recognition provide a new tool for the comprehensive quality control of LJF and offer a favorable strategy for solving the problem of the mixed use of LF.

Figure 3. Score plot of the PCA based on the species and growth modes.

Figure 4. Flowchart for extraction of the specific variation.

Figure 5. S-line obtained based on the OPLS-DA model between LJF and LF (a) and between cLJF and wLJF (b).

Figure 6. The classification models for samples based on the species and growth modes: (a) LDA score plot of the training set samples and (b) LDA score plot of the training set and testing set samples.

Figure 7. The classification models for the samples based on the species and growth modes. (a) PLS-DA score plot of the training set samples. (b) OPLS-DA score plot of the training set samples. (c) PLS-DA score plot of the training set and testing set samples. (d) OPLS-DA score plot of the training set and testing set samples. (e) Permutation test result of PLS-DA. (f) Permutation test result of OPLS-DA. Dotted lines in (e,f) represent the regression lines of R2 and Q2 in the permutation test.

Figure 8. The classification models for the cLJF samples based on the two processing methods: (a) OPLS-DA score plot of the training set samples; (b) OPLS-DA score plot of the training set and testing set samples; and (c) permutation test result of the OPLS-DA. Dotted lines in (c) represent the regression lines of R2 and Q2 in the permutation test.

Figure 9. LDA score plot of the training set and testing set samples of cLJF based on the two processing methods.

Table 1. Summary of classification results from the PLS-DA and OPLS-DA models of species and growth modes.

Table 2. Summary of classification results from the OPLS-DA model of the two processing methods.