Comparative Analysis of Chemical Constituents of Moringa oleifera Leaves from China and India by Ultra-Performance Liquid Chromatography Coupled with Quadrupole-Time-Of-Flight Mass Spectrometry

To examine the similarities and differences between the phytochemicals of Moringa oleifera leaves collected from China (CML) and India (IML), a comparative ultra-performance liquid chromatography coupled with quadrupole-time-of-flight mass spectrometry (UPLC-QTOF-MS) analysis was performed in this study. A screening analysis based on the UNIFI platform was first carried out to identify the similarities. Next, untargeted metabolomic analysis based on multivariate statistical analysis was performed to discover the differences. In total, 122 components, including 118 shared constituents, were characterized from CML and IML. The structure types included flavonoids, alkaloids, glycosides, organic acids and organic acid esters, iridoids, lignans, and steroids. For CML, 121 compounds were characterized; among these, 18 potential biomarkers with higher contents enabled differentiation from IML. For IML, 119 compounds were characterized; among these, 12 potential biomarkers with higher contents enabled differentiation from CML. It could be concluded that both CML and IML are rich in phytochemicals and that they contain similar kinds of compounds, although the contents of some compounds differ significantly. This comprehensive phytochemical profile study provides a basis for explaining the effect of different growth environments on secondary metabolites and serves as a reference for further research into or applications of CML in China.


UPLC-QTOF-MSE
UPLC-QTOF-MSE analysis was performed on a Waters Xevo G2-XS QTOF mass spectrometer (Waters Co., Milford, MA, USA) equipped with a UPLC system through an electrospray ionization (ESI) interface. Chromatographic separation was performed on an ACQUITY UPLC BEH C18 (100 mm × 2.1 mm, 1.7 µm) column provided by Waters Corporation. The mobile phases were composed of eluent A (0.1% formic acid in water, v/v) and eluent B (0.1% formic acid in acetonitrile, v/v) at a flow rate of 0.4 mL/min. The elution conditions applied were: 0-2 min, 10% B; 2-26 min, 10-100% B; 26-29 min, 100% B; 29-29.1 min, 100-10% B; 29.1-32 min, 10% B. Mixtures of 90/10 and 10/90 water/acetonitrile were used as the weak wash solvent and the strong wash solvent, respectively. The temperatures of the column and autosampler were 30 °C and 15 °C, respectively. Mass spectra were acquired from 100 to 1500 Da in MSE mode. The positive mode conditions were as follows: capillary voltage, 2.6 kV; source temperature, 150 °C; cone voltage, 40 V; cone gas flow, 50 L/h; desolvation temperature, 400 °C; desolvation gas flow, 800 L/h. Negative mode conditions were identical to the positive mode conditions except for the capillary voltage (2.2 kV). During a single LC run, data were acquired in MSE mode by rapidly switching the mass spectrometer between a low collision energy (CE) scan and a high-CE scan. The collision energy of the low-energy function was set to 6 V, and the collision energy of the high-energy function was ramped from 20 to 40 V. Leucine enkephalin (LE) (m/z 554.2615 in ESI− mode and 556.2771 in ESI+ mode), the external reference of LockSpray™, was infused at a constant flow of 10 µL/min. During acquisition, data were collected in continuum mode for the screening analysis and in centroid mode for the metabolomics analysis. A MassLynx™ V4.1 workstation (Waters, Manchester, UK) was used to record the data.
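The elution program above can be written down as a small lookup, which is convenient when relating a retention time to the mobile phase composition at that moment. The following is a minimal sketch, not the instrument method itself: the name `percent_b` is hypothetical, and linear interpolation between the stated breakpoints is an assumption about how the pump ramps.

```python
# Gradient breakpoints taken from the text: (time in min, % eluent B).
# Linear ramps between breakpoints are an assumption of this sketch.
GRADIENT = [
    (0.0, 10.0),
    (2.0, 10.0),
    (26.0, 100.0),
    (29.0, 100.0),
    (29.1, 10.0),
    (32.0, 10.0),
]


def percent_b(t_min):
    """Return the percentage of eluent B at time t_min (minutes)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time lies outside the 0-32 min run")
    # Walk through consecutive breakpoint pairs and interpolate linearly.
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


if __name__ == "__main__":
    for t in (1.0, 14.0, 27.0, 30.0):
        print(f"{t:5.1f} min -> {percent_b(t):5.1f}% B")
```

Under these assumptions, a peak eluting at 14 min would leave the column at roughly 55% B, the midpoint of the 2-26 min ramp.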

Screening Analysis of Components of CML and IML by UNIFI Platform
To quickly identify the chemical compounds, the MS raw data, compressed with Waters Compression and Archival Tool v1.10, were automatically screened and identified using the streamlined workflow of UNIFI 1.7.0 software (Waters, Manchester, UK) [30-33]. The parameters were as follows: for 2D peak detection, the minimum peak area was set to 200; for 3D peak detection, the peak intensity thresholds for low energy and high energy were set to over 1000 and over 200 counts, respectively; a mass error within ±5 ppm was required for identified compounds; and a retention time within ±0.1 min was allowed when matching a reference substance. Compounds were matched on the basis of fragments predicted from their structures. The adducts +COOH and −H were selected in negative mode, and +H and +Na in positive mode. Leucine enkephalin was selected as the reference compound, with [M − H]− 554.2620 used for the negative ion and [M + H]+ 556.2766 for the positive ion. Components were further verified by comparing retention times with those of reference substances and by comparing characteristic MS fragmentation patterns with those reported in the literature. Two chemical information databases were used. In addition to the in-house Traditional Medicine Library of the Waters UNIFI platform, a self-built database of compounds reported in M. oleifera leaves (ML) was established through a systematic search of online databases and internet search engines such as PubMed, Full-Text Database (CNKI), ChemSpider, Web of Science, and Medline. The database provided the name, structure, and molecular formula of each component.
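The ±5 ppm mass window and ±0.1 min retention time window can be illustrated with a short calculation. The sketch below is not UNIFI's algorithm: the function names are hypothetical, the element and proton masses are standard monoisotopic values rather than values from this paper, and the formula C28H37N5O7 for leucine enkephalin is supplied here as background knowledge.

```python
# Monoisotopic masses in Da (standard values, rounded; not taken from the paper).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646  # mass of H+ in Da


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def is_match(observed_mz, theoretical_mz, observed_rt, reference_rt,
             ppm_tol=5.0, rt_tol=0.1):
    """Apply the +/-5 ppm and +/-0.1 min windows quoted in the text."""
    return (abs(ppm_error(observed_mz, theoretical_mz)) <= ppm_tol
            and abs(observed_rt - reference_rt) <= rt_tol)


# Leucine enkephalin, C28H37N5O7 (the lock-mass reference compound).
LEU_ENK = {"C": 28, "H": 37, "N": 5, "O": 7}
mass = neutral_mass(LEU_ENK)
mz_pos = mass + PROTON  # [M + H]+
mz_neg = mass - PROTON  # [M - H]-

if __name__ == "__main__":
    print(f"[M+H]+ = {mz_pos:.4f}, [M-H]- = {mz_neg:.4f}")
    print(f"ppm error of 556.2771 vs. calculated: {ppm_error(556.2771, mz_pos):.2f}")
```

With these inputs the calculated ions agree with the reference values 556.2766 and 554.2620 given for UNIFI. The slightly different lock-mass values quoted for acquisition (556.2771 and 554.2615) are what results if a full hydrogen atom mass is added or removed instead of a proton mass; the gap is under 1 ppm, well inside the ±5 ppm window.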

Metabolomics Analysis of CML and IML
The raw data were processed for alignment, deconvolution, and data reduction with MarkerLynx XS V4.1 software (Waters, Milford, MA, USA) [34]. A MarkerLynx processing method was first created with the following main parameters: retention time (RT) range 0-26 min, minimum intensity 5%, mass range 100-1500 Da, mass tolerance 0.10, mass window 0.10, marker intensity threshold 2000 counts, retention time window 0.20, and noise elimination level 6. After processing, the results were displayed in the Extended Statistics (XS) Viewer, which listed the m/z-RT pairs and corresponding intensities of all detected peaks from each data file. Peaks with the same RT and m/z values in different batches of samples were regarded as the same component. Multivariate statistical analysis was then performed. First, principal component analysis (PCA) was used for pattern recognition and to capture the maximum variation, giving an overview and classification of the samples. Second, orthogonal partial least squares discriminant analysis (OPLS-DA) in ESI+ and ESI− modes was performed to obtain the maximum separation between the CML and IML groups and to explore the potential chemical markers that contribute to the differences. S-plots were then created to visualize the OPLS-DA predictive component loadings and facilitate model interpretation. Variable importance for the projection (VIP) was used to screen the differential components, and metabolites with a VIP value > 1.0 and a p-value below 0.05 were considered potential markers [32]. In addition, permutation testing was performed to provide reference distributions of the R2/Q2 values that could indicate statistical significance [35,36]. SIMCA 15.0 software (Umetrics, Malmö, Sweden) was used to display the analysis results [33,35].
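The marker screening rule (VIP > 1.0 and p < 0.05) amounts to a simple filter over the m/z-RT feature table. The sketch below only illustrates that rule; the `Feature` record, the function name, and every numeric value in the demo list are hypothetical and are not data from this study. In practice the VIP values would come from the OPLS-DA model and the p-values from a univariate test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One m/z-RT pair with its screening statistics (illustrative record)."""
    rt_min: float
    mz: float
    vip: float
    p_value: float


def potential_markers(features, vip_cutoff=1.0, p_cutoff=0.05):
    """Keep features with VIP above the cutoff and p-value below the cutoff."""
    return [f for f in features if f.vip > vip_cutoff and f.p_value < p_cutoff]


# Hypothetical values, invented purely to exercise the filter.
demo = [
    Feature(rt_min=5.12, mz=431.10, vip=2.4, p_value=0.003),  # passes both
    Feature(rt_min=7.80, mz=463.09, vip=0.7, p_value=0.010),  # VIP too low
    Feature(rt_min=9.45, mz=593.15, vip=1.8, p_value=0.200),  # p too high
]

if __name__ == "__main__":
    for marker in potential_markers(demo):
        print(marker)
```

Requiring both criteria keeps features that drive the multivariate separation and also differ between groups when tested individually.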

Identification of Components from CML and IML Based on the UNIFI Platform
As a result of the screening analysis, a total of 122 compounds were identified or tentatively characterized in ESI+ and ESI− modes from CML and IML. There were 118 shared constituents identified in CML and IML. More specifically, 121 and 119 compounds were characterized from CML and IML, respectively (Table 2). Both types of Moringa oleifera leaves are rich in natural components with various structural patterns, including flavonoids, alkaloids, glycosides, organic acids and organic acid esters, iridoids, lignans, and steroids. Base peak intensity (BPI) chromatograms marked with the compound numbers are shown in Figure 1. The chemical structures of the compounds are summarized in Figure 2.
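The four counts reported above are mutually consistent, which can be checked with basic set arithmetic. The sketch below assumes the compounds are numbered 1 to 122 as in Table 2 and, as a further assumption, takes the unshared compounds to be those described in the Discussion as detected only in CML (14, 33, and 38) or only in IML (82).

```python
# Assumption: compounds are numbered 1..122, following Table 2.
all_compounds = set(range(1, 123))

# Assumption: the only unshared compounds are those reported in the
# Discussion as detected in one group only.
cml_only = {14, 33, 38}
iml_only = {82}

cml = all_compounds - iml_only  # everything except the IML-only compound
iml = all_compounds - cml_only  # everything except the CML-only compounds
shared = cml & iml

if __name__ == "__main__":
    print(len(cml), len(iml), len(shared), len(cml | iml))
```

This is the inclusion-exclusion relation: 121 + 119 - 118 = 122.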

Diversity Evaluation of CML and IML Using Metabolomics Analysis
The QC injections clustered tightly in the PCA, indicating satisfactory stability of the system. In the PCA 2D plots, the samples from the CML and IML groups were readily classified into two clusters according to their common spectral characteristics (Figure 3). The CML and IML samples were clearly separated, indicating that the two groups could be easily differentiated.

In order to evaluate the differences between the leaves from the two areas, the OPLS-DA score plot, S-plot, permutation test, and variable importance in the projection (VIP) values were obtained to understand which variables were responsible for this sample separation [72].
OPLS-DA in both ESI+ and ESI− modes (Figures 4a and 5a) gave the maximum separation between the CML and IML groups. In the permutation tests, the lines of the grouped samples lay significantly below the random sampling lines (Figures 4b and 5b), which supports the validity of the models for the subsequent identification of characteristic biomarkers. S-plots were then created to explore the potential chemical markers that contributed to the differences. Based on p values (p < 0.05) and VIP values (VIP > 1) [26,30] from univariate statistical analysis, 30 robust known chemical markers enabling differentiation between CML and IML were marked and listed (Figures 4c and 5c and Table 2). Additionally, a heatmap was generated from these chemical markers in order to evaluate them systematically (Figure 6); it visually shows the intensities of the potential chemical markers in the two groups.

Discussion
Via the screening analysis, 121 and 119 compounds were characterized in CML and IML, respectively. As the results show, 93 compounds were identified in negative mode and 29 compounds were identified in positive mode. Judging from the BPI chromatograms, the negative ionization mode performed better than the positive mode in terms of the number and the responses of the identified compounds. However, it was still necessary to run the positive mode because some compounds showed better responses in this mode than in the negative mode. The results also showed that ML from both areas are rich in natural components. It has been reported that there is high flavonoid content (presenting in flavanol and glycoside forms) in M. oleifera leaves [4,18]. In this study, flavonoids were also the main chemical constituents. Besides the most common flavonoids, 36 flavonoids, such as apigenin-8-C-glucoside, quercetin 3-O-β-D-glucopyranoside, kaempferol-7-O-α-L-rhamnoside, and 5,7,2′,5′-tetrahydroxyflavone, were identified or tentatively characterized in M. oleifera leaves for the first time. Moreover, isothiocyanates have become a major topic of research interest regarding Moringa because of their various biological activities [18]. In our study, four isothiocyanates were found in both IML and CML. A total of 118 compounds were shared by CML and IML, which means that the two are similar in the kinds of compounds they contain. This comprehensive phytochemical profile study has revealed the structural diversity of secondary metabolites and the similar patterns within CML and IML.
Furthermore, the nontargeted metabolomic analysis, which takes the contents of the constituents into account, showed that CML and IML do differ. Thirty robust known biomarkers enabling this differentiation were discovered. These illustrate the differences between CML and IML and provide a basis for explaining the effect of different growth environments on secondary metabolites. For CML, there were 18 potential biomarkers, including seven flavonoids (14, 33, 55, 63, 74, 79, and 84), five organic acids and organic acid esters (30, 38, 80, 115, and 116), two glycosides (12 and 29), and four others (69, 93, 121, and 122). Among these biomarkers, compounds 14, 33, and 38 were detected only in CML under the experimental conditions, and the contents of the others were greater in CML than in IML. Among these potential biomarkers, components 14, 33, 55, 74, 79, 30, and 80 were identified or tentatively characterized in M. oleifera leaves for the first time. It has been reported that M. oleifera leaves originating from China have the highest antioxidant activity when compared with those from Faisalabad, Multan, and India [73]. Biological activity is generally attributed to the phytochemical content, so correlation studies between the potential markers and biological activities should be performed in the future. For IML, there were 12 potential biomarkers, including six flavonoids (50, 51, 59, 62, 78, and 82), three organic acids and organic acid esters (81, 97, and 119), one glycoside (100), one alkaloid (104), and one lignan (60). Among these, compound 82 was detected only in IML under the experimental conditions, and the contents of the other 11 compounds were greater in IML than in CML.
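The biomarker groupings listed in this paragraph can be tallied to confirm that the per-class counts add up to the reported totals of 18, 12, and 30. The sketch below simply transcribes the compound numbers from the text; the dictionary layout and the helper name `flatten` are illustrative choices.

```python
# Compound numbers transcribed from the text, grouped by structure type.
CML_MARKERS = {
    "flavonoids": [14, 33, 55, 63, 74, 79, 84],
    "organic acids and organic acid esters": [30, 38, 80, 115, 116],
    "glycosides": [12, 29],
    "others": [69, 93, 121, 122],
}
IML_MARKERS = {
    "flavonoids": [50, 51, 59, 62, 78, 82],
    "organic acids and organic acid esters": [81, 97, 119],
    "glycosides": [100],
    "alkaloids": [104],
    "lignans": [60],
}


def flatten(groups):
    """Collect every compound number from a {class: [numbers]} mapping."""
    return [number for members in groups.values() for number in members]


cml_markers = flatten(CML_MARKERS)
iml_markers = flatten(IML_MARKERS)

if __name__ == "__main__":
    for label, groups in (("CML", CML_MARKERS), ("IML", IML_MARKERS)):
        counts = {name: len(members) for name, members in groups.items()}
        print(label, counts, "total:", sum(counts.values()))
```

No compound number appears in both lists, so the 18 CML markers and 12 IML markers together make up the 30 markers reported.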
Based on the above results, it could be concluded that the contents of some secondary metabolites differ between CML and IML, as is the case with other natural plants.
In summary, a total of 122 components, including 118 shared constituents, were characterized from CML and IML. For CML, 121 compounds were characterized, and among these, 18 potential biomarkers with higher contents enabled differentiation from IML. For IML, 119 compounds were characterized, and among these, 12 potential biomarkers with higher contents enabled differentiation from CML.
Even so, several issues remain unresolved. For example, the pharmacological activities of the potential chemical markers and the identified compounds should be screened in the future. In addition, as shown in the BPI chromatograms, some components remain unidentified despite the 122 compounds characterized. Further research should be performed on these unknown components.

Conclusions
In this study, 121 and 119 chemical compounds, including 118 shared constituents, were identified or tentatively characterized from CML and IML, respectively, by combining UPLC-QTOF-MS and the UNIFI platform. Both CML and IML, which originate from two separate countries, are rich in phytochemicals and are similar in the kinds of compounds they contain. Moreover, a metabolomics study based on UPLC-QTOF-MS combined with multivariate statistical analysis has shown significant differences in the contents of a number of compounds in these two accessions. A total of 30 robust known biomarkers enabling differentiation were discovered. For CML and IML, 18 and 12 potential biomarkers were identified, respectively. This study adds to the limited data available on the chemical constituents of Moringa oleifera leaves and can help with planning strategies focused on the proper utilization of this resource, as well as providing a reference for the further application of CML in China.

Conflicts of Interest:
The authors declare no conflict of interest.