Chemical Profiling and Screening of the Marker Components in the Fruit of Cassia fistula by HPLC and UHPLC/LTQ-Orbitrap MSn with Chemometrics

Cassia fistula L., known as “Golden Shower”, is grown as an ornamental plant for its flowers, and its fruit has high medicinal value. Because its chemical composition is complex, few reports provide a comprehensive overview of the chemical constituents of the fruit or explain the differences between samples from different sources. The purpose of the present study was to establish a fingerprint evaluation system based on Similarity Analysis (SA), Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) for the compositional identification and quality control of this herb. Twelve samples from Xinjiang and Sichuan provinces in China and from India were analyzed by HPLC, and fifteen peaks were common to all twelve batches. Molecular weight and formula information was obtained for thirty-one peaks by UHPLC/LTQ-Orbitrap MSn, and molecular structure information was obtained for twenty components, ten of which were identified by comparison with reference standards. According to their similarity, the twelve batches were divided into four groups, which were broadly consistent with the three C. fistula fruit-producing areas. Five compounds were finally proposed as chemical markers for assessing the quality of this herb. The fingerprint method combined with chemometrics established here can effectively and conveniently differentiate the origin of C. fistula fruit, laying a foundation for the quality evaluation of this herb from different sources.


Introduction
Cassia fistula Linn. (Caesalpiniaceae), native to India, the Amazon and Sri Lanka, is a semi-wild tree also known as Indian laburnum or "golden shower". It is extensively cultivated throughout India as an ornamental, deciduous plant for its beautiful bunches of yellow flowers. The plant has spread widely across tropical countries, from the West Indies to South Asia and from South Africa to Egypt, including Mauritius, India, Mexico, China, East Africa, Brazil and elsewhere [1][2][3]. "Golden shower" not only has high ornamental value but also possesses properties useful in the treatment of disease, and local people have long used C. fistula as an effective traditional drug. Indian literature describes many applications of the plant against diseases such as skin infections, liver troubles, tuberculous glands and even malaria [4]. The fruit is the most representative medicinal part and has been reported to have high medicinal utility: scientific studies report hypolipidemic, antidiabetic, antioxidant, antimicrobial, antitumor, sedative, purgative, hepatoprotective and abortifacient activities [5][6][7][8][9][10][11].
It is generally acknowledged that the efficacy of plant drugs is related to their chemical composition and active compounds. At present, most studies have focused on the pharmacological and biological activities of individual plant organs. Because the chemical composition of C. fistula fruit is so complicated, few studies have comprehensively described its chemical profile or explained the differences between samples from different origins [12][13][14][15][16][17]. Without identification of the peaks involved, the specific components that cause these differences cannot be pointed out. It is therefore necessary to establish a valid strategy for comprehensively identifying the chemical components of C. fistula fruit and to find characteristic markers that can discriminate medicinal material from different production areas and be used to control its quality.
HPLC fingerprints of critical components offer a new approach to the quality control of drugs, and numerous studies have combined fingerprint analysis with chemometrics to screen bioactive components and control the quality of medicinal materials [18][19][20]. In the present research, thirty-six peaks were detected by HPLC, of which fifteen were common peaks in the similarity analysis (SA). To identify these peaks, we adopted a UHPLC/LTQ-Orbitrap MSn method. HPLC coupled with a hybrid LTQ-Orbitrap MS system offers high resolution and sensitivity and has been applied in many studies in this field [21][22][23], because the accurate molecular masses and multistage MSn fragment ions it provides make it well suited to identifying natural compounds. A representative sample (S7), which reflects the peak characteristics of the twelve batches as fully as possible, was selected for UHPLC/LTQ-Orbitrap MSn analysis in the negative ion mode of the HESI source. Molecular weight and formula information was obtained for thirty-one peaks, and molecular structure information for twenty components was obtained by matching accurate molecular weights with known compounds in online databases and comparing molecular and fragment ions; ten of these compounds were identified. Moreover, three components were discovered for the first time in the fruit of C. fistula: two of them are related to sennoside A/B, and one is a new compound. In addition, 1-[1,5-dihydroxy-3-methyl-8-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxynaphthalen-2-yl]ethanone (PubChem CID: 22297273) was also reported for the first time in this herb. To the best of our knowledge, this is the first time so many compounds have been inferred in the fruit of C. fistula.
Differences between medicinal materials arise from the number and content of their ingredients, so chemometrics was applied to these components to find the causes of the differences between batches. According to the HCA results, the samples were divided into four groups that approximately coincided with the three places of origin of C. fistula fruit. PCA further showed, for the first time, that five compounds, including C12, are suitable markers for discriminating medicinal materials of different quality. In conclusion, fingerprints combined with chemometrics proved to be an effective, simple and rapid means for the quality control of C. fistula fruit.

Validation of the Method
Fingerprinting is a comprehensive and quantifiable means of identification. Based on the systematic study of a drug's chemical components, it is mainly used to evaluate the authenticity, superiority and stability of drugs. The relative retention times, relative peak areas and similarities were used to evaluate the quality of the fingerprints. Rhein (peak t), a relatively large single peak in the chromatogram, was assigned as the reference peak for calculating relative retention times and relative peak areas.
Precision was tested by injecting the same sample solution six consecutive times. The RSDs of the relative retention times and relative peak areas of the common peaks were below 0.33% and 4.84%, respectively, and the similarities of the six chromatograms were all 1.000.
Stability was evaluated by analyzing one sample over 24 h. The RSDs of the relative retention times and relative peak areas of the common peaks were below 0.99% and 3.29%, respectively, and the similarities of the chromatograms were all above 0.999.
Repeatability was determined by analyzing six separately prepared samples. The RSDs of the relative retention times and relative peak areas of the common peaks were below 0.80% and 4.54%, respectively, and the similarities of the six chromatograms were all above 0.999.
These results showed that the samples remained stable during the testing period and that the conditions were reliable and satisfactory for fingerprint analysis.
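The relative retention time (RRT) and relative peak area (RPA) used in these validation tests are ratios to the reference peak (rhein, peak t), and the RSD summarizes their scatter over replicate runs. A minimal Python sketch follows; it assumes the sample (n − 1) standard deviation, and the peak values are invented purely for illustration, not data from this study.

```python
from statistics import mean, stdev


def relative_values(peaks, ref="t"):
    """Return {peak: (relative retention time, relative peak area)} for one run.

    peaks maps a peak label to (retention_time_min, peak_area); ref is the
    reference peak label (rhein, peak t, in this study).
    """
    ref_time, ref_area = peaks[ref]
    return {name: (t / ref_time, a / ref_area) for name, (t, a) in peaks.items()}


def rsd_percent(values):
    """Relative standard deviation in percent, using the sample (n - 1) SD."""
    return stdev(values) / mean(values) * 100.0


def validation_rsd(runs, peak, ref="t"):
    """RSD (%) of the RRT and RPA of one peak over replicate runs."""
    rel = [relative_values(run, ref) for run in runs]
    rrt = [r[peak][0] for r in rel]
    rpa = [r[peak][1] for r in rel]
    return rsd_percent(rrt), rsd_percent(rpa)


# Hypothetical replicate injections: (retention time in min, peak area).
runs = [
    {"a": (5.10, 1200.0), "t": (80.00, 5000.0)},
    {"a": (5.12, 1180.0), "t": (80.10, 4950.0)},
    {"a": (5.09, 1210.0), "t": (79.95, 5020.0)},
]
rrt_rsd, rpa_rsd = validation_rsd(runs, "a")
print(f"peak a: RSD(RRT) = {rrt_rsd:.2f}%, RSD(RPA) = {rpa_rsd:.2f}%")
```

The same function serves the precision, stability and repeatability series; only the set of replicate runs changes.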

Similarity Analysis (SA)
A chromatographic fingerprint must be representative of all samples and reflect the characteristics of integrity and fuzziness. By analyzing the shared pattern of the chromatograms, samples can still be identified and analyzed reliably even when the amounts of some chemical components differ between them [24].
Twelve batches of samples from different origins (Table 1) were analyzed, and the chromatograms were processed by SES to generate a common pattern R (Figure 1). Peaks a, b, c, d, f, g, i, j, l, m, r, s, t, u and v were common peaks; their peak areas are listed in the Supplementary Materials. The SES software was used to calculate the similarity of each chromatogram to the common pattern, and all results are shown in Table 2.
Figure 1. Overlaid HPLC chromatograms of samples S1 to S12. The common pattern (marked R) was obtained by SES.
The similarities of the chromatograms to the common pattern were all above 0.800, except for sample S1 (0.796), indicating that the chemical constituents of the samples were not strongly influenced by their origin. The common pattern is therefore effective for identifying samples of C. fistula fruit.
Beyond the fifteen common peaks, the chromatograms show the following: peak e appeared in S3, S7, S8, S9 and S10, with relatively large areas in S3, S9 and S10; peak h appeared in S7, S8, S10 and S12, with relatively larger areas in S8 and S10; peak k appeared only in S3 and S6; peaks n and o both appeared in S2, S5, S9 and S11; and peaks p and q both appeared only in S8 and S10. These uncommon peaks also contribute to the differences between medicinal materials from different sources. They show clearly that samples S3, S8 and S10 differed from the other samples, a result that can be linked to the Hierarchical Cluster Analysis (HCA) described below. To determine what the common peaks screened by SA are, they must be carefully identified.
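The similarity values above compare each chromatogram with a generated common pattern. As a rough illustration, the sketch below assumes that the common pattern is the mean of aligned profiles and that similarity is scored either as the cosine of the vectorial angle or as a correlation coefficient; SES offers options of this kind, but the settings used here are not stated. For brevity it works on hypothetical peak-area vectors rather than full time-aligned chromatograms.

```python
import math


def common_pattern(profiles):
    """Column-wise mean of aligned profiles (one row per sample, one column per matched peak)."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def cosine_similarity(x, y):
    """Cosine of the angle between two profile vectors (1.0 = identical shape)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def correlation(x, y):
    """Pearson correlation coefficient: the cosine of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


# Hypothetical peak-area vectors for three samples over five matched peaks.
profiles = [
    [120.0, 40.0, 300.0, 15.0, 500.0],
    [110.0, 45.0, 280.0, 18.0, 520.0],
    [60.0, 90.0, 150.0, 40.0, 480.0],
]
R = common_pattern(profiles)
for i, p in enumerate(profiles, start=1):
    print(f"sample {i}: cosine = {cosine_similarity(p, R):.3f}, r = {correlation(p, R):.3f}")
```

The cosine measure is insensitive to an overall scaling of a profile, which is why a sample with a different total extract yield can still score close to 1.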

Chemical Profiling of the Fruit of C. fistula by UHPLC/LTQ-Orbitrap MSn
Liquid chromatography coupled with mass spectrometry has proved to be a powerful tool for the structural characterization of unknown compounds [25]. Because there are relatively few reports on the composition of C. fistula fruit, identifying its constituents by high-resolution mass spectrometry is especially meaningful. The total ion current (TIC) chromatogram of the methanol extract of C. fistula fruit was obtained in negative ion mode by UHPLC/LTQ-Orbitrap MSn. Since most ingredients of this medicinal drug are flavonoids and anthraquinones, which are mainly acidic components, the analysis focused on the negative ion mode. First, the chemical compounds of C. fistula fruit reported in previous literature were systematically reviewed and summarized. Second, databases (e.g., ChemSpider http://www.chemspider.com and PubChem https://pubchem.ncbi.nlm.nih.gov/) were searched for compound structures based on the predicted chemical formulae. Finally, each structure was tentatively or unequivocally deduced from its chromatographic behavior, MS fragment data, comparison with reference standards and database matching (e.g., Metlin https://metlin.scripps.edu/landing_page.php?pgcontent=mainPage). The identified compounds are classified into three groups: flavonoids, anthraquinones, and disaccharides and other compounds, with the first two being the main groups.
A representative sample (S7), which reflects the peak characteristics of the twelve batches as fully as possible, was analyzed by UHPLC/LTQ-Orbitrap MSn. Of the thirty-six peaks detected, [M − H]− ions were found for thirty-one compounds, molecular structure information was acquired for twenty components, and ten components were identified by comparison with reference standards. Some of these compounds are presented in Figure 2. In addition, among the sennoside derivatives, a new compound (sennoside triglucoside) was found and another compound (sennoside monoglucoside) was deduced in C. fistula fruit for the first time; a further compound, 1-[1,5-dihydroxy-3-methyl-8-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxynaphthalen-2-yl]ethanone (compound 14), was also reported for the first time in this herb. Figure 3 shows the TIC chromatogram of the methanol extract of this medicinal drug in negative mode by UHPLC/LTQ-Orbitrap MSn. Detailed MS information for each peak is given in Table 3 in order of elution.
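Database matching rests on comparing a measured accurate m/z with the theoretical [M − H]− value of a candidate formula. The sketch below illustrates that arithmetic under stated assumptions: standard monoisotopic element masses, only C, H and O supported (enough for the formulas discussed here; any other element raises a KeyError), and a hypothetical observed m/z used only to show the ppm calculation.

```python
import re

# Monoisotopic masses (u) for the elements in the formulas discussed here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276467  # [M - H]- has the neutral mass M minus one proton


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass of a simple formula such as 'C15H8O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def deprotonated_mz(formula):
    """Theoretical m/z of the [M - H]- ion."""
    return monoisotopic_mass(formula) - PROTON


def ppm_error(observed, theoretical):
    """Mass accuracy in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


candidates = [("rhein", "C15H8O6"), ("naringenin", "C15H12O5"), ("sennoside A/B", "C42H38O20")]
for name, formula in candidates:
    print(f"{name:14s} {formula:10s} [M-H]- = {deprotonated_mz(formula):.4f}")

# Hypothetical observed value, used only to show the ppm calculation.
print(f"{ppm_error(283.0246, deprotonated_mz('C15H8O6')):.1f} ppm")
```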
The HPLC chromatogram profile (Figure 4) also basically corresponds to the UHPLC/LTQ-Orbitrap MSn chromatogram of C. fistula fruit in negative ionization mode; thirty-six chromatographic peaks were detected in both.
Notes to Table 3: ions in bold are diagnostic ions. In the "Peak" column, a denotes components confirmed by comparison with reference standards and b denotes structures that were tentatively inferred. In the "Level" column, unknown compounds were classified into four levels (A, B, C and D) according to spectral similarity.
Procyanidins are polymers built from flavan-3-ol monomer units, among which type B procyanidins are the most widely distributed. Compound 5 was provisionally assigned as procyanidin B2 on the basis of its parent ion [M − H]− at m/z 577, its predominant daughter ions at m/z 425 and 407 arising from RDA cleavage, and its ions at m/z 289 (a deprotonated (epi)catechin unit) and m/z 245, in agreement with the MS/MS data reported for this known compound in the literature. Figure 5A shows the RDA cleavage and binding sites of procyanidin B2 [26]. Figure 5B shows the RDA cleavage and binding sites of (−)-epicatechin [26].
Compounds 11, 13 and 17 are quercetin derivatives, the major flavonoids in the C. fistula fruit investigated here, as indicated by their common fragment ion at m/z 301, which is characteristic of quercetin. Because compounds 13 and 17 lost glucose and rhamnose, respectively, they were tentatively assigned as isoquercitrin (13) and quercitrin (17). Compound 11 was putatively identified as rutin because of its similarity to spectral library data.
Peak 20 gave an [M − H]− ion at m/z 563 and fragment ions at m/z 299 and m/z 255 (the latter formed from m/z 299 by loss of CO2, 44 Da), suggesting the formula C26H28O14. The origin of the m/z 299 ion is less clear; it was assumed to arise from the loss of two xylose units (2 × 132 Da = 264 Da = 563 − 299).
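The neutral-loss assignments in this section (xylose, glucose, rhamnose, CO2, acetyl) amount to matching nominal mass differences between precursor and product ions. The sketch below checks that arithmetic for the transitions quoted in the text, assuming nominal masses for a small, hypothetical table of common losses and at most two losses per step. A match only shows the arithmetic is consistent; it does not prove a structure, and an empty result (as for rhein, 283 → 257) simply means no listed loss fits.

```python
from itertools import combinations_with_replacement

# Nominal masses (Da) of common neutral losses from glycosides and phenolics.
NEUTRAL_LOSSES = {
    "H2O": 18,
    "CO": 28,
    "C2H2O": 42,  # ketene, e.g. from an acetyl group
    "CO2": 44,
    "pentose": 132,  # e.g. xylose residue, C5H8O4
    "deoxyhexose": 146,  # e.g. rhamnose residue, C6H10O4
    "hexose": 162,  # e.g. glucose residue, C6H10O5
}


def explain_loss(delta, max_units=2):
    """Return every combination of up to max_units losses whose nominal masses sum to delta."""
    names = list(NEUTRAL_LOSSES)
    matches = []
    for r in range(1, max_units + 1):
        for combo in combinations_with_replacement(names, r):
            if sum(NEUTRAL_LOSSES[n] for n in combo) == delta:
                matches.append(combo)
    return matches


# Precursor -> product transitions quoted in the text.
for precursor, product in [(563, 299), (299, 255), (525, 231), (231, 189), (283, 257)]:
    print(precursor, "->", product, explain_loss(precursor - product))
```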
Compounds 24 and 28 are kaempferol derivatives, the other relatively major flavonoids in the C. fistula fruit studied. Compound 28 showed a quasi-molecular ion at m/z 285, which is characteristic of kaempferol, and fragment ions at m/z 257 and 211, consistent with comparison of its MS/MS data. The fragment ions of compound 24 closely resemble those of kaempferol, so compound 24 can be considered a kaempferol isomer.
Compound 27 had a quasi-molecular ion at m/z 271. RDA cleavage of its C ring generated fragment ions at m/z 151 and 119, and cleavage of its B ring generated fragment ions at m/z 177 and 93; by comparison of its molecular and fragment ions with literature values, this component was presumed to be naringenin.
Free anthraquinones are typical anthraquinone constituents, and the five most common are rhein, emodin, chrysophanic acid, aloe-emodin and physcion. Rhein and emodin were inferred to be present after comparing these compounds with the literature. Compound 30 was tentatively identified as rhein because of its parent ion [M − H]− at m/z 283 and its daughter ions at m/z 257 and m/z 239. According to the mass spectrum, the fragment ion at m/z 257 corresponds to a loss of about 26 Da from the quasi-molecular ion, yet no neutral fragment of approximately 26 Da is plausible in this system. Our analysis suggests instead that the carboxyl group of the quasi-molecular ion lost a neutral CO molecule (28 Da) and that the ion was also affected by H radicals in the ion trap: one H radical combined with the oxygen atom at the cleavage site to form a hydroxyl group, and the other combined with the adjacent carbon, giving the m/z 257 fragment ion. The m/z 239 fragment ion was then formed by dehydration between the newly formed hydroxyl group and the H atom at its ortho position.

Identification of Disaccharides and Other Compounds
Compounds 1 and 2 were inferred to be disaccharides in C. fistula fruit based on their common fragment ion at m/z 179, which is characteristic of aldohexoses. Because the fruit is somewhat sticky, it is suspected to contain further saccharides; however, this method is not suitable for analyzing the polysaccharides it contains, which require other, more suitable methods.
Compound 14 was tentatively assigned as 1-[1,5-dihydroxy-3-methyl-8-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxynaphthalen-2-yl]ethanone. Its parent ion [M − H]− was at m/z 525 and its predominant daughter ion at m/z 231; together with its other MS2 fragment ions, this indicates the loss of a xylose and a glucose unit (132 + 162 = 294 Da = 525 − 231). In the MS3 scan, the fragment ions at m/z 189 and 187 indicated the further loss of the acetyl group. Its chemical structure is shown in Figure 2, and its MS2 and MS3 fragment ion spectra are given in the Supplementary Materials.
Peaks 3, 4, 7, 16, 20, 23, 29, 32, 34, 35 and 36 are currently defined only by their molecular formulas and molecular weights; their structures need to be confirmed by other approaches such as NMR.
As a result, molecular formulas were determined for thirteen of the fifteen common peaks in Figure 1: C12H22O11 (peak a), C17H30O12 (peak b), C15H14O6 (peak c), C27H30O15 (peak d), C42H38O20 (peak g), C30H26O9 (peak i), C42H38O20 (peak j), C26H28O14 (peak m), C15H10O6 (peak r), C15H12O5 (peak s), C15H8O6 (peak t), C15H10O5 (peak u) and C15H22O2 (peak v). Of these thirteen, eight peaks were putatively identified as a disaccharide (peak a), (+)-catechin (peak c), sennoside B (peak g), sennoside A (peak j), an isomer of kaempferol (peak r), naringenin (peak s), rhein (peak t) and 1-O-methylchrysophanol (peak u).

Identification of Ten Chemical Components by Comparison with Standard Materials
To confirm more components, the following compounds were compared with reference standards: (+)-catechin, (−)-epicatechin, rutin, sennoside B, quercitrin, sennoside A, naringenin, kaempferol, rhein and emodin (Figure 6). Because these compounds have repeatedly been reported to have biological activities, they were quantified by chromatography. Comparison of the chromatogram of the medicinal material with those of the standards showed that the separation and peak tailing of four compounds, (−)-epicatechin (2), sennoside A (6), naringenin (7) and rhein (9), met the requirements, so these four can be used as quantitative indicators. This provides a basis for establishing quality standards for the herb; the result is shown in Figure 7.
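The text states that separation and tailing of four compounds "met the requirements" without giving the criteria. The usual pharmacopoeial definitions are resolution Rs = 2(t2 − t1)/(W1 + W2) from baseline peak widths and tailing factor T = W0.05/(2f). The sketch below implements those two formulas; the acceptance thresholds (Rs ≥ 1.5, T ≤ 2.0) and all peak measurements are assumptions for illustration, not the authors' criteria or data.

```python
def resolution(t1, t2, w1, w2):
    """Resolution between adjacent peaks: Rs = 2 (t2 - t1) / (w1 + w2), baseline widths."""
    return 2.0 * (t2 - t1) / (w1 + w2)


def tailing_factor(w_005, front_005):
    """Tailing factor T = W0.05 / (2 f).

    w_005 is the peak width at 5% of peak height; front_005 (f) is the distance
    from the leading edge to the perpendicular through the apex at that height.
    """
    return w_005 / (2.0 * front_005)


def meets_requirements(rs, t, min_rs=1.5, max_t=2.0):
    """Hypothetical acceptance check; thresholds are assumptions, not the paper's."""
    return rs >= min_rs and t <= max_t


# Hypothetical measurements (min) for a marker peak and its nearest neighbour.
rs = resolution(t1=78.4, t2=80.0, w1=0.6, w2=0.7)
t = tailing_factor(w_005=0.75, front_005=0.33)
print(f"Rs = {rs:.2f}, T = {t:.2f}, acceptable: {meets_requirements(rs, t)}")
```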

Hierarchical Cluster Analysis (HCA)
To evaluate similarities and dissimilarities among C. fistula fruit samples based on the fingerprint data, HCA, which divides tested samples into groups, was performed [20]. HCA is a useful and simple statistical method for finding relatively homogeneous clusters based on measured characteristics, so that highly similar samples are clustered into the same group. It has recently been widely used for origin discrimination, identification and quality assessment of drugs, and in fingerprint analysis [28]. The method can classify different medicinal herbs using the peak areas from their HPLC fingerprints. The common characteristic peaks determined by SES were selected for HCA, which was performed on the relative peak areas of all fifteen common peaks.
The HCA results (Figure 8) show that the samples can be divided into four quality clusters. Interestingly, these four groups broadly corresponded to the three producing areas of the herbs (Table 1): Xinjiang Province, China (Groups A and C); India (Group B); and Sichuan Province, China (Group D). Group A comprised S2, S4, S5, S7, S9, S11 and S12; Group B, S1 and S6; Group C, S3; and Group D, S8 and S10. This division into four groups indicates differences in chemical species and content among samples from India, Xinjiang Province and Sichuan Province. Overall, the HCA results offer an intuitive and effective way to distinguish C. fistula fruit from different regions.
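The clustering step can be sketched in a few lines. The linkage and distance used in SPSS are not reported here; the sketch assumes between-groups (average) linkage on squared Euclidean distance, which we believe are SPSS's defaults, and runs on a small hypothetical matrix of relative peak areas rather than the study data. Rows are samples and columns are common peaks; standardizing the columns first is a further choice the text does not specify.

```python
import numpy as np


def average_linkage_clusters(X, n_clusters):
    """Agglomerative clustering with between-groups (average) linkage on squared
    Euclidean distance; returns clusters as sorted lists of row indices."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=2)  # pairwise squared Euclidean distances
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Between-groups linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return sorted(clusters)


# Hypothetical relative peak areas (rows = samples, columns = common peaks).
X = [
    [1.0, 1.0, 1.0],
    [1.1, 0.9, 1.0],
    [0.9, 1.0, 1.1],
    [5.0, 5.0, 5.0],
    [5.1, 4.9, 5.0],
    [4.9, 5.2, 5.0],
]
print(average_linkage_clusters(X, n_clusters=2))
```

Cutting the merge sequence at four clusters, as in Figure 8, corresponds to calling the function with n_clusters=4 on the twelve-sample matrix.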


Principal Components Analysis (PCA)
The HPLC responses of the fifteen characteristic components in twelve batches of C. fistula fruit were imported into SPSS 25.0 software (SPSS, Chicago, IL, USA), with peak areas as the variables. PCA provides an easy visualization of the complete data set in a reduced-dimension plot, showing how the components of different samples aggregate or disperse, and it was used to explain the differences between samples [29].
The first two principal components were extracted by PCA (SPSS 25.0), with a cumulative variance contribution of 82.1%. PC1 accounted for 61.8% of the total variance, the highest contribution, and therefore contained the most information; PC2 accounted for 20.3%. The factor loading matrix of the tested samples is shown in Table 4, and the PCA loading plot of the fifteen components of C. fistula fruit is shown in Figure 9 [20][24][30][31][32]. The loading plot shows the contribution of each variable to each principal component. Most compounds clustered relatively closely together, whereas five compounds were outliers responsible for the significant differences between medicinal materials from different places. Together, the factor loading matrix and loading plot indicated that the main components responsible for the separation were C12 (peak v), which contributed most to the grouping result. These five compounds were far more significant for chemotaxonomic discrimination than the other identified components, and their higher concentrations may be associated with better quality of C. fistula fruit.
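The variance contributions and loadings reported above come from an eigen-decomposition. The sketch below shows one common form of this computation, assuming PCA on the correlation matrix (standardized peak areas), as SPSS's principal-component extraction uses by default; loadings are eigenvectors scaled by the square root of their eigenvalues, which corresponds to a component matrix such as Table 4. The twelve-by-fifteen matrix is synthetic, not the study data, and component signs are arbitrary.

```python
import numpy as np


def pca_correlation(X, n_components=2):
    """PCA on the correlation matrix of X (rows = samples, columns = peaks).

    Returns (scores, loadings, explained_ratio) for the first n_components.
    Loadings are eigenvectors scaled by sqrt(eigenvalue); signs are arbitrary.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each peak
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    k = n_components
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    scores = Z @ eigvecs[:, :k]
    return scores, loadings, explained[:k]


# Synthetic 12 batches x 15 peaks built from two latent factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(12, 2))
X = 100 + 10 * factors @ rng.normal(size=(2, 15)) + rng.normal(size=(12, 15))
scores, loadings, explained = pca_correlation(X)
print(f"PC1 {explained[0]:.1%}, PC2 {explained[1]:.1%}, cumulative {explained.sum():.1%}")
```

Peaks whose loading vectors lie far from the main group in the PC1/PC2 plane are the outliers that drive the separation of samples.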

Reagents and Materials
A total of twelve batches of C. fistula fruit used in the present study were purchased from various companies and hospitals in two provinces of China. The batch numbers are listed in Table 1.

Preparation of Standard Solutions and Sample Solutions
Individual stock solutions of the reference compounds used for qualitative analysis were prepared by dissolving each reference in methanol at an appropriate concentration and were stored at 4 °C until use. The dried plant material was ground into powder and passed through a 65-mesh sieve. For each sample, 1 g of powder was accurately weighed and extracted with 100 mL of pure methanol by ultrasonication (KQ5200DA, 200 W, 40 kHz, Kunshan, China) for 90 min at room temperature. The extract was shaken well and filtered, and the filtrate was passed through a 0.45 µm microporous membrane to obtain the test solution.

Chemical Profiling by UHPLC/LTQ-Orbitrap MSn
Chromatographic analysis was performed on a Dionex UltiMate 3000 Series HPLC system (Thermo Scientific, Shanghai, China) equipped with a quaternary solvent delivery system and a column temperature controller. All samples were analyzed at a column temperature of 30 °C on a Diamosil C18 column (4.6 mm × 250 mm, 5 µm, Dema, Beijing, China). The mobile phase consisted of water containing 0.1% formic acid (eluent B) and acetonitrile (eluent A), with the gradient elution program 0-3 min, 10% A; 3-5 min, 10-15% A; 5-60 min, 15-26% A; 60-110 min, 26-66% A; 110-120 min, 66% A. The flow rate was 1.0 mL/min, and the injection volume was 10 µL.
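Both the UHPLC/MS method and the HPLC method below use the same five-segment acetonitrile gradient. Writing the program as breakpoints makes the mobile-phase composition at any time explicit. This is a sketch that assumes linear ramps between the stated times (the curve type is not given); the breakpoints are taken from the method text.

```python
# Gradient program from the method text: (time in min, % acetonitrile = eluent A).
GRADIENT = [(0, 10), (3, 10), (5, 15), (60, 26), (110, 66), (120, 66)]


def percent_a(t, program=GRADIENT):
    """% eluent A at time t (min), interpolating linearly between breakpoints."""
    if not program[0][0] <= t <= program[-1][0]:
        raise ValueError(f"t = {t} min is outside the gradient program")
    for (t0, a0), (t1, a1) in zip(program, program[1:]):
        if t <= t1:
            return a0 + (a1 - a0) * (t - t0) / (t1 - t0)
    return program[-1][1]


for t in (0, 4, 30, 85, 115):
    print(f"{t:>4} min: {percent_a(t):.1f}% A")
```

For example, halfway through the 60-110 min segment (85 min) the sketch gives 46% acetonitrile.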
A mass spectrometer (LTQ-Orbitrap XL, Thermo Scientific, Shanghai, China) was used for MS detection under the following operating conditions: capillary temperature, 350 °C; APCI vaporizer temperature, 300 °C; sheath gas flow, 30 L/min; auxiliary gas flow, 10 L/min; source voltage, 3 kV; capillary voltage, −35 V; tube lens, −110 V. Mass spectra were recorded over the range m/z 50-2000. Xcalibur 2.2 (Thermo workstation acquisition software) and Chromeleon 7.0 (UHPLC analysis software) were used for system control, data acquisition and data processing.

Similarity Analysis and Qualitative Analysis of Ten Standard Materials by HPLC
Similarity analysis and qualitative analysis were carried out on an LC-20AD Series HPLC system (Shimadzu, Hong Kong, China) equipped with a quaternary solvent delivery system and a column temperature controller. Chromatographic separation was performed on a Diamosil-C18 column (4.6 mm × 250 mm, 5 µm, Dema, Beijing, China) maintained at 30 °C. The mobile phase was composed of solvent B (0.1% phosphoric acid solution) and solvent A (acetonitrile) with the following gradient elution program: 0-3 min, 10% A; 3-5 min, 10-15% A; 5-60 min, 15-26% A; 60-110 min, 26-66% A; 110-120 min, 66% A. The flow rate was held constant at 1.0 mL/min, and the injection volume was 20 µL. The PDA detector scanned from 190 nm to 400 nm, and 254 nm was selected as the optimal detection wavelength for analysis.
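Fingerprint similarity is typically scored as the cosine of the angle between a sample's chromatographic profile and a reference profile. The sketch below illustrates that idea under stated assumptions: each batch is represented by its fifteen common-peak areas (as in Table S1) rather than by the full aligned chromatogram, and the reference is the peak-wise mean of all batches. The settings actually used in the SES software (reference generation, time-window alignment) are not reported here, and the function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two peak-area vectors (1.0 = identical profile shape)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def mean_reference(fingerprints):
    """Assumed reference fingerprint: the peak-wise mean over all batches."""
    n = len(fingerprints)
    return [sum(peak) / n for peak in zip(*fingerprints)]


def similarity_to_reference(fingerprints):
    """Similarity of each batch (one list of common-peak areas per batch) to the reference."""
    ref = mean_reference(fingerprints)
    return [cosine_similarity(fp, ref) for fp in fingerprints]
```

Because the cosine depends only on the relative proportions of the peaks, a batch that is uniformly more concentrated than the reference still scores 1.0; differences in composition, not overall extraction yield, lower the score.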

Data Analysis and Statistics
Components were assigned by comparing their exact molecular weights and fragment ions with an established database, and some of the proposed components were confirmed by comparison with standard substances. The HPLC data were analyzed with the Similarity Evaluation System for Chromatographic Fingerprints of TCMs (SES, 2012 version). Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) were performed with SPSS (SPSS Statistical Software Package, version 25.0, Chicago, IL, USA).
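Assignment by exact molecular weight rests on the mass error between the measured m/z and the theoretical m/z of a candidate formula, expressed in parts per million (ppm). The sketch below computes both. Assumptions: only singly charged [M-H]- and [M+H]+ ions are handled (the ion polarity is not stated in this section), the element table is limited to C, H, N and O, glucose is used purely to illustrate the arithmetic and is not a compound reported in this study, and the 5 ppm acceptance window in the hypothetical `matches` helper is an illustrative value, not the authors' criterion.

```python
# Monoisotopic masses (u) of the most abundant isotopes
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
}
PROTON_MASS = 1.007276466621  # u


def neutral_monoisotopic_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def ion_mz(formula, adduct="[M-H]-"):
    """Theoretical m/z of a singly charged deprotonated or protonated ion."""
    m = neutral_monoisotopic_mass(formula)
    if adduct == "[M-H]-":
        return m - PROTON_MASS
    if adduct == "[M+H]+":
        return m + PROTON_MASS
    raise ValueError(f"unsupported adduct: {adduct}")


def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def matches(measured_mz, formula, adduct="[M-H]-", tol_ppm=5.0):
    """Hypothetical acceptance rule: |mass error| <= tol_ppm (5 ppm is illustrative only)."""
    return abs(ppm_error(measured_mz, ion_mz(formula, adduct))) <= tol_ppm
```

For C6H12O6 the theoretical [M-H]- is about m/z 179.0561, so a measured m/z 179.0565 corresponds to an error of roughly 2.2 ppm. Formula candidates that pass such a filter are then distinguished by their fragment ions and, where available, by standard substances.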

Conclusions
The therapeutic effects of medicinal materials depend on their complex chemical constituents, so HPLC and UHPLC/LTQ-Orbitrap MSn fingerprint analysis combined with chemometrics was applied to study the complex system of the fruit of C. fistula. In summary, a common pattern was established by determining and comparing the fingerprints of twelve samples of the fruit of C. fistula from different sources, and fifteen common peaks were selected, which made it necessary to identify these peaks. A total of thirty-one components were detected by UHPLC/LTQ-Orbitrap MSn, of which twenty were tentatively identified by comparing their mass spectrometric data with those of reference compounds and literature data; ten of these were confirmed by comparison with standard materials. Furthermore, HCA and PCA provided a useful reference for quality analysis and origin research of this herb: twelve batches of medicinal materials from different origins were categorized, and five ingredients, including C12, were identified as the main contributors to their differences. Moreover, a new sennoside derivative (sennoside triglucoside) was tentatively inferred, and sennoside monoglucoside was deduced to occur in the fruit of C. fistula for the first time. Another compound (1-[1,5-dihydroxy-3-methyl-8-[3,4,5-trihydroxy-6-[(3,4,5-trihydroxyoxan-2-yl)oxymethyl]oxan-2-yl]oxynaphthalen-2-yl]ethanone) was also reported in this herb for the first time.
On the one hand, the multivariate statistical analysis was not systematic because of the limited number of samples collected. On the other hand, because the acetonitrile proportion reached only 66% at the end of the elution gradient, the more polar compounds were separated but the less polar compounds were not eluted. Subsequent experiments will address the chemical identification of the less polar components of the fruit of C. fistula and their systematic analysis.
Supplementary Materials: The following are available online. Table S1: Peak areas of the fifteen common peaks; Figure S1: Chromatogram of the secondary fragment ions of Compound 12; Figure S2