Global Profiling of the Antioxidant Constituents in Chebulae Fructus Based on an Integrative Strategy of UHPLC/IM-QTOF-MS, MS/MS Molecular Networking, and Spectrum-Effect Correlation

An integrative strategy of UHPLC/IM-QTOF-MS analysis, MS/MS molecular networking (MN), in-house library search, and collision cross-section (CCS) simulation and comparison was developed for the rapid characterization of the chemical constituents in Chebulae Fructus (CF). A total of 122 constituents were identified, most of which were phenolcarboxylic and tannic compounds. Subsequently, 1,3,6-tri-O-galloyl-β-d-glucose, terflavin A, 1,2,6-tri-O-galloyl-β-d-glucose, punicalagin B, chebulinic acid, chebulagic acid, 1,2,3,4,6-penta-O-galloyl-β-d-glucose, and chebulic acid, among the 23 common constituents of CF, were screened out by UPLC-PDA fingerprinting and multivariate statistical analyses (HCA, PCA, and OPLS-DA). Then, Pearson’s correlation analysis and a grey relational analysis were performed for the spectrum-effect correlation between the UPLC fingerprints and the antioxidant capacity of CF, which was finally validated by a UPLC-DPPH• analysis of the main antioxidant constituents. Our study provides a global identification of CF constituents and contributes to the quality control and development of functional foods and preparations based on CF.


Introduction
Terminalia chebula Retz. (Combretaceae) is a Terminalia tree native to South and Southeast Asia [1]. Its dried fruit, Chebulae Fructus (CF), is a well-known folk medicine widely used in the Tibetan and Ayurvedic systems of medicine [2] in countries such as China, India, and Nepal [3]. CF is recognized as "the king of medicine" and is frequently included in traditional prescriptions for its diverse pharmacological activities, such as antioxidant [4], anti-inflammatory [5], anti-diabetic [6], antimicrobial [7], and virustatic [8] activities. Modern pharmacological studies have found that CF exerts predominantly antioxidant activity owing to its high content of phenolcarboxylic and tannic antioxidants [9], as exemplified by gallic acid, ellagic acid, chebulic acid, corilagin, chebulanin, and chebulagic acid. These compounds have great intrinsic potential for preventing oxidative stress, inhibiting reactions caused by oxygen or peroxides, and counteracting the destructive effects of oxidation in animal tissues, which may confer further anti-aging benefits [10]. Among the edible fruits of the Himalayan region of India, CF has the highest antioxidant level and has been used as a daily dietary supplement by people living in the mountains [11]. A recent study revealed that corilagin, chebulagic acid, and chebulinic acid in CF may be absorbed and retained in the body for a comparatively long time to exert continuous therapeutic effects [12].
As is known, the difficulty in researching CF lies in the complexity of its components and their multiple bioactivities, whose associations have not been systematically unveiled, making it difficult to elucidate CF's effectiveness [16]. Mass spectrometry (MS)-based, non-targeted metabolomics has become an effective method for identifying bioactive constituents in Chinese medicines (CMs) because of its high throughput, fast screening, and wide coverage [16]. Global Natural Products Social (GNPS)-based molecular networking (MN) is a computational method that bridges the gap between popular mass spectrometric (LC-MS/MS) data processing and molecular networking analysis by calculating the similarities between MS/MS spectral pairs in GNPS [17]. Recently, LC-MS/MS coupled with molecular networking has enabled the identification of the main constituents, such as ellagic acid and its derivatives, of T. leiocarpa (DC.) Baill. [18]. In addition, spectrum-effect relationship modeling, enzymatic- and chemical-reagent-based online chromatographic screening [19], and biological target fishing [20] have been developed as optional methods for the fast discovery and screening of active constituents from CMs.
In this study, an integrative strategy of GNPS-based MS/MS classic molecular networking, an in-house library search, and a simulation and comparison of collision cross-sections (CCS) was first developed, based on UPLC/IM-QTOF-MS analysis, for the rapid identification of the phenolcarboxylic and tannic metabolites in CF (Figure 1). Next, the antioxidant capacity of CF was evaluated by DPPH•-scavenging, ABTS+-scavenging, and ferric-reducing assays, and then Pearson's correlation analysis and a grey relational analysis were performed to assess the spectrum-antioxidant relationship between the UPLC fingerprints and the antioxidant effects of CF. Finally, a comparative UPLC analysis before and after treatment with DPPH• was applied to verify the main antioxidants in CF.

Plant Material
Eighteen batches of CF (whole fruits or the flesh of fruits) were collected from different countries and regions in 2021; detailed information is given in Supplementary Table S1. All the collected materials (Samples S1-S18) were authenticated as the dried ripe fruits of Terminalia chebula Retz. based on the DNA-barcoding and PCR method, as reported by Prof. Xiaoxuan Tian from the Institute of Traditional Chinese Medicine at the Tianjin University of Traditional Chinese Medicine. As recorded in the Chinese Pharmacopoeia (2020 Edition) [21], the dried, ripe fruits of T. chebula were collected, and impurities were cleaned from their surface before they were washed and dried in the sun, resulting in CF fruits. To obtain the flesh of CF fruits, CF fruits were cleaned and briefly soaked to moisten them enough to remove the kernels, and were then dried in the sun. Before use, the dried materials were pulverized with a BJ-800A shredder (Hangzhou Baijie Technology Co., Ltd., Hangzhou, China) and passed through a 50-mesh sieve to yield CF powder.

GNPS-based MS/MS Classic Molecular Networking
Molecular networking is a spectral correlation and visualization approach that displays the chemical space present in tandem mass spectrometry (MS/MS) experiments. GNPS was used for MS/MS molecular networking, which visualizes sets of MS/MS spectra from related molecules (called spectral networks) even when the MS/MS spectra themselves have not been matched to any known compounds [17]. In MS/MS molecular networks, each MS/MS spectrum is displayed as a node and each spectrum-to-spectrum alignment as an edge. Because structurally related molecules yield similar fragments, and thus similar MS/MS patterns, their spectra tend to be clustered in one component [27].
Classical molecular networking is well suited for discovery and can be performed directly on raw mass spectrometry files. Before GNPS-based MS/MS classic molecular networking, all MS data files were converted from the .d file format (Agilent MassHunter) to the .mzXML format with MSConvert 3.023083, a standard conversion tool [28]. Through an FTP client, such as WinSCP 5.17.9, the .mzXML documents were uploaded to GNPS (https://gnps.ucsd.edu/, last accessed on 10 November 2022) for further data analysis. In this study, the mass spectrometry files of fractions F1 and F2, the only two fractions derived from the CF total extract, were uploaded. The spectrum file groups G1, G2, and G6 corresponded to F1, F2, and the blank, respectively, and the blank spectra (G6) were filtered out before networking (https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=8daa34c6c40849109f4c08302b0f2e01/, last accessed on 12 November 2022). The precursor and fragment ion mass tolerances were both set at 0.01 Da, and all other parameters were left at their defaults.
In addition, a MolNetEnhancer reanalysis of the molecular network was conducted for automatic chemical classification through ClassyFire [29] to provide a more comprehensive chemical overview while illuminating structural details for each fragmented spectrum (https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=b66535212e0c4027b88a0a2a96c75324/, last accessed on 12 November 2022). Finally, the Cytoscape file was downloaded and visualized with Cytoscape 3.8.2 software for further network analysis.
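To illustrate how spectral similarity turns into network edges, the following is a minimal sketch of a cosine score between two centroided MS/MS spectra, pairing fragment peaks within the 0.01 Da tolerance stated above. It is not the GNPS implementation: GNPS uses a modified cosine that also accounts for precursor mass shifts, and the square-root intensity scaling and the function name `cosine_score` are assumptions made here for illustration.

```python
import math

def cosine_score(spec_a, spec_b, tol=0.01):
    """Greedy cosine similarity between two centroided MS/MS spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are paired
    when their m/z values differ by at most `tol` Da.
    """
    # Square-root scaling damps very intense peaks; then scale to unit length.
    def normalise(spec):
        scaled = [(mz, math.sqrt(i)) for mz, i in spec]
        norm = math.sqrt(sum(i * i for _, i in scaled))
        return [(mz, i / norm) for mz, i in scaled] if norm else []

    a, b = normalise(spec_a), normalise(spec_b)
    # Collect all candidate peak pairs within tolerance, best products first.
    pairs = sorted(
        ((ia * ib, x, y)
         for x, (ma, ia) in enumerate(a)
         for y, (mb, ib) in enumerate(b)
         if abs(ma - mb) <= tol),
        reverse=True,
    )
    used_a, used_b, score = set(), set(), 0.0
    for product, x, y in pairs:
        if x not in used_a and y not in used_b:  # each peak is used once
            used_a.add(x)
            used_b.add(y)
            score += product
    return score
```

Two nodes would be connected by an edge when this score exceeds a chosen threshold; identical spectra score 1 and spectra with no shared fragments score 0.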

Establishment of In-House Compound Library
An in-house compound library of the currently reported constituents of CF was established through a comprehensive literature review of online databases, including SciFinder®, ScienceDirect (https://www.sciencedirect.com/, last accessed on 1 November 2022), PubMed (https://pubmed.ncbi.nlm.nih.gov/, last accessed on 1 November 2022), and CNKI (https://www.cnki.net/, last accessed on 10 October 2022), combined with the HMDB database (https://hmdb.ca/, last accessed on 15 November 2022). Comprehensive molecular information (such as the 2D and 3D structures, molecular formula, molecular weight, exact mass, and the reported MS pattern) on more than 140 compounds (including phenolcarboxylic and tannic compounds, flavonoids, and triterpenoids) was collated (Supplementary Table S3). Using the built-in library-search utility of the Agilent MassHunter Workstation 10.0 software, the constituents in CF were preliminarily identified by molecular formula, as well as by MS or MS/MS library search.

UPLC Fingerprinting and Multivariate Statistical Analysis
Each CF powder (0.2 g) was accurately weighed, placed into a 50 mL volumetric flask, and ultrasonically extracted (600 W, Zhixin Instrument Co., Ltd., Shanghai, China) with 50 mL of methanol for 20 min at 30 °C. The extract of each CF sample was then centrifuged (17,709× g for 10 min) at room temperature before direct injection for UPLC fingerprinting.
After UPLC analysis, the original data files with the suffix ".cdf" were imported into the Similarity Evaluation System for Chromatographic Fingerprint of TCM (Version 2012A) software for the generation of the chromatographic fingerprints, the assignment of the common peaks, and the calculation of spectral similarities. Multivariate statistical analyses, including HCA, PCA, and OPLS-DA, were conducted on the chromatographic data of the 18 batches of CF samples for non-targeted sample classification and the screening of differential metabolites, using the R programming language (version 4.2.2) and the SIMCA-P software (version 14.1; Umetrics, Umeå, Sweden). Heatmaps were produced using the heatmap package in R 4.2.2.

Antioxidant Assays, Spectrum-Effect Correlation, and DPPH• Pretreated UPLC Analysis
The antioxidant capacity of CF was evaluated by DPPH• and ABTS+ radical-scavenging and ferric-reducing antioxidant-power (FRAP) assays, following previously reported methods [30][31][32] with minor changes, as detailed in Supplementary Section S3 (Evaluation of antioxidant capacity). The spectrum-antioxidant relationship of CF was then studied by Pearson's correlation analysis [33] and grey relational analysis [34] between the relative contents (%) of the selected common constituents and the antioxidant effect (IC50 values of the DPPH• radical-scavenging effect, the ABTS+ radical-scavenging capacity at a concentration of 0.05 mg/mL, and the ferric-reducing antioxidant power at a concentration of 0.5 mg/mL) of the CF methanol extracts. A comparative UPLC-PDA analysis was performed before and after treatment with DPPH• (viz. UPLC-DPPH• analysis) [31] for a selected CF sample (sample S5). The test solution of CF methanol extract (CFME, 1.0 mg/mL) was mixed with an equal volume of DPPH• solution (1.0 mmol/L) and incubated in the dark at room temperature for 30 min before being centrifuged and injected for UPLC analysis using the above UPLC fingerprinting method. A solution of CFME (1.0 mg/mL) mixed with an equal volume of methanol served as the control sample (n = 6).

Integrative Strategy for Global Identification of the Constituents in CF
Herein, an integrative strategy combining UPLC/IM-QTOF-MS analysis, GNPS-based MS/MS classic molecular networking, an in-house compound library search, and CCS simulation and comparison was first developed for the global characterization of the chemical constituents in CF (Figure 1).
Firstly, the CF crude extract and the derived fractions (F1 and F2) were subjected to a UHPLC-DAD-(-)-QTOF-MS analysis before the GNPS-based MS/MS classic molecular networking, in which the network nodes were automatically annotated by a spectral library search with the corresponding MS/MS spectra. The molecular network was then re-analyzed with the MolNetEnhancer utility to yield enhanced results (Figure 2A) by integrating metabolomic mining and annotation tools [29]. Secondly, we established an in-house compound library that consisted of the compound name, molecular formula, and molecular weight of each reported constituent of CF, combined with the HMDB database, for matching the MS/MS spectral data of the observed compounds. This step helped to ensure the comprehensiveness and the novelty of the identification results because most of the constituents reported in the literature have not yet been included in the existing online databases. Thirdly, the MS cleavage law of specific compounds was summarized from the fragmentation patterns of the standard constituents, so that unknown constituents with similar fragmentation patterns could be inferred accordingly from the molecular formula, the glycosyl type, and the sequence of phenolcarboxylic or tannic substituents. This could be a key method for the discovery of novel compounds. The results of the molecular networking, the library search, and the fragmentation reference of the standard constituents were then combined and de-duplicated for a more global identification of the constituents in CF. Finally, the UPLC/IM-QTOF-MS analysis and the CCS simulation and comparison (Table 1) were used to assign more accurate chemical structures to the identified isomers. Meanwhile, a spider-web diagram was drawn based on the total-ion-current chromatographic (TIC) peak areas of the identified phenolcarboxylic and tannic constituents in CF (Figure 2B).

Identification Based on Molecular Networking and Annotation of the Network Nodes
The GNPS-based MS/MS classic molecular network of the CF crude extract and the derived fractions F1 and F2 was constructed, as shown in Figure 2A. In the molecular network, each node carried the precursor mass, retention time, and MS/MS fragmentation information, and nodes with highly similar MS/MS patterns were clustered into one component. There were 430 nodes in total, excluding the orphaned nodes, which cannot reflect relationships among the MS/MS spectra. However, due to the limited number of shared compounds collated in the GNPS-cooperative online public spectral libraries, only 19 nodes were auto-annotated in CF's molecular network (https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=8daa34c6c40849109f4c08302b0f2e01&view=view_all_annotations_DB/, last accessed on 12 November 2022). The molecular network was then re-analyzed by MolNetEnhancer to yield 4 tannic components (6, 17, 13, and 14) with 132 nodes, 1 triterpenoid component (4) with 12 nodes, 2 flavonoid components (3 and 30) with 4 nodes, and other components with fewer nodes (Figure 2A). Based on the quasi-molecular ions, the fragment ions, and the adduct ions of the annotated standard compounds and the MS/MS spectral similarities among the nodes, the chemical structures of other known and/or unknown compounds (i.e., MN-01 to MN-37, Supplementary Figure S34) and the derivative pathways were extrapolated, as shown in Supplementary Figures S35-S39. In other words, our study initially relied on classic molecular networking to pursue unknown compounds in the CF extract and the derived fractions, starting from the known compounds and their MS/MS fragmentation patterns.
In the molecular network, the gallotannin nodes clustered predominantly in component 6, including monomeric, dimeric, trimeric, tetrameric, and polymeric galloyl tannins. Components 17 and 13 mainly included the chebulic ellagitannins, which feature chebuloyl scaffolds in multiple derivative forms. For example, the two carboxyl groups in chebulic acid could be dehydrated to form an anhydride fragment ion (m/z 337.0201), while other types of chebuloyl (Che) in chebulanin, chebulagic acid, and chebulinic acid exposed three carboxyl groups by opening of the glycoside ester bond to form fragment ions at m/z 337.0201. In component 17, there were two kinds of chebulic ellagitannins in CF: one was simple chebulic acid polyol esters, and the other was tannins containing the Che unit. As compared to the quasi-molecular ion of chebulic acid (m/z 355.031, [M-H]−), the mass differences of the nodes at m/z 369.047, 383.062, and 411.093 were 14.016, 28.031, and 56.062 Da, respectively, indicating that they were mono-methylated, di-methylated, and tetra-methylated products of chebulic acid.
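The mass-difference reasoning for the methylated chebulic acid nodes can be sketched as follows. This is an illustrative check, not the authors' workflow: the helper name `methylation_count`, the 0.005 Da tolerance, and the upper limit of six CH2 units are assumptions; only the m/z values come from the text.

```python
CH2 = 14.01565  # monoisotopic mass of one CH2 unit (methylation: +CH3, -H)

def methylation_count(parent_mz, node_mz, tol=0.005, max_units=6):
    """Return n if node_mz - parent_mz matches n x CH2 within tol, else None."""
    delta = node_mz - parent_mz
    for n in range(1, max_units + 1):
        if abs(delta - n * CH2) <= tol:
            return n
    return None

chebulic_acid = 355.031  # [M-H]- of chebulic acid, as reported in the text
for mz in (369.047, 383.062, 411.093):
    # Print the node m/z, its mass difference, and the inferred CH2 count.
    print(mz, round(mz - chebulic_acid, 3), methylation_count(chebulic_acid, mz))
```

For the three nodes the function returns 1, 2, and 4 CH2 units, in line with the mono-, di-, and tetra-methylated assignments.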
Component 13 contained chebulic ellagitannins, also called chebulanin derivatives. The node at m/z 1303.18 was the [2M-H]− ion of chebulanin. As compared to the quasi-molecular ion of chebulanin (m/z 651.0839, [M-H]−), the structures of MN-29, MN-30, MN-31, and MN-33 were deduced from the nodes at m/z 815.152, 813.136, 807.126, and 803.094, with mass differences of 164, 162, 156, and 152 Da, respectively. The 1′-O-methyl neochebulanin node, with a precursor mass at m/z 683.110, was deduced from our in-house compound library search; thus, the dimethyl neochebulanin node at m/z 697.126, with a mass difference of 14 Da (as compared to m/z 683.110), could be deduced accordingly. Both were derived by cleavage of the 2-glycosidic bond in chebulanin, followed by mono-methylation or di-methylation of the exposed carboxyl groups.
The ellagitannin nodes were typically clustered in component 14 because they share a common fragment ion at m/z 300.9998. The ellagitannins could be identified accordingly and were distinguished based on their characteristic hexahydroxydiphenoyl (HHDP) fragments (Figure 2C), such as MN-12 (tR = 13.86 min, m/z 783.0672) and terflavin B (tR = 15.80 min, m/z 783.0679). The predominant fragment ion of MN-12 was t, which was a characteristic fragment ion of HHDP-glucose, while the predominant fragment ion of terflavin B was m/z 450.9940, which was a characteristic fragment ion of flavogallonic acid. In addition, component 14 also contained nodes of ellagitannins with rhamnose substituted by ellagic acyls, which could have been generated by the hydrolysis of HHDP, such as 116 (MN-15) and 118 (4-O-(4″-O-galloyl-α-L-rhamnosyl)ellagic acid) in Table 1.
Notably, the MS profiles of gallic acid, chebulic acid, chebulanin, and ellagic acid played crucial roles in the annotation of the nodes in components 6, 17, 13, and 14 (Supplementary Figures S35-S39), respectively.

Identification by In-House Library Search
Forty-seven compounds were preliminarily identified by an in-house library search based on the auto-MS/MS mode of data acquisition, which fragments the most intense precursor ions (Supplementary Table S4). However, the in-house library search may have resulted in inaccurate identifications due to the presence of multiple isomers for a single phenolcarboxylic or tannic compound. In addition, accurate identification by our in-house library search also required high resolution and a high signal-to-noise ratio in the MS and MS/MS spectra of the compounds identified.

Identification Based on the Cleavage Law of Standard Constituents
As shown in Figure 3 and Supplementary Figures S40-S49, gallic acid (12) [...] This fragment would be successively dehydrated and decarboxylated to yield fragment ions at m/z 319.0090 and 275.0200, or decarboxylated and dehydrated to yield fragment ions at m/z 293.0303 and 275.0200. Chebulic acid also produced a fragment ion at m/z 337.0201; however, unlike in the aforementioned compounds, its chebuloyl fragment could be generated by the rearrangement of chebulic acid. It should be noted that most ellagitannins generated additional ions of [M-2H]2− or [2M-H]−, in addition to the deprotonated ion [M-H]−. Their core structures could be easily identified by characteristic fragment ions with the sequential or simultaneous loss of 337 Da (chebuloyl unit). The compounds containing the HHDP moiety, such as punicalagins A (39) and B (47), corilagin (56), and chebulagic acid, produced a fragment ion at m/z 300.9990 in a way similar to ellagic acid [35] and to compounds containing the ellagic acyl group, as exemplified by terminalin (112) and 4-O-(3″,4″-di-O-galloyl-α-L-rhamnosyl)ellagic acid (121). The typical [M-H]− ion of 1,3,6-tri-O-galloyl-β-D-glucose (68) was observed at m/z 635.0879; it first lost 170 Da (one gallic acid unit) to yield m/z 465.0668, and then lost a galloyl unit (152 Da) and a C2H2O moiety to yield fragment ions at m/z 313.0569 and 271.0459, respectively. On this basis, a series of known compounds without reference standards and previously unknown compounds could be identified.
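The fragmentation chain of 1,3,6-tri-O-galloyl-β-D-glucose can be checked against a small table of neutral losses. The sketch below is an illustration under stated assumptions: the table `NEUTRAL_LOSSES`, the helper `annotate_loss`, and the 0.005 Da tolerance are hypothetical, and the loss masses are monoisotopic values computed from the elemental formulas shown. The fragment m/z values are taken from the text.

```python
# Monoisotopic masses (Da) of a few neutral losses relevant to galloyl esters.
NEUTRAL_LOSSES = {
    "H2O": 18.0106,
    "C2H2O": 42.0106,
    "CO2": 43.9898,
    "galloyl (C7H4O4)": 152.0110,
    "gallic acid (C7H6O5)": 170.0215,
}

def annotate_loss(precursor_mz, fragment_mz, tol=0.005):
    """Name the neutral loss between two ions, or None if nothing matches."""
    delta = precursor_mz - fragment_mz
    for name, mass in NEUTRAL_LOSSES.items():
        if abs(delta - mass) <= tol:
            return name
    return None

# [M-H]- of compound 68 and its successive fragments, as reported in the text.
chain = [635.0879, 465.0668, 313.0569, 271.0459]
for parent, child in zip(chain, chain[1:]):
    print(f"{parent} -> {child}: {annotate_loss(parent, child)}")
```

With these inputs the three steps are annotated as a gallic acid loss, a galloyl loss, and a C2H2O loss, and the same table assigns the 337.0201 to 319.0090 step of the chebuloyl fragment to dehydration.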

Isomer and Unknown Structure Discrimination Based on Collision Cross-Sections
To date, ion mobility spectrometry (IMS) has enabled the separation of compounds on the basis of differences in the mobility of ions through a buffer gas in an electric field. IM-QTOF-MS provides the CCS of each target analyte, a unique physicochemical parameter associated with the ion's shape, charge, and size, as well as the compound's structure and conformation [36]. Herein, the CCS values of the identified constituents were measured and predicted, as listed in Table 1, and some isomers were distinguished by comparing the measured CCS values with those predicted by machine-learning algorithms. For monogalloyl glucoses, the predicted CCS values (170.416-176.058 Å²) showed a good match with the measured CCS values (172.066-175.413 Å²), but for other compounds, the CCS simulation was unsatisfactory. Moreover, the galloylation (or other acylation) site on a glycose, shikimic acid, or quinic acid could not be assigned because isomers with similar structures gave almost identical CCS values and indistinguishable arrival times [37]. Despite this, the CCS values of all the possible chemical structures deduced from the aforementioned molecular networking were predicted and compared with the measured results for the constituents in CF, which helped, to some extent, to refine the final identification results.
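The comparison of measured and predicted CCS values can be expressed as a relative deviation and used to rank candidate structures. The following is a minimal sketch, not the authors' procedure: the function names and the candidate CCS values are hypothetical and are not taken from Table 1.

```python
def ccs_deviation(measured, predicted):
    """Relative deviation (%) of a predicted CCS from the measured value."""
    return (predicted - measured) / measured * 100.0

def rank_candidates(measured, candidates):
    """Sort candidate structures by absolute CCS deviation, smallest first.

    `candidates` maps a structure label to its predicted CCS (in square angstroms).
    """
    return sorted(
        ((name, ccs_deviation(measured, ccs)) for name, ccs in candidates.items()),
        key=lambda item: abs(item[1]),
    )

# Hypothetical predicted values for two isomeric candidates (not from Table 1).
candidates = {"isomer_A": 170.4, "isomer_B": 176.1}
for name, dev in rank_candidates(172.1, candidates):
    print(f"{name}: {dev:+.2f} %")
```

As the text notes, such a ranking is only informative when the candidates differ by more than the prediction error, which was not the case for many acylation-site isomers.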

Global Profiling of the Phenolcarboxylic and Tannic Constituents in CF
Plant polyphenolic acids, especially hydrolyzable tannins (HTs), have been increasingly recognized as playing vital roles in long-term health because they reduce the risk of chronic diseases [37]. HTs possess ester and glycosidic bonds in their structures and are easily hydrolyzed into simple phenolic compounds, sugars, and polyols. Gallotannins, ellagitannins, and chebulic ellagitannins [4,38] were the main HTs of CF. The hydrolysis of gallotannins yields gallic acid, sugar (mostly glucose), and polyols. Ellagitannins (ETs) represent a complex class of tannins formed by the esterification of polyols (i.e., glucose or rhamnose) with hexahydroxydiphenoyl (HHDP) or another phenolic acid, depending on the source, and can be hydrolyzed to yield ellagic acid [35]. The common phenolic acyls include valoneoyl (Val), sanguisorboyl (Sang), dehydrohexahydroxydiphenoyl (DHHDP), and chebuloyl (Che), which can be derived from HHDP by dehydrogenation, rearrangement, cycloreversion, etherification, or other reactions (Figure 2C). Because sugars and polyols carry multiple hydroxyl groups, ETs have enormous structural variability, arising from the different linking forms between HHDP residues and the glucose moiety and, in particular, from their tendency to polymerize into dimeric and oligomeric derivatives or to form multiple isomers after esterification with one or more functional groups [13,35]. Notably, chebulic ellagitannins appear to be the HTs with the highest content and the most complex structures in CF.
Using the described integrative identification strategy, a total of 122 constituents (Table 1, Supplementary Figure S34) were identified in CF. Of these, 106 were phenolcarboxylic and tannic compounds that could be classified into four categories: 20 phenolcarboxylic acids, 28 gallotannins, 25 ellagitannins, and 33 chebulic ellagitannins. Among them, gallic acid, ellagic acid, and chebulic acid were the most common phenolcarboxylic acids, serving as the structural units most frequently found in the hydrolyzable tannins of CF. Overall, CF is rich in HTs with relatively complex and diverse chemical structures, high molecular weights, and low abundance, which makes their differentiation and characterization difficult.
As previously mentioned, MS/MS molecular networking represents a network display of MS/MS spectral data, allowing the simultaneous identification of known metabolites and their structural analogs from the crude extracts and enriched components of CMs through the clustering of similar MS/MS spectral nodes. Despite the intrinsically complex constituents, the diverse mass-spectral fragmentation pathways, and the lack of specialized MS and MS/MS libraries dedicated to natural products, the rapid and accurate annotation of the network nodes remains both feasible and essential for the successful application of this method. Routinely, identification based on the cleavage law (viz. fragmentation pathways) of standard constituents has been well developed as the most reliable method for the qualitative and quantitative analysis of herbal medicines; however, its application is limited and, at times, impractical when reference standards for the major constituents are scarce. Under these circumstances, the construction of an in-house compound library dedicated to automatic search enabled a global preliminary assignment of the constituents and proved to be an effective choice for the node annotation of molecular networks. In addition, CCS measurement and prediction for the discrimination of isomers and unknown structures was also developed. However, measured CCS values of phenolcarboxylic and tannic isomers are lacking, and the characteristic fragmentations of standard compounds, from simple phenolcarboxylic molecules to oligomerized and highly polymerized tannic molecules, are not yet well understood. As a result, the combined use of the above complementary or mutually validating methods remains imperfect and open to debate for the fast, accurate, and global identification of the constituents in CF.
As depicted in Figure 4C, the CF samples were clustered into three groups with dominant contents of chebulagic acid (95), chebulinic acid (107), and ellagic acid (110)/gallic acid (12), respectively. Chebulagic acid, chebulinic acid, and chebulic acid were screened out according to their chemotaxonomic potential. Interestingly, seven of the eight potential chemotaxonomic markers mentioned previously belonged to the hydrolyzable tannins, which play crucial roles in the quality control of CF. Among the eight differential constituents, there were three gallotannins, one phenolcarboxylic acid, two ellagitannins, and two chebulic ellagitannins, most of which possess significant antioxidant properties [4]. In addition, 1,3,6-tri-O-galloyl-β-D-glucose (68) showed strong binding capabilities at sites on receptor-binding domain mutants, thereby inhibiting the viral entry of SARS-CoV-2 into target cells [39]. Compounds 95 (chebulagic acid) and 107 (chebulinic acid) showed direct, potent, and dose-dependent antiviral activities in vitro against HSV-2 and exhibited potent HCV NS3/4A inhibitory activities [8,40]. As reported, antioxidant treatment ameliorated respiratory syncytial virus (RSV)-induced disease and lung inflammation and potentially prevented long-term effects associated with RSV infection, such as bronchial asthma [41]. According to a recent study [9], CF showed a significant dose-dependent antioxidant effect due to its content of hydrolyzable tannins. Herein, we found that the contents of hydrolyzable tannins in the CF samples from group III were apparently lower than those in the samples from the other two groups.

Antioxidant Evaluation and Screening of the Antioxidant Constituents
The DPPH• radical-scavenging assay is one of the most common methods used for antioxidant evaluation in vitro, while the ABTS+ and FRAP assays are regularly combined with it to investigate antioxidant potentials. In our study, as shown in Supplementary Table S7 and Figure S53, CF sample S5 exhibited the most potent antioxidant activities in the three assays, with respective values of 2.993 µg/mL (IC50 of DPPH• radical scavenging), 14.47 ± 1.106 mmol/g (ABTS+ radical-scavenging capacity), and 1.566 ± 0.182 mmol/g (ferric reducing antioxidant power). The DPPH• radical-scavenging IC50 values of samples S1-S12 (2.993 ≤ IC50 ≤ 5.303 µg/mL) were significantly lower than those of samples S13-S18 (5.761 ≤ IC50 ≤ 10.110 µg/mL), and we noticed that samples S1-S12 were all from the whole fruits of CF, while samples S13-S18 were all from the flesh of CF. This phenomenon indicated that whole-fruit preservation before processing and extraction was beneficial for the conservation of the active antioxidant ingredients in CF, such as chebulagic acid and chebulinic acid.
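For readers less familiar with how an IC50 is obtained from a DPPH• assay, the following is a minimal sketch, not the procedure used in this study. It assumes the commonly used scavenging-rate formula based on control and sample absorbances and estimates the IC50 by linear interpolation between the two concentrations that bracket 50% scavenging. The function names and the numerical values are hypothetical.

```python
def scavenging_percent(a_control, a_sample):
    """Scavenging rate (%) from control and sample absorbances."""
    return (a_control - a_sample) / a_control * 100.0


def ic50_by_interpolation(concentrations, scavenging):
    """Estimate IC50 by linear interpolation across the 50% crossing.

    Assumption: scavenging rises with concentration around the crossing.
    Returns None if 50% is not bracketed by the measured points.
    """
    pairs = sorted(zip(concentrations, scavenging))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(pairs, pairs[1:]):
        if s_lo <= 50.0 <= s_hi and s_hi != s_lo:
            return c_lo + (50.0 - s_lo) * (c_hi - c_lo) / (s_hi - s_lo)
    return None


# Hypothetical dilution series (µg/mL) and scavenging rates (%)
conc = [1.0, 2.0, 4.0, 8.0]
rates = [20.0, 40.0, 60.0, 80.0]
ic50 = ic50_by_interpolation(conc, rates)
```

A lower IC50 corresponds to a stronger radical-scavenging capacity, which is why the IC50 values are read inversely to the antioxidant activity in the comparisons above. Curve fitting (for example, a logistic model) is a common alternative to interpolation.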
The spectrum-effect correlation is a reliable approach for understanding the relationship between the efficacy and the constituents using chemometric methods. Herein, Pearson's correlation analysis and the grey relational analysis were performed for the spectrum-antioxidant effect correlation between the UPLC fingerprints and the antioxidant effects.
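The Pearson step named above can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' actual workflow: the peak names, the peak areas, and the IC50 values are hypothetical, and the helper `pearson_r` is written out in plain Python so that the calculation is explicit.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


# Hypothetical normalized peak areas across five batches
peaks = {
    "peak_a": [10.0, 8.0, 6.0, 4.0, 2.0],
    "peak_b": [3.0, 4.0, 5.0, 6.0, 7.0],
}
# Hypothetical DPPH IC50 values (µg/mL) for the same batches
ic50 = [3.0, 4.0, 5.0, 6.0, 7.0]

r_values = {name: pearson_r(areas, ic50) for name, areas in peaks.items()}
# Peaks whose areas rise as IC50 falls (r < -0.5) track stronger activity
flagged = [name for name, r in r_values.items() if r < -0.5]
```

Because a lower IC50 means stronger scavenging, a negative coefficient against the IC50 values points to a positive association with antioxidant activity.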
A heat map of Pearson's correlation coefficients (Figure 5A) was plotted for the normalized areas of the 23 common peaks, the IC50 values of the DPPH• assay, and the total antioxidation capabilities (T-AOC) of the ABTS+ and FRAP assays for the 18 batches of CFME. The dark red and blue points depict the positive and negative correlations, respectively. Generally, the DPPH• radical-scavenging IC50 values were negatively correlated to the antioxidant activity, while the T-AOC values in the ABTS+ and FRAP assays were positively correlated to the antioxidant activity. The correlation coefficients of the ABTS+ and FRAP assays were opposite to those of the DPPH• assay. As a result, seven peaks (60, 72, 86, 89, 95, 97, and 107) with r < -0.5 showed obviously negative correlations to the DPPH• radical-scavenging activity, while six peaks (60, 72, 89, 95, 97, and 107) with r > 0. 
The grey relational analysis (GRA) was performed on the relative contents of the UPLC chromatographic peaks and the antioxidant capabilities, and the distances between the variables were calculated using the mean values to obtain the correlations between the contents (%) of the constituents and the antioxidant activities of the CF methanol extracts (Figure 5B). As a result, the relational degrees (r) between the identified constituents and the antioxidant efficacies ranged from 0.5289 to 0.8845. Specifically, eight constituents (15, 39, 56, 74, 80, 89, 95, and 97) had grey relational degrees above 0.7, indicating that the antioxidant capacity of CF may result from the combined effects of these highly relevant constituents.
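The grey relational degree can be illustrated with a minimal sketch of the widely used Deng formulation. Several points are assumptions and are not stated in the text: the distinguishing coefficient ρ = 0.5 (a conventional default), the choice of reference series, and all function names and data. The mean-value normalization follows the statement that the distances were calculated using the mean values.

```python
def mean_normalize(series):
    """Divide each value by the series mean (mean-value normalization)."""
    m = sum(series) / len(series)
    return [v / m for v in series]


def grey_relational_grades(reference, comparisons, rho=0.5):
    """Grey relational grade of each comparison series to the reference.

    rho is the distinguishing coefficient; 0.5 is an assumed default.
    """
    x0 = mean_normalize(reference)
    deltas = []
    for series in comparisons:
        xi = mean_normalize(series)
        deltas.append([abs(a - b) for a, b in zip(x0, xi)])
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # All series coincide with the reference after normalization
        return [1.0] * len(comparisons)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical data: an antioxidant index for three batches (reference)
# and the relative contents of two peaks in the same batches
activity = [1.0, 2.0, 3.0]
peak_contents = [[2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
grades = grey_relational_grades(activity, peak_contents)
```

In this toy example the first peak follows the reference exactly and receives a grade of 1, whereas the second peak, which trends in the opposite direction, receives a lower grade. When the IC50 is used as the activity index, it would first need to be converted so that larger values mean stronger activity (for example, by taking its reciprocal); whether and how this was done here is not specified.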
Finally, a comparative UPLC analysis, before and after treatment with DPPH• [30], was carried out to verify the main antioxidant metabolites in CF. As shown in Figure 5C, almost all the main constituents, except for the triterpenoid constituents, exhibited varying degrees of antioxidant activity, as indicated by the significant decrease in their relative contents, represented by the decrease (%) in their UPLC peak areas after treatment with DPPH• (Supplementary Table S8). The results showed that most of the active constituents were polymeric phenolcarboxylic or tannic compounds, and 47 (punicalagin B), 72 (1,2,6-tri-O-galloyl-β-D-glucose), 56 (corilagin), 95 (chebulagic acid), and 107 (chebulinic acid) possessed the most potent DPPH• radical-scavenging activities, with decreases of 43.16%, 31.48%, 31.06%, 27.18%, and 22.01% in their peak areas, respectively. Interestingly, the monomeric phenolcarboxylic or tannic compounds, the so-called structural units of tannins, such as 110 (ellagic acid), 12 (gallic acid), and 3 (chebulic acid), showed much weaker DPPH• scavenging effects, with decreases of 8.69%, 8.31%, and 4.19% in their peak areas, respectively, even though they had been previously thought to show excellent antioxidant activities [42].
As reported, the 13 major constituents demonstrated potent antioxidant activities, and the CF extracts showed stronger antioxidant activities than the individual chemical constituents [4]. However, our spectrum-effect correlation and the UPLC-DPPH• analysis revealed that the main phenolcarboxylic and tannic compounds, especially the so-called structural units of tannins, including chebulic acid and gallic acid, did not make a favorable contribution to the overall antioxidant activity of CFME. This phenomenon indicated that there may exist synergistic or antagonistic effects on the antioxidant activities of the main constituents in the crude extract of CF. Finally, our study resulted in the discoveries of compounds 15 ( 
Overall, the characterization and the fingerprinting of the phenolcarboxylic and tannic constituents, the antioxidant evaluation, and the spectrum-effect correlation were accomplished successively in our study, enabling the development of antioxidant constituent markers of chemotaxonomic significance for distinguishing CF samples. At the same time, we speculated that the common phenolcarboxylic and tannic constituents with high structural similarities possibly undergo mutual transformation and/or degradation during collection, preservation, and processing prior to being prescribed in a clinical setting, a question that will benefit from our ongoing in-depth studies.
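The peak-area comparison described above amounts to a simple relative-decrease calculation, sketched below as an illustration only. The function name, the peak labels, and the areas are hypothetical and are not taken from Supplementary Table S8.

```python
def peak_area_decrease(before, after):
    """Percentage decrease in peak area after DPPH treatment, per peak."""
    return {
        peak: (before[peak] - after[peak]) / before[peak] * 100.0
        for peak in before
        if peak in after and before[peak] > 0
    }


# Hypothetical peak areas before and after incubation with DPPH
before = {"A": 200.0, "B": 100.0}
after = {"A": 150.0, "B": 90.0}

decrease = peak_area_decrease(before, after)
# Rank peaks from the largest to the smallest decrease
ranked = sorted(decrease, key=decrease.get, reverse=True)
```

A larger decrease suggests that more of the constituent was consumed in the reaction with the radical, which is the basis for ranking the constituents by their scavenging contribution.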

Conclusions
In summary, with the specialized integrative identification strategy, a total of 122 constituents of CF were identified, including 20 phenolcarboxylic acids, 28 gallotannins, 25 ellagitannins, and 33 chebulic ellagitannins, and among them 38 were identified in CF for the first time. Furthermore, 1,3,6-tri-O-galloyl-β-D-glucose, terflavin A, 1,2,6-tri-O-galloyl-β-D-glucose, punicalagin B, chebulinic acid, chebulagic acid, 1,2,3,4,6-penta-O-galloyl-β-D-glucose, and chebulic acid, among the 23 common characteristic constituents of the CF samples, were screened out after multivariate statistical analyses. Finally, the antioxidant capacity of CF was evaluated, and the spectrum-antioxidant correlation was determined by Pearson's correlation analysis and a grey relational analysis before a comparative DPPH•-pre-treated UPLC analysis was applied to verify the significant antioxidant metabolites in CF. Our study not only provided a promising strategy for the global characterization and identification of the antioxidant constituents of CF, but it also laid a good technical and methodological foundation for future quality control and applications of CF. However, future studies are still needed to uncover the composition changes during collection, preservation, and processing; the distribution of phenolcarboxylic and tannic constituents in CF; the easily confused CMs; the different medicinal parts of T. chebula; and the development of medicinal and industrial products. In addition, in vivo and clinical studies of CF antioxidants are still needed before the development and application of CF and its main metabolites can be fully realized.

Figure 1. The new strategy for identification and characterization of the components in CF.


Figure 2. Classic molecular networking re-analyzed by MolNetEnhancer. (A) GNPS-based molecular networking of the UHPLC-QTOF-MS data; (B) the display of the ion abundance ratios of three types of hydrolyzable tannins and their core phenolcarboxylic acyls in mass spectrometry by spider-web mode; (C) the common phenolcarboxylic acyls for HHDP.


Table 1. The compounds identified by UPLC/IM-QTOF-MS and MS/MS molecular networking.
* The constituents identified by comparisons with the standard compounds; MNs represent those constituents identified based on analysis of the GNPS-molecular networking.