New Type of Tannins Identified from the Seeds of Cornus officinalis Sieb. et Zucc. by HPLC-ESI-MS/MS

There is a lack of information on the compound profile of Cornus officinalis Sieb. et Zucc. seeds, which greatly limits their optimal utilization. In our preliminary study, we found that the seed extract displayed a strong positive reaction to FeCl3 solution, indicating the presence of polyphenols. However, to date, only nine polyphenols have been isolated. In this study, HPLC-ESI-MS/MS was employed to fully reveal the polyphenol profile of the seed extracts. A total of 90 polyphenols were identified. They were classified into nine brevifolincarboxyl tannins and their derivatives, 34 ellagitannins, 21 gallotannins, and 26 phenolic acids and their derivatives. Most of these were identified for the first time from the seeds of C. officinalis. More importantly, five new types of tannins were reported for the first time: brevifolincarboxyl-trigalloyl-hexoside, digalloyl-dehydrohexahydroxydiphenoyl (DHHDP)-hexoside, galloyl-DHHDP-hexoside, DHHDP-hexahydroxydiphenoyl (HHDP)-galloyl-gluconic acid, and a peroxide product of DHHDP-trigalloylhexoside. Moreover, the total phenolic content was as high as 79,157 ± 563 mg gallic acid equivalent per 100 g in the seed extract. The results of this study not only enrich the structure database of tannins but also provide invaluable aid to the further utilization of the seeds in industry.


Introduction
Studies have shown that plant-based diets rich in polyphenols can exert health-promoting effects by reducing the risk of many diseases, such as cancer and neurodegenerative, cardiovascular, and inflammatory diseases. Therefore, it is vital to explore new sources of bioactive plant polyphenols and to characterize them for promoting human health [1][2][3].
Cornus officinalis, also known as Asiatic dogwood, is a deciduous shrub in the genus Cornus of the family Cornaceae that is mainly distributed in China, Korea, and Japan [4]. The pericarp of its fruit is used as a traditional Chinese herbal medicine that is widely used clinically along with other herbal medicine to treat different symptoms. For example, it has been used in combination with Mantidis Oötheca, Rubi Fructus, and Rosae laevigatae Fructus to clinically treat the urinary bladder dysfunction. It is prescribed together with Radix Rehmanniae Praeparata, Dioscoreae Rhizoma, Alismatis rhizoma, Moutan Cortex, and Poria to treat patients with vertigo, tinnitus, and waist and knees weakness [5]. Because of its wide traditional clinical use, many phytochemical and pharmacological studies have been conducted on the fruit pericarp. To date, about 90 compounds have been isolated and identified and are classified as terpenoids, flavonoids, tannins, polysaccharides, phenylpropanoids, sterols, and carboxylic acids, with iridoids, tannins, and flavonoids being the major compounds [6]. They display a wide range of pharmacological activities, such as hypoglycemic [7], antibacterial [8], hypolipidemic [9], antioxidant [10], anticancer [11], neuroprotective [12], and hepatoprotective activities [13].
In contrast, few reports have been published on the seeds because of their minor applications. The seeds account for approximately 50% of the fresh fruit's weight. It is estimated that approximately 6000 tons of seeds are generated annually, owing to the huge pericarp consumption in the Chinese medicine industry [14]. Moreover, seed use meets the 12th sustainable development goal (SDG 12), sustainable consumption and production, of the United Nations (UN) 2030 Agenda for Sustainable Development, adopted in 2015 [15]. Therefore, there is an urgent need to develop optimal processing methods for the valorization of the seeds.
Notably, fruit seeds can be converted into biofuels. Production of bio-oil from the seeds of cherry plum and peach has been reported [16][17][18][19]. However, based on our observations of the seeds of C. officinalis, we found that each seed has a tiny kernel surrounded by a thick wall of lignified endocarp. The kernel is the main source for biofuel conversion, as it comprises abundant fatty acids; however, it accounts for less than 5% of the seed weight, so biofuel conversion of the seeds is not feasible. Nevertheless, we found a water-soluble yellow powdery substance in the cavities of the thick endocarp, which accounts for 40% of the endocarp weight. Moreover, it gives a strong positive reaction to FeCl3 solution (6C6H5OH + FeCl3 → H3[Fe(C6H5O)6] (purple color) + 3HCl), indicating the presence of polyphenols.
In our pre-experimental study, we found that the number of reported polyphenols was far less than the number of polyphenols detected by HPLC in the pre-experiment. Therefore, the main objectives of the present study were to characterize and identify polyphenols in the seeds of C. officinalis and to provide valuable information for its use on an industrial scale, such as antioxidant additives in food or drugs.
HPLC-ESI-MS/MS is a powerful tool for the separation and identification of polyphenols in plant extracts and can make an invaluable contribution to polyphenol analysis. It was employed as the main investigation tool to achieve the study objectives [22].

General
In this study, a total of 90 phenolic components were identified using coupled chromatographic and mass spectrometric analysis of the water-soluble extract obtained from the seeds of C. officinalis. They were classified into nine brevifolincarboxyl tannins and their derivatives, as well as 34 ellagitannins, 21 gallotannins, and 26 phenolic acids and their derivatives. Among them, we reported five new types of tannin for the first time: brevifolincarboxyl-trigalloyl-hexoside, digalloyl-dehydrohexahydroxydiphenoyl (DHHDP)-hexoside, galloyl-DHHDP-hexoside, DHHDP-hexahydroxydiphenoyl (HHDP)-galloyl-gluconic acid, and the peroxide product of DHHDP-trigalloylhexoside.
The polyphenols were identified based on their chromatographic profiles, the MS data of their [M-H]− ions, and their MS/MS fragmentation profiles, by comparison with published data. Notably, the deprotonated polyphenol molecules and the typical cleavages of their precursor ions accelerated identification. The brevifolincarboxyl moiety was revealed by the specific fragment ions at m/z 247, 273, and 291. The DHHDP moiety in a tannin structure was indicated by the fragment ions of the brevifolincarboxyl moiety together with a fragment ion showing a 44-Da mass loss from [M-H]−, resulting from rearrangement and decarboxylation. The fragment ions at m/z 249.03, 275.02, and 300.99 are typical of the HHDP moiety. The galloyl moiety was revealed by the fragment ions at m/z 169.01 and 125.02. Furthermore, the number of galloyl groups in a tannin can be determined from a series of fragment ions separated by 152 Da, indicating consecutive losses of galloyl moieties. A 44-Da mass loss from the pseudo-molecular ion is characteristic of phenolic acids. The total ion chromatogram of the seed water extract of C. officinalis in negative ESI mode is shown in Figure 1. Compound identification within each class is detailed below and summarized in Table 1.
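The diagnostic-ion rules above can be expressed as a simple lookup. The following is a minimal sketch, not the authors' workflow: the function name `detect_moieties`, the 0.05 m/z tolerance, and the requirement of at least two matching ions are illustrative assumptions, while the reference m/z values are those quoted in the text.

```python
# Diagnostic fragment ions (negative mode) quoted in the text, grouped by moiety.
DIAGNOSTIC_IONS = {
    "brevifolincarboxyl": [247.0, 273.0, 291.0],
    "HHDP": [249.03, 275.02, 300.99],
    "galloyl": [169.01, 125.02],
}


def detect_moieties(fragments, tol=0.05, min_hits=2):
    """Return the moieties whose diagnostic ions appear among the fragment m/z values.

    tol and min_hits are assumed, illustrative thresholds (not from the paper).
    """
    found = {}
    for moiety, ions in DIAGNOSTIC_IONS.items():
        # Keep each reference ion that has at least one observed fragment within tol.
        hits = [ion for ion in ions
                if any(abs(mz - ion) <= tol for mz in fragments)]
        if len(hits) >= min_hits:
            found[moiety] = hits
    return found


# Fragment ions reported for B7 and for A1-1 in the text.
b7_fragments = [249.0408, 275.0200, 300.996, 169.0124, 125.0101]
a1_fragments = [247.0257, 273.0030, 291.0139]
b7_moieties = detect_moieties(b7_fragments)
a1_moieties = detect_moieties(a1_fragments)
```

With these inputs, the B7 fragments match only the HHDP and galloyl entries, and the A1-1 fragments match only the brevifolincarboxyl entry, in line with the assignments discussed in the text. A real screening step would also need intensity thresholds and instrument-specific mass accuracy, which are omitted here.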
[3]. Additionally, typical fragment ions of the brevifolincarboxyl moiety (274 Da) at m/z 247.0257, 273.0030, and 291.0139 were observed (see D8) [23]. A hexose core (180 Da) was then deduced from the mass difference between the molecular weight (910 Da) and the total weight of the assigned moieties (730 Da). Therefore, A1-1 was putatively assigned as a brevifolincarboxyl-trigalloyl-hexoside. Moreover, we found a fragment ion at m/z 435.0560 that resulted from the loss of H2O from the fragment ion at m/z 453.0302. This shows the presence of a brevifolincarboxyl-hexoside moiety, supporting the proposed structure. Furthermore, two other compounds, A1-2 and A1-3, with retention times of 21.28 min and 23.80 min, respectively, displayed the same pseudo-molecular ion and fragmentation pattern, indicating the occurrence of two further brevifolincarboxyl-trigalloyl-hexoside isomers. The three isomers differ in the positions at which the three galloyl moieties and the brevifolincarboxyl moiety are linked to the hexose core.
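The mass balance used to deduce the hexose core of A1-1 can be checked with a few lines of arithmetic. This is a minimal sketch that follows the nominal-mass bookkeeping stated in the text (152 Da per galloyl, 274 Da for brevifolincarboxyl); the helper name `residual_core_mass` is hypothetical.

```python
# Nominal moiety masses (Da) as quoted in the text.
MOIETY_MASS = {"galloyl": 152, "brevifolincarboxyl": 274}


def residual_core_mass(molecular_weight, moieties):
    """Return (core mass, assigned mass) after subtracting the assigned moieties.

    moieties maps a moiety name to the number of copies assigned.
    """
    assigned = sum(MOIETY_MASS[name] * count for name, count in moieties.items())
    return molecular_weight - assigned, assigned


# A1-1: molecular weight 910 Da, three galloyl units and one brevifolincarboxyl unit.
core, assigned = residual_core_mass(910, {"galloyl": 3, "brevifolincarboxyl": 1})
print(assigned, core)  # 730 180
```

The residual of 180 Da matches the mass of a hexose, which is the basis for the putative assignment; the calculation says nothing about linkage positions.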
To the best of our knowledge, brevifolincarboxyl-trigalloyl-hexoside-type tannins have not been reported. The only two analog compounds reported are decarboxylated geraniin, a product of geraniin treated with sodium benzenesulfinate [24], and repandusinin from the genus Mallotus [25].
Hydrolysable tannins include gallotannins (GT) and ellagitannins (ET). They are polyol esters, usually of glucose or quinic acid [26], with HHDP and gallic acid moieties [22]. A1 features a tannin with a brevifolincarboxyl moiety linked to a hexose, which has not been widely described. The MS/MS data of A1-1 revealed that the fragment ions at m/z 247, 273, and 291 can be used as typical indicator ions for identifying a brevifolincarboxyl moiety in a tannin structure [3,27]. The MS fragmentation pattern of A1-1 is shown in Figure 2.

This was speculated to be formed by the loss of two consecutive galloyl moieties and H2O from the fragment ion at m/z 757.0929. However, the mass difference of 44 Da indicated that the fragment ion at m/z 757.0929 was formed by the loss of a carboxyl group from the pseudo-molecular ion at m/z 801.0795. Thus, it can be inferred that the brevifolincarboxyl group was not part of the final structure of A3. Based on the reaction of geraniin with sodium benzenesulfinate [24], we suggest that the brevifolincarboxyl group was formed from DHHDP by rearrangement and decarboxylation (Figure 3). Therefore, A3 was tentatively identified as digalloyl-DHHDP-hexoside.

A1-1, A1-2, A1-3, A2, and A3. Therefore, A4 was tentatively identified as a DHHDP-trigalloylhexoside. This study is the first to report this finding from the seeds of C. officinalis. The only known tannin of the DHHDP-trigalloylhexoside type is isoterchebin, with the structure 1,2,3-O-galloyl-4,6-O-DHHDP-β-D-glucose, which was reported in the fruit of Cornus officinalis Sieb. et Zucc. by Okuda in 1981 [29].

[30]. Therefore, A6 was tentatively assigned as a DHHDP-HHDP-galloyl-gluconic acid. The group of fragment ions at m/z 247.0253, 273.0068, and 291.0096, corresponding to the brevifolincarboxyl moiety, confirmed the occurrence of the DHHDP moiety.
A tannin of the DHHDP-HHDP-galloyl-gluconic acid type, such as A6, has not been reported to date. A typical mass loss of 318 Da was observed for the DHHDP moiety [3]. This tannin type is characterized by gluconic acid as the polyol core, which is rarely reported in tannin structures. Lagerstannin C (galloyl-HHDP-gluconic acid) from Lagerstroemia speciosa (L.) Pers., punigluconin (digalloyl-HHDP-gluconic acid) from pomegranate (Punica granatum L.) peel, and 12 mixed HHDP-galloyl-gluconic acids from the jabuticaba species are examples of tannins with gluconic acid as the core [31].
Based on the fragmentation pattern of A4, it was assumed that A7 was a peroxide product of DHHDP-trigalloylhexoside (Figure 4), which should be regarded as an intermediate product in the decarboxylation process from A4 to A1.

Possible structures of the tannin types A1-A7 are illustrated in Figure 5. Regarding the polyphenol structure of tannins, the moieties attached to the polyol are the galloyl group in gallotannins (type I), the HHDP group in ellagitannins (type II), the DHHDP group in dehydroellagitannins (type III), and the transformed DHHDP group in transformed dehydroellagitannins (type IV) [32]. Herein, we report for the first time the occurrence of the brevifolincarboxyl moiety as a substituent on the hexose core in A1-1, A1-2, A1-3, and A2 from the seeds of C. officinalis. To date, only two brevifolincarboxyl tannins have been reported: 1-O-galloyl-3,6-HHDP-4-O-brevifolincarboxyl-β-D-glucopyranose, which is the basic hydrolytic product of geraniin, and repandusinin from the genus Mallotus [24,25]. Our study expands the structural diversity of brevifolincarboxyl tannins, which can be classified as a new type V tannin. Based on the DHHDP moiety fragment pattern in A3, A4, and A5, the brevifolincarboxyl moiety is thought to be biosynthetically derived from the DHHDP moiety by rearrangement, decarboxylation, and lactonization [24]. The biosynthetic relationship is illustrated in Figure 6.

Molecules 2023, 28, x FOR PEER REVIEW

B1-1 showed an [M-2H]2− ion at m/z 708.0711 with a retention time of 9.6 min, corresponding to a molecular weight of 1418 Da. The singly charged fragment ion at m/z 785.0749 corresponded to a valoneoyl-galloyl-hexoside, such as isorugosin B [33], resulting from the loss of the HHDP-galloyl-hexose moiety (e.g., gemin D, B2). This enabled the tentative identification of B1-1 as a dimer composed of a valoneoyl-galloyl-hexoside and an HHDP-galloyl-hexoside, such as camptothin A [34].

The fragment ion at m/z 481.0508 corresponded to the HHDP-hexoside moiety, resulting from the loss of one HHDP moiety from the pseudo-parent ion. B3 was then tentatively identified as a bis-HHDP-hexose-type tannin [22]. The fragment ion at m/z 300.9970 was typical of ellagic acid, indicating the occurrence of the HHDP moiety. Dissociation of the ion at m/z 300.9970 yielded a fragment at m/z 257.0208, which is characteristic of LHHDP produced by the loss of a free carboxyl unit (44 Da) from ellagic acid.

Figure 6. Suggested biosynthesis route of the brevifolincarboxyl moiety from the DHHDP moiety.

Ellagitannins
B4-3, with a retention time of 16.19 min, was characterized as a digalloyl-HHDP-hexoside-type tannin, with tellimagrandin I as an example [36]. This identification was based on its [M-H]− ion at m/z 785.0776 and the release of a typical fragment ion at m/z 483.1278 corresponding to an HHDP-hexoside moiety, resulting from the loss of two galloyl moieties (152 Da each) from the [M-H]− ion. Typical fragment ions at m/z 300.9963, 275.0201, and 249.0407 confirmed the presence of the HHDP moiety in B4-3. B4-3 has two other isomers, B4-2 and B4-1, with retention times of 11.65 and 13.85 min, respectively, which showed a fragmentation pattern similar to that of B4-3.
The molecular weight of B5-1 was determined to be 1570 Da based on the doubly deprotonated ion at m/z 784.0739 with a retention time of 12.51 min. The fragment ion at m/z 785.0741 corresponded to an HHDP-digalloyl-hexoside moiety, such as tellimagrandin I (B4), indicating that B5-1 was a dimer composed of two HHDP-digalloyl-hexoside moieties joined by the elimination of H2. The fragment ion at m/z 450.990, attributed to valoneic acid trilactone [VTL-1]−, indicated the occurrence of a valoneoyl bridge. Thus, B5-1 was tentatively determined to be an HHDP-digalloyl-hexoside dimer-type tannin, such as cornusiin A [34]. Moreover, the fragment ions at m/z 633.06 and 300.9970, attributed to an HHDP-galloyl-hexoside moiety, such as gemin D (B2), and to ellagic acid, respectively, further supported the proposed B5-1 structure. Additionally, there were five other isomers or anomers of B5-1, with retention times of 13.46, 14.38, 15.51, 17.95, and 19.62 min, that showed identical molecular weights and fragmentation patterns.
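The molecular weights of the oligomeric ellagitannins are derived from multiply deprotonated ions. As a minimal sketch of that conversion, assuming an [M-zH]z− ion and the standard proton mass of 1.007276 Da (a value not stated in the text), the neutral mass is M = z·(m/z) + z·m(H+). The function name `neutral_mass` is illustrative.

```python
PROTON_MASS = 1.007276  # Da; standard value, assumed here (not given in the text)


def neutral_mass(mz, charge):
    """Neutral mass M of a compound observed as an [M - zH]z- ion."""
    z = abs(charge)
    return z * mz + z * PROTON_MASS


# Doubly deprotonated ions reported in the text and the molecular weights assigned to them.
reported = {
    "B1-1": (708.0711, 1418),
    "B5-1": (784.0739, 1570),
    "B9-1": (1176.0540, 2354),
    "B11-1": (860.0783, 1722),
}
for name, (mz, stated_mw) in reported.items():
    print(name, round(neutral_mass(mz, 2), 2), stated_mw)
```

The computed monoisotopic masses round to the nominal molecular weights stated in the text, which is the level of agreement expected from this bookkeeping.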
B6 displayed a molecular weight of 1086 Da, based on the doubly deprotonated ion at m/z 542.03, with a retention time of 14.38 min. The fragment ion at m/z 785.0763 was attributed to the HHDP-digalloyl-hexoside moiety, such as tellimagrandin I (B4), which resulted from the loss of the ellagic acid (EA) moiety from the pseudo-parent ion. The cornusiin B isomer was tentatively assigned as E13 [37]. The fragment ions at m/z 633.0634, 450.970, and 300.997 indicated the presence of gemin D, VTL, and EA moieties, respectively, which supported the proposed structure.
B7 exhibited a pseudo-molecular ion at m/z 953.0909. The fragment ion at m/z 785.0754 resulted from the loss of a valoneoyl moiety, which was supported by the fragment ion at m/z 909.0972, corresponding to the loss of a 44-Da carboxyl unit. The fragment ions at m/z 615.0253 and 462.9903 were attributed to dehydrated galloyl-HHDP-hexoside and dehydrated HHDP-hexoside, respectively. They indicated the consecutive loss of two galloyl moieties from the fragment ion at m/z 785.0754. Thus, B7 was presumed to be a valoneoyl-HHDP-digalloyl-hexoside-type tannin, such as isocoriariin B [15]. The fragment ions at m/z 249.0408, 275.0200, 300.996, 169.0124, and 125.0101 were attributed to the HHDP and galloyl moieties in B7, which further confirmed the proposed structure.
A molecular weight of 2202 Da was assigned to B8, based on the doubly deprotonated ion at m/z 1100, with a retention time of 15.51 min. The fragment ion at m/z 1417.0668 resulted from the loss of a valoneoyl-galloyl-hexoside moiety, such as isorugosin F. This indicated the presence of a B1 moiety, the dimer formed from HHDP-galloyl-hexoside and valoneoyl-digalloyl-hexoside, such as gemin D (B2) and isorugosin F. The fragment ion at m/z 633.0761 indicated the occurrence of the HHDP-galloyl-hexoside moiety, such as gemin D (B2), through the 1568-Da mass loss of a dimer of two valoneoyl-digalloyl-hexosides, such as isorugosin B. Based on these results, B8 was identified as a trimer of HHDP-galloyl-hexoside and two valoneoyl-digalloyl-hexosides, such as cornusiin F [38]. Moreover, the fragment ions at m/z 450.9950, 783.0518, and 1567.1799 indicated, respectively, the occurrence of the VTL moiety, a dehydrated valoneoyl-digalloyl-hexoside moiety, such as isorugosin F, and the moiety resulting from the loss of an HHDP-galloyl-hexoside moiety, such as gemin D, from the pseudo-parent ion. These findings supported the proposed structure of B8.
A molecular weight of 2354 Da was assigned to B9-1, based on the doubly deprotonated ion at m/z 1176.0540, with a retention time of 15.51 min. The fragment ion at m/z 1417.1671 resulted from the loss of a valoneoyl-digalloyl-hexoside moiety, such as isorugosin B, indicating the presence of a B1 moiety, a dimer formed from HHDP-galloyl-hexoside and valoneoyl-digalloyl-hexoside, such as gemin D and isorugosin F. The fragment ion at m/z 633.0614 indicated the occurrence of gemin D through the 1720-Da mass loss of a dimer of valoneoyl-digalloyl-hexoside and valoneoyl-trigalloyl-hexoside moieties, such as isorugosin B and isorugosin F, from the pseudo-parent ion. Based on these results, B9-1 was identified as a trimer of HHDP-galloyl-hexoside, valoneoyl-digalloyl-hexoside, and valoneoyl-trigalloyl-hexoside, such as cornusiin C [34]. Moreover, the fragment ions at m/z 450.9904, 783.052, and 935.064 indicated, respectively, the occurrence of a valoneoyl moiety, a dehydrated valoneoyl-digalloyl-hexoside moiety, such as isorugosin F, and a dehydrated valoneoyl-trigalloyl-hexoside, such as isorugosin B. These supported the proposed structure of B9-1. B9-2, B9-3, B9-4, and B9-5 displayed retention times of 18.36, 17.01, 17.95, 19.62, and 22.04 min, respectively. Moreover, they exhibited fragmentation patterns similar to that of B9-1. Therefore, they were identified as isomers of B9-1 that differ in the positions of the moieties linked to the hexose core, or as anomers with a different configuration of the anomeric hydrogen at C-1 of the hexose core.
A molecular weight of 938 Da was assigned to B10-1, based on the doubly deprotonated ion at m/z 468.0396, with a retention time of 17.95 min. The fragment ion at m/z 767.05313 was attributed to a dehydrated HHDP-digalloyl-hexoside (B4), such as tellimagrandin I, resulting from the loss of a galloyl moiety and H2O, which enabled the identification of B10-1 as an HHDP-trigalloyl-hexoside-type tannin, such as tellimagrandin II [39]. The other fragment ions at m/z 614.9811 and 300.9920 indicated the occurrence of dehydrated HHDP-digalloyl-hexoside and HHDP moieties, which supported the proposed structure of B10-1. Based on its similar fragmentation pattern, B10-2, with a retention time of 18.36 min, was also identified as an HHDP-trigalloyl-hexoside-type tannin, differing from B10-1 in the positions of the moieties linked to the hexose core.
A molecular weight of 1722 Da was assigned to B11-1, based on the doubly deprotonated ion at m/z 860.0783, with a retention time of 17.01 min. The fragment ion at m/z 937.0754 resulted from the loss of HHDP-digalloyl-hexoside, indicating the occurrence of a valoneoyl-digalloyl-hexoside moiety, such as isorugosin B, which enabled the tentative identification of B11-1 as a dimer of HHDP-digalloyl-hexoside and valoneoyl-digalloyl-hexoside, such as cornusiin D [39].

It released a fragment ion at m/z 632.97, which was attributed to an HHDP-galloyl-hexoside moiety (B2) resulting from the loss of an HHDP (302 Da) from the pseudo-molecular ion. B12-1 was tentatively identified as galloyl-bis-HHDP-hexoside. Moreover, the fragment ion at m/z 783.01 resulted from the loss of gallic acid from [M-H]−. The presence of the HHDP moiety was confirmed by the fragment ion at m/z 300.997. B12-2, with a retention time of 21.64 min, was also a galloyl-bis-HHDP-hexoside-type tannin and displayed a fragmentation pattern similar to that of B12-1. The galloyl-bis-HHDP-hexoside-type tannin has been reported in pomegranate (Punica granatum L.) peel [40] but has not previously been detected in the seeds of Cornus officinalis Sieb. et Zucc.
B13 was assigned as an ellagic acid pentoside which had a pseudo-molecular ion at m/z 433.0394 and MS/MS fragment ions at 299.997 and 300.9634; this dissociation pattern was observed in Fragaria chiloensis berries [41] and attributed to an ellagic acid pentoside.
B14, which exhibited a pseudo-molecular ion at m/z 447.0561 and fragment ions at m/z 315.0157 (loss of a pentoside residue, 132 Da) and m/z 299.9885 (further loss of methyl) in the MS/MS spectrum, could be attributed to methyl ellagic acid pentoside. This hypothesis is in agreement with the fragmentation that yielded m/z 271 by the loss of CO2 from methyl ellagic acid. Methyl ellagic acid derivatives were also detected in strawberries by Seeram et al. [42].
Example structures of tannin types B1-B14 are illustrated in Figure 7.

Gallotannins
C1-1, C1-2, and C1-3, with retention times of 4.2, 5.37, and 7.05 min, respectively, were characterized as monogalloyl-hexoside isomers. This identification was based on the [M-H] − ion at m/z 331.069, and the fragment ions at m/z 169.012 indicating the loss of a hexose moiety (162 Da) and m/z 125.023 typical for the galloyl moiety resulting from the loss of the carboxylic function (44 Da) [31]. These compounds differ in the linkage position of the galloyl moiety to the hexose core.
Five compounds, C2-1 to C2-5 (tR 6.0, 7.05, 8.69, 10.09, and 10.69 min), with the same precursor ion at m/z 483.07, were identified as digalloyl-hexoside isomers based on the product ion at m/z 331.069, which corresponds to a monogalloyl-hexoside and results from the loss of a galloyl moiety (152 Da) from the parent ion [31]. Moreover, the fragment ion at m/z 169.012, indicating a galloyl moiety, resulted from the loss of a hexose moiety (162 Da) from the pseudo-molecular ion.
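The neutral losses used for the gallotannin assignments can be annotated automatically along a fragmentation chain. The sketch below is illustrative only: the function name `annotate_losses` and the 0.02 Da tolerance are assumptions, and the reference losses are standard monoisotopic values (H2O, CO2, a galloyl residue C7H4O4, and a hexose residue C6H10O5) rather than values taken from the text.

```python
# Standard monoisotopic neutral losses in Da (assumed reference values, not from the text).
NEUTRAL_LOSSES = {
    "H2O": 18.011,
    "CO2": 43.990,
    "galloyl": 152.011,
    "hexose residue": 162.053,
}


def annotate_losses(ions, tol=0.02):
    """Label the mass difference between consecutive ions of a fragmentation chain."""
    labels = []
    for parent, child in zip(ions, ions[1:]):
        delta = parent - child
        # Take the first reference loss that agrees with the observed difference.
        match = next((name for name, mass in NEUTRAL_LOSSES.items()
                      if abs(delta - mass) <= tol), None)
        labels.append((round(delta, 3), match))
    return labels


# Digalloyl-hexoside chain quoted in the text: precursor, then successive fragments.
chain = [483.07, 331.069, 169.012, 125.023]
labels = annotate_losses(chain)
galloyl_losses = sum(1 for _, name in labels if name == "galloyl")
```

For this chain the three steps are labelled as a galloyl loss, a hexose-residue loss, and a CO2 loss, which mirrors the reasoning given for C1 and C2. Counting the galloyl-labelled steps gives only the losses observed in the chain, so it is a lower bound on the number of galloyl groups in the molecule.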
C7 gave a pseudo-molecular ion [M-H] − at m/z 361.0796, which liberated fragment ions at m/z 169.0129, indicating the loss of a heptose moiety (210 Da), and m/z 125.023 typical for galloyl moiety. Therefore, this compound was identified as monogalloyl-heptoside, which was in accordance with previous results [13].
C8 was assigned as a digalloyl heptoside, which displayed an [M-H] − at m/z 513.0904. It produced fragment ions at m/z 361.0781 (C7) and 343.071, which resulted from the loss of a gallic acid moiety (152 Da) and a further loss of water (18 Da).
Example structures of tannin types C1-C8 are illustrated in Figure 8.

[44][45][46][47][48]. Ellagic acid was assigned to D7 based on the typical fragment ions at m/z 249.03, 275.02, and 300.9966. D8, with a precursor ion [M-H]− at m/z 291.05, was assigned as brevifolin carboxylic acid, based on the fragment ion at m/z 247.0256 resulting from the loss of the carboxyl moiety. The MS data were in agreement with those previously reported for brevifolin carboxylic acid [23].

Hydroxycinnamic Acids and Their Derivatives
D9 and D10 were tentatively identified as citric acid derivatives based on the typical fragment ions of citric acid at m/z 102.9472, 146.9371, and 190.9266. D13-1 was identified as caftaric acid (m/z 311.0364), which showed the loss of a tartaric acid moiety (132 Da) in the MS/MS experiment and a partial decarboxylation of the caffeic acid moiety, resulting in fragments at m/z 179.0555 and 135.0125. This fragmentation pattern was also observed for D13-2, which was distinguished by its retention time (Table 1); the two compounds presumably differ in the D/L configuration of the tartaric acid moiety. D11 (m/z 295.0671) was identified as caffeoylmalic acid, based on its fragments at m/z 133.0124, 71.0120, and 115.0020, which are characteristic of the malic acid moiety, as well as the typical fragment of caffeic acid at m/z 179. D12 showed an [M-H]− ion at m/z 393.03 and a loss of 98 Da in the MS/MS, resulting in a fragment at m/z 295.0671, which, in turn, showed a fragmentation pattern identical to that of D11. Therefore, it was concluded that D12 represented a caffeoylmalic acid derivative [44][45][46][47][48].

Additionally, typical fragments of ellagic acid at m/z 300.9971 and 299.9874 were observed. Therefore, D24 was identified as a valoneic acid bilactone isomer [3]. To our knowledge, valoneic acid bilactone has not been reported in the seeds of Cornus officinalis Sieb. et Zucc.

Hydroxybenzoic Acids and Their Derivatives
The structures of the phenolic acids are illustrated in Figure 9.

Non-Phenolic Compounds
Other non-phenolic compounds, such as free malic, citric, tartaric, and quinic acids, were identified. E3-1 and E3-2 exhibited the same fragments at m/z 71.01199, 115.00212, and 133.01247, which are characteristic of the fragmentation pattern of malic acid. However, E3-1 and E3-2 differed in retention time (4.2 and 5.37 min, respectively), indicating two isomers of malic acid. E2-1 and E2-2 exhibited the same [M-H]− ion at m/z 149.0081 and were detected at retention times of 3.91 and 12.84 min, indicating the occurrence of two tartaric acid isomers. This identification was based on the fragments at m/z 105.0180 and 87.0066. E4-1 and E4-2, exhibiting the same [M-H]− ion at m/z 191.0193, were detected at 4.2 and 5.37 min, indicating different isomeric structures, and they were identified as quinic acids based on the typical fragments of quinic acid at m/z 191.0193, 173.0683, and 111.0068 [44][45][46][47][48].
The structures of the non-phenolic compounds are illustrated in Figure 10.


Figure 10. Structures of the non-phenolic compounds.

Total Phenolic Content (TPC)
The identified compounds indicated that the aqueous extract of the seeds was rich in tannins. We then determined the TPC using the Folin-Ciocalteu colorimetric method, which gave a value of 79,157 ± 563 mg gallic acid equivalent (GAE)/100 g of seed extract. Compared with tannin-rich fruits and kernels, such as raspberries (average 233.50 mg/100 g fresh weight), pomegranates (>10 g/100 g dry material), peach kernels (ranging from 3.8 to 12.7 g/100 g), and the kernels of apricot cultivars (ranging from 10.60 to 209.4 mg GAE/100 g), the seed extract of C. officinalis provides a new source of tannins, indicating its potential as an antioxidant for use in the food industry [49][50][51][52].

Solvents and Reagents
Gallic acid (GA) was obtained from the National Institutes for Food and Drug Control (Beijing, China). Acetonitrile and formic acid were of HPLC grade and purchased from Dikma Scientific (Tianjin, China). Folin-Ciocalteu reagent was obtained from Yuanyie Biotech Co., Ltd. (Shanghai, China). The water was distilled and deionized.

Plant Source
Mature fruits of C. officinalis were harvested in October 2021 from Muzhi County in Luoyang, Henan, China. The samples were identified by Prof. Ximing Lu, Medical College, Henan University of Science and Technology, Luoyang, China. After separation from the fruits, the seeds were air-dried at room temperature and then stored at 4 °C prior to analysis. Voucher specimens are maintained in the college herbarium under certificate No. 22-7(7).

Sample Preparation
Owing to the structurally unstable nature of polyphenols, extraction was performed by percolation at room temperature (20 °C). Because polyphenols are water soluble, water was used as the extraction solvent. Percolation was performed in a stainless-steel percolator with a ball valve at the bottom. The inner diameter and height of the percolator were 5 cm and 30 cm, respectively. First, 20 mL of water was poured into the percolator, and then 50 g of milled seeds was added. Percolation was carried out with 600 mL of H2O at a percolate flow rate of 0.5 L/h. Thereafter, the seed extract solution was placed in a freeze dryer (SCIENTZ-30FG, Ningbo, China). After thermal equilibration, the shelf temperature was lowered to −40 °C and maintained for 12 h. Subsequently, the system was evacuated to a pressure of 20 Torr, and the shelf temperature was held at −40 °C for 24 h. The shelf temperature was then raised successively to −20 °C (8 h), 0 °C (6 h), and finally 20 °C (2 h). The resulting amorphous samples were weighed, sealed, and stored at 4 °C for further analysis.

LC-MS Analysis
LC-MS analyses were carried out using a Dionex Ultimate 3000 UHPLC system (Ultimate 3000, Thermo Scientific, Waltham, MA, USA) coupled with a quadrupole-Orbitrap hybrid mass analyzer (Q-Exactive, Thermo Scientific). The chromatographic separation of the polyphenol extract was achieved on an Eclipse Plus C18 analytical column (250 mm × 4.6 mm, 2.6 µm, ZORBAX, Agilent, Palo Alto, CA, USA). The column temperature was set at 30 °C. The mobile phase was composed of (A) water with 0.2% formic acid and (B) acetonitrile with 0.2% formic acid. Elution was accomplished with the following solvent gradient: 10% B from 0 to 3 min, 18% B at 13 min and held until 16 min, and 30% B at 25 min and held until 30 min. Finally, the system returned to 10% B in 2 min. The flow rate and the injection volume were 0.6 mL/min and 10 µL, respectively. The acquisition was carried out in negative ionization mode (ESI−). The ESI temperature was set at 300 °C, the capillary temperature at 320 °C, and the electrospray voltage at 2.8 kV. Sheath and auxiliary gas were set at 30 and 5 arbitrary units, respectively. The acquisition was performed in full scan/dd-MS2 mode. The parameters were optimized as follows: (i) full scan acquisition: resolution 70,000 FWHM (at m/z 200); (ii) dd-MS2: resolution 17,500 FWHM (at m/z 200). The normalised collision energy (NCE) was set at 30.
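The gradient program described above can be written down as a small lookup, which may help when transferring the method to other instrument software. The following is only a sketch: it assumes linear ramps between the stated time points (the text gives the breakpoints but not the ramp shape), and the names GRADIENT and percent_b are illustrative rather than part of any vendor API.

```python
from bisect import bisect_right

# (time in min, % B) breakpoints taken from the method description.
# Assumption: the composition changes linearly between breakpoints.
GRADIENT = [
    (0.0, 10.0),   # start at 10% B
    (3.0, 10.0),   # isocratic hold until 3 min
    (13.0, 18.0),  # ramp to 18% B at 13 min
    (16.0, 18.0),  # hold until 16 min
    (25.0, 30.0),  # ramp to 30% B at 25 min
    (30.0, 30.0),  # hold until 30 min
    (32.0, 10.0),  # return to 10% B within 2 min
]


def percent_b(t_min: float) -> float:
    """Return the percentage of mobile phase B at time t_min (minutes)."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    times = [t for t, _ in GRADIENT]
    # Index of the last breakpoint at or before t_min.
    i = bisect_right(times, t_min) - 1
    if i == len(GRADIENT) - 1:
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    # Linear interpolation within the current segment.
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 8, 14.5, 20.5, 31, 32):
        print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

The percentage of mobile phase A at any time is simply 100 minus the returned value.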

Total Phenolic Content
Diluted seed extract (5 µL) was placed in each well of a 96-well plate and mixed with 10 µL of Folin-Ciocalteu reagent, 100 µL of H2O, and 50 µL of 10% sodium carbonate, and the mixture was shaken for 30 s. Total polyphenols were determined after 1 h of incubation at room temperature in the dark. The absorbance was then measured at 765 nm on an HBS-1096A microplate reader (DeTie, Nanjing, China). Gallic acid was used as the standard. The standard curve (Equation (1), r = 0.9993) was prepared using different concentrations of gallic acid:

Y = 0.1227X + 0.0085 (1)

where Y is the absorbance and X is the concentration of the sample. The total phenolic content was calculated as mg of gallic acid equivalent (GAE) per 100 g of the extract. The results were expressed as the mean ± standard deviation of three replicates.
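As an illustration of how the calibration line Y = 0.1227X + 0.0085 can be turned into a TPC value, the sketch below inverts the line and scales the result to mg GAE per 100 g of extract. Several things here are assumptions and not stated in the text: the unit of X (taken as mg/mL by default through the hypothetical parameter x_unit_mg_per_ml), the extract concentration of the diluted sample (extract_mg_per_ml), and the function names themselves. Standards and samples are assumed to have been pipetted and mixed in the same way, so no further well-dilution correction is applied.

```python
from statistics import mean, stdev

SLOPE = 0.1227      # slope of the standard curve reported in the text
INTERCEPT = 0.0085  # intercept of the standard curve reported in the text


def gallic_acid_conc(absorbance: float) -> float:
    """Invert Y = SLOPE * X + INTERCEPT to obtain X from an absorbance."""
    return (absorbance - INTERCEPT) / SLOPE


def tpc_mg_gae_per_100g(absorbance: float,
                        extract_mg_per_ml: float,
                        x_unit_mg_per_ml: float = 1.0) -> float:
    """Convert an absorbance into mg GAE per 100 g of extract.

    extract_mg_per_ml: extract concentration of the diluted sample (assumed input).
    x_unit_mg_per_ml:  mg/mL represented by one unit of X (assumption; the
                       unit of X is not given in the text).
    """
    gae_mg_per_ml = gallic_acid_conc(absorbance) * x_unit_mg_per_ml
    # mg GAE per mg extract, scaled to 100 g (= 100,000 mg) of extract.
    return gae_mg_per_ml / extract_mg_per_ml * 100_000


def summarize(values):
    """Return (mean, sample standard deviation) of replicate values."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # Hypothetical triplicate absorbances, for illustration only.
    readings = [0.104, 0.105, 0.106]
    tpc = [tpc_mg_gae_per_100g(a, extract_mg_per_ml=1.0) for a in readings]
    m, sd = summarize(tpc)
    print(f"TPC = {m:.0f} ± {sd:.0f} mg GAE/100 g extract")
```

The absorbance values in the usage section are invented for demonstration and do not reproduce the measurements reported in this study.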

Conclusions
In this study, water-soluble compounds in the seeds of Cornus officinalis Sieb. et Zucc. were identified using HPLC-ESI-MS/MS. A total of 97 compounds were characterized and classified as brevifolincarboxyl tannins and their derivatives, ellagitannins, gallotannins, phenolic acids and their derivatives, and non-phenolic acids. Five new types of tannins were identified. Moreover, a method to effectively recognize the brevifolincarboxyl and DHHDP moieties from MS/MS data using typical fragment ions was summarized. Because the tannin structures reported here were inferred from MS/MS data, further research is needed to confirm them with techniques such as NMR and X-ray crystallography. The results of this study not only enrich the known structures of tannin-type compounds, but also provide invaluable information for the further utilization of the seeds in industry.