Preliminary Characterization of Phytochemicals and Polysaccharides in Diverse Coffee Cascara Samples: Identification, Quantification and Discovery of Novel Compounds

Coffee cascara is the first and most significant by-product of the coffee processing industry, whose valorization has become an urgent priority to reduce harmful environmental impacts. This work aimed to provide an improved understanding of phytochemicals and polysaccharides in coffee cascara in order to offer information for the better evaluation of potential applications. Phytochemicals in 20 different coffee cascara samples were ultrasonically extracted and analyzed by HPLC-UV and HPLC-MS/MS. Four novel compounds were isolated for the first time from coffee cascara, including two still unknown tautomers (337 Da), and two dihydroflavonol glycosides (dihydromyricetin glycoside and dihydromyricetin rhamnosylglycoside). Their presence can contribute to the design of new value-added applications of coffee cascara. Chemical characterization of two polysaccharides from two of the coffee cascara pulp samples showed that they were mainly composed of homogalacturonan, with rhamnose and arabinose as minor neutral sugars. In addition, principal component analysis results indicated that coffee cultivar and/or country significantly impacted the phytochemical composition of coffee cascara, although differences may be reduced by the external environment and processing method. It is suggested that processing method should be carefully designed when generating coffee cascara from the same cultivar and country/farm.


Introduction
Coffee is a highly traded global commodity [1] and stands out as one of the most popularly consumed beverages worldwide, representing an important pillar of the economy in many developing countries [2]. In the coffee crop year of 2018/2019, the global output of coffee reached approximately 10 million tons [3]. However, it should be noted that this consumption only represents coffee beans, which account for just 20% of the total coffee berry weight. The remaining 80% corresponds to the by-products generated during coffee processing, which may be generally divided into two categories: pre-roasting by-products including coffee skin, pulp, mucilage, and parchment, and post-roasting by-product coffee silverskin [1,4]. Upon coffee harvesting and processing, cascara is the first generated by-product, whose characteristics and composition strongly depend on the post-harvest processing methods. Coffee cascara husk (CH) is obtained from the traditional dry method and comprises coffee skin, pulp, mucilage, parchment, and maybe part of the silverskin, accounting for nearly 45% of the total coffee cherry mass [5]. Coffee cascara pulp (CP) is commonly generated from the mechanical de-pulping process included in the wet method and semi-dry method [6], and mainly contains the coffee skin and pulp [7,8]. Therefore, coffee cascara husk and pulp are regarded as the two main by-products from coffee bean production, and a huge amount of these two by-products is generated every year. Generally, coffee cascara husk and pulp are considered environmental pollutants, and their accumulation will lead to severe environmental issues if they are not treated properly [9], especially with regard to the impact on freshwater ecosystems [10].
With the continuous increase in global coffee production and consumption, the management of coffee by-products has become a critical problem. The transformation of these by-products into value-added resources has therefore received considerable interest from researchers worldwide in recent years, particularly in biotechnological and agricultural/environmental applications, including the production of compost [11-13], mushrooms [14], enzymes [15], succinic acid [16], and biofuels [17-19]. In addition, coffee cascara (husk and pulp) has been verified as a safe food ingredient [20] and contains noteworthy amounts of high-value components (such as protein, fibers, polysaccharides, and bioactive and flavor compounds) that could be recovered and reused as value-added products [21,22]. Therefore, food industrial applications of coffee cascara have also been developed [23].
Indeed, coffee cascara has been proven to be a potential source of macro- and/or micro-nutrients and non-nutrient health-beneficial bioactive compounds, which can be applied as either whole ingredients or specific-compound(s)-enriched extracts [7]. For instance, the extraction of sugars or flour from coffee cascara has been patented for use in products for human or animal consumption [24]. Moreover, the aqueous extract of coffee cascara has been applied in the production of healthy and sustainable yogurts possessing α-glucosidase inhibition activities [25], and a safe and novel "instant cascara" beverage with antioxidant properties has been developed based on this aqueous cascara extract [26]. Results from a comprehensive scientometric analysis showed that the most active applications of coffee by-products were based on the bioactive compounds, among which food applications were the most studied [27]. Therefore, the coffee cascara-based formulation of functional food products is a research area of ongoing interest and represents a feasible, efficient, and sustainable strategy for the valorization of coffee cascara [27].
To design suitable processing strategies for such applications of coffee cascara, a proximate, quantitative characterization of all the phytochemicals is required but has scarcely been reported. In addition, the impact of the processing methods, cultivars, and cultivation areas of the coffee cascara samples should also be assessed, since they may significantly affect the composition and concentration of constituents, and thus the proper utilization of the product [28].
In the present work, 20 coffee cascara (pulp and husk) samples generated from 13 cultivars were collected from 13 local farms in seven countries, and a preliminary characterization was carried out by HPLC-UV and HPLC-MS/MS, mainly including the identification and quantification of the main bioactive compounds and searching for novel compounds. Additionally, pectic polysaccharides present in two of the coffee pulp samples were also extracted and chemically characterized. The results obtained in this study will provide an improved understanding of phytochemicals and pectic polysaccharides in coffee cascara and may further contribute to the practical development/application of these coffee by-products as functional foods or beverages.

Extraction and Identification of Phytochemicals in CP and CH Samples
CP and CH samples were ground into fine powder using a pulverizer (Xiamen Hehui Electronic Technology Co., Ltd., Fujian, China) and stored at 4 °C for further use.
An ultrasonic-assisted method was used for extracting the phytochemicals in CP and CH. Briefly, the sample powder (1 g) was mixed with H2O (30 mL) at a solid-to-liquid ratio of 1:30 (w/v) and sonicated (Qsonica Q700-220 equipped with a probe tip of 12.7 mm diameter) for 3 min (1 s burst and 1 s cooling period) at 50% of maximal power. After sonication, the crude extract was centrifuged at 10,000× g (4 °C) for 20 min, and the recovered supernatant was then filtered through a 0.45 µm membrane, giving the final sample, which was stored at 4 °C for further analysis. All the experiments were performed in triplicate.
HPLC analysis was performed on an Agilent 1220 Infinity II system equipped with a diode array detector (DAD). Separation of the phytochemicals was carried out using a Shim-pack GIS C18 column (4.6 × 250 mm, 5 µm; Shimadzu), thermostated at 30 °C. The gradient elution program was as follows: 20% B at 0 min, 70% B at 70 min. The flow rate was 1 mL/min, and the injection volume was 10 µL. The detection wavelength was set at 280 and 320 nm, and on-line UV-vis spectra of the phytochemicals were recorded between 190 and 600 nm by means of the DAD. Solvent A was water with 0.1% formic acid (v/v) and [...] HPLC elution fractions of the major phytochemicals were collected and then analyzed by electrospray ionization tandem mass spectrometry (ESI-MS/MS) for identification.
Mass spectrometry (MS) was carried out using a TSQ Endura mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) equipped with a heated electrospray ionization (HESI) source and a triple quadrupole analyzer. The fractions were infused into the ESI source at a flow rate of 0.16 mL/min by means of a UHPLC (UltiMate 3000, Thermo Fisher Scientific, Waltham, MA, USA) coupled with the mass spectrometer, and the solvent was methanol/H2O (50/50, v/v) acidified with 0.1% (v/v) formic acid. MS analyses were performed in both positive and negative modes under the following conditions: the ESI source temperature was 300 °C, the spray voltage was set at 3.5 kV for the positive mode and 3 kV for the negative mode, and nitrogen was used as sheath gas (35 arbitrary units, Arb) and aux gas (10 Arb). Mass spectra were recorded with a scan rate of 1000 Da/s in the mass range m/z 100-1000. Tandem mass spectrometry analyses were performed with argon as collision-induced dissociation (CID) gas (0.5 mTorr) using a collision energy ramped from 0 to 55 V. The full-scan MS/MS spectra were obtained over an m/z range of 100-500.
Data analysis was carried out using the Thermo Xcalibur software (version 4.0.27.10, Thermo Fisher Scientific, Waltham, MA, USA).
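The gradient program described above can be sketched numerically. The following minimal Python example assumes that the mobile phase composition changes linearly between the two stated points (20% B at 0 min, 70% B at 70 min); the text gives only these two points, so the linear shape is an assumption, and the function names `percent_b` and `solvent_b_volume_ml` are hypothetical helpers introduced for illustration.

```python
# Two-point gradient program taken from the text: (time in min, %B).
# A linear change between the points is an assumption of this sketch.
GRADIENT = [(0.0, 20.0), (70.0, 70.0)]


def percent_b(t_min, program=GRADIENT):
    """Return %B at time t_min, interpolating linearly between program points."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last point, hold the final composition.
    return program[-1][1]


def solvent_b_volume_ml(flow_ml_min=1.0, program=GRADIENT):
    """Volume of solvent B delivered over the program (trapezoid per segment)."""
    total = 0.0
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        total += flow_ml_min * (t1 - t0) * (b0 + b1) / 2.0 / 100.0
    return total


if __name__ == "__main__":
    print("%B at 35 min:", percent_b(35.0))
    print("Solvent B per run (mL):", solvent_b_volume_ml())
```

Under these assumptions, the composition at mid-run (35 min) is 45% B, and a single 70 min run at 1 mL/min consumes 31.5 mL of solvent B.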

Extraction of CP Pectic Polysaccharides (CPPs)
CPPs were extracted according to the method previously described by Reichembach and de Oliveira Petkowicz [29], with minor modifications. Briefly, 3 g of the CP powder was boiled in 20 mL of 80% ethanol (v/v) for 20 min under reflux, in order to remove the pigments and low molecular weight compounds and inactivate the endogenous enzymes. The alcohol insoluble residue (AIR) was then recovered by vacuum filtration through a 0.45 µm membrane and washed with 10 mL of anhydrous ethanol three times. After drying at room temperature (18 °C), the AIR was mixed with 0.1 M HNO3 at a solid-to-liquid ratio of 1:25 (w/v) and boiled for 30 min under reflux to extract the CPPs. The mixture was filtered and centrifuged at 10,000× g for 20 min, and the recovered supernatant was then used to precipitate the CPPs by addition of 2 volumes of anhydrous ethanol. After a 16-h precipitation at 4 °C, the CPPs were centrifuged (10,000× g, 20 min), washed with 10 mL of anhydrous ethanol three times, and lyophilized, giving the final CPP samples.
The contents of protein, total phenolics, total polysaccharides, and uronic acid were all determined by colorimetric methods. The content of protein was measured by the Bradford method, with BSA as standard. The total phenolics content (TPC) was measured by the Folin-Ciocalteu method [30], with gallic acid as standard. The content of total polysaccharides was measured by the phenol-sulfuric acid method, with glucose as standard. The content of uronic acid was measured by the m-hydroxydiphenyl method after saponification of the CPPs in 0.05 M NaOH for 30 min at room temperature [31], with GalA as standard. All the experiments were performed in triplicate.
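Each of the colorimetric assays above relies on an external standard curve. The sketch below shows one common way such a curve can be fitted and used, assuming a linear absorbance response; the standard concentrations and absorbance values are hypothetical placeholders (not data from this study), and the helper names `fit_line` and `concentration_from_absorbance` are introduced only for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(absorbance, slope, intercept, dilution=1.0):
    """Invert the standard curve and correct for any sample dilution."""
    return (absorbance - intercept) / slope * dilution


# Hypothetical standard series (e.g. gallic acid in ug/mL) and absorbances.
std_conc = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
std_abs = [0.050, 0.250, 0.450, 0.650, 0.850, 1.050]

slope, intercept = fit_line(std_conc, std_abs)
sample_conc = concentration_from_absorbance(0.550, slope, intercept)

if __name__ == "__main__":
    print("slope:", slope, "intercept:", intercept)
    print("sample concentration (ug/mL):", sample_conc)
```

The same two functions apply to the Bradford, Folin-Ciocalteu, phenol-sulfuric acid, and m-hydroxydiphenyl assays by swapping in the corresponding standard (BSA, gallic acid, glucose, or GalA); samples reading outside the range of the standards would need to be diluted and re-measured.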

Monosaccharide Composition
CPPs were hydrolyzed with 2 M TFA at 120 °C for 2 h, and the hydrolysates were then dried by rotary evaporation, washed three times with pure methanol to completely remove the TFA residue, and redissolved in H2O. After filtration through a 0.45 µm membrane, the samples were analyzed by HPLC (Agilent 1220 Infinity II equipped with a refractive index detector, RID) using an Aminex® HPX-87H ion exclusion column (7.8 × 300 mm; Bio-Rad, Hercules, CA, USA). Monosaccharides were eluted isocratically with 5 mM H2SO4 at a flow rate of 0.45 mL/min. The injection volume was 20 µL, and the RID and the column were thermostated at 40 °C and 55 °C, respectively.

Statistical Analysis
Principal Component Analysis (PCA) was performed using SPSS Statistics 19.0 (IBM, Armonk, NY, USA) and visualized using Microsoft Office Excel Professional Plus 2019 software. The 7 quantified phytochemicals of the 8 CP and 12 CH samples were analyzed by PCA; theobromine was excluded because it was only found in CP sample 7.
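For readers without access to SPSS, the same kind of analysis can be sketched in Python. The example below is a minimal PCA on autoscaled (mean-centered, unit-variance) data via singular value decomposition, which corresponds to a PCA on the correlation matrix; whether this matches the exact SPSS settings used in this work is an assumption. The 20 × 7 input matrix is filled with random placeholder values, not the measured contents, and the function name `pca` is a hypothetical helper.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of autoscaled data via SVD.

    Returns sample scores, variable loadings, and the fraction of total
    variance explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale each variable (column): zero mean, unit sample variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Placeholder data: 20 samples (8 CP + 12 CH) x 7 quantified compounds.
# These are random values for illustration only, not results from the study.
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.5, size=(20, 7))

scores, loadings, explained = pca(X)

if __name__ == "__main__":
    print("explained variance ratio (PC1, PC2):", explained)
    print("first sample scores:", scores[0])
```

Plotting the first column of `scores` against the second gives the usual score plot in which samples can be labeled by cultivar, country, or processing method. A variable that is constant across all samples would have zero variance and would need to be dropped before autoscaling.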

Identification and Quantification of Phytochemicals in CP Samples
It was previously reported that coffee by-products contain a significant amount of caffeine [32], so HPLC analysis was carried out using both the aqueous CP extracts without (Figure 1 [...] [33]. Therefore, compound 1 was assigned to trigonelline, and its identification was further confirmed by HPLC (Figure 2c) and UV-vis spectrum (Figure 2d) analysis using authentic trigonelline.
The HPLC elution fraction of compound 2 was collected and directly analyzed by MS for identification. However, the [M + H]+ ion corresponding to compound 2 could not be distinguished from the background ions, whose signal was comparatively strong because of the low sample concentration. Therefore, an elution fraction of 100 mL was prepared, lyophilized, and redissolved in 1 mL pure methanol, giving a highly concentrated sample of compound 2. As shown in Figure 3a, this concentrated sample only contains compound 2 at a much higher level than the original ones ([...] Figure 3c). In addition, as shown in Figure 3c, several minor fragment ions with m/z less than 100 were also observed on the MS/MS spectrum of the ion with m/z 338, which were found on the MS/MS spectrum of the ion with m/z 142.97 as well (Figure 3d), indicating that the ion with m/z 142.97 observed in Figure 3b was actually derived from m/z 338. Therefore, it could be concluded that the yellow-colored compound 2 has a molecular weight of 337 Da and can be fragmented into at least three ions with m/z 321, 303, and 143.
However, compound 2 still could not be successfully identified based on the MS and MS/MS data alone.
As shown in Figure 4a, two major ions with m/z 125.10 and 169.04 were observed on the MS spectrum of compound 3. Further MS/MS of the latter (Figure 4b) shows that it is mainly broken down into one fragment ion with m/z 125.11, suggesting that the ion with m/z 125.10 observed in Figure 4a was already due to the fragmentation of m/z 169. Therefore, the ion with m/z 169 is the [M-H]− ion of compound 3, indicating that compound 3 has a molecular weight of 170 Da and that its specific MS/MS fragment ion is that with m/z 125. These results were consistent with the reported MS and MS/MS data of gallic acid [34]. Finally, compound 3 was unambiguously identified as gallic acid by HPLC (Figure 4c) and UV-vis spectrum (Figure 4d) analysis using commercial gallic acid as the external standard.
According to the HPLC analysis (Figure 1), compound 4 was present in the samples at a very low concentration, so an elution fraction of 50 mL was collected by HPLC, which was then lyophilized and redissolved in 1 mL pure methanol, giving a more concentrated sample of compound 4 (Figure 5a) for use in further identification by MS. In addition, this concentrated sample also exhibits a light-yellow color, but it could not be concluded that compound 4 is itself yellow, because a certain level of contamination by compound 2 was also found in the sample (Figure 5a), which could contribute to the yellow color.
As shown in Figure 5b, it was surprising that the obtained MS and MS/MS spectra of compound 4 are quite similar to those of compound 2 (Figure 3b,c), on the basis of which we could conclude that compound 4 would also have a molecular weight of 337 Da, the same as that of compound 2.
It should be noted that compound 3 (gallic acid, peak 3) is eluted between compound 2 (peak 2) and compound 4 (peak 4) based on HPLC analysis (Figure 1, CP sample 1, in black). If the peak of contaminant compound 2 obtained in Figure 5a was introduced due to a poor manual collection of peak 4 during HPLC purification, then a peak corresponding to compound 3 should also be observed in Figure 5a, which is not the case. In addition, UV-vis spectrum analysis of the concentrated samples of compounds 2 and 4 (Figure 5d) showed that these two compounds have almost the same absorption behaviors in the 190-600 nm spectral range. Combined with the MS analysis results described above, this information indicated a strong relationship between compounds 4 and 2: compound 4 is most likely a tautomer of compound 2, which is less stable than the latter in solution and exists in an equilibrium of two tautomeric forms. Similarly, compound 4 also could not be identified based on the current data.
On the MS spectrum of compound 5 (Figure 6a), only one major ion with m/z 153.04 was observed, which corresponds to the [M-H]− ion of compound 5, and its MS/MS fragmentation only gives one ion with m/z 109.24 as a specific fragment ion (Figure 6b). It was previously reported that protocatechuic acid (PA), whose molecular weight is 154 Da, was one of the main phenolics in coffee pulp [35]. Therefore, compound 5 was assigned to PA, and further confirmation was performed by HPLC (Figure 6c) and UV-vis spectrum analysis (Figure 6d) using authentic PA.
As shown in Figure 1, a significant decrease in the peak of compound 6 (in black) was observed after CHCl3 treatment (in red), indicating that compound 6 is most likely caffeine. As expected, compound 6 was assigned to caffeine after HPLC verification using the commercial caffeine standard (Figure 7a). Moreover, a small peak 7 at the HPLC retention time of caffeine was found after CHCl3 treatment, which would be caffeine residue or a new compound 7 that co-eluted with caffeine during HPLC analysis.
As shown in Figure 7b, the UV-vis spectrum of peak 7 was distinct from that of caffeine, indicating that caffeine was completely removed by CHCl3 treatment, and peak 7 corresponds to a new compound 7 which cannot be detected in the presence of higher levels of caffeine. Finally, compound 7 was unambiguously identified as 3-caffeoylquinic acid (theoretical molecular weight [...]. This result suggested that compound 8 is most likely a glycosidic compound, and its specific MS/MS fragment ion is that with m/z 319, which also corresponds to the [M-H]− ion of its aglycone (320 Da). Therefore, compound 8 was identified as dihydromyricetin glycoside.
Compound 9 was only found in CP sample 4 (Figure 1), and its molecular weight is 628 Da according to the MS analysis (Figure 8c). Although several fragment ions were observed on the MS/MS spectrum of compound 9 (Figure 8d), the ions with m/z 481 and 319 were the most important for identification, indicating that compound 9 is most likely a derivative of dihydromyricetin glycoside (compound 8). In addition, the fragment ions with m/z 609.84 and 481.44 would result from the precursor ion with m/z 627.50 by the loss of one water molecule (627 − 18 = 609 Da) and a rhamnosyl (627 − 146 = 481 Da), respectively, and further elimination of a second glycosyl from the latter ion would yield the ion with m/z 319.23 (481 − 162 = 319 Da). Therefore, compound 9 was identified as dihydromyricetin rhamnosylglycoside.
Compound 10 was only observed in CP sample 7 and was unambiguously identified as theobromine based on HPLC and UV-vis spectrum analysis using commercial theobromine as the external standard (Figure 9).
In summary, compounds 1, 3, 5, 6, 7, and 10 were unambiguously identified as trigonelline, gallic acid, protocatechuic acid, caffeine, 3-caffeoylquinic acid, and theobromine, respectively, including 3 alkaloids and 3 phenolic acids.
Compounds 2 and 4 were still unknown, but current data indicated that they were most likely two tautomers with a molecular weight of 337 Da. As for compounds 8 and 9, they were successfully identified as two dihydroflavonol glycosides, dihydromyricetin glycoside (DMG) and dihydromyricetin rhamnosylglycoside (DMRG); however, their exact structure determination requires further NMR analysis.
In addition, the quantification and distribution of these 10 compounds in different CP samples are summarized in Table 2, and their identification information is summarized in Table 3.
Table 2. Quantification and distribution of the 10 CP compounds (mg/g).
CP No. Trigonelline Compound 2 † Gallic Acid Compound 4 † Protocatechuic Acid Caffeine
Values are expressed as the mean ± SD. Different letters within the same column indicate a significant difference at p < 0.05. "†": quantified by using the standards prepared in lab, and the others were quantified based on the use of authentic standards; "-": not detectable; "+": compounds detected in the sample but unquantifiable due to lack of authentic standards; "‡": quantified based on its absorption at 360 nm, where caffeine does not have any absorption.
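The neutral-loss reasoning used above for the MS/MS assignments (for example 627 − 146 = 481 and 481 − 162 = 319 for compound 9) can be checked with a short script. The sketch below uses nominal masses for four common neutral losses; the loss of CO2 (44 Da) from the [M-H]− ions of gallic acid and protocatechuic acid is general mass spectrometry knowledge rather than a statement made in the text, and the 0.6 Da tolerance is an assumed value chosen for unit-resolution triple quadrupole data. The names `NEUTRAL_LOSSES` and `assign_loss` are hypothetical.

```python
# Nominal masses (Da) of common neutral losses in negative-mode MS/MS.
NEUTRAL_LOSSES = {
    "H2O": 18.0,
    "CO2": 44.0,
    "rhamnosyl (deoxyhexosyl)": 146.0,
    "hexosyl": 162.0,
}


def assign_loss(precursor_mz, fragment_mz, tolerance=0.6):
    """Return the names of neutral losses matching the precursor-fragment gap."""
    delta = precursor_mz - fragment_mz
    return [
        name
        for name, mass in NEUTRAL_LOSSES.items()
        if abs(delta - mass) <= tolerance
    ]


# Precursor/fragment pairs (m/z) as reported in the text.
transitions = {
    "compound 3 (gallic acid)": (169.04, 125.11),
    "compound 5 (protocatechuic acid)": (153.04, 109.24),
    "compound 9, water loss": (627.50, 609.84),
    "compound 9, first sugar loss": (627.50, 481.44),
    "compound 9, second sugar loss": (481.44, 319.23),
}

if __name__ == "__main__":
    for label, (precursor, fragment) in transitions.items():
        print(label, "->", assign_loss(precursor, fragment))
```

Because a 162 Da loss only indicates a hexosyl unit in general, the script cannot distinguish glucose from galactose or locate the attachment position, which is consistent with the need for NMR analysis noted above.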

Identification and Quantification of Phytochemicals in CH Samples
As shown in Figure 10, a total of 6 compounds were observed in the 12 CH samples, all of which were already found in the CP samples, including the unambiguously identified trigonelline (compound 1), protocatechuic acid (compound 5), caffeine (compound 6), and 3-caffeoylquinic acid (compound 7), as well as the two still unknown tautomers (compounds 2 and 4). Their quantification and distribution in different CH samples are summarized in Table 4.
Values are expressed as the mean ± SD. Different letters within the same column indicate a significant difference at p < 0.05. "†": quantified by using the standards prepared in lab, and the others were quantified based on the use of authentic standards; "-": not detectable; "+": compound detected in the sample but unquantifiable due to very low concentration; "‡": quantified based on its absorption at 360 nm, where caffeine does not have any absorption.
In general, CH samples contained fewer compounds than CP samples. Gallic acid, theobromine, and the two novel dihydromyricetin glycosides (compounds 8 and 9) were not observed in CH samples, so they may be absent or present at undetectable levels. Therefore, these four compounds might be considered for further use as biological marker compounds for the identification of some coffee cultivars or coffee cascara samples.
Moreover, among the 4 common compounds previously reported in coffee cascara samples (trigonelline, caffeine, 3-caffeoylquinic acid, and protocatechuic acid), the average contents of the first three were higher in CP samples (trigonelline: 6.84 mg/g; caffeine: 7.07 mg/g; 3-caffeoylquinic acid: 3.96 mg/g) than in CH samples (trigonelline: 6.49 mg/g; caffeine: 5.34 mg/g; 3-caffeoylquinic acid: 1.12 mg/g), and protocatechuic acid was scarcely present in CH samples.
It is also worth noting that the identification and quantification of 3-caffeoylquinic acid in coffee cascara samples should be carefully performed, because it could not be separated from caffeine under some HPLC gradients and would then be easily overlooked in the presence of high levels of caffeine. Thus, at least two detection wavelengths, 280 nm and 360 nm, are recommended during HPLC analysis, since caffeine does not have any absorption at 360 nm.

Chemical Characterization of CPPs
As described above, more compounds were found, and at relatively higher contents, in CP samples 1 (Coffea arabica L. var. Castillo) and 2 (Wush Wush, Ethiopian heirloom) from Colombia, showing a better potential for further application from the point of view of bioactive compounds. Moreover, coffee pulp was reported as a source of pectic polysaccharides, and a more recent study on a pectin from Brazilian coffee (Coffea arabica L.) pulp suggested the role of coffee pectin as a suitable ingredient in the food industry [29]. Therefore, these two CP samples were selected for CPP extraction and preliminary characterization, which may be used for further evaluation of their application potential from the point of view of macromolecular compounds.
As shown in Table 5, CPPs from CP sample 1 (CPPs-1) and CP sample 2 (CPPs-2) contained similar levels of protein and total phenolics. CPPs-1 was obtained in a higher yield than CPPs-2, whereas its galacturonic acid content was lower.
Values are expressed as the mean ± SD. * CPP yield was calculated relative to the dry mass of the AIR using the following equation: Yield (%) = (mass of lyophilized CPPs/mass of dried AIR) × 100.
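The yield equation given in the Table 5 note can be written as a small helper. This is only a sketch: the function name and the example masses are hypothetical and are not values from this study.

```python
def cpp_yield_percent(mass_cpps_g: float, mass_air_g: float) -> float:
    """Yield (%) = (mass of lyophilized CPPs / mass of dried AIR) x 100."""
    if mass_air_g <= 0:
        raise ValueError("mass of dried AIR must be positive")
    if mass_cpps_g < 0:
        raise ValueError("mass of lyophilized CPPs cannot be negative")
    return mass_cpps_g / mass_air_g * 100.0


# Hypothetical example: 0.85 g of lyophilized CPPs recovered from 10.0 g of AIR.
example_yield = cpp_yield_percent(0.85, 10.0)
print(f"Yield: {example_yield:.1f}%")
```

Both masses must be expressed in the same unit and on a dry basis for the percentage to be meaningful.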
The composition of pectic polysaccharides has been reported to vary with the plant source and extraction method [36]. Galacturonic acid predominates in pectic polysaccharides as the residue of their main linear chains (homogalacturonan or HG segments, also known as 'smooth regions'), whereas neutral sugars occur in the side chains (rhamnogalacturonan I or RG-I segments, also known as 'hairy regions') [37], such as rhamnose, arabinose, and galactose [38]. As the natural basic units, monosaccharides determine the unique structures and properties of polysaccharides, so their composition is required for the structural characterization of polysaccharides [39]. The results showed that CPPs-1 and CPPs-2 had similar pectic monosaccharide compositions, suggesting that the effect of cultivar on polysaccharide composition might be reduced when the external environment is the same, since these two CP samples were collected from the same local farm in Colombia. However, it cannot be fully excluded that these two coffee cultivars originally had similar polysaccharide compositions. As shown in Figure 11 and Table 6, GalA was the predominant constituent monosaccharide of both CPPs-1 and CPPs-2, and Glu, Rha, and Ara were their minor neutral constituent monosaccharides. However, the main neutral monosaccharide (peak 3 in Figure 11a) of these two CPP samples could not be confirmed, because the Aminex® HPX-87H ion exclusion column could not separate Gal, Xyl, Man, and Fuc. Other methods such as high performance anion exchange chromatography (HPAEC), gas chromatography (GC), and GC-mass spectrometry (GC-MS) should be considered for a complete characterization.
Thus, these results suggested that CPPs-1 and CPPs-2 consist mainly of HG, that is, polygalacturonic acid-rich 'smooth regions'. However, their side chain lengths, which can be inferred from the molar ratio (Ara + Gal)/Rha [29], could not be estimated because the molar proportion of Gal was not available.
Figure 11. HPLC analysis of constituent monosaccharides of CPPs-1 and CPPs-2 (a) and monosaccharide standards (b). The peaks 1, 2, 4 and 5 in (a) are identified as GalA, Glu, Rha and Ara, respectively. Peak 3 in (a) cannot be identified, because its corresponding standard peak (blue peak 3 in (b)) represents the peak of co-eluted Gal, Xyl, Man and Fuc.
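The dependence of the side chain indicator on Gal can be illustrated with a short sketch of the (Ara + Gal)/Rha molar ratio. The function name and all molar percentages below are hypothetical placeholders, not data from Table 6; the point is only that the ratio is undefined while Gal remains unresolved from Xyl, Man, and Fuc.

```python
from typing import Mapping, Optional


def side_chain_ratio(mol_percent: Mapping[str, float]) -> Optional[float]:
    """Return the molar ratio (Ara + Gal)/Rha, or None if it cannot be computed."""
    required = ("Ara", "Gal", "Rha")
    if any(mol_percent.get(sugar) is None for sugar in required):
        return None  # e.g. Gal co-eluted with other sugars and was not quantified
    if mol_percent["Rha"] == 0:
        return None  # ratio undefined without rhamnose
    return (mol_percent["Ara"] + mol_percent["Gal"]) / mol_percent["Rha"]


# Hypothetical molar percentages, with and without a resolved Gal value.
without_gal = {"GalA": 70.0, "Glu": 5.0, "Rha": 4.0, "Ara": 6.0}
with_gal = dict(without_gal, Gal=10.0)

print(side_chain_ratio(without_gal))  # Gal missing, so no estimate
print(side_chain_ratio(with_gal))
```

Once Gal is quantified by a method that resolves it (for example HPAEC or GC-MS, as suggested above), the same calculation could be applied directly.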

PCA Analysis of 20 Coffee Cascara Samples
Principal component analysis was conducted on seven phytochemicals among 20 coffee cascara samples (Figure 12). The first two principal components, PC1 (44.3%) and PC2 (34.0%), together explained 78.3% of the total variance. These seven phytochemicals (Figure 12a) determined the distribution of the 20 coffee cascara samples (Figure 12b), with caffeine and 3-caffeoylquinic acid having the greatest impact. Compared to CP samples, the relatively lower contents of caffeine and 3-caffeoylquinic acid, which varied within a small range (Tables 2 and 4), led to a good clustering of CH samples (Figure 12b). CP samples did not cluster, mainly because they were generated from distinct coffee cultivars (due to random sample collection) (Figure 12c) and collected from different countries (Figure 12d), which is why their contents of caffeine and 3-caffeoylquinic acid varied significantly. Therefore, these analyses implied that coffee cultivar and/or country determined the phytochemical composition of coffee cascara.
In addition, although CH samples were also produced from five different coffee cultivars, they still clustered (Figure 12c), which indicates their similarity in phytochemicals. This was most likely because 4 of the coffee cultivars were collected from the same country (Panama, Figure 12d) and the same processing method (natural process) was applied (Figure 12e), which suggests that the impact of coffee cultivar on the phytochemicals might be reduced by a shared external environment, such as country (mainly geography, climate, and soil) and processing method. This also implied that when coffee cascara is generated from the same cultivar and country, the selection of processing method would be important. However, it cannot be fully excluded that these four coffee cultivars (Caturra, Typica, Geisha, and Catuai) were originally similar in phytochemicals.
However, a more substantial evaluation was limited by the number of biological replicates investigated.
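For readers who want to reproduce this kind of analysis, the sketch below shows how the explained variance of PC1 and PC2 can be obtained from a 20 × 7 table of contents. It is an illustration under stated assumptions, not the procedure used in this work: the data are synthetic (seeded random numbers with two hidden factors), the variables are autoscaled, and the components are extracted by power iteration with deflation using the standard library only. All names are hypothetical. The last lines only re-add the percentages reported above.

```python
import math
import random


def standardize(rows):
    """Return column-wise z-scores (mean 0, sample SD 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    """Covariance of z-scored data, which equals the correlation matrix."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_components(matrix, k=2, iters=1000):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    p = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        v = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
        for _step in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = sum(x * y for x, y in zip(v, av))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before searching the next.
        a = [[a[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
             for i in range(p)]
    return pairs


# Synthetic stand-in for 20 samples x 7 phytochemicals (NOT the study data).
rng = random.Random(0)
samples = []
for _sample in range(20):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    samples.append([5 + f1 + rng.gauss(0, 0.3) for _ in range(4)]
                   + [5 + f2 + rng.gauss(0, 0.3) for _ in range(3)])

z = standardize(samples)
pairs = leading_components(correlation_matrix(z), k=2)
total_variance = len(z[0])  # trace of a correlation matrix = number of variables
ratios = [eigenvalue / total_variance for eigenvalue, _vec in pairs]
scores = [[sum(x * w for x, w in zip(row, vec)) for _ev, vec in pairs] for row in z]
print("explained variance (PC1, PC2):", [round(r, 3) for r in ratios])

# Consistency check of the percentages reported in the text.
reported_cumulative = 44.3 + 34.0
print(f"reported PC1 + PC2 = {reported_cumulative:.1f}%")
```

The `scores` list holds the PC1/PC2 coordinates that would be plotted in a score plot such as Figure 12b, and the eigenvectors in `pairs` play the role of the loadings in Figure 12a.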

Possible Structure of the Four Unknown Compounds
As described above, unknown compounds 2 and 4 were most likely two tautomers with a molecular weight of 337 Da, which might be considered derivatives of p-coumaroylquinic acid according to several excellent studies on the in-depth characterization of chlorogenic acids by LC-MSⁿ [40–43]. To the best of our knowledge, they were isolated for the first time from coffee cascara and could therefore be regarded at least as novel coffee cascara compounds. However, their possible structure could not be predicted in this study due to a lack of sufficient MS/MS fragmentation information. The reason was that their corresponding [M + H]+ ion with m/z 338 could not be well fragmented at low collision energy (Figures 3c and 5c), whereas it was completely fragmented when the energy was increased slightly, yielding fragment ions mainly with m/z less than 100 (data not shown). Therefore, the specific MS/MS fragment ions, which are important for preliminary prediction/identification of their structure, could not be clarified in either case. Further optimization of the MS/MS analytical conditions should be conducted, or other analysis methods should be considered. The novel compound 2 was present in a significant amount in two CP samples (1 and 2, Castillo and Wush Wush cultivars from Colombia) and one CH sample (7, Typica cultivar from Panama), and further clarification of its structure and bioactivities would be beneficial for developing new value-added applications of these coffee cascara samples in the food, beverage, or cosmetics industries.
Dihydroflavonols represent a relatively scarce group of flavonoids [44], which serve as substrates for the production of colored anthocyanins and flavonols in the biosynthetic pathway of flavonoids. Most of the reported dihydroflavonol glycosides were identified as glycosides derived from dihydrokaempferol and dihydroquercetin. In contrast, dihydromyricetin glycosides and dihydromyricetin disaccharides have rarely been reported, and only a few examples were found, such as dihydromyricetin-3-O-rhamnoside naturally isolated from the leaves of Erica arborea (Ericaceae) [45], and dihydromyricetin diglucoside and its methylated derivative from jambolão (Syzygium cumini) [46]. For coffee cascara, previous studies mostly focused on several well-known compounds, such as caffeoylquinic acids, caffeine, trigonelline, and protocatechuic acid [32,47]. Recently, an in-depth characterization of the bioactive compounds in coffee silverskin was reported, in which 30 compounds were identified [48]. However, such dihydroflavonol glycosides were still not found. Therefore, the two dihydromyricetin glycosides (compounds 8 and 9) observed in this study could be regarded as novel coffee cascara polyphenolic compounds, reported in coffee cascara for the first time.
Generally, glycosylation was observed at the C3-OH of the benzopyran ring C for flavonols and dihydroflavonols. Therefore, these two compounds could be tentatively assigned to dihydromyricetin-3-O-glycoside and dihydromyricetin-3-O-rhamnosylglycoside (glycosylation at C3-OH of the benzopyran ring C, Figure 13a,b), respectively, and their possible MS/MS fragmentation pathways were also established based on the MS and MS/MS data (Figure 13c,d). However, glycosylation at the C5- and C7-OH of the benzopyran ring A and at the C4′-OH of the aromatic ring B could not be fully excluded, since glycosylation of dihydrokaempferol and dihydroquercetin at these positions has been reported, such as dihydrokaempferol-5-O-glucoside (Helicioside A) and dihydroquercetin-5-O-glucoside (Helicioside B) from the leaves of Helicia cochinchinensis (Proteaceae) [49], dihydroquercetin-7-O-glucoside [50], and dihydroquercetin-4′-O-glucoside [51]. In addition, the current MS and MS/MS data could not support a clear identification of the glycosyl group (glucose or galactose) of these two compounds, or of the interglycosidic linkage between the rhamnosyl and glycosyl groups (at C2, C3 or C5 of the glycosyl group, Figure 13b) for the dihydromyricetin-3-O-rhamnosylglycoside, and further identification by NMR analysis is required.
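The mass arithmetic behind such tentative assignments can be sketched as follows. The values are theoretical monoisotopic masses computed from elemental formulas, not the measured data of Figure 13, and the sketch assumes protonated [M + H]+ ions (as used for compounds 2 and 4), a hexose as the glycosyl group, and loss of the rhamnosyl unit before the hexosyl unit. All names are illustrative.

```python
# Monoisotopic atomic masses (Da) and the proton mass added in [M + H]+.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647


def mass(formula):
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


DIHYDROMYRICETIN = {"C": 15, "H": 12, "O": 8}  # aglycone
HEXOSYL = {"C": 6, "H": 10, "O": 5}    # glucose or galactose residue (sugar minus H2O)
RHAMNOSYL = {"C": 6, "H": 10, "O": 4}  # rhamnose residue (sugar minus H2O)

aglycone_mz = mass(DIHYDROMYRICETIN) + PROTON
hexoside_mz = aglycone_mz + mass(HEXOSYL)
rhamnosyl_hexoside_mz = hexoside_mz + mass(RHAMNOSYL)


def neutral_loss_ladder(precursor_mz, losses):
    """Return the m/z values obtained by successive neutral losses."""
    ladder = [precursor_mz]
    for loss in losses:
        ladder.append(ladder[-1] - loss)
    return ladder


# Assumed order: terminal rhamnosyl lost first, then the hexosyl unit.
ladder = neutral_loss_ladder(rhamnosyl_hexoside_mz, [mass(RHAMNOSYL), mass(HEXOSYL)])
for mz in ladder:
    print(f"m/z {mz:.4f}")
```

Because glucose and galactose residues share the same elemental formula, they give identical masses in this calculation, which is consistent with the statement above that MS and MS/MS data alone cannot identify the glycosyl group.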

It is worth noting that a Chinese study reported that an enzymatically synthesized dihydromyricetin-7-O-glucoside showed antidiabetic activity quite similar to that of the antidiabetic drug gliquidone in a clinical trial [52]. Therefore, the detailed structure and bioactivity of these two novel compounds should be investigated further.

Conclusions
Phytochemicals from 20 coffee cascara samples (eight coffee cascara pulps and 12 coffee cascara husks) were analyzed by HPLC-UV and HPLC-MS/MS. A total of 10 compounds were observed in the eight coffee cascara pulp samples, among which six were also found in the 12 coffee cascara husk samples. Six of the 10 compounds were unambiguously identified, including three alkaloids (trigonelline, caffeine, and theobromine) and three phenolic acids (gallic acid, protocatechuic acid, and 3-caffeoylquinic acid). The other four compounds were isolated for the first time from coffee cascara and could therefore be regarded as novel coffee cascara compounds, including two still unknown tautomers with a molecular weight of 337 Da, and two flavonoid glycosides, dihydromyricetin glycoside and dihydromyricetin rhamnosylglycoside. The observation of these compounds provides an improved understanding of the phytochemical composition of coffee cascara and may contribute to the design of new value-added applications of these coffee by-products in the nutraceutical, cosmetics, or even pharmaceutical industries.
In addition, pectic polysaccharides were extracted from two of the coffee cascara pulp samples and characterized. The results showed that they mainly consisted of polygalacturonic acid-rich 'smooth regions'. However, their complete neutral sugar composition and their degrees of methylation and acetylation remain to be determined, which would provide more information for estimating the application potential of coffee cascara pulp from the point of view of macromolecules.