Characterization and Classification of Cocoa Bean Shells from Different Regions of Venezuela Using HPLC-PDA-MS/MS and Spectrophotometric Techniques Coupled to Chemometric Analysis

The cocoa bean shell (CBS) is one of the main cocoa byproducts, with the potential to be used as a functional food ingredient owing to its nutritional and sensory properties. This study aims to define the chemical fingerprint of CBSs obtained from cocoa beans of diverse cultivars collected in different geographical areas of Venezuela, assessed using high-performance liquid chromatography coupled to photodiode array detection and tandem mass spectrometry (HPLC-PDA-MS/MS) and spectrophotometric assays, combined with multivariate analysis for classification purposes. The study provides a comprehensive fingerprint and quantitative data for 39 compounds, including methylxanthines and several polyphenols, such as flavan-3-ols, procyanidins, and N-phenylpropenoyl amino acids. Several key cocoa markers, such as theobromine, epicatechin, quercetin-3-O-glucoside, procyanidin_A pentoside_3, and N-coumaroyl-l-aspartate_2, were found suitable for classifying CBS according to cultivar and origin. Although the screening methods required prior purification of the sample, both methodologies appear suitable for the classification of CBS, with a high correlation between datasets. Finally, preliminary findings on the identification of potential contributors to the radical scavenging activity of CBS are reported to support the valorization of this byproduct as a bioactive ingredient in the production of functional foods.


Introduction
Several studies have associated the consumption of cocoa and chocolate products with multiple benefits to human health, including inhibition of lipid peroxidation and protection of LDL-cholesterol against oxidation, reduction of blood pressure, cancer prevention, modulation of inflammatory processes, and prevention of metabolic diseases such as obesity and diabetes [1,2]. These health benefits have been related to the chemical compounds present in these products, mainly polyphenols and alkaloids [3,4]. In addition to their biological activities, theobromine and caffeine; the flavan-3-ols epicatechin and catechin; several procyanidins; and N-phenylpropenoyl amino acids have been reported as key non-volatile molecules contributing to the bitter taste and the astringent mouthfeel imparted during consumption [...]
The aim of this study was to define the chemical fingerprint of CBS by solid-phase extraction combined with high-performance liquid chromatography coupled to tandem mass spectrometry (SPE-HPLC-PDA-MS/MS) and to determine the compounds responsible for the differences among several CBSs from cocoa beans of different cultivars and geographical areas of Venezuela, in support of the authentication of this material. Moreover, the applicability of the spectrophotometric assays as screening assays was also evaluated for the classification of CBS. Finally, the correlation among the datasets obtained was investigated to find the link between the single chemical compounds and their responses to the screening assays, as well as their contribution to the antioxidant capacity of CBS.

Cocoa Bean Shell
Fermented cocoa beans (Theobroma cacao L.) from different cocoa-growing areas of Venezuela and cultivars, collected during the seasons of 2014 and 2015, were purchased from several local cocoa companies. In total, 10 samples (2 batches of each) from seven different regions of Venezuela (Sur del Lago, Caucagua, Merida, Cuyagua, Canoabo, Ocumare, and Carenero) and two cultivars (Criollo (C) (n = 5), Trinitario (T) (n = 5)) were collected (see Figure S1). Specific information on fermentation and drying conditions is not available because the suppliers kept it confidential. Cocoa beans were roasted at 130 °C for 20 min using a Memmert UFE 550 ventilated oven (ENCO, Spinea, Italy). After separation from the beans, CBS samples were ground in a Retsch ZM 200 ultracentrifugal mill (Retsch GmbH, Haan, Germany) to obtain a uniform powder with a particle size of 250 µm. The moisture content of the CBS (ranging from 5.46% to 7.44%) was determined using a Gibertini Eurotherm electronic moisture analyzer (Gibertini Elettronica, Novate Milanese MI, Italy). Samples were stored under vacuum at −20 °C before extraction.

Extraction of Bioactive Compounds
For the extraction of polyphenols and methylxanthines, 1.5 g of CBS powder was extracted in 30 mL of an ethanol-water mixture (50:50, v/v), according to the methodology described by Barbosa-Pereira et al. [36]. Extractions were performed at 25 °C under constant rotatory oscillation (3 oscillations s⁻¹) using a VRL 711 orbital shaker (Asal S.r.l., Milan, Italy) for 2 h. Samples were centrifuged at 4200 × g using a Heraeus Megafuge 11 centrifuge (Thermo Fisher Scientific, Hanau, Germany) for 10 min at 4 °C. The extractions were performed in duplicate for all samples, and the supernatants were filtered through a 0.45 µm PTFE filter. Then, 20 mL of each extract was evaporated at 35 °C under constant rotatory agitation at 300 rpm with a flow of nitrogen, using a Glas-Col® evaporator/concentrator system (Glas-Col LLC, Terre Haute, IN, USA). Finally, the dried extract was redissolved in water for the subsequent purification/fractionation step.

Fractionation and Purification of Bioactive Compounds Present in CBS by SPE
SPE fractionation and purification of CBS compounds were performed with Discovery DPA-6S cartridges (Supelco, Bellefonte, PA, USA) containing 500 mg of polyamide sorbent in 6 mL tubes, fitted with frits, on a 24-position SPE vacuum manifold (Phenomenex, Castel Maggiore, Italy). Before sample loading, the cartridges were activated with methanol (5 mL) and then preconditioned with the sample solvent (5 mL of Milli-Q water). Afterward, 500 µL of crude extract prepared in Milli-Q water, as described above in Section 2.3, was loaded on the cartridge. Subsequently, the adsorbed analytes were eluted with different solvent mixtures. Four separate fractions (F) were obtained as follows: 100% water containing 0.1% formic acid (F1); methanol-water (50:50, v/v) (F2); 100% methanol (F3); and finally, acetone-water (70:30, v/v) followed by 100% acetone (F4), 5 mL each. The fractionation experiments were performed in duplicate to ensure repeatability, giving a final number of 120 fractions (40 for each fraction). All fractions were evaporated to dryness under nitrogen at 35 °C and constant rotatory agitation at 300 rpm, using a Glas-Col® evaporator/concentrator system (Glas-Col LLC, Terre Haute, IN, USA), and reconstituted with methanol-water (50:50) containing 0.1% formic acid for further analysis (F1 in 1 mL; F2, F3, and F4 in 200 µL). The resulting fractions were then filtered through a 0.22 µm PTFE filter for HPLC-PDA-ESI-MS/MS analysis. Additionally, the total contents of phenolics, tannins, and flavonoids were assessed by spectrophotometric assays, and the antioxidant capacity of each fraction was measured by the DPPH radical scavenging test.

HPLC-PDA-ESI-MS/MS
Chromatographic analyses were carried out with an HPLC-PDA Thermo-Finnigan Spectra System (Thermo-Finnigan, Waltham, MA, USA). The system was equipped with a P2000 binary gradient pump, SCM 1000 degasser, AS 3000 automatic injector, and Finnigan Surveyor PDA Plus detector.
The CBS compounds were separated on a reverse-phase Kinetex® 5 µm Phenyl-Hexyl 100 Å LC column, 150 × 4.6 mm (Phenomenex, Castel Maggiore, Italy), equipped with a SecurityGuard™ analytical guard cartridge system (Phenomenex). Ultrapure water containing 0.1% formic acid (solvent A) and 100% methanol (solvent B) were used as the mobile phase. A gradient elution method was applied as follows: 0-2 min, 90% A and 10% B; 2-18 min, linear gradient from 10 to 50% B; 18-40 min, linear gradient from 50 to 80% B; 40-42 min, linear gradient up to 90% B; and 42-45 min, linear gradient back to 90% A and 10% B for column re-equilibration. The column temperature was set at 35 °C. The mobile phase flow rate was 1.0 mL/min. The sample injection volume was 10 µL. PDA spectra were recorded in full scan mode over the wavelength (λ) range of 200 to 400 nm, and PDA chromatograms were extracted at different wavelengths according to the nature of the molecules. Instrument control and data collection and processing were performed with ChromQuest software (version 5.0) (Thermo-Finnigan, Waltham, MA, USA).
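As an illustration only, the gradient program described above can be written as a piecewise-linear function of time. The following Python sketch is not part of the original method: the names `GRADIENT` and `percent_b` are hypothetical, and the final 42-45 min step is assumed to be a linear return from 90% to 10% B.

```python
# Gradient breakpoints as (time in min, % solvent B), taken from the text.
# Assumption: the 42-45 min re-equilibration step is a linear ramp back to 10% B.
GRADIENT = [
    (0.0, 10.0),
    (2.0, 10.0),
    (18.0, 50.0),
    (40.0, 80.0),
    (42.0, 90.0),
    (45.0, 10.0),
]


def percent_b(t_min):
    """Return the % of solvent B (methanol) at time t_min by linear interpolation."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-45 min gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Return the % of solvent A (water with 0.1% formic acid) at time t_min."""
    return 100.0 - percent_b(t_min)
```

For example, `percent_b(10.0)` falls on the 2-18 min ramp and `percent_a(1.0)` on the initial isocratic hold.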
The MS/MS analysis was performed with an API 3200 QTRAP LC-MS/MS System (Applied Biosystem Sciex, Foster City, CA, USA) equipped with an ESI Turbo V source and a triple quadrupole mass analyzer (Applied Biosystem Sciex) controlled by Analyst software (version 1.6) (AB SCIEX, Redwood City, CA, USA). The detection of chemical compounds was attained in negative ionization mode using the following conditions: ion spray voltage −4500 V; turbo spray temperature 550 °C; curtain gas 2.07 × 10⁵ Pa; interface heater on; nebulizer gas 2.4 × 10⁵ Pa; and heater gas 10 × 10⁵ Pa. Zero air was used as the nebulizer and heater gas and for eluent removal, while nitrogen was used as the curtain and collision gas. The full scan mode at high sensitivity, enhanced mass spectrum (EMS), was used for data acquisition, recorded in the range of m/z 100-1000 amu, operating with the following parameters: declustering potential (DP) of −20 V, entrance potential (EP) of −10 V, and collision energy (CE) of −30 eV. Product ions (MS/MS) were generated according to the information-dependent acquisition (IDA) mode, with a threshold of 50,000 cps and a collision energy (CE) of −30 eV, and collected in enhanced product ion (EPI) mode. For methylxanthines, mass spectrometry analyses were carried out under the same conditions described above for the other chemical compounds, except that detection was performed in positive ionization mode.
Foods 2021, 10, 1791

Identification and Quantification of Chemical Compounds in CBS
Tentative identification of chemical compounds was accomplished by comparing their UV-vis spectrum, molecular ion ([M−H]− or [M+H]+), and mass spectrometry fragmentation pattern (MS/MS) with those already described in the literature. When commercial standards were available, the individual phenolic compounds were identified by comparing the retention times and UV-vis spectra (λmax) with those obtained by injecting pure standards under the same HPLC conditions, and with the molecular ion and fragmentation data provided by MS/MS analysis. Standard solutions of each compound were prepared in methanol-water (50:50, v/v) containing 0.1% formic acid. The solutions were injected individually into the HPLC column and eluted under the same analytical conditions described above to determine their chromatographic retention times and to collect UV spectra with the PDA detector and MS fragmentation data. Each phenolic compound and methylxanthine was quantified by the external standard method using seven-point regression curves constructed at the wavelength of maximum absorbance of each analyte. Limits of detection (LOD) and quantification (LOQ) for the polyphenols and methylxanthines were calculated as 3 and 10 times the noise signal of the blanks, respectively. Polyphenols for which commercial standards were not available were quantified as equivalents of a phenolic compound from the corresponding family of molecules. The contents of polyphenols and methylxanthines present in CBSs and their fractions were expressed as mg/kg dry weight (dw) and g/kg dw of CBS, respectively.
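A minimal Python sketch of external-standard quantification is given below for illustration. The calibration points, peak areas, and noise level are invented, the function names are hypothetical, and the conversion of the 3× and 10× noise signals into concentrations by dividing by the calibration slope is an assumption rather than a procedure stated in the text.

```python
def fit_line(conc, area):
    """Ordinary least-squares fit of area = slope * conc + intercept."""
    n = len(conc)
    mean_c = sum(conc) / n
    mean_a = sum(area) / n
    sxx = sum((c - mean_c) ** 2 for c in conc)
    sxy = sum((c - mean_c) * (a - mean_a) for c, a in zip(conc, area))
    slope = sxy / sxx
    return slope, mean_a - slope * mean_c


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to obtain a concentration."""
    return (peak_area - intercept) / slope


def detection_limits(blank_noise, slope):
    """LOD and LOQ as 3x and 10x the blank noise, expressed as concentrations."""
    return 3.0 * blank_noise / slope, 10.0 * blank_noise / slope


# Hypothetical seven-point curve (mg/L versus peak area), not data from the study.
standards = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
areas = [12.0 * c + 3.0 for c in standards]
slope, intercept = fit_line(standards, areas)
sample_conc = quantify(603.0, slope, intercept)
lod, loq = detection_limits(0.4, slope)
```

The concentration obtained this way refers to the injected solution; expressing it per kg of dry CBS additionally requires the extraction and reconstitution volumes and the moisture content of the sample.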

Total Phenolics, Total Flavonoids, and Total Tannins
The total phenolic content (TPC), total flavonoid content (TFC), and total tannin content (TTC) were assessed according to the spectrophotometric assays described by Barbosa-Pereira et al. [36]. All determinations were performed in triplicate using 96-well microplates. The corresponding absorbance was recorded using a BioTek Synergy HT spectrophotometric multi-detection microplate reader (BioTek Instruments, Milan, Italy). Quantification of TPC was carried out using a standard curve of commercial gallic acid (20-100 mg/L, R² = 0.9981), and the concentration was expressed as mg of gallic acid equivalents (GAE)/g of CBS, while for TFC and TTC, the results were calculated based on the standard curve of catechin (5-500 mg/L, R² = 0.9980) and expressed as mg of catechin equivalents (CE)/kg of CBS.

Antioxidant Capacity
The antioxidant capacity of the extracted polyphenols present in CBSs and their fractions was determined by the 2,2-diphenyl-1-picrylhydrazyl (DPPH·) radical scavenging assay described by Barbosa-Pereira et al. [36]. All determinations were performed in triplicate using 96-well microplates, and the absorbance was measured at 515 nm using a BioTek Synergy HT spectrophotometric multi-detection microplate reader (BioTek Instruments, Milan, Italy). The inhibition percentage (IP) of the DPPH radical was calculated using the following equation:
\[ \mathrm{IP}\,(\%) = \frac{A_0 - A_{30}}{A_0} \times 100 \]
where A0 is the absorbance at the initial time, and A30 is the absorbance after 30 min. A linear regression curve of Trolox at 12.5-300 µM (R² = 0.9988) was used to calculate the radical scavenging activity (RSA) values. Results were expressed as mmol of Trolox equivalents (TE)/kg of CBS.
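The calculation can be sketched in Python as follows, assuming the usual form of the DPPH inhibition percentage, IP = (A0 − A30)/A0 × 100, and a linear Trolox calibration of IP against concentration. The function names and the calibration slope and intercept are hypothetical and are not values from the study.

```python
def inhibition_percent(a0, a30):
    """Inhibition percentage from absorbance at time zero (a0) and after 30 min (a30)."""
    if a0 <= 0:
        raise ValueError("initial absorbance must be positive")
    return (a0 - a30) / a0 * 100.0


def trolox_equivalent_um(ip, slope, intercept):
    """Invert an assumed linear Trolox curve, ip = slope * conc_uM + intercept."""
    return (ip - intercept) / slope


# Hypothetical absorbances and calibration parameters, for illustration only.
ip = inhibition_percent(0.8, 0.4)
te_um = trolox_equivalent_um(52.0, slope=0.25, intercept=2.0)
```

Results interpolated outside the 12.5-300 µM calibration range would need dilution and re-measurement rather than extrapolation.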

Chemometrics and Statistical Analysis
To discriminate the Venezuelan CBS samples as a function of geographic area of origin and cultivar, principal component analysis (PCA) plots based on the normalized data (log10) were built using the made4 package of R (https://www.r-project.org) and the function dudi.pca. Analysis of similarity based on phenolic molecules and spectrophotometric assays was applied with 999 permutations to detect significant differences as a function of geographical origin or cultivar, using the anosim function in the vegan package of R. Non-parametric Kruskal-Wallis and Wilcoxon tests were carried out to find chemical compounds differentially abundant between all the variables. Data were visualized as box plots representing the interquartile range between the first and the third quartile, with the error bars showing the lowest and the highest value. Pairwise Spearman's non-parametric correlations (corr.test function in the psych package of R) were used to study the relationships between bioactive compounds, spectrophotometric assays, and antioxidant capacity. The correlation plots were visualized using the corrplot package of R. p-values were adjusted for multiple testing, and a false discovery rate (FDR) < 0.05 was considered statistically significant.
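The study performed these analyses in R. Purely as an illustration of the two ideas involved, the Python sketch below computes a Spearman correlation as a Pearson correlation on average ranks and adjusts a set of p-values with the Benjamini-Hochberg procedure. The text does not name the FDR method, so Benjamini-Hochberg is an assumption, and all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A compound-assay pair would then be reported as significant when its adjusted p-value is below 0.05.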

Results and Discussion
A comprehensive characterization of chemical compounds of CBSs from Venezuela was performed to define the chemical fingerprint for the valorization of this cocoa byproduct as a food ingredient due to its bioactivities and sensorial properties. Thus, the present study was divided into two main sections: (1) one dedicated to the chemical characterization by advanced HPLC-PDA-MS/MS and rapid screening spectrophotometric assays, (2) followed by the chemometric analysis and the identification of key markers, which allowed the classification and authentication of CBS.

Chemical Profile of CBS Characterized by HPLC-PDA-MS/MS
The chemical compounds of CBS samples from different growing regions of Venezuela, identified or tentatively identified by HPLC-PDA-MS/MS, are described in Table 1. The fractionation of the raw extract by SPE yielded different fractions and made the chromatographic separation, identification, and classification of the bioactive compounds easier. The chromatograms of a representative CBS raw extract and its fractions obtained by HPLC-PDA-MS/MS are shown in Figure S2. The compounds were characterized by their retention times (Rt), UV-vis spectrum and characteristic maximum wavelength, molecular formula, molecular ion, and mass spectrometry fragmentation pattern (MS/MS), compared with those already described in the literature. The identity of some compounds was confirmed by comparison with data obtained by injecting pure standards under the same HPLC-PDA-MS/MS conditions (see Table 1). Since studies on the chemical profile of cocoa byproducts such as CBS by HPLC-MS/MS are limited or nonexistent in the literature, the tentative identification of the chemical compounds will be discussed by comparing data with studies performed on cocoa beans and cocoa products and with the recent study reported by Cádiz-Gurrea et al. [30] on CBS from cocoa beans originating in Peru. The alkaloids theobromine and caffeine (compounds 2 and 19, respectively), detected at UV λmax of 272 nm, were the most representative group of chemical compounds identified in the CBS samples. The identification was achieved by comparing the retention times and MS/MS fragmentation, acquired in positive mode, with commercial standards. These methylxanthines were isolated in the first fraction (F1) obtained from the SPE fractionation of the CBS raw extract with water (see Figure S2).
Theobromine, with [M+H]+ ion at m/z 181 and MS/MS fragments 167, 138, and 137 m/z, and caffeine, with [M+H]+ ion at m/z 195 and MS/MS fragments 181 and 138 m/z, are the main representative alkaloids found in cocoa and cocoa-related products, including cocoa bean shell [3,18,21,22,37].

Phenolic Acids
Compound 1 (λmax of 293 nm, Rt 4.73 min), with [M−H]− ion at m/z 153 and MS/MS fragment 109 m/z, was identified as protocatechuic acid. The retention time and MS/MS fragmentation pattern obtained with the reference standard confirmed the identity of the phenolic acid. Protocatechuic acid is a hydroxybenzoic acid derivative, widely distributed in nature and reported for cocoa beans and related products in several studies [22,23]. Recently, this compound was also described in cocoa bean husks from Peru [30], Colombia [12], Honduras [38], and Ecuador [11,13].

Flavan-3-ols and Their Glycosides
Compounds 7 and 16, with characteristic UV-visible spectra and λ max at 278 nm, [M-H] − ion at m/z 289, and MS/MS fragments 245, 205, and 203 m/z, were identified in the CBS as catechin and epicatechin, respectively, and their identity was confirmed by comparison with pure standards. These compounds have been recognized as the main compounds of the cocoa bean, cocoa products, and cocoa byproducts [3,20,23,36].
Compound 4 and compound 9, with [M−H]− ion at m/z 451 and the MS/MS fragments 289 and 245 m/z from catechin, eluting at different retention times (7.3 min and 9.5 min), were tentatively identified as two isomers of flavan-3-ol glycosides, catechin-3-O-glucoside_1 and catechin-3-O-glucoside_2, respectively. These catechin glycosides were described for cocoa beans in studies performed by Patras et al. [19] and Cádiz-Gurrea et al. [30]. Both compounds were identified in CBS samples by HPLC-PDA-MS/MS for the first time in this study.
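As a cross-check on the precursor ions reported in this section, nominal m/z values can be estimated from monoisotopic masses. The Python sketch below is illustrative only: the molecular formulas are the generally accepted ones for these compounds and are not taken from the text, and the names `MONOISOTOPIC`, `neutral_mass`, and `ion_mz` are hypothetical.

```python
# Monoisotopic masses (u) of the elements involved and of the proton.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ion_mz(formula, mode):
    """m/z of the singly charged [M+H]+ (mode '+') or [M-H]- (mode '-') ion."""
    if mode == "+":
        return neutral_mass(formula) + PROTON
    if mode == "-":
        return neutral_mass(formula) - PROTON
    raise ValueError("mode must be '+' or '-'")


# Assumed molecular formulas (not stated in the text).
theobromine = {"C": 7, "H": 8, "N": 4, "O": 2}
caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}
protocatechuic_acid = {"C": 7, "H": 6, "O": 4}
catechin = {"C": 15, "H": 14, "O": 6}
catechin_hexoside = {"C": 21, "H": 24, "O": 11}
```

Rounding `ion_mz(theobromine, "+")` or `ion_mz(catechin, "-")` to the nearest integer gives the nominal value to compare with the m/z reported for each compound.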

Procyanidins
In addition to the monomers of flavan-3-ols and their glycosides, several oligomers (procyanidins) and their derivatives, composed of several units of catechin and epicatechin, were also identified in the CBS samples. A total of 18 compounds, comprising B-type procyanidin dimers and trimers and A-type procyanidin glycosides, were tentatively identified among the fractions obtained from the CBS raw extracts.
Four chemical compounds (compounds 14, 15, 22, and 26) with the same UV-visible spectra (λmax at 280 nm), [M−H]− ion at m/z 577, and a fragmentation pattern yielding the fragment 289 m/z, equivalent to the catechin/epicatechin monomers, were identified as B-type procyanidin dimers. Some of these compounds have been described in the literature in cocoa beans and other cocoa products, such as cocoa husks and beans from Peru [19,30]. In addition, eight chemical compounds (11, 13, 21, 24, 25, 27, 29, and 31), with λmax at 278-280 nm, [M−H]− ion at m/z 865, and the same fragmentation pattern in negative ionization mode, yielding the fragments 577, 425, 407, and 289 m/z, were identified as B-type procyanidin trimers [39]. As for the B-type procyanidin dimers, the studies mentioned above described these polyphenols for cocoa beans and cocoa products. This study describes, for the first time, the identification of B-type procyanidin trimers in CBS. Finally, six compounds (23, 28, 30, 32, 33, and 35), with typical UV-visible spectra at λmax 280 nm, were identified as A-type procyanidin glycosides. Compounds 23, 30, and 35, with a fragmentation pattern that includes the fragments 449 and 289 m/z, were identified as isomers of procyanidin A-type pentosides (1, 2, and 3, respectively) [19]. On the other hand, compound 28 and compound 33, with [M−H]− ion at m/z 737 and a similar fragmentation pattern yielding the fragments 611, 539, 449, and 289 m/z, were identified as isomers of procyanidin A-type hexoside [20,40]. Additionally, compound 32, with [M−H]− ion at m/z 995 and a fragmentation pattern in negative ionization mode yielding the fragments 865 and 407 m/z, was identified as procyanidin A-type trimer arabinoside, as described by D'Souza et al. [20] for fermented cocoa beans. These compounds were not described previously in the literature for cocoa bean shells.

N-Phenylpropenoyl-L-Amino Acids
[...] and 119 m/z, as described for cocoa beans, and the UV-visible spectra typical at λmax 286 nm [7].
N-coumaroyl-L-tyrosine (Deoxyclovamide) (compound 34) was tentatively identified based on the [M-H] − ion at m/z 326; the characteristic MS/MS fragments 282, 206, 163, 145, 134, and 119 m/z; and its UV-visible spectra typical at λ max 283 nm, as described by several authors in the literature for cocoa products [37].
Besides the coumaric acid derivatives, N-caffeoyl-L-aspartate was also tentatively identified [...]. A further compound, with UV-visible spectra typical at λmax 286 and 318 nm, was tentatively identified as N-feruloyl-L-aspartate due to the fragmentation pattern characteristic of ferulic acid, 193 and 134 m/z. The MS/MS fragmentation patterns of the hydroxycinnamic acids (coumaric, caffeic, and ferulic acids) were confirmed by comparison with data obtained from the analysis of pure reference standards. N-Coumaroyl-L-glutamate and N-feruloyl-L-aspartate were described before for cocoa beans [7] but are described for the first time for CBS in the present work.

Others
Besides the compounds described and classified above, two isomers of sulfated compounds containing sulfonic acid [19], compounds 5 (C11H21O9S_1) and 8 (C11H21O9S_2), with UV-visible spectra typical at λmax 278 nm, were tentatively identified based on the [M−H]− ion [...] [19,20]. Compound 10, with λmax 293 nm and [M−H]− ion at m/z 357, was tentatively identified as sweroside, as described by others in the literature for cocoa products [26].

Quantitative Distribution of Bioactive Compounds in CBS and Antioxidant Capacity
The quantification of 11 chemical compounds was performed based on the calibration curves prepared with external standards, according to the analytical parameters described in Table S1. The other compounds were tentatively quantified according to the similarity of the molecular structure as follows: procyanidins expressed as equivalents of procyanidin B1; catechin-3-O-glycosides as equivalents of catechin; flavonol-3-O-glycosides as equivalents of quercetin-3-O-glucoside; and N-phenylpropenoyl-L-amino acids as equivalents of the respective phenolic acid present in the structure (p-coumaric, caffeic, or ferulic acid). The amounts of each molecule and the total amounts of each group of molecules quantified for the CBS samples are given in Table 2. The amounts determined for each CBS in the single fractions are given in Tables S2-S5 for fractions F1, F2, F3, and F4, respectively.
Among the 39 chemical compounds identified in CBS samples, methylxanthines were the most representative group, found in the range of 5.5 to 15.7 g/kg of CBS. As described for cocoa products, theobromine was present in higher concentrations (4.66-9.95 g/kg) than caffeine (0.84-5.80 g/kg) [33]. Furthermore, the CBS samples of the Criollo cultivar were those with the highest amounts of both compounds, in agreement with the results described in the literature for cocoa beans and related products, including CBS [36,41]. The theobromine/caffeine ratio has been proposed by several authors for the classification of cocoa cultivars; in this study, it was considered for the first time for CBS, and the values ranged between 1.71 and 5.54 [21]. These values were 10-fold higher than those described in the literature for cocoa beans [22,33] and lower than those reported by others [21,27]. However, the results are in line with those reported for cocoa bean shells in recent years [36,42]. CBSs from the Criollo cultivar showed lower theobromine/caffeine ratios than those observed for CBSs from the Trinitario cultivar. The threshold value to discriminate Criollo from Trinitario CBS was around 2, since the values for the Criollo cultivar ranged between 1.7 and 1.9, while for the Trinitario cultivar, the ratio ranged between 1.9 and 5.5. This threshold was lower than that described in the literature for cocoa beans, where values above 3 are characteristic of Trinitario cocoa beans and values below 3 are characteristic of the Criollo cultivar [21]. Another observation from the CBS analysis was that the percentage of caffeine, which ranged from 0.08% to 0.5%, was highest in samples of CBS from the Criollo cultivar (0.4-0.5%). These contents were similar to those described in the literature by Pedan et al. [21] and higher than those described by Carrillo et al. [27].
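The ratio-based reading of the methylxanthine data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a validated classifier: the cut-off of 2 is the approximate value discussed above, the Criollo and Trinitario ranges meet at about 1.9, and the function names and example concentrations are hypothetical.

```python
def theobromine_caffeine_ratio(theobromine_g_per_kg, caffeine_g_per_kg):
    """Dimensionless theobromine/caffeine ratio from contents in g/kg."""
    if caffeine_g_per_kg <= 0:
        raise ValueError("caffeine content must be positive")
    return theobromine_g_per_kg / caffeine_g_per_kg


def tentative_cultivar(ratio, threshold=2.0):
    """Label a CBS sample using the approximate cut-off of 2 (an assumption)."""
    return "Criollo-like" if ratio < threshold else "Trinitario-like"


def caffeine_percent(caffeine_g_per_kg):
    """Convert a caffeine content in g/kg to a percentage by weight."""
    return caffeine_g_per_kg / 10.0


# Hypothetical contents (g/kg) chosen inside the reported ranges.
low_ratio = theobromine_caffeine_ratio(9.0, 5.0)
high_ratio = theobromine_caffeine_ratio(5.0, 1.0)
```

Samples with ratios close to the cut-off would need additional markers before being assigned to a cultivar.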
The methylxanthines have been positively related to the acceptance of cocoa products by consumers because of their stimulating effect on the central nervous system and their mood-improving properties, among other positive effects on the cardiovascular system or in preventing intestinal inflammation [43]. Thus, CBS constitutes an important source of these bioactive compounds, with the CBSs from the Criollo cultivar being of greatest interest. These compounds might be isolated in fraction F1 with the SPE purification process proposed in this study (Table S2).
The other group of chemical compounds (n = 37), quantified in this study by HPLC-PDA, was composed mainly of phenolic compounds in the range between 237.1 and 1319.6 mg/kg of CBS. CBSs yielded from the Trinitario cultivar cocoa beans produced in the regions of Merida and Cayagua were those with high amounts of these bioactive compounds, 1319.6 and 1134.9 mg/kg, respectively. On the other hand, CBSs of the Criollo cultivar from the Carenero and Ocumare regions were those with low contents in phenolic compounds (367.4 and 237.1 mg/kg, respectively). The sum of the fractions F2, F3, and F4 (see Tables S3-S5) contributed with 16.9%, 35.5%, and 46.7%, respectively, to the total amount.
Phenolic acids contribute with 8.3% (average value) to the total amount of phenolics in CBS analyzed in this study. Notwithstanding the recent literature reporting other phenolic acids in CBS from Peru [30], protocatechuic acid was the only compound identified in CBSs from Venezuela. In general, the CBSs samples yielded from cocoa beans of the Trinitario cultivar were those with high contents in protocatechuic acid (53.95-113.8 mg/kg). The contents of protocatechuic acid found in this study for the CBS were higher than those found by Rodríguez-Carrasco et al. [22] for the respective cocoa beans from these three origins and cultivars from Venezuela, Canoabo_C, Ocumare_C, and Sur del Lago_T (17.3, 23.7, and 28.8 mg/kg, respectively).
The most representative group of polyphenols present in cocoa products, and therefore also in CBS yielded from cocoa beans from different regions of Venezuela studied in this work, was flavonoids (around 63-82%, average 71.57%), comprising flavan-3-ols and their derivates, procyanidins, and flavanols and their respective glycosides. These values were similar to those described in the literature for cocoa beans and their products [22,23].
The flavan-3-ols catechin and epicatechin were found to be the main compounds in CBS samples, representing around 23% of the total content of polyphenols in Trinitario CBS and 10% for the Criollo cultivar. These compounds were purified by SPE and isolated in fraction 3 (Table S4). Epicatechin was found in higher concentrations in all samples, the samples Merida_1_T and Cayagua_T CBSs being those with high amounts reaching 298.0 and 234.2 mg/kg, respectively. These values are in agreement with the lowest values described in the literature for CBS from different geographical origins [33,36] and were lower than those observed by other studies on cocoa beans [22]. The ratio of epicatechin/catechin (E/C) was also determined for all CBS samples, for the first time in this study, in the range from 5.0 for Cayagua_T samples to 10.0 in Sur del Lago_T. This ratio was previously described by Ioannone et al. [15] for cocoa beans during the roasting process (E/C < 1) and by Damm et al. [18] for chocolate and cocoa samples, in the range from 2.2 to 4.0. These authors also observed that the ratio was not constant among samples and is dependent on the processing conditions. Indeed, considering the data described by Rodríguez-Carrasco et al. [22], the epicatechin/catechin ratios observed for the cocoa beans analyzed from the same geographic areas of Venezuela used in this study, Canoabo_C, Ocumare_C, and Sur del Lago_T, were 1.5, 1.6, and 1.7, respectively, while for the respective CBS analyzed in this study, the values were 6.3, 6.2, and 10.0, respectively. Thus, since the results obtained from CBS samples were significantly higher than cocoa beans, this ratio might be considered a parameter to discriminate CBS from cocoa beans and other cocoa products.
Derivates of flavan-3-ols, the glycosides of catechin, catechin-3-O-glucoside_1, and catechin-3-O-glucoside_2, were also quantified. The sum of these isomeric compounds represents 6% of the total polyphenols present in the CBS that were randomly distributed among all CBS samples analyzed in the range from 17.70 mg/kg of CBS (Canoabo_C) to 96.98 mg/kg of CBS (Cuyagua_T). These compounds were purified by SPE, isolated in fraction F2 (Table S3), and identified and quantified for the first time for CBS of Venezuela.
The total content in oligomers of flavan-3-ols, classified in procyanidins B-type (PCB), PCB trimers, and procyanidins A-type (PCA) glycosides, represents between 32% and 54% of the total amounts of phenolic compounds present in CBS. Among the three groups of PCBs, the PCB trimers were found in higher amounts from 77.66 mg/kg of CBS (Ocumare_2_C) up to 346.5 mg/kg (Merida_1_T) and represent 22.5% of the total content of polyphenols in CBS, while PCBs and PCA glycosides represent around 7.3 and 13.2% of the total amount of chemical compounds, respectively. Regarding the PCBs, PCB_1 was the isomer found in higher amounts, within the range from 6.29 to 48.47 mg/kg, and the CBS samples yielded from cocoa beans from the Trinitario cultivar were those with high concentrations. Considering the PCA glycosides, CBS samples obtained from cocoa beans from the geographical area of Merida were those with high amounts of these compounds (124.4-150.0 mg/kg), in particular PCA pentoside_3 (39.06-71.61 mg/kg). Despite these three groups of PCBs being randomly distributed, considering their sum, the total amount of PCBs was found to be higher for almost all CBSs of the Trinitario cultivar if compared with the Criollo cultivar. The total amount of PCBs in CBS samples of Canoabo_C and Ocumare_2_C was higher than that described for the cocoa beans of the same geographical area by Rodríguez-Carrasco et al. [22], while for CBS samples from Sur del Lago_T, the values were lower for CBS than cocoa beans. However, in this study, a high number of compounds were quantified, compared to the work performed for the cocoa beans, and other compounds might contribute differently to the final amounts.
Flavonols (quercetin) and their glycosides were also quantified in CBS in the range from 5.64 mg/kg of CBS (Ocumare_2_C) to 61.5 mg/kg (Ocumare_1_T), but their contribution to the total amount of phenolic compounds was around 3.8%, and quercetin alone represents less than 0.5% for all CBS samples from Venezuela. The CBSs with the highest contents of flavonols were those yielded from cocoa beans of the Trinitario cultivar, quercetin-3-O-arabinoside being the most relevant, as described before in the literature for cocoa beans [34]. The total contents determined in CBS samples were similar to or higher than those reported by Rodríguez-Carrasco et al. [22] for the respective cocoa beans and those described by Damm et al. [18] for other cocoa products, despite the heterogeneity of the individual compounds among the studies.
Another group of chemical compounds found in the CBS samples were the N-phenylpropenoyl-L-amino acids, which have been proposed as chemical markers of cocoa products. These compounds represent 12.6% (from 4.1% to 19.5%) of the total amount of bioactive compounds, in the range from 26.6 mg/kg of CBS (Ocumare_2_C) to 217.9 mg/kg of CBS (Ocumare_1_T). Among them, the pattern N-caffeoyl-L-aspartate > N-coumaroyl-L-aspartate_2 > N-coumaroyl-L-tyrosine was followed for all CBS samples, similar to that described by others for cocoa beans and related products [3,18].
Finally, a small group of chemicals that contributes 7.5% to the total content of bioactive compounds was also quantified for the first time in CBS, in the range from 11.93 mg/kg (Ocumare_2_C) to 114.4 mg/kg (Ocumare_1_T). The compounds C11H21O9S_2 and hydroxyjasmonic acid sulfate were those found in the highest amounts among all CBS samples.
Comparing the data obtained with the study performed on cocoa beans, the chemical profile of CBS is similar to that of cocoa beans. However, the proportions of some families of compounds differ substantially between them. Cocoa beans display higher contents of flavan-3-ols (47-53%) than PCBs (26-35%) [22], while in CBS samples, the amount of PCBs is higher (36-54%) than that of flavan-3-ols (9-25%). Additionally, the phenolic acids showed a higher contribution to the total composition of bioactive compounds in CBS (4-14%) than in cocoa beans (2-9%). These data allow the discrimination of CBSs from their related cocoa beans and might be of great interest to the industry for authentication purposes and for avoiding falsifications in cocoa powder.
Parallel to the HPLC-PDA-MS/MS analysis, spectrophotometric assays, including total phenolic content (TPC), total flavonoid content (TFC), and total tannin content (TTC), were also performed to characterize the CBS samples from Venezuela. The radical scavenging activity (RSA) of the CBS extracts was also assessed with the DPPH assay. All fractions yielded from the SPE purification were evaluated separately in each spectrophotometric assay. TPC values of CBS ranged from 5.87 g GAE/kg of CBS (Carenero_C) to 9.12 g GAE/kg of CBS (Caucagua_T), which was 10-fold higher than the total amount of phenolics determined by chromatography. These results were in line with previous studies performed on CBS and cocoa beans from different origins, including Venezuela (5-12 g GAE/kg) [27,36]. The contributions of each fraction to the total TPC value were on average 43.1%, 17.6%, 14.1%, and 25.1% for F1, F2, F3, and F4, respectively. However, the main compounds identified and quantified in fraction F1 were the methylxanthines, and phenolic compounds were not identified. These results support the concerns recently highlighted by Granato et al. [17] regarding interferences from sample components that lead to overestimated values in some screening methods. On the other hand, the results obtained with the TFC and TTC assays were more accurate. In the case of TFC, the values ranged from 1.89 g CE/kg of CBS (Carenero_C) to 3.65 g CE/kg of CBS (Merida_1_T), against 0.29 and 1.01 g/kg of CBS determined by chromatographic analysis for the same samples, respectively. The contributions of the fractions to the total TFC value were 11.7%, 23.5%, 24.4%, and 41.7% for F1, F2, F3, and F4, respectively. The last three fractions contain the flavonoids determined by chromatography (Tables S3-S5).
Similar results were obtained with the TTC method (TTC values ranged from 1.05 to 1.75 g CE/kg of CBS), where the average contributions were around 19.06%, 10.48%, 33.82%, and 34.99% for F1, F2, F3, and F4, respectively. Fractions F3 and F4 were those with the highest amounts of procyanidins (Tables S4 and S5) quantified by HPLC-PDA (up to 0.56 g/kg of CBS), but considering all flavan-3-ols, a final content of 0.93 g/kg of CBS was reached.
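The fraction contributions reported above can be illustrated with a short calculation. The following is a minimal sketch, assuming that each contribution is the assay value of one SPE fraction divided by the sum over the four fractions of the same sample; the function name and the numerical readings are hypothetical and are not taken from the study.

```python
def fraction_contributions(fraction_values):
    """Return each fraction's percentage share of the summed assay value."""
    total = sum(fraction_values.values())
    if total <= 0:
        raise ValueError("the summed assay value must be positive")
    return {name: 100.0 * value / total for name, value in fraction_values.items()}


# Hypothetical TPC readings (g GAE/kg of CBS) for one sample, one per SPE fraction.
example = {"F1": 3.0, "F2": 1.5, "F3": 1.0, "F4": 2.0}
shares = fraction_contributions(example)
for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
```

Averaging such per-sample shares over all samples would give mean contributions of the kind listed above; averaged shares need not add up to exactly 100% after rounding.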
Regarding the antioxidant capacity (RSA) of the CBS, the results showed RSA values in the range from 17.33 mmol TE/kg of CBS (Carenero_C) to 26.33 mmol TE/kg of CBS (Caucagua_T). With this methodology too, the antioxidant capacity could be overestimated, considering the results obtained in all fractions. According to the data, fraction F1 contributes 28.6% of the total antioxidant capacity, even though the chromatographic analysis was not able to identify bioactive compounds in it. Considering the contributions of fractions F2, F3, and F4 to the total RSA of CBS samples (18.6%, 16.9%, and 34.7%, respectively), the results might allow for a high correlation between the RSA and the bioactive compounds present in each fraction. These methods showed a moderate correlation among them (RSA/TPC, r = 0.83; RSA/TFC, r = 0.69; RSA/TTC, r = 0.69). However, the results were lower than those observed previously by the authors for CBS raw extracts without purification processes [13,36].
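The correlation coefficients quoted above relate two assay responses measured on the same samples. The type of coefficient is not stated in this section; the sketch below assumes the Pearson product-moment coefficient, and the data values are hypothetical placeholders rather than measurements from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired readings for five samples (not data from the study).
tpc = [5.9, 6.4, 7.1, 8.0, 9.1]       # g GAE/kg of CBS
rsa = [17.3, 19.8, 19.1, 24.0, 26.3]  # mmol TE/kg of CBS
print(f"RSA/TPC r = {pearson_r(tpc, rsa):.2f}")
```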

Classification of CBSs Based on Chemical Compounds Determined by HPLC-PDA-MS/MS
Based on the chemical profile determined for CBSs yielded from Venezuelan cocoa beans, Figure 1 shows the principal component analysis (PCA) used to find differences among cultivars (Figure 1a) and the geographic areas of Venezuela (Figure 1b).
The PCA explained 67.3% of the total variance and showed a clear separation between the Criollo and Trinitario cultivars, which was confirmed by the ANOSIM statistical test (p < 0.001). The highest content of flavan-3-ols and PCB trimers in the Criollo cultivar and the low amounts of N-phenylpropenoyl-L-amino acids, compared to Trinitario CBSs, allow for a clear separation among cultivars. Several compounds (33 out of the 39) were shown to be significantly different and allowed for the classification of CBSs from Venezuela according to the cultivar (FDR < 0.001 (n = 21); FDR < 0.01 (n = 8); FDR < 0.05 (n = 4)), as shown in Figure S3. The boxplots (Figure 1c) show the abundance of the main 12 molecules that allow for the classification of CBSs (FDR < 0.001), considering the most representative key chemical markers described for cocoa products in the literature [13]. Epicatechin, N-coumaroyl-L-aspartate, and PCB_trimer_4 were specific compounds with the highest concentrations in Trinitario cultivars from different Venezuelan geographical areas. On the other hand, the highest concentrations of other cocoa key compounds, such as the methylxanthines theobromine and caffeine, and the polyphenol PCA_pentoside_3, were found in CBS samples of the Criollo cultivar. The procyanidins, such as PCB trimer_3 and PCA hexoside_1, were also found to be significant as chemical markers for the Criollo cultivar (Figure S3).
Table 2. Total and individual contents of chemical compounds and spectrophotometric assay results of CBS samples (n = 10) yielded from cocoa beans from seven different regions of Venezuela and two cultivars, Trinitario (n = 5) and Criollo (n = 5). Each origin contains 4 samples from 2 different batches (n = 40). The content of bioactive compounds for each sample results from the sum of the quantification of the four fractions yielded individually (n = 120). Data are expressed as mg/kg dw of CBS, except for methylxanthines, which are expressed as g/kg dw of CBS.

Taking into account the origin of the CBS, the samples were separated according to the growing area of the respective cocoa beans (p < 0.001) (Figure 1b). However, in some cases, an effect of the cultivar was observed on this separation. Although samples from the same region, such as Ocumare and Merida (top for Ocumare and bottom for Merida), were positioned closely in Figure 1b, they remain separated according to their cultivar. Indeed, the samples Merida_2 and Merida_3 (Criollo) cluster together, while Merida_1 (Trinitario) was separated. The influence of the cultivar on the separation of CBS has been previously reported for the volatile fingerprint of CBS [29]. However, this effect was not perceptible for Canoabo CBSs from the Criollo cultivar, which clustered near CBSs from the Trinitario cultivar. CBS samples Ocumare_1 and Cuyagua (Trinitario cultivar) cluster together due to their similar chemical profile. The geographical proximity of both regions (Figure S1) might be the main factor that contributes to a similar chemical profile, as described in the literature for cocoa beans [20]. Together with Merida (Merida_1), these two origins were those with the highest content of chemical compounds determined by HPLC-PDA-MS/MS. CBSs from these regions were also those with the highest content of N-phenylpropenoyl-L-amino acids. On the other hand, samples of the Criollo cultivar from the Merida (Merida_2, Merida_3) and Carenero regions were clustered closely in the PCA due to their characteristic high content of PCA glycosides. Furthermore, these samples, together with Ocumare_2, were those with the lowest amounts of bioactive compounds.
From the HPLC-PDA-MS/MS data, 24 chemical compounds were found as potential markers for the classification of CBSs according to the geographical region (FDR < 0.05), as shown in Figure S4 (see Supplementary Materials). Figure 1d shows the boxplot with the abundance of the most representative key compounds (n = 12) that allow for the classification of CBSs (FDR < 0.05) according to the growth region. Protocatechuic acid was found in high concentrations in the CBSs from three close geographical areas: Ocumare, Cuyagua, and Canoabo (Figure S1). On the other hand, CBSs from Caucagua, Sur del Lago, and Canoabo were characterized by the highest amount of epicatechin, one of the main key markers of cocoa. Sample Merida_1 was close to Ocumare_1 and Cuyagua in the PCA due to the key N-phenylpropenoyl-L-amino acids, N-caffeoyl-L-aspartate and N-coumaroyl-L-aspartate_2, but the presence of high amounts of epicatechin, PCB_trimer isomers 4 and 7, and quercetin-3-O-arabinoside allowed for a good discrimination among them. The highest contents of the chemical compounds PCB_2, PCA_pentoside_3, and N-caffeoyl-L-aspartate were found among all CBS samples from Merida, and thus, they might be used as key markers for this region. In the same way, other cocoa key markers, such as the methylxanthines (theobromine and caffeine), allowed the separation of Merida (Criollo cultivar) from Carenero and Ocumare_2.
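The FDR thresholds used above to select marker compounds imply a multiple-testing correction across the 39 quantified compounds. The correction procedure is not named in this section; the sketch below assumes the widely used Benjamini-Hochberg step-up procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four compounds (not data from the study).
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
markers = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, markers)
```

A compound would then be retained as a candidate marker when its adjusted value falls below the chosen threshold (0.05, 0.01, or 0.001 in the text).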

Classification of CBSs Based on Spectrophotometric Analysis Data Set
The spectrophotometric assays dataset that was used to discriminate the CBSs from Venezuela will be discussed in this section.
Since F3 and F4 were the fractions with the highest contribution of bioactive compounds, Figure 2 shows the contribution of both fractions to the classification of CBS according to cultivar (Figure 2a,d) and geographical region (Figure 2b,e). The PCA of the spectrophotometric assay data of fractions F3 and F4 explains 96.93% and 96.31% of the variance, respectively, and shows a clear separation between the two cultivars, Criollo and Trinitario, confirmed by the ANOSIM statistical test (p < 0.001).
Taking into account the geographical regions of Venezuela, a pattern of discrimination similar to that obtained with the HPLC-PDA-MS/MS dataset was achieved for CBS samples (p < 0.001) considering fractions F3 (Figure 2b) and F4 (Figure 2e). The spectrophotometric assays confirm the separation of CBSs from Carenero, Merida_2 and Merida_3, and Ocumare_2 from the others in both fractions, and this discrimination was more effective in F3 since, in the case of F4, Merida_3 and Ocumare_2 clustered together. On the other hand, F4 showed the best separation of CBSs from Merida_1, Sur del Lago, and Canoabo, similar to that obtained with HPLC-PDA-MS/MS analysis. The boxplots of each spectrophotometric assay result obtained for fractions F3 and F4 are shown in Figures 2c and 2f, respectively. The results highlighted that the F3 and F4 datasets allow effective discrimination using the four screening assays, which could be used as a tool for the rapid discrimination of CBS samples from different cultivars and a few geographic origins. In fraction F3, Merida_2 was separated due to its high response to the RSA, TTC, and TPC assays, while Carenero and Canoabo samples were on the opposite side of the PCA. Likewise, the TFC assay allowed for a complete discrimination of all Merida samples due to their high response to this assay.
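The ANOSIM test used to confirm the group separations compares the ranks of between-group and within-group dissimilarities. The following is a minimal sketch of the standard R statistic, R = (mean between-group rank - mean within-group rank) / (M/2), with M the number of sample pairs, together with a label-permutation p-value. The Euclidean distance, the number of permutations, and the sample coordinates are assumptions made for illustration; the settings used in the study are not given in this section.

```python
import math
import random
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(samples, labels):
    """ANOSIM R statistic using Euclidean distances (an assumed metric)."""
    pairs = list(combinations(range(len(samples)), 2))
    dists = [math.dist(samples[i], samples[j]) for i, j in pairs]
    ranks = average_ranks(dists)
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    if not within or not between:
        raise ValueError("need at least two groups, one with two or more samples")
    mean_diff = sum(between) / len(between) - sum(within) / len(within)
    return mean_diff / (len(pairs) / 2)


def anosim_test(samples, labels, n_perm=999, seed=0):
    """Return (R, p) where p comes from random relabelling of the samples."""
    rng = random.Random(seed)
    observed = anosim_r(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical two-dimensional scores for two well-separated groups.
scores = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = ["Criollo"] * 3 + ["Trinitario"] * 3
r_stat, p_value = anosim_test(scores, groups)
print(f"R = {r_stat:.2f}, p = {p_value:.3f}")
```

An R value close to 1 indicates that samples within a group are more similar to each other than to samples from other groups, while values near 0 indicate no grouping structure.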

Correlation between HPLC-PDA-MS/MS and Spectrophotometric Assays Data Sets
The correlation between spectrophotometric assays and HPLC-PDA-MS/MS showed the relationship between RSA values and polyphenols. The dataset was evaluated separately for each fraction and compared with the respective bioactive compounds determined by chromatographic analysis (Tables S2-S5). The correlation between the analytical techniques for the sum of the fractions F2, F3, and F4 is shown in Figure 3.
The heatmap clearly shows two main clusters of spectrophotometric assays: TPC and RSA (cluster 1) and TFC and TTC (cluster 2). The high correlation between TPC and RSA has been reported in the literature by several authors, including for CBS samples. Nevertheless, descriptions of the correlation between screening methods and individual chemical compounds are limited [28,36]. In this study, both methods were correlated positively with chemical compounds such as PCA_pentoside_3, catechin-3-O-glucoside_2, PCB_trimer_1, or N-caffeoyl-L-aspartate. Considering the high number of bioactive compounds determined by HPLC-PDA-MS/MS, their correlation with cluster 1 was limited, mainly for the TPC method. The TFC and TTC methods, used to determine flavonoids and tannins, respectively, showed a high correlation with each other. In addition, a positive correlation between these screening methods and almost all flavonoids identified by chromatography was observed, mainly with flavan-3-ols (catechin and epicatechin) and their derivatives. Indeed, significant positive correlations were found between TFC data and several chemical compounds, such as flavan-3-ol monomers (catechin and epicatechin), catechin-3-O-glucoside_2, quercetin-3-O-glucosides, or the aglycone quercetin. Moreover, N-phenylpropenoyl-L-amino acids showed a higher positive correlation with TFC than with TTC and a negative correlation with TPC. On the other hand, the TTC method, which determines the total amount of tannins, showed the highest positive correlation with several procyanidins, mainly type-B trimers (isomers 1, 2, 4, 6, 7, and 8). Moreover, some of these compounds were also correlated positively with antioxidant capacity, such as PCB-trimers 2, 6, and 7 [39].
Other compounds, such as catechin, epicatechin, or N-feruloyl-L-aspartate, might also contribute to the antioxidant capacity of the CBS samples, but, despite the positive correlation, this was not significant compared with the other compounds present in CBS and with that described before by other authors [36,39].

Conclusions
This study provides, for the first time, the chemical fingerprint of cocoa bean shells (CBSs) from Venezuelan cocoa beans, based mainly on methylxanthines and phenolic compounds determined by SPE-HPLC-PDA-ESI-MS/MS. Several compounds, such as catechin glycosides, procyanidin A-type glycosides, and N-feruloyl-L-aspartate, were identified, for the first time, in CBS samples. Moreover, the presence of high amounts of high added-value compounds, such as methylxanthines, N-phenylpropenoyl-L-amino acids, flavan-3-ols, and procyanidins, valorizes the byproduct as a food ingredient. CBSs of the Trinitario cultivar from the geographical regions of Merida, Cuyagua, and Ocumare were those with the highest contents of bioactive compounds.
The datasets obtained by HPLC-PDA-MS/MS and spectrophotometric assays allowed for the classification of CBSs yielded from cocoa beans from different Venezuelan regions according to both geographical area and cultivar. However, classification based on the screening (spectrophotometric) assay results required prior purification of the extracts and separate analysis of the fractions, which constitutes a critical limitation.
The HPLC-PDA-MS/MS methodology allowed the classification of CBSs according to cultivar and geographical origin through the identification of several chemical compounds recognized as cocoa key markers, such as epicatechin, theobromine, caffeine, N- value described for cocoa beans in the literature (T/C = 3). This study also proposes, for the first time, the establishment of the epicatechin/catechin ratio (E/C) as a criterion to discriminate CBSs (5 < E/C < 10) from cocoa beans (E/C < 2) and related products, such as chocolate (2 < E/C < 4). This parameter could be of interest for authentication purposes and to control or avoid the adulteration of cocoa powder or other related products by the addition of CBS. Finally, this study disclosed a high correlation between the spectrophotometric assays TFC and TTC and several individual chemical compounds, mainly flavonoids, determined by HPLC-PDA-MS/MS. In contrast, the correlation between single compounds and the antioxidant capacity (RSA) was limited to a low number of molecules, and their contribution as antioxidants remains unclear. Despite the several new insights provided by this study, our findings are incomplete and limited to CBSs from Venezuela. Thus, further and exhaustive investigations considering a large set of CBS samples from different origins and cultivars are necessary to sustain these preliminary findings.
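The proposed E/C criterion can be expressed as a simple decision rule. The sketch below uses only the intervals stated above (5 < E/C < 10 for CBS, E/C < 2 for cocoa beans, 2 < E/C < 4 for chocolate); the function name, the category labels, and the handling of ratios that fall outside these intervals are assumptions, and the input concentrations are hypothetical.

```python
def classify_by_ec_ratio(epicatechin, catechin):
    """Assign a material category from the epicatechin/catechin (E/C) ratio.

    Both concentrations must be in the same unit (e.g., mg/kg).
    """
    if catechin <= 0 or epicatechin < 0:
        raise ValueError("catechin must be positive and epicatechin non-negative")
    ratio = epicatechin / catechin
    if 5 < ratio < 10:
        return "cocoa bean shell"
    if ratio < 2:
        return "cocoa beans"
    if 2 < ratio < 4:
        return "chocolate"
    # Ratios on a boundary or in a gap are not covered by the stated intervals.
    return "undetermined"


# Hypothetical concentrations in mg/kg (not data from the study).
print(classify_by_ec_ratio(60.0, 10.0))  # E/C = 6.0
print(classify_by_ec_ratio(45.0, 10.0))  # E/C = 4.5, outside all stated intervals
```

Returning an explicit "undetermined" label keeps the rule from forcing a category onto samples whose ratio lies between the published intervals, which would matter for mixtures such as cocoa powder adulterated with CBS.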
Analogous to cocoa beans, the selective collection and discrimination of this byproduct might be of great interest for the food industry sector, not only for flavor properties, but also because it constitutes an attractive source of bioactive compounds with application as a functional ingredient. Therefore, the valorization, authentication, and selective recovery of CBS might contribute to the sustainability of the cocoa production industry within the circular economy framework.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/foods10081791/s1, Figure S1. Fermented and dried cocoa beans used to yield the cocoa bean shell (CBS) samples from different geographical areas in Venezuela, Figure S2. HPLC-PDA-MS/MS chromatograms of a representative CBS raw extract yielded from cocoa beans with origin in Merida and Criollo cultivar (Merida_2_C), and its respective fractions yielded by solid phase extraction (SPE), as follows: (a) CBS raw extract; (b) 1st fraction (F1); (c) 2nd fraction (F2); (d) 3rd fraction (F3); and finally, (e) 4th fraction (F4), recorded at 280 nm, Figure S3. Pairwise comparison of several chemical compounds quantified in CBSs by HPLC-PDA-MS/MS that could be used as potential markers among cocoa cultivars. FDR < 0.001, FDR < 0.01, and FDR < 0.05 are highlighted in green, yellow, and red, respectively, Figure S4. Pairwise comparison of several chemical compounds quantified in CBSs by HPLC-PDA-MS/MS, which could be used as potential markers among Venezuelan geographical regions. FDR < 0.05 are highlighted in red, Table S1. Analytical parameters of HPLC-PDA-MS/MS for the quantification and semi-quantification of polyphenols and methylxanthines in CBS, Table S2. Total amounts of methylxanthines and spectrophotometric assay results determined in fraction F1 yielded from CBS samples from Trinitario and Criollo cultivars and from different regions of Venezuela. Data are expressed as g/kg of CBS, Table S3. Total amounts of bioactive compounds (n = 10) and spectrophotometric assay results determined in fraction F2 yielded from CBS samples from Trinitario and Criollo cultivars and from different regions of Venezuela. Data are expressed as mg/kg of CBS, Table S4. Total amounts of bioactive compounds (n = 23) and spectrophotometric assay results determined in fraction F3 yielded from CBS samples from Trinitario and Criollo cultivars and from different regions of Venezuela. Data are expressed as mg/kg of CBS, Table S5. Total amounts of bioactive compounds (n = 19) and spectrophotometric assay results determined in fraction F4 yielded from CBS samples from Trinitario and Criollo cultivars and from different regions of Venezuela. Data are expressed as mg/kg of CBS.
Funding: The present work has been supported by COVALFOOD "Valorisation of high added-value compounds from cocoa industry by-products as food ingredients and additives". This project has received funding from the European Union's Seventh Framework programme for research and innovation under the Marie Skłodowska-Curie grant agreement No 609402-2020 researchers: Train to Move (T2M).