5-CQA and Mangiferin, Two Leaf Biomarkers of Adaptation to Full Sun or Shade Conditions in Coffea arabica L.

Phenolic compounds are involved in plant responses to environmental conditions and are abundant in the leaves of Coffea arabica L., originally an understory shrub. To improve knowledge of C. arabica leaf phenolic compounds and their patterns in adaptation to light intensity, mature leaves of Ethiopian wild accessions, American pure lines and their F1 hybrids were sampled in full-sun and 50% shade field plots at two contrasting elevations in Mexico and in Nicaragua and Colombia. Twenty-one phenolic compounds were identified by LC-DAD-MS2 and sixteen were quantified by HPLC-DAD. Four of them appeared to be involved in the response of C. arabica to light intensity. They consistently accumulated more in full sun, with a stable sun vs. shade ratio of leaf content across all the studied genotypes: 1.6 for 5-CQA, F-dihex and mangiferin, and 2.8 for rutin. Moreover, 5-CQA and mangiferin contents, in both full sun and shade, made it possible to distinguish Ethiopian wild accessions (higher contents) from cultivated American pure lines. They therefore appear to be potential biomarkers of adaptation of C. arabica to light intensity for breeding programs. We hypothesize that low 5-CQA and mangiferin leaf contents should be sought for adaptation to full-sun cropping systems and high contents for agroforestry systems.


Introduction
Arabica coffee (Coffea arabica L.) is a cash crop of major socio-economic importance throughout the intertropical zone. Wild C. arabica is an understory bush that originated in the southwest Ethiopian […] leaves, and leading to studies on leaf nutraceutical compounds [33,34]. However, studies on the role of leaf phenolic compounds in coffee adaptation to environmental constraints are rare, with the exception of those on CGAs in the cold response of tolerant genotypes [35] and on glycosylated flavonoids in the high-light tolerance of Catuai, an American pure line variety [30].
The aim of the present study was thus to increase current knowledge of C. arabica leaf phenolic compounds and their patterns in plant adaptation to light intensity.

Identification of Phenolic Compounds in Mature Leaves of C. arabica Using LC-MS2
To analyze as thoroughly as possible the phenolic compounds present in the mature leaves of C. arabica, including those accumulated at very low concentrations, an LC-DAD-MS2 analysis was conducted on leaf extracts from a C. arabica variety (cv. Marsellesa) grown in full sun. Only 23 substances reached the automatic pseudomolecular ion detection threshold that triggers the transfer of ionized substances to the collision cell for fragmentation pattern acquisition. All of them generated a fragmentation pattern after positive ionization, while only 21 yielded one under negative ionization (Table 1). The identity of 10 compounds was confirmed by parallel analysis of pure standards, while basic structural information is proposed for the others based on their high-resolution mass, fragmentation pattern and UV spectrum. […] The three compounds share the same UV spectrum, with a shoulder at 300 nm and a maximum at 325 nm, characteristic of mono-CQA isomers. Compound 5 was identified as 5-O-CQA by comparison with a standard, while compounds 2 and 6 were identified as 3-O- and 4-O-CQA, respectively, based on their retention time (RT) elution order and their specific MS2 spectra in negative mode, as described in the literature [36,37]. Using the same combined analyses, compounds 9 and 11 were identified as 5-O-coumaroylquinic acid (5-CoumQA) and 5-O-feruloylquinic acid (5-FQA), respectively. Compound 3 was characterized by m/z 409.1107 and 407.1012 in positive and negative mode, respectively, suggesting a molecular formula of C19H20O10. Its UV absorption spectrum (λmax at 296 nm) and complex MS2 fragmentation suggest that 3 is a benzophenone C-glycoside, tentatively identified as iriflophenone 3-C-β-glucoside [38]. By comparison with standards, peaks 4 and 8 were identified as (+)-catechin and (−)-epicatechin, respectively, and peak 10 as mangiferin (Mang).
Compound 23 showed a UV spectrum similar to that of Mang, indicating that 23 is a Mang derivative. Its m/z in positive and negative modes corresponded to Mang + C7H4O2. Compared to Mang, it displayed an additional fragment ion at m/z 385 in negative ionization mode, corresponding to a loss of [parahydroxybenzoate + H2O], and a major fragment ion at m/z 121 in positive mode, corresponding to a hydroxybenzoyl ion ([C7H5O2]+). This suggests that 23 is mangiferin parahydroxybenzoate (Mang-OHbenz). Peak 12 was characterized by pseudomolecular ions at m/z 595.1643 and m/z 593.1555 in positive and negative mode, respectively, leading to a molecular formula of C27H30O15. Its UV spectrum (λmax at 265 and 330 nm) resembles that of a di-hexosyl flavone [39], and its MS2 fragmentation in positive and negative modes suggests that 12 is a flavone-di-C-hexose (F-dihex). Peaks 13, 14, 16 and 18 shared the same UV spectra (λmax at 257 and 353 nm, with an additional shoulder at 300 nm for 13 and 14), indicative of 3-O-substituted flavonols [39]. Compounds 14, 16 and 18 were identified as quercetin-3,4′-di-O-glucoside (Q-diglu), rutin (quercetin-3-O-rutinoside) and quercetin-3-β-D-glucoside (Q-glu), respectively, by comparison with standards. Peak 13 showed the same UV spectrum as compound 14 but eluted earlier; the two substances therefore share the same UV-absorbing moiety. In addition, 13 displayed pseudomolecular ions at m/z 773.2129 and 771.2036 in positive and negative mode, respectively, a gain of 146 amu compared to 14, corresponding to the addition of a deoxyhexose. It was therefore annotated as quercetin-3-O-dihexose-deoxyhexose (Q-dihex-dhex).
Table 1. Result of the LC-DAD-MS2 analysis performed with a leaf methanolic extract of C. arabica cv. Marsellesa. RT, retention time in the analytical system (min); Mass, theoretical mass (Da); Wavelength, maximum absorbance in the UV range; sh, shoulder.
Experimental mass accuracy of less than 2 ppm. ND: not detected; *: metabolites identified by comparison with a pure standard compound.
Peaks 15 and 19 fragmented only in positive mode, because of their low abundance and/or the lower stability of their deprotonated forms. They shared the same UV spectrum (λmax at 265 and 346 nm) and an MS2 fragmentation with a base peak at m/z 287, indicative of 3-O-substituted kaempferol derivatives. Compound 15 showed fragment ions at m/z 612 (loss of deoxyhexose), 449 (loss of deoxyhexose + hexose) and 287 (loss of deoxyhexose + 2 hexoses) and was annotated as kaempferol-3-O-dihexose-deoxyhexose (K-dihex-dhex). Similarly, compound 19 was annotated as kaempferol-3-O-hexose-deoxyhexose (K-hex-dhex).
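The mass arithmetic behind several of the annotations above (compound 23 as Mang + C7H4O2, the neutral loss leading to m/z 385, and the +146 deoxyhexose shift of compound 13 relative to 14) can be reproduced with a small monoisotopic-mass calculator. The following is a minimal Python sketch, not the authors' data-processing software: it assumes standard monoisotopic atomic masses for C, H and O and singly charged [M+H]+ / [M−H]− ions, and the helper names (`monoisotopic_mass`, `ion_mz`, `combine`) are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes; only C, H and O are needed here
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.007276466812  # proton mass (Da)


def parse_formula(formula: str) -> dict:
    """Parse a simple formula such as 'C19H18O11' into element counts."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a formula."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def ion_mz(formula: str, mode: str) -> float:
    """m/z of the singly charged [M+H]+ ('pos') or [M-H]- ('neg') ion."""
    m = monoisotopic_mass(formula)
    if mode == "pos":
        return m + PROTON
    if mode == "neg":
        return m - PROTON
    raise ValueError("mode must be 'pos' or 'neg'")


def combine(*formulas: str) -> str:
    """Add formulas together (C and H first, then other elements alphabetically)."""
    total = {}
    for f in formulas:
        for el, n in parse_formula(f).items():
            total[el] = total.get(el, 0) + n
    order = ["C", "H"] + sorted(el for el in total if el not in ("C", "H"))
    return "".join(f"{el}{total[el]}" for el in order if el in total)


mang = "C19H18O11"                          # mangiferin
mang_ohbenz = combine(mang, "C7H4O2")       # proposed mangiferin parahydroxybenzoate
print(mang_ohbenz)                          # C26H22O13
print(round(ion_mz(mang, "neg"), 4))        # 421.0776
# Neutral loss of p-hydroxybenzoic acid (C7H6O3) plus water from [M-H]- of compound 23
loss = monoisotopic_mass("C7H6O3") + monoisotopic_mass("H2O")
print(round(ion_mz(mang_ohbenz, "neg") - loss, 2))  # 385.06
# A deoxyhexose residue (C6H10O4) accounts for the +146 shift of compound 13 vs. 14
print(round(monoisotopic_mass("C6H10O4"), 4))        # 146.0579
```

The fragment at m/z 385 is thus consistent with [M−H − C7H6O3 − H2O]−, i.e., the loss of the parahydroxybenzoate unit plus one water molecule.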
Among CGAs, only 5-CQA, 3,5-diCQA and 4,5-diCQA showed significant differences in leaf content (p < 0.05) depending on light intensity (Table 2). Like all other CGAs except FQA, they also differed significantly in leaf content depending on elevation, and significant interactions were observed between the two factors, light and elevation. 5-CQA was by far the most discriminating CGA, not only for light conditions but also for elevation, suggesting that other abiotic factors linked to elevation also significantly influence 5-CQA leaf content. Indeed, a significant light effect on 5-CQA content was observed at low elevation, where concentrations were 40% higher in full sun than in the shade (Figure 2).
With the exception of catechin, the leaf concentration of each flavonoid varied significantly (p < 0.05) depending on light conditions. The concentrations of all eight quantified flavonoids also differed significantly depending on elevation, with no significant interaction between light and elevation except for the three glycosylated quercetins (Table 2).
Among the flavanols, a significant effect of light was observed only on epicatechin concentrations, with a low F value of 6.0 (Table 2), and a significant difference between leaves grown in the sun and in the shade was seen only at high elevation, where concentrations were highest in the shade (Table S1). Leaf catechin content always differed significantly between plants grown in the sun and in the shade at both elevations, but in opposite directions: full-sun leaf catechin content was highest at low elevation and lowest at high elevation (Table S1; Figure 2).
Of the two glycosylated kaempferols quantified, K-hex-dhex showed by far the most light-dependent leaf contents (Table 2). Its leaf contents were significantly higher in full sun than in the shade at both elevations: 10-fold higher at low elevation and 3-fold higher at high elevation (Figure 2).
Table 2. Factorial ANOVA of the concentrations of phenolic compounds in the leaves of C. arabica cv. Marsellesa according to elevation (650 m asl vs. 1250 m asl) and light conditions (full sun vs. 50% shade). For each chemical family, the compound(s) with the most significant difference in leaf contents for one or both environmental factors and the corresponding F score are underlined.
Like K-hex-dhex, rutin leaf contents were much higher in full sun than in the shade at both elevations: 4-fold higher at low elevation and 2.3-fold higher at high elevation (Figure 2). Light and elevation both had major significant effects on leaf rutin content, but without any significant interaction between the two abiotic factors.
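Table 2 summarizes a factorial (light × elevation) ANOVA. To show how such F scores arise, the sketch below computes a two-way fixed-effects ANOVA for a balanced design in plain Python. It is an illustration under stated assumptions, not the authors' statistical pipeline: it assumes the same number of replicates in every light × elevation cell, and the example values are synthetic, not data from this study.

```python
from itertools import product
from statistics import mean


def two_way_anova(data):
    """Two-way fixed-effects ANOVA for a balanced factorial design.

    data maps (level_of_A, level_of_B) -> list of replicate values, with the
    same number of replicates in every cell. Here A = light, B = elevation.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    if any(len(data[(i, j)]) != n for i, j in product(levels_a, levels_b)):
        raise ValueError("design must be balanced (equal replicates per cell)")
    a, b = len(levels_a), len(levels_b)

    grand = mean(y for v in data.values() for y in v)
    cell = {k: mean(v) for k, v in data.items()}
    # In a balanced design, marginal means equal the means of the cell means
    mean_a = {i: mean(cell[(i, j)] for j in levels_b) for i in levels_a}
    mean_b = {j: mean(cell[(i, j)] for i in levels_a) for j in levels_b}

    ss = {
        "A": b * n * sum((mean_a[i] - grand) ** 2 for i in levels_a),
        "B": a * n * sum((mean_b[j] - grand) ** 2 for j in levels_b),
        "AxB": n * sum((cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
                       for i, j in product(levels_a, levels_b)),
    }
    ss_error = sum((y - cell[k]) ** 2 for k, v in data.items() for y in v)
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1)}
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    # F = effect mean square / error mean square
    table = {name: {"SS": ss[name], "df": df[name],
                    "F": (ss[name] / df[name]) / ms_error} for name in ss}
    table["error"] = {"SS": ss_error, "df": df_error}
    return table


# Synthetic replicate values (NOT data from this study): (light, elevation) -> values
demo = {
    ("sun", "low"): [3, 4, 5], ("sun", "high"): [3, 4, 5],
    ("shade", "low"): [1, 2, 3], ("shade", "high"): [1, 2, 3],
}
res = two_way_anova(demo)
print(res["A"]["F"], res["B"]["F"], res["AxB"]["F"])  # 12.0 0.0 0.0
```

In this toy design only light shifts the means, so only the light F score is non-zero. A p-value can then be read from the F distribution, for example with `scipy.stats.f.sf(F, df_effect, df_error)`.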

At low elevation, F-dihex leaf content was 1.81-fold higher in full sun than in the shade, while no significant difference was observed at high elevation (Table S1; Figure 2). Among the glycosylated flavonoids, this compound therefore discriminated light conditions less well than K-hex-dhex and rutin. However, the concentrations of all three glycosylated quercetins varied strongly with light conditions, rutin content being by far the most variable (Table 2).
Concentrations of mangiferin, the only xanthone derivative quantified in this study, differed significantly depending on both light conditions and elevation, with no significant interaction between the two (Table 2). At both elevations, concentrations were significantly higher in the sun than in the shade: 1.45-fold higher at low elevation and 1.30-fold higher at high elevation (Figure 2; Table S1).
A heatmap was drawn from the correlation coefficients among the leaf contents of these 16 phenolic compounds (Figure 3). A strong correlation was found between the three glycosylated quercetins: Q-dihex-dhex, Q-diGlu and rutin. These clustered apart from all other phenolic compounds, including 5-CQA and the three other glycosylated flavonols. The other large cluster was subdivided into compounds whose contents were negatively correlated with those of the quercetin glycoside/rutin cluster and compounds whose contents showed no correlation with it. Within this large cluster, the contents of the flavanols epicatechin and catechin were highly correlated (R = 0.80; p < 0.05), as were those of 5-CQA and 4,5-diCQA (R = 0.87) and of the two glycosylated kaempferols (R = 0.85), which sub-clustered with mangiferin. Taken together, the ANOVA and correlation analysis of the leaf phenolic contents of cv. Marsellesa led us to retain six major compounds with significant differences according to light conditions at one or both elevations: 5-CQA, catechin, K-hex-dhex, rutin, F-dihex and mangiferin.
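A correlation heatmap of this kind groups compounds by the similarity of their correlation profiles. The NumPy sketch below shows one common way of doing so: Pearson correlations between compounds, converted to the distance 1 − r, followed by average-linkage agglomerative clustering. The distance and linkage used for Figure 3 are not stated in this section, so both are assumptions here; the function name is hypothetical and the data are synthetic.

```python
import numpy as np


def correlation_clusters(X, names, n_clusters):
    """Average-linkage clustering of compounds (columns of X) on the distance 1 - r.

    X is a samples x compounds matrix of leaf contents; returns the Pearson
    correlation matrix and the compound names grouped into n_clusters clusters.
    """
    r = np.corrcoef(X, rowvar=False)   # Pearson r between columns (compounds)
    dist = 1.0 - r                     # 0 = identical profiles, 2 = perfectly anti-correlated
    clusters = [[i] for i in range(X.shape[1])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the two closest clusters
        del clusters[b]                           # b > a, so index a is unaffected
    return r, [sorted(names[i] for i in c) for c in clusters]


# Synthetic example: compounds a1-a3 follow one latent factor, b1-b2 another
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 40))
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=40) for _ in range(3)]
    + [f2 + 0.1 * rng.normal(size=40) for _ in range(2)]
)
names = ["a1", "a2", "a3", "b1", "b2"]
r, groups = correlation_clusters(X, names, n_clusters=2)
print(groups)
```

Using 1 − r rather than 1 − |r| keeps negatively correlated compounds far apart, which matches the way the text separates compounds that are anti-correlated with the quercetin glycoside cluster from those that are merely uncorrelated.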

Influence of Light Intensity on the Leaf Phenolic Content of Numerous C. arabica Genotypes Grown in Different Environments
The same 16 phenolic compounds analyzed in the mature leaves of cv. Marsellesa in the Mexican field trials were identified in all C. arabica genotype samples (7 American pure lines, 8 Ethiopian wild accessions and 19 of their F1 hybrid clones) from the trials in both Nicaragua and Colombia (data not shown). Correlation and linear regression analyses were conducted on the concentrations (Table S2) of the six previously selected phenolic compounds discriminating full-sun and 50% shade conditions. These analyses covered all the genotypes studied in the Nicaragua and Colombia trials as well as cv. Marsellesa grown in the two Mexican field trials (Figure 4).
Except for catechin and K-hex-dhex, leaf phenolic contents in the sun and in the shade were significantly correlated (r² > 0.5; p < 0.05). The r² was particularly high for mangiferin (r² = 0.812), F-dihex (r² = 0.747) and 5-CQA (r² = 0.738), and to a lesser extent for rutin (r² = 0.634). These high sun vs. shade correlations for the four compounds indicated that, whatever the environmental conditions linked to the geographical location of the trial, there was a strong and stable relationship between the leaf contents measured in full sun and in the shade (50% light exclusion in all the trials).
However, marked variability in leaf content was observed among genotypes. In the shade, 5-CQA and mangiferin leaf contents ranged from 0.41 to 2.86 mg·100 mg⁻¹ DW and from 0.17 to 1.62 mg·100 mg⁻¹ DW, respectively (Table S2). For 5-CQA, mangiferin and F-dihex, the slopes of the regression lines were similar: 0.64, 0.62 and 0.62, respectively. For rutin, the ratio of leaf content in the sun to that in the shade was much higher and the slope of the regression line was much lower: 0.36 (Figure 4).
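The link between these regression slopes and the sun vs. shade ratios quoted in the abstract (1.6 and 2.8) is simple arithmetic if, as assumed here, shade content is regressed on full-sun content and the intercept is close to zero: shade ≈ slope × sun, so sun/shade ≈ 1/slope. The sketch below fits an ordinary least-squares line with its r² and converts the reported slopes into ratios. The axis orientation and the near-zero intercept are assumptions, not details given in this section, and `fit_line` is a hypothetical helper.

```python
def fit_line(x, y):
    """Ordinary least squares y = slope * x + intercept; also returns r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Synthetic check (not study data): shade contents exactly 0.62 x full-sun contents
sun = [1.0, 1.5, 2.0, 3.0]
shade = [0.62 * s for s in sun]
slope, intercept, r2 = fit_line(sun, shade)

# Slopes reported in the text (shade content regressed on full-sun content)
slopes = {"5-CQA": 0.64, "mangiferin": 0.62, "F-dihex": 0.62, "rutin": 0.36}
ratios = {name: round(1 / s, 1) for name, s in slopes.items()}
print(ratios)  # {'5-CQA': 1.6, 'mangiferin': 1.6, 'F-dihex': 1.6, 'rutin': 2.8}
```

If the fitted intercepts are far from zero, 1/slope is only an approximation of the mean sun/shade ratio, and the ratio of the two group means would be the safer summary.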

Influence of C. arabica Genetic Groups on Leaf Phenolic Contents in Two Different Environments
Next, we compared the leaf contents of the four compounds (5-CQA, mangiferin, F-dihex and rutin) whose sun and shade contents were significantly correlated, using the averages calculated for each C. arabica genetic group (Ethiopian wild accessions, American pure lines and F1 hybrid clones) in Nicaragua and Colombia, in full sun and in the shade. As shown in Figure 5, whatever the compound and the location, the highest contents were observed in full sun. The difference between rutin contents in the sun and in the shade was particularly clear.
The concentration of 5-CQA in the leaves of the Ethiopian wild accessions was the highest under full sun regardless of the trial location, indicating that genotype dependence prevailed over location dependence. In the shade, 5-CQA concentrations in the leaves of the Ethiopian wild accessions in Nicaragua were lower than those of all genetic groups in Colombia. Shade leaf contents were thus distributed according to location, while within each location the ranking of genetic groups (Ethiopian wild accessions > F1 hybrid clones > American pure lines) was conserved.
The location dependence of leaf mangiferin and rutin contents was clear, the samples from Colombia being separated from those from Nicaragua. The ranking of genetic groups for mangiferin concentration (Ethiopian wild accessions > F1 hybrid clones > American pure lines) was conserved at the two locations regardless of the light condition, and the highest values were observed in Colombia. The same distribution was observed for rutin content, except in the shade in Nicaragua, where the ranking of genetic groups was reversed and the American pure lines had the highest rutin content (American pure lines > F1 hybrid clones > Ethiopian wild accessions).
Under each light condition, only slight differences were observed between the F-dihex concentrations of all genetic groups in Nicaragua and that of the Ethiopian wild accession group in Colombia (Figure 5). It was thus impossible to identify a clear relationship between F-dihex content and genetic group or location.
Interestingly, concentrations of 5-CQA, mangiferin and rutin in F1 hybrid clones grown in the sun were intermediate between those of their Ethiopian wild accession fathers and their American pure line mothers, but closer to the latter. The same was observed for mangiferin and rutin concentrations in coffee trees grown in the shade.
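One simple way to express "intermediate, but closer to the mother" is the relative position of the F1 value on the axis running from the maternal (American pure line) value to the paternal (Ethiopian wild accession) value. The sketch below uses an illustrative index of our own choosing, not a metric reported in this study, and the example numbers are hypothetical.

```python
def relative_position(f1, mother, father):
    """Position of an F1 value between its parents.

    0 = maternal value, 1 = paternal value, 0.5 = mid-parent value;
    values outside [0, 1] fall outside the parental range.
    """
    if mother == father:
        raise ValueError("parental values are identical")
    return (f1 - mother) / (father - mother)


def describe(pos):
    """Verbal reading of the relative position index."""
    if pos < 0 or pos > 1:
        return "outside parental range"
    if pos < 0.5:
        return "intermediate, closer to mother"
    if pos > 0.5:
        return "intermediate, closer to father"
    return "mid-parent"


# Hypothetical leaf contents (mg per 100 mg DW), not data from this study
pos = relative_position(f1=1.3, mother=1.0, father=2.0)
print(round(pos, 2), describe(pos))  # 0.3 intermediate, closer to mother
```

Because the index is normalized by the parental difference, it can be compared across compounds with very different absolute contents, such as 5-CQA and rutin.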

LC-MS2 Analysis Enabled Identification of Two Novel Mangiferin Derivatives in C. arabica
LC-DAD-MS2 was used for an exhaustive characterization of the alkaloids and phenolic compounds present in mature leaves of C. arabica. Two metabolites, a benzophenone (an iriflophenone-C-glucoside) and a xanthone derivative (mangiferin parahydroxybenzoate), were tentatively identified in coffee plants for the first time. Their presence in C. arabica leaves, along with confirmation of the previously described presence of mangiferin [31], points to the existence of a mangiferin biosynthetic pathway via benzophenone synthase, as already reported in Hypericum and Anemarrhena [41,42] and reviewed by Joubert et al. [43].
The nature of the main accumulated flavanols was also clarified. They consist of catechin, epicatechin and a procyanidin. In contrast to the work of Ratanamarno and Surbkar [44], we detected no gallocatechin derivatives. Lastly, LC-MS 2 analysis enabled annotation of some glycosylated derivatives of quercetin and kaempferol previously observed by Martins et al. [30]. These authors analyzed the phenolic content of the "Catuai Vermelho IAC 44" variety, an American pure line of C. arabica. Interestingly, the same 21 metabolites were identified in the mature leaves of the 34 genotypes analyzed in the present study, whether they were American pure lines, Ethiopian wild accessions or F1 hybrid clones, and irrespective of the light condition, elevation and latitude. However, some accumulated at concentrations that were too low to be quantified on HPLC-DAD chromatograms and only 16 could be used to compare the coffee leaf samples. In a quest for biochemical markers for use in cultivar development programs, it is preferable to target compounds that are relatively abundant and easy to quantify with simple instrumentation, ideally usable in field experiments.

Identification of Leaf Phenolic Contents as Biomarkers of Adaptation of C. arabica to Full-Sun or Shade Conditions
In agreement with the widely recognized role of phenolic compounds in the adaptive response of plants, particularly to abiotic conditions [45][46][47], we identified phenolics whose concentrations in C. arabica leaves changed with environmental conditions, especially with light intensity, i.e., in our study, full sun vs. 50% shade.

Preselection of Candidate Biomarkers of Adaptation of C. arabica to Full-Sun Conditions
Among the 16 phenolic compounds quantified by HPLC-DAD in a single American pure line (C. arabica cv. Marsellesa) grown in two field trials at contrasting elevations in Mexico, we identified six compounds that could be used as markers of adaptation to full sun: 5-CQA, catechin, K-hex-dhex, F-dihex, rutin and mangiferin. These compounds were retained because their leaf contents were high and/or showed the most significant differences between sun and shade at one or both elevations. Indeed, with the exception of K-hex-dhex, these compounds were present at medium to high concentrations in the leaves, at least in full sunlight. K-hex-dhex was retained because of its highly significant differences in leaf content between sun and shade at both elevations. Additionally, the leaf contents of the six compounds were significantly correlated with those of other compounds of the same chemical subgroup, which reinforces these results and indicates that they are good representatives of the light-stress response of their chemical subgroups.

Selection of Biomarkers for Adaptation of C. arabica to Full Sunlight Using Genetic Diversity and Contrasting Environments
The six preselected candidate biomarkers for the adaptation of C. arabica to full sun were evaluated in seven other American pure lines, eight Ethiopian wild accessions and 19 of their F1 hybrids, in two contrasting geographical trials in Colombia and Nicaragua.
Combining the data on the concentration of each of the six preselected phenolics in the leaves of all 34 genotypes studied in the four field trials in Mexico, Colombia and Nicaragua, we finally retained four, 5-CQA, mangiferin, F-dihex and rutin, because the ratio of their concentration in the sun to that in the shade remained stable across all the genetic material and all four contrasting environments, as measured by r² (0.74, 0.81, 0.75 and 0.63, respectively). All the regression slopes were below 1, indicating that, as in cv. Marsellesa at two elevations in Mexico, the concentrations of these compounds in the leaves of the 34 other genotypes were always higher in the sun than in the shade. The strong linear correlation between sun and shade leaf concentrations of these four phenolics suggests that C. arabica germplasm could be phenotyped under only one light condition (full sun or 50% shade) while still predicting its degree of adaptation to full sunlight or to shade mimicking agroforestry systems. It also suggests that these four candidate biomarkers of adaptation could be used in very different environments and with contrasting genotypes such as Ethiopian wild accessions and American pure lines.

Influence of C. arabica Genotype on the Phenolic Leaf Contents and Selection of Biomarkers of Adaptation to Full Sunlight or Shade in Breeding Programs
Comparing the influence of genotype on the four selected leaf phenolic contents in the two field trials in Colombia and Nicaragua showed that 5-CQA and mangiferin are the best candidate biomarkers of adaptation to full sunlight in C. arabica, because both compounds globally reflected the genetic structure at the two locations and made it possible to differentiate Ethiopian wild accessions from American pure lines at the same location. Moreover, for 5-CQA, the influence of genotype even prevailed over that of location, making it possible to differentiate Ethiopian wild accessions from American pure lines regardless of location. Compared with the American pure lines, all Ethiopian wild accessions consistently showed higher leaf concentrations of 5-CQA in the sun, regardless of location, and of mangiferin within each location. This may be linked to the origin of the Ethiopian wild accessions, i.e., understory bushes in mesophilous forests [1], and to that of the American pure lines, derived from the C. arabica "Yemen-Harare" group domesticated for cultivation in full sunlight [3][4][5][6][7][8][9].
The higher concentrations of both 5-CQA and mangiferin in leaves in full sun than in the shade observed in this study are consistent with observations made in C. arabica cv. Catuai Vermelho IAC 44 in Brazil, where both compounds accumulated two-fold more in full-sun leaves than in shaded leaves [30]. Very little information is available on the likely role of 5-CQA in plant responses to light stress, and mangiferin has been studied more widely in humans than in plants, for its multipotent anti-inflammatory, anti-lipid-peroxidation and antibacterial activities [48][49][50]. However, since this compound is considered a strong antioxidant, higher mangiferin contents in leaves exposed to full sunlight point to a likely role in protecting leaves from oxidative damage linked to more intense photosynthesis. A protective role of mangiferin against UV radiation has also been suggested in Aphloia theiformis [51]. A previous study on wild coffee species indicated higher accumulation of mangiferin in leaves of species originating from high altitudes, where UV irradiance is higher [31]. Interestingly, when we compared the average concentrations of 5-CQA and mangiferin in each genetic group (Ethiopian wild accessions, American pure lines and F1 hybrid clones) in the two trials, Colombia values were higher than Nicaragua values. This is consistent with the higher global horizontal irradiance recorded during the trial in Colombia compared to the trial in Nicaragua (Table 3) and strengthens the case for these candidate biomarkers. However, the influence of other factors cannot be ruled out, e.g., UV-B irradiance, which increases toward the Equator, or temperature, which in our case was higher in Colombia.
An interesting result concerning 5-CQA was that, in the two field trials in Mexico, we found a significant correlation between the concentration of 5-CQA in leaves and the concentrations of all the other phenolic compounds analyzed, except quercetin derivatives. This suggests that 5-CQA plays a key role in the response of phenolic metabolism to light stress, as suggested by Grace and Logan [17]. Phenylpropanoid biosynthesis may offer an alternative pathway for photochemical energy dissipation. In this energy overflow mechanism, which transforms photosynthesis products into stable compounds, 5-CQA may be the main sink.
Another important observation is that, in both the Nicaragua and Colombia trials, the F1 hybrid clones had leaf concentrations of 5-CQA and mangiferin, in both shaded and unshaded plants, very similar to those measured in their American pure line mothers, suggesting a significant maternal effect in adaptation to light growing conditions. This should be taken into account in future breeding programs.
The concentrations of 5-CQA and mangiferin in leaves in full sun and in the shade are among the highest observed for the phenolic compounds, irrespective of the light conditions and the field trial, particularly the concentration of 5-CQA. Thus, their detection and quantification are quite easy, accurate and reproducible.
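Quantification by HPLC-DAD typically relies on an external calibration curve built from a pure standard. The extraction volume, sample dry mass and calibration range are not given in this section, so the sketch below uses hypothetical values throughout; it only illustrates how a peak area can be converted into the mg·100 mg⁻¹ DW units used in Table S2, and the helper names are hypothetical.

```python
def calibration_line(conc, area):
    """Least-squares standard curve: area = slope * conc + intercept."""
    n = len(conc)
    mc, ma = sum(conc) / n, sum(area) / n
    slope = sum((c - mc) * (a - ma) for c, a in zip(conc, area)) / sum(
        (c - mc) ** 2 for c in conc
    )
    return slope, ma - slope * mc


def content_mg_per_100mg_dw(peak_area, slope, intercept, extract_volume_ml, dry_mass_mg):
    """Convert a sample peak area into mg of compound per 100 mg dry weight."""
    conc_mg_per_ml = (peak_area - intercept) / slope    # concentration in the extract
    mg_in_extract = conc_mg_per_ml * extract_volume_ml  # total amount extracted
    return 100.0 * mg_in_extract / dry_mass_mg


# Hypothetical standard series (mg/mL) and peak areas (arbitrary units)
std_conc = [0.05, 0.1, 0.2, 0.4]
std_area = [1000 * c for c in std_conc]
slope, intercept = calibration_line(std_conc, std_area)
# Hypothetical sample: peak area 150, 10 mL extract from 100 mg of dry leaf powder
print(round(content_mg_per_100mg_dw(150.0, slope, intercept, 10.0, 100.0), 3))  # 1.5
```

The abundance of 5-CQA and mangiferin matters here because their sample peak areas fall well inside a typical calibration range, where the linear standard curve is most reliable.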
5-CQA is the most common chlorogenic acid in plants. As an ester of caffeic acid and quinic acid, it carries a phenolic ring derived from the hydroxycinnamic acid moiety and has a specific absorption spectrum. Near-infrared (NIR) spectroscopy has been successfully used to characterize coffee beans and to study environmental effects on coffee bean quality [52,53]. An NIR calibration for CQA was developed using partial least squares (PLS) regression against reference CQA concentrations [54], so the NIR technique enables rapid evaluation of CQA content in coffee beans. More recently, coffee leaves were studied using Fourier transform near-infrared spectroscopy (FT-NIR) combined with a statistical classification method (SIMCA) [55]. Mangiferin also possesses a phenolic ring and a specific absorption spectrum. NIR-based evaluation of CQA and mangiferin could therefore be considered for high-throughput phenotyping of C. arabica germplasm. This approach would help identify candidate progenitors well adapted to full-sun conditions, i.e., likely those with low leaf 5-CQA and mangiferin contents, or to shaded agroforestry conditions, i.e., those with high leaf 5-CQA and mangiferin contents. It would also facilitate high-throughput phenotyping of large F1 offspring populations and the creation of new F1 hybrids.
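To make the PLS calibration idea concrete, the sketch below implements single-response PLS regression (PLS1) with the classical NIPALS algorithm in NumPy and fits it to synthetic "spectra" built from two absorbing constituents, one standing in for CQA. It is not the calibration of [54] or the SIMCA model of [55]; the spectra, the number of components and the noise level are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS regression (PLS1) via NIPALS; returns coef and means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred residual matrices
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)              # weight vector (unit length)
        t = Xr @ w                             # sample scores
        tt = t @ t
        p = Xr.T @ t / tt                      # X loadings
        q = (yr @ t) / tt                      # y loading
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - q * t                        # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)     # regression vector in spectral space
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response (e.g., CQA content) from new spectra."""
    return (X - x_mean) @ coef + y_mean


# Synthetic data: 30 samples x 60 wavelengths, two absorbing constituents
rng = np.random.default_rng(1)
conc = rng.uniform(0.5, 3.0, size=(30, 2))     # column 0 plays the role of CQA
pure_spectra = np.abs(rng.normal(size=(2, 60)))
X = conc @ pure_spectra + 0.001 * rng.normal(size=(30, 60))
y = conc[:, 0]

coef, x_mean, y_mean = pls1_fit(X, y, n_components=2)
y_hat = pls1_predict(X, coef, x_mean, y_mean)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"calibration r2 = {r2:.4f}")
```

With two latent constituents, two PLS components are enough to recover the CQA-like signal despite the overlapping interferent. Real leaf NIR spectra would need preprocessing, more components chosen by cross-validation, and validation on independent samples.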
Like 5-CQA and mangiferin, F-dihex was more concentrated in full sun than in the shade in all the genotypes studied. However, although F-dihex concentrations in sun-exposed and shaded leaves were well correlated across the different genotypes and the four field trials, its leaf contents did not reflect the genetic structure (Ethiopian wild accessions vs. American pure lines) when the genetic groups were compared. F-dihex therefore cannot be used as a stable biomarker of C. arabica adaptation to full sunlight.
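The two properties discussed here, a correlation between sun and shade contents across genotypes and a stable sun/shade ratio, can be computed as in the sketch below. The four genotype values are hypothetical, chosen only so that the ratio lands near the value of 1.6 reported for 5-CQA, F-dihex and mangiferin; they are not data from this study.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_line(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


shade = [10.0, 20.0, 30.0, 40.0]   # hypothetical leaf contents under 50% shade
sun = [16.0, 33.0, 47.0, 64.0]     # hypothetical contents of the same genotypes in full sun

r = pearson_r(shade, sun)
slope, intercept = ols_line(shade, sun)
ratios = [s / h for s, h in zip(sun, shade)]   # per-genotype sun/shade ratio
mean_ratio = fmean(ratios)
print(f"r = {r:.4f}, slope = {slope:.2f}, intercept = {intercept:.2f}, mean ratio = {mean_ratio:.3f}")
# prints: r = 0.9993, slope = 1.58, intercept = 0.50, mean ratio = 1.604
```

A high r, a slope close to the mean ratio and a near-zero intercept are what a genotype-independent sun/shade ratio looks like. As the text notes, this alone says nothing about whether a compound separates genetic groups.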
Rutin leaf contents were considerably higher in plants grown in full sun than in plants grown under shade, in all the genotypes studied and at all four study locations. This agrees with previous studies showing that quercetin derivatives [56] and, more generally, flavonols [57] are involved in plant responses to light, particularly to UV-B radiation. It could explain why rutin concentrations in the leaves of C. arabica cv. Marsellesa in Mexico, in both sun and shade, were much higher at the high elevation (1250 m asl) than at the low elevation (650 m asl): global horizontal irradiance was very similar at the two trial sites, but UV-B radiation increases with elevation. The marked variations in rutin content in the same variety grown at two contrasting elevations point to a strong dependence of its biosynthesis on environmental conditions. Moreover, when rutin contents were compared between genetic groups, the relationship between concentrations in leaves grown in the sun and in the shade did not reflect the genetic structure of the genotypes studied, i.e., Ethiopian wild accessions vs. American pure lines. Rutin therefore appears to be of no interest as a selection biomarker for adaptation to growth in full sun, but it could be useful to assess light stress, particularly UV stress, in a given coffee genotype at a given time and under different environmental conditions. This is consistent with the stimulation of quercetin derivative biosynthesis by light intensity observed in coffee leaves under natural [30] or controlled conditions [58], and with the response of numerous plants to UV-B irradiance [56]. Recent studies have indicated that quercetin derivatives, and more specifically ortho-dihydroxylated B-ring flavonoids, play an important role in photoprotection as ROS-detoxifying agents and, to a lesser extent, as UV screens [59][60][61].

Locations and Plant Material
This study was conducted at four locations in three countries (two in Mexico, one in Nicaragua and one in Colombia) between latitudes 4° N and 19° N, covering most of the Latin American C. arabica growing area in the Northern Hemisphere (Figure 6). Table 3 lists the location and main climatic and agronomic factors at each site. Each field trial consisted of two adjacent twin plots, each including all the genotypes studied: one in full sun and the other under a black polyethylene shading net that excluded 50% of the light. When the leaf samples were collected, the coffee trees were three years old in Mexico and Colombia. In Nicaragua, the trees had regrown for three years after coppicing of the original 24-year-old trees. In all four field trials, the coffee trees were considered to be adults in full fruit production. In Mexico, the study was carried out in two similar field trials at two contrasting elevations (650 m vs. 1250 m asl), using elevation as a proxy for temperature to mimic the global warming associated with climate change. In the two experimental plots in Mexico, a single American pure line of C. arabica was studied: cv. Marsellesa, belonging to the Sarchimor group (a cross between the Timor hybrid CIFC 832/2, a natural interspecific hybrid of C. arabica and C. canephora, and the Costa Rican compact line Villa Sarchi). The 22 genotypes studied in Nicaragua and the 11 studied in Colombia comprised C. arabica American pure lines, Ethiopian wild accessions of C. arabica, and F1 hybrid clones propagated by somatic embryogenesis. The F1 hybrids were obtained by crossing American pure lines (mothers) with Ethiopian wild accessions (fathers), except for two F1 hybrid clones in the Nicaraguan trial that were crosses between two American pure lines. Table 4 lists the details of all the plant material studied in the three countries.

Leaf Sampling for Secondary Metabolite Analysis
For each growth condition (full sun or shade), ten mature leaves were collected from the same tree between 5 and 6 h after sunrise, at the third node from the branch apex of the 6th to 8th pairs of plagiotropic branches (counting from the orthotropic apex). For plants grown in full sun, care was taken to collect only leaves that were fully exposed to the sun (avoiding shaded leaves) and faced east. Leaves from the same tree and growth condition were packed together, immediately plunged into liquid nitrogen and stored at −80 °C until freeze-drying. For each growth condition, 9 to 10 trees were sampled in June 2015 in Mexico, three to five trees in April 2019 in Nicaragua, and two to six trees in June 2014 in Colombia (where two American pure lines, three Ethiopian wild accessions and six F1 hybrid clones were studied).

Secondary Metabolite Analysis
The freeze-dried leaf samples were vacuum packed and sent to the IRD laboratory in Montpellier, France, where they were ground into a fine powder in an A10 IKA electric grinder (IKA®-Werke GmbH & Co. KG, Staufen, Germany) and stored until extraction. Each sample (25 mg of powder) was extracted in 6 mL of MeOH/H2O (80:20, v/v) supplemented with 10 µL of 5-methoxyflavone (4 mM) as an internal standard, by stirring in a Rotamax 120 (Heidolph Instruments GmbH & Co. KG, Schwabach, Germany) at 250 rpm and 4 °C for 3 h. After centrifugation at 3500 rpm and 8 °C for 10 min, the extract was collected and filtered (Millipore, 0.25 µm porosity) before analysis. Each extraction was carried out in triplicate.
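Because a fixed amount of internal standard goes into every extract, peak areas can be normalized to it before conversion to a leaf content. The sketch below shows one common internal-standard calculation, assuming a relative response factor (RRF) obtained from pure standard solutions; it is not necessarily the exact procedure of Campa et al. [58], and the RRF and peak areas are hypothetical. The amount of 5-methoxyflavone follows from the text: 10 µL × 4 mM = 40 nmol per extract, and the result is expressed per gram of the 25 mg dry sample.

```python
from statistics import fmean

# Extraction parameters taken from the text above
IS_VOLUME_L = 10e-6          # 10 µL of 5-methoxyflavone solution added per extract
IS_CONC_MOL_PER_L = 4e-3     # 4 mM
SAMPLE_MASS_G = 25e-3        # 25 mg freeze-dried leaf powder
IS_AMOUNT_MOL = IS_VOLUME_L * IS_CONC_MOL_PER_L  # 4e-8 mol (40 nmol) per extract


def content_mg_per_g(area_analyte, area_is, rrf, molar_mass_g_per_mol):
    """Leaf content (mg per g dry weight) from one chromatogram.

    rrf is a hypothetical relative response factor,
    (area_analyte / mol_analyte) / (area_is / mol_is),
    that would be determined from pure standard solutions.
    """
    mol_analyte = (area_analyte / area_is) * IS_AMOUNT_MOL / rrf
    grams_analyte = mol_analyte * molar_mass_g_per_mol
    return grams_analyte / SAMPLE_MASS_G * 1000.0   # g/g -> mg/g


CQA_MOLAR_MASS = 354.31  # g/mol, 5-O-caffeoylquinic acid (C16H18O9)

# Hypothetical analyte/IS peak-area ratios for the triplicate extractions of one sample
triplicate = [content_mg_per_g(ratio, 1.0, rrf=1.0, molar_mass_g_per_mol=CQA_MOLAR_MASS)
              for ratio in (9.8, 10.0, 10.2)]
print(f"{fmean(triplicate):.3f} mg/g DW")
# prints: 5.669 mg/g DW
```

Normalizing to the internal standard cancels losses and injection-volume differences that affect analyte and standard alike, which is why the triplicate extractions can be averaged directly.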
Quantitative analyses were carried out on a Shimadzu LC 20 HPLC-DAD system (Shimadzu Corporation, Kyoto, Japan) as described by Campa et al. [58]. Parallel analyses were performed in triplicate on pure standard solutions of trigonelline, theobromine, caffeine, mangiferin, 5-CQA and caffeic acid purchased from Sigma-Aldrich Chimie.

To confirm the standard-based identifications and to obtain basic structural information on the other compounds, LC-DAD-MS2 analyses were performed on an Agilent Infinity® 1290 system (Agilent Technologies, Santa Clara, USA) with a UV/vis DAD detector and a QTOF 6530 mass detector (Agilent), controlled by MassHunter® software (Agilent). Analytical separation was carried out on a Poroshell® 120 EC-C18 column (100 mm × 3.0 mm, 2.7 µm) fitted with a guard column (Poroshell® 120 EC-C18, 5 mm × 3.0 mm, 2.7 µm). The mobile phase was a gradient of 0.4% formic acid in water (A) and acetonitrile (B): 0 min, 1% B; 1.5 min, 1% B; 6 min, 10% B; 12 min, 35% B; 14 min, 100% B; 16 min, 100% B. The flow rate and column temperature were 1.0 mL min−1 and 60 °C, respectively, and 2.0 µL of sample extract was injected. The ESI source was optimized for both positive and negative ionization modes ("Auto MSMS" acquisition mode) as follows: scan range m/z 50 to 2000, capillary voltage 3.5 kV, nozzle voltage 2000 V, fragmentor 110 V, and fixed collision-induced dissociation (CID) energy of 20 eV. Nitrogen was used as the nebulizing gas at a flow rate of 12 L min−1, a temperature of 310 °C and a pressure of 40 psi.
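The gradient table above can be read as a piecewise-linear program. The sketch below assumes linear ramps between the listed time points (the usual convention for HPLC gradient tables, although the text does not state it) and ignores column re-equilibration, which is not described. It returns %B at any time and the acetonitrile volume delivered per run at the stated 1.0 mL min−1 flow rate.

```python
# (time in min, % acetonitrile) pairs from the gradient described above
GRADIENT = [(0.0, 1.0), (1.5, 1.0), (6.0, 10.0), (12.0, 35.0), (14.0, 100.0), (16.0, 100.0)]
FLOW_ML_PER_MIN = 1.0


def percent_b(t):
    """%B at time t (min), linearly interpolated between gradient points."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError(f"t = {t} min is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t <= t1:  # first segment whose end time is not before t
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    return GRADIENT[-1][1]


def acetonitrile_ml_per_run():
    """Volume of B delivered: integral of flow * %B / 100 (trapezoids are exact for linear ramps)."""
    return sum(FLOW_ML_PER_MIN * (b0 + b1) / 2 / 100 * (t1 - t0)
               for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]))


print(percent_b(9.0), percent_b(13.0))        # prints: 22.5 67.5
print(f"{acetonitrile_ml_per_run():.4f} mL")  # prints: 4.9625 mL
```

Encoding the table as data rather than as a formula makes it easy to check intermediate compositions, for example where a compound elutes on the steep 12 to 14 min ramp.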

Statistical Analysis
All statistical analyses were performed in Statistica® 7.1. Homogeneity of variance in the biochemical data was checked using Levene's test. For the Mexican samples, a factorial analysis of variance (ANOVA; elevation × light growing condition) was performed on the contents of the 16 quantified phenolic compounds, with means compared by a Newman-Keuls test (p < 0.05). A Pearson correlation matrix was also computed for the leaf contents of these 16 phenolic compounds, pooling sun and shade data. A heatmap of this matrix was constructed in R with the gplots package, using default parameters for the symmetrical dendrogram (hierarchical linkage clustering on Euclidean distances), reordering by row (equal to column) means, and a linear scale of 50 color shades (no color breaks) as correlation coefficient bins [62]. A linear regression analysis was performed on six phenolic leaf contents (5-CQA, catechin, mangiferin, K-hex-dhex, F-dihex and rutin), pooling all the genotype samples from Colombia and Nicaragua. A Newman-Keuls test (p < 0.05) was performed on the 5-CQA, mangiferin, F-dihex and rutin leaf contents, comparing the means of each genetic group (Ethiopian wild accessions vs. American pure lines vs. F1 hybrid clones) × country (Colombia vs. Nicaragua) combination.
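Levene's test, used above to check homogeneity of variance before the ANOVA, compares groups through the absolute deviations Z_ij = |Y_ij − Ȳ_i| of each observation from its group mean: W = (N − k)/(k − 1) · Σ_i n_i (Z̄_i − Z̄)² / Σ_i Σ_j (Z_ij − Z̄_i)², for k groups and N observations. The pure-Python sketch below computes this statistic; it is an illustration, not Statistica's implementation, and it assumes the mean-centred variant (the Brown-Forsythe variant uses group medians instead). The p-value would come from an F distribution with k − 1 and N − k degrees of freedom, which is not computed here. The four groups in the usage example are hypothetical 5-CQA contents for the elevation × light combinations of the Mexican trial, not data from this study.

```python
from statistics import fmean


def levene_w(*groups):
    """Mean-centred Levene statistic W for k >= 2 groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own mean
    z = [[abs(y - fmean(g)) for y in g] for g in groups]
    z_means = [fmean(zg) for zg in z]
    z_grand = fmean([v for zg in z for v in zg])
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical 5-CQA leaf contents (elevation x light), for illustration only
groups = {
    "650 m, sun": [52.0, 55.1, 49.8, 53.3],
    "650 m, shade": [33.4, 31.0, 35.2, 32.9],
    "1250 m, sun": [60.2, 57.5, 63.1, 58.8],
    "1250 m, shade": [38.1, 36.4, 40.0, 37.2],
}
w = levene_w(*groups.values())
print(f"W = {w:.3f} with df = ({len(groups) - 1}, {sum(map(len, groups.values())) - len(groups)})")
```

Working on absolute deviations turns a comparison of spreads into a one-way ANOVA on the Z values, which is why W has the same between/within form as an F statistic.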

Conclusions
This study allowed us to identify two phenolic compounds, the chlorogenic acid 5-CQA and the xanthone mangiferin, as good candidate biomarkers of C. arabica adaptation to variations in light intensity. Analysis of the concentrations of these two compounds in the leaves of 34 C. arabica genotypes belonging to three genetic groups, grown in full sun or under 50% shade, showed that the genotypes best adapted to high light intensity had the lowest 5-CQA and mangiferin contents, a trait that was maintained when they were grown in the shade. The predictive accuracy of these two biomarkers should now be confirmed on a wider range of C. arabica genotypes from the recently identified genetic groups of wild C. arabica germplasm. The aim would be to confirm our hypothesis that high 5-CQA and/or mangiferin leaf contents are a signature of greater adaptation to shade, i.e., to agroforestry systems, and, conversely, that low leaf concentrations of these compounds indicate good adaptation to full-sun cropping systems. Although the two compounds are good biomarkers of the degree of C. arabica adaptation to light, they do not explain the mechanisms by which C. arabica genotypes adapt to full sunlight, and the molecular basis of this adaptation remains an open question. In addition, proven analytical techniques that allow fast, accurate and reliable quantification of these two compounds, such as NIR spectroscopy, open up opportunities for high-throughput phenotyping of C. arabica genotypes.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-1989/10/10/383/s1, Table S1: Concentrations of phenolic compounds in mature leaves of C. arabica cv. Marsellesa grown at two different elevations (650 and 1250 m asl) in full sun vs. under 50% shade, Table S2: Concentrations of six major phenolic compounds in mature leaves of C. arabica grown in Nicaragua and Colombia in full sun or under 50% shade.