Chemical Markers to Distinguish the Homo- and Heterozygous Bitter Genotype in Sweet Almond Kernels

Bitterness in almonds is controlled by a single gene (Sk dominant for sweet kernel, sk recessive for bitter kernel) and the proportions of the offspring genotypes (SkSk, Sksk, sksk) depend on the progenitors’ genotype. Currently, the latter is deduced after crossing by recording the phenotype of their descendants through kernel tasting. Chemical markers for the early identification of parental genotypes related to bitter traits can significantly enhance the efficiency of almond breeding programs. On this basis, volatile metabolites related to almond bitterness were investigated by Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry coupled to univariate and multivariate statistics on 244 homo- and heterozygous samples from 42 different cultivars. This study evidenced the association between sweet almonds’ genotype and some volatile metabolites, in particular benzaldehyde, and provided for the first time chemical markers to discriminate between homo- and heterozygous sweet almond genotypes. Furthermore, a multivariate approach based on independent variables was developed to increase the reliability of almond classification. The Partial Least Square-Discriminant Analysis classification model, built with selected volatile metabolites that showed discrimination capacity, achieved 98.0% correct classification. The metabolites identified, in particular benzaldehyde, are suitable markers for early genotype identification in almonds while a DNA molecular marker is not yet available.


Introduction
Almond (Prunus dulcis (Mill.), D. A. Webb; syn. P. amygdalus, Batsch.) is the main nut tree worldwide and almonds have an important commercial value, with an annual world production exceeding 3,000,000 tons in shell [1]. Sweet almond kernels are widely consumed raw or minimally processed, as well as used as an ingredient in food products. Genetic improvement programs for almonds in different countries such as Spain, Australia and the United States, have been selecting and with bitterness, this marker is not completely effective in predicting slight differences in bitterness such as those existing between sweet and semi-bitter kernels, and even less effective in detecting possible differences between sweet homo- and heterozygotes. This could be due to the performance of the analytical methods applied for the determination of the cyanogenic glucoside, or to the existence of secondary factors linked to the recessive allele affecting the production of benzaldehyde or other compounds causing bitterness perception.
According to Wirthensohn et al. [10] the overlap of the concentration ranges in sweet and semi-bitter kernel indicates that amygdalin may not be the only compound defining the marzipan-like flavor in sweet almonds. Some authors have pointed out the close correlation between bitter marzipan-like flavor and benzaldehyde, one of the amygdalin catabolites, even at low bitterness intensities assessed in sweet almond cultivars [25]. In addition, other almond volatile compounds, such as benzyl alcohol, revealed higher values in bitter almonds than in sweet almonds [26], and their levels tend to be higher in almonds with higher levels of benzaldehyde [25,27].
On this basis, the concentration of benzaldehyde and other related volatile compounds was monitored in 42 homozygous and heterozygous almond cultivars and selections, with the aim of identifying suitable chemical markers to classify sweet kernel almonds according to their genotype (homozygotes or heterozygotes). With this aim, a Solid Phase Microextraction-Gas Chromatography-Mass Spectrometry (SPME-GC-MS) method was optimized and applied to 244 almond samples obtained from 124 different trees.

Samples
Almonds (Prunus dulcis (Mill.), D. A. Webb; syn. P. amygdalus, Batsch.) of 41 different cultivars and selections and one feral tree were studied. For 37 of these, their genotypes were previously reported in the literature [12,17,20,28] or determined by IRTA's almond breeding program. In agreement with these sources, the 42 cultivars and selections consisted of 22 homozygous and 14 heterozygous sweet kernel cultivars, five selections without known genotype and one reference bitter feral tree. A few of these heterozygous cultivars ('Tuono', 'Guara', 'Genco') are described as semi-bitter, although no precise and objective criteria have been set for this classification. Hereinafter, all the samples except the bitter one will be considered as sweet kernel almonds. A total of 244 almond samples were obtained from 124 different trees (Table 1). These samples were produced in 2012 and 2015 in different geographical areas: Constantí and Gandesa in Tarragona and Les Borges Blanques in Lleida (Catalonia, Spain). Out of the 42 cultivars, 10 (8 homozygous and 2 heterozygous) were analysed both in 2012 and 2015. Almonds were collected, shelled and blanched by hand, then packed under vacuum, stored at 2-8 °C, and analysed within three months.

Sample Preparation and Solid Phase Microextraction (SPME) Conditions
A divinylbenzene/carboxen/polydimethylsiloxane SPME fiber (50/30 µm, 2 cm long, from Supelco Ltd., Bellefonte, PA, USA) was selected as being the most suitable for compounds with a wide range of molecular weight and polarity. The extraction of volatiles was performed on a suspension of ground almonds in aqueous solution on the basis of preliminary results obtained by comparing the uptake of volatiles obtained from ground almonds (1 g) and from ground almonds in aqueous suspension (1 g in 2 mL of ultrapure water). A multilevel factorial experiment was then applied to optimize the rest of the parameters affecting the extraction of volatile compounds: extraction temperature (40, 50, 60 °C), extraction time (20, 30, 40 min), sample amount (1, 1.5 g) and pH of the suspension (3.5, 7). The optimized factorial design consisted of 20 experiments performed in duplicate and randomized (Supplementary Table S1). The dependent variables were the GC-MS responses of 12 representative compounds of the volatile profile, belonging to different chemical families (Table 2). The influence of the different factors was evaluated by means of a normalized Pareto diagram, elaborated with the chromatographic responses of each analyte in the different extraction conditions. The optimal value of each factor involved in the extraction was statistically calculated and the best extraction conditions were chosen for the analysis.
Table 2. Results of the factorial design: optimal extraction conditions based on the regression models for the factors that significantly influenced extraction (p < 0.05).
Finally, almond samples were analysed as follows: 10 g of skinless almonds were ground for 1 min using a domestic grinder (Iberica Group, Barcelona, Spain), then 1 g of the sample was suspended in 2 mL of ultrapure water (pH 7) in a 10 mL vial. The sample was spiked with 4-methyl-2-pentanol (Sigma-Aldrich, St. Louis, MO, USA) to a final concentration of 0.5 µg/g of almonds and sealed with a PTFE-silicone septum. The vial was placed in a water bath at 60 °C under magnetic stirring, and the SPME fiber was maintained for 40 min in the sample headspace. The volatile compounds of the fiber were desorbed for 1 min at 260 °C in the gas chromatograph injection port.
Intra-day repeatability was assessed by analyzing the same almond sample five times and calculating the percent relative standard deviation (Supplementary Table S2).
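The repeatability figure described above can be illustrated with a minimal sketch, assuming the percent relative standard deviation is computed from the sample standard deviation of the five replicate responses. The function name and the replicate values are hypothetical and are not taken from Supplementary Table S2.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample SD / mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical normalized responses for five analyses of the same sample
replicates = [1.02, 0.98, 1.05, 0.97, 1.00]
print(f"RSD = {percent_rsd(replicates):.1f}%")
```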

Gas Chromatography-Mass Spectrometry (GC-MS) Analysis
GC-MS analyses were performed in 2012 on a Thermo Scientific Trace GC Ultra coupled to a quadrupole mass selective spectrometer DSQ II (Thermo Scientific, Bremen, Germany) and in 2015 on an Agilent GC 6890N coupled to a quadrupole mass selective spectrometer 5973 (Agilent Technology, Palo Alto, CA, USA). Both were equipped with a split-splitless injection port. Helium was the carrier gas, at a flow rate of 1 mL/min. The separation of the volatiles was performed on a Supelcowax-10 column (30 m × 0.25 mm i.d., 0.25 µm film thickness), purchased from Supelco Ltd. (Bellefonte, PA, USA). The temperature of the column was held at 40 °C for 5 min and increased to 250 °C at 6 °C/min. Electron impact mass spectra were recorded at 70 eV ionization energy in the 35-250 m/z range, 2 scan/s. Volatile compounds were identified by comparison of their mass spectra and retention times with those of standard compounds or tentatively identified by comparing their mass spectra with the reference mass spectra of the Wiley 6.0 library and their linear retention indices with those reported in the literature. For quantitative analysis, relative amounts of volatile compounds were calculated by using the internal standard method. The compounds were quantified by considering the relative response factor to be 1 and were expressed as micrograms per gram equivalents of 4-methyl-2-pentanol.
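As a minimal sketch of the internal standard calculation described above, the relative amount of a compound can be obtained from the ratio of its peak area to that of 4-methyl-2-pentanol, multiplied by the spiked concentration (0.5 µg/g) and divided by a relative response factor of 1. The function name and the peak areas below are hypothetical.

```python
IS_CONC_UG_PER_G = 0.5  # spike level of 4-methyl-2-pentanol stated in the text


def is_equivalents(analyte_area, is_area, is_conc=IS_CONC_UG_PER_G, rrf=1.0):
    """Concentration in ug/g equivalents of the internal standard.

    rrf is the relative response factor; the text assumes rrf = 1.
    """
    if is_area <= 0:
        raise ValueError("internal standard area must be positive")
    return (analyte_area / is_area) * is_conc / rrf


# Hypothetical peak areas (arbitrary units)
print(is_equivalents(analyte_area=2000.0, is_area=1000.0))
```

Because the response factor is fixed at 1, the values are comparable between samples but are not absolute concentrations.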

Statistical Analysis
Statistical elaboration for the optimization of the SPME conditions was carried out using Statgraphics Plus 5.1© (Statgraphic Technologies Inc., The Plains, VA, USA). Four factors were tested at three or two levels, as previously described. The factorial design consisted of 20 experiments performed in duplicate. The normalized results of the experimental design, evaluated at a significance level of 5%, were analysed using a standardized Pareto diagram, which shows a frequency histogram where the length of each bar in the graph is proportional to the absolute value of its standardized effect. The significance of the factors studied and the optimal values for each factor were established by means of an ANOVA and a regression analysis of the model, respectively. The results were considered significant with values of p < 0.05.
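To make the size of the design space concrete, the sketch below enumerates the full factorial grid for the four factors and levels listed in the text. It is only an illustration: the 20 runs actually performed are an optimized subset reported in Supplementary Table S1 and are not reproduced here, and the dictionary keys are hypothetical names.

```python
from itertools import product

# Factor levels as stated in the text; key names are illustrative
levels = {
    "temperature_C": (40, 50, 60),
    "time_min": (20, 30, 40),
    "sample_g": (1.0, 1.5),
    "pH": (3.5, 7.0),
}

# Every combination of levels: 3 * 3 * 2 * 2 = 36 candidate runs
full_factorial = [dict(zip(levels, combo)) for combo in product(*levels.values())]
print(len(full_factorial))
```

The conditions finally adopted (60 °C, 40 min, 1 g, pH 7) correspond to one point of this grid.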
Univariate statistical analysis was performed with SPSS software v25© (IBM Corp., NY, USA). Student's t-test was applied to compare homo- and heterozygous groups, and bilateral Pearson correlations were assessed between benzaldehyde and the compounds presenting significant differences by the t-test, and between benzaldehyde and bitterness. In all cases, p < 0.05 was considered significant. Analysis of variance by General Linear Model (GLM) of SPSS was carried out according to the harvest year and geographical production area.
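The two univariate statistics named above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the SPSS procedure: it assumes the pooled-variance (equal variance) form of Student's t-test, it returns only the t statistic and degrees of freedom (no p-value), and the concentration values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance Student's t statistic and degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical benzaldehyde levels (ug/g equivalents) for two genotype groups
homozygous = [0.10, 0.12, 0.09, 0.11]
heterozygous = [0.40, 0.55, 0.48, 0.61]
t, df = student_t(homozygous, heterozygous)
print(f"t = {t:.2f}, df = {df}")
```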
Multivariate analysis was carried out with SIMCA software v13.0© (Umetrics AB, Sweden). With the variables selected by univariate statistics (6 variables) and after data pre-processing (scaling to unit variance), a Principal Component Analysis (PCA) was developed to explore the natural clustering of samples and detect potential outliers (according to Hotelling's T² range and distance to the model parameters). A Partial Least Square-Discriminant Analysis (PLS-DA) classification model was then built with the same variables to classify the samples into homo- or heterozygous categories.
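The pre-processing step mentioned above can be sketched as follows, assuming that scaling to unit variance also includes mean-centering of each variable (a common default, but an assumption here). The function name and the small data matrix are hypothetical; rows are samples and columns are volatile compounds.

```python
from statistics import mean, stdev


def autoscale(rows):
    """Mean-center each column and divide it by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    if any(s == 0 for s in sds):
        raise ValueError("a constant column cannot be scaled to unit variance")
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical matrix: 4 samples x 2 compounds (ug/g equivalents)
data = [
    [0.10, 2.0],
    [0.50, 1.5],
    [0.12, 2.2],
    [0.45, 1.4],
]
scaled = autoscale(data)
```

After this step every variable has the same weight in PCA or PLS-DA, regardless of its absolute concentration range.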

Results and Discussion
In almonds, individuals with sweet kernel phenotype can present homozygous (SkSk) or heterozygous (Sksk) genotype. To classify them according to this genotype, suitable metabolic markers were investigated after optimizing a proper analytical method.
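Since the trait is described as controlled by a single gene, the expected genotype proportions in a progeny follow from combining one allele from each parent. The sketch below illustrates this for the crosses relevant to breeding; the function name is hypothetical and the model assumes simple Mendelian segregation with equal transmission of both alleles.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_ratios(parent1, parent2):
    """Expected genotype fractions for a single-gene cross.

    Each parent is a pair of alleles, e.g. ("Sk", "sk").
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sorting writes the dominant allele "Sk" before the recessive "sk"
        counts["".join(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


print(offspring_ratios(("Sk", "sk"), ("Sk", "sk")))  # heterozygous x heterozygous
print(offspring_ratios(("Sk", "Sk"), ("Sk", "sk")))  # homozygous x heterozygous
```

Only a cross between two heterozygous parents is expected to yield bitter (sksk) seedlings, which is why knowing the parental genotype in advance is valuable.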

Optimization of SPME-GC-MS Method for the Assessment of Volatile Compounds
A 29% increase in total chromatographic area was observed by analyzing ground almonds in suspension in comparison to dry extraction (Supplementary Table S3). This greater efficiency is justified by a better mass transfer due to a greater exposure of the surface of the almond particles compared to direct extraction, in which these particles tend to agglomerate. The presence of water could also favor enzymatic reactions leading to some volatiles related to almond bitterness [26]. Table 2 shows the optimal values for the extraction variables that were found to significantly influence the extraction of each volatile compound. The temperature and the extraction time were the parameters with the highest influence on volatiles uptake. As expected, for most compounds an increased chromatographic response was observed at 60 °C and 40 min. The compounds whose uptake was significantly influenced by pH showed a better extraction at pH 7. The amount of sample only showed a significant effect on a few volatile compounds, and it was maintained at 1 g to allow proper stirring during the extraction.

Univariate Statistical Analysis of Raw Almond Volatile Components
Thirty compounds were detected in the headspace of the samples under study (Supplementary Table S2), most of which were previously described in almonds [29,30]. To identify metabolites whose biogenesis could be related to the almond genotype (SkSk, Sksk), we focused on the compounds that presented significant differences between homo- and heterozygous almonds when assessed by univariate analysis (Table 3). Benzaldehyde, benzyl alcohol and 1-penten-3-ol presented significantly higher concentrations in kernels from heterozygous (Sksk) cultivars, while the branched aldehydes 2- and 3-methylbutanal and the branched alcohols 2-methylpropan-1-ol, 3-methylbutan-1-ol, 3-methyl-3-buten-1-ol and 3-methyl-2-buten-1-ol were more abundant in homozygous (SkSk) ones. A relationship with the recessive allele could be hypothesized for those compounds that presented clear trends according to the genotype: SkSk < Sksk < sksk, such as benzaldehyde and benzyl alcohol; or SkSk > Sksk > sksk, such as the branched alcohols 2-methylpropan-1-ol, 3-methylbutan-1-ol, 3-methyl-3-buten-1-ol and 3-methyl-2-buten-1-ol (Table 3). All these compounds correlated significantly with benzaldehyde in all the sweet almond phenotypes (Table 3). On the contrary, the branched aldehydes and 1-penten-3-ol did not follow any of these trends, and they did not significantly correlate with benzaldehyde, suggesting that their formation could be driven by varietal factors unrelated to kernel bitterness. For this reason, they were not further considered as possible genotype markers in sweet almonds. Although the harvest year and the production area influenced the concentration of the selected volatiles (Supplementary Table S4), the differences between the SkSk and Sksk groups were large enough to allow the differentiation of these genotypes in spite of the annual and geographical variability.
While bitterness and marzipan-like flavor had been previously related to benzaldehyde and benzyl alcohol in semi-bitter and bitter almonds [10,25,31], no data were available about the occurrence of these compounds in sweet almonds according to their genotype. While benzaldehyde is known to proceed from amygdalin catabolism [7-9], the biosynthesis of benzyl alcohol in almonds has not been elucidated. Kwak et al. [26] documented that it is formed in bitter almond kernel by enzymatic reactions, which may consist of the reversible enzymatic reduction of benzaldehydes as described in other plants [32]. This would substantiate the association of benzyl alcohol with benzaldehyde and almonds' bitter character. In the same way, the enzymatic formation of branched alcohols was predominant in sweet rather than in bitter almond kernels [26], but it was unknown that these compounds were also predominant in homozygous sweet almond genotypes compared to heterozygotes.
Box-and-whisker plots were built to explore the concentration ranges of the selected compounds and their capacity to differentiate homo- and heterozygous sweet genotypes (Figure 1). While most of the compounds presented a certain overlap in the ranges of the homo- and heterozygous groups, benzaldehyde levels allowed a neat distinction between these groups. We report for the first time a discrimination between homo- and heterozygous sweet almond genotypes based on a chemical marker, which resulted from the analysis of more than 200 samples from 36 distinct cultivars. These results indicate that benzaldehyde performed better than reported for amygdalin to differentiate homo- and heterozygous sweet almond kernels [7,22]. This could be the consequence of a higher sensitivity in the detection of benzaldehyde, which led to differentiation even between kernels of very low bitterness. This was supported by the significant correlation (Pearson correlation = 0.787, p < 0.001) between benzaldehyde and the mean bitterness intensity of the sweet cultivars under study, assessed by IRTA's almond sensory panel on samples from previous harvest years (Table 1). In addition, we could hypothesize that the accumulation of amygdalin in the kernel is not the only effect of the recessive bitter allele in heterozygotes, and that the latter could influence other enzymatic reactions such as the catabolic routes yielding benzaldehyde and related compounds, as well as the synthesis of branched alcohols.
Benzaldehyde could represent a suitable chemical marker for the early genotype identification in almond cultivars and selections used in breeding programs. In this regard, samples from the five sweet almond selections without known genotype (IRTA-7, IRTA-11, 'Cambra', 'Felisia' and 'Soleta') were classified as homozygous cultivars (Figure 1). This classification may be verified once the bitter character segregation data are available in the progeny of these cultivars.
Although the homo- and heterozygous sweet almonds considered in the present study could be discriminated directly by their levels of benzaldehyde, all the metabolites whose biogenesis seemed to be linked to the almond genotype could be useful to support this classification as confirmation parameters or in multivariate models.

Multivariate Statistical Analysis of Raw Almond Volatile Components
A multivariate statistical approach based on various potential genotype markers was carried out to support the differentiation allowed by benzaldehyde, with the aim of providing a more reliable classification tool. PCA was carried out with the biomarkers previously selected by univariate analysis (3 Principal Components (PCs) accounted for 94.7% of the total variance explained, no outliers detected). While PC1 seemed to depend on varietal characteristics not linked to the bitter allele (data not shown), the scores and loadings plots corresponding to PC2 and PC3 confirmed that a clear differentiation between hetero- and homozygous individuals (Figure 2a) was driven by benzaldehyde and benzyl alcohol, and branched alcohols, respectively (Figure 2b). PC2 was the component that mainly contributed to the differentiation between hetero- and homozygous individuals (19.6% of explained variance). As expected, benzaldehyde was the variable that mainly contributed positively to this component, followed by benzyl alcohol (PC2 loadings 0.744 and 0.503, respectively), while 2-methylpropan-1-ol, 3-methylbutan-1-ol, 3-methyl-3-buten-1-ol and 3-methyl-2-buten-1-ol were the ones mainly contributing negatively to this PC (PC2 loadings −0.310, −0.252, −0.140 and −0.113, respectively).
On this basis, to obtain a classification tool for sweet almonds based on all volatile compounds whose biogenesis seemed to be linked to their genotype, a supervised discriminant technique was applied to find the maximum correlation between the data and each of the categories of interest (heterozygous vs homozygous). A PLS-DA classification model developed according to the almond genotype and based on the previously selected variables provided a 98.0% correct classification, as obtained by leave-10%-out cross-validation (Table 4). The corresponding predicted values are reported in Supplementary Table S5. The permutation test (n = 20) indicated that the model was not over-fitted according to the Q² scores (model's Q² = 0.81, permutation models' Q² < 0). Moreover, PLS-DA regression coefficients confirmed the major role of benzaldehyde in the classification model and evidenced the lower but significant contribution of some branched alcohols (Figure 3).
Four heterozygous samples out of 203 were misclassified by the PLS-DA, three 'Nonpareil' and one 'FGFP092'. Other samples from these cultivars were correctly classified by the model. All the samples from these cultivars could be well distinguished from homozygous samples by considering only the benzaldehyde content (Figure 1). The slight reduction in the classification efficiency observed by PLS-DA was compensated by a higher classification reliability, given by the application of an approach based on various independent variables.
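The reported classification rate can be checked against the misclassification count given above with a short calculation. This is only an arithmetic illustration using the two figures stated in the text (203 samples with known genotype, 4 misclassified); the function name is hypothetical.

```python
def percent_correct(n_total, n_misclassified):
    """Percentage of samples assigned to the correct class."""
    if n_total <= 0 or not 0 <= n_misclassified <= n_total:
        raise ValueError("counts must satisfy 0 <= n_misclassified <= n_total")
    return 100.0 * (n_total - n_misclassified) / n_total


# 203 samples of known genotype, 4 heterozygous samples misclassified
print(f"{percent_correct(203, 4):.1f}%")  # prints "98.0%"
```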
In agreement with the benzaldehyde content, the predicted values of the PLS-DA model classified all the samples belonging to the five cultivars with unknown genotype as homozygous (Supplementary Table S6). Such a classification is plausible given their genealogy.

Conclusions
In conclusion, the results obtained in this work evidenced the association between sweet almonds' genotype and some volatile metabolites and provided for the first time chemical markers to discriminate between homo- and heterozygous sweet almonds. In particular, the amount of benzaldehyde, assessed by a simple, rapid, automatable and affordable technique such as SPME-GC-MS, made it possible to differentiate between the homo- and heterozygous samples analyzed in the study (n = 203) and to tentatively classify almond kernels with unknown genotype (n = 39). Moreover, the PLS-DA classification model, built with selected independent metabolites that had discrimination capacity and were thus more likely to make the classification more reliable, achieved 98.0% correct category assignment. The selected metabolites, and in particular benzaldehyde, represent suitable chemical markers for the early genotype identification in almond cultivars and selections used in breeding programs. While a DNA molecular marker is not available, this technique can be used to distinguish homo- and heterozygous bitter genotypes in sweet almond, and thus it is useful both to determine the genotypes of parents for further breeding and to screen out unwanted seedlings derived from crosses.
Supplementary Materials: The following are available online at http://www.mdpi.com/2304-8158/9/6/747/s1, Table S1: Experiments performed to develop the SPME method, after optimizing the experimental design, Table S2: Volatile compounds identified in the homozygous and heterozygous cultivars and selections under study, Table S3: Comparison of chromatographic areas obtained by dry extraction and extraction in aqueous suspension, Table S4: Influence of harvest year and production area on the concentration of the selected volatiles obtained by analysis of variance, Table S5: Samples from cultivars with known genotype and their predicted values as the SkSk (homozygous) and Sksk (heterozygous) class of the PLS-DA model, Table S6: Samples from cultivars and selections with unknown genotype, and their predicted values as the SkSk (homozygous) class of the PLS-DA model.