Insights of Phenolic Pathway in Fruits: Transcriptional and Metabolic Profiling in Apricot (Prunus armeniaca)

There is increasing interest in polyphenols, plant secondary metabolites, in terms of fruit quality and diet, mainly due to their antioxidant effect. However, the identification of the key genes and enzymes and their roles in the phenylpropanoid pathway in temperate fruit species remains uncertain. Apricot (Prunus armeniaca) is a Mediterranean fruit with high diversity and fruit quality properties, being an excellent source of polyphenol compounds. For a better understanding of the phenolic pathway in these fruits, we selected a set of accessions with genetic-based differences in phenolic compound accumulation. HPLC analysis of the main phenolic compounds and transcriptional analysis of the genes involved in key steps of the polyphenol network were carried out. Phenylalanine ammonia-lyase (PAL), dihydroflavonol-4-reductase (DFR) and flavonol synthase (FLS) were the key enzymes selected. Orthologs of the genes encoding these enzymes were identified in apricot: ParPAL1, ParPAL2, ParDFR, ParFLS1 and ParFLS2. Transcriptional data of the genes involved in those critical points and their relationships with the polyphenol compounds were analyzed. Higher expression of ParDFR and ParPAL2 was associated with red-blushed accessions. Differences in expression between paralogues could be related to the presence of a BOXLCOREDCPAL cis-acting element in the genes related to anthocyanin synthesis: ParFLS2, ParDFR and ParPAL2.


Introduction
Apricot (Prunus armeniaca) is an important fruit crop in Mediterranean basin countries and Asia, with a wide diversity in pomological characteristics and fruit quality properties due to its different diversification centers [1]. Apricots are a good source of vitamins, carotenoids, and polyphenols [2], which makes this species a good choice from a nutraceutical point of view [3].
Higher plants have several defense mechanisms against biotic and abiotic stresses. Some of these mechanisms result in the synthesis of a large number of secondary metabolites. Flavonoids are one of these defense-related secondary metabolites, being a family of polyphenols synthesized by the phenylpropanoid biosynthetic pathway [4]. These secondary metabolites remain in different plant organs and accumulate on the plant surface [5]. In the case of flavonoid compounds, their accumulation is unequally distributed within tissues, as its concentration is higher in the peel of several fruits such as apple [6], peach [7], or apricot [8].
Polyphenols have been identified as secondary metabolites with great antioxidant activity [9][10][11]. In recent years, there has been increasing interest in them as contributors to fruit quality and dietary properties. In the case of apricot, the fruit peel is an excellent source of phenolic compounds. The main phenylpropanoid-derived secondary metabolites in apricot are chlorogenic and neochlorogenic acids, two caffeate-derived monolignols, while the main flavonols are rutin and quercetin-3-glucuronide [12].
Phenylpropanoid biosynthesis starts from the conversion of L-phenylalanine into cinnamic acid due to the action of phenylalanine ammonia-lyase (PAL) (Figure 1).
Phenylalanine ammonia-lyase (PAL) has been described as the first enzyme in the phenylpropanoid pathway, considered a key regulatory point between primary and secondary metabolism through conversion of L-phenylalanine into cinnamic acid [13]. PAL is encoded by a multi-gene family, in which the number of genes involved depends on the species. In Arabidopsis and Nicotiana, four PAL-encoding genes have been described [14][15][16], five in poplar [17], and two in different Prunus species [18]. In the following step, cinnamic acid 4-hydroxylase converts cinnamic acid into 4-coumaric acid, to which a coenzyme A is added by the action of 4-coumarate-CoA ligase, giving 4-coumaroyl-CoA as a result. At this point, the pathway can branch off to the biosynthesis of caffeate derivatives, producing chlorogenic and neochlorogenic acids. Alternatively, 4-coumaroyl-CoA is also used by chalcone synthase to catalyze the synthesis of chalcone, which is isomerized to colorless flavanones. These compounds can be hydroxylated at three different positions, by three different flavonoid hydroxylases, producing a group of dihydroflavonols. Then, the phenolic pathway can branch off to flavonol biosynthesis through the action of flavonol synthase (FLS). This enzyme uses dihydroflavonols (dihydroquercetin, dihydrokaempferol, or dihydromyricetin) as a substrate to produce kaempferol, quercetin, or myricetin, the main precursors of some flavonols such as rutin or quercetin-3-glucuronide. Previous works have identified FLS-encoding genes in Arabidopsis [19,20]. In addition, FLS has been related not only to the conversion of dihydroflavonols into flavonols but also to anthocyanin accumulation [20,21].
On the other hand, the dihydroflavonol-4-reductase (DFR) enzyme controls one of the limiting steps of the anthocyanin pathway, reducing dihydroflavonols to leucoanthocyanidins [22][23][24], therefore using the same substrate as FLS. Several DFR-encoding genes have been identified in different species [23][25][26][27]. Although the regulation of phenolic metabolism remains ambiguous at some points, various studies have identified the role of MYB transcription factors in the regulation of phenolic synthesis [28][29][30].
Nevertheless, although the main steps of the metabolic pathway are described, the identification of the key genes and enzymes and their roles in the phenylpropanoid pathway of some fruit crops remains uncertain. As the first step for a better understanding of the phenolic pathway in fruits, we selected a set of apricot accessions from the IVIA's apricot breeding program with genetic-based differences in phenolic compound accumulation [8].
Fruit phenolic content of the genotypes selected was evaluated and compared with the genetic expression of genes encoding key enzymes of the phenolic biosynthesis pathway related to primary phenolic compounds (PAL), anthocyanin biosynthesis (DFR), and secondary phenolic metabolites (FLS). Since FLS and DFR use the same substrate for producing either flavonols or anthocyanins, respectively, their possible role in flavonol accumulation in apricot should be studied. Characterization of the expression of main genes acting in the phenolic pathway and its relationship with fruit polyphenol content will provide tools to unravel the phenolic pathway of fruit species. This information will be of interest in breeding programs aimed at increasing fruit quality and useful for the promotion of fruit consumption.

Apricot Polyphenol Content
Total polyphenol content and the main phenolic compounds were evaluated for each year of study, including the two-years average content. Results are indicated in Table 1 and Table S1. Significant differences were found among all genotypes studied. The higher values were obtained in genotypes with an important red-blush color on the skin: 'Dama Rosa', 'GG9310', 'GG979', 'GP9817', and 'HM964'. The most important disease affecting Prunus species is caused by the Plum Pox Virus (PPV). The donor of PPV resistance 'Goldrich' and hybrids between 'Goldrich' and the Mediterranean autochthonous varieties (Ginesta and Palau) (Figure 2), presented more than 50% of red-blush in the skin and the highest amounts of total polyphenol content. The variety 'Mitger' contributes as well to the total polyphenol content of hybrids. Results indicated that hybrids from these three varieties (Ginesta, Palau and Mitger) crossed with 'Goldrich' produced genotypes with interesting polyphenol content.
The main secondary phenolic compounds (rutin, quercetin, chlorogenic acid, and neochlorogenic acid) were analyzed and a similar trend was obtained. 'Dama Rosa' showed the highest concentrations for all the studied compounds. 'Goldrich' hybrids 'Dama Rosa', 'Dama Taronja', and 'GP9817' showed higher content of neochlorogenic acid and rutin compared to the other accessions (Figure 3). Differences among cultivars were found in both years (Table S1).

Putative Orthologous and Phylogenetic Analysis
BLAST analysis using P. persica and A. thaliana DFR, FLS, and PAL identified a total of five genes in P. armeniaca: ParDFR (PARG07267), ParFLS1 (PARG08425), ParFLS2 (PARG08426), ParPAL1 (PARG18722), and ParPAL2 (PARG02214). Table S2 shows high (>95%) conservation between peach and apricot for all genes. PAL genes were located in different linkage groups in both species and, as a consequence, in different synteny blocks. PpePAL1 was located in LG2, whereas its apricot ortholog was located in LG5. PpePAL2, located in LG6, matched LG1 in apricot. PpeDFR, PpeFLS1, and PpeFLS2 were located in LG1 in peach, but they matched LG2 in Prunus armeniaca. All the predicted locations matched the synteny between these regions in apricot and peach. In addition, Arabidopsis thaliana and Prunus armeniaca also had a high identity (>80%) for PAL, more than 70% for ParDFR, 60% for ParFLS1, and 45.65% for ParFLS2 (Table S3). Protein alignment also revealed high conservation between Prunus and Arabidopsis thaliana (Tables S4 and S5). ParPAL1 and ParPAL2 showed around 80% similarity with AtPAL1 and AtPAL2, respectively. Regarding DFR, mean similarity was around 70%. FLS showed the lowest similarity, with 57% and 43% for FLS1 and FLS2, respectively. A similar trend was observed between Prunus persica and Arabidopsis thaliana.
ParPAL1 and the putative PAL1 orthologs from Prunus persica and Malus domestica clustered together. ParPAL2 and its putative orthologs grouped in a different cluster, which showed the differences between both paralogs. The phylogenetic tree of phenylalanine ammonia-lyase proteins (Figure 4A) showed that all Arabidopsis thaliana proteins clustered together. The phylogenetic tree revealed that DFR proteins of Prunus persica and Prunus armeniaca clustered together, being close to their ortholog from Malus domestica (Figure 4B).
The predicted proteins encoded by the FLS genes of Arabidopsis thaliana grouped in one cluster. On the other hand, the predicted proteins encoded by PpeFLS2 and ParFLS2 grouped in the same cluster, as did those encoded by PpeFLS1 and ParFLS1. However, the Fragaria vesca predicted sequences encoded by FvFLS clustered in another tree branch with the Malus domestica protein group (Figure 4C).

Gene Expression
Expression of the genes studied (ParPAL1, ParPAL2, ParDFR, ParFLS1, ParFLS2) showed a genotype effect but no year effect (Kruskal-Wallis test). Subsequently, we found minor differences in gene expression among genotypes (Figure 5, Table S6).
Figure 4. Phylogenetic trees of PAL (A), DFR (B), and FLS (C) genes. Each tree was bootstrapped 1000 times. Numbers close to each branch represent the percentage of replicate trees in which the associated taxa clustered together in the bootstrap test. Trees are drawn to scale according to evolutionary distances (p-distance); the scale included under each tree represents the number of substitutions per site.
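As an illustration of the p-distance mentioned in the caption, the following minimal sketch computes the proportion of differing sites between two aligned sequences. The toy sequences are invented, and the handling of gaps (sites with a gap in either sequence are skipped) is an assumption, since the gap treatment used for the trees is not stated here.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of compared (gap-free) aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: skip sites with a gap in either sequence
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return different / compared


# Hypothetical aligned protein fragments (not taken from the study).
toy_a = "MKTLLVAG-S"
toy_b = "MKSLLVAGAS"
distance = p_distance(toy_a, toy_b)
print(f"p-distance = {distance:.3f}")
```

Here nine sites are compared and one differs, so the distance is one ninth.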
Expression of ParPAL1, ParPAL2, ParDFR, and ParFLS2 showed significant differences among genotypes (Figure 5). For the flavonol synthase-encoding gene ParFLS1, no significant differences among genotypes were observed. Regarding the phenylalanine ammonia-lyase genes (ParPAL1 and ParPAL2), only the variety 'Goldrich' showed significant differences in ParPAL1 expression, and two genotypes ('Mitger' and HG9850) showed significant differences in ParPAL2 expression.
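The Kruskal-Wallis test used for the genotype and year effects is based on the ranks of the pooled observations. The sketch below computes the tie-corrected H statistic in plain Python; the expression values and genotype labels are hypothetical, and the function is an illustration of the statistic rather than the authors' analysis pipeline.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        tie_size = counts[value]
        rank_of[value] = position + (tie_size + 1) / 2
        position += tie_size

    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Correction factor for tied observations.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical relative expression values for three genotypes.
expression = {
    "genotype_1": [0.8, 1.1, 0.9],
    "genotype_2": [1.5, 1.7, 1.6],
    "genotype_3": [2.4, 2.2, 2.6],
}
h_stat = kruskal_wallis_h(list(expression.values()))
print(f"H = {h_stat:.2f} with {len(expression) - 1} degrees of freedom")
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.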

Contribution of 'Goldrich' to Phenolic Compounds Content and Genetic Expression
In this study, 'Goldrich', used as a donor of resistance to PPV in most apricot breeding programs worldwide and the main contributor to the hybrids included in this study, was evaluated as a contributor of compounds for fruit quality (Table 2).

Table 2. 'Goldrich' contribution to phenolic content: sum of squares (SS) and model parameter coefficients. SSr: relative SS; SSt: total SS; p-v: p-value; Gr: Goldrich relative; Sig: significance.

The variety 'Goldrich' showed a significant genetic effect on total polyphenol content, with a coefficient of 382.28 mg 100 g⁻¹ DW, which represents more than 45% of the general average of the population. A similar genetic effect was observed for the specific phenolic compounds, except quercetin-3-glucuronide, for which the genetic effect of 'Goldrich' was not significant. The genetic effects of 'Goldrich' for neochlorogenic and chlorogenic acids were 127.94 and 135.22 mg 100 g⁻¹, representing 56% and 57% of the general average, respectively. For rutin, the coefficient was 110.7 mg 100 g⁻¹ (37.3% of the general average).
Concerning gene expression, the cultivar 'Goldrich' had a genetic effect on the expression of all the genes studied. This effect was significant for the five genes studied: ParPAL1, ParPAL2, ParDFR, ParFLS1, and ParFLS2 (Table 3). The genetic effect of 'Goldrich' varied from 58.2% in ParFLS2 to 98.7% in ParDFR.
Table 3. 'Goldrich' contribution to gene expression: sum of squares (SS) and model parameter coefficients. SSr: relative SS; SSt: total SS; p-v: p-value; Gr: Goldrich relative; Sig: significance.

Relationships between Gene Expression and Phenolic Compound Accumulation
A correlation analysis performed among the compounds and the expression of the genes studied revealed a significant correlation between neochlorogenic acid and the rest of the phenolic compounds (Table 4). ParDFR expression showed a positive correlation with ParPAL2 (0.8) and also with ParFLS1, which in turn correlated positively with ParPAL2. The gene expression obtained indicates interaction among the genes selected in key steps of the polyphenol pathway.
To complete the previous study, we examined the relationships between gene expression and the content of each phenolic compound through a linear regression model (Tables S7 and S8). Ratios such as PAL/FLS, PAL/DFR or FLS/DFR were analyzed in order to study the differences in gene expression balance and their possible relationship with a preferential biosynthesis of anthocyanins, flavonols or caffeate derivatives. The trend between the phenolic compound content and the expression of the genes is summarized in Figure 6. Both neochlorogenic and chlorogenic acid content were negatively influenced by the ParPAL2/ParFLS2 ratio. Because neochlorogenic and chlorogenic acids are synthesized in the same pathway branch, the correlation between their combined content and gene expression was also evaluated. Data from the two-year average revealed a negative impact of the ParPAL2/ParFLS1 ratio on the total neochlorogenic and chlorogenic content. Concerning rutin and quercetin-3-glucuronide content, no significant correlation was found. The effect of gene expression on the levels of accumulation of all the compounds was low.
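The ratio-based regression described above can be sketched as follows: an expression ratio is computed per genotype and a simple least-squares line is fitted against compound content. All numbers below are invented for illustration (they are deliberately perfectly linear and do not reproduce the low effect reported in the study), and the single-predictor model is an assumption about the general form of the analysis.

```python
def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical relative expression values for four genotypes.
pal2_expression = [1.0, 2.0, 3.0, 4.0]
fls2_expression = [2.0, 2.0, 2.0, 2.0]
ratio = [p / f for p, f in zip(pal2_expression, fls2_expression)]

# Hypothetical chlorogenic acid content (mg per 100 g DW).
chlorogenic = [300.0, 250.0, 200.0, 150.0]

slope, intercept, r_squared = least_squares(ratio, chlorogenic)
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}, R^2 = {r_squared:.2f}")
```

A negative slope corresponds to the kind of negative influence of the expression ratio on acid content described in the text.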

Cis-Acting Elements Analysis
Due to the correlation among expression of some genes, a study of upstream sequences to find cis-acting elements recognized by MYB-like transcription factors was carried out (Figure 7).
In ParDFR, 694 bp upstream from the ATG, we found a TATA-BOX-PAL-related motif next to another TATA-box-like motif and an MRE (MYB-recognition element). In addition, a MYC motif was found together with a TATA-box-like motif. Furthermore, 238 bp upstream from the ATG, an MRE was found that also encoded a BOXLCOREDCPAL, a motif related to the PAL promoter region. This MRE was close to a MYC motif.
In ParPAL2, 403 bp and 255 bp upstream from the ATG, we found an MRE encoding a BOXLCOREDCPAL with a different sequence from the one found in ParDFR. However, 220 bp upstream from the ATG we found the same MRE encoding a BOXLCOREDCPAL as in ParDFR. In addition, a TATA-BOX-PAL-related motif was found 139 bp upstream.
In ParPAL1, by contrast, we did not find the MRE encoding the BOXLCOREDCPAL that was found upstream of ParDFR and ParPAL2. Instead, 551 bp upstream from the ATG we found an MRE motif differing from it by only one nucleotide. In addition, 276 bp upstream we found an MRE encoding a PAL-box-like motif, identical to the one found twice in ParPAL2.
In ParFLS1, we found four MREs, but none of them encoded a PAL-box-like motif. However, 438 bp upstream from the ATG, we found a MYC motif together with an antisense MRE.
In ParFLS2, 572 bp upstream we found the same MRE encoding a BOXLCOREDCPAL as in ParDFR and ParPAL2. Furthermore, 765 bp upstream we found the same MYC/MRE motif found in ParFLS1. Moreover, the same cis-acting element was found in antisense orientation 289 bp upstream from the ATG.
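A search of this kind can be sketched as a scan of the upstream sequence for consensus motifs on both strands, reporting each hit as a distance upstream from the ATG. The consensus patterns below (ACCWWCC for BOXLCOREDCPAL, CNGTTR for an MYB core, CANNTG for MYC) are assumptions for illustration and should be checked against the motif database actually used; the promoter fragment is invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Assumed consensus sequences (illustrative, not taken from the study).
MOTIFS = {"BOXLCOREDCPAL": "ACCWWCC", "MYBCORE": "CNGTTR", "MYC": "CANNTG"}


def motif_regex(consensus):
    """Compile a consensus into a lookahead regex so overlapping hits are found."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in consensus) + "))")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_promoter(upstream, motifs=MOTIFS):
    """Scan the sequence immediately 5' of the ATG (written 5'->3').

    Returns (motif, strand, bp_upstream) tuples, where bp_upstream is the
    distance from the 5'-most base of the hit on the sense strand to the ATG.
    """
    upstream = upstream.upper()
    length = len(upstream)
    hits = []
    for name, consensus in motifs.items():
        pattern = motif_regex(consensus)
        for strand, seq in (("+", upstream), ("-", reverse_complement(upstream))):
            for match in pattern.finditer(seq):
                start = match.start()
                if strand == "-":
                    # Map the antisense hit back to sense-strand coordinates.
                    start = length - start - len(consensus)
                hits.append((name, strand, length - start))
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical 20-bp promoter fragment ending just before the ATG.
toy_upstream = "TTACCAACCGGCAGTTGAAT"
hits = scan_promoter(toy_upstream)
for name, strand, bp in hits:
    print(f"{name} ({strand}) at {bp} bp upstream")
```

Note that a pattern such as CANNTG can match its own reverse complement, so the same site may be reported on both strands.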

Polyphenol Content
The total polyphenol content and the individual phenolic compounds analyzed were genotype-dependent. The higher values corresponded to genotypes derived from varieties characterized by important red skin color, such as the Mediterranean autochthonous varieties 'Ginesta', 'Palau', and 'Mitger' or the donor of resistance to PPV 'Goldrich'. This agrees with the references in which polyphenol content, anthocyanins and red color of fruits are related [31,32]. On the other hand, the linear model indicates that the contribution of the variety 'Goldrich' to the content of polyphenols is remarkable, in agreement with previous results [8]. This suggests that the introgression of resistance to PPV (the most important objective of the apricot breeding programs worldwide) is not negatively affecting the fruit quality of apricot, another important objective of the apricot breeding programs from the Mediterranean basin.
Expression of ParPAL1 was the highest in 'Goldrich'. This accession has been previously identified as a contributor of phenolic compound content in its derived hybrids [8]. Indeed, phenylalanine ammonia-lyase (PAL) plays a significant role in the phenylpropanoid metabolism pathway. PAL, as the first key enzyme in phenylpropanoid biosynthesis, catalyzes the conversion of L-phenylalanine to cinnamic acid, linking primary metabolism with secondary metabolism and becoming a rate-limiting step in phenylpropanoid metabolism [33]. In Prunus species, this gene family consists of two PAL members [18], and in our study they were identified in apricot by synteny with peach (ParPAL1 and ParPAL2). We have identified the 'Goldrich' genetic effect in increasing ParPAL1 expression. This result, along with the previously described effect on the increase of phenolic compounds [8], suggests that this gene contributes to phenolic accumulation in the group of genotypes studied.
The next critical step analyzed is the one where the phenolic pathway branches off towards anthocyanin or flavonol synthesis. Dihydroflavonol-4-reductase (DFR) is an enzyme that catalyzes the reduction of dihydroflavonols to leucoanthocyanidins in anthocyanin biosynthesis [22][23][24]. Our results revealed higher ParDFR expression in hybrids from cultivars with high percentages of red-blush [34]. This red coloration could be associated with anthocyanin accumulation, as shown by previous studies in apricot [35]. Consequently, our results may suggest a higher ParDFR expression in those cultivars with high percentages of red-blush on the fruit skin.
Alternatively, flavonol synthase (FLS) catalyzes the reaction from dihydroflavonols to flavonols, a group of flavonoids in which rutin and quercetin-3-glucuronide are found. In apricot, two FLS-encoding genes are present: ParFLS1 and ParFLS2. The average of the two crop years revealed lower expression of ParFLS2 in those genotypes without contribution of autochthonous genitors characterized by red-blush fruits. High expression was obtained in hybrids from cultivars with an important percentage (>50%) of fruit skin covered by a red-blush with a high intensity of over color [34]. Additionally, most of the cultivars of this group were also reported as the accessions with the highest total polyphenol content. These results are in agreement with previous works, indicating that expression of FLS could be related to phenolic biosynthesis and also linked with anthocyanin accumulation [20,21].
At the gene expression level, the 'Goldrich' effect was correlated positively with ParPAL1. Taking into account that 'Goldrich' has a positive contribution to polyphenol content, this suggests that ParPAL1 expression levels are related to the accumulation of phenolic compounds. On the other hand, we have not found correlations between individual ParPAL1 gene expression and any studied compound (Table 4). This can be explained because the analysis was carried out at full maturity, whereas the biosynthesis of the main polyphenol compounds might occur in previous fruit stages. As neochlorogenic and chlorogenic content were negatively influenced by the ParPAL2/ParFLS1 ratio (Figure 6), we suggest that ParPAL2 could be unbalancing the pathway towards anthocyanin biosynthesis, having a negative impact on the synthesis of these compounds.

Genes and Its Inference in Polyphenols Pathway
Both PAL and FLS putative orthologous analysis resulted in two genes per enzyme identified in the P. armeniaca genome. Genome duplication is common among plants, leading to the duplication of genes [36]. Indeed, it has been described that the Rosaceae family origin comes from a polyploidization event, explaining the presence of two of these genes in the Rosaceae species [37]. In agreement, A. thaliana presents three copies of FLS and four of PAL, as result of the two polyploidization events that originated this species [38,39]. Functional redundancy and natural selection lead to gene loss, silencing or neo-functionalization [40]. Dosage-dependent genes are usually retained in the duplicated genomes [41], suggesting the dosage dependence of FLS and PAL in the phenylpropanoid pathway.
Taking into account this information, we screened for possible MRE cis-acting elements involved in phenolic biosynthesis. Results revealed a common MRE (MYBCORE) also containing a BOXLCOREDCPAL motif in ParDFR and ParPAL2, which suggested that both genes can be regulated by the same transcription factor. However, this MRE was not found in ParPAL1. This suggests different regulation or even different roles of each identified PAL paralogue in apricot. This is also supported by the high correlation of ParDFR and ParPAL2 expression (Table 4), which indicates that they share the same regulation and supports the existence of different regulation for each paralogue. This specialization between paralogues that result from ancestral genomic duplications has been previously described [42] and can even lead to neo-functionalization of genes. In addition, most of the accessions with a high expression of ParFLS2, such as 'Dama Rosa', are siblings of the traditional cultivar 'Ginesta', a cultivar that had more than 50% of red-blush [34]. These results suggest a possible role of ParFLS2 in anthocyanin synthesis, in agreement with previous studies that proposed that a disequilibrium in the expression of FLS and DFR enzymes determines the accumulation of flavonols and anthocyanins [20,21,30].
The transcriptional study was carried out at fruit maturity. In light of the results obtained, a further analysis of ParPAL1 at different immature fruit stages would help to identify its role in peel polyphenol content more accurately. Furthermore, the results indicated a possible shared regulation of ParFLS2 and ParDFR expression related to anthocyanin biosynthesis in apricot. Our results help to unravel the relationship between the genetics of the red-blush trait and polyphenol compounds, and the relationship between ParFLS2 and anthocyanin biosynthesis in apricot.

Plant Material
A set of two Mediterranean cultivars ('Canino' and 'Mitger'), a North American variety ('Goldrich'), and nine hybrids from the IVIA apricot breeding program was analyzed (Table 5). 'Goldrich', used as the main donor of resistance to PPV in the breeding program, is one of the parents of most of the resistant hybrids obtained. 'Canino' and 'Mitger' are two autochthonous varieties used for introgression of adaptability to Mediterranean conditions. The trees are maintained in the IVIA apricot collection located in Moncada (latitude 37°45′31.5″ N, longitude 1°01′35.1″ W), Spain. Five fruits per tree were harvested at the ripening stage during two growing seasons (2019 and 2020). For each fruit, the peel was separated from the flesh with a peeler. Each sample consisted of a mix of the peel from five fruits per genotype and year. Samples were frozen in liquid nitrogen and kept at −80 °C until processing.

HPLC Analysis
For HPLC analysis, the tissue was processed to a lyophilized powder. Tissue homogenization was carried out using a vortex. Phenolic compounds were extracted and determined according to the procedure described in [43,44]. Briefly, 10 mg of freeze-dried peel was mixed with 1 mL of DMSO/MeOH (1:1, v/v). The sample was then centrifuged (Eppendorf 5810R centrifuge; Eppendorf Iberica, Madrid, Spain) at 4 °C for 20 min at 10,000 rpm. The supernatant was filtered through a 0.45 µm nylon filter and analyzed by HPLC-DAD and HPLC-MS on a reverse-phase C18 column, Tracer Excel 5 µm 120 OSDB (250 mm × 4.6 mm) (Teknokroma, Barcelona, Spain). An Alliance liquid chromatographic system (Waters, Barcelona, Spain) equipped with a 2695 separation module was coupled to a 2996 photodiode array detector and a ZQ2000 mass detector. A gradient mobile phase consisting of acetonitrile (solvent A) and 0.6% acetic acid (solvent B) was used at a flow rate of 1 mL/min, with an injection volume of 10 µL. The gradient program was as follows: 10% for 2 min, 10-75% over 28 min, 75-10% over 1 min, and a hold at 10% for 5 min. HPLC-MS analysis was performed under positive (flavonoids) and negative (phenolic acids) electrospray ionization conditions. The capillary voltage was 3.50 kV, the cone voltage was 20 V, the source temperature was 100 °C, the desolvation temperature was 225 °C, and the cone gas flow was 70 L/h.
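As an illustration only, the gradient program above can be written as a piecewise-linear schedule. The sketch below is not part of the original method: the function names are hypothetical, and because the text does not state which solvent the percentages refer to, the value returned is simply "percent of the ramped solvent".

```python
# Gradient program as (duration_min, start_percent, end_percent) segments,
# transcribed from the text. Assumption: the percentages refer to the
# solvent being ramped; the source does not say which one.
GRADIENT = [
    (2.0, 10.0, 10.0),   # initial hold
    (28.0, 10.0, 75.0),  # linear ramp up
    (1.0, 75.0, 10.0),   # return to initial conditions
    (5.0, 10.0, 10.0),   # re-equilibration hold
]


def total_run_time(program=GRADIENT):
    """Total duration of the program in minutes."""
    return sum(duration for duration, _, _ in program)


def percent_at(t_min, program=GRADIENT):
    """Percent of the ramped solvent at time t_min (linear interpolation)."""
    if t_min < 0 or t_min > total_run_time(program):
        raise ValueError("time is outside the gradient program")
    elapsed = 0.0
    for duration, start, end in program:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    return program[-1][2]
```

Under these assumptions each injection takes 36 min, and the composition halfway through the ramp (16 min) is 42.5%.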
Chromatograms were recorded at 340 nm. Chlorogenic acid and rutin were identified by comparison with pure standards obtained from Sigma-Aldrich (Sigma Co., Barcelona, Spain). Neochlorogenic acid and quercetin-3-glucuronide were tentatively identified by comparing their retention times, UV-vis spectra, and mass spectra with data described in the literature. For the quantitative analysis, an external calibration curve was built with the available standards, chlorogenic acid and rutin. In addition, standards were run daily with the samples for validation. All solvents were of LC-MS grade. Three samples per cultivar were analyzed, and all samples were run in triplicate. Empower 2 software (Waters, Spain) was used for data processing. Standard measurements (Figure S1) and a sample chromatogram of apricot peel (Figure S2) are included.
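The external calibration step can be sketched as a least-squares line relating peak area to standard concentration, which is then inverted for the samples. This is a minimal illustration, not the Empower 2 procedure: the standard concentrations and peak areas below are invented placeholder values, and a linear detector response is assumed. Only the 10 mg of freeze-dried peel and the 1 mL extraction volume come from the text.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line: peak area -> concentration (ug/mL)."""
    return (area - intercept) / slope


def content_mg_per_g(conc_ug_per_ml, extract_ml=1.0, sample_mg=10.0):
    """Convert extract concentration to mg of compound per g of dry peel.

    ug/mL * mL / mg of tissue = ug/mg, which is numerically equal to mg/g.
    """
    return conc_ug_per_ml * extract_ml / sample_mg


# Hypothetical standard series (ug/mL) and peak areas, for illustration only.
standard_conc = [5.0, 10.0, 25.0, 50.0, 100.0]
standard_area = [2000.0 * c + 500.0 for c in standard_conc]

slope, intercept = fit_line(standard_conc, standard_area)
sample_conc = concentration_from_area(40500.0, slope, intercept)
sample_content = content_mg_per_g(sample_conc)
```

With these placeholder numbers, a peak area of 40,500 corresponds to about 20 µg/mL in the extract, or about 2 mg per g of freeze-dried peel.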

Obtention of Gene Sequences and Cis-Acting Elements Motif Identification
To study the genetic regulation of the phenolic biosynthesis pathway, a set of genes encoding dihydroflavonol-4-reductase (DFR), flavonol synthase (FLS), and phenylalanine ammonia-lyase (PAL) was selected. To obtain the putative apricot orthologs, a BLAST search was performed against the Prunus armeniaca genome in the GDR (Genome Database for Rosaceae) [45], using the described A. thaliana and P. persica genes as queries.
Cis-acting elements were identified in the 1500 bp sequence upstream of the start codon of each gene, taken from the Prunus armeniaca genome published in the Genome Database for Rosaceae (GDR). The analysis was carried out with the PLACE (Plant cis-acting Elements) database [46], searching for described motifs related to the phenolic pathway.
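The kind of motif search described above can be sketched as a scan of an upstream sequence for IUPAC consensus strings on both strands. This is an illustration, not the PLACE tool itself. The two consensus strings (CNGTTR for MYBCORE and ACCWWCC for BOXLCOREDCPAL) are quoted from memory of the PLACE entries and should be treated as assumptions to be checked against the database; the toy promoter is invented.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYWSKMBDHVN", "TGCAYRWSMKVHDBN")

# Assumed consensus sequences; verify against PLACE before real use.
MOTIFS = {"MYBCORE": "CNGTTR", "BOXLCOREDCPAL": "ACCWWCC"}


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan_promoter(promoter, motifs=MOTIFS):
    """Return (name, strand, position, matched_text) tuples.

    position is the offset of the hit's first base relative to the start
    codon, so -1 is the base immediately upstream of it. Palindromic motifs
    would be reported once per strand.
    """
    promoter = promoter.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
            for match in motif_regex(pattern).finditer(promoter):
                position = match.start() - len(promoter)
                hits.append((name, strand, position, match.group(1)))
    return sorted(hits, key=lambda hit: hit[2])


# Invented 22 bp toy sequence standing in for a 1500 bp upstream region.
toy_promoter = "TTTACCTACCGGGCAGTTGAAA"
toy_hits = scan_promoter(toy_promoter)
```

In the toy sequence the scan reports one BOXLCOREDCPAL-like hit at position -19 and one MYBCORE-like hit at position -9, both on the forward strand.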
In addition, to check sequence conservation among species, a phylogenetic analysis was carried out with the predicted proteins of the Prunus armeniaca genes obtained and those of Prunus persica [47]. Multiple protein sequence alignment was performed with ClustalW as implemented in MEGA X v.10.1.8 [48], and a phylogenetic tree was built with the Neighbor-Joining method in the same software, using 1000 bootstrap replicates.
The number of amino acid differences per site between sequences (p-distance) was calculated with MEGA X using the bootstrap method with 1000 replications. The value 1 − p-distance was used as an estimate of similarity among proteins. In addition, a BLAST search and a synteny analysis of Prunus persica against the Prunus armeniaca reference genome were performed in the GDR database. Moreover, a BLAST search of Arabidopsis thaliana against the Prunus armeniaca genome was also performed in the GDR database [45].
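For reference, the p-distance is the proportion of aligned sites at which two sequences differ, and the similarity used here is its complement. The sketch below is a minimal illustration on invented peptide fragments. It skips sites where either sequence has a gap (pairwise deletion), which is an assumption, since the gap treatment chosen in MEGA X is not stated in the text.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap in either sequence are ignored (pairwise deletion,
    an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def similarity(seq_a, seq_b):
    """Similarity estimate defined as 1 - p-distance."""
    return 1.0 - p_distance(seq_a, seq_b)


# Invented aligned fragments, for illustration only.
example_distance = p_distance("MKTAYIAK", "MKSAYIAR")
example_similarity = similarity("MKTAYIAK", "MKSAYIAR")
```

Here two of the eight aligned residues differ, so the p-distance is 0.25 and the similarity is 0.75.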

Gene Expression
Samples consisted of 80 mg of powdered tissue. RNA was isolated using the Plant/Fungi Total RNA Purification Kit (NORGEN, Thorold, ON, Canada) with some modifications: frozen powdered tissue was suspended in 600 µL of lysis buffer C, to which 2% PVP-40 and 2% β-mercaptoethanol were added. The quality and integrity of the purified RNA were checked by agarose gel electrophoresis, and RNA was quantified with Qubit (Invitrogen, Carlsbad, CA, USA).
cDNA was synthesized from 500 ng of RNA in a 10 µL reaction using the PrimeScript RT Reagent kit ('Perfect Real Time') (Takara Bio, Otsu, Japan).
Amplification was carried out on a StepOnePlus Real-Time PCR System (Life Technologies, Carlsbad, CA, USA) using the TB Green Premix Ex Taq (Tli RNaseH Plus) kit (Takara Bio, Otsu, Japan). The reaction mix contained 7.5 µL of enzyme, 0.09 µL of primers (100 µM), 0.3 µL of ROX, 5.02 µL of H2O, and 1 µL of cDNA. The mix was incubated at 95 °C for 30 s, followed by 40 cycles of 5 s at 95 °C and 30 s at 60 °C. Finally, the mix was incubated for 15 s at 95 °C, followed by 1 min at 60 °C and 15 s at 95 °C. The geometric mean of the expression of apricot ACTIN and SAND was used as the housekeeping reference for normalization. The primers used are listed in Table 6. For each year and genotype, the calculated expression was the mean of three biological replicates. The relative expression of each gene was calculated using the relative standard curve method.
To test the contribution of 'Goldrich' to the phenolic content and gene expression in the set of accessions, we fitted the data to a general linear model [8].
In the model, the phenotype is linearly explained as follows:
$$\text{Phenotype} = C + G_{\text{Goldrich}} + \text{Year} + \text{Residual}$$
where C is the general average of the population (constant), G_Goldrich is the genetic effect of 'Goldrich', Year is the environmental effect due to the year, and Residual is the residual effect. The model was calculated using Statgraphics Centurion XVII version 17.2.00 software (Statpoint Technologies, Warrenton, VA, USA). A quantitative variable for evaluating the genetic effect of 'Goldrich' was included, with a value of 1 for 'Goldrich', 0.5 for 'Goldrich × X' hybrids, and 0 for the genotypes not related to 'Goldrich'. Model parameters were estimated with a 95% confidence level (p ≤ 0.05).
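The additive model described above (a constant, a 'Goldrich' effect coded 1, 0.5, or 0, a year effect, and a residual) can be sketched as an ordinary least-squares fit. This is an illustration, not the Statgraphics procedure: the 0/1 coding of the year, the genotype labels, and the phenotype values are all hypothetical, and the values are generated from known coefficients so that the fit can be checked. No significance testing is shown.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Dosage coding from the text: 1 for 'Goldrich', 0.5 for its hybrids, 0 otherwise.
GOLDRICH_DOSE = {"Goldrich": 1.0, "hybrid_A": 0.5, "Canino": 0.0}

# Hypothetical records: (genotype, year). 'hybrid_A' is a placeholder label.
records = [(g, y) for g in GOLDRICH_DOSE for y in (2019, 2020)]

# Design matrix columns: constant, Goldrich dose, year indicator (assumed 0/1).
design = [[1.0, GOLDRICH_DOSE[g], 1.0 if y == 2020 else 0.0] for g, y in records]

# Synthetic phenotype generated as 10 + 4 * dose + 2 * year, with no residual.
response = [10.0 + 4.0 * row[1] + 2.0 * row[2] for row in design]

constant, goldrich_effect, year_effect = fit_ols(design, response)
```

Because the synthetic phenotype contains no noise, the fit recovers the generating coefficients (10, 4, and 2). With real data the residual term would absorb the unexplained variation.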
The parameters that significantly influenced phenolic content were identified with a linear regression model in Statgraphics Centurion XVII version 17.2.00 software (Statpoint Technologies, Warrenton, VA, USA). The parameters included in the linear regression were the expression in apricot of ParDFR, ParFLS1, ParFLS2, ParPAL1, and ParPAL2, and the following expression ratios: ParPAL1/ParPAL2, ParPAL1/ParFLS1, ParPAL1/ParFLS2, ParPAL2/ParFLS1, ParPAL2/ParFLS2, and ParFLS1/ParFLS2. Non-significant parameters were excluded from each model, and only the significant ones were retained.
In addition, a multivariate analysis was performed with Statgraphics XVII software (Statpoint Technologies, Warrenton, VA, USA) to study the Pearson correlations among gene expression levels, among phenolic contents, and between the two. Correlations with p < 0.05 were considered significant.
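As a sketch of the quantification step used in this section, the relative standard curve method converts each Ct value to a relative quantity through a per-gene standard curve, and the target quantity is then divided by the geometric mean of the two reference genes (ACTIN and SAND). The curve parameters and Ct values below are invented placeholders, and the function names are hypothetical.

```python
import math


def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_expression(target_quantity, reference_quantities):
    """Target quantity divided by the geometric mean of the reference genes."""
    return target_quantity / geometric_mean(reference_quantities)


# Hypothetical standard curves (slope, intercept), one per gene.
curves = {
    "ParPAL2": (-3.32, 35.0),
    "ACTIN": (-3.32, 35.0),
    "SAND": (-3.32, 35.0),
}

# Hypothetical Ct values for a single sample.
ct_values = {"ParPAL2": 28.36, "ACTIN": 21.72, "SAND": 25.04}

quantities = {
    gene: quantity_from_ct(ct, *curves[gene]) for gene, ct in ct_values.items()
}
relative_expression = normalized_expression(
    quantities["ParPAL2"], [quantities["ACTIN"], quantities["SAND"]]
)
```

In practice this value would be computed for each of the three biological replicates and averaged per genotype and year, as described in the text.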

Conclusions
The set of accessions studied showed that the expression levels of key genes in the polyphenol biosynthesis pathway are genotype dependent. In addition, the cultivar 'Goldrich', used as a donor of PPV resistance, contributed positively to ParPAL1 expression levels. This effect on expression agrees with its previously described contribution to total polyphenol content. Transcriptional data for the main genes involved in critical points of the polyphenol pathway have been described, and their relationships with the different polyphenol compounds have been identified. Higher expression of ParDFR and ParPAL2 has been associated with red-blushed accessions. Differences in expression between paralogues in the phenolic pathway can be linked to the presence of a BOXLCOREDCPAL cis-acting element related to the genes involved in anthocyanin synthesis: ParDFR, ParFLS2, and ParPAL2.