Genome-Wide Association Study of Healthful Flavonoids among Diverse Mandarin Accessions

Mandarins have many unique flavonoids with documented health benefits and that help to prevent chronic human diseases. Flavonoids are difficult to measure and cannot be phenotyped without the use of specialized equipment; consequently, citrus breeders have not used flavonoid contents as selection criteria to develop cultivars with increased benefits for human health or increased tolerance to diseases. In this study, peel, pulp, and seed samples collected from many mandarin accessions and their hybrids were analyzed for the presence of selected flavonoids with documented human health benefits. A genome-wide association study (GWAS) was used to identify SNPs associated with biosynthesis of flavonoids in these mandarin accessions, and there were 420 significant SNPs were found to be associated with 28 compounds in peel, pulp, or seed samples. Four candidate genes involved in flavonoid biosynthesis were identified by enrichment analysis. SNPs that were found to be associated with compounds in pulp samples have the potential to be used as markers to select mandarins with improved phytonutrient content to benefit human health. Mandarin cultivars bred with increased flavonoid content may provide value to growers and consumers.


Introduction
Beneficial flavonoid content may be a consumer-driven trait that potentially can be improved through breeding and used as a tool to differentiate mandarins with convenience traits in the marketplace, as mandarins with higher levels of healthful flavonoids and marketed as such may be preferred by health-conscious consumers. Flavonoids are a diverse, large group of plant-based compounds, many of which are reported to have beneficial health effects. Consumers are seeking foods that improve their quality of life and prevent nutrition-dependent chronic diseases [1]. Citrus contains many unique flavonoids known to have health-promoting properties, specifically monoterpenes and flavanones, which are rarely present in other plants [2]. Mandarins are a large and phenotypically diverse group of citrus with a flavonoid composition that has been analyzed in "wild Chinese" [3] and conventional cultivars [4]. A recent study by Wang et al. [5], showed the metabolic diversity of flavonoids in citrus species, including 14 mandarin cultivars that were tested to showcase flavonoid diversity. Differences in mandarin flavonoid contents suggests a genetic component that can be exploited in a modern plant breeding program.
Citrus breeding is a long-term process and has many complexities that other crop species do not. Citrus has a long juvenility period and large tree size, which can substantially increase cultivar release time and the costs associated with running a breeding

Sample Preparation and Flavonoid Extraction
Peel (flavedo and albedo), pulp (juice vesicles, segments, and segments walls) and seeds (if present) were collected from the whole fruit. Peel and pulp samples were collected for each accession as follows: the fruits were divided into three equal groups that contained at least two fruit per biological replicate. Peel and pulp were separated and ground into a fine powder with liquid nitrogen and stored at −80 • C. Seeds were bulked per accession and separated into three experimental replicates and ground with liquid nitrogen and stored at −80 • C. Peel, pulp and seed samples were freeze dried with a Lab Conoco Freezone 2.5 L freeze dryer (Kansas City, MO, USA) for five days until completely dry. Approximately ten milligrams of sample were weighed into 2 mL centrifuge tubes and exact weights were recorded for dry weight quantification. The extraction was done according to De Vos et al. [20]. Briefly, 1 mL of a solution of 0.1% formic acid in methanol was added to 10 mg of dry sample. Samples were vortexed and sonicated then centrifuged and dried in a Thermo Fisher Speed Vac Concentrator (Hanover Park, IL, USA). The dry extract was reconstituted in 1 mL of methanol and purified with SPE C18 cartridges from Restek (Bellefonte, PA, USA) and filtered with 0.22 µm, 13 mm diameter nylon syringe filters. Filtered extracts were transferred to LC-MS analytical vials and stored at −80 • C until analysis.

Analysis of Flavonoids with LC-MS/MS and Peak Detection and Quantification
Flavonoid analysis was performed with a Thermo TSQ Quantum Ultra triple quadrupole electrospray ionization coupled with a mass spectrometer, with an Accela 1250 quaternary UHPLC pump (LC-MS/MS) (Waltham, MA, USA). The system was equipped with an autosampler, a drawer kept at 10 • C and a nitrogen degasser. Flavonoids were separated on a Phenomonex Gemini C18 reverse phase column (3 µm 150 × 3 mm; Torrance, CA, USA). The column was kept at room temperature through the entire program. The gradient elution program was arranged with two eluants: eluant A, 0.1% formic acid in HPLC water, and eluant B, 0.1% formic acid in acetonitrile. The injection volume was 5 µL. The 45-min program was as follows: 0-20 min, 5-75% B, 25-26 min, ramped to 95% B; 26-33 min, 95% B with a flow of 0.2 mL min −1 . The mass spectrometry parameters were set as follows: spray voltage 3500 V (positive mode) and 2500 V (negative mode), sheath gas at 45 Arb, aux gas at 20 Arb, sweep gas at 1 Arb; CID gas at 1.5 mTorr, ion transfer tube temperature set at 235 • C, and vaporizer temperature set at 275 • C.
Flavonoid standards were infused individually into the mass spectrophotometer system to determine the correct m/z values for product ions. Standards were injected individually into the LC-MS/MS system to determine correct retention times. Retention times and m/z values were input into the Trace Finder software system (Thermo Scientific) for semi-automatic identification of compounds in samples. Standard mixtures of known concentrations were injected ten times to determine limits of detection (LOD) and limits of quantification (LOQ). Each compound was considered to be detected at 3:1 and quantified at a 10:1 signal to noise ratio [21]. Flavonoids in samples were quantified as the area under the curve relative to the area under the curve of the internal standard catechin. The quantification was calculated by dividing the relative area by the recorded dry weight for each sample, giving micrograms of compound per milligram of dry weight sample.

GWAS
High-density SNP genotype data was generated with Axiom TM Citrus 56AX (Affymetrix, Inc., Santa Clara, CA, USA) (58 K autosomal and 500 Chloroplast SNPs) developed by University of California, Riverside [22]. Stringent PolyHighResolution (PHR) loci were classified by Axiom TM Analysis Suite v1.1.1.66 (Affymetrix, Santa Clara, CA, USA). The genotyping data were composed of a total number of 51,297 nuclear SNPs distributed across all chromosomes. SNPs were filtered by call rate and minor allele frequency and linkage disequilibrium (LD) pruning using SNP and Variation Suite (SVS) V_8.6.0 (SVS, Golden Helix, Inc., Bozeman, MT, USA, http://www.goldenhelix.com (accessed on 17 July 2018)). The total number of SNPs was filtered with a call rate (CR > 0.95) and a minor allele frequency (MAF > 0.05) to yield 21,451 SNPs. The MAF and CR filtered data were LD pruned with a threshold of 0.5 calculated with the composite haplotype method (CHM) and data were LD pruned leaving 15,913 SNPs with low pairwise LD [23]. Population structure was determined using principal component analysis (PCA; Figure 1). An identity by descent (IBD) kinship matrix was created using Efficient Mixed-Model.
automatic identification of compounds in samples. Standard mixtures of known concen-trations were injected ten times to determine limits of detection (LOD) and limits of quantification (LOQ). Each compound was considered to be detected at 3:1 and quantified at a 10:1 signal to noise ratio [21]. Flavonoids in samples were quantified as the area under the curve relative to the area under the curve of the internal standard catechin. The quantification was calculated by dividing the relative area by the recorded dry weight for each sample, giving micrograms of compound per milligram of dry weight sample.

GWAS
High-density SNP genotype data was generated with Axiom TM Citrus 56AX (Affymetrix, Inc., Santa Clara, CA, USA) (58 K autosomal and 500 Chloroplast SNPs) developed by University of California, Riverside [22]. Stringent PolyHighResolution (PHR) loci were classified by Axiom TM Analysis Suite v1.1.1.66 (Affymetrix, Santa Clara, CA, USA). The genotyping data were composed of a total number of 51,297 nuclear SNPs distributed across all chromosomes. SNPs were filtered by call rate and minor allele frequency and linkage disequilibrium (LD) pruning using SNP and Variation Suite (SVS) V_8.6.0 (SVS, Golden Helix, Inc., Bozeman, MT, USA, http://www.goldenhelix.com (accessed on 17 July 2018)). The total number of SNPs was filtered with a call rate (CR > 0.95) and a minor allele frequency (MAF > 0.05) to yield 21,451 SNPs. The MAF and CR filtered data were LD pruned with a threshold of 0.5 calculated with the composite haplotype method (CHM) and data were LD pruned leaving 15,913 SNPs with low pairwise LD [23]. Population structure was determined using principal component analysis (PCA; Figure 1). An identity by descent (IBD) kinship matrix was created using Efficient Mixed-Model. Association Expedited (EMMAX) [24]. The association analysis was performed for each flavonoid compound measured in each fruit sample using the Clementine mandarin genome v1.0 as a reference (https://phytozome.jgi.doe.gov (accessed on 17 July 2018)). Twenty-eight of the 32 compounds that Association Expedited (EMMAX) [24]. The association analysis was performed for each flavonoid compound measured in each fruit sample using the Clementine mandarin genome v1.0 as a reference (https: //phytozome.jgi.doe.gov (accessed on 17 July 2018)). Twenty-eight of the 32 compounds that were chosen for GWAS were successfully quantified in mandarin accessions. Using the software SNP and Variation Suite (SVS), a mixed linear model (MLM) was used to make associations and population structure was corrected using a kinship matrix and PCA as covariates in MLM model. Manhattan plots were created using the −log10 p values for all LD-pruned SNPs used in the study. Q-Q plots were produced by plotting "expected −log10 p values" on the x-axis and "observed −log10 p values" on the y-axis. The effective number of independent SNPs was calculated using the simple m software (http://simplem. sourceforge.net (accessed on 17 July 2018)) [25] in R (R Core Team 2018). Significant associations were made when false discovery rate (FDR) was less than 1.23 × 10 −5 or −log10 (p values) were greater than 4.9. Subsequently, a sub-network enrichment analysis (SNEA) in Pathway Studio [26] was done using the default settings. SNEA uses a global expression regulatory network extracted from the entire PubMed database and full-text journals to extract regulatory networks. Using the non-parametric Mann-Whitney test, SNEA identified significant (p ≤ 0.05), or over-represented (p ≤ 0.05) ontologies. The SNEA determined homologous genes and their specific involvement in biological processes. Genes were considered candidates if annotations were identified as involved in flavonoid related biological processes.

Analysis of 28 Target Flavonoids among Diverse Mandarin Accessions
Twenty-eight target compounds were quantified in at least one sample type of the mandarin accessions, and 25 compounds were quantified in all samples ( Table 1). The compounds quercetin, rhoifolin, taxifolin, and scutellarein were not detected in any mandarin accession. The mean, minimum, and maximum concentrations of each compound varied depending on the compound and the sample ( Table 1). The minimum value zero was considered to be below the detection limit for the LC-MS/MS instrument. Most compounds had minimum values below the detection limit represented by a 0 µg/g value (Table 1). Coumarin was not detected in seed samples for any accession. Kaempferol was only detected in seed samples, while umbelliferone was only detected in peel samples. Hesperidin, nobiletin, and tangeretin were detected in all accessions and samples types indicated by minimum values greater than 0 µg/g (Table 1). Neodiosmin was detected in seed of all accessions (min = 0.030 µg/g) but was not detected in peel and pulp samples of some accessions (Table 1). Limonin and heptamethoxyflavone were detected in pulp and seed of all accessions but were below detectable levels for some accessions in peel samples. Didymin was detected in pulp of all accessions (min = 0.0050 µg/g) but had below detectable levels for peel and seed in some accessions (Table 1).

GWAS
GWAS was performed with a subset of SNPs that were LD pruned and filtered with a CR > 0.9 and MAF> 0.05, thus generating 15,913 SNPs that were used for analysis. GWAS was performed for each compound detected in each sample resulting in a total of 420 SNPs (Table S2, List of GWAS determined SNPs in mandarin peel, pulp, and seed samples). In total, fifty-three Manhattan plots showing significant SNPs were generated with Q-Q plots that show model fitness ( Figure S1, Genome Wide Association Manhattan and Quantile-Quantile Plots Presented by Compound for Significant SNPS). Significant SNPs were found on all nine chromosomes with the greatest number (75)

GWAS
GWAS was performed with a subset of SNPs that were LD pruned and filtered with a CR > 0.9 and MAF> 0.05, thus generating 15,913 SNPs that were used for analysis. GWAS was performed for each compound detected in each sample resulting in a total of 420 SNPs (Table S2, List of GWAS determined SNPs in mandarin peel, pulp, and seed samples). In total, fifty-three Manhattan plots showing significant SNPs were generated with Q-Q plots that show model fitness ( Figure S1, Genome Wide Association Manhattan and Quantile-Quantile Plots Presented by Compound for Significant SNPS). Significant SNPs were found on all nine chromosomes with the greatest number (75)    Distribution of compounds and numbers of significant SNPs can be visualized by sample type (Figure 3). In pulp, diosmin had the most and diosmetin, eriodictyol, isosakurenetin, and naringenin had the least number of associated SNPs ( Figure 3A). Eriocitrin in seed sample had the highest number, however diosmetin, isoakurenetin, naringenin, narirutin, neoeriocitrin, and neohesperidin had similar numbers of associated SNPs ( Figure 3B). Hesperetin and hesperidin had the lowest number of significant SNPs in seed sample ( Figure 3B). Diosmetin and sinensetin had the greatest and, narirutin and nomilin had the least, number of associated SNPs in peel sample ( Figure 3C). kurenetin, and naringenin had the least number of associated SNPs ( Figure 3A). Eriocitrin in seed sample had the highest number, however diosmetin, isoakurenetin, naringenin, narirutin, neoeriocitrin, and neohesperidin had similar numbers of associated SNPs (Figure 3B). Hesperetin and hesperidin had the lowest number of significant SNPs in seed sample ( Figure 3B). Diosmetin and sinensetin had the greatest and, narirutin and nomilin had the least, number of associated SNPs in peel sample ( Figure 3C).  A summary of the most significant SNPs per compound in each sample also shows a distribution over 9 chromosomes ( Table 2). The most significant SNP, associated with kaempferol in seed sample, had a p-value of 4.51 × 10 −21 and was located on chromosome 4 ( Table 2). The greatest number of compounds had their most significant SNP located on chromosome 6 and included apignein, didymin, diosmetin, eriodictyol, hesperetin, limonin, luteolin, naringenin, naringin, nomilin, and poncirin ( Table 2). Chromosome 3 had nine of the most associated SNPs found for the compounds apigenin, eriocitrin, isosakurenetin, isosinensetin, naringin, neoeriocitrin, nomilin, and sinensetin. Eight compounds (diosmetin, hesperetin, isosinensetin, naringenin, narirutin, neohesperidin, nobiletin, and poncirin) had SNPs with the highest significance associated on chromosome 5 ( Table 2). Five of the most significant SNPs were found on chromosome 2 and were associated with diosmin, isosakurenetin, luteolin, narirutin, and nobiletin ( Table 2). The chromosomes that had the most significant SNPs associated with four compounds were chromosome 1 (apigenin, diosmetin, diosmin, heptamethoxyflavone) and chromosome 4 (heptamethoxyflavone, kaempferol, limonin, neohesperidin) ( Table 2). The three compounds neohesperidin, poncirin, and sinensetin were most associated with SNPs on chromosome 9. Chromosome 7 and 8 each had one significant SNP associated with eriodictyol and diosmin respectively ( Table 2). The compounds luteolin, naringenin, and sinensetin had the most significantly associated SNPs, AX-160742943, AX-160808091, and AX-160947519 respectively found in peel and pulp samples ( Table 2). SNPs associated with didymin in peel and seed samples had the same SNP (AX-160548920) and highest p-value (Table 2).

Candidate Genes
Four candidate genes were selected based on their involvement in flavonoids regulation and biosynthesis in maize, rice, and arabidopsis (Table 3). Candidate genes located on chromosomes 1 and 2, using the Clementine mandarin genome v1.0 as a reference, were significantly associated with heptamethoxyflavone and naringenin in seed samples. The candidate genes found on chromosomes 6 and 3 were associated with naringenin and didymin in peel samples. Gene locations, chromosome number, annotation description, pathway affiliation, Arabidopsis gene ID, predicted citrus ID, and references for flavonoid affiliation are listed in Table 3.

Discussion
Genome-wide association is a powerful tool that allows plant breeders to dissect complex quantitative traits and provides higher genetic resolution than bi-parental quantitative trait loci (QTL) mapping studies [33]. Citrus breeding is a long-term endeavor, and GWAS provides the potential for genetic marker development based on genome locations associated with difficult to measure traits, that could in turn, accelerate citrus breeding progress. Difficult to measure traits have been the target of GWAS studies in many crops [34][35][36]. Most noteworthy is the use of GWAS for complex nutritional traits, such as flavonoid contents; for example, GWAS was used to identify genes or QTLs associated with total flavonoid content in barley [37], sorghum [38], and rice [39]. This current study is unique in that individual flavonoids of citrus were phenotyped in each sample using a targeted LC-MS/MS approach for their identification and quantification. To our knowledge this is the first study to use a targeted flavonoid (measuring individual compounds) and GWAS approach to uncover associations.
The variation of the flavonoid concentrations that was measured was considerable, with wide ranges from below the limit of detection to over 20,000 µm/g for dry weight sample. Similar ranges were found for targeted flavonoids in juice and edible parts of citrus fruits [40]. For some compounds, the concentrations were not below the detection limit (e.g., tangeretin), however for those compounds no or a few significant SNPs were found (Tables 1 and 2). This may be due to inadequate variation of the compound in the population, a similar issue is discussed by Caballero et al. [41]. In total we identified 420 significant SNPs associated with 28 flavonoids in peel, pulp, and seed samples. A range in numbers of significant SNPs was found for flavonoids in each sample ( Table 2). A study in corn also found variation in significant SNP numbers for two nutritional traits, zinc and iron content [42].
In mandarin pulp samples there were 139 GWAS-identified SNPs associated with 17 compounds (apigening, diosmetin, diosmin, eriodictyol, heptamethoxyflavone, hesperetin, hesperidin, isosakurenetin, isosinensetin, limonin, naringenin, naringin, neohesperidin, nobiletin, nomilin, poncirin, and sinensetin). Mandarin pulp is the part of the fruit that is usually consumed and many compounds with significant SNP associations have been shown to have human health benefits [43]. One of these compounds, nobiletin, had nine significantly associated SNPs and was found at high levels in pulp of some but not all accessions. Nobiletin is a polymethoxylated flavone, which is a class of flavonoids with high absorption and bioactivity [44]. Nobiletin has been extensively studied and has been found to have bioactivities that include inhibiting tumor cell growth, antioxidant activity, inhibition of inflammation, and prevention of cardiovascular diseases. The other flavonoid compounds significantly associated with SNPs have bioactivities similar to nobiletin in pulp [45]. The prevalence of nobiletin in mandarin pulp samples, and its high human absorption, makes it an excellent marker development candidate for mandarin phytonutrient improvement.
Mandarin peel and seed samples had 85 and 196 SNPs significantly associated with targeted flavonoid concentration, respectively. Peel sample had the highest amounts and greatest diversity of compounds of all the sample types tested. Mandarin peel is edible and sometimes used in teas, as an ingredient in some dishes, or as an ingredient in traditional Asian herbal medicines, however, it is not typically consumed. Breeding mandarin varieties with healthful flavonoid compounds in peel and seed may be of use to processors that are interested in high levels of a particular compound to use for processed products (e.g., supplements or powders). In arabidopsis, flavonoid compounds have been shown to elicit plant defense responses against plant pathogens and diseases [46]. In citrus high flavonoid accumulation of tangeretin and naringin inhibited the growth of the fungus Penicillium digitatum on citrus peels [47]. Thus, it is possible that marker development and breeding for increased flavonoid content in peel sample may contribute to resistance for fungal diseases such as citrus black spot (Guignardia citricarpa), a new problem for fresh citrus growers. It is also important to mention that the significant SNPs found in peel, pulp, and seed sample need to be tested in a segregating population to confirm associations found in this study for flavonoids compounds.
Four candidate genes were found to be involved in flavonoid biosynthesis. The first of these, MYB96, codes for a transcription factor in arabidopsis ( Table 3). The MYB96 transcription factor was found to regulate stress responses and flavonoid biosynthesis in a rice model [27]. Recently in citrus, MYB family members were found to regulate flavonoid production [48]. The predicted MYB96 location in the Clementine genome is on chromosome 3 at LOC18048626 (Table 3). There were other SNPs that mapped to MYB transcription factors, however enrichment analysis showed no association between other MYB genes and flavonoid biosynthesis. The two genes, MPL12.14 and AT3G29635, were also found to be involved in flavonoid biosynthesis from the enrichment analysis. However, there were no studies in the literature that directly related MPL12.14 and AT3G29635 to flavonoid accumulation. The gene MPL12.14 was found in arabidopsis to be involved with the synthesis of phenylpropanoids for the construction of cell wall components [28]. The gene AT3G29635 was found to be involved with anthocyanin acyltransferases and phenylpropanoid regulation in Arabidopsis [30]. It may be that MPL12.14 and AT3G29635 are predicted to be involved in regulation of flavonoids, but there have not been any studies to validate their role in citrus. The homologous genes for MPL12.14 and AT3G29635 in Citrus clementina are on chromosomes 2 and 1 at locations LOC18053255 and LOC18055186 respectively. The fourth candidate gene found was CRTISO which was mapped to LOC18038313 in the Citrus clementina genome and regulates carotenoid synthesis. In addition, CRTISO was found to be involved in flavonoid biosynthesis and regulation in arabidopsis and maize [31,32].
In other citrus flavonoid studies, genes such as chalcone synthase (CHS) and chalcone isomerase, that are upstream in the biosynthetic pathway of a given flavonoid, have been identified and shown to regulate flavonoid biosynthesis [49,50]. Citrus contains many unique flavonoids and genetic regulators of downstream products of the flavonoid pathways that have not been identified because many of them do not exist in model plants.
More studies with citrus mutants are needed to elucidate genes in the flavonoid pathway that are responsible for the synthesis of unique citrus flavonoids. Flavonoids not only play an important role in optimizing human health but are also linked to plant disease and pest tolerance. Recently Killiny et al. [51] found that citrus cultivars with greater levels of volatile and non-volatile compounds may have greater tolerance to Huanglongbing (HLB). The relationship between HLB tolerance and flavonoid contents have yet to be elucidated. However, there are studies that show there are higher levels of flavonoids in fruits from HLB-infected trees [52]. Flavonoid association studies can play an important role in breeding for superior plant and human health. Genetic components have an important role in the regulation of citrus flavonoids, and this foundational study can be used to further validate and investigate regulation of the citrus flavonoid pathways.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/plants11030317/s1, Figure S1: Genome Wide Association Manhattan and Quantile-Quantile Plots Presented by Compound for Significant SNPS; Table S1: Diverse mandarin accessions used for the genome wide association of healthful flavonoids; Table S2: List of GWAS determined SNPs in mandarin peel, pulp, and seed.