Genetic Diversity of Blueberry Genotypes Estimated by Antioxidant Properties and Molecular Markers

Blueberries (Vaccinium spp.) have gained much attention worldwide because of their potential health benefits and economic importance. Genetic diversity was estimated in blueberry hybrids, wild clones and cultivars by their antioxidant efficacy, total phenolic and flavonoid contents, and expressed sequence tag–simple sequence repeat (EST–SSR), genomic SSR (G–SSR) and expressed sequence tag–polymerase chain reaction (EST–PCR) markers. Wide diversity existed among the genotypes for antioxidant properties, with the highest variation for DPPH radical scavenging activity (20-fold), followed by the contents of total flavonoids (16-fold) and phenolics (3.8-fold). Although a group of 11 hybrids generated the maximum diversity for antioxidant activity (15-fold), wild clones collected from Quebec, Canada, had the maximum variation for total phenolic (2.8-fold) and flavonoid contents (6.9-fold). Extensive genetic diversity was evident from Shannon’s index (0.34 for EST–SSR, 0.29 for G–SSR, 0.26 for EST–PCR) and expected heterozygosity (0.23 for EST–SSR, 0.19 for G–SSR, 0.16 for EST–PCR). STRUCTURE analysis separated the genotypes into three groups, which were in agreement with principal coordinate and neighbour-joining analyses. Analysis of molecular variance suggested 19% variation among groups and 81% among genotypes within the groups. Clustering based on biochemical data and molecular analysis did not coincide, indicating that the loci conferring antioxidant properties are randomly distributed in the blueberry genome. However, stepwise multiple regression analysis (SMRA) revealed that 17 EST–SSR, G–SSR and EST–PCR markers were associated with antioxidant properties. The study is valuable to breeding and germplasm conservation programs.


Introduction
Blueberry is an economically and medicinally high-value crop that belongs to the genus Vaccinium L., which contains about 400-500 species native to countries all over the world except Antarctica and Australia [1,2]. The five major groups of blueberries grown commercially include (1) lowbush (V. angustifolium Ait.; 2n = 4x = 48), (2) highbush (V. corymbosum L.; 2n = 4x = 48), (3) half-high (the product of hybridization between highbush and lowbush blueberries), (4) southern highbush (hybrids between V. corymbosum and mostly V. darrowi Camp and/or V. ashei Reade) and (5) rabbiteye (V. virgatum Ait.; syn. V. ashei; 2n = 6x = 72) [3]. Blueberries are consumed fresh or in other commercially processed forms mainly for their high antioxidant activity, which fights off harmful radicals in the body. The high antioxidant activity is due to high concentrations of anthocyanins, flavonoids and phenolic acids. These phenolic compounds are linked to an improvement of night vision, prevention of macular degeneration, anticancer activity and reduction in heart disease [4,5]. The clinical benefits of blueberry are not limited to its fruits; blueberry leaves have also been shown to possess antidiabetic [6] and antimicrobial activities [7] and have been used as a traditional remedy for the treatment of diabetic symptoms [8,9].
The lowbush blueberry cultivar Fundy (FUN), a selection from open-pollinated seedlings of cultivar Augusta, was developed at Kentville Research and Development Centre, AAFC, NS, Canada [24]. The parentages of highbush blueberry cultivar Polaris (POL) and four half-high blueberry cultivars, Patriot (PAT), Chippewa (CHIP), St. Cloud (STC) and Northblue (NOB), are presented in Table 1. The lowbush blueberries were less than 0.5 m in height, and the highbush blueberries were 2-2.5 m tall. Half-high and hybrid blueberries used in the study were of intermediate height, between lowbush and highbush blueberries.
All genotypes were grown and maintained in a greenhouse in 6 L plastic pots containing a 2:1 peat:perlite mixture, under natural light conditions (maximum PPF 90 µmol m⁻² s⁻¹) at 20 ± 2 °C and 85% relative humidity. Normal cultural practices were followed to maintain the plants [10]. Fresh young leaves were collected in three replicates from each plant, shock-frozen in liquid nitrogen, and stored at −80 °C until DNA and phenolic extraction for molecular and biochemical analyses, respectively.

Preparation of Leaf Extracts
Briefly, 500 mg of shock-frozen leaves of each genotype was homogenized in a FastPrep-24 Tissue and Cell Homogenizer (MP Biomedicals, Irvine, CA, USA) containing a solution of 80% aqueous acetone and 0.2% formic acid (1:4 g/mL) [28,29]. The homogenate was kept at 4 °C with slow agitation for 30 min, followed by centrifugation at 20,000× g at 4 °C for 15 min using an Allegra 64R (Beckman Coulter Inc., Palo Alto, CA, USA), and the supernatant was collected. Extraction was performed twice more with the pellets, and the supernatants were combined with the original crude extract. The extracts were stored in an ultralow freezer (Thermo Scientific, Burlington, ON, Canada) for further determination of antioxidant capacity and total phenolic and flavonoid contents. All chemical analyses were conducted three times with each sample, and mean values were used for analysis.

Total Antioxidant Activity (TAA)
Free radical scavenging activity was estimated as percentage inhibition of radicals. 2,2-Diphenyl-1-picrylhydrazyl (DPPH) is an artificially stabilized free radical [29]. An aliquot of diluted extract or gallic acid standard solution (5 mg/mL; ≥98% purity) was added to 1.7 mL DPPH methanolic solution (0.06 mM), mixed thoroughly, and kept in the dark for 20 min at room temperature. The absorbance of the mixture was measured at 517 nm using an Ultrospec 4300 Pro UV-visible spectrophotometer (Amersham Biosciences Corp., San Francisco, CA, USA). The blank was prepared using aqueous acetone (80%) mixed with the DPPH solution. A gallic acid standard curve was prepared and was linear (r² = 0.98) in the range of 20-80 µg/mL. The results were expressed as milligrams of gallic acid equivalents (GAE) per gram of fresh leaf (mg GAE/g fl). The following formula was used to calculate percentage inhibition [30], where $A$ denotes absorbance:
$$\%\ \text{Radical scavenging activity} = \frac{A_{\text{blank}} - A_{\text{extract}}}{A_{\text{blank}}} \times 100$$
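The percentage-inhibition formula can be illustrated with a minimal Python sketch. The function name and the absorbance readings below are hypothetical and are not taken from the study.

```python
def radical_scavenging_percent(abs_blank: float, abs_extract: float) -> float:
    """Percentage inhibition of the DPPH radical from two absorbance readings.

    abs_blank:   absorbance of the blank (80% acetone + DPPH solution)
    abs_extract: absorbance of the extract + DPPH mixture
    """
    if abs_blank <= 0:
        raise ValueError("blank absorbance must be positive")
    return (abs_blank - abs_extract) / abs_blank * 100.0


# Hypothetical readings at 517 nm, for illustration only.
inhibition = radical_scavenging_percent(abs_blank=0.80, abs_extract=0.20)
print(f"{inhibition:.1f}% inhibition")
```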

Total Phenolic Content (TPC)
An optimized Folin-Ciocalteu method [28,31] was used to estimate total phenolic content. Folin-Ciocalteu reagent (100 µL) was added to the diluted leaf extract (100 µL; 200 µg/mL), and 200 µL of 20% saturated (w/v) sodium carbonate was added after 5 min, followed by 1.5 mL distilled water. The mixture was incubated in the dark for 35 min at room temperature and centrifuged at 4000× g for 10 min in the Allegra 64R. The absorbance of the gallic acid standard solution (5 mg/mL) and the test samples was measured with the Ultrospec 4300 Pro at 725 nm after 3 min. The standard calibration curve for the gallic acid standard solution (5 mg/mL; ≥98% purity) was linear (r² = 0.98) in the range of 2.5-10 µg/mL, and the results are presented as mg GAE/g fl.

Total Flavonoid Content (TFC)
Total flavonoid content was estimated using a colorimetric assay [32] following Goyali et al. [28]; 500 µL of sample extract was added to 2 mL distilled water, followed by 150 µL of 5% (w/v) sodium nitrate, to which 150 µL of 10% (w/v) aluminum chloride was added after 5 min. Then, 1 M sodium hydroxide solution (1 mL) was added to the mixture after 6 min of incubation at room temperature. The mixture was diluted with 1.2 mL distilled water, and the absorbance was measured at 510 nm using the Ultrospec 4300 Pro. Catechin solution (1 mg/mL) was used in a range of 20-200 µg/mL for standard curve calibration (r² = 0.99), and the total flavonoid content was calculated as milligrams of catechin equivalents per gram of fresh leaves (mg CE/g fl).
The PCR was carried out in an optimized amplification reaction mixture (25 µL) containing 20 ng of template DNA, 1× PCR buffer (1.5 mM MgCl2, pH 8.7; Qiagen), 200 µM of each deoxynucleotide triphosphate (dNTP), 0.2 µM of each of the 20 forward and reverse primers and 0.63 unit of Taq DNA polymerase (Qiagen). A Mastercycler ep Gradient S (Eppendorf AG, 22331 Hamburg, Germany) was used to amplify DNA; it was programmed for a 10-min initial "hot start" denaturation step at 94 °C, and then 40 cycles of a 40-s denaturation step at 92 °C, a 70-s annealing step at an appropriate annealing temperature and a 2-min extension step at 72 °C. The final extension step was at 72 °C for 10 min, and then the sample was held at 4 °C. The amplified DNA products were separated by electrophoresis using a 2% agarose 3:1 high-resolution blend (HRB; Amresco, Solon, OH) gel, precast with 2× tris-borate ethylene diamine tetraacetic acid buffer and 1× GelRed nucleic acid stain (Biotium Inc., Hayward, CA 94545, USA) solution, along with a low range 100 base pair (bp) DNA ladder and a midrange 1 kb DNA ladder (Norgen Biotek Corp., Thorold, ON, Canada). Banding patterns were visualized under UV light, scored, and recorded in a transilluminating gel documentation system (InGenius 3, Syngene, Beacon House, Cambridge, UK). The length of the DNA fragments was calculated by Gene Tools software (Syngene) by comparison with standard size marker mobility [22].

Data Collection and Statistical Analysis
Data for antioxidant activity, phenolic, and flavonoid contents are presented as mean value ± standard deviation (SD) of three replications. The TAA results among groups were statistically evaluated by analysis of variance (ANOVA), and Tukey's test was employed for comparing treatment means at a significance level of p ≤ 0.05. Because the residuals of TPC and TFC among groups followed a non-normal distribution, a violation of one of the preconditions of ANOVA, the observations were statistically evaluated by a nonparametric Kruskal-Wallis test [33], with the significance value fixed at ≤0.05. Similarly, because the residuals of TAA, TPC, and TFC among the individuals followed a non-normal distribution, these observations were also evaluated by the Kruskal-Wallis test, with the significance value fixed at ≤0.05. The correlation coefficient (r), coefficient of determination (r²), and linear regression between TPC and TFC, TPC and TAA, and TFC and TAA were analyzed at a confidence level of 95%. To eliminate the effects of different scales of measurement, the biochemical data were standardized by subtracting mean values from the original values, followed by division by the SD [34]. A Euclidean dissimilarity distance matrix was generated using these standardized values. The agglomerative hierarchical clustering (AHC) method, an algorithm that groups individuals according to the dissimilarities between them, was used to generate an unweighted pair group method with arithmetic mean (UPGMA) dendrogram based on the Euclidean dissimilarity distance matrix of antioxidant activity and phenolic and flavonoid content data. Principal component analysis (PCA), a multivariate technique that analyzes data matrices of several correlated quantitative dependent variables, was performed for individuals, based on biochemical data, as well as for biochemical components [35]. The above analyses were performed using XLSTAT 2020 version 22.3.21.0 (©Addinsoft, New York, NY 10001, USA).
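The standardization and distance steps can be sketched in a few lines of Python. This is only an illustration, not the XLSTAT procedure itself: the genotype names and trait values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not state which SD was used.

```python
import math
import statistics


def standardize(column):
    """Z-score a list of values: subtract the mean, then divide by the SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1); an assumption
    return [(x - mean) / sd for x in column]


def euclidean_matrix(rows):
    """Pairwise Euclidean distances between rows of standardized traits."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical genotypes with (TAA, TPC, TFC) values, for illustration only.
data = {
    "G1": (5.8, 30.0, 9.9),
    "G2": (0.3, 12.0, 1.0),
    "G3": (4.2, 20.0, 2.8),
}
names = list(data)
# Standardize each trait (column) separately, then rebuild per-genotype rows.
columns = [standardize([data[g][k] for g in names]) for k in range(3)]
rows = [tuple(col[i] for col in columns) for i in range(len(names))]
dist = euclidean_matrix(rows)

for name, row in zip(names, dist):
    print(name, [round(d, 2) for d in row])
```

A matrix of this kind is the input that an average-linkage (UPGMA) clustering routine would consume.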
EST-SSR, G-SSR, and EST-PCR markers, which can discriminate between homozygous and heterozygous individuals, are codominant in nature. In polyploid plants like blueberry, it is very difficult to perform codominant scoring of alleles in heterozygote samples as it is extremely difficult to calculate the number of alleles present at a particular locus from band intensities; the only suggested way of scoring is to record the absence or presence of an allele denoted as 0 or 1, respectively, in a matrix, and all bands from one primer are treated as alleles and one locus [36,37].
Indices, including polymorphic information content (PIC) [38], effective multiplex ratio (EMR) [39], discrimination power (D) [40] and resolving power (R) [41], were calculated using the program Online Marker Efficiency Calculator (iMEC) [42]. These indices give an idea about the primer's ability to discriminate among genotypes and the primer system's overall utility. The PIC is a value that reflects the marker's ability to detect polymorphism within the population. PIC gives an approximation of a primer's discrimination capacity based on the number of alleles that are expressed and the respective allelic frequencies.
This was used to evaluate the level of informativeness of each primer (high, PIC > 0.5; moderate, 0.5 > PIC > 0.25; low, PIC < 0.25) [38]. The associated value EMR was calculated as the product of the total number of polymorphic loci and the proportion of polymorphic loci over the total number of loci [39]: $\mathrm{EMR} = n_p \cdot (n_p / n)$, where $n_p$ is the number of polymorphic loci and $n$ is the total number of loci. Therefore, the higher the value of EMR, the more competent the primer is. The marker index (MI) is another associated statistical parameter used to evaluate a marker system [39]. It was calculated as $\mathrm{MI} = \mathrm{PIC} \times \mathrm{EMR}$. Primer discrimination power, D, is defined as the probability that two randomly chosen individuals have different banding patterns and are, therefore, differentiable [40]. This was calculated as $D = 1 - C$, where C is the confusion probability. For the i-th pattern of the given j-th primer, present at frequency $p_i$ in a set of N individuals, $c_i = p_i (N p_i - 1)/(N - 1)$, and C is equal to the sum of all $c_i$ for all of the patterns generated by the primer: $C = \sum_i c_i = \sum_i p_i (N p_i - 1)/(N - 1)$. Resolving power, R, was calculated as $R = \sum I_b$, where $I_b$, the band informativeness, lies on a scale from 0 to 1 and is described as $I_b = 1 - (2 \times |0.5 - p|)$; p is the proportion of samples in which the band is observed. The resolving power, or the primer's capability to differentiate between genotypes, can thus be represented by the sum of these adjusted values for all generated bands [41].
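As a rough illustration of the EMR, MI, D, and R definitions (not the iMEC implementation), the sketch below computes them from a small binary band matrix. The scores and the PIC value passed to `marker_index` are hypothetical.

```python
from collections import Counter


def emr(n_polymorphic: int, n_total: int) -> float:
    """Effective multiplex ratio: n_p * (n_p / n)."""
    return n_polymorphic * n_polymorphic / n_total


def marker_index(pic: float, emr_value: float) -> float:
    """Marker index: PIC multiplied by EMR."""
    return pic * emr_value


def discrimination_power(patterns) -> float:
    """D = 1 - C, with C = sum_i p_i * (N * p_i - 1) / (N - 1)."""
    n = len(patterns)
    counts = Counter(patterns)  # how many individuals share each banding pattern
    # N * p_i equals the count k of individuals with pattern i.
    confusion = sum((k / n) * (k - 1) / (n - 1) for k in counts.values())
    return 1.0 - confusion


def resolving_power(band_matrix) -> float:
    """R = sum over bands of Ib, where Ib = 1 - 2 * |0.5 - p|."""
    n = len(band_matrix)
    total = 0.0
    for b in range(len(band_matrix[0])):
        p = sum(row[b] for row in band_matrix) / n  # proportion showing band b
        total += 1.0 - 2.0 * abs(0.5 - p)
    return total


# Hypothetical 0/1 scores: 4 genotypes (rows) x 3 bands (columns) for one primer.
scores = [(1, 0, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
n_poly = sum(1 for b in range(3) if len({row[b] for row in scores}) > 1)

print("EMR:", emr(n_poly, 3))
print("MI :", marker_index(0.30, emr(n_poly, 3)))  # 0.30 is a made-up PIC
print("D  :", discrimination_power(scores))
print("R  :", resolving_power(scores))
```

In this toy matrix the first band is present in every genotype, so it contributes nothing to R, while a band present in exactly half of the samples contributes the maximum of 1.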
To compare diversity among blueberry genotypes, indices such as percentage of polymorphic loci (PL), the observed (Na) and effective number of alleles (Ne) [43], expected heterozygosity/Nei gene diversity index (He) [44] and Shannon's information index of diversity (I) [45] were calculated using GenAlEx version 6.5 [46].
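For intuition about these indices, the following sketch computes the effective number of alleles, expected heterozygosity and Shannon's index for a single binary locus. It assumes that the band frequency p is treated directly as an allele frequency (with q = 1 − p); the exact estimator applied by GenAlEx to this data set is not stated in the text, so this should be read as an illustration of the index definitions rather than a reproduction of the reported values.

```python
import math


def locus_diversity(p: float):
    """Return (Ne, He, I) for a binary locus with band frequency p.

    Assumes p is used directly as an allele frequency, with q = 1 - p.
    """
    q = 1.0 - p
    homozygosity = p * p + q * q
    ne = 1.0 / homozygosity              # effective number of alleles
    he = 1.0 - homozygosity              # expected heterozygosity
    shannon = -sum(f * math.log(f) for f in (p, q) if f > 0)
    return ne, he, shannon


def percent_polymorphic(band_frequencies) -> float:
    """Percentage of loci whose band is neither fixed nor absent."""
    poly = sum(1 for p in band_frequencies if 0.0 < p < 1.0)
    return 100.0 * poly / len(band_frequencies)


# Hypothetical band frequencies for five loci in one population.
freqs = [0.5, 0.9, 1.0, 0.2, 0.0]
mean_he = sum(locus_diversity(p)[1] for p in freqs) / len(freqs)
print("PL (%):", percent_polymorphic(freqs))
print("mean He:", round(mean_he, 3))
```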
A Bayesian clustering approach for population structure analysis was used for 70 genotypes using the STRUCTURE program ver. 2.3.4 (https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html (accessed in September 2020)). The software uses Markov chain Monte Carlo (MCMC) estimation to determine the number of subpopulations (K) [47]. Under this model, a number of populations (K) were presumed to be present, each characterized by a set of allele frequencies at every locus. Genotypes in the sample were assigned to clusters (populations), or jointly to two or more populations if their genotypes indicated that they were admixed. Every locus was assumed to be independent, and each of the K populations was presumed to follow Hardy-Weinberg equilibrium. The ∆K method [48] was used to determine the most likely value of K. The number of genetically different clusters (K) was set to range from 1 to 10, with five independent runs, a burn-in length of 100,000 and 100,000 MCMC iterations. POPHELPER, an R package and web server (http://royfrancis.github.io/pophelper/, accessed on 1 September 2020), was used to estimate the number of population clusters and to visualize them [49].
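The ∆K statistic can be sketched as follows. This is a simplified illustration, not the POPHELPER implementation: it uses the mean log-likelihood across replicate runs to form the second-order difference, and the log-likelihood values are hypothetical rather than STRUCTURE output from the study.

```python
import statistics


def delta_k(log_likelihoods):
    """Compute delta-K for each K that has both neighbours (K - 1 and K + 1).

    log_likelihoods: dict mapping K to the list of L(K) values from replicate runs.
    delta-K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / SD of L(K).
    """
    mean_l = {k: statistics.mean(v) for k, v in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in mean_l and k + 1 in mean_l:
            second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            sd = statistics.stdev(log_likelihoods[k])
            if sd > 0:  # delta-K is undefined when replicate runs are identical
                result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods from five replicate runs per K.
runs = {
    1: [-5000.0, -5002.0, -4998.0, -5001.0, -4999.0],
    2: [-4800.0, -4803.0, -4797.0, -4801.0, -4799.0],
    3: [-4500.0, -4503.0, -4497.0, -4501.0, -4499.0],
    4: [-4490.0, -4495.0, -4485.0, -4492.0, -4488.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print("delta-K:", {k: round(v, 1) for k, v in dk.items()})
print("most likely K:", best_k)
```

Note that ∆K cannot be evaluated for the smallest and largest K tested, because both neighbouring values are needed.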
The software DARwin 6.0.9 [50] was used to depict phylogenetic trees with EST-SSR, G-SSR, and EST-PCR markers using unweighted neighbour-joining (NJ). Jaccard's coefficient [51] was used to calculate the dissimilarity matrix, with 30,000 bootstraps. GenAlEx version 6.5 [46] was used for principal coordinate analysis (PCoA) and hierarchical analysis of molecular variance (AMOVA). For AMOVA, the blueberry genotypes were divided into seven groups, out of which four groups were comprised of wild clones collected from four Canadian provinces (NL, PE, QC, and NB). Group 5 consisted of six blueberry cultivars, Group 6 of 11 hybrids from the first cross (HB1-11), and Group 7 of 17 hybrids from the second cross (HB12-28).
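Jaccard's dissimilarity between two binary band profiles can be sketched as below. The profiles are hypothetical, and returning 0 for two profiles without any bands is a convention chosen for this example; it is not necessarily how DARwin handles that case.

```python
def jaccard_dissimilarity(a, b) -> float:
    """1 - (bands shared by both) / (bands present in at least one)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here for two all-absent profiles
    return 1.0 - shared / either


# Hypothetical 0/1 band profiles for three genotypes.
profiles = {
    "A": (1, 1, 0, 0, 1),
    "B": (1, 0, 1, 0, 1),
    "C": (0, 0, 1, 1, 0),
}
names = list(profiles)
matrix = [[jaccard_dissimilarity(profiles[r], profiles[c]) for c in names]
          for r in names]
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

Joint absences (0/0) do not count toward similarity under this coefficient, which is one reason it is commonly applied to presence/absence marker data.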
The association between EST-SSR, G-SSR and EST-PCR markers and biochemical attributes of blueberry leaf extracts (TAA, TPC and TFC) was estimated by stepwise multiple regression analysis (SMRA) [54] using SPSS version 27 (IBM Corp., Armonk, NY 10504-1722, USA). The biochemical components were treated as dependent variables, and molecular markers were treated as independent variables. The F-value criterion for the inclusion or removal of independent variables in the regression was set between 0.045 and 0.099 [55].

Total Antioxidant Activity (TAA)
Total antioxidant activity was highly diverse (p < 0.015) in the present material (Table 3 and Table S2). The values varied 20-fold among all genotypes, with BC6 and BC22 having the highest TAA (5.82 ± 0.03 mg GAE/g fl), followed by BC13 (5.45 ± 0.09 mg GAE/g fl), while HB6 had the lowest (0.29 ± 0.10 mg GAE/g fl). Among all groups, the variation for TAA was the highest among genotypes in Cross 1 (15 times) and the lowest in Cross 2 (2.04 times), followed by cultivars (2.09 times). The eight NB wild clones had the highest average TAA value (4.29 ± 1.06 mg GAE/g fl) among all groups, followed by CVs (4.23 ± 1.10 mg GAE/g fl) and Cross 2 hybrids (4.08 ± 0.89 mg GAE/g fl) (Table 3). Hybrids in Cross 1 had the lowest average TAA value (2.55 ± 1.28 mg GAE/g fl) among all the groups. In NL clones, the highest TAA value was observed in genotype BC6 (5.82 ± 0.03 mg GAE/g fl) and the lowest in BC9 (2.45 ± 0.07 mg GAE/g fl). The TAA values for PE clones ranged from 1.14 ± 0.08 mg GAE/g fl for BC12 to 5.45 ± 0.09 mg GAE/g fl for BC13. In wild QC clones, the TAA was highest in BC22 (5.82 ± 0.03 mg GAE/g fl) and lowest in BC28 (1.14 ± 0.05 mg GAE/g fl). The values of TAA among NB wild clones ranged from 1.84 ± 0.05 mg GAE/g fl for BC29 to 5.13 ± 0.07 mg GAE/g fl for BC34. For cultivars, the highest TAA value was observed in highbush cultivar POL (5.19 ± 0.03 mg GAE/g fl), followed by FUN (5.12 ± 0.07 mg GAE/g fl), and the lowest in half-high cultivar STC (2.48 ± 0.08 mg GAE/g fl).
The value for TAA in Cross 1 ranged from 0.29 ± 0.10 mg GAE/g fl for HB6 to 4.48 ± 0.04 mg GAE/g fl for HB11 and, in Cross 2, it ranged from 2.52 ± 0.07 mg GAE/g fl for HB16 to 5.13 ± 0.05 mg GAE/g fl for HB21. One NL clone (BC6: 5.82 ± 0.03 mg GAE/g fl), one PE clone (BC13: 5.45 ± 0.09 mg GAE/g fl) and one QC clone (BC22: 5.82 ± 0.03 mg GAE/g fl) possessed more TAA than all cultivars (Table S2).
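The fold-variation figures quoted above follow directly from the reported extremes, as this short check illustrates. The ratio of group maximum to group minimum is assumed here to be the definition of fold variation, and the helper name is hypothetical.

```python
def fold_variation(highest: float, lowest: float) -> float:
    """Fold variation taken as the ratio of the highest to the lowest mean value."""
    return highest / lowest


# (highest, lowest) mean TAA values in mg GAE/g fl, as reported in the text.
taa_ranges = {
    "All genotypes": (5.82, 0.29),  # BC6/BC22 vs. HB6
    "Cross 1": (4.48, 0.29),        # HB11 vs. HB6
    "Cross 2": (5.13, 2.52),        # HB21 vs. HB16
    "Cultivars": (5.19, 2.48),      # POL vs. STC
}
for group, (high, low) in taa_ranges.items():
    print(f"{group}: {fold_variation(high, low):.2f}-fold")
```

The ratios come out at roughly 20, 15, 2.04 and 2.09, in line with the values given for all genotypes, Cross 1, Cross 2 and the cultivars.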

Total Flavonoid Content (TFC)
The results for TFC are displayed in Table 3 and Table S2. The wild clones, cultivars and hybrids showed wide variation (15.63 times) among themselves for TFC (p < 0.0003), ranging from 0.64 ± 0.02 mg CE/g fl for the cultivar NOB to 9.99 ± 0.11 mg CE/g fl for the NL wild clone BC2. Among all groups, the wild clones from NL had the highest average TFC (5.26 ± 2.49 mg CE/g fl), followed by PE (2.14 ± 0.79 mg CE/g fl) and QC (1.88 ± 1.59 mg CE/g fl) wild clones. The average values of TFC were the lowest in the two hybrid groups, Cross 1 (1.39 ± 0.18 mg CE/g fl) and Cross 2 (1.41 ± 0.29 mg CE/g fl). For NL clones, the TFC value was the highest in BC2 (9.99 ± 0.11 mg CE/g fl) and the lowest in BC8 (2.00 ± 0.01 mg CE/g fl). The values of TFC for PE clones ranged from 1.22 ± 0.02 mg CE/g fl for BC15 to 3.77 ± 0.09 mg CE/g fl for BC18. The highest and lowest values of TFC for QC clones were found in BC22 (5.56 ± 0.04 mg CE/g fl) and BC21 (0.81 ± 0.02 mg CE/g fl), respectively. TFC for eight NB wild clones ranged from 1.19 ± 0.02 mg CE/g fl (BC29) to 2.70 ± 0.05 mg CE/g fl (BC34). The value for TFC among cultivars was the highest for lowbush cultivar FUN (2.83 ± 0.02 mg CE/g fl) and the lowest for half-high cultivar NOB (0.64 ± 0.02 mg CE/g fl). The TFC values for Cross 1 and Cross 2 ranged from 1.04 ± 0.03 mg CE/g fl (HB1) to 1.71 ± 0.02 mg CE/g fl (HB3) and from 0.87 ± 0.04 mg CE/g fl (HB19) to 1.95 ± 0.03 mg CE/g fl (HB12), respectively. Eight NL clones (BC1-7, 10) were superior to the cultivars, with TFC ranging from 3.49 ± 0.03 mg CE/g fl to 9.99 ± 0.11 mg CE/g fl, and two further clones, one PE clone (BC18, 3.77 ± 0.09 mg CE/g fl) and one QC clone (BC22, 5.56 ± 0.04 mg CE/g fl), also had higher TFC than the cultivars (Table S2).

Relationship Among Antioxidant Properties
To evaluate the relationships between TAA, TPC and TFC, linear regression was performed. Significant relationships were observed between TAA and TPC (r² = 0.124, Figure 1a), TAA and TFC (r² = 0.149, Figure 1b), and TFC and TPC (r² = 0.682, Figure 1c). The Pearson correlation coefficient was the highest between TFC and TPC (r = 0.826), followed by TAA and TFC (r = 0.387) and TAA and TPC (r = 0.352) (Table S3).
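For a simple linear regression with one predictor, the coefficient of determination equals the square of the Pearson correlation coefficient. The sketch below implements Pearson's r in plain Python and checks that the reported r and r² pairs agree to within rounding; the function name and the sample data in the last lines are hypothetical.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Reported (r, r^2) pairs from the text.
reported = {
    "TFC vs TPC": (0.826, 0.682),
    "TAA vs TFC": (0.387, 0.149),
    "TAA vs TPC": (0.352, 0.124),
}
for pair, (r, r2) in reported.items():
    print(f"{pair}: r^2 from r = {r * r:.3f}, reported r^2 = {r2:.3f}")

# Hypothetical data showing the function on raw values.
tpc = [12.0, 18.5, 22.1, 30.4, 15.2]
tfc = [1.1, 2.4, 2.9, 5.6, 1.9]
print("example r:", round(pearson_r(tpc, tfc), 3))
```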

Analysis of Primer's Discriminatory Capacity
For 10 EST-SSR primers, the polymorphic information content (PIC) values varied from 0.03 for marker CA23 to 0.96 for CA112, with an average of 0.35 (Table 4), which suggested that CA112 is the most informative and CA23 the poorest among EST-SSR primer pairs. All other EST-SSR primers, except CA787 and NA961, fell into the moderate category, with PIC values between 0.25 and 0.5 [38]. While markers CA236 and NA961 had the highest effective multiplex ratio (EMR) value (1.80), CA112 was the poorest, with an EMR value of 0.08. The marker index (MI) ranged from 0.03 for CA23 to 0.65 for CA421. Discrimination power (D) was the highest for CA236 (0.91) and the lowest for CA23 (0.03). Resolving power (R) ranged from 0.03 (CA23) to 2.91 (CA421). CA23 is the poorest EST-SSR primer pair, with the lowest MI, D, and R values. The highest values for MI and R were observed in CA421, followed by CA236 (Table 4). CA236, however, had the best D value (0.91), followed by CA483 (0.90).
When eight EST-PCR primers were used, PIC was the highest for CA227 and CA1423 (0.37) and the lowest for NA27. The EMR values ranged from 0.86 (CA287) to 3.57 (CA54). MI was highest for CA227 (1.16) and lowest for NA27 (0.03). The trends for D and R values were similar. The highest value (0.96) for D was observed in CA21, CA791, and CA54, while NA27 attained the lowest value (0.03). The highest value (4.31) of R was observed in CA791, followed by CA54 (4.11), while the lowest was 0.03 for NA27 (Table 4).
Among the three groups of primers, the average values of EMR, MI, D, and R suggest that the EST-PCR primer system is more effective than the G-SSR and EST-SSR primer systems (Table 4).

Analysis of Population Genetic Diversity
For EST-SSR primers, the percentage of polymorphic loci (PL) among the seven populations ranged from 29% for cultivars to 79% for PE clones and hybrid group Cross 2, with a mean of 62% (Table 5). The observed (Na) and effective number of alleles (Ne) were also highest in hybrid group Cross 2 (Na = 10.08; Ne = 1.48) and lowest in cultivars (Na = 3.75; Ne = 1.30). Nei's gene diversity or expected heterozygosity (He) and Shannon's information index (I) were also highest in the Cross 2 group (He = 0.28, I = 0.42), while He was the lowest in cultivars and hybrid group Cross 1 (He = 0.17). Hybrid group Cross 1 had the lowest value for I (0.25), followed by cultivars (0.26). The average values for He and I were 0.23 and 0.34, respectively (Table 5).
For G-SSR primer pairs, the PL, like for EST-SSRs, was highest in hybrid group Cross 2 (64%) and lowest in cultivars (32%), with an average of 51%. The number of observed alleles was the highest in hybrid group Cross 2 (8.38), followed by Cross 1 (6.25). The QC clones had the lowest number of observed alleles (4.02) and the highest number of effective alleles (Ne = 1.44). The average number of alleles was found to be the lowest in the cultivar (Na = 2.97) and PE (Ne = 1.28) populations. The QC clones had the highest He (0.24) and I (0.35) values, followed by hybrid group Cross 2 (He = 0.21; I = 0.32). The He value was lowest in the Cross 1 group and PE wild clones (He = 0.17). Hybrid group Cross 1 also showed the lowest I value (0.24), followed by PE clones (0.25) (Table 5).
For the EST-PCR primer system, the values of PL (75.86%), Na (7.92), Ne (1.33), He (0.20), and I (0.31) were the highest for the HS2 population. The lowest level of polymorphism (46.55%) was observed in the NB population and cultivars. The number of alleles (Na = 2.75) was also lowest for cultivars. The effective number of alleles (Ne = 1.22) was minimum in the NL population. The NB population also had the lowest values of 0.14 and 0.21 for Nei's gene diversity or expected heterozygosity (He) and Shannon's information index (I), respectively (Table 5).

Unweighted Neighbour-Joining (NJ) Tree
The NJ analysis displays interindividual distances graphically. Blueberry genotypes were resolved with statistical confidence based on Jaccard's dissimilarity coefficients [51]. As in the STRUCTURE analysis (Figure S1), two main clusters are observed for EST-SSR primer pairs (Figure S4 Figure S1), with few exceptions. The lowbush wild clones BC35 and 36, which are part of Cluster 1 of the NJ tree (Figure S4), are grouped into Cluster 2 of the STRUCTURE analysis (Figure S1). The highbush cultivar Polaris and hybrid HB22 in Cluster 2 of the NJ tree (Figure S4) are part of Cluster 1 of the STRUCTURE analysis (Figure S1).

Principal Coordinate Analysis (PCoA)
PCoA revealed the genetic relationship of the 70 genotypes, which was in support of the Bayesian inferences from the STRUCTURE and unweighted neighbour-joining analyses for most of the genotypes. PCoA for EST-SSR (Figure S7) confirms the STRUCTURE (Figure S1) and NJ groupings (Figure S4), as most of the lowbush wild clones are on the left side of the axis (Cluster 2), except for BC1, BC10, BC17, BC26, BC32, and BC34. All six blueberry cultivars and all hybrids, except HB20 and 22, are placed on the right side of the axis (Cluster 1, Figure S4).
The separation of the majority of genotypes in PCoA for the G-SSR primers (Figure S8) is also aligned with the STRUCTURE (Figure S2) and NJ groupings (Figure S5). Most of the genotypes that are on the right side of the axis of the PCoA graph (Cluster 1, Figure S8) are also represented in Cluster 1 of the STRUCTURE analysis (Figure S2). The majority of these genotypes are also present in Cluster 1 of the NJ tree (Figure S5). Similarly, genotypes present on the left side of the PCoA graph are also assembled in Cluster 2 of the STRUCTURE and NJ groupings. The genotypes from Cluster 3 of the NJ tree (Figure S5) can be seen in the center of the PCoA graph, close to the central axis (Figure S8).
The PCoA of EST-PCR primers (Figure S9) resembles the results of STRUCTURE (Figure S3) and NJ analyses (Figure S6; see Tables 1 and 2 for genotype labels).

Analysis of Molecular Variance (AMOVA)
There were significant differences among the seven groups of blueberries (p ≤ 0.0001), demonstrating a high genetic diversity level. This was confirmed with relatively high values for total differentiation (PhiPT: 0.228, 0.182, 0.167, and 0.186 for EST-SSR, G-SSR, EST-PCR, and combined primer pairs, respectively) for all groups, showing little similarity among them. In the present study, AMOVA of EST-SSR, G-SSR, EST-PCR, and all primers combined showed a variance of 23%, 18%, 17%, and 19%, respectively, among the groups. The values for variation among genotypes within these groups were 77%, 82%, 83%, and 81%, respectively (Table 6).

Table 6. Genetic differentiation among blueberry genotypes by analysis of molecular variance (AMOVA) based on seven groups, where four groups were comprised of wild clones collected from four Canadian provinces, the fifth group had all six cultivars, the sixth group contained 11 hybrids from the first cross (HB1-11), and the seventh group contained 17 hybrids from the second cross (HB12-28).
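PhiPT is the proportion of the total estimated variance that lies among groups, so the reported percentages follow from the PhiPT values. The sketch below illustrates this relationship; the function names and the variance components passed to `phi_pt` in the last line are hypothetical.

```python
def phi_pt(var_among: float, var_within: float) -> float:
    """PhiPT: among-group variance as a fraction of the total variance."""
    return var_among / (var_among + var_within)


def percent_partition(phi: float):
    """Return (% among groups, % within groups), rounded to whole percentages."""
    among = round(phi * 100)
    return among, 100 - among


# PhiPT values reported in the text for each marker system.
reported_phi = {
    "EST-SSR": 0.228,
    "G-SSR": 0.182,
    "EST-PCR": 0.167,
    "Combined": 0.186,
}
for system, phi in reported_phi.items():
    among, within = percent_partition(phi)
    print(f"{system}: {among}% among groups, {within}% within groups")

# Hypothetical variance components, for illustration only.
print("example PhiPT:", round(phi_pt(var_among=2.0, var_within=8.0), 3))
```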

Relationship between Biochemical and Genetic Analysis
The Mantel test was used to check the correlation between biochemical and genetic distances of EST-SSR, G-SSR, EST-PCR, and all primers combined. Figure 7 shows there was no significant correlation between biochemical and genetic distances, as indicated by scatter plots (a, b, c, d) and poor correlation coefficient values (e, f, g, h), r(AB), of 0.046, −0.042, −0.018, and −0.064 for EST-SSR, G-SSR, EST-PCR, and combined primers, respectively.
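A Mantel test correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below is a minimal pure-Python version; the matrices, the permutation count, the seed and the two-sided p-value are all choices made for this example, and they do not reproduce the settings of the software used in the study.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x, y) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, two-sided permutation p-value)."""
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson_r(flat_a, upper_triangle(dist_b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of the second matrix together.
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        r_perm = pearson_r(flat_a, upper_triangle(permuted))
        if abs(r_perm) >= abs(observed):
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical biochemical and genetic distance matrices for four genotypes.
biochemical = [[0.0, 1.2, 2.5, 0.7],
               [1.2, 0.0, 1.9, 1.4],
               [2.5, 1.9, 0.0, 2.2],
               [0.7, 1.4, 2.2, 0.0]]
genetic = [[0.0, 0.40, 0.35, 0.60],
           [0.40, 0.0, 0.55, 0.30],
           [0.35, 0.55, 0.0, 0.45],
           [0.60, 0.30, 0.45, 0.0]]
r_ab, p_value = mantel(biochemical, genetic)
print("r(AB) =", round(r_ab, 3), "p =", round(p_value, 3))
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that share a genotype, which is why the test is used in place of an ordinary correlation test.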
Results of SMRA between polymorphic EST-SSR, G-SSR and EST-PCR markers, with 24, 25 and 58 alleles, respectively, and the biochemical traits in the 70 genotypes are presented in Table 7. Alleles showing significant association based on the multiple correlation coefficient (R²) were considered. SMRA identified 17 alleles associated with various biochemical components. According to SMRA calculations of the fraction of variation for each primer pair (R²; Table 7), a combination of four alleles accounted for 33% of the variation in TAA among 70 blueberry genotypes. Out of these four, only VCC_I2_1 showed a positive, statistically significant (t = 2.343, p < 0.022) correlation with TAA, with a beta coefficient value of 0.242. The other three alleles, VCC_S10_1, NA800_1 and CA791_4, had a statistically significant but negative correlation with TAA (Table 7). TPC variation (69%) was explained by 11 alleles, out of which seven alleles, CA1423_6, VCC_K4_6, CA54_7, CA23_1, VCC_K4_1, CA1029_1 and VCC_K4_9, were positively correlated, while the other four alleles, CA54_3, VCC_I8_1, CA791_7 and CA21_7, were negatively correlated with TPC (Table 7). VCC_K4_6 was the highest contributor, with a beta coefficient value of 0.585, and was positively significant (t = 5.554, p < 0.001). Of the seven alleles associated with TFC, six alleles, VCC_K4_6, CA54_7, CA23_1, CA1029_1, VCC_K4_1 and CA1423_1, were positively correlated with TFC, while allele CA791_1 was negatively correlated. VCC_K4_6 (t = 8.369, p < 0.001) was the key contributor, with a proportion of variation of 21% and a beta coefficient value of 0.818 (Table 7). Five alleles, VCC_K4_6, CA54_7, CA23_1, VCC_K4_1 and CA1029_1, were associated and positively correlated with both TPC and TFC (Table 7).

Discussion
The study presented here provides insight into genetic diversity, with respect to genetic relationship and structure, and biochemical properties of two groups of selected hybrids of lowbush and half-high blueberries, half-high wild blueberry clones, and highbush and lowbush blueberry cultivars. The antioxidant properties of blueberries are well known for their medicinal value in negating the harmful effects of free radicals [56]. The leaves of blueberry wild clones and cultivars can have higher antioxidant activity [57,58], polyphenolics, and proanthocyanidins than the fruit [59,60].
The antioxidant activity depends on the synergistic and antagonistic interaction of various compounds and environmental factors [61]. There is no standard agreed method for estimating antioxidant activity because of its complexity [62]. In the present study, we used the DPPH radical scavenging method, as it is sensitive and cheaper than other known procedures [63]. Out of all the groups, the NB wild clones had the highest TAA, followed by CV, Cross 2, and NL wild clones. TPC and TFC were highest in NL wild clones, followed by CV. The wild clones from NL and NB proved to be an important resource for improving antioxidant properties in the blueberry breeding program. Phenolics are the abundantly available secondary metabolite derived from phenylalanine via the secondary metabolic pathway, catalyzed by phenylalanine lyase L (PAL). Various biotic and abiotic factors can cause stress in source plants and trigger higher activity of PAL [64]. Low levels of light in the NL province could have contributed to higher levels of TAA, TPC, and TFC. Leaf maturity can have a significant impact on phytochemical composition in blueberry. In their study, Riihinen et al. [60] reported the red leaves of V. corymbosum possessed higher levels of quercetin and kaempferol, p-coumaric, and caffeic acids than the green leaves. This could be the case because solar radiation increases these compounds as a part of the photoprotective mechanism [60]. On top of that, the red leaves contained a small amount of anthocyanins, while green leaves did not have any anthocyanin content [65]. Therefore, TPC and TFC may not sufficiently explain total antioxidant activity as they are the cocktail of various compounds and their activities. The DPPH value is calculated by the addition of various antioxidant compounds, which depends on the chemical used during the extraction of leaves [66]. 
However, Wang and Lin [67] reported the contrasting observation that young leaves from different varieties of blackberries, raspberries and strawberries possessed higher TPC and TAA than older leaves. There was a positive correlation of TAA with TPC and TFC, which was also reported in previous studies involving blueberries [28,29,68].
The biochemical analysis in the present study provides important information about the diversity of antioxidant properties. However, biochemical characteristics by themselves are not sufficient to establish the presence of genetic diversity. DNA marker systems provide a precise and reliable method for further analysis of variability. The extent of genetic diversity between and within populations is often the outcome of a combination of factors such as gene flow, genetic drift, inbreeding, mutation and selection [43]. Developing species-specific molecular markers is very expensive, time-consuming, and laborious. Because of these constraints, we used EST-SSR, G-SSR, and EST-PCR markers developed for highbush blueberries [19,20]. Our report is apparently the first to use these three types of markers to assess genetic diversity in a group of hybrids obtained by crossing lowbush with half-high blueberries. Microsatellite markers have also been used for hybrid identification in closely related wild Petunia species [69]. Although G-SSR markers are highly abundant in the plant genome and are attractive due to their reproducibility and polymorphic nature, most of them lack close linkage to transcribed regions and do not have a specific genic function. On the other hand, SSR markers derived from EST sequences are associated with the transcribed or expressed regions of the genome [70]. EST-SSR and EST-PCR markers are derived from single-pass sequences of randomly picked cDNA clones [71]. All primer pairs used in this study showed high polymorphism, which confirmed the high degree of genetic diversity in the current blueberry material.
In the present study, the discriminatory power of EST-SSR, G-SSR, and EST-PCR  Table 4), proved that this primer pair is not suitable for analyzing the present blueberry hybrids, wild clones, and cultivars. On the other hand, the EST-PCR primer pair CA227, with the highest MI value among all primers (1.16), had the best overall utility for studying the present material, followed by CA1423 (MI = 1.00) and CA54 (MI = 0.99). These three EST-PCR primer pairs may be very valuable for analyzing blueberry hybrids. Moreover, the moderate-to-high values for most of the primer pairs indicate their effectiveness in studying the genetic diversity of the present material.
In the present study, the mean allele numbers for EST-SSR, G-SSR and EST-PCR were 6.30, 5.36 and 4.40, respectively, which are comparable to or lower than those of previous studies involving SSR and/or EST-PCR primer pairs in blueberries (22.4 [17]. The lower values of the diversity parameters could be an indication of genetic erosion resulting from selective farming and deforestation [3]. We used three complementary methods (STRUCTURE, NJ tree, and PCoA) to study population structure and genotype relationships in wild, cultivated, and hybrid blueberries using 26 PCR-based marker pairs. Genotype identification using DNA markers is favoured due to their consistency and reliability, as they are unaffected by the environment [77]. The combined STRUCTURE analysis divided the genotypes into three major groups with some admixture, and this grouping was confirmed by PCoA and NJ analyses for most genotypes. The admixture observed in the wild blueberry clones with STRUCTURE analysis might be the consequence of a glacial bottleneck and quick colonization by these blueberries, along with increased regional gene flow due to human migration and agricultural trade [78]. Although the hybrids were distributed in all three clusters, most of them formed distinct subgroups, either alone or with lowbush or half-high cultivars. This might be because they had been developed through crossing between lowbush and half-high blueberries and share genes from both parents. However, most of the wild lowbush blueberries, except for the NB clones and the half-high cultivars, were grouped based on their phenotypes. While lowbush blueberries are less than 0.5 m tall, the suckering to crown-forming half-high blueberry plants are 0.5 to 1.0 m tall. Highbush blueberry plants are crown-forming and 2.0 m or taller [79]. In the present study, most of the wild clones, although collected from four different provinces, did not group based on their collection place.
Although there is wide genetic variation among the wild clones, there is no pattern of differentiation based on their collection places. This was also observed in the combined AMOVA, where the variation among groups was 19%, and most of the variation (81%) was among the genotypes within provinces, cultivars or hybrid groups. Similar observations were also reported by Debnath [10] and Tailor, Bykova, Igamberdiev and Debnath [22], who worked with different sets of wild lowbush blueberry clones and observed that wild clones were grouped into different clusters. In the present study, it was evident that there was no clear difference between the wild, cultivated, and hybrid blueberries, indicating that, diversity-wise, the present genotypes are all heterogeneous in nature. This might be because the variation among different groups is smaller than the variation among the genotypes within a group. STRUCTURE, NJ, PCoA and AMOVA analyses were complementary to each other; thus, using several procedures is more informative for drawing valid conclusions than relying on a single method. Similarly, using more than one type of molecular marker is always better than using a single type [10,22]. In our study, STRUCTURE, NJ, PCoA and AMOVA analyses using EST-SSR, G-SSR and EST-PCR markers discriminated well among the wild blueberry clones, cultivars and hybrids from the wild and cultivated blueberries that are part of our current germplasm repository for the cool climates of Canada.
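As a rough illustration of how AMOVA apportions variation among and within groups, the following Python sketch partitions squared pairwise distances for a single grouping level. The function name amova_one_level and the use of NumPy are assumptions made here for illustration. The sketch is not the software or the settings used in this study, and it omits the permutation test that AMOVA normally uses to assess significance.

```python
import numpy as np

def amova_one_level(dist, groups):
    """Return (% variance among groups, % variance within groups).

    dist   : square symmetric matrix of pairwise genetic distances
    groups : one group label per genotype, in the same order as dist
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(groups)
    n_total = len(labels)
    uniq = np.unique(labels)
    n_groups = len(uniq)

    # Total sum of squared deviations: squared distances over all
    # unordered pairs, divided by the number of genotypes.
    ssd_total = d2[np.triu_indices(n_total, k=1)].sum() / n_total

    # Within-group sum of squared deviations, accumulated per group.
    ssd_within = 0.0
    sizes = []
    for g in uniq:
        idx = np.where(labels == g)[0]
        sizes.append(len(idx))
        sub = d2[np.ix_(idx, idx)]
        ssd_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ssd_among = ssd_total - ssd_within

    # Mean squares and the effective group size n0 for unequal groups.
    sizes = np.array(sizes, dtype=float)
    ms_among = ssd_among / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (n_groups - 1)

    # Variance components; the among-group estimate can be slightly
    # negative when groups are not differentiated at all.
    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return 100.0 * var_among / total, 100.0 * var_within / total
```

Given a genotype-by-genotype distance matrix and one label per genotype (for example province, cultivar or hybrid group), the function returns the two percentages, which is the form in which the 19% among-group and 81% within-group split is reported above.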
There are no reports available on the relationship between molecular markers and biochemical properties in blueberry. Our study found no parallels between genetic and biochemical data, as observed by phylogenetic trees, PCA-PCoA graphs, and the Mantel test of correlation. The poor correlation between genetic clustering and biochemical data indicates varying genomic coverage in blueberries. Molecular markers span the genome, and most of them are not expressed at the phenotypic level. The noncoding regions of the genome that are not accessible to phenotypic expression might be the reason for the dissimilarity between molecular and chemical diversity [77]. There are only three reports available on the comparative analysis of molecular markers with biochemical properties. Debnath and Sion [80] reported no correlation between genetic diversity based on ISSR markers and chemical diversity based on antioxidant activity and anthocyanin content in lingonberry. Similar observations were also reported in strawberry [81] and cranberry using ISSR, EST-SSR, and EST-PCR markers [82]. We also studied the association of EST-SSR, G-SSR and EST-PCR markers with biochemical traits in 70 genotypes of blueberry and found that only one marker was associated with TPC, as revealed by DPPH assay, and five markers were associated with both TPC and TFC. This can be explained by the polyploid nature of blueberry and the distribution of associated alleles across the whole genome. However, our study is the first one to use SMRA to identify markers associated with antioxidant properties. This method can provide easy and reliable identification of favourable genotypes or populations at an early stage in a breeding program and has been used to associate molecular markers with traits in numerous species such as mulberry [83], buckthorn [84], and Tunisian olive [85]. This approach is a convenient and quick tool for marker-trait association that does not require mapping populations.
Multigenic control of TPC, TFC and antioxidant traits can have practical uses in future blueberry breeding programs.
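The marker-trait association step can be illustrated with a minimal forward-selection sketch in Python. It is a simplified stand-in for SMRA, under several assumptions: markers are scored as 0/1 band presence, the trait is a single quantitative value per genotype, and a marker enters the model only if its partial F statistic reaches a fixed F-to-enter threshold. The names fit_rss and forward_stepwise, the threshold of 4.0, and the absence of a removal step are choices made here for illustration, not the procedure or settings used in this study.

```python
import numpy as np

def fit_rss(markers, trait, cols):
    """Residual sum of squares of a least-squares fit on an intercept
    plus the marker columns listed in cols."""
    design = np.column_stack(
        [np.ones(len(trait))] + [markers[:, j] for j in cols]
    )
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    resid = trait - design @ coef
    return float(resid @ resid)

def forward_stepwise(markers, trait, f_to_enter=4.0):
    """Return marker column indices in the order they enter the model."""
    markers = np.asarray(markers, dtype=float)
    trait = np.asarray(trait, dtype=float)
    n_samples, n_markers = markers.shape

    selected = []
    current_rss = fit_rss(markers, trait, selected)  # intercept-only model
    while len(selected) < n_markers:
        # Residual degrees of freedom after adding one more marker:
        # samples minus intercept, entered markers and the candidate.
        df_resid = n_samples - len(selected) - 2
        if df_resid <= 0:
            break
        best_j, best_f, best_rss = None, f_to_enter, None
        for j in range(n_markers):
            if j in selected:
                continue
            cand_rss = fit_rss(markers, trait, selected + [j])
            # Partial F: drop in residual variation per added marker,
            # relative to the residual mean square of the larger model.
            f_stat = (current_rss - cand_rss) / max(cand_rss / df_resid, 1e-12)
            if f_stat >= best_f:
                best_j, best_f, best_rss = j, f_stat, cand_rss
        if best_j is None:
            break  # no remaining marker passes the entry threshold
        selected.append(best_j)
        current_rss = best_rss
    return selected
```

Calling forward_stepwise once per trait (TAA, TPC, TFC) on the same marker matrix gives one list of candidate markers per trait. A full stepwise procedure would additionally re-test entered markers for removal after each step.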
Blueberries are of significant importance for their antioxidant phytochemicals, especially phenolic metabolites that play a significant role in human health benefits and plant defence mechanisms [86]. Most plant phenolics are either flavonoids or nonflavonoids. Flavonoids are of two types: anthocyanins and anthoxanthins. While anthocyanins are pigment molecules (red, blue and purple), anthoxanthins are white to yellow or colourless molecules and include flavanols, flavonols, flavones and isoflavones. Nonflavonoids comprise phenolic acids, lignans and stilbenes; tannins and lignins are the other nonflavonoid subclasses [86]. The flavonol quercetin is an important nutritional bioactive compound with high bioaccessibility (~80%) [87]. Quercetin helps in protecting against osteoporosis, cancer, pulmonary and cardiovascular diseases, and ageing [88]. In blueberry, anthocyanins were found to have the strongest inhibitory effect on in vitro colon cancer cell proliferation, followed by flavonols and tannins [89]. Biomarker-based human clinical studies showed that regular and moderate consumption of blueberries and/or anthocyanins is associated with reduced risk of death, cardiovascular disease and type 2 diabetes [90]. In another study with in vitro cell bioassays for anti-inflammatory and antioxidant activities, Grace et al. [91] reported that the anthocyanin group of phenolics was mainly responsible for the bioactivity, and blueberry extract suppressed proinflammatory markers (interleukin-1β, cyclooxygenase-2, inducible nitric oxide synthase, and interleukin-6 [92]). Polyphenol-, anthocyanin- and proanthocyanidin-rich components of crude wild blueberry extract were found to suppress mRNA biomarkers of acute inflammation, and malvidin-3-glucoside suppressed the effects of proinflammatory genes that are responsible for transcriptional regulation and cytokine-mediated inflammation [92].
It has been observed that the results of in vitro antioxidant assays with blueberries correlate strongly with total phenolic and total anthocyanin contents [91]. In the current study, we measured total phenolic and flavonoid contents and antioxidant activity to study genetic and biochemical diversity in blueberry germplasm and to identify blueberry genotypes with high bioactive components and wide diversity for use in an ongoing breeding program. Identifying phenolic-rich cultivars for breeding species with high bioactive composition is an important approach to improving the nutritional quality of blueberries. Crossing between selected genotypes is expected to develop new cultivars that combine superior health-promoting bioactive components with diverse adaptability under a changing environment. However, blueberry genotypes with more specific polyphenol profiles are of significant importance for human health, and these can be explored in future research with some of the selected promising genotypes from the current material. With respect to the mass balance between phenolics consumed and what is found in human blood, the amount observed in blood is low, especially for highly hydrophilic phenolics. Earlier researchers, however, ignored the metabolites that are also present; when these are accounted for appropriately, the actual intake is much higher than was originally thought. Efforts have also been made to lipophilize phenolic compounds to enhance their absorption. Therefore, having more phenolics, especially those with different polarities, is desirable, as a mixture of phenolics is present in each material (personal communication: F. Shahidi). Valuable single phenolic compounds can be estimated in the selected material after chromatographic separation, as the total assays do not fully reflect the polyphenol content; they can also react with amino acids and reducing agents.
However, when dealing with the same type of material, the trends provided are quite informative, and although absolute values may not be exact, the trends are always valid. This assumes that the amino acids/proteins present are not varied to any great extent, which is a valid assumption in almost all cases (personal communication: F. Shahidi).

Conclusions
Our study is the first of its kind to investigate antioxidant activity and phenolic and flavonoid contents, along with genetic diversity analysis, using three types of marker systems (EST-SSR, G-SSR, and EST-PCR) in blueberry. The study identified two NL wild clones (BC2, 6) for TAA and one QC wild clone (BC22) for TPC and TFC that were superior to cultivars and hybrids. These wild blueberry clones hold the key to designing future breeding exercises to generate cultivars with valuable antioxidant traits. The present study indicates that 10 EST-SSR, eight G-SSR, and eight EST-PCR primer pairs could distinguish and report genetic variations at the molecular level among wild and cultivated lowbush, half-high and highbush blueberries and among hybrids between lowbush and half-high blueberries. The EST-PCR primer pair CA227 was the best at discriminating blueberry hybrids, clones, and cultivars, followed by EST-PCR primer pairs CA1423 and CA54. The alleles of CA1423 and CA54 also showed a strong positive correlation and association with TPC and TFC in SMRA. The utility of these primers across different blueberry species can help identify and characterize interspecies blueberry hybrids and select useful genotypes as parents in a breeding program. DNA fingerprinting with more than one type of molecular marker will allow better management of the blueberry germplasm and conservation efforts. Clustering based on EST-SSR, G-SSR, EST-PCR, and combined primer data differed from clustering based on antioxidant properties. These markers are spread across the genome, and many of them are located in noncoding regions, which explains the poor correlation between genetic and biochemical data. However, the potential utility of these markers is immense, as shown by our SMRA-based study of marker-biochemical associations in blueberry, and they can prove to be a valuable tool.