Expressed Sequence Tag-Simple Sequence Repeat (EST-SSR) Marker Resources for Diversity Analysis of Mango (Mangifera indica L.)

In this study, a collection of 24,840 expressed sequence tags (ESTs) generated from five mango (Mangifera indica L.) cDNA libraries was mined for EST-based simple sequence repeat (SSR) markers. Over 1,000 ESTs with SSR motifs were detected from more than 24,000 EST sequences, with di- and tri-nucleotide repeat motifs the most abundant. Of these, 25 EST-SSRs in genes involved in plant development, stress response, and fruit color and flavor development pathways were selected, developed into PCR markers and characterized in a population of 32 mango selections including M. indica varieties and related Mangifera species. Twenty-four of the 25 EST-SSR markers exhibited polymorphisms, identifying a total of 86 alleles with an average of 5.38 alleles per locus, and distinguished between all Mangifera selections. Private alleles were identified for Mangifera species. These newly developed EST-SSR markers enhance the current 11 SSR mango genetic identity panel utilized by the Australian Mango Breeding Program. The current panel has been used to identify progeny and parents for selection and the application of this extended panel will further improve and help to design mango hybridization strategies for increased breeding efficiency.


Introduction
The genus Mangifera belongs to the Anacardiaceae family and comprises 69 species [1], with the best known being the common mango (Mangifera indica L.). Mangoes are regarded among the five most important fruit commodities traded worldwide, along with bananas, apples, grapes and oranges [2]. The estimated extent of Australian commercial mango production for local and overseas markets is on average 45,000 tons per annum from around 9,000 hectares (data from 2005-2011) [2].
"Kensington Pride" has dominated commercial production in Australia with its unique flavor and low fiber. However, its shortfalls have long been recognized [3][4][5], and include excessive vigor, irregular bearing and disease susceptibility. The dominance of "Kensington Pride" has narrowed the genetic base of mango production in Australia [6]. Since the 1960s, Australian breeders have been systematically attempting to widen the genetic base of the mango industry by identifying alternative varieties suited to Australian growing conditions through various selection and traditional breeding programs [7]. The progressive release of new cultivars, including "Delta R2E2" in 1991 [8], "B74" (Calypso™) in 2000 [9], "Honeygold" in 2002 [10], and "NMBP 1243", "NMBP 1201" and "NMBP 4069" in 2009 [11], has helped develop and diversify the Australian mango industry. However, breeders have recognized the need for the continual development of new cultivars to keep the Australian industry competitive in domestic and international markets.
Breeding mangoes is a long-term activity complicated by a heterozygous genome, polyembryony, juvenility, low fruit set and retention rates, long evaluation periods, and out-crossing behavior. These factors make genetic improvement through conventional parental selection and breeding slow and unpredictable. Adoption of molecular markers and genomics-based breeding strategies will likely improve predictability and breeding efficiency. Currently, the lack of basic genome sequence and DNA marker information limits the practical application and adoption of molecular technologies in mango breeding. Molecular markers linked to important phenotypic traits are especially useful when the traits are difficult, costly and time consuming to observe. Markers indicating DNA polymorphisms within a specific target gene are preferable as there is minimal risk of losing linkage due to recombination. Such markers are allele-specific and remain informative whatever the genetic background, and are therefore more likely to be transferable across taxonomic boundaries.
The initial low coverage genetic maps of M. indica developed by Kashkush et al. [12] and Chunwongse et al. [13] have only limited information on molecular markers and patterns of genetic diversity that reflect the evolutionary relationships of individual varieties and that may assist in identifying groups of varieties that are related by common ancestry [14].
In recent years, Mangifera germplasm has been collected and analyzed using simple sequence repeat (SSR) markers by Duval et al. [15], Schnell et al. [16] and more recently by Dillon et al. [6]. The traditional techniques of developing SSR markers are usually time consuming, labor intensive and of low efficiency [17]. However, alternative strategies to identify SSR markers have been developed that use comparative genomics tools such as expressed sequence tags (ESTs) [18][19][20]. A key advantage of EST-SSRs is that they are often more transferable across closely related genera compared to anonymous SSRs from untranslated regions (UTRs) or non-coding sequences (e.g., [21,22]). This is due to the primer target sequences residing in the expressed DNA regions expected to be relatively well conserved, thereby increasing the chance of marker transferability across species boundaries [23,24]. Despite their potential to represent selectively deleterious frame-shift mutations in coding regions, EST-SSRs appear to reveal equivalent levels of polymorphisms compared to SSRs located in UTRs, most likely due to an evolutionary trend towards tri-nucleotide repeats in these coding regions [17]. EST-SSRs are physically linked to expressed genes and therefore represent potentially functional markers.
An estimated 2%-5% of all plant-derived ESTs are thought to harbor SSRs [25], although the actual frequency of SSR-bearing ESTs in any particular analysis is highly dependent on the search parameters. Moreover, 80%-90% of EST-SSRs are typically found to be polymorphic [26,27]. Taking into account typical marker development attrition rates, it is likely that EST databases containing as few as 1,000 sequences could provide sufficient markers to facilitate population genetic analyses [17]. EST-derived SSRs have been well documented in some plant species including Arabidopsis thaliana [28], sugarcane [29], and cacao [30]. Putative functions can be deduced for the SSRs using homology searches and thereby provide a new resource that can further aid in genetic and evolutionary studies [31]. As the numbers of cloned mango genes and available EST sequences from diverse tissues slowly increase [32], large-scale searches for SSR motifs and design of SSR primers using computational methods are becoming feasible.
In this study we present the identification and validation of 25 mango EST-SSRs linked to candidate genes involved in plant development, stress response, fruit color and flavor development pathways. The EST-SSRs were tested for the extent of PCR amplification, polymorphism and heterozygosity across a diverse selection of varieties of M. indica and related Mangifera species held at the Australian National Mango Genebank (ANMG).

Plant Material
Thirty-two mango (M. indica) varieties and Mangifera species maintained at the ANMG at Southedge Research Station, Mareeba (16°45′S, 145°16′E) and at Ayr Research Station (19°31′S, 147°22′E), Queensland, Australia, were used in this study (Table 1). All varieties were grafted onto the uniform polyembryonic rootstock of the cultivar "Kensington Pride". EST libraries were constructed from "Kensington Pride" red leaves, flowers, fruit pulp and skin, and roots and "Irwin" red leaves. "Kensington Pride" was selected as it is the predominant variety grown in Australia. "Irwin" was selected for its high fruit color, high productivity, semi-dwarf characteristics and as a parent of a breeding population of the Australian Mango Breeding Program (AMBP).

Phenotypic Evaluation of Mango Fruit
Pulp color, background skin color and blush color were evaluated on the majority of the varieties analyzed. At harvest, 10 fruit from each variety were sampled evenly from all quadrants of each tree. Fruits were transported to the laboratory within two hours of harvest, where they were dipped in 1 mL·L⁻¹ of the fungicide carbendazim at 52 °C for 5 min and subsequently held between 22 °C and 24 °C to ripen. All color evaluations were undertaken on fruit at the eating ripe stage. Color was evaluated categorically and electronically using the Hunter L, a, b color scale [33].

Genomic DNA Extraction
Genomic DNA extractions were performed according to the method described by Dillon et al. [6].

RNA Extraction
RNA was extracted from "Kensington Pride" red leaf, fruit skin, fruit flesh, flower and root tissues, and from "Irwin" red leaf tissue using the Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, Sydney, Australia) according to the manufacturer's instructions.

EST Library Construction, Sequencing and Annotation
The SuperScript Plasmid System for cDNA Synthesis and Cloning (Invitrogen) was used to construct the cDNA libraries in accordance with the manufacturer's protocols. Single pass, 5' end sequencing was performed at the Australian Genome Research Facility (AGRF) using Applied Biosystems 3730 capillary sequencers. The raw chromatogram files were quality clipped using phred [34,35] and vector sequences were removed using CrossMatch within the Staden package [36]. The Staden output files were parsed using Perl scripts prior to assembly using CAP3 [37]. Putative functions of resulting contig and singleton sequences were assigned on the basis of similarity to A. thaliana amino acid sequences (TAIR8) [38] using BLASTx [39]. Bioinformatics analysis was performed at the Queensland Facility for Advanced Bioinformatics (QFAB).

EST Data Mining
EST sequences were mined for SSRs using Perl scripts with thresholds of six repeat units for dinucleotide repeats and four repeat units for tri-, tetra-, penta-, and hexa-nucleotide repeat motifs. Sequences with putative SSRs were passed to Primer3 [40] and PCR primers were designed where sequence context permitted.
A set of 25 EST-SSRs was further analyzed (Table 2). These markers were selected based on their placement within putative genes involved in plant development, stress response, and fruit ripening and color development. Primer pairs were synthesized by Applied Biosystems (Foster City, CA, USA) and forward primers were labeled at the 5' end with fluorescent dyes 6FAM, VIC, PET or NED.
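The repeat thresholds used for EST mining (six units for di-nucleotide motifs, four units for tri- to hexa-nucleotide motifs) can be illustrated with a minimal Python sketch. This is not the Perl code used in the study; the function names `find_ssrs` and `is_reducible` are hypothetical, and the sketch assumes that only perfect, uninterrupted repeats are reported and that motifs which are themselves repeats of a shorter unit (for example AGAG or AAA) are skipped.

```python
# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {2: 6, 3: 4, 4: 4, 5: 4, 6: 4}


def is_reducible(motif):
    """True if the motif is a repeat of a shorter unit (e.g. 'AGAG' or 'AAA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT") or is_reducible(motif):
                continue
            # Count consecutive copies of the motif starting at position i.
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            # Keep the longest qualifying repeat tract starting at i.
            if n >= min_rep and (best is None or n * k > best[2] * len(best[1])):
                best = (i, motif, n)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the reported tract
        else:
            i += 1
    return hits
```

For example, `find_ssrs("CCC" + "AG" * 6 + "TTT")` reports a single (AG)6 repeat starting at position 3, whereas a tract of only five AG units falls below the di-nucleotide threshold and is not reported.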

DNA Amplification and Capillary Electrophoresis
EST-SSR polymerase chain reaction (PCR) amplifications were carried out in a Veriti® Thermal Cycler (Applied Biosystems: Foster City, CA, USA). The amplifications were conducted in a total volume of 6 μL containing 1× ImmoBuffer (Bioline Pty Ltd.: Alexandria, Australia), 1.5 mM MgCl2, 1.25 mM dNTPs, 0.33 μM of each primer and 0.2 units Immolase™ DNA polymerase (Bioline Pty Ltd.: Alexandria, Australia). Thermal cycling conditions included an initial denaturation at 95 °C for 15 min followed by 40 cycles of 30 s at 94 °C, 30 s at 55 °C, and 60 s at 72 °C, with 10 min at 72 °C for a final extension.
PCR amplicons were separated by capillary electrophoresis on a 3730 DNA Analyzer (Applied Biosystems: Foster City, CA, USA). Samples were prepared by mixing 1 μL of PCR product with 10.4 μL of HiDi formamide and 0.06 μL of the size standard LIZ 500 (Applied Biosystems: Foster City, CA, USA) prior to a 60 min separation at 230 V, 32 amp.

Data Analysis
Allele data analysis was performed using the GeneMapper software version 3.7 (Applied Biosystems: Foster City, CA, USA) for internal standard and fragment size determination and for allelic designations. Automated allele calling was performed initially and flagged data were then called manually.
The genetic similarities between the genotypes were calculated from allele frequency data using three genetic distance methods: Cavalli-Sforza's chord distance [41], Reynolds distance [42], and Nei's genetic distance [43,44]. Evaluation of the three analysis methods was based on the degree of congruence among tree topologies as well as the ability to detect geographical groupings. The best results were obtained with Cavalli-Sforza's chord distance, a measure that assumes no mutation and that all gene frequency changes are caused by genetic drift alone, is independent of sample size and number of loci, and is not strongly affected by null alleles [45]. The Cavalli-Sforza chord distance uses the geometric distance between multi-dimensional points on a hyper-sphere (a sphere with >3 dimensions) [46].
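To make the hyper-sphere geometry concrete, the following Python sketch computes a chord distance between two samples from per-locus allele frequencies. It follows one commonly cited formulation, in which the square roots of the allele frequencies place each sample on a unit hyper-sphere and the per-locus chord lengths are averaged and scaled by 2/π. Whether PowerMarker applies exactly this scaling is an assumption here, and the function name `chord_distance` and the input layout are hypothetical.

```python
import math


def chord_distance(freqs_a, freqs_b):
    """Chord distance between two samples.

    freqs_a, freqs_b: lists with one dict per locus, mapping allele -> frequency.
    """
    total = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        alleles = set(pa) | set(pb)
        # Cosine of the angle between the two points on the unit hyper-sphere.
        cos_theta = sum(math.sqrt(pa.get(a, 0.0) * pb.get(a, 0.0)) for a in alleles)
        cos_theta = min(1.0, cos_theta)  # guard against rounding above 1
        # Length of the chord joining the two points for this locus.
        total += math.sqrt(2.0 * (1.0 - cos_theta))
    # Average over loci and scale by 2/pi (assumed scaling, see lead-in).
    return (2.0 / (math.pi * len(freqs_a))) * total
```

Two samples with identical allele frequencies have a distance of zero, and two samples sharing no alleles at any locus reach the maximum of 2√2/π under this scaling.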
Dendrograms were constructed only using the Cavalli-Sforza chord distance, with the neighbor-joining (NJ) method and rooted on the mid-point [47]. The robustness of the dendrograms was assessed by creating 1,000 bootstrap replicates of the data and then generating a majority rule consensus tree. Distance calculations, tree construction and bootstrapping were all performed in PowerMarker V3.0 [48].
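The bootstrap step can be sketched as follows, under the assumption that replicates are formed by resampling loci with replacement (the text does not state the resampling unit). The function name `bootstrap_loci` is hypothetical; in practice each replicate would be passed to the distance and tree-building steps, and the resulting trees summarized in a consensus tree.

```python
import random


def bootstrap_loci(n_loci, n_replicates=1000, seed=0):
    """Yield one list of resampled locus indices per bootstrap replicate."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Draw n_loci indices with replacement: some loci repeat, others drop out.
        yield [rng.randrange(n_loci) for _ in range(n_loci)]
```

A fixed seed is used only so that the replicates are reproducible between runs.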
Expected and observed heterozygosity were calculated using CERVUS © 3.0.3 [49]. Polymorphism information content (PIC) values for diversity analysis were calculated (CERVUS © 3.0.3) for each locus according to the formula $\mathrm{PIC} = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the ith allele in examined genotypes [50]. EST-SSR and phenotypic data (background skin color, blush color, pulp color of fruit) were evaluated by estimating cophenetic correlation using Mantel's matrix correspondence test with 10,000 permutations [51]. The Euclidean distance or simple-matching distance was used for the phenotypic data.
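As an illustration of the per-locus statistics, the Python sketch below computes the number of alleles, observed heterozygosity, and PIC using the formula as stated in the text (one minus the sum of squared allele frequencies). It does not attempt to reproduce the estimators implemented in CERVUS; the function name `locus_stats`, the diploid genotype layout, and the example allele sizes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarize one locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual.
    """
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    freqs = {a: c / n for a, c in Counter(alleles).items()}
    # PIC as stated in the text: 1 minus the sum of squared allele frequencies.
    pic = 1.0 - sum(p * p for p in freqs.values())
    # Observed heterozygosity: share of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    return {"n_alleles": len(freqs), "pic": pic, "h_obs": h_obs}
```

For four individuals genotyped as (150, 150), (150, 156), (156, 156) and (150, 156), both alleles have a frequency of 0.5, so the sketch returns two alleles, a PIC of 0.5 and an observed heterozygosity of 0.5.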

Analysis of Mango EST-SSR Sequences
A total of 24,840 EST sequences were generated from five M. indica cDNA libraries prepared from "Kensington Pride" red leaf, fruit, flower and root and "Irwin" red leaf. BLASTx analysis of the quality clipped and trimmed ESTs identified 22,726 sequences (93%) with matches to A. thaliana amino acid sequences at e values less than 1 × 10⁻¹⁰. These libraries contained approximately 14.5 × 10⁶ nucleotides of mango sequence with an average EST length of 578 nucleotides. Using strict threshold criteria, 1,802 SSRs were identified from over 1,100 EST sequences (4%). Assembly of the SSR-containing ESTs produced 174 contigs and 582 singletons with an average length of 781 nucleotides and 647 nucleotides, respectively. Based on this assembly, 10 contigs showed evidence of in silico SSR variability. A single SSR each was present in 866 ESTs, whereas 116 ESTs contained two SSRs and 29 ESTs contained three or more SSRs. Fifty-seven different SSR motif types were represented. Repeat numbers ranged from four to 42 with an average repeat length of 15.6 nucleotides. The most common repeat motifs found within all mango EST-SSRs were the tri-nucleotide repeats with 1,367 EST-SSRs, almost 76% of the total EST-SSRs identified (Table 2). The next most common EST-SSRs were the di-nucleotide repeats with 296 identified (16.4%), followed by tetra- (3.8%), hexa- (2.8%) and the least common penta-nucleotide repeats with just 1% found. The most frequent EST-SSR tri-nucleotide repeat motif was (AAG)n and the most frequent di-nucleotide repeat motif was (AG)n. "Kensington Pride" red leaf (n = 454) and root (n = 438) cDNA libraries showed the highest number of EST-SSR sequences. The lowest numbers of EST-SSR sequences were identified in "Irwin" red leaf (n = 286) and "Kensington Pride" fruit skin and flesh (n = 296) cDNA libraries.

Marker Development and Polymorphism of Mango EST-SSRs within Mangifera indica
Only di-, tri-, tetra-, penta- and hexa-nucleotide repeats were considered as potential candidates for EST-SSR marker development (Table 3). Primer pairs were designed for 36 mined EST sequences and PCR was successful for 25, with a single distinct PCR product generated across a selection of 27 M. indica varieties and five related Mangifera species. Only two alleles were detected in any individual marker combination but not all loci produced allele sizes that conformed to the repeat unit length indicated. Thirteen EST-SSR markers produced allele sizes that were shorter than the repeat length of the locus (QGMi001, QGMi002, QGMi004, QGMi008, QGMi009, QGMi010, QGMi011, QGMi014, QGMi015, QGMi016, QGMi019, QGMi024 and QGMi025). Of the 25 EST-SSR loci assessed, only one marker (QGMi017) showed no polymorphism within any of the Mangifera species analyzed. This marker was excluded from any further analyses. A further five EST-SSR loci (QGMi006, QGMi008, QGMi019, QGMi022 and QGMi023) failed to show polymorphism at the intra-species level within M. indica varieties. Discounting all six monomorphic EST-SSR loci, a total of 83 alleles were detected across the 27 M. indica varieties assessed (Table 3). The number of alleles detected per locus varied from two to 13 with an average of 4.37 alleles per locus. Seven EST-SSR loci had a PIC value higher than 0.5. The highest number of alleles (13) was determined for QGMi009, with a PIC value of 0.843, and the lowest number of alleles (two) was determined for QGMi007, QGMi012, QGMi014 and QGMi025. The least polymorphic was SSR locus QGMi014 with a PIC value of 0.036. The average observed heterozygosity (H_O) was below the average expected heterozygosity (H_E), indicating a tendency towards inbreeding, most likely due to population isolation.

Cross-Species Amplification
Cross-species amplification of M. indica EST-SSR loci in five Mangifera species, including Mangifera caesia Jack, Mangifera foetida Lour., Mangifera laurina Blume, Mangifera odorata Griff., and an unidentified Mangifera species, was evaluated. All EST-SSR markers showed a high transferability. M. caesia showed the greatest EST-SSR loci polymorphism among analyzed Mangifera varieties with eleven markers showing private allele sizes in this species (Table 4), while three EST-SSR loci (QGMi010, QGMi020, and QGMi024) repeatedly failed to amplify a PCR product.

Mangifera Diversity Analysis
The SSR marker allele data from the 25 EST-SSR markers were used to generate a bootstrapped Cavalli-Sforza distance neighbor-joining dendrogram for the 32 M. indica and related Mangifera varieties (Figure 1a). Cluster analysis revealed that the 32 varieties showed a high level of genetic diversity. Pooling the information of these 25 EST-SSR markers with data from 11 SSR markers from a previous analysis [6], we were able to generate a bootstrapped Cavalli-Sforza distance neighbor-joining dendrogram for the 32 varieties with a total of 36 markers (Figure 1b). Even with the extra 11 markers, cluster analysis continues to show a high level of diversity among the Mangifera varieties. The rate of polymorphism between varieties is indicative of the genetic distance among wild germplasm and commercial mango varieties in this study.
No correlation was evident between the phenotypic data (categorical background skin, blush and pulp colors of fruit) and the overall Cavalli-Sforza distance for all EST-SSRs (data not shown).

Discussion
High quality genetic analyses of crops such as mango require large numbers of informative polymorphic markers for genetic or comparative mapping and quantitative trait loci identification. Identification of markers that are tightly linked to target genes, and monitoring their patterns of introgression for broadening the genetic base of mango varieties, are equally important. In mango, genetic analysis has been hampered by the lack of sufficiently informative markers, creating the need to discover high quality markers before useful genetic mapping can be undertaken. In other crops, EST-SSRs have increasingly become the marker of choice for these sorts of analyses. In comparison to other crop plants like rice (~15,200), A. thaliana (8,253), Brassica (5,923), and potato (4,820) [52], there were no publicly available EST-SSR markers for mango identified prior to the commencement of this study.
The polymorphic EST-SSR markers developed in this study significantly increase the number of informative microsatellite markers available for genetic analysis of Mangifera species. These markers have been shown to be useful for determining the genetic relationships, exploring potential pedigrees and estimating the genetic background of cultivated accessions of M. indica.
Approximately 1,000 ESTs with SSR motifs were identified from over 24,000 EST sequences, about 4% of the total. This number is within the predicted 2%-5% of plant-derived SSR-bearing ESTs [25]. The frequency range of monocots is between 1.5% and 4.7% [25], while a frequency range of 2.65% to 16.82% has been reported in 49 dicot species [53]. Frequency of EST-SSRs in various plant genomes is significantly influenced by the repeat length and the criteria used for mining the SSRs in the database [54].
In our study, tri-nucleotide repeats were the predominant repeat motif present in all EST sequences identified, comprising 76% of all the EST-SSRs. These findings are in agreement with the situation in watermelon [20], safflower [22], and citrus [55], where tri-nucleotide repeats were also the most prevalent repeat motif detected. Tri-nucleotide repeats generally prevail in coding regions, which is usually attributed to selection against frame-shift mutations caused by length variation in non-trimeric repeats [56]. Di-nucleotide repeats are typically more frequent in untranslated regions, but occasionally occur in coding regions as well. The most frequent EST-SSR tri-nucleotide repeat motif identified was (AAG)n and the most frequent di-nucleotide repeat motif was (AG)n. This is similar to that of EST-SSRs found in coffee [54]. Differences in the repeat type abundance in various plant taxa can also be attributed to the differences in the SSR search criteria used for EST database mining in different studies.
The extent of cross transferability of EST-SSR markers determines their suitability in comparative genome mapping and phylogenetics. The EST-SSR markers showed a high level of polymorphism and high transferability across the five Mangifera species analyzed. The study also identified a number of private alleles within the Mangifera species. M. caesia showed the greatest EST-SSR loci polymorphism among analyzed Mangifera varieties with eleven markers showing private allele sizes within this species, while three EST-SSR loci (QGMi010, QGMi020, and QGMi024) repeatedly failed to generate a PCR product. Private alleles were also identified in M. foetida, M. laurina and the unidentified Mangifera species (Table 4).
The five Mangifera species analyzed in this study clustered together in both of the diversity dendrograms generated from the 25 EST-SSRs and the pooled 36 EST-SSR plus SSR markers. A strong relationship between M. foetida var. "Bogor 2" and M. odorata var. "Kweni", supported by a bootstrap value of 83%, was seen with the diversity analysis using all 36 microsatellite markers. Ding Hou [57] suggested a hybrid origin for M. odorata, which was later verified as a cross between M. indica and M. foetida [58,59]. Based on phylogenetic relationships of the internal transcribed spacer (ITS) sequences of these species, M. odorata is more closely related to M. foetida than to M. indica [60]. However, more recently Hidayat et al. [61] placed M. odorata closer to M. indica than to M. foetida based on variation of the chloroplast matK sequences.
A strong link between "Lippens" and "Irwin" (85%) in this study indicates the close relationship between these two Florida accessions. Parentage analysis has identified "Lippens" as the maternal parent of "Irwin" [62]. "Haden" is also identified as the paternal parent of "Irwin" and the maternal parent of "Lippens" [63]. While the parents of "Palmer" are unknown, the strong link between "Palmer" and "Keitt" (92%) suggests a common ancestry for these two accessions. The genetic similarity of the Florida accessions arises from their common heritage that can be traced back to as few as four Indian accessions and the "Terpentine" land race [63]. A close relationship between the Indian accession "Hybrid 17" and "Alphonso" again indicates a common heritage. "Hybrid 17" is a seedling of the maternal parent "Alphonso" (pers. comm., C.P.A. Iyer).

Conclusions
In conclusion, the results of this study demonstrate that genotyping Mangifera accessions with microsatellite markers can quickly reveal the genetic diversity among accessions. Understanding the diversity and relatedness of accessions can assist breeders to better select parents with the potential to contribute desired genes to progeny and for developing new commercial cultivars. Genetic diversity within a breeding program is highly desirable to enable new cultivars to be produced with novel productivity and fruit quality traits necessary for sustainable productivity and market competitiveness. The development of a comprehensive mango SSR catalogue facilitates characterization of potential genetic markers in the progeny of polymorphic cultivars, and is essential in an important crop species such as mango that is virtually devoid of linkage associations.

Figure 1. Neighbor-joining dendrogram, rooted on the mid-point, using Cavalli-Sforza distance based on (a) 25 EST-SSR markers and (b) 25 EST-SSR plus 11 SSR markers. Scale bar indicates branch length. Bootstrap values greater than 50% are indicated.

Table 1. Country of origin of 32 Mangifera varieties used in the evaluation of mango expressed sequence tag-simple sequence repeat (EST-SSR) microsatellite markers.

Table 2. EST-SSR nucleotide repeat motifs in mango DNA.

Table 3. Characteristics of 25 EST-SSR markers screened across 27 varieties of M. indica and five Mangifera species.
H_E = expected heterozygosity; H_O = observed heterozygosity; PIC = polymorphic information content; ND = Not Determined.

Table 4. Private alleles within the five Mangifera species analyzed.