Microsatellite Marker Discovery in the Stingless Bee Uruçu-Amarela (Melipona rufiventris Group, Hymenoptera, Meliponini) for Population Genetic Analysis

The species Melipona rufiventris Lepeletier, 1836 is a Brazilian native stingless bee that is part of a species complex known as the ‘rufiventris group’, making it difficult to distinguish between the different species. Populations in this group are facing a severe decline, leading to the risk of local extinction, and therefore, their conservation should be treated as a major concern. This study describes the first set of tri- and tetranucleotide microsatellite markers, using next-generation sequencing technology for use in the identification of genetic diversity and population structure in the ‘rufiventris group’. A total of 16 microsatellite loci displayed polymorphism. Analysis of the whole data set (n = 50) detected 63 alleles in all loci, ranging from 2 to 7 with a mean of 3.9 alleles/locus. A genetic diversity analysis revealed high values for population differentiation estimates (FST = 0.252, RST = 0.317, and DEST = 0.284) between the Atlantic Forest, Cerrado, and Caatinga biomes. An additional evidence for genetic divergence among populations was also found in the ’rufiventris group’; these should be treated as separate conservation units or even as separate species. These microsatellite markers have demonstrated a strong potential for assessing population discrimination in this threatened stingless bee group.


Introduction
Bees are considered the main pollinators in natural and agricultural environments. This ecosystem service is essential to sustaining species diversity and production of crops. Meliponines belong to this group of eusocial insects and are characterized by atrophied stingers in both workers and queens, and are found in most tropical or subtropical regions of the world [1].
The species Melipona rufiventris Lepeletier, 1836 is a native stingless bee distributed mainly in the Atlantic Forest, Caatinga, and Cerrado biomes in Brazil [1,2]. It is a polytypic species that is part of a group known as the 'rufiventris group' because of their similar morphology, which makes it difficult to distinguish between the different species [3][4][5]. Additional biological data are necessary for a better identification of the taxonomic status of these species. Most of the literature suggests that M. rufiventris is a complex of two or three species with a distinct distribution [2,6,7]. This native stingless bee group has a limited flight capacity compared to Apis mellifera [8,9], and when associated with geographic boundaries, it can entail low dispersal rates among populations. Moreover, queen mating in M. rufiventris occurs with a single male, and as a result, considerably low intracolonial genetic variability can be generated [10].
Some of the Brazilian endemic stingless bees are on the official national list of fauna species threatened with extinction [11]. The species Melipona rufiventris is one of these bees that are becoming rare, and are listed as Endangered (EN), due to the adverse effects of deforestation, habitat loss, and predatory collection of honey [5,12]. Therefore, its conservation should be treated as a major concern, aiming at the adequate management of this important genetic resource and the implementation of efficient conservation strategies, securing pollination services in both commercial agriculture and natural ecosystems.
Microsatellite markers, or simple sequence repeats (SSR), have emerged as one of the most popular and effective tools for determining the genetic divergence among populations and the estimation of population structure and genetic diversity in different taxa [12][13][14], which is essential for the development of efficient conservation strategies. The use of microsatellites as an advanced tool is of fundamental importance in the estimation of possible disturbances caused in natural bee populations. Microsatellites allow the assessment of not only the structural situation of genetic composition but also of the reproductive behavior, social structure, and dispersal in endangered bee species [15].
There are very few species-specific microsatellite markers for M. rufiventris (eight markers; [10]), and most are dinucleotides presenting low repeat numbers (≤8), which might affect the level of polymorphism detected by these available markers. The polymorphism in a microsatellite locus depends on the number of repeats it contains, and the level of polymorphism increases with the number of repeats [15,16]. Moreover, long core motifs (i.e., tri-and tetranucleotides or longer), enable a better separation of alleles, compared to dinucleotide repeats, which more frequently produce stutter peaks, making it difficult to correctly interpret electropherograms of microsatellite genotyping [17].
Cross-species amplification of microsatellite loci from primers developed from other species in the same genus (e.g., Melipona bicolor [18], M. seminigra merrillae [19], M. interrupta manaosensis [20], M. mondury [21], M. subnitida [22] and M. fasciculata [23]) might occur mostly due to the presence of conserved flanking sequences across closely related taxa. However, the use of heterologous microsatellite primers in a non-source species, where nonamplifying (null) alleles are expected, could substantially reduce genetic variability, and consequently, the resolving power of the marker [24]. Thus, it is expected that species-specific tri-and tetranucleotide markers will provide a better dataset than previously identified markers [10] in improving our understanding of the population structure and diversity of the stingless bee Melipona rufiventris.
Next generation sequencing has been recently used for the isolation and development of microsatellite loci in the Melipona species [22,23]. The technology has the advantage over traditional methods used for the discovery of SSR markers, as it generates large amounts of sequencing data and consequently large numbers of markers in a simpler and rapid approach, avoiding the construction of microsatellite-enriched DNA libraries, which is a time-consuming and laborious procedure [25].
Therefore, this study describes the first set of tri-and tetranucleotide microsatellite markers, using low coverage whole genome sequencing data from the Illumina platform, and their application to assess genetic diversity and population structure of the existing forms in the 'Melipona rufiventris group' in three Brazilian biomes.

Sampling and DNA Extraction
A total of 50 worker bees, one from each colony, were randomly collected in the states of Minas Gerais (n = 20), Goiás (n = 15), and Piauí (n = 15), Brazil ( Figure 1, Table 1). Total genomic DNA was extracted from the thorax of each bee using the ExtractME Genomic DNA Kit (DNA-Gdansk), following the manufacturer's protocol for animal tissue. DNA samples were electrophoresed on 1.0% agarose gel to test for overall quantity and quality of the DNA yield. to assess genetic diversity and population structure of the existing forms in the 'Melipona rufiventris group' in three Brazilian biomes.

Sampling and DNA Extraction
A total of 50 worker bees, one from each colony, were randomly collected in the states of Minas Gerais (n = 20), Goiás (n = 15), and Piauí (n = 15), Brazil ( Figure 1, Table 1). Total genomic DNA was extracted from the thorax of each bee using the ExtractME Genomic DNA Kit (DNA-Gdansk), following the manufacturer's protocol for animal tissue. DNA samples were electrophoresed on 1.0% agarose gel to test for overall quantity and quality of the DNA yield.

Library Preparation and High-Throughput Sequencing
A single individual DNA sample of approximately 1 µg was used to prepare the genomic library for sequencing, following the standard protocol of the Illumina Nextera XT Library Preparation kit (Illumina Inc., San Diego, CA, USA). The DNA library was sequenced using a MiSeq Benchtop Sequencer (Illumina Inc., San Diego, CA, USA), targeting 500-bp fragments with 2 × 250-bp reads in a paired-end sequencing configuration. The paired-end Illumina reads were first combined to produce contigs with CLC Genomics Workbench 7.0.4 (Qiagen, Redwood City, CA, USA).

SSR Mining and Primer Design
Contigs were added directly into Msatcommander 0.8.2 [26] for detection of possible microsatellite loci with at least six repeats for tri-and tetranucleotides. Primers, forward and reverse, were designed for each short tandem repeat sequence at their flanking regions. Long mononucleotide repeats [(A)n, (G)n, (C)n, (T)n, n > 5] between primer annealing locations were avoided for marker development. Primer design was performed with the web-based Primer3 program (http://bioinfo.ut.ee/primer3-0.4.0/).

PCR Amplification and Validation of Selected SSRs
Genomic DNA from the 50 individuals described above (Table 1) were used to validate all designed primer pairs using polymerase chain reactions (PCRs), and to obtain baseline allele frequency information. Reactions were performed in a 10-µL total volume containing at least 20 ng of genomic DNA, with 1.0 × buffer, 2 to 3 mM MgCl 2 , 1.0 mM dNTP mix, 0.25 mM of each primer and 0.75 units of Taq DNA polymerase (Thermo Scientific Inc, Waltham, MA, USA). All reactions were run on a Veriti 96-well Thermal Cycler (Applied Biosystems, Waltham, MA, USA) using the PCR temperature profile indicated in Table 2. Amplicons were screened by silver nitrate detection on denatured 6% polyacrylamide gels.

Data Analysis
The software Micro-Checker 2.2.3 [27] was used to detect null alleles and scoring problems in the genotyped data. Observed heterozygosity (H O ), expected heterozygosity (H E ), and allele richness (A R ) were calculated using the Fstat v2.9.3.2 software [28]. The polymorphic information content (PIC) was determined using Cervus ver. 3.0 [29]. Deviations from the Hardy-Weinberg equilibrium (HWE) and tests for linkage disequilibrium were conducted using the Genepop software [30]. A Bayesian grouping admixture model with no population assumed a priori, implemented in the software Structure 2.3.4 [31], was used to identify genetically homogeneous groups within the genotyped data. The estimate of the best K, number of groups that best fit the data, was calculated based on 5 replications for each K (from 1 to 11) using Structure Harvester v.0.6.92 [32]. The program was set up for 1,000,000 Markov chain Monte Carlo repetitions, following a burn-in-period of 500,000 iterations. Population structure was also analyzed using principal coordinate analysis (PCoA), R ST , a measure of genetic differentiation analogous to F ST , and D EST estimator of actual differentiation, as implemented in GenAlEx v.6.5 [33].

Sequence Assembly and SSR Mining
Illumina MiSeq sequencing resulted in 54,555,929 reads, which were assembled into a total of 137,313 contig sequences. Minimum and maximum contigs were 200 and 13,505 bases, respectively, with an average size of 397 bases. The Msatcommander 0.8.2 program identified 9745 (7.1%) contigs with microsatellite loci consisting of di-to hexa-nucleotide SSRs with at least six repetitions.
For ease of imaging and scoring, only tri-and tetranucleotides were examined. From these potential microsatellite markers, 25 loci were randomly selected for primer designing and validation in Melipona rufiventris. All microsatellite sequences isolated and validated in this study were deposited in the NCBI in the GenBank database under accession numbers MK133898-MK133922.

SSR Validation
Among all 25 designed primer pairs, 16 loci (64%) were amplified successfully, producing consistent and specific PCR bands of expected size. Possible assembling errors or mutations in both primer-annealing sites at each locus could have resulted in the failed amplifications.
All loci were observed to segregate independent of each other, showing no evidence of linkage disequilibrium (p > 0.05). The polymorphism of microsatellite loci was also separately evaluated in terms of allelic richness (A R ), heterozygosities (H), and the polymorphic information content (PIC) in the samples collected from the three distinct biomes (Table 3). Mean A R , observed and expected heterozygosities (H O and H E ), and PIC were low to moderate, respectively, 2.4, 0.514/0.425, 0.345 for Minas Gerais in the Cerrado biome, 2.7, 0.547/0.479, 0.400 for Goiás in the Atlantic Forest, and 2.7, 0.563/0.502, 0.412 for Piauí in the Caatinga. Despite the fact that the low levels of genetic diversity might occur in social Hymenoptera compared to other insects [34], recent population reduction might also have contributed to this alarming scenario, reducing the genetic variability in these populations of the 'rufiventris group' [5,35]. Heterozygosity estimates were of similar magnitude when compared to those found for Melipona fasciculata [23], but were a little lower for M. subnitida [22]. From the 48 combinations (16 loci and 3 populations) of the overall samples, 10 significant deviations from HWE after sequential Bonferroni correction (p < 0.0031) were detected, mostly towards an excess of heterozygotes.
This could be attributed to a reduction in population size, probably associated with extensive deforestation for new land use and urban expansion, which leads to a disproportionate loss of rare alleles, such that predicting heterozygosity from allele numbers alone creates an underestimated value [36,37]. Additionally, null alleles, inbreeding or selection for or against a certain allele could also have played a part in the departures of HWE [24]. Evidence for null alleles was found only for Mruf1 in Goiás and Piauí, at estimated frequencies of 0.216 and 0.218, respectively, and for loci Mruf9 (0.234) and Mruf18 (0.261) in Minas Gerais.

Genetic Divergence Among Populations in the 'Rufiventris Group"
In our study, F ST (0.252), R ST (0.317), and D EST (0.284) estimates obtained from the microsatellite loci suggest the existence of genetic differentiation among populations/colonies in the 'rufiventris group' (Table 3). Even though these estimates are obtained through different computational methods, they were of a broadly similar magnitude, indicating significant genetic differences among the samples collected in the three contrasting biomes. Admittedly, a more extensive sampling effort over a wider spatial range in the Atlantic Forest was needed. Values of F ST > 0.25 indicate very high genetic differentiation, which can be caused by natural selection or of limited gene flow between populations [38]. Our results are consonant with earlier observations in Melipona rufiventris [12] (F ST = 0.250) and in M. beecheii [39] (F ST = 0.280). Such high differentiation might reflect partial isolation with reduced gene flow between the colonies located in different biomes, resulting in high inbreeding within localities [40]. A low migration rate between populations of M. rufiventris, 0.055 bees per generation, was detected from microsatellite data collected in Minas Gerais [12].
The admixture model-based clustering (Structure analysis) recognized three distinct genetic populations. The ad hoc statistic ∆K, used to infer the true number of clusters (K) that capture the major structure of the dataset, revealed the best K at the second level of sub-population separation (K = 3), with a strong signal (∆K > 400), as seen in Figure 2a. These clusters represent a biologically meaningful level of organization within the colonies surveyed in this study, considering three distinct biomes (Figure 2b). High population genetic structure suggests low levels of gene flow among the colonies from distinct biomes that might not be sufficient to counterbalance genetic drift [41].
Principal coordinate analysis (PCoA) of codominant genotypic distance was conducted to obtain further insights into the relationships among populations of the 'rufiventris group'. The analysis corroborated the population assignments inferred by Structure and the pairwise F ST calculations (Figure 2c). Colonies from Goiás, in the Atlantic Forest biome, diverge along both PCoA axes, whereas colonies from Minas Gerais and Piauí, even though clearly distinct, showed very little overlap in the distribution of individuals. All colonies, however, were clearly clustered into three different groups forming separate genetic clusters and, thus, suggesting a high genetic differentiation. In fact, individuals from colonies occurring in the Atlantic Forest showed morphological differences that are not described for bees in other regions, and according to Melo [2] these two forms should be treated as two different species. The variety found in the Cerrado biome (Minas Gerais) should be called Melipona rufiventris Lepeletier 1836, while the form found in the Atlantic Forest (Goiás) should be called M. mondury Smith 1863 [2,6]. Our study also suggests that a third form exists in the Caatinga (Piauí), in agreement with results from previous research [7] that indicates a new species in the 'rufiventris group', with a distribution that goes from the northwest of Minas Gerais to Maranhão, including the state of Piauí.

Conclusions
In summary, our findings showed that the set of molecular markers described here-which are the first microsatellite loci for the 'rufiventris group'-using NGS technology, have a strong potential for population-level genetic studies, and consequently, will add valuable information to aid in the conservation and management of the species. The information obtained by this set of markers could be used in conjunction with ecological data to genetically monitor the 'rufiventris group', in order to help detect populations at risk of decline and improve population viability for ensuring appropriate conservation and management decisions.
Further support is also given to the presence of high degree of genetic differentiation in the 'rufiventris group', representing colonies from three distinct biomes, and that these should be treated as separate conservation units, each managed accordingly. However, further investigation should be conducted to confirm the extent of this genetic delimitation.

Conclusions
In summary, our findings showed that the set of molecular markers described here-which are the first microsatellite loci for the 'rufiventris group'-using NGS technology, have a strong potential for population-level genetic studies, and consequently, will add valuable information to aid in the conservation and management of the species. The information obtained by this set of markers could be used in conjunction with ecological data to genetically monitor the 'rufiventris group', in order to help detect populations at risk of decline and improve population viability for ensuring appropriate conservation and management decisions.
Further support is also given to the presence of high degree of genetic differentiation in the 'rufiventris group', representing colonies from three distinct biomes, and that these should be treated as separate conservation units, each managed accordingly. However, further investigation should be conducted to confirm the extent of this genetic delimitation.

Acknowledgments:
The authors would like to thank the Genomics and Bioinformatics Center of Drug Research and Development Center of Federal University of Ceará for technical support. Permits for field collection and DNA accession were given by IBAMA/CGEN no. A81805D.

Conflicts of Interest:
The authors declare that they have no conflict of interest.
Data Availability: The datasets generated or analyzed during the current study are available in the GenBank repository (accession numbers MK133898-MK133922) but restrictions apply to the availability of these data, so they are not publicly available until article publication.