The Potential of Complementary Sex-Determiner Gene Allelic Diversity for Studying the Number of Patrilines within Honeybee Colonies

Polyandry, a fundamental aspect of honeybee biology, influences genetic diversity within bee colonies. The complementary sex-determiner gene (Csd), which is responsible for sex determination, exhibits high intraspecific polymorphism, making it a promising candidate marker for studying patrilines. This study investigates the potential of Csd as a marker for genetic studies of honeybee colonies and compares its efficacy with that of standard microsatellite markers. Worker brood from five colonies was genotyped using both Csd and microsatellite markers. Csd was more variable than the microsatellite markers, providing higher genotyping resolution: in each colony, the number of distinct Csd alleles exceeded the number of alleles at any microsatellite locus. Although Csd amplification efficiency was a challenge, a two-step nested PCR protocol proved effective. Csd genotyping alone identified more patrilines than the set of five microsatellite markers. Combining Csd and microsatellite genotyping enhances the resolution of genetic studies in honeybee colonies, offering valuable insights into genetic diversity, reproductive success, and social dynamics. The Csd gene emerges as a promising tool for advancing genetic studies in honeybee populations.


Introduction
Polyandry is a mating system in which a female mates with multiple males. Although relatively rare in the animal kingdom, it occurs in various species, including honeybees, in which the polyandrous mating behavior of queens is a fundamental aspect of their biology. The queen mates with multiple drones (usually 10-20) during mating flights early in her life [1]. After mating, she stores the sperm from these drones in the spermatheca, a specialized organ that allows her to fertilize eggs throughout her lifetime. Sperm in the spermatheca is thoroughly mixed, so that, at least in the short term, sperm use is not biased at a fine scale. Consequently, the worker offspring developing from fertilized eggs laid during a given period represent most of the patrilines (i.e., groups of individuals sharing the same drone father) present in the colony [2-4]. Polyandry in honeybees is expected to offer advantages such as sexual selection through drone and sperm competition, which ensures that the most viable and genetically fit sperm fertilize the eggs and thereby increases the quality of the queen's offspring [5]. Polyandry also increases intracolonial genetic diversity, enhancing the colony's ability to adapt to changing environmental conditions, resist diseases, and improve its overall fitness [6].
Appl. Sci. 2024, 14, 26 2 of 8
Studying patrilines provides insights into genetic diversity within honeybee colonies, the reproductive success of different drones, the effects of genetic diversity on colony health and productivity, and social dynamics among worker bees. Understanding genetic diversity within colonies is also important in practice, as it informs breeding programs aimed at improving desired traits such as disease resistance, gentleness, and productivity [7]. An important component of a successful breeding program is mating control, in which genetically superior individuals are identified and used to establish breeding lines that generate genetic progress [8]. Because of the peculiarities of honeybee reproductive biology, mating control is demanding to implement, and its efficiency needs to be properly evaluated. Mating control success is assessed by determining how many of the drones (patrilines) that mated with the queen came from known (installed) drone-producing colonies and how many came from unknown colonies located anywhere in the large area around the mating site. Current methods rely on indirect genotyping of worker brood at several polymorphic microsatellite loci to estimate the number and origin of the patrilines in each colony and the proportion of patrilines sired by known drones, i.e., drones from the installed drone-producing colonies [9].
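The assessment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of [9]: each patriline is represented by its reconstructed paternal haplotype, and each installed drone-producing colony by the alleles its queen(s) can pass on to drones at each locus. The function names, the data model, and the allele sizes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model (illustration only):
# a paternal haplotype maps locus -> allele size (None = not scored);
# a drone source maps locus -> alleles its queen(s) can pass on to drones.
Haplotype = Dict[str, Optional[int]]
DroneSource = Dict[str, Tuple[int, ...]]


def compatible(haplotype: Haplotype, source: DroneSource) -> bool:
    """True if every scored paternal allele could come from this drone source."""
    for locus, allele in haplotype.items():
        if allele is None or locus not in source:
            continue  # no information at this locus; it cannot exclude the source
        if allele not in source[locus]:
            return False
    return True


def mating_control_success(patrilines: List[Haplotype],
                           installed: List[DroneSource]) -> float:
    """Fraction of patrilines compatible with at least one installed colony."""
    if not patrilines:
        raise ValueError("at least one patriline is required")
    known = sum(any(compatible(p, s) for s in installed) for p in patrilines)
    return known / len(patrilines)


if __name__ == "__main__":
    installed = [{"A113": (212, 218), "A7": (104, 110)}]  # hypothetical alleles
    patrilines = [
        {"A113": 212, "A7": 110},  # compatible with the installed colony
        {"A113": 220, "A7": 104},  # A113 allele 220 excludes it: unknown origin
    ]
    print(mating_control_success(patrilines, installed))  # 0.5
```

Compatibility only shows that a patriline could have come from an installed colony; when loci have few alleles, a drone from an unknown colony may be compatible by chance, which is one reason marker resolution matters for this assessment.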
Microsatellite markers, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are widely used to study inter- and intracolonial genetic diversity in honeybees [10,11]. These markers consist of short repeat units 1-6 base pairs long, located mainly in non-coding regions of the genome [12]. Microsatellites are highly polymorphic; that is, the number of repeats can vary considerably among individuals in a population. Microsatellite analysis involves PCR amplification of specific loci followed by determination of the length variation of the amplified fragments by capillary electrophoresis or, less frequently, DNA sequencing. It has been used successfully in honeybee research to study social organization, population structure, genome mapping, and evolutionary history [6,13,14]. Numerous microsatellite loci have been described in the honeybee genome, and different sets of markers are used in different studies. However, microsatellite analysis has limitations, including homoplasy (alleles identical in length but different in sequence), null alleles (alleles that fail to yield PCR products, leading to false homozygosity), and a limited number of alleles per locus in a population, which results in low resolution. Some of these limitations can be overcome by analyzing more microsatellite loci, but this is time-consuming because of the number of PCR reactions required per bee, the need to optimize multiplex PCR conditions, and the work of assembling genotypes from multiple loci. A single hypervariable genetic marker could therefore be a convenient alternative to standard microsatellite markers in honeybee genetic studies, reducing genotyping effort, and one such candidate is the complementary sex-determiner gene (Csd).
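The link between the number of alleles and resolution can be made concrete with a commonly used quantity: the probability that two drones carry identical alleles at every analyzed locus, so that their patrilines cannot be told apart (often called the non-detection error). The sketch below assumes unlinked loci and known allele frequencies in the drone population; the function names and frequencies are hypothetical.

```python
from math import isclose, prod
from typing import Dict, Sequence


def identity_probability(freqs: Sequence[float]) -> float:
    """Probability that two randomly drawn haploid drones share an allele at
    one locus: the sum of squared (normalized) allele frequencies."""
    total = sum(freqs)
    return sum((f / total) ** 2 for f in freqs)


def non_detection_error(loci: Dict[str, Sequence[float]]) -> float:
    """Probability that two drones are identical at all loci, assuming the
    loci are independent (unlinked)."""
    return prod(identity_probability(f) for f in loci.values())


if __name__ == "__main__":
    # Hypothetical loci: 4 equally common alleles vs. 17 equally common alleles.
    print(identity_probability([1] * 4))   # 0.25
    print(identity_probability([1] * 17))  # about 0.059
    # Adding loci multiplies the per-locus probabilities together.
    print(isclose(non_detection_error({"a": [1] * 4, "b": [1] * 4}), 0.0625))  # True
```

The comparison illustrates both points made above: loci with few alleles leave many drone pairs indistinguishable, and adding loci lowers the error multiplicatively, at the cost of extra PCR reactions per bee.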
The Csd gene is responsible for sex determination in honeybees [15]. Bees that are heterozygous at Csd develop into females, whereas bees that are hemizygous or homozygous at Csd develop into males (drones). Drones are generally haploid because they develop from unfertilized eggs. Diploid individuals also develop into males when they are homozygous at Csd; however, such diploid males are eliminated by their nestmates at an early stage of development [16]. Because homozygotes have zero fitness and common alleles are more likely than rare ones to form homozygotes, Csd is subject to strong overdominant and negative frequency-dependent selection. This combination of selective pressures, referred to as balancing selection, results in a high level of intraspecific polymorphism [17]. The number of Csd alleles in the worldwide honeybee population is difficult to estimate because Csd alleles are unevenly distributed and overlap little between local honeybee populations [18,19]. To date, several hundred Csd alleles have been deposited in public databases [20]. The Csd gene encodes an SR-type protein that initiates the sex-determination cascade, resulting in sex-specific splicing of feminizer (fem) and doublesex (dsx) transcripts [21-26]. Exons 6-8 of Csd encode the potential-specifying domain, which consists of the hypervariable region (HVR) flanked by arginine/serine-rich and proline-rich domains. The HVR contains asparagine/tyrosine repeats that vary in number across alleles, whereas the adjacent proline-rich domain contains TTCCTG/A repeats. These domains are separated by a conserved sequence (ATTAAT) containing a VspI (AseI) restriction site (Figure 1). As with microsatellite loci, HVR diversity is probably maintained by mutations arising from unequal crossing-over and/or polymerase slippage during replication. However, the mutation rate of Csd is estimated to be 2.4 times higher than that of microsatellite loci [19].
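The complementary sex-determination rule and its consequence for a queen's brood can be sketched as follows. This is a simplified illustration assuming equal use of each patriline's sperm and equal transmission of the two queen alleles; the function names and allele labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def sex_from_csd(genotype: Tuple[str, Optional[str]]) -> str:
    """Complementary sex determination: heterozygous -> female;
    hemizygous (haploid, second allele None) or homozygous -> male."""
    first, second = genotype
    if second is None:
        return "haploid male"
    return "female" if first != second else "diploid male"


def diploid_male_fraction(queen: Tuple[str, str],
                          drone_alleles: Sequence[str]) -> float:
    """Expected share of fertilized eggs that are Csd homozygotes
    (diploid males); one entry in drone_alleles per patriline."""
    if not drone_alleles:
        raise ValueError("at least one patriline is required")
    total = 0.0
    for drone in drone_alleles:
        # each queen allele reaches an egg with probability 1/2
        total += sum(0.5 for q in queen if q == drone)
    return total / len(drone_alleles)


if __name__ == "__main__":
    # hypothetical allele labels
    print(sex_from_csd(("A", "B")))   # female
    print(sex_from_csd(("A", "A")))   # diploid male
    print(sex_from_csd(("A", None)))  # haploid male
    # one of four patriline fathers carries queen allele "A": 0.5 / 4
    print(diploid_male_fraction(("A", "B"), ["A", "C", "D", "E"]))  # 0.125
```

A common allele is more likely to be carried by both a queen and one of her mates, so such matched matings, and the diploid-male losses they cause, fall disproportionately on common alleles; this is the negative frequency dependence described above.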
The aim of this work was to explore the possibility of replacing or supplementing the standard sets of microsatellite markers used in honeybee research with the analysis of Csd diversity as a method for establishing patrilines and assessing their diversity within honeybee colonies.

Terminal Restriction Fragment Length Polymorphism Analysis of the Csd Gene
The Csd gene was analyzed by terminal restriction fragment length polymorphism (T-RFLP) using a two-step nested PCR (30 cycles, 51 °C annealing temperature). The amplification targeted the fragment encoding the potential-specifying domain, using previously described oligonucleotide primers [23] complementary to the Csd gene (NC_037640.1). The primer sequences were as follows: csdF1: 5′-AGACrATATGAAAAATTACACAATGA-3′, csdR1: 5′-TCATwTTTCATTATTCA-3′, csdF2: 5′-HEX-TATCGAGAAAsATCGAAAGAACGAT-3′, and csdR2: 5′-6FAM-ATTGAAATCCAAGGTCCCATTGGT-3′. PCR was performed with the PCR Mix Plus kit (A&A Biotechnology). Between the two amplifications, the products were treated with exonuclease I (0.05 U, Thermo Fisher Scientific, Waltham, MA, USA; 30 min at 37 °C, then 15 min at 85 °C) to remove residual primers. Subsequently, 1-2 µL of the second amplification product was digested in a 10 µL reaction with 2.5 U of VspI (Thermo Scientific) for 1 h at 37 °C. One microliter of the digestion product was denatured in Hi-Di formamide with 0.25 µL of the 350 bp ROX size standard (Life Technologies, Carlsbad, CA, USA) and separated by capillary electrophoresis on an ABI Prism 310 instrument.
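Expected T-RFLP peaks can be predicted in silico from a Csd amplicon sequence spanning csdF2 to csdR2. The sketch below is a minimal illustration assuming complete digestion and that VspI cuts AT^TAAT on both strands, so the HEX-labelled strand ends at the first site and the 6-FAM-labelled strand at the last one; it ignores dye-dependent mobility shifts. The function name and the toy sequence are hypothetical.

```python
import re
from typing import Tuple

SITE = "ATTAAT"  # VspI/AseI recognition sequence (palindromic)
CUT = 2          # cut position within the site on each strand: AT^TAAT


def terminal_fragments(amplicon: str) -> Tuple[int, int]:
    """Predicted lengths (nt) of the labelled single-stranded fragments.

    Returns (5' fragment, 3' fragment): the HEX strand runs from the start of
    the amplicon to the cut in the first site; the 6-FAM strand runs from the
    end of the amplicon back to the bottom-strand cut in the last site.
    """
    seq = amplicon.upper()
    # zero-width lookahead also finds overlapping occurrences of the site
    starts = [m.start() for m in re.finditer(f"(?={SITE})", seq)]
    if not starts:
        raise ValueError("no VspI site in the amplicon")
    five_prime = starts[0] + CUT
    # on the bottom strand the cut lies CUT nt inside the site's 3' end
    three_prime = len(seq) - (starts[-1] + len(SITE) - CUT)
    return five_prime, three_prime


if __name__ == "__main__":
    toy = "GGGG" + "ATTAAT" + "CCCCC"  # hypothetical 15-nt amplicon
    print(terminal_fragments(toy))      # (6, 7)
```

For a single site, the two predicted lengths sum to the amplicon length minus the 2-nt overhang, which can serve as a quick consistency check when matching predictions to observed peaks.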

Data Analysis
Amplified microsatellite markers and Csd restriction fragments were analyzed using GeneMarker v2.2.0 software (SoftGenetics LLC, State College, PA, USA) [28], and allele size calls were verified by visual inspection. The maternal genotype of each colony was determined by genotyping drones. Patrilines were established by grouping worker bees that shared either the same set of paternal microsatellite alleles or the same restriction fragment pattern of the paternal Csd amplicon. Workers with missing data at a given marker were assigned to the largest patriline consistent with their genotypes at all other loci.
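The grouping procedure can be sketched in Python as one possible implementation, not the GeneMarker workflow. Assumptions: maternal alleles are known per locus; a worker homozygous for a maternal allele is taken to carry it paternally; a worker whose genotype equals the queen's is left unresolved at that locus; and an incomplete paternal haplotype joins the largest complete patriline it matches. Function names, loci, and allele sizes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def paternal_allele(genotype: Pair, maternal: Pair) -> Optional[int]:
    """Allele inherited from the father, or None if it cannot be resolved
    (worker identical to the queen, or no maternal allele present)."""
    a, b = genotype
    # an allele can be paternal only if the other allele is maternal
    candidates = {x for x, other in ((a, b), (b, a)) if other in maternal}
    return candidates.pop() if len(candidates) == 1 else None


def assign_patrilines(workers: Dict[str, Dict[str, Optional[Pair]]],
                      maternal: Dict[str, Pair]) -> Dict[tuple, List[str]]:
    loci = sorted(maternal)
    complete, partial = {}, {}
    for wid, geno in workers.items():
        hap = tuple(
            paternal_allele(geno[l], maternal[l]) if geno.get(l) is not None else None
            for l in loci)
        (partial if None in hap else complete)[wid] = hap
    groups: Dict[tuple, List[str]] = defaultdict(list)
    for wid, hap in complete.items():
        groups[hap].append(wid)
    for wid, hap in partial.items():
        # complete patrilines that agree at every resolved locus
        matches = [h for h in groups
                   if None not in h and all(x is None or x == y for x, y in zip(hap, h))]
        if matches:
            groups[max(matches, key=lambda h: len(groups[h]))].append(wid)
        else:
            groups[hap].append(wid)  # no match: keep as a separate, incomplete group
    return dict(groups)


if __name__ == "__main__":
    maternal = {"A7": (100, 104), "B124": (200, 202)}  # hypothetical sizes
    workers = {
        "w1": {"A7": (100, 110), "B124": (200, 220)},
        "w2": {"A7": (104, 110), "B124": (202, 220)},
        "w3": {"A7": (100, 112), "B124": (200, 224)},
        "w4": {"A7": None, "B124": (200, 220)},  # A7 not scored
    }
    for hap, members in assign_patrilines(workers, maternal).items():
        print(hap, members)
    # (110, 220) ['w1', 'w2', 'w4']
    # (112, 224) ['w3']
```

Ties between equally large matching patrilines are resolved by first occurrence here; a real pipeline might instead flag such workers for manual review.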

Results
To investigate the potential of Csd alleles as markers for patriline identification in honeybee colonies and to compare their usefulness with that of commonly used sets of microsatellite markers, we analyzed worker bees from five colonies headed by naturally inseminated one-year-old queens. We analyzed five microsatellite loci (A113, Ap43, Ap55, B124, and A7), a number similar to that used in other studies detecting patrilines in bee colonies [4,5,29-32]. Additionally, we amplified the region of the Csd gene encoding the potential-specifying domain (Figure 1) and digested the Csd amplicon with VspI, an isoschizomer of AseI. All amplification products were analyzed by capillary electrophoresis.
The number of distinct Csd alleles in each colony was determined as the number of distinct combinations of 5′- and 3′-terminal fragment lengths and was compared with the number of distinct alleles at each microsatellite locus (Table 1).
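Treating each Csd allele as a pair of terminal fragment lengths, the count can be sketched as below; the example also shows why the combination resolves more alleles than either fragment alone. The function name and fragment lengths are hypothetical, and the two fragments of each allele are assumed to be already paired.

```python
from typing import Iterable, Tuple

Allele = Tuple[int, int]  # (5' HEX fragment length, 3' 6-FAM fragment length)


def count_alleles(workers: Iterable[Iterable[Allele]]) -> Tuple[int, int, int]:
    """Distinct 5' lengths, distinct 3' lengths, and distinct (5', 3') alleles."""
    alleles = {allele for worker in workers for allele in worker}
    five = {a[0] for a in alleles}
    three = {a[1] for a in alleles}
    return len(five), len(three), len(alleles)


if __name__ == "__main__":
    # hypothetical fragment lengths for three workers
    workers = [
        [(120, 80), (126, 80)],
        [(120, 80), (120, 86)],
        [(126, 86), (120, 86)],
    ]
    print(count_alleles(workers))  # (2, 2, 4)
```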
In every colony, the number of distinct Csd alleles was greater than the number of alleles at any of the microsatellite loci. The mean number of Csd alleles per colony was 17, whereas the mean numbers for the microsatellite loci ranged from 4 to 8.8. To identify the number of patrilines in each colony, we reconstructed the paternal genotypes of the analyzed worker bees. To do so, we determined the maternal alleles by genotyping drones and then identified as paternal those alleles in each worker that were not maternal. Every worker bee analyzed carried one of the two alleles found in the drones of its colony, so these two alleles were identified as the maternal alleles. Using Csd genotyping alone, we identified more patrilines than with the combined set of microsatellite markers. However, combining microsatellite and Csd genotyping revealed more patrilines (×1.1 to ×1.6) than Csd alone (Table 2).

Discussion
The presented results show several advantages of Csd genotyping for establishing patrilines in honeybee colonies. Compared with the microsatellite markers used in this study, Csd alleles exhibit greater variability, providing substantially higher genotyping resolution. In addition to the previously mentioned effect of balancing selection on Csd allele diversity, this is attributable to the structure of the amplified Csd fragment, which contains two variable elements, encoding the HVR and the proline-rich domain [15]. These elements are separated by restriction enzyme digestion and detected as HEX- and 6-FAM-labelled fragments, respectively. Because the lengths of these two fragments are not correlated, they can be treated as two independent markers. The relatively high number of Csd alleles in each colony also simplifies the identification of maternal alleles: because every worker carries one of the two maternal alleles, each maternal allele accounts for about 0.25 of the allele copies in a worker sample, whereas the paternal copies are divided among several different alleles. Consequently, we found that genotyping drones to identify maternal alleles is usually unnecessary. By contrast, the relatively low number of distinct microsatellite alleles in a population complicates the reconstruction of paternal genotypes. When a worker's genotype matches that of the queen, the paternal allele cannot be identified unambiguously. The scarcity of distinct alleles also increases the probability of homozygous genotypes. Such genotypes may be genuine, but they may also reflect a null allele that failed to amplify, for example because of incompatible PCR primers.
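The frequency argument can be turned into a simple check, sketched below under assumptions: the two most frequent alleles among all worker allele copies are taken as maternal, and each worker is then verified to carry at least one of them. Ties, or a frequent paternal allele (for example, from a dominant patriline), would defeat this heuristic, in which case drone genotyping remains the fallback. The function name and allele labels are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def infer_maternal(workers: Sequence[Tuple[str, str]]
                   ) -> Tuple[Tuple[str, str], Tuple[float, float], bool]:
    """Guess the two maternal alleles as the most frequent allele copies.

    Returns the two alleles, their frequencies among all allele copies, and
    whether every worker carries at least one of them.
    """
    counts = Counter(a for w in workers for a in w)
    if len(counts) < 2:
        raise ValueError("need at least two distinct alleles")
    total = sum(counts.values())
    (m1, c1), (m2, c2) = counts.most_common(2)
    consistent = all(m1 in w or m2 in w for w in workers)
    return (m1, m2), (c1 / total, c2 / total), consistent


if __name__ == "__main__":
    # hypothetical Csd alleles: maternal "A"/"B", paternal "P1".."P7"
    workers = [("A", "P1"), ("B", "P2"), ("A", "P3"), ("B", "P4"),
               ("A", "P5"), ("B", "P1"), ("A", "P6"), ("B", "P7")]
    print(infer_maternal(workers))  # (('A', 'B'), (0.25, 0.25), True)
```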
Furthermore, the most commonly used microsatellite markers have dinucleotide repeat units, which can lead to errors in scoring their sizes [10,11]. Some microsatellite markers are compound, consisting of two repeat units of different sizes, which results in unevenly spaced amplicon lengths and further complicates scoring. In contrast, the variable region of the Csd gene encodes a protein [15]. Length variation among Csd alleles therefore occurs in multiples of three nucleotides, producing evenly spaced amplicon fragments, and expected sizes can be predicted from sequence data already available in public databases. The only exception to this 3N rule is length variation in the 62-nucleotide intron between exons 7 and 8; in the described dataset, however, only two Csd alleles differed in intron length, each by one nucleotide (61 and 63 nucleotides).
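The 3N rule suggests a simple sanity check on scored Csd fragment sizes: snap each observed size to the nearest point of a ladder spaced three nucleotides apart from a reference allele, and flag sizes that fall off the ladder, which could indicate an intron length variant or a scoring problem. The function name, reference size, and 0.6 nt tolerance are hypothetical choices, not values from this study.

```python
from typing import Tuple


def snap_to_codon_ladder(size: float, reference: int,
                         tolerance: float = 0.6) -> Tuple[int, bool]:
    """Return (nearest reference + 3k size, whether size lies within tolerance)."""
    steps = round((size - reference) / 3)  # whole codons away from the reference
    snapped = reference + 3 * steps
    return snapped, abs(size - snapped) <= tolerance


if __name__ == "__main__":
    # hypothetical reference allele of 150 nt
    print(snap_to_codon_ladder(156.3, 150))  # (156, True)
    print(snap_to_codon_ladder(154.1, 150))  # (153, False): off the 3N ladder
```

A fixed tolerance is the simplest choice; in practice, it might be tuned to the sizing precision observed for the instrument and size standard.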
However, Csd genotyping has some drawbacks, the most important of which is the relatively inefficient amplification of the Csd HVR fragment, at least in our experience. Although we could generally amplify the fragment in a one-step PCR when using DNA from fresh specimens isolated with column-based protocols, relying on such material limits the analysis of historical samples and substantially increases the cost. We therefore established a two-step nested PCR protocol with exonuclease I digestion between the steps. Although this protocol is more time-consuming, both the amplification and digestion steps can be completed in a single day. We hope that a more efficient protocol can be developed in the future. By contrast, the microsatellite markers used in this study were amplified efficiently in a one-step PCR regardless of the method of sample storage and DNA isolation. The lower efficiency of Csd amplification, combined with the wide range of allele lengths, can also cause the two alleles of a diploid specimen to amplify with different efficiencies if their lengths differ considerably. The shorter allele amplifies much more readily, often at the expense of the longer one, and this difference is visible in peak intensity in capillary electrophoresis. In extreme cases, this effect may hinder reliable detection of the longer allele, but it is generally beneficial because it makes it straightforward to pair the 5′ and 3′ fragments of each allele.
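The intensity-based pairing can be sketched as follows, assuming one or two peaks per dye channel for a diploid worker and that the stronger HEX peak belongs to the same (shorter) allele as the stronger 6-FAM peak; a single peak in a channel is treated as shared by both alleles. The function name and peak values are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (fragment size in nt, peak height)


def pair_fragments(hex_peaks: List[Peak],
                   fam_peaks: List[Peak]) -> List[Tuple[float, float]]:
    """Pair 5' (HEX) and 3' (6-FAM) fragments of a diploid worker by peak height."""
    if not (1 <= len(hex_peaks) <= 2 and 1 <= len(fam_peaks) <= 2):
        raise ValueError("expected one or two peaks per channel")
    hex_sorted = sorted(hex_peaks, key=lambda p: p[1], reverse=True)
    fam_sorted = sorted(fam_peaks, key=lambda p: p[1], reverse=True)
    n = max(len(hex_sorted), len(fam_sorted))
    # a single peak in one channel is taken to be shared by both alleles
    hex_sorted = hex_sorted * (n // len(hex_sorted))
    fam_sorted = fam_sorted * (n // len(fam_sorted))
    # strongest with strongest (assumed shorter allele), weaker with weaker
    return [(h[0], f[0]) for h, f in zip(hex_sorted, fam_sorted)]


if __name__ == "__main__":
    # hypothetical peaks: the 120 + 80 nt allele amplified more strongly
    print(pair_fragments([(120.0, 3000), (138.0, 900)],
                         [(95.0, 800), (80.0, 2500)]))
    # [(120.0, 80.0), (138.0, 95.0)]
```

When the two alleles amplify with similar efficiency, their heights may be too close to rank reliably; requiring a minimum height ratio before accepting a pairing would be one way to guard against that.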
Our results show that, although Csd genotyping offers higher resolution for patriline identification (×1.1 to ×1.6) than standard sets of microsatellites, it is not sufficient to identify all patrilines in a given colony. This is probably because the number of distinguishable restriction patterns is smaller than the number of distinct Csd sequences. Consequently, assigning paternal alleles to groups solely on the basis of the restriction patterns of Csd amplicons may erroneously merge two or more distinct paternal alleles into a single group. To estimate the extent of this error, we performed an initial in silico assessment using sets of Csd sequences extracted from a database (see [26]); it showed that, in a colony with 20 patrilines, 1 to 2 Csd alleles would be incorrectly assigned to a group with an identical restriction pattern but a different sequence. This analysis assumed that the queen had been inseminated by 10-20 drones, which, in light of our present results and studies by other groups, appears to be a substantial underestimate [13]; with more patrilines per colony, the number of such merged alleles would be expected to be higher. Analyzing more microsatellite loci may improve the ability to detect all the patrilines in a colony. However, microsatellite genotyping is time-consuming and prone to errors (null alleles, misinterpretation of allele banding patterns, contamination) [33], so each additional locus represents a significant effort. The number of microsatellite loci needed to infer all the patrilines depends largely on the diversity of each locus, the non-sampling error, and the size of the population, and can therefore vary between studies [34,35]. Using Csd instead of multiple microsatellite loci to estimate patrilines when evaluating mating control success may therefore still be advantageous, reducing the molecular and downstream analytical workload while slightly improving resolution. A recent study showed that Csd variants can also be efficiently identified in complex mixed samples such as honey; that method targeted the HVR of the Csd gene and was compatible with high-throughput sequencing technologies [36].
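An in silico estimate of this kind can be sketched as a simple resampling procedure. This is not necessarily how the assessment above was carried out: it assumes each patriline carries a different Csd sequence drawn uniformly from a database, and counts how many sampled alleles are hidden because their predicted restriction pattern (5′ and 3′ fragment lengths) duplicates that of another sampled allele. The function name and database contents are hypothetical.

```python
import random
from typing import Dict, Tuple

Pattern = Tuple[int, int]  # predicted (5' length, 3' length)


def expected_hidden_alleles(database: Dict[str, Pattern], n_patrilines: int,
                            n_iter: int = 10_000, seed: int = 0) -> float:
    """Mean number of paternal alleles merged into another allele's pattern group."""
    if n_patrilines > len(database):
        raise ValueError("more patrilines than sequences in the database")
    rng = random.Random(seed)
    ids = list(database)
    hidden = 0
    for _ in range(n_iter):
        sample = rng.sample(ids, n_patrilines)  # distinct sequences, no replacement
        patterns = {database[s] for s in sample}
        hidden += n_patrilines - len(patterns)  # alleles lost to identical patterns
    return hidden / n_iter


if __name__ == "__main__":
    rng = random.Random(1)
    # hypothetical database: 300 sequences spread over a limited set of patterns
    db = {f"seq{i}": (rng.randrange(100, 160, 3), rng.randrange(70, 110, 3))
          for i in range(300)}
    print(expected_hidden_alleles(db, 20, n_iter=2000))
```

Re-running the function with larger values of n_patrilines shows how the expected number of hidden alleles grows as colonies contain more patrilines, which is the concern raised above about underestimated mating numbers.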
In summary, combining Csd and microsatellite genotyping substantially enhances the resolution of genetic studies in honeybee colonies, offering valuable insights into genetic diversity, reproductive success, and social dynamics. The Csd gene emerges as a promising tool for advancing genetic studies in honeybee populations. Csd genotyping should allow fewer microsatellite markers to be used, which would improve the comparability of results between research teams and individual studies. Such comparisons are currently difficult because the lack of consensus among researchers, together with the limited diversity of individual microsatellite markers, has led different groups to use different marker sets amplified with differently labeled primers [37].

Figure 1. Apis mellifera Csd locus. The honeybee Csd gene locus is depicted, with rectangles representing exons and shaded rectangles or their fragments denoting the coding segments of the gene. Arrows indicate the positions of the primers employed in this study. The green and blue sections of exons 7 and 8 illustrate the localization of the amplicon and its corresponding restriction fragments, which were analyzed using the T-RFLP method in this research.


Table 1. Number of distinct alleles (maternal and paternal) at each marker locus in each of the five analyzed colonies.

Table 2. Number of patrilines identified in the analyzed colonies using microsatellites, Csd, or microsatellites combined with Csd genotyping.