Signatures of Positive Selection in the Genome of Apis mellifera carnica: A Subspecies of European Honeybees

The technology of long reads substantially improved the contingency of the genome assembly, particularly resolving contiguity of the repetitive regions. By integrating the interactive fragment using Hi-C, and the HiFi technique, a solid genome of the honeybee Apis mellifera carnica was assembled at the chromosomal level. A distinctive pattern of genes involved in social evolution was found by comparing it with social and solitary bees. A positive selection was identified in genes involved with cold tolerance, which likely underlies the adaptation of this European honeybee subspecies in the north hemisphere. The availability of this new high-quality genome will foster further studies and advances on genome variation during subspeciation, honeybee breeding and comparative genomics.


Introduction
Insect pollination is essential for maintaining the balance of the ecosystem, which contributes approximately 35% to crop pollination [1]. The honeybee Apis mellifera is a key managed pollinator, with an estimated annual global economic value of $195 billion [2]. The recent decline of honeybee colonies has provoked serious concerns regarding the biodiversity, as well as food security [3]. A number of stressors have been proven to cause the honeybee collapses, including parasites, pesticide, climate change and habitat loss [4][5][6][7][8]. Among those stressors, parasites are a major cause leading to colony losses, with a synergistic effect with pesticide. It is known that honeybee strains showed variation in tolerance towards parasites and climates [9][10][11][12][13]. However, the genetic mechanism underlying that tolerance is yet not fully understood. In this study, we aim to reveal the genomic variation within A. mellifera species, which is essential to understand their evolution, adaptation to different climates, as well as to refine breeding strategies for disease-tolerant strains.
In this study, we provided a highly contiguous genome assembly of a valuable subspecies of the European honeybee, Apis mellifera carnica. Two complementary sequencing technologies of HIFI and Hi-C were used to generate a high-quality chromosome genome assembly. Comparative genomic analysis revealed gene families selected during the social evolution and climate adaptation.
Life 2022, 12, 1642 3 of 12 tree was constructed using the Neighbor joining model with 1000 bootstraps and the small carpenter bees were used to root the tree.

Identifying Genome Selection Pressures
Orthologs between 7 species (Nasonia vitripennis, Bombus terrestris, Bombus impatiens, Apis florea, Apis dorsata, Apis laboriosa, Apis cerana) and 4 subspecies (Apis mellifera carnica, Apis mellifera caucasica, Apis mellifera ligustica, Apis mellifera mellifera) were discovered using Orthofinder (v2.5.2) [25]. To optimize the number of single-copy orthologs, we categorized them as such if at least 3 Apis mellifera subspecies and Apis cerana had a single-copy gene for the given orthogroup, culminating at 6328 single-copy ortholog families. Phylogenetic tree reconstruction, including all species described above, was undertaken by OrthoFinder. For each single-copy ortholog family, the longest protein isoforms for each of the species' gene were used in multiple sequence alignment with MAFFT (using local-pair algorithm and 1000 iterations) [26] and unreliably aligned residues and sequences were masked with GUIDANCE (v2.02) [27]. To optimize alignment length without gaps, we ran a maxalign script and removed subsequent sequences leading to more than 30% of gapped alignment as long as it did not result in the removal of any A. mellifera subspecies and A. cerana [28]. The protein sequences were replaced with coding sequences in the multiple alignments using the pal2nal script [29]. Furthermore, sequences containing a stop codon or having a length inconsistency between protein and DNA coding sequences (after removal of undefined bases) were filtered out. Alignments regions, where gapped positions were present, were removed with a custom python script, as these are the most problematic for positive selection inference [30,31]. Finally, CDS shorter than 100 nucleotides were eliminated [32].
Phylogenetic tests of positive selection in protein-coding genes usually contrast substitution rates at non-synonymous sites to substitution rates at synonymous sites taken as a proxy to neutral rates of evolution. The adaptive branch-site random effects model (aBSREL) from Hyphy software package was used to detect positive selection experienced by a gene family in a subset of sites in a specific branch of its phylogenetic tree [33]. The test for positive selection was run only on the branches leading to the origin of A. mellifera and on each A. mellifera subspecies. Results from the adaptive branch-site random effects model were corrected for multiple testing as one series using False Discovery Rate (FDR) and set up our significant threshold at 10% [34].

Test for Functional Category Enrichment
Gene Ontology (GO) annotations for our gene families were taken from Hymenoptera Genome database [35]. The enrichment of functional categories was evaluated with the package topGO version 2.4 of Bioconductor [36,37]. To identify functional categories enriched for genes under positive selection, strengthened, and relaxed selection pressure, the SUMSTAT test was used [38,39]. The SUMSTAT test is more sensitive than other methods and minimizes the rate of false positives [40][41][42][43]. To be able to use the distribution of log-likelihood ratios of the aBSREL and RELAX tests as scores in the SUMSTAT test, a fourth root transformation was used [39]. This transformation conserves the ranks of gene families [44]. Gene Ontology categories mapped to less than 10 genes were discarded. The list of significant gene sets resulting from enrichment tests is usually highly redundant. We therefore implemented the "elim" algorithm from the Bioconductor package topGO, to decorrelate the graph structure of the Gene Ontology. To account for multiple testing, the final list of p-values resulting from this test was corrected with the FDR and set up our significant threshold at 20%. To cluster the long list of significant functional categories, we used REVIGO with the SimRel semantic similarity algorithm and medium size (0.7) result list [45,46].

The Genome Assembly Statistics of A. mellifera carnica
A robust genome of 226.02 Mbp comprised of 313 contigs was assembled, which were further collapsed into 169 scaffolds (GCA_013841245.2) (Table 1). By aligning the predicted protein sequences to 1066 core arthropod Benchmarking Universal Single-copy orthologs (BUSCOs) [47], 93.53% of complete BUSCOs were identified. The results suggest that the assembled genome and predicted gene set were complete. Phylogenetic tree reconstruction revealed the subspeciation events of the European honeybees and the topology agreed with the evolution from solitary to social living ( Figure 1) [48,49].

The Genome Assembly Statistics of A. mellifera carnica
A robust genome of 226.02 Mbp comprised of 313 contigs was assembled, which were further collapsed into 169 scaffolds (GCA_013841245.2) ( Table 1). By aligning the predicted protein sequences to 1066 core arthropod Benchmarking Universal Single-copy orthologs (BUSCOs) [47], 93.53% of complete BUSCOs were identified. The results suggest that the assembled genome and predicted gene set were complete. Phylogenetic tree reconstruction revealed the subspeciation events of the European honeybees and the topology agreed with the evolution from solitary to social living ( Figure 1) [48,49].

Genome Alignment among Honeybee Subspecies
By pair-wised alignment, 83% and 89% of A. mellifera carnica genome can be perfectly aligned to A. mellifera mellifera and A. mellifera DH4 genome, respectively (Figures 2 and S1). The average length of the aligned region was 103 Kbp and 108 Kbp for A. mellifera mellifera and A. mellifera DH4, respectively ( Figure S2). The inverted fragment may reflect natural structural variation among the genomes, reflecting local adaption [50,51]. Overall, 88,380 microsatellites with dinucleotides motif were identified in A. mellifera carnica. Comparatively, 89,099 and 89,569 were identified in A. mellifera DH4 and A. mellifera mellifera, Figure 1. Phylogenetic tree of the studies insects for the genome selection. All nodes were 100% bootstrap supported. N. vitripennis was used to root the tree. For A. mellifera carnica, 93.53% of complete BUSCOs were found, which suggests the assembly is complete.

Genome Alignment among Honeybee Subspecies
By pair-wised alignment, 83% and 89% of A. mellifera carnica genome can be perfectly aligned to A. mellifera mellifera and A. mellifera DH4 genome, respectively (Figures 2 and S1). The average length of the aligned region was 103 Kbp and 108 Kbp for A. mellifera mellifera Life 2022, 12, 1642 5 of 12 and A. mellifera DH4, respectively ( Figure S2). The inverted fragment may reflect natural structural variation among the genomes, reflecting local adaption [50,51]. Overall, 88,380 microsatellites with dinucleotides motif were identified in A. mellifera carnica. Comparatively, 89,099 and 89,569 were identified in A. mellifera DH4 and A. mellifera mellifera, respectively. The relative abundance of microsatellites along the motifs were not significantly different among the three genomes (Pearson's Chi-squared test, df = 24, p = 0.26). However, the number of microsatellites decreased with the increasing number of repeats for all three genomes (Pearson's correlation coefficient, df = 9, p < 0.001, Figure 3A). A set of linkage map makers were further compared among the three genomes [52][53][54]. Out of 1081 paired microsatellite primers, 839 (77%) could be aligned to all three genomes ( Figure S3), with an average density of 4.8 cM per locus ( Figure 3B). For the remaining 242 markers, 162 were aligned to at least one genome. A. mellifera DH4 shared a higher number of markers with A. mellifera carnica compared with A. mellifera mellifera, which significantly deviated from random (Pearson's Chi-squared test, df = 2, p < 0.01) and is congruent with the genome phylogenetic tree in general ( Figure S4). respectively. The relative abundance of microsatellites along the motifs were not significantly different among the three genomes (Pearson's Chi-squared test, df = 24, p = 0.26). However, the number of microsatellites decreased with the increasing number of repeats for all three genomes (Pearson's correlation coefficient, df = 9, p < 0.001, Figure 3A). A set of linkage map makers were further compared among the three genomes [52][53][54]. Out of 1081 paired microsatellite primers, 839 (77%) could be aligned to all three genomes ( Figure  S3), with an average density of 4.8 cM per locus ( Figure 3B). For the remaining 242 markers, 162 were aligned to at least one genome. A. mellifera DH4 shared a higher number of markers with A. mellifera carnica compared with A. mellifera mellifera, which significantly deviated from random (Pearson's Chi-squared test, df = 2, p < 0.01) and is congruent with the genome phylogenetic tree in general ( Figure S4).

Phylogenetic Analysis of Sociality-Related Proteins
The evolutionary process of bee sociality is fascinating, and highlights how genomes evolved to give rise to new and complex behaviors [49,55,56]. Insulin is an essential gene family regulating honeybee caste determination [57,58]. Hexamerin regulates the reproductive tissue development after honeybee caste differentiation [59][60][61]. The two gene families were selected to indicate the social evolution. The phylogenetic tree of insulin and hexamerin gene families clearly showed that solitary bees (digger bees and red mason bees) were an early branch from the root (small carpenter bees), followed by bumble bees

Phylogenetic Analysis of Sociality-Related Proteins
The evolutionary process of bee sociality is fascinating, and highlights how genomes evolved to give rise to new and complex behaviors [49,55,56]. Insulin is an essential Life 2022, 12, 1642 6 of 12 gene family regulating honeybee caste determination [57,58]. Hexamerin regulates the reproductive tissue development after honeybee caste differentiation [59][60][61]. The two gene families were selected to indicate the social evolution. The phylogenetic tree of insulin and hexamerin gene families clearly showed that solitary bees (digger bees and red mason bees) were an early branch from the root (small carpenter bees), followed by bumble bees (Figure 4). The four honeybee species were clustered together, indicating that the a distinctive gene selection of sociality [62][63][64].

Signature of Positive Selection
Overall, 4897 single-copy orthologous groups were identified, out of which 245 orthologous groups displayed signs of positive selection in at least one branch test. The number of orthologous groups under positive selection within each branch varied significantly (Chi-squared test, p < 2.2 × 10 −16 ) and ranged from 10 to 114 with 10, 27, 45, 78, and 114 for A. mellifera caucasica, A. mellifera mellifera, A. mellifera spp., A. mellifera carnica, and A. mellifera ligustica, respectively (Table S1). Such a variation in the number of genes under positive selection among the different A. mellifera subspecies may highlight a fast pace of adaptation or directed domestication in A. mellifera carnica and A. mellifera ligustica compared with others [65,66]. However, this result could also be the consequence of different population structures among A. mellifera subspecies with differential gene flow and introgression levels. Only one orthologous group, encoding for the protein obscurin involved in myogenesis and Hippo signaling pathway [67,68], was found to be under positive selection in three branches (A. m. carnica, A. m. ligustica, and A. m. spp., Table 2). Moreover, a few orthologous groups were found under positive selection in more than one branch (Table 2). Additionally, Hippo signaling pathway was involved in cold temperature adaptation in bees [13].

Signature of Positive Selection
Overall, 4897 single-copy orthologous groups were identified, out of which 245 orthologous groups displayed signs of positive selection in at least one branch test. The number of orthologous groups under positive selection within each branch varied significantly (Chi-squared test, p < 2.2 × 10 −16 ) and ranged from 10 to 114 with 10, 27, 45, 78, and 114 for A. mellifera caucasica, A. mellifera mellifera, A. mellifera spp., A. mellifera carnica, and A. mellifera ligustica, respectively (Table S1). Such a variation in the number of genes under positive selection among the different A. mellifera subspecies may highlight a fast pace of adaptation or directed domestication in A. mellifera carnica and A. mellifera ligustica compared with others [65,66]. However, this result could also be the consequence of different population structures among A. mellifera subspecies with differential gene flow and introgression levels. Only one orthologous group, encoding for the protein obscurin involved in myogenesis and Hippo signaling pathway [67,68], was found to be under positive selection in three branches (A. m. carnica, A. m. ligustica, and A. m. spp., Table 2). Moreover, a few orthologous groups were found under positive selection in more than one branch (Table 2). Additionally, Hippo signaling pathway was involved in cold temperature adaptation in bees [13].

Functional Categories Enriched of Positively Selected Genes
We identified 11 significant functional categories in A. m. carnica, 28 in A. m. ligustica, 5 in A. m. mellifera and caucasica, and 6 in A. mellifera branch, which were enriched in genes under positive selection at 20% FDR. The long list of significant GO-terms found to be significantly enriched of positively selected genes in A. m. ligustica were mainly related to larval development ( Figure 5), as demonstrated with clustering from REVIGO [45]. Interestingly, the two most significant enriched functional genes under positive selection in A. mellifera branch, chitin metabolism and mitochondrial translation (Figure 6), matched functional genes previously found in A. mellifera [34]. While functions found to be enriched of positively selected genes in A. m. mellifera were mainly related to the nervous system, in A. m. caucasica. They were mainly related to autophagy/cell death ( Figure 6). In A. m. carnica, most of significant GO terms seems to be linked with stress tolerance, notably response to wounding, reactive oxygen species (ROS) metabolism, and larval midgut programmed cell death ( Figure 6). More precisely, several significant GO terms are likely involved in cold resistance in honeybees, such as developmental growth [69], ROS metabolism [70], and hippo signaling [13]. Interestingly, the gene encoding the protein Dachsous, which was under positive selection in A. m. carnica and found in several significant GO terms, plays a key role in the adaptation to temperate climate in the A. m. sinisxinyuan [13]. Furthermore, the genes Amel_mTOR and Amel_Imp, encoding serine/threonine-protein kinase mTOR and insulin-like growth factor 2 mRNA-binding protein 1, respectively, might also play a role in cold tolerance as they both are involved in the insulin pathway regulating food intake, essential for cold tolerance [71,72]. Moreover, cold tolerance in A. cerana involved serine/threonine-protein kinases like mTOR [73]. The protein RB1-inducible coiled-coil protein 1, part of the GO term 'larval midgut cell programmed cell death', was found to be under positive selection in A. m. caucasica and involved in cold adaptation in amphipods [74].

Functional Categories Enriched of Positively Selected Genes
We identified 11 significant functional categories in A. m. carnica, 28 in A. m. ligustica 5 in A. m. mellifera and caucasica, and 6 in A. mellifera branch, which were enriched in genes under positive selection at 20% FDR. The long list of significant GO-terms found to be significantly enriched of positively selected genes in A. m. ligustica were mainly related to larval development ( Figure 5), as demonstrated with clustering from REVIGO [45]. Inter estingly, the two most significant enriched functional genes under positive selection in A mellifera branch, chitin metabolism and mitochondrial translation (Figure 6), matched functional genes previously found in A. mellifera [34]. While functions found to be en riched of positively selected genes in A. m. mellifera were mainly related to the nervous system, in A. m. caucasica. They were mainly related to autophagy/cell death ( Figure 6). In A. m. carnica, most of significant GO terms seems to be linked with stress tolerance, nota bly response to wounding, reactive oxygen species (ROS) metabolism, and larval midgu programmed cell death ( Figure 6). More precisely, several significant GO terms are likely involved in cold resistance in honeybees, such as developmental growth [69], ROS metab olism [70], and hippo signaling [13]. Interestingly, the gene encoding the protein Dachsous, which was under positive selection in A. m. carnica and found in several signif icant GO terms, plays a key role in the adaptation to temperate climate in the A. m. sinis xinyuan [13]. Furthermore, the genes Amel_mTOR and Amel_Imp, encoding serine/threonine-protein kinase mTOR and insulin-like growth factor 2 mRNA-binding protein 1, re spectively, might also play a role in cold tolerance as they both are involved in the insulin pathway regulating food intake, essential for cold tolerance [71,72]. Moreover, cold toler ance in A. cerana involved serine/threonine-protein kinases like mTOR [73]. The protein RB1-inducible coiled-coil protein 1, part of the GO term 'larval midgut cell programmed cell death', was found to be under positive selection in A. m. caucasica and involved in cold adaptation in amphipods [74].  Supplementary Materials: The following supporting information can be downloaded at: www.mdpi.com/xxx/s1, Figure S1. Dot plot of among three honeybee subspecies. The honey bee Apis mellifera carnica showed frame shifts and inversion compared with A. mellifera DH4 and A. mellifera mellifera. Figure S2. Aligned sequences between the genomes. The average length of the aligned region was 103 Kbp and 108 Kbp for A. mellifera mellifera and A. mellifera DH4 respectively. Figure S3. Linkage map markers using to align among the three honeybee genomes. Figure S4. Phylogenetic tree and estimated completeness of the studies 7 genomes. (a) The phylogenetic tree was constructed on protein sequences of 874 single-copy orthologs shared among all 7 genomes. All nodes were 100% bootstrap supported. D. melanogaster was used to root the tree. (b) Completeness of predicted protein sets of each genome was assessed by aligning to the arthropod BUSCOs. For A. mellifera carnica, 93.53% of complete BUSCOs were found, which suggest the assembly is complete. Table S1. Number of positively selected genes in each A. mellifera subspecies tested and as well as the A. mellifera branch and number of overlapping genes under positive selection in more than one branch.
Author  Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/life12101642/s1, Figure S1. Dot plot of among three honeybee subspecies. The honey bee Apis mellifer a carnica showed frame shifts and inversion compared with A. mellifera DH4 and A. mellifera mellifera. Figure S2. Aligned sequences between the genomes. The average length of the aligned region was 103 Kbp and 108 Kbp for A. mellifera mellifera and A. mellifera DH4 respectively. Figure S3. Linkage map markers using to align among the three honeybee genomes. Figure S4. Phylogenetic tree and estimated completeness of the studies 7 genomes. (a) The phylogenetic tree was constructed on protein sequences of 874 single-copy orthologs shared among all 7 genomes. All nodes were 100% bootstrap supported. D. melanogaster was used to root the tree.
(b) Completeness of predicted protein sets of each genome was assessed by aligning to the arthropod BUSCOs. For A. mellifera carnica, 93.53% of complete BUSCOs were found, which suggest the assembly is complete. Table S1. Number of positively selected genes in each A. mellifera subspecies tested and as well as the A. mellifera branch and number of overlapping genes under positive selection in more than one branch.