Evidence for Divergent Selection on Immune Genes between the African Malaria Vectors, Anopheles coluzzii and A. gambiae

Simple Summary A comparison of the genomes of the African malaria vectors, Anopheles gambiae and A. coluzzii, revealed that immune genes are highly diverged. Although these two species frequently co-occur within a single site, they occur in distinct larval habitats. Our results taken in the context of known differences in the larval habitats occupied by these taxa support the hypothesis that observed genetic divergence may be driven by immune response to microbial agents specific to these habitats. Strict within species mating may have subsequently evolved in part to maintain immunocompetence which might be compromised by dysregulation of immune pathways in hybrids. We conclude that the evolution of immune gene divergence among this important group of species may serve as a useful model to explore ecological speciation in general. Abstract During their life cycles, microbes infecting mosquitoes encounter components of the mosquito anti-microbial innate immune defenses. Many of these immune responses also mediate susceptibility to malaria parasite infection. In West Africa, the primary malaria vectors are Anopheles coluzzii and A. gambiae sensu stricto, which is subdivided into the Bamako and Savanna sub-taxa. Here, we performed whole genome comparisons of the three taxa as well as genotyping of 333 putatively functional SNPs located in 58 immune signaling genes. Genome data support significantly higher differentiation in immune genes compared with a randomly selected set of non-immune genes among the three taxa (permutation test p < 0.001). Among the 58 genes studied, the majority had one or more segregating mutations (72.9%) that were significantly diverged among the three taxa. Genes detected to be under selection include MAP2K4 and Raf. Despite the genome-wide distribution of immune genes, a high level of linkage disequilibrium (r2 > 0.8) was detected in over 27% of SNP pairs. We discuss the potential role of immune gene divergence as adaptations to the different larval habitats associated with A. gambiae taxa and as a potential force driving ecological speciation in this group of mosquitoes.


Introduction
Virulence genes in pathogens and immune genes in host/vector species are thought to be co-evolving in an arms race [1,2], resulting in rapid evolution of immune genes, as observed across a broad group of organisms, including primates [3], Drosophila species [4,5], and mosquitoes [6]. Differences in selection pressure imposed by distinct microbial environments is likely a major driver of immunological divergence [7].
The immunogenetics of Anopheles mosquitoes has been intensively studied, primarily as a means of understanding their role in the transmission of human malaria parasites. Extensive variation in susceptibility to Plasmodium infection has been observed among populations of the principal African mosquito vectors [8][9][10]. However, it is likely that any direct selection for Plasmodium resistance in these populations is weak due to the generally low infection rate [11,12], variety of different immune signaling pathways involved in resistance [13][14][15][16], and the fact that its innate immune system is co-evolving with many other microbial challenges present in the environment [17]. Anopheles mosquitoes experience different immune challenges over the course of development (e.g., larval density, larval nutrition [18,19], and during adult blood feeding [20,21]). It is likely that immune challenges in the larval habitat would be the most important driver of selection [22].
West African populations of A. gambiae sensu lato are highly structured and have served as models for the study of speciation with gene flow [23][24][25]. In this paper, we focus on the Bamako and Savanna sub-taxa (known as chromosomal forms) of A. gambiae sensu stricto and A. coluzzii, formerly referred to as the Mopti form of A. gambiae but elevated to species status in 2013 [26]. These three are found in sympatry in Mali [27,28], are largely reproductively isolated [29], and are morphologically indistinguishable so that molecular and/or cytogenetic techniques are required to distinguish them [30,31]. Genomic divergence between A. coluzzii and the Bamako and Savanna forms A. gambiae is most conspicuous in the centromeric regions in each of their three chromosomes (X, 2, and 3). The metaphor "islands of speciation" has been coined to refer to these genomic regions [24,32]. Although adults of the three taxa are sympatric, they differ with respect to the aquatic environments inhabited by their larvae [33][34][35]. These habitats likely contain distinct microbial communities that drive divergence in immune related genes and, since many of these same genes are related to immunity against Plasmodium, this may explain differences in susceptibility to the malaria parasite [6,21,36].
In this study, we set out to test the hypothesis that immune genes show a high degree of divergence. We evaluated a suite of 58 immune genes and compared these with comparable groups of non-immune related genes. In addition to underlying variation in Plasmodium susceptibility, immune gene divergence would support the hypothesis that immune challenges are an important selective force driving ecological speciation in this group of mosquitoes [35,37].
We determined genotypes for 333 segregating single nucleotide polymorphisms (SNPs) for each of 471 mosquitoes from the village of Kela, Mali, a site where the Bamako and Savanna forms of A. gambiae and A. coluzzii occur in sympatry. This panel of SNPs contained either non-synonymous substitutions or resulted in a change in codon frequency with the potential to cause functional variation in the 58 immune genes in which they occur [38][39][40]. The selected immune genes are all located outside the speciation islands; therefore, there is no a priori expectation that they should show elevated divergence among taxa. However, SNPs were observed in all 58 genes tested, and allelic variation was significantly associated with taxa. Additionally, we estimated between-taxon divergence of SNPs in an expansive set of 231 known immune genes using whole-genome sequencing of 12 Bamako, 11 Savanna, and 13 A. coluzzii collected from Kela. This allowed us to compare levels of divergence between immune and non-immune genes. Based on these data, we discuss the potential implications

Species Identification
Members of the A. gambiae s.l. species complex (A. coluzzii and A. gambiae) were distinguished from other Anopheles species using an established PCR assay [42]. Specimens of A. coluzzii were distinguished from A. gambiae s.s. based on an established PCR assay [30,43].
A. gambiae is further divided into sub-specific taxa known as the Bamako and Savanna chromosomal forms [27]. These were are identified by karyotyping five paracentric inversions on the right arm of chromosome 2 (2Rj, 2Rb, 2Rc, 2Rd, and 2Ru) following standard classification guidelines [27,28]. The polytene chromosomes used for these analyses were extracted from ovarian nurse cells using standard cytogenetic methods [41]. Chromosome banding patterns were examined using an Olympus BX-50 phase contrast microscope (Olympus America, San Jose, CA, USA) and scored relative to the polytene chromosome map for the A. gambiae complex [44]. Immune signaling pathways associated with resistance to microbial infection in Anopheles mosquitoes involve proteins with functional domains that are highly conserved across metazoans and include the nuclear factor (NF)-κB-dependent Toll and immune deficiency (IMD) pathways [45,46], signaling through Janus kinase/signal transducers and activators of transcription (JAK-STAT) [47], and the mitogen-activated protein kinase (MAPK)-dependent signaling pathways [16,[48][49][50]. Signaling from the tumor necrosis factor (TNF) receptor can also modulate these pathways [51,52]. Other signaling pathways involved in growth, metabolism, and development also regulate innate immunity and responses to infection. These include the insulin/insulin-like growth factor signaling (IIS) cascade [14,16,[53][54][55] and transforming growth factor (TGF)-beta signaling [56][57][58].
Despite the high degree of conservation of functional domains in these immune signaling pathway proteins, there is variation in susceptibility phenotypes within vector species which can be explained, in part, by genetic variation [8,36,59]. It has been proposed that the ancestral A. gambiae phenotype is resistant to Plasmodium infection and that a deficiency in immunity is responsible for susceptibility [8]. Studies of Neotropical Anopheles species also support this hypothesis [60].

SNP Discovery
A total of 48 individual mosquitoes including 32 A. gambiae and 16 A. coluzzii were selected for SNP discovery and confirmation. The conserved domains (e.g., catalytic, protein interaction) of a subset of 58 immune-related genes were sequenced. ChromasLite ver. 2.01 Technelysium Pty Ltd, South Brisbane, Australia) was used to view chromatograms and convert chromatograms to text sequences. Geneious software (Geneious Biologics, West Auckland, New Zealand) was used for sequence alignment. Details are provided in File S1.

SNP Genotyping
From the total of 1947 SNPs discovered, 333 were potentially functional (non-synonymous substitutions or synonymous changes with a 2-fold difference in the codon usage rates, therefore likely to affect folding [38,61]). These were selected for multiplex iPLEX SNP genotyping (Agena Biosciences, San Diego, CA, USA). A total of 254 A. gambiae s.s. (167 Bamako form, 87 Savanna form) and 194 A. coluzzii, were selected for SNP genotyping. SNP genotyping was conducted at the Veterinary Genetics Laboratory at UC Davis.

Genotype Data Analysis
The significance of the association between genotype and chromosomal form was determined using chi-square tests adjusting for multiple comparisons. Effect size and power calculations were conducted using the pwr package in R statistics software [62]. Heterozygosity and F ST were calculated using Arlequin version 3.5 [63]. Loci under selection were detected using the method described by Excoffier et al. [64] as implemented in Arlequin version 3.5 [63]. This method employees a hierarchical island model, in which demes exchange more migrants within than between groups, to generate the joint distribution of genetic diversity within and between populations and greatly reducing false positive loci [64]. Linkage disequilibrium value (r 2 ) was calculated using a maximum likelihood method implemented in the EMLD program [65]. Cytoscape [66] was used to draw a gene network.

Whole Genome Sequencing and Data Analysis
We performed whole-genome sequencing on 23 A. gambiae s.s. individuals (12 Bamako and 11 Savanna) and 12 A. coluzzii individuals all collected from Kela, Mali. We followed the protocol described in Norris et al. [67] for genomic DNA library construction. Genomic DNA libraries were sequenced by the QB3 Vincent J Coates Genomics Sequencing Laboratory at UC Berkeley on the Illumina HiSeq2500 platform with paired-end 100 bp reads. Reads were aligned to the A. gambiae reference genome (AgamP3 [68]) with the BWA-MEM aligner [69]. Freebayes v9.9.2-46 [70] was used for SNP identification employing standard filters. F ST values were calculated using the Weir and Cockerham estimator implemented in VCFtools 0.1.12b [71]. Bootstrap p-values [72,73] were generated by comparing the F ST value of the set of immune genes to values computed for a random sample of 231 genes (out of 12,519 total genes) repeated 1000 times. Details are provided in File S1.

Data Accessibility
Genome sequencing data have been deposited in the NCBI Sequence Read Archive (SRP062875). Other data associated with this study are available in supporting information at: https://popi.ucdavis. edu/misc/insects/Lee_et_al_2020/.

Results
Mean heterozygosities for the 333 immune gene SNPs were similar (H = 0.14) in all three taxa with standard deviations of 0.160-0.168, although the number of invariant SNPs was slightly lower in A. coluzzii (n = 56) than in the Bamako (n = 76) or Savanna (n = 76) A. gambiae populations. Of the 333 SNPs sampled, 100 in 43 genes showed significant associations with taxa (Table 1).
F ST based on SNPs was highest between A. coluzzii and Bamako (=0.106) and lowest between the Bamako and Savanna populations (=0.049). Both genome sequence and SNP genotype data consistently indicate that the sub-specific taxa Savanna and Bamako (sub-taxa within A. gambiae) are least diverged.
Paracentric chromosome inversions on chromosome 2R likely contribute to immune gene divergence since 13 of the 14 2R genes studied are located near (<1 Mbp) or within inversion break points (j, b, c, and u, indicated in Figure 1). FST based on SNPs was highest between A. coluzzii and Bamako (=0.106) and lowest between the Bamako and Savanna populations (=0.049). Both genome sequence and SNP genotype data consistently indicate that the sub-specific taxa Savanna and Bamako (sub-taxa within A. gambiae) are least diverged.
Paracentric chromosome inversions on chromosome 2R likely contribute to immune gene divergence since 13 of the 14 2R genes studied are located near (<1 Mbp) or within inversion break points (j, b, c, and u, indicated in Figure 1). This significantly exceeds the average number of immune genes expected in these regions if their distributions were random (8.6, +/−1.59, permutation test p < 0.05). It is not clear whether immune genes migrate to inversions, similar to genes migrating from the X to autosomes [74,75], or if inversions were selected upon in these regions. We used a hierarchical island model [64] to identify loci under selection and to account for population structure. This model is expected to generate a lower number of markers under selection by removing false positive loci, as compared with nonhierarchical models that are strictly based on FST values [64]. Although Raf-784 and Raf-851 are only 67 bp apart, we present the genotype distribution of both SNPs because the most common haplotype in each taxon is different. For instance, GC (a concatenation of Raf-784 and Raf-851) is the most common Raf haplotype in A. coluzzii, the AC haplotype is most common in Bamako, and AA is most common in Savanna). This pattern of taxon-specific divergence in immune signaling genes supports restricted gene flow between taxa that may be influenced by adaptation to different ecological niches [76]. Furthermore, these data indicate that potentially functional divergence is not restricted to pericentric regions [77]. In support of selection driven divergence, three SNPs, MAP2K4-164, Raf-784, and Raf-851, were identified as loci under selection (based on a hierarchical island model [64] p < 4.2 × 10 −5 ; Table 2). This significantly exceeds the average number of immune genes expected in these regions if their distributions were random (8.6, +/−1.59, permutation test p < 0.05). It is not clear whether immune genes migrate to inversions, similar to genes migrating from the X to autosomes [74,75], or if inversions were selected upon in these regions. We used a hierarchical island model [64] to identify loci under selection and to account for population structure. This model is expected to generate a lower number of markers under selection by removing false positive loci, as compared with non-hierarchical models that are strictly based on F ST values [64]. Although Raf-784 and Raf-851 are only 67 bp apart, we present the genotype distribution of both SNPs because the most common haplotype in each taxon is different. For instance, GC (a concatenation of Raf-784 and Raf-851) is the most common Raf haplotype in A. coluzzii, the AC haplotype is most common in Bamako, and AA is most common in Savanna). This pattern of taxon-specific divergence in immune signaling genes supports restricted gene flow between taxa that may be influenced by adaptation to different ecological niches [76]. Furthermore, these data indicate that potentially functional divergence is not restricted to pericentric regions [77]. In support of selection driven divergence, three SNPs, MAP2K4-164, Raf-784, and Raf-851, were identified as loci under selection (based on a hierarchical island model [64] p < 4.2 × 10 −5 ; Table 2).
Relatively low linkage disequilibrium (LD) was found among the DUSP19, ILP2, MAPK10, MOK-RAGE, RAS, REL1, TAB1, and Toll5A genes (r 2 < 0.27 in all pairwise LD), which show varying degrees of pathway networking (Figure 2).  Relatively low linkage disequilibrium (LD) was found among the DUSP19, ILP2, MAPK10, MOK-RAGE, RAS, REL1, TAB1, and Toll5A genes (r 2 < 0.27 in all pairwise LD), which show varying degrees of pathway networking (Figure 2).   Table 3). We also found high LD (mean r 2 = 0.56) between SNPs with no obvious signaling pathway relationship (Figure 2). The level of divergence of SNPs within a comprehensive set of 231 known immune genes was significantly higher (bootstrap p < 0.001) than non-immune genes in comparisons between A. coluzzii and both A. gambiae taxa ( Figure 3A,C). Although the overall F ST in the 231 immune genes between Bamako and Savanna forms was non-zero (F ST = 0.037), it was not significantly higher than non-immune related genes (p = 0.435; Figure 3B).
Insects 2020, 11, x 7 of 13 All three taxa have relatively large proportions of SNP pairs (>27%) that are under what we consider high LD (r 2 > 0.8; Table 3). We also found high LD (mean r 2 = 0.56) between SNPs with no obvious signaling pathway relationship (Figure 2). The level of divergence of SNPs within a comprehensive set of 231 known immune genes was significantly higher (bootstrap p < 0.001) than non-immune genes in comparisons between A. coluzzii and both A. gambiae taxa ( Figure 3A,C). Although the overall FST in the 231 immune genes between Bamako and Savanna forms was non-zero (FST = 0.037), it was not significantly higher than nonimmune related genes (p = 0.435; Figure 3B). The FST of immune genes was highest between A. coluzzii and Savanna (=0.134) and lowest between Bamako and Savanna (=0.037). There were also significantly more (bootstrap p = 0.005) SNPs detected within the set of immune genes (=17,712) than a random set of genes (mean ± SEM = 14,408.9 ± 38.1). Genes within the speciation islands were excluded from both the immune and non-immune The F ST of immune genes was highest between A. coluzzii and Savanna (=0.134) and lowest between Bamako and Savanna (=0.037). There were also significantly more (bootstrap p = 0.005) SNPs detected within the set of immune genes (=17,712) than a random set of genes (mean ± SEM = 14,408.9 ± 38.1). Genes within the speciation islands were excluded from both the immune and non-immune gene lists, and the same criteria for selecting potentially functional SNPs (non-synonymous or synonymous with a 2-fold difference in usage rate between codons) were applied uniformly.

Discussion
Divergence between A. coluzzii and A. gambiae (Savanna + Bamako) has been described as largely restricted to several small, pericentromeric genomic islands of speciation, one on each of the three Insects 2020, 11, 893 8 of 13 chromosomes [24,32]. Collectively, these speciation islands represent 3% of the genome, with the remainder of the genome lacking obvious regions of differentiation, presumably due to ongoing gene flow among taxa. Since the immune SNPs analyzed herein lie outside these speciation islands, there is no a priori reason to think they would be highly diverged by, for example, hitchhiking. Additionally, stabilizing selection on potentially functional SNPs would be expected to reduce divergence due to drift. Despite these assumptions, we observed that immune SNPs were significantly associated with taxa. However, fixed mutations were rare, supporting the hypothesis that gene flow is occurring (Supplementary Materials Table S1). Furthermore, most of the immune signaling genes on chromosome 2R are in or near inversions, locations predicted to increase the level of divergence. Based on these observations, we propose that immune gene variation could be driving and maintaining divergence among the three A. gambiae taxa within (Table S2).
SNPs in the immune genes analyzed here occur in encoded conserved domains and introduce changes in the amino acid composition in their products (non-synonymous SNP) or were predicted to alter the rate of protein translation and/or folding (synonymous SNP with greater than 2-fold codon frequency change) [38][39][40]. Thus, these mutations could introduce functional changes in the innate immune system that alter survival from basic immune challenges and, coincidentally, susceptibility to P. falciparum infection. Susceptibility phenotypes resulting from immune gene SNPs are likely to be influenced by different genetic backgrounds. Furthermore, immune signaling protein variants can perturb multiple signaling networks that impact different cellular processes, thereby leading to outcomes that have indirect rather than direct effects on protein variation [78]. Direct versus indirect effects must be carefully considered when determining if and how naturally occurring nucleotide variants may be causative (as opposed to incidental) given the significant divergence between taxa ( Figure 3) and high LD between immune genes (Table 3).
While SNPs that are functionally associated with P. falciparum infection have obvious implications for public health, we argue that they could also play a significant role in speciation in this group of mosquitoes. SNPs in immune signaling genes that functionally alter larval responses to environmental pathogens could explain observed larval habitat segregation among A. gambiae taxa [34,79,80]. This segregation provides the foundation for an ecologically-based chromosomal speciation hypothesis, which suggests that differing ecological characteristics are driving divergence and adaptation among A. gambiae taxa [27,35,81]. A study in Burkina Faso supports this hypothesis by showing that A. coluzzii is more often encountered in rice fields, whereas the Savanna taxon (A. gambiae) is more often found in small, rainfall-dependent bodies of water [33,79]. The larval habitats of A. gambiae s.l., therefore, may harbor diverse microbial communities and the taxa may differ in their tolerance of each. Hybrids between taxa are surprisingly rare in most sympatric sites, and this has been largely attributed to prezygotic isolation [82][83][84]. However, we suggest that mate choice may have evolved in part to maintain a specific immunocompetence by avoiding dysregulation of immune pathways in hybrids [85,86]. In addition, subpar immune function in hybrids may lead to poor survival under natural conditions [32,87].

Conclusions
Genome comparison of African malaria vectors, Anopheles gambiae and A. coluzzii, revealed that immune genes are highly diverged. Although sympatric, the two species occur in distinct larval habitats. Our results suggest the hypothesis that divergence is driven by an immune response to microbes specific to the different larval aquatic habitats occupied by these species. Assortative mating may have subsequently evolved in part to maintain immunocompetence, which might be compromised by dysregulation of immune pathways in hybrids. We conclude that the evolution of immune gene divergence among this important group of species may play a critical role in malaria transmission and represent a useful model to explore ecological speciation in general.