Genome Organization and Copy-Number Variation Reveal Clues to Virulence Evolution in Coccidioides posadasii

The human fungal pathogen Coccidioides spp. causes valley fever, a treatment-refractory and sometimes deadly disease prevalent in arid regions of the western hemisphere. Fungal virulence in the mammalian host hinges on a switch between growth as hyphae and as large spherules containing infectious spores. How these virulence programs are encoded in the genome remains poorly understood. Drawing on Coccidioides genomic resources, we first discovered a new facet of genome organization in this system: spherule-gene islands, clusters of genes physically linked in the genome that exhibited specific mRNA induction in the spherule phase. Next, we surveyed copy-number variation genome-wide among strains of C. posadasii. Emerging from this catalog were spherule-gene islands with striking presence–absence differentiation between C. posadasii populations, a pattern expected from virulence factors subjected to different selective pressures across habitats. Finally, analyzing single-nucleotide differences across C. posadasii strains, we identified signatures of natural selection in spherule-expressed genes. Together, our data establish spherule-gene islands as candidate determinants of virulence and targets of selection in Coccidioides.


Introduction
Coccidioides spp. are fungal pathogens endemic to desert regions in the Western Hemisphere. Coccidioides infects, colonizes, and can kill immunocompetent individuals when they inhale spores from soils, with a high risk among agricultural and construction workers in endemic areas [1,2]. The ability of Coccidioides to cause disease depends on an elaborate developmental transition from hyphae to airborne spores called arthroconidia, and then to a host form known as spherules. The latter fill with vegetative endospores and ultimately rupture, enabling dissemination to new infection sites within the host [3].
Against a backdrop of candidate-gene studies of Coccidioides virulence factors [4,5], the molecular mechanisms of Coccidioides virulence remain poorly understood. However, genome [6][7][8][9][10] and transcriptome [11][12][13][14] data in this organism raise the possibility of incisive in silico analyses of candidate virulence genes. In other fungal pathogens, as a complement to genomic screens [15,16], the study of genome architecture has proven to accelerate the discovery of virulence genes-particularly when they group into virulence islands [17][18][19][20]. By the same token, population genomics has proven to be a powerful approach in other fungi for the discovery of how and why virulence factors diversify across isolates in the wild. Indeed, rapid evolution is an identifying characteristic of the genes of fungal virulence programs [17][18][19][20][21].
We set out to use computational genomic strategies to investigate the organization, function, and evolution of candidate virulence factors in Coccidioides. Our results revealed

Spherule-and Hyphae-Gene Islands
Expression data for wild-type C. posadasii were downloaded from [12], which identified 1082 genes significantly upregulated in the spherule phase, 1200 genes significantly upregulated in the hyphae phase, and 5019 genes which were not differentially expressed across life stages. To analyze these expression patterns in inferred secondary metabolite clusters, antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) annotations were retrieved from the GenBank annotations described above and integrated with differential expression calls.
In order to identify spherule-and hyphae-gene islands (genome regions enriched for genes upregulated in the respective life stage), we carried out a permutation analysis using the total number of genes induced in the stage, n true , as follows. We first divided the genome into 25 kb windows and tabulated the number of genes induced in the stage in each window. We call w t, true the number of windows encompassing at least t genes induced in the stage. Next, we generated a permuted localization data set, in which we randomly labeled genes across the genome as upregulated or not upregulated in the focal life stage, with a total of n true genes assigned to the former category. Using this permuted genome, we tabulated w t, perm , the number of windows which overlapped at least t genes induced in the stage. We repeated this calculation for 10,000 permuted genomes and used as an empirical p-value, the fraction for which w t, perm ≥ w t, true . We then iterated through possible values of t and identified those enabling highly-powered discovery of islands. Our final analysis for spherule-upregulated genes used t = 4 genes per window; at this value, the number of spherule-gene islands was significant at empirical p = 0.000145 (Supplementary Table S2). Our final analysis for hyphae-upregulated genes used t = 6 genes per window, yielding hyphae-gene islands which were significant at empirical p = 0.00715 (Supplementary Table S2).

Population-Genetic Statistics
For each gene of the C. posadasii genome, nucleotide diversity (π) was calculated using the DendroPy 4.5.2 popgenstat module [32]. Interpopulation nucleotide diversity (D XY ) between the Arizona population, on the one hand, and the Texas/Mexico/South American population on the other, was calculated as where n is the number of sequences from each population, and d ij is the number of sites with nucleotide differences at the same position in each pair of sequences. Per-gene measurements are shown in Supplementary Table S3, though it should be noted that we assessed significance for gene cohorts based on resampling tests (see below).
For P N /P S , for each gene of the genome, EggLib 3.0.0's coding diversity module [33] was used to count the number of possible nonsynonymous polymorphic sites (N sites ), possible synonymous polymorphic sites (S sites ), nonsynonymous changes (N), and synonymous changes (S). P N /P S was then calculated as P n P s = N N sites S S sites Per-gene P N /P S data are shown in Supplementary Table S3; we assessed the significance of this statistic across gene cohorts via resampling tests (see below).
F ST , Tajima's D, and π were calculated between groups of strains for each 250 base pair window in the genome using vcftools 2.7.5 [29]. Per-window values are listed in Supplementary Table S4; significance evaluation used resampling tests in gene groups (see below).
To measure enrichment of nucleotide diversity in spherule-upregulated genes, we calculated M spherule , the median value of π across spherule-upregulated genes. We then sampled 10,000 random groups of genes, each the same size as the spherule-upregulated gene cohort, and calculated M random , the median value of π in each sample. We used as an empirical p value the proportion of samples in which M random ≥ M spherule . Enrichment tests for other statistics were calculated using an analogous approach. We note that, where appropriate, the resampling strategy controls for differences in sample size and demography between populations.

Copy Number Variation
To identify copy-number-variable regions in each strain, we used the BAM files generated with BWA-MEM as input into Control-FREEC [34] with the following parameters: window = 250, telocentromeric = 0, minExpectedGC = 0.33, maxExpectedGC = 0.63. This returned, for each strain, the predicted copy number of each 250 bp window of the genome.
For each 250 bp window in the output that did not overlap a transposon, we calculated V ST as: where Var signifies the variance of copy number in a given population and N signifies the number of isolates in a given population [35]. Per-window measurements across the genome are shown in Supplementary Table S5B. The degree of enrichment for high V ST in the chromosome III region was calculated as follows. We carried out a simulation, taking into account three parameters: the total number of windows of the genome containing CNVs that had V ST above the 99th percentile, 63; the subset of these in the chromosome III region, which numbered 35; and the size of the chromosome III region, 190-kb. We simulated a genome in which, across all CNV-containing windows of the genome, we randomly marked 68 as top-scoring with respect to V ST . We then used this simulated genome to tabulate the number n of 190-kb regions containing at least 35 simulated top-scoring loci. We repeated this calculation 10,000 times, and we took, as an empirical p-value, the proportion of such simulated data sets in which n > 0.

Secondary-Metabolite Clusters Induced during Spherule Formation
In order to gain biological insight into life-stage-specific expression programs in C. posadasii, we made use of recent transcriptomic profiles of hyphae and spherules of this organism in standard culture conditions [12]. We first evaluated whether and how certain cohorts of genes were particularly active in each life stage. Among genes induced in hyphae, we detected an enrichment of loci encoding secreted factors and carbohydrate active enzymes (Table 1), which is expected if the fungus carries out extracellular breakdown of complex substrates when living in the soil in its hyphal form. In genes induced in spherules, we found enrichment for members of predicted secondary metabolite clusters, a trend that was robust to the source of the cluster annotations (Table 1). Inspecting individual secondary metabolite gene clusters, we identified several with predominantly hyphalspecific or spherule-specific expression ( Figure 1). From these annotation data, we conclude that life-stage-specific expression programs in C. posadasii are functionally coherent, with particularly strong evidence for secondary metabolites in the spherule phase.

Rapid Nucleotide Evolution of Spherule-Expressed Genes
Trends from other fungal pathogens have made it clear that virulence genes tend to evolve rapidly relative to those of other functional classes [17][18][19][20][21]. In order to explore this notion in C. posadasii, we first focused on single-nucleotide differences between strains. We made use of whole-genome resequencing of clinical isolates from a well-sampled Arizona population, as well as smaller data sets from the Texas/Mexico/South America and

Rapid Nucleotide Evolution of Spherule-Expressed Genes
Trends from other fungal pathogens have made it clear that virulence genes tend to evolve rapidly relative to those of other functional classes [17][18][19][20][21]. In order to explore this notion in C. posadasii, we first focused on single-nucleotide differences between strains. We made use of whole-genome resequencing of clinical isolates from a well-sampled Arizona population, as well as smaller data sets from the Texas/Mexico/South America and Caribbean populations [10,25]. For each strain, we mapped raw genomic DNA reads to the C. posadasii reference genome (strain Silveira) [9] to call single-nucleotide variants. Resampling tests using the entire set of spherule-expressed genes ( Table 2) established that these loci were particularly likely to vary in terms of nucleotide differences across Arizona strains (quantified by π, the average number of nucleotide differences between random samples of two alleles from a population) and between populations (quantified with D XY , the absolute population differentiation). These effects were of small magnitude, but robustly significant ( Table 2), indicative of an uptick in evolutionary rate, on average, among genes induced in the spherule phase. An unbiased search of annotation cohorts also revealed high rates of single-nucleotide variation in phosphotransferase enzyme family genes (PF01636; Table 2), echoing previous studies on this family in other fungi [36,37]. Table 2. Genes induced in spherules evolve quickly. Each row reports results from one populationgenomic metric of single-nucleotide variation, as analyzed in genes across C. posadasii strains: π, average number of nucleotide differences between strain pairs in the Arizona population; D XY , absolute divergence between the Arizona and Texas/Mexico/South America populations; P N /P S , count of non-synonymous variants across the Arizona population relative to that of synonymous variants. The first column reports the median of the indicated metric across all genes. The second, third, and fourth columns report, respectively, the number of genes induced in spherules with sufficient sequence data to calculate each metric, the median of the metric across genes in this gene set, and the results of a resampling-based test for enrichment of high values of the metric in this gene set; the last three columns report the analogous quantities for genes annotated with the phosphotransferase enzyme family. Bold values signify p < 0.05.

Spherule-Upregulated Genes
Phosphotransferase Enzyme Family Next, we examined protein evolutionary rate across C. posadasii Arizona strains. Results revealed a modest but significant enrichment for higher values in spherule-expressed genes, paralleling the trends we had seen in nucleotide changes ( Table 2). We reasoned that the signal for protein variation could be attributed, in part, to diversifying the selection at individual loci. In order to explore this, we evaluated the relative proportion of amino acid polymorphism and silent changes at each gene in turn. The former were in excess in 68 spherule-expressed genes (and 414 other genes; Supplementary Table S3), each representing a signature of positive selection on the function of the respective protein. Such a given gene harbored mostly singleton amino-acid changes (Supplementary Figure S1), suggesting a history of rare variants, each accumulated independently by a lineage within Arizona, at these loci. Together, our analyses of single-nucleotide variants make it clear that spherule-expressed genes evolve rapidly across C. posadasii strains.

Identifying Spherule-Gene Islands
We now aimed to search for patterns of chromosomal organization among the genes of C. posadasii induction programs. We developed a resampling-based test for genomic regions harboring an excess of spherule-or hyphal-expressed genes in the Silveira type strain (see Section 2). The results revealed 36 regions with a particularly high number of genes expressed in spherules, and four genomic regions in the analogous test for hyphae ( Figure 2 and Supplementary Table S2). We refer to the former as spherule-gene islands. Among them, we noted six loci which we had already highlighted as secondary metabolite gene clusters with spherule induction (Figure 2). Such overlap served as validation of our unbiased approach to identify co-regulated, functionally similar genes that were physically linked, particularly in the spherule expression program. More broadly, we considered the catalog of the latter to support the inference of a physical organization of C. posadasii virulence genes in islands. physically linked, particularly in the spherule expression program. More broadly, we considered the catalog of the latter to support the inference of a physical organization of C. posadasii virulence genes in islands.

An Atlas of Copy Number Variation
Having identified spherule-gene islands in C. posadasii, we hypothesized that some such loci could be subject to gains and losses across strains. In order to explore this, we used whole-genome resequencing resources [10,25] as input into a pipeline to call copy number changes relative to the Silveira reference genome.
Before focusing on spherule-gene islands (see below), we harnessed the data for a more general survey of copy number-variable regions.
Most copy-number variable loci across C. posadasii encompassed relatively short stretches in gene-poor regions ( Figure 3A, Supplementary Table S5, and Supplementary

An Atlas of Copy Number Variation
Having identified spherule-gene islands in C. posadasii, we hypothesized that some such loci could be subject to gains and losses across strains. In order to explore this, we used whole-genome resequencing resources [10,25] as input into a pipeline to call copy number changes relative to the Silveira reference genome.
Before focusing on spherule-gene islands (see below), we harnessed the data for a more general survey of copy number-variable regions.
Most copy-number variable loci across C. posadasii encompassed relatively short stretches in gene-poor regions ( Figure 3A, Supplementary Table S5, and Supplementary Figure S2). They tended to be flanked by transposons ( Figure 3B), as expected if instability mediated by homologous transposon sequences drove their gain and loss across lineages. In most cases, the minor allele (tabulated as presence or absence relative to Silveira) manifested in only a few strains, though a handful of variants were detected at high frequency across genomes (Supplementary Figure S3). Figure S2). They tended to be flanked by transposons ( Figure 3B), as expected if instability mediated by homologous transposon sequences drove their gain and loss across lineages. In most cases, the minor allele (tabulated as presence or absence relative to Silveira) manifested in only a few strains, though a handful of variants were detected at high frequency across genomes (Supplementary Figure S3). In the main panel, each row reports the results of quantification of copy number in the indicated strain relative to the C. posadasii reference assembly (Teixeira et al., 2021). The x-axis reports genome location, and the color scale reports copy number. At the top, the y-axis reports the VST metric of population differentiation of the copy number; rows underneath report transposon and gene locations. Only results from chromosome I are shown; the remaining chromosomes are reported in Figure S2. (B) The x-axis reports the distance between a copy number variant (CNV) boundary and the nearest transposon; the y-axis reports the proportion of CNVs with the transposon distance on the x, as a kernel density estimate.
We quantified population stratification across windows of copy-number variants using the VST statistic [35,38] (Figure 3 and Supplementary Figure S2). Top-scoring loci in We quantified population stratification across windows of copy-number variants using the V ST statistic [35,38] (Figure 3 and Supplementary Figure S2). Top-scoring loci in this calculation had copy number variants which were over-represented in the Arizona, Texas/Mexico/South America, or Caribbean populations; the 99th percentile of V ST genome-wide corresponded to 68 genomic windows (Supplementary Table S5). Among these V ST peaks, we noted three with gene losses unique to Texas/Mexico/South America strains (Supplementary Table S5), likely reflecting fixations during bottlenecks in the establishment of this population [10,39].
Returning to the complete set of copy number variants across C. posadasii strains, we sought to highlight variable loci overlapping coding genes. In total, these numbered 1157 regions (Supplementary Table S5). Two involved >100 genes each, in strain GT-162 from Texas and strain JTORRES from Venezuela, respectively ( Figure 3A). Most of the remainder were smaller, with an interquartile range of~3 kb to 19 kb (Supplementary Table S5). In order to investigate trends in function among the genes at copy number-variable sites, we focused on Arizona C. posadasii. Enrichment analyses of copy-number changes among Arizona strains revealed an over-representation of phosphotransferase enzyme genes (PF01636; p = 0.000078), paralleling the enrichment of single-nucleotide variants we had noted in this gene cohort (Table 2).
Together, these data make clear that copy-number variation is widespread in C. posadasii-including population-specific variants, which is suggestive of phenotypes and/or selective pressures unique to the distinct endemic areas of the fungus.

Widespread Deletions in a Spherule-Expressed Locus
We aimed to investigate copy number variation in spherule-gene islands across C. posadasii, as a window into genomic and evolutionary forces at these sites. From the intersection between our identified sets of copy-number variant loci and spherule-gene islands, one striking case of overlap emerged (Figure 4). This was a 190-kb stretch on chromosome III, whose full extent was detectable in the genomes of only 30% of isolates from the Arizona population (grey stripe in Figure 4). Its sequence was largely undetectable in most other strains, although 3/4 Venezuelan isolates harbored a duplication of the centromeric end (rightmost orange block in Figure 4). Consequently, the whole region was an extreme outlier in terms of population differences in copy number: it encompassed 35 of the 68 genome windows scoring in the 99th percentile of V ST (top trace, Figure 4; resampling p < 10 −4 ). Thus, C. posadasii populations harbor distinct haplotypes at high frequency at this chromosome III locus. Given the similarity to patterns at contingency loci [40] and sites subject to balancing selection [41,42] in other microbial pathogens, we chose to describe the genomics of the chromosome III region in more depth.  The 78 coding genes of the chromosome III region included 15 spherule-induced genes (falling into two separately identified spherule-gene islands; Figure 2) and another 9 genes induced in hyphae. This constituted a strong enrichment of spherule-induced genes across the region relative to the null expectation (Fisher's two-sided test p = 0.008). Most salient among the annotated genes at this locus (Supplementary Table S6) were an The 78 coding genes of the chromosome III region included 15 spherule-induced genes (falling into two separately identified spherule-gene islands; Figure 2) and another 9 genes induced in hyphae. This constituted a strong enrichment of spherule-induced genes across the region relative to the null expectation (Fisher's two-sided test p = 0.008). Most salient among the annotated genes at this locus (Supplementary Table S6) were an inferred polysaccharide synthase for the spherule capsule; a patatin-like phosopholipase; a hemolysin; and a ferric reductase-all orthologs of known or suspected factors mediating interactions with the host in other fungal [43][44][45]. These functions established the chromosome III region as a compelling example of a putative virulence island with presence-absence variation across the species.
Investigating the architecture of the chromosome III region, we noted extensive sequence similarity in some pairs of coding genes encompassed by the locus (Figure 5 and Section 4.1). We inferred that each such match represented a case of paralogy generated by a local duplication. This included a stretch of seven contiguous genes; a stretch of two contiguous genes; and a single duplicated gene, each present in two copies within the region in Silveira ( Figure 5 and Section 4.1). Sequence identities of the gene pairs in Silveira ranged from 68% to 100% in non-gapped regions ( Figure 5). Transposons were positioned throughout the chromosome III region and at its boundaries (Supplementary Figure S4), plausibly acting as drivers of rearrangements across strains. which most of the chromosome III locus existed at least as far back as the Coccidioides ancestor, undergoing gain and duplication of individual genes, as well as wholesale loss, within C. posadasii. We envisioned two possible models for the history of the chromosome III region in C. posadasii. In one scenario, a given deletion would have occurred once in a member of an ancestral C. posadasii population. If so, descendants of the latter, namely the extant strains we had noted that lacked the respective stretch of genome, would show elevated allele-sharing in flanking regions. Alternatively, chromosome III loss events could have occurred repeatedly in C. posadasii lineages, in which case extant strains with deletions would not exhibit high sequence similarity to each other in surrounding regions. In order to explore evidence for these possibilities, we used genomes from the deeply-sampled Arizona C. posadasii population. We categorized each Arizona strain as harboring a full deletion or full retention of the chromosome III locus. We then examined sequence variation in windows flanking the chromosome III locus. We tabulated population differentiation (using the FST metric) between the strains of the two categories, and, separately, we calculated nucleotide diversity and Tajima's D, an allele frequency metric, for each category of strains. In no case did the chromosome III locus show a signal as an outlier relative to the genomic null (Supplementary Table S4). Strains harboring the deletion also did not cluster together in the species-wide phylogeny (Supplementary Figure S5). We thus detected no allele-sharing among Arizona C. posadasii strains with the chromosome III deletion, supporting an inference of independent de novo events as the origin of the high frequency of the deletion haplotype. In summary, our observations of the chromosome III locus revealed that C. posadasii populations have repeatedly acquired new haplotypes at this site and maintained them at high frequency, and that the region is rich with putative virulence genes. These results We likewise assessed homology of each gene of the chromosome III region in C. immitis, the sister species to C. posadasii, finding C. immitis orthologs for most genes of the locus (Supplementary Table S6). Approximately half of the genes had homologs in a more distant relative, U. reesii (Supplementary Table S6). These data supported a picture in which most of the chromosome III locus existed at least as far back as the Coccidioides ancestor, undergoing gain and duplication of individual genes, as well as wholesale loss, within C. posadasii.
We envisioned two possible models for the history of the chromosome III region in C. posadasii. In one scenario, a given deletion would have occurred once in a member of an ancestral C. posadasii population. If so, descendants of the latter, namely the extant strains we had noted that lacked the respective stretch of genome, would show elevated allele-sharing in flanking regions. Alternatively, chromosome III loss events could have occurred repeatedly in C. posadasii lineages, in which case extant strains with deletions would not exhibit high sequence similarity to each other in surrounding regions. In order to explore evidence for these possibilities, we used genomes from the deeply-sampled Arizona C. posadasii population. We categorized each Arizona strain as harboring a full deletion or full retention of the chromosome III locus. We then examined sequence variation in windows flanking the chromosome III locus. We tabulated population differentiation (using the F ST metric) between the strains of the two categories, and, separately, we calculated nucleotide diversity and Tajima's D, an allele frequency metric, for each category of strains. In no case did the chromosome III locus show a signal as an outlier relative to the genomic null (Supplementary Table S4). Strains harboring the deletion also did not cluster together in the species-wide phylogeny (Supplementary Figure S5). We thus detected no allele-sharing among Arizona C. posadasii strains with the chromosome III deletion, supporting an inference of independent de novo events as the origin of the high frequency of the deletion haplotype.
In summary, our observations of the chromosome III locus revealed that C. posadasii populations have repeatedly acquired new haplotypes at this site and maintained them at high frequency, and that the region is rich with putative virulence genes. These results underscore the power of our genomic surveys to uncover candidate virulence determinants in C. posadasii and to track their evolution across the species.

Discussion
In this work, we have studied functional, genomic, and evolutionary correlates of the spherule expression program in C. posadasii. We have shown that spherule genes tend to fall into virulence-island-like regions, evolve rapidly, and participate in secondary-metabolite enzymatic pathways. We have also characterized the complex evolutionary changes at one spherule-gene island, including gains and losses of genes with functions likely relevant for interactions with the mammalian host. Our genomic data serve as a hypothesis-generating framework for virulence determinants in this under-studied fungus, and they shed new light on the forces that have shaped Coccidioides genomes.
Our identification of secondary-metabolite clusters expressed in distinct phases of the C. posadasii life cycle is consistent with the suites of hyphal-and spherule-unique small molecules observed in this system [46]. Hyphal-specific metabolites may well mediate antagonistic relationships between Coccidioides and bacteria in soils [6,47,48]. Likewise, it is tempting to speculate that spherule-specific metabolites govern C. posadasii interactions with mammalian hosts, as is the case in other fungal pathogens [49]. As functions become clear for these small molecules, they will likely prove to be critical to the fitness of the organism, given signatures of positive selection between Coccidioides species at secondarymetabolite genes [7].
Our discovery of spherule-gene islands in C. posadasii mirrors analyses of linked genes co-regulated in infection conditions in other fungal pathogens [50,51]. Are the loci we study here virulence islands? This term is conventionally used for stretches of genome that underlie virulence and are passed around within and between species by horizontal gene transfer, as seen in bacteria [52] and other fungi [53][54][55]. The chromosome III locus, whose presence-absence variation we have traced represents a prime candidate for such a case. Furthermore, its high population differentiation strongly suggests that natural selection in distinct niches drove large-scale gains and losses at this locus. As such, our characterization of the chromosome III region opens a first window onto the principle that virulence determinants matter for fitness in C. posadasii. By analogy with other fungal pathogens [17][18][19][20][21]55], Coccidioides would tune virulence loci to avoid fitness costs of virulence programs during growth in the environment and/or recognition by hosts during infection. If so, the latter would likely manifest in the small desert mammals thought to be the primary animal vector of the fungus [56,57].
That said, apart from our focal chromosome III site, our population analyses did not reveal gains and losses of other spherule genes, or spherule-gene islands, between C. posadasii strain genomes. Many may not lie in rapidly evolving chromosomal domains of the kind established in other fungi (i.e., under the two-speed genome model; [58]). For example, chromosome ends, which are sinks for transposons [5] and genes unique to the genus [7] in Coccidioides, show no overall enrichment for spherule genes or islands as we identify them here. We do expect, however, that population sampling of additional strains could ultimately reveal copy number variation at more candidate virulence loci in C. posadasii.
Alongside our analyses of presence-absence variation, we also used single-nucleotide variants to detect signatures of rapid evolution of spherule-expressed genes. We also identified hundreds of genes with a preponderance of amino acid changes across strains. On the basis of the low population frequencies of the latter, we suggest an interpretation in which some genes harbor an excess of variants in our analyzed genomes as a product of virulence function and ascertainment bias. This model is contingent on the fact that almost all strains sequenced in the current literature were collected from patient samples (although see [8,59]). Thus, we envision that a rare allele could arise in a given C. posadasii population that confers no particular advantage when the fungus is living in the soil, but promotes opportunistic virulence in humans [20]. Such sites would show up at elevated frequency in clinical genomes, reflecting their importance for virulence, but without positive selection, per se, as a driving force. Future work will establish whether and how such loci do, in fact, contribute to infection mechanisms.
In conclusion, our results underscore the power of deep genomic resources that, when fed into analysis methods focused on gene function and evolution, can inform models of C. posadasii biology. We expect that such in silico approaches will continue to complement experimental strategies to accelerate progress in the field, elucidating how Coccidioides persists in the wild and how it causes disease.

Supplementary Note
In our analysis of the region of C. posadasii chromosome III with signal of extensive loss and rearrangement across strains (Figure 4 and Supplementary Table S6), we traced evidence for paralogous relationships between genes of the locus using sequence similarity searches. The complete set of results is schematized in Figure 5; we provide details of the inferred paralogous pairs here. The data revealed three likely duplication events: one involving two contiguous genes induced in spherules (D8B26_005352 to D8B26_005353 paralogous to D8B26_005401 to D8B26_005403, indicated by the orange star in Figure 5); one involving a seven-gene contiguous span with no evidence for spherule-specific expression (D8B26_005372 to D8B26_005378 paralogous to D8B26_005394 to D8B26_005400, indicated by the blue star in Figure 5); and one involving a single additional spherule-induced gene (D8B26_005351 paralogous to D8B26_005411, indicated by the purple star in Figure 5).