Genome-Wide Association Analysis for Triazole Resistance in Aspergillus fumigatus

Aspergillus fumigatus is a ubiquitous fungus and the main agent of aspergillosis, a common fungal infection in the immunocompromised population. Triazoles such as itraconazole and voriconazole are the common first-line drugs for treating aspergillosis. However, triazole resistance in A. fumigatus has been reported in an increasing number of countries. While most studies of triazole resistance have focused on mutations in the triazole target gene cyp51A, >70% of triazole-resistant strains in certain populations showed no mutations in cyp51A. To identify potential non-cyp51A mutations associated with triazole resistance in A. fumigatus, we analyzed the whole genome sequences and triazole susceptibilities of 195 strains from 12 countries. These strains belonged to three distinct clades. Our genome-wide association study (GWAS) identified a total of six missense mutations significantly associated with itraconazole resistance and 18 missense mutations with voriconazole resistance. In addition, to investigate itraconazole and pan-azole resistance, Fisher’s exact tests revealed 26 additional missense variants tightly linked to the top 20 SNPs obtained by GWAS, of which two were consistently associated with triazole resistance. The large number of novel mutations related to triazole resistance should help further investigations into their molecular mechanisms, their clinical importance, and the development of a comprehensive molecular diagnosis toolbox for triazole resistance in A. fumigatus.


Introduction
Aspergillus fumigatus is an opportunistic human fungal pathogen that is found in a broad range of substrates and is capable of surviving and growing in numerous environmental conditions. A. fumigatus is the primary cause of invasive aspergillosis, a life-threatening mold infection with high morbidity and mortality rates in immunocompromised patients. Its high sporulating capacity contributes to the environmental prevalence of A. fumigatus, leading to the high likelihood of infection in at-risk populations [1]. Globally, it is estimated that over 200,000 cases of invasive aspergillosis occur annually [2]. However, this number may represent only one-half of actual cases due to under-and mis-diagnoses [2]. Depending on factors such as population of patients, site of infection and antifungal management, mortality rates associated with invasive aspergillosis range from 60 to 90% [3].
Currently, there are four main classes of antifungals for aspergillosis treatment: azoles, polyenes, echinocandins, and allylamines. Among all antifungal agents, aspergillosis is commonly treated with triazole antifungals as the first choice because their use has been associated with better clinical response, less infusion-related toxicity, less nephrotoxicity and increased survival [4]. For aspergillosis treatment, itraconazole and voriconazole are among some of the commonly used triazole antifungals. Triazole antifungals work by inhibiting a vital enzymatic step in the synthesis of ergosterol, a major sterol and crucial Figure 1. Maximum likelihood phylogenetic tree detailing the strain characteristics. Branches with a red dot represent those with over 75% bootstrap support, based on 500 bootstrap iterations. The inner-most circle denotes the clade affiliation of strains with strain names corresponding to those in Supplementary Table S1. The second inner-most circle represents country of origin for individual strains with different colors representing different countries as shown in the left "Country" panel. The third circle from the inside denotes strain ecological niche, with hollow squares representing strains from the natural environment, solid red squares representing strains from the clinical environment, and the source for the remaining strains (unmarked) were unknown. The itraconazole and voriconazole minimum inhibitory concentrations (MIC) were represented in the two outer circles with different colors representing different MIC values as shown in the left "TriazoleMIC" panel. The white boxes in the two outer circles represent strains with no MIC data. The branch length separating Clade I from the two other clades was manually truncated to make relationships in the other two clades more visible.

Known Mutations Associated with Triazole Resistance
The MIC data for triazole drugs in this population identified 122 and 123 strains with known itraconazole and voriconazole MIC values respectively. We first examined the statistical association between mutations at 44 amino acid sites that had been previously found to be related to triazole resistance in A. fumigatus. The 44 known sites were mainly identified based on epidemiological surveys and are listed in Table 1.  Supplementary Table S1. The second inner-most circle represents country of origin for individual strains with different colors representing different countries as shown in the left "Country" panel. The third circle from the inside denotes strain ecological niche, with hollow squares representing strains from the natural environment, solid red squares representing strains from the clinical environment, and the source for the remaining strains (unmarked) were unknown. The itraconazole and voriconazole minimum inhibitory concentrations (MIC) were represented in the two outer circles with different colors representing different MIC values as shown in the left "TriazoleMIC" panel. The white boxes in the two outer circles represent strains with no MIC data. The branch length separating Clade I from the two other clades was manually truncated to make relationships in the other two clades more visible.

Known Mutations Associated with Triazole Resistance
The MIC data for triazole drugs in this population identified 122 and 123 strains with known itraconazole and voriconazole MIC values respectively. We first examined the statistical association between mutations at 44 amino acid sites that had been previously found to be related to triazole resistance in A. fumigatus. The 44 known sites were mainly identified based on epidemiological surveys and are listed in Table 1.  [37] test defined resistant strains as having MIC values ≥ 2 mg/L and the second test set the resistance values at MIC ≥ 4 mg/L. A Bonferroni-corrected p-value threshold of 4.07 × 10 −4 (0.05/122) was used to evaluate associations between the 22 SNPs and triazole resistance.
Of the 22 known SNPs tested, only one in the Lysine-98 amino acid site, located in the gene cyp51A, was found to be significantly associated with itraconazole and pan-azole resistance in both Fisher's Exact tests (Table 1). We further sought to conduct Fisher's Exact tests using subgroups consisting of solely itraconazole resistant (i.e., resistant to itraconazole but susceptible to voriconazole) or solely voriconazole resistant (i.e., resistant to voriconazole but susceptible to itraconazole) strains groups. However, the sample sizes of these subgroups were all below the requirement needed to achieve the desired Bonferroni-corrected p-value threshold. Thus, these subgroups were omitted from testing.
To unmask the effect of all these listed known mutation sites in cyp51A associated with triazole resistance, our study conducted a stepwise analysis of these sites using Fisher's Exact tests. First, additional Fisher's Exact tests were conducted after strains with the well-documented L98H mutation in cyp51A, which alone with its accompanying tandem repeat TR 34 can confer triazole resistance, were removed. From the 122 strains with known MIC values, 21 strains contained the TR 34 /L98H mutation (Table S1). Using both MIC resistance thresholds and a Bonferroni-corrected threshold of 4.95 × 10 −4 (0.05/101), the additional Fisher's Exact tests identified no SNPs significantly associated with itraconazole and/or pan-azole resistance among these 22 known mutations.
To unmask the effect of other known cyp51A mutations associated with triazole resistance, additional Fisher's Exact tests were also conducted after removal of strains containing any of these known mutations (Table S1). From the strains with known MIC values, 64 strains contained the known mutations in these cyp51A sites (Table S1). After removal of the 64 strains and using a Bonferroni-corrected threshold of 8.62 × 10 −4 (0.05/58), the additional Fisher's Exact tests identified no SNPs significantly associated with itraconazole and/or pan-azole resistance among these 22 mutation sites.
A final set of Fisher's Exact tests were conducted focusing on a clade-level. Clade II was chosen for these additional analyses as the cluster contained the greatest number of strains and none of the Clade II strains contained the L98H mutation in cyp51A. The strains from Clade II with both itraconazole and voriconazole MIC values (n = 71) were used in this final set of the Fisher's Exact tests. Using a Bonferroni-corrected threshold of 7.04 × 10 −4 (0.05/71), no SNPs were found to be significantly associated with itraconazole and/or pan-azole resistance from these 22 mutations sites.
Together, the stepwise analyses results revealed that these well-characterized mutation sites do not account for the observed triazole resistance in our sample sets. Therefore, additional modes of action and uncharacterized novel mutations should be investigated for their possible involvement with triazole susceptibility in A. fumigatus.
A summary of the overexpressed genes and specific fold changes, revealed in previous studies by RT-qPCR and RNA-seq during triazole exposure, are detailed in Table 2. Using these studies, a total of 37 overexpressed genes with triazole exposure were identified for further investigation. However, subsequent analysis excluded cyp51A-related mutations as they have already been extensively searched and discussed in Section 2.2.
We identified SNPs in these 37 overexpressed genes and their neighbouring intergenic regions using the soft-filtered vcf file. A total of 3230 SNP sites were identified in these overexpressed genes from our dataset. Using the same procedure as that used for the 22 known mutation sites, we identified SNPs significantly associated with itraconazoleresistance and pan-azole resistance in these 36 overexpressed genes. Multiple Fisher's Exact tests, with a Bonferroni-corrected threshold of 1.55 × 10 −5 (0.05/3230), were conducted on these sites.
Fisher's Exact tests were also conducted after removal of the 21 strains containing the L98H mutation in cyp51A. Using both MIC resistance thresholds of 2 mg/L and 4 mg/L, three SNPs were found to be significantly associated with itraconazole resistance (Table S3). The three SNPs consisted of the previously found intergenic variant in mfsB, and two novel intergenic variants-one in AFUA_6G01960 and the second in AFUA_6G01960 (Table S3). No SNPs were found to be significantly associated with pan-azole resistance.
A third set of Fisher's Exact tests were conducted after removal of the 64 strains containing the mutations in cyp51A and using both MIC thresholds. Using the MIC resistance thresholds of 4 mg/L, one SNP was found to be significantly associated with pan-azole resistance. This SNP was found in the intergenic region of abcA (Table S3).

Genome-Wide Association Study
In addition to examining known triazole resistance mutations and SNPs in genes overexpressed during triazole exposure, a genome-wide association study (GWAS) was performed on the 122 and 123 strains with known itraconazole and voriconazole MIC values to investigate potential novel mutations associated with triazole sensitivity. The results of our analyses are summarized in Figure 2. Specifically, the itraconazole GWAS Manhattan plot can be found in Figure 2A and for voriconazole, in Figure 2B. The generated quantile-quantile plots for the GWAS results displayed no systematic inflation in our samples ( Figure S1A,B). thresholds of 4 mg/L, one SNP was found to be significantly associated with pan-azole resistance. This SNP was found in the intergenic region of abcA (Table S3).
A final set of Fisher's Exact tests was completed and focused solely on strains from Clade II (n = 71). Using both MIC resistance thresholds, 2 mg/L and 4 mg/L, no SNPs were found to be significantly associated with itraconazole and/or pan-azole resistance in this sample set.

Genome-Wide Association Study
In addition to examining known triazole resistance mutations and SNPs in genes overexpressed during triazole exposure, a genome-wide association study (GWAS) was performed on the 122 and 123 strains with known itraconazole and voriconazole MIC values to investigate potential novel mutations associated with triazole sensitivity. The results of our analyses are summarized in Figure 2. Specifically, the itraconazole GWAS Manhattan plot can be found in Figure 2A and for voriconazole, in Figure 2B. The generated quantile-quantile plots for the GWAS results displayed no systematic inflation in our samples ( Figure S1A We further examined the top 20 significant SNPs identified by the GWAS analysis. Among the 20 SNPs obtained from the itraconazole GWAS, 13 (65%) were located in intergenic regions and 7 (35%) within protein-coding regions (Table 3). These seven SNPs consisted of five missense variants, one synonymous variant, and one non-coding tran-   Additional GWAS analyses were conducted in the same stepwise manner seen in the previous Fisher's Exact tests. Firstly, to alleviate any potential masking effect caused by the L98H mutation in cyp51A, the 21 strains with the L98H mutation were removed. A second GWAS, using the same previous pipeline, was then conducted. The results of the second GWAS are summarized in Figure 3A,B as Manhattan plots for itraconazole and voriconazole, respectively. The generated quantile-quantile plots for both GWAS results displayed no genomic inflation ( Figure S2A,B).
previous Fisher's Exact tests. Firstly, to alleviate any potential masking effect caused by the L98H mutation in cyp51A, the 21 strains with the L98H mutation were removed. A second GWAS, using the same previous pipeline, was then conducted. The results of the second GWAS are summarized in Figure 3A,B as Manhattan plots for itraconazole and voriconazole, respectively. The generated quantile-quantile plots for both GWAS results displayed no genomic inflation ( Figure S2A  The top 20 significant SNPs identified by the second GWAS analyses were examined. Among the 20 SNPs obtained from the itraconazole GWAS, 13 (65%) were located in intergenic regions and 7 (35%) within protein-coding regions (Table 5). These seven SNPs comprised of four missense variants, one synonymous variant, and two non-coding transcript variants (Table 5). In terms of the top 20 SNPs obtained from the second voriconazole GWAS, 10 (50%) were found in intergenic regions and the remaining 10 in coding regions (Table 6). These 10 coding-region SNPs consist of 5 missense variants, 3 synonymous variants, and 2 non-coding transcript variants (Table 6). Among the top 20 SNPs associated with each of the two drugs, none were shared between the two triazole drugs. The top 20 significant SNPs identified by the second GWAS analyses were examined. Among the 20 SNPs obtained from the itraconazole GWAS, 13 (65%) were located in intergenic regions and 7 (35%) within protein-coding regions (Table 5). These seven SNPs comprised of four missense variants, one synonymous variant, and two non-coding transcript variants (Table 5). In terms of the top 20 SNPs obtained from the second voriconazole GWAS, 10 (50%) were found in intergenic regions and the remaining 10 in coding regions Unique SNP sites are denoted by asterisks "*" (n = 8).  Unique SNP sites are denoted by asterisks "*" (n = 12).
A third set of GWAS analyses was also done to alleviate any potential masking effect caused by the known mutations in cyp51A, previously listed in Table 1. The 64 strains with cyp51A mutations were removed and the third GWAS, using the same previous pipeline, was then conducted. The results of the third GWAS are summarized in Figure 4A,B as Manhattan plots for itraconazole and voriconazole, respectively. The generated quantilequantile plots for both GWAS results displayed no genomic inflation ( Figure S3A,B). A third set of GWAS analyses was also done to alleviate any potential masking effect caused by the known mutations in cyp51A, previously listed in Table 1. The 64 strains with cyp51A mutations were removed and the third GWAS, using the same previous pipeline, was then conducted. The results of the third GWAS are summarized in Figure 4A,B as Manhattan plots for itraconazole and voriconazole, respectively. The generated quantilequantile plots for both GWAS results displayed no genomic inflation ( Figure S3A Chromosome − log 10 (p ) The top 20 significant SNPs identified by the third GWAS analyses were examined. Among the top 20 SNPs obtained from the itraconazole GWAS, 11 (55%) were located in intergenic regions and 9 (45%) within protein-coding regions (Table 7). These nine SNPs comprised of three missense variants, two synonymous variants, three non-coding transcript variants and one intragenic variant (Table 7). In terms of the top 20 SNPs obtained from the third voriconazole GWAS, 10 (50%) were found in intergenic regions and the remaining 10 in coding regions (Table 8). These 10 coding-region SNPs consist of six missense variants and four synonymous variants (Table 8). Among the top 20 SNPs associated with each of the two drugs, two SNPS were shared between the two triazole drugs. The Chromosome − log 10 (p ) The top 20 significant SNPs identified by the third GWAS analyses were examined. Among the top 20 SNPs obtained from the itraconazole GWAS, 11 (55%) were located in intergenic regions and 9 (45%) within protein-coding regions (Table 7). These nine SNPs comprised of three missense variants, two synonymous variants, three non-coding transcript variants and one intragenic variant (Table 7). In terms of the top 20 SNPs obtained from the third voriconazole GWAS, 10 (50%) were found in intergenic regions and the remaining 10 in coding regions (Table 8). These 10 coding-region SNPs consist of six missense variants and four synonymous variants (Table 8). Among the top 20 SNPs associated with each of the two drugs, two SNPS were shared between the two triazole drugs. The first variant was a synonymous C to A mutation at the position 2,539,714 on chromosome 4, in the gene AFUA_4G09770. The second mutation was a synonymous T to C mutation at position 2,131,740 of chromosome 5, in the gene AFUA_5G08390. Table 7. Top 20 significant SNPs obtained from the third GWAS associated with itraconazole resistance, arranged based on their −log 10 (p-values) from the highest to lowest. Unique SNP sites compared to the previous two GWAS analyses are denoted by asterisks "*" (n = 12). Unique SNP sites compared to the previous two GWAS analyses are denoted by asterisks "*" (n = 18).
A final set of GWAS was completed to focus our analysis on a clade-level, using strains from Clade II. The strains from Clade II with itraconazole (n = 71) and voriconazole (n = 72) MIC values were used for the fourth GWAS, using the same previous pipelines. The results of this GWAS are summarized in Figure 5A,B as Manhattan plots for itraconazole and voriconazole, respectively. The generated quantile-quantile plots for both GWAS results displayed no genomic inflation ( Figure S4A,B). The top 20 significant SNPs identified by the GWAS analyses on strains from Clade II were examined. Among the top 20 SNPs obtained from the itraconazole GWAS, 15 (75%) were located in intergenic regions and 5 (25%) within protein-coding regions (Table  9). These five SNPs comprised of two missense variants, two synonymous variants, and one non-coding transcript variant (Table 9). In terms of the top 20 SNPs obtained from the voriconazole GWAS, 6 (30%) were found in intergenic regions and the remaining 14 in Chromosome − log 10 (p ) The top 20 significant SNPs identified by the GWAS analyses on strains from Clade II were examined. Among the top 20 SNPs obtained from the itraconazole GWAS, 15 (75%) were located in intergenic regions and 5 (25%) within protein-coding regions (Table 9). These five SNPs comprised of two missense variants, two synonymous variants, and one non-coding transcript variant (Table 9). In terms of the top 20 SNPs obtained from the voriconazole GWAS, 6 (30%) were found in intergenic regions and the remaining 14 in coding regions (Table 10). These 14 coding-region SNPs consist of seven missense variants, five synonymous variants, one non-coding transcript variant and one intragenic variant (Table 10). Among the top 20 SNPs associated with each of the two drugs, no mutation sites were shared between the two triazole drugs. Unique SNP sites compared to the previous three GWAS analyses are denoted by asterisks "*" (n = 6). Unique SNP sites compared to the previous three GWAS analyses are denoted by asterisks "*" (n = 12).

Linkage Disequilibrium Analysis
Linkage disequilibrium analyses were conducted using the top 20 SNPs obtained by the four GWAS analyses and all 314,999 SNPs in the soft-filtered vcf file to search for SNPs highly linked (R 2 > 0.85) to these significantly associated SNPs. Specifically, we focused on highly linked non-synonymous mutations. The results of this association analysis are presented in Table 11 for itraconazole and in Table 12 for voriconazole. In total, for itraconazole resistance, we identified 15 additional highly linked missense variants located in 13 (putative) protein-coding genes (Table 11). For voriconazole resistance, this analysis revealed 11 additional missense SNPs located in 11 different (putative) protein coding genes (Table 12). None of these additional missense SNPs were shared between the two drugs.
Fisher's Exact tests, with a Bonferroni-corrected p-value threshold of 4.07 × 10 −4 (0.05/122), were conducted to examine associations among these highly linked mutations to itraconazole and pan-azole resistance (Table 13). MIC resistance thresholds of 2 mg/L and 4 mg/L were both tested for these 26 sites and using all 122 strains. Both MIC thresholds identified four SNPs to be significantly associated with itraconazole resistance as well as two of these SNPs also being associated with pan-azole resistance (Table 13). Additional Fisher's Exact tests were also conducted after the removal of the 21 strains with the L98H mutation in cyp51A and using a Bonferroni-corrected p-value threshold of 4.95 × 10 −4 (0.05/101) (Table 14). For both MIC resistance thresholds, the results showed that the three previously noted SNPs, in AFUA_1G17380, AFUA_3G09040 and AFUA_3G09070, were again significantly associated with itraconazole resistance. After removal of the 21 strains and using both MIC thresholds, two of these SNPs, AFUA_3G09040 and AFUA_3G09070, were now also significantly associated with pan-azole resistance. Another shared SNP with the previous analysis is the missense variant in AFUA_1G03370, which was found to be significantly associated with pan-azole resistance at both MIC resistance thresholds. Furthermore, using the MIC threshold of 2 mg/L, a novel missense variant in AFUA_7G06290 was found to be associated with pan-azole resistance (Table 14). A third set of Fisher's exact tests were conducted after removal of the 64 strains containing known cyp51A mutations and using a Bonferroni-corrected p-value threshold of 8.62 × 10 −4 (0.05/58) (Table 15). For both MIC resistance thresholds, the tests determined three previously identified SNPs to be significantly associated with both itraconazole and pan-azole resistance. These three SNPs were a missense variant in AFUA_1G17380, AFUA_3G09040, and AFUA_3G09070. Using both MIC thresholds, the tests also identified the previous AFUA_7G06290 missense variant to be significantly associated with panazole resistance. Lastly, another set of Fisher's Exact test was conducted to focus solely on strains from Clade II and used a Bonferroni-corrected threshold of 7.04 × 10 −4 (0.05/71) (Table 16). For both MIC resistance thresholds, the tests determined three previously identified SNPs to be significantly associated with both itraconazole and pan-azole resistance. These SNPs were a missense variant in AFUA_1G17380, AFUA_3G09040 and AFUA_3G09070. Furthermore, using both MIC thresholds, the tests also identified the previously noted missense variant in AFUA_7G06290 to be significantly associated with pan−azole resistance (Table 16).

Discussion
In this study, we analyzed the genomic polymorphisms among 195 A. fumigatus isolates collected from 12 countries as well as the International Space Station to investigate the potential associations between genomic SNPs and triazole resistance. Phylogenetic analyses of the whole-genome SNPs identified three main clades in this sample, with Clade I being very divergent from the other two clades. Most strains in this clade were from Spain and they likely represent a cryptic species within A. fumigatus sensu stricto. Among these 195 strains, the minimum inhibitory concentrations of two triazoles, itraconazole and voriconazole, were reported for 122 and 123 strains, respectively. Over the past two decades, an increasing number of studies have been conducted to investigate the genetic diversity and population structure of A. fumigatus using different molecular markers [14,[40][41][42][43]. A previous study exploring global population genetic variation by Ashu et al. identified 8 genetic clusters by examining nine short tandem repeats in 2026 A. fumigatus isolates from 13 countries [13]. However, a more recent study analyzing the same short tandem repeats of 4049 A. fumigatus isolates identified two broad genetic clusters [14]. The wholegenome SNP analyses here revealed three divergent clades and within both Clades II and III, several sub-clades with significant bootstrap supports were also found. Therefore, the true number and composition of the genetic clusters in the global A. fumigatus population remain uncertain and depend on how clades and genetic clusters are defined. However, based on previous studies, most genetic clusters and clades contain geographically and ecologically diverse strains, consistent with frequent gene flow and great adaptability of A. fumigatus genotypes [14,26].
Among geographic and ecological populations, different frequencies of triazole resistance have been reported, likely reflecting their variations in strain source, clinical antifungal usage, agricultural fungicide usage, and surveillance techniques [44][45][46][47][48][49][50][51]. In 2017, Garcia-Rubio et al. reviewed previously published literature and reported that the global triazole-resistant rate ranged from 0.55% to 30% [30]. In the samples analyzed here and using an MIC threshold of 2 mg/L, 61.48% of all isolates with available MIC data were itraconazole resistant and 43.90% were voriconazole resistant. Furthermore, 63.46% and 43.81% of the clinical isolates were itraconazole and voriconazole resistant, respectively. Similarly, there was a high frequency of environmental isolates resistant to itraconazole and voriconazole, at 50.00% and 44.44%, respectively. Using the MIC threshold of 4 mg/L, resistance frequencies for itraconazole remained the same, however, these values changed for voriconazole. The resistance rate for voriconazole in clinical strains decreased to 35.24% and for environmental strains, it changed to 33.33%. The high rates of resistance among strains analyzed here could be attributed to the biases among research groups in preferentially submitting drug-resistant strains for whole-genome sequencing. However, the broad range of triazole MIC values among the large number of sequenced strains allowed us to infer potential novel genetic variants not identified in previous studies.
Indeed, in this study, GWAS for both itraconazole and voriconazole resistance identified novel genes and mutations linked to triazole resistance. We conducted four GWAS analyses. The first analysis used all strains with known MIC values for itraconazole (n = 122) and voriconazole (n = 123). Meanwhile, the following two analyses investigated novel mutations associated with itraconazole and voriconazole resistance at other SNP loci, unrelated to cyp51A. Specifically, the second GWAS removed the 21 strains containing the L98H mutation in cyp51A and the third GWAS removed the 64 strains containing known mutations in cyp51A related to triazole resistance. The last GWAS was done on a clade-level, focusing the analysis on strains from Clade II with known triazole MIC values (n = 71). For each GWAS, we focused our investigation on the top 20 SNPs obtained via the GWAS analyses for each of the two drugs. We identified a total of six missense variants to be putatively associated with itraconazole resistance. These six missense variants were located in six genes: AFUA_1G09780, AFUA_2G08060, AFUA_3G09090, AFUA_5G08150, AFUA_6G01860, and AFUA_8G02350. The first mutation was in AFUA_1G09780, which encodes for a stomatin family protein. The stomatin proteins belong to a highly conserved family of integral membrane proteins. In humans, stomatin interacts with various ion channels and modulates their activity. The proteins are also thought to perform specific scaffolding functions in membranes. However, the functional information of stomatin proteins in fungi is scarce. The ortholog of AFUA_1G09780 in the closely related Aspergillus nidulans species is AN1287 (stoB). The AN1287 protein is located in the inner mitochondrial membrane [52]. In addition to targeting ergosterol biosynthesis, triazole exposure also causes production of deleterious mitochondrial reactive oxygen species (ROS). Therefore, since triazoles promote ROS accumulation, the mitochondrial membrane complexes represent another group of targets when studying resistance [53]. Of note, this mutation in AFUA_1G09780, was found to be significantly associated with itraconazole resistance in all four GWAS analyses. The second mutation was in AFUA_5G08150, which encodes for a putative ABC bile acid transporter. Although this specific gene has not been previously linked to triazole resistance, other ABC transporter members are known to modulate triazole extrusion. Previous studies have found multiple ABC transporter members to be overexpressed with triazole exposure and that expression levels of various ABC transporter members were higher among triazole-resistant isolates [23,37,54]. In most cases, overexpression in this family of transporters seems to prevent intracellular drug concentrations from reaching levels needed to be effective at impacting ergosterol biosynthesis. The next variant was found in AFUA_8G02350, encoding a putative polyketide synthase. The polyketide synthases are involved in the biosynthesis of polyketides, which are a large and structurally diverse group of secondary metabolites [55]. These compounds have a wide range of biological activities that are important for ecological and evolutionary adaptation in fungi [55]. Furthermore, the gene AFUA_8G02350 is within a terpene hybrid cluster [56]. The fourth gene, AFUA_3G09090, encodes for an uncharacterized protein. The fifth missense mutation related to itraconazole resistance was found in AFUA_2G08060, which encodes an involucrin repeat protein. This protein has been found to be involved in tethering Woronin bodies to septal pores [57]. In multicellular fungi such as A. fumigatus, cells are connected to each other via intercellular bridges called septal pores and Woronin bodies plug these pores upon injury to avoid excessive loss of cell content. Deletion mutants with impaired Woronin bodies have shown to have impaired stress resistance and delayed hyphal wounding response [57]. The missense mutation identified here may enhance the strains' ability to respond quickly to triazole drugs. The last missense mutation was in AFUA_6G01860, which encodes a putative MFS lactose permease. This class of protein plays a role in transmembrane transport and is an MFS family member, a superfamily of transport proteins that are mainly responsible for antifungal resistance development through drug efflux activity.
Interestingly, our itraconazole GWAS analysis results differ from those in a GWAS conducted by Zhao and colleagues who examined SNPs associated with itraconazole sensitivity [29]. They completed a GWAS using 76 clinical A. fumigatus isolates collected from Japan. Our comparisons revealed no overlap between our top 20 SNPs and the SNPs they found to be highly associated with itraconazole sensitivity. Two factors might have contributed to the different observations. In the first, the study by Zhao et al. focused on itraconazole sensitivity in non-resistant clinical isolates of A. fumigatus, with itraconazole MIC ranging from 0.125 to 1 mg/L among their 76 isolates [29]. In contrast, the itraconazole MICs for our 122 strains with itraconazole MICs ranged from 0.13 to 32 mg/L. Secondly, all the strains analyzed by Zhao et al. were from one country, Japan [29]. In contrast, our itraconazole GWAS included 122 strains from eight countries, Canada (n = 12), India (n = 12), Japan (n = 8), Netherlands (n = 21), Spain (n = 19), Germany (n = 1), the United Kingdom (n = 24), and the United States (n = 25). Together, these results suggest that additional novel SNPs associated with itraconazole sensitivity or resistance will likely be present in other geographic and/or ecological populations of A. fumigatus. Another recent preprint by Rhodes and colleagues also conducted a GWAS on itraconazole resistance using treeWAS, a phylogenetic tree-based GWAS approach [58]. A comparison between the top 20 SNPs from our GWAS analyses and their significantly associated SNPs to itraconazole resistance found no overlap in SNP sites. This difference in results is most likely related to sample selection, as their sample set focused on strains obtained from the United Kingdom and Republic of Ireland. Another potential reason for the discrepancy is that our study used a quantitative phenotype, based on MIC values, for our GWAS while Rhodes and colleagues used a binary phenotype, separating strains into susceptible and resistant strains by defining resistance as MIC ≥ 2 mg/L. However, their results are consistent with our general conclusion that a large number of additional novel mutations, un-related to cyp51A, are significantly associated with triazole resistance in A. fumigatus.
The voriconazole GWAS analyses identified a total of 17 missense variants to be putatively associated with resistance. These variants were found in 17 different genes: AFUA_1G03370, AFUA_1G09780, AFUA_1G12540, AFUA_1G17410, AFUA_2G01700, AFUA _2G13030, AFUA_4G09580, AFUA_5G01000, AFUA_5G02210, AFUA_5G14610, AFUA_6G 09870, AFUA_7G00740, AFUA_7G05960, AFUA_8G01250, AFUA_8G01340, AFUA_8G01940, and AFUA_8G02280. Two of these missense variants were found in AFUA_1G03370 and AFUA_5G02210, which encodes for putative proteins of unknown functions. A third gene, AFUA_1G12540, encodes a putative TMEM1 family protein with uncharacterized function. The next seven variants were found in members of enzyme families with roles across a large number of biological processes, which comprised of the genes AFUA_1G17410 that encodes a putative beta-glucosidase, AFUA_2G01700 that encodes a putative serine/threonine protein kinase, AFUA_2G13030 that encodes a phenylalanyl-tRNA synthetase, AFUA_5G01000 that encodes a putative oxidoreductase of the 2-oxoglutarate (2OG)-Fe(II) oxygenase superfamily, AFUA_5G14610 that encodes a putative carboxypeptidase Y, AFUA_7G00740 that encodes a putative protein kinase, and AFUA_8G01250 that encodes a putative GNAT family acetyltransferase. The next variant was found in AFUA_7G05960, which encodes a putative C2H2 finger domain protein. Three significantly associated missense variants also encoded for putative C6 zinc cluster transcription factors, which were found in AFUA_6G09870, AFUA_8G01940 and AFUA_8G02280. Other members of this transcription factor family have been linked to triazole resistance. For example, a previous transcriptome study had found finA, a C6 zinc finger domain protein, displayed increased mRNA levels during adaptation to voriconazole exposure [38]. In addition, another C6 zinc-cluster transcription factor, AtrR, had been found to be associated with triazole resistance by regulating expression of genes related to ergosterol biosynthesis [59]. However, the Zn cluster family is the largest family of transcription factors known in eukaryotes and thus additional testing is required [60]. Interestingly, a missense mutation in AFUA_1G09780 was also found significantly associated to voriconazole and was the same SNP found to be associated with itraconazole resistance. The remaining two genes were AFUA_8G01340 that encodes a putative MFS sugar transporter and AFUA_4G09580, which encodes the major allergen Aspf2. Although our study focused on examining missense variants, significantly associated SNPs obtained by GWAS also included synonymous, intergenic and intronic variants. These variants can have biological consequences and contribute to functional changes in the protein. For example, synonymous mutations can affect critical cis-regulatory sequences, alter mRNA structure, and impact translational speed [61]. Furthermore, non-coding variants can be found within potential regulatory sequences such as enhancers, promoters, and 5 and 3 UTRs [62]. Through these regulatory roles, non-coding variants can influence processes such as transcription, translation and splicing.
The results of the itraconazole and voriconazole GWAS showed few overlaps between significant SNPs; the first GWAS having one SNP overlap in the gene AFUA_1G09780 and the third GWAS having two shared SNPs, a synonymous mutation in AFUA_4G09770 and a synonymous mutation in AFUA_5G08390. The remaining two GWAS found no overlapping SNPs between the two antifungal drugs. Several reasons could have contributed to the low number of shared SNPs. Although all azoles operate using the same common mode of action, decreasing ergosterol synthesis by inhibiting the fungal enzyme 14α-sterol demethylase, there are differences between itraconazole and voriconazole in terms of their mechanisms of action. Voriconazole also inhibits 24-methylene dihydrolanasterol demethylation in Aspergillus and its antifungal activity is likely a result of a combination of effects in addition to inhibition of ergosterol synthesis [63]. Secondly, our sample set consisted of a large number of strains that had different susceptibilities to the two drugs and were only resistant to one of the two antifungals. Finally, our sample set was not a natural randomly mating population but were selected strains sequenced by different laboratories based on their own specific objective and purpose. As a result of the diversity of strains and their originating populations, some of the shared azole-resistance related mutations may have been less frequent in our studied sample set and were, thus, likely filtered out during quality control.
Linkage disequilibrium was also evaluated using the top 20 SNPs obtained by each of the four GWAS analyses to identify additional highly linked missense SNPs. Four sets of Fisher's Exact tests were conducted: using all 122 strains, after removal of the 21 strains with the L98H mutation, after removal of the 64 strains with the cyp51A mutations, and using only the 71 Clade II strains. Interestingly, the last three tests identified two SNPs, missense variants in AFUA_3G09040 and AFUA_3G09070, to be significantly associated with itraconazole as well as pan-azole resistance using both MIC thresholds (Tables 14-16). In the first test, conducted using all 122 strains and at both MIC thresholds, these two missense variants were also found to be significantly associated with itraconazole resistance (Table 13). Another SNP in AFUA_1G17380 was also found in three tests to be significantly associated with itraconazole and pan-azole resistance using both MIC thresholds (Table 13, Table 15, and Table 16). Furthermore, in the remaining test, the SNP site was significantly associated with itraconazole resistance (Table 14). In terms of the function of these three genes, AFUA_1G17380 encodes a putative 3-oxoacyl-(acyl-carrier-protein) reductase, AFUA_3G0904 encodes for an uncharacterized protein and AFUA_3G09070 encodes a putative carboxylesterase.
In addition, our study examined 20 previously known amino acid sites associated with triazole resistance. Fifteen of these known amino acid sites were in the cyp51A gene. Of these 15 sites, several have been functionally validated in previous studies and found to directly contribute to triazole resistance. These validated mutation sites consisted of G54, L98, Y121, G138, M220, T289 and G448 [8,64]. However, hot-spot SNP sites that confer triazole resistance by itself only include five of the seven sites, at G54, Y121, G138, M220 and G448. Meanwhile, for L98 and T289, a combination with a tandem repeat is required for triazole resistance, specifically TR 34 /L98H and TR 46 /Y121F/T289A [65]. Mutations in the G138 site has also been validated to cause multi-azole resistance [66]. Lastly, SNP sites P216 and F219 both confer resistance to itraconazole when mutated [67]. Specifically, there have been two amino acid substitutions in F219 in triazole-resistant strains, F219I and F219L [67]. However, interestingly, we have instead identified a different substitution F219S in our sample set; although it was not significantly associated with resistance (Table 1). We have also included the amino acid sites F46, M172, N248, D255, and E427; although only 3 of these sites could be found in our filtered genotype file, specifically F46, D255, and E427. Strains with a combination of these five mutations, as F46Y/M172V/D255E or F46Y/M172V/D255E/N248T/E427K, have shown higher triazole MICs than the wild-type strains [30]. The three remaining cyp51A mutation sites, H147, S297 and F495, have been identified to be associated with resistant isolates but have not been functionally validated. Mutations in H147 were found to coincide with isolates with G448 mutations and were not found to be associated with resistance by itself and is thought to only increase protein stability [9,68]. Similarly, S297 and F495 mutations are found in some TR 34 /L98H mutant A. fumigatus strains but have not been proven to be sufficient for triazole resistance by themselves [12,69]. However, these two amino acid positions, S297 and F495, are located near the triazole binding pocket of Cyp51A [70]. The remaining five known amino acid sites associated with triazole resistance are non-cyp51A mutations. Three of these SNP sites (I412, P309, and S305) are in hmg1, which encodes a 3-hydroxy-3methyl-glutaryl-coenzyme A (HMG-CoA) reductase. Mutation at these three sites have been identified in triazole-resistant isolates and their association with triazole resistance has been functionally validated by inserting each SNP into the hmg1 gene of the laboratory strain akuB KU80 [33]. These three sites, I412, P309, and S305, are predicted to be within the conserved sterol-sensing domain of hmg1 [33]. Furthermore, akuB KU80 mutants with the substitution I412S and S305P possessed significantly different cellular sterol profiles compared to the unaltered akuB KU80 strain; independent of cyp51A and cyp51B expression [33]. Another known mutation site is L167 in the uncharacterized AFUA_7G01960 gene. The nonsense mutation L167* was first identified in the multi-azole-resistant clinical isolate V157-62, and subsequently functionally validated by inserting the specific SNP into an azole-susceptible clinical isolate, V130-15 [36]. At this amino acid position, a nonsense mutation was generated, which increased resistance to itraconazole [36]. Furthermore, this SNP was also associated with decreased ergosterol in the fungal membrane [36]. Interestingly, overexpression of the AFUA_7G01960 gene itself has also been correlated with increased voriconazole resistance [38]. Taken together with bioinformatic analysis, AFUA_7G01960 is predicted to be a putative transcription factor involved in ergosterol biosynthesis and mutation at L167 likely prevents its activity, thus leading to increased resistance [36]. The last known SNP site we investigated was E180 in AFUA_2G10600, a gene encoding the mitochondrial 29.9 KD NADH oxidoreductase subunit of respiratory complex I. The amino acid substitution E180D is present in itraconazole-resistant clinical isolates of A. fumigatus [37]. Furthermore, restriction enzyme-mediated insertion of this mutation in the AFUA_2G10600 gene led to increased itraconazole resistance [71]. This insertion was at a Xhol site, 534 bp from the start codon of the gene. This increase in resistance indicates that intact AFUA_2G10600 may confer azole susceptibility through mitochondrial NADH metabolism or NAD/NADH redox stress [71]. Complete deletion of the coding region for this 29.9KD subunit was also found to result in itraconazole resistance in the laboratory strain A1163 KU80, increasing from an MIC of 0.25 mg/L to >8 mg/L, which further supports its contribution to itraconazole resistance [37].
Using a Fisher's Exact test on these 22 mutations, with all 122 strains and using both MIC resistance thresholds (2 mg/L and 4 mg/L), we found only one mutation, L98H in cyp51A, to be highly associated with itraconazole and pan-azole resistance (Table 1). Examining all samples with known triazole MIC data, this L98H mutation was found in 21 strains. We also determined using coverage data across the promoter region of cyp51A that all 21 strains with the L98H mutation were accompanied with the common 34-bp tandem repeat ( Figure S5). A subsequent Fisher's Exact test was, thus, done after removing strains with the L98H mutation in cyp51A (n = 21). Additional Fisher's exact tests were also conducted after removal of all strains containing the cyp51A mutations (n = 64) and conducted again with only Clade II strains (n = 71). However, these additional tests identified no SNPs significantly associated with itraconazole and/or pan-azole resistance. A potential reason why the previously functionally validated sites were not highly associated with triazole resistance in our study is that strain counts for mutation genotypes at these sites were low in our 122-strain sample set and only ranged between 1 and 6 strains, making them unable to meet our criteria (>5% frequency in the population) for inclusion (Table S4).
We further examined the distribution of mutation phenotypes in these functionally validated sites in our three clades, using all 195 strains. Interestingly, for cyp51A, the mutation L98H was only present in strains of Clade III (n = 22). For the T289 site, three strains had the mutation T289A and this mutation always accompanied with the substitution Y121F. Furthermore, these three strains were all from Clade III. Mutations in the site G54, specifically G54V (n = 5), G54E (n = 1), G54W (n = 3) and G54R (n = 2), were found mostly in Clade II strains with a rate of 90.91% (10/11). Mutations M220I (n = 2), M220V (n = 1), P216L (n = 4), and F219S (n = 1) were also only found in Clade II strains. For the gene hmg1, mutations in this gene were only seen in Clade II strains (n = 7).
Fisher's Exact tests were also conducted on 37 genes previously found to be overexpressed with triazole exposure using 3230 SNP sites (Table S3). The tests were conducted using both MIC resistance thresholds, 2 mg/L and 4 mg/L. The test conducted with all 122 strains and the MIC threshold set at 2 mg/L found 57 SNPs in or beside 14 genes to be associated with itraconazole resistance. For pan-azole resistance, 11 significantly associated SNPs were found. These SNPs were located in or beside six genes. The test conducted with the MIC threshold of 4 mg/L found the same 57 SNPs in or beside 14 genes to be significantly associated with itraconazole resistance. However, there were slight changes in the SNPs significantly associated with pan-azole resistance when using the 4 mg/L MIC threshold. The 4 mg/L MIC threshold determined 10 SNPs in or beside five genes to be associated with pan-azole resistance. Furthermore, to focus on novel mutations associated with triazole resistance not linked to cyp51A, two additional Fisher's Exact tests were also completed after removing the 21 strains containing the L98H mutation and again after removing the 64 strains with well-known mutations in cyp51A (Table S3). Using both MIC thresholds, the results after removal of the 21 strains found three SNPs to be significantly associated with itraconazole resistance. One SNP had already been noted as associated with itraconazole resistance by the first set of Fisher's Exact test, done prior to the 21-strain removal, which was an intergenic variant in mfsB. However, two novel intergenic variants, one in AFUA_6G01960 and the second in AFUA_1G16460, were found to be significantly associated with itraconazole resistance as well. Furthermore, after removal of the 21 strains, no SNPs were found to be significantly associated with pan-azole resistance (Table S3). In terms of the results after removal of the 64 strains, one novel SNP site was found to be significantly associated with triazole differences. Using the MIC threshold of 2 mg/L, the test identified an intergenic variant in abcA to be associated with itraconazole resistance (Table S3). A final set of Fisher's Exact tests were conducted on a clade-level, using only strains in Clade II (n = 71). However, no SNPs sites were significantly associated with triazole resistance using this sample set. The result differences between tests could be due to genetic hitchhiking alongside the resistance polymorphism L98H in cyp51A as well as to the reduced sample size, thus decreasing the sample count of certain SNPs and making them unable to meet the Bonferroni-corrected critical p-value threshold criteria in these tests.
In about 20% to 70% of triazole-resistant clinical A. fumigatus strains, no mutations related to cyp51A were observed [29,72]. Molecular assays for the detection of A. fumigatus and its cyp51A alterations have been produced to provide rapid detection of cyp51Amediated triazole resistance in clinical samples of A. fumigatus [72]. However, as shown in our analyses, in many of the triazole-resistant strains, relying on assays targeting only the cyp51A mutations would lead to misidentification of these strains as triazole susceptible and cause inappropriate treatment strategies. Indeed, over the last decade, novel triazole resistance mechanisms have been increasingly reported including mutations in hapE, hmg1, yap1, and cox10 genes [10,34,35,73]. The wide and growing range of resistance mechanisms seen in A. fumigatus demonstrates the high potential this fungus has for stress adaptation, including adaptation to antifungal drug resistance. Delays in the initiation of appropriate antifungal therapy are associated with overall increased mortality. Thus, tools enabling direct detection of resistance using rapid molecular methods can greatly facilitate optimal therapy for individual patients. Together, the putative variants found in this study represent promising candidates for future studies to investigate emerging mechanisms of triazole resistance in A. fumigatus. Moreover, these candidate SNPs hold great potential for developing additional diagnostic markers for accurate and rapid identification of triazole resistance in a clinical setting.

Whole Genome Sequences and Strains
Whole-genome sequences for 195 A. fumigatus isolates were used in this study. This sample set comprised of 184 whole-genome sequences obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive and an additional 12 isolates that were sequenced from our previous study [63]. This strain collection spans 12 countries, across four continents consisting of 61 strains from North America, 1 from South America, 91 from Europe, 40 from Asia, as well as two strains from the International Space Station. In total, 163 of the 195 strains were isolated from a clinical environment, 29 from the natural environment and 3 of unknown sources. Among them, 122 and 123 samples had antifungal susceptibility profiles to itraconazole and voriconazole, respectively. These profiles are recorded as the minimum inhibitory concentrations (MICs) and are presented in Table S1.

Variant Calling
For genome sequence analysis, a modified pipeline from our previous study was used [74]. In brief, FastQC v0.11.5 was used to check for read quality and low-quality sequences were trimmed using Trimmomatic v0.36 [75,76]. The reads were then mapped to the reference A. fumigatus strain Af293 (GenBank accession GCA_000002655.1) using the BWA-MEM algorithm v0.7.17 [77]. Duplicate reads were removed using MarkDuplicates in the Picard tool and variants were called using FreeBayes v0.9. 21-19 [78,79]. The initial variant filtering was done via vcftools to remove indels, variants with a quality score below 15, and variants with a call rate less than 0.90 [80]. A second filtering step removing multiallelic sites was also conducted using vcftools and this resulting vcf file was named the "soft-filtered" file, which contained 314,999 SNP sites.

Phylogenetic Analysis
To infer evolutionary relationships among the 195 samples, nucleotides of SNP sites were concatenated for each sample and the invariant sites of sequence alignment were removed using RAxML ascertainment bias correction [81]. The maximum likelihood phylogenetic tree was constructed based on 314,999 SNP sites, using the ASC_GTRCAT nucleotide substitution model and 500 bootstrap replicates in RAxML v8.0.25 [81]. The phylogeny was then visualized using iTOL [82]. Strains were assigned into clades based on pairwise SNP comparisons, with a threshold set at 50,000 SNPs.

Genome-Wide Association Study and Linkage Disequilibrium
Variants were annotated with SnpEff v5.0 using the Af293 reference genome annotation to determine functional effects of genetic variants [83]. Highly linked (VIF > 2) SNP markers were removed using PLINK 1.90 beta to ensure uniform sampling of the genome [84]. Association analysis via a mixed linear model was done in TASSEL 5 using two parameters: a population structure defined by 5 principal component vectors, determined based on the scree plot, and a kinship matrix calculated using the Identity by State method (Centered IBS) [85]. To avoid biases due to imbalanced allele frequencies, the minimum allele frequency was also set to 0.05 using TASSEL 5. A total of 21,432 SNP sites remained for conducting the itraconazole GWAS and a total of 21,226 SNP sites for the voriconazole GWAS. A second GWAS was conducted after the removal of strains that contained the L98H mutation in cyp51A (n = 21). For the second association analysis, a total of 22,411 and 21,214 SNP sites remained for conducting the itraconazole and voriconazole GWAS, respectively. A third GWAS was also conducted after the removal of strains that contained well-known mutations associated with in cyp51A (n = 64). For the third association analysis, a total of 20,176 and 20,278 SNP sites remained for conducting the itraconazole and voriconazole GWAS, respectively. Lastly, we conducted a GWAS examining only the strains from Clade II for itraconazole (n = 71) and voriconazole (n = 72). For the analysis focusing solely on strains from Clade II, a total of 16,702 SNP sites remained for conducting the itraconazole GWAS and a total of 16,782 SNP sites remained for the voriconazole GWAS. Using the results of the GWAS, further association mapping between the top 20 SNPs and all SNPs in the soft-filtered vcf file was conducted using TASSEL 5 to determine additional highly linked SNPs of interest.  Table S1: Additional information on the total 195 Aspergillus