Glacial Legacies: Microbial Communities of Antarctic Refugia

Simple Summary Microbial communities in Antarctica have only recently been described with the increasing popularity and ease of genome sequencing. Using these methods, we aimed to test hypotheses of refugia using microbial communities in the McMurdo Dry Valleys of Victoria Land, Antarctica. Refugia are habitable areas that remain undisturbed during cycles of glacial expansions throughout Earth’s history. They may contain ancient lineages and unique communities worthy of conservation, as well as provide insight into the biotic history of Antarctica. We found unique microbial community assemblages from putative refugia in the McMurdo Dry Valleys indicating long-lived climax-communities in one of the harshest environments in the world. This finding corroborates the importance of glacial legacies in structuring not just the physical and geochemical environment, but also the soil microbial communities in this landscape. Abstract In the cold deserts of the McMurdo Dry Valleys (MDV) the suitability of soil for microbial life is determined by both contemporary processes and legacy effects. Climatic changes and accompanying glacial activity have caused local extinctions and lasting geochemical changes to parts of these soil ecosystems over several million years, while areas of refugia may have escaped these disturbances and existed under relatively stable conditions. This study describes the impact of historical glacial and lacustrine disturbance events on microbial communities across the MDV to investigate how this divergent disturbance history influenced the structuring of microbial communities across this otherwise very stable ecosystem. Soil bacterial communities from 17 sites representing either putative refugia or sites disturbed during the Last Glacial Maximum (LGM) (22-17 kya) were characterized using 16 S metabarcoding. Regardless of geographic distance, several putative refugia sites at elevations above 600 m displayed highly similar microbial communities. At a regional scale, community composition was found to be influenced by elevation and geographic proximity more so than soil geochemical properties. These results suggest that despite the extreme conditions, diverse microbial communities exist in these putative refugia that have presumably remained undisturbed at least through the LGM. We suggest that similarities in microbial communities can be interpreted as evidence for historical climate legacies on an ecosystem-wide scale.


Introduction
Currently, only 0.34% of Antarctica is ice-free, with the remaining 99.7% covered by ice on average 2 km thick [1]. While the McMurdo Dry Valleys (MDV), located in Southern Victoria Land, East Antarctica, are the largest ice-free region in Antarctica (~4500 km 2 ; [2]), the contemporary conditions of the MDV are largely inimical to life. Mean annual temperatures of −20°C and annual precipitation of less than 5 cm water equivalent [3] lead to highly stable, but inhospitable conditions in which soil communities experience little change [4]. glacial disturbance [50], but even some low elevation localities further from the coast have escaped inundation by the expansion of the Ross Ice Sheet and associated paleolakes [51,52]. Secondly, high elevation habitable soils require periodic wetting to temper the negative effects of salt deposition for habitability [53]. Finally, soils need to have been historically colonized and their biota persists through the LGM. Previous work on springtails [46,[54][55][56][57], mites [43,58], midges [45], and mosses [48] support evidence of glacial refugia using population-level genetic comparisons between refugia and more recently colonized habitat. Phylogeographic analyses have been used to identify Antarctic refugia, both on Antarctic islands [47] and on the continent itself [44]. Microbial signatures of refugial are expected to reflect these phylogeographic patterns, with putative refugia displaying distinct climax communities and exhibiting phylogenetic similarity when compared to more recently disturbed locations.
These cold desert ecosystems are particularly sensitive to environmental changes, brief or long-term, due to millions of years of adaptation in severely limited and extreme environments [59]. Climate change has already altered the community composition of microfauna in the MDV, disproportionately impacting species adapted to dry and cold conditions, and ultimately decreasing the overall abundance of soil fauna [60]. Ancient, endemic lineages only persist in patchily distributed habitable spaces [61] and are already at the limits of life in the discontinuous environments of the MDV. Climate change may disproportionally affect these dispersal-limited terrestrial organisms given the lack of ecosystem function overlap in the MDV [52,59,62]. Identifying past and present refugia is an important consideration in understanding the resistance and resilience of the MDV to climate change [63]. This study presents a biological sampling effort of sites in the MDV which may represent putative refugia to determine if there is a microbial community signature comparable to previous patterns observed for invertebrates and mosses in other Antarctic systems, such as higher phylogenetic diversity of microbial taxa in putative refugia sites. We set out to better understand how the structure of microbial communities corresponds to the glacial history of these valleys and identify areas of high biodiversity providing sanctuary to Antarctic life during its climate history. We hypothesize that putative refugia will have distinct microbial communities as a result of climate legacies caused by the LGM.

Sampling Site Selection
We selected sampling sites based on the following criteria: signs of periodic wetting, elevation, published exposure age estimates, and proximity to recorded local glacier terminal moraines or paleolake shorelines. Putative refugia were selected at higher elevations that avoided paleolake inundation and glacial incursion, yet still experienced periodic wetting to temper the negative effects of salt deposition on viable mesofaunal communities [41,53]. Sites disturbed during the LGM either by glacial expansion or paleolake inundation were chosen at lower elevation and with reported soil exposure ages younger than the LGM (less than 26 kya), as well as proximity to recorded local glacier terminal moraines or paleolake shorelines (see Table 1 for site classification). Information from geological surveys dating soil exposure ages was used to discriminate between sites that have remained exposed since before the LGM (>26 kya) as putative refugia, and more recently disturbed sites. However, no precise consensus about which candidate sites represent actual refugia in the MDV exists in the current literature due to an incomplete exposure dating record and limited biological census identifying high elevation habitats. It is likely that refugia existed throughout the MDV as temporarily overlapping habitable soils throughout recurrent glacial cycles of the Pleistocene [13]. Two sample sites were chosen per valley system to incorporate varying geological histories, although not every valley system could be represented by both a putative refugia and disturbed site. Samples were all collected from sites in the MDV during the Antarctic summer, between December and February, in years ranging from 1994 to 2018. For each site, only one soil sample was used for DNA extraction. Figure 1 shows sample locations on a map of the McMurdo Dry Valleys. Site elevation and distance from the coast were estimated using Google Earth.
each site, only one soil sample was used for DNA extraction. Figure 1 shows sample loc tions on a map of the McMurdo Dry Valleys. Site elevation and distance from the coa were estimated using Google Earth.

DNA Extraction and Sequencing
Entire soil top layers to 10 cm depth were removed from one location at each of the sampling sites using the standard sampling procedure for Dry Valley soils described in Freckman and Virginia [73]. All samples were frozen after initial microinvertebrate extractions and were thawed to 10 • C prior to microbial DNA extraction. DNA extraction was performed on each sample using the provided standard protocol with DNeasy PowerSoil Kit (Qiagen, Germantown, MD, USA). For each sample, 0.25 g of homogenized whole soil was used for each extraction, replicated 6-fold per sample. A total of 5 negative controls were also performed on pure PCR-grade water during extraction. One reagent-only control was included in the final sequencing products and one positive control was used to ensure amplification and sequencing worked as planned. Sequencing libraries were prepared from each sample following established protocols [74]. In total, 4 µL of extracted DNA was added to a 96 well plate including sample-specific primers with 17 µL of AccuPrime Pfx Supermix, and 2 µL of each paired set of dual-indexed primers for the V4 region of the 16 S rRNA gene [74]. To prevent evaporation during thermal cycling, a drop of mineral oil was added to each well. Individual 20 µL reactions were placed into a thermocycler at 95 • C for 2 min followed by 30 cycles of 95 • C for 20 s, 55 • C for 15 s, and 72 • C for 5 min then 72 • C for 10 min and held at 4 • C. Amplified DNA was visualized via gel electrophoresis using 4 µL of amplified DNA and 4 µL of loading dye in a 1% agarose gel at 100 volts for 60 min alongside a DNA ladder. PCR amplicons were normalized and pooled to 20 ng/µL per sample using the Charm Biotech Just-a-Plate TM Normalization kit. Amplicons were submitted for sequencing on an Illumina MiSeq (Illumina, San Diego, CA, USA) using paired-end 500 cycle v2 chemistry at the Arizona State University genomics core sequencing center. All post-extraction steps included a reagent-only negative control to remove potential contaminant sequences.

Community Analysis
All analyses were performed using QIIME 2 (q2) 2021.2 [75]. Raw sequence data from 109 samples plus 5 negative controls and one reagent-only control were first demultiplexed and quality filtered using the q2-demux plugin then denoised with DADA2 denoise-paired function (via q2-dada2) [76]. Sequences were matched with the metadata feature table. Taxonomy was assigned to ASVs using the q2-feature-classifier [77] classify-sklearn naïve Bayes taxonomy classifier against the Silva 138 99% 515 F/860 R region sequences [78]. Sequences were filtered to remove chloroplast and mitochondrial reads with q2's inbuilt function. Taxonomy bar plots were generated in q2 to visualize diversity at the class-level, excluding samples with fewer than 3 classes represented in total ( Figure 2) to remove their outlier effect on beta-diversity metrics. The remaining 86 samples were used to construct a phylogenetic tree with FastTree 2 [79] (via q2-phylogeny) using MAFFT [80] (via q2-alignment) aligned sequences. Based on the observed features, samples were then rarefied (subsampled without replacement) to 2650 sequences per sample, which excluded the negative and reagent-only controls, leaving 81 samples for diversity analysis. Lower Beacon Valley was the only sampling site where all samples had sequencing depths below this filter point and as such all those samples were excluded from further analyses.
Lastly, specific analysis of microbiome composition (ANCOM) [88] was performed to assess which taxonomic groups were differentially abundant between samples from different valley basins and between putative refugia and disturbed sites. An ANCOM statistical test generates W-values representing a count of ANOVAs that rejected the null hypothesis Biology 2022, 11, 1440 7 of 24 that a given ASV is equally abundant per site [88]. Using this method, we tested the significance of each ASV being differentially abundant among our tested groups-those with differing site history and those taken from different valley basins. Differentially abundant taxa were resolved to the best taxonomic resolution against the Silva reference database. Diversity metrics were all computed using q2-diversity, including alpha-diversity, Pielou′s evenness index [81], Shannon entropy [82], Faith′s Phylogenetic Diversity [83], beta diversity weighted UniFrac [84], unweighted UniFrac [85], Bray-Curtis dissimilarity [86], and Principle Coordinate Analysis (PCoA). The average change in PC1 for each site, overall, was tested for difference from zero using a one-sample t-test with Benjamini-Hochberg false discovery rate (FDR) correction [87]. Significant differences between the mean community composition were calculated using q2′s inbuilt PERMANOVA analysis.
Lastly, specific analysis of microbiome composition (ANCOM) [88] was performed to assess which taxonomic groups were differentially abundant between samples from different valley basins and between putative refugia and disturbed sites. An ANCOM statistical test generates W-values representing a count of ANOVAs that rejected the null hypothesis that a given ASV is equally abundant per site [88]. Using this method, we tested the significance of each ASV being differentially abundant among our tested groups-those with differing site history and those taken from different valley basins. Differentially abundant taxa were resolved to the best taxonomic resolution against the Silva reference database.

Environmental Analysis
Soil environmental parameters were analyzed to control for the potential effects on microbial community composition. Soil acidity and conductivity are important in driving the abundance of key bacterial groups in Dry Valley ecosystems [89], even on a very fine scale [90], while productivity is largely driven by soil moisture content and freely available organic material [91].

Environmental Analysis
Soil environmental parameters were analyzed to control for the potential effects on microbial community composition. Soil acidity and conductivity are important in driving the abundance of key bacterial groups in Dry Valley ecosystems [89], even on a very fine scale [90], while productivity is largely driven by soil moisture content and freely available organic material [91].
Environmental analyses were performed by the BYU Environmental Analytics Lab. In short, gravimetric water content was estimated by drying 25 g of soil at 105 • C and weighing soil dry mass, electroconductivity was measured using a RC-16 C Conductivity Bridge (Beckman Instruments, Brea, CA) and pH was measured using a Thermo Orion Model 410 A+ (Thermo Electron, Waltham, MA, USA). Organic matter content was detected using chromic acid titration [92] and ammonium and nitrate content measured on a Fialyzer 2000 (FIALAb, Seattle, WA, USA). Phosphorus and potassium content was measured using 0.5 M sodium bicarbonate following the Schoenau and Karamonos protocol [93].

Statistical Analyses of Alpha Diversity
Qiime2's built-in Kruskal-Wallis tests were used to compare the diversity between the two groups with different hypothesized glacial histories, and between the different valley basins. The Shannon diversity index was used to assess the differences in alpha diversity between sampled sites. This alpha diversity index was chosen so that abundant taxa and phylogenetic differences between the community members at different sites would be represented in the community analysis. As Shannon diversity had a normal distribution across our sampled locations, the effects of elevation, location, and site geochemistry on microbial alpha diversity were modeled using multiple linear regression models [94] in R [95]. To control for correlative effects of different environmental conditions, a model selection approach was used to select the best explanatory factors for the full dataset as well as the putative refugia sites and the assumed recently disturbed sites separately. The full dataset was considered by including the variations between putative refugia sites and disturbed sites as a random effect, and r-squared values were estimated for the resulting mixed effect models using the piecewiseSEM package in R (https://github. com/jslefche/piecewiseSEM accessed 21 September 2022). As samples were sequenced in up to six replicates, within-location variation was included as a random factor to avoid pseudoreplication. Distance to coast and elevation were strongly correlated in the dataset and so were considered separately to select a best-fitting model. The effects of distance to coast and elevation were further explored in the complete model selection using other environmental factors, which were correlated to these effects in different ways depending on the the site's proposed glacial history.

Multivariate Analyses
The beta diversity results of the different communities were combined with the effects of our measured environmental variables to plot these interactions in non-metric multidimensional space [96] and using distance-based redundancy analysis [97]. Using the vegan package in R [98] the community composition of each site was plotted in both ordination spaces using the Bray-Curtis distance metric, after which the environmental drivers of this composition were plotted using proportionally scaled arrows. Detrended correspondence analysis [99] was further used to validate the results of these ordination plots.

Sample Site Classification
Given that our sampling sites were uniformly chosen for habitable soil, that is, signs of periodic wetting and limited salt accumulation, geochemical factors normally used as a proxy for soil exposure age-sulfate, nitrate, and chloride concentrations-were not used to categorize sites in this study [23][24][25][26]. The initial classification of putative refugia or disturbed site status mainly relied on published exposure age estimates (Table 1)

Data Retention
After filtering for chloroplast and mitochondrial sequences, samples were filtered to a sampling depth of 2650 sequences per sample based on the alpha rarefaction curve (see Supplementary Figure S1 Figure 2. A total of 1390 unique features were retained after rarefaction, and 94 different classes were identified across our samples, with the top 3 classes being Actinobacteria, Blastocatellia and Thermoleophilia, together making up around 40% of total reads ( Figure 2). Actinobacteriota, Acidobacteriota, Chloroflexi, and Alphaproteobacteria were the four most prevalent phyla of the 33 identified, together making up at least 40%, but frequently over 60% of the reads in each sample after filtering. Crenarchaeota and Halobacterota were the only clades of Archaea recovered from these samples.

Alpha Diversity
Shannon diversity was highest in the McKay Glacier Basin and Garwood Valley, followed by Miers Valley, but significant differences in diversity were found within some of these valleys (Figure 3). For example, significantly higher diversity was found in the disturbed site within Miers Valley than in the higher elevation putative refugium in the same valley basin (p < 0.05, Kruskal-Wallis pairwise comparison). The full table with Shannon diversity comparisons between each site is reported in Supplementary Table  S1. Overall, a higher alpha diversity was found in recently disturbed sites over putative refugia sites and this distinction was found to be significant between the two groups (p = 0.003, Kruskal-Wallis pairwise comparison). These results were consistent between the Faith PD metric and Shannon diversity, but not when calculating Pielou's evenness.

Beta Diversity
Community composition (measured in the Unweighted UniFrac distance between samples) was significantly different between all the sampled valley basins, and each sampled location was found to be significantly different from all other sampled sites in pairwise comparisons, with three sites as exceptions (Dais, Brownworth and Higher Miers Valley, p < 0.05 threshold, PERMANOVA). These three comparisons also had the lowest sample size out of all pairwise comparisons. Comparisons with other measures of community dissimilarity were largely congruent except for two exceptions when using the Weighted UniFrac distance to compare different valleys (

Beta Diversity
Community composition (measured in the Unweighted UniFrac distance between samples) was significantly different between all the sampled valley basins, and each sampled location was found to be significantly different from all other sampled sites in pairwise comparisons, with three sites as exceptions (Dais, Brownworth and Higher Miers Valley, p < 0.05 threshold, PERMANOVA). These three comparisons also had the lowest sample size out of all pairwise comparisons. Comparisons with other measures of community dissimilarity were largely congruent except for two exceptions when using the Weighted UniFrac distance to compare different valleys (between Alatna Valley and Upper Wright Valley as well as between Upper Wright Valley and Victoria Valley) (p < 0.05 threshold, PERMANOVA test, see Supplementary Table S2 for reported p-values). However, overall, the community composition of putative refugia sites was found to be significantly different from the communities of disturbed sites using all three metrics (p = 0.001 (all metrics), PERMANOVA test). Figure 4 shows the community composition of our samples in ordination space, measured in the Unweighted UniFrac, Weighted UniFrac and Bray-Curtis distance metrics.

Effects of Environmental Conditions
3.5.1. Distance to Coast and Elevation Figure 5 shows the relationship between the distance of a sampling site to the nearest coastline and its elevation above sea level to the Shannon diversity index of the microbial community. The best models selected were discordant between the full dataset and refugia sites and disturbed sites separately, with increased distance to the coastline selected as the best significant predictive factor (p < 0.05, linear regression) only in the full dataset,  Figure 5 shows the relationship between the distance of a sampling site to the nearest coastline and its elevation above sea level to the Shannon diversity index of the microbial community. The best models selected were discordant between the full dataset and refugia sites and disturbed sites separately, with increased distance to the coastline selected as the best significant predictive factor (p < 0.05, linear regression) only in the full dataset, and is associated with a loss of diversity (between −0.012 and −0.026 decrease in Shannon index per kilometer distance from shore). Both the refugia and disturbed site datasets separately show a moderate negative trend in diversity at greater distances from the coast (see Supplementary File S1 for full p-value and intercept report).

Soil Geochemistry
None of the investigated environmental variables were significantly different between the grouped putative refugia sites and the recently disturbed sites (see Supplementary Figure 3, p > 0.05, ANOVA)-full geochemical data are reported in Supplementary  Table S3. The correlation matrices for all investigated environmental variables including elevation and distance to coast (see Supplementary Table S4) showed strong correlations between several of the key environmental drivers. A model selection approach was used to determine the best predictors of microbial diversity in our datasets. Soil organic matter content was retained as the only factor significant in the full dataset (p = 0.0135, "Shat-terthwaite′s t-test") while it was not significantly predictive of alpha diversity in the dataset of recently disturbed sites (p > 0.05, linear regression) or potential refugia. Figure 6 shows the organic matter content across the two datasets and its relation to alpha diversity. Putative refugial sites and disturbed sites considered separately did not fit in any single linear model, as many of the correlated variables such as distance to coast, elevation and organic matter content had a similar weight in the model selection. Model averaging of all best predictive models in the potential refugia dataset provided a model without any significant effects on alpha diversity that included distance to coast, elevation, organic matter content, gravimetric water content and potassium content. The best predictive single-variable model included only distance to coast, but its effect on diversity was not sig-

Soil Geochemistry
None of the investigated environmental variables were significantly different between the grouped putative refugia sites and the recently disturbed sites (see Supplementary Figure S3, p > 0.05, ANOVA)-full geochemical data are reported in Supplementary Table S3. The correlation matrices for all investigated environmental variables including elevation and distance to coast (see Supplementary Table S4) showed strong correlations between several of the key environmental drivers. A model selection approach was used to determine the best predictors of microbial diversity in our datasets. Soil organic matter content was retained as the only factor significant in the full dataset (p = 0.0135, "Shatterthwaite's t-test") while it was not significantly predictive of alpha diversity in the dataset of recently disturbed sites (p > 0.05, linear regression) or potential refugia. Figure 6 shows the organic matter content across the two datasets and its relation to alpha diversity. Putative refugial sites and disturbed sites considered separately did not fit in any single linear model, as many of the correlated variables such as distance to coast, elevation and organic matter content had a similar weight in the model selection. Model averaging of all best predictive models in the potential refugia dataset provided a model without any significant effects on alpha diversity that included distance to coast, elevation, organic matter content, gravimetric water content and potassium content. The best predictive single-variable model included only distance to coast, but its effect on diversity was not significantly different from zero (p > 0.05, linear regression). In the dataset of disturbed sites, potassium was the only factor retained in the best predictive model but had no significant effect on microbial diversity. All resulting models only explained a small proportion of the dataset's total variability (adjusted R-squared: 0.19 in the dataset of putative refugia, 0.01 in the dataset of recently disturbed sites, and 0. 33 in the full dataset, see Supplementary Figure S1 for full results). Salt and nitrate concentrations were not significantly correlated with biodiversity when considering the entire dataset or either group of sites separately. Putative refugia sites also did not have any higher average salt concentrations or nitrate content, regardless of their longer exposure age and higher elevation (see Figure 7). Any non-significant environmental predictors across sites are reported in Supplementary Figure S4. nitrate content, regardless of their longer exposure age and higher elevation (see Figure  7). Any non-significant environmental predictors across sites are reported in Supplementary Figure 4.

Drivers of Community Structure
3.6.1. Ordination Plots Figure 8 shows the result of the non-metric multidimensional scaling (NMDS) plot. Community distance metrics are plotted using the Bray-Curtis metric, reflecting the nonphylogenetically weighted distribution of sampled communities in ordination space. Similar to the alpha diversity models, organic matter plays a large role in driving community structure. Elevation and distance from the coastline have a similar and strong effect on community composition, aligning with the ordination of several of our hypothesized refugia sites such as those in Alatna Valley, Upper Wright Valley and Victoria Valley. Nitrogen content and conductivity had a strong and equal effect on community composition, reflected mostly in the high salt content soils of the Brownworth samples taken from Lower Wright Valley. Geographically, we saw a reasonable effect of longitudinal distance which was inversely correlated compared to the distance from coastline. Latitudinal differences across these valleys did not play a strong role in shaping community structure. Distance-based RDA ordination mapping of environmental drivers corroborated these results and showed highly similar patterns in community structure (see Supplementary Figure S5). A detrended correspondence analysis showed largely congruent clustering of communities largely driven by a higher degree of heterogeneity of the potential refugia sites (Supplementary Figure S6). 3.6. Drivers of Community Structure 3.6.1. Ordination Plots Figure 8 shows the result of the non-metric multidimensional scaling (NMDS) plot. Community distance metrics are plotted using the Bray-Curtis metric, reflecting the nonphylogenetically weighted distribution of sampled communities in ordination space. Similar to the alpha diversity models, organic matter plays a large role in driving community structure. Elevation and distance from the coastline have a similar and strong effect on community composition, aligning with the ordination of several of our hypothesized refugia sites such as those in Alatna Valley, Upper Wright Valley and Victoria Valley. Nitrogen content and conductivity had a strong and equal effect on community composition, reflected mostly in the high salt content soils of the Brownworth samples taken from Lower Wright Valley. Geographically, we saw a reasonable effect of longitudinal distance which was inversely correlated compared to the distance from coastline. Latitudinal differences across these valleys did not play a strong role in shaping community structure. Distance-based RDA ordination mapping of environmental drivers corroborated these results and showed highly similar patterns in community structure (see Supplementary Figure S5). A detrended correspondence analysis showed largely congruent clustering of communities largely driven by a higher degree of heterogeneity of the potential refugia sites (Supplementary Figure S6).  Figure 9A plots the first principal coordinate of the Unweighted UniFrac diversity metric against each site′s latitude. Unweighted UniFrac distances were selected as the most informative beta diversity measurement for this figure as it considers the phylogenetic dissimilarity between different communities, thus including a signal of evolutionary history or dispersal limitation between the valleys and over the glaciation history of the sites. This further illustrates those sites, especially those classified as putative refugia, showing remarkably similar community composition over a large latitudinal range. Conversely, sites at higher elevations also had highly similar community composition whereas the lower elevation disturbed sites display a larger variation on the first principal coordinate axis ( Figure 9B).  Figure 9A plots the first principal coordinate of the Unweighted UniFrac diversity metric against each site's latitude. Unweighted UniFrac distances were selected as the most informative beta diversity measurement for this figure as it considers the phylogenetic dissimilarity between different communities, thus including a signal of evolutionary history or dispersal limitation between the valleys and over the glaciation history of the sites. This further illustrates those sites, especially those classified as putative refugia, showing remarkably similar community composition over a large latitudinal range. Conversely, sites at higher elevations also had highly similar community composition whereas the lower elevation disturbed sites display a larger variation on the first principal coordinate axis ( Figure 9B).

Sample Composition
The analysis of the composition of microbes (ANCOM) comparing putative refugia sites to recently disturbed sites found eleven distinct ASVs that significantly differed in abundance between those two groups. Eight of these ASVs belonging to the Gemmatimonadaceae, Chitinophagaceae, Blastocatellaceae, Gaiellaceae, and Armatimonadales families, and the phylum Chloroflexi are found in significantly higher percentile abundances for putative refugia sites (W = 4717, 4649, 4699, 4452,4401, 4498, 4644, and 4411, respectively). Two ASVs, both from the order Pyrinomonadales, are associated with recently disturbed sites (W = 4722 and 4532). One archaeal ASV belonging to genus Candi-

Sample Composition
The analysis of the composition of microbes (ANCOM) comparing putative refugia sites to recently disturbed sites found eleven distinct ASVs that significantly differed in abundance between those two groups. Eight of these ASVs belonging to the Gemmatimonadaceae, Chitinophagaceae, Blastocatellaceae, Gaiellaceae, and Armatimonadales families, and the phylum Chloroflexi are found in significantly higher percentile abundances for putative refugia sites (W = 4717, 4649, 4699, 4452, 4401, 4498, 4644, and 4411, respectively). Two ASVs, both from the order Pyrinomonadales, are associated with recently disturbed sites (W = 4722 and 4532). One archaeal ASV belonging to genus Candidatus Nitrocosmicus (W = 4784) also differs between putative refugia and recently disturbed sites. Inter-valley comparisons resolved 517 ASVs (11%) as being significantly associated with a specific valley system. The full table displaying individual taxa ANCOM scores is reported in the supplementary material (Supplementary Table S5).

Discussion
The MDV provide a unique ecosystem to study phylogeographic patterns left by geological cycles and thousands of years of evolution in an environment that has undergone remarkably little change [100]. At the same time, it is well documented that populations of Antarctic biota have been strongly influenced by events in their geological and climate history and carry evolutionary traces of extinction during glacial advance or persistence in non-glaciated refugia [12,13,101]. In this study the microbial composition of several high elevation putative refugia and more recently disturbed low elevation sites were characterized to determine signatures of divergent glacial legacies on microbial communities.
Geological work has determined that some high elevation sites in continental Antarctica have remained exposed through periods of glacial expansion while low elevation sites were inundated by local glaciers or paleolakes, assumptions supported by Antarctic glacial models, isotope dating, and microarthropod and moss population genetic evidence [43,46,48]. Here, we have identified a biological line of evidence for the presence of such refugia in the Dry Valleys by characterizing the communities of microbial life at putative refugia sites.

Alpha Diversity
While microbial community diversity was initially hypothesized to be higher in refugia, comparable to high population genetic diversity in refugia found among other terrestrial organisms [15,44], our findings present the opposite result. Shannon diversity indices were significantly lower in putative refugia sites than in more recently disturbed sites in this study, while differences between the valley basins were more pronounced than between disturbed sites and putative refugia. None of the individual comparisons showed higher diversity in the putative refugia site compared to a recently disturbed site in the same valley basin, except in Miers and Victoria Valley.
The results of our model selection to determine the effects of environmental factors on alpha diversity across the sample sites showed no clear predictive effects of site biogeochemistry or other environmental factors, such as elevation or distance to the coastline. While in all of the datasets, distance from the coast was found to be a better predictor of decreased biodiversity than elevation, these were non-significant drivers of alpha biodiversity. These patterns were most pronounced in the full dataset, but are likely caused by the location of putative refugia sites, which are all located further from the coastline and at higher overall elevation than the disturbed sites. The geochemical soil properties most predictive of biodiversity were not congruent between putative refugia and disturbed sites, although organic matter content patterns were influenced by zero-inflation, caused by the very low contents found in these Antarctic soils and the limits of the detection test. The nitrate and salt concentrations of soils sampled in this study did not present a clear distinction between longer exposed, putative refugia sites and disturbed sites. This may indicate that many of the high elevation putative refugia investigated in this study are in fact experiencing periodic wetting at a similar frequency to the lower elevation disturbed sites. However, an overall pattern of decreasing alpha diversity with increasing nitrate and salt contents was observed in this study, consistent with the findings of Dragone et al., [35].
Contrary to the widely accepted influence of soil moisture, water limitation did not seem to be a predictive factor in the diversity between the sites sampled in this study, while across the MDV ecosystem this is often suggested as the main constraint on microbial diversity [28,102]. Instead, organic matter content was the best predictor when modeling the lower elevation disturbed sites which are likely to have fewer constraints in water input due to periodic wetting by snow and glacier runoff, ephemeral streams [6], and a closer water table, whereas water availability may be more strongly limited in higher elevation sites. These results are largely congruent with literature showing that while environmental factors are key drivers of soil habitability, their effects are difficult to generalize over large spatial ranges [30,90].

Similarities in Community Composition
Displaying community structure in ordination space did not result in a clear clustering of all disturbed and putative refugia sites across the whole dataset. Instead, several different clustering patterns can be identified which cast doubt on the classification of some of the selected disturbed and putative refugia sites. Strikingly, the factors displaying the strongest effect on community composition are those directly associated with refugia identification: elevation, distance from coast, and the accumulation of nitrate and salt compounds. Patterns of community clustering were consistent regardless of beta diversity metric and ordination type, indicating that these results are robust and microbial community signature is different both in terms of abundance and composition.

Within Valley Comparisons
Across all beta diversity indices, Garwood Valley samples clustered strongly together and were highly similar, although the distances between these communities were reported as significant. The two sites within Garwood Valley also exhibited equally high levels of biodiversity and very similar environmental conditions. Evidence exists for the inundation of our lower elevation site in Garwood Valley [20]. While it is unclear whether higher Garwood Valley escaped inundation or another large disturbance in the relatively recent past, the community signature we describe is highly similar to the lower elevation site in the same valley basin and resembles other disturbed sites from this study.
Conversely, samples from the McKay Glacier region showed large community composition differences although the two sites are at very similar elevations, potentially driven by high organic matter content in only one of the sites. Other environmental variables such as phosphorus content and water availability also differed strongly between the two sites. Despite these differences, both sites exhibited relatively high biodiversity making it hard to draw specific conclusions about the glacial history of these sites, even though relatively recent glaciation has been proposed for this area [70].
In Miers Valley, Taylor Valley, Lower Wright Valley, and Alatna Valley the ordination plots showed high differentiation between the two sampled sites within each valley. These differences were less pronounced in the Unweighted UniFrac distance metric than in the other two beta diversity metrics. This can be partially explained by the large distance between the paired sample locations, such as in the case of Victoria Valley and Lower Wright Valley. However, this is not the case for other within-valley comparisons. Differentiation in Lower Wright Valley samples could be further explained due to large differences in elevation, distance from the coastline, and soil geochemistry, particularly the high concentrations of nitrates and salts found in the disturbed site. The two sites in Taylor Valley similarly have striking differences in local geochemistry and elevation. However, paired samples in Miers Valley exhibit highly similar environmental conditions, with the only difference being the hypothesized inundation of the lower elevation site through glacial lake formation during the LGM (20-10 kya) [20].

Putative Refugia Sites
In each of the plots, a tentative cluster can be found of at least five core putative refugia sites: Battleship Promontory in Alatna Valley, Beacon Valley, both Upper Wright Valley sites and Wall Valley in Victoria Valley. These sites are spread across a large latitudinal range and have well established exposure age estimates that indicate all these sites have remained undisturbed for at least several million years. In each beta diversity metric, the much lower elevation site in Lower Victoria Valley fell into this cluster, supporting the geological evidence that this site has likely been exposed for 120-300 kya, which is prior to the LGM but suggests a more recent disturbance than the core putative refugia sites [19,71,72]. Given that this relatively recently disturbed site clustered with putative refugia sites for which much older exposure age estimates exist, perhaps any exposure age beyond the LGM is a benchmark for a climax community. As evidence, both sites in Victoria Valley clustered closer to the putative refugia sites despite marked dissimilarity resulting from differences in elevation and soil geochemistry.
Clustering patterns for the lower Alatna Valley site were similar to putative refugia, although here, no specific estimates of soil exposure ages have been published. As such, this is a tentative biological estimate that exposure ages in lower Alatna Valley may be similar to those found at Battleship Promontory. The lower elevation site in Alatna Valley was also found to have some of the highest salt and nitrate concentrations in this study, further evidence that these soils have been undisturbed for a considerable amount of time.
The community composition in high elevation sites from lower Wright and Miers Valleys clustered together with the five core putative refugia sites in the unweighted UniFrac principal coordinate plot, but had varying results using other beta diversity metrics. Regarding putative refugia in Miers Valley, the ambiguity of this classification can be attributed partially to it being located at the lowest elevation out of all the putative refugia sites and at the greatest distance from the rest of the high elevation sites in this study. The Dais site in lower Wright Valley matches the geochemical and elevational conditions of the other putative refugia sites, and geological evidence show this site has been relatively undisturbed for around 4 million years [69]. However, biodiversity was the lowest out of any investigated site in this study, potentially explaining the poor clustering with other putative refugia using both abundance-weighted diversity metrics.
Surprisingly, both the low and high elevation sites from Taylor Valley showed tentative clustering with the larger putative refugia site cluster, especially in the two reported UniFrac ordination plots. While Taylor Valley has been the focus of much study regarding the microbial life in the MDV, specific glacial histories are not known for some of the higher plateaus around its lake basins. Part of the within valley differences are shown to be explained by the large variation in phosphorus availability and pH between the two sites, yet the similarity of both these sites to several of the putative refugia for which most geological evidence exists is puzzling. The lower elevation site in Taylor Valley was almost certainly recently inundated by glacial lakes [16], while the recent history of interactions between Mt. Falconer and the Commonwealth Glacier is less clear.

Similarities across the MDV
Geographic distance plays a strong role in community composition. Beta diversity comparisons among paired sites found that each valley system was significantly distinct from other paired valley sites. Likewise, ANCOMs found significant differences in the dominant taxa among the paired sites of each valley system. While katabatic winds can be a source of dispersal in this polar ecosystem, successful migration is still limited by distance, geographic barriers and local conditions [11,103]. Interestingly, several of the hypothesized refugia sites in this study displayed highly similar microbial communities while being geographically separated. The area from Beacon Valley to Alatna Valley covers the majority of the MDV latitudinal variation, a horizontal distance of more than 100 km with major geographic barriers. Yet, all of the putative refugia sites identified across this region show remarkably similar communities in the context of their geographic distance.
The high degree of similarity between the putative refugia sites in Alatna Valley, Beacon Valley, Miers Valley, Victoria Valley, and both Upper Wright Valley pairs appears not to be structured by commonalities in environmental variables. Unsurprisingly, high elevation and large distance from the coastline are strong predictors of microbial community structure, but even sites with disparate elevations such as the putative refugia in Miers Valley and the lower Victoria Valley site display a strong similarity to the other putative refugia sites at higher elevations.
The similarity in community composition of the paired putative refugia and disturbed site in Garwood Valley presents an argument that this site's community is not structured by the same legacy effects as the main cluster of putative refugia sites. The other main outlier in this comparison is formed by the low elevation site in Taylor Valley, which exhibits remarkable similarity to the cluster of refugia sites, and the identified putative refugia in Miers Valley which are not as similar to the other putative refugia sites as those within the observed cluster. It also provides increased evidence for the consideration of the lower Victoria Valley site as a putative refugium due to its high degree of similarity to other refugia over a large geographic range.

Taxonomic Differences between Putative Refugia and Disturbed Sites
While metabarcoding approaches have made the description of taxa in extreme environments more accessible [104], functional information for many of these species is dependent on notoriously difficult culture-based approaches [105] and is thus frequently absent from the literature. What literature exists for the dominant families associated with putative refugia revealed by ANCOM are known to have adaptations for low soil moisture, broad pH range, and high salt conditions. For example, bacteria of phylum Gemmatimonadetes are adapted to drier soils [106], the family Blastocatellacae is slightly acidophilic to neutrophilic mesophiles adapted to a broad range of temperatures and pH levels [107], and the phylum Chloroflexi is known to be photoautotrophic, a beneficial attribute given the low carbon input in the MDV [108]. Of the taxa found in higher abundances in recently disturbed sites, the ammonia-oxidizing archaea Candidatus Nitrocosmicus could indicate more productivity in these lower valley soils associated with the increased presence of functional nitrification taxa [109]. Eleven percent of all identified ASVs were associated with a specific valley system, indicating a high endemicity of each valley community created by geographic barriers and limited dispersal.

Conclusions
This study has characterized putative high elevation refugia, capable of supporting microbial diversity comparable to model habitats from low elevation valley sites. As the Antarctic continent is extremely harsh and inhospitable, these islands of suitable soil habitat at high elevations could be considered important sites for management and conservation. If these refugia have insulated microbes from climate changes in the past, they can potentially serve the same purpose in the face of climate shifts. Likewise, the fact that refugia are isolated may provide resistance against potential invasive species that could threaten more vulnerable soils-those at lower elevations and closer to the coast. Refugia may also serve to preserve highly adapted ancient genotypes whereas lower valley basin populations could undergo strong selection as a result of anthropogenic influence. The differentially abundant taxa found by an analysis of microbial composition further supports the hypothesis that these refugia harbor genetically distinct bacterial lineages.
We show that sites with different glaciation histories have a distinct microbial community composition underscoring the importance of glacial legacies in this system. Putative refugia have distinctive microbial communities when compared to recently disturbed regions. Given the slightly different signal for distinct community compositions between the two glaciation histories using the weighted or unweighted beta diversity measurements, differential ASV abundance may play a stronger role in distinguishing the community composition of putative refugia sites and recently disturbed sites. Community composition was most strongly influenced by elevation and location of the sampled area, but not soil geochemistry. While there may not be one distinct signature of microbial composition in Dry Valley refugia, we show that microbial community structure reflects the glacial history of the McMurdo Dry Valleys.