Untargeted Exometabolomics Provides a Powerful Approach to Investigate Biogeochemical Hotspots with Vegetation and Polygon Type in Arctic Tundra Soils

: Rising temperatures in the Arctic have led to the thawing of tundra soils, which is rapidly changing terrain, hydrology, and plant and microbial communities, causing hotspots of biogeochemical activity across the landscape. Despite this, little is known about how nutrient-rich low molecular weight dissolved organic matter (LMW DOM) varies within and across tundra ecosystems. Using a high-resolution nano-liquid chromatography-mass spectrometry (LC/MS) approach, we characterized the composition and availability of LMW DOM from high-centered polygons (HCP) and low-centered polygons (LCP) with Eriophorum angustifolium or Carex aquatilis as the dominant vegetation. Over 3000 unique features (i.e., discrete mass/charge ions) were detected; 521 were identiﬁed as differentially abundant between polygonal types and 217 were putatively annotated using high mass accuracy MS data. While polygon type was a strong predictor of LMW DOM composition and availability, vegetation and soil depth were also important drivers. Extensive evidence was found for enhanced microbial processing at the LCP sites, which were dominated by Carex plant species. We detected signiﬁcant differences between polygon types with varying aboveground landscape features or properties, and hotspots of biogeochemical activity, indicating LMW DOM, as quantiﬁed by untargeted exometabolomics, provides a window into the dynamic complex interactions between landscape topography, vegetation, and organic matter cycling in Arctic polygonal tundra soils. MS 2 high-mass accuracy measurements. This demonstrates the value of this information-rich signal and shows how it can be used diagnostically for both qualitative and quantitative inquiries.


Introduction
The Arctic is warming at least twice as fast as any other landscape on the planet and stores nearly half the Earth's terrestrial carbon (C) in soil organic matter (SOM) associated with permafrost soils [1][2][3]. Rising temperatures have accelerated permafrost degradation, resulting in physical, hydrological, and chemical shifts across the landscape that have led to previously frozen SOM suddenly becoming available for microbial decomposition [4][5][6].
Mobilizing even a fraction of this C-rich SOM via these physical and biochemical processes may accelerate the release of greenhouse gases (GHGs) (i.e., CO 2 , CH 4 , N 2 O) from the landscape, creating a significant positive feedback to climate change [7][8][9][10]. Reliably predicting where (hotspots) or when (hot moments) this C loss is most likely to occur across the Arctic landscape, however, depends on multiple interacting factors. These include landscape heterogeneity [11][12][13] and the associated shifts in hydrology (topography) [14,15], Low molecular weight (LMW) dissolved organic matter (DOM) is the water-soluble fraction of SOM most available to microbial decomposers and is a complex and dynamic mixture of small molecules (<1500 Da) of both biotic and abiotic origin (i.e., plant root exudates, products of microbial metabolism or turnover, photodegradation products) [30]. In laboratory incubations of Arctic soils, LMW DOM composition has been shown to be sensitive to variations in both temperature [31][32][33] and moisture [34]. Analogously, the structure and function of soil microbial communities have been shown to be strongly influenced by the molecular composition of this highly labile substrate pool [35][36][37]. Despite it representing an information-rich chemical fingerprint of biological function in soil and, thus, a potential indicator of C vulnerability across space that could help reduce uncertainty in process-based predictive models of C cycling [23,24], the spatial variability of LMW DOM composition across Arctic polygonal tundra landscapes is largely unknown.
Some of this uncertainty stems from analytical challenges in LMW DOM detection, isolation, and quantitation. While there is not a single analytical platform that can characterize all LMW DOM species in a given soil sample, liquid chromatography (LC) coupled with high-resolution mass spectrometry (HRMS) and a soft ionization technique such as electrospray ionization (ESI) is a powerful approach that allows for the detection and relative quantitation of thousands of molecular species in a single measurement [38][39][40][41][42]. Low molecular weight (LMW) dissolved organic matter (DOM) is the water-soluble fraction of SOM most available to microbial decomposers and is a complex and dynamic mixture of small molecules (<1500 Da) of both biotic and abiotic origin (i.e., plant root exudates, products of microbial metabolism or turnover, photodegradation products) [30]. In laboratory incubations of Arctic soils, LMW DOM composition has been shown to be sensitive to variations in both temperature [31][32][33] and moisture [34]. Analogously, the structure and function of soil microbial communities have been shown to be strongly influenced by the molecular composition of this highly labile substrate pool [35][36][37]. Despite it representing an information-rich chemical fingerprint of biological function in soil and, thus, a potential indicator of C vulnerability across space that could help reduce uncertainty in process-based predictive models of C cycling [23,24], the spatial variability of LMW DOM composition across Arctic polygonal tundra landscapes is largely unknown.
Some of this uncertainty stems from analytical challenges in LMW DOM detection, isolation, and quantitation. While there is not a single analytical platform that can characterize all LMW DOM species in a given soil sample, liquid chromatography (LC) coupled with high-resolution mass spectrometry (HRMS) and a soft ionization technique such as electrospray ionization (ESI) is a powerful approach that allows for the detection and relative quantitation of thousands of molecular species in a single measurement [38][39][40][41][42]. Here, we use two complementary LC phases, reversed-phased (RP) and hydrophilic interaction chromatography (HILIC), and both MS-polarities (positive-and negative-ion mode) at the nano-scale, as this was recently shown to be an effective approach to detect and resolve Soil Syst. 2021, 5, 10 3 of 19 an expanded range of LMW DOM in soil [43]. We applied this untargeted approach in soil organic horizons with contrasting aboveground vegetation community composition, obtained from the centers of a high-and low-centered polygon, to elucidate the molecular distribution and differential abundances of LMW DOM across space and yield new insights into the diversity of organic substrates available to plant and microbial communities in these polygonal tundra systems.

Study Site and Sample Description
A detailed description of the study site, soil cores, and sample collection can be found in the Supporting Information. In brief, four soil cores were collected from a polygonal tundra landscape on the northern coastal plain of Alaska on the Barrow Environmental Observatory. A total of two cores were collected from the organic soil horizon to a depth of 15 cm from the center of an LCP and two from the center of an HCP. The aboveground vegetation at these locations varied by polygonal type [44]. In order to assess differences in LMW DOM potentially associated with aboveground vegetation, in each polygonal center, we took one soil core from an area dominated by sedge Carex aquatilis and one soil core from an area dominated by sedge Eriophorum angustifolium. The cores were shipped frozen to Oak Ridge National Laboratory (ORNL, Oak Ridge, TN, USA) where they were stored at −80 • C until processing. Using a band saw, each frozen core was cut into three 5-cm sections (top, middle, and bottom) to enable the evaluation of any within-horizon variation. The sections were thawed overnight at 4 • C and then homogenized by hand removing all inorganic, mineral, or live plant material [45]. Live roots were also removed from each section, dried, and weighed to evaluate any correlation between LMW DOM abundance and root biomass. A subsample from each of the core sections (n = 12) was taken for determination of water content, total carbon and nitrogen, and total organic carbon (Table S1) using conventional techniques, which are described in detail in the Supporting Information.

Soil Extraction and Sample Preparation
Biological replicates were obtained by extracting each core section in triplicate (n = 36, nine per core), along with three controls (extraction with no soil), using the procedure we previously optimized for these soils [43]. Briefly, a single aqueous extraction (LC/MSgrade H 2 O, pH = 5, 1:3 w/v, 1 h) was employed to maintain high throughput and extract compounds that would be found free in the soil solution and bioavailable to both plant and microbial communities [38,46]. A pooled quality-control (QC) sample, consisting of equal volumes of all 36 samples and an internal standard, 4-amino-6-methyl-8-(2 -deoxyβ-D-ribofuranosyl)-7(8H)-pteridone (6-MAP), was also prepared to monitor instrument performance and assist with normalization procedures in post-processing [47]. Extracts were centrifugal filtered (Amicon Ultra, 3 kDa, 4 • C, 15 min), concentrated (12×) using a SpeedVac Concentrator, and then separated into two aliquots. One aliquot was further evaporated to near-dryness and brought back up in 95:5 (v/v) acetonitrile:water, creating one organic and one aqueous aliquot per sample for analysis by HILIC and RP-LC, respectively. Extracts were stored at −80 • C until LC/MS analysis.

Instrumentation and LC/MS Data Collection
Samples and controls were thawed and prepared immediately prior to injection. Separations were performed using a Dionex UltiMate 3000 HPLC pump (ThermoFisher Scientific, Waltham, MA, USA), and 100 µm i.d. fused-silica (Polymicro Technologies, Phoenix, AZ, USA) columns laser-pulled and pressure-packed in-house to 20 cm with either Kinetex C18 material (5 µm, 100 Å, Phenomenex, Torrance, CA, USA) or zwitterionic, polymer-based ZIC-pHILIC material (5 µm, Sequant, bulk material provided by EMD Millipore (Burlington, MA, USA). Optimized mobile phase compositions (Table S2) and gradients (Table S3) can be found in the Supporting Information. Quality controls were Soil Syst. 2021, 5, 10 4 of 19 run every six injections and samples were randomized to reduce instrument-derived variation. Technical blanks representing the column re-equilibration conditions were also run regularly to monitor background ions and carry-over between samples.
To obtain high-mass accuracy measurements, columns were coupled to an LTQ-Orbitrap Velos Pro mass spectrometer (ThermoFisher Scientific) equipped with a nanoelectrospray ionization source (Proxeon, Syddanmark, Denmark) operated in either positiveor negative-ionization mode under direct control of the XCalibur software, v2.2 SP1.48 (ThermoFisher Scientific) resulting in four separate LC/MS analyses per sample (HILIC +/− and RP +/−, n = 144). Full precursor (MS 1 ) scans were acquired over a mass range of 50-1500 m/z in centroid mode at a resolving power of 30,000. Fragmentation spectra (MS 2 ) were also collected to provide an additional dimension for annotation by database matching or eliminating candidates for features that match to multiple hits. Additional details about chemicals, instrumentation settings, calibration, and standards, can be found in the Supporting Information.

Untargeted LC/MS Data Processing
Raw LC/MS files were processed using the freely available MZmine (v2.30) software [48,49]. Detailed descriptions of each of the modules and parameters used for peak detection (Table S4), chromatogram alignment, peak list generation, and annotation (Table S5) are in the Supporting Information. Briefly, chromatograms were built using high-resolution MS 1 and MS 2 data and retention time (RT) within a specified m/z and RT tolerance, resulting in an assigned peak area. All chromatograms within each LC/MS condition were aligned across the sample set (including blanks and controls) to assist with normalization and relative quantitation. Although a soft-ionization technique, electrospray ionization can create in-source fragments, adducts, or ion complexes that can complicate spectral analysis and annotation. Using the identification module in MZmine, each spectrum was searched for adducts, complexes, and fragments using specified RT and m/z thresholds. The proportion of each LC/MS dataset identified as either adducts, complexes, or fragments did not exceed~10% ( Figure S1).

Data Filtering, Normalization, and Statistical Analyses
To evaluate the ability of the LC/MS approach to detect quantitative variations in LMW DOM abundance across space, in addition to peak detection and alignment, it is also important to remove as much noise, background signal, and unwanted variation as possible. To accomplish this, multiple conservative LC/MS-based metabolomic data processing techniques were applied, including normalization procedures, data transformations to remove intragroup batch effects ( Figure S2), a blank/control correction, and reproducibility and abundance thresholds [40,41,50]. This resulted in a matrix of features-defined here as a unique RT, MS 1 m/z, and MS 2 fragmentation spectrum with a corresponding peak height (intensity) and a peak area. Any duplicate features or features that had zero peak area after normalization were also removed, resulting in a matrix of high-quality features (HQFs). The number and complexity of HQFs detected by each LC/MS condition were used to evaluate LMW DOM coverage, measurement depth, and the qualitative and quantitative reproducibility across samples by comparing the accurate mass of the corresponding [M+H] + or [M-H] − molecular ion and the peak area for each feature. Only the HQFs that were observed in at least three replicates per core were carried on to subsequent quantitative analyses. This step helps reduce the probability of false positives and creates a more conservative list of only the most reproducible and abundant HQFs to be compared between samples. Missing values were imputed for statistical analyses and various univariate and multivariate statistical analyses (e.g., Students t-test, ANOVA, PCA), and data visualization techniques (e.g., volcano plots, heat maps) were used to help identify clusters of features that were consistently and significantly different between samples for annotation and to examine the relative abundance differences between extraction replicates Soil Syst. 2021, 5, 10 5 of 19 and polygon or vegetation types. Additional details about normalization procedures, selected statistical tests, and thresholds can be found in the Supporting Information.

Feature Annotation
Annotation of features that were consistently observed and significantly differentially abundant due to polygon type or vegetation was carried out in a three-step procedure described in detail in the Supporting Information. In brief, features were compared against a series of databases and assigned putative chemical formulas using high mass accuracy measurements, Kind and Fiehn's "Seven Golden Rules", and parameters modified from Kujawinski and Behn's compound identification algorithm (CIA) for small molecules [51][52][53]. Databases included KEGG [54], METLIN [55], MMCD [56], PubChem [57], HMDB [58], LipidMaps [59], or Plant Cyc [60]. Features that matched to multiple hits in a database or multiple formulas were manually scrutinized in an iterative approach by assessing high-resolution mass spectral data for consistent fragmentation profiles.

Results
Given that this was the first application of this untargeted exometabolomics technique across multiple Arctic sampling sites, we began with an evaluation of the analytical performance of the approach. Due to the large number of samples and data generated, we then applied a series of data mining techniques and multivariate statistical analyses to reduce the dimensionality of the data in order to discriminate ecologically relevant features and compositional variations due to polygon type or vegetation. Features that were significantly and reproducibly differentially abundant between sites were further investigated and putatively annotated using high-mass accuracy MS data and database searching.

Evaluation of Analytical Performance Across Multiple Sites
Given that a detailed analysis of the analytical performance of this untargeted LC/MS approach in Arctic soils was conducted previously [43], only a few primary figures of merit-measurement depth, reproducibility, and LMW DOM coverage-were examined here. All data processing, filtering steps, and statistical analyses were conducted separately for each LC/MS condition (HILIC +/−, RP +/−) to eliminate any confounding effects such as different ionization efficiencies or noise levels for example. Across the four conditions, 13,673 molecular species (RT, MS 1 , and MS 2 ) were detected, aligned, and exported for data filtering and analysis (Table 1). Table 1. LMW DOM coverage by HILIC and RP in positive-and negative-ion mode at each level of data filtering, expressed as the number of features detected across all 36 soil water extracts from 4 cores obtained from 2 polygon types and 2 species of vegetation.  Figure S3), indicating that variations observed between samples were not experimentally derived but instead due to biogeochemical variation. However,~18% of the aligned peaks were observed in nearly all the runs including the blanks and controls ( Figure S4a), suggesting these were background signals from the sample preparation procedures or LC/MS analyses. After removing these, as well as any zeros or duplicate features, 11,258 HQFs remained for downstream analyses. When we plotted the frequency at which the HQFs were observed across the dataset, we noted a recurrent trend in the data where the number of features observed increased sharply at approximate intervals of nine ( Figure S4b), corresponding with the sample set size for each core (9 extracts). These results indicate that the data filtering protocol employed here effectively reduces the false discovery rate (FDR) and increases the proportion of LMW DOM analytes represented.

HILIC (+) HILIC (−) RP (+) RP (−)
These results also suggest that a common set of LMW DOM features exists within each core, and across all four cores, despite variations in aboveground vegetation or topography. Indeed, when we examined the overlap between the four cores for each LC/MS condition using the neutral mass for each [M+H] + or [M-H] − singly charged precursor ion within 0.005 Da, on average there was a 37% overlap in the features detected ( Figure S5). Contrastingly, on average, 15.5% of the features detected were found to be unique to each core indicating there was unique biogeochemical activity within each core as well, and that the optimized LC/MS approach employed here was sensitive enough to detect these subtle variations between sites.
To examine the variability of LMW DOM across each of the LC/MS conditions prior to data filtering and normalization and evaluate the reproducibility of the untargeted approach across biological replicates, we built a correlation matrix using the calculated Pearson coefficients for each extract ( Figure 2) and PCA plots using the unique identifiers and peak areas for each HQF ( Figure S7). While there was some variability among replicates, which was more noticeable in the RP datasets, in general, there was a fair amount of correlation across each of the nine samples within each core. Interestingly, while it may be expected that aboveground vegetation dictates belowground SOM composition, the cores from the same polygon type were more highly correlated to one another than the cores with the same aboveground vegetation ( Figure 2). The PCA components accounted for 58, 51, 54, and 61% of the variation across the HILIC (+), HILIC (−), RP (+), and RP (−) datasets, respectively, indicating both polygon type and vegetation have a major effect on the LMW DOM composition.

Impacts of Polygon Type and Vegetation on LMW DOM Abundance
To reduce the dimensionality of these data and identify features that were significantly differentially abundant between cores, we performed pairwise comparisons by t-test and fold change analysis between cores of the same polygon or vegetation type, followed by an ANOVA to determine features that were in higher relative-abundance uniquely due to polygon type or vegetation (p-value < 0.001, FC > 4). To visualize these differentially abundant features, we first used volcano plots to isolate the features that had the greatest FC and lowest p-value between conditions ( Figure 3). There were more features found in higher relative abundance in the Eriophorum cores versus the Carex cores and at the HCP sites versus the LCP sites.

Impacts of Polygon Type and Vegetation on LMW DOM Abundance
To reduce the dimensionality of these data and identify features that were significantly differentially abundant between cores, we performed pairwise comparisons by ttest and fold change analysis between cores of the same polygon or vegetation type, followed by an ANOVA to determine features that were in higher relative-abundance uniquely due to polygon type or vegetation (p-value < 0.001, FC > 4). To visualize these differentially abundant features, we first used volcano plots to isolate the features that had the greatest FC and lowest p-value between conditions ( Figure 3). There were more features found in higher relative abundance in the Eriophorum cores versus the Carex cores and at the HCP sites versus the LCP sites. To then evaluate the quantitative reproducibility of the differentially abundant features, an analysis of the coefficient of variance (CV%) for the peak areas across replicates revealed that 95% of these differentially abundant features showed acceptable reproducibility (CV < 10%, Figure S8) indicating the optimized data collection and processing techniques were robust and that the data filtering protocols were conservative, selecting for LMW DOM features that were consistently detected across replicates. It is important to note that some variability observed among replicates is not unexpected. Despite the subsamples of soil being relatively small (4 g), it has been well-established that LMW DOM composition and abundance can vary at even the micro-site or aggregate scale (10 s-100 s if µm) [61,62]. That the untargeted approach applied here can detect these subtle differences is an added benefit, as it demonstrates the sensitivity of the technique to detecting variation in the abundance of LMW DOM across space and capturing both the biotic and abiotic impacts on this pool. Soil Syst. 2021, 5, x 8 of 21 To then evaluate the quantitative reproducibility of the differentially abundant features, an analysis of the coefficient of variance (CV%) for the peak areas across replicates revealed that 95% of these differentially abundant features showed acceptable reproducibility (CV < 10%, Figure S8) indicating the optimized data collection and processing techniques were robust and that the data filtering protocols were conservative, selecting for LMW DOM features that were consistently detected across replicates. It is important to note that some variability observed among replicates is not unexpected. Despite the subsamples of soil being relatively small (4 g), it has been well-established that LMW DOM composition and abundance can vary at even the micro-site or aggregate scale (10 s-100 s if µ m) [61,62]. That the untargeted approach applied here can detect these subtle differences is an added benefit, as it demonstrates the sensitivity of the technique to detecting variation in the abundance of LMW DOM across space and capturing both the biotic and abiotic impacts on this pool.

Molecular Characterization of Differentially Abundant LMW DOM Features
We further investigated the relationship between polygon type or vegetation and LMW DOM abundance by directly contrasting the differentially abundant LMW DOM features using molecular data obtained from the high-resolution LC/MS measurements. Differentially abundant features ranged in molecular weight (~56-900 m/z) and polarity, exhibited by their elution across the full retention time window for each LC/MS condition ( Figure S9). The m/z distribution did not vary appreciably between cores (Figure S10) or

Molecular Characterization of Differentially Abundant LMW DOM Features
We further investigated the relationship between polygon type or vegetation and LMW DOM abundance by directly contrasting the differentially abundant LMW DOM features using molecular data obtained from the high-resolution LC/MS measurements. Differentially abundant features ranged in molecular weight (~56-900 m/z) and polarity, exhibited by their elution across the full retention time window for each LC/MS condition ( Figure S9). The m/z distribution did not vary appreciably between cores ( Figure S10) or between the operationally defined depth increments we employed ( Figure S11) as has been found in previous work [43]. These data support that the LC/MS conditions were not biased toward any particular class of compounds and that molecular weight alone is not adequate at differentiating LMW DOM abundance across space and that additional molecular information is required.
Of the 521 differentially abundant features, 217 (42%) were assigned molecular formulas while 304 (58%) did not meet the criteria for a confident assignment or were possible adducts, complexes, or fragments identified by the MZmine modules during annotation. As described above, we estimate approximately 10% may have been adducts; including sodium (Na + ) or chloride (Cl − ) adducts as these are commonly seen in the characterization of OM using positive-and negative-ESI, respectively [63,64]. Additionally, these soils have been shown to have high iron concentrations [34,[65][66][67], and since organo-iron complexes can be soluble in soil water, they may have been extracted here as part of the LMW DOM pool. Because organo-metal complexes generally dissociate upon ionization, however, they would not appear in the mass spectrum, or would appear as an ion ([M-Fe+H] + ) less the mass of iron (55.9349 m/z) requiring a manual search of the data to annotate these. Across the 217 assigned features, the average mass error was 0.65 ppm and the average molecular weight was 379.9353 m/z [68].
Elemental data from formula assignments were then used to assign a biomolecular compound class-lipids, proteins (amino acids and amino sugars), lignins, carbohydrates, unsaturated hydrocarbons, condensed aromatics (phenolics), tannins, and aliphatics-to each differentially abundant feature based on their H/C and O/C ratios ( Figure 4) in a van Krevelen plot [69,70]. There is a clear density of formulas in the low O/C and high H/C regions of the plot, indicating an abundance of aliphatic compounds-such as lipids, sugars, and amino acids-possibly derived from microbial biomass or root exudates and plant leachates. The high presence of formulas consistent with phenolics, lignins, and proteinaceous (i.e., peptides, amino sugars) material is indicative of microbially digested lignocellulosic biomass or freshly deposited plant material.
ble adducts, complexes, or fragments identified by the MZmine modules during annotation. As described above, we estimate approximately 10% may have been adducts; including sodium (Na + ) or chloride (Cl − ) adducts as these are commonly seen in the characterization of OM using positive-and negative-ESI, respectively [63,64]. Additionally, these soils have been shown to have high iron concentrations [34,[65][66][67], and since organo-iron complexes can be soluble in soil water, they may have been extracted here as part of the LMW DOM pool. Because organo-metal complexes generally dissociate upon ionization, however, they would not appear in the mass spectrum, or would appear as an ion ([M-Fe+H] + ) less the mass of iron (55.9349 m/z) requiring a manual search of the data to annotate these. Across the 217 assigned features, the average mass error was 0.65 ppm and the average molecular weight was 379.9353 m/z [68].
Elemental data from formula assignments were then used to assign a biomolecular compound class-lipids, proteins (amino acids and amino sugars), lignins, carbohydrates, unsaturated hydrocarbons, condensed aromatics (phenolics), tannins, and aliphatics-to each differentially abundant feature based on their H/C and O/C ratios ( Figure 4) in a van Krevelen plot [69,70]. There is a clear density of formulas in the low O/C and high H/C regions of the plot, indicating an abundance of aliphatic compounds-such as lipids, sugars, and amino acids-possibly derived from microbial biomass or root exudates and plant leachates. The high presence of formulas consistent with phenolics, lignins, and proteinaceous (i.e., peptides, amino sugars) material is indicative of microbially digested lignocellulosic biomass or freshly deposited plant material. New methods have recently been proposed to improve biomolecular assignment of molecules from ecological samples, for example, by including N and P as well [71]. Since N-containing compounds made up over 70% of the differentially abundant features detected at each polygon and are vulnerable to microbial degradation, here, we have also included a van Krevelen analysis between the two polygon types using the N/C ratio (Figure 5); though this technique may also be used with the S/C or P/C ratios to visualize the distribution of heteroatoms across LMW DOM features detected. Using this approach, the results show a separation between the N-containing features at the LCP and HCP sites. More features with a low N/C ratio (N/C < 0.2) and high H/C content (H/C > 1.5), which New methods have recently been proposed to improve biomolecular assignment of molecules from ecological samples, for example, by including N and P as well [71]. Since N-containing compounds made up over 70% of the differentially abundant features detected at each polygon and are vulnerable to microbial degradation, here, we have also included a van Krevelen analysis between the two polygon types using the N/C ratio ( Figure 5); though this technique may also be used with the S/C or P/C ratios to visualize the distribution of heteroatoms across LMW DOM features detected. Using this approach, the results show a separation between the N-containing features at the LCP and HCP sites. More features with a low N/C ratio (N/C < 0.2) and high H/C content (H/C > 1.5), which are consistent with lipid-like compounds, were found at the HCP site, consistent with our findings from above. Features in the region N/C < 0.1 and H/C < 1.5, indicating a high number of amino groups and the presence of phytochemicals (bioactive plant compounds) [71], were similar between the two polygon types. Features with higher N/C > 0.2, consistent with LMW DOM compounds having secondary or tertiary amines (i.e., alkaloids, cyclic amines), were more dominant at the LCP site.
are consistent with lipid-like compounds, were found at the HCP site, consistent with our findings from above. Features in the region N/C < 0.1 and H/C < 1.5, indicating a high number of amino groups and the presence of phytochemicals (bioactive plant compounds) [71], were similar between the two polygon types. Features with higher N/C > 0.2, consistent with LMW DOM compounds having secondary or tertiary amines (i.e., alkaloids, cyclic amines), were more dominant at the LCP site. For a more detailed view of the LMW DOM chemistry at these sites, the average molecular properties for the differentially abundant features that were assigned formulas have been reported (Table 2). Due to polygon type being a stronger predictor of LMW DOM abundance, features that were in higher relative abundance at either the HCP or LCP sites have been highlighted. A selection of LMW DOM features that had the highest fold change between sites have been further characterized in Table 3.  For a more detailed view of the LMW DOM chemistry at these sites, the average molecular properties for the differentially abundant features that were assigned formulas have been reported (Table 2). Due to polygon type being a stronger predictor of LMW DOM abundance, features that were in higher relative abundance at either the HCP or LCP sites have been highlighted. A selection of LMW DOM features that had the highest fold change between sites have been further characterized in Table 3.  In addition to the volcano plots, another more detailed way we examined the differentially abundant features was to distinguish clusters of features that vary similarly across cores using two-way hierarchical clustering with the normalized log2 peak areas and a unique identifier for each feature ( Figure 6). This allows for the visualization of LMW DOM features that were consistently and similarly varying across space indicating hotspots of biogeochemical activity. Consistent with analyses described above, the cores clustered into two main groupings corresponding to polygon type, LCP or HCP, samples 1-9 with 19-27 and samples 10-18 with 28-36, respectively. A total of four clusters have been highlighted to show the subtle, but consistent and significantly different variations between cores due to polygon type, vegetation, or in some cases, depth. For example, cluster 1 shows 76 features that are more abundant across all of the cores except for the Eriophorum core at the LCP site where those features were found in lower relative abundance. Cluster 2 shows 71 features that were depleted in both LCP cores but not the HCP cores. Cluster 3 indicates that 67 features were depleted in the Eriophorum core at the HCP site, and cluster 4 shows 44 features that were in higher relative abundance in the LCP cores, but that this varied with depth in the Carex core at that site.
In addition to the volcano plots, another more detailed way we examined the differentially abundant features was to distinguish clusters of features that vary similarly across cores using two-way hierarchical clustering with the normalized log 2 peak areas and a unique identifier for each feature ( Figure 6). This allows for the visualization of LMW DOM features that were consistently and similarly varying across space indicating hotspots of biogeochemical activity. Consistent with analyses described above, the cores clustered into two main groupings corresponding to polygon type, LCP or HCP, samples 1-9 with 19-27 and samples 10-18 with 28-36, respectively. A total of four clusters have been highlighted to show the subtle, but consistent and significantly different variations between cores due to polygon type, vegetation, or in some cases, depth. For example, cluster 1 shows 76 features that are more abundant across all of the cores except for the Eriophorum core at the LCP site where those features were found in lower relative abundance. Cluster 2 shows 71 features that were depleted in both LCP cores but not the HCP cores. Cluster 3 indicates that 67 features were depleted in the Eriophorum core at the HCP site, and cluster 4 shows 44 features that were in higher relative abundance in the LCP cores, but that this varied with depth in the Carex core at that site. Soil Syst. 2021, 5, x 12 of 21 In cluster 1, of the 75 differentially abundant features, 52 (69%) were assigned a chemical formula (average mass error = 0.406 ppm) based on high mass accuracy MS 1 measurements, and 4 others that were not assigned a chemical formula but did match to a database (<5 ppm), for a total of 56 (75%) features annotated in the cluster (Table S6). When a compound was annotated by both elemental formula assignment and database matching, most of the time the formulas matched. However, there were instances where different formulas were assigned to the same molecule, which occurred twice in cluster 1, indicated by the asterisks in Table S6. In these cases, we were able to use MS 2 fragmentation data to match to available data or eliminate incorrect assignments, highlighting the value of MS 2 data in providing information about both known (already in a database) and unknown compounds, and adducts or complexes. As an example, in the case of the [M-H] − ion detected at 192.0527 m/z, characteristic neutral losses of formamide (-CH3NO, 45.0214 Da) and multiple dehydrations (-H2O, 18.0098 Da) were observed (Figure S12), indicating a structure consistent with glucuronamide, a monosaccharide derivative of beta-D-glucuronic acid, a common microbial metabolite involved in ascorbic acid synthesis [72].
It is important to note that although no formula or database match was made for the compounds at the bottom of Table S6, each of those features were reproducibly (n > 3) and reliably (S/N > 3) detected, and were robustly and conservatively determined to be significantly (p < 0.001) differentially abundant between samples. In addition, each feature has a reproducible RT and peak area, and both MS 1 and MS 2 high-mass accuracy measurements. This demonstrates the value of this information-rich signal and shows how it can be used diagnostically for both qualitative and quantitative inquiries. In cluster 1, of the 75 differentially abundant features, 52 (69%) were assigned a chemical formula (average mass error = 0.406 ppm) based on high mass accuracy MS 1 measurements, and 4 others that were not assigned a chemical formula but did match to a database (<5 ppm), for a total of 56 (75%) features annotated in the cluster (Table S6). When a compound was annotated by both elemental formula assignment and database matching, most of the time the formulas matched. However, there were instances where different formulas were assigned to the same molecule, which occurred twice in cluster 1, indicated by the asterisks in Table S6. In these cases, we were able to use MS 2 fragmentation data to match to available data or eliminate incorrect assignments, highlighting the value of MS 2 data in providing information about both known (already in a database) and unknown compounds, and adducts or complexes. As an example, in the case of the [M-H] − ion detected at 192.0527 m/z, characteristic neutral losses of formamide (-CH 3 NO, 45.0214 Da) and multiple dehydrations (-H 2 O, 18.0098 Da) were observed (Figure S12), indicating a structure consistent with glucuronamide, a monosaccharide derivative of beta-D-glucuronic acid, a common microbial metabolite involved in ascorbic acid synthesis [72].
It is important to note that although no formula or database match was made for the compounds at the bottom of Table S6, each of those features were reproducibly (n > 3) and reliably (S/N > 3) detected, and were robustly and conservatively determined to be significantly (p < 0.001) differentially abundant between samples. In addition, each feature has a reproducible RT and peak area, and both MS 1 and MS 2 high-mass accuracy measurements. This demonstrates the value of this information-rich signal and shows how it can be used diagnostically for both qualitative and quantitative inquiries.

Discussion
While the total number of features varied some between soil cores, overall, HILIC (+) detected the greatest number of HQFs across the four cores with 3929 (34.9%) followed by RP (+) with 3618 (32.1%), HILIC (−) with 2170 (19.3%), and finally RP (−) with 1541 (13.7%) ( Table 1). This is likely due to the more favorable ionization conditions in positivemode, and more reproducible retention on the HILIC columns for the small, highly polar compounds that dominate LMW DOM [73]. Despite some differences in performance, the optimized LC/MS conditions were still highly orthogonal with just 94 (2%) HQFs observed by all four conditions ( Figure S6). These results confirm that the dual-LC, dualpolarity approach is effective at expanding coverage of the LMW DOM pool and is sensitive enough to capture both shared features as well as those unique to Arctic soils obtained from different sampling sites with varying aboveground characteristics.
Soil cores from the same polygon type were more highly correlated to one another than the cores collected in areas of each polygon type that were dominated by the same aboveground vegetation (Figure 2). This suggests polygon type (topography) is a stronger predictor of LMW DOM abundance than vegetation cover at this scale and that it may be a useful scaling parameter to connect biogeochemical measurements with landscape properties (i.e., thaw depth, hydrology) [18,26,29].
The lower abundance of LMW DOM at the LCP site ( Figure 3) may be due to increased transport (horizontal or vertical) of LMW DOM out of the organic horizon [74,75], likely due to the lower topography and more saturated conditions, or increased microbial processing. The higher relative abundance of LMW DOM features in the Eriophorum cores suggests either an accumulation or an increase in the rate at which LMW DOM is made available, possibly due to higher root biomass (Table S1), which has been shown to enhance substrate availability [76]. Alternatively, this could indicate a depletion of LMW DOM in the Carex cores, which could be due to increased microbial processing (biogeochemical hotspot), or plant uptake, of DOM at those sites. Plant uptake of DOM has been observed in Arctic vegetation as a way for plants to overcome nitrogen limitation [77][78][79]. Further analysis of the molecular details or additional studies that include isotopic labeling or gas flux measurements could be conducted to verify increased mineralization of LMW DOM at the LCP and Carex sites.
As both Carex and Eriophorum are vascular plant species and decomposition is generally slowed in Arctic systems, an accumulation of lignified LMW DOM across the cores was expected (Figure 4). Most of the differentially abundant features between cores were more abundant in the Eriophorum cores and were dominated by formulas consistent with low O/C and high H/C (i.e., aliphatic) content. This may have been due to the higher root biomass present in those samples (Table S1) or, more likely, necromass found beneath the Eriophorum cores [80,81].
In contrasting the molecular details of the two polygon types, there were readily observable differences reflected in the LMW DOM pool. Consistent with our hypothesis above from the van Krevelen analysis, there are multiple lines of evidence to support a hotspot of increased microbial processing and C cycling at the LCP site. First, both the average m/z and double bond equivalents (DBE) were lower in the LCP cores, characteristic of SOM that has undergone microbial decomposition (Table 2). Second, the proportion of features characteristic of compounds with higher biodegradability-lipids, carbohydrates, aliphatics-were also lower in the LCP cores (Figure 4), suggesting they may have been preferentially degraded and released as GHGs [31,32,82]. Third, there was a higher relative abundance of tannins and other condensed aromatics at the LCP site, as shown in Figure 4 and by the higher aromaticity index (AI) and 25.3% tannin content as shown in Table 2, suggesting an accumulation of these more recalcitrant features. Finally, although LCP centers are generally more anaerobic due to saturated conditions, the average oxygen content (demonstrated by the average molecular formula, O/C, and O/S ratios) of the differentially abundant LMW DOM features at the LCP was higher than the HCP (Table 2), further supporting enhanced microbial processing of OM at the LCP site.
Additionally of note, although there were a similar number of chemical formulas detected in higher relative abundance at each polygon, the features at the HCP site were more chemically diverse as indicated by a more equitable distribution among the assigned compound classes (Table 2). One explanation for this is that although the aboveground vegetation in each core represented primarily a single species, the HCPs generally have higher plant diversity [44]. This has been associated with more diverse plant inputs into the soil and increased microbial diversity, in turn leading to a more diverse substrate pool [83].
Interestingly, of the assigned formulas at the HCP and LCP sites, 88% and 72%, respectively, contained N, suggesting root exudation of organic N and/or microbial processing of SOM may be an important process occurring at these sites, especially in the HCP cores. This could be a result of the priming effect. Because HCP polygons are drier [84], plant and microbial activity may be more limited. As such, vegetation may allocate more N belowground to try and stimulate microbial processing of organic matter to release nutrients for uptake [85,86]. Somewhat surprisingly,~11% of the differentially abundant formulas across all four cores contained both sulfate and nitrate groups (O > 6), which are characteristic of secondary organic aerosols (SOAs) [87]. Secondary organic aerosols are formed in the atmosphere through a complex interaction of sunlight and volatile organic compounds that originate from industrial emissions, cars, burning biomass, or even vegetation [88]. They have been shown to be an important input of organic C to alpine systems where they influence a range of biogeochemical processes [89]. However, while they have been observed near Utqiaġvik before, it has generally been along the coastline or in the marine environment closest to anthropogenic activities [90]. To the best of our knowledge, this is the first evidence of water-soluble SOAs in polygonal tundra soils on the BEO. These results suggest that some portion of LMW DOM that is available for microbial processing or plant uptake is derived from volatile organic carbon precursors.
Among the LMW DOM features annotated in Cluster 1 by database matching, there were amino acids, plant hormones, microbial metabolites, lignin-like molecules, and DNA/RNA fragments/derivatives (Table S6). These data support that this approach can detect key compounds involved in biogeochemical cycling. For example, a urea derivative (N-hydroxymethyl urea, [M-H] − detected at 89.0358 m/z), was found to be in higher relative abundance in every core except the Eriophorum core at the HCP site. As a key metabolite in N cycling (i.e., ornithine cycle), urea is produced/excreted when there is an accumulation of highly toxic ammonia. An accumulation of extracellular urea in these soils may suggest increased inorganic N availability. This example demonstrates the utility of this untargeted approach for targeting additional compounds of interest; here, those involved in the urea cycle (i.e., glutamate, glutamine, arginine, citrulline) and in elucidating ecologically relevant molecular information to be used in mechanistic modeling.
In conclusion, this study implemented a dual-LC, dual-polarity nano-LC/MS approach to examine the variation in LMW DOM relative abundance in soil cores with two contrasting aboveground vegetation profiles and polygon types. Taken together, these results reveal a complex picture of C and N cycling at these sites, yielding insights into the chemical processing and relative degradability of the LMW DOM features found across the Arctic landscape. These results support that a broad range of compounds with varying physicochemical properties and concentrations were detected by the optimized approach and that the untargeted platform is sensitive, robust, and reproducible even when applied across multiple cores from different sites across the landscape. We provide evidence that LMW DOM is a diverse and reactive pool, and while there were a common set of metabolites among the cores, there were significant differences observed between sites as well indicating LMW DOM may be an important indicator of biogeochemical variation across the landscape. In addition, the untargeted LC/MS approach was sensitive to variation at multiple scales. While polygon type was a strong predicter of LMW DOM composition and relative abundance, vegetation also had an impact, indicating LMW DOM provides a window into the dynamic and complex interactions between landscape topography, vegetation, and SOM cycling. Furthermore, this study revealed evidence of enhanced microbial processing at the LCP and Carex sites demonstrating its ability to detect hotspots of biogeochemical activity across space. Of the 521 differentially abundant features detected, 217 were putatively annotated by formula assignment, database matching, and evaluating the fragmentation data. For some compounds, this is the first time they have been reported in Arctic soils, including the 11% of detected formulas consistent with secondary organic aerosols, although additional studies are needed to understand the relative importance of this process in these systems. With an average mass error of <1 ppm, these high-mass accuracy measurements combined with reproducible retention times and peak areas provide an information-rich chemical profile of LMW DOM features in polygonal tundra soils. Correlating these qualitative and quantitative variations with additional landscape-scale features (i.e., hydrology, gas fluxes) would yield additional insight into how this chemical signal may be used to predict various processes impacting C cycling in the Arctic.
Supplementary Materials: The following are available online at https://www.mdpi.com/2571-8 789/5/1/10/s1: Table S1: Polygon soil core sample summary and corresponding TOC, TN, TC, C:N, and dry root weight results. Table S2: Optimized mobile phase conditions and additives for each LC phase and MS polarity, injection volume, and flow rates. Table S3: Optimized gradient conditions for nano-LC separations, for positive-and negative-MS modes on C18-RP and ZIC-pHILIC columns. Table S4: MZmine parameters used for the peak detection modules applied in the analysis of the polygonal tundra soil organic horizons. Table S5: MZmine parameters used for the peak list generation modules applied in the analysis of the polygonal tundra soil organic horizons. Table S6: "Cluster 1", listing the differentially abundant LMW DOM features found in lower relative abundance in the Eriophorum-HCP core. Figure S1: Percent aligned peaks that were annotated my MZmine as an adduct, complex, or fragment of another peak within 10 ppm and +/− 0.1 min RT for each core grouped by LC/MS condition. C = Carex, E = Eriophorum, A = Site A or Low-Centered Polygon (LCP), and B = Site B or High-Centered Polygon (HCP). Figure S2: Box plots of raw log 2 abundance values for HILIC (−) dataset which had a systematic shift in quantitative values (a) due to experimental variation (i.e., instrumentation, ionization efficiency, extraction efficiency). Plot (b) shows how the normalization procedure removes this systematic error. Figure S3: PCA plots of raw log 2 abundance values from the blank, controls, and samples before normalization, imputation, and filtering procedures. Figure S4: (a) Histogram of the frequency of observations for each aligned peak (RT, MS 1 , and MS 2 ) across the entire dataset (all 4 cores), including blanks and controls (55 total runs), before any data quality filtering steps and (b) a histogram of the HQFs across the 36 samples, after filtering out zeros, duplicates, and signals that were observed in the blanks or controls. Figure Figure S7: PCAs of HQFs by LC/MS condition. Figure S8: Pie chart with the results from a coefficient of variance analysis for peak areas of the differentially abundant LMW DOM features. Figure S9: Distribution of molecular weight (m/z) and retention time (RT) for differentially abundant features, detected across the 36 extracts, separated by LC/MS condition. Figure S10: Distribution of molecular weight (m/z) against fold change for LMW DOM features that were differentially abundant due to polygon (E-HCP_E-LCP and C-HCP_C-LCP) or vegetation type (C-HCP_E-HCP and C-LCP_E-LCP). Figure S11: Distribution of m/z by core and depth; solid color = top, stripes = middle, dots = bottom, C = Carex aquatilis, E = Eriophorum angustifolium.