Using Community Science to Reveal the Global Chemogeography of River Metabolomes

River corridor metabolomes reflect organic matter (OM) processing that drives aquatic biogeochemical cycles. Recent work highlights the power of ultrahigh-resolution mass spectrometry for understanding metabolome composition and river corridor metabolism. However, there have been no studies on the global chemogeography of surface water and sediment metabolomes using ultrahigh-resolution techniques. Here, we describe a community science effort from the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) consortium to characterize global metabolomes in surface water and sediment that span multiple stream orders and biomes. We describe the distribution of key aspects of metabolomes including elemental groups, chemical classes, indices, and inferred biochemical transformations. We show that metabolomes significantly differ across surface water and sediment and that surface water metabolomes are more rich and variable. We also use inferred biochemical transformations to identify core metabolic processes shared among surface water and sediment. Finally, we observe significant spatial variation in sediment metabolites between rivers in the eastern and western portions of the contiguous United States. Our work not only provides a basis for understanding global patterns in river corridor biogeochemical cycles but also demonstrates that community science endeavors can enable global research projects that are unfeasible with traditional research models.


Introduction
Organic matter (OM) transformations in aquatic ecosystems are a critical source of uncertainty in global biogeochemical cycles [1][2][3][4]. More than half of OM inputs to freshwater ecosystems are metabolized before reaching the oceans [1,2,4], yet while several studies have focused on quantifying OM uptake and export rates [1,5,6], the processes driving river corridor OM transformations across spatial scales remain poorly understood.
River corridor OM pools contain an extensive variety of molecules that are both produced and metabolized by microorganisms, which are processes reflected in the composition of sediment and surface water metabolomes [2,7,8]. Metabolic transformations of OM in freshwater ecosystems have been traditionally estimated by a combination of laboratory incubations and in-stream tracer additions [9][10][11][12]. However, results from incubation experiments are challenging to scale beyond laboratory conditions [9,10], and in-stream tracer processing often does not reflect ambient biogeochemical processes, as the naturally occurring metabolome is more chemically diverse than the tracer added to the stream [11,12]. Several studies have shown that OM pool composition can influence microbial activity, highlighting complexities in the metabolic processes that determine OM transformations [13][14][15][16][17][18]. Consequently, determining mechanisms underlying river corridor metabolome composition at a large scale remains challenging.
Environmental metabolomics uses the identification of small molecules in an organism (metabolites) to characterize the interactions of organisms within their environment [19]. Over the past several years, this definition has been extended to encompass all metabolites present in complex environmental systems for which it is difficult to attribute specific metabolites to specific organisms [20][21][22][23][24]. Different metabolomic techniques have been implemented across fields to enhance our understanding of microbial communities [25,26], anthropogenic activities and pollution sources [27][28][29], and potential bioremediation strategies [30]. Recently, environmental metabolomics, enabled by ultrahigh-resolution mass spectrometry, has allowed us to reveal connections between OM character, reactivity, and biochemical transformations within and across river ecosystems [15,17,18,[31][32][33][34][35]. These advances have vastly improved our understanding of the mechanisms governing OM bioavailability and biochemical transformations at a global scale. For instance, previous studies have used ultrahigh-resolution metabolomics from river water across different climatic regions to find common compositional features that would inform global carbon dynamics [36] and to investigate environmental drivers affecting OM composition, bioavailability, and transport of OM [37]. In addition, recent studies show that OM thermodynamics influence aerobic respiration under carbon-limited scenarios [16], that biogeochemical hotspots are influenced by OM nitrogen content [17], and that hyporheic zone mixing induces OM metabolism via a priming effect [15]. These detailed metabolome characterizations have the potential to enable global-scale inferences about watershed features (e.g., vegetation, lithology, hydrology, microbiology, climate) that govern the reactivity and fate of OM across river corridors [35,38]. In turn, metabolomics can enhance our predictive capabilities of global river corridor biogeochemical cycles by helping to improve the representation of biochemical mechanisms in numerical models, such as reactive transport codes [39,40]. For example, an emerging substrate-explicit model uses thermodynamic theory to explicitly account for the chemical composition of all metabolites in OM pools to improve the predictive capacity of biogeochemical models [40].
Characterizing metabolomes across global spatiotemporal scales requires a way to collect multiple data types across diverse locations in such a way that they can be analyzed together. This goal can be facilitated by a framework that requires studies to Integrate biological, physical, and chemical processes across scales; Coordinate with consistent methods; be Open across the research lifecycle; and Network with global collaborators to reduce the burden on a single team (ICON) [41,42]. When ICON principles are applied, they allow for distributed sampling in ways that have historically been difficult to achieve.
The Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) is a global consortium of researchers based out of Pacific Northwest National Laboratory that uses an ICON-based approach to understand coupled hydrologic, biogeochemical, and microbial functions in river corridors [35]. ICON principles allow WHONDRS to collect open, globally distributed data through collaboration with the scientific community. The WHONDRS consortium designs sampling campaigns that target specific spatial and temporal scales, modifies its approach based on community input, and then sends free sampling kits to collaborators. All WHONDRS data are openly accessible through Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE-https://data.ess-dive.lbl.gov/) and the National Center for Biotechnology Information (NCBI), and the WHONDRS consortium ascribes to FAIR data principles (findable, accessible, interoperable, reusable) [43]. This approach enables WHONDRS to collect, analyze, and distribute ultrahigh-resolution metabolomic data to the global scientific community.
Here, we describe a community science effort conducted by the WHONDRS consortium during July-August 2019 that used Fourier-transform ion cyclotron resonance mass spectrometry (FTICR-MS) to characterize metabolomes in global surface water and sediment spanning a range of biomes (e.g., desert-like in the Columbia Plateau, subtropical in southern Florida, temperate forests in the Mid-Atlantic) and stream orders [44]. We describe key metabolome characteristics of surface water and sediment and also explore spatial variation of these characteristics within the United States. We focus on central aspects of metabolomes including assigned elemental groups, chemical classes, descriptor indices, and biochemical transformations. This paper provides a benchmark for studying integrated surface water and sediment river corridor metabolomes and highlights the need to engage a wider scientific community in order to expand the reach and impact of scientific advancements.

Surface Water Metabolome Is More Unsaturated, Aromatic, Oxidized, Rich, and Variable Than Sediment Metabolome
In order to assess patterns in global metabolome composition, we derived a number of descriptive metrics that summarize FTICR-MS metabolomic profiles. Specifically, we compared double-bond equivalents (DBE), modified aromaticity index (AI Mod ), nominal oxidation state of carbon (NOSC), inferred chemical classes (e.g., lignin-like, protein-like), and elemental groups (e.g., CHO, CHON, CHOSP) of surface water and sediment metabolomes. The double-bond equivalent metric (DBE) describes the degree of chemical unsaturation of bonds in a particular metabolite [45], AI Mod quantifies the degree of aromaticity (i.e., ring-like shape) of a metabolite [45,46], and NOSC indicates the energy required to oxidize different metabolomes [47]. High values of AI Mod can denote the existence of either aromatic (AI Mod > 0.5) or condensed aromatic structures (AI Mod ≥ 0.67), and high DBE indicates more saturated compounds. NOSC is inversely correlated with the Gibbs free energy of carbon oxidation. Higher NOSC corresponds to metabolites that are more oxidized and thermodynamically favorable [15][16][17][18]47,48]. Chemical class assignments for each metabolite were predicted using oxygen-to-carbon and hydrogen-to-carbon ratios (i.e., Van Krevelen classes [49]). Finally, we used the molecular formula assigned to each metabolite to describe the relative abundance of different heteroatom combinations associated with CHO groups (i.e., differences in -N, -S and/or -P). We then compared metrics across all metabolites found in any surface water sample vs. all metabolites found in any sediment sample. All analyses in Section 2.1 were conducted only on FTICR-MS peaks that were able to be assigned a molecular formula. Other metrics describing metabolome composition are reported in the SI (Table S1).
Surface water metabolomes were composed of comparatively more unsaturated and aromatic compounds with a higher nominal oxidation state than sediment. This was denoted by significantly higher AI Mod , DBE, and NOSC than sediment metabolomes (Figure 1, p-value < 0.001). In addition, we observed higher relative abundances of lignin-like, tannin-like, and condensed-hydrocarbon-like metabolites in surface water versus sediment (Figure 2, all p-values < 0.001). These classes of metabolites are characteristic of terrestrial OM [50], and their prevalence in surface water metabolomes indicates a larger contribution of terrestrial OM in surface water relative to sediment. This may also indicate greater contributions of microbially processed OM in sediment, as has been observed previously in comparisons between surface water and hyporheic zone porewater [15,51]. Density plots comparing the properties of all molecular formulas found in surface water and sediment samples. AI Mod (modified aromaticity index) is a measure of the potential ring-like structure in a given molecular formula. DBE (double-bond equivalents) is an approximation of potential unsaturation. NOSC (nominal oxidation state of carbon) represents the degree of oxidation/reduction of a given molecular formula. Significance values obtained via a two-sided Mann-Whitney U test to compare sample type distributions are denoted in the upper right corner of each panel.
The higher relative abundance of unsaturated and aromatic metabolites in surface water contrasts with previous studies that have observed that these compounds are more common in sediment porewater than in lake surface water or aquifer recharge water [52,53]. These studies inferred low physical, chemical, and/or biological transformation of sediment porewater associated OM. This deviation might be connected to the systems studied. For example, Pracht et al. [52] examined a system where sediment OM was protected due to physical and/or chemical constraints such as mineral sorption and hydrophobic encapsulation [52]. We studied rivers where the shallow benthic layer and the hyporheic zone are known to enhance biogeochemical reactions [54][55][56][57][58]. In turn, we hypothesize that very high rates of biological activity in riverbed sediment [59,60] could be responsible for lower AI Mod , DBE, and NOSC values of sediment metabolomes relative to surface water, in contrast to previous work in potentially less active lake and aquifer systems. Box plots comparing the relative abundances of metabolites belonging to specific elemental groups (a) and chemical classes (b) between sediment and surface water. As in Figure 1, these values were obtained from all metabolites assigned molecular formulas in sediment and surface water.
In addition, the relative abundances of lipid-like and protein-like metabolites were significantly higher in sediment than in surface water (p-value < 0.001). More lipid-like compounds in sediment could reflect higher microbial biomass [61] and further supports our inference that sediment metabolomes were influenced by microbial processes to a greater extent than surface water metabolomes. This highlights the key role played by riverbed sediment and associated hyporheic zones in river corridor biogeochemistry that likely influences global elemental cycles but is not captured in current Earth system models.
Conversely, elemental groups of metabolites were similar across surface water and sediment. The median abundance of each elemental group did not vary more than 5% between the two environments, except for CHO (~9%) and CHONSP (~0.2%, not statistically significant) groups. Bulk similarities in elemental groups, in contrast to chemical classes, could indicate that the presence or absence of heteroatoms alone is insufficient to distinguish metabolomes and that elemental stoichiometry of the entire metabolite (the basis for chemical class assignment) may be more important for distinguishing metabolomes. This is important because the elemental stoichiometry of metabolites can mechanistically connect OM thermodynamics to biogeochemical reactions and rates [40].
Metabolites found in surface water were distinct from and showed more among-sample variation than those in sediment. To evaluate compositional differences, we conducted a principal component analysis (PCA) and a beta-dispersion analysis ( Figure 3). For consistency with prior analyses in this section, we present a PCA on only peaks assigned a molecular formula in Figure 3. When performed on all peaks, regardless of formula assignment, PCA results were consistent with Figure 3 ( Figure S1, Table S2). The PCA, in conjunction with a PERMANOVA comparison, indicated that surface water and sediment metabolomes significantly diverged in composition (p-value < 0.001). Loadings for PC1 and PC2 are presented in Table S3. In general, the loadings suggest that many metabolites contributed to the separation between surface water and sediment metabolomes (PC1), while CHON-containing metabolites primarily drove variability in surface water metabolomes (PC2). The beta-dispersion analysis further indicated that surface water metabolomes were more dispersed in multivariate space than sediment metabolomes ( Figure 3; p-value < 0.001). Additionally, surface water metabolomes had higher richness (i.e., more peaks with assigned formulas detected on average) than sediment metabolomes ( Figure 1). These patterns indicate that metabolomes in surface water and sediment may be shaped by distinct processes that likely span differences in inputs, rates of microbial activity, and abiotic constraints. Differences between surface water and sediment metabolomes were significant per a Euclidean distance-based PERMANOVA (p-value < 0.001). The degree of among-sample variation was evaluated by quantifying beta-dispersion. Surface water had higher beta-dispersion per a two-sided Mann-Whitney U test (p-value < 0.001) (b).
We hypothesize that higher richness and greater among-sample variation in surface water metabolomes could reflect more heterogeneous environmental pressures. For example, we sampled across a broad range of latitudes and stream orders that likely led to among-site variation in light exposure (Table S4 [ 44]). This may, in turn, have led to variation in surface water temperatures and surface water metabolite photodegradation, thereby increasing metabolome variability and richness [62]. Hydrology could also contribute to metabolome richness in surface water as precipitation events and associated runoff transport large amounts of terrestrial OM into rivers [63]. For example, precipitation has been shown to increase aromatic OM and decrease more labile OM in surface water [64][65][66]. We hypothesize that more immediate connectivity between surface water and terrestrial systems, relative to connectivity between sediment and terrestrial systems, is at least partially responsible for greater variation and higher richness in surface water metabolomes.
In addition, lower sediment metabolome variation and richness could be due to comparatively higher rates of microbial activity in sediment that degrade polymeric OM into a limited set of less chemically complex metabolites. This would result in a reduction in the number of distinct metabolites present by collapsing a diverse pool of OM into microbial exudates. Sediment metabolomes may also be constrained by interactions with sediment mineral surfaces, especially considering that we studied only the water-extractable metabolome. This subset of the full sediment metabolome may inherently be composed of a restricted suite of metabolites [67], leading to lower among-sample variation and lower richness. We nonetheless hypothesize that among-site variation in mineralogy could contribute to some of the observed sediment metabolome variation. The WHONDRS consortium is currently generating mineralogy data to test this hypothesis.
Together, the observations presented in this section indicate that there are significant differences across global surface water and sediment metabolomes, where surface water metabolomes are more unsaturated, aromatic, oxidized, rich, and variable. These characteristics suggest that surface water metabolomes are more dynamic due to a variety of watershed and river corridor processes (discussed above), while sediment metabolomes may be more stable integrators of localized processes (e.g., mineral interactions and microbial processing of OM).

Nitrogen-, Sulfur-and Phosphorous-Containing Transformations Vary Across Surface Water and Sediment Metabolomes
We evaluated how potential reactions in metabolomes varied across the globe by inferring biochemical transformations as per Bailey et al. [61], Kaling et al. [68], Moritz et al. [69], Graham et al. [17,18], Garayburu-Caruso et al. [16], Danczak et al. [38], and Stegen et al. [15]. This method leverages the ultrahigh-resolution of FTICR-MS to compare mass differences between detected peaks to a database of common biochemical transformations. Identified biochemical transformations provide information regarding the frequency at which a specific molecule could have been gained or lost during metabolism. Resulting transformation counts can then be separated based upon their chemical properties to study the potential role of the molecule gained or lost in metabolome composition. Unlike the analyses described in the previous section, where formula assignments of metabolites are necessary, this method allows for the incorporation of all detected metabolites into downstream analyses.
We observed that biochemical transformations involving molecules containing nitrogen (N), sulfur (S), or phosphorous (P) exhibited divergent patterns between surface water and sediment metabolomes ( Figure 4). Specifically, surface water had a significantly higher relative abundance of N-containing transformations, while sediment metabolomes had more S-and P-containing transformations (p-value < 0.001 in all cases). This contrast between surface water and sediment may occur due to variation in nutrient requirements within the water column and sediment, as biochemical transformations have been inferred to reflect nutrient limitations in other systems. For example, Garayburu-Caruso et al. [16] found an increase in N-containing transformations under nutrient-limited conditions, but only when N-containing OM was introduced. More N-containing transformations in surface water as compared to sediment could therefore reflect microbial N mining in surface water through the preferential decomposition of N-containing OM [70,71]. In addition, a higher abundance of P-and S-containing biochemical transformations in sediment further suggest that microbial metabolism is limited by different factors between surface water and sediment environments, which are potentially associated with nutrient assimilation processes [72,73].
Interestingly, the higher abundance of N-containing transformations in surface water observed in this study is contrary to past work, where porewater had a higher relative abundance of N-containing transformations than surface water [15,38]. Given that these studies were collected from the Pacific Northwest region of the United States (e.g., eastern Washington and Oregon), this highlights the necessity to expand research beyond individual test systems or geographic regions and emphasizes the utility of global studies through efforts like WHONDRS.
In contrast, biochemical transformations that did not involve N-, S-, or P-containing molecules were not significantly different between surface water and sediment metabolomes. Similar patterns have been observed in other studies [38]. These results indicate the presence of ubiquitous biochemical transformations that occur in both surface water and sediment (Figure 4). Based on these results, we hypothesize that N-, S-, and P-containing transformations may have a stronger dependency than CHO-only transformations on nutrient status. That is, changes in nutrient availability across surface water and sediment environments may drive shifts in N-, S-, and/or P-containing transformations but not influence transformations that do not involve these nutrients. Additional data on variation in nutrient limitation and availability will be required to test this hypothesis.

Sediment Metabolomes Are More Spatially Variable Than Surface Water Metabolomes
Because most of our sampling locations were in the contiguous United States (CONUS), we used CONUS data to resolve potential spatial patterns in metabolomes ( Figures 5 and 6). In order to uncover site-by-site metabolomic variation, we calculated the mean value for each derived metric (e.g., AI Mod , NOSC, DBE) and calculated the relative abundance of elemental groups and chemical classes (e.g., CHO and lignin-like) for each sample.
Overall, differences between the mean properties of CONUS surface water and sediment metabolomes were generally consistent with differences between global surface water and sediment metabolomes reported in Figure 1. For instance, surface water metabolomes displayed higher AI Mod , DBE, and NOSC than sediment metabolomes ( Figure 5, p-value < 0.001 for all, Table S5). We also observed similar patterns in both the relative abundances of specific elemental groups (Figure 6, p-value < 0.001 for all, Table S5) and chemical classes (p-value < 0.001 for all, Table S5 and File S1). In order to expand our analyses, we investigated spatial patterns in individual metabolomic features across the CONUS by comparing sites that were east (hereafter "East", surface water n = 34, sediment n = 33) vs. west (hereafter "West", surface water n = 45, sediment n = 38) of the Mississippi River.  In general, we observed greater spatial patterning in sediment metabolomes than in surface water metabolomes. Average NOSC and AI Mod values of sediment metabolomes were higher in the East than in the West (p-value < 0.001 for both). Metabolites containing CHONP, CHONS, or CHOSP constituted a significantly lower relative abundance of metabolomes in the East sediment, relative to the West (p-value < 0.001 for all). Lastly, lignin-and tannin-like metabolites constituted a higher relative abundance of sediment metabolomes in the East, while protein-, condensed-hydrocarbon-, and unsaturated-hydrocarbon-like metabolites were more abundant in the West (p-value < 0.001 for all, Table S5 and File S1).
In contrast to spatial patterns in sediment metabolomes, surface water showed less spatial structure with only CHO and CHOS metabolites showing significant shifts between East and West (p-value < 0.001 for both). CHO and CHOS metabolites comprised higher and lower relative abundances in the West than in the East, respectively ( Figure 6). No spatial patterns were observed in NOSC (p-value = 0.4, Table S5), DBE (p-value = 0.07, Table S5), or AI Mod (p-value = 0.93, Table S5) in surface water ( Figure 5). Results from the remainder of molecular indices, elemental groups, and chemical class comparisons are shown in Table S5 and File S1.
Together, these patterns suggest that spatial differences in the metabolic processes driving OM cycling in the East versus the West have a stronger influence on sediment metabolomes than surface water metabolomes. While our non-spatial analyses showed greater among-sample and among-site variability in surface water metabolomes (Figure 3), the lack of spatial structure across the CONUS suggests that this variability is not driven by factors that are spatially structured at the continental scale. Instead, we hypothesize that surface water metabolomes are more temporally variable due to fluctuating inputs from precipitation events. To more directly evaluate spatial structure in surface water metabolomes, it is likely necessary to control for precipitation history and hydrologic connectivity to terrestrial systems. These inferences are supported by previous studies addressing spatial dissolved OM chemography dynamics showing that longitudinal patterns of dissolved OM in surface water are sensitive to hydrologic events [74][75][76]. However, there is a complex interaction between hydrology and space, as the sources and quality of OM from different regions may respond differently to hydrological variation [77].
The contrasting patterns in surface water and sediment metabolome characteristics across the East-West gradient could be the result of many factors, including vegetation cover [78], underlying lithology [79], photoreactivity [62], climate and precipitation regime [79], and/or microbial metabolism [80]. Additional data and analyses will be required to disentangle the relative contributions of these potential drivers. Pursuing this knowledge is important for explaining and ultimately predicting OM transformations. In turn, representing these processes and their impacts on biogeochemical cycles in processed-based models has the potential to improve the accuracy of biogeochemical predictions across the globe.

WHONDRS Summer 2019 Sampling Campaign
In July and August 2019, the WHONDRS consortium initiated a study of global river corridors to evaluate interactions between ecosystem features, microbial communities, and metabolomes in surface water and shallow sediments. To design the study, the WHONDRS consortium held multiple webinars with collaborators who volunteered to collect samples. The webinars allowed for community input on sampling protocol and data collected. More details are available at https://whondrs.pnnl.gov.
Briefly, WHONDRS developed sampling protocols and videos in coordination with the scientific community that were made openly available via YouTube, sent free sampling kits to collaborators, and conducted a suite of biogeochemical analyses on surface water and sediment. All data will be made open access following QA/QC at https://data.ess-dive.lbl.gov/. Preliminary data are available on a Google Drive linked via https://whondrs.pnnl.gov as they become available.
The 2019 study collected samples and metadata associated with stream order, climate, vegetation, and geomorphological features from 97 river corridors in 8 countries within a 6-week period, from 29 July to 19 September (Table S4, [44]). Stream order information (Table S4) was acquired for sites within the continental United States through the EPA National National Hydrography Dataset Plus (https: //www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus), and stream orders for a couple of Canada sites were acquired through British Columbia Data Catalogue (https://catalogue.data. gov.bc.ca/dataset/75299593-3222-40f9-879f-29e9824fc978). Stream orders indicate the relative size of a stream [81]. The data provided in the SI were calculated following Strahler's definition of stream order [82]. This is estimated based on the size of its tributaries; for example, if two 1st order streams come together, they will form a 2nd order stream. Lower stream orders tend to be small tributaries or headwaters, while large stream orders are often major rivers [81,82].
This paper focuses on surface water (95 sites) and sediment (78 sites) collected across biomes (i.e., desert, tropical, temperate forests), from which a total of 504 samples were analyzed. Toyoda et al. [44] provide additional metadata associated with specific site characteristics (e.g., hydrogeomorphology, vegetation, temperature, discharge).

Sample Collection and Laboratory Pre-Processing
At each location, collaborators selected sampling sites within 100 m of a station that measured river discharge, height, or pressure. Within each site, 3 depositional zones were identified for sediment collection following NEON's protocol (NEON.DOC.001193; [83]) and labeled as upstream, midstream, or downstream. The depositional zones were situated within 10 m of each other. Surface water was sampled in triplicate prior to sediment sampling. Surface water was collected only at the downstream site before collecting the sediments to make sure the water collected was not affected by sediment debris mobilized during water or sediment sampling at upstream locations. Sediments were collected from all three zones, where each zone provided a biological replicate of a sediment sample.
Surface water was collected using a 60 mL syringe and was filtered through a 0.22 µm sterivex filter (EMD Millipore) into a 40 mL glass vial (I-Chem amber VOA glass vials; ThermoFisher, pre-acidified with 10 µL of 85% phosphoric acid). Subsequently, 125 mL of surface sediments (1-3 cm depth) were sampled from a~1 m 2 area at each depositional zone with a stainless steel scoop, making sure the sediments were saturated upon collection. All samples were shipped to Pacific Northwest National Laboratory on blue ice within 24 h of collection.
Surface water samples were immediately frozen at −20 • C upon receiving. Sediments from each depositional zone were individually sieved to <2 mm, subsampled into proteomic friendly tubes (Genesee Scientific), and stored at −20 • C for FTICR-MS analysis.

Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FTICR-MS)
Surface water samples were thawed in the dark at 4 • C for 72 h. Non-purgeable organic carbon (NPOC) was determined using a 5 mL aliquot of the acidified water sample by a Shimadzu combustion carbon analyzer TOC-L CSH/CSN E100V with ASI-L autosampler. NPOC concentrations (Table S6) were normalized to 1.5 mg C L −1 across all samples to allow for data comparison across sites within this study and other WHONDRS sampling campaigns. Diluted samples were acidified to pH 2 with 85% phosphoric acid and extracted with PPL cartridges (Bond Elut), following Dittmar et al. [84].
Sediment samples were thawed overnight in the dark at 4 • C. Then, sediment organic matter was extracted in proteomic friendly tubes (Genesee Scientific) with a 1:2 ratio of sediment to water (5 g of sediment to 10 mL of milli-Q water). During the extraction, tubes were continuously shaken in the dark at 375 rpm and 21 • C for 2 h, after which the tubes were centrifuged at 6000 rcf and 21 • C for 5 min. The supernatant was collected and filtered through 0.22 µm polyethersulfone membrane filter (Millipore Sterivex, USA) into borosilicate glass vials. NPOC (Shimadzu combustion carbon analyzer TOC-Vcsh with ASI-V autosampler) was determined using a 5 mL aliquot from the filtered supernatant. As with the water samples, this supernatant was normalized to a standard NPOC concentration (Table S4) of 1.5 mg C L −1 , acidified to pH 2 with 85% phosphoric acid, and extracted with PPL cartridges following the same methods described above.
A 12 Tesla (12 T) Bruker SolariX Fourier transform ion cyclotron mass spectrometer (FTICR-MS; Bruker, SolariX, Billerica, MA, USA) located at the Environmental Molecular Sciences Laboratory in Richland, WA, was used to collect ultrahigh-resolution mass spectra of surface water and sediment OM pools. Resolution was 220 K at 481.185 m/z. The FTICR-MS was outfitted with a standard electrospray ionization (ESI) source, and data were acquired in negative mode with the voltage set to +4.2 kV. The instrument was externally calibrated weekly to a mass accuracy of <0.1 ppm; in addition, the instrument settings were optimized by tuning on a Suwannee River Fulvic Acid (SRFA) standard. Data were collected with an ion accumulation of 0.05 s for surface water and 0.1 or 0.2 s for sediment from 100-900 m/z at 4 M. One hundred forty-four scans were co-added for each sample and internally calibrated using an OM homologous series separated by 14 Da (-CH2 groups). The mass measurement accuracy was typically within 1 ppm for singly charged ions across a broad m/z range (100 m/z-900 m/z). BrukerDaltonik Data Analysis (version 4.2) was used to convert raw spectra to a list of m/z values by applying the FTMS peak picker module with a signal-to-noise ratio (S/N) threshold set to 7 and absolute intensity threshold to the default value of 100. We aligned peaks (0.5 ppm threshold) and assigned chemical formulas using Formularity [85]. The Compound Identification Algorithm in Formularity was used with the following criteria: S/N > 7 and mass measurement error <0.5 ppm. This algorithm takes into consideration the presence of C, H, O, N, S, and P and excludes other elements.
It is important to note that FTICR-MS is not quantitative and does not provide information about the structure of the molecular formulas identified. This method provides a non-targeted approach to reliably identify molecular formulas of organic metabolites with masses between 200-900 m/z. The power of FTICR-MS is that it can capture thousands of metabolites simultaneously in contrast to other global environmental metabolomics techniques that yield less information. A key consideration with FTICR-MS-derived information is that it captures all ionizable organic molecules and thus is source-agnostic (e.g., not all detected compounds are guaranteed to be biologically derived). Hence there is a tradeoff of depth vs. specificity in metabolomics methods, and FTICR-MS sacrifices some specificity for depth. In addition, the sediment-water extractions performed in this study provide chemical selectivity towards water-extractable OM. Although water-soluble OM in the sediments is the primary interest of this study, the extraction method has the potential to bias towards the most labile pool of the sediment OM and can also extract a higher abundance of carbohydrates when compared to other solvents [67].

FTICR-MS Data Analysis
All FTICR-MS analyses were performed using R v4.0.0 [86], and all plots were generated using the ggplot2 package (v3.2.2) [87]. The R package "ftmsRanalysis" [88] was used to (1) remove peaks outside of a high confidence m/z range (200 m/z-900 m/z) and/or with a 13C isotopic signature; (2) calculate molecular formula properties (i.e., Kendrick defect, double-bond equivalent, modified aromaticity index, nominal oxidation state of carbon, standard Gibbs Free Energy of carbon oxidation); and (3) to determine to which chemical class a given metabolite belonged [45,47,49]. Using "ftmsRanalysis" [74] data outputs, we can obtain the central aspects of metabolomes investigated in this study, where elemental groups are categorized by the combination of elemental atoms present in each metabolite with molecular formula identified (e.g., CHO, CHON, CHOSP). The double-bond equivalent metric (DBE) describes the degree of chemical unsaturation of bonds in a particular metabolite [45], the modified aromaticity index (AI Mod ) quantifies the degree of aromaticity (i.e., ring-like shape) of a metabolite [46], and NOSC indicates the energy required to oxidize different metabolomes [47]. High values of AI Mod can denote the existence of either aromatic (AI Mod > 0.5) or condensed aromatic structures (AI Mod ≥ 0.67), and high DBE indicates more saturated compounds. NOSC is inversely correlated with the Gibbs free energy of carbon oxidation. Higher NOSC corresponds to metabolites that are more oxidized and thermodynamically favorable [15][16][17][18]47,48]. Chemical class assignments for each metabolite were predicted using oxygen-to-carbon and hydrogen-to-carbon ratios (i.e., Van Krevelen classes [49]).
In order to evaluate bulk variation across sample types, a Mann-Whitney U test (wilcox.test) with a false discovery rate (FDR) p-value adjustment (p.adjust) was used to evaluate the divergence in molecular properties of all metabolites with molecular formulas assigned (46.08% of the total 95,681 peaks) present in either surface water or sediment. Differences in elemental group and chemical class relative abundances within samples between surface water and sediment were evaluated using the same approach. A principal component analysis (PCA; prcomp) was used to visualize differences between surface water and sediment metabolomes after a presence/absence transformation. A Euclidean distance matrix was obtained (vegdist, vegan package v2.5-6) and evaluated using a PERMANOVA (adonis, vegan package v2.5-6) in order to assess multivariate differences between sample types [89]. Inter-sample type variability was evaluated using the same Euclidean distance matrix in a beta-dispersion analysis (betadisper, vegan package v2.5-6); divergence in distance to centroid values was then evaluated using a Mann-Whitney U test [89].
To determine CONUS-scale patterns, sites were divided into eastern and western US based on their position relative to the location of the Mississippi River at St. Louis, Missouri. Replicates at each site were merged such that if a metabolite was observed in one replicate, it was considered present at the site. Given that FTICR-MS samples typically have less than 100% reproducibility [90,91], we considered a metabolite to be present in a sample if it was detected in any of the three replicates. This allowed us to maximize our detection of metabolites and has been previously employed [38]. Average molecular properties were then calculated, and elemental group/chemical class relative abundances were determined for each site/sample type combination based upon the metabolites present. This resulted in a single value for any given variable in surface water or sediment at a given site. Differences between these values across the East vs. West CONUS were then assessed using a Mann-Whitney U test with FDR correction. Maps were generated using ggplot to visualize spatial variance. All maps can be found in File S1.

Biochemical Transformation Analysis
We inferred biochemical transformations in sediment and surface water metabolomes as per Bailey et al. [61], Kaling et al. [68], Moritz et al. [69], Graham et al. [17,18], Garayburu-Caruso et al. [16], Danczak et al. [38], and Stegen et al. [15] to estimate the gain or loss of specific molecules (e.g., glucose, valine, glutamine). Briefly, pairwise mass differences were calculated between every peak in a sample and compared to a reference list of 1255 masses associated with commonly observed biochemical transformations (i.e., reactions of organic matter, Table S7). It is important to note that a molecular formula assignment is not necessary for this method as it allows for the incorporation of all detected metabolites. For mass differences matching to compounds in the reference list (within 1 ppm), we inferred the gain or loss of that compound via a biochemical transformation. For example, if a mass difference between two peaks corresponded to 71.0371, that would correlate to the loss or gain of the amino acid alanine, while a mass difference of 79.9662 would correspond to a loss or gain of a phosphate. Transformations were separated into 4 different groups based upon their labels: CHO-only, N-containing, S-containing, and P-containing. Differences in the relative abundance of transformations across samples were identified using a Mann-Whitney U test with FDR correction.

Conclusions
We leveraged community science facilitated by the WHONDRS consortium to present the first ultrahigh-resolution analysis of global river corridor metabolomes of both surface water and sediment. Our data showed a strong divergence between surface water and sediment metabolomes, consistent with previous work within local systems. Surface water metabolomes were more rich and variable and contained more unsaturated and aromatic metabolites than sediment, possibly suggesting higher influence from terrestrial inputs or lower microbial processing. Further, surface water and sediment metabolomes had a consistent set of core biochemical transformations (CHO-only) but differed in N-, S-, and P-containing transformations that may be more influenced by nutrient limitations. Finally, we hypothesize the presence of systematic, spatially structured drivers influencing sediment metabolomes more strongly than surface water, as sediment changed along longitudinal patterns within the contiguous United States.
While there are many potential explanations for these patterns, the publicly available datasets being actively compiled by WHONDRS are well-suited for follow-on analyses to identify factors underlying metabolome variability. Given that the WHONDRS sampling campaign spanned 1st to 9th stream orders across multiple biomes (e.g., desert-like in the Columbia Plateau, subtropical in southern Florida, temperate forests in the Mid-Atlantic), outcomes of current and future data analyses and modeling efforts will enable transferable knowledge that can be applied throughout the world. To expand the breadth of questions that can be pursued with the data, WHONDRS is currently collecting information pertaining to mineralogy, geochemistry (e.g., anion and total N concentrations), microbiology (e.g., metagenomics, metatranscriptomics, flow cytometric cell counts), and various remote sensing data types (e.g., vegetation cover). Future questions might, for example, involve exploring spatial patterns of metabolomes across stream orders; correlating N-, S-, and P-containing transformations with land use, mineralogy, and vegetation; or investigating relationships between microbial activity and metabolome composition. We encourage the scientific community to explore WHONDRS datasets and combine them with additional data products to pursue novel scientific questions at local to global scales and to further engage with and pursue science that embodies the ICON principles.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-1989/10/12/518/s1. Table S1: This table contains by-sample mean, median, and standard deviation for molecular characteristics (i.e., aromaticity index, H:C ratios, etc.; Sheet (1), the number of metabolites belonging to a given elemental group (i.e., CHON, CHO; Sheet (2), and the number of metabolites belonging to some compound class (i.e., %lignin-like, %protein-like, etc.; Sheet (3); Sheet (4), information about how each of the different measurements was calculated. Figure S1: Principal component analysis (PCA) performed on all peaks, regardless of formula assignment. Table S2: Loadings for PCA performed on all peaks. PC1 loadings on the x-axis are presented in Column A, while PC2 loadings on the y-axis are presented in Column B. Table S3: Loadings for PCA performed peaks with molecular formula assigned. PC1 loadings on the x-axis are presented in Column A, while PC2 loadings on the y-axis are presented in Column B. Table S4: This spreadsheet contains meta-data for each site from which samples were collected, including (but not limited to) latitude, longitude, stream order, and sampling date. Table S5: This table contains the results of the FDR-corrected, two-sided Mann-Whitney statistics performed to evaluate spatial variability across the contiguous United States of America. Table S6: Table of non-purgeable organic carbon (NPOC) concentrations for surface water and sediment-water extractions. Table S7: The file is the database of transformations used in the transformation analysis. The first column represents the transformation label, while the second column is the corresponding mass difference. There are two types of transformations listed in this file: (1) the gain or loss of the listed molecular formula (e.g., C1H1O1N1), with numeric values indicating the number of atoms associated with the element that precedes the numeric value; and (2) a substitution reaction denoted by an underscore (e.g., C1H1N1O_1). In the case of a substitution reaction, the underscore connects the element lost to the number of atoms lost. For example, C1H1N1O_1 indicates that a molecule gained C1H1N1 and lost one O atom. Some substitution reactions include multiple elements that are lost such that there are multiple underscores. In all cases, an underscore connects the element lost to the number of atoms lost. In all cases, atoms are gained if they are not followed immediately by an underscore. For example, C_1H_4O2 indicates loss of one C, loss of four H, and gain of two O. If no numeric value follows an element, it indicates that there is a gain of a single atom of that element (e.g., CH2 indicates one atom of C). File S1: A compressed file containing all of the maps generated during the spatial analysis of the contiguous United States of America.  Data Availability: Original and expanded metadata, as well as surface water and sediment data used in this study, are publicly available on the Department of Energy data archive site ESS-DIVE [44,92]. All scripts used in this study are available on GitHub at https://github.com/danczakre/GlobalRiverMetabolomes.