3.1. Physico-Chemical Data
Analysis of the physico-chemical parameters measured during sampling, together with the data available on the EHMP website, revealed the level of degradation of the sampled system as a temporal snapshot. The salinity measurements indicate that sites BR4 and BR5 are above the threshold recommended for a freshwater ecosystem in the “Australian and New Zealand Guidelines for Fresh and Marine Water Quality (AWA)” [49] (threshold value of ca. 1.1 ppt). However, it should be noted that these sites are influenced by the tide and the nearby estuary, even though samples were collected during low tide. All other sites were within the acceptable threshold for salinity (Table 1).
The AWA guidelines also state that lowland river systems have turbidity limits of 6–50 NTU. Of the sites sampled, BR3 was above this range (137 NTU); the tidally influenced sites (BR4 and BR5) were within this range but are considered high for estuarine/marine environments (which have a threshold of 0.5–10 NTU).
All sites were within the acceptable range stated in the guidelines with respect to pH (6.5–8.0). However, site BR2 was at the upper limit of this range (pH 8.0). For the EHMP-derived data, chlorophyll a levels (guideline value 5 μg·L−1) were within acceptable limits with the exception of site BR1 (6.3 μg·L−1). Total phosphorus (TP) (acceptable level ≤50 μg·L−1) was high for all sites (above 50 μg·L−1), with sites BR2 (320 μg·L−1), BR3 (270 μg·L−1), and BR4 (140 μg·L−1) containing exceptionally high TP when compared to the threshold value. Likewise, for filterable reactive phosphate (FRP) (acceptable level ≤20 μg·L−1), all sites were above the guideline value, with sites BR2 (240 μg·L−1) and BR3 (260 μg·L−1) the highest. For total nitrogen (TN) (acceptable level ≤500 μg·L−1), sites BR2 (880 μg·L−1) and BR3 (810 μg·L−1) were above the guideline value. Nitrogen oxides (NOx) (acceptable level ≤60 μg·L−1) were high for all sites with the exception of BR1 (31 μg·L−1). Sites BR2 (500 μg·L−1), BR3 (550 μg·L−1), BR4 (210 μg·L−1) and BR5 (100 μg·L−1) had considerably higher NOx content than the threshold. Lastly, all sites were below the guideline limit for ammonium-based nitrogen (NH4+) (<20 μg·L−1).
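The guideline comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the procedure used in the study: the limits and the NOx concentrations are those quoted in the text, while the names `GUIDELINE_UG_PER_L`, `NOX_UG_PER_L` and `exceedances` are hypothetical.

```python
# Guideline limits quoted in the text (ug/L); the dictionary name is illustrative.
GUIDELINE_UG_PER_L = {"TP": 50, "FRP": 20, "TN": 500, "NOx": 60}

# NOx concentrations reported in the text for each site (ug/L).
NOX_UG_PER_L = {"BR1": 31, "BR2": 500, "BR3": 550, "BR4": 210, "BR5": 100}


def exceedances(values, limit):
    """Return {site: value / limit} for every site above the limit."""
    return {site: value / limit for site, value in values.items() if value > limit}


nox_over = exceedances(NOX_UG_PER_L, GUIDELINE_UG_PER_L["NOx"])
for site, ratio in sorted(nox_over.items()):
    print(f"{site}: {ratio:.1f}x the NOx guideline value")
```

Expressing each exceedance as a multiple of the limit makes it easier to compare parameters that have very different guideline values.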
3.2. Trace Metals
Heavy metal pollution of waterways is typically associated with mining activities or discharges from manufacturing industries. Heavy metal pollution in water and sediments can have serious effects on the aquatic ecosystem and can make water unsuitable for livestock and/or human consumption. Furthermore, some animals (e.g., fish, shellfish and oysters) can also “bio-accumulate” metals [17], making them unsafe for consumption. As such, the concentration of metals in an urban stream is of interest, more so when the stream has the potential to further impact downstream fisheries in estuaries and marine environments.
In the context of this study, soluble metals in the sampled river were analyzed because they would most likely impact the planktonic biota (in terms of abundance and diversification) and their metabolism (i.e., metabolic output). Metals bound within sediments and biofilms, although important, were considered outside the scope of this investigation and are the focus of future work.
All trace metals analyzed were below the trigger values set for freshwater aquatic ecosystems at the 90% level of protection of species in the guidelines. The trigger values were set at 80 μg·L−1 for aluminum, 0.4 μg·L−1 for cadmium, 1.8 μg·L−1 for copper, 5.6 μg·L−1 for lead, 13 μg·L−1 for nickel, and 15 μg·L−1 for zinc. The guidelines have no set limit for chromium, cobalt and iron [49]. However, as indicated in Table 3, site BR3 had elevated levels of all the metals analyzed. In particular, site BR3 had markedly higher concentrations of aluminum (2.01 μg·L−1) and iron (4.53 μg·L−1) compared to the upstream and downstream sampling sites. It also had slightly higher concentrations of cobalt (2.38 ng·L−1), chromium (1.0 ng·L−1), copper (8.1 ng·L−1), lead (7.6 ng·L−1) and nickel (36.9 ng·L−1). Notably, site BR3 is located near a wastewater treatment plant and at the junction of a side stream that enters the BR.
3.4. Metagenomics
The estimated Good’s coverage of the sample groups (75,000 sequence reads per site, with 3 independent samples subsampled to 25,000 sequence reads) ranged from 97% to 100%, with an average of 96% ± 1.2% among all samples. An average Shannon diversity index of 5.00 ± 0.24, an OTU richness of 1725 ± 172, and an abundance-based coverage estimator (ACE) richness of 4024 ± 1035 were observed among all samples. Table 5 provides a summary of the bacterial metagenomics data based on observed and unique features per order, family and genus for each sampled site. Figure 2 illustrates the bacterial order profile summary of the sampled sites and Figure 3 provides an overview of the site features in terms of similarity and uniqueness, presented as a Venn diagram.
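For readers unfamiliar with the coverage and diversity statistics reported above, the following minimal sketch shows how Good’s coverage and the Shannon index are conventionally computed from a vector of OTU read counts. The read counts are hypothetical, the function names are illustrative, and the use of the natural logarithm for the Shannon index is an assumption, since the text does not state the base used.

```python
import math


def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(otu_counts)
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / total


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed OTUs.

    The natural logarithm is an assumption; the base is not given in the text.
    """
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


# Hypothetical read counts per OTU for one subsample (not study data).
toy_counts = [50, 30, 10, 5, 3, 1, 1]
print(f"Good's coverage: {goods_coverage(toy_counts):.2%}")
print(f"Shannon index:   {shannon_index(toy_counts):.3f}")
```

A coverage close to 100% indicates that few reads belong to OTUs seen only once, i.e., that additional sequencing would be expected to reveal few new OTUs.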
The unique family features for the sites were Clostridiales incertae sedis, Desulfonatronaceae, Corynebacteriaceae, GpX, and Dermatophilaceae for site BR1; Incertae Sedis and Aquificaceae for site BR2; Ktedonobacteraceae, Herpetosiphonaceae, Aerococcaceae, and Spirochaetales incertae sedis for site BR3; Brevinemataceae, Euzebyaceae, Dietziaceae, Thermoactinomycetaceae 1, Saccharospirillaceae, Cohaesibacteraceae, Dermacoccaceae, Clostridiaceae 2, Psychromonadaceae, Rubrobacteraceae, and Thiohalorhabdus for site BR4; and Aquificales incertae sedis, Thermosporotrichaceae, Desulfarculaceae, Congregibacter, Cellulomonadaceae, Acholeplasmataceae, Bartonellaceae, Thermolithobacteraceae, Lactobacillaceae, Leuconostocaceae, Micromonosporaceae, Promicromonosporaceae, and Sphaerobacteraceae for site BR5.
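The notion of a “unique feature” used above (a taxon observed at one site and at no other) corresponds to a simple set difference. The sketch below illustrates the idea on a reduced example; the function name `unique_features` is hypothetical, and the entries `SharedFamilyA` and `SharedFamilyB` are placeholders rather than observed taxa.

```python
def unique_features(features_by_site):
    """Map each site to the features observed at that site and nowhere else."""
    unique = {}
    for site, features in features_by_site.items():
        # Union of the features seen at every other site.
        others = set().union(
            *(f for s, f in features_by_site.items() if s != site)
        )
        unique[site] = features - others
    return unique


# Reduced illustration: a few families named in the text plus two
# placeholder families (SharedFamilyA/B) that stand in for shared taxa.
families = {
    "BR1": {"Corynebacteriaceae", "Dermatophilaceae", "SharedFamilyA"},
    "BR2": {"Aquificaceae", "SharedFamilyA", "SharedFamilyB"},
    "BR3": {"Aerococcaceae", "SharedFamilyB"},
}

for site, fams in unique_features(families).items():
    print(site, sorted(fams))
```

The same operation applies unchanged to orders, genera or metabolite features, which is what the Venn diagrams summarize graphically.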
3.5. Community Metabolomics
The GC-MS analysis of the samples indicated the presence of 289 peaks per chromatogram, of which 54 were considered statistically significant (S/N ratio ≥50 with an adjusted p-value ≤0.05). Univariate and multivariate statistical tools such as the t-test, Principal Component Analysis (PCA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were used to analyze the distribution and classification of the various metabolites. Owing to the unsupervised nature of PCA and the number of sample sites, PCA was a less satisfactory method for discriminating between the metabolite distributions. As such, samples were processed further using PLS-DA. PLS-DA is used to examine large datasets and can measure linear/polynomial correlation between variable matrices by lowering the dimensions of the predictive model, allowing easy discrimination between the samples and identification of the metabolite features that cause the separation.
The quality of the PLS-DA model was assessed by the linearity (R2X) and predictability (Q2), which were observed at 0.8294 and 0.565, respectively. These are indicative of a model that reasonably fits the data and has a weak/moderate predictive capability (~0.5). Figure 4A illustrates the PLS-DA score scatter plot of the metabolomic dataset groups (sample sites), and Figure 4B illustrates the loading scatter plot of the observed metabolites. The majority of the identified metabolites were sugars, fatty acids and amino acids. Secondary metabolites such as perillyl alcohol, lithocholic acid and phytol were also observed. As biological datasets tend to vary significantly from sample to sample, a distance of observation (DModX) analysis was also used to identify and eliminate any outliers. DModX is the normalized distance between an observation and the X model plane, and is proportional to the residual standard deviation (RSD) of that observation. DCrit (the critical value of DModX), derived from the F-distribution, defines the size of the observational area under analysis. The DModX plot (not shown) indicates that no samples exceeded the threshold for rejecting a sample. A sample is considered a moderate outlier when its DModX value exceeds twice the DCrit at 0.05, which, in this instance, was 2.897 (DCrit = 1.435).
Table 6 lists the ‘identified’ significant metabolites after Benjamini-Hochberg adjustment. The unique metabolite features for the sites were Unknown Compound 13 (MW = 218.2) for site BR2; Xylitol (dTMS), l-Arabinose (4TMS), Unknown Compound 4 (MW = 325.2), and Unknown Compound 15 (MW = 189.1) for site BR3; Phytol mixture of isomers, Erythritol (4TMS) and d-Fructose (5TMS) for site BR4; and Unknown Compound 18 (MW = 278.2) and Unknown Compound 9 (MW = 325.2) for site BR5. Figure 5 provides an overview of the site metabolite features in terms of similarity and uniqueness, presented as a Venn diagram.
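The Benjamini-Hochberg adjustment mentioned above controls the false discovery rate across many simultaneous tests. The following is a minimal sketch of the standard step-up procedure; it is not the software routine used in the study, the function name is illustrative, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four metabolite features (not study data).
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adjusted) if p <= 0.05]
print(adjusted, significant)
```

Each raw p-value is scaled by the number of tests divided by its rank, so features with small p-values are penalized less than under a Bonferroni correction.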
3.6. Multi-Omics
As illustrated in the summary table (Table 7), an assessment of the water quality parameters in isolation is often difficult and tedious to decipher in terms of the system’s health and resilience; the same is true of the metagenomics and metabolomics data when viewed in isolation, owing to the volume of data. An elevated result or a breach of the guidelines may not necessarily mean that the site or system is degraded. For example, the microbial indicators of the sites sampled suggest that sites BR3, BR4, and BR5 may pose a risk to human health (and were indeed classed as low quality). However, as illustrated in the study by Ahmed et al. [52] of the same samples, only site BR3 was observed to have a human wastewater signature. Likewise, sites BR4 and BR5 had elevated salinity levels according to the guidelines, but it was noted that these sites were heavily influenced by the tide. As such, it is important to note that such data only provide a snapshot of the system at the time of sampling and may not represent the characteristics of the overall system at all times. One approach to overcome such problems is to sample the system more frequently (both temporally and longitudinally). However, this will significantly increase the cost of analysis. An alternative approach that requires fewer samples to be collected is a multi-omics approach. Environmental multi-omics relies on a deeper analysis of the system being sampled in terms of bacterial diversity and metabolic output. Furthermore, it combines metadata to investigate relationships between sites. While it is ideal to do such an analysis over a period of time in order to establish seasonal trends, the study presented herein demonstrates its application and illustrates the added value of such an approach.
As such, in order to assess the entire system (from site BR1 through to BR5) from the perspective of heavy metals, physical and chemical parameters, metabolites and bacterial diversity, the multiple datasets collected first need to be collated and analyzed using a multi-omics approach to see whether the data provide insight into the river system’s health. Investigating complex systems in isolation, whether by analyzing individual measurements or individual sites without consideration of upstream and downstream conditions, can result in an incorrect assessment of overall health or degradation. To this end, a series of PLS-DA plots were created to combine the multiple datasets presented herein. Each dataset was first matched by site name and log transformed in SIMCA to normalize the data. This enabled the data to be interrogated and provided a greater depth of analysis compared to investigating each site and parameter in isolation. The following sections detail such an assessment using the MAC classification (i.e., Class A and D, which is also the same grouping as turbidity), the salinity data (i.e., high and low salinity), and the MAC classification in combination with low salinity site data to categorize sites for comparison.
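The matching and transformation step described above was performed in SIMCA; the sketch below only illustrates the general idea outside that software. It joins datasets on their site identifier and applies a base-10 logarithm. The NOx and TP values are those quoted in Section 3.1, whereas the function name `combine_by_site`, the choice of log base, and the decision to keep only sites present in every dataset are assumptions made for illustration.

```python
import math

# Values quoted in Section 3.1 (ug/L); TP is only quoted there for three sites.
nox = {"BR1": 31, "BR2": 500, "BR3": 550, "BR4": 210, "BR5": 100}
tp = {"BR2": 320, "BR3": 270, "BR4": 140}


def combine_by_site(datasets):
    """Join named datasets on site ID and log10-transform each value.

    Only sites present in every dataset are kept (an assumption), and all
    values must be positive; zeros would need an offset before the log.
    """
    shared_sites = sorted(set.intersection(*(set(d) for d in datasets.values())))
    return {
        site: {name: math.log10(data[site]) for name, data in datasets.items()}
        for site in shared_sites
    }


combined = combine_by_site({"NOx": nox, "TP": tp})
for site, row in combined.items():
    print(site, {name: round(value, 3) for name, value in row.items()})
```

Log transformation compresses the range of variables that span several orders of magnitude (for example, metals reported in ng·L−1 alongside nutrients in μg·L−1), so that no single variable dominates the model.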
3.6.1. Microbial Water Quality Assessment Category Class Assessment
After the data were uploaded into SIMCA individually, matched by sample location identifiers and log transformed, they were then grouped based on the MAC category of ‘Class A’ and ‘Class D’. The resulting PLS-DA model was assessed by the linearity (R2X and R2Y) and predictability (Q2), which were observed at 0.584, 0.987 and 0.750, respectively. This is indicative of a model that reasonably fits the data and has a good predictive capability (>0.7). Figure 6A illustrates the PLS-DA score scatter plot of the combined datasets grouped based on MAC values (i.e., Class A and Class D), and Figure 6B illustrates the loading scatter plot of the observed parameters.
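The fit and predictability statistics quoted above share the same general form: one minus the ratio of a residual sum of squares to the total sum of squares. R2Y uses the residuals of the fitted model, while Q2 uses the residuals of cross-validated predictions (PRESS). The sketch below illustrates that arithmetic on hypothetical numbers; it does not reproduce SIMCA’s cross-validation scheme or its accumulation over components, and all names and values are illustrative.

```python
def explained_fraction(observed, predicted):
    """Return 1 - (sum of squared errors / total sum of squares)."""
    mean = sum(observed) / len(observed)
    total_ss = sum((y - mean) ** 2 for y in observed)
    error_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - error_ss / total_ss


# Hypothetical class labels (0 = 'Class A', 1 = 'Class D') and predictions.
labels = [0, 0, 1, 1]
fitted = [0.1, 0.0, 0.9, 1.0]           # predictions from the fitted model
cross_validated = [0.3, 0.2, 0.6, 0.9]  # predictions for held-out samples

r2y = explained_fraction(labels, fitted)
q2 = explained_fraction(labels, cross_validated)
print(f"R2Y = {r2y:.2f}, Q2 = {q2:.2f}")
```

Because held-out samples are predicted less accurately than fitted ones, Q2 is normally lower than R2Y; a large gap between the two is a common sign of overfitting.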
Using the MAC Class PLS-DA model, the dominant significant taxa classified at the class level for the ‘Class A’ pooled samples were Acidobacteria, Alphaproteobacteria, Anaerolineae, Bacilli, Betaproteobacteria, Chlamydiae, Chloroflexi, Elusimicrobia, Fusobacteria, Gammaproteobacteria, Gemmatimonadetes, Holophagae, Ignavibacteria, Ktedonobacteria, Mollicutes, Negativicutes, Nitrospira, Opitutae, Spartobacteria, Spirochaetes, Thermodesulfobacteria, Verrucomicrobiae, and Zetaproteobacteria. The dominant significant metabolic features were metabolites relating to carbohydrate metabolism (l-gulose, l-arabinose), the glucagon signaling pathway (α-d-Glucose-1-phosphate, dipotassium salt dihydrate), and starch and sucrose metabolism (d-Cellobiose). Furthermore, no trace metals were correlated with the pooled ‘Class A’ sample cohort.
In contrast, the dominant significant taxa classified at the class level for the ‘Class D’ pooled samples were Armatimonadetes, Chlorobia, Chloroplast, Chrysiogenetes, Chthonomonadetes, Clostridia, Cyanobacteria, Deferribacteres, Dehalococcoidetes, Deinococci, Epsilonproteobacteria, Fibrobacteria, Flavobacteria, Lentisphaeria, Planctomycetacia, Sphingobacteria, Synergistia, Thermolithobacteria, Thermomicrobia, and Thermotogae. The dominant significant metabolic features were metabolites relating to secondary bile acid biosynthesis (lithocholic acid), carbohydrate metabolism (3,6-anhydro-d-galactose), fatty acid biosynthesis (capric acid), fructose and mannose metabolism (d-mannose), biosynthesis of unsaturated fatty acids (erucic acid methyl ester), and pentose and glucuronate interconversions (d-ribulose), in addition to chemical markers commonly found in human waste streams, such as phytol mixture of isomers (used in the manufacture of synthetic forms of vitamin E and vitamin K1) and d-glucosamine hydrochloride (an osteoarthritis medication). Lastly, the trace metals Al, Cr, Fe, Co, Ni, Cu, Zn and Pb were associated with the pooled ‘Class D’ sample cohort.
This suggests that ‘Class D’ pooled samples are correlated based on a number of factors, primarily bacteria that are known to cause or influence algal blooms (such as Cyanobacteria), organisms that lack aerobic respiration (Clostridia, Synergistia) and a number of green sulfur and non-sulfur bacteria (Chlorobia, Thermomicrobia), which are exacerbated by the presence of pollutants (such as human waste stream indicators and heavy metals). Furthermore, bacteria capable of dehalogenating polychlorinated aliphatic alkanes and alkenes (Dehalococcoidetes) and organisms highly resistant to environmental hazards (Deinococci) were more abundant in ‘Class D’ pooled samples. The presence of such organisms suggests that the organisms within the sites are resistant to pollutants. However, the presence of Fibrobacteria suggests that commensal bacteria and opportunistic pathogens may also be present. In contrast, ‘Class A’ pooled samples were found to have organisms more commonly found in soil and aquatic environments, with no significant human waste-derived contaminants or metals present.
3.6.2. Salinity Assessment
The data were grouped based on salinity, which was classed as ‘Low’ (<1.0 ppt) or ‘High’ (~30 ppt). The resulting PLS-DA model was assessed by the linearity (R2X and R2Y) and predictability (Q2), which were observed at 0.450, 0.983 and 0.911, respectively. This is indicative of a model that reasonably fits the data and has an excellent predictive capability (>0.9). Figure 7A illustrates the PLS-DA score scatter plot of the combined datasets grouped based on salinity values (Low and High), and Figure 7B illustrates the loading scatter plot of the observed parameters.
Using the salinity class PLS-DA model, the dominant significant taxa classified at the class level for the ‘Low’ salinity sample sites were: Acidobacteria, Alphaproteobacteria, Anaerolineae, Bacilli, Betaproteobacteria, Chlamydiae, Chloroflexi, Elusimicrobia, Fusobacteria, Gammaproteobacteria, Gemmatimonadetes, Holophagae, Ignavibacteria, Ktedonobacteria, Negativicutes, Nitrospira, Opitutae, Spartobacteria, Spirochaetes, Thermodesulfobacteria, Verrucomicrobiae, and Zetaproteobacteria. The dominant significant metabolic features were metabolites relating to carbohydrate metabolism (l-gulose, butanoic acid), alanine metabolism (propanedioic acid), and biosynthesis of secondary metabolites (glycerol). Furthermore, Al, Cr, Zn and Pb were correlated with the pooled ‘Low’ salinity cohort.
In contrast, the dominant significant taxa classified at the class level for the ‘High’ salinity sample sites were: Armatimonadetes, Caldilineae, Chlorobia, Chloroplast, Chrysiogenetes, Chthonomonadetes, Clostridia, Cyanobacteria, Deferribacteres, Deinococci, Epsilonproteobacteria, Fibrobacteria, Flavobacteria, Planctomycetacia, Sphingobacteria, Synergistia, Thermolithobacteria, Thermomicrobia, and Thermotogae. The dominant significant metabolic features were metabolites relating to secondary bile acid biosynthesis (lithocholic acid), carbohydrate metabolism (3,6-anhydro-d-galactose), fatty acid biosynthesis (capric acid), fructose and mannose metabolism (d-mannose), biosynthesis of unsaturated fatty acids (erucic acid methyl ester), and pentose and glucuronate interconversions (d-ribulose), as well as the osteoarthritis medication d-glucosamine hydrochloride. In addition, chemical markers commonly found in human waste streams, such as phytol and perillyl alcohol (a monoterpene isolated from the essential oils of lavandin, peppermint, spearmint, cherries, celery seeds, and several other plants), were also detected. Lastly, Fe was associated with the pooled ‘High’ salinity sample cohort. As in the previous assessment, the use of salinity as a grouping highlights the presence of photosynthetic bacteria in addition to bacteria that are resilient to pollution sources.
3.6.3. Microbial Water Quality Assessment Category Class A and Low Salinity Assessment
As illustrated in Figure 6, sites BR1 and BR2 were grouped apart from BR3. In order to further analyze this sub-grouping, the ‘High’ salinity sites (BR4 and BR5) were removed and a subsequent PLS-DA comparison was undertaken. Figure 8 illustrates the PLS-DA comparison based on Microbial Water Quality Assessment Category class and ‘Low’ salinity. The resulting PLS-DA model was assessed by the linearity (R2X and R2Y) and predictability (Q2), which were observed at 0.560, 0.964 and 0.657, respectively. This is indicative of a model that reasonably fits the data and has an average predictive capability (≥0.5).
This comparison highlights the increased presence of metals, short-chain fatty acids (SCFA) and sugars in site BR3 when compared with sites BR1 and BR2. Furthermore, bacteria belonging to Acidobacteria, Actinobacteria, Armatimonadetes, Chloroflexi, Chloroplast, Chrysiogenetes, Chthonomonadetes, Dehalococcoidetes, Fibrobacteria, Sphingobacteria, and Thermolithobacteria were more abundant. The presence of Dehalococcoidetes suggests an environment capable of dehalogenating polychlorinated aliphatic alkanes and alkenes, and the presence of Fibrobacteria suggests that commensal bacteria and opportunistic pathogens may also be present.