1. Introduction
Antibiotic resistance genes (ARGs) are genetic elements that confer resistance to antimicrobial agents and have become globally recognized as emerging environmental contaminants of public health concern [1]. Although ARGs occur naturally in soil and aquatic microbiomes, anthropogenic pressures, particularly the widespread use of antibiotics in clinical, agricultural, and veterinary settings, have accelerated their distribution and abundance across ecosystems [2,3]. Wastewater effluents, urban stormwater runoff, and agricultural discharges are primary conduits through which ARGs are introduced into freshwater systems [4,5]. These environments function as reservoirs of resistance genes and may promote their horizontal transfer to human pathogens, elevating the risk of treatment-resistant infections [1,6].
Studies have documented the presence of specific ARG classes in freshwater systems, including genes conferring resistance to sulfonamides, tetracyclines, β-lactams, aminoglycosides, and other antibiotic classes [5,7]. These genes are frequently associated with mobile genetic elements such as plasmids and integrons, facilitating their spread across diverse microbial hosts [8]. In rivers and lakes located downstream of wastewater treatment plants or livestock operations, elevated concentrations of these resistance determinants have been repeatedly observed [4,5,9]. For example, Pruden et al. demonstrated that ARGs, specifically sulfonamide and tetracycline resistance genes, were significantly elevated downstream of urban and agricultural activity [5]. Similarly, Di Cesare et al. found widespread co-detection of ARGs and integrons in wastewater treatment plants, underscoring the role of effluent discharge as a persistent point source [8].
Environmental factors, particularly rainfall and hydrological events, can greatly influence the mobilization and distribution of ARGs in freshwater systems. Rainfall introduces multiple contaminants through surface runoff, which can carry antibiotic residues, resistant bacteria, and fecal materials from agricultural land, impervious surfaces, and septic systems into nearby water bodies [10,11,12]. Storm events have been shown to increase both the concentration and diversity of ARGs in aquatic environments. For example, O’Malley et al. observed that stormwater runoff in urban environments exhibited seasonal and spatial shifts in the ARG profiles of microbes as well as their surrounding environments, with increased abundances during periods of high precipitation [10]. Similarly, Baral et al. used metagenomic sequencing to track ARGs in an urban stream during storm events and identified high abundance of multidrug and vancomycin resistance genes immediately following wet-weather conditions [13]. These findings suggest that rainfall can act as a vector, both resuspending sediment-associated ARGs and introducing new sources via runoff. In another study, Di Cesare et al. found that ARG abundance in a riverine microbial community increased significantly after rainfall, even in areas without direct point-source pollution [14].
Rainfall may also influence ARG fate by altering flow rates, sediment mixing, or dilution effects, leading to complex and sometimes site-specific patterns [11,12]. For example, storm-induced runoff can increase the concentration of resistance genes near inflow points while diluting them downstream. These differential effects highlight the need to examine ARG trends in freshwater systems at both temporal and spatial scales. Resistance genes may also persist long after rainfall events due to their association with biofilms, sediments, and mobile genetic elements, further complicating efforts to trace sources and assess risks [9].
The public health implications of ARG contamination in freshwater lakes are particularly relevant in suburban and residential areas, where water bodies are often used for recreation and are in close contact with domestic animals. Pets exposed to ARG-contaminated water have been shown to harbor resistant bacteria, including extended-spectrum β-lactamase-producing Escherichia coli, which can spread to human caretakers through close contact [15]. Such environments also provide ideal conditions for the accumulation of multidrug resistance determinants, increasing the likelihood of encountering bacterial strains capable of resisting multiple antibiotic classes [6,16].
Despite growing evidence linking precipitation to ARG mobilization in riverine and wastewater-influenced systems, there remains a critical gap in understanding how rainfall affects ARG patterns in small, suburban freshwater lakes. Riverine systems (lotic environments) are characterized by continuous flow and high mixing, which promote rapid dispersion of contaminants, including ARGs [17]. In contrast, lakes (lentic environments) typically experience slower water movement, reduced turbulence, and vertical stratification [18]. These hydrological features may facilitate localized ARG accumulation, sediment settling, or selective persistence following storm events. Suburban lakes are often embedded within urbanized landscapes and subject to diffuse pollution from lawns, septic systems, roadways, and pet activity, yet they remain underrepresented in environmental antimicrobial resistance research [10,19]. As climate change projections suggest increasing frequency and intensity of storm events, understanding how rainfall influences ARG dynamics in these contexts is vital.
This study addresses this research gap by evaluating the impact of rainfall on ARG abundance and composition in Lake Katherine, a suburban lake in Columbia, South Carolina. It was hypothesized that rainfall events enhance the abundance and diversity of ARGs in suburban lake water through runoff and hydrological change. This study investigates how short-term rainfall events influence the presence and distribution of ARGs over time, using metagenomic sequencing, taxonomic analysis, and statistical modeling. The findings contribute to a growing body of research on environmental resistomes and inform strategies for surveillance and mitigation in residential watersheds vulnerable to storm-driven microbial pollution.
2. Materials and Methods
2.1. Sampling Location and Site Selection
Lake Katherine is a suburban freshwater lake located east of Columbia, South Carolina, USA. It was selected for this study due to its recreational use and location within a residential neighborhood, both of which increase the likelihood of human–pathogen interactions and anthropogenic inputs [20]. To assess the spatial and temporal dynamics of ARGs, water sampling was conducted over the course of one year, from September 2019 to September 2020. This period captured seasonal variation, fluctuations in hydrology, and the effects of both dry weather and rainfall-influenced conditions.
Three sampling sites were selected to represent distinct hydrological characteristics and potential exposure to pollutants (Figure 1C). The inlet, designated as Site 1 (blue marker; 34.007569, −80.961264), is located at the point where a tributary flows into Lake Katherine, serving as the primary entry point of surface water and a likely conduit for upstream contaminants. The cove, Site 2 (orange marker; 34.005997, −80.958055), is a semi-enclosed, low-flow region within the lake body that does not receive direct tributary input but may promote the accumulation of particulate matter and microbial communities. The outlet, Site 3 (purple marker; 33.997428, −80.965949), is situated where water exits the lake and reflects the cumulative effects of upstream contributions, in-lake processes, and residential runoff [21].
At each sampling site, triplicate surface water samples were collected using sterile 2 L polypropylene bottles. Immediately following collection, the samples were stored on ice and transported to the laboratory for processing and nucleic acid extraction.
To assess the effects of precipitation on ARG dynamics, sampling events were classified as wet-weather if they occurred within 48 h following a two-day cumulative rainfall total of ≥1 cm. Events with rainfall totals below this threshold were classified as dry-weather. This cutoff was selected to differentiate minimal precipitation from events likely to generate surface runoff and mobilize contaminants into the lake. Daily rainfall data were obtained from the Semmes Lake weather station, located approximately 1.35 km from Lake Katherine [22]. The timing of rainfall relative to sampling dates is detailed in Supplementary Table S1, and a visual timeline is provided in Supplementary Figure S1.
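The wet/dry classification described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's own script: it assumes the two-day window covers the sampling day and the day before (as stated in Section 2.5.1), and the function names and rainfall values are hypothetical.

```python
from datetime import date, timedelta

WET_THRESHOLD_CM = 1.0  # two-day cumulative rainfall cutoff stated in the text


def two_day_rainfall(daily_cm: dict[date, float], sampling_day: date) -> float:
    """Sum rainfall on the sampling day and the day before (missing days count as 0)."""
    previous_day = sampling_day - timedelta(days=1)
    return daily_cm.get(sampling_day, 0.0) + daily_cm.get(previous_day, 0.0)


def classify_event(daily_cm: dict[date, float], sampling_day: date) -> str:
    """Label a sampling event 'wet' if the two-day total reaches the cutoff, else 'dry'."""
    total = two_day_rainfall(daily_cm, sampling_day)
    return "wet" if total >= WET_THRESHOLD_CM else "dry"


# Hypothetical daily totals (cm); these are not values from the study.
example_rain = {date(2020, 3, 1): 0.2, date(2020, 3, 2): 0.9, date(2020, 3, 5): 0.4}
print(classify_event(example_rain, date(2020, 3, 2)))  # wet (0.2 + 0.9 cm)
print(classify_event(example_rain, date(2020, 3, 5)))  # dry (0.4 cm)
```

Treating days without a record as zero rainfall is a simplification; a production script would more likely flag missing station data.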
The lake is embedded within an urban watershed that likely contains both point and non-point sources of ARGs. The elevated ARG levels observed during dry-weather conditions suggest the presence of persistent local contamination sources. In contrast, rain-driven inputs may contribute large numbers of allochthonous bacteria and ARGs via surface runoff or episodic infrastructure discharge.
2.2. Sample Processing and DNA Extraction
Water samples were processed within 8 h to minimize degradation [23]. Samples and sterile water controls were homogenized, and 1 L from each sample was filtered using a 0.22 µm Corning polyethersulfone (PES) vacuum filter (Corning Inc., Corning, NY, USA) [24]. The filter was removed and placed into a 50 mL conical tube. An additional 20 mL of unfiltered water from the respective sample was added to the tube and vortexed to resuspend the captured microbial mass. The resuspended cells were then centrifuged at 6000× g, and the resultant cell pellet was used for nucleic acid extraction. DNA extraction was performed using the DNeasy PowerSoil Kit (QIAGEN, Hilden, Germany), following the manufacturer’s protocol [25]. Final DNA was eluted in 50 µL of RNase-free water and DNA concentration was measured using a Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA) [26]. DNA integrity and purity were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) [27]. DNA yields ranged from approximately 35 to 145 ng/µL (Supplementary Table S2) and were within expected ranges for environmental samples.
2.3. Library Preparation and Sequencing
DNA was enzymatically sheared to approximately 300 bp, and Illumina sequencing libraries were prepared and barcoded using the NEBNext Ultra II FS DNA Library Prep Kit and NEBNext Multiplex Oligos for Illumina (New England Biolabs, Ipswich, MA, USA), following the manufacturer’s protocols [28]. Individually barcoded libraries (96 samples) were combined (1:1 based on Qubit concentrations) in EB buffer to a concentration of 15 nM in 20 µL [29]. Combined libraries were analyzed on a Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) to estimate the final library fragment size and concentration [29]. Sequencing was performed on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA), using S4 flow cells with 2 × 150 bp paired-end sequencing [29]. Each analyzed sample yielded between approximately 8.7 million and 370 million total sequencing reads (paired and unpaired combined), with a mean of ~76.5 million reads per sample (Supplementary Table S3). This sequencing depth was sufficient to support robust metagenomic assembly and antibiotic resistance gene profiling. The raw Illumina sequencing reads for each sample are available from NCBI under BioProject accession PRJNA1282196.
2.4. Bioinformatic Analysis
Following each NovaSeq run, raw sequencing reads were processed using fastp (version 0.20.0) with the parameters -Q -L -g --poly_g_min_len 5 and --adapter_fasta to remove sequencing adapters and trim poly-G tails; because -Q and -L disable fastp’s quality and length filters, this step was limited to adapter removal and poly-G trimming [30]. Error correction of the trimmed reads was then performed using SPAdes (version 3.12) with the --only-error-correction -m 800 settings [31]. Corrected reads were assembled into metagenomic contigs using MEGAHIT (version 1.2.7) with --presets meta-sensitive and a minimum contig length of 500 bp [32]. Open reading frames (ORFs) were predicted from assembled contigs using Prodigal (version 2.6.3) in meta mode, which is optimized for gene calling in metagenomic datasets [33]. The resulting amino acid sequences were annotated with DeepARG in predictive mode (--model LS --type prot -I -o -d), which aligns them to the DeepARG database using DIAMOND (version 0.9.24) and applies deep learning models to identify both known and putative ARGs from protein sequences [34].
To quantify the abundance of each predicted ARG, cleaned sequencing reads were mapped back to the Prodigal-predicted ORFs derived from the assembled contigs using Bowtie2 (version 2.3.2), and alignments were saved as BAM files [35]. Coverage and read depth across each ORF were calculated using samtools bedcov (version 1.12) and bedtools coverage (version 2.27.1), respectively [36,37]. Both tools were run with default settings. The resulting coverage and count data were merged using a custom Python script that paired ORF-level depth with read counts. These data were further merged with DeepARG output files using Python to generate a unified file containing ARG identity, gene coverage, and read count data per sample. Total sequencing depth per sample was calculated using a Python script, which counted the number of reads across all paired and unpaired FASTQ files. These values were then used to calculate a normalized abundance metric. A detailed overview of the bioinformatics workflow, including software tools and data flow between steps, is provided in Supplementary Figure S2.
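Statistical tests were performed using Python (Python version 3.11.11, Python Software Foundation, Wilmington, DE, USA) for total ARG and gene-class level analysis. ARG abundance was quantified using RPKM, defined as the number of reads mapped to a gene per kilobase of gene length per million reads, as shown in Equation (1):

$$\mathrm{RPKM} = \frac{N_{\mathrm{gene}} \times 10^{9}}{L_{\mathrm{gene}} \times N_{\mathrm{total}}} \quad (1)$$

where $N_{\mathrm{gene}}$ is the number of reads mapped to the gene, $L_{\mathrm{gene}}$ is the gene length in base pairs, and $N_{\mathrm{total}}$ is the total number of reads used for normalization (the per-sample sequencing depth described above).
This metric provides a normalized measure of gene abundance, enabling comparative analysis across different genes and samples [38]. By normalizing for both gene length and sequencing depth, RPKM allows for direct comparison of ARG abundance across samples, making it a reliable metric for assessing the influence of rainfall on ARG dynamics [39].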
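The RPKM normalization described above can be sketched as follows. This is a minimal illustration, assuming that the depth term is the total read count per sample (as described for the sequencing-depth script); the function name and the example numbers are hypothetical.

```python
def rpkm(mapped_reads: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene length per million reads in the sample."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)


# Hypothetical ORF: 300 mapped reads, 1500 bp long, 20 million reads in the sample.
print(rpkm(300, 1500, 20_000_000))  # 10.0
```

Summing the per-ORF values for all ORFs assigned to one ARG class would give a class-level abundance on the same scale.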
Taxonomic classification of contigs was performed using Kaiju (version 1.9.2), with contigs aligned to the NCBI non-redundant protein database using default greedy mode and the parameters -t, -f, -i, -v, and -z 40 to specify the taxonomy nodes file, index database, input file, verbose mode, and multithreading [40]. Taxonomic assignments were mapped to the ORFs by trimming contig suffixes and merging based on identifiers between Prodigal, DeepARG, and Kaiju outputs. This allowed each ARG to be associated with both a predicted resistance class and a corresponding microbial taxonomic lineage.
Putative plasmid-origin contigs were identified using PlasClass (version 0.1.1) [41]. Contigs with a plasmid probability score ≥ 0.8 were retained as plasmid-associated. ARGs predicted by DeepARG were cross-referenced with plasmid predictions to identify resistance genes potentially carried on plasmids.
2.5. Statistical Analysis
2.5.1. Two-Day Cumulative Rainfall Window Selection
For analyses involving cumulative rainfall, a two-day cumulative rainfall window (including the sampling day and one day prior) was selected based on preliminary correlation analyses evaluating windows ranging from 1 to 7 days. This window was chosen for its superior statistical performance and ecological relevance. Detailed results of this preliminary analysis and the rationale for selecting the two-day rainfall window are presented in the Results section (Section 3.1).
2.5.2. Sites-Combined Analysis
Using Python, a polynomial regression model (Equation (2)) was used to investigate the relationship between rainfall and ARG abundance. This model was chosen to account for the hypothesized non-linear relationship, where ARG abundance might initially increase with rainfall but then plateau or decline at higher rainfall levels due to dilution effects.
The analysis focused on total ARG abundance (in RPKM) across the three sampling sites in Lake Katherine. The predictor variable was cumulative rainfall over the two days leading up to each sampling event, measured in centimeters. A second-order polynomial model was applied to capture potential curvature in the relationship.
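The polynomial regression model for total ARG abundance is given in Equation (2):

$$\mathrm{ARG}_{\mathrm{total}} = \beta_0 + \beta_1 R + \beta_2 R^{2} + \varepsilon \quad (2)$$

where $\mathrm{ARG}_{\mathrm{total}}$ is total ARG abundance (RPKM) and $R$ is the two-day cumulative rainfall (cm).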
In Equation (2), β0 represents the intercept, β1 captures the linear effect of rainfall, β2 represents the quadratic effect of rainfall, and ε accounts for unexplained variability. To guard against multicollinearity, we computed the Pearson correlation between the standardized rainfall term and its square (r = −0.395, corresponding to VIF ≈ 1.18), indicating minimal collinearity.
Separate models were fitted for all sites combined and for each of the three sampling sites (Site 1, Site 2, and Site 3), allowing for site-specific differences in rainfall effects. To account for replicates and streamline analysis, total ARG abundance values were averaged by location and sampling date. To assess the significance of the coefficients, a bootstrap resampling approach with 1000 iterations was employed, generating 95% confidence intervals (CIs) for the intercept, linear, and quadratic terms.
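The quadratic fit, the bootstrap confidence intervals, and the collinearity check can be sketched with NumPy. This is one plausible reading rather than the study's code: resampling (rainfall, abundance) pairs and taking percentile intervals are assumptions, and the data values are hypothetical.

```python
import numpy as np


def fit_quadratic(rain, abundance):
    """Least-squares fit of abundance = b0 + b1*rain + b2*rain**2."""
    b2, b1, b0 = np.polyfit(rain, abundance, deg=2)  # highest power first
    return b0, b1, b2


def bootstrap_ci(rain, abundance, n_boot=1000, seed=0):
    """Percentile 95% CIs for (b0, b1, b2) from resampled (rain, abundance) pairs."""
    rng = np.random.default_rng(seed)
    rain = np.asarray(rain, dtype=float)
    abundance = np.asarray(abundance, dtype=float)
    coefs = []
    while len(coefs) < n_boot:
        idx = rng.integers(0, len(rain), size=len(rain))
        if np.unique(rain[idx]).size < 3:
            continue  # a quadratic needs at least three distinct rainfall values
        coefs.append(fit_quadratic(rain[idx], abundance[idx]))
    lower, upper = np.percentile(np.array(coefs), [2.5, 97.5], axis=0)
    return lower, upper


def vif_from_r(r):
    """Variance inflation factor for two predictors with Pearson correlation r."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical site-date averages (rainfall in cm, total ARG abundance in RPKM).
rain_cm = [0.0, 0.3, 0.8, 1.5, 2.4, 3.6, 5.1]
arg_rpkm = [210.0, 225.0, 250.0, 270.0, 265.0, 240.0, 215.0]
print(fit_quadratic(rain_cm, arg_rpkm))
print(bootstrap_ci(rain_cm, arg_rpkm, n_boot=200))
print(round(vif_from_r(-0.395), 2))  # 1.18, matching the value reported above
```

A coefficient whose 95% interval excludes zero would be read as significant under this scheme; with few site-date averages the intervals can be wide.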
2.5.3. Gene-Class Level Analysis
To analyze the relationship between rainfall and ARG abundance at the gene-class level, a zero-inflated Poisson (ZIP) model was employed. The dataset consisted of ARG abundance values (in RPKM) across multiple sampling sites and dates. The ZIP model was specifically chosen due to the high proportion of zero observations in gene-class-level ARG data. To account for variability due to replicates, samples from the same site and date were aggregated by averaging values for each gene class. This approach reduced within-site noise while preserving the temporal and spatial resolution of the dataset. ZIP models were fitted separately for each ARG class at each sampling site, with rainfall, measured as the two-day cumulative precipitation, included as the primary predictor. The model formula is given in Equation (3):

$$\log(\mu) = \beta_0 + \beta_1 \times \mathrm{Rainfall} \quad (3)$$
In Equation (3), μ represents the expected ARG abundance measured in RPKM, the intercept β0 represents the expected log(RPKM) when rainfall is zero, and the slope coefficient β1 quantifies the change in the log of the expected RPKM for each unit increase in the two-day cumulative rainfall. This approach accounts for the potential non-linear effects and excess zeros often observed in environmental datasets.
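For readers less familiar with zero-inflated models, the standard ZIP formulation combines a point mass at zero with a Poisson count process whose mean follows the log-linear relationship described above. The expression below is the textbook form, not a statement of the study's exact parameterization: the symbol π (the probability of an excess zero) is introduced here for illustration, and whether it was modeled as a constant or as a function of covariates is not stated in the text.

```latex
P(Y = 0) = \pi + (1 - \pi)\, e^{-\mu}, \qquad
P(Y = y) = (1 - \pi)\, \frac{\mu^{y} e^{-\mu}}{y!}, \quad y = 1, 2, \dots
\qquad \text{with} \quad \log(\mu) = \beta_0 + \beta_1 \times \mathrm{Rainfall}
```

Under this form, zeros can arise either from the inflation component or from the Poisson component itself, which is what lets the model absorb more zeros than a plain Poisson regression would predict.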
To assess the relationship between rainfall and ARG abundance at the gene-class level, gene classes were first filtered to ensure sufficient representation. Specifically, only gene classes with at least five non-zero observations, and where non-zero observations constituted at least 10% of the total measurements, were retained. This filtering step minimized the inclusion of sparse data, thereby improving the reliability of subsequent statistical analyses [42,43]. Parameter estimates, confidence intervals, and p-values for each gene class and site were exported for further processing. Since the model coefficients are in log units, the intercepts and slopes can be exponentiated to interpret values on the original RPKM scale. For slopes, the percentage change in expected ARG abundance per unit increase in rainfall was calculated as follows:

$$\%\ \mathrm{change} = \left(e^{\beta_1} - 1\right) \times 100$$
This equation expresses the slope on a percent-change scale rather than a log-linear scale, facilitating a clearer understanding of rainfall as a potential driver of ARG abundance in Lake Katherine, while accounting for site-level and replicate-level variability. Residual diagnostics (deviance response residuals vs. fitted-value plots and Q-Q plots) were also performed on the ZIP models to check model fit and distributional assumptions (Supplementary ZIP Residual Plots).
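The sparsity filter and the percent-change conversion described above are simple enough to sketch directly. The function names and example values are hypothetical; only the thresholds (five non-zero observations, 10% non-zero) and the exponentiation of the log-scale slope come from the text.

```python
import math


def keep_gene_class(values, min_nonzero=5, min_fraction=0.10):
    """True if a gene class has enough non-zero observations to be modeled."""
    if not values:
        return False
    nonzero = sum(1 for v in values if v > 0)
    return nonzero >= min_nonzero and nonzero / len(values) >= min_fraction


def percent_change(beta1):
    """Percent change in expected abundance per unit (1 cm) increase in rainfall."""
    return (math.exp(beta1) - 1.0) * 100.0


# Hypothetical abundance series: 20 zeros plus 5 detections passes both rules.
print(keep_gene_class([0.0] * 20 + [1.5] * 5))   # True
print(keep_gene_class([0.0] * 60 + [1.0] * 5))   # False: only about 8% non-zero
print(round(percent_change(math.log(2)), 6))     # 100.0: a slope of ln 2 doubles abundance
```

A negative slope maps to a negative percentage bounded below by -100%, which keeps declines interpretable on the same scale as increases.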
2.6. Taxonomic Analysis
To investigate the taxonomic associations with ARG classes, a custom Python workflow was developed to aggregate and visualize taxonomic composition across sampling events at Lake Katherine. Taxonomic assignment files were grouped by sampling date, and each file was parsed to extract key metadata including sampling date, predicted ARG class, taxonomic class, and phylum. For each ARG class, counts of associated taxonomic classes were aggregated across all samples. A contingency table of raw counts was generated with ARG classes as rows and taxonomic classes as columns. To account for differing total counts across ARG classes, the table was normalized to express values as percentages, producing a 100% stacked profile of taxonomic contributions to each ARG class. This allowed for comparative visualization of the relative microbial community composition across ARG classes. Taxonomic diversity was examined by calculating the Shannon diversity index for each ARG class based on its normalized taxonomic profile. Using Python, a heatmap was generated to visualize the abundance of taxonomic classes associated with each ARG class, using a non-linear color scale to preserve sensitivity across both low and high abundance taxa. ARG classes were hierarchically clustered using average linkage and Euclidean distance, and their order was optimized to group similar community profiles, before being plotted on the upper X axis. Adjacent to the heatmap, a phylum-level annotation stripe was included, with taxonomic classes grouped by phylum to highlight broad lineage-level patterns. For clarity, all “Candidatus” taxonomic names were abbreviated (e.g., Candidatus Omnitrophia to C. Omnitrophia). This analysis examines how distinct microbial taxa contribute to the distribution of ARGs in the lake system.
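The percent normalization and Shannon index used for each ARG class can be illustrated as follows. This is a sketch under stated assumptions: the natural logarithm is assumed for the Shannon index (the base is not given in the text), and the function names and counts are hypothetical.

```python
import math


def to_percent(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw taxonomic-class counts for one ARG class to percentages."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("an ARG class needs at least one assigned contig")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


def shannon_index(percent_profile: dict[str, float]) -> float:
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero share."""
    proportions = [p / 100.0 for p in percent_profile.values() if p > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical counts of taxonomic classes carrying one ARG class.
profile = to_percent({"Betaproteobacteria": 60, "Gammaproteobacteria": 30, "Actinomycetes": 10})
print(profile)
print(shannon_index(profile))
```

An ARG class carried by a single taxonomic class scores zero, and the index rises as the class is spread more evenly across lineages.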
Principal Component Analysis (PCA) followed by k-means clustering was used to evaluate clustering patterns in taxonomic composition across ARG classes. Taxonomic data were aggregated from Excel files containing taxonomic counts at the class level, grouped by ARG class and taxonomic class, and normalized to generate percent compositions. PCA was conducted using the scikit-learn library in Python on the resulting matrix of relative abundances, and the first two principal components were retained. To determine the optimal number of clusters for k-means, the elbow method was applied by calculating within-cluster sum of squares (WCSS) for values of k ranging from 1 to 10. The optimal k was identified using the KneeLocator package. K-means clustering was then performed on the PCA-transformed data using the selected k value. PCA scatterplots were generated with cluster annotations and ARG class labels using matplotlib, and text labels were adjusted using the adjustText library to minimize overlap.
Lastly, taxonomic assignments were linked to sample rainfall data and classified as Dry (<1 cm two-day rainfall) or Wet (≥1 cm). For each sample, counts of ARG-bearing contigs were aggregated by taxonomic class, then converted to relative abundances:

$$\mathrm{Relative\ abundance}_{i} = \frac{n_{i}}{\sum_{j} n_{j}}$$

where $n_{i}$ is the number of ARG-bearing contigs assigned to taxonomic class $i$ in the sample.
Mean relative abundances under Dry and Wet conditions were computed and used to generate side-by-side bar charts of the top fifteen taxa. This approach controlled for unequal sample numbers in each category and highlighted which bacterial lineages harbored ARGs during baseflow versus runoff-driven periods.
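The per-sample normalization and Dry/Wet averaging can be sketched as below. The data structure (a list of condition and count-dictionary pairs), the function names, and the counts are hypothetical; relative abundance is expressed here as a proportion, which is an assumption about scale.

```python
from collections import defaultdict


def relative_abundance(counts):
    """Share of a sample's ARG-bearing contigs assigned to each taxonomic class."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def mean_by_condition(samples):
    """Average per-sample relative abundances within each rainfall condition.

    `samples` is a list of (condition, {taxon: contig_count}) pairs; a taxon
    absent from a sample contributes zero to that condition's mean.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for condition, counts in samples:
        n_samples[condition] += 1
        for taxon, share in relative_abundance(counts).items():
            sums[condition][taxon] += share
    return {
        condition: {taxon: s / n_samples[condition] for taxon, s in taxa.items()}
        for condition, taxa in sums.items()
    }


# Hypothetical samples: two Dry, one Wet.
example = [
    ("Dry", {"Betaproteobacteria": 1, "Actinomycetes": 1}),
    ("Dry", {"Betaproteobacteria": 3, "Actinomycetes": 1}),
    ("Wet", {"Betaproteobacteria": 1}),
]
print(mean_by_condition(example))
```

Because each sample is normalized before averaging, a condition with more samples or deeper sequencing does not dominate the comparison.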
2.7. Plasmid Classification and ARG Association
Contigs identified as potentially plasmid-origin using PlasClass were cross-referenced with ARG predictions obtained from DeepARG. To address biases due to varying contig lengths or fragmented sequences, ARG classifications were summarized at the sample level using a binary presence/absence metric for each ARG class (1 = presence of at least one plasmid-predicted contig containing the ARG class; 0 = absence). Detection frequencies were calculated for each ARG class across all 82 samples by determining the proportion of samples in which each ARG class was present on plasmid-predicted contigs.
An identical workflow was applied for taxonomic analysis. Taxonomic assignments from Kaiju were matched to plasmid-predicted contigs. In cases of multiple assignments per contig, the most frequently occurring taxonomic class was selected. Each sample was then scored similarly, using a binary presence/absence approach for each identified taxonomic class associated with plasmids at the sample level.
Finally, Fisher’s exact tests with Benjamini–Hochberg correction were performed to identify ARG and taxonomic classes significantly enriched or depleted on plasmid-associated contigs (adjusted p < 0.05). Scoring presence/absence at the sample level made the comparison across samples less sensitive to contig fragmentation and variable assembly quality.
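The sample-level detection frequency and the Benjamini–Hochberg adjustment can be sketched in plain Python. This is illustrative only: the raw p-values are assumed to come from Fisher's exact tests computed elsewhere (for example with `scipy.stats.fisher_exact`), and the function names and numbers are hypothetical.

```python
def detection_frequency(presence):
    """Proportion of samples in which a class was found on plasmid-predicted contigs."""
    return sum(presence) / len(presence)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical presence/absence scores for one ARG class across eight samples.
print(detection_frequency([1, 0, 1, 1, 0, 1, 1, 1]))  # 0.75

# Hypothetical raw p-values for four classes.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

With many classes tested, the adjustment controls the expected share of false discoveries among the classes reported as enriched or depleted.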
4. Discussion
This study demonstrates that rainfall is a complex driver of resistome fluctuations in Lake Katherine. In contrast to previously documented increases in ARG abundance following rainfall in rivers, this study observed heterogeneous responses across ARG classes and locations within the lake, reflecting distinct hydrological and anthropogenic processes characteristic of suburban lakes. Rain events did not uniformly elevate total ARG levels; instead, gene-class-specific responses emerged. Several ARG classes (notably Aminoglycoside, Bacitracin, and Unclassified resistance genes) increased in abundance following precipitation, whereas others (such as Tetracycline, Multidrug, and Peptide resistance genes) declined in relative abundance. This decline suggests that rain events may also introduce environmental changes, such as dilution effects, shifts in microbial community composition, or extracellular DNA degradation, that reduce the relative abundance of some resistance determinants. For example, heavy rainfall can dilute background ARG concentrations or disrupt sediment-bound communities, leading to microbial turnover. In addition, extracellular ARGs released from lysed cells may be degraded by nucleases or bound to particulates, reducing their detectability in post-rain samples. These contrasting trends suggest that stormwater runoff can introduce pulses of ARGs even as dilution or microbial turnover suppresses certain classes. The impact of rainfall also varied spatially within the lake: the inlet site (closest to stormwater inputs) showed the strongest positive correlation between rainfall and ARG abundance, while downstream sites exhibited weaker or no relationships. Such non-uniform patterns highlight the complex interactions between hydrology and ARG dynamics, consistent with observations that precipitation can both concentrate and disperse resistance elements depending on local conditions [10,14].
Taxonomic profiling of ARG-bearing metagenomic contigs provided insight into the microbial reservoirs underlying these patterns. Pseudomonadota, particularly Betaproteobacteria, were disproportionately represented among ARG-associated bacteria across many gene classes. Members of the Betaproteobacteria (along with Alpha- and Gammaproteobacteria) dominated the communities carrying ARGs that increased post-rainfall, like aminoglycoside and multidrug resistance genes, suggesting that rain-mediated runoff transports bacteria carrying these resistance genes from urban environments or wastewater sources into the lake [51]. The prominence of these proteobacterial classes is consistent with their known association with polluted runoff and their role as vectors for disseminating clinically relevant ARGs in aquatic environments [16].
Another notable finding is the prevalence of plasmid-associated ARGs, highlighting the potential for horizontal gene transfer in this environment. Plasmids are well-recognized vectors for transferring ARGs across bacterial species, so their presence in the lake’s metagenome increases the risk of ARG dissemination [55]. Broad-spectrum resistance categories, like Unclassified or Multidrug ARGs, showed an enriched proportion of plasmid-borne genes, suggesting these mobile elements could readily spread resistance traits within the microbial community. This scenario is concerning because plasmid-mediated transfer can propagate ARGs beyond their original hosts, potentially seeding resistance into opportunistic pathogens or commensal bacteria in the water [56]. The detection of plasmid-linked ARGs in Lake Katherine thus reinforces the notion that environmental resistomes are not static repositories but interactive pools with the capacity for gene exchange. It demonstrates the importance of surveilling not just which resistance genes are present, but also their genetic context and mobility, to better anticipate patterns of ARG spread.
Collectively, these findings carry important public health implications for recreational freshwater systems. Suburban lakes like Lake Katherine are frequently used for fishing, swimming, or other forms of contact, yet they may serve as unnoticed reservoirs of antibiotic resistance. Although total ARG abundance remained relatively stable, storm-driven hydrological pulses caused significant reorganizations of the resistome, selectively enriching ARG classes linked to potentially mobile genetic elements and clinically important antibiotics, most notably aminoglycoside resistance genes. The concurrent increase in unclassified ARGs is also concerning, as these sequences either represent novel, previously undescribed resistance elements or reflect gaps in current annotation databases. Their response to rainfall suggests they may originate from sources that are not well-characterized in reference datasets, such as environmental or non-clinical bacteria. Future studies using functional metagenomics or metatranscriptomics will be essential to validate the resistance potential and ecological roles of these unclassified sequences. Their enrichment following rainfall suggests that stormwater inputs may not only mobilize known resistance elements but also introduce genetically diverse elements whose implications for human and environmental health remain poorly understood. Given that recreational exposure can involve dermal contact or incidental ingestion, such shifts in ARG profiles after rain events may present meaningful episodic health risks to lake users [1]. Even if lake water is not used for drinking, these ARGs and resistant microbes can be transported downstream or into underlying groundwater, contributing to the wider dissemination of resistance in the environment [4]. The fact that significant ARG fluctuations occur in response to ordinary rainstorms suggests that even well-maintained suburban water bodies are vulnerable to resistance pollution during wet weather. From a public health standpoint, this study underscores the need for regular monitoring of ARG levels in urban and suburban recreational waters and for integrating antibiotic resistance considerations into stormwater management and water quality guidelines. Proactive surveillance and mitigation strategies, such as timely advisories after heavy rains or improved runoff controls, could help minimize community exposure to environmental antibiotic resistance.
Despite its insights, this study has several limitations that highlight avenues for future research. First, the focus on recent rainfall as the primary explanatory variable means that other environmental factors were not accounted for in the analysis. Integrating ARG data with physico-chemical parameters such as turbidity, dissolved organic carbon (DOC), nutrient concentrations, and temperature could reveal how environmental conditions modulate ARG stability, transport, and persistence. Variables such as antecedent dry periods, seasonal temperature changes, nutrient loads, and water chemistry fluctuations could also influence ARG abundance and stability. Incorporating these factors into multivariate models or long-term datasets would provide a more holistic understanding of ARG dynamics [6,51]. Second, this investigation was limited to a single lake over the course of one year. Expanding surveillance to additional lakes and watersheds, encompassing different land use contexts and climates, is important to determine the generality of the rainfall-ARG relationships observed here. Comparative studies suggest that ARG profiles and drivers can vary significantly across freshwater systems; therefore, a broader geographic sampling would strengthen inferences about how ubiquitous these rain-driven resistome fluctuations are [20,21]. Third, the metagenomic approach relied on short-read sequencing, which limited the assembly of full-length ARG sequences and the resolution of their genetic contexts, like distinguishing whether a gene resides on a chromosome versus a plasmid. Future work employing long-read sequencing or hybrid assembly techniques could capture the complete architecture of ARG elements, allowing direct linkage of resistance genes to specific hosts and mobile elements [56]. Such detailed genomic context would improve source tracking of ARGs and help discern between endemic background resistance and newly introduced genes. In addition, ARGs were quantified at the DNA level without assessment of their expression or phenotypic impact. Follow-up studies using metatranscriptomics or culture-based assays would be valuable to evaluate whether the detected ARGs are actively expressed and confer resistance under in situ conditions. Fourth, the distribution of sampling events across rainfall conditions was uneven, with wet-weather samples outnumbering dry-weather samples by approximately 2:1 at each site (Supplementary Table S9). Although this imbalance accurately reflects precipitation patterns during the study period, it reduces statistical power for some class- and site-specific comparisons under dry conditions. Consequently, effect estimates for gene classes or locations with fewer dry samples may be associated with wider confidence intervals and increased risk of Type II error. Future studies should employ stratified sampling or targeted collection efforts to achieve a more balanced representation of both dry- and wet-weather conditions, thereby improving statistical power for dry-weather comparisons and enhancing the reliability of class-specific and site-specific analyses. Although co-occurrence or modularity-based network analyses may offer additional insight into ARG transfer dynamics, such approaches require more complete genomic resolution and validated linkage data than are available in the current short-read dataset. These analyses remain an important direction for future work. Addressing these gaps will deepen our understanding of how environmental pressures like precipitation shape the spread of antibiotic resistance. In an era of changing climate and expanding urbanization, developing this knowledge is crucial for crafting effective water management policies and public health interventions to curb the proliferation of antibiotic resistance [51].