Analysis of Microbial Communities and Pathogen Detection in Domestic Sewage Using Metagenomic Sequencing

Wastewater contains diverse microbes, and regular microbiological screening at wastewater treatment plants is essential for monitoring wastewater treatment and protecting environmental health. In this study, a metagenomic approach was used to characterize the microbial communities in the influent and effluent of a conventional domestic sewage treatment plant in the metropolitan city of Jeddah. Bacteria were the prevalent type of microbe in both the influent and effluent, whereas archaea and viruses were each detected at <1% abundance. Greater diversity was observed in effluent bacterial populations compared with influent, although both contained similar major taxa. These taxa consisted primarily of Proteobacteria, followed by Bacteroidetes and Firmicutes. Metagenomic analysis provided broad profiles of 87 pathogenic/opportunistic bacteria belonging to 47 distinct genera in the domestic sewage samples, with most having <1% abundance. The archaea community included 20 methanogenic genera. The virus-associated sequences were classified mainly into the families Myoviridae, Siphoviridae, and Podoviridae. Genes related to resistance to antibiotics and toxic compounds, Gram-negative cell wall components, and flagellar motility in prokaryotes were identified in metagenomes from both types of samples. This study provides a comprehensive understanding of microbial communities in influent and effluent samples of a conventional domestic sewage treatment plant and suggests that metagenomic analysis is a feasible approach for microbiological monitoring of wastewater treatment.


Introduction
The exponential growth of the world's population, along with urbanization and socioeconomic development, has resulted in the generation of billions of tons of solid waste and wastewater every year [1]. About 80% of the wastewater flows back into the ecosystem without being treated or reused (https://www.unwater.org/water-facts/quality-andwastewater), carrying a wide variety of contaminants, including microorganisms, heavy metals, and organic as well as inorganic compounds [2-4]. Natural water quality is adversely affected by sewage discharge because it drastically changes the physicochemical as well as the microbial composition of freshwater sources [2,4-6]. Various types of wastewater treatment plants (WWTPs) are used in different countries to improve the physicochemical and microbiological quality of discharged wastewater effluent [2,3,7,8]. WWTPs employ a series of processes, including preliminary treatment, secondary treatment, secondary clarification, tertiary treatment, disinfection, and sludge processing [9]. Microbial community composition and diversity in treated water are shaped by both operating conditions and influent characteristics [3,9]. The core bacterial community in WWTPs is mainly composed of Proteobacteria, chiefly β-proteobacteria followed by γ-proteobacteria, and includes the genera Dokdonella and Zoogloea [3,10]. Nitrospira is dominant in activated sludge, and Arcobacter taxa are highly abundant in raw sewage [3,11]. A 16S amplicon study of [...]
[...] mechanical aeration located in the metropolitan city of Jeddah. The capacity of the WWTP was about 2000 m³/day. Influent samples were collected from the raw sewage, and effluent samples were collected from the final treated-water storage tank used for irrigation; all samples were collected in clean, sterilized containers. The samples were kept on ice during transportation to our laboratory at King Fahd Medical Research Center and were stored at −20 °C before processing for DNA extraction.
Samples were centrifuged at 8000 × g for 20 min at ambient temperature. Total genomic DNA was extracted from 200 mg of pelleted biomass from each replicate sample using the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions. The concentration of DNA was measured with a Qubit dsDNA BR assay using a Qubit fluorometer (Invitrogen, Carlsbad, CA, USA).

Shotgun Sequencing and Data Analysis
Extracted DNA from three replicates of each sample type was pooled at an equal concentration of 20 ng/µL to obtain a homogeneous, representative total genomic DNA preparation and was processed as one sample for shotgun sequencing. DNA libraries with ~400-bp insert size were prepared using the Nextera XT DNA library preparation kit (Illumina, Inc., San Diego, California, USA). Sequencing was performed with 2 × 300 bp chemistry on a MiSeq platform (Illumina, Inc.), using a V3 cartridge from a 600-cycle kit (Illumina, Inc.) [23]. The raw sequence reads were uploaded to Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) and were analyzed using the default settings [24]. Bacterial communities were classified at different taxonomic levels by annotating the open reading frames (ORFs) predicted from the sample metagenomes against the NCBI reference sequence (RefSeq) database in MG-RAST. The identification of potential pathogenic taxa was further confirmed with Kaiju, GOTTCHA2, and PATRIC, which use protein-level classification, gene-independent signature-based classification, and the k-mer-based Kraken 2 algorithm, respectively. For functional analysis, the ORFs were mapped to the SEED subsystem and KEGG Orthology (KO) databases. De novo assemblies were prepared from high-quality reads using the MetaSPAdes algorithm. Alpha diversity was assessed with the Chao1 and Shannon diversity indices using the Calypso 8.84 server [25]. Sequencing data were deposited into the European Nucleotide Archive (ENA) under accession numbers SAMEA7130608 and SAMEA7130609.
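To make the two alpha diversity indices named above concrete, the following is a minimal sketch of how they can be computed from a vector of per-taxon read counts. It assumes the bias-corrected form of Chao1 and the natural-log form of the Shannon index; the exact variants used by the Calypso server are not stated here, and the counts are hypothetical values chosen only for illustration.

```python
import math
from collections.abc import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1_richness(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical per-taxon read counts (assumption, not data from this study).
toy_counts = [10, 5, 2, 2, 1, 1, 1]
print(f"Shannon: {shannon_index(toy_counts):.3f}")
print(f"Chao1:   {chao1_richness(toy_counts):.1f}")
```

Chao1 responds only to the number of observed and rare taxa, whereas the Shannon index also reflects how evenly reads are spread across taxa, so the two indices can move in opposite directions for the same pair of samples.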

Microbial Diversity
A massive amount of wastewater (>300 km³) is produced around the globe each year, and only about 60% is treated before being released into the aquatic environment [1]. Several studies have identified diverse microbial communities in wastewater that could influence the microbial ecology of the connected ecosystem [6,26,27]. In particular, the expanding use of treated wastewater for irrigation and industry in semi-arid countries such as Saudi Arabia makes it essential to characterize the microbial quality of these conventionally treated water sources [19]. In agreement with previous studies, our metagenomic analysis revealed complex microbial communities in the domestic sewage samples collected from a conventional sewage treatment plant in the coastal city of Jeddah. In the rarefaction curves, both samples reached a saturation plateau (Figure 1A). Chao1 analysis demonstrated a decrease in the estimated species richness in effluent compared with influent (Figure 1B). The Shannon index analysis revealed increased diversity in effluent compared with influent (Figure 1C).
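A rarefaction curve of the kind described above plots the expected number of taxa against subsampling depth, and a plateau indicates that deeper sequencing would add few new taxa. The sketch below uses Hurlbert's analytical formula for the expected richness; this is one common way to obtain such a curve and is an assumption here, not necessarily the procedure used by MG-RAST or Calypso. The counts are hypothetical.

```python
from math import comb
from collections.abc import Sequence


def rarefied_richness(counts: Sequence[int], depth: int) -> float:
    """Expected number of taxa in a random subsample of `depth` reads.

    Hurlbert's formula: sum over taxa of 1 - C(N - N_i, depth) / C(N, depth),
    where N is the total read count and N_i the count of taxon i.
    """
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    # Each term is the probability that the taxon appears at least once.
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)


# Hypothetical per-taxon read counts (assumption, not data from this study).
toy_counts = [10, 5, 2, 2, 1, 1, 1]
curve = [rarefied_richness(toy_counts, d) for d in range(sum(toy_counts) + 1)]
for depth in (1, 5, 10, 22):
    print(depth, round(curve[depth], 2))
```

Exact binomial coefficients keep the sketch simple but become slow for read counts in the hundreds of thousands; a production analysis would typically use log-gamma arithmetic or repeated random subsampling instead.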

Bacteria
Bacteria were the prevalent type of microbe in both influent and effluent samples (influent = 98.9%, effluent = 98.1%), whereas archaea and viruses were each detected at less than 1% abundance. In a previous study from China, a comparable proportion of bacteria (95.1%) was found in sewage treatment plant influent [26]. We found 33 phyla in the studied samples, including 28 bacterial and five archaeal phyla. Among those, 25 bacterial phyla and four archaeal phyla were common to both influent and effluent samples. Moreover, 241 families, 583 genera, and 1417 species of bacteria were retrieved from both samples. Among archaea, 29 families, 60 genera, and 88 species were identified. In addition, 150 viruses, mainly bacteriophages, were identified in influent and 240 in effluent samples. Domestic sewage is mainly composed of the microbial flora of human and animal wastes, vegetables, food, and soil. The diverse bacteria in the influent, a treatment process lacking a membrane bioreactor and tertiary treatment, and local environmental conditions of relatively high temperature may account for the high diversity of the bacterial community in the effluent of conventional biologically treated wastewater [2]. Moreover, bacteria play a crucial role in wastewater treatment [28]. Similarly, Giwa et al. detected more diverse bacteria and measurable concentrations of archaea and viruses in the effluent compared with the influent in a metagenomic analysis of samples from a biological WWTP [26].
The most abundant phylum in the sewage samples was Proteobacteria, followed by Bacteroidetes, Firmicutes, and Actinobacteria (Figure 2A). The relative abundance of 18 minor phyla was less than 1% in both samples, whereas Fibrobacteres, Dictyoglomi, and Candidatus Poribacteria were uniquely identified in the effluent metagenome. Proteobacteria accounted for 85.9% and 63.8% of the classified sequences in influent and effluent samples, respectively (Figure 2A). Recently, a study of the global diversity of bacterial communities in 269 WWTPs showed the predominance of Proteobacteria in wastewater [3] and found that this phylum plays a broad role in organic matter degradation and nutrient cycling [2]. In the current study, phylotypes belonging to the ε-proteobacteria class dominated Proteobacteria in the influent (63.02%) but sharply decreased to 0.6% in the effluent. β-proteobacteria were more abundant in the effluent (39.6%) than in the influent (4.8%), whereas γ-proteobacteria were detected at a relatively high abundance in both samples (influent = 10.1%, effluent = 14.8%). These results show that the Proteobacteria community changed markedly between the raw and treated wastewater. However, a detailed comparison between the effluent and influent wastewater communities identified shared taxa between the two types of samples. A positive correlation has previously been identified between the abundance of ε-proteobacteria and the occurrence of antibiotic contaminants such as penicillins, tetracycline, quinolones, sulfonamides, and triclosan in the influent of urban wastewater, whereas the occurrence of these contaminants was negatively correlated with the abundance of β-proteobacteria and Firmicutes [29]. In comparison, β-proteobacteria have been observed abundantly in treated wastewater irrespective of the seasonal temperature effect and are mainly responsible for organic and nutrient removal [30].
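The class-level shifts reported above can be put on a common scale with a log2 fold change between influent and effluent. The sketch below uses only the percentages quoted in the text; the dictionary keys and the function name are illustrative choices, and a single pooled sample per type does not support any statistical test.

```python
import math

# Class-level relative abundances (%) as quoted in the text above.
influent = {"epsilon": 63.02, "beta": 4.8, "gamma": 10.1}
effluent = {"epsilon": 0.6, "beta": 39.6, "gamma": 14.8}


def log2_fold_change(before: float, after: float) -> float:
    """Return log2(after / before); positive values mean enrichment after treatment."""
    if before <= 0 or after <= 0:
        raise ValueError("both abundances must be positive")
    return math.log2(after / before)


changes = {name: log2_fold_change(influent[name], effluent[name]) for name in influent}

# Print classes from most depleted to most enriched.
for name, value in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name}-proteobacteria: {value:+.2f}")
```

On this scale the ε-proteobacteria fall by roughly a hundredfold and the β-proteobacteria rise by roughly eightfold, while the γ-proteobacteria change by less than a factor of two.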
Diversity 2021, 13, x FOR PEER REVIEW
Bacteroidetes increased to 18.9% in the effluent from 5.4% in the influent and were mainly represented by the classes Flavobacteria, Sphingobacteria, and Cytophagia. Taxa from another class, Bacteroidia, were found at a relatively higher abundance in the influent (4.18%) compared with the effluent (2.98%). The Clostridia class dominated Firmicutes (influent = 1.9%, effluent = 3.2%). Bacteria from the phyla Firmicutes and Bacteroidetes have been previously reported to compose the predominant community in human stool samples [31,32], but they were relatively less abundant than Proteobacteria in our sewage samples. The data suggest that the bacterial community of sewage influent mainly had a nonfecal origin, as previously observed in a sewage analysis of US cities [33]. Microbes in the human gut are mainly anaerobes and likely have low survival rates after being discharged into an aerobic sewage environment. Meanwhile, the mixing of fecal and nonfecal bacteria leads to a novel microbial community composition in sewage [27]. The variation in the relative abundance of Bacteroidetes observed across several studies for influents (5.1-15.7%) and effluents (2.4-12.5%) can probably be attributed to the types of wastewater processed, modifications of the biological treatment conditions, and variation in environmental factors among geographical regions [34]. For example, Proteobacteria were most abundantly identified in aerobic treatment processes, whereas Bacteroidetes taxa were abundantly found in anaerobic treatment processes [35].
At the family level, the most dominant families differed between the influent and effluent samples. In influent, 12 families were identified at ≥1% relative abundance and accounted for 75.5% of the total sequence reads. The families Campylobacteraceae (53%) and Helicobacteraceae (7.2%) from ε-proteobacteria were dominant in influent (Figure 2B). Gram-negative bacteria from the families Enterobacteriaceae and Pseudomonadaceae and the sulfate-reducing bacterial families Desulfovibrionaceae and Desulfobulbaceae were detected in influent at 1% to 2%. The human gut-associated family Bacteroidaceae was found at a relatively higher abundance in influent (2.2%) compared with effluent (1.5%), as previously reported [33]. In effluent, 25 families were identified as having ≥1% abundance and accounted for 68.9% of total sequence reads. The families Comamonadaceae (10.9%), Rhodocyclaceae (10.3%), and Burkholderiaceae (6.9%) from β-proteobacteria and Flavobacteriaceae (6.4%) from the phylum Bacteroidetes were dominant in effluent (Figure 2B). The sulfate-reducing bacterial families had less than 1% abundance, and Enterobacteriaceae was detected at 1.2% in effluent. The relative abundance of Pseudomonadaceae increased to 4.4% in effluent compared with 1.0% in influent. Consistent with our findings, Gonzalez-Martinez et al. identified the families Campylobacteraceae, Bacteroidaceae, and Comamonadaceae at high concentrations in 10 different wastewater treatment systems (WWTSs) and their influents [36]. Previously, Rhodocyclaceae and Comamonadaceae were described as the core families in activated sludge wastewater treatment systems responsible for denitrification and aromatic compound degradation [37].
A higher number of genera was identified in effluent compared with influent. Among the 583 genera identified in the current study, 171 genera were found uniquely in effluent and 412 genera were shared between influent and effluent. Consistent with previous studies, variation in the relative abundance of different genera was observed between influent and effluent samples [26]. In influent, 11 genera were identified as having ≥1% abundance and represented 65.9% of the total genera. The genus Arcobacter from Campylobacteraceae was predominant in the influent (49.39%) and included two dominant species, Arcobacter butzleri (35.38%) and Arcobacter nitrofigilis (14.01%, Figure 2C). Other relatively dominant genera included Sulfurimonas (3.9%), Bacteroides (2.22%), Sulfurospirillum (2.19%), Desulfovibrio (1.7%), Campylobacter (1.4%), and Helicobacter (1.3%), which had less than 1% abundance in effluent, with the exception of Bacteroides (1.52%, Figure 2C). Arcobacter has previously been reported as the most dominant bacterial genus in influent, a finding supported by metagenomic analyses of wastewater samples from multiple countries, including Canada, China, Germany, Saudi Arabia, and the United States [3,11,38,39]. Bacteria from this taxon have been associated with both human and animal infections and can cause gastroenteritis, mastitis, and abortion in livestock [40]. Arcobacter, along with Pseudomonas, Acinetobacter, Bacteroides, Aeromonas, and Trichococcus, has been identified as a consistent genus in sewage influent; these genera are considered residents of the sewer infrastructure [16,36]. Among the core genera described, Clostridium and Bacteroides are consistent genera in the human gut microbiota, which suggests that human gut bacteria are part of the core genera of domestic wastewater [41].
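The shared and unique genus counts above amount to a set partition of the two genus lists. The sketch below shows that partition on a tiny, hypothetical membership example (the genus names appear in this paper, but which sample each one is assigned to here is an assumption made purely for illustration). It also checks the reported totals: 412 shared plus 171 effluent-only genera gives 583, which, read at face value, leaves no genus exclusive to influent.

```python
def partition_taxa(influent: set[str], effluent: set[str]) -> dict[str, set[str]]:
    """Split two taxon sets into shared, influent-only, and effluent-only groups."""
    return {
        "shared": influent & effluent,
        "influent_only": influent - effluent,
        "effluent_only": effluent - influent,
    }


# Hypothetical membership, for illustration only (not the study's genus lists).
influent_genera = {"Arcobacter", "Bacteroides", "Pseudomonas"}
effluent_genera = {"Arcobacter", "Bacteroides", "Pseudomonas", "Acidovorax", "Dechloromonas"}

groups = partition_taxa(influent_genera, effluent_genera)
for label, members in groups.items():
    print(label, sorted(members))

# Consistency check on the counts reported in the text above.
reported_total, reported_shared, reported_effluent_only = 583, 412, 171
implied_influent_only = reported_total - reported_shared - reported_effluent_only
print("implied influent-only genera:", implied_influent_only)
```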
In effluent, 15 genera were found to have an abundance of more than 1% and represented 36.7% of the total genera. The genera Acidovorax (4.52%), Dechloromonas (4.37%), Pseudomonas (3.95%), Flavobacterium (3.21%), and Burkholderia (2.84%) were found at relatively higher abundance in effluent (Figure 2C). Species of Acidovorax and Dechloromonas reported from WWTSs are capable of aerobic heterotrophic growth and of anaerobic growth through denitrification [42,43]. Moreover, the sulfate-reducing bacteria (i.e., Sulfurospirillum, Desulfomicrobium, Desulfuromonas, Desulfovibrio, Desulfobacter, and Thiobacillus) are commonly found in WWTSs and other aquatic environments [44]. Metagenomic analysis provided us with broad profiles of bacterial pathogens in wastewater. Li et al. identified 113 pathogenic bacterial species from influent, effluent, and activated sludge samples from a sewage treatment plant [45]. In the current study, a relative increase in the diversity of known pathogenic and opportunistic bacteria was found in effluent. Among the 87 pathogenic/opportunistic bacteria identified (representing 47 distinct genera), 62 were common to influent and effluent samples, whereas 24 were unique to effluent. Among the ESKAPE pathogens, Enterococcus faecium, Klebsiella pneumoniae, Acinetobacter baumannii, and Pseudomonas aeruginosa were found in both types of samples based on metagenomics (Table 1). However, the relative abundance of the pathogenic bacteria was <0.1% in the studied samples, with the exception of Pseudomonas aeruginosa, which was detected at a relatively higher abundance in effluent (1.19%). Al-Jassim et al. also reported Pseudomonas spp. from a conventional wastewater treatment plant in Saudi Arabia using 16S amplicon sequencing and further showed, using a culture-dependent method, that Pseudomonas aeruginosa was mainly found in the influent and non-chlorinated effluent samples [11].
In the influent, taxa from the pathogenic genera Bacteroides, Campylobacter, and Helicobacter had >1% abundance, and Campylobacter and Helicobacter were substantially decreased in effluent (<0.1% abundance). Burkholderia had 2.84% abundance in effluent compared with 0.57% in influent (Table 1). In previous studies of conventional wastewater treatment plants, the bacterial pathogens A. baumannii, K. pneumoniae, and E. coli were often detected in sewage by both molecular [26] and culture-dependent methods [18]. However, the identification of diverse pathogens in the effluent samples suggests that the sedimentation procedure does not entirely prevent microbiota from being transferred into effluent [26].

Archaea
Archaea were identified as having the same relative abundance (0.6%) in the metagenomes of both samples, but variations were noted in the distribution of taxa and their abundance between influent and effluent. The archaeal sequence reads aligned predominantly to Euryarchaeota (≥90%) in both samples, whereas Crenarchaeota increased to 8.0% in effluent from 2.1% in influent (Figure 3A). Similarly, Qin et al. reported that Euryarchaeota was dominant, followed by Crenarchaeota, in samples from WWTPs [46]. The archaea community included 20 methanogenic genera that overall represented 85.3% of archaea-associated sequence reads in influent and 60.2% in effluent; they were taxonomically classified mainly to Methanoregula, Methanosarcina, Methanococcoides, Methanospirillum, Methanococcus, Methanoculleus, and Methanocaldococcus (Figure 3B). In addition, non-methanogenic archaea, including the genera Thermococcus, Pyrococcus, and Archaeoglobus, were identified with more than 1% abundance in both influent and effluent metagenomes (Figure 3B). Methanogens are the most diverse group of archaea and a focus of research because of their substantial contribution to global methane emissions, as well as their role in wastewater treatment. Previously, the families Methanosaetaceae and Methanosarcinaceae were reported to be the predominant methanogens among the archaeal population in WWTSs [14]. These findings show that archaea participate in the biodegradation process [46].
Figure 3. Taxonomic analysis of archaea-associated sequence reads in the influent and effluent samples. Percent relative abundance of (A) phyla and (B) genera identified as methanogenic or detected at relatively higher abundance. The relative abundance was calculated by normalizing the sequence reads of each archaeal taxon to the total number of archaea-associated sequence reads in the respective metagenome.
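The normalization described in the Figure 3 caption (reads per taxon divided by all archaea-associated reads in the same metagenome) can be sketched as follows. The read counts are hypothetical placeholders chosen so that the resulting percentages are easy to verify by hand; they are not the study's counts.

```python
def relative_abundance(read_counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon read counts to percentages of the domain-level total."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("at least one read is required for normalization")
    return {taxon: 100.0 * count / total for taxon, count in read_counts.items()}


# Hypothetical archaea-associated read counts (assumption, not study data).
toy_archaeal_reads = {"Euryarchaeota": 900, "Crenarchaeota": 80, "Other": 20}

percentages = relative_abundance(toy_archaeal_reads)
for taxon, share in percentages.items():
    print(f"{taxon}: {share:.1f}%")
```

Because the denominator is restricted to archaeal reads, these percentages describe composition within the archaea and are not comparable with abundances expressed against all classified reads, such as the 0.6% reported for archaea as a whole.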

Virus
Bacteria are the main microorganisms in wastewater, and bacteriophages regulate the microbial community composition in wastewater systems [47]. In the current study, the virome was analyzed based on the metagenomes using the RefSeq database. Only 0.3% of sequence reads in influent and 0.6% in effluent were related to viral genotypes, and they were taxonomically classified into 10 families. Around 5% of identified viruses could not be classified to existing viral families. The most frequent families of viruses found were Myoviridae (influent = 32.9%, effluent = 43.9%), Siphoviridae (influent = 32.3%, effluent = 25.6%), and Podoviridae (influent = 25.9%, effluent = 20.5%). These families comprise bacteriophages and accounted for ≥90% of the virus sequence reads. Among the 76 Myoviridae-associated phages, Burkholderia, Prochlorococcus, Enterobacteria, and Synechococcus phages were detected at relatively higher abundances (Figure 4). From Siphoviridae, 105 phages were detected, including Pseudomonas, Flavobacterium, Stenotrophomonas, and Mycobacterium phages, whereas Podoviridae was represented by 58 phages, including Bordetella, Enterobacteria, and Pseudomonas phages, which were relatively more abundant (Figure 4). The families Poxviridae and Adenoviridae, which include potential human pathogens, were identified at ≤0.1% abundance in both types of samples. The lytic ssDNA virus family Microviridae was identified uniquely in effluent (0.3%). Inconsistencies have been observed with regard to the abundance of viruses in wastewater systems. For example, in metagenomic detection of pathogens from a sewage treatment plant, Giwa et al. identified viruses at a relatively high abundance of 7% in the effluent [26]. In contrast, another study retrieved a minute concentration of viruses in wastewater [48]. Bacteriophages have been identified as the predominant members of the viral microbiomes studied [26,49]. They influence microbial communities through interactions with their specific bacterial hosts.
We found that bacteriophages dominate virus-associated sequences in sewage metagenomes [49]. The metagenomic analysis in this study does not represent a complete profile of viral biodiversity, which is a limitation of this study: total genomic DNA extracted from pellets of the centrifuged samples was used for shotgun sequencing, and no virus-specific protocols were adopted to obtain complete coverage of DNA and RNA viruses in the studied samples. However, the findings of this study are consistent with previous studies that reported bacteriophages as predominant in wastewater samples [26].

Metagenomes and Function Analysis
A total of 176,540 sequence reads containing 54,668,916 bp were obtained from influent, and 358,528 sequence reads containing 112,299,165 bp were obtained from effluent. Dereplication identified 6997 sequences as artificial duplicate reads in influent and 1036 sequences in effluent sample. Of the sequences tested, 5.8% in influent and 1.5% in effluent failed to pass the quality control pipeline. Mean G+C content was 37 ± 13% in influent and 55 ± 13% in effluent. The mean length of reads was 310 ± 108 bp for influent and 313 ± 75 bp for effluent. The total number of predicted protein features was 90,271 in influent and 331,250 in effluent. The predicted rRNA features were 729 in influent and 1515 in effluent.
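The summary statistics above can be cross-checked with simple arithmetic. The sketch below divides the reported base-pair totals by the reported read counts and also shows how a per-sequence G+C percentage is typically computed. The function names are illustrative, and the assumption that the reported totals and the reported mean lengths refer to the same set of reads is not confirmed by the text.

```python
# Read counts and total base pairs as reported in the text above.
reported = {
    "influent": {"reads": 176_540, "bases": 54_668_916},
    "effluent": {"reads": 358_528, "bases": 112_299_165},
}


def mean_read_length(reads: int, bases: int) -> float:
    """Average read length in bp, given a read count and total base pairs."""
    if reads <= 0:
        raise ValueError("read count must be positive")
    return bases / reads


def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases of one sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


mean_length = {name: mean_read_length(**stats) for name, stats in reported.items()}
for name, length in mean_length.items():
    print(f"{name}: {length:.1f} bp")
```

The ratios come to roughly 309.7 bp for influent and 313.2 bp for effluent, which agrees with the reported mean read lengths of 310 bp and 313 bp.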
A significant proportion of both metagenomes was associated with housekeeping functions. The genes for amino acids and derivatives; protein metabolism; carbohydrates; nucleosides and nucleotides; respiration; cell wall and capsule; cofactors, vitamins, prosthetic groups, and pigments; and virulence, disease, and defense were present at relatively high abundance (3.4-12.9%) in both samples (Figure 5A). SEED subsystem classification suggested that pathways related to resistance to antibiotics and toxic compounds, Gram-negative cell wall components, flagellar motility in prokaryotes, and inorganic sulfur assimilation were higher in the influent than in the effluent sample (Figure 5B). Genes for RNA processing and modification, DNA repair, protein degradation, and capsular and extracellular polysaccharides were present at relatively higher abundance in effluent than in influent (Figure 5B). The results from this study are consistent with the reported occurrence of a large core of genes in wastewater that are essential for microbial cellular and community functions [7,28].
Genes related to cobalt-zinc-cadmium resistance (1.9%) and multidrug resistance efflux pumps (2.4%) were relatively abundant in influent compared with effluent (>1%). Moreover, β-lactamase, aminoglycoside adenylyltransferase, and methicillin resistance genes were detected in both samples at less than 1% abundance. Domestic sewage is considered a potential source of antimicrobial resistance genes. Charmaine et al. identified a variety of antimicrobial resistance genes conferring multidrug resistance to quinolone, β-lactam, rifamycin, chloramphenicol, bacitracin, aminoglycoside, sulfonamide, tetracycline, and vancomycin in a metagenomic study of municipal WWTPs [12]. The diversity of antimicrobial resistance genes in wastewater depends on the type of waste being processed. From an environmental health perspective, it is crucial to treat raw sewage appropriately because it may enter the connected aquatic ecosystem.
Similarly, in the KO database, metabolism-associated genes were found at more than 50% relative abundance, followed by genes related to genetic information processing, environmental information processing, and cellular processes (Figure 6A). Further analysis at the subcategory level revealed that the pathways for alanine, aspartate, and glutamate metabolism [PATH:ko00250] and cysteine and methionine metabolism [PATH:ko00270] were detected at a relatively higher abundance in influent (9.4% and 5.3%, respectively) than in effluent (3.5% and 2.2%, respectively) (Figure 6B). The ABC transporters [PATH:ko02010] and bacterial secretion system [PATH:ko03070] pathways from the membrane transport category were identified at relatively higher abundance in effluent (6.6% and 2.2%, respectively) than in influent (3.2% and 1.7%, respectively) (Figure 6B). The genes related to pathways for glycolysis/gluconeogenesis [PATH:ko00010], bacterial chemotaxis [PATH:ko02030], flagellar assembly [PATH:ko02040], peptidoglycan biosynthesis [PATH:ko00550], and nitrogen metabolism [PATH:ko00910] were also detected at a relative abundance of ≥1% in both samples (Figure 6B). Previous metagenomic studies have also reported high proportions of these genes in wastewater. These genes are responsible for essential metabolic activities in microbial communities and are necessary in WWTSs [28]. In agreement with our findings, Sidhu et al. reported variation between pre- and post-treatment sewage [50]. They found genes for motility, DNA repair, protein metabolism, and respiration at significantly higher abundance in pretreated sewage, whereas genes related to amino acids and their derivatives, carbohydrate metabolism, fatty acids, and lipid metabolism were dominant in treated sewage samples [50].

Conclusions
This study provides a broad metagenomic analysis of the taxonomic and functional profiles of microbial communities in the influent and effluent of a conventional sewage treatment plant in Jeddah. The core bacterial community observed in this study was similar to the previously reported microflora of various conventional sewage treatment plants. Some of the microbial taxa were detected in both the influent and effluent samples, but the overall microbial community changed substantially from influent to effluent at the lower taxonomic levels. Although we found pathogenic bacteria and antimicrobial and metal resistance genes in both influent and effluent samples, the levels were trivial, and the environmental risks associated with using the effluent for irrigation are most likely limited. The wastewater metagenomes carried mainly housekeeping genes along with functional pathways associated with the wastewater treatment process. Functional gene data obtained from metagenomic analysis can be used to enhance bioaugmentation and thereby improve contaminant degradation in the wastewater treatment process. The data presented in this study are from one WWTP, and the low number of samples is a limitation of this study. Further research is recommended to investigate the microbial diversity and determine the distribution of pathogens and antimicrobial resistance genes on a broad scale and over a long time in WWTPs from which effluent is used locally for irrigation and other non-potable purposes.

Informed Consent Statement: Not applicable.
Data Availability Statement: Sequencing data is available from the European Nucleotide Archive (ENA) under accession numbers SAMEA7130608 to SAMEA7130609.