Plasmid Genomes Reveal the Distribution, Abundance, and Organization of Mercury-Related Genes and Their Co-Distribution with Antibiotic Resistant Genes in Gammaproteobacteria

Mercury (Hg) pollution poses human health and environmental risks worldwide, as it can have toxic effects and causes selective pressure that facilitates the spread of antibiotic resistant genes (ARGs) among microbes. More and more studies have revealed that numerous Hg-related genes (HRGs) can help to resist and transform Hg. In the present study, we systematically analyzed the HRG distribution, abundance, organization, and their co-distribution with ARGs, using 18,731 publicly available plasmid genomes isolated from a Gammaproteobacteria host. Our results revealed that there were many Hg-resistant (mer) operon genes but they were not extensively distributed across plasmids, with only 9.20% of plasmids harboring HRGs. Additionally, no hgcAB genes (which methylate Hg to create methylmercury) were identified in any of the analyzed plasmids. The host source significantly influenced the number of HRGs harbored by plasmids; plasmids isolated from humans and animals harbored a significantly smaller number of HRGs than plasmids isolated from the wastewater and sludge. HRG clusters displayed an extremely high organizational diversity (88 HRG cluster types), though incidences of more than half of the HRG cluster types was <5. This indicates the frequent rearrangement among HRGs in plasmids. The 1368 plasmids harboring both HRGs and ARGs, were dominated by Klebsiella, followed by Escherichia, Salmonella, and Enterobacter. The tightness of the HRG and ARG co-distribution in plasmids was affected by the host sources but not by pathogenicity. HRGs were more likely to co-occur with specific ARG classes (sulfonamide, macrolide-lincosamide-streptogramin, and aminoglycoside resistance genes). Collectively, our results reveal the distribution characteristics of HRGs in plasmids, and they have important implications for further understanding the environmental risks caused by the spread of ARGs through the plasmid-mediated co-transfer of ARGs and HRGs.


Introduction
Mercury (Hg) occurs naturally in the Earth's biogeochemical system. However, centuries of human activities, such as coal burning, mining, and other industrial activities, have massively increased the amounts of Hg in the atmosphere, ocean, and terrestrial systems [1]. Hg is a global environmental concern as its most chemical forms are toxic to all living organisms, especially methylmercury, which is a potent neurotoxin that affects human and animal development and health [2].
Microbes play an important role in the global geochemical cycle of Hg [3][4][5][6]. They have evolved various mechanisms to resist and transform Hg and organomercury compounds.
These mechanisms include the reduction of Hg(II) to gaseous Hg(0), the oxidization of Hg(0) to Hg(II), the demethylation of organomercury compounds, and the methylation of Hg(II) [4,[6][7][8][9]. Many microbial genes, which are denoted as Hg-related genes (HRGs) in this study, are implicated in these mechanisms. There is a growing consensus that there are more HRGs in microbial genomes than other metal-related genes.
The microbial detoxification of Hg mainly relies on the Hg-resistant (mer) operon, which has been reported to have originated in thermophiles [10,11]. Among the microbial metal resistant systems, only the mer operon detoxification mechanism leads to a largescale transformation of its toxic target [4]. The mer operon can be located on transposons, plasmids, and chromosomes, and its components vary among the microbial genomes. The microbial mer genes include merA (encoding Hg(II) reductase), merB (encoding organomercury lyase), merP (encoding periplasmic Hg(II) scavenging protein), merT, merC, merE, merF, and merG (all encoding inner membrane-spanning proteins), and merR and merD (regulatory genes) [10]. Additionally, a two-gene cluster, hgcA and hgcB, is required for the Hg methylation in anaerobic microorganisms, typically in sulfur-reducing bacteria [9]. hgcA encodes a putative corrinoid protein with a strictly conserved cysteine that is proposed to be the ligand for the cobalt in the corrinoid cofactor, whereas hgcB encodes a ferredoxin-like protein, thought to be an electron donor to HgcA [9]. The hgcAB gene set has been found in nearly all anaerobic (but not aerobic) environments, including the oxygenated layers of the open ocean [12][13][14][15]. However, it has been reported to be effectively absent in mammalian microbiomes [12]. Furthermore, regarding the oxidization of Hg(0) to Hg(II) and the merindependent oxidative demethylation, the biochemical pathways and the corresponding genes have not been identified [16].
Environmental Hg, which is directly toxic to living organisms, plays important roles in the spread of drug resistance among bacteria. Exposure to Hg and antibiotics has led to intersecting evolutionary pressures on bacteria, leading to cross-resistant bacterial phenotypes [17,18]. More and more studies have reported that bacteria from both clinical and natural environments can harbor a wealth of antibiotic resistant genes (ARGs) and HRGs, and these two types of genes are closely linked [19][20][21]. This may be explained by an adaptive evolution for bacterial genomes against Hg toxicity, and ancient antimicrobial agents before antibiotics were discovered for large production and use.
PCR-and metagenome-based methods have been extensively used to investigate the HRG distribution and the association of HRGs with AGRs [20]. However, the PCR-based approach can underestimate the abundance of HRGs when multiple copies of target genes are present in the microbial genomes. The metagenome-based approach can cause false positives due to the limitation regarding the short-read length. In most cases, neither PCRnor metagenome-based methods can link specific genes to their respective hosts [22]. Therefore, the two approaches provide insufficient information to determine the distribution of HRGs and ARGs in individual genomes.
Plasmids are highly transmissible genetic elements that play an important role in the spread of resistance genes [23,24]. However, our knowledge of the HRG distribution, abundance, organization, and co-distribution with ARGs in plasmids remains limited. So far, tens of thousands of plasmid genomes from the National Center for Biotechnology Information (NCBI) genome database were available, and more than half of the total available sequenced plasmids derived from a Gammaproteobacteria host. In this study, bioinformatics analyses were performed to annotate the HRGs and ARGs in plasmids isolated from a Gammaproteobacteria host. We then revealed the organizational diversity of HRGs, and the distribution of HRGs in plasmids isolated from various host sources. Additionally, we conducted an analysis of the physical genetic distances between ARGs and HRGs.

Retrieval of the Plasmid Genomic Sequences
All available plasmid genomes in Gammaproteobacteria (as of 6 December 2021) were retrieved from the NCBI genome database. Plasmids from the same strain that were sequenced more than once were randomly dereplicated. As a result, 18,731 plasmids were retained and annotated using a rapid prokaryotic sequence annotation algorithm implemented in Prokka v1.12 [25] to avoid any gene calling deviation. The assembly number and related information for all plasmid genomes are listed in Table S1.
The plasmid genetic location information, prokaryotic host, the taxonomic classification of the host, isolation source, and other information were retrieved from the original GenBank file in batches using an in-house Perl program. The protein-coding gene sequences of each plasmid were extracted from the GenBank files produced by Prokka and used for the HRG and ARG annotation. According to the "isolation source" and the "host information", the host source of plasmids harboring HRGs were divided into the following categories: "human", "animal", "wastewater and sludge", "miscellaneous sources". The potential pathogenicity of each host species was obtained from the species list from the NCBI Pathogen Detection [26] and the published database covering the recognized species of human bacterial pathogens [27].

Annotation of HRGs
We constructed a HRG database by (1) extracting HRGs from BacMet version 2.0 [28] and (2) obtaining hgcAB gene sequences used by microorganisms to methylate Hg from previous studies [9,[29][30][31]. First, utilizing an in-house Perl script, we used the keyword "Mercury" to search the "Compound" column of the BacMet2 gene annotation mapping file, to obtain the HRG GenBank IDs, which we used to extract the HRG sequences of the predicted HRGs in BacMet2. Second, the accession numbers or locus tags of the hgcAB genes from previous studies [9,[29][30][31] were used to extract the hgcAB sequences from the NCBI protein database.
Our HRG database was then used to identify HRGs in each plasmid genome. We used the predicted proteomes as queries with the DIAMOND BLASTP [32]. Stringent criteria (e-value ≤ 1e-5; identity ≥ 80%; coverage ≥ 90%) [33,34] were used in each case, except for hgcAB, for which a lenient criterion (e-value ≤ 1e-5) that was used due to few similarities. Homologous hgcA sequences without the conserved sequence motif [G(I/V)NVWC(A/S/G)(A/S/G)GK] and the homologous hgcB sequences that were not near hgcA in the genomes were discarded [30,35].

Annotation of the ARGs
SARG version 2.0 is a structured ARG database that assigns query sequences to ARG classes, based on a similarity search [36]. We created an ARG database by using SARG as the primary database and integrating other sources of ARGs, comprising the Comprehensive Antibiotic Resistance Database (CARD) [37], ARGminder [38] and AMRFinderplus [39].
Each of these three databases was transformed into a SARG-like database by creating the following files: (1) a sequence file (FASTA format) involving a sequence ID (the ARG's unique ID and ARG annotation separated by a blank), followed by the sequence data and (2) a list file (tab-delimited format) with two columns containing the following: (i) the ARG class and ARG name, separated by two consecutive underscores and (ii) each ARG's unique ID (separated by a comma if there were multiple ARGs).
Next, the ARGs in the three databases were matched to the ARG class names used in the SARG database as follows: (1) if the ARG name in the list files of the three databases was identical to that in the SARG list file, the ARG class name (denoted X) was replaced (if it was not already the same) with that used in SARG (denoted Y) and (2) for all other ARGs that had this ARG class name (X), but a different ARG name from that in the SARG database, the ARG class name was replaced with that used in the SARG database (Y). Only 428 ARGs could not be matched to an ARG class name used in the SARG database, using the above steps. The ARG class of each of these ARGs was determined by searching them against the SARG database using DIAMOND BLASTP with a strict cutoff (e-value ≤ 1 − e 5 ; identity ≥ 80%; coverage ≥ 90%) [34,40].
Our ARG database was then used to identify ARGs in each plasmid genome. We used the predicted proteomes of the plasmid genomes as queries, using the DIAMOND BLASTP algorithm with stringent criteria (e-value ≤ 1e-5; identity ≥ 80%; coverage ≥ 90%) [34,40].

Co-Distribution Analysis of the ARGs and HRGs
Plasmids harboring both ARGs and HRGs were used for a co-distribution analysis. Based on the method used in a previous study [34], two parameters were used to accurately analyze the co-distribution of ARGs and HRGs in plasmids. The two parameters were The A-H-incidence ave (the average number of ARGs that occur within a specified physical genetic distance of HRGs among all plasmids) and the A-H-distance min (the minimum distance between ARGs and HRGs in each plasmid) [34].
The A-H-incidence ave was calculated as follows: (1) the physical genetic distance between each ARG-HRG pair in each plasmid was calculated, (2) the cumulative number of ARGs within the specified distance (distance increased from 0 to 100 kb with 200 bp as the step width) was recorded for each plasmid, and (3) the A-H-incidence ave was obtained by dividing the total number of ARGs within the specified physical genetic distance of HRGs by the total number of plasmids.
The A-H-distance min was calculated as follows: (1) the physical genetic distance between each ARG-HRG pair in each plasmid was calculated, (2) the HRGs were subjected to deredundancy, ensuring that each HRG class appeared only once, and (3) the A-H-distance min was obtained by determining the minimum distance between an ARG and a HRG, for each ARG class in each plasmid.

Analysis of Preference of the HRGs for Specific ARG Classes
In step 3 of the calculation of the A-H-distance min , the minimum distance between any ARGs and HRGs in each plasmid was obtained.

Statistical Analysis
A Kruskal-Wallis test, a Wilcoxon rank-sum test, and plotting were performed in R software version 4.1.3 (R Core Team Vienna, Austria) with the "ggpubr" package. A result was considered significant if p < 0.05, highly significant if p < 0.01, and extremely significant if p < 0.001, according to the previous studies.

Overview of the Plasmid Host Taxonomy
According to the host taxonomic classification, all of the 18,731 plasmids affiliated to 135 genera, but 93.09% of the plasmid numbers were concentrated in 17 genera host (Table S2). The predominant genera were Escherichia and Klebsiella, and they occupied 54.57% of all plasmids belonged to these two genera. It is important to note that although the large number of plasmids used in this study led to a relatively comprehensive HRG profile, they do not necessarily represent the complete microbial diversity in Gammaproteobacteria [34].

Distribution of the HRGs in Plasmid Genomes
To ascertain the distribution and organizational diversity of the HRGs, all plasmids were analyzed, based on a sequence similarity search analysis. The results indicated that 1723 plasmids harbored HRGs, accounting for only 9.20% of all plasmids. In these plasmids, 9931 HRGs, belonging to 16 HRG classes, were identified ( Figure 1A). There were many HRGs but they were not extensively distributed across the plasmids. The most common HRGs were merR, merA, merP, merD, merE, and merT. They were similar in number, reflecting the relatively high prevalence of HRG clusters, composed of these six HRGs in plasmids. The number of HRGs in each plasmid was highly variable. The maximum number of HRGs (24) was found in both plasmid pF10AN_1 (305.55 kp) from the Klebsiella pneumoniae strain F10(AN) and plasmid unnamed1 (150.42 kp) from the Enterobacter cloacae complex sp. strain AR_0072 (Table S3). The genome size of the 1723 plasmids that harbored HRGs varied markedly, from 5.37 to 1039.05 kb. Inevitably, larger plasmids had more genes, there was a certain degree of correlation between the plasmid genome size and the number of HRGs (Spearman's ρ = 0.30, p-value < 2.2e −16 ).
The hosts of the plasmids harboring HRGs belonged to 32 genera, dominated by Klebsiella, Escherichia, Salmonella, Enterobacter, and Citrobacter, all of which are potential pathogens (Table 1). These five genera accounted for more than 83% of the total number of plasmids carrying HRGs. Thus, HRGs were frequently associated with pathogenic strains. It might be a coincidence that the first reported bacterium that exhibited Hg resistance was a clinical isolate, which was also penicillin resistant [41]. However, it is likely that many more plasmids were isolated from pathogens than non-pathogens because there has been more medical microbiology research conducted by microbiologists. As for five genera (Xanthomonas, Piscirickettsia, Vibrio, Erwinia, and Yersinia), the number of plasmids from each of the genera was greater than 100, but there was none with or less than 1% of HRGs identified in the plasmids. There may be a distribution preference for HRGs related to plasmid hosts at the genus level. For instance, out of 5508 plasmids isolated from Escherichia, only 267 plasmids carried HRGs, while, 139 out of 655 plasmids from Citrobacter contained HRGs. The microbial hgcAB genes used to methylate Hg have been found in highly diverse anaerobic settings, e.g., soils, sediments, rice paddies, and various extreme environments [9,12,35,42]. However, the hgcAB gene set is relatively rare, occurring in only~1.4% of the sequenced microbial genomes [12]. In the present study, using a fairly lenient criterion (e-value ≤ 1−e 5 ) plus a conserved hgcA sequence motif search, the hgcAB gene set was not identified in any plasmids. Thus, the hgcAB gene set appears to be absent from plasmids, and is more likely to be located in microbial chromosomes, suggesting a low risk of a hgcAB transfer.
According to the plasmid host source, they were classified into several groups: human (901 plasmids), animal (242 plasmids), wastewater and sludge (144 plasmids), and miscellaneous sources (159 plasmids). The host source for 277 plasmids were unknown (denoted as NA in Table S3). Overall, there were significant differences in the number of HRGs in the plasmids isolated from these different sources ( Figure 2). Specifically, the median numbers of HRGs in plasmids isolated from humans (six HRGs), animals (six HRGs), both of which are slightly smaller than that of the isolates from the wastewater and sludge (seven HRGs) ( Figure 2). It accorded with a previous study, reported by us, that the number of arsenicrelated genes per Burkholderiales isolate from humans and animals was less than that of the isolates from wastewater and sludge [22]. Plasmid from human and animal hosts have a median of HRGs up to six. However, we found no significant difference in the number of HRGs between the plasmids isolated from humans and animals. We speculate that our results may be related to the fact that before antibiotic use, Hg compounds were extensively used as disinfectants and antiseptics in hospitals and communities. Nevertheless, our results revealed that the host source plays a crucial role in influencing the distribution of HRGs in plasmids.

HRG Clusters Are Highly Abundant and Have an Extremely High Organizational Diversity
The bacterial mer operon plays a crucial role in the Hg biogeochemistry and bioremediation by converting both the reactive inorganic and organic forms of Hg to relatively inert, volatile, and monoatomic forms [43]. Consistent with previous studies [4,44], almost all of the identified HRGs (96.29%) were grouped into clusters, which indicates that HRGs tend to work together. To investigate the organizational diversity of the HRG clusters (referring to mer clusters), we defined a HRG cluster as at least two HRGs that occur together or are separated by no more than two other genes. We thereby identified 88 HRG cluster types, with 1680 incidences in plasmids ( Table 2).  merR-merT-merP-merC/X-merA-merD-merE represent the two most common HRG cluster types, together accounting for 59.52% of all incidences of clusters (Table 2). Of the 88 HRG cluster types, 64 (72.73%) had an incidence of <5. Taken together, compared to the small number of HRG classes (16 HRG classes), the HRG cluster types are extremely abundant and diverse [4]. This phenomenon further highlights the frequent rearrangement that occurs among HRGs [4]. The extensive findings regarding HRGs in this study enrichened the gene sources for the exploitation of the mer-mediated functions for the Hg bioremediation [44]. Along with the fairly high diversity of HRG clusters, the transcription of HRGs can be activated in diverse ways, e.g., the merA transcription activation in Thermus thermophilus [45].
Many studies have reported finding the HRG and ARG co-distribution in various complex environments or in single microorganisms [20,21,24,45,46]. We found that there were 1368 plasmids harboring both HRGs and ARGs, representing approximately 7.30%, 79.40%, and 22.13% of all analyzed plasmids, plasmids with HRGs, and plasmids with ARGs, respectively (Table S4). The plasmid hosts belonged to 25 genera, but were largely concentrated in Klebsiella, Escherichia, Salmonella, and Enterobacter (Table 1). The number of HRGs and ARGs harbored by the 1368 plasmids were 8146 and 10,207, accounting for 82.03% and 32.18% of the HRGs (out of a total of 9931) and ARGs (out of a total of 31,723) of all analyzed plasmids, respectively. In some of the plasmids that harbored both HRGs and ARGs, HRGs were flanked by numerous ARGs (Figure 3). In contrast to the ratio of the co-distribution plasmid number in plasmids with HRGs (79.40%) and plasmids with ARGs (22.13%), the plasmids harboring both ARGs and HRGs had a high ratio of resistance genes, HRGs (82.03%) and ARGs (32.18%). Thus, ARGs and HRGs tend to be co-distributed. Furthermore, the incidences of ARGs and HRGs were slightly positively correlated (Spearman's ρ = 0.23, p-value < 2.2e −16 ). We speculate that the correlation between ARGs and HRGs might be attributable to a co-selection mechanism [48]. For instance, at a water treatment plant that disinfected surface water, the number of HRGs and ARGs related to multidrug, β-lactam, and aminoglycoside resistance increased [46]. Evolutionary events that lead to the emergence of new microbial antibiotic resistance factors are rare, while transmission events of widespread resistance genes are common [48]. As key drivers of horizontal gene transfer, plasmids harboring both HRGs and ARGs can be transferred at seemingly high rates via several mechanisms [49]. For example, Hg pollution in soil accelerates the transfer of ARGs among bacteria [50,51].
We also analyzed the physical genetic distance between the co-distributed HRGs and ARGs using the A-H-incidence ave and the A-H-distance min (see "Co-Distribution Analysis of the ARGs and HRGs" section in Section 2.4). The results indicated that ARGs were located very close to HRGs, with a median A-H-distance min of 5.67 kb. As the distance from the HRGs increased, the A-H-incidence aves tended to be greater in plasmids from humans and animals than plasmids from wastewater and sludge ( Figure 4A). Furthermore, based on the A-H-distance min , ARGs and HRGs were much closer in plasmids from humans (a median A-H-distance min of 5.67 kb) and animals (a median A-H-distance min of 4.10 kb) than in plasmids from wastewater and sludge (a median A-H-distance min of 8.46 kb) ( Figure 4C). This is consistent with a previous report showing that bacteria from humans and animals had the highest co-selection potential [52]. This may be due to humans and animals being the main bacterial host sources that involve a regular and intentional high exposure to antibiotics. Thus, the high concentrations of antibiotics drove the increased ARG and HRG co-distribution [52]. As the distance from HRGs increased, a rising trend of the A-H-incidence aves increased more in the plasmids from pathogens, compared to the plasmids from non-pathogens ( Figure 4B). While, there was no significant difference on the physical genetic distance between ARGs and HRGs for the plasmids from pathogens (a median A-H-distance min of 5.67 kb) and non-pathogens (a median A-H-distance min of 5.92 kb) ( Figure 4D). These results hint that the occurrences of HRGs and ARGs are closely related in pathogens, as is the case in a previous study of complete bacterial genomes [34]. The degree of difference is more obvious in that study [34]. We speculate that the shuttling of plasmids harboring both ARGs and HRGs between clinical and non-clinical hosts weakened the difference between the pathogens and non-pathogens.

Preference of HRGs for Specific ARG Classes
In microbial genomes, functionally related genes (e.g., functional genes conferring resistance to specific antibiotics and metals) tend to cluster together, becoming adjacent to each other due to selective pressure. Based on the physical genetic distance calculations, the nearest ARG classes to the HRGs were analyzed to reveal the functional association between the ARG classes and HRGs. HRGs had a clear preference for several ARG classes (related to a resistance against sulfonamide, macrolide-lincosamide-streptogramin, aminoglycoside, tetracycline, and β-lactam), as they were close in terms of the physical genetic position (Table 3). Conclusively, the empirical observation of the genetic co-distribution ( Figure 3) indicates that Hg exposure can promote resistance against a wide range of antibiotics, and the selective preference for particular classes of antibiotics is obvious [52]. As the formation and spread of drug resistance (especially in bacterial pathogens) is a severe health issue, which leads to millions of human deaths each year, more attention should be paid to the co-transmission of ARGs with HRGs caused by the Hg selective pressure.

Conclusions
In this study, a bioinformatics analysis of a large number of plasmid genomes confirmed that there were many HRGs but they were not extensively distributed across the plasmids, with only 9.20% of plasmids harboring HRGs. We found that the HRG clusters had a high organizational diversity, with 88 HRG cluster types being found in plasmids, despite the relatively small number of HRG types. Moreover, the incidences of more than half of the HRG cluster types were low. These results suggest that HRG clusters in plasmids were subjected to the frequent gene rearrangement, in response to the extensive historical use of Hg. Narrow-spectrum Hg-resistance operons were the predominant HRG cluster types in plasmids, mainly involving merR-merT-merP-merC/X-merA-merD-merE. Fortunately, the hgcAB gene set (which confers the ability to methylate Hg(II)) was not present in the plasmids, which lowers the risk of a methylmercury release into the environment. A substantial pro-portion of plasmids harboring HRGs also harbored ARGs (79.40%), and plasmids harboring both ARGs and HRGs had higher ratios of HRGs and ARGs than those of all analyzed plasmids. Furthermore, there was a weak but significant correlation between HRGs and ARGs. Thus, we speculate that HRGs and ARGs are prone to co-distribution in plasmids. In addition, the HRG and ARG distribution in plasmids was related to the host source. Plasmids from humans and animals, compared to wastewater and sludge, had a greater increase in the incidence of ARGs within a specified physical genetic distance of HRGs as the specified distance increased, and a smaller average minimum distance between ARGs and HRGs, among all plasmids harboring both ARGs and HRGs. These findings suggest that the plasmid hosts exposed to antibiotics or Hg, might acquire HRGs and ARGs during the long-term adaptive evolution under selective pressure. In summary, this study highlighted the incidence of ARGs (which pose large health risks) caused by the environmental Hg contamination, which has resulted in the frequent plasmid-mediated transfer of ARGs and HRGs. It should be noted that the findings should not be extended too far into the environment, since the skew in the data has its roots in our interest in medically relevant pathogens.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13112149/s1, Table S1: Plasmid genomes used in this study, Table S2: Taxonomic distribution of the plasmid hosts, Author Contributions: X.L. conceived and designed the study, and performed the bioinformatic analyses. X.L., Z.Y., X.W. and G.Z. analyzed the data. X.L., Z.Y. and S.S. drafted the manuscript. X.L. and L.C. supervised the bioinformatic analysis work, and reviewed and edited the manuscript. All authors approved the final manuscript. All authors have read and agreed to the published version of the manuscript.