MazF Endoribonucleolytic Toxin Conserved in Nitrospira Specifically Cleaves the AACU, AACG, and AAUU Motifs

MazF is an endoribonucleolytic toxin that cleaves intracellular RNAs in sequence-specific manners. It is liberated in bacterial cells in response to environmental changes and is suggested to contribute to bacterial survival by inducing translational regulation. Thus, determining the cleavage specificity provides insights into the physiological functions of MazF orthologues. Nitrospira, detected in a wide range of environments, is thought to have evolved the ability to cope with their surroundings. To investigate the molecular mechanism of its environmental adaption, a MazF module from Nitrospira strain ND1, which was isolated from the activated sludge of a wastewater treatment plant, is examined in this study. By combining a massive parallel sequencing method and fluorometric assay, we detected that this functional RNA-cleaving toxin specifically recognizes the AACU, AACG, and AAUU motifs. Additionally, statistical analysis suggested that this enzyme regulates various specific functions in order to resist environmental stresses.


Introduction
Toxin-antitoxin (TA) systems are involved in bacterial survival by regulating cell growth under various environmental stresses [1][2][3]. Since its discovery, dozens of TA systems have been detected on prokaryotic chromosomes and plasmids, and they have been categorized into six types [3][4][5][6][7][8][9][10]. The most common type II TA systems consist of a pair of co-regulated proteins: a toxin that inhibits cell growth by suppressing essential cellular processes and an antitoxin that acts as its cognate inhibitor [3,5]. Normally, the two proteins form a stable toxin-antitoxin complex, but when cells are exposed to environmental stresses, proteases are induced and preferential degradation of antitoxins liberate toxins that inhibit cell growth [1][2][3]5].
MazEF is an archetypical pair of type II TA systems that encodes an endoribonucleolytic toxin, MazF and its cognate antitoxin MazE [11,12]. The Escherichia coli MazF is known to contribute to cellular survival in high temperature, starvation, and antibiotics conditions [13] by cleaving intracellular RNAs in an ACA-specific manner and inducing translational regulation [14,15]. However, the physiological roles of MazF enzymes in other species remain largely unclear because their recognition sequences are different among MazF homologues [16]. Depending on the cleavage specificity, MazF may either generally degrade intracellular RNAs to reduce overall protein synthesis [11,14] or severely inhibit the expression of specific genes [17]. Thus, determining the recognition sequence and its abundancy in the transcripts is the first step to predicting the physiological functions of MazF orthologues.
Previously, we detected the recognition sequences of two MazF enzymes conserved in Nitrosomonas europaea, a representative ammonia-oxidizing bacterium, and performed statistical analysis to predict their main regulation targets [18,19]. The results suggested that one of the MazF enzymes improves heavy metal resistance by leaving the related transcripts uncleaved [18], while another downregulates cellular activities by digesting mRNAs essential to ammonia oxidation and carbon fixation [19]. As the other crucial contributor to complete nitrification, the environmental adaption of nitrite-oxidizing bacteria (NOB) is also of interest. Among all known NOB, the genus Nitrospira has been recognized as the most diverse and widespread [20]. Through recent genome-based studies, Nitrospira has been found to possess great metabolic versatility and is thought to have evolved the ability to cope with a wide range of environmental stresses [21,22].
In this study, we focused on a MazF module from Nitrospira strain ND1 isolated from the activated sludge of a wastewater treatment plant [23] and characterized by whole genome sequence analysis [24]. The specific cleavage sites of the endoribonuclease were detected by using a combination of massive parallel sequencing and fluorometric analysis. Based on the results, the main regulation targets were predicted.

Estimating MazF-nd1 Cleavage Sites
An RNA-sequencing approach, previously developed in our laboratory [26], was used to detect the candidate recognition sequences of MazF-nd1. Five synthetic RNAs (1000-1, 1000-2, 1000-3, 1000-4, and 1000-5) were digested by MazF-nd1 and a unique 45-nucleotide barcode RNA was ligated to the digested 5 -end of the fragments. The MazF-cleaved sites were detected by specifically mapping the barcode-ligated reads and identifying the nucleotide positions with high relative coverage increases; the value was defined as the coverage of the position divided by the coverage of a former position (Figure 2a). We selected 50 nucleotides showing the largest relative coverage increases (Table S1). These nucleotides and their nearby sequences were analyzed to determine the nucleotide frequency at each position using WebLogo [27]. Four-base motifs, AAUU and AACU, were distinctively extracted as potential recognition sequences ( Figure 2b). Furthermore, MazF-nd1 is likely to cleave those sequences between the two adenines because the coverage increased significantly at the second A-residue ( Figure 2b).

MazF-nd1 Recognizes RNA at AACU, AACG, and AAUU Sequences
In the 50 sequences selected from our sequencing results (Table S1), only 9 and 15 sequences contained AAUU and AACU around the most increased adenine, respectively, while the remaining sequences contained AAUU-or-AACU-like motifs that differed in either the first or last base, such as AAUC and UACU. Since some MazF homologues were found to possess more than one cleavage sequence with different affinities [28,29], we performed a fluorometric assay [30] using a set of short DNA/RNA chimeric oligonucleotides, which does not form a strong secondary structure, to determine the main recognition sequences from among the candidate sequences listed in Table 4. Because both ends of these probes were tagged with a pair of dyes where fluorescence resonance energy transfer occurs, fluorescence intensity increases when the oligonucleotides are cleaved, and the two dyes become separated. As predicted from the RNA-sequencing results, the fluorescence intensity rapidly increased when the probes containing AAUU and AACU were treated with MazF-nd1 ( Figure S2a,c). The reaction of the AACG-containing probe showed a similar increase ( Figure  S2d). The fluorescence intensities of other reactions were also increased to different degrees ( Figure  S2b,d), suggesting that MazF-nd1 possesses numbers of suboptimal cleavage sequences. The probes were cleaved by MazF-nd1 specifically at the RNA motifs because changes in fluorescence intensity was not detected when probes containing only DNA or RNA without any of the potential cleavage sequences were used as the substrates ( Figure S2e).

MazF-nd1 Recognizes RNA at AACU, AACG, and AAUU Sequences
In the 50 sequences selected from our sequencing results (Table S1), only 9 and 15 sequences contained AAUU and AACU around the most increased adenine, respectively, while the remaining sequences contained AAUU-or-AACU-like motifs that differed in either the first or last base, such as AAUC and UACU. Since some MazF homologues were found to possess more than one cleavage sequence with different affinities [28,29], we performed a fluorometric assay [30] using a set of short DNA/RNA chimeric oligonucleotides, which does not form a strong secondary structure, to determine the main recognition sequences from among the candidate sequences listed in Table 4. Because both ends of these probes were tagged with a pair of dyes where fluorescence resonance energy transfer occurs, fluorescence intensity increases when the oligonucleotides are cleaved, and the two dyes become separated. As predicted from the RNA-sequencing results, the fluorescence intensity rapidly increased when the probes containing AAUU and AACU were treated with MazF-nd1 ( Figure S2a,c). The reaction of the AACG-containing probe showed a similar increase ( Figure S2d). The fluorescence intensities of other reactions were also increased to different degrees ( Figure S2b,d), suggesting that MazF-nd1 possesses numbers of suboptimal cleavage sequences. The probes were cleaved by MazF-nd1 specifically at the RNA motifs because changes in fluorescence intensity was not detected when probes containing only DNA or RNA without any of the potential cleavage sequences were used as the substrates ( Figure S2e).
To compare all reactions quantitatively, we first converted the fluorescence data into percentages by considering the average fluorescence intensity of RNase-treated reactions as 100% and those of reactions with no enzyme as zero. Next, the percentage fluorescence intensity of each reaction was fitted to an integrated rate equation to calculate the initial reaction velocities ( Figure 3). As shown in Table 1, reactions with AACU, AACG, and AAUU sequences showed higher initial reaction velocities than reactions with other similar sequences by at least fivefold. Taken together, our results clearly demonstrate that MazF-nd1 is an active ribonuclease that cleaves RNA in a sequence-specific manner at AACU, AACG, and AAUU. To compare all reactions quantitatively, we first converted the fluorescence data into percentages by considering the average fluorescence intensity of RNase-treated reactions as 100% and those of reactions with no enzyme as zero. Next, the percentage fluorescence intensity of each reaction was fitted to an integrated rate equation to calculate the initial reaction velocities ( Figure 3). As shown in Table 1, reactions with AACU, AACG, and AAUU sequences showed higher initial reaction velocities than reactions with other similar sequences by at least fivefold. Taken together, our results clearly demonstrate that MazF-nd1 is an active ribonuclease that cleaves RNA in a sequence-specific manner at AACU, AACG, and AAUU.  Table S1.   Table S1.

Analysis of Intracellular Targets of MazF-nd1
Determining the recognition sequence of MazF enabled estimation of its intracellular targets and the molecular behavior of Nitrospira strain ND1 under environmental stresses. Accordingly, we evaluated the potential effects of MazF-nd1 on Nitrospira strain ND1 based on the probability of the AACU, AACG, and AAUU sequences existing in all 4624 coding sequences (CDS) ( Table S2). As described previously [31], the parameter P shows a smaller value when K, which is the actual number of the recognition sequences in a gene, is significantly larger than E, which is the mathematically calculated number. To estimate the intracellular targets of MazF-nd1, 25 protein-coding sequences with the smallest P values were analyzed (Table 2). Although 9 of the 25 genes were not annotated for Nitrospira, some of the listed genes encode notable proteins. For example, FumC, ranked third, is recognized as a tricarboxylic acid (TCA) cycle enzyme stimulated by iron limitation [32,33], and proteins of TonB-dependent transporter, which is a well-known iron transporter, were ranked 11, 17, and 25 [34]. Because iron is an important cofactor for nitrite oxidoreductase, a key enzyme in NOB that oxidizes nitrite to nitrate and shuttles electrons into the respiratory chain [20], downregulation of the genes encoding TonB-dependent receptor may cause inhibition of overall cellular processes. Transcripts without recognition sequences are considered to be tolerant to MazF [28,31]. Thus, we extracted genes lacking MazF-nd1-recognition sequences. Here, 201 CDS were detected but only 18 were annotated (Table 3). Interestingly, a mercuric transport protein, extracted as a MazF-tolerant gene in N. europaea, one of the most well-investigated nitrifiers [18], was detected. These results suggest that the two MazF orthologues from different nitrifiers may commonly improve heavy metal resistance.

Discussion
Nitrifying bacteria are generally sensitive to environmental fluctuations and easily enter a dormant state. Previously, we showed the MazF endoribonucleases (referred to as MazF NE1181 and MazFne1 in the previous papers) in N. europaea are functional growth regulators, and their sequence-specificities may allow N. europaea to alter its translation profile and survive under certain stressful conditions [18,19]. MazF-nd1 shares only 34.2% and 23.5% identities with MazF NE1181 and MazFne1, respectively ( Figure S3). Thus, we hypothesized that MazF-nd1 might possess a unique cleavage sequence and could have unique physiological roles.
In the present study, we employed a combination of massive parallel sequencing and fluorometric assays to determine the cleavage sequences of MazF-nd1. Although sequence logo analysis predicted that AACU and AAUU are the main targets (Figure 2b), six other sequences were also detected through massive parallel sequencing (Table S1 and Table 1). Hence, we compared the initial reaction velocities quantitatively using fluorometric assays, and revealed that AACG also serves as a determinant of MazF-nd1, in addition to AACU and AAUU (Table 1 and Figure 3). The reason why the AACG motif was not recovered by sequencing logo analysis remains to be fully clarified but one possible explanation is that MazF-mediated-RNA-cleavage may have been hindered by secondary structures in the long substrate RNAs used in the sequencing-based assay [35]. It is further important to recognize that the magnitude of relative coverage increase may not always correlate with MazF cleavage-activity, because the value is simply defined as the coverage of the position divided by the coverage of a former position (see Materials and Methods).
Bacteria such as N. europaea, Deinococcus radiodurans, and Mycobacerium tuberculosis harbor multiple MazF endoribonucleases with different sequence-specificities [16,18,19,29,[41][42][43][44], suggesting functional diversity of MazF homologues even within a single bacterial species or genome. Notably, RASTA-Bacteria predicted that the gene encoded at NSND_63228 in Nitrospira strain ND1 genome also codes for a MazF toxin (MazF-nd2). Given that the two MazF proteins share only 24.1% identity ( Figure S5), it appears that they possess distinct RNA cleavage specificities and physiological roles. Here, we have failed to construct an expression vector encoding the mazF-nd2 gene (stop codons were repeatedly inserted into its open reading frame, probably due to the high toxicity of MazF-nd2; data not shown). As such, it could be hypothesized that Nitrospira strain ND1 might utilize both MazEF pairs depending on the surroundings, where MazF-nd1 could be involved in reversible growth arrest rather than programmed cell death. Further studies are warranted to understand how the MazEF pairs regulate translation and benefit Nitrospira under stressful situations.

Conclusions
In conclusion, we found that a MazF module in the Nitrospira strain ND1 is a functional RNA-cleaving toxin that specifically recognizes the AACU, AACG, and AAUU motifs. We predicted its intracellular targets; our results suggest that this enzyme regulates various specific functions to resist environmental stresses. Our findings provide a foundation for further studies of the environmental adaption of Nitrospira.

In Vivo Toxicity of MazF-nd1
The pET21a-mazF-nd1 and pET21c plasmids were introduced into E. coli BL21 (DE3) (BioDynamics Laboratory, Tokyo, Japan), which were then cultivated at 37 • C overnight on LB plates containing 100 µg/mL ampicillin. These cells were then pre-cultivated for 12 h in LB medium containing 100 µg/mL ampicillin. The pre-cultivated cells were streaked on LB plates containing 100 µg/mL ampicillin with or without 100 µM of isopropyl β-D-1-thiogalactopyranoside (IPTG) and were incubated at 37 • C overnight.

Protein Expression
The pET21a-mazE-nd1 and pET21a-mazF-nd1 plasmids were introduced into E. coli BL21 (DE3) (BioDynamics Laboratory) and were cultivated at 37 • C overnight on LB plates containing 100 µg/mL ampicillin. These cells were then pre-cultivated overnight in LB medium containing 100 µg/mL ampicillin and inoculated into 1 L of ampicillin supplemented LB medium. One millimolar of IPTG was added to induce MazE-nd1 and MazF-nd1 when the OD 600 values exceeded 5.0. After 3.5 h of incubation, the cells were harvested by centrifugation at 9200× g and then stored at −80 • C until use.

Protein Purification
E. coli cells containing MazE-nd1 and MazF-nd1 were thawed on ice and suspended in 18 mL of binding buffer (20 mM sodium phosphate buffer (pH 8.0), 300 mM NaCl, 0.05% Triton X-100, 5 mM β-mercaptoethanol, and 40 mM imidazole). The cells were lysed by sonication and collected by centrifugation at 8900× g. The supernatant was then filtered through a 0.45-µm membrane (Millex, Darmstadt, Germany) and applied to 1 mL His-Trap FF column (GE Healthcare, Little Chalfont, UK). Non-specifically bound proteins were removed by washing with 32 column volumes (cv) of binding buffer using AKTA pure 25 (GE Healthcare). Hexa-histidine-tagged MazE-nd1 and MazF-nd1 were selectively eluted by increasing the concentration of elution buffer (20 mM sodium phosphate buffer (pH 8.0), 300 mM NaCl, 0.05% Triton X-100, 5 mM β-mercaptoethanol, and 500 mM imidazole) using the following program: flow rate, 1 mL/min; linear elution gradient, 20 cv; fraction size, 0.5 mL. Molecular weight and purity were confirmed by performing sodium dodecyl sulfate polyacrylamide gel electrophoresis and protein concentration was determined using a Protein Assay (Bio-Rad, Hercules, CA, USA).

Cleavage Sequence Identification
The cleavage sequence of MazF-nd1 was identified using a previously developed sequencing method [26]. In this study, 300 ng of MazF-nd1 and a 1.5 µg of mixture of five synthetic RNAs-1000-1, 1000-2, 1000-3, 1000-4, and 1000-5 were incubated at 37 • C for 1 h in MazF reaction buffer. Sequencing was performed using the MiSeq platform (Illumina, San Diego, CA, USA), and data were analyzed using CLC Genomics 10.1.1 (Qiagen, Venlo, The Netherlands). Barcode-ligated reads were extracted as describe in a previous report [26], and were mapped against the references using the following parameters: match score, mismatch cost, insertion cost, and deletion cost equal 3; length and similarity fraction both equal 1. Nucleotide positions with coverage less than 5000 were excluded from analysis, and 50 nucleotide positions showing the highest relative coverage increases, which is defined as the coverage of the position divided by the coverage of a former position, were selected. Sequences from 5 bp upstream to 5 bp downstream of these positions were extracted and aligned using WebLogo [27]. The sequencing dataset was deposited into the DDBJ Sequence Read Archive under the accession number DRA008257.

Fluorometric Detection of MazF-nd1 Activity
Forty picomoles of MazF-nd1 was added to 10 pmol of fluorescent-labeled oligonucleotides in a total volume of 20 µL. Additionally, a reaction with 100 ng of RNase A (Novagen, Darmstadt, Germany) was also performed as a positive control. All samples were incubated at 37 • C in MazF reaction buffer. Fluorescence intensity was recorded every 80 s using a Light Cycler 480 system (Roche, Basel, Switzerland) with 483 nm excitation and 533 nm detection filters. All data were collected in triplicate and the average was calculated.
The fluorescence intensity was converted into a percentage by considering the average fluorescence intensity of RNase-treated reactions as 100% and that of reactions with no enzyme as zero. The percentage fluorescence intensity of each reaction was fitted to the integrated rate equation (Equation (1)): where F(t) was the fluorescence intensity at time t; F max was the presumed maximum fluorescence intensity; and k was the observed rate constant. The cleavage activity of MazF-nd1 for each sequence was calculated as the initial reaction velocity by multiplying F max by k following the derivative (Equation (2)): Data analysis was conducted with KaleidaGraph 4.5.0 (Synergy Software, Reading, PA, USA).

Analyzing the Frequency of MazF-nd1 Cleavage Sites in Nitrospira Strain ND1 Genome
All 4624 protein-coding sequences from Nitrospira strain ND1 were obtained from the NCBI database as of 18 March 2019. According to a previous study [31], P was determined using the following equation (Equation (3)): In this equation, p is the probability of either AACU, AACG, or AAUU appearing in a Nitrospira gene, which was calculated as the sum of (percentage of A) 2 × (percentage of C) × (percentage of U + percentage of G) and (percentage of A) 2 × (percentage of U) 2 , L represents the length of the CDS, K is the actual number of recognition sequences in the CDS, and E is the expected number of recognition sequences in the CDS, calculated as p(L − 3).