The Peptidoglycan Biosynthesis Gene murC in Frankia: Actinorhizal vs. Plant Type.

Nitrogen-fixing Actinobacteria of the genus Frankia can be subdivided into four phylogenetically distinct clades; members of clusters one to three engage in nitrogen-fixing root nodule symbioses with actinorhizal plants. Mur enzymes are responsible for the biosynthesis of the peptidoglycan layer of bacteria. The four Mur ligases, MurC, MurD, MurE, and MurF, catalyse the addition of a short polypeptide to UDP-N-acetylmuramic acid. Frankia strains of cluster-2 and cluster-3 contain two copies of murC, while the strains of cluster-1 and cluster-4 contain only one. Phylogenetically, the protein encoded by the murC gene shared only by cluster-2 and cluster-3, termed MurC1, groups with MurC proteins of other Actinobacteria. The protein encoded by the murC gene found in all Frankia strains, MurC2, shows a higher similarity to the MurC proteins of plants than of Actinobacteria. MurC2 could have been either acquired via horizontal gene transfer or via gene duplication and convergent evolution, while murC1 was subsequently lost in the cluster-1 and cluster-4 strains. In the nodules induced by the cluster-2 strains, the expression levels of murC2 were significantly higher than those of murC1. Thus, there is clear sequence divergence between both types of Frankia MurC, and Frankia murC1 is in the process of being replaced by murC2, indicating selection in favour of murC2. Nevertheless, protein modelling showed no major structural differences between the MurCs from any phylogenetic group examined.


Introduction
Bacteria are exposed to various environmental stresses against which the cell wall has to provide protection, while also maintaining the cell shape. Intracellular symbiotic bacteria need to survive both inside and outside the host, which can create the need for cell wall modifications. The cell walls of Gram-positive bacteria are characterized by a ca. 30 nm thick peptidoglycan (PG) layer. As a much thinner layer in the cell wall, PG can also be found in Gram-negative bacteria (3-6 nm), cyanobacteria (10 nm), and moss chloroplasts (5 nm) [1][2][3][4]. PG is synthesized by the activity of Mur ligases, which add a number of amino acids to UDP-N-acetylmuramic acid to connect different strands. In a series of ATP-dependent reactions, L-alanine (MurC), D-glutamate (MurD), L-lysine (MurE), and D-alanyl-D-alanine (MurF) are added sequentially. The precise sequence of the reactions catalysed by these enzymes was reviewed by Barreteau et al. [5] and tested in Mycobacterium tuberculosis [6], as these enzymes represent interesting targets for antibiotic drug discovery. Several reports on the activity of Mur ligases have been published for non-pathogenic bacteria. In the cyanobacterium Anabaena sp., murC and murB play an important role in heterocyst differentiation [7]. Although, so far, PG layers have not been identified for the chloroplasts of Spermatophyta, it is known that Arabidopsis thaliana Frankia datiscae Dg1 [15], in the nodules of Coriaria arborea Linds. induced by Candidatus Frankia meridionalis Cppng1 [14], and in the nodules of Coriaria myrtifolia induced by Frankia coriariae BMG5.1 [17]. The expression was also analysed in the nodules of Discaria trinervis (Gillies ex Hook.) Reiche induced by Frankia discariae BCU110501 [22]. Plants were germinated from seeds and grown in a greenhouse under a 16/8 h light/dark regime at 23/18 °C. Plants were supplied with quarter-strength Hoagland's solution with 10 mM KNO3 [23] once a week, and with deionized water twice a week.
Eight weeks after germination, the plants were inoculated with crushed nodules of D. glomerata, which were then used for the propagation of Dg1, Dg2, and Cm1, separately. Crushed nodules of C. arborea induced by Cppng1 [14] were used for inoculation with Cppng1. For inoculation with F. coriariae BMG5.1 [17] and F. discariae BCU110501 [22], in vitro cultures of Frankia were used, which had been grown in BAP medium without nitrogen [24], adjusted to pH 9 for BMG5.1, at 28 °C for 35 and 28 d, respectively. For inoculation, the cells were pelleted and washed twice with sterile double-distilled H2O. At the time of inoculation, the washed root systems of the plants were evaluated to confirm that no plant was already nodulated. After inoculation, plants were supplied once per week with quarter-strength Hoagland's solution without nitrogen, and twice a week with deionized water. Nodules of D. glomerata were harvested twelve weeks after inoculation, while nodules of C. arborea, C. myrtifolia, and D. trinervis were harvested four months after inoculation. The nodules were frozen in liquid nitrogen and stored at −80 °C.
Cultures of F. discariae BCU110501 grown in a BAP medium were used for the analysis of the expression levels of murC1 and murC2 in vitro.

Gene Expression Analysis using Real-Time Quantitative PCR (RT-qPCR)
Primer pairs were designed for the consensus region of murC1 and for the consensus region of murC2 for Dg1, Dg2, and BMG5.1, while separate primer pairs were designed for Cppng1 and for BCU110501. All of the primers were designed using Primer3, available on Benchling with standard settings (Benchling Biology Software [25]); a list of all of the primers used in this study is provided in Supplementary Table S1. RNA isolation, gDNA digestion, cDNA synthesis, and RT-qPCR analysis were performed as described previously [14]. The gene encoding translation initiation factor three (IF-3), infC, was used as a reference gene to normalize the expression data [14,26]. An unpaired two-tailed Student's t-test was used to test if the copies were expressed at significantly different levels. The statistical analysis and data visualisation were performed in RStudio (RStudio team [27]).
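The normalisation and significance test described above can be sketched as follows. This is a minimal illustration only: it assumes the common 2^-ΔCt quantification with roughly 100% primer efficiency (the actual procedure follows [14] and is not restated here), and the Ct values and function names are hypothetical, not data or code from this study.

```python
import math
from statistics import mean, variance

def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to the reference gene (here infC),
    assuming the 2^-dCt method (an assumption, see lead-in)."""
    return 2.0 ** (ct_reference - ct_target)

def student_t(a, b):
    """Unpaired two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t_stat = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t_stat, df

# Hypothetical Ct values for three biological replicates (not data from this study).
ct_infC = [20.0, 20.5, 19.5]
ct_murC1 = [25.0, 25.5, 25.5]
ct_murC2 = [22.0, 22.5, 21.5]

murC1 = [relative_expression(c, r) for c, r in zip(ct_murC1, ct_infC)]
murC2 = [relative_expression(c, r) for c, r in zip(ct_murC2, ct_infC)]

t_stat, df = student_t(murC2, murC1)
# The two-tailed critical value of t for df = 4 at alpha = 0.05 is 2.776.
significant = abs(t_stat) > 2.776
```

With three replicates per gene, the test has four degrees of freedom, so only fairly large differences relative to the replicate variance reach significance.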

Phylogenetic and Synteny Analysis
All of the murC gene sequences of plants and bacteria, publicly available in the NCBI GenBank (www.ncbi.nlm.nih.gov/genbank) database, were selected for the calculation of a phylogenetic tree. Amino acid sequences of both MurC1 and MurC2 were taken from Frankia cluster-2 strains and were used for identifying homologous sequences using blastP. Frankia protein sequences were taken from GenDB [28] and the JGI database [29]. Angiosperm and cyanobacterial sequences were taken from GenBank. Liverwort, hornwort, moss, fern, and conifer sequences were taken from the 1KP database (www.oneKP.com [30][31][32]). Additional bacterial sequences were taken from the UniProt database; here, only the reviewed sequences were selected. All of the sequences were used for a blastP search against the GenBank database to confirm MurC identity. All of the sequences used are given in Supplementary Table S2.
First, a multiple alignment of all protein sequences encoded by the murC genes was created using the CLUSTAL Omega algorithm with 25 iterations and was trimmed using UGENE [33,34]. The model of substitution was predicted using ModelTest-NG on the CIPRES portal [35]. The phylogenetic tree was constructed using RAxML on the CIPRES portal with predefined settings, using the LG model of amino acid evolution and with 250 bootstrap iterations [35].
The murC synteny in Frankia genomes was studied in the GenDB database for the genomes of cluster-2 strains, and in the JGI database for the genomes of cluster-1, -3, and -4 strains.

Tetranucleotide Frequency
IslandViewer 4 [36] integrates genomic island (GI) predictions from multiple methods: IslandPath-DIMOB, based on nucleotide bias and the presence of mobility genes; SIGI-HMM, based on codon usage bias with a Hidden Markov Model approach; IslandPick, based on a comparative genomics approach; and the previously published Islander database, a fourth, highly precise method that specifically predicts GIs integrated into tRNAs and tmRNAs with a precise definition of boundaries.
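To illustrate the general idea behind a tetranucleotide frequency comparison, the following minimal sketch counts overlapping tetranucleotides and correlates the profile of a sequence window with the profile of the whole sequence. It is not the algorithm used by IslandViewer 4 or any of its component methods; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides

def tetra_freq(seq):
    """Relative frequency of each overlapping tetranucleotide in seq.
    Tetranucleotides containing non-ACGT characters are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    return [counts[k] / total if total else 0.0 for k in KMERS]

def pearson(x, y):
    """Pearson correlation between two frequency profiles of equal length."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Toy sequences (hypothetical, not Frankia data): a GC-rich "genome" and an AT-rich window.
genome = "GCGCCGGCGACGGCCGCGTCGGCGAGCGCC" * 20
foreign_window = "ATATTTAAATATTAAATTTATATAAATTAT" * 2

r_native = pearson(tetra_freq(genome), tetra_freq(genome[:120]))
r_foreign = pearson(tetra_freq(genome), tetra_freq(foreign_window))
```

A window whose profile correlates poorly with the genome-wide profile is a candidate for atypical origin; a region that has had a long time to adapt to the host genome would not stand out in such a comparison.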

Protein Modelling
A MurC model was built for each group using SWISS-MODEL [37][38][39][40][41]. Because of the variation of sequences making up the outgroup, no protein was modelled for the TheCaHe group. The best model for each group was selected, and the different models were compared in 3D using the Dali server [42] and using the crystal structure of Haemophilus influenzae MurC as a template (PDB ID: 1P31 [43]). The conservation within each group was visualised using Consurf [44][45][46], which was also used to compare the protein structure between the different groups.

Results
Figure 1 shows the organisation of the different copies of murC in the genomes of Frankia strains from the four different clusters, with murG as a reference. Only the strains from cluster-2 and -3 contain two copies of murC, murC1 and murC2, while the strains from cluster-1 and -4 contain only one copy, murC2. The organisation of the two copies differs in cluster-2 from that in cluster-3. While in the genomes of the cluster-2 strains, the two copies have the same orientation and might be organised in an operon together with murG (ca. 100 bp between murC2 and murC1), in the genomes of the cluster-3 strains, murC1 and murC2 show opposite orientations, and a bioinformatically predicted open reading frame (ORF) encoding a nitroreductase family deazaflavin-dependent oxidoreductase, sharing the orientation of murC2, is present between both genes. The same ORF can also be found between murG and murC2 in the genomes of the Frankia strains from cluster-1.
Figure 1. murG is used as a reference for the orientation of the different murC genes. MurC1, labelled in light red, was found in the strains from cluster-2 and cluster-3. MurC2, presented in orange, was found in all of the strains from all four clusters. A bioinformatically predicted open reading frame encoding a nitroreductase family deazaflavin-dependent oxidoreductase, labelled in dark red, was found in cluster-1 and cluster-3.
In cluster-1 and -4, murG is the last gene in a long operon of genes encoding the enzymes involved in peptidoglycan biosynthesis; in cluster-3, it is the second-to-last gene in this conserved operon. In cluster-2, murG-murC2-murC1 alone form an operon.

Consensus Amino Acid Sequence of Frankia MurC2 Shows Higher Similarity to MurC from Plants than from other Actinobacteria
The amino acid sequences of MurC from various other species were used to build a maximum-likelihood phylogenetic tree (Supplementary Figure S2). The backbone of the resulting tree allowed for distinguishing the following five different groups: MurC homologs from Gram-positive bacteria (I) in which Frankia MurC1 (Ia) proteins are embedded; a branch comprising plant MurC sequences (IIa) and Frankia MurC2 (IIb); a branch with the MurC proteins from cyanobacteria (III) and one for Gram-negative bacteria (IV); and a last group containing sequences from Campylobacter, Helicobacter, and Thermotoga (referred to as TheCaHe (V)). Consensus sequences were generated for the different groups using UGENE (alignments are shown in Supplementary Figure S3). A heatmap was generated to visualise the similarity matrix of the consensus sequence for each of the groups (Figure 2), as well as for all of the analysed sequences separately (interactive Supplementary Figure S1). The consensus sequence of Frankia MurC1 showed the highest similarity with the consensus of MurC from other actinobacteria (62%), while the consensus sequence of Frankia MurC2 displayed the highest similarity with the consensus sequence of MurC from plants (63%). The amino acid similarity between the consensus sequences of Frankia MurC1 and MurC2 was 46%, while the similarity of either consensus sequence compared with the consensus sequence of Gram-negative bacteria, cyanobacteria, or TheCaHe, individually, was below 57%.
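A pairwise comparison of aligned consensus sequences of the kind summarised in the heatmap can be sketched as below. As a simplification, the sketch scores percent identity rather than a substitution-matrix-based similarity, so it does not reproduce the values reported here; the function names and the short toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical positions between two aligned sequences of equal length.
    Columns in which both sequences carry a gap are ignored."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def identity_matrix(aligned):
    """All-against-all percent identity for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(m, n): percent_identity(aligned[m], aligned[n]) for m in names for n in names}

# Toy aligned consensus fragments (hypothetical, not MurC sequences).
aligned = {
    "groupA": "MKV-LAGG",
    "groupB": "MKI-LSGG",
    "groupC": "MRVALAG-",
}
matrix = identity_matrix(aligned)
```

The resulting dictionary is symmetric with 100% on the diagonal and can be passed to any plotting library to draw a heatmap.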

No Sequence Evidence for Horizontal Gene Transfer (HGT) could be Found
The similarity of MurC2 with plant MurC proteins suggested the possibility of HGT. Therefore, IslandViewer [36] was used to identify the genomic islands (GIs), commonly defined as clusters of genes of a probable horizontal origin in microbial genomes. However, GC content analysis and tetranucleotide analysis (Supplementary Figure S4) could not confirm the HGT hypothesis. Nevertheless, it has to be kept in mind that, if the copy was acquired by HGT, this must have taken place at least 100 Mya [47], which means that sufficient time has passed for the codon adaptation of the copy to the genome. In summary, the HGT hypothesis could be neither confirmed nor disproven.
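As a rough illustration of what a GC content analysis looks for, the sketch below scans a sequence in fixed-size windows and reports windows whose GC content deviates from the sequence-wide value. The window size, step, and threshold are arbitrary assumptions chosen for the toy example, and the function names are hypothetical; this is not the procedure implemented in IslandViewer.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def gc_windows(seq, size, step):
    """(start, GC fraction) for each full window of the given size."""
    return [(start, gc_content(seq[start:start + size]))
            for start in range(0, len(seq) - size + 1, step)]

def atypical_windows(seq, size, step, threshold):
    """Windows whose GC fraction differs from the sequence-wide value by more than threshold."""
    overall = gc_content(seq)
    return [(start, gc) for start, gc in gc_windows(seq, size, step)
            if abs(gc - overall) > threshold]

# Toy sequence (hypothetical): a GC-rich background with a short AT-rich insertion.
toy = "GC" * 50 + "AT" * 10 + "GC" * 50
flagged = atypical_windows(toy, size=20, step=20, threshold=0.3)
```

In the toy sequence only the AT-rich insertion is flagged. A gene that was transferred long ago and has since adapted to the composition of its host genome would not be detected in this way, which is why a negative result cannot rule out HGT.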

Protein Modelling Reveals Little Structural Difference among Different Groups of MurC Sequences
While the amino acid sequence similarity between the MurC1 and MurC2 proteins was low, the modelling of the proteins revealed few structural differences (Figure 3). Most of the protein models (illustrated in yellow and orange) could be aligned to each other and to the crystal structure of Haemophilus influenzae MurC (given in green, Figure 3). The amino acid sequences close to the substrate-binding site showed the highest conservation (given in magenta and indicated by an arrow, Supplementary Figure S5), while the least conservation was found outside of the active sites of the enzyme (illustrated in blue, Supplementary Figure S5).

Gene Expression Data Show Differential Expression of murC1 and murC2 in Frankia Cluster-2 in Nodules, but not in Frankia Cluster-3
In order to determine whether either murC1 or murC2 was preferentially expressed in symbiosis, the relative expression levels of murC1 and murC2 were studied in nodules induced by cluster-2 strains; as these strains could not be cultured thus far, their expression levels under non-symbiotic conditions could not be analysed. For the cluster-3 strain Frankia discariae BCU110501, the expression levels of murC1 and murC2 were compared in the nodules as well as in nitrogen-fixing in vitro cultures (Figure 4). Overall, murC2 was found to be expressed at significantly higher levels than murC1 in the nodules induced by the cluster-2 strains belonging to different uncultured species, i.e., Candidatus Frankia datiscae Dg1, Candidatus F. datiscae Cm1, Candidatus Frankia californiensis Dg2, and Candidatus Frankia meridionalis Cppng1, as well as in Frankia coriariae BMG5.1 [13][14][15]17,22]. In the only cluster-3 strain examined, F. discariae BCU110501, no significant differences between the expression levels of the two genes could be found either in the nodules or in the culture.
[14], Candidatus Frankia datiscae Cm1 [14], Frankia coriariae BMG5.1 [17], Candidatus Frankia datiscae Dg1 [15], and Candidatus Frankia meridionalis Cppng1 [13]), and in the nodules induced by the in vitro cultivated strain from cluster-3 (right graph; Frankia discariae BCU110501 [22]). The boxplots represent the mean of three independent biological samples; individual data points are presented by dots. The expression levels were determined by RT-qPCR and normalized against the expression level of the gene encoding translation initiation factor three (IF-3), infC. Significant differences between the expression levels of murC1 and murC2, individually, were found for strains from cluster-2 (Student's t-test, p < 0.05 indicated by asterisk *), but not from cluster-3.

Discussion
The genomes of the Frankia strains from cluster-2 and cluster-3 contain two types of murC, murC1 and murC2, while the genomes of the cluster-1 and cluster-4 strains only contain murC2. As the analysis included seven cluster-1 genomes and four cluster-4 genomes, the possibility that murC1 was missed every time because of insufficient coverage can be excluded. Furthermore, both genes are linked in cluster-1 and cluster-3; so, missing one of them repeatedly would be even more unlikely. Based on the sequence comparisons and phylogenetic analysis, MurC1 is more similar to MurC of other Gram-positive bacteria, including actinobacteria, and should therefore be considered as the ancestral type of Frankia MurC. As cluster-2 represents the earliest divergent symbiotic cluster [13][14][15][16][17], it can be concluded that the second type, murC2, which is present in all Frankia genomes available, was acquired by the common ancestor and maintained in the strains from all four clusters. The phylogeny of MurC2 supports the position of cluster-2 as the earliest divergent cluster, contradicting previous results [15], which can be explained by the lack of differentiation between murC1 and murC2 in the earlier analysis. Interestingly, the amino acid sequences of Frankia MurC2 show more similarity to the sequences of plant MurC than to the MurC sequences taken from other bacteria. Similarly, in the phylogenetic tree, Frankia MurC2 and plant MurC appear as sister groups. Two scenarios are possible to explain these results.
First, murC2 could have been acquired via HGT by the common Frankia ancestor from an unidentified source. This should have preceded the evolution of the actinorhizal root nodule symbiosis and the separation between symbiotic and non-symbiotic strains. It is assumed that root nodule symbioses evolved ca. 100 Mya, and that Frankia strains were the original microsymbionts [47]. The fact that MurC2 shows a higher similarity to plant than bacterial MurC sequences would be consistent with a close relationship between future host plants and Frankia as a precondition for the evolution of an intracellular symbiosis. However, if murC2 was acquired from the future plant host, the highest similarity would be expected with angiosperm MurC sequences, which diverged from gymnosperms ca. 200 Mya. Our data do not support this. Recent studies have shown that the plant enzymes for the synthesis of chloroplast peptidoglycan, such as MurC, do not originate from cyanobacteria, but from an unknown bacterial source [22,48]. Thus, the origin of Mur ligase genes is clearly more complex than it was assumed when the endosymbiosis hypothesis first arose. At any rate, using software for the identification of putative genomic islands, the HGT hypothesis could be neither confirmed nor disproven.
The second explanation for the presence of two types of murC in Frankia genomes is a gene duplication event. This could then have been followed by convergent evolution, which led to more similarity between MurC2 and plant MurC proteins. Frankia are able to grow both in soil and intracellularly in the plant during root nodule symbiosis. Within symbiosis, bacterial differentiation is steered by the plant (reviewed by Pawlowski and Demchenko [9]), comparable to the direction of the differentiation of chloroplasts by the host cell. The similarity of Frankia MurC2 to plant MurC proteins might be associated with the presence of regulatory processes, which allow for easier structural changes of the peptidoglycan required for the intracellular lifestyle. This assumption is supported by the fact that in cyanobacteria, murC and murB were found to play a role in the differentiation of nitrogen-fixing heterocysts [7]. However, this would not explain why members of Frankia cluster-4, which do not engage in symbiosis, would have retained the MurC type more similar to that of plants over the ancestral MurC type of Gram-positive bacteria. At the same time, cluster-2 strains, which have a limited saprotrophic potential [14], did retain both types.
Analysis of the expression levels of both murC types showed that in the cluster-2 strains, murC2 was expressed at higher levels than murC1. Based on their organisation within the genome (Figure 1), it is possible that a weak terminator is present between the two copies in the cluster-2 genomes, resulting in lower expression levels of murC1. Intriguingly, in the nodules induced by a member of Frankia cluster-3, as well as in nitrogen-fixing cultures of the same strain, both types of murC were expressed at similar levels. However, it is known that MurC activity is regulated post-translationally by a Ser/Thr kinase [49]. Hence, it is not clear whether the transcription levels reflect the enzyme activity levels.
The synteny of the Frankia murC genes (Figure 1) does not seem to support a common origin of murC2. The situation in the genomes of the cluster-1, -3, and -4 strains would be consistent with a common origin; the synteny in cluster-3 would then represent the ancestral situation, while the cluster-1 strains would have lost murC1 and the cluster-4 strains would have lost murC1 as well as the conserved gene. However, the synteny of murC1 and murC2 in the genomes of the cluster-2 strains, while supporting an origin of murC2 by duplication of murC1, does not fit this pattern. In summary, if the similarity between Frankia murC2 and plant murC genes is due to convergent evolution, duplication of murC, followed by adaptive evolution of the second copy, could have happened more than once during the evolution of Frankia.
Altogether, Frankia murC1 has been replaced by Frankia murC2 in cluster-1 and cluster-4, and is expressed at much lower levels than Frankia murC2 in cluster-2. Thus, selection clearly favours murC2 over murC1 in Frankia. This could not be linked to differences in the protein structure, so it should be due to differences in the secondary modification.

Conclusions
Frankia genomes can contain two types of murC, murC1 and murC2. The former encodes a protein more similar to the MurC of other Gram-positive bacteria, including actinobacteria, and thus presumably represents the ancestral type of Frankia MurC. The presence of murC2 in the strains of the non-symbiotic cluster-4 indicates that the origin of murC2 predates the evolution of actinorhizal symbioses.
MurC2 shows a higher similarity with the MurC proteins from plants than from bacteria, which could be explained by HGT or by gene duplication followed by convergent evolution; detailed phylogenetic analysis involving MurC proteins from different groups of higher plants, as well as the synteny situation in Frankia genomes, favour the second option.
In spite of the fact that the amino acid sequences of MurC1 and MurC2 show significant divergence, murC1 was lost in Frankia clusters -1 and -4, and murC2 shows consistently higher expression levels in symbiosis in Frankia cluster-2, indicating selection for murC2 over murC1; modelling did not show any significant structural differences between both types of MurC.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/4/432/s1. Figure S1: Interactive homology heatmap of all MurC sequences used in this study. Figure S4: Tetranucleotide analysis for the identification of genomic islands (GIs). IslandViewer 4 was used to identify GIs in different Frankia genomes [36]. Predictions of genomic islands in the Candidatus Frankia datiscae Dg1 genome (accession number CP002801.1) are shown in a circular visualization, with blocks coloured according to the prediction method: IslandPick (green), IslandPath-DIMOB (blue), SIGI-HMM (orange), Islander (turquoise). Figure S5: Models of MurC protein of Gram-positive bacteria, Gram-negative bacteria, cyanobacteria, Frankia, and plants, using the solved crystal structure of Haemophilus influenzae as template (PDB accession number: 1P31), using Consurf software. Conserved sites are illustrated in magenta, less conserved sites in blue. Table S1: Primers used in this study. Table S2: MurC protein sequences used in this study.

Funding: This study was financed by the Swedish Research Council Vetenskapsrådet (VR 2012-03061 to K.P.). The bioinformatics support of the BMBF-funded project "Bielefeld-Giessen Center for Microbial Bioinformatics" (BiGi), and the BMBF grant FKZ 031A533 within the German Network for Bioinformatics Infrastructure (de.NBI) are gratefully acknowledged.

Acknowledgments:
We thank Dan Sjöstrand for advice on protein modelling and Anna Pettersson (Department of Ecology, Environment, and Plant Sciences, Stockholm University) for taking care of the plants. We also thank Rán Finnsdóttir and Anaís Carpelan (Swedish Agricultural University, Uppsala) for collecting the MurC amino acid sequences from various databases. We thank Luis Wall for providing the D. trinervis seeds and BCU110501 culture for this study, and Amir Ktari and Maher Gtari for the BMG5.1 culture.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.