Genes 2012, 3(2), 291-319; doi:10.3390/genes3020291

Article
The Chlamydiales Pangenome Revisited: Structural Stability and Functional Coherence
Fotis E. Psomopoulos 1,2, Victoria I. Siarkou 3, Nikolas Papanikolaou 4, Ioannis Iliopoulos 4, Athanasios S. Tsaftaris 1,5, Vasilis J. Promponas 6 and Christos A. Ouzounis 1,6,7,*
1 Institute of Agrobiotechnology, Centre for Research & Technology Hellas (CERTH), Thessaloniki GR-57001, Greece; E-Mails: fpsom@certh.gr (F.E.P.); tsaft@certh.gr (A.S.T.)
2 Department of Electrical & Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece
3 Laboratory of Microbiology & Infectious Diseases, Faculty of Veterinary Medicine, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece; E-Mail: vickysi@vet.auth.gr
4 Division of Medical Sciences, University of Crete Medical School, Heraklion GR-71110, Greece; E-Mails: papnikol@med.uoc.gr (N.P.); iliopj@med.uoc.gr (I.I.)
5 Department of Genetics & Plant Breeding, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece
6 Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, P.O. Box 20537, Nicosia CY-1678, Cyprus; E-Mail: vprobon@ucy.ac.cy
7 Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
* Author to whom correspondence should be addressed; E-Mail: ouzounis@certh.gr; Tel.: +30-231-049-8473; Fax: +30-231-049-8270.
Received: 27 March 2012; in revised form: 2 May 2012 / Accepted: 8 May 2012 / Published: 16 May 2012

Abstract

The entire publicly available set of 37 genome sequences from the bacterial order Chlamydiales has been subjected to comparative analysis in order to reveal the salient features of this pangenome and its evolutionary history. Over 2,000 protein families are detected across multiple species, with a distribution consistent with that of other studied pangenomes. Of these, there are 180 protein families with multiple members, 312 families with exactly 37 members corresponding to core genes, 428 families with peripheral genes with varying taxonomic distribution and finally 1,125 smaller families. The fact that, even for smaller genomes of Chlamydiales, core genes represent over a quarter of the average protein complement signifies a certain degree of structural stability, given the wide range of phylogenetic relationships within the group. In addition, the propagation of a corpus of manually curated annotations within the discovered core families reveals key functional properties, reflecting a coherent repertoire of cellular capabilities for Chlamydiales. We further investigate over 2,000 genes without homologs in the pangenome and discover two new protein sequence domains. Our results, supported by the genome-based phylogeny for this group, are fully consistent with previous analyses and current knowledge, and point to future research directions towards a better understanding of the structural and functional properties of Chlamydiales.
Keywords: comparative genomics; pangenome analysis; Chlamydiales; protein family detection; genome annotation; genome trees

1. Introduction

Members of the order Chlamydiales are obligate intracellular bacteria characterized by a unique developmental cycle. They are important pathogens of humans and animals, causing a wide range of diseases, including several zoonoses [1,2,3]. The order Chlamydiales, which is separated from other eubacteria by a deep branch in ribosomal RNA-based phylogenetic trees, has been enriched by new lineages. Besides the family Chlamydiaceae, in which important chlamydial pathogens are grouped, new families, such as Parachlamydiaceae, Simkaniaceae and Waddliaceae, have been recognized to accommodate newly discovered pathogenic and non-pathogenic chlamydial organisms [4,5,6].

Since the release of the first chlamydial genome sequence from Chlamydia trachomatis (serovar D) [7], further genomes have been sequenced, offering insights into the genome organization and functional capacity of the corresponding species [8]. Besides its crucial importance for applied research in medical and veterinary microbiology [9], this corpus of genomic information is also key to understanding the evolutionary position of various chlamydial species (or strains) and to inferring the internal phylogeny of this distinct taxon [8,10,11,12].

As the intracellular lifestyle imposes constraints on gene content and metabolic capabilities, the Chlamydiales might represent one of the best datasets for the development of pangenome analysis methods [13]. Additional challenges are the wide variety of chlamydial genome sizes with unequal rates of reduction, and a repertoire of less characterized proteins than other bacterial groups whose pangenomes have been analyzed, e.g., Streptococcus or Salmonella [14,15].

Previously, we used the genome of Chlamydia trachomatis [7] as a case study for annotation transfer quality [16]. Using a novel encoding scheme and a scoring function called TABS (transitive annotation-based scale) [16], our main finding was that, despite a number of inconsistencies, automated annotation pipelines performed remarkably well when benchmarked against a manually curated annotation corpus [16]. These results are important for the quantification of reproducibility and consistency in genome-wide annotation [17].

In this work, we explore the entire set of the Chlamydiales pangenome with a broad collection of genome sequences publicly available to date (31 Chlamydiaceae and six other Chlamydiales genomes), twice as many as in a similar recent analysis [18]. Importantly, our pangenome analysis pipeline incorporates recently sequenced genomes of key Chlamydiaceae species not previously reported, thus augmenting our understanding from previous findings [18,19].

We focus on key aspects of pangenome analysis and explore multiple facets of the Chlamydiales gene content in terms of protein-coding genes and families. We also provide certain key findings that might illuminate the evolutionary history of this group as well as interesting sequence motifs not widely shared within this order. Beyond the confirmation of the recent analysis of the Chlamydiales as mentioned above [18], we also use this group to expand on methods for pangenome analysis [13,20] by proposing a pangenome analysis pipeline. Our results are consistent with wider studies of pangenomes [21] and provide additional knowledge for Chlamydiales. In conclusion, pangenome analysis offers an opportunity for the study of bacterial genome evolution, the development of relevant methods and the understanding of genome structure and proteome function on a large scale.

2. Experimental Section

2.1. Data Collection

All protein sequence data from 37 genomes were compiled into a single data collection (February–July 2011), including the most recently published Chlamydiales genomes. In total, 43,736 protein-coding genes were extracted from public databases, corresponding to the entire set of 37 genome sequences from the bacterial order Chlamydiales currently available (Table 1). Sequence data were codified following the style of the COGENT database [22], for easy identification by both programs and human users (Supplement S1). This notation is followed throughout this work. The COGENT scheme encodes genus and species names into a four-character identifier prefix, followed by a code for the strain name, its version (in this collection all versions are considered as version 1 and are optionally hidden) and finally, for proteins, the relative order of the sequence within the genome [23] (Table 1). We have also recorded the date of publication for the corresponding genome (or the release date where no publication was available) (Supplement S2).
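The identifier scheme described above can be illustrated with a small parser. This is a minimal sketch, assuming the hyphen-separated layout seen in identifiers such as CTRA-DUW-01 (Table 1) and CCAV-GPI-01-000824 (cited later in the text); the `CogentId` type and the `parse_cogent_id` function are hypothetical names introduced here and are not part of the COGENT database software.

```python
from typing import NamedTuple, Optional


class CogentId(NamedTuple):
    """Fields of a COGENT-style identifier (hypothetical helper type)."""
    species: str          # four-character genus/species prefix, e.g. "CTRA"
    strain: str           # strain code, e.g. "DUW"
    version: str          # version field, "01" throughout this collection
    order: Optional[int]  # relative order of the protein; None for a genome


def parse_cogent_id(identifier: str) -> CogentId:
    """Split a genome or protein identifier into its fields."""
    parts = identifier.strip().split("-")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected identifier layout: {identifier!r}")
    species, strain, version = parts[:3]
    if len(species) != 4:
        raise ValueError(f"species prefix must have four characters: {identifier!r}")
    # Protein identifiers carry a fourth field; genome identifiers do not.
    order = int(parts[3]) if len(parts) == 4 else None
    return CogentId(species, strain, version, order)
```

For example, `parse_cogent_id("CCAV-GPI-01-000824")` yields the species prefix `CCAV`, the strain code `GPI`, version `01` and protein order 824, whereas a genome-level identifier such as `CTRA-DUW-01` has no order field.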

Table 1. List of Chlamydiales genome sequences used in this study.
## | Species and Strain Name/Codes | Internal Identifier | Protein-Coding Genes
01 | Candidatus Protochlamydia amoebophila UWE25 | CPRO-UWE-01 | 2,031
02 | Chlamydia muridarum Nigg | CMUR-NIG-01 | 911
03 | Chlamydia trachomatis 434/Bu | CTRA-434-01 | 874
04 | Chlamydia trachomatis A/HAR-13 | CTRA-AHA-01 | 919
05 | Chlamydia trachomatis B/Jali20/OT | CTRA-BJA-01 | 875
06 | Chlamydia trachomatis B/TZ1A828/OT | CTRA-BTZ-01 | 880
07 | Chlamydia trachomatis D/UW-3/CX | CTRA-DUW-01 | 895
08 | Chlamydia trachomatis L2b/UCH-1/proctitis | CTRA-L2B-01 | 874
09 | Chlamydophila abortus S26/3 | CABO-S26-01 | 932
10 | Chlamydophila caviae GPIC | CCAV-GPI-01 | 1,005
11 | Chlamydophila felis Fe/C-56 | CFEL-FEC-01 | 1,013
12 | Chlamydophila pneumoniae AR39 | CPNE-AR3-01 | 1,112
13 | Chlamydophila pneumoniae CWL029 | CPNE-CWL-01 | 1,052
14 | Chlamydophila pneumoniae J138 | CPNE-J13-01 | 1,069
15 | Chlamydophila pneumoniae TW-183 | CPNE-TW1-01 | 1,113
16 | Waddlia chondrophila WSU 86-1044 | WCHO-WSU-01 | 1,956
17 | Chlamydia trachomatis E/150 | CTRA-E15-01 | 927
18 | Chlamydophila pecorum E58 | CPEC-E58-01 | 988
19 | Chlamydophila psittaci 6BC | CPSI-6BC-01 | 975
20 | Chlamydophila abortus LLG | CABO-LLG-01 | 925
21 | Chlamydophila pneumoniae LPCoLN | CPNE-LPC-01 | 1,105
22 | Chlamydophila psittaci Cal10 | CPSI-CAL-01 | 1,005
23 | Parachlamydia acanthamoebae UV7 | PACA-UV7-01 | 2,788
24 | Parachlamydia acanthamoebae str. Hall’s coccus | PACA-HAL-01 | 2,809
25 | Simkania negevensis Z | SNEG-ZXX-01 | 2,518
26 | Waddlia chondrophila 2032/99 | WCHO-203-01 | 2,015
27 | Chlamydophila psittaci 01DC11 | CPSI-01D-01 | 975
28 | Chlamydophila psittaci 02DC15 | CPSI-02D-01 | 978
29 | Chlamydophila psittaci 08DC60 | CPSI-08D-01 | 973
30 | Chlamydia trachomatis D-EC | CTRA-DEC-01 | 878
31 | Chlamydia trachomatis D-LC | CTRA-DLC-01 | 878
32 | Chlamydia trachomatis E/11023 | CTRA-E11-01 | 926
33 | Chlamydia trachomatis G/11074 | CTRA-G74-01 | 919
34 | Chlamydia trachomatis G/11222 | CTRA-G22-01 | 927
35 | Chlamydia trachomatis G/9301 | CTRA-G93-01 | 921
36 | Chlamydia trachomatis G/9768 | CTRA-G97-01 | 920
37 | Chlamydia trachomatis Sweden2 | CTRA-SWE-01 | 875
Total | | | 43,736

The first column signifies the inclusion order into the genome collection and does not reflect any other relationship. The second column lists the species and strain name, the third column the COGENT-style identifier and the last column the number of protein-coding genes.

2.2. Sequence Comparison

All protein sequence data were masked using CAST with default parameters (threshold = 40), to exclude compositionally biased regions [24]. In total, 6,906 such regions were filtered out; these are provided for further study (Supplement S3).

The masked sequences were used as queries against the genome corpus, in an all-against-all mode with BLAST (blastall, e-value threshold 10⁻⁶) [25,26]; in total, more than 40,000 BLAST searches were performed and 1,709,325 significant similarities below threshold were obtained (Supplement S4).
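The thresholding step can be sketched as a filter over pairwise hits. The example below is illustrative only: it assumes the hits are available in the standard 12-column tabular BLAST layout (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), which the text does not specify, and the function name `significant_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def significant_pairs(
    lines: Iterable[str], max_evalue: float = 1e-6
) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, subject, e-value) for hits at or below the threshold.

    Assumes tab-separated, 12-column BLAST output; comment lines that
    start with '#' and blank lines are skipped. Self-hits are kept.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 holds the e-value
        if evalue <= max_evalue:
            yield query, subject, evalue
```

Self-hits are deliberately retained in this sketch, since the analysis later distinguishes sequences that fail to detect themselves.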

2.3. Clustering and Annotation

The pairwise similarity list (Supplement S4) was submitted to MCL sequence clustering [27] with default parameters (e.g., inflation value 2.0). Clusters were sorted by size (number of members in a cluster; Supplement S5) and assigned incrementing integer identifiers, so that the largest clusters have the smallest identifiers (see Results and Appendix-Table 1). This approach has also been used successfully elsewhere [28] as a method of choice.
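The numbering convention for clusters can be sketched as follows. This is a minimal illustration that takes clusters as plain lists of member identifiers; how ties between equally sized clusters were ordered is not stated in the text, so the sketch simply keeps their input order, and the name `number_clusters` is hypothetical.

```python
from typing import Dict, List, Sequence


def number_clusters(clusters: Sequence[Sequence[str]]) -> Dict[int, List[str]]:
    """Assign identifiers 1, 2, 3, ... to clusters in decreasing size order.

    Python's sort is stable, so clusters of equal size keep their
    original relative order (an assumption, not a documented rule).
    """
    ordered = sorted(clusters, key=len, reverse=True)
    return {number: list(members) for number, members in enumerate(ordered, start=1)}
```

With this convention, cluster 1 is always the largest family, which is why the multi-member families discussed in the Results carry the lowest identifiers.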

Annotation transfer based on the first chlamydial genome ever sequenced was implemented through the direct matching of the lead sequences to a previously highly curated dataset for Chlamydia trachomatis D/UW-3/CX [7].

The annotation qualifiers used in the manually curated corpus [16] are: ENZYME (for enzymes with EC number assignments), FUNCTION (for other protein functions), SIMILAR-TO (for those sequences with a similarity to a protein of known function but no specific assignment) and DOMAIN (for the existence of a known, named protein sequence domain) [16] (Appendix-Table 1).

Sequence matching of the original dataset to the data collection presented here was performed by MagicMatch [29], which was the first scheme to implement the MD5 checksum for protein sequence identification, an approach later propagated in all major database resources.
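Checksum-based sequence matching of this kind can be sketched with the Python standard library. The example is a hedged illustration and not the MagicMatch implementation: the normalization step (removing whitespace and upper-casing) is an assumption, and `sequence_md5` and `match_by_checksum` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def sequence_md5(sequence: str) -> str:
    """Return the MD5 hex digest of a protein sequence.

    Whitespace is removed and letters are upper-cased first, so that
    formatting differences do not change the checksum (an assumption).
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def match_by_checksum(
    reference: Dict[str, str], collection: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair identifiers from two datasets whose sequences are identical."""
    index: Dict[str, List[str]] = {}
    for identifier, sequence in collection.items():
        index.setdefault(sequence_md5(sequence), []).append(identifier)
    pairs = []
    for ref_id, sequence in reference.items():
        for identifier in index.get(sequence_md5(sequence), []):
            pairs.append((ref_id, identifier))
    return pairs
```

Because only exact sequence identity produces equal digests, this approach links entries across datasets regardless of how each dataset names its proteins.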

2.4. Analysis of Unique Genes

All unique genes, i.e., more than 2,000 genes with no similarity within the pangenome, were searched against the non-redundant protein sequence database (nrdb: 15,052,178 entries) [30]. Results from this search were evaluated manually and key similarities were extracted for further investigation (Supplement S6).

2.5. Genome Trees

Genome-based trees were calculated using phylogenetic profile distance [31,32]. Similarity values were measured as the number of shared genes represented by phylogenetic profiles, symmetrized by taking the minimum shared value, normalized by the minimum self-similarity and converted into distance values as previously described [32,33] (Supplement S7).
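One possible reading of this distance calculation is sketched below. It is an interpretation under stated assumptions, not the published procedure of [32,33]: `shared[a][b]` is taken to be the number of genes of genome a with a counterpart in genome b, `shared[a][a]` the self-similarity of genome a, and the conversion to a distance is assumed to be one minus the normalized similarity. The function name `profile_distances` is hypothetical.

```python
from typing import Dict, Tuple


def profile_distances(
    shared: Dict[str, Dict[str, int]]
) -> Dict[Tuple[str, str], float]:
    """Convert directed shared-gene counts into symmetric distances."""
    genomes = sorted(shared)
    distances = {}
    for a in genomes:
        for b in genomes:
            # Symmetrize: keep the smaller of the two directed counts.
            symmetric = min(shared[a][b], shared[b][a])
            # Normalize by the smaller self-similarity of the pair.
            normalizer = min(shared[a][a], shared[b][b])
            distances[(a, b)] = 1.0 - symmetric / normalizer
    return distances
```

Under these assumptions a genome has distance 0 to itself, and two genomes with no shared genes have distance 1; the resulting matrix could then be passed to any standard tree-building method.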

2.6. Sequence Alignments

Multiple sequence alignments were performed and visualized by JalView [34]. Novel motifs reported in this work are provided below and in FASTA format (Supplements S8 and S9).

2.7. Data Availability

Per-genome contributions to the pangenome are also provided (Supplement S10). All sequence data and results (in 10 Supplements) have been made available at datadryad.org, under the identifier [35].

3. Results

3.1. General Characteristics of the Chlamydiales Pangenome

The Chlamydiales collection analyzed here contains over 40,000 protein-coding genes in total, with ~1,200 genes per genome on average and significant deviations from that mean (Table 1). Following the clustering step that identifies protein families within the pangenome, we present the two extreme tails of this data collection in detail and comment on the intermediate cases. In other words, we primarily focus on the two most interesting classes of clusters: (i) those containing the core genes and (ii) those corresponding to “unique” genes without significant similarities within the pangenome, i.e., singleton clusters. The functional characterization of the entire complement, as well as further issues listed in the discussion for future research, are beyond the scope of this study.

3.2. Protein Families

In total, the clustering has yielded 5,554 clusters. For practical purposes, we define a protein family as a cluster that contains at least three genes. By that definition, the following clusters are not families: 294 cases of sequences that do not detect themselves in this comparison (typically because of short length, abnormal composition, or both), 2,038 unique genes (singletons) and 1,177 doublets. The remaining 2,045 clusters represent protein families with three or more members, distributed across 37 genomes (Figure 1).
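The bookkeeping behind these counts can be made explicit. The sketch below classifies a cluster by its size and checks that the reported class counts add up; the class labels, the function name `classify_cluster` and the size boundary for "small" families (three to ten members, inferred from the definition of peripheral families as having more than ten and fewer than 37 members) are assumptions made for illustration.

```python
def classify_cluster(size: int, n_genomes: int = 37, has_self_hit: bool = True) -> str:
    """Assign a cluster to one of the size classes used in the text."""
    if not has_self_hit:
        return "no self-hit"
    if size == 1:
        return "singleton"
    if size == 2:
        return "doublet"
    if size > n_genomes:
        return "multi-member family"
    if size == n_genomes:
        return "core-sized family"
    if size > 10:
        return "peripheral family"
    return "small family"  # three to ten members (inferred boundary)


# Counts as reported in the text and the abstract.
REPORTED = {
    "no self-hit": 294,
    "singleton": 2038,
    "doublet": 1177,
    "multi-member family": 180,
    "core-sized family": 312,
    "peripheral family": 428,
    "small family": 1125,
}

# Families are the classes with three or more members.
N_FAMILIES = sum(count for label, count in REPORTED.items() if label.endswith("family"))
N_CLUSTERS = sum(REPORTED.values())
```

Here `N_FAMILIES` evaluates to 2,045 and `N_CLUSTERS` to 5,554, matching the totals given in the text.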

Figure 1. Pangenome protein family size distribution. Cluster size is displayed on the x-axis (bins until 50 are all shown; above 50, bins are shown for each ten counts, labels for every five bin sizes); absolute frequency of clusters is shown on the left y-axis (bars, green curve); cumulative count of clusters is shown on the right y-axis (orange curve). Families are defined as those clusters with at least three members (see text); all cluster frequencies are shown here for completeness. The bimodal nature of the distribution can be seen between the peak at low cluster sizes and 37; above 37 there are multi-member and multi-species protein families (see text).

As expected, the protein family size distribution follows the shape observed in other pangenome analyses: it is clearly bimodal, with one peak at low-count families, which has been called the “accessory pool”, and another peak at the number of genomes under consideration, which has been called the “extended core” [21]. The so-called “character genes” (which we prefer to define as “peripheral”, as opposed to “core” genes) exhibit, by definition, a heterogeneous distribution across genomes (and between peaks) and present an additional challenge for further interpretation (Figure 1).

The peak at exactly 37 with 312 counts, i.e., 312 families with exactly 37 members, corresponds to the 37 genomes analyzed across the pangenome. Beyond that peak, there are 180 protein families with more than 37 members (clusters 1–180) (Supplement S5), of which ten contain more than 100 members and are discussed below.

3.3. Multi-Member Families

The four largest families, each with more than 120 members, are the ABC transporter permeases (530 members), the polymorphic outer membrane proteins of Chlamydiaceae [36] (POMPs, 435 members), the flagellum-specific ATP synthases/type III secretion system ATPases, e.g., CT669 [37] (152 members), and a family of unknown function recently characterized as type III secreted effectors [38] (DUF582, 140 members) (Figure 2).

Figure 2. Top ten multi-member families within the pangenome. Genomes (with full COGENT-like codes) are shown on the x-axis, sorted by total protein-coding gene count (see also Table 1). Absolute cumulative counts of multi-member families are shown on the y-axis (displayed in the figure legend from left to right and then top to bottom, e.g., ABC transporter permeases, POMPs, type III secretion system ATPases, etc. according to size, see text), color coded according to figure legend.

Following those, there are another four families with more than 110 members each: the EF-Tu/EF-G/LepA family (119 members), the oligopeptide binding protein family OppA (114 members), the GroEL family (111 members) and finally the Ile-Leu-Val (ILV)-tRNA synthetases (111 members). These are followed by two families with more than 100 members, namely the dihydrolipoamide acetyltransferase E2 component/dihydrolipoamide succinyltransferase family (110 members) and the 3-oxoacyl-[acyl-carrier protein] reductase family (109 members) (Figure 2).

A significant number of multi-member families contain proteins of known function (Supplement S5). Interestingly, there are 172 families in total that contain only homologues from S. negevensis, W. chondrophila, P. acanthamoebae and Protochlamydia amoebophila, remarkably close to the 171 clusters of “orthologous” proteins in this group of species reported recently [18].

3.4. Core Genes

At the other end of the bimodal distribution, there are 312 families with 37 genes each, matching the number of genomes analyzed. However, eight of these clusters contain duplicates within a genome (clusters 224, 460: S. negevensis; 254, 276, 420: P. acanthamoebae; 255, 272: P. amoebophila; 429: W. chondrophila 203), two of which have unknown functional roles (Appendix-Table 1). Thus, there are exactly 304 protein families with 37 genes, each represented once in each genome, which can truly be called “core” genes; most of these have some source of annotation (Appendix-Table 1). They represent just over a quarter of the average chlamydial genome (304/1,182 = 26%).
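The distinction between a 37-member family and a strict core family can be sketched as a simple test. The example assumes COGENT-style protein identifiers in which everything before the last hyphen names the genome (e.g., CTRA-DUW-01-000001 belongs to CTRA-DUW-01); the function names `genome_of` and `is_strict_core` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def genome_of(protein_id: str) -> str:
    """Return the genome part of a COGENT-style protein identifier."""
    return protein_id.rsplit("-", 1)[0]


def is_strict_core(members: Iterable[str], genomes: Iterable[str]) -> bool:
    """True if every genome contributes exactly one member to the family."""
    counts = Counter(genome_of(member) for member in members)
    return set(counts) == set(genomes) and all(n == 1 for n in counts.values())
```

A family with 37 members in which one genome contributes two genes necessarily lacks a member from another genome, so it fails this test; this is how 312 families of size 37 reduce to 304 strict core families.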

Annotations transferred from the manually curated seed annotation corpus of C. trachomatis reveal a wide range of functional roles for this core set, as expected (Appendix-Table 1). Indeed, 227 families of the core set can be assigned to a functional role, according to the annotation qualifiers originally used (see Experimental Section). The remaining 77 families in this set do not carry any annotation (Appendix-Table 1). It can be argued, therefore, that this level of characterization of 75% (227/304) across 37 genomes signifies a functional coherence that is consistent with our current knowledge of this taxonomic order. This list is provided for further investigation by the community; it is worth pointing out that it encompasses basic cellular roles in genetic information processing (e.g., cluster 184), including transcription (e.g., cluster 187) and translation (e.g., clusters 242–243), metabolic transformations (e.g., cluster 182 or 196), transport systems (e.g., clusters 193–195) and other key processes (e.g., cluster 192). It is interesting to note that apart from complements represented by ribosomal proteins or aminoacyl-tRNA synthetases, other systems are also coherently detected, for example the NifU [39]/NifS [40] genes (clusters 221–222).

3.5. Peripheral Genes

Between the two extremes (viz. peaks) of the bimodal family size distribution lies a wide variety of cases with a clearly heterogeneous pattern. There are 428 families with more than ten and fewer than 37 members (not shown, available in Supplement S5). Their heterogeneous composition is reflected by the fact that 217 of the 428 families (just over 50%) do not contain a homolog outside the Chlamydiaceae, i.e., in the larger genomes mentioned above. Within this group, however, there is significant variation in family phylogenetic distribution (not shown) that needs to be explored in future research.

3.6. Unique Genes

In total, there are 2,038 unique genes represented by singleton clusters, i.e., genes that do not fall into families within the pangenome. The number of unique genes per genome varies significantly, from 0 to 796 (S. negevensis), with 55 unique genes on average. As a percentage of the genome, this ranges from 0 to 32% (S. negevensis), with an average of just over 3% per genome (Figure 3).

Figure 3. Correlation between genome size and unique genes. Genome size is given as the number of protein-coding genes (shown on the x-axis) against the count of unique genes (number of unique genes without homologs within the pangenome, shown on the y-axis; y-axis is displayed on logarithmic scale). The six points on the upper right part of the graph are evidently those genomes with largest gene counts, all outside the Chlamydiaceae family (see Table 1 and text). The pattern observed is primarily due to the sampling of taxonomic space of the Chlamydiales and will vary as more genomes from this group become available.

The densest part of the phylogeny exhibits no unique genes: this applies to 17 genomes, including most of the C. trachomatis and C. psittaci strains, C. pneumoniae CWL029 and C. abortus LLG (Figure 3; the missing points correspond to these 17 genomes with a zero value on the y-coordinate, available in Supplement S5). Twenty genomes have unique genes. Of these, six genomes have fewer than 10 such genes and one has 15 (Figure 3); all seven belong to the above group, and their unique genes account for less than 2% of their genome entries. Another five genomes with a handful of unique genes are C. pneumoniae AR39 (33/3%), TW-183 (43/4%) and LPCoLN (60/5%) as well as C. felis (27/3%) and C. caviae (29/3%). The remaining eight genomes contain the majority of unique genes, 1,818 in number or 89% of the total, ranging from 66 (W. chondrophila WSU, 3% of genome) to 796 genes (S. negevensis, 32% of genome). This is not entirely a biological effect but rather a sampling artifact arising from the deeper sequencing of the C. trachomatis/C. pneumoniae group (see below).

The six outliers that form a separate group (upper right, Figure 3) are all species with large genomes (ca. 2,000 protein-coding genes or more): the two W. chondrophila strains (3–4%), the two P. acanthamoebae strains (4–8%), P. amoebophila (20%) and S. negevensis (32%), listed here according to the absolute number of their unique genes per genome. In relative terms, however, two species, namely C. muridarum (36/4%) and C. pecorum (76/8%), contain a significant number of unique genes given their relatively small genome size (both fewer than 1,000 protein-coding genes).
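The percentages quoted in this section follow directly from the unique gene counts and the genome sizes in Table 1. The short sketch below recomputes them for the genomes whose counts are stated explicitly in the text; the dictionary and function names are hypothetical, and rounding to the nearest integer is an assumption about how the reported figures were derived.

```python
# (unique genes from the text, protein-coding genes from Table 1, reported %)
REPORTED_UNIQUE = {
    "C. pneumoniae AR39": (33, 1112, 3),
    "C. pneumoniae TW-183": (43, 1113, 4),
    "C. pneumoniae LPCoLN": (60, 1105, 5),
    "C. felis": (27, 1013, 3),
    "C. caviae": (29, 1005, 3),
    "C. muridarum": (36, 911, 4),
    "C. pecorum": (76, 988, 8),
    "W. chondrophila WSU": (66, 1956, 3),
    "S. negevensis": (796, 2518, 32),
}


def percent_unique(unique: int, total: int) -> int:
    """Unique genes as a rounded percentage of the genome's gene count."""
    return round(100 * unique / total)


def mismatches() -> list:
    """Genomes whose recomputed percentage differs from the reported one."""
    return [
        name
        for name, (unique, total, reported) in REPORTED_UNIQUE.items()
        if percent_unique(unique, total) != reported
    ]
```

Under these assumptions `mismatches()` returns an empty list, and the same arithmetic reproduces the other summary figures: 2,038 unique genes over 37 genomes is about 55 per genome, and 1,818 of 2,038 is about 89%.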

3.7. Properties of Unique Genes

As mentioned above, 2,038 genes are considered singletons in this analysis. Of those, a number of short genes might in fact fall into pangenome families (not shown), but they do not seriously affect the overall assessment (e.g., case CCAV-GPI-01-000824 in Supplement S6). This is an artifact of the different sensitivity of the two searches, the first against the 40,000 or so genes of the pangenome and the second against the entire nrdb database of more than 15 million sequences. While a full analysis of the unique gene complement of the Chlamydiales is in progress, it is interesting to report on a number of findings pertinent to this work.

A number of unique genes from the pangenome have identifiable homologs in the database, such as cell-wall associated hydrolases (TC0114 from C. muridarum Nigg), proteins of unknown function (e.g., pc0061, pc0549, pc0850, pc0855), endonucleases (e.g., pc0252), exonucleases (pc0951), transposases (e.g., pc0068), DNA repair proteins (e.g., pc0286), acyltransferases (e.g., pc0180), Mg chelatases (pc0480), oxidoreductases (pc0504), streptomycin 6-kinases (e.g., pc0510), metallophosphoesterases (pc0948) from P. amoebophila and LmbE/ypjG family proteins (e.g., wcw_0275) or transposases (e.g., wcw_0482) from W. chondrophila WSU. Similarly, multiple cases of similarity to families of known or unknown function are discovered for unique genes from the larger genomes (not shown).

One such case is an enigmatic, short and highly conserved motif containing the triplet Pro-Cys-Tyr (PCY), present in the C. pneumoniae AR39 CP0988 protein. This protein is 52 residues long and does not exhibit significant similarities to any other protein in the Chlamydiales pangenome. However, it does show similarity to a set of short proteins (<100 residues long) from various species, including Acinetobacter, Brucella, Clostridium, Coxiella, Curvibacter, Eubacterium, Parvimonas, Rhizobium, Ruminococcus, Selenomonas, Streptomyces, other longer proteins from Chloroflexi, Heliobacterium, Lactobacillus, the C-terminus of a Propionibacterium protein (HL046PA2) and an uncultured Acidobacteria bacterium HF4000_26D02, and importantly, to a number of longer plant proteins from Nicotiana tabacum, Pinus koraiensis, Solanum demissum (middle of protein) and Vitis vinifera (N-terminus, total length 1,193 residues) (Supplement S8). To our knowledge, this conserved region with its peculiar phylogenetic distribution has not been characterized previously, and can be considered a genuine novel domain of unknown function (Figure 4). It remains unclear whether the domain has been universally lost from the rest of the Chlamydiales pangenome or acquired by C. pneumoniae through horizontal transfer.

Another interesting example of a unique protein is the P. amoebophila pc0506. This 82-residue-long uncharacterized protein is evidently absent from the core pangenome and yet it exhibits significant similarity to four Verrucomicrobia proteins from Verrucomicrobium spinosum, Chthoniobacter flavus, Pedosphaera parvula and Coraliomargarita akajimensis, in this order of similarity, ranging from 53% down to 44% sequence identity (Figure 5). The above-mentioned proteins reportedly belong to the leucyl aminopeptidase superfamily (Supplement S9). The functional significance of this biochemical role for P. amoebophila is not yet understood. Yet, the strong mutual similarity within this protein family of Verrucomicrobial and P. amoebophila members (no other member in the entire pangenome) can be placed within the general controversy over the connection of Chlamydiales with the so-called PVC group [41,42] (see below).

Figure 4. Alignment of the PCY domain. The PCY motif is centered around position 15 of the multiple alignment. The domain was discovered following five iterations with PSI-BLAST with CP0988 as query sequence (GI:16752158), until convergence and an e-value threshold 0.005. In total 70 sequences were recovered; redundancy was removed at 95% with Jalview [34], resulting in 32 sequences shown here. The length of the domain is just 30 residues; boxes signify sequence identity at 50% or above (darker color: more conserved). GI labels are provided, along with sequence coordinates on the left of the alignment (see text for more details and discussion).

Figure 5. Alignment of a unique leucyl aminopeptidase family. The domain was discovered following five iterations with PSI-BLAST with pc0506 as query sequence (YP_007505.1). Display conventions as in Figure 4.

In all, it appears that the properties encoded by most unique genes, apart from their unusual phylogenetic distribution, represent accessory functional roles that provide additional versatility to the largest genomes in the group, possibly related to their extra functional capabilities. Two exceptions with seemingly central functions are wcw_0805, with similarity to the 50S L34 ribosomal protein family, and wcw_861, with similarity to 6-pyruvoyl tetrahydrobiopterin synthases, both from W. chondrophila WSU (not shown).

3.8. Protein Family Contributions from Genome Projects

As mentioned above, we have tracked the original publication (and/or release) dates for the genomes under consideration, in order to count the novel families detected per genome sequence (Supplement S10). By mapping the protein families that appear first in this ranking order, we can thus estimate the relative “novelty” or contribution of previously unseen protein families within the chlamydial pangenome and the typical “pangenome saturation curve” (Figure 6).
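The saturation curve described here can be computed with a single pass over the genomes in chronological order. The following is a minimal sketch, assuming each genome is represented by the set of cluster identifiers it contributes to; the function name `saturation_curve` and the input layout are hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def saturation_curve(
    ordered_genomes: Iterable[Tuple[str, Set[Hashable]]]
) -> List[Tuple[str, int, int]]:
    """Return (genome, novel families, cumulative families) per genome.

    `ordered_genomes` must already be sorted by publication or release
    date; a family counts as novel for the first genome that contains it.
    """
    seen: Set[Hashable] = set()
    rows = []
    for genome, families in ordered_genomes:
        novel = set(families) - seen
        seen |= novel
        rows.append((genome, len(novel), len(seen)))
    return rows
```

The second field of each row corresponds to the per-genome "novelty" and the third to the cumulative curve; a genome that is closely related to previously sequenced ones contributes few or no novel families.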

Figure 6. Protein family contributions from genome projects. Genome codes are sorted according to their original publication date (and/or release date, x-axis); absolute number of “novel” protein families within the pangenome are given (left y-axis, blue curve and square symbols); cumulative sum of protein families (up to 5,260, excluding those without self-hits, see text) is also shown, defined as a “pangenome saturation curve” (right y-axis, green curve and square symbols).


As expected, and discussed above (Figure 3), for the densest part of the group, little or no contributions have been provided. Apart from the larger genomes, which have added hundreds of new gene types [19], the more distant members of the group with small genomes, for instance C. caviae or C. pecorum, have also contributed a significant number (80 and 76, respectively—Supplement S10).
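The tracking of novel families described in this section can be illustrated with a minimal Python sketch. It assumes that each genome is represented by the set of protein family identifiers it contains and that genomes are supplied in publication (or release) order; the function name, genome codes and family identifiers are hypothetical and do not come from Supplement S10.

```python
from typing import Dict, List, Set, Tuple


def saturation_curve(
    genomes_in_order: List[str],
    families_by_genome: Dict[str, Set[str]],
) -> List[Tuple[str, int, int]]:
    """Return (genome, novel_families, cumulative_families) for each genome.

    A family counts as "novel" for the first genome, in the given order,
    in which it is observed.
    """
    seen: Set[str] = set()
    curve: List[Tuple[str, int, int]] = []
    for genome in genomes_in_order:
        families = families_by_genome.get(genome, set())
        novel = families - seen          # families not seen in earlier genomes
        seen |= novel                    # extend the running pangenome
        curve.append((genome, len(novel), len(seen)))
    return curve


# Toy input: genome codes and family identifiers are made up for illustration.
toy_order = ["G1", "G2", "G3"]
toy_families = {
    "G1": {"f1", "f2", "f3"},
    "G2": {"f2", "f3", "f4"},
    "G3": {"f1", "f4"},
}
toy_curve = saturation_curve(toy_order, toy_families)
```

In this toy case the third genome contributes no new families, which mirrors the flat stretches of the curve expected for densely sampled, closely related strains.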

3.9. Genome Phylogeny

Finally, we have reconstructed the genome phylogeny of the pangenome from the sharing of phylogenetic profile patterns derived in the above analysis (see Experimental Section). Evidently, the pangenome is stratified according to the known, established phylogeny patterns [10] (Figure 7). The genome tree is another concise way to visualize the “novelty” components of the various species and strains that have been sequenced, exemplified above in various contexts, e.g., number of unique genes (Figure 3) or the tracking of the relative contributions of novel protein families (Figure 6). A future aspect of this work will be to infer the history of the pangenome using methods of ancestral state reconstruction [43]. The evolutionary history of the Chlamydiales as reflected by the genome tree might also shed light on the ongoing controversy about their status within the tree of life [41].

Figure 7. Genome tree of the Chlamydiales. Dendrogram representing phylogenetic relationships of the 37 Chlamydiales genomes analyzed, based on sharing of phylogenetic profiles (see Experimental Section for details). Genome codes are given as labels.


The genome tree accurately reflects the current taxonomy of Chlamydiales [4,44], with a couple of notable exceptions, namely the clustering of C. abortus with C. psittaci, the closer relationship of C. felis with the former two species against C. caviae (in agreement with previous findings [6,44] but not with other proposals [8]), as well as the distinct relationship of C. pecorum at the root of Chlamydiaceae and not as a sister group of C. pneumoniae [4,44]. The resulting phylogenetic tree using genome-wide phylogenetic profile sharing patterns can also act as an internal control of the pangenome analysis, since all the closely related strains sequenced are grouped together with very high accuracy (Figure 7).
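As a rough illustration of how a genome tree can be derived from shared family content, the following Python sketch clusters genomes by the overlap of their protein family sets. The Jaccard distance and average-linkage clustering used here are assumptions made for the example and are not necessarily the procedure described in the Experimental Section; all names and the toy data are hypothetical, and at least one genome is assumed as input.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple, Union

# A tree is either a genome code (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def leaves(tree: Tree) -> List[str]:
    """List the genome codes under a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def average_linkage_tree(families_by_genome: Dict[str, Set[str]]) -> Tree:
    """Greedy agglomerative clustering (average linkage) into nested tuples."""
    dist = {
        frozenset((g, h)): jaccard_distance(families_by_genome[g], families_by_genome[h])
        for g, h in combinations(families_by_genome, 2)
    }

    def cluster_distance(x: Tree, y: Tree) -> float:
        pairs = [(a, b) for a in leaves(x) for b in leaves(y)]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    clusters: List[Tree] = sorted(families_by_genome)
    while len(clusters) > 1:
        # Merge the two closest clusters and keep the rest unchanged.
        x, y = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters = [c for c in clusters if c != x and c != y]
        clusters.append((x, y))
    return clusters[0]


# Toy profiles: genomes A and B share most families, C is more distant.
toy_profiles = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f1", "f6", "f7"},
}
toy_tree = average_linkage_tree(toy_profiles)
```

With these toy profiles, A and B are joined first and C attaches to that pair, which is the kind of grouping of closely related strains that serves as an internal control in the text above.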

4. Discussion

Our results suggest that the Chlamydiales pangenome reflects a certain degree of structural stability, as core genes represent over a quarter of an average genome, as well as functional coherence, in the sense that most functional properties of these genes are consistent with current knowledge. Contrary to various claims in the recent literature, it turns out that, at least in the case of a highly constrained pangenome of intracellular pathogens, there is an unexpected degree of stability, given the wide range of phylogenetic relationships within this particular taxon.

It is thus shown that for the smallest of genomes (<900 protein-coding genes), over a third of their gene content is shared with larger genomes (>2,000 genes), decorated by a broader element of so-called “character”, or peripheral, genes. This distribution, which in turn is influenced by the sampling of phylogeny and other factors, requires further investigation, being beyond the scope of this work.
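The proportions quoted above can be checked with simple arithmetic. The sketch below assumes that each of the 312 core families (those with exactly 37 members, as stated in the Abstract) contributes one gene per genome, and it uses the 900 and 2,000 gene thresholds from the text as bounds rather than as measured genome sizes; the function name is hypothetical.

```python
def core_fraction(core_families: int, genome_genes: int) -> float:
    """Fraction of a genome's protein-coding genes that belong to core families.

    Assumes one gene per core family in each genome.
    """
    if genome_genes <= 0:
        raise ValueError("genome_genes must be positive")
    return core_families / genome_genes


CORE_FAMILIES = 312  # families with exactly 37 members (Abstract)

# Bounds from the text: small genomes have fewer than 900 genes,
# large genomes have more than 2,000 genes.
small_genome_fraction = core_fraction(CORE_FAMILIES, 900)
large_genome_fraction = core_fraction(CORE_FAMILIES, 2000)
```

Under these assumptions the core share is at least about 0.35 for genomes below 900 genes, consistent with "over a third", and at most about 0.16 for genomes above 2,000 genes.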

It should also be pointed out that the Chlamydiales pangenome exhibits general characteristics of distribution not dissimilar to other recent pangenome analyses, including those of the Salmonella pangenome with 45 strains [15], the Streptococcus pneumoniae pangenome with 44 strains [14] and the Campylobacter pangenome with 96 strains [28], suggesting the conservation of a core pangenome within and across bacterial taxa that have been sampled adequately. In the case of Salmonella, tracking the contributions of new strains to the entire core set and the pangenome suggests a slight expansion with more sampling and a stable core, reminiscent of the Chlamydiales, with one third of the pangenome represented in the core set [15]. A slightly less stable pattern is detected in the Streptococcus pneumoniae group [14], possibly due to a wider diversity in that sample, yet with a similar pattern of core set saturation. Interestingly, an attempt at ancestral reconstruction in the S. pneumoniae/S. mitis complex suggests that there is a dual process of genome expansion and reduction in the different paths leading to the genomes of contemporary strains [14]. A more comprehensive analysis of the Campylobacter pangenome with 96 strains [28], using a combination of experimental and theoretical work, also points in the same direction: within the two species groups examined, the core gene set overlap reaches 80%, supporting earlier findings for the related Helicobacter pylori strains [45].

5. Conclusions

We have thus examined the salient features of the Chlamydiales pangenome, introducing a pangenome analysis pipeline and certain definitions that facilitate the discovery of core and peripheral genes, the identification of unique genes with various origins as well as the detection of novel protein sequence domains. We expect that analogous efforts will lead to rigorous standards for pangenome analysis in the future. Future research opportunities abound, for example: ancestral reconstruction [43], syntenic patterns of genome structure (e.g., [28,45]), the (presently limited) enrichment with expression data, the evolutionary histories of ‘peripheral’ genes (as discussed above), the connection of Chlamydiales with plants [46,47,48,49,50], the position of the Chlamydiales in the tree of life, and the connection with the PVC superphylum [41,42,50]. Wider challenges that go beyond the above pangenome-specific issues might include a more detailed annotation of the entire dynamic range of family distribution [21], the characterization of protein function in a wider context including comparative metabolic reconstructions [19], the evolution of mobile elements [51], the deeper understanding of the physiological and pathological properties [52,53] of the strains that have been sequenced and the connection with other pangenomes [28].

Acknowledgements

Parts of this work have been supported by the FP6 Network of Excellence ENFIN (contract # LSHG-CT-2005-518254) and the FP7 Collaborative Project MICROME (grant agreement # 222886-2), both funded by the European Commission. C.A.O. thanks the Department of Biological Sciences at the University of Cyprus for their kind hospitality during the spring semester of 2012.

References

  1. Wyrick, P.B. Intracellular survival by Chlamydia. Cell. Microbiol. 2000, 2, 275–282.
  2. Corsaro, D.; Venditti, D. Emerging chlamydial infections. Crit. Rev. Microbiol. 2004, 30, 75–106.
  3. Horn, M. Chlamydiae as symbionts in eukaryotes. Annu. Rev. Microbiol. 2008, 62, 113–131.
  4. Everett, K.D.; Bush, R.M.; Andersen, A.A. Emended description of the order Chlamydiales, proposal of Parachlamydiaceae fam. nov. and Simkaniaceae fam. nov., each containing one monotypic genus, revised taxonomy of the family Chlamydiaceae, including a new genus and five new species, and standards for the identification of organisms. Int. J. Syst. Bacteriol. 1999, 49, 415–440.
  5. Ludwig, W.; Euzéby, J.; Whitman, W.B. Road map of the phyla Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes. In Bergey’s Manual of Systematic Bacteriology, 2nd ed.; Krieg, N.R., Staley, J.T., Brown, D.R., Hedlund, B.P., Paster, B.J., Ward, N.L., Ludwig, W., Whitman, W.B., Eds.; Springer-Verlag: New York, NY, USA, 2010; Volume 4, pp. 1–19.
  6. Corsaro, D.; Valassina, M.; Venditti, D. Increasing diversity within Chlamydiae. Crit. Rev. Microbiol. 2003, 29, 37–78.
  7. Stephens, R.S.; Kalman, S.; Lammel, C.; Fan, J.; Marathe, R.; Aravind, L.; Mitchell, W.; Olinger, L.; Tatusov, R.L.; Zhao, Q.; et al. Genome sequence of an obligate intracellular pathogen of humans: Chlamydia trachomatis. Science 1998, 282, 754–759.
  8. Stephens, R.S.; Myers, G.; Eppinger, M.; Bavoil, P.M. Divergence without difference: Phylogenetics and taxonomy of Chlamydia resolved. FEMS Immunol. Med. Microbiol. 2009, 55, 115–119.
  9. Wang, Y.; Kahane, S.; Cutcliffe, L.T.; Skilton, R.J.; Lambden, P.R.; Clarke, I.N. Development of a transformation system for Chlamydia trachomatis: Restoration of glycogen biosynthesis by acquisition of a plasmid shuttle vector. PLoS Pathog. 2011, 7, e1002258.
  10. Horn, M.; Collingro, A.; Schmitz-Esser, S.; Beier, C.L.; Purkhold, U.; Fartmann, B.; Brandt, P.; Nyakatura, G.J.; Droege, M.; Frishman, D.; et al. Illuminating the evolutionary history of chlamydiae. Science 2004, 304, 728–730.
  11. Subtil, A.; Dautry-Varsat, A. Chlamydia: Five years A.G. (after genome). Curr. Opin. Microbiol. 2004, 7, 85–92.
  12. Vandahl, B.B.; Birkelund, S.; Christiansen, G. Genome and proteome analysis of Chlamydia. Proteomics 2004, 4, 2831–2842.
  13. Angiuoli, S.V.; Hotopp, J.C.; Salzberg, S.L.; Tettelin, H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics 2011, 12, 272.
  14. Donati, C.; Hiller, N.L.; Tettelin, H.; Muzzi, A.; Croucher, N.J.; Angiuoli, S.V.; Oggioni, M.; Dunning Hotopp, J.C.; Hu, F.Z.; Riley, D.R.; et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome Biol. 2010, 11, R107, doi:10.1186/gb-2010-11-10-r107.
  15. Jacobsen, A.; Hendriksen, R.S.; Aarestrup, F.M.; Ussery, D.W.; Friis, C. The Salmonella enterica Pan-genome. Microb. Ecol. 2011, 62, 487–504.
  16. Iliopoulos, I.; Tsoka, S.; Andrade, M.A.; Enright, A.J.; Carroll, M.; Poullet, P.; Promponas, V.; Liakopoulos, T.; Palaios, G.; Pasquier, C.; et al. Evaluation of annotation strategies using an entire genome sequence. Bioinformatics 2003, 19, 717–726, doi:10.1093/bioinformatics/btg077.
  17. Ouzounis, C.A.; Karp, P.D. The past, present and future of genome-wide re-annotation. Genome Biol. 2002, 3, comment2001.1–comment2001.6.
  18. Collingro, A.; Tischler, P.; Weinmaier, T.; Penz, T.; Heinz, E.; Brunham, R.C.; Read, T.D.; Bavoil, P.M.; Sachse, K.; Kahane, S.; et al. Unity in variety—The pan-genome of the Chlamydiae. Mol. Biol. Evol. 2011, 28, 3253–3270, doi:10.1093/molbev/msr161.
  19. Bertelli, C.; Collyn, F.; Croxatto, A.; Ruckert, C.; Polkinghorne, A.; Kebbi-Beghdadi, C.; Goesmann, A.; Vaughan, L.; Greub, G. The Waddlia genome: A window into chlamydial biology. PLoS One 2010, 5, e10890.
  20. Laing, C.; Buchanan, C.; Taboada, E.N.; Zhang, Y.; Kropinski, A.; Villegas, A.; Thomas, J.E.; Gannon, V.P. Pan-genome sequence analysis using Panseq: An online tool for the rapid analysis of core and accessory genomic regions. BMC Bioinformatics 2010, 11, 461.
  21. Lapierre, P.; Gogarten, J.P. Estimating the size of the bacterial pan-genome. Trends Genet. 2009, 25, 107–110.
  22. Janssen, P.; Enright, A.J.; Audit, B.; Cases, I.; Goldovsky, L.; Harte, N.; Kunin, V.; Ouzounis, C.A. COmplete GENome Tracking (COGENT): A flexible data environment for computational genomics. Bioinformatics 2003, 19, 1451–1452.
  23. Goldovsky, L.; Janssen, P.; Ahren, D.; Audit, B.; Cases, I.; Darzentas, N.; Enright, A.J.; Lopez-Bigas, N.; Peregrin-Alvarez, J.M.; Smith, M.; et al. CoGenT++: An extensive and extensible data environment for computational genomics. Bioinformatics 2005, 21, 3806–3810.
  24. Promponas, V.J.; Enright, A.J.; Tsoka, S.; Kreil, D.P.; Leroy, C.; Hamodrakas, S.; Sander, C.; Ouzounis, C.A. CAST: An iterative algorithm for the complexity analysis of sequence tracts. Bioinformatics 2000, 16, 915–922, doi:10.1093/bioinformatics/16.10.915.
  25. Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402.
  26. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
  27. Enright, A.J.; van Dongen, S.; Ouzounis, C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002, 30, 1575–1584.
  28. Lefebure, T.; Bitar, P.D.; Suzuki, H.; Stanhope, M.J. Evolutionary dynamics of complete Campylobacter pan-genomes and the bacterial species concept. Genome Biol. Evol. 2010, 2, 646–655.
  29. Smith, M.; Kunin, V.; Goldovsky, L.; Enright, A.J.; Ouzounis, C.A. MagicMatch—Cross-referencing sequence identifiers across databases. Bioinformatics 2005, 21, 3429–3430.
  30. Sayers, E.W.; Barrett, T.; Benson, D.A.; Bolton, E.; Bryant, S.H.; Canese, K.; Chetvernin, V.; Church, D.M.; DiCuccio, M.; Federhen, S.; et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39, D38–D51.
  31. Pellegrini, M.; Marcotte, E.M.; Thompson, M.J.; Eisenberg, D.; Yeates, T.O. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 1999, 96, 4285–4288.
  32. Kunin, V.; Ahren, D.; Goldovsky, L.; Janssen, P.; Ouzounis, C.A. Measuring genome conservation across taxa: Divided strains and united kingdoms. Nucleic Acids Res. 2005, 33, 616–621.
  33. Snel, B.; Huynen, M.A.; Dutilh, B.E. Genome trees and the nature of genome evolution. Annu. Rev. Microbiol. 2005, 59, 191–209.
  34. Waterhouse, A.M.; Procter, J.B.; Martin, D.M.; Clamp, M.; Barton, G.J. Jalview Version 2—A multiple sequence alignment editor and analysis workbench. Bioinformatics 2009, 25, 1189–1191.
  35. The 10 Supplements can be accessed at http://dx.doi.org/10.5061/dryad.rr064j8q/.
  36. Heinz, E.; Tischler, P.; Rattei, T.; Myers, G.; Wagner, M.; Horn, M. Comprehensive in silico prediction and analysis of chlamydial outer membrane proteins reflects evolution and life style of the Chlamydiae. BMC Genomics 2009, 10, 634.
  37. Stone, C.B.; Bulir, D.C.; Gilchrist, J.D.; Toor, R.K.; Mahony, J.B. Interactions between flagellar and type III secretion proteins in Chlamydia pneumoniae. BMC Microbiol. 2010, 10, 18.
  38. Muschiol, S.; Boncompain, G.; Vromman, F.; Dehoux, P.; Normark, S.; Henriques-Normark, B.; Subtil, A. Identification of a family of effectors secreted by the type III secretion system that are conserved in pathogenic Chlamydiae. Infect. Immun. 2011, 79, 571–580.
  39. Ouzounis, C.; Bork, P.; Sander, C. The modular structure of NifU proteins. Trends Biochem. Sci. 1994, 19, 199–200.
  40. Ouzounis, C.; Sander, C. Homology of the NifS family of proteins to a new class of pyridoxal phosphate-dependent enzymes. FEBS Lett. 1993, 322, 159–164.
  41. Devos, D.P.; Reynaud, E.G. Evolution. Intermediate steps. Science 2010, 330, 1187–1188, doi:10.1126/science.1196720.
  42. McInerney, J.O.; Martin, W.F.; Koonin, E.V.; Allen, J.F.; Galperin, M.Y.; Lane, N.; Archibald, J.M.; Embley, T.M. Planctomycetes and eukaryotes: A case of analogy not homology. Bioessays 2011, 33, 810–817.
  43. Kunin, V.; Ouzounis, C.A. GeneTRACE-reconstruction of gene content of ancestral species. Bioinformatics 2003, 19, 1412–1416.
  44. Bush, R.M.; Everett, K.D. Molecular evolution of the Chlamydiaceae. Int. J. Syst. Evol. Microbiol. 2001, 51, 203–220.
  45. Janssen, P.J.; Audit, B.; Ouzounis, C.A. Strain-specific genes of Helicobacter pylori: Distribution, function and dynamics. Nucleic Acids Res. 2001, 29, 4395–4404.
  46. Brinkman, F.S.; Blanchard, J.L.; Cherkasov, A.; Av-Gay, Y.; Brunham, R.C.; Fernandez, R.C.; Finlay, B.B.; Otto, S.P.; Ouellette, B.F.; Keeling, P.J.; et al. Evidence that plant-like genes in Chlamydia species reflect an ancestral relationship between Chlamydiaceae, cyanobacteria, and the chloroplast. Genome Res. 2002, 12, 1159–1167, doi:10.1101/gr.341802.
  47. Huang, J.; Gogarten, J.P. Did an ancient chlamydial endosymbiosis facilitate the establishment of primary plastids? Genome Biol. 2007, 8, R99, doi:10.1186/gb-2007-8-6-r99.
  48. Becker, B.; Hoef-Emden, K.; Melkonian, M. Chlamydial genes shed light on the evolution of photoautotrophic eukaryotes. BMC Evol. Biol. 2008, 8, 203.
  49. Moustafa, A.; Reyes-Prieto, A.; Bhattacharya, D. Chlamydiae has contributed at least 55 genes to Plantae with predominantly plastid functions. PLoS One 2008, 3, e2205.
  50. Kamneva, O.K.; Liberles, D.A.; Ward, N.L. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol. Evol. 2010, 2, 870–886.
  51. Moran, N.A.; McCutcheon, J.P.; Nakabachi, A. Genomics and evolution of heritable bacterial symbionts. Annu. Rev. Genet. 2008, 42, 165–190.
  52. Merhej, V.; Royer-Carenzi, M.; Pontarotti, P.; Raoult, D. Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol. Direct 2009, 4, 13.
  53. Harris, S.R.; Clarke, I.N.; Seth-Smith, H.M.; Solomon, A.W.; Cutcliffe, L.T.; Marsh, P.; Skilton, R.J.; Holland, M.J.; Mabey, D.; Peeling, R.W.; et al. Whole-genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing. Nat. Genet. 2012, 44, 413–419.

Appendix

Appendix-Table 1. Core gene and protein families in the Chlamydiales.

| Cluster ID | Lead Sequence | Master Sequence | Function Annotation from Master Sequence |
|---|---|---|---|
| 181 | CABO-LLG-01-000000 | CTRA-DUW-01-000647 | NA |
| 182 | CABO-LLG-01-000003 | CTRA-DUW-01-000644 | ENZYME UDP-N-acetylglucosamine pyrophosphorylase [EC] 2.7.7.23 |
| 183 | CABO-LLG-01-000004 | CTRA-DLC-01-000248 | FUNCTION PhoB-like protein |
| 184 | CABO-LLG-01-000015 | CTRA-DLC-01-000228 | FUNCTION RecA protein |
| 185 | PACA-UV7-01-001616 | CTRA-DUW-01-000487 | NA |
| 186 | CABO-LLG-01-000023 | CTRA-DUW-01-000658 | NA |
| 187 | CABO-LLG-01-000024 | CTRA-DUW-01-000624 | FUNCTION RNA Polymerase Sigma-54 factor RpoN |
| 188 | CABO-LLG-01-000026 | CTRA-DUW-01-000622 | ENZYME Uracil DNA glycosylase [EC] 3.2.2.- |
| 189 | CABO-LLG-01-000028 | CTRA-DUW-01-000620 | SIMILAR-TO NTPase HAM1 homolog [EC] 3.6.1.15 |
| 190 | CABO-LLG-01-000029 | CTRA-DLC-01-000273 | NA |
| 191 | CABO-LLG-01-000032 | CTRA-DUW-01-000616 | NA |
| 192 | CABO-LLG-01-000034 | CTRA-DLC-01-000278 | FUNCTION Peptidoglycan-associated lipoprotein |
| 193 | CABO-LLG-01-000035 | CTRA-DLC-01-000279 | FUNCTION TolB macromolecule uptake homolog |
| 194 | CABO-LLG-01-000037 | CTRA-DUW-01-000611 | FUNCTION TolR/ExbD macromolecule uptake homolog |
| 195 | CABO-LLG-01-000040 | CTRA-DLC-01-000284 | FUNCTION protein translocase TatD/MttC homolog |
| 196 | CABO-LLG-01-000047 | CTRA-DUW-01-000600 | ENZYME enolase [EC] 4.2.1.11 |
| 197 | CABO-LLG-01-000048 | CTRA-DUW-01-000599 | FUNCTION Excinuclease ABC subunit B |
| 198 | CABO-LLG-01-000049 | CTRA-DUW-01-000598 | ENZYME Tryptophanyl-tRNA Synthetase [EC] 6.1.1.2 |
| 199 | CTRA-G22-01-000161 | CTRA-DUW-01-000746 | ENZYME Seryl-tRNA Synthetase [EC] 6.1.1.11 |
| 200 | CABO-LLG-01-000054 | CTRA-DUW-01-000593 | FUNCTION Nickel transporter CnrT homolog |
| 201 | CABO-LLG-01-000061 | CTRA-DUW-01-000586 | NA |
| 202 | CABO-LLG-01-000062 | CTRA-DUW-01-000585 | FUNCTION type II secretion system protein D homolog |
| 203 | CABO-LLG-01-000063 | CTRA-DUW-01-000584 | FUNCTION type II secretion system protein E homolog |
| 204 | CABO-LLG-01-000064 | CTRA-DLC-01-000308 | FUNCTION type II secretion system protein F homolog |
| 205 | CABO-LLG-01-000065 | CTRA-DLC-01-000309 | NA |
| 206 | CABO-LLG-01-000070 | CTRA-DUW-01-000577 | FUNCTION protein secretion system YscT homolog |
| 207 | CABO-LLG-01-000072 | CTRA-DLC-01-000316 | FUNCTION protein secretion system YscR homolog |
| 208 | CABO-LLG-01-000073 | CTRA-DEC-01-000317 | FUNCTION protein secretion system YscL homolog |
| 209 | CABO-LLG-01-000074 | CTRA-DUW-01-000573 | NA |
| 210 | CABO-LLG-01-000076 | CTRA-DUW-01-000571 | ENZYME lipoate synthase [EC] 2.8.1.- |
| 211 | CABO-LLG-01-000081 | CTRA-DUW-01-000714 | ENZYME Endonuclease III [EC] 4.2.99.18 |
| 212 | CABO-LLG-01-000083 | CTRA-DUW-01-000716 | ENZYME Phosphatidylserine decarboxylase [EC] 4.1.1.65 |
| 213 | CABO-LLG-01-000085 | CTRA-DUW-01-000718 | FUNCTION preprotein translocase subunit SecA |
| 214 | CABO-LLG-01-000089 | CTRA-DUW-01-000722 | ENZYME ATP-dependent Clp protease ATP-binding subunit ClpX [EC] |
| 215 | CABO-LLG-01-000091 | CTRA-DUW-01-000724 | FUNCTION Trigger factor |
| 216 | CABO-LLG-01-000093 | CTRA-DUW-01-000726 | FUNCTION Rod shape-determining protein MreB |
| 217 | CABO-LLG-01-000094 | CTRA-DUW-01-000727 | ENZYME Phosphoenolpyruvate carboxykinase (GTP) [EC] 4.1.1.32 |
| 218 | CABO-LLG-01-000098 | CTRA-DUW-01-000731 | ENZYME Glycerol-3-phosphate dehydrogenase [NAD+] [EC] 1.1.1.8 |
| 219 | CABO-LLG-01-000099 | CTRA-DUW-01-000732 | ENZYME UDP-N-acetylhexosamine pyrophosphorylase [EC] 2.7.7.- |
| 220 | CCAV-GPI-01-000128 | CTRA-DUW-01-000503 | FUNCTION Transcription termination factor Rho |
| 221 | CABO-LLG-01-000104 | CTRA-DUW-01-000737 | DOMAIN NifU |
| 222 | PACA-HAL-01-002518 | CTRA-DUW-01-000261 | ENZYME NifS aminotransferase [EC] -.-.-.- |
| 223 | CABO-LLG-01-000109 | CTRA-DUW-01-000742 | ENZYME Biotin-[acetyl-CoA-carboxylase] synthetase [EC] 6.3.4.15 |
| 224 * | CABO-LLG-01-000121 | CTRA-DUW-01-000754 | DOMAIN SET |
| 225 | CABO-LLG-01-000122 | CTRA-DUW-01-000755 | SIMILAR-TO metallo-beta-lactamase [EC] 3.5.-.- |
| 226 | CABO-LLG-01-000123 | CTRA-DUW-01-000756 | FUNCTION Cell division protein FtsK C-terminus |
| 227 | CABO-LLG-01-000125 | CTRA-DUW-01-000757 | NA |
| 228 | CABO-LLG-01-000126 | CTRA-DUW-01-000758 | FUNCTION preprotein translocase complex subunit YajC |
| 229 | CABO-LLG-01-000130 | CTRA-DUW-01-000762 | ENZYME Protoporphyrinogen oxidase HemY [EC] 1.3.3.4 |
| 230 | CABO-LLG-01-000132 | CTRA-DUW-01-000764 | ENZYME Uroporphyrinogen decarboxylase HemE [EC] 4.1.1.37 |
| 231 | CABO-LLG-01-000134 | CTRA-DLC-01-000129 | ENZYME Alanyl-tRNA Synthetase [EC] 6.1.1.7 |
| 232 | CABO-LLG-01-000135 | CTRA-DUW-01-000767 | ENZYME Transketolase [EC] 2.2.1.1 |
| 233 | CABO-LLG-01-000136 | CTRA-DUW-01-000768 | SIMILAR-TO AMP nucleosidase [EC] 3.2.2.4 |
| 234 | CABO-LLG-01-000142 | CTRA-DUW-01-000774 | ENZYME Phospho-N-acetylmuramoyl-pentapeptide-transferase [EC] |
| 235 | CABO-LLG-01-000143 | CTRA-DUW-01-000775 | ENZYME UDP-N-acetylmuramoylalanine-D-glutamate ligase [EC] |
| 236 | CABO-LLG-01-000144 | CTRA-DUW-01-000776 | SIMILAR-TO N-acetylmuramoyl-L-alanine amidase C-terminus [EC] |
| 237 | CABO-LLG-01-000146 | CTRA-DLC-01-000117 | ENZYME UDP-N-acetylglucosamine-N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase |
| 238 | CABO-S26-01-000517 | CTRA-DUW-01-000125 | ENZYME Biotin carboxylase [EC] 6.3.4.14 |
| 239 | CABO-LLG-01-000150 | CTRA-DUW-01-000781 | NA |
| 240 | CABO-LLG-01-000155 | CTRA-DUW-01-000786 | NA |
| 241 | CABO-LLG-01-000157 | CTRA-DUW-01-000788 | ENZYME bis(5'-nucleosyl)-tetraphosphatase [EC] 3.6.1.17 |
| 242 | CABO-LLG-01-000168 | CTRA-DLC-01-000098 | ENZYME Cysteinyl-tRNA Synthetase [EC] 6.1.1.16 |
| 243 | CABO-LLG-01-000173 | CTRA-DUW-01-000804 | FUNCTION Ribosomal protein S14 |
| 244 | CABO-LLG-01-000174 | CTRA-DUW-01-000805 | NA |
| 245 | CABO-LLG-01-000176 | CTRA-DUW-01-000808 | ENZYME Excinuclease ABC subunit C [EC] -.-.-.- |
| 246 | CABO-LLG-01-000177 | CTRA-DUW-01-000809 | FUNCTION DNA mismatch repair protein MutS |
| 247 | CABO-LLG-01-000184 | CTRA-DUW-01-000815 | ENZYME CDP-diacylglycerol-glycerol-3-phosphate |
| 248 | CABO-LLG-01-000185 | CTRA-DUW-01-000816 | ENZYME Glycogen synthase [EC] 2.4.1.21 2 |
| 249 | CABO-LLG-01-000186 | CTRA-DUW-01-000817 | FUNCTION Ribosomal protein L25 |
| 250 | CABO-LLG-01-000187 | CTRA-DUW-01-000818 | ENZYME Peptidyl-tRNA hydrolase [EC] 3.1.1.29 |
| 251 | CABO-LLG-01-000188 | CTRA-DUW-01-000819 | FUNCTION Ribosomal protein S6 |
| 252 | CABO-LLG-01-000189 | CTRA-DUW-01-000820 | FUNCTION Ribosomal protein S18 |
| 253 | CABO-LLG-01-000190 | CTRA-DUW-01-000821 | FUNCTION Ribosomal protein L9 |
| 254 * | CABO-LLG-01-000193 | CTRA-DUW-01-000823 | NA |
| 255 * | CABO-LLG-01-000194 | CTRA-DUW-01-000824 | SIMILAR-TO Small-peptide endopeptidase [EC] 3.4.24.55 |
| 256 | CABO-LLG-01-000195 | CTRA-DLC-01-000073 | ENZYME Glycerol-3-phosphate acyltransferase [EC] 2.3.1.15 |
| 257 | CABO-LLG-01-000196 | CTRA-DLC-01-000072 | ENZYME Ribonuclease E [EC] 3.1.4.- |
| 258 | CABO-LLG-01-000197 | CTRA-DLC-01-000071 | NA |
| 259 | CABO-LLG-01-000214 | CTRA-DLC-01-000063 | ENZYME Glucosamine-fructose-6-phosphate aminotransferase [EC] |
| 260 | CABO-LLG-01-000218 | CTRA-DUW-01-000840 | ENZYME Succinyl-CoA synthetase beta chain [EC] 6.2.1.5 |
| 261 | CABO-LLG-01-000222 | CTRA-DUW-01-000843 | SIMILAR-TO Small-peptide endopeptidase [EC] 3.4.24.55 |
| 262 | CABO-LLG-01-000224 | CTRA-DUW-01-000845 | ENZYME CDP-diacylglycerol-serine O-phosphatidyltransferase [EC] |
| 263 | CABO-LLG-01-000229 | CTRA-DUW-01-000850 | ENZYME UDP-N-acetylenolpyruvoylglucosamine reductase [EC] |
| 264 | CABO-LLG-01-000230 | CTRA-DLC-01-000047 | FUNCTION Transcription termination protein NusB |
| 265 | CABO-LLG-01-000231 | CTRA-DLC-01-000046 | NA |
| 266 | CABO-LLG-01-000233 | CTRA-DUW-01-000854 | FUNCTION Ribosomal protein L20 |
| 267 | CABO-LLG-01-000234 | CTRA-DUW-01-000855 | ENZYME Phenylalanyl-tRNA Synthetase alpha chain [EC] 6.1.1.20 |
| 268 | CABO-LLG-01-000236 | CTRA-DUW-01-000857 | NA |
| 269 | CABO-LLG-01-000237 | CTRA-DUW-01-000858 | NA |
| 270 | CABO-LLG-01-000240 | CTRA-DUW-01-000861 | ENZYME Polynucleotide phosphorylase [EC] 2.7.7.8 |
| 271 | CABO-LLG-01-000241 | CTRA-DUW-01-000862 | NA |
| 272 * | CABO-LLG-01-000254 | CTRA-DUW-01-000874 | FUNCTION ABC transporter, ATP-binding protein N-terminus |
| 273 | CABO-LLG-01-000267 | CTRA-DUW-01-000385 | ENZYME Glucose-6-phosphate isomerase [EC] 5.3.1.9 |
| 274 | CABO-LLG-01-000269 | CTRA-DLC-01-000502 | ENZYME Malate dehydrogenase [EC] 1.1.1.82 |
| 275 | CABO-LLG-01-000271 | CTRA-DUW-01-000382 | SIMILAR-TO D-Amino Acid Dehydrogenase [EC] 1.-.-.- |
| 276 * | CABO-LLG-01-000276 | CTRA-DLC-01-000508 | ENZYME 3-dehydroquinate dehydratase [EC] 4.2.1.10 |
| 277 | CPRO-UWE-01-000881 | CTRA-DUW-01-000373 | ENZYME 3-phosphoshikimate 1-carboxyvinyltransferase [EC] |
| 278 | CABO-LLG-01-000277 | CTRA-DUW-01-000376 | ENZYME 3-dehydroquinate synthase [EC] 4.6.1.3 |
| 279 | CABO-LLG-01-000278 | CTRA-DUW-01-000375 | ENZYME Chorismate synthase [EC] 4.6.1.4 |
| 280 | CABO-LLG-01-000288 | CTRA-DUW-01-000371 | ENZYME Dihydrodipicolinate reductase [EC] 1.3.1.26 |
| 281 | CABO-LLG-01-000290 | CTRA-DUW-01-000369 | ENZYME Aspartokinase [EC] 2.7.2.4 |
| 282 | CABO-LLG-01-000298 | CTRA-DUW-01-000328 | NA |
| 283 | SNEG-ZXX-01-000625 | CTRA-DLC-01-000783 | FUNCTION Translation initiation factor IF-2 |
| 284 | CABO-LLG-01-000304 | CTRA-DUW-01-000322 | FUNCTION Ribosomal protein L11 |
| 285 | CABO-LLG-01-000305 | CTRA-DUW-01-000321 | FUNCTION Ribosomal protein L1 |
| 286 | CABO-LLG-01-000306 | CTRA-DUW-01-000320 | FUNCTION Ribosomal protein L10 |
| 287 | CABO-LLG-01-000308 | CTRA-DUW-01-000318 | ENZYME DNA-directed RNA polymerase beta subunit [EC] 2.7.7.6 |
| 288 | CABO-LLG-01-000309 | CTRA-DUW-01-000317 | ENZYME DNA-directed RNA polymerase beta prime subunit [EC] |
| 289 | CABO-LLG-01-000312 | CTRA-DUW-01-000314 | NA |
| 290 | CABO-LLG-01-000313 | CTRA-DLC-01-000569 | ENZYME vacuolar ATPase proteolipid subunit E [EC] 3.6.1.34 |
| 291 | CABO-LLG-01-000314 | CTRA-DUW-01-000312 | NA |
| 292 | CABO-LLG-01-000317 | CTRA-DUW-01-000309 | ENZYME vacuolar ATPase proteolipid subunit D [EC] 3.6.1.34 |
| 293 | CABO-LLG-01-000320 | CTRA-DLC-01-000576 | NA |
| 294 | CABO-LLG-01-000324 | CTRA-DUW-01-000337 | ENZYME Pyruvate kinase [EC] 2.7.1.40 |
| 295 | CPRO-UWE-01-001632 | CTRA-DUW-01-000012 | NA |
| 296 | CABO-LLG-01-000328 | CTRA-DUW-01-000013 | ENZYME Cytochrome Oxidase D subunit I [EC] 1.10.3.- |
| 297 | CABO-LLG-01-000329 | CTRA-DUW-01-000014 | ENZYME Cytochrome Oxidase D subunit II [EC] 1.10.3.- |
| 298 | CABO-LLG-01-000331 | CTRA-DLC-01-000860 | NA |
| 299 | CABO-LLG-01-000332 | CTRA-DLC-01-000861 | NA |
| 300 | CABO-LLG-01-000333 | CTRA-DUW-01-000015 | FUNCTION PhoH-like protein |
| 301 | CABO-LLG-01-000337 | CTRA-DLC-01-000856 | NA |
| 302 | CABO-LLG-01-000338 | CTRA-DUW-01-000022 | FUNCTION Ribosomal protein L31 |
| 303 | CABO-LLG-01-000342 | CTRA-DUW-01-000026 | FUNCTION Ribosomal protein S16 |
| 304 | CABO-LLG-01-000343 | CTRA-DUW-01-000027 | ENZYME tRNA (guanine N-1) methyltransferase [EC] 2.1.1.31 |
| 305 | CABO-LLG-01-000344 | CTRA-DUW-01-000028 | FUNCTION Ribosomal protein L19 |
| 306 | CABO-LLG-01-000345 | CTRA-DUW-01-000029 | ENZYME Ribonuclease HII [EC] 3.1.26.4 |
| 307 | CABO-LLG-01-000346 | CTRA-DUW-01-000030 | ENZYME Guanylate kinase [EC] 2.7.4.8 |
| 308 | CABO-LLG-01-000358 | CTRA-DUW-01-000215 | ENZYME Ribose 5-phosphate isomerase A [EC] 5.3.1.6 |
| 309 | CABO-LLG-01-000359 | CTRA-DLC-01-000666 | NA |
| 310 | CABO-LLG-01-000360 | CTRA-DUW-01-000213 | NA |
| 311 | CABO-LLG-01-000368 | CTRA-DLC-01-000765 | NA |
| 312 | CABO-LLG-01-000374 | CTRA-DUW-01-000147 | ENZYME DNA ligase (NAD+) [EC] 6.5.1.2 |
| 313 | CABO-LLG-01-000379 | CTRA-DUW-01-000210 | ENZYME 3-deoxy-d-manno-octulosonic-acid transferase [EC] 2.-.-.- |
| 314 | CABO-LLG-01-000392 | CTRA-DUW-01-000186 | NA |
| 315 | CABO-LLG-01-000393 | CTRA-DUW-01-000185 | ENZYME CTP synthetase [EC] 6.3.4.2 |
| 316 | CABO-LLG-01-000404 | CTRA-DUW-01-000195 | ENZYME Queuine tRNA-ribosyltransferase [EC] 2.4.2.29 |
| 317 | CABO-LLG-01-000420 | CTRA-DUW-01-000132 | NA |
| 318 | CABO-LLG-01-000423 | CTRA-DUW-01-000199 | ENZYME O-sialoglycoprotein endopeptidase [EC] 3.4.24.57 |
| 319 | CABO-LLG-01-000425 | CTRA-DUW-01-000187 | NA |
| 320 | CABO-LLG-01-000426 | CTRA-DUW-01-000188 | ENZYME Glucose-6-phosphate dehydrogenase [EC] -.-.-.- |
| 321 | CABO-LLG-01-000432 | CTRA-DLC-01-000753 | FUNCTION Ribosomal protein S9 |
| 322 | CABO-LLG-01-000433 | CTRA-DUW-01-000126 | FUNCTION Ribosomal protein L13 |
| 323 | CABO-LLG-01-000435 | CTRA-DUW-01-000152 | NA |
| 324 | CABO-LLG-01-000448 | CTRA-DUW-01-000138 | FUNCTION Sua5 homolog |
| 325 | CABO-LLG-01-000451 | CTRA-DUW-01-000190 | ENZYME Thymidylate kinase (dTMP kinase) [EC] 2.7.4.9 |
| 326 | CABO-LLG-01-000459 | CTRA-DUW-01-000217 | ENZYME Fructose-bisphosphate aldolase class I [EC] 4.1.2.13 |
| 327 | CABO-LLG-01-000471 | CTRA-DUW-01-000239 | FUNCTION acyl carrier protein ACP |
| 328 | PACA-UV7-01-000731 | CTRA-DUW-01-000105 | ENZYME Enoyl-[acyl-carrier protein] reductase (NADH) [EC] |
| 329 | CABO-LLG-01-000473 | CTRA-DLC-01-000640 | ENZYME Malonyl CoA-acyl carrier protein transacylase [EC] |
| 330 | CABO-LLG-01-000474 | CTRA-DLC-01-000639 | ENZYME 3-oxoacyl-[acyl-carrier-protein] synthase III [EC] |
| 331 | CABO-LLG-01-000475 | CTRA-DUW-01-000243 | FUNCTION Recombination protein RecR homolog |
| 332 | CABO-LLG-01-000477 | CTRA-DUW-01-000245 | NA |
| 333 | CPNE-TW1-01-000387 | CTRA-DUW-01-000055 | ENZYME 2-oxoglutarate dehydrogenase E1 component [EC] 1.2.4.2 |
| 334 | CABO-LLG-01-000486 | CTRA-DUW-01-000254 | FUNCTION Inner-membrane protein YidC |
| 335 | CABO-LLG-01-000489 | CTRA-DUW-01-000101 | ENZYME holo-[acyl-carrier protein] synthase [EC] 2.7.8.7 |
| 336 | CABO-LLG-01-000490 | CTRA-DUW-01-000100 | ENZYME Thioredoxin reductase (NADPH) [EC] 1.6.4.5 |
| 337 | CABO-LLG-01-000494 | CTRA-DUW-01-000096 | FUNCTION Ribosome-binding factor A RbfA |
| 338 | CABO-LLG-01-000496 | CTRA-DUW-01-000094 | ENZYME Riboflavin kinase [EC] 2.7.1.26 |
| 339 | CABO-LLG-01-000499 | CTRA-DUW-01-000090 | NA |
| 340 | CABO-LLG-01-000500 | CTRA-DUW-01-000089 | NA |
| 341 | CABO-LLG-01-000501 | CTRA-DLC-01-000793 | FUNCTION Ribosomal protein L28 |
| 342 | CABO-LLG-01-000508 | CTRA-DLC-01-000801 | ENZYME Methylenetetrahydrofolate dehydrogenase [EC] 1.5.1.15 |
| 343 | CABO-LLG-01-000509 | CTRA-DUW-01-000078 | FUNCTION Thiamine biosynthesis lipoprotein ApbE precursor |
| 344 | CABO-LLG-01-000510 | CTRA-DUW-01-000077 | FUNCTION Small protein B SmpB homolog |
| 345 | CABO-LLG-01-000511 | CTRA-DUW-01-000076 | ENZYME DNA polymerase III beta chain [EC] 2.7.7.7 |
| 346 | CABO-LLG-01-000514 | CTRA-DUW-01-000073 | SIMILAR-TO zinc protease [EC] -.-.-.- |
| 347 | CABO-LLG-01-000516 | CTRA-DUW-01-000071 | FUNCTION ABC transporter, permease protein TroD |
| 348 | CABO-LLG-01-000519 | CTRA-DUW-01-000068 | FUNCTION periplasmic substrate binding protein TroA |
| 349 | CPSI-CAL-01-000799 | CTRA-DUW-01-000423 | FUNCTION high-affinity ZnuA homolog |
| 350 | CABO-LLG-01-000520 | CTRA-DLC-01-000812 | NA |
| 351 | CABO-LLG-01-000523 | CTRA-DUW-01-000064 | ENZYME 6-phosphogluconate dehydrogenase [EC] 1.1.1.44 |
| 352 | CABO-LLG-01-000524 | CTRA-DUW-01-000063 | ENZYME Tyrosyl-tRNA Synthetase [EC] 6.1.1.1 |
| 353 | CABO-LLG-01-000535 | CTRA-DLC-01-000825 | NA |
| 354 | CABO-LLG-01-000541 | CTRA-DLC-01-000831 | NA |
| 355 | CABO-LLG-01-000544 | CTRA-DUW-01-000045 | FUNCTION single-stranded DNA-binding protein SSB |
| 356 | CABO-LLG-01-000545 | CTRA-DUW-01-000044 | NA |
| 357 | CABO-LLG-01-000547 | CTRA-DUW-01-000042 | NA |
| 358 | CABO-LLG-01-000554 | CTRA-DLC-01-000619 | ENZYME Protein phosphatase 2C [EC] 3.1.3.16 |
| 359 | CABO-LLG-01-000558 | CTRA-DUW-01-000257 | NA |
| 360 | CABO-LLG-01-000560 | CTRA-DUW-01-000108 | ENZYME A/G-specific adenine glycosylase [EC] 3.2.2.- |
| 361 | CABO-LLG-01-000564 | CTRA-DUW-01-000104 | NA |
| 362 | CABO-LLG-01-000571 | CTRA-DUW-01-000268 | ENZYME Acetyl-coenzyme A carboxylase carboxyl transferase |
| 363 | CABO-LLG-01-000574 | CTRA-DUW-01-000271 | ENZYME N-acetylmuramoyl-L-alanine amidase AmiB [EC] 3.5.1.28 |
| 364 | CABO-LLG-01-000577 | CTRA-DUW-01-000273 | FUNCTION Penicillin-binding protein 3 |
| 365 | CABO-LLG-01-000578 | CTRA-DUW-01-000274 | NA |
| 366 | CABO-LLG-01-000581 | CTRA-DUW-01-000277 | DOMAIN TPR |
| 367 | CABO-LLG-01-000585 | CTRA-DLC-01-000601 | NA |
| 368 | CABO-LLG-01-000586 | CTRA-DUW-01-000282 | NA |
| 369 | CABO-LLG-01-000587 | CTRA-DLC-01-000599 | NA |
| 370 | CPEC-E58-01-000614 | CTRA-DUW-01-000284 | NA |
| 371 | CABO-LLG-01-000591 | CTRA-DLC-01-000597 | FUNCTION Glycine cleavage system H protein |
| 372 | CABO-LLG-01-000594 | CTRA-DUW-01-000288 | SIMILAR-TO Lipoate-protein ligase A [EC] 6.3.4.- |
| 373 | CABO-LLG-01-000596 | CTRA-DUW-01-000290 | ENZYME tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase |
| 374 | CABO-LLG-01-000601 | CTRA-DUW-01-000293 | ENZYME Nitrogen regulatory IIA protein A component [EC] 2.7.1.69 |
| 375 | WCHO-WSU-01-000243 | CTRA-DUW-01-000294 | ENZYME Nitrogen regulatory IIA protein A component [EC] 2.7.1.69 |
| 376 | CABO-LLG-01-000603 | CTRA-DUW-01-000295 | ENZYME dUTP pyrophosphatase [EC] 3.6.1.23 |
| 377 | CABO-LLG-01-000608 | CTRA-DUW-01-000300 | ENZYME Ribonuclease III [EC] 3.1.26.3 |
| 378 | CABO-LLG-01-000609 | CTRA-DLC-01-000581 | FUNCTION DNA repair protein RadA |
| 379 | CABO-LLG-01-000610 | CTRA-DUW-01-000302 | ENZYME Porphobilinogen deaminase [EC] 4.3.1.8 |
| 380 | CABO-LLG-01-000616 | CTRA-DUW-01-000340 | NA |
| 381 | CABO-LLG-01-000623 | CTRA-DUW-01-000346 | DOMAIN DnaJ |
| 382 | CABO-LLG-01-000624 | CTRA-DLC-01-000536 | FUNCTION Ribosomal protein S21 |
| 383 | CABO-LLG-01-000628 | CTRA-DUW-01-000351 | ENZYME Aryl-sulfate sulphohydrolase [EC] 3.1.6.1 |
| 384 | CABO-LLG-01-000631 | CTRA-DUW-01-000354 | FUNCTION Septum formation protein Maf homolog |
| 385 | CABO-LLG-01-000632 | CTRA-DUW-01-000355 | NA |
| 386 | WCHO-WSU-01-000567 | CTRA-DUW-01-000392 | NA |
| 387 | CABO-LLG-01-000633 | CTRA-DUW-01-000356 | NA |
| 388 | CABO-LLG-01-000636 | CTRA-DUW-01-000333 | ENZYME Triosephosphate isomerase [EC] 5.3.1.1 |
| 389 | CABO-LLG-01-000637 | CTRA-DUW-01-000334 | ENZYME Exonuclease VII large subunit [EC] 3.1.11.6 |
| 390 | CABO-LLG-01-000641 | CTRA-DUW-01-000360 | ENZYME Dimethyladenosine transferase [EC] 2.1.1.- |
| 391 | CABO-LLG-01-000642 | CTRA-DUW-01-000361 | NA |
392CABO-LLG-01-000643CTRA-DUW-01-000362DOMAIN Thioredoxin
393CABO-LLG-01-000646CTRA-DLC-01-000868NA
394CABO-LLG-01-000647CTRA-DLC-01-000869ENZYME Ribonuclease HII [EC] 3.1.26.4
395CABO-LLG-01-000651CTRA-DUW-01-000004ENZYME glutamyl-tRNA (Gln) amidotransferase, subunit B [EC]
396CABO-LLG-01-000670CTRA-DUW-01-000386NA
397CABO-LLG-01-000671CTRA-DUW-01-000387SIMILAR-TO metallo-beta-lactamase [EC] 3.5.-.-
398CABO-LLG-01-000684CTRA-DLC-01-000488NA
399CABO-LLG-01-000686CTRA-DUW-01-000396NA
400CABO-LLG-01-000690CTRA-DLC-01-000484FUNCTION Heat-inducible transcription repressor HrcA
401CABO-LLG-01-000691CTRA-DUW-01-000403FUNCTION GrpE protein
402CABO-LLG-01-000698CTRA-DUW-01-000432NA
403CABO-LLG-01-000701CTRA-DUW-01-000435NA
404CABO-LLG-01-000705CTRA-DUW-01-000438ENZYME ubiquinone/menaquinone biosynthesis methlytransferase
405CABO-LLG-01-000706CTRA-DLC-01-000449NA
406CABO-LLG-01-000707CTRA-DUW-01-000440ENZYME Diaminopimelate epimerase [EC] 5.1.1.7
407CABO-LLG-01-000709CTRA-DLC-01-000446ENZYME Serine hydroxymethyltransferase [EC] 2.1.2.1
408CABO-LLG-01-000713CTRA-DUW-01-000406NA
409CABO-LLG-01-000714CTRA-DLC-01-000479NA
410CABO-LLG-01-000717CTRA-DUW-01-000410ENZYME Lipid A 4'-kinase [EC] 2.7.1.130
411CABO-LLG-01-000722CTRA-DLC-01-000471FUNCTION DnaK suppressor protein
412CABO-LLG-01-000723CTRA-DUW-01-000416ENZYME Lipoprotein signal peptidase [EC] 3.4.23.36
413CABO-LLG-01-000735CTRA-DUW-01-000427FUNCTION Ribosomal protein L27
414CABO-LLG-01-000736CTRA-DLC-01-000459FUNCTION Ribosomal protein L21
415CABO-LLG-01-000738CTRA-DUW-01-000444NA
416CABO-LLG-01-000739CTRA-DUW-01-000445ENZYME Sulfite reductase (NADPH) flavoprotein alpha-component
417CABO-LLG-01-000740CTRA-DLC-01-000442FUNCTION Ribosomal protein S10
418CABO-LLG-01-000751CTRA-DUW-01-000456ENZYME Glutamyl-tRNA Synthetase [EC] 6.1.1.17
419CABO-LLG-01-000752CTRA-DLC-01-000431NA
420 *CABO-LLG-01-000753 NA
421CABO-LLG-01-000754CTRA-DUW-01-000458ENZYME Single-stranded-DNA-specific exonuclease RecJ [EC]
422CABO-LLG-01-000759CTRA-DUW-01-000463ENZYME Cytidylate kinase [EC] 2.7.4.14
423CABO-LLG-01-000761CTRA-DUW-01-000465ENZYME Arginyl-tRNA Synthetase [EC] 6.1.1.19
424CABO-LLG-01-000762CTRA-DUW-01-000466ENZYME UDP-N-acetylglucosamine 1-carboxyvinyltransferase [EC]
425CABO-LLG-01-000764CTRA-DUW-01-000468NA
426CABO-LLG-01-000778CTRA-DUW-01-000480NA
427CABO-LLG-01-000779CTRA-DUW-01-000481NA
428CABO-LLG-01-000784CTRA-DUW-01-000486ENZYME Phenylalanyl-tRNA Synthetase beta chain [EC] 6.1.1.20
429 *CABO-LLG-01-000789CTRA-DUW-01-000491FUNCTION Dipeptide binding protein DppA
430CABO-LLG-01-000792CTRA-DUW-01-000496NA
431CABO-LLG-01-000793CTRA-DUW-01-000497ENZYME Protoheme ferro-lyase [EC] 4.99.1.1
432CABO-LLG-01-000794CTRA-DUW-01-000498FUNCTION Aminoacid-binding periplasmic protein precursor
433CABO-LLG-01-000795CTRA-DUW-01-000499ENZYME HemK modification methylase homolog [EC] -.-.-.-
434CABO-LLG-01-000796CTRA-DUW-01-000500NA
435CABO-LLG-01-000801CTRA-DLC-01-000386DOMAIN ATP-binding
436CABO-LLG-01-000802CTRA-DUW-01-000505ENZYME DNA polymerase I [EC] 2.7.7.7
437CABO-LLG-01-000803CTRA-DLC-01-000384NA
438CABO-LLG-01-000805CTRA-DUW-01-000508ENZYME CDP-diacylglycerol-glycerol-3-phosphate
439CABO-LLG-01-000807CTRA-DUW-01-000511FUNCTION Glucose inhibited division protein A GidA
440CABO-LLG-01-000808CTRA-DUW-01-000512ENZYME Lipoate-protein ligase A [EC] 6.3.4.-
441CABO-LLG-01-000810CTRA-DUW-01-000514ENZYME Holliday Junction DNA Helicase RuvA [EC] -.-.-.-
442CABO-LLG-01-000811CTRA-DUW-01-000515ENZYME Holliday Junction DNA Helicase RuvC [EC] 3.1.22.4
443CABO-LLG-01-000813CTRA-DUW-01-000517NA
444CABO-LLG-01-000814CTRA-DUW-01-000518ENZYME Glyceraldehyde 3-phosphate dehydrogenase [EC] 1.2.1.12
445CABO-LLG-01-000820CTRA-DUW-01-000524FUNCTION Ribosomal protein L15
446CABO-LLG-01-000821CTRA-DUW-01-000525FUNCTION Ribosomal protein S5
447CABO-LLG-01-000822CTRA-DUW-01-000526FUNCTION Ribosomal protein L18
448CABO-LLG-01-000824CTRA-DUW-01-000528FUNCTION Ribosomal protein S8
449CABO-LLG-01-000825CTRA-DUW-01-000529FUNCTION Ribosomal protein L5
450CABO-LLG-01-000826CTRA-DUW-01-000530FUNCTION Ribosomal protein L24
451CABO-LLG-01-000827CTRA-DUW-01-000531FUNCTION Ribosomal protein L14
452CABO-LLG-01-000828CTRA-DUW-01-000532FUNCTION Ribosomal protein S17
453CABO-LLG-01-000830CTRA-DUW-01-000534FUNCTION Ribosomal protein L16
454CABO-LLG-01-000831CTRA-DUW-01-000535FUNCTION Ribosomal protein S3
455CABO-LLG-01-000833CTRA-DUW-01-000537FUNCTION Ribosomal protein S19
456CABO-LLG-01-000834CTRA-DUW-01-000538FUNCTION Ribosomal protein L2
457CABO-LLG-01-000835CTRA-DUW-01-000539FUNCTION Ribosomal protein L23
458CABO-LLG-01-000836CTRA-DLC-01-000351FUNCTION Ribosomal protein L4
459CABO-LLG-01-000837CTRA-DUW-01-000541FUNCTION Ribosomal protein L3
460 *CABO-LLG-01-000839CTRA-DUW-01-000543ENZYME Methionyl-tRNA formyltransferase [EC] 2.1.2.9
461CABO-LLG-01-000841CTRA-DUW-01-000545ENZYME (3R)-hydroxymyristoyl-[acyl carrier protein] dehydratase
462CABO-LLG-01-000842CTRA-DLC-01-000345ENZYME UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine
463CABO-LLG-01-000843CTRA-DUW-01-000547ENZYME apolipoprotein N-acyltransferase [EC] 2.3.1.
464CABO-LLG-01-000846CTRA-DUW-01-000550DOMAIN ATP-binding
465CABO-LLG-01-000847CTRA-DUW-01-000551NA
466CABO-LLG-01-000849CTRA-DUW-01-000553ENZYME rRNA methyltransferase SpoU homolog [EC] -.-.-.-
467CABO-LLG-01-000852CTRA-DUW-01-000556ENZYME Histidyl-tRNA Synthetase [EC] 6.1.1.21
468CABO-LLG-01-000855CTRA-DUW-01-000558ENZYME DNA polymerase III alpha chain [EC] 2.7.7.7
469CABO-LLG-01-000856CTRA-DUW-01-000559NA
470CABO-LLG-01-000857CTRA-DLC-01-000331NA
471CABO-LLG-01-000858CTRA-DUW-01-000561NA
472CABO-LLG-01-000860CTRA-DLC-01-000327ENZYME D-alanyl-D-alanine carboxypeptidase DacF [EC] 3.4.16.4
473CABO-LLG-01-000865CTRA-DUW-01-000710ENZYME Phosphoglycerate kinase [EC] 2.7.2.3
474CABO-LLG-01-000867CTRA-DUW-01-000708FUNCTION Phosphate transport system protein PhoU
475CABO-LLG-01-000874CTRA-DUW-01-000703FUNCTION ABC transporter, ATP-binding protein
476SNEG-ZXX-01-002117CTRA-DUW-01-000701FUNCTION ABC transporter, ATP-binding protein
477CABO-LLG-01-000880CTRA-DUW-01-000697FUNCTION Ribosomal protein S2
478CABO-LLG-01-000881CTRA-DLC-01-000198FUNCTION Translation elongation factor EF-TS
479CABO-LLG-01-000882CTRA-DUW-01-000695ENZYME Uridylate kinase [EC] 2.7.4.
480CABO-LLG-01-000892CTRA-DUW-01-000684NA
481CABO-LLG-01-000895CTRA-DUW-01-000681DOMAIN FHA
482CABO-LLG-01-000897CTRA-DUW-01-000679ENZYME glutamyl-tRNA reductase [EC] 1.2.1.
483CABO-LLG-01-000904CTRA-DUW-01-000672ENZYME KDO-8-phosphate synthetase [EC] 4.1.2.16
484CABO-LLG-01-000912CTRA-DLC-01-000254NA
485CABO-LLG-01-000913CTRA-DUW-01-000640SIMILAR-TO Endonuclease IV [EC] 3.1.21.2
486CABO-LLG-01-000914CTRA-DUW-01-000641FUNCTION Ribosomal protein S4
487CABO-LLG-01-000916CTRA-DUW-01-000657FUNCTION Multidrug-efflux transporter
488CABO-LLG-01-000917CTRA-DLC-01-000238ENZYME Exodeoxyribonuclease V gamma subunit [EC] 3.1.11.5
489CABO-LLG-01-000920CTRA-DUW-01-000652SIMILAR-TO Amino-acid aminotransferase class I [EC] 2.6.1.
490CABO-LLG-01-000921CTRA-DUW-01-000651FUNCTION Transcription Elongation Factor GreA C-terminus
491CABO-LLG-01-000923CTRA-DUW-01-000649NA
492CABO-LLG-01-000924CTRA-DLC-01-000245ENZYME Porphobilinogen synthase [EC] 4.2.1.24

* Eight clusters do not contain exactly one member per genome and are marked with an asterisk; these include cluster 420, which has no master sequence from the original C. trachomatis annotation dataset. NA: not available. The lead sequence is the first sequence found in a cluster; the master sequence is the sequence from which the annotation is drawn (see Experimental Section for details).
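
For readers who wish to process the table programmatically, the row layout described in the note above (cluster number, optional asterisk, lead sequence, master sequence, annotation type, annotation text) can be read with a short script. The following is only a minimal sketch under stated assumptions and is not part of the authors' pipeline: the identifier pattern is inferred solely from the rows printed here, and the names `ClusterRow`, `parse_row`, `SEQ_ID` and `SEP` are hypothetical. The pattern tolerates fields that are fused together, separated by spaces, or delimited by a vertical bar.

```python
import re
from typing import NamedTuple, Optional

# Assumed identifier shape, inferred only from rows printed in this table
# (e.g. CABO-LLG-01-000489, CPEC-E58-01-000614); not an official format.
SEQ_ID = r"[A-Z]{4}-[A-Z0-9]{3}-\d{2}-\d{6}"
# Fields may be fused, whitespace-separated, or delimited by "|".
SEP = r"[\s|]*"

ROW = re.compile(
    rf"^\s*(?P<cluster>\d+){SEP}(?P<mark>\*)?{SEP}"
    rf"(?P<lead>{SEQ_ID}){SEP}"
    rf"(?P<master>{SEQ_ID})?{SEP}"
    rf"(?P<kind>ENZYME|FUNCTION|DOMAIN|SIMILAR-TO|NA){SEP}"
    rf"(?P<description>.*?)\s*$"
)
# An EC number, when present, follows the literal token "[EC]".
EC = re.compile(r"\[EC\]\s*(\S+)")


class ClusterRow(NamedTuple):
    """Hypothetical record for one table row."""
    cluster: int
    marked: bool            # True if the row carries the asterisk
    lead: str
    master: Optional[str]   # None when no master sequence is listed
    kind: str               # ENZYME, FUNCTION, DOMAIN, SIMILAR-TO or NA
    description: str
    ec: Optional[str]       # text after "[EC]", kept exactly as printed


def parse_row(line: str) -> ClusterRow:
    """Split one table row into its fields; raise ValueError if it does not fit."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognised row: {line!r}")
    ec = EC.search(m["description"])
    return ClusterRow(
        cluster=int(m["cluster"]),
        marked=m["mark"] is not None,
        lead=m["lead"],
        master=m["master"],
        kind=m["kind"],
        description=m["description"],
        ec=ec.group(1) if ec else None,
    )


if __name__ == "__main__":
    row = parse_row("420 *CABO-LLG-01-000753 NA")
    print(row.cluster, row.marked, row.master, row.kind)
```

Incomplete EC entries (for example a bare `[EC]` or a trailing `2.3.1.`) are returned as printed or as `None`; the sketch does not attempt to complete them.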

Supplementary Files

  • Supplementary File 1: Supplementary (ZIP, 57274 KB)

Genes EISSN 2073-4425. Published by MDPI AG, Basel, Switzerland.