Genes 2011, 2(4), 869-911; doi:10.3390/genes2040869

Annotation of Protein Domains Reveals Remarkable Conservation in the Functional Make up of Proteomes Across Superkingdoms
Arshan Nasir 1, Aisha Naeem 2, Muhammad Jawad Khan 2, Horacio D. Lopez-Nicora 3 and Gustavo Caetano-Anollés 1,*
1 Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
2 Mammalian NutriPhysioGenomics Laboratory, Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
3 Plant Pathology Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL 61801, USA
* Author to whom correspondence should be addressed; Tel.: +1-217-333-8172; Fax: +1-217-333-8046.
Received: 16 September 2011; in revised form: 28 October 2011 / Accepted: 28 October 2011 / Published: 8 November 2011


Abstract: The functional repertoire of a cell is largely embodied in its proteome, the collection of proteins encoded in the genome of an organism. The molecular functions of proteins are the direct consequence of their structure, and structure can be inferred from sequence using hidden Markov models of structural recognition. Here we analyze the functional annotation of protein domain structures in almost a thousand sequenced genomes, exploring the functional and structural diversity of proteomes. We find that there is a remarkable conservation in the distribution of domains with respect to the molecular functions they perform in the three superkingdoms of life. In general, most of the protein repertoire is spent in functions related to metabolic processes, but there are significant differences in the usage of domains for regulatory and extracellular processes both within and between superkingdoms. Our results support the hypothesis that the proteomes of superkingdom Eukarya evolved via genome expansion mechanisms that were directed towards innovating new domain architectures for regulatory and extra/intracellular process functions needed, for example, to maintain the integrity of multicellular structure or to interact with environmental biotic and abiotic factors (e.g., cell signaling and adhesion, immune responses, and toxin production). Proteomes of microbial superkingdoms Archaea and Bacteria retained fewer domains and maintained simpler and smaller protein repertoires. Viruses appear to play an important role in the evolution of superkingdoms. Finally, we identify a few genomic outliers that deviate significantly from the conserved functional design. These include Nanoarchaeum equitans, proteobacterial symbionts of insects with extremely reduced genomes, Tenericutes and Guillardia theta. These organisms spend most of their domains on information functions, including translation and transcription, rather than on metabolism and harbor a domain repertoire characteristic of parasitic organisms. In contrast, the functional repertoire of the proteomes of the Planctomycetes-Verrucomicrobia-Chlamydiae superphylum was no different from that of the rest of Bacteria, failing to support claims that they represent a separate superkingdom. In turn, Protista and Bacteria shared similar functional distribution patterns, suggesting an ancestral evolutionary link between these groups.
Keywords: functional annotation; fold superfamily; molecular function; protein domain; SCOP; structure; superkingdom

1. Introduction

Proteins are active components of molecular machinery that perform vital functions for cellular and organismal life [1,2]. Information in the DNA is copied into messenger RNA that is generally translated into proteins by the ribosome. Nascent polypeptide chains are unfolded random coils but quickly undergo conformational changes to produce characteristic and functional folds. These folds are three-dimensional (3D) structures that define the native state of proteins [3,4]. Biologically active proteins are made up of well-packed structural and functional units referred to as domains. Domains appear either singly or in combination with other domains in a protein and act as modules by engaging in combinatorial interplays that enhance the functional repertoires of cells [5]. While molecular interactions between domains in multidomain proteins play important roles in the evolution of protein repertoires [6], it is the domain structure that is maintained in proteins for long periods of evolutionary time [7–9]. This is in sharp contrast to amino acid sequence, which is highly variable. For this reason, protein domains are also considered evolutionary units [7,10–12].

1.1. Classification of Domains

Domains that are evolutionarily related can be grouped together in hierarchical classifications [1,10,13]. One scheme of classifying protein domains is the well-established “Structural Classification of Proteins” (SCOP). The SCOP database groups domains that have sequence conservation (generally with >30% pairwise amino acid residue identities) into fold families (FFs), FFs with structural and functional evidence of common ancestry into fold superfamilies (FSFs), FSFs with common 3D structural topologies into folds (Fs), and Fs sharing the same general architecture into protein classes [10,14]. SCOP identifies protein domains using concise classification strings (css) (e.g., c.26.1.2, where c represents the protein class, 26 the F, 1 the FSF and 2 the FF). The 97,178 domains indexed in SCOP 1.73 (corresponding to 34,494 PDB entries) are classified into 1,086 Fs, 1,777 FSFs, and 3,464 FFs. Compared to the number of protein entries in UniProt (531,473 total entries as of July 27, 2011), the number of domain structural designs at these different levels of structural abstraction is quite limited. Their relatively small number suggests that fold space is finite and evolutionarily highly conserved [1,7,15].
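The css notation can be illustrated with a minimal Python sketch that splits a four-part string such as c.26.1.2 into its class, F, FSF and FF identifiers. The function and field names (parse_css, ScopLevels) are hypothetical and chosen only for this illustration; they are not part of SCOP or SUPERFAMILY.

```python
from typing import NamedTuple


class ScopLevels(NamedTuple):
    """Identifiers at each SCOP level encoded in one css (illustrative names)."""
    protein_class: str  # e.g. "c"
    fold: str           # F, e.g. "c.26"
    superfamily: str    # FSF, e.g. "c.26.1"
    family: str         # FF, e.g. "c.26.1.2"


def parse_css(css: str) -> ScopLevels:
    """Split a four-part css into cumulative class, F, FSF and FF identifiers."""
    cleaned = css.strip()
    parts = cleaned.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four dot-separated fields, got {css!r}")
    return ScopLevels(
        protein_class=parts[0],
        fold=".".join(parts[:2]),
        superfamily=".".join(parts[:3]),
        family=cleaned,
    )


levels = parse_css("c.26.1.2")
print(levels.superfamily)  # c.26.1
```

Truncating a css to its first three fields in this way is what allows domains from different FFs to be pooled at the FSF level.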

1.2. Assigning FSF Structures to Proteomes

Genome-encoded proteins can be scanned against advanced linear hidden Markov models (HMMs) of structural recognition in SUPERFAMILY [16,17]. HMM libraries are generated using the iterative Sequence Alignment and Modeling (SAM) method. SAM is considered one of the most powerful algorithms for detecting remote homologies [18]. The SUPERFAMILY database currently provides FSF structural assignments for a total of 1,245 model organisms including 96 Archaea, 861 Bacteria and 288 Eukarya.

1.3. Assigning Functional Categories to Protein Domains

Assigning molecular functions to FSFs is a difficult task since approximately 80% of the FSFs defined in SCOP are multi-functional and highly diverse [19]. For example, most of the ancient FSFs, such as the P-loop-containing NTP hydrolase FSF (c.37.1), are highly abundant in nature and include many FFs (20 in the case of c.37.1). Each of those families may have functions that impinge on multiple and distinct pathways or networks. The functional annotation scheme introduced by Vogel and Chothia in SUPERFAMILY is a one-to-one mapping scheme that is based on information from various resources, including the Clusters of Orthologous Groups (COG) and Gene Ontology (GO) databases and manual surveys [20–23]. When an FSF is involved in multiple functions, the most predominant function is assigned to that multi-functional FSF under the assumption that the most dominant function is the most ancient and predominantly present in all proteomes. The error rate in assignments is estimated to be <10% for large FSFs and <20% for all FSFs [23].

The SUPERFAMILY functional classification maps seven general functional categories to 50 detailed functional categories in a two-tier hierarchy (Table 1). The seven general categories include Metabolism, Information, Intracellular processes (ICP), Extracellular processes (ECP), Regulation, General, and Other (we will refer to them as “categories” and “functional repertoires” interchangeably). In this study, we take advantage of this coarse-grained functional annotation scheme to assign individual functional categories to FSFs. We are aware that this one-to-one mapping may not provide a complete profile for multi-functional domains [19]. Dissection of such detailed functions and their comparison across organisms is a difficult problem that we will not address in this study. Instead, we focus on domains defined at the FSF level and use the coarse-grained functional annotation scheme to explore the functional diversity of the proteomes encoded in genomes that have been completely sequenced. Our results yield a global picture of the functional organization of proteomes that is only possible with this classification scheme. Results suggest that the functional structure of proteomes is remarkably conserved across all organisms, ranging from small bacteria to complex eukaryotes. There is also evidence for the existence of a few outliers that deviate from global trends. Here we explore what makes these proteomes distinct.

2. Results and Discussion

2.1. General Patterns in the Distribution of FSF Domain Functions

We studied the molecular functions of 1,646 domains defined at the FSF level of structural abstraction (SCOP 1.73) that are present in the proteomes of a total of 965 organisms spanning the three superkingdoms. A total of 135 FSFs for which no functional annotation is available were excluded from the analysis. Out of the 1,646 FSFs studied, approximately one-third (32.38%) perform molecular functions related to Metabolism. Categories Other (16.58%), ICP (12.63%), Regulation (12.45%), and Information (12.21%) are uniformly distributed within proteomes. In contrast, General (7.96%) and ECP (5.77%) are significantly underrepresented compared to the rest (Figure 1(A)). The total number of FSFs in each category exhibits the following decreasing trend: Metabolism > Other > ICP > Regulation > Information > General > ECP. These patterns of FSF number and relative proteome content are for the most part maintained when studying the functional annotation of FSFs belonging to each superkingdom (Figure 1(B)). However, the number of FSFs in each superkingdom varies considerably and increases in the order Archaea, Bacteria and Eukarya, as we have shown in earlier studies [7].

The significantly higher number of FSFs devoted to Metabolism is an anticipated result given the central importance of metabolic networks. However, the much larger number of FSFs corresponding to Other is quite unexpected. The 273 FSFs belonging to this category include 200 and 73 FSFs in the sub-categories unknown function and viral proteins, respectively. The sub-category unknown function includes FSFs for which the functions are either unknown or unclassifiable. Viruses are defined as simple biological entities that are considered to be “gene poor” relatives of cellular organisms [24]. However, the number of domains belonging to viral proteins that are present in cellular organisms makes a noteworthy contribution to the total pool of FSFs (4.43%). Thus, viruses have a much richer and more diverse repertoire of domain structures than previously thought, and their association with cellular life has contributed considerable structural diversity to the proteomic make up (A. Nasir, K.M. Kim and G. Caetano-Anollés, ms. in preparation).

The numbers of FSFs belonging to categories Regulation, Information, and ICP are uniformly distributed in proteomes. However, the ECP category is the least represented, perhaps because this category is the last to appear in evolution [7,15]. Extracellular processes are more important to multicellular organisms (mainly eukaryotes) than to unicellular organisms. Multicellular organisms need efficient communication, such as signaling and cell adhesion. They also trigger immune responses and produce toxins when defending against parasites and pathogens. These ECP processes, which are depicted in the minor categories of cell adhesion, immune response, blood clotting and toxins/defense, are needed when interacting with environmental biotic and abiotic factors and for maintaining the integrity of multicellular structure. These categories are also present in the microbial superkingdoms but their functional role may be different than in Eukarya.

We note that current genomic research is heavily biased towards the sequencing of microbial genomes, especially those of bacteria with parasitic lifestyles. In fact, 67% of proteomes in our dataset belong to Bacteria. This bias can affect conclusions drawn from global trends such as those in Figure 1(A), including the under-representation of ECP FSFs, because of their decreased representation in microbial proteomes.

2.2. Distribution of FSF Domain Functions in the Three Superkingdoms of Life

In order to explore whether the overall distribution of general functional categories differs in organisms belonging to the three superkingdoms, we analyzed proteomes at the species level and calculated both the percentage and actual number of FSFs corresponding to different functional repertoires (Figure 2).

In both percentages and actual counts, FSFs show the following decreasing trend, consistently across the three superkingdoms: Metabolism > Information > ICP > Regulation > Other > General > ECP. Note that trend lines across proteomes seldom overlap or cross in Figure 2. It is noteworthy, however, that this trend differs from the decreasing total numbers of FSFs we described above (Figure 1). Thus, no correlation should be expected between the numbers of FSFs for individual proteomes and the total set for each category. This suggests that variation in functional assignments across proteomes of superkingdoms may not necessarily match overall functional patterns.

Proteomes in microbial superkingdoms Archaea and Bacteria exhibit remarkably similar functional distributions of FSFs (Figure 2(A)). The only exception appears to be the slight overrepresentation of Regulation FSFs (green trend lines) and underrepresentation of ICP (black trend lines) in Archaea compared to Bacteria (especially Proteobacteria). These distributions are clearly distinct from those in Eukarya. Proteomic representations of FSFs corresponding to Metabolism and Information are decreased while those of all other five functional categories are significantly and consistently increased (Figure 2(A)). There is also more variation evident in Eukarya; large groups of proteomes exhibit different patterns of functional use (clearly evident in Information; red trend lines in Figure 2(A)).

On the whole, the relative functional make up of the proteomes of individual superkingdoms appears highly conserved (Figure 2(A)). There is however considerable variation in the metabolic functional repertoire of organisms, especially in Bacteria, where Metabolism ranges from 30% to 50% of proteomic content (100–350 FSFs, Tables S1 and S2). This variation is not present in other functional repertoires.

Consequently, tendencies of reduction in the metabolic repertoire are generally offset by small increases in the representation of the other six repertoires, with the notable exception of Information. In this particular case, when Metabolism goes down Information goes up. For example, bacterial proteomes with metabolic FSF repertoires of <45% offset their decrease by a corresponding increase in Information FSFs (generally from ∼20% to ∼35%, Figure 2(A)). In all superkingdoms, we identify groups of proteomes or a few outliers that deviate from the global trends (vertical dotted lines in Figure 2(A)). As we will discuss in detail below, this is generally a consequence of reductive evolution imposed by the lifestyle of organisms. Outliers are particularly evident in Bacteria and harbor sharp increases in Information repertoires, not always with corresponding decreases in Metabolism. In Archaea, decreases of Metabolism are generally offset by increases of the Regulation category, with an exception in Nanoarchaeum equitans (see below). In Eukarya, decreases in Metabolism go hand in hand with decreases in Information, and are correspondingly offset mostly by increases in Regulation and ECP. Apparently, the advantages of regulatory control (e.g., signal transduction and transcriptional and posttranscriptional regulation) and multicellularity counteract the interplay of Metabolism and Information in eukaryotes.

When we look at the actual number of FSFs within each functional repertoire (Figure 2(B)), we observe a clear trend in domain use that matches the total trend for superkingdoms described above (Figure 1). In most cases, the functional repertoires of Archaea are smaller than those of Bacteria, and bacterial repertoires are generally smaller than those of Eukarya (Figure 2(B)). This holds true for all functional categories. However, the numbers of metabolic FSFs vary 1.5–4 fold in proteomes of superkingdoms, the change being maximal in Bacteria. While proteomes in both Eukarya and Bacteria show similar ranges of metabolic FSFs, the repertoire of Archaea is more constrained. Furthermore, the numbers of FSFs belonging to categories Other and ECP are significantly higher in Eukarya than in the microbial superkingdoms. These remarkable observations suggest high conservation in the make up of proteomes of superkingdoms and at the same time considerable levels of flexibility in the metabolic make-up of organisms. Results also support the evolution of the protein complements of Archaea and Bacteria via reductive evolutionary processes and of Eukarya by genome expansion mechanisms [7,25]. Reductive tendencies in microbial superkingdoms do not show bias in favor of any functional category. Furthermore, enrichment of eukaryal proteomes with viral proteins supports theories stating that viruses have played an important role in the evolution of Eukarya [26].

2.3. Distribution of FSF Domain Functions in Individual Phyla/Kingdoms

Figure 2 also describes the functional distribution of FSFs at the phylum/kingdom level for each superkingdom. Plots describing the percentages (Figure 2(A)) and actual number of FSFs in proteomes (Figure 2(B)) highlight the existence of “outliers” (vertical dotted lines in Figure 2(A)) that deviate from the global functional trends that are typical of each superkingdom.

In Archaea, the functional repertoires of the proteomes of Euryarchaeota, Crenarchaeota, Korarchaeota and Thaumarchaeota were remarkably conserved and consistent with each other. Only N. equitans could be considered an outlier (insets of Figure 2). Its proteome deviates from the global archaeal signature by reducing its proteomic make up (it has only 200 distinct FSFs) and by exchanging Information for metabolic FSFs. N. equitans is an obligate intracellular parasite [27] that is part of a new phylum of Archaea, the Nanoarchaeota [28]. N. equitans has many atypical features, including the almost complete absence of operons and presence of split genes [29], tRNA genes that code for only half of the tRNA molecule [30], and the complete absence of the nucleic acid processing enzyme RNase P [31]. Some of these features were used to propose that N. equitans is a living fossil [32], represents the root of superkingdom Archaea and the tree of life [33], and is part of a very ancient and yet to be described superkingdom (M. Di Giulio, personal communication). Phylogenomic analyses of domain structures in proteomes suggest Archaea is the most ancient superkingdom [19,34] and have placed N. equitans at the base of the tree of life together with other archaeal species. Its ancestral nature is therefore in line with the evolutionary and functional uniqueness of N. equitans and the very distinct functional repertoire we here report.

In Bacteria, the functional repertoires of bacterial phyla were also remarkably conserved. Only Information and Metabolism showed significantly distinct patterns and considerable variation in the use of FSFs. Again, decreases in representation of metabolic FSFs were generally offset by increases in informational FSFs (Figure 2(A)). Notable outliers include the Tenericutes and the Spirochetes. As groups, they have the highest relative usage of Information FSFs, which is clearly offset by a decrease in metabolic FSFs. The Tenericutes is a phylum of bacteria that includes class Mollicutes. Members of the Mollicutes are typical obligate parasites of animals and plants (some of medical significance such as Mycoplasma) that lack cell walls and have gliding motility. These organisms are characterized by small genome sizes [35] considered to have evolved via reductive evolutionary processes [36]. Because of their unique properties and history, mycoplasmas have been used recently to produce a completely synthetic genome [37]. There were also clear outliers in the Proteobacteria. These included Candidatus Blochmannia floridanus (symbiont of ants), Baumannia cicadellinicola (symbiont of sharpshooter insect), Candidatus Riesia pediculicola, Candidatus Carsonella ruddii (symbiont of sap-feeding insects) and Candidatus Hodgkinia cicadicola (symbiont of cicadas). These bacteria are generally endosymbionts of insects (e.g., ants, sharpshooters, psyllids, cicadas) that have undergone irreversible specialization to an intracellular lifestyle. Candidatus Carsonella ruddii has the smallest genome of any bacterium [38]. There were also bacterial proteome groups that were expected to be outliers but were no different from the rest. Bacteria belonging to the superphylum Planctomycetes-Verrucomicrobia-Chlamydiae (PVC) are different from other bacterial phyla because they have a “eukaryotic touch” [39]. Indeed, PVC bacteria display genetic and cellular features that are characteristic of Eukarya and Archaea, including the presence of Histone H1, condensed DNA surrounded by membrane, α-helical repeat domains and β-propeller folds that make up eukaryotic-like membrane coats, reproduction by budding, ether lipids and lack of cell walls [40–42]. Due to the unique nature of the PVC superphylum, it was proposed that these organisms be identified as a separate superkingdom that contributed to the evolution of Eukarya and Archaea [40]. However, trees of life generated from domain structures in hundreds of proteomes did not dissect the PVC superphylum into a separate group [7,19,34]. Functional distributions of FSFs now show that PVC proteomes appear no different from those of other bacteria (Figure 2). These results do not support PVC-inspired theories that explain the diversification of the three cellular superkingdoms of life.

In contrast to the functional repertoires of bacterial and archaeal phyla, proteomes belonging to individual kingdoms in Eukarya had functional signatures that were highly conserved (Figure 2(A)). However, these signatures differed between groups. Plants and Fungi had functional representations that were very similar and showed little diversity. In contrast, Metazoa functional distributions increased the representation of ECP and Regulation FSFs in exchange for FSFs in Metabolism and Information. Protista had patterns that resemble those of Plants and Fungi but had widely varying metabolic repertoires, very much like Bacteria. This possible link between basal eukaryotes and bacteria revealed by our comparative analysis is consistent with the existence of an ancestor of Bacteria and Eukarya and the early rise of Archaea [34]. Only a few outliers belonging to kingdoms Fungi (Encephalitozoon cuniculi and Encephalitozoon intestinalis) and Protista (Guillardia theta) were identified. E. cuniculi and E. intestinalis are eukaryotic parasites with highly reduced genomes [43,44]. Similarly, Guillardia theta is a nucleomorph that has a highly compact and reduced genome with loss of nearly all metabolic genes [45].

When we look at the actual number of FSFs in proteomes of phyla and kingdoms (Figure 2(B)), we observe that while the overall patterns match those of FSF representation (Figure 2(A)), FSF number revealed considerable variation in the metabolic repertoire of Protista and Bacteria. FSFs in these groups typically ranged from 130 to 340, with PVC and Spirochetes exhibiting the smallest range (130–300 FSFs). In contrast, metabolic repertoires of Archaea and the other eukaryotic kingdoms typically ranged from 200 to 260 FSFs and from 270 to 350 FSFs, respectively. This observation is significant. It provides comparative information to support a unique evolutionary link of phyla within superkingdoms Eukarya and Bacteria. Plots of FSF number also clarified functional patterns in outliers, revealing that they did not have larger numbers of FSFs in Information but rather had reduced metabolic repertoires. This shows that parasitic outliers get rid of metabolic domains and become more and more dependent on host cells.

2.4. Effect of Organism Lifestyle

The analysis thus far revealed the existence of a small group of outliers within each superkingdom. Manual inspection of the lifestyles of these organisms showed that all of them are united by a parasitic or symbiotic lifestyle. For example, N. equitans has the smallest archaeal genome ever sequenced and represents a new phylum, the Nanoarchaeota [28]. This organism interacts with Ignicoccus hospitalis, establishing the only known parasite/symbiont relationship of Archaea, and harbors a highly reduced genome [29]. Parasitic/symbiotic relationships with various plants and animals can be found in Tenericutes and in the endosymbionts of insects that belong to Proteobacteria. Similarly, the Encephalitozoon species are eukaryotic parasites that lack mitochondria and have highly reduced genomes [43,44]. E. cuniculi even has a chromosomal dispersion of its ribosomal genes, very much like N. equitans, and the rRNA of the large ribosomal subunit reduced to its universal core [46]. Similarly, Guillardia theta is a nucleomorph that has a highly compact and reduced genome with loss of nearly all metabolic genes [45]. Thus, all outliers exhibit extreme or unique cases of genome reduction.

In order to explore whether organisms that engage in parasitic or symbiotic interactions have general tendencies that resemble those of the outliers, we classified organisms into three different lifestyles: free living (FL) (592 proteomes), facultative parasitic (P) (153 proteomes), and obligate parasitic (OP) (158 proteomes). Functional distributions for the seven general functional categories for these proteomic sets explained the role of parasitic life on proteomic constitution (Figure 3). Plots of percentages (Figure 3(A)) and actual number of FSFs in proteomes (Figure 3(B)) showed that FSF distributions in FL organisms were remarkably homogeneous and that the vast majority of variability within superkingdoms was ascribed to the P and OP lifestyles. This variability was for the most part explained by a sharp decline in the number of FSFs that are assigned to the Metabolism general category (Figure 3(B)). Plots also support the hypothesis that parasitic organisms have gone the route of massive genome reduction in a tendency to lose all of their metabolic genes. This tendency makes them more and more dependent on host cells for metabolic functions and survival [47,48].

The number of domains corresponding to each general functional category in the proteomes of FL organisms increases in the order Archaea, Bacteria and Eukarya (Table S3). As in the total proteomic set (Figure 2), Metabolism remains the predominant functional category and a large number of domains in all the proteomes perform metabolic functions. Again, the proteomes of Eukarya have the richest FSF repertoires, and those of Archaea the simplest. Since maximum variability lies within the proteome repertoires of parasitic/symbiotic organisms (Figure 3) and parasitism/symbiosis in these organisms is the result of secondary adaptations, the analysis of proteomic diversity in FL organisms allows us to test whether the differences between the functional repertoires of superkingdoms are indeed statistically significant. Analysis of variance showed that the number of FSFs for each functional repertoire was consistently different between superkingdoms (p < 0.0001; Table S3). This supports the conclusions drawn from earlier analyses that the microbial superkingdoms followed a genome reduction path while Eukarya expanded their genomic repertoires [7,25].
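The between-superkingdom comparison relies on Welch's ANOVA (Section 3.3), which the authors ran in SAS. As a rough illustration of what that test computes, the following standard-library Python sketch returns the Welch F statistic and its degrees of freedom for groups with unequal variances. It is not the authors' code, the function name welch_anova is hypothetical, the toy numbers are made up, and obtaining a p-value would additionally require the F distribution (for example from a statistics package).

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA on a list of samples."""
    k = len(groups)
    if k < 2 or any(len(g) < 2 for g in groups):
        raise ValueError("need at least two groups with two or more values each")
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    variances = [variance(g) for g in groups]
    if any(v == 0 for v in variances):
        raise ValueError("every group needs a non-zero sample variance")
    # Weight each group by n_i / s_i^2, so noisier groups count less.
    weights = [n / v for n, v in zip(sizes, variances)]
    total_weight = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    # Weighted between-group mean square.
    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / total_weight) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    correction = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    f_stat = between / correction
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Toy, made-up values standing in for (transformed) FSF counts per group.
f_stat, df1, df2 = welch_anova([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.5, 7.0]])
print(f_stat, df1, df2)
```

With two groups the statistic reduces to the square of Welch's t, which gives a convenient sanity check for the implementation.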

2.5. Analysis of Minor Functional Categories

The seven general categories of molecular functions map to 50 minor categories (Table 1). We explored the distribution of FSFs corresponding to each minor category in superkingdoms (Figure 4). Only category “not annotated” (NONA) was excluded from analysis. In terms of percentage (Figure 4(A)), the overall functional signature is split into two components: prokaryotic and eukaryotic. Prokaryotes spend most of their domain repertoire on Metabolism and Information whereas Eukarya stand out in ECP (particularly cell adhesion, immune response), Regulation (DNA binding, signal transduction), and all the minor functional categories corresponding to ICP and General.

In terms of domain counts (Figure 4(B)), proteomes of Eukarya have the richest functional repertoires, with a significantly large number of FSFs devoted to each minor functional category. Bacteria and Archaea work with a small number of domains. However, the number of FSFs in Bacteria is significantly higher compared to Archaea (supporting results of Figures 1 and 2 and Table S3). These results are consistent with the evolutionary trends in proteomes described previously [7,19,25]. Our results support the complex nature of the Last Universal Common Ancestor (LUCA) [19] and are consistent with the evolution of microbial superkingdoms via reductive evolutionary processes and the evolution of eukaryal proteomes by genome expansion [7,25]. It appears that Archaea went on the route of genome reduction very early in evolution and was followed by Bacteria and finally Eukarya. Late in evolution, the eukaryal superkingdom increased the representation of FSFs and developed a rich proteome. This can explain the relatively huge and diverse nature of eukaryal proteomes compared to prokaryotic proteomes. Finally, there appears to be no significant difference in the distributions of FSFs corresponding to Metabolism and Information between Bacteria and Eukarya except for minor category “Translation” (green trend lines in Figure 4(B), Information), which is significantly higher in Eukarya compared to Bacteria. This shows that Bacteria exhibit incredible metabolic and informational diversity despite their reduced genomic complements. We conclude that the genome expansion in Eukarya occurred primarily for functions related to ECP, ICP, Regulation and General.

2.6. Reliability of Functional Annotations and Conclusions of this Study

Our analysis depends upon the accuracy of assigning structures to protein sequences and the SCOP protein classification and SUPERFAMILY functional annotation schemes. Databases such as SCOP and SUPERFAMILY are continuously updated with more and more genomes and new assignments. We therefore ask the reader to focus on the general trends in the data as opposed to the specifics such as the exact percentage or numbers of FSFs in each functional repertoire. Trends related to the number of domains in Archaea relative to Bacteria and Eukarya and the reduction of metabolic repertoires in parasitic organisms should be considered robust since these have been reliably observed in previous studies with more limited datasets [1,7,15,19,34]. Biases in sampling of proteomes in the three superkingdoms is not expected to over or underestimate the remarkably conserved nature of the functional makeup. We show that the conservation of molecular functions in proteomes is only broken in genomic outliers that are united by parasitic lifestyles. Thus equal sampling will not significantly alter the global trends described for individual superkingdoms. In light of our results, organism lifestyle is the only factor affecting the conserved nature of proteomes. Finally, we propose that lower or higher than expected numbers of FSFs in any category (subcategory) can be explained either by possible limitations of the scheme used to annotate molecular functions of FSFs or the simple nature of the functional repertoire. For example, the number of FSFs in subcategory structural proteins (main category General) is 7 (Table 1) despite the importance of structural proteins in cellular organization. Table S4 lists the description of these FSFs and shows that indeed these FSF domains play important structural roles. Their limited number indicates that the structural and functional organization is quite limited and very few folds play important structural roles. 
Another possibility is the “hidden” overlap between FSFs and molecular functions due to the one-to-one mapping limitations of the SUPERFAMILY functional annotation scheme. Most of the large FSFs include many FFs and participate in multiple pathways; for a few FSFs a complete functional profile may not be intuitively obvious. This may be one of the shortcomings of using this functional annotation scheme, but dissecting such detailed functions and pathways is a difficult task and is not attempted in this study. In summary, we do not believe that the classification or annotation schemes, despite their limitations, will undergo revisions serious enough to weaken our findings.

3. Experimental Section

3.1. Data Retrieval

We downloaded the protein architecture assignments for a total of 965 organisms, including 70 Archaea, 651 Bacteria and 244 Eukarya (Table S5), from SUPERFAMILY ver. 1.73 MySQL [16,17] at an E-value cutoff of 10⁻⁴. This cutoff is considered a stringent threshold that minimizes the rate of false positives in HMM assignments [19]. Organisms were classified manually according to their lifestyles, which resulted in 592 FL, 153 P, and 158 OP organisms.
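As a minimal sketch of the thresholding step described above, the following Python snippet keeps only domain assignments whose E-value does not exceed 10⁻⁴. The record layout (`genome`, `fsf`, `evalue` keys), the function name `filter_assignments`, and the placeholder identifiers are assumptions made for illustration and do not reflect the SUPERFAMILY MySQL schema. Treating the cutoff as inclusive is likewise an assumption.

```python
E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


def filter_assignments(assignments, cutoff=E_VALUE_CUTOFF):
    """Keep assignments whose E-value is at or below the cutoff (assumed inclusive)."""
    return [a for a in assignments if a["evalue"] <= cutoff]


# Hypothetical records; genome and FSF identifiers are placeholders.
assignments = [
    {"genome": "genome_A", "fsf": "fsf_1", "evalue": 1e-30},
    {"genome": "genome_A", "fsf": "fsf_2", "evalue": 5e-3},   # too weak, dropped
    {"genome": "genome_B", "fsf": "fsf_1", "evalue": 1e-4},   # exactly at the cutoff
]

kept = filter_assignments(assignments)
```

A stricter or looser threshold can be explored by passing a different `cutoff` value.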

3.2. Assigning Functional Categories to Protein Domains

The most recent domain functional annotation file for SCOP 1.73 was downloaded from the SUPERFAMILY webserver [23]. For each genome we extracted the set of unique FSFs present and then mapped them to the 7 general and 50 detailed functional categories. We calculated both the percentage and actual number of domains using programming implementations in Python 3.1.
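The mapping step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' script: the function name `functional_profile`, the placeholder FSF identifiers, the toy annotation table, and the `"Unannotated"` label are all assumptions. Including unannotated FSFs in the percentage denominator is also an assumption.

```python
from collections import Counter


def functional_profile(domain_hits, fsf_to_category):
    """Return (counts, percentages) of unique FSFs per functional category."""
    unique_fsfs = set(domain_hits)  # each FSF is counted once per genome
    counts = Counter(
        fsf_to_category.get(fsf, "Unannotated") for fsf in unique_fsfs
    )
    total = sum(counts.values())
    percentages = {cat: 100.0 * n / total for cat, n in counts.items()}
    return dict(counts), percentages


# Hypothetical annotation table and genome; identifiers are placeholders.
fsf_to_category = {
    "fsf_1": "Metabolism",
    "fsf_2": "Metabolism",
    "fsf_3": "Information",
}
genome_hits = ["fsf_1", "fsf_1", "fsf_2", "fsf_3", "fsf_4"]

counts, percentages = functional_profile(genome_hits, fsf_to_category)
```

Running the same function once per genome yields the two views used throughout the study: actual FSF numbers and their share of each proteome.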

3.3. Statistical Analysis

The statistical significance of differences between the numbers of functional FSFs in FL organisms of the superkingdoms was evaluated by Welch's ANOVA in SAS, which is the appropriate test to detect differences between means for groups having unequal variances [49]. We excluded organisms with P and OP lifestyles in order to remove noise from the data. Additionally, in order to meet asymptotic normality, we used the log10 transformation and rescaled the data to 0–7 using the following formula,

$$N_{\mathrm{normal}} = \frac{\log_{10}(N_{xy})}{\log_{10}(N_{\max})} \times 7$$

where $N_{xy}$ is the count of FSFs in functional category $x$ in superkingdom $y$; $N_{\max}$ is the largest value in the matrix; and $N_{\mathrm{normal}}$ is the normalized and scaled score for functional category $x$ in superkingdom $y$.
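The rescaling formula and the Welch test statistic can be sketched in plain Python as below. This is a hedged illustration rather than the SAS procedure used in the study: the function names `rescale` and `welch_anova` and the sample values are hypothetical, the handling of invalid counts (raising an error) is an assumption, and the P-value is omitted because it requires the F distribution, which the standard library does not provide.

```python
import math
from statistics import mean, variance


def rescale(count, n_max, scale=7.0):
    """Return log10(count) / log10(n_max) * scale for a positive count."""
    if count <= 0 or n_max <= 1:
        raise ValueError("count must be > 0 and n_max must be > 1")
    return math.log10(count) / math.log10(n_max) * scale


def welch_anova(groups):
    """Return (F, df1, df2) of Welch's one-way ANOVA for a list of samples."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n_i / s_i^2 (sample variance), so noisy groups count less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    grand_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(
        w * (m - grand_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, k - 1, df2


# Hypothetical rescaled scores for one functional category in three superkingdoms.
archaea = [2.1, 2.4, 2.2, 2.6]
bacteria = [3.0, 3.5, 2.8, 3.9, 3.3]
eukarya = [4.8, 5.9, 5.1, 6.4]

f_stat, df1, df2 = welch_anova([archaea, bacteria, eukarya])
```

With two groups the statistic reduces to the square of Welch's t, which gives a convenient sanity check for the implementation.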

4. Conclusions

Our analysis revealed a remarkable conservation in the functional distribution of protein domains in superkingdoms for proteomes for which we have structural assignments. Figure S1 showcases the average distribution of FSFs in phyla, kingdoms, and superkingdoms. The biggest proportion of each proteome is devoted in all cases to functions related to Metabolism. Phylogenomic analysis has shown that Metabolism appeared earlier than other functional groups and that its structures were the first to spread in life [1,50]. This would explain the relatively large representation of Metabolism in the functional toolkit of cells. Usage of domains related to ECP and Regulation is significantly higher in Metazoa compared to the rest. This showcases the importance of regulation and signal transduction mechanisms for eukaryotic organisms [51,52]. Our results support the view that prokaryotes evolved via reductive evolutionary processes whereas genome expansion was the route taken by eukaryotic organisms. Genome expansion in Eukarya seems to be directed towards innovation of FSF architectures, especially those linked to Regulation, ECP and General. Finally, viral structures make up a substantial proportion of cellular proteomes and appear to have played an important role in the evolution of cellular life.

Organisms with parasitic lifestyles have simple and reduced proteomes and rely on host cells for metabolic functions. Tenericutes are unique in this regard. They spend most of their proteomic resources in functions linked to Information (e.g., translation, replication). Remarkably, we find that the conservation of molecular functions in proteomes is only broken in “outliers” with parasitic lifestyles that do not obey the global trends. We conclude that organism lifestyle is a crucial factor in shaping the nature of proteomes.

Figure 1. Number of protein FSFs annotated for each functional category defined in SCOP 1.73 (A) and in the three superkingdoms (B). The functional distributions show that coarse-grained functions are conserved across cellular proteomes and Metabolism is the most dominant functional category. Numbers in parentheses indicate the total number of FSFs annotated in each dataset. The number of FSFs increases in the order Archaea, Bacteria and Eukarya.

Figure 2. The functional distribution of FSFs in individual proteomes of the three superkingdoms. Both the percentage (A) and actual FSF numbers (B) indicate conservation of functional distributions in proteomes and the existence of considerable functional flexibility between superkingdoms. Dotted vertical lines indicate genomic outliers. Insets highlight the interplay between Metabolism (yellow trend lines) and Information (red trend lines) in N. equitans.

Figure 3. The functional distribution of FSFs with respect to organism lifestyle. Both the percentage (A) and actual FSF numbers (B) indicate that obligate parasitic (OP) and facultative parasitic (P) organisms exhibit considerable variability in their metabolic repertoires (yellow trend lines) that is offset by corresponding increases in the Information FSFs (red trend lines).

Figure 4. The percentage (A) and number (B) of FSFs in minor functional categories across superkingdoms. Archaea (A) and Bacteria (B) spend most of their proteomes in functions related to Metabolism and Information whereas Eukarya (E) stand out in the minor categories of Regulation, General, Intracellular processes (ICP) and Extracellular processes (ECP). In turn, the number of FSFs increases in the order Archaea, Bacteria and Eukarya. Eukaryal proteomes have the richest functional repertoires for Regulation, Other, General, ICP and ECP.

Figure S1. The average distribution of FSFs in phyla, kingdoms, and superkingdoms suggests conservation of functional design in proteomes. Numbers in parentheses indicate the total number of proteomes analyzed for each phylum/kingdom.

Table 1. Mapping between the general and minor functional categories for 1,781 protein domains defined in structural classification of proteins (SCOP) 1.73 and the number of fold superfamilies (FSFs) corresponding to each minor category in our dataset of 965 organisms. A total of 135 FSFs could not be annotated. m/tr, metabolism and transport.

| Functional category | Minor categories | No. of FSF domains |
|---|---|---|
| Metabolism (533 FSFs) | Energy | 54 |
| | E- transfer | 31 |
| | Amino acids m/tr | 20 |
| | Nitrogen m/tr | 1 |
| | Nucleotide m/tr | 30 |
| | Carbohydrate m/tr | 30 |
| | Polysaccharide m/tr | 21 |
| | Coenzyme m/tr | 50 |
| | Lipid m/tr | 17 |
| | Cell envelope m/tr | 8 |
| | Secondary metabolism | 11 |
| | Other enzymes | 156 |
| General (131 FSFs) | Small molecule binding | 27 |
| | Ion binding | 13 |
| | Lipid/membrane binding | 4 |
| | Ligand binding | 3 |
| | Protein interaction | 49 |
| | Structural protein | 7 |
| Information (201 FSFs) | Chromatin structure | 7 |
| | DNA replication/repair | 68 |
| | RNA processing | 10 |
| | Nuclear structure | 0 |
| Other (273 FSFs) | Unknown function | 200 |
| | Viral proteins | 73 |
| Extracellular processes (95 FSFs) | Cell adhesion | 31 |
| | Immune response | 19 |
| | Blood clotting | 5 |
| Intracellular processes (208 FSFs) | Cell cycle, Apoptosis | 20 |
| | Phospholipid m/tr | 6 |
| | Cell motility | 20 |
| | Protein modification | 35 |
| | Ion m/tr | 21 |
| Regulation (205 FSFs) | RNA binding, m/tr | 19 |
| | Signal transduction | 53 |
| | Other regulatory function | 34 |
| | Receptor activity | 18 |

Table S1. Average number of FSF domains in each phylum/kingdom corresponding to the seven general functional categories. Numbers were rounded up when the decimal value exceeded 0.5 and rounded down otherwise. Nanoarchaeota and Tenericutes have the lowest numbers of metabolic domains and are highlighted in bold. Eukaryal kingdoms (Fungi, Metazoa, Plants and Protista) have richer FSF repertoires than the prokaryotes.

| Rest of Bacteria * | 255 | 113 | 67 | 48 | 37 | 27 | 6 |

* Includes proteomes from Chlorobi, Chloroflexi, Aquificae, Deinococcus-Thermus, Fusobacteria, Acidobacteria, Deferribacteres, Dictyoglomi, Elusimicrobia, Synergistetes, Fibrobacteres, Gemmatimonadetes, Nitrospirae, and Thermobaculum.

Table S2. Average percentage of FSF domains in each phylum/kingdom corresponding to the seven general functional categories. Numbers were rounded up when the decimal value exceeded 0.5 and rounded down otherwise. Nanoarchaeota (highlighted in bold) is an outlier because it has the smallest percentage of metabolic domains, and this decrease is offset by an increase in the informational FSFs.

| Rest of Bacteria * | 46 | 21 | 12 | 9 | 7 | 5 | 1 |

* Includes proteomes from Chlorobi, Chloroflexi, Aquificae, Deinococcus-Thermus, Fusobacteria, Acidobacteria, Deferribacteres, Dictyoglomi, Elusimicrobia, Synergistetes, Fibrobacteres, Gemmatimonadetes, Nitrospirae, and Thermobaculum.

Table S3. Comparison of functional categories across superkingdoms using Welch's ANOVA.

| Functional category | F-ratio | DF | P-value * |

*All the P-values are statistically significant at 0.05.

Table S4. Names and description of FSF domains corresponding to subcategory structural proteins in the main category General.

| No. | SCOP Id | FSF Id | Description |
|---|---|---|---|
| 1 | 103589 | g.71.1 | Mini-collagen I, C-terminal domain |
| 3 | 51269 | b.85.1 | Anti-freeze protein (AFP) III-like domain |
| 4 | 56558 | d.182.1 | Baseplate structural protein gp11 |
| 5 | 58002 | h.1.6 | Chicken cartilage matrix protein |
| 6 | 58006 | h.1.7 | Assembly domain of cartilage oligomeric matrix protein |
| 7 | 75404 | d.213.1 | Vesiculovirus (VSV) matrix proteins |

Table S5. List of organisms analyzed with their taxonomic classifications.

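| No. | Genome Name | Phyla/Kingdom | Superkingdom |
|---|---|---|---|
| 1 | Malassezia globosa CBS 7966 | Fungi | Eukaryota |
| 2 | Ustilago maydis | Fungi | Eukaryota |
| 3 | Puccinia graminis f. sp. tritici CRL 75-36-700-3 | Fungi | Eukaryota |
| 4 | Melampsora laricis-populina | Fungi | Eukaryota |
| 5 | Sporobolomyces roseus IAM 13481 | Fungi | Eukaryota |
| 6 | Serpula lacrymans var. lacrymans S7.9 | Fungi | Eukaryota |
| 7 | Coprinopsis cinerea okayama7 130 v3 | Fungi | Eukaryota |
| 8 | Pleurotus ostreatus | Fungi | Eukaryota |
| 9 | Laccaria bicolor S238N-H82 | Fungi | Eukaryota |
| 10 | Agaricus bisporus var. bisporus | Fungi | Eukaryota |
| 11 | Schizophyllum commune | Fungi | Eukaryota |
| 12 | Heterobasidion annosum | Fungi | Eukaryota |
| 13 | Phanerochaete chrysosporium RP-78 2.1 | Fungi | Eukaryota |
| 14 | Postia placenta | Fungi | Eukaryota |
| 15 | Tremella mesenterica | Fungi | Eukaryota |
| 16 | Cryptococcus neoformans JEC21 | Fungi | Eukaryota |
| 17 | Magnaporthe grisea 70-15 | Fungi | Eukaryota |
| 18 | Podospora anserina | Fungi | Eukaryota |
| 19 | Sporotrichum thermophile ATCC 42464 | Fungi | Eukaryota |
| 20 | Thielavia terrestris NRRL 8126 | Fungi | Eukaryota |
| 21 | Chaetomium globosum CBS 148.51 | Fungi | Eukaryota |
| 22 | Neurospora tetrasperma | Fungi | Eukaryota |
| 23 | Neurospora discreta FGSC 8579 | Fungi | Eukaryota |
| 24 | Neurospora crassa OR74A | Fungi | Eukaryota |
| 25 | Cryphonectria parasitica | Fungi | Eukaryota |
| 26 | Verticillium dahliae VdLs.17 | Fungi | Eukaryota |
| 27 | Verticillium albo-atrum VaMs.102 | Fungi | Eukaryota |
| 28 | Fusarium oxysporum f. sp. lycopersici 4286 | Fungi | Eukaryota |
| 29 | Nectria haematococca mpVI | Fungi | Eukaryota |
| 30 | Fusarium verticillioides 7600 | Fungi | Eukaryota |
| 31 | Fusarium graminearum | Fungi | Eukaryota |
| 32 | Trichoderma atroviride | Fungi | Eukaryota |
| 33 | Trichoderma reesei 1.2 | Fungi | Eukaryota |
| 34 | Trichoderma virens Gv29-8 | Fungi | Eukaryota |
| 35 | Botrytis cinerea B05.10 | Fungi | Eukaryota |
| 36 | Sclerotinia sclerotiorum | Fungi | Eukaryota |
| 37 | Alternaria brassicicola | Fungi | Eukaryota |
| 38 | Pyrenophora tritici-repentis | Fungi | Eukaryota |
| 39 | Cochliobolus heterostrophus | Fungi | Eukaryota |
| 40 | Stagonospora nodorum | Fungi | Eukaryota |
| 41 | Mycosphaerella fijiensis CIRAD86 | Fungi | Eukaryota |
| 42 | Mycosphaerella graminicola IPO323 | Fungi | Eukaryota |
| 43 | Ajellomyces dermatitidis SLH14081 | Fungi | Eukaryota |
| 44 | Histoplasma capsulatum class NAmI strain WU24 | Fungi | Eukaryota |
| 45 | Microsporum canis CBS 113480 | Fungi | Eukaryota |
| 46 | Microsporum gypseum | Fungi | Eukaryota |
| 47 | Arthroderma benhamiae CBS 112371 | Fungi | Eukaryota |
| 48 | Trichophyton equinum CBS 127.97 | Fungi | Eukaryota |
| 49 | Trichophyton verrucosum HKI 0517 | Fungi | Eukaryota |
| 50 | Trichophyton tonsurans CBS 112818 | Fungi | Eukaryota |
| 51 | Trichophyton rubrum CBS 118892 | Fungi | Eukaryota |
| 52 | Paracoccidioides brasiliensis Pb18 | Fungi | Eukaryota |
| 53 | Coccidioides posadasii RMSCC 3488 | Fungi | Eukaryota |
| 54 | Coccidioides immitis RS | Fungi | Eukaryota |
| 55 | Uncinocarpus reesii 1704 | Fungi | Eukaryota |
| 56 | Aspergillus fumigatus Af293 | Fungi | Eukaryota |
| 57 | Neosartorya fischeri NRRL 181 | Fungi | Eukaryota |
| 58 | Penicillium chrysogenum Wisconsin 54-1255 | Fungi | Eukaryota |
| 59 | Penicillium marneffei ATCC 18224 | Fungi | Eukaryota |
| 60 | Aspergillus carbonarius ITEM 5010 | Fungi | Eukaryota |
| 61 | Aspergillus terreus NIH2624 | Fungi | Eukaryota |
| 62 | Aspergillus oryzae RIB40 | Fungi | Eukaryota |
| 63 | Aspergillus niger ATCC 1015 | Fungi | Eukaryota |
| 64 | Aspergillus flavus NRRL3357 | Fungi | Eukaryota |
| 65 | Aspergillus clavatus NRRL 1 | Fungi | Eukaryota |
| 66 | Aspergillus nidulans FGSC A4 | Fungi | Eukaryota |
| 67 | Tuber melanosporum Vittad | Fungi | Eukaryota |
| 68 | Pichia stipitis CBS 6054 | Fungi | Eukaryota |
| 69 | Candida guilliermondii ATCC 6260 | Fungi | Eukaryota |
| 70 | Lodderomyces elongisporus NRRL YB-4239 | Fungi | Eukaryota |
| 71 | Debaromyces hansenii | Fungi | Eukaryota |
| 72 | Candida dubliniensis CD36 | Fungi | Eukaryota |
| 73 | Candida tropicalis MYA-3404 | Fungi | Eukaryota |
| 74 | Candida parapsilosis | Fungi | Eukaryota |
| 75 | Candida albicans SC5314 | Fungi | Eukaryota |
| 76 | Yarrowia lipolytica CLIB122 | Fungi | Eukaryota |
| 77 | Candida lusitaniae ATCC 42720 | Fungi | Eukaryota |
| 78 | Vanderwaltozyma polyspora DSM 70294 | Fungi | Eukaryota |
| 79 | Candida glabrata CBS138 | Fungi | Eukaryota |
| 80 | Kluyveromyces thermotolerans CBS 6340 | Fungi | Eukaryota |
| 81 | Lachancea kluyveri | Fungi | Eukaryota |
| 82 | Kluyveromyces waltii | Fungi | Eukaryota |
| 83 | Ashbya gossypii ATCC 10895 | Fungi | Eukaryota |
| 84 | Zygosaccharomyces rouxii | Fungi | Eukaryota |
| 85 | Saccharomyces mikatae MIT | Fungi | Eukaryota |
| 86 | Saccharomyces paradoxus MIT | Fungi | Eukaryota |
| 87 | Saccharomyces cerevisiae SGD | Fungi | Eukaryota |
| 88 | Saccharomyces bayanus MIT | Fungi | Eukaryota |
| 89 | Pichia pastoris GS115 | Fungi | Eukaryota |
| 90 | Kluyveromyces lactis | Fungi | Eukaryota |
| 91 | Schizosaccharomyces octosporus yFS286 | Fungi | Eukaryota |
| 92 | Schizosaccharomyces japonicus yFS275 | Fungi | Eukaryota |
| 93 | Schizosaccharomyces pombe | Fungi | Eukaryota |
| 94 | Allomyces macrogynus ATCC 38327 | Fungi | Eukaryota |
| 95 | Rhizopus oryzae RA 99-880 | Fungi | Eukaryota |
| 96 | Phycomyces blakesleeanus | Fungi | Eukaryota |
| 97 | Mucor circinelloides | Fungi | Eukaryota |
| 98 | Spizellomyces punctatus DAOM BR117 | Fungi | Eukaryota |
| 99 | Batrachochytrium dendrobatidis JEL423 | Fungi | Eukaryota |
| 100 | Encephalitozoon cuniculi | Fungi | Eukaryota |
| 101 | Encephalitozoon intestinalis | Fungi | Eukaryota |
| 102 | Homo sapiens 59_37d (all transcripts) | Metazoa | Eukaryota |
| 103 | Pan troglodytes 59_21n (all transcripts) | Metazoa | Eukaryota |
| 104 | Gorilla gorilla 59_3b (all transcripts) | Metazoa | Eukaryota |
| 105 | Pongo pygmaeus 59_1e (all transcripts) | Metazoa | Eukaryota |
| 106 | Macaca mulatta 59_10n (all transcripts) | Metazoa | Eukaryota |
| 107 | Callithrix jacchus 59_321a (all transcripts) | Metazoa | Eukaryota |
| 108 | Otolemur garnettii 59_1g (all transcripts) | Metazoa | Eukaryota |
| 109 | Microcebus murinus 59_1d (all transcripts) | Metazoa | Eukaryota |
| 110 | Tarsius syrichta 59_1e (all transcripts) | Metazoa | Eukaryota |
| 111 | Rattus norvegicus 59_34a (all transcripts) | Metazoa | Eukaryota |
| 112 | Mus musculus 59_37l (all transcripts) | Metazoa | Eukaryota |
| 113 | Spermophilus tridecemlineatus 59_1i (all transcripts) | Metazoa | Eukaryota |
| 114 | Dipodomys ordii 59_1e (all transcripts) | Metazoa | Eukaryota |
| 115 | Cavia porcellus 59_3c (all transcripts) | Metazoa | Eukaryota |
| 116 | Oryctolagus cuniculus 59_2b (all transcripts) | Metazoa | Eukaryota |
| 117 | Ochotona princeps 59_1e (all transcripts) | Metazoa | Eukaryota |
| 118 | Tupaia belangeri 59_1h (all transcripts) | Metazoa | Eukaryota |
| 119 | Sus scrofa 59_9c (all transcripts) | Metazoa | Eukaryota |
| 120 | Bos taurus 59_4h (all transcripts) | Metazoa | Eukaryota |
| 121 | Vicugna pacos 59_1e (all transcripts) | Metazoa | Eukaryota |
| 122 | Tursiops truncatus 59_1e (all transcripts) | Metazoa | Eukaryota |
| 123 | Canis familiaris 59_2o (all transcripts) | Metazoa | Eukaryota |
| 124 | Felis catus 59_1h (all transcripts) | Metazoa | Eukaryota |
| 125 | Equus caballus 59_2f (all transcripts) | Metazoa | Eukaryota |
| 126 | Myotis lucifugus 59_1i (all transcripts) | Metazoa | Eukaryota |
| 127 | Pteropus vampyrus 59_1e (all transcripts) | Metazoa | Eukaryota |
| 128 | Sorex araneus 59_1g (all transcripts) | Metazoa | Eukaryota |
| 129 | Erinaceus europaeus 59_1g (all transcripts) | Metazoa | Eukaryota |
| 130 | Procavia capensis 59_1e (all transcripts) | Metazoa | Eukaryota |
| 131 | Loxodonta africana 59_3b (all transcripts) | Metazoa | Eukaryota |
| 132 | Echinops telfairi 59_1i (all transcripts) | Metazoa | Eukaryota |
| 133 | Dasypus novemcinctus 59_2c (all transcripts) | Metazoa | Eukaryota |
| 134 | Macropus eugenii 59_1b (all transcripts) | Metazoa | Eukaryota |
| 135 | Monodelphis domestica 59_5k (all transcripts) | Metazoa | Eukaryota |
| 136 | Ornithorhynchus anatinus 59_1m (all transcripts) | Metazoa | Eukaryota |
| 137 | Anolis carolinensis 59_1c (all transcripts) | Metazoa | Eukaryota |
| 138 | Taeniopygia guttata 59_1e (all transcripts) | Metazoa | Eukaryota |
| 139 | Meleagris gallopavo 57_2 (all transcripts) | Metazoa | Eukaryota |
| 140 | Gallus gallus 59_2o (all transcripts) | Metazoa | Eukaryota |
| 141 | Xenopus laevis | Metazoa | Eukaryota |
| 142 | Xenopus tropicalis 59_41p (all transcripts) | Metazoa | Eukaryota |
| 143 | Danio rerio 59_8e (all transcripts) | Metazoa | Eukaryota |
| 144 | Gasterosteus aculeatus 59_1l (all transcripts) | Metazoa | Eukaryota |
| 145 | Oryzias latipes 59_1k (all transcripts) | Metazoa | Eukaryota |
| 146 | Tetraodon nigroviridis 59_8d (all transcripts) | Metazoa | Eukaryota |
| 147 | Takifugu rubripes 59_4m (all transcripts) | Metazoa | Eukaryota |
| 148 | Branchiostoma floridae 1.0 | Metazoa | Eukaryota |
| 149 | Ciona savignyi 59_2j (all transcripts) | Metazoa | Eukaryota |
| 150 | Ciona intestinalis 59_2o (all transcripts) | Metazoa | Eukaryota |
| 151 | Strongylocentrotus purpuratus | Metazoa | Eukaryota |
| 152 | Helobdella robusta | Metazoa | Eukaryota |
| 153 | Capitella sp. I | Metazoa | Eukaryota |
| 154 | Bombyx mori | Metazoa | Eukaryota |
| 155 | Nasonia vitripennis | Metazoa | Eukaryota |
| 156 | Apis mellifera 38.2d (all transcripts) | Metazoa | Eukaryota |
| 157 | Drosophila grimshawi 1.3 | Metazoa | Eukaryota |
| 158 | Drosophila willistoni 1.3 | Metazoa | Eukaryota |
| 159 | Drosophila pseudoobscura 2.13 | Metazoa | Eukaryota |
| 160 | Drosophila persimilis 1.3 | Metazoa | Eukaryota |
| 161 | Drosophila yakuba 1.3 | Metazoa | Eukaryota |
| 162 | Drosophila simulans 1.3 | Metazoa | Eukaryota |
| 163 | Drosophila sechellia 1.3 | Metazoa | Eukaryota |
| 164 | Drosophila melanogaster 59_525a (all transcripts) | Metazoa | Eukaryota |
| 165 | Drosophila erecta 1.3 | Metazoa | Eukaryota |
| 166 | Drosophila ananassae 1.3 | Metazoa | Eukaryota |
| 167 | Drosophila virilis 1.2 | Metazoa | Eukaryota |
| 168 | Drosophila mojavensis 1.3 | Metazoa | Eukaryota |
| 169 | Aedes aegypti 55 (all transcripts) | Metazoa | Eukaryota |
| 170 | Culex pipiens quinquefasciatus | Metazoa | Eukaryota |
| 171 | Anopheles gambiae 49_3j (all transcripts) | Metazoa | Eukaryota |
| 172 | Tribolium castaneum 3.0 | Metazoa | Eukaryota |
| 173 | Pediculus humanus corporis | Metazoa | Eukaryota |
| 174 | Acyrthosiphon pisum | Metazoa | Eukaryota |
| 175 | Daphnia pulex | Metazoa | Eukaryota |
| 176 | Ixodes scapularis | Metazoa | Eukaryota |
| 177 | Lottia gigantea | Metazoa | Eukaryota |
| 178 | Pristionchus pacificus | Metazoa | Eukaryota |
| 179 | Meloidogyne incognita | Metazoa | Eukaryota |
| 180 | Brugia malayi WS218 | Metazoa | Eukaryota |
| 181 | Caenorhabditis japonica | Metazoa | Eukaryota |
| 182 | Caenorhabditis brenneri | Metazoa | Eukaryota |
| 183 | Caenorhabditis remanei | Metazoa | Eukaryota |
| 184 | Caenorhabditis elegans 59_210a (all transcripts) | Metazoa | Eukaryota |
| 185 | Caenorhabditis briggsae 2 | Metazoa | Eukaryota |
| 186 | Schistosoma mansoni | Metazoa | Eukaryota |
| 187 | Nematostella vectensis 1.0 | Metazoa | Eukaryota |
| 188 | Hydra magnipapillata | Metazoa | Eukaryota |
| 189 | Trichoplax adhaerens | Metazoa | Eukaryota |
| 190 | Giardia lamblia 2.3 | Protista | Eukaryota |
| 191 | Trypanosoma cruzi strain CL Brener | Protista | Eukaryota |
| 192 | Trypanosoma brucei | Protista | Eukaryota |
| 193 | Leishmania mexicana 2.4 | Protista | Eukaryota |
| 194 | Leishmania major strain Friedlin | Protista | Eukaryota |
| 195 | Leishmania infantum JPCM5 2.4 | Protista | Eukaryota |
| 196 | Leishmania braziliensis MHOM/BR/75/M2904 2.4 | Protista | Eukaryota |
| 197 | Aureococcus anophagefferens | Protista | Eukaryota |
| 198 | Phytophthora ramorum 1.1 | Protista | Eukaryota |
| 199 | Phytophthora sojae 1.1 | Protista | Eukaryota |
| 200 | Phytophthora infestans T30-4 | Protista | Eukaryota |
| 201 | Phytophthora capsici | Protista | Eukaryota |
| 202 | Paramecium tetraurelia | Protista | Eukaryota |
| 203 | Tetrahymena thermophila SB210 1 | Protista | Eukaryota |
| 204 | Babesia bovis T2Bo | Protista | Eukaryota |
| 205 | Theileria parva | Protista | Eukaryota |
| 206 | Theileria annulata | Protista | Eukaryota |
| 207 | Plasmodium falciparum 3D7 | Protista | Eukaryota |
| 208 | Plasmodium vivax SaI-1 7.0 | Protista | Eukaryota |
| 209 | Plasmodium knowlesi strain H | Protista | Eukaryota |
| 210 | Plasmodium yoelii ssp. yoelii 1 | Protista | Eukaryota |
| 211 | Plasmodium chabaudi | Protista | Eukaryota |
| 212 | Plasmodium berghei ANKA | Protista | Eukaryota |
| 213 | Cryptosporidium hominis | Protista | Eukaryota |
| 214 | Cryptosporidium muris | Protista | Eukaryota |
| 215 | Cryptosporidium parvum Iowa II | Protista | Eukaryota |
| 216 | Neospora caninum Nc-Liverpool 6.2 | Protista | Eukaryota |
| 217 | Neospora caninum | Protista | Eukaryota |
| 218 | Toxoplasma gondii ME49 | Protista | Eukaryota |
| 219 | Naegleria gruberi | Protista | Eukaryota |
| 220 | Guillardia theta | Protista | Eukaryota |
| 221 | Arabidopsis lyrata | Plantae | Eukaryota |
| 222 | Arabidopsis thaliana 10 (all transcripts) | Plantae | Eukaryota |
| 223 | Carica papaya | Plantae | Eukaryota |
| 224 | Medicago truncatula | Plantae | Eukaryota |
| 225 | Glycine max | Plantae | Eukaryota |
| 226 | Cucumis sativus | Plantae | Eukaryota |
| 227 | Populus trichocarpa 6.0 | Plantae | Eukaryota |
| 228 | Vitis vinifera | Plantae | Eukaryota |
| 229 | Brachypodium distachyon | Plantae | Eukaryota |
| 230 | Oryza sativa ssp. japonica 5.0 | Plantae | Eukaryota |
| 231 | Zea mays subsp. mays | Plantae | Eukaryota |
| 232 | Sorghum bicolor | Plantae | Eukaryota |
| 233 | Selaginella moellendorffii | Plantae | Eukaryota |
| 234 | Physcomitrella patens subsp. patens | Plantae | Eukaryota |
| 235 | Ostreococcus sp. RCC809 | Plantae | Eukaryota |
| 236 | Ostreococcus lucimarinus CCE9901 | Plantae | Eukaryota |
| 237 | Ostreococcus tauri | Plantae | Eukaryota |
| 238 | Micromonas sp. RCC299 | Plantae | Eukaryota |
| 239 | Micromonas pusilla CCMP1545 | Plantae | Eukaryota |
| 240 | Coccomyxa sp. C-169 | Plantae | Eukaryota |
| 241 | Chlorella sp. NC64A | Plantae | Eukaryota |
| 242 | Chlorella vulgaris | Plantae | Eukaryota |
| 243 | Volvox carteri f. nagariensis | Plantae | Eukaryota |
| 244 | Chlamydomonas reinhardtii 4.0 | Plantae | Eukaryota |
| 245 | Candidatus Koribacter versatilis Ellin345 | Acidobacteria | Bacteria |
| 246 | Candidatus Solibacter usitatus Ellin6076 | Acidobacteria | Bacteria |
| 247 | Acidobacterium capsulatum ATCC 51196 | Acidobacteria | Bacteria |
| 248 | Gardnerella vaginalis 409-05 | Actinobacteria | Bacteria |
| 249 | Bifidobacterium longum NCC2705 | Actinobacteria | Bacteria |
| 250 | Bifidobacterium animalis ssp. lactis AD011 | Actinobacteria | Bacteria |
| 251 | Bifidobacterium dentium Bd1 | Actinobacteria | Bacteria |
| 252 | Bifidobacterium adolescentis ATCC 15703 | Actinobacteria | Bacteria |
| 253 | Kineococcus radiotolerans SRS30216 | Actinobacteria | Bacteria |
| 254 | Catenulispora acidiphila DSM 44928 | Actinobacteria | Bacteria |
| 255 | Stackebrandtia nassauensis DSM 44728 | Actinobacteria | Bacteria |
| 256 | Acidothermus cellulolyticus 11B | Actinobacteria | Bacteria |
| 257 | Nakamurella multipartita DSM 44233 | Actinobacteria | Bacteria |
| 258 | Geodermatophilus obscurus DSM 43160 | Actinobacteria | Bacteria |
| 259 | Frankia sp. CcI3 | Actinobacteria | Bacteria |
| 260 | Frankia alni ACN14a | Actinobacteria | Bacteria |
| 261 | Thermobifida fusca YX | Actinobacteria | Bacteria |
| 262 | Thermomonospora curvata DSM 43183 | Actinobacteria | Bacteria |
| 263 | Streptosporangium roseum DSM 43021 | Actinobacteria | Bacteria |
| 264 | Streptomyces griseus ssp. griseus NBRC 13350 | Actinobacteria | Bacteria |
| 265 | Streptomyces avermitilis MA-4680 | Actinobacteria | Bacteria |
| 266 | Streptomyces scabiei 87.22 | Actinobacteria | Bacteria |
| 267 | Streptomyces coelicolor | Actinobacteria | Bacteria |
| 268 | Actinosynnema mirum DSM 43827 | Actinobacteria | Bacteria |
| 269 | Saccharomonospora viridis DSM 43017 | Actinobacteria | Bacteria |
| 270 | Saccharopolyspora erythraea NRRL 2338 | Actinobacteria | Bacteria |
| 271 | Kribbella flavida DSM 17836 | Actinobacteria | Bacteria |
| 272 | Nocardioides sp. JS614 | Actinobacteria | Bacteria |
| 273 | Propionibacterium acnes KPA171202 | Actinobacteria | Bacteria |
| 274 | Salinispora arenicola CNS-205 | Actinobacteria | Bacteria |
| 275 | Salinispora tropica CNB-440 | Actinobacteria | Bacteria |
| 276 | Gordonia bronchialis DSM 43247 | Actinobacteria | Bacteria |
| 277 | Rhodococcus jostii RHA1 | Actinobacteria | Bacteria |
| 278 | Rhodococcus opacus B4 | Actinobacteria | Bacteria |
| 279 | Rhodococcus erythropolis PR4 | Actinobacteria | Bacteria |
| 280 | Nocardia farcinica IFM 10152 | Actinobacteria | Bacteria |
| 281 | Mycobacterium abscessus ATCC 19977 | Actinobacteria | Bacteria |
| 282 | Mycobacterium sp. MCS | Actinobacteria | Bacteria |
| 283 | Mycobacterium avium ssp. paratuberculosis K-10 | Actinobacteria | Bacteria |
| 284 | Mycobacterium vanbaalenii PYR-1 | Actinobacteria | Bacteria |
| 285 | Mycobacterium tuberculosis H37Rv | Actinobacteria | Bacteria |
| 286 | Mycobacterium bovis AF2122/97 | Actinobacteria | Bacteria |
| 287 | Mycobacterium ulcerans Agy99 | Actinobacteria | Bacteria |
| 288 | Mycobacterium gilvum PYR-GCK | Actinobacteria | Bacteria |
| 289 | Mycobacterium marinum M | Actinobacteria | Bacteria |
| 290 | Mycobacterium smegmatis MC2 155 | Actinobacteria | Bacteria |
| 291 | Mycobacterium leprae TN | Actinobacteria | Bacteria |
| 292 | Corynebacterium aurimucosum ATCC 700975 | Actinobacteria | Bacteria |
| 293 | Corynebacterium kroppenstedtii DSM 44385 | Actinobacteria | Bacteria |
| 294 | Corynebacterium efficiens YS-314 | Actinobacteria | Bacteria |
| 295 | Corynebacterium urealyticum DSM 7109 | Actinobacteria | Bacteria |
| 296 | Corynebacterium jeikeium K411 | Actinobacteria | Bacteria |
| 297 | Corynebacterium glutamicum ATCC 13032 Kitasato | Actinobacteria | Bacteria |
| 298 | Corynebacterium diphtheriae NCTC 13129 | Actinobacteria | Bacteria |
| 299 | Tropheryma whipplei Twist | Actinobacteria | Bacteria |
| 300 | Sanguibacter keddieii DSM 10542 | Actinobacteria | Bacteria |
| 301 | Kytococcus sedentarius DSM 20547 | Actinobacteria | Bacteria |
| 302 | Beutenbergia cavernae DSM 12333 | Actinobacteria | Bacteria |
| 303 | Leifsonia xyli ssp. xyli CTCB07 | Actinobacteria | Bacteria |
| 304 | Clavibacter michiganensis ssp. michiganensis NCPPB 382 | Actinobacteria | Bacteria |
| 305 | Jonesia denitrificans DSM 20603 | Actinobacteria | Bacteria |
| 306 | Brachybacterium faecium DSM 4810 | Actinobacteria | Bacteria |
| 307 | Xylanimonas cellulosilytica DSM 15894 | Actinobacteria | Bacteria |
| 308 | Kocuria rhizophila DC2201 | Actinobacteria | Bacteria |
| 309 | Rothia mucilaginosa DY-18 | Actinobacteria | Bacteria |
| 310 | Arthrobacter sp. FB24 | Actinobacteria | Bacteria |
| 311 | Arthrobacter chlorophenolicus A6 | Actinobacteria | Bacteria |
| 312 | Arthrobacter aurescens TC1 | Actinobacteria | Bacteria |
| 313 | Renibacterium salmoninarum ATCC 33209 | Actinobacteria | Bacteria |
| 314 | Micrococcus luteus NCTC 2665 | Actinobacteria | Bacteria |
| 315 | Cryptobacterium curtum DSM 15641 | Actinobacteria | Bacteria |
| 316 | Eggerthella lenta DSM 2243 | Actinobacteria | Bacteria |
| 317 | Slackia heliotrinireducens DSM 20476 | Actinobacteria | Bacteria |
| 318 | Atopobium parvulum DSM 20469 | Actinobacteria | Bacteria |
| 319 | Conexibacter woesei DSM 14684 | Actinobacteria | Bacteria |
| 320 | Rubrobacter xylanophilus DSM 9941 | Actinobacteria | Bacteria |
| 321 | Acidimicrobium ferrooxidans DSM 10331 | Actinobacteria | Bacteria |
| 322 | Sulfurihydrogenibium sp. YO3AOP1 | Aquificae | Bacteria |
| 323 | Sulfurihydrogenibium azorense Az-Fu1 | Aquificae | Bacteria |
| 324 | Persephonella marina EX-H1 | Aquificae | Bacteria |
| 325 | Hydrogenobaculum sp. Y04AAS1 | Aquificae | Bacteria |
| 326 | Thermocrinis albus DSM 14484 | Aquificae | Bacteria |
| 327 | Aquifex aeolicus VF5 | Aquificae | Bacteria |
| 328 | Hydrogenobacter thermophilus TK-6 | Aquificae | Bacteria |
| 329 | Dyadobacter fermentans DSM 18053 | Bacteroidetes | Bacteria |
| 330 | Cytophaga hutchinsonii ATCC 33406 | Bacteroidetes | Bacteria |
| 331 | Spirosoma linguale DSM 74 | Bacteroidetes | Bacteria |
| 332 | Candidatus Azobacteroides pseudotrichonymphae genomovar. | Bacteroidetes | Bacteria |
| 333 | Prevotella ruminicola 23 | Bacteroidetes | Bacteria |
| 334 | Parabacteroides distasonis ATCC 8503 | Bacteroidetes | Bacteria |
| 335 | Porphyromonas gingivalis W83 | Bacteroidetes | Bacteria |
| 336 | Bacteroides vulgatus ATCC 8482 | Bacteroidetes | Bacteria |
| 337 | Bacteroides thetaiotaomicron VPI-5482 | Bacteroidetes | Bacteria |
| 338 | Bacteroides fragilis NCTC 9343 | Bacteroidetes | Bacteria |
| 339 | Candidatus Amoebophilus asiaticus 5a2 | Bacteroidetes | Bacteria |
| 340 | Salinibacter ruber DSM 13855 | Bacteroidetes | Bacteria |
| 341 | Rhodothermus marinus DSM 4252 | Bacteroidetes | Bacteria |
| 342 | Chitinophaga pinensis DSM 2588 | Bacteroidetes | Bacteria |
| 343 | Pedobacter heparinus DSM 2366 | Bacteroidetes | Bacteria |
| 344 | Candidatus Sulcia muelleri GWSS | Bacteroidetes | Bacteria |
| 345 | Zunongwangia profunda SM-A87 | Bacteroidetes | Bacteria |
| 346 | Gramella forsetii KT0803 | Bacteroidetes | Bacteria |
| 347 | Robiginitalea biformata HTCC2501 | Bacteroidetes | Bacteria |
| 348 | Flavobacteriaceae bacterium 3519-10 | Bacteroidetes | Bacteria |
| 349 | Capnocytophaga ochracea DSM 7271 | Bacteroidetes | Bacteria |
| 350 | Flavobacterium psychrophilum JIP02/86 | Bacteroidetes | Bacteria |
| 351 | Flavobacterium johnsoniae UW101 | Bacteroidetes | Bacteria |
| 352 | Blattabacterium sp. Bge | Bacteroidetes | Bacteria |
| 353 | Candidatus Protochlamydia amoebophila UWE25 | Chlamydiae | Bacteria |
| 354 | Chlamydophila pneumoniae TW-183 | Chlamydiae | Bacteria |
| 355 | Chlamydophila caviae GPIC | Chlamydiae | Bacteria |
| 356 | Chlamydophila felis Fe/C-56 | Chlamydiae | Bacteria |
| 357 | Chlamydophila abortus S26/3 | Chlamydiae | Bacteria |
| 358 | Chlamydia muridarum Nigg | Chlamydiae | Bacteria |
| 359 | Chlamydia trachomatis D/UW-3/CX | Chlamydiae | Bacteria |
| 360 | Pelodictyon phaeoclathratiforme BU-1 | Chlorobi | Bacteria |
| 361 | Chlorobium luteolum DSM 273 | Chlorobi | Bacteria |
| 362 | Chlorobium chlorochromatii CaD3 | Chlorobi | Bacteria |
| 363 | Chlorobium phaeobacteroides DSM 266 | Chlorobi | Bacteria |
| 364 | Chlorobium phaeovibrioides DSM 265 | Chlorobi | Bacteria |
| 365 | Chlorobium limicola DSM 245 | Chlorobi | Bacteria |
| 366 | Chlorobaculum parvum NCIB 8327 | Chlorobi | Bacteria |
| 367 | Chlorobium tepidum TLS | Chlorobi | Bacteria |
| 368 | Chloroherpeton thalassium ATCC 35110 | Chlorobi | Bacteria |
| 369 | Prosthecochloris aestuarii DSM 271 | Chlorobi | Bacteria |
| 370 | Dehalococcoides sp. CBDB1 | Chloroflexi | Bacteria |
| 371 | Dehalococcoides ethenogenes 195 | Chloroflexi | Bacteria |
| 372 | Thermomicrobium roseum DSM 5159 | Chloroflexi | Bacteria |
| 373 | Sphaerobacter thermophilus DSM 20745 | Chloroflexi | Bacteria |
| 374 | Herpetosiphon aurantiacus ATCC 23779 | Chloroflexi | Bacteria |
| 375 | Roseiflexus sp. RS-1 | Chloroflexi | Bacteria |
| 376 | Roseiflexus castenholzii DSM 13941 | Chloroflexi | Bacteria |
| 377 | Chloroflexus sp. Y-400-fl | Chloroflexi | Bacteria |
| 378 | Chloroflexus aggregans DSM 9485 | Chloroflexi | Bacteria |
| 379 | Chloroflexus aurantiacus J-10-fl | Chloroflexi | Bacteria |
| 380 | Gloeobacter violaceus PCC 7421 | Cyanobacteria | Bacteria |
| 381 | Acaryochloris marina MBIC11017 | Cyanobacteria | Bacteria |
| 382 | Prochlorococcus marinus MIT 9313 | Cyanobacteria | Bacteria |
| 383 | Nostoc punctiforme PCC 73102 | Cyanobacteria | Bacteria |
| 384 | Nostoc sp. PCC 7120 | Cyanobacteria | Bacteria |
| 385 | Anabaena variabilis ATCC 29413 | Cyanobacteria | Bacteria |
| 386 | Trichodesmium erythraeum IMS101 | Cyanobacteria | Bacteria |
| 387 | Thermosynechococcus elongatus BP-1 | Cyanobacteria | Bacteria |
| 388 | cyanobacterium UCYN-A | Cyanobacteria | Bacteria |
| 389 | Cyanothece sp. ATCC 51142 | Cyanobacteria | Bacteria |
| 390 | Synechocystis sp. PCC 6803 | Cyanobacteria | Bacteria |
| 391 | Synechococcus elongatus PCC 6301 | Cyanobacteria | Bacteria |
| 392 | Microcystis aeruginosa NIES-843 | Cyanobacteria | Bacteria |
| 393 | Denitrovibrio acetiphilus DSM 12809 | Deferribacteres | Bacteria |
| 394 | Deferribacter desulfuricans SSM1 | Deferribacteres | Bacteria |
| 395 | Deinococcus deserti VCD115 | Deinococcus-Thermus | Bacteria |
| 396 | Deinococcus geothermalis DSM 11300 | Deinococcus-Thermus | Bacteria |
| 397 | Deinococcus radiodurans R1 | Deinococcus-Thermus | Bacteria |
| 398 | Meiothermus ruber DSM 1279 | Deinococcus-Thermus | Bacteria |
| 399 | Thermus thermophilus HB27 | Deinococcus-Thermus | Bacteria |
| 400 | Dictyoglomus turgidum DSM 6724 | Dictyoglomi | Bacteria |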
164Drosophila melanogaster 59_525a (all transcripts)MetazoaEukaryota
165Drosophila erecta 1.3MetazoaEukaryota
166Drosophila ananassae 1.3MetazoaEukaryota
167Drosophila virilis 1.2MetazoaEukaryota
168Drosophila mojavensis 1.3MetazoaEukaryota
169Aedes aegypti 55 (all transcripts)MetazoaEukaryota
170Culex pipiens quinquefasciatusMetazoaEukaryota
171Anopheles gambiae 49_3j (all transcripts)MetazoaEukaryota
172Tribolium castaneum 3.0MetazoaEukaryota
173Pediculus humanus corporisMetazoaEukaryota
174Acyrthosiphon pisumMetazoaEukaryota
175Daphnia pulexMetazoaEukaryota
176Ixodes scapularisMetazoaEukaryota
177Lottia giganteaMetazoaEukaryota
178Pristionchus pacificusMetazoaEukaryota
179Meloidogyne incognitaMetazoaEukaryota
180Brugia malayi WS218MetazoaEukaryota
181Caenorhabditis japonicaMetazoaEukaryota
182Caenorhabditis brenneriMetazoaEukaryota
183Caenorhabditis remaneiMetazoaEukaryota
184Caenorhabditis elegans 59_210a (all transcripts)MetazoaEukaryota
185Caenorhabditis briggsae 2MetazoaEukaryota
186Schistosoma mansoniMetazoaEukaryota
187Nematostella vectensis 1.0MetazoaEukaryota
188Hydra magnipapillataMetazoaEukaryota
189Trichoplax adhaerensMetazoaEukaryota
190Giardia lamblia 2.3ProtistaEukaryota
191Trypanosoma cruzi strain CL BrenerProtistaEukaryota
192Trypanosoma bruceiProtistaEukaryota
193Leishmania mexicana 2.4ProtistaEukaryota
194Leishmania major strain FriedlinProtistaEukaryota
195Leishmania infantum JPCM5 2.4ProtistaEukaryota
196Leishmania braziliensis MHOM/BR/75/M2904 2.4ProtistaEukaryota
197Aureococcus anophagefferensProtistaEukaryota
198Phytophthora ramorum 1.1ProtistaEukaryota
199Phytophthora sojae 1.1ProtistaEukaryota
200Phytophthora infestans T30-4ProtistaEukaryota
201Phytophthora capsiciProtistaEukaryota
202Paramecium tetraureliaProtistaEukaryota
203Tetrahymena thermophila SB210 1ProtistaEukaryota
204Babesia bovis T2BoProtistaEukaryota
205Theileria parvaProtistaEukaryota
206Theileria annulataProtistaEukaryota
207Plasmodium falciparum 3D7ProtistaEukaryota
208Plasmodium vivax SaI-1 7.0ProtistaEukaryota
209Plasmodium knowlesi strain HProtistaEukaryota
210Plasmodium yoelii ssp. yoelii 1ProtistaEukaryota
211Plasmodium chabaudiProtistaEukaryota
212Plasmodium berghei ANKAProtistaEukaryota
213Cryptosporidium hominisProtistaEukaryota
214Cryptosporidium murisProtistaEukaryota
215Cryptosporidium parvum Iowa IIProtistaEukaryota
216Neospora caninum Nc-Liverpool 6.2ProtistaEukaryota
217Neospora caninumProtistaEukaryota
218Toxoplasma gondii ME49ProtistaEukaryota
219Naegleria gruberiProtistaEukaryota
220Guillardia thetaProtistaEukaryota
221Arabidopsis lyrataPlantaeEukaryota
222Arabidopsis thaliana 10 (all transcripts)PlantaeEukaryota
223Carica papayaPlantaeEukaryota
224Medicago truncatulaPlantaeEukaryota
225Glycine maxPlantaeEukaryota
226Cucumis sativusPlantaeEukaryota
227Populus trichocarpa 6.0PlantaeEukaryota
228Vitis viniferaPlantaeEukaryota
229Brachypodium distachyonPlantaeEukaryota
230Oryza sativa ssp. japonica 5.0PlantaeEukaryota
231Zea mays subsp. maysPlantaeEukaryota
232Sorghum bicolorPlantaeEukaryota
233Selaginella moellendorffiiPlantaeEukaryota
234Physcomitrella patens subsp. patensPlantaeEukaryota
235Ostreococcus sp. RCC809PlantaeEukaryota
236Ostreococcus lucimarinus CCE9901PlantaeEukaryota
237Ostreococcus tauriPlantaeEukaryota
238Micromonas sp. RCC299PlantaeEukaryota
239Micromonas pusilla CCMP1545PlantaeEukaryota
240Coccomyxa sp. C-169PlantaeEukaryota
241Chlorella sp. NC64APlantaeEukaryota
242Chlorella vulgarisPlantaeEukaryota
243Volvox carteri f. nagariensisPlantaeEukaryota
244Chlamydomonas reinhardtii 4.0PlantaeEukaryota
245Candidatus Koribacter versatilis Ellin345AcidobacteriaBacteria
246Candidatus Solibacter usitatus Ellin6076AcidobacteriaBacteria
247Acidobacterium capsulatum ATCC 51196AcidobacteriaBacteria
248Gardnerella vaginalis 409-05ActinobacteriaBacteria
249Bifidobacterium longum NCC2705ActinobacteriaBacteria
250Bifidobacterium animalis ssp. lactis AD011ActinobacteriaBacteria
251Bifidobacterium dentium Bd1ActinobacteriaBacteria
252Bifidobacterium adolescentis ATCC 15703ActinobacteriaBacteria
253Kineococcus radiotolerans SRS30216ActinobacteriaBacteria
254Catenulispora acidiphila DSM 44928ActinobacteriaBacteria
255Stackebrandtia nassauensis DSM 44728ActinobacteriaBacteria
256Acidothermus cellulolyticus 11BActinobacteriaBacteria
257Nakamurella multipartita DSM 44233ActinobacteriaBacteria
258Geodermatophilus obscurus DSM 43160ActinobacteriaBacteria
259Frankia sp. CcI3ActinobacteriaBacteria
260Frankia alni ACN14aActinobacteriaBacteria
261Thermobifida fusca YXActinobacteriaBacteria
262Thermomonospora curvata DSM 43183ActinobacteriaBacteria
263Streptosporangium roseum DSM 43021ActinobacteriaBacteria
264Streptomyces griseus ssp. griseus NBRC 13350ActinobacteriaBacteria
265Streptomyces avermitilis MA-4680ActinobacteriaBacteria
266Streptomyces scabiei 87.22ActinobacteriaBacteria
267Streptomyces coelicolorActinobacteriaBacteria
268Actinosynnema mirum DSM 43827ActinobacteriaBacteria
269Saccharomonospora viridis DSM 43017ActinobacteriaBacteria
270Saccharopolyspora erythraea NRRL 2338ActinobacteriaBacteria
271Kribbella flavida DSM 17836ActinobacteriaBacteria
272Nocardioides sp. JS614ActinobacteriaBacteria
273Propionibacterium acnes KPA171202ActinobacteriaBacteria
274Salinispora arenicola CNS-205ActinobacteriaBacteria
275Salinispora tropica CNB-440ActinobacteriaBacteria
276Gordonia bronchialis DSM 43247ActinobacteriaBacteria
277Rhodococcus jostii RHA1ActinobacteriaBacteria
278Rhodococcus opacus B4ActinobacteriaBacteria
279Rhodococcus erythropolis PR4ActinobacteriaBacteria
280Nocardia farcinica IFM 10152ActinobacteriaBacteria
281Mycobacterium abscessus ATCC 19977ActinobacteriaBacteria
282Mycobacterium sp. MCSActinobacteriaBacteria
283Mycobacterium avium ssp. paratuberculosis K-10ActinobacteriaBacteria
284Mycobacterium vanbaalenii PYR-1ActinobacteriaBacteria
285Mycobacterium tuberculosis H37RvActinobacteriaBacteria
286Mycobacterium bovis AF2122/97ActinobacteriaBacteria
287Mycobacterium ulcerans Agy99ActinobacteriaBacteria
288Mycobacterium gilvum PYR-GCKActinobacteriaBacteria
289Mycobacterium marinum MActinobacteriaBacteria
290Mycobacterium smegmatis MC2 155ActinobacteriaBacteria
291Mycobacterium leprae TNActinobacteriaBacteria
292Corynebacterium aurimucosum ATCC 700975ActinobacteriaBacteria
293Corynebacterium kroppenstedtii DSM 44385ActinobacteriaBacteria
294Corynebacterium efficiens YS-314ActinobacteriaBacteria
295Corynebacterium urealyticum DSM 7109ActinobacteriaBacteria
296Corynebacterium jeikeium K411ActinobacteriaBacteria
297Corynebacterium glutamicum ATCC 13032 KitasatoActinobacteriaBacteria
298Corynebacterium diphtheriae NCTC 13129ActinobacteriaBacteria
299Tropheryma whipplei TwistActinobacteriaBacteria
300Sanguibacter keddieii DSM 10542ActinobacteriaBacteria
301Kytococcus sedentarius DSM 20547ActinobacteriaBacteria
302Beutenbergia cavernae DSM 12333ActinobacteriaBacteria
303Leifsonia xyli ssp. xyli CTCB07ActinobacteriaBacteria
304Clavibacter michiganensis ssp. michiganensis NCPPB 382ActinobacteriaBacteria
305Jonesia denitrificans DSM 20603ActinobacteriaBacteria
306Brachybacterium faecium DSM 4810ActinobacteriaBacteria
307Xylanimonas cellulosilytica DSM 15894ActinobacteriaBacteria
308Kocuria rhizophila DC2201ActinobacteriaBacteria
309Rothia mucilaginosa DY-18ActinobacteriaBacteria
310Arthrobacter sp. FB24ActinobacteriaBacteria
311Arthrobacter chlorophenolicus A6ActinobacteriaBacteria
312Arthrobacter aurescens TC1ActinobacteriaBacteria
313Renibacterium salmoninarum ATCC 33209ActinobacteriaBacteria
314Micrococcus luteus NCTC 2665ActinobacteriaBacteria
315Cryptobacterium curtum DSM 15641ActinobacteriaBacteria
316Eggerthella lenta DSM 2243ActinobacteriaBacteria
317Slackia heliotrinireducens DSM 20476ActinobacteriaBacteria
318Atopobium parvulum DSM 20469ActinobacteriaBacteria
319Conexibacter woesei DSM 14684ActinobacteriaBacteria
320Rubrobacter xylanophilus DSM 9941ActinobacteriaBacteria
321Acidimicrobium ferrooxidans DSM 10331ActinobacteriaBacteria
322Sulfurihydrogenibium sp. YO3AOP1AquificaeBacteria
323Sulfurihydrogenibium azorense Az-Fu1AquificaeBacteria
324Persephonella marina EX-H1AquificaeBacteria
325Hydrogenobaculum sp. Y04AAS1AquificaeBacteria
326 | Thermocrinis albus DSM 14484 | Aquificae | Bacteria
327 | Aquifex aeolicus VF5 | Aquificae | Bacteria
328 | Hydrogenobacter thermophilus TK-6 | Aquificae | Bacteria
329 | Dyadobacter fermentans DSM 18053 | Bacteroidetes | Bacteria
330 | Cytophaga hutchinsonii ATCC 33406 | Bacteroidetes | Bacteria
331 | Spirosoma linguale DSM 74 | Bacteroidetes | Bacteria
332 | Candidatus Azobacteroides pseudotrichonymphae genomovar. | Bacteroidetes | Bacteria
333 | Prevotella ruminicola 23 | Bacteroidetes | Bacteria
334 | Parabacteroides distasonis ATCC 8503 | Bacteroidetes | Bacteria
335 | Porphyromonas gingivalis W83 | Bacteroidetes | Bacteria
336 | Bacteroides vulgatus ATCC 8482 | Bacteroidetes | Bacteria
337 | Bacteroides thetaiotaomicron VPI-5482 | Bacteroidetes | Bacteria
338 | Bacteroides fragilis NCTC 9343 | Bacteroidetes | Bacteria
339 | Candidatus Amoebophilus asiaticus 5a2 | Bacteroidetes | Bacteria
340 | Salinibacter ruber DSM 13855 | Bacteroidetes | Bacteria
341 | Rhodothermus marinus DSM 4252 | Bacteroidetes | Bacteria
342 | Chitinophaga pinensis DSM 2588 | Bacteroidetes | Bacteria
343 | Pedobacter heparinus DSM 2366 | Bacteroidetes | Bacteria
344 | Candidatus Sulcia muelleri GWSS | Bacteroidetes | Bacteria
345 | Zunongwangia profunda SM-A87 | Bacteroidetes | Bacteria
346 | Gramella forsetii KT0803 | Bacteroidetes | Bacteria
347 | Robiginitalea biformata HTCC2501 | Bacteroidetes | Bacteria
348 | Flavobacteriaceae bacterium 3519-10 | Bacteroidetes | Bacteria
349 | Capnocytophaga ochracea DSM 7271 | Bacteroidetes | Bacteria
350 | Flavobacterium psychrophilum JIP02/86 | Bacteroidetes | Bacteria
351 | Flavobacterium johnsoniae UW101 | Bacteroidetes | Bacteria
352 | Blattabacterium sp. Bge | Bacteroidetes | Bacteria
353 | Candidatus Protochlamydia amoebophila UWE25 | Chlamydiae | Bacteria
354 | Chlamydophila pneumoniae TW-183 | Chlamydiae | Bacteria
355 | Chlamydophila caviae GPIC | Chlamydiae | Bacteria
356 | Chlamydophila felis Fe/C-56 | Chlamydiae | Bacteria
357 | Chlamydophila abortus S26/3 | Chlamydiae | Bacteria
358 | Chlamydia muridarum Nigg | Chlamydiae | Bacteria
359 | Chlamydia trachomatis D/UW-3/CX | Chlamydiae | Bacteria
360 | Pelodictyon phaeoclathratiforme BU-1 | Chlorobi | Bacteria
361 | Chlorobium luteolum DSM 273 | Chlorobi | Bacteria
362 | Chlorobium chlorochromatii CaD3 | Chlorobi | Bacteria
363 | Chlorobium phaeobacteroides DSM 266 | Chlorobi | Bacteria
364 | Chlorobium phaeovibrioides DSM 265 | Chlorobi | Bacteria
365 | Chlorobium limicola DSM 245 | Chlorobi | Bacteria
366 | Chlorobaculum parvum NCIB 8327 | Chlorobi | Bacteria
367 | Chlorobium tepidum TLS | Chlorobi | Bacteria
368 | Chloroherpeton thalassium ATCC 35110 | Chlorobi | Bacteria
369 | Prosthecochloris aestuarii DSM 271 | Chlorobi | Bacteria
370 | Dehalococcoides sp. CBDB1 | Chloroflexi | Bacteria
371 | Dehalococcoides ethenogenes 195 | Chloroflexi | Bacteria
372 | Thermomicrobium roseum DSM 5159 | Chloroflexi | Bacteria
373 | Sphaerobacter thermophilus DSM 20745 | Chloroflexi | Bacteria
374 | Herpetosiphon aurantiacus ATCC 23779 | Chloroflexi | Bacteria
375 | Roseiflexus sp. RS-1 | Chloroflexi | Bacteria
376 | Roseiflexus castenholzii DSM 13941 | Chloroflexi | Bacteria
377 | Chloroflexus sp. Y-400-fl | Chloroflexi | Bacteria
378 | Chloroflexus aggregans DSM 9485 | Chloroflexi | Bacteria
379 | Chloroflexus aurantiacus J-10-fl | Chloroflexi | Bacteria
380 | Gloeobacter violaceus PCC 7421 | Cyanobacteria | Bacteria
381 | Acaryochloris marina MBIC11017 | Cyanobacteria | Bacteria
382 | Prochlorococcus marinus MIT 9313 | Cyanobacteria | Bacteria
383 | Nostoc punctiforme PCC 73102 | Cyanobacteria | Bacteria
384 | Nostoc sp. PCC 7120 | Cyanobacteria | Bacteria
385 | Anabaena variabilis ATCC 29413 | Cyanobacteria | Bacteria
386 | Trichodesmium erythraeum IMS101 | Cyanobacteria | Bacteria
387 | Thermosynechococcus elongatus BP-1 | Cyanobacteria | Bacteria
388 | cyanobacterium UCYN-A | Cyanobacteria | Bacteria
389 | Cyanothece sp. ATCC 51142 | Cyanobacteria | Bacteria
390 | Synechocystis sp. PCC 6803 | Cyanobacteria | Bacteria
391 | Synechococcus elongatus PCC 6301 | Cyanobacteria | Bacteria
392 | Microcystis aeruginosa NIES-843 | Cyanobacteria | Bacteria
393 | Denitrovibrio acetiphilus DSM 12809 | Deferribacteres | Bacteria
394 | Deferribacter desulfuricans SSM1 | Deferribacteres | Bacteria
395 | Deinococcus deserti VCD115 | Deinococcus-Thermus | Bacteria
396 | Deinococcus geothermalis DSM 11300 | Deinococcus-Thermus | Bacteria
397 | Deinococcus radiodurans R1 | Deinococcus-Thermus | Bacteria
398 | Meiothermus ruber DSM 1279 | Deinococcus-Thermus | Bacteria
399 | Thermus thermophilus HB27 | Deinococcus-Thermus | Bacteria
400 | Dictyoglomus turgidum DSM 6724 | Dictyoglomi | Bacteria
401 | Dictyoglomus thermophilum H-6-12 | Dictyoglomi | Bacteria
402 | Elusimicrobium minutum Pei191 | Elusimicrobia | Bacteria
403 | uncultured Termite group 1 bacterium phylotype Rs-D17 | Elusimicrobia | Bacteria
404 | Fibrobacter succinogenes ssp. succinogenes S85 | Fibrobacteres | Bacteria
405 | Acidaminococcus fermentans DSM 20731 | Firmicutes | Bacteria
406 | Veillonella parvula DSM 2008 | Firmicutes | Bacteria
407 | Natranaerobius thermophilus JW/NM-WN-LF | Firmicutes | Bacteria
408 | Symbiobacterium thermophilum IAM 14863 | Firmicutes | Bacteria
409 | Anaerococcus prevotii DSM 20548 | Firmicutes | Bacteria
410 | Finegoldia magna ATCC 29328 | Firmicutes | Bacteria
411 | Clostridiales genomosp. BVAB3 UPII9-5 | Firmicutes | Bacteria
412 | Candidatus Desulforudis audaxviator MP104C | Firmicutes | Bacteria
413 | Pelotomaculum thermopropionicum SI | Firmicutes | Bacteria
414 | Desulfitobacterium hafniense Y51 | Firmicutes | Bacteria
415 | Desulfotomaculum reducens MI-1 | Firmicutes | Bacteria
416 | Desulfotomaculum acetoxidans DSM 771 | Firmicutes | Bacteria
417 | Eubacterium rectale ATCC 33656 | Firmicutes | Bacteria
418 | Eubacterium eligens ATCC 27750 | Firmicutes | Bacteria
419 | Syntrophomonas wolfei ssp. wolfei Goettingen | Firmicutes | Bacteria
420 | Heliobacterium modesticaldum Ice1 | Firmicutes | Bacteria
421 | Alkaliphilus oremlandii OhILAs | Firmicutes | Bacteria
422 | Alkaliphilus metalliredigens QYMF | Firmicutes | Bacteria
423 | Clostridium phytofermentans ISDg | Firmicutes | Bacteria
424 | Clostridium novyi NT | Firmicutes | Bacteria
425 | Clostridium kluyveri DSM 555 | Firmicutes | Bacteria
426 | Clostridium cellulolyticum H10 | Firmicutes | Bacteria
427 | Clostridium beijerinckii NCIMB 8052 | Firmicutes | Bacteria
428 | Clostridium thermocellum ATCC 27405 | Firmicutes | Bacteria
429 | Clostridium tetani E88 | Firmicutes | Bacteria
430 | Clostridium perfringens 13 | Firmicutes | Bacteria
431 | Clostridium difficile 630 | Firmicutes | Bacteria
432 | Clostridium botulinum A ATCC 3502 | Firmicutes | Bacteria
433 | Clostridium acetobutylicum ATCC 824 | Firmicutes | Bacteria
434 | Caldicellulosiruptor saccharolyticus DSM 8903 | Firmicutes | Bacteria
435 | Anaerocellum thermophilum DSM 6725 | Firmicutes | Bacteria
436 | Coprothermobacter proteolyticus DSM 5265 | Firmicutes | Bacteria
437 | Thermoanaerobacter tengcongensis MB4 | Firmicutes | Bacteria
438 | Carboxydothermus hydrogenoformans Z-2901 | Firmicutes | Bacteria
439 | Moorella thermoacetica ATCC 39073 | Firmicutes | Bacteria
440 | Ammonifex degensii KC4 | Firmicutes | Bacteria
441 | Thermoanaerobacter pseudethanolicus ATCC 33223 | Firmicutes | Bacteria
442 | Thermoanaerobacter sp. X514 | Firmicutes | Bacteria
443 | Thermoanaerobacter italicus Ab9 | Firmicutes | Bacteria
444 | Halothermothrix orenii H 168 | Firmicutes | Bacteria
445 | Enterococcus faecalis V583 | Firmicutes | Bacteria
446 | Oenococcus oeni PSU-1 | Firmicutes | Bacteria
447 | Leuconostoc citreum KM20 | Firmicutes | Bacteria
448 | Leuconostoc mesenteroides ssp. mesenteroides ATCC 8293 | Firmicutes | Bacteria
449 | Lactobacillus casei ATCC 334 | Firmicutes | Bacteria
450 | Lactobacillus crispatus ST1 | Firmicutes | Bacteria
451 | Lactobacillus rhamnosus GG | Firmicutes | Bacteria
452 | Lactobacillus johnsonii NCC 533 | Firmicutes | Bacteria
453 | Lactobacillus salivarius UCC118 | Firmicutes | Bacteria
454 | Lactobacillus fermentum IFO 3956 | Firmicutes | Bacteria
455 | Lactobacillus sakei ssp. sakei 23K | Firmicutes | Bacteria
456 | Lactobacillus reuteri DSM 20016 | Firmicutes | Bacteria
457 | Lactobacillus gasseri ATCC 33323 | Firmicutes | Bacteria
458 | Lactobacillus plantarum WCFS1 | Firmicutes | Bacteria
459 | Lactobacillus helveticus DPC 4571 | Firmicutes | Bacteria
460 | Lactobacillus delbrueckii ssp. bulgaricus ATCC 11842 | Firmicutes | Bacteria
461 | Lactobacillus brevis ATCC 367 | Firmicutes | Bacteria
462 | Lactobacillus acidophilus NCFM | Firmicutes | Bacteria
463 | Pediococcus pentosaceus ATCC 25745 | Firmicutes | Bacteria
464 | Lactococcus lactis ssp. lactis Il1403 | Firmicutes | Bacteria
465 | Streptococcus gallolyticus UCN34 | Firmicutes | Bacteria
466 | Streptococcus equi ssp. zooepidemicus MGCS10565 | Firmicutes | Bacteria
467 | Streptococcus dysgalactiae ssp. equisimilis GGS_124 | Firmicutes | Bacteria
468 | Streptococcus mitis B6 | Firmicutes | Bacteria
469 | Streptococcus uberis 0140J | Firmicutes | Bacteria
470 | Streptococcus pyogenes M1 GAS | Firmicutes | Bacteria
471 | Streptococcus pneumoniae TIGR4 | Firmicutes | Bacteria
472 | Streptococcus agalactiae NEM316 | Firmicutes | Bacteria
473 | Streptococcus mutans UA159 | Firmicutes | Bacteria
474 | Streptococcus thermophilus LMG 18311 | Firmicutes | Bacteria
475 | Streptococcus suis 05ZYH33 | Firmicutes | Bacteria
476 | Streptococcus sanguinis SK36 | Firmicutes | Bacteria
477 | Streptococcus gordonii Challis subCH1 | Firmicutes | Bacteria
478 | Exiguobacterium sp. AT1b | Firmicutes | Bacteria
479 | Exiguobacterium sibiricum 255-15 | Firmicutes | Bacteria
480 | Bacillus tusciae DSM 2912 | Firmicutes | Bacteria
481 | Alicyclobacillus acidocaldarius ssp. acidocaldarius DSM 446 | Firmicutes | Bacteria
482 | Brevibacillus brevis NBRC 100599 | Firmicutes | Bacteria
483 | Paenibacillus sp. JDR-2 | Firmicutes | Bacteria
484 | Listeria welshimeri ser. 6b SLCC5334 | Firmicutes | Bacteria
485 | Listeria innocua Clip11262 | Firmicutes | Bacteria
486 | Listeria seeligeri ser. 1/2b SLCC3954 | Firmicutes | Bacteria
487 | Listeria monocytogenes EGD-e | Firmicutes | Bacteria
488 | Lysinibacillus sphaericus C3-41 | Firmicutes | Bacteria
489 | Oceanobacillus iheyensis HTE831 | Firmicutes | Bacteria
490 | Anoxybacillus flavithermus WK1 | Firmicutes | Bacteria
491 | Geobacillus sp. WCH70 | Firmicutes | Bacteria
492 | Geobacillus thermodenitrificans NG80-2 | Firmicutes | Bacteria
493 | Geobacillus kaustophilus HTA426 | Firmicutes | Bacteria
494 | Bacillus subtilis ssp. subtilis 168 | Firmicutes | Bacteria
495 | Bacillus licheniformis ATCC 14580 | Firmicutes | Bacteria
496 | Bacillus amyloliquefaciens FZB42 | Firmicutes | Bacteria
497 | Bacillus halodurans C-125 | Firmicutes | Bacteria
498 | Bacillus weihenstephanensis KBAB4 | Firmicutes | Bacteria
499 | Bacillus thuringiensis ser. konkukian 97-27 | Firmicutes | Bacteria
500 | Bacillus cereus ATCC 14579 | Firmicutes | Bacteria
501 | Bacillus anthracis Ames Ancestor | Firmicutes | Bacteria
502 | Bacillus pseudofirmus OF4 | Firmicutes | Bacteria
503 | Bacillus clausii KSM-K16 | Firmicutes | Bacteria
504 | Bacillus pumilus SAFR-032 | Firmicutes | Bacteria
505 | Bacillus megaterium QM B1551 | Firmicutes | Bacteria
506 | Macrococcus caseolyticus JCSC5402 | Firmicutes | Bacteria
507 | Staphylococcus saprophyticus ssp. saprophyticus ATCC 15305 | Firmicutes | Bacteria
508 | Staphylococcus lugdunensis HKU09-01 | Firmicutes | Bacteria
509 | Staphylococcus haemolyticus JCSC1435 | Firmicutes | Bacteria
510 | Staphylococcus epidermidis RP62A | Firmicutes | Bacteria
511 | Staphylococcus carnosus ssp. carnosus TM300 | Firmicutes | Bacteria
512 | Staphylococcus aureus ssp. aureus NCTC 8325 | Firmicutes | Bacteria
513 | Streptobacillus moniliformis DSM 12112 | Fusobacteria | Bacteria
514 | Sebaldella termitidis ATCC 33386 | Fusobacteria | Bacteria
515 | Leptotrichia buccalis C-1013-b | Fusobacteria | Bacteria
516 | Fusobacterium nucleatum ssp. nucleatum ATCC 25586 | Fusobacteria | Bacteria
517 | Gemmatimonas aurantiaca T-27 | Gemmatimonadetes | Bacteria
518 | Thermodesulfovibrio yellowstonii DSM 11347 | Nitrospirae | Bacteria
519 | Rhodopirellula baltica SH 1 | Planctomycetes | Bacteria
520 | Pirellula staleyi DSM 6068 | Planctomycetes | Bacteria
521 | Nautilia profundicola AmH | Proteobacteria | Bacteria
522 | Sulfurospirillum deleyianum DSM 6946 | Proteobacteria | Bacteria
523 | Arcobacter butzleri RM4018 | Proteobacteria | Bacteria
524 | Campylobacter hominis ATCC BAA-381 | Proteobacteria | Bacteria
525 | Campylobacter lari RM2100 | Proteobacteria | Bacteria
526 | Campylobacter curvus 525.92 | Proteobacteria | Bacteria
527 | Campylobacter concisus 13826 | Proteobacteria | Bacteria
528 | Campylobacter jejuni ssp. jejuni NCTC 11168 | Proteobacteria | Bacteria
529 | Campylobacter fetus ssp. fetus 82-40 | Proteobacteria | Bacteria
530 | Sulfurimonas denitrificans DSM 1251 | Proteobacteria | Bacteria
531 | Wolinella succinogenes DSM 1740 | Proteobacteria | Bacteria
532 | Helicobacter hepaticus ATCC 51449 | Proteobacteria | Bacteria
533 | Helicobacter mustelae 12198 | Proteobacteria | Bacteria
534 | Helicobacter acinonychis Sheeba | Proteobacteria | Bacteria
535 | Helicobacter pylori 26695 | Proteobacteria | Bacteria
536 | Nitratiruptor sp. SB155-2 | Proteobacteria | Bacteria
537 | Sulfurovum sp. NBC37-1 | Proteobacteria | Bacteria
538 | Bdellovibrio bacteriovorus HD100 | Proteobacteria | Bacteria
539 | Syntrophus aciditrophicus SB | Proteobacteria | Bacteria
540 | Syntrophobacter fumaroxidans MPOB | Proteobacteria | Bacteria
541 | Desulfotalea psychrophila LSv54 | Proteobacteria | Bacteria
542 | Desulfatibacillum alkenivorans AK-01 | Proteobacteria | Bacteria
543 | Desulfobacterium autotrophicum HRM2 | Proteobacteria | Bacteria
544 | Desulfococcus oleovorans Hxd3 | Proteobacteria | Bacteria
545 | Desulfohalobium retbaense DSM 5692 | Proteobacteria | Bacteria
546 | Desulfomicrobium baculatum DSM 4028 | Proteobacteria | Bacteria
547 | Lawsonia intracellularis PHE/MN1-00 | Proteobacteria | Bacteria
548 | Desulfovibrio magneticus RS-1 | Proteobacteria | Bacteria
549 | Desulfovibrio vulgaris Hildenborough | Proteobacteria | Bacteria
550 | Desulfovibrio salexigens DSM 2638 | Proteobacteria | Bacteria
551 | Desulfovibrio desulfuricans ssp. desulfuricans G20 | Proteobacteria | Bacteria
552 | Pelobacter propionicus DSM 2379 | Proteobacteria | Bacteria
553 | Pelobacter carbinolicus DSM 2380 | Proteobacteria | Bacteria
554 | Geobacter uraniireducens Rf4 | Proteobacteria | Bacteria
555 | Geobacter sp. FRC-32 | Proteobacteria | Bacteria
556 | Geobacter lovleyi SZ | Proteobacteria | Bacteria
557 | Geobacter bemidjiensis Bem | Proteobacteria | Bacteria
558 | Geobacter sulfurreducens PCA | Proteobacteria | Bacteria
559 | Geobacter metallireducens GS-15 | Proteobacteria | Bacteria
560 | Haliangium ochraceum DSM 14365 | Proteobacteria | Bacteria
561 | Sorangium cellulosum So ce 56 | Proteobacteria | Bacteria
562 | Anaeromyxobacter sp. Fw109-5 | Proteobacteria | Bacteria
563 | Anaeromyxobacter dehalogenans 2CP-C | Proteobacteria | Bacteria
564 | Myxococcus xanthus DK 1622 | Proteobacteria | Bacteria
565 | Magnetococcus sp. MC-1 | Proteobacteria | Bacteria
566 | Sideroxydans lithotrophicus ES-1 | Proteobacteria | Bacteria
567 | Aromatoleum aromaticum EbN1 | Proteobacteria | Bacteria
568 | Dechloromonas aromatica RCB | Proteobacteria | Bacteria
569 | Thauera sp. MZ1T | Proteobacteria | Bacteria
570 | Laribacter hongkongensis HLHK9 | Proteobacteria | Bacteria
571 | Chromobacterium violaceum ATCC 12472 | Proteobacteria | Bacteria
572 | Neisseria meningitidis Z2491 | Proteobacteria | Bacteria
573 | Neisseria gonorrhoeae FA 1090 | Proteobacteria | Bacteria
574 | Methylotenera mobilis JLW8 | Proteobacteria | Bacteria
575 | Methylovorus sp. SIP3-4 | Proteobacteria | Bacteria
576 | Methylobacillus flagellatus KT | Proteobacteria | Bacteria
577 | Thiobacillus denitrificans ATCC 25259 | Proteobacteria | Bacteria
578 | Candidatus Accumulibacter phosphatis clade IIA UW-1 | Proteobacteria | Bacteria
579 | Methylibium petroleiphilum PM1 | Proteobacteria | Bacteria
580 | Leptothrix cholodnii SP-6 | Proteobacteria | Bacteria
581 | Ralstonia eutropha JMP134 | Proteobacteria | Bacteria
582 | Cupriavidus taiwanensis | Proteobacteria | Bacteria
583 | Cupriavidus metallidurans CH34 | Proteobacteria | Bacteria
584 | Ralstonia pickettii 12J | Proteobacteria | Bacteria
585 | Ralstonia solanacearum GMI1000 | Proteobacteria | Bacteria
586 | Polynucleobacter necessarius ssp. asymbioticus QLW-P1DMWA-1 | Proteobacteria | Bacteria
587 | Burkholderia phytofirmans PsJN | Proteobacteria | Bacteria
588 | Burkholderia phymatum STM815 | Proteobacteria | Bacteria
589 | Burkholderia thailandensis E264 | Proteobacteria | Bacteria
590 | Burkholderia pseudomallei K96243 | Proteobacteria | Bacteria
591 | Burkholderia mallei ATCC 23344 | Proteobacteria | Bacteria
592 | Burkholderia sp. 383 | Proteobacteria | Bacteria
593 | Burkholderia ambifaria AMMD | Proteobacteria | Bacteria
594 | Burkholderia cenocepacia AU 1054 | Proteobacteria | Bacteria
595 | Burkholderia multivorans ATCC 17616 | Proteobacteria | Bacteria
596 | Burkholderia vietnamiensis G4 | Proteobacteria | Bacteria
597 | Burkholderia xenovorans LB400 | Proteobacteria | Bacteria
598 | Burkholderia glumae BGR1 | Proteobacteria | Bacteria
599 | Rhodoferax ferrireducens T118 | Proteobacteria | Bacteria
600 | Verminephrobacter eiseniae EF01-2 | Proteobacteria | Bacteria
601 | Delftia acidovorans SPH-1 | Proteobacteria | Bacteria
602 | Polaromonas sp. JS666 | Proteobacteria | Bacteria
603 | Polaromonas naphthalenivorans CJ2 | Proteobacteria | Bacteria
604 | Variovorax paradoxus S110 | Proteobacteria | Bacteria
605 | Acidovorax ebreus TPSY | Proteobacteria | Bacteria
606 | Acidovorax sp. JS42 | Proteobacteria | Bacteria
607 | Acidovorax citrulli AAC00-1 | Proteobacteria | Bacteria
608 | Herminiimonas arsenicoxydans | Proteobacteria | Bacteria
609 | Janthinobacterium sp. Marseille | Proteobacteria | Bacteria
610 | Bordetella petrii DSM 12804 | Proteobacteria | Bacteria
611 | Bordetella avium 197N | Proteobacteria | Bacteria
612 | Bordetella pertussis Tohama I | Proteobacteria | Bacteria
613 | Bordetella parapertussis 12822 | Proteobacteria | Bacteria
614 | Bordetella bronchiseptica RB50 | Proteobacteria | Bacteria
615 | Nitrosospira multiformis ATCC 25196 | Proteobacteria | Bacteria
616 | Nitrosomonas eutropha C91 | Proteobacteria | Bacteria
617 | Nitrosomonas europaea ATCC 19718 | Proteobacteria | Bacteria
618 | Caulobacter sp. K31 | Proteobacteria | Bacteria
619 | Caulobacter crescentus CB15 | Proteobacteria | Bacteria
620 | Caulobacter segnis ATCC 21756 | Proteobacteria | Bacteria
621 | Phenylobacterium zucineum HLK1 | Proteobacteria | Bacteria
622 | Erythrobacter litoralis HTCC2594 | Proteobacteria | Bacteria
623 | Sphingopyxis alaskensis RB2256 | Proteobacteria | Bacteria
624 | Novosphingobium aromaticivorans DSM 12444 | Proteobacteria | Bacteria
625 | Sphingobium japonicum UT26S | Proteobacteria | Bacteria
626 | Sphingomonas wittichii RW1 | Proteobacteria | Bacteria
627 | Zymomonas mobilis ssp. mobilis ZM4 | Proteobacteria | Bacteria
628 | Maricaulis maris MCS10 | Proteobacteria | Bacteria
629 | Hirschia baltica ATCC 49814 | Proteobacteria | Bacteria
630 | Hyphomonas neptunium ATCC 15444 | Proteobacteria | Bacteria
631 | Dinoroseobacter shibae DFL 12 | Proteobacteria | Bacteria
632 | Jannaschia sp. CCS1 | Proteobacteria | Bacteria
633 | Ruegeria sp. TM1040 | Proteobacteria | Bacteria
634 | Ruegeria pomeroyi DSS-3 | Proteobacteria | Bacteria
635 | Roseobacter denitrificans OCh 114 | Proteobacteria | Bacteria
636 | Rhodobacter sphaeroides 2.4.1 | Proteobacteria | Bacteria
637 | Rhodobacter capsulatus SB 1003 | Proteobacteria | Bacteria
638 | Paracoccus denitrificans PD1222 | Proteobacteria | Bacteria
639 | Magnetospirillum magneticum AMB-1 | Proteobacteria | Bacteria
640 | Rhodospirillum centenum SW | Proteobacteria | Bacteria
641 | Rhodospirillum rubrum ATCC 11170 | Proteobacteria | Bacteria
642 | Azospirillum sp. B510 | Proteobacteria | Bacteria
643 | Granulibacter bethesdensis CGDNIH1 | Proteobacteria | Bacteria
644 | Gluconacetobacter diazotrophicus PAl 5 | Proteobacteria | Bacteria
645 | Gluconobacter oxydans 621H | Proteobacteria | Bacteria
646 | Acetobacter pasteurianus IFO 3283-01 | Proteobacteria | Bacteria
647 | Candidatus Puniceispirillum marinum IMCC1322 | Proteobacteria | Bacteria
648 | Candidatus Pelagibacter ubique HTCC1062 | Proteobacteria | Bacteria
649 | Neorickettsia sennetsu Miyayama | Proteobacteria | Bacteria
650 | Neorickettsia risticii Illinois | Proteobacteria | Bacteria
651 | Wolbachia endosymbiont of Culex_quinquefasciatus Pel | Proteobacteria | Bacteria
652 | Wolbachia endosymbiont of Drosophila melanogaster | Proteobacteria | Bacteria
653 | Wolbachia endosymbiont TRS of Brugia malayi | Proteobacteria | Bacteria
654 | Wolbachia sp. wRi | Proteobacteria | Bacteria
655 | Ehrlichia chaffeensis Arkansas | Proteobacteria | Bacteria
656 | Ehrlichia canis Jake | Proteobacteria | Bacteria
657 | Ehrlichia ruminantium Welgevonden | Proteobacteria | Bacteria
658 | Anaplasma phagocytophilum HZ | Proteobacteria | Bacteria
659 | Anaplasma marginale St. Maries | Proteobacteria | Bacteria
660 | Anaplasma centrale Israel | Proteobacteria | Bacteria
661 | Orientia tsutsugamushi Boryong | Proteobacteria | Bacteria
662 | Rickettsia bellii RML369-C | Proteobacteria | Bacteria
663 | Rickettsia canadensis McKiel | Proteobacteria | Bacteria
664 | Rickettsia typhi Wilmington | Proteobacteria | Bacteria
665 | Rickettsia prowazekii Madrid E | Proteobacteria | Bacteria
666 | Rickettsia peacockii Rustic | Proteobacteria | Bacteria
667 | Rickettsia felis URRWXCal2 | Proteobacteria | Bacteria
668 | Rickettsia massiliae MTU5 | Proteobacteria | Bacteria
669 | Rickettsia africae ESF-5 | Proteobacteria | Bacteria
670 | Rickettsia akari Hartford | Proteobacteria | Bacteria
671 | Rickettsia rickettsii Sheila Smith | Proteobacteria | Bacteria
672 | Rickettsia conorii Malish 7 | Proteobacteria | Bacteria
673 | Xanthobacter autotrophicus Py2 | Proteobacteria | Bacteria
674 | Azorhizobium caulinodans ORS 571 | Proteobacteria | Bacteria
675 | Methylobacterium chloromethanicum CM4 | Proteobacteria | Bacteria
676 | Methylobacterium extorquens PA1 | Proteobacteria | Bacteria
677 | Methylobacterium sp. 4-46 | Proteobacteria | Bacteria
678 | Methylobacterium populi BJ001 | Proteobacteria | Bacteria
679 | Methylobacterium nodulans ORS 2060 | Proteobacteria | Bacteria
680 | Methylobacterium radiotolerans JCM 2831 | Proteobacteria | Bacteria
681 | Candidatus Hodgkinia cicadicola Dsem | Proteobacteria | Bacteria
682 | Ochrobactrum anthropi ATCC 49188 | Proteobacteria | Bacteria
683 | Brucella microti CCM 4915 | Proteobacteria | Bacteria
684 | Brucella canis ATCC 23365 | Proteobacteria | Bacteria
685 | Brucella suis 1330 | Proteobacteria | Bacteria
686 | Brucella melitensis bv. 1 16M | Proteobacteria | Bacteria
687 | Brucella ovis ATCC 25840 | Proteobacteria | Bacteria
688 | Brucella abortus bv. 1 9-941 | Proteobacteria | Bacteria
689 | Rhizobium sp. NGR234 | Proteobacteria | Bacteria
690 | Sinorhizobium medicae WSM419 | Proteobacteria | Bacteria
691 | Sinorhizobium meliloti 1021 | Proteobacteria | Bacteria
692 | Rhizobium etli CFN 42 | Proteobacteria | Bacteria
693 | Rhizobium leguminosarum bv. viciae 3841 | Proteobacteria | Bacteria
694 | Agrobacterium vitis S4 | Proteobacteria | Bacteria
695 | Agrobacterium radiobacter K84 | Proteobacteria | Bacteria
696 | Agrobacterium tumefaciens C58 | Proteobacteria | Bacteria
697 | Candidatus Liberibacter asiaticus psy62 | Proteobacteria | Bacteria
698 | Chelativorans sp. BNC1 | Proteobacteria | Bacteria
699 | Parvibaculum lavamentivorans DS-1 | Proteobacteria | Bacteria
700 | Mesorhizobium loti MAFF303099 | Proteobacteria | Bacteria
701 | Methylocella silvestris BL2 | Proteobacteria | Bacteria
702 | Beijerinckia indica ssp. indica ATCC 9039 | Proteobacteria | Bacteria
703 | Oligotropha carboxidovorans OM5 | Proteobacteria | Bacteria
704 | Rhodopseudomonas palustris CGA009 | Proteobacteria | Bacteria
705 | Nitrobacter winogradskyi Nb-255 | Proteobacteria | Bacteria
706 | Nitrobacter hamburgensis X14 | Proteobacteria | Bacteria
707 | Bradyrhizobium sp. ORS278 | Proteobacteria | Bacteria
708 | Bradyrhizobium japonicum USDA 110 | Proteobacteria | Bacteria
709 | Bartonella tribocorum CIP 105476 | Proteobacteria | Bacteria
710 | Bartonella henselae Houston-1 | Proteobacteria | Bacteria
711 | Bartonella grahamii as4aup | Proteobacteria | Bacteria
712 | Bartonella quintana Toulouse | Proteobacteria | Bacteria
713 | Bartonella bacilliformis KC583 | Proteobacteria | Bacteria
714 | Acidithiobacillus ferrooxidans ATCC 23270 | Proteobacteria | Bacteria
715 | Mannheimia succiniciproducens MBEL55E | Proteobacteria | Bacteria
716 | Aggregatibacter aphrophilus NJ8700 | Proteobacteria | Bacteria
717 | Aggregatibacter actinomycetemcomitans D11S-1 | Proteobacteria | Bacteria
718 | Haemophilus somnus 129PT | Proteobacteria | Bacteria
719 | Pasteurella multocida ssp. multocida Pm70 | Proteobacteria | Bacteria
720 | Haemophilus parasuis SH0165 | Proteobacteria | Bacteria
721 | Haemophilus ducreyi 35000HP | Proteobacteria | Bacteria
722 | Haemophilus influenzae Rd KW20 | Proteobacteria | Bacteria
723 | Actinobacillus succinogenes 130Z | Proteobacteria | Bacteria
724 | Actinobacillus pleuropneumoniae L20 | Proteobacteria | Bacteria
725 | Tolumonas auensis DSM 9187 | Proteobacteria | Bacteria
726 | Aeromonas salmonicida ssp. salmonicida A449 | Proteobacteria | Bacteria
727 | Aeromonas hydrophila ssp. hydrophila ATCC 7966 | Proteobacteria | Bacteria
728 | Aliivibrio salmonicida LFI1238 | Proteobacteria | Bacteria
729 | Vibrio fischeri ES114 | Proteobacteria | Bacteria
730 | Vibrio parahaemolyticus RIMD 2210633 | Proteobacteria | Bacteria
731 | Vibrio harveyi ATCC BAA-1116 | Proteobacteria | Bacteria
732 | Vibrio sp. Ex25 | Proteobacteria | Bacteria
733 | Vibrio splendidus LGP32 | Proteobacteria | Bacteria
734 | Vibrio vulnificus YJ016 | Proteobacteria | Bacteria
735 | Vibrio cholerae O1 biov. El Tor N16961 | Proteobacteria | Bacteria
736 | Photobacterium profundum SS9 | Proteobacteria | Bacteria
737 | Psychromonas ingrahamii 37 | Proteobacteria | Bacteria
738 | Idiomarina loihiensis L2TR | Proteobacteria | Bacteria
739 | Shewanella piezotolerans WP3 | Proteobacteria | Bacteria
740 | Shewanella loihica PV-4 | Proteobacteria | Bacteria
741 | Shewanella halifaxensis HAW-EB4 | Proteobacteria | Bacteria
742 | Shewanella sediminis HAW-EB3 | Proteobacteria | Bacteria
743 | Shewanella denitrificans OS217 | Proteobacteria | Bacteria
744 | Shewanella pealeana ATCC 700345 | Proteobacteria | Bacteria
745 | Shewanella oneidensis MR-1 | Proteobacteria | Bacteria
746 | Shewanella baltica OS155 | Proteobacteria | Bacteria
747 | Shewanella woodyi ATCC 51908 | Proteobacteria | Bacteria
748 | Shewanella sp. MR-7 | Proteobacteria | Bacteria
749 | Shewanella amazonensis SB2B | Proteobacteria | Bacteria
750 | Shewanella violacea DSS12 | Proteobacteria | Bacteria
751 | Shewanella frigidimarina NCIMB 400 | Proteobacteria | Bacteria
752 | Shewanella putrefaciens CN-32 | Proteobacteria | Bacteria
753 | Colwellia psychrerythraea 34H | Proteobacteria | Bacteria
754 | Pseudoalteromonas atlantica T6c | Proteobacteria | Bacteria
755 | Pseudoalteromonas haloplanktis TAC125 | Proteobacteria | Bacteria
756 | Teredinibacter turnerae T7901 | Proteobacteria | Bacteria
757 | Saccharophagus degradans 2-40 | Proteobacteria | Bacteria
758 | Marinobacter aquaeolei VT8 | Proteobacteria | Bacteria
759 | Alteromonas macleodii Deep ecotype | Proteobacteria | Bacteria
760 | Hahella chejuensis KCTC 2396 | Proteobacteria | Bacteria
761 | Kangiella koreensis DSM 16069 | Proteobacteria | Bacteria
762 | Alcanivorax borkumensis SK2 | Proteobacteria | Bacteria
763 | Marinomonas sp. MWYL1 | Proteobacteria | Bacteria
764 | Chromohalobacter salexigens DSM 3043 | Proteobacteria | Bacteria
765 | Methylococcus capsulatus Bath | Proteobacteria | Bacteria
766 | Dichelobacter nodosus VCS1703A | Proteobacteria | Bacteria
767 | Stenotrophomonas maltophilia R551-3 | Proteobacteria | Bacteria
768 | Xylella fastidiosa 9a5c | Proteobacteria | Bacteria
769 | Xanthomonas axonopodis pv. citri 306 | Proteobacteria | Bacteria
770 | Xanthomonas albilineans | Proteobacteria | Bacteria
771 | Xanthomonas oryzae pv. oryzae KACC10331 | Proteobacteria | Bacteria
772 | Xanthomonas campestris pv. campestris ATCC 33913 | Proteobacteria | Bacteria
773 | Halothiobacillus neapolitanus c2 | Proteobacteria | Bacteria
774 | Alkalilimnicola ehrlichii MLHE-1 | Proteobacteria | Bacteria
775 | Thioalkalivibrio sp. HL-EbGR7 | Proteobacteria | Bacteria
776 | Halorhodospira halophila SL1 | Proteobacteria | Bacteria
777 | Allochromatium vinosum DSM 180 | Proteobacteria | Bacteria
778 | Nitrosococcus halophilus Nc4 | Proteobacteria | Bacteria
779 | Nitrosococcus oceani ATCC 19707 | Proteobacteria | Bacteria
780 | Coxiella burnetii RSA 493 | Proteobacteria | Bacteria
781 | Legionella longbeachae NSW150 | Proteobacteria | Bacteria
782 | Legionella pneumophila ssp. pneumophila Philadelphia 1 | Proteobacteria | Bacteria
783 | Baumannia cicadellinicola Hc | Proteobacteria | Bacteria
784 | Candidatus Carsonella ruddii PV | Proteobacteria | Bacteria
785 | Candidatus Vesicomyosocius okutanii HA | Proteobacteria | Bacteria
786 | Candidatus Ruthia magnifica Cm | Proteobacteria | Bacteria
787 | Cronobacter turicensis z3032 | Proteobacteria | Bacteria
788 | Cronobacter sakazakii ATCC BAA-894 | Proteobacteria | Bacteria
789 | Candidatus Riesia pediculicola USDA | Proteobacteria | Bacteria
790 | Dickeya zeae Ech1591 | Proteobacteria | Bacteria
791 | Dickeya dadantii Ech703 | Proteobacteria | Bacteria
792 | Candidatus Hamiltonella defensa 5AT | Proteobacteria | Bacteria
793 | Candidatus Blochmannia floridanus | Proteobacteria | Bacteria
794 | Pectobacterium wasabiae WPP163 | Proteobacteria | Bacteria
795 | Pectobacterium atrosepticum SCRI1043 | Proteobacteria | Bacteria
796 | Pectobacterium carotovorum ssp. carotovorum PC1 | Proteobacteria | Bacteria
797 | Sodalis glossinidius morsitans | Proteobacteria | Bacteria
798 | Pantoea ananatis LMG 20103 | Proteobacteria | Bacteria
799 | Wigglesworthia glossinidia | Proteobacteria | Bacteria
800 | Buchnera aphidicola APS | Proteobacteria | Bacteria
801 | Photorhabdus asymbiotica | Proteobacteria | Bacteria
802 | Photorhabdus luminescens ssp. laumondii TTO1 | Proteobacteria | Bacteria
803 | Edwardsiella ictaluri 93-146 | Proteobacteria | Bacteria
804 | Edwardsiella tarda EIB202 | Proteobacteria | Bacteria
805 | Yersinia pseudotuberculosis IP 32953 | Proteobacteria | Bacteria
806 | Yersinia pestis CO92 | Proteobacteria | Bacteria
807 | Yersinia enterocolitica ssp. enterocolitica 8081 | Proteobacteria | Bacteria
808 | Xenorhabdus bovienii SS-2004 | Proteobacteria | Bacteria
809 | Shigella sonnei Ss046 | Proteobacteria | Bacteria
810 | Shigella flexneri 2a 2457T | Proteobacteria | Bacteria
811 | Shigella dysenteriae Sd197 | Proteobacteria | Bacteria
812 | Shigella boydii Sb227 | Proteobacteria | Bacteria
813 | Serratia proteamaculans 568 | Proteobacteria | Bacteria
814 | Salmonella enterica ssp. enterica ser. Typhimurium LT2 | Proteobacteria | Bacteria
815 | Proteus mirabilis HI4320 | Proteobacteria | Bacteria
816 | Klebsiella variicola At-22 | Proteobacteria | Bacteria
817 | Klebsiella pneumoniae ssp. pneumoniae MGH 78578 | Proteobacteria | Bacteria
818 | Escherichia fergusonii ATCC 35469 | Proteobacteria | Bacteria
819 | Escherichia coli K-12 subMG1655 | Proteobacteria | Bacteria
820 | Erwinia tasmaniensis Et1/99 | Proteobacteria | Bacteria
821 | Erwinia pyrifoliae Ep1/96 | Proteobacteria | Bacteria
822 | Erwinia amylovora ATCC 49946 | Proteobacteria | Bacteria
823 | Enterobacter sp. 638 | Proteobacteria | Bacteria
824 | Citrobacter rodentium ICC168 | Proteobacteria | Bacteria
825 | Citrobacter koseri ATCC BAA-895 | Proteobacteria | Bacteria
826 | Azotobacter vinelandii DJ | Proteobacteria | Bacteria
827 | Pseudomonas entomophila L48 | Proteobacteria | Bacteria
828 | Pseudomonas syringae pv. tomato DC3000 | Proteobacteria | Bacteria
829 | Pseudomonas stutzeri A1501 | Proteobacteria | Bacteria
830 | Pseudomonas putida KT2440 | Proteobacteria | Bacteria
831 | Pseudomonas fluorescens Pf-5 | Proteobacteria | Bacteria
832 | Pseudomonas mendocina ymp | Proteobacteria | Bacteria
833 | Pseudomonas aeruginosa PAO1 | Proteobacteria | Bacteria
834 | Cellvibrio japonicus Ueda107 | Proteobacteria | Bacteria
835 | Psychrobacter sp. PRwf-1 | Proteobacteria | Bacteria
836 | Psychrobacter arcticus 273-4 | Proteobacteria | Bacteria
837 | Psychrobacter cryohalolentis K5 | Proteobacteria | Bacteria
838 | Acinetobacter baumannii ATCC 17978 | Proteobacteria | Bacteria
839 | Acinetobacter sp. ADP1 | Proteobacteria | Bacteria
840 | Thiomicrospira crunogena XCL-2 | Proteobacteria | Bacteria
841 | Francisella philomiragia ssp. philomiragia ATCC 25017 | Proteobacteria | Bacteria
842 | Francisella tularensis ssp. tularensis SCHU S4 | Proteobacteria | Bacteria
843 | Brachyspira hyodysenteriae WA1 | Spirochaetes | Bacteria
844 | Leptospira borgpetersenii ser. Hardjo-bovis L550 | Spirochaetes | Bacteria
845 | Leptospira interrogans ser. Lai 56601 | Spirochaetes | Bacteria
846 | Leptospira biflexa ser. Patoc Patoc 1 (Paris) | Spirochaetes | Bacteria
847 | Treponema pallidum ssp. pallidum Nichols | Spirochaetes | Bacteria
848 | Treponema denticola ATCC 35405 | Spirochaetes | Bacteria
849 | Borrelia garinii PBi | Spirochaetes | Bacteria
850 | Borrelia afzelii PKo | Spirochaetes | Bacteria
851 | Borrelia burgdorferi B31 | Spirochaetes | Bacteria
852 | Borrelia recurrentis A1 | Spirochaetes | Bacteria
853 | Borrelia duttonii Ly | Spirochaetes | Bacteria
854 | Borrelia turicatae 91E135 | Spirochaetes | Bacteria
855 | Borrelia hermsii DAH | Spirochaetes | Bacteria
856 | Aminobacterium colombiense DSM 12261 | Synergistetes | Bacteria
857 | Thermanaerovibrio acidaminovorans DSM 6589 | Synergistetes | Bacteria
858 | Candidatus Phytoplasma mali | Tenericutes | Bacteria
859 | Aster yellows witches-broom phytoplasma AYWB | Tenericutes | Bacteria
860 | Onion yellows phytoplasma OY-M | Tenericutes | Bacteria
861 | Acholeplasma laidlawii PG-8A | Tenericutes | Bacteria
862 | Mesoplasma florum L1 | Tenericutes | Bacteria
863 | Ureaplasma parvum ser. 3 ATCC 700970 | Tenericutes | Bacteria
864 | Ureaplasma urealyticum ser. 10 ATCC 33699 | Tenericutes | Bacteria
865 | Mycoplasma mycoides ssp. mycoides SC PG1 | Tenericutes | Bacteria
866 | Mycoplasma capricolum ssp. capricolum ATCC 27343 | Tenericutes | Bacteria
867 | Mycoplasma crocodyli MP145 | Tenericutes | Bacteria
868 | Mycoplasma conjunctivae HRC/581 | Tenericutes | Bacteria
869 | Mycoplasma penetrans HF-2 | Tenericutes | Bacteria
870 | Mycoplasma mobile 163K | Tenericutes | Bacteria
871 | Mycoplasma arthritidis 158L3-1 | Tenericutes | Bacteria
872 | Mycoplasma agalactiae PG2 | Tenericutes | Bacteria
873 | Mycoplasma synoviae 53 | Tenericutes | Bacteria
874 | Mycoplasma pulmonis UAB CTIP | Tenericutes | Bacteria
875 | Mycoplasma pneumoniae M129 | Tenericutes | Bacteria
876 | Mycoplasma hyopneumoniae 232 | Tenericutes | Bacteria
877 | Mycoplasma hominis | Tenericutes | Bacteria
878 | Mycoplasma genitalium G37 | Tenericutes | Bacteria
879 | Mycoplasma gallisepticum R(low) | Tenericutes | Bacteria
880 | Kosmotoga olearia TBF 19.5.1 | Thermotogae | Bacteria
881 | Petrotoga mobilis SJ95 | Thermotogae | Bacteria
882 | Fervidobacterium nodosum Rt17-B1 | Thermotogae | Bacteria
883 | Thermosipho melanesiensis BI429 | Thermotogae | Bacteria
884 | Thermosipho africanus TCF52B | Thermotogae | Bacteria
885 | Thermotoga lettingae TMO | Thermotogae | Bacteria
886 | Thermotoga sp. RQ2 | Thermotogae | Bacteria
887 | Thermotoga naphthophila RKU-10 | Thermotogae | Bacteria
888 | Thermotoga petrophila RKU-1 | Thermotogae | Bacteria
889 | Thermotoga neapolitana DSM 4359 | Thermotogae | Bacteria
890 | Thermotoga maritima MSB8 | Thermotogae | Bacteria
891 | Coraliomargarita akajimensis DSM 45221 | Verrucomicrobia | Bacteria
892 | Opitutus terrae PB90-1 | Verrucomicrobia | Bacteria
893 | Methylacidiphilum infernorum V4 | Verrucomicrobia | Bacteria
894 | Akkermansia muciniphila ATCC BAA-835 | Verrucomicrobia | Bacteria
895 | Thermobaculum terrenum ATCC BAA-798 | | Bacteria
896 | Hyperthermus butylicus DSM 5456 | Crenarchaeota | Archaea
897 | Aeropyrum pernix K1 | Crenarchaeota | Archaea
898 | Ignicoccus hospitalis KIN4/I | Crenarchaeota | Archaea
899 | Staphylothermus marinus F1 | Crenarchaeota | Archaea
900 | Desulfurococcus kamchatkensis 1221n | Crenarchaeota | Archaea
901 | Metallosphaera sedula DSM 5348 | Crenarchaeota | Archaea
902 | Sulfolobus tokodaii 7 | Crenarchaeota | Archaea
903 | Sulfolobus islandicus Y.N.15.51 | Crenarchaeota | Archaea
904 | Sulfolobus solfataricus P2 | Crenarchaeota | Archaea
905 | Sulfolobus acidocaldarius DSM 639 | Crenarchaeota | Archaea
906 | Thermofilum pendens Hrk 5 | Crenarchaeota | Archaea
907 | Caldivirga maquilingensis IC-167 | Crenarchaeota | Archaea
908 | Pyrobaculum calidifontis JCM 11548 | Crenarchaeota | Archaea
909 | Pyrobaculum arsenaticum DSM 13514 | Crenarchaeota | Archaea
910 | Pyrobaculum aerophilum IM2 | Crenarchaeota | Archaea
911 | Pyrobaculum islandicum DSM 4184 | Crenarchaeota | Archaea
912 | Thermoproteus neutrophilus V24Sta | Crenarchaeota | Archaea
913 | Methanocella paludicola SANAE | Euryarchaeota | Archaea
914 | Methanosaeta thermophila PT | Euryarchaeota | Archaea
915 | Methanococcoides burtonii DSM 6242 | Euryarchaeota | Archaea
916 | Methanosarcina acetivorans C2A | Euryarchaeota | Archaea
917 | Methanosarcina mazei Go1 | Euryarchaeota | Archaea
918 | Methanosarcina barkeri Fusaro | Euryarchaeota | Archaea
919 | Methanohalophilus mahii DSM 5219 | Euryarchaeota | Archaea
920 | Methanosphaerula palustris E1-9c | Euryarchaeota | Archaea
921 | Candidatus Methanoregula boonei 6A8 | Euryarchaeota | Archaea
922 | Methanospirillum hungatei JF-1 | Euryarchaeota | Archaea
923 | Methanocorpusculum labreanum Z | Euryarchaeota | Archaea
924 | Methanoculleus marisnigri JR1 | Euryarchaeota | Archaea
925 | Methanopyrus kandleri AV19 | Euryarchaeota | Archaea
926 | Ferroglobus placidus DSM 10642 | Euryarchaeota | Archaea
927 | Archaeoglobus profundus DSM 5631 | Euryarchaeota | Archaea
928 | Archaeoglobus fulgidus DSM 4304 | Euryarchaeota | Archaea
929 | Thermococcus onnurineus NA1 | Euryarchaeota | Archaea
930 | Thermococcus kodakarensis KOD1 | Euryarchaeota | Archaea
931 | Thermococcus gammatolerans EJ3 | Euryarchaeota | Archaea
932 | Thermococcus sibiricus MM 739 | Euryarchaeota | Archaea
933 | Pyrococcus horikoshii OT3 | Euryarchaeota | Archaea
934 | Pyrococcus abyssi GE5 | Euryarchaeota | Archaea
935 | Pyrococcus furiosus DSM 3638 | Euryarchaeota | Archaea
936 | Thermoplasma volcanium GSS1 | Euryarchaeota | Archaea
937 | Thermoplasma acidophilum DSM 1728 | Euryarchaeota | Archaea
938 | Picrophilus torridus DSM 9790 | Euryarchaeota | Archaea
939 | Haloquadratum walsbyi DSM 16790 | Euryarchaeota | Archaea
940 | Halomicrobium mukohataei DSM 12286 | Euryarchaeota | Archaea
941 | Halorhabdus utahensis DSM 12940 | Euryarchaeota | Archaea
942 | Haloterrigena turkmenica DSM 5511 | Euryarchaeota | Archaea
943 | Natronomonas pharaonis DSM 2160 | Euryarchaeota | Archaea
944 | Natrialba magadii ATCC 43099 | Euryarchaeota | Archaea
945 | Halorubrum lacusprofundi ATCC 49239 | Euryarchaeota | Archaea
946 | Haloferax volcanii DS2 | Euryarchaeota | Archaea
947 | Halobacterium salinarum R1 | Euryarchaeota | Archaea
948 | Halobacterium sp. NRC-1 | Euryarchaeota | Archaea
949 | Haloarcula marismortui ATCC 43049 | Euryarchaeota | Archaea
950 | Methanocaldococcus sp. FS406-22 | Euryarchaeota | Archaea
951 | Methanocaldococcus fervens AG86 | Euryarchaeota | Archaea
952 | Methanocaldococcus vulcanius M7 | Euryarchaeota | Archaea
953 | Methanocaldococcus jannaschii DSM 2661 | Euryarchaeota | Archaea
954 | Methanococcus aeolicus Nankai-3 | Euryarchaeota | Archaea
955 | Methanococcus maripaludis S2 | Euryarchaeota | Archaea
956 | Methanococcus vannielii SB | Euryarchaeota | Archaea
957 | Methanothermobacter thermautotrophicus Delta H | Euryarchaeota | Archaea
958 | Methanosphaera stadtmanae DSM 3091 | Euryarchaeota | Archaea
959 | Methanobrevibacter ruminantium M1 | Euryarchaeota | Archaea
960 | Methanobrevibacter smithii ATCC 35061 | Euryarchaeota | Archaea
961 | uncultured methanogenic archaeon RC-I | Euryarchaeota | Archaea
962 | Aciduliprofundum boonei T469 | Euryarchaeota | Archaea
963 | Candidatus Korarchaeum cryptofilum OPF8 | Korarchaeota | Archaea
964 | Nanoarchaeum equitans Kin4-M | Nanoarchaeota | Archaea
965 | Nitrosopumilus maritimus SCM1 | Thaumarchaeota | Archaea

This study began as a class project in CPSC 567, a course in bioinformatics and systems biology taught by G.C.-A. at the University of Illinois in spring 2011. We thank Kyung Mo Kim and Liudmila Yafremava for information about lifestyles. A.N., A.Na., M.J.K. and H.D.L.-N. conceived the experiments and analyzed the data. G.C.-A. supervised the project and edited the manuscript. Research was supported by the National Science Foundation (MCB-0749836), CREES-USDA and the Soybean Disease Biotechnology Center (to G.C.-A.). Any opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.


  1. Caetano-Anolles, D.; Kim, K.M.; Mittenthal, J.E.; Caetano-Anolles, G. Proteome evolution and the metabolic origins of translation and cellular life. J. Mol. Evol. 2011, 72, 14–33.
  2. Lesk, A.M. Introduction to Protein Architecture; Oxford University Press: New York, NY, USA, 2001.
  3. Cordes, M.H.; Davidson, A.R.; Sauer, R.T. Sequence space, folding and protein design. Curr. Opin. Struct. Biol. 1996, 6, 3–10.
  4. Linderstrom-Lang, K.U.; Schellman, J.A. The Enzymes; Academic Press: New York, NY, USA, 1959; pp. 443–510.
  5. Wang, M.; Caetano-Anolles, G. The evolutionary mechanics of domain organization in proteomes and the rise of modularity in the protein world. Structure 2009, 17, 66–78.
  6. Vogel, C.; Bashton, M.; Kerrison, N.D.; Chothia, C.; Teichmann, S.A. Structure, function and evolution of multidomain proteins. Curr. Opin. Struct. Biol. 2004, 14, 208–216.
  7. Wang, M.; Yafremava, L.S.; Caetano-Anolles, D.; Mittenthal, J.E.; Caetano-Anolles, G. Reductive evolution of architectural repertoires in proteomes and the birth of the tripartite world. Genome Res. 2007, 17, 1572–1585.
  8. Gerstein, M.; Hegyi, H. Comparing genomes in terms of protein structure: Surveys of a finite parts list. FEMS Microbiol. Rev. 1998, 22, 277–304.
  9. Chothia, C.; Gough, J.; Vogel, C.; Teichmann, S.A. Evolution of the protein repertoire. Science 2003, 300, 1701–1703.
  10. Murzin, A.G.; Brenner, S.E.; Hubbard, T.; Chothia, C. Scop: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247, 536–540.
  11. Orengo, C.A.; Michie, A.D.; Jones, S.; Jones, D.T.; Swindells, M.B.; Thornton, J.M. Cath—A hierarchic classification of protein domain structures. Structure 1997, 5, 1093–1108.
  12. Riley, M.; Labedan, B. Protein evolution viewed through escherichia coli protein sequences: Introducing the notion of a structural segment of homology, the module. J. Mol. Biol. 1997, 268, 857–868.
  13. Ponting, C.P.; Russell, R.R. The natural history of protein domains. Annu. Rev. Biophys. Biomol. Struct. 2002, 31, 45–71.
  14. Andreeva, A.; Howorth, D.; Chandonia, J.M.; Brenner, S.E.; Hubbard, T.J.; Chothia, C.; Murzin, A.G. Data growth and its impact on the scop database: New developments. Nucleic Acids Res. 2008, 36, D419–D425.
  15. Caetano-Anolles, G.; Wang, M.; Caetano-Anolles, D.; Mittenthal, J.E. The origin, evolution and structure of the protein world. Biochem. J. 2009, 417, 621–637.
  16. Gough, J.; Karplus, K.; Hughey, R.; Chothia, C. Assignment of homology to genome sequences using a library of hidden markov models that represent all proteins of known structure. J. Mol. Biol. 2001, 313, 903–919.
  17. Wilson, D.; Madera, M.; Vogel, C.; Chothia, C.; Gough, J. The superfamily database in 2007: Families and functions. Nucleic Acids Res. 2007, 35, D308–D313.
  18. Karplus, K. Sam-t08, hmm-based protein structure prediction. Nucleic Acids Res. 2009, 37, W492–W497.
  19. Kim, K.M.; Caetano-Anolles, G. The proteomic complexity and rise of the primordial ancestor of diversified life. BMC Evol. Biol. 2011, 11, 140:1–140:24.
  20. Vogel, C.; Berzuini, C.; Bashton, M.; Gough, J.; Teichmann, S.A. Supra-domains: Evolutionary units larger than single protein domains. J. Mol. Biol. 2004, 336, 809–823.
  21. Vogel, C.; Teichmann, S.A.; Pereira-Leal, J. The relationship between domain duplication and recombination. J. Mol. Biol. 2005, 346, 355–365.
  22. Vogel, C.; Chothia, C. Protein family expansions and biological complexity. PLoS Comput. Biol. 2006, 2, e48:0370–e48:0382.
  23. Vogel, C. Function annotation of SCOP domain superfamilies 1.73. Superfamily-HMM library and genome assignments server, Available online: (accessed on 28 October 2011).
  24. Moreira, D.; Lopez-Garcia, P. Ten reasons to exclude viruses from the tree of life. Nat. Rev. Microbiol. 2009, 7, 306–311.
  25. Wang, M.; Kurland, C.G.; Caetano-Anolles, G. Reductive evolution of proteomes and protein structures. Proc. Natl. Acad. Sci. USA 2011, 108, 11954–11958.
  26. Koonin, E.V.; Wolf, Y.I.; Nagasaki, K.; Dolja, V.V. The big bang of picorna-like virus evolution antedates the radiation of eukaryotic supergroups. Nat. Rev. Microbiol. 2008, 6, 925–939.
  27. Das, S.; Paul, S.; Bag, S.K.; Dutta, C. Analysis of nanoarchaeum equitans genome and proteome composition: Indications for hyperthermophilic and parasitic adaptation. BMC Genomics 2006, 7, 186:1–186:16.
  28. Huber, H.; Hohn, M.J.; Rachel, R.; Fuchs, T.; Wimmer, V.C.; Stetter, K.O. A new phylum of archaea represented by a nanosized hyperthermophilic symbiont. Nature 2002, 417, 63–67.
  29. Waters, E.; Hohn, M.J.; Ahel, I.; Graham, D.E.; Adams, M.D.; Barnstead, M.; Beeson, K.Y.; Bibbs, L.; Bolanos, R.; Keller, M.; Kretz, K.; Lin, X.; Mathur, E.; Ni, J.; Podar, M.; Richardson, T.; Sutton, G.G.; Simon, M.; Soll, D.; Stetter, K.O.; Short, J.M.; Noordewier, M. The genome of Nanoarchaeum equitans: Insights into early archaeal evolution and derived parasitism. Proc. Natl. Acad. Sci. USA 2003, 100, 12984–12988.
  30. Randau, L.; Munch, R.; Hohn, M.J.; Jahn, D.; Soll, D. Nanoarchaeum equitans creates functional trnas from separate genes for their 5′- and 3′-halves. Nature 2005, 433, 537–541.
  31. Randau, L.; Schroder, I.; Soll, D. Life without rnase p. Nature 2008, 453, 120–123.
  32. Di Giulio, M. Nanoarchaeum equitans is a living fossil. J. Theor. Biol. 2006, 242, 257–260.
  33. Di Giulio, M. The tree of life might be rooted in the branch leading to nanoarchaeota. Gene 2007, 401, 108–113.
  34. Kim, K.M.; Caetano-Anolles, G. The evolutionary history of protein fold families and proteomes confirms Archaea is the most ancient superkingdom. Ms. submitted. .
  35. Woese, C.R.; Maniloff, J.; Zablen, L.B. Phylogenetic analysis of the mycoplasmas. Proc. Natl. Acad. Sci. USA 1980, 77, 494–498.
  36. Chambaud, I.; Heilig, R.; Ferris, S.; Barbe, V.; Samson, D.; Galisson, F.; Moszer, I.; Dybvig, K.; Wróblewski, H.; Viari, A.; Rocha, E.P.; Blanchard, A. The complete genome sequence of the murine respiratory pathogen Mycoplasma pulmonis. Nucleic Acids Res. 2001, 29, 2145–2153.
  37. Gibson, D.G.; Smith, H.O.; Hutchison, C.A., III; Venter, J.C.; Merryman, C. Chemical synthesis of the mouse mitochondrial genome. Nat. Methods 2010, 7, 901–903.
  38. Nakabachi, A.; Yamashita, A.; Toh, H.; Ishikawa, H.; Dunbar, H.E.; Moran, N.A.; Hattori, M. The 160-kilobase genome of the bacterial endosymbiont carsonella. Science 2006, 314, 267.
  39. Forterre, P.; Gribaldo, S. Bacteria with a eukaryotic touch: A glimpse of ancient evolution? Proc. Natl. Acad. Sci. USA 2010, 107, 12739–12740.
  40. Santarella-Mellwig, R.; Franke, J.; Jaedicke, A.; Gorjanacz, M.; Bauer, U.; Budd, A.; Mattaj, I.W.; Devos, D.P. The compartmentalized bacteria of the planctomycetes-verrucomicrobia-chlamydiae superphylum have membrane coat-like proteins. PLoS Biol. 2010, 8, e1000281:1–e1000281:11.
  41. Kamneva, O.K.; Liberles, D.A.; Ward, N.L. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol. Evol. 2010, 2, 870–886.
  42. Devos, D.P.; Reynaud, E.G. Evolution. Intermediate steps. Science 2010, 330, 1187–1188.
  43. Katinka, M.D.; Duprat, S.; Cornillot, E.; Méténier, G.; Thomarat, F.; Prensier, G.; Barbe, V.; Peyretaillade, E.; Brottier, P.; Wincker, P.; Delbac, F.; El Alaoui, H.; Peyret, P.; Saurin, W.; Gouy, M.; Weissenbach, J.; Vivares, C. P, Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 2001, 414, 450–453.
  44. Corradi, N.; Pombert, J.F.; Farinelli, L.; Didier, E.S.; Keeling, P.J. The complete sequence of the smallest known nuclear genome from the microsporidian Encephalitozoon intestinalis. Nat. Commun. 2010, 1, 77, doi:10.1038/ncomms1082.
  45. Douglas, S.; Zauner, S.; Fraunholz, M.; Beaton, M.; Penny, S.; Deng, L.T.; Wu, X.; Reith, M.; Cavalier-Smith, T.; Maier, U.G. The highly reduced genome of an enslaved algal nucleus. Nature 2001, 410, 1091–1096.
  46. Peyretaillade, E.; Biderre, C.; Peyret, P.; Duffieux, F.; Metenier, G.; Gouy, M.; Michot, B.; Vivares, C.P. Microsporidian encephalitozoon cuniculi, a unicellular eukaryote with an unusual chromosomal dispersion of ribosomal genes and a lsu rrna reduced to the universal core. Nucleic Acids Res. 1998, 26, 3513–3520.
  47. Martin, W.; Herrmann, R.G. Gene transfer from organelles to the nucleus: How much, what happens, and why? Plant Physiol. 1998, 118, 9–17.
  48. Keeling, P.J.; Slamovits, C.H. Causes and effects of nuclear genome reduction. Curr. Opin. Genet. Dev. 2005, 15, 601–608.
  49. Welch, B.L. The significance of the difference between two means when the population variances are unequal. Biometrika 1938, 29, 350–362.
  50. Caetano-Anolles, G.; Kim, H.S.; Mittenthal, J.E. The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proc. Natl. Acad. Sci. USA 2007, 104, 9358–9363.
  51. Ingham, P.W.; Nokano, Y.; Seger, C. Mechanisms and functions of Hedgehog signalling across the metazoa. Nat. Rev. Genet. 2011, 12, 393–406.
  52. Bürglin, T.R. Evolution of hedgehog and hedgehog-related genes, their origin from Hog proteins in ancestral eukaryotes and discovery of a novel Hint motif. BMC Genomics 2008, 9, 127:1–127:28.
Genes EISSN 2073-4425 Published by MDPI AG, Basel, Switzerland