Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis

Life Research Center and Department of Physics, Fudan University, 220 Handan Road, Shanghai 200433, China
Thermo Fisher Scientific, 200 Oyster Point Blvd, South San Francisco, CA 94080, USA
Author to whom correspondence should be addressed.
Life 2015, 5(1), 949-968;
Received: 9 December 2014 / Revised: 6 March 2015 / Accepted: 9 March 2015 / Published: 17 March 2015
(This article belongs to the Special Issue Archaea: Evolution, Physiology, and Molecular Biology)


A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1) the whole-genome-based and alignment-free CVTree using 179 genomes; (2) the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3) the Second Edition of Bergey’s Manual of Systematic Bacteriology complemented by some current literature. The three approaches agree to a high degree at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, CVTree helped to determine the taxonomic position of some newly sequenced genomes lacking proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.

1. Introduction

Prokaryotes are the most abundant and diverse creatures on Earth. The recognition of Archaea as one of the three main domains of life [1,2] was a milestone in the development of biology and a great success of using 16S rRNA sequences as molecular clocks for prokaryotes, as suggested by Carl Woese and coworkers [3,4]. The Second Edition of Bergey’s Manual of Systematic Bacteriology [5] (hereafter, the Manual), a magnificent work of more than 8000 pages, took 12 years (2001–2012) to complete and is considered by many microbiologists to be the best approximation to an official classification of prokaryotes [6]. As stated in the Preface to vol. 1 of the Manual, these volumes “follow a phylogenetic framework based on analysis of the nucleotide sequence of the small ribosomal subunit RNA, rather than a phenotypic structure.” However, this “congruence” of phylogeny and taxonomy, both being based on 16S rRNA sequence analysis, raises a question of principle: an independent cross-verification is needed to determine whether the present classification provides a natural and objective demarcation of microbial organisms.
An answer has become possible with the advent of the genomic era. A whole-genome-based, alignment-free, composition vector approach to prokaryotic phylogeny, called CVTree [7,8,9,10,11,12], has produced robust phylogenetic trees that agree with prokaryotic taxonomy at almost all taxonomic ranks, from domain down to genus and species; more importantly, many apparent disagreements have disappeared as new taxonomic revisions have appeared. In fact, all published taxonomic revisions for prokaryotes with sequenced genomes have added to the agreement of CVTree with taxonomy. A recent example from the domain Archaea is the reclassification of Thermoproteus neutrophilus as Pyrobaculum neutrophilum [13].
In this paper, we study Archaea phylogeny across many phyla. This is distinct from the phylogeny of species in a narrow range of taxa, e.g., that of vertebrates (a subphylum) or of humans and their close relatives (a few genera). Accordingly, the phylogeny should be compared with taxonomy at large or, as Cavalier-Smith [14] put it, with the “mega-classification” of prokaryotes, focusing on the taxonomy of higher ranks. In taxonomy, the description of a newly discovered organism necessarily starts from the lower ranks, and higher rank assignments are often incomplete or lacking. At present, the ranks above class are not covered by the Bacteriological Code [15,16]. The number of plausible microbial phyla may reach hundreds, and archaeal ones are among the least studied. According to 16S rRNA analysis, the major archaeal classes and their subordinate orders have been more or less delineated. Therefore, in order to carry out the aforementioned cross-verification, we place emphasis on higher ranks, such as phyla, classes and orders. This study of 179 Archaea genomes provides a framework for the further study of lower ranks.

2. Material and Method

Publicly available Archaea genome sequences are the material for this study. At present, more than 30,000 prokaryotic genomes have been sequenced [17], of which about 16,000 have been annotated [18]. These numbers keep growing and make whole-genome approaches more feasible than ever.
As of the end of 2014, there were 165 Archaea genomes released on the NCBI FTP site [19]. These genomes with corresponding lineage information from NCBI taxonomy were part of the built-in database of the CVTree web servers [20,21]. A search of NCBI databases revealed 14 more archaeal genomes; these were uploaded to the web server at run time. Archaea genomes listed in the EBI Genome Pages [22] were all included. A full list of these 179 genomes with accession numbers is given in the Appendix.
A whole-genome-based phylogeny avoids the selection of sequence segments or orthologous genes. It must be alignment-free because of the extreme diversity of prokaryotic genome size and gene content. We implement alignment-free comparison by counting K-peptides (strings of K consecutive amino acids) in all protein products encoded in a genome to form a raw “composition vector” (CV). The raw CV components then undergo a subtraction procedure that diminishes the background caused by neutral mutations and hence highlights the shaping role of natural selection [23]. Using whole genomes as input data also helps to circumvent the problem of lateral gene transfer (LGT), as the latter is merely one mechanism of genome evolution, together with lineage-dependent gene loss. Although a nightmare for single- or few-protein-based phylogeny, LGT may even play a positive role in whole-genome approaches, as it takes place mainly in shared ecological niches [24] and among closely related species [25]. Plasmid genomes were excluded from our input data, further reducing the influence of plasmid-mediated LGT. Using whole-genome input and an alignment-free method also makes CVTree a parameter-free approach: given the genomes, phylogenetic trees are generated without any adjustment of parameters or selection of sequence segments.
As the CVTree methodology has been elucidated in many previous publications (see, e.g., [7,8,9,10,11,12]) and web servers were released in 2004 [26] and 2009 [20], we do not discuss the methodological aspects of CVTree here. It should be understood, however, that the peptide length K, though it looks like a parameter, does not function as one. For a discussion of the role of K and of why K = 5 and 6 lead to the best results, we refer to a recent paper [27]. All CVTree figures shown in this paper were generated at K = 6. In this paper, the term CVTree denotes the method [7,8,9,10,11,12,27], the web server [20,21,26] and the resulting tree; see, e.g., [28].
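The K-peptide counting and background subtraction described above can be illustrated by a short sketch. It follows the formulas of the earlier CVTree papers as we understand them: the expected frequency f0 of a K-peptide is predicted from its two (K−1)-peptides and the shared (K−2)-peptide (a (K−2)-order Markov model), each CV component is the relative deviation (f − f0)/f0, and two genomes are compared by the cosine correlation C of their vectors, converted to a distance D = (1 − C)/2. The function names are hypothetical, counts are pooled over all proteins of a genome as a simplification, and this is not the CVTree3 server code.

```python
from collections import Counter
from math import sqrt


def peptide_frequencies(proteins, k):
    """Relative frequencies of all k-peptides, pooled over a genome's proteins."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def composition_vector(proteins, k):
    """Raw K-peptide frequencies with a (K-2)-order Markov background subtracted.

    Each component is (f - f0) / f0, where f0 is predicted from the two
    (K-1)-peptides and the shared (K-2)-peptide of the K-peptide.
    """
    if k < 3:
        raise ValueError("the subtraction needs K >= 3")
    f_k = peptide_frequencies(proteins, k)
    f_k1 = peptide_frequencies(proteins, k - 1)
    f_k2 = peptide_frequencies(proteins, k - 2)
    cv = {}
    for word, f in f_k.items():
        middle = f_k2.get(word[1:-1], 0.0)
        f0 = f_k1[word[:-1]] * f_k1[word[1:]] / middle if middle else 0.0
        cv[word] = (f - f0) / f0 if f0 else 0.0
    return cv


def cv_distance(a, b):
    """Distance D = (1 - C) / 2 from the cosine correlation C of two sparse CVs."""
    dot = sum(value * b[word] for word, value in a.items() if word in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.5  # treat an all-zero vector as uncorrelated (C = 0)
    return (1 - dot / (norm_a * norm_b)) / 2
```

For instance, composition_vector(["AAAB"], 3) gives approximately −0.15625 for AAA and 0.6875 for AAB. With real proteomes one would call composition_vector(proteins, 6), matching the K used for the figures in this paper, and feed the pairwise distances to a tree-building method.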
Traditionally, a newly generated phylogenetic tree is subjected to statistical re-sampling tests, such as the bootstrap and the jackknife. Because CVTree does not use sequence alignment, there is no way to recognize informative or non-informative sites. Instead, we take all of the protein products encoded in a genome as a sampling pool for bootstrap or jackknife tests [7]. Although these tests were very time-consuming, CVTrees passed them well [11]. However, passing statistical re-sampling tests only shows that the tree is stable and self-consistent with respect to small variations of the input data; it is far from a proof of the objective correctness of the tree. Direct comparison of all branchings in a tree with an independent taxonomy at all ranks would provide such a proof. The 16S rRNA phylogeny cannot be verified by Bergey’s taxonomy, as the latter follows the former. In contrast, agreement of the branchings in CVTree with Bergey’s taxonomy would provide much stronger support for the tree than statistical tests do. This is the strategy we adopt for the CVTree approach.
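The protein-level resampling described above can be sketched as follows, assuming each genome is given as a list of protein sequences. Only the resampling step is shown; computing CVs, distances and trees for each replicate and counting how often each branch is recovered are omitted. The function names and the keep_fraction parameter are hypothetical and not taken from the CVTree implementation.

```python
import random


def bootstrap_replicate(proteins, rng):
    """One bootstrap replicate: draw as many proteins as the genome has, with replacement."""
    return [rng.choice(proteins) for _ in range(len(proteins))]


def jackknife_replicate(proteins, keep_fraction, rng):
    """One jackknife replicate: keep a random subset of the proteins, without replacement."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(proteins)))
    return rng.sample(proteins, n_keep)


def replicate_genomes(genomes, n_replicates, seed=0, keep_fraction=None):
    """Yield one {genome name: resampled proteins} dict per replicate.

    keep_fraction=None selects the bootstrap, otherwise the jackknife is used.
    Each replicate would then go through the CV, distance and tree-building steps.
    """
    rng = random.Random(seed)  # seeded so that replicates are reproducible
    for _ in range(n_replicates):
        if keep_fraction is None:
            yield {name: bootstrap_replicate(p, rng) for name, p in genomes.items()}
        else:
            yield {name: jackknife_replicate(p, keep_fraction, rng)
                   for name, p in genomes.items()}
```

Resampling whole proteins rather than alignment columns is what makes these tests applicable without an alignment.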
A phylogenetic tree has two aspects: the branching order (topology) and the branch lengths. Branching order relates to classification, and branch length to evolutionary time. Calibration of branch lengths always rests on the assumption that the mutation rate remains more or less constant across all species represented in a tree, an assumption that cannot hold in a large-scale phylogenetic study like the present one. Therefore, branching order is of primary concern here, whereas calibrated branch lengths make little sense. Accordingly, all figures in this paper show only the branching scheme, without branch lengths or bootstrap values.
Branching order in a tree does not by itself yield taxonomic ranks, e.g., class or order. Ranks can be assigned only after comparison with a reference taxonomy, which is not a rigid framework but a modifiable system. Although the CVTree algorithm includes a dissimilarity measure, it is not realistic to delineate taxa by this measure, at least for the time being. Even if such a criterion were defined in the future, it would have to be lineage-dependent; for example, the same degree of dissimilarity cannot be expected to delineate classes in all phyla. Monophyly is the guiding principle in comparing branching order with taxonomy. Here, monophyly must be understood pragmatically, restricted to the given set of input data and the reference taxonomy: if all genomes from a taxon appear in one tree branch and that branch contains no other genomes, the branch is said to be monophyletic.
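This pragmatic notion of monophyly can be checked mechanically. The minimal sketch below assumes a rooted tree written as nested Python tuples of leaf names and a mapping from each input genome to its taxon at one rank; a taxon counts as monophyletic if the leaf set of some branch equals exactly the set of its genomes. The leaf names and function names are hypothetical.

```python
def clade_leaf_sets(tree):
    """Leaf sets of all branches of a rooted tree given as nested tuples of leaf names."""
    found = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found


def is_monophyletic(tree, taxon_of, taxon):
    """True if some branch contains all input genomes of `taxon` and no other genomes."""
    members = frozenset(leaf for leaf, name in taxon_of.items() if name == taxon)
    return bool(members) and members in clade_leaf_sets(tree)


# Usage with a toy tree: the two Methanomicrobia leaves sit in different branches.
tree = ((("Hbt1", "Hbt2"), "Msar1"), ("Msar2", "Tpl1"))
taxon_of = {"Hbt1": "Halobacteria", "Hbt2": "Halobacteria",
            "Msar1": "Methanomicrobia", "Msar2": "Methanomicrobia",
            "Tpl1": "Thermoplasmata"}
print(is_monophyletic(tree, taxon_of, "Halobacteria"))     # True
print(is_monophyletic(tree, taxon_of, "Methanomicrobia"))  # False
```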
In order to deal effectively with several thousand genomes in a run, we have parallelized the CVTree algorithm and moved the web server to a computer cluster with 64 cores. The new CVTree3 web server [21] can produce trees with several thousand leaves in a few minutes for a range of K-values, say K = 3 to 7. In addition, the CVTree3 web server has the following advanced features:
CVTree3 is equipped with an interactive tree display, which allows the user to collapse or expand tree branches at will. The user may concentrate on a taxon of interest by submitting a query; only the neighborhood of the taxon is then expanded and all of the rest is collapsed, keeping the topology unchanged. Here, “collapsing” means replacing a whole branch by a single leaf. A collapsed branch is usually labeled by the name of the highest common taxon, followed by the number of strains it represents. For example, <C>Methanococci{12} denotes a class-level monophyletic branch containing 12 leaves. If a taxon name is seen in two (or more) collapsed branches, such as <C>Classname{3/12} and <C>Classname{9/12}, then the class, although a single taxon in the taxonomy, does not correspond to a single branch in the collapsed tree; that is, it is not monophyletic.
The web server reports “convergence statistics” of all tree branches, i.e., a list of all monophyletic and non-monophyletic taxa at all taxonomic ranks for every K-value. For example, the first two lines of the report read:
<D>Archaea{165} − − K5 K6 K7
<D>Bacteria{2707} − − K5 K6 − −
(Numerals in curly brackets give the number of organisms in a collapsed branch.) Thus, the two domains Archaea and Bacteria are both well-defined monophyletic branches at K = 5 and 6. Note that the statistics count only genomes with complete lineage information. The example project referred to in this paper additionally contained 14 archaeal and 143 bacterial genomes with one or more “unclassified” ranks in the lineage. In total, therefore, {165+14} = 179 Archaea and {2707+143} = 2850 Bacteria genomes were used. The {m+n} convention is useful for spotting incomplete lineages in CVTree branches.
The lineage information of an organism is given in one line with the labels <D>, <P>, <C>, <O>, <F>, <G> and <S>, standing for the ranks domain, phylum, class, order, family, genus and species. The strain label <T> does not appear in the lineage information, but may be seen at a leaf. The original lineage information of the built-in genomes was taken from the NCBI taxonomy; that of users’ genomes is provided at upload. Users may make lineage modifications and see the new statistics after re-collapsing.
When displaying a tree, the user may pull down a lineage modification window and enter a trial lineage in the form “old_lineage new_lineage”. For example, the initial lineage for <T>Caldiarchaeum_cryptofilum_OPF8_uid58601 put it in phylum Thaumarchaeota, but there is evidence that it belongs to a new phylum, Aigarchaeota, so the modification may look like:
<P>Thaumarchaeota · · · <G>Caldiarchaeum <P>Aigarchaeota · · · <G>Caldiarchaeum
The modification line need not contain all ranks, but the written part must be uniquely recognizable. By submitting the lineage modification, the user performs “re-collapsing” and obtains a new report of “convergence statistics”.
The user may select any part of a CVTree and produce a print-quality figure in SVG, EPS, PDF or PNG format.
All of these features help to reveal agreements and discrepancies between a large tree and the taxonomy.
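The collapsing and labeling conventions described above (<R>Name{N} for a branch holding all N members of a taxon, <R>Name{k/N} for a branch holding only k of them) can be reproduced on a toy tree. This is a hypothetical sketch, not CVTree3 code: the tree is a rooted nested tuple, each maximal branch whose leaves all share one taxon at the chosen rank becomes one label, and the {m+n} notation for incomplete lineages and the per-K reporting are not reproduced.

```python
from collections import Counter


def leaves(node):
    """All leaf names below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def collapse(tree, lineage, rank):
    """Collapse maximal branches whose leaves share one taxon at `rank`.

    `lineage` maps leaf -> {rank letter: taxon name}. A branch holding all N members
    of a taxon is labeled <R>Name{N}; a branch holding only k of them, <R>Name{k/N}.
    """
    totals = Counter(lineage[leaf][rank] for leaf in leaves(tree))
    labels = []

    def walk(node):
        below = leaves(node)
        names = {lineage[leaf][rank] for leaf in below}
        if len(names) == 1:  # uniform branch: collapse it into one labeled leaf
            name = names.pop()
            n, total = len(below), totals[name]
            size = str(n) if n == total else f"{n}/{total}"
            labels.append(f"<{rank}>{name}{{{size}}}")
        else:  # mixed branch: descend into its children
            for child in node:
                walk(child)

    walk(tree)
    return labels


def non_monophyletic(labels):
    """Taxa split over several collapsed branches, i.e. labels such as {3/12}."""
    return sorted({label.split("{")[0] for label in labels if "/" in label})
```

On the toy tree ((("Hbt1", "Hbt2"), "Msar1"), ("Msar2", "Tpl1")) with class labels, the two Methanomicrobia leaves end up as two branches labeled <C>Methanomicrobia{1/2}, which non_monophyletic reports, much as the convergence statistics flag a split taxon.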

3. Outline of Archaea Taxonomy at and above the Rank Order

The taxonomy of Archaea was described in Volume 1 of the Manual, which appeared in 2001 [29] and is therefore somewhat outdated. Two phyla, Crenarchaeota and Euryarchaeota, were listed there. The Crenarchaeota contained only one class, Thermoprotei. According to the latest information provided in the List of Prokaryotic Names with Standing in Nomenclature (LPSN [30]), the class Thermoprotei contains five orders: Thermoproteales, Desulfurococcales, Sulfolobales, Acidilobales and Fervidicoccales, the last two having been proposed in 2009 [31] and 2010 [32], respectively. Originally, the phylum Euryarchaeota contained seven classes: Methanobacteria, Methanococci, Halobacteria, Thermoplasmata, Thermococci, Archaeoglobi and Methanopyri, each comprising a single order except for Methanococci, which contained three orders. Later, in a revised roadmap of the Manual [33], the class Methanococci was left with only one order; the other two orders became part of the newly proposed class Methanomicrobia, bringing the number of euryarchaeal classes to eight. A third order of Methanomicrobia, Methanocellales, was proposed in 2008 [34]. Very recently, a proposal [35] appeared to divide the single-order class Halobacteria into three orders.
Over the past 15 years, several new archaeal phyla have been proposed: Korarchaeota [36,37], Thaumarchaeota [38,39,40], Nanoarchaeota [41,42,43], Aigarchaeota [44], Parvarchaeota [45] and Bathyarchaeota [46]. All but the last three have been listed in LPSN [30]. We do not discuss Parvarchaeota and Bathyarchaeota, owing to a lack of well-annotated genome data.
The main focus of the present study is to check the positions of these high-rank taxa in CVTree and to compare them with 16S rRNA sequence analysis, using results obtained by other authors where available.

4. Results and Discussion

4.1. 16S rRNA Archaeal Phylogeny According to All-Species Living Tree

An authoritative reference for the 16S rRNA phylogeny is the All-Species Living Tree Project (LTP) [47,48,49]. LTP is an ambitious project to construct a single 16S rRNA tree based on all available type strains of hitherto named species of Archaea and Bacteria. The latest release, LTPs115 [50], of March 2014, was based on 366 archaeal and 9905 bacterial 16S rRNA sequences. However, the 104-page PDF of the tree is hard to comprehend, especially when it comes to comparing the tree branchings with classification at various taxonomic ranks. We downloaded the tree and lineage information files LTPs115_SSU_tree.newick and LTPs115_SSU.csv from the LTP web site [50] and then collapsed the full tree to various taxonomic ranks where possible.
We first extracted the Archaea branch containing 366 leaves and collapsed it essentially to the rank class, without any lineage modification (figure not shown). This branch was cut from the original “All-Species Living Tree” LTPs115 [50], built from all 366 archaeal and 9905 bacterial 16S rRNA sequences.
The collapsed tree contained a line <C>Methanomicrobia{71/72}, indicating that one outlier violated the monophyly of the class. Inspection of the figure showed the outlier to be:
<O>Unclassified_Methanomicrobia · · · <T>HQ896499 · · · Unclassified_Methanomicrobia
It was located next to the monophyletic <C>Thermoplasmata{8}. Therefore, it does not look like an “Unclassified_Methanomicrobia”, but might be a misclassified member of Thermoplasmata. Judging by its close neighborhood, we may temporarily modify its lineage to:
<C>Thermoplasmata<O>Thermoplasmatales<F>Thermoplasmataceae<G>Methanomassiliicoccus· · ·
After this lineage modification, we obtain Figure 1. The branchings in Figure 1 fully agree with the taxonomy of Archaea outlined in Section 3 at the phylum and class ranks. In particular, the eight classes of Euryarchaeota all appear as well-defined monophyletic branches. Furthermore, if one expands the class Methanomicrobia, its three subordinate orders, Methanocellales{3}, Methanosarcinales{31} and Methanomicrobiales{37}, all appear as monophyletic branches (not shown in Figure 1). The definition of orders within Thermoprotei, the only class in Crenarchaeota, is somewhat problematic (more on this point near the end of Subsection 4.2).
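A lineage modification such as the one applied above, written in the “old_lineage new_lineage” style accepted by CVTree3, can be mimicked in a few lines. This hypothetical sketch is not the code of the Living Tree Viewer or of CVTree3; it assumes that the written part of the old lineage matches a leaf when every rank it mentions agrees, that the new lineage overwrites exactly the ranks it mentions, and that elision dots “· · ·” are ignored.

```python
import re

# One <rank>name token; a name runs up to the next "<" (rank letters as in Section 2).
TOKEN = re.compile(r"<([DPCOFGST])>([^<]*)")


def parse_lineage(text):
    """Parse '<D>Archaea<P>Euryarchaeota...' into {'D': 'Archaea', 'P': 'Euryarchaeota', ...}.

    Elision dots '·' and surrounding spaces are stripped from each name.
    """
    return {rank: name.strip(" ·") for rank, name in TOKEN.findall(text)}


def modify_lineages(lineages, old, new):
    """Apply a trial 'old_lineage new_lineage' modification in place.

    Every leaf whose lineage agrees with all ranks written in `old` receives the
    ranks written in `new`; the list of modified leaves is returned.
    """
    old_ranks, new_ranks = parse_lineage(old), parse_lineage(new)
    if not old_ranks:
        raise ValueError("the old lineage contains no <rank>name token")
    matched = [leaf for leaf, lineage in lineages.items()
               if all(lineage.get(rank) == name for rank, name in old_ranks.items())]
    if not matched:
        raise ValueError("the old lineage does not match any leaf")
    for leaf in matched:
        lineages[leaf].update(new_ranks)
    return matched
```

After such a modification, the collapsed tree would be recomputed, which corresponds to the “re-collapsing” step described in Section 2.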
This kind of agreement is to be expected, as archaeal taxonomy is largely based on 16S rRNA sequence analysis. However, because the LTP is by design restricted to type strains with validly published names, it cannot be used to check the positions of the newly proposed phyla or of strains lacking a definite lineage. The whole-genome-based CVTree approach may complement these aspects of phylogeny, since the criterion for including a strain in the tree is the availability of a sequenced genome, independent of its standing in nomenclature. In Subsection 4.3, the CVTree results are compared with 16S rRNA analyses done by other authors.
Figure 1. The Archaea branch in the All-Species Living Tree based on 366 16S rRNA sequences. The tree has been collapsed to the rank class (<C>), and only one lineage modification has been made. Numerals in curly brackets indicate the number of sequences contained in a collapsed branch. The collapsing and lineage modification were performed using a web server similar to CVTree3. This Living Tree Viewer is accessible to all users [51].

4.2. The Whole-Genome-Based CVTree Phylogeny

CVTrees based on 179 Archaea, 2850 Bacteria and eight Eukarya genomes were generated using CVTree3 [21], the improved version of the web server [20]. Figure 2 shows the Archaea part of this large CVTree. In inspecting the figure, we pay particular attention to the newly proposed phyla and to taxa with incomplete or suspicious lineage information.
Figure 2. The 179-genome Archaea branch of CVTree obtained using the CVTree3 web server [21] without lineage modifications. It has been collapsed to the rank class where possible. Only the branching order is to be compared with taxonomy; branch lengths are not drawn to scale.
In what follows, the non-monophyletic branches are summarized and possible lineage modifications are suggested.
The first line of Figure 2, <F>Halobacteriaceae{28+1}, indicates that among the 29 genomes there was one without proper lineage information. It was Halophilic_archaeon_DL31_uid72619, a name that is not validly published and does not follow the basic rule for a binomen. Its NCBI lineage from phylum down to genus was “unclassified”. However, expanding this line shows that the strain is located deep inside the class Halobacteria (see Figure 4). As the class at present consists of only one order, which in turn consists of one family [33], it is safe to assign this strain to a yet unspecified genus within that family. This modification would yield a monophyletic branch, Halobacteria{29}.
The fourth line of Figure 2 <P>Euryarchaeota{0+3} represents a cluster obtained by collapsing three strains (not explicitly written in the figure):
  • Thermoplasmatales_archaeon_BRNA1_uid195930, with NCBI lineage <C>Thermoplasmata<O>Unclassified<F>Unclassified;
  • Candidatus_Methanomethylophilus_alvus_Mx1201_uid196597, with NCBI lineage <C>Unclassified<O>Unclassified<F>Unclassified;
  • Methanomassiliicoccus_sp_Mx1_Issoire_uid207287, with NCBI lineage <C>Methanomicrobia<O>Unclassified<F>Unclassified.
If the NCBI lineages were accepted, two of the above strains would violate the monophyly of the classes Thermoplasmata{4/5} and Methanomicrobia{24/25}. However, the fact that these three strains together form a monophyletic branch hints at the possibility of assigning them to a yet unspecified class. This modification would restore the monophyly of the two classes, Methanomicrobia{24} (Line 5 in Figure 2) and Thermoplasmata{4} (Line 3 in Figure 2), as seen in Figure 2.
The newly proposed phylum Thaumarchaeota appears to be non-monophyletic, because an outlying strain, Candidatus Caldiarchaeum subterraneum, was assigned to this phylum in the NCBI taxonomy. The NCBI assignment might reflect its position in some phylogenetic trees based on concatenated proteins, e.g., Figure 2 in [52]. However, in the original paper reporting the discovery of this strain [44] and in recent 16S rRNA studies, e.g., [46], Candidatus Caldiarchaeum subterraneum was proposed to form a new phylum, Aigarchaeota. CVTrees support the introduction of this new phylum. Changing the lineage of Candidatus Caldiarchaeum subterraneum from Thaumarchaeota to Aigarchaeota would make Thaumarchaeota monophyletic.
The Candidatus genus Aciduliprofundum is considered a member of the DHEV2 (deep-sea hydrothermal vent euryarchaeotic 2) phylogenetic cluster. No taxonomic information was given in the original papers [53,54], and the NCBI taxonomy did not provide definite lineage information for this taxon at the class, order and family ranks. According to [53], the whole DHEV2 cluster was located close to Thermoplasmatales in a maximum-likelihood analysis of 16S rRNA sequences. A similar placement was seen in [52], where a Bayesian tree of the archaeal domain based on a concatenation of 57 ribosomal proteins put a lone Aciduliprofundum next to Thermoplasmata. However, in CVTrees constructed for all K-values from 3 to 9, Aciduliprofundum is juxtaposed with the class Thermococci{18}. The observation in [54] that this organism shares a rare lipid structure with a few species of Thermococcales may hint at its possible association with the latter. If we temporarily presume the lineage:
<C>Thermococci<O>Unclassified<F>Unclassified<G>Aciduliprofundum · · ·
we would obtain a monophyletic class, <C>Thermococci{20}.
Since none of the 13 DHEV2 members listed in [53] has a sequenced genome so far, CVTree cannot determine the placement of the DHEV2 cluster as a whole for the time being. It remains an open problem whether DHEV2 is close to Thermoplasmata or to Thermococci, or whether a new class is needed to accommodate it.
The new phylum Korarchaeota violates the monophyly of the phylum Crenarchaeota by drawing the family Thermofilaceae to itself. However, in an ongoing study of ours (not yet published) using a much larger dataset, this violation no longer shows up, and both Korarchaeota and Crenarchaeota regain their phylum status. Given that both Korarchaeota and Thermofilaceae are represented by single species for the time being, their placement certainly requires further study with broader sampling of genomes.
However, it is worth noting that the whole lower cluster of Figure 2 supports a recent proposal for a new “TACK” superphylum [55], made of Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota.
After making all of the aforementioned lineage modifications, the resulting CVTree (not shown) looks much like Figure 2 with minor changes of some labels.
All eight classes of Euryarchaeota listed in Section 3 are well-defined on their own. In addition, a new class might be introduced for the three archaea without detailed lineage information, collapsed as <P>Euryarchaeota{0+3}. This last point cannot be checked in the All-Species Living Tree without extending it to organisms lacking validly published names.
We now turn to the orders in the single-class phylum Crenarchaeota. There is no a priori reason to expect that 16S rRNA sequence analysis and the CVTree approach should lead to identical tree branchings. Although all are assigned to Crenarchaeota, the forty-eight 16S rRNA sequences in the All-Species Living Tree and the 50 genomes in CVTree do not belong to the same set of organisms. Only the organisms in common can be compared.
Two orders, Sulfolobales and Thermoproteales, are monophyletic in both CVTree and 16S rRNA trees, putting aside the insertion of the single-species Korarchaeota into Thermoproteales in CVTree. The introduction of the new orders Acidilobales in 2009 [31] and Fervidicoccales in 2010 [32] violated the monophyly of the hitherto monophyletic order Desulfurococcales (the genus Acidilobus was considered part of Desulfurococcaceae before 2009). The main criterion given in [31] for distinguishing species of the new order from those of Desulfurococcales is acidophily, a point that might require further verification.
The CVTree results summarized above continue and extend a similar study [56] based on the 62 Archaea genomes available at the beginning of 2010. That the results remain consistent five years later, with 117 more genomes added, attests to the robustness of the CVTree approach.

4.3. Phylum Distribution in Other Phylogenies

The conclusions drawn above concerning the positions of the newly proposed phyla and of organisms with uncertain lineage information cannot be compared directly with the All-Species Living Tree Project [47,48,49], as by design LTP only includes strains with validly published names and standing in nomenclature. For this comparison, one must turn to other published studies.
An effective way of comprehending a tree with many leaves is to collapse its branches to appropriate taxonomic ranks, as we did in Figure 1 and Figure 2. For the published results of other authors, we collapsed their trees manually. Figure 3 shows four such trees, collapsed to the phylum level from the corresponding trees in [44] and [52]. Figure 3a is a maximum-likelihood tree of concatenated SSU and LSU rRNAs using 3063 nucleotide positions; Figure 3b is a maximum-likelihood tree of 45 concatenated ribosomal proteins and nine RNA polymerase subunits using 5993 aligned amino acids; and Figure 3c is a maximum-likelihood tree of translation elongation factor EF-2 proteins based on 590 residues. All three subfigures were obtained by collapsing Figure 4 in [44]. Figure 3d was collapsed from a Bayesian tree based on a concatenation of 67 ribosomal proteins from 89 genomes (Figure 2 in [52]).
Figure 3. Archaea trees collapsed to phyla. Abbreviations: A = Aigarchaeota, C = Crenarchaeota, E = Euryarchaeota, K = Korarchaeota, P = Parvarchaeota, N = Nanoarchaeota, T = Thaumarchaeota. (a–c) Obtained by collapsing Figure 4 in [44]; (d) obtained by collapsing Figure 2 in [52]. Numerals in parentheses indicate the number of species represented in each phylum. For details, see the text and the cited papers.
The interrelationship among phyla deduced from a limited number of representatives in a tree may change as more data become available. In 2001, when there was only one genome from each of the bacterial phyla Aquificae and Thermotogae, it was speculated that these two phyla form a clade [57,58]. A decade later, it was observed that, although the two remained in a big cluster, many other phyla had become inserted between Aquificae and Thermotogae; see, e.g., [10]. This caveat applies especially to archaeal phyla that are currently represented by only one genome.
By comparing our Figure 2 with trees in Figure 3, we see:
The newly proposed phyla Thaumarchaeota, Korarchaeota and Aigarchaeota are supported in many phylogenies; in particular, the superphylum “TACK” is supported in most phylogenies, with “TAC” as a persistent core.
The nano-sized archaeal symbiont Nanoarchaeum equitans has a highly reduced genome (490,885 bp [42]). It is the only described representative of the newly proposed phylum Nanoarchaeota, and it cuts into the otherwise monophyletic phylum Euryarchaeota. We note that the monophyly of Euryarchaeota was also violated by Nanoarchaeum in some 16S rRNA trees, see, e.g., Figure 4 in a 2009 paper [59], as well as in the trees of Figure 3c,d. It is known that the tiny genomes of endosymbiotic microbes tend to move toward the base of a tree and distort the overall picture. In fact, we have suggested skipping such tiny genomes when studying bacterial phylogeny; see, e.g., [28] and a note on the home page of the CVTree web server [20]. In the present case, we can at most say that Nanoarchaeota probably forms a separate phylum, but its cutting into Euryarchaeota might be a side effect of its tiny, highly reduced genome.
So far, we have concentrated on the “mega-classification” [14] of Archaea, mainly their taxonomy at the rank order and above. Quite recently, a proposal [35] appeared to split the single-order class Halobacteria into three orders: Haloferacales, Natrialbales and Halobacteriales. To check whether CVTree supports this proposal, the class Halobacteria{29}, the first line in Figure 2, is expanded in Figure 4. Indeed, three main branches are clearly seen in Figure 4, corresponding to the three proposed orders, except for a single genus, Halalkalicoccus, which did not take a definite position even in the trees obtained by different methods in [35]. Given the predictive power CVTree has shown previously, we anticipate that the position of Halalkalicoccus in Figure 4 may better reflect reality, a point verifiable in the future.
Figure 4. The class, Halobacteria, expanded to the genus level.

5. Conclusions

The CVTree approach to prokaryotic phylogeny differs from 16S rRNA sequence analysis both in the input data (whole genomes instead of rRNA sequences) and in the methodology (K-peptide counting instead of sequence alignment). Agreement between the two approaches makes the results more objective and convincing, whereas the few discrepancies call for further study. A phylogenetic study across many phyla naturally emphasizes building a robust backbone for classification. At the rank of order and above, whole-genome approaches are essentially simpler, as the only prerequisite is having the genomes at hand. Sooner or later, phylogenetic information and taxonomic placement will become by-products of genome analysis, since the cost of sequencing a prokaryotic genome will drop below the average expense of conventional phenotyping experiments. In this respect, a crucial factor is the availability of reliable, convenient and easy-to-use tools such as the CVTree web server. The technique of collapsing and expanding tree branches in an interactive display, together with automatic reporting of comparison results at all taxonomic ranks, makes large-scale studies more feasible. The experience accumulated in this study of 179 archaeal strains will be instructive for similar studies of Bacteria, which would cover a hundred times more strains.
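The K-peptide counting mentioned above can be illustrated with a minimal Python sketch of a composition vector and a correlation-based distance, following the general scheme described for CVTree [7,10]. Each K-peptide frequency is compared with a prediction built from (K-1)- and (K-2)-peptide frequencies, which subtracts random background, and two genomes are compared through the cosine of their vectors. This is a simplified sketch, not the web server's code: normalization is pooled over all proteins, K=3 is chosen only to keep the toy input small (the choice of K in practice is discussed in [27]), and the protein strings are made up.

```python
from collections import Counter, defaultdict
from math import sqrt


def peptide_freqs(proteins: list[str], k: int) -> dict[str, float]:
    """Relative frequencies of all length-k substrings, pooled over all proteins."""
    counts: Counter = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def composition_vector(proteins: list[str], k: int) -> dict[str, float]:
    """a(w) = (f(w) - f0(w)) / f0(w), with f0(w) = f(w[:-1]) f(w[1:]) / f(w[1:-1])."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk = peptide_freqs(proteins, k)
    fk1 = peptide_freqs(proteins, k - 1)
    fk2 = peptide_freqs(proteins, k - 2)
    by_prefix: dict[str, list[str]] = defaultdict(list)
    for v in fk1:
        by_prefix[v[:-1]].append(v)
    cv: dict[str, float] = {}
    # Candidate words: observed (K-1)-prefix u and (K-1)-suffix v that overlap,
    # so f0 > 0 even for words that never occur (those get a = -1).
    for u in fk1:
        for v in by_prefix[u[1:]]:
            w = u + v[-1]
            f0 = fk1[u] * fk1[v] / fk2[u[1:]]
            cv[w] = (fk.get(w, 0.0) - f0) / f0
    return cv


def cv_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """D = (1 - C) / 2, where C is the cosine of the two composition vectors."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(y * y for y in b.values()))
    if norm == 0.0:
        raise ValueError("composition vector is all zeros")
    return (1.0 - dot / norm) / 2.0


# Made-up toy proteomes; real input would be complete predicted proteomes.
genome_a = ["MKTAYIAKQRQISFVKSHFSRQ", "MKVLAAGIVALLAAK"]
genome_b = ["MKTAYIAKQRQISFVKSHFARQ", "MSTNPKPQRKTKRNTNRRPQ"]
cv_a = composition_vector(genome_a, 3)
cv_b = composition_vector(genome_b, 3)
print(round(cv_distance(cv_a, cv_b), 3))
```

Because the correlation lies between -1 and 1, the distance lies between 0 (identical composition) and 1, and no alignment step is needed at any point.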
16S rRNA sequence analysis will remain an indispensable tool in microbiology, since the number of sequenced genomes will never catch up with the number of rRNA sequences. Although the CVTree results agree with the 16S rRNA results more than they disagree, the differences between the two approaches certainly deserve in-depth scrutiny. In addition, since high resolving power at the species level and below is a prominent advantage of CVTree over 16S rRNA sequence analysis [12,60], we will elaborate on this aspect once the number of sequenced archaeal genomes has increased substantially.


Acknowledgments

The authors acknowledge the support of the National Basic Research Program of China (973 Project Grant No. 2013CB34100), The State Key Laboratory of Applied Surface Physics and the Department of Physics, Fudan University. An early discussion with Jiandong Sun on problems raised in this study is also gratefully acknowledged. The authors thank the three anonymous reviewers for their essential comments and suggestions, which improved the manuscript.

Author Contributions

Bailin Hao designed the study and wrote the manuscript. Guanghong Zuo and Zhao Xu built and maintained the web server, collected the data and carried out the calculations. Guanghong Zuo and Bailin Hao performed the analysis. All authors have read and approved the final manuscript.

Appendix: List of Genomes Used in This Study

All 179 genomes used in the present study are listed in the following table, together with their accession numbers and approximate proteome sizes (in 10⁶ amino acids). The 165 genomes from the NCBI FTP site [19] come with uid numbers, whereas the 14 uploaded ones appear without a uid. We note that the EBI list of Archaea [22] contains 176 species; excluding a tiny one, 175 genomes remain. The four genomes present at NCBI but absent at EBI are Nos. 31, 40, 106 and 137.
Table A1. List of genomes used in this study.
| No. | Name of Strain | Proteome Size (10⁶ aa) | Accession Number |
|---|---|---|---|
| 1 | Acidianus hospitalis W1 uid66875 | 0.62 | NC_015518 |
| 2 | Acidilobus saccharovorans 345 15 uid51395 | 0.45 | NC_014374 |
| 3 | Aciduliprofundum boonei T469 uid43333 | 0.47 | NC_013926 |
| 4 | Aciduliprofundum sp. MAR08 339 uid184407 | 0.45 | NC_019942 |
| 5 | Aeropyrum camini SY1 JCM 12091 uid222311 | 0.47 | NC_022521 |
| 6 | Aeropyrum pernix K1 uid57757 | 0.49 | NC_000854 |
| 7 | Archaeoglobus fulgidus DSM 4304 uid57717 | 0.67 | NC_000917 |
| 8 | Archaeoglobus fulgidus DSM 8774 | 0.69 | CP006577 |
| 9 | Archaeoglobus profundus DSM 5631 uid43493 | 0.48 | NC_013741 |
| 10 | Archaeoglobus sulfaticallidus PM70 1 uid201033 | 0.61 | NC_021169 |
| 11 | Archaeoglobus veneficus SNP6 uid65269 | 0.56 | NC_015320 |
| 12 | Caldisphaera lagunensis DSM 15908 uid183486 | 0.44 | NC_019791 |
| 13 | Caldivirga maquilingensis IC 167 uid58711 | 0.60 | NC_009954 |
| 14 | Candidatus Caldiarchaeum subterraneum uid227223 | 0.51 | NC_022786 |
| 15 | Candidatus Korarchaeum cryptofilum OPF8 uid58601 | 0.48 | NC_010482 |
| 16 | Candidatus Methanomethylophilus alvus Mx1201 uid196597 | 0.49 | NC_020913 |
| 17 | Candidatus Nitrosopumilus koreensis AR1 uid176129 | 0.47 | NC_018655 |
| 18 | Candidatus Nitrosopumilus sp. AR2 uid176130 | 0.49 | NC_018656 |
| 19 | Candidatus Nitrososphaera evergladensis SR1 | 0.82 | CP007174 |
| 20 | Candidatus Nitrososphaera gargensis Ga9 2 uid176707 | 0.77 | NC_018719 |
| 21 | Methanomassiliicoccus sp. Mx1 Issoire uid207287 | 0.56 | NC_021353 |
| 22 | Cenarchaeum symbiosum A uid61411 | 0.62 | NC_014820 |
| 23 | Desulfurococcus fermentans DSM 16532 uid75119 | 0.40 | NC_018001 |
| 24 | Desulfurococcus kamchatkensis 1221n uid59133 | 0.40 | NC_011766 |
| 25 | Desulfurococcus mucosus DSM 2162 uid62227 | 0.39 | NC_014961 |
| 26 | Ferroglobus placidus DSM 10642 uid40863 | 0.66 | NC_013849 |
| 27 | Ferroplasma acidarmanus fer1 uid54095 | 0.57 | NC_021592 |
| 28 | Fervidicoccus fontis Kam940 uid162201 | 0.38 | NC_017461 |
| 29 | Halalkalicoccus jeotgali B3 uid50305 | 0.83 | NC_014297 |
| 30 | Haloarcula hispanica ATCC 33960 uid72475 | 1.00 | NC_015943² |
| 31 | Haloarcula hispanica N601 uid230920 | 0.98 | NC_023010² |
| 32 | Haloarcula marismortui ATCC 43049 uid57719 | 0.97 | NC_006397² |
| 33 | Halobacterium salinarum R1 uid61571 | 0.60 | NC_010364 |
| 34 | Halobacterium sp. DL1 | 0.83 | CP007060 |
| 35 | Halobacterium sp. NRC 1 uid57769 | 0.59 | NC_002607 |
| 36 | Haloferax mediterranei ATCC 33500 uid167315 | 0.84 | NC_017941 |
| 37 | Haloferax volcanii DS2 uid46845 | 0.82 | NC_013967 |
| 38 | Halogeometricum borinquense DSM 11551 uid54919 | 0.82 | NC_014729 |
| 39 | Halomicrobium mukohataei DSM 12286 uid59107 | 0.90 | NC_013202 |
| 40 | Halophilic archaeon DL31 uid72619 | 0.81 | NC_015954 |
| 41 | Halopiger xanaduensis SH 6 uid68105 | 1.05 | NC_015666 |
| 42 | Haloquadratum walsbyi C23 uid162019 | 0.77 | NC_017459 |
| 43 | Haloquadratum walsbyi DSM 16790 uid58673 | 0.78 | NC_008212 |
| 44 | Halorhabdus tiamatea SARL4B uid214082 | 0.79 | NC_021921 |
| 45 | Halorhabdus utahensis DSM 12940 uid59189 | 0.91 | NC_013158 |
| 46 | Halorubrum lacusprofundi ATCC 49239 uid58807 | 0.93 | NC_012029² |
| 47 | Halostagnicola larsenii XH-48 | 0.78 | CP007055 |
| 48 | Haloterrigena turkmenica DSM 5511 uid43501 | 1.09 | NC_013743 |
| 49 | Halovivax ruber XH 70 uid184819 | 0.91 | NC_019964 |
| 50 | Hyperthermus butylicus DSM 5456 uid57755 | 0.45 | NC_008818 |
| 51 | Ignicoccus hospitalis KIN4 I uid58365 | 0.40 | NC_009776 |
| 52 | Ignisphaera aggregans DSM 17230 uid51875 | 0.54 | NC_014471 |
| 53 | Metallosphaera cuprina Ar 4 uid66329 | 0.54 | NC_015435 |
| 54 | Metallosphaera sedula DSM 5348 uid58717 | 0.64 | NC_009440 |
| 55 | Methanobacterium formicicum strain BRM9 | 0.67 | CP006933 |
| 56 | Methanobacterium sp. AL 21 uid63623 | 0.72 | NC_015216 |
| 57 | Methanobacterium sp. MB1 complete sequence uid231690 | 0.56 | NC_023044 |
| 58 | Methanobacterium sp. SWAN 1 uid67359 | 0.66 | NC_015574 |
| 59 | Methanobrevibacter ruminantium M1 uid45857 | 0.76 | NC_013790 |
| 60 | Methanobrevibacter smithii ATCC 35061 uid58827 | 0.56 | NC_009515 |
| 61 | Methanobrevibacter sp. AbM4 uid206516 | 0.50 | NC_021355 |
| 62 | Methanocaldococcus fervens AG86 uid59347 | 0.44 | NC_013156 |
| 63 | Methanocaldococcus infernus ME uid48803 | 0.41 | NC_014122 |
| 64 | Methanocaldococcus jannaschii DSM 2661 uid57713 | 0.48 | NC_000909 |
| 65 | Methanocaldococcus sp. JH146 | 0.47 | CP009149 |
| 66 | Methanocaldococcus sp. FS406 22 uid42499 | 0.51 | NC_013887 |
| 67 | Methanocaldococcus vulcanius M7 uid41131 | 0.49 | NC_013407 |
| 68 | Methanocella arvoryzae MRE50 uid61623 | 0.89 | NC_009464 |
| 69 | Methanocella conradii HZ254 uid157911 | 0.70 | NC_017034 |
| 70 | Methanocella paludicola SANAE uid42887 | 0.86 | NC_013665 |
| 71 | Methanococcoides burtonii DSM 6242 uid58023 | 0.69 | NC_007955 |
| 72 | Methanococcus aeolicus Nankai 3 uid58823 | 0.44 | NC_009635 |
| 73 | Methanococcus maripaludis C5 uid58741 | 0.51 | NC_009135 |
| 74 | Methanococcus maripaludis C6 uid58947 | 0.51 | NC_009975 |
| 75 | Methanococcus maripaludis C7 uid58847 | 0.51 | NC_009637 |
| 76 | Methanococcus maripaludis KA1 DNA | 0.55 | AP011526 |
| 77 | Methanococcus maripaludis OS7 DNA | 0.52 | AP011528 |
| 78 | Methanococcus maripaludis S2 uid58035 | 0.49 | NC_005791 |
| 79 | Methanococcus maripaludis X1 uid70729 | 0.51 | NC_015847 |
| 80 | Methanococcus vannielii SB uid58767 | 0.49 | NC_009634 |
| 81 | Methanococcus voltae A3 uid49529 | 0.51 | NC_014222 |
| 82 | Methanocorpusculum labreanum Z uid58785 | 0.52 | NC_008942 |
| 83 | Methanoculleus bourgensis MS2T uid171377 | 0.77 | NC_018227 |
| 84 | Methanoculleus marisnigri JR1 uid58561 | 0.72 | NC_009051 |
| 85 | Methanohalobium evestigatum Z 7303 uid49857 | 0.63 | NC_014253 |
| 86 | Methanohalophilus mahii DSM 5219 uid47313 | 0.59 | NC_014002 |
| 87 | Methanolobus psychrophilus R15 uid177925 | 0.87 | NC_018876 |
| 88 | Methanomethylovorans hollandica DSM 15978 uid184864 | 0.69 | NC_019977 |
| 89 | Methanoplanus petrolearius DSM 11571 uid52695 | 0.83 | NC_014507 |
| 90 | Methanopyrus kandleri AV19 uid57883 | 0.50 | NC_003551 |
| 91 | Methanoregula boonei 6A8 uid58815 | 0.73 | NC_009712 |
| 92 | Methanoregula formicicum SMSP uid184406 | 0.81 | NC_019943 |
| 93 | Methanosaeta concilii GP6 uid66207 | 0.84 | NC_015416 |
| 94 | Methanosaeta harundinacea 6Ac uid81199 | 0.73 | NC_017527 |
| 95 | Methanosaeta thermophila PT uid58469 | 0.51 | NC_008553 |
| 96 | Methanosalsum zhilinae DSM 4017 uid68249 | 0.61 | NC_015676 |
| 97 | Methanosarcina acetivorans C2A uid57879 | 1.42 | NC_003552 |
| 98 | Methanosarcina barkeri str Fusaro uid57715 | 1.12 | NC_007355 |
| 99 | Methanosarcina mazei Go1 uid57893 | 1.02 | NC_003901 |
| 100 | Methanosarcina mazei Tuc01 uid190185 | 0.82 | NC_020389 |
| 101 | Methanosphaera stadtmanae DSM 3091 uid58407 | 0.49 | NC_007681 |
| 102 | Methanosphaerula palustris E1 9c uid59193 | 0.82 | NC_011832 |
| 103 | Methanospirillum hungatei JF 1 uid58181 | 1.01 | NC_007796 |
| 104 | Methanothermobacter marburgensis str Marburg uid51637 | 0.49 | NC_014408 |
| 105 | Methanothermobacter thermautotrophicus str Delta H uid57877 | 0.53 | NC_000916 |
| 106 | Methanothermobacter thermautotrophicus CaT2 DNA | 0.51 | AP011952 |
| 107 | Methanothermococcus okinawensis IH1 uid51535 | 0.45 | NC_015636 |
| 108 | Methanothermus fervidus DSM 2088 uid60167 | 0.38 | NC_014658 |
| 109 | Methanotorris igneus Kol 5 uid67321 | 0.51 | NC_015562 |
| 110 | Nanoarchaeum equitans Kin4 M uid58009 | 0.15 | NC_005213 |
| 111 | Natrialba magadii ATCC 43099 uid46245 | 1.05 | NC_013922 |
| 112 | Natrinema pellirubrum DSM 15624 uid74437 | 1.06 | NC_019962 |
| 113 | Natrinema sp. J7 2 uid171337 | 1.05 | NC_018224 |
| 114 | Natronobacterium gregoryi SP2 uid74439 | 1.04 | NC_019792 |
| 115 | Natronococcus occultus SP4 uid184863 | 1.12 | NC_019974 |
| 116 | Natronomonas moolapensis 8 8 11 uid190182 | 0.82 | NC_020388 |
| 117 | Natronomonas pharaonis DSM 2160 uid58435 | 0.78 | NC_007426 |
| 118 | Nitrosopumilus maritimus SCM1 uid58903 | 0.49 | NC_010085 |
| 119 | Nitrososphaera viennensis EN76 | 0.73 | CP007536 |
| 120 | Palaeococcus pacificus DY20341 | 0.56 | CP006019 |
| 121 | Picrophilus torridus DSM 9790 uid58041 | 0.47 | NC_005877 |
| 122 | Pyrobaculum aerophilum str IM2 uid57727 | 0.66 | NC_003364 |
| 123 | Pyrobaculum arsenaticum DSM 13514 uid58409 | 0.61 | NC_009376 |
| 124 | Pyrobaculum calidifontis JCM 11548 uid58787 | 0.61 | NC_009073 |
| 125 | Pyrobaculum islandicum DSM 4184 uid58635 | 0.53 | NC_008701 |
| 126 | Pyrobaculum neutrophilum V24Sta uid58421 | 0.53 | NC_010525 |
| 127 | Pyrobaculum oguniense TE7 uid84411 | 0.71 | NC_016885 |
| 128 | Pyrobaculum sp. 1860 uid82379 | 0.73 | NC_016645 |
| 129 | Pyrococcus abyssi GE5 uid62903 | 0.54 | NC_000868 |
| 130 | Pyrococcus furiosus COM1 uid169620 | 0.57 | NC_018092 |
| 131 | Pyrococcus furiosus DSM 3638 uid57873 | 0.59 | NC_003413 |
| 132 | Pyrococcus horikoshii OT3 uid57753 | 0.55 | NC_000961 |
| 133 | Pyrococcus sp. NA2 uid66551 | 0.57 | NC_015474 |
| 134 | Pyrococcus sp. ST04 uid167261 | 0.52 | NC_017946 |
| 135 | Pyrococcus yayanosii CH1 uid68281 | 0.51 | NC_015680 |
| 136 | Pyrolobus fumarii 1A uid73415 | 0.54 | NC_015931 |
| 137 | Salinarchaeum sp. Harcht Bsk1 uid207001 | 0.91 | NC_021313 |
| 138 | Staphylothermus hellenicus DSM 12710 uid45893 | 0.46 | NC_014205 |
| 139 | Staphylothermus marinus F1 uid58719 | 0.46 | NC_009033 |
| 140 | Sulfolobus acidocaldarius DSM 639 uid58379 | 0.63 | NC_007181 |
| 141 | Sulfolobus acidocaldarius N8 uid189027 | 0.62 | NC_020246 |
| 142 | Sulfolobus acidocaldarius Ron12 I uid189028 | 0.64 | NC_020247 |
| 143 | Sulfolobus acidocaldarius SUSAZ uid232254 | 0.59 | NC_023069 |
| 144 | Sulfolobus islandicus HVE10 4 uid162067 | 0.76 | NC_017275 |
| 145 | Sulfolobus islandicus L D 8 5 uid43679 | 0.77 | NC_013769 |
| 146 | Sulfolobus islandicus L S 2 15 uid58871 | 0.76 | NC_012589 |
| 147 | Sulfolobus islandicus LAL14 1 uid197216 | 0.71 | NC_021058 |
| 148 | Sulfolobus islandicus M 14 25 uid58849 | 0.74 | NC_012588 |
| 149 | Sulfolobus islandicus M 16 27 uid58851 | 0.76 | NC_012632 |
| 150 | Sulfolobus islandicus M 16 4 uid58841 | 0.75 | NC_012726 |
| 151 | Sulfolobus islandicus REY15A uid162071 | 0.72 | NC_017276 |
| 152 | Sulfolobus islandicus Y G 57 14 uid58923 | 0.78 | NC_012622 |
| 153 | Sulfolobus islandicus Y N 15 51 uid58825 | 0.77 | NC_012623 |
| 154 | Sulfolobus solfataricus 98 2 uid167998 | 0.72 | NC_017274 |
| 155 | Sulfolobus solfataricus P2 uid57721 | 0.84 | NC_002754 |
| 156 | Sulfolobus tokodaii str 7 uid57807 | 0.76 | NC_003106 |
| 157 | Thermococcus barophilus MP uid54733 | 0.62 | NC_014804 |
| 158 | Thermococcus eurythermalis strain A501 | 0.60 | CP008887 |
| 159 | Thermococcus gammatolerans EJ3 uid59389 | 0.64 | NC_012804 |
| 160 | Thermococcus kodakarensis KOD1 uid58225 | 0.64 | NC_006624 |
| 161 | Thermococcus litoralis DSM 5473 uid82997 | 0.67 | NC_022084 |
| 162 | Thermococcus nautili strain 30 1 | 0.61 | CP007264 |
| 163 | Thermococcus onnurineus NA1 uid59043 | 0.56 | NC_011529 |
| 164 | Thermococcus sibiricus MM 739 uid59399 | 0.55 | NC_012883 |
| 165 | Thermococcus sp. 4557 uid70841 | 0.61 | NC_015865 |
| 166 | Thermococcus sp. AM4 uid54735 | 0.63 | NC_016051 |
| 167 | Thermococcus sp. CL1 uid168259 | 0.58 | NC_018015 |
| 168 | Thermococcus sp. ES1 | 0.58 | CP006965 |
| 169 | Thermofilum pendens Hrk 5 uid58563 | 0.54 | NC_008698 |
| 170 | Thermofilum sp. 1910b uid215374 | 0.52 | NC_022093 |
| 171 | Thermogladius cellulolyticus 1633 uid167488 | 0.41 | NC_017954 |
| 172 | Thermoplasma acidophilum DSM 1728 uid61573 | 0.45 | NC_002578 |
| 173 | Thermoplasma volcanium GSS1 uid57751 | 0.45 | NC_002689 |
| 174 | Thermoplasmatales archaeon BRNA1 uid195930 | 0.44 | NC_020892 |
| 175 | Thermoproteus tenax Kra 1 uid74443 | 0.55 | NC_016070 |
| 176 | Thermoproteus uzoniensis 768 20 uid65089 | 0.59 | NC_015315 |
| 177 | Thermosphaera aggregans DSM 11486 uid48993 | 0.40 | NC_014160 |
| 178 | Vulcanisaeta distributa DSM 14429 uid52827 | 0.71 | NC_014537 |
| 179 | Vulcanisaeta moutnovskia 768 28 uid63631 | 0.67 | NC_015151 |
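
The proteome sizes in Table A1 make it easy to spot genomes small enough to distort a tree, such as Nanoarchaeum equitans (0.15 × 10⁶ aa). As a minimal sketch, the Python snippet below flags strains whose proteome is smaller than a fixed fraction of the median size. The rule and the 0.5 cutoff are hypothetical illustrations, not the criterion used in this study, and only five rows of the table are copied in.

```python
from statistics import median

# Proteome sizes in 10^6 amino acids, copied from five rows of Table A1.
proteome_size = {
    "Nanoarchaeum equitans Kin4 M": 0.15,
    "Methanothermus fervidus DSM 2088": 0.38,
    "Fervidicoccus fontis Kam940": 0.38,
    "Haloferax volcanii DS2": 0.82,
    "Methanosarcina acetivorans C2A": 1.42,
}


def flag_tiny(sizes: dict[str, float], fraction: float = 0.5) -> list[str]:
    """Return strains whose proteome is below `fraction` x the median size.

    The fraction-of-median rule is a hypothetical screening heuristic.
    """
    cutoff = fraction * median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < cutoff)


print(flag_tiny(proteome_size))  # ['Nanoarchaeum equitans Kin4 M']
```

A relative cutoff is used here so that the same rule could be reused on Bacteria, whose proteome sizes span a different range; flagged genomes would then be inspected, not dropped automatically.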

Conflicts of Interest

The authors declare no conflict of interest.


References

1. Woese, C.R.; Fox, G.E. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci. USA 1977, 74, 5088–5090.
2. Woese, C.R.; Kandler, O.; Wheelis, M.L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 1990, 87, 4576–4579.
3. Fox, G.E.; Magrum, L.J.; Balch, W.E.; Wolfe, R.S.; Woese, C.R. Classification of methanogenic bacteria by 16S ribosomal RNA characterization. Proc. Natl. Acad. Sci. USA 1977, 74, 4537–4541.
4. Fox, G.E.; Pechman, K.R.; Woese, C.R. Comparative cataloging of 16S ribosomal ribonucleic acid: Molecular approach to procaryotic systematics. Int. J. Syst. Bacteriol. 1977, 27, 44–57.
5. The Bergey’s Manual Trust. Bergey’s Manual of Systematic Bacteriology, 2nd ed.; Springer: New York, NY, USA, 2001–2012; Volumes 1–5.
6. Konstantinidis, K.T.; Tiedje, J.M. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 2005, 187, 6258–6264.
7. Qi, J.; Wang, B.; Hao, B. Whole genome prokaryote phylogeny without sequence alignment: A K-string composition approach. J. Mol. Evol. 2004, 58, 1–11.
8. Hao, B.; Qi, J. Prokaryote phylogeny without sequence alignment: From avoidance signature to composition distance. J. Bioinf. Comput. Biol. 2004, 2.
9. Gao, L.; Qi, J.; Sun, J.; Hao, B. Prokaryote phylogeny meets taxonomy: An exhaustive comparison of composition vector trees with systematic bacteriology. Sci. China Life Sci. 2007, 50, 587–599.
10. Li, Q.; Xu, Z.; Hao, B. Composition vector approach to whole-genome-based prokaryotic phylogeny: Success and foundations. J. Biotech. 2010, 149, 115–119.
11. Zuo, G.; Xu, Z.; Yu, H.; Hao, B. Jackknife and bootstrap tests of the composition vector trees. Genomics Proteomics Bioinform. 2010, 8, 262–267.
12. Hao, B. CVTrees support the Bergey’s systematics and provide high resolution at species level and below. Bull. BISMiS 2011, 2, 189–196.
13. Chan, P.P.; Cozen, A.E.; Lowe, T.M. Reclassification of Thermoproteus neutrophilus Stetter and Zillig 1989 as Pyrobaculum neutrophilum comb. nov., based on phylogenetic analysis. Int. J. Syst. Evol. Microbiol. 2013, 63, 751–759.
14. Cavalier-Smith, T. The neomuran origin of archaebacteria, the negibacterial root of the universal tree and bacterial megaclassification. Int. J. Syst. Evol. Microbiol. 2002, 52, 7–76.
15. Lapage, S.P.; Sneath, P.H.A.; Lessel, E.F.; Skerman, V.B.D.; Seeliger, H.P.R.; Clark, W.A. International Code of Nomenclature of Bacteria: Bacteriological Code 1990; ASM Press: Washington, DC, USA, 1992.
16. De Vos, P.; Trüper, H.G. Judicial Commission of the International Committee on Systematic Bacteriology. Int. J. Syst. Evol. Microbiol. 2000, 50, 2239–2244.
17. The GOLD (Genomes On Line Database) site. Available online: (accessed on 12 February 2015).
18. PATRIC (Pathosystems Resource Integration Center). Available online: (accessed on 12 February 2015).
19. The NCBI FTP site. Available online: (accessed on 27 February 2015).
20. Xu, Z.; Hao, B. CVTree update: A newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res. 2009, 37, W174–W178.
21. The much improved CVTree3 Web Server. Available online: (accessed on 25 February 2015).
22. The EBI Archaea genome list. Available online: (accessed on 15 February 2015).
23. Kimura, M. The Neutral Theory of Molecular Evolution; Cambridge University Press: Cambridge, UK, 1985.
24. Woese, C. The universal ancestor. Proc. Natl. Acad. Sci. USA 1998, 95, 6854–6859.
25. Wagner, A.; de la Chaux, N. Distant horizontal gene transfer is rare for multiple families of prokaryotic insertion sequences. Mol. Genet. Genomics 2008, 280, 397–408.
26. Qi, J.; Luo, H.; Hao, B. CVTree: A phylogenetic tree reconstruction tool based on whole genomes. Nucleic Acids Res. 2004, 32, W45–W47.
27. Zuo, G.; Li, Q.; Hao, B. On K-peptide length in composition vector phylogeny of prokaryotes. Comput. Biol. Chem. 2014, 53, 166–173.
28. Hao, B. Whole-genome based prokaryotic branches in the Tree of Life. In Darwin’s Heritage Today: Proceedings of the Darwin 200 Beijing International Conference; Long, M., Gu, H., Zhou, Z., Eds.; Higher Education Press: Beijing, China, 2010; pp. 102–113.
29. Garrity, G.M.; Holt, J.G. Taxonomic Outline of the Archaea and Bacteria. In Bergey’s Manual of Systematic Bacteriology, 2nd ed.; Boone, D.R., Castenholz, R.W., Eds.; Springer: New York, NY, USA, 2001; Volume 1, pp. 155–156.
30. Parte, A.C. LPSN--list of prokaryotic names with standing in Nomenclature. Nucleic Acids Res. 2014, 42, D613–D616.
31. Prokofeva, M.I.; Kostrikina, N.A.; Kolganova, T.V.; Tourova, T.P.; Lysenko, A.M.; Lebedinsky, A.V.; Bonch-Osmolovskaya, E.A. Isolation of the anaerobic thermoacidophilic crenarchaeote Acidilobus saccharovorans sp. nov. and proposal of Acidilobales ord. nov., including Acidilobaceae fam. nov. and Caldisphaeraceae fam. nov. Int. J. Syst. Evol. Microbiol. 2009, 59, 3116–3122.
32. Perevalova, A.A.; Bidzhieva, S.K.; Kublanov, I.V.; Hinrichs, K.-U.; Liu, X.L.; Mardanov, A.V.; Lebedinsky, A.V.; Bonch-Osmolovskaya, E.A. Fervidicoccus fontis gen. nov., sp. nov., an anaerobic, thermophilic crenarchaeote from terrestrial hot springs, and proposal of Fervidicoccaceae fam. nov. and Fervidicoccales ord. nov. Int. J. Syst. Evol. Microbiol. 2010, 60, 2082–2088.
33. Garrity, G.M.; Bell, J.A.; Lilburn, T.G. The Revised Roadmap to the Manual. In Bergey’s Manual of Systematic Bacteriology, 2nd ed.; Springer: New York, NY, USA, 2005; Volume 2, pp. 159–187.
34. Sakai, S.; Imachi, H.; Hanada, S.; Ohashi, A.; Harada, H.; Kamagata, Y. Methanocella paludicola gen. nov., sp. nov., a methane-producing archaeon, the first isolate of the lineage “Rice Cluster I”, and proposal of the new archaeal order Methanocellales ord. nov. Int. J. Syst. Evol. Microbiol. 2008, 58, 929–936.
35. Gupta, R.S.; Naushad, S.; Baker, S. Phylogenomic analyses and molecular signatures for the class Halobacteria and its two major clades: A proposal for division of the class Halobacteria into an emended order Halobacteriales and two new orders, Haloferacales ord. nov. and Natrialbales ord. nov. Int. J. Syst. Evol. Microbiol. 2014.
36. Barns, S.M.; Delwiche, C.F.; Palmer, J.D.; Pace, N.R. Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proc. Natl. Acad. Sci. USA 1996, 93, 9188–9193.
37. Auchtung, T.A.; Shyndriayeva, G.; Cavanaugh, C.M. 16S rRNA phylogenetic analysis and quantification of Korarchaeota indigenous to the hot springs of Kamchatka, Russia. Extremophiles 2011, 15, 105–116.
38. Brochier-Armanet, C.; Boussau, B.; Gribaldo, S.; Forterre, P. Mesophilic crenarchaeota: Proposal for a third archaeal phylum, the Thaumarchaeota. Nat. Rev. Microbiol. 2008, 6, 245–252.
39. Gupta, R.S.; Shami, A. Molecular signatures for the Crenarchaeota and Thaumarchaeota. Antonie van Leeuwenhoek 2011, 99, 133–157.
40. Pester, M.; Schleper, C.; Wagner, M. The Thaumarchaeota: An emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol. 2011, 14, 300–308.
41. Huber, H.; Hohn, M.J.; Rachel, R.; Fuchs, T.; Wimmer, V.C.; Stetter, K.O. A new phylum of Archaea represented by a nano-sized hyperthermophilic symbiont. Nature 2002, 417, 63–67.
42. Waters, E.; Hohn, M.J.; Ahel, I.; Graham, D.E.; Adams, M.D.; Barnstead, M.; Beeson, K.Y.; Bibbs, L.; Bolanos, R.; Keller, M.; et al. The genome of Nanoarchaeum equitans: Insights into early archaeal evolution and derived parasitism. Proc. Natl. Acad. Sci. USA 2003, 100, 12984–12988.
43. Clingenpeel, S.; Kan, J.; Macur, R.E.; Woyke, T.; Lavalvo, D.; Carley, J.; Inskeep, W.P.; Nealson, K.; McDermott, T. Yellowstone Lake Nanoarchaeota. Front. Microbiol. 2013, 4.
44. Nunoura, T.; Takaki, Y.; Kakuta, J.; Nishi, S.; Sugahara, J.; Kazama, H.; Chee, G.-J.; Hattori, M.; Kanai, A.; Atomi, H.; et al. Insights into the evolution of Archaea and eukaryotic protein modifier systems revealed by the genome of a novel archaeal group. Nucleic Acids Res. 2011, 39, 3204–3223.
45. Baker, B.J.; Comolli, L.R.; Dick, G.J.; Hauser, L.J.; Hyatt, D.; Dill, B.J.; Land, M.L.; VerBerkmoes, N.C.; Hettich, R.L.; Banfield, J.F. Enigmatic, ultrasmall, uncultivated Archaea. Proc. Natl. Acad. Sci. USA 2010, 107, 8806–8811.
46. Meng, J.; Xu, J.; Qin, D.; He, Y.; Xiao, X.; Wang, F. Genetic and functional properties of uncultivated MCG archaea assessed by metagenome and gene expression analyses. ISME J. 2014, 8, 650–659.
47. Yarza, P.; Richter, M.; Peplies, J.; Euzéby, J.; Amann, R.; Schleifer, K.-H.; Ludwig, W.; Glöckner, F.O.; Rosselló-Móra, R. The All-Species Living Tree project: A 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst. Appl. Microbiol. 2008, 31, 241–250.
48. Yarza, P.; Ludwig, W.; Euzéby, J.; Amann, R.; Schleifer, K.-H.; Glöckner, F.O.; Rosselló-Móra, R. Update of the All-Species Living Tree project based on 16S and 23S rRNA sequence analysis. Syst. Appl. Microbiol. 2010, 33, 291–299.
49. Yilmaz, P.; Wegener-Parfrey, L.; Yarza, P.; Gerken, J.; Pruesse, E.; Quast, C.; Schweer, T.; Peplies, J.; Ludwig, W.; Glöckner, F.O. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2014, 42, D643–D648.
50. LTPs115 web site. Available online: (accessed on 25 November 2014).
51. LVTree Viewer. Available online: (accessed on 25 November 2014).
52. Brochier-Armanet, C.; Forterre, P.; Gribaldo, S. Phylogeny and evolution of the Archaea: One hundred genomes later. Curr. Opin. Microbiol. 2011, 14, 274–281.
53. Reysenbach, A.-L.; Liu, Y.; Banta, A.B.; Beveridge, T.J.; Kirshtein, J.D.; Schouten, S.; Tivey, M.K.; von Damm, K.L.; Voytek, M.A. A ubiquitous thermoacidophilic archaeon from deep-sea hydrothermal vents. Nature 2006, 442, 444–447.
54. Schouten, S.; Baas, M.; Hopmans, E.C.; Reysenbach, A.-L.; Sinninghe Damste, J.S. Tetraether membrane lipids of Candidatus “Aciduliprofundum boonei”, a cultivated obligate thermoacidophilic euryarchaeote from deep-sea hydrothermal vents. Extremophiles 2008, 12, 119–124.
55. Guy, L.; Ettema, T.J.G. The archaeal “TACK” superphylum and the origin of eukaryotes. Trends Microbiol. 2011, 19, 580–587.
56. Sun, J.; Xu, Z.; Hao, B. Whole-genome based Archaea phylogeny and taxonomy: A composition vector approach. Chin. Sci. Bull. 2010, 55, 2323–2328.
57. Daubin, V.; Gouy, M.; Perrière, G. Bacterial molecular phylogeny using supertree approach. Genome Inform. 2001, 12, 155–164.
58. Wolf, Y.I.; Rogozin, I.B.; Grishin, N.V.; Tatusov, R.L.; Koonin, E.V. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 2001, 1.
59. Gribaldo, S.; Brochier, C. Phylogeny of prokaryotes: Does it exist and why should we care? Res. Microbiol. 2009, 160, 513–521.
60. Zuo, G.; Hao, B.; Staley, J.T. Geographic divergence of “Sulfolobus islandicus” strains assessed by genomic analyses including electronic DNA hybridization confirms they are geovars. Antonie van Leeuwenhoek 2014, 105, 431–435.

Share and Cite

MDPI and ACS Style

Zuo, G.; Xu, Z.; Hao, B. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis. Life 2015, 5, 949-968.
