Article

Global Archaeal Diversity Revealed Through Massive Data Integration: Uncovering Just Tip of Iceberg

by Antonios Kioukis 1, Antonio Pedro Camargo 2, Pavlos Pavlidis 3, Ioannis Iliopoulos 4, Nikos C Kyrpides 2 and Ilias Lagkouvardos 1,*

1 Department of Clinical Microbiology, School of Medicine, University of Crete, 70013 Heraklion, Greece
2 DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
3 Foundation for Research and Technology Hellas, Institute of Computer Science, 70013 Heraklion, Greece
4 School of Medicine, University of Crete, 70013 Heraklion, Greece
* Author to whom correspondence should be addressed.
Microorganisms 2025, 13(3), 598; https://doi.org/10.3390/microorganisms13030598
Submission received: 24 January 2025 / Revised: 27 February 2025 / Accepted: 28 February 2025 / Published: 5 March 2025
(This article belongs to the Special Issue Earth Systems: Shaped by Microbial Life)

Abstract

The domain of Archaea has gathered significant interest for its ecological and biotechnological potential and its role in helping us to understand the evolutionary history of Eukaryotes. In comparison to the bacterial domain, the number of adequately described members in Archaea is relatively low, with fewer than 1000 species described. It is not clear whether this is solely due to the cultivation difficulty of its members or, indeed, the domain is characterized by evolutionary constraints that keep the number of species relatively low. Based on molecular evidence that bypasses the difficulties of formal cultivation and characterization, several novel clades have been proposed, enabling insights into their metabolism and physiology. Given the extent of global sampling and sequencing efforts, it is now possible and meaningful to question the magnitude of global archaeal diversity based on molecular evidence. To do so, we extracted all sequences classified as Archaea from 500 thousand amplicon samples available in public repositories. After processing these sequences through our highly conservative pipeline, we obtained a comprehensive resource, named the ‘Global Archaea Diversity’ (GAD), which encompasses nearly 3 million molecular species clusters at 97% similarity, organized into over 500 thousand genera and nearly 100 thousand families. Saline environments have contributed the most to the novel taxa of this previously unseen diversity. The majority of those 16S rRNA gene sequence fragments were verified by matches in metagenomic datasets from IMG/M. These findings reveal a vast and previously overlooked diversity within the Archaea, offering insights into their ecological roles and evolutionary importance while establishing a foundation for the future study and characterization of this intriguing domain of life.

1. Introduction

In the early days, all non-eukaryotic single-celled microorganisms were classified as ‘Prokaryota’ based on their morphology. Advances in comparative and phylogenetic methods have clarified the distinctions between Prokaryota and the three primary domains of life: Eukarya, Eubacteria, and Archaebacteria. Initially, Archaebacteria were classified as part of a single domain alongside Bacteria. However, subsequent research revealed significant distinctions, leading to their reclassification into two separate domains: Archaea and Bacteria [1]. The genetic and biochemical similarities (presence of histones, complex RNA polymerases, etc.) between Eukaryotes and Archaea led Carl Woese to propose that Archaea are closer to Eukaryotes than to Bacteria [2]. This relationship continues to be reinforced by the discovery of novel species belonging to the new order of Asgardarchaeota, which contain multiple new shared characteristics between Archaea and Eukaryotes [3,4].
Despite advancements in DNA amplification techniques, including refined polymerase chain reaction (PCR) methodologies, as well as reduced sequencing costs, PCR-free sample preparation protocols, and the utilization of the 16S ribosomal RNA (rRNA) marker gene, our comprehension of microbiology remains constrained by the challenges associated with laboratory cultivation [5,6,7,8]. The selection of this marker gene is a result of its evolutionary conservation across all known life forms, which allows it to serve as a basis for phylogenetic and taxonomic identification. Initially, Archaea were categorized into two superphyla: Euryarchaeota and Crenarchaeota [3,9,10]. However, the expansion of sampling efforts across diverse environments has unveiled evolutionary branches that do not fit neatly into the proposed superphyla, which are exemplified by Korarchaeota [11]. Furthermore, even within these emerging superphyla, deeply branching microbial groups have been identified, prompting the coining of the term ‘microbial dark matter’ [12] to describe this uncultivated diversity.
The number of Archaea phyla has steadily increased in the last decade, now reaching twenty [13]. The discovery of these new phyla and other taxonomic groups is facilitated by novel techniques like metagenomic assembly and binning as well as the method of single-cell amplification and sequencing [14,15,16,17,18]. From the manual Sanger protocol to the tabletop sequencers of today, progress has been remarkable. These advances sharply reduced the cost of sequencing, enabling us to produce more microbial data than ever before. In conjunction with the development of new computational methods, the rate at which novel Archaea are being uncovered has only accelerated.
The exploration of ecological niches has shown an unequal emphasis, particularly with a predominant focus on Bacteria. Several factors have contributed to this skew. The paramount importance placed on identifying Bacteria with harmful implications for human health, such as the sequencing of Haemophilus influenzae in 1995, has driven research toward this direction. Likewise, attention was directed toward economically significant microorganisms, which is exemplified by the sequencing of Saccharomyces cerevisiae in 1996. In contrast, Archaea have not received comparable attention as they neither pose direct pathogenic threats to humans nor offer immediate economic benefits. The design of primers targeting the bacterial 16S rRNA gene, the relatively low abundance of Archaea in many microbial communities, and the challenges associated with laboratory cultivation further hinder efforts to even roughly estimate the diversity of archaeal organisms. Consequently, attempts to estimate microbial diversity have spanned a substantial range, from a few million [19] to a few billion [20].
Although it is true that all microbes contribute to carbon and nutrient cycles, certain groups, e.g., Archaea, possess unique metabolic pathways that distinguish their roles within these processes. Given their dispersal into extreme habitats, it is challenging to comprehend the full extent of the uncharted territory within the ‘dark matter’. However, a mere catalog is insufficient; investigation of this domain offers the potential for discovering novel functionalities that may drive advancements in biotechnology. On a more theoretical level, such exploration has the capacity to provide profound insights into the evolutionary history of Archaea and illuminate the origins of complex cellular life, thereby enriching our comprehension of the world [21,22].
In this study, we leveraged a dataset that comprises 500,000 microbial samples, which had been meticulously pre-processed and integrated into the IMNGS database [23]. Our primary objective was to comprehensively investigate the archaeal diversity encompassed within. Specifically, we aimed to accurately determine the lower bounds of this diversity and to pinpoint ecological niches that require further focus. Notably, we shed light on a potential bias in our estimation of the archaeal taxonomic clades within the tree of life. This bias arises from an overrepresentation of cultivable microbes, which leads to an underrepresentation of other taxa. These taxa are routinely discarded despite holding valuable information. We employed a verification process, drawing upon a metagenomic resource for validation, thus ensuring the robustness and reliability of our results. Furthermore, our investigation yielded a significant discovery—the identification of a novel molecular-based class of Archaea that resides within the Asgardarchaeota clade, which is tentatively denoted as ‘Sleipnirarchaeota’.

2. Materials and Methods

2.1. Dataset Creation

IMNGS crawls Sequence Read Archive (SRA) [24] and extracts all available 16S rRNA samples regardless of environmental origin or sequencing technology used. At the moment, more than 500,000 processed samples are present in the platform, with 117,000 samples including at least one sequence classified as of archaeal origin. All IMNGS samples have been processed through a pipeline that includes the steps for the following: non-16S-like sequence removal using sortmerna [25], error filtering and Operational Taxonomic Unit (OTU) creation based on the UPARSE [26] algorithm present in USEARCH [27], as well as the taxonomic assignment by RDP [28].
Since every OTU’s assumed taxonomy is already available from the preprocessing pipeline of IMNGS, we extracted the OTUs belonging to the Archaea domain from every sample ($n_{seq}$ = 15.8 million). The selected OTUs were compared against the Living Tree Project (LTP) v.128 [29] and SILVA v.138.1 [30] databases, and any OTU with a very high match (>98%) was replaced by its target (0.2 million hits from LTP and 1.2 million from SILVA). These replacements increase the validity and power of our analysis without reducing its generality, because the database sequences are of higher average quality and length and have been expertly verified. Any LTP or SILVA Archaea sequences with no hits in our dataset were also appended, since they represent archaeal biodiversity not originally present in it.
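The replacement step can be illustrated with a minimal Python sketch. It assumes that the best database hit for each OTU has already been computed by an external search tool; the function name, the data layout, and the strict `>` comparison against the 98% threshold are illustrative assumptions, not the pipeline's actual code.

```python
from typing import Dict, Tuple


def replace_with_references(
    otus: Dict[str, str],
    best_hits: Dict[str, Tuple[str, float]],
    references: Dict[str, str],
    min_identity: float = 98.0,
) -> Dict[str, str]:
    """Merge OTUs with reference sequences (hypothetical helper).

    otus:       OTU id -> sequence
    best_hits:  OTU id -> (reference id, percent identity) of its best hit
    references: reference id -> sequence (e.g., from LTP or SILVA)
    """
    merged: Dict[str, str] = {}
    for otu_id, seq in otus.items():
        hit = best_hits.get(otu_id)
        if hit is not None and hit[1] > min_identity:
            # A very high match: keep the curated reference instead of the OTU.
            ref_id = hit[0]
            merged[ref_id] = references[ref_id]
        else:
            merged[otu_id] = seq
    # Append every reference that was not matched by any OTU.
    for ref_id, seq in references.items():
        merged.setdefault(ref_id, seq)
    return merged
```

Several OTUs that hit the same reference collapse into a single entry here, which is one plausible reading of the replacement described above.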
An important challenge encountered was that the dataset’s OTUs originate from different amplicon studies with different primers used, and the targeted regions are, in the best case, only partially overlapping. To address this challenge, we followed the standard of identifying a common region across studies and trimming all OTUs accordingly, as described in TIC [31] (Figure S1). Since TIC uses the latest SINA aligner in combination with SILVA as the reference database (v138.2), the taxonomy classification of all OTUs was updated. The reclassification process allowed for assigned taxonomies to be corrected, enabling the detection of sequences classified erroneously as Archaea, which were discarded ($n_{seq}$ = 12 million).
The extraction of the most-represented region from each OTU and the following collapse of the alignment gaps is a straightforward and easy-to-implement procedure as described in the TIC Pipeline (SINA positions: 10300-25300) (Figure S1). The Escherichia coli 16S rRNA gene was also aligned through SILVA, and the number of bases (n = 244) within the selected region was used as the lowest limit of required information each Archaea OTU must have for it to be included into the next stage ($n_{seq}$ = 9 million). A dereplication was performed on the extracted sub-sequences with the output acting as our final dataset ($n_{seq}$ = 6.3 million) (Figure 1).
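The region extraction, gap collapse, minimum-length filter, and dereplication can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the coordinates are treated as 0-based half-open slice indices, and the `>=` comparison against the minimum number of bases is an assumption, since the text does not state whether the limit is inclusive.

```python
from typing import Dict, List


def extract_region(
    aligned: Dict[str, str], start: int, end: int, min_bases: int
) -> Dict[str, str]:
    """Cut alignment columns [start, end), drop gaps, keep long-enough sequences."""
    kept: Dict[str, str] = {}
    for seq_id, aln in aligned.items():
        region = aln[start:end]
        # Collapse alignment gaps ('-' and '.' are both treated as gap symbols).
        ungapped = region.replace("-", "").replace(".", "").upper()
        if len(ungapped) >= min_bases:
            kept[seq_id] = ungapped
    return kept


def dereplicate(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group identical sequences: unique sequence -> ids that carry it."""
    groups: Dict[str, List[str]] = {}
    for seq_id, seq in seqs.items():
        groups.setdefault(seq, []).append(seq_id)
    return groups
```

With the values given in the text, a call would look like `extract_region(aligned, 10300, 25300, 244)` followed by `dereplicate(...)` on its output.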

2.2. Clustering Limits Identification

In the Archaea domain, no well-defined limits exist for the sequence percentage similarity between families, genera, and species, which are required for usage of our clustering tool TIC for the identification of novel molecular families, genera, and species. Even if they existed, those limits would not be applicable to our project since we have extracted a non-standard region of the 16S rRNA gene combining the V4 and V5 16S rRNA variable regions (Figure S1, Table S1). To overcome this hurdle, we queried all SILVA sequences against those included in LTP. Any match over 98% was appended to LTP and taxonomically characterized by its match. We split this combined LTP/SILVA dataset based on its taxonomic information and calculated the intra-group differences on all levels on the full 16S region. Since our Archaea dataset contains a sub-region of 16S, we extracted the same region from the combined LTP/SILVA dataset, keeping the full taxonomic information, so we can split the dataset based on it and re-calculate the intra-group similarities. Through this process, we verified that our selected region almost perfectly mirrors the percentage similarity of the full sequences (family level: 89%, genus level: 93%, species level: 97%), which verifies that our region is representative of the diversity contained within the full 16S rRNA sequence (Figure 2).
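The intra-group comparison can be illustrated with a short Python sketch that computes pairwise identities among aligned sequences sharing a taxonomic label. The identity definition used here (matches divided by the columns where at least one sequence has a base) and the function names are assumptions made for illustration; the text does not specify which identity definition or summary statistic produced the reported thresholds.

```python
from itertools import combinations
from typing import Dict, List, Tuple

GAP = "-"


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over columns where at least one sequence has a base."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue  # column is a gap in both sequences, so it is ignored
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def intra_group_identities(
    records: List[Tuple[str, str]]
) -> Dict[str, List[float]]:
    """records: (taxon label, aligned sequence) -> identities within each taxon."""
    by_taxon: Dict[str, List[str]] = {}
    for taxon, seq in records:
        by_taxon.setdefault(taxon, []).append(seq)
    return {
        taxon: [pairwise_identity(a, b) for a, b in combinations(seqs, 2)]
        for taxon, seqs in by_taxon.items()
        if len(seqs) > 1  # a single member has no intra-group pair
    }
```

Running the same function once on full-length alignments and once on the extracted sub-region, with labels taken at the family, genus, or species rank, gives the two distributions whose agreement is being checked.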

2.3. Verification

Verification was performed using Integrated Microbial Genomes and Microbiomes (IMG/M) [32] as a search target. IMG/M contains annotated metagenomes from the three domains of life, which are sequenced at DOE’s Joint Genome Institute, submitted by external users, or imported from the same source as IMNGS (SRA). IMG/M hosts approximately 65 billion metagenome genes processed through the GOLD pipeline [33], which encompasses filtering, error correction, and assembly of reads, followed by annotating the structure and function of contigs and contig-based binning. The volume of IMG/M enables the verification of our novel molecular species, ensuring both their presence in amplicon and metagenomic samples and that they are not artifacts.

3. Results

3.1. Archaea Knowledge Expansion

GAD was reorganized into 2.8 million species OTUs (SOTUs), 561 thousand genus OTUs (GOTUs), and 98 thousand family OTUs (FOTUs), which are orders of magnitude larger than those contained in the LTP, SILVA, and GTDB [34,35] databases (Table 1, Figure 3).
As expected, the majority of SOTUs, GOTUs, and FOTUs contain a single OTU from only one SRA sample, and removing those singletons decreases the numbers significantly (Table 1).
Another approach to finding the lower limit of global archaeal diversity is based on the presence of SOTUs in different samples. To do this, we counted only the most abundant SOTU from every GOTU for each sample, i.e., when two SOTUs come from the same sample and have been identified as belonging to the same GOTU, only the most abundant SOTU would be counted. This helps account for noise (batch effect) or errors (sequencing limitations) in our data since GOTUs are dissimilar enough to tolerate those differences. Generalizing this for every SOTU, GOTU, and sample, we obtained a conservative estimate of 1.5 million SOTUs (Figure 4).
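This counting rule can be sketched in Python as follows. The tuple layout and function name are hypothetical, and ties in abundance are resolved by keeping the first SOTU encountered, which is an assumption not stated in the text.

```python
from typing import Dict, Iterable, Set, Tuple

# (sample id, GOTU id, SOTU id, abundance); the layout is assumed for illustration
Observation = Tuple[str, str, str, int]


def conservative_sotus(observations: Iterable[Observation]) -> Set[str]:
    """Keep only the most abundant SOTU per (sample, GOTU) pair."""
    best: Dict[Tuple[str, str], Tuple[int, str]] = {}
    for sample, gotu, sotu, abundance in observations:
        key = (sample, gotu)
        current = best.get(key)
        # Strict '>' means the first SOTU seen wins a tie.
        if current is None or abundance > current[0]:
            best[key] = (abundance, sotu)
    # An SOTU is counted if it is the top SOTU of its GOTU in at least one sample.
    return {sotu for _, sotu in best.values()}
```

The size of the returned set is the conservative SOTU count under these assumptions.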
At the family level, the analysis is more complex. When we used VSEARCH [36] to cluster LTP and SILVA sequences from the same family, we obtained multiple clusters rather than a single one. This was expected because taxonomy classifiers like SINA consider factors beyond sequence similarity to assign taxonomy, while naive clustering tools like VSEARCH rely purely on sequence similarity. Since TIC uses VSEARCH for clustering, the reported number of FOTUs is likely an overestimate.
Furthermore, not all taxonomic branches are equal in terms of the novelty they include (Figure 3). The order Woesearchaeales has not been explored enough to identify families, genera, and species, mainly due to difficulties in lab cultivation. Our analysis shows that Woesearchaeales contains almost half of the diversity at the family and genus levels, and a third of our dataset’s OTUs (2.2/6.4 million) were taxonomically classified to it, making it a valid target for further work (Figure S2). On the other hand, there is evidence of a new narrow order (sequence to species rate = 0.2) within the Thermoplasmatota class that was not previously identified.
We have also confirmed that Archaea preferentially occur in soil and saline waters, as reported in [37,38,39,40,41]. Furthermore, the least explored Archaea phyla (as evidenced by the number of known classes and orders enclosed) incorporate most of the novelty found, giving us a clear indication of where we should focus our efforts for the purpose of exploration and acquiring new knowledge (Figure 5).

3.2. Novelty

A novelty score, which was based on the number of FOTUs, GOTUs, and SOTUs within the samples of each environment, was normalized by the number of environment samples present in our dataset. Saline water samples hold the most unexplored novelty on each level, while host-associated (Table S2) and plant samples are the most studied.
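Because the exact formula of the novelty score is not given in the text, the following Python sketch shows only one plausible form: the number of distinct novel taxa observed in an environment divided by the number of samples from that environment. The function name, the inputs, and the definition of "novel" as "absent from a set of known taxa" are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def novelty_per_environment(
    sample_env: Dict[str, str],
    sample_taxa: Iterable[Tuple[str, str]],
    known_taxa: Set[str],
) -> Dict[str, float]:
    """Distinct novel taxa per environment, normalized by its sample count.

    sample_env:  sample id -> environment label
    sample_taxa: (sample id, taxon id) pairs, e.g., SOTU, GOTU, or FOTU ids
    known_taxa:  taxon ids already represented in reference databases
    """
    novel: Dict[str, Set[str]] = defaultdict(set)
    for sample, taxon in sample_taxa:
        if taxon not in known_taxa:
            novel[sample_env[sample]].add(taxon)
    n_samples: Dict[str, int] = defaultdict(int)
    for env in sample_env.values():
        n_samples[env] += 1
    return {env: len(novel[env]) / n for env, n in n_samples.items()}
```

The same function can be applied separately to SOTU, GOTU, and FOTU identifiers to obtain one score per taxonomic level.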
The sampling effort is not spread equally between all environments; for example, our dataset contains 15 thousand samples originating from saline water with an equal number of host-associated samples despite the fact that 70% of our planet is covered by oceans. However, the rate of identification of new molecular species ($d_{sotu}$) after incorporation of new samples into our data varies greatly, with soil ($d_{sotu}$ > 27) and saline water ($d_{sotu}$ > 25) environments having a lot of diversity that remains unexplored. In contrast, we have to expend a lot of additional effort to find novelty in host-associated and plant environments (Figure 6 and Figure 7).

3.3. Verification

IMG/M is a data management system containing metagenomes from a large diversity of microbiomes. We re-categorized the SOTUs to look into their contributing SRA samples. An SOTU containing a single OTU from one SRA sample was termed a singleton, whereas a doubleton contains at least one sequence from each of two different samples, followed by tripletons and moretons. We searched for high-quality matches between GAD and IMG/M. Even though our sequences originated from amplicon studies and IMG/M contains only metagenomes, 94% of our SOTUs were matched up to the family level (89% similarity) (Figure 8). This provides an independent line of support for the validity of the predicted Archaea diversity.
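The categorization by sample spread can be written as a small Python sketch. It assumes that "moreton" denotes an SOTU found in four or more distinct samples, which the text implies but does not state explicitly; the function names are hypothetical.

```python
from typing import Dict, Set


def spread_category(n_samples: int) -> str:
    """Name an SOTU by the number of distinct samples contributing to it."""
    if n_samples < 1:
        raise ValueError("an SOTU must come from at least one sample")
    names = {1: "singleton", 2: "doubleton", 3: "tripleton"}
    # Assumption: anything found in four or more samples is a 'moreton'.
    return names.get(n_samples, "moreton")


def categorize(sotu_samples: Dict[str, Set[str]]) -> Dict[str, str]:
    """sotu_samples: SOTU id -> set of sample ids contributing sequences."""
    return {
        sotu: spread_category(len(samples))
        for sotu, samples in sotu_samples.items()
    }
```

Match rates against a metagenomic database can then be tabulated per category by grouping SOTUs on the returned label.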
Metagenomic sequencing, due to the distribution of the reads across the multiple genomes of the targeted microbial samples, tends to capture predominantly the 16S genes from the most dominant species. Therefore, a match between an SOTU from the GAD collection and the metagenomic sequences of IMG/M would not only provide additional evidence of its presence in the WGS microbiomes but also support its key role in the detected environment. It is expected that cosmopolitan archaeal species would be overrepresented in the metagenomic database, while more niche-specific species would be less represented or not represented at all. Our division of predicted SOTUs from singletons to moretons likely reflects a ranking of ecological spread despite the sampling biases in the databases. Our results show that the SOTU singletons are harder to find in IMG/M, with only 0.6% of the 2.4 million found to have a near-exact match. This rate of matches to the IMG/M sequences grows to 5% for doubleton SOTUs, 7% for tripletons, and 18% for moretons. Nevertheless, moving to a species level of similarity (97%) as a cutoff for matching sequences, only singletons show a distinguishably lower matching rate of 10%, while all other sequences were matched at a rate of around 30%. For higher taxonomic levels (genus/family), the discrepancy in the matching rate disappears (Table 2).

3.4. Asgardarchaeota Case Study

Asgardarchaeota represent the closest relatives of Eukaryotes [2,42,43,44] from the other domains of life through the existence of genes encoding homologous proteins [45] and other characteristics [46,47]. However, the inability to perform lab cultivation still hampers our understanding of this important phylum. This obstacle is evidenced both by LPSN [48], which contains only three candidate classes, and by SILVA, which contains only three classes (Heimdallarchaeota, Lokiarchaeota and Odinarchaeota) out of the twenty proposed by other studies [13,49,50] at the time of this analysis. Since our approach is based on SILVA as a taxonomic reference, we limited the initial specificity to the three classes only, with all others lumped into the UNKCLASS label.
Based on the known Asgard classes, Lokiarchaeota encompass the overwhelming majority of our data (Figure 9a), and each class also shows a difference in its niche environment (Figure S3). From a bird’s eye view, the Asgard SOTUs originate from diverse environments (Figure 9b) dominated by the soil and saline water categories.
All Asgard SOTUs with unidentified class (UNKCLASS) were placed on the phylogenetic tree produced by Liu et al. [49], using EPA-ng [51] (Figure 10).
While no unknown SOTUs were identified as belonging to certain proposed classes (Gerd, Wukong, Heimdal, and Baldr), some were assigned to known classes, reducing the number of unknowns. There are five clusters produced from exclusively novel SOTUs that should be studied more since they may represent novel Asgard classes. Among those, Cluster 5 is the most diverse and encompasses twenty-six SOTUs (six verified by different targets of IMG/M) originating from different environments and twenty-five SRA samples. In combination with the cluster distance from other known classes, we are confident that this is a new class, and we propose the new class name: Sleipnirarchaeota. Based on the samples analyzed in this study, Sleipnirarchaeota are predominantly associated with soil and saline water environments.

4. Discussion

4.1. Lower Boundary of Archaea Diversity

The extent of archaeal diversity remains a subject of active scientific debate, with estimates ranging from a few thousand to millions of taxa [20]. In this study, we focused on uncovering the archaeal dark matter, defined as archaeal taxa detected in environmental samples originally characterized using methodologies optimized for bacterial identification rather than archaeal-specific approaches. These archaeal populations are often underrepresented due to biases in detection methods, such as primer specificity and limitations in reference databases. However, this study underscores that significant novelty resides within already analyzed and published datasets. It demonstrates how the systematic reanalysis of existing data can yield valuable insights and drive novel breakthroughs, ultimately advancing scientific understanding. By extension, it highlights the importance of the scientific community adhering to the principles of FAIR (Findable, Accessible, Interoperable, and Reusable) data management [52] that enable such analysis. The realization of the hidden wealth available in public repositories has led to the development of specialized algorithms and pipelines for the utilization of available data. The general methodology employed in this study has been validated in prior work [53], which refined the foundational concepts originally proposed by Lagkouvardos et al. [54]. Following this process, we identified a robust lower boundary for archaeal diversity, quantified at 2,807,013 SOTUs. This estimate is supported by evidence from both metagenomic data and curated database records.
Our findings provide a foundational estimate for the lower limit of archaeal diversity and underscore the critical need for archaeal-specific detection and classification methodologies. Addressing these methodological challenges will enable a more comprehensive resolution of archaeal complexity and improve our understanding of their ecological and evolutionary significance within microbial communities.

4.2. Sleipnirarchaeota

The first sequences of the superphylum Asgardarchaeota were isolated from the sediments around the hydrothermal vents in Loki’s castle in the Atlantic Ocean [55] and consequently classified in the Lokiarchaeota class. Initial analysis placed them in a monophyletic relationship with Eukaryotes, evidencing a shared common ancestor, which indicates that eukaryotic cells evolved from Archaea. Further support for this claim comes from the presence of homologous genes between them [45,55,56,57,58,59,60].
Thorarchaeota were the second class to be discovered, in estuary sediments, followed by Heimdallarchaeota and Odinarchaeota [47,55], encountered in anaerobic sediments from hot springs and groundwater [44]; Helarchaeota were isolated by sampling deep-sea hydrothermal vents [46]. Their late discovery was due to incompatibility with the common PCR primers and their low abundance (<1%).
In recent years, the Asgard phylum has undergone significant expansion, now encompassing at least eleven classes, which challenges previous estimates of Asgardarchaeota diversity and raises the intriguing question of whether Eukaryotes represent a deep clade within the Asgard phylum [49]. In this study, we provide evidence supporting the existence of multiple novel molecular classes within the Asgardarchaeota, each with varying numbers of members. Specifically, the verification of Cluster 5 from IMG/M strongly supports its classification as a new class. Consistent with the Norse mythology-inspired nomenclature, we propose the name ‘Sleipnirarchaeota’ for this newly identified lineage.

5. Conclusions

Herein, we demonstrate that integrative data from high throughput molecular methods allow us to bypass cultivation constraints in the investigation of microbial diversity. The evidence shown here supports the existence of millions of archaeal ‘molecular species’ as well as the presence of several so far unknown higher lineages. Nevertheless, other than giving us some general ecological findings, it is clear that simple amplicon-based data are not sufficient to give insight into the physiology, overall function, and ecology of this massive microbial dark matter. A combination of metagenomic and single-cell sequencing is the natural next step in our quest for understanding our microbial world.
Furthermore, this study reaffirms that our costly, already-published data remain underutilized. Most raw sequence data are deposited in the SRA but see little to no use for contextualization or integration despite the potential demonstrated. Enhancing accompanying metadata would significantly increase their utility and, in combination with specialized overlay tools, could unlock their full potential.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/microorganisms13030598/s1, Figure S1: The archaeal Operational Taxonomic Units (OTUs) obtained from the IMNGS dataset were aligned using the SINA alignment tool. The figure illustrates the total number of aligned bases at each position across the alignment. The highlighted region corresponds to the specific segment referenced in the main text, representing a region that encapsulates the observed diversity and facilitates meaningful comparisons between samples; Figure S2: There are differences in the species and genera expansion rate among orders, with a minimum of 0.2 in the Woesearchaeales order and a maximum of 0.7 in the unknown order contained within the class Thermoplasmata. The family expansion rate was uniform and constrained within rates of $[5 \times 10^{-5}, 7 \times 10^{-2}]$; Figure S3: Environmental preferences of Asgardarchaeota classes; Table S1: Positioning of the nine variable and hyper-variable regions on the 16S rRNA gene of the Escherichia coli (strain K12) reference strain. Positions refer to both the whole gene and its aligned form with SINA. The regions in the SINA alignment for the V4 and V5 regions that were used in our analysis are shown in bold; Table S2: Table with the count of samples for each category of sample origin, as available in SRA, that indicate an animal host as a source. Due to poor annotation, they were grouped as Host-Associated in this analysis.

Author Contributions

Conceptualization, I.L.; investigation, A.K. and A.P.C.; data curation, A.K.; writing—original draft preparation, I.L.; writing—review and editing, P.P., I.I., N.C.K. and I.L.; funding acquisition, I.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT) under grant agreement No 710. A.P.C. and N.C.K. were supported by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337, accessed on 4 March 2025), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, operated under Contract No. DE-AC02-05CH11231.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available on GitHub (accessed on 4 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Woese, C.R.; Kandler, O.; Wheelis, M.L. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA 1990, 87, 4576–4579.
  2. Doolittle, W.F.; Logsdon, J.M., Jr. Archaeal genomics: Do archaea have a mixed heritage? Curr. Biol. 1998, 8, R209–R211.
  3. DeLong, E.F. Archaea in coastal marine environments. Proc. Natl. Acad. Sci. USA 1992, 89, 5685–5689.
  4. MacGregor, B.J.; Moser, D.P.; Alm, E.W.; Nealson, K.H.; Stahl, D.A. Crenarchaeota in lake Michigan sediment. Appl. Environ. Microbiol. 1997, 63, 1178–1181.
  5. Rappé, M.S.; Giovannoni, S.J. The uncultured microbial majority. Annu. Rev. Microbiol. 2003, 57, 369–394.
  6. Hugenholtz, P.; Goebel, B.M.; Pace, N.R. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J. Bacteriol. 1998, 180, 4765–4774.
  7. Pace, N.R. A molecular view of microbial diversity and the biosphere. Science 1997, 276, 734–740.
  8. Baker, B.J.; Dick, G.J. Omic approaches in microbial ecology: Charting the unknown. Microbe 2013, 8, 353–359.
  9. Barns, S.M.; Fundyga, R.E.; Jeffries, M.W.; Pace, N.R. Remarkable archaeal diversity detected in a Yellowstone National Park hot spring environment. Proc. Natl. Acad. Sci. USA 1994, 91, 1609–1613.
  10. Fuhrman, J.A.; McCallum, K.; Davis, A.A. Novel major archaebacterial group from marine plankton. Nature 1992, 356, 148–149.
  11. Barns, S.M.; Delwiche, C.F.; Palmer, J.D.; Pace, N.R. Perspectives on archaeal diversity, thermophily and monophyly from environmental rRNA sequences. Proc. Natl. Acad. Sci. USA 1996, 93, 9188–9193.
  12. Rinke, C.; Schwientek, P.; Sczyrba, A.; Ivanova, N.N.; Anderson, I.J.; Cheng, J.F.; Darling, A.; Malfatti, S.; Swan, B.K.; Gies, E.A.; et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 2013, 499, 431–437.
  13. Adam, P.S.; Borrel, G.; Brochier-Armanet, C.; Gribaldo, S. The growing tree of Archaea: New perspectives on their diversity, evolution and ecology. ISME J. 2017, 11, 2407–2425.
  14. Stepanauskas, R.; Sieracki, M.E. Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time. Proc. Natl. Acad. Sci. USA 2007, 104, 9052–9057.
  15. Dick, G.J.; Andersson, A.F.; Baker, B.J.; Simmons, S.L.; Thomas, B.C.; Yelton, A.P.; Banfield, J.F. Community-wide analysis of microbial genome sequence signatures. Genome Biol. 2009, 10, 1–16.
  16. Venter, J.C.; Remington, K.; Heidelberg, J.F.; Halpern, A.L.; Rusch, D.; Eisen, J.A.; Wu, D.; Paulsen, I.; Nelson, K.E.; Nelson, W.; et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304, 66–74.
  17. Tyson, G.W.; Chapman, J.; Hugenholtz, P.; Allen, E.E.; Ram, R.J.; Richardson, P.M.; Solovyev, V.V.; Rubin, E.M.; Rokhsar, D.S.; Banfield, J.F. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 2004, 428, 37–43.
  18. Peng, Y.; Leung, H.C.; Yiu, S.M.; Chin, F.Y. IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 2012, 28, 1420–1428.
  19. Schloss, P.D.; Girard, R.A.; Martin, T.; Edwards, J.; Thrash, J.C. Status of the archaeal and bacterial census: An update. mBio 2016, 7, e00201-16.
  20. Amann, R.; Rosselló-Móra, R. After all, only millions? mBio 2016, 7, e00999-16.
  21. Dombrowski, N.; Teske, A.P.; Baker, B.J. Expansive microbial metabolic versatility and biodiversity in dynamic Guaymas Basin hydrothermal sediments. Nat. Commun. 2018, 9, 4999.
  22. Hug, L.A.; Baker, B.J.; Anantharaman, K.; Brown, C.T.; Probst, A.J.; Castelle, C.J.; Butterfield, C.N.; Hernsdorf, A.W.; Amano, Y.; Ise, K.; et al. A new view of the tree of life. Nat. Microbiol. 2016, 1, 16048.
  23. Lagkouvardos, I.; Joseph, D.; Kapfhammer, M.; Giritli, S.; Horn, M.; Haller, D.; Clavel, T. IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies. Sci. Rep. 2016, 6, 33721.
  24. Leinonen, R.; Sugawara, H.; Shumway, M.; Collaboration, I.N.S.D. The sequence read archive. Nucleic Acids Res. 2010, 39, D19–D21. [Google Scholar] [CrossRef] [PubMed]
  25. Kopylova, E.; Noé, L.; Touzet, H. SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 2012, 28, 3211–3217. [Google Scholar] [CrossRef]
  26. Edgar, R.C. UPARSE: Highly accurate OTU sequences from microbial amplicon reads. Nat. Methods 2013, 10, 996–998. [Google Scholar] [CrossRef]
  27. Edgar, R.C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 2010, 26, 2460–2461. [Google Scholar] [CrossRef]
  28. Lan, Y.; Wang, Q.; Cole, J.R.; Rosen, G.L. Using the RDP classifier to predict taxonomic novelty and reduce the search space for finding novel organisms. PLoS ONE 2012, 7, e32491. [Google Scholar] [CrossRef]
  29. Yarza, P.; Richter, M.; Peplies, J.; Euzeby, J.; Amann, R.; Schleifer, K.H.; Ludwig, W.; Glöckner, F.O.; Rosselló-Móra, R. The All-Species Living Tree project: A 16S rRNA-based phylogenetic tree of all sequenced type strains. Syst. Appl. Microbiol. 2008, 31, 241–250. [Google Scholar] [CrossRef]
  30. Quast, C.; Pruesse, E.; Yilmaz, P.; Gerken, J.; Schweer, T.; Yarza, P.; Peplies, J.; Glöckner, F.O. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res. 2012, 41, D590–D596. [Google Scholar] [CrossRef]
  31. Kioukis, A.; Pourjam, M.; Neuhaus, K.; Lagkouvardos, I. Taxonomy Informed Clustering, an Optimized Method for Purer and More Informative Clusters in Diversity Analysis and Microbiome Profiling. Front. Bioinform. 2022, 2, 864597. [Google Scholar] [CrossRef] [PubMed]
  32. Chen, I.M.A.; Chu, K.; Palaniappan, K.; Ratner, A.; Huang, J.; Huntemann, M.; Hajek, P.; Ritter, S.; Varghese, N.; Seshadri, R.; et al. The IMG/M data management and analysis system v. 6.0: New tools and advanced capabilities. Nucleic Acids Res. 2021, 49, D751–D763. [Google Scholar] [CrossRef] [PubMed]
  33. Clum, A.; Huntemann, M.; Bushnell, B.; Foster, B.; Foster, B.; Roux, S.; Hajek, P.P.; Varghese, N.; Mukherjee, S.; Reddy, T.B.K.; et al. DOE JGI Metagenome Workflow. mSystems 2021, 6, e00804-20. [Google Scholar] [CrossRef]
  34. Parks, D.H.; Chuvochina, M.; Rinke, C.; Mussig, A.J.; Chaumeil, P.A.; Hugenholtz, P. GTDB: An ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 2022, 50, D785–D794. [Google Scholar] [CrossRef] [PubMed]
  35. Chaumeil, P.A.; Mussig, A.J.; Hugenholtz, P.; Parks, D.H. GTDB-Tk: A toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics 2020, 36, 1925–1927. [Google Scholar] [CrossRef]
  36. Rognes, T.; Flouri, T.; Nichols, B.; Quince, C.; Mahé, F. VSEARCH: A versatile open source tool for metagenomics. PeerJ 2016, 4, e2584. [Google Scholar] [CrossRef]
  37. Baker, B.J.; De Anda, V.; Seitz, K.W.; Dombrowski, N.; Santoro, A.E.; Lloyd, K.G. Diversity, ecology and evolution of Archaea. Nat. Microbiol. 2020, 5, 887–900. [Google Scholar] [CrossRef]
  38. Zhou, Z.; Liu, Y.; Lloyd, K.G.; Pan, J.; Yang, Y.; Gu, J.D.; Li, M. Genomic and transcriptomic insights into the ecology and metabolism of benthic archaeal cosmopolitan, Thermoprofundales (MBG-D archaea). ISME J. 2019, 13, 885–901. [Google Scholar] [CrossRef]
  39. Lloyd, K.G.; Schreiber, L.; Petersen, D.G.; Kjeldsen, K.U.; Lever, M.A.; Steen, A.D.; Stepanauskas, R.; Richter, M.; Kleindienst, S.; Lenk, S.; et al. Predominant archaea in marine sediments degrade detrital proteins. Nature 2013, 496, 215–218. [Google Scholar] [CrossRef]
  40. Tully, B.J. Metabolic diversity within the globally abundant Marine Group II Euryarchaea offers insight into ecological patterns. Nat. Commun. 2019, 10, 271. [Google Scholar] [CrossRef]
  41. Rinke, C.; Rubino, F.; Messer, L.F.; Youssef, N.; Parks, D.H.; Chuvochina, M.; Brown, M.; Jeffries, T.; Tyson, G.W.; Seymour, J.R.; et al. A phylogenomic and ecological analysis of the globally abundant Marine Group II archaea (Ca. Poseidoniales ord. nov.). ISME J. 2019, 13, 663–675. [Google Scholar] [CrossRef]
  42. Spang, A.; Stairs, C.W.; Dombrowski, N.; Eme, L.; Lombard, J.; Caceres, E.F.; Greening, C.; Baker, B.J.; Ettema, T.J. Proposal of the reverse flow model for the origin of the eukaryotic cell based on comparative analyses of Asgard archaeal metabolism. Nat. Microbiol. 2019, 4, 1138–1148. [Google Scholar] [CrossRef]
  43. Imachi, H.; Nobu, M.K.; Nakahara, N.; Morono, Y.; Ogawara, M.; Takaki, Y.; Takano, Y.; Uematsu, K.; Ikuta, T.; Ito, M.; et al. Isolation of an archaeon at the prokaryote–eukaryote interface. Nature 2020, 577, 519–525. [Google Scholar] [CrossRef] [PubMed]
  44. Zaremba-Niedzwiedzka, K.; Caceres, E.F.; Saw, J.H.; Bäckström, D.; Juzokaite, L.; Vancaester, E.; Seitz, K.W.; Anantharaman, K.; Starnawski, P.; Kjeldsen, K.U.; et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 2017, 541, 353–358. [Google Scholar] [CrossRef] [PubMed]
  45. Akıl, C.; Robinson, R.C. Genomes of Asgard archaea encode profilins that regulate actin. Nature 2018, 562, 439–443. [Google Scholar] [CrossRef] [PubMed]
  46. Seitz, K.W.; Dombrowski, N.; Eme, L.; Spang, A.; Lombard, J.; Sieber, J.R.; Teske, A.P.; Ettema, T.J.; Baker, B.J. Asgard archaea capable of anaerobic hydrocarbon cycling. Nat. Commun. 2019, 10, 1822. [Google Scholar] [CrossRef]
  47. Seitz, K.W.; Lazar, C.S.; Hinrichs, K.U.; Teske, A.P.; Baker, B.J. Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction. ISME J. 2016, 10, 1696–1705. [Google Scholar] [CrossRef]
  48. Parte, A.C.; Carbasse, J.S.; Meier-Kolthoff, J.P.; Reimer, L.C.; Göker, M. List of Prokaryotic names with Standing in Nomenclature (LPSN) moves to the DSMZ. Int. J. Syst. Evol. Microbiol. 2020, 70, 5607. [Google Scholar] [CrossRef]
  49. Liu, Y.; Makarova, K.S.; Huang, W.C.; Wolf, Y.I.; Nikolskaya, A.N.; Zhang, X.; Cai, M.; Zhang, C.J.; Xu, W.; Luo, Z.; et al. Expanded diversity of Asgard archaea and their relationships with eukaryotes. Nature 2021, 593, 553–557. [Google Scholar] [CrossRef]
  50. Xie, R.; Wang, Y.; Huang, D.; Hou, J.; Li, L.; Hu, H.; Zhao, X.; Wang, F. Expanding Asgard members in the domain of Archaea sheds new light on the origin of eukaryotes. Sci. China Life Sci. 2021, 65, 818–829. [Google Scholar] [CrossRef]
  51. Barbera, P.; Kozlov, A.M.; Czech, L.; Morel, B.; Darriba, D.; Flouri, T.; Stamatakis, A. EPA-ng: Massively parallel evolutionary placement of genetic sequences. Syst. Biol. 2019, 68, 365–369. [Google Scholar] [CrossRef] [PubMed]
  52. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.W.; da Silva Santos, L.B.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 1–9. [Google Scholar] [CrossRef]
  53. Mourgela, R.N.; Kioukis, A.; Pourjam, M.; Lagkouvardos, I. Large-scale integration of amplicon data reveals massive diversity within saprospirales, mostly originating from saline environments. Microorganisms 2023, 11, 1767. [Google Scholar] [CrossRef]
  54. Lagkouvardos, I.; Weinmaier, T.; Lauro, F.M.; Cavicchioli, R.; Rattei, T.; Horn, M. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae. ISME J. 2014, 8, 115–125. [Google Scholar] [CrossRef] [PubMed]
  55. Spang, A.; Saw, J.H.; Jørgensen, S.L.; Zaremba-Niedzwiedzka, K.; Martijn, J.; Lind, A.E.; Van Eijk, R.; Schleper, C.; Guy, L.; Ettema, T.J. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 2015, 521, 173–179. [Google Scholar] [CrossRef]
  56. Ettema, T.J.; Lindås, A.C.; Bernander, R. An actin-based cytoskeleton in archaea. Mol. Microbiol. 2011, 80, 1052–1061. [Google Scholar] [CrossRef]
  57. Koonin, E.V.; Yutin, N. The dispersed archaeal eukaryome and the complex archaeal ancestor of eukaryotes. Cold Spring Harb. Perspect. Biol. 2014, 6, a016188. [Google Scholar] [CrossRef]
  58. Dalziel, M.; Crispin, M.; Scanlan, C.N.; Zitzmann, N.; Dwek, R.A. Emerging principles for the therapeutic exploitation of glycosylation. Science 2014, 343, 1235681. [Google Scholar] [CrossRef] [PubMed]
  59. Eme, L.; Spang, A.; Lombard, J.; Stairs, C.W.; Ettema, T.J. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol. 2017, 15, 711–723. [Google Scholar] [CrossRef]
  60. Williams, T.A.; Cox, C.J.; Foster, P.G.; Szöllősi, G.J.; Embley, T.M. Phylogenomics provides robust support for a two-domains tree of life. Nat. Ecol. Evol. 2020, 4, 138–147. [Google Scholar] [CrossRef]
Figure 1. The workflow starts from 15.8 million Archaea sequences originating from 177K SRA samples preprocessed by IMNGS. Every OTU sequence was searched against LTP and SILVA, and any match above 98% was replaced by the database sequence because of its higher quality. SINA was used for alignment and classification. After the most represented region was identified, the Escherichia coli 16S rRNA gene was aligned as well. The number of E. coli bases within the selected region (n = 244) was used as the minimum amount of information that an Archaea sequence had to contain to pass to the next stage. The extracted sub-sequences were then dereplicated, and the output served as our final dataset.
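The last two steps of the Figure 1 workflow (minimum-information filter, then dereplication) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name `filter_and_dereplicate`, the input layout, the treatment of `-` and `.` as gap characters, and the reading of the limit as "at least 244 bases" are all assumptions.

```python
MIN_BASES = 244  # lower limit quoted in the Figure 1 caption


def filter_and_dereplicate(region_seqs, min_bases=MIN_BASES):
    """Drop short region sub-sequences, then collapse exact duplicates.

    region_seqs: iterable of (seq_id, aligned_subsequence) pairs
                 (hypothetical input layout).
    Returns a dict mapping each unique ungapped sequence to the list of
    ids that carried it, in first-seen order.
    """
    unique = {}
    for seq_id, aligned in region_seqs:
        # Assumption: '-' and '.' are alignment gaps and carry no information.
        bases = aligned.replace("-", "").replace(".", "").upper()
        if len(bases) < min_bases:
            continue  # not enough information within the selected region
        unique.setdefault(bases, []).append(seq_id)
    return unique
```

Exact-match dereplication is the simplest reading of the caption; dedicated tools may also collapse prefixes or substrings, which this sketch does not attempt.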
Figure 2. The selected region almost perfectly mirrors the percentage dissimilarity of the full sequences (family level: 89%, genus level 93%, species level 97%). Thus, our region is representative of the diversity contained within the full 16S rRNA.
Figure 3. Krona plot quantifying the size of each order after TIC. Novel (created by TIC) and known molecular orders (provided by the SINA classifier) are included.
Figure 4. Schematic of the process for estimating the lower limit of archaeal diversity. A list of high-confidence SOTUs was assembled by taking, in every sample, the most abundant SOTU of each GOTU and joining these selections across all samples.
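The selection described in the Figure 4 caption can be written down in a few lines. The sketch below assumes a nested dictionary of read counts and a SOTU-to-GOTU lookup; both structures, the function name `high_confidence_sotus`, and the tie-breaking rule (first SOTU encountered wins) are illustrative assumptions rather than details given in the caption.

```python
def high_confidence_sotus(counts, sotu_to_gotu):
    """Union, over all samples, of the most abundant SOTU of each GOTU.

    counts:       {sample_id: {sotu_id: read_count}}  (hypothetical layout)
    sotu_to_gotu: {sotu_id: gotu_id}                  (hypothetical layout)
    """
    selected = set()
    for sotu_counts in counts.values():
        best = {}  # gotu_id -> (read_count, sotu_id) within this sample
        for sotu, n in sotu_counts.items():
            if n <= 0:
                continue  # SOTU absent from this sample
            gotu = sotu_to_gotu[sotu]
            # Assumption: on ties the first SOTU encountered is kept.
            if gotu not in best or n > best[gotu][0]:
                best[gotu] = (n, sotu)
        selected.update(sotu for _, sotu in best.values())
    return selected
```

Because each sample contributes at most one SOTU per GOTU, the size of the returned set gives a conservative count of the kind the caption calls the lower limit.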
Figure 5. GraPhlAn plot whose center depicts the taxonomic tree of the SOTUs after TIC, incorporating both novel (red) and known (white) clades up to the order level. The three inner rings quantify the number of (#) SOTUs, genera, and families within each order. Names of the orders containing more than 10K SOTUs are given on the left side. The fourth ring (Abundance) shows the environment of the original IMNGS sample from which the most abundant sequence within the order originates. The outer ring (Prevalence) depicts the environment chosen by majority vote over all sequences contained within the order.
Figure 6. Rarefaction curves indicating the cumulative archaeal diversity in broad ecological niches (gamma diversity). The X axis represents the number of (#) microbial profiles integrated. The Y axis represents the number of (#) SOTUs, in thousands, discovered up to the selected number of profiles. dsotu corresponds to the expected number of novel SOTUs gained by integrating one additional sample of that niche beyond those already included in the study. The dashed line marks the 1:1 rate (1 new SOTU per additional sample). (a) Host (Table S2); (b) Plant; (c) Soil; (d) Freshwater; (e) Saline water.
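A cumulative-diversity curve of the kind shown in Figure 6, together with a dsotu-like quantity, can be sketched as below. The caption does not state how dsotu was computed, so the end-of-curve slope used here is an assumption, as are the function names, the single random sample order, and the `window` parameter.

```python
import random


def accumulation_curve(samples, seed=0):
    """Cumulative number of distinct SOTUs as samples are added one by one.

    samples: list of sets of SOTU ids, one set per microbial profile.
    The order is shuffled once with a fixed seed (assumption: one random
    order; averaging over many orders would give a smoother curve).
    """
    order = list(samples)
    random.Random(seed).shuffle(order)
    seen, curve = set(), []
    for sotus in order:
        seen |= sotus
        curve.append(len(seen))
    return curve


def dsotu(curve, window=1):
    """Mean number of new SOTUs per sample over the last `window` samples."""
    if window < 1 or len(curve) <= window:
        raise ValueError("window must be >= 1 and shorter than the curve")
    return (curve[-1] - curve[-1 - window]) / window
```

A value above 1 places the end of the curve above the dashed 1:1 line, meaning that each additional sample is still expected to contribute more than one new SOTU.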
Figure 7. Novelty score per environment, based on the number of FOTUs, GOTUs, and SOTUs within the samples of each environment and normalized by the number of samples from that environment present in our dataset. Saline water samples contain the highest levels of unexplored novelty across all taxonomic levels, while host-associated samples (Table S2) and plant samples are the most extensively studied.
Figure 8. SOTU verification as a function of the similarity of matches to sequences in IMG/M. Horizontal red lines correspond to the similarity cutoffs used for assigning sequences to species (97%), genera (93%), and families (89%). Vertical blue lines indicate how many SOTUs are verified at each level. The dashed black line indicates the total number of SOTUs.
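The three cutoffs in Figure 8 amount to a simple rule that maps the best IMG/M match of a SOTU to the finest taxonomic level at which it is verified. The sketch below illustrates that rule; the function name `verification_level` and the choice of inclusive cutoffs (a 97.0% match counts as species level) are assumptions.

```python
# Similarity cutoffs (percent identity) quoted in the Figure 8 caption,
# ordered from the finest to the coarsest taxonomic level.
CUTOFFS = (("species", 97.0), ("genus", 93.0), ("family", 89.0))


def verification_level(best_identity):
    """Return the finest level at which a SOTU counts as verified.

    best_identity: percent identity of the best IMG/M match.
    Returns None when the match falls below the family cutoff.
    """
    for level, cutoff in CUTOFFS:
        if best_identity >= cutoff:  # assumption: cutoffs are inclusive
            return level
    return None
```

For example, a SOTU whose best match is 95% identical is verified at the genus level but not at the species level.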
Figure 9. (a) Environmental association of the Asgard Archaea SOTUs based on the origin of the IMNGS samples, with soil being the richest environment. (b) Distribution of the predicted SOTUs among the Asgard classes present in SILVA at the time of this analysis.
Figure 10. Placement on the phylogenetic tree of the Asgard SOTUs classified as UNKCLASS. Five clusters remain unknown and warrant further study, as they may represent novel Asgard classes. Cluster 5 contains 26 SOTUs (6 verified by different targets of IMG/M) from diverse environments, originating from 25 SRA samples.
Table 1. Comparison between the number of families, genera, and species included in the Living Tree Project (LTP), SILVA, and GTDB databases at the time of our study and those predicted from this study. (The # symbol refers to "the number of" the corresponding taxon.)
Database | #Families | #Genera | #Species
LTP | 35 | 129 | 490
SILVA | 82 | 161 | -
GTDB | 2621 | 2769 | 3044
GAD (singletons in) | 98,172 | 561,788 | 2,807,013
GAD (singletons out) | 30,887 | 87,200 | 419,934
Table 2. Number of SOTUs with a match in IMG/M at different similarity levels. SOTUs are divided according to the number of samples positive for their presence.
SOTU category | SOTUs | 99% | 97% | 93% | 89%
Singletons | 2,398,351 | 17,986 (0.7%) | 254,283 (10%) | 1,485,595 (61%) | 2,246,610 (93%)
Doubletons | 151,600 | 8157 (5%) | 52,428 (34%) | 112,662 (74%) | 144,255 (95%)
Tripletons | 61,824 | 4905 (7%) | 22,172 (35%) | 41,752 (67%) | 58,978 (95%)
Moretons | 195,238 | 36,579 (18%) | 61,631 (31%) | 151,397 (77%) | 189,018 (96%)
Total | 2,807,013 | 67,627 (2%) | 390,514 (13%) | 1,791,406 (63%) | 2,638,461 (93%)
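The percentages in Table 2 follow directly from the counts and can be rechecked with a few lines of arithmetic. The sketch below uses only the values printed in the table; the dictionary layout and the name `truncated_percent` are illustrative, and truncation (rather than rounding) is an assumption that happens to reproduce the printed whole-number percentages.

```python
# Rows of Table 2: (SOTUs, matches at 99%, 97%, 93%, 89% similarity).
table2 = {
    "Singletons": (2_398_351, 17_986, 254_283, 1_485_595, 2_246_610),
    "Doubletons": (151_600, 8_157, 52_428, 112_662, 144_255),
    "Tripletons": (61_824, 4_905, 22_172, 41_752, 58_978),
    "Moretons": (195_238, 36_579, 61_631, 151_397, 189_018),
}
reported_total = (2_807_013, 67_627, 390_514, 1_791_406, 2_638_461)


def truncated_percent(hits, total):
    """Whole-number percentage, truncated (assumption: not rounded)."""
    return 100 * hits // total


# Percentage of each category verified at 99%, 97%, 93%, and 89%.
percent_rows = {
    name: [truncated_percent(hits, row[0]) for hits in row[1:]]
    for name, row in table2.items()
}

# Column sums over the four categories, for comparison with the Total row.
column_sums = tuple(sum(column) for column in zip(*table2.values()))
```

With the values as printed, every column sums to its Total entry except the 89% column, where the four rows add up to 2,638,861 against a printed total of 2,638,461; the percentages are unaffected at the printed precision.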
