Combining Flow Cytometry and Metagenomics Improves Recovery of Metagenome-Assembled Genomes in a Cell Culture from Activated Sludge

The recovery of metagenome-assembled genomes is biased towards the most abundant species in a given community. To improve the identification of species, even if only dominant species are recovered, we investigated the integration of flow cytometry cell sorting with bioinformatics tools to recover metagenome-assembled genomes. We used a cell culture of a wastewater microbial community as our model system. Cells were separated based on fluorescence signals via flow cytometry cell sorting into sub-communities: dominant gates, low abundant gates, and outer gates into subsets of the original community. Metagenome sequencing was performed for all groups. The unsorted community was used as control. We recovered a total of 24 metagenome-assembled genomes (MAGs) representing 11 species-level genome operational taxonomic units (gOTUs). In addition, 57 ribosomal operational taxonomic units (rOTUs) affiliated with 29 taxa at species level were reconstructed from metagenomic libraries. Our approach suggests a two-fold increase in the resolution when comparing sorted and unsorted communities. Our results also indicate that species abundance is one determinant of genome recovery from metagenomes as we can recover taxa in the sorted libraries that are not present in the unsorted community. In conclusion, a combination of cell sorting and metagenomics allows the recovery of MAGs undetected without cell sorting.


Introduction
Metagenomics is a standard technique used to study microbial community composition and functional potential [1][2][3][4][5]. Thanks to novel bioinformatics tools, reconstructing genomes in complex communities has significantly improved [1]. High-throughput sequencing allows analyzing genomes from species within microbial communities, including hitherto-uncultured species [6][7][8]. The potential for obtaining genomes of uncultivable species with metagenomics has been proven by assembling nearly complete MAGs of dominant species from microbial communities with relatively low microbial diversity [9]. Nevertheless, the recovery of metagenome-assembled genomes (MAGs) is currently challenged by a lack of understanding of how specialized bioinformatics tools for assembly and binning influence MAG recovery [10]. For example, insufficient sequencing depth makes it difficult to reconstruct MAGs of individual species with relative abundances lower than 1% in a given metagenome [9]. This makes the recovery of MAGs of low-abundance species in complex communities difficult, as the reconstruction of nearly complete MAGs may demand hundreds of millions of short-sequencing reads (150-200 bp) per metagenome [11,12].
The limitations of current metagenomics approaches, such as access to rare species, introduce bias to the taxonomic composition of complex microbial communities in genomecentric metagenome studies. Eliminating this bias is particularly interesting in wastewater treatment plants because rare species, such as nitrifiers, may be underestimated or ignored. The efficiency of the system depends on the presence of certain species in the activated sludge (AS) [13], which is the most used process for wastewater treatment globally [13] and consists of a high concentration of microorganisms able to convert organic and inorganic wastewater constituents, thereby purifying water and synthesizing resources [13] that can be used in biotechnological processes [14]. Additionally, previous studies suggest that low abundant taxa found in a complex environment may become relevant back-ups under environmental stress [15] and should not be overlooked.
A potential strategy to circumvent the challenges of recovering low-abundant species in metagenomes is to reduce the complexity of a target microbial community. Greib and collaborators [12] revealed that obstacles associated with access to low abundant taxa in complex communities using metagenomics and single-cell sequencing could be addressed. To this end, microbial target groups were enriched before genomic analysis by taxonspecific fluorescent labeling of microbial species and fluorescence-activated cell sorting (FACS). FACS seems to be a promising technology that separates microbial communities into sub-communities [16] that can be sequenced separately. Flow cytometry and cell sorting have been combined as a high throughput technique for analyzing microbiomes at the single-cell level [17,18]. Cell sorting does not impair cells when separating them from the whole community [19], and reduces contamination by extracellular DNA [20]. The usefulness of this approach has already been demonstrated in soil samples [20]. The study of Alteio et al. [20] demonstrated that cell sorting and metagenomics approaches expanded the diversity of soil taxa and recovered more genomes from 'minimetagenomes' than bulk metagenomes.
This study aimed to improve MAG recovery by combining flow cytometry, cell sorting, and metagenomics from activated sludge microbiomes. Briefly, we sequenced metagenomes of the unsorted and sorted communities from a batch-grown activated sludge microbiome collected in a wastewater treatment plant. After MAG recovery, we dereplicated the different MAGs into genome operational taxonomic units (gOTUs) and compared their phylogeny and abundances to 16S rRNA genes (rOTUs) reconstructed from the different libraries. We used our data to check if the recovered MAGs belonged to the dominant microbial community in our cell culture. In addition to improving genome recovery from sorted and unsorted communities, we used our dataset to extrapolate the factors controlling the recovery of MAGs, including genome coverage and relative abundances. Furthermore, we defined a detection limit for MAG recovery and a strategy to define cut-off limits for MAG recovery in metagenomes.

Cell Cultivation
We used the cells of a microbial community originating from an activated sludge basin of a wastewater treatment plant (Eilenburg, Saxonia, Germany-51 • 27'39.4" N, 12 • 36'17.5" E) as our model microbial community described in Liu et al. [21]. The entire workflow employed in this study, including the experimental scheme of the sequential batch cultivation and treatment of the microbial community, is presented in Supplementary File S1: Figure S1A,B. The activated sludge sample (30 mL thawed inoculum) was first cultured in a 1-L batch flask in 300 mL medium at 30 • C and 125 rpm for 17 h to reduce the undegraded and inorganic particles from the sludge. The second and third batches (each also in 1-L flasks) were inoculated with an initial optical density (OD) of 0.05 (OD 600nm , d = 0.5 cm), each, and cultivated in the same way.

Metagenome Libraries, Quality Control, and Assembly of Raw Reads
The DNA extracts from the four metagenomes (UC, DG, LA, and OG) were used for library preparation and sequencing. The libraries were prepared using the Nextera XT DNA preparation kit from Illumina according to the manufacturer's instructions. Next, metagenomes were paired-end sequenced using Illumina (Illumina NextSeq 500; 2 × 150 bp) at a minimum of 20 million reads per library. Finally, high-throughput sequencing was done by StarSEQ ® GmbH. The sequenced libraries were imported in FASTQC format. The reads were quality controlled by removing low-quality reads and adapters using Trim Galore v0.4.3 implemented in metaWRAP pipeline [22]. The trimmed reads were assembled to contigs using metaSPAdes v3.11.1 [23].
MAGs were classified as medium quality if their quality score was greater than 50 and with completeness between 50% and 80% and less than 10% contamination. MAGs with a quality score greater than 50, completeness above 80%, and contamination below 5% were considered high-quality.

Phylogenetic Classification
We used GTDB-tk v0.3.2 [27,28] to assign the taxonomy to the MAGs recovered from all libraries. We selected a representative MAG from each taxonomic group for relative abundance analysis. This selection was based on the average nucleotide identity (ANI) distance between microbial species' genomes clustered at 95% ANI and on quality score metrics. We used Fastree2 [29] (default parameters) to construct a phylogenetic tree with concatenated protein alignments of gOTUs recovered from the unsorted community and sorted sub-communities. We used iTol [30] to generate the visual representation of the tree via the EMBL server (https://itol.embl.de/, accessed on 8 September 2022).

Clustering of MAG to Genome Operational Taxonomic Units (gOTUs)
To assess a better representation of our genomes across sorted sub-communities and unsorted communities, we dereplicated the recovered MAGs into gOTUs using Mu-DoGeR [31]. Briefly, MAGs and their reference sequences classified to the same taxonomy were grouped and clustered into gOTUs at 95% average nucleotide identity (ANI) distance using fastANI v1.0 [32], with default parameters. Next, we used hierarchical clustering with bootstrap resampling, resulting in 13 gOTUs with unique taxonomic classifications. Next, the gOTUs with the best quality per taxonomic cluster were selected as representatives and further processed for relative abundance analysis. To note that MAGs classified as Sphingobacterium (at the genus level) recovered in the DG and Comamonas B-9 (classified at the species level according to EZBioCloud.net) recovered in the UC were not included in further analyses.

Calculation of Genome Operational Taxonomic Units Relative Abundances and Coverage
The quality-checked reads from the four libraries were mapped to the gOTUs even if they were not recovered in a given library. Read mapping for all libraries was done using Bowtie2 [33], and mapped reads were retrieved using Samtools [34]. The relative abundance of each gOTU was calculated using Samtools [34]. We calculated the coverage values of gOTUs by multiplying the number of reads of genomes by the average size of reads in the libraries divided by the size of genomes (in base pairs) (Equation (2)). This analysis allowed a comparison of gOTUs abundances across different libraries. The evolutionary history of prokaryotes is complex, and it has been suggested that lateral gene transfer happens more often than not [35], which could lead to the recovery of genomes from species that are not present in the community. To avoid partial detection of (false positive) gOTUs in samples where they were not present, we assigned a gOTU in a sample when it showed a minimum of 10x coverage. The gOTUs with a coverage smaller than 10x were considered below detection limit (BDL). We also performed rarefactions analysis on the metagenomics libraries to determine if we reached the maximum detection of gOTUs within our detection limit by plotting the sample sizes versus observed coverage using the Vegan package of R [36].
In Equation (2), L is the average size of reads per library, N is the number of reads per genome and G is the genome size (in base pairs).

Reconstruction of Phylogenetic Marker Genes from Metagenomic Libraries
We reconstructed nearly full-length phylogenetic marker genes (16S rRNA) from metagenomic libraries using the Mapping-Assisted Targeted Assembly for Metagenomics (MATAM) pipeline v1.6.0 [37]. The paired-end reads of the metagenomic libraries were interleaved, and taxonomic information of 16S rRNA sequences was obtained by alignment to the SILVA database version 132 NR95 [38] with the RDP classifier [39].
Furthermore, MATAM mapped the reads of each marker gene to the metagenomic libraries and calculates the relative abundances of each 16S rRNA reconstructed sequence.

Curation of Reconstructed 16S rRNA, Removal of Chimeric and Clustering of Sequences
After running MATAM, we removed all 16S rRNA sequences smaller than 900 bp using SeqKit [40] since these might not identify bacteria at the species level. Next, chimeric sequences were removed from 16S rRNA sequences using the reference database SILVA 132 NR95 with the uchime_ref function of UCHIME [41]. Next, we performed primary clustering at 97% similarities per library. Furthermore, we merged representative 16S rRNA sequences from all libraries resulting from the primary clustering and clustered them at 97% similarity using cluster fast function from VSEARCH [42]. We name this final set of reconstructed 16S rRNA genes clustered at 97% similarity to our rOTUs. Additionally, to determine which rOTUs map to the gOTUs, sequences of rOTUs reconstructed from MATAM were further classified to species level using EZbiocloud [43].

Calculation of Relative Abundances of rOTUs
The relative abundances of rOTUs were determined in three steps. First, the relative abundances of 16S rRNA sequences obtained from MATAM were calculated by dividing the counts of each 16S rRNA sequence by the total number of sequences in each library (Supplementary File S5: Table S3). Second, from the primary clustering, we summed up the relative abundances of all sequences in each cluster per library (Supplementary File S6: Table S4). In the last step, we determined the relative abundances of rOTUs. After, we summed them with all other rOTUs in the clusters resulting from the secondary clustering (secondary clustering was performed with the merged representative sequences from all libraries (Supplementary File S7: Table S5)).

Separation of Sub-Communities by Flow Cytometry
A wastewater microbial community was cultivated in sequential batch culture prior to cytometric analyses. In a cytometric histogram, a total of 200,000 cells per sample were measured using cell information on scattering light and fluorescence signals attached to the DNA contents of cells ( Figure 1). A total of twenty-one gates were generated inside of the cell gate according to the position and cell abundances, reflecting the structure of a microbial community [21] (Figure 1, Supplementary File S3: Table S1). Cells were clustered into gates with an average cell abundance per gate of 7658.1 cells. Following from this, due to low cell numbers per some of the sub-communities, three types of sub-communities were defined and named as dominant (DG), low abundant (LA), and outer cells (OG). The DG consisted of five sub-communities (G1, G2, G3, G4, and G9), while the LA subcommunity consisted of 16 sub-communities (G5, G6, G7, G8, G10, G11, G12, G13, G14,  G15, G16, G17, G18, G19, G20, and G21)

Quality of Recovered Metagenome-Assembled Genomes
We recovered 24 MAGs of high and medium quality from all communities (sorted and unsorted). Three MAGs were retrieved from DG (two high-quality and one mediumquality), seven MAGs from LA (three high-quality and four medium-quality), and six from OG (two high-quality and four medium-quality). A total of eight MAGs were

Quality of Recovered Metagenome-Assembled Genomes
We recovered 24 MAGs of high and medium quality from all communities (sorted and unsorted). Three MAGs were retrieved from DG (two high-quality and one mediumquality), seven MAGs from LA (three high-quality and four medium-quality), and six from OG (two high-quality and four medium-quality). A total of eight MAGs were recovered from UC (six high-quality and two medium-quality) (Supplementary Data- Table S2). The quality of MAGs recovered in sorted sub-communities differed in terms of the N50 statistics (ranging between 16-527), strain heterogeneity (varied between 0-100), and quality score (ranging between 51.41-93.04). The number of high-quality MAGs recovered from LA (3) was superior to the MAGs recovered from DG (two) and OG (two) (Supplementary File S4: Table S2). Furthermore, we mapped 72.38% reads from DG, 61.51% from OG, 58.92% from LA and 76.34% from the UC to the MAGs recovered in those libraries. This analysis indicates that other species exist in the community and were not recovered with our approach.

Taxonomic Profiling of MAGs
MAGs were classified as bacteria and placed in five different Orders. These MAGs were affiliated to 11 different taxa at different taxonomic levels based on average nucleotide identity (ANI) distances ( Figure 2, Supplementary File S4: Table S2). Comparing the three sub-communities, the DG presented the least bacterial diversity with three different taxa: Elizabethkingia ursingii and Escherichia flexneri, classified up to species level, and Sphingobacterium sp., classified to genus level (Supplementary File S4: Table S2). The OG subcommunity was composed of six taxa: Sphingobacterium sp., Acinetobacter gerneri, E. flexneri, Comamonas terrigena and Empedobacter falsenii, classified to species level, and Variovorax sp., classified to genus level. The LA sub-community presented the highest bacterial diversity consisting of seven taxa: E.mpedobacter falsenii, E.scherichia flexneri, Elizabethkingia miricola, Acinetobacter pittii, C.omamonas terrigena, Sphingobacterium sp., classified to species level, and Variovorax sp., classified to genus level (Supplementary File S4: Table S2). Phylogenetic tree of representative metagenome-assembled genomes using the concatenated protein alignment of genes from GTDB-tk [28]. The inside node labels represent taxonomy at the species level. The inner column (A), represents taxonomy at the family level. The size of the scale indicates relative evolutionary divergence value for taxa at each taxonomic rank.
Compared to the three sub-communities, the UC presented fewer taxa recovered. Six taxa were classified to species level (Escherichia flexneri, C.omamonas terrigena, A. bouvetii, A. baumannii, Sphingobacterium sp., Empedobacter falsenii and Variovorax sp.) (Supplementary File S4: Table S2). A total of five species were recovered in both the unsorted community and at least one or more of the sorted sub-communities. E. flexneri was present in all libraries (unsorted community and sorted sub-communities) (Supplementary File S4: Table S2). The number of gOTUs observed between the unsorted and sorted sub-community is shown in a Venn diagram ( Figure 3).

Figure 2.
Phylogenetic tree of representative metagenome-assembled genomes using the concatenated protein alignment of genes from GTDB-tk [28]. The inside node labels represent taxonomy at the species level. The inner column (A), represents taxonomy at the family level. The size of the scale indicates relative evolutionary divergence value for taxa at each taxonomic rank.
Compared to the three sub-communities, the UC presented fewer taxa recovered. Six taxa were classified to species level (Escherichia flexneri, C.omamonas terrigena, A. bouvetii, A. baumannii, Sphingobacterium sp., Empedobacter falsenii and Variovorax sp.) (Supplementary File S4: Table S2). A total of five species were recovered in both the unsorted community and at least one or more of the sorted sub-communities. E. flexneri was present in all libraries (unsorted community and sorted sub-communities) (Supplementary File S4: Table S2). The number of gOTUs observed between the unsorted and sorted sub-community is shown in a Venn diagram ( Figure 3).  Table S2). A total of five species were recovered in both the unsorted community and at least one or more of the sorted sub-communities. E. flexneri was present in al libraries (unsorted community and sorted sub-communities) (Supplementary File S4: Table S2). The number of gOTUs observed between the unsorted and sorted sub-community is shown in a Venn diagram (Figure 3). MAGs. It presents the number of taxa observed between MAGs recovered from unsorted and sorted sub-communities. One taxon was present in both the unsorted community and sorted sub-commu nities. Four taxa were found in the unsorted community and a combination of one or more sub communities. Two taxa were present only in the unsorted community and five were exclusively found in the sorted sub-communities.

Metagenome-Assembled Genomes (MAGs) Unique to Sorted Sub-Communities
Four MAGs (MDS_FC07-MDS_FC10) affiliated with four species were exclusively recovered in the sorted sub-communities (Supplementary File S4: Table S2). Of these, one MAG (classified as Elizabethkingia ursingii) from DG was of high quality with estimated completeness of 82.8%, contamination of 0.06%, and quality score of 82.50% (Table 1 and  Supplementary File S4: Table S2). Two MAGs were exclusively retrieved in the LA subcommunity: Elizabethkingia miricola and A. pittii. The completeness and contamination of E. miricola were 67.92% and 2.15%, respectively. A. pittii completeness was estimated at 51.41% complete and free of contamination (Table 1 and Supplementary File S4: Table S2) One medium-quality MAG (62.93% completeness and zero contamination), recovered exclusively in OG, was classified as A. gerneri (Table 1 and Supplementary File S4: Table S2) It presents the number of taxa observed between MAGs recovered from unsorted and sorted subcommunities. One taxon was present in both the unsorted community and sorted sub-communities. Four taxa were found in the unsorted community and a combination of one or more sub-communities. Two taxa were present only in the unsorted community and five were exclusively found in the sorted sub-communities.

Metagenome-Assembled Genomes (MAGs) Unique to Sorted Sub-Communities
Four MAGs (MDS_FC07-MDS_FC10) affiliated with four species were exclusively recovered in the sorted sub-communities (Supplementary File S4: Table S2). Of these, one MAG (classified as Elizabethkingia ursingii) from DG was of high quality with estimated completeness of 82.8%, contamination of 0.06%, and quality score of 82.50% (Table 1 and  Supplementary File S4: Table S2). Two MAGs were exclusively retrieved in the LA subcommunity: Elizabethkingia miricola and A. pittii. The completeness and contamination of E. miricola were 67.92% and 2.15%, respectively. A. pittii completeness was estimated at 51.41% complete and free of contamination (Table 1 and Supplementary File S4: Table S2). One medium-quality MAG (62.93% completeness and zero contamination), recovered exclusively in OG, was classified as A. gerneri (Table 1 and Supplementary File S4: Table S2).

Coverage and Relative Abundance of gOTUs per Library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage. Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages. (DG: Dominant sub-community; LA: Low abundant sub-community; OG: Outer gate sub-community; UC: Unsorted sub-community).

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.  Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.  Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.  Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL). The marks indicate in which library the gOTUs were recovered. The numbers in parenthesis indicate the coverage times of each gOTU in the libraries expressed in percentages.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.
Microorganisms 2023, 11, x FOR PEER REVIEW 9 of 17 Table 1. Genome coverage of genome operational taxonomic units (gOTUs) per sub-community. All gOTUs in the same cluster were classified with the same taxonomy. Mapping the reads from all libraries to the gOTUs was done irrespective of their recovery. We removed gOTUs with less than ten times coverage and labeled them as below the detection limit (BDL

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage.

Coverage and Relative Abundance of gOTUs per library
We performed bin-relative abundance analysis using gOTUs recovered from the unsorted and sorted sub-communities to access the fraction of reads of gOTUs in each metagenome. After calculating the coverage, 11 taxa remained since one taxon (C. B-9) was found below the detection limit in all groups (Table 1) and removed from further analyses. The average detection limit of the gOTUs with coverage above ten times was 2.92% (±0.57). The rarefaction curves of genome coverage of the microbial communities observed in the unsorted and sorted sub-communities are illustrated in Figure 4, where all communities were able to achieve their plateau for the given cut-off coverage. On the other hand, coverage of Comamonas terrigena was almost 2-fold lower in OG (5.67%) compared to LA (10.90%) and UC (10.63%). The gOTUs with the smallest coverage in the UC are A. gerneri and Sphingobacterium sp. (3.03%). Further, nine gOTUs were found below the detection limit in one or more sub-communities: six gOTUs in the DG, two gOTUs in the LA, and one gOTU in the OG. However, these taxa were found with coverage above the threshold in one or more sub-communities (Table 1) (see Section 3.1). The genome coverages and relative abundances obtained for gOTUs in each sub-community can be found in Table 1. in all other sub-communities was greater than the average detection limit. Variovorax sp. classified gOTUs were recovered in all sub-communities except DG, with the relative abundances all above the average detection limit except for DG.
A total of 21 taxa were only found in rOTUs, with A. bereziniae, Chryseobacterium pasteurii, Sphingobacterium multivorum and Stenotrophomonas pavanii being recovered in all subcommunities (Supplementary File S8: Table S6). The relative abundances of Chryseobacterium pasteurii, Sphingobacterium multivorum and Stenotrophpmonas terrae were greater than the average detection limit (>0.15%) in rOTUs, while A. bereziniae relative abundance was below the average detection limit in DG and LA sub-communities. Our data indicated that relative abundance, although relevant, may not be the most important feature for the recovery of MAGs.

Discussion
Our study combined high-throughput flow cytometry cell sorting and metagenomics to recover genomes in a cell culture of a wastewater microbial community. For this purpose, we performed Influx high-throughput cell sorting on the microbial community after separating cells virtually into sub-communities and sorting them via FACS before sequencing. By sorting cells of sub-communities, we could recover more gOTUs with higher quality than in whole communities. This phenomenon is related to the limited ability of metagenomics to recover low-abundant species [12,44]. By dividing the community into sub-communities, we reduced diversity in each sub-community but increased resolution in the entire sample. In this study, the 2-fold increase in the number of taxa identified in the sub-communities compared to the unsorted communities indicates the benefits of employing high-throughput cell sorting before metagenomics analysis. The number of taxa obtained in the sub-communities (11) further highlights the benefits of this strategy as, on average, only five MAGs are recovered per library [45], in contrast to the number that was obtained by our approach with 24 MAGS.
In the present study, we recovered 16 MAGs from three sorted sub-communities (3 libraries), after staining the cells with DAPI and eight MAGs from the unsorted community (1 library). The study by Alteio et al. [20] also coupled metagenomics and flow cytometry cell sorting to improve genome recovery in soil communities. Alteio and collaborators sorted 359 'minimetagenome' (sorted cells) after staining the cells with SYBR green, which can be difficult, and managed to recover 200 MAGs from the sorted cells. Furthermore, they retrieved 29 MAGs from the whole community (4 libraries). A possible reason for Alteio and collaborators to generate fewer MAGs per sorted cell group might be that the authors sorted the cells in microwell plates and applied metagenomes to individual sorted cells. While in our study, we combined the individual gates and generated only three sub-communities. Therefore, in the current study, we did not recover individual cells but created smaller sub-communities showing that these two studies had different approaches. We suggest that combining both approaches may recover a larger fraction of genomes in a microbial community.
The taxonomy of the genomes of different bacterial clades present in the wastewater treatment plant, including Gammaproteobacteria (six unsorted and nine sorted MAGs) and Bacteroidia (two unsorted and seven sorted MAGs), was classified at family, genus, and species levels. We found some species in these bacterial clades unique to sorted subcommunities. Interestingly, the number of taxa observed in the unsorted community and sorted sub-communities showed that some MAGs overlapped between all groups. For instance, Escherichia flexnerii was constantly present in all pooled sub-communities and the unsorted community, and three taxa (Variovarax sp., Comamonas terrigena and Empedobacter falsenii) found in the unsorted community were also recovered from the low abundant and outer sub-communities.
One species (Escherichia flexneri) was present in all communities and showed a high genome-coverage in all of the communities where it was recovered, indicating that Escherichia flexneri is one of the most abundant species in the community. Furthermore, we could recover several species (Comamonas terrigena, Sphingobacterium sp., Empedobacter falsenii and Variovorax sp.) across multiple communities; however, their coverages were, in some cases, below the detection limit. Interestingly, other species only found in a single community exhibited higher coverages in the libraries where they were not recovered (for instance, A. bouvetti and A. baumanni). We observed a similar trend for Elizabethkingia ursingii, which showed higher coverages in the communities where it was not recovered (in fact, it was below the detection limit in the DG sub-community where it was recovered). In other cases, species showed a higher coverage in libraries where they were not recovered as a MAG (e.g., A. pitti and A. gerneri). Some species showed coverages below the detection limit in the libraries where they were recovered (e.g., Empedobacter falsenii and Elizabethkingia ursingii). This trend suggests that coverage alone is not the primary driver in genome recovery. A study by Meziti and collaborators [46] used a minimum of 7x MAGs sequence coverage to quantify gene-level diversity within populations of microbial communities. In the current study, we were stricter and used 10x genome coverage. Another study by Meziti et al. [47] suggested 10x coverage is necessary for the reliable recovery of high-quality MAGs in the metagenomes. Our data strongly suggest that coverage is not the only factor influencing genome recovery because we could recover genomes with coverages below the detection limit in the libraries where species showed less than 10× genome coverage. For example, genomes affiliated with Empedobacter falsenii and Elizabethkingia ursingii were recovered as high-quality MAGs, with coverage of 5x in LA and 8x in DG.
A study by Liu and collaborators [21], using the same inoculum from our study and different cultivation conditions, recovered 23 amplicon sequencing variant (ASV) phylotypes classified to class level. The authors clustered their ASVs at 97% sequence similarity and considered ASVs with relative abundances below 0.35% contamination based on internal controls added to the sequencing run. A total of 22 classes of prokaryotes present in the study of Liu and collaborators were not found in our gOTUs. These classes were Acidimicrobiia, Actinobacteria, Alphaproteobacteria, Anaerolineae, Bacilli, Blastocatellia, Betaproteobacteria, Caldilineae, Chlorobia, Chloroflexia, Clostridia, Cytophagia, Deltaproteobacteria, Flavobacteriia, Gracilibacteria, Halobacteria, Nitrospira, Thermotogae, Sphingobacteriia, ML635J-21, Verrucomicrobiae, and Saccharibacteria. We reconstructed rOTUs from the metagenomic libraries of the unsorted and sorted sub-communities by binning the interleaved paired-end raw reads individually with the MATAM pipeline [37] to determine if there were other species within our data that we could not recover using gOTUs, because the reconstruction of rOTUs needs a smaller coverage in the metagenomes. We found 59 rOTUs classified to species level. In comparing species found in gOTUs and rOTUs, a total of eight taxa found in gOTUs were also found in the rOTUs. Still, not all shared the same distribution profile (i.e., not found in the same sub-communities). For example, A. baumannii gOTUs were only recovered in the unsorted sub-community (UC). In contrast, A. baumannii rOTUs were recovered in all sub-communities. By comparing our rOTUs to that of Liu and collaborators [21] we observed similar trends of microbial community compositions (at the class level) in the rOTUs found in the present study. Six classes of taxa found in our study were also present in the former study. However, 17 additional taxa classes found in the study by Liu and colleagues were not detected in our rOTUs. These classes were Acidimicrobiia, Actinobacteria, Anaerolineae, Blastocatellia, Caldilineae, Chlorobia, Chloroflexia, Clostridia, Cytophagia, Deltaproteobacteria, Gracilibacteria, Halobacteria, Nitrospira, Thermotogae, ML635J-21, Verrucomicrobiae, and Saccharibacteria.
The average detection limit of relative abundance of recovered rOTUs was 0.15% (±0.20) in the unsorted and sorted sub-communities, almost 20 times lower than the detection of our prokaryotic metagenome-assembled genomes (gOTUs; i.e., 2.92% ± 0.57). Five rOTUs (Escherichia flexneri, A. baumannii, A. bouvetii, A. pittii and Elizabethkingia miricola) were recovered in all sub-communities. Furthermore, these species have relative abundances above the rOTUs average detection limit in all groups (Supplementary File S8: Table S6). In the study of Liu and collaborators [21], the detection limit was 0.35%, two times higher than our rOTUs detection limit. It is important to note that Liu and collab-orators had internal controls that were used to define the detection limit of the amplicon sequence variants in their studies. Studies involving internal controls for the recovery of metagenome-assembled genomes are lacking and would be necessary for determining the current limits of genome-centric analysis of microbial communities.
Several factors may influence MAG recovery, such as sequencing depth, coverage, fragmentation, genetic diversity, and relative abundance of species [48,49]. Here, we studied the effect of species coverage and relative abundance in MAG recovery. Our results show that species coverage and relative abundance in MAG recovery do not fully explain our ability to recover MAGs and that other factors may play a more significant role. Moreover, other issues have been understudied in the recovery of MAGs, such as the presence of closely related species, which can lead to chimeras [50] and difficulties in species abundance estimation via the alignment of short reads. Sorting individual cells could be the better strategy to enhance the recovery of genomes of species with low abundances in a complex environment [12]. Furthermore, we observed that, on average, 32.71% (±8.39) of reads from each library were not mapped to any MAG. Thus, the extraction of higher yields and higher-quality DNA [51] may also help circumvent some of these issues. Integrating flow cytometry and metagenomics shows a higher resolution in deciphering microbial community composition by decreasing community complexity, including long-read sequencing data leading to improved genome-centric analyses of microbial communities.

Conclusions
Combining high-throughput cell sorting with metagenomics offers a promising opportunity to improve the genome recovery of low-abundant species from complex communities by reducing the complexity of environmental samples through cell sorting. Our study demonstrated that genome coverage and relative abundances are not the only essential factors governing the recovery of metagenome-assembled genomes, and further studies are necessary to underpin them (e.g., the presence of closely related species). Although we recovered species with genome coverage below the threshold (10x genome coverage), it would be important to test if those species were real with the addition of genome standards (internal controls) prior to sequencing. Furthermore, we suggest that combining single-cell sorting with separating sub-communities using flow cytometry may increase the recovery of underrepresented species in unsorted microbial communities.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/microorganisms11010175/s1, Figure S1: Experimental design, Figure  S2: Flow cytometry measurement of three batches of cell culture, Table S1: gate clustering according to optical properties, relative abundances and cell counts, Table S2: Genome operational taxonomic units (gOTUs), taxonomic classification and genomic feature summary, Table S3: 16S rRNA sequences reconstructed from all sub-communities (DG, OG. LA, UC) using MATAM, Table S4: Number of clusters, sequences and the relative abundances of representative and all 16S rRNA sequences in all sub-communities (DG, OG. LA, UC), Table S5: Relative abundance of 16S rRNA operational taxonomic units (rOTUs), Table S6: Comparison of relative abundances of ribosomal operational taxonomic units (rOTUs) found in genome operational taxonomic units (gOTUs), Table S7: Comparison of gOTUs and rOTUs in the unsorted community (UC) and sorted sub-communities (dominant sub-community (DG), low abundant sub-community (LA), and outer sub-community (OG)).