Accuracy and Completeness of Long Read Metagenomic Assemblies

Microbes influence the surrounding environment and contribute to human health. Metagenomics can be used as a tool to explore the interactions between microbes. Metagenomic assemblies built using long read nanopore data depend on the read level accuracy. The read level accuracy of nanopore sequencing has made dramatic improvements over the past several years. However, we do not know if the increased read level accuracy allows for faster assemblers to make as accurate metagenomic assemblies as slower assemblers. Here, we present the results of a benchmarking study comparing three commonly used long read assemblers, Flye, Raven, and Redbean. We used a prepared DNA standard of seven bacteria as our input community. We prepared a sequencing library using a VolTRAX V2 and sequenced using a MinION mk1b. We basecalled with Guppy v5.0.7 using the super-accuracy model. We found that increasing read depth benefited each of the assemblers, and nearly complete community member chromosomes were assembled with as little as 10× read depth. Polishing assemblies using Medaka had a predictable improvement in quality. We found Flye to be the most robust across taxa and was the most effective assembler for recovering plasmids. Based on Flye’s consistency for chromosomes and increased effectiveness at assembling plasmids, we would recommend using Flye in future metagenomic studies.


Introduction
Current methods for identifying microbes involve isolating and sequencing individual community members, amplicon sequencing 16S rRNA genes, or metagenomics [1,2]. Isolating individual microbes requires culturing, which is often difficult or practically impossible [1]. Sequencing 16S rRNA genes cannot provide information on the entire genomes, such as genes that might increase virulence or provide antibiotic resistance [2]. Metagenomics is a method where an entire sample is sequenced, and the individual community members are sorted out later with bioinformatic analyses [3]. Metagenomic sequencing can detect unculturable and novel community members [1]. The individual community member sequences can be studied to identify pathogens in difficult to diagnose disease, genes that may increase virulence, and look for correlations between co-infecting pathogens that increase disease severity [1,2,4,5]. Currently, most metagenomic approaches use Illumina based technology, which produces high accuracy, short reads [2]. Over the past several years, Oxford Nanopore Technologies (ONT) has increased sequencing throughput and yield to be reasonable for metagenomic studies. While these reads are error prone, the reads are also orders of magnitude longer than short read platforms [2].
The short reads (150-300 bp) from Illumina sequencing make genome assembly difficult for complex communities. Short read lengths do not facilitate scaffolding multiple contigs built for a genome into a single scaffold, resulting in fragmented assemblies [6]. Short reads cannot span long repeat regions, causing repeat regions to collapse, providing less complete assemblies [7]. More complete genomes can be assembled using long read sequencing technologies, such as ONT or PacBio [6]. ONT sequencers platforms (e.g., MinION) have produced reads greater than 2 mb long and can easily produce libraries with mean read lengths greater than 16 kb, which makes it possible to assemble long repeat regions [8][9][10]. However, the high error rates of nanopore sequencing also prevent short read assemblers from producing quality assemblies with long read data [10,11].
Three commonly used long read specific assemblers include Flye, Raven, and Redbean [11][12][13][14]. Flye is a long read metagenomic assembler that constructs a repeat graph to assemble and polish contigs [15,16]. These contigs are then used to build an assembly graph with A-Bruijn [15,16]. Previous studies found that while Flye can build more accurate metagenomic assemblies than Raven or Redbean, it also takes more time and memory [11,17]. Raven is a fast assembler that uses an Overlap-Layout-Consensus (OLC) approach to build an assembly graph from raw reads [18]. For some individual assemblies Raven can have comparable accuracy to Flye after the assemblies are polished, but has less accuracy for metagenomic assemblies [13,17,18]. Redbean is another fast assembler that follows the OLC concept by using a fuzzy de Bruijn graph to build assemblies from raw reads [19,20]. Previous studies have found that Redbean uses more memory and builds less accurate assemblies than Raven [11,17].
Benchmarking is used to compare bioinformatics tools and to determine which tool is best suited for a particular task [12,21]. Benchmarking studies for metagenomic assemblers often include well characterized communities or mock communities, like one of the many ZymoBIOMICS Microbial Community Standards [6,11,15,22]. Mock communities are synthetic communities composed of multiple known microbes, with known sequences and abundances [23]. This information allows for accurate assessment and comparison of assemblers for metagenomic data A past benchmarking study using a ZymoBIOMICS Microbial Community Standard found that Raven and Redbean could not build complete assemblies for the E. coli and Salmonella enterica community members [11]. Raven did well for the other community members in the ZymoBIOMICS Microbial Community Standard [11] Raven also performs well for individual assemblies of E. coli [13,14]. These differences in performance suggest that the high read error rate may cause Raven to confuse genome fragments from other community members with E. coli fragments. If so, a higher read accuracy, as produced by Guppy 5.0.7, may allow Raven to assemble all community members from the mock community with similar accuracy to Flye. Another weakness of Raven and Redbean, is that they often fail to build assemblies for plasmids [17]. These weaknesses may limit the performance of Raven and Redbean for complex metagenomic assemblies, where plasmids may be common and particular community members may be present. Improvements in converting the electrical signal from nanopore sequencing to to nucleotides (basecalling) have led to increased read level nanopore sequence accuracy. The release of new models (e.g., super-accuracy model) for Guppy have pushed modal accuracy even higher (see also https://nanoporetech.com/accuracy (accessed on 18 July 2022)). As the individual reads improve in quality, faster assemblers, such as Raven, may be able to build assemblies of problematic community members, such as E. coli, with comparable accuracy to slower, but more accurate assemblers, like Flye.
Here, we compare the completeness and accuracy of metagenomic assemblies built with Flye, Raven, and Redbean. We used data basecalled with the super-accuracy model of Guppy to systematically explore the impact of read depth. From this comparison, we contrast the areas of strength and weakness of long read metagenomic assemblers.

Sequencing
We sequenced a mock community standard (ZymoBIOMICS HMW DNA Standard, catalog #D6322) using long read sequencing to compare metagenomic assembly methods. The HMW DNA standard is a synthetic microbial community comprising three Gram negative bacteria, four Gram positive bacteria, and one yeast (Table 1). Bacterial community members have a genome size between 2.73 mb to 6.792 mb, a GC content between 32.9% and 66.2% (Table 1). The E. coli plasmid is 110,009 bp, the S. aureus 1 plasmid is 6339 bp, the S. aureus 2 plasmid is 2218 bp, and the S. aureus 3 plasmid is 2995 bp. Each bacterial community also contributed 14% of nucleotides in the mock community (Table 1). We excluded the Saccharomyces community member due to the reference genome being highly fragmented. The template DNA in the community has a mean length of 24 kb. Reference genome sequences can be found at https://s3.amazonaws.com/zymo-files/BioPool/D632 2.refseq.zip (accessed on 15 July 2021).
We used 1 µg of the HMW DNA standard as input for the VolTRAX V2 (ONT) to prepare a sequencing library (VSK-VSK002 workflow). The VolTRAX library is analogous to the Rapid Sequencing library and results in additional DNA template fragmentation as the library is prepared. We sequenced the prepared library using the MinION mk1b (ONT) on a r9.4.1 flow cell (FLO-MIN106) for 48 h (VSK002 script). We basecalled the reads using Guppy version 5.0.7 with the super-accuracy model (-c dna_r9.4.1_450bps_sup.cfg). We set a minimum quality filter of ≥ 10 (-min_qscore 10). To generate a subsample of reads, we used trycycler [24]. We used a genome size of 42 mb and the -min read depth parameter to generate subsamples of 420 mb, 840 mb, 1260 mb, 2100 mb, 4200 mb, and 8400 mb. These total yields should theoretically represent 10×, 20×, 30×, 50×, 100×, and 200× read depths. At each read depth, we produced 12 subsamples for a total of 72 datasets. The mean number of bases, mean longest read length, and mean N50 for each read depth was found using NanoStat -fastq [25].

Quality Assessment
We measured assembly quality and completeness with the genome fraction output by MetaQuast v5.1.0 [27]. For MetaQuast we used the references in (Table 1) to measure the completeness of both the polished and unpolished metagenomic assemblies. We measured assembly accuracy with the median Q-score output by Pomoxis assess_assembly https://github.com/nanoporetech/pomoxis/ (accessed on 15 July 2021). Pomoxis was used with the references in (Table 1) to find the quality scores (Q-scores) of the assemblies. For each assembly, we calculated Q-scores for chromosomes and plasmids separately.
We completed all analysis, including assembly, polishing, and assembly quality assessment on a server with an Intel Core i9 9900K 3.6 GHz Eight Core (16 thread) CPU, a Nvidia Quadro GV100 GPU, and 128 GB of ram. We measured the time required and the max memory used to build each assembly using GNU time with parameter -f %ee. The time, assembly, polishing, MetaQuast, and Pomoxis steps were automated using custom bash scripts https://github.com/jeremyButtler/assembler-scripts (accessed on 22 June 2022).

Subsampling Statistics
We sequenced the ZymoBIOMICS HMW DNA Standard on a nanopore sequencer and subsampled reads into subsamples of 420 mb (~10× read depth), 840 mb (~20× read depth), 1260 mb (~30× read depth), 2100 mb (~50× read depth), 4200 mb (~100× read depth), and 8400 mb (~200× read depth). For each targeted read depth, our mean number of bases was very close to our target number of base pairs ( Table 2). The mean read N50 between our read depths differed by only 18 base pairs (15,012 to 15,030 bp) and was 300 bp greater than our raw data mean read N50 of 14,703 bp ( Table 2). Each time the read depth was doubled, we saw a two-fold increase in the mean number of reads (Table 2). Our mean Q-score for the raw data was 13.5. Across all read depths, we found Flye produced assemblies that aligned with nearly 100% of the community of reference genomes (metagenome fractions) (Figure 1a). Even at our smallest read depth of 10×, Flye recovered nearly 100% of the metagenomic fraction ( Figure 1a). With increasing read depth, Raven and Redbean produced assemblies with improved metagenome fractions (Figure 1a). Raven and Redbean reached a maximum metagenome fraction of 95% at 200× read depth (Figure 1b). At the individual community member level, Flye, Raven, and Redbean produced assemblies with over 99.9% median genome fractions for most of the community members ( Figure 1, Table A1). With Raven and Redbean having the most difficulty assembling Escherichia coli and Salmonella enterica, recovering less than 80% genome fraction even at 200× read depth (Figure 1b).

Accuracy (Q-Score)
Across all read depths, we found Flye produced the most accurate metagenomic assemblies, followed by Raven, and then Redbean (Figure 2a). Increased read depth and polishing, predictably improved the median quality scores (Q-scores) of assemblies from all assemblers (Figure 2a). All assemblers had a large improvement in Q-scores between 10× and 50× read depth (Figure 2a). At 200× read depth Flye reached a maximum Q-score of 50, while Raven and Redbean reached a maximum Q-score of 46 and 45, respectively, (Figure 2a). At the individual community member level, Raven and Redbean had the most difficulty in the assembly of E. coli and S. enterica (Figure 2b). E. coli assemblies produced with Flye were more accurate (median Q-score 42.81) than those from Raven (26.73) and Redbean (under 20) (Figure 2b). S. enterica assemblies produced by Flye were highly accurate (median Q-score 50), while Raven was slightly less accurate (42.54), but Redbean produced error prone assemblies (under 20) (Figure 2b). We also found that Raven and Redbean, but not Flye, had over 10 misassemblies for E. coli and S. enterica ( Figure A1).

Genome Fraction
Across all read depths, we found Flye recovered over 94% of the plasmid genomes (Figure 3a). After 50× read depth Flye recovered nearly 100% of the plasmid genomes (Figure 3a). After 20× read depth, Raven and Redbean decreased the recovery of plasmid genomes (Figure 3a). Raven and Redbean assembled a maximum of 95% of the plasmid genome at 20× read depth (Figure 3a).   At the individual plasmid level, Raven and Redbean both struggled with the plasmids smaller than 7 kb (Figure 3b). Raven and Redbean assembled more of plasmids under 7 kb at 30× and 50× read depth than at 200× read depth ( Figure A2a,b). Raven could assemble the 2995 bp plasmid for all replicates at 50× read depth, but not at 200× read depth ( Figure A2b).

Accuracy (Q-Score)
Across all read depths, we found that found that Flye assembled the most accurate plasmids (Figure 4a). With increased read depth, Flye produced more accurate plasmid assemblies (Figure 4a). However, Polishing did not improve the accuracy of Flye plasmid assemblies (Figure 4a). At 100× read depth, Flye plasmid assemblies had a median Q-score of 50 (Figure 4a).  Across all read depths, polishing Raven and Redbean plasmid assemblies resulted in more accurate plasmid genomes (Figure 4a). Increased read depth did not imporeve the accuracy of Raven produced plasmid assemblies (Figure 4a). Beyond 50× read depth, Redbean produced more accurate plasmid assemblies than Raven (Figure 4a). However, Raven build more accurate plasmids assemblies than Redbean when the read depth was under 100× (Figure 4a).
At the individual plasmid level, only the E. coli 110,009 bp plasmid could be assembled by all assemblers (Figure 4b). All assemblers had a similar accuracy for the E. coli plasmid (Q-scores around 26) (Figure 4b). All assemblers were able to assemble the E. coli plasmid without misassemblies, but Flye and Redbean did have misassemblies for the plasmids under 7 kb ( Figure A3). However, Flye assembled almost all replicates for each plasmid and had near perfect median Q-scores for plasmids under 7 kb (Figure 4b).

Assembly Time and Memory Usage
Predictably, we found that assemblers needed more time and memory to build an assembly with greater read input (Figure 5a,b). When the read depth was under 50×, all assemblers used less than 30 min to complete an assembly (Figure 5a). At 200× read depth, Flye needed over 400 minutes to complete an assembly (Figure 5a). With that same input Raven required just 50 min and Redbean required only 25 minutes to complete an assembly (Figure 5a). Across all read depths, Raven and Redbean used less memory than Flye to build an assembly (Figure 5b). At read depths under 100×, Raven used less memory than Redbean to build an assembly (Figure 5b). At 50× read depth, Raven needed 5.5 Gb of memory to build an assembly, while Redbean needed 7.7 Gb of memory to build an assembly (Figure 5b). Beyond 50× read depth, Raven used more memory than Redbean to build an assembly (Figure 5b). At 200× read depth, Raven used 15.6 Gb of memory to build an assembly, while Redbean used 10.5 Gb of memory to build an assembly (Figure 5b). Flye used the most memory to build an assembly, requiring 10.6 Gb of memory at 10× read depth and 55.8 Gb of memory at 200× read depth (Figure 5b).

Discussion
We compared the accuracy and completeness of metagenomic assemblies built by three long read assemblers, Flye, Raven, and Redbean. For chromosomes, we found Flye was the only assembler that made near complete and accurate genomes for all community members. For plasmids, we found Flye was the only assembler that could assemble all plasmids reliably. However, Raven and Redbean were superior to Flye in time and memory usage.

Effect of Read Depth
For chromosomes, we found with increased read depth, all assemblers made more accurate and complete assemblies. We found that there was a sharp increase in accuracy between 10× and 50× read depth. At 10× read depth, Flye was the only assembler that had near complete metagenome fractions. Showing that Flye should be used for low read depth datasets. However, for more accurate assemblies, future metagenomic studies should continue to aim for a read depth of at least 30×.
For plasmids, we found most plasmids under 7 kb were assembled best by Flye, with the most plasmids recovered at 200× read depth. However, Raven and Redbean had decreased small plasmid recovery at deeper read depths and performed best at read depths between 20× or 50×. The decrease in assembled plasmids under 7 kb at deeper read depths suggests that Raven and Redbean are discarding smaller reads and contigs at deeper read depths. This results in plasmids under 7 kb being missed at deeper read depths but being retained at more shallow read depths. These observations are consistent with Wick and Holt [17], who also found that both Raven and Redbean struggled to complete assemblies of smaller plasmids. These results highlight the weakness in Raven and Redbean for recovering plasmids.
We found that the accuracy of the larger E. coli plasmid (Q-score under 30) was much lower than the chromosome assemblies (40 or 50). This suggests that the plasmids have more error prone regions, assemblers are more likely to make misassemblies for plasmids, or that the plasmid references have more errors than the chromosome references. For reference errors, Flye could often assemble plasmids under 3 kb with no indels or mismatches and with only 2 to 3 misassemblies. Errors in the references are a less likely but still a potential explanation for why Flye, Raven, and Redbean had poor performance for the E. coli plasmid.
For misassembly errors, we found all assemblers had no errors in E. coli plasmid assemblies at 200× read depth, showing that the problem is not from misassemblies in the E. coli plasmid. Other sources of errors in the E. coli palsmid could be from more error prone regions in the E. coli plasmid or errors inserted by the assemblers in the assembly of the E. coli plasmid. During the process of generating these results, a new version of Flye was released (v2.9), which included improvements for recovering plasmids and accounts for the improved accuracy of the super-accuracy model. However, more testing with a broader range of plasmid sizes is needed to determine if the errors are from error prone regions or from the assembler.

Metagenomics and Viruses
Though our study examined a mock microbial community mostly consisting of bacterial genomes, our results still provide insights into how reliable each assembler may be for viral metagenomic assemblies. The E. coli plasmid in our study is 110 kb long, which is close to or under the size of a large virus, such as the 170 to 190 kb African swine fever virus (ASFV) [36]. We have previously used Flye to recover ASFV successfully from a metagenomic sample [37], while the smaller plasmids in our study are near the size of small viruses, such as porcine circovirus type 2, which is 1.76 kb long [38].
For the larger plasmids and likely larger viruses, we found that Raven or Redbean would likely work as well as Flye. However, only Flye could make reliable assemblies for the smaller plasmids and so, is the only reliable assembler for smaller viruses. Even then Flye will often have a few misassemblies, so it might be best to use an assembler, like viralFlye that is designed for viruses [39]. However, viralFlye is specialized for virus detection and thus has limitations on the max genome size [39]. This may limit viralFlye's use for bacterial community members. Making Flye or assemblies made with both viralFlye and Flye the best option for sequencing mixed communities of viruses and bacteria.

Effect of Polishing
We found that polishing improved the accuracy of all chromosome assemblies. However, for Flye and Redbean, polishing continued to improve the accuracy at 200× read depth. This suggests that even more data will improve the accuracy of polished Flye assemblies. To achieve highly accurate assemblies, we would recommend polishing and using the greatest read depth as possible.
For Flye, polishing had little effect on the accuracy of plasmid assemblies. Instead, most plasmids smaller than 3 kb had no indels or mismatches at 200× read depth. This shows that polishing did not decrease the accuracy of the perfect assemblies. Likely, the high accuracy was due to the genome sizes of the plasmids being smaller than the error rate of consensuses assemblies (one error in 10,000 bp for chromosomes). The idea of size is somewhat supported by the ten fold larger E. coli plasmid assemblies built by Flye having much higher error rates (median Q-score~28) than the plasmids under 3 kb. Since polishing provides large improvements for chromosomes, while having no decrease in accuracy for plasmids, we would recommend polishing all metagenomic assemblies.

Problem Isolates
We found that Raven and Redbean struggled to build assemblies of E. coli and Salmonella enterica. Latorre-Pérez et al. [11], also found that Raven and Redbean struggled with E. coli and S. enterica strains for the log and even mock communities from ZymoBIOMICS, both of which use the same E. coli and S. enterica strains as the HMW DNA Standard Mock Community. However, in a non-metagenomic study, Chen et al. [40] found that Raven could assemble complete genomes for a different strain of E. coli and possibly a different serovar of Salmonella (S. Typhimurium). This suggests that either the strain of E. coli used in the mock community is a problematic strain or that assembling genomes of E. coli combined with S. enterica is difficult. Breckell and Silander [13] found that strain specific characteristics of different E. coli made some E. coli strains harder to for assemblers to assemble, so it is possible that the strain of E. coli in the mock community could be a more difficult strain to assemble. However, Breckell and Silander [13] found that problematic strains of E. coli were problematic for all assemblers. Flye had few misassemblies for E. coli at 200× read depth and had more accurate assemblies of E. coli than Raven or Redbean. This evidence is not consistent with a problematic strain of E. coli. However, we cannot fully eliminate the idea that the strain of E. coli in the mock community may be a more difficult strain to assemble.

Other Studies
To the best of our knowledge, our study is the first study to compare metagenomic assemblies made by Flye, Raven, and Redbean using super-accurate basecalled reads. We found Flye still made more accurate and complete genomes than Raven or Redbean when used with highly accurate reads. This is consistent with a previous comparison of Flye, Raven, and Redbean assemblies made from the less accurate reads Latorre-Pérez et al. [11]. Like Sereika et al. [22] we found accurate genomes could be built from read depths as low as 30× using Flye (Q 45). This is an improvement from the Q-score of 43.6 at 80× read depth seen by Broddrick et al. [41] and the Q-score of around 40 seen by Latorre-Pérez et al. [11] for 6 Gb of data. We also know from Sereika et al. [22] that even higher accuracy can be achieved if a r10.4 flow cell is used instead of a r9.4 flow cell.
Like Latorre-Pérez et al. [11], we found that most community members had genome fractions over 99% at 200× read depth. However, our genome fractions for most community members were often at over 99.9%, which is higher than what Latorre-Pérez et al. [11] found at 3 Gb and 6 Gb. The only exceptions were assemblies of E. coli and S. enterica made by Raven and Redbean, which had lower genome fractions than what Latorre-Pérez et al. [11] found.
One key difference between this study and Latorre-Pérez et al. [11] is that the raw data used here had much longer reads (mean read N50 of 15 kb), where as Latorre-Pérez et al. [11] had 4 kb reads. This difference may account for the higher genome fractions for most community members, but does not explain the decrease in genome fractions for E. coli and S. enterica assemblies made by Raven or Redbean.
Like Breckell and Silander [13] and Latorre-Pérez et al. [11], we found Flye and Raven to be better than Redbean in assembling complete genomes. However, unlike Breckell and Silander [13], but like Latorre-Pérez et al. [11], we found Flye assembled more accurate assemblies than Raven. The difference may be that Breckell and Silander [13] looked at assembling single isolates instead of metagenomes in this study and Latorre-Pérez et al. [11]. This suggests that Raven may be better suited for assembling single isolates than metagenomics.
Like Wick and Holt [17], we found Flye needed more time and memory than Raven and Redbean to complete an assembly. The large time and memory demands of Flye may limit Flye to lab use or at least limit Flye to high end laptops. However, Flye was the only assembler able to assemble the entire mock community at Q-scores greater than 40. Furthermore, the use of the super-accuracy basecalling model will likely require a higher end laptop with a good GPU. This makes the high time and memory usage of Flye less of an issue.

Conclusions
We found Flye was more reliable than Raven or Redbean for building accurate and complete assemblies for both chromosomes and plasmids from metagenomic communities. We found that Raven and Redbean struggle to recover small plasmids. This suggests that Flye would be a better choice for assembling viral community members. For our study's community, Raven and Redbean only performed better than Flye in the computational resources needed to build an assembly. However, for a metagenomic study using the superaccurate basecalling model, the extra time and memory usage needed to run Flye would likely be minimal. On the other hand, the cost in accuracy from problematic communities members or missing small plasmid and virus assemblies from Raven and Redbean could lead to misinterpretations. Thus, for future metagenomic studies that use the super-accurate basecalling model, we would recommend using Flye. Data Availability Statement: The sequencing data for this project can be found in the NCBI SRA https://www.ncbi.nlm.nih.gov/sra/PRJNA903965 under accession number PRJNA903965.

Acknowledgments:
We are thankful to Tracie Haan and Taylor Seitz who sequenced the synthetic community used in our study. We would like to thank members of Drown Lab, Olin Silander, Ursel Schütte, Diane Wagner, Karsten Hueffer, and Eric Bortz who provided feedback on the experimental analysis and manuscript drafts.     Figure A4. Histogram of read length and quality of unsubsampled data.