MDPI - Publisher of Open Access Journals

21 pages, 6094 KB

Open Access Article

Nanopore-Aware Embedded Detection for Mobile DNA Sequencing: A Viterbi–HMM Design Versus Deep Learning Approaches

by Karim Hammad, Zhongpan Wu, Ebrahim Ghafar-Zadeh and Sebastian Magierowski

Biosensors 2025, 15(9), 569; https://doi.org/10.3390/bios15090569 - 1 Sep 2025

Viewed by 755

Nanopore-based DNA sequencing has emerged as a transformative biosensing technology, enabling real-time molecular diagnostics in compact and mobile form factors. However, the computational complexity of the basecalling process (the step that translates raw nanopore signals into nucleotide sequences) poses a critical energy challenge for mobile deployment. While deep learning (DL) models currently dominate this task due to their high accuracy, they demand substantial power budgets and computing resources, making them unsuitable for portable or field-scale biosensor platforms. In this work, we propose an embedded hardware–software framework for DNA sequence detection that leverages a Viterbi-based Hidden Markov Model (HMM) implemented on a custom 64-bit RISC-V core. The proposed HMM detector is realized on an off-the-shelf Virtex-7 FPGA and evaluated against state-of-the-art DL-based basecallers in terms of energy efficiency and inference accuracy. On the one hand, the experimental results show that our system achieves energy efficiency improvements of 6.5×, 5.5×, and 4.6× compared to similar HMM-based detectors implemented on a commodity x86 processor, a Cortex-A9 ARM embedded system, and a previously published Rocket-based system, respectively. On the other hand, the proposed detector demonstrates 15× and 2.4× better energy efficiency than state-of-the-art DL-based detectors, with competitive accuracy and sufficient throughput for field-based genomic surveillance applications and point-of-care diagnostics. This study highlights the practical advantages of classical probabilistic algorithms when tightly integrated with lightweight embedded processors for biosensing applications constrained by energy, size, and latency.

(This article belongs to the Special Issue Novel Nanomaterials and Nanotechnology: From Fabrication Methods and Improvement Strategies to Applications in Biosensing and Biomedicine (2nd Edition))
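For readers unfamiliar with the Viterbi decoding that the abstract above refers to, the following is a minimal sketch of the general algorithm, not the authors' implementation. The two-state model, the discretized "low"/"high" current levels, and all probabilities are hypothetical values chosen for illustration; practical nanopore HMMs typically use many more states and continuous emission models.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state path for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def log_table(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: two nucleotide states, two discretized current levels.
STATES = ("A", "C")
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = log_table({"A": {"A": 0.7, "C": 0.3}, "C": {"A": 0.3, "C": 0.7}})
LOG_EMIT = log_table({"A": {"low": 0.9, "high": 0.1}, "C": {"low": 0.1, "high": 0.9}})

print(viterbi(["low", "low", "high", "high"], STATES, LOG_START, LOG_TRANS, LOG_EMIT))
# prints ['A', 'A', 'C', 'C']
```

Working in log space replaces products of small probabilities with sums, which avoids numerical underflow on long signals and is a common choice when targeting fixed-point or resource-constrained hardware.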

26 pages, 916 KB

Open Access Review

Integrating Artificial Intelligence in Next-Generation Sequencing: Advances, Challenges, and Future Directions

by Konstantina Athanasopoulou, Vasiliki-Ioanna Michalopoulou, Andreas Scorilas and Panagiotis G. Adamopoulos

Curr. Issues Mol. Biol. 2025, 47(6), 470; https://doi.org/10.3390/cimb47060470 - 19 Jun 2025

Cited by 6 | Viewed by 3159

Abstract

The integration of artificial intelligence (AI) into next-generation sequencing (NGS) has revolutionized genomics, offering unprecedented advancements in data analysis, accuracy, and scalability. This review explores the synergistic relationship between AI and NGS, highlighting its transformative impact across genomic research and clinical applications. AI-driven tools, including machine learning and deep learning, enhance every aspect of NGS workflows, from experimental design and wet-lab automation to bioinformatics analysis of the generated raw data. Key applications of AI integration in NGS include variant calling, epigenomic profiling, transcriptomics, and single-cell sequencing, where AI models such as CNNs, RNNs, and hybrid architectures outperform traditional methods. In cancer research, AI enables precise tumor subtyping, biomarker discovery, and personalized therapy prediction, while in drug discovery, it accelerates target identification and repurposing. Despite these advancements, challenges persist, including data heterogeneity, model interpretability, and ethical concerns. This review also discusses the emerging role of AI in third-generation sequencing (TGS), addressing long-read-specific challenges, like fast and accurate basecalling, as well as epigenetic modification detection. Future directions should focus on implementing federated learning to address data privacy, advancing interpretable AI to improve clinical trust, and developing unified frameworks for seamless integration of multi-modal omics data. By fostering interdisciplinary collaboration, AI promises to unlock new frontiers in precision medicine, making genomic insights more actionable and scalable.

(This article belongs to the Special Issue Technological Advances Around Next-Generation Sequencing Application)

19 pages, 2493 KB

Open Access Article

From Metagenomes to Functional Expression of Resistance: floR Gene Diversity in Bacteria from Salmon Farms

by Javiera Ortiz-Severín, Iñaki Hojas, Felipe Redin, Ervin Serón, Jorge Santana, Alejandro Maass and Verónica Cambiazo

Antibiotics 2025, 14(2), 122; https://doi.org/10.3390/antibiotics14020122 - 24 Jan 2025

Cited by 2 | Viewed by 1795

Abstract

Background. The increase in antibiotic resistance in human-impacted environments, such as coastal waters with aquaculture activity, is related to the widespread use of antibiotics, even at sub-lethal concentrations. In Chile, the world’s second largest producer of salmon, aquaculture is considered the main source of antibiotics in coastal waters. In this work, we aimed to characterize the genetic and phenotypic profiles of antibiotic resistance in bacterial communities from salmon farms. Methods. Bacterial metagenomes from an intensive aquaculture zone in southern Chile were sequenced, and the composition, abundance and sequence of antibiotic resistance genes (ARGs) were analyzed using assembled and raw read data. Total DNA from bacterial communities was used as a template to recover floR gene variants, which were tested by heterologous expression and functional characterization of phenicol resistance. Results. Prediction of ARGs in salmon farm metagenomes using more permissive parameters yielded significantly more results than the default Resistance Gene Identifier (RGI) software. ARGs grouped into drug classes showed similar abundance profiles to global ocean bacteria. The floR gene was the most abundant phenicol-resistance gene with the lowest gene counts, showing a conserved sequence although with variations from the reference floR. These differences were recovered by RGI prediction and, in greater depth, by mapping reads to the floR sequence using SNP base-calling. These variants were analyzed by heterologous expression, revealing the co-existence of high- and low-resistance sequences in the environmental bacteria. Conclusions. This study highlights the importance of combining metagenomic and phenotypic approaches to study the genetic variability in and evolution of antibiotic-resistant bacteria associated with salmon farms.

24 pages, 2494 KB

Open Access Article

The Impact of Oxford Nanopore Technologies Based Methodologies on the Genome Sequencing and Assembly of Romanian Strains of Drosophila suzukii

by Attila Cristian Ratiu, Adrian Ionascu and Nicoleta Denisa Constantin

Insects 2025, 16(1), 2; https://doi.org/10.3390/insects16010002 - 24 Dec 2024

Cited by 1 | Viewed by 2060

Abstract

Background: Drosophila suzukii is a worldwide invasive species with serious economic impacts. Herein, we present the first project to sequence and assemble the whole genomes of two lines of D. suzukii derived from Romanian local populations using exclusively Oxford Nanopore Technologies data. Methods: We implemented both MinION and Flongle flow-cells and tested the impact of various basecalling models and assembly strategies on the quality of the sought-after representative genome assemblies. Results: We demonstrate that the sup-basecalling model significantly improved the read quality and that adding a relatively small collection of reads had a significant positive impact on the assembly quality. The novel dScaff bioinformatics prototype tool allowed us to perform sequence-level quality tests, as well as to represent assembly selections and display both the contig redundancy and the repeats-enriched genomic sub-sequences. Moreover, we used dScaff to propose a minimal assembly variant corresponding to one of our lines, GB-ls-coga4, which assured a basic linear coverage of the genome and exhibited quality parameters comparable to those of the current reference genome assembly. Conclusions: The study presents the first sequencing and assembly of a D. suzukii line in Romania and argues for the efficiency of long-read sequencing strategies.

(This article belongs to the Section Insect Molecular Biology and Genomics)

20 pages, 3569 KB

Open Access Article

Moving Beyond Oxford Nanopore Standard Procedures: New Insights from Water and Multiple Fish Microbiomes

by Ricardo Domingo-Bretón, Federico Moroni, Socorro Toxqui-Rodríguez, Álvaro Belenguer, M. Carla Piazzon, Jaume Pérez-Sánchez and Fernando Naya-Català

Int. J. Mol. Sci. 2024, 25(23), 12603; https://doi.org/10.3390/ijms252312603 - 23 Nov 2024

Cited by 2 | Viewed by 3335

Abstract

Oxford Nanopore Technology (ONT) allows for the rapid profiling of aquaculture microbiomes. However, not all the experimental and downstream methodological possibilities have been benchmarked. Here, we aimed to offer novel insights into the use of different library preparation methods (standard-RAP and native barcoding-LIG), primers (V3–V4, V1–V3, and V1–V9), and basecalling models (fast-FAST, high-HAC, and super-accuracy-SUP) implemented in ONT to elucidate the microbiota associated with the aquatic environment and farmed fish, including faeces, skin, and intestinal mucus. Microbial DNA from water and faeces samples could be amplified regardless of the library–primer strategy, but only with LIG and V1–V3/V1–V9 primers in the case of skin and intestine mucus. Low taxonomic assignment levels were favoured by the use of full-length V1–V9 primers, though in silico hybridisation revealed a lower number of potential matching sequences in the SILVA database, especially evident with the increase in Actinobacteriota in real datasets. SUP execution allowed for a higher median Phred quality (24) than FAST (11) and HAC (17), but its execution time (6–8 h) was higher in comparison to the other models (0.6–7 h). Altogether, we optimised the use of ONT for water- and fish-related microbial analyses, validating, for the first time, the use of the LIG strategy. We consider that LIG–V1–V9-HAC is the optimal time/cost-effective option to amplify the microbial DNA from environmental samples. However, the use of V1–V3 could help to maximise the dataset microbiome diversity, representing an alternative when long amplicon sequences become compromised by microbial DNA quality and/or high host DNA loads interfere with the PCR amplification/sequencing procedures, especially in the case of gut mucus.

(This article belongs to the Special Issue Molecular Progression of Gut Microbiota)

21 pages, 3662 KB

Open Access Review

The Third-Generation Sequencing Challenge: Novel Insights for the Omic Sciences

by Carmela Scarano, Iolanda Veneruso, Rosa Redenta De Simone, Gennaro Di Bonito, Angela Secondino and Valeria D’Argenio

Biomolecules 2024, 14(5), 568; https://doi.org/10.3390/biom14050568 - 10 May 2024

Cited by 40 | Viewed by 9653

Abstract

The understanding of the human genome has been greatly improved by the advent of next-generation sequencing technologies (NGS). Despite the undeniable advantages responsible for their widespread diffusion, these methods have some constraints, mainly related to short read length and the need for PCR amplification. As a consequence, long-read sequencers, called third-generation sequencing (TGS), have been developed, promising to overcome the limitations of NGS. Starting from the first prototype, TGS has progressively improved its chemistries, increasing both read length and base-calling accuracy while simultaneously reducing the cost per base. Based on these premises, TGS is showing its potential in many fields, including the analysis of difficult-to-sequence genomic regions, structural variation detection, RNA expression profiling, DNA methylation studies, and metagenomic analyses. Protocol standardization and the development of easy-to-use pipelines for data analysis will enhance TGS use, also opening the way for its routine application in diagnostic contexts.

(This article belongs to the Section Molecular Genetics)

26 pages, 6676 KB

Open Access Article

Nanopore-Sequencing Metabarcoding for Identification of Phytopathogenic and Endophytic Fungi in Olive (Olea europaea) Twigs

by Ioannis Theologidis, Timokratis Karamitros, Aikaterini-Eleni Vichou and Dimosthenis Kizis

J. Fungi 2023, 9(11), 1119; https://doi.org/10.3390/jof9111119 - 18 Nov 2023

Cited by 4 | Viewed by 4100

Abstract

Metabarcoding approaches for the identification of plant disease pathogens and characterization of plant microbial populations constitute a rapidly evolving research field. Fungal plant diseases are of major phytopathological concern; thus, the development of metabarcoding approaches for the detection of phytopathogenic fungi is becoming increasingly imperative in the context of plant disease prognosis. We developed a multiplex metabarcoding method for the identification of fungal phytopathogens and endophytes in young olive shoots, using the MinION sequencing platform (Oxford Nanopore Technologies). Selected fungal-specific primers were used to amplify three different genomic DNA loci (ITS, beta-tubulin, and 28S LSU) originating from olive twigs. A multiplex metabarcoding approach was initially evaluated using healthy olive twigs, and further assessed with naturally infected olive twig samples. Bioinformatic analysis of basecalled reads was carried out using MinKNOW, BLAST+ and R programming, and results were also evaluated using the BugSeq cloud platform. Data analysis highlighted the approaches based on ITS and their combination with beta-tubulin as the most informative ones according to diversity estimations. Subsequent implementation of the method on symptomatic samples identified major olive pathogens and endophytes including genera such as Cladosporium, Didymosphaeria, Paraconiothyrium, Penicillium, Phoma, Verticillium, and others.

22 pages, 2255 KB

Open Access Article

Sequencing, Fast and Slow: Profiling Microbiomes in Human Samples with Nanopore Sequencing

by Yunseol Park, Jeesu Lee and Hyunjin Shim

Appl. Biosci. 2023, 2(3), 437-458; https://doi.org/10.3390/applbiosci2030028 - 17 Aug 2023

Cited by 5 | Viewed by 4517

Abstract

Rapid and accurate pathogen identification is crucial in effectively combating infectious diseases. However, the current diagnostic tools for bacterial infections predominantly rely on century-old culture-based methods. Furthermore, recent research highlights the significance of host–microbe interactions within the host microbiota in influencing the outcome of infection episodes. As our understanding of science and medicine advances, there is a pressing need for innovative diagnostic methods that can identify pathogens and also rapidly and accurately profile the microbiome landscape in human samples. In clinical settings, such diagnostic tools will become a powerful predictive instrument in directing the diagnosis and prognosis of infectious diseases by providing comprehensive insights into the patient’s microbiota. Here, we explore the potential of long-read sequencing in profiling the microbiome landscape from various human samples in terms of speed and accuracy. Using nanopore sequencers, we generate native DNA sequences from saliva and stool samples rapidly, from which each long-read is basecalled in real-time to provide downstream analyses such as taxonomic classification and antimicrobial resistance through the built-in software (<12 h). Subsequently, we utilize the nanopore sequence data for in-depth analysis of each microbial species in terms of host–microbe interaction types and deep learning-based classification of unidentified reads. We find that the nanopore sequence data encompass complex information regarding the microbiome composition of the host and its microbial communities, and also shed light on the unexplored human mobilome including bacteriophages. 
In this study, we use two different systems of long-read sequencing to give insights into human microbiome samples in the ‘slow’ and ‘fast’ modes, which raises additional inquiries regarding the precision of this novel technology and the feasibility of extracting native DNA sequences from other human microbiomes.

(This article belongs to the Special Issue Feature Papers in Applied Biosciences 2023)

16 pages, 1005 KB

Open Access Article

Estimated Nucleotide Reconstruction Quality Symbols of Basecalling Tools for Oxford Nanopore Sequencing

by Wiktor Kuśmirek

Sensors 2023, 23(15), 6787; https://doi.org/10.3390/s23156787 - 29 Jul 2023

Cited by 5 | Viewed by 2617

Abstract

Currently, one of the fastest-growing DNA sequencing technologies is nanopore sequencing. One of the key stages involved in processing sequencer data is the basecalling process, where the input sequence of currents measured on the nanopores of the sequencer reproduces the DNA sequences, called DNA reads. Many of the applications dedicated to basecalling, together with the DNA sequence, provide the estimated quality of the reconstruction of a given nucleotide (quality symbols are contained on every fourth line of the FASTQ file; each nucleotide in the FASTQ file corresponds to exactly one estimated nucleotide reconstruction quality symbol). Herein, we compare the estimated nucleotide reconstruction quality symbols (signs from every fourth line of the FASTQ file) reported by other basecallers. The conducted experiments consisted of basecalling the same raw datasets from the nanopore device by other basecallers and comparing the provided quality symbols, denoting the estimated quality of the nucleotide reconstruction. The results show that the estimated quality reported by different basecallers may vary, depending on the tool used, particularly in terms of range and distribution. Moreover, we mapped basecalled DNA reads to reference genomes and calculated matched and mismatched rates for groups of nucleotides with the same quality symbol. Finally, the presented paper shows that the estimated nucleotide reconstruction quality reported in the basecalling process is not used in any investigated tool for processing nanopore DNA reads.

(This article belongs to the Section Chemical Sensors)
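The FASTQ layout described in the abstract above (one quality symbol per nucleotide, stored on every fourth line) can be illustrated with a short sketch. It assumes the common Phred+33 encoding, in which a symbol's quality score is its ASCII code minus 33 and a score Q corresponds to an estimated error probability of 10^(-Q/10). The function names and the example record are hypothetical and are not taken from the paper.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # third line: '+' separator, ignored here
        symbols = handle.readline().rstrip("\n")  # fourth line: quality symbols
        if len(symbols) != len(sequence):
            raise ValueError(f"quality/sequence length mismatch in {header!r}")
        # Assumption: Phred+33 encoding (quality score = ASCII code - 33).
        yield header[1:], sequence, [ord(c) - 33 for c in symbols]


def error_probability(q):
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


# Hypothetical single-record example.
example = StringIO("@read1\nACGT\n+\nI5+!\n")
for name, sequence, qualities in read_fastq(example):
    print(name, sequence, qualities)
# prints: read1 ACGT [40, 20, 10, 0]
```

Grouping nucleotides by their decoded score, as the abstract describes, then amounts to bucketing aligned bases by the integers this function returns.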

12 pages, 1015 KB

Open Access Article

Direct Nanopore Sequencing of Human Cytomegalovirus Genomes from High-Viral-Load Clinical Samples

by Kathy K. Li, Betty Lau, Nicolás M. Suárez, Salvatore Camiolo, Rory Gunson, Andrew J. Davison and Richard J. Orton

Viruses 2023, 15(6), 1248; https://doi.org/10.3390/v15061248 - 26 May 2023

Cited by 5 | Viewed by 3489

Abstract

Nanopore sequencing is becoming increasingly commonplace in clinical settings, particularly for diagnostic assessments and outbreak investigations, due to its portability, low cost, and ability to operate in near real-time. Although high sequencing error rates initially hampered the wider implementation of this technology, improvements have been made continually with each iteration of the sequencing hardware and base-calling software. Here, we assess the feasibility of using nanopore sequencing to determine the complete genomes of human cytomegalovirus (HCMV) in high-viral-load clinical samples without viral DNA enrichment, PCR amplification, or prior knowledge of the sequences. We utilised a hybrid bioinformatic approach that involved assembling the reads de novo, improving the consensus sequence by aligning reads to the best-matching genome from a collated set of published sequences, and polishing the improved consensus sequence. The final genomes from a urine sample and a lung sample, the former with an HCMV to human DNA load approximately 50 times greater than the latter, achieved 99.97 and 99.93% identity, respectively, to the benchmark genomes obtained independently by Illumina sequencing. Thus, we demonstrated that nanopore sequencing is capable of determining HCMV genomes directly from high-viral-load clinical samples with a high accuracy.

(This article belongs to the Section Human Virology and Viral Diseases)

15 pages, 3896 KB

Open Access Article

Accuracy and Completeness of Long Read Metagenomic Assemblies

by Jeremy Buttler and Devin M. Drown

Microorganisms 2023, 11(1), 96; https://doi.org/10.3390/microorganisms11010096 - 30 Dec 2022

Cited by 7 | Viewed by 4248

Abstract

Microbes influence the surrounding environment and contribute to human health. Metagenomics can be used as a tool to explore the interactions between microbes. Metagenomic assemblies built using long read nanopore data depend on the read level accuracy. The read level accuracy of nanopore sequencing has made dramatic improvements over the past several years. However, we do not know if the increased read level accuracy allows for faster assemblers to make as accurate metagenomic assemblies as slower assemblers. Here, we present the results of a benchmarking study comparing three commonly used long read assemblers, Flye, Raven, and Redbean. We used a prepared DNA standard of seven bacteria as our input community. We prepared a sequencing library using a VolTRAX V2 and sequenced using a MinION mk1b. We basecalled with Guppy v5.0.7 using the super-accuracy model. We found that increasing read depth benefited each of the assemblers, and nearly complete community member chromosomes were assembled with as little as 10× read depth. Polishing assemblies using Medaka had a predictable improvement in quality. We found Flye to be the most robust assembler across taxa and the most effective for recovering plasmids. Based on Flye’s consistency for chromosomes and increased effectiveness at assembling plasmids, we recommend using Flye in future metagenomic studies.

(This article belongs to the Special Issue 10th Anniversary of Microorganisms: Past, Present and Future)

16 pages, 2277 KB

Open Access Article

Globally Disseminated Multidrug Resistance Plasmids Revealed by Complete Assembly of Multidrug Resistant Escherichia coli and Klebsiella pneumoniae Genomes from Diarrheal Disease in Botswana

by Teddie O. Rahube, Andrew D. S. Cameron, Nicole A. Lerminiaux, Supriya V. Bhat and Kathleen A. Alexander

Appl. Microbiol. 2022, 2(4), 934-949; https://doi.org/10.3390/applmicrobiol2040071 - 11 Nov 2022

Cited by 5 | Viewed by 3353

Abstract

Antimicrobial resistance is a disseminated global health challenge because many of the genes that cause resistance can transfer horizontally between bacteria. Despite the central role of extrachromosomal DNA elements called plasmids in driving the spread of resistance, the detection and surveillance of plasmids remains a significant barrier in molecular epidemiology. We assessed two DNA sequencing platforms alone and in combination for laboratory diagnostics in Botswana by annotating antibiotic resistance genes and plasmids in extensively drug resistant bacteria from diarrhea in Botswana. Long-read Nanopore DNA sequencing and high accuracy basecalling effectively estimated the architecture and gene content of three plasmids in Escherichia coli HUM3355 and two plasmids in Klebsiella pneumoniae HUM7199. Polishing the assemblies with Illumina reads increased base calling precision with small improvements to gene prediction. All five plasmids encoded one or more antibiotic resistance genes, usually within gene islands containing multiple antibiotic and metal resistance genes, and four plasmids encoded genes associated with conjugative transfer. Two plasmids were almost identical to antibiotic resistance plasmids sequenced in Europe and North America from human infection and a pig farm. These One Health connections demonstrate how low-, middle-, and high-income countries collectively benefit from increased whole genome sequencing capacity for surveillance and tracking of infectious diseases and antibiotic resistance genes that can transfer between animal hosts and move across continents.

14 pages, 3392 KB

Open Access Article

Basecalling Using Joint Raw and Event Nanopore Data Sequence-to-Sequence Processing

by Adam Napieralski and Robert Nowak

Sensors 2022, 22(6), 2275; https://doi.org/10.3390/s22062275 - 15 Mar 2022

Cited by 5 | Viewed by 5377

Abstract

Third-generation DNA sequencers provided by Oxford Nanopore Technologies (ONT) produce a series of samples of an electrical current in the nanopore. Such a time series is used to detect the sequence of nucleotides. The task of translating current values into nucleotide symbols is called basecalling. Various solutions for basecalling have already been proposed. The earlier ones were based on Hidden Markov Models, but the best ones use neural networks or other machine learning models. Unfortunately, the achieved accuracy scores are still lower than those of competing sequencing techniques, like Illumina’s. Basecallers differ in the input data type: currently, most of them work on raw data straight from the sequencer (a time series of current values). Still, the approach of using event data is also explored. Event data is obtained by preprocessing raw data and dividing it into segments described by several features computed from the raw data values within each segment. We propose a novel basecaller that uses joint processing of raw and event data. We define basecalling as a sequence-to-sequence translation, and we use a machine learning model based on an encoder–decoder architecture of recurrent neural networks. Our model incorporates twin encoders and an attention mechanism. We tested our solution on simulated and real datasets. We compare the full model accuracy results with its components: processing only raw or event data. We compare our solution with the existing ONT basecaller, Guppy. Results of numerical experiments show that joint raw and event data processing provides better basecalling accuracy than processing each data type separately. We implement an application called Ravvent, freely available under MIT licence.

(This article belongs to the Section Intelligent Sensors)

8 pages, 2198 KB

Open Access Communication

Streamlining Quantitative Analysis of Long RNA Sequencing Reads

by Sebastian Oeck, Alicia I. Tüns, Sebastian Hurst and Alexander Schramm

Int. J. Mol. Sci. 2020, 21(19), 7259; https://doi.org/10.3390/ijms21197259 - 1 Oct 2020

Cited by 1 | Viewed by 4008

Abstract

Transcriptome analyses allow for linking RNA expression profiles to cellular pathways and phenotypes. Despite improvements in sequencing methodology, whole transcriptome analyses are still tedious, especially for methodologies producing long reads. Currently available data analysis software often lacks cost- and time-efficient workflows. Although kit-based workflows and benchtop platforms for RNA sequencing provide software options, e.g., cloud-based tools to analyze basecalled reads, quantitative and easy-to-use solutions for transcriptome analysis, especially for non-human data, are missing. We therefore developed a user-friendly tool, termed Alignator, for rapid analysis of long RNA reads requiring only FASTQ files and an Ensembl cDNA database reference. After successful mapping, Alignator generates quantitative information for each transcript and provides a table in which sequenced and aligned RNAs are stored for further comparative analyses.

(This article belongs to the Collection Feature Papers in “Molecular Biology”)

17 pages, 1867 KB

Open Access Article

Comparison of Illumina versus Nanopore 16S rRNA Gene Sequencing of the Human Nasal Microbiota

by Astrid P. Heikema, Deborah Horst-Kreft, Stefan A. Boers, Rick Jansen, Saskia D. Hiltemann, Willem de Koning, Robert Kraaij, Maria A. J. de Ridder, Chantal B. van Houten, Louis J. Bont, Andrew P. Stubbs and John P. Hays

Genes 2020, 11(9), 1105; https://doi.org/10.3390/genes11091105 - 21 Sep 2020

Cited by 83 | Viewed by 15468

Abstract

Illumina and nanopore sequencing technologies are powerful tools that can be used to determine the bacterial composition of complex microbial communities. In this study, we compared nasal microbiota results at genus level using both Illumina and nanopore 16S rRNA gene sequencing. We also monitored the progress of nanopore sequencing in the accurate identification of species, using pure, single-species cultures, and evaluated the performance of the nanopore EPI2ME 16S data analysis pipeline. Fifty-nine nasal swabs were sequenced using Illumina MiSeq and Oxford Nanopore 16S rRNA gene sequencing technologies. In addition, five pure cultures of relevant bacterial species were sequenced with the nanopore sequencing technology. The Illumina MiSeq sequence data were processed using bioinformatics modules present in the Mothur software package. Albacore and Guppy basecalling, a workflow in nanopore EPI2ME (Oxford Nanopore Technologies, ONT, Oxford, UK), and an in-house developed bioinformatics script were used to analyze the nanopore data. At genus level, similar bacterial diversity profiles were found, and five main and established genera were identified by both platforms. However, probably due to mismatching of the nanopore sequence primers, the nanopore sequencing platform identified Corynebacterium in much lower abundance compared to Illumina sequencing. Further, when using default settings in the EPI2ME workflow, almost all sequence reads that seem to belong to the bacterial genus Dolosigranulum, and a considerable part of those belonging to the genus Haemophilus, were only identified at family level. Nanopore sequencing of single-species cultures demonstrated at least 88% accurate identification of the species at genus and species level for 4/5 strains tested, with improvements in accurate sequence read identification observed when the basecallers Guppy and Albacore, and the flowcell versions R9.4 and R9.2 (both Oxford Nanopore Technologies, ONT, Oxford, UK), were compared. In conclusion, the current study shows that the nanopore sequencing platform is comparable with the Illumina platform in detecting bacterial genera of the nasal microbiota, but the nanopore platform does have problems in detecting bacteria within the genus Corynebacterium. Although advances are being made, thorough validation of the nanopore platform is still recommended.

(This article belongs to the Special Issue Omics Research of Pathogenic Microorganisms)
