The “Dry-Lab” Side of Food Authentication: Benchmark of Bioinformatic Pipelines for the Analysis of Metabarcoding Data
Abstract
1. Introduction
2. Materials and Methods
2.1. Input Data
2.2. Bioinformatic Pipelines Application for the Analysis of Sequencing Data
2.2.1. First Bioinformatic Pipeline (BP1)
2.2.2. Second Bioinformatic Pipeline (BP2)
2.3. Output Data Analysis
2.3.1. Comparison of Retained Sequences and Number of Features
2.3.2. Data Filtering, Sample Composition Comparison, and Statistical Analysis
2.3.3. Alpha Diversity Indices: Shannon Index and Species Richness
2.4. User-Friendliness Evaluation and Comparison of the Bioinformatic Pipelines
3. Results and Discussion
3.1. Output Data Analysis
3.1.1. Retained Sequences and Number of Features
3.1.2. Sample Composition and Statistical Analysis
3.1.3. Alpha Diversity Indices: Shannon Index and Species Richness
3.2. User-Friendliness Evaluation and Comparison of the Bioinformatic Pipelines
3.2.1. Computational Skills and System Requirements (C1)
3.2.2. Data Analysis Streamlining (C2)
3.2.3. Cost of Analysis (C3)
3.2.4. Computational Time Consumption (C4)
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Danezis, G.P.; Tsagkaris, A.S.; Camin, F.; Brusic, V.; Georgiou, C.A. Food authentication: Techniques, trends & emerging approaches. TrAC Trends Anal. Chem. 2016, 85, 123–132.
2. Giusti, A.; Malloggi, C.; Magagna, G.; Filipello, V.; Armani, A. Is the metabarcoding ripe enough to be applied to the authentication of foodstuff of animal origin? A systematic review. Compr. Rev. Food Sci. Food Saf. 2024, 23, e13256.
3. Vinothkanna, A.; Dar, O.I.; Liu, Z.; Jia, A.-Q. Advanced detection tools in food fraud: A systematic review for holistic and rational detection method based on research and patents. Food Chem. 2024, 446, 138893.
4. Luque, G.M.; Donlan, C.J. The characterization of seafood mislabeling: A global meta-analysis. Biol. Conserv. 2019, 236, 556–570.
5. Giusti, A.; Malloggi, C.; Lonzi, V.; Forzano, R.; Meneghetti, B.; Solimeo, A.; Tinacci, L.; Armani, A. Metabarcoding for the authentication of complex seafood products: The fish burger case. J. Food Compos. Anal. 2023, 123, 105559.
6. Hellberg, R.S.; Hernandez, B.C.; Hernandez, E.L. Identification of meat and poultry species in food products using DNA barcoding. Food Control 2017, 80, 23–28.
7. Sanger, F.; Nicklen, S.; Coulson, A.R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 1977, 74, 5463–5467.
8. Pan, Y.; Qiu, D.; Chen, J.; Yue, Q. Combining a COI Mini-Barcode with Next-Generation Sequencing for Animal Origin Ingredients Identification in Processed Meat Product. J. Food Qual. 2020, 2020, 1–9.
9. Jagadeesan, B.; Gerner-Smidt, P.; Allard, M.W.; Leuillet, S.; Winkler, A.; Xiao, Y.; Chaffron, S.; Van Der Vossen, J.; Tang, S.; Katase, M.; et al. The use of next generation sequencing for improving food safety: Translation into practice. Food Microbiol. 2018, 79, 96–115.
10. Callahan, B.J.; McMurdie, P.J.; Rosen, M.J.; Han, A.W.; Johnson, A.J.A.; Holmes, S.P. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 2016, 13, 581–583.
11. Callahan, B.J.; McMurdie, P.J.; Holmes, S.P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 2017, 11, 2639–2643.
12. Ismail, H.D. Bioinformatics: A Practical Guide to Next Generation Sequencing Data Analysis; Chapman and Hall/CRC: New York, NY, USA, 2023.
13. Westcott, S.L.; Schloss, P.D. De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ 2015, 3, e1487.
14. Hakimzadeh, A.; Asbun, A.A.; Albanese, D.; Bernard, M.; Buchner, D.; Callahan, B.; Caporaso, J.G.; Curd, E.; Djemiel, C.; Durling, M.B.; et al. A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses. Mol. Ecol. Resour. 2023, 24, e13847.
15. Mathon, L.; Valentini, A.; Guérin, P.; Normandeau, E.; Noel, C.; Lionnet, C.; Boulanger, E.; Thuillier, W.; Bernatchez, L.; Mouillot, D.; et al. Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification. Mol. Ecol. Resour. 2021, 21, 2565–2579.
16. Mbareche, H.; Dumont-Leblond, N.; Bilodeau, G.J.; Duchaine, C. An Overview of Bioinformatics Tools for DNA Meta-Barcoding Analysis of Microbial Communities of Bioaerosols: Digest for Microbiologists. Life 2020, 10, 185.
17. Roy, S.; Coldren, C.; Karunamurthy, A.; Kip, N.S.; Klee, E.W.; Lincoln, S.E.; Leon, A.; Pullambhatla, M.; Temple-Smolkin, R.L.; Voelkerding, K.V.; et al. Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines. J. Mol. Diagn. 2018, 20, 4–27.
18. Gargis, A.S.; Kalman, L.; Lubin, I.M. Assuring the Quality of Next-Generation Sequencing in Clinical Microbiology and Public Health Laboratories. J. Clin. Microbiol. 2016, 54, 2857–2865.
19. Jeske, J.T.; Gallert, C. Microbiome Analysis via OTU and ASV-Based Pipelines—A Comparative Interpretation of Ecological Data in WWTP Systems. Bioengineering 2022, 9, 146.
20. D’Argenio, V.; Casaburi, G.; Precone, V.; Salvatore, F. Comparative Metagenomic Analysis of Human Gut Microbiome Composition Using Two Different Bioinformatic Pipelines. BioMed Res. Int. 2014, 2014, 1–10.
21. Glassman, S.I.; Martiny, J.B.H. Broadscale Ecological Patterns Are Robust to Use of Exact Sequence Variants versus Operational Taxonomic Units. mSphere 2018, 3, e00148-18.
22. Barnes, C.J.; Rasmussen, L.; Asplund, M.; Knudsen, S.W.; Clausen, M.-L.; Agner, T.; Hansen, A.J. Comparing DADA2 and OTU clustering approaches in studying the bacterial communities of atopic dermatitis. J. Med. Microbiol. 2020, 69, 1293–1302.
23. Chiarello, M.; McCauley, M.; Villéger, S.; Jackson, C.R. Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold. PLoS ONE 2022, 17, e0264443.
24. Kang, W.; Anslan, S.; Börner, N.; Schwarz, A.; Schmidt, R.; Künzel, S.; Rioual, P.; Echeverría-Galindo, P.; Vences, M.; Wang, J.; et al. Diatom metabarcoding and microscopic analyses from sediment samples at Lake Nam Co, Tibet: The effect of sample-size and bioinformatics on the identified communities. Ecol. Indic. 2021, 121, 107070.
25. Kappel, K.; Gadelmeier, A.; Denay, G.; Gerdes, L.; Graff, A.; Hagen, M.; Hassel, M.; Huber, I.; Näumann, G.; Pavlovic, M.; et al. Detection of adulterated meat products by a next-generation sequencing-based metabarcoding analysis within the framework of the operation OPSON X: A cooperative project of the German National Reference Centre for Authentic Food (NRZ-Authent) and the competent German food control authorities. J. Consum. Prot. Food Saf. 2023, 18, 375–391.
26. Klapper, R.; Velasco, A.; Döring, M.; Schröder, U.; Sotelo, C.G.; Brinks, E.; Muñoz-Colmenero, M. A next-generation sequencing approach for the detection of mixed species in canned tuna. Food Chem. X 2023, 17, 100560.
27. Denay, G.; Preckel, L.; Petersen, H.; Pietsch, K.; Wöhlke, A.; Brünen-Nieweler, C. Benchmarking and Validation of a Bioinformatics Workflow for Meat Species Identification Using 16S rDNA Metabarcoding. Foods 2023, 12, 968.
28. Giusti, A.; Spatola, G.; Mancini, S.; Nuvoloni, R.; Armani, A. Novel foods, old issues: Metabarcoding revealed mislabeling in insect-based products sold by e-commerce on the EU market. Food Res. Int. 2024, 184, 114268.
29. Piper, A.M.; Batovska, J.; Cogan, N.O.I.; Weiss, J.; Cunningham, J.P.; Rodoni, B.C.; Blacket, M.J. Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. GigaScience 2019, 8, giz092.
30. Oksanen, J.; Simpson, G.; Blanchet, F.; Kindt, R.; Legendre, P.; Minchin, P.; O’Hara, R.; Solymos, P.; Stevens, M.; Szoecs, E.; et al. _vegan: Community Ecology Package_. R package Version 2.6-4. 2022. Available online: https://CRAN.R-project.org/package=vegan (accessed on 2 May 2024).
31. de Santiago, A.; Pereira, T.J.; Mincks, S.L.; Bik, H.M. Dataset complexity impacts both MOTU delimitation and biodiversity estimates in eukaryotic 18S rRNA metabarcoding studies. Environ. DNA 2021, 4, 363–384.
32. Anslan, S.; Mikryukov, V.; Armolaitis, K.; Ankuda, J.; Lazdina, D.; Makovskis, K.; Vesterdal, L.; Schmidt, I.K.; Tedersoo, L. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ 2021, 9, e12254.
33. Reitmeier, S.; Hitch, T.C.A.; Treichel, N.; Fikas, N.; Hausmann, B.; Ramer-Tait, A.E.; Neuhaus, K.; Berry, D.; Haller, D.; Lagkouvardos, I.; et al. Handling of spurious sequences affects the outcome of high-throughput 16S rRNA gene amplicon profiling. ISME Commun. 2021, 1, 1–12.
34. Joppich, M.; Zimmer, R. From command-line bioinformatics to bioGUI. PeerJ 2019, 7, e8111.
35. Mahmoud, M.A.A.; Magdy, M. Metabarcoding profiling of microbial diversity associated with trout fish farming. Sci. Rep. 2021, 11, 421.
36. Pérez-Fleitas, E.; Milián-García, Y.; Sosa-Rodríguez, G.; Amato, G.; Rossi, N.; Shirley, M.H.; Hanner, R.H. Environmental DNA-based biomonitoring of Cuban Crocodylus and their accompanying vertebrate fauna from Zapata Swamp, Cuba. Sci. Rep. 2023, 13, 20438.
37. Milián-García, Y.; Young, R.; Madden, M.; Bullas-Appleton, E.; Hanner, R.H. Optimization and validation of a cost-effective protocol for biosurveillance of invasive alien species. Ecol. Evol. 2021, 11, 1999–2014.
38. Milián-García, Y.; Janke, L.A.A.; Young, R.G.; Ambagala, A.; Hanner, R.H. Validation of an Effective Protocol for Culicoides Latreille (Diptera: Ceratopogonidae) Detection Using eDNA Metabarcoding. Insects 2021, 12, 401.
39. Giorgi, F.M.; Ceraolo, C.; Mercatelli, D. The R Language: An Engine for Bioinformatics and Data Science. Life 2022, 12, 648.
40. Giardine, B.; Riemer, C.; Hardison, R.C.; Burhans, R.; Elnitski, L.; Shah, P.; Zhang, Y.; Blankenberg, D.; Albert, I.; Taylor, J.; et al. Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 2005, 15, 1451–1455.
41. Blankenberg, D.; Taylor, J.; Schenck, I.; He, J.; Zhang, Y.; Ghent, M.; Veeraraghavan, N.; Albert, I.; Miller, W.; Makova, K.D.; et al. A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly. Genome Res. 2007, 17, 960–964.
42. Afgan, E.; Baker, D.; van den Beek, M.; Blankenberg, D.; Bouvier, D.; Čech, M.; Chilton, J.; Clements, D.; Coraor, N.; Eberhard, C.; et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 2016, 44, W3–W10.
43. McMurdie, P.J.; Holmes, S. phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 2013, 8, e61217.
44. Wickham, H.; Averick, M.; Bryan, J.; Chang, W.; McGowan, L.D.A.; François, R.; Grolemund, G.; Hayes, A.; Henry, L.; Hester, J.; et al. Welcome to the tidyverse. J. Open Source Softw. 2019, 4, 1686.
45. Brandies, P.A.; Hogg, C.J. Ten simple rules for getting started with command-line bioinformatics. PLoS Comput. Biol. 2021, 17, e1008645.
46. Salmaso, N.; Riccioni, G.; Pindo, M.; Vasselon, V.; Domaizon, I.; Kurmayer, R. Metabarcoding protocol: Analysis of Bacteria (including Cyanobacteria) using the 16S rRNA gene and a DADA2 pipeline (Version 1). Interreg Alpine Space: Salzburg, Austria, 2021.
47. Yilmaz, P.; Parfrey, L.W.; Yarza, P.; Gerken, J.; Pruesse, E.; Quast, C.; Schweer, T.; Peplies, J.; Ludwig, W.; Glöckner, F.O. The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks. Nucleic Acids Res. 2013, 42, D643–D648.
48. Pruesse, E.; Quast, C.; Knittel, K.; Fuchs, B.M.; Ludwig, W.; Peplies, J.; Glöckner, F.O. SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007, 35, 7188–7196.
49. Wang, Q.; Garrity, G.M.; Tiedje, J.M.; Cole, J.R. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl. Environ. Microbiol. 2007, 73, 5261–5267.
50. Maidak, B.L.; Cole, J.R.; Lilburn, T.G.; Parker, C.T., Jr.; Saxman, P.R.; Farris, R.J.; Garrity, G.M.; Olsen, G.J.; Schmidt, T.M.; Tiedje, J.M. The RDP-II (Ribosomal Database Project). Nucleic Acids Res. 2001, 29, 173–174.
51. Edgar, R.C. Updating the 97% identity threshold for 16S ribosomal RNA OTUs. Bioinformatics 2018, 34, 2371–2375.
| Classification Criterion | BP Category | Description |
|---|---|---|
| Level of customization | Customizable | Chain of software, tools, or algorithms whose commands and settings can be modified to suit different users’ needs [14]. |
| | Precompiled | Chain of software, tools, or algorithms with predefined, validated commands and settings, which usually simplify the analysis for users with limited bioinformatics skills [14]. |
| Feature typology | Operational Taxonomic Units (OTUs) | Includes a hierarchical clustering phase in which raw sequences are grouped into OTUs according to their pairwise similarity (de novo clustering) [12,13]. |
| | Amplicon Sequence Variants (ASVs) | Includes a denoising phase, instead of a clustering phase, in which an error-correction algorithm is applied to the sequences to produce features [10]. The resulting ASVs are exact sequence variants that can differ from one another by as little as a single base pair [11]. |
| Users’ interface | Command-line interface (CLI/CL) | BP built using software in which commands are typed into a terminal [14]. |
| | Graphical user interface (GUI) | BP built using software in which users interact with graphical icons [14]. |
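The practical difference between the two feature typologies can be illustrated with a minimal Python sketch of greedy centroid clustering, one common de novo clustering strategy. It assumes equal-length, pre-aligned reads and measures identity as the fraction of matching positions; real tools compute identity from pairwise alignments, and the function names `identity` and `greedy_cluster` are hypothetical. At a 97% threshold, two 100 bp reads that differ by one base fall into the same OTU; at 100% they remain separate features.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Simplifying assumption: sequences are pre-aligned and gap-free; real
    clustering tools derive identity from a pairwise alignment instead.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(abundances: Dict[str, int], threshold: float) -> List[List[str]]:
    """Greedy centroid clustering: sequences are visited from most to least
    abundant; each joins the first centroid it matches at >= threshold
    identity, otherwise it becomes the centroid of a new cluster (OTU)."""
    ordered = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters: List[List[str]] = []
    for seq in ordered:
        for cluster in clusters:
            if identity(cluster[0], seq) >= threshold:  # cluster[0] is the centroid
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


base = "A" * 100
variant = "C" + "A" * 99          # one mismatch: 99% identity to base
reads = {base: 50, variant: 10}
print(len(greedy_cluster(reads, 0.97)))  # 1: the variant is absorbed into the OTU
print(len(greedy_cluster(reads, 1.00)))  # 2: both sequences kept
```

Note that a 100% threshold only collapses identical reads. It does not replace the error model that a denoiser such as DADA2 uses to decide whether a one-base variant is a genuine ASV or a sequencing error.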
| Authors | Tool/Algorithm | Feature Typology | Level of Customization | Users’ Interface | Type of Comparison |
|---|---|---|---|---|---|
| Denay et al. [27] | VSearch | OTU-based (95% de novo clustering) | Customizable | CLI | Workflow performance [a] |
| | VSearch | OTU-based (97% de novo clustering) | Customizable | CLI | |
| | VSearch | OTU-based (100% de novo clustering, i.e., dereplication) | Customizable | CLI | |
| | DADA2 | ASV-based (denoising) | Customizable | CLI | |
| Kappel et al. [25] | VSearch | OTU-based (97% de novo clustering) | Customizable | CLI | Retained sequences (minimum, maximum, mean, SD); number and percentage of features (OTUs, ASVs); sample composition |
| | VSearch | OTU-based (100% de novo clustering, i.e., dereplication) | Customizable | CLI | |
| | DADA2 | ASV-based (denoising) | Customizable | CLI | |
| Klapper et al. [26] | QIIME (DADA2) | ASV-based (denoising) | Customizable | CLI | Sample composition |
| | Galaxy (DADA2) | ASV-based (denoising) | Customizable | GUI | |
| | Galaxy (VSearch) | OTU-based (97% de novo clustering) | Customizable | GUI | |
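Several of the compared workflows include a 100% clustering step, i.e., dereplication, which collapses identical reads into unique sequences and records how often each was observed. A minimal Python sketch of that step is shown below; `dereplicate` is a hypothetical helper, not a VSEARCH or DADA2 function.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical reads (100% identity) into unique sequences,
    returned with their abundances in decreasing order of abundance."""
    counts = Counter(reads)
    # Sort by abundance (descending), breaking ties alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTGA"]
print(dereplicate(reads))  # [('ACGT', 3), ('ACGA', 1), ('TTGA', 1)]
```

Sorting by abundance is a deliberate choice: greedy clustering steps often consume unique sequences in that order, so that the most abundant sequences become cluster centroids.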
| Criteria | Sub-Criteria (SC) | Score 0 | Score 1 |
|---|---|---|---|
| Computational skills and system requirements (C1) | Is the pipeline available on Windows? (SC1a) | No | Yes |
| | Is any programming experience needed to use the pipeline? (SC1b) | Yes | No |
| Data analysis streamlining (C2) | Can the BP be easily applied to all samples simultaneously? (SC2a) | No | Yes |
| | Can output data analysis (i.e., diversity indices and plotting of results) be performed within the software hosting the BP? (SC2b) | No | Yes |
| Cost of analysis (C3) | Is the software hosting the BP free of charge? (SC3a) | No | Yes |
| | Are free tutorials available for using the pipeline? (SC3b) | No | Yes |
| Computational time consumption (C4) | Which BP is faster? (SC4a) | Slower | Faster |
| Sequencing Dataset | BP | Total Analyzed Reads | Min–Max Reads per Sample | Average Reads per Sample | Min–Max Retained Reads (%) | Average Retained Reads (%) | No. of Features |
|---|---|---|---|---|---|---|---|
| FBs | BP1 (ASVs) | 2,264,053 | 25,006–247,583 | 94,336 | 73.8–96.8 | 92.0 | 65 |
| FBs | BP2 (OTUs) | 2,264,053 | 25,006–247,583 | 94,336 | 76.6–97.5 | 91.2 | 287 |
| IBPs | BP1 (ASVs) | 1,461,601 | 2,312–123,871 | 32,408.02 | 73.1–99.1 | 93.9 | 281 |
| IBPs | BP2 (OTUs) | 1,461,601 | 2,312–123,871 | 32,408.02 | 66.1–86.6 | 81.3 | 315 |
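Retention percentages such as those in the table are the share of input reads that survive a pipeline’s filtering steps. The Python sketch below uses hypothetical per-sample counts (not data from this study) to show how the min–max range and the mean can be derived. It also computes a pooled percentage (total retained over total input), since this section does not state which kind of average the table reports and the two can differ; `retention_summary` is a hypothetical name.

```python
from statistics import mean
from typing import Dict


def retention_summary(input_reads: Dict[str, int],
                      retained_reads: Dict[str, int]) -> Dict[str, float]:
    """Summarise the percentage of reads kept by a pipeline across samples."""
    per_sample = [100 * retained_reads[s] / input_reads[s] for s in input_reads]
    return {
        "min": min(per_sample),
        "max": max(per_sample),
        # Mean of the per-sample percentages: each sample weighted equally.
        "mean": mean(per_sample),
        # Pooled percentage: samples weighted by their read counts.
        "pooled": 100 * sum(retained_reads.values()) / sum(input_reads.values()),
    }


# Hypothetical read counts for three samples, not data from this study.
raw = {"S1": 25_000, "S2": 100_000, "S3": 50_000}
kept = {"S1": 20_000, "S2": 95_000, "S3": 45_000}
summary = retention_summary(raw, kept)
print(f"{summary['min']:.1f}–{summary['max']:.1f}% "
      f"(mean {summary['mean']:.1f}%, pooled {summary['pooled']:.1f}%)")
# prints: 80.0–95.0% (mean 88.3%, pooled 91.4%)
```

In this toy example the pooled value is higher than the per-sample mean because the sample with the most reads also has the best retention.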
| Criteria | Sub-Criteria (SC) | Score 0 | Score 1 | BP1 | BP2 |
|---|---|---|---|---|---|
| Computational skills and system requirements (C1) | Is the pipeline available on Windows? (SC1a) | No | Yes | 1 | 1 |
| | Is any programming experience needed to use the pipeline? (SC1b) | Yes | No | 0 | 1 |
| Data analysis streamlining (C2) | Can the BP be easily applied to all samples simultaneously? (SC2a) | No | Yes | 1 | 0 |
| | Can output data analysis (i.e., diversity indices and plotting of results) be performed within the software hosting the BP? (SC2b) | No | Yes | 1 | 0 |
| Cost of analysis (C3) | Is the software hosting the BP free of charge? (SC3a) | No | Yes | 1 | 0 |
| | Are free tutorials available for using the pipeline? (SC3b) | No | Yes | 1 | 1 |
| Computational time consumption (C4) | Which BP is faster? (SC4a) | Slower | Faster | 1 | 0 |
| Total | | | | 6 | 3 |
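The totals in the last row follow from summing the binary sub-criterion scores, assuming (as the table layout suggests) that every sub-criterion carries equal weight. The short Python check below transcribes the BP1/BP2 scores from the table and also groups them by parent criterion; reading the criterion from the digit in each SC code is a convention of this sketch, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (BP1, BP2) scores per sub-criterion, transcribed from the table above.
SCORES: Dict[str, Tuple[int, int]] = {
    "SC1a": (1, 1),
    "SC1b": (0, 1),
    "SC2a": (1, 0),
    "SC2b": (1, 0),
    "SC3a": (1, 0),
    "SC3b": (1, 1),
    "SC4a": (1, 0),
}


def totals(scores: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Unweighted sum of the binary scores for each pipeline."""
    return (sum(bp1 for bp1, _ in scores.values()),
            sum(bp2 for _, bp2 in scores.values()))


def criterion_subtotals(scores: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Group sub-criteria by parent criterion, e.g. "SC2b" -> "C2"."""
    sub: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for code, (bp1, bp2) in scores.items():
        criterion = "C" + code[2]
        sub[criterion][0] += bp1
        sub[criterion][1] += bp2
    return {c: (v[0], v[1]) for c, v in sorted(sub.items())}


print(totals(SCORES))               # (6, 3)
print(criterion_subtotals(SCORES))  # {'C1': (1, 2), 'C2': (2, 0), 'C3': (2, 1), 'C4': (1, 0)}
```

The per-criterion breakdown makes visible that BP2 scores higher only on C1 (computational skills), while BP1 leads on streamlining, cost, and time.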
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Spatola, G.; Giusti, A.; Armani, A. The “Dry-Lab” Side of Food Authentication: Benchmark of Bioinformatic Pipelines for the Analysis of Metabarcoding Data. Foods 2024, 13, 2102. https://doi.org/10.3390/foods13132102