Next Article in Journal
Leveraging 3D Model Systems to Understand Viral Interactions with the Respiratory Mucosa
Next Article in Special Issue
Characterization and Diversity of 243 Complete Human Papillomavirus Genomes in Cervical Swabs Using Next Generation Sequencing
Previous Article in Journal
Novel Broad-Spectrum Antiviral Inhibitors Targeting Host Factors Essential for Replication of Pathogenic RNA Viruses
Previous Article in Special Issue
The International Virus Bioinformatics Meeting 2020
 
 
viruses-logo
Article Menu

Article Menu

Article
Peer-Review Record

NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index

Viruses 2020, 12(12), 1424; https://doi.org/10.3390/v12121424
by Joan Martí-Carreras 1,*, Alejandro Rafael Gener 2,3,4,5,*, Sierra D. Miller 6, Anderson F. Brito 7, Christiam E. Camacho 8, Ryan Connor 8,*, Ward Deboutte 1, Cody Glickman 1, David M. Kristensen 9, Wynn K. Meyer 10,†, Sejal Modha 11, Alexis L. Norris 12, Surya Saha 13,14, Anna K. Belford 15, Evan Biederstedt 16, James Rodney Brister 8, Jan P. Buchmann 17, Nicholas P. Cooley 18, Robert A. Edwards 19, Kiran Javkar 20,21, Michael Muchow 22, Harihara Subrahmaniam Muralidharan 20,23, Charles Pepe-Ranney 24, Nidhi Shah 20, Migun Shakya 25, Michael J. Tisza 15, Benjamin J. Tully 26, Bert Vanmechelen 1, Valerie C. Virta 27, JL Weissman 28, Vadim Zalunin 8, Alexandre Efremov 8 and Ben Busby 8,29,*add Show full author list remove Hide full author list
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Viruses 2020, 12(12), 1424; https://doi.org/10.3390/v12121424
Submission received: 16 October 2020 / Accepted: 2 December 2020 / Published: 10 December 2020
(This article belongs to the Special Issue Virus Bioinformatics 2020)

Round 1

Reviewer 1 Report

I tried to understand the logic of the authors - but it was not easy

To give examples-

Section with Figure 1 - ...  "recognition of protein domains" 

If the recognition is done to classify metagenomic contigs as viral -

why to compare RPS-BLAST with MASH -- which was demonstrated to have poor prediction accuracy  -

Better to compare RPS BLAST performance with the tools that are specialized on the virus sequence recognition - there are several ... VirFind, VirSorter, etc.

RPS BLAST performance (run time) would be much better if instead of 6-frame translation the genes in the contig were predicted first and then RPS BLAST would be used instead of RPS tBLASTn

Fig. 2 is missing

Fig. 3 is low informative - with information repeating itself twice

Given poor potential on MASH to solve the tasks - Figure 4 is becoming of very low interest

Fig. 5 meaning is not possible to understand 

for instance --the legend says -"Node color corresponds to taxonomic distribution of k-mer." no single color is explained.

Fig. 6 would better belong to Suppl Materials (hard to understand without many additional clarifications of the field names)

Reviewer 2 Report

fine for me

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

I do not see that the authors have reached the goal - creating a virtual (federated) database of viral sequences with uniform functional and taxonomy annotation

There is no statistics characterizing the structure, taxonomy and the numbers of entries in the database

I also did not find a clear and logical description of the pipelines that were supposed to be in place to process a large number - of order of 10 mln of contigs reassembled from 2,953 SRA entries

Several important sections are poorly written

Particularly section related to domain search pipeline (Lines 201-213)

The sequence of steps in this section is not well justified. The pipeline, as implemented, is split into two independent methods (Fig 1), one of them (Mash based) was eventually found to be substantially less sensitive than the second method (RPS-tBLASTn based) … an average recall of only 15.3%. – makes the second method not usable in practice unless the precision value is close to 100%. However, this second important measure – precision- was not mentioned at all.

The section does not make precise statements, for instance “Recall percentages were calculated per dataset by dividing the unique viral CDD’s identified by RPS-tBLASTn by the unique viral CDD’s identified by the Mash pipeline.” (lines 205-207)

Line 250 --- construction of whole genome HMMs, not well defined concept and method of design (needs multiple alignments of viral genomes) this has to be explained before coming up with a database of whole genome HMMs … a construct possible may be for short genome viruses but may not be so useful model for viruses with long genomes

Lines 265-271 Another critical section related to Taxonomy-domain-integration pipeline was not well written; it is impossible to reconstruct the logic and to see justification for the steps taken

Only fragments of the database were built – like virus graphs, which concept was not clearly explained, particularly its usefulness

Lines 530-535 – was the database built? – There is no link to a full size database, which seems to be a future work

Reviewer 2 Report

In this article Carreras et al. depict the works performed during the Virus Hunting Toolkit codeathon during which FIVE a federated viral database was elaborated. The initiative is particularly outstanding and need to be promoted. The article is well written and describes what has been done with, via a github repository, the open access to the methods developed.

The reviewer consider this article must be taken has it is, i.e. the description of a work that has been done and can not be done again.

As a virologist some aspects of what has been attempted still remain totally obscure to me (section 3.4 virus graph fig 5) but a detailed explanation is probably out of the scope of this article. May be citing some references would help virologist to understand the rational of this virus graph working group.

minor comments:

P14 L 513  searching potential instead of searaching potential

P15 L556 cloud computing instead of could computing

Back to TopTop