Review Reports - A Method for Calculating Whole-Genome Sequencing Outcomes from Trio Data

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In "A Method for Calculating Whole-Genome Sequencing Outcomes from Trio Data," the authors developed an algorithm that computes Mendelian-consistency scores for WGS trio data, implemented it in C++, and integrated it into a Nextflow workflow. They evaluated the pipeline with two variant callers, DeepVariant and HaplotypeCaller, on two datasets and concluded that DeepVariant performs better than HaplotypeCaller.
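For orientation, a per-site Mendelian-consistency check can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `Genotype` and `TrioSite` types, the function names, and the choice to report the score as a simple fraction of consistent sites are all assumptions made here. The sketch also assumes autosomal sites with fully called diploid genotypes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical diploid genotype: two allele indices (0 = reference, 1+ = alternate).
using Genotype = std::array<int, 2>;

// Hypothetical record holding the genotypes of one trio at one site.
struct TrioSite {
    Genotype child;
    Genotype mother;
    Genotype father;
};

// True if the genotype carries the given allele on either haplotype.
bool has_allele(const Genotype& g, int allele) {
    return g[0] == allele || g[1] == allele;
}

// A child genotype is Mendelian-consistent when one of its alleles can be
// inherited from the mother and the other from the father.
bool is_mendelian_consistent(const TrioSite& s) {
    return (has_allele(s.mother, s.child[0]) && has_allele(s.father, s.child[1])) ||
           (has_allele(s.mother, s.child[1]) && has_allele(s.father, s.child[0]));
}

// Fraction of sites that pass the check (assumed scoring convention).
// Returns 0.0 for an empty input to avoid division by zero.
double consistency_fraction(const std::vector<TrioSite>& sites) {
    if (sites.empty()) {
        return 0.0;
    }
    std::size_t consistent = 0;
    for (const auto& s : sites) {
        if (is_mendelian_consistent(s)) {
            ++consistent;
        }
    }
    return static_cast<double>(consistent) / static_cast<double>(sites.size());
}
```

A real pipeline would additionally need to handle missing calls, sex chromosomes, and multi-allelic normalization, none of which are covered by this sketch.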

Minor issues:

Where can readers access the C++ code that is integrated into the Nextflow workflow? Please provide a link to a public repository and add it to the manuscript.
The test datasets are not available, possibly due to privacy or IRB constraints. Why not use well-known, publicly available trio datasets (e.g., NA12878 with parents NA12891/NA12892, or HG00733 with parents HG00731/HG00732) so that readers can reproduce and compare results? Including more information, such as accession numbers, exact software versions, and parameter settings, would also help ensure reproducibility.

Author Response

Comment 1: Where can readers access the C++ code that is integrated into the Nextflow workflow? Please provide a link to a public repository and add it to the manuscript.

Response 1: Thank you for your comment. A link has been added to the "Data Availability Statement" section of the manuscript: https://github.com/ispras/sarek_trio

Comment 2: The test datasets are not available, possibly due to privacy or IRB constraints. Why not use well-known, publicly available trio datasets (e.g., NA12878 with parents NA12891/NA12892, or HG00733 with parents HG00731/HG00732) so that readers can reproduce and compare results? Including more information, such as accession numbers, exact software versions, and parameter settings, would also help ensure reproducibility.

Response 2: Thank you for your comment. The module was intentionally tested on data provided by UFIC RAS in order to compare the performance of DeepVariant and HaplotypeCaller in under-represented populations. Datasets such as the HG and NA samples are often used to train models (e.g., DeepVariant), so results obtained on them may be biased.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

I am very glad to see a deterministic algorithm proposed in an era when AI bubble papers are rampant. The algorithm is explainable and can be implemented in practice. Here are my comments.

1) Please add an introductory paragraph between the titles of Section 2 and Section 2.1.

2) The font of the text in Figure 4 is informal. Please change it to "Times New Roman" so that it matches the body text.

3) Normally, screenshots should not appear in the research paper. Please consider adjusting the presentation.

4) Some English expressions read like literal translations from Chinese or Russian. For example, in Lines 220-221, the sentence "Because std::set keeps its elements sorted, these operations run in linear time, allowing scalable analysis of large genomic datasets" is not well-written. Please improve the English.

5) Please improve the writing style and academic expression throughout.

Overall, I would like to make a "Minor Revision" decision.

Author Response

Thank you very much for your comments!

Comment 1: Please add an introductory paragraph between the titles of Section 2 and Section 2.1.

Response 1: We have added an introductory paragraph between the headings of Sections 2 and 2.1.

Comment 2: The font of the text in Figure 4 is informal. Please change it to "Times New Roman" so that it matches the body text.

Response 2: All images have been updated.

Comment 3: Normally, screenshots should not appear in the research paper. Please consider adjusting the presentation.

Response 3: Screenshots have been removed or remade.

Comment 4: Some English expressions read like literal translations from Chinese or Russian. For example, in Lines 220-221, the sentence "Because std::set keeps its elements sorted, these operations run in linear time, allowing scalable analysis of large genomic datasets" is not well-written. Please improve the English.

Response 4: We have revised the sentence to read: "As std::set keeps its elements sorted, these operations run in linear time. It allows scalable analysis of large genomic datasets."
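The linear-time claim in the quoted sentence can be illustrated with the standard set algorithms. The following is a minimal sketch, not code from the manuscript or its repository: the `VariantKey` layout and both function names are assumptions chosen for illustration. Because `std::set` iterates in sorted order, `std::set_intersection` and `std::set_difference` can process two sets in a single merge-style pass whose cost is linear in their combined size.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical variant key: (chromosome, 1-based position, alternate allele).
using VariantKey = std::tuple<std::string, std::int64_t, std::string>;
using VariantSet = std::set<VariantKey>;

// Variants present in both sets. Both inputs are already sorted, so the
// algorithm walks each range once.
std::vector<VariantKey> shared_variants(const VariantSet& a, const VariantSet& b) {
    std::vector<VariantKey> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// Variants called in the child but in neither parent. The intermediate
// vector stays sorted, so it can feed the second set_difference directly.
std::vector<VariantKey> child_only_variants(const VariantSet& child,
                                            const VariantSet& mother,
                                            const VariantSet& father) {
    std::vector<VariantKey> not_in_mother;
    std::set_difference(child.begin(), child.end(), mother.begin(), mother.end(),
                        std::back_inserter(not_in_mother));
    std::vector<VariantKey> out;
    std::set_difference(not_in_mother.begin(), not_in_mother.end(),
                        father.begin(), father.end(), std::back_inserter(out));
    return out;
}
```

Note that the linear bound applies to the set operations themselves; inserting n elements into a `std::set` one at a time costs O(n log n).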

Comment 5: Please improve the writing style and academic expression throughout.

Response 5: We have corrected the sentences with informal or vague expressions.

Author Response File: Author Response.docx