Article
Peer-Review Record

Curating, Collecting, and Cataloguing Global COVID-19 Datasets for the Aim of Predicting Personalized Risk

by Sepehr Golriz Khatami 1,2,†, Astghik Sargsyan 1,2,†, Maria Francesca Russo 3, Daniel Domingo-Fernández 1,4, Andrea Zaliani 5, Abish Kaladharan 6, Priya Sethumadhavan 6, Sarah Mubeen 1,2,4, Yojana Gadiya 5, Reagon Karki 5, Stephan Gebel 1, Ram Kumar Ruppa Surulinathan 1,2, Vanessa Lage-Rupprecht 1, Saulius Archipovas 7, Geltrude Mingrone 3,8,9, Marc Jacobs 1, Carsten Claussen 5, Martin Hofmann-Apitius 1,2 and Alpha Tom Kodamullil 1,*
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 8 June 2023 / Revised: 6 December 2023 / Accepted: 9 January 2024 / Published: 29 January 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper delivers an extensive overview of existing COVID-19 datasets and their corresponding metadata, focusing on their potential applications for data-driven personalized models. These models are pivotal for predicting individualized risks and patient-specific risk factors. The authors adeptly contextualize their approach in constructing a comprehensive COVID-19 data catalog that provides the requisite data and variables for modeling the unique progression of COVID-19 and enhancing personalized risk modeling procedures.

 Although the paper delves into a pertinent and promising academic domain, it would benefit from enhanced clarity in articulating the primary challenges it tackles. Additionally, the absence of a Conclusion section is noticeable. Improving this section would enable highlighting significant achievements and providing further insights into future directions and ongoing research.

 Notably, the extensive list of coauthors, including 19 individuals in addition to the COPERIMOplus consortium, raises concerns. Generally, the submission of a scientific paper should involve no more than 4 to 5 coauthors to ensure a more focused and cohesive collaboration. This is important as the manuscript should not be perceived as a project consortium deliverable but as a scholarly work expressing the perspectives of specific researchers.

Important style formatting issues:

Table legend comes at the top and not at the bottom of the Table. Also, revise (reformat) the text used to explain the table content; this text is not part of the Table legend but a specific content below the table legend.

 The recommendation is to use hyperlinks instead of the full URL to reference the data sources. Review the manuscript to replace URLs with hyperlinks.

Figures with a bar-column chart should have a label for each bar column (e.g., Fig. 3, Fig. 4, Fig. 5, Fig. 8, Fig. 9).

 Line 71: It is better to adopt a hyperlink style approach and explain each dataset's key points regarding relevant parameters/variables.

 

Line 79: should provide a short critical analysis of all datasets outlined and not just the Microsoft Azure COVID-19 data

Line 136: what is the difference from what is mentioned in sub-section 2.1.3 (L 243)? Please clarify.

 Line 337: table legend in the wrong format. As a template, the Table should occupy less space.

 Line 342: this subsection should include a link to access the interface or provide a print screen

 Line 436: Table 3: I was expecting some quantitative values; as this Table is presented, it does not provide helpful information.

 Line 533: consider reformatting the Table 4 to reduce its extension

Inappropriate self-citations by authors:  [20] and [26]

 

 

Comments on the Quality of English Language

 Minor editing of the English language required

Author Response

Dear Reviewer,

 

Thank you for the thoughtful and thorough review. After addressing your comments, we are confident that our manuscript has been greatly improved. We have addressed all of your comments below.

  • The paper delivers an extensive overview of existing COVID-19 datasets and their corresponding metadata, focusing on their potential applications for data-driven personalized models. These models are pivotal for predicting individualized risks and patient-specific risk factors. The authors adeptly contextualize their approach in constructing a comprehensive COVID-19 data catalog that provides the requisite data and variables for modeling the unique progression of COVID-19 and enhancing personalized risk modeling procedures.

Thank you for the comment.

  •  Although the paper delves into a pertinent and promising academic domain, it would benefit from enhanced clarity in articulating the primary challenges it tackles. Additionally, the absence of a Conclusion section is noticeable. Improving this section would enable highlighting significant achievements and providing further insights into future directions and ongoing research.

The concluding remarks were initially included in the Discussion section. As per your suggestion, we moved that last part into a separate Conclusion section to emphasize significant achievements and provide further insights into future directions and ongoing research.

  • Notably, the extensive list of coauthors, including 19 individuals in addition to the COPERIMOplus consortium, raises concerns. Generally, the submission of a scientific paper should involve no more than 4 to 5 coauthors to ensure a more focused and cohesive collaboration. This is important as the manuscript should not be perceived as a project consortium deliverable but as a scholarly work expressing the perspectives of specific researchers.

As per your suggestion, we removed several coauthors and listed them in the Acknowledgements as collaborators in the COPERIMOplus project.

Important style formatting issues:

  • Table legend comes at the top and not at the bottom of the Table. Also, revise (reformat) the text used to explain the table content; this text is not part of the Table legend but a specific content below the table legend.

Thank you for pointing out this issue. We double-checked the legends of the tables and figures in the manuscript and moved the explanatory text to the main text.

  • The recommendation is to use hyperlinks instead of the full URL to reference the data sources. Review the manuscript to replace URLs with hyperlinks.

Hyperlinks would of course be more reader-friendly, but replacing the URLs with hyperlinks would cause problems for direct access to the sources in the printed version of the publication.

  • Figures with a bar-column chart should have a label for each bar column (e.g., Fig. 3, Fig. 4, Fig. 5, Fig. 8, Fig. 9)

We checked the figures mentioned and did not find any missing labels. All of them have appropriate labels for each bar column.

  •  Line 71: It is better to adopt a hyperlink style approach and explain each dataset's key points regarding relevant parameters/variables.

As mentioned above, adopting hyperlinks would cause problems for direct access to the sources in the printed version of the publication.

  • Line 79: should provide a short critical analysis of all datasets outlined and not just the Microsoft Azure COVID-19 data

The Microsoft Azure COVID-19 data catalogue and COVIC are mentioned at that point as examples of the issues that datasets face, which is itself part of the analysis.

 

  •  Line 136: what is the difference from what is mentioned in sub-section 2.1.3 (L 243)? Please clarify.

The data repositories described in Section 2.1.3 (line 243) are not limited to clinical trial data, whereas the clinical trial repositories presented in Section 2.1.1 (line 136) contain clinical trial data exclusively.

  • Line 337: table legend in the wrong format. As a template, the Table should occupy less space.

Possibly because the line numbering has shifted, it is hard to tell which table is meant. If the suggestion concerns Table 1, it is a template that is later filled in based on AI-readiness. It cannot be reduced or modified without losing part of the information it is meant to represent.

  • Line 342: this subsection should include a link to access the interface or provide a print screen

The COVID-19 data viewer is described in Section 3.6 (line 537).

 

  •  Line 436: Table 3: I was expecting some quantitative values; as this Table is presented, it does not provide helpful information. 

The purpose of Table 3 is to report the quality of a dataset with respect to the selected criteria, based on AI-readiness. It covers the qualification of the MC-19 dataset only as an example. The AI-readiness evaluation does not include any quantitative analysis.

  • Line 533: consider reformatting the Table 4 to reduce its extension

We reviewed Table 4 again with a view to shortening it, but every line is important and removing any of them would lead to a loss of information.

  • Inappropriate self-citations by authors:  [20] and [26]

We would like to keep both citations: the variables described in the manuscript are taken from [20], and [26] clearly describes the study results and provides a significant comparison.

 

 

Comments on the Quality of English Language

  • Minor editing of the English language required

Some minor editing has been done.

Reviewer 2 Report

Comments and Suggestions for Authors
  1. Relevance and Originality: The paper addresses a highly relevant and timely topic, focusing on the collection and curation of COVID-19 datasets. The goal of predicting personalized risk using these datasets is original and has significant potential impact, especially in the context of the ongoing pandemic.

  2. Methodology: The authors have outlined a comprehensive methodology for curating and cataloging COVID-19 datasets. This includes study selection, data investigation, quality assessment, data acquisition, and data curation. The approach appears to be well-structured and methodical.

  3. Data Collection and Analysis: The paper details the collection of data from a variety of sources, including global initiatives, clinical trials, and publications. The authors have taken steps to ensure the quality and relevance of the data by applying specific criteria for inclusion in their catalogue. This thoroughness is commendable and essential for the reliability of the research.

  4. Interoperability and Harmonization Efforts: A significant aspect of the paper is the focus on harmonizing and mapping variables from different datasets for interoperability. This is crucial for the development of robust predictive models and indicates a deep understanding of the challenges in dealing with diverse datasets.

  5. Potential for Impact: The creation of a COVID-19 data catalogue and the subsequent development of predictive models have the potential to greatly impact public health responses and individual patient care. The work could contribute significantly to the body of knowledge on COVID-19 and pandemic response in general.

  6. Areas for Improvement: While the paper is strong in its current form, the authors might consider discussing any limitations of their methodology, such as potential biases in dataset selection or challenges in data harmonization. Additionally, discussing the practical applications and potential limitations of the predictive models would provide a more comprehensive understanding of the research's impact.

  7. Clarity and Presentation: The paper is well-organized and presents its findings clearly. The use of figures and tables enhances the reader's understanding of the methodology and results.

 

Comments on the Quality of English Language

No major problems with the language.

Author Response

Dear Reviewer,

 

Thank you for the thoughtful and thorough review. We have addressed all of your comments below.

Relevance and Originality: The paper addresses a highly relevant and timely topic, focusing on the collection and curation of COVID-19 datasets. The goal of predicting personalized risk using these datasets is original and has significant potential impact, especially in the context of the ongoing pandemic.

Thank you for your comment.

Methodology: The authors have outlined a comprehensive methodology for curating and cataloging COVID-19 datasets. This includes study selection, data investigation, quality assessment, data acquisition, and data curation. The approach appears to be well-structured and methodical.

  • Thank you for your comment.

Data Collection and Analysis: The paper details the collection of data from a variety of sources, including global initiatives, clinical trials, and publications. The authors have taken steps to ensure the quality and relevance of the data by applying specific criteria for inclusion in their catalogue. This thoroughness is commendable and essential for the reliability of the research.

  • Thank you for your comment.

Interoperability and Harmonization Efforts: A significant aspect of the paper is the focus on harmonizing and mapping variables from different datasets for interoperability. This is crucial for the development of robust predictive models and indicates a deep understanding of the challenges in dealing with diverse datasets.

  • Thank you for your comment.

Potential for Impact: The creation of a COVID-19 data catalogue and the subsequent development of predictive models have the potential to greatly impact public health responses and individual patient care. The work could contribute significantly to the body of knowledge on COVID-19 and pandemic response in general.

  • Thank you for your comment.

Areas for Improvement: While the paper is strong in its current form, the authors might consider discussing any limitations of their methodology, such as potential biases in dataset selection or challenges in data harmonization. Additionally, discussing the practical applications and potential limitations of the predictive models would provide a more comprehensive understanding of the research's impact.

  • Thank you for mentioning that. The limitations and challenges are described in the Discussion section. It is hard to add more at this stage.

Clarity and Presentation: The paper is well-organized and presents its findings clearly. The use of figures and tables enhances the reader's understanding of the methodology and results.

  • Thank you for your comment.