Review Reports - A Pilot Study Using Natural Language Processing to Explore Textual Electronic Mental Healthcare Data

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study outlines some of the major steps and publications that have utilized natural language processing (NLP) on secondary mental health data in the UK. Certain elements of the NLP pipeline are well described, including various NLP tasks, which contribute to the field's understanding. However, I suggest several major improvements to enhance the manuscript’s clarity and contribution:

Comprehensive Description of the NLP Pipeline
At present, the manuscript provides a fragmented description of the NLP methodological pipeline. For instance, the annotation process is described without clearly explaining its purpose, and classifier validation metrics are presented without any justification of their importance. I recommend that the authors first present a complete overview of the NLP pipeline to give readers a holistic understanding. After this, individual steps can be examined in greater detail. I encourage the authors to consider a similar structure:

Vaci, N., Liu, Q., Kormilitzin, A., De Crescenzo, F., Kurtulmus, A., Harvey, J., ... & Nevado-Holgado, A. (2020). Natural language processing for structuring clinical text data on depression using UK-CRIS. BMJ Ment Health, 23(1), 21-26.

Integration of Recent NLP Developments
The manuscript identifies challenges such as the need for extensive annotations but overlooks recent advancements in NLP, such as zero-shot and few-shot transformer models. These models can perform effectively with minimal annotated samples, potentially addressing the stated challenges. Including a discussion of these developments and their implications would significantly strengthen the manuscript.

Expansion of the ChatGPT Section
The ChatGPT section is mentioned only briefly. Expanding on this topic could add value, particularly by discussing the feasibility of integrating ChatGPT-like models within the secure NHS environment. Specifically, the authors might explore:

The potential for chat-based systems to summarize medical documents for clinicians.

How such systems could be implemented securely and efficiently.

The practical benefits and challenges of such integration, including clinical applications and ethical considerations.

Minor:

The manuscript references the use of SHFT's data:

Vaci, N., Koychev, I., Kim, C. H., Kormilitzin, A., Liu, Q., Lucas, C., ... & Nevado-Holgado, A. (2021). Real-world effectiveness, its predictors and onset of action of cholinesterase inhibitors and memantine in dementia: retrospective health record study. The British Journal of Psychiatry, 218(5), 261-267.

Overall Recommendation
The manuscript demonstrates solid potential but requires revisions to achieve clarity and relevance. I believe that the authors can significantly enhance the quality and impact of their work.

Author Response

Response: The authors would like to thank the reviewers for their valuable comments and constructive feedback, which have helped us to improve the manuscript significantly, making it appropriate for the readership. We have made substantial amendments to the manuscript to address all the points recommended by the reviewers and we hope that the new version of the manuscript is up to the standards set by the journal of informatics.

Please find below detailed point-by-point responses to all comments (reviewers’ comments in black, authors’ responses in blue). The requested amendments were taken into account, including substantial additions of new sections and extensions of the existing sections (in blue). We also added 18 new references (and reordered all references in the text) to support the new and extended sections in this revised version of the manuscript. We believe all reviewers’ concerns were sufficiently addressed in the enclosed revised version. We remain open to any further modification suggested by the Editor and Reviewers.

Response: We appreciate the reviewer's thoughtful feedback on our study and the recognition of the importance of NLP in secondary mental health data modelling and analysis.

Comprehensive Description of the NLP Pipeline
At present, the manuscript provides a fragmented description of the NLP methodological pipeline. For instance, the annotation process is described without clearly explaining its purpose, and classifier validation metrics are presented without any justification of their importance. I recommend that the authors first present a complete overview of the NLP pipeline to give readers a holistic understanding. After this, individual steps can be examined in greater detail. I encourage the authors to consider a similar structure: Vaci, N., Liu, Q., Kormilitzin, A., De Crescenzo, F., Kurtulmus, A., Harvey, J., ... & Nevado-Holgado, A. (2020). Natural language processing for structuring clinical text data on depression using UK-CRIS. BMJ Ment Health, 23(1), 21-26.

Response: We have reviewed the structure used by Vaci et al. (2020) and have reorganized certain parts of our methodology to align with best practices. This includes ensuring a logical flow from pipeline description to detailed analysis of each component.

Integration of Recent NLP Developments
The manuscript identifies challenges such as the need for extensive annotations but overlooks recent advancements in NLP, such as zero-shot and few-shot transformer models. These models can perform effectively with minimal annotated samples, potentially addressing the stated challenges. Including a discussion of these developments and their implications would significantly strengthen the manuscript.

Response: Thank you for your insightful suggestion. The authors agree that this addition strengthens the manuscript by acknowledging cutting-edge developments and their relevance to our study. See the added section 5.

Expansion of the ChatGPT Section
The ChatGPT section is mentioned only briefly. Expanding on this topic could add value, particularly by discussing the feasibility of integrating ChatGPT-like models within the secure NHS environment. Specifically, the authors might explore:

The potential for chat-based systems to summarize medical documents for clinicians.

How such systems could be implemented securely and efficiently.

The practical benefits and challenges of such integration, including clinical applications and ethical considerations

Response: We expanded the ChatGPT section in the manuscript to address the feasibility of integrating ChatGPT-like models in the NHS, their potential for summarizing medical documents, secure implementation strategies, and the practical benefits and challenges of integration. See section 6.

Minor:

The manuscript references the use of SHFT's data:

Response: Reference added (reference# 34): line 291, section 4.

Response: Thank you for your constructive comments; we hope we have sufficiently addressed them.

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors present a methodological study of textual electronic mental healthcare data using NLP. This type of research is required to predict the future trends of any sustainable methodologies. The manuscript requires some revision before publication, with the following suggestions:

1) In Table 1, reference citations should be provided to validate the reasons.

2) In section 2 (Overview of key methods), visualization of different methods should be presented to quantify the discussed approaches.

3) Some publicly available databases used to train the NLP models should be discussed.

4) The results of NLP tools should be visualized year by year through graphs or bar charts, to showcase developments in exploring mental health EHR data.

5) Provide some case studies to validate the methodology.

Author Response

The authors would like to thank the reviewers for their valuable comments and constructive feedback, which have helped us to improve the manuscript significantly, making it appropriate for the readership. We have made substantial amendments to the manuscript to address all the points recommended by the reviewers and we hope that the new version of the manuscript is up to the standards set by the journal of informatics.

Response:

The authors appreciate the reviewer's thoughtful feedback on our methodological study and the recognition of the importance of NLP in secondary mental health data to predict future trends and further advances, potential deployment and implementation in clinical practice.

In Table 1, reference citations should be provided to validate the reasons.

Response:

We appreciate the reviewer’s suggestion, as it has strengthened the evidence base supporting our arguments. These citations have been integrated directly within Table 1.

In section 2 (Overview of key methods), visualization of different methods should be presented to quantify the discussed approaches.

Response: The key methods were revised in the current version of the manuscript. Recent advances in NLP methods, including the use of LLMs, are presented in detail; see sections 5 and 6.

Some publicly available databases used to train the NLP models should be discussed.

Response: Thank you for your valuable suggestion. We have now included a discussion of publicly available databases commonly used to train NLP models in healthcare; see the last paragraph in section 7.

The results of NLP tools should be visualized year by year through graphs or bar charts, to showcase developments in exploring mental health EHR data.

Response: Developments in exploring EHR data are presented in detail in sections 5 and 6.

Provide some case studies to validate the methodology

Response: Case studies are provided within section 4 of the manuscript “Southern Health NHS Foundation Trust’s NLP approach for UK-CRIS”.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have improved the manuscript considerably.

Author Response

Reviewer 2 Report

Comments and Suggestions for Authors

In this manuscript, the authors present a methodological study of textual electronic mental healthcare data using NLP. This type of research is required to predict the future trends of any sustainable methodologies. In the revised manuscript, some of the review points are discussed.

1) Section 3 should include some basic NLP architectures and methods used to explore EHR data, so that the visualization of the proposed algorithms is clear to readers.

Other contributions are ok for me. With the above suggestions, the manuscript can be accepted for publication.

Comments on the Quality of English Language

The writing is satisfactory.

Author Response

Please find below a detailed point-by-point response to the only remaining comment from the 2nd round of the review process (reviewer’s comment in black, authors’ responses in blue). The requested amendments were taken into account in this revised version of the manuscript.

We believe all reviewers’ concerns were sufficiently addressed in the enclosed revised version. We remain open to any further modification suggested by the Editor and Reviewers.

Reviewer's comment:

1) Section 3 should include some basic NLP architectures and methods used to explore EHR data, so that the visualization of the proposed algorithms is clear to readers.

Authors' response:

We appreciate the reviewer's thoughtful feedback on our study and on its importance for guiding future NLP development for processing EHR data in healthcare and medicine.

A paragraph and a diagram (in Table 2) about the basic NLP architectures and methods used to explore EHR data were added in the revised version of the manuscript; see section 3, pages 6 and 7.

As requested, this better visualises the proposed algorithms and related architectures.