Article

Identifying Military Service Status in Electronic Healthcare Records from Psychiatric Secondary Healthcare Services: A Validation Exercise Using the Military Service Identification Tool

1 King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
2 Biomedical Research Centre (BRC), Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
3 Combat Stress, Tyrwhitt House, Oaklawn Road, Leatherhead, Surrey KT22 0BX, UK
4 Academic Department of Military Mental Health, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
5 Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE5 8AB, UK
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Healthcare 2023, 11(4), 524; https://doi.org/10.3390/healthcare11040524
Submission received: 16 December 2022 / Revised: 3 February 2023 / Accepted: 7 February 2023 / Published: 10 February 2023
(This article belongs to the Section Health Informatics and Big Data)

Abstract
Electronic healthcare records (EHRs) are a rich source of information with a range of uses in secondary research. In the United Kingdom, there is no pan-national or nationally accepted marker indicating veteran status across all healthcare services. This presents significant obstacles to determining the healthcare needs of veterans using EHRs. To address this issue, we developed the Military Service Identification Tool (MSIT), using an iterative two-staged approach. In the first stage, a Structured Query Language approach was developed to identify veterans using a keyword rule-based approach. This informed the second stage, the development of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, a sensitivity of 0.91, and a negative predictive value of 0.98. To further validate the performance of the MSIT, the present study sought to verify the accuracy of the EHRs that trained the MSIT models. To achieve this, we surveyed 902 patients of a local specialist mental healthcare service, asking whether they had or had not served in the Armed Forces; 146 (16.2%) responded. In total, 112 (76.7%) of respondents were classified by the MSIT as non-veterans and 34 (23.3%) as veterans (accuracy: 0.84, sensitivity: 0.83, specificity: 0.92). The MSIT has the potential to be used for identifying veterans in the UK from free-text clinical documents, and future use should be explored.

1. Introduction

The United Kingdom’s (UK) veteran population, defined by the British Government as those who have served in the military for at least one day [1], is estimated to be 2.5 million [2]. According to the 2021 census of England and Wales, 1.9 million people reported a veteran status [3]. UK veterans receive their healthcare from the National Health Service (NHS) alongside civilian counterparts, with their care recorded in local, regional, and national Electronic Healthcare Records (EHRs) [4]. EHRs—structured and unstructured (i.e., free-text, self-reported questionnaires)—can be used to evaluate disease prevalence, perform epidemiological analyses, investigate the quality of care, improve clinical decision-making, and support other research purposes [5,6,7,8]. The use of EHRs can also overcome challenges related to the recruitment and retention of veterans in research [9].
Whilst most veterans transition out of the military without experiencing any difficulties, a sizeable minority report a range of mental health problems [10,11,12]. Some of these are a result of their military service [13]. Research has shown that 93% of personnel who report having mental health difficulties seek some form of help for their problems, with the majority seeking informal support [14,15]. Currently, there is no universal pan-national or nationally accepted marker in UK EHRs to identify veterans, nor is there a requirement for healthcare providers to record veteran status or ask individuals about their potential military employment. This lack of data prohibits an evaluation of the healthcare needs of those who have served in the Armed Forces [16].
The ability to identify those who serve in the military would allow a comparison between those with and without military experience in relation to their demographic characteristics, physical and mental health outcomes, and healthcare utilization. Prior research has similarly conducted comparisons between the general population and specific cohorts, such as those with chronic mental illness [17,18], and emergency service workers [19]. The ability to identify military veterans could further enable primary and secondary healthcare services to target known areas of concern, such as alcohol use, with novel interventions [11,15]. Only three studies have analyzed secondary care in the UK using military samples [4,20,21]. These studies identified veterans via external sources, manual human review, and the use of data linkage algorithms. The development of the MSIT [22], a Natural Language Processing (NLP) tool, enabled the use of a machine learning tool to analyze free-text clinical documents.
To develop the MSIT, veterans were first manually identified using the South London and Maudsley (SLaM) Biomedical Research Centre (BRC) Clinical Record Interactive Search (CRIS) database, which holds secondary mental healthcare electronic records for the SLaM NHS Foundation Trust. An iterative approach was then followed. First, a structured query language (SQL) method was developed, which was later refined using NLP and machine learning to create the MSIT, a tool designed to identify whether a patient was a veteran or non-veteran [20,22]. Using the SQL approach, we obtained an accuracy of 0.93 in correctly predicting non-veterans and veterans, a positive predictive value of 0.81, and a sensitivity of 0.75. This method informed the second stage, the creation of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, and a sensitivity of 0.91.
Given the large-scale use of free text in patient record systems and the vital role it plays in clinical decision-making and in describing patient characteristics, it is important that this recording is accurate. Variations in the methods of notetaking among different healthcare professionals, combined with local, regional, and national differences, have created data quality issues that could impact the use of automated text analysis tools [23,24]. The objective of this study was therefore to validate the accuracy of the free-text medical documents used to assess the performance of the MSIT by contacting participants to verify their military service status. It is important to note that the MSIT identifies military service. Since serving personnel transition from military to civilian healthcare services on leaving the military, almost all positive identifications of military service made by the MSIT will relate to veterans.

2. Background

Routinely collected EHRs can be used to evaluate disease prevalence, monitor disease spread, facilitate epidemiological analyses [25], and improve clinical decision-making which can influence patient outcomes [5,6]. EHRs function as a single integrated standardized longitudinal electronic version of the traditional paper health record and are held by hospitals, clinics, and other healthcare providers across the UK [4,16]. EHRs combine structured fields such as test results, or questionnaire responses with unstructured fields which contain free-text medical notes or communications (e.g., letters sent to patients).
In recent years, there has been a growth in the use of EHRs in the field of health data analytics either via NLP, or machine learning [26]. The use of these innovative approaches follows two main themes: to generate knowledge to improve the effectiveness of treatment, or to predict the outcome of treatment and diagnoses, or a combination of both [27].
In the context of NLP algorithms, these two themes are pursued by performing syntactic analysis (e.g., tokenization, sentence and structure detection), extracting specific information (e.g., identifying depressive symptoms or representing text in a structured form [28]), capturing meaning from documents (e.g., the sentiment of statements or free text), and detecting relationships (e.g., between diseases and conditions [29]).
For example, Downs et al. (2017; [30]) used NLP to detect keywords associated with suicidal ideation or attempted suicide in adolescents with autism spectrum disorders with a high degree of accuracy. A similar approach was taken by Al-Harras et al. (2021; [31]) where NLP was used to establish the presence of motor signs (e.g., gait, rigidity, tremor) using free-text clinical notes in patients with a dementia diagnosis. To identify motor signs, the authors developed an annotation pipeline where a set of documents were manually labeled, keywords extracted, and used as a gold standard reference when analyzing future documents. By identifying motor signs, the authors were able to identify the co-morbidity profiles of patients experiencing these symptoms and ascertained associations with survival and hospitalization. Irving et al. (2021; [32]) used NLP to identify gender differences in clinical presentations and illicit substance use. The authors developed an NLP framework that used a labeled corpus to extract information from free-text unstructured medical records to create a structured dataset that could then be used for analysis. Using this approach, they were able to identify clear differences between genders in clinical presentation, and in the substances used.
Approaches in recent years have also combined machine learning with NLP to analyze large datasets. Kapadi et al. (2022; [33]) developed a machine learning and NLP framework to predict the inpatient risk of re-admission based on free-text clinical notes. The authors devised a framework in which clinical notes were analyzed and coded using NLP, term frequency-inverse document frequency vectors were computed, and these vectors were used to train a Random Forest machine learning model. While the framework did not yield highly accurate results, it was able to analyze free-text clinical notes at scale. Conversely, Han et al. (2022; [34]) developed a framework to automatically detect social determinants of health using NLP and machine learning. First, they manually annotated over 3000 free-text notes and then trained a range of complex machine-learning models. With this framework, they were able to produce Area Under the Curve results of 0.97.
EHRs are a powerful resource when combined with NLP and machine learning; this approach may enable health practitioners, decision-makers, and researchers to monitor and improve admissions, clinical decision-making, patient outcomes, and quality of care. These algorithms also have the potential to identify current and former occupations, including military service, thus enabling a suite of research that could examine the health characteristics of different occupational groups.

3. Materials and Methods

3.1. Data Source—Clinical Record Interactive Search (CRIS) System

The Clinical Record Interactive Search (CRIS) system provides de-identified EHRs from the SLaM NHS Foundation Trust, a secondary and tertiary mental healthcare provider serving a geographical catchment of approximately 1.3 million residents across 4 south London boroughs (namely Lambeth, Southwark, Lewisham, and Croydon) [25].

3.2. Military Service Identification Tool (MSIT)

A machine learning classification framework underpins the MSIT and is responsible for making predictions. It was developed in Python using the Natural Language Processing Toolkit (version 3.2.5) [35] and Scikit-learn (version 0.20.3) [36]. A demonstration of MSIT is available via GitHub (https://github.com/DrDanL/kcmhr-msit, accessed on 8 February 2023). For this article, we provide a brief overview of the MSIT.
A gold standard manually labeled dataset was created to train the MSIT. This labeled dataset included terms such as “veteran”, “army”, and “served in the forces”. Once documents had been identified, free-text documents were preprocessed to remove:
(1) punctuation (using regular expressions);
(2) words/phrases related to another individual’s military service;
(3) stop words and frequently occurring terms (except military terms); and
(4) words/phrases that may cause confusion when correctly identifying a veteran.
The remaining features were then converted into term frequency-inverse document frequency features. The classification framework was trained to identify veterans based on the use of military terms and phrases, with a binary outcome (1: veteran, 0: non-veteran; see [22] for more information).
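As a rough illustration of the preprocessing and feature-extraction steps above, the following sketch computes term frequency-inverse document frequency values in plain Python. The stop-word list, military-term list, and example documents are hypothetical toy values; the actual MSIT builds its features with NLTK and scikit-learn from the gold standard corpus.

```python
import math
import re
from collections import Counter

# Toy lists for illustration only; the real MSIT vocabulary is derived from
# the manually labeled gold standard corpus.
STOP_WORDS = {"the", "a", "in", "he", "she", "has", "of"}
MILITARY_TERMS = {"veteran", "army", "forces", "served"}

def preprocess(text: str) -> list[str]:
    """Lower-case, strip punctuation, and drop stop words (military terms are kept)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in MILITARY_TERMS or t not in STOP_WORDS]

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Term frequency-inverse document frequency vector for each document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

docs = [preprocess("He served in the army as a veteran."),
        preprocess("She has a history of depression.")]
vectors = tf_idf(docs)
```

In the actual tool, vectors of this kind are the input to a supervised classifier trained on the labeled documents.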
To improve the true positive rate of the MSIT and reduce the potential for false positives, a postprocessing of the outcome was applied based on a set of rules. For each document that was classified as containing indicators of military service, an SQL operation was performed to ensure the document used a military term or phrase.
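The postprocessing rule above can be sketched as a simple keyword confirmation. In the actual MSIT this check is an SQL operation over the classified documents; the phrase list below is a hypothetical fragment.

```python
# Hypothetical fragment of the military phrase list; the real check is an
# SQL operation over documents classified as positive.
MILITARY_PHRASES = ["veteran", "served in the forces", "royal navy", "army"]

def confirm_positive(document: str, predicted_veteran: bool) -> bool:
    """Keep a positive MSIT prediction only if the document really contains
    a military term or phrase; otherwise downgrade it to non-veteran."""
    if not predicted_veteran:
        return False
    text = document.lower()
    return any(phrase in text for phrase in MILITARY_PHRASES)

# A false positive with no military language is filtered out:
kept = confirm_positive("Patient reports low mood and poor sleep.", True)
```

This kind of rule trades a little sensitivity for a reduction in false positives, which matters when positive flags are aggregated up to the patient level.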

3.3. Military Service Identification Tool (MSIT) Validation Approach

Whilst the MSIT has high precision, it is important to ensure that the MSIT can identify “true” military veterans and that the underlying data used to train the MSIT are accurate. This was achieved by sending an online survey to a sample of patients in the SLaM NHS Foundation Trust to determine their self-reported veteran/non-veteran status and to compare this to their MSIT classifications. Responses allowed us to validate the MSIT algorithm’s predictive performance. Incorrect classifications would provide further information with which to refine the algorithm and improve its accuracy. To undertake this evaluation, the following steps were followed:
  • Sample: The MSIT was run over a batch of 779,944 records from patients who had used SLaM NHS Foundation Trust services since January 2017. This month marked the inception of the Consent for Contact (C4C) mechanism that records patients who consent to be contacted for research purposes. From these records, the BRC identified patients who (1) were listed as “Alive”; (2) were aged 18 years or older; (3) had given C4C; (4) did not have indicators of dementia or psychosis according to their diagnostic codes; (5) had an email address or mobile telephone number; (6) were able to communicate in English without an interpreter; and (7) if they were active patients, were approved by their care coordinator to take part in the study.
  • MSIT Execution: Once the sample had been identified, all free-text clinical documents relating to their care in SLaM were analyzed using the MSIT. Each patient was evaluated as being a military veteran or non-veteran based on their medical notes.
  • Recruitment: Recruitment spanned February to June 2022. The research team performed another manual screening to detect any changes or further details regarding patients’ eligibility criteria (e.g., becoming an inpatient, now deceased, changes to their C4C status, and/or apparent communication needs that may prevent them from being able to take part in the survey). Participants were contacted in the first instance by email and, if this was not available, by text. Reminder emails and texts were sent and if participants requested not to be contacted, the researchers updated their C4C status.
  • Data collection: Patients were invited to take part in the study via email or text, which included the survey link. After consenting, participants were asked a single question: “Have you ever served in the Armed Forces (military)?” to which they could reply “Yes” or “No”. If participants endorsed having served in the Armed Forces, they were asked follow-up questions about their military characteristics to further develop the MSIT, and to provide context for understanding any inaccurate misclassifications. These questions collected branch of service, rank, length of service, and regular/reserve status. Data relating to patients’ EHRs accessible via SLaM Electronic Patient Journey System (ePJS) were kept separately from their survey responses.
  • Prize draw: Participants who completed the survey were entered into a prize draw for 26 e-vouchers comprised of 20 × £10, 5 × £20, and 1 × £50.
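A minimal sketch of the automated eligibility screen from the Sample step above. The record fields are illustrative assumptions; the real screen runs against CRIS diagnostic codes and contact records.

```python
from dataclasses import dataclass

# Hypothetical record fields mirroring the seven criteria listed in the
# Sample step; the actual screen queries the CRIS database.
@dataclass
class PatientRecord:
    alive: bool
    age: int
    gave_c4c: bool
    dementia_or_psychosis: bool
    has_contact_details: bool        # email address or mobile number
    english_without_interpreter: bool
    care_coordinator_approved: bool  # relevant for active patients

def eligible(p: PatientRecord) -> bool:
    """Apply the automated inclusion/exclusion criteria from the Sample step."""
    return (p.alive and p.age >= 18 and p.gave_c4c
            and not p.dementia_or_psychosis
            and p.has_contact_details
            and p.english_without_interpreter
            and p.care_coordinator_approved)

ok = eligible(PatientRecord(True, 45, True, False, True, True, True))
```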

3.4. Statistical Analysis

All analyses were performed using STATA 16.1 MP (StataCorp, College Station, TX, USA). The positive predictive value was defined as the proportion of correctly identified true veterans over the total number of patients identified as veterans by the MSIT. Sensitivity was defined as the proportion of non-veterans identified by the MSIT over the total number of actual non-veterans (identified by patient report); specificity was defined as the proportion of veterans identified by the MSIT over the total number of actual veterans. Accuracy was measured using the Youden Index [37], which combines sensitivity and specificity (their sum minus 1 [38]), yielding a value between 0 (absence of accuracy) and 1 (perfect accuracy).
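Using the definitions above and the confusion-matrix counts later reported in Table 1, the metrics can be reproduced in a few lines. Note that this paper defines "sensitivity" with respect to non-veterans and "specificity" with respect to veterans.

```python
# Counts from Table 1 (MSIT classification vs. patient-reported status).
msit_nonvet_true_nonvet = 111   # correctly identified non-veterans
msit_vet_true_nonvet = 23       # non-veterans flagged as veterans
msit_vet_true_vet = 11          # correctly identified veterans
msit_nonvet_true_vet = 1        # veteran flagged as non-veteran

# Section 3.4 definitions: sensitivity over actual non-veterans,
# specificity over actual veterans.
sensitivity = msit_nonvet_true_nonvet / (msit_nonvet_true_nonvet + msit_vet_true_nonvet)
specificity = msit_vet_true_vet / (msit_vet_true_vet + msit_nonvet_true_vet)

# Youden Index: sensitivity + specificity - 1, between 0 and 1.
youden = sensitivity + specificity - 1
```

This reproduces the values reported in Section 4.2 (sensitivity of roughly 0.83, specificity of roughly 0.92).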

3.5. Ethical Approval

Ethical approval was given by the East of Scotland Research Ethics Service within the NHS Research Ethics Service (Ref: 20/ES/0060). Approvals were also obtained from the SLaM Research and Development Office at King’s College London (Ref: R&D2020/029).

4. Results

4.1. Sample and MSIT Execution

Each stage of sampling, recruitment and inclusion/exclusion is depicted in Figure 1. Between March 2022 and June 2022, the MSIT was run over 779,944 free-text EHRs held by the BRC CRIS system, representing 141,762 patients. Each document was assigned a flag (1 = veteran, 2 = non-veteran), and this was aggregated at patient level so that any presence of a veteran flag in a document meant the patient was identified as being a veteran.
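The document-to-patient aggregation described above can be sketched as follows. The flag coding (1 = veteran, 2 = non-veteran) follows the text; the patient identifiers and input data are hypothetical.

```python
def aggregate_to_patients(doc_flags: list[tuple[str, int]]) -> dict[str, int]:
    """Collapse per-document flags to one flag per patient: a veteran flag (1)
    on any document marks the whole patient as a veteran."""
    patients: dict[str, int] = {}
    for patient_id, flag in doc_flags:
        if flag == 1:
            patients[patient_id] = 1            # one veteran flag is enough
        else:
            patients.setdefault(patient_id, 2)  # non-veteran unless flagged

    return patients

# Hypothetical example: one patient with a single veteran-flagged document.
flags = aggregate_to_patients([("p1", 2), ("p1", 1), ("p1", 2), ("p2", 2)])
```

The "any flag wins" design favors recall at the patient level: a single positive document is enough to surface the patient as a potential veteran, with the keyword postprocessing step guarding against false positives.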
After applying the automated exclusion criteria listed in Section 3.3, 1684 (1.2%) remained for manual confirmation of their eligibility. After this process, 902 (53.6%) patients were invited to take part in the study with 149 (16.5%) providing a response to the questionnaire. Three of these responses had to be excluded as they were duplicate entries.

4.2. Validation Results

In total, 146 participants provided eligible responses to the questionnaire (see Table 1). Of these, 112 (76.7%) were classified by the MSIT as non-veterans and 34 (23.3%) were classified as veterans. When corroborating survey responses with MSIT classifications, we found that 84.2% of the sample was accurately categorized by the MSIT (n = 122/146). A sensitivity and specificity analysis was performed to determine how many veterans and non-veterans were misclassified. Overall, 23 true non-veterans were inaccurately categorized by the MSIT as veterans, and 1 true veteran was inaccurately categorized by the MSIT as a non-veteran. We found:
  • The sensitivity of the MSIT was 0.83.
  • The specificity of the MSIT was 0.92.
A manual investigation of inaccurate classifications found that the MSIT had a high degree of accuracy, with some exceptions. Where misclassifications did occur, the MSIT was more prone to assigning non-veterans as veterans than the reverse. The most common reasons for misclassification included mentions of military family members and of support received from the Salvation Army.
Table 1. MSIT classifications compared with patient-reported classification.

Outcome             True Non-Veteran   True Veteran   Total
MSIT Non-veteran    111                1              112
MSIT Veteran        23                 11             34
Total               134                12             146

5. Discussion

This validation study has demonstrated that it is possible to identify veterans from free-text clinical notes using the MSIT. The MSIT tool performed well, as indicated by its high sensitivity and specificity. To the authors’ knowledge, this is the only study to have developed, applied, tested, and validated an NLP and machine learning framework for the identification of veterans in the UK using a large psychiatric database. Notably, by assessing the validity of the underlying free-text record, we have also demonstrated accuracy in the identification of records belonging to veterans.
In the absence of a universal marker denoting prior military service, there has been no systematic way to identify, and thus examine, veteran populations in mental healthcare services. The current Veterans’ Strategy Action Plan reinforces the importance of better understanding the health and well-being needs of the veteran community [39]. The ability to identify veterans may advance investigations into the mental health characteristics of veterans accessing these services, and their navigation through, and use of, such services. This knowledge is vital for delivering commitments to the veteran community as outlined in the NHS Long Term Plan, such as inclusive access to services, improving provision, and identifying and addressing potential health disparities [40]. Asking a simple question, such as “have you ever served in the UK Armed Forces?”, and having the response recorded could yield significant public health benefits. While this question is asked at a local level by some General Practitioners, it is not recorded routinely in secondary care.
EHR-based Case Registers, such as CRIS, function as single, complete, and integrated electronic versions of traditional paper health records [4]. These registers have been positioned as a ‘new generation’ for health research and, since the year 2000, are mandatory across the UK [4]. The methodological advantages of Case Registers—including their longitudinal nature, largely structured fields, and detailed coverage of defined populations—make them an ideal research and monitoring tool [41]. There is also the potential to have these Case Registers linked to multiple healthcare providers, third-sector charitable organizations, and medication dispensing providers to further improve the overall support and clinical delivery for patients.
EHRs in mental healthcare provide extremely rich material and analyses of their data can reveal patterns in healthcare provisions, patient profiles and mental and physical health problems [4,42]. This is hugely advantageous for investigating vulnerable sub-groups within the wider population [25,43,44].

5.1. Strengths and Limitations

This study has demonstrated that it is possible to identify veterans who accessed secondary mental healthcare services in the UK by using a Case Register. The MSIT was able to identify potential veterans with high precision, with 84.2% of cases correctly identified when corroborated against patients’ self-reported status. A key strength of the MSIT was the exploitation of NLP and the annotation of a large corpus of medical records. This is advantageous for automating the process of identifying veterans, as well as reducing the possibility of human error, and conscious/unconscious biases, and overcoming challenges when using military cohorts linked to Case Registers [4,45].
At present, the MSIT is the only tool implemented within CRIS that uses C4C to validate both the integrity of the patient record and the reported status [46]. The methodology reported in this manuscript could aid, and further support, future studies using C4C to validate NLP tools and the accuracy of patient records.
The MSIT does not rely on any coding structure or predefined fields and solely uses free-text documents, which broadens its potential applications to areas like diagnoses, occupations, and identifying ethnicity. This may allow studies to assess the specific characteristics of veterans in the SLaM sample compared to their non-veteran counterparts, as undertaken by Mark et al. (2019; [20]).
The current study, however, highlights some limitations of self-report data, whether input by healthcare professionals or reported by the patients themselves. For instance, veterans may be misclassified as non-veterans if they did not disclose their veteran status during consultations, if this was not recorded, or if it was misreported by clinicians. Whilst notes on occupational history are commonly taken during consultations, reports of military service are not verified against military records, and there may therefore be false disclosures.
In terms of the MSIT’s performance, there was some evidence of non-veterans being misclassified as veterans when keywords referred to other contexts, e.g., military family members, military metaphors, or the Salvation Army. Minor revisions to the keywords used by the tool are required before future use; however, the current findings suggest that the MSIT does not require any substantial changes.
Despite maximizing the sampling pool to improve our chances of finding eligible veterans, our attempts did not yield a large veteran sample (n = 12 “true” veterans compared with n = 134 “true” non-veterans). This was not deemed a substantial problem, since the research team deliberately sought to oversample non-veterans. This was decided because erroneous classifications of non-veterans would contaminate a “true” veteran sample for future analyses. In addition, the smaller sample size of veterans reflects the reality that non-veterans outnumber veterans in the general population.
This study was not able to establish whether veterans or non-veterans were disproportionately impacted by specific exclusion criteria of this study (e.g., being less likely to have C4C or being a current inpatient). This may prohibit specific individuals, e.g., those with more complex mental health issues who may not consent to participate in research, from being represented in future analyses. This is an important consideration for future research.

5.2. Implications

To address the inconsistencies in whether veteran status is recorded, a military service marker in the case registers and similar NHS databases could be implemented. Although time-consuming, it is possible that this could be verified by referencing UK Ministry of Defence records. This could be accompanied by broader educational efforts to help clinicians better support veterans in their care, to have a more in-depth understanding of the unique mental health needs of this population, and to be aware of the benefits of recording veteran status for referring into bespoke and specialist services. Any endeavor to ascertain veteran status must acknowledge that this can be a highly sensitive and personal attribute, especially in areas such as Northern Ireland [47].
Additionally, the MSIT could be customized and further tested to identify other occupational groups, including those similarly exposed to potentially traumatic events, such as first responders; those with similar work-related patterns, such as oil rig workers; or those who experience specific mental health issues, such as suicide among construction workers [48].
Whilst the MSIT was successful in the context of the SLaM NHS Foundation Trust, further work is required to refine the tool to function on other datasets from other NHS Trusts. To that end, we have released the source-code of the tool. As with any other effort to access and test tools on EHRs, the researchers encountered many bureaucratic and governance hurdles, making the research an extensive and lengthy endeavor. This is noteworthy for future research seeking to test, develop and utilize the MSIT and other applications. The potential benefits of the MSIT, however, remain. Principally, the MSIT allows for the identification of veterans until a mandatory field is introduced, and for its continued use in retrospective analyses.

6. Conclusions

We have shown that it is possible to identify veterans using NLP in the electronic healthcare records of a secondary mental health service. The MSIT was able to identify military veterans with high accuracy and precision. This work has demonstrated that the MSIT can be used in a healthcare setting and successfully analyzed over 800,000 free-text medical records. With the ability to identify military veterans, we can now conduct analyses comparing the health and well-being needs of this important cohort with those of the public and of other high-intensity occupations. With further refinement, the MSIT could be implemented in other electronic healthcare systems and possibly extended to identify other occupational groups.

Author Contributions

L.P. was the study coordinator of the project, led ethical applications and amendments, data-processing in liaison with D.C., managed recruitment into the study, obtained and analyzed data. C.W. conducted all recruitment into the study, patient contacts and administrative tasks relating to recruitment. R.L. supported data-processing. D.L. developed the Military Service Identification Tool. S.A.M.S., D.M. and N.T.F. conceived the study and secured the funding. D.M., S.A.M.S. and N.T.F. were Chief Investigators in the study. L.P., D.L., C.W., S.A.M.S., D.M. and N.T.F. interpreted the findings and critically revised and approved the manuscript. Guarantors of the manuscript (L.P., D.L., S.A.M.S. and N.T.F.) had final responsibility for the decision to submit for publication. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Forces in Mind Trust (Project: FiMT18/0525KCL), a funding scheme run by the Forces in Mind Trust using an endowment awarded by the National Lottery Community Fund. The salary of S.A.M.S. was partly paid by the NIHR Biomedical Research Centre at the SLaM NHS Foundation Trust and King’s College London. In addition to the listed authors, the study involved support from the NIHR Biomedical Research Centre. NIHR Biomedical Research Centre is a partnership between the SLaM NHS Foundation Trust and the Institute of Psychiatry, Psychology, and Neuroscience at King’s College London.

Institutional Review Board Statement

Ethical approval for the use of CRIS as an anonymized database for secondary analysis was granted by the Oxford Research Ethics Committee (reference: 08/H0606/71+5). This study was approved by the CRIS Patient Data Oversight Committee of the National Institute for Health and Care Research (NIHR) Maudsley Biomedical Research Centre (reference: 16-056). This study also received NHS ethical approval (20/ES/0060).

Informed Consent Statement

Only participants who provided Consent for Contact (C4C) via the UK National Health Service opt-in service were included in this study. Those choosing to take part were also asked to provide informed consent.

Data Availability Statement

The datasets used in this study are based on patient data, which are not publicly available. Although the data are pseudonymized, that is, personal details of the patient are removed, the data still contain information that could be used to identify a patient. Access to these data requires a formal application to the CRIS Patient Data Oversight Committee of the NIHR Biomedical Research Centre. On request and after suitable arrangements are put in place, the data and modeling employed in this study can be viewed within the secure system firewall. The corresponding author can provide more information about the process.

Acknowledgments

The present study involved support from the NIHR Maudsley Biomedical Research Centre (a partnership between the South London and Maudsley National Health Service NHS Foundation Trust and the Institute of Psychiatry, Psychology and Neuroscience at King’s College London). We would particularly like to thank Megan Pritchard (former lead in Clinical Record Interactive Search training and development), Daisy Kornblum (clinical informatician) and Debbie Cummings (administrator).

Conflicts of Interest

NTF is partly funded by the United Kingdom’s Ministry of Defence. NTF sits on the Independent Group Advising on the Release of Data at NHS Digital and is also a trustee of a military-related charity. DM is employed by Combat Stress, a national charity in the United Kingdom that provides clinical mental health services to veterans, and is a trustee of the Forces in Mind Trust (the funder of this project). DL is a reservist in the UK Armed Forces; this work was undertaken as part of his civilian employment. CW is currently in receipt of a PhD studentship via the King’s Centre for Military Health Research Health and Wellbeing Study, funded by the Office for Veterans’ Affairs (OVA), UK Government. SAMS is supported by the National Institute for Health and Care Research (NIHR) Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and by an NIHR Advanced Fellowship (NIHR300592). The views expressed in this publication are those of the author(s) and not necessarily those of the NHS, the NIHR, the OVA, the UK Ministry of Defence or the Department of Health and Social Care.

References

  1. Veterans: Key Facts. 2016. Available online: https://www.armedforcescovenant.gov.uk/wp-content/uploads/2016/02/Veterans-Key-Facts.pdf (accessed on 12 March 2019).
  2. Population Projections: UK Armed Forces Veterans Residing in Great Britain, 2016 to 2028. London, UK. 2019. Available online: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/775151/20190107_Enclosure_1_Population_Projections_-_UK_Armed_Forces_Veterans_residing_in_Great_Britain_-_2016_to_2028.pdf (accessed on 1 August 2022).
  3. UK Armed Forces Veterans, England and Wales: Census 2021. London, UK. 2022. Available online: https://www.ons.gov.uk/peoplepopulationandcommunity/armedforcescommunity/bulletins/ukarmedforcesveteransenglandandwales/census2021 (accessed on 15 December 2022).
  4. Leightley, D.; Chui, Z.; Jones, M.; Landau, S.; McCrone, P.; Hayes, R.D.; Wessely, S.; Fear, N.T.; Goodwin, L. Integrating electronic healthcare records of armed forces personnel: Developing a framework for evaluating health outcomes in England, Scotland and Wales. Int. J. Med. Inform. 2018, 113, 17–25.
  5. Payne, R.A.; Abel, G.A.; Guthrie, B.; Mercer, S.W. The effect of physical multimorbidity, mental health conditions and socioeconomic deprivation on unplanned admissions to hospital: A retrospective cohort study. Can. Med. Assoc. J. 2013, 185, E221–E228.
  6. Simmonds, S.J.; Syddall, H.E.; Walsh, B.; Evandrou, M.; Dennison, E.M.; Cooper, C.; Sayer, A.A. Understanding NHS hospital admissions in England: Linkage of Hospital Episode Statistics to the Hertfordshire Cohort Study. Age Ageing 2014, 43, 653–660.
  7. Chui, Z.; Leightley, D.; Jones, M.; Landau, S.; McCrone, P.; Hayes, R.D.; Wessely, S.; Fear, N.T.; Goodwin, L. Mental health problems and admissions to hospital for accidents and injuries in the UK military: A data linkage study. PLoS ONE 2023, 18, e0280938.
  8. Williamson, C.; Palmer, L.; Leightley, D.; Pernet, D.; Chandran, D.; Leal, R.; Murphy, D.; Fear, N.T.; Stevelink, S.A.M. Military veterans and civilians’ mental health diagnoses: An analysis of secondary mental health services. Soc. Psychiatry Psychiatr. Epidemiol. 2022, 16, 1–9.
  9. Williamson, C.; Rona, R.J.; Simms, A.; Fear, N.T.; Goodwin, L.; Murphy, D.; Leightley, D. Recruiting Military Veterans into Alcohol Misuse Research: The Role of Social Media and Facebook Advertising. Telemed. e-Health 2022, 29, 93–101.
  10. Williamson, C.; Baumann, J.; Murphy, D. Exploring the health and well-being of a national sample of U.K. treatment-seeking veterans. Psychol. Trauma Theory Res. Pract. Policy 2022.
  11. Leightley, D.; Williamson, C.; Rona, R.J.; Carr, E.; Shearer, J.; Davis, J.P.; Simms, A.; Fear, N.T.; Goodwin, L.; Murphy, D. Evaluating the Efficacy of the Drinks:Ration Mobile App to Reduce Alcohol Consumption in a Help-Seeking Military Veteran Population: Randomized Controlled Trial. JMIR mHealth uHealth 2022, 10, e38991.
  12. Palmer, L.; Norton, S.; Rona, R.J.; Fear, N.T.; Stevelink, S.A.M. The evolution of PTSD symptoms in serving and ex-serving personnel of the UK armed forces from 2004 to 16: A longitudinal examination. J. Psychiatr. Res. 2023, 157, 18–25.
  13. Stevelink, S.A.M.; Jones, M.; Hull, L.; Pernet, D.; MacCrimmon, S.; Goodwin, L.; MacManus, D.; Murphy, D.; Jones, N.; Greenberg, N.; et al. Mental health outcomes at the end of the British involvement in the Iraq and Afghanistan conflicts: A cohort study. Br. J. Psychiatry 2018, 213, 1–8.
  14. Stevelink, S.A.M.; Jones, N.; Jones, M.; Dyball, D.; Khera, C.K.; Pernet, D.; MacCrimmon, S.; Murphy, D.; Hull, L.; Greenberg, N.; et al. Do serving and ex-serving personnel of the UK armed forces seek help for perceived stress, emotional or mental health problems? Eur. J. Psychotraumatol. 2019, 10, 1556552.
  15. Irizar, P.; Leightley, D.; Stevelink, S.; Rona, R.; Jones, N.; Gouni, K.; Puddephatt, J.-A.; Fear, N.; Wessely, S.; Goodwin, L. Drinking motivations in UK serving and ex-serving military personnel. Occup. Med. 2020, 70, 259–267.
  16. Morgan, V.A.; Jablensky, A.V. From inventory to benchmark: Quality of psychiatric case registers in research. Br. J. Psychiatry 2010, 197, 8–10.
  17. Reilly, S.; Olier, I.; Planner, C.; Doran, T.; Reeves, D.; Ashcroft, D.; Gask, L.; Kontopantelis, E. Inequalities in physical comorbidity: A longitudinal comparative cohort study of people with severe mental illness in the UK. BMJ Open 2015, 5, e009010.
  18. Perera, B.; Audi, S.; Solomou, S.; Courtenay, K.; Ramsay, H. Mental and physical health conditions in people with intellectual disabilities: Comparing local and national data. Br. J. Learn. Disabil. 2020, 48, 19–27.
  19. Stevelink, S.A.M.; Pernet, D.; Dregan, A.; Davis, K.; Walker-Bone, K.; Fear, N.T.; Hotopf, M. The mental health of emergency services personnel in the UK Biobank: A comparison with the working population. Eur. J. Psychotraumatol. 2020, 11, 1799477.
  20. Mark, K.M.; Leightley, D.; Pernet, D.; Murphy, D.; Stevelink, S.A.M.; Fear, N.T. Identifying Veterans Using Electronic Health Records in the United Kingdom: A Feasibility Study. Healthcare 2019, 8, 1.
  21. Goodwin, L.; Leightley, D.; Chui, Z.E.; Landau, S.; McCrone, P.; Hayes, R.D.; Jones, M.; Wessely, S.; Fear, N.T. Hospital admissions for non-communicable disease in the UK military and associations with alcohol use and mental health: A data linkage study. BMC Public Health 2020, 20, 1236.
  22. Leightley, D.; Pernet, D.; Velupillai, S.; Stewart, R.J.; Mark, K.M.; Opie, E.; Murphy, D.; Fear, N.T.; Stevelink, S.A.M. The Development of the Military Service Identification Tool: Identifying Military Veterans in a Clinical Research Database Using Natural Language Processing and Machine Learning. JMIR Med. Inform. 2020, 8, e15852.
  23. Botsis, T.; Hartvigsen, G.; Chen, F.; Weng, C. Secondary Use of EHR: Data Quality Issues and Informatics Opportunities. Summit Transl. Bioinform. 2010, 2010, 1–5.
  24. Hogan, W.R.; Wagner, M.M. Free-text fields change the meaning of coded data. In Proceedings of the AMIA Annual Fall Symposium, Washington, DC, USA, 26–30 October 1996; pp. 517–521. Available online: http://www.ncbi.nlm.nih.gov/pubmed/8947720 (accessed on 15 December 2022).
  25. Perera, G.; Broadbent, M.; Callard, F.; Chang, C.-K.; Downs, J.; Dutta, R.; Fernandes, A.; Hayes, R.D.; Henderson, M.; Jackson, R.; et al. Cohort profile of the South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register: Current status and recent enhancement of an Electronic Mental Health Record-derived data resource. BMJ Open 2016, 6, e008721.
  26. Koleck, T.A.; Dreisbach, C.; Bourne, P.E.; Bakken, S. Natural language processing of symptoms documented in free-text narratives of electronic health records: A systematic review. J. Am. Med. Inform. Assoc. 2019, 26, 364–379.
  27. Schneeweiss, S. Learning from Big Health Care Data. N. Engl. J. Med. 2014, 370, 2161–2163.
  28. Fernandes, A.C.; Chandran, D.; Khondoker, M.; Dewey, M.; Shetty, H.; Dutta, R.; Stewart, R. Demographic and clinical factors associated with different antidepressant treatments: A retrospective cohort study design in a UK psychiatric healthcare setting. BMJ Open 2018, 8, e022170.
  29. Barrett, J.R.; Lee, W.; Shetty, H.; Broadbent, M.; Cross, S.; Hotopf, M.; Stewart, R. ‘He left me a message on Facebook’: Comparing the risk profiles of self-harming patients who leave paper suicide notes with those who leave messages on new media. BJPsych Open 2016, 2, 217–220.
  30. Downs, J.; Velupillai, S.; George, G.; Holden, R.; Kikoler, M.; Dean, H.; Fernandes, A.; Dutta, R. Detection of Suicidality in Adolescents with Autism Spectrum Disorders: Developing a Natural Language Processing Approach for Use in Electronic Health Records. AMIA Annu. Symp. Proc. 2017, 2017, 641–649.
  31. Al-Harrasi, A.M.; Iqbal, E.; Tsamakis, K.; Lasek, J.; Gadelrab, R.; Soysal, P.; Kohlhoff, E.; Tsiptsios, D.; Rizos, E.; Perera, G.; et al. Motor signs in Alzheimer’s disease and vascular dementia: Detection through natural language processing, co-morbid features and relationship to adverse outcomes. Exp. Gerontol. 2021, 146, 111223.
  32. Irving, J.; Colling, C.; Shetty, H.; Pritchard, M.; Stewart, R.; Fusar-Poli, P.; McGuire, P.; Patel, R. Gender differences in clinical presentation and illicit substance use during first episode psychosis: A natural language processing, electronic case register study. BMJ Open 2021, 11, e042949.
  33. Kapadi, T.; Luz, S. Natural Language Processing of Electronic Patient Records to Predict Psychiatric Inpatients at Risk of Early Readmission to Hospital Using Predictive Models Derived Through Machine Learning. BJPsych Open 2022, 8, S6.
  34. Han, S.; Zhang, R.F.; Shi, L.; Richie, R.; Liu, H.; Tseng, A.; Quan, W.; Ryan, N.; Brent, D.; Tsui, F.R. Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing. J. Biomed. Inform. 2022, 127, 103984.
  35. Loper, E.; Bird, S. NLTK: The Natural Language Toolkit. In Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, Philadelphia, PA, USA, 7 July 2002; Volume 1, pp. 63–70.
  36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  37. Youden, W.J. Index for rating diagnostic tests. Cancer 1950, 3, 32–35.
  38. Yerushalmy, J. Statistical problems in assessing methods of medical diagnosis, with special reference to X-ray techniques. Public Health Rep. 1947, 62, 1432–1449.
  39. Veterans’ Strategy Action Plan (2022–2024). London, UK. 2022. Available online: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1103936/Veterans-Strategy-Action-Plan-2022-2024.pdf (accessed on 15 December 2022).
  40. Healthcare for the Armed Forces Community: A Forward View. London, UK. 2021. Available online: https://www.england.nhs.uk/wp-content/uploads/2021/03/Healthcare-for-the-Armed-Forces-community-forward-view-March-2021.pdf (accessed on 15 December 2022).
  41. Stewart, R. The big case register. Acta Psychiatr. Scand. 2014, 130, 83–86.
  42. Stewart, R.; Soremekun, M.; Perera, G.; Broadbent, M.; Callard, F.; Denis, M.; Hotopf, M.; Thornicroft, G.; Lovestone, S. The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register: Development and descriptive data. BMC Psychiatry 2009, 9, 51.
  43. Velupillai, S.; Hadlaczky, G.; Baca-Garcia, E.; Gorrell, G.M.; Werbeloff, N.; Nguyen, D.; Patel, R.; Leightley, D.; Downs, J.; Hotopf, M.; et al. Risk Assessment Tools and Data-Driven Approaches for Predicting and Preventing Suicidal Behavior. Front. Psychiatry 2019, 10, 36.
  44. Downs, J.M.; Ford, T.; Stewart, R.; Epstein, S.; Shetty, H.; Little, R.; Jewell, A.; Broadbent, M.; Deighton, J.; Mostafa, T.; et al. An approach to linking education, social care and electronic health records for children and young people in South London: A linkage study of child and adolescent mental health service data. BMJ Open 2019, 9, e024355.
  45. Rhead, R.; MacManus, D.; Jones, M.; Greenberg, N.; Fear, N.T.; Goodwin, L. Mental health disorders and alcohol misuse among UK military veterans and the general population: A comparison study. Psychol. Med. 2020, 52, 292–302.
  46. CRIS NLP SERVICE: Library of Production-Ready Applications. London, UK. 2022. Available online: https://www.maudsleybrc.nihr.ac.uk/media/463740/applications-library-v21.pdf (accessed on 1 February 2022).
  47. Armour, C.; Ross, J.; McLafferty, M.; Hall, M. Public Attitudes to the UK Armed Forces in Northern Ireland. Belfast, UK. 2018. Available online: https://nivso.org.uk/storage/194/3-NIVHWS+-+Public+attitudes+to+the+UK+Armed+Forces+in+Northern+Ireland-June-2018----3-part-1.pdf (accessed on 1 February 2023).
  48. Suicide by Occupation, England and Wales, 2011 to 2020 Registrations. London, UK. 2021. Available online: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/adhocs/13674suicidebyoccupationenglandandwales2011to2020registrations (accessed on 15 December 2022).
Figure 1. Flow diagram of the recruitment process for the MSIT validation study.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Leightley, D.; Palmer, L.; Williamson, C.; Leal, R.; Chandran, D.; Murphy, D.; Fear, N.T.; Stevelink, S.A.M. Identifying Military Service Status in Electronic Healthcare Records from Psychiatric Secondary Healthcare Services: A Validation Exercise Using the Military Service Identification Tool. Healthcare 2023, 11, 524. https://doi.org/10.3390/healthcare11040524

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
