This research showed that it is feasible to identify two groups of probable veterans who accessed secondary mental health care services through SLaM using CRIS. The hit rate for correctly detecting veterans from these EHRs was 43.3%, and our final sample consisted of 693 veterans. We were able to identify probable veterans from the NS and post NS eras, but the procedure for doing so was far from straightforward. This study is the first to identify differences in mental disorder diagnoses and treatment duration between NS era veterans and post NS era veterans. Further, as far as we are aware, it is also the first to identify veterans and non-veterans accessing the same secondary mental health service. This study identified differences between NS and post NS era veterans in both diagnosis prevalence rates and healthcare utilisation. This could be due to older veterans having more complex healthcare needs coupled with higher levels of comorbidity, resulting in longer and more frequent inpatient stays.
An important strength of the study was the exploitation of EHRs, which are advantageous for investigating subgroups within the wider population. For example, we found that NS era veterans had longer inpatient stays within SLaM than post NS era veterans. Future research is required to investigate the drivers behind this observation and why differences may exist between two groups of individuals who had the same occupation. Another strength is that this was the first study to test the utility and feasibility of identifying UK veterans accessing NHS secondary mental health care. The study was unique in describing and evaluating how to use secondary health care records to undertake research on veterans' secondary mental health care utilisation.
It must be noted that there was no way to confirm that our identified CRIS veterans, verified by the research team using clinical notes, were true veterans; nor can we be confident of the integrity of the underlying primary data source. However, this is also the case for other data sources, such as the Adult Psychiatric Morbidity Survey (APMS); for cohort studies, which require completers to self-report their veteran status [24]; and for hospitals in general. We were careful and deliberate about whom to classify as a probable veteran in this study: we read through all clinical notes at least twice and only confirmed veteran status when an explicit statement that the patient had served was recorded. Even so, the process relied both on patients self-reporting that they had previously served in the military, which may have been inaccurate, particularly given that this population was suffering from severe and complex mental health problems, and on clinicians documenting veteran status in their notes. Further, applying the exclusion terms may have resulted in false negatives (i.e., wrongly excluding probable veterans), but it was not possible to evaluate this due to the large volumes of data involved. It is worth noting, however, that protocol requires clinicians to talk through a patient's previous occupations when they first enter mental health care services [25].
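To illustrate this limitation, the sketch below shows, in Python, how a blanket exclusion term can wrongly discard a note that in fact describes a probable veteran. All search and exclusion terms here are invented for illustration; they are not the study's actual term lists.

```python
import re

# Hypothetical illustration of the false-negative risk: blanket exclusion
# terms drop an entire record if they appear anywhere in the note.
# The term lists below are invented examples, not the study's own terms.
INCLUSION = [r"\bserved in the (army|navy|raf)\b", r"\bex-forces\b"]
EXCLUSION = [r"\bsalvation army\b", r"\barmy cadets\b"]

def keep_for_review(note: str) -> bool:
    """Return True if the note should be kept for manual verification."""
    text = note.lower()
    if not any(re.search(p, text) for p in INCLUSION):
        return False
    # Blanket exclusion: discard the whole note if any exclusion term appears.
    return not any(re.search(p, text) for p in EXCLUSION)

# A note can match an exclusion term *and* still describe a true veteran:
note = ("Patient served in the Army for six years; now volunteers "
        "with the Salvation Army.")
print(keep_for_review(note))  # False: a likely false negative
```

Because the exclusion is applied before any human review, such records never reach the verification stage, which is why the false-negative rate could not be evaluated directly.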
Along the same lines, there was no way to confirm that our verified non-veterans had not served in the AF. Some veterans may not have volunteered to their health care provider the fact that they had belonged to the military, in which case there would be no mention of their veteran status in their clinical notes. We acknowledge that certain demographic and military factors, such as age and time since leaving the service, might affect disclosure [26]; perhaps older veterans who left the military many years ago are less likely to reveal their history of serving in the AF than younger veterans who are still adjusting back to civilian life.
While we have shown that the process is feasible, the manual identification of probable veterans from CRIS was labour-intensive, resource-intensive and time-consuming. We recommend improving the accuracy and efficiency of identifying veterans from EHR databases, such as CRIS, where possible. For example, as is already the case in NHS Scotland [27], the implementation of a military marker across the UK, perhaps one verified against the Ministry of Defence's records, would be extremely helpful. This would clearly indicate which patients had previously served in the AF, eliminating the reliance on self-reported veteran status and, as a result, speeding up the manual identification process. However, this approach could only identify newly presenting veterans and would not help with determining serving status retrospectively.
A more immediate way to accelerate veteran identification is the creation of digital tools, such as natural language processing (NLP) methods, that automatically detect these individuals using keywords and rules. A key advantage of NLP is that it can be applied automatically to EHRs and free-text clinical notes [28]. NLP techniques, such as text mining, use sets of programmatic rules or machine learning algorithms (i.e., models trained on gold-standard labelled data) to extract meaning from "naturally occurring" (human-generated) text [9]. The result is often an output that can be interpreted by humans with relative ease [30]. Previous work has developed successful NLP approaches for use within CRIS to identify subgroups of interest, for example to identify patient suicide attempts [28], or to link education, social care and EHR records for analyses of adolescents' mental health, which is the subject of ongoing work [33]. Building on this study, the creation of a similar NLP tool would ensure a consistent, reliable and effective approach to identifying veterans from free-text clinical notes. To the best of our knowledge, there is currently no tool within the UK that identifies veterans through secondary mental health care records in this way.
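As a rough illustration of what such a keyword-and-rule approach might look like, the following Python sketch flags non-negated service-related mentions in free text and returns the matched snippets to support manual verification. The keyword list, negation cues and context window are illustrative assumptions, not a validated clinical NLP pipeline.

```python
import re

# Illustrative rule-based detector for probable veteran status in free-text
# clinical notes. All terms and parameters below are assumptions made for
# this sketch, not the vocabulary of any validated tool.
SERVICE_TERMS = re.compile(
    r"\b(veteran|ex-service(?:man|woman)?"
    r"|served in the (?:army|navy|royal navy|raf|armed forces)"
    r"|national service)\b",
    re.IGNORECASE,
)
NEGATION_CUES = re.compile(r"\b(denies|never|no history of|not)\b", re.IGNORECASE)
WINDOW = 40  # characters of left context checked for a negation cue

def flag_probable_veteran(note: str):
    """Return (flag, snippets): flag is True if any non-negated service
    mention is found; snippets list (matched text, negated?) pairs so a
    human reviewer can verify each hit."""
    snippets = []
    flag = False
    for m in SERVICE_TERMS.finditer(note):
        left = note[max(0, m.start() - WINDOW):m.start()]
        negated = bool(NEGATION_CUES.search(left))
        snippets.append((m.group(0), negated))
        if not negated:
            flag = True
    return flag, snippets

flag, hits = flag_probable_veteran(
    "Pt reports he served in the Army 1958-1960 during National Service."
)
print(flag)       # True
print(len(hits))  # 2 mentions for the reviewer to check
```

In practice, such rules would only pre-select candidate records; as with the manual approach described above, a human reviewer would still need to confirm veteran status from the surrounding clinical narrative.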
Because of time and resource constraints, we were only able to include 693 probable veterans. While this sample is large enough to show proof of concept, it lacks the statistical power needed for more complex analyses, such as those involving subgroups of women or ethnic minority veterans. We have already detected approximately 6000 potential veterans in our current research. Based on this work, we expect that 30–40% could be verified as actual veterans using NLP, suggesting that an anticipated sample size of approximately 2000 veterans could be reached from CRIS. It would also be possible to match this veteran group to non-veterans accessing mental health services through SLaM. Using these data, we could establish whether similarities in socio-demographic, mental health, medication, suicide and treatment characteristics exist between veterans and non-veterans, as well as between NS and post NS era veterans.
Data within EHRs are not collected primarily for research purposes and therefore often contain large amounts of missing values [34], as was the case here. While our included data can adequately test feasibility, such missing values reduce reliability and robustness. When time and manpower allow, we recommend backfilling missing data for outcome variables to ensure data completeness. This could be done using patients' written clinical notes, manually working through each patient's records one by one. Details omitted from the database's structured fields are often included within these free-text fields [35], which would allow researchers to improve data quality. However, it must be acknowledged that these notes could be out of date and may introduce bias. Alternatively, other known data sources could be used to build a more comprehensive picture of the socio-demographic characteristics of veterans seeking mental health treatment. We are aware that these additional data focus on patients who access primary mental health services, whose treatment needs are less complex than those of patients who access secondary NHS care. However, some of our sample will have records in both systems, which would allow additional information to be exploited for those individuals.