DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research
Abstract
1. Summary
Data Visualizations
2. Methods
2.1. Cohort
2.2. Data Pipeline
2.3. Local Data Extraction Processes and Principles
3. Dataset Description
3.1. Data Summaries
| Characteristic | UCLH: COVID-19 1 | UCLH: Non-COVID-19 | UHB: COVID-19 1 | UHB: Non-COVID-19 |
|---|---|---|---|---|
| Patient-level Summaries 2 | ||||
| Patients, n | 2483 | 75,153 | 7547 | 80,237 |
| Number of visits per patient, median [IQR] | 1 [1, 2] | 1 [1, 2] | 1 [1, 2] | 1 [1, 2] |
| Sex at birth, n (%) | ||||
| Female | 1107 (44.6) | 40,309 (53.6) | 3613 (47.9) | 39,952 (49.8) |
| Male | 1376 (55.4) | 34,809 (46.3) | 3934 (52.1) | 40,283 (50.2) |
| Not recorded or withheld 3 | 0 (0.0) | 35 (<0.1) | 0 (0.0) | <10 (<0.1) |
| Ethnicity 4, n (%) | ||||
| Asian or Asian British | 332 (13.4) | 6610 (8.8) | 1295 (17.2) | 10,243 (12.8) |
| Black or African or Caribbean or Black British | 269 (10.8) | 6256 (8.3) | 326 (4.3) | 3284 (4.1) |
| Mixed or Multiple ethnic groups | 47 (1.9) | 1593 (2.1) | 111 (1.5) | 1518 (1.9) |
| White | 1153 (46.4) | 37,339 (49.7) | 4420 (58.6) | 50,349 (62.8) |
| Other | 259 (10.4) | 8091 (10.8) | 216 (2.9) | 2426 (3.0) |
| Not recorded or withheld 4 | 423 (17.0) | 15,264 (20.3) | 1179 (15.6) | 12,417 (15.5) |
| Visit-level Summaries | ||||
| Hospital visits, n | 2783 | 115,478 | 7984 | 130,559 |
| Age at presentation 5, years, median [IQR] | 59 [45, 74] | 40 [30, 59] | 63 [48, 78] | 53 [35, 70] |
| Presenting from, visits, n (%) | ||||
| Home | 1155 (41.5) | 44,250 (38.3) | 5741 (71.9) | 54,183 (41.5) |
| Transferred as inpatient | 276 (9.9) | 2059 (1.8) | 298 (3.7) | 2890 (2.2) |
| Other settings 6 | <10 (<0.1) | 53 (0.0) | 207 (2.6) | 697 (0.5) |
| Not recorded or non-standard 7 | 1348 (48.4) | 69,116 (59.9) | 1738 (21.8) | 72,789 (55.8) |
| Body Mass Index at presentation, kg/m2, median [IQR] | 27.7 [23.0, 31.2] | 25.6 [22.2, 29.8] | 27.6 [23.7, 32.5] | 26.6 [23.1, 31.1] |
| Length of stay, days, median [IQR] | 4 [0, 12] | 0 [0, 1] | 3 [0, 10] | 0 [0, 1] |
| Types/levels of care ever received during visit 8, visits, n (%) | ||||
| Emergency Department | 2251 (80.9) | 82,466 (71.4) | 6954 (87.1) | 94,190 (72.1) |
| Inpatient ward (level 1) | 1926 (69.2) | 45,402 (39.3) | 6143 (76.9) | 60,003 (46.0) |
| HDU/ICU (level 2/3 care) | 857 (30.8) | 4961 (4.3) | 842 (10.5) | 11,659 (8.9) |
| Theaters 9 | 203 (7.3) | 13,627 (11.8) | <10 (<0.1) | 161 (0.1) |
| For selected clinical observations, median number of measurements per patient-day 10 [% of patient-days with no measurements] 11 | ||||
| Inpatient ward (level 1) | ||||
| Heart rate | 7.3 [5.2] | 7.0 [8.7] | 5.5 [6.7] | 5.5 [6.1] |
| Temperature | 7.2 [2.8] | 6.7 [8.7] | 5.5 [6.7] | 5.5 [6.1] |
| Respiratory rate | 7.4 [2.7] | 7.0 [8.5] | 5.5 [6.5] | 5.5 [6.0] |
| Oxygen saturation | 7.7 [2.8] | 7.0 [8.7] | 5.6 [6.6] | 5.5 [6.1] |
| Glasgow Coma Score Total Sum Score [34] | 0.1 [65.1] | 0.0 [66.8] | 0.0 [91.8] | 0.0 [86.1] |
| NEWS2 [35] | 7.0 [8.0] | 6.0 [14.4] | 5.4 [8.2] | 5.4 [6.7] |
| HDU/ICU (Level 2/3) | ||||
| Heart rate | 24.3 [3.1] | 23.0 [4.5] | 22.0 [2.3] | 18.0 [4.1] |
| Temperature | 11.8 [3.4] | 9.3 [4.8] | 9.4 [2.8] | 8.2 [4.6] |
| Respiratory rate | 23.3 [3.1] | 21.0 [4.4] | 4.7 [67.9] | 10.7 [23.4] |
| Oxygen saturation | 24.9 [3.1] | 22.7 [4.5] | 22.5 [2.3] | 18.0 [4.1] |
| Glasgow Coma Score Total Sum Score [34] | 5.0 [16.4] | 5.7 [19.1] | 3.3 [46.2] | 2.4 [45.4] |
| NEWS2 [35] | 0.1 [75.9] | 0.2 [67.9] | 0.0 [96.1] | 0.0 [62.6] |
| For selected laboratory/point-of-care tests, median number of measurements per patient-day 10 [% of patient-days with no measurement] 11 | ||||
| Inpatient ward (level 1) | ||||
| Full blood count | 0.7 [50.7] | 0.7 [47.4] | 0.5 [57.3] | 0.6 [52.8] |
| Electrolytes (Creatinine, Sodium, Potassium) | 0.7 [49.5] | 0.5 [47.8] | 0.5 [53.1] | 0.6 [49.8] |
| C-reactive protein | 0.5 [60.0] | 0.1 [71.3] | 0.5 [61.7] | 0.5 [59.9] |
| HDU/ICU (Level 2/3) | ||||
| Full blood count | 1.0 [17.2] | 1.0 [27.5] | 1.1 [14.3] | 1.1 [33.4] |
| Electrolytes (Creatinine, Sodium, Potassium) | 1.0 [16.2] | 1.1 [25.9] | 1.1 [13.3] | 1.1 [32.7] |
| C-reactive protein | 1.0 [19.1] | 1.0 [33.4] | 1.0 [24.2] | 1.0 [42.3] |
| Arterial blood gas | 5.0 [30.5] | 0.0 [57.1] | 7.8 [11.6] | 2.0 [47.7] |
| For selected medications, number of visits with at least one drug exposure record (% of total number of visits) | ||||
| Dexamethasone [36] | 876 (31.5) | 4175 (3.6) | 2009 (25.2) | 3734 (2.9) |
| Tocilizumab [37] | 53 (1.9) | 59 (0.1) | 11 (0.1) | 1 (0.0) |
| Maximum respiratory support received during visit (% of total visits) | ||||
| No respiratory support | 1187 (42.7) | 98,854 (85.6) | 3777 (47.3) | 112,000 (85.8) |
| Supplementary oxygen | 559 (20.1) | 4424 (3.8) | 3100 (38.8) | 15,244 (11.7) |
| High flow nasal oxygen | 346 (12.4) | 1557 (1.3) | 297 (3.7) | 960 (0.7) |
| Non-invasive ventilation | 351 (12.6) | 8966 (7.8) | 299 (3.7) | 801 (0.6) |
| Invasive ventilation | 340 (12.2) | 1677 (1.5) | 511 (6.4) | 1554 (1.2) |
| Clinical diagnoses, number of condition records per visit, median [IQR] | ||||
| Past medical history 12 | 2 [1, 3] | 1 [1, 3] | NA 13 | NA 13 |
| Hospital visit/episode level 14 | 1 [1, 2] | 1 [1, 1] | 3 [2, 4] | 2 [2, 2] |
| Consultant episode level 14 | 12 [7, 21] | 4 [2, 8] | 20 [11, 34] | 9 [5, 17] |
| Problem list 15 | 5 [2, 8] | 2 [2, 3] | 1 [1, 1] | 3 [1, 3] |
| Discharged to/Outcomes, visits, n (%) | ||||
| Discharged home | 2088 (75.0) | 106,088 (91.9) | 6086 (76.2) | 109,007 (83.5) |
| Discharged to other setting | 44 (1.6) | 414 (0.4) | 410 (5.1) | 2206 (1.7) |
| Transferred as inpatient | 181 (6.5) | 3548 (3.1) | 355 (4.4) | 3900 (3.0) |
| Remain in hospital 16 | 39 (1.4) | 4460 (3.9) | 33 (0.4) | 85 (0.1) |
| Died | 428 (15.4) | 675 (0.6) | 1044 (13.1) | 1605 (1.2) |
| Not recorded or non-standard 17 | <10 (0.1) | 293 (0.3) | 56 (0.7) | 13,756 (10.5) |
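
The per-patient-day rows in the table above pair a median measurement frequency with the share of patient-days that have no measurement at all. The following Python sketch shows one way such a summary could be computed. It is not the consortium's code: the function name and input shapes are hypothetical, a patient-day is assumed to be a calendar day covered by a stay in the care setting of interest, and, because the table reports non-integer medians such as 7.3, the sketch takes the median of per-patient rates (measurements divided by patient-days). That reading is an inference from the numbers, not a definition given in the table.

```python
from collections import Counter, defaultdict
from datetime import date, datetime, timedelta
from statistics import median


def patient_day_summary(stays, measurement_times):
    """Summarise how often one measurement type is recorded.

    stays: list of (person_id, start_date, end_date), dates inclusive.
    measurement_times: list of (person_id, datetime) for a single measurement type.
    Returns (median per-patient measurements per patient-day,
             percentage of patient-days with no measurement).
    """
    # Collect the calendar days each patient spent in the care setting.
    days_by_patient = defaultdict(set)
    for pid, start, end in stays:
        day = start
        while day <= end:
            days_by_patient[pid].add(day)
            day += timedelta(days=1)

    # Count measurements per patient per day, ignoring any outside a stay.
    measured = defaultdict(Counter)
    for pid, ts in measurement_times:
        if ts.date() in days_by_patient.get(pid, ()):
            measured[pid][ts.date()] += 1

    rates, total_days, empty_days = [], 0, 0
    for pid, days in days_by_patient.items():
        rates.append(sum(measured[pid].values()) / len(days))
        total_days += len(days)
        empty_days += sum(1 for d in days if measured[pid][d] == 0)

    return median(rates), 100.0 * empty_days / total_days


# Illustrative data only: two patients, three patient-days in total.
example_stays = [
    (1, date(2020, 3, 1), date(2020, 3, 2)),
    (2, date(2020, 3, 5), date(2020, 3, 5)),
]
example_measurements = [
    (1, datetime(2020, 3, 1, 8, 0)),
    (1, datetime(2020, 3, 1, 12, 0)),
    (1, datetime(2020, 3, 1, 20, 0)),
    (2, datetime(2020, 3, 5, 9, 0)),
    (2, datetime(2020, 3, 5, 21, 0)),
]
median_rate, pct_empty = patient_day_summary(example_stays, example_measurements)
```

In the example, patient 1 has three measurements over two days and patient 2 has two measurements on a single day, so one of the three patient-days is empty.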
3.2. Data Tables
3.3. Database Identifiers
3.4. Data Records
3.5. Data Granularity
3.6. Data Quality
3.7. Technical Validation
4. User Notes
4.1. Data Access
4.2. Data Usage
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
| Quality Check |
|---|
| Every patient in the DECOVID database should have a valid record in the VISIT_OCCURRENCE data table |
| A patient’s year of birth should be between 1900 and 2002 |
| Discharge information should be coded as NULL for a patient if their visit_end_datetime is NULL (and vice versa) |
| A patient cannot have a visit_start_date from one visit that is strictly between the visit_start_date and visit_end_date from another visit. This quality check covers both the VISIT_OCCURRENCE and VISIT_DETAIL data tables |
| In the VISIT_OCCURRENCE data table, check that visit_start_date/datetime does not occur after the row was last updated (last_updated_datetime) |
| Patient visits to the A&E (i.e., emergency department) should have an interval of less than 24 h between visit_end_datetime and visit_start_datetime |
| Visits for the same patient should have the same person_id for all visits. Where there is a preceding visit_occurrence record for a patient, the visit_occurrence should have the same person_id and have an earlier start date |
| When discharge_to implies that a patient died, check that the patient has a death recorded |
| No patient should have a visit_occurrence with an end_date/datetime that extends 24 h beyond their death_date. This includes having no records in the VISIT_OCCURRENCE data table with a NULL visit_end_datetime |
| Rows in the MEASUREMENT data table that are complete duplicates of each other (apart from the measurement_id) are possibly a result of being part of different overlapping observation sets. These records are de-duplicated |
| For COVID swab test results in the SPECIMEN data table, pending results should have NULL time and NULL value. The time a sample was taken should be within the last 14 days |
| Except for the COVID swab tests, value_as_number and value_as_concept_id should not both be null; and measurement_datetime should not be null |
| With regard to the MEASUREMENT data table, check that all values lie within the ranges of possibility specified by a clinician, where possible |
| For the VISIT_OCCURRENCE and VISIT_DETAIL data tables, check that every record in the VISIT_OCCURRENCE data table has at least one record in the VISIT_DETAIL data table |
| Where there is a preceding visit (VISIT_OCCURRENCE and VISIT_DETAIL) ID, this visit should have an earlier start date |
| In any clinical data table, check that the concept is not from the wrong vocabulary, and the concept is of the correct type |
| In any clinical data table, check that all date and datetime fields match, and if a datetime field is not null then its corresponding date is not null |
| In the CONDITION_OCCURRENCE data table, the condition_start_date must be between a year before the patient’s year of birth and the condition_end_date. |
| In the CONDITION_OCCURRENCE data table, condition_start_date/datetime must be greater than visit_occurrence_start_date/datetime; and condition_end_date/datetime must be greater than (or equal to if date) condition_start_date/datetime |
| In any of the clinical data tables, table end_date/datetime must be greater than or equal to table start_date/datetime |
| In the CONDITION_OCCURRENCE data table, stop_reason, provider_id, condition_source_value, condition_source_concept_id, and condition_status_source_value are null |
| A person_id in the CONDITION_OCCURRENCE data table must match a person_id in the VISIT_OCCURRENCE and VISIT_DETAIL data tables |
| In the CONDITION_OCCURRENCE data table, condition_status_concept_id and condition_type_concept_id must match a concept_id in the concept table of the correct type |
| No sexual health conditions should be included in the CONDITION_OCCURRENCE data table |
| For the VISIT_OCCURRENCE data table, check that records start less than 60 min after a previous visit |
| Except for COVID swab tests, check that all measurements can be assigned to a visit_occurrence |
| For any clinical data table, check that the start_date/datetime is within bounds of the DECOVID project period (i.e., after 1 January 2020), including reference to the end_date/datetime |
| For records in the VISIT_OCCURRENCE and VISIT_DETAIL data tables, end_date/datetime should be after start_date/datetime |
| Where there are multiple records per patient in the VISIT_DETAIL data table, all but the first record should have a non-null value in preceding_visit_detail_id |
| For patients who have died and have a record in the DEATH data table, check that death_datetime is not NULL, is not later than the latest update_datetime, and does not fall outside the DECOVID observation period (i.e., is not prior to 1 January 2020) |
| Check that the concept is not from the wrong vocabulary for race/ethnicity |
| Local quality check reports are created, including data visualizations and tabular data summaries to explore the distribution of key variables, such as to identify patients with an age that is too high (see Figure A1 for an example) |
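
Several of the checks listed above are simple rules over visit and measurement rows. As a rough illustration, the Python sketch below implements three of them: visits that start strictly inside another visit of the same patient, de-duplication of measurement rows that differ only in measurement_id, and emergency department visits lasting 24 h or more. The function names and the use of plain dictionaries keyed by OMOP-style field names are assumptions made for this sketch; it does not reproduce the DECOVID pipeline.

```python
from datetime import datetime, timedelta


def overlapping_visits(visits):
    """Return (inner_id, outer_id) pairs where a visit starts strictly inside another."""
    flagged = []
    for a in visits:
        for b in visits:
            if a is b or a["person_id"] != b["person_id"]:
                continue
            if b["visit_end_datetime"] is None:
                continue  # open-ended visit: no end to compare against
            if b["visit_start_datetime"] < a["visit_start_datetime"] < b["visit_end_datetime"]:
                flagged.append((a["visit_occurrence_id"], b["visit_occurrence_id"]))
    return flagged


def deduplicate_measurements(rows):
    """Drop rows identical in every field except measurement_id, keeping the first."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != "measurement_id"))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def long_emergency_visits(ed_visits, limit=timedelta(hours=24)):
    """Return ids of completed emergency department visits lasting `limit` or longer."""
    return [
        v["visit_occurrence_id"]
        for v in ed_visits
        if v["visit_end_datetime"] is not None
        and v["visit_end_datetime"] - v["visit_start_datetime"] >= limit
    ]


# Illustrative rows only; ids and values are made up.
example_visits = [
    {"visit_occurrence_id": 1, "person_id": 10,
     "visit_start_datetime": datetime(2020, 3, 1, 8), "visit_end_datetime": datetime(2020, 3, 3, 12)},
    {"visit_occurrence_id": 2, "person_id": 10,
     "visit_start_datetime": datetime(2020, 3, 2, 9), "visit_end_datetime": datetime(2020, 3, 2, 15)},
    {"visit_occurrence_id": 3, "person_id": 11,
     "visit_start_datetime": datetime(2020, 3, 2, 9), "visit_end_datetime": None},
]
example_measurements = [
    {"measurement_id": 1, "person_id": 10, "value_as_number": 80.0},
    {"measurement_id": 2, "person_id": 10, "value_as_number": 80.0},
    {"measurement_id": 3, "person_id": 10, "value_as_number": 82.0},
]
```

The pairwise loop is quadratic in the number of visits, which is acceptable for a sketch; sorting visits by patient and start time would be the natural refinement for a full-size table.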

Appendix B
| Data Table | Revisions |
|---|---|
| person | |
| observation_period | |
| specimen | |
| death | |
| visit_occurrence | |
| visit_detail | |
| procedure_occurrence | |
| drug_exposure | |
| device_exposure | |
| condition_occurrence | |
| measurement | |
| note | |
| note_nlp | |
| observation | |
| fact_relationship | |
References
- NHS. NHS Trust. Available online: https://datadictionary.nhs.uk/nhs_business_definitions/nhs_trust.html (accessed on 3 November 2025).
- DECOVID. DECOVID Protocol, DECOVID: A Highly Granular, Near Real Time Clinical Database and Research Environment from Digitally Mature NHS Trusts to Answer Critical Questions and Improve Patient Care During the COVID Pandemic. Available online: https://7aa1b654-606c-4622-8514-d83dfc3eba35.filesusr.com/ugd/93d683_8fd95f128e3d4daba264806de39685a9.pdf (accessed on 3 November 2025).
- Gallier, S.; Price, G.; Pandya, H.; McCarmack, G.; James, C.; Ruane, B.; Forty, L.; Crosby, B.L.; Atkin, C.; Evans, R.; et al. Infrastructure and operating processes of PIONEER, the HDR-UK Data Hub in Acute Care and the workings of the Data Trust Committee: A protocol paper. BMJ Health Care Inform. 2021, 28, e100294.
- Garza, M.; Del Fiol, G.; Tenenbaum, J.; Walden, A.; Zozus, M.N. Evaluating common data models for use with a longitudinal community registry. J. Biomed. Inform. 2016, 64, 333–341.
- Voss, E.A.; Makadia, R.; Matcho, A.; Ma, Q.; Knoll, C.; Schuemie, M.; DeFalco, F.J.; Londhe, A.; Zhu, V.; Ryan, P.B. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 2015, 22, 553–564.
- Overhage, J.M.; Ryan, P.B.; Reich, C.G.; Hartzema, A.G.; Stang, P.E. Validation of a common data model for active safety surveillance research. J. Am. Med. Inform. Assoc. 2012, 19, 54–60.
- Grimson, F.; Niklas, N.; Hermans, R.; Cirneanu, L.; Maissenhaelter, B.; Kim, J. Evaluation of statistical software for federated analysis of multi-site real world studies. In Proceedings of the Pharmaceutical Industry (PSI) 2019 Conference, Stockholm, Sweden, 27–30 May 2019.
- NHS Digital. Hospital Episode Statistics. Available online: https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/hospital-episode-statistics (accessed on 3 November 2025).
- Dahella, S.S.; Briggs, J.S.; Coombes, P.; Farajidavar, N.; Meredith, P.; Bonnici, T.; Darbyshire, J.L.; Watkinson, P.J. Implementing a system for the real-time risk assessment of patients considered for intensive care. BMC Med. Inform. Decis. Mak. 2020, 20, 161.
- Wood, A.; Denholm, R.; Hollings, S.; Cooper, J.; Ip, S.; Walker, V.; Denaxas, S.; Akbari, A.; Banerjee, A.; Whiteley, W.; et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: Data resource. BMJ 2021, 373, 826.
- NIHR Oxford Biomedical Research Centre. Infections in Oxfordshire Research Database (IORD). Available online: https://oxfordbrc.nihr.ac.uk/research-themes-overview/antimicrobial-resistance-and-modernising-microbiology/infections-in-oxfordshire-research-database-iord/#:~:text=The%20Infections%20in%20Oxfordshire%20Research,covering%20about%201%25%20of%20England (accessed on 3 November 2025).
- Harris, S.; Shi, S.; Brealey, D.; MacCallum, N.S.; Denaxas, S.; Perez-Suarez, D.; Ercole, A.; Watkinson, P.; Jones, A.; Ashworth, S.; et al. Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database. Int. J. Med. Inform. 2018, 112, 82–89.
- GO FAIR. FAIR Principles. Available online: https://www.go-fair.org/fair-principles/ (accessed on 3 November 2025).
- ISARIC4C Consortium. ISARIC4C (Coronavirus Clinical Characterisation Consortium). Available online: https://isaric4c.net/ (accessed on 3 November 2025).
- Brat, G.A.; Weber, G.M.; Gehlenborg, N.; Avillach, P.; Palmer, N.P.; Chiovato, L.; Cimino, J.; Waitman, L.R.; Omenn, G.S.; Malovini, A.; et al. International electronic health record-derived COVID-19 clinical course profiles: The 4CE consortium. NPJ Digit. Med. 2020, 3, 109.
- Pollard, T.; Johnson, A.E.W.; Raffa, J.D.; Celi, L.A.; Mark, R.G.; Badawi, O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 2018, 5, 180178.
- Bennett, T.D.; Moffitt, R.A.; Hajagos, J.G.; Amor, B.; Anand, A.; Bissell, M.M.; Bradwell, K.R.; Bremer, C.; Byrd, J.B.; Denham, A.; et al. Clinical characterization and prediction of clinical severity of SARS-CoV-2 infection among US adults using data from the US National COVID Cohort Collaborative. JAMA Netw. Open 2021, 4, e2116901.
- Observational Health Data Sciences and Informatics. ATHENA. Available online: https://athena.ohdsi.org/search-terms/start (accessed on 3 November 2025).
- NHS. Critical Care Level. Available online: https://datadictionary.nhs.uk/attributes/critical_care_level.html (accessed on 3 November 2025).
- NHS Health Research Authority. DECOVID V1 [COVID-19] (Ethics Approval). Available online: https://www.hra.nhs.uk/planning-and-improving-research/application-summaries/research-summaries/decovid-v1/ (accessed on 3 November 2025).
- Public Health England. NHS Acute (Hospital) Trust Catchment Populations. Available online: https://app.powerbi.com/view?r=eyJrIjoiODZmNGQ0YzItZDAwZi00MzFiLWE4NzAtMzVmNTUwMThmMTVlIiwidCI6ImVlNGUxNDk5LTRhMzUtNGIyZS1hZDQ3LTVmM2NmOWRlODY2NiIsImMiOjh9 (accessed on 3 November 2025).
- Oakley, C.; Pascoe, C.; Balthazor, D.; Bennett, D.; Gautam, N.; Isaac, J.; Isherwood, P.; Matthews, T.; Murphy, N.; Oelofse, T.; et al. Assembly Line ICU: What the Long Shops taught us about managing surge capacity for COVID-19. BMJ Open Qual. 2020, 9, e001117.
- Epic Systems Corporation. Software Verona, Wisconsin. Available online: https://www.epic.com/ (accessed on 3 November 2025).
- University Hospitals Birmingham NHS Foundation Trust. Birmingham Systems PICS. Available online: https://web.archive.org/web/20170806072442/https://www.uhb.nhs.uk/birmingham-systems-pics.htm (accessed on 3 November 2025).
- Observational Health Data Sciences and Informatics. OMOP Common Data Model. Available online: https://ohdsi.github.io/CommonDataModel/ (accessed on 3 November 2025).
- LOINC (Regenstrief Institute, Inc.). Logical Observation Identifiers Names and Codes. Available online: https://loinc.org/ (accessed on 3 November 2025).
- SNOMED International. Systematized Nomenclature of Medicine Clinical Terms. Available online: https://www.snomed.org/ (accessed on 3 November 2025).
- NHS Digital. Read Codes. Available online: https://digital.nhs.uk/services/terminology-and-classifications/read-codes (accessed on 3 November 2025).
- World Health Organization. International Statistical Classification of Diseases and Related Health Problems (ICD). Available online: https://www.who.int/standards/classifications/classification-of-diseases (accessed on 3 November 2025).
- DECOVID Data Paper GitHub Repository. Available online: https://github.com/alan-turing-institute/DECOVID-data-paper/ (accessed on 3 November 2025).
- Bakewell, N.; Goudie, R.J.B.; Gardiner, S.; Karoune, E.; Rockenschaub, P.; Green, B.; Nicholls, H.; Whitaker, K.J.; Aslett, L. Alan-Turing-Institute/DECOVID-Data-Paper: DECOVID Data Paper Repository; Version V1; Zenodo: London, UK, 2025.
- Rusanov, A.; Weiskopf, N.G.; Wang, S.; Weng, C. Hidden in plain sight: Bias towards sick patients when sampling patients with sufficient electronic health record data for research. BMC Med. Inform. Decis. Mak. 2014, 14, 51.
- NHS Digital. Hospital Admitted Patient Care Activity 2020-21. Available online: https://digital.nhs.uk/data-and-information/publications/statistical/hospital-admitted-patient-care-activity/2020-21 (accessed on 3 November 2025).
- Teasdale, G.; Jennett, B. Assessment of coma and impaired consciousness: A practical scale. Lancet 1974, 304, 81–84.
- Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the Assessment of Acute-Illness Severity in the NHS; Updated report of a working party; RCP: London, UK, 2017.
- RECOVERY Collaborative Group; Horby, P.; Lim, W.S.; Emberson, J.R.; Mafham, M.; Bell, J.L.; Linsell, L.; Staplin, N.; Brightling, C.; Ustianowski, A. Dexamethasone in hospitalized patients with COVID-19. N. Engl. J. Med. 2021, 384, 693–704.
- RECOVERY Collaborative Group. Tocilizumab in patients admitted to hospital with COVID-19 (RECOVERY): A randomised, controlled, open-label, platform trial. Lancet 2021, 397, 1637–1645.
- Office for National Statistics. Ethnic Group, National Identity and Religion. Available online: https://www.ons.gov.uk/methodology/classificationsandstandards/measuringequality/ethnicgroupnationalidentityandreligion (accessed on 3 November 2025).
- NHS Digital. NHS Data Model and Dictionary, Admission Source. Available online: https://www.datadictionary.nhs.uk/attributes/admission_source.html?hl=admission%2Csource (accessed on 3 November 2025).
- Poulos, J.; Zhu, L.; Shah, A.D. Data gaps in electronic health record (EHR) systems: An audit of problem list completeness during the COVID-19 pandemic. Int. J. Med. Inform. 2021, 150, 104452.
- NHS Digital. NHS Data Model and Dictionary, Destination of Discharge. Available online: https://www.datadictionary.nhs.uk/attributes/destination_of_discharge.html (accessed on 3 November 2025).
- Codd, E.F. Further normalization of the database relational model. In Data Base Systems. Courant Computer Science Symposium, 6th ed.; Rustin, E., Ed.; Prentice-Hall: Englewood Cliffs, NJ, USA, 1972; pp. 33–64.
- Observational Health Data Sciences and Informatics. Standardized Clinical Data Tables. Available online: https://www.ohdsi.org/web/wiki/doku.php?id=documentation:cdm:standardized_clinical_data_tables (accessed on 3 November 2025).
- NHS Businesses Services Authority. Dictionary of Medicines and Devices (DM+D). Available online: https://www.nhsbsa.nhs.uk/pharmacies-gp-practices-and-appliance-contractors/dictionary-medicines-and-devices-dmd (accessed on 3 November 2025).
- NHS. Consultant Episode (Hospital Provider). Available online: https://datadictionary.nhs.uk/nhs_business_definitions/consultant_episode__hospital_provider_.html (accessed on 3 November 2025).
- Cowie, M.R.; Blomster, J.I.; Curtis, L.H.; Duclaux, S.; Ford, I.; Fritz, F.; Goldman, S.; Janmohamed, S.; Kreuzer, J.; Leenay, M.; et al. Electronic health records to facilitate clinical research. Clin. Res. Cardiol. 2017, 106, 1–9.
- OHDSI. Data Quality Dashboard. Available online: https://github.com/OHDSI/DataQualityDashboard (accessed on 3 November 2025).
- OHDSI. Software Tools. Available online: https://www.ohdsi.org/software-tools/ (accessed on 3 November 2025).
- Kahn, M.G.; Callahan, T.J.; Barnard, J.; Bauck, A.E.; Brown, J.; Davidson, B.N.; Estiri, H.; Goerg, C.; Holve, E.; Johnson, S.G.; et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS 2016, 4, 1244.
- Palmer, E. d.inspectEHR. Available online: https://github.com/DocEd/d.inspectEHR (accessed on 3 November 2025).
- United Kingdom Data Service. What Is the Five Safes Framework? Available online: https://ukdataservice.ac.uk/help/secure-lab/what-is-the-five-safes-framework/ (accessed on 3 November 2025).
- NHS England Data Security and Protection Toolkit. Available online: https://www.dsptoolkit.nhs.uk/Help/3 (accessed on 3 November 2025).
- PIONEER Data Request Process. Available online: https://www.pioneerdatahub.co.uk/data/data-request-process/ (accessed on 3 November 2025).
- PIONEER Data Service & Costs. Available online: https://www.pioneerdatahub.co.uk/data/data-services-costs/ (accessed on 3 November 2025).
- PIONEER Data Request Form User Guide. Available online: https://www.pioneerdatahub.co.uk/wp-content/uploads/PIONEER-DRF-User-Guide-Final.pdf (accessed on 3 November 2025).
- The UK Caldicott Guardian Council. The Caldicott Principles. Available online: https://www.ukcgc.uk/the-caldicott-principles (accessed on 3 November 2025).
- DECOVID. Research. Available online: https://web.archive.org/web/20211206225110/https://www.decovid.org/research (accessed on 3 November 2025).
- Office for National Statistics. Deaths Involving COVID-19 by Local Area and Socioeconomic Deprivation. Available online: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/deathsinvolvingcovid19bylocalareasanddeprivation/deathsoccurringbetween1marchand31july2020 (accessed on 3 November 2025).
- Leslie, D.; Mazumder, A.; Peppin, A.; Wolters, M.K.; Hagerty, A. Does “AI” stand for augmenting inequality in the era of COVID-19 healthcare? BMJ 2021, 372, 304.
- Public Health England. Beyond the Data: Understanding the Impact of COVID-19 on BAME Groups. Available online: https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/892376/COVID_stakeholder_engagement_synthesis_beyond_the_data.pdf (accessed on 3 November 2025).
- Abrams, E.M.; Szefler, S.J. COVID-19 and the impact of social determinants of health. Lancet Respir. Med. 2020, 8, 659–661.
- Quinn, S.C.; Kumar, S. Health inequalities and infectious disease epidemics: A challenge for global health security. Biosecur. Bioterror. 2014, 12, 263–273.
- Chowkwanyun, M.; Reed, A.L., Jr. Racial health disparities and COVID-19—Caution and context. N. Engl. J. Med. 2020, 383, 201–203.
- Agniel, D.; Kohane, I.S.; Weber, G.M. Biases in electronic health record data due to processes within the healthcare system: Retrospective observational study. BMJ 2018, 361, 1479.







| Table Name | Description |
|---|---|
| person | Contains records that uniquely identify each patient in the source data. The sex at birth of patients is recorded in the gender_concept_id field. This field is mapped to standard mapping values of male, female, or other/unknown, which includes cases where the sex at birth of a patient is withheld, or not asked/missing. Note, the name of this field is the naming convention used in the person table of the OMOP CDM, which is why the name has not been revised to sex. The ethnicity of patients is recorded in the race_concept_id column, according to the 18 ethnic groups used in the 2011 England and Wales Census [38]. |
| death | Contains the clinical event for when a patient dies, including both in-hospital and out-of-hospital deaths (extracted from the NHS Spine up until 31 March 2021). Cause of death is not included, and the date of death, rather than the precise time of death, is recorded. |
| visit_occurrence | Contains records on the spans of time describing a patient’s individual episodes of care/visits. |
| visit_detail | Contains records on clinically meaningful movements of a patient within each record of the parent visit_occurrence table. Each row represents a movement between geographically separate care sites within a hospital Trust (e.g., patient transferred from one Adult Inpatient Ward to another Adult Inpatient Ward (geographically distinct from the first)) or between care sites within a hospital (e.g., patient transferred from A and E Majors to ICU within a hospital). |
| condition_occurrence | Contains records on the presence of a disease or medical condition stated as a diagnosis, and a sign or a symptom, which is either observed by a provider or reported by the patient. The concepts in this table were mapped from vocabularies, including diagnosis standards such as SNOMED-CT and ICD-9/10. |
| measurement | Contains records of measurements, i.e., structured values (numerical or categorical) obtained about a patient or a patient’s clinical/biological sample. The concepts of this table were primarily mapped from SNOMED and LOINC codes. There are 135 distinct clinical measurement types. |
| specimen | Contains records identifying clinical/biological samples from a patient. |
| drug_exposure | Contains records about the utilization of a drug when ingested or otherwise introduced into the body. The concepts used in this table were primarily mapped from SNOMED codes. |
| procedure_occurrence | In DECOVID, this contains only records on the insertion and removal of endotracheal and tracheostomy tubes. The concepts used in this table were all mapped from SNOMED codes. |
| fact_relationship | Contains records (i.e., facts) that belong to OMOP-CDM domains (e.g., Measurement) and their relationship(s) with other records from any of the OMOP-CDM data tables that may belong to the same OMOP-CDM domain or a different OMOP-CDM domain. |
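
The visit_detail description above implies that a patient's route through the hospital can be rebuilt by chaining movements within a parent visit. The Python sketch below illustrates the idea by following preceding_visit_detail_id links inside one visit_occurrence. The function name and the care_site_name key are hypothetical (the OMOP CDM stores a care_site_id that references a separate table), and the sketch assumes each visit's movements form a single unbranched chain starting from a row with no predecessor.

```python
def care_pathway(visit_details, visit_occurrence_id):
    """Order the movements of one visit by following preceding_visit_detail_id links."""
    rows = [r for r in visit_details if r["visit_occurrence_id"] == visit_occurrence_id]
    # Index each row by the movement that came before it.
    by_predecessor = {r["preceding_visit_detail_id"]: r for r in rows}

    pathway = []
    current = by_predecessor.get(None)  # the first movement has no predecessor
    # The length guard stops the loop if the links accidentally form a cycle.
    while current is not None and len(pathway) < len(rows):
        pathway.append(current["care_site_name"])
        current = by_predecessor.get(current["visit_detail_id"])
    return pathway


# Illustrative rows only, deliberately out of order; ids and site names are made up.
example_details = [
    {"visit_detail_id": 3, "visit_occurrence_id": 100,
     "preceding_visit_detail_id": 2, "care_site_name": "ICU"},
    {"visit_detail_id": 1, "visit_occurrence_id": 100,
     "preceding_visit_detail_id": None, "care_site_name": "Emergency Department"},
    {"visit_detail_id": 4, "visit_occurrence_id": 200,
     "preceding_visit_detail_id": None, "care_site_name": "Inpatient ward"},
    {"visit_detail_id": 2, "visit_occurrence_id": 100,
     "preceding_visit_detail_id": 1, "care_site_name": "Inpatient ward"},
]
pathway_100 = care_pathway(example_details, 100)
```

A pathway shorter than the number of visit_detail rows for a visit would indicate a broken or missing link, which is the situation the preceding_visit_detail_id quality check in Appendix A is meant to catch.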

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
DECOVID Consortium; Aslett, L.J.M.; Avramescu, A.; Bakewell, N.; Birds, I.; Bowler, L.; Camilleri, M.P.J.; Chung, S.-C.; Clifton, D.A.; Cohen, S.N.; et al. DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research. Data 2025, 10, 195. https://doi.org/10.3390/data10120195
DECOVID Consortium, Aslett LJM, Avramescu A, Bakewell N, Birds I, Bowler L, Camilleri MPJ, Chung S-C, Clifton DA, Cohen SN, et al. DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research. Data. 2025; 10(12):195. https://doi.org/10.3390/data10120195
Chicago/Turabian Style: DECOVID Consortium, Louis J. M. Aslett, Andreea Avramescu, Nicholas Bakewell, Isabel Birds, Louise Bowler, Michael P. J. Camilleri, Sheng-Chia Chung, David A. Clifton, Samuel N. Cohen, et al. 2025. "DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research" Data 10, no. 12: 195. https://doi.org/10.3390/data10120195
APA Style: DECOVID Consortium, Aslett, L. J. M., Avramescu, A., Bakewell, N., Birds, I., Bowler, L., Camilleri, M. P. J., Chung, S.-C., Clifton, D. A., Cohen, S. N., Constantine-Cooke, N., Daub, E. G., Davidson, S., Denaxas, S., Diaz-Ordaz, K., Feltbower, R., Gallier, S., Gardiner, S., Gasperoni, F., ... Zou, X. (2025). DECOVID: A UK Two-Center Harmonized Database of Acute Care Electronic Health Records for COVID-19 Research. Data, 10(12), 195. https://doi.org/10.3390/data10120195

