Self-Reported SARS-CoV-2 Vaccination Is Consistent with Electronic Health Record Data among the COVID-19 Community Research Partnership

Introduction: Observational studies of SARS-CoV-2 vaccine effectiveness depend on accurate ascertainment of vaccination receipt, date, and product type. Self-reported vaccine data may be more readily available to and less expensive for researchers than assessing medical records. Methods: We surveyed adult participants in the COVID-19 Community Research Partnership who had an authenticated Electronic Health Record (EHR) (N = 41,484) concerning receipt of SARS-CoV-2 vaccination using a daily survey beginning in December 2020 and a supplemental survey in September–October 2021. We compared self-reported information to that available in the EHR for the following data points: vaccine brand, date of first dose, and number of doses using rates of agreement and Bland–Altman plots for visual assessment. Self-reported data was available immediately following vaccination (in the daily survey) and at a delayed interval (in a secondary supplemental survey). Results: For the date of first vaccine dose, self-reported “immediate” recall was within ±7 days of the date reported in the “delayed” survey for 87.9% of participants. Among the 19.6% of participants with evidence of vaccination in their EHR, 95% self-reported vaccination in one of the two surveys. Self-reported dates were within ±7 days of documented EHR vaccination for 97.6% of the “immediate” surveys and 92.0% of the “delayed” surveys. Self-reported vaccine product details matched those in the EHR for over 98% of participants for both “immediate” and “delayed” surveys. Conclusions: Self-reported dates and product details for COVID-19 vaccination can be a good surrogate when medical records are unavailable in large observational studies. A secondary confirmation of dates for a subset of participants with EHR data will provide internal validity.


Introduction
Observational studies of SARS-CoV-2 vaccine effectiveness (VE) and time to seroconversion or seroreversion depend on accurate ascertainment of vaccination receipt, date, and product type. While medical records of vaccination are typically considered the gold 2 of 15 standard, self-reported vaccine data may be more readily available and less expensive within the context of an observational study.
In their guidance on how to assess COVID-19 VE using observational studies, the WHO advised against the use of self-reported COVID-19 vaccination data as the sole source indicating whether a person is vaccinated for primary analyses, due to recall bias and lack of product details [1]. Self-reported dates are subject to recall bias, but Electronic Health Record (EHR) data may contain vaccination status only on a subset of participants and may also be erroneous [2,3]. In the United States, COVID-19 vaccines have been made available in both medical and non-medical settings, including hospitals, pharmacies, public health departments, and mass vaccination sites, and the information may not be reliably recorded in a participant's EHR; thus, self-reported data may be the only source for obtaining real-time vaccination information in a disease surveillance study.
Researchers have assessed the validity of self-reported influenza vaccine status in various populations and have found self-reported influenza vaccine status to be a very sensitive and moderately specific indicator of actual vaccine status [4,5]. Other studies have found slightly lower, yet still high, sensitivity and specificity for pneumococcal vaccine status in adults [5][6][7][8]. Influenza vaccination agreement is higher when self-reporting on the current influenza season compared to the past season suggesting that there may be a difference in immediate and delayed self-reporting of vaccination [9]. Accurate self-report of other vaccinations, including hepatitis A, may be lower and the accuracy of self-report varies by race/ethnicity and age [4,10].
Previous research has focused on agreement of vaccine receipt but not on date of vaccine receipt or product [4][5][6][7][8][9][10]. To our knowledge, studies on the accuracy of selfreported COVID-19 vaccination receipt, date, number of doses, or product type are not yet publicly available.

Study Aims
The goal of this analysis was to assess the level of agreement of self-reported vaccination receipt, number of doses, date(s) received, and vaccine product from two separate surveys (immediate and delayed recall) with vaccination information from participant EHRs. We hypothesized strong agreement between the two based on previous research of self-reported vaccination information for other vaccines.

Study Sample
The COVID-19 Community Research Partnership (CCRP) is a prospective, multisite cohort syndromic COVID-19 surveillance study of a convenience sample of adults (18+ years) enrolled from April 2020 through June 2021 primarily through direct email outreach at 10 healthcare systems from the mid-Atlantic and southeastern United States (http://www.covid19communitystudy.org/, 14 March 2022) [11]. Data were collected via a secure, Health Insurance Portability and Accountability Act of 1996 (HIPAA)-compliant, online platform through March 2022. All participants provided informed consent, and Institutional Review Board (IRB) approval was provided by the Wake Forest School of Medicine IRB. Study sites included: Atrium Health (Charlotte, NC, USA), Campbell University School of Osteopathic Medicine (Lillington, NC, USA), Medstar Health (Columbia, MD, USA), New Hanover Regional Medical Center (Wilmington, NC, USA), Tulane University (New Orleans, LA, USA), University of Maryland, Baltimore (Baltimore, MD, USA), University of Mississippi Medical Center (Jackson, MS, USA), Vidant Health (Greenville, NC, USA), Wake Forest Baptist Health (Winston-Salem, NC, USA), and WakeMed Health and Hospitals (Raleigh, NC, USA). For a subset of participants, EHR data were available.

Ascertainment of Vaccination Date
At baseline, participants self-reported demographic characteristics including age, sex, race, ethnicity, educational level, healthcare worker status, and county of residence. We classified counties of residence as urban, rural, or suburban based on population density estimates. COVID-19 symptoms and personal behaviors were self-reported in daily electronic surveillance surveys. Beginning in December 2020, a question was added to the daily electronic surveillance survey asking participants "Have you received a vaccine for COVID-19, since the last survey?" Those responding in the affirmative were asked to enter the vaccination date, the dose number (dose 1 or dose 2), and the vaccine brand (Box A1). For the first month of the daily vaccine survey question, the date of receipt was not included; thus, we used the date the survey was completed as a surrogate for date of receipt. Additionally, we developed an algorithm to identify data entry errors in the daily survey (e.g., year < 2020) and again used the date the survey was entered as the vaccine date in these cases. A secondary supplemental survey (Box A2) was sent to participants in September-October 2021 to ask whether they had received a COVID-19 vaccine, the date of each dose, the dose number (allowing more than two doses), and the vaccine brand. The daily survey was used to evaluate self-reported, immediate recall while the secondary supplemental survey was used for self-reported, delayed recall. In addition, a subset of participants had a record of COVID-19 vaccination in their EHR.

Participants
Of the 66,403 adult participants enrolled in the CCRP, 46,228 (69.6%) had a data entry on or after 14 December 2020 (first date of COVID-19 vaccine availability in the U.S.). 41,484 adult participants (18+ years) enrolled in the CCRP with COVID-19 vaccination data (receipt, date, and brand) provided on one or more vaccine doses from at least one source (i.e., authenticated EHR, immediate recall survey response, or delayed recall survey response) were included in the current analysis ( Figure 1). For comparisons of sources, participants with at least two sources of vaccination data (immediate recall self-report, delayed recall self-report, or EHR data) were included in the comparison of vaccination information. Participants missing the date of dose 1 (i.e., only have an entry labeled as dose 2) on the daily survey or with implausible dates/dates with obvious typos (e.g., dates prior to 1 December 2020 for participants not in a clinical trial and dates after the date of entry) on the supplemental survey were excluded. Included in the primary analysis are 32,122 participants (Table 1) with at least two sources of vaccine information for comparison. In total, 29,919 participants are included in the comparison of the two self-reported sources (immediate and delayed recall); 6628 participants are included in the comparison of immediate recall self-report and EHR vac- Included in the primary analysis are 32,122 participants (Table 1) with at least two sources of vaccine information for comparison. In total, 29,919 participants are included in the comparison of the two self-reported sources (immediate and delayed recall); 6628 participants are included in the comparison of immediate recall self-report and EHR vaccine information; and 6241 participants are included in the comparison of delayed recall self-report and EHR vaccine information (Table 2).

Statistical Analysis
We compared vaccine receipt, number of doses, date of the first dose, and vaccine brand received from self-report (both immediate and delayed recall) to each other and to those available in the EHR. We calculated agreement of receipt, number of doses and vaccine brand (proportion of participants with matching report). We visually assessed Bland-Altman plots comparing the date of dose 1 (as number of days from 14 December 2020) from the various sources of vaccination dates, representing the mean difference and the 90% agreement interval around the mean difference. Lastly, we compared the presence of vaccine receipt in EHR compared with self-report among all participants with information about receipt and date of at least one dose of a COVID-19 vaccine from at least one source (N = 41,488). R version 4.0.13 was used for all statistical analyses. * Participants were excluded because their only dose in the daily survey ("immediate" recall) appeared to be their booster dose (e.g., it is after their delayed recall doses 1 and 2, or there is only one dose recorded in the daily/"immediate" recall survey, and the dose is reported to be received after 1 October 2021). † Delayed recall survey indicated dose 1 as the same date they took the survey likely due to an error with the date picker selecting "today's" date by default. In most cases for these participants, the dates of doses 1, 2, and 3 were all entered as "today's" date. ± Participants were excluded because their only vaccine date in EHR appeared to be their booster dose (e.g., does not match their self-reported dose 1 or 2, only one dose in EHR, and only EHR dose is after 1 October 2021).

Comparison of Two Sources of Self-Reported Vaccine Information: Immediate vs. Delayed Recall
Prior to comparing the vaccine information, using Bland-Altman plot analysis, we identified 1653 participants to exclude from further analysis ( Table 2 and Figure 2A). For most participants, the difference between delayed and immediate report of vaccine information is low, indicated by the fact that most points are included in the agreement interval around zero (green lines). Two point clouds outside the agreement interval at 100 and~200 days mean difference in days after 14 December 2020 reflect disagreement, potentially due to participants erroneously entering their booster date as their first dose, or due to entering the day of taking the survey as their date of vaccination. After excluding the flagged participants, 28,266 were available for comparison ( Table 2). The mean number of days between the initial survey (immediate recall) and supplemental survey (delayed recall) was 205.0 ± 42.6 days (mean ± SD). The number of doses self-reported immediately and self-reported delayed matched for 94.4% (N = 26,684). Self-reported immediate recall date of first dose of vaccine was within ±7 days of the delayed recall date for 87.9% of participants (Table 2 and Figure 2B). As depicted in Figure 1B, a majority of participants were within the 90% agreement interval around the mean difference (green lines), but there are groupings of participants with large differences where one dose may be mislabeled as a second or third dose on a subsequent survey. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matches the immediate recall survey for 98.2% of participants. ~200 days mean difference in days after 14 December 2020 reflect disagreement, poten-tially due to participants erroneously entering their booster date as their first dose, or due to entering the day of taking the survey as their date of vaccination. After excluding the flagged participants, 28,266 were available for comparison ( Table 2). The mean number of days between the initial survey (immediate recall) and supplemental survey (delayed recall) was 205.0 ± 42.6 days (mean ± SD). The number of doses self-reported immediately and self-reported delayed matched for 94.4% (N = 26,684). Self-reported immediate recall date of first dose of vaccine was within +/− 7 days of the delayed recall date for 87.9% of participants (Table 2 and Figure 2B). As depicted in Figure 1B, a majority of participants were within the 90% agreement interval around the mean difference (green lines), but there are groupings of participants with large differences where one dose may be mislabeled as a second or third dose on a subsequent survey. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matches the immediate recall survey for 98.2% of participants.

Comparison of Self-Reported Immediate Recall to EHR Vaccine Information
Prior to comparing the vaccine information, we flagged 567 participants (Table 2 and Figure 3A). After excluding the flagged participants, 6061 were available for comparison ( Table 2). The number of doses self-reported and recorded in EHR matched for 94.7%, and among participants who had two doses recorded in their EHR (N = 5826), 96.5% (N = 5621) participants also self-reported two doses. Self-reported date of first dose of vaccine was within +/− 7 days of the EHR date for 97.6% of participants (Table 2 and Figure 3B). As depicted in Figure 3B, after excluding the flagged participants, nearly all participants are

Comparison of Self-Reported Immediate Recall to EHR Vaccine Information
Prior to comparing the vaccine information, we flagged 567 participants (Table 2 and Figure 3A). After excluding the flagged participants, 6061 were available for comparison ( Table 2). The number of doses self-reported and recorded in EHR matched for 94.7%, and among participants who had two doses recorded in their EHR (N = 5826), 96.5% (N = 5621) participants also self-reported two doses. Self-reported date of first dose of vaccine was within ±7 days of the EHR date for 97.6% of participants (Table 2 and Figure 3B). As depicted in Figure 3B, after excluding the flagged participants, nearly all participants are within the 90% agreement interval around the mean difference (green lines) suggesting that the difference between self-reported date of the first dose and the first date listed in EHR is low. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matched the EHR product for 98.4% of participants. A majority of participants selfreported receiving the Pfizer (BNT16262) vaccine product (87%), 10% reported receiving Moderna (mRNA-1273), and then the remainder reported another brand.

Comparison of Self-Reported Delayed Recall to EHR Vaccine Information
Prior to comparing the vaccine information, we flagged 1072 participants (Table 2 and Figure 4A). After excluding the flagged participants, 5169 were available for comparison ( Table 2). The number of doses self-reported and recorded in EHR matched for 97.7% of participants and among participants who had two doses recorded in their EHR (N = 5018), 99.8% (N = 5006) of participants also self-reported two doses. Self-reported date of first dose of vaccine was within ±7 days of the EHR date for 92% of participants (Table 2 and Figure 4B). As illustrated in Figure 4B, after excluding the flagged participants, the majority of participants are within the 90% agreement interval around the mean difference (green lines) with a subset of participants in the top-center that may have mislabeled doses and participants in the bottom-center that have dates similar to the date the supplemental survey was released. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matched the EHR product for 99.1% of participants. Similar to within the 90% agreement interval around the mean difference (green lines) suggesting that the difference between self-reported date of the first dose and the first date listed in EHR is low. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matched the EHR product for 98.4% of participants. A majority of participants self-reported receiving the Pfizer (BNT16262) vaccine product (87%), 10% reported receiving Moderna (mRNA-1273), and then the remainder reported another brand.

Comparison of Self-Reported Delayed Recall to EHR Vaccine Information
Prior to comparing the vaccine information, we flagged 1072 participants (Table 2 and Figure 4A). After excluding the flagged participants, 5169 were available for comparison ( Table 2). The number of doses self-reported and recorded in EHR matched for 97.7% of participants and among participants who had two doses recorded in their EHR (N = 5018), 99.8% (N = 5006) of participants also self-reported two doses. Self-reported date of first dose of vaccine was within +/− 7 days of the EHR date for 92% of participants (Table 2 and Figure 4B). As illustrated in Figure 4B, after excluding the flagged participants, the majority of participants are within the 90% agreement interval around the mean difference (green lines) with a subset of participants in the top-center that may have mislabeled doses and participants in the bottom-center that have dates similar to the date the supplemental survey was released. Self-reported vaccine product (for the first dose recorded) from the delayed recall survey matched the EHR product for 99.1% of participants. Similar to the immediate recall, a majority of participants self-reported receiving the Pfizer vaccine product (87%), 11% reported receiving Moderna, and the remainder reported receiving another brand.

Comparison of Vaccine Receipt
As a secondary analysis, we compared the presence of vaccine receipt in the EHR with self-report among all participants with information about receipt and date of at least one dose of a COVID-19 vaccine from at least one source (N = 41,484). Among participants with at least one survey entry after 1 March 2021 when vaccines were widely available (N = 39,735), 19.6% (N = 7786) of participants had evidence of vaccination in their EHR, and 94.9% of them self-reported vaccination in one of the two surveys. Among participants with at least one survey entry after the supplemental survey was sent on 16 September (N = 32,103), 19.6% of participants (N = 6303) had evidence of vaccination in their EHR and 99% of them self-reported vaccination in the supplemental survey (Appendix Table A1).

Discussion
In the CCRP cohort, we found strong agreement between self-reported and EHR information pertaining to receipt of the SARS-CoV-2 vaccinations including number of doses, date of receipt of the first dose, and product details. In our cohort, there was high

Comparison of Vaccine Receipt
As a secondary analysis, we compared the presence of vaccine receipt in the EHR with self-report among all participants with information about receipt and date of at least one dose of a COVID-19 vaccine from at least one source (N = 41,484). Among participants with at least one survey entry after 1 March 2021 when vaccines were widely available (N = 39,735), 19.6% (N = 7786) of participants had evidence of vaccination in their EHR, and 94.9% of them self-reported vaccination in one of the two surveys. Among participants with at least one survey entry after the supplemental survey was sent on 16 September (N = 32,103), 19.6% of participants (N = 6303) had evidence of vaccination in their EHR and 99% of them self-reported vaccination in the supplemental survey (Appendix A Table A1).

Discussion
In the CCRP cohort, we found strong agreement between self-reported and EHR information pertaining to receipt of the SARS-CoV-2 vaccinations including number of doses, date of receipt of the first dose, and product details. In our cohort, there was high agreement between immediate and delayed recall self-reported vaccination information in terms of number of doses reported (94% agreement), vaccine product (98% agreement), and date of first vaccine dose (88% within ±7 days). While number of doses and vaccine product had similar agreement with EHR vaccination information for the immediate and delayed recall self-report, the immediate recall date of dose 1 was more closely aligned with the date found in EHR, with 98% of participants within ±7 days for immediate recall compared to 92% for delayed recall. This difference is likely explained by the timing between vaccination and report. On average, 205 days passed between the daily survey (immediate recall) and supplemental survey (delayed recall). While the coverage of SARS-CoV-2 vaccination data in participant electronic medical records was only available for about a fifth of participants with linked EHR, nearly all of them (95%) also self-reported vaccination. Among participants with data in two sources, comparison of the details (number of doses, dates, and product) revealed the strength of self-reported COVID-19 vaccination data.
To our knowledge, this is the first study to evaluate the accuracy of self-reported COVID-19 vaccination information. Researchers have shown that in test-negative studies of VE, self-reported influenza vaccination results in biased estimates, suggesting the importance of validating self-reported dates [12,13]. However, they also found that the greatest impact for influenza VE estimation occurred when vaccination coverage was low. About half of Americans typically receive an influenza vaccine annually [14] compared with over 80% of Americans having received at least one dose of a SARS-CoV-2 vaccine [15]. This difference in vaccine coverage (50% for influenza vs. 80% for COVID-19) taken together with our findings of high agreement of self-report and EHR vaccine information suggests that the use of self-reported SARS-CoV-2 vaccination may have less of an impact on biasing VE estimates than observed with influenza.
While EHR vaccination information may be considered the gold standard, we note that the lack of documentation of vaccination in EHR likely does not indicate that no vaccination had been obtained. In the CCRP, only a subset of our participants had documentation of SARS-CoV-2 vaccination in their medical records. Moreover, vaccination dates in EHR do not guarantee delivery of the vaccine by the healthcare system and may be data entry from a patient's vaccination card (including the possibility of forged vaccination cards) or patient recall during subsequent routine medical visits. Data entry errors are also possible in medical records.
The COVID-19 vaccines are unique from other vaccinations (e.g., influenza) in their distribution at largely mass vaccination sites and pharmacies (e.g., CVS, Walgreens) rather than from one's primary care provider or employer [16,17]. The rollout of the vaccines in December 2020 and early 2021 was a memorable milestone in the pandemic which may have made the date of receipt of first dose more unforgettable than an annual influenza vaccine. Additionally, the vaccine brands were widely discussed by the media, making them part of popular culture and easily recognizable [18].
The CCRP comparison of self-reported and EHR vaccination information had several strengths including the large number of diverse participants with self-reported and EHR data available for comparison. Many participants self-reported their vaccine data twice: once in real time as vaccines were being administered and once several months later for confirmation. Our study has a number of important limitations. The CCRP is a U.S. healthcare-system-based convenience sample of participants who are predominantly non-Hispanic White and about a quarter are healthcare workers. These results may not be consistent in populations of varying race/ethnicity, occupation, education level, health literacy, or region. Importantly, the results are relevant only to the U.S. population where the details of COVID-19 vaccines have been discussed extensively in the media and may not be generalizable to other regions with different media landscapes, and vaccine attitudes and availability. Additionally, there were significant differences between participants with and without documentation of vaccination in EHR: female participants, healthcare workers, participants in the South East, and participants residing in urban areas were more likely to have vaccine information in their EHR (Appendix A Table A2). There were also differences by enrollment site. This difference in availability of EHR data highlights the importance of accepting self-reported vaccine dates particularly where EHR vaccination data may be sparse. Thus, we did not compute sensitivity or specificity of self-reported receipt of vaccination. Further, while EHR may be convenient and inexpensive in some cases, the observed limited coverage of COVID-19 vaccination in the EHR makes its utility limited. Additionally, due to the structure of the vaccination data we received from EHR (e.g., unnumbered doses), it was difficult to identify the dose number especially when one or more doses were likely missing. The self-reported dates for CCRP were also flawed. Initially, we did not ask the date of receipt for vaccination, and later the data entry system allowed the date field to be left blank. Insufficient data checks were in place to prevent the entry of nonsensical dates (e.g., typos, multiple doses labeled dose 1, dose 2 occurring before dose 1, etc.). For the immediate recall survey, we used the date the daily survey was entered as a surrogate for the date of receipt. For the supplemental survey (delayed recall), the date picker that allowed users to enter a date either through text input, or by choosing a date from the calendar defaulted to "today's" date which led to issues with data quality. For the CCRP analyses, we took advantage of having three sources of vaccination data and compared the three to identify the dates most likely to be correct for each participant, taking into account agreement between more than one source, timing between doses, and timing of entry of immediate recall survey. Lastly, while vaccine product agreement was high, the majority of participants received the Pfizer vaccine increasing the chance for random agreement.

Conclusions
Our findings suggest that self-reported vaccination information regarding dates of administration and product details can be an effective surrogate when medical records are unavailable in large observational studies. There are limitations to using medical records for COVID-19 vaccination as they do not always note if vaccination actually occurred; thus, self-reported dates may provide more data and be the simplest option for obtaining vaccination status and details. It is important to have robust data entry rules in place to avoid errors and when possible, to collect data in real time about vaccine receipt as the self-reported data may be of higher quality than delayed recall. A secondary confirmation of vaccine data on a subset of participants using EHR will provide evidence of validity.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and the study was reviewed and approved by the Wake Forest Institutional Review Board (IRB), which served as the central IRB for this study (See 45 C.F.R. part 46; 21 C.F.R. part 56). The study is registered with ClinicalTrials.gov, NCT04342884.

Informed Consent Statement:
All participants in the COVID-19 Community Research Partnership provided consent for participation.

Data Availability Statement:
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Data Sharing: Results of the COVID-19 CRP are being disseminated on the study website (https: //www.covid19communitystudy.org/, 14 March 2022) as well as in publications and presentations in medical journals and at scientific meetings. At end of the study, the databases will be made publicly available in a de-identified manner according to CDC and applicable U.S. Federal policies.