Exploring Drugs and Vaccines Associated with Altered Risks and Severity of COVID-19: A UK Biobank Cohort Study of All ATC Level-4 Drug Categories Reveals Repositioning Opportunities

Effective therapies for COVID-19 are still lacking, and drug repositioning is a promising approach to address this problem. Here, we adopted a medical informatics approach to repositioning. We leveraged a large prospective cohort, the UK-Biobank (UKBB, N ~ 397,000), and studied associations of prior use of all level-4 ATC drug categories (N = 819, including vaccines) with COVID-19 diagnosis and severity. Effects of drugs on the risk of infection, disease severity, and mortality were investigated separately. Logistic regression was conducted, controlling for main confounders. We observed strong and highly consistent protective associations with statins. Many top-listed protective drugs were also cardiovascular medications, such as angiotensin-converting enzyme inhibitors (ACEI), angiotensin receptor blockers (ARB), calcium channel blocker (CCB), and beta-blockers. Some other drugs showing protective associations included biguanides (metformin), estrogens, thyroid hormones, proton pump inhibitors, and testosterone-5-alpha reductase inhibitors, among others. We also observed protective associations by influenza, pneumococcal, and several other vaccines. Subgroup and interaction analyses were also conducted, which revealed differences in protective effects in various subgroups. For example, protective effects of flu/pneumococcal vaccines were weaker in obese individuals, while protection by statins was stronger in cardiovascular patients. To conclude, our analysis revealed many drug repositioning candidates, for example several cardiovascular medications. Further studies are required for validation.


Introduction
Coronavirus disease 2019  has resulted in a pandemic affecting more than a hundred countries worldwide [1][2][3]. More than 220 million confirmed infections and 4.56 million fatalities have been reported worldwide as of 6 September 2021 (https: //coronavirus.jhu.edu/map.html, accessed on 6 September 2021). Besides the burden due to the disease itself, COVID-19 has created heavy burdens on the medical systems in many countries and has led to delays in the diagnosis and treatment of other types of diseases [4,5]. Therefore, it is of urgent public interest to gain deeper understanding into the disease, including identifying risk factors (RFs) for infection and severe disease, and uncovering new treatment strategies.
Although vaccines have been developed for COVID-19, its distribution is highly uneven and only a small proportion of the world's population has been fully vaccinated so far. In addition, vaccine hesitancy remains a major issue that has led to suboptimal vaccination coverage [6,7]. Inadequate knowledge and awareness of COVID-19, especially among the younger population, may also contribute to the continuous rise in the number of cases [8]. Coupled with viral variants that may be associated with increased transmission and reduced vaccine effectiveness [9], the search for drugs that may reduce susceptibility to disease and/or disease severity remains highly important.
A number of clinical risk factors (e.g., age, obesity, cardiometabolic disorders, renal diseases, presence of multiple comorbidities) [10][11][12][13][14][15] have been found to increase the risk of infection or complications. However, it is less well-known how different drugs may affect the risks of COVID-19 or its severity. Importantly, drugs with protective effects may be potentially repurposed for the prevention or treatment of the disease, as development of a new drug is often extremely lengthy and costly.
Drug repositioning by computational or statistical approaches for COVID-19 is an area of intense interest. Please refer to other reviews (e.g., [16][17][18]) for an overview of recent studies. For instance, one widely used methodology is the network-based approach, which can integrate different data sources, including omics data and drug-protein-disease interaction networks [16,[19][20][21]. Another methodology is the structure-based approach, which enables a large number of compounds to be screened for their ability to bind to known or predicted molecular targets for COVID-19 treatment [16,[22][23][24][25]. These methodologies are promising but may have their limitations. For example, they generally do not provide direct evidence for the candidates' effectiveness in real-world or clinical settings. In addition, these approaches may be limited by inadequate knowledge of the pathophysiology and molecular basis of COVID-19. Another limitation is that most drug repositioning studies did not consider patient characteristics; for example, a drug may be more effective within a certain age group or in those with a certain comorbidity. In addition, the effect size (e.g., relative risk reduction) of individual drugs and the level of statistical significance usually cannot be easily estimated by network/structure-based approaches.
Here, we employed a different methodology not previously applied to drug repositioning studies for COVID-19. We adopted a medical informatics approach which involves screening a large number of drugs for their associations with the disease, leveraging a largescale population cohort. In brief, we performed a comprehensive study on all Anatomical Therapeutic Chemical Classification System (ATC) level-4 drug categories (N = 819) and assessed their associations with susceptibility to, and severity of, COVID-19 in the UK Biobank (UKBB), controlling for possible confounders. Vaccines were also included for analysis. To our knowledge, this is the most comprehensive analysis to date to screen for drug associations and repositioning candidates for COVID-19, leveraging real-world population data.
While pharmacoepidemiology studies are typically focused on one or a few drugs, COVID-19 is a new disease, and we still have limited understanding of its pathophysiology and treatment. As a result, a hypothesis-driven approach may have important limitations of missing potential drug associations and new repositioning candidates. In the field of genetic epidemiology, it has been observed that hypothesis-driven candidate gene studies are not as reliable as genome-wide association studies (GWAS) [26] which are relatively unbiased, indicating merits of the latter approach. In the same vein, here we adopted a "drug-wide" association study approach, which provides a systematic and unbiased assessment of drug associations and repositioning candidates. This approach has also been advocated before [27].
In the present study, we performed rigorous analyses on the impact of medications/vaccinations on the risk of infection, disease severity, and mortality. Analyses were also conducted within infected patients, tested subjects, and the whole population respectively, and for five different time windows of prescriptions. We also performed further subgroup and interaction analyses to reveal differential effects of the drugs in people with different clinical background. This may enable more "personalized" drug repositioning, i.e., prioritizing drug candidates for specific patient subgroups.

UK Biobank Data
The UK Biobank is a large-scale prospective cohort comprising over 500,000 subjects aged 40-69 years who were recruited in 2006-2010 [28]. In this study, subjects with recorded mortality before 31 January 2020 (N = 28,930) were excluded, as it was the date for the first recorded case in UK. This study was conducted under project 28732.

COVID-19 Phenotypes
COVID-19 outcome data were downloaded from UKBB data portal. Information regarding COVID-19 data in the UKBB can be viewed at http://biobank.ndph.ox.ac.uk/ showcase/exinfo.cgi?src=COVID19 (accessed on 3 November 2020). Briefly, the latest COVID test results were downloaded on 6 November 2020 (last update 3 November 2020). We consider inpatient (hospitalization) status at testing as a proxy for severity. Data on date and cause of mortality were also extracted (latest update on 21 October 2020). Cases indicated by U07.1 were considered to be (laboratory-confirmed) COVID-19-related fatalities.
A case was considered as having "severe COVID-19" if the subject was hospitalized and/or if the cause of mortality was U07.1. We required both test result and origin to be 1 (positive test and inpatient origin) to be considered as a hospitalized case. For a small number of subjects with initial outpatient origin and positive test result, but changed to inpatient origin and negative result within 2 weeks, we still considered these subjects inpatient cases (i.e., assume the hospitalization was related to the infection).
For a minority of subjects (N = 19) whose mortality cause was U07.1 but test results were negative within one week, to be conservative, they were excluded from subsequent analyses.

Medication Data
Medication data was obtained from the primary care data for COVID-19 research in UKBB (details available at https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/ gp4covid19.pdf, accessed on 9 November 2020). We made use of the latest release of General Practice (GP) records released by UKBB, which contains prescription data from two electronic health record (EHR) systems (TPP or EMIS) for~397,000 UKBB participants. The drug code and issue date of each drug are available. Please also refer to Figure 1 for an overview of our analysis workflow.

Figure 1.
An overview of the analytic workflow. We considered five exposure time windows and multiple statistical models. We conducted analyses within infected patients, tested subjects, and the whole population, respectively. Effects of prescribed medications/vaccinations on the risk of infection, severity of disease (hospitalization as proxy) and mortality were investigated separately. Missing data were accounted for by multiple imputation. Inverse probability weighting (IPW) of the probability of being tested (Prob(tested)) was employed to reduce testing bias. Multivariable logistic regression was conducted, controlling for main confounders. We primarily focused on drugs with protective effects, as residual confounding tends to bias towards harmful effects. In addition, we performed further subgroup and interaction analysis to identify factors that may modify the drug effects.

Time Window of Prescriptions
Since the GP records cover many years of prescriptions, we set time windows to restrict prescriptions with a certain time period as the "exposure". The "index date" was defined as (1) the date of the first positive COVID-19 test for infected subjects (for U07.1 cases, the mortality date was regarded as the index date if no test record was found); or (2) the date of last test for those who were tested negative; or (3) 3 November 2020 (the date of the latest update of COVID-19 test results) for those who were untested.
The issue date of each prescription was available, but the duration was not. Time windows were determined by whether the drug was issued within a specified period before the index date. The following windows were considered for medications: 6 months, 1 year, 2 years, and 5 years. Narrower time windows (<6 months) may not be desirable and may lead to many prescriptions being missed, as the latest issue date was 25 July 2020, but the latest index date was 3 November 2020.
As for vaccines, unlike many medications, vaccines are not prescribed regularly, and most vaccines only need to be given once or less than a few times; hence, a narrow time window is not optimal due to sparsity of data. For seasonal vaccines, namely flu vaccines, they are usually given in autumn (September to November) or early winter in the UK. A time window of 6 months will lead to missing most of the flu vaccines given. On the other hand, it is also reasonable to consider a longer time window (e.g., 10 years) as vaccine effects can be more long-lasting [29]. In view of the above, we considered time windows of 1, 2, 5, and 10 years for vaccinations. For flu vaccines, we defined "past 1 year" as prescriptions from 1 September 2019 onwards (and similarly for past k years) to account for the seasonal nature of vaccination. An overview of the analytic workflow. We considered five exposure time windows and multiple statistical models. We conducted analyses within infected patients, tested subjects, and the whole population, respectively. Effects of prescribed medications/vaccinations on the risk of infection, severity of disease (hospitalization as proxy) and mortality were investigated separately. Missing data were accounted for by multiple imputation. Inverse probability weighting (IPW) of the probability of being tested (Prob(tested)) was employed to reduce testing bias. Multivariable logistic regression was conducted, controlling for main confounders. We primarily focused on drugs with protective effects, as residual confounding tends to bias towards harmful effects. In addition, we performed further subgroup and interaction analysis to identify factors that may modify the drug effects.

Time Window of Prescriptions
Since the GP records cover many years of prescriptions, we set time windows to restrict prescriptions with a certain time period as the "exposure". The "index date" was defined as (1) the date of the first positive COVID-19 test for infected subjects (for U07.1 cases, the mortality date was regarded as the index date if no test record was found); or (2) the date of last test for those who were tested negative; or (3) 3 November 2020 (the date of the latest update of COVID-19 test results) for those who were untested.
The issue date of each prescription was available, but the duration was not. Time windows were determined by whether the drug was issued within a specified period before the index date. The following windows were considered for medications: 6 months, 1 year, 2 years, and 5 years. Narrower time windows (<6 months) may not be desirable and may lead to many prescriptions being missed, as the latest issue date was 25 July 2020, but the latest index date was 3 November 2020.
As for vaccines, unlike many medications, vaccines are not prescribed regularly, and most vaccines only need to be given once or less than a few times; hence, a narrow time window is not optimal due to sparsity of data. For seasonal vaccines, namely flu vaccines, they are usually given in autumn (September to November) or early winter in the UK. A time window of 6 months will lead to missing most of the flu vaccines given. On the other hand, it is also reasonable to consider a longer time window (e.g., 10 years) as vaccine effects can be more long-lasting [29]. In view of the above, we considered time windows of 1, 2, 5, and 10 years for vaccinations. For flu vaccines, we defined "past 1 year" as prescriptions from 1 September 2019 onwards (and similarly for past k years) to account for the seasonal nature of vaccination.

Mapping to ATC
All the medications were mapped to the ATC Classification (https://www.genome. jp/kegg-bin/get_htext?br08303, accessed on 9 November 2020). Drug categories were defined by the fourth level of ATC classification.

Covariate Data
We performed multivariable regression analysis with adjustment for potential confounders including basic demographic variables (age, sex, ethnic group), comorbidities (coronary artery disease (CAD), diabetes (DM), hypertension, asthma, chronic obstructive pulmonary disease (COPD), depression, dementia, history of cancer, blood urea and creatinine reflecting renal function), indicators of general health (number of medications taken, number of non-cancer illnesses), anthropometric measures (body mass index (BMI)), socioeconomic status (Townsend deprivation index) and lifestyle risk factor (smoking status). For disease traits, we included information from ICD-10 diagnoses (code 41270) and self-reported illnesses (code 20002), and incorporated data from all waves of follow-ups. Subjects with no records of the relevant disease from either self-report or ICD-10 were regarded as having no history of the disease.

Sets of Analysis
We performed a total of eight sets of analysis (Table 1). The impact of prescribed medication/vaccination on the risk of infection (Models E and F), severity of infection (Models A, C, and G) and risk of mortality (Models B, D, and H) from COVID-19 were investigated separately. Both hospitalized and fatal cases were grouped under the "severe" category. We also considered different study designs and conducted our analyses with different comparison samples. Models A and B are restricted to the infected subjects, while models C, D, and E involve comparison of severe, fatal and general infected cases to the general population (with no known diagnosis of COVID-19). On the other hand, models F, G, and H compared infected, severe, and fatal cases, respectively, against subjects who were tested negative for SARS-CoV-2.
There were 397,000 subjects in the UKBB with available GP prescription records. Among them, 30,835 subjects have received at least one COVID-19 test, and 3858 had been tested positive. There were 1318 cases classified as "severe" (hospitalized or mortality from COVID-19) and 170 fatal cases. In total 393,142 UKBB participants did not have a known diagnosis of COVID-19. The detailed count of participants for each model is listed in Table 2. Only subjects with available GP prescription records are shown.

Statistical Analysis Methods
Logistic regression (using the R package speedglm) was used to examine the impact of medication on different outcomes in the eight sets of analysis. For more stable estimates, analysis was not performed if the number of subjects taking the drug in the affected or unaffected group was less than five. All statistical analyses were conducted using R. The false discovery rate (FDR) approach by Benjamini and Hochberg [30] was performed to control for multiple testing. This approach controls the expected proportion of false positives among the rejected null hypotheses.

Imputation of Missing Data
Missing values of remaining features were imputed with the R package "missRanger". The program is based on missForest, which is an iterative imputation approach based on random forest (RF). It has been widely used and shown to produce low imputation errors and good performance in predictive models [31]. The program missRanger is largely based on the algorithm of missForest, but uses the R package "ranger" [32] to build RF for improvement in speed (we found that other packages, such as MICE and missForest, are computationally too slow to produce results for the large-scale analyses here). Predictive mean matching (pmm) was employed to avoid imputation of values not present in the original data, and to increase variance to more realistic levels for multiple imputation (MI). We followed the default settings with pmm.k = 5 and num.trees = 100. We performed the analyses on multiply imputed datasets (imputed for 10 times) and combined the results by Rubin's rules [33] using the function "mi.meld" under the R package "amelia". Another advantage of missRanger is that out-of-bag errors (in terms of classification errors or normalized root-mean-squared error) could be computed, which provides an estimate of imputation accuracy.

Inverse Probability Weighting of the Probability of Being Tested
Bias due to non-random testing has been discussed previously in other works [34,35]. As a person has to be tested to be diagnosed with COVID-19, factors leading to increased probability of being tested will also lead to an apparent increase in the risk of infection [35]. In addition, it has been raised that collider bias can occur when conditioned on the tested group. This could result in spurious associations, for example, between a risk factor and COVID-19 severity if both increases the probability of being tested (Pr(tested)). One way to reduce this kind of bias is to employ inverse probability weighting (IPW) of Pr(tested). Essentially, we wish to create a pseudo-population, or mimic a scenario under which testing is random instead of selected for certain subgroups. The IPW approach up-weighs those who are less likely to be tested and down-weighs those who have a high chance of being tested. This may create more unbiased estimates of the effects of drugs.
We took reference to the approach described in [34] to analyze the data with IPW. Following our recent work [36] which aims to predict COVID-19 severity with machine learning (ML), here we also employed an ML model (XGboost) to predict Pr(tested) based on a range of factors. An advantage of using ML models is that nonlinear and complex interactions can be considered, which may improve predictive performance over logistic models. We employed the same set of predictors as in our previous work [36], and followed the same analysis strategy of hyper-parameter tuning and cross-validation to obtain predicted probabilities (please refer to [36] for details). Beta-calibration [37] was performed, and the resulting average AUC was 0.622. The predicted probabilities (i.e., Pr(tested)) were used to construct weights for IPW. Stabilized weights [38] were used.

Subgroup Analysis
For selected drugs showing tentative protective effects, we also performed further subgroup and interaction analyses. These drugs included cardiovascular medications listed in Table 3, four vaccines with protective associations (influenza, pneumococcal, typhoid, and combined bacterial/viral vaccines), and other top drugs with consistent protective associations across multiple models/time windows as listed in Table 4.  Subgroup analysis was performed with respect to main demographic features (age, sex, and ethnicity) and main comorbidities (same as the diseases listed under "covariate data"). We also compared log(OR) estimates across the subgroups with or without the risk factor of interest. The test statistic was obtained by z = (β 1 − β 2 )/ var(β 1 ) + var(β 2 ), where β 1 and β 2 refer to the coefficients under the two independent subgroups.

Interaction Analysis
As a complementary approach, we also performed analysis with a logistic model including an interaction term (drug*risk_factor). The same set of drugs and risk factors were studied. The two approaches are similar in principle; however, stratified analysis yields more unbiased estimates if confounders have subgroup-dependent associations, while the interaction term approach produces more precise (lower-SE) estimates (hence higher power to detect interactions) [39].

Controlling for Other Drugs
We also performed additional regression analyses controlling for other top-ranked drugs. Two sets of analyses were conducted. In the first set of analysis, we controlled for the top 10 or 20 protective and harmful drugs in each time window and model. As for the second analysis, for drugs with protective associations, we controlled all other protective drugs with FDR < 0.05 or 0.1 (this analysis was performed for protective drugs only, as there were too many drugs associated with harmful effects to be included as covariates).

Results
Due to the large number of models and drugs being studied, we highlight the main results and findings from different sensitivity analysis.
Confounding by indication and other comorbidities is unavoidable, and, in particular, drugs showing harmful effects may possibly be explained by such confounding. On the other hand, as it is expected that most diseases tend to increase the risk/severity of infection, drugs showing protective effects are much less likely to be affected by confounding, and such associations may be relatively more reliable. We therefore place a greater emphasis on protective drugs in the sections below; this is also in line with our primary objective to prioritize repositioning candidates. Drugs with harmful effects are briefly discussed for comprehensiveness.
A summary of the demographic and covariate data of the original UKBB dataset is shown in Table S1. The missing rates and out-of-bag (OOB) errors for different variables from multiple imputations are shown in Table S2.

Primary Analysis with Multiple Imputation of Covariates
Full results of all drug categories across all time windows (including 6, 12, 24, 60, and 120 months; the last time window only for vaccines) are shown in Tables S6-S10. All protective associations (with at least nominal significance, i.e., p < 0.05) are shown in Table S3, while all association results with vaccines are presented in Table S4. For drugs associated with increased odds of infection/severity, we also summarize the top 10 drugs (ranked by p-value) from each model and time window, and organize them together in Table S5.

Overview
Across all categories, statins showed the strongest and most consistent protective associations. Highly significant protective effects were seen across infected subjects, tested subjects, or the whole population, especially in reducing the severity or mortality of infection. Albeit with smaller effect sizes, we also observed that statins might be linked to lower susceptibility to infection (model E). Interestingly, a number of top-listed drugs are also cardiovascular medications, such as angiotensin-converting enzyme inhibitors (ACEI), angiotensin receptor blockers (ARB), calcium channel blocker (CCB), and beta-blockers.
For simplicity, odds ratios (OR) are presented for a time horizon of 1 year if not further specified.

Drugs for Cardiometabolic Disorders
Significant protective associations with FDR < 0.05 are shown in Table 3. Statins showed protective effects across models A, C, D, E, and G. Significant protective effects against severe infection were seen among infected subjects (OR for prescriptions within a 12-month window, same below: 0.50, 95% CI: 0.42-0.60), tested subjects (OR = 0.63, 0.54-0.73), or when comparing severe cases to the general population (OR = 0.49, 0.42-0.57). In addition, protective association against fatal infection was observed (OR = 0.51, CI 0.34-0.74). Statins was also associated with lower susceptibility to infection, with ORs of 0.83 (CI: 0.77-0.91) and 0.86 (CI: 0.79-0.93) for prescriptions within 1 year and 2 years, respectively.
Another group of drugs with highly consistent protective associations were ACEI and ARB. ACEI showed protective associations against severe disease among infected subjects

Vaccines
Significant associations for vaccines with FDR < 0.05 are shown in Table 5. One of the most consistent associations was observed for influenza vaccines. Protective associations were observed across almost all models (B to H), and across all time windows. Flu vaccination was associated with lower odds of infection when compared to population controls  In view of the significant findings, we repeated the analyses on flu vaccines with other ways to define the exposure (Table S14)

Other Drugs Showing Protective Associations
Significant results for other drugs having protective effects and FDR < 0.05 are shown in Table 6. As for other drugs, proton pump inhibitors (PPI) were associated with lower odds of infection when we compared test-positive against test-negative patients (model Prior use of thyroid hormones was consistently associated with lower risk of infection and severity, no matter whether the general population or test-negative individuals were considered as controls. The ORs were similar across all time windows. For model E (infected vs. population), the OR for 1-year time window was 0.80 (CI 0.71 to 0.92), which was close to the effect size under model F (infected vs. test-negative). For model C (hospitalized/fatal cases vs. population), the OR for 1-year time window was 0.62 (CI 0.48 to 0.79), and it was similar when constrained to tested subjects.

Drugs Ranked by Consistency of Protective Associations
We also ranked the drugs in term of their consistency of protective associations. Briefly, drugs were ranked by their frequency of being at least nominally significant (p < 0.05) across the four time windows and eight models (Table 4). This serves as an alternative approach to prioritize drugs. For some drugs, the results may not be significant after FDR correction. Nevertheless, if a drug showed consistent associations (at least nominally) across multiple models or time-frames, it may also be worthy of further investigation.

Drug Associated with Increased Odds of Risk/Severity of Infection
Among the drugs with harmful associations, the more frequently top-listed ones include laxatives, opioids (N02AA), benzodiazepines, tetracycline, penicillins, other antipsychotics (N05AX), and antidementia drugs (N06DA/DX). The full results are presented in Tables S6-S10, and a summary is also provided in Table S5.

Analysis Restricted to Subjects with Complete Covariate Data, and Models with/without IPW
As a sensitivity analysis, for the above analysis with imputed covariates, we also repeated models A to H without IPW of Pr(tested). In addition, we also repeated the analyses, limiting to subjects with complete covariate data, with or without the IPW approach. In general, we observed similar drugs with significant results, and the topranked protective or harmful drugs were similar to the above. Comparing results with and without IPW, the list of significant drugs remained similar although the OR estimates and SE were adjusted. The full results are presented in Tables S7 and S8 (complete covariate data with and without IPW) and Table S9 (imputed covariates without IPW).

Subgroup Analysis
The proportion of subjects falling into each subgroup is presented in Table S10, while full results are presented in Table S11. We performed a statistical test to compare the log(OR) across the two subgroups with and without the risk factor; drugs with protective effect in one subgroup but significantly different OR in the other subgroup are listed in Table 7. For example, the protective effects of pneumococcal and flu vaccines were significantly weaker in obese (BMI > 30) subjects under model F. With regards to age, several drugs, such as PPI and ACEI, showed larger protective effects in those with age > 70 under models F and E, respectively. Statins, ACEIs, and PPI showed stronger protective associations in hypertensive patients under models C, E, and F, respectively. Regarding ethnicity as a subgroup, a number of drugs, including several vaccines, appeared to have stronger protective effects in the white compared to non-white subjects. However, only <10% of the UKBB subjects included here were non-white, and the non-white subgroup was heterogeneous and composed of several different ethnicities. We did not observe clear evidence of sex-specific effects in this analysis.   Table S11 for details. CAD, coronary artery disease, HT, hypertension.

Interaction Analysis
A summary of results (results with FDR < 0.2) is presented in Table 8, while a fuller version is given in Table S12. Full results are given in Table S13. More significant results (at FDR < 0.2) are observed compared to stratified analysis, presumably due to the higher power of this approach. For example, we found that most vaccines showing protective effects, including influenza and pneumococcal vaccines, interacted with BMI and obesity significantly. Higher BMI was associated with reduced protective effects, in line with evidence from subgroup analysis.
On the other hand, statins, biguanides (metformin), and antiplatelet drugs showed positive interactions with BMI. For CAD, significant interaction was observed with several cardiometabolic drugs, including beta-blockers (nonselective), antiplatelet drugs, and statins, suggesting larger protective effects for such drugs in CAD patients. In a similar vein, most cardiometabolic medications showed interaction with HT, indicating more prominent protective associations in HT patients.
Considering age as an interacting variable, interaction was observed with a large number of drugs, most suggesting weaker protection as age increases. Considering specific medications, statins interact with multiple risk factors and demonstrate larger protective effects with CAD, obesity, DM, CAD, HT, dementia, and in males. However, its effect tends to be weaker with increasing age. Interaction analysis with flu vaccines showed that its effect may be weaker in the obese and with increasing age, but was stronger in the white population and asthmatic subgroup. ACEI and ARB showed stronger protective effects in the white and HT patients, but weaker effects with advanced age.

Controlling for Other Medications
We primarily focused on protective drugs, as the number of drugs with significant negative effects is large and is hard to control for all. Overall, most drugs with protective effects remain significant (at least for a subset of models), despite controlling for other medications (Table S15). However, biguanides (A10BA), CCB (C08CA), and platelet aggregation inhibitors, excluding heparin (B01AC), showed a relatively consistent trend of nonsignificant association with outcome when other protective drugs were controlled for. The findings are similar when controlling for top-10/20 drugs or all protective drugs having FDR < 0.05/0.1.  We added an interaction term drug*interacting factor in the regression model. For "interaction term", 1 denotes significant interaction effects towards protection (i.e., presence of the interacting factor tends to increase the protective effect of the drug); −1 denotes significant interaction effects towards harmful side (presence of the interacting factor tends to reduce the protective effect of the drug). We consider significant results in any model or time window. For age and BMI, they were modeled as continuous variables unless otherwise specified. For full results, please refer to Tables S12 and S13.

Discussion
In this work, we performed a thorough and rigorous analysis on the effect of drugs and vaccines on COVID-19 susceptibility and severity. We uncovered a number of drugs with potentially protective effects, which may be further explored as candidates for drug repositioning.
As an approach based on observational data, different kinds of bias, such as confounding and selection bias, may affect the results. We performed analysis on infected subjects (models A and B), the whole population (models C, D, E) and the tested population (models F, G, H) to obtain a more comprehensive picture of drug effects under different settings, and to avoid limitations (e.g., selection bias, collider bias, unscreened controls) of some designs.

Highlights of Relevant Drugs
Below, we highlight drugs that are tentatively associated with altered risk or severity of infection. We preferentially consider drugs that showed significant associations across multiple models and time windows, those with stronger statistical significance, and those with protective effects, as confounding by indication is much less likely.

Drugs for Cardiometabolic Disorders with Protective Effects
Interestingly, many drugs with potential protective effects are indicated for cardiometabolic (CM) disorders. Cardiometabolic risk factors, such as obesity, hypertension, DM, and CAD, have consistently been shown to be associated with risk and severity of infection [15,40]; as such, it is biologically plausible that drugs for treating CM disorders may be beneficial.
Among all drugs, the strongest and most consistent protective association was observed for statins. The beneficial effects of statins are supported by several previous studies. For example, a recent meta-analysis of four retrospective studies of COVID-19 patients [41] showed a significantly decreased hazard of severity or mortality of infection (pooled HR = 0.70) when comparing statin users against nonusers. Another retrospective study by Tan et al. [42] also reported lower risk of intensive care unit (ICU) admission among statin users in infected patients. Yet another work showed that statins may be effective in reducing in-hospital mortality among diabetic patients [43]. Potential mechanisms for the protective actions of statins have been discussed elsewhere [44][45][46]. It has been postulated that, besides reducing CVD risks, statins may reduce risk/severity of infection by inhibiting inflammation and excessive immune response, producing direct antiviral effects, improving endothelial function, and exerting an antithrombotic effect, among other actions [44][45][46].
Another group of drugs worth highlighting is ACEI and ARB. There have been intense discussions on whether ACEI/ARB may affect risk or severity of infection from early on, as ACE2 is a receptor for SARS-CoV-2. Nevertheless, a recent study showed that ACE2 is localized in respiratory cilia, and the use of ARB/ACEI does not change its expression [47]. Recent systemic reviews and meta-analysis (for example, see [48] with continuous updates) of observational studies do not support an association between ACEI/ARB prior use and severity of infection. However, several studies [47,[49][50][51][52][53][54][55] reported protective effects of ACEI/ARB on severity or mortality of disease. Here, we observed consistent association of prior use of ACEI/ARB with reduced risks of severe/fatal infection (models A, C, G) and overall infection risk in the population (model E).
For several other kinds of cardiometabolic drugs, the associations were not as strong, but may still be worthy of further studies. Biguanides (mainly metformin) are observed to be protective for severe COVID-19 infection, both among the infected and at a population level. For example, in a meta-analysis on four observational studies of hospitalized patients mostly with type 2 DM, the use of metformin was associated with a lower risk of mortality (OR = 0.75, 95% CI = 0.67-0.85) [56]. A number of mechanisms have been proposed [56,57]. For example, besides improving glycemic control and weight reduction, metformin may lead to AMPK activation which potentially reduces viral entry by phosphorylation of ACE2 receptor. It may also lead to mTOR pathway inhibition and prevents hyperactivation of the immune system [56].
Other drugs of interest may include beta-blockers and calcium channel blockers (C08CA, dihydropyridine derivatives). It was suggested that beta-blockers may be useful in preventing hyperinflammation and hence beneficial for COVID-19 [58]. For calcium channel blockers (CCBs), a study using cell culture suggested that CCBs, especially amlodipine and nifedipine, were useful in blocking viral entry and infection in epithelial lung cells [59]. In another retrospective study [60], both beta-blockers and CCBs were associated with lower mortality. Another relevant study in the UK [61] utilized data from the UK Clinical Practice Research Datalink (CPRD) and found that ACEI/ARB, CCBs, and thiazide diuretics were all associated with lower odds of diagnosis, while beta-blockers do not show any association after adjusting for consultation frequency. None of the above drugs were associated with mortality in that study [61].

Vaccines
There has been intense interest in whether vaccines indicated for other diseases may protect against COVID-19. Here, we observed that a number of vaccines showed protection against infection or severe infection. For example, pneumococcal vaccines were protective against infection in the population and tested subjects, and risk of severe infection (model G). Significant protective associations were also observed for tetanus and typhoid vaccines at a time horizon of 10 years (the power to detect associations is likely stronger over longer periods due to larger number of people having received the vaccine; it does not exclude the possibility that the vaccines may have effects over shorter time windows). We also observed associations with the J07CA category, which contains various bacterial and viral vaccines (see https://www.whocc.no/atc_ddd_index/?code=J07CA, accessed on 9 November 2020).
For influenza vaccines, we observed highly consistent protective associations. It has been proposed that "trained innate immunity", which may involve epigenetic reprogramming of innate immune cells, may enable a vaccine to protect against other diseases [62,63]. Interestingly, two studies in Italy reported that higher coverage rate of flu vaccine was associated with lower rate of infection, hospitalization, and mortality from COVID-19. Another larger-scale study, based on electronic records of 137,037 subjects who have received viral PCR tests, showed that a number of vaccines (given in the past 1, 2, or 5 years) were associated with lower risks of infection [64]. These included flu and pneumococcal vaccines also implicated in the present study. Another recent study in the Netherlands [65] also showed a reduced risk (Relative risk = 0.61, 95% CI: 0.46-0.82) of infection among recipients of flu vaccine, and this effect size was similar to that observed here. In vitro studies by the same authors showed that the vaccine was able to induce a trained immunity response, including an increase of cytokine responses after stimulation of immune cells with SARS-CoV-2.
We note that this is an observational study, and residual confounding may be present. For example, it is possible that people receiving flu vaccines are more health-conscious and observe preventive measures better. However, we observed waning protective effects over time, which makes sense biologically but could not be entirely explained by the above confounder alone. In addition, the vaccine appears to have stronger effect sizes if fatal infection is considered as the outcome (although the confidence interval is large), which cannot be easily explained by health-consciousness. On the other hand, as flu vaccines are more likely to be received by the elderly and those with chronic illnesses, residual confounding of these factors tend to push the effects towards the harmful side.
Taken together, we believe that the protective effects of vaccines may not be easily and fully explained away by other confounders. Further experimental and clinical studies are warranted to investigate the nonspecific effects of flu and other vaccines, especially since COVID-19 vaccines may not be easily available to many people (especially those in low-income countries) in a short timeframe.

Other Potential Protective Drugs
We briefly highlight a few other drugs with potential protective effects. Estrogens (G03CA) were among the drugs showing protective associations. As many studies reported higher risks of severe disease in men than in women, it has been hypothesized that estrogen may play a part in the sex-discordant outcomes, for example via its effects on immune response to infections [66][67][68].
Thyroid hormones (TH) were also among the top-ranked drugs. It was postulated that TH may ameliorate tissue injury due to hypoxia by suppression of p38 MAPK [69]. Clinical trials on TH are ongoing [69,70], and our findings support a protective role of TH in COVID-19.
Another drug category of note is proton pump inhibitors (PPI). Several studies have suggested harmful effects of PPI on disease severity, which may be related to reduced gastric acid production with subsequent bacterial overgrowth [71][72][73]. However, an in vitro screening study revealed that PPIs may serve as a potent inhibitor of SARS-CoV-2 replication [74]. The difference in findings between the current study and previous works may be due to heterogeneity in study samples and designs, differences in the outcome studied (e.g., hospitalization vs. ICU admission used in some other studies; infection risk vs. severity of disease, etc.), and variations in the covariates being adjusted for. Residual confounding, such as by other comorbidities and drugs given, may also affect the results. Interestingly, we observed that effects of PPI may be stronger in certain subgroups (e.g., older age, HT), which may also account for the discrepancy in results across different studies.
Several other top-ranked drug categories in Table 4 may also be worth discussing. Testosterone-5-alpha reductase inhibitors (5ARis) were recently shown in a small randomized controlled trial (RCT) to reduce the time to remission [75]. Two earlier observational studies also reported lower risk of ICU admission and frequency of symptoms [76,77]; 5ARis block the conversion of testosterone to its more potent form, dihydrotestosterone. Of note, one of the key receptors for the SAR-CoV-2 virus is TMPRSS2 [78], and the only known promoter of the gene is an androgen response element in the promoter region [79].
Another drug category of interest is platelet aggregation inhibitors (B01AC). It has been reported that COVID-19 is associated with higher risk of thrombotic events, including deep vein thrombosis and pulmonary embolism [80]. Antithrombotic therapies have been hypothesized to reduce thrombo-inflammatory processes as a result of endothelial dysfunction related to viral infection [81]. An observational study reported that aspirin is associated with reduced risk of mechanical ventilation and mortality in hospitalized patients [82]; however, RCTs are lacking.
For some of the protective drugs highlighted above, we note that their significance weakened (or became nonsignificant) when controlling for other medications. However, we expect multicollinearity among the drug variables, as cardiometabolic disorders are highly comorbid and one patient often takes multiple medications. Multicollinearity may render interpretation of individual predictors difficult due to unstable coefficient estimates [83].
In our secondary analyses, we also considered subgroup and interaction effects. While this is a more exploratory analysis and further replications are required, it shed light on how the effects of drugs/vaccines may differ in people with different clinical background and may contribute to more "personalized" drug repositioning in the future. For instance, we observed a consistent trend that the protective associations of flu and pneumococcal vaccines were weaker in obese individuals. As an example, comparing those who received flu vaccine in the past season (2019-2020) against those who did not, the estimated OR for infection was 0.76 in the obese group and 0.54 in the non-obese group (model F). It has been observed before that obese individuals respond less well to flu and other vaccines due to impaired immunological responses [84,85]. As another example, statins were observed to have more prominent protective effects in those with cardiometabolic abnormalities, such as DM, HT, CAD, and obesity. This is also supported by a recent study [43] which showed mortality reduction in statin users in diabetic patients only.

Drugs with Potentially Harmful Effects
We noted a number of drugs with potentially harmful effects, but we caution that residual confounding, such as confounding by indication, other comorbidities, and general poor health, may lead to bias towards an increased odds of infection or severe disease.
For example, people who have poorer health in general may visit their GPs more often and be prescribed drugs (e.g., laxatives, antibiotics, painkillers), which may lead to confounding. Nevertheless, it is possible that some of the top-ranked drugs may indeed increase the risk/severity of infection. For instance, it is slightly unexpected that laxatives were highly significant across multiple models and time windows. It has recently been postulated that dysregulation of gut microbiome may be associated with susceptibility or resilience to infection [86,87], and laxatives represent a main category of drugs that affect the gut microbiome [88]. Interestingly, several associations involve psychiatric medications such as benzodiazepines, antipsychotics, and antidementia drugs. The association may be due to underlying neuropsychiatric conditions (e.g., anxiety, psychosis, dementia, etc.), or the effect of the drugs, or a combination of both. Some of the above drugs overlap with those revealed in a recent study using primary care data in Scotland. In a univariate analysis restricted to nonresidents in care homes and those without major conditions, laxatives, anxiolytics, penicillins, and opioid analgesics were significantly associated with ICU admission or mortality from COVID-19 when compared to population controls [89]. These drugs were also top-listed as drugs with harmful effects in this study.
Patients taking immunosuppressants are more susceptible to viral infections in general, and it is possible that these drugs are also associated with increased vulnerability to COVID-19 infection [90]. On the other hand, such drugs may dampen excessive immune responses ("cytokine storm") that may occur in severe infections [91]. However, here we did not find consistent evidence of associations between immunosuppressive agents and COVID-19. Across immunosuppressive drugs (ATC category L04), we only found two significant associations (FDR < 0.05). Interleukin inhibitors were associated with higher susceptibility to infection (model E) and selective immunosuppressants (L04AA) were associated with higher risk of severe infection (model C), respectively, when compared to population controls (Table S6). No other significant associations were observed. Of note, a few preclinical studies reported that thiopurines, a type of immunosuppressant, may lead to reduced viral replication [92,93] via other mechanisms, although clinical studies suggested possible harmful effects [94,95]. However, the number of patients taking such drugs was too small for meaningful analysis in this study.

Different Results under Different Models
We note that sometimes the different models may yield different results. One main observation is that analysis on the tested population appears to result in more findings of drugs with protective effects. We also observed that some drugs in model F (infected vs. tested negative) may show different effects under model E (infected vs. general population). Several reasons may explain this finding. First, confounding by indication is inevitable and may play a more important role when analyzing general population samples. It is possible that apparent harmful effects of drugs are due to the diseases/conditions that the prescription is related to, or poorer health in general. Based on a machine learning model for predicting testing probability (see Figure S1), we observed that people who are older, having more comorbidities and taking more medications, suffering from cardiovascular conditions, etc. were more likely to be tested. Compared to the general population, the tested group may represent a more "homogeneous" population, enriched for people with poorer health and more comorbidities in general. Therefore, a proportion of confounders which overlap with factors associated with higher Pr(tested) are essentially controlled for by stratification, if we only study the tested subjects. On the other hand, in the general population, as there is a higher proportion of healthy subjects, the effect of confounding by indication may be stronger. Another possibility is collider bias due to conditioning on a subgroup of subjects. For example, a drug may be associated with certain conditions which, in turn, are associated with higher chance of being tested; on the other hand, those who have more severe symptoms or complications are more likely to be tested. Conditioning on testing may result in spurious associations between the drug and severity of infection. However, we have tried to minimize this type of bias by the IPW approach, and we did not observe substantial difference in results with or without IPW correction for most drugs. However, we note that, even with adjustment by IPW, there is still chance for residual selection or collider bias. For example, some factors associated with Pr(tested) may not be captured in the prediction model. A third possibility to consider is that a drug may truly produce different effects in different subgroups, due to effect modification by other factors or diseases. For instance, a recent study reported that the protective effect of statins is more marked in patients with diabetes [43]. The fact that risk factor associations may differ between a whole-population-or tested-population-based study has also been noted previously, for example in [35].

Strengths and Limitation
This study has a number of strengths. First and foremost, the study was performed on a large cohort with a sample size close to half a million. The sample was not limited to one or a few medical centers, and covered the entire UK population, although this is not an entirely random sample and participation bias still exists [34]. The large and well-characterized sample also enables analysis of infected and tested, as well as the whole population. We have studied all level-4 ATC drug categories, allowing an unbiased and systematic analysis on the association of different drugs with COVID-19 risks or outcomes. This avoids the risk of publication bias, especially negative results to be unreported. Drugs showing null associations can still be of important public health interest, as this may suggest that patients on such medications may not need to change their regimen in view of the pandemic. In addition, medication history was retrieved from GP records, which minimize recall bias and errors from self-reporting. Another strength is that we performed a variety of statistical analysis to reduce bias, including control for potential confounders, multiple imputation, IPW to reduce effects of testing bias, and study of different time windows and multiple models. Some of our findings were also corroborated by previous studies. Many previous clinical studies were limited to hospitalized or infected individuals, which cannot study the effect of drugs on susceptibility to infection. Selection on hospitalized/infected subjects may also be prone to selection/collider bias, as discussed elsewhere [34]; therefore, we included multiple models with infected and tested, as well the whole population as samples, which aims to reduce limitations due to specific designs.
There are also various limitations, some of which have been mentioned above. First and foremost, this is an observational study based on a retrospective cohort of UKBB. As this is not a randomized controlled trial, confounding is inevitable, especially confounding by indication. Although we have controlled for main confounders in the regression model, residual confounding is still likely. Since confounding by indication will likely bias towards increased odds of infection or severe disease, null or protective associations may be more reliable. Confounding by the use of other types of drugs is also possible. In addition, the UKBB cohort is not random, and participants are on average healthier than the general population [96]. The majority of participants are of European descent, so the findings may not be generalizable to other ethnicities. In addition, the subjects are mostly >50 years old, and drug effects in younger individuals may be different.
Regarding drug history, it is worth noting that vaccination records are not complete, as individuals may receive vaccination outside GP practices. Over-the-counter prescriptions were not counted, and it cannot be guaranteed that all drugs issued are dispensed by the pharmacy (see https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/tppgp4covid19. pdf, accessed on 9 November 2020). However, if this misclassification is nondifferential (unrelated to outcome), the bias will be towards the null. There is a relatively high missing rate of GP prescription records for deceased COVID-19 patients, which leads to reduced power to detect associations. While the UKBB cohort sample is large, we still have low power to detect associations for drugs that are uncommonly prescribed. Another limitation with the GP records is that only the issue date, but no duration or dosage, is available.
As for the outcome, hospitalization is a rough proxy for severity only. For models comparing to the general population, it is likely that a proportion of the population may be infected but were not tested. This tends to lead to bias on the conservative side (akin to the use of unscreened controls in genetic studies [97,98]), especially under model E. Patients with more severe symptoms are less likely to remain untested, so other models may be less affected by this bias. We note that this study focuses on prior (or pre-diagnostic) use of drugs and their association with infection risk/severity, and does not provide direct evidence for whether newly prescribed drugs to recently diagnosed patients will be useful or not. The current study represents one approach to drug repositioning with real-world population data, yet integrating results from other repositioning approaches (e.g., network/structure-based) may further improve the reliability of candidates.

Clinical Implications
We highlight a few clinical implications here, although we stress that further studies are required to confirm our findings. We discovered a number of drugs with potential protective effects that, if replicated and tested in further trials, may represent promising repurposing candidates (for prevention or treatment of disease). As CM disorders are a major risk factor for severe infection, this study also provides further support for the safety of CM medications and reinforces the need to continue these drugs for those indicated. In a similar vein, negative findings (nonsignificant associations with COVID-19) in this study may also be of value, given that some patients or physicians may have concerns over the risk of COVID-19 induced by existing drugs.
Another important finding is that flu (and possibly others, e.g., pneumococcal) vaccines may be associated with lower odds of infection and severity of disease. If further confirmed, the finding is clinically important as COVID-19 vaccines are not fully available yet to a large part of the world's population (especially those in developing countries), some may be hesitant to take the new vaccine, and the efficacy of existing vaccines varies and is less than perfect. At least, the present work supports that flu and other vaccinations should be continued and encouraged amid the pandemic. For any vaccines/drugs that may be repurposed for COVID-19, we believe that even a modest reduction in the risk/severity of infection may still be highly useful, given the huge number of people at risk for COVID-19 and its complications.

Conclusions
Here, we observed that a number of drugs, including many for cardiometabolic disorders, may be associated with lower odds of infection/severity of COVID-19. Several existing vaccines, especially flu vaccines, may be beneficial against COVID-19 as well. Due to the observational nature of the study, confounding cannot be excluded, and other limitations may be present. We understand that causal relationship between drugs and disease cannot be reliably concluded from this study alone, and shall regard the findings as more exploratory than confirmatory. Nevertheless, to our knowledge, this is the most comprehensive study to date on drug/vaccine associations with COVID-19. We believe that the current work provides a valuable resource to prioritize repositioning candidates for future meta-analyses, clinical trials, and/or experimental studies.  Table S1: Demographic and other characteristics of the original UKBB data, Table S2: Out-of-bag (OOB) errors for different variables from multiple imputations, Table S3: (a) All protective associations with at least nominal significance (p < 0.05) (6 month to 5 years), Table S4: All association results with vaccines (time windows of 1, 2, 5 and 10 years), Table S5: (a) Top 10 drugs with harmful effects (ranked by p-value) from each model and time window (time window of 6 month to 5 years) (b) Summary table by frequency of being listed among the top 10, Table S6: All association results based on subjects with available GP prescription records, with multiple imputation of covariates and inverse probability weighting (IPW) of probability of being tested, Table S7: Analysis restricted to subjects with complete covariate data, with IPW, Table S8: Analysis restricted to subjects with complete covariate data, without IPW, Table S9: Analysis with imputed covariates without IPW, Table S10: Proportion of subjects in each subgroup, Table S11: Full results of subgroup analysis, Table S12: Summary table of interaction analyses (results with FDR < 0.2), Table S13: Full results of interaction analyses, Table S14: Further analysis on associations of flu vaccine and risks/severity of infection, according to the season of vaccination, Table S15: Results of analyses after controlling for other top medications. Figure   Data Availability Statement: UK Biobank data are available to eligible researchers after completing an application procedure (https://www.ukbiobank.ac.uk/enable-your-research).