Race, Ethnicity, and Pharmacogenomic Variation in the United States and the United Kingdom

The relevance of race and ethnicity to genetics and medicine has long been a matter of debate. An emerging consensus holds that race and ethnicity are social constructs and thus poor proxies for genetic diversity. The goal of this study was to evaluate the relationship between race, ethnicity, and clinically relevant pharmacogenomic variation in cosmopolitan populations. We studied racially and ethnically diverse cohorts of 65,120 participants from the United States All of Us Research Program (All of Us) and 31,396 participants from the United Kingdom Biobank (UKB). Genome-wide patterns of pharmacogenomic variation (6311 drug response-associated variants for All of Us and 5966 variants for UKB) were analyzed with machine learning classifiers to predict participants' self-identified race and ethnicity. Pharmacogenomic variation predicts race/ethnicity with average accuracies of 92.1% for All of Us and 94.3% for UKB. Group-specific prediction accuracies range from 99.0% for the White group in UKB to 92.9% for the Hispanic group in All of Us. Prediction accuracies are substantially lower for individuals who identified with more than one group in All of Us (16.7%) or as Mixed in UKB (70.7%). There are numerous individual pharmacogenomic variants with large allele frequency differences between race/ethnicity groups in both cohorts. Frequency differences for toxicity-associated variants predict hundreds of adverse drug reactions per 1000 treated participants for minority groups in All of Us. Our results indicate that race and ethnicity can be used to stratify pharmacogenomic risk in the US and UK populations and should not be discounted when making treatment decisions.
We resolve the contradiction between the results reported here and the orthodoxy of race and ethnicity as non-genetic, social constructs by emphasizing the distinction between global and local patterns of human genetic diversity, and we stress the current and future limitations of race and ethnicity as proxies for pharmacogenomic variation.


Introduction
Pharmacogenomic variants are genetic differences that affect how patients respond to medications, in terms of drug efficacy and toxicity [1][2][3]. Pharmacogenomic mechanisms of action include genetic modifications to enzymes and transporters that regulate the rate at which drugs are metabolized and absorbed (pharmacokinetics) or genetic changes to drug targets (pharmacodynamics). Pharmacogenomic testing is increasingly being used to predict how individuals will respond to certain medications and to guide treatment decisions. The United States Food and Drug Administration (FDA) documents well-supported pharmacogenomic associations for 114 medications, and the Clinical Pharmacogenetics Implementation Consortium (CPIC) has developed clinical practice guidelines for 145 medications [4,5].
Pharmacogenomics also has implications for public health owing to differences in the frequencies of pharmacogenomic variants among population groups. A number of

Biobank Volunteer Participants
This study used data from volunteer participants enrolled in the NIH All of Us Research Program (All of Us) and the UK Biobank (UKB) [36,37]. All of Us participant data were accessed under the terms of the Georgia Institute of Technology Data Use and Registration Agreement, and UKB participant data were accessed under application number 65206. The All of Us operational protocol is approved by the NIH IRB (protocol number 2016-05), and ethics approval for UKB was obtained from the Community Health Index Advisory Group (CHIAG) for Scotland, the Patient Information Advisory Group (PIAG) for England and Wales, and the North West Multi-centre Research Ethics Committee (MREC) for the United Kingdom (project ID 299116). Written informed consent was obtained from all participants. All of Us inclusion criteria are age 18 or older, the legal authority and decisional capacity to consent, and current residence in the US or a US territory. All of Us excludes minors under the age of 18 and vulnerable populations (prisoners and individuals without the capacity to give consent). UKB inclusion criteria are age 40-69 at recruitment, the capacity to consent, and residence within 20-25 miles of one of the UKB assessment centers. UKB excludes participants who express the view that they would want to be withdrawn should they lose mental capacity or die. The main difference in inclusion criteria between the two studies relates to participant age: All of Us includes adults aged 18 and over, whereas UKB includes adults aged 40-69 at recruitment. This difference reflects UKB's decision to focus on complex diseases of middle and old age.

Pharmaceutics 2023, 15, 1923

Biobank Participant Data
All of Us participant data were accessed and analyzed using the Researcher Workbench, and UKB participant data were downloaded from the UKB data portal and analyzed locally. Whole-genome sequence variant data for All of Us participants were accessed from the Controlled Tier dataset v6 (curated version C2022Q2R2), and imputed whole-genome genotype data for UKB participants were accessed from data field 21008. All of Us participants' self-identified race and ethnicity data were accessed from the Controlled Tier dataset, and UKB participants' self-identified ethnic group data were accessed from data field 2100. The five largest race/ethnicity groups were retained for each biobank.

Pharmacogenomic Variants
Pharmacogenomic variants that are associated with patient drug response were mined from the PharmGKB database [38]. NCBI dbSNP variant identifiers (rsIDs), associated genes and drugs, and levels of evidence for variant-drug associations were taken for each pharmacogenomic variant. PharmGKB variant rsIDs were used to extract pharmacogenomic variants from All of Us and UKB whole genome variant datasets. Pharmacogenomic variant alternate allele frequencies for All of Us and UKB were calculated using PLINK v1.9 [39].
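In the study, PLINK v1.9 performed this calculation. As a minimal illustration of the underlying arithmetic only, the Python sketch below derives per-group alternate allele frequencies from a genotype matrix coded as alternate-allele dosages (0, 1, 2). The helper name `alt_allele_freq_by_group`, the toy genotypes, and the group labels are hypothetical, and missing-genotype handling is reduced to ignoring NaN entries.

```python
import numpy as np

def alt_allele_freq_by_group(dosages, groups):
    """Hypothetical helper: alternate allele frequency per group and variant.

    dosages: (n_individuals, n_variants) alternate-allele counts (0, 1, 2);
             np.nan marks a missing genotype and is ignored.
    groups:  length-n_individuals sequence of group labels.
    Returns a dict mapping each group label to a length-n_variants array.
    """
    dosages = np.asarray(dosages, dtype=float)
    groups = np.asarray(groups)
    freqs = {}
    for g in np.unique(groups):
        sub = dosages[groups == g]
        # Each called diploid genotype contributes two alleles.
        n_alleles = 2 * np.sum(~np.isnan(sub), axis=0)
        freqs[g] = np.nansum(sub, axis=0) / n_alleles
    return freqs

# Hypothetical toy data: 4 individuals x 2 variants, two groups.
toy = [[0, 2], [1, 2], [2, 1], [np.nan, 0]]
print(alt_allele_freq_by_group(toy, ["A", "A", "B", "B"]))
```

Counting alleles from called genotypes only means that a missing call shrinks the denominator rather than being treated as a reference genotype.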

Machine Learning Prediction
Principal component analysis (PCA) was performed on the All of Us and UKB pharmacogenomic variants using the FastPCA program implemented in PLINK v2.0, run with the "approx" modifier for the top 25 principal components (PCs) [40,41]. Pharmacogenomic PCA data were used to predict participant race and ethnicity using machine learning classifiers, with race/ethnicity as class labels and the top 25 PC-values as feature vectors. K-nearest neighbors (k-NN), random forest (RF), and support vector machine (SVM) classifier methods were implemented using the scikit-learn machine learning library v1.1.2 for Python [42]. All three methods were implemented with randomized searches to determine optimal prediction hyperparameters (training) and 5-fold cross-validation (CV) to measure prediction accuracy (testing). Accuracy was quantified as the mean ± standard deviation for the percentage of correct race/ethnicity predictions in the five test datasets for each biobank cohort. Model training and testing were repeated for feature vectors covering contiguous ranges of 2-25 PCs. Additional details on the machine learning classification approaches used here can be found in the Supplementary Methods.
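The sketch below is one plausible scikit-learn arrangement of these steps, not the authors' code (their settings are in the Supplementary Methods). Assumptions, all illustrative: scikit-learn's exact `PCA` stands in for PLINK 2 FastPCA, genotypes are simulated, only random forest is shown, the hyperparameter grid and `n_iter` are invented, and the randomized search is nested inside each of five outer CV folds so that accuracy is measured on folds not used for tuning. PCA is fitted on all samples first, mirroring the order described (PCA uses no labels).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical toy cohort: 3 groups x 100 individuals, 200 biallelic variants
# whose alternate allele frequencies are drawn independently for each group.
n_per_group, n_variants = 100, 200
X_geno, y = [], []
for label in ("G1", "G2", "G3"):
    p = rng.uniform(0.05, 0.95, size=n_variants)
    X_geno.append(rng.binomial(2, p, size=(n_per_group, n_variants)))
    y += [label] * n_per_group
X_geno, y = np.vstack(X_geno), np.array(y)

# Stand-in for PLINK 2 FastPCA: exact PCA, keeping the top 25 PCs.
pcs = PCA(n_components=25, random_state=0).fit_transform(X_geno)

def accuracy_for_top_k(pcs, y, k, seed=0):
    """Mean and SD (%) of 5-fold CV accuracy using the first k PCs as features,
    with a randomized hyperparameter search run inside each outer fold."""
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions={"n_estimators": [25, 50],
                             "max_depth": [None, 5, 10],
                             "min_samples_leaf": [1, 2, 5]},
        n_iter=4, cv=3, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(search, pcs[:, :k], y, cv=outer, scoring="accuracy")
    return 100 * scores.mean(), 100 * scores.std()

# The study scanned k = 2..25 (range(2, 26)); two values keep this sketch fast.
for k in (2, 5):
    mean, sd = accuracy_for_top_k(pcs, y, k)
    print(f"top {k} PCs: {mean:.1f}% +/- {sd:.1f}%")
```

Swapping `RandomForestClassifier` for `KNeighborsClassifier` or `SVC` (with a matching parameter grid) would cover the other two methods named in the text.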
Pharmacogenomic PCA allele weights were calculated by FastPCA in the form of variant (SNP) allele dosage coefficients for variants $j = 1, \ldots, n$ on the $i$th PC, using

$$\mathrm{PC}_i = \sum_{j=1}^{n} \mathrm{AlleleWeight}_{ij}\,\mathrm{SNPdosage}_{ij}$$

The magnitude of allele weights corresponds to the effect each SNP has on a given PC, which can be taken as a measure of genetic divergence.
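Read literally, the weighted sum above is a matrix product between an individual's dosage vector and the allele-weight matrix. The sketch below applies it under that reading only; the helper `pc_scores`, the weights, and the dosages are hypothetical, and PLINK's own projection may additionally center or scale dosages, which this sketch omits.

```python
import numpy as np

def pc_scores(allele_weights, dosages):
    """Hypothetical helper: weighted sum over variants j of
    AlleleWeight[i, j] times each individual's dosage of variant j.

    allele_weights: (n_pcs, n_variants) per-allele weights.
    dosages:        (n_individuals, n_variants) allele dosages.
    Returns an (n_individuals, n_pcs) array of PC values.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(allele_weights, dtype=float).T

# Hypothetical weights for 2 PCs over 3 variants, and 2 individuals.
W = np.array([[0.5, -0.2, 0.1],
              [0.0,  0.3, -0.4]])
D = np.array([[2, 1, 0],
              [0, 2, 1]])
print(pc_scores(W, D))
```

Because each PC value is linear in the dosages, a variant with a large absolute weight moves an individual's PC value most per copy of the allele, which is why weight magnitude serves as a divergence measure.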

Predicted Adverse Drug Reactions
The predicted number of excess adverse drug reactions per 1000 patients for All of Us minority racial and ethnic group participants, compared to participants from the majority White group, was calculated from the differences in effect allele frequencies of toxicity-associated pharmacogenomic variants between groups, treating the mode of effect as either recessive (two toxicity effect alleles needed) or dominant (one or two toxicity effect alleles needed). For the recessive model of adverse drug reactions ($R_{\mathrm{ADR}}$):

$$R_{\mathrm{ADR}} = 1000\left(p_{\mathrm{min}}^{2} - p_{\mathrm{maj}}^{2}\right)$$

where $p_{\mathrm{min}}^{2}$ is the homozygous genotype fraction for the minority group toxicity-associated allele $p$, and $p_{\mathrm{maj}}^{2}$ is the homozygous genotype fraction for the majority group toxicity-associated allele $p$.
For the dominant model of adverse drug reactions ($D_{\mathrm{ADR}}$):

$$D_{\mathrm{ADR}} = 1000\left[\left(p_{\mathrm{min}}^{2} + 2p_{\mathrm{min}}(1 - p_{\mathrm{min}})\right) - \left(p_{\mathrm{maj}}^{2} + 2p_{\mathrm{maj}}(1 - p_{\mathrm{maj}})\right)\right]$$

where $2p_{\mathrm{min}}(1 - p_{\mathrm{min}})$ is the heterozygous genotype fraction for the minority group toxicity-associated allele $p$, and $2p_{\mathrm{maj}}(1 - p_{\mathrm{maj}})$ is the heterozygous genotype fraction for the majority group toxicity-associated allele $p$.
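The two models can be written as short Python functions. This is a minimal sketch assuming Hardy-Weinberg genotype fractions ($p^2$ homozygous, $2p(1-p)$ heterozygous), with the dominant model counting both homozygous and heterozygous carriers, consistent with "one or two toxicity effect alleles needed". The function names are hypothetical, and the example inputs are the rounded allele frequencies quoted in the Adverse Drug Reactions results.

```python
def recessive_adr(p_min, p_maj, per=1000):
    """Hypothetical helper for R_ADR: excess reactions per `per` patients when
    two effect alleles are needed (homozygous fraction p**2 in each group)."""
    return per * (p_min ** 2 - p_maj ** 2)

def dominant_adr(p_min, p_maj, per=1000):
    """Hypothetical helper for D_ADR: excess reactions per `per` patients when
    one or two effect alleles are needed (p**2 + 2p(1 - p) in each group)."""
    def carrier_fraction(p):
        return p ** 2 + 2 * p * (1 - p)
    return per * (carrier_fraction(p_min) - carrier_fraction(p_maj))

# rs9923231, Asian (minority) vs White (majority), dominant model.
print(round(dominant_adr(0.674, 0.338)))  # → 332
# rs9694958, Black vs White, recessive model; negative = excess in White group.
print(recessive_adr(0.333, 0.920))
```

A negative result means the excess falls in the majority group. Because the quoted frequencies are rounded, recomputed values can differ from the reported counts by about one reaction per 1000.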

Race and Ethnicity in the All of Us and UKB Cohorts
All of Us and UKB volunteer participants self-identify their race and ethnicity upon enrollment. All of Us race and ethnic groups are defined based on the US Census standards, and UKB ethnic groups are defined based on the UK National Health Service (NHS) standards. The US distinguishes between race, based on ancestral origins, and ethnicity, based on shared culture, whereas the UK defines ethnicity based on shared national origins. The race and ethnic groups are similar for both countries, albeit with differences that reflect the distinct patterns of immigration and resulting demographic characteristics of each country. For example, the Hispanic ethnic category exists only in the US, and the Asian category in the UK covers South Asian immigrants from Bangladesh, India, and Pakistan, with Chinese broken out as a separate group. The US classification allows for the selection of More than one group, whereas the UK classification requires the selection of a single ethnic group but includes a Mixed category.
The All of Us participant cohort is 54.0% White, 19.6% Black or African American, 15.9% Hispanic or Latino, 3.1% Asian, and 3.6% More than one; the UKB participant cohort is 94.4% White, 1.9% Asian, 1.5% Black, 0.3% Chinese, and 0.6% Mixed. Although the All of Us cohort is substantially more racially/ethnically diverse than UKB, White participants make up the majority of each biobank, which could bias machine learning classification algorithms. Accordingly, White participants were randomly down-sampled to 20,000 participants for All of Us and 10,000 participants for UKB to yield more balanced group sample sizes for subsequent machine learning prediction (Table 1). Both biobanks have more female than male participants; the All of Us cohort is 60.8% female and UKB is 53.4% female. The average age for both biobanks is 53 years.
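Random down-sampling of the majority group can be sketched as follows. Only the caps (20,000 White participants for All of Us, 10,000 for UKB) come from the text; the helper `downsample_group`, the fixed seed, and the toy labels are hypothetical.

```python
import numpy as np

def downsample_group(labels, group, target, seed=0):
    """Hypothetical helper: return sorted row indices that keep at most `target`
    members of `group` (sampled uniformly without replacement) plus every
    member of all other groups."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    in_group = np.flatnonzero(labels == group)
    others = np.flatnonzero(labels != group)
    if in_group.size > target:
        in_group = rng.choice(in_group, size=target, replace=False)
    return np.sort(np.concatenate([in_group, others]))

# Toy labels: 50 "White" and 10 "Asian" participants; cap White at 20
# (the study's caps would be target=20_000 or target=10_000).
labels = ["White"] * 50 + ["Asian"] * 10
keep = downsample_group(labels, "White", 20)
print(len(keep))  # → 30
```

Sampling without replacement keeps each retained participant unique, and fixing the seed makes the balanced subset reproducible across runs.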

Pharmacogenomic Variation, Race, and Ethnicity
Pharmacogenomic variants mined from the PharmGKB database (n = 6509) were intersected with genome-wide genotype data from All of Us (n = 6311) and UKB (n = 5966). Pharmacogenomic variants were analyzed using principal components analysis (PCA) and compared to self-identified race and ethnicity for All of Us and UKB participants. PCA of pharmacogenomic variants yields clusters that correspond approximately to participant race and ethnicity groups for both All of Us and UKB ( Figure 1). Nevertheless, there appears to be a continuum of pharmacogenomic variation for the first two PCs with no sharp boundaries between race and ethnicity clusters. The White group forms the most coherent cluster for All of Us, while the White and Chinese groups are the most coherent for UKB. The Hispanic group has the broadest PCA distribution for any single All of Us group, consistent with its designation as an ethnic group that may include individuals from different racial groups. The Asian group in All of Us forms two clusters, corresponding to South and East Asian ancestry. The Asian group in UKB corresponds to South Asian ancestry, consistent with the NHS definition of the ethnic group. The More than one and Mixed groups are the most dispersed groups in All of Us and UKB, respectively.

The relationship between biobank participants' race/ethnicity and genome-wide patterns of pharmacogenomic variation was quantified via machine learning classification. Classification algorithms are supervised learning algorithms that are used to predict categorical variables (classes) from a defined vocabulary (class labels). For this study, All of Us and UKB participants' self-identified race/ethnicity groups were taken as class labels, and pharmacogenomic PC values were taken as features for model training and class prediction.
Three different machine learning classifiers, k-nearest neighbors (k-NN), random forests (RF), and support vector machines (SVM), were used to evaluate the accuracy with which pharmacogenomic PC values predict participant ethnicity in UKB. All three methods gave similar results, with the best overall performance of 94.3% mean accuracy using 16 principal components (PCs) shown by RF (Table 2). The UKB results for k-NN and SVM are shown in . Most of the pharmacogenomic variation is captured by the first 3-4 PCs (Supplementary Figure S2), and self-identified race/ethnicity (SIRE) classification accuracy with RF does not change significantly after the first 3 PCs (Figure 2 and Supplementary Figure S1). The highest overall RF race/ethnicity prediction accuracy for All of Us is 92.1% using 17 PCs (Figure 2).
The accuracy of race/ethnicity classification varies by group in both All of Us and UKB. PC values for misclassified individuals from distinct race/ethnicity groups are shown in Figure 3A,B. Misclassified individuals from specific groups tend to map just outside the borders of their respective pharmacogenomic clusters. There is a relatively large number of misclassified Hispanic participants from All of Us, who tend to group with Black or White clusters, consistent with the definition of this group. Misclassified participants who identified as More than one in All of Us or Mixed in UKB show a more dispersed distribution in pharmacogenomic PC space. The accuracy of race/ethnicity prediction is highest for the White group in both All of Us (98.6%; Figure 3C) and UKB (99.0%; Figure 3D). The prediction accuracy for Hispanic individuals in All of Us is high (92.9%) despite the relatively high pharmacogenomic diversity of the group. Participants who identified with More than one group in All of Us are predicted primarily as Hispanic (41%), with broad distribution across White (22.7%), More than one (16.9%), and Black (16.3%) groups, and Mixed ethnicity is predicted with 70.7% accuracy in UKB.
Allele weights from PC1 and PC2 were used to identify pharmacogenomic variants that have the highest levels of genetic divergence among samples (Table 3 and Supplementary Table S1). Given the relationship between race, ethnicity, and pharmacogenomic variation, these variants also tend to show the greatest allele frequency differences between race/ethnicity groups (Figure 4 and Supplementary Figure S3). Group-divergent pharmacogenomic variants of this kind can be found across PharmGKB evidence levels (1A, 1B, 2A, 2B, and 3) and correspond to effects on efficacy, dosage, and toxicity for a wide variety of drugs.
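A minimal sketch of ranking variants by allele-weight magnitude, assuming each PC is ranked separately by absolute weight (the text does not state how PC1 and PC2 weights were combined for Table 3). The helper `top_variants_per_pc`, the variant IDs, and the weights are all hypothetical.

```python
import numpy as np

def top_variants_per_pc(variant_ids, allele_weights, n_top=3, pcs=(0, 1)):
    """Hypothetical helper: for each chosen PC (0-based, so (0, 1) = PC1, PC2),
    list the n_top variants with the largest absolute allele weight."""
    W = np.abs(np.asarray(allele_weights, dtype=float))
    result = {}
    for i in pcs:
        order = np.argsort(W[i])[::-1][:n_top]  # descending |weight|
        result[f"PC{i + 1}"] = [variant_ids[j] for j in order]
    return result

# Hypothetical weights for 2 PCs over 4 variants.
ids = ["v1", "v2", "v3", "v4"]
weights = [[0.9, -0.1, 0.4, -0.6],
           [0.05, -0.8, 0.3, 0.2]]
print(top_variants_per_pc(ids, weights, n_top=2))
```

Taking absolute values matters because a PC's sign is arbitrary: a strongly negative weight marks as much divergence along that axis as a strongly positive one.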

Adverse Drug Reactions
The potential clinical impact of group-divergent pharmacogenomic variants was evaluated by calculating the predicted number of excess adverse drug reactions per 1000 patients for minority patients compared to the majority White group in All of Us. For example, the pharmacogenomic variant rs4646437 (chr7:99767460:G:A) has been associated with severe side effects among heroin-dependent patients treated with methadone [43]. The toxic effect is dominant, with both AA and AG genotype patients showing more severe side effects compared to patients with the GG genotype. The A allele is found at 72.5% frequency among Black All of Us participants compared to 10.5% frequency for White participants. This allele frequency difference, under the dominant effect model ($D_{\mathrm{ADR}}$), predicts 726 more adverse drug reactions to methadone among 1000 Black patients treated compared to White patients.
The pharmacogenomic variant rs9923231 (chr16:31096368:C:T) has been associated with the risk of over-anticoagulation and excess bleeding in patients treated with warfarin and phenprocoumon. The toxic effect is dominant, with CT and TT patients showing an increased risk of adverse effects. The T allele is found at 67.4% frequency among Asian All of Us participants compared to 33.8% frequency for White participants. This allele frequency difference, under the dominant model ($D_{\mathrm{ADR}}$), predicts 332 more adverse reactions to warfarin or phenprocoumon among 1000 Asian patients treated compared to White patients.
The pharmacogenomic variant rs1801133 (chr1:11796321:G:A) has been associated with the risk of hematotoxicity among pediatric leukemia patients treated with methotrexate [44]. The adverse effect is dominant, with AA and AG genotype patients showing an increased risk of toxicity. The A allele is found at 10.4% frequency among Black All of Us participants compared to 34.8% frequency for White participants. This allele frequency difference, under the dominant effect model ($D_{\mathrm{ADR}}$), predicts 377 more adverse reactions to methotrexate among 1000 White patients treated compared to Black patients.
The pharmacogenomic variant rs9694958 (chr8:42298528:A:G) has been associated with the risk of skin rash among non-small cell lung cancer patients treated with gefitinib [45]. The toxic effect is recessive, with AA genotype patients showing an increased risk of developing a skin rash. The A allele is found at 33.3% frequency among Black All of Us participants compared to 92.0% frequency for White participants. This allele frequency difference, under the recessive effect model ($R_{\mathrm{ADR}}$), predicts 735 more adverse reactions to gefitinib among 1000 White patients treated compared to Black patients.
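As a worked check, substituting the rounded frequencies quoted above into the dominant and recessive models (homozygous fraction $p^2$, heterozygous fraction $2p(1-p)$) reproduces the reported counts to within one reaction per 1000; the residual gaps presumably reflect rounding of the quoted allele frequencies. For rs4646437 (dominant; $p_{\mathrm{min}} = 0.725$ for Black and $p_{\mathrm{maj}} = 0.105$ for White participants) and rs9694958 (recessive; $p_{\mathrm{min}} = 0.333$, $p_{\mathrm{maj}} = 0.920$):

$$
\begin{aligned}
D_{\mathrm{ADR}} &= 1000\left[\left(0.725^{2} + 2(0.725)(0.275)\right) - \left(0.105^{2} + 2(0.105)(0.895)\right)\right]\\
&= 1000\left[(0.525625 + 0.39875) - (0.011025 + 0.18795)\right] = 1000\,(0.924375 - 0.198975) = 725.4\\[4pt]
R_{\mathrm{ADR}} &= 1000\left(0.333^{2} - 0.920^{2}\right) = 1000\,(0.110889 - 0.8464) = -735.511
\end{aligned}
$$

The negative recessive value indicates that the excess falls in the majority (White) group, matching the direction reported for gefitinib.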

Discussion
The results presented here may appear to be paradoxical in light of the widely held notion that race and ethnicity are social constructs and thus poor proxies for genetic diversity. If this really is the case, then how can it be that pharmacogenomic variants predict race/ethnicity with such high accuracy, show large allele frequency differences between groups, and support the clinical relevance of race and ethnicity for adverse drug reactions? The resolution to this apparent paradox lies in the distinction between global and local patterns of human genetic diversity. The racial and ethnic group categories used in the US and the UK map poorly onto global patterns of human genetic diversity, the vast majority of which are found within Africa [46][47][48]. For instance, given the extensive genetic variation and deep divergence times among African populations, White and Nigerian British individuals from UKB would be more closely related to each other than either is to Khoisan individuals from Southern Africa, even though Nigerian and Khoisan individuals would be racially classified as Black. There is also no reason to think that the discrete and categorical race/ethnicity groups used in the US and UK would accommodate more continuous patterns of global genetic variation [49,50].
Race and ethnicity, however, are defined locally in a way that reflects particular countries' migration histories and their resulting demographic characteristics. This can be seen in the categories used by the US and UK biobanks studied here, which differ in ways that capture distinct aspects of each country's demography. Racial and ethnic categories also change over time in a way that reflects changing demographic patterns within countries. The US census racial and ethnic classifications have changed 20 times since they were first used in the 18th century and are likely to change with the next census to reflect the increasing diversity of the country [51]. The local (and temporal) correspondence between race, ethnicity, and demography explains the connection between race, ethnicity, and genetic diversity reported here and elsewhere. This is especially true given that race in the US is explicitly defined in terms of ancestral origins and ethnicity in the UK is defined in terms of immigrants' national origins. The discontinuous sampling of divergent migrant and native populations that created modern, cosmopolitan populations, such as the US and the UK, is expected to yield clear genetic differences between socially defined groups. It is thus simultaneously true that race and ethnicity are poor proxies for global patterns of human genetic diversity, while there are also pronounced and clinically relevant genetic differences between locally defined racial and ethnic groups. In other words, socially constructed race and ethnicity groups can show genetic differences that are relevant to health.

Caveats and Limitations
The race and ethnicity categories studied here amount to broad groups, which may encompass multiple genetically divergent subgroups. For example, the Asian category in All of Us includes both East and South Asian groups, which are genetically divergent, whereas the Asian category in the UKB includes primarily South Asian groups. The inclusion of East and South Asian groups together in our analysis obscures pharmacogenomic differences between them. PharmGKB has adopted a biogeographic grouping system, based on seven geographically defined global groups, to standardize the reporting of variability in pharmacogenomic allele frequencies [52]. This system is better designed than socially defined race and ethnicity groups to capture global patterns of pharmacogenomic variation, including in countries outside the US and the UK. Nevertheless, in the clinical setting, physicians have ready access to patient race and ethnicity, whereas biogeographic ancestry would require the analysis of patient genomic data.
There are other important caveats and limitations to the reliance on self-identified, and locally defined, race and ethnicity groups as proxies for pharmacogenomic variation. Beyond genetics, race and ethnicity groups also differ with respect to social determinants of health, lifestyle, and environment, all of which can be highly relevant to patient care. As it relates to genetic factors, race and ethnicity information will be rendered useless by pharmacogenomic testing, which provides a far more accurate and direct assessment of pharmacogenomic variation and risk. If all patients had ready access to pharmacogenomic testing, patients' race and ethnicity would be irrelevant to treatment decisions. However, tests of this kind have yet to be widely and routinely implemented, and minority individuals are currently less likely to have access to genetic data of this kind [17]. In addition, as we have shown previously, race and ethnicity serve to stratify pharmacogenomic risk among population groups rather than accurately predict specific variants for any given individual [8]. In this sense, race and ethnicity should be considered pharmacogenomic risk factors for patient stratification rather than direct diagnostic tools for predicting the presence of specific variants in individual patients.
Finally, as demographic diversity in countries such as the US and the UK continues to increase, particularly owing to increased immigration and intermarriage, traditional racial and ethnic groups will become increasingly irrelevant to pharmacogenomic risk stratification. This is supported by the relatively low prediction accuracy values seen for the More than one group in All of Us and the Mixed group in UKB. All of these facts underscore the need to move from viable but imprecise genetic proxies, such as race, ethnicity, and ancestry, to direct measures of genetic diversity in support of more equitable precision medicine.

Conclusions
In their updated guidance for reporting race and ethnicity, the Journal of the American Medical Association declared that "Race and ethnicity are social constructs, without scientific or biological meaning" [21]. However, as we have shown here, socially defined race and ethnicity groups show differences in the frequency of pharmacogenomic variants that are directly relevant to health care. Our results on adverse drug reactions illustrate how ignoring the pharmacogenomic implications of race and ethnicity could exacerbate health disparities that burden US and UK minority groups. The social and genetic dimensions of race and ethnicity are not mutually exclusive, and the implications of both should be considered when treating patients. Considered together, the results of this study and the caveats discussed above suggest that, at this time, patient race and ethnicity should still be considered as one among many factors when making treatment decisions.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pharmaceutics15071923/s1, Figure S1: Accuracy for prediction of UKB participants' ethnicity using pharmacogenomic PCA data; Figure S2: Scree plots for pharmacogenomic PCAs computed using All of Us and UKB; Table S1: Highly diverged pharmacogenomic variants in UKB; Figure S3: Examples of divergent pharmacogenomic variants in UKB.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All of Us data can be accessed by registered researchers using the Researcher Workbench https://www.researchallofus.org/data-tools/workbench/. UKB data can be accessed via researcher agreement using the Access Management System http://amsportal.ukbiobank.ac.uk/.