Insights into HIV-1 Transmission Dynamics Using Routinely Collected Data in the Mid-Atlantic United States

Background: Molecular epidemiological approaches provide opportunities to characterize HIV transmission dynamics. We analyzed HIV sequences and viral load (VL) results obtained during routine clinical care, and individuals' zip-code locations, to determine the utility of this approach. Methods: HIV-1 pol sequences aligned using ClustalW were subtyped using REGA. A maximum likelihood (ML) tree was generated using IQTree. Transmission clusters with ≤3% genetic distance (GD) and ≥90% bootstrap support were identified using ClusterPicker. We conducted Bayesian analysis using BEAST to confirm transmission clusters. The proportion of nucleotides with ambiguity ≤0.5% was considered indicative of early infection. Descriptive statistics were applied to characterize clusters and group comparisons were performed using chi-square or t-test. Results: Among 2775 adults with data from 2014–2015, 2589 (93%) had subtype B HIV-1, mean age was 44 years (SD 12.7), 66.4% were male, and 25% had nucleotide ambiguity ≤0.5%. There were 456 individuals in 193 clusters: 149 dyads, 32 triads, and 12 groups with ≥ four individuals per cluster. More commonly in clusters were males than females, 349 (76.5%) vs. 107 (23.5%), p < 0.0001; younger individuals, 35.3 years (SD 12.1) vs. 44.7 (SD 12.3), p < 0.0001; and those with early HIV-1 infection by nucleotide ambiguity, 202/456 (44.3%) vs. 442/2133 (20.7%), p < 0.0001. Of the 193 clusters, 43 (22.3%) included individuals in different jurisdictions. Clusters ≥ four individuals were similarly found using BEAST. HIV-1 VL ≥3.0 log10 c/mL was most common among individuals in clusters ≥ four, 18/21 (85.7%), compared to 137/208 (65.8%) in clusters sized 2–3, and 927/1169 (79.3%) who were not in a cluster (p < 0.0001). Discussion: HIV sequence data obtained for HIV clinical management provide insights into regional transmission dynamics. Our findings demonstrate the additional utility of HIV-1 VL data in combination with phylogenetic inferences as an enhanced contact tracing tool to direct HIV treatment and prevention services. Trans-jurisdictional approaches are needed to optimize efforts to end the HIV epidemic.


Introduction
In February 2019, there was a call to end the HIV epidemic in the U.S. within a decade, using currently available treatment and prevention modalities [1]. Based on the , and Puerto Rico) were selected as over 50% of HIV diagnoses were made in these locations [2]. An estimated 14% of people living with HIV (PWH) were not aware of their condition and were thought to contribute approximately 38% of new transmissions, with the remaining new infections arising from individuals with known HIV infection who were either not in care or not virally suppressed despite being in care [3]. Continued progress towards elimination of the HIV epidemic requires diagnosis of a higher percentage of PWH. Consistent early detection and treatment with durable viral suppression would reduce transmission [4]. This goal may be advanced by implementing strategies to detect transmission clusters, which can be reconstructed using HIV-1 sequence data. Expanded molecular epidemiologic investigation that also incorporates HIV-1 viral load (VL) may (i) identify undiagnosed PWH, or those with suboptimal treatment and continued viremia, who need antiretroviral treatment both for their own health and to limit transmission to others [5][6][7]; and (ii) identify contacts at risk of infection for targeted provision of prophylactic antiretrovirals, including appropriate pre- or post-exposure prophylaxis to prevent HIV acquisition [8][9][10][11].
Molecular epidemiologic approaches to characterize HIV transmission have been used in research and public health settings to describe local transmission dynamics [12][13][14]. Such studies have augmented our understanding of transmission dynamics previously based primarily on epidemiologic data alone. In 2014-2015, molecular epidemiologic approaches were critical to detect and monitor an HIV outbreak leading to 181 new HIV infections in southeastern Indiana related to drug injection [15]. The U.S. Centers for Disease Control and Prevention have since added a molecular surveillance component to routine public health activities to identify HIV outbreaks. Since then, this approach has been used to investigate outbreaks across the U.S. [16]. The approach focuses on identifying rapidly growing transmission clusters, with selection of a very low genetic distance threshold of 0.5% defining closely related viral sequences among individuals with recently diagnosed HIV. Only a small fraction of the new diagnoses invokes a public health investigation, with most investigations conducted by local health jurisdictions and limited to intra-state analyses.
DC has the highest HIV prevalence in the U.S. with an estimated 1.8% of the population living with HIV in 2019 [17]. DC residents have the highest estimated lifetime risk of acquiring HIV infection (1:39) followed in sixth place by neighboring Maryland (1:85) [18]. The adjacent geographic location of DC to Maryland and Virginia and the sharing of work force result in social and cultural interactions across borders, impacting the HIV epidemic. Using public health HIV surveillance data from these three contiguous jurisdictions, we found that approximately 21% of the PWH in the DC metropolitan area received care in more than one jurisdiction demonstrating the converging and overlapping epidemics in this high HIV transmission region [19]. Active movement of PWH in the metropolitan DC area is demonstrated with routine surveillance data that demonstrate outmigration of approximately 42% of PWH originally diagnosed in DC in 2019, and in-migration contributing 17% of the 12,408 PWH actively receiving care in DC [17]. Investigating HIV transmission dynamics in this region, we previously analyzed HIV pol sequence data from individuals who had enrolled in clinical or observational studies from early in the HIV pandemic [20][21][22]. Transmission clusters were common among men who have sex with men (MSM) and younger individuals. Individuals with evidence of recent infection based on low proportion of nucleotide ambiguity were more likely to belong in a transmission cluster [23,24].
In this study, we used HIV-1 sequences generated for clinical purposes along with limited demographic data, to characterize HIV transmission patterns in this high prevalence region. Our supposition is this approach would identify transmission clusters that traverse state borders, across this porous interconnected region, highlighting the importance of trans-jurisdictional surveillance to maximize its benefit in reducing HIV infection. We further postulate that laboratory surrogate markers for HIV care continuum metrics such as HIV-1 VL will provide an important discriminating factor to guide clinical and public health interventions.

Study Population
This is a retrospective molecular epidemiology study, using HIV sequencing data generated as part of routine clinical care from PWH receiving medical treatment in the mid-Atlantic region including DC, Maryland (MD) and Virginia (VA). HIV pol data sequenced using the Sanger method from 2008-2015 (protease, reverse transcriptase, and, when available, integrase) were obtained under a data use agreement with one national clinical laboratory. Additional information including age, sex, state, zip code, and year of service was available to the research team. A subset of individuals had quantitative HIV-1 RNA VL testing performed for routine clinical care at the Clinical Laboratory Improvement Amendments of 1988 (CLIA) certified national reference laboratory using a U.S. Food and Drug Administration cleared test method from Roche Molecular Systems (Branchburg, NJ, USA) with a lower limit of detection of 1.3 log10 copies/mL (c/mL). Data from samples obtained within a year of the sequencing date were included in this analysis. When zip code or state of residence was not provided, we used the location where medical service was provided as a location marker.

Sequence Alignment and Characterization
The earliest sequence available for each individual aged ≥18 years, obtained during 2014-2015, was used. We aligned protease (Pr) and reverse transcriptase (RT) regions of HIV-1 pol sequences in Clustal Omega (version 1.2.1) using HXB2 Pr-RT as the reference sequence, spanning nucleotides 2252-3443 of the HXB2 reference sequence [25]. The Sierra pipeline to the Stanford HIV Database was used for sequence evaluation (subtype, gene coverage, and quality) and assessment of HIV-1 drug resistance [26]. A proportion of ambiguous nucleotides ≤0.5% was used to indicate early or recent HIV infection, that is, within one year of infection [23,24].
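The ambiguity criterion above can be illustrated with a short sketch. This is not the authors' pipeline: the function names are hypothetical, and the choice to count every IUPAC multi-base code (including N) as ambiguous while ignoring gaps is an assumption made for illustration only.

```python
# Minimal sketch (assumption-laden): percentage of ambiguous base calls in a
# Sanger-derived sequence, compared with the 0.5% cutoff used in the text.

# IUPAC codes that stand for more than one possible nucleotide.
# Counting "N" as ambiguous is an assumption, not stated in the source.
AMBIGUITY_CODES = set("RYKMSWBDHVN")
UNAMBIGUOUS_CODES = set("ACGT")


def ambiguity_percentage(sequence: str) -> float:
    """Return the percentage of called bases that are IUPAC ambiguity codes.

    Gap characters and any symbol outside the IUPAC alphabet are ignored.
    """
    seq = sequence.upper()
    ambiguous = sum(1 for base in seq if base in AMBIGUITY_CODES)
    unambiguous = sum(1 for base in seq if base in UNAMBIGUOUS_CODES)
    called = ambiguous + unambiguous
    if called == 0:
        raise ValueError("sequence contains no nucleotide calls")
    return 100.0 * ambiguous / called


def is_probable_early_infection(sequence: str, threshold_pct: float = 0.5) -> bool:
    """Apply the <=0.5% ambiguity cutoff described in the text."""
    return ambiguity_percentage(sequence) <= threshold_pct


# Hypothetical 1000-base toy sequence with two ambiguity codes (R and Y).
toy_sequence = "ACGT" * 249 + "RYAC"
toy_is_early = is_probable_early_infection(toy_sequence)
```

With two ambiguous calls among 1000 bases, the toy sequence falls below the cutoff and would be flagged as probable early infection under these assumptions.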

Bayesian Analysis
To probe the likelihood that clusters actually represented local transmission networks, the specificity of cluster designation was assessed using a probability-based approach, Bayesian evolutionary analysis sampling trees (BEAST) [29]. We identified sequences that were in groups of four or more at a GD ≤3% using the ML method. To verify the robustness of the transmission clusters, we performed a BLAST search in the GenBank HIV sequence database for each sequence in this subset and selected the ten sequences most similar to each sequence in our data set [30]. After excluding duplicate sequences, sequences without dates, and those of significantly shorter length, we were left with 120 GenBank sequences. A total of 183 sequences, comprising the 62 study sequences, the 120 GenBank sequences, and one additional subtype D reference sequence, were used for the BEAST analysis. The sequences were aligned as above. BEAST analysis was performed using the GTR+I+G substitution model, an uncorrelated log-normal relaxed molecular clock [31], the Gaussian Markov random field (GMRF) skyride coalescent tree prior [32,33], and a Markov chain Monte Carlo (MCMC) run of 135 million states to achieve a posterior effective sample size of 302. Clusters of sequences that grouped together with a posterior probability nodal support of 95% or greater were identified and compared to the original clusters identified using the ML method to confirm the robustness of grouping.

Network Analysis
Network analysis was performed to visually display the effect of different GD thresholds on the network composition. The aligned HIV sequence data were uploaded into HIV-TRACE [34] to implement the Tamura-Nei 93 (TN93) substitution model, which determines the pairwise distances between sequences [35]. The ambiguities between nucleotide pairs were resolved at an ambiguity fraction of 0.05 and the overlap was set at 500. The program generated a list of the nodes (representing sequences) and edges (connections between the nodes) used to derive network figures. The node and link attribute files were integrated with the remaining metadata and uploaded to the web-based HIV-TRACE visualization tool. Network figures were generated for GD thresholds of 0.5%, 1.5%, 2.0%, and 3.0%.
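To make the effect of the GD threshold concrete, the sketch below builds clusters as connected components of a graph whose edges join sequence pairs at or below a chosen distance. It is an illustration only and does not call HIV-TRACE: the pairwise distances are invented toy values, the function name is hypothetical, and TN93 distance estimation itself is assumed to have been done beforehand.

```python
from collections import defaultdict


def clusters_at_threshold(distances, threshold):
    """Return clusters (sorted lists of sequence IDs) formed by linking every
    pair whose distance is <= threshold and taking connected components.

    `distances` maps (id_a, id_b) tuples to a pairwise genetic distance.
    Sequences with no link at the threshold are omitted, mirroring the
    'linked to at least one other node' definition of a network node.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical pairwise distances (substitutions per site), not study data.
toy_distances = {
    ("s1", "s2"): 0.004,
    ("s2", "s3"): 0.018,
    ("s3", "s4"): 0.028,
    ("s5", "s6"): 0.012,
    ("s1", "s5"): 0.045,
}

# The four thresholds named in the text: 0.5%, 1.5%, 2.0%, and 3.0%.
toy_networks = {t: clusters_at_threshold(toy_distances, t)
                for t in (0.005, 0.015, 0.02, 0.03)}
```

In this toy example the single dyad found at 0.5% grows into a four-member cluster at 3.0%, which is the kind of threshold dependence the network figures are meant to display.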

Statistical Analysis
We used R script to generate descriptive statistics to characterize the frequency and distribution of demographic and cluster data within the study population [36]. We conducted comparisons between groups using t-test and chi square tests for continuous and categorical data, respectively.
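As an illustration of the categorical comparison, the sketch below recomputes a Pearson chi-square test for cluster membership by sex, using the counts reported in the Cluster Data section (349 of 1762 males and 107 of 823 females in clusters). The study used R; this Python version, the function name, and the choice to omit a continuity correction are assumptions, so the statistic may differ slightly from the authors' output.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function reduces to
    # erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts taken from the text: clustered vs. not clustered, by sex.
males_in, males_total = 349, 1762
females_in, females_total = 107, 823

statistic, p_value = chi_square_2x2(
    males_in, males_total - males_in,
    females_in, females_total - females_in,
)
```

Under these assumptions the statistic is roughly 18 on one degree of freedom, consistent with the reported p < 0.0001.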

Ethical Considerations and Disclosures
This study was reviewed and approved by the Georgetown University Institutional Review Board under the terms of the data use agreement allowing the use of limited personal identifiers including zip code, with appropriate data security safeguards instituted by the Georgetown University Information Systems. Data from zip codes with fewer than 20 individuals were omitted to avoid unintended loss of privacy for these individuals. The terms of use limit access to these data to the study team and their delegates that include members of the national reference laboratory that generated these data. Publications resulting from this data set required review and approval by the national reference laboratory.

Cluster Data
Among 2589 individuals with subtype B HIV-1, 456 (17.6%) aligned within 193 clusters using the ML method (GD ≤3%, ≥90% bootstrap support) (Table 2). The majority of clusters were dyads (149) and triads (32). There were 12 clusters of ≥4 individuals: 4 clusters of 4; 5 clusters of 5; and 1 cluster each with 6, 7, and 8 individuals. Proportionally, males were more likely to be in clusters than females, 349/1762 (19.8%) vs. 107/823 (13.0%), p < 0.0001. The age distribution of individuals based on cluster designation is shown in Figure 1. When using a more stringent genetic distance threshold of 2%, the number of clusters decreased to a total of 109: 88 dyads, 14 triads, four clusters of four, and three clusters of five individuals. Even fewer clusters (72) were identified using a genetic distance cutoff of 1.5%, with 59 dyads, 10 triads, and 3 clusters of 4 individuals. At the genetic distance threshold of 0.5%, only 13 dyads and 1 triad were identified.
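The cluster counts above can be cross-checked with simple arithmetic. The sketch below tallies the reported cluster-size distributions at each threshold; the dictionary layout and function name are illustrative choices, while the counts themselves are taken from the text.

```python
# Reported cluster-size distributions: {threshold: {cluster size: number of clusters}}.
reported = {
    "3.0%": {2: 149, 3: 32, 4: 4, 5: 5, 6: 1, 7: 1, 8: 1},
    "2.0%": {2: 88, 3: 14, 4: 4, 5: 3},
    "1.5%": {2: 59, 3: 10, 4: 3},
    "0.5%": {2: 13, 3: 1},
}


def summarize(size_counts, min_size=2):
    """Return (number of clusters, number of individuals) for clusters with
    at least `min_size` members."""
    clusters = sum(n for size, n in size_counts.items() if size >= min_size)
    individuals = sum(size * n for size, n in size_counts.items()
                      if size >= min_size)
    return clusters, individuals


totals = {gd: summarize(counts) for gd, counts in reported.items()}

# Individuals in clusters of four or more at the 3% threshold.
large_clusters, large_members = summarize(reported["3.0%"], min_size=4)

# Share of the 2589 subtype B individuals who clustered at 3%.
pct_clustered = 100.0 * totals["3.0%"][1] / 2589
```

The tallies reproduce the figures in the text: 193 clusters containing 456 individuals (17.6%) at 3%, with 62 individuals in the 12 clusters of four or more.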

BEAST Analysis
All the clustered sequences with ≥4 members found using the maximum likelihood approach with a genetic distance threshold of 3% remained in clusters, with posterior probability support ≥0.95 (Figure 2a,b), when analyzed using BEAST. Twenty-one of the sixty-two individuals had HIV-1 VL available, and 18/21 (85.7%) had HIV-1 VL ≥3.0 log10 c/mL. There was evidence of transmitted K103N/K103S drug resistance mutations within a large transmission cluster, as shown in Figure 2b, with the resistance mutation noted among multiple members of the cluster. The range and frequency of major HIV-1 resistance mutations are shown in Figure S1.

Figure 2. (a) HIV-1 sequences that were identified in clusters of size four or more based on GD 3% using the maximum likelihood method were included alongside the most closely related sequences from the GenBank HIV Sequence database at Los Alamos (labeled "LA"). Bayesian evolutionary analysis sampling trees (BEAST) [29] analysis was performed using the GTR+I+G substitution model, an uncorrelated log-normal relaxed molecular clock, the tree coalescent assumption of the Gaussian Markov random field (GMRF) skyride [31][32][33], and a run of 135 million states to achieve an effective sample size of 302. The cluster relationships were maintained with high posterior probability of ≥95% when analyzed using this BEAST approach. Male participants are indicated in purple, females in green, and sequences from Los Alamos in black. (b) Cluster characteristics: HIV-1 drug resistance mutations and viral load. HIV-1 viral load data are written in black with the corresponding sequence when available. Among the 21 individuals with HIV-1 VL data available, 18 (85.7%) had levels ≥3 log10 c/mL. Four HIV-1 drug resistance associated mutations were identified in two clusters, including the resistance mutations Y188H, K101E, K103N/S, and G190S. One cluster had multiple cluster members with drug resistance mutations.

Network Analysis
Using HIV-TRACE, we identified 519 nodes linked to at least one other node at GD 3.0% (Figure 3a-d). A total of 189 clusters were formed, of which the largest had 33 nodes with 46 connected links at a mean genetic distance of 2.8% and 1.39 links per node. The second-largest dense cluster consisted of 15 nodes making 38 links with a higher transmission speed of 2.53 links per node. The male transmission clusters comprised 66% of identified networks and the female clusters were 10.5%. Recent infection based on the proportion of nucleotide ambiguity ≤0.5% identified a higher number of nodes from 2014 than from 2015. Networks spanning more than one jurisdiction were identified in 75 nodes comprising 14% of individuals identified to be in a transmission cluster.
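The links-per-node values quoted above follow directly from the reported node and link counts. The short sketch below reproduces that arithmetic; the helper name is hypothetical and nothing here is computed by HIV-TRACE itself.

```python
def links_per_node(n_links: int, n_nodes: int) -> float:
    """Average number of links per node in a cluster."""
    if n_nodes <= 0:
        raise ValueError("a cluster must contain at least one node")
    return n_links / n_nodes


# Counts taken from the text.
largest = links_per_node(n_links=46, n_nodes=33)
second_largest = links_per_node(n_links=38, n_nodes=15)

# Share of linked nodes that belong to networks spanning jurisdictions.
cross_jurisdiction_pct = 100.0 * 75 / 519
```

Rounded to two decimals these give 1.39 and 2.53 links per node, and about 14% of the 519 linked nodes fall in trans-jurisdictional networks.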

Discussion
This collaborative project was designed to determine the utility of regional sequence and HIV-1 VL data generated as part of clinical care for the characterization of HIV transmission patterns, given the highly interconnected mid-Atlantic metropolitan DC area. An important element of our research is the public-private-academic partnership that allowed us to access this representative regional dataset generated in the course of routine clinical care with multi-jurisdictional representation under a legally binding data sharing agreement. Enrolling such a comprehensive cohort to obtain sequence and laboratory data for research purposes only would have been challenging and costly. With the safeguards that we have placed both in the design and implementation of the study to protect individuals' privacy, we demonstrate the feasibility and utility of such collaborations. Our findings on the relative frequency of trans-jurisdictional transmission clusters in this highly interconnected region highlight the need to overcome restrictions in data sharing based on traditional jurisdictional borders, regional or national. Such restrictions limit collaborations, and innovative strategies that span jurisdictions are essential to adequately respond to the call to end the HIV epidemic.
We demonstrate the high fidelity of rapid maximum likelihood phylogenetic approaches for identifying transmission clusters, further validated with the more time-intensive probability-based sequence and network analyses. Overall, we rarely identified large transmission clusters, with most clusters containing two or three sequences in this convenience sample of sequences. There were few transmission clusters of size 4 or greater. This is likely the result of the sampling framework, which was not focused exclusively on newly diagnosed PWH, but rather included all individuals for whom HIV sequence data were generated for clinical management, including those with chronic HIV-1 infection. In addition to MSM and younger individuals, we also identified a subset of individuals in their fourth or higher decades of life, including women, in transmission clusters. Interventions to end the HIV epidemic in the region will also need to target these important and often missed sub-populations that remain at risk for HIV-1 infection.
Evidence supporting the validity of rapid maximum likelihood based phylogenetic methods shows that large data sets can be analyzed to support contact tracing within relatively short periods of time, with computational capacity that is generally available in many public health settings. In this context, more recent uses of HIV sequence data have emerged that focus on identifying new HIV outbreaks. These projects have primarily used very narrow genetic distance thresholds of 0.5-1.5% [13,14]. While this may be appropriate for identifying emerging large outbreaks, these cutoffs yield reduced case numbers for investigation when applied to our regional dataset, with only fourteen clusters (thirteen dyads and one triad) fitting the most stringent metric. Most individuals found in larger clusters, identified using the genetic distance threshold of 3% and verified using a network and probability based phylogenetic analysis with BEAST, would have been missed. A significant proportion of individuals with evidence of linked transmission in our analyses demonstrated a very low proportion of nucleotide ambiguity, suggesting that some of these infections could indeed have been more recently acquired. The relatedness of individuals within clusters using a less stringent genetic distance cutoff could be epidemiologically important in a mature epidemic such as in the mid-Atlantic region, where linked transmissions may span decades. The consistent findings despite using different methods of inference to determine sequence relatedness justify using a 3% genetic distance threshold to robustly identify relevant transmission clusters. The persistence of the identified clusters after the addition of the most closely related sequences from GenBank supports likely epidemiologic linkages among the persons from whom sequences were obtained.
While currently accepted genetic distance thresholds in public health molecular surveillance programs are set to identify emerging outbreaks, we demonstrated how these thresholds would miss epidemiologically important targets for intervention in the context of an established generalized epidemic. Relaxing the stringency beyond 3%, though, appeared counter-productive in our case as it may result in artifactual linkages, a finding that has previously been reported by others [13].
The addition of selected laboratory testing such as HIV-1 VL further increases the potential utility of molecular sequence data. HIV-1 VL is an important marker of linkage, engagement and retention in care in the context of the HIV care continuum [37]. The most recent report indicates that 57% of the estimated 1.2 million individuals with HIV in the U.S. had viral suppression [38]. Ideally, all individuals with detectable viral load would benefit from interventions to support linkage to care. We demonstrate how overlaying molecular epidemiology with HIV-1 VL data allows for further risk stratification, such that individuals with evidence for recent infection based on sequence ambiguity, cluster membership, and high HIV-1 VL would be preferentially targeted for support services if resources are constrained. Natural history studies have demonstrated HIV-1 VL of approximately 3 log10 c/mL as the threshold for transmission risk, with escalating VL associated with increasing transmission risk [39]. More recent data have identified that ART use with durable viral suppression decreases risk of transmission [5][6][7]. The relatively high proportion of individuals within clusters with HIV-1 VL ≥3 log10 c/mL highlights the need for targeted and effective ART programs. While all individuals should receive ART for their own health, those who are in transmission clusters with high VL could be prioritized for adherence support and interventions to halt further transmission. Individuals who are in dyads and triads are often not considered high value targets for intervention and contact tracing but are more frequently identified than the larger transmission networks that are currently the target of public health interventions [40]. The addition of HIV-1 VL data to transmission cluster data allows one to discern which individuals are biologically more likely to transmit and hence should be prioritized for supportive intervention. Such an approach may allow us to break through the current plateau in new HIV-1 diagnoses towards the goal of ending the HIV epidemic.
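The risk stratification described above can be sketched as a simple tally of how many of the three markers each individual meets. This is a hypothetical illustration, not a scoring scheme used or validated in the study: the record fields, the equal weighting of criteria, and the example values are all assumptions, while the 0.5% ambiguity and 3 log10 c/mL cutoffs come from the text.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

AMBIGUITY_THRESHOLD_PCT = 0.5   # cutoff for probable recent infection (from the text)
VL_THRESHOLD_LOG10 = 3.0        # approx. transmission-risk threshold (from the text)


@dataclass
class Record:
    """Hypothetical per-person record; field names are illustrative."""
    person_id: str
    ambiguity_pct: float
    in_cluster: bool
    vl_copies_per_ml: Optional[float]  # None when no VL result is available


def criteria_met(record: Record) -> int:
    """Count how many of the three markers (recent infection, cluster
    membership, high VL) apply. Equal weighting is an assumption."""
    recent = record.ambiguity_pct <= AMBIGUITY_THRESHOLD_PCT
    vl = record.vl_copies_per_ml
    viremic = vl is not None and vl > 0 and math.log10(vl) >= VL_THRESHOLD_LOG10
    return int(recent) + int(record.in_cluster) + int(viremic)


def prioritize(records: List[Record]) -> List[Record]:
    """Order records from most to fewest criteria met (stable for ties)."""
    return sorted(records, key=criteria_met, reverse=True)


# Invented example records, not study data.
toy_records = [
    Record("p1", ambiguity_pct=1.2, in_cluster=False, vl_copies_per_ml=200),
    Record("p2", ambiguity_pct=0.3, in_cluster=True, vl_copies_per_ml=25000),
    Record("p3", ambiguity_pct=0.4, in_cluster=True, vl_copies_per_ml=None),
]
toy_order = [r.person_id for r in prioritize(toy_records)]
```

A missing VL result is treated here as "criterion not met", which is a conservative simplification; a real program would need to decide how to handle individuals without VL data.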
Earlier molecular epidemiology studies in clinical settings have demonstrated the decreased transmission risk with antiretroviral treatment and viral suppression [41]. We demonstrate how molecular epidemiologic elements can bolster efforts to identify individuals with potential for transmission at the local population level, using routinely collected clinical management laboratory data. Ideally, all individuals with viremia should receive interventions to support engagement in care. In the absence of unlimited resources, however, priority could be directed to those with elevated viremia for whom there is molecular epidemiologic support for risk for transmission [42], thereby providing an additional incentive for intervening both for the sake of the individual's health and to reduce transmission to others. Such active monitoring with expanded use of molecular data in conjunction with clinical metadata such as HIV-1 VL would contextualize and refine assessments of transmission risk [42]. Individuals in larger clusters who remain viremic have greater potential to transmit HIV, a risk that could be effectively mitigated with appropriate outreach to ensure engagement in care and effective suppressive antiretroviral therapy. Discovery of risk before transmission and successful mitigation represents the holy grail of infectious disease control, and successful implementation would serve to accelerate the progress towards ending the HIV epidemic.
One of the limitations of our study is that the sampling frame covers only an estimated quarter of the data that are generated from individuals in the region. While the lower sampling density may decrease the number of transmission clusters that are identified due to missing data, our findings suggest that the identified clusters are robust. The persistence of the identified clusters after the addition of the most closely related sequences from GenBank suggests that the inferred relationships were not simply due to sampling bias. Our data are limited to PWH receiving care and do not capture the important subset of PWH who are not in care. Although having near complete sampling from the affected community would result in finding more transmission clusters, even this subset yields important information with actionable findings, of the kind pivotal to efforts to end the HIV epidemic. While individual data related to social contacts were not available in our data set, partner information is often under-reported [43]. The dataset does not include ART histories, which limits full characterization of the HIV care continuum for the studied population.
Lastly, there has been a mixed response to using molecular epidemiology to characterize transmission dynamics both in the scientific community and among advocacy groups [44][45][46][47]. It remains critical to assess any presumed negative consequences. Our analysis validates the utility of genotypic and viral load data even in the absence of detailed epidemiologic data and may be an important adjunctive public health tool in areas where concerns related to stigma and marginalization may limit the engagement of people living with HIV [48].

Summary
Optimal use of molecular genotypic data should include more relaxed genetic distance cutoffs to identify putative transmission clusters. These data should be analyzed in the context of additional routine clinical and laboratory data that are collected during the implementation and monitoring of standard of care treatment and prevention programs. Such an approach would provide context and identify opportunities for intervention to guide care teams to provide differentiated care services and optimize resource allocation and utilization. Only through bold and integrated interventions will we reach the next milestone towards ending the HIV epidemic.
Funding: This research received funding from the Washington, DC, Metropolitan Site of the Women's Interagency Human Immunodeficiency Virus (HIV) Study (WIHS) (multi-principal investigators, M. Y. and S. G. K.), which is supported by the National Institutes of Health (NIH), National Institute of Allergy and Infectious Diseases (NIAID; U01AI034994) and co-funded by the National Cancer Institute, the National Institute on Drug Abuse, and the Eunice Kennedy Shriver National Institute of Child Health and Human Development. This project was funded in part with federal funds (grant KL2TR000102, previously KL2RR031974) from the National Center for Research Resources and the National Center for Advancing Translational Sciences, a trademark of DHHS, part of the Re-Engineering the Clinical Research Enterprise. This research has also been facilitated by the services and resources provided by the District of Columbia Center for AIDS Research, an NIH funded program (P30AI117970), which is supported by the following NIH Co-Funding and Participating Institutes and Centers: NIAID, NCI, NICHD, NHLBI, NIDA, NIMH, NIA, NIDDK, NIMHD, NIDCR, NINR, FIC and OAR. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Informed Consent Statement: Patient consent was waived as the study made use of existing data that were used for research purposes as a limited data set with limited identifiers: ZIP codes, birth dates, or other dates only. Consent was not sought as it was not thought to be feasible given the retrospective design of the study, and the research could not be practicably conducted without the waiver of consent.

Data Availability Statement:
The data were made available to the study team through a Data Licensing and Services agreement between QUEST Diagnostics and Georgetown University, which does not permit transfer of the data. The national reference laboratory has rights to the data.