Spatial Clustering of County-Level COVID-19 Rates in the U.S.

Despite the widespread prevalence of cases associated with the coronavirus disease 2019 (COVID-19) pandemic, little is known about the spatial clustering of COVID-19 in the United States. Data on COVID-19 cases were used to identify U.S. counties with high and low COVID-19 incident proportions and to locate clusters of such counties. Our results suggest that a variety of sociodemographic variables are associated with the severity of county-level COVID-19 incident proportions. Early in the pandemic, communities of color were disproportionately affected; the burden subsequently shifted from communities of color and metropolitan areas to rural areas in the U.S. In our final period, county characteristics differed little across cluster types, suggesting that COVID-19 infections had become more widespread. These findings may help address the systemic barriers and health disparities that may result in clusters of high COVID-19 incident proportions.


Introduction
Since late 2019, coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has rapidly spread around the globe [1][2][3]. Early studies suggested that SARS-CoV-2 originated in a local market in China, where it was transmitted from animals to humans [4]. Initial COVID-19 cases were reported in November 2019, and the World Health Organization (WHO) declared a pandemic on 11 March 2020 [5]. Emerging research suggests that SARS-CoV-2 is transmitted from person to person through respiratory droplets or direct contact with an infected person [1,3,4,6,7]. As of August 2021, there were over 202 million COVID-19 cases and over four million deaths globally [8]. The largest shares of cumulative cases were concentrated in the Americas, Europe, and Asia [8].
The spread of COVID-19 within the United States (U.S.) may be influenced by sociodemographic conditions that vary geographically [9][10][11][12][13]. For example, initial U.S. reports identified geographic disparities in the availability of personal protective equipment, ventilators, intensive care unit beds, hospital beds, and other vital medical resources necessary to treat COVID-19 [9,14,15]. As more data have become publicly available [16,17], it has become apparent that racial/ethnic disparities in COVID-19 incident proportions and associated fatalities within the U.S. are likely attributable to the intersection of long-standing social injustices and structural discrimination with socioeconomic status (SES) and built (or physical) environmental factors. Many of these factors distribute the risk for COVID-19 unequally across the U.S. [18,19]. Urban areas are also important to study in the context of the COVID-19 pandemic. For example, residents of urban areas are more likely to be subjected to racial and socioeconomic residential segregation, which can inequitably expose them to a range of factors that increase both the transmission of the virus and the severity of the disease [20].
Some county-level characteristics may be correlated with COVID-19. Recent studies focusing on neighborhood social contexts demonstrated that poverty, comorbidities, and race/ethnicity are important correlates of COVID-19 outcomes [7]. Other studies showed that living in poverty may increase the risk of contracting COVID-19 through impaired access to healthcare [21,22], a higher risk of comorbidities [23], and an impaired ability to practice physical or social distancing [24]. Using cross-sectional data within the United States, researchers have also examined the relationship between county-level sociodemographic risk factors and COVID-19 incidence and mortality. Their analyses suggest that increases in county-level social vulnerability indices, especially the proportion of minority residents and limited English language proficiency, were related to increases in both incidence and mortality rates [25]. These findings have been supported by other articles using preliminary data from state health departments and other anecdotal evidence [26][27][28][29][30]. The racial/ethnic disparities may be attributable to the legacy of institutional racism, which influences the rates of chronic disease and access to healthcare coverage [31]. Taken together, these converging sociodemographic and structural factors may be essential to understanding the burden of COVID-19 within the U.S.
Although COVID-19 data continue to emerge, limited research has explored the spatial clustering of COVID-19 cases. To the best of our knowledge, only one study has used geographically weighted regression models (i.e., models that examine spatial relationships at a variety of geographic scales and provide nonparametric estimates) to elucidate geospatial patterns of COVID-19 incidence proportions within the continental U.S. in relation to sociodemographic and environmental variables [32,33]. Its results indicated that a combination of income indicators, healthcare professionals, and the percentage of Black females could explain the variability of disease incidence within the contiguous U.S. [32]. However, that study did not describe the characteristics of counties by clustering type; rather, its analyses identify regions or states with a higher disease burden.
Other studies have used complex modeling to examine racial/ethnic disparities in COVID-19 rates in the United States. Specifically, researchers applied structured compartmental models to seroprevalence data from the state of New York to examine immunity thresholds, final epidemic sizes, and COVID-19 risk across groups [34]. Their models suggest that the higher cumulative incidence among Hispanics and non-Hispanic Blacks compared with non-Hispanic Whites reflected racial/ethnic inequalities in both individual- and community-level socioeconomic status indicators [34]. Additionally, researchers have used machine learning methods to examine the role of racial residential segregation in COVID-19 infection and mortality [35]. Their models suggest that counties one standard deviation above the mean for racial residential segregation were more likely than other counties to have higher infection and mortality rates [35]. Together, these studies highlight the inequitable burden of the COVID-19 pandemic on communities of color within the United States.
In contrast, spatial clustering analyses with Moran's I can identify specific counties with higher or lower case rates relative to their surrounding counties, and the resulting clusters can then be compared on sociodemographic characteristics that may influence COVID-19 incidence. Thus, the objectives of this study were to use spatial clustering analyses to determine whether COVID-19 incident proportions vary spatially in the U.S., how spatial clustering changes over time, and which sociodemographic characteristics describe counties within high COVID-19 incident proportion clusters. We hypothesized that clusters of high COVID-19 incident proportions would occur in areas with lower socioeconomic status and adverse community conditions.

Data and Measures
The number of COVID-19 cases by county was downloaded from USAFacts. To examine how spatial clusters changed over time, four distinct periods were defined as quartiles of the cumulative cases as of 30 April 2021: 22 January-16 May 2020, 17 May-9 September 2020, 10 September 2020-4 January 2021, and 5 January-30 April 2021. The total number of persons residing in each county was obtained from the 2019 population estimates downloaded from the U.S. Census Bureau [36].
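The text defines the periods as quartiles of the cumulative case count as of 30 April 2021 but does not spell out the procedure. One plausible reading, sketched below in Python, is that each cut-point is the first date on which the national cumulative count reaches 25%, 50%, or 75% of the 30 April 2021 total. This reading is an assumption, and the function names `quartile_cutoff_dates` and `periods_from_cutoffs` are hypothetical; the input is a date-sorted list of (date, cumulative cases) pairs, such as national totals summed from the county file.

```python
from datetime import date, timedelta


def quartile_cutoff_dates(daily_cumulative):
    """Return the first dates on which cumulative cases reach 25%, 50%, and 75%
    of the final total.

    daily_cumulative: list of (date, cumulative_cases) pairs sorted by date and
    ending on the last study day (30 April 2021 in the text).
    """
    total = daily_cumulative[-1][1]
    cutoffs = []
    for share in (0.25, 0.50, 0.75):
        target = share * total
        # First day whose cumulative count meets or exceeds this share.
        for day, cases in daily_cumulative:
            if cases >= target:
                cutoffs.append(day)
                break
    return cutoffs


def periods_from_cutoffs(start, end, cutoffs):
    """Turn cut-point dates into consecutive, non-overlapping (first, last) periods."""
    periods = []
    period_start = start
    for cut in cutoffs:
        periods.append((period_start, cut))
        period_start = cut + timedelta(days=1)  # next period starts the following day
    periods.append((period_start, end))
    return periods


if __name__ == "__main__":
    # Synthetic series for illustration only: cumulative cases grow by 10 per day.
    start = date(2020, 1, 22)
    series = [(start + timedelta(days=k), 10 * k) for k in range(100)]
    cuts = quartile_cutoff_dates(series)
    for first, last in periods_from_cutoffs(start, series[-1][0], cuts):
        print(first, "to", last)
```

With real national totals, this reading would be confirmed only if the computed cut-points reproduce the reported boundaries (16 May 2020, 9 September 2020, and 4 January 2021).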
To standardize the number of COVID-19 cases across the U.S., the number of COVID-19 cases in each county was divided by the county's total population and multiplied by 100,000. The COVID-19 incident proportion is thus defined as the total number of positive cases divided by the total population within each county, expressed per 100,000 persons. Incident cases for each period were calculated by subtracting the cumulative case count on the first day of the period from the count on the last day; this difference was then divided by the county population and multiplied by 100,000 to obtain rates per 100,000 persons. An incident proportion was created for each county in the contiguous U.S., excluding Hawaii, Alaska, and U.S. territories [37]. Additional data on county-level racial/ethnic and age composition, socioeconomic factors, health outcomes, and health behaviors were downloaded from the 2020 Robert Wood Johnson Foundation's County Health Indicator data (Table A1) [38]. The county-level characteristics selected for this study have been used in previous studies [39][40][41][42].
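The per-period rate described above can be written as a small Python function. This is a sketch: the function names `incident_proportion_per_100k` and `period_rates` are hypothetical, and the code follows the text literally by subtracting the cumulative count on the first day of the period from the count on the last day.

```python
def incident_proportion_per_100k(cum_cases_first_day, cum_cases_last_day, population):
    """New cases in one period per 100,000 residents of one county."""
    if population <= 0:
        raise ValueError("population must be positive")
    new_cases = cum_cases_last_day - cum_cases_first_day
    # Multiply before dividing to limit floating-point rounding.
    return new_cases * 100_000 / population


def period_rates(county_records):
    """Map each county key to its rate.

    county_records: {county_key: (first-day cumulative count,
                                  last-day cumulative count,
                                  population)}
    """
    return {
        key: incident_proportion_per_100k(first, last, pop)
        for key, (first, last, pop) in county_records.items()
    }


if __name__ == "__main__":
    # Hypothetical counties, not study data.
    print(period_rates({"county_a": (1200, 3450, 75000), "county_b": (0, 0, 5000)}))
```

Subtracting the count from the day before a period begins, rather than from its first day, would also capture cases reported on the first day; the description above uses the first-day count.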

Geographic Information System (GIS) Process
2018 U.S. county cartographic boundary shapefiles were downloaded from the U.S. Census Bureau and imported into ArcGIS 10.5.1 (ESRI, Redlands, CA, USA). Only counties in the contiguous U.S. were included in the analyses [40]. The COVID-19 incident proportions for each period were mapped as choropleth maps in ArcGIS 10.5.1 across the contiguous U.S. counties and independent cities.
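Restricting a county layer to the contiguous U.S. is commonly done with state FIPS codes, the first two digits of a county's 5-digit FIPS code (carried as the GEOID attribute in the Census cartographic boundary files). The Python sketch below shows that filter; the function names are hypothetical, and keeping the District of Columbia (state FIPS 11) is an assumption, since the text refers only to contiguous states and independent cities.

```python
# State FIPS prefixes outside the contiguous U.S.: Alaska (02), Hawaii (15),
# American Samoa (60), Guam (66), Northern Mariana Islands (69),
# Puerto Rico (72), and the U.S. Virgin Islands (78).
NON_CONTIGUOUS_STATE_FIPS = {"02", "15", "60", "66", "69", "72", "78"}


def is_contiguous_county(geoid):
    """True if a 5-digit county FIPS code (e.g. '06037') lies in the contiguous U.S."""
    if len(geoid) != 5 or not geoid.isdigit():
        raise ValueError(f"expected a 5-digit county FIPS code, got {geoid!r}")
    return geoid[:2] not in NON_CONTIGUOUS_STATE_FIPS


def contiguous_only(geoids):
    """Keep only contiguous-U.S. counties; DC (state FIPS 11) is kept by assumption."""
    return [g for g in geoids if is_contiguous_county(g)]


if __name__ == "__main__":
    # Los Angeles County, Anchorage, Honolulu, San Juan, DC, Alexandria city (VA)
    sample = ["06037", "02020", "15003", "72127", "11001", "51510"]
    print(contiguous_only(sample))
```

Independent cities such as Alexandria, Virginia (51510) have their own county-equivalent FIPS codes, so they pass through the filter like any other county.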

Statistical Analysis
Four distinct periods of the data were analyzed in a two-step process. In the first step, Global Moran's I was used to examine whether county-level COVID-19 incident proportions were spatially autocorrelated [43]. Moran's I values generally range from −1 to +1. A positive value indicates clustering: counties tend to have incident proportions similar to those of their neighbors. A negative value indicates dispersion: neighboring counties tend to have dissimilar incident proportions. As in previous work [40,44], inverse distance weighting was applied to these spatial relationships, so that the incident proportions of nearby counties have a greater influence on the computation for each county than those of counties farther away. The accompanying z-score expresses how many standard deviations the observed Moran's I lies from its expected value under the null hypothesis of spatial randomness; the larger the absolute z-score, the less likely the observed pattern arose by chance [45]. These analyses test for statistically significant clustering of COVID-19 across the counties of the 48 contiguous U.S. states. The second step of the data analysis used Anselin Local Moran's I to identify the specific counties within the contiguous U.S. whose high or low COVID-19 incident proportions were statistically different from those of nearby counties [46]. This analysis produces a map of the spatial distribution of clustering that identifies hot spots (counties with high COVID-19 incident proportions surrounded by similar counties), cold spots (counties with low COVID-19 incident proportions surrounded by similar counties), spatial outliers (counties with COVID-19 incident proportions that differ from those of nearby counties), and counties that do not fall within any cluster type. The tool classifies each county into one of these groups using the COVID-19 prevalent/incident cases, computing a local Moran's I value, a z-score, a p-value, and a classification identifier [46].
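The first analytic step described above can be sketched in Python with NumPy. The sketch builds inverse-distance weights from county centroid coordinates (w_ij = 1/d_ij, zero on the diagonal, row-standardized), computes I = (n / S0) Σ_i Σ_j w_ij z_i z_j / Σ_i z_i² from deviations z_i, and derives a z-score under the normality assumption with E[I] = −1/(n − 1). These are assumptions: ArcGIS's Spatial Autocorrelation (Global Moran's I) tool has its own settings (for example, distance band and standardization), so its output need not match this sketch exactly, and the function names are hypothetical.

```python
import math

import numpy as np


def inverse_distance_weights(coords, row_standardize=True):
    """Weights w_ij = 1 / d_ij between distinct locations, with w_ii = 0."""
    pts = np.asarray(coords, dtype=float)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    off_diagonal = ~np.eye(len(pts), dtype=bool)
    w = np.zeros_like(dist)
    w[off_diagonal] = 1.0 / dist[off_diagonal]  # assumes no duplicate coordinates
    if row_standardize:
        w /= w.sum(axis=1, keepdims=True)
    return w


def global_morans_i(values, w):
    """Return (Moran's I, z-score, two-sided p-value) under the normality assumption."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    s0 = w.sum()
    moran_i = (n / s0) * (dev @ w @ dev) / (dev @ dev)

    expected = -1.0 / (n - 1)
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=1) + w.sum(axis=0)) ** 2).sum()
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2

    z_score = (moran_i - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z_score) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return moran_i, z_score, p_value


if __name__ == "__main__":
    # Toy example (not study data): two groups of locations far apart on a line.
    coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
    values = [1, 1, 1, 1, 10, 10, 10, 10]
    i_stat, z, p = global_morans_i(values, inverse_distance_weights(coords))
    print(f"I = {i_stat:.3f}, z = {z:.2f}, p = {p:.4f}")
```

In the toy example, similar values sit close together, so I is positive; alternating high and low values along a line would instead give a negative I.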
The z-score and p-value for each county indicate whether that county's COVID-19 incident proportion differs significantly from what would be expected under spatial randomness [46]. These results also allow a basic comparison of the characteristics of the hot and cold spot clusters and their outliers. We then used t-tests to compare each cluster type with the unclustered counties to test for statistical differences in sociodemographic characteristics [47].
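The second step can be sketched the same way. The Python code below computes Anselin's local statistic I_i = (z_i / m2) Σ_j w_ij z_j, where z_i is a county's deviation from the mean and m2 is the (population) variance of the values, and labels each county as a hot spot (HH), cold spot (LL), high outlier (HL), low outlier (LH), or not significant (NS). Significance here comes from conditional-permutation pseudo p-values, the approach used by PySAL and GeoDa; ArcGIS's Cluster and Outlier Analysis tool has its own inference options, so this is an illustrative substitute rather than the study's exact procedure. The function names, the 999 permutations, and the 0.05 threshold are assumptions.

```python
from collections import Counter

import numpy as np


def inverse_distance_weights(coords, row_standardize=True):
    """Weights w_ij = 1 / d_ij between distinct locations, with w_ii = 0."""
    pts = np.asarray(coords, dtype=float)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    off_diagonal = ~np.eye(len(pts), dtype=bool)
    w = np.zeros_like(dist)
    w[off_diagonal] = 1.0 / dist[off_diagonal]  # assumes no duplicate coordinates
    if row_standardize:
        w /= w.sum(axis=1, keepdims=True)
    return w


def local_morans_i(values, w, n_perm=999, alpha=0.05, seed=0):
    """Return (local I, pseudo p-values, labels) for each location."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = (dev @ dev) / n
    lag = w @ dev  # weighted average of neighbors' deviations
    local_i = dev * lag / m2

    p_values = np.empty(n)
    for i in range(n):
        # Conditional permutation: keep county i's value fixed and randomly
        # reassign the other n - 1 values to its neighbors.
        others = np.delete(dev, i)
        w_i = np.delete(w[i], i)
        order = np.argsort(rng.random((n_perm, n - 1)), axis=1)
        perm_i = dev[i] * (others[order] @ w_i) / m2
        larger = np.sum(perm_i >= local_i[i])
        extreme = min(larger, n_perm - larger)  # folded one-sided count
        p_values[i] = (extreme + 1) / (n_perm + 1)

    labels = np.full(n, "NS", dtype=object)
    sig = p_values <= alpha
    labels[sig & (dev > 0) & (lag > 0)] = "HH"  # hot spot
    labels[sig & (dev < 0) & (lag < 0)] = "LL"  # cold spot
    labels[sig & (dev > 0) & (lag < 0)] = "HL"  # high outlier among low neighbors
    labels[sig & (dev < 0) & (lag > 0)] = "LH"  # low outlier among high neighbors
    return local_i, p_values, labels


if __name__ == "__main__":
    # Toy 10 x 10 grid (not study data) with a block of high values in one corner.
    coords = [(gx, gy) for gx in range(10) for gy in range(10)]
    values = [10.0 if gx < 5 and gy < 5 else 1.0 for gx, gy in coords]
    _, _, labels = local_morans_i(values, inverse_distance_weights(coords))
    print(Counter(labels))
```

Because every other county receives some inverse-distance weight, each permutation reshuffles all n − 1 remaining values; with a distance band or contiguity weights, only the neighbors' values would matter.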

Overall Distribution of COVID-19 Cases
Between the first reported COVID-19 case in the U.S. on 22 January 2020 and 16 May 2020, county-level incident proportions ranged from 0 to 12,247.43 cases per 100,000 persons (Figure 1). The highest COVID-19 incident proportions were found among counties in New England and the mid-Atlantic (primarily Massachusetts, Rhode Island, Connecticut, New York, New Jersey, Maryland, and Pennsylvania), southern Louisiana, southeastern Michigan, western New Mexico, southern Alabama, southern Mississippi, southern Georgia, northern Arizona, and northwestern Washington. The Global Moran's I for the contiguous U.S. was 0.25; this positive value indicates that COVID-19 incident proportions were spatially clustered at the county level. With a z-score of 50.72, there was less than a 1% chance that this pattern occurred by chance.
Between 17 May 2020 and 9 September 2020, incident COVID-19 proportions ranged from 0 to 14,355.26 per 100,000 persons. Higher incident proportions were found among counties throughout the Southern states (e.g., South Carolina, Florida, and Mississippi), southern and eastern Arizona, and interior California (Figure 2). The Global Moran's I for the contiguous U.S. was 0.47; this positive value indicates county-level spatial clustering. With a z-score of 92.36, there was less than a 1% chance that this pattern occurred by chance.
Between 10 September 2020 and 4 January 2021, incident COVID-19 proportions ranged from 0 to 14,094.04 per 100,000 persons. Higher incident proportions were found among counties in the Midwest (e.g., the Dakotas, Wisconsin, and Kansas), Tennessee, western Texas, and eastern New Mexico (Figure 3). The Global Moran's I for the contiguous U.S. was 0.53; this positive value indicates county-level spatial clustering. With a z-score of 105.12, there was less than a 1% chance that this pattern occurred by chance.
Between 5 January 2021 and 30 April 2021, incident COVID-19 proportions ranged from 0 to 15,181.75 per 100,000 persons. Higher incident proportions were found among counties in the Southwest (e.g., Arizona and interior California), interior Texas, eastern Michigan, and along the East Coast (e.g., Massachusetts, New Jersey, and the Carolinas) (Figure 4). The Global Moran's I for the contiguous U.S. was 0.35; this positive value indicates county-level spatial clustering. With a z-score of 68.37, there was less than a 1% chance that this pattern occurred by chance.

Anselin's Local Moran's I
The Anselin Local Moran's I classified each county in the contiguous U.S. based on its similarity to or difference from neighboring counties. As of 16 May 2020, statistically significant, widespread clusters of high COVID-19 incident proportions were located in the New England and Mid-Atlantic states (e.g., Massachusetts, New York, and Delaware), southwestern Georgia, Mississippi, southern Alabama, southeastern Louisiana, northern Texas, western Oklahoma, southwestern Kansas, and northern Arizona (Figure 5). High prevalence COVID-19 outlier counties were primarily found in northern Nevada. Significant clusters of counties with low numbers of COVID-19 cases were located in the Appalachian Mountains area, the midwestern states (including Wisconsin, Minnesota, North and South Dakota, Illinois, Nebraska, Iowa, and Montana), southern states (including Missouri, Arkansas, Kansas, Oklahoma, and Texas), and western states (e.g., Oregon, northern California, southern Utah, southeastern Idaho, and eastern Wyoming). Low prevalence outliers were primarily concentrated in southeastern Nebraska, western Iowa, northern Texas, southeastern Arkansas, northern Mississippi, central Alabama, and central Georgia.
As of 9 September 2020, high COVID-19 incident proportion clusters were primarily located in Southern states (Georgia, Alabama, Mississippi, Arkansas, southeastern Texas, and Louisiana), northwestern Iowa, Arizona, and interior California (Figure 6). Clusters with low COVID-19 incident proportions were primarily located in the New England and Mid-Atlantic states, the midwestern states, and western states (e.g., Montana, Idaho, Utah, northern California, Oregon, and Washington). High incidence outlier counties were located across the United States and included counties in Indiana, Illinois, Wisconsin, Missouri, the Dakotas, Nebraska, and Kansas. Low incidence outlier counties were primarily located in North Carolina, Georgia, Florida, Arkansas, southeastern Texas, southern Minnesota, and northwestern Iowa.
As of 4 January 2021, high COVID-19 incident proportion clusters were primarily located in the central U.S. and included North and South Dakota, Minnesota, Wisconsin, Illinois, Indiana, Nebraska, Kansas, Oklahoma, Missouri, Montana, Wyoming, Arkansas, Tennessee, and Texas (Figure 7). Low incidence clusters were located primarily along the coastal regions of the U.S., stretching from Maine to Florida and westward through Georgia, Louisiana, and southeastern Texas. Low incidence clusters were also located from Washington State to central California. High incidence outlier counties were located primarily in Alabama, South Carolina, eastern Texas, and northern California. Low incidence outlier counties were primarily located in the central U.S., in an area bounded by the Dakotas to the north, Montana to the west, western Ohio to the east, and western Texas to the south.
As of 30 April 2021, high COVID-19 incident proportion clusters were primarily located in the eastern U.S., ranging from southern New Hampshire to South Carolina. High incident proportion clusters were also located in Kentucky, Michigan, northeastern Tennessee, Oklahoma, Texas, and eastern Arizona (Figure 8). Low incident proportion clusters were primarily located in the western U.S., including Washington, Oregon, northern California, Nevada, New Mexico, Montana, and Idaho. Low incident proportion clusters were also located in the Dakotas, Nebraska, Kansas, Iowa, southern Kansas, Wisconsin, Minnesota, and Georgia. High incidence outlier counties were located primarily in Nebraska, Iowa, Washington, Idaho, Colorado, and Texas. Low incidence outlier counties were located primarily in Texas, Oklahoma, Tennessee, the Carolinas, Virginia, and Pennsylvania.

Comparison of Clustering Characteristics
Overall, there were statistically significant differences in the spatial distribution of clusters of both high and low COVID-19 incident proportions across the four periods in this study (Table 1). As of 16 May 2020, high COVID-19 cluster counties had higher mean values than unclustered counties for % Female, % Under 18, % Black, % Asian, Median Household Income, % Single Parent Households, % Food Insecure, % Unemployed, % Adults with Diabetes, % Fair or Poor Health, % Smokers, % Physically Inactive, and % Severe Housing Issues (Table 1). Low COVID-19 cluster counties had higher mean values than unclustered counties for % Rural, % 65 and Older, % White, High School Graduation Rate, and % Smokers. High COVID-19 cluster outlier counties had higher mean values than unclustered counties for % Native American, % Hispanic, and % Uninsured. Low COVID-19 cluster outlier counties had higher mean values than unclustered counties for % Rural, % Black, % Food Insecure, % Fair or Poor Health, and % Adults with Obesity.
Note: t-tests were used to compare the "Unclustered Counties" group with each remaining group. p-values: *** < 0.001; ** < 0.01; * < 0.05.
As of 4 January 2021, high COVID-19 cluster counties had higher mean values than unclustered counties for % Under 18, % White, % Native American, Black/White Segregation, Median Household Income, High School Graduation Rate, % Homeowners, and Life Expectancy (Table 3). Low COVID-19 cluster counties had higher mean values than unclustered counties for % Black, % Asian, % Single Parent Households, and % Severe Housing Cost Burden. High COVID-19 cluster outlier counties had higher mean values than unclustered counties for % Single Parent Households, % Severe Housing Issues, and % Adults with Diabetes, among other characteristics. Low COVID-19 cluster outlier counties had higher mean values than unclustered counties for % Rural, % 65 and Older, % Asian, High School Graduation Rate, % Some College, and Food Environment Index.
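The cluster-versus-unclustered comparisons summarized above rely on two-sample t-tests. A minimal Python sketch using SciPy's `scipy.stats.ttest_ind` is shown below for a single county characteristic. Whether a pooled-variance or Welch (unequal-variance) test was used is not stated, so that choice is exposed as `equal_var` and defaulted to Welch here as an assumption; the helper names and example values are hypothetical.

```python
from scipy import stats


def compare_to_unclustered(cluster_values, unclustered_values, equal_var=False):
    """Two-sample t-test for one characteristic: a cluster group vs. unclustered counties."""
    result = stats.ttest_ind(cluster_values, unclustered_values, equal_var=equal_var)
    return result.statistic, result.pvalue


def significance_stars(p_value):
    """Star notation from the table note: *** < 0.001, ** < 0.01, * < 0.05."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # Hypothetical county percentages for one characteristic (not study data).
    hot_spot = [30.1, 32.4, 35.0, 31.2, 33.8, 34.5]
    unclustered = [20.3, 22.1, 21.7, 23.4, 19.8, 24.0]
    t_stat, p = compare_to_unclustered(hot_spot, unclustered)
    print(f"t = {t_stat:.2f}, p = {p:.2g} {significance_stars(p)}")
```

Running this once per characteristic and cluster type reproduces the structure of the comparison tables; Welch's test is a cautious default because cluster groups and unclustered counties can differ greatly in size and spread.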

Discussion
Our results show significant disparities in the geographic distribution and clustering of COVID-19 cases in the U.S. that vary by period. In the initial months of the COVID-19 pandemic, the areas with the highest COVID-19 incident proportions were counties in New England, along the East Coast, and throughout the South. As the pandemic continued, high incidence clusters moved from widespread areas in the Midwest back to counties along the East Coast and in the South, and incident proportions showed a similar trend. Over time, rural counties became increasingly prevalent among the high COVID-19 cluster areas. Counties with higher percentages of racial and ethnic minorities were classified in high COVID-19 cluster areas at some point during the pandemic, and counties with higher percentages of Blacks, Hispanics, and Asians tended to remain in either high COVID-19 clusters or high COVID-19 outliers. Counties with higher percentages of Whites remained in the low COVID-19 clusters until Period 3 (10 September 2020-4 January 2021), when they became classified as high COVID-19 cluster counties. Furthermore, in the initial periods of the pandemic, counties in the low cluster areas were more likely to have higher Black/White segregation. Comparing county characteristics across periods suggests more significant differences during the initial periods of the pandemic, whereas the most recent data show fewer significant differences in county demographics.
Our findings regarding the inequitable adverse county-level exposures associated with clusters of high COVID-19 incident proportions contribute to the emerging literature on the inequitable distribution of COVID-19 in the U.S. A recent spatial analysis of COVID-19 suggested that several factors, including sociodemographic characteristics and comorbidities, were associated with areas with an increased COVID-19 burden [7]. Additionally, as more research on the COVID-19 pandemic has emerged, analyses have suggested that more vulnerable populations may be at an increased risk of COVID-19 infection. Specifically, counties with a higher minority population and lower English language proficiency may be more vulnerable than counties that are predominantly White and native English speaking [25]. These relationships have also been suggested by machine learning and other modeling techniques, which have shown a disproportionate burden of COVID-19 infection and mortality among communities of color within the United States [34,35]. Again, structural inequalities likely contribute to the disparities apparent in both the existing literature and our study. Consistent with prior findings on sociodemographic characteristics and health status, we found that counties in statistically significant high prevalence or incidence cluster areas had a lower percentage of Whites and a higher percentage of households with overcrowding and housing cost burdens.
The outcomes of our Anselin Local Moran's I analyses suggest that inadequate housing, under/unemployment, preexisting adverse health conditions (e.g., diabetes and obesity), health behaviors (e.g., physical inactivity and smoking), and household characteristics, in addition to segregation along racial lines, have created additive effects of social determinants of health that may help to explain some of the inequitable distribution and unequal risk of COVID-19 among minority communities within the contiguous U.S. These conditions culminate in chronic stress, which may impair the immune system, potentially increasing susceptibility to COVID-19 and adverse health complications [48][49][50][51].
The results from this study can help inform public policy for mitigating COVID-19 risk. The cluster demographic results can be used to identify emergent cases and clusters of COVID-19 and to develop a more robust infrastructure for monitoring COVID-19 infections [52]. Our results suggest that, until the fourth period of analysis, counties with higher percentages of Blacks, Hispanics, and Asians were more likely to be classified within the high cluster areas. Similarly, counties with higher percentages of Whites remained in the low cluster categories until the third analysis period, when they were classified in the high cluster categories. Our results are similar to those of a recent study examining geographical variations in COVID-19 cases and demographic characteristics [53]. Like our results, that study found a significant correlation between county percentages of Blacks and COVID-19 cases and deaths; however, this association was not observed for Hispanic populations.
Furthermore, county percentages of Whites were negatively correlated with COVID-19 cases and deaths. Our findings on the potential racial burden of COVID-19 are also supported by a recent study by Mahajan and Larkins-Pettigrew, which examined COVID-19 and race by county [54]. Like our findings, their results suggest a positive correlation between county percentages of both Blacks and Asians and COVID-19 cases. Likewise, both our study and theirs suggest that counties with a higher percentage of Whites had fewer COVID-19 cases and deaths.
This investigation has several strengths and limitations. Its strengths include testing for spatial autocorrelation using Moran's I and Anselin Local Moran's I, which had not previously been applied to test the spatial clustering of COVID-19 incident proportions across U.S. counties. Furthermore, to the best of our knowledge, this is the first study to explore the sociodemographic characteristics of the different clusters of COVID-19 within the continental U.S. Limitations include delays in the onset and reporting of COVID-19 cases, which may cause the published data used in this analysis to underrepresent actual cases. The cases included in these analyses reflect only those who were tested for COVID-19; this number could be underreported, as some people may be asymptomatic or may opt out of COVID-19 testing. Additionally, the county characteristics reflect patterns at the county level and not the characteristics of those who tested positive for the virus. Finally, our analyses are unable to account for the confounding effects of local policies and public health interventions that may relate to differences in case numbers within these counties.
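The global Moran's I test mentioned above can likewise be sketched with a permutation-based significance check. This is a minimal sketch assuming hypothetical data and row-standardized contiguity weights; it computes I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ are deviations from the mean and S0 is the sum of all weights, and then reshuffles values across locations to obtain a one-sided pseudo p-value. The function names and the 999-permutation default are illustrative choices, not the study's settings.

```python
import random
from typing import Dict, List, Sequence, Tuple


def global_morans_i(values: Sequence[float], neighbors: Dict[int, List[int]]) -> float:
    """Global Moran's I using row-standardized contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denominator = sum(v * v for v in z)
    if denominator == 0:
        raise ValueError("all values are identical; Moran's I is undefined")
    numerator = 0.0
    s0 = 0.0  # sum of all spatial weights
    for i, nbrs in neighbors.items():
        for j in nbrs:
            w_ij = 1.0 / len(nbrs)  # row standardization
            numerator += w_ij * z[i] * z[j]
            s0 += w_ij
    if s0 == 0:
        raise ValueError("no neighbor relationships defined")
    return (n / s0) * (numerator / denominator)


def permutation_test(
    values: Sequence[float], neighbors: Dict[int, List[int]],
    n_perm: int = 999, seed: int = 0,
) -> Tuple[float, float]:
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    observed = global_morans_i(values, neighbors)
    rng = random.Random(seed)
    shuffled = list(values)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign values to locations
        if global_morans_i(shuffled, neighbors) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical example: six counties in a chain, each bordering the next.
rates = [10.0, 12.0, 11.0, 2.0, 1.0, 3.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
observed, p_value = permutation_test(rates, neighbors)
print(round(observed, 4))  # 0.6932
```

Under the null hypothesis of no spatial autocorrelation, the expected value of Moran's I is -1/(n - 1), here -0.2, so a value near 0.69 points to strong positive clustering. For real county data, tested implementations (for example, the esda package in PySAL) are preferable to a hand-rolled sketch.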
Additionally, using counties as the geographic scale may mask the actual dispersion of COVID-19 incident proportions within counties. As mentioned, census tracts or census block groups would provide a more granular scale. However, all of these geographic scales may be subject to the modifiable areal unit problem (i.e., geographical boundaries may change over time, altering the comparison of results across multiple years) [40,55]. Furthermore, because of privacy concerns, administrative areas such as counties and/or zip codes may be the most granular level at which these data can be represented. Future research could use more granular data, for example at the census tract level, while protecting the privacy of residents in less populated census tracts or block groups.

Conclusions
Given the dearth of research using geographic information systems to examine COVID-19, we sought to examine the spatial distribution of COVID-19 within the continental U.S. using geospatial statistical analyses. The results of our analyses suggest that several sociodemographic variables are correlated with higher county-level proportions of COVID-19. The results of this study may help health policy decision-makers provide vital public health resources and dismantle some of the structural barriers faced by residents within high COVID-19 prevalence or incidence clusters. These types of spatial clustering maps can allow state and county leaders to strategically mitigate an increase in incident proportions in their jurisdictions based on public health evidence. By identifying spatial clusters, state and local government leaders can more effectively monitor their jurisdictions in a public health-informed manner to prevent the uncontrolled spread of COVID-19. Ultimately, these data can be used to prioritize and reallocate vital public health resources, treatment, and testing equipment to the most impacted areas. It is also important to note that this may not be the last pandemic; the strategies used to mitigate it may help to prevent or address future pandemics.
Institutional Review Board Statement: Not available.

Conflicts of Interest:
The authors declare no conflict of interest.