Assessing Early Heterogeneity in Doubling Times of the COVID-19 Epidemic across Prefectures in Mainland China, January–February, 2020
Epidemiologia 2021, 2

To describe the geographical heterogeneity of COVID-19 across prefectures in mainland China, we estimated doubling times from daily time series of the cumulative case count between 24 January and 24 February 2020. We analyzed the prefecture-level COVID-19 case burden using linear regression models and used the local Moran’s I to test for spatial autocorrelation and clustering. Four hundred prefectures (home to ~98% of the population) had at least one COVID-19 case and 39 prefectures had zero cases by 24 February 2020. Excluding Wuhan and those prefectures with only one case or none, 76 (17.3% of 439) prefectures had an arithmetic mean epidemic doubling time of <2 days. Low-population prefectures had a higher per capita cumulative incidence than high-population prefectures during the study period. Among prefectures where the cumulative case count doubled ≥3 times, an increase in population size was associated with a very small reduction in the mean doubling time (−0.012 per 100,000 residents; 95% CI, −0.017, −0.006). Spatial analysis revealed high case count clusters in Hubei and Heilongjiang and fast epidemic growth in several metropolitan areas by mid-February 2020. Prefectures in Hubei and neighboring provinces, as well as several metropolitan areas in coastal and northeastern China, experienced rapid growth, with the cumulative case count doubling multiple times and a short mean doubling time.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), began in Wuhan and soon spread globally. Prior studies have investigated SARS-CoV-2 transmission dynamics in mainland China [1]. For instance, population mobility data were found to be highly predictive of COVID-19 importation risk from Wuhan to other Chinese cities in early 2020 [2], and population flow from Wuhan to 296 prefectures was found to drive the spatiotemporal distribution of COVID-19 cases in China in the spring of 2020 [3]. However, detailed descriptions of the sub-provincial administrative units in mainland China that reported COVID-19 data (i.e., prefectures or their equivalents) remained a gap in the epidemiological literature.
The doubling time of the cumulative case count describes how fast an epidemic is growing [4]. For example, the doubling times of the 2003 severe acute respiratory syndrome (SARS) epidemic were investigated in different countries, and it was suggested that the variation in doubling times across countries might arise from variation in both transmission rates and control efforts [5]. In 2020, doubling times of the COVID-19 pandemic were studied across European countries, across regions of Iran, and across states of the United States (U.S.) [6][7][8][9]. We previously analyzed the COVID-19 epidemic doubling time by province in mainland China from 20 January through 9 February 2020 [10]. A further analysis by prefecture would provide a more detailed account of the early phase of the pandemic within mainland China. We hypothesized that population size or population density might be correlated with the arithmetic mean of the doubling times and ran regression models accordingly. We also investigated the power-law relationship between cumulative case count and population size. Furthermore, we investigated potential prefecture-level spatial clustering of the cumulative case count, the total number of times the cumulative case count doubled, and the mean doubling time, using local Moran's I statistics of spatial autocorrelation to identify potential clusters.
The objectives of this paper are: (1) to describe the sub-provincial administrative units that reported COVID-19 case count data, as reflected in an oft-cited dataset, and to assess whether their cumulative case counts in the early phase of the epidemic followed a power-law relationship with population size; (2) to compute the COVID-19 epidemic doubling times by prefecture in mainland China in the early phase of the epidemic and to assess their relation to population size and density; and (3) to identify spatial clusters of the cumulative number of cases and the total number of times the cumulative case count doubled by prefecture in mainland China in the early phase of the epidemic.

Geographic Scope
The geographic area of this study was mainland China, comprising 22 provinces, 5 ethnic minority "autonomous regions," and 4 centrally administered municipalities (Appendix A). In the provinces and "autonomous regions," there were three tiers of sub-provincial administrative units: prefecture-level units, county-level units, and township-level units [11]. Most prefecture-level units were prefecture-level cities, which encompass a city and its surrounding counties [11]. In the centrally administered municipalities of Beijing, Chongqing, Shanghai, and Tianjin, there were two sub-municipal levels: urban districts and street communities [11].

COVID-19 Cumulative Incidence Data Sources
Sub-provincial cumulative numbers of confirmed COVID-19 cases were reported daily by provincial health commissions starting 20 January 2020 [10]. These data were collated daily by DingXiangYuan (abbreviated as DXY), an online community of mainland Chinese healthcare professionals [12]. DXY maintained a publicly available and oft-cited website that published the aggregated data and was updated daily. DXY was the source of the mainland Chinese COVID-19 case count data shown on the oft-cited Johns Hopkins University dashboard [13]. We downloaded DXY data from an openly available GitHub source that crawled data from DXY (a crawler developed by Isaac Lin, aka "BlankerL") [14]. The dataset analyzed here covered the month immediately following the implementation of Wuhan's cordon sanitaire on 23 January 2020: from 24 January 2020 (the first date in this dataset) through 24 February 2020 (the day this dataset was retrieved). Dates in the dataset are the dates on which DXY retrieved the data from governmental press releases. We cleaned the dataset for errors and inconsistencies in data entry, as per the official press releases that our team collected from provincial government websites [10].
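A crawler typically records time-stamped snapshots, possibly several per day, rather than one value per prefecture per day. The Python sketch below shows one way to reduce such snapshots to a daily cumulative series; it is not the authors' cleaning code. The record layout `(iso_timestamp, unit, cumulative_confirmed)`, the function name `daily_cumulative`, and the rule of keeping each unit's last snapshot of the day are all assumptions. The hypothetical `MERGE_INTO` map only illustrates pooling Ganjiang New District into Nanchang, as done for the maps and statistical analysis.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical alias map: a reporting unit whose counts are pooled into another unit.
MERGE_INTO = {"Ganjiang New District": "Nanchang"}


def daily_cumulative(records):
    """Reduce (iso_timestamp, unit, cumulative_confirmed) snapshots to one
    cumulative count per unit per calendar date.

    Assumptions: the last snapshot of each day is kept, and merged units have a
    snapshot on every date (otherwise their counts would need carrying forward
    before pooling)."""
    latest = {}  # (unit, date) -> (timestamp, count)
    for stamp, unit, count in records:
        ts = datetime.fromisoformat(stamp)
        key = (unit, ts.date().isoformat())
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, count)
    series = defaultdict(dict)  # unit -> {date: cumulative count}
    for (unit, day), (_, count) in latest.items():
        target = MERGE_INTO.get(unit, unit)
        series[target][day] = series[target].get(day, 0) + count
    return {unit: dict(sorted(days.items())) for unit, days in series.items()}


# Made-up snapshots for illustration only.
records = [
    ("2020-01-24T08:00:00", "Nanchang", 2),
    ("2020-01-24T20:00:00", "Nanchang", 3),
    ("2020-01-24T21:00:00", "Ganjiang New District", 1),
    ("2020-01-25T09:00:00", "Nanchang", 5),
    ("2020-01-25T09:30:00", "Ganjiang New District", 1),
]
print(daily_cumulative(records))  # {'Nanchang': {'2020-01-24': 4, '2020-01-25': 6}}
```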

Epidemic Doubling Time
If the cumulative case count $C(t)$ doubles between time points $t_1$ and $t_2$, i.e., $C(t_2)/C(t_1) = 2$, the time difference $t_d = t_2 - t_1$ is known as the epidemic doubling time. A long doubling time indicates slow epidemic growth; the shorter the doubling time, the faster the epidemic grows. We computed how long it took for the cumulative case count to double each time. The arithmetic mean of the successive doubling times summarizes the epidemic growth of a location over the study period. Because the doubling time is inversely proportional to the growth rate and the study period was fixed, the number of times the cumulative case count has doubled provides a crude indicator of epidemic growth; however, this indicator is sensitive to the starting value of the cumulative incidence. The number of doublings is presented alongside the arithmetic mean doubling time for each prefecture.
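The procedure above leaves open exactly how the baseline moves after each doubling. The following Python sketch assumes daily cumulative counts and one possible convention, which is our assumption rather than the authors' stated rule: start from the first non-zero count, record the number of days until the count first reaches at least twice the baseline, then reset the baseline to the count observed on that day. The function names `doubling_times` and `summarize` are hypothetical.

```python
from __future__ import annotations


def doubling_times(cumulative: list[int]) -> list[int]:
    """Successive doubling times, in days, of a daily cumulative case series.

    Assumed convention (not necessarily the authors'): begin at the first
    non-zero count; each time the count first reaches at least twice the
    baseline, record the elapsed days and reset the baseline to that count."""
    start = next((i for i, c in enumerate(cumulative) if c > 0), None)
    if start is None:
        return []
    times: list[int] = []
    base_day, base_count = start, cumulative[start]
    for day in range(start + 1, len(cumulative)):
        if cumulative[day] >= 2 * base_count:
            times.append(day - base_day)
            base_day, base_count = day, cumulative[day]
    return times


def summarize(cumulative: list[int]) -> tuple[int, float | None]:
    """Number of doublings and their arithmetic mean (None if none occurred)."""
    times = doubling_times(cumulative)
    return len(times), (sum(times) / len(times) if times else None)


# Hypothetical daily cumulative counts for one prefecture.
print(summarize([1, 1, 2, 3, 4, 8, 9, 20]))  # (4, 1.75)
```

Under this convention, a count that more than doubles within a single day is counted as one doubling; treating it as several doublings would be an alternative design choice and would change both the number of doublings and their mean.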

Regression
We ran linear regression models with the arithmetic mean doubling time as the dependent variable and either (a) population size or (b) population density as the independent variable, with the date of the first reported COVID-19 case as a covariate. The unit of analysis was the prefecture. We obtained population data for these prefectures from the 2010 China Census [15], and geographical area data from the English-language Wikipedia page of each province, autonomous region, or centrally administered municipality. We excluded from our regression models the prefectures where the cumulative case count doubled two or fewer times by 24 February 2020, as they introduced substantial noise.
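A minimal sketch of model (a) follows, using NumPy least squares rather than the R code the authors used. The function name, the encoding of the first-case date as a day index, and the rescaling of population to units of 100,000 residents (matching how the coefficient is reported in the results) are assumptions. Confidence intervals and p-values, which the authors obtained in R, are omitted.

```python
import numpy as np


def fit_mean_doubling_model(mean_td, predictor, first_case_day, n_doublings,
                            scale=1e5, min_doublings=3):
    """OLS of mean doubling time on one predictor plus first-case date.

    Returns (coefficients, adjusted R^2), with coefficients ordered as
    [intercept, predictor / scale, first_case_day]."""
    mean_td = np.asarray(mean_td, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    first_case_day = np.asarray(first_case_day, dtype=float)
    # Drop prefectures whose cumulative count doubled two or fewer times.
    keep = np.asarray(n_doublings) >= min_doublings
    y = mean_td[keep]
    X = np.column_stack([
        np.ones(keep.sum()),
        predictor[keep] / scale,  # e.g. population in units of 100,000
        first_case_day[keep],     # days since a reference date (assumed encoding)
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    n, p = X.shape  # p counts the intercept column
    r2 = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return coef, adj_r2
```

For model (b), population density would be passed as `predictor` with `scale=1`.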
We characterized the functional relationship between population size and cumulative case count by 24 February 2020 in prefectures outside Hubei. Prefectures in Hubei were excluded because their cumulative case counts were disproportionately high compared with those in other provinces. If the relationship between population size and cumulative case count follows a power law, then log(cumulative case count) = g * log(population size), or, equivalently, log(cumulative case count/population size) = m * log(population size), where m = g − 1. When m = 0, cumulative case count is exactly proportional to population size, so per capita cumulative case count does not vary with population size. When m < 0, prefectures with smaller populations have a higher per capita cumulative case count; when m > 0, they have a lower one. Linear regression was used to obtain an estimate of m [16]. Please refer to Appendix A for further details.

Spatial Clustering
In this study, the local Moran's I index was used to identify spatial clusters of COVID-19 cases and doubling times in mainland China. Using case count as an example, the local Moran's I index can be expressed as:

$$I_i = \frac{z_i - \bar{z}}{\sigma^2} \sum_{j=1,\, j \neq i}^{n} w_{ij} \left(z_j - \bar{z}\right)$$

where $I_i$ is the local Moran's I index for location $i$; $z_i$ is the cumulative number of reported cases at location $i$; $\bar{z}$ is the mean number of reported cases; $\sigma^2$ is the variance of $z$; $n$ is the number of locations; and $w_{ij}$ is the element of the spatial weight matrix for locations $i$ and $j$, based on distance weighting between them. The local Moran's I index can reflect clusters of homogeneous values (e.g., high values surrounded by neighbors with high values) [17].
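The NumPy sketch below computes the local Moran's I for each location, together with the high-high, high-low, low-high, and low-low labels used in the figure captions. Several choices are assumptions: the text says only that the weights are distance-based (ArcGIS offers several weighting schemes), so this sketch uses binary weights within a fixed distance band, row-standardized, on planar coordinates; the variance of z is taken as the population variance; and no significance test is run, so every location receives a label whether or not it would be significant. The function name `local_morans_i` is hypothetical.

```python
import numpy as np


def local_morans_i(values, coords, band):
    """Local Moran's I and cluster label for each location (no significance test).

    Weights (assumed): w_ij = 1 if 0 < distance(i, j) <= band, else 0, then each
    row is divided by its sum; locations without neighbours keep all-zero rows."""
    z = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    w = ((dist > 0) & (dist <= band)).astype(float)  # w_ii = 0
    row_sums = w.sum(axis=1, keepdims=True)
    w = np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)
    dev = z - z.mean()   # z_i minus the mean of z
    sigma2 = dev.var()   # population variance of z
    lag = w @ dev        # sum over j != i of w_ij * (z_j - mean)
    local_i = dev / sigma2 * lag
    labels = np.where(dev > 0,
                      np.where(lag > 0, "high-high", "high-low"),
                      np.where(lag > 0, "low-high", "low-low"))
    return local_i, labels


# Two made-up groups of locations on a line: a high-valued group and a low-valued one.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
local_i, labels = local_morans_i([10, 12, 11, 1, 2, 1], coords, band=1.5)
print(labels.tolist())
```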

Programming
We used Python to process the cumulative incidence data and compute the COVID-19 epidemic doubling time by prefecture. R 3.5.1 to 3.6.2 (R Core Team, R Foundation for Statistical Computing, Vienna, Austria) was used for statistical analysis, and ArcGIS Pro (version 2.4.0) was used for spatial analysis and map creation.

Ethics
The Institutional Review Board of Georgia Southern University determined that this study did not constitute human subjects research under human subjects regulations (H20364).

Sub-Provincial COVID-19 Reporting Units
A total of 462 sub-provincial administrative entities (reporting units) reported COVID-19 data, including zero case counts (Table 1). They comprised 448 sub-provincial geographical reporting units and 14 divisions of the Xinjiang Production and Construction Corps (XPCC) [18]. The XPCC divisions were excluded from further analysis. The 448 sub-provincial geographical reporting units are referred to as "prefectures" in this paper, even though prefectures are only one of several types of reporting units (Appendix A). Of the 448 prefectures, 39 had zero cumulative case count by 24 February 2020. Of the remaining 409 prefectures with cases, the case in Ganjiang New District was merged with the cases in the City of Nanchang for our maps and statistical analysis. Thus, a total of 408 prefectures were used in creating maps and conducting spatial analysis. Of these 408 prefectures, 8 were excluded from our statistical analysis of population size and population density because of their geographic irregularities. Thus, a total of 439 prefectures (400 with cases and 39 without cases by 24 February 2020) were included in the statistical analysis (Tables 1 and 2). In Tables 3 and 4, we provide descriptive statistics of the population and population density of these 439 prefectures by their cumulative number of reported COVID-19 cases.

Figure 1 describes the change in cumulative case count by prefecture by week from 26 January through 16 February 2020; the cumulative case count on 23 February 2020 is presented in Figure 2. Our results highlight the geographic extent of the epidemic, which affected many Chinese prefectures. As presented in Table 3, 98% (1300 million; 2010 Census) of the mainland Chinese population lived in the 400 prefectures with at least one COVID-19 case by 24 February 2020. Nevertheless, some remote prefectures were spared. The 39 prefectures without any cases by 24 February 2020 had a population of 32 million (2%; 2010 Census).
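The counts in this paragraph can be traced step by step from the 462 reporting units to the 439 prefectures used in the statistical analysis. The short Python tally below uses only figures stated in this paper; the variable names are illustrative.

```python
# Flow of reporting units, using only counts stated in the text.
reporting_units = 462
xpcc_divisions = 14
prefectures = reporting_units - xpcc_divisions  # geographical reporting units
zero_case = 39
with_cases = prefectures - zero_case            # prefectures with at least one case
mapped = with_cases - 1          # Ganjiang New District merged into Nanchang
excluded_irregular = 8           # excluded for geographic irregularities
analysed_with_cases = mapped - excluded_irregular
analysed_total = analysed_with_cases + zero_case

print(prefectures, with_cases, mapped, analysed_with_cases, analysed_total)  # 448 409 408 400 439
# Share of the 439 analysed prefectures with a mean doubling time < 2 days (76 prefectures).
print(f"{76 / analysed_total:.1%}")  # 17.3%
```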
The city of Wuhan was the only reporting unit with a cumulative count of 10,000+ confirmed cases. Another seven prefectures, all in Hubei, with a total population of 27.8 million (2010 Census), had case counts in the thousands (1000-9999) each. Outside Hubei, all prefectures reported a cumulative case count <1000 as of 24 February. A total of 36 prefectures (8 in Hubei and 28 outside Hubei), with a total of 220 million inhabitants (2010 Census), reported case counts in the hundreds (100-999) (Table 3).

We added to our maps the Hu Line, a separator of population density in China first proposed by the Chinese geographer Hu Huanyong in 1935 [19]. This conceptual line is a straight line connecting Heihe, Heilongjiang Province, to Tengchong, Yunnan Province.
The Hu Line divides China into a densely populated southeast and a sparsely populated northwest, and it succinctly articulates the demographic and geographic disparities in China [20]. To the northwest of the Hu Line, 6% of China's population is spread across more than half of China's territory, a density of about 11 people per square kilometer, approximately one-fourth of the average global population density [21]. In contrast, approximately 94% of the Chinese population lives to the southeast of the Hu Line, a density of about 260 people per square kilometer, approximately six times the global average [21]. By applying the Hu Line to our maps of the COVID-19 epidemic in China (Figures 1, 2, 4 and 5), we show graphically that the vast majority of the Chinese population lived in prefectures affected by COVID-19 in February 2020. The prefectures that reported zero cases were sparsely populated even though they covered large areas.
Figure 4 shows that the cumulative case count in prefectures in Hubei and its neighboring provinces doubled eight or more times during the study period. The cumulative case count in Wuhan doubled 15 times during the study period; in another 19 prefectures, it doubled 8 to 11 times (Table 5). Severe epidemics occurred in major metropolitan areas, such as Guangzhou and Shenzhen in Guangdong Province in the south; Wenzhou, Taizhou, Ningbo, and Hangzhou in Zhejiang Province in the east; and the northeastern cities of Qiqihar and Harbin in Heilongjiang Province.

Doubling Time
Another measure of the growth rate of the epidemic is the arithmetic mean of the doubling times (Figure 5). This metric was low in Hubei Province and several coastal cities, indicating fast epidemic growth. Excluding Wuhan and those prefectures with only one case or none, 76 (17.3% of 439) prefectures had an arithmetic mean epidemic doubling time of <2 days (Table 5).
Among prefectures outside Hubei Province where the cumulative case count had doubled ≥3 times, each additional 100,000 residents was associated with a change of −0.012 (95% CI, −0.017, −0.006) in the arithmetic mean doubling time after controlling for the date of the first reported case (Table 6). Although the association between population size and the arithmetic mean doubling time was statistically significant (p < 0.001), the model explained only a small part of the variance (adjusted R² = 0.057). We further tested whether population density was associated with the arithmetic mean doubling time; the association was not statistically significant (Table 6).
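To put the population coefficient in perspective, the arithmetic below rescales it from 100,000 to one million additional residents. This assumes that the mean doubling time is measured in days and that the linear model holds over that range; it illustrates the reported estimate rather than adding a result.

$$-0.012 \times \frac{1{,}000{,}000}{100{,}000} = -0.12\ \text{days}, \qquad 95\%\ \text{CI: } -0.017 \times 10 = -0.17\ \text{to}\ -0.006 \times 10 = -0.06\ \text{days}$$

A difference of about a tenth of a day per million residents is small relative to the 2-day threshold used to flag fast-growing prefectures.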

Spatial Clustering
Based on the local Moran's I clusters for the cumulative number of confirmed cases on 16 February 2020, most prefectures in Hubei Province reported significantly more cases than prefectures in other provinces (Figure 6). Prefectures in Hubei had case counts as high as those of their neighbors, forming a high-high cluster. The prefectures adjacent to Hubei Province, shown in dark blue on the map, exhibited a low-high pattern: they had significantly lower case counts than the neighboring prefectures in Hubei. Harbin in Heilongjiang Province showed a high-low pattern, meaning that its case count was significantly higher than those of its neighbors.
Based on the local Moran's I clusters for the total number of times the cumulative case count had doubled by 16 February 2020, many prefectures in central and southeastern China showed a high-high pattern, suggesting rapid growth of the epidemic (Figure 7). A few prefectures, such as Kunming, Chengdu, Baoding, and Dalian, had cumulative case counts that doubled significantly more times than those of their immediate neighbors; they are represented as a high-low pattern on the map.
Figure 6. Results of local Moran's I clusters for the cumulative number of reported cases on 16 February 2020, indicating high numbers surrounded by high numbers (high-high clusters), high numbers surrounded by low numbers (high-low clusters), low numbers surrounded by high numbers (low-high clusters), and low numbers surrounded by low numbers (low-low clusters). Purple represents jurisdictions without cases by 16 February 2020 and jurisdictions outside mainland China (Hong Kong, Macau, and Taiwan).
Based on the local Moran's I clusters for the arithmetic mean of the doubling times, prefectures in southeastern mainland China that experienced fast epidemic growth, with a short average doubling time, were in either low-low or high-low clusters (Figure 8). Some prefectures in northern or northwestern mainland China experienced slow epidemic growth, with a long average doubling time.
Our spatial clustering analysis captured a snapshot of the epidemic in mid-February 2020, when prefectures in Hubei Province had very high case counts compared with the rest of mainland China. Meanwhile, prefectures across central, eastern, and southern China experienced rapid case growth, and epidemic hotspots were found in some prefectures in northern, northeastern, and western China. The high-high clusters from the local clustering analysis clearly identify the epicenters of the epidemic. Likewise, the low-high clusters identify vulnerable areas located near the epicenters. Identifying clusters of high and low case numbers can help detect the sharpest boundaries between areas with high and low levels of transmission, which may help guide intervention measures.
If performed in real time, spatial analysis can help epidemiologists identify epidemic hotspots and jurisdictions with rapid epidemic growth. Public health interventions can then be applied in tiers proportionate to the risk of infection, so that the epidemic is controlled while damage to the economy and limitations on personal liberty are minimized.

Limitations
This study has several limitations. First, underreporting due to underdiagnosis was possible, especially in Hubei Province, where the supply of diagnostic equipment was low and the capacity of local hospitals was overwhelmed during the study period [22]. Reporting rates may also have differed across prefectures. This study relied on the strong assumption that the reporting rate remained constant over time within each prefecture. Because our dataset began on 24 January 2020, four days after the Chinese central government initiated nationwide reporting of COVID-19, we considered the reporting rate to be fairly stable during our study period. However, changes in case definitions might introduce further uncertainty into the case counts reported in China, as described by Tsang et al. [1]. Second, our dataset was collected by a third-party crawler [14] from the DXY website, which aggregated China's official data for the COVID-19 pandemic. We manually identified and corrected errors in it using provincial government sources [10]; nonetheless, some minor errors might remain. Third, the dates in the DXY dataset were the dates on which DXY retrieved the data from the health authorities. Because the website maintained near-real-time updates for its visitors, these were also the dates on which the data were released. From late January 2020, China's National Health Commission released the data for each 24 h period ending at midnight at around 8 a.m. the next day (Beijing Time). Thus, the reporting date was one day later than the date on which the cases were confirmed. Nevertheless, this limitation would not affect our calculation of the doubling times as long as the daily reporting periods remained consistent. Fourth, the explanatory power of our regression models was limited because very few predictors were included.
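The argument in the third limitation, that a uniform one-day lag between confirmation and reporting leaves doubling times unchanged, can be checked directly: doubling times depend only on differences between dates, so shifting every date by the same amount cancels out. The sketch below uses made-up counts and a hypothetical `doubling_intervals` function that follows one possible doubling convention (the baseline resets to the count on each doubling date), not necessarily the authors' exact rule.

```python
from datetime import date, timedelta


def doubling_intervals(dated_counts):
    """Days between successive doublings for (date, cumulative count) pairs in
    date order. Hypothetical convention: start at the first non-zero count and
    reset the baseline to the observed count on each doubling date."""
    intervals = []
    base_date, base_count = None, 0
    for day, count in dated_counts:
        if base_date is None:
            if count > 0:
                base_date, base_count = day, count
        elif count >= 2 * base_count:
            intervals.append((day - base_date).days)
            base_date, base_count = day, count
    return intervals


# Made-up cumulative counts, indexed by confirmation date.
counts = [1, 2, 3, 5, 9, 12, 20]
confirmed = [(date(2020, 1, 24) + timedelta(days=i), c) for i, c in enumerate(counts)]
# The same counts indexed by reporting date, one day later.
reported = [(d + timedelta(days=1), c) for d, c in confirmed]

print(doubling_intervals(confirmed), doubling_intervals(reported))  # [1, 2, 2] [1, 2, 2]
```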

Figure 8. Results of local Moran's I clusters for the arithmetic mean of the doubling times, indicating high numbers surrounded by high numbers (high-high clusters), high numbers surrounded by low numbers (high-low clusters), low numbers surrounded by high numbers (low-high clusters), and low numbers surrounded by low numbers (low-low clusters). Purple represents jurisdictions without cases by 16 February 2020 and jurisdictions outside mainland China (Hong Kong, Macau, and Taiwan).

Conclusions
Our analysis documented spatial heterogeneity in the growth of the COVID-19 epidemic across prefectures in mainland China from 24 January to 24 February 2020. First, the epidemic heavily affected prefecture-level cities in Hubei and neighboring provinces and a number of metropolitan areas in southern, eastern, and northeastern China. Moreover, by 24 February 2020, the epidemic had spread to prefectures that together comprise 98% of the Chinese population. The power-law relationship between population size and cumulative case count (by 24 February 2020) indicated that low-population prefectures had a higher per capita cumulative case count than high-population prefectures. Second, an increase in population size was associated with a very small reduction in the mean doubling time. Third, spatial analysis indicated that by mid-February 2020, prefecture-level cumulative case counts clustered around Hubei, while many prefectures across central, coastal, and northeastern China experienced rapid growth, with the cumulative case count doubling multiple times and a short mean doubling time. This study demonstrates that, if performed in real time, spatial analysis of prefecture-level COVID-19 data can enable epidemiologists to stratify local jurisdictions by their epidemic growth. Local jurisdictions can then implement tiers of public health interventions proportionate to their epidemic risk. Spatial analysis can offer additional insights that enable effective responses to control the spread of an epidemic while minimizing unnecessary draconian measures that harm the economy and limit personal liberty.

( 胡煥庸 ) in 1935 [19]. This conceptual line is a straight line across mainland China, starting from Aihui County ( 璦琿縣 ), now part of the city of Heihe ( 黑河市 ) in Heilongjiang Province, to the city (formerly county) of Tengchong ( 騰衝市 ) in Yunnan Province.
Hu developed this concept to articulate the great disparity in geography, demography, and economic development between the southeastern and northwestern parts of China. In 1935, Hu proposed that to the southeast of the Hu Line, 96% of China's population resided in 36% of China's landmass, while to the northwest of the Hu Line, 4% of China's population resided in 64% of China's landmass [19]. Although China's population has increased substantially since 1935, and the territorial claims of the People's Republic, as well as the geographical area that it effectively governs, have changed, Hu's observation remains valid today: 94% of China's population resides in the 43% of China's landmass that lies southeast of the Hu Line [21].

which is equivalent to:

$$\frac{\log(\text{cumulative case count})}{\log(\text{population size})} = g$$

Now, the relationship between per capita cumulative case count and population size can be expressed as follows:

$$\text{per capita cumulative case count} = \frac{\text{cumulative case count}}{\text{population size}}$$

$$\log\left(\frac{\text{cumulative case count}}{\text{population size}}\right) = m \log(\text{population size})$$

$$\log(\text{cumulative case count}) - \log(\text{population size}) = m \log(\text{population size})$$

$$\log(\text{cumulative case count}) = (m + 1) \log(\text{population size})$$

Therefore, g = m + 1; or, m = g − 1. If we plot log(cumulative case count/population size) on the y-axis against log(population size) on the x-axis, the slope of the regression line is m. If cumulative case count were exactly proportional to population size, m would be approximately 0 (equivalent to g = 1). If m < 0 (i.e., g < 1), per capita cumulative case count decreases as population size increases; if m > 0 (i.e., g > 1), per capita cumulative case count increases as population size increases.
Appendix A.5. Footnotes to Table 1

1 A reporting unit in this study is defined from the perspective of epidemiological surveillance: a sub-provincial unit that reports COVID-19 data to the provincial health commission and whose data are shown separately in a provincial health commission press release. In a province, such sub-provincial reporting units are usually prefecture-level cities ( 地級市 ), prefectures ( 州 ), autonomous prefectures ( 自治州 ) of ethnic minorities, or leagues ( 盟 ) in Inner Mongolia. In the four centrally administered municipalities, they are (urban) districts ( 區/区 ) and (rural) counties ( 縣/县 ).

2 For the centrally administered municipalities of Beijing, Shanghai, and Tianjin, the municipal governments categorized non-natives to their cities as a distinct category (see Table 2). These categories were present in the DXY dataset but were not included in our analysis. They were also not included in the column "DXY data entries (excl. duplicate row)", because they were not defined by (sub-provincial) location.