What Is the Impact of Early and Subsequent Epidemic Characteristics on the Pre-delta COVID-19 Epidemic Size in the United States?

It is still uncertain how the epidemic characteristics of COVID-19 in its early phase and subsequent waves contributed to the pre-delta epidemic size in the United States. We identified the early and subsequent characteristics of the COVID-19 epidemic and the correlation between these characteristics and the pre-delta epidemic size. Most states (96.1%, 49/51) entered a fast-growing phase before the cumulative number of cases reached 30. The number of confirmed cases took 5.6 (5.1–6.1) days to increase from 30 to 100. As of 31 March 2021, all 51 states had experienced at least 2 waves of COVID-19 outbreaks, 23.5% (12/51) had experienced 3 waves, and 15.7% (8/51) had experienced 4 waves; the epidemic size ranged from 19,275 to 3,669,048 cases across the states. The pre-delta epidemic size was significantly correlated with the duration from 30 to 100 cases (p = 0.003, r = −0.405), the growth rate of the fast-growing phase (p = 0.012, r = 0.351), and the peak cases in the subsequent waves (K1 (p < 0.001, r = 0.794), K2 (p < 0.001, r = 0.595), K3 (p < 0.001, r = 0.977), and K4 (p = 0.002, r = 0.905)). We observed that both early and subsequent epidemic characteristics contribute to the pre-delta epidemic size of COVID-19. Identifying these characteristics is important for predicting the course of emerging viral infectious diseases at an early stage.


Introduction
Coronavirus disease 2019 (COVID-19) has led to a worldwide pandemic [1,2]. As of 31 March 2021, the pandemic was affecting people in over 200 countries and territories, with more than 127 million confirmed cases and 4 million deaths reported globally [3]. At the same time, in the United States (US), there were over 30 million confirmed cases and 550,354 deaths attributed to COVID-19 [4,5]. The cumulative incidence of COVID-19 in the United States at that time exceeded 9000/100,000. However, only 16% of Americans were fully vaccinated at that time, and the delta variant was just beginning to spread in the United States [6][7][8]. The COVID-19 epidemic in the United States can therefore be considered to have been in an early stage up to that time. Exploring the factors

Spatiotemporal Changes of COVID-19 Pandemic in the United States
The geographic distribution of the COVID-19 epidemic changed substantially from 31 March 2020 to 31 March 2021, with a significant shift from the Eastern United States to the Central United States (Figure S1). On 31 March 2020, the incidence in most states was at a relatively low level (<300/100,000), while the highest incidence was reported in the Eastern United States (New York, 275/100,000; New Jersey, 125/100,000). By 11 December 2020, there was a substantial increase in incidence in all states across the United States. In particular, there was a strong emergence in the Central United States (North Dakota, 11,445/100,000; South Dakota, 10,136/100,000). From 11 December 2020 to 31 March 2021, the growth of incidence slowed down slightly, but the Central United States remained the most severely affected area.
Remarkably, most of the states that experienced three or four waves were concentrated in the central part of the United States. The Getis-Ord Gi* statistic for total COVID-19 infections identified the Eastern United States as a COVID-19 hotspot in early 2020 and the Central United States as a hotspot in late 2020 and 2021 (Figure S1). The Anselin Local Moran's I analysis identified New York, located in the Eastern United States, as a high-low outlier in 2020, and high-high clusters were mainly found in the Central United States as of 31 March 2021.

Early Epidemic Characteristics of the COVID-19 Epidemic in the US
We applied two-phase linear fits to the first 100 confirmed cases of COVID-19 during the early phase of the epidemic in the 51 states of the US (Figure 1). The simple model identified one slow-growing phase and one fast-growing phase within the first 100 confirmed cases. The slow-growing phase was relatively short with a growth rate of 1.6 (1.2-2.0) cases/day, whereas the growth rate in the fast-growing phase was about 11 times higher (18.2 (14.5-21.8) cases/day, Table 1). The conversion from the slow phase to the fast phase occurred on day 13 (9.8-16.1). The average number of confirmed cases at the phase transition point was 12.6 (10.0-15.3), and 96.1% (49/51) of the states transitioned from the slow-growing phase to the fast-growing phase at a level below 30 cases (Figure 1, Table 1). Consistent with a previous finding in China [15], we regarded '30' confirmed cases as a critical threshold where the COVID-19 epidemic started to increase rapidly. Further, in the 51 states, the number of confirmed cases took 5.6 (5.1-6.1) days to increase from 30 to 100. The average case-fatality rate in the first 100 confirmed cases across all US states was 1.1% (0.6-1.6%).

Figure 2 shows the multi-logistic fitting to the COVID-19 epidemics in the 51 states. As of 31 March 2021, our model identified that all 51 states had experienced at least 2 waves of COVID-19 growth, among which 23.5% (12/51) experienced a 3-wave growth and 15.7% (8/51) experienced a 4-wave growth ( second wave. For the third and fourth waves, the average number of estimated confirmed cases at the peak was 389,757 (256,873-522,641) and 327,861 (43,091), and the outbreak duration was 94.1 (86.6-101.6) and 92.1 (76.9-107.3) days, respectively. The fourth outbreak (estimated to saturate at 327,861 cases), the third outbreak (estimated to saturate at 389,757 cases) and the second outbreak (estimated to saturate at 327,686 cases) were significantly greater than the first outbreak (estimated to saturate at 111,061 cases).
However, the duration of the outbreaks showed no significant difference across the waves.
Figure 2. Multi-logistic fittings for the dynamics of the newly reported incidence are presented for each state as separate panels, with the blue line, green line, red line and light blue line representing the development of the first, second, third and fourth waves estimated by fitting, respectively. The line colors correspond to the number of waves obtained by fitting. The total number of waves (1, 2, 3 or 4) is also indicated by the color of the panel border (blue, green, red and bright blue, respectively).

Epidemic Size and Associated Characteristics
As of 31 March 2021, the overall epidemic size was 30,326,324 cases in the United States, ranging from 19,275 cases in Vermont to 3,669,048 cases in California. Figure 3 shows significant correlations between epidemic size and time from 30 to 100 (p = 0.003, r = −0.405) and growth rate of the fast-growing phase (p = 0.012, r = 0.351). Epidemic size also showed significant positive correlations with K1 (p < 0.001, r = 0.794), K2 (p < 0.001, r = 0.595), K3 (p < 0.001, r = 0.977) and K4 (p = 0.002, r = 0.905). In addition, the epidemic size was positively correlated with the new cases on restriction day (p < 0.001, r = 0.764) and new cases on reopening day (p < 0.005, r = 0.880).
* All the parameters were defined by the multi-logistic fitting; 1 The parameters K1, K2, K3, K4, and K represent the asymptotic values that bound the function and therefore specify the level at which each epidemic wave and the overall epidemic saturate; 2 The parameters tm1, tm2, tm3, and tm4 represent the midpoint of each epidemic growth and hence the peak of each outbreak; 3 The parameters ∆t1, ∆t2, ∆t3, and ∆t4 are the lengths of the time intervals required for the epidemics to grow from 10% to 90% of the saturation level.

Correlation between Early and Subsequent Epidemic Characteristics
Notably, early characteristics of the epidemic also show a correlation with overall epidemic-wide fluctuations (Figure 3).
Figure 3. Spearman correlation between epidemic size, multi-logistic parameters, characteristics in the early stage of the epidemic and non-pharmacological intervention characteristics. The size of each circle represents the absolute value of the correlation coefficient. The color of each circle represents the sign of the correlation coefficient and the magnitude of its absolute value. The asterisks in circles indicate that the p-value of the hypothesis test for the correlation coefficient is less than 0.05. HAQ: healthcare access and quality index.

Discussion
Our study demonstrated that during the early phase of the epidemics in 51 US states, 30 cases appear to be a critical threshold for switching from a relatively slow-growing phase to a fast-growing phase. This is consistent with one of our previously published studies [15]. We identified multiple temporal waves and a changing geographical distribution in the subsequent COVID-19 epidemics. All states (51/51) have experienced at least 2 waves of the epidemic outbreak. The subsequent waves are significantly stronger and longer than the first wave, but states with a higher first wave tend to have higher subsequent waves as well. We also showed a geographic shift of the epidemic from the coastal states to inland states. The COVID-19 epidemic size across the US states is significantly associated with the duration from 30 to 100 cases, the growth rate of the fast-growing phase, and the peak cases in the subsequent waves.
Our study demonstrated early epidemic characteristics in the US similar to the previous findings in China [15]. Both countries were unprepared for the unprecedented outbreak. Building on the previous study, we again demonstrate that the first 30 cases appear to be an essential indicator of the onset of a rapid phase of COVID-19 transmission. Once this critical level is reached, the epidemic tends to enter a period of rapid expansion. Establishing an early warning system based on the number of confirmed cases per day is crucial to controlling the spread of an epidemic in the early phase. Interventions that aim to contain the epidemic at a low level in its early stage may be most beneficial in reducing its subsequent size.
Our study indicated that early characteristics are predictive of the subsequent waves of the epidemics and the epidemic size. Notably, the shorter the duration to increase from 30 to 100 cases in its early phase, the more severe the subsequent epidemic burden. The shorter duration represents a more rapid epidemic spread and likely reflects the absence of effective prevention strategies and diagnostic capacity in the states, leading to a high transmission rate. Consistent with this, more rapid growth in the early phase is also associated with a greater magnitude of subsequent waves. We believe the early rapid development of the outbreak may serve as an important warning signal for the healthcare system to respond to the outbreak and help predict the potential size of COVID-19 in the later development of the epidemic.
Our study demonstrated a significant shift in the geographical distribution of the COVID-19 epidemic from coastal to inland states, and inland states appeared to have experienced more subsequent waves, similar to the results of an earlier study [27]. This might be due to a delayed response to COVID-19 prevention in these states. California, New Jersey, Illinois, and Connecticut started restrictions early. In comparison, as of 3 May 2020, eight states in the Central United States had not issued any stay-at-home orders or lockdown interventions. Consistent with this, our study found that the lower the cumulative number of cases on lockdown days, the smaller the subsequent epidemic size. Furthermore, the first eight states that experienced the fourth wave might have higher population density and more frequent changes in social distancing restrictions, such as frequent lockdown and reopening [28][29][30][31][32][33][34]. This may have led to the earlier decline of the previous wave and the earlier appearance of the next wave. Additionally, a low cumulative number of cases on the reopening day also corresponds to a smaller epidemic size. This may also reflect the fact that states that emphasize non-pharmaceutical interventions are more effective at controlling the epidemic [35], which supports the view in a previous study that a hasty reopening may lead to another epidemic [10]. However, although many areas would implement restrictions and reopen according to the epidemic situation, the reduced adherence caused by pandemic fatigue may also lead to a large-scale epidemic [36].
Our study also demonstrated a strong correlation between waves. We are the first to identify the number of waves and quantify the size and duration of the waves. We report that the subsequent waves were at least three times larger than the first wave and that each wave lasted about 3 months. The duration likely reflects the time needed for the healthcare system to react, intervene, and gradually reduce the epidemic.
There are some limitations to our findings. First, our research only examined the pre-delta epidemic data before 31 March 2021 without considering the underestimated cases in the first waves [37], the effect of vaccination on the trajectory of the epidemic, or multiple rounds of public health interventions. Further investigation is necessary to identify the impacts of these interventions on the course of the COVID-19 epidemic in the US. Second, states in the US may begin to witness the sixth wave of outbreaks with the emergence of another strain. The increasing number of waves may affect the prediction accuracy of various characteristics of the epidemic. Third, many potential confounding factors such as environmental, meteorological and intervention factors [38,39] were not included in our current study and require future investigation. Fourth, vaccination efficacy, along with the population structure that may influence the vaccine effect and susceptibility [40,41], was not taken into consideration due to the limitation of data sources. The mortality rate was also affected by the complexity of its influencing factors [42][43][44] and was not considered as an outcome indicator. Moreover, our study only considered the first restriction and first reopening. Many areas have since implemented further restrictions and reopenings, which were not included in this study. The number of new cases on the days of other restrictions and reopenings would likely serve as important indicators to predict the epidemic size, which could be addressed in further study.

Data Source
We collected publicly available data on COVID-19 cases (number of daily confirmed cases, deaths, and recoveries) for the 51 states of the United States from 21 January 2020 to 31 March 2021 from the Coronavirus Resource Center of Johns Hopkins University and Medicine [3]. In accordance with the methods of another study [45], where Johns Hopkins University data and state-level reporting were inconsistent, we manually supplemented and corrected the dataset and adjusted for reporting-day biases according to the NY Times website [4] to improve the accuracy of our analysis.
We collected social distancing data from the IHME COVID-19 Forecasting Team's article [46]. We also obtained each state's population size from the WorldPop Population Counts [47] along with the Healthcare Access and Quality (HAQ) Index, which was a summary measure of personal healthcare access and quality, from the GBD's article [48].

Selection of Epidemic Characteristics Indicators
We defined the epidemic size for each state as the cumulative number of confirmed cases on 31 March 2021.
For the early epidemic trend of each of the 51 states, we used the Joinpoint software [49] to identify the trend and transition point of the epidemic during the initial phase of the epidemic based on the first 100 confirmed cases. We imposed a two-phase fit (it can be determined through the Joinpoint software automatically) [15] with a maximum of one joinpoint (corresponding to two time intervals) and used a linear regression model for both phases. We identified: (1) the time of the transition point between the two phases; (2) the number of cases at the transition point; the growth rates of (3) the first (slow-growing) phase and (4) the second (fast-growing) phase. For the 51 states of the United States, the average number of cases at the turning point was 12.6 (10.0-15.3) (Table 1), and the majority (49/51) of transition points occurred below 30 cases, which is consistent with our previous conclusion of 30 cases in China [15]. Therefore, in this study, we regarded 30 cases as an important threshold for the epidemic growth where the epidemic changed from a slow-growing to a fast-growing phase. We also estimated two additional predictors based on the first 100 confirmed cases, namely: (1) the days required to increase from 30 to 100 cases (time from 30 to 100); (2) the case fatality rate among the first 100 confirmed cases (CFR-100). The 'first 100 cases' was defined as the number of confirmed cases on the day the 100th confirmed case was reported.
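As an illustration of the two-phase fit described above, the following Python sketch searches for the single transition point that minimizes the total squared error of two separate straight-line fits. It is only a rough stand-in for the Joinpoint software, which uses its own segmented-regression procedure. The function names, the `min_points` parameter, the rule used to count the days between thresholds, and the synthetic case counts are all assumptions made for this example and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_phase_fit(days, cases, min_points=3):
    """Exhaustive search for one transition point between two linear phases.

    Assumption: each phase is fitted independently and must contain at
    least `min_points` observations. This is not the Joinpoint algorithm.
    """
    best = None
    for k in range(min_points, len(days) - min_points + 1):
        slow_slope, _, slow_sse = fit_line(days[:k], cases[:k])
        fast_slope, _, fast_sse = fit_line(days[k:], cases[k:])
        total_sse = slow_sse + fast_sse
        if best is None or total_sse < best["sse"]:
            best = {
                "transition_day": days[k],        # first day of the second phase
                "cases_at_transition": cases[k],
                "slow_rate": slow_slope,          # cases/day
                "fast_rate": fast_slope,          # cases/day
                "sse": total_sse,
            }
    return best


def days_between_thresholds(days, cases, low=30, high=100):
    """Days from the first report of >= low cases to the first report of >= high cases."""
    first_low = next(d for d, c in zip(days, cases) if c >= low)
    first_high = next(d for d, c in zip(days, cases) if c >= high)
    return first_high - first_low


# Synthetic cumulative counts (hypothetical, not study data):
# about 1 case/day for 13 days, then about 18 cases/day.
days = list(range(18))
cases = [d + 1 for d in range(13)] + [30 + 18 * (d - 13) for d in range(13, 18)]

fit = two_phase_fit(days, cases)
print(fit["transition_day"], fit["slow_rate"], fit["fast_rate"])
print(days_between_thresholds(days, cases))
```

With real surveillance data the two segments would not fit exactly, and the choice of `min_points` would influence where the transition is placed.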
We also collected various non-pharmacological intervention indicators under the epidemic situation, which were as follows: (1) the time of first restriction and first reopening for 51 US states; (2) the confirmed new cases on the day of restriction and reopening.
For the study period in each of the 51 states, we used a simple multi-logistic fitting (https://logletlab.com/, accessed on 5 April 2022) to identify the key characteristics of the COVID-19 epidemic based on the cumulative number of confirmed COVID-19 cases. We modelled the epidemic patterns by identifying 1 to 4 growth waves of the COVID-19 epidemic. We identified: (1) Km (m = 1, 2, 3, 4) for each wave, which represents the asymptotic value that bounds the function and therefore specifies the level at which that wave saturates. The sum of K represents the level at which the epidemic finally reaches saturation; (2) tm for each wave, which represents the midpoint of each epidemic growth and hence the peak of each outbreak. tm4 - tm3, tm3 - tm2 and tm2 - tm1 represent the time intervals between two consecutive outbreaks. The sum of tm represents the sum of the time required for each epidemic to reach its peak; (3) ∆t for each wave, which represents the length of the time interval required for the epidemic to grow from 10% to 90% of the saturation level. The sum of ∆t represents the sum of the interval lengths required for each epidemic to increase from 10% to 90% of the saturation level. Figure S3 is a schematic graph of the three main characteristics Km, tm and ∆t.
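To make the roles of K, tm and ∆t concrete, the sketch below writes a multi-logistic curve as a sum of logistic waves, each parameterized so that growth from 10% to 90% of its saturation level takes exactly ∆t. This form follows from the definitions given above and is assumed, not confirmed, to match the one used by the fitting tool. The function names and the two example waves are hypothetical, and no fitting is performed here.

```python
import math


def logistic_wave(t, K, tm, dt):
    """Cumulative cases of one wave at time t.

    K  : saturation level of the wave
    tm : midpoint (time of the peak in new cases)
    dt : time needed to grow from 10% to 90% of K
    The factor ln(81) makes the 10%-to-90% interval equal to dt.
    """
    rate = math.log(81) / dt
    return K / (1.0 + math.exp(-rate * (t - tm)))


def multi_logistic(t, waves):
    """Sum of logistic waves; `waves` is a list of (K, tm, dt) tuples."""
    return sum(logistic_wave(t, K, tm, dt) for K, tm, dt in waves)


# Two hypothetical waves (illustrative values, not fitted to any state).
waves = [(100_000, 80, 90), (300_000, 300, 90)]

K, tm, dt = waves[0]
low = logistic_wave(tm - dt / 2, K, tm, dt)    # 10% of K
mid = logistic_wave(tm, K, tm, dt)             # 50% of K
high = logistic_wave(tm + dt / 2, K, tm, dt)   # 90% of K
print(low / K, mid / K, high / K)

# Long after the last wave, the curve approaches the sum of K.
print(multi_logistic(10_000, waves))
```

In practice the parameters would be estimated by least-squares fitting of this curve to the cumulative case series of each state, with the number of waves chosen between 1 and 4.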

Statistical Analysis
We generated the disease mapping of COVID-19 in ArcGIS 10.8 (Environmental Systems Research Institute, Redlands, CA, USA). The reported incidence in each state was joined to the shapefile of state boundaries by administrative unit code. We created separate distribution maps of COVID-19 for 31 March 2020, 11 December 2020 (the day when the Food and Drug Administration (FDA) of the United States issued an Emergency Use Authorization (EUA) for the first vaccine to prevent COVID-19) [50], and 31 March 2021. We also performed Anselin Local Moran's I and Getis-Ord Gi* analyses to further identify the spatial variations of COVID-19 in the United States. For the geospatial analysis, we normalized the epidemic size by dividing it by the population size of each state to obtain the infection rate per 100,000 individuals.
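The idea behind the cluster and outlier labels can be sketched with one common form of the local Moran's I statistic. The example below is a simplified illustration only: it assumes row-standardized contiguity weights, runs no permutation test for significance, and does not reproduce the ArcGIS implementation. The function name, the neighbor lists and the incidence values are hypothetical.

```python
def local_morans_i(values, neighbors):
    """Return a list of (I_i, label) for each spatial unit.

    Assumed form: I_i = (z_i / m2) * mean(z_j over the neighbors of i),
    where z is the deviation from the mean and m2 = sum(z^2) / n.
    The label combines the unit's own level with its neighbors' level,
    for example 'high-low' for a high value surrounded by low values.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    results = []
    for i in range(n):
        # Spatial lag: average deviation among the neighbors of unit i
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        own = "high" if z[i] > 0 else "low"
        around = "high" if lag > 0 else "low"
        results.append((z[i] / m2 * lag, own + "-" + around))
    return results


# Five hypothetical regions arranged in a chain: 0 - 1 - 2 - 3 - 4
incidence = [100.0, 10.0, 20.0, 15.0, 12.0]   # cases per 100,000 (made up)
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

results = local_morans_i(incidence, neighbors)
for region, (i_value, label) in enumerate(results):
    print(region, round(i_value, 3), label)
```

A negative local value marks a unit that differs from its surroundings (an outlier), while a positive value marks a unit that resembles its neighbors (a cluster); whether either is statistically meaningful requires a significance test that is omitted here.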
After verifying the non-normal distribution of most variables through the Kolmogorov-Smirnov test, we used the Spearman correlation test in the OriginPro 2021b software (OriginLab Corporation, Northampton, MA, USA) to examine the correlation between epidemic size, early epidemic characteristics (time from 30 to 100, CFR in the first 100 confirmed cases, day of the phase turning point, number of cases at the turning point, slow-growing phase (cases/day), fast-growing phase (cases/day)), intervention (new cases on restriction, new cases on reopening) and HAQ indicators and the multi-logistic fitting characteristics of the subsequent epidemics (number of phases; K1, K2, K3, K4, sum of K; tm1, tm2, tm3, tm4, sum of tm; tm4 - tm3, tm3 - tm2, tm2 - tm1; ∆t1, ∆t2, ∆t3, ∆t4, sum of ∆t).
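For readers unfamiliar with the Spearman coefficient, the sketch below computes it as the Pearson correlation of average ranks, which handles tied values. It is a minimal illustration rather than the OriginPro procedure: it does not compute p-values, the function names are assumptions, and the input numbers are invented.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: days from 30 to 100 cases versus final epidemic size.
days_30_to_100 = [4, 5, 6, 7]
epidemic_size = [900_000, 700_000, 500_000, 100_000]
print(spearman(days_30_to_100, epidemic_size))

# A small example with a tie in the second variable.
print(spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed, non-normal distributions noted above, which is the usual reason for preferring it over the Pearson coefficient in this setting.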

Conclusions
In conclusion, we confirmed that the first 30 cases of COVID-19 might be a critical threshold for switching from a relatively slow-growing phase to a fast-growing phase in the early epidemic in the United States. Further, most states have experienced more than 1 wave of the COVID-19 epidemic, and the magnitude of the first wave tends to predict the magnitude of subsequent waves. The pre-delta epidemic size is negatively and significantly correlated with the duration from 30 to 100 cases but positively correlated with the growth rate of the fast-growing phase, and the peak cases in the subsequent 4 waves.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/pathogens11050576/s1, Figure S1: The changes of the geographic distribution of COVID-19 incidence in the United States from 31 March 2020 to 31 March 2021, including 11 December 2020; Figure S2: Comparison of the fitted parameters for the multi-logistic approximation of 50 US states and Washington DC; Figure S3: A schematic graph with three main indicators of Km, tm and ∆t; Table S1: The Global Moran's I of COVID-19 incidence in the United States on 31 March 2020, 11 December 2020 and 31 March 2021; Table S2: Spearman correlation between multi-logistic parameters and indicators in the early stage of the epidemic and non-pharmacological intervention indicators.