Mathematical and Statistical Analysis of Doubling Times to Investigate the Early Spread of Epidemics: Application to the COVID-19 Pandemic

: Simple mathematical tools are needed to quantify the threat posed by emerging and re-emerging infectious disease outbreaks using minimal data capturing the outbreak trajectory. Here we use mathematical analysis, simulation and COVID-19 epidemic data to demonstrate a novel approach to numerically and mathematically characterize the rate at which the doubling time of an epidemic is changing over time. For this purpose, we analyze the dynamics of epidemic doubling times during the initial epidemic stage, deﬁned as the sequence of times at which the cumulative incidence doubles. We introduce new methodology to characterize epidemic threats by analyzing the evolution of epidemics as a function of (1) the number of times the epidemic doubles until the epidemic peak is reached and (2) the rate at which the doubling times increase. In our doubling-time approach, the most dangerous epidemic threats double in size many times and the doubling times change at a relatively low rate (e.g., doubling times remain nearly invariant) whereas the least transmissible threats double in size only a few times and the doubling times rapidly increases in the period of emergence. We derive analytical formulas and test and illustrate our methodology using synthetic and COVID-19 epidemic data. Our mathematical analysis demonstrates that the series of epidemic doubling times increase approximately according to an exponential function with a rate that quantiﬁes the rate of change of the doubling times. Our analytic results are in excellent agreement with numerical results. Our methodology offers a simple and intuitive approach that relies on minimal outbreak trajectory data to characterize the threat posed by emerging and re-emerging infectious diseases.


Introduction
The trajectory of an infectious disease epidemic, which is typically tracked as the number of new reported cases by date of symptoms onset, is shaped by multiple factors including the natural history of the disease, background immunity patterns, environmental factors [1], superspreading events [2], control interventions and behavior changes. These factors are not static but change over the different time scales associated with the natural history of the infectious disease (e.g., influenza, Ebola, HIV/AIDS). Further, these factors shape the early growth dynamics [3,4] and the basic reproduction number (R 0 ), which quantifies the number of secondary cases per primary case in a completely susceptible population, and the generation interval, which dictates the time scale for the generation of new cases [5]. It is worth pointing out that even the generation interval is not necessarily invariant, but it can be affected during the course of an outbreak by control interventions and susceptible depletion (e.g., contact tracing and isolation) [6,7]. Hence, transmission estimates that capture dynamic changes in epidemic growth could provide a conceptual framework to characterize the epidemic threat posed by infectious diseases.
Parameter R 0 is a widely used indicator of transmission potential in a native population and is driven by the average contact rate and the mean infectious period of the disease [5]. Yet, it only characterizes transmission potential at the onset of the epidemic and varies geographically for a given infectious disease according to local healthcare, outbreak response, social, and cultural factors that in turn influence the probability of superspreading events [8][9][10]. Furthermore, estimating R 0 requires information about the natural history of the infectious disease. Thus, our ability to estimate reproduction numbers for novel infectious diseases is hindered by the lack of information about their epidemiological characteristics and transmission mechanisms. Therefore, approaches that rely on empirical patterns rather than specific mechanistic assumptions have proved valuable especially during the early transmission stages [11]. It is also worth noting that mathematical analyses of the doubling time of epidemics remain limited to exponentially growing epidemics, an assumption that does not always hold. Here we derive and demonstrate mathematical expressions that allow the characterization of the rate of change of the doubling time beyond exponential growth dynamics.
Although the effective reproduction number, R t , tracks time-dependent changes in transmission potential during the course of an epidemic, this quantity does not convey information about the magnitude or dynamic changes in the trajectory of the epidemic [12]. For instance, R t could be fluctuating over time around the epidemic threshold of 1.0 regardless of whether the outbreak's incidence rate is fluctuating around tens of cases per week or fluctuating around thousands of cases per week. This is because epidemic trajectories at broader spatial scales hide sub-epidemic growth dynamics. More informative metrics could synthesize real-time information about the extent to which the epidemic is expanding over time. Such metrics would be particularly useful if they rely on minimal data, such as time series of cases or deaths, of the outbreak's trajectory.
Epidemic doubling times characterize the sequence of times at which the cumulative incidence doubles [13][14][15][16][17][18]. Here we introduce metrics that capture the dynamics of epidemic doubling times to provide a parsimonious and practical framework to quantify the threat associated with infectious diseases. We use mathematical and numerical analysis which we illustrate using COVID-19 epidemic data of the early stages of the epidemic in various hotspot countries.

Representative Country-Level COVID-19 Data
We retrieved daily reported cumulative case data of the early spread of the COVID-19 pandemic in Germany, France, Canada, the United Kingdom (UK), The Netherlands, and the United States of America (USA) from the World Health Organization (WHO) website [19] and for Spain and Italy from the corresponding governmental datasets [20,21] from early February to 24 May 2020. We calculated the daily incidence from the cumulative trajectory and analyzed the early incidence trajectory for the 8 countries.

Synthetic Epidemic Growth Data
We conducted simulation studies for testing and demonstrating the mathematical results and estimation of parameters characterizing the rate of change of doubling times. For this purpose, we simulated epidemic growth curves by adding a Poisson error structure to the daily incidence curve obtained from the generalized-growth model (GGM) [8,22], which provides a first approximation to the growth phase of the epidemic's trajectory. We generate synthetic data to verify the methodology before we apply it to real epidemic data.

The Generalized-Growth Model for Early Ascending Phase
We analyze the evolution of epidemics as a function of the number of times the cumulative incidence doubles before the epidemic peaks, and the rate at which the doubling time increases (hence the epidemic slows down), owing to interventions, behavior changes and depletion of susceptible individuals. The most difficult to control are epidemic threats that double in size many times by the time the peak has been reached, while doubling times increase at a relatively low rate. Conversely, the least difficult to control are epidemic threats that only double in size very few times, while the sequence of doubling times rapidly increases.
During the epidemic growth phase, the times at which cumulative incidence doubles are given by and n d is the total number of times cumulative incidence doubles ( Figure 1A). The actual sequence of "doubling times" is defined as follows (see Figure 1B): As a first approximation, we employ the generalized-growth model (GGM) [8,22] to characterize the temporal evolution of doubling times during the growth phase of the epidemic's trajectory. The GGM is a useful phenomenological model that relaxes the assumption of exponential growth in the early ascending phase of an outbreak, taking the form: where C (t) describes the incidence at time t, the solution C(t) describes the cumulative number of cases at time t, r is a positive parameter denoting the growth rate (with units of (people) (1−p) per time), and p ∈ [0, 1] is a "deceleration of growth" parameter (dimensionless). If p = 0, this equation describes constant incidence over time and the cumulative number of cases grows linearly, whereas p = 1 describes exponential growth. This model has been shown to provide an improved description of the early growth phase of epidemics [8,22] and has displayed promising results for short-term epidemic forecasting [23][24][25]. Cumulative incidence C(t 10 ) In this illustration the doubling times increase as cumulative incidence doubles, indicating that the epidemic features sub-exponential rather than exponential growth dynamics.
For exponential growth dynamics (i.e., p = 1), it is well known that the doubling times remain invariant and is given by: ∆t j = ln 2 r , whereas the doubling times follow an increasing trend when the epidemic grows sub-exponentially fast as shown in Figure 1.
When p < 1 (sub-exponential epidemic growth), given a time t > 0 and the initial number of cases C 0 , one can determine the time to the doubling of cases from t. That is, given t > 0, one can find ∆t = ∆t(t) such that 2C(t) = C(t + ∆t). As shown in the next section, this function is given by Alternatively, one can also express the doubling time ∆t as a function of C = C(t) as follows (full derivation given below): Thus, for the case of sub-exponential growth, i.e., when p < 1, one concludes that the doubling time increases exponentially according to The constant b d depends only on the deceleration of growth parameter p < 1. Hence, b d is the exponential rate at which the doubling times, ∆t j , increase. The slower the rate b d , the slower the rate at which the epidemic doubling times increase, which tends to increase the number of times the epidemic doubles.

Evolution of Doubling Times for Sub-Exponential Growth. Mathematical Analysis
In this section, we carry out mathematical analysis of the temporal evolution of doubling times during the epidemic growth phase using the generalized growth model (1). We wish to find the "doubling time", ∆t, such that for any t > 0, For exponential growth dynamics (i.e., p = 1), one can easily check that ∆t remains invariant and is given by ∆t = ln 2 r . Consider the case p < 1 (sub-exponential epidemic growth). Given C(0) = C 0 , from Equation (1) one obtains Substituting expression (6) into (5), one has This identity is equivalent to In order to solve for ∆t, we rewrite (8) as Equation (9) yields and therefore ∆t = This proves (2) as stated in Section 3 above. One can also express the doubling time, ∆t, as a function of C(t). Indeed, according to (6), Identity (12) implies Combining (11) and (13), one concludes Thus, given C(t), r and p, we can determine ∆t. This confirms property (3) of subexponential growth model introduced in Section 3.

Estimating the Rate of Change of Doubling Times from Epidemic Data
The exponential rate, b d , at which the doubling times ∆t j increase can simply be estimated by least squares fitting, where we estimate b d as the slope of the line given by the equation: The intercept parameter, ln λ α+1 , is also estimated using least squares. That is, we estimate two parameters denoted by Θ, and the sequence of doubling times given by ∆t j is sufficient to estimate both.
To quantify parameter uncertainty, we employ parametric bootstrapping with a Poisson error structure (e.g., [26][27][28]) around the incidence curve from which the sequence of doubling times, ∆t j , can be obtained. That is, we re-estimate parameters Θ i , where i = 1, 2, . . . , s, and s is the number of bootstrap realizations.

Doubling Time Analysis of COVID-19 Epidemics
We conducted simulation studies to verify our analytic results and assess the amount of time-series data (e.g., number of doublings in cumulative incidence) that is needed to reliably estimate the rate b d at which the doubling times increase. For this purpose, we simulated 500 incidence curves comprising 35 days by adding a Poisson error structure to the daily incidence curve obtained from the generalized-growth model. For illustration, we set common parameter r = 0.2 and vary the scaling of growth parameter, p. The initial condition was set at C(0) = 1, see Figure 2.   Figure 3B). We also assessed our ability to estimate b d from limited simulated data using the GGM that incorporates a Poisson error structure as described above. Our results based on simulated data indicate that the parameter b d can be reliably estimated from the first few doubling times of the epidemic (Figure 4).

5 10
Consecutive times at which cumulative incidence doubles   The evolution of the doubling times during the early phase of the representative COVID-19 epidemics in various hotspot countries is displayed in Figure 5. Overall, the mean doubling time ranged from 1.9 (Italy) to 4.1 (Canada). Importantly, Figure 5 also shows that the sequence of doubling times is well characterized by the generalized-growth model.

Conclusions
In this paper, we have introduced a novel framework to characterize epidemic control based on the evolution of doubling times associated with the epidemic's ascending phase. Our methodology, based on analytic and numerical results, validated by simulations, relies on minimal outbreak data and, is independent of the time scales driving the transmission process. Our work contributes in several ways: (1) Our approach provides a framework to numerically and mathematically characterize the rate at which the doubling time of an epidemic is changing over time. (2) We derive analytical formulas of the rate at which the doubling time is changing and test and illustrate our methodology using synthetic and COVID-19 epidemic data. (3) Our mathematical analysis (and proof) demonstrates that the series of epidemic doubling times increase approximately according to an exponential function with a rate that quantifies the rate of change of the doubling times. Our analytic results are in excellent agreement with numerical simulations.
Future studies could characterize the doubling times for past outbreaks of different infectious diseases at different geographic and spatial contexts in order to glean a more refined and comprehensive picture of the relationship between changing doubling times and geographic variability in socio-demographic factors and interventions. Indeed, all outbreaks are influenced by stochasticity and by multiple factors including the mode of transmission and severity that shapes the transmission network structure as well as the fraction of the susceptible population [3,[29][30][31][32] and the effects of behavior changes and control interventions [3,33,34].
Our analysis is not exempt of limitations. First, to illustrate our methodology we were not exhaustive on the number of outbreak datasets on COVID-19, but rather we focused on a convenient sample of COVID-19 datasets at the country level to illustrate our methodology. Second, surveillance epidemiological data is often subject to limitations. In particular, near real-time epidemiological data is often subject to measurement noise and reporting delays, in particular owing to the interval between infection and reporting [17]. Reporting delays could distort the incidence pattern particularly for the most recent weeks of the trajectory of the epidemic [35][36][37].
Our framework is based on a transmission scale that goes beyond estimates of transmission potential tied to the early epidemic onset such as R 0 , or parameters such as Reff that does not include notions of outbreak magnitude. Our method could be readily applied for monitoring the progression of newly emerging or re-emerging infectious diseases as it only requires incidence data describing the cumulative number of cases as a function of time. However, as other transmission metrics, it may be affected by changes in case definitions or underreporting over the course of epidemics, particularly as epidemiological surveillance systems tend to capture more severe cases.