Int. J. Environ. Res. Public Health 2010, 7(1), 291-302; doi:10.3390/ijerph7010291
Abstract
The basic reproduction number, R_{0}, a summary measure of the transmission potential of an infectious disease, is estimated from early epidemic growth rate, but a likelihood-based method for the estimation has yet to be developed. The present study corrects the concept of the actual reproduction number, offering a simple framework for estimating R_{0} without assuming exponential growth of cases. The proposed method is applied to the HIV epidemic in European countries, yielding R_{0} values ranging from 3.60 to 3.74, consistent with those based on the Euler-Lotka equation. The method also permits calculating the expected value of R_{0} using a spreadsheet.
1. Introduction
1.1. The Basic Reproduction Number
The basic reproduction number, R_{0}, of an infectious disease is the average number of secondary cases generated by a single primary case in a fully susceptible population [1]. R_{0} is the most widely used epidemiological measurement of the transmission potential in a given population. Statistical estimation of R_{0} has been performed for various infectious diseases [2,3], aiming towards understanding the dynamics of transmission and evolution, and designing effective public health intervention strategies. In particular, R_{0} has been used for determining the minimum coverage of immunization, because the threshold condition to prevent a major epidemic in a randomly-mixing population is given by 1-1/R_{0} [4]. In addition, R_{0} gives an estimate of the so-called final size, i.e., the proportion of the population that will experience infection by the end of an epidemic [5,6].
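The two uses of R_{0} mentioned above can be illustrated numerically. The following is a minimal sketch, assuming the classical final-size relation z = 1 - exp(-R_{0} z) for a homogeneously mixing population (the relation is not written out in the text, and the function names are illustrative):

```python
import math

def critical_coverage(r0):
    """Minimum immunization coverage 1 - 1/R0 for a randomly-mixing population."""
    if r0 <= 1.0:
        return 0.0  # no major epidemic is expected, so no coverage is required
    return 1.0 - 1.0 / r0

def final_size(r0, tol=1e-12, max_iter=10000):
    """Solve z = 1 - exp(-R0 * z) for the non-zero root by fixed-point iteration."""
    if r0 <= 1.0:
        return 0.0
    z = 0.5  # for R0 > 1, any start in (0, 1] converges to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

print(critical_coverage(4.0))  # 0.75
print(final_size(2.0))         # roughly 0.797
```

Both quantities increase with R_{0}, which is why a biased estimate of R_{0} translates directly into a biased immunization target.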
1.2. Statistical Estimation of R_{0}
Methodological discussions concerning the statistical inference of R_{0} are still in progress, and it is recognized that the estimate is very sensitive to dispersal (or underlying epidemiological assumptions) of the progression of a disease [7]. When one estimates R_{0} using epidemic data of an emerging (or exotic) disease, the exponential growth rate, r, of infections during the initial phase of the epidemic is used [8,9]. Assuming that the generation time, i.e., the time from infection of a primary case to infection of a secondary case generated by the primary case [10], is known (or separately estimated from other empirical data), the growth rate r is transformed to R_{0} using that knowledge (see below). That is, the conventional estimation technique has required two statistical steps, namely, first estimate r and then convert r to R_{0}.
The estimation method can be illustrated by employing a simple renewal process which adheres to the original definition of R_{0} [1]. Let j(t) be the number of new infections (i.e., incidence) at calendar time t. Supposing that each infected individual on average generates secondary cases at a rate A(τ) at time τ since infection (where τ is referred to as the “infection-age” hereafter), j(t) is written as:
$$ j(t) = \int_{0}^{\infty} A(\tau)\, j(t - \tau)\, d\tau \qquad (1) $$
Since R_{0} represents the total number of secondary cases that a primary case generates during the entire course of infection, the estimate is given by ([11]):
$$ R_{0} = \int_{0}^{\infty} A(\tau)\, d\tau \qquad (2) $$
When j(t) follows an exponential growth path, it is easy to extract the integral of A(τ) from equation (1). Supposing that the incidence grows exponentially at a rate r, we have j(t) = k exp(rt) where k is a constant, and moreover, j(t - τ) = k exp(rt) exp(-rτ). Dividing both sides of (1) by k exp(rt) simplifies it to the so-called Euler-Lotka equation:
$$ 1 = \int_{0}^{\infty} A(\tau) \exp(-r\tau)\, d\tau \qquad (3) $$
Since the density function of the generation time, g(τ), represents the frequency of secondary transmissions relative to infection-age τ, we can write g(τ) as:
$$ g(\tau) = \frac{A(\tau)}{\int_{0}^{\infty} A(s)\, ds} = \frac{A(\tau)}{R_{0}} \qquad (4) $$
Replacing A(τ) in the right-hand side of (3) by that of (4), an estimator of R_{0} is obtained [9]:
$$ R_{0} = \frac{1}{\int_{0}^{\infty} g(\tau) \exp(-r\tau)\, d\tau} \qquad (5) $$
The estimation of R_{0} is achieved by measuring the exponential growth rate r from the incidence data and by assuming that g(τ) is known (or separately estimated from empirical observation such as contact tracing data [12,13]). It should be noted that no likelihood-based method for estimating R_{0} has been given for the above-mentioned framework (i.e., a likelihood function used for fitting a statistical model to data, and providing estimates for R_{0}, has been missing). Moreover, equation (5) may not be easily used by non-experts, e.g., an epidemiologist who wishes to estimate R_{0} using her/his own data.
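The conversion from r to R_{0} can be sketched numerically as follows. This is a minimal illustration, not the procedure used in the study: the integral in the estimator is approximated by the trapezoidal rule, and the check uses an exponentially distributed generation time (an assumption chosen only because it has the known closed form R_{0} = 1 + r × mean generation time). The function names and parameter values are hypothetical.

```python
import math

def r0_from_growth_rate(r, g, upper, n=200000):
    """R0 = 1 / integral_0^upper g(tau) * exp(-r * tau) dtau (trapezoidal rule).

    `upper` should be large enough that g is negligible beyond it.
    """
    h = upper / n
    total = 0.5 * (g(0.0) + g(upper) * math.exp(-r * upper))
    for i in range(1, n):
        tau = i * h
        total += g(tau) * math.exp(-r * tau)
    return 1.0 / (total * h)

MEAN_GT = 3.0  # hypothetical mean generation time (years)

def g_exp(tau):
    """Exponential generation time density with mean MEAN_GT (illustrative only)."""
    return math.exp(-tau / MEAN_GT) / MEAN_GT

r0_check = r0_from_growth_rate(0.2, g_exp, upper=200.0)
print(r0_check)  # close to 1 + 0.2 * 3.0 = 1.6
```

The two-step nature of the conventional approach is visible here: r must be estimated first, and the uncertainty of R_{0} then depends on how the uncertainty of r is propagated through the integral.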
The purpose of the present study is to offer an improved framework for estimating R_{0} from early epidemic growth data, which may be slightly more tractable for non-experts than the above-mentioned estimator (5). A likelihood-based approach is proposed to permit derivation of the uncertainty bounds of R_{0}. For an exposition of the proposed method, the incidence data of the HIV epidemic in Europe are explored.
2. Methods
2.1. Actual Reproduction Number
In addition to R_{0}, a different measurement of the transmission potential using widely available epidemiological data, the actual reproduction number, R_{a}, has been proposed for HIV/AIDS [14]. The concept of R_{a} is much simpler than R_{0} in that R_{a} is defined as a product of the mean duration of infectiousness and the ratio of incidence to prevalence [15]. The prevalence p(t) at calendar time t is written as:
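The actual reproduction number R_{a} is written as:
$$ R_{a} = \frac{j(t)}{p(t)}\, D \qquad (8) $$
where D denotes the mean duration of infectiousness.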
R_{a} coincides with R_{0} as long as β(τ) is constant. Nevertheless, in many instances, the contact frequency and infectiousness (which may be partly reflected, for example, in the viral load distribution of the infected host) vary as a function of infection-age τ. The variation in β(τ) is particularly the case for HIV infection. Thus, although the usefulness of the incidence-to-prevalence ratio and R_{a} has been emphasized for applications to HIV/AIDS [15], R_{a} tends to yield a biased estimate (if R_{a} is regarded as a proxy for R_{0}), and the estimate of R_{a} is not as robust as that obtained with equation (5) to objectively interpret the transmission potential [17,18].
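Why R_{a} and R_{0} agree for constant β(τ) can be seen in a short derivation. The sketch below relies on notation that is assumed here rather than taken from the text: Γ(τ) is the probability that an individual is still infectious at infection-age τ, the rate of secondary transmission factorizes as A(τ) = β(τ)Γ(τ), the prevalence is the Γ-weighted sum of past incidence, and D is the integral of Γ(τ).

```latex
% Assumed notation (not defined in the text): \Gamma(\tau), A(\tau) = \beta(\tau)\Gamma(\tau)
\begin{align*}
p(t) &= \int_{0}^{\infty} j(t-\tau)\,\Gamma(\tau)\,d\tau,
\qquad D = \int_{0}^{\infty} \Gamma(\tau)\,d\tau, \\
j(t) &= \int_{0}^{\infty} \beta(\tau)\,\Gamma(\tau)\,j(t-\tau)\,d\tau
      = \beta\,p(t) \quad \text{if } \beta(\tau) \equiv \beta, \\
R_{a} &= \frac{j(t)}{p(t)}\,D = \beta D
      = \int_{0}^{\infty} \beta\,\Gamma(\tau)\,d\tau = R_{0}.
\end{align*}
```

When β(τ) varies with τ, the second line no longer reduces to β p(t); the ratio j(t)/p(t) then depends on the infection-age structure of the infected population, and R_{a} generally differs from R_{0}.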
2.2. Correcting R_{a}
Here, the above mentioned negative aspect of R_{a} is reconsidered by correcting the definition of R_{a}. The disease of interest in the present study is HIV. The frequency of secondary transmissions relative to infection-age τ (i.e., the generation time distribution), approximated by a step function, is shown in Figure 1. As has been known for some time [19], the frequency of secondary transmissions is very high shortly after infection (for a duration d_{1} = 0.24 years), followed by a long asymptomatic period with a low frequency of secondary transmissions (for d_{2} = 8.38 years) [20]. Subsequently, the frequency rises sharply again resulting in a substantial number of secondary cases for a duration d_{3} = 0.75 years until death or until the infected individual ceases risky sexual contact due to AIDS [20–22]. Assuming that the contact frequency does not vary as a function of infection-age, g_{1}, g_{2} and g_{3} have been estimated at 1.30, 0.05 and 0.36 per year [20]. Here, g(τ) is the density function of the generation time, with a mean of 3.79 years.
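The step function described above can be written down directly. The following sketch encodes the three stages with the durations and rates quoted in the text and checks that the area under the curve is close to one, as expected for a density; the function and variable names are illustrative.

```python
# Step-function generation time density for HIV, using the durations (years)
# and rates (per year) quoted in the text. Names are illustrative.
STAGES = [
    (0.24, 1.30),  # (d1, g1): early stage with frequent secondary transmission
    (8.38, 0.05),  # (d2, g2): long asymptomatic stage
    (0.75, 0.36),  # (d3, g3): late stage
]

def g(tau):
    """Return the step-function density at infection-age tau (years)."""
    if tau < 0:
        return 0.0
    start = 0.0
    for duration, height in STAGES:
        if tau < start + duration:
            return height
        start += duration
    return 0.0  # no transmission after the last stage

def total_area():
    """Area under the step function; close to 1 for a probability density."""
    return sum(duration * height for duration, height in STAGES)

print(round(total_area(), 3))  # 1.001
```

The small departure from one reflects rounding of the quoted parameter values.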
Here the concept of R_{a} is corrected. Equation (8) is rewritten as:
$$ R_{a} = \frac{j(t)}{p(t)/D} \qquad (10) $$
The numerator represents the number of new infections at calendar time t, while the denominator was originally intended to represent “the total number of infectious individuals” who can potentially be primary cases with an equal chance at time t. Nevertheless, in order that the estimator of the actual reproduction number coincides with that of R_{0}, the concept of the denominator is better replaced by “the total number of effective contacts (which can potentially lead to secondary transmissions with an equal probability)”. Therefore, the corrected R_{a} is better written as:
$$ R_{a} = \frac{j(t)}{\int_{0}^{\infty} j(t - \tau)\, g(\tau)\, d\tau} \qquad (11) $$
Replacing g(τ) in the right-hand side of (11) by that of (12), we get:
Thus, the estimator of corrected R_{a} in (11) is identical to that of R_{0}. In other words, R_{0} can be estimated from the incidence data and the generation time without assuming exponential growth of cases during the early phase of an epidemic. It should be noted that the ratio of prevalence to mean generation time p(t)/D in the denominator of the right-hand side of (10) has been replaced by “the total number of effective contacts that have equal potential to generate secondary cases”.
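The identity can be checked in one line. Taking the corrected R_{a} to be the ratio of j(t) to the g-weighted integral of past incidence, and assuming that g(τ) = A(τ)/R_{0} as in the definition of the generation time density, the renewal equation for j(t) gives:

```latex
\begin{align*}
R_{a} = \frac{j(t)}{\int_{0}^{\infty} j(t-\tau)\, g(\tau)\, d\tau}
      = \frac{j(t)}{\frac{1}{R_{0}} \int_{0}^{\infty} A(\tau)\, j(t-\tau)\, d\tau}
      = \frac{R_{0}\, j(t)}{j(t)}
      = R_{0}.
\end{align*}
```

No step substitutes j(t) = k exp(rt), which is why the result does not depend on exponential growth of cases.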
2.3. Data
Here the epidemic data of HIV/AIDS in three European countries, namely France, the Western part of Germany (i.e., the former Western Germany) and the United Kingdom (UK), are investigated [23]. Figure 2A shows the yearly incidence in these countries from 1976 to 2000. During the observation period, a total of 23,243, 13,126 and 11,491 AIDS cases were diagnosed in France, Western Germany and the UK, respectively. Although the time of HIV infection is not directly observable, the HIV incidence has been estimated by employing a back-calculation technique and using the AIDS incidence and the incubation period distribution of AIDS [23]. The present study does not discuss the details of back-calculation, but explanatory guides can be found elsewhere [24–26]. Figure 2B shows the enlarged HIV incidence curve during the initial phase of an epidemic. The peak incidence was observed in 1982 for Western Germany and 1983 for France and the UK. In the following, the time period from 1976 up to one year prior to the peak incidence is taken as the early growth phase. For the purpose of an exposition of the proposed method, the HIV incidence is assumed to have been in a single homogeneously-mixing population.
2.4. Statistical Analysis
R_{0} is estimated using two different methods, one employing the estimator (5) and another using the corrected R_{a}. For the former approach, the exponential growth rate is estimated via a pure birth process [27]. Given that the cumulative incidence from year 0 to t – 1 is observed, the conditional likelihood of observing the cumulative incidence J_{t} cases in year t is proportional to ([28]):
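The estimation of r via a pure birth process can be sketched as follows. The conditional likelihood used here is an assumption: it is the standard one-year transition probability of a pure birth (Yule) process, proportional to exp(-r J_{t-1}) (1 - exp(-r))^{J_{t} - J_{t-1}}, and may differ in detail from the expression used in the study. The cumulative incidence series is hypothetical and the function names are illustrative.

```python
import math

def log_likelihood(r, cumulative):
    """Conditional log-likelihood (up to a constant) of a pure birth process.

    Assumed form for each year t:
    exp(-r * J[t-1]) * (1 - exp(-r)) ** (J[t] - J[t-1]).
    """
    total = 0.0
    for prev, curr in zip(cumulative[:-1], cumulative[1:]):
        total += -r * prev + (curr - prev) * math.log(1.0 - math.exp(-r))
    return total

def mle_growth_rate(cumulative):
    """Closed-form maximizer of the log-likelihood above:
    r = ln(sum of J[t] / sum of J[t-1])."""
    return math.log(sum(cumulative[1:]) / sum(cumulative[:-1]))

# Hypothetical cumulative incidence, doubling every year (not the HIV data).
J = [10, 20, 40, 80, 160]
r_hat = mle_growth_rate(J)
print(r_hat)  # close to ln 2, about 0.693
```

Under this assumed likelihood the estimate has a closed form, so numerical optimization is only needed for the confidence interval, for example by profiling the log-likelihood around r_hat.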
The latter method, proposed in the present study, employs the estimator of corrected R_{a} in (11). Since the data are yearly, the equation (11) needs to be rewritten in discrete-time:
The likelihood of estimating R_{0} with (15) is proposed as follows. First, the inverse of both sides of (15) is taken:
The numerator of the right-hand side indicates the total number of effective contacts made by potential primary cases in year t that have an equal probability of resulting in a secondary transmission, and the denominator is the number of secondary cases in year t.
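The discrete-time calculation is simple enough for a spreadsheet, and can equally be sketched in a few lines of code. The example below assumes that the discrete estimator is the ratio of j_{t} to the weighted sum of past incidence, with the sum running over s ≥ 1; the summation range, the pooling of years into a single ratio, the yearly weights g_{s} and the synthetic incidence series are all assumptions made for illustration, not values from the study.

```python
def weighted_past_incidence(j, g, t):
    """Sum of g[s] * j[t - s] over s = 1 .. min(t, len(g) - 1)."""
    return sum(g[s] * j[t - s] for s in range(1, min(t, len(g) - 1) + 1))

def simulate_incidence(r0, g, j0, years):
    """Deterministic discrete renewal process: j[t] = R0 * sum_s g[s] * j[t - s]."""
    j = [float(j0)]
    for t in range(1, years):
        j.append(r0 * weighted_past_incidence(j, g, t))
    return j

def estimate_r0(j, g):
    """Pooled ratio of new infections to g-weighted past incidence over t >= 1."""
    numer = sum(j[t] for t in range(1, len(j)))
    denom = sum(weighted_past_incidence(j, g, t) for t in range(1, len(j)))
    return numer / denom

# Hypothetical yearly weights g[1..4] (g[0] is unused) and synthetic incidence.
g_weights = [0.0, 0.5, 0.3, 0.15, 0.05]
incidence = simulate_incidence(3.6, g_weights, j0=5, years=8)
r0_estimate = estimate_r0(incidence, g_weights)
print(r0_estimate)  # recovers a value close to 3.6
```

The synthetic series is not exactly exponential in its first years, because the infection history is truncated at the index case, yet the ratio still returns the value of R_{0} used in the simulation.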
The right-hand side of equation (16) is interpreted as follows. Figure 3A shows a transmission tree, i.e., a representation of who infected whom, where each primary case on average generates two secondary cases. A transmission tree of this kind is usually unobserved unless rigorous contact-tracing with microbiological examination is implemented. Thus, a likelihood-based approach to reconstructing the tree is considered (Figure 3B) [29–31]. Given the total number of effective contacts that have equal potential for resulting in secondary transmission, the probability of a single effective contact resulting in a secondary transmission (or the probability that a secondary case is linked to an effective contact made by a single primary case in year t) is given by 1/R_{0}. This is a simple binomial sampling process. In other words, the likelihood function for estimating R_{0} is:
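A binomial likelihood of this kind can be sketched as follows. The assignment of counts is an assumption: the yearly number of new infections is treated as the number of trials and the g-weighted past incidence as the number of successes, chosen so that the maximum likelihood estimate reproduces the ratio of new infections to weighted past incidence. Because the weighted sums are generally not integers, the binomial coefficient (which does not involve R_{0}) is dropped. The input values are hypothetical and the function names are illustrative.

```python
import math

def log_likelihood(r0, trials, successes):
    """Binomial log-likelihood with success probability 1/R0 (constants dropped)."""
    p = 1.0 / r0
    return sum(k * math.log(p) + (n - k) * math.log(1.0 - p)
               for n, k in zip(trials, successes))

def mle_r0(trials, successes):
    """The binomial MLE of p is sum(k) / sum(n), so R0 = sum(n) / sum(k)."""
    return sum(trials) / sum(successes)

def profile_bound(trials, successes, lo, hi, drop=1.92, iters=100):
    """Bisection for the R0 in [lo, hi] at which the log-likelihood is `drop`
    below its maximum (1.92 corresponds to a 95% interval)."""
    target = log_likelihood(mle_r0(trials, successes), trials, successes) - drop
    f_lo = log_likelihood(lo, trials, successes) - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = log_likelihood(mid, trials, successes) - target
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical yearly counts: new infections and g-weighted past incidence.
trials = [18.0, 65.0, 234.0]
successes = [5.0, 18.0, 65.0]
r_hat = mle_r0(trials, successes)
lower = profile_bound(trials, successes, 1.0 + 1e-9, r_hat)
upper = profile_bound(trials, successes, r_hat, 50.0)
print(r_hat, lower, upper)
```

The profile likelihood interval is one common way to obtain uncertainty bounds; the text does not state which method was used to compute the intervals reported in Table 1.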
3. Results and Discussion
Table 1 shows the estimates of r and R_{0} for HIV in France, Western Germany and the UK. The maximum likelihood estimates of r ranged from 1.15 to 2.15 per year with the smallest estimate in France and the highest in Western Germany. The 95% CI for Western Germany did not overlap with those of the other two countries, reflecting the steep rise in incidence in this region in Figure 2B. Based on the exponential growth assumption in (5), the estimate of R_{0} ranged from 3.65 to 4.08. Again, Western Germany yielded the highest estimate without an overlap of the uncertainty bound with the other two countries. The maximum likelihood estimate of R_{0} based on the proposed new method ranged from 3.59 to 3.74. Not only were the qualitative patterns for the expected values of R_{0} consistent with those based on an exponential growth assumption, but the 95% CIs also broadly overlapped with those based on the other method. In particular, although R_{0} in Western Germany using the proposed method is smaller than that based on an exponential growth assumption, the 95% CIs of the two methods overlapped. The 95% CIs based on the proposed method were wider than those based on the exponential growth assumption. Since HIV is mainly transmitted via sexual contact, the above-mentioned estimate may vary with the mixing pattern and contact frequency (thus, there is no general disease-specific R_{0}, especially for HIV/AIDS). Nevertheless, compared with a previous estimate of R_{0} ranging from 13.9 to 54.5 in the USA, which was based on an exponential growth assumption and adopted a mean infectious period of 10 years [32], R_{0} in the present study, which uses a precise estimate of the generation time distribution, is much smaller.
The present study proposed the use of the corrected actual reproduction number, R_{a}, for statistical inference of R_{0} based on incidence and known relative frequency of secondary transmissions (i.e., the generation time distribution) during the early growth phase of an epidemic. The previously available method using (5) forced us to adopt an exponential growth assumption, and moreover, an additional step towards the estimation of r (i.e., the translation of r to R_{0}) was required [9]. The proposed method does not necessarily require an exponential growth assumption and provides a “short-cut” to estimate R_{0} from incidence data. The simple likelihood function employing a binomial distribution was also proposed to yield an appropriate uncertainty bound of R_{0}. It should be noted that given the knowledge of g_{s} and readily available incidence data, equation (15) permits calculation of the expected value of R_{0} without likelihood. Such a calculation can be attained using any spreadsheet.
The usefulness of the actual reproduction number, calculated as a product of the mean generation time and the incidence-to-prevalence ratio, has been previously emphasized in assessing the epidemiological time course of an epidemic [14,15]. However, it appears that R_{a} does not precisely capture the secondary transmission if the transmission rate β(τ) varies with infection-age τ [18], and moreover, the cohort- and period-reproduction numbers directly derived from the renewal equation have been suggested to be more accurate figures in capturing the underlying transmission dynamics [17,33]. In the present study, replacing the denominator (i.e., prevalence) of R_{a} by the total number of potential contacts, it was shown that the R_{0} derived from the renewal equation coincides with the corrected actual reproduction number, R_{a}, and also that the likelihood can be quite easily derived. The corrected R_{a} does not require prevalence data, and uses only the incidence data and the generation time distribution.
Many future tasks remain, however. Most importantly, the estimation of R_{0} from early epidemic growth data for a heterogeneously-mixing population is called for. R_{0} in the present study can even be interpreted as R_{0} for a heterogeneously-mixing population (i.e., the dominant eigenvalue of the next-generation matrix), provided that the early growth rate is the same among sub-populations (though it is not the case if the growth rate varies across sub-populations) [34,35]. Analyzing heterogeneous transmission among approximately-aggregated discrete groups, the estimate of the next-generation matrix would give more detailed insights into the epidemic dynamics, including the most important target host for intervention [36]. As discussed in a modeling study in this special issue of the International Journal of Environmental Research and Public Health [37], understanding the implications of sexual partnerships and their variations as a function of calendar time as well as infection-age is also of utmost importance. As the next step for a similar estimation framework, methods for estimating robust R_{0} and the next-generation matrix from structured data (i.e., data stratified by age- and/or risk-groups) will be useful. Despite the future challenges, I believe the present study at least satisfies a need to offer a likelihood-based approach to estimate R_{0} from early epidemic growth data, while being easily tractable and calculable among general epidemiologists.
The work of H. Nishiura was supported by the JST PRESTO program.
References
- Diekmann, O; Heesterbeek, JA; Metz, JA. On the definition and the computation of the basic reproduction ratio R_{0} in models for infectious diseases in heterogeneous populations. J. Math. Biol 1990, 28, 365–381.
- Dietz, K. The estimation of the basic reproduction number for infectious diseases. Stat. Methods. Med. Res 1993, 2, 23–41.
- Becker, NG. Analysis of Infectious Disease Data; Chapman & Hall: Boca Raton, FL, USA, 1989.
- Smith, CE. Factors in the transmission of virus infections from animal to man. Sci Basis Med Ann Rev 1964, 125–150.
- Kendall, DG. Deterministic and stochastic epidemics in closed populations. In Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1956; pp. 149–165.
- Ma, J; Earn, DJ. Generality of the final size formula for an epidemic of a newly invading infectious disease. Bull. Math. Biol 2006, 68, 679–702.
- Heffernan, JM; Wahl, LM. Improving estimates of the basic reproductive ratio: using both the mean and the dispersal of transition times. Theor. Pop. Biol 2006, 70, 135–145.
- Massad, E; Coutinho, FA; Burattini, MN; Amaku, M. Estimation of R_{0} from the initial phase of an outbreak of a vector-borne infection. Trop. Med. Int. Health 2009, 15, 120–126.
- Wallinga, J; Lipsitch, M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. Roy. Soc. Lon. Ser. B 2007, 274, 599–604.
- Svensson, A. A note on generation times in epidemic models. Math. Biosci 2007, 208, 300–311.
- Diekmann, O; Heesterbeek, JAP. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation; John Wiley & Son: Chichester, UK, 2000.
- Garske, T; Clarke, P; Ghani, AC. The transmissibility of highly pathogenic avian influenza in commercial poultry in industrialised countries. PLoS ONE 2007, 2, e349.
- White, LF; Pagano, M. A likelihood-based method for real-time estimation of the serial interval and reproductive number of an epidemic. Stat. Med 2008, 27, 2999–3016.
- Amundsen, EJ; Stigum, H; Rottingen, JA; Aalen, OO. Definition and estimation of an actual reproduction number describing past infectious disease transmission: application to HIV epidemics among homosexual men in Denmark, Norway and Sweden. Epidemiol. Infect 2004, 132, 1139–1149.
- White, PJ; Ward, H; Garnett, GP. Is HIV out of control in the UK? An example of analysing patterns of HIV spreading using incidence-to-prevalence ratios. AIDS 2006, 20, 1898–1901.
- Chowell, G; Nishiura, H. Quantifying the transmission potential of pandemic influenza. Phys. Life. Rev 2008, 5, 50–77.
- Fraser, C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE 2007, 2, e758.
- Nishiura, H; Chowell, G. The effective reproduction number as a prelude to statistical estimation of time-dependent epidemic trends. In Mathematical and Statistical Estimation Approaches in Epidemiology; Chowell, G, Hyman, JM, Bettencourt, LMA, Castillo-Chavez, C, Eds.; Springer: Dordrecht, The Netherlands, 2009; pp. 103–121.
- Shiboski, SC; Jewell, NP. Statistical analysis of the time dependence of HIV infectivity based on partner study data. J. Amer. Statist. Assn 1991, 87, 360–372.
- Hollingsworth, TD; Anderson, RM; Fraser, C. HIV-1 transmission, by stage of infection. J. Infect. Dis 2008, 198, 687–693.
- Wawer, MJ; Gray, RH; Sewankambo, NK; Serwadda, D; Li, X; Laeyendecker, O; Kiwanuka, N; Kigozi, G; Kiddugavu, M; Lutalo, T; Nalugoda, F; Wabwire-Mangen, F; Meehan, MP; Quinn, TC. Rates of HIV-1 transmission per coital act, by stage of HIV-1 infection, in Rakai, Uganda. J. Infect. Dis 2005, 191, 1403–1409.
- Abu-Raddad, LJ; Longini, IM. No HIV stage is dominant in driving the HIV epidemic in sub-Saharan Africa. AIDS 2008, 22, 1055–1061.
- Artzrouni, M. Back-calculation and projection of the HIV/AIDS epidemic among homosexual/bisexual men in three European countries: Evaluation of past projections and updates allowing for treatment effects. Eur. J. Epidemiol 2004, 19, 171–179.
- Brookmeyer, R; Gail, MH. AIDS Epidemiology: A Quantitative Approach (Monographs in Epidemiology and Biostatistics); Oxford University Press: New York, NY, USA, 1994.
- Jewell, NP; Dietz, K; Farewell, VT. AIDS Epidemiology: Methodological Issues; Birkhäuser: Berlin, Germany, 1992.
- Nishiura, H. Lessons from previous predictions of HIV/AIDS in the United States and Japan: epidemiologic models and policy formulation. Epidemiol. Perspect. Innov 2007, 4, 3.
- Bailey, NTJ. The Elements of Stochastic Processes with Applications to the Natural Sciences; Wiley: New York, NY, USA, 1964.
- Nishiura, H; Castillo-Chavez, C; Safan, M; Chowell, G. Transmission potential of the new influenza A(H1N1) virus and its age-specificity in Japan. Euro. Surveill 2009, 14, 19227.
- Wallinga, J; Teunis, P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Amer. J. Epidemiol 2004, 160, 509–516.
- Haydon, DT; Chase-Topping, M; Shaw, DJ; Matthews, L; Friar, JK; Wilesmith, J; Woolhouse, ME. The construction and analysis of epidemic trees with reference to the 2001 UK foot-and-mouth outbreak. Proc. Roy. Soc. Lon. Ser. B 2003, 270, 121–127.
- Nishiura, H; Schwehm, M; Kakehashi, M; Eichner, M. Transmission potential of primary pneumonic plague: time inhomogeneous evaluation based on historical documents of the transmission network. J. Epidemiol. Community Health 2006, 60, 640–645.
- Jacquez, JA; Simon, CP; Koopman, JS. The reproduction number in deterministic models of contagious diseases. Comments Theor. Biol 1991, 2, 159–209.
- Grassly, NC; Fraser, C. Mathematical models of infectious disease transmission. Nat. Rev. Microbiol 2008, 6, 477–487.
- Nishiura, H; Chowell, G; Heesterbeek, H; Wallinga, J. The ideal reporting interval for an epidemic to objectively interpret the epidemiological time course. J. R. Soc. Interface 2010, 7, 297–307.
- Nishiura, H; Chowell, G; Safan, M; Castillo-Chavez, C. Pros and cons of estimating the reproduction number from early epidemic growth rate of influenza A (H1N1) 2009. Theor. Biol. Med. Model 2010, 7, 1.
- Hethcote, HW; Yorke, JA. Gonorrhea Transmission Dynamics and Control (Lecture Notes in Biomathematics, 56); Springer-Verlag: Berlin, Germany, 1980.
- Muller, H; Bauch, C. When do sexual partnerships need to be accounted for in transmission models of human papilloma virus. Int J Environ Res Public Health 2010.
Table 1. Comparison of the estimates of the basic reproduction number for HIV/AIDS obtained using two different estimation methods.

Country | r (/year)^{1} | R_{0} (exponential growth) ^{2} | R_{0} (proposed likelihood) ^{3} |
---|---|---|---|
France | 1.15 (1.12, 1.17) | 3.65 (3.64, 3.66) | 3.59 (3.38, 3.81) |
Western Germany | 2.15 (2.02, 2.29) | 4.08 (4.02, 4.14) | 3.74 (3.43, 4.08) |
UK | 1.21 (1.18, 1.25) | 3.67 (3.66, 3.69) | 3.65 (3.38, 3.96) |

^{1} The intrinsic growth rate during the exponential growth phase; ^{2} the basic reproduction number estimated using equation (5); ^{3} the basic reproduction number estimated using equation (17). The 95% confidence intervals are shown in parentheses.
© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).