Analytical Estimation of Data-Motivated Time-Dependent Disease Transmission Rate: An Application to Ebola and Selected Public Health Problems

Obtaining reasonable estimates of transmission rates from observed data is a challenge when using mathematical models to study the dynamics of infectious diseases such as Ebola. Most models assume that the transmission rate of a contagion either does not vary over time or changes in a fixed, pre-determined, ad hoc way. However, these rates do vary during an outbreak due to a multitude of factors such as environmental conditions, social behaviors, and public-health interventions deployed to control the disease, which are in part guided by the changing size of the outbreak. We derive analytical estimates of the time-dependent transmission rate for an epidemic in terms of either incidence or prevalence using a standard SIR-type epidemic model. We illustrate the applicability of our method by applying it to data on various public health problems, including infectious diseases (Ebola, SARS, and Leishmaniasis) and social issues (obesity and alcohol drinking), to compute transmission rates over time. We show that time-dependent transmission rate estimates can vary widely, depending on the type of available data and other epidemiological parameters. Time-dependent estimation of transmission rates captures the dynamics of the problem better and can be used to understand disease progression more accurately.


Introduction
An epidemic is a function of environmental factors and a contact structure that varies over time, which in turn leads to a varying transmission potential of an "infection". We also use the word "infection" to describe social influences exerted by a typical influential individual with a particular social problem that result in a naive (to the social problem) individual becoming involved in the problem. For example, an alcohol drinker might influence an abstainer into imitating drinking behavior and initiating alcohol drinking. Many authors have studied outbreaks of social problems and infectious diseases using compartmental transmission/influence models. Qualitative aspects of homogeneous mixing models with constant transmission potential of an infection are well understood for various applications. These models are relatively easy to analyze and can answer questions at the population level with good precision. Homogeneous mixing compartmental models have a long history; however, quantifying the temporal transmission potential of an infectious agent, an input variable for this type of model, has been a challenge.
William Hamer published a paper in 1906 containing an epidemic model for the transmission of measles, in which he observed that the incidence of new cases in a time interval is proportional to the product, SI, of the density of susceptibles (S) and the density of infectives (I) in the population. The formulation of incidence can be explained by considering a few epidemiological quantities. Consider a single susceptible individual in a homogeneously mixing population of size N. This individual contacts other members of the population at the rate c per unit time, and a proportion I/N of these contacts are with individuals who are "infectious". If the probability of transmission of infection given contact is ρ, then the rate at which the infection is transmitted to a susceptible is ρcI/N per unit time, and the rate at which the susceptible population becomes infected is ρcSI/N.
The contact rate is often a function of population density, reflecting the fact that contacts take time and saturation occurs. If c is assumed to be approximately proportional to N or equal to a constant, incidence can be represented by terms such as βSI (referred to as mass action incidence) or βSI/N (referred to as standard incidence), respectively. The parameter β, which includes the contact rate c, is known as the "transmission coefficient" (or "effective contact rate" or "transmission potential") and has units of time⁻¹. At low population densities, mass action is a reasonable approximation of a much more complex contact structure; in general, however, standard incidence is more appropriate for modeling transmission of human diseases or influences for social problems. The term βI/N is sometimes referred to as the force of infection, i.e., the per-capita rate at which susceptible members of the host population become infected. The transmission rate, on the other hand, represents the number of new infections per unit of time generated by an infected individual. It is calculated by dividing the incidence for a given time period by the disease prevalence for the same time interval.
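The quantities defined above can be made concrete with a small numerical sketch. The function names and the illustrative values of c, ρ, N, S, and I below are hypothetical and are not taken from any data set used in this paper.

```python
def mass_action_incidence(beta, S, I):
    """New infections per unit time under mass action incidence: beta * S * I."""
    return beta * S * I


def standard_incidence(beta, S, I, N):
    """New infections per unit time under standard incidence: beta * S * I / N."""
    return beta * S * I / N


def force_of_infection(beta, I, N):
    """Per-capita rate at which susceptibles become infected: beta * I / N."""
    return beta * I / N


def transmission_rate(incidence, prevalence):
    """New infections per unit time generated per infected individual."""
    return incidence / prevalence


# Hypothetical illustration: 10 contacts per day, 5% transmission probability.
c, rho = 10.0, 0.05
beta = rho * c                      # transmission coefficient, per day
N, S, I = 1000.0, 900.0, 50.0       # population size, susceptibles, infectives

new_cases = standard_incidence(beta, S, I, N)    # new infections per day
per_infective = transmission_rate(new_cases, I)  # equals beta * S / N here
```

With these numbers the force of infection multiplied by S reproduces the standard incidence, which is the relationship described in the text.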
Most infectious disease data are collected in the form of incidence and/or prevalence. The prevalence of a "disease" in a population is defined as the total number of cases of the disease in the population at a given time, whereas the prevalence proportion is computed by dividing the total number of cases in the population by the number of individuals in the population. It is used as an estimate of how common a condition is within a population over a certain period of time. Incidence is a measure of the risk of developing some new condition within a specified period of time. The incidence proportion (also known as cumulative incidence) is the number of new cases within a specified time period divided by the size of the population initially at risk. When the denominator is the sum of the person-time of the at-risk population, it is also known as the incidence density rate or person-time incidence rate. Using person-time rather than just time handles situations where the amount of observation time differs between people, or when the population at risk varies with time. In short, prevalence measures all individuals affected by the disease within a particular period of time, whereas incidence measures the number of new individuals who contract the disease during that period. Thus, the prevalence proportion and the incidence proportion at time t are given by I(t)/N(t) and β × (S(t)/N(t)) × (I(t)/N(t)), respectively.
In compartmental mathematical models, varied assumptions are made based on the characteristics of the disease being modeled, which lead modelers to focus on the more important aspects of the epidemic. For example, for an epidemic that occurs on a timescale much shorter than that of population replenishment (that is, the epidemic occurs at a much faster rate than births and deaths in the population), a constant population size can be assumed. Additional common features of these models might include temporary or permanent recovery of infected individuals and a birth rate into the infective class. Determining whether an infectious disease or a social problem will become established or cause a major outbreak in a population requires extensive experience or a mathematical model of disease dynamics together with estimates of the parameters of the model. Here, we provide a method for estimating the transmission coefficient (β), which is a key parameter in shaping the epidemic dynamics generated from the model because of the nonlinearity of the term containing it. A suitable set of data for estimating β includes the prevalence and incidence of the outbreak in question. There are many different methods for estimating β, but most of them result in an aggregate value over time. The methods in the literature include estimation using regression of prevalence on time since the start of an epidemic [1], estimation from the equation for the basic reproductive rate when the threshold density is known [2], estimation from equilibrium prevalence [3,4], use of age prevalence curves [5], inference from behavior or contact data [6], and iterative comparison of field prevalence data with model predictions [7].
Some researchers have modeled time-varying transmission coefficients for diseases that follow seasonal patterns, but using a predefined functional form [8]. On the other hand, a study by Finkenstadt and Grenfell [9] uses a discrete-time model that allows for a temporally varying transmission parameter with a period of one year and no assumption on its functional form. However, their estimation is computationally intensive and assumes that the reporting interval of the available data is an integer fraction of the serial interval of the disease. Another study, by Pollicott et al. (2012), suggested first fitting the data with a pre-defined continuous function and then providing an analytical estimate of the transmission rate. However, their method was only applicable to prevalence data, with some restrictive assumptions on the initial number of susceptibles or vital rates [10]. In the current study, we provide an analytical estimation of the transmission coefficient using a distinct mathematical approach that is applicable to both prevalence and incidence data and to a wide range of public-health problems, including social issues. Table 1 provides a brief comparison of the estimation procedures in Pollicott et al. and the present study. Examples of social problems such as alcohol drinking and obesity, and of infectious diseases such as Ebola, Visceral Leishmaniasis (or Kala-azar), and SARS, are used to show the relevance of the analytical work. The available data on US college alcohol drinking and the obesity outbreak in the US consist of prevalence trends, whereas incidence data from the Ebola outbreak in West Africa (Guinea, Sierra Leone, and Liberia), the Kala-azar outbreak in Bihar, and the SARS epidemic in Hong Kong are used for the estimation.
In this paper, we compute time-dependent and time-independent transmission coefficients of Ebola virus disease along with other public-health problems such as college alcohol drinking, the obesity epidemic in the United States, the spread of Visceral Leishmaniasis, and the 2003 SARS outbreak in Hong Kong. The remainder of the paper is organized as follows: Section 2 provides a compartmental SIR model and two analytical expressions for the transmission coefficient based on prevalence and incidence data; examples of computing the coefficient over time using each of the two expressions and field data are shown in Section 3; and finally, the results are discussed in Section 4. Figure 1 gives an overview of this paper.

Formulation for Time-Dependent Estimation
Epidemics in a population are typically captured via SIR-type (Susceptible-Infectious-Recovered) epidemic models. It is assumed that, in a well-mixed population, individuals interact with each other at random. The model considers a population of size N, where S(t), I(t), and R(t) represent the susceptible, infectious, and removed individuals at time t. Individuals are recruited into the population at the rate b(t), die at a constant per-capita mortality rate µ, and recover from infection at a constant per-capita rate α. The model assumes that the recruitment rate is governed by immigration, emigration, and natural births, and that recovered individuals are immune to the infection but, after a temporary immunity period, may lose immunity and move to the S class. Hence, a "disease" outbreak in the population can be captured by a system of differential equations, Equations (1) and (2), where R(t) = 1 − S(t) − I(t) (the state variables are expressed as proportions of the population) and the parameters are defined in Tables 2 and 3 (see also Figure 2). Following the steps carried out in Hadeler [11] and using Equations (1) and (2), we derive two explicit expressions for β(t): one based on prevalence data and the other on the incidence of the disease. The main derivation steps are given below.

Table 3. Definition of variables and parameters in the model given by Equations (1) and (2).

Parameter   Definition
β(t)        Transmission or influence coefficient
α(t)        Per-capita recovery rate (its reciprocal is the infectious period)
γ(t)        Per-capita rate of losing immunity or relapse rate
µ(t)        Per-capita mortality or departure rate
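The displayed form of Equations (1) and (2) is not reproduced in this text. One system consistent with the verbal description above (recruitment b(t), a fraction p of recruits entering the susceptible class so that p = 1 means no vertical transmission, recovery at rate α(t), loss of immunity at rate γ(t), and departure at rate µ(t)) is sketched below. The placement of p and the exact arrangement of terms are assumptions inferred from the surrounding text, not a transcription of the original equations.

```latex
\begin{align}
\frac{dS}{dt} &= p\,b(t) - \beta(t)\,S\,I - \mu(t)\,S + \gamma(t)\,R, \\
\frac{dI}{dt} &= (1-p)\,b(t) + \beta(t)\,S\,I - \bigl(\alpha(t) + \mu(t)\bigr)\,I,
\qquad R = 1 - S - I .
\end{align}
```

Under this form, the incidence is w(t) = β(t)SI, and adding the two equations removes the nonlinear term, which is the step the prevalence-based derivation relies on.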

Derivation of β(t) Expression in Terms of Prevalence
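Suppose prevalence data are available. The derivation of β(t) as a function of prevalence is carried out as follows. Adding Equations (1) and (2) eliminates the nonlinear transmission term and gives Equation (3). Solving Equation (3) yields S(t) in terms of the prevalence, Equation (4), and solving Equation (2) for β(t) then gives Formula (5), in which S(t) is given by Equation (4). Note that, besides the prevalence I(t), we also need its time derivative, I′(t), to compute β(t) using Formula (5). However, I′(t) can be approximated from the prevalence data.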
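A minimal sketch of the prevalence-based estimate is given below for one simplified case: p = 1, γ = 0, and constant b = µ and α, so that dS/dt = µ − βSI − µS and dI/dt = βSI − (α + µ)I. These simplifications, the function name, and the use of central differences (one-sided at the end points) for I′(t) are choices made for this illustration and may differ from the exact discretization used for the reported results.

```python
import math


def beta_from_prevalence(times, prev, alpha, mu, s0):
    """Estimate beta(t_k) from prevalence proportions prev[k] = I(t_k).

    Assumes dS/dt = mu - beta*S*I - mu*S and dI/dt = beta*S*I - (alpha + mu)*I,
    so that u = S + I satisfies d/dt (e^{mu t} u) = e^{mu t} (mu - alpha*I).
    """
    n = len(times)
    tau = [t - times[0] for t in times]  # time since the first observation
    f = [math.exp(mu * x) * (mu - alpha * i) for x, i in zip(tau, prev)]
    u0 = s0 + prev[0]
    integral = 0.0  # running composite trapezoidal integral of f
    betas = []
    for k in range(n):
        if k > 0:
            integral += 0.5 * (tau[k] - tau[k - 1]) * (f[k] + f[k - 1])
        s_k = math.exp(-mu * tau[k]) * (u0 + integral) - prev[k]
        # Central difference in the interior, one-sided at the two ends.
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        d_prev = (prev[hi] - prev[lo]) / (tau[hi] - tau[lo])
        # Solve dI/dt = beta*S*I - (alpha + mu)*I for beta.
        betas.append((d_prev + (alpha + mu) * prev[k]) / (s_k * prev[k]))
    return betas
```

The function needs the initial susceptible proportion `s0` in addition to the prevalence series, which mirrors the dependence of the estimate on S(0) noted in the applications.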

Derivation of β(t) Expression in Terms of Incidence
On the other hand, suppose incidence data are available. To obtain β(t) as a function of the incidence, w(t) = β(t)SI, we first solve Equation (2) for I with initial condition I(T) (where T ∈ [0, L] is a time at which the prevalence proportion, I(T), is available) and obtain Equation (6), where

H(a, b) = \exp\left(-\int_a^b \bigl(\alpha(s) + \mu(s)\bigr)\,ds\right).

Using this expression for I(t) in Equation (1) and solving the resulting equation for S with initial condition S(0), we obtain Equation (7). Thus, β(t) is given by Formula (8), where I(t) and S(t) are given by Equations (6) and (7), respectively. Note that we need the prevalence at time point T, I(T), to compute β(t) using Formula (8). The time point T should be chosen close to the maximum of prevalence and not towards the start or end of the epidemic.
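The incidence-based estimate can be sketched for the simplest case of a single outbreak without demography (µ = 0, γ = 0, p = 1, constant α), where H(a, b) = e^{−α(b−a)}, I(t) = I(T)H(T, t) + ∫_T^t w(s)H(s, t) ds, and S(t) = S(0) − ∫_0^t w(s) ds. The function name, argument names, and these simplifying assumptions are illustrative only.

```python
import math


def beta_from_incidence(times, w, alpha, s0, k_ref, i_ref):
    """Estimate beta(t_k) = w / (S * I) from incidence proportions w[k].

    Assumes dS/dt = -w and dI/dt = w - alpha*I (single outbreak, mu = 0).
    k_ref is the index of the time T at which the prevalence i_ref = I(T)
    is known; s0 is the susceptible proportion at the first time point.
    """
    tau = [t - times[0] for t in times]
    g = [math.exp(alpha * x) * wi for x, wi in zip(tau, w)]
    G = [0.0]  # cumulative trapezoidal integral of e^{alpha s} w(s)
    W = [0.0]  # cumulative trapezoidal integral of w(s)
    for k in range(1, len(tau)):
        h = tau[k] - tau[k - 1]
        G.append(G[-1] + 0.5 * h * (g[k] + g[k - 1]))
        W.append(W[-1] + 0.5 * h * (w[k] + w[k - 1]))
    betas = []
    for k in range(len(tau)):
        # I(t) = e^{-alpha t} [ I(T) e^{alpha T} + int_T^t e^{alpha s} w(s) ds ]
        i_k = math.exp(-alpha * tau[k]) * (
            i_ref * math.exp(alpha * tau[k_ref]) + G[k] - G[k_ref]
        )
        s_k = s0 - W[k]
        betas.append(w[k] / (s_k * i_k))
    return betas
```

Going backward from T, numerical errors in the integral are amplified by the factor e^{α(T−t)}, which is one practical reason to choose T near the prevalence maximum rather than near either end of the epidemic.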

Time-Independent Estimation: Bayesian Analysis
The Bayesian Markov chain Monte Carlo (MCMC) approach can be used to quantify the uncertainty around the transmission rates and to compare our analytical estimates with it.
Let θ represent the vector of transmission parameters and let y = (y_1, y_2, ..., y_T)^T be the available data set. The likelihood function in our Bayesian approach is defined in terms of T, the total number of data points in the data set; σ, the appropriately chosen variance; and f(θ), the model output function for which data are used. If more than one data set is used, the likelihood is modified accordingly. Although a Bayesian approach can provide the uncertainty around a time-independent average transmission rate, it does not show how the transmission rate varied over time, and the uncertainty itself is constant over time. Therefore, while this approach assists in understanding uncertainty in disease progression, it does not address the challenge of capturing transmission rates that change over the course of an epidemic.
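The explicit likelihood is not reproduced in this text. A common choice that matches the ingredients described (T data points, a chosen variance, and a model output f(θ)) is an independent Gaussian error model, and the sketch below assumes that form. The function names are hypothetical, and treating several data sets as independent, so that their log-likelihoods add, is likewise an assumption.

```python
import math


def gaussian_log_likelihood(y, model_output, variance):
    """Log-likelihood of data y given model output f(theta) at the same times,
    assuming independent Gaussian errors with the given variance."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, model_output))
    return -0.5 * n * math.log(2.0 * math.pi * variance) - sse / (2.0 * variance)


def combined_log_likelihood(datasets):
    """Sum of log-likelihoods over independent data sets.

    Each entry of datasets is a tuple (y, model_output, variance).
    """
    return sum(gaussian_log_likelihood(y, f, v) for y, f, v in datasets)
```

An MCMC sampler would evaluate this function at each proposed θ after running the model to obtain f(θ); the sampler itself is omitted here.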

Results
We use four examples to show how to estimate β over time from the available epidemiological data. The examples provide a method to study social and public-health issues. To compute estimates of β(t), we use a first-order discretization for derivatives and the composite trapezoidal rule for integration. These discretizations are used in the formulas given in Equations (5) and (8).
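The two discretizations named above can be sketched as follows for data on a uniform grid with spacing h. The function names are illustrative, and the forward difference is one of several possible first-order choices.

```python
def forward_difference(values, h):
    """First-order approximation of the derivative at each point except the last:
    f'(t_k) ~ (f(t_{k+1}) - f(t_k)) / h."""
    return [(values[k + 1] - values[k]) / h for k in range(len(values) - 1)]


def composite_trapezoid(values, h):
    """Composite trapezoidal rule on a uniform grid:
    h * [ (f_0 + f_n) / 2 + f_1 + ... + f_{n-1} ]."""
    if len(values) < 2:
        return 0.0
    return h * (0.5 * (values[0] + values[-1]) + sum(values[1:-1]))


# Example: integrate x^2 on [0, 1] with h = 0.25 (exact value is 1/3).
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
approx = composite_trapezoid([x * x for x in grid], 0.25)
```

The trapezoidal error decreases as h², whereas the forward difference is only first-order accurate in h, so the derivative term tends to dominate the discretization error when data are sparse.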
We can avoid this discretization by choosing a function, for example a polynomial, that can be fitted to the temporal prevalence and incidence data. This fitted function can then be used directly in Equations (5) and (8). The additional demographic and epidemiological data required for the β(t) estimation, in both the incidence and the prevalence case, are the duration of the infectious period, the recruitment rate, the natural mortality rate, and the relapse rate.

Using Incidence Data
In this section, we apply available incidence data to three past epidemics: the Ebola outbreak in West Africa, the Visceral Leishmaniasis outbreak in Bihar, and the 2003 SARS outbreak in Hong Kong. For Ebola, case counts are taken from [21]. For these estimates, the reference time T for prevalence is taken as 31 May 2015, as this point is close to the maximum prevalence and not towards the start of the epidemic (see Section 2.1.2). Incidence is calculated by dividing these case counts by the 2016 population of each country, as reported by the United Nations (UN) [22]. We assume a constant recovery rate corresponding to an infectious period of 10 days (α(t) = α), a constant relapse rate corresponding to a period of 10 years (γ(t) = γ), no vertical transmission (p = 1), and a constant population (b(t) = µ(t) = µ = 0); since the CDC data provide monthly case counts, these parameters are converted to per-month rates. We estimate β(t) by simplifying Equation (6) to obtain Equation (10). Discretizing Equation (10) yields piecewise expressions (the case t ≤ T is treated separately), where g_1(x) = e^{m_1 x} w(x) and m_1 = α. The estimates of β(t) obtained from the available incidence data are given in Table A1 (see Appendix A; Figure 3h). Comparing the results for each region, we find the largest temporal estimate, for both the mean and the median β(t), to be that of Guinea (see Table 4 and Figure 4). Analyzing the transmission rate estimates over time, we observe that the transmission rate follows the incidence pattern, reflecting the exponential increase at the beginning of the epidemic as well as the impacts of disease-acquired immunity and of the non-pharmaceutical interventions implemented over the course of the epidemic (Figure 3).

Visceral Leishmaniasis (VL) is a vector-borne infectious disease that is spread from person to person by the bite of a tiny insect, the sandfly. A large population suffers from VL in some tropical and subtropical countries of the world. The highest burden of VL is found in the Indian state of Bihar. We obtained underreporting-adjusted 2005 incidence data for Bihar from [7]. The data contain the number of new cases during each month, adjusted for underreporting.
Expression (13) is used to estimate β(t) via two different models. The first model is for a single outbreak, and hence demography is not considered, whereas the second model assumes births and deaths with the same per-capita rate.
The discretized expressions again treat the case t ≤ T separately. Since the annual epidemic during 2005 started showing a clear decaying trend in October, we took this time to compute the prevalence of VL in Bihar. The prevalence during October 2005 was computed under the assumption that 25% of worldwide leishmaniasis prevalence is from VL cases, whereas the remainder is from other forms of Leishmaniasis. It was also assumed that 20% of the global burden is in Bihar. Since some proportion of a population is naturally immune to the disease, we carried out the estimation for three different values of the initial proportion of susceptibles, namely 0.1, 0.5, and 0.8. A recovery rate of 0.21 per month and a population influx/outflux rate of 0.00138 were computed using data from [7]. The other assumptions of the model include constant recovery (i.e., α(t) = α), no vertical transmission (i.e., p = 1), permanent recovery (i.e., γ(t) = 0), and equal constant per-capita incoming and outgoing rates (i.e., b(t) = µ(t) = µ). We model only the human population and do not take the vector population into account explicitly. Thus, β(t) could be interpreted as the vectorial capacity of the sandfly population transmitting infection between humans.
The obtained estimates of β(t) are given in Table A2 (see Appendix A).

2003 SARS Outbreak in Hong Kong
Severe acute respiratory syndrome (SARS) is a viral respiratory illness caused by a coronavirus. The SARS epidemic in Hong Kong is shown in Figure 7a. We estimated the transmission coefficient using a single-outbreak model with the parameter values given in Table A3. The formula used for estimating β(t) is Equation (15). Discretizing Equation (15) gives separate expressions for t ≤ T and for t > T. For t > T, the expression involves

b_6 = I(T)\,e^{\alpha(T-t)} + h\,e^{-t\alpha}\left[\frac{g_4(t) + g_4(T)}{2} + \sum_{k=1}^{n-1} g_4(kh)\right] \qquad (19)

where g_4(x) = e^{m_4 x} w(x) and m_4 = α.
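Expression (19) is a composite trapezoidal approximation of I(t) = I(T)e^{−α(t−T)} + e^{−αt} ∫_T^t e^{αs} w(s) ds for t > T. The sketch below evaluates it with n equal subintervals of width h = (t − T)/n; placing the interior nodes at T + kh is an assumption made here so that the rule covers the interval [T, t], and the function and argument names are illustrative.

```python
import math


def prevalence_after_T(w, alpha, T, t, i_T, n):
    """Approximate I(t) for t > T from an incidence function w(x).

    Uses the composite trapezoidal rule on g(x) = e^{alpha x} w(x) over [T, t]
    with n equal subintervals, given the known prevalence i_T = I(T).
    """
    h = (t - T) / n

    def g(x):
        return math.exp(alpha * x) * w(x)

    inner = 0.5 * (g(t) + g(T)) + sum(g(T + k * h) for k in range(1, n))
    return i_T * math.exp(alpha * (T - t)) + h * math.exp(-alpha * t) * inner


# Hypothetical check with constant incidence, where I(t) is known in closed form.
approx_I = prevalence_after_T(lambda x: 0.02, alpha=0.5, T=1.0, t=3.0, i_T=0.1, n=200)
```

For constant incidence w₀ the exact value is I(T)e^{−α(t−T)} + w₀(1 − e^{−α(t−T)})/α, which gives a simple way to check an implementation before applying it to case data.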
The temporal estimates of β(t) are shown in Table A3 (see Appendix A) and Figure 7b.

Using Prevalence Data
We use US national college alcohol drinking and obesity data as examples in this section. In Appendix A, we also present a hypothetical example with synthetic prevalence data and a known time-varying transmission rate to illustrate the ability of our analytical expression to accurately capture the time-dependent transmission rate.

College Alcohol Drinking
The available alcohol drinking data represent prevalence (the proportion of cases at a certain time) and not incidence (new cases over a time period). This is because the data are based on a survey in which drinking-pattern estimates are obtained by asking individuals about their drinking behavior during the past year. Hence, the data can be interpreted as the number of individuals in a certain drinking category at a particular time. Therefore, we use the formula given in Equation (5) to estimate β(t). We assume that drinking is a result of social influences exerted by drinkers (I) on susceptibles (S) or social drinkers. Individuals recover from drinking at a constant rate α (i.e., α(t) = α). The recovery is assumed to be permanent (i.e., γ(t) = 0). The incoming and departure rates are the same (i.e., µ(t) = b(t) = µ) and p = 1. These assumptions are reasonable in the context of the type of data (college population) used here.
Alcohol drinking data, obtained from Engs et al., 1997 and 1999, are given in Table A4 [12,13] and represent the trend observed in national college drinking surveys. The recovery rate, α, is taken to be 0.17 [4]. We estimate β(t) using Equation (5), simplified under the above assumptions. If µ = 0, the equation reduces further, with f_2(x) = −αI(x). We found that the mean estimate of β is 1.04 (std = 0.3; see Table A4 in Appendix A and Figure 8) during 1982-1994 for national college drinkers. The estimates of β are comparable to the estimates obtained in [4]. These estimates of β(t) are all contained in the 95% CIs of the estimates in [4], which are β₀ = 1.69 (95% CI [0.63, 2.75]) and β₂ = 0.75 (95% CI [0.29, 1.21]). Engs et al., 1994 and 1997, suggest that 65% of freshmen are drinkers at the start of the Fall semester. Hence, we assumed that a proportion 0.65 of incoming students are drinkers, i.e., p = 0.35. We assumed a negligible change in the size of the college population and took the rate of enrollment to be equal to the combined rate of graduation and dropout (i.e., b(t) = µ(t) = µ).

Obesity Epidemic in US
We use the model to examine whether weight gain in one person is associated with weight gain in his or her family members and friends. An obese person is an individual whose body-mass index (the weight in kilograms divided by the square of the height in meters) is greater than or equal to 30. It has been found that the number of obese persons in a community has been increasing and that a person's chances of becoming obese increase dramatically if he or she has a parent, sibling, friend, or spouse who became obese in a given interval [24]. The most reasonable explanations for the obesity epidemic include changes in the way luxuries and food consumption are promoted in society, and the epidemic has not spared any socioeconomic class. Obesity is a result of an individual's choices and behavior, which are influenced by the appearance and behavior of others in the community. This suggests that, just as with the spread of drug use or infectious diseases, weight gain in one person might influence weight gain in another person, i.e., it is not simply that obese or non-obese people find other similar people to spend time with. This influence could be direct or indirect, can vary continuously over time, and may also depend on demographic and social factors of the community.
We used annual CDC data from references [14,18] to estimate parameters for our obesity epidemic model. The data obtained from [25] include the age-adjusted prevalence of obesity in the US using the projected 2000 U.S. population. The model assumes a constant population and hence b(t) = µ(t) = µ. It is assumed that 6% of children are born obese [14]. The recovery rate is assumed to be equal to the average of the rate at which overweight individuals go on a diet (4.068 × 10⁻³ per week [18]) and the rate at which an obese individual stops or reduces consumption of bakery products, fried meals, and soft drinks (4.4379 × 10⁻³ per week [18]). We assume obesity reduces life span by 6 to 7 years. Hence, if the average life span in the US is 78.4 years, then the average life span of the population at risk for obesity is (78.4 − 6.5) years. The estimated β from [18] ranges from 0.02 to 0.04. These estimates are much lower than our estimated values in Table A5 (see Appendix A), which range over (0.36, 3.02) (Figure 9). This is because the region of our study differs from the region modeled by [18]. Our results suggest that estimates of the transmission coefficient increase with an increase in µ and with a decrease in the initial size of the susceptible population, S(0).
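The parameter arithmetic described above can be written out explicitly. The conversion to annual units (52 weeks per year) and the use of the reciprocal of the life span as the per-capita departure rate µ are assumptions made for this illustration; the text does not state which time unit or which formula for µ was used in the estimation.

```python
WEEKS_PER_YEAR = 52.0  # assumed conversion factor

diet_rate = 4.068e-3          # per week, overweight individuals going on a diet [18]
food_change_rate = 4.4379e-3  # per week, obese individuals reducing consumption [18]

# Recovery rate: average of the two weekly rates, then converted to per year.
alpha_per_week = 0.5 * (diet_rate + food_change_rate)
alpha_per_year = alpha_per_week * WEEKS_PER_YEAR

# Average life span of the at-risk population, using a 6.5-year reduction.
life_span_years = 78.4 - 6.5
mu_per_year = 1.0 / life_span_years  # assumed: departure rate = 1 / life span
```

Under these assumptions the recovery rate is about 4.25 × 10⁻³ per week (about 0.22 per year) and µ is about 0.0139 per year.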

Estimation of Time-Dependent Transmission Coefficient Using Synthetic Prevalence and Incidence Data
We demonstrate our method of estimating the time-dependent transmission coefficient using synthetic prevalence and incidence data generated with two choices of transmission coefficient and the model given by Equations (1) and (2), with the remaining parameters given in Table 5.

Table 5. Parameters for generating synthetic prevalence data.

Parameter   p   b(t)      γ(t)   µ(t)      α(t)
Value       1   33/1000   0      33/1000   8

In the first case, we assumed the transmission coefficient to be constant over time, and in the second case, we considered a transmission coefficient that is seasonal. They are given by
1. a constant β(t);
2. β(t) = 20(1 − ε cos 2πt), with ε = 0.1.
We generated daily prevalence and incidence data for two years and estimated the monthly transmission coefficient using Equations (5) and (8), respectively. We used MATLAB's 'pchip' function to interpolate the synthetic prevalence and incidence data in the formulation and integrated using the 'integral' function. The monthly estimates of the time-dependent transmission coefficient were reasonably accurate and close to the true values of the transmission coefficients used to generate the prevalence and incidence data, both when the transmission coefficient was constant and when it was periodic (Figure 10). As birth rates, mortality rates, and recovery rates are often considered constant in models, we used constant terms for these variables to limit simulation time. However, if time-dependent information on these variables is available, it can easily be incorporated and simulated.
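A rough re-creation of the seasonal experiment is sketched below in plain Python. It uses the Table 5 parameters with a seasonal β(t) of amplitude 0.1, time measured in years, and a fourth-order Runge-Kutta integrator with a daily step. The initial conditions S(0) = 0.5 and I(0) = 0.001 are hypothetical, and finite differences with the trapezoidal rule stand in for the MATLAB 'pchip' and 'integral' functions, so this is not the original computation.

```python
import math

ALPHA, MU, EPS = 8.0, 33.0 / 1000.0, 0.1  # recovery, birth/death, seasonal amplitude


def true_beta(t):
    return 20.0 * (1.0 - EPS * math.cos(2.0 * math.pi * t))


def rhs(t, s, i):
    new = true_beta(t) * s * i  # incidence
    return MU - new - MU * s, new - (ALPHA + MU) * i


def simulate(s, i, h, steps):
    """Daily synthetic prevalence from a classical RK4 integration."""
    times, prev = [0.0], [i]
    for k in range(steps):
        t = k * h
        a = rhs(t, s, i)
        b = rhs(t + 0.5 * h, s + 0.5 * h * a[0], i + 0.5 * h * a[1])
        c = rhs(t + 0.5 * h, s + 0.5 * h * b[0], i + 0.5 * h * b[1])
        d = rhs(t + h, s + h * c[0], i + h * c[1])
        s += h * (a[0] + 2 * b[0] + 2 * c[0] + d[0]) / 6
        i += h * (a[1] + 2 * b[1] + 2 * c[1] + d[1]) / 6
        times.append((k + 1) * h)
        prev.append(i)
    return times, prev


def estimate_beta(times, prev, s0):
    """Prevalence-based estimate: beta = (I' + (alpha + mu) I) / (S I)."""
    n = len(times)
    f = [math.exp(MU * t) * (MU - ALPHA * i) for t, i in zip(times, prev)]
    integral, out = 0.0, []
    for k in range(n):
        if k > 0:
            integral += 0.5 * (times[k] - times[k - 1]) * (f[k] + f[k - 1])
        s_k = math.exp(-MU * times[k]) * (s0 + prev[0] + integral) - prev[k]
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        d_prev = (prev[hi] - prev[lo]) / (times[hi] - times[lo])
        out.append((d_prev + (ALPHA + MU) * prev[k]) / (s_k * prev[k]))
    return out


S0, I0 = 0.5, 0.001  # hypothetical initial proportions
times, prev = simulate(S0, I0, 1.0 / 365.0, 730)
estimates = estimate_beta(times, prev, S0)
monthly = estimates[::30]  # roughly one estimate per month
max_error = max(abs(e - true_beta(t)) for t, e in zip(times, estimates))
```

Comparing `estimates` with `true_beta` at the same times gives a direct check of how well the prevalence-based formula recovers a known seasonal coefficient under these assumptions.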

Discussion
Compartmental models have provided valuable insights into the epidemiology of many infectious diseases. The transmission coefficient, a product of the contact rate and the probability of transmission given a contact, is a parameter in the compartmental model that naturally varies over time. This coefficient has the greatest effect on predictions of the dynamics of a disease or social problem and is difficult to estimate. However, due to a lack of detailed data as well as the complexities involved in numerically estimating this parameter, most studies estimate it as a time-independent parameter, averaging it over the course of the epidemic. In this study, we present a method to estimate the time-dependent transmission rate using two types of data commonly reported during infectious disease outbreaks: the time series of the number of infectives (or prevalence) and the number of new cases generated during a period of time (or incidence). By deriving an analytical method that uses a standard deterministic model and these data sets to directly estimate β(t), this approach avoids the computational challenges often involved with more complex models. By applying our approach to several infectious diseases, we illustrate the applicability of our methods in various contexts. Moreover, similar approaches can be applied with any appropriate mathematical model to derive the time-dependent transmission rate for diseases whose dynamics may need to incorporate other factors such as the environment (e.g., the role of water bodies in cholera spread) or vector dynamics (e.g., the impact of mosquitoes on dengue transmission).
The utility of the approach presented in this manuscript is demonstrated using several public-health problems, including Ebola, Visceral Leishmaniasis, US college alcohol drinking, and the obesity outbreak in the US. In particular, we computed temporal estimates of the transmission rate for Ebola during the 2014-2016 outbreak in West Africa (aggregated) as well as for the individual countries of Liberia, Guinea, and Sierra Leone. Our results, though limited by the accuracy of the data, demonstrated wide variability in transmission risks across the three countries. Moreover, we found that our temporal estimates of transmission risk followed the pattern of incidence closely, but with a slight delay, reflecting the substantial contribution of transmission risk to the nature of disease progression. During public-health emergencies caused by infectious disease outbreaks, such as the Ebola outbreak in West Africa or the ongoing COVID-19 pandemic, effective reproductive numbers are often estimated using incidence data to understand the progression of disease and inform strategies to curb transmission. Although estimates of effective reproductive numbers are useful, combining them with estimates of time-varying transmission risk obtained through our approach can be more informative for public-health decision-making. Transmission risk at a particular time is a product of contacts and the probability of transmission. Thus, it can be used to make short-term predictions about new infections, and it can indicate how much of a reduction in contact patterns or in risk of transmission (through masks, vaccination, or hygiene) would reduce the transmission parameter sufficiently to reverse the trend of an epidemic.
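The last point can be illustrated with a short calculation. In a model of the form dI/dt = βSI − (α + µ)I (no vertical transmission, which is an assumption of this sketch), prevalence declines when βS < α + µ, so the fractional reduction in β needed to reverse the trend is 1 − (α + µ)/(βS). The function names and numerical values below are hypothetical.

```python
def required_reduction(beta, S, alpha, mu=0.0):
    """Fractional reduction in beta needed so that beta*S falls to alpha + mu.

    Returns 0.0 if prevalence is already non-increasing.
    """
    threshold = (alpha + mu) / S  # largest beta for which dI/dt <= 0
    if beta <= threshold:
        return 0.0
    return 1.0 - threshold / beta


def projected_new_infections(beta, S, I, dt):
    """Short-term projection of new infections over a small interval dt."""
    return beta * S * I * dt


# Hypothetical values: beta = 0.5, S = 0.8, alpha = 0.2, no demography.
reduction = required_reduction(0.5, 0.8, 0.2)          # half of current transmission
new_cases = projected_new_infections(0.5, 0.8, 0.1, 1.0)
```

Because S changes over time, the required reduction is itself time-dependent and would need to be recomputed as new estimates of β(t) and S(t) become available.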
In the current study, we used a simple deterministic model along with simple numerical integration techniques to show how commonly reported data (incidence and prevalence) can be used to inform temporal transmission risk, and thus to manage public-health challenges more effectively. Practical application of our approach would improve with the use of appropriate, more complex models as well as more sophisticated integration techniques. Moreover, the analytical derivation can be used to understand the impact of changes in any other input parameter (such as shorter or longer quarantine periods) on transmission risk in a straightforward way. Similarly, future research can expand the presented framework to understand how incomplete data may alter the quality of parameter estimation. The value of the analysis reported here is therefore as a starting point for future research that builds on the current approach to develop computational models that can inform policies swiftly during public-health emergencies. We believe that our methods can provide a good approximation of time-dependent transmission coefficients and that the quality of the approximation should increase with the use of more sophisticated modeling techniques.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Table A5. Age-adjusted prevalence of obesity in the US using the projected 2000 U.S. population [25].