Modeling and Controlling Epidemic Outbreaks: The Role of Population Size, Model Heterogeneity and Fast Response in the Case of Measles

: Modelers typically use detailed simulation models and vary the fraction vaccinated to study outbreak control. However, there is currently no guidance for modelers on how much detail (i.e., heterogeneity) is necessary and how large a population to simulate. We provide theoretical and numerical guidance for those decisions and also analyze the beneﬁt of a faster public health response through a stochastic simulation model in the case of measles in the United States. Theoretically, we prove that the outbreak size converges as the simulation population increases and that the outbreaks are slightly larger with a heterogeneous community structure. We ﬁnd that the simulated outbreak size is not sensitive to the size of the simulated population beyond a certain size. We also observe that in case of an outbreak, a faster public health response provides beneﬁts similar to increased vaccination. Insights from this study can inform the control and elimination measures of the ongoing coronavirus disease (COVID-19) as measles has shown to have a similar structure to COVID-19.


Introduction
The ongoing novel coronavirus disease 2019 (COVID- 19) pandemic has shown the importance of efforts to control outbreaks before they become a pandemic. A potential COVID-19 vaccine provides hope for containing the pandemic. Public health officials need to plan for the implementation of vaccination programs while also considering the vaccine hesitancy (necessity, safety and freedom of choice) [1]. Insights from other well-known diseases, especially ones with a similar epidemic progression structure, can help public officials in decision making. Measles is one such example. Due to the structural and progression similarities, researchers have suggested understanding and utilization of origin, models and vaccines of measles for controlling the ongoing COVID-19 pandemic [2][3][4][5][6][7][8][9][10][11][12]. Researchers have shown that there are similarities between the protein S in COVID-19 with protein F in measles [3,4]. Many have suggested using the measles, mumps and rubella (MMR) vaccine as an intervention [5][6][7][8][9][10][11]. They develop epidemic models following the same standard structure as measles where the population is divided into susceptibles, exposed, infectious and recovered groups [12]. Research suggests that the occurrence of COVID-19 might follow a similar pattern to recent measles outbreaks in the United States [13]. For this reason, in this paper we study the model and control measures of epidemic outbreaks in the case of measles to transfer the knowledge to  Vaccination coverage gaps have resulted in spikes in epidemic outbreaks of preventable infectious diseases among which measles has become the most alarming worldwide [14]. Measles is a highly rates of religious or philosophical exemptions to vaccination requirements [30,31] make it important to consider alternative interventions. However, studies of other interventions such as better case detection or a faster public health response are missing. Our study includes three intervention variables, which are the vaccinated fraction, the probability of detecting and reporting a case and the number of days' delay in the public health response. Our numerical simulations show that a faster public health response can be just as important as increasing vaccination coverage.

Materials and Methods
We built a model to determine the size of the outbreak, when it was detected and how various interventions or model characteristics affected it. We modeled the progression of measles using an SEIR (Susceptible-Exposed-Infectious-Recovered) model with a 10-day exposed (or incubation) period when susceptible individuals came into contact with the infection but were not able to spread it to others and a subsequent eight day infectious period when they were able to spread the infection to other susceptible individuals [51,52]. We modeled measles transmission using a stochastic simulation model. Every day there was a chance that the outbreak could be detected and contained shortly thereafter with a vaccination program (or closure of the school). At the start of the simulation there was a single infectious individual and we focused on the final size C of the outbreak. We assumed a constant population of size n, of which a fraction v was unvaccinated. We examined the effect of community structure on measles transmission and how people connected and communicated. To study the extremes of community structure we compared the case of homogenous mixing (where everybody was part of the same community group) and the case where the population was divided into two equal-sized community groups. In the latter case, an individual was more likely to infect another individual in the same community group than one in the other group, where this relative likelihood was the mixing parameter, ϕ. This was similar to the non-random mixing from Glasser et al. [53] where the inter-school contacts and disease specific infection probabilities decreased based on distance in the case of measles. When ϕ = 0 there was no mixing between the two community groups and when ϕ = 1 there was no preferential mixing between the groups (there was effectively only one big community group). However, we kept the basic reproductive number, R 0 , the same in all cases. We also considered the less stylized case of transmission between two schools, a smaller elementary school of 500 students and a high school with 2000 students.
We assumed each infectious individual had a daily probability q of being detected and reported. Throughout the paper we use detecting and reporting a case interchangeably. Note that sometimes the outbreak was never detected due to underreporting [54] and not detecting the imported cases [55]. Under homogenous mixing (a single community group), the simulation was halted days after the first case was detected and reported. With two community groups, we assumed that public health officials separately stopped the spread in each group. Specifically, the spread in each group was halted days after the first detected case in that group and σ = 5 days after the first detected case in the other group, whichever occurred first.

Model
Here we describe the equations that describe our model. Below is also a table of notation. We start by describing the equations for the case with a homogenous population. We let n be the size of our population and v the fraction unvaccinated. Thus, before the outbreak there were vn susceptible individuals in the population. On day t, this population was divided into S(t) susceptible individuals, E k (t) individuals in day k ∈ {1, 2, . . . , 10} of the incubation period, I k (t) individuals in day k ∈ {1, 2, . . . , 8} of the infectious period and R(t) individuals who had recovered from an infection. Note that S(t) + 10 k=1 E k (t) + 8 k=1 I k (t) + R(t) = vn. We denoted the total number of infectious individuals on day t as T(t). On day 1, we had exactly one infected individual who was in the first day of the incubation period. The system dynamics are then given by Equations (1)- (9), where N(t) was the number of new infections on day t. (1) Equations (10) and (11) describe the number of new infections N(t) on day t. These were distributed binomially with each susceptible individual on day t − 1 having a probability p(t) of being infected. We calculated p(t) as the total number of infectious individuals, T(t − 1) times the basic reproductive number R 0 and divided by the length of the infectious period, 8, and the size of the population. This way, the first infectious individual created R 0 secondary cases in expectation if the entire population was susceptible.
Equations (12)- (14) describe the final size of the outbreak, C. Here, Z(t) was the number of cases reported on day t. We then found the time, t, which gave us the first reported case of the outbreak (min t : Z(t) ≥ 1 ). Thus, on day τ, ε days after the first reported case, public health authorities stopped the spread of the outbreak by coordinating various activities such as press releases about the index case, isolation, school closures, polymerase chain reaction (PCR) tests and postexposure prophylaxis (PEP) administration [56]. The coordination of these activities generally takes time [56], thus we used a lag term, ε, to account for the delay. The lesser the value of ε meant the faster public health response. The outbreak size C was then the number of infected and removed individuals on that day. The number of reported cases per day, Z(t), was distributed binomially with each infectious individual having a probability q of being detected and reported each day.
Now we describe the small modifications to the model we needed to make for the case where the population was divided into two subgroups. These are summarized in Equations (15)- (22). We let n x , S x , E x k , I x k , R x , T x , N x , Z x , τ x , C x and p x be the respective variables for subgroup x = 1 or x = 2. Initially, the only infected individual was in subgroup 1 and was in the first day of the incubation period. Recall that the mixing parameter ϕ was the ratio of the probability of infecting a susceptible in the other subgroup to the probability of infecting a susceptible in the same subgroup. Thus, we adjusted the infection probabilities p x (t) by considering both infections from the same subgroup and the other subgroup. In addition, we not only had a public health intervention stopping the spread in a subgroup ε days after the first reported case in that subgroup but also σ days after the first reported case in the other subgroup. Table 1 below summarizes the notation used in this manuscript.
Susceptibles, S(t) Individuals in day k of the incubation period, E k (t) Individuals in day k of the infectious period, I k (t) Recovered individuals, R(t) Total number of infectious individuals, T(t) New infections, N(t) Probability a susceptible individual is infected, p(t) Basic reproduction number, R 0 Number of reported individuals, Z(t) Probability an infectious case is reported each day, q Lag between the first reported case in same subgroup and stopping, ε Stopping time, τ Mixing parameter, ϕ Lag between the first reported case in the other subgroup and stopping, σ

Simulations
We conducted numerical experiments to verify our analytical results and provide additional insights. For our simulations, we considered a stylized case with a focus mostly in the U.S. We utilized the average elementary school size for the subgroup size to obtain the total population size [57]. By using the average vaccination level of 90%, we obtained the fraction unvaccinated [58]. Despite being a mandatory reported disease, public health authorities still struggle vastly with underreporting. We gathered daily reporting probabilities based on studies about the underreporting of measles in other developed countries [54,59,60]. We found the delay in the reporting and the public health action from two case studies of measles in Colorado, U.S. [56]. For the baseline of the mixing parameter we chose 5, the middle value of the range, which was [0, 1] and for the lag between the first reported case in the other subgroup and stopping, we chose five days. Table 2 presents the summary of the parameter values.  [54,59,60] Lag between the first reported case in the subgroup stopping, ε (days) 3 [56] Lag between the first reported case in the other subgroup and stopping, σ (days) 5 We tracked the number of infected cases, C, and the number of infected cases in the second community group, the one that was initially not infected, C 2 , when we assumed two community groups. Specifically, we examined the expected size of the outbreak, E[C], the probability the outbreak was at least five, P [C ≥ 5] and the probability that the outbreak spread beyond the initial community group, P [C 2 ≥ 1]. In the post elimination era, the median number of outbreak cases is five; thus, five was the threshold outbreak size we used in our metric [22].
We also examined the impact of public health interventions such as vaccinated fraction of the population, daily reporting probability and faster public health response. We varied the vaccination coverage using the different vaccination rates across states [61]. The values of daily reporting probability was changed based on the range in different underreporting studies [54,59,60]. In order to measure the impact of faster response, we investigated results by reducing the delay term from 3 days to 0 days. During the sensitivity analysis, all parameters not varied in a figure or table were at their baseline value as given in Table 2.
We simulated several scenarios to conduct a sensitivity analysis for the less stylized case of transmission between two schools, a smaller elementary school of 500 students and a high school with 2000 students. We assumed that initially there was a single infected case in the elementary school as measles cases were more likely to appear in the elementary school [62].

Theoretical Results
In this section, we analyze the impact of population size on the model and the model heterogeneity. For the population size, we show that the epidemic outcome measures converged to finite limits as the population size increased. For the model heterogeneity, we show that the outbreaks were larger and longer in the inhomogeneous case.
Recall that T = T(t, n) denoted the infectious population and N(t, n) the number of new infections, where we may drop the dependence on t and n for convenience. We let C(t, n) denote the size of the outbreak at time t, the number who were infected or recovered at this time. We let Y(i) denote whether individual i became infected at time t so that the number of new infections at time t was N = nv−C i=1 Y(i). For given T, the Y(i) was independent and identically distributed with Y(1) ∼ Bernoulli(Tβ/n), where β was the effective contact rate.

Convergence as Population Size Increased
Our goal in this subsection is to prove for the homogenous case that the infection process converged to a finite limit as the population size became large. For notational convenience, let C = C(t + 1, n) and note that C(1, n) = 1 and C = C + N. Theorem 1. For all t, T(t, n) N(t, n) and C(t, n) converge in distribution as n → ∞.

Proof.
Note that log (1 (1)). Since this is a product of limits, it converges to vw(∞). The claim follows from the continuity of exp().

Proof.
The characteristic function of a random variable X is E e isX as a function of s.
Let ϕ n (s) = E e isN(t,n) be the characteristic function of N(t, n). Define the characteristic function of Poisson(T(t, ∞)βv) as where we use the fact that the characteristic function of Poisson(λ) is e (λ(e is −1)) . By Levy's continuity theorem, it suffices to show for all s > 0, that ϕ n (s) → ϕ(s) as n → ∞ .
Note that e isN = nv−C i=1 e isY(i) a.s. Thus, where we use the fact that the characteristic function of Bernoulli(p) is (1 − p + pe is ). By Lemmas 1 and 2, C/n → 0 in distribution. We now construct a joint probability space where T(t, n) → T(t, ∞) a.s. and C(t, n)/n → 0 a.s. Then, by Lemma 3, E e isN T, C → e (T(t,∞)βv(e is −1)) a.s. Since e isN is bounded, we can apply the bounded convergence theorem to take the expectation of both sides and prove the claim.
Proof of Theorem 1. Since C(1, n) = 1 and T and C are the sum of previous new infections, we can apply induction using Lemma 4.

Outbreaks Are Longer with Two Subgroups
We now turn our attention model heterogeneity. There were two reasons why heterogeneity increased the expected outbreak size. The first reason was when a community was divided into two subgroups: the outbreak lasted longer when the community was divided into two subgroups. This was because if a case was detected in one subgroup, public health officials did not intervene in both subgroups at the same time. Instead, they intervened later in the subgroup that did not detect the first case. In the homogenous case, the intervention when the first case was detected affected more people immediately (i.e., the entire population) than in the two subgroup case. To formalize this, let τ (1) = max τ 1 , τ 2 and τ (2) = min τ 1 , τ 2 . Also, define the case detection times in the homogenous case and the case with two subgroups, τ = min t : Z(t) ≥ 1 and τ x = min t : Z x (t) ≥ 1 , respectively. Theorem 2. Suppose that delay between detection and intervention is larger if the case is detected in the other subgroup, σ ≥ and that everything else, the total number of infections and detected cases is the same as in the homogenous case, Z 1 (t) + Z 2 (t) = Z(t) for all t. The stopping time in the homogenous case is then the same as in one of the subgroups but before the stopping time in the other subgroup, τ = τ (2) Proof. Note that τ = min τ 1 , τ 2 . Hence, τ = τ + = min τ 1 , τ 2 + = τ (2) . Obviously, τ (2) ≤ τ (1) . The last inequality, τ (1) ≤ τ + σ − follows from the definition of the stopping time (20) and (21).
Additional intuition for τ ≤ τ 1 , τ 2 (a consequence of τ = min τ 1 , τ 2 ) is the following. The number of person-days of infected individuals required until a case was detected is a geometric random variable with mean 1/q. In populations with more infected individuals (which larger populations tend to have), one had a higher probability of detecting a case and detection occurred more quickly.

Heterogeneity in Infection Risk
We now focus on the second way that model heterogeneity led to larger outbreaks: the variation in the infection risk between individuals in the case with two subgroups. We abstracted beyond two subgroups and considered the general case where the infection probability varied among individuals, Y(i) ∼ Bernoulli(p i ). To ensure a valid comparison, we used the average infection probability in the homogenous case, p * = 1 n n i=1 p i = Tβ/n, ensuring that E[N] = nv−C i=1 p i = Tβ(v − C/n) in any case. Formally, we again assumed that the Y(i) were independent given T. Our argument that outbreaks grew more slowly in the homogenous case has two parts (we show later that outbreaks were also more likely to fizzle out in the homogenous case). First, we will show that the variance in the number of new infections, Var[N], was maximized in the homogenous case. Second, we go on to show that increases in the variance of the number of infectious individuals, Var[T], decreased the expected number of new infections. While we prove both parts below, we were unable to combine them to argue about the outbreak size at any particular time, C(t), due to difficulty with the induction step.

Lemma 5. The homogenous case maximizes Var[N T, C]
.
where p * is the average infection probability as defined above. Thus, , proving the claim.

Lemma 6. Let C = T + R. Then, E[N R] decreases in Var[T|R].
Proof. By definition, Tβ n R and thus β n proving the claim.
In addition to the expected number of new infections being smaller in the homogenous case, we also showed that the probability of no new infections was larger in that case. This is particularly important at the beginning of an outbreak, which can fizzle out if the first few infected individuals do not manage to infect any others.

Proof.
The proof is similar to that of Lemma 5. Note that , proving the claim.

The Impact of Population Size
In this section, we not only confirm the convergence proof in Section 3.1 but also provide how large a population size is large enough to be close to converging. Figure 1 examines the effect of changing the population size on the average outbreak size and the probability of a large outbreak, one where five or more become infected. After increasing initially, both outcome measures varied little as the population increased beyond a thousand. This implies that there was no gain to simulating measles outbreaks with larger populations.

The Impact of Heterogeneity
In this section, we confirm that heterogeneity resulted in more infections and numerical results allowed us to determine whether the increase was significant. Figure 1 also shows how the average outbreak size and the probability of a large outbreak was greater for a community divided into two subgroups than a homogenous community while keeping the total population size fixed. Figure 2 focuses on this aspect of the community structure by comparing the distribution of the outbreak size in the two cases. It also shows that large outbreaks were slightly more frequent in the inhomogeneous case. There were multiple reasons including the difference in stopping times between subgroups and the heterogeneity of the force of infection.  Figure 3 further explores the community structure in the case of the two subgroups by varying the mixing parameter, which specified how separate the two subgroups were. The mixing parameter was the ratio of the likelihood that an individual infected someone in the other subgroup rather than someone in the same subgroup. When the mixing parameter was zero there was no mixing between subgroups. In that case, we focused on the subgroup with the initial infection (because it would never spread to the other subgroup) and viewed it as a single homogenous population of half the original size. This partly explains why the expected number of infected cases was lower with no mixing. When the mixing parameter was one, there was complete mixing. This was like a single homogenous population except that, as discussed in the previous figure, there was an additional lag until the outbreak was contained. Figure 3 shows that as the mixing parameter increased, the expected number of outbreak size seemed to slightly increase.

The Impact of Public Health Interventions
We now turn our sensitivity analysis to important parameters for public health interventions. Figure 4 focuses on the effect of changing the vaccinated fraction of the population. For each percentage point that the vaccinated fraction increased, the expected size of the outbreak decreased by approximately 0.35 and the probability of a large outbreak decreased by approximately 3.3 percentage points.
Two other important parameters for public health interventions were the daily probability of detecting an infected case and the lag between the first detected case and a public health response stopping the spread of the outbreak. Table 3 examines the sensitivity to these parameters for the average outbreak size (E[C]), the probability of a large outbreak with five or more infected (P[C ≥ 5]) and, for the case of a population with two subgroups, the probability the infection spreads from one subgroup to the other (P[C 2 ≥ 1]). While the benefits of increasing the detection probability were significant, we saw diminishing returns for greater increases. Decreasing the lag between detection and response also decreased the outcome measures (except for the probability of the second subgroup being infected, which changed little) but only by a small amount. By the time a case was detected, the outbreak had already spread to the other subgroup. Thus, a few days of delay did not make much difference and the change in P[C 2 ≥ 1] was even less.  Table 3. Additional sensitivity analysis. Here C 2 is the number of infected cases in the other subgroup (not the one with the initial infection). The number of replications was 1000. The standard error was less than 0.18 for E[C] and 0.02 for P [C ≥ 5] and P [C 2 ≥ 1].

Two Community Groups
One Community We simulated several scenarios to conduct a sensitivity analysis for the less stylized case of transmission between two schools, a smaller elementary school of 500 students and a high school with 2000 students. We assumed that initially there was a single infected case in the elementary school. In Table 4, we examine different choices for the fraction vaccinated, the mixing parameter, the daily probability of detecting a case and the lag between the first reported case and stopping the outbreak. The outcome measures were the same as in Table 1, with P[C 2 ≥ 1] now denoting the probability that the outbreak spread from the elementary school to the high school. The results had the same trend as those in the case of a population with two equal-sized subgroups we discussed previously. As the vaccinated fraction of the population, the mixing parameter or the response lag increased, the expected number of infected cases, the probability of having at least five infected cases and the probability of having at least one infected case in the high school when the public health authorities took action decreased. Table 4. Sensitivity analysis for two unequal-sized groups, an elementary school of 500 and a high school of 2000 individuals. The initial infection was in the elementary school and C 2 is the number of cases in the high school. There were 1000 replications. The standard error was less than 0. 23

Discussion
We presented a stochastic simulation model of a measles outbreak to investigate the impact of the simulated population size, model complexity and public health interventions. We studied this for both homogeneous mixing (i.e., a single community) and an inhomogeneous case where the population was divided into two community groups. We observed the final number of infected cases, the number of infected cases in the second community group where the initial number of infected people was zero and the probability of the infection spreading from the initially infected community group to the second community group. Our theoretical analysis showed that results converged to a finite limit as the population size increased and gave multiple reasons (from the way that outbreaks were stopped to the role of heterogeneity) why the expected outbreak size for a community divided into two subgroups was greater than in the case of a single homogenous community. We also performed a sensitivity analysis on various model parameters as well as suggesting public health interventions such as increasing the probability of reporting a case and decreasing the lag between detecting a case and the public health authority stopping transmission.
We found that the size of the modeled population ceased to matter after exceeding 1000 individuals (Figure 1), matching the theoretical result. This made sense because with 10% unvaccinated, the unvaccinated population was 100 individuals, which was large enough so that the stochastic effects of small numbers were small. For smaller populations, the outbreaks were smaller as they were limited by the population size. Overall, the two subgroup model had slightly larger outbreaks compared with the homogenous model (Figure 2), which also matched the theoretical results. Increasing the mixing parameter in the two subgroup model increased the average outbreak size but the effects were small (Figure 3). In addition, the probability of an outbreak spreading from one subgroup to another was around 50% and was somewhat insensitive to the parameters. Increasing the vaccinated fraction had a larger effect than increasing the probability of detecting a case, which in turn had a larger effect than decreasing the lag between detection and a public health response in terms of decreasing the size of the outbreak (Figure 4 and Table 3). The asymmetric case with a bigger high school and a smaller elementary school had similar results. Larger outbreaks (i.e., lower vaccination rates or smaller detection probabilities) were also associated with higher probabilities of the infection spreading from one subgroup to another (Table 4).
One limitation of our study was that we only examined two community structures (one with homogenous mixing and one with two subgroups). The hope is that these represent the extremes and other community structures will have outcomes between the two extremes. Our model did not capture the possibility that the unvaccinated individuals were all clustered around the index case in the social network; our model assumed that the unvaccinated were spread randomly within each subgroup. Social network data could evaluate the likelihood of this possibility. This could also be captured in our model by restricting the study population to the smaller set of social contacts of the index case (e.g., maybe 20 individuals with 40% vaccinated). We are not aware of such social network data. Another limitation was that there are no studies to justify a value for the case detection probability in our model. Such a study would be valuable especially because the true extent of measles outbreaks is uncertain (i.e., underreporting) [54] and because we found that increasing the detection probability would be an effective way of decreasing the size of measles outbreaks. In addition, we also did not consider details such as variability in the measles progression across different individuals. However, this was not the focus of our study.
The model and the parameters were mainly based on measles in the U.S. The characteristics of the model especially the values of the parameters can change based on applications to other diseases and geographies. For example, the average MMR vaccination coverage rate is around 90% in the U.S., thus the median number of outbreak size is very low (five people). It might be hard to attain these vaccination coverage numbers in other countries and for other diseases. As a result, the outbreak sizes might be larger.
There are three conclusions we can draw. First, unless the population of interest is actually small, modelers can save a lot of computational effort by limiting their simulated population to 1000 individuals with little cost in accuracy. Second, this is irrespective of the heterogeneity in the population. As heterogeneity increased, the expected results increased slightly. Thus, unless focusing on the impact of specific interactions, one can exclude complex heterogeneity items from models. This would also speed up validation, calibration and sensitivity analysis. Third, in contexts where it is difficult to raise the vaccination coverage (which are many), alternative interventions such as better detection of infections and a faster response to any detected cases should be considered.
COVID-19 has demonstrated the importance of transforming scientific knowledge on epidemic modeling into an effective response that saves as many lives as possible and reduces the financial and health consequences to society and the economy. Many researchers suggest the use of learnings from measles to contain the pandemic as viruses causing both diseases share same protein and have a similar epidemic progression. Hopefully, conclusions drawn from this study can inform policy makers as they make long term plans for the global COVID-19 pandemic. Policy makers and researchers have to consider adjustments. For example, measles is one of the most contagious diseases (with R 0 of 18). With a low median size of outbreak, the time it takes for the confirmed cases to be brought to public authorities (in other words, simulation length) is short. For diseases with lower R 0 such as COIVD-19, the simulations might need to take longer. Currently, a vaccination for COVID-19 does not exist so the results from this study can be more useful once the vaccination uptake is in further stages such as elimination. This study can also help when considering and evaluating alternative strategies to vaccination during the vaccination uptake. Additionally, insights from this study can help decision makers to deal with future outbreaks efficiently and effectively before they become pandemics.