Three-Stage Numerical Solution for Optimal Control of COVID-19

: In this paper, we present a three-stage algorithm for ﬁnding numerical solutions for optimal control problems. The algorithm ﬁrst performs an exhaustive search through a discrete set of widely dispersed solutions which are representative of large subregions of the search space; then, it uses the search results to initialize a Monte Carlo process that searches quasi-randomly for a best solution; then, it ﬁnally uses a Newton-type iteration to converge to a solution that satisﬁes mathematical conditions of local optimality. We demonstrate our methodology on an epidemiological model of the coronavirus disease with testing and distancing controls applied over a period of 180 days to two different subpopulations (low-risk and high-risk), where model parameters are chosen to ﬁt the city of Houston, Texas, USA. In order to enable the user to select his/her preferred trade-off between (number of deaths) and (herd immunity) outcomes, the objective function includes costs for deaths and non-immunity. Optimal strategies are estimated for a grid of (death cost) × (non-immunity cost) combinations, in order to obtain a Pareto curve that represents optimum trade-offs. The levels of the four controls for the different Pareto-optimal solutions over the 180-day period are visually represented and their characteristics discussed. Three different variants of the algorithm are run in order to determine the relative importance of the three stages in the optimization. Results from the three algorithm variants are fairly consistent, indicating that solutions are robust. Results also show that the Monte Carlo stage plays an especially prominent role in the optimization, but that all three stages of the process make signiﬁcant contributions towards ﬁnding lower-cost, more effective control strategies.


Introduction
Optimal control may be characterized as the determination of inputs to a dynamical system that optimize an objective functional while satisfying all constraints of the system [1]. In epidemiology, optimal control plays an important role in finding from among the available strategies the most effective strategy to reduce the infection rate to an acceptable level, while minimizing the cost of therapeutic or preventive measures that control the disease progression.
Two major mathematical results in the theory of optimal control are Pontryagin's maximum principle and the Arrow theorem. In some cases, these are sufficient to determine optimal solutions. For example, Pontryagin's principle is applied to a mathematical model of dengue transmission in [2] to determine optimal interventions. However, more realistic models often do not satisfy the hypotheses required for theoretical analysis. In such cases, various numerical methods are used to estimate optimal controls. For example, the authors in [3] investigate the measures against dengue fever by using mathematical modeling and numerical methods of optimal control. Optimal control theory and nonlinear programming Other related studies analyze the general control problem for COVID-19-based scenarios, without reference to any particular region. Mathematical models with nonpharmaceutical and pharmaceutical measures were formulated [13] where community awareness was considered. An investigation of the optimal control of epidemics where an effective vaccine is impossible and only non-pharmaceutical measures are practical is presented in [14] with the objective of minimizing the number of COVID-19 disease deaths by avoiding an overload of the intensive care treatment capacities and establishing herd immunity in the population in order to prevent a second outbreak of the epidemic, while keeping intervention costs at a minimum. A model which presumes two modes of virus transmission (direct contact and contamination of surfaces) is described in [15], and the effect of non-pharmaceutical interventions on the transmission dynamics is established. In [16], optimal control theory is applied to a COVID-19 transmission model where optimal control strategies are obtained by minimizing the exposed and infected population for a given implementation cost. By applying the Pontryagin maximum principle to characterize the optimal controls, the authors in [17] propose an optimal strategy by carrying out awareness campaigns for citizens with practical measures to slow down COVID-19 spread, diagnosis, including surveillance of airports and the quarantine of infected individuals.

COVID-19 Epidemic Model Formulation
A deterministic compartmental model was used to model the transmission dynamics of COVID-19. The model was previously used in [18], and is based on Yang et al. [6]. The model divides the entire population into two subpopulations characterized by low and high risk, respectively, of COVID-19 complications. Each subpopulation is further subdivided into the following compartments: susceptible(S), exposed (E), pre-symptomatic infectious (P Y ), pre-asymptomatic infectious (P A ), symptomatic infectious (I Y ), asymptomatic infectious (I A ), symptomatic infectious that are hospitalized (I H ), recovered (R), and deceased (D). Recovered individuals are assumed to have permanent immunity, and dead individuals are not infectious. The multicompartment model for each subpopulation is diagrammed in Figure 1, and the explicit equations are as follows: where j is the subpopulation index (0 = low risk, 1 = high risk), and N j = S j + E j + P A j + P Y j + I A j + I Y j + I H j + R j represent the subpopulation totals. The complicated nonlinear terms in the last two equations in System (1) reflect the additional mortality that occurs when the ventilator capacity (represented by the parameter θ) is exceeded. These terms are derived in [18]. The interpretation and numerical values of the model's parameters are listed in Table 1. At t = 0, all compartment populations are assumed to be nonnegative, and, for t ≥ 0, the model has non-negative solutions contained in the feasible region Γ = {S j , E j , P A j , P Y j , I A j , I Y j , I H j , R j , D j } ∈ R 18 + . Table 1. Baseline parameters used in the model [6].

Parameters Interpretation Values
β baseline transmission rate 0.0640 γ A recovery rate on asymptomatic compartment Equal to γ Y γ Y recovery rate on symptomatic non-treated compartment 1 γ Y = 4.0 τ symptomatic proportion 0.55 proportion of pre-symptomatic transmission 0.44 ω Y relative infectiousness of symptomatic individuals 1.0 ω P relative infectiousness of pre-symptomatic individuals Hospitals' capacity to treat seriously ill patients 3000 [19] 1/r number of deaths from the people who are put on respirators 1/3

COVID-19 Epidemic Model Formulation under Controls
In order to study disease mitigation, we introduce the effects of two controls: testing and social distancing. For our purposes, 'social distancing' refers to all measures designed to reduce potentially infectious social contact, including wearing masks, maintaining distance, additional cleaning and sanitizing, erecting barriers, limiting building and room occupancy, and closure of offices, stores, schools, etc. Social distancing reduces the overall infectivity, while COVID testing reduces the infectivity of the asymptomatic and presymptomatic infectious compartments. The model with controls is identical to (1), except the first two equations are modified as follows: where u i , v i denote respectively the levels of testing and distancing control for the two population subgroups (0 where Φ ji represents the mean number of contacts per day experienced by individuals in group i from individuals of group j. The values of the matrix (3) were obtained by averaging the contacts between low and high risk individuals over all age groups in the model [6].

Theoretical Results in Optimal Control
An optimal control problem comprises a cost function J[x(t), u(t)], a set of variables state x(t) ∈ X, a set of state controls u(t) ∈ U and all depending on t, where t 0 ≤ t ≤ t f , and the target is to find control u(t) and the corresponding state variable x(t) to maximize or minimize an objective functional. In the Lagrange formulation, the objective functional can be defined as follows: where the functions f and g are continuously differentiable. The control u(t) is piecewise continuous, and the state x(t) is piecewise differentiable. We give two important theorems in the study of the optimal control theories. The first theorem gives necessary conditions for an optimal control and corresponding optimal solution. These conditions are necessary to guarantee local optimality but are not sufficient to ensure global optimality. Theorem 1. (Pontryagin's maximum principle) [20].
If u * (t) and x * (t) are optimal for the problem then there exists a piecewise differential adjoint function λ(t) such that for all controls u at each time t, where the Hamiltonian H is given by and where the functions f and g are continuously differentiable, and x(t) is piecewise differentiable.
The proof of Theorem 1 may be found in [21].
The following theorem gives sufficient conditions for global optimality of a control, for control problems that satisfy certain conditions. Theorem 2. (The Arrow theorem) [22].
For the optimal control problem (5), the conditions of the maximum principle are sufficient for the global minimization of J[x(t), u(t)] if the maximized Hamiltonian function H, defined in (7), is convex in the variable x for all t in the time interval [t 0 , t f ] for the given λ.
The proof of Theorem 2 may be found in [23,24]. Both Pontryagin's and Arrow's theorem have conditions of applicability in order to establish local and global optimality, respectively. When these conditions do not hold, numerical methods may be used to approximate optimal solutions. Such methods usually involve performing both a wide-ranging search through the solution space, as well as a process of local convergence when a region of promising solutions is located. Simulated annealing and other Monte Carlo algorithms are prominent examples of such methods. However, when the dimensionality of the solution space is very high, Monte Carlo methods can be very computationally expensive. Although Monte Carlo methods perform the dual task of global search and local convergence, they are not really optimized for either task separately. Consequently, it is reasonable to include pre-and post-processing algorithms that assist the Monte Carlo in the tasks of global search and local convergence, respectively. The pre-processing algorithm makes use of prior knowledge to perform a discrete coarse sampling from the set of likely solutions, while the post-processing algorithm takes the Monte Carlo output and adjusts it so that it satisfies mathematical conditions for local optimality. The three-stage numerical algorithm proposed in this paper is based on this conception.

Objective Function
In this section, we develop an objective function that models the cost of COVID-19 infection on a human population. Our algorithm is designed to find controls that minimize this objective function, given the disease dynamics obeys Systems (1) and (2).
There are several considerations in selecting an optimal control strategy. Of course, the strategy should reduce deaths as much as possible-however, this goal must be balanced against the cost of implementation. Furthermore, it may be desirable to attain a certain level of herd immunity by reducing the number of non-immune individuals in the population, in order to avoid recurrence of the epidemic. Achieving this latter goal requires that a certain proportion of the population contracts the disease, thus incurring the risk of deaths. It follows that the goals of reducing deaths and attaining herd immunity are, to some extent, contradictory, and must be balanced against each other.
In order to create a single objective function, we assign a monetary value to deaths, as well as to remaining non-immune individuals. Naturally, it is difficult to assign a monetary cost equivalent for deaths. In Section 9, we will show how these monetary values can be adjusted to obtain optimal strategies for different death target levels, or for different target levels of herd immunity (as measured by the number of non-immune individuals remaining in the population).
We first establish some convenient notation: , In the following, we will often drop the function arguments (e.g., the 't' in u(t)) in order to simplify the equations. Using this notation, our objective function is defined as follows: where and where N A j = S j + E j + P A j + P Y j + I A j is the number of asymptomatic individuals in subpopulation. The functions α j and β j correspond to the cost associated with COVID testing and social distancing respectively for subpopulations j = 0, 1 (note these same cost functions were previously used in [18]). The coefficient a j0 represents the fixed cost when the testing program is implemented; a j1 is the proportional cost to the number of tested people, u j is the fraction of asymptomatic individuals in population j that are tested, and a j2 represents the increasing cost incurred as the testing program becomes more intensive (reflecting the law of diminishing returns). The cost function β j in (11) reflects the economic cost of social distancing measures, which increases at an increasing rate as measures intensify according to diminishing returns. The parameter v j expresses the proportionate reduction in contacts that result from the implemented measures. The coefficients e j (j = 0, 1) reflect the cost per death for the two respective population groups: it multiplies the sum of deaths that have occurred during the time interval plus the predicted deaths of individuals that are ill at the final time t f . The coefficients g j (j = 0, 1) are penalties for non-immune individuals that are still present in the population by the end of the period: these coefficients may be adjusted to achieve a desired level of herd immunity, as will be made clear in the subsequent discussion. The coefficients f j (j = 0, 1) multiply remaining individuals in the two respective groups who are infective, but who eventually recover: this cost is included because these individuals mediate the spread of the disease among the remaining population. Table 2 summarizes the cost and level coefficients used in our simulations. Justification for parameter values is provided in [18]. f 0 , f 1 cost multiplier for remaining infected 2 (assumed) g 0 , g 1 cost per remaining non-immune $1-10 6.5 (assumed)

Local Optimality Conditions for Testing and Distancing Optimal Control Problems
In this section, we will derive approximate local optimum conditions for the controls u k (t) and v k (t) for k = 0, 1 and for all times 0 ≤ t f . Before doing this, we must first make more precise the definition of local optimum in this context. We suppose that all controls are constant on all intervals of the form [nδ, (n + 1)δ) where δ is a small increment and (n + 1)δ ≤ t f . Given a control u with a corresponding solution X, then the control u k (nδ) (resp. v k (nδ)) is said to be locally optimal if the cost of solution X is less than or equal to the cost of the solution obtained when the control is modified by changing its value on the interval [nδ, (n + 1)δ).
In the following definition, for simplicity, we will assume that the time scale and solution values have been rescaled such that N 0 + N 1 = 1 and dX dt ∞ ≤ 1. We further suppose that δ as specified in the previous paragraph satisfies δ 1. Naturally, the original solution can be recovered from the rescaled solution by inverting the rescaling.
We consider first local optimal conditions for the testing controls u 0 and u 1 . We will compute the impact on the cost function from the perturbation of the control u k (t) for The perturbed control differs from the original control only in the values of u k (t) in the interval s ≤ t ≤ s + δ. The solution to the System (1) and (2) corresponding to the perturbed control u ζ is denoted by X ζ .
We may examine the dependence of J tot (u ζ , X ζ ) on ζ by decomposing it into three terms: Of these three terms, (1) and (2) is unchanged on this interval. It follows from the chain rule that: Noting that du k,ζ dζ = 1 and d 2 u k,ζ dζ 2 = 0 for s ≤ t ≤ s + δ, and using the fact that From (15) and (18), it follows that where the final inequality holds as long as ]. In view of the discontinuity of the function α in (11), this gives us the following local optimality conditions for the control u k (s): Given a control u, we may adjust the value of u k (s) to satisfy local optimality as follows. First, we approximate the solution ζ * to d dζ J tot (ζ * ) = 0 using Newton's method: From (18) and (20), we may conclude where The numerator and denominator in (22) can both be estimated numerically. Due to , and the fact that the function α in (11) is discontinuous at 0, we modify our final estimate of the local optimum value of ζ (denoted by ζ) as follows: We may similarly develop local optimal conditions for the distancing controls v 0 and v 1 . Define a perturbed control v ξ = u 0,ξ , u 1,ξ , v 0,ξ , v 1,ξ as follows: After calculations similar to those above, we obtain For v k (s), we have the following conditions: Given a control v, we may adjust v k (s) to satisfy local optimality conditions as follows. We first guess the solution ξ * to d dξ J tot (ξ * ) = 0 using Newton's method: and hence with can be approximated numerically.
Otherwise, ξ = 0. In conclusion, the locally optima conditions for the controls u k (s) and v k (s) are recapitulated in the following theorem: Given the objective function J tot defined by (10), perturbed controls u ζ and v ξ defined by (12) and (23), and values ζ * and ξ * that solve d dζ J tot u ζ , X ζ = 0 and d dξ J tot v ξ , X ξ = 0, respectively, then

Numerical Method for Estimating Optimal Controls
As mentioned above, local optima may not be global optima. There may be several local optima, and the problem is to identify which one is the global optimum. This requires that we do a wide-ranging search of solution space, as well as a directed search that converges to a local optimum. In this implementation, we do this in three stages. The first stage is to do an exhaustive search of a limited set of widely dispersed solutions which are representative of large subregions of the search space. In our case, solutions are limited to those in which controls only take at most three different values on three consecutive intervals that are specified. During the next stage, we allow solutions to vary on smaller intervals, which gives a much larger search space. On this search space, we implement a simulated annealing algorithm. Since this stage is still limited to a subset of the solution space, the resulting solution will still not be locally optimal. During the final stage, the solution obtained from simulated annealing is taken as the starting point to find a close-by locally optimal solution, which is our final estimated solution. The following subsections describe the three stages of the algorithm in sequence. The complete Python code, together with the testbench, is available on GitHub: the link is provided in the "Data Availability" section at the end of this paper.

Enumerative Method
The first stage of the solution algorithm involves an exhaustive search of a discrete set of diverse solutions. In order to create a discrete set, we consider controls that are constant on a fixed set of intervals, whose values are also taken from a discrete set. Supposing that we choose N intervals and allow the k'th control to assume a kn values on the n'th interval, then the total number of control strategies in the discrete set is ∏ N−1 n=0 ∏ 3 k=0 a kn . We then exhaustively evaluate the costs for all of these control strategies and identify the best. Naturally, both the number of intervals and the number of values per interval must be limited in order to keep the number of control strategies at a manageable level. The improved calculation method introduced in [5] enables more rapid computation of the solutions, by ordering the control strategies in such a way that successive strategies have minimal changes and recomputing only the portion of the solution that has changed.
In order to limit the number of solutions tried during the enumerative stage, the low-risk and high-risk testing control level were set to be equal, as were the low-and high-risk distancing control level. The number of intervals N was limited to small values, as described in Section 10. Each independent control had three possible values: 0, the maximum possible control level, (as specified in the last two rows of Table 2), and 1/3 of the maximum control level. Altogether, there are 3 · 3 = 9 different control combinations for each interval of constant control, meaning that the number of solutions tried during the enumerative stage was at most 9 4 = 6561. Under the method in [5], on average, only half of each solution is calculated, meaning that the computational burden was equivalent to computing 3281 solutions to the System (1). Solutions were computed using the solve_ivp routine from the scipy.integrate package in Python, with time steps equal to 1 day (180 time steps per solution).

Monte Carlo Method
The enumerative method gives the optimal solution from among a very limited class of strategies. Since most strategies do not belong to this limited class, it does not find an overall optimal solution. Nonetheless, it does provide a good starting point which can be further improved by additional search. In cases where the set of possible strategies has very high dimension, Monte Carlo methods are often used to search the solution space. Monte Carlo combines features of random search with directed search. In particular, simulated annealing is one Monte Carlo variant that alternates between exploratory and convergent modes. The balance between exploration and convergence is set using a temperature parameter (which is varied in a controlled fashion during the algorithm process): the higher the temperature, the more random the search for candidate solutions.
Simulated annealing algorithms have the following general structure: 1.
Randomly generate a candidate solution from a probability distribution (this distribution is modified as the algorithm progresses); 3.
Calculate the cost value for the candidate solution: 4.
If the candidate solution's cost value is less than the current solution's, then replace the current solution with the candidate solution. If the candidate solution's cost value is also the lowest so far obtained, then replace the overall best solution with the candidate solution; 5.
Else replace the current solution with the candidate solution with a probability that depends on the current temperature and the cost difference between current and candidate solutions; 6.
Update temperature and candidate probability distribution; 7.
If the current solution has not changed for n f lat iterations, then reset the temperature; 8.
If temperature has been reset n reset times, then terminate and output the best solution; 9.
In our implementation, all candidate solutions are piecewise constant on N inter = 20 equally-spaced time intervals that cover the 180-day period (compare the maximum number of intervals used in the enumerative solution, which was 4). This facilitates the generation of new candidate solutions, and provides solutions of sufficiently fine granularity so that there exists a candidate solution that is very close to the true optimum, which provides a suitable starting point for the local optimization described in the next subsection.
The steps listed above are described in more detail in the following subsections.

Probability Distribution for Candidate Generation
Each new candidate solution is a perturbation of the current solution, and differs by the current solution by individual control values on one or two intervals. The perturbation is generated by first randomly selecting two intervals, then randomly selecting two control measures, and then randomly generating small changes at a control level for the controls on these intervals, which are added to the current control levels for the two selected intervals. Each of these three selections rely on probability distributions that are updated after each iteration, based on the change in solution cost obtained by the perturbation.
Another distinctive feature of our method is that the generation of new candidate solutions involves changing control values on two different time intervals. This was done because it provides a way to shift the balance of control to an earlier or later period during the simulation period. For example, by decreasing earlier controls and increasing later controls, it is possible to delay the start of control but maintain the overall intensity. Such shifts cannot be accomplished if controls are changed only for a single time interval.
The probability distributions for interval selection, control selection, and control perturbation values were chosen as follows: • Two select interval pairs for control change, an N iter by N iter matrix P, was defined (where N iter = 20 as above) and initialized with 2's on the diagonal and 1's in all off-diagonal entries. At each iteration, P is normalized so that entries sum to 1, and then a random pair of indices (n 0 , n 1 ) is chosen according to the resulting distribution. • To select controls to be changed, a N iter by 4 matrix U was defined, and initialized with all entries equal to 0.5. In order to select the controls for intervals n 0 and n 1 , the n 0 and n 1 rows are each normalized separately to sum to one. We denote the normalized values by U n k ,j , respectively, where k = 0, 1 and j = 0, 1, 2, 3. Then, control j in interval n k is selected for changing with probability U n 0 ,j . Note that, according to this scheme, more than one control may be changed for each interval. If no control is selected, the random selection is repeated until at least one control is selected. • To determine the magnitude of control perturbations, a N iter by 4 matrix R was defined, and initialized to initStep. The perturbed control u n k ,j is changed by an amount R n k ,j · rand · B k,j , where rand denotes a uniform random variable on [0, 1] and B k,j denotes a Bernoulli random variable that takes the values −0.5 or 1 with equal probability.

Candidate Selection
The new solution with perturbed controls is computed, and the objective function is evaluated. The new solution is selected with probability min(1, e −∆/T ), where ∆ is the difference between the new and current objective function values and T is the temperature. In this manner, solutions that do not improve the objective function are sometimes accepted, where the probability of acceptance decreases with increasing ∆. The acceptance rate is regulated by the temperature-if the temperature is very high, most new solutions are accepted, regardless of ∆, while, if the temperature is near 0, then only new solutions with ∆ ≤ 0 are selected. Hence, when the temperature is high, the algorithm may escape from a globally suboptimal local minimum, while, when the temperature is low, the algorithm systematically moves towards the minimum.

Temperature Update
The temperature T is given by T m · ∆ avg , where T m is a temperature multiplier (initialized as 0.001) and ∆ avg is a running average of the observed positive changes in objective function over the last several iterations. At the end of each iteration, T m is multiplied by T reduce (where T reduce < 1 is an algorithm parameter), and ∆ avg is updated as follows: where φ mem < 1 is an algorithm parameter which determines the "memory" of the running average process. The above prescription scales the temperature appropriately to match the size of observed ∆'s. Simultaneously, the temperature multiplier gradually lowers the temperature, which represents the cooling process that is part of simulated annealing.

Probability Matrix Updates
At each iteration, the matrices P, U and R are updated to increase the probability of favorable interval and control selections and decrease the probability of unfavorable selections. The update equations are given as follows: P n 0 ,n 1 = max(P min , P n 0 ,n 1 − sign(∆) · P δ ), where U δ and R δ are algorithm parameters that control the sizes of changes in the U and R matrix entries, respectively, and δu n j ,k is the perturbation in control u n j ,k .

Temperature Reset
A count is maintained of the number of iterations since the last objective function decrease was observed. If this count exceeds n f lat , then the temperature multiplier is reset to T reset (a high value), and the iteration count is reset to zero (n f lat and T reset are both algorithm parameters). This ends the current cooling cycle, and starts a new one, putting the system into a "hot" state during which the solution may escape from a local minimum.

Cycle Termination
For each cooling cycle, a record is kept of whether or not the objective function was improved during that cycle. If not, the 'no improvement' count is increased by one. If the 'no improvement' count reaches a threshold N iter , then the algorithm terminates.
A pseudocode for the simulated annealing algorithm is given in Algorithm 1. Algorithm parameters are listed in Table 3. Algorithm parameters were not fully optimized, but some adjustments were made based on preliminary observations of algorithm performance.

Algorithm 1:
Monte Carlo algorithm pseudocode 1: Initialize system parameters (Table 1), costs (Equation (11)), initial vector xI, final time t f 2: Initialize algorithm parameters: (Table 3) 3: f lat count = 0 4: Inititalized P, U, R as defined Section 8.2.1 5: uInt best = Intervalwise average of initial control u 6: Compute x best , cost best = cost curr using initial vector xI and controls uInt best 7: while reset count < n reset do 8: Select two random intervals for control changes 9: Select random control measures for the two intervals to be changed 10: Compute random changes for selected measures on selected intervals 11: Updated vector of control measures uInt and uTmp 12: Recompute x tmp , cost tmp using initial vector xI and controls uTmp 13: if cost tmp < cost curr then 14: Reset f lat count to zero 15: update P, and U, to increase probability of favorable choice. 16 T m = T m · T reduce 37: end while 38: U best = expand uInt best to size t f 39: return cost best , U best

Local Optimum Solution
Although the Monte Carlo algorithm improves on the enumerative solution, there is no guarantee of optimality: in particular, the solution may not even be locally optimal. However, we may use the Monte Carlo output solution as a starting point to find a locally optimal solution, using an algorithm based on the local optimality conditions derived in Section 7. A pseudocode for the local optimization is given in Algorithm 2. end if 47: end for 48: compute cost best using initial vector xI and controls u best 49: return cost best , U best

Overview
Our analysis of simulation results focuses on three different outcomes: the implementation cost; the number of deaths; and the final level of herd immunity, which is measured by the number of non-immune individuals remaining in the population at the end of the period. We want to evaluate how different strategies can achieve different trade-offs between these outcomes. In order to do this, we independently vary the costs per death and per non-immune individual (e j and g j respectively in the objective function (10)). By varying these costs and finding strategies that minimize the total cost J tot in (10), then one can find optimal strategies that achieve different target levels of death and herd immunity. For example, if one is primarily interested in reducing deaths to a certain level, one may set non-(died or recovered) cost to zero, and adjust the per-death cost so that the target death level is reached. On the other hand, if one wants to achieve a certain level of herd immunity, we can set the per-death cost at a low value and adjust the non-immune cost until the optimal solution reaches the target. The range of e j and g j values used in the simulation is specified in Table 2. Altogether, the two cost values used formed a 25 × 25 grid with logarithmic spacing.
In the algorithm described above, there are three stages to the optimization: enumerative, Monte Carlo, and local optimization. To evaluate the relative importance of each stage, we used three different combinations of these stages, and compared the solutions obtained and the runtime required to converge to a solution. The three combinations are denoted as follows: partial enumerative and partial Monte Carlo (PEPM); full enumerative and no Monte Carlo (FENM); and no enumerative and full Monte Carlo (NEFM). All three combinations used the same local optimization method as the final stage. We did not include a variant in which only local optimization was used because we found that the execution time was too long so that simulations could not be completed in a reasonable time frame. Table 4 shows the simulation parameters used in each combination, where significance of the different parameters is indicated in Algorithm 1. In the following subsections, we first present the Pareto curve that expresses the tradeoff between the opposing goals of minimizing deaths and maximizing herd immunity; next, we display and discuss the characteristics of the different Pareto optimal control strategies as they vary over the treatment period; then, we compare the contributions of the three optimization stages to the overall optimization, as well as comparing the relative effectiveness of the three different algorithm variants that emphasize different stages; then, we examine the characteristics of optimal solutions as a function of death cost and non-immunity cost; and finally we present and discuss the execution times for the three algorithm variants. Figure 2 shows the Pareto curves for the trade-off between deaths and herd immunity found by the three different algorithms. The points on each curve were found by first setting a target number of deaths, and then finding the solution which meets the death target and has maximum herd immunity level (given by the number of recovered). The colors of the dots indicate the implementation costs of the solutions. In our case, the target death values ranged from 0 to 100,000 with steps of 4000. This graph can be used to assess various trade-offs between herd immunity level and number of deaths. For example, if fewer than 40,000 deaths is desired, the curve indicates that, under the best possible strategy, the number of remaining non-immune individuals in the population is roughly 10 5 . Alternatively, if, for example, the target is to reduce non-immune individuals below 10 4.5 , then this level is reached at the cost of about 60,000 deaths. The figure shows that the three algorithms produce similar trade-offs between deaths and herd immunity, but NEFM is slightly better in that the number of non-immune achieved for each death level is slightly smaller in most cases. This shows that NEFM solutions tend to give better trade-offs than the other two, indicating that an extended Monte Carlo algorithm is very effective in most cases in locating overall optima, even when no preliminary enumerative search is conducted.  Figure 3 shows the implementation costs of the Pareto optimal solutions shown in Figure 2 as a function of death level. The dots' colors measure the herd immunity level (indicated by the number of recovered). The figure shows that the NEFM algorithm is generally finding solutions with lower implementation cost, even while improving the herd immunity over the other two algorithms. This makes NEFM indisputably the best of the three algorithm variants, underscoring the key importance of Monte Carlo in the optimization process.

Temporal Characteristics of Pareto Optimal Solutions
The heatmaps in Figures 4 and 5 show the control measures (testing and distancing, respectively) that were implemented during the 180-day period for the different Pareto optimal solutions indicated in Figure 2. The x-axis shows the time in days, while the y-axis represents the total number of deaths. The heatmap color for each pair of (x, y) values indicates the level for day x of the controls applied for the Pareto optimal solution that results in a total of y deaths over the 180-day period. Heatmap colors are associated with control levels according to the colorbar scale at right. For example, for PEPM, the low-risk testing control level on day 50 for a Pareto optimal strategy corresponding to 60,000 deaths, is around 0.7, as indicated by the yellow color of the pixel at the location (50, 60, 000). On the other hand, for the same strategy on day 100, it is almost zero, as indicated by the dark blue color of the pixel a location (50, 60, 000). To see the time progression of the Pareto optimal strategy that yields a total value of d deaths, one may scan across the pixels at locations (x, d) for x ≤ 0 ≤ 180.  The first row of figures shows the daily progression of low-risk distancing control levels for PEPM, FENM, and NEFM for Pareto optimal strategies that attain different overall deaths, as indicated on the y-axis of the graphs. The second row shows corresponding figures for high-risk distancing levels for the PEPM, FENM, and NEFM solutions. Figure 4 shows low and high risk testing strategies for different death levels. By comparing across the three columns, it appears that PEPM is intermediate between FENM and NEFM. FENM consistently places heavy emphasis on testing for about the first quarter of the period (except for below about 15,000 deaths, as well as testing in the third quarter of the period for deaths between 20,000-40,000. PEPM also has early testing, but at a less intensive level and for a longer time (about the first third of the full period), and less late testing at a later time. NEFM shows spotty testing throughout the central part of the period, with much less emphasis especially at deaths above 50,000. Note that, for all three cases, for death levels above 40,000, there is a final day beyond which no testing is employed: for example, when deaths = 80,000, there is no testing after day 100. This indicates that the desired level of herd immunity has been reached, and no further control is necessary.
Comparing across the rows in Figure 4, it appears that strategies applied to low-risk and high-risk subpopulations are similar, regardless of the algorithm. Consistently, testing applied to high-risk is slightly less intensive than for low risk. This may be due to the fact that, according to the contact matrix entries in (3), low-risk individuals have at least three times as many daily contacts with both low-and high-risk persons. It follows that low-risk individuals play a much larger role in spreading the disease than high-risk individuals, so it is reasonable that testing should be more focused on them. Figure 5 shows low and high risk distancing strategies for different death levels. Once again, PEPM is intermediate between FENM and NEFM. FENM has intensive distancing for the second quarter of the period, except below 20,000 deaths, where distancing stretches throughout the 180-day period. It appears that distancing levels are high during the intermediate period where testing is low, indicating a trade-off between the two control approaches. PEPM is similar, except there is also low-level distancing in the first quarter and intensive distancing is delayed. For NEFM, the period of intensive distancing is even longer, starting around 15-20 days and lasting for about half of the total period. Below 20,000 deaths, there is intensive distancing throughout the period for all three strategies, which explains why testing measures are greatly reduced for these death levels. It appears that distancing has a greater effect than testing in limiting disease exposure but is less preferred when some disease exposure is permitted in order to promote herd immunity.
As with testing, distancing strategies for low-and high-risk are similar regardless of death level for each of the three algorithms. Distancing also resembles testing in that above 40,000 deaths; there is a time at which all control is ceased, indicating that the disease has been sufficiently controlled and is left to run its course. Figure 6 shows the contributions of the three optimization stages towards lowering the objective function in the overall solution process. The graphs compare the total cost obtained at the end of each optimization stage, in proportion to the final cost obtained by the completed optimization procedure. In the PEPM solution, the partial optimal enumerative solution produces a total cost that is generally between 5-25 percent higher than the final cost obtained by the three-stage algorithm, while the partial Monte Carlo stage improves this result to reach between 1-10 percent higher than the final solution. In the NEFM solution, the Monte Carlo stage reaches within 15 percent of the locally optimal solution. In the FENM solution, the enumerative alone reached within 15 percent of the final locally optimal solution. Figures 7 and 8 both show the percentage total cost difference between two different optimization methods produced by each combination of (non-immune cost, death cost). The base 10 logarithms of non-immune cost and death cost are plotted on the xand y-axes, respectively. The heatmap color scale indicates the percentage cost difference between the optimized total costs obtained by the two different methods for each (x, y) combination. Figure 7 shows that PEPM resulted in a higher total cost compared to NEFM for majority of the cost combinations, where cost differences range up to 15 percent. However, there is a region in the graph (non-immunity cost from 10 0 -10 3 and death cost from 10 5.1 -10 5.25 ) where NEFM resulted in a higher cost than PEPM, with differences of up to about 5 percent. We will see in Figure 9 that the Pareto solutions shown in Figure 3 lie in the region where the difference between PEPM and NEFM is largest. The fact that NEFM outperforms PEPM for most cost combinations shows that the Monte Carlo stage has a larger effect on optimization than the enumerative method. Figure 8 shows similar differences between FENM and NEFM. The fact that sometimes FENM and PEPM gives better results than NEFM shows that the enumerative method can be useful in providing the Monte Carlo stage with a suitable initial condition.   The heatmap colors represent the quantity of interest for each graph, and the color scale for the heatmap is given by the first colorbar at right. The dots represent the (non-immune, death) cost combinations that correspond to the Pareto solutions shown in Figure 3, and the second colorbar on the right interprets the colors of the dots, which correspond to the log base 10 of the final recovered population attained by the solutions represented by the dots. Figure 9 shows a narrow transition region between cost combinations that produce few deaths (blue region) and those that produce many deaths (yellow region). The blue region corresponds to solutions that protect the population, but do not promote herd immunity, while the yellow regions show solutions where most of the population becomes immune. All of the Pareto optimal solutions are located in the transition region, for all three algorithms. By comparison with Figures 7 and 8, it appears that most of the Pareto optimal solutions lie in the band where NEFM produces better optima than the other two algorithms. Heatmap showing number of deaths as a function of death and non-immunity costs under PEPM, FENM, and NEFM algorithms, respectively. Pareto optimal solutions are indicated by hollow dots, and the dot colors correspond to number of recovered according to the color scale at the far left. The figure shows that all Pareto optimal solutions occur in the transition region between death-nonimmune cost combinations producing few deaths (blue region) and those corresponding to many deaths (yellow region). Figure 10 is comparable to Figure 9, except the heatmap shows the total number of remaining non-immune in the population after 180 days since the spread of the disease. This figure has similar characteristics to the previous figure-the low-death and high-death regions in Figure 9 correspond to low and high levels of herd immunity, respectively. The only visible difference is the transition band between low and high herd immunity is wider and more clearly defined, especially for the FENM and PEPM algorithm variants. Figure 11 is like the previous two figures except the heatmap shows the implementation cost. Low and high death regions correspond to high and low implementation costs, respectively. This figure differs from the previous two in that the transition band between high and low cost regions is somewhat lower, and the Pareto optimal solutions lie above the transition band instead of within. This is explained by the fact that our Pareto trade-off does not consider implementation cost, but is rather focused on deaths and herd immunity.

Execution Times for Three Algorithm Variants
The heatmaps in Figure 12 display the runtimes obtained by the three different algorithms for the different (non-immune, death) cost combinations. All simulations were done on a Macbook Pro with a single 8-core 2.3 GHz Intel Core i9 processor and 16 GB RAM, running Python 3.7.7. In comparison with PEPM and NEFM, the FENM cost combination had the longest runtimes. The runtime for the cost combination (5.5, 3) on FENM was 80 min, while, on PEPM and NEFM, it was less than 40 min. This drastic change in the runtime can be explained by the expensive enumerative search that must compute a large number of solutions. It may also be noted that Pareto optimal solutions are among those that have longer runtimes.  Table 5 summarizes mean runtimes for the three optimization stages, for the three algorithm variants. The PEPM method had the shortest mean execution time, and NEFM was about 30% longer (recall, however, that NEFM obtained better solutions). FENM had by far the longest mean execution time, which was more than double that of NEFM.

Conclusions
This paper applies numerical methods for finding optimal controls to an SEIR epidemic model of COVID-19 in the city of Houston, TX, USA with low-risk and high-risk population groups. Available controls include COVID-19 testing and various distancing measures. We formulate an objective function that models the cost of COVID-19 by assigning monetary values to deaths and non-immunity, which can be adjusted to obtain solutions that meet different death or herd immunity level targets. We determine the local optimality conditions for the optimal control problem under testing and distancing controls.
A three-stage algorithm for numerically estimating optimal solutions is designed, and the performance (in terms of final solution total cost and computational cost) for three algorithm variants is thoroughly investigated. Of the three algorithm variants, the version that included the longest Monte Carlo stage without any enumerative solution gave the best control solutions, in terms of finding strategies that give better trade-offs between death and herd immunity and at lower cost. This implies that, of the three stages, the Monte Carlo stage has an especially significant effect on the optimization. However, results also show that all three stages contribute to the obtaining of good control solutions. Pareto optimal solutions that balance death versus herd immunity level exhibit a near-linear trade-off between these two variables, as shown in Figure 2. The best obtained solutions show that distancing tends to outweigh testing, especially when the reduction of deaths is emphasized at the expense of herd immunity. Algorithm runtimes for the best-performing algorithm variant average under 20 min for a 180-day period, indicating that the algorithm is practically feasible for exploratory analysis.