Real-Time Optimization of Social Distancing to Mitigate COVID-19 Pandemic Using Quantized Extremum Seeking

The application of extremum seeking control is investigated to mitigate the spread of the COVID-19 pandemic, maximizing social distancing while limiting the number of infections. The procedure does not rely on the accurate knowledge of an epidemiological model and takes realistic constraints into account, such as hospital capacities, the observation horizon of the pandemic evolution and the quantized government sanitary policy decisions. Based on the bifurcation analysis of a SEIARD compartmental model providing two possible types of equilibria, numerical simulation reveals the transient behaviour of the extremum of the constrained cost function, which, if rapidly caught by the algorithm, slowly drifts to the steady-state optimum. Specific features are easily incorporated in the real-time optimization procedure, such as quantized sanitary condition levels and long actuation (decision) periods (usually several weeks), requiring processing of the discrete control signal saturation and quantization. The performance of the proposed method is numerically assessed, considering the convergence rate and accuracy (quantization bias).


Introduction
Since January 2020, human society has been deeply impacted by the COVID-19 pandemic. In this context, mathematical modeling and numerical simulation of the virus spread as a function of several factors, including social distancing, testing and quarantining, mobility restrictions and vaccination, have been playing a key role in the decision policy of many governments worldwide [1]. The most popular dynamic model finds its origin in the work of [2], who proposed a compartmental representation, categorizing people in several possible states such as Susceptible (S), symptomatic Infected (I) or Removed/Recovered (R). The so-called SIR models provide predictions based on historical data and can be used to develop hypothetical control strategies. For instance, Ref. [3] proposes an optimal SEIAR model-based open-loop control approach (adding the Exposed and Asymptomatic compartments) and suggests that on-off policies alternating between strict social distancing and relaxing can be effective at flattening the infection curve. Furthermore, Ref. [4] investigates open-loop optimal control as well as model predictive control (MPC) with online adaptation of the social policy constraint, and robust MPC using interval state estimation to take account of uncertainties in the model and measurements. In the same spirit, Ref. [5] develops an MPC control strategy taking account of time-dependent specifications and logical relations between model variables, and multiple predefined discrete levels of governmental interventions (control input quantization). As not all model variables are accessible to measurement, it is necessary to develop state estimators in order to apply full-state feedback, which poses additional challenges. In [4], an interval observer is developed, whereas an observer for Linear Parameter Varying (LPV) systems is designed in [5].
Data-driven control methods have also attracted interest, with different optimal formulations, such as in [6], showing that the cost of eradicating the disease may be significantly higher than the cost of managing the pandemic by hospital saturation limitations, which is claimed to be the policy adopted by several US local administrations. Ref. [7] also recommends deep-learning-based strategies to mitigate the pandemic, assimilating a high number of data samples to approach hyperparameters (effective reproduction number of the virus over time, hospitalization rate, etc.).
As stressed in [8], one can of course question the validity of the dynamic prediction models, and several recent research papers have proposed real-time optimization (RTO) strategies to take model uncertainties into account either by considering robust-to-mismatch or, more radically, model-free techniques. In one of our recent publications [9], discrete model-free extremum seeking (ES) has been applied for the first time to the control of social distancing while avoiding hospital saturation. This contrasts with the first MPC-based studies of [3,4], which only aim at limiting the infectious cases in a conservative way.
Another emerging research stream aims at assessing the risk of several alternative control policies including the consideration of vaccination strategies guided by sociodemographic and health factors [10] as well as the possible withdrawal of the vaccination passport to grant more freedom [11].
In this connection, more elaborate objective/cost functions are considered, such as [12], who proposes a concomitant optimization objective with the concern of providing advanced solutions considering people's psychological health. In the same stream of studies, Ref. [13] develops an original sliding-mode-based RTO design, adapted to a SIRDQ model with the objective of reducing the quarantine period while guaranteeing an effective regulation of the reproduction number to a desired value. Several numerical validations are proposed using first-order sliding-mode and second-order super-twisting methods.
The objective of the present study is to investigate the application of model-free quantized discrete extremum seeking control (QESC) to achieve the optimization of social distancing while mitigating the pandemic and limiting the number of infection cases. Extremum seeking control (ESC) is an RTO method achieving a direct input adaptation [14] to reach a steady-state map extremum, either by tracking an uncertain model-based trajectory [15] or by relying on the existence of a measurable convex objective function without any a priori knowledge about the process model [16]. The latter is also denominated as model-free perturbation-based ES and aims at estimating the objective function gradient and forcing its estimate to zero while persistently exciting the input using a periodic dither signal. Several review papers highlight the potential of the ES methods to solve RTO problems in different scientific fields (see, for instance, [17,18], for reviews of ESC developments over the last few decades).
As underlined in [9], discrete ESC presents some operating challenges such as the condition of persistency of excitation [19], which prevents the output signal from reaching a true steady-state (the latter is only achieved on average [16]), the convergence dependency on the dither signal frequency which should be adapted to the process operating conditions and time constants [20], and the nature of the actuator, which is not assumed to present saturation or quantized levels. The latter issue has recently been tackled by [21], based on the work of [22], who lay the foundations of the anti-windup ESC providing stability and convergence proofs. However, actuator saturation and quantization studies in the framework of ESC are limited to two-level situations and, in this work, an extension to multiple quantized levels is proposed, which corresponds to the various social distancing levels that could be imposed in a governmental policy.
The motivation of this work is therefore to extend our preliminary results [9] in order to include a rigorous treatment of saturation and quantization of the control signal (i.e., social distancing in the context of the pandemic) using the results of [21]. To sum up, the objectives are to design (i) a procedure focusing on psychological health and the reduction of social distancing since hospitals should be less and less likely to exceed their bed capacities thanks to the vaccination, (ii) a realistic discrete software tool supporting decision policies without requiring significant computational loads (in contrast with, for instance, deep-learning-based methods) and (iii) the first validation of a QESC strategy in the framework of the COVID-19 pandemic.
The next section presents the epidemiological model used in [3] as an emulator of the population behavior to test the ESC approaches. Section 2.2 computes the equilibrium points of the model and demonstrates a bifurcation behavior depending on the level of social distancing. In Section 2.3, a measurable cost function is proposed, which will use the concept of barrier functions, and serves as basis for ESC, which is further discussed in Section 3. The numerical application is detailed in Section 4, where the two time scales of the convergence are highlighted and the issue of quantization of the measures is introduced. The final section is dedicated to conclusions and research perspectives.

COVID-19 Outbreak Modeling

SIR Modeling
Compartmental population modeling [2] is, by far, the most common formalism to model epidemics and describe the transitions between susceptible S(t), infected I(t) and removed/recovered R(t) states. In [9], the compartmental SEAIR model of [3] describing the COVID-19 outbreak is considered, which also accounts for the asymptomatic population A(t) (this class of individuals gathers cases which are not detected due to asymptomatic conditions, or due to the lack of testing) as well as the exposed population E(t). This model also includes mortality, with a perished population P(t), but no natality. The resulting dynamics of the several compartmental variables are represented by an ordinary differential equation system as follows: where N is the total population and S, E, A, I, R, and P are, respectively, the susceptible, exposed, unreported infected (asymptomatic or unconfirmed), reported/confirmed infected, removed/recovered and perished populations. The parameters α_a and α_i are the rates of exposure to the A and I populations, respectively. α_a characterizes, in a broad sense, social distancing and α_i, quarantining, and can be considered as manipulated (control) inputs from a system and control perspective, as well as the screening/testing rate κ. Constant (at least in first approximation) parameters account for the (inverse of the) latent period of the virus l (0.5 days⁻¹), the infectious period of unconfirmed cases ρ (0.1 days⁻¹) and the recovery rate β (0.025 days⁻¹). These parameters represent the situation in the US in 2020 according to [3].
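
As an illustration of how such a compartmental emulator can be simulated, the following minimal Python sketch integrates a SEAIR-type model with mortality using a forward Euler scheme. The equation structure is an assumption (one common way of connecting the S, E, A, I, R and P compartments) and is not claimed to reproduce system (1) exactly. The values of `latent_rate`, `rho` and `beta` are those quoted above, whereas `alpha_i`, `kappa`, `mu`, the initial state and the step size are hypothetical values chosen for the example.

```python
def seair_rhs(state, alpha_a, alpha_i=0.005, kappa=0.05,
              latent_rate=0.5, rho=0.1, beta=0.025, mu=0.001):
    """Right-hand side of an ASSUMED SEAIR-type model with mortality.

    alpha_i, kappa and mu are hypothetical illustration values.
    """
    S, E, A, I, R, P = state
    N = S + E + A + I + R + P  # total population, constant by construction
    new_exposed = (alpha_a * A + alpha_i * I) * S / N
    dS = -new_exposed
    dE = new_exposed - latent_rate * E
    dA = latent_rate * E - (kappa + rho) * A   # detection or unreported recovery
    dI = kappa * A - (beta + mu) * I           # recovery or death
    dR = rho * A + beta * I
    dP = mu * I
    return (dS, dE, dA, dI, dR, dP)


def simulate(alpha_a, days=200, dt=0.1,
             initial=(999_890.0, 100.0, 10.0, 0.0, 0.0, 0.0)):
    """Forward Euler integration; returns the list of visited states."""
    state = tuple(initial)
    trajectory = [state]
    for _ in range(int(round(days / dt))):
        derivatives = seair_rhs(state, alpha_a)
        state = tuple(x + dt * dx for x, dx in zip(state, derivatives))
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for alpha_a in (0.05, 0.4):
        peak_infected = max(s[3] for s in simulate(alpha_a))
        print(f"alpha_a = {alpha_a}: peak reported infected = {peak_infected:.0f}")
```

Under these assumed values, a small `alpha_a` (strict distancing) keeps the reported infections low, while a large `alpha_a` produces a pronounced epidemic wave. A higher-order integrator could replace the Euler step without changing the structure of the sketch.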

Bifurcation Analysis
Neglecting the death rate µ, which is fortunately very low as compared to the recovery rate γ (one to two orders of magnitude smaller) and considering a constant total population N in model (1), two equilibrium points can be obtained, which correspond either to the extinction of the infection (the steady-state susceptible population level is S_ss = N and all other variables are 0) or to the stabilization of the epidemics, i.e., non-zero steady-state values of the several variables depending on the several rates defined in Table 1. The interested reader may refer to [9] to find the detailed expressions and their derivation.
A local stability analysis based on the Jacobian of (1) around the equilibrium points shows that the eigenvalues are (non-strictly) negative (one eigenvalue is always zero) over a social distancing range of α_a ∈ [0.05, 0.4], exhibiting a dynamic bifurcation at a critical value α_{a,c}, a function of the chosen parametrization. The resulting eigenvalue trajectories therefore present two arcs, separating the range of α_a values into two categories, each of them leading to the epidemics extinction (α_a < α_{a,c}) or stabilization (α_a > α_{a,c}).

Constrained Objective
Most of the published studies of optimal control applications to the COVID-19 outbreak require the knowledge of a dynamic model in the form of Equation (1) and some robustness provision to account for parameter uncertainties. In contrast, we aim at proposing a model-free strategy allowing direct social distancing adaptation under realistic decision policies with long observation periods (e.g., several weeks) and long sampling periods. Indeed, the pandemic dynamics evolve with the vaccination rate and efficiency as well as the appearance of new mutant strains, challenging model-based strategies.
In most studies, the focus is put on the fatality or infected case limitations, whereas the objective of the present study is to apply an optimal control policy minimizing social distancing (maximizing α_a) with the concern of psychological health [3,12,23], under constraints such as hospital bed capacity.
The objective function therefore reads: where −α_a represents social distancing while ψ and φ are respectively a logarithmic barrier on the infected cases and a penalty constraint on the comfort of social distancing: where η_ψ, η_φ and are design parameters. I_ref represents the critical level of infections, corresponding to a number of infected people which might lead to an overflow of intensive care hospitalizations. α_{a,ref} is the penalty reference for social distancing, i.e., a level at which people will start feeling psychologically affected. However, logarithmic barriers may sometimes induce numerical issues during transient phases, and, as recommended in [24], Equation (3a) is approximated by a combined barrier-penalty expression as in: which is active in the feasible region I(t) − I_ref ≥ and which is active in the complementary region and where η_P is a new design parameter.
Objective (2) is then rewritten as: The chosen parametrization is summarized in Table 2 and Figure 1 shows the evolution of (6) as a function of α_a after 200 days and once in steady-state. To solve this minimization problem, a discrete extremum seeking strategy has been proposed in [9], resulting in a two-stage convergence: the algorithm first quickly catches the transient optimum and then greedily tracks [25] its drift towards the steady-state optimum (highlighted by the star in Figure 1). Even if this application was successful, several practical shortcomings were highlighted, such as the inconsistent daily changes and the infinity of possible quantization levels (each of them assimilated to an adopted sanitary policy) of the social distancing variable. In this study, we therefore propose a new problem formulation including these important aspects to make the control policy applicable in a real epidemiological context.

Classical Discrete-Time Extremum Seeking
Extremum seeking (ES) is a real-time optimization (RTO) strategy driving a system to optimal operating conditions corresponding to the extremum of a measurable convex objective function J [26]. To apply this approach, model (1) and objective function (6) are first cast under the following generic nonlinear state-space form: where x ∈ ℝ^n is the state vector, u ∈ ℝ^r the input vector, y ∈ ℝ^m the output vector, C the m × n measurement matrix and J the cost function to be minimized. The convergence of the extremum seeking algorithm is guaranteed if (i) there exists a unique couple of minimizers x* and u* under achievable steady-state conditions, and (ii) if the cost function is convex, fulfilling the necessary condition of optimality [16]. In the COVID-19 pandemic context, a daily reporting of cases is delivered, and a discrete formulation of the perturbation-based ES is therefore recommended, based on the scheme represented in Figure 2.

Figure 2. Discrete perturbation-based extremum seeking [26]. The input u is modulated with the dither signal d perturbing the measured objective function h = J. The latter signal is then demodulated in two steps: first by removing the continuous component and low frequencies through a high-pass filter with cut-off frequency f_HP, then by multiplying the filtered signal h_HP by the dither signal to isolate the information on the gradient ξ at ω. The integration of the gradient estimate provides the input estimate û.
The system input is excited by a periodic dither signal and the objective function measurement is high-pass filtered in order to recover the useful information at the dither frequency. The filtered signal, h_HP, is then demodulated with the same dither signal, providing the cost criterion gradient estimate ξ = ∂h/∂u. Finally, the input signal is recovered from the integration of ξ.
The ES loop of Figure 2 is governed by the following equations: where f_HP is the high-pass filter cut-off frequency, k_I the integrator gain, k is the discrete time variable and T_s the sampling period. The reader may refer to [16,26] for additional elements about stability and convergence analyses of discrete ES. Moreover, Ref. [27] also proposes further analysis considering state constraints, introducing barrier and penalty functions, such as Equation (3). In the next subsection, to solve the practical shortcomings discussed in Section 2.3 regarding social distancing management, the particular case involving quantization of the actuator level is presented, adapting the strategy of [21].
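
The modulation, high-pass filtering, demodulation and integration steps described above can be sketched as follows. This is a minimal illustration on a static quadratic map, not the parametrization used in this study: the first-order discrete high-pass filter, the sinusoidal dither, the test cost function and all numerical values (`a`, `dither_period`, `f_hp`, `k_i`) are assumptions made for the example.

```python
import math


def run_extremum_seeking(cost, u0, steps=3000, Ts=1.0, a=0.1,
                         dither_period=10, f_hp=0.02, k_i=0.5):
    """Discrete perturbation-based ES minimizing a static map `cost`."""
    omega = 2.0 * math.pi / (dither_period * Ts)  # dither pulsation
    rc = 1.0 / (2.0 * math.pi * f_hp)             # high-pass time constant
    c = rc / (rc + Ts)                            # first-order filter coefficient
    u_hat = u0
    h_prev = cost(u0)
    h_hp = 0.0
    history = []
    for k in range(steps):
        d = a * math.sin(omega * k * Ts)   # dither signal
        h = cost(u_hat + d)                # measured objective (modulation)
        h_hp = c * (h_hp + h - h_prev)     # high-pass filtering
        h_prev = h
        xi = h_hp * d                      # demodulation: gradient estimate
        u_hat -= k_i * Ts * xi             # integration (descent for a minimum)
        history.append(u_hat)
    return history


if __name__ == "__main__":
    estimates = run_extremum_seeking(lambda u: (u - 2.0) ** 2 + 1.0, u0=0.0)
    print(f"final input estimate: {estimates[-1]:.3f}")
```

With these assumed settings the input estimate settles in a small neighborhood of the minimizer u = 2. The dither period is kept well above the sampling period and well below the convergence time, which reflects the time-scale separation on which perturbation-based ES relies.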

Discrete-Time Quantized Extremum Seeking
Under specific quantized setting of the actuator over n steps covering the range of admissible values u_k (k = 0, 1, ..., n − 1) belonging to the set U, the input can be reformulated as follows: where the chosen constant actuator quantum is u + − − . The discrete perturbation-based ES equations become: and the ES scheme is updated by including the new quantizing blocks as shown in Figure 3.
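
To make the combined saturation and quantization of the input concrete, the sketch below maps a continuous input onto a finite set of equally spaced levels. It assumes that the n levels include both bounds of the admissible range, so that the quantum equals the range divided by n − 1, and that rounding is to the nearest level. The function name, the choice of 8 levels and this rounding rule are hypothetical and may differ from the operator Γ(u) of [21].

```python
import math


def quantize(u, u_min, u_max, n_levels):
    """Saturate u to [u_min, u_max], then round it to the nearest of
    n_levels equally spaced values (both bounds included)."""
    step = (u_max - u_min) / (n_levels - 1)   # assumed constant quantum
    u_sat = min(max(u, u_min), u_max)         # saturation
    index = math.floor((u_sat - u_min) / step + 0.5)  # nearest level index
    return u_min + index * step


if __name__ == "__main__":
    # Hypothetical example: 8 policy levels over the range [0.05, 0.4].
    for u in (-1.0, 0.23, 1.0):
        print(u, "->", round(quantize(u, 0.05, 0.4, 8), 3))
```

Inputs below or above the admissible range are clipped to the lowest or highest level, which is why the quantization behaves like a saturation from the point of view of the ES integrator.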

Figure 3. Quantized discrete perturbation-based extremum seeking (adapted from [21]). In comparison with Figure 2, this scheme allows for estimating the bias δ due to the input quantization (saturation) Γ(u), and providing a correction (by addition to the dither signal).
The bias created by the quantization of the input is comparable to a saturation which should be compensated in order to avoid windup of the ES integral loop and loss of convergence. A signal δ_k is introduced by [21], accounting for the estimation bias in such a way that where N denotes a horizon over which the input signal is averaged. Equation (11) highlights the role of variable δ which acts on the input to compensate asymptotically the saturation bias. This variable is updated as follows: where λ is an adaptation gain chosen so as to allow δ to reach a sufficient level with respect to the dither signal magnitude and which measures the deviation between the quantization of the perturbed/compensated gradient (i.e., including the dither and the bias estimation) and its original quantized counterpart.
The magnitude of the dither signal a evolves in relation with the gradient estimate as suggested in [28], until a lower bound a⁻ is reached. This allows the ES algorithm to reduce its oscillations (or even halt if a⁻ ≈ 0) when reaching a sufficiently close neighborhood of the optimum. It also allows for increasing the dither magnitude if the gradient estimate suggests that a departure from the neighborhood has occurred (for instance, in the presence of external disturbances). It should be noticed that the adaptation of the dither magnitude assumes smoothness of the cost function in the optimum vicinity. Under persistence of excitation (PE), the ES algorithm converges in a close neighborhood of the optimum, function of the dither signal magnitude and frequency. This PE condition is fulfilled with: where (2/π) tan⁻¹(ξ) forces the magnitude a to evolve with the gradient. σ_a should be set taking into account that the larger the dither magnitude, the faster the ES converges and the smaller the dither magnitude, the more accurate the ES algorithm. The value of γ_a should then be selected so that gradient variations can be taken into account even if convergence is under progress and the magnitude of the dither signal has already significantly decreased.
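
The qualitative behavior of the dither magnitude adaptation (exponential decay towards a lower bound, with a bounded gradient-driven term that can raise the magnitude again) can be illustrated with the sketch below. The update law is an assumed form chosen for illustration, not the exact law of this study or of [28]: here `sigma_a` acts as a decay factor, `gamma_a` weights the bounded term (2/π)·|tan⁻¹(ξ)|, and `a_min` plays the role of the lower bound a⁻. All numerical values are hypothetical.

```python
import math


def adapt_dither_amplitude(a, xi, sigma_a=0.9, gamma_a=0.01, a_min=1e-3):
    """One step of an ASSUMED dither magnitude adaptation law.

    The term (2/pi)*|atan(xi)| lies in [0, 1), so the gradient estimate xi
    can only raise the magnitude by a bounded amount per step.
    """
    candidate = sigma_a * a + gamma_a * (2.0 / math.pi) * abs(math.atan(xi))
    return max(a_min, candidate)   # never go below the lower bound


if __name__ == "__main__":
    a = 0.1
    for _ in range(200):            # zero gradient: decay to the lower bound
        a = adapt_dither_amplitude(a, xi=0.0)
    print("after convergence:", a)
    print("after a gradient burst:", adapt_dither_amplitude(a, xi=5.0))
```

With a zero gradient estimate the magnitude decays geometrically until it rests on the lower bound, and a non-zero gradient estimate (for instance after a disturbance) pushes it back up, which matches the behavior described in the paragraph above.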

Quantized ESC Application to the SEAIR Model
Following the evolution of the sanitary policies applied by the governments in the years 2020-2021, which have been periodically tightened and relaxed, a quantized ES strategy appears quite intuitive. The application of classical constrained discrete-time ESC to a SEIAR model considering objective function (6) reveals that convergence is achieved in about 100 days to a transient optimum which is drifting with time to a steady-state optimum. The ESC is able to track this optimum in a greedy way over hundreds of days. However, classical ESC considers a daily policy change, which is impractical. The quantized ESC together with a sufficiently long sampling period is a more appropriate approach. In the following, this update period is set to one month (30 days).
The QESC parametrization is based on the guidelines of [21,26,27], and is reported in Table 3. The following numerical study considers two case studies, either with bias compensation or without (in the latter case, δ is simply set to zero and never updated). Figures 4-7 show the results of the QESC application over 1000 days. In Figure 5, the input evolution in both cases is similar, even though an offset appears after 100 days. The objective function reaches a transient optimum after 100 days, as it was observed in [9], before drifting to the neighborhood of the steady-state optimum after 350 days (about 1 year). We can conclude that constraining the problem by input quantization and longer sampling periods does not deteriorate the convergence time of the ESC strategy and, furthermore, that the bias compensation allows approaching more accurately the steady-state optimum. The adaptation of the dither signal magnitude a behaves as expected since it stops decreasing between 100 and 200 days, when the gradient has, on average, not yet converged to 0. After 200 days, the exponential decrease restarts and, interestingly, the bias compensation variable δ also stops varying at the same time. This diagram confirms that, thanks to the bias estimation, the QESC is able to drive the system to the quantized level closest to the optimum J*(α_a*).

Conclusions
This study proposes an original application of quantized extremum seeking control (QESC) to solve the social distancing optimization problem in the framework of the COVID-19 pandemic. The problem formulation aims at minimizing social distancing, preserving population psychological health, while preventing the number of infections from rising above a specific level defined, for instance, by hospital bed capacities. The proposed strategy does not rely on the a priori knowledge of an epidemiological model and only adapts social distancing on the basis of an objective function measurement. Considering a compartmental SEAIR model as digital simulator of a hypothetical sanitary situation, discrete ES is able to quickly converge to a transient optimum of the objective map which slowly drifts until reaching steady-state. The proposed QESC deals with practical shortcomings such as (i) the dither signal magnitude reduction/extinction when stabilizing at the optimum, (ii) the long observation period following the application of a sanitary policy and the corresponding long sampling period constraint, (iii) the quantization of the decision policy over a limited number of decision levels and (iv) the compensation of the saturation bias. The results show no deterioration of the convergence performance while improving the simplicity of decision-making. Future work includes the combination of quarantining, testing and vaccination as new inputs, requiring strategies like multivariable ES [16], maximum-likelihood ES [29] or Newton-based ES [30].

Figure 1. Evolution of objective function (6) with respect to the input α_a, describing the cost of pandemic mitigation with respect to the social distancing level. This figure highlights a unique optimum represented by the black star. Continuous line: steady-state values. Dashed line: transient values after 200 days.

Figure 4. Application of discrete QESC to system (1): time evolution of the states. In blue: QESC with bias compensation. In dashed red: QESC without bias compensation. Even if the state variables present almost identical transient trajectories, the latter diverge after 200 days and end up in different steady-states. Non-intuitively, converging to a closer neighborhood of the cost objective optimum (i.e., optimizing social distancing) unfortunately leads to slightly higher casualties while still limiting the number of infections.

Table 3. Parameter values of the ES algorithm.