A Hybrid Modeling Technique of Epidemic Outbreaks with Application to COVID-19 Dynamics in West Africa

Simple Summary The intrinsic dynamics of the propagation of a disease changes along an epidemic course, especially for long lasting epidemics such as the COVID-19. Indeed, the natural evolution of the pathogen and countermeasures such as quarantining, lockdown, social distancing and vaccination modify the transmission dynamics of the disease. With a view to match these theoretical changes to potential changes in observed epidemiological data, we designed a hybrid modeling framework where we integrated: (1) two growth curves for daily reported positive cases, differentiating the early epidemic phase and a second phase with a potentially different dynamics; (2) two logistic regression models for daily recoveries and deaths; and (3) a SIQR (Susceptible, Infective, Quarantined, Recovered) mechanistic model to provide an overview of the dynamics of the disease in the target population. This joint modeling approach allows explicit analytical expressions for the different compartments of the SIQR model, circumventing common identifiability issues in such models. The changes in the disease transmission pattern can be subjected to countermeasures so as to assess their effectiveness along time. For illustrative purposes, we applied the approach to COVID-19 data from West Africa. It turned out that the first imported COVID-19 case(s) in West Africa likely entered the region between 28 January and 7 February 2020. Moreover, the first measures implemented by West African authorities impacted the dynamics of the disease one month after the outbreak. Abstract The widely used logistic model for epidemic case reporting data may be either restrictive or unrealistic in presence of containment measures when implemented after an epidemic outbreak. For flexibility in epidemic case reporting data modeling, we combined an exponential growth curve for the early epidemic phase with a flexible growth curve to account for the potential change in growth pattern after implementation of containment measures. We also fitted logistic regression models to recoveries and deaths from the confirmed positive cases. In addition, the growth curves were integrated into a SIQR (Susceptible, Infective, Quarantined, Recovered) model framework to provide an overview on the modeled epidemic wave. We focused on the estimation of: (1) the delay between the appearance of the first infectious case in the population and the outbreak (“epidemic latency period”); (2) the duration of the exponential growth phase; (3) the basic and the time-varying reproduction numbers; and (4) the peaks (time and size) in confirmed positive cases, active cases and new infections. The application of this approach to COVID-19 data from West Africa allowed discussion on the effectiveness of some containment measures implemented across the region.


Introduction
The ravages of the COVID-19 pandemic has deepened the need for mathematical and statistical tools to understand the dynamics of epidemics across the world. Simple mathematical models of infectious diseases are useful for providing insight into epidemic trajectories and disease dynamics [1][2][3]. However, applications should target complex but parsimonious models which make realistic assumptions and let the observed data drive estimations.
There are two common approaches to epidemiological modeling: phenomenological models and mechanistic models (e.g., compartmental models). On the one hand, phenomenological models use an empirical approach based on growth curve fitting (e.g., by nonlinear least squares [4] or by maximum likelihood [5]) to describe the temporal progression of case counts (e.g., daily confirmed positive cases). In this regard, the logistic bell curve has been widely used for various epidemic data, but it lacks flexibility for epidemics whose data exibits asymmetry or varying growth patterns [4,6,7]. With a view to allow flexibility, Tovissodé et al. [5] considered the generic growth curve of Turner et al. [8] with application to COVID-19 data. This approach concedes the simple logistic curve when it is supported by the observed data, but offers the possibility to fit various flexible growth models such as the generalized logistic model [9,10], the hyperlogistic model [8,11], the hyper-Gompertz [8] and the Gompertz curves [12,13]. However, to be realistic, models for epidemic data should be able to account for the potential effect of containment measures when implemented after an epidemic outbreak. In a target population undergoing an epidemic wave, the number of infective individuals may be assumed to follow an exponential growth in the early epidemic phase where no containment measures were implemented or the implemented measures were not yet effective [14]. In this case, the variation of the number of infective individuals is expected to shift to a sub-exponential growth resulting from negative feedbacks due to a decrease in the probability that an infectious individual meets a susceptible individual [6] or effects of the containment measures, if any. The major advantage of the phenomenological modeling approach is its simplicity while allowing the estimation of various quantities of interest to understand an epidemic, e.g., the "epidemic latency period" defined as the delays between the appearance of the first infectious case in the population ("patient zero") and the outbreak [14] and epidemic peak time and size, and the forecast of future incidence. The main limitation of phenomenological models is the inability to inform on the transmission process (new infections) and the removal processes (recovery and death) of an epidemic. As a result, phenomenological modeling lacks the ability to assess the effects of control interventions.
On the other hand, and contrary to phenomenological models, mechanistic models structure the population under study into different epidemiological states [4] and allow assessing the effects of control interventions on the population and disease dynamics. For instance, the effect of various control measures (e.g., contact limitation, detection and diagnosis) on COVID-19 transmission has been assessed using the Susceptibles-Exposed-Infectives-Recovered (SEIR) model and its variants [15][16][17]. However, because only a few epidemiological states can be observed, mechanistic models often face an identifiability issue in the estimation of model parameters [18][19][20]. In addition, there is generally no closed form solutions to the differential equations describing the considered epidemiological states. As a consequence, the estimation of compartmental models often relies on numerical approximations which make fitting procedures (e.g., nonlinear least squares or Bayesian estimation) computationally intensive and may introduce high-order errors in both estimates and forecasts [21]. Moreover, some quantities of high interest to understand epidemic outbreaks, which are readily available from a growth model including the epidemic latency period, are hard to derive under compartmental models.
This study proposes a hybrid framework to combine the advantages of phenomenological and mechanistic models while circumventing some of the limits of the two approaches. We focus on epidemic waves managed with at least an isolation measure for all identified infectives, as for the COVID-19 pandemic in nearly all the world. The objective of this work is to provide a quantitative framework in which epidemiologists can identify, from a large family of models, the parsimonious model that explains patterns in an observed dataset, and then assess hypotheses on the potential course of related but unobservable processes of interest. Specifically, we modeled confirmed positive cases using a combination of the exponential growth curve for the initial epidemic phase and the generic growth curve [8] after this initial phase. This development allows the estimation of the duration of the exponential growth phase and the theoretical time and size of the peak of new positive cases. Secondly, we modeled removal (recovery and death) from identified positive cases as binary processes using two logistic regression models to monitor the evolution and the peak (time and size) of the actives among detected cases. Finally, to provide an overall view for a target epidemic, we integrated the growth curve and the logistic regression removal rates into a mechanistic SIQR model frame [22] in which the population is structured in Susceptibles, Infectives, Quarantined (identified actives cases) and Recovered individuals. The result is a mechanistic model in which the sizes of the different states (compartments) have closed form expressions. This allows inference on various epidemiological parameters such as the delay between the appearance of the first infectious case in the population ("patient zero") and the outbreak ("epidemic latency period"), the reproduction number, the unobservable new infections per unit time as well as the proportion of the target population immunized against the pathogen of the target disease.
In addition to the estimates (with quantified uncertainty) for common epidemiological parameters, the proposed hybrid modeling framework extracts from the observed data and demographic rates, the evolution along the epidemic course of the key parameter to summarize the dynamics of an epidemic: the reproduction number. The changes in this parameter can thus be confronted to control measures promoted/enforced by public health authorities and governments. For illustrative purpose, we used the developed modeling framework: (i) to model COVID-19 case reporting data (daily PCR-confirmed positives, recoveries and deaths) from Western Africa (28 February to 31 August 2020); and (ii) to evaluate the transmission pattern of the disease in the region during the considered period. The results were used to discuss the effectiveness of some containment measures implemented by governments across the region.

The Hybrid Modeling Framework
In this section, we describe the three sub-models integrated into the proposed modeling framework, namely, the growth model, the logistic removal rates and the Susceptible-Infective-Quarantined-Recovered (SIQR) mechanistic model.

Mixture of Growth Models for Detected Cases
We assume that the cumulative number C t of reported cases, as a function of time t, has the form where t e > 0 is the duration from outbreak to the end of the exponential growth phase, is the generic growth model [8] with u t = [1 + ωνρ(t − τ)] −1/ρ , Ω > 0 is a constant such that the ultimate epidemic size (detected) is ξ + Ω, ω > 0 is the "intrinsic" growth rate constant for the sub-exponential growth phase, ν > 0 is a growth acceleration parameter, ρ (−1 < ρ < ν −1 ) is a shape parameter controlling the skewness of the growth curve during the sub-exponential epidemic phase (see Appendix A.1 for restriction related details) and τ is a constant of integration determined by the initial conditions of the epidemic. The generic growth curve ϕ t specified for t > t e encompasses many special or limiting cases including the Bertalanffy-Richards (ρ → 0), hyper-Gompertz (ν → 0 while ων 1+ρ →ω withω constant), Gompertz (ν → 0, ρ → 0 while ων →ω), hyper-logistic (ν = 1) and logistic (ν = 1 and ρ → 0) growth models [8] (see Appendix A.1 for details). The parameter ω 0 > 0 in (1) is the exponential growth rate for the early epidemic phase and τ 0 ∈ R determines the growth rate at t = 0. The constants ω 0 and τ 0 are set such that the first derivativeĊ t and the second derivativeC t of C t with respect to t are smooth at t = t e (i.e., at the end of the exponential growth phase). Specifically, whereφ e =φ t e andφ e =φ t e ;φ t andφ t are, respectively, the first and second derivatives of ϕ t (see Appendix A.1 for details); and (4) follows from setting ω 0 e ω 0 (t e −τ 0 ) =φ e . Furthermore, the real constant ξ in (1) ensures that C t does not jump at t = t e . In other words, ξ is given by ξ = e ω 0 (t e −τ 0 ) − ϕ e (with ϕ e = ϕ t e ) which by (4) simplifies to In (1), the time (in e.g., days, weeks or months) of the first identified cases corresponds to t = 1. In other words, to match (1) to the observed data, C 1 is identified to the number of cases reported in the time interval (0, 1], C 2 is the number of cases reported in the time interval (0, 2], etc. If Ω → ∞ and νρ → 0, the curve C t converges to an exponential growth curve with rate ω 0 . However, this scenario can be ruled out since the size of any target population is finite and so is Ω. In practice, the exponential growth is prevented by negative feedbacks which decrease the probability that an infectious individual and a susceptible individual meet and have an adequate contact (i.e., contact sufficient for transmission). For instance, the growth of the infectives is naturally continuously lowered by the increasing fraction of the population constituted by individuals who recovered and become less susceptible (temporarily or permanently immune) to the infection [6]. To prevent the exponential growth of the infectives, control measures such as quarantining and lockdown reduce the probability of contact between susceptible and infectious individuals, whereas some other measures such as social distancing and wearing a face mask reduce the likelihood of transmission whenever contacts happen.
The specification of the growth model in (1) to an epidemic thus implies that the growth rateĊ t , i.e., the number of new cases reported per unit time given bẏ withφ t defined in Appendix A.1, will peak and then fall toward zero case per unit time.
The peak occurs at a time t p > t e when the growth accelerationC t given by, withφ t defined in Appendix A.1, vanishes. The expressions of the time (t p ) and the size (Ċ p ) of the peak are available in Appendix A.2 for the general situation (ν = 0 and ρ = 0), as well as for limiting cases. The number of detected cases C t is the basic data reported during an epidemic. Once this has been modeled, various epidemic related quantities can be inferred upon introduction of disease related parameters (e.g., detection of infectives, recoveries and deaths) and demographic parameters (e.g., natural mortality, births and immigration).

Infectives, Epidemic Latency Period and Active Cases
Since only a fraction of infectives are identified at a time t, the number I t of infective individuals in a target population is obtained using (6) as I t = δ −1Ċ t [5], which reads where I 0 = δ −1 ω 0 e −ω 0 τ 0 is the number of infectives at the outbreak (t = 0) and δ ∈ (0, 1] is the detection rate assumed constant along the epidemic course (after the outbreak). Note that the number of infectives before the outbreak (t < 0) is obtained by back extrapolation as I t = I 0 e ω 0 t , i.e., considering an exponential growth before the outbreak [14]. We refer to the time from the appearance of the first infectious case in the population ("patient zero") to the outbreak as the "epidemic latency period". An estimate of the duration t o of this period is obtained by setting I t = 1 [14]. By (8), the duration of the epidemic latency period is estimated by t o = ω −1 0 log I 0 , which on using (4) simplifies to The number of detected and active cases, i.e., individuals tested positive and in isolation at a hospital or at home at time t, is denoted Q t following Hethcote et al. [22] for "Quarantined" state, although we refer to Q t as "Actives". Given the detected cases C t in (1), Q t satisfiesQ where α t is the recovery rate and t is the death rate (natural and disease-related mortality) of actives. Indeed, following Tovissodé et al. [5], we allow the removal rates α t and t from Q t to be time varying. This is appropriate when recovery and death data are available in addition to the reported positive cases per unit time. The two rates have here the logistic forms The number of active cases is then given by (see Appendix B for details) where Q 0 is available from Equation (A3) and represents the number of persistent cases from previous epidemic waves (isolated actives) at the outbreak of the target epidemic wave (e.g., Q 0 = 0 for a new disease-related epidemic) and F t is defined as if κ = 0 and λ = 0 1 + e κ 0 +κt 1/κ e 0 t if κ = 0 and λ = 0 1 + e κ 0 +κt 1/κ 1 + e λ 0 +λt 1/λ if κ = 0 and λ = 0 . (14)

Overall Epidemic Dynamics
The dynamics of an epidemic, as expressed by the variations of the infectives I t , is determined by the combination of the transmission rate (new infections) and the average residence time, i.e., the average duration from infection to isolation, recovery or death. The core parameter to summarize these dynamics is, at moment t, the reproduction number denoted R t , which is indeed crucial for quantifying the intensity of control measures required to control an epidemic [7].
The reproduction number is defined as the average number of secondary cases generated by a primary case. With a view to derive R t under the growth model in (1), we first consider an overall picture of the target population in order to enlighten the sources (transmission and removal) of the variations of I t as given in (8).

The SIQR Model
Following the authors of [5,14], we consider the Susceptible-Infectious-Quarantined-Recovered (SIQR) model of Hethcote et al. [22] to obtain a picture of the different states of individuals in a target population. We use the "quarantine-adjusted incidence" version [22] of this model since the underlying transmission mechanism explicitly recognizes the isolation of detected cases. In this framework, letting N t denote, at time t, the size of the target population (assumed finite but large), N t satisfies where S t is the size of the class of susceptible individuals, I t is the class of infectives, Q t is the size of the class of detected active cases and R t is the size of the class of individuals who recovered (both detected and not detected). We assume that the infection has zero latent period (susceptible individuals become infectious as soon as they become infected). The individuals in the classes R are assumed permanently immune within the period of time considered. It is also assumed that known active cases (in the class Q) do not mix with other classes and do not infect the susceptibles (i.e., the transmission rate from Q-class individuals is considered negligible). The corresponding SIQR model is described by the following set of nonlinear differential equations [22] where η is the recruitment rate of susceptibles (births and immigration); β t is the total number of adequate contacts (i.e., contacts sufficient for transmission) per unit time; µ is the per capita natural mortality rate; α t and γ are the recovery rates from actives Q t and infectives I t respectively; t and π are the death rates (natural and disease-related) for actives Q t and infectives I t respectively; and δ t is the detection rate which is null (δ t = 0) for t < 0 and equals δ t = δ for t ≥ 0. Note that (18) is the same as (10) for t ≥ 0. Unlike in [22], we allow the transmission rate β t to be time varying as a consequence of the form of the number of infectives I t already available in (8). The transfer diagram for this SIQR model is shown in Figure 1. Transfer diagram for a SIQR model with quarantine-adjusted incidence. S is the class of susceptibles, I is the class of infectives, Q is the class of detected active cases, i.e., individuals tested positive and in isolation at a hospital or at home and R is the class of individuals who contracted the disease, were detected or not, and have recovered. The individuals in class R are considered permanently immune.
The system (16)- (19) always has the disease-free equilibrium P 0 = (S = η/µ, I = 0, Q = 0, R = 0), i.e., in the absence of the disease, the population size N t approaches the carrying capacity N * = η/µ. Further discussion of the equilibria of the system are given in Appendix C.1. The availability of the number of infectives in Equation (8) makes it possible to solve the system (16)- (19). Indeed, from (17), the transmission rate, i.e., the number of adequate contacts per unit time (for I t > 0) is given by From (20), and using the same approach considered to find the number Q t of active cases in Equation (13) from the number I t of infectives in Equation (8), the expressions of the number S t of susceptibles, the number R t of recovered individuals and the total number of persons infected during an epidemic wave can be obtained (see Appendix C.2 for details).

The Effective Reproduction Number
From the definition of the effective reproduction number as the average number of secondary cases generated by a primary case, the threshold R t corresponds to the product of the transmission rate β t and the average residence time 1/(γ + δ t + π) in the class of infectives, i.e., This effective reproduction number is sometimes referred to as a "quarantine" reproduction number [22] or simply a "control" reproduction number to acknowledge the influence of isolation of identified infectives, and other control measures, if any [15]. The basic reproduction number defined as the average number of secondary infections produced when one primary infectious individual enters a completely susceptible population . This expression is simplified, assuming N o /(N o − 1) = 1 for the sake of beauty [23] and mostly because N o is large (recall this is a model assumption), as During the epidemic latency period (t o < t < 0) where the growth is exponential (İ t /I t = ω 0 ) and the detection rate is δ t = 0, the time-varying reproduction number is given by From the outbreak, the time-varying effective reproduction number during the remaining of the exponential phase has the same form It appears from (22) and (23) that R t > 1 during the whole exponential growth phase as expected. During the sub-exponential growth phase, the time-varying effective reproduction number is given by where z t =φ t /φ t (see Appendix A.1).

Epidemic Peak
The peak of new infections occurs when the second derivative of the total number of infected persons (since the beginning of the epidemic) vanishes. This peak time denoted t new satisfies t new > t e and is the solution of (see details in Appendix C.3) which can be solved for t using a numerical root finding routine such as the R [24] function uniroot or the Matlab [25] function fzero. Afterwards, the peak sizeṪ new (the maximum number of new infections per unit time) is obtained by inserting t new in (A14).

Long-Term Epidemic Dynamics
The specification of the growth model in (1) to an epidemic implicitly assumes that the number of infectives in (8) peaks at time t p and then approaches zero. The decay of the infectives after the peak can happen at various rates, depending on the growth pattern (determined by contacts between the infectives and the susceptibles or intermediate hosts), the response of the infected individual's organism (natural or induced with medicine or a vaccine) to the disease (recovery and death process) and the testing efforts (detection followed by isolation). There are actually two alternative paths from a disease-related state (i.e., I t > 0) toward the unique (disease-free) equilibrium P 0 : transmissions either stop (R t reaches zero) or continue fro a long time at a rate which cannot sustain an epidemic (0 < R t ≤ 1). These two scenarios are discussed further in Appendix C.4.

Statistical Model and Inference
To allow likelihood inference in the growth models in (1) using observed epidemiological data, we follow Tovissodé et al. [5] and assign to new reported cases Y t (t = 1, 2, · · · , n) a log-normal distribution with probability density function (pdf) where σ > 0 is a dispersion parameter (standard deviation at logarithmic scale). This specification yields the mean E[Y t ] =Ċ t and the variance Var[Y t ] = Ċ t + 1 2 e σ 2 − 1 while allowing null values of Y t . In addition, the numbers of new recoveries G t and new deaths M t from known active cases Q t (t = 1, 2, · · · , n) are modeled using logistic regression models with probability mass functions (pmf) where α t = 1 + e κ 0 +κt −1 and t = 1 + e λ 0 +λt −1 . The parameter vector indexing the pdf in (26) and the conditional pmf in (27) and (28) is θ = (Ω, ω, ν, ρ, τ, t e , σ, κ 0 , κ, λ 0 , λ) when the generic growth curve is considered for the sub-exponential growth phase. If a special case of the generic growth curve is desired, the corresponding restricted parameters must be withdrawn from θ. For instance, the use of a hyper-logistic growth curve (ν = 1) implies θ = (Ω, ω, ρ, τ, t e , σ, κ 0 , κ, λ 0 , λ) . Given Q 0 , the conditional log-likelihood of an observed series {Y t , G t , M t } with t = 1, 2, · · · , n, as a function of the parameter θ is The conditional maximum likelihood estimate θ of θ can be obtained using an optimization algorithm to maximize the log-likelihood function . Note that the three components of (θ) are separable and can be maximized independently. In other words, the parameter vector θ has the partition θ = (θ Y , θ G , θ M ) and the maximum likelihood estimates of the components θ Y = (Ω, ω, ν, ρ, τ 0 , t e , σ) , θ G = (κ 0 , κ) and θ M = (λ 0 , λ) can be obtained by maximizing Y , G and M respectively.
Since both the binomial and the log-normal distributions belong to the exponential family, we consider the common deviance statistic used in Generalized Linear Models [26] for checking the goodness-of-fit of the log-normal model associated to Y t and the binomial models associated to G t and M t . For the selection of the parsimonious model agreeing with the observed data, we consider the likelihood ratio statistic [27]. Further details on the deviance statistic and the likelihood ratio test are given in Appendix D.

Context and Objectives
The Western African region has 16 countries (Benin, Burkina-Faso, Cape Verde, Côte d'Ivoire, Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Niger, Nigeria, Senegal, Sierra Leone and Togo), covering 6,140,178 km 2 with a population of about 402,555,230 inhabitants [28] (Table 1). The first COVID-19 patient was formally identified in Western Africa in late (27) February 2020. We considered COVID-19 daily infection (PCR-confirmed cases on the day of reporting), recovery and death data, from 28 February to 31 August 2020, obtained from the Global Rise of Education Platform [29]. This period roughly corresponds to the first wave of the COVID-19 pandemic in the region [30]. We concentrated on these six months of data since the proposed modeling framework has been designed for a single epidemic wave. As of 31 August 2020, the region had 167,684 confirmed cases, among which 83.64% recovered and 1.52% died (Table 1). Although the region is heterogeneous, we treated it as if it were homogeneous. Indeed, it must be kept in mind that the reported COVID-19 cases occurred in small clusters concentrated in the main cities of each country. Hence, the sparsity of the data for the whole region actually reflect data sparsity at national and city levels.
The purpose of this analysis is to demonstrate, by example, the use of the proposed modeling framework. The specific aims are: (i) to model COVID-19 case reporting data (daily PCR-confirmed positives, recoveries and deaths) from Western Africa (28 February to 31 August 2020); and (ii) to evaluate the transmission pattern of the disease. Most West African governments have planned and subsequently implemented several control measures, either before or overlapping with the time of diagnosis of the first national cases [31]. The main sequence of public health and movement restriction measures taken by West African governments during the considered period includes personal hygiene and social distancing recommendations and isolation/lockdown ( Table 2). The adoption of these containment measures followed a sustained increment during late March 2020. The modeling results are used to discuss the effectiveness of the containment measures and the implications for the control of the further spread of COVID-19 in West African countries.

Data Analysis
All computations and statistical analyses were performed in R software [24]. The significance level of statistical tests was set to 5%.

Model Fitting
We fitted the generic growth curve to the daily new infections Y t . We used the optim routine of R software to maximize the log-likelihood (30). We also fitted three of its special cases (Bertalanffy-Richards, hyper-logistic and hyper-Gompertz), which were compared to the generic model fit using likelihood ratio tests. Instead of directly maximizing the log-likelihoods (31) for θ G and (32) for θ M with the optim routine, we used the glm routine of R with the family specification "family = binomial(logit)". Since COVID-19 was a new disease in 2020, we considered the number of known active cases Q 0 = 0 at t = 0 in (27) and (28). We plotted the daily new positives, recoveries, deaths and actives to provide graphical insights in the fitted models.

Overall Epidemic Dynamics
We analyzed the overall dynamics of the COVID-19 epidemic in West Africa using the mechanistic SIQR model described in Section 2.3. The rate parameters δ (detection rate), γ and π (recovery and death rates in infected but non-detected individuals) cannot be estimated using only the available data sequence {Y t , G t , M t } (daily new positives, recoveries and deaths) without additional assumptions on their relationships with the rate parameters for detected cases (α t and t ). We obtained from the literature δ = 0.009 [30] and γ + π = 1/10 [14,30] and assumed that the ratio of the daily recovery probability to the daily death probability in non detected infectives is equal to this ratio in the detected individuals at outbreak, i.e., before the implementation of treatments, if any. From γ/π = α 0 / 0 ≈ 5.1495, we obtained γ = 1/11.9419 and π = 1/61.4953.
Two demographic parameters are required in the SIQR model: the daily recruitment rate of susceptibles (through births and immigration) η (individuals/day) and the per capita natural mortality rate µ (day −1 ). Using the birth rate ρ b (total births and net immigrations in a period of length L divided by the average population size N during this period), the recruitment rate η was estimated by Under "natural" (i.e., disease-free) conditions where N t = S t , the variation ∆N of the population size N t over a period of length L satisfies where N i is the population of West Africa at the beginning of the period. The Equation (34) follows by (A5) with I 0 = 0. The variation ∆N of the population size is given by ∆N = r b N − r d N, where r b N represents the total recruitment during the period and r d N represents the total number of deaths with ρ d the mortality rate (individuals/day). Consequently, µ can be obtained by solving (34) for µ using ∆N = (r b − r d )N. We considered L = 365.25 days, N = 401,861,254, N i = 397,429,929 [28]. Using the annual birth (32.816/1000) and death rates (7.952/1000) [32] and the net annual immigrations (−177,000 individuals) in West Africa [28], we obtained the rates r b = (32.816/1000) − (177,000/N) = 32.371/1000 and r d = 7.952/1000. By (33) and (34), we then found and used for our analyses on West Africa, η = 35,615.35 individuals/day and µ = 2.1745 × 10 −5 day −1 . We plotted the daily number of new infections, infectives and recovered individuals, as well as the reproduction number in the West African population.

Standard Error and Confidence Interval
Standard errors were obtained for quantities calculated using estimated model parameters by the delta method [33]. For a positive definite parameter or calculated quantity φ in general, we first found the estimateφ and its logarithmic scale-standard errorσ φ by the delta method and computed its logarithmic scale-mean given byμ φ = logφ − 0.5σ 2 φ . We then obtained the bounds of its shortest confidence interval as described by Dahiya and Guttman [34].

Growth Curve for New Positives and Logistic Regressions for Removals
The results of the likelihood ratio tests comparing the generic growth model (1) against its closest special cases are presented in Table 3. The growth model involving the generic growth curve was retained. Indeed, the combination of an early exponential growth and the generic growth models was found to be the best growth model for the new positive cases in West Africa, as compared to the combinations of the exponential growth with the Bertalanffy-Richards, hyper-logistic and hyper-Gompertz growth models (Table 3; p-value < 0.001).  The deviance based χ 2 test for overall goodness-of-fit (Table 4) indicates a lack-offit (p-value < 0.001), with an overall adjusted-deviance reduction ratio of r 2 dev = 11.60%. Looking for the sub-models, we noticed that the estimated growth curve is significantly different from the corresponding null model fit (p-value < 0.001) and does not lack fit (p-value = 0.6115). Indeed, the adjusted-deviance reduction ratio is r 2 dev = 95.26% (the adjusted-coefficient of determination is r 2 a = 99.96%). The overall lack of fit is due to the logistic regression fits for the daily recoveries (r 2 dev = 9.25%) and deaths (r 2 dev = 49.08%). We nevertheless kept these fits because there are significantly different from the corresponding null model fits (p-value < 0.001). Table 4. Deviance based goodness-of-fit test results for the combination of an early exponential growth curve with a generic growth curve (fitted to daily PCR-confirmed positives) and logistic regression models (fitted to daily numbers of recoveries and deaths) using West African COVID-19 data from 28 February to 31 August 2020.
The exponential growth phase lasted about one month (t e = 29.48, CI(t e ) = [26.94, 31.79] days) after the outbreak ( Table 5). The growth curve fitted to the cumulative positive cases is given by  (35) where t is the time (day) from the outbreak. Figure 2A shows the daily confirmed positive cases and the fitted growth curve based on a log-normal error structure. The observed peak of new positives happened 148 days after the outbreak (24 July 2020) and amounted to 2626 positive cases. However, the number of positive cases showed a high variability around this date (16-29/07/2020), with most daily records roughly ranging between 1600 and 2000 new positive cases (Figure 2A) around an average of 1803 cases (with standard error SE = 86.48). The estimated peak time for the new positive cases was around 15 July 2020, i.e., about 139 days after the outbreak (Table 6), and the estimate of the peak size is about 1805 new positive cases (CI(Ċ p ) = [1643. 19,1969.86]). Assuming a log-normal distribution, the 95% prediction interval for the peak size is PI(Ċ p ) = [1368.93, 2669.55] new positive cases, which includes the observed value. The 95% prediction interval for the peak time is PI(t p ) = [126.59, 151.65] days, which also includes the observed peak time. Table 5. Estimate, standard error (SE), Wald test statistic (z-value), p-value (P(> |z|)) and 95% confidence interval (CI 95% ) for the parameters of the combination of an early exponential growth curve with a generic growth curve (fitted to daily PCR-confirmed positives) and logistic regression parameters (fitted to daily numbers of recoveries and deaths) using West African COVID-19 data from 28 February to 31 August 2020. Table notes: ind., individuals; t e (day) is the duration of the exponential growth phase after the outbreak; Ω and ξ (ind.) determine the ultimate epidemic size (detected) as ξ + Ω; ω > 0 (day −1 ) is the "intrinsic" growth rate constant for the sub-exponential growth phase; ν > 0 is a growth acceleration parameter, ρ is a shape parameter controlling the skewness of the growth curve during the sub-exponential growth phase; τ (day) is a constant of integration determined by the initial conditions of the epidemic outbreak; σ is the logarithmic-scale standard deviation of the log-normal distribution fitted to the daily new positive case reporting data; κ 0 and κ are the logit-scale intercept and slope for the daily probability α t that an active case recovers at time t (α t = 1/(1 + e −(κ 0 +κt) )); λ 0 and λ are the logit-scale intercept and slope for the daily probability t that an active case dies at time t ( t = 1/(1 + e −(λ 0 +λt) )); ω 0 (day −1 ), τ 0 (day) and ξ (individuals) are not free parameters, but computed using Equations (4) and (5); ω 0 is the growth rate during the exponential growth phase; τ 0 and ξ ensure that the daily number of positivesĊ t and the cumulative number of positives C t are smooth at t = t e ; * z-value was computed at logarithmic scale for positive definite parameters (t e , Ω, ω, ν, ρ, σ and ω 0 ), so that a p-value < 0.05 indicates significant difference from 1 at 5% level.

Parameter
Based on the logistic regression parameters shown in Table 5, the probabilities of removals from the actives (quarantined) are shown in Figure 3. The probabilities of recovery and death areα 0 = 0.0169 andˆ 0 = 0.0033, respectively, at outbreak (t = 0). The recovery probability then improved, with an odd ratio (recover/not recover) increasing on average  , daily recoveries α t Q t , (B), daily deaths t Q t (C) and known actives cases (quarantined at home/hospital) Q t (D) in COVID-19 daily case reporting data from West Africa (28 February to 31 August 2020). The fitted curves are based on a combination of an early exponential growth model and a generic growth model with log-normal error structure for the daily new positive casesĊ t , two logistic regression models for the probabilities of recovery (α t ) and death ( t ) and the combination ofĊ t , α t and t (using (13)) for actives Q t . Two outlying data points (6006 recoveries on 20 June 2020 and 11,468 recoveries on 4 August 2020) were removed from the graph (B) for a better visualization.  , basic reproduction number; I 0 , number of infectives at outbreak; R 0 , reproduction number at outbreak; t p , time of the peak of positive cases;Ċ p , size of the peak of positive cases; t new , time of the peak of new infections;Ṫ max , size of the peak of new infections; t Qmax , time of the peak of active cases; Q max , size of the peak of active cases; R 186 , total number of recovered in the population at t = 186 days (i.e., at the end of the studied period (31 August 2020)); -indicates not applicable. Figure 2B,C shows the removals (daily recovery and death) and the fitted values based on the logistic regression models for removal probabilities. We noticed that the lack-of-fit (indicated by the residual deviance test) is due to the very large variability of the observed daily proportions of recoveries and deaths. However, despite the lack-of-fit in the logistic regression fits, the use of the related recovery and death probabilities (α t and t ) along with the fitted growth curve (Ċ t ), resulted in fitted active cases (Q t ) agreeing to a large extent with the observed daily actives ( Figure 2D), with an adjusted-coefficient of determination of 97.08%. The peak of known active cases (Q t ) was on 19 July 2020 and amounted to 41,435 actives. The fitted peak is about 42,507 actives around 26 July 2020 ( Table 6). The 95% prediction interval is PI(Q max ) = [34,807.25, 50,893.54] actives for the maximum of active cases and PI(t Q max ) = [139.92, 159.71] days for the peak time t Q max (16 July to 5 August 2020).

Overall Epidemic Dynamics
The estimate of the duration of the epidemic latency period (delay between the arrival of the first infectious individual and outbreak) is about 25 days (CI(t o ) = [19.91, 29.87] days; see Table 6). Accordingly, the first imported COVID-19 case(s) in West Africa likely entered the region during the last week of January and the first week of February (28 January  Figure 4 shows the curves of the daily number of new infections (Ṫ t ), the daily number of infectives (I t ) and the immune fraction of the population (R t = K t + U t ). As expected, the peak in new infections occurred before the peak in detected infected individuals (observed 143 days after the outbreak). The time-varying effective reproduction number is shown in Figure 5. It appears that the effective reproduction number first decreased during the sub-exponential growth phase (from 2.52 on 27 February 2020), reaching 1 on 15 July and 0.66 on 31 August 2020. The effective reproduction number attained a minimum value of 0.61 on 29 September 2020 and then increased with a dynamics indicating R ∞ = 1.  19)) with rate parameters δ = 0.009 day −1 (detection rate), γ = 1/11.9419 day −1 (recovery rate for non detected), π = 1/61.4953 day −1 (death rate for non detected), η = 35,615.35 individuals/day (recruitment rate) and µ = 2.1745 × 10 −5 day −1 (natural mortality rate).  19)) with rate parameters δ = 0.009 day −1 (detection rate), γ = 1/11.9419 day −1 (recovery rate for non detected), π = 1/61.4953 day −1 (death rate for non detected), η = 35,615.35 individuals/day (recruitment rate) and µ = 2.1745 × 10 −5 day −1 (natural mortality rate).

Discussion
The importance of mathematical models in understanding and predicting the course of an epidemic outbreak and in assessing the impacts of public health control measures has been well documented in the current context of the COVID-19 pandemic [15,[35][36][37]. Whereas phenomenological modeling is limited in the scope of inference, compartmental modeling faces identifiability issues and is usually computationally intensive [38]. This study proposes a hybrid modeling framework which combines phenomenological and mechanistic modeling approaches to assess the dynamics of epidemic outbreaks while circumventing some of the limitations of each approach. We illustrate our description of the different epidemiological aspects that the hybrid modeling framework deals with using COVID-19 data from West Africa (28 February to 31 August 2020). It is worth noting that the heterogeneity of the West African region in terms of testing and reporting policies, especially for the first epidemic wave, is an important limitation for this application. This is systematically true for any regional assessment of the pandemic [15]. Our analysis aims to provide an overall view of the dynamics of the pandemic in the West Africa. However, the analysis of the data from each country may be conducted to obtain finer country-specific results (for some countries, these may significantly deviate from the overall trend).
The proposed modeling framework uses a combination of the exponential growth model for the initial dynamics of the epidemic and a generic growth curve [8] to capture the observed patterns in the number of detected positive individuals. This phenomenological model is flexible, includes many special cases and thus allows selecting the effective parsimonious model fitting the observed data based on likelihood ratio tests [27,39] or information criteria such as the Akaike's Information Criterion [40]. The effectiveness of this approach to phenomenological modeling has been demonstrated on COVID-19 data [5]. Our application on COVID-19 data from West Africa nevertheless showed that the logistic regression of recoveries and deaths in the identified positive individuals against time can lack fit, as measured by an asymptotic χ 2 test on the residual deviance statistic. Nevertheless, these fits can be improved by adding explanatories (different from time, but related to available health facilities) in the logistic regression models. The deterministic SIQR model [22] considered for mechanistic modeling explicitly acknowledges the isolation of the detected positive individuals. It does not, however, include an exposed (E) state as in the SEIQR model [41]. The use of the SEIQR model may provide better insights on the effectiveness of control measures since most of the measures first impact the exposition of susceptible individuals. In general, the proposed modeling approach can be extended by considering more complex models such as the SEIQR and the SIDARTHE model [42] instead of the SIQR model considered herein.
Among interest quantities provided by the hybrid modeling framework, we have the epidemic latency period t o (the time from the appearance of the first infectious case in the population to the outbreak). For the West African region, the result indicates that the first imported COVID-19 case(s) in West Africa likely entered the region around 28 January-7 February 2020. To the best of our knowledge, this is the first estimate of this duration in the region. This epidemic latency period is much lower than the 40 days estimated for Italy [14]. This is in line with the relatively late arrival of the virus in the region, compared to the Asian and European continents, and the prevention and detection measures anticipated by many West African governments [31]. We obtained a basic reproduction number (CI(R o ) = [2.60, 2.69]) higher than the estimate (CI(R o ) = [1.84, 1.87]) obtained by [15]. Our estimate is, however, closer to country-specific estimates obtained for Nigeria (CI(R o ) = [2.37, 2.47]) [43] and Ghana (CI(R o ) = [1.99, 3.37]) [44].
During the early phase of the epidemic after the outbreak in West Africa, the detection and isolation of a fraction of infected individuals reduced the reproduction number from R o to a control reproduction number of R 0 = 2.52, i.e., about 5.26% decrease. We estimated the duration of this phase characterized by an exponential growth to be about one month after the outbreak. This implies that the control measures implemented by West African governments to limit the transmission of the disease were not effective on average before April 2020. Indeed, apart from measures taken to limit the importation of new positive individuals (travel bans), many actions to limit the local propagation of the disease were first implemented in late March 2020 [31]  After the exponential growth phase, the sub-exponential growth pattern allowed the epidemic to peak. The estimated peak time for the detected positive cases was around 15 July 2020, and close to the observed peak time (24 July 2020). This estimated date has a delay of about eight days with respect to the estimated peak time of new infections (CI(t new ) = [126. 18, 136.11] days). This estimate is higher than the estimate (CI(t new ) = [108, 112] days) obtained by [30]. These contrasting results may be related to the more realistic SIQR model considered in this work as compared to the simpler SIR model used by Honfo et al. [30] who ignored the quarantine-adjustment of the disease incidence [22]. On the contrary, the estimated maximum number of new infections (CI(Ṫ max ) = [20,284.04, 24,464.98]) agrees with the estimate (CI(Ṫ max ) = [24,239,26,294] new infections) obtained by Honfo et al. [30].
Our results show that the time-varying effective reproduction number has decayed over April-August 2020, reaching 1 on about 15 July 2020 and 0.66 at the end of the considered period (31 August 2020). Based on the modeled dynamics, the effective reproduction number likely reached its minimum value 0.61 around 29 September 2020. However, the reproduction number likely increased again to approach R ∞ = 1 in the long run. Overall, the various measures decided and enforced by different West African governments, against the first COVID-19 epidemic wave in the region, were able to contain the propagation of the disease (importation of new cases and local transmission) in five months.
However, the COVID-19 pandemic will remain an important issue for a long time, and local region's endemic to the pathogen will likely appear in the long run. This is so because of the following factors: the re-opening of borders and airports in the region to limit the related economic feedback [45,46]; the relaxation of measures such as the ban of sport, political, cultural and religious gatherings [31,47]; and the natural evolution of the SARS-Cov-2 virus [48][49][50][51]. The limited resources and capacity of Sub-Saharan Africa countries in general [52][53][54] to immunize their population through vaccination will compound this threat in the region.

Conclusions
There are two common approaches to epidemiological modeling: phenomenological models and mechanistic models. This study proposes a hybrid framework which combines the two approaches, starting from fitting curves to observed data (confirmed positive cases, recoveries and deaths) and then providing an overall view of the epidemic dynamics by integrating the fitted curves into a compartmental model. The proposed approach allows estimating the delay between the appearance of the first infectious case in the population and the outbreak ("epidemic latency period"); the duration of period during which the epidemic growths exponentially; the basic and control reproduction numbers; and the peaks (time and size) in positive cases, active cases and new infections. An application to COVID-19 data from West Africa indicates that the hybrid modeling framework can be used to match effective control measures dictated by health policies with changes in the transmission dynamics of the studied disease.

Data Availability Statement:
The authors confirm that the data supporting the findings of this work are available within the article. Table A2. Peak time (t p ) and size (φ p =φ t p ) of the generic growth curve [8] and its limiting cases.
I t = 0 by (8). Consequently, the disease always dies out in the long run and the system tends to the disease-free equilibrium P 0 . This happens because the fraction of infectives in the population decreases to very near zero and the fraction of quarantined (Q) decreases to zero (through recovery and death). Eventually, over 100 or more years, the recovered people (R) slowly die off and the birth process slowly increases the susceptibles (S), until everyone is susceptible at the disease-free equilibrium P 0 [55]. Note, however, that the SIQR model described by (16)- (19) together with (8) is meant for a single epidemic wave, whereas it is possible to have successive epidemic waves or even overlapping epidemic waves [1] which would be described by a mixture of many SIQR models.
From (18), the number Q t of known active cases before the outbreak, is given by on assuming constant recovery (α 0 ) and death ( 0 ) rates before the outbreak and on denoting Q o the number of persistent cases from previous epidemic waves (e.g., Q o = 0 for a new disease-related epidemic).

Appendix C.2. Susceptibles, Recovered, Total and Lost Cases
In addition to infectives (I t ) and actives (Q t ) already available from the growth curve C t , the computation of the population size in (15) requires the expressions of the sizes of the compartments of susceptibles (S t ) and immunes (R t ). Inserting Equation (20) into (16) and replacing in light of (8)İ t /I t = ω 0 for t ≤ 0 andİ t /I t =C t /Ċ t for t > 0 yieldṡ Therefore, the number of susceptible individuals is given for −t o ≤ t ≤ 0 by where S o is the number of susceptibles at the beginning of the epidemic, obtained from (15) with t = −t o (I o = 1) as where N o is the initial population size (i.e., at t = −t o ) and K o is the number of known immune individuals at the beginning of the target epidemic (recovered from past outbreaks if any). The number of susceptibles after the outbreak (t > 0) is where S 0 is the number of susceptibles at the outbreak (t = 0) and is available from (A5), S e = S t e is the number of susceptibles at the end of the exponential growth phase and z t =φ t /φ t is the ratio of the growth accelerationφ t to the growth rateφ t ( Table 2). From the transfer diagram in Figure 1, the total number of individuals who were infected and then recovered, and are alive can be decomposed as where K t is the number of individuals who were tested positive, were isolated and then recovered (known) and U t is the number of individuals who contracted the infection but were not detected and have recovered (unknown). Equation (19) is then equivalent to the systemK From (A9), the number of known recovered individuals K t is given for −t o ≤ t ≤ 0 by After the outbreak, K t is given by where K 0 (available from (A11)) is the number of known recovered individuals before the considered outbreak (recovered from past outbreaks if any) and K e = K t e is the number of known recovered individuals at the end of the exponential growth phase. From (A10), the number of unknown recovered individuals is where U e = U t e is the number of undetected and recovered cases at the end of the exponential growth phase. The total number of persons infected during an epidemic wave is indicative of the overall cost of the epidemic in terms of its overall impact on the society (in regard to, e.g., health, work and communication). The total number of new infections denotedṪ t is given byṪ where I z = I t z is given by (8). The number of new detected cases isĊ t = δI t as before, but the number of known active cases becomes where Q z = Q t z is given by (13) and F z = F t z is given by (14). Whereas the number K t of known immunes has the same expression given in (A12) with Q t given by (A22), the number U t of unknown immunes becomes where U z = U t z is given by (A13). From (A21), the number of infectives falls to 1 at time Finally, since the removal rate of infectives is γ + δ + π per unit time, the probability that the number of infectives drops to zero at a time t end = t f + r with r a non-negative integer is (γ + δ + π)(1 − γ − δ − π) r . Under this scenario, the system (16)- (19) will tend to the disease free equilibrium P 0 at which the size of the population stabilizes at N * = η/µ.

Appendix C.4.2. Asymptotic End of Transmissions
When the shape of the curve of infectives has growth parameters such that z t = ϕ t /φ t > −(γ + δ + π) for t > t e , the transmission of the disease does not stop straightly, but continues at a low rate. Indeed, under this scenario, inserting the limit lim t→∞ I t = 0 in (24) yields where z lim = lim t→∞ z t ≤ 0 is available in Table A3 (note that z lim = −νω when ρ → 0 and z lim = 0 otherwise). Therefore, R ∞ ≤ 1 and the population asymptotically tends to the disease-free equilibrium P 0 [22]. However, if z lim > −(γ + δ + π), then we also have R ∞ > 0. For instance, under the simple logistic growth model (ν = 1, ρ → 0), z t decreases and tends to −ω as t → ∞ (Table A3) and R ∞ = 1 − ω/(γ + δ + π) which satisfies 0 < R ∞ < 1 (from ω < γ + δ + π). In general, when ρ = 0, the shape of ϕ t may allow z t to properly decrease for t > t e and become negative from t > t new so that R t < 1. However, when z t reaches its limit z lim > −(γ + δ + π), it bounces and tends to 0 (Table A3) so that R ∞ = 1.
The limit (A25) shows that, when R t does not sharply reach zero but ρ → 0, the asymptotic reproduction number depends on rate parameters (γ, δ and π) that can be controlled to hasten the disease to die out. In the situation, where ρ = 0, R ∞ is independent of model parameters, so that the long run dynamics is less likely to respond to changes in the rate parameters.

Appendix D. Goodness-of-fit and Model Selection
We define the likelihood s of the saturated model by replacingĊ t in (26) by the observed values Y t , α t in (27) by the observed daily recovery probabilities G t /(Q t−1 + Y t ) and t in (28) by the observed daily death probabilities M t /(Q t−1 + Y t ). Similarly, we define the likelihood n of the null model by replacing eachĊ t by the daily mean count Y = n −1 ∑ n t=1 Y t , each α t by the overall daily recovery probabilityᾱ (obtained assuming κ = 0) and each t by the overall daily death probability¯ (obtained assuming λ = 0). The residual deviance of the maximum likelihood fit is then given by DEV res = 2 s − ( θ) and the null deviance of the null model fit is given by DEV null = 2( s − n ). The quantity DEV res is a statistic to test the null hypothesis H 0 : the assumed model is not significantly different from the unknown model that generated the data. If H 0 is true, then the large sample distribution (i.e., as n → ∞) of DEV res is the χ 2 k distribution with k = n − m degrees of freedom where m = 12 is the number of individual model parameters in θ [56]. If the overall goodness of fit test based on DEV res rejects H 0 , then the corresponding statistics (residual deviances) can be computed for the three sub-models (i.e., considering the loglikelihoods in (29)-(32)) to identify the sub-models lacking goodness-of-fit. The percentage of information explained by the maximum likelihood fit for the cumulative data can be evaluated using the common adjusted-coefficient of determination where r = cor(C t , Y .t ) is the Pearson's correlation coefficient between C t and Y .t = ∑ t j=1 Y j , and m Y is the number of individual model parameters in θ Y . The explanative power of the overall fit can be assessed via the adjusted-deviance reduction ratio [57] Let H(θ) be the hessian matrix of and define the asymptotic covariance matrix Σ(θ) = −[H(θ)] −1 . In a large sample, the covariance matrix of the maximum likelihood estimate θ is estimated by Σ = Σ θ and square roots of the diagonal elements of Σ provide standard errors for individual parameters in θ. For the selection of the parsimonious model agreeing with the observed data, the likelihood ratio statistic can be used. To test a null hypothesis H 0 against an alternative H 1 with q > 0 restrictions fewer than H 0 , the likelihood ratio (LR) statistic is given by [27] LR = 2 ( θ (1) ) − ( θ (0) ) (A28) where θ (0) is the estimate under H 0 and θ (1) is the estimate under H 1 . If the null hypothesis H 0 is true, the test statistic LR converges in distribution to the χ 2 (q) distribution with q degrees of freedom as n → ∞ [39]. There are however distinct special cases of model (2) leading to the same number of parameters (m). For example, we have the Bertalanffy-Richards (m = 7), hyper-logistic (m = 7) and hyper-Gompertz (m = 7). We also have the Gompertz (m = 6) and logistic (m = 6). In these situations, q = 0 and the likelihood ratio test cannot be used. Thereafter, we suggest to consider information criteria such as the Akaike's Information Criterion (AIC) (the lower, the better): AIC = −2 ( θ) + 2m [40].