A Bayesian Model of COVID-19 Cases Based on the Gompertz Curve

: The COVID-19 pandemic has highlighted the need for ﬁnding mathematical models to forecast the evolution of the contagious disease and evaluate the success of particular policies in reducing infections. In this work, we perform Bayesian inference for a non-homogeneous Poisson process with an intensity function based on the Gompertz curve. We discuss the prior distribution of the parameter and we generate samples from the posterior distribution by using Markov Chain Monte Carlo (MCMC) methods. Finally, we illustrate our method analyzing real data associated with COVID-19 in a speciﬁc region located at the south of Spain. Results of the models described in ﬁrst and second scenario can be checked in the ShinyStan App at and Conﬂicts of Interest: The declare no conﬂict of


Introduction
In December 2019, the number of pneumonia cases inexplicably increased in China. Later, scientists discovered that they were caused by a novel kind of coronavirus, called SARS-CoV-2, which appeared for the first time in Wuhan, China, see [1,2]. From that date, the disease began spreading in a huge number of countries and regions outside China where have confirmed new cases and deaths almost every day. The COVID-19, the disease related to this new coronavirus, has had a huge impact in human health and social life all over the world, even more than some other infectious diseases occurred in recent years. At the time the paper was submitted, it had already affected 68.31 millions of people with 1.56 millions of deaths (taken from Our World in Data, https://ourworldindata.org/). In fact, COVID-19 has become an important risk factor of mortality currently around the world.
Due to the impact on health, society and economy there is a need to find mathematical models that lead us to understand the evolution of that disease and evaluate if a particular policy has been successful in achieving the intended outcomes, e.g., to reduce the infections. It is well-known that the infectious disease transmission is a complex diffusion process due to social relationships. Different models have been widely developed in the literature to study the transmission process of infectious diseases theoretically, that allows us accurately predict the future development trend of infectious diseases, see among others [3][4][5][6][7][8][9][10]. While the traditional epidemiological models describe the dynamic behavior of the diseases through differential equations allowing the laws of transmission within the population, the statistic models (also so-called phenomenological models) which follow certain laws of epidemiology [9,11], are widely used in real-time forecasting for infection trajectory or size of epidemics in early stages of pandemic [12,13].
The main objective of this paper is to develop a model for the cumulative number of COVID-19 cases from a Bayesian perspective. Once the model is fitted, we will be able to make predictions to evaluate the trend of new infected cases. Bayesian methods provide an excellent theoretical framework for analyzing experimental data and the main key of its success lies on its ability to incorporate prior knowledge about the quantity of interest as a distribution function. In this case, there exists a lot of information about the new coronavirus during the pandemic which is worth to take into account as well as the behaviour of other types of coronavirus.
The goal of the Bayesian approach is to learn about the parameters which describe our phenomenon of interest by taking into account different sources of information. Often, the decision makers have access to external information such as expert views and past studies or data from other locations. This previous knowledge is incorporated into the Bayesian analysis as the prior distribution. It is well known that the prior distribution leads in general to a better estimation of the quantity under study, when it is used together with experimental data. Thorough reviews of the Bayesian approach can be found in [14][15][16].
Let [X|Θ = θ] = X θ be the underlying observation having a probability density function (PDF), p θ (x), where θ means the unknown parameter belonging to the parameter space Θ ⊆ R n , n ∈ N, n ≥ 1. As explained above, under the Bayesian approach, prior beliefs about parameters are combined with sample information based on the experience from a sample x = (x 1 , . . . , x n ) of the variable X θ by using the Bayes theorem.
Let π be the prior belief on Θ which incorporates our beliefs about the parameter θ before any data observation. It is also common to denote by π(θ) the PDF of a particular prior distribution π. In literature, it is possible to find that prior distributions can be obtained using many methods. It is remarkable that, in general, it is not easy to find the best way to express the prior information as a prior distribution function. However, insightful choice of prior may be crucial for obtaining a proper estimate of the posterior.
At this point and based on a sample of the underlying distribution, x = {x 1 , . . . , x n }, jointly with the prior density, π(θ), and Bayes' theorem we obtain the posterior distribution, denoted by π x , as a random variable having the following PDF where l(θ|x) denotes the likelihood function of the sample and m π (x) denotes the marginal density given by Just as the prior distribution π reflects the knowledge about θ before any experimentation, so π x reflects the update belief about θ after having a sample x. That means that the posterior distribution mixes the prior belief with the information contained in the sample about θ. For further information see [14]. Finally, the posterior distribution can be used to solve all standard statistical problems, like point and interval estimation, hypothesis testing and predictions. Recently, we find a rapid increase in the number of publications related to model COVID-19 using Bayesian techniques in literature, see for example [17][18][19][20][21].

The Model
Our interest is focused on finding a probabilistic model to describe the evolution of the SARS-CoV-2 in a specific region and forecast the number of new cases in near future time intervals from a Bayesian perspective. Based on the interpretation of the model as a complex system, we assume the total number of infections experienced up to time t is a non-homogeneous Poisson process (NHPP) denoted by {N(t), t ≥ 0}. One of the main issue in the NHPP model is to determine an appropriate intensity function, λ(·), which leads us to an increasing and invertible mean value function representing the expected number of infections experienced up to t, i.e., Further details on the statistical analysis and NHPPs can be found in [22,23] and a comprehensive catalogue of intensity functions is given in [24].
For the purpose of this work, we will consider the classical Gompertz curve to explain the cumulative number of new cases. Our choice is based on the intuitive biological interpretation of its parameters and the fact that this curve is widely used in growth analysis in Biology. Additionally, the Gompertz curve is of particular interest to describe a growth curve for population studies in situations where growth is not symmetrical about the point of infection, see [25][26][27][28], for further information about the Gompertz model. Moreover, the Gompertz curve have been widely used in epidemiology and virology to explain the behaviour of many biological processes, see for example [29][30][31][32]. Concerning to the COVID-19 we refer the readers to [33][34][35][36][37][38]. Finally, we also recommend readers see [39] where authors propose a generalized Gompertz growth model. Remarking its limitations we will find that the Gompertz model does not address the core issues of epidemiological models, namely, the well-mixing hypothesis and lack of spatial influences.
Among the different reparameterisations we find in literature we will consider the following expression of the Gompertz curve given by where t represents the time since the first case of infection and a, b and c are parameters having a biological interpretation. A detailed interpretation of those parameters can be found in [33]. To sum up, a represents the upper asymptote of infections and also determines the area under the curve ∂g(t|θ)/∂t, b sets the displacement along the time and it is related with the initial cases at time zero, g(0|θ), and also determines the location of the maximum on the time axis, t max = ln(b)/c, as we will discuss later on. Finally, c is a coefficient that determines the exponential decay rate of the relative growth rate of g(t), i.e., It is also worth mentioning that 1/c measures the width (duration) of the curve, see [33] for further information.
It is clear that the Gompertz curve considers some initial counts at time zero and also we should take in account that N(0) is assumed to be zero in a NHPP process. Therefore, in order to avoid the problem of detecting the initial moment, in other words the disease initial outbreak, we will consider A straightforward computation shows that the intensity function is given by

The Likelihood Function
Let N(t) be a Poisson process with intensity function λ(t|θ) given in (4). Suppose that the vector of observed times x = (t 1 , . . . , t n ) recorded in the interval (0, T], where T is a known value, satisfies t 1 < . . . < t n , then, from Theorem 5.4 in [23], the likelihood function is given by

The Prior Distribution
As has been mentioned, the prior distribution represents prior beliefs and tries to reflect the analyst's pre-data knowledge about the parameter. Among the various ways of choosing a prior, see Chapter 3 in [14], we will consider the objective and informative one. Our choice is based on the fact that official media provide a vast amount of information around the world. It is reasonable that all this huge amount of information can help us to formulate a proper prior.
In our case, the specific prior belief π(θ), θ = (a, b, c) ∈ R + × R + × R + is a multivariate random vector having a particular dependence structure. We first try to identify the marginal distribution associated with the parameter a. We recall that it represents an expected asymptote of the cumulative number of infections. For our purpose, we collect data about new confirmed cases per day per 100,000 people in different countries in the world and, more specifically, in the different regional governments in Spain, Autonomous Communities (AC). At first glance, data are far from being normal distributed, but right skewed and having a heavy right tail. This is not surprising if we take in account the effect of many uncontrollable sociopolitical covariates in each region or country. At this moment we decided just consider the Spanish AC and we fit different heavy-tailed distributions to the observations by using parametric methods, see Figure 1. Among all distributions were tested, the Birnbaum-Saunders, gamma, log-normal and inverse Gaussian distributions seem fit the data, (p-value > 0.80). Finally, we decided to use the inverse Gaussian for different reasons. First, it's suitable for modeling phenomena where there is a greater likelihood of getting extremely large values compared to other distributions, which agrees with the high contagious nature of the new disease. Second, it better reflects a sharp peak in the histogram. Finally, it has the advantage it is easier to estimate probabilities. Then, just denoting by r the expected new confirmed cases per day per 100,000 people, it is assumed that r follows an inverse Gaussian, denoted by r ∼ IG(µ, β), having a mean parameter µ > 0 and a shape parameter β > 0.

Remark 1.
For the Spanish ACs, the choice of the inverse Gaussian seems reasonable as we have argued before. However, a more detailed study should be necessary to propose a prior distribution having a valid global interpretation. Anyway, due to the particular nature of r, we think a rightskewed density should be always a nice choice. This could be a subject for future research.
From the previous arguments, considering a population of P inhabitants in a specific region and taking into account that inverse Gaussian distributions are a scale family, the specific marginal prior belief for the parameter a is also an inverse Gaussian, IG(αµ, αβ), where α = P/100,000 informs us about the size of the population. Therefore, the baseline prior density is given by At this point we will analyze the parameter b. From Expression (2), it is well-known that the parameter b depends on both the initial cases and the asymptote a and can be expressed as where M 0 represents the unknown initial cases at time zero in a specific region. It is not unrealistic suppose those initial cases are independent of the parameters and can be assumed to follow a discrete uniform distribution with range {1, . . . , M}, denoted by U {1, M}, where M represents a bound for the infections when considering t = 0. From the previous arguments, the conditional prior distribution of b given a also follows a discrete uniform distribution with range {ln(a/M), ln(a/(M − 1)), . . . , ln(a)}, that is, Remark 2. Here in Expression (7) we have induced some variability in the number of cases at time zero. This could be especially useful when the reported cases could be lower than the real cases. The role of M can be even interesting to detect infected group arrivals. This argument leads us to assume that the conditional distribution π(b|a) is discrete, although the distribution for b is continuous. This can be easily observed just computing π(b). It is also clear by construction that we introduce a dependence structure between parameters a and b. On the other hand, we realize the difficulty to establish the "time zero". For such a purpose, it can be defined as the date when the number of cases divided by the population first exceeds a certain threshold which should be sufficiently high to reflect a spread of the epidemic, as it is described in [33]. In fact, in the real example in Section 3 we have considered a similar argument just looking for the closest day to the epidemic growth in Andalusia.
Finally, for constructing the prior distribution of the parameter c we will assume to be independent of the other parameters. This fact can be empirically seen in [34,35,37] where authors describe different estimates of c in several countries. Moreover, Figure 7 described in [33] based on data from 73 countries shows a spread over more than one order of magnitude. Therefore, we will assume that c follows a continuous uniform distribution on the interval [c 1 , c 2 ] independent of the marginal distributions of a and b, that is, where c 1 and c 2 represent the lower and upper bounds of the parameter, respectively. The model depends on several hyper-parameters, namely µ, β, M, c 1 and c 2 . The parameter α can not be considered an hyper-parameter due to its value is known in practice. We will consider some specific values for the hyper-parameters later on. We recall that the hyper-parameters µ and β are related to the inverse Gaussian and that distribution has been selected according to the Spanish ACs.

The Posterior Distribution
Due to the complexity of the calculation of the normalization constant m π (x) in Equation (1), we will use a Markov Chain Monte Carlo algorithm (MCMC) to obtain independent samples in order to characterize the posterior distribution π x (θ). Specifically, we will use the no-U-turn sampler (NUTS) as MCMC algorithm due to its good performance in this kind of problems.
The NUTS algorithm is an adaptive extension of the Hamiltonian Monte Carlo (HMC) which requires no hand-tuning to obtain samples from (unnormalized) distribution. One of the main drawbacks of the HMC algorithm is the hand-tuning of two parameters namely step size, , and desired number of steps, L. Incorrect values of these parameters leads a poor HMC's performance. NUTS overcomes this problem eliminating the need to set a number of steps L by adding a stop criterion on the Hamiltonian simulation. To sum up, the main idea behind NUTS is the use of a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. For further details, see Algorithms 2 and 3 in [40].
Furthermore, we will use Stan's programming language and rstan package [41][42][43] which have an implementation of the NUTS algorithm and several tools to check the goodness of the inference. One standard method to check the convergence of the MCMC algorithm is the Gelman-Rubin statistic, which using multiple sampling chains, measures the degree to which variance (of the means) between chains exceeds what one would expect if the chains were identically distributed. Values of this statistic close to 1 indicates approximate convergence to the posterior distribution.

The Characteristics of Interest and How to Estimate Them
First we recall that a NHPP describing the cumulative number of confirmed cases up to time t, N(t), follows a Poisson distribution with mean parameter given by Λ(t|θ), i.e., N(t) ∼ Pois(Λ(t|θ)). We also recall that the NHPP assumes that the cumulative number of confirmed cases during a time interval of the form (t, t + h) depends on the current time t and the length of time interval h, and does not depend on the past history of the process. Based on the previous properties, we study the evolution of the disease by the following characteristics of potential interest.
Fixing a known value T, we are first interested in predicting the expected number of new cases of COVID-19 in future time intervals of the form (T + h 1 , T + h 2 ), h 1 < h 2 , denoted by E T+h 2 T+h 1 (θ). From both, Expression (3) and the mentioned properties of a NHPP, we obtain that To evaluate the estimates we are also interested in computing different quantiles. The quantile at level p ∈ (0, 1) of the cumulative number of confirmed cases during the time interval (T + h 1 , T + h 2 ) is given by It is clear that Q represents the maximum cumulative number of confirmed cases with 100p% confidence within the interval (T + h 1 , T + h 2 ) and corresponds to the quantile function of a Poisson distribution. For example, if we consider p = 0.95, that means that there is a 0.05 probability that the number of contagious will fall in value by more than Q T+h 2 T+h 1 (0.95, θ). This value can be useful to evaluate the impact of a particular policy to reduce infections as we will see later on.
In order to control the epidemiological process we are also interested in estimating the point of the expected maximum rate of increase, denoted by T max (θ). This point is easily computed by solving ∂ 2 Λ(t|θ)/∂t 2 = 0. The argument, T max (θ), is given by We would like to emphasize that it does not depend on the parameter a. In the Gompertz curve, T max (θ) represents the inflection value after a period of rapid growth. Finally, we will also estimate the Gompertz curve given by Expression (2).
Expressions (9)-(11) provide us three functionals of interest that depend on the parameter θ. After computing the posterior distribution of the parameter, π x , we obtain three univariate random variables by mapping π x through those functionals, namely E T+h 2 T+h 1 (π x ), Q T+h 2 T+h 1 (p, π x ) and T max (π x ). Moreover, by mapping π x in Expression (2) we will obtain a random Gompertz curve given by g(t | π x ).
From the mentioned fact that the posterior distribution has not a closed-form expression, the empirical probability distributions of E T+h 2 T+h 1 (π x ), Q T+h 2 T+h 1 (p, π x ) and T max (π x ) and the random curve g(t | π x ) can be obtained from the empirical distribution of the posterior distribution using the chains obtained by NUTS algorithm. As usual, in order to make predictions, we will compute the posterior mean and the 95% Bayesian credible quantile-based interval (CI) in each of the three empirical distributions.

A Real Example about COVID-19 Survey in Andalusia
In this section we will illustrate our method analyzing real data associated to the COVID-19 in a specific region located at the south of Spain, the province of Cádiz. We collect data from the Spanish National Network for Epidemiological Surveillance (RENAVE, by its Spanish initials). At this moment, we would like to emphasize the difficulty of choosing the time zero as we mentioned in Remark 2. Here we consider the time zero on 25 February where the first case of the COVID-19 pandemic was confirmed in Andalusia which also coincides with the closest day to the epidemic growth. Data reflect the total number of confirmed cases with SARS-CoV-2, namely all those who have a positive test on Polymerase Chain Reaction (PCR) plus those positive in a rapid antibody test made in laboratory. We discard individuals having positive test using other methods, like antigen detection or Enzyme-Linked ImmunoSorbent Assay (ELISA).

Remark 3.
It is important to decide how to be date a positive test. Following the instructions from RENAVE, if the person has symptoms, we will date the new case the day that symptoms start. If the person is asymptomatic, we will date it seven days before a positive test is recorded. Figure 2 shows the evolution of the daily new cases of COVID-19 in the province of Cádiz (black line) from 25th February to 4th October 2020. Blue color band shows the first State of Alarm in Spain declared to control infections. Note that most of the different provinces in Andalusia have a similar profile.

Remark 4.
It is worth noticing the differences we observe between the profiles of the first and second waves. Those differences cannot simply be attributed to a higher reproduction rate, but also to the increase of the number of people tested during the second wave, among other reasons. The number of cases estimated during the first wave was highly inaccurate. For example, recent estimates in France place over 9 in 10 undetected cases for the first wave, see [44]. According to Spanish data, in the first wave tests were made especially on hospitalized people and people with serious symptoms, introducing a high correlation between the seriousness and the number of confirmed cases. In the second wave more tests were available, for example allowing testing of asymptomatic individuals and screening in certain populations. On top of that, the vast majority of tests only capture infections in the respiratory system while antibody studies have issues involving bias in the collection procedures or natural reduction of antibody production. However, the technology of testing have improved substantially over time, even along the first wave. Additionally, European Centre for Disease Prevention and Control (ECDC) shows curves for different age groups which demonstrate that, while the first wave were dominated by the elderly, the early stage of the second wave was entirely dominated by the young adults, and hence there were almost no deaths. Therefore, it is apparent the dynamics of the spread of the infection was very different in the two first waves. At this moment, we evaluate the hyper-parameters of the different marginal prior distributions given in (6)- (8). The population of the province of Cádiz is estimated at p = 1,240,155 inhabitants, so the value of α is 12.40155. Just considering the confirmed cases per 100,000 people shown in Figure 1 for the Spanish ACs at the beginning of the pandemic and changing the scale by α we obtain the Maximum Likelihood Estimate (MLE) for the mean and the scale parameters in (6), i.e., µ = 399.95 and β = 525.21, respectively. The value M = 10 was determined because it was not found any case where the number of infections at the beginning were bigger. We would like to emphasize that in this first wave M had little informative value to obtain the posterior distribution of b as we have checked by taking different values for M and just observing the posterior expected quantity for M 0 = 1.14 and its posterior standard deviation equal to 0.15 shown in Table 1. In other words, the result of the estimation is essentially independent of the choice of M in this case. Finally, to bound the values of c 1 = 0.01 and c 2 = 0.2, we take into account the highest and the lowest values found in Spain and other countries, as it is seen in [34,35,37]. Those values are also reasonable with the observed range in Figure 1 described in [33].

Forecasts for the Characteristics of Interest at Different Scenarios
In order to evaluate our model, we will estimate the functionals given in (9)-(11) at different stages of the pandemic. As a natural question, we first are interested in evaluating the benefits of the first lockdown imposed by the Spain's central government. Second, we will locate our estimates during the lockdown and close to the end of the State of Alarm to verify not only that predictions are quite accurate, but also how daily new cases decrease. Finally, just observing the evolution of our estimates after the easing of Spain's lockdown restrictions, we will be able to detect the beginning of the second and third waves by the increase of the daily number of new reported contagious.

First Scenario: The Benefits of the Lockdown
The lockdown in Spain was imposed on 14 March 2020. Therefore, in order to evaluate the benefits of that decision in the province of Cádiz, we will first consider T, the ending day, as 15 March 2020. The idea is to make daily predictions of the following week, from 16 March 2020 to 22 March 2020. Moreover, it is worth mentioning that week was close to the date of the maximum number of daily new reported cases of COVID-19 in the first wave. We are aware that the classical Gompertz curve is a poor model in the early stages of an epidemic. However one of the advantages of the Bayesian approach is the incorporation of prior information which leads to a better inference for small samples. Figure 3a shows the observed time series of the daily cumulative cases up to T-to feed the Bayesian model-and a week after T (brown). Likewise, it shows a set of 500 Gompertz curves obtained by an i.i.d. random sample of size 500 from the posterior distribution π x (grey). It is remarkable the band of the Gompertz curves leads us to predict the trend of daily cumulative positive cases of SARS-CoV-2 by incorporating variability. Figure 3b shows the observed time series of the daily cases up to T (black) and a week after T (brown). It also shows the expectation (blue) and the 95% CI (blue dash line) of E T+d+1 T+d (π x ) as forecasts of the expected number of new daily cases of COVID-19 where d ∈ {0, . . . , 6}.
At first glance, a change in trend can be observed between the predictions of the expected values (which continues an upward trend) and the observed data after T, which begins a downward trend. For that reason, it seems that the lockdown imposed by the authorities was beneficial to control the initial evolution of the pandemic by reducing the daily number of expected new cases.  Regarding to the parameters of interest, it is remarkable that in case of no restrictionsno government interventions-after T, we estimate that the 95% CI for the parameter a-maximum number of infected-would lie on the interval (10,865.82, 56,400.37) having a posterior mean of 27,766.41 inhabitants, see Table 1. As the population size in Cádiz is 1,240,155 inhabitants, no restrictions could mean that approximately the 2% of population would be infected by the disease. Of course, this number could have meant the collapse of the health system and would have caused a much higher number of deaths. Additionally, the posterior mean of the time to reach the peak would have been E π x [T max (θ)] = 53.13 days, letting the effect of the pandemic considerably would have dragged on. Fortunately it was not the case. Results of this model can be checked in a ShinyStan App at https://micromegas. shinyapps.io/COVID-19-Scenario1-CA-province/.

Second Scenario: The Evolution of the Pandemic during the Lockdown
Now we will evaluate the goodness of fit of our model by making predictions during the lockdown period. For such a purpose, we will consider T, the ending day, as 3 May 2020. As in the first scenario, the idea is to make daily predictions for the following week, from 4 May 2020 to 10 May 2020. It is worth mentioning that the decrease of the number of new daily cases during the lockdown was the reason why Spanish authorities justified the end of the lockdown on 21 June 2020.
Analogously to Figures 3 and 4a shows the time series of observed daily cumulative cases up to T (black) and a week after T (brown). Moreover, shows a band of Gompertz curves obtained from an i.i.d. random sample of the posterior distribution π x (grey).
Moreover, analogously to Figures 3 and 4b shows the observed time series of daily new cases (black) up to T and a week after T (brown). It also shows the forecasts of the expected number of new daily cases as the posterior mean of E T+d+1 T+d (π x ) (blue) and its 95% CI (blue dash line). In addition, we also compute the 95% CI of Q T+d+1 T+d (0.975, π x ) (red band) and Q T+d+1 T+d (0.025, π x ) (green band), where a ∈ {0, . . . , 6}. Those quantiles lead us to measure where the middle 95% of the daily new cases lies.  At first glance, the trend of both daily and cumulative expected values are quite similar to the observed data which implies that our model fits reasonably well the observations. Table 2 shows a summary of the Bayesian estimates of the main parameters. As a first conclusion, it seems the lockdown had a direct effect on the estimates compared to the values given in Table 1. Now the posterior mean of the maximum cumulative number of confirmed cases in the province of Cádiz, parameter a, is about 1543.87 people, close to the official cumulative number of confirmed cases at the end of the State of Alarm and having a 95% CI of (1465.62, 1622.46). Therefore, we estimate that about 0.12% of the population of the province of Cádiz was detected as a confirmed case of COVID-19 in the first wave and until the end of the lockdown. Taking into account that less than one out of ten cases was detected in the first wave, as it is described in [44], our result seems consistent with seroprevalence studies made in Spain, where it was determined that 1.7% of inhabitant in the province of Cádiz presented IgG antibody against SARS-CoV2. Additionally, we also estimate that E π x (T max (θ)) = 24.73 days having a 95% CI (24.02, 25.43). All those estimates are close to the official data provided by RENAVE which predicts the peak in 20 days from 25 February 2020. To sum up, we would like to emphasize that a direct computation shows that the effect of the lockdown reduced the number of infected cases by about 94.5%. Results of this model can be checked in a ShinyStan App at https://micromegas. shinyapps.io/COVID-19-Scenario2-CA-province/.
In Spain, the end of the lockdown was on 21 June 2020 and our model fits reasonable well during that period and forecasts stop being as good after lockdown. It is apparent the easing of restrictions lead to a new change in the trend and the arrival of a new wave. We will see in the following scenario how we can predict it.

Detecting the Beginning of a New Wave
As we have mentioned, the model fits well the evolution of the number of new cases during the lockdown. By considering T, the end of the lockdown, as 21 June, we next propose a classical tool to detect the beginning of a future wave based on the 99% percentile of the number of daily new cases. For such a purpose, we first estimate daily quantiles by the posterior mean of Q T+d+1 T+d (0.99, π x ), where d ∈ {0, . . . , 41}, i.e., for the first 42 days-6 weeks-after the lockdown. Second, for the ith week we count the cumulative number of confirmed cases where the observed daily number of contagious exceeding the estimate of the 99% daily quantile, and we will denote it by W + i , i = 1, . . . , 6. For example, W + 1 = 1 means that just one day the observed new daily cases exceed the estimate of the 99% daily quantile in the first week after the lockdown. It is apparent that W + i is a risk measure that takes values from 0 to 7, i = 1, ..., 6, and the larger the value, the greater the probability of a new wave. Table 3 shows the values of W + i , i = 1, . . . , 6, in the province of Cádiz. Note the first week ranges from 22 June to 28 June and the sixth week from 27 July to 2 August. It is apparent that easing COVID-19 restrictions after the lockdown leads to more spreading of coronavirus in just a few weeks. Again we face the problem to establish the time zero as mentioned in Remark 2. The value W + 5 = 7 in Table 3 implies that in all days in the 5th week the observed new daily cases exceed the estimate of the 99% daily quantile. Therefore, in order to make predictions in the second wave we have considered the initial date as 27 July, five weeks after the lockdown was finished, and T, the ending day, as 13 September 2020. We would like to emphasize that Spain had one of the most restrictive lockdown in the world in the first wave. After the lockdown people were afraid of going back to normal. We think this was the main reason of slow growth at the beginning of July. However, little by little people in summer were more confidence and jointly to an attempt to save the tourist season, infections started growth again at the end of July. It is apparent that initial conditions in the first and second waves are different. Therefore the value of the hyper-parameter M = 100 was determined taking into account that the initial cases of the second waves are, in some sense, determined by the cases at the end of the first wave. Finally, we consider the same prior information for the parameters a and c in order to have more prior variability. Table 4. Note the second wave can be interpreted as an intermediate scenario between having no restrictions and the lockdown. Now the posterior mean of the maximum number of infected in the province of Cádiz, a, is about 14,980.964 people and we also estimate that E π x (T max (θ)) = 55.179. We recall that differences between the first and second wave can be attributed to both a higher rate of contagious and an increase of the number of people tested as described in Remark 4. As a complementary study, Andalusia is divided into eight provinces, namely Almeria, Granada, Jaén, Málaga, Sevilla, Córdoba, Cádiz and Huelva. We compute the evolution of the risk measure given in Table 3 for all of them. In order to make predictions, we only should take in account they have different population size, i.e., different α = P/100,000 in Expression (6). Table 5 shows the population sizes, P, of the eight Andalusian provinces (population size according to the Instituto Nacional de Estadística https://www.ine.es/ up/9Gq4uzeUiT).  Figure 5 shows the evolution of the risk measure in Andalusia by using a color map. This figures allows us to make inter-provincial comparisons and detect how the effects of COVID-19 vary between provinces and territories.

The parameters of interest are shown in
To conclude our analysis, we have studied the evolution of the confirmed cases in the province of Cádiz during the autumn period. We first fix the beginning of the second wave as 27 July and T as 20 September. By using a similar argument as in the detection of the second wave, we compute the measure W + i for the following four weeks, i = 1, 2, 3 and 4, obtaining 3, 3, 4 and 6, respectively. It seems a third wave appears in the fourth week from the beginning of the second wave. In addition, that fourth week coincides with a vacation period in Spain. Therefore, we finally establish the beginning of the third wave as 11 October. In contrast to the second wave, the third wave appears before flattening the second curve.
By using data from 11 October to 8 December 2020-the submission date of this work-we present in Table 6 the parameters of interest of the third wave. Again the hyperparameter M = 1000 has been modified because the third wave started having higher initial values at time zero.
Week 1st  Finally, we present in Figure 6 the band of Gompertz curve for the second wave (green) and the band of Gompertz curve for the third wave (orange) obtained from an i.i.d. random sample of the posterior distributions. It is apparent that models fit well data. From the interpretation of 1/c as the width (duration) of a wave and just observing the estimates of the parameter c in Tables 2, 4 and 6, it is apparent that the duration of the second wave (if it were not interrupted by the third) would be more than twice longer that the duration of the first one and the third wave seems to be a bit shorter than the second one.

Conclusions
We have presented a non-homogeneuos Poisson process with intensity function based on the classical Gompertz curve for modeling and forecasting COVID-19 pandemic by using Bayesian inference. In that context, we have discussed the prior distributions of the parameters. In particular, we propose a right-skewed distribution as the baseline prior distribution to model confirmed cases per day per 100,000 inhabitants. The inverse Gaussian distribution seems reasonable for such a purpose in Spain. The presented framework can be used for both estimating the number of individuals infected and evaluating the success of different policies. Independently of the comparison of our model to other ones, the Bayesian approach always suppose an improvement in the estimates when just small samples are available.
Clearly inspired in Risk Theory and jointly to the well-known properties of the nonhomogeneuos Poisson process we propose an indicator which helps us to identify the beginning of a new wave. That indicator is based on the estimates of the 99% percentile of the number of daily new cases. To sum up, after fixing a model up to time T, we evaluate the estimates of the new confirmed cases for the following weeks and we are able to detect if real cases exceed certain threshold given by the quantiles which is the key to establish a new wave. We would like to emphasize that our model is not able to predict the onset of a new wave but at least is able to detect it. For such predictions we refer the reader to dynamical models that incorporate mechanisms of social response, such as attempted in [45].
To conclude, applying our method to the province of Cádiz, located at the South of Spain, we were able to discuss the effectiveness of the first lockdown, the accuracy of the estimates during that lockdown and the beginning of the second and third waves after the lockdown. For future works it would be interesting to apply robust Bayesian techniques as described in [46][47][48][49]. In particular, it would be interesting to consider a band of prior distributions for the parameter a as described in [46]. Additionally, the relative range of variation of a is larger than for c where further research is needed to find causal mechanism to interpret those ranges. Finally, it is worth mentioning that the hyper-parameter Mwhich induces uncertainty in the initial cases-takes different values depending on the wave. For example, it is apparent that the initial cases in the first, second and third waves were different.