Next Article in Journal
A Stepwise Assessment of Parsimony and Fuzzy Entropy in Species Distribution Modelling
Next Article in Special Issue
The Impact of the COVID-19 Pandemic on the Unpredictable Dynamics of the Cryptocurrency Market
Previous Article in Journal
A Factor Analysis Perspective on Linear Regression in the ‘More Predictors than Samples’ Case
Previous Article in Special Issue
Multivariate Tail Probabilities: Predicting Regional Pertussis Cases in Washington State
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Piecewise Modeling the Accumulated Daily Growth of COVID-19 Deaths: The Case of the State of São Paulo, Brazil

by
Erlandson Ferreira Saraiva
1,* and
Carlos Alberto de Bragança Pereira
2
1
Institute of Matematics, Federal University of Mato Grosso do Sul, Campo Grande 79070-900, Brazil
2
Institute of Matematics and Statistics, University of São Paulo, São Paulo 05508-090, Brazil
*
Author to whom correspondence should be addressed.
Entropy 2021, 23(8), 1013; https://doi.org/10.3390/e23081013
Submission received: 8 May 2021 / Revised: 21 July 2021 / Accepted: 21 July 2021 / Published: 4 August 2021
(This article belongs to the Special Issue Modeling and Forecasting of Rare and Extreme Events)

Abstract

:
The pandemic scenery caused by the new coronavirus, called SARS-CoV-2, increased interest in statistical models capable of projecting the evolution of the number of cases (and associated deaths) due to COVID-19 in countries, states and/or cities. This interest is mainly due to the fact that the projections may help the government agencies in making decisions in relation to procedures of prevention of the disease. Since the growth of the number of cases (and deaths) of COVID-19, in general, has presented a heterogeneous evolution over time, it is important that the modeling procedure is capable of identifying periods with different growth rates and proposing an adequate model for each period. Here, we present a modeling procedure based on the fit of a piecewise growth model for the cumulative number of deaths. We opt to focus on the modeling of the cumulative number of deaths because, other than for the number of cases, these values do not depend on the number of diagnostic tests performed. In the proposed approach, the model is updated in the course of the pandemic, and whenever a “new” period of the pandemic is identified, it creates a new sub-dataset composed of the cumulative number of deaths registered from the change point and a new growth model is chosen for that period. Three growth models were fitted for each period: exponential, logistic and Gompertz models. The best model for the cumulative number of deaths recorded is the one with the smallest mean square error and the smallest Akaike information criterion (AIC) and Bayesian information criterion (BIC) values. This approach is illustrated in a case study, in which we model the number of deaths due to COVID-19 recorded in the State of São Paulo, Brazil. The results have shown that the fit of a piecewise model is very effective for explaining the different periods of the pandemic evolution.

1. Introduction

Since the discovery of a new coronavirus in the city of Wuhan, China, the virus has spread rapidly worldwide, and created a major global health crisis [1]. In fact, this disease is one of the main health problems in the world and has had an enormous economic and social impact, leading to an increase in poverty and the loss of thousands of lives in many countries. Until 27 April 2021, four million deaths had been registered in the whole world (www.worldometers.info/coronavirus, accessed on 23 July 2021).
This pandemic scenery has increased the interest in statistical models capable of projecting the evolution of the disease in countries, states, or cities. This interest lies mainly in projections that can assist government agencies in making decisions regarding the intensification of social isolation, the acquisition of hospital equipment, an increase in the number of intensive care units in hospitals.
Especially in the year 2020, many articles were published describing modeling procedures for the number of cases and/or deaths due to COVID-19 in many countries. In general, the published works model the accumulated number of cases (or deaths) by using some non-linear growth model. For example, Ref. [2] considered a simple exponential growth model to analyze the initial phase of the epidemic of COVID-19 in Africa, Ref. [3] modeled the data from Philippines and Taiwan using the generalized logistic model, Ref. [4] applied the Richards growth model to the data collected in China, France, Germany, Iran, Italy, South Korea, and Spain, Ref. [5] calibrated the logistic growth model, the generalized logistic growth model, and the generalized Richards model for the number of cases recorded in China, Ref. [6] considered the Gompertz model to predict the number of deaths in Brazil, Ref. [7] estimated the number of total COVID-19 cases and deaths in the UK, Russia, and Turkey using the Gompertz model, among others.
As discussed by the authors cited above, the growth models present satisfactory performance for modeling the accumulated number of cases (or deaths) of COVID-19. However, these models are generally adequate to model data from a specific period of the pandemic. Since the growth in the cumulative number of cases and deaths by COVID-19 has, in general, presented a heterogeneous evolution over time, this implies that the adjustment of only one of these growth models may not be adequate to explain the entire study period.
In order to overcome this issue, this article introduces a piecewise model in which different growth functions are adjusted for different pandemic periods of time. Our proposed modeling is based on the fit of three growth models, exponential, logistic and Gompertz, for each period of the pandemic. Our preference for these three models is based on studies described in the literature that indicate they are excellent for use in longitudinal data, see for example [8,9] and the references therein. However, the proposed modeling is not restricted to these three growth models and can be easily adapted for other growth models.
To identify the different periods of the pandemic in the course of time, we consider a grid of values with increments of size 1, in which, each point of the grid represents a possible change point of the pandemic. The increment of one unit represents the increase of one day in the pandemic. Then, we fit the three growth models for two sub-datasets obtained according to each possible change point. To choose the best model for each sub-dataset, we consider as a criterion the mean square error (MSE), the Akaike Information criterion [10,11], denoted by AIC, and the Bayesian Information criterion [12], denoted by BIC. The best change point d is that associated with the model with the lowest mean square error.
The fitting procedure of the three models is based on obtaining the point estimates for the parameters of the growth models by using the non-linear least square method, as described by [13,14]. We have used the software R [15] and the command nls of the package nlstools [16,17]. Based on the fitted model, we find the coordinates of the inflection point and consequently the estimated date for the peak of the pandemic in the current period.
The proposed modeling is illustrated in a case study, in which we model the cumulative number of deaths recorded in the state of São Paulo, Brazil. However, the modeling may be easily adapted for datasets from other countries, states, or cities. We opt for modeling the cumulative number of deaths instead of the cumulative number of recorded cases because these values do not depend on the number of diagnostic tests performed. This allows us to identify the changes in the pandemic’s evolution without having to adjust for changes in the number of diagnostic tests. On the other hand, one can argue that the modeling procedure now relies on accurate reporting of the mortality statistics due to COVID-19. Since the health secretary of São Paulo state publishes the mortality statistics daily and the information is checked by a group of news-press companies, we consider that these published data are adequate for our modeling procedure. As an illustration of the versatility of our proposal, we also present the model for the cumulative number of recorded cases in the State of São Paulo, Brazil.
The two main advantages of the proposed modeling approach are: 1. its effectiveness in describing the different periods of the pandemic; and 2. its capacity to explain each period of the pandemic through the estimates for the epidemiological parameters of interest, such as the growth rate for the number of daily deaths and the date for the occurrence of the peak.
The remainder of the paper is organized as follows: Section 2 presents the statistical model. Section 3 describes the adopted modeling procedure to obtain a piecewise growth model for the cumulative number of recorded deaths in the state of São Paulo, Brazil. In this section, we also present the piecewise model fitted for the cumulative number of cases recorded in the State. Section 4 concludes the paper with the final remarks. Additional details are provided in Appendix A.

2. Statistical Model

Consider D t to be the number of deaths due to COVID-19 recorded in the state of São Paulo, Brazil, on the t-th day, for t = 0 , , n , where t = 0 represents the day that the first death was recorded (17 March 2020) and n = 410 is the last day considered in the study (27 April 2021). In this period, 96,191 deaths were recorded. This dataset is publicly available on the website www.seade.gov.br/coronavirus/.
Figure 1 shows the recorded D t values, for t = 1 , , n . As one can note, the recorded D t values have a high variability which makes the modeling procedure somewhat complicated. The sample standard deviation of the D t values is 237.8288 . Due to this, we opt to develop the modeling procedure by considering the cumulative values because these values present a more stable behavior, as can be viewed in Figure 2a. Figure 2b shows the values from Figure 2a in the log-scale.
Thus, let X t be the number of deaths until the t-th day, in an accumulated way, i.e.,
X 0 = D 0 and X t = i = 0 t D i ,
for t = 1 , , n .
Then, we assume the following multiplicative growth model:
X t = h ( t | θ ) · ν t ,
where h ( t | θ ) is a nonlinear growth model indexed by the parameter θ (scalar or vector), ν t is a random error assumed as being generated from a log-normal distribution with mean μ and variance σ 2 , for t = 1 , , n .
To complete the model specification, we need now to set up the nonlinear growth model h ( t | θ ) in Equation (1), for t = 1 , , n . Hereafter, we consider the following three nonlinear growth models: exponential, logistic and Gompertz. Details on these three growth models are presented in Appendix A.
In addition, in order to avoid working with values on the scale of thousands (see Figure 2a), we opt to consider the model Equation (1) on the logarithmic scale. Thus, by taking the logarithm transformation on both sides of Equation (1), we have that:
Y t = l o g ( X t ) = f ( t | θ ) + ε t
where f ( t | θ ) = l o g ( h ( t | θ ) ) is the logarithmic transformation of a nonlinear growth model h ( t | θ ) indexed by the parameter θ and ε t = l o g ( ν t ) for t = 1 , , n d . Thus, from log-normal properties we have that ε t is a random error assumed as being generated from a normal distribution with mean μ and variance σ 2 , ε t N ( μ , σ 2 ) . In addition, we assume that μ = 0 and c o v ( ε t , ε t ) = 0 , for t , t = 1 , , n and t t .
However, since the cumulative number of deaths by COVID-19 has, in general, presented a heterogeneous evolution over time, the adoption of a single function f ( t | θ ) to explain the entire considered period may not be adequate. Thus, in order to overcome this issue, we consider that the function f ( t | θ ) is a piecewise function, given by
f ( t | θ ) = f 1 ( t | θ ) , for 0 < t d 1 ; f 2 ( t | θ ) , for d 1 < t d 2 ; , ; f k ( t | θ ) , for t > d k . .
That is, we assume that the entire period of the pandemic is divided into k different periods, being f j ( t | θ ) a function considered for recorded data in the period ( d j 1 , d j ] , for j = 1 , , k .
Table 1 shows the mathematical functions considered for f j ( t | θ ) , for j = 1 , , k . Other than for the log-exponential model that presents an infinite growth, the log-logistic and log-Gompertz have growth limits. The value α 1 = l o g ( α 1 * ) is the upper asymptote for both log-models, where α 1 * is the upper asymptote of the models in the original scale (see Appendixes A.2 and A.3). In the context of COVID-19, the value of α 1 * = exp { α 1 } is an estimate for the maximum number of deaths that will occur. The parameters ( α 2 , α 3 ) are related to the coordinates I = ( T , f ( T | θ ) ) of the inflection point. From a practical viewpoint, the point I is the peak of the pandemic, i.e., the point in which the curve of the number of deaths (log-transformed) changes its slope (positive to negative), meaning that before the peak the number of deaths grows at an increasing rate (positive slope) and after the peak, the growth is characterized by a decreasing rate (negative slope). That is, after the peak it is expected that the number of deaths is smaller each day. In addition, we have that T is the day on which the peak will occur and f ( T | θ ) is the logarithm of the cumulative number of deaths that will be recorded until the day T.
In order to get the parameter estimates of the three log-growth models, we adopt the nonlinear least square method. For this, we use the software R [15] and the command nls of the package nlstools [16,17]. Denote the estimates for the parameters of interest θ by θ ^ , where θ ^ = ( α ^ 1 , α ^ 2 ) for the log-exponential model and θ ^ = ( α ^ 1 , α ^ 2 , α ^ 3 ) for the log-logistic and log-Gompertz models.
To select the best model for the data of each period of the pandemic, we compare the three log-growth models by using as a criterion the mean square error (MSE) and the model selection criteria AIC and BIC, which are calculated according to the following expressions
M S E = 1 n t = 1 n Y ^ t Y t 2 , A I C = 2 l ( θ ^ | y ) + 2 p and B I C = 2 l ( θ ^ | y ) + p l o g ( n ) ,
where Y ^ t is the estimated value by the model, l ( θ ^ | y ) = l o g ( L ( θ ^ | y ) , in which, L ( θ ^ | y ) and p are the maximum value of the likelihood function and the number of parameters in the model, respectively. The best model is the one that has the smallest MSE, AIC and BIC values. At this point, one could opt to use another criterion of model choice, such as, for instance, the likelihood ratio test (LRT). However, as discussed by [18], there are two complications in the use of LRT for non-nested models: (1) the asymptotic distribution of the LRT under the null hypothesis will not, in general, be chi-squared, and can be difficult to evaluate mathematically; and (2) it is not clear whether model A or model B should be treated as the null model when performing the hypothesis test. In addition, if one assumes that the LRT is chosen from the three considered models, we need to make three tests (exponential × logistic), (exponential × Gompertz), and (logistic × Gompertz). Due to these two issues and the simplicity of obtaining the values of AIC and BIC using the R software, this led us to consider them as model selection criteria instead of the LRT.
Without loss of generality and for the facility of presentation, we opt to describe how to determine the change points d j in the next section, which describes the updating of the model as the pandemic evolved in the State of São Paulo.

3. Results

In this section, we present the modeling procedure adopted in the course of 411 days (from 17 March 2020 to 30 April 2021) of the pandemic in the state of São Paulo, Brazil. The first analysis was done thirty days after the record of the first death, and then, every thirty days, the model was updated.
Our choice for updating the model to every thirty days is based on the fact that, on average, the first symptoms of COVID-19 manifest between 4 to 14 days; and that the patients with grave symptoms remain an average time of 21 days in intensive clinical care. Then, it is very plausible that the changes in the evolution in the number of deaths occur between 25 to 35 days. Due to this, we opt to update the model every 30 days. However, this does not restrict the approach and a user can update the model by using another period of time, such as 20, 10 or 5 days.

3.1. Fitting of a Single Growth Model

For the first analysis, consider the cumulative number of deaths recorded in the first thirty days since the recording of the first death, i.e., the period from t = 0 (17 March 2020) to t = 29 (15 April 2020). Let D 1 = { y 0 , , y 29 } be the cumulative number of deaths log-transformed.
Once the dataset D 1 is defined, we fit the three log-growth models described in Table 1. Table 2 shows the MSE, AIC and BIC values for the three fitted log-growth models. The smallest values are highlighted in bold. As one can note, the log-Gompertz model is the best model.
Table 3 shows the point estimates and the confidence intervals ( 95 % ) for the parameters of the log-Gompertz model. The fitted model is given by
Y ^ t ( 1 ) = 7.0025 6.97 exp { 0.0861 t } ,
for t 0 , where Y ^ t ( 1 ) denotes the fitted model in the first analysis.
Figure 3a shows the values of the dataset D 1 and the fitted model (black line) for a period of 60 days (from day 0 to day 59), with 30 days of fitting and 30 days of projections. Figure 3b shows the graphic in the original scale. According to the point-estimates presented in Table 3, the projection for the maximum number of deaths is 1 , 099 (rounded exp { 7.0024 } value) and the ordinate of the inflection point is T = l o g ( 6.97 ) 0.0861 = 22.5507 , i.e., the peak of the pandemic was projected to occur on the 23rd day (08 April 2020) with 484 deaths. Until the 23rd day, 496 deaths were registered. That is, the projected value for the 23rd day presented an absolute percentage error of 2.42 % in relation to the recorded value. Although the projections do not indicate a very critical future situation, these results should be interpreted with great caution, since the recorded values on these thirty days represent only the beginning of the pandemic.
Unfortunately, the recorded values in the next thirty days (from 16 April 2020 to 15 May 2020) did not follow the projections of the initial fitted model since the recorded values were all above the projections, as shown in Figure 4. Due to this, we insert these thirty cumulative values (log-transformed) into the dataset D 1 and obtain the dataset D 1 u = { D 1 } { y 30 , , y 59 } , which we refer to as updated D 1 . Then, we update the model considering the dataset D 1 u .
Once the dataset D 1 u is defined, we repeat the fitting procedure of the three log-growth models. Table 4 shows the MSE, AIC and BIC values for the three fitted log-growth models. The smallest values are highlighted in bold. Again, the log-Gompertz model is the best model.
Table 5 shows the point estimates and the confidence intervals ( 95 % ) for the parameters of the log-Gompertz model. The fitted model is given by
Y ^ t ( 2 ) = 8.5646 7.4705 exp { 0.05 t } ,
for t > 0 , where Y ^ t ( 2 ) denotes the fitted model in the second analysis.
Figure 5 shows the values of D 1 u and the curves of the models fitted on the 30th and 60th day, respectively, in the log-scale and in the original scale for a period of 90 days, with sixty days of fitting and thirty days of projection.
The MSE of the updated model Y ^ t ( 2 ) is 0.0473 , for t 0 . That is, an MSE value greater than that of the first fitted model. Besides, as one can note, the last ten recorded values are far from the projected values by the updated model and residuals are all positive, making them serially correlated and, hence, violating the initial assumption that residuals are uncorrelated. This result made us conjecture that the fit of a single growth model may not be adequate since a new period with a growth rate and/or growth type different from the first 30 days could be beginning.
In order to empirically verify our conjecture, we registered the cumulative values in the next thirty days after the date 15 May 2020 (from 16 May 2020 to 14 June 2020) and plot these values as shown in Figure 6. As one can note, our conjecture is very plausible. Due to this, hereafter, we adopt the fit of a piecewise growth model.

3.2. Fitting of a Piecewise Growth Model

Consider D = { y 0 , , y 89 } as the cumulative number of deaths, log-transformed, recorded in the first ninety days since the first case. Let D 1 and D 2 be two sub-datasets of D representing two distinct periods of the pandemic. In order to define the change point d and to obtain the sub-datasets D 1 and D 2 , we adopt the following approach. Let G = { 18 , , 58 } be a grid from 18 to 58 with increments of size 1. Each increment of one unit represents the increase of one day in the pandemic. Then, we define the following 41 scenarios: D 1 = { y 0 , , y d } and D 2 = { y d + 1 , , y 89 } , for d G .
For each of the scenarios, we fit the three growth models to sub-datasets D 1 and D 2 . We select the best model for each of the sub-datasets by using the MSE, AIC and BIC as a criterion. In addition, the best d value is the one associated with the fitted model that has the lowest MSE value.
According to the criteria considered, the best model is composed of a log-Gompertz model for D 1 and D 2 . Figure 7 shows the MSE values of the fitted piecewise model for each d value considered, d G . The point d = 20 has lead to a piecewise model with the smallest MSE value. Due to this, we set up d = 20 as the separation point for D 1 and D 2 .
Thus, let D 1 = { y 0 , , y 20 } and D 2 = { y 21 , , y 89 } . Table 6 shows the MSE, AIC and BIC values for the three fitted growth models for D 1 and D 2 . The smallest values are highlighted in bold, indicating the log-Gompertz model for D 1 and D 2 .
Table 7 shows the point estimates and the confidence intervals ( 95 % ) for parameters of the log-Gompertz model for D 1 and D 2 , respectively. The fitted model is given by the following piecewise model
Y t ( 3 ) = 6.0242 6.3704 exp { 0.1227 t } , for 0 t < 21 ; 10.0677 4.3578 exp { 0.0242 t } , for t 21 ; ,
where Y ^ t ( 3 ) denotes the fitted model in the third analysis.
Figure 8 shows the values of the sub-datasets D 1 and D 2 and the fitted model Y ^ t ( 3 ) (black line) for a period of 120 days (from day 0 to day 119), with 90 days of fitting and 30 days of projection, for t 0 . The MSE of the fitted piecewise model is 0.0067 . This value is smaller than the MSE of the model fitted in Section 3.1, meaning that the piecewise model fits the data better.
At this point, it is important to note that future projections are given by the log-Gompertz model fitted for sub-dataset D 2 . According to point estimates for the parameters of this model, the projection for the maximum number of cases is 23,569 (rounded exp { 10.0677 } value) and the ordinate of the inflection point is T = l o g ( 4.3578 ) 0.0242 = 60.8251 , i.e., the peak of the pandemic was projected to occur on the 81st day (20 + 61) day (17 May 2020) with 8710 deaths. There were 8842 deaths recorded by the 81st day, an absolute percentage error of 1.49 % in relation to the real value.

3.3. Updates of the Piecewise Model

Following the procedure described in Section 3.2, we update the piecewise model every thirty days. However, in this section, we focus on describing the modeling procedure adopted to obtain the piecewise model. That is, we focus on the periods at which the change in the pandemic’s growth behavior happened.
In the updating ran on the 120th day (fourth analysis), the log-Gompertz model for D 2 was maintained. However, on the update run on the 150th day (fifth analysis), it was possible to identify a third period of the pandemic. For the identification of this third period, we consider the following procedure. Let D 2 = { y 21 , , y d } and D 3 = { y d + 1 , , y 149 } , for d G = { 60 , , 110 } , where G is a grid from 60 to 110 with increments of size 1, representing possible change points of the pandemic. Repeating the procedure of fitting the three growth models for each sub-dataset, we obtain that the log-Gompertz is the best model for both sub-datasets.
Figure 9 shows the MSE values from models fitted for d = { 90 , , 100 } . As one can note, d = 92 is the best change point, i.e., the point with the smallest MSE value. Thus, we set D 2 = { y 21 , , y 92 } and D 3 = { y 93 , , y 149 } .
The fitted model is given by the following piecewise model
Y ^ t ( 5 ) = 6.0242 6.3704 exp { 0.1227 } 0 t < 21 ; 10.0707 4.3603 exp { 0.0241 t } , for 21 t < 92 ; 10.8892 1.5467 exp { 0.0132 t } , for t 92 ; ,
where Y ^ t ( 5 ) denotes the fitted model in the 5-th analysis.
Figure 10 shows the values of the sub-datasets D 1 , D 2 and D 3 and the fitted model Y ^ t ( 5 ) (black line) for a period of 180 days (from day 0 to day 179), with 150 days of fitting and 30 days of projection. The MSE of the fitted piecewise model is 0.0004811 .
Following this procedure, after 14 analysis steps, we identify seven different periods of the pandemic with change points on the days d = { 20 , 92 , 240 , 295 , 363 , 381 } . The piecewise growth model has the following configuration: log-Gompertz for D 1 = { y 0 , , y 20 } , D 2 = { y 21 , , y 92 } , D 3 = { y 93 , , y 240 } , log-exponential for D 4 = { y 241 , , y 295 } , D 5 = { y 296 , , y 363 } , D 6 = { y 364 , , y 381 } and log-Gompertz for D 7 = { y 382 , } . Its mathematical expression is given by
Y ^ t ( 14 ) = 6.0242 6.3704 exp { 0.1227 t } , for 0 t < 21 ; 10.1376 4.4034 exp { 0.0229 t } , for 21 t < 93 ; 10.7378 3.1148 exp { 0.0158 t } , for 93 t < 241 ; 10.5955 + 0.0030 t , for 241 t < 296 ; 10.7666 + 0.0043 t , for 296 t < 364 ; 11.0592 + 0.0091 t , for 364 t < 382 ; 11.7423 0.5207 exp { 0.0226 t } , for t 382 ; ,
where Y ^ t ( 14 ) denotes the fitted model in the 14-th analysis step.
Figure 11 shows the recorded values and the curve of the fitted piecewise model Y ^ t ( 14 ) , in the log and original scale, for a period of 600 days. The MSE of the fitted model is 0.0042 , t 0 . At this point of the analysis, the future projection is given by the log-Gompertz model fitted for sub-dataset D 7 . According to the point estimated for the parameters of this model, the projection for the maximum number of deaths is 122,296 (exp { 11.7142 } ). In other words, if the current scenario is maintained, it is expected that there will be additional 26,105 deaths.
Table 8 shows the projections of the model for the cumulative number of deaths in the last ten days of the sub-dataset D 7 and the recorded values. The fourth line of this Table shows the absolute percentage error in relation to the real value. The biggest percentage error was 0.7877 % .
In addition to modeling the different pandemic periods, the fitted piecewise model also allows linking the model change points to facts that affect the speed of the epidemic spreading. For example, on the 334th day (13 February 2021) the first case of COVID-19 by the P 1 variant was registered in the state of São Paulo. After thirty-one days, on the 364th day (15 March 2021), the proposed approach has indicated a change point; changing from an exponential model with a growth rate of 0.0043 to an exponential model with a growth rate of 0.0091. That is, a change for a more aggressive pandemic phase occurred. Since the disease has an average time from 4 to 14 days to manifest the first symptoms and the patients with grave symptoms remain an average time of 21 days in intensive clinical care; then, it is very plausible to consider that the P 1 variant is one of the possible factors related to the change point.
On the other hand, due to an excessive increase in the number of deaths due to the COVID-19 in the first months of 2021, the government of the state of São Paulo has published on 6 March 2021 (45-th day) a decree implementing an emergency phase to contain the transmission of the disease. In this decree, a curfew from 8 pm to 5 am and the closing of commerce was adopted. Twenty-eight days after the decree, the proposed approach has identified a change from the exponential model to a Gompertz model (day 382—4 February 2021). That is, a change to a “little better” phase occurred because, in this new phase, at least, it is possible to make a projection for the occurrence of a peak. Although the modeling procedure does not allow us to associate directly the publication of the decree with the “new” pandemic phase, it is very plausible to consider that the adoption of the decree has contributed to this change.
To finalize this section, we inform the reader that the fitted piecewise model in Equation (5) is not absolutely continuous. However, in the opposite to the fit of a single growth model, the absolute continuity of a piecewise model is not a fundamental property because the aim is to identify change points at which model changes happen. Thus, the final model may have a jump in the change points. However, as is expected for a growth model, the piecewise fitted model has the following properties:
  • f ( t ) is non-decreasing;
  • f ( t ) is continuos to the right side of the change points;
  • l i m t f ( t ) = 0 and l i m t + f ( t ) = α 1 * , where α 1 * is the upper asymptote of the last function.

3.4. Piecewise Growth Model for the Cumulative Number of Cases

As another illustration of the good performance of the proposed method, we fit a piecewise model for the cumulative number of cases recorded in the State of São Paulo, Brazil, in the period from 26 February 2020 to 30 April 2021. In this period, 2,903,709 cases of COVID-19 were recorded.
The cumulative number of cases also presented a heterogeneous behavior over time. In the course of analysis, we identify eight different periods for the number of cases with change points on the days d = { 34 , 93 , 144 , 256 , 310 , 346 , 376 } . However, at this point, it is important to emphasize that the number of cases is directly connected with the number of diagnostic tests performed. Thus, when a change in the pandemic’s behavior is detected, this may be due to increased testing.
The fitted piecewise model has the following mathematical expression
Z ^ t ( 14 ) = 7.5950 l o g 1 + 4214.0455 exp { 0.2850 } , for 0 t < 35 ; 13.1173 5.2413 exp { 0.0192 t } , for 35 t < 94 ; 14.3067 2.8624 exp { 0.0143 t } , for 94 t < 145 ; 14.0132 1.1061 exp { 0.0228 t } , f o r 145 t < 257 ; 13.9311 + 0.0048 t , for 257 t < 311 ; 14.1852 + 0.0067 t , for 311 t < 347 ; 14.4121 + 0.0047 t , for 347 t < 377 ; 15.7549 + 1.1918 exp { 0.0061 t } , for t 377 ; ,
where Z ^ t ( 14 ) denotes the fitted model for the number of cases in the 14-th analysis.
Figure 12 shows the recorded values and the curve of the fitted piecewise model Z ^ t ( 14 ) , in the original scale, for a period of 1000 days. The MSE of the fitted model is 0.0042 . At this point of the analysis, the future projections are given by the log-Gompertz model fitted for sub-dataset D 8 . According to point estimates for parameters of this model, the projection for the maximum number of cases is 6,954,504 (exp { 15.7549 } ).

4. Final Remarks

In this paper, we describe a case study on the evolution of the COVID-19 pandemic in the state of São Paulo, Brazil. The main aim of the study was to fit a piecewise growth model in order to be able to explain the different periods of the pandemic’s evolution. In addition, there is also the interest in being able to give a short-term forecast for the cumulative number of cases and predict the peak point of the pandemic for each period of the pandemic.
The modeling procedure was developed in the course of time by updating the model every thirty days. In addition, for each update, we also verify the existence of two different periods of the pandemic. In the affirmative case, we separate the dataset of the period into two sub-datasets and fit a growth model for each of the sub-datasets. Overall, we identify seven periods of the pandemic for the cumulative number of deaths. To each period of the pandemic, we fit three non-linear growth models: exponential, logistic and Gompertz. In order to select the best model for each sub-dataset, we consider the MSE, the AIC and BIC as criteria.
The fitted piecewise model for the number of deaths is a mix of the Gompertz model and the exponential model. From a practical viewpoint, this can be viewed as the main advantage of the proposed method, because, it is capable of explaining each period by different values of pandemic parameters, such as the peak and the growth rate for each period. In addition, the results also show that the fit of a piecewise model is more effective than the fit of a single growth model.
From a practical viewpoint, the identification of the change points shows for government agencies that some containment procedures of the transmission of the disease need to be implemented; or, if some containment procedures implemented are working. For example, if the data from a period are explained by a Gompertz model, then there is a projection for the peak of the pandemic (inflection point) and an estimate for the maximum number of cases (upper asymptote). Under this scenery, the government’s agents may elaborate better strategies for containment of the transmission of the virus. However, if a change point is identified, in which, the recorded data after this change point are explained by an exponential model, then there is a new situation without a projection for a peak. That is, the situation changed to a more aggressive phase of the pandemic. This scenery shows that that more restrictive containment strategies need to be implemented as soon as possible.
On the other hand, if we identify a change point from a period explained by an exponential model for a period explained by a Gompertz model; this indicates that a change happened towards “a little better” situation because now, at least, it is possible to project the peak of the pandemic. This information may be used by government agencies in order to elaborate and justify the adoption of containment strategies of the transmission of the disease, and, in this way, to obtain a flattening of the Gompertz curve and, consequently, anticipate the occurrence of the peak and reduce the number of deaths (decrease the upper asymptote value).
Both examples described above show that it is important to consider the fit of a piecewise growth model, such as the one in our proposal, in contrast to fitting a single growth model. Although the paper has been developed considering only three growth models (exponential, logistic, and Gompertz), the procedure presented can be easily adapted for other kinds of growth models. The computational codes used for fitting the models are in the R language and can be requested by contacting the authors via e-mail.

Author Contributions

E.F.S. and C.A.d.B.P. have developed the whole theoretical part of the research. E.F.S. has implemented the computational codes in the R software. All authors have read and agreed to the published version of the manuscript.

Funding

This research was financed in part by the Fundação Universidade Federal de Mato Grosso do Sul-UFMS/MEC-Brasil and by Coordenação de Pessaol de Nível Superior-Brasil (CAPES)-Finance Code 001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

https://www.seade.gov.br/coronavirus (accessed on 13 July 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Growth Model

In order to model the cumulative number of deaths recorded in the State of São Paulo Brazil, let Y t as defined in Equation (2) of the paper.

Appendix A.1. Exponential Growth Model

One of the best known nonlinear models to describe the growth of an epidemic disease is the exponential model. Assuming this model, the cumulative number of deaths is given by
X t = f ( t | θ ) = α 1 * exp { α 2 t } ,
for θ = ( α 1 * , α 2 ) , where α 1 * is the number of deaths in the initial time, t = 0 , and α 2 is the growth rate, for t 0 .
Taking the logarithmic transformation on both sides of Equation (A1), one obtains the following linearized form of the model, called the log-exponential model,
Y t = l o g ( X t ) = α 1 + α 2 t ,
where α 1 = l o g ( α 1 * ) , for t 0 .
Figure A1 shows the exponential and log-exponential functions for an initial value α 1 * = 2 and growth rate α 2 = { 0.10 , 0.20 , 0.40 } . As one can note in Figure A1, increasing the value of α 2 increases the rate of growth curve, meaning a fast growth of the number of disease death. In addition, the curve of this model grows infinitely regardless of the population size. This can be viewed as a practical problem since the number of cumulative deaths is bound, for instance, by the population size.
Figure A1. Exponential and log-exponential models.
Figure A1. Exponential and log-exponential models.
Entropy 23 01013 g0a1

Appendix A.2 Logistic Growth Model

Consider now the logistic growth model [19]. In contrast to the exponential model, this model has an S-shape curve and, consequently, a growth limit. The cumulative number of deaths is given by
X t = f ( t | θ ) = α 1 * 1 + α 2 exp { α 3 t } ,
where θ = ( α 1 * , α 2 , α 3 ) are the model parameters, for t 0 . The parameter α 1 * is the upper asymptote. In the context of the COVID-19, the value of α 1 * is an estimate for the maximum number of deaths. The parameter α 2 is related to the coordinates ( t m , N t m ) of the inflection point, where t m = l n ( α 2 ) α 3 and N t m = α 1 * 2 . The parameter α 3 is the intrinsic growth rate at the inflection point. The time-point to obtain 100 ( 1 p ) % of the upper asymptote α 1 * is t p = l o g ( p α 2 ) l o g ( 100 p ) α 3 , for p ( 0 , 1 ) .
Taking the logarithmic transformation on both sides of Equation (A2), we obtain the log-logistic model,
Y t = l o g ( X t ) = α 1 l o g 1 + α 2 exp { α 2 t } ,
where α 1 = l o g ( α 1 * ) , for t 0 .
Figure A2 shows the logistic and log-logistic models for an upper asymptote α 1 * = 10,000, α 2 = 5 and α 3 = { 0.10 , 0.15 , 0.30 } . Similar to the exponential model, by increasing the value of the parameter α 3 , the curve is more inclined. In these Figures, the symbol • represents the inflection point and the symbol ⋄ represents the time-point needed to reach 80 % of the α 1 value.

Appendix A.3 Gompertz Growth Model

As the third growth model, consider the Gompertz model [20,21]. This model is defined as
X t = f ( t | θ ) = α 1 * exp α 2 exp { α 3 t } ,
where θ = ( α 1 * , α 2 , α 3 ) are the model parameters. The interpretation of the parameters is similar to that described for the logistic model.
The main difference in relation to the logistic model is that the curve of the Gompertz model is not symmetric in relation to the inflection point. While the ordinate value of the inflection point of the logistic model is α 1 * 2 , for the Gompertz model, this value is approximately 37 % of the α 1 * value. The time-point to obtain 100 ( 1 p ) % of the upper asymptote α 1 * is t p = l o g ( α 2 ) l o g ( l o g ( p ) ) α 3 , for p ( 0 , 1 ) .
Taking the logarithmic transformation on both sides of Equation (A3), we obtain the log-Gompertz model,
Y t = l o g ( X t ) = α 1 α 2 exp { α 3 t } , for t 0 .
where α 1 = l o g ( α 1 * ) , for t 0 .
Figure A2. Logistic and log-logistic models.
Figure A2. Logistic and log-logistic models.
Entropy 23 01013 g0a2
Figure A3 shows the Gompertz and log-Gompertz models for an upper asymptote α 1 * = 10 , 000 , α 2 = 5 and α 3 = { 0.10 , 0.20 , 0.40 } . Similar to the exponential and logistic models, by increasing the value of the parameter α 3 , the curve is more inclined. In these Figures, the symbol • represents the inflection point and the symbol ⋄ represents the time-point needed to reach 80 % of the α 1 * value.
Figure A3. Gompertz and log-Gompertz models.
Figure A3. Gompertz and log-Gompertz models.
Entropy 23 01013 g0a3

References

  1. World Health Organization (WHO). Director-General’s Opening Remarks at the Media Briefing on COVID-19—30 July of 2020. 2020. Available online: https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---30-july-2020 (accessed on 1 February 2021).
  2. Musa, S.S.; Zhao, S.; Wang, M.H. Estimation of exponential growth rate and basic reproduction number of the coronavirus disease 2019 (COVID-19) in Africa. Infect. Dis. Poverty 2020, 9, 96. [Google Scholar] [CrossRef] [PubMed]
  3. Aviv-Sharon, E.A. Generalized logistic growth modeling of the COVID-19 pandemic in Asia. Infect. Dis. Model. 2020, 5, 502–509. [Google Scholar] [CrossRef] [PubMed]
  4. Vasconcelos, G.L.; Macêdo, A.M.S.; Ospina, R.; Almeida, F.A.G.; Duarte-Filho, G.C.; Brum, A.A.; Souza, I.C.L. Modelling fatality curves of COVID-19 and the effectiveness of intervention strategies. PeerJ 2020, 8, e9421. [Google Scholar] [CrossRef] [PubMed]
  5. Wu, K.; Darcet, D.; Wang, Q. Generalized logistic growth modeling of the COVID-19 outbreak: Comparing the dynamics in the 29 provinces in China and in the rest of the world. Nonlinear Dyn. 2020, 101, 1561–1581. [Google Scholar] [CrossRef] [PubMed]
  6. Valle, J.A.M. Predicting the number of total COVID-19 cases and deaths in Brazil by the Gompertz model. Nonlinear Dyn. 2020, 102, 2951–2957. [Google Scholar] [CrossRef] [PubMed]
  7. Mazurek, J.; Rico, C.P.; Garcia, C.F. Forecasting the Number of COVID-19 Cases and Deaths in the World, UK, Russia and Turkey by the Gompertz Curve. 2020. Available online: https://www.researchgate.net/publication/341132093_Forecasting_the_number_of_COVID-19_cases_and_deaths_in_the_World_UK_Russia_and_Turkey_by_the_Gompertz_curve?channel=doi&linkId=5eb042d6299bf18b9594bc43&showFulltext=true (accessed on 23 July 2021).
  8. Budimulyati, L.S.; Norr, R.R.; Saefuddin, A.; Talib, C. Comparison on accuracy of Logistic, Gompertz and Von Bertalanffy models in predicting growth of new born calf until first mating of holstein friesian heifers. J. Indones. Trop. Anim. Agric. 2012, 3, 151–160. [Google Scholar] [CrossRef] [Green Version]
  9. Saraiva, E.F.; Sauer, L.; Pereira, B.B.; Pereira, C.A.B. A piecewise growth model for modeling the accumulated number of COVID-19 cases in the city of Campo Grande. Rev. Barsileira Biom. 2021, 39, 240–265. [Google Scholar] [CrossRef]
  10. Akaike, H.A. New look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [Google Scholar] [CrossRef]
  11. Bozdogan, H. Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrica 1987, 52, 345–370. [Google Scholar] [CrossRef]
  12. Schwarz, G.E. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  13. Vieira, S.; Hoffmann, R. Comparison of the Logistic and the Gompertz growth functions considering additive and multiplicative error terms. Appl. Stat. 1977, 26, 143–148. [Google Scholar] [CrossRef]
  14. Hsieh, Y.H. Temporal patterns and geographic heterogeneity of zika virus (zikv) outbreaks in french polynesia and Central America. PeerJ 2017, 5, e3015. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  15. Cran, R. R: A Language and Environment for Statistical Computing; R foundation for Statistical Computing, Vienna, Austria. 2020. Available online: http://R-project.org (accessed on 23 July 2021).
  16. Florent, B.; Ritz, C.; Charles, M.; Brustsche, J.P.; Flandrois, P.; Delignette-Muller, M.L. A toolbox for nonlinear regression in R: The package nlstools. J. Stat. Softw. 2015, 66, 1–21. [Google Scholar]
  17. Pinheiro, J.; Bates, D.; Debroy, S.; Sakar, D.; R Core Team. Linear and Nonlinear Mixed, R Package Version 3.1-147. 2020. Available online: https://cran.r-project.org/web/packages/nlme/ (accessed on 23 July 2021).
  18. Lewis, F.; Butler, A.; Gilbert, L. A unified approach to model selection using the likelihood ratio test. Methods Ecol. Evol. 2010, 2, 155–162. [Google Scholar] [CrossRef]
  19. Blumberg, A.A. Logistic Growth Rate Functions. J. Theor. Biol. 1968, 21, 42–44. [Google Scholar] [CrossRef]
  20. Gompertz, B. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos. Trans. R. Soc. Lond. Biol. Sci. 1825, 182, 513–585. [Google Scholar]
  21. Winsor, C.P. The Gompertzcurve as a growth curve. Proc. Natl. Acad. Sci. USA 1932, 18. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Number of deaths recorded by day.
Figure 1. Number of deaths recorded by day.
Entropy 23 01013 g001
Figure 2. Cumulative number of deaths until day t, for t = 0 , 1 , , n .
Figure 2. Cumulative number of deaths until day t, for t = 0 , 1 , , n .
Entropy 23 01013 g002
Figure 3. Data from 17 March 2020 to 15 April 2020 and fitted model Y ^ t ( 1 ) , for t 0 .
Figure 3. Data from 17 March 2020 to 15 April 2020 and fitted model Y ^ t ( 1 ) , for t 0 .
Entropy 23 01013 g003
Figure 4. Data from 17 March 2020 to 15 May 2020 and fitted model Y ^ t ( 1 ) , for t 0 .
Figure 4. Data from 17 March 2020 to 15 May 2020 and fitted model Y ^ t ( 1 ) , for t 0 .
Entropy 23 01013 g004
Figure 5. Data from 17 March 2020 to 15 May 2020 and fitted models on 30th Y ^ t ( 1 ) and 60th day Y ^ t ( 2 ) .
Figure 5. Data from 17 March 2020 to 15 May 2020 and fitted models on 30th Y ^ t ( 1 ) and 60th day Y ^ t ( 2 ) .
Entropy 23 01013 g005
Figure 6. Data from 17 March 2020 to 14 June 2020 and fitted models on 30th Y ^ t ( 1 ) and 60th day Y ^ t ( 2 ) .
Figure 6. Data from 17 March 2020 to 14 June 2020 and fitted models on 30th Y ^ t ( 1 ) and 60th day Y ^ t ( 2 ) .
Entropy 23 01013 g006
Figure 7. MSE values for fitted models according to the grid G .
Figure 7. MSE values for fitted models according to the grid G .
Entropy 23 01013 g007
Figure 8. Data from 26 February 2020 to 25 May 20 and fitted piecewise model Y ^ t ( 3 ) , t 0 .
Figure 8. Data from 26 February 2020 to 25 May 20 and fitted piecewise model Y ^ t ( 3 ) , t 0 .
Entropy 23 01013 g008
Figure 9. MSE values for fitted models according to the change point d.
Figure 9. MSE values for fitted models according to the change point d.
Entropy 23 01013 g009
Figure 10. Data from 26 February 2020 to 25 May 20 and fitted piecewise model Y ^ t ( 5 ) , t 0 .
Figure 10. Data from 26 February 2020 to 25 May 20 and fitted piecewise model Y ^ t ( 5 ) , t 0 .
Entropy 23 01013 g010
Figure 11. Data from 26 February 2020 to 30 April 2021 and fitted piecewise model Y ^ t ( 14 ) , t 0 .
Figure 11. Data from 26 February 2020 to 30 April 2021 and fitted piecewise model Y ^ t ( 14 ) , t 0 .
Entropy 23 01013 g011
Figure 12. Fitted piecewise model for the cumulative number of cases.
Figure 12. Fitted piecewise model for the cumulative number of cases.
Entropy 23 01013 g012
Table 1. Log-transformed growth models.
Table 1. Log-transformed growth models.
ModelParametersMathematical ExpressionInflection PointGrowth at Inflection Point
log-exponential θ = ( α 1 , α 2 ) Y t = α 1 + α 2 t does not existdoes not exist
log-logistic θ = ( α 1 , α 2 , α 3 ) Y t = α 1 l o g ( 1 + α 2 exp { α 3 t } ) T = l o g ( α 2 ) / α 3 f ( T | θ ) = α 1 l o g ( 2 )
log-Gompertz θ = ( α 1 , α 2 , α 3 ) Y t = α 1 α 2 exp { α 3 t } T = l o g ( α 2 ) / α 3 f ( T | θ ) = α 1 1
Table 2. MSE, AIC and BIC values for the fitted models for dataset D 1 .
Table 2. MSE, AIC and BIC values for the fitted models for dataset D 1 .
ModelCriterion
MSEAICBIC
log-exponential0.2776 52.6845 56.8881
log-logistic0.0968 23.0901 28.6949
log-Gompertz0.0306−11.5014−5.8966
Table 3. Parameter estimates for model Y t ( 1 ) .
Table 3. Parameter estimates for model Y t ( 1 ) .
ValuesParameter
α 1 α 2 α 3
Estimates7.0025 6.9700 0.0861
C. interval(6.6901, 7.3981)(6.6779, 7.2712)(0.0734, 0.0994)
Table 4. MSE, AIC and BIC values for the fitted models for dataset D 1 u .
Table 4. MSE, AIC and BIC values for the fitted models for dataset D 1 u .
ModelCriterion
MSEAICBIC
log-exponential0.4724 131.2807 137.5637
log-logistic0.1700 71.9576 80.3350
log-Gompertz0.0473−4.81703.5604
Table 5. Parameter estimates for model Y t ( 2 ) .
Table 5. Parameter estimates for model Y t ( 2 ) .
ValuesParameter
α 1 α 2 α 3
Estimates8.5646 7.4705 0.0500
C. interval(8.3497, 8.8117)(7.2472, 7.6960)(0.0452, 0.0551)
Table 6. MSE, AIC and BIC values for the models fitted for datasets D 1 and D 2 .
Table 6. MSE, AIC and BIC values for the models fitted for datasets D 1 and D 2 .
ModelDataset D 1 Dataset D 2
MSEAICBICMSEAICBIC
log-exponential0.201230.692733.67990.0456−11.5210−4.7755
log-logistic0.071612.019516.00240.0046−169.9950−161.0010
log-Gompertz0.0242−9.6880−5.70510.0017−237.1040−228.1100
Table 7. Estimates for the piecewise model parameters Y t ( 3 ) .
Table 7. Estimates for the piecewise model parameters Y t ( 3 ) .
ParameterDataset
D 1 D 2
α 1 6.024210.0677
(5.6457, 6.5511)(9.9692, 10.1756)
α 2 6.37044.3578
(6.0233, 6.7403)(4.2767, 4.4464)
α 3 0.12270.0242
(0.0984, 0.1484)(0.0230, 0.0254)
Table 8. Projections and percentage error for the period from 11/09/20 to 11/16/20.
Table 8. Projections and percentage error for the period from 11/09/20 to 11/16/20.
Date04/13/2104/14/2104/15/2104/16/2104/17/2104/18/2104/19/2104/20/2104/21/2104/22/21
Projection90,27990,94991,60992,25992,89993,52994,14994,75995,35995,950
Real value90,62790,81091,67392,54892,69392,79893,84294,65695,53296,191
error34813964289206731307103173241
% error0.38400.15310.06980.31230.22220.78770.32710.10880.18110.2505
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Saraiva, E.F.; de Bragança Pereira, C.A. Piecewise Modeling the Accumulated Daily Growth of COVID-19 Deaths: The Case of the State of São Paulo, Brazil. Entropy 2021, 23, 1013. https://doi.org/10.3390/e23081013

AMA Style

Saraiva EF, de Bragança Pereira CA. Piecewise Modeling the Accumulated Daily Growth of COVID-19 Deaths: The Case of the State of São Paulo, Brazil. Entropy. 2021; 23(8):1013. https://doi.org/10.3390/e23081013

Chicago/Turabian Style

Saraiva, Erlandson Ferreira, and Carlos Alberto de Bragança Pereira. 2021. "Piecewise Modeling the Accumulated Daily Growth of COVID-19 Deaths: The Case of the State of São Paulo, Brazil" Entropy 23, no. 8: 1013. https://doi.org/10.3390/e23081013

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop