# Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Modeling Virus Dynamics (Traditional Approaches)

#### 2.2. Long Short Term Memory for Data Training (LSTM)

**U**, **W**, and **b** are respectively the input weights, recurrent weights, and biases; **X** is the input; **S** is the hidden output; **C** is the cell state; and **t** is the time step. The $\sigma$ and $\tau$ are the activation functions of the output gates. In the classical LSTM model, the first one is the sigmoid function and the second one is the hyperbolic tangent function. There are several types of activation functions that could be used in LSTM architectures [22].
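
The roles of these symbols can be illustrated with a single forward step of a classical LSTM cell. The following is a minimal NumPy sketch under stated assumptions, not the implementation used in this work: the gate labels `f`, `i`, `o`, `g`, the dictionary layout of the parameters, and the function name `lstm_step` are illustrative choices. Only the symbols U, W, b, X, S, C, t and the two activation functions come from the text.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, the sigma activation of the classical LSTM."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, s_prev, c_prev, U, W, b):
    """One time step of a classical LSTM cell.

    x_t    : input X at time step t
    s_prev : hidden output S at time step t-1
    c_prev : cell state C at time step t-1
    U, W, b: dicts of input weights, recurrent weights and biases,
             keyed by hypothetical gate labels 'f', 'i', 'o', 'g'.
    """
    # Forget, input and output gates use the sigmoid activation.
    f = sigmoid(U["f"] @ x_t + W["f"] @ s_prev + b["f"])
    i = sigmoid(U["i"] @ x_t + W["i"] @ s_prev + b["i"])
    o = sigmoid(U["o"] @ x_t + W["o"] @ s_prev + b["o"])
    # Candidate cell update uses the hyperbolic tangent (tau).
    g = np.tanh(U["g"] @ x_t + W["g"] @ s_prev + b["g"])

    c_t = f * c_prev + i * g      # new cell state C_t
    s_t = o * np.tanh(c_t)        # new hidden output S_t
    return s_t, c_t
```

Iterating `lstm_step` over a sequence, feeding each returned `s_t` and `c_t` into the next call, gives the recurrent behaviour that the forecasting models rely on.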

#### 2.3. Preliminary Clustering: Brazilian States in the Global Context

- Early Mortality: the weekly number of deaths 14 days after the outbreak, divided by the number of confirmed cases in the week of the outbreak. A two-week period was used because it is the time required to know the outcome of an infection.
- Days until 10x: the number of days, counted from the day of the outbreak, that it takes for the confirmed cases to multiply by 10.
- Early Acceleration: if we denote $\Delta_{W_0 W_1}$ as the percentage increase of confirmed cases from the week of the outbreak to the week after, and $\Delta_{W_1 W_2}$ as the percentage increase from the 1st to the 2nd week after the outbreak, then the early acceleration is defined by: $$\mathrm{earlyAccel} = \Delta_{W_1 W_2} / \Delta_{W_0 W_1}.$$
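
The three features above can be computed from daily series as in the following sketch. It rests on assumptions that the definitions leave open: both inputs are daily new counts, index 0 is the outbreak day, weeks are consecutive 7-day blocks, and "confirmed cases" in the 10x rule means the cumulative count. The function name `early_response_features` is hypothetical.

```python
def early_response_features(daily_cases, daily_deaths):
    """Return (early_mortality, days_until_10x, early_accel).

    Assumptions (not from the paper): both lists hold daily new counts,
    index 0 is the outbreak day, weeks are consecutive 7-day blocks, and
    the weekly counts and the first weekly increase are non-zero.
    """
    # Weekly confirmed cases for W0 (outbreak week), W1 and W2.
    w0, w1, w2 = (sum(daily_cases[7 * k:7 * (k + 1)]) for k in range(3))

    # Early mortality: deaths in the week starting 14 days after the
    # outbreak, divided by confirmed cases in the outbreak week.
    early_mortality = sum(daily_deaths[14:21]) / w0

    # Days until 10x: first day on which cumulative cases reach ten times
    # the count of the outbreak day (None if never reached).
    days_until_10x = None
    cumulative = 0
    for day, new_cases in enumerate(daily_cases):
        cumulative += new_cases
        if cumulative >= 10 * daily_cases[0]:
            days_until_10x = day
            break

    # Early acceleration: ratio of consecutive week-over-week increases.
    # The factor of 100 in the percentages cancels in the ratio.
    delta_w0_w1 = (w1 - w0) / w0
    delta_w1_w2 = (w2 - w1) / w1
    early_accel = delta_w1_w2 / delta_w0_w1
    return early_mortality, days_until_10x, early_accel
```

An early acceleration above 1 means the weekly growth rate itself is increasing, while a value below 1 indicates that growth is already slowing in the second week.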

#### 2.4. Modeling Time-Series with Modified Auto-Encoders

#### 2.4.1. The Modified Auto-Encoder proposal

#### 2.4.2. Data Processing

#### 2.4.3. Forecasting New Daily Cases

#### 2.5. Final Approximation for the Covid-19 Curves

## 3. Results

#### 3.1. SIR, SEIR, and SIRASD Results

#### 3.2. Preliminary Results with LSTM

#### 3.3. Checking the Clustering Results (Input to MAE)

#### 3.4. MAE Results

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Byass, P. Eco-epidemiological assessment of the COVID-19 epidemic in China, January-February 2020. Glob. Health Action **2020**, 13, 1760490.
2. Hamzah, F.; Binti, A.; Lau, C.; Nazri, H.; Ligot, D.V.; Lee, G.; Tan, C.L. CoronaTracker: Worldwide COVID-19 Outbreak Data Analysis and Prediction. Bull. World Health Organ. **2020**, 1, 32.
3. Fanelli, D.; Piazza, F. Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos Solitons Fractals **2020**, 134, 109761.
4. Webb, G.F.; Magal, P.; Liu, Z.; Seydi, O. A model to predict COVID-19 epidemics with applications to South Korea, Italy, and Spain. medRxiv **2020**.
5. Grant, A. Dynamics of COVID-19 epidemics: SEIR models underestimate peak infection rates and overestimate epidemic duration. medRxiv **2020**.
6. Loli Piccolomini, E.; Zama, F. Monitoring Italian COVID-19 spread by an adaptive SEIRD model. medRxiv **2020**.
7. Baerwolff, G.K. A Contribution to the Mathematical Modeling of the Corona/COVID-19 Pandemic. medRxiv **2020**.
8. Periwal, N.; Sarma, S.; Arora, P.; Sood, V. In-silico analysis of SARS-CoV-2 genomes: Insights from SARS encoded non-coding RNAs. bioRxiv **2020**.
9. Distante, C.; Piscitelli, P.; Miani, A. Covid-19 Outbreak Progression in Italian Regions: Approaching the Peak by the End of March in Northern Italy and First Week of April in Southern Italy. Int. J. Environ. Res. Public Health **2020**, 17, 3025.
10. Wang, L.; Li, J.; Guo, S.; Xie, N.; Yao, L.; Cao, Y.; Day, S.W.; Howard, S.C.; Graff, J.C.; Gu, T.; et al. Real-time estimation and prediction of mortality caused by COVID-19 with patient information based algorithm. Sci. Total Environ. **2020**, 727, 138394.
11. te Vrugt, M.; Bickmann, J.; Wittkowski, R. Effects of social distancing and isolation on epidemic spreading: A dynamical density functional theory model. arXiv **2020**, arXiv:2003.13967.
12. Nesteruk, I. Comparison of the coronavirus epidemic dynamics in Italy and mainland China. ResearchGate Prepr. **2020**.
13. Nesteruk, I. Statistics-based predictions of coronavirus epidemic spreading in mainland China. ResearchGate Prepr. **2020**.
14. Ardabili, S.; Mosavi, A.; Ghamisi, P.; Ferdinand, F.; Varkonyi-Koczy, A.; Reuter, U.; Rabczuk, T.; Atkinson, P. COVID-19 Outbreak Prediction with Machine Learning. Preprints **2020**, 04, 2020040311.
15. Distante, C.; Gadelha Pereira, I.; Garcia Goncalves, L.M.; Piscitelli, P.; Miani, A. Forecasting Covid-19 Outbreak Progression in Italian Regions: A model based on neural network training from Chinese data. medRxiv **2020**.
16. Yang, Z.; Zeng, Z.; Wang, K.; Wong, S.S.; Liang, W.; Zanin, M.; Liu, P.; Cao, X.; Gao, Z.; Mai, Z.; et al. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. **2020**, 12, 165.
17. Roda, W.C.; Varughese, M.B.; Han, D.; Li, M.Y. Why is it difficult to accurately predict the COVID-19 epidemic? Infect. Dis. Model. **2020**, 5, 271–281.
18. Otunuga, O.M.; Ogunsolu, M.O. Qualitative analysis of a stochastic SEITR epidemic model with multiple stages of infection and treatment. Infect. Dis. Model. **2020**, 5, 61–90.
19. Bastos, S.B.; Cajueiro, D.O. Modeling and forecasting the early evolution of the Covid-19 pandemic in Brazil. arXiv **2020**, arXiv:2003.14288.
20. Sagheer, A.; Kotb, M. Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems. Sci. Rep. **2019**, 9, 1938.
21. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. **1997**, 9, 1735–1780.
22. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
23. Sagheer, A.; Kotb, M. Time series forecasting of petroleum production using deep LSTM recurrent networks. Neurocomputing **2019**, 323, 203–213.
24. Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. **2020**, 20, 533–534.
25. Coronavírus Brasil. Brazil Health Ministry—Data Repository (Covid-19). Available online: https://covid.saude.gov.br/ (accessed on 22 May 2020).
26. COVID-19. Italy—Official Covid Data Repository. Available online: https://github.com/pcm-dpc/COVID-19 (accessed on 22 May 2020).
27. Ploner, M. Towards Data Science: Which Countries React Similar to Covid 19, Machine Learning Provides the Answer. Towards Data Science. Available online: https://towardsdatascience.com/ (accessed on 22 May 2020).
28. McInnes, L.; Healy, J.; Melville, J. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv **2018**, arXiv:1802.03426.
29. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
30. Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science **2007**, 315, 972–976.
31. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. **2019**, 12, 307–392.
32. Is the COVID-19 Pandemic Curve a Gaussian Curve? Cross Validated, Statistical Enthusiast. Available online: https://stats.stackexchange.com/q/455202 (accessed on 22 March 2020).
33. Lyra, W.; do Nascimento Junior, J.D.; Belkhiria, J.; Leandro de Almeida, P.P.M.C.; de Andrade, I. Projeções Para o Estado do Rio Grande do Norte: População, Demanda por Hospitalização e Progressão dos Casos. Covid-19 Web Page of Department for Theoric and Experimental Physics—UFRN. Available online: http://astro.dfte.ufrn.br/html/Cliente/COVID19.php (accessed on 4 May 2020).
34. Lyra, W.; do Nascimento, J.D.; Belkhiria, J.; de Almeida, L.; Chrispim, P.P.; de Andrade, I. COVID-19 pandemics modeling with SEIR(+CAQH), social distancing, and age stratification. The effect of vertical confinement and release in Brazil. medRxiv **2020**.
35. Dana, S.; Simas, A.B.; Filardi, B.A.; Rodriguez, R.N.; Valiengo, L.L.d.C.; Gallucci-Neto, J. Brazilian Modeling of COVID-19 (BRAM-COD): A Bayesian Monte Carlo approach for COVID-19 spread in a limited data set context. medRxiv **2020**.

**Figure 2.** Comparison results for LSTM, DLSTM, and LSTM-SAE on Covid-19 cumulative (**a**) and daily (**b**) number of cases, data from Hubei, province of China.

**Figure 4.** Values of the three features used for characterizing the early response to Covid-19 for the Brazilian states.

**Figure 6.** Projections for Rio Grande do Norte state (in the northeast of Brazil) [34]. Figure printed out from the web application running at http://astro.dfte.ufrn.br/html/Cliente/COVID19.php. Accessed on 4 May 2020.

**Figure 7.** Projections for Brazil with adapted SEIR model [34], extracted from http://astro.dfte.ufrn.br/html/Cliente/COVID19.php. Accessed on 4 May 2020.

**Figure 11.** 2D UMAP embedding of the different countries and states studied. The colors represent the different clusters generated using Affinity Propagation.

**Figure 13.** Violin plots representing the values taken by the different features for each group obtained after UMAP + Affinity Propagation clustering.

**Figure 18.** Curve fitting for Rio de Janeiro state (lognormal model was the best fit) with peak indicated on 31 May 2020.

**Figure 19.** Curve fitting for São Paulo state (logistic model was the best fit) with peak indicated on 26 May 2020.

**Figure 20.** Curve fitting for Rio Grande do Norte state (Burr model was the best fit) with peak indicated on 21 May 2020.

| Model | Hidden Layers | Epochs | Epochs AE Model | Dropout | Units | Sequence Length | MAPE | Correlation |
|---|---|---|---|---|---|---|---|---|
| LSTM | 1 | 15 | - | 0.3 | 4 | 5 | 211 | 0.732 |
| DLSTM | 3 | 15 | - | 0.3 | [10, 8, 6] | 5 | 92 | 0.798 |
| LSTM-SAE | 1 | 50 | 15 | 0.3 | 4 | 5 | 84 | 0.822 |
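
For reference, the two metric columns above can be computed as in the following sketch. It assumes that MAPE is reported in percent and that the correlation is Pearson's coefficient between observed and predicted values. Neither assumption is confirmed by the table itself, and the function names are illustrative.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no value in `actual` is zero.
    """
    errors = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

Under this reading, a MAPE above 100 means the average absolute error exceeds the observed value itself, which can happen when the observed daily counts are small.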

**Table 2.** Peak occurrences for each state predicted by the MAE model and by a fitted probability distribution. We also indicate the total number of cases expected by the MAE prediction and the day on which $97\%$ of that total will be reached.

| State | Predicted by MAE | Curve Fit Peak | Best Curve | Total | 97% of Total |
|---|---|---|---|---|---|
| TO | 2020-05-20 | 2020-05-20 | Pearson | 846 | 2020-06-13 |
| SE | 2020-05-21 | 2020-05-22 | Lognormal | 2546 | 2020-06-13 |
| MG | 2020-05-21 | 2020-05-17 | Logistic | 2992 | 2020-06-03 |
| MS | 2020-05-21 | 2020-05-24 | Pearson | 327 | 2020-05-28 |
| PA | 2020-06-01 | 2020-06-03 | Pearson | 10,332 | 2020-06-11 |
| AP | 2020-05-20 | 2020-05-20 | Logistic | 5172 | 2020-06-15 |
| MA | 2020-05-14 | 2020-05-14 | Logistic | 9684 | 2020-06-10 |
| CE | 2020-05-30 | 2020-05-28 | Pearson | 11,556 | 2020-05-29 |
| PE | 2020-05-29 | 2020-05-30 | Pearson | 18,210 | 2020-06-08 |
| RJ | 2020-05-31 | 2020-05-31 | Lognormal | 21,587 | 2020-06-07 |
| SP | 2020-05-28 | 2020-05-26 | Logistic | 64,984 | 2020-06-07 |
| RN | 2020-05-20 | 2020-05-21 | Burr | 6025 | 2020-07-06 |
| DF | 2020-05-25 | 2020-05-27 | Logistic | 6347 | 2020-07-06 |
| RO | 2020-05-21 | 2020-05-24 | Pearson | 3061 | 2020-08-10 |
| PI | 2020-05-20 | 2020-05-24 | Pearson | 4974 | 2020-08-13 |
| PB | 2020-05-21 | 2020-05-26 | Pearson | 8765 | 2020-08-14 |
| AL | 2020-05-21 | 2020-05-28 | Pearson | 8119 | 2020-08-11 |
| BA | 2020-05-19 | 2020-05-19 | Pearson | 8945 | 2020-08-04 |
| ES | 2020-05-21 | 2020-05-23 | Pearson | 18,271 | 2020-08-12 |
| PR | 2020-05-21 | 2020-05-19 | Lognormal | 4038 | 2020-08-04 |
| SC | 2020-05-21 | 2020-05-26 | Pearson | 15,329 | 2020-08-13 |
| RS | 2020-05-20 | 2020-05-20 | Gamma | 4269 | 2020-08-03 |
| MT | 2020-05-21 | 2020-05-25 | Pearson | 701 | 2020-07-30 |
| GO | 2020-05-22 | 2020-05-25 | Pearson | 2245 | 2020-08-03 |
| AC | 2020-05-20 | 2020-05-25 | Burr | 3490 | 2020-07-12 |
| AM | 2020-05-20 | 2020-05-26 | Burr | 16,053 | 2020-07-04 |
| RR | 2020-05-20 | 2020-05-24 | Burr | 2206 | 2020-07-07 |
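
Given a forecast of new daily cases, the peak date, the expected total and the date on which 97% of that total is reached can be read off as in this sketch. It is an illustration under stated assumptions rather than the procedure used to build the table: the forecast is assumed to be a list of daily new cases starting at `start_date`, and the function name `summarize_forecast` is hypothetical.

```python
from datetime import date, timedelta
from itertools import accumulate


def summarize_forecast(daily_cases, start_date, fraction=0.97):
    """Return (peak_date, total_cases, date_fraction_reached).

    daily_cases: forecast of new daily cases, one value per day,
                 with index 0 falling on start_date (assumed layout).
    """
    cumulative = list(accumulate(daily_cases))
    total = cumulative[-1]

    # Peak: day with the largest number of new cases.
    peak_day = max(range(len(daily_cases)), key=daily_cases.__getitem__)

    # First day on which the cumulative count reaches the given
    # fraction of the forecast total.
    reach_day = next(
        day for day, count in enumerate(cumulative)
        if count >= fraction * total
    )
    return (
        start_date + timedelta(days=peak_day),
        total,
        start_date + timedelta(days=reach_day),
    )
```

A long gap between the peak date and the 97% date, as for several states in the table, corresponds to a forecast curve with a slowly decaying right tail.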

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pereira, I.G.; Guerin, J.M.; Silva Júnior, A.G.; Garcia, G.S.; Piscitelli, P.; Miani, A.; Distante, C.; Gonçalves, L.M.G.
Forecasting Covid-19 Dynamics in Brazil: A Data Driven Approach. *Int. J. Environ. Res. Public Health* **2020**, *17*, 5115.
https://doi.org/10.3390/ijerph17145115
