## 1. Introduction

## 2. Materials and Methods

#### 2.1. Principal Component Analysis

**A**, which is given by:

**A**:

#### 2.2. Epidemiological SEIRD Model

- Population size (N) is constant;
- Demographic features are not modeled;
- Homogeneous mixing: an infected individual has an equal chance of contacting any susceptible person.

- Susceptible (S): Individual who can be infected on day t, has never been infected, and is not immune to infection;
- Exposed (E): Individual who has been exposed to the disease but is not yet able to infect another person and shows no symptoms;
- Infected (I): Individual who is infected and producing virus that can potentially infect other individuals;
- Recovered (R): Individual who was ill and recovered on day t with presumed acquired immunity;
- Dead (D): Individual who died because of the infection.
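
The compartments above can be tied together in a short numerical sketch. The system below is the common textbook form of SEIRD, not necessarily the exact parameterization used in this work: the rate names `beta` (transmission), `sigma` (incubation), `gamma` (recovery), and `mu` (death), the parameter values, and the fixed-step Runge-Kutta integrator are all assumptions made for illustration.

```python
def seird_derivatives(state, beta, sigma, gamma, mu):
    """Right-hand side of a textbook SEIRD system (assumed form)."""
    s, e, i, r, d = state
    n = s + e + i + r + d            # total population N, constant by construction
    new_exposed = beta * s * i / n   # homogeneous mixing between S and I
    return (
        -new_exposed,                     # dS/dt
        new_exposed - sigma * e,          # dE/dt
        sigma * e - (gamma + mu) * i,     # dI/dt
        gamma * i,                        # dR/dt
        mu * i,                           # dD/dt
    )


def rk4_step(state, dt, **rates):
    """Advance the state by dt with the classical 4th-order Runge-Kutta scheme."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = seird_derivatives(state, **rates)
    k2 = seird_derivatives(shifted(state, k1, dt / 2), **rates)
    k3 = seird_derivatives(shifted(state, k2, dt / 2), **rates)
    k4 = seird_derivatives(shifted(state, k3, dt), **rates)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(initial_state, days, steps_per_day=10, **rates):
    """Return one (S, E, I, R, D) tuple per day, starting with day 0."""
    dt = 1.0 / steps_per_day
    state = tuple(float(x) for x in initial_state)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, **rates)
        trajectory.append(state)
    return trajectory


# Hypothetical population and rates, chosen only to exercise the code.
trajectory = simulate(
    (999_990, 0, 10, 0, 0), days=120,
    beta=0.4, sigma=1 / 5, gamma=1 / 10, mu=0.002,
)
```

Because the five derivatives sum to zero, the total S + E + I + R + D stays equal to N throughout the simulation, which matches the constant-population assumption stated above.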

#### Basic Reproduction Number

#### 2.3. Multi-Variate LSTM Model

#### 2.4. RMSE and Model Grid-Search

## 3. Experiments and Results

#### 3.1. SEIRD Model to Extract Basic and Current Reproduction Rate (${R}_{0}$ and ${R}_{e}$)

#### 3.2. Principal Components for Mobility Data

#### 3.3. Forecasting COVID-19 Deaths

## 4. Discussion

## 5. Conclusions

## Abbreviations

Abbreviation | Definition |
---|---|
GDP | Gross Domestic Product |
LSTM | Long Short-Term Memory |
ODE | Ordinary Differential Equation |
OECD | Organisation for Economic Co-operation and Development |
PC | Principal Component |
PCA | Principal Component Analysis |
${R}_{0}$ | Basic Reproduction Number |
${R}_{e}$ | Current Reproduction Number |
RMSE | Root Mean Squared Error |
RNN | Recurrent Neural Network |
SEIRD | Susceptible–Exposed–Infected–Recovered–Dead Model |
SIR | Susceptible–Infected–Removed Model |

**Figure 1.** Moving average for COVID-19 in Brazil, presented according to epidemiological week. (**a**) Cumulative and daily case reports. (**b**) Cumulative and daily deaths.

**Figure 2.** Heat map of cumulative COVID-19 deaths in each Brazilian state. For example, São Paulo, the most populous Brazilian state, had recorded 109,241 deaths by the time this research was completed.

**Figure 3.** Biweekly cases of COVID-19 reported from epidemiological week 5 of 2020 to epidemiological week 22 of 2021. (**a**) States of São Paulo (yellow), Rio de Janeiro (green), and Minas Gerais (blue); (**b**) Brazil.

**Figure 4.** Brazilian Community Mobility Reports for retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential areas on ordinary days and national holidays during the COVID-19 pandemic. (**a**) Community mobility reports and holidays; (**b**) biweekly reports of COVID-19 and holidays. Red represents the holiday plus 14 days; purple represents Brazilian elections. Mobility data provided by Google LLC.

**Figure 5.** The evolution of individuals in a population during the course of an outbreak according to the SEIRD model. Each group of individuals is defined by an ODE and categorized as Susceptible, Exposed, Infected, Recovered, or Dead.

**Figure 10.** Histogram and distribution of ${R}_{0}$ in holiday and non-holiday periods. (**a**) ${R}_{0}$ distribution, holiday period. (**b**) ${R}_{0}$ distribution, non-holiday period.

**Figure 11.** Shapiro–Wilk outcomes. (**a**) Holiday period with ${R}_{0}>1$. (**b**) Non-holiday period with ${R}_{0}>1$. (**c**) Holiday period with ${R}_{0}<1$. (**d**) Non-holiday period with ${R}_{0}<1$.

**Figure 13.** PC${}_{1}$ and PC${}_{2}$ shown with the Basic Reproduction Number ${R}_{0}$ during the COVID-19 timeline in Brazil.

**Figure 14.** PC scattering based on mobility reports considering ${R}_{0}>1$ and the holiday period. A point with a value of 0 indicates a non-holiday period, whereas a value of 1 indicates a holiday period.

**Figure 15.** Clusters based on the scattering of PC values using mobility data during the COVID-19 epidemic in Brazil. A point with a value of 0 indicates a non-holiday period, whereas a value of 1 indicates a holiday period. (**a**) PC scattering based on mobility reports considering ${R}_{0}>1$ and the holiday period in 2020. (**b**) PC scattering based on mobility reports considering ${R}_{0}>1$ and the holiday period in 2021.

**Figure 16.** Boxplots of the median and average RMSE from the grid search for four sets of input data: Cases, Cases + ${R}_{0}$, ${R}_{0}$ + ${R}_{e}$ + Holiday flag, and PC${}_{1}$ + PC${}_{2}$. For each configuration, the graphs show the median and average of the accumulated error in predicting deaths over 50 repeated experiments. (**a**) Cases. (**b**) Cases + ${R}_{0}$. (**c**) ${R}_{0}$ + ${R}_{e}$ + Holiday flag. (**d**) PC${}_{1}$ + PC${}_{2}$.

**Figure 17.** Curves of the selected settings. **Conf.1**: Cases, our baseline. **Conf.3**: Cases + ${R}_{0}$, selected when considering which parameters can improve on the baseline. **Conf.9**: ${R}_{0}$ + ${R}_{e}$ + Holiday flag, and **Conf.10**: PC${}_{1}$ + PC${}_{2}$, both selected among the settings that do not use Cases as an input feature. For each of these configurations, we draw the best curve (the one with the lowest RMSE) over the test data. (**a**) COVID-19 daily deaths forecasts using Cases as input data. (**b**) COVID-19 daily deaths forecasts using Cases + ${R}_{0}$ as input data. (**c**) COVID-19 daily deaths forecasts using ${R}_{0}$ + ${R}_{e}$ + Holiday flag as input data. (**d**) COVID-19 daily deaths forecasts using PC${}_{1}$ + PC${}_{2}$ as input data.

**Table 1.** List of all column names contained in the open-access dataset of COVID-19 [17].

Column Name | Column Name |
---|---|
epi week | total cases per 100 k inhabitants |
date | deaths by totalCases |
country | recovered |
state | suspects |
city | tests |
newDeaths | tests per 100 k inhabitants |
deaths | vaccinated |
newCases | vaccinated per 100 k inhabitants |
totalCases | vaccinated second |
deathsMS | vaccinated second per 100 k inhabitants |
totalCasesMS | vaccinated single |
deaths per 100 k inhabitants | vaccinated single per 100 k inhabitants |

**Table 2.** Brazilian holidays. * According to Regulation No. 2.621/GM-MD, published on 5 August [19], the Brazilian Ministry of Defense prohibited the Armed Forces from participating in commemorative events that could result in crowding, such as parades and military demonstrations.

Date | Weekday | Holiday |
---|---|---|
1 Jan | Wed | New Year’s Day |
24 Feb | Mon | Carnival |
25 Feb | Tue | Carnival |
10 Apr | Fri | Good Friday |
21 Apr | Tue | Minas Conspiracy |
1 May | Fri | International Workers’ Day |
11 Jun | Thu | Corpus Christi |
7 Sep | Mon | Independence Day * |
12 Oct | Mon | Our Lady of Aparecida |
2 Nov | Mon | All Souls’ Day |
15 Nov | Sun | Proclamation of the Republic |
25 Dec | Fri | Christmas |
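
The Holiday flag used as a model input (each holiday plus the 14 subsequent days) can be derived directly from the dates in Table 2. The sketch below is one possible encoding, not the authors' code: the year 2020 is an assumption inferred from the weekdays listed in the table, and the names `HOLIDAYS_2020`, `WINDOW_DAYS`, and `holiday_flag` are hypothetical.

```python
from datetime import date, timedelta

# Dates from Table 2; the year 2020 is assumed because it matches the weekdays.
HOLIDAYS_2020 = [
    date(2020, 1, 1), date(2020, 2, 24), date(2020, 2, 25),
    date(2020, 4, 10), date(2020, 4, 21), date(2020, 5, 1),
    date(2020, 6, 11), date(2020, 9, 7), date(2020, 10, 12),
    date(2020, 11, 2), date(2020, 11, 15), date(2020, 12, 25),
]

WINDOW_DAYS = 14  # number of days after the holiday that still count as flagged


def holiday_flag(day, holidays=HOLIDAYS_2020, window_days=WINDOW_DAYS):
    """Return 1 if `day` is a holiday or falls within the window after one, else 0."""
    window = timedelta(days=window_days)
    return int(any(timedelta(0) <= day - h <= window for h in holidays))


# Flag every day of January 2020 as an illustration.
january_flags = [holiday_flag(date(2020, 1, d)) for d in range(1, 32)]
```

Overlapping windows, such as the two Carnival days, simply merge into one flagged stretch, so the flag stays binary.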

Period | ${R}_{0}>1$ | ${R}_{0}\le 1$ |
---|---|---|
holiday period | 80 days | 60 days |
non-holiday period | 104 days | 204 days |

Mobility category | PC-1 | PC-2 |
---|---|---|
retail and recreation | −0.441861 | 0.149809 |
grocery and pharmacy | −0.378593 | −0.052756 |
parks | −0.341852 | 0.790273 |
transit stations | −0.450280 | −0.063395 |
workplaces | −0.394821 | −0.570140 |
residential | 0.431192 | 0.145478 |
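
The loadings above define how a day of mobility data maps onto the two principal components: each PC score is the dot product of the (centered) mobility vector with the corresponding loading column. The sketch below illustrates that projection. The example mobility values are hypothetical, and whether the original analysis centered or also standardized the data before PCA is not stated here, so the input is simply assumed to be already preprocessed.

```python
# Order of the mobility categories, matching the rows of the loadings table.
CATEGORIES = [
    "retail and recreation", "grocery and pharmacy", "parks",
    "transit stations", "workplaces", "residential",
]

# Loading vectors copied from the table above.
PC1 = [-0.441861, -0.378593, -0.341852, -0.450280, -0.394821, 0.431192]
PC2 = [0.149809, -0.052756, 0.790273, -0.063395, -0.570140, 0.145478]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pc_scores(preprocessed_mobility):
    """Project one day of preprocessed mobility values onto PC-1 and PC-2."""
    return dot(preprocessed_mobility, PC1), dot(preprocessed_mobility, PC2)


# Hypothetical day: less out-of-home activity, more time at home.
example_day = [-40.0, -10.0, -30.0, -45.0, -35.0, 15.0]
pc1, pc2 = pc_scores(example_day)
```

Read this way, PC-1 grows when out-of-home mobility falls and residential mobility rises, while PC-2 mainly contrasts parks (positive loading) with workplaces (negative loading). The two loading vectors have unit length and are mutually orthogonal to within rounding, as expected of PCA components.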

**Table 5.** All settings considered in the grid search. The inputs are Cases; ${R}_{0}$ and ${R}_{e}$, extracted from the SEIRD model; the Holiday flag, a flag value marking each holiday and the 14 subsequent days; and the PCs generated by the PCA method. Every configuration is used to forecast COVID-19 daily deaths for the last biweekly period, which is reserved for testing.

Configuration Number | Input Data |
---|---|
1 | Cases |
2 | Cases + Holiday flag |
3 | Cases + ${R}_{0}$ |
4 | ${R}_{0}$ |
5 | ${R}_{e}$ |
6 | ${R}_{0}$ + ${R}_{e}$ |
7 | ${R}_{0}$ + Holiday flag |
8 | ${R}_{e}$ + Holiday flag |
9 | ${R}_{0}$ + ${R}_{e}$ + Holiday flag |
10 | PC${}_{1}$ + PC${}_{2}$ |
11 | PC${}_{1}$ + PC${}_{2}$ + ${R}_{0}$ |
12 | PC${}_{1}$ + PC${}_{2}$ + ${R}_{0}$ + Holiday flag |

**Table 6.** RMSE on the test data: median and average prediction error of 50 models for each configuration described in Table 5.

Configuration | Median error |
---|---|
1 | 214.58 |
2 | 183.11 |
3 | 173.84 |
4 | 219.05 |
5 | 221.37 |
6 | 191.65 |
7 | 216.11 |
8 | 223.04 |
9 | 177.08 |
10 | 189.05 |
11 | 199.49 |
12 | 209.01 |
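
To make the comparison in Table 6 concrete, the sketch below shows how an RMSE and its median over repeated runs can be computed, and then ranks the configurations by the median errors reported in the table. The helper names `rmse` and `median_rmse` are hypothetical; only the values in `MEDIAN_ERROR` come from Table 6.

```python
import math
from statistics import median


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def median_rmse(actual, runs):
    """Median RMSE over several forecasts (one per trained model) of the same target."""
    return median(rmse(actual, run) for run in runs)


# Median errors copied from Table 6, keyed by configuration number.
MEDIAN_ERROR = {
    1: 214.58, 2: 183.11, 3: 173.84, 4: 219.05, 5: 221.37, 6: 191.65,
    7: 216.11, 8: 223.04, 9: 177.08, 10: 189.05, 11: 199.49, 12: 209.01,
}

# Configurations ordered from lowest to highest median error.
ranking = sorted(MEDIAN_ERROR, key=MEDIAN_ERROR.get)
best_configuration = ranking[0]
```

With the reported values, configuration 3 (Cases + ${R}_{0}$) has the lowest median error, followed by configuration 9 (${R}_{0}$ + ${R}_{e}$ + Holiday flag) and configuration 2 (Cases + Holiday flag).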

