National Holidays and Social Mobility Behaviors: Alternatives for Forecasting COVID-19 Deaths in Brazil

In this paper, we investigate the influence of holidays and community mobility on the transmission rate and death count of COVID-19 in Brazil. We identify national holidays and hallmark holidays to assess their effect on disease reports of confirmed cases and deaths. First, we use a one-variate model with the number of infected people as input data to forecast the number of deaths. This simple model is compared with a more robust deep learning multi-variate model that uses mobility and transmission rates (R0, Re) from a SEIRD model as input data. A principal components model of community mobility, generated by the principal component analysis (PCA) method, is added to improve the input features for the multi-variate model. The deep learning model architecture is an LSTM stacked layer combined with a dense layer to regress daily deaths caused by COVID-19. The multi-variate model incremented with engineered input features can enhance the forecast performance by up to 18.99% compared to the standard one-variate data-driven model.


Introduction
COVID-19 is a serious acute respiratory syndrome caused by the beta-coronavirus SARS-CoV-2, which was first reported at the end of 2019 [1]. The rate of transmission has ranked COVID-19 as the worst pandemic of the century in terms of scale and speed [2]. Transmission can occur through direct, indirect, or close contact with secretions of infected individuals or through direct contact with infected surfaces. SARS-CoV-2 enters the host cells via interaction with its entry receptor, angiotensin-converting enzyme 2 (ACE2), and an activating receptor, a protease such as TMPRSS2 or cathepsin [3].
The first case of COVID-19 in South America was reported on 25 February 2020 in the city of São Paulo, Brazil, which is an important travel hub for the region [4]. Since then, important control measures, such as overall or partial closing of marine, land, and air borders; travel restrictions, shutdown of schools and colleges; and imposed lockdown were implemented in different ways in Brazil and other countries of the region. According to official data, the country has reported the highest number of cases of COVID-19 in South America [5,6].
Brazil has the highest Gross Domestic Product (GDP) in South America, and the population density varies according to the regional division of the country. The mean population density is 24.69 inhabitants per square kilometer [7]. According to Organisation for Economic Co-operation and Development (OECD) [8], Brazil is composed of approximately 1.3 million independent or liberal professionals. Informal employment ranges from 20 to 49% of the workforce of the country. Formal and informal work positions were directly affected by restrictions to contain the spread of the virus, most of them based on Table 1. List of all column names contained in the open-access dataset of COVID-19 [17].
Geographic distributions for deaths by COVID-19 in all Brazilian states are shown in Figure 2. The majority of casualties are concentrated in the Southeast region of the country, which includes the states of Espírito Santo (ES), São Paulo (SP), Rio de Janeiro (RJ), and Minas Gerais (MG). Instead of using data from individual Brazilian states in the analysis, the main contribution of this work is to understand the social component across the country, despite the effect of local singularities in each Brazilian city and state. Therefore, we selected three populated states with the highest mortality rates for COVID-19 in Brazil to briefly highlight such singularities: São Paulo, Rio de Janeiro, and Minas Gerais. These states have 40% of the estimated Brazilian population.
Population densities are a result of the size of each state. São Paulo, for example, has an estimated population of 46,649,132 inhabitants and a density of 166.25 inhab/km 2 . People are relatively dispersed in the state. However, more than 22 million people are concentrated on the metropolitan area of the city of São Paulo. The same happens to Minas Gerais, with a population of approximately 21.5 million people, density of 33.41 inhab/km 2 , and more than six million people residing on the metropolitan area of the state capital, Belo Horizonte. The population density of the third smallest state in the country is 365.23 inhab/km 2 , and like the other two states, approximately 76% of the 17,463,349 inhabitants live in the metropolitan area of the state capital, the city of Rio de Janeiro [18]. Altogether, the three  05  20  07  20  09  20  11  20  13  20  15  20  17  20  19  20  21  20  23  20  25  20  27  20  29  20  31  20  33  20  35  20  37  20  39  20  41  20  43  20  45  20  47  20  49  20  51  20  01  21  03  21  05  21  07  21  09  21  11  21  13  21  15  21  17  21  19  21  21  21  23  21 Biweekly period  The Brazilian official calendar includes several federal holidays and religious festivities that can lead to crowding. State holidays and regional festivities were not included in the analysis. There were 12 official national holidays for 2020 (Table 2). Due to the severity of COVID-19 worldwide, the Brazilian Ministry of Defense followed recommendations of health authorities and canceled military parades and other festivities of September 7th to avoid public events that could increase the spread of new variants of SARS-CoV-2 [9,19]. The two rounds of Brazilian elections, in November 15th and in November 29th, were also included as potential sources of crowding. Holidays usually involve increased mobility of people. Thus, we assessed mobility of Brazilian community using data retrieved from Google LLC (Alphabet Inc., Mountain View, CA, USA) during the epidemiological weeks evaluated herein. The company started publishing mobility reports [20] in early April 2020. These reports showcase how COVID-19 and countermeasures influenced the dynamics of mobility over time. Mobility database presents median of data collected daily after removal of noises rather than raw quantities [21].
Changes and trends observed in mobility data in several contexts, such as retail and recreation, supermarket and pharmacy, parks, transit stations, businesses, and residential areas, and the variation of average daily cases of COVID-19 per week can be seen in Figure 4. There is a contrast in mobility patterns, daily reports, and deaths from weeks 14 to 20 of 2021 (Figures 1b and 4b). The increase in the number of cases was followed by a growth of death reports during the worst chapter of COVID-19 pandemic in Brazil. As a result, celebrations were canceled and stricter measures to contain mobility were implemented. Thus, the assessment of holidays and mobility patterns can reveal trends to be used as input data in regression systems to study COVID-19.

Principal Component Analysis
The mobility report dataset provided by Google contains a complex arrangement of information and patterns that are influenced by different community sectors, habits, and external events. Nevertheless, with the transformation of such information into data, it is possible to produce an independent variable that may be able to generalize a temporal event or situation. Principal Component Analysis (PCA) allowed us to perform the analysis of different projections of the dataset without losing significant information and to generalize dimensions with a smaller number of data.
The PCA method reduces the number of dimensions in a scenario with multiple variables by using a linear transformation to turn a large number of correlated original variables into a smaller number of uncorrelated variables [22]. The newly discovered dimensions are smaller or equal in size to the original variables used in the algorithm [23].
PCA is a statistical multi-variate analysis method that aims to identify the main factors that cause the most variation in a set of data. As a result, we consider as the primary goal the information contained in several original variables in a smaller set with as little information loss as possible.
To set up the sorting of the principal components (PC) of mobility reports data using PCA, the first step was to measure the average value of all dimensions of the database. To avoid unequal supplying of contribution for a given dimension into the process, we applied a method to standardize the data to generate input in a common scale. This mapping is given by: where z is the scaled value, x is the original value, and µ and σ are the mean and standard deviation, respectively. Next, we computed the covariance of variables and established the covariance matrix A, which is given by: When a linear transformation is applied to a nonzero vector, the eigenvector, or characteristic vector of that linear transformation, changes by a scalar factor. The factor by which the eigenvector is scaled corresponds to the eigenvalue. Due to this transformation, it is necessary to compute eigenvectors and corresponding eigenvalues of the matrix A: where I is the identity matrix, and λ is the eigenvalue, which is a scalar value. This means that λ defines the linear transformation. Finally, the eigenvectors are sorted by decreasing eigenvalues, and the k eigenvectors with the largest eigenvalues are chosen as the PC.

Epidemiological SEIRD Model
The Susceptible-Exposed-Infected-Recovered-Dead model, also known as SEIRD [24,25], is an epidemiological analysis based on the Susceptible-Infected-Removed model, known as SIR [26,27]. SEIRD is intended to enhance the SIR model in order to explain the evolution of a population in terms of transmissibility, contact rates, and the expected duration of infection in the course of an outbreak. Some assumptions must be made in SEIRD modeling: Demographic features are not implemented or adopted; • Heterogeneity: an infected individual has an equal chance of contacting a susceptible person. SEIRD categorizes the population in groups to analyze data and design forecasts based on reported cases: • Susceptible (S): Individual who is prone to be infected on day t, and has never been infected and is not immune to infection; • Exposed (E): Individual who has been exposed to the disease but was not able to infect another person nor show symptoms; • Infected (I): Individual who is infected and producing virus that can potentially infect other individuals; • Recovered (R): Individual who was ill and recovered on day t with alleged acquired immunity; • Dead (D): Individual who died because of the infection.
The movement of individuals from one group to the other during the course of the outbreak is resolved by using Ordinary Differential Equations (ODE), which constitutes the dynamic of the SEIRD model. ODE are defined as follows: where the infectious rate beta (β) represents infections per exposure, i.e., a susceptible individual has contact with an infected individual and presents a latent infection after exposure; the infectious rate epsilon ( ) represents the potential rate of infection per exposure, i.e., a susceptible individual that has mutual contact with an exposed/infected individual and may infect another susceptible individual; the transitional rate alpha (α) of an exposed individual to infect others is the average latent period 1 α . Gamma (γ) and mu (µ) represent, respectively, the recovery rate and rate at which infected people become deceased; 1 γ is the mean infectious period. Figure 5 synthesizes the model. SIR was the first epidemiological model to use compartments. In this model, the whole population can be assigned to only three basic compartments, namely susceptible, infected, and removed. The last compartment includes people who recovered from the infection and the ones who died because of the infection. This is one of the limitations of the model. Disregarding exposure and the incubation period is another pitfall of using SIR analysis. An alternative approach to analyze the population in compartments during an outbreak is to employ the Susceptible-Exposed-Infected-Removed-Susceptible (SEIRS) model [28]. Unlike with SIR, individuals from the removed compartment can return to the susceptible compartment. However, our database provided death figures but lacked reinfection information. Thus, the SEIRD model proved to be more appropriate for our data set.

Basic Reproduction Number
The estimated number of secondary cases created by a single (typical) infection in a fully susceptible population is known as the basic reproduction number R 0 , a dimensionless number, and is referred to in many cases as the simple reproductive rate.
A possible way to find R 0 is by adopting the next-generating matrix to calculate the reproduction number. Furthermore, the method calculates the current reproduction number R e , which is the secondary infection rate. The model is divided into two groups: X 1 , containing E and I individuals as the infective and infectious group, and X 2 , which contains S, R, and D individuals as the susceptible, recovered, and dead group . Assuming S ≈ N and f (X 1 , X 2 ) and v(X 1 , X 2 ) are the vectors of new infection parameters and other parameters, respectively, then The eigenvalue of FV −1 with the highest value is R 0 = α + β γ+µ and R e = β γ+µ . An outbreak is not likely to happen, or it is controlled, when R 0 ≤ 1. When R 0 > 1, however, the disease spreads exponentially, resulting in an epidemic.

Multi-Variate LSTM Model
Neural networks are useful tools for pattern recognition. Methods such as ARIMA [29,30] have been demonstrated to be insufficient in the long run to generalize or maintain accuracy of patterns in regressions of long periods. Some approaches are appropriate for problems that cannot be solved linearly. In such cases, the neural network allows us to identify the degree of relationships among variables, such as the ones in our study, since non-linear variables cannot have causality or correlation explained by commonly used methods. Thus, neural networks produce promising results for one-variate, bi-variate, and multi-variate regression problems.
In Long Short-Term Memory (LSTM) [31], similar to Recurrent Neural Network (RNN), a context of memory persists within the pipeline, allowing them to solve sequential and temporal subjects without being hampered by the vanishing gradient. These neural networks are built on the usage of gates that direct how information is forwarded and ignored inside its internal structures to achieve such complex learning retrieval from sequential sources.
LSTM differs from the traditional approach in that it contains an element called cell state, which determines whether the information is stored or not. The cell state can transport pertinent information throughout the sequence's processing, cross the entire thread of interactions, add or delete data from this state cell, and set it according to structured switches.
Given this benefit, the goal is to use stacked LSTM units as a layer, composed of 200 blocks attached to a dense layer to predict a bi-weekly series of COVID-19 daily deaths taking into account one-and multi-variate input data, as shown in Figure 6. For the one-variate model, we use COVID-19 daily cases as input data to predict the next 14 days of COVID-19 deaths. The learning algorithm is asked to output a function f : R n → R in order to complete this task. For the multi-variate model, we set a mix of the data, finding the best approach, such as factor R 0 and/or principal components from PCA, to a multi-variate model to predict the next 14 days of COVID-19 deaths. The proposed model has the same architecture for any case, changing only its input data.

RMSE and Model Grid-Search
To evaluate the model, we apply the use of RMSE (root mean squared error), a metric that calculates the deviation of errors between observed (ground-truth) and predicted values (hypotheses): where y i is the true value, the ground truth, andŷ i is the predicted value from the model. The sum will be given different weights and the RMSE index will rise dramatically as the instances' error values rise. That is, if there is an outlier in the dataset, its weight in the RMSE calculation will be higher, harming the performance by increasing the RMSE. Furthermore, a grid search to find the best set of input data for the model in forecasting the COVID-19 deaths with the lowest RMSE was applied. We apply the same experiment for each input dataset over 50 times, which allows us to extract the expected value and the standard deviation of each configuration in predicting our goal. For this, we split the data into training and testing data. The test set is related to the last 28 days (4 weeks) of the temporal series. Therefore, the RMSE extracted and reported in this work is related to comparisons against the test data.
The aforementioned workflow is described in Figure 7. The model receives each set of input data, and the model's expected outcome applied to the test data is subjected to RMSE calculations. After the process has been repeated 50 times, we distill a boxplot graph to observe which configuration performs better on average, observing the standard deviation and mean RMSE calculated. To create a temporal plot, comparing the ground truth (real values) and the model's predictions, we choose the model with the lowest RMSE in forecasting COVID-19 deaths for each set of input configuration. For example, for all 50 repetitions using cases as input data, we plot the curve with the lowest RMSE.  Figure 7. Workflow applied to grid search of input data sets.

Experiments and Results
We performed a series of experiments in order to better understand whether community mobility on a public holiday could be a relevant element for predicting COVID-19 daily cases and deaths in Brazil. Initially, the basic reproduction numbers R 0 and R e were extracted from the SEIRD model, providing the association of national holidays and daily infections and deaths by COVID-19. Subsequently, PCA was applied for dimensionality reduction on the Community Mobility Report dataset. The dimensionless factors R 0 , R e , and the PCs generated by PCA were applied to the deep learning model as input data for the training step.

SEIRD Model to Extract Basic and Current Reproduction Rate (R 0 and R e )
Based on the assumption that infection begins with a single person, the initial values of infected, I initial , and exposed, E initial , are set to 1. The initial values of the recovered, R initial (which we added the vaccinated individuals), and deceased, D initial , groups are set to 0. To determine the initial value of susceptible S initial , we use the following equation: (11) where N is the population size or the total number of individuals. The incubation period for SARS-CoV-2 ranges from three to six days. During this period, infected individuals do not present symptoms [32,33]. Furthermore, COVID-19 symptoms, when present, usually appear up to 14 days after infection [32]. The SEIRD model uses parameters β, , α, γ, and µ to tune the forecast. Such parameters have been estimated by minimizing a function based on the least square error to obtain optimized values and an optimal solution. We defined the initial values, lower bound, and upper bound as inputs for each parameter to recover their optimal value. Initial values and respective bounds are described as follows: To generate the periodic data containing R 0 and R e , the pipeline was submitted to a loop process that consider a step of a day and take into account a period of 14 days (bi-weekly period). The extracted values are shown in Figure 8. The holidays and Brazilian elections over the COVID-19 daily cases curve are also shown in Figure 8.   We noticed a series of events that triggered crowding and, somehow, had an impact on Brazilian COVID-19 reports. Although the 7th of September holiday was canceled in 2020, there was an increase in COVID-19 reports. The same happened on All Souls' Day, on 2 November. The trend in the curve of cases and the basic and current reproduction rates, R 0 and R e , was probably influenced by proximity of people and infection caused by crowding with poor social distancing. Illegal crowds, which were not allowed at the time, may have also contributed to the increase of infections. The increase in reports was observed in each state and in the whole country (Figure 3).
From a period of 448 days of COVID-19 reports, 80 fell into a 14-day window after the holiday, as seen in Table 3. These windows, which we called holiday periods, represented 17.86% of the days in our analysis. Throughout holiday periods, R 0 values were usually above 1. Regular days, or the ones outside holiday periods, accounted for approximately 82% of the period of our study. R 0 was above one in only 23.21% of the regular days. As COVID-19 was officially reported in March of 2020 in Brazil, R 0 values were high at that point of the pandemic. We decided to exclude this month from the analysis. In this scenario, 15.94% of the days were holiday periods presenting R 0 > 1. Only 20% of the regular days had R 0 above one. When R 0 component is greater than 1, deviation is higher than when its value is less than 1 (Figure 9). When holiday periods are considered, distribution shows apparent outliers. When R 0 is less than 1, distributions are similar, ranging between 0.75 and 1.
Here we notice that asymptomatic individuals and/or unreported cases have played an important role in SARS-CoV-2 transmission, causing some of the unexpected trends ( Figure 8). For example, increases in COVID-19 cases occur when the reproducing factor (R 0 ) is less than 1.0. This trend occurs because the reproducing factor is a result of data retrieved from official reports and may not represent their real values. According to estimates [34], there is a substantial underreport in cases of COVID-19. Actually, the difference is sometimes one order of magnitude in number of cases, which brings a much higher R 0 (near to 3).

Holiday -R>1 Not holiday -R>1
All R>1 Holiday -R<1 Not holiday -R<1 All R<1  When the distributions of days of the holiday period and non-holiday are shown ( Figure 10), it is seen that the values on holiday days are more concentrated at values R 0 > 1. On non-holiday days, R 0 values are close to 1 but relatively in a range equal to or less than 1. Despite the distributions shown in Figures 9 and 10, applying the Shapiro-Wilk test [35] to assess whether the distributions are similar to normal distributions, the data that concern holiday periods and R 0 < 1 have Gaussian behavior by the obtained p-value 0.195 ( Figure 11c); for all others cases, the p-values are 0.000 (Figure 11a,b,d). Because of this, we consider two null hypotheses: (1) H0 1 p : R 0 > 1 have the same pattern in holiday periods and in non-holiday periods; (2) H0 2 p : R 0 < 1 have same pattern in holiday and non-holiday periods. Applying the Mann-Whitney test [36], the p-values obtained for the null hypothesis were 1) H0 1 p : 0.3856 and 2) H0 2 p : 0.0022, whereas the significance is compared to the value being less than 0.05. The results support the idea that R 0 > 1 on days that are in holiday periods, differing from days that are not in holiday periods. It also shows us that the R 0 < 1 does not differ if the day is a holiday period or not.
As a result, these data, in addition to the number of cases, are expected to be important when dealing with disease contamination dynamics. Once we establish a causal relationship between infection rates and the number of confirmed cases, we expect the addition of holiday dates to play an important role in improving the accuracy of predicting COVID-19 deaths.

Principal Components for Mobility Data
We used PCA to transform a set of mobility data variables to an orthogonal mapping, whereas linear regression accounts for the best straight line to fit these data. Figure 12 indicates that a single PC can approximate mobility data variability by 79.19%, whereas two PCs can approximate mobility data variability by 90.25%. By adopting two PCs, as shown in Table 4, we observed that places such as retail and recreation, as well as transportation terminals, harm PC 1 , whereas residential had a positive influence on PC. Parks had a positive impact on PC 2 , whereas workplaces had a reverse effect. Therefore, with these two PCs, we explained and simplified mobility data variability by 90.25%. Therefore, the use of PCA helped to convert correlations of six dimensions into community mobility reports into a smaller projection, with 2 dimensions. Figure 13 shows the curves derived using the PCA method, together with the R 0 value provided by the SEIRD Model .   06  20  08  20  10  20  12  20  14  20  16  20  18  20  20  20  22  20  24  20  26  20  28  20  30  20  32  20  34  20  36  20  38  20  40  20  42  20  44  20  46  20  48  20  50  20  52  20  02  21  04  21  06  21  08  21  10  21  12  21  14  21  16  21  18  21  20  21  22  21  24 Figure 14 shows the trend of scattering data, which reveals that PC 1 values are more determinant for R 0 > 1 during holiday periods when PC 1 values are more widely distributed. Further, we notice that PCA has different behaviors for mobility data for 2020 and 2021. For 2020, it results in more sparse values, which are mainly distributed between −4 and 6, as seen in the PC 1 (X axis) of Figure 15a (orange dots). For 2021, it is composed, in general, of values between −2 and 1, besides an outlier observed with a value of 4 ( Figure 15b). The proposed hypothesis here is that such PC time-series values can serve as input data for a forecast model for improving prediction. We consider that these patterns presented in the PCs may assist as a behavioral tool for the interpretation and prediction of the COVID-19 daily deaths curve under the influence of these new dimensions generated.

Forecasting COVID-19 Deaths
We suggest considering the extraction of R 0 and its presented patterns, and the PCs generated, all of them as data input to the LSTM model to predict the number of deaths by COVID-19. The generated PCs represent the community mobility in a smaller number of dimensions, but explain the variability in 90.25% of the community mobility. The set of input configurations that can be used in the proposed network are presented in Table 5. Table 5. All settings considered for the grid-searching, evaluating R 0 and R e , extracted from SEIRD Model, and Holiday flags, which is a flag value representing the holiday + 14 days subsequent, and PCs generated by the PCA method. For all configurations it proposed is to forecast the COVID-19 daily deaths for the last bi-weekly period, reserved for testing.

Configuration Number
Input Data In order to test and choose for the best configuration, each of them is submitted to a grid-search process of training. In this process, they are executed 50 times and the forecast variations of each model are then analyzed using the test input data. By using the number of cases as the basis for predicting the number of deaths (baseline case), the median value of the results obtained from this method demonstrated that the median of the RMSE error is approximately 214.58 deaths over a 14-day forecast period, as shown in Figure 16a.  Table 6 lists the values for the other model configurations that were submitted to the process in the test-input-data step. As observed in Table 6, forecast was improved by adding the holiday flag to the one-variate model (configuration 2) that initially used only number of cases as input data. Another observation was related to the use of principal components as a substitute for the use of number of cases as input data. In these cases, we expected a reasonable forecast as an alternative to using only number of cases or some other data together with infected cases. Adopting only a dimensionless value of R 0 or R e , on the other hand, was not a reasonable approach, especially if one of them was associated with some holiday flags. However, using them as input data in groups and adding holiday flags helped the forecasting problem. Table 6. RMSE over test input data median and average prediction error of 50 models for each configuration described in Table 5 Other configurations, such as using the basic reproduction number (number of cases + R 0 ) as input data, yielded more accurate results, with an average recorded RMSE of 173.84 (Figure 16b), which was an improvement of 18.99% over the baseline using only number of cases as input data. The use of R 0 and R e with the holiday flag (Figure 16c) produced an RMSE median error of 177.08. This was another configuration that should be highlighted with a higher accuracy of 17.48% over the baseline using only cases as input data.
When comparing the generated curves by the model and the COVID-19 daily deaths ground truth (Figure 17), we observed that the R 0 provided stability in forecasting and reducing the noise in the predicted curve ( Figure 17b). However, we noticed that there is a time shift of approximately a week in the predictive curve in certain periods. Considering Figure 17c, generated from the adoption of R 0 , R e , and the holiday flag, we observed this time shift problem was solved, despite an increase in RMSE error compared to the previous curve ( Figure 17b). Nevertheless, a better and more accurate prediction of deaths caused by COVID-19 was achieved.  Cases, that is our baseline case; considering which parameters can improve the baseline case, we select Conf.3: Cases + R 0 ; without using Cases as a input-feature, we select Conf.9: R 0 + R e + Holiday flag, and; Conf.10: PC 1 + PC 2 as input data. For these configurations, we draw the best curves (with the lowest RMSE) of each configuration over the test data.
there will relaxation of such policies, mainly due to a decrease in cases and deaths by 425 COVID-19 [38].  Cases, that is our baseline case; considering which parameters can improve the baseline case, we select Conf.3: Cases + R 0 ; without using Cases as a input-feature, we select Conf.9: R 0 + R e + Holiday flag, and; Conf.10: PC 1 + PC 2 as input data. For these configurations, we draw the best curves (with the lowest RMSE) of each configuration over the test data.
there will relaxation of such policies, mainly due to a decrease in cases and deaths by 425 COVID-19 [38].

426
Several factors help reducing compliance to public health policies, such as socioe-427 conomic inequality, conspiracy theory-driven that leads to noncooperation, as well as 428 (d) Figure 17. Curves of selected settings: Conf.1: Cases, that is, our baseline case; considering which parameters can improve the baseline case, we select Conf.3: Cases + R 0 ; without using Cases as an input feature, we select Conf.9: R 0 + R e + Holiday flag, and; Conf.10: PC 1 + PC 2 as input data. For these configurations, we draw the best curves (with the lowest RMSE) of each configuration over the test data. (a) COVID-19 daily deaths forecast using cases as input data. (b) COVID-19 daily deaths forecasts using Cases + R 0 as input data. (c) COVID-19 daily deaths forecasts using R 0 + R e + Holiday flag as input data. (d) COVID-19 daily deaths forecasts using PC 1 + PC 2 as input data.

Discussion
Our data suggest that holidays and holiday periods influenced R 0 values for COVID-19, as well as mobility of people. Non-pharmaceutical interventions, such as social distancing, use of masks, and hand sanitizers were the initial measures to contain the spread of SARS-CoV-2. However, the Brazilian political, economical, and social contexts influenced the establishment of effective public health policies. The lack of extensive social distancing had a major impact on the number of cases and deaths of COVID-19 after holiday periods. The same trend also occurred with activities in which crowding was involved. The earlier social distancing measures are deployed, the sooner that such policies will be relaxed, mainly due to a decrease in cases and deaths by COVID-19 [37].
Several factors help reducing compliance to public health policies, such as socioeconomic inequality, conspiracy-theory-driven noncooperation, and behavioral aspects, such as cognitive bias [38] and free riding [39]. The association of these factors have also influenced disease patterns. Cognitive biases, for example, play an important role in decision making, as they affect individual reasoning, cloud judgment [38], and create anecdotal evidence [40]. A few types of biases affect perception of reality and influence behavior. For example, confirmation bias, i.e., which is the tendency to favor, search for, interpret, and remember information that confirms one's beliefs [41], has negatively influenced prevention strategies, control measures, and research for COVID-19 [40,42,43]. Additionally, free-riders usually avoid cooperation, exploiting others' compliance with policies [39]. The lack of compliance usually has a major impact on control measures, as it compromises collective efforts to contain the spread of SARS-CoV-2. Holidays have probably intensified such behavioral factors, causing an increase in the number of cases.
We emphasize that crowding may very well be the main factor for the maintenance of COVID-19 cases and deaths from now on. The circulation of new variants poses a threat to prophylactic measures that are now available against SARS-CoV-2. Although there is evidence that current vaccine strategies are able to elicit an effective immune response against the variants of interest and variants of concern of SARS-CoV-2 described until now, non-pharmaceutical control measures must remain as continuous public policies to avoid the spread of variants that can potentially cause new waves of COVID-19 [44]. Crowding, particularly the crowding that occurred within the first year of COVID-19 in Brazil, rapidly changed COVID-19 reports. The same pattern can be observed on 1 January 2021, when disease reports started increasing after a holiday that traditionally leads to family reunions and festivities.
Nonetheless, part of the lack of compliance with non-pharmaceutical measures may have been validated by the executive branch of the federal government, which has encouraged the population to keep crowding and to avoid the use of masks, at the same time disregarding the severity of SARS-CoV-2 infection and criticizing effective countermeasures deployed by state and municipal authorities all over the country [7]. Unfortunately, control measures were not taken even by federal, state, and municipal authorities. The direct result of the non-compliance with countermeasures is a mean mortality rate of 2.345, which is approximately 77% higher than the rest of the world [45]. Additionally, COVID-19 denial has also been widely reported on traditional networks that support the Federal Executive Branch policies, on unchecked digital media, and on social networks, adding mistrust in science into this scenario. As a result, part of the population internalized political and economical biases that were part of the federal government's agenda instead of complying with collective control measures, which resulted in one of the most severe examples of COVID-19 problems in the world [7].
Non-compliance with COVID-19 collective control measures is also related to socioeconomic status in the population in several countries. Unfortunately, informal employment is a worldwide tendency, and Brazilian reality is not different from the rest of the world. In fact, issues that have historically present in the Brazilian territory were magnified with the sanitary scenario imposed by COVID-19. Increases in several types of violence, poverty, and differential access to health and education are among the hurdles Brazilians have to deal on a daily basis. Festivities and hallmark holidays are usually a chance for informal workers to increase individual or family income as they increase the demand for consumer goods such as food, clothes, and beverages. Public policies to control the spread of SARS-CoV-2, such as lockdowns and social distancing, caused a major impact in families whose income relies on informal jobs and social interactions. Informal employment causes significant tax loss and reduced public revenues, leading to less available resources for important public services, such as social protection and health care in Brazil, but also contributes to poorer working conditions and unfair competition for legitimate businesses and collective bargaining [46]. Data from Bahia, a state from the northeast region of Brazil, indicates that withdrawal of informal workers decreases the productive capacity of the country and leads to pronounced negative impacts on the economy, mainly in the service sectors. However, federal, state, and regional programs for income transfer helps to mitigate the negative impacts of COVID-19 by 50% [47]. Implementation of control measures associated with income transfer policies would not only influence mobility of people but also have a positive impact in COVID-19 reports.

Conclusions
As a result of the experiments, we unveil some intriguing concerns. First, we have discovered that the effects of holidays, or holiday periods, cause immediate increases in R 0 and R e trends. These factors, extracted from the SEIRD model, have a significant impact on the COVID-19 death curves and reports.
Furthermore, principal components generated with community mobility report data using the PCA approach indicate that holidays may cause distinct report patterns over time, which can be analyzed to improve COVID-19 regression of death curves. Furthermore, the PCA approach has proven to be important because it reduces the dimensionality of the feature space, and with fewer input dimensions, the model is easier to find.
Cases combined with holiday periods, cases in association with R 0 , or holiday periods with R 0 produced an effective strategy of analysis compared to using the current number of cases to predict future deaths. Furthermore, when access to epidemiological data is limited, the use of community mobility is a promising alternative as it produced better results than cases as input data.
Based on the trained model, it is possible to generate synthetic data, for example, simulating an increase in mobility, and see how this affects the model's forecast. If the trained model has a high forecast accuracy, we can confidently estimate the impact of each feature on the dynamic of the death curve. This finding may be used to help authorities managing resources in the event of future epidemics.
Besides having produced important results showing that we are in the right research direction, our current data are not final, and further work should be done in order to draw a thorough and final conclusion on the enhancement of the models for prediction. For that purpose, it will be necessary in future work to rank-order factors (in order of relevance), taking into account the literature consensus on factors in COVID-19 infection rate. Some papers list more than 50 potential features for predicting the number of cases, including mobility and climate variables such as temperature, humidity, and air pollution [48]. To predict deaths, the number of vaccinations, population age, and the number of available ICUs should all be considered.
Finally, using a large number of variables as input data will not necessarily improve the prediction of COVID-19 fatalities in an LSTM model. However, once a large number of data have been collected, it is possible to use exploratory methods to conduct trials that contribute to and improve the accuracy of the model. We also intend to adopt other forms of data visualization and approaches that can improve understanding of the virus's spread dynamics and help make predictions. A possible approach is the use of complex networks, which help to understand how variables interact, as shown in [49].
In the near future, we intend to study and add new variables to our prediction model, such as the association and influence of countermeasures defined by the federal, state, and municipal governments and particularities of people from different regions of the country, among others. With this approach, a comparison between the effectiveness of control measures by different states or municipalities can be evaluated to help avoid similar scenarios in the future.  Data Availability Statement: Code and data used in this research can be found in https://github. com/Natalnet/ncovid-holidays-paper (acessed on 30 October 2021).

Conflicts of Interest:
There are no conflict of interest nor competing interest for this work.