Deep-Data-Driven Neural Networks for COVID-19 Vaccine Efficacy

Vaccination strategies to lessen the impact of the spread of a disease are fundamental to public health authorities and policy makers, and the socio-economic benefit of a full return to normalcy is at the core of such strategies. In this paper, a COVID-19 vaccination model with an efficacy rate is developed and analyzed. The epidemiological parameters of the model are learned via a feed-forward neural network. A hybrid approach that combines a residual neural network with variants of recurrent neural networks is implemented and analyzed for reliable and accurate prediction of daily cases. The error metrics and a k-fold cross validation with random splitting reveal that one particular hybrid architecture, the residual neural network with a gated recurrent unit, is the best hybrid neural network architecture. The data-driven simulations confirm that vaccination with a higher efficacy rate lowers both the infectiousness and the basic reproduction number. COVID-19 data for the state of Tennessee, USA, is used as a case study.


Introduction
The spread of the SARS-CoV-2 virus began in Wuhan, China, in December 2019 [1]. The virus spread quickly across the globe, affecting many lives and causing deaths, leading the World Health Organization (WHO) [1] to declare a pandemic and to name the disease COVID-19 [1,2]. Studies on mitigation measures for controlling the disease, such as social distancing, quarantining, and government shut-downs, show the need for an effective vaccination program. The rate at which the population is vaccinated and the vaccine's efficacy are the two main factors in an effective vaccination program [3]. Coordinated efforts by all stakeholders, such as policy makers, pharmaceutical companies, researchers, and epidemiological and health professionals, are needed to effectively champion vaccination campaigns that curb the spread of the disease [4]. Recent studies on vaccine efficacy [5] and vaccine confidence [6] show the importance of an effective vaccination strategy.
Mathematical models have been used to study the dynamics of an epidemic using systems of differential equations [7][8][9]. The model used in this work is the SIR model: it divides the population into three groups, namely Susceptible (S), Infected (I), and Recovered (R). Extensions of this basic model have recently been applied to COVID-19 [10]. During a pandemic, the main challenge with this type of model is ascertaining its parameters. The conventional least squares method [11] can be used to estimate reasonable constant and time-dependent parameters [12].
Data-driven models that make use of deep learning architectures have been applied to epidemiological models. Raissi et al. [13] used a physics informed neural network (PINN) to learn the constant parameters of an influenza model, and Kharazmi et al. [12] applied PINN to epidemiological models. The model in this work assumes S(0) ≥ 0, I(0) ≥ 0, R(0) ≥ 0, and N(t) = S(t) + I(t) + R(t), with transmission rate β, recovery rate γ, efficacy rate η, and vaccination rate v. The total population is assumed to be constant throughout the vaccination regime.
Integrating Equation (2) shows that S(t) remains non-negative for all t > 0; by the same argument, I(t) ≥ 0 and R(t) ≥ 0 also hold.

Boundedness of the Model
Proof. The population growth can be written as in Equation (3). Integrating both sides of (4) gives the solution for N(t); letting t → ∞ and combining Theorem 1 with Equation (6), the solutions of the model remain bounded.
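Since model (1)'s equations are not reproduced in this excerpt, the sketch below assumes a standard SIR system with a vaccination term governed by the rate v and the efficacy η; the exact placement of the vaccination term is an assumption based on the variables listed above (note that R0 here denotes the initial Recovered count, not the reproduction number).

```python
import numpy as np

def simulate_sir_vaccination(beta, gamma, v, eta, S0, I0, R0, days, dt=0.1):
    """Forward-Euler integration of an assumed SIR model with vaccination:
        dS/dt = -beta*S*I/N - eta*v*S
        dI/dt =  beta*S*I/N - gamma*I
        dR/dt =  gamma*I + eta*v*S
    so that N = S + I + R stays constant, as stated in the text.
    S0, I0, R0 are the initial compartment values.
    """
    S, I, R = float(S0), float(I0), float(R0)
    N = S + I + R
    traj = [(S, I, R)]
    for _ in range(int(days / dt)):
        dS = -beta * S * I / N - eta * v * S
        dI = beta * S * I / N - gamma * I
        dR = gamma * I + eta * v * S
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
        traj.append((S, I, R))
    return np.array(traj)

traj = simulate_sir_vaccination(beta=0.5, gamma=0.2, v=0.01, eta=0.94,
                                S0=0.99, I0=0.01, R0=0.0, days=100)
```

Because the three right-hand sides sum to zero, the trajectory conserves the total population, matching the constant-N assumption of the model.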

The Basic Reproduction Number R_0
This refers to the expected number of people directly infected by one person who gets the virus [11]. For the basic SIR model it is given by R_0 = β/γ. When R_0 < 1, the disease will disappear at some time in the future; the opposite is true for R_0 > 1, with severe effects. R_0 = 1 indicates that the spread of the virus is stable and persistent. In a vaccination regime, the current reproduction number R_t is defined as the reproduction number per unit time and can be computed as R_t = β_t/γ_t; it depends on a time-varying transmission rate β_t and recovery rate γ_t. The effective reproduction number is defined as R_eff = R_0 · S/N, where R_0 is the basic reproduction number over a period of time, S is the number of susceptible individuals, and N is the population size of the geographical area.
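These relations can be made concrete in a few lines. The closed form R_0 = β/γ is the standard SIR expression and is an assumption here, since the paper's Equation (7) is not shown in this excerpt; the R_eff formula follows the description in the text.

```python
def basic_reproduction_number(beta, gamma):
    # R0 = beta / gamma for the basic SIR model (assumed standard form).
    return beta / gamma

def effective_reproduction_number(r0, S, N):
    # R_eff = R0 * S / N, as described in the text.
    return r0 * S / N

r0 = basic_reproduction_number(0.5, 0.2)                        # 2.5
r_eff = effective_reproduction_number(r0, 800_000, 1_000_000)   # 2.0
```

As the susceptible pool S shrinks (e.g. through vaccination), R_eff falls below R_0, which is the mechanism the data-driven simulations later quantify.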

Deep Learning Algorithms
The feed-forward neural network (FNN) is a deep neural network architecture whose node connections are simple, straightforward, and acyclic. The network consists of input, hidden, and output layers. The input layer receives the data and propagates it forward to the hidden layers, where linear or non-linear activation functions transform it. In the first hidden layer, the weighted sum of the input data is passed through the activation function and propagated through the remaining hidden layers; a bias term is added to the output of each hidden layer. The output layer then processes the net output from the last hidden layer to produce the desired outcome: discrete outcomes for a classification task, and a continuous-valued outcome for a regression task [2,22]. The transformation from one layer to the next is defined as [23] z_j^l = f(Σ_{i=1}^{n_{l−1}} w_{ij}^l z_i^{l−1} + b^l), where w_{ij}^l is the weight between the ith node of the previous layer and the jth node of the current layer, b^l is the bias for the lth layer, f is the activation function, n_l is the number of neurons, and z_i^{l−1}, i = 1, …, n_{l−1}, is the output of the ith node in the (l − 1)th layer. In this work, the activation functions used are the hyperbolic tangent function and the softplus function, softplus(x) = log(1 + e^x). Different loss functions and optimizers, such as Adam or the gradient descent method [2], are used to build and train the network; the difference between the network's output and the actual data is computed as the error.
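A minimal sketch of this layer-to-layer transformation, with the two activation functions named above, follows; the weights are random placeholders, not values from the paper.

```python
import numpy as np

def dense_layer(z_prev, W, b, activation):
    """One feed-forward layer: z_l = f(W @ z_{l-1} + b)."""
    return activation(W @ z_prev + b)

def softplus(x):
    # softplus(x) = log(1 + e^x), the output activation used in the paper
    return np.log1p(np.exp(x))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                             # input features
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)      # hidden layer, 5 neurons
W2, b2 = rng.normal(size=(1, 5)), np.zeros(1)      # output layer
h = dense_layer(x, W1, b1, np.tanh)                # tanh on hidden layers
y = dense_layer(h, W2, b2, softplus)               # softplus on the output
```

The softplus output is strictly positive, which suits quantities such as case counts and epidemiological rates.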

Epidemiology Informed Neural Network (EINN)
The epidemiology informed neural network (EINN) is inspired by the physics informed neural network (PINN) [13]: it incorporates the epidemiological parameters and initial values of the model into the loss function. The output of the EINN satisfies the differential equations in model (1), which is achieved by encoding the residuals of model (1) into the loss function. Automatic differentiation [24] is used to compute the derivatives of the output with respect to time for each residual equation e_i, i = 1, …, 3. In this study, the loss function is the mean squared error (MSE), consisting of a data term, a residual term, and an initial-value term. The Adam optimizer, a first-order gradient-based optimization method, is employed to update the network parameters by minimizing the loss function. Figure 1 shows the schematic diagram of the EINN. The data is preprocessed with a min-max scaler to allow for smooth training. Each of the five hidden layers uses 80 neurons, and the Adam optimizer is used in all data-driven simulations. To impose the epidemiological constraints, Latin hypercube sampling [25] is employed to sample 3000 data points, and cubic spline interpolation is used to sample 5000 data points. The hyperbolic tangent activation function is applied to all hidden layers, while the softplus activation is applied to the output layer. The EINN is implemented in TensorFlow, which is run in Python. The processed daily infected data at time t is passed into the network, as shown in the red diamond input layer; a hyperbolic tangent activation is applied to the processed data in the hidden layers. The loss function incorporates the initial values, the data, and the residuals of the system of differential equations in (1), shown in the yellow circle. The output layer produces the learned β, γ, and the infected group.
The EINN Algorithm 1 for estimating the epidemiology parameters is presented as follows.

Algorithm 1 Epidemiology Informed Neural Network (EINN)
1: Set input arguments: t_j, j = 1, …, M; vaccination rate v; efficacy rate η; epidemiology parameters β, γ; initial values of the SIR model.
2: Construct and run the neural network: specify weights, biases, and activations.
3: Compute the residuals e_i = ŷ_i − y_i.
4: Calculate the loss function, encoding the epidemiology constraints: Loss = MSE_data + MSE_SIR + MSE_U0.
5: Train the neural network: set the number of epochs.
6: Return the solution: the epidemiology parameters β*, γ* and the infected data I*.
A sample of the residual computation in Algorithm 1 is given below. tf.gradients performs the automatic differentiation in TensorFlow, computing the derivatives of the network output with respect to time. The loss function in Equation (13) is then calculated, and the network is trained with different numbers of epochs: 1400 for a vaccination rate of 0%, and 800 for the remaining vaccination rates of 0.5%, 1%, 2%, 3%, 6%, and 10%.
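The sample code itself is not reproduced in this excerpt. As a stand-in, the sketch below computes the three residuals with finite differences in place of automatic differentiation (the EINN would use tf.gradients instead), and it assumes the same SIR-with-vaccination form discussed earlier; the term placement is an assumption.

```python
import numpy as np

def sir_residuals(t, S, I, R, beta, gamma, v, eta):
    """Residuals e_1, e_2, e_3 of the (assumed) SIR model with vaccination.

    Derivatives are approximated with finite differences here; the EINN
    uses automatic differentiation to obtain them exactly.
    """
    N = S + I + R
    dS, dI, dR = np.gradient(S, t), np.gradient(I, t), np.gradient(R, t)
    e1 = dS - (-beta * S * I / N - eta * v * S)
    e2 = dI - (beta * S * I / N - gamma * I)
    e3 = dR - (gamma * I + eta * v * S)
    return e1, e2, e3

# Reference trajectory from a fine forward-Euler step; on a trajectory
# that satisfies the model, the residuals should be close to zero.
beta, gamma, v, eta = 0.5, 0.2, 0.01, 0.8
t = np.linspace(0.0, 10.0, 10001)
dt = t[1] - t[0]
S, I, R = np.empty_like(t), np.empty_like(t), np.empty_like(t)
S[0], I[0], R[0] = 0.99, 0.01, 0.0
for n in range(len(t) - 1):
    N = S[n] + I[n] + R[n]
    S[n + 1] = S[n] + dt * (-beta * S[n] * I[n] / N - eta * v * S[n])
    I[n + 1] = I[n] + dt * (beta * S[n] * I[n] / N - gamma * I[n])
    R[n + 1] = R[n] + dt * (gamma * I[n] + eta * v * S[n])

e1, e2, e3 = sir_residuals(t, S, I, R, beta, gamma, v, eta)
```

In the EINN loss, the mean square of these residuals forms MSE_SIR, which is added to the data-fit and initial-value terms.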

Recurrent Neural Network (RNN)
Recurrent neural networks are designed to learn problems involving sequential data [4]. In this work, all the variants of RNN are implemented in PyTorch.

Long Short-Term Memory (LSTM)
The long short-term memory (LSTM) is a variant of the recurrent neural network (RNN) used to handle sequential data such as time series; it was developed to solve the vanishing gradient problem [26]. The LSTM has three gates that control the flow of information: the input, forget, and output gates. These gates apply logistic functions to the weighted sums obtained during training by backpropagation [4]. The cell state manages the input and forget gates, and the output comes from either the output gate or the hidden state. The unique feature of this network is that it is able to learn long-term dependencies within the data, which makes it effective for time series. Given the input data x_t and the number of hidden units h, the gates of the LSTM can be defined as follows.
• Input gate: I_t = σ(X_t W_xi + H_{t−1} W_hi + b_i);
• Forget gate: F_t = σ(X_t W_xf + H_{t−1} W_hf + b_f);
• Output gate: O_t = σ(X_t W_xo + H_{t−1} W_ho + b_o);
• Candidate memory cell: C̃_t = tanh(X_t W_xc + H_{t−1} W_hc + b_c), where W_xc, W_hc denote the weight parameters and b_c is the bias parameter;
• Memory cell: C_t = F_t ∘ C_{t−1} + I_t ∘ C̃_t, where ∘ is the element-wise multiplication, so the value of C_t is ascertained from the output of the memory cell C_{t−1} and the current candidate C̃_t;
• Hidden state: H_t = O_t ∘ tanh(C_t).
Figure 2 shows the LSTM architecture. The input, forget, and output gates are represented by I_t, F_t, O_t, respectively. The memory cell and candidate memory cell content are denoted, respectively, as C and C̃.
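A single LSTM time step following these gate equations can be sketched as follows; the weights are random placeholders, and σ and ∘ are the sigmoid and element-wise product.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with the standard gate equations."""
    Wxi, Whi, bi = params["i"]   # input gate
    Wxf, Whf, bf = params["f"]   # forget gate
    Wxo, Who, bo = params["o"]   # output gate
    Wxc, Whc, bc = params["c"]   # candidate memory cell
    i_t = sigmoid(x_t @ Wxi + h_prev @ Whi + bi)
    f_t = sigmoid(x_t @ Wxf + h_prev @ Whf + bf)
    o_t = sigmoid(x_t @ Wxo + h_prev @ Who + bo)
    c_tilde = np.tanh(x_t @ Wxc + h_prev @ Whc + bc)
    c_t = f_t * c_prev + i_t * c_tilde    # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
d, h = 1, 4                               # input size, hidden units
params = {k: (rng.normal(size=(d, h)), rng.normal(size=(h, h)), np.zeros(h))
          for k in "ifoc"}
h_t, c_t = lstm_step(rng.normal(size=(1, d)),
                     np.zeros((1, h)), np.zeros((1, h)), params)
```

In practice these equations are supplied by the framework (torch.nn.LSTM in this paper's PyTorch implementation); the sketch only exposes the gate arithmetic.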

Bidirectional LSTM (BiLSTM)
The bidirectional LSTM (BiLSTM) is an improved version of the LSTM. The network is structured to allow both forward and backward propagation through the sequential layers [4]. In contrast to the LSTM, which only allows a forward pass through the current state, the BiLSTM's unique advantage is improved accuracy in state reconstruction. The structure combines two hidden states, allowing information to flow between the forward and backward layers. The forward, backward, and output sequences are defined in terms of the LSTM unit L, with its sigmoid gate functions, applied in each direction.

Gated Recurrent Unit (GRU)
The GRU is built to improve the performance of the LSTM while reducing its number of parameters. It has only two gates, the update and reset gates, instead of the three gates of the LSTM: the update gate merges the roles of the LSTM's input and forget gates [4], and the reset gate plays a role analogous to the output gate. This makes the GRU a streamlined refinement of the LSTM. The relationships among the gates are defined as follows.
• Update gate: Z_t = σ(X_t W_xz + H_{t−1} W_hz + b_z);
• Reset gate: R_t = σ(X_t W_xr + H_{t−1} W_hr + b_r), where W_xr, W_xz, W_hr, W_hz are the weight parameters and b_r, b_z are the bias parameters;
• Candidate hidden state: H̃_t = tanh(X_t W_xh + (R_t ∘ H_{t−1}) W_hh + b_h), where W_xh, W_hh are the weight parameters and b_h is a bias parameter;
• Hidden state: H_t = Z_t ∘ H_{t−1} + (1 − Z_t) ∘ H̃_t, a combination, weighted by the update gate Z_t, of the previous hidden state H_{t−1} and the current candidate hidden state H̃_t.
Figure 3 presents the BiLSTM architecture (left) and the GRU architecture (right). In the BiLSTM, the forward and backward layers are indicated by yellow and green colors, respectively. In the GRU, the reset and update gates are represented by R_t and Z_t, respectively.
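A single GRU time step can be sketched in the same way; note that the gated combination below follows one common convention, and some references swap the roles of Z_t and 1 − Z_t.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step with update gate Z_t and reset gate R_t."""
    Wxz, Whz, bz = params["z"]   # update gate
    Wxr, Whr, br = params["r"]   # reset gate
    Wxh, Whh, bh = params["h"]   # candidate hidden state
    z_t = sigmoid(x_t @ Wxz + h_prev @ Whz + bz)
    r_t = sigmoid(x_t @ Wxr + h_prev @ Whr + br)
    h_tilde = np.tanh(x_t @ Wxh + (r_t * h_prev) @ Whh + bh)
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde   # gated combination
    return h_t

rng = np.random.default_rng(0)
d, h = 1, 4                                      # input size, hidden units
params = {k: (rng.normal(size=(d, h)), rng.normal(size=(h, h)), np.zeros(h))
          for k in "zrh"}
h_t = gru_step(rng.normal(size=(1, d)), np.zeros((1, h)), params)
```

With one fewer gate and no separate cell state, the GRU step carries fewer parameters than the LSTM step, which is the efficiency gain the text describes.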

Residual Neural Network (ResNet)
This type of deep neural network behaves like the first-order Euler method [19]. In this work, we cast the ResNet as an ODE solver and construct a hybrid approach with traditional recurrent neural networks. Given a deep neural network N with parameters θ, the ResNet update is defined as y_out = y_in + N(y_in; θ), where y_out is the output and y_in is the input (data from a solver, fabricated data, or real measurements). The basic structure of the ResNet is given in Figure 4.
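The residual update can be sketched directly; the small two-layer network below is a placeholder for N, not the paper's architecture.

```python
import numpy as np

def residual_block(y_in, net, theta):
    """ResNet update y_out = y_in + N(y_in; theta), mirroring one
    forward-Euler step y_{n+1} = y_n + h*f(y_n)."""
    return y_in + net(y_in, theta)

def small_net(y, theta):
    # Placeholder two-layer network standing in for N.
    W1, W2 = theta
    return np.tanh(y @ W1) @ W2

rng = np.random.default_rng(0)
theta = (rng.normal(scale=0.1, size=(3, 8)),
         rng.normal(scale=0.1, size=(8, 3)))
y = rng.normal(size=(1, 3))
y_out = residual_block(y, small_net, theta)
```

Because the block learns only the increment y_out − y_in, stacking such blocks resembles integrating an ODE one Euler step at a time, which is what motivates using the ResNet output as solver-like input for the recurrent networks.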

Data Preprocessing
The data for this work is obtained from the Tennessee Health Department [27]. The data available at the time of preparing this manuscript runs from 12 March 2020 to 16 September 2021. Cumulative cases, with a 7-day rolling mean applied, are used to train the EINN. In particular, the training data spans 12 March 2020 to 16 December 2020, the period before the vaccination program began. The data is preprocessed using MinMax scaling. The EINN is implemented in TensorFlow, while the LSTM, BiLSTM, and GRU, as well as the k-fold cross validation, are implemented in PyTorch.
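The two preprocessing steps described here can be sketched as follows; a trailing 7-day window is assumed, since the paper does not specify whether the window is centered, and the case counts below are illustrative only.

```python
import numpy as np

def rolling_mean(x, window=7):
    """7-day rolling mean (trailing window, shorter at the start)."""
    return np.array([x[max(0, i - window + 1): i + 1].mean()
                     for i in range(len(x))])

def minmax_scale(x):
    """MinMax scaling of values to [0, 1], as done before training."""
    return (x - x.min()) / (x.max() - x.min())

cases = np.array([0., 5., 9., 20., 14., 30., 28., 45., 40., 60.])
smoothed = rolling_mean(cases)
scaled = minmax_scale(smoothed)
```

The rolling mean suppresses day-of-week reporting artifacts, and scaling to [0, 1] keeps the activations in a well-conditioned range during training.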

Error Metrics
In this section, the error metrics for the data-driven simulations are presented, along with the rationale for when to use each. The metrics used to measure the performance of regression-based models are the Root Mean Squared Error (RMSE), the Mean Absolute Percentage Error (MAPE), and the Explained Variance (EV) [4]. RMSE values are used to compare different models, the best model having the least value [28]. Throughout, y denotes the real data and ŷ the predicted data from the model.

1. RMSE: Taking the square root of the mean squared error (MSE) gives the Root Mean Squared Error, RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²). This metric rescales the MSE, which tempers large errors. The RMSE serves as a general-purpose error metric and is excellent for numerical predictions [28]. As a scale-dependent measure of model accuracy, the RMSE is used to compare the prediction errors of different models or model configurations for a particular variable.

2. MAPE: The Mean Absolute Percentage Error is the relative error in the mean absolute error, MAPE = (100/n) Σ_{i=1}^{n} |y_i − ŷ_i| / |y_i|. Its main advantage is that, as a weighted mean absolute error, it is useful for quantile regression. Its drawbacks are that it cannot handle values at or near zero in the denominator, that unusually high or low values are not well captured since percentages are interpreted on a 0–100% scale, and that high negative errors are penalized more heavily [29].

3. EV: The Explained Variance measures how much of the variation in y is explained by the neural network's predictions [4], EV = 1 − Var(y − ŷ)/Var(y), where Var(y − ŷ) is the variance of the difference between the predicted and the real data. The EV is analogous to the coefficient of determination R² used for linear regression and is well suited to nonlinear regression. The closer the EV is to 1, the more accurately the algorithm predicts the targets.
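The three metrics can be sketched directly from these definitions; the toy arrays below are illustrative, not drawn from the paper's data.

```python
import numpy as np

def rmse(y, y_hat):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean Absolute Percentage Error; y must stay away from zero."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def explained_variance(y, y_hat):
    """EV = 1 - Var(y - y_hat) / Var(y); closer to 1 is better."""
    return 1.0 - np.var(y - y_hat) / np.var(y)

y = np.array([100., 120., 140., 160.])
y_hat = np.array([110., 118., 135., 158.])
```

RMSE and MAPE shrink toward 0 as predictions improve, while EV grows toward 1, which is why the paper reports the smallest RMSE/MAPE and the greatest EV for the best model.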

k-Fold Cross Validation
The k-fold cross validation is the standard method for evaluating the performance of a deep or machine learning algorithm on given data [30], and it is especially useful when the available data is small. In practice, the typical choice of k is 4 or 5. The procedure consists of randomly splitting the available data into k partitions and training k identical models: each model is trained on k − 1 partitions, and the remaining partition (the test set) is used to evaluate it. The overall score of the model is the average of the k test scores. Since this paper concerns a regression model, the error metric used for scoring each fold is the Root Mean Squared Error (RMSE).
With k = 4, the LSTM, BiLSTM, and GRU are each run 4 times, producing 4 outcomes for each of the three algorithms. In each run, three partitions of the data are used for training and the remaining partition for evaluation. The mean squared error (MSE) is used as the loss function, the root mean squared error (RMSE) is computed for each of the k runs, and these errors are stored in the history of errors.
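The partitioning and scoring just described can be sketched as follows; the mean-predicting toy model is a placeholder for the LSTM, BiLSTM, and GRU models actually evaluated in the paper.

```python
import numpy as np

def kfold_rmse(model_fit_predict, X, y, k=4, seed=0):
    """Random k-fold cross validation scored with RMSE.

    model_fit_predict(X_train, y_train, X_test) returns predictions for
    X_test; the mean and standard deviation of the k fold scores are
    returned, as in the paper's comparison.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        y_hat = model_fit_predict(X[train], y[train], X[test])
        scores.append(np.sqrt(np.mean((y[test] - y_hat) ** 2)))
    return np.mean(scores), np.std(scores)

# Toy stand-in model: predict the training mean for every test point.
mean_model = lambda Xtr, ytr, Xte: np.full(len(Xte), ytr.mean())
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
mean_score, std_score = kfold_rmse(mean_model, X, y, k=4)
```

Re-running with k = 5, 6, 7, as done in the paper, simply changes the number of partitions while the scoring logic stays the same.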
To ascertain which algorithm performs best, the mean and standard deviation are computed for comparison. For a more reliable comparison, additional simulations are run with k = 5, 6, 7. The parameter settings for cross validation on the real COVID-19 data are: learning rate 0.01, 1500 epochs, 1 hidden layer, and 16 hidden units. Figure 5 summarizes the k-fold cross validation: the available data is split into 4 random partitions, each fold using three partitions for training and one for evaluating the model; the RMSE scores each fold, and the final average score and standard deviation are computed across the test scores. Figure 6 gives the overall workflow of the deep learning algorithm for prediction. COVID-19 data is obtained from reliable sources and randomly split into train and test sets at 80% and 20%, respectively. For the EINN algorithm (see Figure 1), training uses all days before the start of vaccination, and the epidemiology parameters, transmission rate β and recovery rate γ, are learned along with the infected group. For reliability and accuracy of the learned parameters, the EINN is run 200 times to quantify their uncertainty, constructing a confidence interval bound with the Bootstrap algorithm; the Poisson distribution is used to resample the data [11]. Using different vaccination rates, the EINN learns different β and γ. A standard ODE solver then produces epidemic outcomes at the different vaccination rates using the learned parameters and the efficacy rate η. ResNet data is passed into the LSTM, BiLSTM, and GRU to form the hybrid methods ResNet-LSTM, ResNet-BiLSTM, and ResNet-GRU. The error metrics are used to compute the deviations on the test data, and a short-term prediction 15 days into the future is made with a confidence interval.
A cross validation is carried out to evaluate the performance of the LSTM, BiLSTM, GRU, and the hybrid approaches. Table 1 gives a statistical summary of the Tennessee COVID-19 data; the skewness is used to check asymmetry, while the kurtosis assesses the flatness of the time series distribution. The skewness can be either positive or negative, and a kurtosis value less than that of the Gaussian distribution (3) indicates a flatter distribution. The dispersion of the distribution is described by the other statistics, such as the quartiles and standard deviation; the summary includes the five-number summary, kurtosis, and skewness. Figure 7 shows the real COVID-19 data for Tennessee and the data fit for learning constant epidemiology parameters with the EINN of Algorithm 1, together with the confidence interval constructed with the Bootstrap algorithm. To obtain a reliable estimate of the learned parameters, the EINN algorithm is run 200 times; the Poisson distribution is used to replicate similar cumulative COVID-19 data in the Bootstrap algorithm. During each independent run of the EINN, the value of each learned epidemiology parameter is stored, and the array of stored values is graphed as a distribution in a bar graph.

Data-Driven Simulations
In this section, the EINN algorithm is used to learn different β and γ for different vaccination rates: 0.5%, 1%, 2%, 3%, 6%, and 10%. These rates are assumed constant per day over ten months of vaccination. The goal is to analyze how vaccination, combined with vaccine efficacy, drives infections to zero more quickly. The learned epidemiology parameter values are used in numerical simulations of model (1) with a standard ODE solver. Figure 8 shows the model without vaccination: the EINN supplies the epidemiological parameters, and the values are passed to the numerical solver to obtain the shape of the graph. Assuming no vaccination, the infection peaks after 46 days, and approximately 23.35% of the population becomes infected. Table 2 presents the impact of vaccination with an efficacy rate of 80%. At each vaccination rate v, the EINN learns different β and γ values, producing a different basic reproduction number R_0 for the entire period; Equation (7) is used to compute R_0. As the vaccination rate increases from v = 0 to v = 0.5%, R_0 drops from 2.5 to 2.05. Figure 9a,b presents the model with vaccination and its impact on the Susceptible, Infected, and Recovered groups: as the vaccination rate increases, the percentage of Susceptible individuals decreases while the percentage of Recovered individuals increases. With a fixed vaccine efficacy of η = 94%, two points are significant: (a) as the vaccination rate increases from 1% to 3%, the percentage of the population infected falls from 13.76% to 10.11%; (b) at vaccination rates of 6% and 10%, the percentage falls further to 7.79% and 6.57%, respectively.
The impact of vaccination with 94% efficacy is presented in Table 3. For v = 0.5%, R_0 decreases from 2.5 to 2.0, meaning that the higher the efficacy rate, the faster the decline in the spread of the virus. This claim is further supported by the fact that when the vaccination rate increases to 10%, R_0 is 1.38 in Table 2 but 1.31 in Table 3. Figure 10 shows the impact of vaccination with fixed efficacy η = 80% on the infected population, together with the effective reproduction numbers corresponding to the basic reproduction numbers computed in Table 2 for vaccination rates of 0%, 1%, 3%, and 10%. The decrease in the effective reproduction number shown in the graphs is indicative of the impact of vaccination on the infected group: as the vaccination rate v increases from 0% to 10%, the level of infectiousness decreases. When v = 0%, R_0 = 2.5; as the vaccination rate increases from 0% to 3%, R_0 decreases from 2.5 to 1.73, and increasing the vaccination rate further to 10% reduces the basic reproduction number drastically to 1.38. This confirms the impact of vaccination in reducing the spread of the virus. Figure 11 shows the corresponding results for fixed efficacy η = 94%, with effective reproduction numbers matching the basic reproduction numbers computed in Table 3; in particular, Figure 11b shows the effective reproduction numbers for vaccination rates of 0%, 1%, 3%, and 10%.
Various data-driven simulations (using the real data and the ResNet data) are performed to predict daily infected cases for Tennessee using the LSTM, BiLSTM, and GRU; the parameter settings for these simulations are given in Table 4. Figure 11 shows the vaccination model with the effective reproduction number when the efficacy is η = 94%: in (a), as the vaccination rate increases from 0% to 10%, the level of infectiousness declines; (b) shows the effective reproduction numbers for the different vaccination rates v with their corresponding basic reproduction numbers R_0. When v = 0%, R_0 = 2.5; as the vaccination rate increases from 0% to 3%, R_0 decreases from 2.5 to 1.79, and increasing it further to 10% reduces the basic reproduction number drastically to 1.31, again showing the impact of vaccination on the spread of the virus. Figure 12 shows the data-driven simulation for learning the infected group using the ResNet, plotted in magenta against the daily infected cases for comparative analysis; the data spans a nine-month period, 16 January 2021 to 16 September 2021. In Figure 13, the outputs of the LSTM, BiLSTM, and GRU, trained on the real COVID-19 data, are plotted along with the daily infected cases over the same nine-month period: the daily cases are colored blue, the LSTM black, the BiLSTM green, and the GRU magenta. The hybrid approaches ResNet-LSTM, ResNet-BiLSTM, and ResNet-GRU are also considered, with the ResNet output used as the input data for the LSTM, BiLSTM, and GRU.
The parameter settings in Table 4 are applied to the different hybrid approaches to generate the ensuing outputs. The outcome of the hybrid approaches is shown in Figure 14.

Error Metrics for Data-Driven Simulations
The error metrics from the data-driven simulations on the real COVID-19 data and on the ResNet data for Tennessee are shown below. The data is split into train and test sets; the train data is used to train the models, and the error metrics are obtained from the difference between the test portion of the actual COVID-19 data and the corresponding model output. Table 5 shows the error metrics, while Figure 16 gives the EV values for each approach; the approach with the greatest EV is ResNet-GRU.
The errors obtained from ResNet-LSTM, ResNet-BiLSTM, ResNet-GRU, LSTM, BiLSTM, and GRU are presented. The error metrics RMSE, MAPE, and EV are used to determine which algorithm learns the dynamics best. In particular, smaller RMSE values indicate the better algorithm in comparison to the others, and EV values closer to 1 demonstrate lower variation in the predictive power of the neural network.

k-Fold Cross Validation
The mean RMSE values for all four values of k are plotted against each value of k for each model in the bar graph of Figure 17 for the real COVID-19 data: LSTM is colored blue, while BiLSTM and GRU are colored orange and green, respectively. When k = 4, BiLSTM has the lowest average RMSE, followed by LSTM and then GRU. For k = 5, LSTM has the lowest average RMSE, followed by GRU and then BiLSTM. As k increases to 7, GRU has the smallest RMSE, followed by LSTM and then BiLSTM. Figure 18 shows the corresponding mean RMSE values for the ResNet data, plotted against each value of k as cross validation scores for Tennessee. When k = 4, ResNet-BiLSTM has the lowest RMSE, followed by ResNet-GRU and then ResNet-LSTM. With k = 5, ResNet-GRU has the lowest RMSE, followed by ResNet-BiLSTM and then ResNet-LSTM. When k = 6, ResNet-GRU again has the lowest RMSE, followed by ResNet-LSTM and ResNet-BiLSTM. As k increases from 5 to 7, it can be inferred that ResNet-GRU has the smallest average RMSE.

Prediction and Confidence Interval
The Bootstrap algorithm is used to draw a specified number of samples from the given data and to replicate it for the purpose of constructing a confidence interval. A short-term prediction 15 days into the future (16 September 2021 to 1 October 2021) uses the ResNet data for Tennessee. Figure 19 presents the confidence interval for the ResNet-GRU prediction 15 days into the future. The Bootstrap algorithm generates the confidence interval band from 10 independent runs, with the Poisson distribution used to replicate the real COVID-19 data for each run. The lower 95% and upper 95% bounds are given in the graph, and the mean prediction is also shown. Across all runs, the mean and standard deviation are computed to obtain a 95% confidence interval for the future predictions, and the confidence interval bound provides a reasonable bound on them.
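The procedure can be sketched as follows; the toy predictor below is a placeholder for the trained ResNet-GRU, and the 15-day horizon matches the paper's prediction window.

```python
import numpy as np

def bootstrap_poisson_ci(daily_cases, predict, runs=10, level=95, seed=0):
    """Bootstrap a prediction band by Poisson-resampling the data.

    Each run draws a synthetic series from Poisson(lam=daily_cases),
    predicts with `predict`, and the percentile band across the runs
    gives the confidence interval, as described in the text.
    """
    rng = np.random.default_rng(seed)
    preds = np.array([predict(rng.poisson(daily_cases))
                      for _ in range(runs)])
    lo = np.percentile(preds, (100 - level) / 2, axis=0)
    hi = np.percentile(preds, 100 - (100 - level) / 2, axis=0)
    return preds.mean(axis=0), lo, hi

# Toy predictor standing in for ResNet-GRU: extend the series with its
# last 7-day mean for the next 15 days.
toy_predict = lambda series: np.full(15, series[-7:].mean())
data = np.array([120., 135., 150., 160., 170., 180., 190., 210., 230., 250.])
mean_pred, lower, upper = bootstrap_poisson_ci(data, toy_predict)
```

Poisson resampling is a natural choice for count data such as daily cases, since each resampled series has the observed counts as its expected values.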

Conclusions
In this work, a COVID-19 vaccination model with vaccine efficacy has been developed and analyzed, with the epidemiological parameters of the model obtained from the inverse problem solved by an epidemiology informed neural network. A hybrid approach combining a residual neural network with recurrent neural networks was implemented. Based on the error metric values, the goal of investigating which particular hybrid approach produces reliable future predictions of infected cases was achieved: the error metrics for the data-driven simulations demonstrate that ResNet-GRU was the best hybrid approach, with the smallest root mean squared error and mean absolute percentage error values and the greatest explained variance value. A k-fold cross validation with random splitting, using the average root mean squared error as the mean score, supported the conclusion that ResNet-GRU is the best algorithm. Based on the quantified uncertainty of the 95% confidence interval, the results of the short-term prediction demonstrated the overall effectiveness of our approach.
The epidemiological importance of this study was demonstrated in the vaccination model with vaccine efficacy. The model without vaccination revealed a high peak of infectiousness, whereas the model with vaccination demonstrated a lower peak. The insight is that vaccinating the public at a steady daily rate curtails the spread of the disease faster, as confirmed by our study.