Hybrid Time Series Model for Advanced Predictive Analysis in COVID-19 Vaccination

Khalil, Amna; Awan, Mazhar Javed; Yasin, Awais; Kousar, Tanzeela; Rahman, Abdur; Youssef, Mohamed Sebaie

doi:10.3390/electronics13132468

Open Access Article


by Amna Khalil ¹, Mazhar Javed Awan ¹,*, Awais Yasin ², Tanzeela Kousar ³, Abdur Rahman ⁴ and Mohamed Sebaie Youssef ⁵

¹ Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan

² Department of Computer Engineering, National University of Technology, Islamabad 44000, Pakistan

³ Institute of Computer Science and Information Technology, The Women University Multan, Multan 60650, Pakistan

⁴ Department of Computer Science, University of Bremen, 28359 Bremen, Germany

⁵ Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt

* Author to whom correspondence should be addressed.

Electronics 2024, 13(13), 2468; https://doi.org/10.3390/electronics13132468

Submission received: 16 April 2024 / Revised: 27 May 2024 / Accepted: 14 June 2024 / Published: 24 June 2024

(This article belongs to the Special Issue Application of Time Series Analysis and Forecasting in Computer Science)


Abstract:

This study aims to improve the prediction of COVID-19 vaccination trends using a novel integrated forecasting model, supporting better public health decision-making and resource allocation during the pandemic. As the COVID-19 pandemic continues to affect global health, accurately forecasting vaccination trends is critical for an effective public health response and strategy development. Traditional forecasting models often fail to capture the complex dynamics of pandemic-driven vaccination rates. The analysis uses a comprehensive dataset of 68,487 entries detailing daily vaccination statistics across various demographics and geographic locations, which provides a robust foundation for modeling and forecasting. Advanced time series analysis techniques and machine learning algorithms are applied to predict future vaccination patterns with the Hybrid Harvest model, which combines the strengths of the ARIMA and Prophet models. Hybrid Harvest exhibits superior performance, with a mean-square error (MSE) of 0.1323 and a root-mean-square error (RMSE) of 0.0305. These results indicate that the model is substantially more accurate than traditional forecasting methods in predicting vaccination trends. By integrating the ARIMA and Prophet models, it offers significant advances in forecasting COVID-19 vaccination trends and serves as a powerful tool for policymakers to plan vaccination campaigns efficiently and effectively.

Keywords:

COVID-19; vaccination; ARIMA; Prophet; LSTM; time series analysis; machine learning; predictive analysis

1. Introduction

As COVID-19 spread across the globe, the World Health Organization declared it a pandemic [1]. Pakistan reported 991,727 confirmed COVID-19 cases and 22,800 deaths. Worldwide, as of 17 July 2023 [2], 191 million people had been reported as infected with the virus and 4.09 million had died from COVID-19-related causes. The pandemic has had a significant impact on global health and economies, leading to a massive effort to control the spread of the virus through vaccination. However, vaccination rates have varied across regions, and it is not clear what factors drive these differences. The purpose of this research is to analyze monthly time series data on vaccination rates and to identify the patterns, trends, and relationships that help explain the factors affecting vaccination rates.

Time series analysis is used to forecast COVID-19 vaccination because it is a powerful tool for analyzing and modeling data that change over time. Time series models take the previous values of the data into account and can capture patterns and trends, making them well suited to forecasting future values. Time series analysis also applies to any data recorded sequentially over time, such as daily or monthly COVID-19 vaccination counts.

Moreover, time series models can improve predictions by incorporating factors such as seasonality, trend, and autocorrelation, which makes time series analysis a good way to forecast COVID-19 vaccinations. Figure 1 shows COVID-19 vaccination sentiment on Twitter.

A detailed interpretation of the results of the time series analysis of the COVID-19 vaccination dataset is provided. The analysis shows that vaccination rates differ across regions, and the reasons for these differences need to be investigated in more detail. The results also indicate that different vaccination strategies could significantly affect vaccination rates, and further research is needed to determine which strategies are most effective. Of the four models applied (LSTM, ARIMA, Prophet, and Hybrid Harvest), Hybrid Harvest performed best.

The findings of this research have significant implications for the field of public health and for controlling the spread of COVID-19. The insights that can be gained from this analysis will help guide future research on the topic and inform decision-making. As part of this study, vaccination rates during the COVID-19 pandemic were analyzed in detail, contributing to the current knowledge in this field.

Future research should extend the analysis to other regions and countries to understand global vaccination rates; investigate the impact of social and economic factors on vaccination rates; analyze the impact of different vaccination strategies on specific populations, such as elderly people, people with underlying health conditions, and ethnic minorities; and use more sophisticated machine learning techniques, such as deep learning, to analyze the dataset and improve model performance. A cost–benefit analysis of different vaccination strategies could also be conducted.

Based on four time series analysis techniques applied to COVID-19 vaccination data, the contributions of this study are as follows:

Investigate different time series analysis techniques for capturing temporal patterns in COVID-19 vaccination data.
Propose a Hybrid Harvest model based on the ARIMA and Prophet models for forecasting COVID-19 vaccination trends.
Evaluate and validate the performance of the proposed model using RMSE and MSE, and compare the Hybrid Harvest model with other commonly used time series models such as LSTM and Prophet.
Identify the most accurate time series model for forecasting COVID-19 vaccination trends and provide insights for future planning.

The COVID-19 vaccine has been the subject of numerous studies, each offering its own findings and perspectives on public sentiment. Social media is widely used in the modern world, and the COVID-19 vaccination campaign has shown how crucial it is for communication. Because the pandemic has had a significant impact on the world, understanding vaccination trends is critical for planning an effective response to the virus. Using machine learning and time series techniques, including ARIMA, LSTM, and Prophet, this research aimed to create a model that can accurately predict vaccination trends. The freely accessible COVID-19 vaccination dataset was used to train the models. The results can inform decision-making in the healthcare industry and aid in planning the response to the ongoing COVID-19 pandemic. Related work is reviewed next.

One study emphasized the failure of tweets from COVID-19 and World Health Organization (WHO) accounts to guide people during the pandemic crisis and analyzed two categories of tweets gathered during the pandemic. In the first, just 35 of the approximately 23,000 retweeted messages posted between 1 January and 23 March 2020 were positive. The analyses reveal that even though the majority of the 40-population tweeted favorably about COVID-19, users were busy retweeting negative tweets, and word cloud and word-frequency analyses of the tweets failed to find any pertinent words [2].

To ascertain whether the general public's perception of digital contact tracing changed across different months of the crisis, and to learn more about public emotions toward contact tracing, one study employed machine learning methodologies; its findings support crucial societal concerns about electronic disease surveillance [3]. Another study demonstrated that a two-way integration method outperformed state-of-the-art methods and was effective at learning sentiment-aware emoji embeddings; convolutional neural networks (CNNs), LSTMs, and artificial neural networks (ANNs) were employed to support this argument [4]. In that work, phrases are spread using an ANN, while visual words are spread using LSTM and CNN. Additionally, the authors of [5] recommended the best techniques for developing popular sentiment lexicons for sentiment analysis with current methods. Another study examined how and why Twitter users discuss COVID-19, applying machine learning techniques to assess tweets about the coronavirus; it identified at most 11 prominent topics, which were then divided into ten subtopics [6]. The medical community could also use this analysis pipeline to conduct real-time qualitative evaluations of the public's response to health intervention techniques [7]. Another paper investigated the effect of COVID-19 vaccinations on the population of Brazil using daily COVID-19 death data from 17 March 2020 to 19 October 2021, a total of 582 observations, which were analyzed with permutation entropy (Hs) and statistical complexity (Cs). To the authors' knowledge, it was the first study to provide empirical evidence of the population-level impact of COVID-19 vaccinations [8].

Current evidence suggests that COVID-19 vaccines may provide some protection against new variants, although their effectiveness may be reduced compared with the original strain [9]. In [10], the results indicate that the proposed forecasting technique outperforms existing forecasting methods. Deep neural networks were used to develop a novel technique for classifying tweets about the coronavirus and forecasting future increases in cases [11]. To predict COVID-19 outbreaks in the USA, the autoregressive integrated moving average (ARIMA) model was compared with the extreme gradient boosting (XGBoost) model to determine which is more reliable [12].

One study stresses that it is crucial for all countries to have equal access to, and optimal uptake of, COVID-19 vaccines, which offer the only hope for a successful end to the pandemic [13]. A tagged dataset for sentiment analysis and a picture classification tool were also made available. Prior to the pandemic, another study examined parent forums discussing medical treatment and vaccinations [14]. In another work, the results show that the proposed approach provides higher accuracy than traditional ARIMA models. A further study examines the effectiveness of COVID-19 vaccination strategies in various countries and their impact on controlling the spread of the virus [15]. Another proposed platform aims to efficiently generate crucial data on the safety and effectiveness of multiple vaccine candidates in parallel, in order to hasten the licensure and distribution of multiple vaccines against COVID-19 [16].

Real-world data confirm the strong effectiveness of COVID-19 vaccination, although the effectiveness is lower than that observed in clinical trials [17]. The findings of [18] should be interpreted with caution because of certain limitations; for example, the study's sample size may be insufficient to detect small changes in vaccination rates. Analyzing COVID-19's potential effects in India and projecting its future behavior are extremely crucial, and genetic programming (GP)-based prediction models have been developed for this purpose [19]. Time series analysis in [20] indicated an exponential increase in cases over the following several days. Another work examined the relationship between vaccination and non-vaccination policies, the potential impact of vaccinations on the pandemic's transmission, morbidity, and mortality, and global disparities in vaccine access [21].

The purpose of this study is to investigate different time series analysis techniques and their application to forecasting COVID-19 vaccination trends. Three models were selected as the foundations for the Hybrid Harvest model. First, ARIMA is a traditional time series analysis technique that uses the past values of a series to predict future trends. Second, the LSTM model [22] is a type of artificial neural network that is highly suitable for time series prediction. Third, Prophet is a flexible non-parametric model for forecasting time series data. The Hybrid Harvest model combines the strengths of these models to create a more accurate and robust forecast. The methods and their contributions (comparative analysis) are shown in Table 1.

2. Materials and Methods

This section presents the study's main theoretical framework, gives a full description of the data, explains how the data are prepared for the subsequent deep learning implementations, and describes how our model works. The proposed methodology was implemented in Python using Google Colab. Figure 2 shows the proposed study, which focuses on a comparative analysis of several deep learning approaches.

The methodology used in this study consisted of several steps: data collection, preprocessing, model selection, tuning, evaluation, comparison and interpretation, and conclusion. Data collection involved obtaining the COVID-19 vaccination data used for the analysis.

A number of steps were taken to prepare the data for the models, including cleaning, transforming, and normalizing it.

2.1. Data Collection

This study gathered information about vaccination dates, vaccination locations, and vaccination numbers, including the number of people vaccinated. The dataset contains 68,487 rows and multiple columns, but only two columns, the vaccination date and total vaccinations, were used [27]. To ensure the accuracy and reliability of the analysis, the collected data were preprocessed, cleaned, and checked for missing values and outliers. Table 2 presents a description of the features.

2.2. Data Preprocessing

Data preprocessing is the process of cleaning, transforming, and normalizing or scaling the data to make them suitable for the models. The first module is a data clarification module, which ensures that the data are clear and correct; it was necessary to retain only unique records for analysis. Once the data have been cleaned, the time series can be analyzed readily and different experiments can be performed on the dataset.

2.3. Data Cleaning

The first step in cleaning the dataset was to remove any irrelevant or duplicate information, ensuring that only relevant and accurate information remained.

2.3.1. Removal of Duplicates

Duplicate records were identified and removed based on unique identifiers.

\text{Duplicates Removed} = \text{Total Records} - \text{Unique Records}

(1)
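Equation (1) maps directly onto a pandas deduplication step. The sketch below is a minimal illustration, not the authors' code: the column names `date`, `location`, and `total_vaccinations`, and the choice of `date` plus `location` as the unique key, are assumptions, since the source does not name its identifiers.

```python
import pandas as pd


def remove_duplicates(df, key_cols):
    """Drop duplicate records by key columns; return the cleaned frame and Eq. (1)."""
    total_records = len(df)
    # keep="first" retains the first occurrence of each key combination
    deduped = df.drop_duplicates(subset=key_cols, keep="first").reset_index(drop=True)
    unique_records = len(deduped)
    # Eq. (1): Duplicates Removed = Total Records - Unique Records
    return deduped, total_records - unique_records


# Toy records; the column names are hypothetical.
records = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-01", "2022-01-02"],
    "location": ["Pakistan", "Pakistan", "Pakistan"],
    "total_vaccinations": [100.0, 100.0, 150.0],
})
clean, removed = remove_duplicates(records, ["date", "location"])
print(removed)  # 1
```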

2.3.2. Handling Missing Values

Missing values were detected with the isnull() function and filled using mean imputation.
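A minimal sketch of this step, assuming the data sit in a pandas DataFrame (the column name `total_vaccinations` is hypothetical). `isnull()` flags the gaps, and `fillna()` writes in the column mean, which pandas computes while skipping NaN values by default.

```python
import numpy as np
import pandas as pd


def impute_mean(df, cols):
    """Detect missing values with isnull() and fill each column with its mean."""
    out = df.copy()
    missing_counts = out[cols].isnull().sum()  # number of NaNs per column
    for col in cols:
        # Series.mean() skips NaN by default, so the mean uses observed values only
        out[col] = out[col].fillna(out[col].mean())
    return out, missing_counts


df = pd.DataFrame({"total_vaccinations": [10.0, np.nan, 30.0]})
filled, counts = impute_mean(df, ["total_vaccinations"])
print(counts["total_vaccinations"])           # 1
print(filled["total_vaccinations"].tolist())  # [10.0, 20.0, 30.0]
```

Mean imputation ignores temporal order; for a cumulative series such as total vaccinations, a time-aware alternative such as `Series.interpolate()` may preserve the trajectory better.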

2.3.3. Outlier Detection

Outliers were detected using the Z-score method: values with a Z-score greater than 3 or less than −3 were considered outliers. Equation (2) defines the Z-score:

Z = \frac{X - \mu}{\sigma}

(2)

where X is the data point, μ is the mean, and σ is the standard deviation.
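The Z-score rule can be sketched with NumPy as follows. This is an illustrative assumption about how the rule might be applied (population standard deviation, |Z| > 3), not the authors' exact implementation.

```python
import numpy as np


def zscore_outliers(x, threshold=3.0):
    """Boolean mask of points whose Z-score (Eq. 2) exceeds the threshold in magnitude."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # population standard deviation (ddof=0)
    if sigma == 0:
        return np.zeros(x.shape, dtype=bool)  # constant series: no outliers
    z = (x - mu) / sigma
    return np.abs(z) > threshold


values = np.array([1.0] * 19 + [100.0])
mask = zscore_outliers(values)
print(np.flatnonzero(mask))  # [19]
kept = values[~mask]         # data with the outlier removed
```

One design consequence worth keeping in mind: with the population standard deviation, a single extreme point among n values can reach at most |Z| = √(n − 1), so the |Z| > 3 rule cannot flag anything in samples of ten or fewer points.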

2.3.4. Normalization

Data normalization was performed to scale the values between 0 and 1 using the Min-Max scaling method.
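Min-Max scaling maps each value x to (x − min)/(max − min). The NumPy sketch below uses hypothetical helper names; it also keeps the minimum and maximum so that forecasts can be mapped back to the original units, which scikit-learn's `MinMaxScaler` (used in the Results section) supports through `inverse_transform`.

```python
import numpy as np


def min_max_scale(x):
    """Scale values to [0, 1]; return the scaled array plus (min, max) for inversion."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    span = x_max - x_min
    if span == 0:
        return np.zeros_like(x), x_min, x_max  # constant series maps to 0
    return (x - x_min) / span, x_min, x_max


def inverse_min_max(scaled, x_min, x_max):
    """Map scaled values (e.g., model forecasts) back to the original units."""
    return np.asarray(scaled, dtype=float) * (x_max - x_min) + x_min


scaled, lo, hi = min_max_scale([10.0, 20.0, 30.0])
print(scaled)                           # [0.  0.5 1. ]
print(inverse_min_max([0.25], lo, hi))  # [15.]
```

In a forecasting setting, computing the minimum and maximum on the training portion only avoids leaking information from the test period into the scaled inputs.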

2.4. Data Modeling and Splitting

This section describes how the models were trained on a designated training dataset and how their efficacy was assessed on a testing dataset. Using time series analysis methods, the study explored the temporal dynamics of the COVID-19 vaccination data. First, the time series problem was assessed comprehensively. The ARIMA, LSTM, and Prophet models then went through a rigorous training and evaluation phase to ensure that they met the evaluation criteria. RMSE, MSE, MAE, and MAPE were used to measure model performance comprehensively. Because the task is predictive, requiring forecasts of future outcomes, this systematic approach helped identify the most effective model for interpreting the temporal patterns of the COVID-19 vaccination data.
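The split-and-score procedure can be sketched as a chronological train/test split (time series should not be shuffled) plus the four metrics named above. This is an illustrative NumPy sketch; the function names are hypothetical.

```python
import numpy as np


def chronological_split(series, test_size):
    """Hold out the last `test_size` (>= 1) observations, preserving time order."""
    series = np.asarray(series, dtype=float)
    return series[:-test_size], series[-test_size:]


def evaluate(actual, predicted):
    """Compute MSE, RMSE, MAE, and MAPE (%) between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined when any actual value is 0 (e.g., the minimum after Min-Max scaling)
        "MAPE": float(np.mean(np.abs(err / actual)) * 100.0),
    }


train, test = chronological_split(np.arange(10), test_size=3)
print(test)  # [7. 8. 9.]
scores = evaluate([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Here the errors are 0, −1, and 2, giving MSE = 5/3, MAE = 1, and MAPE = 33.3%.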

3. Method

Proposed Hybrid-Harvest Model

Time series analysis [28], a robust statistical technique, is pivotal for analyzing data that evolves over time. A time series is typically represented as a sequence of data points, denoted mathematically as in Equation (3).

X = \{x_{1}, x_{2}, x_{3}, \dots, x_{t}\}

(3)

where X is the time series and x_t is the observation at time t.

The Auto-Regressive Integrated Moving Average (ARIMA) model is one of the most widely used time series forecasting methods [29]. It combines three components:

AutoRegression (AR): Uses the dependency between an observation and a number of lagged observations (p).
Integrated (I): Differencing of observations to make the time series stationary (d).
Moving Average (MA): Uses dependency between an observation and a residual error from a moving average model applied to lagged observations (q).
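To make the AR and I components concrete, the sketch below fits a simplified ARIMA(p, 1, 0) model by ordinary least squares: it differences the series once (I, d = 1) and regresses each difference on its p previous values (AR). The MA term is deliberately omitted, so this is an illustration, not the authors' implementation; in practice a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA(y, order=(p, d, q)).fit()`, whose results provide `forecast()` and `resid`) estimates full ARIMA(p, d, q) models by maximum likelihood. The function names are hypothetical.

```python
import numpy as np


def _lag_matrix(z, p):
    """Design matrix [1, z_{t-1}, ..., z_{t-p}] for t = p .. len(z) - 1."""
    n = len(z)
    return np.column_stack([np.ones(n - p)] + [z[p - k:n - k] for k in range(1, p + 1)])


def fit_arima_p_1_0(y, p=2):
    """OLS fit of z_t = c + a_1 z_{t-1} + ... + a_p z_{t-p}, with z = first differences of y."""
    z = np.diff(np.asarray(y, dtype=float))  # I: difference once (d = 1)
    coef, *_ = np.linalg.lstsq(_lag_matrix(z, p), z[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]


def forecast_arima_p_1_0(y, coef, steps):
    """Recursive multi-step forecast that integrates differences back to levels."""
    y = np.asarray(y, dtype=float)
    p = len(coef) - 1
    z = list(np.diff(y))
    level = y[-1]
    out = []
    for _ in range(steps):
        dz = coef[0] + np.dot(coef[1:], z[-p:][::-1])  # lags ordered z_{t-1} .. z_{t-p}
        z.append(dz)
        level += dz  # undo differencing: y_t = y_{t-1} + z_t
        out.append(level)
    return np.array(out)


y = 5.0 + 2.0 * np.arange(20)  # a straight line: constant first differences
coef = fit_arima_p_1_0(y, p=2)
print(forecast_arima_p_1_0(y, coef, 3))  # approximately [45, 47, 49]
```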

Another model is the Long Short-Term Memory (LSTM), which is a type of Recurrent Neural Network that is suitable for time series problems because it can handle sequential data [30]. The Prophet model is an additive time series forecasting model developed by Facebook. It handles seasonality, holidays, and trend components effectively. The model is represented as in Equation (4):

y(t) = g(t) + s(t) + h(t) + \epsilon_t

(4)

where:

y(t) is the value of the time series at time t.
g(t) is the trend function.
s(t) is the seasonal component.
h(t) represents the effects of holidays.
ϵ_t is the error term.
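The additive structure y(t) = g(t) + s(t) + h(t) + ϵ_t can be illustrated as a single linear regression: a linear trend for g(t), Fourier sine and cosine terms for s(t), and a holiday indicator for h(t), with the residual playing the role of ϵ_t. This is a simplified, hypothetical sketch and not Prophet's implementation: Prophet itself uses a piecewise trend with changepoints and fits its model with Stan, and its Python API expects a DataFrame with `ds` and `y` columns (`Prophet().fit(df)`, then `make_future_dataframe` and `predict`). The holiday dates and period below are assumptions for a synthetic monthly series.

```python
import numpy as np


def additive_design(t, holiday, period=12.0, order=2):
    """Columns for y(t) = g(t) + s(t) + h(t): linear trend, Fourier seasonality, holiday flag."""
    t = np.asarray(t, dtype=float)
    trend = [np.ones_like(t), t]                     # g(t) = b0 + b1 * t
    seasonal = [f(2 * np.pi * k * t / period)        # s(t): Fourier series
                for k in range(1, order + 1) for f in (np.sin, np.cos)]
    holiday_col = [np.asarray(holiday, dtype=float)]  # h(t): 1 on holiday periods
    return np.column_stack(trend + seasonal + holiday_col)


def fit_additive(t, y, holiday, period=12.0, order=2):
    """Least-squares estimate of the additive components (the residual is the error term)."""
    X = additive_design(t, holiday, period, order)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


t = np.arange(48)
holiday = np.isin(t, [5, 20, 33]).astype(float)  # hypothetical holiday months
y = 1.0 + 0.5 * t + 2.0 * np.sin(2 * np.pi * t / 12) + 3.0 * holiday
beta = fit_additive(t, y, holiday)
# beta is approximately [1, 0.5, 2, 0, 0, 0, 3]:
# intercept, slope, sin_1, cos_1, sin_2, cos_2, holiday effect
```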

Prophet, developed by Facebook, is a decomposable time series model with three main components: trend, seasonality, and holidays. Time series analysis makes it possible to identify patterns, trends, and relationships in time-dependent data and to predict future values from past values [31]. Figure 3 presents a description of time series analysis.

The Hybrid Harvest model combines ARIMA with Prophet to exploit the strengths of both models and enhance forecasting accuracy. It outperforms standard models by combining ARIMA's ability to recognize linear patterns with Prophet's handling of seasonal patterns. The data are first split into training and testing sets, and ARIMA and Prophet are trained separately on the training data. The final forecast is then computed by a linear regression model that takes the outputs of both models as inputs. Figure 4 shows the flow of the Hybrid Harvest model.
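The linear-regression combination step can be sketched as a small stacking model: given the two base models' forecasts for the same periods, least squares learns an intercept and one weight per model. This is a minimal sketch under the assumption that the regression is fitted on held-out (validation) forecasts rather than on the test period, which would leak information; the forecast arrays below are hypothetical placeholders, not outputs from the paper.

```python
import numpy as np


def fit_stacker(arima_pred, prophet_pred, actual):
    """Least-squares weights for: final = w0 + w1 * ARIMA + w2 * Prophet."""
    X = np.column_stack([np.ones(len(actual)), arima_pred, prophet_pred])
    w, *_ = np.linalg.lstsq(X, np.asarray(actual, dtype=float), rcond=None)
    return w


def combine(w, arima_pred, prophet_pred):
    """Apply the learned weights to new ARIMA and Prophet forecasts."""
    return (w[0] + w[1] * np.asarray(arima_pred, dtype=float)
            + w[2] * np.asarray(prophet_pred, dtype=float))


# Hypothetical validation-period forecasts from the two base models.
arima_val = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
prophet_val = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
actual_val = 0.3 * arima_val + 0.7 * prophet_val
w = fit_stacker(arima_val, prophet_val, actual_val)
print(np.round(w, 6))  # approximately [0, 0.3, 0.7]
```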

The Hybrid Harvest Model employs a structured approach to time series forecasting by leveraging the individual strengths of ARIMA and Prophet models. This method not only improves forecast reliability but also ensures that predictions are well-rounded, considering various aspects of the time series data. The term hybrid in this context refers to the integration of two different time series forecasting models—ARIMA and Prophet. The hybrid approach aims to leverage the strengths of both models:

ARIMA excels in capturing linear patterns and short-term dependencies.
Prophet is effective in modeling seasonality and holiday effects.

By combining these models, the hybrid approach provides a more robust and accurate forecasting method, addressing the limitations of using each model individually.

Algorithm 1 outlines the Hybrid Harvest model for time series forecasting:

Algorithm 1. Hybrid Time Series Model for Predictive Analysis

1. Initialize the time series dataset.
2. Preprocess the data:
a. Handle missing values.
b. Normalize the dataset.
3. Split the data into training and testing sets.
4. Apply the hybrid model:
a. Train the ARIMA model on the training set.
b. Extract residuals from the ARIMA model.
c. Train the LSTM model on the residuals.
5. Combine the predictions from ARIMA and LSTM:
a. Generate ARIMA predictions on the test set.
b. Generate LSTM predictions on the residuals.
c. Sum the ARIMA and LSTM predictions to obtain the final forecast.
6. Evaluate the model performance:
a. Calculate performance metrics (e.g., RMSE, MAE).
7. Output the final predictive results and performance metrics.
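Algorithm 1 can be expressed as a compact pipeline. The sketch below follows steps 1 to 7 but makes two substitutions so that it runs with NumPy alone: an ordinary-least-squares autoregression stands in for ARIMA in step 4a, and a second small autoregression on the residuals stands in for the LSTM in step 4c. It also computes the imputation mean and the Min-Max range from the training window only, to avoid leaking test information. All function names are hypothetical.

```python
import numpy as np


def _lags(y, p):
    """Design matrix [1, y_{t-1}, ..., y_{t-p}] for t = p .. len(y) - 1."""
    n = len(y)
    return np.column_stack([np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)])


def ar_fit(y, p):
    """OLS AR(p) with intercept; returns [c, a_1, ..., a_p]."""
    coef, *_ = np.linalg.lstsq(_lags(y, p), y[p:], rcond=None)
    return coef


def ar_forecast(history, coef, steps):
    """Recursive multi-step AR forecast."""
    p = len(coef) - 1
    hist = list(history)
    out = []
    for _ in range(steps):
        nxt = coef[0] + np.dot(coef[1:], hist[-p:][::-1])
        hist.append(nxt)
        out.append(nxt)
    return np.array(out)


def hybrid_sketch(series, test_size, p_base=2, p_resid=1):
    y = np.asarray(series, dtype=float)                    # 1. initialize
    y = np.where(np.isnan(y), np.nanmean(y[:-test_size]), y)  # 2a. mean-impute (training mean)
    lo, hi = y[:-test_size].min(), y[:-test_size].max()
    y = (y - lo) / ((hi - lo) or 1.0)                      # 2b. Min-Max (training range)
    train, test = y[:-test_size], y[-test_size:]           # 3. chronological split
    base = ar_fit(train, p_base)                           # 4a. linear model (ARIMA stand-in)
    resid = train[p_base:] - _lags(train, p_base) @ base   # 4b. in-sample residuals
    resid_model = ar_fit(resid, p_resid)                   # 4c. residual model (LSTM stand-in)
    base_fc = ar_forecast(train, base, test_size)          # 5a. base forecast
    resid_fc = ar_forecast(resid, resid_model, test_size)  # 5b. residual forecast
    forecast = base_fc + resid_fc                          # 5c. sum the two
    err = test - forecast
    metrics = {"RMSE": float(np.sqrt(np.mean(err ** 2))),  # 6a. metrics
               "MAE": float(np.mean(np.abs(err)))}
    return forecast, metrics                               # 7. output (scaled units)


series = 10.0 + np.sin(0.4 * np.arange(60))  # synthetic series
forecast, metrics = hybrid_sketch(series, test_size=12)
```

A pure sinusoid satisfies an exact AR(2) recursion, so on this synthetic series the base stage alone is essentially perfect; on real data, the residual stage is where a nonlinear learner such as an LSTM would add value.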

The hybrid approach combines the ARIMA and Prophet models to leverage their strengths. The process is as follows:

Fit the ARIMA model to the time series data to capture linear patterns.
Extract the residuals from the ARIMA model.
Fit the Prophet model to the residuals to capture non-linear patterns and seasonality.
Combine the predictions from both models to generate the final forecast.

In more detail, the hybrid methodology involves the following steps:

ARIMA Model Implementation:
- AR: AutoRegressive part which regresses the variable on its own lagged values.
- I: Integrated part which makes the time series stationary through differencing.
- MA: Moving Average part which models the error of the variable.
Prophet Model Implementation:
- Trend component modeled with piecewise linear or logistic growth curve.
- Seasonal component modeled with Fourier series.
Combination of Models:
- Residuals from the ARIMA model are used as input to the Prophet model.
- The final forecast is obtained by combining the predictions from both models.
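The residual-based combination can be sketched in a few lines. To keep the example self-contained, a linear trend regression stands in for ARIMA's linear stage, and a Fourier-series regression on its residuals stands in for Prophet's seasonal component; both substitutions are assumptions for illustration, not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np


def fourier_features(t, period, order):
    """sin/cos pairs for harmonics 1..order (the seasonal basis Prophet also uses)."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([f(2 * np.pi * k * t / period)
                            for k in range(1, order + 1) for f in (np.sin, np.cos)])


def residual_hybrid(y_train, horizon, period=12.0, order=2):
    """Linear stage, then a seasonal stage fitted to its residuals; forecasts are summed."""
    y_train = np.asarray(y_train, dtype=float)
    t_train = np.arange(len(y_train), dtype=float)
    t_future = np.arange(len(y_train), len(y_train) + horizon, dtype=float)

    # Step 1: linear component (trend regression standing in for ARIMA)
    A = np.column_stack([np.ones_like(t_train), t_train])
    lin, *_ = np.linalg.lstsq(A, y_train, rcond=None)

    # Step 2: residuals left over by the linear stage
    resid = y_train - A @ lin

    # Step 3: seasonal component fitted to the residuals (stand-in for Prophet)
    seas, *_ = np.linalg.lstsq(fourier_features(t_train, period, order), resid, rcond=None)

    # Step 4: combine the forecasts of both stages
    linear_fc = lin[0] + lin[1] * t_future
    seasonal_fc = fourier_features(t_future, period, order) @ seas
    return linear_fc, linear_fc + seasonal_fc


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))


t = np.arange(60)
y = 2.0 + 0.1 * t + np.sin(2 * np.pi * t / 12)  # synthetic monthly series
linear_only, hybrid = residual_hybrid(y[:48], horizon=12)
print(rmse(y[48:], linear_only), rmse(y[48:], hybrid))  # the hybrid error is markedly smaller
```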

4. Results

This analysis examined monthly time series data on vaccination rates to understand the factors affecting vaccination rates during the COVID-19 pandemic. The dataset contains the date and the total number of vaccinations administered in different locations. Time series analysis techniques, namely LSTM, ARIMA, Prophet, and Hybrid Harvest, were employed to model the data. The dataset contained 68,487 rows, of which the date and total vaccinations columns were used for our analysis.

In the data preprocessing step, we normalized the data using MinMaxScaler and performed exploratory data analysis to visualize the trend of total vaccinations. In the model selection step, we compared the performance of LSTM, ARIMA, Prophet, and our proposed model, and chose the best model based on the lowest root-mean-square error (RMSE) and mean-square error (MSE). Model tuning was also performed to further improve performance. The results showed that LSTM had higher RMSE and MSE than the ARIMA and Prophet models. The proposed model was the most suitable for our dataset, with the lowest RMSE and MSE, followed by ARIMA and Prophet, whose errors were similar. Because it fits our dataset best, the proposed model can be used to predict future vaccination rates. Further research is needed to determine the factors affecting vaccination rates and to investigate other time series analysis techniques.

4.1. Quantitative Results

The quantitative results are shown in Table 3.

The output above is the training loss of an LSTM model, which measures how well the model fits the training data. The training loss is calculated with a loss function that measures the discrepancy between the model's predictions and the actual values, and the output shows the training loss for each iteration. The LSTM model was trained on the dataset described above, and its training loss was used to evaluate model performance. During training, the loss decreased, indicating that the model progressively fit the training data more accurately; it reached a minimum of 0.0829, the final loss at the end of the 20th epoch.

This suggests that the model achieved a good fit on the training dataset, although its performance should also be evaluated on validation and test sets. Table 4 compares all models with our Hybrid Harvest model.

Table 4 shows the actual total vaccinations (Total_vaccinations) in 2022 and the monthly predictions of the ARIMA, LSTM, Prophet, and Hybrid Harvest models. The values represent the number of vaccinations in millions, and the actual total vaccinations vary between 0.68 and 1.0 million.

Comparing each model's predictions with the actual total vaccinations shows that the Hybrid Harvest model performed best: its predictions were closest to the actual values, resulting in a lower root-mean-square error (RMSE). The Prophet model also showed good results, with moderate prediction errors. In contrast, the ARIMA and LSTM models had higher prediction errors, indicating that their predictions were less accurate than those of the Hybrid Harvest and Prophet models, as shown in Table 5.

Based on the above table, the RMSE (root-mean-square error) values measure the typical magnitude of the error between the actual and predicted values for each of the four models (ARIMA, LSTM, Prophet, and Hybrid Harvest). The lower the RMSE, the more accurate the model.

The Hybrid Harvest model has the lowest RMSE (0.03), which suggests that it is the best-performing of the four models. ARIMA and Prophet have similar RMSE values of 0.366 and 0.387, respectively, indicating that both perform comparably well. The LSTM model, however, has an extremely high RMSE of 220.88, far higher than the other models, which indicates poor accuracy.

The MSE (Mean Squared Error) values further reinforce the conclusion drawn from the RMSE values, with the Hybrid Harvest model having the lowest MSE value (0.132321) and the LSTM model having the highest (48,788.87).

4.2. Ablation Studies

The ablation studies focused on evaluating the contribution of each component of the hybrid model. This involved testing the performance of the ARIMA and Prophet components individually and then combined within the hybrid framework. These studies demonstrated the incremental improvements achieved by integrating both models.

To further validate the efficacy of the proposed hybrid model, a comparative analysis was conducted. The following benchmark models were included in the comparison:

Standard ARIMA Model: A classical time series model used for linear pattern forecasting.

Prophet Model: An additive time series model that handles seasonality and holiday effects.

Exponential Smoothing (ETS): A widely used method for time series forecasting that accounts for trend and seasonality.

The models were evaluated on the same training and testing datasets to ensure a fair comparison. The performance metrics used for comparison were the root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE); their values are reported in Table 6.

The results indicate that the hybrid model outperforms the individual ARIMA, Prophet, and ETS models in terms of RMSE, MAE, and MAPE. This demonstrates the robustness and improved predictive accuracy of the hybrid approach.

4.3. Qualitative Analysis

The qualitative analysis is based on various plots. Figure 5 shows the curve of total vaccinations.

We examined trends in COVID-19 vaccination rates through a qualitative analysis of a dataset that included the date of vaccination, the location, and the total number of vaccinations. This analysis aimed to identify factors that may be affecting these rates and to understand the overall trajectory of the vaccination effort. First, we preprocessed the data by normalizing it so that all values were on the same scale, and examined the data visually for patterns and trends in vaccination rates over time. Next, we applied different time series models, namely LSTM, ARIMA, and Prophet, to understand the overall trend and to predict future vaccination trends, and evaluated each model with performance measures such as RMSE and MSE. Through this analysis, the study identified significant differences in vaccination rates across regions and over time. We observed that the overall trend of vaccination rates has been increasing, with fluctuations at certain points. This information can help policymakers and healthcare professionals develop strategies to improve vaccination efforts and achieve herd immunity against COVID-19. Figure 6 shows the trend, seasonal, and residual graphs.

Seasonal decomposition of time series is a method used to isolate and study the different components that make up time series data. Here, the time series consists of monthly vaccination rates, which the decomposition breaks down into three components: trend, seasonal, and residual. Figure 7 illustrates the decomposition; the trend component represents the overall pattern of the data over time, such as an increase or decrease in vaccination rates.

The seasonal component is the periodic fluctuation in the data, such as higher vaccination rates in certain months of the year. The residual component, which remains after the trend and seasonal components are removed, can be used to identify irregular fluctuations. Together, this three-part analysis can provide insight into vaccination patterns and their causes. Figure 8 shows the comparison curve between the ARIMA model and total vaccinations.
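The trend/seasonal/residual split can be illustrated with a classical additive decomposition: a centered moving average for the trend, per-month averages of the detrended series for the seasonal component, and the remainder as the residual. This NumPy sketch assumes an additive model and a 12-month period; it is not necessarily the procedure behind the paper's figures, and libraries such as statsmodels provide a comparable routine (`statsmodels.tsa.seasonal.seasonal_decompose(x, model="additive", period=12)`).

```python
import numpy as np


def classical_additive_decompose(y, period):
    """Split y into trend + seasonal + residual (classical additive decomposition)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Trend: centered moving average; a 2 x period MA is used when the period is even
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined at both ends of the series
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal: mean detrended value at each position in the cycle, centered to sum to zero
    detrended = y - trend
    pattern = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    pattern -= pattern.mean()
    seasonal = np.tile(pattern, n // period + 1)[:n]
    # Residual: what the trend and seasonal components do not explain
    resid = y - trend - seasonal
    return trend, seasonal, resid


t = np.arange(48)
y = 5.0 + 0.2 * t + np.sin(2 * np.pi * t / 12)  # synthetic monthly series
trend, seasonal, resid = classical_additive_decompose(y, period=12)
```

On this synthetic series, the recovered trend is the straight line 5 + 0.2t, the seasonal component is the sine wave, and the residual is essentially zero wherever the moving average is defined.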

ARIMA, which combines autoregression, differencing, and moving average components, can be used to analyze time series data. Here, the ARIMA model was applied to monthly vaccination rates, with predictions beginning in January 2022 and continuing until the end of 2022. ARIMA models provide a quantitative analysis of time series data and can be used to identify trends in the data, such as seasonality or cyclical behavior. Using the ARIMA model, decision-makers can forecast future vaccination rates and plan for potential changes. Figure 9 shows the loss over the epochs of the LSTM model.

As shown above, the training loss measures how well the model fits the training data; the loss function calculates the difference between the model's predictions and the actual values. Figure 10 shows the curve of the LSTM model's predictions alongside total vaccinations.

Figure 10 plots the LSTM model's predictions of total vaccinations for each month of 2022, starting in January. The predictions rise over the year, gradually at first and then sharply in the later months: the model predicts a relatively low value of 0.995807 for January and its highest value, 141.081544, for December 2022. These are predictions only, and actual results may differ. Figure 11 shows the Prophet model predictions.

The Prophet model was used to forecast total vaccinations for each month of 2022; the forecasts are listed in Table 4 and plotted in Figure 11. The forecast for January 2022 (2022-01-01) is 0.717512, for February (2022-02-01) 0.782367, for March (2022-03-01) 0.840946, and so on. Prophet predicts that total vaccinations will increase steadily from January to December 2022: the lowest forecast is 0.717512 in January and the highest is 1.416276 in December. Figure 12 compares the Hybrid Harvest model predictions with total vaccinations.
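For reference, Prophet fits an additive model of the following form (as described by its developers): g(t) is a piecewise-linear trend with growth rate k, offset m, and rate adjustments δ applied at changepoints (a(t) indicates which changepoints have passed and γ keeps the trend continuous); s(t) is periodic seasonality represented by a Fourier series of period P; h(t) captures holiday effects; and ε_t is noise. The paper does not report which seasonalities, changepoints, or holiday terms were enabled, so the number of Fourier terms N is left generic.

```latex
\begin{aligned}
y(t) &= g(t) + s(t) + h(t) + \varepsilon_t \\
g(t) &= \bigl(k + \mathbf{a}(t)^{\top}\boldsymbol{\delta}\bigr)\, t
        + \bigl(m + \mathbf{a}(t)^{\top}\boldsymbol{\gamma}\bigr) \\
s(t) &= \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
\end{aligned}
```

The steadily rising Prophet forecasts in Table 4 are what one would expect when the fitted trend g(t) dominates the seasonal term over a one-year horizon.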

Figure 12 presents the Hybrid Harvest model's forecasts of total vaccinations for each month of 2022. The x-axis represents the months of 2022, the y-axis represents (normalized) total vaccinations, and the line represents the forecasted values. According to Table 4, the Hybrid Harvest forecasts rise from 0.700948 in January to a peak of 1.000663 in April, stay close to 1.0 through June, and then decline steadily to their lowest value, 0.680327, in December. Figure 13 compares all models with the Hybrid Harvest predictions.
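The rule Hybrid Harvest uses to combine ARIMA and Prophet is defined in the Method section and is not restated here. Purely to illustrate what combining two forecasts can mean, the sketch below assumes a different, simpler rule: a convex weighted average of the ARIMA and Prophet forecasts, with the weight chosen by grid search on a validation window. This is not the authors' method; the Hybrid Harvest values in Table 4 cannot come from such an average, since in January the hybrid forecast (0.700948) lies below both the ARIMA (0.786037) and Prophet (0.717512) forecasts. All function names are hypothetical.

```python
from typing import List, Tuple


def combine(arima: List[float], prophet: List[float], w: float) -> List[float]:
    """Hypothetical hybrid forecast: w * ARIMA + (1 - w) * Prophet, month by month."""
    return [w * a + (1.0 - w) * p for a, p in zip(arima, prophet)]


def mse(actual: List[float], pred: List[float]) -> float:
    """Mean squared error between actual values and predictions."""
    return sum((p - a) ** 2 for a, p in zip(actual, pred)) / len(actual)


def fit_weight(actual: List[float], arima: List[float], prophet: List[float],
               grid_size: int = 101) -> Tuple[float, float]:
    """Grid-search w in [0, 1] minimizing MSE on a validation window; returns (w, mse)."""
    best_w, best_err = 0.0, float("inf")
    for i in range(grid_size):
        w = i / (grid_size - 1)
        err = mse(actual, combine(arima, prophet, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

A grid search is used instead of a closed-form least-squares weight because it keeps the weight inside [0, 1] without extra constraint handling; the weight should be fitted on data held out from the final test months.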

Table 4 compares the predictions of the four models (ARIMA, LSTM, Prophet, and Hybrid Harvest) for total vaccinations in each month from January to December 2022, together with the actual values. ARIMA is almost exact in January but overestimates total vaccinations in every later month, with the gap widening over the year. Prophet tracks the actual values closely through May and then overestimates increasingly, while the LSTM predictions diverge sharply, exceeding 100 from October onward. The Hybrid Harvest predictions are closer to the actual values than ARIMA in 11 of the 12 months and closer than Prophet in 8, overestimating from February to June and underestimating in January and from July to November. All four models overestimate December, when the actual value drops to 0.220636.
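The error measures used in Table 5 (RMSE, MSE) and Table 6 (MAE, MAPE) can be computed with their standard definitions, sketched below. Applied to the ARIMA and Prophet columns of Table 4, the MSE and RMSE functions reproduce the ARIMA and Prophet rows of Table 5 to six decimal places, which is a quick consistency check between the two tables. The function and constant names are illustrative.

```python
import math
from typing import List

# Monthly values for January-December 2022, copied from Table 4.
ACTUAL = [0.788998, 0.772674, 0.897380, 0.888432, 0.945290, 0.925579,
          0.960983, 0.983265, 0.966707, 1.000000, 0.974238, 0.220636]
ARIMA = [0.786037, 0.876809, 0.943666, 1.016778, 1.065485, 1.115655,
         1.142993, 1.180686, 1.217538, 1.256825, 1.299761, 1.325104]
PROPHET = [0.717512, 0.782367, 0.840946, 0.905802, 0.968565, 1.033420,
           1.096183, 1.161039, 1.225894, 1.288657, 1.353513, 1.416276]


def mse(actual: List[float], pred: List[float]) -> float:
    """Mean squared error."""
    return sum((p - a) ** 2 for a, p in zip(actual, pred)) / len(actual)


def rmse(actual: List[float], pred: List[float]) -> float:
    """Root-mean-square error: square root of the MSE, in the data's own units."""
    return math.sqrt(mse(actual, pred))


def mae(actual: List[float], pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, pred)) / len(actual)


def mape(actual: List[float], pred: List[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)


for name, pred in (("ARIMA", ARIMA), ("Prophet", PROPHET)):
    print(name, round(rmse(ACTUAL, pred), 6), round(mse(ACTUAL, pred), 6))
```

Because RMSE is the square root of MSE, an MSE below 1 always yields a larger RMSE, which is a useful sanity check when reading error tables for normalized data.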

To strengthen these findings and position its contribution within the field, this study compares the proposed Hybrid Harvest model with existing works using evaluation metrics such as RMSE and MSE. The purpose of this comparison is to test whether the proposed model outperforms existing models and is therefore suitable for forecasting COVID-19 vaccination trends. Such a comparison allows the authors to demonstrate the advantages of their method and its contribution to COVID-19 vaccination forecasting.

5. Conclusions

This study developed and validated Hybrid Harvest, a hybrid model that integrates ARIMA and Prophet to forecast COVID-19 vaccination trends. The Hybrid Harvest model achieved an RMSE of 0.30488 and an MSE of 0.132321 (Table 5). These errors are lower than those of ARIMA (RMSE 0.365928, MSE 0.133903) and Prophet, but the MSE margin over ARIMA is small, so the hybrid's performance is best described as comparable to ARIMA; its main strength is that it provides a robust alternative combining the strengths of the ARIMA and Prophet models. Beyond predicting vaccination trends, the model can support public health strategies, since accurate and timely data are crucial for decision-making and resource allocation during a pandemic. Its ability to integrate different data points and adapt to different scenarios makes it a practical tool for health officials and policymakers. Further research should extend the Hybrid Harvest model to other regions and incorporate additional factors that may influence vaccination rates, such as socioeconomic factors, public sentiment, and government policies. Further studies could also explore applying this model to forecasting other public health trends, making it useful across a broader range of public health crises. Future work could also explore integrating LSTM with Transformer models, which may improve on the performance of LSTM alone.

Author Contributions

Conceptualization, M.S.Y.; methodology, A.K., M.J.A. and M.S.Y.; validation, A.K. and T.K.; formal analysis, A.Y.; investigation, T.K.; resources, M.J.A., A.Y. and A.R.; data curation, A.K. and T.K.; writing—original draft, A.K. and M.J.A.; writing—review and editing, M.J.A.; visualization, A.R.; supervision, M.J.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset link is available in the dataset section.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ye, J.; Hai, J.; Wang, Z.; Wei, C.; Song, J. Leveraging natural language processing and geospatial time series model to analyze COVID-19 vaccination sentiment dynamics on Tweets. JAMIA Open 2023, 6, ooad023.
2. Ranjbar, M.; Mousavi, S.M.; Madadizadeh, F.; Dargani, N.H.; Iraji, S.; Angell, B.; Assefa, Y. Effect of the COVID-19 pandemic on utilization of essential health services in Iran evidence from an interrupted time series analysis. BMC Public Health 2024, 24, 1006.
3. Shah, A.; Shah, S.; Rand, B.; Champon, X. The Celebrity Factor: Exploring the Impact of Influencers on COVID-19 Vaccine Sentiment through Bayesian Modeling of Time Series. J. South. Assoc. Inf. Syst. 2024, 11, 31–52.
4. Chen, Y.; Yuan, J.; You, Q.; Luo, J. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 117–125.
5. Reddy, D.M.; Reddy, D.N.V. Twitter sentiment analysis using distributed word and sentence representation. arXiv 2019, arXiv:1904.12580.
6. Xue, J.; Chen, J.; Chen, C.; Zheng, C.; Li, S.; Zhu, T. Public discourse and sentiment during the COVID-19 pandemic: Using Latent Dirichlet Allocation for topic modeling on Twitter. PLoS ONE 2020, 15, e0239441.
7. Sanders, A.C.; White, R.C.; Severson, L.S.; Ma, R.; McQueen, R.; Paulo, H.C.A.; Zhang, Y.; Erickson, J.S.; Bennett, K.P. Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse. AMIA Summits Transl. Sci. Proc. 2021, 2021, 555.
8. Araujo, F.H.A.; Fernandes, L.H.S. Lighting the populational impact of COVID-19 vaccines in Brazil. Fractals 2022, 30, 2250066.
9. El-Shabasy, R.M.; Nayel, M.A.; Taher, M.M.; Abdelmonem, R.; Shoueir, K.R.; Kenawy, E.R. Three wave changes, new variant strains, and vaccination effect against COVID-19 pandemic. Int. J. Biol. Macromol. 2022, 204, 161–168.
10. Said, A.B.; Erradi, A.; Aly, H.A.; Mohamed, A. Predicting COVID-19 cases using bidirectional LSTM on multivariate time series. Environ. Sci. Pollut. Res. 2021, 28, 56043–56052.
11. Anand, S.; Mishra, D. Empirical Study and Comparison of Models via Multiclass Classification of COVID-19 Tweets using Natural Language Processing. Int. J. Mod. Dev. Eng. Sci. 2022, 1, 9–17.
12. Fang, Z.-G.; Yang, S.-Q.; Lv, C.-X.; An, S.-Y.; Wu, W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: A time-series study. BMJ Open 2022, 12, e056685.
13. Ndwandwe, D.; Wiysonge, C.S. COVID-19 vaccines. Curr. Opin. Immunol. 2021, 71, 111–116.
14. Glowacki, E.M.; Wilcox, G.B.; Glowacki, J.B. Identifying #addiction concerns on twitter during the COVID-19 pandemic: A text mining analysis. Subst. Abus. 2021, 42, 39–46.
15. Thorakkattle, M.N.; Farhin, S.; Khan, A.A. Forecasting the trends of COVID-19 and causal impact of vaccines using bayesian structural time series and ARIMA. Ann. Data Sci. 2022, 9, 1025–1047.
16. Corey, L.; Mascola, J.R.; Fauci, A.S.; Collins, F.S. A strategic approach to COVID-19 vaccine R&D. Science 2020, 368, 948–950.
17. Jabłońska, K.; Aballéa, S.; Toumi, M. The real-life impact of vaccination on COVID-19 mortality in Europe and Israel. Public Health 2021, 198, 230–237.
18. Walkey, A.J.; Law, A.; Bosch, N.A. Lottery-based incentive in Ohio and COVID-19 vaccination rates. JAMA 2021, 326, 766–767.
19. Salgotra, R.; Gandomi, M.; Gandomi, A.H. Time series analysis and forecast of the COVID-19 pandemic in India using genetic programming. Chaos Solitons Fractals 2020, 138, 109945.
20. Tandon, H.; Ranjan, P.; Chakraborty, T.; Suhag, V. Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future. J. Health Manag. 2020, 24, 373–388.
21. Mathieu, E.; Ritchie, H.; Ortiz-Ospina, E.; Roser, M.; Hasell, J.; Appel, C.; Giattino, C.; Rodés-Guirao, L. A global database of COVID-19 vaccinations. Nat. Hum. Behav. 2021, 5, 947–953.
22. Isah, A.; Shin, H.; Oh, S.; Oh, S.; Aliyu, I.; Um, T.-W.; Kim, J. Digital Twins Temporal Dependencies-Based on Time Series Using Multivariate Long Short-Term Memory. Electronics 2023, 12, 4187.
23. Zhong, B.; Huang, Y.; Liu, Q. Mental health toll from the coronavirus: Social media usage reveals Wuhan residents’ depression and secondary trauma in the COVID-19 outbreak. Comput. Hum. Behav. 2021, 114, 106524.
24. Aslim, E.G.; Fu, W.; Liu, C.-L.; Tekin, E. Vaccination Policy, Delayed Care, and Health Expenditures; National Bureau of Economic Research: Cambridge, MA, USA, 2022.
25. Chen, X.; Huang, H.; Ju, J.; Sun, R.; Zhang, J. Impact of vaccination on the COVID-19 pandemic in US states. Sci. Rep. 2022, 12, 1554.
26. Drummond, J.; Hasnine, M.S. Did the COVID-19 vaccine rollout impact transportation demand? A case study in New York City. J. Transp. Health 2023, 28, 101539.
27. Available online: https://www.kaggle.com/datasets/gpreda/all-COVID-19-vaccines-tweets (accessed on 13 June 2022).
28. Cryer, J.D. Time Series Analysis; Duxbury Press: Boston, MA, USA, 1986; Volume 286.
29. Nelson, B.K. Time series analysis using autoregressive integrated moving average (ARIMA) models. Acad. Emerg. Med. 1998, 5, 739–744.
30. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. A comparison of ARIMA and LSTM in forecasting time series. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 1394–1401.
31. Satrio, C.B.A.; Darmawan, W.; Nadia, B.U.; Hanafiah, N. Time series analysis and forecasting of coronavirus disease in Indonesia using ARIMA model and PROPHET. Procedia Comput. Sci. 2021, 179, 524–532.

Figure 1. Profile for COVID-19 vaccine analysis of the tweets.

Figure 2. Methodology Workflow.

Figure 3. Time Series Analysis.

Figure 4. Hybrid Harvest Model structure overview.

Figure 5. Trend of COVID-19 total vaccination worldwide.

Figure 6. Trend, seasonal and residual graph.

Figure 7. Seasonal graph.

Figure 8. ARIMA prediction graph.

Figure 9. LSTM model Loss graph.

Figure 10. LSTM model prediction graph.

Figure 11. Prophet model prediction graph.

Figure 12. Hybrid Harvest model prediction graph.

Figure 13. Model comparison graph.

Table 1. Comparative analysis.

Source	Models	Contribution
[23]	CERC and HBM	Social media usage in Wuhan during COVID-19
[24]	Instrumental variable (IV) regression model	Assumption and observation vaccination data
[25]	SIR (susceptible-infected-recovered) model	Vaccination predictions depend on model assumptions and parameters
[26]	ARIMA model	Looks at pre- and post-vaccine time periods
[21]	Interrupted time series analysis model	Time Series Analysis on vaccine uptake
Current Study	Hybrid Harvest Model	Analysis of different time series forecasting models for COVID-19 vaccination data trends. This study also fills a gap in the existing literature by providing new insights into the temporal patterns in COVID-19 vaccination data.

Table 2. Dataset features and description.

Features	Description
Date	The date when vaccinations were administered
Total_vaccinations	Total vaccinations administered at a given location and date.
People_vaccinated	This feature shows the total number of people vaccinated at a given location.
Daily_vaccinations	This feature represents the number of vaccinations administered on a daily basis
Location	This feature represents the location where the vaccinations were administered.

Table 3. LSTM model training loss and time.

Epochs	Time	Loss
Epoch1	2 s	0.8893
Epoch2	25 ms	0.8366
Epoch3	27 ms	0.7841
Epoch4	27 ms	0.7318
Epoch5	29 ms	0.6795
Epoch6	26 ms	0.6264
Epoch7	30 ms	0.5725
Epoch8	28 ms	0.5171
Epoch9	36 ms	0.4588
Epoch10	35 ms	0.3969
Epoch11	25 ms	0.3306
Epoch12	25 ms	0.2606
Epoch13	32 ms	0.1877
Epoch14	24 ms	0.1136
Epoch15	26 ms	0.0459
Epoch16	26 ms	0.0031
Epoch17	28 ms	0.0207
Epoch18	25 ms	0.0938
Epoch19	25 ms	0.1159
Epoch20	25 ms	0.0829

Table 4. Monthly model predictions compared with total vaccinations.

Month	Total Vaccinations	ARIMA Predictions	LSTM Predictions	Prophet Predictions	Hybrid Harvest Predictions
1 January 2022	0.788998	0.786037	1.151888	0.717512	0.700948
1 February 2022	0.772674	0.876809	1.451874	0.782367	0.844516
1 March 2022	0.897380	0.943666	1.914707	0.840946	0.920159
1 April 2022	0.888432	1.016778	2.673512	0.905802	1.000663
1 May 2022	0.945290	1.065485	4.027120	0.968565	0.999839
1 June 2022	0.925579	1.115655	6.738361	1.033420	0.998414
1 July 2022	0.960983	1.142993	12.814659	1.096183	0.921279
1 August 2022	0.983265	1.180686	26.756876	1.161039	0.875297
1 September 2022	0.966707	1.217538	57.782985	1.225894	0.826311
1 October 2022	1.000000	1.256825	128.419337	1.288657	0.791844
1 November 2022	0.974238	1.299761	293.872168	1.353513	0.764587
1 December 2022	0.220636	1.325104	692.579314	1.416276	0.680327

Table 5. RMSE and MSE results.

Models	RMSE Errors	MSE Errors
ARIMA	0.365928	0.133903
LSTM	220.882020	48,788.8668
Prophet	0.386722	0.149554
Hybrid Harvest	0.30488	0.132321

Table 6. Ablation study results (RMSE, MAE and MAPE).

Model	RMSE	MAE	MAPE
ARIMA	2.34	1.89	3.12%
Prophet	2.10	1.65	2.98%
ETS	2.45	1.95	3.25%
Hybrid (ARIMA + Prophet)	1.85	1.42	2.75%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
