# Using the SARIMA Model to Forecast the Fourth Global Wave of Cumulative Deaths from COVID-19: Evidence from 12 Hard-Hit Big Countries

## Abstract


## 1. Introduction

## 2. Brief Review of the Literature

## 3. Data

## 4. Methodology

#### 4.1. ARIMA and SARIMA Models

The SARIMA model is usually denoted as “ARIMA$(p,d,q)(P,D,Q)_{m}$”, where $m$ is the frequency of the data, and the lowercase and uppercase letters refer to the non-seasonal and seasonal components of the model, respectively.
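For reference, the multiplicative model behind this notation is conventionally written in the Box-Jenkins backshift-operator form shown below. It is written without a constant term. The symbols are the standard textbook ones and are introduced here as an assumption rather than taken from this section:

$$
\phi_p(B)\,\Phi_P(B^{m})\,(1-B)^{d}\,(1-B^{m})^{D}\,y_t = \theta_q(B)\,\Theta_Q(B^{m})\,\varepsilon_t
$$

Here $B$ is the backshift operator ($B y_t = y_{t-1}$). The non-seasonal and seasonal autoregressive polynomials are $\phi_p(B) = 1-\phi_1 B-\dots-\phi_p B^{p}$ and $\Phi_P(B^{m}) = 1-\Phi_1 B^{m}-\dots-\Phi_P B^{Pm}$. The corresponding moving-average polynomials are $\theta_q(B) = 1+\theta_1 B+\dots+\theta_q B^{q}$ and $\Theta_Q(B^{m}) = 1+\Theta_1 B^{m}+\dots+\Theta_Q B^{Qm}$. Finally, $\varepsilon_t$ is white noise.

With daily data and a weekly cycle, $m = 7$. For example, a $(0,2,1)(2,0,2)_{7}$ specification applies second-order non-seasonal differencing and no seasonal differencing. It combines one non-seasonal MA term with two seasonal AR and two seasonal MA terms.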

- First, I split the original dataset into training and test sets, ran the model on the training set, and compared its output with the target, i.e., the test set. In particular, the training set was used to predict the last 20 observations of the original dataset.<sup>1</sup> The best ARIMA and SARIMA<sup>2</sup> models were identified using the “auto.arima( )” function of the “forecast” package for R, developed by Hyndman and Khandakar (2008).<sup>3</sup> This function identifies the best model in sequential steps. It uses unit root tests to determine the non-seasonal and seasonal orders of differencing needed to make the time series stationary,<sup>4</sup> and it selects the model that minimizes the Akaike information criterion (AIC), with parameters estimated by maximum likelihood (MLE).<sup>5</sup> The training/test split was used to guard against overfitting and underfitting and to evaluate the overall performance of the model, i.e., its ability to predict unseen data. In addition, as suggested by Hyndman and Athanasopoulos (2021, sct. 5.2), I compared my preferred methods with three simple forecasting methods: the Mean, Naïve, and Seasonal Naïve approaches.<sup>6</sup> To assess the suitability of each model, I used the mean absolute percentage error (MAPE). MAPE is the most widely used error metric (Kim and Kim 2016; Hyndman and Athanasopoulos 2018, sct. 3.4). Because it is not scale-dependent, it is easily comparable across countries and gives an immediate approximation of the accuracy of the models.<sup>7</sup>
- Second, I forecasted the time window of specific interest, from 21 August 2021 to 19 September 2021. I compared the best ARIMA and SARIMA models in terms of AIC and four common accuracy measures: the mean absolute error (MAE), MAPE, the mean absolute scaled error (MASE), and the root mean squared error (RMSE). After identifying the best models with the “auto.arima( )” function, I fitted the SARIMA models with the Gretl 2021c software, using the exact MLE approach and computing the standard errors of the parameters from the Hessian matrix.
- Then, I examined the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the residuals for the first 14 lags to establish whether the residuals followed a white noise process. If there were signs of autocorrelation, I followed Hyndman and Athanasopoulos (2018, sct. 8.7): I graphically inspected the ACF and PACF of the original time series (after differencing) and added parameters until the residuals appeared to be randomly distributed. This iterative process was based on minimizing AIC and four common accuracy measures: MAE, MAPE, MASE, and RMSE.<sup>8</sup>
- Finally, I compared the 30-day forecasts, from 21 August 2021 to 19 September 2021, with the actual trends (real-time data) to assess the overall reliability of the models, using the MAPE between the two.
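The benchmark part of the first step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's workflow, which used R's “forecast” package. The function names are hypothetical and the ARIMA/SARIMA fit itself is not reproduced. The hold-out length h = 20 follows the text. The seasonal period m = 7 assumes daily data with a weekly cycle.

```python
def train_test_split(series, h=20):
    """Hold out the last h observations as the test set (h = 20 in the text)."""
    if h <= 0 or h >= len(series):
        raise ValueError("h must be between 1 and len(series) - 1")
    return series[:-h], series[-h:]


def mean_forecast(train, h):
    """Mean method: every future value equals the training-sample average."""
    mu = sum(train) / len(train)
    return [mu] * h


def naive_forecast(train, h):
    """Naive method: every future value equals the last training observation."""
    return [train[-1]] * h


def seasonal_naive_forecast(train, h, m=7):
    """Seasonal naive: repeat the last observed season (m = 7 assumes a weekly cycle)."""
    last_season = train[-m:]
    return [last_season[i % m] for i in range(h)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


if __name__ == "__main__":
    # Hypothetical cumulative series with a weekly pattern (not the paper's data).
    series = [1000 + 12 * t + 3 * (t % 7) for t in range(120)]
    train, test = train_test_split(series, h=20)
    benchmarks = {
        "Mean": mean_forecast(train, len(test)),
        "Naive": naive_forecast(train, len(test)),
        "Seasonal naive": seasonal_naive_forecast(train, len(test)),
    }
    for name, fc in benchmarks.items():
        print(f"{name}: MAPE = {mape(test, fc):.2f}%")
```

A candidate ARIMA or SARIMA forecast of the same 20 test points would be scored with the same `mape` call. In the spirit of the benchmark comparison, it would be worth retaining only if its MAPE is lower than all three benchmark values.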
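The residual check in the third step can also be approximated numerically. The pure-Python sketch below uses hypothetical helper names. It computes the sample ACF of a residual series for the first 14 lags, computes the PACF via the Durbin-Levinson recursion, and flags lags that fall outside the approximate 95% bounds $\pm 1.96/\sqrt{n}$ usually drawn on ACF/PACF plots. It is a simple stand-in for the graphical inspection described above, not a formal portmanteau test.

```python
import math
import random


def acf(x, max_lag=14):
    """Sample autocorrelations r_1, ..., r_max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def pacf(x, max_lag=14):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)  # r[k - 1] holds the lag-k autocorrelation
    phi_prev = []        # phi_{k-1, 1..k-1} from the previous iteration
    out = []
    for k in range(1, max_lag + 1):
        num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR coefficients of order k from those of order k - 1.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def suspicious_lags(residuals, max_lag=14):
    """Lags whose ACF or PACF lies outside the approximate 95% bounds +-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return {
        "acf": [k for k, v in enumerate(acf(residuals, max_lag), start=1) if abs(v) > bound],
        "pacf": [k for k, v in enumerate(pacf(residuals, max_lag), start=1) if abs(v) > bound],
    }


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical residuals drawn as Gaussian noise; by chance alone,
    # roughly 5% of the checked lags may still fall outside the bounds.
    noise = [random.gauss(0.0, 1.0) for _ in range(500)]
    print(suspicious_lags(noise))
```

If many low-order lags are flagged, this would correspond to the situation in which extra AR or MA terms are added and the model is re-estimated.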

#### 4.2. Evaluation Metrics
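The accuracy measures named in this paper are conventionally defined as follows. The definitions follow Hyndman and Koehler (2006) and, for MAAPE, Kim and Kim (2016); the notation is introduced here as an assumption. Let $y_1, \dots, y_T$ be the training data, $\hat{y}_t$ the forecast for $t = T+1, \dots, T+h$, and $e_t = y_t - \hat{y}_t$ the forecast error:

$$
\begin{aligned}
\text{MAE} &= \frac{1}{h}\sum_{t=T+1}^{T+h} |e_t|, \qquad
\text{RMSE} = \sqrt{\frac{1}{h}\sum_{t=T+1}^{T+h} e_t^{2}},\\
\text{MAPE} &= \frac{100}{h}\sum_{t=T+1}^{T+h} \left|\frac{e_t}{y_t}\right|, \qquad
\text{MAAPE} = \frac{1}{h}\sum_{t=T+1}^{T+h} \arctan\left|\frac{e_t}{y_t}\right|,\\
\text{MASE} &= \frac{\text{MAE}}{\dfrac{1}{T-s}\displaystyle\sum_{t=s+1}^{T} |y_t - y_{t-s}|}.
\end{aligned}
$$

With $s = 1$, the MASE denominator is the in-sample MAE of the one-step naïve method. With $s = m$, it is the in-sample MAE of the seasonal naïve method. A MASE below 1 therefore indicates forecasts that are, on average, more accurate than the corresponding naïve forecasts computed on the training data. The choice of $s$ shown here is an assumption. Under this sign convention, a negative error ($e_t < 0$) means the forecast exceeds the actual value.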

## 5. Results and Discussion

…$_{7}$, Bangladesh $(3,1,3)(1,1,2)_{7}$, Brazil $(1,1,8)(0,1,1)_{7}$, India $(0,2,1)(2,0,2)_{7}$, Iran $(6,2,2)(2,0,1)_{7}$, Mexico $(0,2,1)(4,0,0)_{7}$, the Philippines $(6,2,4)(3,0,4)_{7}$, Russia $(4,2,4)(4,0,3)_{7}$, South Africa $(5,1,8)(4,1,4)_{7}$, Thailand $(4,2,10)(4,0,2)_{7}$, the US $(6,1,1)(0,1,1)_{7}$, and Vietnam $(5,2,4)(0,0,1)_{7}$.

## 6. Conclusions

## Supplementary Materials

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Notes

1 | As suggested by Hyndman and Athanasopoulos (2021, sct. 5.8), in the first stage it is crucial to ensure that models perform well on data that were not used to fit them, and splitting the original dataset into two different subsets is a very common way to do this. I chose 20 observations for the test set because my predictive analysis focused on the medium term. |

2 | |

3 | The “auto.arima( )” function is discussed in detail in Hyndman and Athanasopoulos (2018, sct. 8.7). |

4 | Specifically, by default the function uses repeated Kwiatkowski–Phillips–Schmidt–Shin (KPSS) tests (Kwiatkowski et al. 1992) to determine the appropriate non-seasonal order of differencing. As noted by Hyndman (2014), this is generally more accurate than the two alternative tests, the augmented Dickey–Fuller (ADF) test (Dickey and Fuller 1979) and the Phillips–Perron (PP) test (Phillips and Perron 1988). To identify the appropriate seasonal order of differencing, the algorithm uses, by default, the “seas” test, which is based on a measure of seasonal strength developed by Wang et al. (2006). |

5 | For the ARIMA models, I used the following script: `auto.arima(training_data, stationary = FALSE, seasonal = FALSE, ic = c("aic"), stepwise = FALSE, nmodels = 1000, approximation = FALSE, test = c("kpss"))`. For the SARIMA models, I used the following script (shown here for Argentina): `auto.arima(train_argentina, stationary = FALSE, seasonal = TRUE, ic = c("aic"), stepwise = FALSE, nmodels = 1000, approximation = FALSE, test = c("kpss"), seasonal.test = c("seas"))`. The same procedure was also applied to forecast the window of interest (from 21 August 2021 to 19 September 2021). |

6 | They were used as benchmarks, i.e., to ensure that ARIMA/SARIMA models were better than simple alternatives and, thus, worthy of being considered. |

7 | In this regard, it should be stressed that MAPE also has some disadvantages. It gives infinite or undefined results when one or more actual values equal zero, and extreme results when actual values are close to zero. Moreover, it puts a heavier penalty on negative errors (i.e., when predicted values are higher than actual values) than on positive errors. In such cases, the mean arctangent absolute percentage error (MAAPE) proposed by Kim and Kim (2016) could be used instead. However, since it did not change the results of this paper, I preferred not to include it in the analysis. The MAAPE results are available upon request. |

8 | The “auto.arima( )” function does not check the properties of the residuals. Thus, the residuals may not follow a white noise process, in which case a manual adjustment is required (Hyndman and Athanasopoulos 2018, sct. 8.7). |

9 | The drift is omitted because all the models reported in Table 4 had a second difference operator (Hyndman and Athanasopoulos 2018, sct. 8.7). Moreover, a drift in first differences would imply the presence of a linear trend in levels, and that did not seem likely (Figure 1 and Figure 2). |

10 | I.e., the order of differencing needed to achieve stationarity. |

11 | In this regard, several studies have shown the importance of demographic, environmental, healthcare, and lockdown policy factors in explaining COVID-19 deaths (Conyon et al. 2020; Sarkodie and Owusu 2020; Perone 2021a). |

12 | In Table S1 (Supplementary Materials S2), I compared the SARIMA models obtained using the “auto.arima( )” function with the adjusted SARIMA models in terms of AIC and four error measures (MAE, MAPE, MASE, and RMSE). The results showed that the latter outperformed the models obtained using the “auto.arima( )” function in 35 out of 40 metrics, i.e., on 87.5% of all the forecast accuracy measures. The outcomes were less clear-cut for Vietnam; however, the AIC, ACF, and PACF clearly favored the adjusted SARIMA model. |

13 | The parameter values of the best SARIMA models are reported in Table S2 (Supplementary Materials S3). |

14 | Only the SARIMA model for the Philippines exhibited a MASE close to 1 (0.9385). However, since it was lower than 1, the SARIMA model still performed better than the naïve method. |

15 | It should be stressed that the SARIMA model for Vietnam also tended to overestimate the real trend. However, the MAPE between forecasted and observed data (after 30 days) was considerably lower for Vietnam (4.21%) than for Thailand (10.69%). Thus, it does not appear to be a matter of serious concern. |

## References

- Adebiyi, Ariyo A., Aderemi O. Adewumi, and Charles K. Ayo. 2014. Comparison of ARIMA and artificial neural networks models for stock price prediction. Journal of Applied Mathematics 2014: 614342.
- Ahmad, Amir, Sunita Garhwal, Santosh K. Ray, Gagan Kumar, Sharaf J. Malebary, and Omar M. Barukab. 2021. The number of confirmed cases of covid-19 by using machine learning: Methods and challenges. Archives of Computational Methods in Engineering 28: 2645–53.
- Ala’raj, Maher, Munir Majdalawieh, and Nishara Nizamuddin. 2021. Modeling and forecasting of COVID-19 using a hybrid dynamic model based on SEIRD with ARIMA corrections. Infectious Disease Modelling 6: 98–111.
- Alabdulrazzaq, Haneen, Mohammed N. Alenezi, Yasmeen Rawajfih, Bareeq A. Alghannam, Abeer A. Al-Hassan, and Fawaz S. Al-Anzi. 2021. On the accuracy of ARIMA based prediction of COVID-19 spread. Results in Physics 27: 104509.
- Al-Turaiki, Isra, Fahad Almutlaq, Hend Alrasheed, and Norah Alballa. 2021. Empirical evaluation of alternative time-series models for covid-19 forecasting in Saudi Arabia. International Journal of Environmental Research and Public Health 18: 8660.
- Alzahrani, Saleh I., Ibrahim A. Aljamaan, and Ebrahim A. Al-Fakih. 2020. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using ARIMA prediction model under current public health interventions. Journal of Infection and Public Health 13: 914–19.
- Annas, Suwardi, Muh I. Pratama, Muh Rifandi, Wahidah Sanusi, and Syafruddin Side. 2020. Stability analysis and numerical simulation of SEIR model for pandemic COVID-19 spread in Indonesia. Chaos, Solitons & Fractals 139: 110072.
- Ardabili, Sina F., Amir Mosavi, Pedram Ghamisi, Filip Ferdinand, Annamaria R. Varkonyi-Koczy, Uwe Reuter, Timon Rabczuk, and Peter M. Atkinson. 2020. Covid-19 outbreak prediction with machine learning. Algorithms 13: 249.
- ArunKumar, K. E., Dinesh V. Kalaga, Ch. Mohan S. Kumar, Govinda Chilkoor, Masahiro Kawaji, and Timothy M. Brenza. 2021. Forecasting the dynamics of cumulative COVID-19 cases (confirmed, recovered and deaths) for top-16 countries using statistical machine learning models: Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal Auto-Regressive Integrated Moving Average (SARIMA). Applied Soft Computing 103: 107161.
- Barnett, Adrian G., and Annette J. Dobson. 2010. Analysing Seasonal Health Data. Berlin: Springer.
- Box, George E. P., and George C. Tiao. 1975. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 70: 70–79.
- Box, George E. P., and Gwilym M. Jenkins. 1976. Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day.
- Box, George E. P., Gwilym M. Jenkins, and Gregory C. Reinsel. 1994. Time Series Analysis: Forecasting and Control, 3rd ed. Englewood Cliffs: Prentice Hall.
- Cao, Long-Ting, Hong-Hui Liu, Juan Li, Xiao-Dong Yin, Yu Duan, and Jing Wang. 2020. Relationship of meteorological factors and human brucellosis in Hebei province, China. Science of the Total Environment 703: 135491.
- Carcione, José M., Juan E. Santos, Claudio Bagaini, and Jing Ba. 2020. A simulation of a COVID-19 epidemic based on a deterministic SEIR model. Frontiers in Public Health 8: 230.
- Castillo Ossa, Luis F., Pablo Chamoso, Jeferson Arango-López, Francisco Pinto-Santos, Gustavo A. Isaza, Cristina Santa-Cruz-González, Alejandro Ceballos-Marquez, Guillermo Hernández, and Juan M. Corchado. 2021. A Hybrid Model for COVID-19 Monitoring and Prediction. Electronics 10: 799.
- Centers for Disease Control and Prevention [CDC]. 2021. What You Need to Know about Variants. Updated on 6 August 2021. Available online: https://www.cdc.gov/coronavirus/2019-ncov/variants/variant.html (accessed on 23 August 2021).
- Ceylan, Zeynep. 2020. Estimation of COVID-19 prevalence in Italy, Spain, and France. Science of the Total Environment 729: 138817.
- Chatfield, Chris. 2000. Time-Series Forecasting, 1st ed. Boca Raton: Chapman and Hall/CRC.
- Chintalapudi, Nalini, Gopi Battineni, and Francesco Amenta. 2020. COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: A data driven model approach. Journal of Microbiology, Immunology and Infection 53: 396–403.
- Chung, Roy C., Andrew W. H. Ip, and Sian L. Chan. 2009. An ARIMA-intervention analysis model for the financial crisis in China’s manufacturing industry. International Journal of Engineering Business Management 1: 15–18.
- Clarke, Bertrand S., and Jennifer L. Clarke. 2018. Predictive Statistics: Analysis and Inference Beyond Models. Cambridge: Cambridge University Press, vol. 46.
- Cong, Jing, Mengmeng Ren, Shuyang Xie, and Pingyu Wang. 2019. Predicting Seasonal Influenza Based on SARIMA Model, in Mainland China from 2005 to 2018. International Journal of Environmental Research and Public Health 16: 4760.
- Conyon, Martin J., Lerong He, and Steen Thomsen. 2020. Lockdowns and COVID-19 Deaths in Scandinavia. Covid Economics 26: 17–42.
- Davidson, James. 2000. Econometric Theory. Hoboken: Wiley Blackwell, p. 528.
- Dickey, David A., and Wayne A. Fuller. 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–31.
- Earnest, Arul, Mark I. Chen, Donald Ng, and Leo Y. Sin. 2005. Using autoregressive integrated moving average (ARIMA) models to predict and monitor the number of beds occupied during a SARS outbreak in a tertiary hospital in Singapore. BMC Health Services Research 5: 1–8.
- Engbert, Ralf, Maximilian M. Rabe, Reinhold Kliegl, and Sebastian Reich. 2021. Sequential data assimilation of the stochastic SEIR epidemic model for regional COVID-19 dynamics. Bulletin of Mathematical Biology 83: 1–16.
- Gaudart, Jean, Ousmane Touré, Nadine Dessay, Alassane Dicko, Stéphane Ranque, Loic Forest, Jacques Demongeot, and Ogobara K. Doumbo. 2009. Modelling malaria incidence with environmental dependency in a locality of Sudanese savannah area, Mali. Malaria Journal 8: 1–12.
- Hasan, Najmul. 2020. A methodological approach for predicting COVID-19 epidemic using EEMD-ANN hybrid model. Internet of Things 11: 100228.
- He, Zhirui, and Hongbing Tao. 2018. Epidemiology and ARIMA model of positive-rate of influenza viruses among children in Wuhan, China: A nine-year retrospective study. International Journal of Infectious Diseases 74: 61–70.
- Hossain, Mohammad S., Mahbubul H. Siddiqee, Umme R. Siddiqi, Enayetur Raheem, Rokeya Akter, and Wenbiao Hu. 2020. Dengue in a crowded megacity: Lessons learnt from 2019 outbreak in Dhaka, Bangladesh. PLoS Neglected Tropical Diseases 14: e0008349.
- Hyndman, Rob J. 2013. Forecasting with Daily Data. 13 September 2013. Available online: https://robjhyndman.com/hyndsight/dailydata/ (accessed on 5 October 2021).
- Hyndman, Rob J. 2014. Unit Root Tests and ARIMA Models. 12 March 2014. Available online: https://robjhyndman.com/hyndsight/unit-root-tests/ (accessed on 5 October 2021).
- Hyndman, Rob J., and Yeasmin Khandakar. 2008. Automatic time series forecasting: The forecast package for R. Journal of Statistical Software 27: 1–22.
- Hyndman, Rob J., and Anne B. Koehler. 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22: 679–88.
- Hyndman, Rob J., and George Athanasopoulos. 2018. Forecasting: Principles and Practice, 2nd ed. Melbourne: Monash University, Available online: https://otexts.com/fpp2/ (accessed on 10 August 2021).
- Hyndman, Rob J., and George Athanasopoulos. 2021. Forecasting: Principles and Practice, 3rd ed. Melbourne: Monash University, Available online: https://otexts.com/fpp3/ (accessed on 12 March 2022).
- Kane, Michael J., Natalie Price, Matthew Scotch, and Peter Rabinowitz. 2014. Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinformatics 15: 276.
- Katoch, Rupinder, and Arpit Sidhu. 2021. An Application of ARIMA Model to Forecast the Dynamics of COVID-19 Epidemic in India. Global Business Review.
- Khan, Farhan M., and Rajiv Gupta. 2020. ARIMA and NAR based prediction model for time series analysis of COVID-19 cases in India. Journal of Safety Science and Resilience 1: 12–18.
- Kim, Sungil, and Heeyoung Kim. 2016. A new metric of absolute percentage error for intermittent demand forecasts. International Journal of Forecasting 32: 669–79.
- Korolev, Ivan. 2021. Identification and estimation of the SEIRD epidemic model for COVID-19. Journal of Econometrics 220: 63–85.
- Kufel, Tadeusz. 2020. ARIMA-based forecasting of the dynamics of confirmed Covid-19 cases for selected European countries. Equilibrium. Quarterly Journal of Economics and Economic Policy 15: 181–204.
- Kwekha-Rashid, Ameer S., Heamn N. Abduljabbar, and Bilal Alhayani. 2021. Coronavirus disease (COVID-19) cases analysis using machine-learning applications. Applied Nanoscience, 1–13.
- Kwiatkowski, Denis, Peter C. B. Phillips, Peter Schmidt, and Yongcheol Shin. 1992. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics 54: 159–78.
- Lewis, Colin D. 1982. Industrial and Business Forecasting Methods: A Practical Guide to Exponential Smoothing and Curve Fitting. Boston and London: Butterworth Scientific.
- Li, Jizhen, Yuhong Li, Ming Ye, Sanqiao Yao, Chongchong Yu, Lei Wang, Weidong Wu, and Yongbin Wang. 2021. Forecasting the Tuberculosis Incidence Using a Novel Ensemble Empirical Mode Decomposition-Based Data-Driven Hybrid Model in Tibet, China. Infection and Drug Resistance 14: 1941.
- Li, Qi, Na-Na Guo, Zhan-Ying Han, Yan-Bo Zhang, Shun-Xiang Qi, Yong-Gang Xu, Ya-Mei Wei, Xu Han, and Ying-Ying Liu. 2012. Application of an autoregressive integrated moving average model for predicting the incidence of hemorrhagic fever with renal syndrome. The American Journal of Tropical Medicine and Hygiene 87: 364.
- Liu, X., Z. Lin, and Z. Feng. 2021. Short-term offshore wind speed forecast by seasonal ARIMA-A comparison against GRU and LSTM. Energy 227: 120492.
- Liu, Lei, R. S. Luan, F. Yin, X. P. Zhu, and Q. Lü. 2016. Predicting the incidence of hand, foot and mouth disease in Sichuan province, China using the ARIMA model. Epidemiology & Infection 144: 144–51.
- Liu, Qiyong, Xiaodong Liu, Baofa Jiang, and Weizhong Yang. 2011. Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model. BMC Infectious Diseases 11: 218.
- Malki, Zohair, El-Sayed Atlam, Ashraf Ewis, Guesh Dagnew, Ahmad R. Alzighaibi, Ghada ELmarhomy, Mostafa A. Elhosseini, Aboul E. Hassanien, and Ibrahim Gad. 2021. ARIMA models for predicting the end of COVID-19 pandemic and the risk of second rebound. Neural Computing and Applications 33: 2929–48.
- McCleary, Richard, Richard A. Hay, Errol E. Meidinger, and David McDowall. 1980. Applied Time Series Analysis for the Social Sciences. Beverly Hills: Sage Publications.
- Our World in Data. 2021. Our World in Data COVID-19 Dataset. Available online: https://ourworldindata.org/coronavirus (accessed on 25 September 2021).
- Pack, David J. 1990. In defense of ARIMA modeling. International Journal of Forecasting 6: 211–18.
- Perone, Gaetano. 2020. An ARIMA Model to Forecast the Spread and the Final Size of COVID-2019 Epidemic in Italy. No. 20/07. HEDG-Health Econometrics and Data Group Working Paper Series. York: University of York.
- Perone, Gaetano. 2021a. The determinants of COVID-19 case fatality rate (CFR) in the Italian regions and provinces: An analysis of environmental, demographic, and healthcare factors. Science of the Total Environment 755: 142523.
- Perone, Gaetano. 2021b. Comparison of ARIMA, ETS, NNAR, TBATS and hybrid models to forecast the second wave of COVID-19 hospitalizations in Italy. The European Journal of Health Economics, 1–24.
- Phillips, Peter C., and Pierre Perron. 1988. Testing for a unit root in time series regression. Biometrika 75: 335–46.
- Pinter, Gergo, Imre Felde, Amir Mosavi, Pedram Ghamisi, and Richard Gloaguen. 2020. COVID-19 pandemic prediction for Hungary; a hybrid machine learning approach. Mathematics 8: 890.
- Piovella, Nicola. 2020. Analytical solution of SEIR model describing the free spread of the COVID-19 pandemic. Chaos, Solitons & Fractals 140: 110243.
- Polwiang, Sittisede. 2020. The time series seasonal patterns of dengue fever and associated weather variables in Bangkok (2003–2017). BMC Infectious Diseases 20: 1–10.
- Qiu, Hongfang, Han Zhao, Haiyan Xiang, Rong Ou, Jing Yi, Ling Hu, Hua Zhu, and Mengliang Ye. 2021. Forecasting the incidence of mumps in Chongqing based on a SARIMA model. BMC Public Health 21: 1–12.
- Ren, Hong, Jian Li, Zheng-An Yuan, Jia-Yu Hu, Yan Yu, and Yi-Han Lu. 2013. The development of a combined mathematical model to forecast the incidence of hepatitis E in Shanghai, China. BMC Infectious Diseases 13: 421.
- Roy, Santanu, Gouri S. Bhunia, and Pravat K. Shit. 2021. Spatial prediction of COVID-19 epidemic using ARIMA techniques in India. Modeling Earth Systems and Environment 7: 1385–91.
- Safi, Samir K., and Olajide I. Sanusi. 2021. A hybrid of artificial neural network, exponential smoothing, and ARIMA models for COVID-19 time series forecasting. Model Assisted Statistics and Applications 16: 25–35.
- Sahai, Alok K., Namita Rath, Vishal Sood, and Manvendra P. Singh. 2020. ARIMA modelling & forecasting of COVID-19 in top five affected countries. Diabetes & Metabolic Syndrome: Clinical Research & Reviews 14: 1419–27.
- Sarkodie, Samuel A., and Phebe A. Owusu. 2020. Impact of meteorological factors on COVID-19 pandemic: Evidence from top 20 countries with confirmed cases. Environmental Research 191: 110101.
- Satpathy, Suneeta, Monika Mangla, Nonita Sharma, Hardik Deshmukh, and Sachinandan Mohanty. 2021. Predicting mortality rate and associated risks in COVID-19 patients. Spatial Information Research 29: 455–464.
- Satrio, Christophorus B. A., William Darmawan, Bellatasya U. Nadia, and Novita Hanafiah. 2021. Time series analysis and forecasting of coronavirus disease in Indonesia using ARIMA model and PROPHET. Procedia Computer Science 179: 524–32.
- Sen, Parag, Mousumi Roy, and Parimal Pal. 2016. Application of ARIMA for forecasting energy consumption and GHG emission: A case study of an Indian pig iron manufacturing organization. Energy 116: 1031–38.
- Singh, Sarbjit, Kulwinder S. Parmar, Jatinder Kumar, and Sidhu J. S. Makkhan. 2020. Development of new hybrid model of discrete wavelet decomposition and autoregressive integrated moving average (ARIMA) models in application to one month forecast the casualties cases of COVID-19. Chaos, Solitons & Fractals 135: 109866.
- Sujatha, R., Jyotir M. Chatterjee, and Aboul E. Hassanien. 2020. A machine learning forecasting model for COVID-19 pandemic in India. Stochastic Environmental Research and Risk Assessment 34: 959–72.
- Talkhi, Nasrin, Narges A. Fatemi, Zahra Ataei, and Mehdi J. Nooghabi. 2021. Modeling and forecasting number of confirmed and death caused COVID-19 in IRAN: A comparison of time series forecasting methods. Biomedical Signal Processing and Control 66: 102494.
- Tuli, Shreshth, Shikhar Tuli, Rakesh Tuli, and Sukhpal S. Gill. 2020. Predicting the growth and trend of COVID-19 pandemic using machine learning and cloud computing. Internet of Things 11: 100222.
- Tran, Thai T., Thanh-Luu Pham, and Ngo X. Quang. 2020. Forecasting epidemic spread of SARS-CoV-2 using ARIMA model (Case study: Iran). Global Journal of Environmental Science and Management 6: 1–10.
- Valipour, Mohammad. 2015. Long-term runoff study using SARIMA and ARIMA models in the United States. Meteorological Applications 22: 592–98.
- Viguerie, Alex, Guillermo Lorenzo, Ferdinando Auricchio, Davide Baroli, Thomas J. Hughes, Alessia Patton, Alessandro Reali, Thomas E. Yankeelov, and Alessandro Veneziani. 2021. Simulating the spread of COVID-19 via a spatially-resolved susceptible–exposed–infected–recovered–deceased (SEIRD) model with heterogeneous diffusion. Applied Mathematics Letters 111: 106617.
- Wang, Lulu, Chen Liang, Wei Wu, Shengwen Wu, Jinghua Yang, Xiaobo Lu, Yuan Cai, and Cuihong Jin. 2019. Epidemic Situation of Brucellosis in Jinzhou City of China and Prediction Using the ARIMA Model. Canadian Journal of Infectious Diseases and Medical Microbiology 2019: 1429462.
- Wang, Peipei, Xinqi Zheng, Jiayang Li, and Bangren Zhu. 2020. Prediction of epidemic trends in COVID-19 with logistic model and machine learning technics. Chaos, Solitons & Fractals 139: 110058.
- Wang, Xiaozhe, Kate A. Smith, and Rob J. Hyndman. 2006. Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery 13: 335–64.
- Wang, Ya-Wen, Zhong-Zhou Shen, and Yu Jiang. 2018. Comparison of ARIMA and GM (1, 1) models for prediction of hepatitis B in China. PLoS ONE 13: e0201987.
- Wei, Wudi, Junjun Jiang, Hao Liang, Lian Gao, Bingyu Liang, Jiegang Huang, Ning Zang, Yanyan Liao, Jun Yu, Jingzhen Lai, and et al. 2016. Application of a Combined Model with Autoregressive Integrated Moving Average (ARIMA) and Generalized Regression Neural Network (GRNN) in Forecasting Hepatitis Incidence in Heng County, China. PLoS ONE 11: e0156768.
- World Bank. 2021. World Bank Open Data. Available online: https://data.worldbank.org (accessed on 30 August 2021).
- Worldometer. 2021. Available online: https://www.worldometers.info/coronavirus/ (accessed on 30 August 2021).
- Xu, Qinqin, Runzi Li, Yafei Liu, Cheng Luo, Aiqiang Xu, Fuzhong Xue, Qing Xu, and Xiujun Li. 2017. Forecasting the incidence of mumps in Zibo City based on a SARIMA model. International Journal of Environmental Research and Public Health 14: 925.
- Yousaf, Muhammad, Samiha Zahir, Muhammad Riaz, Sardar M. Hussain, and Kamal Shah. 2020. Statistical analysis of forecasting COVID-19 for upcoming month in Pakistan. Chaos, Solitons & Fractals 138: 109926.
- Zeng, Qianglin, Dandan Li, Gui Huang, Jin Xia, Xiaoming Wang, Yamei Zhang, Wanping Tang, and Hui Zhou. 2016. Time series analysis of temporal trends in the pertussis incidence in Mainland China from 2005 to 2016. Scientific Reports 6: 1–8.
- Zhang, Lanyi, Jane Lin, Rongzu Qiu, Xisheng Hu, Huihui Zhang, Qingyao Chen, Huamei Tan, Danting Lin, and Jiankai Wang. 2018. Trend analysis and forecast of PM2.5 in Fuzhou, China using the ARIMA model. Ecological Indicators 95: 702–10.
- Zheng, Nanning, Shaoyi Du, Jianji Wang, He Zhang, Wenting Cui, Zijian Kang, Tao Yang, Bin Lou, Yuting Chi, Hong Long, and et al. 2020. Predicting COVID-19 in China using hybrid AI model. IEEE Transactions on Cybernetics 50: 2891–904.
- Zheng, Yan-Ling, Li-Ping Zhang, Xue-Liang Zhang, Kai Wang, and Yu-Jian Zheng. 2015. Forecast model analysis for the morbidity of tuberculosis in Xinjiang, China. PLoS ONE 10: e0116832.

**Figure 1.** Cumulative deaths from COVID-19 for the 12 selected countries from 19 February 2020 to 20 August 2021. Source: Our World in Data (2021).

**Figure 2.** Daily deaths from COVID-19 for the 12 selected countries from 19 February 2020 to 20 August 2021. Source: Our World in Data (2021).

**Figure 3.** The number of deaths from COVID-19 per 100,000 inhabitants in 12 hard-hit big countries from 19 February 2020 to 20 August 2021. Source: Author’s elaborations on Our World in Data (2021) and World Bank (2021).

**Figure 4.** Nine sequential steps to identify and evaluate the best forecasting models for cumulative deaths from COVID-19.

**Figure 5.** ARIMA forecasting models built on the training set over the period 1 August 2021–20 August 2021, in Argentina, Bangladesh, Brazil, India, Iran, Mexico, the Philippines, Russia, South Africa, Thailand, the US, and Vietnam.

**Figure 6.** SARIMA forecasts on the training set over the period 1 August 2021–20 August 2021, in Argentina, Bangladesh, Brazil, India, Iran, Mexico, the Philippines, Russia, South Africa, Thailand, the US, and Vietnam.

**Figure 7.** ACF and PACF plots of the residuals of the best SARIMA models (reported in Table 5).

**Figure 8.** SARIMA models for forecasting the dynamics of cumulative deaths from COVID-19 over the period 21 August 2021–19 September 2021, in Argentina, Bangladesh, Brazil, India, Iran, and Mexico.

**Figure 9.** SARIMA models for forecasting the dynamics of cumulative deaths from COVID-19 over the period 21 August 2021–19 September 2021, in the Philippines, Russia, South Africa, Thailand, the US, and Vietnam.

**Figure 10.** Comparison between forecasts and real data during the period 21 August 2021–19 September 2021, for cumulative deaths from COVID-19, in Argentina, Bangladesh, Brazil, India, Iran, and Mexico.

**Figure 11.** Comparison between forecasts and real data during the period 21 August 2021–19 September 2021, for cumulative deaths from COVID-19, in the Philippines, Russia, South Africa, Thailand, the US, and Vietnam.

**Table 1.** Thirty-two selected studies on infectious disease forecasting that used non-seasonal and seasonal ARIMA models.

Authors | Disease | Methodological Approach | Investigated Area |
---|---|---|---|
Earnest et al. (2005) | SARS | ARIMA | Singapore |
Gaudart et al. (2009) | Malaria | ARIMA, SIRS | Mali |
Liu et al. (2011) | HFRS | ARIMA | China |
Li et al. (2012) | HFRS | SARIMA | China |
Ren et al. (2013) | Hepatitis E | ARIMA, BPNN | Shanghai, China |
Kane et al. (2014) | H5N1 | ARIMA and Random Forest | Egypt |
Zheng et al. (2015) | Tuberculosis | SARIMA | Xinjiang, China |
Wei et al. (2016) | Hepatitis A | SARIMA, GRNN, and SARIMA-GRNN | Heng County, China |
Zeng et al. (2016) | Pertussis | SARIMA, ETS | China |
Xu et al. (2017) | Mumps | SARIMA | Zibo, China |
He and Tao (2018) | Influenza | ARIMA | Wuhan, China |
Wang et al. (2018) | Hepatitis B | SARIMA, GM (1,1) | China |
Cong et al. (2019) | Influenza | SARIMA | Mainland China |
Wang et al. (2019) | Human Brucellosis | ARIMA | Jinzhou, China |
Alzahrani et al. (2020) | COVID-19 | ARIMA | Saudi Arabia |
Cao et al. (2020) | Human Brucellosis | SARIMA | Hebei, China |
Ceylan (2020) | COVID-19 | ARIMA | France, Italy, Spain |
Chintalapudi et al. (2020) | COVID-19 | ARIMA | Italy |
Hossain et al. (2020) | Dengue fever | ARIMA | Dhaka, Bangladesh |
Perone (2020) | COVID-19 | ARIMA | Italy |
Polwiang (2020) | Dengue fever | ANN, ARIMA, MPR | Bangkok, Thailand |
Singh et al. (2020) | COVID-19 | ARIMA | 15 countries |
Tran et al. (2020) | COVID-19 | ARIMA | Iran |
Yousaf et al. (2020) | COVID-19 | ARIMA | Pakistan |
Ala’raj et al. (2021) | COVID-19 | SEIRD-ARIMA | US |
ArunKumar et al. (2021) | COVID-19 | ARIMA and SARIMA | 16 countries |
Li et al. (2021) | Tuberculosis | EEMD-ARIMA-NANN | Tibet |
Malki et al. (2021) | COVID-19 | SARIMA | 20 countries |
Perone (2021b) | COVID-19 | ETS, NARNN, SARIMA, TBATS, and hybrid models | Italy |
Qiu et al. (2021) | Mumps | SARIMA | Chongqing, China |
Roy et al. (2021) | COVID-19 | ARIMA | India |
Satrio et al. (2021) | COVID-19 | ARIMA and PROPHET | Indonesia |

Countries | Start Date | End Date | Observations |
---|---|---|---|
Argentina | 8 March 2020 | 20 August 2021 | 531 |
Bangladesh | 18 March 2020 | 20 August 2021 | 521 |
Brazil | 17 March 2020 | 20 August 2021 | 522 |
India | 11 March 2020 | 20 August 2021 | 528 |
Iran | 19 February 2020 | 20 August 2021 | 549 |
Mexico | 19 March 2020 | 20 August 2021 | 520 |
Philippines | 11 March 2020 | 20 August 2021 | 528 |
Russia | 19 March 2020 | 20 August 2021 | 520 |
South Africa | 27 March 2020 | 20 August 2021 | 512 |
Thailand | 23 March 2020 | 20 August 2021 | 516 |
US | 29 February 2020 | 20 August 2021 | 539 |
Vietnam | 31 July 2020 | 20 August 2021 | 386 |
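The observation counts in the table above match the number of days from each country's start date to 20 August 2021, counting both endpoints (one observation per day). A small Python check using only the standard `datetime` module, shown for a few rows; the helper name is hypothetical:

```python
from datetime import date

END_DATE = date(2021, 8, 20)  # common end date of all series in the table


def n_daily_observations(start: date, end: date = END_DATE) -> int:
    """Count daily observations between two dates, both endpoints included."""
    return (end - start).days + 1


starts = {
    "Argentina": date(2020, 3, 8),
    "Iran": date(2020, 2, 19),
    "US": date(2020, 2, 29),
    "Vietnam": date(2020, 7, 31),
}
for country, start in starts.items():
    print(country, n_daily_observations(start))
# Argentina 531, Iran 549, US 539, Vietnam 386
```

The `+ 1` matters: a plain date difference counts intervals between days, not the days themselves.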

**Table 3.** Comparing ARIMA and SARIMA approaches to three simple statistical methods (Mean, Naïve, and Seasonal Naïve), in terms of MAPE on the training and test sets.

Methods | Set | AR | BD | BR | IN | IR | MX |
---|---|---|---|---|---|---|---|
Mean | Training | 72,976.46 | 9848.56 | 77,102.94 | 165,478.16 | 15,028.63 | 80,758.69 |
Mean | Test | 65.428 | 71.1171 | 64.6042 | 68.3176 | 58.2728 | 53.0982 |
Naïve | Training | 2.1205 | 1.8492 | 2.3508 | 2.4336 | 1.8627 | 2.285 |
Naïve | Test | 2.1855 | 10.3307 | 1.5637 | 1.1819 | 5.0682 | 2.0799 |
Seasonal Naïve | Training | 12.6593 | 10.3185 | 10.959 | 12.9676 | 9.0316 | 11.3767 |
Seasonal Naïve | Test | 3.1073 | 13.4784 | 2.1704 | 1.6031 | 6.0435 | 2.6464 |
ARIMA | Training | 1.1251 | 0.8419 | 0.8078 | 1.1925 | 0.4128 | 1.1023 |
ARIMA | Test | 0.7961 | 0.7185 | 0.4104 | 0.2059 | 1.746 | 0.6032 |
SARIMA | Training | 1.0683 | 0.8141 | 0.4796 | 1.1867 | 0.364 | 1.026 |
SARIMA | Test | 0.7081 | 0.4334 | 0.1098 | 0.6559 | 1.4504 | 0.5796 |

Methods | Set | PH | RU | ZA | TH | US | VN |
---|---|---|---|---|---|---|---|
Mean | Training | 4911.19 | 83,559.78 | 27,054.27 | 627.5129 | 173,294.91 | 92.761 |
Mean | Test | 68.9418 | 67.2429 | 61.9092 | 94.129 | 49.7039 | 98.1843 |
Naïve | Training | 1.8171 | 2.1524 | 2.0984 | 1.5583 | 2.2007 | 1.5119 |
Naïve | Test | 5.1284 | 4.8805 | 4.7421 | 25.8882 | 0.9331 | 60.9863 |
Seasonal Naïve | Training | 9.4063 | 11.8052 | 11.6468 | 7.9684 | 9.8018 | 7.1452 |
Seasonal Naïve | Test | 6.5852 | 6.3732 | 6.3233 | 33.0028 | 1.1509 | 79.0645 |
ARIMA | Training | 1.001 | 0.8599 | 1.0944 | 0.837 | 0.9601 | 1.7891 |
ARIMA | Test | 1.344 | 0.0784 | 0.1913 | 2.3214 | 0.4333 | 27.36 |
SARIMA | Training | 0.9768 | 0.5512 | 0.7008 | 0.8679 | 0.606 | 1.5797 |
SARIMA | Test | 1.0353 | 0.0782 | 0.4488 | 0.2977 | 0.3122 | 7.98 |

Countries | Parameters | AIC | MAE | MAPE | MASE | RMSE |
---|---|---|---|---|---|---|
Argentina | (3,2,2) | 6881.541 | 66.581 | 1.0787 | 0.3196 | 159.55 |
Bangladesh | (3,2,2) | 3812.116 | 6.7566 | 0.8228 | 0.1378 | 9.3994 |
Brazil | (3,2,2) | 7664.725 | 265.97 | 0.7199 | 0.2541 | 378.57 |
India | (0,2,1) | 7655.07 | 126.01 | 1.152 | 0.1524 | 348.46 |
Iran | (1,2,4) | 5023.525 | 16.479 | 0.4051 | 0.0893 | 23.59 |
Mexico | (2,2,2) | 7508.783 | 190.91 | 1.0532 | 0.3924 | 336.16 |
Philippines | (4,2,1) | 5489.394 | 26.527 | 0.9718 | 0.4464 | 44.104 |
Russia | (3,2,2) | 5217.116 | 27.871 | 0.9781 | 0.0764 | 36.772 |
South Africa | (2,2,3) | 5776.75 | 44.677 | 1.0602 | 0.288 | 68.688 |
Thailand | (1,2,4) | 3760.553 | 3.2342 | 0.9101 | 0.188 | 9.255 |
US | (5,2,0) | 7751.229 | 212.06 | 0.9297 | 0.181 | 325.17 |
Vietnam | (1,2,4) | 3945.154 | 8.2099 | 1.9751 | 0.4174 | 40.251 |
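The table above reports, next to AIC, four accuracy measures for each selected ARIMA model. Their standard definitions can be sketched as follows. MASE divides the MAE by the in-sample MAE of a (seasonal) naive forecast with lag m, so its value depends on the chosen m; the default m = 1 here is an assumption of this sketch, not a statement about how the tables were computed. The helper names and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual: Sequence[float], forecast: Sequence[float],
         train: Sequence[float], m: int = 1) -> float:
    """MAE scaled by the in-sample MAE of the naive method with lag m."""
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    return mae(actual, forecast) / scale


# Hypothetical training data, held-out values, and forecasts.
train = [5, 9, 14, 20, 27, 35, 44, 54]
test = [65, 77, 90]
forecast = [64, 75, 87]
print(mae(test, forecast), rmse(test, forecast), mape(test, forecast),
      mase(test, forecast, train, m=1))
```

MAE and RMSE are in the units of the series (deaths), while MAPE and MASE are scale-free, which is why the latter two are easier to compare across countries.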

Countries | Parameters | AIC | MAE | MAPE | MASE | RMSE |
---|---|---|---|---|---|---|
Argentina | (0,2,1)(2,0,2)₇ | 6851.536 | 58.478 | 1.0298 | 0.0399 | 154.55 |
Bangladesh | (3,1,3)(1,1,2)₇ | 3745.92 | 6.425 | 0.5554 | 0.0192 | 8.9982 |
Brazil | (1,1,8)(0,1,1)₇ | 7190.918 | 162.06 | 0.4563 | 0.021 | 256.97 |
India | (0,2,1)(2,0,2)₇ | 7652.34 | 126 | 1.1499 | 0.0216 | 344.51 |
Iran | (6,2,2)(2,0,1)₇ | 4944.182 | 15.442 | 0.3413 | 0.0122 | 21.571 |
Mexico | (0,2,1)(4,0,0)₇ | 7438.09 | 156.18 | 0.9417 | 0.0456 | 312.81 |
Philippines | (6,2,4)(3,0,4)₇ | 5456.546 | 24.988 | 0.9385 | 0.9385 | 41.113 |
Russia | (4,2,4)(4,0,3)₇ | 4826.443 | 17.983 | 0.6797 | 0.0079 | 24.422 |
South Africa | (5,1,8)(4,1,4)₇ | 5665.571 | 41.036 | 0.6862 | 0.0373 | 62.244 |
Thailand | (4,2,10)(4,0,2)₇ | 3536.975 | 2.8601 | 0.9368 | 0.0261 | 7.0336 |
US | (6,1,1)(0,1,1)₇ | 7446.294 | 172.95 | 0.5977 | 0.0208 | 263.14 |
Vietnam | (5,2,4)(0,0,1)₇ | 3903.771 | 7.8076 | 1.9188 | 0.0651 | 37.599 |
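In the notation (p,d,q)(P,D,Q)₇ used above, d and D count regular and weekly seasonal differences: Brazil's (1,1,8)(0,1,1)₇ applies one of each, while Argentina's (0,2,1)(2,0,2)₇ applies two regular differences and no seasonal one. The sketch below applies the differencing operator (1 − B)^d (1 − B^m)^D to a series. It covers only this differencing step, not parameter estimation, and the helper names and toy series are hypothetical.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag) once: returns y_t - y_{t-lag}."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series: Sequence[float], d: int, D: int, m: int) -> List[float]:
    """Apply (1 - B)^d (1 - B^m)^D, the differencing implied by (p,d,q)(P,D,Q)_m."""
    out = list(series)
    for _ in range(D):  # seasonal differences (the two operators commute)
        out = difference(out, m)
    for _ in range(d):  # regular differences
        out = difference(out, 1)
    return out  # length shrinks by D * m + d


# Hypothetical series: linear trend plus a fixed weekly pattern.
pattern = [0, 5, 1, 0, 3, 0, 2]
y = [t + pattern[t % 7] for t in range(30)]

w_brazil_style = sarima_differencing(y, d=1, D=1, m=7)     # as in (1,1,8)(0,1,1)_7
w_argentina_style = sarima_differencing(y, d=2, D=0, m=7)  # as in (0,2,1)(2,0,2)_7
print(len(y), len(w_brazil_style), len(w_argentina_style))
print(sarima_differencing(y, d=0, D=1, m=7)[:5])
```

For this toy series a single weekly difference already removes both the pattern and most of the trend (every value becomes 7), and the extra regular difference reduces it to zeros.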

**Table 6.** Comparison between ARIMA and SARIMA models: percentage change in AIC, MAE, MAPE, MASE, and RMSE when moving from the best ARIMA to the best SARIMA model, for cumulative deaths from COVID-19 (negative values indicate lower values, i.e., better performance, for SARIMA).

Countries | AIC | MAE | MAPE | MASE | RMSE |
---|---|---|---|---|---|
Argentina | −0.44 | −12.17 | −4.53 | −87.52 | −3.13 |
Bangladesh | −1.74 | −4.91 | −32.5 | −86.07 | −4.27 |
Brazil | −6.18 | −39.07 | −36.62 | −91.74 | −32.12 |
India | −0.036 | −0.008 | −0.18 | −85.83 | −1.13 |
Iran | −1.58 | −6.29 | −15.75 | −86.34 | −8.56 |
Mexico | −0.94 | −18.19 | −10.59 | −88.38 | −6.95 |
Philippines | −0.598 | −5.8 | −3.43 | 110.24 | −6.78 |
Russia | −7.49 | −35.48 | −30.51 | −89.66 | −33.59 |
South Africa | −1.92 | −8.15 | −35.28 | −87.05 | −9.38 |
Thailand | −5.95 | −11.57 | 2.93 | −86.12 | −24 |
US | −3.93 | −18.44 | −35.71 | −88.51 | −19.08 |
Vietnam | −1.05 | −4.9 | −2.85 | −84.4 | −6.59 |
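The percentages in Table 6 are consistent with a plain relative change, 100 × (SARIMA − ARIMA) / ARIMA, computed from the two tables of best models. The sketch below reproduces the Argentina row from the values listed there; the helper name is hypothetical.

```python
METRICS = ("AIC", "MAE", "MAPE", "MASE", "RMSE")


def pct_change(arima_value: float, sarima_value: float) -> float:
    """Percentage change from the ARIMA value to the SARIMA value."""
    return 100.0 * (sarima_value - arima_value) / arima_value


# Argentina: values copied from the best-ARIMA and best-SARIMA tables.
arima_ar = dict(zip(METRICS, (6881.541, 66.581, 1.0787, 0.3196, 159.55)))
sarima_ar = dict(zip(METRICS, (6851.536, 58.478, 1.0298, 0.0399, 154.55)))

for metric in METRICS:
    print(metric, round(pct_change(arima_ar[metric], sarima_ar[metric]), 2))
# prints AIC -0.44, MAE -12.17, MAPE -4.53, MASE -87.52, RMSE -3.13 (one per line)
```

The same formula gives the one large positive entry, +110.24 for the Philippines' MASE, because the SARIMA value listed for that country (0.9385) is about twice the ARIMA value (0.4464).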

**Table 7.** Comparison of forecasted values and real data over the period 21 August 2021–19 September 2021, measured by the MAPE between them.

Countries | Until 25 August 2021 | Until 30 August 2021 | Until 9 September 2021 | Until 19 September 2021 |
---|---|---|---|---|
Argentina | 0.057 | 0.061 | 0.09 | 0.1107 |
Bangladesh | 0.1955 | 0.3179 | 0.4651 | 0.4761 |
Brazil | 0.0271 | 0.0588 | 0.1691 | 0.3131 |
India | 0.0479 | 0.1376 | 0.2671 | 0.2961 |
Iran | 0.0981 | 0.2209 | 0.2975 | 0.3846 |
Mexico | 0.0515 | 0.0642 | 0.0808 | 0.2623 |
Philippines | 0.6835 | 0.6182 | 0.5423 | 0.8411 |
Russia | 0.0076 | 0.011 | 0.014 | 0.032 |
South Africa | 0.0558 | 0.092 | 0.132 | 0.3331 |
Thailand | 1.2301 | 2.4151 | 5.6266 | 10.6897 |
US | 0.0848 | 0.1124 | 0.1458 | 0.1463 |
Vietnam | 1.2779 | 1.4391 | 2.6018 | 4.2089 |
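Table 7 evaluates the same forecasts over windows of increasing length that all start on 21 August 2021. A minimal sketch, assuming each column is the MAPE over the daily observations from 21 August up to and including the stated date (5, 10, 20, and 30 days, respectively); the function names and the forecast and actual paths are hypothetical, not the paper's data.

```python
from datetime import date
from typing import Dict, List, Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def expanding_horizon_mape(actual: Sequence[float], forecast: Sequence[float],
                           start: date, cutoffs: List[date]) -> Dict[date, float]:
    """MAPE over the days from `start` up to and including each cutoff date."""
    results: Dict[date, float] = {}
    for cutoff in cutoffs:
        h = (cutoff - start).days + 1  # number of daily observations in the window
        results[cutoff] = mape(actual[:h], forecast[:h])
    return results


START = date(2021, 8, 21)
CUTOFFS = [date(2021, 8, 25), date(2021, 8, 30), date(2021, 9, 9), date(2021, 9, 19)]

# Hypothetical 30-day paths in which the forecast slowly drifts below the data.
actual = [1000.0 + 10 * i for i in range(30)]
forecast = [1000.0 + 9 * i for i in range(30)]
for cutoff, value in expanding_horizon_mape(actual, forecast, START, CUTOFFS).items():
    print(cutoff.isoformat(), round(value, 4))
```

Because a drifting forecast accumulates error day after day, the MAPE computed this way usually grows with the window length, as it does for most countries in Table 7.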

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Perone, G.
Using the SARIMA Model to Forecast the Fourth Global Wave of Cumulative Deaths from COVID-19: Evidence from 12 Hard-Hit Big Countries. *Econometrics* **2022**, *10*, 18.
https://doi.org/10.3390/econometrics10020018

**AMA Style**

Perone G.
Using the SARIMA Model to Forecast the Fourth Global Wave of Cumulative Deaths from COVID-19: Evidence from 12 Hard-Hit Big Countries. *Econometrics*. 2022; 10(2):18.
https://doi.org/10.3390/econometrics10020018

**Chicago/Turabian Style**

Perone, Gaetano.
2022. "Using the SARIMA Model to Forecast the Fourth Global Wave of Cumulative Deaths from COVID-19: Evidence from 12 Hard-Hit Big Countries" *Econometrics* 10, no. 2: 18.
https://doi.org/10.3390/econometrics10020018