A Novel βSA Ensemble Model for Forecasting the Number of Confirmed COVID-19 Cases in the US
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. The α-Sutte and Proposed β-Sutte Indicator
2.3. Autoregressive Integrated Moving Average (ARIMA)
2.4. βSA Ensemble Model
3. Experiment
3.1. Data
3.2. Metrics
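The experiment is evaluated with RMSE and MAPE. As a reminder, their standard textbook definitions are sketched below. The symbols $y_t$ (observed value on day $t$), $\hat{y}_t$ (forecast for day $t$), and $n$ (number of evaluated days) are notation assumed here. MAPE is written as a fraction, although it is often multiplied by 100 and reported as a percentage.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^{2}},
\qquad
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
```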
3.3. Flow Chart of Experiment
- The data set used in this study does not provide the daily cumulative total number of confirmed cases for the U.S. as a whole. Therefore, this study sums the daily cumulative total number of cases (tot_cases) over all states to obtain the national daily cumulative total. In addition, the five worst-hit states by cumulative number of confirmed cases are selected for evaluation.
- Split the preprocessed data set into a run of training days and a testing day. The time window is set to eight days, and the window slides forward by one day at a time.
- Employ the dynamic weighting method based on the error value, and calculate the three training weights from their respective error functions.
- Use the sliding window method to obtain the moving dynamic weights; the β-Sutte indicator requires seven training days for this. The obtained dynamic weights are then incorporated into the β-Sutte indicator to make the prediction.
- Fit the ARIMA method on the same data set, then average the results of the β-Sutte indicator and ARIMA to obtain the final result of the βSA ensemble model. The α-Sutte indicator is also included for comparison, and all models are evaluated over two forecast horizons (one day ahead and five days ahead).
- Compare and discuss the results of the different methods (α-Sutte indicator, β-Sutte indicator, ARIMA, and βSA ensemble model) using the RMSE and MAPE evaluation metrics. The β-Sutte indicator and the βSA ensemble model are expected to outperform the α-Sutte indicator and ARIMA on these metrics.
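The evaluation loop described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the eight-day window splits into seven training days and one testing day, and the two forecasters (`last_value_forecast` and `drift_forecast`) are hypothetical stand-ins for the β-Sutte indicator and ARIMA, whose formulas are not reproduced here. The input series is made-up data.

```python
import math

def sliding_windows(series, window=8, step=1):
    """Yield (training_days, testing_day) pairs from a sliding window.

    Assumption: the last point of each window is the held-out testing day,
    so window=8 gives seven training days.
    """
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        yield chunk[:-1], chunk[-1]

def last_value_forecast(train):
    """Hypothetical stand-in for the beta-Sutte forecast: repeat the last value."""
    return float(train[-1])

def drift_forecast(train):
    """Hypothetical stand-in for the ARIMA forecast: last value plus mean step."""
    return train[-1] + (train[-1] - train[0]) / (len(train) - 1)

def mean_ensemble(forecast_a, forecast_b):
    """Average two one-step forecasts with equal weights."""
    return (forecast_a + forecast_b) / 2.0

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical cumulative case counts, not data from the paper.
series = [100, 110, 121, 133, 146, 160, 175, 191, 208, 226]

actuals, forecasts = [], []
for train, test in sliding_windows(series):
    forecasts.append(mean_ensemble(last_value_forecast(train), drift_forecast(train)))
    actuals.append(test)

print("RMSE:", rmse(actuals, forecasts))
print("MAPE:", mape(actuals, forecasts))
```

Swapping in real β-Sutte and ARIMA forecasters only requires replacing the two stand-in functions; the windowing, averaging, and scoring stay the same.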
4. Results and Discussion
4.1. One-Day-Ahead Forecasting of the Cumulative Number of Confirmed Cases
4.2. Five Days Ahead Forecasting of the Cumulative Number of Confirmed Cases
4.3. Forecasting of the Daily Number of Newly Confirmed Cases
5. Conclusions
- The dynamic weights of the β-Sutte indicator proposed in this study require seven days of training data in advance, so the time cost is higher than the four days preset by the α-Sutte indicator. The ARIMA model takes even more time to train. How to choose the most suitable method therefore remains an open issue.
- The proposed β-Sutte indicator uses an error-based weighting method to improve on the original α-Sutte indicator. In the future, other weighting methods, such as entropy weighting or a genetic algorithm, could be used to improve performance.
- The βSA ensemble model proposed in this study assigns equal (mean) weights to its component forecasts. In the future, different weight distributions could be used to adjust the ensemble weights.
- Because many external variables affect the number of confirmed COVID-19 cases, other variables (for example, death rate and transmission rate) could be considered in the future. Combining them with deep learning or other machine learning methods in a hybrid or ensemble model may further improve forecasting accuracy.
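To make the distinction between mean weights and other weight distributions concrete, the sketch below contrasts an equal-weight average with one possible alternative, weights inversely proportional to each model's recent absolute error. The inverse-error rule and the function names are assumptions chosen for illustration; they are not the weighting formula used in this study.

```python
def inverse_error_weights(errors, eps=1e-12):
    """Turn per-model absolute errors into weights that sum to one.

    Assumed rule: a smaller error gives a larger weight. eps avoids
    division by zero when a model's error is exactly zero.
    """
    inverse = [1.0 / (abs(e) + eps) for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def weighted_ensemble(forecasts, weights):
    """Combine model forecasts as a weighted sum."""
    return sum(w * f for w, f in zip(weights, forecasts))

# Hypothetical forecasts from two models and their recent absolute errors.
model_forecasts = [100.0, 200.0]
recent_errors = [1.0, 3.0]

equal = weighted_ensemble(model_forecasts, [0.5, 0.5])
adaptive = weighted_ensemble(model_forecasts, inverse_error_weights(recent_errors))
print(equal, adaptive)
```

With equal weights the function reduces to the plain mean, so the current ensemble is a special case of this more general form.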
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Variables | Definitions |
---|---|
submission_date | Date |
state | U.S. State Name |
tot_cases | Total number of cases |
conf_cases | Total number of confirmed cases |
prob_cases | Possible total number of cases |
new_case | Number of new confirmed cases |
pnew_case | Number of new possible cases |
tot_death | Total number of deaths |
conf_death | Total number of confirmed deaths |
prob_death | Total number of possible deaths |
new_death | Number of newly confirmed deaths |
pnew_death | Number of new possible death cases |
created_at | Date the record was created |
consent_cases | If agreed, confirmed and possible cases are included. If not agreed, only total cases are included. |
consent_deaths | If agreed, confirmed and possible deaths are included. If not agreed, only total deaths are included. |
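The columns in the table above are enough to build the national series and to pick the worst-hit states. The following sketch shows one way to do both with plain Python dictionaries. It assumes each record is a dict keyed by the column names above, that `submission_date` is an ISO-style string (so string comparison orders dates correctly), and that the sample rows are invented for illustration.

```python
from collections import defaultdict

def us_daily_totals(rows):
    """Sum tot_cases over all states for each submission_date."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["submission_date"]] += int(row["tot_cases"])
    return dict(totals)

def worst_hit_states(rows, k=5):
    """Return the k states with the largest latest cumulative tot_cases."""
    latest = {}
    for row in rows:
        state = row["state"]
        date = row["submission_date"]
        cases = int(row["tot_cases"])
        # Keep only the most recent record per state (ISO dates assumed).
        if state not in latest or date > latest[state][0]:
            latest[state] = (date, cases)
    ranked = sorted(latest, key=lambda s: latest[s][1], reverse=True)
    return ranked[:k]

# Hypothetical records, not taken from the real data set.
rows = [
    {"submission_date": "2021-01-01", "state": "IL", "tot_cases": "100"},
    {"submission_date": "2021-01-01", "state": "OH", "tot_cases": "80"},
    {"submission_date": "2021-01-02", "state": "IL", "tot_cases": "130"},
    {"submission_date": "2021-01-02", "state": "OH", "tot_cases": "95"},
]

print(us_daily_totals(rows))
print(worst_hit_states(rows, k=1))
```

If the source file stores dates in another format, they would need to be parsed before the comparison in `worst_hit_states`.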
Notations | Definition |
---|---|
Observation at the t-th day | |
day | |
The forecasting value at the -th day | |
Error function | |
The dynamic weighting function |
Area | Metrics | α-Sutte | β-Sutte | ARIMA | βSA |
---|---|---|---|---|---|
USA | MAPE | 0.1394 | 0.1372 | 0.13273 | 0.13272 |
USA | RMSE | 19644.57 | 19546.79 | 19614.3 | 19236.6 |
IL | MAPE | 0.2273 | 0.2259 | 0.2205 | 0.2198 |
IL | RMSE | 1616.399 | 1618.376 | 1636.707 | 1591.048 |
OH | MAPE | 0.3085 | 0.3013 | 0.2921 | 0.2902 |
OH | RMSE | 2127.701 | 2050.916 | 1989.027 | 2000.301 |
GA | MAPE | 0.1830 | 0.1827 | 0.1746 | 0.1734 |
GA | RMSE | 916.2098 | 914.0448 | 876.5051 | 877.3660 |
PA | MAPE | 0.2191 | 0.2177 | 0.2232 | 0.2166 |
PA | RMSE | 1041.438 | 1023.412 | 1005.699 | 994.5549 |
AZ | MAPE | 0.2666 | 0.2699 | 0.2652 | 0.2603 |
AZ | RMSE | 1653.714 | 1662.204 | 1584.275 | 1607.716 |
Area | Metrics | α-Sutte | β-Sutte | ARIMA | βSA |
---|---|---|---|---|---|
USA | MAPE | 0.004448 | 0.004453 | 0.004195 | 0.004310 |
USA | RMSE | 29848.19 | 29905.25 | 29127.87 | 29182.17 |
IL | MAPE | 0.35 | 0.353 | 0.19 | 0.25 |
IL | RMSE | 840.65 | 846.80 | 538.28 | 622.25 |
OH | MAPE | 0.803 | 0.806 | 0.64 | 0.71 |
OH | RMSE | 1067.16 | 1072.69 | 906.65 | 983.37 |
GA | MAPE | 0.662 | 0.665 | 0.48 | 0.54 |
GA | RMSE | 1974.65 | 1983.20 | 1523.24 | 1704.02 |
PA | MAPE | 0.47 | 0.46 | 0.451 | 0.457 |
PA | RMSE | 748.16 | 743.89 | 708.67 | 716.86 |
AZ | MAPE | 0.60 | 0.61 | 0.81 | 0.70 |
AZ | RMSE | 1752.19 | 1786.45 | 2506.30 | 2125.71 |
Area | Metrics | α-Sutte | β-Sutte | ARIMA | βSA |
---|---|---|---|---|---|
USA | MAPE | 6754 | 6961 | 6222 | 5425 |
USA | RMSE | 9311.46 | 9484.932 | 8647.039 | 7822.754 |
IL | MAPE | 375 | 381 | 322 | 275 |
IL | RMSE | 538.331 | 549.3422 | 442.886 | 404.925 |
OH | MAPE | 350 | 369 | 219 | 254 |
OH | RMSE | 648.322 | 666.000 | 357.165 | 441.814 |
GA | MAPE | 236 | 242 | 202 | 191 |
GA | RMSE | 333.009 | 338.687 | 275.697 | 262.301 |
PA | MAPE | 618 | 643 | 496 | 514 |
PA | RMSE | 980.757 | 1011.355 | 785.267 | 806.960 |
AZ | MAPE | 197 | 202 | 132 | 154 |
AZ | RMSE | 287.056 | 297.131 | 198.661 | 231.292 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shih, D.-H.; Wu, T.-W.; Shih, M.-H.; Yang, M.-J.; Yen, D.C. A Novel βSA Ensemble Model for Forecasting the Number of Confirmed COVID-19 Cases in the US. Mathematics 2022, 10, 824. https://doi.org/10.3390/math10050824