Forecast of the COVID-19 Epidemic Based on RF-BOA-LightGBM
Abstract
1. Introduction
2. Materials and Methods
2.1. COVID-19 Dataset
2.1.1. Time and Space Comparative Analysis of Baidu Index Search and Actual Cases
2.1.2. The Influence of the Selected Index on the Result
2.2. RF-BOA-LightGBM
2.2.1. Model Structure
2.2.2. Dataset Preprocessing
2.2.3. Tuning Algorithm
2.2.4. LightGBM
3. Results and Discussion
3.1. Performance Predictor
3.2. Experiment Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Datasets from CDC and Baidu Index Search
Date | CDC-Diagnosis | Baidu-New Coronavirus | Baidu-Fever | Baidu-Dry Cough | Baidu-Fatigue | Baidu-Dyspnea | Baidu-Cough | CDC-Death Toll |
---|---|---|---|---|---|---|---|---|
1 January 2020 | 0 | 0 | 4001 | 1100 | 256 | 481 | 5885 | 0 |
2 January 2020 | 0 | 0 | 4323 | 1206 | 278 | 602 | 6448 | 0 |
3 January 2020 | 1 | 0 | 4212 | 1173 | 262 | 654 | 6392 | 0 |
4 January 2020 | 0 | 0 | 4309 | 1109 | 270 | 621 | 6570 | 0 |
5 January 2020 | 5 | 0 | 4327 | 1118 | 271 | 591 | 6564 | 0 |
6 January 2020 | 0 | 0 | 4324 | 1226 | 310 | 693 | 6404 | 0 |
7 January 2020 | 0 | 0 | 3920 | 1175 | 288 | 633 | 5875 | 0 |
8 January 2020 | 0 | 0 | 3803 | 1124 | 272 | 622 | 5354 | 0 |
9 January 2020 | 0 | 8812 | 3693 | 1131 | 270 | 579 | 5182 | 0 |
10 January 2020 | 0 | 2032 | 3700 | 1095 | 263 | 535 | 5022 | 0 |
11 January 2020 | 0 | 2879 | 3478 | 1083 | 237 | 498 | 5033 | 1 |
12 January 2020 | 0 | 1445 | 3364 | 1067 | 252 | 474 | 5011 | 1 |
13 January 2020 | 0 | 1515 | 3573 | 1118 | 278 | 494 | 4418 | 1 |
14 January 2020 | 0 | 4846 | 3479 | 1133 | 266 | 528 | 4359 | 1 |
15 January 2020 | 0 | 4191 | 3241 | 1097 | 245 | 512 | 4355 | 2 |
16 January 2020 | 0 | 5174 | 3230 | 1100 | 267 | 546 | 4220 | 2 |
17 January 2020 | 4 | 7713 | 3247 | 1114 | 254 | 521 | 4008 | 2 |
18 January 2020 | 17 | 7754 | 3271 | 1060 | 228 | 492 | 4218 | 2 |
19 January 2020 | 36 | 29,003 | 3418 | 1182 | 253 | 548 | 4323 | 2 |
20 January 2020 | 151 | 266,892 | 4064 | 3684 | 609 | 1090 | 5324 | 2 |
21 January 2020 | 77 | 659,926 | 5474 | 10,162 | 1106 | 2073 | 7260 | 2 |
22 January 2020 | 149 | 852,363 | 6782 | 21,967 | 1711 | 3125 | 8751 | 3 |
23 January 2020 | 131 | 1,374,253 | 9151 | 26,393 | 3141 | 4840 | 10,229 | 11 |
24 January 2020 | 259 | 1,469,947 | 8108 | 21,718 | 3162 | 4511 | 9059 | 41 |
25 January 2020 | 688 | 2,330,851 | 10,029 | 24,100 | 3253 | 5922 | 12,798 | 56 |
26 January 2020 | 769 | 2,150,021 | 10,552 | 20,635 | 3117 | 5779 | 12,677 | 80 |
27 January 2020 | 1771 | 1,816,430 | 9406 | 15,323 | 2152 | 4572 | 11,547 | 106 |
28 January 2020 | 1459 | 2,227,942 | 9091 | 15,115 | 2296 | 4087 | 11,185 | 132 |
29 January 2020 | 1737 | 1,503,255 | 9350 | 13,783 | 2088 | 3940 | 11,351 | 170 |
30 January 2020 | 1982 | 1,372,206 | 9287 | 12,574 | 1943 | 3541 | 10,786 | 213 |
31 January 2020 | 2102 | 1,390,560 | 8855 | 12,974 | 1876 | 3702 | 10,584 | 259 |
1 February 2020 | 2590 | 1,334,127 | 8108 | 11,425 | 1620 | 2952 | 9741 | 304 |
2 February 2020 | 2829 | 1,374,154 | 7682 | 10,981 | 1491 | 3162 | 9750 | 361 |
3 February 2020 | 3235 | 1,277,132 | 7258 | 10,683 | 1365 | 2949 | 8517 | 425 |
4 February 2020 | 3887 | 1,244,048 | 6602 | 9504 | 1293 | 2626 | 7258 | 490 |
5 February 2020 | 3694 | 1,209,808 | 6213 | 8763 | 1349 | 2380 | 7434 | 563 |
6 February 2020 | 3143 | 1,943,197 | 5736 | 8305 | 1295 | 2179 | 8043 | 636 |
7 February 2020 | 3399 | 1,643,941 | 5789 | 9236 | 1292 | 2488 | 7261 | 722 |
8 February 2020 | 2656 | 1,185,978 | 5126 | 7287 | 1183 | 2131 | 6718 | 811 |
9 February 2020 | 3062 | 1,142,892 | 5220 | 8719 | 1187 | 2004 | 8173 | 908 |
10 February 2020 | 2478 | 1,158,302 | 5450 | 8585 | 1212 | 1946 | 8948 | 1016 |
11 February 2020 | 2015 | 1,061,433 | 4814 | 7421 | 1239 | 1901 | 8641 | 1113 |
12 February 2020 | 15,152 | 1,050,392 | 4590 | 5971 | 1163 | 1922 | 7908 | 1367 |
13 February 2020 | 5090 | 1,277,024 | 4745 | 6436 | 1125 | 2049 | 8076 | 1380 |
14 February 2020 | 2641 | 1,069,203 | 4140 | 5339 | 1126 | 1830 | 7197 | 1523 |
15 February 2020 | 2009 | 948,165 | 3295 | 4537 | 1018 | 1456 | 6452 | 1596 |
16 February 2020 | 2048 | 904,431 | 2994 | 3953 | 942 | 1205 | 5461 | 1770 |
17 February 2020 | 1886 | 920,373 | 3454 | 4025 | 1046 | 1406 | 6542 | 1868 |
18 February 2020 | 1749 | 840,490 | 3274 | 3652 | 1056 | 1278 | 6889 | 2004 |
19 February 2020 | 394 | 784,784 | 3327 | 3530 | 1038 | 1315 | 6848 | 2118 |
20 February 2020 | 889 | 800,960 | 3035 | 3071 | 1012 | 1345 | 6552 | 2236 |
21 February 2020 | 397 | 776,563 | 3003 | 3244 | 935 | 1269 | 6467 | 2345 |
22 February 2020 | 648 | 636,594 | 2663 | 3003 | 949 | 1179 | 5606 | 2442 |
23 February 2020 | 409 | 622,095 | 2777 | 2771 | 978 | 1172 | 5218 | 2592 |
24 February 2020 | 508 | 634,391 | 3234 | 2695 | 1025 | 1286 | 5940 | 2663 |
25 February 2020 | 406 | 550,484 | 3066 | 2550 | 964 | 1260 | 5462 | 2715 |
26 February 2020 | 433 | 482,726 | 2850 | 2468 | 896 | 1202 | 5451 | 2744 |
27 February 2020 | 327 | 478,822 | 2835 | 2403 | 819 | 1165 | 5354 | 2788 |
28 February 2020 | 427 | 486,394 | 2660 | 2285 | 845 | 1195 | 5425 | 2835 |
29 February 2020 | 573 | 496,289 | 2420 | 2213 | 750 | 1133 | 4655 | 2870 |
1 March 2020 | 202 | 482,280 | 2244 | 2070 | 686 | 1151 | 4458 | 2912 |
2 March 2020 | 125 | 441,914 | 2468 | 2123 | 785 | 1176 | 5326 | 2943 |
3 March 2020 | 119 | 393,118 | 2223 | 1955 | 755 | 1143 | 4741 | 2981 |
4 March 2020 | 139 | 441,921 | 2264 | 1970 | 765 | 1163 | 4967 | 3012 |
5 March 2020 | 143 | 414,142 | 2122 | 1789 | 680 | 1140 | 5157 | 3042 |
6 March 2020 | 99 | 376,106 | 2111 | 1658 | 694 | 1112 | 5186 | 3070 |
7 March 2020 | 44 | 369,780 | 1877 | 1539 | 723 | 1072 | 4196 | 3097 |
8 March 2020 | 40 | 368,916 | 1759 | 1480 | 646 | 1052 | 3993 | 3119 |
9 March 2020 | 19 | 359,426 | 2017 | 1414 | 687 | 1133 | 5547 | 3136 |
10 March 2020 | 24 | 335,711 | 1792 | 1288 | 635 | 1085 | 4164 | 3158 |
11 March 2020 | 15 | 337,491 | 1911 | 1413 | 633 | 1049 | 4331 | 3169 |
12 March 2020 | 8 | 353,167 | 1891 | 1575 | 686 | 1088 | 3967 | 3176 |
13 March 2020 | 11 | 353,857 | 1906 | 1756 | 641 | 1119 | 3269 | 3189 |
14 March 2020 | 20 | 332,215 | 1745 | 1358 | 601 | 1042 | 2788 | 3199 |
15 March 2020 | 16 | 364,033 | 1721 | 1486 | 657 | 1037 | 2732 | 3213 |
16 March 2020 | 21 | 324,566 | 1985 | 1555 | 759 | 1087 | 3845 | 3226 |
17 March 2020 | 13 | 300,185 | 1885 | 1546 | 673 | 1068 | 3022 | 3237 |
18 March 2020 | 34 | 295,536 | 1920 | 1491 | 696 | 1052 | 3198 | 3245 |
19 March 2020 | 39 | 282,990 | 1724 | 1355 | 663 | 1057 | 2742 | 3248 |
20 March 2020 | 41 | 300,183 | 1779 | 1227 | 621 | 1036 | 2705 | 3255 |
21 March 2020 | 46 | 299,291 | 1734 | 1308 | 641 | 1006 | 2577 | 3261 |
22 March 2020 | 39 | 285,191 | 1736 | 1102 | 672 | 1027 | 2829 | 3270 |
23 March 2020 | 78 | 280,841 | 1855 | 1391 | 704 | 1102 | 3018 | 3277 |
24 March 2020 | 47 | 278,221 | 1830 | 1457 | 704 | 1052 | 3215 | 3281 |
25 March 2020 | 67 | 259,091 | 1810 | 1308 | 656 | 1045 | 3446 | 3287 |
26 March 2020 | 55 | 261,957 | 1839 | 1094 | 655 | 1091 | 3114 | 3292 |
27 March 2020 | 54 | 279,082 | 1645 | 1129 | 592 | 1061 | 2780 | 3295 |
28 March 2020 | 45 | 264,664 | 1525 | 1065 | 476 | 998 | 2480 | 3300 |
29 March 2020 | 31 | 265,761 | 1562 | 1096 | 490 | 961 | 2364 | 3304 |
30 March 2020 | 48 | 264,442 | 1725 | 1094 | 601 | 1031 | 3021 | 3305 |
31 March 2020 | 36 | 239,272 | 1772 | 1071 | 535 | 1007 | 2676 | 3312 |
1 April 2020 | 35 | 243,582 | 1569 | 1080 | 565 | 1013 | 2676 | 3318 |
Series | First Peak Date | Lead over CDC Peak (Days) |
---|---|---|
CDC-Diagnosis | 12 February 2020 | - |
Baidu-New Coronavirus | 25 January 2020 | +18 |
Baidu-Fever | 26 January 2020 | +17 |
Baidu-Dry Cough | 23 January 2020 | +20 |
Baidu-Fatigue | 25 January 2020 | +18 |
Baidu-Dyspnea | 25 January 2020 | +18 |
Baidu-Cough | 25 January 2020 | +18 |
Arithmetic mean | - | +18 |
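The lead times in this table can be recomputed from the Appendix A data. The sketch below is a minimal illustration, not the authors' procedure, and rests on one stated assumption: each series' "first peak" is taken to be its maximum within 22 January to 13 February 2020, a window that contains every peak date listed above. The values are copied from Appendix A; `peak_date`, `peak_leads`, `START` and `SERIES` are hypothetical names.

```python
from datetime import date, timedelta

# Appendix A values for 22 January to 13 February 2020 (23 consecutive days).
# Assumption: the "first peak" of each series is its maximum within this window.
START = date(2020, 1, 22)
SERIES = {
    "CDC-Diagnosis": [149, 131, 259, 688, 769, 1771, 1459, 1737, 1982, 2102, 2590, 2829,
                      3235, 3887, 3694, 3143, 3399, 2656, 3062, 2478, 2015, 15152, 5090],
    "Baidu-New Coronavirus": [852363, 1374253, 1469947, 2330851, 2150021, 1816430,
                              2227942, 1503255, 1372206, 1390560, 1334127, 1374154,
                              1277132, 1244048, 1209808, 1943197, 1643941, 1185978,
                              1142892, 1158302, 1061433, 1050392, 1277024],
    "Baidu-Fever": [6782, 9151, 8108, 10029, 10552, 9406, 9091, 9350, 9287, 8855, 8108,
                    7682, 7258, 6602, 6213, 5736, 5789, 5126, 5220, 5450, 4814, 4590, 4745],
    "Baidu-Dry Cough": [21967, 26393, 21718, 24100, 20635, 15323, 15115, 13783, 12574,
                        12974, 11425, 10981, 10683, 9504, 8763, 8305, 9236, 7287, 8719,
                        8585, 7421, 5971, 6436],
    "Baidu-Fatigue": [1711, 3141, 3162, 3253, 3117, 2152, 2296, 2088, 1943, 1876, 1620,
                      1491, 1365, 1293, 1349, 1295, 1292, 1183, 1187, 1212, 1239, 1163, 1125],
    "Baidu-Dyspnea": [3125, 4840, 4511, 5922, 5779, 4572, 4087, 3940, 3541, 3702, 2952,
                      3162, 2949, 2626, 2380, 2179, 2488, 2131, 2004, 1946, 1901, 1922, 2049],
    "Baidu-Cough": [8751, 10229, 9059, 12798, 12677, 11547, 11185, 11351, 10786, 10584,
                    9741, 9750, 8517, 7258, 7434, 8043, 7261, 6718, 8173, 8948,
                    8641, 7908, 8076],
}


def peak_date(values, start=START):
    """Date of the largest value; ties resolve to the earliest day."""
    index = max(range(len(values)), key=values.__getitem__)
    return start + timedelta(days=index)


def peak_leads(series, reference="CDC-Diagnosis"):
    """Days by which each series peaks before the reference series."""
    ref_peak = peak_date(series[reference])
    return {name: (ref_peak - peak_date(values)).days
            for name, values in series.items() if name != reference}


leads = peak_leads(SERIES)
for name, lead in leads.items():
    print(f"{name}: peak {peak_date(SERIES[name])}, lead {lead:+d} days")
print(f"mean lead: {sum(leads.values()) / len(leads):.2f} days")
```

Under this assumption the sketch reproduces the leads of 18, 17, 20, 18, 18 and 18 days; their exact mean is 109/6 ≈ 18.17, which the table rounds to +18.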
Parameter | Type | Search Range | Effect |
---|---|---|---|
learn_rate | float | (0.001, 0.3) | improve accuracy |
max_depth | int | (3, 10) | prevent overfitting |
num_leaves | int | (3, 1024) | improve accuracy |
min_data_in_leaf | int | (0, 80) | prevent overfitting |
feature_fraction | float | (0.2, 0.9) | speed up training |
bagging_fraction | float | (0.2, 0.9) | speed up training |
lambda_l1 | float | (0, 10) | prevent overfitting |
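The search space in this table maps directly onto a configuration dictionary. The following is a minimal sketch, not the authors' code, that encodes the ranges under LightGBM's parameter names and draws configurations uniformly at random, which is how a RandomSearch baseline explores the space. Assumptions: `learn_rate` is taken to mean LightGBM's `learning_rate`, the bounds are treated as inclusive, and `toy_objective` is a hypothetical stand-in for training LightGBM and returning a validation error; `SEARCH_SPACE`, `sample_config` and `random_search` are likewise illustrative names.

```python
import random

# Search space from the table, keyed by LightGBM parameter names.
# Assumptions: the table's "learn_rate" means LightGBM's learning_rate,
# and every bound is inclusive.
SEARCH_SPACE = {
    "learning_rate": (float, 0.001, 0.3),
    "max_depth": (int, 3, 10),
    "num_leaves": (int, 3, 1024),
    "min_data_in_leaf": (int, 0, 80),
    "feature_fraction": (float, 0.2, 0.9),
    # In LightGBM, bagging_fraction only takes effect when bagging_freq > 0.
    "bagging_fraction": (float, 0.2, 0.9),
    "lambda_l1": (float, 0.0, 10.0),
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from `space`."""
    return {
        name: rng.randint(low, high) if kind is int else rng.uniform(low, high)
        for name, (kind, low, high) in space.items()
    }


def random_search(objective, space, n_trials=50, seed=0):
    """Evaluate n_trials random configurations; return (best_score, best_config).

    `objective` maps a configuration dict to an error (lower is better).
    """
    rng = random.Random(seed)
    best_score, best_config = float("inf"), None
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = objective(config)
        if score < best_score:
            best_score, best_config = score, config
    return best_score, best_config


# Hypothetical objective standing in for "train LightGBM, return validation RMSE".
def toy_objective(config):
    return (config["learning_rate"] - 0.1) ** 2 + 0.01 * config["max_depth"]


score, config = random_search(toy_objective, SEARCH_SPACE)
print(round(score, 4), config)
```

Grid search would instead enumerate a fixed lattice over the same ranges; in both cases each trial is chosen without regard to earlier scores.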
Parameter | GridSearch | RandomSearch | BOA |
---|---|---|---|
learn_rate | 0.632 | 0.828 | 0.355 |
max_depth | 7 | 8 | 5 |
num_leaves | 225 | 237 | 249 |
min_data_in_leaf | 33 | 27 | 30 |
feature_fraction | 0.7 | 0.7 | 0.8 |
bagging_fraction | 0.7 | 0.7 | 0.8 |
lambda_l1 | 2.34 | 3.45 | 1.80 |
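The BOA column reflects Bayesian optimization, which fits a probabilistic surrogate to the scores observed so far and chooses the next trial by maximizing an acquisition function. The source does not state which surrogate or acquisition function was used. The sketch below assumes the common Gaussian-process surrogate with expected improvement (EI), tunes a single parameter over the (0.001, 0.3) `learn_rate` range for brevity, and is an illustration rather than the authors' implementation. All function names and the toy objective are hypothetical.

```python
import math
import random

# Illustrative sketch (assumption, not the paper's BOA): 1-D minimization with a
# zero-mean Gaussian-process surrogate (RBF kernel) and expected improvement
# maximized over a fixed grid. Inputs live in [0, 1]; scores are standardized.


def rbf(a, b, length=0.15):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    """Lower-triangular L such that L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(K[i][i] - s, 1e-12))
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Return predict(x) -> (posterior mean, posterior std) given data (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, ys))  # alpha = K^{-1} y

    def predict(x):
        k = [rbf(x, a) for a in xs]
        v = solve_lower(L, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mean, math.sqrt(var)

    return predict


def expected_improvement(mean, std, best):
    """E[max(best - f(x), 0)] for f(x) ~ N(mean, std^2)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, low, high, n_init=3, n_iter=12, seed=0):
    """Minimize objective on [low, high]; return (best_x, best_y)."""
    def to_x(u):  # map [0, 1] -> [low, high]
        return low + u * (high - low)

    rng = random.Random(seed)
    us = [rng.random() for _ in range(n_init)]   # random initial design
    ys = [objective(to_x(u)) for u in us]
    grid = [i / 200 for i in range(201)]         # candidate points in [0, 1]
    for _ in range(n_iter):
        mu = sum(ys) / len(ys)
        sd = (sum((y - mu) ** 2 for y in ys) / len(ys)) ** 0.5 or 1.0
        zs = [(y - mu) / sd for y in ys]
        predict = fit_gp(us, zs)
        best = min(zs)
        u = max(grid, key=lambda g: expected_improvement(*predict(g), best))
        us.append(u)
        ys.append(objective(to_x(u)))
    i = min(range(len(ys)), key=ys.__getitem__)
    return to_x(us[i]), ys[i]


# Hypothetical objective standing in for "validation error vs. learning_rate".
best_x, best_y = bayes_opt(lambda lr: (lr - 0.12) ** 2, 0.001, 0.3)
print(f"best learning_rate ~ {best_x:.3f}, score {best_y:.2e}")
```

Unlike grid or random search, each new trial here depends on all previous scores, which is why Bayesian optimization typically needs fewer expensive evaluations; a multi-parameter version would use a multivariate kernel and a better acquisition optimizer than a fixed grid.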
Model | R² | RMSE | MAE | RAE | RRSE |
---|---|---|---|---|---|
LightGBM | 0.820 | 354.945 | 138.939 | 0.535 | 0.424 |
GridSearch-LightGBM | 0.865 | 311.918 | 145.266 | 0.548 | 0.368 |
RandomSearch-LightGBM | 0.861 | 316.217 | 137.621 | 0.533 | 0.373 |
BOA-LightGBM | 0.879 | 295.686 | 124.911 | 0.508 | 0.348 |
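The five columns can be computed from paired observations y and predictions ŷ. The sketch below assumes the standard definitions, which the source does not restate here: with SSE = Σ(y − ŷ)² and SST = Σ(y − ȳ)², R² = 1 − SSE/SST, RMSE = √(SSE/n), MAE = Σ|y − ŷ|/n, RAE = Σ|y − ŷ| / Σ|y − ȳ| and RRSE = √(SSE/SST). Under these definitions RRSE = √(1 − R²), and every row of the table agrees; for BOA-LightGBM, √(1 − 0.879) ≈ 0.348. The function name `regression_metrics` and the data are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """R2, RMSE, MAE, RAE and RRSE (assumed standard definitions)."""
    n = len(y_true)
    mean = sum(y_true) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)              # residual sum of squares
    sst = sum((t - mean) ** 2 for t in y_true)    # total sum of squares
    sae = sum(abs(e) for e in errors)             # sum of absolute errors
    sad = sum(abs(t - mean) for t in y_true)      # absolute deviations from the mean
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        "RAE": sae / sad,
        "RRSE": math.sqrt(sse / sst),
    }


# Hypothetical daily counts and forecasts, for illustration only.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 320, 380]
m = regression_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in m.items()})
```

RAE and RRSE normalize the error by that of a predictor that always outputs the mean ȳ, so values below 1 indicate an improvement over that baseline.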
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Z.; Hu, D. Forecast of the COVID-19 Epidemic Based on RF-BOA-LightGBM. Healthcare 2021, 9, 1172. https://doi.org/10.3390/healthcare9091172
Li Z, Hu D. Forecast of the COVID-19 Epidemic Based on RF-BOA-LightGBM. Healthcare. 2021; 9(9):1172. https://doi.org/10.3390/healthcare9091172
Chicago/Turabian Style: Li, Zhe, and Dehua Hu. 2021. "Forecast of the COVID-19 Epidemic Based on RF-BOA-LightGBM" Healthcare 9, no. 9: 1172. https://doi.org/10.3390/healthcare9091172