# Forecasting Weekly Influenza Outpatient Visits Using a Two-Dimensional Hierarchical Decision Tree Scheme


## Abstract


## 1. Introduction

## 2. Methods

#### 2.1. Random Forest

#### 2.2. XGBoost

#### 2.3. Autoregressive Integrated Moving Average

#### 2.4. MARS

#### 2.5. Model Implementation

#### 2.6. Proposed Influenza Outpatient Visit Forecasting Scheme

## 3. Empirical Study

X_{T,t−1} is the most important variable. This means that the number of influenza outpatient visits in the Taipei region at a lag of one week is the most important reference information for planning and allocating healthcare resources for influenza prevention.
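The variable naming used here (X_{r,t−k} for region r at a lag of k weeks) suggests a design matrix of lagged regional counts. The following is a minimal sketch of how such a matrix could be assembled, not the authors' implementation. The function name `build_lagged_rows`, the region codes as dictionary keys, the one-week-ahead nationwide target, and the toy data (including the nationwide series being the sum of the regions) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Region codes as they appear in the variable names: Taipei, North, Central,
# South, Kaoping, East. Using them as dictionary keys is an assumption.
REGIONS = ["T", "N", "C", "S", "K", "E"]


def build_lagged_rows(
    regional: Dict[str, List[float]],
    national: List[float],
    max_lag: int = 4,
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Build one feature row per week from lag-1..max_lag regional counts.

    The target for week t is the nationwide count in week t (hypothetical
    one-week-ahead setup); the features are the regional counts in weeks
    t-1 .. t-max_lag.
    """
    names = [f"X_{r},t-{k}" for k in range(1, max_lag + 1) for r in REGIONS]
    rows: List[List[float]] = []
    targets: List[float] = []
    # The first max_lag weeks have no complete lag history, so skip them.
    for t in range(max_lag, len(national)):
        rows.append(
            [regional[r][t - k] for k in range(1, max_lag + 1) for r in REGIONS]
        )
        targets.append(national[t])
    return names, rows, targets


# Toy data only: ten weeks of made-up counts, not values from the study.
weeks = 10
regional = {
    r: [float(i * 10 + w) for w in range(weeks)] for i, r in enumerate(REGIONS)
}
national = [sum(regional[r][w] for r in REGIONS) for w in range(weeks)]

names, rows, targets = build_lagged_rows(regional, national)
print(len(names), len(rows))
```

With six regions and four lags this yields the 24 candidate variables referred to in the text; the resulting rows could then be passed to any tree-based regressor.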

## 4. Discussion

X_{T,t−1}, X_{S,t−1}, and X_{N,t−1} are inferred to be the most important variables and can be considered a crucial sign for influenza prevention. From Figure 5, the cumulative importance of the first three variables is about 52%, even though these three variables account for only 12.5% of all 24 variables. From these three variables, we can infer that when the number of influenza outpatient visits in the previous week rises sharply in the Taipei, South, and North regions, a nationwide epidemic is highly probable within the next four weeks. From an administrative perspective this seems reasonable: the Taipei region includes Taipei City, New Taipei City, Keelung City, Yilan County, Kinmen County, and Lianjiang County; the North region includes Hsinchu City, Hsinchu County, Taoyuan City, and Miaoli County; and the South region includes Tainan City, Chiayi City, Chiayi County, and Yunlin County. These regions cover the most important areas of economic activity in Taiwan.

X_{T,t−1}, X_{S,t−1}, X_{N,t−1}, X_{E,t−1}, X_{C,t−1}, X_{T,t−2}, X_{E,t−2}, X_{S,t−2}, X_{N,t−2}, and X_{C,t−2} could be considered, since these ten variables account for only about 41.7% of all 24 variables. The cumulative importance of the first ten variables is about 90%, which is also a commonly used cutoff in statistical analysis [40]. The first ten variables cover the Taipei, South, North, East, and Central regions at lags of one and two weeks; none of the variables at lags of three or four weeks is selected. The result indicates that the high-frequency variable in this study contributes less to predicting nationwide influenza outpatient visits. The first ten variables can be used to predict the trend of influenza outpatient visits one to four weeks ahead.
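The cumulative-importance figures quoted in this section can be checked directly against the relative importance values reported in the variable importance table. The sketch below simply accumulates those published percentages in ranked order; the variable `ranked` and the helper `cumulative_share` are illustrative names, not part of the original study.

```python
from itertools import accumulate
from typing import List, Tuple

# (variable, relative importance in %) in ranked order, copied from the
# variable importance table of this article.
ranked: List[Tuple[str, float]] = [
    ("X_{T,t-1}", 19.06), ("X_{S,t-1}", 17.45), ("X_{N,t-1}", 15.43),
    ("X_{E,t-1}", 9.86), ("X_{C,t-1}", 8.72), ("X_{T,t-2}", 7.32),
    ("X_{E,t-2}", 3.84), ("X_{S,t-2}", 3.26), ("X_{N,t-2}", 2.79),
    ("X_{C,t-2}", 1.89), ("X_{K,t-1}", 1.33), ("X_{T,t-3}", 0.92),
    ("X_{C,t-4}", 0.88), ("X_{C,t-3}", 0.85), ("X_{T,t-4}", 0.81),
    ("X_{S,t-3}", 0.74), ("X_{K,t-2}", 0.74), ("X_{N,t-3}", 0.65),
    ("X_{K,t-3}", 0.65), ("X_{S,t-4}", 0.64), ("X_{N,t-4}", 0.57),
    ("X_{E,t-3}", 0.53), ("X_{E,t-4}", 0.52), ("X_{K,t-4}", 0.52),
]


def cumulative_share(top_k: int) -> Tuple[float, float]:
    """Return (cumulative importance %, share of all variables %) for the top_k."""
    cumulative = list(accumulate(value for _, value in ranked))
    return round(cumulative[top_k - 1], 2), round(100 * top_k / len(ranked), 2)


top3 = cumulative_share(3)
top10 = cumulative_share(10)
print(top3, top10)
```

The first three variables sum to 51.94% of the importance while making up 12.5% of the variables, and the first ten sum to 89.62% while making up about 41.7%, which matches the rounded figures of 52% and 90% used in the discussion.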

## 5. Limitations and Future Research

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Basile, L.; Oviedo de la Fuente, M.; Torner, N.; Martínez, A.; Jane, M. Real-time predictive seasonal influenza model in Catalonia, Spain. PLoS ONE **2018**, 13, e0193651.
2. Liebowitz, D.; Gottlieb, K.; Kolhatkar, N.S.; Garg, S.J.; Asher, J.M.; Nazareno, J.; Tucker, S.N. Efficacy, immunogenicity, and safety of an oral influenza vaccine: A placebo-controlled and active-controlled phase 2 human challenge study. Lancet Infect. Dis. **2020**, 20, 435–444.
3. Al-qaness, M.A.A.; Ewees, A.A.; Fan, H.; Abd Elaziz, M. Optimized Forecasting Method for Weekly Influenza Confirmed Cases. Int. J. Environ. Res. Public Health **2020**, 17, 3510.
4. Molinari, N.-A.M.; Ortega-Sanchez, I.R.; Messonnier, M.L.; Thompson, W.W.; Wortley, P.M.; Weintraub, E.; Bridges, C.B. The annual impact of seasonal influenza in the US: Measuring disease burden and costs. Vaccine **2007**, 25, 5086–5096.
5. Lu, J.; Meyer, S. Forecasting Flu Activity in the United States: Benchmarking an Endemic-Epidemic Beta Model. Int. J. Environ. Res. Public Health **2020**, 17, 1381.
6. Araz, O.M.; Bentley, D.; Muelleman, R.L. Using Google Flu Trends data in forecasting influenza-like illness related ED visits in Omaha, Nebraska. Am. J. Emerg. Med. **2014**, 32, 1016–1023.
7. Towers, S.; Chowell, G. Impact of weekday social contact patterns on the modeling of influenza transmission, and determination of the influenza latent period. J. Theor. Biol. **2012**, 312, 87–95.
8. Dugas, A.F.; Jalalpour, M.; Gel, Y.; Levin, S.; Torcaso, F.; Igusa, T.; Rothman, R.E. Influenza forecasting with Google Flu Trends. PLoS ONE **2013**, 8, e56176.
9. Nsoesie, E.O.; Marathe, M.; Brownstein, J.S. Forecasting peaks of seasonal influenza epidemics. Edition 1. PLOS Curr. Outbreaks **2013**.
10. Osthus, D.; Hickmann, K.S.; Caragea, P.C.; Higdon, D.; Valle, S.Y.D. Forecasting seasonal influenza with a state-space SIR model. Ann. Appl. Stat. **2017**, 11, 202–224.
11. Volkova, S.; Ayton, E.; Porterfield, K.; Corley, C.D. Forecasting influenza-like illness dynamics for military populations using neural networks and social media. PLoS ONE **2017**, 12, e0188941.
12. Venna, S.R.; Tavanaei, A.; Gottumukkala, R.N.; Raghavan, V.V.; Maida, A.S.; Nichols, S. A novel data-driven model for real-time influenza forecasting. IEEE Access **2018**, 7, 7691–7701.
13. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science **2015**, 349, 255–260.
14. Cuenca, E.; Sallaberry, A.; Ying Wang, F.; Poncelet, P. MultiStream: A multiresolution streamgraph approach to explore hierarchical time series. IEEE Trans. Vis. Comput. Graph. **2018**, 24, 3160–3173.
15. Hyndman, R.J.; Lee, A.J.; Wang, E. Fast computation of reconciled forecasts for hierarchical and grouped time series. Comput. Stat. Data Anal. **2016**, 97, 16–32.
16. Pei, S.; Kandula, S.; Yang, W.; Shaman, J. Forecasting the spatial transmission of influenza in the United States. Proc. Natl. Acad. Sci. USA **2018**, 115, 2752–2757.
17. Wang, Y.; Xu, K.; Kang, Y.; Wang, H.; Wang, F.; Avram, A. Regional Influenza Prediction with Sampling Twitter Data and PDE Model. Int. J. Environ. Res. Public Health **2020**, 17, 678.
18. Tiao, G.C.; Guttman, I. Forecasting contemporal aggregates of multiple time series. J. Econom. **1980**, 12, 219–230.
19. Kohn, R. When is an aggregate of a time series efficiently forecast by its past? J. Econom. **1982**, 18, 337–349.
20. Collins, D.W. Predicting earnings with sub-entity data: Some further evidence. J. Account. Res. **1976**, 14, 163–177.
21. Dunn, D.M.; Williams, W.H.; DeChaine, T.L. Aggregate versus subaggregate models in local area forecasting. J. Am. Stat. Assoc. **1976**, 71, 68–71.
22. Dangerfield, B.J.; Morris, J.S. Top–down or bottom–up: Aggregate versus disaggregate extrapolations. Int. J. Forecast. **1992**, 8, 233–241.
23. Venkatesh, B.; Anuradha, J. A hybrid feature selection approach for handling high-dimensional data. Lecture notes in Networks and Systems. In Innovations in Computer Science and Engineering; Springer: Singapore, 2019; pp. 365–373.
24. Reich, N.G.; Brooks, L.C.; Fox, S.J.; Kandula, S.; McGowan, C.J.; Moore, E.; Moore, E.; Osthus, D.; Ray, E.L.; Tushar, A.; et al. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc. Natl. Acad. Sci. USA **2019**, 116, 3146–3154.
25. Sharafi, M.; Ghaem, H.; Tabatabaee, H.R.; Faramarzi, H. Forecasting the number of zoonotic cutaneous leishmaniasis cases in south of Fars province, Iran using seasonal ARIMA time series method. Asian Pac. J. Trop. Med. **2017**, 10, 79–86.
26. Cong, J.; Ren, M.; Xie, S.; Wang, P. Predicting Seasonal Influenza Based on SARIMA Model, in Mainland China from 2005 to 2018. Int. J. Environ. Res. Public Health **2019**, 16, 4760.
27. Probst, P.; Wright, M.N.; Boulesteix, A.-L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. **2019**, 9, e1301.
28. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
29. Suo, G.; Song, L.; Dou, Y.; Cui, Z. Multi-dimensional short-term load Forecasting based on XGBoost and fireworks algorithm. In Proceedings of the 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), Wuhan, China, 8–10 November 2019.
30. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
31. Newbold, P. ARIMA model building and the time series analysis approach to forecasting. J. Forecast. **1983**, 2, 23–35.
32. Chen, P.; Pedersen, T.; Bak-Jensen, B.; Chen, Z. ARIMA-Based Time Series Model of Stochastic Wind Power Generation. IEEE Trans. Power Syst. **2010**, 25, 667–676.
33. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C. Time Series Analysis: Forecasting and Control, 3rd ed.; Prentice Hall: Englewood Cliffs, NJ, USA, 1994.
34. Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. **1991**, 19, 1–67.
35. Meyer, D.; Dimitriadou, E.; Hornik, K.; Weingessel, A.; Leisch, F. E1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TUWien, 2017. R Package Version 1.7–3. Available online: https://www.rdocumentation.org/packages/e1071 (accessed on 18 January 2020).
36. Milborrow, S.; Hastie, T.; Tibshirani, R.; Miller, A.; Lumley, T. Earth: Multivariate Adaptive Regression Splines. R Package Version 5.1.2. Available online: https://www.rdocumentation.org/packages/earth (accessed on 18 January 2020).
37. Liaw, A.; Wiener, M. RandomForest: Breiman and Cutler’s Random Forests for Classification and Regression. R Package Version 4.6.14. Available online: https://www.rdocumentation.org/packages/randomForest (accessed on 18 January 2020).
38. Hyndman, R.J.; Khandakar, Y. Automatic Time Series Forecasting: The Forecast Package for R; Monash University, Department of Econometrics and Business Statistics: Clayton, Australia, 2018.
39. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y. Xgboost: Extreme gradient boosting. R Package Version 0.90.0.2. Available online: https://www.rdocumentation.org/packages/xgboost (accessed on 18 January 2020).
40. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson: London, UK, 2014.
41. Niu, D.; Liu, Y.; Cai, T.; Zheng, X.; Liu, T.; Zhou, S. A Novel Distributed Duration-Aware LSTM for Large Scale Sequential Data Analysis. In CCF Conference on Big Data; Jin, H., Lin, X., Cheng, X., Shi, X., Xiao, N., Huang, Y., Eds.; Springer: Singapore, 2019; Volume 1120.
42. Lecuyer, G.; Ragot, M.; Martin, N.; Launay, L.; Jannin, P. Assisted phase and step annotation for surgical videos. Int. J. Comput. Assist. Radiol. Surg. **2020**, 15, 673–680.
43. Boutaba, R.; Salahuddin, M.A.; Limam, N.; Ayoubi, S.; Shahriar, N.; Estrada-Solano, F.; Caicedo, O.M. A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities. J. Internet Serv. Appl. **2018**, 9, 16.

**Figure 2.** The nationwide outpatient visits for influenza in Taiwan from the first week of 2005 to the second week of 2020.

**Figure 3.** The influenza outpatient visits for the Taipei, North, and Central regions in Taiwan from the first week of 2005 to the second week of 2020.

**Figure 4.** The influenza outpatient visits for the South, East, and Kaoping regions in Taiwan from the first week of 2005 to the second week of 2020.

Metrics | Calculation * |
---|---|
MAE | $\mathrm{MAE}=\frac{1}{m}\sum_{j=1}^{m}\lvert F_{j}-A_{j}\rvert$ |
RMSE | $\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(F_{j}-A_{j}\right)^{2}}$ |
MAPE | $\mathrm{MAPE}=\frac{1}{m}\sum_{j=1}^{m}\left\lvert\frac{F_{j}-A_{j}}{A_{j}}\right\rvert\times 100$ |
MASE | $\mathrm{MASE}=\frac{\frac{1}{m}\sum_{j=1}^{m}\lvert F_{j}-A_{j}\rvert}{\frac{1}{n-1}\sum_{t=2}^{n}\lvert A_{t}-A_{t-1}\rvert}$ |
RMSPE | $\mathrm{RMSPE}=\sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(\frac{F_{j}-A_{j}}{A_{j}}\right)^{2}}$ |
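The five accuracy measures in the table above can be written as a few short functions. The sketch below is a plain-Python illustration, not the code used in the study. It assumes that F and A are equal-length sequences of forecasts and actual values over the test period, and that the MASE denominator is the mean absolute one-step naive error over a separate in-sample (training) series; the argument name `train` and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def mae(f: Sequence[float], a: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(x - y) for x, y in zip(f, a)) / len(a)


def rmse(f: Sequence[float], a: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(f, a)) / len(a))


def mape(f: Sequence[float], a: Sequence[float]) -> float:
    """Mean absolute percentage error, scaled by 100 as in the table."""
    return 100 * sum(abs((x - y) / y) for x, y in zip(f, a)) / len(a)


def rmspe(f: Sequence[float], a: Sequence[float]) -> float:
    """Root mean squared percentage error (as a fraction, no x100 factor)."""
    return math.sqrt(sum(((x - y) / y) ** 2 for x, y in zip(f, a)) / len(a))


def mase(f: Sequence[float], a: Sequence[float], train: Sequence[float]) -> float:
    """MAE scaled by the in-sample MAE of the one-step naive forecast.

    `train` is assumed to be the training series used for the scaling term.
    """
    naive = sum(abs(train[t] - train[t - 1]) for t in range(1, len(train)))
    naive /= len(train) - 1
    return mae(f, a) / naive


# Toy numbers only, chosen so the results are easy to verify by hand.
forecast = [110.0, 90.0]
actual = [100.0, 100.0]
train = [100.0, 110.0, 100.0]

results = {
    "MAE": mae(forecast, actual),
    "RMSE": rmse(forecast, actual),
    "MAPE": mape(forecast, actual),
    "MASE": mase(forecast, actual, train),
    "RMSPE": rmspe(forecast, actual),
}
print(results)
```

A MASE above 1 means the forecast has a larger mean absolute error than the in-sample naive forecast that simply repeats the previous week's value, which is useful context when reading the MASE columns of the result tables.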

Method | MAE | RMSE | MAPE | MASE | RMSPE |
---|---|---|---|---|---|
TDHS-RF | 4693 | 8702 | 0.126 | 1.167 | 0.188 |
TDHS-XGB | 4409 | 8050 | 0.113 | 1.096 | 0.168 |
TDHS-MARS | 6415 | 14,000 | 0.159 | 1.595 | 0.226 |
RF | 5040 | 9179 | 0.133 | 1.253 | 0.194 |
XGB | 4432 | 8029 | 0.117 | 1.102 | 0.173 |
MARS | 5158 | 9861 | 0.133 | 1.282 | 0.201 |
ARIMA | 13,986 | 20,774 | 0.417 | 3.477 | 0.512 |

Step (Weeks) Ahead | Method | MAE | RMSE | MAPE | MASE | RMSPE |
---|---|---|---|---|---|---|
2 | TDHS-RF | 6451 | 10,614 | 0.176 | 1.599 | 0.240 |
2 | TDHS-XGB | 6439 | 11,018 | 0.163 | 1.596 | 0.225 |
2 | TDHS-MARS | 8077 | 14,819 | 0.205 | 2.002 | 0.268 |
2 | RF | 6858 | 11,342 | 0.186 | 1.700 | 0.264 |
2 | XGB | 6489 | 11,009 | 0.168 | 1.608 | 0.231 |
2 | MARS | 7071 | 11,689 | 0.185 | 1.753 | 0.247 |
2 | ARIMA | 14,037 | 20,818 | 0.418 | 3.479 | 0.513 |
3 | TDHS-RF | 8222 | 13,098 | 0.227 | 2.033 | 0.316 |
3 | TDHS-XGB | 8329 | 13,764 | 0.211 | 2.059 | 0.291 |
3 | TDHS-MARS | 9671 | 16,012 | 0.252 | 2.391 | 0.328 |
3 | RF | 8691 | 13,857 | 0.241 | 2.149 | 0.346 |
3 | XGB | 8564 | 14,119 | 0.222 | 2.118 | 0.312 |
3 | MARS | 8780 | 14,054 | 0.234 | 2.171 | 0.323 |
3 | ARIMA | 14,082 | 20,862 | 0.419 | 3.482 | 0.514 |
4 | TDHS-RF | 9994 | 15,609 | 0.280 | 2.463 | 0.401 |
4 | TDHS-XGB | 10,178 | 16,550 | 0.264 | 2.508 | 0.375 |
4 | TDHS-MARS | 11,114 | 17,746 | 0.295 | 2.739 | 0.404 |
4 | RF | 10,478 | 16,595 | 0.295 | 2.582 | 0.435 |
4 | XGB | 10,433 | 17,027 | 0.274 | 2.571 | 0.401 |
4 | MARS | 10,462 | 16,654 | 0.285 | 2.578 | 0.414 |
4 | ARIMA | 14,118 | 20,903 | 0.419 | 3.479 | 0.514 |

Variable | Relative Importance | Variable | Relative Importance |
---|---|---|---|
X_{T,t−1} | 19.06% | X_{C,t−4} | 0.88% |
X_{S,t−1} | 17.45% | X_{C,t−3} | 0.85% |
X_{N,t−1} | 15.43% | X_{T,t−4} | 0.81% |
X_{E,t−1} | 9.86% | X_{S,t−3} | 0.74% |
X_{C,t−1} | 8.72% | X_{K,t−2} | 0.74% |
X_{T,t−2} | 7.32% | X_{N,t−3} | 0.65% |
X_{E,t−2} | 3.84% | X_{K,t−3} | 0.65% |
X_{S,t−2} | 3.26% | X_{S,t−4} | 0.64% |
X_{N,t−2} | 2.79% | X_{N,t−4} | 0.57% |
X_{C,t−2} | 1.89% | X_{E,t−3} | 0.53% |
X_{K,t−1} | 1.33% | X_{E,t−4} | 0.52% |
X_{T,t−3} | 0.92% | X_{K,t−4} | 0.52% |
Total | 100.00% | | |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lee, T.-S.; Chen, I.-F.; Chang, T.-J.; Lu, C.-J.
Forecasting Weekly Influenza Outpatient Visits Using a Two-Dimensional Hierarchical Decision Tree Scheme. *Int. J. Environ. Res. Public Health* **2020**, *17*, 4743.
https://doi.org/10.3390/ijerph17134743
