Using Internet Search Data to Forecast COVID-19 Trends: A Systematic Review
Abstract
1. Introduction
2. Related Studies
2.1. COVID-19 Forecasting Models and Related Literature
2.2. Search Strategy and Selection Criteria
- Forecasting studies of COVID-19: papers that provided forecasts for a specific region of the world and a specific forecast horizon. The search terms used were as follows: COVID-19, coronavirus, SARS-CoV-2, prediction models, forecasting models, predictive analysis.
- Data-driven studies including internet search: broadly defined as papers that incorporated COVID-19-related data, internet search data, and other exogenous information into the setup or fitting of the model. Here, internet search information was broadly defined as datasets that reflect the online search behaviors of a population of interest. The search terms used were as follows: internet search data, internet search information, Google Trends, online search behavior, COVID-19 time series information, mobility data.
3. Data Acquisition and Preprocessing
3.1. COVID-19-Related Data (Forecasting Target)
- Confirmed cases: daily/weekly time series of new confirmed COVID-19 cases (infections) at different geographical resolutions. For example, many U.S. studies used the JHU CSSE COVID-19 dataset [57] as the official ground truth. Confirmed case counts in China were obtained from China CDC [58], and confirmed case counts in other regions were obtained from ECDC [59].
- Reported deaths: daily/weekly time series of newly reported COVID-19 deaths at different geographical resolutions. The data sources were similar to those for the confirmed cases time series above.
- Hospitalizations: COVID-19 hospitalizations generally refer to the number of patients newly admitted to hospitals each day or week who tested positive for COVID-19, reported at various geographical resolutions. Hospitalizations reflect the number of severe cases, so tracking them helps policymakers anticipate the potential saturation of hospital systems and helps local public health officials make timely decisions about allocating healthcare resources such as ventilators, ICU beds, personal protective equipment, and personnel. Hospitalization data are maintained by different organizations across regions. For example, the U.S. Department of Health and Human Services (HHS) [60] releases the ground truth information on new hospital admissions in the U.S.
- Vaccination rates: percentage of the population that is fully vaccinated, reported at daily/weekly frequency by different health organizations in each region. For example, U.S. vaccination rates are reported daily by the CDC [61].
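Because the targets above are reported at either daily or weekly frequency, a common preprocessing step is to aggregate daily new counts into weekly totals. The following is a minimal sketch of that step, not the procedure of any reviewed study: the function names, the Sunday-to-Saturday week convention, and the toy counts are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict


def week_ending(day: date) -> date:
    """Return the Saturday closing the Sunday-Saturday week that contains `day`.

    The Sunday-Saturday convention is an assumption of this sketch.
    """
    # date.weekday(): Monday=0, ..., Saturday=5, Sunday=6
    days_until_saturday = (5 - day.weekday()) % 7
    return day + timedelta(days=days_until_saturday)


def daily_to_weekly(daily_counts: Dict[date, int]) -> Dict[date, int]:
    """Sum daily new counts into weekly totals keyed by week-ending date."""
    weekly: Dict[date, int] = defaultdict(int)
    for day, count in daily_counts.items():
        weekly[week_ending(day)] += count
    return dict(sorted(weekly.items()))


# Hypothetical daily new cases for illustration only (not real data).
start = date(2021, 1, 3)  # a Sunday
daily = {start + timedelta(days=i): 10 + i for i in range(14)}
weekly = daily_to_weekly(daily)
for week_end, total in weekly.items():
    print(week_end.isoformat(), total)
```

Keying each week by its closing date keeps the weekly series aligned with the date on which the full week of data becomes available, which matters when the series is later paired with search volumes observed over the same window.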
3.2. Internet Search Data
3.2.1. Query Selection
3.2.2. Search Volume Data Preprocessing
3.3. Other Auxiliary Data Sources
4. Methods
4.1. Prediction Models
4.1.1. Statistical Models
4.1.2. Deep Learning Models
4.2. Evaluation Process and Metrics
5. Results
- Persistence (naïve) rule: a rule-based model that uses the most recently observed COVID-19 target count (cases, deaths, or hospitalizations) as the prediction for the future target date.
- Time series baseline: generally refers to linear models such as the autoregressive moving average model (ARMA) [71] and its variants (AR, MA, ARIMA, etc.), which utilize only the COVID-19-related time series information described in Section 3.1.
- Simpler version of proposed model: generally refers to a simpler version of the proposed forecasting model after removing one or multiple components in the model structure (data component, architecture component, etc.).
- Other publicly available benchmarks: generally refers to established and publicly available benchmark predictions, such as those of the COVID-19 Forecast Hub [5].
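The first two baselines in the list above can be sketched in a few lines. This is an illustrative sketch only: the function names, the `horizon` parameter, and the toy series are assumptions, and the AR(1) model fitted by ordinary least squares stands in for the richer ARMA/ARIMA variants that the reviewed studies typically used.

```python
from typing import List, Sequence, Tuple


def persistence_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """Persistence rule: repeat the last observed value for every future step."""
    return [series[-1]] * horizon


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Fit y_t = c + phi * y_{t-1} by ordinary least squares; return (c, phi).

    Requires at least three observations and a non-constant series.
    """
    x = list(series[:-1])  # lagged values y_{t-1}
    y = list(series[1:])   # current values y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def ar1_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """Iterate the fitted AR(1) recursion forward for `horizon` steps."""
    c, phi = fit_ar1(series)
    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


# Hypothetical weekly counts growing by 10% per week (not real data).
toy_series = [100.0, 110.0, 121.0, 133.1, 146.41]
print(persistence_forecast(toy_series, horizon=2))
print(ar1_forecast(toy_series, horizon=2))
```

On a steadily growing series such as this toy example, the persistence rule lags behind the trend while the autoregressive fit extrapolates it, which is why both are usually reported side by side as reference points.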
5.1. Importance of Internet Search Component
5.2. Internet Search Information Serving as Early-Warning Signals
6. Discussion
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Weekly Epidemiological Update on COVID-19. 24 August 2022. Available online: https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---24-august-2022 (accessed on 12 November 2022).
- Moynihan, R.; Sanders, S.; Michaleff, Z.A.; Scott, A.M.; Clark, J.; To, E.J.; Jones, M.; Kitchener, E.; Fox, M.; Johansson, M. Impact of COVID-19 pandemic on utilisation of healthcare services: A systematic review. BMJ Open 2021, 11, e045343.
- Balest, J.; Stawinoga, A.E. Social practices and energy use at home during the first Italian lockdown due to COVID-19. Sustain. Cities Soc. 2022, 78, 103536.
- Shinde, G.R.; Kalamkar, A.B.; Mahalle, P.N.; Dey, N.; Chaki, J.; Hassanien, A.E. Forecasting models for coronavirus disease (COVID-19): A survey of the state-of-the-art. SN Comput. Sci. 2020, 1, 197.
- Ray, E.L.; Wattanachit, N.; Niemi, J.; Kanji, A.H.; House, K.; Cramer, E.Y.; Bracher, J.; Zheng, A.; Yamana, T.K.; Xiong, X. Ensemble forecasts of coronavirus disease 2019 (COVID-19) in the US. medRxiv 2020.
- Jahja, M.; Farrow, D.; Rosenfeld, R.; Tibshirani, R.J. Kalman filter, sensor fusion, and constrained regression: Equivalences and insights. Adv. Neural Inf. Process. Syst. 2019, 32, 1–10.
- Jin, X.; Wang, Y.X.; Yan, X. Inter-series attention model for COVID-19 forecasting. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), Virtual, 29 April–1 May 2021.
- Rodriguez, A.; Tabassum, A.; Cui, J.; Xie, J.; Ho, J.; Agarwal, P.; Adhikari, B.; Prakash, B.A. Deepcovid: An operational deep learning-driven framework for explainable real-time COVID-19 forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2021.
- Arik, S.; Li, C.-L.; Yoon, J.; Sinha, R.; Epshteyn, A.; Le, L.; Menon, V.; Singh, S.; Zhang, L.; Nikoltchev, M. Interpretable sequence learning for COVID-19 forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 18807–18818.
- Abbott, S.; Hellewell, J.; Thompson, R.N.; Sherratt, K.; Gibbs, H.P.; Bosse, N.I.; Munday, J.D.; Meakin, S.; Doughty, E.L.; Chun, J.Y. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts. Wellcome Open Res. 2020, 5, 112.
- Yang, W.; Kandula, S.; Huynh, M.; Greene, S.K.; Van Wye, G.; Li, W.; Chan, H.T.; McGibbon, E.; Yeung, A.; Olson, D. Estimating the infection-fatality risk of SARS-CoV-2 in New York City during the spring 2020 pandemic wave: A model-based analysis. Lancet Infect. Dis. 2021, 21, 203–212.
- Yang, S.; Santillana, M.; Kou, S.C. Accurate estimation of influenza epidemics using Google search data via ARGO. Proc. Natl. Acad. Sci. USA 2015, 112, 14473–14478.
- Santillana, M.; Nguyen, A.T.; Dredze, M.; Paul, M.J.; Nsoesie, E.O.; Brownstein, J.S. Combining search, social media, and traditional data sources to improve influenza surveillance. PLoS Comput. Biol. 2015, 11, e1004513.
- Lu, F.S.; Hattab, M.W.; Clemente, C.L.; Biggerstaff, M.; Santillana, M. Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nat. Commun. 2019, 10, 147.
- Ginsberg, J.; Mohebbi, M.H.; Patel, R.S.; Brammer, L.; Smolinski, M.S.; Brilliant, L. Detecting influenza epidemics using search engine query data. Nature 2009, 457, 1012–1014.
- Ning, S.; Yang, S.; Kou, S. Accurate regional influenza epidemics tracking using Internet search data. Sci. Rep. 2019, 9, 5238.
- Yang, S.; Kou, S.C.; Lu, F.; Brownstein, J.S.; Brooke, N.; Santillana, M. Advances in using Internet searches to track dengue. PLoS Comput. Biol. 2017, 13, e1005607.
- Yang, S.; Ning, S.; Kou, S. Use Internet search data to accurately track state level influenza epidemics. Sci. Rep. 2021, 11, 4023.
- Yang, S.; Santillana, M.; Brownstein, J.S.; Gray, J.; Richardson, S.; Kou, S. Using electronic health records and Internet search information for accurate influenza forecasting. BMC Infect. Dis. 2017, 17, 332.
- Venna, S.R.; Tavanaei, A.; Gottumukkala, R.N.; Raghavan, V.V.; Maida, A.S.; Nichols, S. A novel data-driven model for real-time influenza forecasting. IEEE Access 2018, 7, 7691–7701.
- Clemente, L.; Lu, F.; Santillana, M. Improved real-time influenza surveillance: Using internet search data in eight Latin American countries. JMIR Public Health Surveill. 2019, 5, e12214.
- Dugas, A.F.; Jalalpour, M.; Gel, Y.; Levin, S.; Torcaso, F.; Igusa, T.; Rothman, R.E. Influenza forecasting with Google Flu Trends. PLoS ONE 2013, 8, e56176.
- Osthus, D.; Hickmann, K.S.; Caragea, P.C.; Higdon, D.; Del Valle, S.Y. Forecasting seasonal influenza with a state-space SIR model. Ann. Appl. Stat. 2017, 11, 202.
- Aramaki, E.; Maskawa, S.; Morita, M. Influenza patients are invisible in the web: Traditional model still improves the state of the art web based influenza surveillance. In Proceedings of the 2012 AAAI Spring Symposium Series, Palo Alto, CA, USA, 26–28 March 2012.
- Young, S.D.; Zhang, Q. Using search engine big data for predicting new HIV diagnoses. PLoS ONE 2018, 13, e0199527.
- Altizer, S.; Dobson, A.; Hosseini, P.; Hudson, P.; Pascual, M.; Rohani, P. Seasonality and the dynamics of infectious diseases. Ecol. Lett. 2006, 9, 467–484.
- Viguerie, A.; Lorenzo, G.; Auricchio, F.; Baroli, D.; Hughes, T.J.; Patton, A.; Reali, A.; Yankeelov, T.E.; Veneziani, A. Simulating the spread of COVID-19 via a spatially-resolved susceptible–exposed–infected–recovered–deceased (SEIRD) model with heterogeneous diffusion. Appl. Math. Lett. 2021, 111, 106617.
- Yang, Z.; Zeng, Z.; Wang, K.; Wong, S.-S.; Liang, W.; Zanin, M.; Liu, P.; Cao, X.; Gao, Z.; Mai, Z. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions. J. Thorac. Dis. 2020, 12, 165.
- He, S.; Peng, Y.; Sun, K. SEIR modeling of the COVID-19 and its dynamics. Nonlinear Dyn. 2020, 101, 1667–1680.
- Moein, S.; Nickaeen, N.; Roointan, A.; Borhani, N.; Heidary, Z.; Javanmard, S.H.; Ghaisari, J.; Gheisari, Y. Inefficiency of SIR models in forecasting COVID-19 epidemic: A case study of Isfahan. Sci. Rep. 2021, 11, 4725.
- Kumar, N.; Susan, S. COVID-19 pandemic prediction using time series forecasting models. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020; pp. 1–7.
- Maleki, M.; Mahmoudi, M.R.; Wraith, D.; Pho, K.-H. Time series modelling to forecast the confirmed and recovered cases of COVID-19. Travel Med. Infect. Dis. 2020, 37, 101742.
- Yousaf, M.; Zahir, S.; Riaz, M.; Hussain, S.M.; Shah, K. Statistical analysis of forecasting COVID-19 for upcoming month in Pakistan. Chaos Solitons Fractals 2020, 138, 109926.
- Papastefanopoulos, V.; Linardatos, P.; Kotsiantis, S. COVID-19: A comparison of time series methods to forecast percentage of active cases per population. Appl. Sci. 2020, 10, 3880.
- Alazab, M.; Awajan, A.; Mesleh, A.; Abraham, A.; Jatana, V.; Alhyari, S. COVID-19 prediction and detection using deep learning. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 2020, 12, 168–181.
- Ghahramani, M.; Pilla, F. Leveraging artificial intelligence to analyze the COVID-19 distribution pattern based on socio-economic determinants. Sustain. Cities Soc. 2021, 69, 102848.
- Er, S.; Yang, S.; Zhao, T. COUnty aggRegation mixup AuGmEntation (COURAGE) COVID-19 prediction. Sci. Rep. 2021, 11, 14262.
- Mangono, T.; Smittenaar, P.; Caplan, Y.; Huang, V.S.; Sutermaster, S.; Kemp, H.; Sgaier, S.K. Information-seeking patterns during the COVID-19 pandemic across the United States: Longitudinal analysis of Google Trends data. J. Med. Internet Res. 2021, 23, e22933.
- Li, C.; Chen, L.J.; Chen, X.; Zhang, M.; Pang, C.P.; Chen, H. Retrospective analysis of the possibility of predicting the COVID-19 outbreak from Internet searches and social media data, China, 2020. Eurosurveillance 2020, 25, 2000199.
- Rufai, S.R.; Bunce, C. World leaders’ usage of Twitter in response to the COVID-19 pandemic: A content analysis. J. Public Health 2020, 42, 510–516.
- Mavragani, A. Tracking COVID-19 in Europe: Infodemiology approach. JMIR Public Health Surveill. 2020, 6, e18941.
- Yousefinaghani, S.; Dara, R.; Mubareka, S.; Sharif, S. Prediction of COVID-19 waves using social media and Google search: A case study of the US and Canada. Front. Public Health 2021, 9, 656635.
- Rovetta, A.; Bhagavathula, A.S. COVID-19-related web search behaviors and infodemic attitudes in Italy: Infodemiological study. JMIR Public Health Surveill. 2020, 6, e19374.
- Effenberger, M.; Kronbichler, A.; Shin, J.I.; Mayer, G.; Tilg, H.; Perco, P. Association of the COVID-19 pandemic with internet search volumes: A Google Trends™ analysis. Int. J. Infect. Dis. 2020, 95, 192–197.
- Liu, D.; Clemente, L.; Poirier, C.; Ding, X.; Chinazzi, M.; Davis, J.; Vespignani, A.; Santillana, M. Real-time forecasting of the COVID-19 outbreak in Chinese provinces: Machine learning approach using novel digital data and estimates from mechanistic models. J. Med. Internet Res. 2020, 22, e20285.
- Ayyoubzadeh, S.M.; Ayyoubzadeh, S.M.; Zahedi, H.; Ahmadi, M.; Kalhori, S.R.N. Predicting COVID-19 incidence through analysis of Google Trends data in Iran: Data mining and deep learning pilot study. JMIR Public Health Surveill. 2020, 6, e18828.
- Prasanth, S.; Singh, U.; Kumar, A.; Tikkiwal, V.A.; Chong, P.H. Forecasting spread of COVID-19 using Google Trends: A hybrid GWO-deep learning approach. Chaos Solitons Fractals 2021, 142, 110336.
- Rabiolo, A.; Alladio, E.; Morales, E.; McNaught, A.I.; Bandello, F.; Afifi, A.A.; Marchese, A. Forecasting the COVID-19 epidemic by integrating symptom search behavior into predictive models: Infoveillance study. J. Med. Internet Res. 2021, 23, e28876.
- Lampos, V.; Majumder, M.S.; Yom-Tov, E.; Edelstein, M.; Moura, S.; Hamada, Y.; Rangaka, M.X.; McKendry, R.A.; Cox, I.J. Tracking COVID-19 using online search. NPJ Digit. Med. 2021, 4, 17.
- Turk, P.J.; Tran, T.P.; Rose, G.A.; McWilliams, A. A predictive internet-based model for COVID-19 hospitalization census. Sci. Rep. 2021, 11, 5106.
- Ma, S.; Yang, S. COVID-19 forecasts using internet search information in the United States. Sci. Rep. 2022, 12, 11539.
- Wang, T.; Ma, S.; Baek, S.; Yang, S. COVID-19 hospitalizations forecasts using internet search data. Sci. Rep. 2022, 12, 9661.
- Ma, S.; Ning, S.; Yang, S. COVID-19 and Influenza Joint Forecasts Using Internet Search Information in the United States. arXiv 2022, arXiv:2202.02621.
- Google Scholar. Available online: https://scholar.google.com (accessed on 12 November 2022).
- Scopus. Available online: https://www.scopus.com/home.uri (accessed on 12 November 2022).
- PubMed National Library of Medicine. Available online: https://pubmed.ncbi.nlm.nih.gov (accessed on 12 November 2022).
- Dong, E.; Du, H.; Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 2020, 20, 533–534.
- GitHub. Models of Infectious Disease Agent Study Association. Midas-Network/COVID-19. Available online: https://github.com/midas-network/COVID-19/tree/master/data/cases (accessed on 31 August 2022).
- European COVID-19 Forecast Hub. Available online: https://covid19forecasthub.eu/ (accessed on 31 August 2022).
- U.S. Department of Health & Human Services. Healthdata.gov COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries. 2021. Available online: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/g62h-syeh (accessed on 12 November 2022).
- Centers for Disease Control and Prevention. COVID-19 Vaccinations in the United States, County. Available online: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh (accessed on 31 August 2022).
- FAQ about Google Trends Data. Available online: https://support.google.com/trends/answer/4365533?hl=en&ref_topic=6248052 (accessed on 31 August 2022).
- Baidu. Baidu Index. Available online: http://index.baidu.com (accessed on 31 August 2022).
- Fu, L.; Wang, B.; Yuan, T.; Chen, X.; Ao, Y.; Fitzpatrick, T.; Li, P.; Zhou, Y.; Lin, Y.-f.; Duan, Q. Clinical characteristics of coronavirus disease 2019 (COVID-19) in China: A systematic review and meta-analysis. J. Infect. 2020, 80, 656–665.
- Bento, A.I.; Nguyen, T.; Wing, C.; Lozano-Rojas, F.; Ahn, Y.-Y.; Simon, K. Evidence from internet search data shows information-seeking responses to news of local COVID-19 cases. Proc. Natl. Acad. Sci. USA 2020, 117, 11220–11222.
- Zhu, S.; Bukharin, A.; Xie, L.; Santillana, M.; Yang, S.; Xie, Y. High-resolution Spatio-temporal Model for County-level COVID-19 Activity in the US. ACM Trans. Manag. Inf. Syst. 2021, 12, 1–20.
- Ilin, C.; Annan-Phan, S.; Tai, X.H.; Mehra, S.; Hsiang, S.; Blumenstock, J.E. Public mobility data enables COVID-19 forecasting and management at local and global scales. Sci. Rep. 2021, 11, 13531.
- Google LLC. Google COVID-19 Community Mobility Reports. Available online: https://www.google.com/covid19/mobility/ (accessed on 31 August 2022).
- Apple. COVID-19 Mobility Trends Reports. Available online: https://www.apple.com/covid19/mobility (accessed on 31 August 2022).
- Facebook. Facebook Data for Good Mobility Dashboard. COVID-19 Mobility Data Network. Available online: https://www.covid19mobility.org/dashboards/facebook-data-for-good/ (accessed on 31 August 2022).
- Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 2020.
- Potter, S. Nonlinear time series modelling: An introduction. J. Econ. Surv. 1999, 13, 505–528.
- Seeger, M. Gaussian processes for machine learning. Int. J. Neural Syst. 2004, 14, 69–106.
- Zou, B.; Lampos, V.; Cox, I. Multi-task learning improves disease models from web search. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 87–96.
- Lampos, V.; Miller, A.C.; Crossan, S.; Stefansen, C. Advances in nowcasting influenza-like illness rates using search query logs. Sci. Rep. 2015, 5, 12760.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 8–13 December 2014.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Internal Representations by Error Propagation; La Jolla Institute for Cognitive Science, California University San Diego: La Jolla, CA, USA, 1985.
- Bracher, J.; Ray, E.L.; Gneiting, T.; Reich, N.G. Evaluating epidemic forecasts in an interval format. PLoS Comput. Biol. 2021, 17, e1008618.
- Pollett, S.; Johansson, M.A.; Reich, N.G.; Brett-Major, D.; Del Valle, S.Y.; Venkatramanan, S.; Lowe, R.; Porco, T.; Berry, I.M.; Deshpande, A. Recommended reporting items for epidemic forecasting and prediction research: The EPIFORGE 2020 guidelines. PLoS Med. 2021, 18, e1003793.
- Cramer, E.Y.; Ray, E.L.; Lopez, V.K.; Bracher, J.; Brennen, A.; Castro Rivadeneira, A.J.; Gerding, A.; Gneiting, T.; House, K.H.; Huang, Y. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc. Natl. Acad. Sci. USA 2022, 119, e2113561119.
- McGough, S.F.; Brownstein, J.S.; Hawkins, J.B.; Santillana, M. Forecasting Zika incidence in the 2016 Latin America outbreak combining traditional disease surveillance with search, social media, and news report data. PLoS Negl. Trop. Dis. 2017, 11, e0005295.
- Teng, Y.; Bi, D.; Xie, G.; Jin, Y.; Huang, Y.; Lin, B.; An, X.; Feng, D.; Tong, Y. Dynamic forecasting of Zika epidemics using Google Trends. PLoS ONE 2017, 12, e0165085.
- Carneiro, H.A.; Mylonakis, E. Google trends: A web-based tool for real-time surveillance of disease outbreaks. Clin. Infect. Dis. 2009, 49, 1557–1564.
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
- Chotpitayasunondh, T.; Fischer, T.K.; Heraud, J.M.; Hurt, A.C.; Monto, A.S.; Osterhaus, A.; Shu, Y.; Tam, J.S. Influenza and COVID-19: What does co-existence mean? Influenza Other Respir. Viruses 2021, 15, 407–412.
- Monkeypox Signs and Symptoms. Available online: https://www.cdc.gov/poxvirus/monkeypox/index.html (accessed on 12 November 2022).
Study | Objective | Type of Model | Online Search Data Source | Other Data Inputs | Quality Assessment |
---|---|---|---|---|---|
Liu et al. (2020) [45] | Forecast 2-day-ahead COVID-19 cases in all 32 Chinese provinces. | LASSO | Baidu | China CDC, Mobility, Media Cloud, COVID-19 cases | RMSE, Correlation: outperforms persistence and AR baseline models in both metrics |
Ayyoubzadeh et al. (2020) [46] | Forecast 1-day-ahead COVID-19 confirmed cases in Iran | Linear regression, LSTM | Google Trends | COVID-19 cases | RMSE: linear regression with Google search data performs better than LSTM with Google search data |
Prasanth et al. (2021) [47] | Forecast 1-week-ahead COVID-19 cases and deaths in U.S., U.K., and India | LSTM | Google Trends | COVID-19 cases and deaths | RMSE, MAPE: LSTM achieves a significant error reduction relative to the ARIMA baseline model |
Rabiolo et al. (2021) [48] | Investigate the relationship between Google Trends symptom searches and COVID-19 cases and deaths; use ARIMA to predict COVID-19 cases and deaths in Australia, Brazil, France, Iran, India, Italy, South Africa, U.K., and U.S. up to 14 days ahead | PCA and ARIMA | Google Trends | COVID-19 cases and deaths | RMSE: models with search terms and COVID-19 time series information outperform those without |
Lampos et al. (2021) [49] | Forecast 1- and 2-week-ahead COVID-19 deaths in U.S., U.K., Australia, Canada, France, Greece, and South Africa. Produce point estimates and CI | Gaussian process (GP) | Google Trends | Media Cloud, COVID-19 deaths | MAE: the inclusion of search queries in the GP autoregressive model significantly improves its performance |
Turk et al. (2021) [50] | 14-day-ahead COVID-19 hospitalizations forecast in Greater Charlotte market area in U.S. | Vector autoregression (variant) | Google Trends | Mobility, health bot, COVID-19 hospitalizations | MAPE: outperform ARIMA (time series benchmark) model |
Ma and Yang (2022) [51] | Forecast 1–4-week-ahead COVID-19 deaths at U.S. national and state levels. Produce point estimates and CI | LASSO, spatial-temporal statistical approach | Google Trends | COVID-19 cases and deaths | RMSE, MAE, Correlation: outperforms persistence and time series benchmarks, and performs reasonably against other CDC Forecast Hub methods |
Wang et al. (2022) [52] | Forecast 1–2-week-ahead COVID-19 hospital admissions at U.S. national and state levels. Produce point estimates | LASSO, spatial-temporal statistical approach | Google Trends | COVID-19 cases, vaccination rate | RMSE, MAE, Correlation: outperforms persistence and time series benchmarks, and performs reasonably against other CDC Forecast Hub methods |
Ma et al. (2022) [53] | Forecast 1–4-week-ahead COVID-19 cases, deaths, and 1-week-ahead influenza. Produce point estimates and PI | LASSO, spatial-temporal statistical approach | Google Trends | COVID-19 cases, deaths, %ILI | RMSE, MAE, Correlation: outperforms persistence and time series benchmarks, and performs reasonably against other CDC Forecast Hub methods |
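
The quality assessments in the table rely on four point-forecast metrics: RMSE, MAE, MAPE, and correlation. The sketch below shows how these are commonly computed; it uses the standard textbook definitions, which may differ in detail from those of individual studies, and the function names and toy numbers are assumptions made for illustration.

```python
import math
from typing import Sequence


def rmse(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(truth, pred)) / len(truth))


def mae(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(truth, pred)) / len(truth)


def mape(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any truth value is 0."""
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(truth, pred)) / len(truth)


def pearson(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient between truth and prediction."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical observed and forecast values (not real data).
truth = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 330.0, 380.0]
print("RMSE:", rmse(truth, pred))
print("MAE:", mae(truth, pred))
print("MAPE (%):", mape(truth, pred))
print("Correlation:", pearson(truth, pred))
```

RMSE and MAE are scale-dependent, so they are best compared within a single region and target, whereas MAPE and correlation allow rough comparison across regions with very different case counts.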
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, S.; Sun, Y.; Yang, S. Using Internet Search Data to Forecast COVID-19 Trends: A Systematic Review. Analytics 2022, 1, 210-227. https://doi.org/10.3390/analytics1020014