# Leveraging Geographically Distributed Data for Influenza and SARS-CoV-2 Non-Parametric Forecasting

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data and Data Preprocessing

#### 2.2. Empirical Dynamic Modeling

#### 2.3. Meta-Parameters, Performance Evaluation, and Cross-Validation

## 3. Results

#### 3.1. Pooling Geographically Distributed Information Enhances EDM Performance on Influenza Data

#### 3.2. Exploring EDM on COVID-19 Data

#### 3.3. EDM as a Tool to Characterize the Epidemic Unfolding

## 4. Discussion

## 5. Concluding Remarks

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A. Data Information

**Table A1.** Length (number of data points) of each series. The influenza series span from the early 2000s to the beginning of 2020, but some series lack data at the beginning or the end; the span of each is given in parentheses (first year to last year). Influenza data are weekly and cover only September to June. COVID-19 data are daily, from the beginning of the pandemic until 19 April 2021.

Region | Influenza | COVID-19 |
---|---|---|
Andalusia (AN) | 679 (00-20) | 429 |
Aragon (AR) | 615 (00-18) | 421 |
Asturias (AS) | 549 (04-20) | 412 |
Balearic Islands (IB) | 672 (00-20) | 416 |
Basque Country (PV) | 661 (00-20) | 429 |
Canary Islands (CN) | 582 (03-20) | 419 |
Cantabria (CB) | 539 (05-20) | 415 |
Castile and León (CL) | 681 (00-20) | 423 |
Castile-La Mancha (CM) | 681 (00-20) | 416 |
Catalonia (CT) | 495 (05-20) | 442 |
Ceuta (CE) | 489 (05-20) | 402 |
Community of Madrid (MD) | 661 (00-20) | 457 |
Extremadura (EX) | 582 (03-20) | 416 |
Galicia (GA) | - | 420 |
La Rioja (RI) | 549 (04-20) | 417 |
Melilla (ML) | 356 (09-20) | 405 |
Navarre (NC) | 549 (04-20) | 418 |
Region of Murcia (MC) | - | 415 |
Valencian Community (VC) | 679 (00-20) | 429 |

**Figure 1.** Empirical time series of the influenza and SARS-CoV-2 epidemics. (**a**) Examples of historical flu time series over 20 years in three Spanish regions: Basque Country (black), Community of Madrid (red), and Catalonia (blue). Data from each season (spanning from week 40 of a given year to week 20 of the next year) have been concatenated, omitting the warm season (during which incidence is negligible). The gray area is expanded in (**b**) to show the yearly exponential rise, peak, and fall that characterize the influenza cycle; the data shown are from the 2015/16 season. (**c**) Evolution of the SARS-CoV-2 pandemic in the same three regions, showing a pattern of several waves within a single year that are not always in phase across regions.
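
The season concatenation described in panel (a) can be illustrated with a minimal sketch. It assumes the raw data arrive as chronologically sorted `(year, week, incidence)` tuples; that record layout and the function names `in_flu_season` and `concatenate_seasons` are hypothetical and not taken from the authors' pipeline.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, int, float]  # (year, ISO week, incidence); assumed layout


def in_flu_season(week: int, start_week: int = 40, end_week: int = 20) -> bool:
    """True for weeks inside the surveillance season (week 40 to week 20)."""
    return week >= start_week or week <= end_week


def concatenate_seasons(records: Iterable[Record]) -> List[float]:
    """Drop warm-season weeks and join what remains into one series.

    The input is assumed to be sorted chronologically, so the output
    simply places consecutive seasons one after another.
    """
    return [incidence for _, week, incidence in records if in_flu_season(week)]
```

With 52 weekly records per calendar year, this keeps weeks 1 to 20 and 40 to 52, i.e., 33 points per year.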

**Figure 2.** Illustration of the different data pools for EDM. The testing series, i.e., the example we want to forecast, is shown in grey. The classic method (blue) uses as its library of patterns all the examples from the same region. The annual method (pink) uses as its library all the examples from the same year. The pool method (green) uses the largest library, taking all the series that are not from the same region or the same year.
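
The three library definitions can be made concrete with a minimal sketch. It assumes each series is stored under a hypothetical `(region, year)` key, that the testing series itself is always excluded from its own library, and that the pool rule is read literally as keeping series that differ from the target in both region and year. The name `select_library` is illustrative and not the authors' code.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (region code, season start year); assumed indexing


def select_library(
    series: Dict[Key, List[float]], target: Key, method: str
) -> Dict[Key, List[float]]:
    """Return the library of example series for one testing series."""
    region, year = target
    if method == "classic":
        # Same region, other years.
        keep = lambda r, y: r == region and y != year
    elif method == "annual":
        # Same year, other regions.
        keep = lambda r, y: r != region and y == year
    elif method == "pool":
        # Assumed literal reading: neither the same region nor the same year.
        keep = lambda r, y: r != region and y != year
    else:
        raise ValueError(f"unknown method: {method!r}")
    return {key: values for key, values in series.items() if keep(*key)}
```

For R regions and Y years with complete data, these rules give libraries of Y - 1, R - 1, and (R - 1)(Y - 1) series, respectively, which is consistent with pool being the largest.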

**Figure 3.** Pooling geographically distributed information for influenza forecast. (**a**–**e**) Results of different numerical experiments for conditions pool (solid black curves, with shading indicating standard deviation), classic (solid red), and annual (dashed red). Filled circles in (**a**,**b**) mark the location of the optimal meta-parameters for each protocol. The optima for pool are also marked by vertical solid lines. Solid horizontal lines in (**c**–**f**) mark the 0 of the vertical axis. Solid vertical lines in (**c**–**e**) mark the location of the peak in time. The dotted horizontal line in (**f**) marks ${\rho}^{*}\left(h\right)=0.5$. (**a**) EDM performance (as measured by the correlation between data and forecast) as a function of ${n}_{L}$ with fixed, optimal ${\beta}^{*}$. (**b**) EDM performance as a function of $\beta $ with fixed, optimal ${n}_{L}$. (**c**) Average error in forecasting the peak magnitude. (**d**) Average error in forecasting the peak location. (**e**) EDM performance as we attempt to predict further ahead. (**f**,**g**) Examples of how the forecast becomes worse as we attempt to predict further in advance. Real data (solid red curves) are compared to forecasts derived with one week (solid black), three weeks (dashed black), or five weeks (dotted black) of anticipation. The various shadings indicate the standard deviation of the estimated quantity. (**f**) Forecasts for the Community of Madrid. (**g**) Forecasts for the Community of València.
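
The performance measure used in these panels, the correlation between data and forecast evaluated at each forecasting horizon, can be sketched as follows. This is a plain Pearson correlation under the assumption that forecasts are already aligned point by point with the observations. The helper names `skill_by_horizon` and `max_useful_horizon`, and the choice of reporting the largest horizon at or above a threshold, are illustrative assumptions rather than the authors' procedure.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def skill_by_horizon(
    observed: Sequence[float], forecasts: Dict[int, Sequence[float]]
) -> Dict[int, float]:
    """Map each horizon h to the correlation between data and forecast."""
    return {h: pearson(observed, f) for h, f in sorted(forecasts.items())}


def max_useful_horizon(
    skill: Dict[int, float], threshold: float = 0.5
) -> Optional[int]:
    """Largest horizon whose correlation is at least `threshold`, if any."""
    useful = [h for h, rho in skill.items() if rho >= threshold]
    return max(useful) if useful else None
```

The default threshold of 0.5 mirrors the dotted reference line in the figure; any other cut-off can be passed explicitly.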

**Figure 4.** Pooling geographically distributed information for COVID-19 forecast. (**a**–**e**) Results of different numerical experiments for conditions pool (solid black curves, with shading indicating standard deviation), classic (solid red), and annual (dashed red). Filled circles in (**a**,**b**) mark the location of the optimal meta-parameters for each protocol. The optima for pool are also marked by vertical solid lines. Solid horizontal lines in (**c**–**f**) mark the 0 of the vertical axis. Solid vertical lines in (**c**–**e**) mark the location of the peak in time. The dotted horizontal line in (**f**) marks ${\rho}^{*}\left(h\right)=0.5$. (**a**) EDM performance (as measured by the correlation between data and forecast) as a function of ${n}_{L}$ with fixed, optimal ${\beta}^{*}$. (**b**) EDM performance as a function of $\beta $ with fixed, optimal ${n}_{L}$. (**c**) Average error in forecasting the peak magnitude. (**d**) Average error in forecasting the peak location. (**e**) EDM performance as we attempt to predict further ahead. (**f**,**g**) Examples of how the forecast becomes worse as we attempt to predict further in advance. Real data (solid red curves) are compared to forecasts derived with one week (solid black), three weeks (dashed black), or five weeks (dotted black) of anticipation. The various shadings indicate the standard deviation of the estimated quantity. (**f**) Forecasts for the Community of Madrid. (**g**) Forecasts for the Community of València.

**Figure 5.** Influenza and COVID-19 networks. The size of each node is directly proportional to the number of times that region has taken an example from itself. The darker a node is, the better the region can be described, as measured by the correlation coefficient $\rho $. A connection between generic regions A and B is plotted if the number of examples A takes from B exceeds $1.25$ times the median number of examples A takes from other regions. (**a**) Geographical representation. (**b**) Graphical representation.
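
The rule used to draw the network can be sketched as follows. The sketch assumes a hypothetical nested dictionary `counts[a][b]` holding the number of examples that region `a` takes from region `b`, takes the median over all regions other than `a`, and treats the threshold comparison as a strict "greater than". The function names are illustrative and not from the authors' code.

```python
from statistics import median
from typing import Dict, List, Tuple

Counts = Dict[str, Dict[str, int]]  # counts[a][b]: examples a takes from b


def node_sizes(counts: Counts) -> Dict[str, int]:
    """Node size is proportional to how often a region draws on itself."""
    return {a: row.get(a, 0) for a, row in counts.items()}


def build_edges(counts: Counts, factor: float = 1.25) -> List[Tuple[str, str]]:
    """Directed edges (a, b) where a relies on b more than the threshold."""
    edges = []
    for a, row in counts.items():
        # Only examples taken from *other* regions enter the median.
        others = {b: n for b, n in row.items() if b != a}
        if not others:
            continue
        threshold = factor * median(others.values())
        edges.extend((a, b) for b, n in others.items() if n > threshold)
    return edges
```

For instance, if Madrid takes 8 examples from Catalonia and 2 each from two other regions, the median is 2, the threshold is 2.5, and only the Madrid to Catalonia edge is drawn.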

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Boullosa, P.; Garea, A.; Area, I.; Nieto, J.J.; Mira, J.
Leveraging Geographically Distributed Data for Influenza and SARS-CoV-2 Non-Parametric Forecasting. *Mathematics* **2022**, *10*, 2494.
https://doi.org/10.3390/math10142494
