The Nobel laureate Niels Bohr once said: “Predictions are very difficult, especially if they are about the future.” Nonetheless, models that can forecast future COVID-19 outbreaks are receiving special attention from policymakers and health authorities, with the aim of putting control measures in place before infections begin to increase. However, two main problems emerge. First, there is no general agreement on which kind of data should be recorded to judge the resurgence of the virus (e.g., infections, deaths, percentage of hospitalizations, reports from clinicians, signals from social media). Moreover, all these data suffer from common defects linked to reporting delays and to uncertainties in the collection process. Second, the complex nature of COVID-19 outbreaks makes it difficult to understand whether traditional epidemiological models, such as the susceptible-infectious-recovered (SIR) model, are more effective for the timely prediction of an outbreak than alternative computational models. Well aware of the complexity of this forecasting problem, we propose here an innovative metric for predicting COVID-19 diffusion, based on the hypothesis that a relation exists between the spread of the virus and the presence in the air of pollutants such as PM2.5 and NO2. Drawing on the recent claim by 239 experts that this virus can be airborne, and further considering that particulate matter may favor this airborne route, we developed a machine learning (ML) model that was trained on: (i) all the COVID-19 infections that occurred in the Italian region of Emilia-Romagna, one of the most polluted areas in Europe, in the period of February–July 2020, (ii) the daily values of all the particulates measured in the same period and in the same region, and (iii) the chronology according to which the Italian Government imposed restrictions on human activities. Our ML model was then subjected to a classic ten-fold cross-validation procedure that returned a promising 90% accuracy value. Finally, the model was used to predict a possible resurgence of the virus in all nine provinces of Emilia-Romagna in the period of September–December 2020. To make those predictions, we fed our ML model the daily measurements of the aforementioned pollutants registered in the periods of September–December 2017, 2018, and 2019, along with the hypothesis that the mild containment measures taken in Italy in the so-called Phase 3 are obeyed. At the time of writing, we cannot confirm the precision of our predictions. Nevertheless, we are projecting a scenario based on an original hypothesis that makes our COVID-19 prediction model unique in the world. Its accuracy will soon be judged by history, and this, too, is science at the service of society.
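For readers unfamiliar with the ten-fold cross-validation procedure mentioned above, the following is a minimal sketch of the general technique, not the model used in this article. The data are purely synthetic, the two features are hypothetical stand-ins for daily measurements, and the nearest-centroid classifier and all function names are assumptions chosen only to keep the example dependency-free.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(rows, labels):
    """Toy classifier (an assumption): store the per-class mean of each feature."""
    centroids = {}
    for label in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def cross_val_accuracy(rows, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return mean(scores)


# Synthetic two-feature data set with binary labels (illustrative only).
rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    label = rng.choice([0, 1])
    center = 30.0 if label == 1 else 15.0
    rows.append([rng.gauss(center, 5.0), rng.gauss(center, 5.0)])
    labels.append(label)

accuracy = cross_val_accuracy(rows, labels, k=10)
print(f"mean ten-fold accuracy: {accuracy:.2f}")
```

Each sample is used for testing exactly once and for training nine times, so the averaged score reflects performance on data the classifier did not see during fitting.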
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.