How Schools Affected the COVID-19 Pandemic in Italy: Data Analysis for the Lombardy, Campania, and Emilia Romagna Regions

Background: Coronavirus Disease 2019 (COVID-19) was the most discussed topic worldwide in 2020. At the beginning of the Italian epidemic, scientists tried to understand the spread of the virus and the epidemic curve of positive cases, with controversial findings and numbers. Objectives: In this paper, we develop a data analytics study of the spread of COVID-19 in the Lombardy and Campania Regions in order to identify the driver that sparked the second wave in Italy. Methods: Starting from all the available official data on the spread of COVID-19, we analyzed Google mobility data, school data, and infection data for two large Italian regions, Lombardy and Campania, which adopted different approaches to opening and closing schools. To reinforce our findings, we also extended the analysis to the Emilia Romagna Region. Results: The paper aims to show how the different school opening/closing policies may have affected the spread of COVID-19. Conclusions: The paper shows that a clear correlation exists between school contagion and the subsequent overall contagion in a geo-


Introduction
Data analysis [14,15,16] has proved to be of fundamental importance for studying and predicting the behaviour of the SARS-CoV-2/COVID-19 pandemic, in order to intervene promptly and stem its spread [1,2,3]. School reopening has been a hotly debated topic nationwide, with scientists on one side who consider schools safe and scientists on the other side who consider them unsafe. In our opinion, school is not a safe environment by definition; it must be made safe by taking serious and rigorous measures.
The data (shown in Table 1) on the growth of infections by age group from the beginning of September to March, published weekly in the ISS epidemiological reports, indicate that the 0-9 age group had a growth between 6 and 10 times higher than all other age groups. The data show that from 29 December to 10 March last year, infections increased by 83.44% in the 0-9 age group and by 63.55% in the 10-19 age group. School-age children are therefore the group in which contagion grew the most: the third-fastest-growing group is 30-39 years, at 52.35%, and almost all the other age groups are below 50% growth. Growth among older people was much lower: between 80 and 89 years of age, contagion grew by 39.71% over the period, and over 90 even less, by 31.28%. These data disprove the idea that the problem was transport, which mainly concerned high schools (younger children mostly walk to school or are driven by their parents). We believe that distinguishing between the school environment per se and the school environment extended to public transport and to the dynamics of entering and leaving school is meaningless. What must now be analysed and quantitatively assessed is the main contribution of schools to viral circulation.
Obviously, it remains fundamentally important to determine the risks to which school closures expose children, since closures certainly affect mental health and cognitive development, which are crucial at a developmental age, and consequently to arrive at risk-weighted decisions.
At the end of August, we presented a predictive model (made public on August 25 by one of the authors on the science-dissemination Facebook page "Predire è Meglio che Curare") showing that the second wave in Italy had practically already started. The model estimated a relative peak around 7-8 September, followed by a slight slowdown, pending the strong impact expected within two weeks of the schools reopening (on September 14). Looking back now (see Figure 1), we clearly observe that the exponential explosion of contagion started on September 28, exactly two weeks after Italian schools reopened.

Figure 1. Observed real curves for new daily cases and deaths
Our model was based on the hypothesis that schools are an important driver of contagion. Furthermore, the major impact should be sought in the secondary contagion that then occurs within families, which becomes visible after about two incubation cycles of the virus. This is why we used a 14-day time lag in our predictive model. Although it is not our aim to "blame" children or teachers for these infections, and we watched school staff do their utmost over the summer to make school environments as safe as possible, we cannot ignore that the virus finds fertile ground for contagion in closed, crowded, poorly ventilated environments, as our schools are, which are not among the most modern in Europe. Therefore, thinking that school is a safe environment by definition is wrong, because it has caused, and will cause, the uncontrolled spread of the virus.
In this paper, we analyse the limited official MIUR data available on contagion in schools to understand how much schools may have affected territorial contagion.

Materials and Methods
In December, MIUR published an official dataset on infections in schools (merging different data collections) for the period 14 September to 30 October [17]. The report speaks of approximately 65,000 cases identified in that time window (only primary and lower secondary schools are considered, because most high schools were in any case teaching remotely). These 65,000 cases are an underestimate, because not all Italian schools participated in this tracking activity, and not all schools released their data to the ministry. It should also be considered that 75% of those under 18 years old are asymptomatic [source: ISS], so this large share of young people is missed by contact tracing. 65,000 cases out of the 360,000 total cases detected in the same period is a considerable share, about 18% of the total. Furthermore, the major impact should be sought in the contagion that then arose within families, which, after about two incubation cycles of the virus, led to the uncontrolled growth of the curves that we observed from September 28 onwards in our predictive model and in Figure 1.
In order to correlate schools and infection, we considered the data officially released by MIUR and carried out a correlation analysis on the Lombardy Region (RL) and the Campania Region (RC), two regions that adopted different policies for opening and closing schools. RL has 12 provinces with a total of 10M inhabitants, while RC has 5 provinces with 5.8M inhabitants.
Specifically, a twofold correlation study was conducted:
1. between a school contagion index (both total and separately for grade I and grade II) and a global contagion index at the provincial level, for both RL and RC. The correlation was computed with the global contagion index over the whole reference period (September 14 to October 30); over the first two weeks after schools reopened (September 14 to September 28), when contagion originating in schools should theoretically not yet be detectable, given the latency between infection, symptom onset, and the related diagnostic screening; and over the following two weeks (September 28 to October 12), when contagion triggered in schools is likely to have spread within families;
2. between the contagion index and the mobility indexes derived from the COVID-19 Google Community Mobility Report [11], which provides regional- and national-level mobility data for different categories (e.g., parks and public gardens, pharmacies, workplaces, transit stations, residential areas).
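The three observation windows described above follow directly from the reopening date and the 14-day lag. The short Python sketch below only does this date arithmetic; the helper name `analysis_windows` is hypothetical, and treating each window as a half-open interval [start, end) is an assumption, since the paper does not say whether boundary days (e.g., September 28) are counted in one window or both.

```python
from datetime import date, timedelta

# Dates and lag as stated in the text; the helper below is illustrative only.
REOPENING = date(2020, 9, 14)     # schools reopen
LAG = timedelta(days=14)          # about two incubation cycles
PERIOD_END = date(2020, 10, 30)   # end of the MIUR reporting window


def analysis_windows(start: date, lag: timedelta, end: date) -> dict[str, tuple[date, date]]:
    """Return the three windows as half-open (start, end) date pairs (hypothetical helper)."""
    return {
        "first_two_weeks": (start, start + lag),
        "following_two_weeks": (start + lag, start + 2 * lag),
        "whole_period": (start, end),
    }


windows = analysis_windows(REOPENING, LAG, PERIOD_END)
for name, (lo, hi) in windows.items():
    print(f"{name}: {lo.isoformat()} -> {hi.isoformat()} ({(hi - lo).days} days)")
```

Keeping the lag as a parameter makes it easy to test other lags than the 14 days used in the paper.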
For both studies, both the RL data and the RC data were analysed. We chose these two regions because they applied very different territorial policies and restrictions, both on opening/closing schools of every type and level and on personal mobility. Moreover, we extended this analysis to a third Italian region, the Emilia Romagna Region (REm), which has 9 provinces and a total population of 4.5M inhabitants. Emilia Romagna followed an opening/closure strategy similar to the one applied in RL (i.e., primary and lower secondary schools open in attendance and upper secondary schools at 50% attendance).
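We computed the Correlation Index (CI) as the Pearson correlation coefficient between two indexes across the provinces of a region:
$$CI = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
where $x_i$ and $y_i$ are the two indexes for province $i$, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the number of provinces. An F-test is then conducted on the dataset to determine whether there is a significant difference between the two groups and to assess the statistical significance of our findings.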
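The Pearson coefficient is the covariance of two series divided by the product of their standard deviations. The dependency-free Python sketch below computes it, assuming each index is a list with one value per province in the same province order; `pearson_ci` is a hypothetical name, and the values in the usage lines are made-up toy numbers, not the paper's data.

```python
import math


def pearson_ci(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equally long per-province index lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)                   # spread of x
    syy = sum((b - my) ** 2 for b in y)                   # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Toy values (NOT the paper's data): school index and global index per province.
school = [1.0, 2.0, 3.0, 4.0]
overall = [1.0, 3.0, 2.0, 4.0]
print(f"CI = {pearson_ci(school, overall):.2f}")
```

With only a handful of provinces per region, a single outlying province can move the coefficient considerably, which is one reason a significance test is reported alongside it.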

Comparing Lombardy, Campania and Emilia Romagna Contagion Indexes
In the Lombardy Region (RL), about 14,000 cases were identified in schools out of 88,500 total cases (15.8%) in the reference period September 14 to October 30. In the Campania Region (RC), 4,620 cases were identified in schools out of about 42,815 total cases (10.8%). It is important to recall that in October RL and RC applied different school policies: RL left primary and lower secondary schools open in attendance and upper secondary schools at 50% attendance [9], whereas RC intervened with targeted closures: schools opened on September 24, and then all levels were closed early, from October 16 until November 13 [8].
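The percentages quoted for the national and regional data can be checked directly from the counts given in the text. This small Python sketch uses only those reported (approximate) counts; the dictionary labels and the `school_share` helper are ours.

```python
# Counts as reported in the text (school cases, total cases in the same period).
reported = {
    "Italy (primary + lower secondary)": (65_000, 360_000),
    "Lombardy (RL)": (14_000, 88_500),
    "Campania (RC)": (4_620, 42_815),
}


def school_share(school_cases: int, total_cases: int) -> float:
    """Percentage of all detected cases that were identified in schools."""
    return 100.0 * school_cases / total_cases


for region, (school, total) in reported.items():
    print(f"{region}: {school_share(school, total):.1f}%")
```

The rounded results agree with the 18%, 15.8%, and 10.8% figures stated in the paper.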
Let us now focus on the Lombardy Region data at the provincial level (see Figure 2 for numerical details). If we compute a "school contagion" index and a "global contagion" index for each province, both normalized on the ISTAT 2020 population [http://demo.istat.it] as cases per 1,000 inhabitants, we note a strong correlation between a high contagion rate in schools and a high subsequent contagion rate at the provincial level. No such correlation is found when looking at another variable, such as population density. As an example, consider Varese (VA), the Lombardy province most affected by the second COVID-19 wave: VA has the highest school contagion rate of all (together with Monza and Brianza, MB) and a very high global contagion rate.
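Both indexes use the same normalization, cases per 1,000 inhabitants. A minimal Python sketch of the per-province computation follows; the `Province` record, its field names, and the "XX" province in the usage lines are hypothetical, and in practice the populations would come from the ISTAT 2020 tables and the case counts from MIUR and the regional surveillance data.

```python
from dataclasses import dataclass


@dataclass
class Province:
    code: str           # two-letter province code, e.g. "VA" (hypothetical record layout)
    population: int     # resident population (ISTAT 2020)
    school_cases: int   # cases identified in schools
    total_cases: int    # all cases detected in the province


def per_thousand(cases: int, population: int) -> float:
    """Cases per 1,000 inhabitants, the normalization used for both indexes."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * cases / population


def contagion_indexes(provinces: list[Province]) -> dict[str, tuple[float, float]]:
    """Map province code -> (school contagion index, global contagion index)."""
    return {
        p.code: (per_thousand(p.school_cases, p.population),
                 per_thousand(p.total_cases, p.population))
        for p in provinces
    }


# Hypothetical province, NOT real data.
example = contagion_indexes([Province("XX", 500_000, 250, 4_000)])
print(example)
```

Normalizing by population lets provinces of very different size be placed on the same scatter plot before correlating the two indexes.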

Figure 2. Lombardy Region dataset: school contagion index and overall contagion index (correlation study)
Let us now consider what happened in the first two weeks after schools reopened (September 14 to September 28), when the effects of schools were theoretically only beginning to be visible: there is no correlation between the school contagion index and the global contagion index over these two weeks (Correlation Index CI = -0.10). If, on the other hand, we look at the correlation between school contagion and the contagion index in the following two weeks (September 28 to October 12), we find a clear correlation, CI = 0.69, which rises further over the entire reference period (CI = 0.89), as depicted in the scatter plot of Figure 3 (whose linear trend line has a very high coefficient of determination, R² = 0.94). The correlation indices are identical when school contagion is split into primary and secondary school. Interestingly, the contagion index correlates with population density (CI = 0.60), whereas the school index does not (CI = 0.37). As for mobility, the reduced mobility recorded under government restrictions and DPCMs (Prime Ministerial Decrees) shows no relationship with the more or less pronounced contagion in the various RL provinces, except for transit station mobility, which shows a correlation of CI = -0.57.
Since the number of data points is limited, we tested statistical significance with an F-test (α = 0.05) and obtained the following values: F = 21.20 with P(F) = 7.94E-06 and F critical = 2.82 (df = 11). The null hypothesis is therefore rejected, since the computed F is greater than F critical.
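The text does not name the F-test variant. The reported critical values (2.82 with 11 degrees of freedom for the 12 Lombardy provinces, and 3.43 with 8 for the 9 Emilia Romagna provinces) coincide with the upper 5% points of F distributions with (11, 11) and (8, 8) degrees of freedom, which is what a two-sample variance-ratio F-test on two per-province series would use. The dependency-free Python sketch below assumes that variant; all function names are hypothetical, the F tail probability is computed from the regularized incomplete beta function via its continued fraction, and the p-value is one-tailed.

```python
import math
from statistics import variance


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))            # even step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))   # odd step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def f_critical(alpha: float, d1: int, d2: int) -> float:
    """Value f with P(F > f) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, d1, d2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_sf(mid, d1, d2) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def variance_ratio_test(x: list[float], y: list[float],
                        alpha: float = 0.05) -> tuple[float, float, float, bool]:
    """Two-sample F-test for equal variances (larger variance in the numerator)."""
    vx, vy = variance(x), variance(y)
    if vx >= vy:
        f, d1, d2 = vx / vy, len(x) - 1, len(y) - 1
    else:
        f, d1, d2 = vy / vx, len(y) - 1, len(x) - 1
    crit = f_critical(alpha, d1, d2)
    return f, f_sf(f, d1, d2), crit, f > crit
```

For example, `f_critical(0.05, 11, 11)` returns a value close to the 2.82 reported for Lombardy.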

Figure 3. Lombardy Region scatter plot for School Contagion Index vs. Overall Contagion Index
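For a least-squares straight line fitted to a scatter plot such as Figure 3, the coefficient of determination R² equals the square of the Pearson coefficient between the two variables. The Python sketch below checks this identity on made-up toy data; `fit_line`, `r_squared`, and `pearson` are hypothetical helper names, and the data are not the paper's.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x: list[float], y: list[float]) -> float:
    """Coefficient of determination of the least-squares line: 1 - SS_res / SS_tot."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data, NOT the paper's provinces.
xs, ys = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]
print(f"r = {pearson(xs, ys):.2f}, r^2 = {pearson(xs, ys) ** 2:.2f}, R^2 = {r_squared(xs, ys):.2f}")
```

The identity holds only for a straight-line fit with an intercept; spreadsheet trend lines of other shapes report a different R².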
Contrary to what was observed in the Lombardy Region, in the Campania Region (Figure 4), where all schools were closed early, from October 16 until November 13 [8], there is no statistical interdependence between school contagion and the subsequent contagion in the regional provinces: in the first two weeks of observation (September 14 to September 28) the correlation is very high (CI = 0.81), but it then decreases over time, instead of increasing, down to the global value CI = 0.47, the opposite of the behaviour observed in the Lombardy Region. As for mobility data, a strong correlation is detected between the contagion index and retail and recreation mobility (CI = 0.86).
To reinforce our findings, we extended our analysis to the Emilia Romagna Region (REm). In this region, about 3,050 cases were identified in schools out of 19,670 total cases (15.5%) in the reference period September 14 to October 30, a percentage similar to RL's. Moreover, REm followed an opening/closure strategy similar to RL's. As depicted in Figures 5 and 6, the correlation between the school contagion index and the overall contagion index per province increases over time, from an inverse correlation right after schools reopened (CI = -0.50) to CI = 0.41 after two weeks and CI = 0.69 after four weeks from school reopening. The index for the overall period is CI = 0.76, with secondary schools having a stronger impact than primary schools (CI = 0.86 vs. CI = 0.58).
Since the number of data points is limited, we tested statistical significance with an F-test (α = 0.05) and obtained the following values: F = 23.16 with P(F) = 9.27E-05 and F critical = 3.43 (df = 8). The null hypothesis is therefore rejected, since the computed F is greater than F critical.

Reproduction Number (Rt) and Contagion Curve Evaluation
If we also look at the contagion curve (new daily positive cases, Figure 7), RC, the region that applied the most restrictive school policies, was able to reverse the trend of new daily positive cases earlier than the other two regions. Its ascent was also less steep than those of RL and REm, with an average doubling time of 8 days (versus 3.4 days for RL and 6 days for REm).
We also computed Rt (the reproduction number) for the Italian regions. Rt was estimated with the time-dependent method of Wallinga and Teunis [18], with a time aggregation level of 10 days. All regional and provincial trends are reported on our website: www.covid19-italy.it. The trend is depicted in Figure 8: RC kept the reproduction number to a low peak value of Rt = 1.1, whereas REm and RL peaked at Rt = 1.4 and Rt = 1.5, respectively. RC was the first to reach the threshold value Rt = 1.0 (09/11/2020). In Table 2, we also summarize, for all Italian regions (grouped by school opening date), the dates on which the Rt peak was reached and the associated Rt value. The average Rt value is also reported for the regions that opened schools on September 14 and for those that opened later, on September 24. Regions that postponed school opening had a lower average Rt than regions that opened earlier: Rt = 1.27 vs. Rt = 1.46, respectively. The Campania Region, which applied the most stringent policies in Italy by promptly closing all schools, registered the lowest Rt value among all Italian regions. Moreover, regions that opened schools earlier reached their Rt peaks earlier than regions that postponed reopening: in the first group (September 14), 8 regions peaked on October 10, 2020 and 5 regions on October 20, 2020; in the second group (September 24), 6 regions peaked on October 20, 2020 and only 1 region on October 10.
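The two quantities used in this section can be sketched as follows. The text does not say how the average doubling times were computed; a log-linear least-squares fit (doubling time = ln 2 / growth rate) is one common choice and is assumed here. The Rt function is a minimal daily-resolution sketch of the Wallinga and Teunis case reproduction number, assuming a caller-supplied discrete serial-interval distribution `w`; it does not reproduce the paper's 10-day aggregation, and the `w` in the usage lines is purely illustrative. All function names are hypothetical.

```python
import math


def doubling_time(daily_cases: list[float]) -> float:
    """Doubling time in days from a log-linear least-squares fit: Td = ln 2 / r."""
    t = list(range(len(daily_cases)))
    logs = [math.log(c) for c in daily_cases]  # requires strictly positive counts
    mt, ml = sum(t) / len(t), sum(logs) / len(logs)
    r = (sum((a - mt) * (b - ml) for a, b in zip(t, logs))
         / sum((a - mt) ** 2 for a in t))
    return math.log(2) / r


def wallinga_teunis_rt(incidence: list[float], w: list[float]) -> list[float]:
    """Daily case reproduction number in the spirit of Wallinga and Teunis.

    w[k] is the assumed probability that the serial interval is k days (w[0] = 0).
    A case on day s is attributed to a case on day t with probability
    w[s - t] / pressure[s], where pressure[s] = sum_u incidence[u] * w[s - u].
    R(t) sums these attributions over later days. Values near the end of the
    series are biased low because later secondary cases are not yet observed.
    """
    n, k = len(incidence), len(w)
    pressure = [sum(incidence[u] * w[s - u] for u in range(max(0, s - k + 1), s + 1))
                for s in range(n)]
    rt = []
    for t in range(n):
        total = 0.0
        for s in range(t + 1, min(n, t + k)):
            if pressure[s] > 0:
                total += incidence[s] * w[s - t] / pressure[s]
        rt.append(total)
    return rt


# Illustrative inputs only: a series doubling every 8 days and a toy serial interval.
cases = [10 * 2 ** (day / 8) for day in range(30)]
w_toy = [0.0, 0.2, 0.5, 0.3]
print(f"doubling time = {doubling_time(cases):.1f} days")
print(f"Rt on a flat series: {wallinga_teunis_rt([100.0] * 60, w_toy)[10]:.2f}")
```

On a flat incidence curve every case is replaced by exactly one new case, so the estimate away from the series edges is 1.0, which is the threshold value discussed above.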

Conclusions
Our work shows that the provinces with a large number of cases in schools are those that subsequently had a higher total number of cases and that, as expected, contagion increased over time. The most significant example is Varese, which, among the Lombardy provinces, had the highest incidence in schools; contagion then spread over time throughout the entire province, making Varese one of the most affected provinces in Italy during the second COVID-19 wave.
Several elements and factors lead us to conclude that school is not a safe environment by definition but must be made safe by taking serious actions to protect the students, teachers, and staff who work and live in the school context every day, such as strict personal hygiene, respect for the rules, serious contact tracing, timely testing and swabs for students, and adequate natural and artificial ventilation of classrooms. This study may be extended to other Italian regions and to new data when MIUR officially releases new data on infections detected in schools.