Complex Network Modelling of Origin–Destination Commuting Flows for the COVID-19 Epidemic Spread Analysis in Italian Lombardy Region

: Currently the whole world is affected by the COVID-19 disease. Italy was the ﬁrst country to be seriously affected in Europe, where the ﬁrst COVID-19 outbreak was localized in the Lombardy region. The further spreading of the cases led to the lockdown of the most affected regions in northern Italy and then the entire country. In this work we investigated an epidemic spread scenario in the Lombardy region by using the origin–destination matrix with information about the commuting ﬂows among 1450 urban areas within the region. We performed a large-scale simulation-based modeling of the epidemic spread over the networks related to three main motivations, i.e., work, study and occasional transfers to quantify the potential contribution of each category of travellers to the spread of the epidemic process. Our ﬁndings outline that the three networks are characterised by different weight dynamic growth rates and that the network “work” has a critical role in the diffusion phenomenon showing the greatest contribution to the epidemic spread.


Introduction
COVID-19 is a respiratory infectious disease caused by the SARS-CoV-2 virus epidemics and originated in December 2019 in Wuhan, probably following a zoonotic event [1]. The current COVID-19 outbreak marks the third novel coronavirus emergence in the 21st century, after the 2003 SARS and the 2013 Middle East respiratory syndrome (MERS). Currently the whole world is affected by the COVID-19 disease [2,3].
A third wave of the Covid pandemic is now advancing across many countries in Europe, mainly due to new variants. The spreading of the virus and the difficulties in completing the vaccine campaign are characterizing this new emergency that is still stressing the health care system and in particular intensive care units. Many nations (France, Germany, Spain, Poland, Italy, etc.) are in lockdown. The rate of contagiousness is still high and, at the end of March 2021, the deaths attributed to COVID-19 are 1 million in Europe and more than 112,000 in Italy [4]. Lombardy region is one of the most affected areas in Italy with about 4000 infected individuals per day (rate of infected over 8%) and with over 30,000 deaths, or 3 per 1000 residents [5], which represents the highest percentage in Europe compared with similar densely populated areas [4].
In recent years epidemic modeling techniques have been consistently improved by integrating the information about the complex and heterogeneous structures of social and mobility networks [6]. By using large-scale datasets, several simulation models have been developed to study dynamic spread processes and characterize epidemic mechanisms through human interactions, mobility and contacts patterns. As an example, modern transportation networks have been proven to be particularly important to understand the geographical spread of epidemics where the various long-range heterogeneous connections could give rise to very complicated evolutions of epidemics [7][8][9][10]. Worldwide airport transportation networks have been used to study specific outbreaks such as pandemic influenza [11], severe acute respiratory syndrome [12] and very recently COVID-19 [13].
Here we investigated the epidemic spread in the Lombardy region by using the origindestination matrix with information about the commuting flows among the 1450 urban areas within the region. We performed a large-scale simulation-based modeling of epidemic spread over the networks related to three main motivations, i.e., work, study and occasional transfers in order to compute the contribution of each category to the dynamic epidemic diffusion. The main rationale of this study is that the connection patterns of the commuting flow networks could offer a wide-ranging overview of the spreading process in the region where the epidemic started in Italy. Indeed, the origin-destination matrices have been often used to analyze traffic flow and traffic demand and model the network structure in the long run for different purposes [14,15]. The main objective of this study is to quantify the potential contribution of each category of travellers to the spread of the epidemic process in order to assign a quantitative score to each of them.

Origin-Destination Networks
In this work we analyzed the Origin-Destination regional 2016 Matrix (OD 2016) freely available on "Portale OpenData Regione Lombardia" (www.dati.lombardia.it/Mobilit-etrasporti/Matrice-OD2016-Passeggeri/tezw-ewgk accessed on 12 March 2020). The OD matrix 2016 reports commuting flows by origin, destination, time slot, reason and prevailing travelling modality. The matrix refers to an average working day in the period February-May and includes 1525 zones (of which 1450 urban areas within the Lombardy region). It includes eight modalities (driver car, passenger car, rubber LTP, iron LTP, motorbike, bicycle, foot and other) and five reasons (work, study, occasional, business, return home). The OD 2016 matrix was obtained from the update of the OD 2014 matrix by using the analysis of socio-economic datasets and the analysis of urban and territorial developments. In turn, the OD 2014 matrix was compiled by integrating transport modelling, online questionnaires, vis-à-vis interviews, analysis of available surveys and existing demand. In order to model the mobility flows in Lombardy Region due to the three main categories "work", "study" and "occasional", we aggregated the commuting flows concerning each travelling motivation by considering all transport modalities and all time slots. In this way we constructed three mobility matrices with N = 1450 nodes in which each entry w ij indicates the number of absolute transfers between zone i and zone j. In addition, we constructed the matrix of "total" commuting flows in the Lombardy region by summing the total flows of the three categories.
We computed the degree of each node as: where a ij is the binarized entry of the w ij entry of each matrix, i.e., a ij = 1 if w ij > 0 or a ij = 0 otherwise. We also computed the strength of each node to also consider the traffic between nodes: In order to characterize both structure and organization of mobility networks at the statistical level, we used the distribution of their degree p(k) = N k N and the strength distribution p(s) = N s N , where N k is number of nodes with degree k and N s is number of nodes with degree s. These two quantities have been extensively used as intuitive topological metrics to check the level of heterogeneity in the systems and compare the central roles of the nodes [16][17][18]. In addition, we performed a correlation analysis to verify whether a statistical dependence exists between the degree and strength of each node of the networks.

Epidemic Spread on Networks
In order to explore the epidemic behavior in a population, the disease dynamics have to be defined. When modeling epidemic spread in networks, individuals or subpopulations are represented by nodes and the contact patterns amongst them are encoded by the edges of the network. In the following we will consider the standard compartmentalization approach in which nodes exist in a certain number of discrete states such as susceptible, infected or permanently recovered. The paradigmatic epidemiological model we considered is the susceptible-infected-removed (SIR) model [19,20], where nodes can have the following status: susceptible S, infected and infectious I, and recovered or removed R. At any instant, the network's state is given by the statuses of all N nodes. Thus, there are 3N states in the SIR model. That is, the state space consists of N-tuples of symbols from the set S, I, R. The rates of change of the nodes' statuses are determined by the state changes over time and the infection and recovery processes [21]. For the stochastic formulation S(t), I(t), and R(t) = N − S(t) − I(t) represent random variables that take values from the set {0, 1, . . . , N}. The focus of the model is then on determining evolution equations for the probabilities of observing a particular number of susceptible, infected and recovery nodes at a given time t. The transitions between states are given by: where β is the transmission rate, i.e., the rate at which the infection is transmitted across a link between a an infected node and a susceptible node, γ is the recovery rate, i.e., the rate at which each infected node recovers independently of all others of the network. It is worth noting that this model assumes that each node can contact or it is in contact with any other nodes by means of the network topology. In addition, both infection and the recovery processes are Markovian as: • transmission from an infected node to a susceptible node occurs across an edge as a Poisson process with rate β; • an infected node recovers by following a Poisson process with rate γ.
For unweighted networks with a SIR disease spreading across edges, the evolution equations are given by: where p i,j (t) is the probability that there are i susceptible, j infected and N − (i + j) recovered nodes at time t, with 0 ≤ i + j ≤ N. In our analysis we consider weighted networks in which each edge weight w ij quantifies the mobility rate between each couple of nodes i and j. Hence, in order to take into account both the topology and the dynamic flows over the network, we included the normalized edge weights into the epidemic model by using an edge-based transmission rate: This definition lead to a dynamic infection process in which the probability p i,j (t) is modulated by the commuting flow over the edge connecting nodes i and j. Thereby, the final evolution equations are given by: An important parameter for the analysis of an epidemic outbreak is the basic reproductive number R 0 , which relates to the number of secondary infected cases generated by a primary infected individual [19]. Under the assumption of the homogeneous mixing of the population, the basic reproductive number is defined as: Thus, from Equation (7) it is evident that any epidemic will spread across a nonzero fraction of the population only for R 0 > 1. Indeed, in this case, the epidemic spread generates a number of infected individuals larger than those who recover. The reproductive number is strictly related to the epidemic threshold. Indeed, if the spreading rate is not large enough to allow R 0 > 1, the epidemic outbreak will not affect a finite portion of the population and will die out in a finite amount of time.

Statistical Analysis
We performed M = 1000 simulations of epidemic spread processes over each of the four weighted networks. In particular, the event-based numerical technique proposed in [21] was adopted to simulate continuous-time SIR disease transmission in the undirected weighted networks.
Since we were interested in modelling the epidemic spread over the networks by considering the commuting flows between the urban areas for the different categories, we selected the following features from the curve of infection I(t) to provide a salient descriptor of the epidemic spread over the networks: • value of the peak; • time at which the peak occurs; • area under the curve.
Indeed, such features were used in literature to evaluate the social and economic impact of the epidemic [22], make forecasting analysis [23] and plan or evaluate containment policies [24].
A direct consequence of the stochastic simulations is that the onset of the epidemics could fluctuate and the peaks can occur with stochastic delays. Since we were interested in comparing the different epidemic dynamics between the four networks, we used the non-parametric Kruskal-Wallis test for testing whether samples of each metric originate from the same distribution. Without loss of generality, we set R 0 > 1 by using fixed values of γ = 0.1 and β ≥ 0.2 to perform a sensitivity analysis for different value of R 0 .

Network Structure
We analyzed the information provided by each of the three weighted networks, by inspecting both their degree and strength probability distributions. Figures 1 and 2 show the degree probability distribution p(k) and the strength probability distribution p(s) for the networks, respectively.  The degree distributions clearly resemble generalized extreme distribution models, while the strength distributions exhibit truncated exponential behaviour. Both the degree and strength distributions of all the networks are heavy-tailed distributions varying over several orders of magnitude. Other works report similar findings for transportation networks. For instance, the airline traffic among different urban areas in the world shows p(k) = k −α f (k/k x ), where α = 2 and f (k/k x ) is an exponential cutoff function which takes into account the physical constraints on the maximum number of connections that a single airport can handle [18,25]. The authors in [15] used the complex network theory to to assess the fitness of the current physical transportation network against the conceptual road network by using the origin-destination movement matrix showing a power law relation between the frequency of a degree and the degree of a node in the OD network. In general, the scale-free properties of a wide range of degree and strength values clearly indicate the heterogeneous behavior of both topology and dynamic flows on the infrastructure transportation networks. Indeed, in such networks a small number of hubs with many strong connections coexist with a large number of nodes with few weaker connections [26]. As in [18] we investigated the dependence of s i on k i , finding for all the networks a correlation law: s(k) = δk α (8) In Figure 3 the strength s(k) is reported as a function of the degree k of the nodes for the three networks "study", "work" and "occasional". For the "study" network, we found α = 1 meaning a direct correlation between the strength and the degree of the nodes.
The network "occasional" shows α = 1.24 and for the "work" network, we found α = 1.57. For the latter two cases an exponent α > 1 implies that the strength of vertices grows faster than their degree.  Figure 4 shows the epidemic curves of all the networks. The displayed curves are averaged over the thousand simulations on each network computed with R 0 = 2. It can be noted that the "total" network exhibits the highest number of infected nodes with a peak earlier than the other networks. This is followed by the "work" network, the "occasional network" and finally by the "study" network.

Simulations of Epidemic Spread
These results are even more evident in Figure 5 showing the violin plots for the distributions of the peak values of I(t), temporal locations of the peaks of I(t) and the cumulative value of the infected nodes until the end of the epidemic process computed by using the area under I(t).
Kruskal-Wallis test returned value of p < 0.000001 for the distributions of peak values, p = 0.0001 for the distributions of temporal location of peak values and p < 0.000001 for the total number of infected nodes, indicating that the tests reject the null hypothesis that all four data samples come from the same distribution at a 1%. significance level.  We also performed a sensitivity analysis by running the simulations for each network over a broader range of R 0 . Figure 6 shows the error bars with mean and standard deviation of the distributions of the peak values of I(t), temporal location of the peak of I(t) and area under I(t) for R 0 = {0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5}. For each value of R 0 and each of the three metrics, the four networks resulted significantly different from each other. Moreover, an upward trend is clearly evident for the distributions of peak values and area under I(t) as the reproductive coefficient increases. The growing rate is also significantly different for each of the networks considered, with the "study" network showing the slowest rate. The average value of the peak time decreases slightly faster with R 0 for the "study" network and the "total" network, while it fluctuates over a stable range of values (30-35 days) for the other two networks.

Discussion
The analyses carried out in this work shed light on several aspects of complex networks synthesised from the origin-destination commuting flows within the Lombardy region. First of all, the correlation analysis between the graph metrics strength and degree shows that although the three networks exhibit the same scale-free behaviour, they are characterised by different weight dynamic growth rates, as revealed by the α exponent. In particular, the network "work" shows the most critical dynamic growth of the weights. The critical role of this network in the diffusion phenomena is highlighted by the simulations of the epidemic spread over the complex networks. Indeed, as shown in Figure 6, the network "work" exhibits higher values of the two parameters linked to the number of infected nodes compared to the other two networks, thus proving the greatest contribution to the epidemic spread.
More sophisticated models have been proposed to analyze the COVID-19 outbreak by using the mobility flows. As an example, in [27] Graph Neural Networks were combined with Recurrent Neural Networks for modeling the spatial and temporal components of the epidemic diffusion among provincial and regional nodes. The authors combined the complex network modeling with deep learning techniques to tune the time-varying parameters of an epidemiological model (e.g., contact rates and recovery rates). In [28], the authors investigated the early COVID-19 global transmission from one country to another by examining the human mobility networks related to inflows of migration and tourism. The authors tested the impact of incoming migration and inbound tourism upon the initial global spread of COVID- 19 showing the significance of complex networks modelling in shaping social interactions and mobility fluxes for epidemic forecast. Freely available data allowed to test COVID-19 diffusion models at different spatial scales such as spreading between countries, regions and provinces within national limits [29][30][31][32].
In contrast to these studies, our work did not aim to predict the variables of the epidemic phenomenon, but one of the main goals was to provide a multivariate detailed description of the mobility flows within the Italian Lombardy region by means of complex networks. Indeed, here we exploited the information about the daily commuting flows to construct four mobility networks with very high spatial granularity. A limitation of this work is that the OD matrices did not refer to the period of the COVID-19 epidemic spread, not allowing a quantitative comparison with the actual epidemiological conditions. On the other hand, our findings could be exploited to understand both the topological and dynamic structure of the mobility networks derived by classifying the flows of individuals. This static and dynamic information, as well as the epidemic simulation scenarios, can be leveraged to optimize containment actions such as targeted lockdowns that can impact each of the three networks and specific hub nodes within them.

Conclusions
In this work we proposed a simple model to understand an epidemic spread over a large area in the Italian Lombardy region based on the mobility data provided by OD matrices. In general mobility networks in which the travelling motivations are specified, could considerably facilitate the understanding of the spread by identifying the category of travellers which contributes most to the spread. Although in our work we did not consider real-time mobility flows during the period of the virus outbreak, our analyses clearly show that the work category exhibits the highest values for two parameters related to the infection curve thus resulting the category with the greatest contribution to the epidemic spread. Similar models could support the analysis of possible strategies for gradual recovery from the lockdown and may provide general indications for similar considerations in other regions where the epidemic is currently spreading.