Reconstruction of the Temporal Correlation Network of All-Cause Mortality Fluctuation across Italian Regions: The Importance of Temperature and Among-Nodes Flux

All-cause mortality is a very coarse grain, albeit very reliable, index to check the health implications of lifestyle determinants, systemic threats and socio-demographic factors. In this work, we adopt a statistical-mechanics approach to the analysis of temporal fluctuations of all-cause mortality, focusing on the correlation structure of this index across different regions of Italy. The correlation network among the 20 Italian regions was reconstructed using temperature oscillations and traveller flux (as a function of distance and region’s attractiveness, based on GDP), allowing for a separation between infective and non-infective death causes. The proposed approach allows monitoring of emerging systemic threats in terms of anomalies of correlation network structure.


Introduction
The monthly-based all-cause death rate fluctuations of the 20 Italian regions are highly correlated in time. This happens even in the absence of recognisable macroscopic parameters, such as massive epidemics. In this work, we tried and built a phenomenological model of the observed between-region correlations based on the traveller flux among the network having, as nodes, the regions and, as edges, the mutual traveller fluxes estimated by a simple exponential model having the distance between regions and Gross Domestic Product (GDP) as major determinants. The above model was complemented by the wellknown biphasic effect of temperature on all-cause mortality [1][2][3][4][5]. The problem can be interpreted as the reconstruction of a network wiring in which the between nodes (regions) edge strength corresponds to the observed temporal correlation of the relative death rate fluctuations in time (Y-network) by the network wiring generated by the combination of between nodes fluxes and temperature effects (X-network).
The strategy of analysis was as follows: the (extremely high) between-region correlation was normalised by what was expected by the observed (well-known) biphasic effect of seasonality. The crude effect of seasonality (when partially out) had, as a consequence, the effect of lowering correlations, but we still have a very high residual correlation for a more refined model.
The biphasic effect (high mortality in winter and summer) on all-cause mortality was hypothesised as derived from an infective component prevailing in winter and a non-infective component prevailing in summer. This interpretation stems from the higher diffusion of viral infections in winter and cardiovascular (often from older people's dehydration) in summer. The winter (infectious) component was modelled using the between regions travellers flux (exponentially decaying with distance) complemented by the 'attractiveness' of each region proportional to its GDP. Thus we generated a 'between-region flux network' using a SIR-like model. The summer (non-infectious) model was formalised using a linear function of the month-specific average temperature of each region. This allows us to take into consideration the effect of local heat waves.
A model encompassing the above-sketched elements (X-network) was fitted to the observed death rates, producing the correlation network (Y-network). This minimalistic model was able to reconstruct the death rate oscillation in time and the observed betweenregion correlation network with high fidelity (corr = 0.993 and corr = 0.841, respectively).
In this work, we demonstrate that weighted edge correlation networks are a very powerful method in epidemiological studies, allowing the tracing of the dynamics of mortality (morbidity) patterns and potentially discovering anomalies relevant to public health. Taking into account the most general definition of a system as '...a set of interacting units with relationships among them' [6], we can safely state that Italy, as for death-rate fluctuations, due to the high temporal correlation among its regions, is a proper system. This allows for the sensible use of second-order statistics (such as correlations), adding unique information content to environmental and epidemiological studies that, in the great majority of cases, rely on the exploitation of a single variable (e.g., death rate fluctuations in a given area) in terms of a set of covariates (e.g., pollution, age structure, etc.).

Materials and Methods
In order to model the possible order parameters shaping the observed (and partially unexpected) very dense among regions correlation structure of the monthly-based 2011-2019 time series of all-cause mortality, we tried to keep to a minimum both the a priori hypotheses and the number of fitted parameters. This modelling choice was dictated by both the lack of any strong theory on all-cause mortality and to avoid overfitting problems.
Thus we limit ourselves to inserting, as 'explanatory variables', the two-phase effect of temperature and an index derived by commuter flux among different regions modulated by the GDP of each region (considered a proxy of region attractiveness).
In the following, we will call n im the number of deaths recorded in region i during the m-th month of recording; accordingly, we will denote T im as the average temperature in the same region during the same month.

Bi-Phasic Effect of Temperature
To account for temperature effects, we assume that n im is Poisson distributed with n im = λ i (T im ): where a c,i > 0 and a h,i > 0 (c and h stand for 'cold' and 'hot', respectively). This is a convex function with a minimum at: The four parameters (a c,i , b c,i , a h,i and b h,i ) are fitted, for each region i, by maximising the log-likelihood (through the scipy.optimize.minimize function, with TNC method [7,8]): The temperature T im is computed by associating the main administrative centre of each region with the three closest weather stations for which we have temperature readings. T im is then taken, for each month, as the weighted average of the three stations, with the weight proportional to the inverse of the distance between each station and the main administrative centre.

Analysis of Commuter Flux
We denote c ij as the number of daily commuters from region j to region i; and d ij (The inter-region distance, d ij , we used is the distance between the main administrative centres-"capoluogo"-of each region.) as the distance between the same two regions; note that c ij , unlike d ij , is not symmetric. We hypothesise an exponentially decaying relationship between the flux and the distance: where the parameters (κ, d 0 and b 0 ) are fitted by maximising the log-likelihood (through the scipy.optimize.minimize function, with the TNC method; the zeros-no commuters from region j to region i-are not included in the fit): Here we are assuming a log-normal distribution for c ij around the expected value c dist ij . Defining c i: ≡ ∑ j c ij -the number of commuters to region i, and calling GDP i the GDP of region i, we hypothesise the existence of a linear relationship: whose slope κ is fitted by maximising the log-likelihood (through the scipy.optimize. minimize function, with the TNC method): We are assuming that c i: follows a log-normal distribution around the expected value c gdp i: . This is in contrast with the assumption of log-normality for c ij , since the sum of log-normal variables is not itself log-normal. Yet, in many cases, this is a good approximation [9].
Finally, we performed a fit that considers the two effects together: the exponential decay with distance and the linear dependence on the GDP of the region of destination; indicating with pop j the population of region j, we have: As above, the fit procedure finds the best parameters κ and d 0 by maximising the log-normal log-likelihood (through the scipy.optimize.minimize function, with the TNC method; the zeros-no commuters from region j to region i-are not included in the fit).

Total Flux
We hypothesise that the total flux of persons f ij comprises, beyond the daily commuters c ij , an 'episodic' component e ij of more irregular movements: Starting from the results for c ij , we make the assumption that e ij is an exponentially decaying function of the distance between regions and a linear function of the GDP of the region of arrival i and of the population of the region of departure: With respect to Equation (8), we expect d e 0 > d 0 , since episodic travels, in contrast with frequent ones, are likely less affected by the distance to travel.

Sir Network Model
The full flux-temperature model includes two different effects. The first one is related to the non-infective component of mortality: where λ flux im is the model expectation for the number of deaths in region i at month m; and ρ 0i is a baseline mortality rate for region i. Note that this effect is akin to the warmseason component of Equation (1), but here in the flux-temperature model, for the sake of parsimony, we lose the individualised behaviour of each region i, and all regions respond to high temperatures in the same way.
The second effect takes into account the infective component of mortality. We make the simplifying assumption that, in each month, a new infectious disease starts spreading; at the end of the month, a fraction µ of the people 'recovered' from the disease dies; the following month, the process starts afresh. The spreading of the disease follows a SIR (Susceptible, Infected, Recovered) model [10] on the flux network. Defining the two matrices: the dynamics of the model reads: where S i , I i and R i are the number, respectively, of susceptible, infected and recovered individuals in region i; β measures the rate at which susceptible individuals get infected (S → I); and γ is the rate of recovery, I → R. The model, therefore, consists of 60 coupled differential equations.
The reasoning behind the model is as follows. The term ∑ j φ ij S j represents the number of susceptible individuals in region i, at a given instant in time, due to the flux from other regions (minus the flux out of region i itself-the diagonal elements φ ii ); for the infected, it is ∑ j φ ij I j . In a classical SIR model, the number of newly infected individuals dI is given by β S I pop ; in our case: At the end of the day, the reverse flux f ji (people moving back from region i to region j) will redistribute the newly infected in proportion to the fraction of susceptible individuals contributed by each region j; this is given by: that, together with Equation (17), gives the infinitesimal increment of infected people entering Equation (15) (first term on the left).
The initial conditions are always in the form of one 'patient zero' in region i 0 at time Since i 0 is not known, we assume it to be a random variable distributed such that: this amounts to assuming that the external flux to region i (people coming to region i from outside Italy) is proportional to the GDP of the region itself.
For each month, we evolve Equations (14)- (16) for 30 days (t ∈ [0, 30]); the equations are integrated using the Euler method, with step size dt = 1 day. Finally, this infective component of mortality is incorporated into the model: where with R i 0 im (t = 30) we designate the total number of recovered individuals for region i at the end of month m, when the patient zero was located in region i 0 (all months are assumed, for simplicity, to have 30 days).
To also incorporate seasonal effects in the infective dynamics, we make β a function of the temperature: with a β > 0; with the additional constraint that β < 1 dt (the condition β = 1 dt amounts to having all the population infected in a single dt; larger values lead, in the Euler approximation, to unphysical solutions).
Considering Equations (9) and (11) (14)-(16) (γ), the model comprises 28 parameters; of which 20 (the ρ 0i ) are simply used to offset the different mortality rates in different regions (due, for example, to distinct age structures). These parameters are fitted to the data by maximising the Poisson log-likelihood: where N counts is the number of terms in the sum ∑ i, m (if we consider n batch different months, having 20 regions, N counts = 20 · n batch ), and p(i 0 ) is given by Equation (19).
To this likelihood, we added two prior likelihoods to constrain the parameters of the model. The first is a soft flat prior, not-null in the range [0, 0.2] for the quantity ∑ k f ki pop i (see Equation (9)); the log-prior becomes quadratic outside the allowed (flat probability) range; the factor in front of the quadratic term is chosen large enough to practically prevent leaving the allowed range. This constrains the fraction of the population leaving a region every day to less than 20%. The second log-prior is quadratic in d e 0 (Equation (10)): to penalise very high spatial decay constants d e 0 for the episodic component of the flux. The maximisation has been carried out, in this case, through the Adam optimiser [11], with default parameters (β 1 = 0.9 and β 2 = 0.99) and a learning rate decreasing at each optimisation step according to: The training set consists of the monthly death counts for each of the 20 regions for 96 (out of 108) randomly chosen months in the period 2011-2019. We reserved 12 months (12 + 96 = 108) as test data; these months were selected to have one exemplar of each calendar month-January to December; since the dataset spans only 9 years, 3 randomlyselected years contributed two months (6 months apart, e.g., April-October) to the test data.
At each step, n batch months (with month, we here denote one specific month in a specific year; so, in the training set, we have 96 months) are randomly selected from the training set (the same month can appear multiple times in the batch). For each month m, a patient zero-region i 0m is randomly extracted with a probability given by Equation (19). The computed log-likelihood is then: a stochastic approximation of the total log-likelihood of Equation (22). For the first 10 4 optimisation steps, n batch = 10. From that step onwards, n batch = 100; and, to the log-likelihood, we added a 'regularisation' term: where corr 0 ij is the actual correlation between the monthly deaths of region i and region j; whereas corr ij is the corresponding correlations produced by the model (on the specific batch); factor 1 190 normalises the sum ∑ i>j , which comprises 190 terms. We set υ corr = 8.76 × 10 2 .
We monitored the log-likelihood (Equation (22)) on the test data during the training; since it never substantially decreased (that would suggest some level of over-fitting), we interrupted the optimisation after 10 6 steps, when improvement on the training set appeared extremely slow.
All the computations were performed with custom code written in Python; core functions were just-in-time compiled, and their gradient was computed, where necessary, through the Jax package (https://github.com/google/jax, accessed on 9 December 2022). The complete code, reproducing all the reported results, can be found at https://github. com/GuidoGigante/All-cause-mortality-fluctuation-across-Italian-regions, accessed on 9 December 2022.

Results
The course of monthly death rates, normalised to the mean over the entire period, is strikingly similar for different regions. This can be appreciated in Figure 1a, where we show three regions (chosen to be representative of the north-Lombardia, centre-Lazio, and south-Sicilia, of Italy).
Such observations are made more quantitative in Figure 1b, which shows the betweenregion correlation matrix of monthly death rate fluctuations (correlations computed on 108 data points) relative to the different regions. As evident from the figure, the between-region correlations are extremely high (0.865 ± 0.063), with smaller and less densely populated regions (i.e., Valle d'Aosta and Molise) endowed (as expected) by a lower (albeit very significant) average correlation strength (0.739 and 0.805, respectively).
To check if the bi-phasic effect of temperature was sufficient to get rid of the observed correlations (in the presence of a substantially similar age structure across the different Italian regions), a quantitative model taking into account the temperature effect was fitted to the different regions' mortality data (see Section 2).       (1) and (3)). (b) Pairwise between-region correlations for the time series of the deaths when the fitted effect of the temperature is subtracted from the raw numbers. Correlations drastically decrease (colour scale as in Figure 1b) but remain large.
By normalising the time series of death rate fluctuations by temperature effect, the between-region correlation drastically decreases (0.63 ± 0.12), therefore, confirming the expected effect of temperature on mortality (Figure 2b; colour scale as in Figure 1b). Notwithstanding that the residual entity correlation is still high, asking for some other relevant factor to be taken into consideration.
We hypothesise that strong correlations among regions also arise for an infective component, continually spreading from region to region at small time scales (less than a month), driven by the movement of people from one region to another. First, we examined the data about the flux of daily commuters between regions; such data show a clear dependence both on spatial distance (exponential decay; see Figure 3a; the continuous line is the result of a fit, see Equations (4) and (5)) and the GDP of the region of arrival, having the role of an 'attractiveness' factor (linear dependence; see Figure 3b, where the continuous line is the result of a fit; see Equations (6) and (7)).  (6) and (7)).
The two determinants (distance and GDP) are considered together in Figure 4, where the actual number of commuters (from one region to another; the flux is not symmetric) is compared to the result of the fitted model (see Equation (8)); the good agreement of the reconstructed flux with the real one (the continuous line is the identity line) supports the assumptions of the model.
Starting from these results, we make the hypothesis that the total flux of people between regions is made of the commuters flux plus an 'episodic' flux, unknown but with the same functional form (exponential decay with distance; linear dependence on GDP). Then, we built, on the total-flux matrix, a SIR-like model [10] that takes into account the exchange of infected people between regions. Temperature impacts the model in two ways. The first increases with the temperature and is akin to the high-temperature (rightmost) arm of the model of Figure 2a. The other modulates the contagiousness of the disease (higher for lower temperatures).
In the model, each month a new disease starts spreading from a given region (chosen according to a probability distribution); at the end of the month, a fraction of the 'recovered' people dies. Added to this effect are the high-temperature mortality and, finally, a generic, temperature-independent, region-specific mortality.

Discussion
We aptly reconstructed the strong correlation among temporal series of all-cause monthly death rates relative to the 20 Italian regions by a model encompassing noninfectious (mainly summer) and infectious (winter) components. This last component was modelled in terms of a set of SIR equations taking into account both the across-regions commuter exchange (daily flow) and the more irregular traveller flux (longer time flow). The different mathematical treatments of summer and winter components allowed for a neat increase in the reconstruction of both global death rates and among-regionscorrelation strength concerning crude seasonality.
The reconstruction of the observed correlation network by our model (Figure 6), excluding the hypothesis of contemporary arising 'epidemic sources' across all the regions (that have near-zero probability), confirms the reliability of the proposed model. Overall, we can consider Italy as a proper 'integrated system' that, thanks to both a rich exchange flux among regions and the sharing of 'heat waves', reaches a general coherence in death rate fluctuations. This coherent behaviour acts as a largely invariant 'mean field' governing all-cause monthly (and thus unaffected by longer time fluctuations in age class distribution) death rate fluctuations.
The existence of a very stable correlation network among Italian regions can be profitably used as a tool for the epidemiological surveillance of the territory: the arising of an anomalous value of the correlation degree of a region can be intended as the presence of an emerging source of risk (of both infectious and/or environmental origin). Thanks to the intrinsic redundancy of the correlation matrix, any (even transient) change reverberate on the entire network, allowing for a more sensible detection of instabilities: one clear example is the case of Recurrence Quantification Analysis (RQA, [12]). RQA relies upon the construction of a distance (correlation in the case of angular metrics) matrix between subsequent epochs of a time (or space [13])-dependent signal. When in the presence of regime changes driven by a slowly varying control parameter, RQA metrics (at odds with usual statistical indexes) exactly determine the entity and time (spatial) location of regime change [14,15]. Analogous considerations hold for correlation matrices and any other network system [16] induced by external (e.g., epidemics) or internal (e.g., deterioration of living conditions) driving forces [17].
The application of statistical-mechanics-inspired tools in public health is still in its infancy [18,19] and/or confined to very specific issues [20,21]. In this work, relying upon a very general consideration by Alexander Gorban and colleagues, 'It is useful to analyse correlation graphs' [22], we demonstrate how a raw (albeit very reliable) indicator, allcause mortality, is amenable to a statistical mechanics approach opening new avenues to epidemiological and environmental research.

Conclusions
All-cause mortality is considered a general indicator of the general health status of a population in the context of a particular age structure. This is why many epidemiological studies investigate both temporal and spatial fluctuation of all-cause mortality, looking for a correlation with socioeconomic [23], environmental [24][25][26] and physiological/pathological conditions [27].
Moreover, the reliability of all-cause mortality statistics, due to its coarse-grain character, makes the analysis of its fluctuations a very important viewpoint to estimate the impact of epidemic threats [28].
Our approach stems from the above considerations on all-cause mortality to explore a still neglected dimension of this index: the relative weight of purely stochastic and deterministic (coherent) components of spatio-temporal fluctuations of all-cause mortality. The presence of external drivers increasing the correlation of such fluctuations was already observed [24] in terms of departure from 'optimal temperature', here we highlight another correlation source linked to the population fluxes among different areas modelled in terms of both daily commuters and 'relative attractiveness' of different regions in terms of GDP.
This choice allowed us to get rid of a very dense correlation network among the 20 Italian regions for monthly-based fluctuation rates in the 2011-2019 period. The presence of very high temporal correlations among different regions was partly unexpected and constituted a very important result for monitoring the onset of emerging public health threats in terms of alterations of such correlation structures.
Author Contributions: Conceptualisation, methodology and writing, A.G. and G.G.; software, formal analysis and visualisation, G.G. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: The daily deaths for each Italian district ('comune'), starting from 2011, can be downloaded from the site of the Italian Institute of Statistics (ISTAT), following this link: https://www.istat.it/storage/dati_mortalita/Dataset-decessi-comunali-giornalieri_regioni% 28excel%29_5-21-ottobre-2021.zip (accessed on 24 June 2022). The geographical coordinates for each comune can be found in the following GitHub project: https://github.com/MatteoHenryChinaski/ Comuni-Italiani-2018-Sql-Json-excel, notably the file italy_geo.xlsx (accessed on 24 June 2022). The data concerning the temperature have been obtained, on 23 June 2022, from the National Centers for Environmental Information (https://www.noaa.gov/), with the following order specifications: Begin date: 2012-01-01 00:00; End date: 2021-12-31 23:59; Data types: PRCP, SNWD, TAVG, TMAX, TMIN; Custom Flags: Station Name, Geographic Location, Include Data Flags. The number of daily commuters between regions has been obtained from the data made available by ISTAT at the link: https://www.istat.it/storage/cartografia/matrici_pendolarismo/matrici_pendolarismo_20 11.zip (data description can be found at https://www.istat.it/it/archivio/157423; accessed on 28 June 2022). The number of commuters we used is the sum of commuters from any comune belonging to one region towards any comune of another region. The GDP for each region has been retrieved from the Wikipedia page https://it.wikipedia.org/wiki/Regioni_d%27Italia (GDP is 'Prodotto interno lordo' or PIL, in Italian; accessed on 10 October 2022).

Conflicts of Interest:
The authors declare no conflict of interest.