Comparison of Main Covid-19 Outbreaks and Interpretation Based on Age Differences

Abstract: The main scope of this study is a critical comparison of data from different regions of the world where significant outbreaks of the Covid-19 pandemic took place, accounting for age differences among the considered samples. Scaling laws are derived to guide the interpretation of the death toll in the analyzed clusters.


Introduction
The Covid-19 epidemic stands as a unique case in human history: it is the first time that a very infectious pathology has spread within a highly globalized environment. After the first outbreak in the Hubei region of China, the virus hit the Italian region of Lombardy in March. Since no past record can be retrieved, one way to understand the Italian spread rate is to use the only previously available data: those from China, where the epidemic took place more than one month earlier.
According to official data, the death toll in Lombardy was surprisingly high compared with that in Hubei. Despite a couple of differences, at first approximation the two outbreaks are comparable, and a similar number of cumulative deaths was expected. In particular, this should have happened in the first period, before lockdown measures or differences in the health systems start to bias the time evolution of the data.
The goal of this study is to show that the difference in the cumulative death toll between Lombardy and Hubei can be explained mostly by the age difference between the two populations.

Samples and Variables
The comparison was made between the single outbreaks experienced in January and March, respectively in Hubei and Lombardy. Table 1 shows features of the two regions. Despite differences in population and area extents, the population density is comparable.
The comparison could be done on three quantities:
• number of infected people;
• number of people in intensive care;
• number of dead people.
The number of infected people is not a good universal estimator, as it depends on the number of people tested and, in turn, on how people are tested and declared positive: for instance, after 12 February, the number of positive cases in Hubei was determined using only symptomatology and chest X-rays, without nasopharyngeal swab tests.
The number of people in intensive care depends on the bed capacity, which in turn is difficult to compare between countries with different healthcare systems and services. In light of these considerations, the number of dead people turns out to be the crudest but also the most significant estimator, and it is the one adopted in the present work for a quantitative comparison. Figure 1 shows the comparison of the cumulative number of deaths between Lombardy [4] (blue) and Hubei [5] (red), while Figure 2 shows the comparison of the two-day number of victims in the two regions. The first day considered is conventionally the one on which 100 cumulative deaths were exceeded: 28 January in Hubei and 6 March in Lombardy, 38 days later. Figures 1 and 2 clearly show that the number of deaths in Lombardy exceeded that in Hubei. Different hypotheses have been considered to understand this behavior:
• lockdown decision delayed for too long in Italy;
• Italian healthcare system not sufficiently equipped, or unfit, to face an epidemic;
• differences in the mean population age between China and Italy.
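The two-day number of victims plotted in Figure 2 can be derived from a daily cumulative series. The following is a minimal sketch of one way to do this; the function name and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
def two_day_increments(cumulative):
    """Return the number of new deaths in consecutive two-day windows.

    `cumulative` is a list of daily cumulative death counts.
    Window i covers the two days following index 2*i.
    A trailing unpaired day is ignored.
    """
    return [cumulative[i + 2] - cumulative[i]
            for i in range(0, len(cumulative) - 2, 2)]


# Toy cumulative series (hypothetical values, for illustration only)
toy_cumulative = [100, 130, 170, 220, 280]
toy_two_day = two_day_increments(toy_cumulative)
```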

Age Ranges
The goal of the present work is to understand whether, at first approximation, the difference between Hubei and Lombardy in the cumulative death toll can be explained by the age difference.
The mean population age in China is 37 years, as reported in 2019 [6], while in Italy it is 45 years, as reported in 2018 [7]. Correspondingly, during the first months of the pandemic, the mean victim age (MVA) in Italy was around 80 years [8], while for China, from data sorted according to population age [9], the MVA is estimated to be 70 years. Similar age differences are found for the mean age of infected people: about 50 years in China [10] and 62 years in Italy [8]. Table 2 compares the mean ages in the two countries during the early months of the pandemic. Figure 3 shows the distribution of the number of victims per age category [8,9]. The mean victim age is the quantity used in the following.

Methodology
The idea of this work is to use the Hubei data as the starting point to build a model to be compared with other outbreaks, in particular with the Italian one, under the hypothesis that at first approximation the number of victims in different outbreaks depends mainly on the age distribution of deaths. Linear extrapolation is used to morph the Hubei data distribution to a different age range. This is a common methodology in Physics [11,12], also widely used in most other research fields. The extrapolation is performed daily, multiplying the actual number of victims by a proper scale factor, to obtain the time evolution of the cumulative number of victims corrected for a population age different from that of the starting sample. Here, the actual number of deaths in Hubei, $N^{real}_{death}$, is rescaled as if the population of this region had a different mean age. The scale factor $S_k$ is the parameter that needs to be estimated. In the present work, it is defined as

$$S_k = \frac{M_k}{M_{real}} \qquad (1)$$

where $M_k$ is the mortality (namely the ratio between the number of deaths and the number of infected cases) for the specified age range $k$, and $M_{real}$ is the mortality for the real age range of the starting sample. Available data [10] on the population mortality $M$ are stratified by age in ranges of ten years, and mortality can be obtained in different ways: for the moment the crude estimate method is used, so mortality is simply the ratio between the numbers of deaths and of infected people. The impact of this choice is discussed in Section 4.
Since in China the MVA is about 70 years (as discussed in Section 2.2), three hypotheses are made for the China age range: the 60-70 range, the 70-80 range, and the average of the two (60-80). Combining these with three hypotheses for the Italy age range, a total of nine rescale factors can be defined.
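The rescaling described above amounts to multiplying each daily entry of the starting sample by a ratio of mortalities. A minimal sketch follows; the function names and all numerical values are hypothetical placeholders chosen for illustration, not values from the paper.

```python
def scale_factor(m_target, m_real):
    """Ratio between the mortality of the target age range and the
    mortality of the real age range of the starting sample."""
    if m_real <= 0:
        raise ValueError("m_real must be positive")
    return m_target / m_real


def rescale(cumulative_deaths, s):
    """Apply the same scale factor to every daily entry."""
    return [s * n for n in cumulative_deaths]


# Hypothetical placeholder mortalities and counts (not from the paper)
m_real_example = 0.04      # mortality assumed for the starting sample
m_target_example = 0.08    # mortality assumed for the target age range
starting_sample = [100, 150, 210]

s_example = scale_factor(m_target_example, m_real_example)
rescaled_sample = rescale(starting_sample, s_example)
```

Because the same factor is applied to every day, the shape of the curve is unchanged and only its normalization varies.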

Analysis
Data from the Hubei region provide the model to be compared to those from the Lombardy region. In principle, this comparison can be performed based on nine possible age assignment combinations, as there are three range hypotheses for both China and Italy.
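The nine combinations can be enumerated mechanically. The sketch below only lists the labels of the age ranges named in this paper; the variable names are assumptions made for illustration.

```python
from itertools import product

# Age range hypotheses as named in the text
china_ranges = ["60-70", "60-80", "70-80"]
italy_ranges = ["70-80", "70-100", "80-100"]

# Each pair (italy, china) labels one scale factor:
# Italy's range is the numerator, China's range the denominator.
scale_factor_labels = [(italy, china)
                       for china, italy in product(china_ranges, italy_ranges)]
```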

China in 60-70 Range
The first hypothesis is the assignment of the 60-70 range to China, so that the mortality of the starting sample is $M_{real} = M_{60-70}$. The three assignments of the Italian age range are discussed in the following.

Italian Optimistic Assignment (C-Rescale)
The first hypothesis is to assign the 70-80 years range mortality to Italy. The actual number of deaths in Hubei, $N^{real}_{death}$, is then rescaled as if the population of this region were 10 years older, adopting the 70-80 range instead of the 60-70 range previously considered, since the MVA in China is about 70 years. This estimate uses the mortality ratio between the two intervals in Hubei as the scale factor:

$$S^{70-80}_{60-70} = \frac{M_{70-80}}{M_{60-70}} \qquad (2)$$

This scale factor is defined as a conservative rescale (C-rescale). The uncertainties on the mortality $M_k$ are evaluated using the Gaussian approximation from [10], and the uncertainty on the scale factor is determined through error propagation. Figure 1 shows the comparison of the cumulative number of deaths among Lombardy (blue), actual Hubei (red) and Hubei rescaled according to Equation (2) (black), while Figure 2 shows the comparison of the two-day number of deaths among the same samples.
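The error propagation on a ratio of two mortalities can be sketched as follows. This assumes uncorrelated Gaussian uncertainties on numerator and denominator, which is an assumption of this example and not necessarily the exact treatment used by the authors; the input numbers are hypothetical.

```python
import math


def ratio_with_uncertainty(num, sigma_num, den, sigma_den):
    """Return (ratio, sigma_ratio) for ratio = num / den.

    Standard first-order propagation for uncorrelated inputs:
    the relative uncertainties add in quadrature.
    """
    ratio = num / den
    relative = math.sqrt((sigma_num / num) ** 2 + (sigma_den / den) ** 2)
    return ratio, ratio * relative


# Hypothetical mortalities in percent (placeholders, not from the paper)
s_value, s_sigma = ratio_with_uncertainty(8.0, 0.4, 4.0, 0.3)
```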

Italian Average Assignment (M-Rescale)
Since in Italy the MVA is about 80 years, assigning the 70-80 years range mortality in the numerator of Equation (2) is a slightly optimistic choice, as mentioned before.
A different hypothesis [10] is to use the average mortality of the 70-80 and 80-100 ranges, i.e., $M_{70-100} = (11.7 \pm 0.2)\%$, so that the scale factor becomes $S^{70-100}_{60-70} = M_{70-100}/M_{60-70}$, where again the uncertainties are obtained in the Gaussian approximation. This choice of an average mortality is defined as M-rescale. Figure 1 shows the comparison between this estimate (gray) and those previously discussed in terms of the cumulative number of deaths. Figure 2 shows the comparison of the two-day number of deaths among the same samples.

Italian Pessimistic Assignment (NC-Rescale)
The third hypothesis assigns the 80-100 years range mortality to the Lombardy region, $M_{80-100} = (15.0 \pm 1.5)\%$, so that the related scale factor is $S^{80-100}_{60-70} = M_{80-100}/M_{60-70}$; it is defined as a pessimistic, non-conservative rescale factor (NC-rescale). Figure 1 shows the comparison between this estimate (brown) and the previously discussed ones for the cumulative number of deaths. The growth curves of Lombardy and NC-rescaled Hubei have a very similar behavior. Figure 1 also suggests that, for a more consistent comparison, the counts must be properly treated as a function of time. This is explained in Section 3.1.4.

Time Offset
Since the starting day for each population is a conventional choice, it is possible to shift one distribution in time with respect to the others: in this way, the first day is synchronized among the different samples. The adopted time offset is the first day on which both Lombardy and NC-rescaled Hubei exceed 1500 deaths, instead of the 100 deaths previously used. This approach is equivalent to moving the Lombardy distribution 5 days back in time, namely starting from 11 March. As a result of this choice, the agreement between Lombardy and NC-rescaled Hubei is very good, especially in the rising part of the curve, as shown in Figure 4.
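The synchronization can be sketched as follows: each series is cut at the first day on which it exceeds a common threshold, and the two are then compared day by day. The helper names and the toy series are assumptions for illustration only.

```python
def first_day_above(cumulative, threshold):
    """Index of the first entry strictly above `threshold`, or None."""
    for day, count in enumerate(cumulative):
        if count > threshold:
            return day
    return None


def synchronize(series_a, series_b, threshold):
    """Cut both series at their first day above `threshold` and
    truncate them to a common length."""
    start_a = first_day_above(series_a, threshold)
    start_b = first_day_above(series_b, threshold)
    if start_a is None or start_b is None:
        raise ValueError("a series never exceeds the threshold")
    a, b = series_a[start_a:], series_b[start_b:]
    n = min(len(a), len(b))
    return a[:n], b[:n]


# Toy cumulative series (hypothetical values)
toy_a = [120, 300, 520, 800, 1100]
toy_b = [510, 900, 1400]
sync_a, sync_b = synchronize(toy_a, toy_b, threshold=500)
```

In this toy case the first series is shifted by two days, so that both start from their first entry above the threshold.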
The comparison between Lombardy and NC-rescaled Hubei is studied in detail by evaluating the pull in Figure 5 (left panel), namely the difference divided by the uncertainty, dominated by the systematic one associated with the scale factor, and the ratio of the two distribution entries in Figure 5 (right panel). In this latter case, a constant term is fit to the ratio, to estimate the fractional difference between the two distributions, at least starting from a time period where the agreement tends to be good: from a common time defined as the 5th day, the two distributions have an average overall difference of 4.2%.
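The pull and the constant fit to the ratio can be sketched as below. The constant is obtained here as an inverse-variance weighted mean, which is the least-squares solution for a constant term. Treating the observed counts as exact and propagating only the model uncertainty to the ratio is a simplifying assumption of this example, and all numbers are hypothetical.

```python
import math


def pulls(observed, model, sigma):
    """Difference between observed and model, divided by the uncertainty."""
    return [(o - m) / s for o, m, s in zip(observed, model, sigma)]


def fit_constant(values, sigmas):
    """Weighted least-squares fit of a constant: returns (value, error)."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Toy inputs (hypothetical values)
observed = [110.0, 205.0, 330.0]
model = [100.0, 200.0, 300.0]
sigma_model = [10.0, 20.0, 30.0]

pull_values = pulls(observed, model, sigma_model)
ratios = [o / m for o, m in zip(observed, model)]
# Assumption: only the model uncertainty contributes to the ratio
ratio_sigmas = [r * s / m for r, s, m in zip(ratios, sigma_model, model)]
constant, constant_err = fit_constant(ratios, ratio_sigmas)
```

The fractional difference between the two distributions is then the fitted constant minus one.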

Other Age Assignments for China
As already discussed in Section 2.3, two other hypotheses can be made for the China age range: the 70-80 range ($M_{real} = M_{70-80}$) and the average one, where $M_{real} = M_{60-80}$. Since three age ranges are also possible for Italy, there are six scale factor combinations in addition to the three already discussed in Section 3.1. One of them is compatible, within uncertainty, with the C-rescale factor, so the result is almost the same as that discussed in Section 3.1.1. All other values are lower than the result discussed in Section 3.1, as expected. Since good agreement with the Lombardy data is found only with the highest scale factor, any other value results in a disfavored fit and is therefore not investigated further.

Systematic Uncertainties Evaluation and Control Sample
The main systematic uncertainty of this study is due to the scale factor calculation that depends on the mortality estimate method.
It is possible to vary the criterion used for the mortality calculation to check the dependence of the scale factor on it. In reference [10] four different criteria are presented:
• crude estimation (CrE), used in the analysis;
• adjusted for delayed mortality (ADJ1);
• adjusted for unidentified symptomatic cases (ADJ2);
• adjusted for both (ADJ3).
Details in the description of these criteria are beyond the scope of the present analysis. The main message conveyed here is that a different scale factor is associated with each of the four criteria.
Starting with the conservative scaling, the values of $S^{70-80}_{60-70}$ obtained with the four criteria are in very good agreement with one another within the uncertainties. For the non-conservative scaling, namely the scale factor that yields the best agreement between Lombardy and rescaled Hubei, the values of $S^{80-100}_{60-70}$ are likewise in very good agreement with one another within the uncertainties, whatever criterion is chosen.
Since it is just the average of the previous results, the M-rescale factor turns out by construction to be consistent across the four criteria, and no explicit check is reported. As a result of these checks, no dependence of the scale factors on the mortality estimate method is found.
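A check of this kind can be sketched as a pairwise compatibility test among the estimates. The criterion used here (difference within one combined standard deviation, uncorrelated uncertainties) and the numerical values are assumptions of this example, not the authors' procedure or results.

```python
import math
from itertools import combinations


def compatible(a, sigma_a, b, sigma_b, n_sigma=1.0):
    """True if two estimates agree within n_sigma combined uncertainties."""
    return abs(a - b) <= n_sigma * math.sqrt(sigma_a ** 2 + sigma_b ** 2)


# Hypothetical (value, uncertainty) pairs for the four criteria
estimates = {
    "CrE": (2.00, 0.20),
    "ADJ1": (2.10, 0.30),
    "ADJ2": (1.90, 0.20),
    "ADJ3": (2.05, 0.30),
}

all_compatible = all(
    compatible(*estimates[p], *estimates[q])
    for p, q in combinations(estimates, 2)
)
```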
To validate the concept behind the presented scaling technique, which is based on demographic considerations, it is important to identify an independent control sample with features similar, but not completely equivalent, to those of Hubei and Lombardy: population density, healthcare system, single outbreak, mean population age, sufficient statistics. The selected control sample is the state of New Jersey in the USA, as it has features similar to those of Lombardy. Table 3 shows the main features of interest for the present analysis. Since the New Jersey MVA is slightly lower than, but close to, 80 years, agreement with either the C-rescaled or the M-rescaled Hubei is expected. Figure 4 shows all results discussed so far on the cumulative number of victims, plus the curve for New Jersey. For the latter, the starting day is 27 March, when more than 100 deaths had been recorded.
Good agreement with the C-rescaled Hubei behavior is found during the first four weeks, as expected. After this first month, the curve suddenly changes slope and starts to grow as fast as the M-rescaled Hubei curve. This can be understood considering that the New Jersey MVA is close to the upper limit of the 70-80 years range, so that a good description lies between the C-rescaled and M-rescaled Hubei distributions. The steep rise of victims after about one month in New Jersey is also probably due to differences between the lockdown procedure implemented in the USA and the rigid one applied in China. With respect to Lombardy, possible sources of difference are to be sought in the younger mean population age, which is closely related to a younger mean infected age.

Results
This study shows that, to a very good approximation, the difference in the cumulative number of deaths between Lombardy and Hubei can be explained by taking into account only the age difference between the two populations, and rescaling by the ratio of the mortalities of the two samples. Independence from the mortality calculation method is also verified. Nine scaling possibilities are considered, depending on the age range chosen for both the Chinese and the Italian mortality. After correcting the Lombardy starting day to achieve consistent time synchronization with Hubei, quite good agreement is obtained between the victim growth curves of the two populations. This happens with the largest rescale factor among the tested hypotheses. Since agreement requires the largest factor, the mean age difference alone does not exhaustively explain the analyzed behaviors.
One control sample is used to test the main concept behind this procedure: New Jersey in the USA, which has area and population density values similar to those of Lombardy. Good agreement with the conservatively rescaled Hubei is found, as expected, since both the mean population age and the mean victim age in New Jersey are lower than in Italy. Another relevant aspect is that lockdown measures in the USA were generally softer than in China. Of course, every single outbreak has aspects that cannot be exhaustively explained by demographic considerations alone.
The same methodology can be applied to other studies that compare data from the Covid-19 pandemic, after having quantitatively investigated and identified possible sources of difference between two outbreaks sharing similar properties, such as the population density.
In the present work, an age interval is assigned to each population. Then, the mortality ratio of the two samples for the different age ranges is used to estimate scale factors that can be applied to one or both samples, for an improved comparison.
In conclusion, a simple method of rescaling according to mortality factors can explain large portions of the death toll rise in different countries, with similar demographic characteristics in terms of area and population density.