An Ultrametric Random Walk Model for Disease Spread Taking into Account Social Clustering of the Population

We present a mathematical model of disease (say a virus) spread that takes into account the hierarchic structure of social clusters in a population. It describes the dependence of the epidemic's dynamics on the strength of barriers between clusters. These barriers are established by authorities as preventative measures; partially they are based on existing socio-economic conditions. We apply the theory of random walks on energy landscapes represented by ultrametric spaces (having tree-like geometry). This is a part of statistical physics with applications to spin glasses and protein dynamics. To move from one social cluster (valley) to another, a virus (its carrier) must cross the social barrier between them. The magnitude of a barrier depends on the number of social hierarchy levels composing this barrier. Infection spreads rather easily inside a social cluster (say a working collective), but jumps to other clusters are constrained by social barriers. The model implies the power law, $1 - t^{-a}$, for approaching herd immunity, where the parameter $a$ is proportional to the inverse of the one-step barrier $\Delta$. We consider linearly increasing barriers (with respect to hierarchy), i.e., the $m$-step barrier $\Delta_m = m\Delta$. We also introduce a quantity characterizing the process of infection distribution from one level of social hierarchy to the nearest lower levels, the spreading entropy $E$. The parameter $a$ is proportional to $E$.


Introduction
In this paper we present a new mathematical model of disease (say a virus) spread that differs from the standard SIR-like models [1][2][3][4] (see also [5][6][7][8][9][10] for recent applications of these models to the COVID-19 epidemic). Its main assumption is the heterogeneity of the population with respect to disease spread: the population splits into social clusters with a hierarchic tree-like structure. Mathematically our model is based on the theory of random walks on hierarchic tree-like structures, ultrametric spaces. We use the results of the well-known paper [11] on random walks on p-adic trees, the simplest hierarchic structures endowed with an ultrametric. (We recall that a p-adic tree, where $p > 1$ is a natural number, is the tree with $p$ branches leaving each vertex.) The main outputs of the paper that may interest epidemiologists are the graphs presented in Section 4. They show the power law for the asymptotics of the probability $P_I(C, t)$ to become infected in social cluster $C$ in the long run of an epidemic, that is, at the phase of approaching some level of immunity:

$$P_I(C, t) \sim t^{-a}, \quad t \to \infty, \qquad (1)$$

where $a$ is a parameter depending on the magnitude of a social barrier (see (3)). Hence, we can model the dependence of $P_I(C, t)$ on the strength of preventative anti-epidemic measures established by state authorities (see Figures 2 and 3). We quantify herd immunity with the probability

$$P_{Im}(C, t) = 1 - P_I(C, t) \sim 1 - t^{-a}. \qquad (2)$$

This is a sort of "integral immunity", a combination of innate and adaptive components. For small values of the parameter $a$, this function increases very slowly; cf. Britton's analysis [8] demonstrating that induced herd immunity level for COVID-19 in Sweden is substantially slower than the classical herd immunity level.
As was pointed out, our basic assumption is that a virus spreads in a population having the structure of hierarchically coupled social clusters (see Appendix A for information about such clustering of infection during the COVID-19 epidemic). A social cluster can be a collective of some enterprise or a state department, say clerks of the community office of some town or the personnel of some hospital. Inside such a cluster people still have a relatively high degree of social connection (see preprint [12], Appendixes 1, 2, for details). However, during an epidemic authorities erect sufficiently high barriers between clusters and people terminate many sorts of social contacts. Thus, we proceed with the following basic assumption. Assumption 1. Disease spread is coupled to the hierarchic social cluster structure of the population.
Starting with this assumption, we design a mathematical model implying the following: Consequence 1. The epidemic has a relatively slow decay and approach to herd immunity (for sufficiently high preventative barriers).
The problem of approaching herd immunity is very important for diseases coming as repeating waves: the first wave, the second wave and so on. Herd immunity is mathematically formalized through the probability $P_I(C, t)$ (see Figure 2, Section 4).
In our model a virus (or its carrier) walks randomly in a socially clustered society. It starts from a single social cluster (this assumption is used for mathematical simplicity); it spreads relatively easily inside any cluster, but to reach other social clusters it should "jump over social barriers". For linearly increasing (with respect to hierarchy) barriers between clusters, the basic parameter $a$ of the model (see (1)) has the form:

$$a = \frac{\ln p}{\Delta}. \qquad (3)$$

Here $\Delta$ is the magnitude of the elementary barrier for hopping between nearest social levels. A higher social barrier $\Delta$ implies slower growth of herd immunity. (In the physics of spin glasses, this parameter has the form $a = T \log p/\Delta$, where $T$ is temperature. In [12], we proceeded as in physics by introducing a social analog of temperature. However, the notion of social temperature needs further justification and in the present paper we preferred to proceed without it.) The quantity $\ln p$ can be interpreted as the entropy of the process of a virus spreading into the subclusters of a cluster, the spreading entropy; see (17). Thus, larger spreading entropy of the social cluster tree implies quicker herd immunity. The configuration space of the dynamics is the tree of social connections between people. In epidemic modeling, it is natural to assume the presence of a hierarchic structure in the social clusters of people, obtained by ranking the basic social parameters coupled to infection (see Section 2 and Appendix A for examples).
This representation of social types as vectors of hierarchically ordered social coordinates has already been used by the first author and his collaborators in a series of studies in cognition, psychology and sociology [19][20][21][22][23][24].
Just before submission of his preprint [12], the first author found the recent paper of Britton et al. [25] in which the role of population heterogeneity in the spread of COVID-19 was analyzed. We remark that Britton contributed a lot to the mathematical modeling of COVID-19 spread in Sweden. His models [7,8] were used by chief epidemiologist Anders Tegnell to justify the Swedish policy with respect to the epidemic: no lockdown, given the expectation of rapidly approaching herd immunity. On the basis of Britton's models, the Swedish State Health Authority predicted (at the end of April 2020) that herd immunity would be approached already in May. However, this prognosis did not match the real situation and herd immunity was not approached either in May or in June (see, e.g., [26][27][28] for reports from the Public Health Institute of Sweden; see also [29][30][31][32][33]). In previous modeling [7,8] of the COVID-19 epidemic, the Swedish population was considered homogeneous. In [25], the heterogeneity of the population was considered as an important factor; the model involves two "social coordinates" (in our terminology, see Section 2): social activity and age.
Taking into account the social clustering of the population is the basic similarity of our models (see also Appendix A), and generally paper [25] supports our approach. The main difference is that in [25] the hierarchic structure of social clustering, and hence the hierarchy of barriers between clusters, is not taken into account. Another crucial difference is in the mathematical methods, based on the real metric vs. the ultrametric. Surprisingly, these two totally different mathematical models lead to graphs of the same shape; see Figure 3 (Section 4) and Figure S2 of the Supplementary Material of [25] (see also the remark after Figure 3).
Both models provide the possibility to play with the strengths of preventive measures and see their effects on the epidemics' dynamics.
We do not want to overshadow our model of disease spread with mathematical technicalities. Therefore, we appeal to the simplest theory developed in [11], random walks on p-branching trees, where $p$ is the same for all vertexes. The general mathematical theory is based on the theory of diffusion equations over the field of p-adic numbers $Q_p$; see, e.g., the pioneering papers [34][35][36][37][38][39][40]. We remark that, in contrast to the present paper, the use of the p-adic diffusion equation is restricted by the constraint "$p$ is a prime number". The latter is crucial for defining the operation of division on a p-adic tree and determining the structure of the number field. Following [11], we proceed with an arbitrary natural branching number $p > 1$. Of course, in real applications the branching number can depend on the vertex. The corresponding mathematical theory is more complicated [38,39].
Finally, we point out that ultrametric dynamical models for hierarchic clustering have recently started to be used in geophysics, with applications to petroleum research; see, e.g., [41][42][43] on the propagation of oil through capillary networks in porous disordered media.

Social Trees
We represent human society as a system of hierarchically coupled clusters. Each cluster can be represented as a disjoint union of sub-clusters corresponding to the next level of hierarchy. Such population clustering can be done in many ways. We explore the approach that was used in a series of works on ultrametric modeling in cognition and sociology (see, e.g., [19][20][21][22][23]). The tree-like representation of social types is based on the selection of hierarchically ordered social factors enumerated as $m = 0, 1, 2, 3, \ldots$; factor $m = 0$ is the most important, $m = 1$ is less important and so on. A social type is represented by a vector

$$x = (x_0, x_1, \ldots, x_{n-1}), \qquad (4)$$

where its coordinates $x_m$ take (typically) discrete values quantifying the $m$-th factor. In the simplest case, $x_m$ takes two values, "yes" or "no"; 1 or 0. We call the numbers $(x_m)$ social coordinates. The vector representation of social types and individuals is widely used in sociology and psychology. The main distinguishing feature of our model is endowing the space of vectors with a special metric reflecting the hierarchic structure corresponding to the order of social factors. The space of all vectors of the form (4) is called the hierarchic social space. Since states play the crucial role as the authorities determining epidemic policy, it is natural to select the most important index $m = 0$ as a label for the states. However, we proceed with modeling the epidemic situation in a fixed state and, influenced by the lessons of the COVID-19 epidemic, we use index $m = 0$ for an individual's age, and then $m = 1$ for the presence of a chronic disease, $m = 2$ for gender, $m = 3$ for race, $m = 4$ for a town, $m = 5$ for a district, $m = 6$ for profession, $m = 7$ for the level of social activity, $m = 8$ for the number of children and so on. We understand that this ranking of the basic social factors related to disease spread is incomplete (see Appendix B for further discussion).
The contributions of sociologists, psychologists and epidemiologists can improve the present model substantially; see also the recent article [44] on a mathematical model of the evolutionary creation of social types and the contribution of genetics and natural selection.
For mathematical simplicity, we consider p-adic coordinates, $x_m = 0, 1, \ldots, p - 1$, where $p > 1$ is a natural number. The space of all such vectors is denoted by the symbol $Z_{p;n}$ ($p$ is fixed); this is the p-adic social space. We now turn to the definition of a metric on $Z_{p;n}$ corresponding to the hierarchy of coordinates. Consider two social vectors $x = (x_0, \ldots, x_{n-1})$ and $y = (y_0, \ldots, y_{n-1})$. Let their first $k$ coordinates be equal, $x_0 = y_0, \ldots, x_{k-1} = y_{k-1}$, but the $k$th coordinates be different, $x_k \neq y_k$. Then the hierarchic social distance between these social types is $d(x, y) = n - k$. The first social coordinates are the most important: the common initial segment of the vectors corresponds to a closer social sphere; an increase of $k$ implies a decrease of the distance between two social types. For example, let $k = n - 1$; i.e., two points differ only by the last coordinate; then $d(x, y) = 1$. This is the minimal possible distance between distinct points in $Z_{p;n}$. (The coordinate $x_{n-1}$ has the minimal degree of importance.) If the vectors differ already by the first coordinate, i.e., $x_0 \neq y_0$, then $d(x, y) = n$. This is the maximal possible distance between points in the space $Z_{p;n}$. The distance $d$ is an ultrametric; it satisfies the strong triangle inequality:

$$d(x, y) \leq \max[d(x, z), d(z, y)] \qquad (5)$$

for any triple of points $x, y, z \in Z_{p;n}$. Here in each triangle, the third side is less than or equal not only to the sum of the two other sides (as usual), but even to their maximum. As usual, in a metric space we can introduce balls, $B_N(a) = \{x \in Z_{p;n} : d(a, x) \leq N\}$, where $N = 1, \ldots, n$, and $a = (a_0, \ldots, a_{n-1})$ is some point in $Z_{p;n}$, the ball's center. In an ultrametric space, any two balls are either disjoint or one is contained in the other, and any point of a ball can be selected as its center.
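The definition of the hierarchic distance can be checked on a small example. The following is a minimal Python sketch, not part of the original model code; the function names `social_distance` and `is_ultrametric` and the choice $p = 2$, $n = 3$ are illustrative assumptions. It computes $d(x, y) = n - k$ from the first differing coordinate and verifies the strong triangle inequality by brute force over all triples.

```python
from itertools import product


def social_distance(x, y):
    """Hierarchic distance d(x, y) = n - k, where k is the index of the
    first coordinate in which x and y differ (0 if the vectors coincide)."""
    if len(x) != len(y):
        raise ValueError("social vectors must have the same length")
    n = len(x)
    for k, (xk, yk) in enumerate(zip(x, y)):
        if xk != yk:
            return n - k
    return 0


def is_ultrametric(points):
    """Brute-force check of d(x, y) <= max(d(x, z), d(z, y)) on all triples."""
    return all(
        social_distance(x, y) <= max(social_distance(x, z), social_distance(z, y))
        for x in points
        for y in points
        for z in points
    )


# All social types of Z_{p;n}; p = 2 and n = 3 are illustrative values.
p, n = 2, 3
space = list(product(range(p), repeat=n))

print(social_distance((0, 1, 0), (0, 1, 1)))  # vectors differ only in the last coordinate
print(social_distance((0, 1, 0), (1, 1, 0)))  # vectors differ already in the first coordinate
print(is_ultrametric(space))
```

The brute-force check is cubic in the number of points, so it is meant only for small trees.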
For our modeling, it is important that the space $Z_{p;n}$ can be split into disjoint social clusters. (As we shall see soon, these clusters are, in fact, balls.) Each cluster is determined by fixing the first few (the most important) social coordinates,

$$Z_{p;n} = \bigcup_{j=0}^{p-1} C_j, \qquad (6)$$

where $C_j = \{x : x_0 = j\}$. This cluster representation corresponds to the first level of social hierarchy; we distinguish points by their most important coordinate. Each of the clusters $C_j$ can be represented similarly as

$$C_j = \bigcup_{i=0}^{p-1} C_{ji}, \qquad (7)$$

where $C_{ji} = \{x : x_0 = j, x_1 = i\}$ are clusters of the deeper hierarchic level and so on, up to the single-point clusters corresponding to fixing all social coordinates. Clusters are, in fact, ultrametric balls:

$$C_{i_0 \ldots i_{k-1}} = B_{n-k}(a), \qquad (8)$$

where $a$ is any point of the form $a_0 = i_0, \ldots, a_{k-1} = i_{k-1}$ with arbitrary coordinates $a_j$, $j = k, \ldots, n - 1$.
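The statement that clusters are ultrametric balls, and that any point of a ball can serve as its center, can be illustrated numerically. The sketch below is an illustration under stated assumptions: the helper names `cluster` and `ball`, the values $p = 3$, $n = 4$, and the fixed prefix $(2, 0)$ are all hypothetical choices made for the example.

```python
from itertools import product


def distance(x, y):
    """Hierarchic distance n - k, with k the first differing coordinate."""
    n = len(x)
    for k in range(n):
        if x[k] != y[k]:
            return n - k
    return 0


def cluster(space, prefix):
    """Cluster obtained by fixing the first len(prefix) social coordinates."""
    prefix = tuple(prefix)
    return {x for x in space if x[:len(prefix)] == prefix}


def ball(space, center, radius):
    """Ball B_radius(center) = {x : d(center, x) <= radius}."""
    return {x for x in space if distance(center, x) <= radius}


p, n = 3, 4                      # illustrative tree parameters
space = list(product(range(p), repeat=n))

prefix = (2, 0)                  # fix x_0 = 2, x_1 = 0, so k = 2
k = len(prefix)
c = cluster(space, prefix)

# The cluster coincides with the ball of radius n - k around each of its points.
every_point_is_a_center = all(ball(space, a, n - k) == c for a in c)

# First-level clusters C_j are pairwise disjoint and cover the whole space.
first_level = [cluster(space, (j,)) for j in range(p)]
covers_space = set().union(*first_level) == set(space)
disjoint = sum(len(cj) for cj in first_level) == len(space)

print(len(c), every_point_is_a_center, covers_space, disjoint)
```

With these values the cluster contains $p^{n-k} = 9$ points.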
Geometrically the space $Z_{p;n}$ is represented as a tree with $p$ branches leaving each vertex; see Figure 1 for $Z_{2;3}$. A cluster is a bunch of branches with a common root. By extending this root we split the cluster into sub-clusters. The vertexes of the ground level can be enumerated by natural numbers $x = 0, 1, 2, 3, 4, 5, 6, 7$. The distance between these numbers differs from the usual distance between natural numbers (the real line distance); we have: $d(0, 1) = d(2, 3) = \ldots = d(6, 7) = 1$ (one step of elevation to pass the barrier), $d(0, 2) = d(0, 3) = d(1, 2) = d(1, 3) = 2$, $d(4, 6) = d(4, 7) = d(5, 6) = d(5, 7) = 2$ (two steps of elevation to pass the barrier), $d(0, 6) = \ldots = d(3, 7) = 3$ (three steps of elevation to pass the barrier). This distance is associated with the hierarchic structure of the tree. The enumeration lists the compound branches, i.e., the paths going from the tree's root through intermediate vertexes to the ground level, from the left-hand side to the right-hand side of the tree in Figure 1. This representation of the tree's compound branches is transferred into natural numbers as follows: $x = x_2 + x_1 \cdot 2 + x_0 \cdot 2^2$. Relating this to the epidemic, let $x_0 = 0, 1$ (age: young, old), $x_1 = 0, 1$ (chronic disease: absent, present), $x_2 = 0, 1$ (gender: man, woman). Then, for example, 1 represents a young woman without chronic disease and 7 represents an old woman with chronic disease. Now we consider the procedure of extending a social tree by adding new social coordinates, so from the tree $Z_{p;n}$ to the tree $Z_{p;N}$, where $N > n$. As the result of such an extension, each point of the social space $Z_{p;n}$ becomes a social cluster in the social space $Z_{p;N}$. In principle, it is impossible to determine a social type by fixing any finite number of social coordinates. Hence, we have to consider infinite sequences of coordinates:

$$x = (x_0, x_1, \ldots, x_n, \ldots). \qquad (9)$$

Denote the space of such sequences by the symbol $Z_p$. This is the complete hierarchic social space. Points of finite trees represent social clusters.
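The enumeration of the ground level of $Z_{2;3}$ by the numbers 0, ..., 7 and the distances listed above can be reproduced with a short Python sketch. The function names and the label tuples below are illustrative assumptions; the coordinate meanings (age, chronic disease, gender) follow the example in the text.

```python
def to_number(x, p=2):
    """Map (x_0, ..., x_{n-1}) to x_{n-1} + x_{n-2} p + ... + x_0 p^(n-1)."""
    value = 0
    for coord in x:
        value = value * p + coord
    return value


def to_vector(number, n=3, p=2):
    """Inverse map: natural number -> social vector (x_0, ..., x_{n-1})."""
    digits = []
    for _ in range(n):
        number, remainder = divmod(number, p)
        digits.append(remainder)
    return tuple(reversed(digits))


def tree_distance(a, b, n=3, p=2):
    """Hierarchic distance between ground-level vertexes given as numbers."""
    x, y = to_vector(a, n, p), to_vector(b, n, p)
    for k in range(n):
        if x[k] != y[k]:
            return n - k
    return 0


# Labels for the three binary social coordinates used in the text.
AGE = ("young", "old")
CHRONIC = ("absent", "present")
GENDER = ("man", "woman")


def describe(number):
    """Decode a ground-level vertex of Z_{2;3} into its social type."""
    x0, x1, x2 = to_vector(number)
    return AGE[x0], CHRONIC[x1], GENDER[x2]


print(tree_distance(0, 1), tree_distance(0, 2), tree_distance(0, 6))
print(describe(1))
print(describe(7))
```

Vertexes 3 and 4 are neighbors on the real line but lie at the maximal social distance 3, since they already differ in the first coordinate.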

Probability to Become Infected from the Virus-Random Walk in a Hierarchic Tree of Social Clusters
Consider a tree-like structured population with $n$ levels of social hierarchy. Mathematically this structure is described by the social space $Z_{p;n}$ endowed with the ultrametric $d$. Balls determine the social cluster partitions of $Z_{p;n}$; see (8) and (6), (7).
The fundamental quantity of our modeling of an epidemic is the probability to become infected at the instant of time $t$ for a person belonging to a social cluster $C$ (some ball in social space). Denote this probability by the symbol $P_I(C, t)$. We are interested in its dynamics and, more precisely, in its asymptotic behavior for large $t$. This stage of an epidemic can be considered as the stage of approaching herd immunity. Now we present the interpretation of this probability in terms of the virus's random walk in a population that is tree-like clustered. A virus plays the role of a system moving through barriers in models of dynamics on energy landscapes (see [11][13][14][15][16][17][18] and references therein). In our case, these are social barriers between social clusters of the population. The virus performs a complex random walk inside each social cluster, moving through its sub-clusters, goes out of it and spreads through the whole population; sometimes the virus comes back to the original cluster from other social clusters that have been infected from this initial source of infection, and so on. During this motion the virus should cross numerous social barriers. Denote by $P(C, t)$ the probability to find a virus in social cluster $C$. This probability is interpreted as in the statistical mechanics of gases: as the concentration of virions (virus particles, consisting of nucleic acid surrounded by a protective coat of protein called a capsid) in cluster $C$. Now, we identify the probabilities, $P(C, t) = P_I(C, t)$: the probability to become infected is determined by the concentration of virions in this cluster. Of course, the concentration of virions is coupled with the concentration of infected people, but not in a straightforward way. We do not want to go into detail, since the dynamics of the probability $P(C, t)$ were well studied in physics and microbiology; see [11][13][14][15][16][17][18] and references therein.
The asymptotics for t → ∞ (relaxation regime) depend crucially on the barriers' magnitude and how rapidly they grow up on the way from one cluster to another.

Dynamics of the Probability to Become Infected
As in the previous section, we consider a random walk on a finite tree. Here we follow the paper of Ogielski [11]. Let us consider a finite tree with $n$ levels. Thus, there are $p^n$ points at the last level. They enumerate the total population: $x = 0, \ldots, p^n - 1$.
Let a virus encounter a barrier of size $\Delta_m$ in hopping a distance $m$ (crossing $m$ levels of hierarchy), where $\Delta_1 < \Delta_2 < \ldots < \Delta_m < \ldots$. It is supposed that the barriers $\Delta_m$ are the same for all social clusters, i.e., they depend only on distance, but not on clusters.
Consider the tree in Figure 1. We identify the lengths of branches between vertexes with the magnitudes of barriers. Then the barriers on this tree depend on clusters, so from this viewpoint the social tree is not homogeneous.
The probability to jump over the barrier $\Delta_m$ has the form (up to the normalization constant):

$$P_m \sim e^{-\Delta_m}. \qquad (10)$$

The meaning of this formula is straightforward: the probability to jump over a higher barrier is smaller.
Consider the energy landscape with a uniform barrier $\Delta$ at every branch point; that is, a jump of distance 1 involves surmounting a barrier $\Delta$, a jump of distance 2 a barrier $2\Delta$, and so on. Hence, barriers grow linearly with distance $m$:

$$\Delta_m = m\Delta, \quad m = 1, 2, \ldots \qquad (11)$$

The barriers $\Delta_m$ are sufficiently high, but they still are not walls of the lock-down type. The probability to jump over the barrier $\Delta_m$ has the form (up to the normalization constant):

$$P_m \sim e^{-m\Delta}. \qquad (12)$$

In particular, the probability of jumping to the nearest clusters equals $R = e^{-\Delta}$. It decreases exponentially with an increase of the barrier $\Delta$.
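To get a feeling for how quickly jumps over several hierarchy levels are suppressed, the unnormalized weights for linearly growing barriers can be tabulated. This is a minimal sketch; the function name `jump_weight` and the barrier values are illustrative assumptions, and the normalization constant, which the text leaves unspecified, is deliberately omitted.

```python
import math


def jump_weight(m, delta):
    """Unnormalized weight exp(-m * delta) for crossing m hierarchy levels
    when the barriers grow linearly, Delta_m = m * Delta."""
    if m < 1:
        raise ValueError("m counts hierarchy levels and must be at least 1")
    return math.exp(-m * delta)


# Illustrative one-step barriers (not values taken from the paper).
for delta in (0.5, 1.0, 3.0):
    weights = [jump_weight(m, delta) for m in range(1, 5)]
    # Each extra level multiplies the weight by R = exp(-delta).
    ratio = weights[1] / weights[0]
    print(delta, [f"{w:.2e}" for w in weights], f"R = {ratio:.3f}")
```

Each additional hierarchy level multiplies the weight by the same factor $R = e^{-\Delta}$, so the suppression of long jumps is geometric in $m$.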
The power law asymptotics given by formula (15), see below, are obtained by solving the master equation for the random walk on the p-adic tree. For finite $n$ (the number of levels of the tree $Z_{p;n}$), this equation can be solved exactly (this is the main result of [11]). Consider the initial condition $P(x, 0) = \delta(x)$; we recall that the points of $Z_{p;n}$ can be represented by natural numbers $x = 0, \ldots, p^n - 1$. The solution with this initial condition is given in closed form in [11]; it is expressed through $R = e^{-\Delta}$, the probability of jumping to the nearest clusters. Then one sends $n \to \infty$ and uses that $R < 1$; finally the asymptotic law (15) for $t \to \infty$ can be found. The same asymptotics can be derived for any initial condition of the form $P(x, 0) = \delta(x - y)$, where $y$ is some fixed point of the configuration space $Z_{p;n}$. By using the random walk on the tree with $n$ levels of hierarchy and letting $n \to \infty$, one can derive the following asymptotic behavior of the probability $P(x, t)$ [11], and hence of the probability to become infected $P_I(x, t)$; in our model, the latter is equal to the former:

$$P_I(x, t) = P(x, t) \sim t^{-\ln p/\Delta}, \quad t \to \infty. \qquad (15)$$

Since any social cluster $C$ is given by an ultrametric ball and a ball can be represented as a union of points $x$, the same asymptotics can be derived for any social cluster (ball) $C$:

$$P_I(C, t) \sim t^{-\ln p/\Delta}, \quad t \to \infty. \qquad (16)$$

Set $a = \ln p/\Delta$. If $a \ll 1$, i.e., the primary social barrier $\Delta$ is relatively large, then the probability for a person in the social cluster $C$ to become infected decreases rather slowly; see Figure 2. Hence, immunity also increases slowly (see Figure 3), as the function

$$P_{Im}(C, t) = 1 - P_I(C, t) \sim 1 - t^{-a}.$$

Thus, for the low preventive level ∆ = B, herd immunity increases sufficiently quickly; increasing the preventive level makes the growth of herd immunity essentially slower.
We remark that these graphs have the same shape as graphs (for COVID-19 epidemic) obtained in the recent paper [25]; see Figure S2, for age and activity structured community (besides the initial segment of dynamics when the number of infected is very small; but we recall that our model provides the asymptotic behavior, so it does not describe the initial phase of epidemic).
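The qualitative behavior of such curves can be explored by evaluating the asymptotic power law directly. The sketch below is an illustration only: it sets the unknown proportionality constant in the asymptotic law to 1, uses it for all $t \geq 1$ although the law holds only for large $t$, and the barrier values and function names are hypothetical rather than those used for Figures 2 and 3.

```python
import math


def exponent(p, delta):
    """Power-law exponent a = ln(p) / Delta."""
    if p < 2 or delta <= 0:
        raise ValueError("need branching number p >= 2 and barrier Delta > 0")
    return math.log(p) / delta


def infection_probability(t, p, delta):
    """Asymptotic form t^(-a); the prefactor is ASSUMED to be 1."""
    if t < 1:
        raise ValueError("the asymptotic law is only used here for t >= 1")
    return t ** (-exponent(p, delta))


def immunity(t, p, delta):
    """Herd-immunity proxy 1 - t^(-a) under the same assumption."""
    return 1.0 - infection_probability(t, p, delta)


p = 2  # "yes-no" social coordinates
for delta in (0.5, 1.0, 3.0):  # illustrative one-step barriers
    row = [f"{immunity(t, p, delta):.3f}" for t in (10, 100, 1000)]
    print(f"Delta = {delta}: a = {exponent(p, delta):.3f}, immunity at t = 10, 100, 1000: {row}")
```

For a fixed time, a higher barrier $\Delta$ gives a smaller exponent $a$ and therefore a lower value of the immunity proxy, in line with the discussion above.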
The same asymptotics can be obtained in terms of p-adic diffusion; see [40]. However, the latter theory is more complicated mathematically; see also [38,39] for diffusion on general ultrametric spaces represented geometrically by arbitrary trees.
Finally, we note that the parameter $a$ also depends on $p$, the branching index of the social tree. We recall that the mathematical model of this tree is idealized; the branching index is constant: it does not depend on the vertex. Each cluster determined by $k$ social coordinates, $C_{i_0 \ldots i_{k-1}}$, splits into $p$ subclusters $C_{i_0 \ldots i_{k-1} i}$, $i = 0, \ldots, p - 1$. The parameter $p$ determines the complexity of the social clustering of a population. By Equation (15), an increase of $p$ implies a faster decrease of the probability $P_I(C, t)$ or, in other words, a faster increase of herd immunity. For the same one-step social barrier $\Delta$, herd immunity is approached more quickly in a population with a complex structure of social relations, i.e., with a large parameter $p$. The slowest dynamics correspond to $p = 2$, the "yes-no" system of social coordinates; say there are just two districts in a town, one populated by people with high income and another by people with low income. The quantity $\ln p$ can be interpreted statistically as the entropy of the process of distribution of infection into the $p$ subclusters coupled to a vertex. Suppose that a virus can spread with equal probability $q_i = 1/p$ into each of the subclusters $C_{i_0 \ldots i_{k-1} i}$ of the cluster $C_{i_0 \ldots i_{k-1}}$. The entropy of this spreading equals

$$E = -\sum_{i=0}^{p-1} q_i \ln q_i = \ln p. \qquad (17)$$

In terms of the spreading entropy, asymptotics (16) can be rewritten as

$$P_I(C, t) \sim t^{-E/\Delta}, \quad t \to \infty. \qquad (18)$$

Thus, larger spreading entropy of the social cluster tree implies a quicker approach to herd immunity.
Our conjecture is that this formula is valid for a more general process of infection spread, with a nonuniform distribution of the probabilities $q_i$.
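The spreading entropy is the Shannon entropy (with natural logarithm) of the distribution of infection over the subclusters of a vertex. The following minimal sketch, with the hypothetical function name `spreading_entropy`, evaluates it for a uniform distribution, where it equals $\ln p$, and for one arbitrary nonuniform distribution. The nonuniform case only illustrates the entropy value itself; it does not test the conjecture.

```python
import math


def spreading_entropy(q):
    """Entropy E = -sum q_i ln q_i of a distribution over subclusters."""
    if any(qi < 0 for qi in q) or abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("q must be a probability distribution")
    # Terms with q_i = 0 contribute nothing (limit of q ln q as q -> 0).
    return -sum(qi * math.log(qi) for qi in q if qi > 0)


p = 4
uniform = [1.0 / p] * p
nonuniform = [0.7, 0.1, 0.1, 0.1]  # illustrative, not from the paper

print(spreading_entropy(uniform), math.log(p))  # the two values agree
print(spreading_entropy(nonuniform))            # smaller than ln p
```

Among all distributions over $p$ subclusters, the uniform one maximizes the entropy, so any nonuniform spreading has $E < \ln p$.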

Average Social Distance Traveled by Disease Spreader
In our mathematical model, when a disease spreader travels through the social tree, he/she visits a few social clusters and infects people in these clusters. The theory of random walks in ultrametric spaces predicts the average social distance of the spreader's travel through clusters, starting at some fixed cluster $x$ and jumping to other clusters $y$:

$$d(x, t) = \sum_{y \in Z_{p;n}} d(x, y) P(y, t). \qquad (19)$$

For linearly growing social barriers and $n \to \infty$, the asymptotic behavior has the following form:

$$d(x, t) \sim \frac{\ln t}{\Delta}, \quad t \to \infty. \qquad (20)$$

This result can be derived by a scaling argument: if the time is rescaled by the factor $R$ (where $R = e^{-\Delta}$), then all sets of neighboring points on the lowest level of the tree $Z_{p;n}$ become indistinguishable, and we are left with an effective lattice which is one level lower. This results in a shift of 1 in the ultrametric, which leads to asymptotics (20).
This average distance goes to infinity. As can be expected, a lower one-step social barrier $\Delta$ induces more rapid growth. Although the logarithmic growth is relatively slow, it nevertheless implies a very extended spread of the infection. The unboundedness of $d(x, t)$ can be associated with the presence of super-spreaders who jump even over high social barriers and spread the virus to social clusters that are far from the original source of infection. We repeat once again that the distance under consideration is in social and not in physical space.
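The scaling argument behind the logarithmic law can be written out in a few lines. The sketch below rests on two assumptions of ours rather than on statements taken from [11]: that the rescaling acts on time as $t \to t/R = e^{\Delta} t$, and that the average distance has the logarithmic form $c \ln t + c_0$ for large $t$.

```latex
% Assumption: rescaling time by 1/R = e^{\Delta} merges the lowest level of
% the tree, which shifts the ultrametric distance by one unit:
d(x, e^{\Delta} t) = d(x, t) + 1 .
% Ansatz (assumed form for large t):
d(x, t) = c \ln t + c_0 .
% Substituting the ansatz into the scaling relation:
c \ln\!\left(e^{\Delta} t\right) + c_0 = c \ln t + c\,\Delta + c_0
  = c \ln t + c_0 + 1
\quad\Longrightarrow\quad c\,\Delta = 1 .
% Hence the leading behavior is
d(x, t) \sim \frac{\ln t}{\Delta}, \qquad t \to \infty .
```

The constant $c_0$ drops out of the leading asymptotics, and the coefficient $1/\Delta$ shows directly why a lower one-step barrier gives faster growth of the average social distance.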

Concluding Remarks
The presented ultrametric model with random walk dynamics on energy landscapes describes disease spread in the socially clustered population. This approach provides the possibility to account for the dependence of the epidemic's dynamics on the strength of barriers between social clusters. Graphs in Figures 2 and 3 show the differences between the epidemic's dynamics for relatively mild and strong preventative measures. Such measures inhibit approaching herd immunity; higher barriers imply stronger inhibition. Generally even mild preventative policy approaches herd immunity with asymptotics given by the power law. The model elevates the role of the social dimension of disease spreading compared with its purely bio-medical dimension (see also Appendix A).
We applied to a new area of research, epidemiology, a mathematical theory that was developed for applications in statistical physics (spin glasses) and microbiology (protein folding): the ultrametric random walk describing dynamics on complex energy landscapes with a hierarchic structure of barriers between valleys. The presence of social barriers growing with the hierarchy's levels makes the evolution of an epidemic essentially slower than in models which do not take into account the cluster character of infection spreading.
Although the model is very simplified, it reflects the basic features of disease spread in the socially clustered population. We hope that our model will stimulate further development of ultrametric epidemiological models.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Social Cluster Structure of Spread of COVID-19
As was pointed out in many sources, the present COVID-19 epidemic clearly has a hierarchic social cluster structure. Moreover, the clustering, selection and ordering of social coordinates (Section 2) depend crucially on the country. In the USA, the difference between low and high income social clusters is striking. This income clustering is complemented by race and immigration clustering. This is a good place to cite [51]: "At a clinic in Corona, a working-class neighborhood in Queens, more than 68 percent of people tested positive for antibodies to the new coronavirus. At another clinic in Jackson Heights, Queens, that number was 56 percent. But at a clinic in Cobble Hill, a mostly white and wealthy neighborhood in Brooklyn, only 13 percent of people tested positive for antibodies. As it has swept through New York, the coronavirus has exposed stark inequalities in nearly every aspect of city life, from who has been most affected to how the health care system cared for those patients. Many lower-income neighborhoods, where Black and Latino residents make up a large part of the population, were hard hit, while many wealthy neighborhoods suffered much less." The example of a system of social coordinates that has been considered in Section 2 matches the situation in European countries better: $x_0$ for age, $x_1$ for chronic diseases, $x_2$ for gender, $x_3$ for race, $x_4$ for town, $x_5$ for district, $x_6$ for profession, $x_7$ for social activity, $x_8$ for children, etc. For the USA, we should rearrange the coordinates (change their hierarchy): $x_0$ for income, $x_1$ for race, $x_2$ for town, $x_3$ for district, $x_4$ for age, $x_5$ for chronic diseases, $x_6$ for gender, $x_7$ for profession, $x_8$ for social activity, $x_9$ for children, etc.
As was noted in the Introduction, the role of population heterogeneity in the spread of COVID-19 was recently analyzed in the paper of Britton et al. [25]; we present a few citations from this paper that support our model's emphasis on the social clustering of the population: "We show that population heterogeneity can significantly impact disease-induced immunity as the proportion infected in groups with the highest contact rates is greater than in groups with low contact rates." "No realistic model will depict human populations as homogenous, there are many heterogeneities in human societies that will influence virus transmission. Here, we illustrate how population heterogeneity can cause significant heterogeneity among the people infected during the course of an infectious disease outbreak... One of the simplest of all epidemic models is to assume a homogeneously mixing population in which all individuals are equally susceptible, and equally infectious if they become infected. ... To this simple model we add two important features known to play an important role in disease spreading. The first is to include age structure by dividing the community into different age cohorts, with heterogeneous mixing between the different age cohorts. ... The second population structure element categorizes individuals according to their social activity level."

Appendix B. Superspreaders as Powerful Sources of Virions
A superspreader is an unusually contagious individual who has been infected with a disease: someone who has infected far more people than the typical two to three. As was pointed out in MIT Technology Review [45]: "For COVID-19, this means 80% of new transmissions are caused by fewer than 20% of the carriers - the vast majority of people infect very few others or none at all, and it is a select minority of individuals who are aggressively spreading the virus. A recent preprint looking at transmission in Hong Kong supports those figures, while another looking at transmission in Shenzhen, China, pegs the numbers closer to 80/10. Lots of outbreaks around the world have been linked to single events where a superspreader likely infected dozens of people. For example, a choir practice in Washington State infected about 52 people; a megachurch in Seoul was linked to the majority of initial infections in South Korea; and a wedding in Jordan with about 350 guests led to 76 confirmed infections." The bad news is that, for the moment, we cannot diagnostically identify superspreaders.