# Self-Similarity in Population Dynamics: Surname Distributions and Genealogical Trees

## Abstract


## 1. Introduction and Historical Background

## 2. The Evolution of Surnames as a Yule Process and the Master Equation Approach

$P_{j,s}(k, t)$ for a family to have $k$ members at time $t$ if the number of members at time $s$ was $j$. The time evolution of $P_{j,s}(k, t)$ is governed by the differential equation [19]

^{−γ} where

## 3. Alternative Approaches

_{E} = λ + μ, with λ and μ representing the (constant) birth and death rates; the mutation rate β appears in a boundary condition at x = 1.

## 4. The Effects of Sampling on Discrete Frequency Distributions

$\{N_k\}$, where $N_k$ is the number of kinds such that for each of them there are $k$ objects in the original set. It is in principle possible to compute the probability of any sample distribution $\{n_l\}$ as a function of a given set $\{N_k\}$. A very important limit of the above result may be obtained when considering the (rather typical) case $k, l \ll N, n$. In this limit, setting $\rho \equiv n/N$,

$\langle n_l \rangle$ and exchanging the order of summations, we easily obtain [20]

_{F}) is:

_{p} of the original frequency distribution, as long as p ≤ n.
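The limit $k, l \ll N, n$ can be illustrated numerically. The sketch below is not the derivation of [20]; it assumes that, in this limit, each of the $k$ objects of a kind enters the sample independently with probability $\rho = n/N$ (the binomial approximation to sampling without replacement). Under that assumption, the expected sample counts $\langle n_l \rangle$ obey $\sum_l \binom{l}{p} \langle n_l \rangle = \rho^p \sum_k \binom{k}{p} N_k$, so binomial moments of the original distribution can be read off the sample. The distribution `N_k`, the value of `rho`, and the function names are hypothetical.

```python
from math import comb

def expected_sample_counts(N_k, rho):
    """Expected number of kinds seen l times in the sample (binomial thinning).

    N_k maps k -> number of kinds with k objects in the original set.
    Returns a dict l -> <n_l>, including l = 0 (kinds missed by the sample).
    """
    n_l = {}
    for k, kinds in N_k.items():
        for l in range(k + 1):
            prob = comb(k, l) * rho**l * (1.0 - rho) ** (k - l)
            n_l[l] = n_l.get(l, 0.0) + kinds * prob
    return n_l

def binomial_moment(dist, p):
    """Sum over k of C(k, p) * dist[k]."""
    return sum(comb(k, p) * value for k, value in dist.items())

# Hypothetical original distribution and sampling fraction (assumptions).
N_k = {1: 500, 2: 120, 3: 40, 5: 10, 10: 2}
rho = 0.25
n_l = expected_sample_counts(N_k, rho)

# Compare the two sides of the identity for the first few orders p.
for p in (1, 2, 3):
    sample_side = binomial_moment(n_l, p)
    original_side = rho**p * binomial_moment(N_k, p)
    print(p, round(sample_side, 6), round(original_side, 6))
```

The two printed columns should agree for every order $p$, and the total number of kinds (including those with $l = 0$) is conserved.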

_{F}[20]:

## 5. Evolution of Populations and the Dynamics of Disordered Systems

$2^{G} H(r, G)/N$ as a function of $w \equiv rN/2^{G}$, and the left tail of $P(w)$, for small values of $r$, is a power law with a positive exponent $\beta \approx 0.3$. The fraction $\sigma(G)$ of the total population that is expected to be absent from a given genealogical tree can also be estimated.
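A small simulation can make the quantity $\sigma(G)$ concrete. The sketch below assumes a closed population of constant size $N$ in which every individual draws two parents uniformly at random from the previous generation, in the spirit of the model studied in [21]; the two draws are independent, so they may occasionally coincide, which is a simplification of this sketch. The function name, population size, and seed are hypothetical.

```python
import random

def absent_fraction(N, G_max, seed=0):
    """Fraction of the population absent from one genealogical tree, per generation.

    Returns a list whose entry G-1 is sigma(G), for G = 1 .. G_max.
    """
    rng = random.Random(seed)
    ancestors = {0}  # the tree is rooted at a single individual
    sigma = []
    for _ in range(G_max):
        parents = set()
        for _ in ancestors:
            # Each distinct ancestor has two parents drawn at random
            # (assumption: independent uniform draws, possibly equal).
            parents.add(rng.randrange(N))
            parents.add(rng.randrange(N))
        ancestors = parents
        sigma.append(1.0 - len(ancestors) / N)
    return sigma

sigma = absent_fraction(N=2000, G_max=30)
print(f"sigma(1)  = {sigma[0]:.4f}")
print(f"sigma(30) = {sigma[-1]:.4f}")
```

For the first few generations almost the whole population is absent from the tree; once $2^G$ exceeds $N$, the fraction stops decreasing and fluctuates around a constant value.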

_{G}(z) ≡ ⟨exp[z w(G)]⟩ for the weights w(G) satisfies a recursion equation having the form of a renormalization group transformation:

$\sigma^{*} = h(-\infty)$ and therefore it solves $\sigma^{*} = \exp(2\sigma^{*} - 2)$; numerically, $\sigma^{*} = 0.203$. In this case, one may also extract the exponent $\beta$, finding $\beta = -\ln(4\sigma^{*})/\ln 2 = 0.299$, in excellent agreement with the results of the simulation [21].
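The quoted numbers can be checked with a few lines of code. The sketch below solves $\sigma^{*} = \exp(2\sigma^{*} - 2)$ by fixed-point iteration starting from zero; this choice matters, because $\sigma = 1$ is also a solution but an unstable one. The exponent is then evaluated as $-\ln(4\sigma^{*})/\ln 2$, which is the form that reproduces the quoted value 0.299 and is treated here as an assumption of the sketch. The function name and tolerance are hypothetical.

```python
from math import exp, log

def solve_sigma_star(tol=1e-12, max_iter=1000):
    """Iterate sigma -> exp(2*sigma - 2) from sigma = 0 until it stops changing."""
    sigma = 0.0
    for _ in range(max_iter):
        new_sigma = exp(2.0 * sigma - 2.0)
        if abs(new_sigma - sigma) < tol:
            return new_sigma
        sigma = new_sigma
    raise RuntimeError("fixed-point iteration did not converge")

sigma_star = solve_sigma_star()
# Assumed form of the exponent: beta = -ln(4 * sigma_star) / ln 2.
beta = -log(4.0 * sigma_star) / log(2.0)

print(f"sigma* = {sigma_star:.4f}")
print(f"beta   = {beta:.4f}")
```

The iteration converges because the slope of the map at the fixed point, $2\sigma^{*} \approx 0.41$, is smaller than one.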

## 6. Evaluating the Structure of Populations from Individual Genealogies

$2^{G-1}$ couples and $2^{G}$ individuals in the $G$-th generation, but in practice, due to consanguinity, the number of different couples and individuals may be appreciably reduced, especially for large values of $G$. This leads to substantial repetitions of individuals and surnames, which may be described by appropriate frequency distributions; these become highly nontrivial when $2^{G}$ is comparable with (or larger than) the size of the community.

- $m_G(k)$: probability of $k$ repetitions of an individual in the $G$-th generation of ancestors ($m_G$ is related to $H(r, G)$ and may be found by solving the recursive equation described in the previous section);
- $M_G(k)$: probability that $k$ (different) males belonging to the same family (i.e., carrying the same surname) may appear in the $G$-th generation of ancestors;
- $F_G(k)$: probability that $k$ (different) females belonging to the same family may appear in the full population in the $G$-th generation ($F_G(k)$ should reflect the surname distribution of the population $P(k)$, defined in Section 2, at the time corresponding to $G$);
- $R_G(k)$: probability that a surname may appear $k$ times in the $G$-th generation (surname distribution of the genealogical tree, including repetitions, trivially coincident with the surname distribution of males in the $(G+1)$-th generation);
- $D_G(k)$: probability that $k$ females (including repetitions) belonging to the same family may appear in the $G$-th generation of ancestors (surname distribution of females in the $G$-th generation);
- $C_G$: number of different surnames appearing in the $G$-th generation in the genealogical tree;
- $C^{*}$: number of different surnames in the population;
- $S_G$: number of different males (or couples) in the $G$-th generation in the genealogical tree;
- $N$: number of different females in the full population (consisting of $2N$ individuals).

_{G−1}(k), $M_G(k)$ and $m_G(k)$ by noticing that, under the reasonable assumption $2^{G} \gg S_G > C_G \gg 1$ (holding for a sufficiently large value of $G$), all probabilities may be treated as independent and therefore

$\{x_i\}$ are all the sets of $k$ integers such that $\sum_{i=1}^{k} x_i = y$.
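A sum over all sets $\{x_i\}$ of $k$ integers with $\sum_i x_i = y$, weighted by a product of independent probabilities, is a $k$-fold convolution. The sketch below shows how such a compound distribution can be evaluated numerically. It assumes the generic form $\sum_k A(k)\,[a^{*k}](y)$, where `outer` plays the role of the distribution of the number $k$ of distinct members and `inner` that of the repetitions of a single member; it is not a transcription of the article's equation, and the names and the example distributions are hypothetical.

```python
def convolve(a, b):
    """Discrete convolution of two probability lists indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

def compound(outer, inner, y_max):
    """Distribution of x_1 + ... + x_k, with k drawn from outer and each x_i from inner.

    outer[k] is the probability of k distinct members;
    inner[x] is the probability that one member is repeated x times.
    """
    result = [0.0] * (y_max + 1)
    power = [1.0]  # zero-fold convolution: all the mass at y = 0
    for p_k in outer:
        for y, value in enumerate(power[: y_max + 1]):
            result[y] += p_k * value
        power = convolve(power, inner)  # move on to the (k+1)-fold convolution
    return result

# Hypothetical inputs: every member appears at least once, at most three times.
inner = [0.0, 0.7, 0.2, 0.1]
outer = [0.0, 0.6, 0.3, 0.1]
total = compound(outer, inner, y_max=9)
print([round(v, 4) for v in total])
```

Because both inputs are normalized and `y_max` covers the largest possible total (three members repeated three times each), the output sums to one.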

_{G}(k), $F_G(k)$ and $m_G(k)$ under the assumption that wives are chosen at random within the social group. To this end, we need some expressions derived from the theory of sampling within a frequency distribution.

_{N,S}(r, k) be the probability that k (different) females belonging to the same family (characterized by the presence of r females in the full population) appear (by random mating) among the spouses of S different males present in the same generation in the genealogical tree:

_{N,S}(r, k), we can now compute the probability $\Pi_{N,S}(r, y)$ of $y$ individual repetitions in the genealogical tree for females belonging to a family characterized by a female frequency $r$ in the full population:

## 7. Empirical Studies of Ancestors Tables

**PACS classifications:** 87.23.-n; 89.75.Da; 05.10.-a

## Conflicts of Interest

## References

1. Darwin, G.H. Marriages between first cousins in England and their effects. J. Stat. Soc. **1875**, 38, 153–184.
2. Galton, F.; Watson, H.W. On the Probability of the Extinction of Families. J. Anthropol. Inst. Great Brit. Ireland **1874**, 4, 138–144.
3. Crow, J.F.; Mange, A.P. Measurement of inbreeding from the frequency of marriages between persons of the same surname. Eugenics Quarterly **1965**, 12, 199–203.
4. Karlin, S.; McGregor, J. The number of mutant forms maintained in a population. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Oakland, CA, USA, 1967; Volume 4, pp. 415–438.
5. Fisher, R.A.; Corbet, A.S.; Williams, C.B. The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population. J. Anim. Ecol. **1943**, 12, 42–58.
6. Lasker, G.W. A coefficient of relationship by isonymy: A Method for Estimating the Genetic Relationship between Populations. Hum. Biol. **1977**, 49, 489–493.
7. Fox, W.R.; Lasker, G.W. The Distribution of Surname Frequencies. Int. Stat. Rev. **1983**, 51, 81–87.
8. Gottlieb, K. (Ed.) Surnames as markers of inbreeding and migration. Hum. Biol. **1983**, 55, 209–408.
9. Lasker, G.W. Surnames and Genetic Structure; Cambridge University Press: Cambridge, UK, 1985.
10. Boattini, A.; Lisa, A.; Fiorani, O.; Zei, G.; Pettener, D.; Manni, F. General method to unravel ancient population structures through surnames, final validation on Italian data. Hum. Biol. **2012**, 84, 235–270.
11. Redmonds, G.; King, T.; Hey, D. Surnames, DNA, and Family History; Oxford University Press: Oxford, UK, 2011.
12. Yule, G.U. A mathematical theory of evolution based on the conclusions of Dr. J.C. Willis. Phil. Trans. R. Soc. Lond. B Biol. Sci. **1925**, 213, 21–87.
13. Simon, H.A. On a class of skew distribution functions. Biometrika **1955**, 42, 425–440.
14. Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. **2005**, 46, 323–351.
15. Derrida, B.; Peliti, L. Evolution in a flat fitness landscape. Bull. Math. Biol. **1991**, 53, 355–382.
16. Derrida, B.; Bessis, D. Statistical properties of valleys in the annealed random map model. J. Phys. A **1988**, 21, L509–L515.
17. Serva, M.; Peliti, L. A statistical model of an evolving population with sexual reproduction. J. Phys. A **1991**, 24, L705–L709.
18. Rossi, P. Surname distribution in population genetics and in statistical physics. Phys. Life Rev. **2013**, 10, 395–415.
19. Baek, S.K.; Kiet, H.A.T.; Kim, B.J. Family name distributions: Master equation approach. Phys. Rev. E **2007**, 76, 046113:1–046113:7.
20. Rossi, P. Invariant expectation values in the sampling of discrete frequency distributions. Physica A **2014**, 394, 177–186.
21. Derrida, B.; Manrubia, S.C.; Zanette, D.H. Statistical Properties of Genealogical Trees. Phys. Rev. Lett. **1999**, 82, 1987–1990.
22. Derrida, B.; Manrubia, S.C.; Zanette, D.H. Distribution of repetitions of ancestors in genealogical trees. Physica A **2000**, 281, 1–16.
23. Derrida, B.; Manrubia, S.C.; Zanette, D.H. On the genealogy of a population of biparental individuals. J. Theor. Biol. **2000**, 203, 303–315.
24. Kim, B.J.; Park, S.M. Distribution of Korean Family Names. Physica A **2005**, 347, 683–694.
25. Zanette, D.H.; Manrubia, S.C. Vertical transmission of culture and distribution of family names. Physica A **2001**, 295, 1–8.
26. Manrubia, S.C.; Zanette, D.H. At the Boundary between Biological and Cultural Evolution: the Origin of Surname Distributions. J. Theor. Biol. **2002**, 216, 461–477.
27. Bak, P.; Tang, C.; Wiesenfeld, K. Self-organized criticality: An explanation of the 1/f noise. Phys. Rev. Lett. **1987**, 59, 381–384.
28. Bak, P.; Sneppen, K. Punctuated equilibrium and criticality in a simple model of evolution. Phys. Rev. Lett. **1993**, 71, 4083–4086.
29. Flyvbjerg, H.; Bak, P.; Sneppen, K. Mean field theory for a simple model of evolution. Phys. Rev. Lett. **1993**, 71, 4087–4090.
30. de Boer, J.; Derrida, B.; Flyvbjerg, H.; Jackson, A.D.; Wettig, T. Simple Model of Self-Organized Biological Evolution. Phys. Rev. Lett. **1994**, 73, 906–909.
31. Doi, M. Second quantization representation for classical many-particle system. J. Phys. A **1976**, 9, 1465–1478.
32. Goldenfeld, N. Kinetics of a model for nucleation-controlled polymer crystal growth. J. Phys. A **1984**, 17, 2807–2821.
33. Peliti, L. Path integral approach to birth-death processes on a lattice. J. de Phys. **1985**, 46, 1469–1483.
34. Jarvis, P.D.; Bashford, J.D.; Sumner, J.G. Path integral formulation and Feynman rules for phylogenetic branching models. J. Phys. A **2005**, 38, 9621–9647.
35. De Luca, A.; Rossi, P. Renormalization group evaluation of exponents in family name distributions. Physica A **2009**, 388, 3609–3614.
36. Reed, W.J.; Hughes, B.D. From gene families to incomes and internet file sizes: Why power laws are so common in nature. Phys. Rev. E **2002**, 66, 067103:1–067103:4.
37. Reed, W.J.; Hughes, B.D. On the distribution of family names. Physica A **2003**, 319, 579–590.
38. Bartley, D.L.; Ogden, T.; Song, R. Frequency distributions from birth, death and creation processes. BioSystems **2002**, 66, 179–191.
39. Maruvka, Y.E.; Shnerb, N.M.; Kessler, D.A. Universal features of surname distribution in a subsample of a growing population. J. Theor. Biol. **2010**, 262, 245–256.
40. Chang, J.T. Recent common ancestors of all present-day individuals. Adv. Appl. Prob. **1999**, 31, 1002–1026.
41. Rohde, D.L.T.; Olson, S.; Chang, J.T. Modelling the recent common ancestry of all living humans. Nature **2004**, 431, 562–566.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rossi, P.
Self-Similarity in Population Dynamics: Surname Distributions and Genealogical Trees. *Entropy* **2015**, *17*, 425-437.
https://doi.org/10.3390/e17010425
