Statistical Aspects of Two Classes of Random Binomial Trees and Forests

Huillet, Thierry E.

doi:10.3390/math13020291


by

Thierry E. Huillet

Laboratoire de Physique Théorique et Modélisation (CNRS, UMR 8089), CY Cergy Paris Université, 95302 Cergy-Pontoise, France

Mathematics 2025, 13(2), 291; https://doi.org/10.3390/math13020291

Submission received: 25 November 2024 / Revised: 4 January 2025 / Accepted: 14 January 2025 / Published: 17 January 2025

(This article belongs to the Special Issue Latest Advances in Random Walks Dating Back to One Hundred Years)


Abstract

We consider two specific families of binomial trees and forests: simply generated binomial d-ary trees and forests versus their increasing phylogenetic version, with tree nodes in increasing order from the root to any of its leaves. The analysis (both pre-asymptotic and asymptotic) covers some of the main statistical features of their total progenies. We take advantage of the fact that the random distributions of those trees are obtained by weighting the counts of the underlying combinatorial trees. Finally, we briefly stress a rich alternative randomization of combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones.

Keywords:

simply generated and increasing binomial trees and forests; total progeny; generating functions; Lagrange inversion formula; structural statistics; partition structures; combinatorial probability

MSC:

60C05; 60E05; 60J80; 60J85

1. Introduction

The main purpose of this paper is to present explicit and asymptotic methods to count two kinds of random trees: Bienaymé–Galton–Watson (BGW) trees and increasing random trees, both generated by a binomial branching mechanism. The limiting Poisson case will also be briefly treated. We will analyze some structural properties of these Markov chains on the non-negative integers. The analysis chiefly concerns the number of leaves of these trees, the number of trees entering in the composition of a size-$n$ forest of such trees, the joint sizes of the constitutive trees in a forest of k trees, and the one-dimensional marginal size of a typical tree, both at fixed $n, k$ and in the limit $n, k \to \infty$, $n / k \sim ρ$. The probability to observe trees with a given sequence of node outdegrees is also investigated. In all cases, the use of generating functions is an essential ingredient. Explicit formulae can sometimes be derived with the help of Lagrange’s inversion formulae, which take different expressions for the two processes under concern. On the other hand, the singularity analysis of generating functions may lead to asymptotic formulae, aiming at describing large trees. The random trees under study consist of weighted versions of combinatorial trees; this will be highlighted.

We shall study the following models:

-: Binomial BGW trees and forests appearing in branching population models, percolation on trees and branched polymers.
-: Binomial increasing trees as recursive trees appearing in phylogeny, with nodes in increasing order along any path from the root to the leaves: the leaves of a size-$n$ tree are species that can mutate to another species when a new atom is added ( $n \to n + 1$ ), whereas the internal nodes consist of the species that can produce a new species in the process. Being always supercritical, d-ary increasing trees do not exhibit a phase transition at criticality. A forest of such trees consists of different population genera. The recursive structure is an essential ingredient, which is shared by the limiting Poisson increasing tree.

Such random trees and forests are constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing, and observing that the weights only depend on the tree size and not on its full outdegree sequence.

Deep relations of enumeration problems of trees and forests to skip-free to the left random walks can be found in [1]. It rests on a different uniform way to construct random forests arising from enumerative combinatorics, leading to very rich asymptotic behaviors. This point is briefly addressed and illustrated in Appendix A.

2. Number of Atoms and Leaves in a Size-$n$ Simple Tree

We can distinguish two main types of random trees, namely the following:

-: Ordered (or plane) trees: The reason is that one can draw the tree in the half-plane so that the children of every parent are ordered from left to right, say, from the youngest child to the eldest one. Embeddings obtained from cyclic rotations of the sub-trees around the root are not allowed.

Such trees are amenable to the Ulam–Harris–Neveu ordering of their nodes (horizontal ordering holds) and they can be represented as strata with the founder on top and the successive layers below, [2]. Given that an individual of the population at generation h is labeled by vertex

v = v_{1} \dots v_{h}

(as a concatenation of h positive integers) and gives birth to

H_{v} \geq 1

daughters, its offspring are labeled by

v 1, \dots, v H_{v}

. Each individual at generation h thus obtains a concatenated label

v = v_{1} \dots v_{h}

for which label

v_{1} \dots v_{h - 1}

is the one of its mother,

v_{1} \dots v_{h - 2}

, the one of its grandmother,…, up to ∅, the conventional label of the root. Such ordered trees are Bienaymé–Galton–Watson (in short BGW) trees in the theory of branching processes; they are also called simply generated (for short, here, simple) trees.

-: Increasing random trees: A size-n rooted and increasing labeled tree has vertices with indices or labels $\{1, \dots, n\}$ increasing for any path from the root to its leaves. In contrast with simple random trees, increasing random trees do not show a phase transition at the criticality. For some classes of special branching mechanisms (binomial, Poisson, or negative binomial), such trees can be recursively defined, proving useful in their analysis.

2.1. Simply Generated Random Trees

By recursion from the root (the unique founder of the tree), the probability of a simply generated (for short, simple) size-n rooted tree generated by the local probability generating function (p.g.f.) $ϕ (z)$ (with non-negative $z^{m}$-coefficients $ϕ_{m} := [z^{m}] ϕ (z)$, $m \geq 0$, summing to 1, that is, $ϕ (1) = 1$) is obtained as

P (\bar{N} (1) = n) = [z^{n}] Φ (z),

(1)

where

Φ (z)

solves the functional equation

Φ (z) = z ϕ (Φ (z))

,

Φ (0) = 0

. The Lagrange inversion formula states that for all

n \geq 1

[z^{n}] Φ (z) = \frac{1}{n} [z^{n - 1}] ϕ {(z)}^{n} .

(2)

A more general form of the Lagrange inversion formula states that (with $'$ denoting the derivative)

[z^{n}] h (Φ (z)) = \frac{1}{n} [z^{n - 1}] (h^{'} (z) ϕ {(z)}^{n}) .

(3)

for any arbitrary analytic output function h. See [3,4]. If

h (z)

is a p.g.f.,

h (Φ (z))

is the p.g.f. of the total progeny of a branching process generated by

ϕ

with a random number of founders, say

N_{0},

for which

h (z) = E (z^{N_{0}}) .

If

h (z) = {(1 - v z)}^{- 1}

, where v ‘marks’ the number of distinguishable trees in a forest of simple random trees generated by

ϕ

, with

K (z, v) : = 1 / (1 - v Φ (z))

[z^{n}] K (z, v) = \frac{v}{n} [z^{n - 1}] ({(1 - v z)}^{- 2} ϕ {(z)}^{n}) .

(4)

This consists of a degree-n polynomial in v. The double generating function

K (z, v)

may be viewed as the ‘grand-canonical’ partition function. In that case,

[v^{k} z^{n}] \frac{1}{1 - v Φ (z)} = \frac{k}{n} [z^{n - k}] ϕ {(z)}^{n} = P (K_{n} = k)

where $K_{n}$ is the number of distinguishable trees forming the size-$n$ forest. Here, $K (z, v)$ encodes a ‘sequence’ of trees forming the forest, with v ‘marking’ their number.

Should the trees be indistinguishable, the same formulae hold but now with

K (z, v) = e^{v Φ (z)},

consisting of a ‘set’ of trees forming the forest, so

[v^{k} z^{n}] K (z, v) = \frac{1}{k!} \frac{k}{n} [z^{n - k}] ϕ {(z)}^{n} .

(5)

One important branching mechanism

ϕ (z)

is the binomial one. Other important examples of

ϕ (z)

are

e^{- μ (1 - z)}

(mean-

μ

Poisson) or

q / (1 - p z)

(geometric) and some of them will be mentioned in the course of the analysis when needed.

2.2. The Binomial Case

We here focus on

ϕ (z) = {(q + p z)}^{d}

,

d \geq 2

and

p + q = 1

(the binomial branching mechanism) corresponding to the probability mass function (pmf)

π_{b} = (\binom{d}{b}) p^{b} q^{d - b}

,

b = 0, \dots, d

(for which

π_{0} = q^{d}

and

π_{d} = p^{d}

):

π_{b}

is the probability of an offspring having outdegree (branching number) b. For this BGW model, each mother particle alive can give birth to at most d daughters, or possibly none. With

ξ \overset{d}{\sim}

bin

(d, p),

the random number of offspring per capita,

ϕ (z) = E (z^{ξ})

: each active node independently possibly activates any of its d descendants. The produced random tree is a sub-tree of the full d-ary (d-dimensional) tree having

d^{h}

atoms at generation

h \geq 0 .

It is a model of percolation on trees (see [5], p. 438 ff.).

The random variable

ξ

has mean

p d;

its distribution is unimodal with mode at the origin if and only if

p < 1 / (d + 1) < 1 / d .

If

p > 1 / (d + 1)

, the mode is near

p (d + 1) - 1 .

Remarks and related models.

(i)

The latter model is also related to the Flory–Stockmayer binomial model of randomly branched polymers with degree-

(d + 1)

functional monomers. See [6,7] if

d = 2

, [8] for any integer d and also [9]: in this model with one founder

k = 1

, the

Φ (z)

obtained above from the bin

(d, p)

generating model

ϕ

is in fact the p.g.f. of first-generation polymers. Here, each monomer with d functional units (arms) is identified to a node of a BGW tree. Independently of one another, each of the d functional units has a probability p to be attached to a second-generation functional unit and so on. At generation 0 however, a seed monomer with full

d + 1

functional units gives birth to a random number (so with distribution bin

(d + 1, p)

) of the first generation of such polymers, all with p.g.f.

Φ (z)

. The true size of the Flory branched polymer has thus the distribution given by the modified p.g.f.


Φ (z) \to Φ_{d} (z) = z {(q + p Φ (z))}^{d + 1} .

(6)

This translates the fact that the seed monomer can have up to

d + 1

activated functional units, whereas all its descendants are only up to d, the first and subsequent generation of trees growing away from the seed monomer, thereby presenting only d possible free arms. In the supercritical case with

p d > 1

, there is a positive probability that the Flory tree (polymer) is a giant one with infinitely many monomers (the gelation transition). With $\bar{N} (1)$ the total number of monomers with one founder, we have $P (\bar{N} (1) = 1) = [z] Φ_{d} (z) = q^{d + 1}$ and, for $n \geq 2$, by the Lagrange inversion formula (3) applied with $h (z) = {(q + p z)}^{d + 1}$,

\begin{matrix} P (\bar{N} (1) = n) & = & [z^{n}] Φ_{d} (z) = [z^{n - 1}] {(q + p Φ (z))}^{d + 1} \\ & = & \frac{p (d + 1)}{n - 1} [z^{n - 2}] {(q + p z)}^{n d} \\ & = & \frac{p (d + 1)}{n - 1} (\binom{n d}{n - 2}) p^{n - 2} q^{n (d - 1) + 2} . \end{matrix}

The leaves of the d-polymer tree (its nodes with outdegree zero) constitute its external boundary, where contact with the reactants is likely to occur. As such, an estimate of their number is of interest when dealing with such size-n branched polymers.

It is also of interest to consider a polymer soup as a collection of k such independent d-polymers with a given size.

(i i)

With

Φ (z)

solving

Φ (z) = z {(q + p Φ (z))}^{d}

and defining

Φ_{d} (z) : = Φ {(z^{d})}^{1 / d},

we obtain that

Φ_{d} (z)

solves

Φ_{d} (z) = z (q + p Φ_{d} {(z)}^{d}) = z ϕ_{d} (Φ_{d} (z)),

corresponding to the branching mechanism

ϕ_{d} (z) = q + p z^{d}

. So

Φ_{d} (z)

is the p.g.f. of the total progeny, say

{\bar{N}}_{d} (1)

, of a branching process whose offspring per capita is either d with probability p or 0 with probability

q = 1 - p

, so all or nothing. We clearly have

P ({\bar{N}}_{d} (1) = n) > 0

only for those

n = m d + 1

,

m \geq 0

(the number of tree branches being multiple of

d)

, so

Φ_{d} (z^{1 / d})

is well defined together with

Φ (z) = Φ_{d} {(z^{1 / d})}^{d}

. We can deduce the main properties of the new model generated by $ϕ_{d}$ from the previous one generated by the binomial $ϕ$.

(i i i)

If

h (z) = E (z^{N_{0}}) = q_{0} + p_{0} z

, let

Φ_{0} (z) = h (Φ (z)) = q_{0} + p_{0} Φ (z) .

When $(q_{0}, p_{0}) = (q, p)$, $Φ_{0} (z) = q + p Φ (z)$ is a weak version of $Φ (z)$, allowing for an empty tree with $P (\bar{N} (1) = 0) = q .$

(i v)

$Φ_{0} (z) := Φ (z) / z$ solves $Φ_{0} (z) = ϕ (z Φ_{0} (z))$, $Φ_{0} (0) = π_{0}$. It is the p.g.f. of the shifted variable $\bar{N} (1) - 1$, which takes values in $N_{0} := \{0, 1, \dots\} .$

(v)

Note that

Ψ (z) : = Φ (q_{0} + p_{0} z),

as a Bernoulli-thinned version of

Φ (z)

, is not an output of

Φ (z) .

If, while deleting a productive node with probability

q_{0}

, the ancestral branch leading to that node is equivalently erased, a pruning operation of the tree results in the formation of disconnected sub-trees rooted at this node. Assuming the root to be active, the size of the pruned tree descending from the root has p.g.f.

Ψ_{0} (z)

, clearly solving

Ψ_{0} (z) = z ϕ (q_{0} + p_{0} Ψ_{0} (z)) .

This is the progeny of a BGW tree with the thinned branching mechanism

ϕ (q_{0} + p_{0} z) .

(v i)

As

d \to \infty

, the binomial branching mechanism

ϕ (z) = {(1 - \frac{μ}{d} (1 - z))}^{d}

approaches the Poisson

(μ)

p.g.f., leading to random Cayley trees.

(v i i)

The total progeny of Bellman–Harris trees for which each splitting individual alive has an exponential lifetime independent of its sisters’ particles coincides with the one of discrete-time BGW trees as long as they share the same branching mechanism. An overlap of generations results.

State-space representations. The binomial BGW process is a Markov chain on the non-negative integers with the transition matrix

P (k, k^{'}) = [z^{k^{'}}] ϕ {(z)}^{k} = (\binom{k d}{k^{'}}) p^{k^{'}} q^{k d - k^{'}}, k \in N_{0}, k^{'} \in \{0, \dots, k d\},

(7)

in the case of a single founder.

With

ϕ_{h} (z)

,

h \geq 0

, the p.g.f. of the number of individuals, say

N_{h}

, alive at step h (the generation number), obeying

ϕ_{h + 1} (z) = ϕ (ϕ_{h} (z))

,

ϕ_{0} (z) = z,

P^{h} (k, k^{'}) = [z^{k^{'}}] ϕ_{h} {(z)}^{k}

(8)

is the h-step transition matrix, involving the h-iterate

ϕ_{h} (z)

of the degree-d polynomial

ϕ (z)

(as a degree

d^{h}

polynomial). Recall that

ϕ_{h} (0)

is the probability that the process dies out before step

h,

so the probability that the time to extinction

τ_{1, 0}

of the unique founder occurs before generation h:

ϕ_{h} (0) = P (τ_{1, 0} \leq h)

.
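As a rough numerical illustration of the recursion $ϕ_{h + 1} (0) = ϕ (ϕ_{h} (0))$, the sketch below iterates the binomial branching mechanism from 0 to tabulate $P (τ_{1, 0} \leq h)$. The function name `extinction_cdf` and the parameter values are illustrative choices, not taken from the text.

```python
def extinction_cdf(h_max, d, p):
    """Return [phi_h(0) for h = 0..h_max] for phi(z) = (q + p z)^d."""
    q = 1.0 - p
    x = 0.0                      # phi_0(0) = 0: the founder is alive at step 0
    cdf = [x]
    for _ in range(h_max):
        x = (q + p * x) ** d     # phi_{h+1}(0) = phi(phi_h(0))
        cdf.append(x)
    return cdf


# Illustrative parameters (assumptions): d = 2 with mu = p d below and above 1.
sub = extinction_cdf(200, d=2, p=0.3)   # subcritical, mu = 0.6
sup = extinction_cdf(200, d=2, p=0.6)   # supercritical, mu = 1.2
print("subcritical  P(tau <= 200) =", sub[-1])
print("supercritical P(tau <= 200) =", sup[-1])
```

For $d = 2$ the fixed-point equation $ϕ (ρ) = ρ$ is quadratic with roots 1 and $q^{2} / p^{2}$, so the supercritical sequence should settle near $q^{2} / p^{2}$ while the subcritical one should approach 1.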

Total progeny. Let $\bar{N} (1) = \sum_{h \geq 0} N_{h}$ be the total progeny of this branching process with one founder and overlapping generations, so that $P (\bar{N} (1) = n) = [z^{n}] Φ (z)$. By the Lagrange inversion formula (2),

P (\bar{N} (1) = n) = \frac{1}{n} [z^{n - 1}] {(q + p z)}^{n d} = \frac{q}{n p} (\binom{n d}{n - 1}) {(p q^{d - 1})}^{n} .

(9)
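The closed form (9) can be cross-checked against the functional equation $Φ (z) = z ϕ (Φ (z))$ itself. The following is a minimal sketch, assuming exact rational arithmetic and a truncated power-series iteration; the helper names (`mul_trunc`, `progeny_pmf_series`, `progeny_pmf_closed`) and the values of d and p are hypothetical.

```python
from fractions import Fraction
from math import comb


def mul_trunc(a, b, n_max):
    """Product of two power series (coefficient lists), truncated at z^n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += ai * bj
    return out


def progeny_pmf_series(n_max, d, p):
    """[z^n] Phi(z), n = 0..n_max, by iterating Phi <- z (q + p Phi)^d.

    Each pass fixes one more coefficient, so n_max passes suffice.
    """
    q = 1 - p
    Phi = [Fraction(0)] * (n_max + 1)
    for _pass in range(n_max):
        base = [p * c for c in Phi]
        base[0] += q                                  # q + p Phi(z)
        power = [Fraction(1)] + [Fraction(0)] * n_max
        for _factor in range(d):
            power = mul_trunc(power, base, n_max)     # (q + p Phi(z))^d
        Phi = [Fraction(0)] + power[:n_max]           # multiply by z
    return Phi


def progeny_pmf_closed(n, d, p):
    """Formula (9): (q / (n p)) C(nd, n-1) (p q^(d-1))^n."""
    q = 1 - p
    return Fraction(comb(n * d, n - 1), n) * (q / p) * (p * q ** (d - 1)) ** n


d, p = 3, Fraction(1, 4)          # illustrative values
series = progeny_pmf_series(8, d, p)
closed = [progeny_pmf_closed(n, d, p) for n in range(1, 9)]
print(series[1:] == closed)
```

Using `Fraction` avoids any rounding, so the comparison is an exact identity check for the first few sizes rather than a numerical approximation.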

By the Stirling formula, with

z_{c} : = sup (z > 0 : Φ (z) < \infty) = \frac{{((d - 1) / q)}^{d - 1}}{p d^{d}} \geq 1,

for large n,

P (\bar{N} (1) = n) \sim \frac{1}{\sqrt{2 π}} \frac{q}{p} {(\frac{d}{{(d - 1)}^{3}})}^{1 / 2} n^{- 3 / 2} z_{c}^{- n} .

(10)

Note that

z_{c} = 1

only when

p = p_{c} : = 1 / d

(the critical case with

P (\bar{N} (1) = n) \sim \frac{1}{\sqrt{2 π}} {(\frac{d}{d - 1})}^{1 / 2} n^{- 3 / 2},

a pure power law with exponent

3 / 2

).
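To get a feeling for how quickly the asymptotic form (10) becomes accurate, one may compare it with the exact expression (9) on a logarithmic scale. The sketch below assumes floating-point log-gamma evaluation to avoid overflow; the function names and the test values $d = 3$, $p = 0.2$, $n = 2000$ are illustrative assumptions.

```python
from math import lgamma, log, exp, pi


def log_pmf_exact(n, d, p):
    """log of (9): (q / (n p)) C(nd, n-1) (p q^(d-1))^n, via log-gamma."""
    q = 1.0 - p
    log_binom = lgamma(n * d + 1) - lgamma(n) - lgamma(n * d - n + 2)
    return log(q / (n * p)) + log_binom + n * log(p * q ** (d - 1))


def z_c(d, p):
    """Convergence radius of Phi for the binomial mechanism."""
    q = 1.0 - p
    return ((d - 1) / q) ** (d - 1) / (p * d ** d)


def log_pmf_asymptotic(n, d, p):
    """log of (10): (2 pi)^(-1/2) (q/p) sqrt(d/(d-1)^3) n^(-3/2) z_c^(-n)."""
    q = 1.0 - p
    return (-0.5 * log(2 * pi) + log(q / p) + 0.5 * log(d / (d - 1) ** 3)
            - 1.5 * log(n) - n * log(z_c(d, p)))


ratio = exp(log_pmf_exact(2000, 3, 0.2) - log_pmf_asymptotic(2000, 3, 0.2))
print("exact / asymptotic at n = 2000:", ratio)
print("z_c at criticality (p = 1/d):", z_c(3, 1.0 / 3.0))
```

The ratio is expected to approach 1 with a relative correction of order $1 / n$, and `z_c` should reduce to 1 at $p = 1 / d$.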

This result is a particular illustration of a more general situation. Consider indeed a general branching mechanism $ϕ (z)$ having all its moments (equivalently, for which $z_{*} := sup (z > 0 : ϕ (z) < \infty) \in (1, \infty]$).

Then, a unique positive real root to the equation

ϕ (τ) - τ ϕ^{'} (τ) = 0,

(11)

exists, with

τ < z_{*}

if

μ < 1

(

ϕ (τ) > 1

),

τ = 1

and

ϕ^{'} (τ) = 1

if

μ = 1 .

The point

(τ, ϕ (τ))

is the tangency point to the curve

ϕ (z)

of a straight line passing through the origin

(0, 0)

. Let then

z_{c} : = τ / ϕ (τ) = 1 / ϕ^{'} (τ) \geq 1

. The searched

Φ (z)

solves

ψ (Φ (z)) = z,

where

ψ (z) = z / ϕ (z)

obeys

ψ (τ) = z_{c},

ψ^{'} (τ) = 0

and

ψ^{″} (τ) = - \frac{τ ϕ^{″} (τ)}{ϕ {(τ)}^{2}} > - \infty .

Thus, $ψ (z) \sim z_{c} + \frac{1}{2} ψ^{″} (τ) {(z - τ)}^{2}$ near $z = τ$ or, equivalently, $z \sim z_{c} + \frac{1}{2} ψ^{″} (τ) {(Φ (z) - τ)}^{2}$ (a branch-point singularity). It follows that

Φ (z)

displays a dominant power singularity of the order of

- 1 / 2

at

z_{c}

with

Φ (z_{c}) = τ

in the sense (with

σ_{c}^{2} = τ^{2} ϕ^{″} (τ) / ϕ (τ)

)

Φ (z) \underset{z \to z_{c}}{\sim} τ - \sqrt{\frac{2 ϕ (τ)}{ϕ^{″} (τ)}} {(1 - z / z_{c})}^{1 / 2} = τ (1 - \frac{\sqrt{2}}{σ_{c}} {(1 - z / z_{c})}^{1 / 2}) .

(12)

By singularity analysis therefore, see [10], we obtain [in agreement with [11], Theorem 13.1, p. 32]:

P (\bar{N} (1) = n) = [z^{n}] Φ (z) \underset{n \to \infty}{\sim} \sqrt{\frac{ϕ (τ)}{2 π ϕ^{″} (τ)}} n^{- 3 / 2} z_{c}^{- n} + O (n^{- 5 / 2} z_{c}^{- n}),

(13)

to the dominant order in n, with a geometric decay term at rate

z_{c}^{- 1} = ϕ^{'} (τ) < 1

and a ‘universal’ power-law decay term

n^{- 3 / 2}

(under the moment conditions on

ϕ

).

In the critical case

μ = 1

,

z_{c} = 1

and

P (\bar{N} (1) = n)

is a pure power law with a ‘universal’ value of

3 / 2

.

In the case of the binomial branching mechanism, one can check that

τ = q / [p (d - 1)],

ϕ (τ) = {(\frac{q d}{d - 1})}^{d},

in agreement with (10). In this context as well,

τ

plays an important role.

The binomial model is supercritical if and only if $μ := ϕ^{'} (1) = p d > 1$ (that is, $p > p_{c} := 1 / d$), in which case the smallest solution

ρ_{e}

to

ϕ (ρ_{e}) = ρ_{e}

lies in

(0, 1) .

This is a degree-d algebraic equation. In this case,

Φ (1) = P (\bar{N} (1) < \infty) = ρ_{e}

, the extinction probability of the binomial branching process.

-: If $ϕ^{'} (1) = 1$ ( $< 1$ ), the model is critical (subcritical) and $Φ (1) = 1$ . We have

$Φ^{'} (1) = \infty \quad (\text{respectively } Φ^{'} (1) = \frac{1}{1 - ϕ^{'} (1)} = \frac{1}{1 - p d}) .$

(14)

Critical trees are finite with probability 1, but their time to extinction is long with a law tail equivalent to that of a Pareto(1) power-law distribution.

In the subcritical case,

\bar{N} (1)

has all its moments, in particular, a finite variance

σ^{2} (\bar{N} (1)) = \frac{p q d}{{(1 - p d)}^{3}} < \infty

.

-: If $ϕ^{'} (1) > 1$ , the model is supercritical and $Φ (1) = ρ_{e}$ together with $E (\bar{N} (1)) = \infty$ . We also have

$Φ^{'} (1) = \frac{1}{1 - ϕ^{'} (ρ_{e})} = \frac{q + p ρ_{e}}{q - p (d - 1) ρ_{e}},$

the mean number of atoms of the binomial branching process conditioned on extinction with modified branching mechanism $ϕ (ρ_{e} z) / ϕ (ρ_{e}),$ having a mean number of offspring $ϕ^{'} (ρ_{e}) < 1 .$ From the above expression of $Φ^{'} (1)$ , we obtain $ρ_{e} < \frac{q}{p (d - 1)}$ , an upper-bound of $ρ_{e} .$

From (9),

ρ_{e} = Φ (1) = \sum_{n \geq 1} P (\bar{N} (1) = n)

is an expression of

ρ_{e} .
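The two characterizations of $ρ_{e}$, as the smallest fixed point of $ϕ$ and as the total mass of the law (9), can be compared numerically. This is only a sketch under stated assumptions: the series is truncated at a hypothetical `n_max`, terms are evaluated through log-gamma, and the function names and parameter values are illustrative.

```python
from math import lgamma, log, exp


def progeny_pmf(n, d, p):
    """Formula (9), evaluated through logarithms to avoid overflow."""
    q = 1.0 - p
    log_binom = lgamma(n * d + 1) - lgamma(n) - lgamma(n * d - n + 2)
    return exp(log(q / (n * p)) + log_binom + n * log(p * q ** (d - 1)))


def extinction_by_sum(d, p, n_max=5000):
    """Truncated sum of P(N(1) = n) over n = 1..n_max."""
    return sum(progeny_pmf(n, d, p) for n in range(1, n_max + 1))


def extinction_by_iteration(d, p, steps=5000):
    """Smallest root of phi(rho) = rho, by iterating phi from 0."""
    q = 1.0 - p
    rho = 0.0
    for _ in range(steps):
        rho = (q + p * rho) ** d
    return rho


d, p = 2, 0.6                      # supercritical example, mu = 1.2
rho_sum = extinction_by_sum(d, p)
rho_fix = extinction_by_iteration(d, p)
upper_bound = (1.0 - p) / (p * (d - 1))
print(rho_sum, rho_fix, upper_bound)
```

In this example both values should be close to $q^{2} / p^{2}$, and below the upper bound $q / (p (d - 1))$ mentioned above.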

There is also an estimate of

ρ_{e}

when the BGW process is nearly supercritical (

μ

slightly above 1). Let

{\bar{ρ}}_{e} = 1 - ρ_{e}

be the survival probability and

f (z) = ϕ (z) - z

, with

f (1) = 0, f^{'} (1) = μ - 1 and f^{″} (1) = E (ξ (ξ - 1)) = σ^{2} + μ^{2} - μ \underset{μ \sim 1^{+}}{\sim} σ_{c}^{2},

where

σ_{c}^{2}

is the variance in

ξ

at the criticality. We have

ρ_{e} = ϕ (ρ_{e}) \Leftrightarrow f (1 - {\bar{ρ}}_{e}) = 0 .

As a result of

f (1 - x) \sim f (1) - x f^{'} (1) + \frac{1}{2} x^{2} f^{″} (1),

we obtain the small survival probability estimate

{\bar{ρ}}_{e} \sim 2 (μ - 1) / σ_{c}^{2}

when the BGW process is nearly supercritical. As a function of

μ - 1

,

{\bar{ρ}}_{e}

is always continuous at 0 (

{\bar{ρ}}_{e} = 0

if

μ - 1 \leq 0

) but with a discontinuous slope at

{(μ - 1)}_{+}

, close to

2 / σ_{c}^{2} < \infty

. As

μ \to \infty

clearly

{\bar{ρ}}_{e} \to 1 .

A full power-series expansion of

{\bar{ρ}}_{e}

in terms of

μ - 1 > 0

can also be obtained as follows: define

\bar{ϕ} (z)

by

ϕ (z) = 1 + μ (z - 1) + \bar{ϕ} (1 - z)

, so with

\bar{ϕ} (0) = 0 .

The equation

ρ_{e} = ϕ (ρ_{e})

becomes

\frac{\bar{ϕ} ({\bar{ρ}}_{e})}{{\bar{ρ}}_{e}} = μ - 1 .

For

μ - 1

close to

0^{+}

, the Lagrange inversion formula gives

{\bar{ρ}}_{e} = \sum_{k \geq 1} ρ_{k} {(μ - 1)}^{k}, with

(15)

ρ_{k} = \frac{1}{k} [z^{k - 1}] {(\frac{\bar{ϕ} (z)}{z^{2}})}^{- k} .

Note

ρ_{1} = 2 / ϕ^{″} (1)

with

ϕ^{″} (1) \sim σ_{c}^{2}

when

μ

is slightly above 1. To the first order in

μ - 1

, we recover

{\bar{ρ}}_{e} \sim 2 (μ - 1) / σ_{c}^{2}

. The second-order coefficient is found to be

ρ_{2} = 4 / 3 \cdot ϕ^{‴} (1) / ϕ^{″} {(1)}^{3} .

Let us detail these formulae in the binomial example.

Example 1.

If

ϕ (z) = {(q + p z)}^{d}

,

μ = ϕ^{'} (1) = p d > 1

(the binomial case) and

ϕ^{″} (1) = p^{2} d (d - 1) \sim σ_{c}^{2}

. Here

\bar{ϕ} (z) / z^{2} = \sum_{b = 0}^{d - 2} {(- 1)}^{b} (\binom{d}{b + 2}) p^{b + 2} z^{b}

, giving

ρ_{k}

in principle, starting with

ρ_{1} = [z^{0}] {(\frac{\bar{ϕ} (z)}{z^{2}})}^{- 1} = 2 / [p^{2} d (d - 1)] .

▹
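The coefficients $ρ_{k}$ of Example 1 can be generated mechanically from (15) by truncated power-series arithmetic. The sketch below is one possible way to do so, with hypothetical helper names (`series_mul`, `series_inv`, `survival_coeffs`, `survival_exact`) and illustrative values $d = 3$, $p = 0.36$; the partial sum of the series is compared with the survival probability obtained by iterating $ϕ$.

```python
from math import comb


def series_mul(a, b, n_terms):
    """Product of two power series, keeping coefficients of z^0..z^(n_terms-1)."""
    out = [0.0] * n_terms
    for i in range(min(len(a), n_terms)):
        for j in range(min(len(b), n_terms - i)):
            out[i + j] += a[i] * b[j]
    return out


def series_inv(a, n_terms):
    """Reciprocal 1 / a(z) modulo z^n_terms (requires a[0] != 0)."""
    inv = [0.0] * n_terms
    inv[0] = 1.0 / a[0]
    for m in range(1, n_terms):
        acc = sum(a[j] * inv[m - j] for j in range(1, min(m, len(a) - 1) + 1))
        inv[m] = -acc / a[0]
    return inv


def survival_coeffs(d, p, k_max):
    """rho_k = (1/k) [z^(k-1)] (phibar(z) / z^2)^(-k), k = 1..k_max."""
    g = [(-1) ** b * comb(d, b + 2) * p ** (b + 2) for b in range(d - 1)]
    g_inv = series_inv(g, k_max)
    coeffs, power = [], [1.0] + [0.0] * (k_max - 1)
    for k in range(1, k_max + 1):
        power = series_mul(power, g_inv, k_max)   # (phibar / z^2)^(-k)
        coeffs.append(power[k - 1] / k)
    return coeffs


def survival_exact(d, p, steps=5000):
    """1 - rho_e, with rho_e obtained by iterating phi from 0."""
    q, rho = 1.0 - p, 0.0
    for _ in range(steps):
        rho = (q + p * rho) ** d
    return 1.0 - rho


d, p = 3, 0.36                     # slightly supercritical: mu - 1 = 0.08
eps = p * d - 1
coeffs = survival_coeffs(d, p, 12)
approx = sum(c * eps ** k for k, c in enumerate(coeffs, start=1))
print(approx, survival_exact(d, p))
```

Note that for the binomial mechanism the coefficients depend on p, so the expansion is best read as an identity in $μ - 1$ at fixed $\bar{ϕ}$; the first two coefficients should reproduce $ρ_{1} = 2 / ϕ^{″} (1)$ and $ρ_{2} = 4 / 3 \cdot ϕ^{‴} (1) / ϕ^{″} {(1)}^{3}$.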

From the exact expression of

P (τ_{1, 0} > h)

as in [12,13], we observe the following finite-size scaling law in the slightly supercritical regime for which $μ = 1 + x / h$, $x > 0$ and $ρ_{e} \sim 1 - 2 (μ - 1) / σ_{c}^{2}$ [with $σ_{c}^{2} = p^{2} d (d - 1)$, the critical variance of $ξ$ when $μ = 1$]:

h P (τ_{1, 0} > h) \to r (x) : = \frac{1}{σ_{c}^{2}} \frac{2 x e^{x}}{e^{x} - 1} as h \to \infty .

(16)

As in the strictly critical regime, the time to extinction has power-law tails with index 1, but with a non-constant asymptotic rate

r (x)

.

Regular supercritical or subcritical BGW processes conditioned to be critical. We end up with a last conditioning leading to a critical BGW tree with mean offspring number

μ_{c} = 1 .

Let $ϕ$ be regular, that is, with a convergence radius $z_{*} > 1$ (possibly $z_{*} = \infty$) and with $π_{0} > 0$. For such

ϕ

’s, a unique positive real root to the equation

ϕ (τ) - τ ϕ^{'} (τ) = 0,

(17)

exists, with

ρ_{e} = 1 < τ < z_{*}

if

μ < 1

(

ϕ (τ) > 1

),

τ = 1

if

μ = 1

and

ρ_{e} < τ < 1 < z_{*}

if

μ > 1

(

ϕ (τ) < 1

). In both cases,

ϕ^{'} (τ) < 1 .

Start with a supercritical branching process (

μ > 1

) and consider a process whose modified branching mechanism is

ϕ_{τ} (z) : = ϕ (τ z) / ϕ (τ)

, satisfying

ϕ_{τ} (1) = 1

and

ϕ_{τ}^{'} (1) = : μ_{c} = 1

, the one of a critical branching process with mean 1 offspring distribution and variance:

σ_{c}^{2} = τ^{2} ϕ^{″} (τ) / ϕ (τ)

. The transition matrix

P_{c}

of the critical process is given by its entries

P_{c} (k, k^{'}) = [z^{k^{'}}] ϕ_{τ} {(z)}^{k} = \frac{τ^{k^{'}}}{ϕ {(τ)}^{k}} P (k, k^{'}) = \frac{τ^{k^{'} - k}}{ϕ^{'} {(τ)}^{k}} P (k, k^{'}) .

(18)

In terms of p.g.f., with

Φ_{τ} (z) : = τ^{- 1} Φ (τ / ϕ (τ) z)

, where

Φ (z)

is the total progeny’s supercritical generating function (g.f.), solving

Φ (z) = z ϕ (Φ (z))

,

Φ (0) = 0

,

Φ_{τ} (z) = z ϕ_{τ} (Φ_{τ} (z)), Φ_{τ} (0) = 0 .

The new

Φ_{τ} (z)

has a new shifted convergence radius 1 (obeying

τ / ϕ (τ) z \leq z_{c}

) and with

Φ_{τ} (1) = τ^{- 1} Φ (τ / ϕ (τ)) = 1

(

Φ (z_{c}) = τ

).

This transformation kills the supercritical paths to only select the critical ones.

When

ϕ (z) = {(q + p z)}^{d},

τ = q / [p (d - 1)]

and

τ / ϕ (τ) = \frac{q}{p (d - 1)} {(\frac{d - 1}{d q})}^{d}

. Hence,

ϕ_{τ} (z) : = ϕ (τ z) / ϕ (τ) = {(\frac{d - 1 + z}{d})}^{d},

is the critical binomial branching mechanism.
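The tilting $ϕ_{τ} (z) = ϕ (τ z) / ϕ (τ)$ can be verified directly on the offspring probabilities. The following sketch (the function name `critical_tilt` and the parameter values are assumptions made for illustration) reweights $π_{b}$ by $τ^{b} / ϕ (τ)$ and compares the result with the bin$(d, 1 / d)$ law.

```python
from math import comb


def critical_tilt(d, p):
    """Return tau and the tilted pmf pi_b tau^b / phi(tau), b = 0..d."""
    q = 1.0 - p
    tau = q / (p * (d - 1))                 # root of phi(tau) = tau phi'(tau)
    phi_tau = (q + p * tau) ** d
    pmf = [comb(d, b) * p ** b * q ** (d - b) * tau ** b / phi_tau
           for b in range(d + 1)]
    return tau, pmf


d = 3
target = [comb(d, b) * (1 / d) ** b * (1 - 1 / d) ** (d - b) for b in range(d + 1)]
for p in (0.2, 0.5):                        # one subcritical, one supercritical
    tau, pmf = critical_tilt(d, p)
    mean = sum(b * w for b, w in enumerate(pmf))
    print(p, tau, mean, max(abs(x - y) for x, y in zip(pmf, target)))
```

Whatever the starting value of p, the tilted law should have mean 1 and coincide with the critical binomial law, with $τ > 1$ in the subcritical case and $τ < 1$ in the supercritical one.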

Similarly, start with a subcritical branching process ($μ < 1$) and consider a process whose modified branching mechanism (as a p.g.f.) is $ϕ_{τ} (z) = ϕ (τ z) / ϕ (τ)$, satisfying $ϕ_{τ} (1) = 1$ and $ϕ_{τ}^{'} (1) = : μ_{c} = 1$; it is again the one of a critical branching process.

This transformation creates critical paths from the subcritical ones.

2.3. Random Trees as Weighted Combinatorial Trees

Consider the plane tree ordinary generating function (o.g.f.) solving

T (z) = z {(1 + T (z))}^{d}

, with

c_{n} = [z^{n}] T (z) = \frac{1}{n} [z^{n - 1}] {(1 + z)}^{n d} = \frac{1}{n} (\binom{n d}{n - 1})

counting the number of such combinatorial binomial size-$n$ trees. By the Stirling formula, with

η_{c} : = sup (z > 0 : T (z) < \infty) = \frac{{(d - 1)}^{d - 1}}{d^{d}} < 1,

for large n,

c_{n} \sim \frac{1}{\sqrt{2 π}} {(\frac{d}{{(d - 1)}^{3}})}^{1 / 2} n^{- 3 / 2} η_{c}^{- n} .

(19)

Let

w (τ_{n}) = \prod_{b = 0}^{n - 1} w_{b}^{n_{b} (τ_{n})}

be the (multiplicative) weight of an unlabeled rooted tree

τ_{n}

with n nodes (

|τ_{n}| = n

) having

n_{b} (τ_{n})

nodes with outdegree (branching number)

b .

The weight

w (τ_{n})

is the product over the n nodes x of

τ_{n}

of the

w_{b_{n} (x)}

’s, where

b_{n} (x)

is the outdegree of x. Then,

W_{n} = \sum_{τ_{n}} w (τ_{n})

is the weight of all size-n such trees associated to the weight sequence w

: = (w_{b} \geq 0, b \geq 0),

so with

W_{n} = P (\bar{N} (1) = n)

, if, as in the binomial case,

w_{b} = p^{b} q^{d - b},

b = 0, \dots, d

and the number of these trees is

c_{n} .

Let

Φ (z) = \sum_{n \geq 1} z^{n} W_{n}

. Then,

Φ (z)

solves

Φ (z) = z ϕ (Φ (z)),

Φ (0) = 0,

where

ϕ (z) = \sum_{b \geq 0} π_{b} z^{b} = {(q + p z)}^{d} .

Recalling

\sum_{b = 0}^{n - 1} n_{b} (τ_{n}) = n

and

\sum_{b = 1}^{n - 1} b n_{b} (τ_{n}) = n - 1

(the total tree length), each tree

τ_{n}

has equal weight

\prod_{b = 0}^{d} w_{b}^{n_{b} (τ_{n})} = p^{n - 1} q^{n (d - 1) + 1} = \frac{q}{p} {(p q^{d - 1})}^{n}

and

W_{n} = P (\bar{N} (1) = n) = c_{n} \frac{q}{p} {(p q^{d - 1})}^{n},

(20)

a separable form in agreement with (9). Simply generated trees are weighted versions of such rooted trees; they were introduced in [14]. When dealing with k-forests of size n, owing now to

\{\begin{matrix} \sum_{b = 0}^{n - 1} n_{b} (τ_{n}) = n \\ \sum_{b = 1}^{n - 1} b n_{b} (τ_{n}) = n - k, \end{matrix}

then

\prod_{b = 0}^{d} w_{b}^{n_{b} (τ_{n})} = p^{n - k} q^{n (d - 1) + k} = {(\frac{q}{p})}^{k} {(p q^{d - 1})}^{n}

and

P (\bar{N} (k) = n) = c_{n, k} {(\frac{q}{p})}^{k} {(p q^{d - 1})}^{n}

where $c_{n, k}$ is the number of combinatorial binomial size-$n$ forests with k trees.
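The separable form (20) for single trees can be checked by summing tree weights recursively over the outdegree of the root. The sketch below assumes that a node with b children can occupy $\binom{d}{b}$ slot configurations of the d-ary tree and carries weight $w_{b}$; the function name `tree_weights` and the chosen d and p are hypothetical.

```python
from fractions import Fraction
from math import comb


def tree_weights(n_max, d, w):
    """W[n] = total weight of all size-n binomial trees, n = 1..n_max.

    A root with b children contributes C(d, b) * w[b] times the weight of an
    ordered b-tuple of subtrees with n - 1 nodes in total.
    """
    W = [Fraction(0)] * (n_max + 1)
    for n in range(1, n_max + 1):
        # conv[b][m]: total weight of ordered b-tuples of trees with m nodes.
        conv = [[Fraction(0)] * n for _ in range(d + 1)]
        conv[0][0] = Fraction(1)
        for b in range(1, d + 1):
            for m in range(b, n):
                conv[b][m] = sum(conv[b - 1][m - s] * W[s] for s in range(1, m + 1))
        W[n] = sum(comb(d, b) * w[b] * conv[b][n - 1] for b in range(d + 1))
    return W


d, p = 3, Fraction(1, 3)
q = 1 - p
counts = tree_weights(6, d, [Fraction(1)] * (d + 1))             # w_b = 1 gives c_n
weights = tree_weights(6, d, [p ** b * q ** (d - b) for b in range(d + 1)])
for n in range(1, 7):
    print(n, counts[n], weights[n] == counts[n] * (q / p) * (p * q ** (d - 1)) ** n)
```

With unit weights the recursion counts the trees, and with $w_{b} = p^{b} q^{d - b}$ every tree of size n receives the same weight, so the two outputs should differ only by the factor $\frac{q}{p} {(p q^{d - 1})}^{n}$.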

2.4. Selection of Paths Mechanisms of Random Trees: Rescaling

With

a_{1}, a_{2} > 0

, consider the weight sequence

w_{b} = a_{1}^{b} a_{2}

for which

\prod_{b = 0}^{d} w_{b}^{n_{b} (τ_{n})} = a_{1}^{n - 1} a_{2}^{n}

. Then,

P (\bar{N} (1) = n) \to P (\bar{N} (1) = n) a_{1}^{n - 1} a_{2}^{n}

is the weighted version of

P (\bar{N} (1) = n)

. Equivalently,

Φ (z) \to \tilde{Φ} (z) = a_{1}^{- 1} Φ (a_{1} a_{2} z),

(21)

solving

\tilde{Φ} (z) = z \tilde{ϕ} (\tilde{Φ} (z))

where

\tilde{ϕ} (z) = a_{2} ϕ (a_{1} z)

is the modified ‘branching mechanism’ and not necessarily a p.g.f. It is a p.g.f. when

a_{2} = 1 / ϕ (a_{1})

[if in addition

a_{1} = τ

, this is the selection of the critical paths mechanism discussed above; if in addition

a_{1} = ρ_{e}

, this is the selection of the subcritical paths mechanism discussed above].

If

a_{1} = 1

and

a_{2} \leq z_{c}

,

\tilde{Φ} (z) = Φ (a_{2} z)

resulting in a weighted version of

Φ (z)

with shifted convergence radius

z_{c} \to {\tilde{z}}_{c} = z_{c} / a_{2} \geq 1

[

= 1

if in addition

a_{2} = z_{c}

]. Note that

\tilde{ϕ} (z) = a_{2} ϕ (z)

no longer is a branching mechanism if

ϕ (z)

is one with

ϕ (1) = 1

.

If

a_{1} a_{2} = 1

,

\tilde{ϕ} (z) = a_{1}^{- 1} ϕ (a_{1} z)

is the modified ‘branching mechanism’, not necessarily a p.g.f. unless

a_{1} = ϕ (a_{1})

. Then,

\tilde{Φ} (z) = a_{1}^{- 1} Φ (z)

, resulting in a scaled version of

Φ (z)

with an unmodified convergence radius. If

Φ (1) < 1

, choosing

a_{1} = a_{2}^{- 1} = Φ (1)

yields

\tilde{Φ} (1) = 1 .

2.5. Total Number of Leaves (Sterile Individuals) Versus Total Progeny

In the branching population models just discussed, it is important to control the number of leaves in the BGW tree with a single founder because leaves are nodes (individuals) of the tree (population) that gave birth to no offspring (the frontier of the tree as sterile individuals) and so are responsible for its extinction. Leaves are nodes with outdegree zero, so let

{\bar{N}}^{0} (1)

be the number of leaves in a BGW tree with

\bar{N} (1)

nodes. With

Φ (z, u_{0}) = E (z^{\bar{N} (1)} u_{0}^{{\bar{N}}^{0} (1)})

the joint p.g.f. of

(\bar{N} (1), {\bar{N}}^{0} (1))

solves the functional equation

Φ (z, u_{0}) = z (π_{0} (u_{0} - 1) + ϕ (Φ (z, u_{0}))) .

(22)

With

{\bar{N}}_{n}^{0} (1) : = {\bar{N}}^{0} (1) ∣ \bar{N} (1) = n

, we have

E (u_{0}^{{\bar{N}}_{n}^{0} (1)}) = \frac{[z^{n}] Φ (z, u_{0})}{[z^{n}] Φ (z, 1)},

where

Φ (z, 1) = Φ (z)

. Using this, it is shown in [15], Th. 3.13, page 84, that, under our assumptions on $ϕ$,

\begin{matrix} \frac{1}{n} E ({\bar{N}}_{n}^{0} (1)) \underset{n \to \infty}{\to} m_{0} = \frac{ϕ (0)}{ϕ (τ)} \\ \frac{1}{n} σ^{2} ({\bar{N}}_{n}^{0} (1)) \underset{n \to \infty}{\to} σ_{0}^{2} = \frac{ϕ (0)}{ϕ (τ)} - \frac{ϕ {(0)}^{2}}{ϕ {(τ)}^{2}} - \frac{ϕ {(0)}^{2}}{τ^{2} ϕ {(τ)}^{2} ϕ^{″} (τ)} \\ \frac{{\bar{N}}_{n}^{0} (1) - m_{0} n}{σ_{0} \sqrt{n}} \underset{n \to \infty}{\overset{d}{\to}} N (0, 1) . \end{matrix}

(23)

As

n \to \infty

,

\frac{1}{n} {\bar{N}}_{n}^{0} (1)

converges in probability to

m_{0} < 1

, the asymptotic fraction of nodes in a size-n tree which are leaves. For the Geo₀

(π_{0})

generated tree with

ϕ (z) = π_{0} / (1 - {\bar{π}}_{0} z)

, it can be checked that

m_{0} = 1 / 2

, whereas for the Poisson generated tree with p.g.f.

ϕ (z) = e^{μ (z - 1)}

,

m_{0} = e^{- 1} .

For the negative binomial tree generated by

ϕ (z) = {(β / (1 - α z))}^{θ}

,

m_{0} = {(θ / (θ + 1))}^{θ}

, and for the Flory d-ary tree generated by the p.g.f.

ϕ (z) = {(1 - p + p z)}^{d}

,

m_{0} = {((d - 1) / d)}^{d} .

One possible way to see this is as follows.

With

{(x)}_{m} : = x (x - 1) \dots (x - m + 1)

, the m-falling factorial moments of

{\bar{N}}_{n}^{0} (1)

are given from Lagrange inversion formula by

\begin{matrix} E {({\bar{N}}_{n}^{0} (1))}_{m} & = & \frac{\frac{1}{n} [{(u_{0} - 1)}^{m} z^{n - 1}] {(π_{0} (u_{0} - 1) + ϕ (z))}^{n}}{\frac{1}{n} [z^{n - 1}] ϕ {(z)}^{n}} \\ = & \frac{(\binom{n}{m}) π_{0}^{m} [z^{n - 1}] ϕ {(z)}^{n - m}}{[z^{n - 1}] ϕ {(z)}^{n}} . \end{matrix}

When

m = 1

and for

ϕ (z) = {(q + p z)}^{d}

, a large n estimate using Stirling formula yields

\frac{1}{n} E ({\bar{N}}_{n}^{0} (1)) \underset{n \to \infty}{\to} m_{0} = {((d - 1) / d)}^{d},

(24)

independent of

p .

The variance estimate follows after some elementary algebraic computations dealing with

m = 2 .

When

d = 2

,

\frac{1}{n} E ({\bar{N}}_{n}^{0} (1)) \underset{n \to \infty}{\to} m_{0} = 1 / 4

as a result of

E {\bar{N}}_{n}^{0} (1) = n \frac{(\binom{2 (n - 1)}{n - 1})}{(\binom{2 n}{n - 1})} .
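The $m = 1$ case of the factorial-moment formula lends itself to an exact finite-n check. The sketch below (function name `mean_leaves` and parameter values are illustrative assumptions) evaluates $n π_{0} [z^{n - 1}] ϕ {(z)}^{n - 1} / [z^{n - 1}] ϕ {(z)}^{n}$ with rational arithmetic and compares the leaf fraction with the limit (24).

```python
from fractions import Fraction
from math import comb


def mean_leaves(n, d, p):
    """E(number of leaves | tree size n) for phi(z) = (q + p z)^d, m = 1 case."""
    q = 1 - p

    def coeff(N, j):
        # [z^j] (q + p z)^N
        return comb(N, j) * p ** j * q ** (N - j)

    return n * q ** d * coeff((n - 1) * d, n - 1) / coeff(n * d, n - 1)


p = Fraction(1, 3)                          # illustrative value
for n in (1, 2, 10, 100, 400):
    frac = float(mean_leaves(n, 3, p)) / n
    print(n, frac, (2 / 3) ** 3)
```

The powers of p and q cancel in the ratio, which is why the result does not depend on p; the printed fraction should drift towards ${((d - 1) / d)}^{d}$ with a correction of order $1 / n$.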

Remark 1.

Taking

z = 1

in (22),

Φ (u_{0}) : = Φ (1, u_{0})

solves the functional equation

Φ (u_{0}) = π_{0} (u_{0} - 1) + ϕ (Φ (u_{0}))

, with

Φ (u_{0}) = E (u_{0}^{{\bar{N}}^{0} (1)}, \bar{N} (1) < \infty)

, where

{\bar{N}}^{0} (1)

is the number of leaves of the tree regardless of its precise number of atoms. By the Lagrange inversion formula, the probability to observe

n_{0}

leaves under

\bar{N} (1) < \infty

is

[u_{0}^{n_{0}}] Φ (u_{0}) = \frac{1}{n_{0}} [u_{0}^{n_{0} - 1}] {(\frac{π_{0} u_{0}}{u_{0} + π_{0} - ϕ (u_{0})})}^{n_{0}} .

2.6. Forests

Consider now a k-forest of such trees (so with k founders). It takes into account the possibility that there are k independent distinguishable copies of BGW trees, each with a single founder. By the Lagrange formula, with

\bar{N} (k)

, the size of a forest given that it has k founders, consistently with (4), we have

\begin{matrix} P (\bar{N} (k) = n) & : = & [z^{n}] Φ {(z)}^{k} = \frac{k}{n} [z^{n - 1}] z^{k - 1} {(q + p z)}^{n d} \\ & = & \frac{k}{n} [z^{n - k}] {(q + p z)}^{n d} = \frac{k}{n} (\binom{n d}{n - k}) {(\frac{q}{p})}^{k} {(p q^{d - 1})}^{n} . \end{matrix}
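The closed form above can be cross-checked against a direct power-series computation. The sketch below (helper names `mul_trunc`, `pow_trunc`, `tree_series` and `forest_pmf` are ours) iterates the functional equation Φ = z (q + p Φ)^d with exact rational arithmetic, raises the truncated series to the k-th power, and compares the coefficients with the Lagrange expression.

```python
from fractions import Fraction
from math import comb

def mul_trunc(a, b, order):
    """Product of two coefficient lists, truncated at z**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a[: order + 1]):
        for j, bj in enumerate(b[: order + 1 - i]):
            out[i + j] += ai * bj
    return out

def pow_trunc(a, m, order):
    """m-th power of a coefficient list, truncated at z**order."""
    out = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(m):
        out = mul_trunc(out, a, order)
    return out

def tree_series(p, d, order):
    """Coefficients of Phi solving Phi = z (q + p Phi)^d, up to z**order."""
    q = 1 - p
    phi = [Fraction(0)] * (order + 1)
    for _ in range(order):  # each pass fixes one more coefficient
        inner = [q] + [p * c for c in phi[1:]]
        phi = [Fraction(0)] + pow_trunc(inner, d, order)[:order]
    return phi

def forest_pmf(n, k, p, d):
    """Closed form (k/n) C(nd, n-k) (q/p)^k (p q^(d-1))^n."""
    q = 1 - p
    return Fraction(k, n) * comb(n * d, n - k) * (q / p) ** k * (p * q ** (d - 1)) ** n

p, d, k, order = Fraction(1, 3), 3, 2, 6
phi_k = pow_trunc(tree_series(p, d, order), k, order)
print(all(phi_k[n] == forest_pmf(n, k, p, d) for n in range(k, order + 1)))
```

Passing p as a `Fraction` keeps every coefficient exact, so the comparison is an equality test rather than a floating-point tolerance.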

In the critical case (

p d = 1

),

P (\bar{N} (k) = n) = \frac{k}{n} (\binom{n d}{n - k}) {(d - 1)}^{k - n} {(1 - 1 / d)}^{n d} .

In this case, the population size of the BGW process is constant (

= k

) on average over the generations. The extinction probability of a k-forest is

ρ_{e}^{k} .

By the Stirling formula, with

\bar{α} : = 1 - α

,

(\binom{n d}{n - k}) = \frac{(n d)!}{(n - k)! (n (d - 1) + k)!} \sim \sqrt{\frac{d}{2 π \bar{α} (d - \bar{α}) n}} {(\frac{d^{d}}{{\bar{α}}^{\bar{α}} {(d - \bar{α})}^{d - \bar{α}}})}^{n} .

As a result, with

n \geq k

, in the thermodynamic limit

n, k \to \infty

while

k = [n α]

,

0 < α \leq 1

, the number

K_{n}

of connected components (trees) of a size-n forest obeys

P (K_{n} / n \to α) \sim α \sqrt{\frac{d}{2 π \bar{α} (d - \bar{α}) n}} {[\frac{d^{d} {(\frac{q}{p})}^{α} (p q^{d - 1})}{{\bar{α}}^{\bar{α}} {(d - \bar{α})}^{d - \bar{α}}}]}^{n} .

(25)

Hence, the number of trees forming a size-n forest obeys

- \frac{1}{n} log P (K_{n} / n \to α) \underset{n \to \infty}{\to} f_{1} (α) \geq 0,

(26)

where

f_{1} (α) = - log a_{1} (α)

and

a_{1} (α) = \frac{d^{d} {(\frac{q}{p})}^{α} (p q^{d - 1})}{{\bar{α}}^{\bar{α}} {(d - \bar{α})}^{d - \bar{α}}}

. We have

\begin{matrix} f_{1}^{'} (α) & = & log \frac{d - \bar{α}}{\bar{α}} - log (q / p) \\ f_{1}^{″} (α) & = & \frac{1}{\bar{α}} + \frac{1}{d - \bar{α}} > 0 . \end{matrix}

The function

f_{1} (α)

is convex over

0 < α \leq 1

. In the subcritical regime (

p d < 1

), it has a minimum at

α_{*} = 1 - p d

, with

f_{1} (α_{*}) = 0

reflecting the fact that

K_{n} / n \to α_{*}

, almost surely as

n \to \infty

.
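The large-deviation statement (26) can be illustrated numerically. The sketch below (function names are ours) evaluates the exact log-probability of the forest size through `math.lgamma` and compares −(1/n) log P with f_1(α) for 0 < α < 1; it also evaluates f_1 at α_* = 1 − p d in a subcritical example.

```python
from math import lgamma, log

def log_forest_pmf(n, k, p, d):
    """log of (k/n) C(nd, n-k) (q/p)^k (p q^(d-1))^n, via log-gamma."""
    q = 1 - p
    log_binom = lgamma(n * d + 1) - lgamma(n - k + 1) - lgamma(n * (d - 1) + k + 1)
    return log(k / n) + log_binom + k * log(q / p) + n * log(p * q ** (d - 1))

def f1(alpha, p, d):
    """Rate function f_1 = -log a_1, for 0 < alpha < 1."""
    q = 1 - p
    abar = 1 - alpha
    log_a1 = (d * log(d) + alpha * log(q / p) + log(p * q ** (d - 1))
              - abar * log(abar) - (d - abar) * log(d - abar))
    return -log_a1

p, d = 0.2, 3                      # subcritical example: p d = 0.6 < 1
print(f1(1 - p * d, p, d))         # numerically zero at alpha_* = 1 - p d
n = 200_000
for alpha in (0.2, 0.4, 0.7):
    k = round(alpha * n)
    print(alpha, -log_forest_pmf(n, k, p, d) / n, f1(alpha, p, d))
```

The two printed columns agree up to a correction of order (log n)/n coming from the algebraic prefactor in (25).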

Similarly, with

\bar{N} (k)

, the number of atoms in a k-forest, with

n = [k ρ],

ρ \geq 1,

P (\bar{N} (k) / k \to ρ) \sim \sqrt{\frac{d}{2 π k ρ (ρ - 1) ((d - 1) ρ + 1)}} {(d^{d ρ} {(\frac{p ρ}{ρ - 1})}^{ρ - 1} {(\frac{q ρ}{(d - 1) ρ + 1})}^{(d - 1) ρ + 1})}^{k} .

With

a_{2} (ρ) : = d^{d ρ} {(\frac{p ρ}{ρ - 1})}^{ρ - 1} {(\frac{q ρ}{(d - 1) ρ + 1})}^{(d - 1) ρ + 1}

and

\begin{matrix} f_{2} (ρ) & : = & - log a_{2} (ρ) \\ & = & - ρ log (\frac{p q^{d - 1} {(ρ d)}^{d}}{(ρ - 1) {((d - 1) ρ + 1)}^{d - 1}}) + log \frac{p ((d - 1) ρ + 1)}{q (ρ - 1)} \end{matrix}

(convex over the domain

ρ \geq 1

), by Cramér’s theorem [16],

- \frac{1}{k} log P (\bar{N} (k) / k \to ρ) \to f_{2} (ρ) \geq 0 .

(27)

The rate function

f_{2} (ρ)

is the Legendre transform of the convex free energy

log Φ (z) .

Therefore,

f_{2} (ρ) = ρ log z_{ρ} - log Φ (z_{ρ}) where \frac{z_{ρ} Φ^{'} (z_{ρ})}{Φ (z_{ρ})} = ρ,

(28)

with

\begin{matrix} z_{ρ} & = & \frac{q}{p} \frac{(ρ - 1) {((d - 1) ρ + 1)}^{d - 1}}{{(ρ q d)}^{d}}, \\ Φ (z_{ρ}) & = & \frac{q (ρ - 1)}{p ((d - 1) ρ + 1)}, Φ^{'} (z_{ρ}) = z_{ρ}^{- 1} ρ Φ (z_{ρ}) . \end{matrix}

As required, as

ρ \to \infty,

z_{ρ} \to z_{c}

,

Φ (z_{ρ}) \to Φ (z_{c}) = τ

and

Φ^{'} (z_{ρ}) \to \infty .

In the subcritical case, a Central Limit Theorem (CLT) holds:

\frac{\bar{N} (k) - k E (\bar{N} (1))}{\sqrt{k} σ (\bar{N} (1))} \overset{d}{\to} N (0, 1) .

(29)

In the binary case (

d = 2

),

\begin{matrix} f_{2} (ρ) & = & - log [4^{ρ} {(\frac{p ρ}{ρ - 1})}^{ρ - 1} {(\frac{q ρ}{ρ + 1})}^{ρ + 1}] \\ f_{2}^{'} (ρ) & = & log (\frac{ρ^{2} - 1}{ρ^{2}}) - log (4 p q) \\ f_{2}^{″} (ρ) & = & \frac{2}{ρ (ρ^{2} - 1)} > 0 . \end{matrix}

In this case,

\begin{matrix} z_{ρ} & = & \frac{ρ^{2} - 1}{4 p q ρ^{2}} \\ Φ (z_{ρ}) & = & \frac{q (ρ - 1)}{p (ρ + 1)}, Φ^{'} (z_{ρ}) = \frac{4 q^{2} ρ^{3}}{{(ρ + 1)}^{2}} . \end{matrix}

Note

f_{2}^{'} (ρ) = 0

when

ρ = : ρ_{*} = 1 / \sqrt{1 - 4 p q} = 1 / |1 - 2 p| > 1 .

In the subcritical case,

ρ_{*} = 1 / (1 - 2 p) > 1,

with

f_{2} (ρ_{*}) = 0

, translating

\bar{N} (k) / k \to ρ_{*}

, almost surely as

k \to \infty

.

In the critical case,

\bar{N} (1)

is heavy tailed with tail index

1 / 2

; we thus expect that

k^{- 2} \bar{N} (k) \overset{d}{\to} S_{1 / 2}

(a stable

(1 / 2)

random variable) and therefore that

n^{- 1 / 2} K_{n} \overset{d}{\to} S_{1 / 2}^{- 1 / 2}

(

n \to \infty

).

In the supercritical case,

ρ_{*} = 1 / (2 p - 1) > 1,

with

f_{2} (ρ_{*}) = - log Φ (1) = 2 log (p / q) > 0 .
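The binary rate function can be evaluated in two ways: directly from its definition and through the Legendre form (28), with z_ρ and Φ(z_ρ) as given above for d = 2. The following sketch (function names are ours) checks that both agree and evaluates f_2 at ρ_* = 1/|1 − 2p| in a subcritical and a supercritical example.

```python
from math import log

def f2_binary(rho, p):
    """f_2(rho) = -log[4^rho (p rho/(rho-1))^(rho-1) (q rho/(rho+1))^(rho+1)]."""
    q = 1 - p
    a2 = 4 ** rho * (p * rho / (rho - 1)) ** (rho - 1) * (q * rho / (rho + 1)) ** (rho + 1)
    return -log(a2)

def f2_legendre(rho, p):
    """Same function through (28): rho log z_rho - log Phi(z_rho), d = 2."""
    q = 1 - p
    z_rho = (rho ** 2 - 1) / (4 * p * q * rho ** 2)
    phi_at_z = q * (rho - 1) / (p * (rho + 1))
    return rho * log(z_rho) - log(phi_at_z)

def rho_star(p):
    """Zero of f_2', i.e. 1/|1 - 2p| (p different from 1/2)."""
    return 1 / abs(1 - 2 * p)

for p in (0.3, 0.75):  # subcritical, then supercritical
    r = rho_star(p)
    print(p, r, f2_binary(r, p), f2_legendre(r, p))
```

In the subcritical example the value at ρ_* vanishes up to rounding, while in the supercritical example it is strictly positive.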

Remark 2.

(i) The process

{(\bar{N} (k))}_{k}

has stationary independent increments. The processes

\bar{N} (k)

and

K_{n}

are mutual inverses in that

K_{n} = inf (k : \bar{N} (k) > n) .

The process ${(K_{n})}_{n}$ is a renewal process whose waiting times between consecutive unit jumps are all distributed like $\bar{N} (1)$ :

$K_{n} \overset{d}{=} 1 \cdot 1_{\{\bar{N} (1) > n - 1\}} + \sum_{m = 1}^{n - 1} 1_{\{\bar{N} (1) = m\}} \cdot (1 + K_{n - m}^{'}), n \geq 1 .$
(ii) In the weak case, with $Φ_{0} (z)$ substituted for $Φ (z)$ , the Lagrange inversion formula yields

$\begin{matrix} P (\bar{N} (k) = n) & : & = [z^{n}] Φ_{0} {(z)}^{k} = \frac{k}{n} [z^{n - 1}] z^{k - 1} {(q + p z)}^{(n + k - 1) d} \\ = & \frac{k}{n} [z^{n - k}] {(q + p z)}^{(n + k - 1) d} = \frac{k}{n q} (\binom{(n + k - 1) d}{n - k}) {(\frac{q^{d + 1}}{p})}^{k} {(p q^{d - 1})}^{n} . \end{matrix}$

In the upper binomial term, n is changed to $n + k - 1$ ( $\sim k (ρ + 1)$ ). By the Stirling formula, a new $f_{2} (ρ)$ is obtained by substituting $ρ + 1$ for ρ in (28). Note that now $ρ \geq 0 .$

Maxwell–Boltzmann d-partition of n into k parts.

Let

{\bar{N}}_{n, k} = ({\bar{N}}_{n, 1}, \dots, {\bar{N}}_{n, k})

be the joint total progenies of each of the k founders. With

n_{k} = (n_{1}, \dots, n_{k})

, we have

\begin{matrix} P ({\bar{N}}_{n, k} = n_{k}) & = & \frac{[z^{n} \prod_{l = 1}^{k} z_{l}^{n_{l}}] \prod_{l = 1}^{k} Φ (z z_{l})}{[z^{n}] Φ {(z)}^{k}} \\ = & \frac{\prod_{l = 1}^{k} P ({\bar{N}}_{n, l} = n_{l})}{P (\bar{N} (k) = n)} δ_{|n_{k}| = n} \\ = & \frac{\prod_{l = 1}^{k} \frac{1}{n_{l}} (\binom{n_{l} d}{n_{l} - 1})}{\frac{k}{n} (\binom{n d}{n - k})} δ_{|n_{k}| = n} . \end{matrix}

where

|n_{k}| = \sum_{l = 1}^{k} n_{l}

sums to n. It is an exchangeable Maxwell–Boltzmann balls-in-boxes distribution (independent of

p, q

) on the simplex

|n_{k}| = n

, where the balls consist of the progenies of each founder (the boxes).

The size of a typical (1-dimensional marginal) box occupancy is given by

P ({\bar{N}}_{n, 1} = n_{1}) = \frac{[z^{n_{1}}] Φ (z) [z^{n - n_{1}}] Φ {(z)}^{k - 1}}{[z^{n}] Φ {(z)}^{k}}

(30)

= \frac{(k - 1) n}{k n_{1} (n - n_{1})} \frac{(\binom{n_{1} d}{n_{1} - 1}) (\binom{(n - n_{1}) d}{n - n_{1} - k + 1})}{(\binom{n d}{n - k})} .

With

n_{1} = 1, \dots, n - k + 1

, summing over

n_{1}

, the pmf of

{\bar{N}}_{1}

is seen to be proper (non-defective). Clearly,

E ({\bar{N}}_{1}) = n / k .
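Both claims (properness and mean n/k) can be checked exactly for small parameters. The sketch below evaluates the second line of (30) with rational arithmetic; the name `occupancy_pmf` is ours and the function assumes k ≥ 2.

```python
from fractions import Fraction
from math import comb

def occupancy_pmf(n1, n, k, d):
    """P(N_{n,1} = n1) from (30), for k >= 2 founders and n >= k atoms."""
    if not 1 <= n1 <= n - k + 1:
        return Fraction(0)
    m = n - n1  # atoms left for the other k - 1 founders
    numerator = (k - 1) * n * comb(n1 * d, n1 - 1) * comb(m * d, m - k + 1)
    return Fraction(numerator, k * n1 * m * comb(n * d, n - k))

n, k, d = 12, 4, 3
pmf = {n1: occupancy_pmf(n1, n, k, d) for n1 in range(1, n - k + 2)}
print(sum(pmf.values()), sum(n1 * w for n1, w in pmf.items()))
```

The two printed values are the total mass and the mean, to be compared with 1 and n/k.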

A large population thermodynamic limit exists (

n, k \to \infty

with

n / k \to ρ \geq 1

), with

{\bar{N}}_{1} = : {\bar{N}}_{1} (ρ)

having the mean-

ρ

limiting distribution given, from a saddle-point analysis, by

E (z^{{\bar{N}}_{1} (ρ)}) = \frac{Φ (z z_{ρ})}{Φ (z_{ρ})},

(31)

where

z_{ρ} \in (0, z_{c})

solves

\frac{z Φ^{'} (z)}{Φ (z)} = ρ .

(32)

For all

ρ \geq 1

,

z_{ρ}

is uniquely defined firstly because

\frac{z Φ^{'} (z)}{Φ (z)}

is increasing and because, as observed before, when

z \to z_{c}

,

Φ (z) \to Φ (z_{c}) = τ

and

Φ^{'} (z) \to \infty

with

ρ \to \infty .

Strict d-partition of n into k parts.

So far, as in [17], we allow the population size to vary stochastically according to a Galton–Watson branching process, possibly with a constant size on average as in the critical case. However, most population genetics studies have their origins in a Wright–Fisher or some closely related fixed-population model, in which each individual randomly chooses its ancestor [18]. We briefly describe the situation relative to the binomial branching mechanism, in which the process is strictly (almost surely) with constant population size k over the generations. Consider k independent and identically distributed random variables

ξ_{k} : = (ξ_{1}, \dots, ξ_{k})

each with binomial distribution

\boldsymbol{π} : = (π_{0}, \dots, π_{d})

. Let

ν_{k} = (ν_{1, k}, \dots, ν_{k, k}) : = (ξ_{1}, \dots, ξ_{k} ∣ |ξ_{k}| = k) .

Then, with

l_{1} + \dots + l_{k} = k,

P (ν_{k} = l_{1}, \dots, l_{k}) = \frac{π_{l_{1}} \dots π_{l_{k}}}{[z^{k}] {(q + p z)}^{k d}} = \frac{(\binom{d}{l_{1}}) \dots (\binom{d}{l_{k}})}{(\binom{k d}{k})},

(33)

recalling

π_{l} = (\binom{d}{l}) p^{l} q^{d - l}

(= 0 if l \notin \{0, \dots, d\})

and

[z^{k}] {(q + p z)}^{k d} = (\binom{k d}{k}) p^{k} q^{k (d - 1)} .

This distribution is independent of p and

q .

As a consequence, we obtain the identity

\sum_{l_{1} + \dots + l_{k} = k} (\binom{d}{l_{1}}) \dots (\binom{d}{l_{k}}) = (\binom{k d}{k}) .

The distribution of

ν_{k}

is exchangeable, the law of each component being

\begin{matrix} P (ν_{1, k} = l_{1}) & = & \frac{[z^{k} z_{1}^{l_{1}}] {(q + p z z_{1})}^{d} {(q + p z)}^{(k - 1) d}}{[z^{k}] {(q + p z)}^{k d}} \\ = & \frac{(\binom{d}{l_{1}}) (\binom{(k - 1) d}{k - l_{1}})}{(\binom{k d}{k})}, l_{1} = 0, \dots, d \land k . \end{matrix}
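For small k and d, the joint law (33), the resulting identity and the one-dimensional marginal can be verified by brute-force enumeration. This is only an illustrative sketch; the function names are ours.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(ls, d):
    """P(nu_k = (l_1, ..., l_k)) from (33); zero off the simplex sum(ls) = k."""
    k = len(ls)
    if sum(ls) != k:
        return Fraction(0)
    weight = 1
    for l in ls:
        weight *= comb(d, l)
    return Fraction(weight, comb(k * d, k))

def marginal_pmf(l1, k, d):
    """Law of one component, for l1 = 0, ..., min(d, k)."""
    return Fraction(comb(d, l1) * comb((k - 1) * d, k - l1), comb(k * d, k))

k, d = 3, 4
states = list(product(range(d + 1), repeat=k))
print(sum(joint_pmf(ls, d) for ls in states))  # total mass of (33)
for l1 in range(min(d, k) + 1):
    summed = sum(joint_pmf(ls, d) for ls in states if ls[0] == l1)
    print(l1, summed, marginal_pmf(l1, k, d))
```

Since p and q do not appear in (33), no branching parameter is needed in the check.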

When

d \to \infty

, (33) reduces to

P (ν_{k} = l_{1}, \dots, l_{k}) = \frac{k!}{\prod_{i = 1}^{k} l_{i}!} k^{- k},

the multinomial Wright–Fisher distribution. Asymptotic independence is obtained when

k \to \infty

in which the law of

ν_{\infty}

takes the product form of independent and identically distributed mean-1 Poisson distributions.

2.7. Random Simple Trees with Given Outdegree Sequences

The joint generating function of simple trees with given outdegree sequence solves

Φ (z, u) = z (\sum_{b = 0}^{d} π_{b} u_{b} Φ {(z, u)}^{b}) .

Here,

u : = (u_{0}, \dots, u_{d})

marks the nodes with the different outdegrees. Hence,

\begin{matrix} [z^{n} \prod_{b = 0}^{d} u_{b}^{n_{b}}] Φ (z, u) & = & \frac{1}{n} [z^{n - 1} \prod_{b = 0}^{d} u_{b}^{n_{b}}] {(\sum_{b = 0}^{d} π_{b} u_{b} z^{b})}^{n} \\ & = & \frac{1}{n} (\binom{n}{n_{0} n_{1} \dots n_{d}}) \prod_{b = 0}^{d} π_{b}^{n_{b}} \\ & = & \frac{1}{n} (\binom{n}{n_{0}}) (\binom{n - n_{0}}{n_{1} \dots n_{d}}) \prod_{b = 0}^{d} π_{b}^{n_{b}} \end{matrix}

with the

n_{b}

’s obeying

\{\begin{matrix} n_{1} + \dots + n_{d} = n - n_{0} \\ \sum_{b = 1}^{d} b n_{b} = n - 1 . \end{matrix}

(34)

As a result:

n_{0} = 1 + \sum_{b = 1}^{d} (b - 1) n_{b}

and

n = 1 + \sum_{b = 1}^{d} b n_{b} .

There are

(\binom{n - n_{0} + d - 1}{d - 1})

such non-negative

n_{b}

’s satisfying the first constraint (as a weak composition). In the sequel, we shall use the

* *

symbol whenever summing over the

n_{d} : = (n_{1}, \dots, n_{d})

obeying the two constraints (34) above. Clearly, the number of non-negative integers solving (34) is given by

[u^{n - n_{0}} z^{n - 1}] \prod_{b = 1}^{d} {(1 - u z^{b})}^{- 1} .

It is the number of partitions of

n - 1

into parts of size at most d, the number of occurrences of part b in a partition being

n_{b}

with

\sum_{b = 1}^{d} n_{b} = n - n_{0} .

The joint generating function (p.g.f.) of their nodes and leaves in particular reads

Φ (z, u_{0}) = z (q^{d} (u_{0} - 1) + {(q + p Φ (z, u_{0}))}^{d}),

hence with

\begin{matrix} [z^{n}] Φ (z, u_{0}) & = & \frac{1}{n} [z^{n - 1}] {(q^{d} (u_{0} - 1) + {(q + p z)}^{d})}^{n} \\ = & \frac{1}{n} [z^{n - 1}] \sum_{n_{0} = 0}^{n} (\binom{n}{n_{0}}) q^{n_{0} d} u_{0}^{n_{0}} {[{(q + p z)}^{d} - q^{d}]}^{n - n_{0}} . \end{matrix}

Therefore, the probability of a configuration with n atoms and

n_{0}

leaves is

[z^{n} u_{0}^{n_{0}}] Φ (z, u_{0}) = \frac{q^{n_{0} d}}{n} (\binom{n}{n_{0}}) [z^{n - 1}] {[{(q + p z)}^{d} - q^{d}]}^{n - n_{0}}

(35)

\begin{matrix} = & \frac{π_{0}^{n_{0}}}{n} (\binom{n}{n_{0}}) [z^{n - 1}] {[\sum_{b = 1}^{d} π_{b} z^{b}]}^{n - n_{0}} \\ = & \frac{π_{0}^{n_{0}}}{n} (\binom{n}{n_{0}}) \sum_{n_{d}}^{* *} (\binom{n - n_{0}}{n_{1} \dots n_{d}}) \prod_{b = 1}^{d} π_{b}^{n_{b}}, \end{matrix}

in view of

{[\sum_{b = 1}^{d} π_{b} z^{b}]}^{n - n_{0}} = \sum_{n_{1} + \dots + n_{d} = n - n_{0}} z^{\sum_{b = 1}^{d} b n_{b}} (\binom{n - n_{0}}{n_{1} \dots n_{d}}) \prod_{b = 1}^{d} π_{b}^{n_{b}} .

From (35), given n and

n_{0}

, the number of d-trees whose nodes have outdegree sequence

n_{1}, \dots, n_{d}

satisfying

* *

in (34) (as the factor in front of the weights

\prod_{b = 1}^{d} π_{b}^{n_{b}}

) is

c_{n, n_{0}} (n_{d}) = \frac{(n - 1)!}{n_{0}!} \frac{1}{\prod_{b = 1}^{d} n_{b}!},

(36)

in agreement with [19,20] Theorem 4.

All this can prove useful and explicit in the following situation: suppose we are interested in a specific set of

n_{d}

’s, from which the values of

n, n_{0}

follow from (34). Then, the corresponding number of d-trees is known, together with the probability of such a configuration. For example, suppose

n_{b} = n_{1}

for

b = 1, \dots, d .

Then,

n = 1 + n_{1} (\binom{d + 1}{2})

and

n_{0} = n - d n_{1}

, and the probability of this uniform configuration is

\frac{(n - 1)!}{n_{0}!} \frac{π_{0}^{n_{0}} {(\prod_{b = 1}^{d} π_{b})}^{n_{1}}}{{(n_{1}!)}^{d}} .

Remark 3.

In the binary case

(d = 2)

, for fixed values of

n, n_{0} \in \{1, \dots, ⌊(n + 1) / 2⌋\},

we have

n_{1} = n - 2 n_{0} + 1

and

n_{2} = n_{0} - 1

(a single possible choice for

n_{1}, n_{2}

); if

n - 2 n_{0} + 1 < 0

, there are no solutions to (34). Hence,

P (\bar{N} (1) = n, {\bar{N}}_{b} = n_{b}; b = 0, \dots, 2) = \frac{1}{n} (\binom{n}{n_{0} n_{1} n_{2}}) \prod_{b = 0}^{2} π_{b}^{n_{b}} \cdot δ_{n_{1} = (n - 2 n_{0} + 1), n_{2} = n_{0} - 1} .

In addition, while summing over

n_{0},

P (\bar{N} (1) = n) = \frac{q}{n p} (\binom{2 n}{n - 1}) {(p q)}^{n},

in agreement with (9). Given

n, n_{0}

, there are

c_{n, n_{0}} (n_{1}, n_{2}) = \frac{(n - 1)!}{n_{0}! (n_{0} - 1)! (n - 2 n_{0} + 1)!}

such binary simple trees. If

n = 10

,

n_{0} = 4

yields

9! / (4! 3! 3!) = 420 .

There are no binary trees with

n = 4

,

n_{0} = 3

.
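The counting formula (36) and its binary specialization are easy to evaluate. In the sketch below (function names are ours), `tree_count` takes the number of leaves and the tuple (n_1, …, n_d), checks the constraints (34), and returns the count; `binary_tree_count` returns 0 when (34) has no solution.

```python
from fractions import Fraction
from math import comb, factorial

def tree_count(n0, degrees):
    """Number (36) of simple d-trees with n0 leaves and degrees = (n_1, ..., n_d)."""
    n = 1 + sum(b * nb for b, nb in enumerate(degrees, start=1))
    if n0 != n - sum(degrees):
        raise ValueError("constraints (34) are violated")
    denom = factorial(n0)
    for nb in degrees:
        denom *= factorial(nb)
    return factorial(n - 1) // denom

def binary_tree_count(n, n0):
    """Binary case: n_1 = n - 2 n0 + 1 and n_2 = n0 - 1 are forced by (34)."""
    n1, n2 = n - 2 * n0 + 1, n0 - 1
    if n0 < 1 or n1 < 0:
        return 0
    return tree_count(n0, (n1, n2))

print(binary_tree_count(10, 4))  # the n = 10, n0 = 4 example of Remark 3
print(binary_tree_count(4, 3))   # no binary tree with n = 4, n0 = 3
```

Weighting each count by π_0^{n_0} π_1^{n_1} π_2^{n_2} and summing over n_0 should recover P(N̄(1) = n) as stated in Remark 3.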

As an extension, given n and

n_{0}

, the number of k-forests of simple d-trees whose nodes have outdegree sequence

n_{1}, \dots, n_{d}

is

c_{n, n_{0}, k} (n_{d}) = \frac{(n - k)!}{(n_{0} - k + 1)!} \frac{1}{\prod_{b = 1}^{d} n_{b}!} .

Here

n_{1}, \dots, n_{d}

now satisfy

\{\begin{matrix} n_{1} + \dots + n_{d} = n - n_{0} \\ \sum_{b = 1}^{d} b n_{b} = n - k . \end{matrix}

This is in agreement with Theorem 4 of [1].

2.8. The Limiting Poisson Case ( $d \to \infty$ )

We here mention some related computations encompassing the limiting Poisson case. Let

T (z)

solve

T (z) = z e^{T (z)}

with

T (0) = 0,

and hence with solution

T (z) = \sum_{n \geq 1} \frac{n^{n - 1}}{n!} z^{n},

(37)

with for

n \geq 1

C_{n} : = n! [z^{n}] T (z) = n^{n - 1} .

C_{n}

counts the number of labeled Cayley simple rooted trees (the Cayley formula). The convergence radius of

T (z)

is

ϱ_{c} = 1 / e .

The g.f. of random Poisson

(μ)

rooted simple trees solves the functional equation

Φ (z) = z ϕ (Φ (z))

with

ϕ (z) = e^{- μ (1 - z)}

and

Φ (0) = 0

. Hence,

Φ (z) = \frac{1}{μ} T (z / z_{c}) = \frac{1}{μ} \sum_{n \geq 1} \frac{n^{n - 1}}{n!} {(z / z_{c})}^{n}

with

z_{c} : = e^{μ} / μ > 1

and convergence radius

z_{c} / e = e^{μ - 1} / μ \geq 1

. Then,

P ({\bar{N}}_{n} (1) = n) = [z^{n}] Φ (z) = \frac{1}{μ} \frac{n^{n - 1}}{n!} z_{c}^{- n},

is the Borel distribution. With

w_{b} = μ^{b} e^{- μ}

, the weight of a node with outdegree b, the weight of a size-n tree is

w (τ_{n}) = \prod_{b \geq 0} w_{b}^{n_{b}} = μ^{n - 1} e^{- n μ}

independent of the

n_{b}

’s. Hence,

P ({\bar{N}}_{n} (1) = n) = \frac{C_{n}}{n!} w (τ_{n})

, a separable form.
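The separable form (C_n/n!) w(τ_n) = (n^{n−1}/n!) μ^{n−1} e^{−nμ} can be evaluated numerically. The sketch below (the function name `borel_pmf` is ours) works through logarithms to avoid overflow and, for a subcritical value μ < 1, checks that the masses sum to one with mean 1/(1 − μ), a standard property of the Borel distribution.

```python
from math import exp, lgamma, log

def borel_pmf(n, mu):
    """(n^(n-1)/n!) * mu^(n-1) * exp(-n*mu), evaluated through logarithms."""
    return exp((n - 1) * log(n) - lgamma(n + 1) + (n - 1) * log(mu) - n * mu)

mu = 0.5
probs = [borel_pmf(n, mu) for n in range(1, 2001)]
total = sum(probs)
mean = sum(n * pn for n, pn in enumerate(probs, start=1))
print(total, mean, 1 / (1 - mu))
```

The truncation at n = 2000 is harmless here because the terms decay geometrically when μ < 1.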

Furthermore, for k-forests of such simple trees,

P (\bar{N} (k) = n) = \frac{k}{n} [z^{n - k}] e^{- n μ (1 - z)} = \frac{k}{n} e^{- n μ} \frac{{(μ n)}^{n - k}}{(n - k)!},

(38)

the Borel–Tanner distribution [21]. Using the Stirling formula, with

ρ > 1,

P (\frac{\bar{N} (k)}{k} \to ρ) \sim \frac{1}{ρ \sqrt{2 π k (ρ - 1)}} e^{- k (ρ μ - (ρ - 1))} {(\frac{ρ - 1}{μ ρ})}^{- k (ρ - 1)}

showing that

- \frac{1}{k} log P (\frac{\bar{N} (k)}{k} \to ρ) \to f_{2} (ρ) = ρ μ - (ρ - 1) + (ρ - 1) log (\frac{ρ - 1}{μ ρ}) .

(39)

The joint combinatorial generating function of their nodes and leaves reads

T (z, u_{0}) = z (u_{0} - 1 + e^{T (z, u_{0})}),

hence with

\begin{matrix} [z^{n}] T (z, u_{0}) & = & \frac{1}{n} [z^{n - 1}] {(u_{0} - 1 + e^{z})}^{n} \\ = & \frac{1}{n} [z^{n - 1}] \sum_{n_{0} = 0}^{n} (\binom{n}{n_{0}}) u_{0}^{n_{0}} {(e^{z} - 1)}^{n - n_{0}} . \end{matrix}

Therefore,

C_{n, n_{0}} : = n! [z^{n} u_{0}^{n_{0}}] T (z, u_{0}) = (n - 1)! (\binom{n}{n_{0}}) [z^{n - 1}] {(e^{z} - 1)}^{n - n_{0}}

(40)

= \frac{n!}{n_{0}!} S_{n - 1, n - n_{0}}, n_{0} = 1, \dots, n - 1,

due to the vertical g.f. of second-kind Stirling numbers

S_{n, n_{0}}

(see [22]):

\sum_{n \geq n_{0}} S_{n, n_{0}} \frac{z^{n}}{n!} = \frac{{(e^{z} - 1)}^{n_{0}}}{n_{0}!} \Rightarrow [z^{n}] {(e^{z} - 1)}^{n_{0}} = \frac{n_{0}!}{n!} S_{n, n_{0}} .

We obtain

n_{0} C_{n + 1, n_{0}} = n_{0} (n + 1) C_{n, n_{0}} + (n + 1) (n - n_{0} + 1) C_{n, n_{0} - 1},

(41)

with boundary conditions

C_{n, 0} = C_{n, n} = 0

,

C_{n, 1} = n!

and

C_{n, n - 1} = n .

In addition,

C_{n} : = n! [z^{n}] T (z, 1) = n^{n - 1} = \sum_{n_{0} = 1}^{n - 1} C_{n, n_{0}} .

Assuming uniform sampling, we have

P (L_{n} = n_{0}) = \frac{C_{n, n_{0}}}{C_{n}} > 0,

n_{0} = 1, \dots, n - 1

(otherwise, 0): the law of

L_{n}

has finite support, varying with n. From (41), with

L_{n} = {\bar{N}}_{n}^{0} (1)

, we obtain

n_{0} P (L_{n + 1} = n_{0}) = n_{0} \frac{C_{n + 1, n_{0}}}{C_{n + 1}} =

{(\frac{n}{n + 1})}^{n - 1} [n_{0} P (L_{n} = n_{0}) + (n - (n_{0} - 1)) P (L_{n} = n_{0} - 1)] .

(42)

The recursion (42) may be written as

P (L_{n + 1} = n_{0}) = q_{n_{0}, n_{0}}^{(n)} P (L_{n} = n_{0}) + q_{n_{0} - 1, n_{0}}^{(n)} P (L_{n} = n_{0} - 1),

defining the (positive) transition coefficients

q_{n_{0}, n_{0}}^{(n)}

and

q_{n_{0} - 1, n_{0}}^{(n)}

(not transition probabilities because

q_{n_{0}, n_{0}}^{(n)} + q_{n_{0}, n_{0} + 1}^{(n)} \neq 1

). This three-term (‘space-time’ inhomogeneous) recurrence is therefore not the one of a standard Markov chain with a usual probability transition matrix. However, it is the one of a triangular Markovian probability sequence whose support varies with n linearly.

Next, the identity ([23])

\sum_{n_{0} = 1}^{n - 1} C_{n, n_{0}} \frac{(\binom{x}{n - n_{0}})}{(\binom{n}{n - n_{0}})} = x^{n - 1}

yields (with

x = n - 1

):

\sum_{n_{0} = 1}^{n - 1} n_{0} C_{n, n_{0}} = n {(n - 1)}^{n - 1} .

Also,

\partial_{u_{0}} T (z, 1) = \frac{z}{1 - T (z, 1)},

with

\begin{matrix} n! [z^{n}] \partial_{u_{0}} T (z, 1) & = & n! [z^{n - 1}] \frac{1}{1 - T (z, 1)} = \frac{n!}{n - 1} [z^{n - 2}] ({(\frac{1}{1 - z})}^{'} e^{(n - 1) z}) \\ & = & n {(n - 1)}^{n - 1}, \end{matrix}

so that

\begin{matrix} E (L_{n}) & = & \frac{[z^{n}] \partial_{u_{0}} T (z, 1)}{[z^{n}] T (z, 1)} = n {(1 - \frac{1}{n})}^{n - 1} \sim n / e \\ σ^{2} (L_{n}) & = & n (n - 1) {(1 - \frac{2}{n})}^{n - 1} + n {(1 - \frac{1}{n})}^{n - 1} - n^{2} {(1 - \frac{1}{n})}^{2 (n - 1)} \\ & \sim & n \frac{e - 2}{e^{2}} . \end{matrix}

The variance term is obtained while plugging

x = n - 2

in the identity. The Central Limit Theorem (CLT) therefore holds (see [23]):

\frac{L_{n} - E (L_{n})}{σ (L_{n})} \overset{d}{\to} N (0, 1) .

(43)
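The leaf counts C_{n,n_0} of (40) and the resulting moments of L_n can be computed exactly for moderate n. The sketch below (function names are ours) builds the second-kind Stirling numbers by their standard recurrence, forms C_{n,n_0} = (n!/n_0!) S_{n−1,n−n_0}, and returns the total count together with the exact mean and variance of L_n under uniform sampling.

```python
from fractions import Fraction
from math import e, factorial

def stirling2_table(n_max):
    """Table s[n][m] of second-kind Stirling numbers, 0 <= m <= n <= n_max."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(1, n_max + 1):
        for m in range(1, n + 1):
            s[n][m] = m * s[n - 1][m] + s[n - 1][m - 1]
    return s

def leaf_counts(n):
    """C_{n,n0} = (n!/n0!) S(n-1, n-n0) for n0 = 1, ..., n-1 (n >= 2)."""
    s = stirling2_table(n - 1)
    return {n0: factorial(n) // factorial(n0) * s[n - 1][n - n0] for n0 in range(1, n)}

def leaf_mean_and_variance(n):
    """Total count, exact mean and exact variance of L_n."""
    counts = leaf_counts(n)
    total = sum(counts.values())
    mean = Fraction(sum(n0 * c for n0, c in counts.items()), total)
    second = Fraction(sum(n0 * n0 * c for n0, c in counts.items()), total)
    return total, mean, second - mean ** 2

n = 60
total, mean, var = leaf_mean_and_variance(n)
print(total == n ** (n - 1), float(mean) / n, 1 / e, float(var) / n, (e - 2) / e ** 2)
```

The printed ratios E(L_n)/n and σ²(L_n)/n can be compared with their limits 1/e and (e − 2)/e².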

With

z_{ρ}

obeying

\frac{z_{ρ} Φ^{'} (z_{ρ})}{Φ (z_{ρ})} = ρ \geq 1

,

\frac{Φ (z_{ρ} z)}{Φ (z_{ρ})}

is the mean-ρ ‘Cayley’ distribution of the typical box occupancy in the thermodynamic limit

n, k \to \infty

,

n / k = ρ .

2.9. The Case $d = 1$

Combinatorial linear BGW trees are those for which

T (z) = z (1 + T (z))

,

T (0) = 0

yielding

T (z) = z / (1 - z)

, with

c_{n} = [z^{n}] T (z) = 1

. The branching number of a node is either zero or one, leading to ‘threadlike trees’. We have

c_{n} = n^{- 1} [z^{n - 1}] {(1 + z)}^{n} = 1 .

Furthermore,

c_{n, k} = [z^{n}] T {(z)}^{k} = [z^{n - k}] {(1 - z)}^{- k}

= (\binom{n - 1}{k - 1})

, the number of (ordered) compositions of n into k parts. Note that by the Lagrange inversion formula,

c_{n, k} = \frac{k}{n} [z^{n - k}] {(1 + z)}^{n} .

Random linear BGW trees are those for which

Φ (z) = z (q + p Φ (z))

,

Φ (0) = 0

yielding

Φ (z) = q z / (1 - p z)

, with

[z^{n}] Φ (z) = c_{n} w (τ_{n})

and

w (τ_{n}) = q p^{n - 1}

, the weight of each such size-n tree. The extinction probability of this model is

ρ_{e} = 1

(a subcritical regime).

Furthermore,

P (K_{n} = k) = [z^{n}] Φ {(z)}^{k} = (\binom{n - 1}{k - 1}) q^{k} p^{n - k}

,

k = 1, \dots, n .

In a size-n forest with

K_{n}

trees, the law of the number of leaves coincides with the one of

K_{n}

as a result of any threadlike tree possessing a single leaf.

In the weak case, with

Φ_{0} (z) = q / (1 - p z)

,

P (K_{n} = k) = [z^{n}] Φ_{0} {(z)}^{k} = (\binom{n + k - 1}{k - 1}) q^{k} p^{n}, k \geq 1

where

(\binom{n + k - 1}{k - 1})

is the number of weak compositions of n into k parts.

3. Increasing (or Recursive) $d$-Trees as Phylogenetic Trees

A size-n rooted and increasing labeled tree has vertices with indices or labels

\{1, \dots, n\}

increasing for any path from the root to its leaves. Wherever a new connection is created in this tree, the adjunction of a new node with index

n + 1

will result in a size-(

n + 1

) rooted increasing tree. Increasing trees can in addition be unordered (Cayley) or ordered. The combinatorial version of such trees was studied by [24].

Let

T (z)

solve

T^{'} (z) = {(1 + T (z))}^{d}

with

T (0) = 0,

and hence with solution

T (z) = {(1 - (d - 1) z)}^{- 1 / (d - 1)} - 1,

(44)

with for

n \geq 1

C_{n} : = n! [z^{n}] T (z) = {[1 / (d - 1)]}_{n} {(d - 1)}^{n} = \prod_{m = 0}^{n - 1} (1 + (d - 1) m) = : {[1 : d - 1]}_{n},

where

{[a]}_{n} = a (a + 1) \dots (a + n - 1)

.

C_{n}

counts the number of labeled binomial increasing trees. The convergence radius of

T (z)

is

r_{c} = 1 / (d - 1) \leq 1 .
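The product formula for C_n can be checked against a direct power-series integration of T′(z) = (1 + T(z))^d. The following sketch (the function name is ours) computes the coefficients of T one at a time with exact rationals and returns C_n = n! [z^n] T(z).

```python
from fractions import Fraction
from math import factorial, prod

def increasing_tree_counts(d, n_max):
    """[C_1, ..., C_n_max] obtained by integrating T' = (1 + T)^d term by term."""
    t = [Fraction(0)]  # t[m] = [z^m] T(z), with T(0) = 0
    for m in range(n_max):
        base = [Fraction(1)] + t[1:]  # coefficients of 1 + T up to z^m
        power = [Fraction(1)]
        for _ in range(d):  # (1 + T)^d truncated at z^m
            power = [
                sum(power[i] * base[j - i] for i in range(len(power)) if 0 <= j - i < len(base))
                for j in range(m + 1)
            ]
        t.append(Fraction(power[m]) / (m + 1))  # [z^(m+1)] T = [z^m] T' / (m+1)
    return [int(t[n] * factorial(n)) for n in range(1, n_max + 1)]

d = 3
counts = increasing_tree_counts(d, 6)
products = [prod(1 + (d - 1) * m for m in range(n)) for n in range(1, 7)]
print(counts, products)
```

For d = 2 the same routine returns n!, the number of binary increasing trees in this positional setting.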

Such increasing trees serve as models for phylogenetic trees in which nodes represent species with labels encoding their order of appearance in the tree, and thus the chronology of evolution. The leaves of the tree are the currently living species that can mutate to a new species; the internal nodes are the ones that can generate a new species (in the d-ary tree context, only nodes that are not at saturation with d offspring have this ability); the different trees of a forest represent genera.

3.1. Random Binomial Increasing Trees

The g.f. of random binomial rooted increasing trees solves the ordinary differential equation

Φ^{'} (z) = ϕ (Φ (z))

with

ϕ (z) = {(q + p z)}^{d}

and

Φ (0) = 0

, and hence

Φ (z) = \frac{1}{p} [{(q^{- (d - 1)} - p (d - 1) z)}^{- 1 / (d - 1)} - q]

= \frac{q}{p} [{(1 - p q^{d - 1} (d - 1) z)}^{- 1 / (d - 1)} - 1],

(45)

with

z_{c} = sup (z > 0 : Φ (z) < \infty) = 1 / [p q^{d - 1} (d - 1)] > 1

and

Φ (z_{c}) = \infty

.

As a result,

Φ (z)

has an algebraic singularity of order

1 / (d - 1)

at

z_{c} .

Note that

Φ (1) = P (\bar{N} (1) < \infty) = : r_{e} = \frac{q}{p} [{(1 - 1 / z_{c})}^{- 1 / (d - 1)} - 1] < 1 .

(46)

Whatever the values of p and d, there is a positive probability that

\bar{N} (1) = \infty

, and there is no phase transition to subcriticality with almost sure extinction for the increasing version of a d-tree. For each

d \geq 2

, the convergence radius

z_{c}

is a convex function of

p,

attaining its minimum value

{(d / (d - 1))}^{d} > 1

when

p = 1 / d .

With

{[a]}_{n} = a (a + 1) \dots (a + n - 1)

, we obtain

P (\bar{N} (1) = n) : = [z^{n}] Φ (z) = \frac{q}{p} \frac{{[1 / (d - 1)]}_{n}}{n!} z_{c}^{- n}, n \geq 1,

(47)

with

P (\bar{N} (1) = n) \sim \frac{q}{p} \frac{n^{- (d - 2) / (d - 1)}}{Γ (1 / (d - 1))} z_{c}^{- n}

, as

n \to \infty

, with geometric decay and presenting an algebraic prefactor, the exponent of which (

θ : = (d - 2) / (d - 1) \in [0, 1)

) now depends on

d .

Note that, as required, with

w (τ_{n}) = \prod_{b = 0}^{d} w_{b}^{n_{b} (τ_{n})} = \frac{q}{p} {(p q^{d - 1})}^{n}

, the weight of each tree

τ_{n}

,

P (\bar{N} (1) = n) = \frac{C_{n}}{n!} w (τ_{n}),

is a separable form.
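The defective law (47) and its total mass (46) can be compared numerically. In the sketch below (function names are ours), the masses are generated from the ratio of consecutive terms, (1/(d−1) + n)/(n + 1) · z_c^{−1}, and their sum is compared with r_e = Φ(1).

```python
def increasing_tree_pmf(p, d, n_max):
    """P(N(1) = n) for n = 1..n_max, from (47) via ratios of consecutive terms."""
    q = 1 - p
    a = 1 / (d - 1)
    x = p * q ** (d - 1) * (d - 1)  # x = 1 / z_c
    probs = []
    term = (q / p) * a * x          # n = 1 term, equal to q**d
    for n in range(1, n_max + 1):
        probs.append(term)
        term *= (a + n) / (n + 1) * x
    return probs

def finite_size_probability(p, d):
    """r_e = Phi(1) from (46)."""
    q = 1 - p
    x = p * q ** (d - 1) * (d - 1)
    return (q / p) * ((1 - x) ** (-1 / (d - 1)) - 1)

p, d = 0.4, 3
probs = increasing_tree_pmf(p, d, 400)
print(probs[0], (1 - p) ** d)                     # first mass equals pi_0 = q^d
print(sum(probs), finite_size_probability(p, d))  # total mass r_e < 1
```

The missing mass 1 − r_e is the probability that the tree is infinite.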

Remark 4 (weighted versions of increasing trees).

With

a_{1}, a_{2} > 0

, consider the weight sequence

w_{b} = a_{1}^{b} a_{2}

for which

\prod_{b = 0}^{d} w_{b}^{n_{b} (τ_{n})} = a_{1}^{n - 1} a_{2}^{n}

. Then, as for simple trees,

P (\bar{N} (1) = n) \to P (\bar{N} (1) = n) a_{1}^{n - 1} a_{2}^{n}

is the weighted version of

P (\bar{N} (1) = n)

. Equivalently,

Φ (z) \to \tilde{Φ} (z) = a_{1}^{- 1} Φ (a_{1} a_{2} z),

(48)

now solving

{\tilde{Φ}}^{'} (z) = \tilde{ϕ} (\tilde{Φ} (z))

, where

\tilde{ϕ} (z) = a_{2} ϕ (a_{1} z)

is the modified ‘branching mechanism’, not necessarily a p.g.f. It is a p.g.f. when

a_{2} = 1 / ϕ (a_{1}) .

It is very unrealistic that any evolutionary process would lead to a configuration with infinitely many species. This forces one to consider the binomial increasing branching process conditioned on extinction, so with a modified binomial branching mechanism

ϕ (z) \to ϕ (r_{e} z) / r_{e}

(but here, not a p.g.f.). The p.g.f. of its total progeny, say

\bar{N} {(1)}^{*}

, then reads

Φ^{*} (z) : = E (z^{\bar{N} {(1)}^{*}}) = \frac{Φ (z)}{Φ (1)} = \frac{{(1 - z / z_{c})}^{- 1 / (d - 1)} - 1}{{(1 - 1 / z_{c})}^{- 1 / (d - 1)} - 1} .

Note that

E (\bar{N} {(1)}^{*}) = Φ^{'} (1) / Φ (1) = ϕ (r_{e}) / r_{e} > 1

, observing

r_{e} < ρ_{e}

where

ρ_{e}

is the smallest solution to

ϕ (ρ_{e}) = ρ_{e}

(the extinction probability of the simple BGW d-ary tree).

The Lagrange inversion formula version for increasing trees states that for all

n \geq 1

,

[z^{n}] h (Φ (z)) = \frac{1}{n} [z^{- 1}] h^{'} (z) P {(z)}^{- n} .

(49)

where

P (z) : = \int_{0}^{z} d z^{'} / ϕ (z^{'}) .

Note that, because

P (0) = 0

, with

R (z) = z / P (z)

(obeying

R (0) = π_{0} > 0

if

π_{0} = ϕ (0) > 0

), this is also

[z^{n}] h (Φ (z)) = \frac{1}{n} [z^{n - 1}] h^{'} (z) R {(z)}^{n} .

In particular, (

h (z) = z

) and

ϕ (z) = {(q + p z)}^{d}

,

P (z) = \frac{1}{p (d - 1)} [q^{1 - d} - {(q + p z)}^{1 - d}]

, with (47),

[z^{n}] Φ (z) = \frac{1}{n} [z^{- 1}] P {(z)}^{- n} = \frac{1}{n} [z^{n - 1}] R {(z)}^{n} = \frac{q}{p} \frac{{[1 / (d - 1)]}_{n}}{n!} z_{c}^{- n} .

And with

h (z) = z^{k}

,

P (\bar{N} (k) = n) = [z^{n}] Φ {(z)}^{k},

\begin{matrix} P (\bar{N} (k) = n) & = & \frac{k}{n} [z^{- 1}] z^{k - 1} P {(z)}^{- n} = \frac{k}{n} [z^{n - k}] R {(z)}^{n} \\ = & \frac{1}{n} {(\frac{q}{p})}^{k} \frac{{[k / (d - 1)]}_{n - k + 1}}{(n - k)!} z_{c}^{- n}, n \geq k . \end{matrix}

If

d = 2 :

P (z) = \frac{1}{p} [q^{- 1} - {(q + p z)}^{- 1}]

and

R (z) = q (q + p z)

, yielding

P (\bar{N} (k) = n) = \frac{k q^{n}}{n} [z^{n - k}] {(q + p z)}^{n} = {(\frac{q}{p})}^{k} (\binom{n - 1}{k - 1}) {(p q)}^{n} .

It involves the number of compositions of n into k parts as a factor. Using the Stirling formula, with

ρ > 1

, as

k \to \infty,

- \frac{1}{k} log P (\frac{\bar{N} (k)}{k} \to ρ) \to f_{2} (ρ),

(50)

with

f_{2} (ρ) = - ρ log (p q ρ) + (ρ - 1) log (ρ - 1) - log (\frac{q}{p}) .

We have

f_{2}^{'} (ρ) = - log (\frac{ρ p q}{ρ - 1})

, vanishing at

ρ_{*} = 1 / (1 - p q) > 1

with

f_{2} (ρ_{*}) = - log (\frac{q^{2}}{1 - p q}) > 0

.
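For d = 2 these statements can be checked numerically. The sketch below (function names are ours) sums the forest law over n, which should give r_e^k with r_e = q²/(1 − p q) from (46), and compares an exact finite-k rate with f_2(ρ).

```python
from math import comb, lgamma, log

def binary_forest_pmf(n, k, p):
    """P(N(k) = n) = (q/p)^k C(n-1, k-1) (p q)^n for increasing binary forests."""
    q = 1 - p
    return (q / p) ** k * comb(n - 1, k - 1) * (p * q) ** n

def f2(rho, p):
    """Rate function of (50) for d = 2, rho > 1."""
    q = 1 - p
    return -rho * log(p * q * rho) + (rho - 1) * log(rho - 1) - log(q / p)

def empirical_rate(rho, k, p):
    """-(1/k) log P(N(k) = round(rho k)), computed through log-gamma."""
    q = 1 - p
    n = round(rho * k)
    log_binom = lgamma(n) - lgamma(k) - lgamma(n - k + 1)  # log C(n-1, k-1)
    return -(k * log(q / p) + log_binom + n * log(p * q)) / k

p, k = 0.4, 3
q = 1 - p
print(sum(binary_forest_pmf(n, k, p) for n in range(k, 400)), (q * q / (1 - p * q)) ** k)
print(empirical_rate(2.0, 200_000, p), f2(2.0, p))
rho_star = 1 / (1 - p * q)
print(f2(rho_star, p), -log(q * q / (1 - p * q)))
```

The last line evaluates f_2 at its minimizer ρ_* and compares it with −log(q²/(1 − p q)).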

Note that, while considering instead the weak model,

\begin{matrix} P (\bar{N} (k) = n) & = & [z^{n}] Φ_{0} {(z)}^{k} = q^{k} [z^{n}] {(1 - z / z_{c})}^{- k / (d - 1)} \\ = & q^{k} \frac{{[k / (d - 1)]}_{n}}{n!} z_{c}^{- n}, n \geq 0 . \end{matrix}

When

d = 2

,

P (\bar{N} (k) = n) = q^{k} (\binom{n + k - 1}{k - 1}) z_{c}^{- n}

, involving the number of weak compositions of n into k parts as a factor.

Remark 5.

The Lagrange inversion formula adapted to forests of distinguishable increasing trees reads

[z^{n}] K (z, v) = \frac{v}{n} [z^{n - 1}] \frac{R {(z)}^{n}}{{(1 - v z)}^{2}},

(51)

where

K (z, v) = \frac{1}{1 - v Φ (z)} .

The p.g.f. of the number

K_{n}

of increasing trees of a size-n forest is

E (v^{K_{n}}) = [z^{n}] K (z, v)

, with

P (K_{n} = k) = [v^{k} z^{n}] K (z, v) .

Should the trees be indistinguishable, the same formulae hold but now with

K (z, v) = e^{v Φ (z)}

, so

[z^{n}] K (z, v) = \frac{v}{n} [z^{n - 1}] e^{v z} R {(z)}^{n} .

(52)

3.2. Distribution of the Number of Leaves

The joint generating function of nodes and leaves solves

\partial_{z} Φ (z, u_{0}) = π_{0} (u_{0} - 1) + {(q + p Φ (z, u_{0}))}^{d} .

Hence,

z = \int_{0}^{Φ (z, u_{0})} \frac{d z^{'}}{π_{0} (u_{0} - 1) + {(q + p z^{'})}^{d}},

with no known explicit solution for

Φ (z, u_{0})

in general. With

n_{0} = 1, \dots, n - 1

, we have

P (\bar{N} (1) = n, {\bar{N}}_{0} (1) = n_{0}) : = [z^{n} u_{0}^{n_{0}}] Φ (z, u_{0}),

(53)

with no known explicit solution in general.

Consider indeed the integral

P (z, u_{0}) = \int_{0}^{z} \frac{d z^{'}}{π_{0} (u_{0} - 1) + {(q + p z^{'})}^{d}},

of which

Φ (z, u_{0})

is the z-inverse:

P (Φ (z, u_{0}), u_{0}) = z

.

With

y_{b} = e^{2 i π b / d},

b = 0, \dots, d - 1,

the d-th roots of unity, we have

\begin{matrix} P (z, u_{0}) & = & {[π_{0} (1 - u_{0})]}^{1 / d - 1} \int_{q {[π_{0} (1 - u_{0})]}^{- 1 / d}}^{(q + p z) {[π_{0} (1 - u_{0})]}^{- 1 / d}} \frac{d y}{- 1 + y^{d}} \\ & = & {[π_{0} (1 - u_{0})]}^{1 / d - 1} \int_{q {[π_{0} (1 - u_{0})]}^{- 1 / d}}^{(q + p z) {[π_{0} (1 - u_{0})]}^{- 1 / d}} \frac{d y}{\prod_{b = 0}^{d - 1} (y - y_{b})} . \end{matrix}

A fraction decomposition into simple elements of the integrand yields in principle the expression of

P (z, u_{0}) :

P (z, u_{0}) = {[π_{0} (1 - u_{0})]}^{1 / d - 1} \sum_{b = 0}^{d - 1} A_{b}^{- 1} log \frac{(q + p z) {[π_{0} (1 - u_{0})]}^{- 1 / d} - y_{b}}{q {[π_{0} (1 - u_{0})]}^{- 1 / d} - y_{b}},

where

A_{b} = \prod_{b^{'} \neq b} (y_{b^{'}} - y_{b})

. The dominant root is

y_{0} = 1,

so with

A_{0} = \prod_{b = 1}^{d - 1} (1 - y_{b}),

P (z, u_{0}) \sim A_{0}^{- 1} {[π_{0} (1 - u_{0})]}^{1 / d - 1} log \frac{(q + p z) {[π_{0} (1 - u_{0})]}^{- 1 / d} - 1}{q {[π_{0} (1 - u_{0})]}^{- 1 / d} - 1},

with dominant inverse

Φ (z, u_{0}) \sim \frac{(q {[π_{0} (1 - u_{0})]}^{- 1 / d} - 1) (e^{A_{0} {[π_{0} (1 - u_{0})]}^{1 - 1 / d} z} - 1)}{p {[π_{0} (1 - u_{0})]}^{- 1 / d}} .

Alternatively, with

B (a, b; x)

the incomplete beta function, the primitive of

1 / (1 - y^{d})

is

C + \sum_{k \geq 0} \frac{1}{k d + 1} y^{k d + 1} = : C + \frac{1}{d} B (\frac{1}{d}, 0; y^{d})

as a generalized logarithm. This gives an alternative expression of

P (z, u_{0})

. However, the computation of

Φ (z, u_{0})

obeying

P (Φ (z, u_{0}), u_{0}) = z

would require the inverse function of

P (z, u_{0})

which, to the author’s knowledge, has no known expression in terms of special functions.

Example 2 (the binary case and the tree of life).

When

d = 2

, upon decomposing the rational fraction

\frac{1}{q^{2} (u_{0} - 1) + {(q + p z^{'})}^{2}}

into simple elements and integrating,

P (z, u_{0}) = \frac{1}{2 p q \sqrt{1 - u_{0}}} log [\frac{(p z + q - q \sqrt{1 - u_{0}}) (1 + \sqrt{1 - u_{0}})}{(p z + q + q \sqrt{1 - u_{0}}) (1 - \sqrt{1 - u_{0}})}] .

The inverse

Φ (z, u_{0})

obeying

P (Φ (z, u_{0}), u_{0}) = z

reads

Φ (z, u_{0}) = \frac{q}{p} \frac{u_{0} (e^{2 p q z \sqrt{1 - u_{0}}} - 1)}{1 + \sqrt{1 - u_{0}} - (1 - \sqrt{1 - u_{0}}) e^{2 p q z \sqrt{1 - u_{0}}}} .

When

u_{0} \to 1^{-}

, both the numerator and denominator tend to 0, with, to the first order in

u_{0} - 1,

Φ (z, u_{0}) \to Φ (z) = \frac{q^{2} z}{1 - p q z} .

This is (45) when

d = 2

as required.
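The closed form of Φ(z, u_0) for d = 2 can be sanity-checked numerically. The sketch below (function names are ours) evaluates the expression for 0 ≤ u_0 < 1, compares a central finite difference in z with the right-hand side q²(u_0 − 1) + (q + p Φ)² of the defining equation, and evaluates the limit u_0 → 1⁻. The denominator is rewritten as 2s − (1 − s)(e^{2pqzs} − 1), with s = √(1 − u_0), which is algebraically the same and numerically more stable.

```python
from math import expm1, sqrt

def phi_joint(z, u0, p):
    """Closed form of Phi(z, u0) for d = 2 and 0 <= u0 < 1."""
    q = 1 - p
    s = sqrt(1 - u0)
    g = expm1(2 * p * q * z * s)  # e^{2 p q z s} - 1
    return (q / p) * u0 * g / (2 * s - (1 - s) * g)

def ode_residual(z, u0, p, h=1e-5):
    """d/dz Phi minus q^2 (u0 - 1) - (q + p Phi)^2, by central differences."""
    q = 1 - p
    derivative = (phi_joint(z + h, u0, p) - phi_joint(z - h, u0, p)) / (2 * h)
    rhs = q * q * (u0 - 1) + (q + p * phi_joint(z, u0, p)) ** 2
    return derivative - rhs

p, z = 0.3, 0.7
q = 1 - p
print(ode_residual(z, 0.5, p))                                  # close to zero
print(phi_joint(z, 1 - 1e-8, p), q * q * z / (1 - p * q * z))   # u0 -> 1 limit
```

The check is only meaningful for z below the first singularity of Φ(·, u_0), which is the case for the values used here.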

Even in this explicit expression case for

Φ (z, u_{0})

,

P (\bar{N} (1) = n, {\bar{N}}_{0} (1) = n_{0}) : = [z^{n} u_{0}^{n_{0}}] Φ (z, u_{0})

has no simple expression. We now give an alternative path to obtain

P (\bar{N} (1) = n, {\bar{N}}_{0} (1) = n_{0})

, exploiting the recursive nature of binomial increasing trees.

A recurrence in special cases. Consider increasing branching random trees whose p.g.f.

Φ (z)

solves

Φ^{'} (z) = ϕ (Φ (z))

where

ϕ (z) = \sum_{b \geq 0} π_{b} z^{b} .

Consider the cases

ϕ (z) = {(\frac{q}{1 - p z})}^{θ}

,

θ > 0

,

ϕ (z) = e^{- μ (1 - z)},

μ > 0

or

ϕ (z) = {(q + p z)}^{d}

,

d \geq 2

integer (negative binomial, Poisson, or binomial).

Note that the convergence radius of

Φ (z)

is

z_{c} = \int_{0}^{z_{*}} \frac{d z^{'}}{ϕ (z^{'})},

(54)

where

z_{*} = inf (z > 0 : ϕ (z) = \infty)

is the convergence radius of

ϕ (z)

.

In these three particular cases of

ϕ

, the formation of the tree admits the following recursive tree evolution scheme (label 1 is assigned to the root).

With probability

p_{b} (n) : = Z_{n}^{- 1} (b + 1) π_{b + 1} / π_{b},

(55)

attach uniformly node

n + 1

to any of the

{\bar{N}}_{b} (τ_{n})

nodes with outdegree

b \in \{0, \dots, b^{*}\}

of a previous size-n increasing tree

τ_{n}

(

b^{*} \leq (n - 1) \land d

). The normalization constant is

Z_{n} = \sum_{b = 0}^{n - 1} {\bar{N}}_{b} (τ_{n}) (b + 1) π_{b + 1} / π_{b}

, representing the “number” of ways the new atom with label

n + 1

can be inserted in

τ_{n}

. This preferential attachment procedure results in a realization of

τ_{n + 1}

(see [25]).

With

(B_{b} ({\bar{N}}_{b} (τ_{n})), b \in \{0, \dots, b^{*}\})

mutually exclusive Bernoulli random variables (summing to 1), each with success probability

{\bar{N}}_{b} (τ_{n}) p_{b} (n)

, for each

n \geq 1

, we then have

\begin{cases} {\bar{N}}_{0} (τ_{n + 1}) = {\bar{N}}_{0} (τ_{n}) + 1 - B_{0} ({\bar{N}}_{0} (τ_{n})), \\ {\bar{N}}_{b} (τ_{n + 1}) = {\bar{N}}_{b} (τ_{n}) + B_{b - 1} ({\bar{N}}_{b - 1} (τ_{n})) - B_{b} ({\bar{N}}_{b} (τ_{n})), \quad b \in \{1, \dots, b^{*}\}, \\ {\bar{N}}_{b^{*} + 1} (τ_{n + 1}) = 0 + B_{b^{*}} ({\bar{N}}_{b^{*}} (τ_{n})) . \end{cases}

(56)

Whenever the new node attaches to a node with outdegree b, the number of nodes with outdegree b decreases by one unit and the number of nodes with outdegree

b + 1

increases by one unit. In addition, a new node with outdegree 0 is always created, whatever the outdegree of the node of

τ_{n}

to which the incoming atom connects.

For the three particular ϕ-models generated by the

ϕ

’s above, using

\sum_{b = 0}^{n - 1} n_{b} (τ_{n}) = n

and

\sum_{b = 1}^{n - 1} b n_{b} (τ_{n}) = n - 1

for any

τ_{n}

leading to an expression of

Z_{n}

, we obtain

p_{b} (n) = \frac{θ + b}{n (θ + 1) - 1}, \frac{1}{n}, \frac{d - b}{1 + n (d - 1)},

(57)

respectively, depending only on

(b, n)

and not on the full weight sequence

(π_{b}; b = 0, \dots, b^{*})

. In the first two examples,

b \in \{0, \dots, b^{*} = n - 1\}

, while

b \in \{0, \dots, b^{*} = (n - 1) \land d\}

in the third d-ary labeled binomial trees case.
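As a sanity check of (57), the attachment probabilities can be computed directly from the definition (55) on a small example. The sketch below is illustrative only: the helper `attachment_probs`, the outdegree profile `counts` (a hypothetical size-5 tree with two leaves, two nodes of outdegree 1 and one node of outdegree 2) and the parameter values are assumptions, not taken from the paper.

```python
from fractions import Fraction

def attachment_probs(counts, ratio):
    """Return [p_b(n)] from (55).

    counts[b] is the number of nodes with outdegree b in the current tree;
    ratio(b) must return (b + 1) * pi_{b+1} / pi_b for the chosen model.
    """
    Z = sum(Fraction(c) * ratio(b) for b, c in enumerate(counts))
    return [ratio(b) / Z for b in range(len(counts))]

# Hypothetical size-5 tree: sum(counts) = n and sum(b * counts[b]) = n - 1.
n, counts = 5, [2, 2, 1]
d, theta = 3, Fraction(2)
p = Fraction(1, 3)
q = 1 - p
mu = Fraction(3, 2)

# (b + 1) pi_{b+1} / pi_b for the three branching mechanisms.
binomial = attachment_probs(counts, lambda b: (d - b) * p / q)
poisson = attachment_probs(counts, lambda b: mu)
negbin = attachment_probs(counts, lambda b: (theta + b) * p)

print(binomial)   # compare with (d - b) / (1 + n (d - 1))
print(poisson)    # compare with 1 / n
print(negbin)     # compare with (theta + b) / (n (theta + 1) - 1)
```

Exact rational arithmetic is used so that the comparison with the closed forms is an equality rather than a floating-point approximation. In each case the values of p and μ cancel out, as stated in the text.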

From the first row of (56), the mean number of leaves grows linearly with n and is readily obtained to be, respectively, for

n \geq 2

:

E [{\bar{N}}_{0} (τ_{n})] = \frac{n (θ + 1) - 1}{2 θ + 1}, \frac{n}{2}, \frac{(d - 1) n + 1}{2 d - 1} .

(58)

The variance also grows proportionally to n, and a Central Limit Theorem can be shown to hold for

{\bar{N}}_{0} (τ_{n}) = {\bar{N}}_{n}^{0} (1)

. Note that

E [{\bar{N}}_{n}^{0} (1)]

is independent of p.

When

d = 2

in the binary case,

E [{\bar{N}}_{n}^{0} (1)] = (n + 1) / 3 .
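The mean in (58) can be recovered by iterating the first row of (56) exactly. The sketch below is an illustration under stated assumptions: the function name `leaf_distribution` is hypothetical, and the only inputs are the first row of (56) and the binomial value of p_0(n) from (57), so that a new node hits one of the n_0 current leaves with probability d n_0 / (1 + n (d - 1)).

```python
from fractions import Fraction

def leaf_distribution(n, d):
    """Exact law of the number of leaves of a size-n binomial increasing d-ary tree."""
    dist = {1: Fraction(1)}              # size 1: the root is the only leaf
    for m in range(1, n):                # grow the tree from size m to size m + 1
        new = {}
        for n0, prob in dist.items():
            hit_leaf = Fraction(d * n0, 1 + m * (d - 1))   # n0 * p_0(m)
            # New node attaches to a leaf: leaf count unchanged.
            new[n0] = new.get(n0, 0) + prob * hit_leaf
            # New node attaches to an internal node: one more leaf.
            new[n0 + 1] = new.get(n0 + 1, 0) + prob * (1 - hit_leaf)
        dist = {k: v for k, v in new.items() if v}
    return dist

for d in (2, 3):
    for n in (2, 3, 6):
        dist = leaf_distribution(n, d)
        mean = sum(k * v for k, v in dist.items())
        print(d, n, mean, Fraction((d - 1) * n + 1, 2 * d - 1))
```

For d = 2 and n = 3 this gives one leaf with probability 2/3 and two leaves with probability 1/3, hence a mean of 4/3 = (n + 1)/3.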

The first row of (56), giving the evolution of the number of leaves of a size-n tree, reads

P ({\bar{N}}_{n + 1}^{0} (1) = n_{0} + 1) = (1 - \frac{d n_{0}}{1 + n (d - 1)}) P ({\bar{N}}_{n}^{0} (1) = n_{0}) + \frac{d (n_{0} + 1)}{1 + n (d - 1)} P ({\bar{N}}_{n}^{0} (1) = n_{0} + 1),

since, by (57), node

n + 1

attaches to one of the

n_{0}

leaves with probability

n_{0} p_{0} (n) = \frac{d n_{0}}{1 + n (d - 1)}

(the number of leaves is then unchanged) and to an internal node otherwise (the number of leaves then moves from

n_{0}

to

n_{0} + 1

),

n_{0} \in \{1, \dots, n - 1\}

.

P ({\bar{N}}_{n}^{0} (1) = \cdot)

is a Markovian probability sequence. The initial conditions are

P ({\bar{N}}_{1}^{0} (1) = 1) = P ({\bar{N}}_{2}^{0} (1) = 1) = 1 .

Therefore, with

\begin{matrix} π_{m} (l - 1, l) & = & 1 - \frac{d (l - 1)}{1 + m (d - 1)} \\ π_{m} (l, l) & = & \frac{d l}{1 + m (d - 1)} \end{matrix}

the transition probabilities, for

n \geq 3

and

n_{0} < n

, the integrated distribution of

{\bar{N}}_{n}^{0} (1)

reads

P ({\bar{N}}_{n}^{0} (1) = n_{0}) = \sum_{m_{n_{0}}}^{*} \prod_{l = 1}^{n_{0}} π_{m_{l}} (l - 1, l) \prod_{l = 1}^{n_{0}} \prod_{m_{l - 1} < m < m_{l}} π_{m} (l, l),

(59)

where the star sum runs over the integers

m_{n_{0}} = (m_{l}; l = 1, \dots, n_{0})

obeying

m_{0} : = 1 < m_{1} < m_{2} < \dots < m_{n_{0}} \leq m_{n_{0} + 1} : = n

.

The latter explicit expression of

P ({\bar{N}}_{n}^{0} (1) = n_{0})

translates the fact that there are

n_{0}

unit moves up at points

m_{n_{0}}

with no other moves but those for this sequence. There are

(\binom{n - 1}{n_{0} - 1})

terms in this star sum (the number of strict compositions of n into

n_{0}

parts).

3.3. Increasing d-Partition of n into k Parts: Thermodynamic Limit

We have

- \frac{1}{k} log P (\frac{\bar{N} (k)}{k} \to ρ) \to f (ρ),

(60)

with

f (ρ)

the Legendre transform of

log Φ (z)

. With

\frac{z Φ^{'} (z)}{Φ (z)} = ρ \geq 1

always defining a unique

z = z_{ρ}

, we have

f (ρ) = ρ log z_{ρ} - log Φ (z_{ρ}),

(61)

where

z_{ρ} = \frac{(ρ - 1) (d - 1)}{1 + (ρ - 1) (d - 1)} z_{c},

Φ (z_{ρ}) = \frac{q}{p} [{(1 + (ρ - 1) (d - 1))}^{1 / (d - 1)} - 1] .

When

d = 2

,

z_{ρ} = \frac{1}{p q} \frac{ρ - 1}{ρ}

and

Φ (z_{ρ}) = \frac{q}{p} (ρ - 1) .

With

ρ \geq 1

, therefore,

\begin{matrix} f (ρ) & = & - log \frac{q}{p} - ρ log (p q) - ρ log ρ + (ρ - 1) log (ρ - 1) \\ f^{'} (ρ) & = & - log (p q) - log (\frac{ρ}{ρ - 1}) \\ f^{″} (ρ) & = & \frac{1}{ρ - 1} - \frac{1}{ρ} > 0 . \end{matrix}

f^{'} (ρ)

vanishes at

ρ_{*} = \frac{1}{1 - p q} > 1

, so that

\frac{\bar{N} (k)}{k} \to ρ_{*}

almost surely as

k \to \infty .
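The binary-case rate function can be checked numerically. The sketch below is illustrative: the function names and the value p = 0.4 are arbitrary choices. It verifies that f' vanishes at ρ_*, that f is minimal there, and that the minimum equals -log Φ(1) with Φ(1) = q^2 / (1 - p q), which follows from (61) because z_ρ = 1 at ρ = ρ_*.

```python
import math

def f(rho, p):
    """Rate function for d = 2, rho > 1."""
    q = 1.0 - p
    return (-math.log(q / p) - rho * math.log(p * q)
            - rho * math.log(rho) + (rho - 1.0) * math.log(rho - 1.0))

def f_prime(rho, p):
    """Derivative of the rate function for d = 2."""
    q = 1.0 - p
    return -math.log(p * q) - math.log(rho / (rho - 1.0))

p = 0.4                                  # illustrative value
q = 1.0 - p
rho_star = 1.0 / (1.0 - p * q)           # zero of f'
phi_at_1 = q * q / (1.0 - p * q)         # Phi(1) for d = 2

print(f_prime(rho_star, p))              # close to 0
print(f(rho_star, p), -math.log(phi_at_1))
```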

When dealing with the weak version

Φ_{0} (z) = q + p Φ (z)

of

Φ (z),

we have

[z^{n}] Φ_{0} {(z)}^{k} = q^{k} [z^{n}] [{(1 - p q^{d - 1} (d - 1) z)}^{- k / (d - 1)}] .

Hence, with

n = k ρ

(now with

ρ \geq 0

)

P (\frac{\bar{N} (k)}{k} \sim ρ) = q^{k} \frac{{[k / (d - 1)]}_{k ρ}}{(k ρ)!} z_{c}^{- k ρ} .

By the Stirling formula, in the thermodynamic limit

n, k \to \infty,

n / k = ρ \geq 1,

\begin{matrix} P (\frac{\bar{N} (k)}{k} \sim ρ) & \sim & \frac{1}{\sqrt{2 π}} q^{k} z_{c}^{- k ρ} \frac{{(1 + ρ (d - 1))}^{\frac{k}{d - 1} - \frac{1}{2}} {(k (\frac{1}{d - 1} + ρ - 1))}^{k ρ - \frac{1}{2}}}{{(k ρ)}^{k ρ + \frac{1}{2}}} \\ & \sim & \frac{1}{\sqrt{2 π}} q^{k} z_{c}^{- k ρ} {(1 + ρ (d - 1))}^{\frac{k}{d - 1}} {(\frac{1}{ρ (d - 1)} + 1)}^{k ρ}, \end{matrix}

so that

- \frac{1}{k} log P (\frac{\bar{N} (k)}{k} \to ρ) \to f (ρ),

with

f (ρ) = - log a (ρ)

and

a (ρ) = q z_{c}^{- ρ} {(\frac{1}{ρ (d - 1)} + 1)}^{ρ} {(1 + ρ (d - 1))}^{\frac{1}{d - 1}}

, the Legendre transform of

log Φ_{0} (z)

. With

\frac{z Φ_{0}^{'} (z)}{Φ_{0} (z)} = ρ \geq 0

always defining a unique

z = z_{ρ}

, we have

f (ρ) = ρ log z_{ρ} - log Φ_{0} (z_{ρ})

where

z_{ρ} = \frac{ρ (d - 1)}{1 + ρ (d - 1)} z_{c},

Φ_{0} (z_{ρ}) = q {(1 + ρ (d - 1))}^{1 / (d - 1)} .

The expression of (61) in the strict case is just a shifted version of the latter one in the weak case, with

ρ - 1

substituted for

ρ

there.

3.4. The Limiting Poisson Case ( $d \to \infty$ )

We here mention some related computations encompassing the limiting Poisson case.

Let

T (z)

solve

T^{'} (z) = e^{T (z)}

with

T (0) = 0,

and hence with solution

T (z) = - log (1 - z),

(62)

with, for

n \geq 1

C_{n} : = n! [z^{n}] T (z) = (n - 1)! .

C_{n}

counts the number of labeled Cayley increasing trees. The convergence radius of

T (z)

is

r_{c} = 1 .

The g.f. of random Poisson

(μ)

rooted increasing trees solves the ordinary differential equation

Φ^{'} (z) = ϕ (Φ (z))

with

ϕ (z) = e^{- μ (1 - z)}

and

Φ (0) = 0

. Hence,

Φ (z) = - \frac{1}{μ} log (1 - μ e^{- μ} z),

(63)

with convergence radius

z_{c} = e^{μ} / μ > 1

. Then,

P (\bar{N} (1) = n) = [z^{n}] Φ (z) = \frac{1}{n μ} z_{c}^{- n} .

With

w_{b} = μ^{b} e^{- μ}

the weight of a node with outdegree b, the weight of a size-n tree is

w (τ_{n}) = \prod_{b \geq 0} w_{b}^{n_{b}} = μ^{n - 1} e^{- n μ},

independent of the

n_{b}

’s. Hence,

P (\bar{N} (1) = n) = \frac{C_{n}}{n!} w (τ_{n})

, a separable form.

Furthermore, for k-forests of such increasing trees, with

P (z) = \int_{0}^{z} \frac{d z^{'}}{e^{- μ (1 - z^{'})}} = z_{c} (1 - e^{- μ z})

and

R (z) = z / P (z),

P (\bar{N} (k) = n) = \frac{k}{n} [z^{n - k}] R {(z)}^{n} .

With

z_{ρ}

obeying

\frac{z_{ρ} Φ^{'} (z_{ρ})}{Φ (z_{ρ})} = ρ \geq 1

, we have

Φ (z_{ρ}) = - \frac{1}{μ} log (1 - μ e^{- μ} z_{ρ})

and

\frac{Φ (z_{ρ} z)}{Φ (z_{ρ})} = \frac{- log (1 - μ e^{- μ} z_{ρ} z)}{- log (1 - μ e^{- μ} z_{ρ})}

is the p.g.f. of the mean-ρ logarithmic distribution of the typical box occupancy in the thermodynamic limit

n, k \to \infty

,

n / k = ρ .

With

π_{0} = e^{- μ}

, the joint generating function of the numbers of nodes and leaves solves

\partial_{z} Φ (z, u_{0}) = e^{- μ} (u_{0} - 1) + e^{- μ (1 - Φ (z, u_{0}))} .

Φ (z, u_{0})

is thus the inverse function of

P (z, u_{0}) = e^{μ} \int_{0}^{z} \frac{d z^{'}}{u_{0} - 1 + e^{μ z^{'}}} = \frac{e^{μ}}{μ (1 - u_{0})} log (\frac{1 - e^{- μ z} (1 - u_{0})}{u_{0}}),

so with

Φ (z, u_{0}) = \frac{1}{μ} log \frac{1 - u_{0}}{1 - u_{0} e^{μ e^{- μ} z (1 - u_{0})}} .

Note

Φ (z, u_{0}) = \frac{1}{μ} T (μ e^{- μ} z, u_{0})

where

T (z, u_{0}) = log \frac{1 - u_{0}}{1 - u_{0} e^{z (1 - u_{0})}}

solves

\partial_{z} T (z, u_{0}) = u_{0} - 1 + e^{T (z, u_{0})}, T (0, u_{0}) = 0 .

With

n_{0} = 1, \dots, n - 1

, we thus have

P ({\bar{N}}_{n}^{0} (1) = n_{0}) = \frac{[z^{n} u_{0}^{n_{0}}] Φ (z, u_{0})}{[z^{n}] Φ (z, 1)},

(64)

with no obvious closed form. However, from the attachment probabilities

p_{b} (n) = 1 / n

of the Poisson case in (57), this probability sequence is Markovian with

P ({\bar{N}}_{n + 1}^{0} (1) = n_{0} + 1) = (1 - \frac{n_{0}}{n}) P ({\bar{N}}_{n}^{0} (1) = n_{0}) + \frac{n_{0} + 1}{n} P ({\bar{N}}_{n}^{0} (1) = n_{0} + 1) .

Hence, in agreement with [26], for

n \geq 2,

P ({\bar{N}}_{n}^{0} (1) = n_{0}) = \frac{E_{n, n_{0}}}{(n - 1)!}, for each n_{0} \in \{1, \dots, n - 1\},

where

E_{n, n_{0}}

are the shifted first-kind Eulerian numbers.

{\bar{N}}_{n}^{0} (1)

has mean

n / 2

and variance

n / 12 :

a CLT holds. Eulerian numbers

E_{n + 1, n_{0} + 1}

count the number of permutations of

[n]

with

n_{0}

ascents. Recalling

E_{n, n_{0}} = E_{n, n - n_{0}}

,

n - {\bar{N}}_{n}^{0} (1) \overset{d}{=} {\bar{N}}_{n}^{0} (1)

, where

n - {\bar{N}}_{n}^{0} (1)

is the number of internal nodes of the size-n tree.
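The Eulerian law of the number of leaves can be checked by brute force for small n. The sketch below is illustrative and rests on one assumption spelled out here: a size-n increasing tree of this section is encoded as a recursive tree in which node j (for j = 2, ..., n) chooses its parent among 1, ..., j - 1, all (n - 1)! choices being equally likely. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def leaf_counts(n):
    """counts[k] = number of recursive trees of size n with k leaves (enumeration)."""
    counts = [0] * (n + 1)
    # Node j = 2..n picks a parent in 1..j-1; a node is a leaf iff it is nobody's parent.
    for parents in product(*[range(1, j) for j in range(2, n + 1)]):
        counts[n - len(set(parents))] += 1
    return counts

def eulerian_row(m):
    """[A(m, k) for k = 0..m-1]: permutations of [m] with k ascents."""
    row = [1]
    for size in range(2, m + 1):
        # A(size, k) = (k + 1) A(size - 1, k) + (size - k) A(size - 1, k - 1)
        row = [(k + 1) * (row[k] if k < len(row) else 0)
               + (size - k) * (row[k - 1] if k >= 1 else 0)
               for k in range(size)]
    return row

n = 6
counts = leaf_counts(n)
total = factorial(n - 1)
mean = Fraction(sum(k * c for k, c in enumerate(counts)), total)
var = Fraction(sum(k * k * c for k, c in enumerate(counts)), total) - mean ** 2
print(counts[1:n], eulerian_row(n - 1))   # shifted Eulerian numbers E_{n, n0}
print(mean, var)                          # n / 2 and n / 12
```

The enumeration is exponential in n and only meant for small sizes. The variance n/12 holds from n = 3 onward, since a size-2 tree always has exactly one leaf.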

3.5. The Boundary Case $d = 1$

Combinatorial linear increasing trees are those for which

T^{'} (z) = 1 + T (z)

,

T (0) = 0

yielding

T (z) = e^{z} - 1,

with

C_{n} = n! [z^{n}] T (z) = 1

(only one such tree). The branching number of a node is either zero or one, leading to threadlike trees.

We have

P (z) = log (1 + z)

,

R (z) = z / log (1 + z)

,

C_{n} = n! n^{- 1} [z^{n - 1}] R {(z)}^{n} = 1

(an identity). Furthermore, by the Lagrange inversion formula,

C_{n, k} = n! [z^{n}] T {(z)}^{k} = n! \frac{k}{n} [z^{n - k}] R {(z)}^{n} = k! S_{n, k}

(

S_{n, k}

, the second-kind Stirling numbers).
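The identity C_{n,k} = k! S_{n,k} can be verified by expanding the powers of T(z) = e^z - 1 as truncated power series. This is a minimal sketch with hypothetical helper names, using exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def stirling2(n, k):
    """Second-kind Stirling number S(n, k) via the triangular recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def coeff_T_power(n, k):
    """[z^n] (e^z - 1)^k, computed by repeated truncated series multiplication."""
    T = [Fraction(0)] + [Fraction(1, factorial(j)) for j in range(1, n + 1)]
    power = [Fraction(1)] + [Fraction(0)] * n       # the constant series 1
    for _ in range(k):
        power = [sum(power[i] * T[j - i] for i in range(j + 1))
                 for j in range(n + 1)]
    return power[n]

n, k = 6, 3
C_nk = factorial(n) * coeff_T_power(n, k)
print(C_nk, factorial(k) * stirling2(n, k))
```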

Random linear increasing trees are those for which

Φ^{'} (z) = q + p Φ (z)

,

Φ (0) = 0

yielding

Φ (z) = \frac{q}{p} (e^{p z} - 1),

with

[z^{n}] Φ (z) = \frac{1}{n!} w (τ_{n})

and

w (τ_{n}) = q p^{n - 1}

the common weight of all such size-n trees. Note that

r_{e} = Φ (1) = \frac{q}{p} (e^{p} - 1) < 1

is the extinction probability. Furthermore,

P (K_{n} = k) = [z^{n}] Φ {(z)}^{k} = \frac{k!}{n!} S_{n, k} q^{k} p^{n - k}

,

k = 1, \dots, n .

In a size-n forest with

K_{n}

distinguishable trees, the law of the total number of leaves coincides with the one of

K_{n}

as a result of any such threadlike tree possessing a single leaf.

4. Concluding Remarks

The sizes of the random progenies of both simple (BGW) and increasing trees and forests generated by the d-binomial branching mechanism (

d \in N : = \{1, 2, \dots\}

) are shown to be amenable to weighted combinatorial trees in the sense of Meir and Moon [14]. We exploit this fact to analyze the structural aspects of these, such as the number of leaves in a size-n tree, the number of trees with given outdegree sequences, the number of trees in a size-n forest, the number of atoms in a k-forest, or the joint and marginal sizes of trees in a size-n forest with k trees. We derive asymptotic results when

n \to \infty

,

k \to \infty,

separately, or when n,

k \to \infty

, jointly, while

n / k \to ρ > 0 .

We conclude by stressing that an alternative randomization to counting combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones, is of great interest, as it leads to different and very rich behaviors, for example, concerning the number of trees in a size-n forest.

Both randomization approaches rest on the analysis of generating functions and can sometimes take advantage of the Lagrange inversion formula.

Funding

This research received no external funding.

Data Availability Statement

There are no data associated with this paper.

Acknowledgments

T. Huillet acknowledges partial support from the “Chaire Modélisation mathématique et biodiversité” of Veolia-Ecole Polytechnique-MNHN-FondationX and support from the labex MME-DII Center of Excellence (Modèles mathématiques et économiques de la dynamique, de l’incertitude et des interactions, ANR-11-LABX-0023-01 project). This work was also funded by CY Initiative of Excellence (grant “Investissements d’Avenir” ANR-16-IDEX-0008), Project “EcoDep” PSI-AAP2020-0000000013.

Conflicts of Interest

The author has no conflicts of interest associated with this paper.

Appendix A

So far, random trees and forests have been constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing. We dealt with the binomial offspring distribution as an important representative of the ones with bounded support and having all its moments.

We here briefly consider a different approach to the randomization of combinatorial trees and forests, namely the one arising from the ratio of favorable outcomes to the number of possible ones. In this context, we emphasize the role of the Lagrange inversion formula for some additional branching models, not necessarily related to the binomial case.

To this end, we first observe that the probability law of

\bar{N} (1),

the number of nodes in a combinatorial tree, with

C_{n} = n! [z^{n}] T (z)

the (non-negative integer) number of trees of size n derived from the g.f.

T (z)

, is given by the tilting

P (\bar{N} (1) = n) = \frac{z^{n} C_{n} / n!}{T (z)},

for any

z < ρ_{c} (z \leq ρ_{c})

, depending on

T (ρ_{c}) = \infty

(

T (ρ_{c}) < \infty

). Tilting is necessary in the ratio of favorable cases to possible cases randomization because of the divergence of the series

c_{n} = C_{n} / n!

. The parameter z is related to the mean

m = E (\bar{N} (1))

by

z T^{'} (z) / T (z) = m

. Here,

T (z)

solves either

T (z) = z φ (T (z))

,

T (0) = 0

or

T^{'} (z) = φ (T (z))

,

T (0) = 0

depending on whether it is a simple or an increasing combinatorial tree. Such trees are generated by the g.f.

φ (z)

with

m! [z^{m}] φ (z)

,

m \geq 1

, non-negative integers, and

φ (0) = 1

.

When conditioning on the size of the forest, the joint law of a population of k clusters in a size-n forest is

\begin{matrix} P ({\bar{N}}_{n, k} = n_{k}) & = & \frac{[z^{n} \prod_{l = 1}^{k} z_{l}^{n_{l}}] \prod_{l = 1}^{k} T (z z_{l})}{[z^{n}] T {(z)}^{k}} \\ & = & \binom{n}{n_{1} \dots n_{k}} \frac{\prod_{l = 1}^{k} C_{n_{l}}}{C_{n, k}} δ_{|n_{k}| = n} . \end{matrix}

Here,

C_{n, k} = n! [z^{n}] T {(z)}^{k}

or

C_{n, k} = n! [z^{n}] T {(z)}^{k} / k!

, depending on the constitutive trees being distinguishable or not.

Moreover, the law of the number

K_{n}

of its clusters can be calculated after normalizing its count. As the following examples show, the asymptotic structure of

K_{n}

in this approach is very rich:

Forests of indistinguishable linear increasing trees for which $T^{'} (z) = 1 + T (z)$ yielding $T (z) = e^{z} - 1$ and $K (z, v) = e^{v T (z)}$ . With $S_{n, k}$ the Stirling numbers of the second kind, we obtain

$\begin{matrix} n! [z^{n} v^{k}] K (z, v) & = & S_{n, k}; k = 1, \dots, n . \\ P (K_{n} = k) & = & \frac{n! [z^{n} v^{k}] K (z, v)}{n! [z^{n}] K (z, 1)} = \frac{S_{n, k}}{Σ_{n}} . \\ Σ_{n} & = & \sum_{k = 1}^{n} S_{n, k}, the Bell number . \end{matrix}$

The $S_{n, k}$ obey a triangular recurrence, which translates into one for $P (K_{n} = k) .$
In the case that the trees are assumed distinguishable, with $K (z, v) = 1 / (1 - v T (z)),$ (see [22]),

$\begin{matrix} n! [z^{n} v^{k}] K (z, v) & = & k! S_{n, k}; k = 1, \dots, n . \\ P (K_{n} = k) & = & \frac{n! [z^{n} v^{k}] K (z, v)}{n! [z^{n}] K (z, 1)} = \frac{k! S_{n, k}}{Σ_{n}} . \\ Σ_{n} & = & \sum_{k = 1}^{n} k! S_{n, k} = \frac{1}{2} \sum_{k \geq 0} 2^{- k} k^{n}, the ordered Bell number . \end{matrix}$
Forests of indistinguishable Cayley trees [23]
If $T (z) = z e^{T (z)}$ , with $K (z, v) = e^{v T (z)}$ , by the Lagrange inversion theorem,

$\begin{matrix} [z^{n}] K (z, v) & = & \frac{v}{n} [z^{n - 1}] e^{z (v + n)} \\ & = & \frac{v}{n!} {(v + n)}^{n - 1} \\ n! [z^{n} v^{k}] K (z, v) & = : & C_{n, k} = \binom{n - 1}{k - 1} n^{n - k}; \quad k = 1, \dots, n . \\ P (K_{n} = k) & = & \frac{n! [z^{n} v^{k}] K (z, v)}{n! [z^{n}] K (z, 1)} = \frac{1}{{(1 + n)}^{n - 1}} \binom{n - 1}{k - 1} n^{n - k} . \\ \binom{n - 1}{k - 1} n^{n - k} & = & \binom{n}{k} k n^{n - k - 1} . \end{matrix}$

Ref. [27] rather gives $C_{n, k} = k n^{n - k - 1}$ as the number of unordered forests with k Cayley trees while fixing the k different founders of the distinct trees out of $(\binom{n}{k})$ different ways. See also [23]. Now, with $Σ_{n} : = n! [z^{n}] K (z, 1),$

$E (v^{K_{n}}) = \frac{n! [z^{n}] K (z, v)}{Σ_{n}} = v {(1 - \frac{1}{1 + n} + \frac{v}{1 + n})}^{n - 1},$

(A1)

a shifted binomial distribution. In particular, $E (K_{n}) = \frac{2 n}{1 + n} \sim 2 .$ Note that

$E (v^{K_{n}}) \sim v e^{- (1 - v)} as n is large,$

(A2)

the p.g.f. of a shifted mean 1 Poisson random variable.
We finally observe as in [28] that the triangular array $C_{n, k}$ obeys the backward recursion

$(n - k) C_{n, k} = n k C_{n, k + 1} .$

Hence, with $Σ_{n} = \sum_{k = 1}^{n} C_{n, k}$ , $P (K_{n} = k) = C_{n, k} / Σ_{n}$ , $k = 1, \dots, n$ obeys

$P (K_{n} = k) = \frac{n k}{n - k} P (K_{n} = k + 1), k = n - 1, \dots, 1,$

with terminal condition $P (K_{n} = n) = {(n + 1)}^{- (n - 1)}$ .
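The Cayley-forest formulas above lend themselves to a direct exact check. The sketch below is illustrative: the function name `cayley_forest_counts` and the value n = 8 are arbitrary choices. It verifies the normalization (1 + n)^{n - 1}, the mean 2n / (1 + n), the backward recursion and the terminal condition.

```python
from fractions import Fraction
from math import comb

def cayley_forest_counts(n):
    """C_{n,k} = binom(n - 1, k - 1) n^(n - k) for k = 1..n."""
    return {k: comb(n - 1, k - 1) * n ** (n - k) for k in range(1, n + 1)}

n = 8                                           # illustrative size
C = cayley_forest_counts(n)
sigma = sum(C.values())                         # should equal (n + 1)^(n - 1)
law = {k: Fraction(c, sigma) for k, c in C.items()}
mean = sum(k * pk for k, pk in law.items())     # should equal 2n / (n + 1)

print(sigma, (n + 1) ** (n - 1))
print(mean, Fraction(2 * n, n + 1))
# Backward recursion (n - k) C_{n,k} = n k C_{n,k+1}
print(all((n - k) * C[k] == n * k * C[k + 1] for k in range(1, n)))
```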
For non-plane increasing trees (see [24] p. 40 and [29]), with $T (z) = - log (1 - z)$ and $|s_{n, k}|$ , the absolute Stirling numbers of the first kind, considering forests of such indistinguishable trees, we have

$\begin{matrix} K (z, v) & = & e^{v T (z)} = {(1 - z)}^{- v} \\ n! [z^{n}] K (z, v) & = & {[v]}_{n} \\ n! [z^{n} v^{k}] K (z, v) & = & |s_{n, k}| = : C_{n, k} \\ Σ_{n} & = & n! [z^{n}] K (z, 1) = \sum_{k = 1}^{n} C_{n, k} = n! \\ E (v^{K_{n}}) & = & \frac{{[v]}_{n}}{n!} . \end{matrix}$

Therefore, with $H_{n}$ as the n-th harmonic number,

$E (K_{n}) = \frac{n! [z^{n}] \partial_{v} K (z, 1)}{Σ_{n}} = H_{n} \sim log n,$

(A3)

$σ^{2} (K_{n}) \sim log n .$

A CLT holds.
$|s_{n, k}|$ counts the number of permutations of n elements with k disjoint cycles. The process $K_{n}$ is the Chinese Restaurant process, indicating the number of tables occupied by n clients and also the number of distinct species visited in an n-sampling process from the Poisson–Dirichlet $(1)$ partition of the unit interval (see [30], p. 57, for example).
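The normalization Σ_n = n! and the mean H_n in (A3) can be confirmed with the standard recurrence for the absolute Stirling numbers of the first kind. This is a minimal sketch; the function name and the value n = 10 are illustrative.

```python
from fractions import Fraction
from math import factorial

def unsigned_stirling1_row(n):
    """[|s(n, k)| for k = 0..n] via |s(m,k)| = |s(m-1,k-1)| + (m-1) |s(m-1,k)|."""
    row = [1]                                   # row for n = 0
    for m in range(1, n + 1):
        prev = row + [0]                        # pad so that prev[m] = 0
        row = [(prev[k - 1] if k >= 1 else 0) + (m - 1) * prev[k]
               for k in range(m + 1)]
    return row

n = 10                                          # illustrative size
row = unsigned_stirling1_row(n)
law = [Fraction(c, factorial(n)) for c in row]  # P(K_n = k), k = 0..n
mean = sum(k * pk for k, pk in enumerate(law))
harmonic = sum(Fraction(1, j) for j in range(1, n + 1))

print(sum(row) == factorial(n))
print(mean, harmonic)
```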
Plane oriented (recursive) trees are those for which $T^{'} (z) = 1 / (1 - T (z))$ , $T (0) = 0,$ yielding $T (z) = 1 - \sqrt{1 - 2 z}$ , with $C_{n} = n! [z^{n}] T (z) = \frac{1}{2} {[1 / 2]}_{n - 1} 2^{n} = (2 n - 3)!! .$
Here, $P (z) = z - z^{2} / 2$ , $R (z) = {(1 - z / 2)}^{- 1}$ and

$\begin{matrix} C_{n} & = & n! n^{- 1} [z^{n - 1}] {(1 - z / 2)}^{- n} = {[n]}_{n - 1} / 2^{n - 1} \\ & = & 2^{- (n - 1)} (2 n - 2)! / (n - 1)!, \end{matrix}$

in agreement with the well-known identity $(2 n - 3)!! = (2 n - 2)! / (2^{n - 1} (n - 1)!) .$
Furthermore, in agreement with [31],

$\begin{matrix} C_{n, k} & = & n! [z^{n}] T {(z)}^{k} = n! \frac{k}{n} [z^{n - k}] {(1 - z / 2)}^{- n} \\ & = & \frac{n!}{(n - k)!} \frac{k}{n} {[n]}_{n - k} / 2^{n - k} = k 2^{- (n - k)} (2 n - k - 1)! / (n - k)! . \end{matrix}$

$C_{n, k}$ counts the number of k-forests of distinguishable plane-oriented trees with n nodes.

$\begin{matrix} K (z, v) & = & \frac{1}{1 - v T (z)} = \frac{1}{1 - v (1 - \sqrt{1 - 2 z})} \\ n! [z^{n}] K (z, 1) & = & Σ_{n} = n! [z^{n}] {(1 - 2 z)}^{- 1 / 2} \\ & = & {[1 / 2]}_{n} 2^{n} = (2 n - 1)!! = (2 n)! / (2^{n} n!) . \end{matrix}$

$\begin{matrix} P (K_{n} = k) & = & \frac{C_{n, k}}{Σ_{n}} = \frac{k 2^{- (n - k)} (2 n - k - 1)! / (n - k)!}{(2 n)! / (2^{n} n!)} \\ & = & k 2^{k} \frac{(2 n - k - 1)!}{(2 n)!} \frac{n!}{(n - k)!} . \end{matrix}$
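The plane-oriented case can be checked in the same way. The sketch below is illustrative: the function names and n = 7 are arbitrary choices. It verifies that the counts C_{n,k} sum to (2n - 1)!!, that C_{n,1} = (2n - 3)!!, and that the resulting law matches the last displayed formula.

```python
from fractions import Fraction
from math import factorial

def plane_forest_count(n, k):
    """C_{n,k} = k 2^{-(n-k)} (2n - k - 1)! / (n - k)!, kept as an exact fraction."""
    return Fraction(k * factorial(2 * n - k - 1), 2 ** (n - k) * factorial(n - k))

def odd_double_factorial(m):
    """(2m - 1)!! computed as (2m)! / (2^m m!)."""
    return factorial(2 * m) // (2 ** m * factorial(m))

n = 7                                           # illustrative size
counts = {k: plane_forest_count(n, k) for k in range(1, n + 1)}
sigma = sum(counts.values())                    # should equal (2n - 1)!!
law = {k: c / sigma for k, c in counts.items()}

print(sigma, odd_double_factorial(n))
print(counts[1], odd_double_factorial(n - 1))   # C_{n,1} = (2n - 3)!!
```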

References

1. Pitman, J. Enumerations of trees and forests related to branching processes and random walks. In Microsurveys in Discrete Probability; Aldous, D., Propp, J., Eds.; American Mathematical Society: Providence, RI, USA, 1998; pp. 163–180.
2. Neveu, J. Arbres et processus de Galton-Watson. Ann. Inst. Henri Poincaré Probab. Stat. 1986, 22, 199–207.
3. Stanley, R.P. Chapter 5. In Enumerative Combinatorics; Cambridge University Press: Cambridge, UK, 1999; Volume 2.
4. Surya, E.; Warnke, L. Lagrange Inversion Formula by Induction. Am. Math. Mon. 2023, 130, 944–948.
5. Roch, S. Branching Processes. 2021. Available online: https://people.math.wisc.edu/~roch/mdp/roch-mdp-chap6.pdf (accessed on 24 November 2024).
6. Flory, P.J. Molecular size distribution in three-dimensional polymers, I Gelation. J. Am. Chem. Soc. 1941, 63, 3083–3090.
7. Flory, P.J. Molecular size distribution in three-dimensional polymers, II Trifunctional branching units. J. Am. Chem. Soc. 1941, 63, 3091–3096.
8. Stockmayer, W.H. Theory of molecular size distribution and gel formation in branched chain polymers. J. Chem. Phys. 1943, 11, 45–55.
9. Simkin, M.V.; Roychowdhury, V.P. Re-inventing Willis. Phys. Rep. 2011, 502, 1–35.
10. Flajolet, P.; Sedgewick, R. Analytic Combinatorics; Illustrated Edition; Cambridge University Press: Cambridge, UK, 2009.
11. Harris, T.E. The Theory of Branching Processes; Die Grundlehren der Mathematischen Wissenschaften, Bd. 119; Springer: Berlin/Heidelberg, Germany; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1963.
12. Garcia-Millan, R.; Font-Clos, F.; Corral, Á. Finite-size scaling of survival probability in branching processes. Phys. Rev. E 2015, 91, 042122.
13. Corral, Á.; Garcia-Millan, R.; Font-Clos, F. Exact Derivation of a Finite-Size Scaling Law and Corrections to Scaling in the Geometric Galton-Watson Process. PLoS ONE 2016, 11, e0161586.
14. Meir, A.; Moon, J.W. On the altitude of nodes in random trees. Can. J. Math. 1978, 30, 997–1015.
15. Drmota, M. Random Trees: An Interplay Between Combinatorics and Probability; Springer: Wien, Austria; New York, NY, USA, 2009.
16. Cramér, H. Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Sci. Ind. 1938, 736, 523.
17. Burden, C.J.; Simon, H. Genetic drift in populations governed by a Galton-Watson branching process. Theor. Popul. Biol. 2016, 109, 63–74.
18. Karlin, S.; McGregor, J. Direct product branching processes and related Markov chains. Proc. Nat. Acad. Sci. USA 1964, 51, 598–602.
19. Tutte, W.T. The Number of Planted Plane Trees with a Given Partition. Am. Math. Mon. 1964, 71, 272–277.
20. Kreweras, G. Sur les partitions non croisées d’un cycle. Discret. Math. 1972, 1, 333–350. English translation by Berton A. Earnshaw: On the Non-crossing Partitions of a Cycle. 2005. Available online: https://users.math.msu.edu/users/earnshaw/research/kreweras.pdf (accessed on 24 November 2024).
21. Tanner, J.C. A derivation of the Borel distribution. Biometrika 1961, 48, 222–224.
22. Comtet, L. Analyse Combinatoire—Tome 1; Presses Universitaires de France: Paris, France, 1970.
23. Rényi, A. Some Remarks on the Theory of Trees. Magyar Tud. Akad. Mat. Kutat Int. Kzl 1959, 4, 73–85.
24. Bergeron, F.; Flajolet, P.; Salvy, B. Varieties of increasing trees. In Lecture Notes in Computer Science; CAAPs 92; Raoult, J.C., Ed.; 1992; Volume 581, pp. 24–48.
25. Panholzer, A.; Prodinger, H. Level of nodes in increasing trees revisited. Random Struct. Algorithms 2007, 31, 203–226.
26. Najock, D.; Heyde, C.C. On the Number of Terminal Vertices in Certain Random Trees with an Application to Stemma Construction in Philology. J. Appl. Probab. 1982, 19, 675–680.
27. Takács, L. On Cayley’s formula for counting forests. J. Comb. Theory Ser. A 1990, 53, 321–323.
28. Clarke, L.E. On Cayley’s Formula for Counting Trees. J. Lond. Math. Soc. 1958, 33, 471–474.
29. Mahmoud, H.; Smythe, R.T.; Szymanski, J. On the structure of random plane-oriented recursive trees and their branches. Random Struct. Algorithms 1993, 4, 151–176.
30. Pitman, J. Combinatorial Stochastic Processes, Lecture Notes in Mathematics 1875; Springer: Berlin/Heidelberg, Germany, 2006.
31. Callan, D. A combinatorial survey of identities for the double factorial. arXiv 2009, arXiv:0906.1317.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
