Article

Statistical Aspects of Two Classes of Random Binomial Trees and Forests

by
Thierry E. Huillet
Laboratoire de Physique Théorique et Modélisation (CNRS, UMR 8089), CY Cergy Paris Université, 95302 Cergy-Pontoise, France
Mathematics 2025, 13(2), 291; https://doi.org/10.3390/math13020291
Submission received: 25 November 2024 / Revised: 4 January 2025 / Accepted: 14 January 2025 / Published: 17 January 2025
(This article belongs to the Special Issue Latest Advances in Random Walks Dating Back to One Hundred Years)

Abstract

We consider two specific families of binomial trees and forests: simply generated binomial $d$-ary trees and forests versus their increasing phylogenetic version, with tree nodes in increasing order from the root to any of its leaves. The analysis (both pre-asymptotic and asymptotic) covers some of the main statistical features of their total progenies. We take advantage of the fact that the random distributions of those trees are obtained by weighting the counts of the underlying combinatorial trees. Finally, we briefly stress a rich alternative randomization of combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones.

1. Introduction

The main purpose of this paper is to present explicit and asymptotic methods to count two kinds of random trees: Bienaymé–Galton–Watson (BGW) trees and increasing random trees, both generated by a binomial branching mechanism. The limiting Poisson case will also be briefly treated. We will analyze some structural properties of these Markov chains on the non-negative integers. The analysis chiefly concerns the number of leaves of these trees, the number of trees entering in the composition of a size-$n$ forest of such trees, the joint sizes of the constitutive trees in a forest of $k$ trees, and the one-dimensional marginal size of a typical tree, both at fixed $n, k$ and in the limit $n, k \to \infty$, $n/k \to \rho$. The probability of observing trees with a given outdegree sequence of nodes is also investigated. In all cases, the use of generating functions is an essential ingredient. Explicit formulae can sometimes be derived with the help of Lagrange's inversion formulae, which take different expressions for the two processes under concern. On the other hand, the singularity analysis of generating functions may lead to asymptotic formulae, aiming at describing large trees. The random trees under study consist of weighted versions of combinatorial trees; this will be highlighted.
We shall study the following models:
- Binomial BGW trees and forests appearing in branching population models, percolation on trees and branched polymers.
- Binomial increasing trees as recursive trees appearing in phylogeny, with nodes in increasing order along any path from the root to the leaves: the leaves of a size-$n$ tree are the species that can mutate to another species when adding a new atom ($n \to n+1$), whereas the internal nodes consist of the species that can produce a new species in the process. Being always supercritical, $d$-ary increasing trees do not exhibit a phase transition at criticality. A forest of such trees consists of different population genera. The recursive structure is an essential ingredient which is shared by the limiting Poisson increasing tree.
Such random trees and forests are constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing, and observing that the weights only depend on the tree size and not on its full outdegree sequence.
Deep relations of enumeration problems of trees and forests to skip-free to the left random walks can be found in [1]. It rests on a different uniform way to construct random forests arising from enumerative combinatorics, leading to very rich asymptotic behaviors. This point is briefly addressed and illustrated in Appendix A.

2. Number of Atoms and Leaves in a Size-$n$ Simple Tree

We can distinguish two main types of random trees, namely the following:
- Ordered (or plane) trees: One can draw the tree in the half-plane so that the children of every parent are ordered from left to right, say, from the youngest child to the eldest one. Embeddings obtained from cyclic rotations of the sub-trees around the root are not allowed.
Such trees are amenable to the Ulam–Harris–Neveu ordering of their nodes (horizontal ordering holds) and they can be represented as strata with the founder on top and the successive layers below [2]. Given that an individual of the population at generation $h$ is labeled by vertex $v = v_1 \cdots v_h$ (as a concatenation of $h$ positive integers) and gives birth to $H_v \geq 1$ daughters, its offspring are labeled by $v1, \ldots, vH_v$. Each individual at generation $h$ thus obtains a concatenated label $v = v_1 \cdots v_h$, for which label $v_1 \cdots v_{h-1}$ is the one of its mother, $v_1 \cdots v_{h-2}$ the one of its grandmother, …, up to $\emptyset$, the conventional label of the root. Such ordered trees are Bienaymé–Galton–Watson (in short BGW) trees in the theory of branching processes; they are also called simply generated (for short, here, simple) trees.
- Increasing random trees: A size-$n$ rooted and increasing labeled tree has vertices with indices or labels $1, \ldots, n$ increasing along any path from the root to its leaves. In contrast with simple random trees, increasing random trees do not show a phase transition at criticality. For some classes of special branching mechanisms (binomial, Poisson, or negative binomial), such trees can be recursively defined, which proves useful in their analysis.

2.1. Simply Generated Random Trees

By recursion from the root (a unique founder of the tree), the probability of simply generated (or, for short, simple) size-$n$ rooted trees generated by the local probability generating function (p.g.f.) $\phi(z)$ (with non-negative $z^m$-coefficients $\phi_m := [z^m]\phi(z)$, $m \geq 0$, summing to $\phi(1) = 1$) is obtained as
$$P(\overline{N}(1) = n) = [z^n]\Phi(z),$$
where $\Phi(z)$ solves the functional equation $\Phi(z) = z\phi(\Phi(z))$, $\Phi(0) = 0$. The Lagrange inversion formula states that for all $n \geq 1$
$$[z^n]\Phi(z) = \frac{1}{n}[z^{n-1}]\phi(z)^n.$$
A more general form of the Lagrange inversion formula states that (with $'$ denoting derivative)
$$[z^n]h(\Phi(z)) = \frac{1}{n}[z^{n-1}]h'(z)\phi(z)^n$$
for any arbitrary analytic output function $h$. See [3,4]. If $h(z)$ is a p.g.f., $h(\Phi(z))$ is the p.g.f. of the total progeny of a branching process generated by $\phi$ with a random number of founders, say $N_0$, for which $h(z) = E(z^{N_0})$.
If $h(z) = (1 - vz)^{-1}$, where $v$ 'marks' the number of distinguishable trees in a forest of simple random trees generated by $\phi$, then, with $K(z, v) := 1/(1 - v\Phi(z))$,
$$[z^n]K(z, v) = \frac{v}{n}[z^{n-1}]\frac{1}{(1 - vz)^2}\phi(z)^n.$$
This is a degree-$n$ polynomial in $v$. The double generating function $K(z, v)$ may be viewed as the 'grand-canonical' partition function. In that case,
$$[v^k z^n]\frac{1}{1 - v\Phi(z)} = \frac{k}{n}[z^{n-k}]\phi(z)^n = P(K_n = k),$$
where $K_n$ is the number of distinguishable trees forming the size-$n$ forest. $K(z, v)$ consists of a 'sequence' of trees forming the forest, with $v$ 'marking' their number.
Should the trees be indistinguishable, the same formulae hold but now with $K(z, v) = e^{v\Phi(z)}$, consisting of a 'set' of trees forming the forest, so
$$[v^k z^n]K(z, v) = \frac{1}{k!}\frac{k}{n}[z^{n-k}]\phi(z)^n.$$
One important branching mechanism $\phi(z)$ is the binomial one. Other important examples of $\phi(z)$ are $e^{-\mu(1-z)}$ (mean-$\mu$ Poisson) or $q/(1 - pz)$ (geometric), and some of them will be mentioned in the course of the analysis when needed.
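The Lagrange inversion formula of this section can be checked numerically. The Python sketch below is illustrative only: the helper names (`poly_mul`, `compose`, `total_progeny_series`, `lagrange_coeff`) are hypothetical, and the binomial example with $d = 3$, $p = 1/4$ is an arbitrary choice. It solves $\Phi(z) = z\phi(\Phi(z))$ by fixed-point iteration on truncated power series, in exact rational arithmetic, and compares the coefficients with $\frac{1}{n}[z^{n-1}]\phi(z)^n$.

```python
from fractions import Fraction
from math import comb

def poly_mul(a, b, n):
    """Product of two power series (coefficient lists), truncated at degree n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out

def compose(phi, Phi, n):
    """phi(Phi(z)) truncated at degree n, for a polynomial phi."""
    out = [Fraction(0)] * (n + 1)
    power = [Fraction(1)] + [Fraction(0)] * n  # Phi(z)^0
    for c in phi:
        out = [o + c * x for o, x in zip(out, power)]
        power = poly_mul(power, Phi, n)
    return out

def total_progeny_series(phi, n):
    """Coefficients of Phi up to degree n, iterating Phi <- z * phi(Phi)."""
    Phi = [Fraction(0)] * (n + 1)
    for _ in range(n):  # each pass fixes at least one more coefficient
        Phi = [Fraction(0)] + compose(phi, Phi, n)[:n]
    return Phi

def lagrange_coeff(phi, n):
    """(1/n) [z^(n-1)] phi(z)^n."""
    power = [Fraction(1)]
    for _ in range(n):
        power = poly_mul(power, phi, n - 1)
    return power[n - 1] / n

# Assumed example: binomial mechanism (q + p z)^d with d = 3, p = 1/4.
d, p = 3, Fraction(1, 4)
phi = [comb(d, b) * p**b * (1 - p) ** (d - b) for b in range(d + 1)]
series = total_progeny_series(phi, 6)
```

Under these assumptions, `series[n]` and `lagrange_coeff(phi, n)` should coincide for every $n$ up to the truncation order; the first coefficient is $\phi(0) = q^d$.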

2.2. The Binomial Case

We here focus on $\phi(z) = (q + pz)^d$, $d \geq 2$ and $p + q = 1$ (the binomial branching mechanism), corresponding to the probability mass function (pmf) $\pi_b = \binom{d}{b}p^b q^{d-b}$, $b = 0, \ldots, d$ (for which $\pi_0 = q^d$ and $\pi_d = p^d$): $\pi_b$ is the probability of an offspring having outdegree (branching number) $b$. For this BGW model, each mother particle alive can give birth to at most $d$ daughters, or possibly none. With $\xi \overset{d}{\sim} \mathrm{bin}(d, p)$, the random number of offspring per capita, $\phi(z) = E(z^{\xi})$: each active node independently possibly activates any of its $d$ descendants. The produced random tree is a sub-tree of the full $d$-ary ($d$-dimensional) tree having $d^h$ atoms at generation $h \geq 0$. It is a model of percolation on trees [see [5], p. 438].
The random variable $\xi$ has mean $pd$; its distribution is unimodal with mode at the origin if and only if $p < 1/(d+1) < 1/d$. If $p > 1/(d+1)$, the mode is near $p(d+1) - 1$.
Remarks and related models.
(i) The latter model is also related to the Flory–Stockmayer binomial model of randomly branched polymers with degree-$(d+1)$ functional monomers. See [6,7] if $d = 2$, [8] for any integer $d$ and also [9]: in this model with one founder ($k = 1$), the $\Phi(z)$ obtained above from the $\mathrm{bin}(d, p)$ generating model $\phi$ is in fact the p.g.f. of first-generation polymers. Here, each monomer with $d$ functional units (arms) is identified with a node of a BGW tree. Independently of one another, each of the $d$ functional units has a probability $p$ to be attached to a second-generation functional unit, and so on. At generation 0, however, a seed monomer with full $d+1$ functional units gives birth to a random number (so with distribution $\mathrm{bin}(d+1, p)$) of the first generation of such polymers, all with p.g.f. $\Phi(z)$. The true size of the Flory branched polymer thus has the distribution given by the modified p.g.f.
$$\Phi(z) \to \Phi_d(z) = z\,(q + p\Phi(z))^{d+1}.$$
This translates the fact that the seed monomer can have up to $d+1$ activated functional units, whereas all its descendants have only up to $d$, the first and subsequent generations of trees growing away from the seed monomer, thereby presenting only $d$ possible free arms. In the supercritical case with $pd > 1$, there is a positive probability that the Flory tree (polymer) is a giant one with infinitely many monomers (the gelation transition). With $\overline{N}(1)$, the total number of monomers with one founder, we have $P(\overline{N}(1) = 1) = [z]\Phi_d(z) = q^{d+1}$ and, for $n \geq 2$, by the Lagrange inversion formula,
$$P(\overline{N}(1) = n) = [z^n]\Phi_d(z) = [z^{n-1}](q + p\Phi(z))^{d+1} = \frac{p(d+1)}{n-1}[z^{n-2}](q + pz)^{nd} = \frac{p(d+1)}{n-1}\binom{nd}{n-2}p^{n-2}q^{n(d-1)+2}.$$
The leaves of the $d$-polymer tree (its nodes with outdegree zero) constitute its external boundary, where contacts with the reactants are likely to occur. As such, an estimation of their number is of interest when dealing with such size-$n$ branched polymers.
It is also of interest to consider a polymer soup as a collection of $k$ such independent $d$-polymers with a given size.
(ii) With $\Phi(z)$ solving $\Phi(z) = z\,(q + p\Phi(z))^d$ and defining
$$\Phi_d(z) := \Phi(z^d)^{1/d},$$
we obtain that $\Phi_d(z)$ solves
$$\Phi_d(z) = z\,(q + p\Phi_d(z)^d) = z\,\phi_d(\Phi_d(z)),$$
corresponding to the branching mechanism $\phi_d(z) = q + pz^d$. So $\Phi_d(z)$ is the p.g.f. of the total progeny, say $\overline{N}_d(1)$, of a branching process whose offspring per capita is either $d$ with probability $p$ or $0$ with probability $q = 1 - p$, so all or nothing. We clearly have $P(\overline{N}_d(1) = n) > 0$ only for those $n = md + 1$, $m \geq 0$ (the number of tree branches being a multiple of $d$), so $\Phi_d(z^{1/d})$ is well defined together with $\Phi(z) = \Phi_d(z^{1/d})^d$. We can thus deduce the main properties of the new model generated by $\phi_d$ from the previous one generated by the binomial $\phi$.
(iii) If $h(z) = E(z^{N_0}) = q_0 + p_0 z$, let $\Phi_0(z) = h(\Phi(z)) = q_0 + p_0\Phi(z)$.
When $(q_0, p_0) = (q, p)$, $\Phi_0(z) = q + p\Phi(z)$ is a weak version of $\Phi(z)$, allowing for an empty tree with $P(\overline{N}(1) = 0) = q$.
(iv) $\Phi_0(z) := \Phi(z)/z$ solves $\Phi_0(z) = \phi(z\Phi_0(z))$, $\Phi_0(0) = \pi_0$. It is the shifted p.g.f. of $\overline{N}(1) - 1$, also taking values in $\mathbb{N}_0 := \{0, 1, \ldots\}$.
(v) Note that $\Psi(z) := \Phi(q_0 + p_0 z)$, as a Bernoulli-thinned version of $\Phi(z)$, is not an output of $\Phi(z)$. If, while deleting a productive node with probability $q_0$, the ancestral branch leading to that node is equivalently erased, a pruning operation of the tree results in the formation of disconnected sub-trees rooted at this node. Assuming the root to be active, the size of the pruned tree descending from the root has p.g.f. $\Psi_0(z)$, clearly solving
$$\Psi_0(z) = z\,\phi(q_0 + p_0\Psi_0(z)).$$
This is the progeny of a BGW tree with the thinned branching mechanism $\phi(q_0 + p_0 z)$.
(vi) As $d \to \infty$, the binomial branching mechanism $\phi(z) = \left(1 - \frac{\mu}{d}(1 - z)\right)^d$ approaches the Poisson($\mu$) p.g.f., leading to random Cayley trees.
(vii) The total progeny of Bellman–Harris trees, for which each splitting individual alive has an exponential lifetime independent of its sister particles, coincides with the one of discrete-time BGW trees as long as they share the same branching mechanism. An overlap of generations results.
State-space representations. The binomial BGW process is a Markov chain on the non-negative integers with the transition matrix
$$P(k, k') = [z^{k'}]\phi(z)^k = \binom{kd}{k'}p^{k'}q^{kd-k'}, \quad k \in \mathbb{N}_0,\ k' \in \{0, \ldots, kd\},$$
in the case of a single founder.
With $\phi_h(z)$, $h \geq 0$, the p.g.f. of the number of individuals, say $N_h$, alive at step $h$ (the generation number), obeying $\phi_{h+1}(z) = \phi(\phi_h(z))$, $\phi_0(z) = z$,
$$P^h(k, k') = [z^{k'}]\phi_h(z)^k$$
is the $h$-step transition matrix, involving the $h$-iterate $\phi_h(z)$ of the degree-$d$ polynomial $\phi(z)$ (as a degree-$d^h$ polynomial). Recall that $\phi_h(0)$ is the probability that the process dies out before step $h$, that is, the probability that the time to extinction $\tau_{1,0}$ of the unique founder occurs before generation $h$: $\phi_h(0) = P(\tau_{1,0} \leq h)$.
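The iteration $\phi_{h+1}(z) = \phi(\phi_h(z))$ started from $z = 0$ gives a direct way to evaluate $\phi_h(0) = P(\tau_{1,0} \leq h)$. The following is a minimal floating-point sketch (the function names are hypothetical); in the binary case $d = 2$ the fixed-point equation $(q + pz)^2 = z$ has roots $1$ and $(q/p)^2$, which gives a closed form to compare against.

```python
def phi(z, d, p):
    """Binomial branching p.g.f. (q + p z)^d."""
    return (1.0 - p + p * z) ** d

def extinction_by_generation(h, d, p):
    """phi_h(0) = P(tau_{1,0} <= h), obtained by iterating phi h times from 0."""
    x = 0.0
    for _ in range(h):
        x = phi(x, d, p)
    return x
```

For instance, `extinction_by_generation(200, 2, 0.75)` should be close to $(q/p)^2 = 1/9$ (supercritical case), whereas in a subcritical case such as $d = 2$, $p = 0.4$ the iterates increase to 1.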
Total progeny. Let $\overline{N}(1) = \sum_{h \geq 0} N_h$ be the total progeny of this branching process with one founder and overlapping generations. By the Lagrange formula, with $P(\overline{N}(1) = n) = [z^n]\Phi(z)$,
$$P(\overline{N}(1) = n) = \frac{1}{n}[z^{n-1}](q + pz)^{nd} = \frac{q}{np}\binom{nd}{n-1}\left(pq^{d-1}\right)^n. \tag{9}$$
By the Stirling formula, with $z_c := \sup\{z > 0 : \Phi(z) < \infty\} = \frac{(d-1)^{d-1}}{p\,q^{d-1}d^d}$, for large $n$,
$$P(\overline{N}(1) = n) \sim \frac{1}{\sqrt{2\pi}}\,\frac{q}{p}\left(\frac{d}{(d-1)^3}\right)^{1/2} n^{-3/2} z_c^{-n}. \tag{10}$$
Note that $z_c = 1$ only when $p = p_c := 1/d$ (the critical case with $P(\overline{N}(1) = n) \sim \frac{1}{\sqrt{2\pi}}\left(\frac{d}{d-1}\right)^{1/2} n^{-3/2}$, a pure power law with exponent $3/2$).
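The exact total-progeny law and its Stirling estimate can be compared numerically. The sketch below is only an illustration (function names are hypothetical); it evaluates the exact pmf through `math.lgamma` to avoid overflow of the binomial coefficient for large $n$.

```python
from math import lgamma, log, exp, pi, sqrt

def progeny_pmf(n, d, p):
    """Exact P(N(1) = n) = (q / (n p)) * C(nd, n-1) * (p q^(d-1))^n, via logs."""
    q = 1.0 - p
    log_binom = lgamma(n * d + 1) - lgamma(n) - lgamma(n * d - n + 2)
    return exp(log_binom - log(n) + log(q / p) + n * log(p * q ** (d - 1)))

def progeny_asymptotic(n, d, p):
    """Stirling estimate (q/p) sqrt(d / (2 pi (d-1)^3)) n^(-3/2) z_c^(-n)."""
    q = 1.0 - p
    zc = (d - 1) ** (d - 1) / (p * q ** (d - 1) * d**d)
    return (q / p) * sqrt(d / (2 * pi * (d - 1) ** 3)) * n ** (-1.5) * zc ** (-n)
```

In the critical case $d = 3$, $p = 1/3$, the ratio of the two functions at $n = 500$ should be within about one percent of 1; in a subcritical case such as $p = 0.2$, the exact pmf should sum to 1 over $n \geq 1$.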
This result is a particular illustration of a more general situation. Consider indeed a general branching mechanism $\phi(z)$ having all its moments (equivalently, for which $z_* := \sup\{z > 0 : \phi(z) < \infty\} \in (1, \infty]$). Then, a unique positive real root to the equation
$$\phi(\tau) - \tau\phi'(\tau) = 0$$
exists, with $\tau < z_*$ if $\mu < 1$ ($\phi(\tau) > 1$), and $\tau = 1$, $\phi(\tau) = 1$ if $\mu = 1$.
The point $(\tau, \phi(\tau))$ is the tangency point to the curve $\phi(z)$ of a straight line passing through the origin $(0, 0)$. Let then $z_c := \tau/\phi(\tau) = 1/\phi'(\tau) \geq 1$. The searched $\Phi(z)$ solves $\psi(\Phi(z)) = z$, where $\psi(z) = z/\phi(z)$ obeys $\psi(\tau) = z_c$, $\psi'(\tau) = 0$ and $\psi''(\tau) = -\tau\phi''(\tau)/\phi(\tau)^2 > -\infty$. Thus, near $\tau$, $\psi(z) \sim z_c + \frac{1}{2}\psi''(\tau)(z - \tau)^2$, or else $z \sim z_c + \frac{1}{2}\psi''(\tau)(\Phi(z) - \tau)^2$ (a branch-point singularity). It follows that $\Phi(z)$ displays a dominant power singularity of order $1/2$ at $z_c$ with $\Phi(z_c) = \tau$, in the sense (with $\sigma_c^2 = \tau^2\phi''(\tau)/\phi(\tau)$)
$$\Phi(z) \underset{z \to z_c}{\sim} \tau - \sqrt{\frac{2\phi(\tau)}{\phi''(\tau)}}\,(1 - z/z_c)^{1/2} = \tau\left(1 - \frac{\sqrt{2}}{\sigma_c}(1 - z/z_c)^{1/2}\right).$$
By singularity analysis therefore, see [10], we obtain [in agreement with [11], Theorem 13.1, p. 32]:
$$P(\overline{N}(1) = n) = [z^n]\Phi(z) \underset{n \to \infty}{\sim} \sqrt{\frac{\phi(\tau)}{2\pi\phi''(\tau)}}\; n^{-3/2} z_c^{-n} + O\!\left(n^{-5/2} z_c^{-n}\right),$$
to the dominant order in $n$, with a geometric decay term at rate $z_c^{-1} = \phi'(\tau) < 1$ and a 'universal' power-law decay term $n^{-3/2}$ (under the moment conditions on $\phi$).
In the critical case $\mu = 1$, $z_c = 1$ and $P(\overline{N}(1) = n)$ is a pure power law with a 'universal' exponent value of $3/2$.
In the case of the binomial branching mechanism, one can check that $\tau = q/(p(d-1))$ and $\phi(\tau) = \left(\frac{qd}{d-1}\right)^d$, in agreement with (10). In this context as well, $\tau$ plays an important role.
The binomial model is supercritical if and only if $\mu := \phi'(1) = pd > 1$ (equivalently $p > p_c := 1/d$), in which case the smallest solution $\rho_e$ to $\phi(\rho_e) = \rho_e$ lies in $(0, 1)$. This is a degree-$d$ algebraic equation. In this case, $\Phi(1) = P(\overline{N}(1) < \infty) = \rho_e$, the extinction probability of the binomial branching process.
- If $\phi'(1) = 1$ ($< 1$), the model is critical (subcritical) and $\Phi(1) = 1$. We have
$$\Phi'(1) = \infty \quad \left(\text{respectively } \Phi'(1) = \frac{1}{1 - \phi'(1)} = \frac{1}{1 - pd}\right).$$
Critical trees are finite with probability 1, but their time to extinction is long, with a law tail equivalent to that of a Pareto(1) power-law distribution.
In the subcritical case, $\overline{N}(1)$ has all its moments, in particular a finite variance $\sigma^2(\overline{N}(1)) = \frac{pqd}{(1 - pd)^3} < \infty$.
- If $\phi'(1) > 1$, the model is supercritical and $\Phi(1) = \rho_e$ together with $E(\overline{N}(1)) = \infty$. We also have
$$\Phi'(1) = \frac{1}{1 - \phi'(\rho_e)} = \frac{q + p\rho_e}{q - p(d-1)\rho_e},$$
the mean number of atoms of the binomial branching process conditioned on extinction, with modified branching mechanism $\phi(\rho_e z)/\phi(\rho_e)$ having a mean number of offspring $\phi'(\rho_e) < 1$. From the above expression of $\Phi'(1)$, we obtain $\rho_e < \frac{q}{p(d-1)}$, an upper bound of $\rho_e$.
From (9), $\rho_e = \Phi(1) = \sum_{n \geq 1} P(\overline{N}(1) = n)$ is an expression of $\rho_e$.
There is also an estimate of $\rho_e$ when the BGW process is nearly supercritical ($\mu$ slightly above 1). Let $\bar{\rho}_e = 1 - \rho_e$ be the survival probability and $f(z) = \phi(z) - z$, with
$$f(1) = 0, \quad f'(1) = \mu - 1 \quad \text{and} \quad f''(1) = E(\xi(\xi - 1)) = \sigma^2 + \mu^2 - \mu \underset{\mu \to 1^+}{\to} \sigma_c^2,$$
where $\sigma_c^2$ is the variance of $\xi$ at criticality. We have
$$\rho_e = \phi(\rho_e) \iff f(1 - \bar{\rho}_e) = 0.$$
As a result of
$$f(1 - x) \underset{x \to 0}{\sim} f(1) - x f'(1) + \frac{1}{2}x^2 f''(1),$$
we obtain the small survival probability estimate $\bar{\rho}_e \sim 2(\mu - 1)/\sigma_c^2$ when the BGW process is nearly supercritical. As a function of $\mu - 1$, $\bar{\rho}_e$ is always continuous at 0 ($\bar{\rho}_e = 0$ if $\mu - 1 \leq 0$) but with a discontinuous slope at $\mu = 1^+$, close to $2/\sigma_c^2 < \infty$. As $\mu \to \infty$, clearly $\bar{\rho}_e \to 1$.
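The first-order estimate of the survival probability can be compared with a direct numerical solution of $\rho_e = \phi(\rho_e)$. The sketch below is illustrative (function names and the iteration count are assumptions): it iterates $\phi$ from 0, which converges to the smallest fixed point, and, as in the text, uses $\phi''(1) = p^2 d(d-1)$ in place of $\sigma_c^2$ just above criticality.

```python
def survival_probability(d, p, iterations=20000):
    """1 - rho_e, with rho_e the smallest fixed point of phi(z) = (q + p z)^d."""
    x = 0.0
    for _ in range(iterations):
        x = (1.0 - p + p * x) ** d
    return 1.0 - x

def survival_estimate(d, p):
    """First-order estimate 2 (mu - 1) / phi''(1), with phi''(1) = p^2 d (d - 1)."""
    mu = p * d
    return 2.0 * (mu - 1.0) / (p * p * d * (d - 1))
```

With $d = 3$ and $\mu = pd = 1.01$, the two values should agree to within a couple of percent; the gap is of second order in $\mu - 1$.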
A full power-series expansion of $\bar{\rho}_e$ in terms of $\mu - 1 > 0$ can also be obtained as follows: define $\bar{\phi}(z)$ by $\phi(z) = 1 + \mu(z - 1) + \bar{\phi}(1 - z)$, so with $\bar{\phi}(0) = 0$. The equation $\rho_e = \phi(\rho_e)$ becomes
$$\frac{\bar{\phi}(\bar{\rho}_e)}{\bar{\rho}_e} = \mu - 1.$$
For $\mu - 1$ close to $0^+$, the Lagrange inversion formula gives
$$\bar{\rho}_e = \sum_{k \geq 1} \rho_k (\mu - 1)^k, \quad \text{with}$$
$$\rho_k = \frac{1}{k}[z^{k-1}]\left(\frac{\bar{\phi}(z)}{z^2}\right)^{-k}.$$
Note $\rho_1 = 2/\phi''(1)$ with $\phi''(1) \sim \sigma_c^2$ when $\mu$ is slightly above 1. To the first order in $\mu - 1$, we recover $\bar{\rho}_e \sim 2(\mu - 1)/\sigma_c^2$. The second-order coefficient is found to be $\rho_2 = \frac{4}{3}\cdot\phi'''(1)/\phi''(1)^3$. Let us detail these formulae in the binomial example.
Example 1.
If $\phi(z) = (q + pz)^d$, then $\mu = \phi'(1) = pd > 1$ (the binomial case) and $\phi''(1) = p^2 d(d-1) \sim \sigma_c^2$. Here $\bar{\phi}(z)/z^2 = \sum_{b=0}^{d-2}(-1)^b\binom{d}{b+2}p^{b+2}z^b$, giving $\rho_k$ in principle, starting with $\rho_1 = [z^0]\left(\bar{\phi}(z)/z^2\right)^{-1} = 2/(p^2 d(d-1))$.
From the exact expression of $P(\tau_{1,0} > h)$ as in [12,13], we observe the following finite-size scaling law in the slightly supercritical regime for which $\mu = 1 + x/h$, $x > 0$ and $\rho_e \sim 1 - 2(\mu - 1)/\sigma_c^2$ [$\sigma_c^2 = p^2 d(d-1)$, the critical variance of $\xi$ when $\mu = 1$]:
$$h\,P(\tau_{1,0} > h) \to r(x) := \frac{1}{\sigma_c^2}\,\frac{2x e^x}{e^x - 1} \quad \text{as } h \to \infty.$$
As in the strictly critical regime, the time to extinction has power-law tails with index 1, but with a non-constant asymptotic rate $r(x)$.
Regular supercritical or subcritical BGW processes conditioned to be critical. We end with a last conditioning leading to a critical BGW tree with mean offspring number $\mu_c = 1$. Let $\phi$ be regular, meaning that it has a convergence radius $z_* > 1$ (possibly $z_* = \infty$) and $\pi_0 > 0$. For such $\phi$'s, a unique positive real root to the equation
$$\phi(\tau) - \tau\phi'(\tau) = 0$$
exists, with $\rho_e = 1 < \tau < z_*$ if $\mu < 1$ ($\phi(\tau) > 1$), $\tau = 1$ if $\mu = 1$ and $\rho_e < \tau < 1 < z_*$ if $\mu > 1$ ($\phi(\tau) < 1$). In both cases, $\phi'(\tau) < 1$.
Start with a supercritical branching process ($\mu > 1$) and consider a process whose modified branching mechanism is $\phi_\tau(z) := \phi(\tau z)/\phi(\tau)$, satisfying $\phi_\tau(1) = 1$ and $\phi_\tau'(1) =: \mu_c = 1$, the one of a critical branching process with mean-1 offspring distribution and variance $\sigma_c^2 = \tau^2\phi''(\tau)/\phi(\tau)$. The transition matrix $P_c$ of the critical process is given by its entries
$$P_c(k, k') = [z^{k'}]\phi_\tau(z)^k = \frac{\tau^{k'}}{\phi(\tau)^k}P(k, k') = \tau^{k'-k}\phi'(\tau)^{-k}P(k, k').$$
In terms of p.g.f.s, with $\Phi_\tau(z) := \tau^{-1}\Phi(\tau z/\phi(\tau))$, where $\Phi(z)$ is the total progeny's supercritical generating function (g.f.), solving $\Phi(z) = z\phi(\Phi(z))$, $\Phi(0) = 0$,
$$\Phi_\tau(z) = z\,\phi_\tau(\Phi_\tau(z)), \quad \Phi_\tau(0) = 0.$$
The new $\Phi_\tau(z)$ has a new shifted convergence radius 1 (obeying $\tau z/\phi(\tau) \leq z_c$) and with $\Phi_\tau(1) = \tau^{-1}\Phi(\tau/\phi(\tau)) = 1$ ($\Phi(z_c) = \tau$).
This transformation kills the supercritical paths to only select the critical ones.
When $\phi(z) = (q + pz)^d$, $\tau = q/(p(d-1))$ and $\tau/\phi(\tau) = \frac{q}{p(d-1)}\left(\frac{d-1}{dq}\right)^d$. Hence,
$$\phi_\tau(z) := \phi(\tau z)/\phi(\tau) = \left(\frac{d - 1 + z}{d}\right)^d$$
is the critical binomial branching mechanism.
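The tilting $\phi(\tau z)/\phi(\tau)$ with $\tau = q/(p(d-1))$ can be checked numerically against the closed form $((d - 1 + z)/d)^d$. This is a small illustrative sketch with hypothetical function names; the mean of the tilted law is approximated by a central finite difference.

```python
def phi(z, d, p):
    """Binomial branching p.g.f. (q + p z)^d."""
    return (1.0 - p + p * z) ** d

def critical_tilt(z, d, p):
    """phi_tau(z) = phi(tau z) / phi(tau), with tau = q / (p (d - 1))."""
    tau = (1.0 - p) / (p * (d - 1))
    return phi(tau * z, d, p) / phi(tau, d, p)

def closed_form(z, d):
    """Critical binomial mechanism ((d - 1 + z) / d)^d, free of p."""
    return ((d - 1 + z) / d) ** d

def mean_offspring(f, h=1e-6):
    """Central finite-difference approximation of f'(1)."""
    return (f(1.0 + h) - f(1.0 - h)) / (2.0 * h)
```

Whatever the starting value of $p$ (subcritical or supercritical), `critical_tilt` should agree with `closed_form`, and its mean offspring number should be 1.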
Similarly, one can start with a subcritical branching process ($\mu < 1$) and consider a process whose modified branching mechanism (as a p.g.f.) is $\tilde{\phi}_c(z) = \phi(\tau z)/\phi(\tau) = \phi_\tau(z)$, satisfying $\phi_\tau(1) = 1$ and $\phi_\tau'(1) =: \mu_c = 1$, the one of a critical branching process.
This transformation creates critical paths from the subcritical ones.

2.3. Random Trees as Weighted Combinatorial Trees

Consider the plane tree ordinary generating function (o.g.f.) solving $T(z) = z(1 + T(z))^d$, with $c_n = [z^n]T(z) = \frac{1}{n}[z^{n-1}](1 + z)^{nd} = \frac{1}{n}\binom{nd}{n-1}$ counting the number of such combinatorial binomial size-$n$ trees. By the Stirling formula, with $\eta_c := \sup\{z > 0 : T(z) < \infty\} = \frac{(d-1)^{d-1}}{d^d} < 1$, for large $n$,
$$c_n \sim \frac{1}{\sqrt{2\pi}}\left(\frac{d}{(d-1)^3}\right)^{1/2} n^{-3/2}\eta_c^{-n}.$$
Let $w(\tau_n) = \prod_{b=0}^{n-1} w_b^{n_b(\tau_n)}$ be the (multiplicative) weight of an unlabeled rooted tree $\tau_n$ with $n$ nodes ($|\tau_n| = n$) having $n_b(\tau_n)$ nodes with outdegree (branching number) $b$. The weight $w(\tau_n)$ is the product over the $n$ nodes $x$ of $\tau_n$ of the $w_{b_n(x)}$'s, where $b_n(x)$ is the outdegree of $x$. Then, $W_n = \sum_{\tau_n} w(\tau_n)$ is the weight of all size-$n$ such trees associated with the weight sequence $w := (w_b \geq 0,\ b \geq 0)$, so with $W_n = P(\overline{N}(1) = n)$ if, as in the binomial case, $w_b = p^b q^{d-b}$, $b = 0, \ldots, d$, and the number of these trees is $c_n$. Let $\Phi(z) = \sum_{n \geq 1} z^n W_n$. Then, $\Phi(z)$ solves $\Phi(z) = z\phi(\Phi(z))$, $\Phi(0) = 0$, where $\phi(z) = \sum_{b \geq 0}\pi_b z^b = (q + pz)^d$. Recalling $\sum_{b=0}^{n-1} n_b(\tau_n) = n$ and $\sum_{b=1}^{n-1} b\,n_b(\tau_n) = n - 1$ (the total tree length), each tree $\tau_n$ has equal weight $\prod_{b=0}^{d} w_b^{n_b(\tau_n)} = p^{n-1}q^{n(d-1)+1} = \frac{q}{p}\left(pq^{d-1}\right)^n$ and
$$W_n = P(\overline{N}(1) = n) = c_n\,\frac{q}{p}\left(pq^{d-1}\right)^n,$$
a separable form in agreement with (9). Simply generated weighted trees are weighted versions of such rooted trees and were introduced in [14]. When dealing with $k$-forests of size $n$, owing now to
$$\sum_{b=0}^{n-1} n_b(\tau_n) = n, \quad \sum_{b=1}^{n-1} b\,n_b(\tau_n) = n - k,$$
we have $\prod_{b=0}^{d} w_b^{n_b(\tau_n)} = p^{n-k}q^{n(d-1)+k} = \left(\frac{q}{p}\right)^k\left(pq^{d-1}\right)^n$ and
$$P(\overline{N}(k) = n) = c_{n,k}\left(\frac{q}{p}\right)^k\left(pq^{d-1}\right)^n,$$
where $c_{n,k}$ is the number of combinatorial binomial size-$n$ forests with $k$ trees.

2.4. Selection of Paths Mechanisms of Random Trees: Rescaling

With $a_1, a_2 > 0$, consider the weight sequence $w_b = a_1^b a_2$, for which $\prod_{b=0}^{d} w_b^{n_b(\tau_n)} = a_1^{n-1}a_2^n$. Then,
$$P(\overline{N}(1) = n) \to P(\overline{N}(1) = n)\,a_1^{n-1}a_2^n$$
is the weighted version of $P(\overline{N}(1) = n)$. Equivalently,
$$\Phi(z) \to \tilde{\Phi}(z) = a_1^{-1}\Phi(a_1 a_2 z),$$
solving $\tilde{\Phi}(z) = z\tilde{\phi}(\tilde{\Phi}(z))$, where $\tilde{\phi}(z) = a_2\phi(a_1 z)$ is the modified 'branching mechanism', not necessarily a p.g.f. It is a p.g.f. when $a_2 = 1/\phi(a_1)$ [if in addition $a_1 = \tau$, this is the selection of the critical paths mechanism discussed above; if in addition $a_1 = \rho_e$, this is the selection of the subcritical paths mechanism discussed above].
If $a_1 = 1$ and $a_2 \leq z_c$, $\tilde{\Phi}(z) = \Phi(a_2 z)$, resulting in a weighted version of $\Phi(z)$ with shifted convergence radius $z_c \to \tilde{z}_c = z_c/a_2 \geq 1$ [$= 1$ if in addition $a_2 = z_c$]. Note that $\tilde{\phi}(z) = a_2\phi(z)$ is no longer a branching mechanism if $\phi(z)$ is one with $\phi(1) = 1$.
If $a_1 a_2 = 1$, $\tilde{\phi}(z) = a_1^{-1}\phi(a_1 z)$ is the modified 'branching mechanism', not necessarily a p.g.f. unless $a_1 = \phi(a_1)$. Then, $\tilde{\Phi}(z) = a_1^{-1}\Phi(z)$, resulting in a scaled version of $\Phi(z)$ with an unmodified convergence radius. If $\Phi(1) < 1$, choosing $a_1 = a_2^{-1} = \Phi(1)$ yields $\tilde{\Phi}(1) = 1$.

2.5. Total Number of Leaves (Sterile Individuals) Versus Total Progeny

In the branching population models just discussed, it is important to control the number of leaves in the BGW tree with a single founder, because leaves are nodes (individuals) of the tree (population) that gave birth to no offspring (the frontier of the tree as sterile individuals) and so are responsible for its extinction. Leaves are nodes with outdegree zero, so let $\overline{N}_0(1)$ be the number of leaves in a BGW tree with $\overline{N}(1)$ nodes. The joint p.g.f. $\Phi(z, u_0) = E\left(z^{\overline{N}(1)}u_0^{\overline{N}_0(1)}\right)$ of $(\overline{N}(1), \overline{N}_0(1))$ solves the functional equation
$$\Phi(z, u_0) = z\left(\pi_0(u_0 - 1) + \phi(\Phi(z, u_0))\right). \tag{22}$$
With $\overline{N}_{n,0}(1) := \overline{N}_0(1) \mid \overline{N}(1) = n$, we have
$$E\left(u_0^{\overline{N}_{n,0}(1)}\right) = \frac{[z^n]\Phi(z, u_0)}{[z^n]\Phi(z, 1)},$$
where $\Phi(z, 1) = \Phi(z)$. Using this, it is shown in [15], Th. 3.13, page 84, that, under our assumptions on $\phi$,
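$$\frac{1}{n}E\left(\overline{N}_{n,0}(1)\right) \underset{n \to \infty}{\to} m_0 = \frac{\phi(0)}{\phi(\tau)},$$
$$\frac{1}{n}\sigma^2\left(\overline{N}_{n,0}(1)\right) \underset{n \to \infty}{\to} \sigma_0^2 = \frac{\phi(0)}{\phi(\tau)} - \frac{\phi(0)^2}{\phi(\tau)^2} - \frac{\phi(0)^2}{\tau^2\phi(\tau)\phi''(\tau)},$$
$$\frac{\overline{N}_{n,0}(1) - m_0 n}{\sigma_0\sqrt{n}} \underset{n \to \infty}{\overset{d}{\to}} \mathcal{N}(0, 1).$$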
As $n \to \infty$, $\frac{1}{n}\overline{N}_{n,0}(1)$ converges in probability to $m_0 < 1$, the asymptotic fraction of nodes in a size-$n$ tree which are leaves. For the $\mathrm{Geo}_0(\pi_0)$ generated tree with $\phi(z) = \pi_0/(1 - \bar{\pi}_0 z)$, it can be checked that $m_0 = 1/2$, whereas for the Poisson generated tree with p.g.f. $\phi(z) = e^{\mu(z-1)}$, $m_0 = e^{-1}$. For the negative binomial tree generated by $\phi(z) = (\beta/(1 - \alpha z))^{\theta}$, $m_0 = (\theta/(\theta + 1))^{\theta}$, and for the Flory $d$-ary tree generated by the p.g.f. $\phi(z) = (1 - p + pz)^d$, $m_0 = ((d-1)/d)^d$.
One possible way to see this is as follows.
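With $(x)_m := x(x-1)\cdots(x-m+1)$, the $m$-th falling factorial moments of $\overline{N}_{n,0}(1)$ are given from the Lagrange inversion formula by
$$E\left[\left(\overline{N}_{n,0}(1)\right)_m\right] = \frac{\frac{1}{n}\,\partial^m_{u_0}\big|_{u_0=1}[z^{n-1}]\left(\pi_0(u_0 - 1) + \phi(z)\right)^n}{\frac{1}{n}[z^{n-1}]\phi(z)^n} = (n)_m\,\pi_0^m\,\frac{[z^{n-1}]\phi(z)^{n-m}}{[z^{n-1}]\phi(z)^n}.$$
When $m = 1$ and for $\phi(z) = (q + pz)^d$, a large-$n$ estimate using the Stirling formula yields
$$\frac{1}{n}E\left(\overline{N}_{n,0}(1)\right) \underset{n \to \infty}{\to} m_0 = \left(\frac{d-1}{d}\right)^d,$$
independent of $p$. The variance estimate follows after some elementary algebraic computations dealing with $m = 2$.
When $d = 2$, $\frac{1}{n}E\left(\overline{N}_{n,0}(1)\right) \underset{n \to \infty}{\to} m_0 = 1/4$ as a result of
$$E\left(\overline{N}_{n,0}(1)\right) = n\,\frac{\binom{2(n-1)}{n-1}}{\binom{2n}{n-1}}.$$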
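For the binomial mechanism, taking $m = 1$ in the falling-factorial formula gives $E(\overline{N}_{n,0}(1)) = n\binom{(n-1)d}{n-1}/\binom{nd}{n-1}$ (the powers of $p$ and $q$ cancel), which reduces to the $d = 2$ expression above. The following sketch, with a hypothetical function name, evaluates this exact mean and lets one observe the convergence of the leaf fraction to $((d-1)/d)^d$.

```python
from math import comb

def mean_leaves(n, d):
    """E(number of leaves | tree size n) = n C((n-1)d, n-1) / C(nd, n-1)."""
    return n * comb((n - 1) * d, n - 1) / comb(n * d, n - 1)
```

For example, `mean_leaves(2000, 3) / 2000` should be close to $(2/3)^3$, and `mean_leaves(2000, 2) / 2000` close to $1/4$; for $n = 1$ the single node is a leaf.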
Remark 1.
Taking $z = 1$ in (22), $\Phi(u_0) := \Phi(1, u_0)$ solves the functional equation $\Phi(u_0) = \pi_0(u_0 - 1) + \phi(\Phi(u_0))$, with $\Phi(u_0) = E\left(u_0^{\overline{N}_0(1)};\ \overline{N}(1) < \infty\right)$, where $\overline{N}_0(1)$ is the number of leaves of the tree regardless of its precise number of atoms. By the Lagrange inversion formula, the probability to observe $n_0$ leaves under $\overline{N}(1) < \infty$ is
$$[u_0^{n_0}]\Phi(u_0) = \frac{1}{n_0}[u_0^{n_0-1}]\left(\frac{\pi_0 u_0}{u_0 + \pi_0 - \phi(u_0)}\right)^{n_0}.$$

2.6. Forests

Consider now a $k$-forest of such trees (so with $k$ founders). It takes into account the possibility that there are $k$ independent distinguishable copies of BGW trees, each with a single founder. By the Lagrange formula, with $\overline{N}(k)$ the size of a forest given that it has $k$ founders, consistently with (4), we have
$$P(\overline{N}(k) = n) := [z^n]\Phi(z)^k = \frac{k}{n}[z^{n-1}]z^{k-1}(q + pz)^{nd} = \frac{k}{n}[z^{n-k}](q + pz)^{nd} = \frac{k}{n}\binom{nd}{n-k}\left(\frac{q}{p}\right)^k\left(pq^{d-1}\right)^n.$$
In the critical case ($pd = 1$), $P(\overline{N}(k) = n) = \frac{k}{n}\binom{nd}{n-k}(d-1)^{k-n}(1 - 1/d)^{nd}$. In this case, the BGW process has a constant ($= k$) population size over the generations on average. The extinction probability of a $k$-forest is $\rho_e^k$.
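By the Stirling formula, with $k = n\alpha$ and $\bar{\alpha} := 1 - \alpha$,
$$\binom{nd}{n-k} = \frac{(nd)!}{(n-k)!\,(n(d-1)+k)!} \sim \sqrt{\frac{d}{2\pi\bar{\alpha}(d - \bar{\alpha})n}}\left(\frac{d^d}{\bar{\alpha}^{\bar{\alpha}}(d - \bar{\alpha})^{d - \bar{\alpha}}}\right)^n.$$
As a result, with $n \geq k$, in the thermodynamic limit $n, k \to \infty$ while $k = n\alpha$, $0 < \alpha \leq 1$, the number $K_n$ of connected components (trees) given a size-$n$ forest population obeys
$$P(K_n/n \to \alpha) \sim \alpha\sqrt{\frac{d}{2\pi\bar{\alpha}(d - \bar{\alpha})n}}\left(\frac{d^d\,(q/p)^{\alpha}\,pq^{d-1}}{\bar{\alpha}^{\bar{\alpha}}(d - \bar{\alpha})^{d - \bar{\alpha}}}\right)^n.$$
Hence, the number of trees forming a size-$n$ forest obeys
$$-\frac{1}{n}\log P(K_n/n \to \alpha) \underset{n \to \infty}{\to} f_1(\alpha) \geq 0,$$
where $f_1(\alpha) = -\log a_1(\alpha)$ and $a_1(\alpha) = \frac{d^d\,(q/p)^{\alpha}\,pq^{d-1}}{\bar{\alpha}^{\bar{\alpha}}(d - \bar{\alpha})^{d - \bar{\alpha}}}$. We have
$$f_1'(\alpha) = \log\frac{d - \bar{\alpha}}{\bar{\alpha}} - \log(q/p), \qquad f_1''(\alpha) = \frac{1}{\bar{\alpha}} + \frac{1}{d - \bar{\alpha}} > 0.$$
The function $f_1(\alpha)$ is convex over $0 < \alpha \leq 1$. In the subcritical regime ($pd < 1$), it has a minimum at $\alpha_* = 1 - pd$, with $f_1(\alpha_*) = 0$, translating that $K_n/n \to \alpha_*$ almost surely as $n \to \infty$.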
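The rate function for the number of trees in a size-$n$ forest, $f_1(\alpha) = -\log a_1(\alpha)$ with $a_1(\alpha) = d^d (q/p)^{\alpha} p q^{d-1} / (\bar{\alpha}^{\bar{\alpha}}(d - \bar{\alpha})^{d - \bar{\alpha}})$ and $\bar{\alpha} = 1 - \alpha$, can be evaluated directly. A minimal sketch follows (the function name is hypothetical, and the evaluation is restricted to $0 < \alpha < 1$ to avoid the boundary term $0\log 0$); in a subcritical example it lets one check that $f_1$ is non-negative and vanishes at $\alpha_* = 1 - pd$.

```python
from math import log

def f1(alpha, d, p):
    """Rate function f_1(alpha) = -log a_1(alpha), for 0 < alpha < 1."""
    q, ab = 1.0 - p, 1.0 - alpha
    log_a1 = (d * log(d) + alpha * log(q / p) + log(p * q ** (d - 1))
              - ab * log(ab) - (d - ab) * log(d - ab))
    return -log_a1
```

With $d = 3$ and $p = 0.2$ (so $pd = 0.6 < 1$), `f1(0.4, 3, 0.2)` should vanish up to rounding, and a grid search over $\alpha$ should locate the minimum at $\alpha_* = 0.4$.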
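Similarly, with $\overline{N}(k)$ the number of atoms in a $k$-forest, and with $n = k\rho$, $\rho \geq 1$,
$$P(\overline{N}(k)/k \to \rho) \sim \sqrt{\frac{d}{2\pi k\,\rho(\rho - 1)((d-1)\rho + 1)}}\left(d^{d\rho}\left(\frac{p\rho}{\rho - 1}\right)^{\rho - 1}\left(\frac{q\rho}{(d-1)\rho + 1}\right)^{(d-1)\rho + 1}\right)^k.$$
With $a_2(\rho) := d^{d\rho}\left(\frac{p\rho}{\rho - 1}\right)^{\rho - 1}\left(\frac{q\rho}{(d-1)\rho + 1}\right)^{(d-1)\rho + 1}$ and
$$f_2(\rho) := -\log a_2(\rho) = -\rho\log\left(\frac{p\,q^{d-1}\rho^d d^d}{(\rho - 1)((d-1)\rho + 1)^{d-1}}\right) + \log\left(\frac{p\,((d-1)\rho + 1)}{q\,(\rho - 1)}\right)$$
(convex over the domain $\rho \geq 1$), by Cramér's theorem [16],
$$-\frac{1}{k}\log P(\overline{N}(k)/k \to \rho) \underset{k \to \infty}{\to} f_2(\rho) \geq 0.$$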
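The rate function $f_2(\rho)$ is the Legendre transform of the convex free energy $\log\Phi(z)$. Therefore,
$$f_2(\rho) = \rho\log z_\rho - \log\Phi(z_\rho), \quad \text{where} \quad \frac{z_\rho\Phi'(z_\rho)}{\Phi(z_\rho)} = \rho,$$
with
$$z_\rho = \frac{q}{p}\,\frac{(\rho - 1)((d-1)\rho + 1)^{d-1}}{(\rho q d)^d}, \quad \Phi(z_\rho) = \frac{q(\rho - 1)}{p((d-1)\rho + 1)}, \quad \Phi'(z_\rho) = z_\rho^{-1}\rho\,\Phi(z_\rho).$$
As required, as $\rho \to \infty$, $z_\rho \to z_c$, $\Phi(z_\rho) \to \Phi(z_c) = \tau$ and $\Phi'(z_\rho) \to \infty$.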
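The two expressions of the forest-size rate function (the direct form $-\log a_2(\rho)$ and the Legendre form $\rho\log z_\rho - \log\Phi(z_\rho)$) can be compared numerically. The sketch below uses hypothetical function names and obtains $z_\rho$ from $\Phi(z_\rho)$ through $z = \Phi/\phi(\Phi)$, which follows from $\Phi(z) = z\phi(\Phi(z))$. In a subcritical example, both should vanish at $\rho = E(\overline{N}(1)) = 1/(1 - pd)$.

```python
from math import log

def f2_direct(rho, d, p):
    """f_2(rho) = -log a_2(rho), for rho > 1."""
    q = 1.0 - p
    m = (d - 1) * rho + 1
    return -(d * rho * log(d) + (rho - 1) * log(p * rho / (rho - 1))
             + m * log(q * rho / m))

def f2_legendre(rho, d, p):
    """rho * log(z_rho) - log(Phi(z_rho)), using the closed form of Phi(z_rho)."""
    q = 1.0 - p
    m = (d - 1) * rho + 1
    Phi = q * (rho - 1) / (p * m)
    z = Phi / (q + p * Phi) ** d  # z = Phi / phi(Phi)
    return rho * log(z) - log(Phi)
```

With $d = 3$, $p = 0.2$, the mean tree size is $1/(1 - 0.6) = 2.5$, so `f2_direct(2.5, 3, 0.2)` should be zero up to rounding.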
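In the subcritical case, a Central Limit Theorem (CLT) holds:
$$\frac{\overline{N}(k) - kE(\overline{N}(1))}{\sqrt{k}\,\sigma(\overline{N}(1))} \overset{d}{\to} \mathcal{N}(0, 1).$$
In the binary case ($d = 2$),
$$f_2(\rho) = -\log\left(4^{\rho}\left(\frac{p\rho}{\rho - 1}\right)^{\rho - 1}\left(\frac{q\rho}{\rho + 1}\right)^{\rho + 1}\right), \quad f_2'(\rho) = \log\frac{\rho^2 - 1}{\rho^2} - \log(4pq), \quad f_2''(\rho) = \frac{2}{\rho(\rho^2 - 1)} > 0.$$
In this case,
$$z_\rho = \frac{\rho^2 - 1}{4pq\rho^2}, \quad \Phi(z_\rho) = \frac{q(\rho - 1)}{p(\rho + 1)}, \quad \Phi'(z_\rho) = \frac{4q^2\rho^3}{(\rho + 1)^2}.$$
Note $f_2'(\rho) = 0$ when $\rho =: \rho_* = 1/\sqrt{1 - 4pq} = 1/|1 - 2p| > 1$. In the subcritical case, $\rho_* = 1/(1 - 2p) > 1$, with $f_2(\rho_*) = 0$, translating $\overline{N}(k)/k \to \rho_*$ almost surely as $k \to \infty$.
In the critical case, $\overline{N}(1)$ is heavy tailed with tail index $1/2$; we thus expect that $k^{-2}\overline{N}(k) \overset{d}{\to} S_{1/2}$ (a stable-$1/2$ random variable) and therefore that $n^{-1/2}K_n \overset{d}{\to} S_{1/2}^{-1/2}$ ($n \to \infty$).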
In the supercritical case, $\rho_* = 1/(2p - 1) > 1$, with $f_2(\rho_*) = -\log\rho_e = 2\log(p/q) > 0$.
Remark 2.
(i) The process $(\overline{N}(k))_k$ has stationary independent increments. The processes $\overline{N}(k)$ and $K_n$ are mutual inverses in that
$$K_n = \inf\{k : \overline{N}(k) > n\}.$$
The process $(K_n)_n$ is a renewal process with times elapsed between consecutive moves up by one unit all distributed like $\overline{N}(1)$:
$$K_n \overset{d}{=} 1\cdot\mathbf{1}\left(\overline{N}(1) > n - 1\right) + \sum_{m=1}^{n-1}\mathbf{1}\left(\overline{N}(1) = m\right)\cdot(1 + K_{n-m}), \quad n \geq 1.$$
(ii) Considering the weak case with $\Phi_0(z)$ substituted for $\Phi(z)$, the Lagrange inversion formula yields
$$P(\overline{N}(k) = n) := [z^n]\Phi_0(z)^k = \frac{k}{n}[z^{n-1}]z^{k-1}(q + pz)^{(n+k-1)d} = \frac{k}{n}[z^{n-k}](q + pz)^{(n+k-1)d} = \frac{k}{n\,q^d}\binom{(n+k-1)d}{n-k}\left(\frac{q^{d+1}}{p}\right)^k\left(pq^{d-1}\right)^n.$$
In the upper binomial term, $n$ is changed to $n + k - 1$ ($\sim k(\rho + 1)$). By the Stirling formula, a new $f_2(\rho)$ is obtained by substituting $\rho + 1$ for $\rho$ in (28). Note that now the new $\rho \geq 0$.
Maxwell–Boltzmann $d$-partition of $n$ into $k$ parts.
Let $\overline{\mathbf{N}}_{n,k} = (\overline{N}_{n,1}, \ldots, \overline{N}_{n,k})$ be the joint total progenies of each of the $k$ founders. With $\mathbf{n}_k = (n_1, \ldots, n_k)$, we have
$$P(\overline{\mathbf{N}}_{n,k} = \mathbf{n}_k) = \frac{[z^n\prod_{l=1}^k z_l^{n_l}]\prod_{l=1}^k\Phi(z z_l)}{[z^n]\Phi(z)^k} = \frac{\prod_{l=1}^k P(\overline{N}(1) = n_l)}{P(\overline{N}(k) = n)}\,\delta_{|\mathbf{n}_k| = n} = \frac{\prod_{l=1}^k\frac{1}{n_l}\binom{n_l d}{n_l - 1}}{\frac{k}{n}\binom{nd}{n-k}}\,\delta_{|\mathbf{n}_k| = n},$$
where $|\mathbf{n}_k| = \sum_{l=1}^k n_l$ sums to $n$. It is an exchangeable Maxwell–Boltzmann balls-in-boxes distribution (independent of $p, q$) on the simplex $|\mathbf{n}_k| = n$, where the balls consist of the progenies of each founder (the boxes).
The size of a typical (one-dimensional marginal) box occupancy is given by
$$P(\overline{N}_{n,1} = n_1) = \frac{[z^{n_1}]\Phi(z)\,[z^{n-n_1}]\Phi(z)^{k-1}}{[z^n]\Phi(z)^k}$$
$$= \frac{(k-1)\,n}{k\,n_1(n - n_1)}\,\frac{\binom{n_1 d}{n_1 - 1}\binom{(n - n_1)d}{n - n_1 - k + 1}}{\binom{nd}{n-k}}.$$
With $n_1 = 1, \ldots, n - k + 1$, summing over $n_1$, the pmf of $\overline{N}_{n,1}$ is also seen to be non-defective and proper. Clearly, $E(\overline{N}_{n,1}) = n/k$.
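The properness of the one-dimensional marginal and its mean $n/k$ can be verified exactly for small parameters. The sketch below (hypothetical function name, exact rational arithmetic) implements the marginal $\frac{(k-1)n}{k\,n_1(n-n_1)}\binom{n_1 d}{n_1-1}\binom{(n-n_1)d}{n-n_1-k+1}/\binom{nd}{n-k}$ for a size-$n$ forest of $k \geq 2$ trees.

```python
from fractions import Fraction
from math import comb

def marginal_pmf(n1, n, k, d):
    """P(N_{n,1} = n1) for a size-n forest of k >= 2 trees, 1 <= n1 <= n - k + 1."""
    rest = n - n1  # total size of the other k - 1 trees (at least k - 1)
    weight = Fraction((k - 1) * n, k * n1 * rest)
    return (weight * comb(n1 * d, n1 - 1) * comb(rest * d, rest - k + 1)
            / comb(n * d, n - k))
```

For instance, with $n = 12$, $k = 3$, $d = 2$, summing `marginal_pmf(n1, 12, 3, 2)` over $n_1 = 1, \ldots, 10$ should give exactly 1, and the mean should be exactly $n/k = 4$.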
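A large-population thermodynamic limit exists ($n, k \to \infty$ with $n/k \to \rho \geq 1$), with $\overline{N}_{n,1}$ converging to a limit $\overline{N}_1(\rho)$ having the mean-$\rho$ limiting distribution given, from a saddle-point analysis, by
$$E\left(z^{\overline{N}_1(\rho)}\right) = \frac{\Phi(z z_\rho)}{\Phi(z_\rho)},$$
where $z_\rho \in (0, z_c)$ solves
$$\frac{z\Phi'(z)}{\Phi(z)} = \rho.$$
For all $\rho \geq 1$, $z_\rho$ is uniquely defined, firstly because $z\Phi'(z)/\Phi(z)$ is increasing and secondly because, as observed before, when $z \to z_c$, $\Phi(z) \to \Phi(z_c) = \tau$ and $\Phi'(z) \to \infty$ with $\rho \to \infty$.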
Strict d-partition of n into k parts.
So far, as in [17], we allow the population size to vary stochastically according to a Galton–Watson branching process, possibly with a constant size on average as in the critical case. However, most population genetics studies have their origins in a Wright–Fisher or some closely related fixed-population model, in which each individual randomly chooses its ancestor [18]. We briefly describe the situation relative to the binomial branching mechanism, in which the process strictly (almost surely) has constant population size $k$ over the generations. Consider $k$ independent and identically distributed random variables $\boldsymbol{\xi}_k := (\xi_1, \ldots, \xi_k)$, each with binomial distribution $\boldsymbol{\pi} := (\pi_0, \ldots, \pi_d)$. Let $\boldsymbol{\nu}_k = (\nu_{1,k}, \ldots, \nu_{k,k}) := (\xi_1, \ldots, \xi_k) \mid |\boldsymbol{\xi}_k| = k$. Then, with $l_1 + \cdots + l_k = k$,
$$P(\boldsymbol{\nu}_k = (l_1, \ldots, l_k)) = \frac{\pi_{l_1}\cdots\pi_{l_k}}{[z^k](q + pz)^{kd}} = \frac{\binom{d}{l_1}\cdots\binom{d}{l_k}}{\binom{kd}{k}}, \tag{33}$$
recalling $\pi_l = \binom{d}{l}p^l q^{d-l}$ ($= 0$ if $l \notin \{0, \ldots, d\}$) and $[z^k](q + pz)^{kd} = \binom{kd}{k}p^k q^{k(d-1)}$. This distribution is independent of $p$ and $q$. As a consequence, we obtain the identity
$$\sum_{l_1 + \cdots + l_k = k}\binom{d}{l_1}\cdots\binom{d}{l_k} = \binom{kd}{k}.$$
The distribution of ν k is exchangeable, the law of each component being
P ν 1 , k = l 1 = z k z 1 l 1 q + p z z 1 d q + p z k 1 d z k q + p z k d = d l 1 k 1 d k l 1 k d k , l 1 = 0 , , d k .
When d , ref. (33) boils down to
P ν k = l 1 , , l k = k ! i = 1 k l i ! k k ,
the multinomial Wright–Fisher distribution. Asymptotic independence is obtained when k in which the law of ν takes the product form of independent and identically distributed mean 1 Poisson distributions.
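The conditioned-binomial law above can be checked by brute force for small $k$ and $d$. The following is a minimal sketch, not part of the original paper: the function names are hypothetical, and it simply enumerates all offspring vectors summing to $k$ to confirm the normalizing identity and the hypergeometric form of the one-dimensional marginal.

```python
from itertools import product
from math import comb

def conditioned_offspring_weights(k: int, d: int) -> dict:
    """Unnormalised weights prod_i C(d, l_i) over all (l_1, ..., l_k) summing to k."""
    weights = {}
    for ls in product(range(min(d, k) + 1), repeat=k):
        if sum(ls) == k:
            w = 1
            for l in ls:
                w *= comb(d, l)
            weights[ls] = w
    return weights

def first_component_weight(k: int, d: int, l1: int) -> int:
    """Total weight of the configurations whose first component equals l1."""
    return sum(w for ls, w in conditioned_offspring_weights(k, d).items() if ls[0] == l1)

if __name__ == "__main__":
    k, d = 4, 3
    total = sum(conditioned_offspring_weights(k, d).values())
    print(total, comb(k * d, k))  # both sides of the normalizing identity
    for l1 in range(min(d, k) + 1):
        print(l1, first_component_weight(k, d, l1) / total)
```

Dividing each weight by `comb(k * d, k)` gives the probabilities, which do not depend on $p$ or $q$.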

2.7. Random Simple Trees with Given Outdegree Sequences

The joint generating function of simple trees with given outdegree sequence solves
$$\Phi(z, \mathbf{u}) = z \sum_{b=0}^{d} \pi_b\, u_b\, \Phi(z, \mathbf{u})^{b}.$$
Here, $\mathbf{u} := (u_0, \dots, u_d)$ marks the nodes with the different outdegrees. Hence,
$$\Big[z^n \prod_{b=0}^{d} u_b^{n_b}\Big] \Phi(z, \mathbf{u}) = \frac{1}{n} \Big[z^{n-1} \prod_{b=0}^{d} u_b^{n_b}\Big] \Big(\sum_{b=0}^{d} \pi_b u_b z^b\Big)^{n} = \frac{1}{n} \binom{n}{n_0, n_1, \dots, n_d} \prod_{b=0}^{d} \pi_b^{n_b} = \frac{1}{n} \binom{n}{n_0} \binom{n-n_0}{n_1, \dots, n_d} \prod_{b=0}^{d} \pi_b^{n_b}$$
with the $n_b$'s obeying
$$n_1 + \dots + n_d = n - n_0, \qquad \sum_{b=1}^{d} b\, n_b = n-1.$$
As a result: $n_0 = 1 + \sum_{b=1}^{d} (b-1) n_b$ and $n = 1 + \sum_{b=1}^{d} b\, n_b$.
There are $\binom{n-n_0+d-1}{d-1}$ such non-negative $n_b$'s satisfying the first constraint (as a weak composition). In the sequel, we shall use the symbol $**$ whenever summing over the $\mathbf{n}_d := (n_1, \dots, n_d)$ obeying the two constraints (34) above. Clearly, the number of non-negative integers solving (34) is given by
$$[u^{n-n_0} z^{n-1}] \prod_{b=1}^{d} \left(1 - u z^{b}\right)^{-1}.$$
It is the number of unordered partitions of n 1 into no more than d non-negative parts, the number of occurrences of part b in a partition being n b with b = 1 d n b = n n 0 .
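The joint generating function (p.g.f.) of their nodes and leaves in particular reads
$$\Phi(z, u_0) = z \left( q^{d}(u_0 - 1) + (q + p\,\Phi(z, u_0))^{d} \right),$$
hence with
$$[z^n] \Phi(z, u_0) = \frac{1}{n} [z^{n-1}] \left( q^{d}(u_0 - 1) + (q+pz)^{d} \right)^{n} = \frac{1}{n} [z^{n-1}] \sum_{n_0=0}^{n} \binom{n}{n_0} q^{n_0 d}\, u_0^{n_0} \left( (q+pz)^{d} - q^{d} \right)^{n-n_0}.$$
Therefore, the probability of a configuration with $n$ atoms and $n_0$ leaves is
$$[z^n u_0^{n_0}] \Phi(z, u_0) = \frac{q^{n_0 d}}{n} \binom{n}{n_0} [z^{n-1}] \left( (q+pz)^{d} - q^{d} \right)^{n-n_0}$$
$$= \frac{\pi_0^{n_0}}{n} \binom{n}{n_0} [z^{n-1}] \Big( \sum_{b=1}^{d} \pi_b z^{b} \Big)^{n-n_0} = \frac{\pi_0^{n_0}}{n} \binom{n}{n_0} \sum_{\mathbf{n}_d}^{**} \binom{n-n_0}{n_1, \dots, n_d} \prod_{b=1}^{d} \pi_b^{n_b},$$
in view of
$$\Big( \sum_{b=1}^{d} \pi_b z^{b} \Big)^{n-n_0} = \sum_{n_1 + \dots + n_d = n-n_0} z^{\sum_{b=1}^{d} b\, n_b} \binom{n-n_0}{n_1, \dots, n_d} \prod_{b=1}^{d} \pi_b^{n_b}.$$
From (35), given $n$ and $n_0$, the number of $d$-trees whose nodes have outdegree sequence $(n_1, \dots, n_d)$ satisfying $**$ in (34) (as the factor in front of the weights $\prod_{b=1}^{d} \pi_b^{n_b}$) is
$$c_{n,n_0}(\mathbf{n}_d) = \frac{(n-1)!}{n_0!}\, \frac{1}{\prod_{b=1}^{d} n_b!},$$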
in agreement with [19,20] Theorem 4.
All this can prove useful and explicit in the following situation: suppose we are interested in a specific set of $\mathbf{n}_d$'s, from which the values of $n, n_0$ follow from (34). Then, the corresponding number of $d$-trees is known, together with the probability of such a configuration. For example, suppose $n_b = n_1$ for $b = 1, \dots, d$. Then, $n = 1 + n_1 d(d+1)/2$ and $n_0 = n - d\, n_1$, and the probability of this uniform configuration is
$$\frac{(n-1)!}{n_0!\, (n_1!)^{d}}\; \pi_0^{n_0} \prod_{b=1}^{d} \pi_b^{n_1}.$$
Remark 3.
In the binary case ($d = 2$), for fixed values of $n$ and $n_0 \in \{1, \dots, \lfloor (n+1)/2 \rfloor\}$, we have $n_1 = n - 2n_0 + 1$ and $n_2 = n_0 - 1$ (a single possible choice for $(n_1, n_2)$); if $n - 2n_0 + 1 < 0$, there are no solutions to (34). Hence,
$$P(\overline{N}(1) = n, \overline{N}_b(1) = n_b;\ b = 0, 1, 2) = \frac{1}{n} \binom{n}{n_0} \binom{n-n_0}{n_1, n_2} \prod_{b=0}^{2} \pi_b^{n_b} \cdot \delta_{n_1 = n - 2n_0 + 1,\ n_2 = n_0 - 1}.$$
In addition, while summing over $n_0$,
$$P(\overline{N}(1) = n) = \frac{q}{np} \binom{2n}{n-1} (pq)^{n},$$
in agreement with (9). Given $n, n_0$, there are
$$c_{n,n_0}(n_1, n_2) = \frac{(n-1)!}{n_0!\,(n_0-1)!\,(n-2n_0+1)!}$$
such binary simple trees. For instance, $n = 10$, $n_0 = 4$ yields $9!/(4!\,3!\,3!) = 420$. There are no binary trees with $n = 4$, $n_0 = 3$.
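The count in Remark 3 is easy to evaluate numerically. The sketch below is illustrative only (the function name is hypothetical); it evaluates $c_{n,n_0}(n_1, n_2)$ and checks that, once each node of outdegree $b$ is given the combinatorial weight $\binom{2}{b}$, summing over $n_0$ recovers $\frac{1}{n}\binom{2n}{n-1}$, which is the $p$, $q$-free part of $P(\overline{N}(1) = n)$.

```python
from math import comb, factorial

def binary_simple_tree_count(n: int, n0: int) -> int:
    """Number c_{n,n0} of binary simple trees with n nodes and n0 leaves.

    The outdegree counts are forced: n1 = n - 2*n0 + 1 and n2 = n0 - 1.
    Returns 0 when no tree exists for the pair (n, n0).
    """
    n1 = n - 2 * n0 + 1
    n2 = n0 - 1
    if n0 < 1 or n1 < 0:
        return 0
    return factorial(n - 1) // (factorial(n0) * factorial(n2) * factorial(n1))

if __name__ == "__main__":
    print(binary_simple_tree_count(10, 4))
    print(binary_simple_tree_count(4, 3))
    n = 10
    # Weight 2 for each unary node (C(2,1) = 2), weight 1 for leaves and binary nodes.
    weighted = sum(
        binary_simple_tree_count(n, n0) * 2 ** (n - 2 * n0 + 1)
        for n0 in range(1, (n + 1) // 2 + 1)
    )
    print(weighted, comb(2 * n, n - 1) // n)
```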
As an extension, given n and n 0 , the number of k-forests of simple d-trees having outdegrees nodes sequence n 1 , , n d is
c n , n 0 , k n d = n k ! n 0 k + 1 ! 1 b = 1 d n b ! .
Here n 1 , , n d now satisfy
n 1 + + n d = n n 0 b = 1 d b n b = n k .
This is in agreement with Theorem 4 of [1].

2.8. The Limiting Poisson Case ( d )

We here mention some related computations encompassing the limiting Poisson case. Let $T(z)$ solve $T(z) = z e^{T(z)}$ with $T(0) = 0$, and hence with solution
$$T(z) = \sum_{n \geq 1} \frac{n^{n-1}}{n!} z^{n},$$
with, for $n \geq 1$,
$$C_n := n!\,[z^n] T(z) = n^{n-1}.$$
$C_n$ counts the number of labeled Cayley simple rooted trees (the Cayley formula). The convergence radius of $T(z)$ is $\varrho_c = 1/e$.
The g.f. of random Poisson($\mu$) rooted simple trees solves the functional equation $\Phi(z) = z \phi(\Phi(z))$ with $\phi(z) = e^{-\mu(1-z)}$ and $\Phi(0) = 0$. Hence, $\mu \Phi(z) = T(\mu e^{-\mu} z)$, that is,
$$\Phi(z) = \frac{1}{\mu} \sum_{n \geq 1} \frac{n^{n-1}}{n!} (z/z_c)^{n},$$
where $z_c := e^{\mu}/\mu > 1$, so that the convergence radius of $\Phi(z)$ is $z_c/e = e^{\mu-1}/\mu \geq 1$. Then,
$$P(\overline{N}(1) = n) = [z^n] \Phi(z) = \frac{n^{n-1}}{n!}\, \frac{1}{\mu z_c^{n}} = \frac{n^{n-1}}{n!}\, \mu^{n-1} e^{-n\mu}$$
is the Borel distribution. With $w_b = \mu^{b} e^{-\mu}$ the weight of a node with outdegree $b$, the weight of a size-$n$ tree is
$$w(\tau_n) = \prod_{b \geq 0} w_b^{n_b} = \mu^{n-1} e^{-n\mu},$$
independent of the $n_b$'s. Hence, $P(\overline{N}(1) = n) = \frac{C_n}{n!}\, w(\tau_n)$, a separable form.
Furthermore, for $k$-forests of such simple trees,
$$P(\overline{N}(k) = n) = \frac{k}{n} [z^{n-k}]\, e^{-n\mu(1-z)} = \frac{k}{n}\, e^{-n\mu}\, \frac{(\mu n)^{n-k}}{(n-k)!},$$
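the Borel–Tanner distribution [21]. Using the Stirling formula, with $n = k\rho$ and $\rho > 1$,
$$P(\overline{N}(k) = k\rho) \sim \frac{1}{\rho \sqrt{2\pi k(\rho-1)}}\; e^{-k(\rho\mu - \rho + 1)} \left( \frac{\mu\rho}{\rho-1} \right)^{k(\rho-1)},$$
showing that
$$-\frac{1}{k} \log P(\overline{N}(k) = k\rho) \to f_2(\rho) = \rho\mu - (\rho-1) + (\rho-1) \log \frac{\rho-1}{\mu\rho}.$$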
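As a numerical sanity check on the Borel–Tanner distribution above, the following sketch (not taken from the paper; the function name is hypothetical) evaluates the pmf in log-space and sums it in the subcritical regime $\mu < 1$, where the total mass should be 1 and the mean should be $k/(1-\mu)$, the expected total progeny of $k$ independent subcritical founders.

```python
from math import exp, lgamma, log

def borel_tanner_pmf(n: int, k: int, mu: float) -> float:
    """P(N(k) = n) = (k/n) exp(-n mu) (mu n)^(n-k) / (n-k)!, evaluated via logarithms."""
    if n < k:
        return 0.0
    log_p = log(k) - log(n) - n * mu + (n - k) * log(mu * n) - lgamma(n - k + 1)
    return exp(log_p)

if __name__ == "__main__":
    k, mu = 3, 0.5
    support = range(k, 600)  # truncation point chosen so the neglected tail is negligible
    total = sum(borel_tanner_pmf(n, k, mu) for n in support)
    mean = sum(n * borel_tanner_pmf(n, k, mu) for n in support)
    print(total, mean, k / (1 - mu))
```

Working with logarithms avoids overflow in $(\mu n)^{n-k}$ and $(n-k)!$ for large $n$.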
The joint combinatorial generating function of their nodes and leaves reads
$$T(z, u_0) = z \left( u_0 - 1 + e^{T(z, u_0)} \right),$$
hence with
$$[z^n] T(z, u_0) = \frac{1}{n} [z^{n-1}] \left( u_0 - 1 + e^{z} \right)^{n} = \frac{1}{n} [z^{n-1}] \sum_{n_0=0}^{n} \binom{n}{n_0} u_0^{n_0} \left( e^{z} - 1 \right)^{n-n_0}.$$
Therefore,
$$C_{n,n_0} := n!\,[z^n u_0^{n_0}] T(z, u_0) = (n-1)! \binom{n}{n_0} [z^{n-1}] \left( e^{z} - 1 \right)^{n-n_0}$$
$$= \frac{n!}{n_0!}\, S_{n-1,\,n-n_0}, \quad n_0 = 1, \dots, n-1,$$
due to the vertical g.f. of second-kind Stirling numbers $S_{n,n_0}$ (see [22]):
$$\sum_{n \geq n_0} S_{n,n_0} \frac{z^{n}}{n!} = \frac{(e^{z} - 1)^{n_0}}{n_0!} \;\Rightarrow\; [z^n] (e^{z} - 1)^{n_0} = \frac{n_0!}{n!}\, S_{n,n_0}.$$
We obtain
$$n_0\, C_{n+1,n_0} = n_0 (n+1)\, C_{n,n_0} + (n+1)(n - n_0 + 1)\, C_{n,n_0-1},$$
with boundary conditions $C_{n,0} = C_{n,n} = 0$, $C_{n,1} = n!$ and $C_{n,n-1} = n$. In addition,
$$C_n := n!\,[z^n] T(z, 1) = n^{n-1} = \sum_{n_0=1}^{n-1} C_{n,n_0}.$$
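The closed form for $C_{n,n_0}$ and its recurrence can be verified with exact integer arithmetic. This is a minimal sketch under the stated formula, with hypothetical helper names; it builds a row of second-kind Stirling numbers from their standard triangular recurrence and evaluates $C_{n,n_0} = \frac{n!}{n_0!} S_{n-1,n-n_0}$ for $n \geq 2$.

```python
from math import factorial

def stirling2_row(m: int) -> list:
    """Row [S(m,0), ..., S(m,m)] of second-kind Stirling numbers."""
    row = [1]  # S(0,0) = 1
    for i in range(1, m + 1):
        new = [0] * (i + 1)
        for k in range(1, i + 1):
            keep = k * row[k] if k < len(row) else 0  # k * S(i-1, k)
            new[k] = keep + row[k - 1]                # + S(i-1, k-1)
        row = new
    return row

def cayley_leaf_counts(n: int) -> dict:
    """Map n0 -> C_{n,n0}, the number of Cayley rooted trees on n >= 2 nodes with n0 leaves."""
    row = stirling2_row(n - 1)
    return {n0: factorial(n) // factorial(n0) * row[n - n0] for n0 in range(1, n)}

if __name__ == "__main__":
    for n in range(2, 7):
        counts = cayley_leaf_counts(n)
        print(n, counts, sum(counts.values()), n ** (n - 1))
```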
Assuming uniform sampling, we have P L n = n 0 = C n , n 0 C n > 0 , n 0 = 1 , , n 1 (otherwise, 0): the law of L n has finite support, varying with n. From (41), with L n = N ¯ n 0 1 , we obtain
n 0 P L n + 1 = n 0 = n 0 C n + 1 , n 0 C n + 1 =
n n + 1 n 1 n 0 P L n = n 0 + n n 0 1 P L n = n 0 1 .
The latter recursion (41) may be written as
P L n + 1 = n 0 = q n 0 , n 0 n P L n = n 0 + q n 0 1 , n 0 n P L n = n 0 1 ,
defining the (positive) transition coefficients q n 0 , n 0 n and q n 0 1 , n 0 n (not transition probabilities because q n 0 , n 0 n + q n 0 , n 0 + 1 n 1 ). This three-term (‘space-time’ inhomogeneous) recurrence is therefore not the one of a standard Markov chain with a usual probability transition matrix. However, it is the one of a triangular Markovian probability sequence whose support varies with n linearly.
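Next, the identity ([23])
$$\sum_{n_0=1}^{n-1} C_{n,n_0}\, \frac{(x)_{n-n_0}}{(n)_{n-n_0}} = x^{n-1},$$
where $(x)_m := x(x-1)\cdots(x-m+1)$ denotes the falling factorial, yields (with $x = n-1$): $\sum_{n_0=1}^{n-1} n_0\, C_{n,n_0} = n (n-1)^{n-1}$. Also,
$$\partial_{u_0} T(z, 1) = \frac{z}{1 - T(z, 1)},$$
with
$$[z^n]\, \partial_{u_0} T(z, 1) = [z^{n-1}] \frac{1}{1 - T(z, 1)} = \frac{1}{n-1} [z^{n-2}] \frac{e^{(n-1)z}}{(1-z)^{2}} = \frac{(n-1)^{n-1}}{(n-1)!},$$
so that
$$E(L_n) = \frac{[z^n]\, \partial_{u_0} T(z, 1)}{[z^n]\, T(z, 1)} = n \left( 1 - \frac{1}{n} \right)^{n-1} \sim n/e,$$
$$\sigma^{2}(L_n) = n(n-1) \left( 1 - \frac{2}{n} \right)^{n-1} + n \left( 1 - \frac{1}{n} \right)^{n-1} - n^{2} \left( 1 - \frac{1}{n} \right)^{2(n-1)} \sim n\, \frac{e-2}{e^{2}}.$$
The variance term is obtained by plugging $x = n-2$ into the identity, which gives the second factorial moment. The Central Limit Theorem (CLT) holds (see [23]):
$$\frac{L_n - E(L_n)}{\sigma(L_n)} \xrightarrow{d} \mathcal{N}(0, 1).$$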
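The exact mean and variance of the number of leaves $L_n$ under uniform sampling can be compared with the closed forms above. The sketch below is an illustration, not the author's code: the function names are hypothetical, and the leaf counts are generated from the three-term recurrence for $C_{n,n_0}$, started at $C_{2,1} = 2$.

```python
from fractions import Fraction

def cayley_leaf_counts_by_recursion(n: int) -> dict:
    """Map n0 -> C_{n,n0} for n >= 2, using
    n0 C_{m+1,n0} = n0 (m+1) C_{m,n0} + (m+1)(m-n0+1) C_{m,n0-1}."""
    counts = {1: 2}  # n = 2: both rooted labeled trees on two nodes have a single leaf
    for m in range(2, n):
        nxt = {}
        for n0 in range(1, m + 1):
            same = counts.get(n0, 0)
            fewer = counts.get(n0 - 1, 0)
            nxt[n0] = (m + 1) * same + (m + 1) * (m - n0 + 1) * fewer // n0
        counts = nxt
    return counts

def cayley_leaf_moments(n: int):
    """Exact mean and variance of L_n when each of the n^(n-1) trees is equally likely."""
    counts = cayley_leaf_counts_by_recursion(n)
    total = n ** (n - 1)
    mean = Fraction(sum(n0 * c for n0, c in counts.items()), total)
    second = Fraction(sum(n0 * n0 * c for n0, c in counts.items()), total)
    return mean, second - mean ** 2

if __name__ == "__main__":
    for n in (5, 10, 20):
        mean, var = cayley_leaf_moments(n)
        print(n, float(mean) / n, float(var) / n)  # ratios to compare with 1/e and (e-2)/e^2
```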
With $z_\rho$ obeying $z_\rho \Phi'(z_\rho)/\Phi(z_\rho) = \rho \geq 1$, $\Phi(z_\rho z)/\Phi(z_\rho)$ is the mean-$\rho$ 'Cayley' distribution of the typical box occupancy in the thermodynamic limit $n, k \to \infty$, $n/k = \rho$.

2.9. The Case d = 1

Combinatorial linear BGW trees are those for which $T(z) = z(1 + T(z))$, $T(0) = 0$, yielding $T(z) = z/(1-z)$, with $c_n = [z^n] T(z) = 1$. The branching number of a node is either zero or one, leading to 'threadlike trees'. We have $c_n = n^{-1} [z^{n-1}] (1+z)^{n} = 1$.
Furthermore, $c_{n,k} = [z^n] T(z)^{k} = [z^{n-k}] (1-z)^{-k} = \binom{n-1}{k-1}$, the number of (ordered) compositions of $n$ into $k$ parts. Note that by the Lagrange inversion formula, $c_{n,k} = \frac{k}{n} [z^{n-k}] (1+z)^{n}$.
Random linear BGW trees are those for which $\Phi(z) = z(q + p\,\Phi(z))$, $\Phi(0) = 0$, yielding $\Phi(z) = qz/(1 - pz)$, with $[z^n] \Phi(z) = c_n\, w(\tau_n)$ and $w(\tau_n) = q p^{n-1}$, the weight of all such size-$n$ trees. The extinction probability of this model is $\rho_e = 1$ (a subcritical regime).
Furthermore, $P(K_n = k) = [z^n] \Phi(z)^{k} = \binom{n-1}{k-1} q^{k} p^{n-k}$, $k = 1, \dots, n$.
In a size-$n$ forest with $K_n$ trees, the law of the number of leaves coincides with that of $K_n$, because any threadlike tree possesses a single leaf.
In the weak case, with $\Phi_0(z) = q/(1 - pz)$,
$$P(K_n = k) = [z^n] \Phi_0(z)^{k} = \binom{n+k-1}{k-1} q^{k} p^{n}, \quad k \geq 0,$$
where $\binom{n+k-1}{k-1}$ is the number of weak compositions of $n$ into $k$ parts.

3. Increasing (or Recursive) d Trees as Phylogenetic Trees

A size-n rooted and increasing labeled tree has vertices with indices or labels 1 , , n increasing for any path from the root to its leaves. Wherever a new connection is created in this tree, the adjunction of a new node with index n + 1 will result in a size-( n + 1 ) rooted increasing tree. Increasing trees can in addition be unordered (Cayley) or ordered. The combinatorial version of such trees was studied by [24].
Let $T(z)$ solve $T'(z) = (1 + T(z))^{d}$ with $T(0) = 0$, and hence with solution
$$T(z) = \left( 1 - (d-1) z \right)^{-1/(d-1)} - 1,$$
with, for $n \geq 1$,
$$C_n := n!\,[z^n] T(z) = [1/(d-1)]_n\, (d-1)^{n} = \prod_{m=0}^{n-1} \left( 1 + (d-1) m \right) =: [1 : d-1]_n,$$
where $[a]_n = a(a+1) \cdots (a+n-1)$. $C_n$ counts the number of labeled binomial increasing trees. The convergence radius of $T(z)$ is $r_c = 1/(d-1) \leq 1$.
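The product formula for $C_n$ can be cross-checked by solving the differential equation $T'(z) = (1+T(z))^{d}$ as a formal power series. The following sketch is illustrative (the function name is hypothetical); it uses exact rational arithmetic and extracts the coefficients one at a time from $(n+1)\,[z^{n+1}]T = [z^{n}](1+T)^{d}$.

```python
from fractions import Fraction
from math import factorial, prod

def increasing_tree_counts(d: int, n_max: int) -> list:
    """Return [C_1, ..., C_{n_max}] with C_n = n! [z^n] T(z), where T' = (1 + T)^d, T(0) = 0."""
    t = [Fraction(0)] * (n_max + 1)  # t[n] = [z^n] T(z)
    for n in range(n_max):
        base = [Fraction(1)] + t[1:n + 1]  # 1 + T(z), truncated at degree n
        power = [Fraction(1)] + [Fraction(0)] * n
        for _ in range(d):  # power <- power * base, truncated at degree n
            power = [sum(power[i] * base[j - i] for i in range(j + 1)) for j in range(n + 1)]
        t[n + 1] = power[n] / (n + 1)
    return [int(factorial(n) * t[n]) for n in range(1, n_max + 1)]

if __name__ == "__main__":
    for d in (2, 3, 4):
        closed_form = [prod(1 + (d - 1) * m for m in range(n)) for n in range(1, 7)]
        print(d, increasing_tree_counts(d, 6), closed_form)
```

For $d = 2$ the counts reduce to $n!$, and for $d = 3$ to the odd double factorials $(2n-1)!!$.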
Such increasing trees serve as models for phylogenetic trees in which nodes represent species with labels encoding their order of appearance in the tree, and thus the chronology of evolution. The leaves of the tree are the currently living species that can mutate to a new species; the internal nodes are the ones that can generate a new species (in the d ary tree context, only nodes that are not at saturation with d offspring have this ability); the different trees of a forest consist of genera.

3.1. Random Binomial Increasing Trees

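The g.f. of random binomial rooted increasing trees solves the ordinary differential equation $\Phi'(z) = \phi(\Phi(z))$ with $\phi(z) = (q + pz)^{d}$ and $\Phi(0) = 0$, and hence
$$\Phi(z) = \frac{1}{p} \left[ \left( q^{-(d-1)} - p(d-1) z \right)^{-1/(d-1)} - q \right]$$
$$= \frac{q}{p} \left[ \left( 1 - p q^{d-1} (d-1) z \right)^{-1/(d-1)} - 1 \right],$$
with $z_c = \sup\{ z > 0 : \Phi(z) < \infty \} = 1/(p q^{d-1} (d-1)) > 1$ and $\Phi(z_c) = \infty$.
As a result, $\Phi(z)$ has an algebraic singularity of order $-1/(d-1)$ at $z_c$. Note that
$$\Phi(1) = P(\overline{N}(1) < \infty) =: r_e = \frac{q}{p} \left[ \left( 1 - 1/z_c \right)^{-1/(d-1)} - 1 \right] < 1.$$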
Whatever the values of $p$ and $d$, there is a positive probability that $\overline{N}(1) = \infty$, and there is no phase transition to subcriticality with almost sure extinction for the increasing version of a $d$-tree. For each $d \geq 2$, the convergence radius $z_c$ is a convex function of $p$, taking its minimum value $(d/(d-1))^{d} > 1$ when $p = 1/d$.
With $[a]_n = a(a+1) \cdots (a+n-1)$, we obtain
$$P(\overline{N}(1) = n) := [z^n] \Phi(z) = \frac{q}{p}\, \frac{[1/(d-1)]_n}{n!\, z_c^{n}}, \quad n \geq 1,$$
with $P(\overline{N}(1) = n) \sim \frac{q}{p}\, \frac{n^{-(d-2)/(d-1)}}{\Gamma(1/(d-1))\, z_c^{n}}$ as $n \to \infty$, with geometric decay and an algebraic prefactor, the exponent of which ($\theta := (d-2)/(d-1) \in [0, 1)$) now depends on $d$.
Note that, as required, with $w(\tau_n) = \prod_{b=0}^{d} w_b^{n_b(\tau_n)} = \frac{q}{p} \left( p q^{d-1} \right)^{n}$ the weight of each tree $\tau_n$,
$$P(\overline{N}(1) = n) = \frac{C_n}{n!}\, w(\tau_n)$$
is a separable form.
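The defective nature of the total progeny law can be illustrated numerically: the probabilities $P(\overline{N}(1) = n)$ should sum to $r_e = \Phi(1) < 1$ rather than to 1. The sketch below is a floating-point illustration with hypothetical function names; it builds the terms iteratively so that $z_c^{n}$ is never formed explicitly.

```python
def increasing_progeny_pmf(p: float, d: int, n_max: int) -> list:
    """[P(N(1) = n) for n = 1..n_max] = (q/p) [1/(d-1)]_n / (n! z_c^n), built term by term."""
    q = 1.0 - p
    a = 1.0 / (d - 1)
    zc = 1.0 / (p * q ** (d - 1) * (d - 1))
    pmf, term = [], q / p
    for n in range(1, n_max + 1):
        term *= (a + n - 1) / (n * zc)  # one more factor of the rising factorial, of n!, and of z_c
        pmf.append(term)
    return pmf

def finite_progeny_probability(p: float, d: int) -> float:
    """r_e = Phi(1) = (q/p) [(1 - 1/z_c)^(-1/(d-1)) - 1]."""
    q = 1.0 - p
    zc = 1.0 / (p * q ** (d - 1) * (d - 1))
    return (q / p) * ((1.0 - 1.0 / zc) ** (-1.0 / (d - 1)) - 1.0)

if __name__ == "__main__":
    for p, d in ((0.5, 2), (0.3, 3), (0.25, 4)):
        print(p, d, sum(increasing_progeny_pmf(p, d, 400)), finite_progeny_probability(p, d))
```

In the binary case $d = 2$ the pmf is purely geometric, $\frac{q}{p}(pq)^{n}$, which gives a quick manual check.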
Remark 4 (weighted versions of increasing trees).
With a 1 , a 2 > 0 , consider the weight sequence w b = a 1 b a 2 for which b = 0 d w b n b τ n = a 1 n 1 a 2 n . Then, as for simple trees,
P N ¯ 1 = n P N ¯ 1 = n a 1 n 1 a 2 n
is the weighted version of P N ¯ 1 = n . Equivalently,
Φ z Φ ˜ z = a 1 1 Φ a 1 a 2 z ,
now solving Φ ˜ z = ϕ ˜ Φ ˜ z , where ϕ ˜ z = a 2 ϕ a 1 z is the modified ‘branching mechanism’, not necessarily a p.g.f. It is a p.g.f. when a 2 = 1 / ϕ a 1 .
It is very unrealistic that any evolutionary process would lead to a configuration with infinitely many species. This forces one to consider the binomial increasing branching process conditioned on extinction, so with a modified binomial branching mechanism $\phi(z) \to \phi(r_e z)/r_e$ (but here, not a p.g.f.). The p.g.f. of its total progeny, say $\overline{N}^{*}(1)$, then reads
$$\Phi^{*}(z) := E\left( z^{\overline{N}^{*}(1)} \right) = \frac{\Phi(z)}{\Phi(1)} = \frac{(1 - z/z_c)^{-1/(d-1)} - 1}{(1 - 1/z_c)^{-1/(d-1)} - 1}.$$
Note that $E(\overline{N}^{*}(1)) = \Phi'(1)/\Phi(1) = \phi(r_e)/r_e > 1$, observing $r_e < \rho_e$, where $\rho_e$ is the smallest solution to $\phi(\rho_e) = \rho_e$ (the extinction probability of the simple BGW $d$-ary tree).
The version of the Lagrange inversion formula for increasing trees states that, for all $n \geq 1$,
$$[z^n]\, h(\Phi(z)) = \frac{1}{n} [z^{-1}]\, h'(z)\, P(z)^{-n},$$
where $P(z) := \int_0^{z} dz'/\phi(z')$. Note that, because $P(0) = 0$, with $R(z) = z/P(z)$ (obeying $R(0) = \pi_0 > 0$ if $\pi_0 = \phi(0) > 0$), this is also
$$[z^n]\, h(\Phi(z)) = \frac{1}{n} [z^{n-1}]\, h'(z)\, R(z)^{n}.$$
In particular, with $h(z) = z$ and $\phi(z) = (q + pz)^{d}$, $P(z) = \frac{1}{p(d-1)} \left( q^{1-d} - (q+pz)^{1-d} \right)$, and, with (47),
$$[z^n] \Phi(z) = \frac{1}{n} [z^{-1}]\, P(z)^{-n} = \frac{1}{n} [z^{n-1}]\, R(z)^{n} = \frac{q}{p}\, \frac{[1/(d-1)]_n}{n!\, z_c^{n}}.$$
And with h z = z k , P N ¯ k = n = z n Φ z k ,
P N ¯ k = n = k n z 1 z k 1 P z n = k n z n k R z n = 1 n q p k k / d 1 n k + 1 n k ! z c n , n k .
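If $d = 2$: $P(z) = \frac{1}{p} \left( q^{-1} - (q+pz)^{-1} \right)$ and $R(z) = q(q + pz)$, yielding
$$P(\overline{N}(k) = n) = \frac{k\, q^{n}}{n} [z^{n-k}] (q + pz)^{n} = \left( \frac{q}{p} \right)^{k} \binom{n-1}{k-1} (pq)^{n}.$$
It involves a factor counting the compositions of $n$ into $k$ parts. Using the Stirling formula, with $\rho > 1$, as $k \to \infty$,
$$-\frac{1}{k} \log P(\overline{N}(k) = k\rho) \to f_2(\rho),$$
with
$$f_2(\rho) = -\rho \log(pq\rho) + (\rho - 1) \log(\rho - 1) - \log \frac{q}{p}.$$
We have $f_2'(\rho) = -\log \frac{\rho\, pq}{\rho - 1}$, vanishing at $\rho^{*} = 1/(1 - pq) > 1$ with $f_2(\rho^{*}) = -\log \frac{q^{2}}{1 - pq} > 0$.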
Note that, while considering instead the weak model,
P N ¯ k = n = z n Φ 0 z k = q k z n 1 z / z c k / d 1 = q k k / d 1 n n ! z c n , n 0 .
When d = 2 , P N ¯ k = n = q k n + k 1 k 1 z c n , involving the weak composition of n into k parts factor.
Remark 5.
The Lagrange inversion formula adapted to forests of distinguishable increasing trees reads
$$[z^n] K(z, v) = \frac{v}{n} [z^{n-1}] \frac{R(z)^{n}}{(1 - vz)^{2}},$$
where $K(z, v) = \frac{1}{1 - v\,\Phi(z)}$. The p.g.f. of the number $K_n$ of increasing trees of a size-$n$ forest is $E(v^{K_n}) = [z^n] K(z, v)$, with $P(K_n = k) = [v^k z^n] K(z, v)$.
Should the trees be indistinguishable, the same formulae hold but now with $K(z, v) = e^{v\,\Phi(z)}$, so
$$[z^n] K(z, v) = \frac{v}{n} [z^{n-1}]\, e^{vz} R(z)^{n}.$$

3.2. Distribution of the Number of Leaves

The joint generating function of nodes and leaves solves
$$\partial_z \Phi(z, u_0) = \pi_0 (u_0 - 1) + (q + p\,\Phi(z, u_0))^{d}.$$
Hence,
$$z = \int_0^{\Phi(z, u_0)} \frac{dz'}{\pi_0 (u_0 - 1) + (q + pz')^{d}},$$
with unknown explicit solution $\Phi(z, u_0)$ in general. With $n_0 = 1, \dots, n-1$, we have
$$P(\overline{N}(1) = n, \overline{N}_0(1) = n_0) := [z^n u_0^{n_0}]\, \Phi(z, u_0),$$
with no known explicit expression in general.
Consider indeed the integral
$$P(z, u_0) = \int_0^{z} \frac{dz'}{\pi_0 (u_0 - 1) + (q + pz')^{d}},$$
of which $\Phi(z, u_0)$ is the inverse in $z$: $P(\Phi(z, u_0), u_0) = z$.
With y b = e 2 i π b / d , b = 0 , , d 1 , the b-th root of unity, we have
P z , u 0 = π 0 1 u 0 1 / d 1 q π 0 1 u 0 1 / d q + p z π 0 1 u 0 1 / d d y 1 + y d = π 0 1 u 0 1 / d 1 q π 0 1 u 0 1 / d q + p z π 0 1 u 0 1 / d b = 0 d 1 d y y y b .
A fraction decomposition into simple elements of the integrand yields in principle the expression of P z , u 0 :
P z , u 0 = π 0 1 u 0 1 / d 1 b = 0 d 1 A b 1 log q + p z π 0 1 u 0 1 / d y b q π 0 1 u 0 1 / d y b ,
where A b = b b y b y b . The dominant root is y 0 = 1 , so with A 0 = b = 1 d 1 1 y b ,
P z , u 0 A 0 1 π 0 1 u 0 1 / d 1 log q + p z π 0 1 u 0 1 / d 1 q π 0 1 u 0 1 / d 1 ,
with dominant inverse
Φ z , u 0 q π 0 1 u 0 1 / d 1 e A 0 π 0 1 u 0 1 1 / d z 1 p π 0 1 u 0 1 / d .
Alternatively, with B a , b ; x the incomplete beta function, the primitive of 1 / 1 y d is
C + k 0 1 k d + 1 y k d + 1 = : C + 1 d B 1 d , 0 ; y d
as a generalized logarithm. This gives an alternative expression of $P(z, u_0)$. However, the computation of $\Phi(z, u_0)$ obeying $P(\Phi(z, u_0), u_0) = z$ would require the inverse function of $P(z, u_0)$ which, to the author's knowledge, has no known expression in terms of special functions.
Example 2 (the binary case and the tree of life).
When d = 2 , upon decomposing the rational fraction 1 q 2 u 0 1 + q + p z 2 into simple elements and integrating,
P z , u 0 = 1 2 p q 1 u 0 log p z + q q 1 u 0 1 + 1 u 0 p z + q + q 1 u 0 1 1 u 0 .
The inverse Φ z , u 0 obeying P Φ z , u 0 , u 0 = z reads
Φ z , u 0 = q p u 0 e 2 p q z 1 u 0 1 1 + 1 u 0 1 1 u 0 e 2 p q z 1 u 0 .
When u 0 1 , both the numerator and denominator tend to 0, with, to the first order in u 0 1 ,
Φ z , u 0 Φ z = q 2 z 1 p q z .
This is (45) when d = 2 as required.
Even in this explicit expression case for Φ z , u 0 , P N ¯ 1 = n , N ¯ 0 1 = n 0 : = z n u 0 n 0 Φ z , u 0 has no simple expression. We now give an alternative path to obtain P N ¯ 1 = n , N ¯ 0 1 = n 0 , exploiting the recursive nature of binomial increasing trees.
A recurrence in special cases. Consider increasing branching random trees whose p.g.f. $\Phi(z)$ solves $\Phi'(z) = \phi(\Phi(z))$, where $\phi(z) = \sum_{b \geq 0} \pi_b z^{b}$. Consider the cases $\phi(z) = (q/(1 - pz))^{\theta}$, $\theta > 0$; $\phi(z) = e^{-\mu(1-z)}$, $\mu > 0$; or $\phi(z) = (q + pz)^{d}$, $d \geq 2$ integer (negative binomial, Poisson, or binomial).
Note that the convergence radius of $\Phi(z)$ is
$$z_c = \int_0^{z_*} \frac{dz}{\phi(z)},$$
where $z_* = \inf\{ z > 0 : \phi(z) = \infty \}$ is the convergence radius of $\phi(z)$.
In these three particular cases of ϕ , the formation of the tree admits the following recursive tree evolution scheme (label 1 is assigned to the root).
With probability
$$p_b(n) := Z_n^{-1}\, (b+1)\, \pi_{b+1}/\pi_b,$$
attach node $n+1$ uniformly to any of the $\overline{N}_b(\tau_n)$ nodes with outdegree $b \in \{0, \dots, b^{*}\}$ of a previous size-$n$ increasing tree $\tau_n$ ($b^{*} = (n-1) \wedge d$). The normalization constant is $Z_n = \sum_{b=0}^{n-1} \overline{N}_b(\tau_n)\, (b+1)\, \pi_{b+1}/\pi_b$, representing the "number" of ways the new atom with label $n+1$ can be inserted in $\tau_n$. This preferential attachment procedure results in a realization of $\tau_{n+1}$ (see [25]).
With $B_b(\overline{N}_b(\tau_n))$, $b \in \{0, \dots, b^{*}\}$, mutually exclusive Bernoulli random variables (summing to 1), each with success probability $\overline{N}_b(\tau_n)\, p_b(n)$, for each $n \geq 1$, we then have
$$\overline{N}_0(\tau_{n+1}) = \overline{N}_0(\tau_n) + 1 - B_0(\overline{N}_0(\tau_n)),$$
$$\overline{N}_b(\tau_{n+1}) = \overline{N}_b(\tau_n) + B_{b-1}(\overline{N}_{b-1}(\tau_n)) - B_b(\overline{N}_b(\tau_n)), \quad b \in \{1, \dots, b^{*}\},$$
$$\overline{N}_{b^{*}+1}(\tau_{n+1}) = 0 + B_{b^{*}}(\overline{N}_{b^{*}}(\tau_n)).$$
Whenever a connection to a node with outdegree $b$ occurs, the number of nodes with outdegree $b$ (respectively $b+1$) decreases (increases) by one unit. In addition, a new node with outdegree 0 is always created, whatever the degree of the node of $\tau_n$ to which the new incoming atom connects.
For the three particular models generated by the $\phi$'s above, using $\sum_{b=0}^{n-1} n_b(\tau_n) = n$ and $\sum_{b=1}^{n-1} b\, n_b(\tau_n) = n-1$ for any $\tau_n$, which leads to an expression of $Z_n$, we obtain
$$p_b(n) = \frac{\theta + b}{n(\theta+1) - 1}, \qquad \frac{1}{n}, \qquad \frac{d - b}{1 + n(d-1)},$$
respectively, depending only on $b, n$ and not on the full weight sequence $\{\pi_b;\ b = 0, \dots, b^{*}\}$. In the first two examples, $b \in \{0, \dots, b^{*} = n-1\}$, while $b \in \{0, \dots, b^{*} = (n-1) \wedge d\}$ in the third case of $d$-ary labeled binomial trees.
From the first row of (56), the mean number of leaves is readily obtained to be, respectively (for $n \geq 2$, growing as a fraction of $n$):
$$E(\overline{N}_0(\tau_n)) = \frac{n(\theta+1) - 1}{2\theta + 1}, \qquad \frac{n}{2}, \qquad \frac{(d-1)n + 1}{2d - 1}.$$
The variance similarly grows proportionally to $n$, and a Central Limit Theorem can be shown to hold for $\overline{N}_0(\tau_n) = \overline{N}_n^{0}(1)$. Note that $E(\overline{N}_n^{0}(1))$ is independent of $p$.
When $d = 2$ (the binary case), $E(\overline{N}_n^{0}(1)) = (n+1)/3$.
The first row of (56), giving the evolution of the number of leaves of a size-$n$ tree, is
$$P(\overline{N}_{n+1}^{0}(1) = n_0 + 1) = \left( 1 - \frac{d\, n_0}{1 + n(d-1)} \right) P(\overline{N}_n^{0}(1) = n_0) + \frac{d\,(n_0 + 1)}{1 + n(d-1)}\, P(\overline{N}_n^{0}(1) = n_0 + 1),$$
giving the transition probabilities from $n_0$ to $n_0 + 1$, $n_0 \in \{1, \dots, n-1\}$.
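The leaf-count dynamics of the binomial ($d$-ary) case can be iterated exactly. The following is a minimal sketch under the attachment rule $p_0(n) = d/(1 + n(d-1))$ stated above, with a hypothetical function name: at each step the new node attaches to an existing leaf with probability $d\, n_0/(1 + n(d-1))$, leaving the number of leaves unchanged, and otherwise adds one leaf.

```python
from fractions import Fraction

def leaf_distribution(n: int, d: int) -> dict:
    """Exact law of the number of leaves of a random d-ary increasing tree of size n."""
    dist = {1: Fraction(1)}  # size-1 tree: the root is the only leaf
    for m in range(1, n):
        slots = 1 + m * (d - 1)  # number of free attachment slots in a size-m tree
        new = {}
        for leaves, prob in dist.items():
            stay = Fraction(d * leaves, slots)  # new node lands on a leaf
            new[leaves] = new.get(leaves, 0) + prob * stay
            if stay < 1:  # new node lands on an internal node: one more leaf
                new[leaves + 1] = new.get(leaves + 1, 0) + prob * (1 - stay)
        dist = new
    return dist

def mean_leaves(n: int, d: int) -> Fraction:
    return sum(k * p for k, p in leaf_distribution(n, d).items())

if __name__ == "__main__":
    for d in (2, 3, 5):
        for n in (2, 5, 10):
            print(d, n, mean_leaves(n, d), Fraction((d - 1) * n + 1, 2 * d - 1))
```

The exact means printed by the loop can be compared with $((d-1)n+1)/(2d-1)$ for $n \geq 2$; the result does not involve $p$.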
P N ¯ n 0 1 =   . is a Markovian probability sequence. The initial conditions are P N ¯ 1 0 1 = 1 = P N ¯ 2 0 1 = 1 = 1 . Therefore, with
π m l 1 , l = d l 1 1 + m d 1 π m l , l = 1 π m l 1 , l
the transition probabilities, for n 3 and n 0 < n , the integrated distribution of N ¯ n 0 1 reads
P N ¯ n 0 1 = n 0 = m n 0 * l = 1 n 0 π m l l 1 , l l = 1 n 0 m l 1 < m < m l π m l , l ,
where the star sum runs over the integers m n 0 = m l ; l = 1 , , n 0 obeying m 0 : = 1 < m 1 < m 2 < < m n 0 m n 0 + 1 : = n .
The latter explicit expression of P N ¯ n 0 1 = n 0 translates the fact that there are n 0 unit moves up at points m n 0 with no other moves but those for this sequence. There are n 1 n 0 1 terms in this star sum (the number of strict compositions of n into n 0 parts).

3.3. Increasing d-Partition of n into k Parts: Thermodynamic Limit

We have
1 k log P N ¯ k k ρ f ρ ,
with f ρ the Legendre transform of log Φ z . With z Φ z Φ z = ρ 1 always defining a unique z = z ρ , we have
f ρ = ρ log z ρ log Φ z ρ ,
where z ρ = ρ 1 d 1 1 + ρ 1 d 1 z c , Φ z ρ = q p 1 + ρ 1 d 1 1 / d 1 1 .
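When $d = 2$, $z_\rho = \frac{1}{pq}\, \frac{\rho - 1}{\rho}$ and $\Phi(z_\rho) = \frac{q}{p} (\rho - 1)$. With $\rho \geq 1$, therefore,
$$f(\rho) = -\log \frac{q}{p} - \rho \log(pq) - \rho \log \rho + (\rho - 1) \log(\rho - 1), \qquad f'(\rho) = -\log(pq) - \log \frac{\rho}{\rho - 1}, \qquad f''(\rho) = \frac{1}{\rho - 1} - \frac{1}{\rho} > 0.$$
$f'(\rho)$ vanishes at $\rho^{*} = \frac{1}{1 - pq} > 1$, so that $\overline{N}(k)/k \to \rho^{*}$ almost surely as $k \to \infty$.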
When dealing with the weak version Φ 0 z = q + p Φ z of Φ z , we have
z n Φ 0 z k = q k z n 1 p q d 1 d 1 z k / d 1 .
Hence, as n = k ρ , (now with ρ 0 )
P N ¯ k k ρ = q k k / d 1 k ρ k ρ ! z c k ρ .
By the Stirling formula, in the thermodynamic limit n , k , n / k = ρ 1 ,
P N ¯ k k ρ 1 2 π q k z c k ρ 1 + ρ d 1 k d 1 1 2 k 1 d 1 + ρ 1 k ρ 1 2 k ρ k ρ + 1 2 1 2 π q k z c k ρ 1 + ρ d 1 k d 1 1 ρ d 1 + 1 k ρ ,
so that
1 k log P N ¯ k k ρ f ρ ,
with f ρ = log a ρ and a ρ = q z c ρ 1 ρ d 1 + 1 ρ 1 + ρ d 1 1 d 1 , the Legendre transform of log Φ 0 z . With z Φ 0 z Φ 0 z = ρ 0 always defining a unique z = z ρ , we have
f ρ = ρ log z ρ log Φ 0 z ρ
where z ρ = ρ d 1 1 + ρ d 1 z c , Φ 0 z ρ = q 1 + ρ d 1 1 / d 1 . The expression of (61) in the strict case is just a shifted version of the latter one in the weak case, with ρ 1 substituted to ρ there.

3.4. The Limiting Poisson Case ( d )

We here mention some related computations encompassing the limiting Poisson case.
Let $T(z)$ solve $T'(z) = e^{T(z)}$ with $T(0) = 0$, and hence with solution
$$T(z) = -\log(1 - z),$$
with, for $n \geq 1$,
$$C_n := n!\,[z^n] T(z) = (n-1)!.$$
$C_n$ counts the number of labeled Cayley increasing trees. The convergence radius of $T(z)$ is $r_c = 1$.
The g.f. of random Poisson($\mu$) rooted increasing trees solves the ordinary differential equation $\Phi'(z) = \phi(\Phi(z))$ with $\phi(z) = e^{-\mu(1-z)}$ and $\Phi(0) = 0$. Hence,
$$\Phi(z) = -\frac{1}{\mu} \log\left( 1 - \mu e^{-\mu} z \right),$$
with convergence radius $z_c = e^{\mu}/\mu > 1$. Then,
$$P(\overline{N}(1) = n) = [z^n] \Phi(z) = \frac{1}{n \mu z_c^{n}}.$$
With $w_b = \mu^{b} e^{-\mu}$ the weight of a node with outdegree $b$, the weight of a size-$n$ tree is
$$w(\tau_n) = \prod_{b \geq 0} w_b^{n_b} = \mu^{n-1} e^{-n\mu},$$
independent of the $n_b$'s. Hence, $P(\overline{N}(1) = n) = \frac{C_n}{n!}\, w(\tau_n)$, a separable form.
Furthermore, for $k$-forests of such increasing trees, with $P(z) = \int_0^{z} dz'\, e^{\mu(1 - z')} = z_c \left( 1 - e^{-\mu z} \right)$ and $R(z) = z/P(z)$,
$$P(\overline{N}(k) = n) = \frac{k}{n} [z^{n-k}]\, R(z)^{n}.$$
With $z_\rho$ obeying $z_\rho \Phi'(z_\rho)/\Phi(z_\rho) = \rho \geq 1$, we have $\Phi(z_\rho) = -\frac{1}{\mu} \log\left( 1 - \mu e^{-\mu} z_\rho \right)$ and
$$\frac{\Phi(z_\rho z)}{\Phi(z_\rho)} = \frac{\log\left( 1 - \mu e^{-\mu} z_\rho z \right)}{\log\left( 1 - \mu e^{-\mu} z_\rho \right)}$$
is the mean-$\rho$ logarithmic distribution of the typical box occupancy in the thermodynamic limit $n, k \to \infty$, $n/k = \rho$.
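With $\pi_0 = e^{-\mu}$, the joint generating function of nodes and leaves solves
$$\partial_z \Phi(z, u_0) = e^{-\mu} (u_0 - 1) + e^{-\mu (1 - \Phi(z, u_0))}.$$
$\Phi(z, u_0)$ is thus the inverse function of
$$P(z, u_0) = e^{\mu} \int_0^{z} \frac{dz'}{u_0 - 1 + e^{\mu z'}} = \frac{e^{\mu}}{\mu (1 - u_0)} \log \frac{1 - e^{-\mu z} (1 - u_0)}{u_0},$$
so with
$$\Phi(z, u_0) = \frac{1}{\mu} \log \frac{1 - u_0}{1 - u_0\, e^{\mu e^{-\mu} z (1 - u_0)}}.$$
Note $\Phi(z, u_0) = \frac{1}{\mu}\, T(\mu e^{-\mu} z, u_0)$, where $T(z, u_0) = \log \frac{1 - u_0}{1 - u_0\, e^{z(1 - u_0)}}$ solves
$$\partial_z T(z, u_0) = u_0 - 1 + e^{T(z, u_0)}, \quad T(0, u_0) = 0.$$
With $n_0 = 1, \dots, n-1$, we thus have
$$P(\overline{N}_n^{0}(1) = n_0) = \frac{[z^n u_0^{n_0}]\, \Phi(z, u_0)}{[z^n]\, \Phi(z, 1)},$$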
with no obvious solution. However, as we know from the expression of the transition Poissonian probabilities, this probability sequence is Markovian with
P N ¯ n + 1 0 1 = n 0 + 1 = n 0 n P N ¯ n 0 1 = n 0 + 1 n 0 + 1 n P N ¯ n 0 1 = n 0 + 1 .
Hence, in agreement with [26], for n 2 ,
P N ¯ n 0 1 = n 0 = E n , n 0 n 1 ! , for each n 0 1 , , n 1 .
where E n , n 0 are the shifted first-kind Eulerian numbers. N ¯ n 0 1 has mean n / 2 and variance n / 12 : a CLT holds. Eulerian numbers E n + 1 , n 0 + 1 count the number of permutations of n with n 0 ascents. Recalling E n , n 0 = E n , n n 0 , n N ¯ n 0 1 = d N ¯ n 0 1 , where n N ¯ n 0 1 is the number of internal nodes of the size n tree.

3.5. The Boundary Case d = 1

Combinatorial linear increasing trees are those for which $T'(z) = 1 + T(z)$, $T(0) = 0$, yielding $T(z) = e^{z} - 1$, with $C_n = n!\,[z^n] T(z) = 1$ (there is only one such tree). The branching number of a node is either zero or one, leading to threadlike trees.
We have $P(z) = \log(1 + z)$, $R(z) = z/\log(1 + z)$, $C_n = n!\, n^{-1} [z^{n-1}] R(z)^{n} = 1$ (an identity). Furthermore, by the Lagrange inversion formula, $C_{n,k} = n!\,[z^n] T(z)^{k} = n!\, \frac{k}{n} [z^{n-k}] R(z)^{n} = k!\, S_{n,k}$ ($S_{n,k}$ being the second-kind Stirling numbers).
Random linear increasing trees are those for which $\Phi'(z) = q + p\,\Phi(z)$, $\Phi(0) = 0$, yielding $\Phi(z) = \frac{q}{p} \left( e^{pz} - 1 \right)$, with $[z^n] \Phi(z) = \frac{1}{n!}\, w(\tau_n)$ and $w(\tau_n) = q p^{n-1}$ the weight of all such size-$n$ trees. Note that $r_e = \Phi(1) = \frac{q}{p} \left( e^{p} - 1 \right) < 1$ is the extinction probability. Furthermore, $P(K_n = k) = [z^n] \Phi(z)^{k} = \frac{k!}{n!}\, S_{n,k}\, q^{k} p^{n-k}$, $k = 1, \dots, n$.
In a size-$n$ forest with $K_n$ distinguishable trees, the law of the total number of leaves coincides with that of $K_n$, because any such threadlike tree possesses a single leaf.

4. Concluding Remarks

The sizes of the random progenies of both simple (BGW) and increasing trees and forests generated by the $d$-binomial branching mechanism ($d \in \mathbb{N} := \{1, 2, \dots\}$) are shown to be amenable to weighted combinatorial trees in the sense of Meir and Moon [14]. We exploit this fact to analyze structural aspects of these trees and forests, such as the number of leaves in a size-$n$ tree, the number of trees with given outdegree sequences, the number of trees in a size-$n$ forest, the number of atoms in a $k$-forest, or the joint and marginal sizes of trees in a size-$n$ forest with $k$ trees. We derive asymptotic results when $n \to \infty$ or $k \to \infty$ separately, or when $n, k \to \infty$ jointly while $n/k \to \rho > 0$.
We conclude by stressing that an alternative randomization to counting combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones, is of great interest, as it leads to different and very rich behaviors, for example, concerning the number of trees in a size n forest.
Both randomization approaches rest on the analysis of generating functions and can sometimes take advantage of the Lagrange inversion formula.

Funding

This research received no external funding.

Data Availability Statement

There are no data associated with this paper.

Acknowledgments

T. Huillet acknowledges partial support from the “Chaire Modélisation mathématique et biodiversité” of Veolia-Ecole Polytechnique-MNHN-FondationX and support from the labex MME-DII Center of Excellence (Modèles mathématiques et économiques de la dynamique, de l’incertitude et des interactions, ANR-11-LABX-0023-01 project). This work was also funded by CY Initiative of Excellence (grant “Investissements d’Avenir” ANR-16-IDEX-0008), Project “EcoDep” PSI-AAP2020-0000000013.

Conflicts of Interest

The author has no conflicts of interest associated with this paper.

Appendix A

So far, random trees and forests have been constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing. We dealt with the binomial offspring distribution as an important representative of the ones with bounded support and having all its moments.
We here briefly consider a different approach to the randomization of combinatorial trees and forests, namely the one arising from the ratio of favorable outcomes to the number of possible ones. In this context, we emphasize the role of the Lagrange inversion formula for some additional branching models, not necessarily related to the binomial case.
To this end, we first observe that the probability law of $\overline{N}(1)$, the number of nodes in a combinatorial tree with $C_n = n!\,[z^n] T(z)$ trees (non-negative integers) of size $n$ derived from the g.f. $T(z)$, is given by the tilting
$$P(\overline{N}(1) = n) = \frac{z^{n}\, C_n/n!}{T(z)},$$
for any $z < \rho_c$ ($z \leq \rho_c$), depending on $T(\rho_c) = \infty$ ($T(\rho_c) < \infty$). Tilting is necessary in the ratio of favorable cases to possible cases randomization because of the divergence of the series $\sum_n c_n$, with $c_n = C_n/n!$. The parameter $z$ is related to the mean $m = E(\overline{N}(1))$ by $z T'(z)/T(z) = m$. Here, $T(z)$ solves either $T(z) = z \varphi(T(z))$, $T(0) = 0$, or $T'(z) = \varphi(T(z))$, $T(0) = 0$, depending on whether it is a simple or an increasing combinatorial tree. Such trees are generated by the g.f. $\varphi(z)$ with $m!\,[z^m] \varphi(z)$, $m \geq 1$, non-negative integers, and $\varphi(0) = 1$.
When conditioning on the size of the forest, the joint law of a population of $k$ clusters in a size-$n$ forest is
$$P(\overline{\mathbf{N}}_{n,k} = \mathbf{n}_k) = \frac{[z^n \prod_{l=1}^{k} z_l^{n_l}] \prod_{l=1}^{k} T(z z_l)}{[z^n]\, T(z)^{k}} = \binom{n}{n_1, \dots, n_k} \frac{\prod_{l=1}^{k} C_{n_l}}{C_{n,k}}\; \delta_{|\mathbf{n}_k| = n}.$$
Here, $C_{n,k} = n!\,[z^n] T(z)^{k}$ or $C_{n,k} = n!\,[z^n] T(z)^{k}/k!$, depending on the constitutive trees being distinguishable or not.
Moreover, the law of the number K n of its clusters can be calculated after normalizing its count. As the following examples show, the asymptotic structure of K n in this approach is very rich:
  • Forests of indistinguishable linear increasing trees, for which $T'(z) = 1 + T(z)$, yielding $T(z) = e^{z} - 1$ and $K(z, v) = e^{v T(z)}$. With $S_{n,k}$ the Stirling numbers of the second kind, we obtain
    $$n!\,[z^n v^k] K(z, v) = S_{n,k},\ k = 1, \dots, n; \qquad P(K_n = k) = \frac{n!\,[z^n v^k] K(z, v)}{n!\,[z^n] K(z, 1)} = \frac{S_{n,k}}{\Sigma_n}; \qquad \Sigma_n = \sum_{k=1}^{n} S_{n,k} \ \text{(the Bell number)}.$$
    The $S_{n,k}$'s obey a triangular recurrence, which translates into one for $P(K_n = k)$.
    In the case that the trees are assumed distinguishable, with $K(z, v) = 1/(1 - v T(z))$ (see [22]),
    $$n!\,[z^n v^k] K(z, v) = k!\, S_{n,k},\ k = 1, \dots, n; \qquad P(K_n = k) = \frac{n!\,[z^n v^k] K(z, v)}{n!\,[z^n] K(z, 1)} = \frac{k!\, S_{n,k}}{\Sigma_n}; \qquad \Sigma_n = \sum_{k=1}^{n} k!\, S_{n,k} = \frac{1}{2} \sum_{k \geq 0} \frac{k^{n}}{2^{k}} \ \text{(the ordered Bell number)}.$$
  • Forests of indistinguishable Cayley trees [23]
    If $T(z) = z e^{T(z)}$, with $K(z, v) = e^{v T(z)}$, by the Lagrange inversion theorem,
    $$[z^n] K(z, v) = \frac{v}{n} [z^{n-1}]\, e^{z(v + n)} = \frac{v}{n!} (v + n)^{n-1}; \qquad n!\,[z^n v^k] K(z, v) =: C_{n,k} = \binom{n-1}{k-1} n^{n-k},\ k = 1, \dots, n;$$
    $$P(K_n = k) = \frac{n!\,[z^n v^k] K(z, v)}{n!\,[z^n] K(z, 1)} = \frac{1}{(1+n)^{n-1}} \binom{n-1}{k-1} n^{n-k}; \qquad \binom{n-1}{k-1} n^{n-k} = \binom{n}{k} k\, n^{n-k-1}.$$
    Ref. [27] rather gives $C_{n,k} = k\, n^{n-k-1}$ as the number of unordered forests with $k$ Cayley trees while fixing the $k$ different founders of the distinct trees out of $\binom{n}{k}$ different ways. See also [23]. Now, with $\Sigma_n := n!\,[z^n] K(z, 1)$,
    $$E(v^{K_n}) = \frac{n!\,[z^n] K(z, v)}{\Sigma_n} = v \left( 1 - \frac{1}{1+n} + \frac{v}{1+n} \right)^{n-1},$$
    a shifted binomial distribution. In particular, $E(K_n) = \frac{2n}{1+n} \to 2$. Note that
    $$E(v^{K_n}) \to v\, e^{-(1-v)} \ \text{as } n \to \infty,$$
    the p.g.f. of a shifted mean-1 Poisson random variable.
    We finally observe, as in [28], that the triangular array $C_{n,k}$ obeys the backward recursion
    $$(n - k)\, C_{n,k} = n k\, C_{n,k+1}.$$
    Hence, with $\Sigma_n = \sum_{k=1}^{n} C_{n,k}$, $P(K_n = k) = C_{n,k}/\Sigma_n$, $k = 1, \dots, n$, obeys
    $$P(K_n = k) = \frac{n k}{n - k}\, P(K_n = k+1), \quad k = n-1, \dots, 1,$$
    with terminal condition $P(K_n = n) = (n+1)^{-(n-1)}$.
  • For non-plane increasing trees (see [24] p. 40 and [29]), with $T(z) = -\log(1 - z)$ and $s_{n,k}$ the absolute Stirling numbers of the first kind, considering forests of such indistinguishable trees, we have
    $$K(z, v) = e^{v T(z)} = (1 - z)^{-v}; \quad n!\,[z^n] K(z, v) = [v]_n; \quad n!\,[z^n v^k] K(z, v) = s_{n,k} =: C_{n,k}; \quad \Sigma_n = n!\,[z^n] K(z, 1) = \sum_{k=1}^{n} C_{n,k} = n!; \quad E(v^{K_n}) = \frac{[v]_n}{n!}.$$
    Therefore, with $H_n$ as the $n$-th harmonic number,
    $$E(K_n) = \frac{n!\,[z^n]\, \partial_v K(z, 1)}{\Sigma_n} = H_n \sim \log n,$$
    $$\sigma^{2}(K_n) \sim \log n.$$
    A CLT holds.
    s n , k counts the number of permutations of n elements with k disjoint cycles. The process K n is the Chinese Restaurant process, indicating the number of occupied tables by n clients and also the number of distinct visited species in a n-sampling process from the Poisson–Dirichlet 1 partition of the unit interval [see [30], p. 57, for example].
  • Plane oriented (recursive) trees are those for which T z = 1 / 1 T z , T 0 = 0 , yielding T z = 1 1 2 z , with C n = n ! z n T z = 1 2 1 / 2 n 1 2 n = 2 n 3 ! ! .
    Here, P z = z z 2 / 2 , R z = 1 z / 2 1 and
    C n = n ! n 1 z n 1 1 z / 2 n = n n 1 / 2 n 1 = 2 n 1 2 n 2 ! / n 1 ! ,
    in agreement with the well-known identity 2 n 3 ! ! = 2 n 2 ! / 2 n 1 n 1 ! .
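    Furthermore, in agreement with [31],
    $$C_{n,k} = n!\,[z^n]\, T(z)^k = n!\,\frac{k}{n}\,[z^{n-k}]\,(1 - z/2)^{-n} = \frac{n!}{(n-k)!}\,\frac{k}{n}\,\frac{(n)_{n-k}}{2^{n-k}} = k\, 2^{-(n-k)}\,\frac{(2n-k-1)!}{(n-k)!}.$$
    $C_{n,k}$ is the number of $k$-forests of distinguishable plane-oriented trees with $n$ nodes. Moreover,
    $$K(z,v) = \frac{1}{1 - vT(z)} = \frac{1}{1 - v\left(1 - \sqrt{1-2z}\right)} \;\Rightarrow\; n!\,[z^n]\, K(z,1) = \Sigma_n = n!\,[z^n]\,(1-2z)^{-1/2} = (1/2)_n\, 2^n = (2n-1)!! = \frac{(2n)!}{2^n\, n!},$$
    $$\mathbf{P}(K_n = k) = \frac{C_{n,k}}{\Sigma_n} = \frac{k\, 2^{-(n-k)}\,(2n-k-1)!/(n-k)!}{(2n)!/\left(2^n\, n!\right)} = k\, 2^k\,\frac{(2n-k-1)!}{(2n)!}\,\frac{n!}{(n-k)!}.$$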
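
The three forest-size laws above can be checked numerically for small $n$. The following is a minimal sketch in exact rational arithmetic, not the author's code: the function names (`cayley_forest_law`, `stirling_cycle_row`, `port_forest_law`, `pgf`) are hypothetical, and the Stirling numbers are generated with the standard recurrence $s_{m,k} = s_{m-1,k-1} + (m-1)\,s_{m-1,k}$, which is assumed here rather than taken from the text.

```python
from fractions import Fraction
from math import exp, factorial


def cayley_forest_law(n):
    """P(K_n = k), k = 1..n, built from the backward recursion
    P(K_n = k) = n k / (n - k) * P(K_n = k + 1),
    started at the terminal condition P(K_n = n) = (n + 1)^-(n - 1)."""
    p = {n: Fraction(1, (n + 1) ** (n - 1))}
    for k in range(n - 1, 0, -1):
        p[k] = Fraction(n * k, n - k) * p[k + 1]
    return p


def stirling_cycle_row(n):
    """Row [s(n, 0), ..., s(n, n)] of absolute Stirling numbers of the
    first kind, via the (assumed) standard recurrence
    s(m, k) = s(m - 1, k - 1) + (m - 1) * s(m - 1, k)."""
    row = [1]  # s(0, 0) = 1
    for m in range(1, n + 1):
        prev = row + [0]  # pad so that prev[m] = s(m - 1, m) = 0
        row = [0] + [prev[k - 1] + (m - 1) * prev[k] for k in range(1, m + 1)]
    return row


def port_forest_law(n):
    """P(K_n = k) for forests of plane-oriented recursive trees, using
    C_{n,k} = k 2^-(n-k) (2n-k-1)! / (n-k)! and Sigma_n = (2n)! / (2^n n!)."""
    sigma_n = Fraction(factorial(2 * n), 2 ** n * factorial(n))
    return {
        k: Fraction(k * factorial(2 * n - k - 1),
                    2 ** (n - k) * factorial(n - k)) / sigma_n
        for k in range(1, n + 1)
    }


def pgf(law, v):
    """Probability generating function E(v^K) of a law given as {k: P(K = k)}."""
    return sum(p * v ** k for k, p in law.items())


if __name__ == "__main__":
    n = 6
    # Both laws should sum to one exactly.
    print(sum(cayley_forest_law(n).values()), sum(port_forest_law(n).values()))
    # Mean number of cycles (trees) against the harmonic number H_n.
    row = stirling_cycle_row(n)
    mean = Fraction(sum(k * s for k, s in enumerate(row)), factorial(n))
    print(mean, sum(Fraction(1, j) for j in range(1, n + 1)))
    # Finite-n p.g.f. against the limit v * exp(-(1 - v)) at v = 1/2.
    print(float(pgf(cayley_forest_law(200), Fraction(1, 2))), 0.5 * exp(-0.5))
```

Exact fractions are used so that the normalisation and the harmonic-number identity can be compared with equality rather than up to rounding; only the comparison with the limiting p.g.f. is done in floating point.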

References

  1. Pitman, J. Enumerations of trees and forests related to branching processes and random walks. In Microsurveys in Discrete Probability; Aldous, D., Propp, J., Eds.; American Mathematical Society: Providence, RI, USA, 1998; pp. 163–180.
  2. Neveu, J. Arbres et processus de Galton-Watson. Ann. Inst. Henri Poincaré Probab. Stat. 1986, 22, 199–207.
  3. Stanley, R.P. Chapter 5. In Enumerative Combinatorics; Cambridge University Press: Cambridge, UK, 1999; Volume 2.
  4. Surya, E.; Warnke, L. Lagrange Inversion Formula by Induction. Am. Math. Mon. 2023, 130, 944–948.
  5. Roch, S. Branching Processes. 2021. Available online: https://people.math.wisc.edu/~roch/mdp/roch-mdp-chap6.pdf (accessed on 24 November 2024).
  6. Flory, P.J. Molecular size distribution in three-dimensional polymers, I Gelation. J. Am. Chem. Soc. 1941, 63, 3083–3090.
  7. Flory, P.J. Molecular size distribution in three-dimensional polymers, II Trifunctional branching units. J. Am. Chem. Soc. 1941, 63, 3091–3096.
  8. Stockmayer, W.H. Theory of molecular size distribution and gel formation in branched chain polymers. J. Chem. Phys. 1943, 11, 45–55.
  9. Simkin, M.V.; Roychowdhury, V.P. Re-inventing Willis. Phys. Rep. 2011, 502, 1–35.
  10. Flajolet, P.; Sedgewick, R. Analytic Combinatorics; Illustrated Edition; Cambridge University Press: Cambridge, UK, 2009.
  11. Harris, T.E. The Theory of Branching Processes; Die Grundlehren der Mathematischen Wissenschaften, Bd. 119; Springer: Berlin/Heidelberg, Germany; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1963.
  12. Garcia-Millan, R.; Font-Clos, F.; Corral, Á. Finite-size scaling of survival probability in branching processes. Phys. Rev. E 2015, 91, 042122.
  13. Corral, Á.; Garcia-Millan, R.; Font-Clos, F. Exact Derivation of a Finite-Size Scaling Law and Corrections to Scaling in the Geometric Galton-Watson Process. PLoS ONE 2016, 11, e0161586.
  14. Meir, A.; Moon, J.W. On the altitude of nodes in random trees. Can. J. Math. 1978, 30, 997–1015.
  15. Drmota, M. Random Trees: An Interplay Between Combinatorics and Probability; Springer: Wien, Austria; New York, NY, USA, 2009.
  16. Cramér, H. Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Sci. Ind. 1938, 736, 5–23.
  17. Burden, C.J.; Simon, H. Genetic drift in populations governed by a Galton-Watson branching process. Theor. Popul. Biol. 2016, 109, 63–74.
  18. Karlin, S.; McGregor, J. Direct product branching processes and related Markov chains. Proc. Nat. Acad. Sci. USA 1964, 51, 598–602.
  19. Tutte, W.T. The Number of Planted Plane Trees with a Given Partition. Am. Math. Mon. 1964, 71, 272–277.
  20. Kreweras, G. Sur les partitions non croisées d’un cycle. Discret. Math. 1972, 1, 333–350, English translation by Berton A. Earnshaw: On the Non-crossing Partitions of a Cycle. 2005. Available online: https://users.math.msu.edu/users/earnshaw/research/kreweras.pdf (accessed on 24 November 2024).
  21. Tanner, J.C. A derivation of the Borel distribution. Biometrika 1961, 48, 222–224.
  22. Comtet, L. Analyse Combinatoire—Tome 1; Presses Universitaires de France: Paris, France, 1970.
  23. Rényi, A. Some Remarks on the Theory of Trees. Magyar Tud. Akad. Mat. Kutató Int. Közl. 1959, 4, 73–85.
  24. Bergeron, F.; Flajolet, P.; Salvy, B. Varieties of increasing trees. In CAAP '92; Raoult, J.C., Ed.; Lecture Notes in Computer Science; 1992; Volume 581, pp. 24–48.
  25. Panholzer, A.; Prodinger, H. Level of nodes in increasing trees revisited. Random Struct. Algorithms 2007, 31, 203–226.
  26. Najock, D.; Heyde, C.C. On the Number of Terminal Vertices in Certain Random Trees with an Application to Stemma Construction in Philology. J. Appl. Probab. 1982, 19, 675–680.
  27. Takács, L. On Cayley’s formula for counting forests. J. Comb. Theory Ser. A 1990, 53, 321–323.
  28. Clarke, L.E. On Cayley’s Formula for Counting Trees. J. Lond. Math. Soc. 1958, 33, 471–474.
  29. Mahmoud, H.; Smythe, R.T.; Szymanski, J. On the structure of random plane-oriented recursive trees and their branches. Random Struct. Algorithms 1993, 4, 151–176.
  30. Pitman, J. Combinatorial Stochastic Processes, Lecture Notes in Mathematics 1875; Springer: Berlin/Heidelberg, Germany, 2006.
  31. Callan, D. A combinatorial survey of identities for the double factorial. arXiv 2009, arXiv:0906.1317.
