Abstract
We consider two specific families of binomial trees and forests: simply generated binomial d-ary trees and forests versus their increasing phylogenetic version, with tree nodes in increasing order from the root to any of its leaves. The analysis (both pre-asymptotic and asymptotic) covers some of the main statistical features of their total progenies. We take advantage of the fact that the random distributions of those trees are obtained by weighting the counts of the underlying combinatorial trees. Finally, we briefly stress a rich alternative randomization of combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones.
Keywords:
simply generated and increasing binomial trees and forests; total progeny; generating functions; Lagrange inversion formula; structural statistics; partition structures; combinatorial probability
MSC:
60C05; 60E05; 60J80; 60J85
1. Introduction
The main purpose of this paper is to present explicit and asymptotic methods to count two kinds of random trees: Bienaymé–Galton–Watson (BGW) trees and increasing random trees, both generated by a binomial branching mechanism. The limiting Poisson case will also be briefly treated. We will analyze some structural properties of these Markov chains on the non-negative integers. The analysis chiefly concerns the number of leaves of these trees, the number of trees entering in the composition of a size-n forest of such trees, the joint sizes of the constitutive trees in a forest of k trees, and the one-dimensional marginal size of a typical tree, both at fixed n, k and in the limit n, k → ∞. The probability of observing trees with a given sequence of node outdegrees is also investigated. In all cases, the use of generating functions is an essential ingredient. Explicit formulae can sometimes be derived with the help of Lagrange’s inversion formulae, which take different expressions for the two processes under concern. On the other hand, the singularity analysis of generating functions may lead to asymptotic formulae describing large trees. The random trees under study consist of weighted versions of combinatorial trees; this will be highlighted.
We shall study the following models:
- Binomial BGW trees and forests appearing in branching population models, percolation on trees and branched polymers.
- Binomial increasing trees as recursive trees appearing in phylogeny, with nodes in increasing order along any path from the root to the leaves: the leaves of a size-n tree are species that can mutate to another species when a new atom n+1 is added, whereas the internal nodes consist of the species that can produce a new species in the process. Being always supercritical, d-ary increasing trees do not exhibit a phase transition at criticality. A forest of such trees consists of different population genera. The recursive structure is an essential ingredient, shared by the limiting Poisson increasing tree.
Such random trees and forests are constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing, and observing that the weights only depend on the tree size and not on its full outdegree sequence.
Deep relations between enumeration problems for trees and forests and skip-free to the left random walks can be found in [1]. That work rests on a different, uniform way of constructing random forests arising from enumerative combinatorics, leading to very rich asymptotic behaviors. This point is briefly addressed and illustrated in Appendix A.
2. Number of Atoms and Leaves in a Size-n Simple Tree
We can distinguish two main types of random trees, namely the following:
- Ordered (or plane) trees: They are so called because one can draw the tree in the half-plane so that the children of every parent are ordered from left to right, say, from the youngest child to the eldest one. Embeddings obtained from cyclic rotations of the sub-trees around the root are not allowed.
Such trees are amenable to the Ulam–Harris–Neveu ordering of their nodes (horizontal ordering holds) and they can be represented as strata with the founder on top and the successive layers below, [2]. Given that an individual of the population at generation h is labeled by vertex (as a concatenation of h positive integers) and gives birth to daughters, its offspring are labeled by . Each individual at generation h thus obtains a concatenated label for which label is the one of its mother, , the one of its grandmother,…, up to ∅, the conventional label of the root. Such ordered trees are Bienaymé–Galton–Watson (in short BGW) trees in the theory of branching processes; they are also called simply generated (for short, here, simple) trees.
- Increasing random trees: A size-n rooted and increasing labeled tree has vertices with indices or labels increasing along any path from the root to its leaves. In contrast with simple random trees, increasing random trees do not exhibit a phase transition at criticality. For some classes of special branching mechanisms (binomial, Poisson, or negative binomial), such trees can be recursively defined, which proves useful in their analysis.
2.1. Simply Generated Random Trees
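By recursion from the root (the unique founder of the tree), the probability of a simply generated (for short, simple) size-$n$ rooted tree generated by the local probability generating function (p.g.f.) $\phi(z)$ (with non-negative coefficients $\pi_b = [z^b]\phi(z)$, $b \ge 0$, summing to 1) is obtained as
$$\mathbf{P}(N = n) = [z^n]\,\Phi(z),$$
where $\Phi(z)$ solves the functional equation $\Phi(z) = z\,\phi(\Phi(z))$, $\Phi(0) = 0$. The Lagrange inversion formula states that for all $n \ge 1$,
$$[z^n]\,\Phi(z) = \frac{1}{n}\,[z^{n-1}]\,\phi(z)^n.$$
A more general form of the Lagrange inversion formula states that (with $h'$ denoting the derivative of $h$)
$$[z^n]\,h(\Phi(z)) = \frac{1}{n}\,[z^{n-1}]\,\big(h'(z)\,\phi(z)^n\big)$$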
for any arbitrary analytic output function h. See [3,4]. If is a p.g.f., is the p.g.f. of the total progeny of a branching process generated by with a random number of founders, say for which
If , where v ‘marks’ the number of distinguishable trees in a forest of simple random trees generated by , with
This consists of a degree-n polynomial in v. The double generating function may be viewed as the ‘grand-canonical’ partition function. In that case,
where is the number of distinguishable trees forming the size forest. consists of a ‘sequence’ of trees forming the forest, with v ‘marking’ their number.
Should the trees be indistinguishable, the same formulae hold but now with consisting of a ‘set’ of trees forming the forest, so
One important branching mechanism is the binomial one. Other important examples of are (mean- Poisson) or (geometric) and some of them will be mentioned in the course of the analysis when needed.
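The Lagrange inversion formula of this section can be sanity-checked with exact arithmetic. The following is a minimal sketch, assuming the notation phi for the offspring p.g.f. (a polynomial given by its coefficient list) and Phi for the solution of Phi(z) = z phi(Phi(z)); all function names and the numerical values d = 3, p = 1/4 are illustrative choices. It builds the truncated power series of Phi by fixed-point iteration and compares each coefficient with (1/n)[z^(n-1)] phi(z)^n.

```python
from fractions import Fraction

def poly_mul(a, b, order):
    """Product of two coefficient lists, truncated at degree `order`."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out

def poly_pow(a, n, order):
    """n-th power of a coefficient list, truncated at degree `order`."""
    out = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(n):
        out = poly_mul(out, a, order)
    return out

def compose(phi, series, order):
    """phi(series), truncated at degree `order`; phi is a polynomial."""
    out = [Fraction(0)] * (order + 1)
    power = [Fraction(1)] + [Fraction(0)] * order
    for coeff in phi:
        for i in range(order + 1):
            out[i] += coeff * power[i]
        power = poly_mul(power, series, order)
    return out

def progeny_series(phi, order):
    """Coefficients of Phi solving Phi(z) = z * phi(Phi(z)), Phi(0) = 0."""
    series = [Fraction(0)] * (order + 1)
    for _ in range(order):  # each pass fixes one more coefficient
        inner = compose(phi, series, order)
        series = [Fraction(0)] + inner[:order]
    return series

def lagrange_coefficient(phi, n):
    """(1/n) [z^(n-1)] phi(z)^n."""
    return poly_pow(phi, n, n - 1)[n - 1] / n

# Illustrative binomial offspring law bin(3, 1/4): phi(z) = (q + p z)^3.
p = Fraction(1, 4)
q = 1 - p
phi = [q**3, 3 * p * q**2, 3 * p**2 * q, p**3]
series = progeny_series(phi, 6)
```

Because the arithmetic is rational, the two sides agree exactly rather than up to rounding; the same helpers work for any polynomial offspring p.g.f.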
2.2. The Binomial Case
We here focus on , and (the binomial branching mechanism) corresponding to the probability mass function (pmf) , (for which and ): is the probability of an offspring having outdegree (branching number) b. For this BGW model, each mother particle alive can give birth to at most d daughters, or possibly none. With bin the random number of offspring per capita, : each active node independently possibly activates any of its d descendants. The produced random tree is a sub-tree of the full d-ary (d-dimensional) tree having atoms at generation It is a model of percolation on trees (see [5], p. 438).
The random variable has mean its distribution is unimodal with mode at the origin if and only if If , the mode is near
Remarks and related models.
The latter model is also related to the Flory–Stockmayer binomial model of randomly branched polymers with degree- functional monomers. See [6,7] if , [8] for any integer d and also [9]: in this model with one founder , the obtained above from the bin generating model is in fact the p.g.f. of first-generation polymers. Here, each monomer with d functional units (arms) is identified to a node of a BGW tree. Independently of one another, each of the d functional units has a probability p to be attached to a second-generation functional unit and so on. At generation 0 however, a seed monomer with full functional units gives birth to a random number (so with distribution bin) of the first generation of such polymers, all with p.g.f. . The true size of the Flory branched polymer has thus the distribution given by the modified p.g.f.
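$$\widetilde{\Phi}(z) = z\,\big(q + p\,\Phi(z)\big)^{d+1},$$
where $\Phi(z)$ denotes the p.g.f. of a first-generation polymer and $q = 1 - p$.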
This translates the fact that the seed monomer can have up to activated functional units, whereas all its descendants are only up to d, the first and subsequent generation of trees growing away from the seed monomer, thereby presenting only d possible free arms. In the supercritical case with , there is a positive probability that the Flory tree (polymer) is a giant one with infinitely many monomers (the gelation transition). With , the total number of monomers with one founder, we have and, for , by Lagrange inversion formula
The leaves of the polymer tree (its nodes with outdegree zero) constitute its external boundary, where contacts with the reactants are likely to occur. As such, an estimate of their number is of interest when dealing with such size-n branched polymers.
It is also of interest to consider a polymer soup as a collection of k such independent polymers with a given size.
With solving and defining
we obtain that solves
corresponding to the branching mechanism . So is the p.g.f. of the total progeny, say , of a branching process whose offspring per capita is either d with probability p or 0 with probability , so all or nothing. We clearly have only for those , (the number of tree branches being multiple of , so is well defined together with . We can deduce the main properties of the new model generated by from the previous one generated by the binomial :
If , let
When , is a weak version of , allowing for an empty tree with
solves , It is the shifted p.g.f. of also taking values in
Note that as a Bernoulli-thinned version of , is not an output of If, while deleting a productive node with probability , the ancestral branch leading to that node is equivalently erased, a pruning operation of the tree results in the formation of disconnected sub-trees rooted at this node. Assuming the root to be active, the size of the pruned tree descending from the root has p.g.f. , clearly solving
This is the progeny of a BGW tree with the thinned branching mechanism
As , the binomial branching mechanism approaches the Poisson p.g.f., leading to random Cayley trees.
The total progeny of Bellman–Harris trees, for which each splitting individual alive has an exponential lifetime independent of those of its sister particles, coincides with that of discrete-time BGW trees as long as they share the same branching mechanism. An overlap of generations results.
State-space representations. The binomial BGW process is a Markov chain on the non-negative integers with the transition matrix
in the case of a single founder.
With , , the p.g.f. of the number of individuals, say , alive at step h (the generation number), obeying ,
is the h-step transition matrix, involving the h-iterate of the degree-d polynomial (as a degree polynomial). Recall that is the probability that the process dies out before step so the probability that the time to extinction of the unique founder occurs before generation h: .
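These objects are easy to tabulate. The sketch below assumes the binomial mechanism phi(z) = (q + p z)^d with q = 1 - p (the function names are illustrative). Since i independent mothers produce a bin(d i, p) number of daughters, row i of the one-step transition matrix is a binomial pmf, and the probability of extinction by generation h is the h-th iterate of phi evaluated at 0.

```python
from math import comb

def transition_row(i, d, p, size):
    """P(i, j) = [z^j] (q + p z)^(d i): a bin(d*i, p) pmf, for 0 <= j < size."""
    q = 1.0 - p
    return [comb(d * i, j) * p**j * q**(d * i - j) if j <= d * i else 0.0
            for j in range(size)]

def extinct_by(h, d, p):
    """Probability that a single founder's line is extinct by generation h."""
    q = 1.0 - p
    z = 0.0
    for _ in range(h):
        z = (q + p * z) ** d  # one more iterate of phi, starting from 0
    return z
```

State 0 is absorbing (row 0 is a unit mass at 0), and `extinct_by(h, d, p)` is non-decreasing in h.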
Total progeny. By the Lagrange formula, with , where is the total progeny of this branching process with one founder and overlapping generations, with
By the Stirling formula, with for large n,
Note that only when (the critical case with a pure power law with exponent ).
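As a numerical illustration, Lagrange inversion applied to phi(z) = (q + p z)^d gives the single-founder total progeny law P(N = n) = (1/n) C(dn, n-1) p^(n-1) q^(dn-n+1). The sketch below evaluates it in log-space; this closed form and the constant 1/sqrt(pi) for d = 2 follow from the binomial theorem and the Stirling formula and are stated here as assumptions about the elided displays, and the function names are illustrative.

```python
from math import exp, lgamma, log

def log_binom(a, b):
    """log C(a, b) via log-gamma, stable for large arguments."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def progeny_pmf(n, d, p):
    """P(N = n) = (1/n) C(dn, n-1) p^(n-1) q^(dn-n+1), single founder."""
    q = 1.0 - p
    return exp(log_binom(d * n, n - 1)
               + (n - 1) * log(p)
               + (d * n - n + 1) * log(q)
               - log(n))

# Subcritical example (d p = 0.6): the law is proper with mean 1 / (1 - d p).
d, p = 3, 0.2
mass = sum(progeny_pmf(n, d, p) for n in range(1, 2001))
mean = sum(n * progeny_pmf(n, d, p) for n in range(1, 2001))

# Critical binary example (d = 2, p = 1/2): n^(3/2) P(N = n) is close to 1/sqrt(pi).
scaled_tail = 10_000 ** 1.5 * progeny_pmf(10_000, 2, 0.5)
```

In the subcritical example the truncated sums are within rounding error of 1 and 2.5, while the critical example exhibits the pure power law with exponent 3/2.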
This result is a particular illustration of a more general situation. Consider indeed the general branching mechanisms having all its moments (equivalently for which Then, a unique positive real root to the equation
exists, with if (), and if
The point is the tangency point to the curve of a straight line passing through the origin . Let then . The searched solves where obeys and Thus, else (a branch-point singularity). It follows that displays a dominant power singularity of the order of at with in the sense (with )
By singularity analysis therefore, see [10], we obtain [in agreement with [11], Theorem 13.1, p. 32]:
to the dominant order in n, with a geometric decay term at rate and a ‘universal’ power-law decay term (under the moment conditions on ).
In the critical case , and is a pure power law with a ‘universal’ value of .
In the case of the binomial branching mechanism, one can check that in agreement with (10). In this context as well, plays an important role.
The binomial model is supercritical if and only if (else ), in which case the smallest solution to lies in This is a degree-d algebraic equation. In this case, , the extinction probability of the binomial branching process.
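The extinction probability can be approximated by iterating the branching mechanism from 0, which converges monotonically to the smallest fixed point in [0, 1]. A minimal sketch, assuming phi(z) = (q + p z)^d with q = 1 - p (the function name and tolerances are illustrative):

```python
def extinction_probability(d, p, tol=1e-15, max_iter=100_000):
    """Smallest root in [0, 1] of z = (q + p z)^d, by iterating from z = 0."""
    q = 1.0 - p
    z = 0.0
    for _ in range(max_iter):
        z_next = (q + p * z) ** d
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z
```

For d = 2 the fixed-point equation is quadratic with roots 1 and (q/p)^2, so the supercritical case p = 3/4 should return 1/9. Convergence slows down markedly near criticality, where the derivative of phi at the fixed point approaches 1.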
- If (), the model is critical (subcritical) and . We have
  Critical trees are finite with probability 1, but their time to extinction is long, with a law tail equivalent to that of a Pareto(1) power-law distribution.
In the subcritical case, has all its moments, in particular, a finite variance .
- If , the model is supercritical and together with . We also have
  the mean number of atoms of the binomial branching process conditioned on extinction, with modified branching mechanism having a mean number of offspring From the above expression of , we obtain , an upper bound of
From (9), is an expression of
There is also an estimate of when the BGW process is nearly supercritical ( slightly above 1). Let be the survival probability and , with
where is the variance in at the criticality. We have
As a result of
we obtain the small survival probability estimate when the BGW process is nearly supercritical. As a function of , is always continuous at 0 ( if ) but with a discontinuous slope at , close to . As clearly
A full power-series expansion of in terms of can also be obtained as follows: define by , so with The equation becomes
For close to , the Lagrange inversion formula gives
Note with when is slightly above 1. To the first order in , we recover . The second-order coefficient is found to be Let us detail these formulae in the binomial example.
Example 1.
If , (the binomial case) and . Here , giving in principle, starting with ▹
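The small survival probability regime can be explored numerically. The sketch below assumes the classical first-order estimate 2(mu - 1)/sigma^2, with mu = d p the mean offspring number and sigma^2 = 1 - 1/d the binomial offspring variance at criticality; it compares this estimate with the survival probability obtained by iterating phi(z) = (q + p z)^d. The function names and the iteration count are illustrative.

```python
def survival_probability(d, p, iterations=200_000):
    """1 minus the smallest fixed point of z = (q + p z)^d, by iteration from 0."""
    q = 1.0 - p
    z = 0.0
    for _ in range(iterations):
        z = (q + p * z) ** d
    return 1.0 - z

def first_order_estimate(d, p):
    """2 (mu - 1) / sigma^2 with sigma^2 = 1 - 1/d, the critical variance."""
    mu = d * p
    critical_variance = 1.0 - 1.0 / d
    return 2.0 * (mu - 1.0) / critical_variance

# Slightly supercritical ternary example: mu = 1.01.
exact = survival_probability(3, 1.01 / 3)
approx = first_order_estimate(3, 1.01 / 3)
```

In the binary case the survival probability is available in closed form, (2p - 1)/p^2, which gives an independent check of the iteration.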
From the exact expression of as in [12,13], we observe the following finite-size scaling law in the slightly supercritical regime for which and , [, the critical variance of when :
As in the strictly critical regime, the time to extinction has power-law tails with index 1, but with a non-constant asymptotic rate .
Regular supercritical or subcritical BGW processes conditioned to be critical. We end up with a last conditioning leading to a critical BGW tree with mean offspring number Let , regular, obey which has a convergence radius of (possibly ) and . For such ’s, a unique positive real root to the equation
exists, with if (), if and if (). In both cases,
Start with a supercritical branching process () and consider a process whose modified branching mechanism is , satisfying and , the one of a critical branching process with mean 1 offspring distribution and variance: . The transition matrix of the critical process is given by its entries
In terms of p.g.f., with , where is the total progeny’s supercritical generating function (g.f.), solving , ,
The new has a new shifted convergence radius 1 (obeying ) and with ().
This transformation kills the supercritical paths to only select the critical ones.
When and . Hence,
is the critical binomial branching mechanism.
Similarly, starting with a subcritical branching process () and considering a process whose modified branching mechanism (as a p.g.f.) is , satisfying and , the one of a critical branching process.
This transformation creates critical paths from the subcritical ones.
2.3. Random Trees as Weighted Combinatorial Trees
Consider the plane tree ordinary generating function (o.g.f.) solving , with counting the number of such combinatorial binomial size trees. By the Stirling formula, with for large n,
Let be the (multiplicative) weight of an unlabeled rooted tree with n nodes () having nodes with outdegree (branching number) The weight is the product over the n nodes x of of the ’s, where is the outdegree of x. Then, is the weight of all size-n such trees associated to the weight sequence w so with , if, as in the binomial case, and the number of these trees is Let . Then, solves where Recalling and (the total tree length), each tree has equal weight and
a separable form in agreement with (9). Simply generated weighted trees are weighted versions of such rooted trees and were introduced in [14]. When dealing with k-forests of size n, owing now to
then and
where is the number of combinatorial binomial size forests with k-trees.
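The counts of combinatorial binomial trees and forests can be checked against a direct expansion. The sketch below assumes the combinatorial generating function T(z) = z (1 + T(z))^d, so that each node owns d ordered slots, each empty or carrying a subtree, and compares the brute-force coefficients with the Lagrange closed form (k/n) C(dn, n-k) for forests of k ordered trees. Function names are illustrative.

```python
from math import comb

def truncated_product(a, b, order):
    """Product of two integer coefficient lists, truncated at degree `order`."""
    out = [0] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out

def tree_counts(d, max_size):
    """t[n] = number of size-n trees, from T(z) = z (1 + T(z))^d."""
    t = [0] * (max_size + 1)
    for _ in range(max_size):          # each pass fixes one more coefficient
        slot = [1] + t[1:]             # one slot: empty, or a subtree
        power = [1] + [0] * max_size
        for _ in range(d):
            power = truncated_product(power, slot, max_size)
        t = [0] + power[:max_size]
    return t

def forest_counts(d, k, max_size):
    """f[n] = number of size-n forests made of k ordered trees."""
    t = tree_counts(d, max_size)
    f = [1] + [0] * max_size
    for _ in range(k):
        f = truncated_product(f, t, max_size)
    return f

def forest_formula(n, k, d):
    """(k/n) C(dn, n-k), valid for n >= k >= 1."""
    return k * comb(d * n, n - k) // n
```

For d = 2 and k = 1 the counts are the Catalan numbers 1, 2, 5, 14, 42, and for general d they are the Fuss–Catalan-type numbers (1/n) C(dn, n-1).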
2.4. Selection of Paths Mechanisms of Random Trees: Rescaling
With , consider the weight sequence for which . Then,
is the weighted version of . Equivalently,
solving where is the modified ‘branching mechanism’ and not necessarily a p.g.f. It is a p.g.f. when [if in addition , this is the selection of the critical paths mechanism discussed above; if in addition , this is the selection of the subcritical paths mechanism discussed above].
If and , resulting in a weighted version of with shifted convergence radius [ if in addition ]. Note that no longer is a branching mechanism if is one with .
If , is the modified ‘branching mechanism’, not necessarily a p.g.f. unless . Then, , resulting in a scaled version of with an unmodified convergence radius. If , choosing yields
2.5. Total Number of Leaves (Sterile Individuals) Versus Total Progeny
In the branching population models just discussed, it is important to control the number of leaves in the BGW tree with a single founder because leaves are nodes (individuals) of the tree (population) that gave birth to no offspring (the frontier of the tree as sterile individuals) and so are responsible for its extinction. Leaves are nodes with outdegree zero, so let be the number of leaves in a BGW tree with nodes. With the joint p.g.f. of solves the functional equation
With , we have
where . It is shown using this in [15], Th. , page 84, that, under our assumptions of ,
As , converges in probability to , the asymptotic fraction of nodes in a size-n tree which are leaves. For the Geo0 generated tree with , it can be checked that , whereas for the Poisson generated tree with p.g.f. , For the negative binomial tree generated by , , and for the Flory d-ary tree generated by the p.g.f. ,
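For the binomial mechanism the convergence of the leaf fraction can be observed on exact values. The sketch below relies on a formula obtained here from Lagrange inversion (an assumption, not a quotation of the elided display): given size n, the mean number of leaves is n C(d(n-1), n-1) / C(dn, n-1), which does not depend on p, and the candidate limiting fraction is (1 - 1/d)^d, the mass at 0 of the critical bin(d, 1/d) law. Function names are illustrative.

```python
from fractions import Fraction
from math import comb

def mean_leaves(n, d):
    """E[number of leaves | size n] = n C(d(n-1), n-1) / C(dn, n-1)."""
    return Fraction(n * comb(d * (n - 1), n - 1), comb(d * n, n - 1))

def limiting_leaf_fraction(d):
    """(1 - 1/d)^d, the outdegree-0 mass of the critical bin(d, 1/d) law."""
    return (1 - 1 / d) ** d

# Ternary example: the fraction of leaves in a large tree approaches (2/3)^3.
fraction_at_2000 = float(mean_leaves(2000, 3)) / 2000
```

Small cases can be verified by hand in the binary case: the five size-3 trees are one root with two children (two leaves) and four chains (one leaf each), giving a mean of 6/5.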
One possible way to see this is as follows.
With , the m-falling factorial moments of are given from Lagrange inversion formula by
When and for , a large n estimate using Stirling formula yields
independent of The variance estimate follows after some elementary algebraic computations dealing with
When , as a result of
Remark 1.
Taking in (22), solves the functional equation , with , where is the number of leaves of the tree regardless of its precise number of atoms. By the Lagrange inversion formula, the probability to observe leaves under is
2.6. Forests
Consider now a k-forest of such trees (so with k founders). It takes into account the possibility that there are k independent distinguishable copies of BGW trees, each with a single founder. By the Lagrange formula, with , the size of a forest given that it has k founders, consistently with (4), we have
In the critical case (), In this case, the BGW process has a constant () population size on average over the generations. The extinction probability of a forest is
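The forest size law can be cross-checked against its definition as a sum of k independent single-founder progenies. The sketch below assumes the closed form P(N(k) = n) = (k/n) C(dn, n-k) p^(n-k) q^(dn-n+k) delivered by the general Lagrange formula for phi(z) = (q + p z)^d, and compares it, in exact rational arithmetic, with the k-fold convolution of the k = 1 law. Function names and the parameter values are illustrative.

```python
from fractions import Fraction
from math import comb

def forest_pmf(n, k, d, p):
    """P(N(k) = n) = (k/n) C(dn, n-k) p^(n-k) q^(dn-n+k), for n >= k >= 1."""
    q = 1 - p
    return Fraction(k, n) * comb(d * n, n - k) * p ** (n - k) * q ** (d * n - n + k)

def k_fold_convolution(pmf, k):
    """Law of the sum of k independent copies; pmf[n] is indexed by size n."""
    size = len(pmf)
    out = [Fraction(1)] + [Fraction(0)] * (size - 1)
    for _ in range(k):
        nxt = [Fraction(0)] * size
        for i, a in enumerate(out):
            for j, b in enumerate(pmf):
                if i + j >= size:
                    break
                nxt[i + j] += a * b
        out = nxt
    return out

d, p, k, max_size = 3, Fraction(1, 4), 3, 9
single = [Fraction(0)] + [forest_pmf(n, 1, d, p) for n in range(1, max_size + 1)]
convolved = k_fold_convolution(single, k)
```

Passing p as a `Fraction` keeps every quantity exact, so the comparison is an identity check rather than a floating-point approximation.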
By the Stirling formula, with ,
As a result, with , in the thermodynamic limit while , , the number of connected components (trees) given a size forest population obeys
Hence, the number of trees forming a size forest obeys
where and . We have
The function is convex over . In the subcritical regime (), it has a minimum at , with translating that , almost surely as .
Similarly, with , the number of atoms in a k-forest, with
With and
(convex over the domain ), by Cramér’s theorem [16],
The rate function is the Legendre transform of the convex free energy Therefore,
with
As required, as , and
In the subcritical case, a Central Limit Theorem (CLT) holds:
In the binary case (),
In this case,
Note when In the subcritical case, with , translating , almost surely as .
In the critical case, is heavy tailed with tail index ; we thus expect that (a stable random variable) and therefore that ().
In the supercritical case, with
Remark 2.
(i) The process has stationary independent increments. The processes and are mutual inverses in that
The process is a renewal process with times elapsed between consecutive moves up by one unit all distributed like :
(ii) Considering the weak case with substituted for , the Lagrange inversion formula yields
In the upper binomial term, n is changed to (). By the Stirling formula, a new is obtained upon substituting for ρ in (28). Note now the new
Maxwell–Boltzmann partition of n into k parts.
Let be the joint total progenies of each of the k founders. With , we have
where sums to n. It is an exchangeable Maxwell–Boltzmann balls-in-boxes distribution (independent of ) on the simplex , where the balls consist of the progenies of each founder (the boxes).
The size of a typical (1-dimensional marginal) box occupancy is given by
With , summing over , the pmf is also seen non-defective and proper. Clearly,
A large population thermodynamic limit exists ( with ), with having the mean- limiting distribution given, from a saddle-point analysis, by
where solves
For all , is uniquely defined firstly because is increasing and because, as observed before, when , and with
Strict d-partition of n into k parts.
So far, as in [17], we allow the population size to vary stochastically according to a Galton–Watson branching process, possibly with a constant size on average as in the critical case. However, most population genetics studies have their origins in a Wright–Fisher or some closely related fixed-population model, in which each individual randomly chooses its ancestor [18]. We briefly describe the situation relative to the binomial branching mechanism, in which the process is strictly (almost surely) with constant population size k over the generations. Consider k independent and identically distributed random variables each with binomial distribution . Let Then, with
recalling and This distribution is independent of p and As a consequence, we obtain the identity
The distribution of is exchangeable, the law of each component being
When , Equation (33) reduces to
the multinomial Wright–Fisher distribution. Asymptotic independence is obtained when in which the law of takes the product form of independent and identically distributed mean Poisson distributions.
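The fixed-population conditioning can be illustrated on small cases. The sketch below assumes that the law of k independent bin(d, p) offspring numbers, conditioned on summing to k, is the product of C(d, b_i) divided by C(dk, k) (so that p drops out, by the Vandermonde identity), with one-dimensional marginal C(d, b) C(d(k-1), k-b) / C(dk, k), and that the large-d limit is the symmetric multinomial Wright–Fisher law. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

def joint_law(b, d):
    """P(B_1 = b_1, ..., B_k = b_k | sum = k) for iid bin(d, p) offspring."""
    k = len(b)
    if sum(b) != k:
        return Fraction(0)
    return Fraction(prod(comb(d, bi) for bi in b), comb(d * k, k))

def marginal_law(b1, k, d):
    """Law of one component: C(d, b1) C(d(k-1), k-b1) / C(dk, k)."""
    return Fraction(comb(d, b1) * comb(d * (k - 1), k - b1), comb(d * k, k))

def multinomial_limit(b):
    """Symmetric multinomial law k! / (prod b_i!) k^(-k)."""
    k = len(b)
    return Fraction(factorial(k), prod(factorial(bi) for bi in b) * k**k)

k, d = 3, 2
configurations = list(product(range(k + 1), repeat=k))
total = sum(joint_law(b, d) for b in configurations)
```

Since `math.comb(d, b)` is 0 when b exceeds d, configurations violating the outdegree cap automatically receive zero mass.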
2.7. Random Simple Trees with Given Outdegree Sequences
The joint generating function of simple trees with given outdegree sequence solves
Here, marks the nodes with the different outdegrees. Hence,
with the ’s obeying
As a result: and
There are , such non-negative ’s satisfying the first constraint (as a weak composition). In the sequel, we shall use the symbol whenever summing over the obeying the two constraints (34) above. Clearly, the number of non-negative integers solving (34) is given by
It is the number of unordered partitions of into no more than d non-negative parts, the number of occurrences of part b in a partition being with
The joint generating function (p.g.f.) of their nodes and leaves in particular reads
hence with
Therefore, the probability of a configuration with n atoms and leaves is
in view of
From (35), given n and , the number of d-trees having a node outdegree sequence satisfying (34) (as the factor in front of the weights ) is
in agreement with [19,20] Theorem 4.
All this can prove useful and explicit in the following situation: suppose we are interested in a specific set of ’s, from which the values of follow from (34). Then, the corresponding number of d-trees is known, together with the probability of such a configuration. For example, suppose for Then, and , and the probability of this uniform configuration is
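The enumeration by outdegree sequence can be tested by summing over all admissible sequences. The sketch below assumes the classical count (1/n) n! / (n_0! ... n_d!) times the product of C(d, b)^(n_b) for binomial d-trees with n_b nodes of outdegree b, under the two constraints that the n_b sum to n and the b n_b sum to n - 1; summing over admissible sequences should then return (1/n) C(dn, n-1). Function names are illustrative.

```python
from math import comb, factorial, prod

def admissible_sequences(n, d):
    """Yield (n_0, ..., n_d) with sum n_b = n and sum b * n_b = n - 1."""
    def rec(b, nodes_left, edges_left):
        if b == 0:
            if edges_left == 0:
                yield (nodes_left,)
            return
        for nb in range(min(nodes_left, edges_left // b) + 1):
            for rest in rec(b - 1, nodes_left - nb, edges_left - b * nb):
                yield rest + (nb,)
    yield from rec(d, n, n - 1)

def trees_with_outdegrees(seq, d):
    """(1/n) * multinomial(n; n_0..n_d) * prod_b C(d, b)^(n_b)."""
    n = sum(seq)
    multinomial = factorial(n)
    for nb in seq:
        multinomial //= factorial(nb)
    weight = prod(comb(d, b) ** nb for b, nb in enumerate(seq))
    return multinomial * weight // n

def total_trees(n, d):
    """Sum of the counts over all admissible outdegree sequences."""
    return sum(trees_with_outdegrees(s, d) for s in admissible_sequences(n, d))
```

In the binary case with n = 3, the sequence (2, 0, 1) corresponds to the single tree whose root has two children, and (1, 2, 0) to the four chains.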
Remark 3.
In the binary case , for fixed values of we have and (a single possible choice for ); if , there are no solutions to (34). Hence,
In addition, while summing over
in agreement with (9). Given , there are
such binary simple trees. If , yields There are no binary trees with , .
As an extension, given n and , the number of k-forests of simple d-trees having a node outdegree sequence is
Here now satisfy
This is in agreement with Theorem 4 of [1].
2.8. The Limiting Poisson Case ()
We here mention some related computations encompassing the limiting Poisson case. Let solve with and hence with solution
with for
counts the number of labeled Cayley simple rooted trees (the Cayley formula). The convergence radius of is
The g.f. of random Poisson rooted simple trees solves the functional equation with and . Hence,
with convergence radius . Then,
is the Borel distribution. With , the weight of a node with outdegree b, the weight of a size tree is
independent of the ’s. Hence, , a separable form.
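A numerical sketch of the Borel law and of its link with the Cayley count follows. It assumes the standard forms P(N = n) = exp(-lam n) (lam n)^(n-1) / n! for a mean-lam Poisson mechanism and n^(n-1) for the number of rooted labeled trees, so that each size-n tree carries the same weight lam^(n-1) exp(-lam n) / n!. The symbol lam and the function names are illustrative.

```python
from math import exp, factorial, lgamma, log

def borel_pmf(n, lam):
    """P(N = n) = exp(-lam n) (lam n)^(n-1) / n!, n >= 1, computed in log-space."""
    return exp(-lam * n + (n - 1) * log(lam * n) - lgamma(n + 1))

def cayley_rooted(n):
    """Number n^(n-1) of rooted labeled trees with n nodes."""
    return n ** (n - 1)

def tree_weight(n, lam):
    """Common weight lam^(n-1) exp(-lam n) / n! of each size-n tree."""
    return lam ** (n - 1) * exp(-lam * n) / factorial(n)

# Subcritical example (lam = 1/2): proper law with mean 1 / (1 - lam) = 2.
lam = 0.5
mass = sum(borel_pmf(n, lam) for n in range(1, 2001))
mean = sum(n * borel_pmf(n, lam) for n in range(1, 2001))
```

The separable form is visible in the identity `borel_pmf(n, lam) = cayley_rooted(n) * tree_weight(n, lam)`.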
Furthermore, for k-forests of such simple trees,
the Borel–Tanner distribution [21]. Using the Stirling formula, with
showing that
The joint combinatorial generating function of their nodes and leaves reads
hence with
Therefore,
due to the vertical g.f. of second-kind Stirling numbers (see [22]):
We obtain
with boundary conditions , and In addition,
Assuming uniform sampling, we have (otherwise, 0): the law of has finite support, varying with n. From (41), with , we obtain
The latter recursion (41) may be written as
defining the (positive) transition coefficients and (not transition probabilities because ). This three-term (‘space-time’ inhomogeneous) recurrence is therefore not the one of a standard Markov chain with a usual probability transition matrix. However, it is the one of a triangular Markovian probability sequence whose support varies with n linearly.
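The role of the second-kind Stirling numbers can be made concrete. The sketch below assumes the classical identity that the number of rooted labeled size-n trees with k leaves is (n!/k!) S(n-1, n-k), where S denotes the Stirling numbers of the second kind; under uniform sampling the mean number of leaves is then n (1 - 1/n)^(n-1), whose ratio to n tends to 1/e. Both statements are used here as assumptions about the elided displays, and the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def stirling2_table(size):
    """S[n][k] for 0 <= k <= n <= size via S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    S = [[0] * (size + 1) for _ in range(size + 1)]
    S[0][0] = 1
    for n in range(1, size + 1):
        for k in range(1, n + 1):
            S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]
    return S

def leaf_counts(n):
    """{k: number of rooted labeled size-n trees with k leaves}."""
    S = stirling2_table(n)
    return {k: factorial(n) // factorial(k) * S[n - 1][n - k]
            for k in range(1, n + 1) if S[n - 1][n - k]}

def mean_leaves_uniform(n):
    """Mean number of leaves of a uniformly sampled rooted labeled tree."""
    counts = leaf_counts(n)
    return Fraction(sum(k * c for k, c in counts.items()), n ** (n - 1))
```

For n = 4 the counts are 24 chains with one leaf, 36 trees with two leaves and 4 stars with three leaves, which add up to 4^3 = 64.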
Next, the identity ([23])
yields (with ): Also,
with
The variance term is obtained while plugging in the identity. The Central Limit Theorem (CLT) therefore holds (see [23]):
With obeying , is the mean ‘Cayley’ distribution of the typical box occupancy in the thermodynamic limit ,
2.9. The Case
Combinatorial linear BGW trees are those for which , yielding , with ). The branching number of a node is either zero or one, leading to ‘threadlike trees’. We have
Furthermore, , the number of (ordered) compositions of n into k parts. Note that by the Lagrange inversion formula,
Random linear increasing trees are those for which , yielding , with and , the weight of all such size trees. The extinction probability of this model is (a subcritical regime).
Furthermore, ,
In a size forest with trees, the law of the number of leaves coincides with the one of as a result of any threadlike tree possessing a single leaf.
In the weak case, with ,
where is the number of weak compositions of n into k parts.
3. Increasing (or Recursive) Trees as Phylogenetic Trees
A size-n rooted and increasing labeled tree has vertices with indices or labels increasing along any path from the root to its leaves. Wherever a new connection is created in this tree, the adjunction of a new node with index n+1 results in a size-(n+1) rooted increasing tree. Increasing trees can in addition be unordered (Cayley) or ordered. The combinatorial version of such trees was studied in [24].
Let solve with and hence with solution
with for
where . counts the number of labeled binomial increasing trees. The convergence radius of is
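The count of labeled binomial increasing trees can be obtained in two independent ways. The sketch below assumes that the exponential generating function solves Phi'(z) = (1 + Phi(z))^d with Phi(0) = 0, and that a size-n tree offers n d - (n - 1) = n (d - 1) + 1 free slots to the incoming node n + 1; the two resulting sequences should coincide. Function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def truncated_product(a, b, order):
    """Product of two coefficient lists, truncated at degree `order`."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > order:
                break
            out[i + j] += ai * bj
    return out

def counts_by_insertion(d, max_size):
    """T[n]: a size-n tree offers n (d - 1) + 1 free slots to node n + 1."""
    T = [0, 1]
    for n in range(1, max_size):
        T.append(T[n] * (n * (d - 1) + 1))
    return T

def counts_from_ode(d, max_size):
    """n! [z^n] Phi(z), where Phi'(z) = (1 + Phi(z))^d and Phi(0) = 0."""
    a = [Fraction(0)] * (max_size + 1)        # a[n] = [z^n] Phi(z)
    for n in range(max_size):
        base = [Fraction(1)] + a[1:]           # 1 + Phi, truncated
        power = [Fraction(1)] + [Fraction(0)] * max_size
        for _ in range(d):
            power = truncated_product(power, base, max_size)
        a[n + 1] = power[n] / (n + 1)          # match [z^n] on both sides
    return [a[n] * factorial(n) for n in range(max_size + 1)]
```

In the binary case the recursion multiplies by n + 1 at each step and returns n!, while for d = 3 it returns the odd double factorials 1, 3, 15, 105.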
Such increasing trees serve as models for phylogenetic trees in which nodes represent species with labels encoding their order of appearance in the tree, and thus the chronology of evolution. The leaves of the tree are the currently living species that can mutate to a new species; the internal nodes are the ones that can generate a new species (in the d-ary tree context, only nodes that are not at saturation with d offspring have this ability); the different trees of a forest consist of genera.
3.1. Random Binomial Increasing Trees
The g.f. of random binomial rooted increasing trees solves the ordinary differential equation with and , and hence
with and .
As a result, has an algebraic singularity of order at Note that
Whatever the values of p and d, there is a positive probability that , and there is no phase transition to subcriticality with almost sure extinction for the increasing version of a d-tree. For each , the convergence radius is a convex function of taking its maximum value when
With , we obtain
with , as , with geometric decay and presenting an algebraic prefactor, the exponent of which () now depends on
Note that, as required, with , the weight of each tree ,
is a separable form.
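The separable form lends itself to a direct numerical check. The sketch below assumes that each size-n tree carries the weight p^(n-1) q^(n(d-1)+1) with q = 1 - p, that the number T_n of such trees satisfies T_(n+1) = T_n (n(d-1) + 1), and that the generating function solves Phi'(z) = (q + p Phi(z))^d with Phi(0) = 0, whose closed-form solution gives the total mass Phi(1) for d >= 2. These are assumptions about the elided displays, and the function names are illustrative.

```python
def size_pmf(d, p, max_size):
    """P[n] = T_n p^(n-1) q^(n(d-1)+1) / n!, built from the ratio P[n+1] / P[n]."""
    q = 1.0 - p
    P = [0.0, q ** d]
    for n in range(1, max_size):
        P.append(P[n] * (n * (d - 1) + 1) / (n + 1) * p * q ** (d - 1))
    return P

def total_mass(d, p):
    """Phi(1) from the closed-form solution of Phi' = (q + p Phi)^d, for d >= 2."""
    q = 1.0 - p
    return ((q ** (1 - d) - (d - 1) * p) ** (-1.0 / (d - 1)) - q) / p

# Ternary example: the finite-size law is defective, with mass Phi(1) < 1.
mass_series = sum(size_pmf(3, 0.3, 300))
mass_closed = total_mass(3, 0.3)
```

In the binary case T_n = n!, so the law reduces to the geometric-type form p^(n-1) q^(n+1) with total mass q^2 / (1 - p q).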
Remark 4 (weighted versions of increasing trees).
With , consider the weight sequence for which . Then, as for simple trees,
is the weighted version of . Equivalently,
now solving , where is the modified ‘branching mechanism’, not necessarily a p.g.f. It is a p.g.f. when
It is very unrealistic that any evolutionary process would lead to a configuration with infinitely many species. This forces one to consider the binomial increasing branching process conditioned on extinction, so with a modified binomial branching mechanism (but here, not a p.g.f.). The p.g.f. of its total progeny, say , then reads
Note that , observing where is the smallest solution to (the extinction probability of the simple BGW d-ary tree).
The Lagrange inversion formula version for increasing trees states that for all ,
where Note that, because , with (obeying if ), this is also
In particular, () and , , with (47),
And with ,
If and , yielding
It involves a composition of n into k parts factor. Using Stirling formula, with , as
with
We have , vanishing at with .
Note that, while considering instead the weak model,
When , , involving the weak composition of n into k parts factor.
Remark 5.
The Lagrange inversion formula adapted to forests of distinguishable increasing trees reads
where The p.g.f. of the number of increasing trees of a size forest is , with
Should the trees be indistinguishable, the same formulae hold but now with , so
3.2. Distribution of the Number of Leaves
The joint generating function of nodes and leaves solves
Hence,
with unknown explicit solution in general. With , we have
with no known explicit solution in general.
Consider indeed the integral
of which is the inverse: .
With the b-th root of unity, we have
A partial fraction decomposition of the integrand yields in principle the expression of
where . The dominant root is so with
with dominant inverse
Alternatively, with the incomplete beta function, the primitive of is
as a generalized logarithm. This gives an alternative expression of . However, the computation of obeying would require the inverse function of which, to the authors’ knowledge, has no known expression in terms of special functions.
Example 2 (the binary case and the tree of life).
When , upon decomposing the rational fraction into partial fractions and integrating,
The inverse obeying reads
When , both the numerator and denominator tend to 0, with, to the first order in
This is (45) when as required.
Even in this explicit expression case for , has no simple expression. We now give an alternative path to obtain , exploiting the recursive nature of binomial increasing trees.
A recurrence in special cases. Consider increasing branching random trees whose p.g.f. solves where Consider the cases , , or , integer (negative binomial, Poisson, or binomial).
Note that the convergence radius of is
where is the convergence radius of .
In these three particular cases of , the formation of the tree admits the following recursive tree evolution scheme (label 1 is assigned to the root).
With probability
attach uniformly node to any of the nodes with outdegree of a previous size-n increasing tree (). The normalization constant is , representing the “number” of ways the new atom with label can be inserted in . This preferential attachment procedure results in a realization of , (see [25]).
With mutually exclusive Bernoulli random variables (summing to 1), each with success probability , for each , we then have
Whenever a connection to a node with outdegree b occurs, the number of nodes with outdegree b (respectively ) decreases (increases) by one unit. In addition, a new node with outdegree 0 is always created, whatever the outdegree of the node to which the new incoming atom connects.
For the three particular models generated by the ’s above, using and for any leading to an expression of , we obtain
respectively, depending only on and not on the full weight sequence . In the first two examples, , while in the third d-ary labeled binomial trees case.
From the first row of (56), the mean number of leaves is readily obtained to be, respectively, (growing as a fraction of ):
The variance likewise grows proportionally to n, and a Central Limit Theorem can be shown to hold for . Note that is independent of p.
When in the binary case,
The first row of (56) giving the evolution of the number of leaves of a size tree is
giving the transition probabilities from to .
is a Markovian probability sequence. The initial conditions are Therefore, with
the transition probabilities, for and , the integrated distribution of reads
where the star sum runs over the integers obeying .
The latter explicit expression of translates the fact that there are unit moves up at points with no other moves but those for this sequence. There are terms in this star sum (the number of strict compositions of n into parts).
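The leaf-count chain of this subsection can be iterated exactly. The following is a minimal sketch for the d-ary case only, under an assumption that should be checked against the transition probabilities above: a size-m tree offers (d-1)m+1 free slots, d of them on each leaf, and the new node picks one slot uniformly. The function names `leaf_distribution` and `mean_leaves` are illustrative and not taken from the text.

```python
from fractions import Fraction


def leaf_distribution(n, d=2):
    """Exact law of the number of leaves of a size-n d-ary increasing tree.

    Assumed rule: a size-m tree has (d-1)*m + 1 free slots, d per leaf,
    and the incoming node chooses a free slot uniformly at random.
    Landing on a leaf keeps the leaf count; landing on an internal
    node raises it by one.
    """
    dist = {1: Fraction(1)}              # size 1: the root is the only leaf
    for m in range(1, n):
        slots = (d - 1) * m + 1          # normalisation constant at size m
        nxt = {}
        for leaves, prob in dist.items():
            stay = Fraction(d * leaves, slots)   # new node lands on a leaf
            nxt[leaves] = nxt.get(leaves, 0) + prob * stay
            if stay < 1:                          # move up by one leaf
                nxt[leaves + 1] = nxt.get(leaves + 1, 0) + prob * (1 - stay)
        dist = nxt
    return dist


def mean_leaves(n, d=2):
    """Mean number of leaves, computed from the exact law."""
    return sum(l * p for l, p in leaf_distribution(n, d).items())
```

Under this rule the mean satisfies a one-step linear recurrence whose solution is ((d-1)n+1)/(2d-1) for n ≥ 2, that is (n+1)/3 in the binary case, so the mean indeed grows as a fixed fraction of n.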
3.3. Increasing d-Partition of n into k Parts: Thermodynamic Limit
We have
with the Legendre transform of . With always defining a unique , we have
where
When , and With , therefore,
vanishes at , so that almost surely as
When dealing with the weak version of we have
Hence, as , (now with )
By the Stirling formula, in the thermodynamic limit
so that
with and , the Legendre transform of . With always defining a unique , we have
where The expression of (61) in the strict case is just a shifted version of the one in the weak case, with substituted for there.
3.4. The Limiting Poisson Case ()
We here mention some related computations encompassing the limiting Poisson case.
Let solve with and hence with solution
with for
counts the number of labeled Cayley increasing trees. The convergence radius of is
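As a worked check of the count of Cayley increasing trees, the differential equation can be integrated by separation of variables. The notation below, T(z) for the exponential generating function and e^z for the branching generating function, is assumed for illustration and may differ from the symbols used in the text.

```latex
% Assumed notation: T(z) = exponential g.f. of the trees, branching g.f. e^{z}.
\begin{aligned}
T'(z) &= e^{T(z)}, \qquad T(0)=0,\\
\int_{0}^{T(z)} e^{-t}\,dt &= z \;\Longrightarrow\; 1-e^{-T(z)} = z,\\
T(z) &= -\log(1-z) = \sum_{n\ge 1}\frac{z^{n}}{n},\\
n!\,[z^{n}]\,T(z) &= (n-1)!, \qquad \text{with convergence radius } 1 .
\end{aligned}
```

The value (n-1)! is the classical number of recursive trees on n labeled nodes.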
The g.f. of random Poisson rooted increasing trees solves the ordinary differential equation with and . Hence,
with convergence radius . Then,
With the weight of a node with outdegree b, the weight of a size tree is
independent of the ’s. Hence, , a separable form.
Furthermore, for k-forests of such increasing trees, with and
With obeying , we have and
is the mean logarithmic distribution of the typical box occupancy in the thermodynamic limit ,
With , the joint generating function of nodes and leaves solves
is thus the inverse function of
so with
Note where solves
With , we thus have
with no obvious solution. However, as we know from the expression of the transition Poissonian probabilities, this probability sequence is Markovian with
Hence, in agreement with [26], for
where are the shifted first-kind Eulerian numbers. has mean and variance , and a CLT holds. Eulerian numbers count the number of permutations of with ascents. Recalling , , where is the number of internal nodes of the size tree.
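The link between the Markovian leaf chain and the Eulerian numbers can be checked numerically. The sketch below assumes the uniform attachment rule of recursive trees (the new node joins one of the m existing nodes with probability 1/m each) and compares the resulting law with A(n-1, l-1)/(n-1)!, where A(m, k) counts permutations of m items with k ascents. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def leaf_law_recursive(n):
    """Law of the number of leaves of a size-n recursive tree (uniform attachment)."""
    dist = {1: Fraction(1)}
    for m in range(1, n):
        nxt = {}
        for leaves, prob in dist.items():
            stay = Fraction(leaves, m)       # new node attaches to a leaf
            nxt[leaves] = nxt.get(leaves, 0) + prob * stay
            if stay < 1:                      # attaches to an internal node
                nxt[leaves + 1] = nxt.get(leaves + 1, 0) + prob * (1 - stay)
        dist = nxt
    return dist


def eulerian_row(m):
    """Eulerian numbers A(m, k), k = 0..max(m-1, 0), via the standard recurrence."""
    row = [1]                                 # A(0, 0) = 1
    for size in range(1, m + 1):
        new = [0] * size
        for k in range(size):
            left = row[k] if k < len(row) else 0
            right = row[k - 1] if k >= 1 else 0
            new[k] = (k + 1) * left + (size - k) * right
        row = new
    return row


def moments(dist):
    """Mean and variance of a finite law given as {value: probability}."""
    mean = sum(l * p for l, p in dist.items())
    var = sum(l * l * p for l, p in dist.items()) - mean ** 2
    return mean, var
```

With these helpers, `moments(leaf_law_recursive(n))` returns mean n/2 for n ≥ 2 and variance n/12 for n ≥ 3, the values underlying the central limit theorem in [26].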
3.5. The Boundary Case
Combinatorial linear increasing trees are those for which , yielding with (only one such tree). The branching number of a node is either zero or one, leading to threadlike trees.
We have , , (an identity). Furthermore, by the Lagrange inversion formula, (, the second-kind Stirling numbers).
Random linear increasing trees are those for which , yielding with and the weight of all size such trees. Note that is the extinction probability. Furthermore, ,
In a size forest with distinguishable trees, the law of the total number of leaves coincides with the one of as a result of any such threadlike tree possessing a single leaf.
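The appearance of the second-kind Stirling numbers for forests of threadlike trees can be verified directly. The sketch below assumes the tree generating function e^z - 1 (exactly one linear increasing tree of each size) and compares the triangular recurrence with the coefficient n![z^n](e^z - 1)^k/k!, expanded by the binomial theorem. The function names are illustrative.

```python
from math import comb, factorial


def stirling2_table(n_max):
    """Table S[n][k] from S(n, k) = S(n-1, k-1) + k * S(n-1, k)."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            S[n][k] = S[n - 1][k - 1] + k * S[n - 1][k]
    return S


def stirling2_from_egf(n, k):
    """n! [z^n] (e^z - 1)^k / k!, using (e^z - 1)^k = sum_j (-1)^(k-j) C(k,j) e^(jz)."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)             # the division is exact
```

Summing a row of the table over k gives the Bell number, the total count of forests of indistinguishable threadlike trees on n labeled nodes.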
4. Concluding Remarks
The sizes of the random progenies of both simple (BGW) and increasing trees and forests generated by the binomial branching mechanism () are shown to be amenable to weighted combinatorial trees in the sense of Meir and Moon [14]. We exploit this fact to analyze the structural aspects of these, such as the number of leaves in a size tree, the number of trees with given outdegree sequences, the number of trees in a size forest, the number of atoms in a k-forest, or the joint and marginal sizes of trees in a size forest with k trees. We derive asymptotic results when , separately, or when n,, jointly, while
We conclude by stressing that an alternative randomization to counting combinatorial trees and forests, based on the ratio of favorable count outcomes to the total number of possible ones, is of great interest, as it leads to different and very rich behaviors, for example, concerning the number of trees in a size forest.
Both randomization approaches rest on the analysis of generating functions and can sometimes take advantage of the Lagrange inversion formula.
Funding
This research received no external funding.
Data Availability Statement
There are no data associated with this paper.
Acknowledgments
T. Huillet acknowledges partial support from the “Chaire Modélisation mathématique et biodiversité” of Veolia-Ecole Polytechnique-MNHN-FondationX and support from the labex MME-DII Center of Excellence (Modèles mathématiques et économiques de la dynamique, de l’incertitude et des interactions, ANR-11-LABX-0023-01 project). This work was also funded by CY Initiative of Excellence (grant “Investissements d’Avenir” ANR-16-IDEX-0008), Project “EcoDep” PSI-AAP2020-0000000013.
Conflicts of Interest
The authors have no conflicts of interest associated with this paper.
Appendix A
So far, random trees and forests have been constructed after assigning probability weights to the nodes (with given outdegrees) of combinatorial trees, either simple or increasing. We dealt with the binomial offspring distribution as an important representative of the ones with bounded support and having all its moments.
We here briefly consider a different approach to the randomization of combinatorial trees and forests, namely the one arising from the ratio of favorable outcomes to the number of possible ones. In this context, we emphasize the role of the Lagrange inversion formula for some additional branching models, not necessarily related to the binomial case.
To this end, we first observe that the probability law of the number of nodes in a combinatorial tree with trees (non-negative integers) of size derived from the g.f. , is given by the tilting
for any , depending on (). Tilting is necessary in the ratio of favorable cases to possible cases randomization because of the divergence of the series . The parameter z is related to the mean by . Here, solves either , or , depending on whether it is a simple or an increasing combinatorial tree. Such trees are generated by the g.f. with , , non-negative integers, and .
When conditioning on the size of the forest, the joint law of a population of k clusters in a size-n forest is
Here, or , depending on the constitutive trees being distinguishable or not.
Moreover, the law of the number of its clusters can be calculated after normalizing its count. As the following examples show, the asymptotic structure of in this approach is very rich:
- Forests of indistinguishable linear increasing trees for which yielding and . With the Stirling numbers of the second kind, we obtain
  The s obey a triangular relation, which translates into one for
  In the case that the trees are assumed distinguishable, with (see [22]),
- Forests of indistinguishable Cayley trees [23].
  If , with , by the Lagrange inversion theorem,
  Ref. [27] rather gives as the number of unordered forests with k Cayley trees while fixing the k different founders of the distinct trees out of different ways. See also [23]. Now, with
  a shifted binomial distribution. In particular, Note that
  the p.g.f. of a shifted mean 1 Poisson random variable.
  We finally observe, as in [28], that the triangular array obeys the backward recursion
  Hence, with , , obeys
  with terminal condition .
- For non-plane increasing trees (see [24] p. 40 and [29]), with and , the absolute Stirling numbers of the first kind, considering forests of such indistinguishable trees, we have
  Therefore, with as the n-th harmonic number,
  A CLT holds.
  counts the number of permutations of n elements with k disjoint cycles. The process is the Chinese Restaurant process, indicating the number of tables occupied by n clients and also the number of distinct visited species in an n-sampling process from the Poisson–Dirichlet partition of the unit interval (see [30], p. 57, for example).
- Plane oriented (recursive) trees are those for which , yielding , with
  Here, , and
  in agreement with the well-known identity
  Furthermore, in agreement with [31],
  counts the number of k-forests of distinguishable plane-oriented trees with n nodes.
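For the non-plane increasing trees item above, the ratio-of-counts law of the number of trees in a size-n forest can be computed exactly. The sketch below assumes that this law is c(n, k)/n!, with c(n, k) the absolute first-kind Stirling numbers (permutations of n elements with k cycles), and checks that its mean is the n-th harmonic number. The names `stirling1_unsigned_row`, `tree_count_law` and `harmonic` are illustrative.

```python
from fractions import Fraction
from math import factorial


def stirling1_unsigned_row(n):
    """Row [c(n, 0), ..., c(n, n)] via c(m+1, k) = c(m, k-1) + m * c(m, k)."""
    row = [1]                                 # c(0, 0) = 1
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            a = row[k - 1] if k >= 1 else 0   # new element opens its own cycle
            b = row[k] if k <= m else 0       # or is inserted into an existing cycle
            new[k] = a + m * b
        row = new
    return row


def tree_count_law(n):
    """Assumed law of the number of trees: P(K_n = k) = c(n, k) / n!."""
    row = stirling1_unsigned_row(n)
    return {k: Fraction(c, factorial(n)) for k, c in enumerate(row) if c}


def harmonic(n):
    """n-th harmonic number as an exact fraction."""
    return sum(Fraction(1, j) for j in range(1, n + 1))
```

The same recurrence describes the Chinese Restaurant process with unit parameter: client m+1 opens a new table with probability 1/(m+1), which is why the mean number of tables is the harmonic number.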
References
1. Pitman, J. Enumerations of trees and forests related to branching processes and random walks. In Microsurveys in Discrete Probability; Aldous, D., Propp, J., Eds.; American Mathematical Society: Providence, RI, USA, 1998; pp. 163–180.
2. Neveu, J. Arbres et processus de Galton-Watson. Ann. Inst. Henri Poincaré Probab. Stat. 1986, 22, 199–207.
3. Stanley, R.P. Chapter 5. In Enumerative Combinatorics; Cambridge University Press: Cambridge, UK, 1999; Volume 2.
4. Surya, E.; Warnke, L. Lagrange Inversion Formula by Induction. Am. Math. Mon. 2023, 130, 944–948.
5. Roch, S. Branching Processes. 2021. Available online: https://people.math.wisc.edu/~roch/mdp/roch-mdp-chap6.pdf (accessed on 24 November 2024).
6. Flory, P.J. Molecular size distribution in three-dimensional polymers, I Gelation. J. Am. Chem. Soc. 1941, 63, 3083–3090.
7. Flory, P.J. Molecular size distribution in three-dimensional polymers, II Trifunctional branching units. J. Am. Chem. Soc. 1941, 63, 3091–3096.
8. Stockmayer, W.H. Theory of molecular size distribution and gel formation in branched chain polymers. J. Chem. Phys. 1943, 11, 45–55.
9. Simkin, M.V.; Roychowdhury, V.P. Re-inventing Willis. Phys. Rep. 2011, 502, 1–35.
10. Flajolet, P.; Sedgewick, R. Analytic Combinatorics; Illustrated Edition; Cambridge University Press: Cambridge, UK, 2009.
11. Harris, T.E. The Theory of Branching Processes; Die Grundlehren der Mathematischen Wissenschaften, Bd. 119; Springer: Berlin/Heidelberg, Germany; Prentice-Hall, Inc.: Englewood Cliffs, NJ, USA, 1963.
12. Garcia-Millan, R.; Font-Clos, F.; Corral, Á. Finite-size scaling of survival probability in branching processes. Phys. Rev. E 2015, 91, 042122.
13. Corral, Á.; Garcia-Millan, R.; Font-Clos, F. Exact Derivation of a Finite-Size Scaling Law and Corrections to Scaling in the Geometric Galton-Watson Process. PLoS ONE 2016, 11, e0161586.
14. Meir, A.; Moon, J.W. On the altitude of nodes in random trees. Can. J. Math. 1978, 30, 997–1015.
15. Drmota, M. Random Trees: An Interplay Between Combinatorics and Probability; Springer: Wien, Austria; New York, NY, USA, 2009.
16. Cramér, H. Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Sci. Ind. 1938, 736, 5–23.
17. Burden, C.J.; Simon, H. Genetic drift in populations governed by a Galton-Watson branching process. Theor. Popul. Biol. 2016, 109, 63–74.
18. Karlin, S.; McGregor, J. Direct product branching processes and related Markov chains. Proc. Nat. Acad. Sci. USA 1964, 51, 598–602.
19. Tutte, W.T. The Number of Planted Plane Trees with a Given Partition. Am. Math. Mon. 1964, 71, 272–277.
20. Kreweras, G. Sur les partitions non croisées d’un cycle. Discret. Math. 1972, 1, 333–350, English translation by Berton A. Earnshaw: On the Non-crossing Partitions of a Cycle. 2005. Available online: https://users.math.msu.edu/users/earnshaw/research/kreweras.pdf (accessed on 24 November 2024).
21. Tanner, J.C. A derivation of the Borel distribution. Biometrika 1961, 48, 222–224.
22. Comtet, L. Analyse Combinatoire—Tome 1; Presses Universitaires de France: Paris, France, 1970.
23. Rényi, A. Some Remarks on the Theory of Trees. Magyar Tud. Akad. Mat. Kutató Int. Közl. 1959, 4, 73–85.
24. Bergeron, F.; Flajolet, P.; Salvy, B. Varieties of increasing trees. In Lecture Notes in Computer Science; CAAP ’92; Raoult, J.C., Ed.; 1992; Volume 581, pp. 24–48.
25. Panholzer, A.; Prodinger, H. Level of nodes in increasing trees revisited. Random Struct. Algorithms 2007, 31, 203–226.
26. Najock, D.; Heyde, C.C. On the Number of Terminal Vertices in Certain Random Trees with an Application to Stemma Construction in Philology. J. Appl. Probab. 1982, 19, 675–680.
27. Takács, L. On Cayley’s formula for counting forests. J. Comb. Theory Ser. A 1990, 53, 321–323.
28. Clarke, L.E. On Cayley’s Formula for Counting Trees. J. Lond. Math. Soc. 1958, 33, 471–474.
29. Mahmoud, H.; Smythe, R.T.; Szymanski, J. On the structure of random plane-oriented recursive trees and their branches. Random Struct. Algorithms 1993, 4, 151–176.
30. Pitman, J. Combinatorial Stochastic Processes, Lecture Notes in Mathematics 1875; Springer: Berlin/Heidelberg, Germany, 2006.
31. Callan, D. A combinatorial survey of identities for the double factorial. arXiv 2009, arXiv:0906.1317.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).