The McMillan theorem for colored branching processes and dimensions of random fractals

For the simplest colored branching processes we prove an analog of the McMillan theorem and compute Hausdorff dimensions of random fractals defined in terms of the limit behavior of empirical measures generated by finite genetic lines. In this setting the role of Shannon's entropy is played by the Kullback–Leibler divergence, and the Hausdorff dimensions are computed by means of the so-called Billingsley–Kullback entropy, defined in the paper.


Introduction
Let us fix a finite set Ω = {1, . . . , r}, whose elements denote different colors, and consider a finite random set X containing a random number of elements of colors 1, . . . , r. The simplest colored branching process can be defined as the evolution of a population in which all individuals live the same fixed time, and then, when the lifetime ends, each individual generates (independently of the others) a random set of "children" containing individuals of different colors and distributed as X. We will assume that the evolution starts with a unique initial individual. It is convenient to represent this process as a random genealogical tree with individuals as vertices, each vertex connected with its "children" by edges. Denote by X_n the set of all genetic lines of length n starting from the initial individual. The colored branching process can degenerate (when, starting from some n, all of the sets X_n are empty) or, otherwise, evolve endlessly. Every genetic line x = (x_1, . . . , x_n) ∈ X_n generates an empirical measure δ_{x,n} on the set of colors Ω in accordance with the following rule: for each i ∈ Ω, the value of δ_{x,n}(i) is the fraction of elements of the string (x_1, . . . , x_n) having the color i.
Denote by µ(i), where i ∈ Ω, the expectation of the number of all elements x ∈ X having the color i. Let ν be an arbitrary probability measure on Ω. An analog of the McMillan theorem that will be proven below (see Theorem 11) asserts that under the condition of non-degeneracy of the colored branching process, the cardinality of the set {x ∈ X_n | δ_{x,n} ≈ ν} almost surely (i.e., with probability one) has asymptotics (up to logarithmic equivalence) of order e^{−nρ(ν,µ)}, where:

ρ(ν, µ) = Σ_{i∈Ω} ν(i) ln (ν(i)/µ(i)).    (1)

Formally, the value of ρ(ν, µ) coincides with the usual Kullback–Leibler divergence and differs from it only in the fact that in our setting the measure µ is not a probability measure, and so ρ(ν, µ) can be negative. This is the first main result of the paper. It generalizes the usual McMillan theorem, which deals with Ω^n rather than X_n. Its proof is based essentially on the Legendre duality between the Kullback–Leibler divergence and the so-called spectral potential, which was introduced first in [1] and then investigated profoundly in [2,3].
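The sign behavior of ρ(ν, µ) for a non-probability µ is easy to probe numerically; the following is an illustrative sketch (not part of the formal development), with an arbitrary choice of µ and ν:

```python
import math

def kullback_action(nu, mu):
    """rho(nu, mu) = sum_i nu(i) * ln(nu(i)/mu(i)).

    nu is a probability vector; mu is an arbitrary positive measure whose
    entries need not sum to one, so the result may be negative.
    """
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

# mu(i): expected number of children of color i (here not a probability)
mu = (2 / 3, 2 / 3)
nu = (1 / 3, 2 / 3)
rho = kullback_action(nu, mu)
# rho = -(1/3) ln 2 < 0, so the number of genetic lines of length n with
# spectrum near nu grows like e^{-n * rho} = 2^{n/3}
```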
We also investigate Hausdorff dimensions of random fractals defined in terms of the limit behavior of the sequence of empirical measures δ_{x,n}. Let X_∞ be the set of all infinite genetic lines. Fix an arbitrary vector θ = (θ(1), . . . , θ(r)) ∈ (0, 1)^r and define the following metric on X_∞:

dist(x, y) = ∏_{t=1}^{n} θ(x_t), where n = inf {t | x_t ≠ y_t} − 1.
Denote by V any set of probability measures on Ω. The second main result of the paper (concerning Hausdorff dimensions of random fractals) is the following: under the condition of non-degeneracy of the colored branching process, almost surely,

dim_H {x ∈ X_∞ | δ_{x,n} ⇝ V} = sup_{ν∈V} d(ν, µ, θ),    (2)

where δ_{x,n} ⇝ V means that the sequence δ_{x,n} has at least one limit point in V and d(ν, µ, θ) is the Billingsley–Kullback entropy defined as:

d(ν, µ, θ) = ρ(ν, µ) / Σ_{i∈Ω} ν(i) ln θ(i).    (3)

Moreover, almost surely:

dim_H {x ∈ X_∞ | δ_{x,n} → ν} = d(ν, µ, θ).    (4)
These results are the essence of Theorems 12 and 17 below.
The systematic study of random self-similar fractals was initiated in the works by Falconer [4], Mauldin and Williams [5] and Graf [6]. They considered the attractor K of a random version of an iterated function system (IFS) in R^n, letting the scaling factors θ(i) ∈ [0, 1) be independent and identically distributed (i.i.d.) random variables. The main result proven in [4–6] is the following: if the open set condition holds and E Σ_i θ(i) > 1, then K ≠ ∅ with positive probability, and with the same probability (i.e., almost surely, provided K ≠ ∅), its Hausdorff dimension s is the unique solution to the equation Σ_i E θ(i)^s = 1. Recently, the research was extended to the case of random self-affine fractals in R^n (see [7,8]) and random affine code tree fractals [9].
On the other hand, Billingsley in [10,11] considered fractals in the space of infinite sequences Ω^N, where Ω is a finite set, defined in terms of the limit behavior of empirical measures. He investigated Hausdorff dimensions of such fractals and, in particular, obtained formulas analogous to Formulas (2) and (4) (with Ω^N instead of X_∞ and µ(i) ≡ 1). Afterwards, Billingsley's ideas and methods were transferred to strange attractors of dynamical systems in R^n (see [12,13]).
The present research combines both features of random self-similar fractals and subsets of nonrandom self-similar fractals defined in terms of the limit behavior of empirical measures. This combination provides its novelty.
The paper can be divided into two parts. The first one (Sections 3–7) contains preliminary information necessary for our research. All theorems in this part are known and supplied with appropriate references; however, some of them are modified for the sake of convenience in further usage, and most of them are proven here for completeness of presentation and the reader's convenience. The second part (Sections 2 and 8–11) contains essentially new results, which have been briefly formulated above.

Examples and Discussion
Example 1. Suppose that Ω = {1, 2}, and the random set X ⊂ Ω contains the elements 1, 2 with probabilities p(1), p(2), respectively. Obviously, for i = 1, 2, the expectation of the number of elements in X that are equal to i is p(i). Therefore, in the notation of the previous section, µ(i) = p(i).
Let ν be the probability distribution on Ω such that ν(1) = 1/3 and ν(2) = 2/3. Question: how many strings (x_1, . . . , x_n) in which the fraction of elements equal to one is close to 1/3 and the fraction of elements equal to two is close to 2/3 are there in the random set X_n? According to Formula (1), under the condition of non-degeneracy of the stochastic process X_1, X_2, X_3, . . . , the number of those strings with probability one has asymptotics of order e^{−nρ(ν,µ)}, where:

ρ(ν, µ) = (1/3) ln ((1/3)/p(1)) + (2/3) ln ((2/3)/p(2)).

In particular, if p(1) = p(2) = 2/3, then ρ(ν, µ) = −(1/3) ln 2 and, respectively, e^{−nρ(ν,µ)} = 2^{n/3}. On the other hand, if p(1) = 1 and p(2) = 1/3, then ρ(ν, µ) = (1/3) ln (4/3) and e^{−nρ(ν,µ)} = (3/4)^{n/3}. The latter means that for all large enough n, there are no strings x = (x_1, . . . , x_n) in the random set X_n such that δ_{x,n}(1) ≈ 1/3 and δ_{x,n}(2) ≈ 2/3.

Example 2 (random Cantor dust). To each string x = (x_1, . . . , x_n) ∈ X_n from Example 1, let us put in correspondence a line segment I(x) ⊂ R as follows: I(1) and I(2) are the left quarter and the right half of [0, 1], respectively, and so on. If a certain segment I(x) = I(x_1, . . . , x_n) has already been defined, then I(x_1, . . . , x_n, 1) and I(x_1, . . . , x_n, 2) are the left quarter and the right half of I(x), respectively. The random Cantor dust C_∞ is defined as:

C_∞ = ∩_n ∪_{x∈X_n} I(x).

Put θ(1) = 1/4 and θ(2) = 1/2. Let the distance function on the random set of infinite strings X_∞ be defined as:

dist(x, y) = ∏_{t=1}^{n} θ(x_t), where n = inf {t | x_t ≠ y_t} − 1.

Consider the mapping π : X_∞ → C_∞ that takes any infinite string x = (x_1, x_2, . . . ) ∈ X_∞ to the point π(x) = ∩_n I(x_1, . . . , x_n). Obviously, this mapping is bi-Lipschitz, and hence, the Hausdorff dimensions of any subset A ⊂ X_∞ and its image π(A) ⊂ R are the same. Question: what is the Hausdorff dimension of the set π(A), where:

A = {x ∈ X_∞ | δ_{x,n} → ν}, ν(1) = 1/3, ν(2) = 2/3?    (5)

The answer is given by Formulas (3) and (4). For example, if for each segment I(x_1, . . . , x_n), the "probabilities of survival" of its left quarter I(x_1, . . . , x_n, 1) and its right half I(x_1, . . . , x_n, 2) are p(1) = p(2) = 2/3, then under the condition π(A) ≠ ∅, with probability one we have:

dim_H π(A) = d(ν, µ, θ) = 1/4.

As far as we know, no theorem proven or published previously gives answers to the questions put in Examples 1 and 2. On the other hand, if, in the setting of Example 2, one replaces the set π(A) by the whole of π(X_∞), then dim_H π(X_∞) is computed, for instance, in [4–6] and Theorem 15.2 in [14]. The answer is given there as a solution to the so-called Bowen equation. Moreover, in those papers, dim_H π(X_∞) was computed in a more general setting where the scaling factors θ(i) ∈ [0, 1) are random.
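The numerical claims of both examples can be checked directly; the following sketch is illustrative only, and the closed form d(ν, µ, θ) = ρ(ν, µ)/Σ_i ν(i) ln θ(i) used in it is an assumption consistent with the Billingsley case µ ≡ 1:

```python
import math

def rho(nu, mu):
    # Kullback action; mu need not be a probability measure
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

nu = (1 / 3, 2 / 3)

# Example 1, first case: p(1) = p(2) = 2/3 gives rho = -(1/3) ln 2,
# so about e^{-n rho} = 2^{n/3} such strings survive
case1 = rho(nu, (2 / 3, 2 / 3))

# Example 1, second case: p(1) = 1, p(2) = 1/3 gives rho = (1/3) ln(4/3) > 0,
# so e^{-n rho} = (3/4)^{n/3} -> 0: eventually no such strings at all
case2 = rho(nu, (1, 1 / 3))

# Example 2: theta = (1/4, 1/2); assumed closed form of the
# Billingsley--Kullback entropy: d = rho / sum_i nu(i) ln theta(i)
theta = (1 / 4, 1 / 2)
d = case1 / sum(n * math.log(t) for n, t in zip(nu, theta))
```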
Therefore, the following problem arises naturally: how can one compute dim H A and dim H π(A) for the set A defined in Formula (5) in the case of random scaling factors θ(i)? The present work does not solve this problem.
Falconer and Miao started the investigation of random self-affine fractals [7]. They considered a random iterated function system (RIFS) generated by a finite collection of contracting affine maps S_i : R^n → R^n and encoded by a random tree T_∞ that is defined exactly in the same manner as the random set of infinite genetic lines X_∞ above. It turned out that under the condition of non-degeneracy T_∞ ≠ ∅ and some additional assumptions, the attractor of the RIFS with probability one has a constant dimension (Hausdorff and box-counting), which can be found as a solution to a certain equation. Recently, analogous results were obtained for random affine code tree fractals [9] and for uniformly random self-similar fractals [8]. Thereby, a natural problem arises: is it possible to compute almost sure Hausdorff dimensions for subsets of random fractals like those in [7–9], provided the subsets are defined in terms of the limit behavior of the corresponding empirical measures (say, as in Formula (5))?
One more possible direction to generalize the results of the paper is the case when the distribution ν, empirical measures δ x,n and/or the scaling factors θ(i) are defined not on the set of colors Ω, but on the whole set of infinite strings Ω N .

The Spectral Potential
The notion of the spectral potential was introduced in [1–3]. It was defined there as the logarithm of the spectral radius of a weighted shift or transfer operator. We will use below a simplified version of the spectral potential that corresponds to an operator of weighted integration.
Let X be an arbitrary finite set. Denote by B(X) the space of all real-valued functions on X, by M (X) the set of all positive measures on X and by M 1 (X) the set of all probability distributions on X.
Every measure µ ∈ M(X) defines a linear functional on B(X) of the form:

µ[ϕ] = Σ_{x∈X} ϕ(x)µ(x).

It is easily seen that this functional is positive (i.e., it takes nonnegative values on nonnegative functions).
If, in addition, the measure µ is a probability measure, then this functional is normalized (i.e., µ[1] = 1). Consider the nonlinear functional:

λ(ϕ, µ) = ln µ[e^ϕ],

where ϕ ∈ B(X) and µ ∈ M(X). We will call it the spectral potential. It is monotone (if ϕ ≥ ψ, then λ(ϕ, µ) ≥ λ(ψ, µ)), additively homogeneous (that is, λ(ϕ + t, µ) = λ(ϕ, µ) + t for each constant t) and analytic in ϕ. Define a family of probability measures µ_ϕ on X, depending on the functional parameter ϕ ∈ B(X), by means of the formula:

µ_ϕ = e^{ϕ − λ(ϕ,µ)} µ.

Evidently, each measure µ_ϕ is equivalent to µ and has the density e^{ϕ−λ(ϕ,µ)} with respect to µ. Let us compute the first two derivatives of the spectral potential with respect to the argument ϕ. We introduce the notation:

λ′(ϕ, µ)[f] = (d/dt) λ(ϕ + tf, µ) |_{t=0}.

This is nothing else than the derivative of the spectral potential in the direction f at the point ϕ. An elementary computation shows that:

λ′(ϕ, µ)[f] = µ_ϕ[f].    (7)

In other words, the derivative λ′(ϕ, µ) coincides with the probability measure µ_ϕ. Now, put:

λ″(ϕ, µ)[f, g] = (d/dt) λ′(ϕ + tg, µ)[f] |_{t=0}

and compute this derivative using the just obtained Formula (7):

λ″(ϕ, µ)[f, g] = µ_ϕ[fg] − µ_ϕ[f] µ_ϕ[g].

In probability theory, the expression µ_ϕ[f] is usually called the expectation of a random variable f with respect to the probability distribution µ_ϕ, and the expression µ_ϕ[fg] − µ_ϕ[f]µ_ϕ[g] is called the covariance of the random variables f and g. In particular, the second derivative:

λ″(ϕ, µ)[f, f] = µ_ϕ[f²] − (µ_ϕ[f])²

is equal to the variance of the random variable f with respect to the distribution µ_ϕ. Since the variance is nonnegative, it follows that the spectral potential is convex in ϕ.
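These identities are easy to verify numerically; a sketch with made-up values of µ, ϕ and f, where `lam` and `mu_phi` mirror the definitions above:

```python
import math

def lam(phi, mu):
    """Spectral potential: lambda(phi, mu) = ln sum_x e^{phi(x)} mu(x)."""
    return math.log(sum(math.exp(p) * m for p, m in zip(phi, mu)))

def mu_phi(phi, mu):
    """Tilted probability measure with density e^{phi - lambda} w.r.t. mu."""
    l = lam(phi, mu)
    return [math.exp(p - l) * m for p, m in zip(phi, mu)]

mu = [0.5, 1.5, 2.0]            # a positive, non-probability measure
phi = [0.1, -0.3, 0.7]
f = [1.0, 2.0, -1.0]

# mu_phi is indeed a probability measure
total = sum(mu_phi(phi, mu))

# directional derivative of lambda at phi in direction f equals mu_phi[f]
t = 1e-6
numeric = (lam([p + t * x for p, x in zip(phi, f)], mu) - lam(phi, mu)) / t
exact = sum(x * w for x, w in zip(f, mu_phi(phi, mu)))
```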

The Kullback Action
Denote by B*(X) the space of all linear functionals on B(X). Obviously, every functional ν ∈ B*(X) is uniquely determined by its values ν(x) on the indicator functions of points x ∈ X, and so B*(X) can be identified with the set of all real-valued functions on X. For any measure µ ∈ M(X) and nonnegative function ϕ ∈ B(X), define a product ϕµ ∈ M(X) by means of the formula (ϕµ)(x) = ϕ(x)µ(x). The Kullback action is the following functional of two arguments ν ∈ B*(X) and µ ∈ M(X):

ρ(ν, µ) = ν[ln ϕ], if ν is a probability measure of the form ν = ϕµ,

and ρ(ν, µ) = +∞ in all of the other cases.
As far as we know, in the literature, this functional was defined only for probability measures ν and µ. Different authors call it differently: the relative entropy, the deviation function, the Kullback-Leibler information function and the Kullback-Leibler divergence.
When ν is a probability measure, the Kullback action can be computed by the explicit formula:

ρ(ν, µ) = Σ_{x∈X} ν(x) ln (ν(x)/µ(x)).    (9)

In particular, if µ(x) ≡ 1, then the Kullback action differs only in sign from Shannon's entropy:

ρ(ν, 1) = Σ_{x∈X} ν(x) ln ν(x) = −H(ν).    (10)

In the case of a probability measure µ, the Kullback action is nonnegative and vanishes only if ν = µ. Indeed, if the functional ν is not a probability measure that is absolutely continuous with respect to µ, then ρ(ν, µ) = +∞. Otherwise, if ν is a probability measure of the form ν = ϕµ, then Jensen's inequality and the strong convexity of the function f(x) = x ln x imply:

ρ(ν, µ) = µ[ϕ ln ϕ] ≥ µ[ϕ] ln µ[ϕ] = 1 · ln 1 = 0,

and the equality ρ(ν, µ) = 0 holds if and only if ϕ is constant almost everywhere, and therefore, ν coincides with µ.
Theorem 1. The spectral potential and the Kullback action satisfy the Young inequality:

ν[ψ] ≤ λ(ψ, µ) + ρ(ν, µ), ψ ∈ B(X), ν ∈ B*(X), µ ∈ M(X),

which turns into equality if and only if ν = µ_ψ.
Theorem 2. The Kullback action ρ(ν, µ) is the Legendre transform of the spectral potential:

ρ(ν, µ) = sup_{ψ∈B(X)} { ν[ψ] − λ(ψ, µ) }.    (13)

Proof. By the Young inequality, the left-hand side of Formula (13) is not less than the right-hand one. Therefore, it is enough to associate with any functional ν ∈ B*(X) a family of functions ψ_t, depending on the real-valued parameter t, on which equality in Formula (13) is attained in the limit. Suppose first that ν is a probability measure that is absolutely continuous with respect to µ. Then ν has the form ν = ϕµ, where ϕ is a nonnegative density. Consider the family of functions:

ψ_t = ln (ϕ + e^{−t}).

When t → +∞, we have the following relations:

λ(ψ_t, µ) = ln µ[ϕ + e^{−t}] = ln (1 + e^{−t} µ(X)) → 0,
ν[ψ_t] = ν[ln (ϕ + e^{−t})] → ν[ln ϕ] = ρ(ν, µ),

and so, Formula (13) is proven. In all other cases, when ν is not an absolutely continuous probability measure, by definition, we have ρ(ν, µ) = +∞. Let us consider these cases one after another.
If ν is a probability measure which is singular with respect to µ, then there exists x_0 ∈ X, such that µ(x_0) = 0 and ν(x_0) > 0. In this case, consider the family of functions:

ψ_t(x) = t for x = x_0 and ψ_t(x) = 0 otherwise.

It is easily seen that:

ν[ψ_t] − λ(ψ_t, µ) = t ν(x_0) − ln µ(X).

The right-hand side of the above formula goes to +∞ as t increases, and so, Formula (13) holds.
If the functional ν is not normalized, then put ψ_t(x) ≡ t. In this case, the expression:

ν[ψ_t] − λ(ψ_t, µ) = t (ν[1] − 1) − ln µ(X)

is unbounded from above, and hence, Formula (13) is still valid. Finally, if the functional ν is not positive, then there exists a nonnegative function ϕ, such that ν[ϕ] < 0. Consider the family ψ_t = −tϕ, where t > 0. For this family:

ν[ψ_t] − λ(ψ_t, µ) = −t ν[ϕ] − λ(−tϕ, µ) → +∞

as t → +∞, and Formula (13) still remains in force.

Corollary 1. The functional ρ( · , µ) is convex and lower semicontinuous on the space B*(X).
Proof. These are properties of the Legendre transform.
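Both the Young inequality and the attainment of equality at the density ψ = ln(dν/dµ) can be probed numerically; an illustrative sketch with arbitrary choices of µ and ν:

```python
import math
import random

def lam(psi, mu):
    # spectral potential ln mu[e^psi]
    return math.log(sum(math.exp(p) * m for p, m in zip(psi, mu)))

def rho(nu, mu):
    # Kullback action of a probability measure nu equivalent to mu
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

mu = [0.4, 1.1, 0.8]
nu = [0.2, 0.5, 0.3]            # probability measure equivalent to mu

# Young inequality nu[psi] <= lambda(psi, mu) + rho(nu, mu) on random psi
rng = random.Random(0)
ok = all(
    sum(n * p for n, p in zip(nu, psi)) <= lam(psi, mu) + rho(nu, mu) + 1e-9
    for psi in ([rng.uniform(-5, 5) for _ in mu] for _ in range(1000))
)

# equality at psi = ln(d nu / d mu); there lambda(psi, mu) = ln nu(X) = 0
psi_star = [math.log(n / m) for n, m in zip(nu, mu)]
gap = rho(nu, mu) - (sum(n * p for n, p in zip(nu, psi_star)) - lam(psi_star, mu))
```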

The Local Large Deviations Principle and the McMillan Theorem
As above, we keep the following notation: X is a finite set; B(X) stands for the space of real-valued functions on X; B * (X) is the space of linear functionals on B(X); M 1 (X) is the set of all probability measures on X; and M (X) is the set of all positive measures on X.
With each finite sequence x = (x_1, . . . , x_n) ∈ X^n, we associate an empirical measure δ_{x,n} ∈ M_1(X), which is supported on the set {x_1, . . . , x_n} and assigns to every point x_t the measure 1/n. The integral of any function f with respect to δ_{x,n} looks like:

δ_{x,n}[f] = (1/n) Σ_{t=1}^{n} f(x_t).

For any measure µ ∈ M(X), we denote by µ^n its Cartesian power, which is defined on X^n.
Theorem 3. Let ν ∈ M_1(X) and µ ∈ M(X). Then, for any ε > 0, there exists a neighborhood O(ν) ⊂ M_1(X), such that for all n:

µ^n {x ∈ X^n | δ_{x,n} ∈ O(ν)} ≤ e^{n(−ρ(ν,µ)+ε)};    (14)

and, on the contrary, for any neighborhood O(ν) and any ε > 0, for all sufficiently large n:

µ^n {x ∈ X^n | δ_{x,n} ∈ O(ν)} ≥ e^{n(−ρ(ν,µ)−ε)}.    (15)

In the case of a probability measure µ, Theorem 3 is a particular case of Varadhan's large deviations principle (the explicit formulation of which can be found, e.g., in [15,16]). Therefore, for a general µ, this theorem can be deduced from Varadhan's large deviations principle by means of a mere renormalization of µ. Nevertheless, we will prove it independently for the sake of completeness.
Corollary (the McMillan theorem). Let ν ∈ M_1(X). For any ε > 0, there exists a neighborhood O(ν), such that for all n:

#{x ∈ X^n | δ_{x,n} ∈ O(ν)} ≤ e^{n(H(ν)+ε)};

and, on the contrary, for any neighborhood O(ν) and any ε > 0, for all sufficiently large n:

#{x ∈ X^n | δ_{x,n} ∈ O(ν)} ≥ e^{n(H(ν)−ε)}.

Proof. This follows from Formulas (9) and (10) and Theorem 3 if we set µ(x) = 1 for all x ∈ X.

Hausdorff Dimension and the Maximal Dimension Principle
Let us define the Hausdorff dimension of an arbitrary metric space Ω. Suppose that Ω is covered by an at most countable collection of sets U = {U_i}. Denote by |U| the diameter of this covering:

|U| = sup_i |U_i|,

where |U_i| is the diameter of U_i. The Hausdorff measure (of dimension α) of the metric space Ω is:

mes(Ω, α) = lim_{ε→0} inf_{|U|≤ε} Σ_i |U_i|^α,

where U is an at most countable covering of Ω. Obviously,

Σ_i |U_i|^β ≤ |U|^{β−α} Σ_i |U_i|^α for β > α.

This implies the basic property of the Hausdorff measure: if mes(Ω, α) < ∞ for some α, then mes(Ω, β) = 0 for all β > α. The Hausdorff dimension dim_H Ω is defined as the infimum of all α, such that mes(Ω, α) = 0.
Below, we will consider the space of sequences:

X^N = { x = (x_1, x_2, . . . ) | x_t ∈ X },

where X is a finite set. Denote by Z_n(x) the set of all sequences y = (y_1, y_2, . . . ) whose first n coordinates coincide with the corresponding coordinates of x. This set will be called a cylinder of rank n. In particular, Z_0(x) = X^N. All cylinders generate the Tychonoff topology on the space X^N and the cylinder σ-algebra of subsets in X^N.
On the set of all cylinders, we fix an arbitrary positive function η that possesses the following two properties: first, if Z_n(x) ⊂ Z_m(y), then η(Z_n(x)) ≤ η(Z_m(y)) and, second, η(Z_n(x)) → 0 as n → ∞ at each point x ∈ X^N. Define the cylindrical metric on X^N by means of the formula:

dist(x, y) = η(Z_n(x)), where n = max {t | Z_t(x) = Z_t(y)}.    (20)

Evidently, the diameter of Z_n(x) in this metric coincides with η(Z_n(x)).
Suppose that, in addition to cylindrical metric (20) on X^N, there is given a Borel measure µ on X^N. The function:

d_µ(x) = liminf_{n→∞} ln µ(Z_n(x)) / ln |Z_n(x)|

is called the (lower) pointwise dimension of the measure µ.
The next theorem provides an effective tool for computing the Hausdorff dimensions of various subsets of X^N.

Theorem 4. Let A ⊂ X^N. If d_µ(x) ≤ d for all x ∈ A, then dim_H A ≤ d. If d_µ(x) ≥ d for all x ∈ A and the outer measure µ*(A) is positive, then dim_H A ≥ d.

It follows that if d_µ(x) ≡ d on the whole of the subset A ⊂ X^N, then its dimension is equal to d. A weakened version of the second part of Theorem 4, in which the condition d_µ(x) ≥ d is replaced by the stronger one µ(Z_n(x)) ≤ |Z_n(x)|^d, is usually called the mass distribution principle.
Theorem 4 was first proven by Billingsley in the case when the function η in Formula (20) is a probability measure on X N (see Theorems 2.1 and 2.2 in [11]). An analog to this theorem for subsets A ⊂ R r was proven in works [12,13].
Proof. Every cylinder Z n (x) is, in fact, a ball in metric (20), whose radius equals its diameter, and vice versa, any ball in this metric coincides with a cylinder. Besides, any two cylinders Z n (x) and Z m (y) either have an empty intersection or one of them is embedded in the other. Therefore, while computing the Hausdorff measure and the dimension of a subset A ⊂ X N , it is enough to operate only with disjoint coverings of A by cylinders.
Suppose first that d_µ(x) < α for all points x ∈ A. Then, for each x ∈ A, there exist arbitrarily small cylinders Z_n(x) satisfying the condition |Z_n(x)|^α < µ(Z_n(x)). Using cylinders of this kind, we can arrange a disjoint covering U of the set A of arbitrarily small diameter. For this covering, we have:

Σ_{Z∈U} |Z|^α < Σ_{Z∈U} µ(Z) ≤ µ(X^N),

and hence, mes(A, α) is finite and dim_H A ≤ α. Thus, the first part of the theorem is proven.
Suppose now that d_µ(x) > α for all points x ∈ A. Define the sets:

A_ε = { x ∈ A | µ(Z_n(x)) ≤ |Z_n(x)|^α for all cylinders Z_n(x) of diameter less than ε }.

Obviously, A = ∪_{ε>0} A_ε. Hence, there exists an ε, such that µ*(A_ε) > 0. Let U be a disjoint covering of A by cylinders of diameters less than ε. From the definition of A_ε, it follows that mes(U, α) ≥ µ*(A_ε). Therefore, dim_H A ≥ α, and thus, the second part of the theorem is proven.
Each point x = (x_1, x_2, . . . ) ∈ X^N generates a sequence of empirical measures δ_{x,n} on the set X:

δ_{x,n}(i) = #{t ≤ n | x_t = i} / n, i ∈ X.

In other words, δ_{x,n}(i) is the fraction of coordinates of the vector (x_1, . . . , x_n) coinciding with i. For every probability measure ν ∈ M_1(X), let us define its basin B(ν) as the set of all points x ∈ X^N, such that δ_{x,n} converges to ν:

B(ν) = { x ∈ X^N | δ_{x,n} → ν }.

Evidently, basins of different measures do not intersect each other and are nonempty. If x ∈ B(ν) and y ∈ X^N differs from x in only a finite number of coordinates, then y ∈ B(ν). This implies the density of each basin in X^N. Every measure ν ∈ M_1(X) generates the Bernoulli distribution P_ν = ν^N on X^N. By the strong law of large numbers, the basin B(ν) has probability one with respect to the Bernoulli distribution P_ν, and its complement has zero probability. In particular, any basin different from B(ν) has zero probability P_ν.
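The basin property is easy to observe in a seeded simulation (an illustrative sketch; the particular ν is arbitrary):

```python
import random
from collections import Counter

def empirical(x, n):
    """delta_{x,n}(i): fraction of the first n coordinates equal to i."""
    counts = Counter(x[:n])
    return {i: counts[i] / n for i in counts}

# sample a P_nu-typical point for nu = (0.25, 0.75) on X = {1, 2};
# by the strong law of large numbers delta_{x,n} -> nu
rng = random.Random(42)
x = [1 if rng.random() < 0.25 else 2 for _ in range(100000)]
d = empirical(x, len(x))
```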
Points that do not belong to the union of all basins will be called irregular. The set of irregular points has zero probability with respect to any distribution P_ν, where ν ∈ M_1(X). As a result, X^N turns out to be decomposed into the disjoint union of the different basins and the set of irregular points.
Let us fix some numbers θ(i) ∈ (0, 1) for all elements i ∈ X = {1, . . . , r} and define a cylindrical θ-metric on X^N by the rule:

dist(x, y) = ∏_{t=1}^{n} θ(x_t), where n = max {t | Z_t(x) = Z_t(y)}.    (21)

It is a particular case of cylindrical metric (20). For each measure ν ∈ M_1(X) and θ-metric (21), define the quantity:

S(ν, θ) = Σ_{i∈X} ν(i) ln ν(i) / Σ_{i∈X} ν(i) ln θ(i).    (22)

We will call it the Billingsley entropy, because he was the first who wrote down this formula and applied it to the computation of Hausdorff dimensions [10]. He also expressed this quantity in terms of Shannon's entropy and the Kullback action:

S(ν, θ) = H(ν) / (H(ν) + ρ(ν, θ)).

Theorem 5. Let the space X^N be equipped with cylindrical θ-metric (21). Then, for any measure ν ∈ M_1(X), dim_H B(ν) = S(ν, θ).

A partial case of this theorem, in which θ(1) = · · · = θ(r) = 1/r, was first proven by Eggleston [17]. In the complete form, this theorem and its generalizations were proven by Billingsley [10,11].
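The two expressions for S(ν, θ) and Eggleston's special case can be checked numerically; a sketch with an arbitrary choice of ν and θ:

```python
import math

def shannon(nu):
    """Shannon entropy H(nu) = -sum nu(i) ln nu(i)."""
    return -sum(p * math.log(p) for p in nu if p > 0)

def billingsley(nu, theta):
    """S(nu, theta) = sum nu(i) ln nu(i) / sum nu(i) ln theta(i)."""
    num = sum(p * math.log(p) for p in nu if p > 0)
    den = sum(p * math.log(t) for p, t in zip(nu, theta))
    return num / den

nu = (0.2, 0.3, 0.5)
theta = (0.4, 0.25, 0.5)
s = billingsley(nu, theta)

# Eggleston's case theta(i) = 1/r: the dimension reduces to H(nu) / ln r
r = 3
s_eggleston = billingsley(nu, (1 / r,) * r)
```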
Proof. Assume first that ν(i) > 0 for every i = 1, . . . , r. Obviously,

P_ν(Z_n(x)) = ∏_{t=1}^{n} ν(x_t), |Z_n(x)| = ∏_{t=1}^{n} θ(x_t).

Hence, for each point x ∈ B(ν), we have:

d_{P_ν}(x) = lim_{n→∞} Σ_i δ_{x,n}(i) ln ν(i) / Σ_i δ_{x,n}(i) ln θ(i) = S(ν, θ).

Applying Theorem 4 to the set A = B(ν) and the measure µ = P_ν, we obtain the statement of Theorem 5.
In the general case, the same argument provides only the lower bound d Pν (x) ≥ S(ν, θ), which implies the lower bound dim H B(ν) ≥ S(ν, θ). The inverse inequality is provided by the next lemma.
Lemma 1. Suppose that X^N is equipped with metric (21). Then, for any measure ν ∈ M_1(X) and ε > 0, there exists a neighborhood O(ν), such that the Hausdorff dimension of the set:

A = { x ∈ X^N | δ_{x,n} ∈ O(ν) for infinitely many n }

does not exceed S(ν, θ) + ε.

Proof. Fix a measure ν ∈ M_1(X) and an arbitrary positive number κ. By McMillan's theorem, there exists a neighborhood O(ν), such that for each positive integer n:

#{ Z_n(x) | δ_{x,n} ∈ O(ν) } ≤ e^{n(H(ν)+κ)}.    (24)

Decrease this neighborhood in such a way that, in addition to Formula (24), for every measure δ ∈ O(ν), the next inequality holds:

Σ_i δ(i) ln θ(i) ≤ Σ_i ν(i) ln θ(i) + κ.    (25)

Then, for every cylinder Z_n(x) satisfying the condition δ_{x,n} ∈ O(ν), we have the estimate:

|Z_n(x)| = exp { n Σ_i δ_{x,n}(i) ln θ(i) } ≤ exp { n ( Σ_i ν(i) ln θ(i) + κ ) }.

For any positive integer N, the set A is covered by the collection of cylinders:

U_N = { Z_n(x) | n ≥ N, δ_{x,n} ∈ O(ν) }.

Evidently, the diameter of this covering goes to zero as N increases. Now, we can evaluate mes(U_N, α) by means of Formulas (24) and (25):

mes(U_N, α) ≤ Σ_{n≥N} exp { n ( H(ν) + κ + α Σ_i ν(i) ln θ(i) + ακ ) }.    (26)

If α > S(ν, θ), then we can choose κ > 0 so small that the exponent in braces in Formula (26) is negative, and the whole sum in Formula (26) goes to zero as N → ∞. Therefore, the Hausdorff measure (of dimension α) of the set A is zero, and hence, dim_H A does not exceed α.
We will say that a sequence of empirical measures δ_{x,n} condenses on a subset V ⊂ M_1(X) (notation: δ_{x,n} ⇝ V) if it has at least one limit point in V.
Similar to the known large deviations principle by Varadhan [15,16], it is natural to call the next theorem the maximal dimension principle.

Theorem 6. Let the space X^N be equipped with cylindrical θ-metric (21). Then, for any nonempty subset V ⊂ M_1(X):

dim_H { x ∈ X^N | δ_{x,n} ⇝ V } = sup_{ν∈V} S(ν, θ).    (27)

Proof. The set A = { x ∈ X^N | δ_{x,n} ⇝ V } contains the basins of all measures ν ∈ V. Therefore, by Theorem 5, its dimension is not less than the right-hand side of Formula (27).
It is easily seen from definition (22) of the Billingsley entropy S(ν, θ) that it depends continuously on the measure ν ∈ M_1(X). Consider the closure V̄ of V. Obviously, it is compact. Fix any ε > 0. By Lemma 1, for any measure ν ∈ V̄, there exists a neighborhood O(ν), such that:

dim_H { x ∈ X^N | δ_{x,n} ⇝ O(ν) } ≤ S(ν, θ) + ε.    (28)

Let us take a finite covering of V̄ composed of neighborhoods of this sort. Then, the set A = { x ∈ X^N | δ_{x,n} ⇝ V } will be covered by a finite collection of sets of the form { x ∈ X^N | δ_{x,n} ⇝ O(ν) } satisfying inequality (28). By the arbitrariness of ε, this implies the statement of Theorem 6.
A result very similar to Theorem 6 was proven by Billingsley in Theorem 7.1 in [10].

Branching Processes
First, let us recall the basic notions about the simplest Galton-Watson branching process. Suppose that a random variable Z takes nonnegative values k ∈ Z + with probabilities p k . The Galton-Watson branching process is a sequence of random variables Z 0 , Z 1 , Z 2 , . . . , such that Z 0 ≡ 1, Z 1 = Z, and further, each Z n+1 is defined as the sum of Z n independent counterparts of the random variable Z. In particular, if Z n = 0, then Z n+1 = 0, as well. Usually, Z n is understood as the total number of descendants in the n-th generation of a unique common ancestor under the condition that each descendant independently of the others gives birth to Z children.
It is known that in some cases, the posterity of the initial ancestor may degenerate (when, starting from a certain n, all Z_n are zeros), and in other cases, it can "flourish" (when Z_n grows exponentially). The type of behavior of a branching process depends on the mean number of children of an individual:

m = EZ = Σ_{k≥0} k p_k,

and on the generating function of that number:

f(s) = E s^Z = Σ_{k≥0} p_k s^k.

Obviously, the restriction of the function f(s) to the segment [0, 1] is nonnegative, nondecreasing, convex and satisfies f(1) = 1 and f′(1) = m.
In the theory of branching processes (see, for instance, [18,19]), the following statements were proven.

Theorem 8. If m ≤ 1, then the branching process degenerates almost surely (except the case when each individual gives birth to exactly one child). If m > 1, then the probability of degeneration q is less than one and coincides with the unique root of the equation f(s) = s on the segment [0, 1] distinct from one.
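Theorem 8's fixed-point characterization of q is easy to illustrate numerically; a sketch with assumed offspring distributions (iterating s ← f(s) from s = 0 converges to the smallest fixed point on [0, 1]):

```python
def extinction_probability(p, iters=200):
    """Probability of degeneration: the smallest root of f(s) = s on [0, 1],
    obtained by iterating s <- f(s) from s = 0, where f is the generating
    function f(s) = sum_k p_k s^k of the offspring distribution p."""
    f = lambda s: sum(pk * s ** k for k, pk in enumerate(p))
    s = 0.0
    for _ in range(iters):
        s = f(s)
    return s

# offspring distribution p_0 = 1/4, p_1 = 1/4, p_2 = 1/2, so m = 5/4 > 1;
# f(s) = s reduces to 2s^2 - 3s + 1 = 0 with roots 1/2 and 1, hence q = 1/2
q = extinction_probability([0.25, 0.25, 0.5])

# subcritical case: p_0 = 1/2, p_1 = 1/4, p_2 = 1/4 gives m = 3/4 < 1,
# so degeneration is certain and q = 1
q_sub = extinction_probability([0.5, 0.25, 0.25])
```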
Theorem 9. If m > 1 and EZ² < ∞, then the sequence W_n = Z_n/m^n converges almost surely to a random variable W, such that P{W > 0} = 1 − q. If m > 1 and EZ² = ∞, then, for any number m′ < m, with probability 1 − q:

Z_n > (m′)^n for all sufficiently large n

(here, q is the probability of degeneration of the branching process).
Thereby, in the case m > 1, there is an alternative for the total number of descendants Z_n: either it vanishes at a certain moment n_0 (with probability q < 1), or it is asymptotically equivalent to W m^n (with the complementary probability 1 − q), where the random variable W > 0 does not depend on n (except the case EZ² = ∞, when only the logarithmic equivalence ln Z_n ∼ ln m^n is guaranteed). All other types of behavior of the total number of descendants have zero probability.
We will exploit these theorems in the study of colored branching processes. Suppose now that each individual may give birth to children of r different colors (or r different genders, if one likes). We will assume that the posterity of each individual in the first generation is a random set X containing a random number k_1 of children of the first color, a random number k_2 of children of the second color, and so on, up to a random number k_r of children of color r. All of the elements of X (including elements of the same color) are treated as different. The ordered array k = (k_1, k_2, . . . , k_r) ∈ Z_+^r will be called the color structure of the set X of children. Denote by p_k the probability of the birth of a set X with color structure k = (k_1, k_2, . . . , k_r). Naturally, all of the probabilities p_k are nonnegative and:

Σ_{k∈Z_+^r} p_k = 1.
If an individual x 1 gave birth to x 2 , then x 2 gave birth to x 3 , and so on, up to an individual x n , then the sequence x = (x 1 , . . . , x n ) will be called a genetic line of length n.
Let us construct a new branching process taking into account not only the total number of descendants, but also the color of each individual and all its upward and downward lineal relations. This process may be treated as a random genealogical tree with a common ancestor in the root and all its descendants in the vertices, where each parent is linked with all its children. In the case of a degenerating population, its genealogical tree is finite, and in the case of a "flourishing" one, the tree is infinite.
Formally, it is convenient to define such a process as a sequence of random sets X_n containing all genetic lines of length n. As the first set X_1, we take X. The subsequent sets X_n are built by induction: if X_n is already known, then for all genetic lines (x_1, . . . , x_n) ∈ X_n, one defines disjoint independent random sets of children X(x_1, . . . , x_n), each with a color structure distributed as in X, and X_{n+1} is given by:

X_{n+1} = { (x_1, . . . , x_n, x_{n+1}) | (x_1, . . . , x_n) ∈ X_n, x_{n+1} ∈ X(x_1, . . . , x_n) }.
The stochastic process X 1 , X 2 , X 3 , . . . built in this way will be referred to as a colored branching process (or unconditional colored branching process, if one wishes to emphasize that the posterity of any individual is independent of its color and genealogy).
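The inductive construction above can be sketched in a few lines; this is a toy simulation, and the particular offspring law at the end is an illustrative assumption:

```python
import random

def colored_branching(offspring, generations, rng):
    """Simulate the sets X_1, ..., X_n of genetic lines of a colored
    branching process. offspring(rng) returns the list of children's colors
    of one individual, drawn independently with the same law as X."""
    lines = [()]                      # length-0 line: the initial individual
    history = []
    for _ in range(generations):
        next_lines = []
        for line in lines:
            # each individual begets its own independent set of children
            for color in offspring(rng):
                next_lines.append(line + (color,))
        lines = next_lines
        history.append(lines)         # history[n-1] plays the role of X_n
    return history

def spectrum(line):
    """Empirical measure delta_{x,n} of a genetic line on the colors."""
    return {c: line.count(c) / len(line) for c in set(line)}

rng = random.Random(1)
# assumed law: every individual begets one child of color 2 and, with
# probability 1/2, one extra child of color 1
offspring = lambda r: ([1] if r.random() < 0.5 else []) + [2]
history = colored_branching(offspring, 6, rng)
```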

The McMillan Theorem for Colored Branching Processes
Consider a colored branching process X 1 , X 2 , X 3 , . . . with a given finite collection of colors Ω = {1, . . . , r} and a probability distribution {p k | k ∈ Z r + }, where k = (k 1 , . . . , k r ) is the color structure of each individual's set of children X. We will always assume that X 1 = X is generated by a unique initial individual.
For any genetic line x = (x_1, . . . , x_n) ∈ X_n, define the spectrum δ_{x,n} as the corresponding empirical measure on Ω given by the rule:

δ_{x,n}(i) = #{t ≤ n | g(x_t) = i} / n,    (29)

where g(x_t) denotes the color of x_t. In other words, δ_{x,n}(i) is the fraction of individuals of color i in the genetic line x. Our next goal is to obtain asymptotical estimates for the cardinalities of the random sets:

{ x ∈ X_n | δ_{x,n} ∈ O(ν) },

where O(ν) is a small neighborhood of the distribution ν on the set of colors Ω. Denote by µ(i) the expectation of the number of members of color i in X:

µ(i) = E #{x ∈ X | g(x) = i}.    (30)

Provided all µ(i) are finite, the vector µ = (µ(1), . . . , µ(r)) can be regarded as a measure on the set of colors Ω. This measure generates the measure µ^n on Ω^n as a Cartesian product. Define a mapping G : X_n → Ω^n by means of the formula:

G(x_1, . . . , x_n) = (g(x_1), . . . , g(x_n)),    (31)

where g(x_t) is the color of x_t.
For the measure µ from Formula (30), let us define the Kullback action:

ρ(ν, µ) = Σ_{i∈Ω} ν(i) ln (ν(i)/µ(i)), ν ∈ M_1(Ω),

where M_1(Ω) is the set of all probability measures on Ω. This formula is a copy of Formula (9).
Theorem 10. For any ε > 0, there exists a neighborhood O(ν), such that for all n:

E #{x ∈ X_n | δ_{x,n} ∈ O(ν)} ≤ e^{n(−ρ(ν,µ)+ε)};    (32)

and, on the contrary, for any neighborhood O(ν) and any ε > 0, for all sufficiently large n:

E #{x ∈ X_n | δ_{x,n} ∈ O(ν)} ≥ e^{n(−ρ(ν,µ)−ε)}.    (33)

Proof. It follows from Formula (29) that, for every genetic line x ∈ X_n, its spectrum δ_{x,n} coincides with the empirical measure δ_{ω,n}, where ω = G(x). Therefore,

#{x ∈ X_n | δ_{x,n} ∈ O(ν)} = #{x ∈ X_n | δ_{G(x),n} ∈ O(ν)}.    (34)

It follows from Formulas (31) and (34) that:

E #{x ∈ X_n | δ_{x,n} ∈ O(ν)} = µ^n {ω ∈ Ω^n | δ_{ω,n} ∈ O(ν)}.

The latter equality converts estimate Formulas (32) and (33) into the already proven estimate Formulas (14) and (15) from the large deviations principle.
It is remarkable that the last reference to the large deviations principle serves as a unique "umbilical cord" linking the first three sections of the paper with the others. Now, we are ready to state an analog of the McMillan theorem for colored branching processes. Let q* be the probability of degeneration of the process (the probability of the event that, starting from a certain number n, all of the sets X_n turn out to be empty).

Theorem 11. For any ε > 0, there exists a neighborhood O(ν), such that almost surely, for all sufficiently large n:

#{x ∈ X_n | δ_{x,n} ∈ O(ν)} ≤ e^{n(−ρ(ν,µ)+ε)};    (35)

and, on the contrary, for any neighborhood O(ν) and any ε > 0, with probability 1 − q*, for all sufficiently large n:

#{x ∈ X_n | δ_{x,n} ∈ O(ν)} ≥ e^{n(−ρ(ν,µ)−ε)}.    (36)
Proof. The application of Chebyshev's inequality to Formula (32) gives:

P { #{x ∈ X_n | δ_{x,n} ∈ O(ν)} > e^{n(−ρ(ν,µ)+2ε)} } ≤ e^{−nε}.

Summing up these inequalities over all n ≥ N, one obtains:

P { #{x ∈ X_n | δ_{x,n} ∈ O(ν)} > e^{n(−ρ(ν,µ)+2ε)} for some n ≥ N } ≤ e^{−Nε} / (1 − e^{−ε}).

This implies Formula (35) with the constant 2ε instead of ε, which does not change its sense. Now, we proceed to the second part of the theorem. Let κ = −ρ(ν, µ) − ε and, at the same time, let the number ε be so small that κ > 0. By the second part of Theorem 10, for any neighborhood O(ν), there exists N, such that:

E #{x ∈ X_N | δ_{x,N} ∈ O(ν)} > e^{Nκ}.

Without loss of generality, we may assume that O(ν) is convex. Now, we wish to construct a Galton–Watson branching process Z_1, Z_2, . . . satisfying the conditions:

Z_1 = #{x ∈ X_N | δ_{x,N} ∈ O(ν)},    (37)
Z_n ≤ #{x ∈ X_nN | δ_{x,nN} ∈ O(ν)}.    (38)

Let the random variable Z_1 be defined by Formula (37). For n > 1, set Z_n to be the total number of genetic lines:

(x_1, . . . , x_N, . . . , x_{(n−1)N+1}, . . . , x_{nN}) ∈ X_nN

such that the spectrum of each segment (x_{kN+1}, . . . , x_{(k+1)N}) belongs to O(ν). In other words, we treat as "individuals" of the process Z_1, Z_2, Z_3, . . . those segments (x_{kN+1}, . . . , x_{(k+1)N}) of genetic lines of the initial process whose spectrum lies in O(ν). Now, Formula (38) follows from the convexity of O(ν); and from the unconditionality of the initial colored branching process, it can be concluded that the sequence Z_1, Z_2, . . . in fact forms a Galton–Watson branching process. By construction, one has EZ_1 > e^{Nκ}. In this setting, Theorem 9 asserts that there is an alternative for the sequence Z_n: either it tends to zero with a certain probability q < 1 or it grows faster than e^{nNκ} with probability 1 − q. In the second case, Formula (38) implies:

#{x ∈ X_nN | δ_{x,nN} ∈ O(ν)} > e^{nNκ}, n → ∞.    (39)

To finish the proof, we have to do two things: verify that Formula (39) is valid with probability 1 − q* and get rid of the multiplier N there. To do this, we will exploit two ideas. First, if the colored branching process X_1, X_2, . . . had been generated by m initial individuals instead of a unique one, then Formula (39) would be valid with a probability of at least 1 − q^m.
Second, if one genetic line is a part of another one and the ratio of their lengths is close to one, then their spectra are close to each other, as well.
Obviously, the total number of individuals in the n-th generation of the initial branching process X 1 , X 2 , X 3 , . . . equals |X n |. The sequence of random variables |X n | forms a Galton-Watson branching process whose probability of degeneration q * does not exceed q. Therefore, the sequence |X n | grows exponentially with probability 1 − q * .
Consider the colored branching process X k+1 , X k+2 , X k+3 , . . . obtained from the initial one by truncation of the first k generations. It can be thought of as a union of |X k | independent branching processes generated by all of the individuals of the k-th generation. Therefore, it satisfies Formula (39) with probability at least 1 − q |X k | . Hence, for the initial process, with even greater probability, we obtain the condition: where O * (ν) is an arbitrary neighborhood of ν containing the closure of O(ν). Suppose that the sequence |X n | grows exponentially. For every m ∈ N, we set the numbers: For each k = k i , condition (40) holds with probability at least 1 − q m , and together they give the estimate: #{x ∈ X n | δ x,n ∈ O * (ν)} > e nκ , n → ∞, with probability at least 1 − N q m . By the arbitrariness of m, this estimate is valid almost surely (under the condition |X n | → ∞, which takes place with probability 1 − q * ). This is equivalent to Formula (36).
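As a sanity check on the asymptotics e^{−nρ(ν,µ)} (not part of the proof), the branching property yields an exact first-moment formula: the expected number of genetic lines of length n with prescribed color counts k 1 , . . . , k r equals the multinomial coefficient times the product of µ(i)^{k i }. Its logarithm divided by n approaches −ρ(ν, µ) with ν = k/n. A Python sketch with assumed toy values of µ:

```python
from math import comb, log

# First-moment check: E #{lines of length n with k_i individuals of color i}
#   = multinomial(n; k_1, ..., k_r) * prod_i mu(i)^{k_i},
# and (1/n) * log of this should approach -rho(nu, mu), where
#   rho(nu, mu) = sum_i nu(i) * ln(nu(i)/mu(i)).

def rho(nu, mu):
    return sum(p * log(p / m) for p, m in zip(nu, mu) if p > 0)

def log_expected_count(k, mu):
    n = sum(k)
    log_multinomial = 0.0
    rest = n
    for k_i in k:
        log_multinomial += log(comb(rest, k_i))
        rest -= k_i
    return log_multinomial + sum(k_i * log(m) for k_i, m in zip(k, mu))

mu = [1.0, 2.0]   # expected offspring counts per color (NOT a probability)
n = 400
k = [100, 300]    # color composition, so nu = (0.25, 0.75)
nu = [k_i / n for k_i in k]
rate = log_expected_count(k, mu) / n
# rate should be close to -rho(nu, mu); the Stirling error is O(log n / n)
```

Note that, as remarked in the Introduction, ρ(ν, µ) can be negative here because µ is not a probability measure, which makes the expected count grow rather than decay.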

Dimensions of Random Fractals (Upper Bounds)
Here, we continue the investigation of a colored branching process X 1 , X 2 , . . . with a finite collection of colors Ω = {1, . . . , r}. Let us consider the corresponding set of infinite genetic lines: Define the cylindrical θ-metric on X ∞ : where the numbers θ(1), . . . , θ(r) are taken from (0, 1). We will be interested in the Hausdorff dimensions of both the space X ∞ and its various subsets defined in terms of the partial limits of empirical measures on Ω (we call these measures spectra and denote them by δ x,n ; see Formula (29)). If the colored branching process degenerates, then X ∞ = ∅. Therefore, the only case of interest is when m = E|X 1 | > 1, in which case the cardinality of X n grows at a rate of order m n .
As before, we denote by µ(i), where i ∈ Ω, the expected number of individuals of color i in the random set X 1 . It will always be assumed that µ(i) < ∞. Consider any probability measure ν ∈ M 1 (Ω). It will be proven below that the dimension of the set {x ∈ X ∞ | δ x,n → ν } can be computed by means of the function: We will call it the Billingsley-Kullback entropy.
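The display defining d(ν, µ, θ) is not reproduced above. The following Python sketch uses a reconstruction that is consistent with the rest of the text: it returns s whenever ν(i) = µ(i)θ(i)^s, as in the proof of Theorem 16, and satisfies d ≤ s by Jensen's inequality, as used in Theorem 12. The reconstructed formula is d(ν, µ, θ) = Σ ν(i) ln(µ(i)/ν(i)) / Σ ν(i) ln(1/θ(i)).

```python
from math import log

# A sketch of the Billingsley-Kullback entropy, reconstructed from the
# consistency requirements in the text (d(nu, mu, theta) = s whenever
# nu(i) = mu(i) * theta(i)^s, and d <= s by Jensen's inequality):
#   d(nu, mu, theta) = sum_i nu(i) ln(mu(i)/nu(i)) / sum_i nu(i) ln(1/theta(i)).

def d(nu, mu, theta):
    numerator = sum(p * log(m / p) for p, m in zip(nu, mu) if p > 0)
    denominator = sum(p * log(1 / t) for p, t in zip(nu, theta) if p > 0)
    return numerator / denominator

# Consistency check: with theta = (1/2, 1/4) and mu = (1, 2), the measure
# nu(i) = mu(i) * theta(i)^1 = (1/2, 1/2) is a probability, so s = 1 solves
# the Bowen equation and d(nu, mu, theta) should equal 1.
nu = [0.5, 0.5]
mu = [1.0, 2.0]
theta = [0.5, 0.25]
```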
The proof is carried out in the same manner as in Lemma 1. Take any κ > 0. By Theorem 11, there exists a neighborhood O(ν), such that almost surely: Let us shrink this neighborhood in such a way that, in addition to Formula (43), for all δ ∈ O(ν): Then, for each cylinder Z n (x) satisfying the condition δ x,n ∈ O(ν), the following estimate holds: For every natural N , the set A is covered by the collection of cylinders: Evidently, the diameter of this covering tends to zero as N → ∞. Hence, mes(U N , α) can be estimated by means of Formulas (43) and (44): If α > d(ν, µ, θ), then κ can be chosen so small that the last exponent in braces is negative, and the whole sum (45) tends to zero as N → ∞. Thus, the Hausdorff measure (of dimension α) of the set A is zero, and its dimension does not exceed α.
As before, we say that a sequence of empirical measures δ x,n condenses on a subset V ⊂ M 1 (Ω) (notation δ x,n V ) if it has a limit point in V .
Theorem 12. Let X 1 , X 2 , . . . be an unconditional colored branching process with a finite set of colors Ω, and let the set X ∞ of all infinite genetic lines be equipped with a cylindrical θ-metric (41). Then, for any subset V ⊂ M 1 (Ω), almost surely: In particular, dim H X ∞ ≤ s almost surely, where s is the unique root of the "Bowen equation": Proof. It follows from the definition of the Billingsley-Kullback entropy d(ν, µ, θ) that it depends continuously on the measure ν ∈ M 1 (Ω). Let V̄ be the closure of V . Obviously, it is compact. Take an arbitrary ε > 0. By Lemma 3, for any measure ν ∈ V̄ , there exists a neighborhood O(ν), such that almost surely: Choose a finite covering of the set V̄ by neighborhoods of this kind. Then, the set {x ∈ X ∞ | δ x,n V } will be covered by a finite collection of sets of the form {x ∈ X ∞ | δ x,n O(ν)} satisfying Formula (48). By the arbitrariness of ε, this implies the first statement of Theorem 12.
Let s be a solution to Equation (47). Note that for any ν ∈ M 1 (Ω), the concavity of the logarithm function implies that: Consequently, d(ν, µ, θ) ≤ s. Now, the second part of the theorem follows from the first one if we take V = M 1 (Ω).
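In the notation above, the Bowen equation (47) reads Σ i µ(i)θ(i)^s = 1 (a reconstruction, consistent with the fact that the measure ν(i) = µ(i)θ(i)^s used in the proof of Theorem 16 is a probability). Since each θ(i) lies in (0, 1), the left-hand side is strictly decreasing in s, so the root is unique and can be found by bisection; a sketch with assumed toy values of µ and θ:

```python
# Solving the Bowen equation sum_i mu(i) * theta(i)^s = 1 for s (reconstructed
# form of Equation (47)).  The left-hand side strictly decreases in s because
# every theta(i) is in (0, 1), so bisection applies.

def bowen_root(mu, theta, lo=-50.0, hi=50.0, tol=1e-12):
    def lhs(s):
        return sum(m * t**s for m, t in zip(mu, theta))
    for _ in range(200):
        mid = (lo + hi) / 2
        if lhs(mid) > 1.0:
            lo = mid   # lhs too big: the root lies to the right
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

s = bowen_root([1.0, 2.0], [0.5, 0.25])   # with these values, s = 1 is the root
```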
Remark. Usually, the term "Bowen equation" is used for the equation P (sϕ) = 0, where P (sϕ) is the topological pressure of a weight function sϕ in some dynamical system (more detailed explanations can be found in [13]). If we replace the topological pressure P (sϕ) by the spectral potential: then the Bowen equation turns into the equation λ(sϕ, µ) = 0, which is equivalent to Formula (47).

Block Selections of Colored Branching Processes
Let ξ 1 , ξ 2 , ξ 3 , . . . be a sequence of independent identically distributed random variables taking values of zero or one (independent Bernoulli trials).
Proof. In the case P{ξ i = 1} = p, this follows from the law of large numbers. If P{ξ i = 1} increases, then the probability on the left-hand side of Formula (49) increases as well.
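The exact form of the bound (49) is not reproduced here, but its law-of-large-numbers content can be checked exactly for a toy choice of parameters: the probability of at least m wins in k Bernoulli trials is an exact binomial tail, and it is close to one when m is well below pk. A hedged Python sketch:

```python
from math import comb

# Exact binomial tail P{at least m wins in k trials with win probability p}.
# With m well below the mean p*k, this tail is close to 1 (the LLN content of
# Lemma 4); it is also monotone in p, as used in the proof above.

def binomial_tail(k, p, m):
    """P{Binomial(k, p) >= m}, computed exactly from the pmf."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(m, k + 1))

tail = binomial_tail(500, 0.6, 150)   # m = 0.3*k, half the mean 0.6*k
```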
Consider a colored branching process X 1 , X 2 , . . . with a finite set of colors Ω = {1, . . . , r}. Each X n consists of genetic lines (x 1 , x 2 , . . . , x n ) of length n, in which every subsequent individual is born by the previous one. Fix a (large enough) natural number N . We will split genetic lines of length divisible by N into blocks of length N : Each block y k generates an empirical measure δ y k (spectrum) on Ω by the rule: where g(x t ) denotes the color of x t . A block selection of order N from a colored branching process X 1 , X 2 , . . . is any sequence of random subsets Y n ⊂ X nN with the following property: if (y 1 , . . . , y n+1 ) ∈ Y n+1 , then (y 1 , . . . , y n ) ∈ Y n . In this case, the sequence of blocks (y 1 , . . . , y n+1 ) is called a prolongation of the sequence (y 1 , . . . , y n ).
As above (see Formula (30)), denote by µ(i) the expected number of children of color i born by each individual and by µ the corresponding measure on Ω.
Theorem 13. Let X 1 , X 2 , . . . be an unconditional colored branching process with a finite set of colors Ω and the probability of degeneration q * < 1. If a measure ν ∈ M 1 (Ω) satisfies the condition ρ(ν, µ) < 0, then for any neighborhood O(ν) ⊂ M 1 (Ω) and any number ε > 0, with probability 1 − q * , one can extract from the branching process a block selection Y 1 , Y 2 , . . . of some order N , such that each sequence of blocks (y 1 , . . . , y n ) ∈ Y n has at least l(N ) prolongations in Y n+1 , where: and the spectra of all blocks belong to O(ν).
Proof. Fix any numbers p and ε satisfying the conditions: By the second part of Theorem 11, for all large enough N , we have: Further, we will consider finite sequences of random sets X 1 , . . . , X nN and extract from them block selections Y 1 , . . . , Y n of order N , such that the spectra of all of their blocks belong to O(ν) and each sequence of blocks (y 1 , . . . , y k ) ∈ Y k has at least l(N ) prolongations in Y k+1 . Denote by A n the event of the existence of a block selection with these properties. Define one more event A by the condition: It follows from Formulas (50) and (51) that P(A) ≥ p + ε. Evidently, A ⊂ A 1 . Therefore, P(A 1 ) ≥ p + ε. Now, we are going to prove by induction that P(A n ) ≥ p whenever the order N of our block selection is large enough. Let us perform the induction step. Suppose that P(A n ) ≥ p holds for some n. Consider the conditional probability P(A n+1 |A). By the definition of the events A n+1 and A, it cannot be less than the probability of the following event: there are at least l(N ) wins in a sequence of [l(N )e N ε/2 ] independent Bernoulli trials with the probability of win P(A n ) in each. Choosing N large enough and using Lemma 4 (with p = p/2 and k = [l(N )e N ε/2 ]), one can make this probability greater than 1 − ε. Then: Thus, the inequality P(A n ) ≥ p is proven for all n. This means that with probability at least p, one can extract from the sequence X 1 , . . . , X nN a block selection Y 1 , . . . , Y n of order N , such that the spectra of all blocks belong to the neighborhood O(ν) and each sequence of blocks (y 1 , . . . , y k ) ∈ Y k has at least l(N ) prolongations in Y k+1 .
To obtain a block selection of infinite length with the same properties, we will construct finite block selections Y 1 , . . . , Y n in the following manner. Initially, suppose that every Y k , where k ≤ n, consists of all of the sequences of blocks (y 1 , . . . , y k ) ∈ X kN , such that the spectrum of each block lies in O(ν). At the first step, we exclude from Y n−1 all of the sequences of blocks having fewer than l(N ) prolongations in Y n and then exclude from Y n all prolongations of the sequences that were excluded from Y n−1 . At the second step, we exclude from Y n−2 all of the sequences of blocks having, after the first step, fewer than l(N ) prolongations in the modified Y n−1 , and then exclude from Y n−1 and Y n all prolongations of the sequences that were excluded from Y n−2 . Proceeding further in the same manner, after n steps, we will obtain a block selection Y 1 , . . . , Y n , such that each sequence of blocks from any Y k has at least l(N ) prolongations in Y k+1 . Evidently, this selection is the maximal one among all selections of order N having the mentioned "l(N )-prolongations" property. Therefore, with probability at least p, all of the sets Y k are nonempty.
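The pruning procedure just described is effectively an algorithm. Below is a hypothetical Python sketch on finite data (the tree representation and all names are ours, not the paper's): repeatedly delete sequences with fewer than l prolongations, together with all of their descendants, until the family stabilizes.

```python
# Maximal "l-prolongations" sub-selection: prune sequences with fewer than l
# prolongations (and orphaned descendants) until nothing changes.

def maximal_selection(levels, l):
    """levels[k] is a set of tuples of length k+1; y' prolongs y = y'[:-1]."""
    levels = [set(level) for level in levels]
    changed = True
    while changed:
        changed = False
        # drop sequences whose parent has been removed
        for k in range(1, len(levels)):
            orphans = {y for y in levels[k] if y[:-1] not in levels[k - 1]}
            if orphans:
                levels[k] -= orphans
                changed = True
        # drop non-terminal sequences with fewer than l prolongations
        for k in range(len(levels) - 1):
            children = {}
            for y in levels[k + 1]:
                children[y[:-1]] = children.get(y[:-1], 0) + 1
            poor = {y for y in levels[k] if children.get(y, 0) < l}
            if poor:
                levels[k] -= poor
                changed = True
    return levels

# Toy example: ('b',) has only one prolongation, so it is pruned with its child.
levels = [
    {('a',), ('b',)},
    {('a', 'x'), ('a', 'y'), ('b', 'x')},
]
pruned = maximal_selection(levels, l=2)
```

The fixed point of this loop is exactly the maximal sub-family in which every non-terminal sequence has at least l prolongations, mirroring the n-step exclusion argument above.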
For every n, let us construct, as described above, the maximal block selection Y_1^(n), . . . , Y_n^(n) possessing the mentioned "l(N )-prolongations" property. From the maximality of these selections, it follows that Y_k^(n+1) ⊆ Y_k^(n) for all k ≤ n. Set Y_k to be the intersection of the sets Y_k^(n) over all n ≥ k. Then, with probability at least p, all of them are nonempty and compose an infinite block selection as in Theorem 13. Since p may be chosen arbitrarily close to 1 − q * , such selections exist with probability 1 − q * . Theorem 13 can be strengthened by taking several measures in place of a single measure ν ∈ M 1 (Ω).
It can be proven in the same manner as the previous one. Here, the event A n should be understood as the existence of a finite block selection Y 1 , . . . , Y n satisfying the conclusion of Theorem 14, and the event A should be defined by the system of inequalities: We leave the details to the reader.

Dimensions of Random Fractals (Lower Bounds)
Now, we continue the investigation of the space of infinite genetic lines: generated by an unconditional colored branching process X 1 , X 2 , . . . with a finite set of colors Ω = {1, . . . , r}. It is assumed that there is a measure: on Ω, where µ(i) denotes the expected number of children of color i born by each individual, and that X ∞ is equipped with the cylindrical θ-metric (41).
Theorem 15. Let X 1 , X 2 , X 3 , . . . be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q * < 1. If a measure ν ∈ M 1 (Ω) satisfies the condition d(ν, µ, θ) > 0, then, with probability 1 − q * , for any neighborhood O(ν) we have the lower bound: Proof. Let us fix a number α < d(ν, µ, θ) and ε > 0 so small that: Then, we choose a convex neighborhood O * (ν), whose closure lies in O(ν), such that for any measure δ ∈ O * (ν): By Theorem 13, with probability 1 − q * , one can extract from the branching process under consideration a block selection Y 1 , Y 2 , . . . of some order N , such that any sequence of blocks (y 1 , . . . , y n ) from Y n has at least l(N ) prolongations in Y n+1 , where: and for each block y k = (x (k−1)N +1 , . . . , x kN ), the corresponding empirical measure δ y k (spectrum) belongs to O * (ν). Let us exclude from this selection a certain part of the genetic lines in such a way that each of the remaining sequences of blocks (y 1 , . . . , y n ) ∈ Y n has exactly [l(N )] prolongations in Y n+1 .
Any sequence y = (y 1 , y 2 , . . . ) ∈ Y ∞ consists of blocks of length N . Writing down in order the elements of all of these blocks, we obtain from y an infinite genetic line x = (x 1 , x 2 , . . . ) ∈ X ∞ . Denote it by π(y). By the definition of Y ∞ , the spectrum of each block y k belongs to O * (ν). For every point x = π(y), where y ∈ Y ∞ , the empirical measure δ x,nN is the arithmetic mean of the empirical measures corresponding to the first n blocks of y, and so it belongs to O * (ν) as well. It follows that: The family of all cylinders of the form Z nN (x), where x ∈ π(Y ∞ ), generates a σ-algebra on π(Y ∞ ). Define a probability measure P on this σ-algebra, such that: Then, for all large enough N , all x ∈ π(Y ∞ ) and all natural n: P(Z nN (x)) ≤ e nN (ρ(ν,µ)+2ε) .
On the other hand, by Formula (54): It follows from the last two formulas and Formula (53) that: Now, we are ready to compute the Hausdorff measure of dimension α of the set π(Y ∞ ). If, while computing the Hausdorff measure, we used coverings of π(Y ∞ ) not by arbitrary cylinders, but only by cylinders of orders divisible by N , then the last formula would imply that this measure is not less than P (π(Y ∞ )) = 1. Any cylinder can be embedded in a cylinder of order divisible by N , such that the difference of their orders is less than N and the ratio of their diameters is greater than (min i θ(i)) N . Therefore, mes(π(Y ∞ ), α) ≥ (min i θ(i)) N α , and hence, dim H π(Y ∞ ) ≥ α.
The set defined on the right-hand side of Formula (55) contains π(Y ∞ ). Thus, its dimension is at least α. Recall that we have proven this fact by means of a block selection that exists with probability 1 − q * . By the arbitrariness of α < d(ν, µ, θ), this implies the desired bound (52) with the same probability.
Theorem 16. Let s be the root of the Bowen equation (47). If s ≤ 0, then X ∞ = ∅ almost surely. Otherwise, if s > 0, then X ∞ is nonempty with a positive probability, and with the same probability, its dimension equals s.
We note that a more general version of Theorem 16, in which the similarity coefficients θ(i) are random, is proven in [4-6,14].
Proof. The expectation of the total number of children born by each individual in the branching process generating the set X ∞ is equal to m = µ(1) + · · · + µ(r). If s ≤ 0, then m ≤ 1. In this case, by Theorem 8, our branching process degenerates almost surely, and X ∞ = ∅.
If s > 0, then m > 1. In this case, by Theorem 8, our branching process does not degenerate with a positive probability, and X ∞ is nonempty. Define a measure ν ∈ M 1 (Ω) by means of the equality: ν(i) = µ(i)θ(i) s , i ∈ Ω.
Then, evidently, d(ν, µ, θ) = s. By the previous theorem, dim H X ∞ ≥ s with the same probability with which X ∞ ≠ ∅. On the other hand, by Theorem 12, the reverse inequality holds almost surely.
For every probability measure ν ∈ M 1 (Ω), we define its basin B(ν) ⊂ X ∞ as the set of all infinite genetic lines x = (x 1 , x 2 , x 3 , . . . ), such that the corresponding sequence of empirical measures δ x,n converges to ν. What is the dimension of B(ν)? By Theorem 12, it does not exceed the Billingsley-Kullback entropy d(ν, µ, θ) with probability one. On the other hand, the reverse inequality does not follow from the previous results (and, in particular, from Theorem 15). To obtain it, we have to enhance the machinery of block selections.
Lemma 5. Let Q 1 , . . . , Q 2^r be the vertices of a cube in R^r . Then, there exists a law of choice i : R^r → {1, . . . , 2^r }, such that if the neighborhoods O(Q i ) are small enough and the sequences δ n ∈ R^r and ∆ n = (δ 1 + · · · + δ n )/n satisfy the conditions δ n+1 ∈ O(Q i(∆ n ) ) and δ 1 ∈ O(Q 1 ) ∪ · · · ∪ O(Q 2^r ), then the sequence ∆ n converges to the center of the cube.
Proof. First, we consider the case r = 1, when the cube is a segment. Let, for definiteness, Q 1 = −1 and Q 2 = 1. Set: Take any neighborhoods O(Q 1 ) and O(Q 2 ) with radii at most one. Then, for any sequence δ n satisfying the conditions δ n+1 ∈ O(Q i(∆ n ) ) and |δ 1 | < 2, we have the estimate |∆ n | < 2/n, which can easily be proven by induction. Thus, in the one-dimensional case, the lemma is proven. To prove it in the multidimensional case, one should choose a coordinate system with the origin at the center of the cube and the axes parallel to its edges and apply the law of choice (56) to each coordinate independently.
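A one-dimensional simulation of this argument, assuming the natural law of choice "aim at the vertex opposite in sign to ∆ n" (the exact Formula (56) is not reproduced above): with Q 1 = −1, Q 2 = 1, neighborhoods of radius at most one, and |δ 1 | < 2, induction gives |∆ n | < 2/n, so ∆ n → 0, the center of the segment.

```python
import random

# One-dimensional sketch of the law of choice in Lemma 5 (Q1 = -1, Q2 = 1;
# the exact formula (56) is not reproduced, this is an assumed natural form):
# always aim at the vertex opposite in sign to the running average Delta_n.
# If each delta_{n+1} lies within distance 1 of the chosen vertex and
# |delta_1| < 2, then |Delta_n| < 2/n by induction, hence Delta_n -> 0.

def law_of_choice(delta_mean):
    return -1.0 if delta_mean >= 0 else 1.0   # vertex of opposite sign

rng = random.Random(0)
partial_sum = rng.uniform(-2.0, 2.0)          # delta_1 with |delta_1| < 2
bounds_hold = True
for n in range(1, 2000):
    bounds_hold = bounds_hold and abs(partial_sum / n) < 2.0 / n + 1e-12
    target = law_of_choice(partial_sum / n)
    partial_sum += rng.uniform(target - 1.0, target + 1.0)  # delta_{n+1} near Q_i
```

The induction step is visible in the update: if the partial sum S n lies in [0, 2), the next increment lies in (−2, 0], so S n+1 stays in (−2, 2), and symmetrically for negative S n.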
We exclude from this block selection a certain part of the genetic lines, so that each of the remaining sequences of blocks (y 1 , . . . , y n ) ∈ Y n has exactly [l(N )] prolongations (y 1 , . . . , y n , y) ∈ Y n+1 , and all of these prolongations satisfy the law of choice from Lemma 5, namely, δ y ∈ O(Q i(∆ n ) ), where ∆ n = (δ y 1 + · · · + δ y n )/n.