The Kullback–Leibler Information Function for Infinite Measures

Abstract: In this paper, we introduce the Kullback–Leibler information function ρ(ν, µ) and prove the local large deviation principle for σ-finite measures µ and finitely additive probability measures ν. In particular, the entropy of a continuous probability distribution ν on the real axis is interpreted as the exponential rate of asymptotics for the Lebesgue measure of the set of those samples that generate empirical measures close to ν in a suitable fine topology.


Introduction
Let P be a continuous probability distribution on the real axis with density ϕ(x) = dP(x)/dx. Its entropy is defined as

H(P) = −∫ ϕ(x) ln ϕ(x) dx. (1)

What is the substantive sense of H(P)? More precisely, does there exist a mathematical object whose natural quantitative magnitude (e.g., volume) is a certain function of the entropy? Traditionally, entropy is treated as a measure of disorder. However, this explanation does not answer the question stated above because it does not establish a relationship between entropy and any other quantitative characteristic of disorder that can be defined and measured regardless of the entropy.
To illustrate the problem, consider the entropy of a discrete distribution P = (p_1, ..., p_r),

H(P) = −∑_{i=1}^{r} p_i ln p_i. (2)

Its substantive meaning is well known. Namely, let X = {1, ..., r} be a finite alphabet. Then, the set of those words (x_1, ..., x_n) ∈ X^n of length n ≫ 1 in which every letter i ∈ X occurs with mean frequency close to p_i has cardinality of order e^{nH(P)} (this follows from the Shannon–McMillan–Breiman theorem; see [1,2]). Thus, the entropy of a discrete distribution determines the exponential rate for the number of those words of length n in which letters occur with prescribed frequencies.
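This counting interpretation is easy to see numerically. The following sketch uses a binary alphabet with illustrative probabilities (chosen for convenience, not taken from the paper): it counts the words of length n whose letter frequencies match P exactly and compares the growth rate of that count with the entropy H(P).

```python
import math

# Alphabet X = {0, 1} with P = (0.75, 0.25): count words of length n
# in which the letters occur with exactly these frequencies and compare
# the exponential rate of that count with the entropy H(P).
p = (0.75, 0.25)
H = -sum(q * math.log(q) for q in p)   # entropy H(P) of formula (2)

n = 400                                # word length (a multiple of 4)
k = int(p[0] * n)                      # occurrences of the letter 0
count = math.comb(n, k)                # words with exactly these frequencies
rate = math.log(count) / n             # empirical exponential rate

print(f"H(P) = {H:.4f}, log(count)/n = {rate:.4f}")
```

Up to a subexponential (polynomial) correction, the count behaves like e^{nH(P)}, so the printed rate approaches H(P) from below as n grows.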
Can we say anything of that sort about the entropy of a continuous distribution? It turns out that the answer is yes. Indeed, from Theorem 3 stated below, it follows that entropy (1) determines the exponential rate for the Lebesgue measure of the set of sequences (x_1, ..., x_n) ∈ R^n of length n ≫ 1 that generate empirical measures on R close to P. The proximity of distributions should be understood here in the sense of a fine topology, which is defined in the same way as the weak topology, but with the use of integrable functions instead of bounded ones.
For example, if P is the exponential distribution with density ϕ(x) = λe^{−λx}, x ≥ 0, then H(P) = 1 − ln λ = ln(e/λ), and so the set of sequences (x_1, ..., x_n) ∈ R^n of length n ≫ 1 that generate empirical measures close to P (in the fine topology) has Lebesgue measure of order e^{nH(P)} = (e/λ)^n.
Another example: for the Gaussian distribution P with density ϕ(x) = (2πσ²)^{−1/2} e^{−(x−a)²/(2σ²)}, we get H(P) = (1/2) ln(2πσ²e), and the set of sequences (x_1, ..., x_n) ∈ R^n of length n ≫ 1 that generate empirical measures close to P (in the fine topology) has Lebesgue measure of order e^{nH(P)} = (2πσ²e)^{n/2}. These examples are based on the presentation of entropy (1) in the form H(P) = −ρ(P, Q), where Q is the Lebesgue measure on the real axis and ρ(P, Q) is the Kullback–Leibler information function

ρ(P, Q) = ∫ ln (dP/dQ) dP, (3)

as well as on a certain generalization of the so-called local large deviation principle. Let P and Q be two probability distributions on a space X. Roughly speaking, the local large deviation principle asserts that the measure Q^n of the set of sequences (x_1, ..., x_n) ∈ X^n that generate empirical measures close to P has exponential order e^{−nρ(P,Q)} as n → +∞.
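Both closed-form entropy values are easy to confirm numerically. The sketch below (with illustrative parameters λ = 2 and σ = 1.5, not taken from the paper) approximates H(P) = −∫ ϕ ln ϕ dx by a midpoint Riemann sum.

```python
import math

# Numerically check the two entropy values used above:
# the exponential density φ(x) = λ e^{-λx} gives H = 1 - ln λ = ln(e/λ),
# the Gaussian density gives H = (1/2) ln(2π σ² e).
def entropy(phi, a, b, m=200_000):
    # H(P) = -∫ φ(x) ln φ(x) dx over [a, b], with the convention 0 ln 0 = 0
    h = (b - a) / m
    s = 0.0
    for i in range(m):
        x = a + (i + 0.5) * h
        v = phi(x)
        if v > 0:
            s += v * math.log(v)
    return -s * h

lam, sigma = 2.0, 1.5
H_exp = entropy(lambda x: lam * math.exp(-lam * x), 0.0, 40.0)
H_gauss = entropy(lambda x: math.exp(-x * x / (2 * sigma ** 2))
                  / math.sqrt(2 * math.pi * sigma ** 2), -30.0, 30.0)

assert abs(H_exp - (1 - math.log(lam))) < 1e-4
assert abs(H_gauss - 0.5 * math.log(2 * math.pi * sigma ** 2 * math.e)) < 1e-4
print(H_exp, H_gauss)
```

The integration intervals are cut off where the integrands are negligibly small; a finer grid only tightens the agreement.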
It should be mentioned that different authors refer to the function ρ(P, Q) by different names: the Kullback–Leibler information function [4], the relative entropy [6], the rate function [5,7,15], the Kullback–Leibler divergence, the action functional [16], and the Kullback–Leibler distance [20] (though, of course, it is nonsymmetric and hence not a metric at all). For brevity, in the sequel, we will prefer the term "Kullback action" to any of those listed above.
Until recently, the Kullback action and the local large deviation principle were studied only in the case when both arguments P and Q were probability distributions. Only recently, in the papers [9,10], was the measure Q allowed to be merely finite and positive (not necessarily normalized), while the measure P was allowed to be finitely additive and, moreover, real-valued. Unfortunately, this is still insufficient for the interpretation of entropy (1) because the Lebesgue measure on the real axis is infinite. Therefore, it is highly desirable to define the Kullback action properly and to obtain a generalization of the local large deviation principle for infinite measures Q. Our main result is the solution of this problem.
It turns out that at least two different ways of generalization are possible. The first approach is based on the use of the fine topology in the space of probability distributions. This is presented in Theorem 3. In the second approach, the whole space X is replaced by a suitable subset Y of finite measure Q, and the distribution P is replaced by its conditional distribution P_Y on Y. Thereby, the problem reduces to the case of finite measures. This approach is implemented in Theorems 4 and 5.
In fact, it makes sense to consider finitely additive probability distributions P as well, since some sequences of empirical measures may converge to finitely additive distributions. In such a case, the Kullback action can take the values +∞ or −∞ only (Theorem 6). The corresponding versions of the large deviation principle for finitely additive measures P are presented in Theorems 7 and 8.
The first results on the large deviation principle for infinite measures were obtained in [21,22], where a countable set X and the "counting" measure Q (such that Q(x) = 1 for all x ∈ X) were considered. In such a case, the Kullback action ρ(P, Q) coincides (up to the sign) with entropy (2). It was revealed in [21,22] that, for the "counting" measure Q on the countable space X, the ordinary form of the large deviation principle, formulated in terms of the weak topology, fails, and so one should use the fine topology instead.
The paper is organized as follows. In the next section, we recall the local large deviation principle for finite measures (Theorem 1). In Section 3, we define the Kullback action ρ(ν, µ) as the Legendre dual functional to the so-called spectral potential λ(ϕ, µ) and formulate two variants of the large deviation principle for the case of a σ-finite measure µ (Theorems 3-5). These theorems are proven in Sections 4-7. In Section 8, we formulate two variants of the large deviation principle for σ-finite measures µ and finitely additive probability distributions ν (Theorems 7 and 8). Theorem 6 states that, in fact, ρ(ν, µ) turns into +∞ or −∞ if the measure ν has no density with respect to µ. It is proven in Section 9. The final Section 10 contains the proofs of Theorems 7 and 8.

The Kullback Action for Finite Measures
Let us consider an arbitrary set X supplied with a σ-field A of its subsets. In what follows, by "measures" we mean only nonnegative measures on the measurable space (X, A).
Suppose that ν, µ ∈ M_σ(X) and the measure ν is absolutely continuous with respect to µ. Then, by the Radon–Nikodym theorem, ν can be presented in the form ν = ϕµ, where ϕ is a nonnegative measurable function, which is called the density of ν with respect to µ and is denoted by ϕ = dν/dµ. This function is uniquely defined up to a set of zero measure µ.
The Kullback action ρ(ν, µ) is a function of a probability measure ν ∈ M_1(X) and a finite measure µ ∈ M(X) defined in the following way: if ν is absolutely continuous with respect to µ, then

ρ(ν, µ) = ∫_X ϕ ln ϕ dµ, where ϕ = dν/dµ, (4)

and ρ(ν, µ) = +∞ otherwise. In (4), we set ϕ ln ϕ = 0 for ϕ = 0. Therefore, ρ(ν, µ) belongs to the interval (−∞, +∞].

With each finite sequence x = (x_1, ..., x_n) ∈ X^n, we associate an empirical measure δ_{x,n} ∈ M_1(X) that is supported on the set {x_1, ..., x_n} and assigns to each x_i the measure 1/n. The expectation of any function f : X → R with respect to this empirical measure looks like

δ_{x,n}[f] = (1/n) ∑_{i=1}^{n} f(x_i).

Let us fix any probability measure µ ∈ M_1(X). If the points x_i ∈ X are treated as independent random variables with common distribution µ, then the empirical measure δ_{x,n} becomes a random variable itself, taking values in M_1(X). We will be interested in the asymptotics of its distribution. It turns out that, to a first approximation, this asymptotics is exponential with the exponent −nρ(ν, µ).
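On a finite set, formula (4) reduces to a finite sum, ρ(ν, µ) = ∑_i ϕ(i) ln ϕ(i) µ(i) with ϕ(i) = ν(i)/µ(i). A minimal sketch (the weights below are illustrative, not taken from the paper):

```python
import math

def kullback_action(nu, mu):
    # ρ(ν, µ) of formula (4) for measures on a finite set;
    # returns +inf when ν is not absolutely continuous w.r.t. µ
    rho = 0.0
    for i, ni in nu.items():
        mi = mu.get(i, 0.0)
        if mi == 0.0:
            if ni > 0.0:
                return math.inf
            continue
        phi = ni / mi
        if phi > 0.0:                    # convention: φ ln φ = 0 at φ = 0
            rho += phi * math.log(phi) * mi
    return rho

nu = {"a": 0.5, "b": 0.3, "c": 0.2}      # probability measure ν
mu = {"a": 0.4, "b": 0.4, "c": 0.2}      # finite measure µ
print(kullback_action(nu, mu))
assert kullback_action(nu, nu) == 0.0    # ρ(ν, ν) = 0
assert kullback_action({"a": 1.0}, {"b": 1.0}) == math.inf
```

When µ is not normalized, ρ(ν, µ) may be negative, in agreement with the interval (−∞, +∞] stated above.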
To describe the asymptotics of the empirical measures distribution, we need two topologies on the space M_1(X). The first one is the weak topology generated by neighborhoods of the form

O(µ) = { ν ∈ M_1(X) : |∫_X f_i dν − ∫_X f_i dµ| < ε, i = 1, ..., k }, (5)

where f_1, ..., f_k ∈ B(X) and ε > 0. The second topology is generated by neighborhoods of the same form (5) but with functions f_1, ..., f_k ∈ L_1(X, µ) therein. In addition, it is supposed in this case that O(µ) contains only those measures ν for which all the integrals ∫_X f_i dν exist. This topology will be referred to as the fine topology. It is useful because it enables us to formulate the usual law of large numbers in the following form: for any probability distribution µ ∈ M_1(X), the sequence of empirical measures δ_{x,n} converges to µ in probability in the fine topology. On the other hand, a shortcoming of the fine topology is the fact that, with respect to it, the affine map t → (1 − t)µ_0 + tµ_1, where t ∈ [0, 1], may be discontinuous at the ends of the segment [0, 1].
It is easy to see that the fine topology on M_1(X) contains the weak one, but the converse, in general, does not hold.
For any nonnegative measure µ on X, denote by µ^n its n-th Cartesian power, a measure on X^n. The next theorem describes the asymptotics of the empirical measures distribution.
Theorem 1 (the local large deviation principle for finite measures). For any measures ν ∈ M_1(X), µ ∈ M(X), and number ε > 0, there exists a weak neighborhood O(ν) ⊂ M_1(X) such that

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≤ e^{−n(ρ(ν,µ)−ε)}. (6)

On the other hand, for any measures ν ∈ M_1(X), µ ∈ M(X), number ε > 0, and any fine neighborhood O(ν) ⊂ M_1(X), the following estimate holds for all large enough n:

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≥ e^{−n(ρ(ν,µ)+ε)}. (7)

In the case of a metric space X supplied with a Borel σ-field, the neighborhood O(ν) in (6) can be chosen from the weak topology generated by bounded continuous functions.
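The content of Theorem 1 can be observed numerically in the simplest setting X = {0, 1} with µ = Bernoulli(1/2) and ν = Bernoulli(0.7) (illustrative parameters, not from the paper): the µ^n-measure of the samples whose empirical frequency of ones lies within ε of 0.7 decays at an exponential rate that approaches ρ(ν, µ) as the neighborhood shrinks.

```python
import math

p, q = 0.7, 0.5    # ν = Bernoulli(p), µ = Bernoulli(q)
# Kullback action; here it is the ordinary Kullback-Leibler divergence
rho = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def log_comb(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def rate(n, eps):
    # -(1/n) ln µ^n{ x : |δ_{x,n}({1}) - p| < eps }, computed exactly
    logs = [log_comb(n, k) + k * math.log(q) + (n - k) * math.log(1 - q)
            for k in range(n + 1) if abs(k / n - p) < eps]
    m = max(logs)
    logprob = m + math.log(sum(math.exp(v - m) for v in logs))
    return -logprob / n

n = 2000
rates = [rate(n, eps) for eps in (0.1, 0.03, 0.01)]
print(rates, rho)   # the rates increase toward rho as eps shrinks
```

For a fixed neighborhood, the decay rate stays below ρ(ν, µ), in agreement with the lower bound (7); shrinking the neighborhood recovers the upper bound (6).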
Remark 2. Since each weak neighborhood in M_1(X) belongs to the fine topology, estimates (6) and (7) complement each other: the coefficient ρ(ν, µ) cannot be increased in (6) and cannot be decreased in (7).
Remark 3. Theorem 1 is also true for finitely additive probability distributions ν on the space X if we set ρ(ν, µ) = +∞ in such a case (see [9]).
It is worth mentioning that, until recently, the vast majority of papers on the large deviation principle dealt with random variables in a Polish space (i.e., a complete separable metric space), and only a few of them treated random variables in a topological space (see, for example, [4]) or in a measurable space in which the σ-field is generated by open balls and does not necessarily contain all Borel sets (see [7], Section 7). In addition, only countably additive probability distributions ν and µ were considered as arguments of the Kullback action. Theorem 1 for an arbitrary measurable space X, finitely additive measures ν, and nonnormalized measures µ was first proven in [9], and its generalization to finitely additive measures µ was proven in [10].

The Kullback Action for σ-Finite Measures
The shortcoming of Theorem 1 is that it does not cover the case of an infinite measure µ. In particular, it does not explain the sense of entropy (1) of an absolutely continuous probability distribution on the real axis. Unfortunately, the direct extension of Theorem 1 to infinite measures µ is wrong. The next example demonstrates this.
Example ([22]). Let X be a countable set supplied with the discrete σ-field and let µ be the counting measure on X (such that µ(x) = 1 for every x ∈ X). Consider the topology on the space of probability distributions M_1(X) generated by the neighborhoods

O(ν) = { ν′ ∈ M_1(X) : ∑_{x∈X} |ν′(x) − ν(x)| < ε } (8)

(in other words, the topology of L_1(X, µ)). Then, for any neighborhood (8) and any number C > 0, there exists a finite subset X_0 ⊂ X such that estimate (9) holds for all n large enough. The topology on M_1(X) under consideration contains the weak topology generated by functions from B(X). It follows that, for C > −ρ(ν, µ), estimate (9) contradicts (6), and hence the latter cannot take place.
It turns out that, to extend Theorem 1 to σ-finite measures µ, it is enough to replace the weak neighborhood in (6) with a fine one. This is the main result of the paper. Its exact formulation is given in Theorem 3 below.
We also propose one more approach to extending Theorem 1, which uses only the weak topology. Its idea is to replace the space X in estimates (6) and (7) by a large enough subset Y ⊂ X of finite measure µ(Y), and to replace the probability measure ν ∈ M_1(X) by its conditional distribution on Y. The corresponding results are stated in Theorems 4 and 5 below.
In order to describe the asymptotics of the empirical measures distribution correctly in the case of a σ-finite measure µ, the definition of the Kullback action should be modified. To this end, we have to introduce the notion of a spectral potential. Denote by B(X) the set of all bounded above measurable functions on a measurable space (X, A). The spectral potential is the nonlinear functional

λ(ϕ, µ) = ln ∫_X e^{ϕ(x)} dµ(x).

If the integral in this formula diverges, then we set λ(ϕ, µ) = +∞. Thus, λ(ϕ, µ) can take values in the interval (−∞, +∞].
For brevity, let us introduce the notation

ν[f] = ∫_X f dν,

where ν ∈ M_1(X) and f ∈ B(X). If the integral diverges, then we put ν[f] = −∞. Now, we define the Kullback action ρ(ν, µ) as a function of the pair of arguments ν ∈ M_1(X) and µ ∈ M_σ(X) as follows:

ρ(ν, µ) = sup { ν[ϕ] − λ(ϕ, µ) : ϕ ∈ B(X) }. (10)

The next theorem shows, in particular, that in the case of a finite measure µ this definition coincides with the previous one (4).
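On a finite set, the Legendre-dual definition (10) can be checked directly: for ν = ϕµ, the supremum of ν[ϕ] − λ(ϕ, µ) is attained at ϕ* = ln(dν/dµ) and equals the integral ∫ ϕ ln ϕ dµ of formula (4). A sketch with illustrative weights (not taken from the paper):

```python
import math, random

random.seed(1)
nu = [0.5, 0.3, 0.2]    # probability measure ν
mu = [2.0, 1.0, 0.5]    # finite (non-normalized) measure µ

def lam(phi):           # spectral potential λ(φ, µ) = ln Σ e^{φ_i} µ_i
    return math.log(sum(math.exp(f) * m for f, m in zip(phi, mu)))

def objective(phi):     # ν[φ] - λ(φ, µ), the expression inside the sup in (10)
    return sum(n * f for n, f in zip(nu, phi)) - lam(phi)

# formula (4): ρ(ν, µ) = Σ φ_i ln φ_i µ_i with φ_i = ν_i/µ_i
rho = sum((n / m) * math.log(n / m) * m for n, m in zip(nu, mu))
phi_star = [math.log(n / m) for n, m in zip(nu, mu)]

assert abs(objective(phi_star) - rho) < 1e-9   # the sup is attained at φ*
for _ in range(1000):                          # no random φ beats φ*
    phi = [random.uniform(-5, 5) for _ in range(3)]
    assert objective(phi) <= rho + 1e-9
print(rho)
```

The inequality ν[ϕ] − λ(ϕ, µ) ≤ ∫ ϕ ln ϕ dµ for every ϕ is the finite-dimensional case of the duality behind (10); the optimizer ϕ* makes λ(ϕ*, µ) = ln ν(X) = 0.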

Theorem 2. If a probability distribution ν ∈ M_1(X) is absolutely continuous with respect to a measure µ ∈ M_σ(X) and has the density ϕ = dν/dµ, then definition (10) of the Kullback action agrees with formula (4). In particular, for a finite measure µ, the alternative (11) takes place.
The following theorem is our main result for the case of countably additive distributions.
Theorem 3 (the local large deviation principle for infinite measures). For any measures ν ∈ M_1(X), µ ∈ M_σ(X), and number ε > 0, there exists a fine neighborhood O(ν) ⊂ M_1(X) such that

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≤ e^{−n(ρ(ν,µ)−ε)}. (13)

On the other hand, for any measures ν ∈ M_1(X), µ ∈ M_σ(X), number ε > 0, and any fine neighborhood O(ν) ⊂ M_1(X), the following estimate holds for all large enough n:

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≥ e^{−n(ρ(ν,µ)+ε)}. (14)

Let us also formulate the local large deviation principle in terms of weak neighborhoods.
For any probability measure ν ∈ M_1(X) and any measurable subset Y ⊂ X with ν(Y) > 0, define a conditional measure ν_Y ∈ M_1(X) according to the formula

ν_Y(A) = ν(A ∩ Y)/ν(Y), A ∈ A.

It is easily seen that the measure ν can be approximated by the conditional measures ν_Y, where µ(Y) < +∞, in the fine topology (and all the more in the weak one). Therefore, it makes sense to replace fine neighborhoods of ν in Theorem 3 by weak neighborhoods of close conditional measures ν_Y.
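A discrete sketch of this approximation (the weights are illustrative, not from the paper): as Y exhausts X, the conditional measures ν_Y converge to ν, here measured in total variation.

```python
def conditional(nu, Y):
    # ν_Y(A) = ν(A ∩ Y) / ν(Y), a probability measure supported on Y
    z = sum(nu[i] for i in Y)
    return {i: nu[i] / z for i in Y}

nu = {1: 0.5, 2: 0.3, 3: 0.15, 4: 0.05}     # probability measure ν
tvs = []
for Y in ({1, 2}, {1, 2, 3}, {1, 2, 3, 4}):
    nu_Y = conditional(nu, Y)
    # total variation distance between ν_Y and ν
    tv = sum(abs(nu_Y.get(i, 0.0) - nu[i]) for i in nu)
    tvs.append(tv)
    print(sorted(Y), tv)
```

Each enlargement of Y shrinks the distance, and ν_Y = ν once Y covers the whole space.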
We will say that the Kullback action ρ(ν, µ) is well-defined if ν has a density ϕ = dν/dµ and, in addition, at least one of the two integrals

∫_X (ϕ ln ϕ)_+ dµ, ∫_X (ϕ ln ϕ)_− dµ (15)

is finite. In all other cases (i.e., when both integrals (15) are infinite or the measure ν has no density with respect to µ), we will say that the Kullback action is ill-defined.
Theorem 4. Suppose that, for some measures ν ∈ M_1(X) and µ ∈ M_σ(X), the Kullback action ρ(ν, µ) is well-defined. Then, for any number ε > 0, there exists a set X_ε ∈ A with µ(X_ε) < +∞ such that, for any Y ∈ A containing X_ε and having a finite measure µ(Y):

(a) there exists a weak neighborhood O(ν_Y) ⊂ M_1(Y) satisfying estimate (16);

(b) for any fine neighborhood O(ν_Y) ⊂ M_1(Y), estimate (17) holds for all large enough n.

In addition, for any ε > 0 and any fine neighborhood O(ν) ⊂ M_1(X), there exists a set Y ∈ A with µ(Y) < +∞ such that estimate (18) holds for all large enough n.

Theorem 5. Suppose that, for some measures ν ∈ M_1(X) and µ ∈ M_σ(X), the Kullback action ρ(ν, µ) is ill-defined. Then, there exists a set X_0 ∈ A with µ(X_0) < +∞ such that, for any Y ∈ A containing X_0 and having a finite measure µ(Y), and any ε > 0, there exists a weak neighborhood O(ν_Y) ⊂ M_1(Y) satisfying estimate (19).

It is worth mentioning that, under the conditions of Theorem 5, the equality ρ(ν, µ) = −∞ may take place. In such a case, estimates (19) and (14) have opposite senses. Nevertheless, there is no contradiction here because the sets in these estimates are different.

Proof of Theorem 2
Recall that, under the conditions of Theorem 2, the measure ν ∈ M_1(X) is absolutely continuous with respect to µ ∈ M_σ(X) and has a density ϕ = dν/dµ. First of all, we will prove that, for any function ψ ∈ B(X),

ν[ψ] − λ(ψ, µ) ≤ ∫_X ϕ ln ϕ dµ. (20)

If at least one of the expressions ν[ψ] or λ(ψ, µ) takes the infinite value allowed to it, then the left-hand side of (20) turns into −∞, and so the inequality is true. Thus, it is enough to consider the case of finite ν[ψ] and λ(ψ, µ).
For any ε > 0, define the set A_ε and the conditional distribution ν_ε of ν on it. Evidently, ν_ε has the density

dν_ε/dµ = χ_ε ϕ/ν(A_ε),

where χ_ε is the characteristic function of A_ε.
To finish the proof of Theorem 2, it is enough to verify the equality

sup_{ψ ∈ B(X)} { ν[ψ] − λ(ψ, µ) } = ∫_X ϕ ln ϕ dµ. (28)

By virtue of (20), the left-hand side of (28) does not exceed the right-hand one. If the right-hand side of (28) equals −∞, then the equality is trivial. Consider the case when the right-hand side of (28) is greater than −∞. By the σ-finiteness of µ, there exists a function η ∈ B(X) such that the integral ∫_X e^η dµ is finite. Considering a suitable family of functions built from ln ϕ and η, one verifies that the supremum on the left-hand side of (28) coincides with the right-hand side.
Thus, inequality (13) is proven in all cases.

Proof of Theorems 4 and 5
Proof of Theorem 4. Under the conditions of Theorem 4, the Kullback action is well-defined. Using the definition of well-definedness and Theorem 2, we can choose a subset X_ε ∈ A with µ(X_ε) < +∞ such that, for any set Y ∈ A that contains X_ε and has a finite measure µ(Y), one of the conditions (a)-(c) holds. Now, estimates (16) and (17) follow from the corresponding estimates of Theorem 1. In addition, estimate (18) comes from estimate (7) of Theorem 1. To see this, it is enough to choose a set Y ∈ A with µ(Y) < +∞ such that, along with one of the conditions (a)-(c), it satisfies the condition ν_Y ∈ O(ν).
Proof of Theorem 5. For any measurable set Y ⊃ X_0 with µ(Y) < +∞, the corresponding conditional distribution ν_Y has the density χ_Y ϕ/ν(Y), and hence estimate (19) follows from estimate (6) of Theorem 1.

Now consider the case when the measure ν is not absolutely continuous with respect to µ. Then, there exists X_0 ∈ A with µ(X_0) = 0 and ν(X_0) > 0. Suppose that a set Y ∈ A with µ(Y) < +∞ contains X_0. Obviously, the conditional distribution ν_Y is not absolutely continuous with respect to µ, and hence ρ(ν_Y, µ) = +∞. Thus, we can apply Theorem 1 to the measures ν_Y, µ on the space Y and obtain (19).

The Case of Finitely Additive Probability Distributions ν
The necessity of considering finitely additive probability distributions ν is caused by the fact that they may happen to be accumulation points of some sequences of empirical measures. Thus, to make the description of the empirical measures distribution complete, we should obtain estimates similar to (13) and (14) for finitely additive probability distributions ν as well.
In fact, this can be done, and the principal result is that Theorems 3 and 5 still hold true for finitely additive probability distributions ν, provided the Kullback action ρ(ν, µ) is defined by (10). In addition, in that case, ρ(ν, µ) may take the values +∞ or −∞ only, and both are possible.
The transition from countably additive distributions to merely finitely additive ones is not trivial. First of all, we should adapt some previous definitions to the new setting.
Denote by N_1(X) the set of all finitely additive probability measures on (X, A). Each ν ∈ N_1(X) is naturally identified with a positive normalized linear functional on the space of bounded measurable functions (i.e., a functional that takes nonnegative values on nonnegative functions and the unit value on the unit function). Using this identification, we denote the integral of a bounded measurable function f with respect to ν ∈ N_1(X) by ν[f]. In addition, for bounded above functions f ∈ B(X), let us define

ν[f] = lim_{C→+∞} ν[max(f, −C)].

Thus, for f ∈ B(X), the value ν[f] belongs to the interval [−∞, +∞). Similarly, for a measurable function f that is bounded from below, put

ν[f] = lim_{C→+∞} ν[min(f, C)].

Now, we define the Kullback action ρ(ν, µ) for the case when ν ∈ N_1(X) and µ ∈ M_σ(X):

ρ(ν, µ) = sup { ν[ϕ] − λ(ϕ, µ) : ϕ ∈ B(X) }. (36)

Obviously, this definition just duplicates (10).
Let us introduce a fine topology on N_1(X) by means of neighborhoods of the form

O(ν) = { ν′ ∈ N_1(X) : |ν′[f_i] − ν[f_i]| < ε, i = 1, ..., k }, (37)

where ε > 0 and the functions f_1, ..., f_k ∈ B(X) are such that all the ν[f_i] are finite. Clearly, this definition is analogous to (5). Note that the bounded above functions in (37) may be replaced by bounded below or even nonnegative ones. This does not change the collection of neighborhoods (37).

Now, we reformulate Theorems 3 and 5 for the case of finitely additive distributions ν (note that Theorem 4 cannot be reformulated since in it ρ(ν, µ) is well-defined, and hence ν is countably additive).

Theorem 7. For any measures ν ∈ N_1(X), µ ∈ M_σ(X), and number ε > 0, there exists a fine neighborhood O(ν) ⊂ N_1(X) such that

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≤ e^{−n(ρ(ν,µ)−ε)}. (38)

On the other hand, for any measures ν ∈ N_1(X), µ ∈ M_σ(X), number ε > 0, and any fine neighborhood O(ν) ⊂ N_1(X), the following estimate holds for all large enough n:

µ^n{ x ∈ X^n : δ_{x,n} ∈ O(ν) } ≥ e^{−n(ρ(ν,µ)+ε)}. (39)

If ρ(ν, µ) = +∞, then the difference ρ(ν, µ) − ε in (38) should be replaced by 1/ε, and if ρ(ν, µ) = −∞, then the sum ρ(ν, µ) + ε in (39) should be replaced by −1/ε.
A measure ν ∈ N_1(X) will be called proper with respect to a measure µ ∈ M_σ(X) if, for any ε > 0, there exists a set A ∈ A such that µ(A) < +∞ and ν(A) > 1 − ε. If, on the contrary, there exists an ε > 0 such that the inequality ν(A) > 1 − ε implies µ(A) = +∞, then the measure ν will be called improper with respect to µ. Obviously, in the case of a finite µ, all measures ν ∈ N_1(X) are proper, and, in the case of a σ-finite µ, all countably additive measures ν ∈ N_1(X) are proper.

Theorem 8. Suppose that, for some measures ν ∈ N_1(X) and µ ∈ M_σ(X), the Kullback action ρ(ν, µ) is ill-defined, and the measure ν is proper with respect to µ. Then, there exists a set X_0 ∈ A with µ(X_0) < +∞ such that, for any Y ∈ A containing X_0 and having a finite measure µ(Y), and any ε > 0, there exists a weak neighborhood O(ν_Y) ⊂ N_1(Y) satisfying estimate (40).
Proof. Construct a sequence of nested measurable sets A_1 ⊂ A_2 ⊂ · · · such that all of them have finite measures µ(A_n), satisfy the condition ν(A_n) > 1 − 1/n, and their union is the whole X.
The restriction of µ to each A_n is finite and continuous (recall that a finite measure is continuous if, for every nested sequence of measurable sets decreasing to the empty set, their measures tend to zero). The assumption of Lemma 9 implies that the restriction of ν to A_n is continuous as well. It is known that the continuity of a finite measure is equivalent to its countable additivity. Then, the restriction of ν to each A_n is countably additive. Since ν is proper, we have ν(B) = lim_{n→∞} ν(B ∩ A_n) for any measurable B. It follows that ν is countably additive on the whole X (this may be proven in the same way as the countable additivity of a σ-finite measure).
Proof of Theorem 6. It follows from (36) that either ρ(ν, µ) = +∞ or ρ(ν, µ) is given by formula (41). In the first case, the assertion of Theorem 6 is valid. Therefore, it is enough to consider the case when the Kullback action is defined by formula (41). By the assumption of Theorem 6, the measure ν ∈ N_1(X) has no density with respect to µ. Then, Lemma 9 guarantees the validity of at least one of the following two conditions: (a) there exists a positive ε such that, for any δ > 0, one can choose a measurable set A_δ with µ(A_δ) < δ and ν(A_δ) ≥ ε; (b) the measure ν is improper with respect to µ.

Proof of Theorems 7 and 8
The proof of the first part of Theorem 7 is exactly the same as that of the first part of Theorem 3, so we omit it. If ν ∈ M_1(X), then the second part of Theorem 7 follows from the second part of Theorem 3. Thus, it remains to consider the case ν ∈ N_1(X) \ M_1(X).
Let B be some σ-field of subsets of X. We will call it discrete if it is generated by a countable or finite partition of X.

Lemma 10. For any measure ν ∈ N_1(X) and any fine neighborhood O(ν) of it, there exists a discrete σ-subfield B ⊂ A such that (a) the restriction of ν to B is countably additive; (b) there exists a fine neighborhood O′(ν) ⊂ O(ν) generated by B-measurable functions; (c) if the measure ν is proper with respect to µ ∈ M_σ(X), then the σ-field B mentioned above can be chosen in such a way that each of its atoms has a finite measure µ.
Proof. A base for the fine topology on N_1(X) is formed by the neighborhoods

O(ν) = { ν′ ∈ N_1(X) : |ν′[f_i] − ν[f_i]| < ε, i = 1, ..., m },

where the f_i are nonnegative measurable functions on (X, A) with ν[f_i] < +∞.
To each integer vector k = (k_1, ..., k_m) ∈ Z^m, assign the set

X_k = { x ∈ X : g_i(x) = εk_i, i = 1, ..., m }.

These sets form a countable measurable partition of X and generate the desired discrete σ-subfield B.
The functions g_i are B-measurable, and a passage to the limit C → +∞ shows that the restriction of ν to the σ-field B is countably additive.

Assume that the measure ν is proper with respect to µ. In this case, we can construct a countable partition of X into subsets Y_l ∈ A such that µ(Y_l) < +∞ and ν(Y_1 ∪ · · · ∪ Y_l) → 1 as l → +∞. The latter condition implies the equality ν(X_k) = ∑_l ν(X_k ∩ Y_l). Therefore, the restriction of ν to the σ-field generated by the atoms X_k ∩ Y_l is countably additive. This σ-field may be treated as B. By construction, its atoms have finite measure µ.
Suppose the measure ν is proper with respect to µ and ρ(ν, µ) = −∞. We can apply Lemma 10 to ν and construct the corresponding discrete σ-subfield B ⊂ A and a fine neighborhood O′(ν) ⊂ O(ν). Denote by ν̄ and µ̄ the restrictions of ν and µ to B. By Lemma 10, they are countably additive, each atom of B is of finite measure µ, and the union of the atoms gives the whole X. From definition (36), it follows that if µ̄(A) = 0 for some A ∈ B, then ν̄(A) = 0 as well (since otherwise ρ(ν, µ) = +∞). Thus, the distribution ν̄ on B is absolutely continuous with respect to µ̄.

Recall that, by definition, ρ(ν, µ) = sup{ ν[ϕ] − λ(ϕ, µ) : ϕ ∈ B(X) }, where B(X) is the set of all bounded above A-measurable functions. The same is true for all bounded above B-measurable functions, and hence ρ(ν̄, µ̄) = −∞ as well. Since ν̄ is absolutely continuous with respect to µ̄ (and countably additive), the second part of Theorem 7 is already proven for ν̄ and µ̄. It implies the estimate

µ̄^n{ x ∈ X^n : δ_{x,n} ∈ O′(ν) } ≥ e^{n/ε}

for all large enough n. Due to the inclusion O′(ν) ⊂ O(ν), we obtain (39).

Consider the case of improper ν. We can apply Lemma 10 and construct the corresponding discrete σ-subfield B and a fine neighborhood O′(ν) ⊂ O(ν) generated by B-measurable functions. The field B is generated by a certain denumerable partition X = X_1 ⊔ X_2 ⊔ X_3 ⊔ · · ·. Change the numeration of the sets X_i so that ν(X_1) ≥ ν(X_2) ≥ ν(X_3) ≥ · · ·. Put Y_k = X_1 ∪ X_2 ∪ · · · ∪ X_k and denote by ν_k the conditional distribution of ν on Y_k. Due to countable additivity, ν(Y_k) → 1 and ν_k ∈ O′(ν) for all large enough k. In addition, the improperness of ν implies that µ(Y_k) = +∞ for all large enough k.

Fix such a large k that ν_k ∈ O′(ν) and, at the same time, µ(Y_k) = +∞. The latter implies µ(X_i) = +∞ for at least one i ≤ k. Without loss of generality, we may assume that ν_k(X_i) > 0 for all i ≤ k. Obviously, for any large enough n, there exists a sequence y = (y_1, ..., y_n) ∈ Y_k^n such that the empirical measure δ_{y,n} is so close to ν_k that δ_{y,n} ∈ O′(ν) and each of the sets X_1, ..., X_k contains at least one of the points y_1, ..., y_n. Define positive integers i_j in such a way that y_j ∈ X_{i_j} for j = 1, ..., n. Then, µ(X_{i_j}) > 0 for all j = 1, ..., n (since otherwise ρ(ν, µ) = +∞) and µ(X_{i_j}) = +∞ for at least one j. Therefore,

µ^n{ x ∈ X^n : x_j ∈ X_{i_j}, j = 1, ..., n } = ∏_{j=1}^{n} µ(X_{i_j}) = +∞,

and thereby estimate (39) is completely proven.

Proof of Theorem 8. If ν ∈ M_1(X), then the assertion of Theorem 8 follows from Theorem 5.

Let ν ∈ N_1(X) \ M_1(X). Then, ν is not absolutely continuous with respect to µ. Since ν is proper, by Lemma 9, there exists an ε_0 > 0 such that, for any positive integer n, there exists A_n ∈ A satisfying µ(A_n) < 2^{−n} and ν(A_n) ≥ ε_0. Set X_0 = ∪_n A_n.

Suppose a set Y ∈ A with µ(Y) < +∞ contains X_0. Then, the conditional distribution ν_Y is not absolutely continuous with respect to µ. On the other hand, (36) and the conditions µ(Y) < +∞ and
Let us prove the Lemma for a neighborhood of this sort. Define the step-functions g_i = ε[f_i/ε], where [·] denotes the integer part of a number, and the neighborhood O′