Abstract
Asymptotic unbiasedness and mean square consistency are established, under mild conditions, for the estimates of the Kullback–Leibler divergence between two probability measures in , absolutely continuous with respect to (w.r.t.) the Lebesgue measure. These estimates are based on certain k-nearest neighbor statistics for a pair of independent identically distributed (i.i.d.) vector samples. The novelty of the results also lies in the treatment of mixture models. In particular, they cover mixtures of nondegenerate Gaussian measures. The mentioned asymptotic properties of related estimators for the Shannon entropy and cross-entropy are strengthened. Some applications are indicated.
Keywords:
Kullback–Leibler divergence; Shannon differential entropy; statistical estimates; k-nearest neighbor statistics; asymptotic behavior; Gaussian model; mixtures
MSC:
60F25; 62G20; 62H12
1. Introduction
The Kullback–Leibler divergence, introduced in [], is used to quantify the similarity of two probability measures. It plays an important role in various domains such as statistical inference (see, e.g., [,]), metric learning [,], machine learning [,], computer vision [,], network security [], feature selection and classification [,,], physics [], biology [], medicine [,], and finance [], among others. It is worth emphasizing that mutual information, widely used in many research directions (see, e.g., [,,,,]), is a special case of the Kullback–Leibler divergence for certain measures. Moreover, the Kullback–Leibler divergence itself belongs to the class of f-divergence measures (with ). For a comparison of various f-divergence measures see, e.g., []; their estimates are considered in [,].
Let and be two probability measures on a measurable space . The Kullback–Leibler divergence between and is defined, according to [], by way of
where stands for the Radon–Nikodym derivative. The integral in (1) can take values in . We employ the base e of logarithms since a constant factor is not essential here.
If , where , and (absolutely continuous) and have densities, and , , w.r.t. the Lebesgue measure , then (1) can be expressed as
where we write instead of to simplify notation. One formally sets , if , , and . Then is a measurable function with values in . So, the right-hand sides of (1) and (2) coincide. Formula (2) is justified by Lemma A1, see Appendix A.
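To make formula (2) concrete, the following minimal sketch (our own illustration, not part of the original text; all names are ours) approximates the divergence by Monte Carlo as the mean of log p(X) − log q(X) over a sample from P, and compares it with the known closed-form value for two one-dimensional Gaussian densities.

```python
import numpy as np

def kl_monte_carlo(sample_p, log_p, log_q):
    """Monte Carlo approximation of formula (2): E_P[ log p(X) - log q(X) ],
    given an i.i.d. sample drawn from P and the two log-densities."""
    return np.mean(log_p(sample_p) - log_q(sample_p))

# Illustration: P = N(0, 1), Q = N(1, 2^2) on the real line.
rng = np.random.default_rng(0)
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
x = rng.normal(m1, s1, size=200_000)
log_p = lambda t: -0.5 * ((t - m1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi))
log_q = lambda t: -0.5 * ((t - m2) / s2) ** 2 - np.log(s2 * np.sqrt(2 * np.pi))
print(kl_monte_carlo(x, log_p, log_q))                                  # Monte Carlo value
print(np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5)     # exact divergence
```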
Denote by the support of a (version of) probability density f. The integral in (2) is taken over and does not depend on the choice of p and q versions.
The following two functionals are closely related to the Kullback–Leibler divergence. For probability measures and on having densities and , , w.r.t. the Lebesgue measure , one can introduce, according to [], p. 35, the entropy H (also called the Shannon differential entropy) and the cross-entropy C as follows
In view of (2), whenever the right-hand side is well defined.
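Presumably the identity invoked here is the standard decomposition of the divergence into cross-entropy and entropy: in terms of the densities p and q,
\[
D(\mathbb{P}\,\|\,\mathbb{Q})=\int p(x)\log\frac{p(x)}{q(x)}\,dx
=\Big(-\!\int p\log q\,dx\Big)-\Big(-\!\int p\log p\,dx\Big)=C-H,
\]
whenever the right-hand side is not of the indeterminate form $\infty-\infty$.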
Usually one constructs statistical estimates of some characteristics of the stochastic model under consideration relying on a collection of observations. In the pioneering paper [], an estimator of the Shannon differential entropy based on nearest neighbor statistics was proposed. In a series of papers this estimate was studied and applied. Moreover, estimators of the Rényi entropy, mutual information and the Kullback–Leibler divergence have appeared (see, e.g., [,,]). However, the authors of [] indicated the occurrence of gaps in the known proofs concerning the limit behavior of such statistics. Almost all of these flaws refer to the lack of a proof of the correctness of using the (reversed) Fatou lemma (see, e.g., [], inequality after the statement (21), or [], inequality (91)) or the generalized Helly–Bray lemma (see, e.g., [], page 2171). One can find these lemmas in [], p. 233, and [], p. 187. Paper [] attracted our attention and motivated the study of the declared asymptotic properties. Furthermore, we would like to highlight the important role of the papers [,,,]. Thus, in the recent work [] new functionals were introduced to prove asymptotic unbiasedness and -consistency of the Kozachenko–Leonenko estimators of the Shannon differential entropy. We used the criterion of uniform integrability, for different families of functions, to avoid employing the Fatou lemma, since it is not clear whether one could indicate appropriate majorizing functions for those families. The present paper aims to extend our approach to the estimation of the Kullback–Leibler divergence. Instead of the nearest neighbor statistics we employ the k-nearest neighbor statistics (on order statistics see, e.g., []) and also use more general forms of the mentioned functionals.
Note in passing that there exist investigations treating important aspects of the estimation of entropy, the Kullback–Leibler divergence and mutual information. Mixed models and conditional entropy estimation are studied, e.g., in [,]. The central limit theorem (CLT) for the Kozachenko–Leonenko estimates is established in []. In [], a deep analysis of the efficiency of weighted functional estimates was performed (including the CLT). Limit theorems for point processes on manifolds are employed in [] to analyze the behavior of the Shannon and Rényi entropy estimates. Convergence rates for the (truncated) Shannon entropy estimates are obtained in [] for the one-dimensional case, see also [] for the multidimensional case. A kernel density plug-in estimator of various divergence functionals is studied in []. The principal assumptions of that paper are the following: the densities are smooth and have common bounded support S, they are strictly lower bounded on S, and, moreover, the set S is smooth with respect to the employed kernel. Ensemble estimation of various divergence functionals is studied in []. Profound results for smooth bounded densities are established in the recent work []. Mutual information estimation by local Gaussian approximation is developed in []. Note that various deep results (including the central limit theorem) were obtained for the Kullback–Leibler divergence estimates under certain conditions imposed on the derivatives of the unknown densities (see, e.g., the recent papers [,]). In a series of papers the authors demand boundedness of the densities to prove -consistency of the Kozachenko–Leonenko estimates of the Shannon differential entropy (see, e.g., []).
Our goal is to provide wide conditions for the asymptotic unbiasedness and -consistency of the specified Kullback–Leibler divergence estimates without such smoothness and boundedness hypotheses. Furthermore, we do not assume that densities have bounded supports. As a byproduct we obtain new results concerning Shannon differential entropy and cross-entropy.
We employ probabilistic and analytical techniques, namely, weak convergence of probability measures, conditional expectations, regular probability distributions, k-nearest neighbor statistics, probability inequalities, integration by parts in the Lebesgue–Stieltjes integral, analysis of integrals depending on certain parameters and taken over specified domains, criterion of the uniform integrability of various families of functions, slowly varying functions.
The paper is organized as follows. In Section 2, we introduce some notation. In Section 3 we formulate main results, i.e., Theorems 1 and 2. Their proofs are provided in Section 4 and Section 5, respectively. Section 6 contains concluding remarks and perspectives of future research. Proofs of several lemmas are given in Appendix A.
2. Notation
Let X and Y be random vectors taking values in and having distributions and , respectively, (below we will take and ). Consider random vectors and with values in such that and , . Assume also that are independent. We are interested in statistical estimation of constructed by means of observations and , . All the random variables under consideration are defined on a probability space , each measure space is assumed complete.
For a finite set , where , and a vector , renumber the points of E as in such a way that , where is the Euclidean norm in . If there are points at the same distance from v, then we number them in order of increasing index. In other words, for , is the k-nearest neighbor of v in the set E. To indicate that is constructed by means of E we write . Fix , and (for each ) put
We assume that X and Y have densities and . Then with probability one all the points in are distinct as well as points of .
Following [] (see Formula (17) there) introduce an estimate of
where is the digamma function, , , are collections of integers and, for some and all , , . Note that (3) is well-defined for , . If and , , then, for and , we write
If then
and we come to formula (5) in []. For an intuitive background of the proposed estimates one can address [] (Introduction, Parts B and C).
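Since the display of (3) did not survive extraction, we add, purely for orientation, a minimal sketch of a k-nearest neighbor divergence estimate of this type. It assumes constant neighbor orders l (within the X sample) and k (within the Y sample), whereas (3) allows them to depend on i; all function and variable names are ours, and the normalization follows the classical construction of [] (Formula (17) there).

```python
import numpy as np
from scipy.special import digamma

def knn_kl_estimate(X, Y, k=1, l=1):
    """Sketch of a k-nearest neighbor Kullback-Leibler divergence estimate
    (constant neighbor orders assumed):
        (d/n) * sum_i log( nu_k(i) / rho_l(i) ) + psi(l) - psi(k) + log( m / (n-1) ),
    where rho_l(i) is the distance from X_i to its l-th nearest neighbor among
    the other X's and nu_k(i) is the distance from X_i to its k-th nearest
    neighbor among the Y's."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, d = X.shape
    m = Y.shape[0]
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.sort(dxx, axis=1)[:, l]          # column 0 is the zero distance to the point itself
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    nu = np.sort(dxy, axis=1)[:, k - 1]       # k-th nearest neighbor within the Y sample
    return (d / n) * np.sum(np.log(nu / rho)) + digamma(l) - digamma(k) + np.log(m / (n - 1))
```

For instance, applied to reshaped one-dimensional samples drawn from the two Gaussian densities of the Monte Carlo sketch in the Introduction, knn_kl_estimate(x.reshape(-1, 1), y.reshape(-1, 1), k=4, l=4) should approach the exact divergence as the sample sizes grow; this kind of convergence is precisely what Theorems 1 and 2 below control.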
We write for , , and is the volume of the unit ball in . Similarly to (3), with the same notation and the same conditions on and , , one can define the Kozachenko–Leonenko type estimates of and , respectively, by the formulas
In [], an estimate (5) was proposed for , . If , , , and , then one has
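The displays in (7) are also missing from this extraction. As an assumption about the intended form, the sketch below implements standard Kozachenko–Leonenko type estimates of the entropy and cross-entropy, normalized so that their difference (with l neighbors in the X sample and k in the Y sample) reproduces the divergence estimate sketched above.

```python
import numpy as np
from scipy.special import digamma, gammaln

def unit_ball_volume(d):
    """Volume V_d of the unit ball in R^d."""
    return np.exp(d / 2 * np.log(np.pi) - gammaln(d / 2 + 1))

def entropy_estimate(X, k=1):
    """Kozachenko-Leonenko type estimate of the Shannon differential entropy
    (assumed form): (1/n) sum_i log( (n-1) * exp(-psi(k)) * V_d * rho_k(i)^d )."""
    X = np.asarray(X, float)
    n, d = X.shape
    dxx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.sort(dxx, axis=1)[:, k]
    return d * np.mean(np.log(rho)) + np.log(unit_ball_volume(d)) + np.log(n - 1) - digamma(k)

def cross_entropy_estimate(X, Y, k=1):
    """Analogous estimate of the cross-entropy, based on the k-th nearest
    neighbors of the X's within the Y sample."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, d = X.shape
    m = Y.shape[0]
    dxy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    nu = np.sort(dxy, axis=1)[:, k - 1]
    return d * np.mean(np.log(nu)) + np.log(unit_ball_volume(d)) + np.log(m) - digamma(k)
```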
Remark 1.
Some extra notation is necessary. As in [], given a probability density f in , we consider the following functions of , and , that is, define integral functionals (depending on parameters)
Some properties of function are demonstrated in []. By virtue of Lemma 2.1 [], for each probability density f, the function introduced above is continuous in on . Hence on account of Theorem 15.84 [] the functions and for any have to be upper semicontinuous and lower semicontinuous, respectively. Therefore, Borel measurability of these nonnegative functions ensues from Proposition 15.82 []. On the other hand, the function is evidently nonincreasing whereas is nondecreasing for each x in . Notably, changing to transforms the function into the famous Hardy–Littlewood maximal function well-known in Harmonic analysis.
Set and , . Introduce a function , . For , , set Evidently, this function (for each ) is defined if . For , consider the continuous nondecreasing function , given by formula
In other words we employ the function having the form where a function , taken as N iterations of , is slowly varying for large t.
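The displayed definitions (including (10)) did not survive extraction. Purely to illustrate the kind of slowly varying factor involved, the following hypothetical helper (our own, not the paper's exact function) computes an N-fold iterated logarithm, clamped from below so that the iteration is defined for all positive arguments; the actual function in (10) is a specific continuous nondecreasing modification of this idea.

```python
import math

def iterated_log(t, N, floor=math.e):
    """Hypothetical illustration only: N-fold iteration of the logarithm,
    log(log(...log(t)...)). The max(.) clamp keeps every intermediate value
    at least `floor`, so the result is defined and nondecreasing in t."""
    for _ in range(N):
        t = math.log(max(t, floor))
    return t
```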
For probability densities in , and positive constants , introduce the functionals taking values in
Set .
3. Main Results
Theorem 1.
Let, for some positive and , the functionals , , be finite if and . Then and
Consider three kinds of conditions (labeled A, B, C, possibly with indices, and involving parameters indicated in parentheses) on probability densities.
For probability densities in and some positive
As usual, whenever (or ) for and , where Q is a -finite measure on . Condition (15) with is used, e.g., in [,,].
A version of f is upper bounded by a positive number :
) A version of f is lower bounded by a positive number :
Corollary 1.
Let, for some , condition be satisfied when and . Then the statements of Theorem 1 are true, provided that and are both valid for and . Moreover, if the latter assumption involving and holds then conditions of Theorem 1 are satisfied whenever p and q have bounded supports.
Next we formulate conditions to guarantee -consistency of estimates (4).
Theorem 2.
Let the requirement in conditions of Theorem 1 be replaced by , given and . Then and, for any fixed , the estimates are -consistent, i.e.,
Corollary 2.
For some , let condition be satisfied if and . Assume that and are both valid for and . Then the statements of Theorem 2 are true. Moreover, if the latter assumption involving and holds then conditions of Theorem 2 are satisfied whenever p and q have bounded supports.
Now we dwell on a modification of the condition introduced in [] that allows us to work with densities that need not have bounded supports.
There exist a version of density f and such that, for some ,
Remark 3.
If, for some positive ε, R and c, condition ) is true and
then is finite. Hence we could apply, for and in Theorems 1 and 2, condition and presume, for some , validity of (17) and finiteness of instead of the corresponding assumptions and . An illustrative example to this point is provided with a density having unbounded support.
Corollary 3.
The latter formula can be found, e.g., in [], p. 147, example 6.3. The proof of Corollary 3 is discussed in Appendix A.
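Presumably the formula referred to here is the classical closed-form expression for the divergence between two nondegenerate Gaussian laws $\mathcal{N}(\mu_1,\Sigma_1)$ and $\mathcal{N}(\mu_2,\Sigma_2)$ in $\mathbb{R}^d$:
\[
D\big(\mathcal{N}(\mu_1,\Sigma_1)\,\|\,\mathcal{N}(\mu_2,\Sigma_2)\big)
=\frac12\Big(\operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big)
+(\mu_2-\mu_1)^{\top}\Sigma_2^{-1}(\mu_2-\mu_1)-d
+\log\frac{\det\Sigma_2}{\det\Sigma_1}\Big).
\]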
Similarly to condition let us consider the following one.
There exist a version of density f and such that, for some ,
Remark 4.
If, for some positive ε, R and c, condition is true and
then obviously . Thus, in Theorems 1 and 2 one can employ, for and , condition and exploit, for some , the validity of (18) and finiteness of instead of the assumptions and , respectively.
Remark 5.
D. Evans applied a “positive density condition” in Definition 2.1 of [], assuming the existence of constants and such that for all and . Consequently , . Then for all . Analogously, , , , and for all . The above-mentioned inequalities from Definition 2.1 of [] are valid, provided that the density f is smooth and its support in is a convex closed body, see the proof in []. Therefore, if p and q are smooth and their supports are compact convex bodies in , relations (14) and (16) are valid.
Moreover, as a byproduct of Theorems 1 and 2, we obtain the new results indicating both the asymptotic unbiasedness and -consistency of the estimates (7) for the Shannon differential entropy and cross-entropy.
Theorem 3.
Let and for some positive ε and R. Then is finite and the following statements hold for any fixed .
- (1)
- If, for some , , then
- (2)
- If, for some , , then
In particular, one can employ with instead of , and with instead of , where .
The first claim of this Theorem follows from the proof of Theorem 1. In a similar way one can infer the second statement from the proof of Theorem 2. If we take in conditions of Theorem 3 then we get the statement concerning the entropy since .
Now we consider the case when p and q are mixtures of some probability densities. Namely,
where , are probability densities (w.r.t. the Lebesgue measure ), positive weights , are such that , , , , . Some applications of models described by mixtures are treated, e.g., in [].
Corollary 4.
The proof of this Corollary is given in Appendix A. Thus, due to Corollaries 3 and 4, one can guarantee the validity of (14) and (16) for any mixtures of nondegenerate Gaussian densities. Note also that in a similar way we can claim the asymptotic unbiasedness and -consistency of estimates (7) for mixtures satisfying the conditions of Corollary 4.
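As a purely illustrative companion to Corollary 4 (all names and parameter values are ours), the sketch below draws i.i.d. samples from two Gaussian mixtures in the plane and computes a Monte Carlo reference value of (2); such samples can then be fed to the k-nearest neighbor estimate sketched in Section 2, whose large-sample behavior is what Theorems 1 and 2 describe.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mixture(weights, means, covs, size):
    """Draw `size` i.i.d. points from a Gaussian mixture with the given weights,
    mean vectors and covariance matrices."""
    comps = rng.choice(len(weights), p=weights, size=size)
    return np.stack([rng.multivariate_normal(means[c], covs[c]) for c in comps])

def mixture_log_density(x, weights, means, covs):
    """Log-density of the mixture at the rows of x (used only to form the
    Monte Carlo reference value of formula (2))."""
    dens = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = x - mu
        quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        norm = (2 * np.pi) ** (x.shape[1] / 2) * np.sqrt(np.linalg.det(cov))
        dens = dens + w * np.exp(-0.5 * quad) / norm
    return np.log(dens)

# Two mixtures of nondegenerate Gaussian densities in R^2 (hypothetical parameters).
w_p, mu_p, cov_p = [0.5, 0.5], [np.zeros(2), np.array([3.0, 0.0])], [np.eye(2)] * 2
w_q, mu_q, cov_q = [0.3, 0.7], [np.array([1.0, 1.0]), np.array([-2.0, 0.0])], [2.0 * np.eye(2)] * 2

X = sample_mixture(w_p, mu_p, cov_p, 5000)   # sample from P
Y = sample_mixture(w_q, mu_q, cov_q, 5000)   # sample from Q (available for the k-NN estimate)
# Monte Carlo reference for D(P||Q) = E_P[ log p(X) - log q(X) ]
print(np.mean(mixture_log_density(X, w_p, mu_p, cov_p) - mixture_log_density(X, w_q, mu_q, cov_q)))
```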
Remark 6.
Let us compare our new results with those established in []. Developing the approach of [] to the analysis of the asymptotic behavior of the Kozachenko–Leonenko estimates of the Shannon differential entropy, we encounter new complications due to dealing with k-nearest neighbor statistics for (not only for ). Accordingly, in the framework of the Kullback–Leibler divergence estimation, we propose a new way to bound the function playing the key role in the proofs (see Formula (28)). Furthermore, instead of the function (for ), used in [] for the Shannon entropy estimates, we employ a regularly varying function where (for t large enough) is the N-fold iteration of the logarithmic function and can be large. Hence, in the definition of the integral functional by formula (11), one can take a function having, for , a growth rate close to that of the function z. Moreover, this entails a generalization of the results of []. Now we invoke the convexity of (see Lemma 6) to provide more general conditions for the asymptotic unbiasedness and -consistency of the Shannon differential entropy estimates than in [].
4. Proof of Theorem 1
Note that the general structure of this proof, as well as that of Theorem 2, is similar to the one originally proposed in [] and later used in various papers (see, e.g., [,,]). Nevertheless, in order to prove both theorems correctly, we employ new ideas and conditions (such as the uniform integrability of a family of random variables) in our reasoning.
Remark 7.
In the proof, for certain random variables (depending on some parameters), we will demonstrate that , as (and that all these expectations are finite). To this end, for a fixed -valued random vector τ and each , where A is a specified subset of , we will prove that
It turns out that and , where the auxiliary random variables and can be constructed explicitly for each . Moreover, it is possible to show that, for each , one has , . Thus, the Fatou lemma is not used to prove (20): it is not evident whether there exists a random variable majorizing those under consideration. Instead we verify, for each , the uniform integrability (w.r.t. measure ) of a family . Here we employ the necessary and sufficient conditions of uniform integrability provided by the de la Vallée–Poussin theorem (see, e.g., Theorem 1.3.4 in []). After that, to prove the desired relation , , we face a new task. Namely, we check the uniform integrability of a family , where , w.r.t. the measure , i.e., the law of τ, and does not depend on x. Then we can prove that
Further we will explain a number of nontrivial details concerning the proofs of uniform integrability of various families, the choice of the mentioned random variables (vectors), the set A, and .
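For the reader's convenience, recall a standard formulation of the de la Vallée–Poussin criterion used throughout: a family $\{\xi_\alpha\}_{\alpha\in A}$ of random variables is uniformly integrable if and only if there exists a nondecreasing (convex) function $G:[0,\infty)\to[0,\infty)$ with $G(t)/t\to\infty$, as $t\to\infty$, such that $\sup_{\alpha\in A}\mathsf{E}\,G(|\xi_\alpha|)<\infty$.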
The first auxiliary result explains why without loss of generality (w.l.g.) we can consider the same parameters for different functionals in conditions of Theorems 1 and 2.
Lemma 1.
Let p and q be any probability densities in . Then the following statements are valid.
- (1)
- If for some and then for any and each .
- (2)
- If for some and then for any and each .
- (3)
- If for some and then for any and each .
In particular one can take and the statements of Lemma 1 still remain valid. The proof of Lemma 1 is given in Appendix A.
Remark 8.
According to Remark 2.4 of [] if, for some positive , the integrals , , , are finite then
Therefore (and thus in view of Lemma A1).
For such that , for fixed and , where , and , set , . Then we can rewrite the estimate as follows:
It is sufficient to prove the following two assertions.
Statement 1.
For each fixed l, all m large enough and any , is finite.
Moreover,
Statement 2.
For each fixed k, all n large enough and any , is finite.
Moreover,
Recall that, as explained in [], for a nonnegative random variable V (thus ) and any random -valued vector, one has
This signifies that both sides of (25) coincide, being finite or infinite simultaneously. Let be a regular conditional distribution function of a nonnegative random variable given X where and . Let h be a measurable function such that . It was also explained in [] that, for -almost all , it follows (without assuming )
This means that both sides of (26) are finite or infinite simultaneously and coincide.
By virtue of (25) and (26) one can establish that , for all m large enough, fixed l and for all i, and that (23) holds. To perform this take , , , (we use in the proof of Theorem 2) and . If and then (26) is true as well. To avoid increasing the volume of this paper we will only examine the evaluation of as all the steps of the proof will be the same when treating .
The proof of Statement 1 is partitioned into 4 steps. The first three demonstrate that there is a measurable , depending on p and q versions, such that and, for any , , the following relation holds:
The last Step 4 justifies the desired result (23). Finally Step 5 validates Statement 2.
Step 1. Here we establish the convergence in distribution of the auxiliary random variables. Fix any and . To simplify notation we do not indicate the dependence of functions on d. For and , we identify the asymptotic behavior (as ) of the function
where
We take into account in (28) that random vectors are independent and condition that have the same law as Y. We also note that the event is a union of pairwise disjoint events , . Here means that exactly s observations among belong to the ball and the others are outside this ball (the probability that Y belongs to the sphere equals 0 since Y has a density w.r.t. the Lebesgue measure ). Formulas (28) and (29) show that is the regular conditional distribution function of given . Moreover, (28) means that , are identically distributed, and we may omit the dependence on i. So, one can write instead of .
According to the Lebesgue differentiation theorem (see, e.g., [], p. 654) if , for -almost all , one has
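In the form used here (our paraphrase of (30)): if $q\in L^1(\mathbb{R}^d,\lambda)$, then, for $\lambda$-almost every $x\in\mathbb{R}^d$,
\[
\frac{1}{\lambda\big(B(x,r)\big)}\int_{B(x,r)} q(y)\,\lambda(dy)\;\to\; q(x),\qquad r\to 0+,
\]
where $B(x,r)$ denotes the ball of radius $r$ centered at $x$.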
Let denote the set of Lebesgue points of a function q, namely the points in satisfying (30). Evidently it depends on the choice of version within the class of functions in equivalent to q, and, for an arbitrary version of q, we have .
Clearly, for each , as , and . Therefore by virtue of (30), for any fixed and ,
where . Hence, for (thus ), due to (28)
Relation (31) means that
where has the Gamma distribution with parameters and .
For any , one can assume w.l.g. that the random variables and are defined on a probability space . Indeed, by the Lomnicki–Ulam theorem (see, e.g., [], p. 93) the independent copies of and exist on a certain probability space. The convergence in distribution of random variables survives under continuous mapping. Thus, for any , we see that
We have employed that a.s. for each and Y has a density, so it follows that . More precisely, we take strictly positive versions of and for each .
Step 2. Now we show that, instead of the validity of (27), one can verify the following assertion. For μ-almost every
Note that if , where and , then , where is a digamma function. Set for (then ) and . Hence . By virtue of (26), for each ,
Hence, for , the relation holds if and only if (33) is true.
According to Theorem 3.5 [] we would have established (33) if relation (32) could be supplemented, for -almost all , by the condition of uniform integrability of a family . Note that, for each , a function introduced by (10) is nondecreasing on and , as . By the de la Vallée–Poussin theorem (see, e.g., Theorem 1.3.4 []), to ensure, for -almost every , the uniform integrability of , it suffices to prove the following statement. For the indicated x, a positive and , one has
where appears in conditions of Theorem 1. Moreover, it is possible to find that does not depend on as we will show further.
Step 3. This step is devoted to proving the validity of (34). It is convenient to divide this step into parts (3a), (3b), etc. For any , set
where the product over empty set (when ) is equal to 1.
The proof of the following result is given in Appendix A.
Lemma 2.
Let , be a distribution function such that . Then, for each , one has
Fix N appearing in conditions of Theorem 1. Observe that, for , one has . Therefore, according to Lemma 2, for and , we get where
For the sake of convenience we write and without indicating their dependence on and d (these parameters are fixed).
Part (3a). We provide bounds for. Take appearing in the conditions of Theorem 1 and any . Introduce , where, for , . Then if . Note also that we can consider only everywhere below, because the sample size is not less than the number of neighbors l (see, e.g., (28)). Thus, for , , and ,
and we arrive at the inequality
If and then, for all , invoking the Bernoulli inequality, one has
Recall that we assume for some , . By virtue of Lemma 1 one can take . So, due to (36) and since for all , and , we get
Therefore, for any and , one can write
where , . We took into account that whenever .
Part (3b). We give bounds for. Since if , we can write, for ,
Evidently,
where and .
By Markov’s inequality for any and . One has
Consequently, for each ,
To simplify bounds we take and set , (recall that l is fixed). Thus, and . Therefore,
where we have used simple inequality , .
For appearing in the conditions of the Theorem and any , one can choose such that if then . Due to (29) and (41), for and , one has
by the definition of (for ) in (9). Now we use the following lemma (Lemma 3.2 of []).
Lemma 3.
For a version of a density q and each , one has where and is defined according to (9).
It is easily seen that, for any and each , one has . Thus, for , , and , we deduce from conditions of the Theorem (in view of Lemma 1 one can suppose that ) that
We also took into account that for and applied relation (42). Thus, for all and any ,
where .
Then, for all and any ,
where , .
Part (3d). To indicate bounds for we employ several auxiliary results.
Lemma 4.
For each and any , there are such that, for arbitrary ,
The proof is given in Appendix A.
On the one hand, by (29), for any , we get
On the other hand, by (28), one has . Consequently, for any , and all ,
Moreover, . So, . Thus, due to Lemmas 2 and 4 (for )
since for , .
Now we will estimate in a way different from (40). Fix any . Note that, for all and , it holds . Then, for all , and , in view of (28) one can write
We are going to employ the following statement as well.
Lemma 5.
For each , a function , , is slowly varying at infinity.
The proof is elementary and thus is omitted.
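Recall that a positive measurable function L, defined on some ray $[t_0,\infty)$, is called slowly varying at infinity if $L(ct)/L(t)\to 1$, as $t\to\infty$, for every fixed $c>0$; iterated logarithms are the textbook example, which is what Lemma 5 asserts for the function under consideration.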
Part (3e). Now we are ready to get the bound for. Set . Then one has
Given , Lemma 5 implies that for w large enough, namely for all , where . Take and set . Let further . Then
Hence it can be seen that
Introduce
Let us note that (1) as ;
(2) as (see Lemma A1);
(3) due to Lemma 3.
Since we conclude that . Hence, one has in view of 2) and because for any . Set further . It follows from (1), (2) and (3) that , so . We are going to consider only .
Part (3f). Here we get the upper bound for. For and each , taking into account (39), (44), (45) and (51) we can claim that
For any , one can take such that if . Then by virtue of (52), for each and ,
Hence, for each , we have established uniform integrability of the family .
Step 4. Now we verify (23). It was checked, for each (thus, for -almost every x belonging to ) that , . Set . Consider and take any . We use the following property of which is shown in Appendix A.
Lemma 6.
For each , a function is convex on .
By the Jensen inequality a function is nondecreasing and convex.
Relation (53) guarantees that, for all ,
Now we know that the family , , is uniformly integrable w.r.t. measure . Thus, for ,
and we come to relation (23) establishing Statement 1.
Step 5. Here we prove Statement 2. Similar to , one can introduce, for , , and , the following function
where was defined in (29), and
Formulas (54) and (55) show that is the regular conditional distribution function of given . Moreover, for any fixed and (thus ),
Hence, , , . Set , where and
Take . Then and, for , one can verify that , for all , and therefore as . Thus, , . Set . One can see that, for all , . Hence similar to Steps 1–4 we come to relation (24).
So, (14) holds and the proof of Theorem 1 is complete.
5. Proof of Theorem 2
We will follow the general scheme described in Remark 7. However, now the scheme is more involved.
First of all note that, in view of Lemma 1, the finiteness of and implies the finiteness of and , respectively. Thus, the conditions of Theorem 2 entail the validity of the statements of Theorem 1. Consequently, under the conditions of Theorem 2, for n and m large enough, one can claim that and , as .
We will show that for all n and m large enough. Then one can write
Therefore to prove (16) we will demonstrate that , .
Due to (28) the random variables are identically distributed (and , are identically distributed as well). The variables and are the same as in (22). We will demonstrate that and belong to . Hence (22) yields
We mainly follow the notation employed in the above proof of Theorem 1, except for the possibly different choice of the sets , , positive and integers , where and . The proof of Theorem 2 below is likewise subdivided into steps. Steps 1–3 deal with the demonstration of the relation as . Step 4 validates the relation as . At Step 5 we establish that
This step is rather involved. Step 6 justifies the desired statement , .
Step 1. We study, as. For and , introduce
Set . Then since . Consider
where the first four sets appeared in the proof of Theorem 1, and R and N are indicated in the conditions of Theorem 2. It is easily seen that . The reasoning is exactly the same as in the proof of Theorem 1.
Recall that, for each , one has , where and has distribution. Convergence in law of random variables is preserved under continuous transformations. Thus, for each , we get
For any , according to (28),
Note that if , where and , then it is not difficult to verify that
Since , for , one has
where and depend only on fixed l and d.
We prove now that, for , one has
Taking into account (60) and (61) we can claim that relation (62) is equivalent to the following one: , . So, in view of (59) to prove (62) it is sufficient to show that, for each , a family is uniformly integrable for some . Then, following Theorem 1 proof, one can certify that, for all and some nonnegative ,
As usual, a product over an empty set (if ) equals 1. To show (63) we refer to the next lemma.
Lemma 7.
Let , be a distribution function such that . Fix an arbitrary . Then
The proof of this lemma is omitted, being quite similar to that of Lemma 2. By Lemma 7 and since , for , one has
To simplify notation we do not indicate the dependence of () on fixed N, l and d.
For clarity, further implementation of Step 2 is divided into several parts.
Part (2a). At first we consider. As in the proof of Theorem 1, for fixed and appearing in the conditions of Theorem 2, an inequality holds for any , and . Taking into account that if , we get, for ,
Here , for each and any .
Part (2b). Consider. Following the proof of the previous theorem, we first observe that for . So, for all ,
where we do not indicate the dependence of () on N, l and d.
For and appearing in the conditions of Theorem 2, one can show (see the proof of Theorem 1) that the inequality
holds for any , and all . Here and are the same as in the proof of Theorem 1. For all and , we come to the relations
where .
Part (2d). Now we turn to. Take . Then has the form
Due to Lemma 5 there exists such that
Pick some and set , where was introduced in (68). Consider . In view of Lemma 4 (for ), (49), (68) and since ,
, , is defined in (57). Here we have also used, for any , , , the following estimates
Part (2e). Examine. Thus, for each and , taking into account (64), (66), (67) and (69), we can claim that
Moreover, for any , one can choose such that, for , it holds . Then by (70), for each and ,
Hence we have proved the uniform integrability of the family for each . Therefore, for any (thus for -almost every ), relation (62) holds.
Step 3. Now we can return to. Set . Consider and take any . A function is nondecreasing and convex according to Lemma 6. Due to the Jensen inequality
Relation (72) guarantees that, for each and all ,
Uniform integrability of the family (w.r.t. measure ) is thus established. Hence one can claim that
It is easily seen that finiteness of integrals , implies that
Thus, and , , where according to (23). Consequently, as .
Step 4. Now we consider for , where . For , define the conditional distribution function
For , , ,
Here for all , as previously. One can write instead of , because the right-hand side of (73) does not depend on i and j.
Set and , where A is introduced in (58). Evidently, and . Consider . Obviously, for any , , as . For we take . Then and for all . Thus, if . Consequently, for ,
For any fixed and , we get, as ,
Thus, is identified as a distribution function of a vector having independent components such that , . Observe also that is a distribution function of a random vector . Consequently, we have shown that as . Hence, for any ,
Here we take strictly positive versions of random variables under consideration. Note that, for all , ,
One has because and are independent, here , .
Now we intend to verify that, for any ,
Equivalently, one can prove that for each , as .
Part (4a). We will prove the uniform integrability of a family for . The convex function is nondecreasing. Thus, following the proof of Step 2, for any , one can find (the same as in the proof of Step 2) such that, for all ,
Here we used (71). It is essential that do not depend on x and y. Hence, for any , a family is uniformly integrable. So, we establish (78) for .
Therefore, for , the family is uniformly integrable w.r.t. . Consequently,
Thus
On the other hand, taking also into account (23), we come to the relation
Step 5. Now we consider for , where .
Similarly to Step 4, for and , introduce a conditional distribution function
where , . We write , and instead of , , , respectively, (because are i.i.d. random vectors). Moreover, is the distribution function of a random vector and the regular conditional distribution function of a random vector given . One has
Introduce
and , where the first three sets appeared in the proof of Theorem 1 (Step 5). Then since . It is easily seen that .
Take and . Evidently, and . For any , , as . Hence, for , one can find such that , if . Then if . Thus, for , one has
Therefore, for each fixed , , we get, as ,
Here denotes the distribution function of a vector . The components of are independent, and . Consequently, for each fixed , we have shown that as . Therefore, for such ,
Here we take strictly positive versions of the random variables under consideration. In a way similar to (77), for , , we write
Since and are independent, write , where , .
For any fixed , consider . Now our aim is to verify that, for each ,
Equivalently, we can prove, for each , that
The observation that we consider only is crucial for the rest of the proof.
Part (5a). We are going to establish that, for , a family is uniformly integrable, where is independent of , but might depend on M. Then, due to (83), the relation (85) would be valid for such as well. As we have seen, the function is nondecreasing and convex. Hence
Let us consider, for instance, . As in Step 2, we can write
where
As usual, a sum over an empty set is equal to 0 (for ).
If , where and , then . Thus, if . In view of (87) and similarly to (38), one has
for , , , here . So, for and . Moreover, for all , in view of (87) it holds
The same reasoning as in the proof of Theorem 1 (Step 3, Part (3b)) leads to the inequalities
for all . Then similarly to (70), the relation
is valid for all and . Here do not depend on x and y. The term can be treated in the above manner. Thus, in view of (86), one has
Therefore, for any , a family is uniformly integrable. Thus, we come to (84) for .
Part (5b). Now we return to the upper bound for. Set
for all . Validity of (84) is equivalent to the following relation: for any , , as . Take any . For each , it was shown that
Note that
Therefore, for , the family is uniformly integrable w.r.t. . Hence, by virtue of (84), for each ,
where . Now we turn to the case . One has and as and are independent and have a density w.r.t. the Lebesgue measure . Then, in view of the continuity of a probability measure, it holds that , as . Taking into account that, for an integrable function h, as , we get
since (the proof is similar to establishing that ). Thus, for any , one can find such that, for all and ,
Also there exists such that, for all ,
Take . Due to (90) there is such that entails the following inequality
Moreover, in view of (24) (see Step 5 of Theorem 1 proof), it follows that
Therefore,
Step 6. Here we complete the analysis of summands in (56). Reasoning as at Steps 1–3 shows that since
for each , as . It remains to prove that , as .
For , one has for all large enough. So, it suffices to show that
For , , , , let us introduce a conditional distribution function
We used that is a collection of independent vectors. Now we combine the estimates obtained at Steps 4 and 5 of Theorem 2 proof to verify that, for and , , .
Thus, we have established that as , hence (16) holds. The proof of Theorem 2 is complete.
6. Conclusions
The aim of this paper is to provide wide conditions ensuring the asymptotic unbiasedness and mean square consistency of the statistical estimates of the Kullback–Leibler divergence proposed in []. We do not impose restrictions on the smoothness of the densities under consideration and do not assume that the densities have bounded supports. Thus, in particular, one can apply our results to various mixtures of distributions, for instance, to mixtures of any nondegenerate normal laws in (Corollary 4). As a byproduct, we relax the conditions of our recent analysis of the Kozachenko–Leonenko type estimators of the Shannon differential entropy [] and use these conditions in estimating the cross-entropy as well. Observe that the integral functional appearing in Theorems 1–3 involves the function , which is close to the function t when the parameter N is large enough. Thus, we impose an essentially less restrictive condition than the one requiring a function for some instead of . Even for the latter choice of G, our results provide the first valid proof that does not appeal to the Fatou lemma (the long-standing problem of obtaining correct proofs was discussed in the Introduction). An interesting and hard problem for future research is to find the class of functions such that one can replace in the expression of by , where , as , and keep the validity of the established theorems. Here one can see an analogy with the investigation of fluctuations of sums of random variables or of Brownian motion by G. H. Hardy, H. Steinhaus, A. Ya. Khinchin, A. N. Kolmogorov, I. G. Petrovsky, W. Feller and other researchers. The increasing precision in the description of the upper and lower functions led to the law of the iterated logarithm and its generalizations. Another deep problem is to provide sharp conditions for the validity of the CLT for estimates of the Kullback–Leibler divergence.
Besides purely theoretical aspects, the estimates of entropy and related functionals have diverse applications. In [], the estimates of the Kullback–Leibler divergence are applied to change-point detection in time series. That issue is important, e.g., in the analysis of stochastic financial models. Moreover, it is interesting to study the spatial variant of this problem. Namely, in [,] statistical estimates of entropy and scan statistics (see, e.g., []) were employed for the identification of inhomogeneities in fiber materials. In [], the Kullback–Leibler divergence estimators are used to identify multivariate spatial clusters in the Bernoulli model. A modification of the idea of the latter paper can also be applied to the analysis of fiber structures. Such structures in can be modeled by a spatial point process specifying the locations of the centers of fibers (segments). A certain law on the unit sphere of can be used to model their directions. The lengths of the fibers can be fixed or follow some distribution on . Since various scan domains could contain a random number of observations, the development of the present results will be applied along with the theory of random sums of random variables; the latter theory (see, e.g., []) is essential in this case. Moreover, we intend to employ the studied estimators in feature selection theory, actively used in Genome-wide association studies (GWAS), see, e.g., [,,]. In this regard, statistical estimates of the mutual information have been proposed, see, e.g., []. Note also the important problem of the stability analysis of constructing, by means of statistical estimates of the mutual information, a sub-collection of relevant (in a sense) factors determining a random response. The above-mentioned applications will be considered in separate publications, supplemented with computer simulations and illustrative graphs.
Author Contributions
Conceptualization, A.B. and D.D.; validation, A.B. and D.D.; writing—original draft preparation, A.B. and D.D.; writing—review and editing, A.B. and D.D.; supervision, A.B.; project administration, A.B.; funding acquisition, A.B. All authors have read and agreed to the published version of the manuscript.
Funding
The work of the first author is supported by the Russian Science Foundation under grant 14-21-00162 and performed at the Steklov Mathematical Institute of Russian Academy of Sciences.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors are grateful to Professor A. Tsybakov for useful discussions. We also thank the Reviewers for remarks and suggestions improving the exposition.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Proofs of Lemmas 1–3 are similar to the proofs of Lemmas 2.5, 3.1 and 3.2 in []. We provide them for the sake of completeness.
Proof of Lemma 1.
(1) Note that if and . Hence, for such , one has if . If then for . Thus, for and any integer .
(2) Assume that . Consider where . If then, for each , in accordance with the definition of one has . Consequently, . Let now . One has
Therefore,
Suppose now that for some and . Then, for each , the Lyapunov inequality leads to the estimate .
(3) Let . Take . Then, for any , according to the definition of we get . Hence . Consider . For any and every , the function is continuous in r on . Next fix an arbitrary . We see that there exists . For such x, set . Thus, is continuous on any segment . Hence, one can find in such that and there exists in such that . If then (since for and as ). Assume that . Obviously as . One has
Thus, in any case ( or ) one has as . Taking into account that we deduce the inequality
Assume now that for some and . Then, for any , the Lyapunov inequality entails . This completes the proof. □
Proof of Lemma 2.
Begin with relation (1). Observe that if a function g is measurable and bounded on a finite interval and is a finite measure on the Borel subsets of then is finite. So applying the integration by parts formula (see, e.g., [], p. 245), for each , we get
Assume now that . Then by the monotone convergence theorem
Given the following lower bound is obvious
Therefore (A2) implies that
By the Lebesgue monotone convergence theorem letting in (A1) yields the desired relation (1) of our Lemma. Now we assume that
Hence, from the equality we get by the monotone convergence theorem. Therefore, for any , we come to the inequalities
Let (). Then, for all positive b small enough,
Thus , so as . Consequently we come to (A3) taking . Then (A1) implies relation (1).
When one of the (nonnegative) integrals in (1) turns out to be infinite while the other one is finite, we come to a contradiction. Thus, (1) is established. In quite the same manner one can verify the validity of relation (2); therefore further details can be omitted. □
Proof of Lemma 3.
Proof of Lemma 4.
We will check that, for given and , there exist and such that, for any ,
For the statement is obviously true. Let . It is easily seen that as . Hence one can find such that, for all , the inequality is valid. Consequently, for ,
For all we write . Therefore, for any , we come to (A5). Thus, for any and , , one has
□
Proof of Lemma 6.
For , a function is convex. We show that is convex on . Consider . Write and . Then, for ,
Obviously, , . Thus, for , we get
For and , we have . Take now . Clearly, for , one has because when . Observe also that
The last inequality is established by induction in N. Thus, in view of (A6), we have proved that, for all and , the inequality holds. Hence, the function is (strictly) convex on .
Let be a continuous nondecreasing function. If the restrictions of h to and (where ) are convex functions then, in general, it is not true that h is convex on . However, we can show that is convex on . Note that a function is convex on since it is convex on and continuous on . Take now any , and . Then
as . Thus, for each , a function is convex on . □
Proof of Corollary 3.
The proof (i.e., checking the conditions of both Theorem 1 and 2) is quite similar to the proof of Corollary 2.11 in []. □
Proof of Corollary 4.
Take , where is a density, , , , . Then according to (9) and (10), for any , and , one has , , . We will apply these relations for and . It is well-known that, for any , , , , the following inequality is valid . Moreover, this inequality is obviously satisfied for all as for it holds . Therefore
The same reasoning leads to bounds and . Now in view of (13), for and , we can write . In this manner we can also represent . □
Lemma A1.
Let probability measures , and a σ-finite measure μ(e.g., the Lebesgue measure) be defined on . Assume that and have densities and , , w.r.t. the measure μ. Then the following statements are true.
- (1)
- if and only if ;
- (2)
- formula (2) holds.
Proof of Lemma A1.
(1) Let . Obviously . Therefore . Since , one has .
Now let . Assume that is not absolutely continuous w.r.t. . Then there exists a set A such that and . Consequently as . We can write , where , . We get as . Note that since on , so . Relation yields ( on and is a -finite measure). One has because . Thus, . Clearly, . Hence . We come to the contradiction. Therefore .
In such a way we have proved that if and , the relation holds if and only if . Obviously we can take as p and q any versions of and .
(2) Suppose that . We know that , are probability measures, where is a -finite measure. Then, in view of [], statement (b) of Lemma on p. 273, the following equality holds -a.s. and consequently -a.s. too (on the set having a density can be taken equal to zero). So, for -almost all . One has
where all integrals converge or diverge simultaneously. Indeed, if h is a measurable function with values in then , whenever ( is a finite or a -finite measure). We also employed [], statement (a) of Lemma on p.273, when we changed the integration by to integration by .
Now assume that is not absolutely continuous w.r.t. , i.e., in view of part (1) of the present Lemma. As usual, for any measurable , . Then
Evidently
as . Since if , we write, for all , . Thus, . Consequently . The proof is complete. □
Remark A1.
Note that formula (2) can give an infinite value of also when . It is enough to take and , .
References
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Moulin, P.; Veeravalli, V.V. Statistical Inference for Engineers and Data Scientists; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar]
- Pardo, L. New developments in statistical information theory based on entropy and divergence measures. Entropy 2019, 21, 391. [Google Scholar] [CrossRef] [PubMed]
- Ji, S.; Zhang, Z.; Ying, S.; Wang, L.; Zhao, X.; Gao, Y. Kullback–Leibler divergence metric learning. IEEE Trans. Cybern. 2020, 1–12. [Google Scholar] [CrossRef]
- Noh, Y.K.; Sugiyama, M.; Liu, S.; du Plessis, M.C.; Park, F.C.; Lee, D.D. Bias reduction and metric learning for nearest-neighbor estimation of Kullback–Leibler divergence. Neural Comput. 2018, 30, 1930–1960. [Google Scholar] [CrossRef]
- Claici, S.; Yurochkin, M.; Ghosh, S.; Solomon, J. Model Fusion with Kullback–Leibler Divergence. In Proceedings of the 37th International Conference on Machine Learning, Online, 12–18 July 2020; Daumé, H., III, Singh, A., Eds.; PMLR: Brookline, MA, USA, 2020; Volume 119, pp. 2038–2047. [Google Scholar]
- Póczos, B.; Xiong, L.; Schneider, J. Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, 14–17 July 2011; AUAI Press: Arlington, VA, USA, 2011; pp. 599–608. [Google Scholar]
- Cui, S.; Luo, C. Feature-based non-parametric estimation of Kullback–Leibler divergence for SAR image change detection. Remote Sens. Lett. 2016, 11, 1102–1111. [Google Scholar] [CrossRef]
- Deledalle, C.-A. Estimation of Kullback–Leibler losses for noisy recovery problems within the exponential family. Electron. J. Stat. 2017, 11, 3141–3164. [Google Scholar] [CrossRef]
- Yu, X.-P.; Chen, S.-X.; Peng, M.-L. Application of partial least squares algorithm based on Kullback–Leibler divergence in intrusion detection. In Proceedings of the International Conference on Computer Science and Technology (CST2016), Shenzhen, China, 8–10 January 2016; Cai, N., Ed.; World Scientific: Singapore, 2017; pp. 256–263. [Google Scholar]
- Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature Selection: A Data Perspective. ACM Comput. Surv. 2017, 50, 1–45. [Google Scholar] [CrossRef]
- Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
- Vergara, J.R.; Estévez, P.A. A review of feature selection methods based on mutual information. Neural Comput. Appl. 2014, 24, 175–186. [Google Scholar] [CrossRef]
- Granero-Belinchón, C.; Roux, S.G.; Garnier, N.B. Kullback–Leibler divergence measure of intermittency: Application to turbulence. Phys. Rev. E 2018, 97, 013107. [Google Scholar] [CrossRef]
- Charzyńska, A.; Gambin, A. Improvement of the k-NN entropy estimator with applications in systems biology. Entropy 2016, 18, 13. [Google Scholar] [CrossRef]
- Wang, M.; Jiang, J.; Yan, Z.; Alberts, I.; Ge, J.; Zhang, H.; Zuo, C.; Yu, J.; Rominger, A.; Shi, K.; et al. Individual brain metabolic connectome indicator based on Kullback–Leibler Divergence Similarity Estimation predicts progression from mild cognitive impairment to Alzheimer’s dementia. Eur. J. Nucl. Med. Mol. Imaging 2020, 47, 2753–2764. [Google Scholar] [CrossRef] [PubMed]
- Zhong, J.; Liu, R.; Chen, P. Identifying critical state of complex diseases by single-sample Kullback–Leibler divergence. BMC Genom. 2020, 21, 87. [Google Scholar] [CrossRef] [PubMed]
- Li, J.; Shang, P. Time irreversibility of financial time series based on higher moments and multiscale Kullback–Leibler divergence. Phys. A Stat. Mech. Appl. 2018, 502, 248–255. [Google Scholar] [CrossRef]
- Beraha, M.; Betelli, A.M.; Papini, M.; Tirinzoni, A.; Restelli, M. Feature selection via mutual information: New theoretical insights. arXiv 2019, arXiv:1907.07384v1. [Google Scholar]
- Carrara, N.; Ernst, J. On the estimation of mutual information. Proceedings 2019, 33, 31. [Google Scholar] [CrossRef]
- Lord, W.M.; Sun, J.; Bollt, E.M. Geometric k-nearest neighbor estimation of entropy and mutual information. Chaos Interdiscip. J. Nonlinear Sci. 2018, 28, 033114. [Google Scholar] [CrossRef]
- Moon, K.R.; Sricharan, K.; Hero, A.O., III. Ensemble estimation of generalized mutual information with applications to Genomics. arXiv 2019, arXiv:1701.08083v2. [Google Scholar]
- Suzuki, J. Estimation of Mutual Information; Springer: Singapore, 2021. [Google Scholar]
- Sason, I.; Verdú, S. f-divergence inequalities. IEEE Trans. Inf. Theory 2016, 62, 5973–6006. [Google Scholar] [CrossRef]
- Moon, K.R.; Sricharan, K.; Greenewald, K.; Hero, A.O., III. Ensemble estimation of information divergence. Entropy 2018, 20, 560. [Google Scholar] [CrossRef]
- Rubenstein, P.K.; Bousquet, O.; Djolonga, J.; Riquelme, C.; Tolstikhin, I. Practical and Consistent Estimation of f-Divergences. In Proceedings of the NeurIPS 2019, 33rd Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Advances in Neural Information Processing Systems. Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2019; Volume 32, pp. 4070–4080. [Google Scholar]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Kozachenko, L.F.; Leonenko, N.N. Sample estimate of the entropy of a random vector. Probl. Inf. Transm. 1987, 23, 9–16. [Google Scholar]
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed]
- Leonenko, N.N.; Pronzato, L.; Savani, V. A class of Rényi information estimators for multidimensional densities. Ann. Stat. 2010, 36, 2153–2182. [Google Scholar] [CrossRef]
- Wang, Q.; Kulkarni, S.R.; Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. IEEE Trans. Inf. Theory 2009, 55, 2392–2405. [Google Scholar] [CrossRef]
- Pál, D.; Póczos, B.; Szepesvári, C. Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs. In Proceedings of the NIPS 2010 Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Advances in Neural Information Processing Systems. Lafferty, J., Williams, C., Shawe-Taylor, J., Zemel, R., Culotta, A., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2010; Volume 23, pp. 1849–1857. [Google Scholar]
- Shiryaev, A.N. Probability—1, 3rd ed.; Springer: New York, NY, USA, 2016. [Google Scholar]
- Loève, M. Probability Theory, 4th ed.; Springer: New York, NY, USA, 1977. [Google Scholar]
- Bulinski, A.; Dimitrov, D. Statistical estimation of the Shannon entropy. Acta Math. Sin. Ser. 2019, 35, 17–46. [Google Scholar] [CrossRef]
- Biau, G.; Devroye, L. Lectures on the Nearest Neighbor Method; Springer: Cham, Switzerland, 2015. [Google Scholar]
- Bulinski, A.; Kozhevin, A. Statistical estimation of conditional Shannon entropy. ESAIM Probab. Stat. 2019, 23, 350–386. [Google Scholar] [CrossRef]
- Coelho, F.; Braga, A.P.; Verleysen, M. A mutual information estimator for continuous and discrete variables applied to feature selection and classification problems. Int. J. Comput. Intell. Syst. 2016, 9, 726–733. [Google Scholar] [CrossRef]
- Delattre, S.; Fournier, N. On the Kozachenko-Leonenko entropy estimator. J. Stat. Plan. Inference 2017, 185, 69–93. [Google Scholar] [CrossRef]
- Berrett, T.B.; Samworth, R.J. Efficient two-sample functional estimation and the super-oracle phenomenon. arXiv 2019, arXiv:1904.09347. [Google Scholar]
- Penrose, M.D.; Yukich, J.E. Limit theory for point processes in manifolds. Ann. Appl. Probab. 2013, 6, 2160–2211. [Google Scholar] [CrossRef]
- Tsybakov, A.B.; Van der Meulen, E.C. Root-n consistent estimators of entropy for densities with unbounded support. Scand. J. Stat. 1996, 23, 75–83. [Google Scholar]
- Singh, S.; Póczos, B. Analysis of k-nearest neighbor distances with application to entropy estimation. arXiv 2016, arXiv:1603.08578v2. [Google Scholar]
- Ryu, J.J.; Ganguly, S.; Kim, Y.-H.; Noh, Y.-K.; Lee, D.D. Nearest neighbor density functional estimation from inverse Laplace transform. arXiv 2020, arXiv:1805.08342v3. [Google Scholar]
- Gao, S.; Steeg, G.V.; Galstyan, A. Efficient Estimation of Mutual Information for Strongly Dependent Variables. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 9–12 May 2015; Lebanon, G., Vishwanathan, S.V.N., Eds.; PMLR: Brookline, MA, USA, 2015; Volume 38, pp. 277–286. [Google Scholar]
- Berrett, T.B.; Samworth, R.J.; Yuan, M. Efficient multivariate entropy estimation via k-nearest neighbour distances. Ann. Stat. 2019, 47, 288–318. [Google Scholar] [CrossRef]
- Goria, M.N.; Leonenko, N.N.; Mergel, V.V.; Novi Inverardi, P.L. A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametr. Stat. 2005, 17, 277–297. [Google Scholar] [CrossRef]
- Evans, D. A computationally efficient estimator for mutual information. Proc. R. Soc. A Math. Phys. Eng. Sci. 2008, 464, 1203–1215. [Google Scholar] [CrossRef]
- Yeh, J. Real Analysis: Theory of Measure and Integration, 3rd ed.; World Scientific: Singapore, 2014. [Google Scholar]
- Evans, D.; Jones, A.J.; Schmidt, W.M. Asymptotic moments of near-neighbour distance distributions. Proc. R. Soc. A Math. Phys. Eng. Sci. 2002, 458, 2839–2849. [Google Scholar] [CrossRef]
- Bouguila, N.; Wentao, F. Mixture Models and Applications; Springer: Cham, Switzerland, 2020. [Google Scholar]
- Borkar, V.S. Probability Theory. An Advanced Course; Springer: New York, NY, USA, 1995. [Google Scholar]
- Kallenberg, O. Foundations of Modern Probability; Springer: New York, NY, USA, 1997. [Google Scholar]
- Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley & Sons: New York, NY, USA, 1999. [Google Scholar]
- Alonso Ruiz, P.; Spodarev, E. Entropy-based inhomogeneity detection in fiber materials. Methodol. Comput. Appl. Probab. 2018, 20, 1223–1239. [Google Scholar] [CrossRef]
- Dresvyanskiy, D.; Karaseva, T.; Makogin, V.; Mitrofanov, S.; Redenbach, C.; Spodarev, E. Detecting anomalies in fibre systems using 3-dimensional image data. Stat. Comput. 2020, 30, 817–837. [Google Scholar] [CrossRef]
- Glaz, J.; Naus, J.; Wallenstein, S. Scan Statistics; Springer: New York, NY, USA, 2009. [Google Scholar]
- Walther, G. Optimal and fast detection of spatial clusters with scan statistics. Ann. Stat. 2010, 38, 1010–1033. [Google Scholar] [CrossRef]
- Gnedenko, B.V.; Korolev, V.Yu. Random Summation: Limit Theorems and Applications; CRC Press: Boca Raton, FL, USA, 1996. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).