# Local Intrinsic Dimensionality, Entropy and Statistical Divergences


## Abstract


## 1. Introduction

- For univariate scenarios, where we are working with the tail of a distribution of a single variable, we can conduct:
  - Temporal analysis: when a distribution models some property varying over time (e.g., survival analysis), we can analyze the entropy of a univariate distribution within an asymptotically short window of time, or the divergence between two univariate distributions within an asymptotically short window of time.
  - Distance-based analysis: when a distribution models distances from a query location to its nearest neighbors and the distances are induced by a global data distribution. Here, our results can be used for analysis of tail entropy or divergence between distributions within an asymptotically small distance interval. In the latter case, this can provide insight into multivariate properties, since under minimal assumptions the divergences between univariate distance distributions provide lower bounds for distances between multivariate distributions [4,5]. This is applicable for models such as generative adversarial networks (GANs), where it is important to test correspondence between synthetic and true distributions at a local level [6].

- For multivariate scenarios, where we are analyzing distributions with multiple variables:
  - If an assumption of locally spherical symmetry of the distribution holds, then we can directly compute the tail entropy of a distribution or the divergence between two tail distributions in the vicinity of a single point. Such an assumption is suitable for analyzing data distributions for many types of physical systems such as fluids, glasses, metals and polymers, where local isotropy holds.

- Formulate technical lemmas which delineate when it is possible to substitute certain types of tail distributions by simple formulations that depend only on their associated LID values.
- Use these lemmas to compute univariate tail formulations of entropy, cross entropy, cumulative entropy, entropy power and generalized q-entropies, all in terms of the LID values of the original tail distributions.
- Use these lemmas to compute tail formulations of univariate statistical divergences and distances (Kullback–Leibler divergence, Jensen–Shannon divergence, Hellinger distance, ${\chi}^{2}$ divergence, $\alpha $-divergence, Wasserstein distance and ${L}_{2}$ distance).
- Extend the univariate results to a multivariate context, when local spherical symmetry of the distribution holds.

## 2. Related Work

## 3. Local Intrinsic Dimensionality

**Definition 1.** Let F be a real-valued function that is non-zero over some open interval containing $r\in \mathbb{R}$, $r\ne 0$. The intrinsic dimensionality of F at r is defined as follows whenever the limit exists:
$${ID}_{F}\left(r\right)\triangleq \underset{\epsilon \to {0}^{+}}{lim}\frac{ln\left(F\left(\left(1+\epsilon \right)r\right)/F\left(r\right)\right)}{ln\left(1+\epsilon \right)}.$$

**Theorem 1.** Let F be a real-valued function that is non-zero over some open interval containing $r\in \mathbb{R}$, $r\ne 0$. If F is continuously differentiable at r and using ${F}^{\prime}\left(r\right)$ to denote the derivative $\frac{dF\left(r\right)}{dr}$, then
$${ID}_{F}\left(r\right)=\frac{r\cdot {F}^{\prime}\left(r\right)}{F\left(r\right)}.$$
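Both characterizations of intrinsic dimensionality can be checked numerically. The sketch below (illustrative, not from the paper) estimates the intrinsic dimensionality of the power-law function $F(t)={t}^{3}$ at $r=0.5$, once via the limit definition and once via the derivative form of Theorem 1; both estimates should be close to 3.

```python
import math

def id_limit(F, r, eps=1e-6):
    """ID_F(r) from the limit definition:
    ln( F((1+eps)*r) / F(r) ) / ln(1+eps) as eps -> 0+."""
    return math.log(F((1 + eps) * r) / F(r)) / math.log(1 + eps)

def id_derivative(F, r, h=1e-6):
    """ID_F(r) = r * F'(r) / F(r), with F'(r) by central difference."""
    dF = (F(r + h) - F(r - h)) / (2 * h)
    return r * dF / F(r)

# Power-law example: F(t) = t**3 has intrinsic dimensionality 3 everywhere.
F = lambda t: t ** 3
print(id_limit(F, 0.5), id_derivative(F, 0.5))  # both close to 3.0
```

For exact power laws the two estimates agree up to discretization error; for general smooth growth functions they agree in the limit as `eps` and `h` shrink.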

**Theorem 2 (LID Representation Theorem [2]).** Let $F:\mathbb{R}\to \mathbb{R}$ be a real-valued function, and assume that ${ID}_{F}^{\ast}$ exists. Let x and w be values for which $x/w$ and $F\left(x\right)/F\left(w\right)$ are both positive. If F is non-zero and continuously differentiable everywhere in the interval $[min\{x,w\},max\{x,w\}]$, then
$$\frac{F\left(x\right)}{F\left(w\right)}={\left(\frac{x}{w}\right)}^{{ID}_{F}^{\ast}}\cdot {G}_{F}\left(x,w\right),\phantom{\rule{1em}{0ex}}\text{where}\phantom{\rule{1em}{0ex}}{G}_{F}\left(x,w\right)\triangleq exp\left({\int}_{x}^{w}\frac{{ID}_{F}^{\ast}-{ID}_{F}\left(t\right)}{t}\,\mathrm{d}t\right).$$

## 4. Definitions of Tail Entropies and Tail Dissimilarity Measures

**Definition 2 (Smooth Growth Function).** A real-valued function F is said to be a smooth growth function if it satisfies all of the following conditions:

- There exists a value $r>0$ such that F is monotonically increasing over $(0,r)$;
- F is continuous over $[0,r)$;
- F is differentiable over $(0,r)$; and
- The local intrinsic dimensionality ${ID}_{F}^{\ast}$ exists and is positive.

**Definition 3 (Tail Entropy).** The entropy of F conditioned on $[0,w]$ is
$$\mathrm{H}(F,w)\triangleq -{\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)ln{F}_{w}^{\prime}\left(t\right)\,\mathrm{d}t.$$

**Definition 4 (Tail Varentropy).** The varentropy of F conditioned on $[0,w]$ is
$$\mathrm{VarH}(F,w)\triangleq {\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right){ln}^{2}{F}_{w}^{\prime}\left(t\right)\,\mathrm{d}t-{\left({\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)ln{F}_{w}^{\prime}\left(t\right)\,\mathrm{d}t\right)}^{2}.$$

**Definition 5 (Cumulative Tail Entropy).** The cumulative entropy of F conditioned on $[0,w]$ is
$$\mathrm{cH}(F,w)\triangleq -{\int}_{0}^{w}{F}_{w}\left(t\right)ln{F}_{w}\left(t\right)\,\mathrm{d}t.$$

**Definition 6 (Tail Entropy Power).** The entropy power of F conditioned on $[0,w]$ is defined to be
$$\mathrm{HP}(F,w)\triangleq exp\left(\mathrm{H}(F,w)\right).$$

- It can be interpreted as a diversity. Observe that when F is a (univariate) uniform distance distribution ranging over the interval $[0,w]$, we have ${ID}_{F}^{\ast}=1$ and $\mathrm{HP}(F,w)=w$. In other words, the entropy power is equal to the ‘effective diversity’ of the distribution (the number of neighbor distance possibilities).
- Given two different queries, each with its own neighborhood, one query with tail entropy power equal to 2 and the other with tail entropy power equal to 4, we can say that the distance distribution of the second query is twice as diverse as that of the first query.

**Definition 7 (Tail q-Entropy).** For any $q>0$ $(q\ne 1)$, the q-entropy of F conditioned on $[0,w]$ is defined to be
$${\mathrm{H}}_{q}(F,w)\triangleq \frac{1}{q-1}{\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)-{\left({F}_{w}^{\prime}\left(t\right)\right)}^{q}\,\mathrm{d}t.$$

**Definition 8 (Cumulative Tail q-Entropy).** For any $q>0$ $(q\ne 1)$, the cumulative q-entropy of F conditioned on $[0,w]$ is defined to be
$${\mathrm{cH}}_{q}(F,w)\triangleq \frac{1}{q-1}{\int}_{0}^{w}{F}_{w}\left(t\right)-{\left({F}_{w}\left(t\right)\right)}^{q}\,\mathrm{d}t.$$

**Definition 9 (Tail q-Entropy Power).** For any $q>0$ $(q\ne 1)$, the q-entropy power of F conditioned on $[0,w]$ is defined to be
$${\mathrm{HP}}_{q}(F,w)\triangleq {\left[1+(1-q)\,{\mathrm{H}}_{q}(F,w)\right]}^{\frac{1}{1-q}}.$$

**Definition 10 (Tail Cross Entropy).** The cross entropy from F to G, conditioned on $[0,w]$, is defined to be
$$\mathrm{XH}(F;G,w)\triangleq -{\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)ln{G}_{w}^{\prime}\left(t\right)\,\mathrm{d}t.$$

**Definition 11 (Tail Cross Entropy Power).** The cross entropy power from F to G, conditioned on $[0,w]$, is defined to be $exp\left(\mathrm{XH}(F;G,w)\right)$.

**Definition 12 (Tail KL Divergence).** The Kullback–Leibler divergence from F to G, conditioned on $[0,w]$, is defined to be
$${\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)ln\frac{{F}_{w}^{\prime}\left(t\right)}{{G}_{w}^{\prime}\left(t\right)}\,\mathrm{d}t.$$
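For ideal power-law tails, where ${F}_{w}(t)={(t/w)}^{a}$ and ${G}_{w}(t)={(t/w)}^{b}$ hold exactly, the tail KL divergence evaluates in closed form to $\rho -1-ln\rho$ with $\rho =b/a$. The sketch below (illustrative; it applies the standard density form of the KL divergence to the tail-conditioned distributions) verifies this by numerical quadrature.

```python
import math

def tail_kl_power_law(a, b, n=200_000):
    """Numerically integrate the tail KL divergence
    KL(F||G, w) = int_0^w F_w'(t) ln( F_w'(t) / G_w'(t) ) dt
    for ideal power-law tails F_w(t) = (t/w)**a, G_w(t) = (t/w)**b.
    Substituting u = t/w makes the integral independent of w."""
    total = 0.0
    for i in range(1, n + 1):
        u = (i - 0.5) / n              # midpoint rule on (0, 1)
        f = a * u ** (a - 1)           # density of the conditioned tail F_w
        g = b * u ** (b - 1)           # density of the conditioned tail G_w
        total += f * math.log(f / g)
    return total / n

a, b = 3.0, 5.0
rho = b / a
print(tail_kl_power_law(a, b))         # numerical quadrature
print(rho - 1 - math.log(rho))         # closed form: rho - 1 - ln(rho)
```

Note that the result depends only on the ratio $\rho$ of the two LID values, and vanishes exactly when $a=b$.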

**Definition 13 (Tail JS Divergence).** The Jensen–Shannon divergence between F and G, conditioned on $[0,w]$, is defined to be
$$\frac{1}{2}{\int}_{0}^{w}{F}_{w}^{\prime}\left(t\right)ln\frac{2{F}_{w}^{\prime}\left(t\right)}{{F}_{w}^{\prime}\left(t\right)+{G}_{w}^{\prime}\left(t\right)}\,\mathrm{d}t+\frac{1}{2}{\int}_{0}^{w}{G}_{w}^{\prime}\left(t\right)ln\frac{2{G}_{w}^{\prime}\left(t\right)}{{F}_{w}^{\prime}\left(t\right)+{G}_{w}^{\prime}\left(t\right)}\,\mathrm{d}t.$$

**Definition 14 (Tail L2 Distance).** The L2 distance between F and G, conditioned on $[0,w]$, is defined to be
$${\left({\int}_{0}^{w}{\left({F}_{w}^{\prime}\left(t\right)-{G}_{w}^{\prime}\left(t\right)\right)}^{2}\,\mathrm{d}t\right)}^{1/2}.$$

**Definition 15 (Tail Hellinger Distance).** The Hellinger distance between F and G, conditioned on $[0,w]$, is defined to be
$${\left(\frac{1}{2}{\int}_{0}^{w}{\left(\sqrt{{F}_{w}^{\prime}\left(t\right)}-\sqrt{{G}_{w}^{\prime}\left(t\right)}\right)}^{2}\,\mathrm{d}t\right)}^{1/2}.$$

**Definition 16 (Tail ${\chi}^{2}$-Divergence).** The ${\chi}^{2}$ divergence between F and G, conditioned on $[0,w]$, is defined to be
$${\int}_{0}^{w}\frac{{\left({F}_{w}^{\prime}\left(t\right)-{G}_{w}^{\prime}\left(t\right)\right)}^{2}}{{G}_{w}^{\prime}\left(t\right)}\,\mathrm{d}t.$$

**Definition 17 (Tail $\alpha$-Divergence).** For any $\alpha \ne 0,1$, the $\alpha$-divergence from F to G, conditioned on $[0,w]$, is defined to be
$$\frac{1}{\alpha (\alpha -1)}\left({\int}_{0}^{w}{\left({F}_{w}^{\prime}\left(t\right)\right)}^{\alpha}{\left({G}_{w}^{\prime}\left(t\right)\right)}^{1-\alpha}\,\mathrm{d}t-1\right).$$

**Definition 18 (Tail Wasserstein Distance).** The p-th Wasserstein distance between F and G, conditioned on $[0,w]$, is defined to be
$${\left({\int}_{0}^{1}{\left|{F}_{w}^{-1}\left(u\right)-{G}_{w}^{-1}\left(u\right)\right|}^{p}\,\mathrm{d}u\right)}^{1/p}.$$
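The quantile-function form above makes the univariate Wasserstein distance straightforward to compute. A minimal numerical sketch (illustrative, not from the paper) for ideal power-law tails, where the conditioned quantile function is ${F}_{w}^{-1}(u)=w\,{u}^{1/a}$:

```python
def tail_wasserstein_power_law(a, b, w=1.0, p=1, n=100_000):
    """p-th Wasserstein distance between tails conditioned on [0, w],
    via the univariate quantile form
    W_p = ( int_0^1 |F_w^{-1}(u) - G_w^{-1}(u)|**p du )**(1/p),
    with F_w^{-1}(u) = w * u**(1/a) for the ideal power-law tail."""
    total = 0.0
    for i in range(1, n + 1):
        u = (i - 0.5) / n              # midpoint rule on (0, 1)
        total += abs(w * u ** (1 / a) - w * u ** (1 / b)) ** p
    return (total / n) ** (1 / p)

# Identical tails (a == b) give distance 0; distinct LIDs give a positive
# distance that scales linearly with the tail length w when p = 1.
print(tail_wasserstein_power_law(2.0, 2.0))   # 0.0
print(tail_wasserstein_power_law(2.0, 4.0))   # about 2/15
```

The linear scaling in w motivates the $\frac{1}{w}$ normalization used for the Normalized Wasserstein Distance in Table 1.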

## 5. Simplification of Tail Measures

**Lemma 1.**

- $\psi :{\mathbb{R}}_{+}^{3}\to \mathbb{R}$;
- $z(t,w)={F}_{w}\left(t\right)=\frac{F\left(t\right)}{F\left(w\right)}$; and
- for all fixed choices of t and w satisfying $0\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}t\phantom{\rule{0.166667em}{0ex}}\le \phantom{\rule{0.166667em}{0ex}}w\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}r$, $\psi (t,w,z)$ is monotone and continuously partially differentiable with respect to z over the interval $z\in (0,1]$.

**Proof.**

**Lemma 2.**

- $\psi :{\mathbb{R}}_{+}^{3}\to \mathbb{R}$;
- $z(u,w)={F}_{w}^{-1}\left(u\right)$ for all $w\in (0,r)$, where ${F}_{w}\left(t\right)\triangleq F\left(t\right)/F\left(w\right)$ is restricted to values of t in $[0,w]$; and
- for all fixed choices of u and w satisfying $u\in [0,1]$ and $0\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}w\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}r$, $\psi (u,w,z)$ is monotone and continuously partially differentiable with respect to z over the interval $z\in (0,r)$.

**Proof.**

**Lemma 3.**

- $\psi :{\mathbb{R}}_{+}^{3}\to \mathbb{R}$;
- $z(t,w)={ID}_{F}\left(t\right)$, and
- there exists a value $\gamma \in (0,{ID}_{F}^{\ast})$ such that for all fixed choices of t satisfying $0\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}t\phantom{\rule{0.166667em}{0ex}}\le \phantom{\rule{0.166667em}{0ex}}w\phantom{\rule{0.166667em}{0ex}}<\phantom{\rule{0.166667em}{0ex}}r$, $\psi (t,w,z)$ is monotone with respect to z over the interval $z\in ({ID}_{F}^{\ast}-\gamma ,{ID}_{F}^{\ast}+\gamma )$.

**Proof.**

## 6. Derivation of the Limits of Tail Measures

#### 6.1. Handling Derivatives of Smooth Growth Functions

#### 6.2. Substitution of LID Functions by Constants

#### 6.3. Elimination of Tail-Conditioned Smooth Growth Functions

#### 6.4. Elimination of the Inverses of Tail-Conditioned Smooth Growth Functions

#### 6.5. Normalization

#### 6.6. Summary of Results

## 7. Extension to Multivariate Distributions

#### 7.1. Multivariate Tail Distributions with Local Spherical Symmetry

**Lemma 4.** Let $\mathbf{X}$ be an n-dimensional random vector that is spherically symmetric with a radial distribution $\mathcal{R}$. Then $\mathbf{X}$ has a density $f\left(\mathbf{x}\right)$ if and only if $\mathcal{R}$ has a density s and
$$f\left(\mathbf{x}\right)=\frac{\Gamma \left(n/2\right)}{2{\pi}^{n/2}}\cdot \frac{s\left(\Vert \mathbf{x}\Vert \right)}{{\Vert \mathbf{x}\Vert}^{n-1}}.$$
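The radial reduction can be illustrated by simulation. For points drawn uniformly from the unit n-ball, the distance-to-center distribution is $F(t)={t}^{n}$, so ${ID}_{F}^{\ast}=n$. The sketch below (illustrative, not from the paper; it uses a Hill/maximum-likelihood-style LID estimate of the kind discussed in Related Work) recovers the ambient dimension from sampled radii:

```python
import math, random

def sample_radii(n, size, seed=0):
    """Distances to the center for points uniform in the unit n-ball:
    the radial CDF is F(t) = t**n, so R = U**(1/n) by inverse-CDF sampling."""
    rng = random.Random(seed)
    return [rng.random() ** (1.0 / n) for _ in range(size)]

def mle_lid(radii, w=1.0):
    """Hill/MLE-style LID estimate from distances observed within radius w:
    ID ~= ( mean of ln(w / r_i) )**(-1)."""
    return len(radii) / sum(math.log(w / r) for r in radii)

n = 7
print(mle_lid(sample_radii(n, 100_000)))   # close to the ambient dimension 7
```

For locally spherically symmetric data, this connects the univariate distance distribution analyzed in earlier sections to the multivariate location distribution treated here.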

#### 7.2. Multivariate Tail Entropy Variants

#### 7.3. Multivariate Cumulative Tail Entropy

#### 7.4. Multivariate Tail Divergences

#### 7.5. Observations

- A result for the Wasserstein Distance is not included, since its formulation does not generalize straightforwardly to higher dimensions, unlike the other divergence measures.
- The normalizations and weightings used depend only on the tail volume ${V}_{n}\left(w\right)$ and (for the Tsallis entropy variants) the parameter q. This generalizes our earlier univariate results where normalization was performed with regard to the tail length w.
- All the multivariate tail variants considered in Table 6 are elegant generalizations of their corresponding univariate formulations, and all explicitly depend on the ratios between the LIDs and the dimension of the space n ($\phi =\frac{{ID}_{F}^{\ast}}{n}$ and $\gamma =\frac{{ID}_{G}^{\ast}}{n}$), or on the ratio of two LID values ($\rho =\frac{{ID}_{G}^{\ast}}{{ID}_{F}^{\ast}}=\frac{\gamma}{\phi}$). Among these, the Normalized Entropy Power and the Normalized Cumulative Entropy are maximized when ${ID}_{F}^{\ast}=n$, which can occur when the tail distribution is uniform. The Varentropy is minimized when ${ID}_{F}^{\ast}=n$, which can occur when the variance of the log-likelihood for a uniform distribution is equal to zero.
- As mentioned in Related Work, a number of previous studies in deep learning have found that the local intrinsic dimension of learned representations is lower than the dimension of the full space [32,33,34,35] (i.e., ${ID}_{F}^{\ast}<n$) and that the learning process progressively reduces the local intrinsic dimension. Consider a concrete example where $n=100$, ${ID}_{F}^{\ast}=12$, and the learning process reduces ${ID}_{F}^{\ast}$ at a point from 12 to 11. The consequent effect on entropy can be interpreted from two different perspectives, either as an increase in tail distance entropy or as a decrease in tail location entropy:
  - Considering univariate normalized entropy power or normalized cumulative entropy (Table 1), reduction of ${ID}_{F}^{\ast}$ corresponds to an increase in entropy. Here, the entropy measures the uncertainty of the univariate random variable modeling distances to nearest neighbors. Thus, reduction of ${ID}_{F}^{\ast}$ corresponds to an increase in “distance entropy”.
  - Considering multivariate normalized entropy power or multivariate normalized cumulative entropy (Table 6), reduction of ${ID}_{F}^{\ast}$ corresponds to a decrease in entropy. Here, the entropy measures the uncertainty of the multivariate random variable modeling locations of nearest neighbors, assuming local spherical symmetry. Thus, reduction of ${ID}_{F}^{\ast}$ corresponds to a decrease in “location entropy”.

- All four of the multivariate tail divergences listed in Table 6, as well as the Hellinger Distance, have radial integral formulations that are identical to their univariate counterparts. All the divergences and distances (including the Weighted L2 Distance) are minimized when ${ID}_{F}^{\ast}={ID}_{G}^{\ast}$.
- By setting $n=1$, we can recover the univariate results from Table 1. However, note that the range of integration used in Table 6 is a hypersphere of radius w, which for $n=1$ is the interval $[-w,w]$. In contrast, the integral formulations listed in Table 1 were taken over the interval $[0,w]$. For some results, this means a minor (constant factor of 2) difference between Table 1 and the result from Table 6 when $n=1$.
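The direction of the entropy change in the deep-learning example above (reducing ${ID}_{F}^{\ast}$ from 12 to 11 with $n=100$) can be checked directly against the univariate limit from Table 1: the normalized entropy power limit $\frac{1}{{ID}_{F}^{\ast}}exp\left(1-\frac{1}{{ID}_{F}^{\ast}}\right)$ is strictly decreasing in ${ID}_{F}^{\ast}$ for ${ID}_{F}^{\ast}>1$, so lowering the LID raises the distance entropy. A minimal sketch:

```python
import math

def normalized_entropy_power(lid):
    """Limit of the univariate normalized tail entropy power (Table 1):
    (1 / ID_F^*) * exp(1 - 1 / ID_F^*)."""
    return math.exp(1.0 - 1.0 / lid) / lid

# Lowering ID_F^* from 12 to 11 increases the univariate (distance) entropy;
# the function is strictly decreasing for ID_F^* > 1 and peaks at ID_F^* = 1,
# the LID of a uniform univariate distance distribution.
print(normalized_entropy_power(12))
print(normalized_entropy_power(11))
```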

#### 7.6. Visualization of Behavior

## 8. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Basseville, M. Divergence measures for statistical data processing—An annotated bibliography. Signal Process. 2013, 93, 621–633.
2. Houle, M.E. Local Intrinsic Dimensionality I: An Extreme-Value-Theoretic Foundation for Similarity Applications. In Proceedings of the International Conference on Similarity Search and Applications, Munich, Germany, 4–6 October 2017; pp. 64–79.
3. Bailey, J.; Houle, M.E.; Ma, X. Relationships Between Local Intrinsic Dimensionality and Tail Entropy. In Proceedings of the 14th International Conference on Similarity Search and Applications, SISAP 2021, Dortmund, Germany, 29 September–1 October 2021.
4. Heller, R.; Heller, Y. Multivariate tests of association based on univariate tests. In Advances in Neural Information Processing Systems 29 (NIPS 2016); Lee, D.D., Sugiyama, M., von Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 208–216.
5. Maa, J.; Pearl, D.; Bartoszynski, R. Reducing multidimensional two-sample data to one-dimensional interpoint comparisons. Ann. Stat. 1996, 24, 1069–1074.
6. Li, A.; Qi, J.; Zhang, R.; Ma, X.; Ramamohanarao, K. Generative image inpainting with submanifold alignment. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 811–817.
7. Camastra, F.; Staiano, A. Intrinsic dimension estimation: Advances and open problems. Inf. Sci. 2016, 328, 26–41.
8. Campadelli, P.; Casiraghi, E.; Ceruti, C.; Rozza, A. Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework. Math. Probl. Eng. 2015, 2015, 759567.
9. Verveer, P.J.; Duin, R.P.W. An evaluation of intrinsic dimensionality estimators. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 81–86.
10. Bruske, J.; Sommer, G. Intrinsic dimensionality estimation with optimally topology preserving maps. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 572–575.
11. Pettis, K.W.; Bailey, T.A.; Jain, A.K.; Dubes, R.C. An intrinsic dimensionality estimator from near-neighbor information. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 25–37.
12. Navarro, G.; Paredes, R.; Reyes, N.; Bustos, C. An empirical evaluation of intrinsic dimension estimators. Inf. Syst. 2017, 64, 206–218.
13. Jolliffe, I.T. Principal Component Analysis; Springer: Berlin/Heidelberg, Germany, 2002.
14. Costa, J.A.; Hero, A.O., III. Entropic Graphs for Manifold Learning. In Proceedings of the 37th Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 9–12 November 2003; Volume 1, pp. 316–320.
15. Hein, M.; Audibert, J.Y. Intrinsic dimensionality estimation of submanifolds in ${\mathbb{R}}^{d}$. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 289–296.
16. Rozza, A.; Lombardi, G.; Rosa, M.; Casiraghi, E.; Campadelli, P. IDEA: Intrinsic Dimension Estimation Algorithm. In Proceedings of the International Conference on Image Analysis and Processing, Ravenna, Italy, 14–16 September 2011; pp. 433–442.
17. Rozza, A.; Lombardi, G.; Ceruti, C.; Casiraghi, E.; Campadelli, P. Novel High Intrinsic Dimensionality Estimators. Mach. Learn. 2012, 89, 37–65.
18. Ceruti, C.; Bassis, S.; Rozza, A.; Lombardi, G.; Casiraghi, E.; Campadelli, P. DANCo: An intrinsic dimensionality estimator exploiting angle and norm concentration. Pattern Recognit. 2014, 47, 2569–2581.
19. Facco, E.; d’Errico, M.; Rodriguez, A.; Laio, A. Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Sci. Rep. 2017, 7, 12140.
20. Zhou, S.; Tordesillas, A.; Pouragha, M.; Bailey, J.; Bondell, H. On local intrinsic dimensionality of deformation in complex materials. Nat. Sci. Rep. 2021, 11, 10216.
21. Tordesillas, A.; Zhou, S.; Bailey, J.; Bondell, H. A representation learning framework for detection and characterization of dead versus strain localization zones from pre- to post-failure. Granul. Matter 2022, 24, 75.
22. Faranda, D.; Messori, G.; Yiou, P. Dynamical proxies of North Atlantic predictability and extremes. Sci. Rep. 2017, 7, 41278.
23. Messori, G.; Harnik, N.; Madonna, E.; Lachmy, O.; Faranda, D. A dynamical systems characterization of atmospheric jet regimes. Earth Syst. Dynam. 2021, 12, 233–251.
24. Kambhatla, N.; Leen, T.K. Dimension Reduction by Local Principal Component Analysis. Neural Comput. 1997, 9, 1493–1516.
25. Houle, M.E.; Ma, X.; Nett, M.; Oria, V. Dimensional Testing for Multi-Step Similarity Search. In Proceedings of the IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 299–308.
26. Campadelli, P.; Casiraghi, E.; Ceruti, C.; Lombardi, G.; Rozza, A. Local Intrinsic Dimensionality Based Features for Clustering. In Proceedings of the International Conference on Image Analysis and Processing, Naples, Italy, 9–13 September 2013; pp. 41–50.
27. Houle, M.E.; Schubert, E.; Zimek, A. On the correlation between local intrinsic dimensionality and outlierness. In Proceedings of the International Conference on Similarity Search and Applications, Lima, Peru, 7–9 October 2018; pp. 177–191.
28. Carter, K.M.; Raich, R.; Finn, W.G.; Hero, A.O., III. FINE: Fisher Information Non-parametric Embedding. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 2093–2098.
29. Ma, X.; Li, B.; Wang, Y.; Erfani, S.M.; Wijewickrema, S.N.R.; Schoenebeck, G.; Song, D.; Houle, M.E.; Bailey, J. Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–15.
30. Amsaleg, L.; Bailey, J.; Barbe, D.; Erfani, S.M.; Houle, M.E.; Nguyen, V.; Radovanović, M. The Vulnerability of Learning to Adversarial Perturbation Increases with Intrinsic Dimensionality. In Proceedings of the IEEE Workshop on Information Forensics and Security, Rennes, France, 4–7 December 2017; pp. 1–6.
31. Amsaleg, L.; Bailey, J.; Barbe, A.; Erfani, S.M.; Furon, T.; Houle, M.E.; Radovanović, M.; Nguyen, X.V. High Intrinsic Dimensionality Facilitates Adversarial Attack: Theoretical Evidence. IEEE Trans. Inf. Forensics Secur. 2021, 16, 854–865.
32. Ma, X.; Wang, Y.; Houle, M.E.; Zhou, S.; Erfani, S.M.; Xia, S.; Wijewickrema, S.N.R.; Bailey, J. Dimensionality-Driven Learning with Noisy Labels. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3361–3370.
33. Ansuini, A.; Laio, A.; Macke, J.H.; Zoccolan, D. Intrinsic dimension of data representations in deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 6111–6122.
34. Pope, P.; Zhu, C.; Abdelkader, A.; Goldblum, M.; Goldstein, T. The intrinsic dimension of images and its impact on learning. In Proceedings of the International Conference on Learning Representations, Virtual Event, 3–7 May 2021.
35. Gong, S.; Boddeti, V.N.; Jain, A.K. On the intrinsic dimensionality of image representations. In Proceedings of the CVPR, Long Beach, CA, USA, 5–20 June 2019; pp. 3987–3996.
36. Barua, S.; Ma, X.; Erfani, S.M.; Houle, M.H.; Bailey, J. Quality Evaluation of GANs Using Cross Local Intrinsic Dimensionality. arXiv 2019, arXiv:1905.00643.
37. Romano, S.; Chelly, O.; Nguyen, V.; Bailey, J.; Houle, M.E. Measuring Dependency via Intrinsic Dimensionality. In Proceedings of the ICPR 2016, Cancun, Mexico, 4–8 December 2016; pp. 1207–1212.
38. Lucarini, V.; Faranda, D.; de Freitas, A.; de Freitas, J.; Holland, M.; Kuna, T.; Nicol, M.; Todd, M.; Vaienti, S. Extremes and Recurrence in Dynamical Systems; Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts; Wiley: Hoboken, NJ, USA, 2016.
39. Levina, E.; Bickel, P.J. Maximum Likelihood Estimation of Intrinsic Dimension. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004; pp. 777–784.
40. Amsaleg, L.; Chelly, O.; Furon, T.; Girard, S.; Houle, M.E.; Kawarabayashi, K.; Nett, M. Extreme-Value-Theoretic Estimation of Local Intrinsic Dimensionality. Data Min. Knowl. Discov. 2018, 32, 1768–1805.
41. Hill, B.M. A Simple General Approach to Inference About the Tail of a Distribution. Ann. Stat. 1975, 3, 1163–1174.
42. Johnsson, K.; Soneson, C.; Fontes, M. Low bias local intrinsic dimension estimation from expected simplex skewness. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 196–202.
43. Amsaleg, L.; Chelly, O.; Houle, M.E.; Kawarabayashi, K.; Radovanović, M.; Treeratanajaru, W. Intrinsic dimensionality estimation within tight localities. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 181–189.
44. Farahmand, A.M.; Szepesvári, C.; Audibert, J.Y. Manifold-adaptive dimension estimation. In Proceedings of the 24th International Conference on Machine Learning, Corvalis, OR, USA, 20–24 June 2007; pp. 265–272.
45. Block, A.; Jia, Z.; Polyanskiy, Y.; Rakhlin, A. Intrinsic Dimension Estimation Using Wasserstein Distances. arXiv 2021, arXiv:2106.04018.
46. Thordsen, E.; Schubert, E. ABID: Angle Based Intrinsic Dimensionality—Theory and analysis. Inf. Syst. 2022, 108, 101989.
47. Carter, K.M.; Raich, R.; Hero, A.O., III. On Local Intrinsic Dimension Estimation and Its Applications. IEEE Trans. Signal Process. 2010, 58, 650–663.
48. Tempczyk, P.; Golinski, A.; Spurek, P.; Tabor, J. LIDL: Local Intrinsic Dimension estimation using approximate Likelihood. In Proceedings of the ICLR 2021 Workshop on Geometrical and Topological Representation Learning, Online, 7 May 2021.
49. Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); Wiley-Interscience: Hoboken, NJ, USA, 2006.
50. Rioul, O. Information Theoretic Proofs of Entropy Power Inequalities. IEEE Trans. Inf. Theory 2011, 57, 33–55.
51. Jelinek, F.; Mercer, R.L.; Bahl, L.R.; Baker, J.K. Perplexity—A measure of the difficulty of speech recognition tasks. J. Acoust. Soc. Am. 1977, 62, S63.
52. Jost, L. Entropy and diversity. Oikos 2006, 113, 363–375.
53. Kostal, L.; Lansky, P.; Pokora, O. Measures of statistical dispersion based on Shannon and Fisher information concepts. Inf. Sci. 2013, 235, 214–223.
54. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control 1959, 2, 101–112.
55. Di Crescenzo, A.; Longobardi, M. On cumulative entropies. J. Stat. Plan. Inference 2009, 139, 4072–4087.
56. Rao, M.; Chen, Y.; Vemuri, B.C.; Wang, F. Cumulative residual entropy: A new measure of information. IEEE Trans. Inf. Theory 2004, 50, 1220–1228.
57. Nguyen, H.V.; Mandros, P.; Vreeken, J. Universal Dependency Analysis. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, FL, USA, 5–7 May 2016; pp. 792–800.
58. Böhm, K.; Keller, F.; Müller, E.; Nguyen, H.V.; Vreeken, J. CMI: An Information-Theoretic Contrast Measure for Enhancing Subspace Cluster and Outlier Detection. In Proceedings of the 13th SIAM International Conference on Data Mining, Austin, TX, USA, 2–4 May 2013; pp. 198–206.
59. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
60. Calì, C.; Longobardi, M.; Ahmadi, J. Some properties of cumulative Tsallis entropy. Phys. A Stat. Mech. Its Appl. 2017, 486, 1012–1021.
61. Pele, D.T.; Lazar, E.; Mazurencu-Marinescu-Pele, M. Modeling Expected Shortfall Using Tail Entropy. Entropy 2019, 21, 1204.
62. MacKay, D.J. Information Theory, Inference, and Learning Algorithms, 1st ed.; Cambridge University Press: Cambridge, UK, 2003.
63. Kac, M.; Kiefer, J.; Wolfowitz, J. On tests of normality and other tests of goodness of fit based on distance methods. Ann. Math. Stat. 1955, 26, 189–211.
64. Nowozin, S.; Cseke, B.; Tomioka, R. f-GAN: Training generative neural samplers using variational divergence minimization. In Proceedings of the 30th Annual Conference on Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 271–279.
65. Contreras-Reyes, J. Asymptotic form of the Kullback-Leibler divergence for multivariate asymmetric heavy-tailed distributions. Phys. A Stat. Mech. Its Appl. 2014, 395, 200–208.
66. Houle, M.E.; Kashima, H.; Nett, M. Generalized Expansion Dimension. In Proceedings of the IEEE 12th International Conference on Data Mining Workshops, Brussels, Belgium, 10 December 2012; pp. 587–594.
67. Karger, D.R.; Ruhl, M. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the 34th ACM Symposium on Theory of Computing, Montreal, QC, Canada, 19–21 May 2002; pp. 741–750.
68. Houle, M.E. Dimensionality, Discriminability, Density and Distance Distributions. In Proceedings of the IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 468–473.
69. Karamata, J. Sur un mode de croissance régulière. Théorèmes fondamentaux. Bull. Société Mathématique Fr. 1933, 61, 55–62.
70. Coles, S.; Bawa, J.; Trenner, L.; Dorazio, P. An Introduction to Statistical Modeling of Extreme Values; Springer: Berlin/Heidelberg, Germany, 2001; Volume 208.
71. Houle, M.E. Local Intrinsic Dimensionality II: Multivariate Analysis and Distributional Support. In Proceedings of the International Conference on Similarity Search and Applications, Munich, Germany, 4–6 October 2017; pp. 80–95.
72. Song, K. Renyi information, log likelihood and an intrinsic distribution measure. J. Statist. Plann. Inference 2001, 93, 51–69.
73. Buono, F.; Longobardi, M. Varentropy of past lifetimes. arXiv 2020, arXiv:2008.07423.
74. Maadani, S.; Borzadaran, G.R.M.; Roknabadi, A.H.R. Varentropy of order statistics and some stochastic comparisons. Commun. Stat. Theory Methods 2021, 51, 6447–6460.
75. Raqab, M.Z.; Bayoud, H.A.; Qiu, G. Varentropy of inactivity time of a random variable and its related applications. IMA J. Math. Control. Inf. 2021, 39, 132–154.
76. Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
77. Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
78. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559.
79. Hellinger, E. Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. J. Für Die Reine Und Angew. Math. 1909, 136, 210–271.
80. Cichocki, A.; Amari, S. Families of Alpha- Beta- and Gamma-Divergences: Flexible and Robust Measures of Similarities. Entropy 2010, 12, 1532–1568.
81. Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1900, 50, 157–175.
82. Kantorovich, L.V. Mathematical Methods of Organizing and Planning Production. Manag. Sci. 1939, 6, 366–422.
83. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6–11 August 2017; Precup, D., Teh, Y.W., Eds.; PMLR: Cambridge, MA, USA, 2017; Volume 70, pp. 214–223.
84. Houle, M.E. Local Intrinsic Dimensionality III: Density and Similarity. In Proceedings of the International Conference on Similarity Search and Applications, Copenhagen, Denmark, 30 September–2 October 2020.
85. Itakura, F.; Saito, S. Analysis synthesis telephony based on the maximum likelihood method. In Proceedings of the 6th International Congress on Acoustics, Tokyo, Japan, 21–28 August 1968; pp. C17–C20.
86. Fevotte, C.; Bertin, N.; Durrieu, J. Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis. Neural Comput. 2009, 21, 793–830.
87. Bregman, L.M. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 1967, 7, 200–217.
88. Nielsen, F.; Nock, R. Sided and symmetrized Bregman centroids. IEEE Trans. Inf. Theory 2009, 55, 2882–2904.
89. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman Divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749.
90. Fang, K.W.; Kotz, S.; Wang Ng, K. Symmetric Multivariate and Related Distributions; CRC Press: Boca Raton, FL, USA, 2018.
91. Baker, J.A. Integration of Radial Functions. Math. Mag. 1999, 72, 392–395.

**Figure 1.** Visualization of selected measures from Table 6: (**a**) entropy behavior as the ratio $\frac{{ID}_{F}^{\ast}}{n}$ varies; (**b**) divergence/distance behavior as the ratio $\frac{{ID}_{G}^{\ast}}{{ID}_{F}^{\ast}}$ varies.

**Table 1.** Asymptotic equivalences between LID formulations and tail measures of entropy or divergence. In each case, the functions F and G are assumed to be smooth growth functions. In addition, for the Normalized Wasserstein Distance, F and G must be strictly monotonically increasing, thereby guaranteeing that the inverses of ${F}_{w}$ and ${G}_{w}$ exist near zero. In some cases, for the asymptotic limit to exist non-trivially (that is, to be both finite and non-zero), the tail entropy or tail divergence must be normalized by the multiplicative factor $\frac{1}{w}$. For the Tail Entropy and Tail Cross Entropy, no reweighting by powers of w can lead to a non-trivial asymptotic limit as w tends to zero.

Tail Measure | Formulation | Limit as $w\to 0^{+}$
---|---|---
Entropy | $\mathrm{H}(F,w) = -\int_{0}^{w} F_w'(t)\,\ln F_w'(t)\,\mathrm{d}t$ | Diverges (no reweighting possible)
Varentropy | $\mathrm{VarH}(F,w) = \int_{0}^{w} F_w'(t)\,\ln^{2} F_w'(t)\,\mathrm{d}t - \left(\int_{0}^{w} F_w'(t)\,\ln F_w'(t)\,\mathrm{d}t\right)^{2}$ | $\left(1-\frac{1}{{ID}_F^{\ast}}\right)^{2}$
q-Entropy | $\mathrm{H}_{q}(F,w) = \frac{1}{q-1}\int_{0}^{w} F_w'(t) - \left(F_w'(t)\right)^{q}\,\mathrm{d}t$ | $\frac{1}{q-1}$ if $q<1$; diverges if $q>1$
Normalized Cumulative Entropy | $\frac{1}{w}\mathrm{cH}(F,w) = -\frac{1}{w}\int_{0}^{w} F_w(t)\,\ln F_w(t)\,\mathrm{d}t$ | $\frac{{ID}_F^{\ast}}{({ID}_F^{\ast}+1)^{2}}$
Normalized Cumulative q-Entropy | $\frac{1}{w}\mathrm{cH}_{q}(F,w) = \frac{1}{w(q-1)}\int_{0}^{w} F_w(t) - \left(F_w(t)\right)^{q}\,\mathrm{d}t$ | $\frac{{ID}_F^{\ast}}{({ID}_F^{\ast}+1)(q{ID}_F^{\ast}+1)}$ if $q\ne 1$
Normalized Entropy Power | $\frac{1}{w}\mathrm{HP}(F,w) = \frac{1}{w}\exp\left(\mathrm{H}(F,w)\right)$ | $\frac{1}{{ID}_F^{\ast}}\exp\left(1-\frac{1}{{ID}_F^{\ast}}\right)$
Normalized q-Entropy Power | $\frac{1}{w}\mathrm{HP}_{q}(F,w) = \frac{1}{w}\left[1+(1-q)\,\mathrm{H}_{q}(F,w)\right]^{\frac{1}{1-q}}$ | $\left(\frac{({ID}_F^{\ast})^{q}}{q{ID}_F^{\ast}-q+1}\right)^{\frac{1}{1-q}}$ if $q\ne 1$ and $q{ID}_F^{\ast}-q+1>0$
Cross Entropy | $\mathrm{XH}(F;G,w) = -\int_{0}^{w} F_w'(t)\,\ln G_w'(t)\,\mathrm{d}t$ | Diverges (no reweighting possible)
Normalized Cross Entropy Power | $\frac{1}{w}\mathrm{XHP}(F;G,w) = \frac{1}{w}\exp\left(-\int_{0}^{w} F_w'(t)\,\ln G_w'(t)\,\mathrm{d}t\right)$ | $\frac{1}{{ID}_G^{\ast}}\exp\left(\frac{{ID}_G^{\ast}-1}{{ID}_F^{\ast}}\right)$
KL Divergence | $\mathrm{KL}(F;G,w) = \int_{0}^{w} F_w'(t)\,\ln\frac{F_w'(t)}{G_w'(t)}\,\mathrm{d}t$ | $\rho-\ln\rho-1$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
JS Divergence | $\mathrm{JS}(F;G,w) = \frac{1}{2}\left(\mathrm{KL}\left(F;\frac{F+G}{2},w\right)+\mathrm{KL}\left(G;\frac{F+G}{2},w\right)\right)$ | $\frac{1}{2}\left(\tau-\ln\tau-1\right)$; $\tau=\min\{\rho,\frac{1}{\rho}\}$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
Weighted L2 Distance | $w\,\mathrm{L2D}(F;G,w) = w\int_{0}^{w}\left(F_w'(t)-G_w'(t)\right)^{2}\,\mathrm{d}t$ | $\frac{({ID}_F^{\ast}-{ID}_G^{\ast})^{2}}{2({ID}_F^{\ast}+{ID}_G^{\ast}-1)}\left[1+\frac{1}{(2{ID}_F^{\ast}-1)(2{ID}_G^{\ast}-1)}\right]$; ${ID}_F^{\ast}>\frac{1}{2}$, ${ID}_G^{\ast}>\frac{1}{2}$
Hellinger Distance | $\mathrm{HD}(F;G,w) = \sqrt{\frac{1}{2}\int_{0}^{w}\left(\sqrt{F_w'(t)}-\sqrt{G_w'(t)}\right)^{2}\,\mathrm{d}t}$ | $\frac{\vert 1-\sqrt{\rho}\vert}{\sqrt{1+\rho}}$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
$\chi^{2}$-Divergence | $\chi^{2}\mathrm{D}(F;G,w) = \int_{0}^{w}\frac{\left(F_w'(t)-G_w'(t)\right)^{2}}{G_w'(t)}\,\mathrm{d}t$ | $\frac{(1-\rho)^{2}}{\rho(2-\rho)}$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$; $\rho<2$
$\alpha$-Divergence | $\alpha\mathrm{D}(F;G,w) = \frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\alpha F_w'(t)+(1-\alpha)G_w'(t)-\left(F_w'(t)\right)^{\alpha}\left(G_w'(t)\right)^{1-\alpha}\,\mathrm{d}t$ | $\frac{1}{\alpha(1-\alpha)}\left(1-\frac{1}{\alpha\rho^{\alpha-1}+(1-\alpha)\rho^{\alpha}}\right)$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$; $\alpha+\rho(1-\alpha)>0$
Normalized Wasserstein Distance | $\frac{1}{w}\mathrm{WD}_{p}(F;G,w) = \frac{1}{w}\left(\int_{0}^{1}\left\vert F_w^{-1}(u)-G_w^{-1}(u)\right\vert^{p}\,\mathrm{d}u\right)^{\frac{1}{p}}$ | $p=2$: $\sqrt{\frac{1}{\frac{2}{{ID}_F^{\ast}}+1}-\frac{2}{\frac{1}{{ID}_F^{\ast}}+\frac{1}{{ID}_G^{\ast}}+1}+\frac{1}{\frac{2}{{ID}_G^{\ast}}+1}}$; $p$ even: $\left(\sum_{j=0}^{p}\frac{(-1)^{j}\binom{p}{j}}{(p-j)({ID}_F^{\ast})^{-1}+j({ID}_G^{\ast})^{-1}+1}\right)^{\frac{1}{p}}$
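Several of the limits in Table 1 can be spot-checked numerically by substituting the idealized power-law tails $F_w(t)=(t/w)^{{ID}_F^{\ast}}$ and $G_w(t)=(t/w)^{{ID}_G^{\ast}}$ that arise via Lemma 1 in the derivations below. The following minimal sketch (the function name and step count are illustrative, not from the paper) integrates the KL Divergence row by the midpoint rule after the substitution $u=t/w$, and compares it with the closed form $\rho-\ln\rho-1$:

```python
import math

def tail_kl(a, b, n=200000):
    """KL(F;G,w) for idealized tails F_w(t) = (t/w)^a, G_w(t) = (t/w)^b.

    After substituting u = t/w, the integral no longer depends on w,
    so a single midpoint-rule quadrature over (0, 1) suffices.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        f = a * u ** (a - 1)          # density of the F tail at u
        g = b * u ** (b - 1)          # density of the G tail at u
        total += f * math.log(f / g) / n
    return total

a, b = 2.0, 3.0
rho = b / a
print(tail_kl(a, b))                  # numerical integral
print(rho - math.log(rho) - 1)        # closed form from Table 1
```

For these idealized tails the two printed values agree (up to quadrature error), illustrating that the KL row of Table 1 is exact in the limiting regime rather than merely approximate.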

**Table 2.** Derivations of asymptotic relationships between tail entropy variants and local intrinsic dimensionality. Each step shows the equivalences between the formulations when w is allowed to tend to zero. In the comments column, for each step of the derivation, the lemmas invoked are stated, as well as any additional assumptions made. If a normalization or other weighting is needed to avoid divergence, or convergence to a constant (independent of F), the details are shown in a comment in the final step. In all cases, F is assumed to be a smooth growth function.

Tail Measure | Derivation Steps | Comments
---|---|---
Entropy | $\mathrm{H}(F,w)$ → $-\int_{0}^{w} F_w'(t)\,\ln F_w'(t)\,\mathrm{d}t$ |
 | → $-\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}\,\ln\frac{{ID}_F(t)\,F_w(t)}{t}\,\mathrm{d}t$ | using Theorem 1
 | → $-\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\ln\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\mathrm{d}t$ | using Lemma 3
 | → $-\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln\left[\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\right]\mathrm{d}t$ | using Lemma 1
 | → $1-\frac{1}{{ID}_F^{\ast}}-\ln\frac{{ID}_F^{\ast}}{w}$ | no reweighting
Varentropy | $\mathrm{VarH}(F,w)$ → $\int_{0}^{w} F_w'(t)\,\ln^{2} F_w'(t)\,\mathrm{d}t-\left(\int_{0}^{w} F_w'(t)\,\ln F_w'(t)\,\mathrm{d}t\right)^{2}$ |
 | → $\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}\,\ln^{2}\frac{{ID}_F(t)\,F_w(t)}{t}\,\mathrm{d}t-\left(\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}\,\ln\frac{{ID}_F(t)\,F_w(t)}{t}\,\mathrm{d}t\right)^{2}$ | using Theorem 1
 | → $\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\ln^{2}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\mathrm{d}t-\left(\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\ln\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\mathrm{d}t\right)^{2}$ | using Lemma 3
 | → $\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln^{2}\left[\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\right]\mathrm{d}t-\left(\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln\left[\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\right]\mathrm{d}t\right)^{2}$ | using Lemma 1
 | → $\left(1-\frac{1}{{ID}_F^{\ast}}\right)^{2}$ |
q-Entropy | $\mathrm{H}_{q}(F,w)$ → $\frac{1}{q-1}\int_{0}^{w} F_w'(t)-\left(F_w'(t)\right)^{q}\,\mathrm{d}t$ | $q>1$
 | → $\frac{1}{q-1}\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}-\left(\frac{{ID}_F(t)\,F_w(t)}{t}\right)^{q}\,\mathrm{d}t$ | using Theorem 1
 | → $\frac{1}{q-1}\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}-\left(\frac{{ID}_F^{\ast}\,F_w(t)}{t}\right)^{q}\,\mathrm{d}t$ | using Lemma 3
 | → $\frac{1}{q-1}\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}-\left(\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\right)^{q}\,\mathrm{d}t$ | using Lemma 1
 | → $\frac{1}{q-1}\left(1-\frac{1}{w^{q-1}}\cdot\frac{({ID}_F^{\ast})^{q}}{q{ID}_F^{\ast}-q+1}\right)$ |
Cumulative Entropy | $\mathrm{cH}(F,w)$ → $-\int_{0}^{w} F_w(t)\,\ln F_w(t)\,\mathrm{d}t$ |
 | → $-\int_{0}^{w}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\mathrm{d}t$ | using Lemma 1
 | → $w\,\frac{{ID}_F^{\ast}}{({ID}_F^{\ast}+1)^{2}}$ | weight by $\frac{1}{w}$
Cumulative q-Entropy | $\mathrm{cH}_{q}(F,w)$ → $\frac{1}{q-1}\int_{0}^{w} F_w(t)-\left(F_w(t)\right)^{q}\,\mathrm{d}t$ | $q\ne 1$
 | → $\frac{1}{q-1}\int_{0}^{w}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}-\left(\frac{t}{w}\right)^{q{ID}_F^{\ast}}\mathrm{d}t$ | using Lemma 1
 | → $w\,\frac{{ID}_F^{\ast}}{({ID}_F^{\ast}+1)(q{ID}_F^{\ast}+1)}$ | weight by $\frac{1}{w}$
Entropy Power | $\mathrm{HP}(F,w)$ → $\exp\left(\mathrm{H}(F,w)\right)$ |
 | → $\exp\left(1-\frac{1}{{ID}_F^{\ast}}-\ln\frac{{ID}_F^{\ast}}{w}\right)$ | by substitution
 | → $w\,\frac{1}{{ID}_F^{\ast}}\exp\left(1-\frac{1}{{ID}_F^{\ast}}\right)$ | weight by $\frac{1}{w}$
q-Entropy Power | $\mathrm{HP}_{q}(F,w)$ → $\left[1+(1-q)\,\mathrm{H}_{q}(F,w)\right]^{\frac{1}{1-q}}$ | $q\ne 1$
 | → $\left(1+(1-q)\cdot\frac{1}{q-1}\left[1-\frac{1}{w^{q-1}}\cdot\frac{({ID}_F^{\ast})^{q}}{q{ID}_F^{\ast}-q+1}\right]\right)^{\frac{1}{1-q}}$ | by substitution
 | → $w\left(\frac{({ID}_F^{\ast})^{q}}{q{ID}_F^{\ast}-q+1}\right)^{\frac{1}{1-q}}$ | weight by $\frac{1}{w}$
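The final step of the Entropy derivation retains a $\ln\frac{{ID}_F^{\ast}}{w}$ term, which is what makes the tail entropy diverge as $w\to 0^{+}$. One can check this step numerically for the idealized tail $F_w(t)=(t/w)^{{ID}_F^{\ast}}$ of Lemma 1; the sketch below (function name and step count are illustrative) evaluates $\mathrm{H}(F,w)$ by the midpoint rule and compares it with $1-\frac{1}{{ID}_F^{\ast}}-\ln\frac{{ID}_F^{\ast}}{w}$:

```python
import math

def tail_entropy(a, w, n=200000):
    """H(F,w) = -∫_0^w F_w'(t) ln F_w'(t) dt for the idealized F_w(t) = (t/w)^a."""
    total = 0.0
    for i in range(n):
        t = w * (i + 0.5) / n
        fp = (a / t) * (t / w) ** a   # F_w'(t) = (a/t)(t/w)^a
        total += -fp * math.log(fp) * (w / n)
    return total

a, w = 3.0, 0.05
print(tail_entropy(a, w))                 # numerical integral at this w
print(1 - 1 / a - math.log(a / w))        # final step of the derivation
```

Re-running with smaller w shows the $-\ln\frac{a}{w}$ term driving both values to $-\infty$ together, consistent with the "no reweighting" comment.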

**Table 3.** Derivations of asymptotic relationships between tail divergences and local intrinsic dimensionality. Each step shows the equivalences between the formulations when w is allowed to tend to zero. In the comments column, for each step of the derivation, the lemmas invoked are stated, as well as any additional assumptions made. If a normalization or weighting is needed, the details are shown in a comment in the final step. In all cases, F and G are assumed to be smooth growth functions.

Tail Measure | Derivation Steps | Comments
---|---|---
Cross Entropy | $\mathrm{XH}(F;G,w)$ → $-\int_{0}^{w} F_w'(t)\,\ln G_w'(t)\,\mathrm{d}t$ |
 | → $-\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}\,\ln\frac{{ID}_G(t)\,G_w(t)}{t}\,\mathrm{d}t$ | using Theorem 1
 | → $-\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\ln\frac{{ID}_G^{\ast}\,G_w(t)}{t}\,\mathrm{d}t$ | using Lemma 3
 | → $-\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln\left[\frac{{ID}_G^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_G^{\ast}}\right]\mathrm{d}t$ | using Lemma 1
 | → $\frac{{ID}_G^{\ast}-1}{{ID}_F^{\ast}}-\ln\frac{{ID}_G^{\ast}}{w}$ | no reweighting
Cross Entropy Power | $\mathrm{XHP}(F;G,w)$ → $\exp\left(\mathrm{XH}(F;G,w)\right)$ |
 | → $\exp\left(\frac{{ID}_G^{\ast}-1}{{ID}_F^{\ast}}-\ln\frac{{ID}_G^{\ast}}{w}\right)$ | by substitution
 | → $w\,\frac{1}{{ID}_G^{\ast}}\exp\left(\frac{{ID}_G^{\ast}-1}{{ID}_F^{\ast}}\right)$ | weight by $\frac{1}{w}$
KL Divergence | $\mathrm{KL}(F;G,w)$ → $\int_{0}^{w} F_w'(t)\,\ln\frac{F_w'(t)}{G_w'(t)}\,\mathrm{d}t$ |
 | → $\int_{0}^{w}\frac{{ID}_F(t)\,F_w(t)}{t}\,\ln\frac{{ID}_F(t)\,F_w(t)}{{ID}_G(t)\,G_w(t)}\,\mathrm{d}t$ | using Theorem 1
 | → $\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}\,\ln\frac{{ID}_F^{\ast}\,F_w(t)}{{ID}_G^{\ast}\,G_w(t)}\,\mathrm{d}t$ | using Lemma 3
 | → $\int_{0}^{w}\frac{{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}\ln\left[\frac{{ID}_F^{\ast}}{{ID}_G^{\ast}}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}-{ID}_G^{\ast}}\right]\mathrm{d}t$ | using Lemma 1
 | → $\rho-\ln\rho-1$ | $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
JS Divergence | $\mathrm{JS}(F;G,w)$ → $\frac{1}{2}\left(\mathrm{KL}(F;M,w)+\mathrm{KL}(G;M,w)\right)$ | $M(t)=\frac{1}{2}\left(F(t)+G(t)\right)$
 | → $\frac{1}{2}\left(\frac{{ID}_M^{\ast}}{{ID}_F^{\ast}}-\ln\frac{{ID}_M^{\ast}}{{ID}_F^{\ast}}-1+\frac{{ID}_M^{\ast}}{{ID}_G^{\ast}}-\ln\frac{{ID}_M^{\ast}}{{ID}_G^{\ast}}-1\right)$ | ${ID}_M^{\ast}=\min\{{ID}_F^{\ast},{ID}_G^{\ast}\}$
 | → $\frac{1}{2}\left(\frac{{ID}_M^{\ast}}{B}+\frac{{ID}_M^{\ast}}{{ID}_M^{\ast}}-\ln\frac{{ID}_M^{\ast}}{B}-\ln\frac{{ID}_M^{\ast}}{{ID}_M^{\ast}}-2\right)$ | let $B=\max\{{ID}_F^{\ast},{ID}_G^{\ast}\}$
 | → $\frac{1}{2}\left(\tau-\ln\tau-1\right)$ | $\tau=\min\left\{\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}},\frac{{ID}_F^{\ast}}{{ID}_G^{\ast}}\right\}$
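Unlike the KL row, the JS limit depends on the mixture's LID tending to $\min\{{ID}_F^{\ast},{ID}_G^{\ast}\}$, so a numerical check must actually shrink w rather than integrate a single idealized pair. The sketch below (names and step counts are illustrative) uses the global power-law CDFs $F(t)=t^{a}$ and $G(t)=t^{b}$ on $[0,1]$, forms the mixture $M=\frac{1}{2}(F+G)$, and evaluates $\mathrm{JS}(F;G,w)$ for decreasing w against $\frac{1}{2}(\tau-\ln\tau-1)$:

```python
import math

def tail_js(a, b, w, n=100000):
    """JS(F;G,w) for F(t) = t^a, G(t) = t^b on [0,1], midpoint rule with t = w*u."""
    za, zb = w ** a, w ** b                    # F(w), G(w): conditioning masses
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        f = a * u ** (a - 1)                   # density of F_w in the variable u
        g = b * u ** (b - 1)                   # density of G_w in the variable u
        m = (za * f + zb * g) / (za + zb)      # density of the mixture M_w
        total += 0.5 * (f * math.log(f / m) + g * math.log(g / m)) / n
    return total

a, b = 1.0, 2.0
tau = min(b / a, a / b)
limit = 0.5 * (tau - math.log(tau) - 1)
for w in (1e-1, 1e-3, 1e-6):
    print(w, tail_js(a, b, w), limit)
```

As w decreases, the printed JS values approach the closed-form limit, reflecting that the lighter tail dominates the mixture locally.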

**Table 4.** Derivations of asymptotic relationships between tail distances and local intrinsic dimensionality. Each step shows the equivalences between the formulations when w is allowed to tend to zero. In the comments column, for each step of the derivation, the lemmas invoked are stated, as well as any additional assumptions made. For each tail distance, the first step of the derivations shows an expansion by which the monotonicity of each factor can be verified. If a normalization or weighting is needed, the details are shown in a comment in the final step. In all cases, F and G are assumed to be smooth growth functions.

Tail Measure | Derivation Steps | Comments
---|---|---
L2 Distance | $\mathrm{L2D}(F;G,w)$ → $\int_{0}^{w}\left(F_w'(t)-G_w'(t)\right)^{2}\,\mathrm{d}t$ |
 | → $\int_{0}^{w}\left(\frac{{ID}_F(t)\,F_w(t)}{t}-\frac{{ID}_G(t)\,G_w(t)}{t}\right)^{2}\,\mathrm{d}t$ | using Theorem 1
 | → $\int_{0}^{w}\left(\frac{{ID}_F^{\ast}\,F_w(t)}{t}\right)^{2}-2\,\frac{{ID}_F^{\ast}\,F_w(t)}{t}\cdot\frac{{ID}_G^{\ast}\,G_w(t)}{t}+\left(\frac{{ID}_G^{\ast}\,G_w(t)}{t}\right)^{2}\,\mathrm{d}t$ | using Lemma 3
 | → $\int_{0}^{w}\frac{({ID}_F^{\ast})^{2}}{t^{2}}\left(\frac{t}{w}\right)^{2{ID}_F^{\ast}}-\frac{2{ID}_F^{\ast}{ID}_G^{\ast}}{t^{2}}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}+{ID}_G^{\ast}}+\frac{({ID}_G^{\ast})^{2}}{t^{2}}\left(\frac{t}{w}\right)^{2{ID}_G^{\ast}}\,\mathrm{d}t$ | using Lemma 1
 | → $\frac{1}{w}\cdot\frac{({ID}_F^{\ast}-{ID}_G^{\ast})^{2}}{2({ID}_F^{\ast}+{ID}_G^{\ast}-1)}\left[1+\frac{1}{(2{ID}_F^{\ast}-1)(2{ID}_G^{\ast}-1)}\right]$ | weight by $w$
Hellinger Distance | $\mathrm{HD}(F;G,w)$ → $\sqrt{\frac{1}{2}\int_{0}^{w}\left(\sqrt{F_w'(t)}-\sqrt{G_w'(t)}\right)^{2}\,\mathrm{d}t}$ |
 | → $\sqrt{\frac{1}{2}\int_{0}^{w}\left(\sqrt{\frac{{ID}_F(t)\,F_w(t)}{t}}-\sqrt{\frac{{ID}_G(t)\,G_w(t)}{t}}\right)^{2}\,\mathrm{d}t}$ | using Theorem 1
 | → $\sqrt{\frac{1}{2}\int_{0}^{w}\frac{{ID}_F^{\ast}\,F_w(t)}{t}-2\,\frac{\sqrt{{ID}_F^{\ast}\,F_w(t)\cdot{ID}_G^{\ast}\,G_w(t)}}{t}+\frac{{ID}_G^{\ast}\,G_w(t)}{t}\,\mathrm{d}t}$ | using Lemma 3
 | → $\sqrt{\int_{0}^{w}\frac{1}{2t}\left({ID}_F^{\ast}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}-2\sqrt{{ID}_F^{\ast}{ID}_G^{\ast}}\left(\frac{t}{w}\right)^{({ID}_F^{\ast}+{ID}_G^{\ast})/2}+{ID}_G^{\ast}\left(\frac{t}{w}\right)^{{ID}_G^{\ast}}\right)\mathrm{d}t}$ | using Lemma 1
 | → $\frac{\vert 1-\sqrt{\rho}\vert}{\sqrt{1+\rho}}$ | $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
$\chi^{2}$-Divergence | $\chi^{2}\mathrm{D}(F;G,w)$ → $\int_{0}^{w}\frac{\left(F_w'(t)-G_w'(t)\right)^{2}}{G_w'(t)}\,\mathrm{d}t$ |
 | → $\int_{0}^{w}\left(\frac{{ID}_F(t)\,F_w(t)}{t}-\frac{{ID}_G(t)\,G_w(t)}{t}\right)^{2}\frac{t}{{ID}_G(t)\,G_w(t)}\,\mathrm{d}t$ | using Theorem 1
 | → $\int_{0}^{w}\left[\left(\frac{{ID}_F^{\ast}\,F_w(t)}{t}\right)^{2}-2\,\frac{{ID}_F^{\ast}\,F_w(t)}{t}\cdot\frac{{ID}_G^{\ast}\,G_w(t)}{t}+\left(\frac{{ID}_G^{\ast}\,G_w(t)}{t}\right)^{2}\right]\frac{t}{{ID}_G^{\ast}\,G_w(t)}\,\mathrm{d}t$ | using Lemma 3
 | → $\int_{0}^{w}\left[\frac{({ID}_F^{\ast})^{2}}{t^{2}}\left(\frac{t}{w}\right)^{2{ID}_F^{\ast}}-\frac{2{ID}_F^{\ast}{ID}_G^{\ast}}{t^{2}}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}+{ID}_G^{\ast}}+\frac{({ID}_G^{\ast})^{2}}{t^{2}}\left(\frac{t}{w}\right)^{2{ID}_G^{\ast}}\right]\frac{t}{{ID}_G^{\ast}}\left(\frac{w}{t}\right)^{{ID}_G^{\ast}}\,\mathrm{d}t$ | using Lemma 1
 | → $\frac{(1-\rho)^{2}}{\rho(2-\rho)}$ | $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
$\alpha$-Divergence | $\alpha\mathrm{D}(F;G,w)$ → $\frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\alpha F_w'(t)+(1-\alpha)G_w'(t)-\left(F_w'(t)\right)^{\alpha}\left(G_w'(t)\right)^{1-\alpha}\,\mathrm{d}t$ |
 | → $\frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\frac{\alpha\,{ID}_F(t)\,F_w(t)}{t}+\frac{(1-\alpha)\,{ID}_G(t)\,G_w(t)}{t}-\left(\frac{{ID}_F(t)\,F_w(t)}{t}\right)^{\alpha}\left(\frac{{ID}_G(t)\,G_w(t)}{t}\right)^{1-\alpha}\,\mathrm{d}t$ | using Theorem 1
 | → $\frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\frac{\alpha\,{ID}_F^{\ast}\,F_w(t)}{t}+\frac{(1-\alpha)\,{ID}_G^{\ast}\,G_w(t)}{t}-\left(\frac{{ID}_F^{\ast}\,F_w(t)}{t}\right)^{\alpha}\left(\frac{{ID}_G^{\ast}\,G_w(t)}{t}\right)^{1-\alpha}\,\mathrm{d}t$ | using Lemma 3
 | → $\frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\frac{\alpha\,{ID}_F^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_F^{\ast}}+\frac{(1-\alpha)\,{ID}_G^{\ast}}{t}\left(\frac{t}{w}\right)^{{ID}_G^{\ast}}-\frac{({ID}_F^{\ast})^{\alpha}({ID}_G^{\ast})^{1-\alpha}}{t}\left(\frac{t}{w}\right)^{\alpha{ID}_F^{\ast}+(1-\alpha){ID}_G^{\ast}}\,\mathrm{d}t$ | using Lemma 1
 | → $\frac{1}{\alpha(1-\alpha)}\left(1-\frac{({ID}_F^{\ast})^{\alpha}({ID}_G^{\ast})^{1-\alpha}}{\alpha{ID}_F^{\ast}+(1-\alpha){ID}_G^{\ast}}\right)$ |
 | → $\frac{1}{\alpha(1-\alpha)}\left(1-\frac{1}{\alpha\rho^{\alpha-1}+(1-\alpha)\rho^{\alpha}}\right)$ | $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
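The Hellinger row of Table 4 can likewise be verified numerically under the idealized tails $F_w(t)=(t/w)^{a}$, $G_w(t)=(t/w)^{b}$ from Lemma 1, for which the closed form $\frac{\vert 1-\sqrt{\rho}\vert}{\sqrt{1+\rho}}$ holds exactly. A minimal sketch (names and step counts are illustrative):

```python
import math

def tail_hellinger(a, b, n=200000):
    """HD(F;G,w) for idealized tails F_w(t) = (t/w)^a, G_w(t) = (t/w)^b (u = t/w)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        f = a * u ** (a - 1)
        g = b * u ** (b - 1)
        total += 0.5 * (math.sqrt(f) - math.sqrt(g)) ** 2 / n
    return math.sqrt(total)

a, b = 1.0, 4.0
rho = b / a
print(tail_hellinger(a, b))                            # numerical integral
print(abs(1 - math.sqrt(rho)) / math.sqrt(1 + rho))    # closed form from Table 4
```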

**Table 5.** Derivations of asymptotic relationships between tail Wasserstein distances and local intrinsic dimensionality. Each step shows the equivalences between the formulations when w is allowed to tend to zero. In the comments column, for each step of the derivation, the lemmas invoked are stated, as well as any additional assumptions made. Normalization details are shown in a comment in the final step. In all cases, F and G are assumed to be invertible smooth growth functions.

Tail Measure | Derivation Steps | Comments
---|---|---
Wasserstein Distance, $p=2$ | $\mathrm{WD}_{2}(F;G,w)$ → $\sqrt{\int_{0}^{1}\left(F_w^{-1}(u)-G_w^{-1}(u)\right)^{2}\,\mathrm{d}u}$ |
 | → $\sqrt{\int_{0}^{1}\left(F_w^{-1}(u)\right)^{2}-2F_w^{-1}(u)\cdot G_w^{-1}(u)+\left(G_w^{-1}(u)\right)^{2}\,\mathrm{d}u}$ |
 | → $\sqrt{\int_{0}^{1}w^{2}u^{\frac{2}{{ID}_F^{\ast}}}-2w^{2}u^{\frac{1}{{ID}_F^{\ast}}+\frac{1}{{ID}_G^{\ast}}}+w^{2}u^{\frac{2}{{ID}_G^{\ast}}}\,\mathrm{d}u}$ | using Lemma 2
 | → $w\sqrt{\frac{1}{\frac{2}{{ID}_F^{\ast}}+1}-\frac{2}{\frac{1}{{ID}_F^{\ast}}+\frac{1}{{ID}_G^{\ast}}+1}+\frac{1}{\frac{2}{{ID}_G^{\ast}}+1}}$ | weight by $\frac{1}{w}$
Wasserstein Distance, $p\in\mathbb{N}$, $p$ even | $\mathrm{WD}_{p}(F;G,w)$ → $\left(\int_{0}^{1}\left(F_w^{-1}(u)-G_w^{-1}(u)\right)^{p}\,\mathrm{d}u\right)^{\frac{1}{p}}$ |
 | → $\left(\int_{0}^{1}\sum_{j=0}^{p}(-1)^{j}\binom{p}{j}\left(F_w^{-1}(u)\right)^{p-j}\left(G_w^{-1}(u)\right)^{j}\,\mathrm{d}u\right)^{\frac{1}{p}}$ |
 | → $\left(\int_{0}^{1}\sum_{j=0}^{p}(-1)^{j}\binom{p}{j}\left(wu^{\frac{1}{{ID}_F^{\ast}}}\right)^{p-j}\left(wu^{\frac{1}{{ID}_G^{\ast}}}\right)^{j}\,\mathrm{d}u\right)^{\frac{1}{p}}$ | using Lemma 2
 | → $w\left(\sum_{j=0}^{p}\frac{(-1)^{j}\binom{p}{j}}{(p-j)\cdot({ID}_F^{\ast})^{-1}+j\cdot({ID}_G^{\ast})^{-1}+1}\right)^{\frac{1}{p}}$ | weight by $\frac{1}{w}$
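Since Lemma 2 yields the idealized quantile functions $F_w^{-1}(u)=w\,u^{1/{ID}_F^{\ast}}$ and $G_w^{-1}(u)=w\,u^{1/{ID}_G^{\ast}}$, the normalized $p=2$ Wasserstein limit can be checked with a one-dimensional quadrature over $u$. A minimal sketch (names and step counts are illustrative):

```python
import math

def tail_wasserstein2(a, b, n=200000):
    """(1/w)*WD_2(F;G,w) for idealized quantiles w*u^(1/a) and w*u^(1/b).

    The factor w cancels, so the result is w-independent.
    """
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        total += (u ** (1 / a) - u ** (1 / b)) ** 2 / n
    return math.sqrt(total)

a, b = 2.0, 5.0
closed = math.sqrt(1 / (2 / a + 1) - 2 / (1 / a + 1 / b + 1) + 1 / (2 / b + 1))
print(tail_wasserstein2(a, b))   # numerical integral
print(closed)                    # closed form from Table 5
```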

**Table 6.** Asymptotic equivalences between LID formulations and tail measures of entropy or divergence for locally spherically symmetric distributions in the n-dimensional Euclidean setting. In each case, the density functions are assumed to be f and g, and the CDFs F and G of their induced distance distributions are assumed to be smooth growth functions. In the results, ${V}_{n}\left(r\right)$ and ${S}_{n-1}\left(r\right)$ denote the volume and surface area of the n-dimensional ball with radius r (respectively). In some cases, for the asymptotic limit to exist non-trivially (that is, to be both finite and non-zero), the tail entropy or tail divergence must be normalized by some multiplicative factor dependent on the tail volume ${V}_{n}\left(w\right)$.

Tail Measure | Formulation | Limit as $w\to 0^{+}$
---|---|---
Entropy | $\mathrm{H}(f,w) = -\int_{\mathcal{B}(w)} f_w\,\ln f_w\,\mathrm{d}\mathcal{B}(w) = -\int_{0}^{w} F_w'(t)\,\ln\frac{F_w'(t)}{S_{n-1}(t)}\,\mathrm{d}t$ | Diverges (no reweighting possible)
Varentropy | $\mathrm{VarH}(f,w) = \int_{\mathcal{B}(w)} f_w\,\ln^{2} f_w\,\mathrm{d}\mathcal{B}(w)-\left(\int_{\mathcal{B}(w)} f_w\,\ln f_w\,\mathrm{d}\mathcal{B}(w)\right)^{2} = \int_{0}^{w} F_w'(t)\,\ln^{2}\frac{F_w'(t)}{S_{n-1}(t)}\,\mathrm{d}t-\left(\int_{0}^{w} F_w'(t)\,\ln\frac{F_w'(t)}{S_{n-1}(t)}\,\mathrm{d}t\right)^{2}$ | $\left(1-\frac{1}{\phi}\right)^{2}$; $\phi=\frac{{ID}_F^{\ast}}{n}$
q-Entropy | $\mathrm{H}_{q}(f,w) = \frac{1}{q-1}\int_{\mathcal{B}(w)} f_w-f_w^{q}\,\mathrm{d}\mathcal{B}(w) = \frac{1}{q-1}\int_{0}^{w} F_w'(t)-\frac{\left(F_w'(t)\right)^{q}}{\left(S_{n-1}(t)\right)^{q-1}}\,\mathrm{d}t$ | $\frac{1}{q-1}$ if $q<1$; diverges if $q>1$
Normalized Cumulative Entropy | $\frac{1}{V_{n}(w)}\mathrm{cH}(f,w) = -\frac{1}{V_{n}(w)}\int_{\mathbf{x}\in\mathcal{B}(w)} F_w(\parallel\mathbf{x}\parallel)\,\ln F_w(\parallel\mathbf{x}\parallel)\,\mathrm{d}\mathcal{B}(w) = -\frac{1}{V_{n}(w)}\int_{0}^{w}\left(F_w(t)\,\ln F_w(t)\right)\cdot S_{n-1}(t)\,\mathrm{d}t$ | $\frac{\phi}{(\phi+1)^{2}}$; $\phi=\frac{{ID}_F^{\ast}}{n}$
Normalized Cumulative q-Entropy | $\frac{1}{V_{n}(w)}\mathrm{cH}_{q}(f,w) = \frac{1}{V_{n}(w)}\cdot\frac{1}{q-1}\int_{\mathbf{x}\in\mathcal{B}(w)} F_w(\parallel\mathbf{x}\parallel)-\left(F_w(\parallel\mathbf{x}\parallel)\right)^{q}\,\mathrm{d}\mathcal{B}(w) = \frac{1}{V_{n}(w)}\cdot\frac{1}{q-1}\int_{0}^{w}\left(F_w(t)-\left(F_w(t)\right)^{q}\right)\cdot S_{n-1}(t)\,\mathrm{d}t$ | $\frac{\phi}{(q\phi+1)(\phi+1)}$ if $q\ne 1$; $\phi=\frac{{ID}_F^{\ast}}{n}$
Normalized Entropy Power | $\frac{1}{V_{n}(w)}\mathrm{HP}(f,w) = \frac{1}{V_{n}(w)}\exp\left(\mathrm{H}(f,w)\right)$ | $\frac{1}{\phi}\exp\left(1-\frac{1}{\phi}\right)$; $\phi=\frac{{ID}_F^{\ast}}{n}$
Normalized q-Entropy Power | $\frac{1}{V_{n}(w)}\mathrm{HP}_{q}(f,w) = \frac{1}{V_{n}(w)}\left[1+(1-q)\,\mathrm{H}_{q}(f,w)\right]^{\frac{1}{1-q}}$ | $\left(\frac{\phi^{q}}{q\phi-q+1}\right)^{\frac{1}{1-q}}$ if $q\ne 1$ and $q\phi-q+1>0$; $\phi=\frac{{ID}_F^{\ast}}{n}$
Cross Entropy | $\mathrm{XH}(f;g,w) = -\int_{\mathcal{B}(w)} f_w\,\ln g_w\,\mathrm{d}\mathcal{B}(w) = -\int_{0}^{w} F_w'(t)\,\ln\frac{G_w'(t)}{S_{n-1}(t)}\,\mathrm{d}t$ | Diverges (no reweighting possible)
Normalized Cross Entropy Power | $\frac{1}{V_{n}(w)}\mathrm{XHP}(f;g,w) = \frac{1}{V_{n}(w)}\exp\left(\mathrm{XH}(f;g,w)\right)$ | $\frac{1}{\gamma}\exp\left(\frac{\gamma-1}{\phi}\right)$; $\phi=\frac{{ID}_F^{\ast}}{n}$; $\gamma=\frac{{ID}_G^{\ast}}{n}$
Weighted L2 Distance | $V_{n}(w)\cdot\mathrm{L2D}(f;g,w) = V_{n}(w)\int_{\mathcal{B}(w)}\left(f_w-g_w\right)^{2}\,\mathrm{d}\mathcal{B}(w) = V_{n}(w)\int_{0}^{w}\frac{1}{S_{n-1}(t)}\left(F_w'(t)-G_w'(t)\right)^{2}\,\mathrm{d}t$ | $\frac{(\phi-\gamma)^{2}}{2(\phi+\gamma-1)}\left[1+\frac{1}{(2\phi-1)(2\gamma-1)}\right]$; $\phi=\frac{{ID}_F^{\ast}}{n}$; $\gamma=\frac{{ID}_G^{\ast}}{n}$; $\phi>\frac{1}{2}$; $\gamma>\frac{1}{2}$
Hellinger Distance | $\mathrm{HD}(f;g,w) = \sqrt{\frac{1}{2}\int_{\mathcal{B}(w)}\left(\sqrt{f_w}-\sqrt{g_w}\right)^{2}\,\mathrm{d}\mathcal{B}(w)} = \sqrt{\frac{1}{2}\int_{0}^{w}\left(\sqrt{F_w'(t)}-\sqrt{G_w'(t)}\right)^{2}\,\mathrm{d}t}$ | $\frac{\vert 1-\sqrt{\rho}\vert}{\sqrt{1+\rho}}$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$
$\chi^{2}$-Divergence | $\chi^{2}\mathrm{D}(f;g,w) = \int_{\mathcal{B}(w)}\frac{\left(f_w-g_w\right)^{2}}{g_w}\,\mathrm{d}\mathcal{B}(w) = \int_{0}^{w}\frac{\left(F_w'(t)-G_w'(t)\right)^{2}}{G_w'(t)}\,\mathrm{d}t$ | $\frac{(1-\rho)^{2}}{\rho(2-\rho)}$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$; $\rho<2$
$\alpha$-Divergence | $\alpha\mathrm{D}(f;g,w) = \frac{1}{\alpha(1-\alpha)}\int_{\mathcal{B}(w)}\alpha f_w+(1-\alpha)g_w-f_w^{\alpha}g_w^{1-\alpha}\,\mathrm{d}\mathcal{B}(w) = \frac{1}{\alpha(1-\alpha)}\int_{0}^{w}\alpha F_w'(t)+(1-\alpha)G_w'(t)-\left(F_w'(t)\right)^{\alpha}\left(G_w'(t)\right)^{1-\alpha}\,\mathrm{d}t$ | $\frac{1}{\alpha(1-\alpha)}\left(1-\frac{1}{\alpha\rho^{\alpha-1}+(1-\alpha)\rho^{\alpha}}\right)$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$; requires $\alpha+\rho(1-\alpha)>0$
KL Divergence | $\mathrm{KL}(f;g,w) = \int_{\mathcal{B}(w)} f_w\,\ln\frac{f_w}{g_w}\,\mathrm{d}\mathcal{B}(w) = \int_{0}^{w} F_w'(t)\,\ln\frac{F_w'(t)}{G_w'(t)}\,\mathrm{d}t$ | $\rho-\ln\rho-1$; $\rho=\frac{{ID}_G^{\ast}}{{ID}_F^{\ast}}$

JS Divergence | $\mathrm{JS}(f;g,w)$ = $\frac{1}{2}\left(\mathrm{KL}\left(f;\frac{f+g}{2},w\right)+\mathrm{KL}\left(g;\frac{f+g}{2},w\right)\right)$ | $\frac{\tau -ln\tau -1}{2};\phantom{\rule{0.222222em}{0ex}}\phantom{\rule{0.222222em}{0ex}}\tau =min\{\rho ,\frac{1}{\rho}\};\phantom{\rule{0.222222em}{0ex}}\phantom{\rule{0.222222em}{0ex}}\rho =\frac{{ID}_{G}^{\ast}}{{ID}_{F}^{\ast}}$ |
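The KL row of the table can be sanity-checked numerically. For idealized tail distributions of the form $F_w(t)=(t/w)^{ID_F^{\ast}}$, the KL divergence between the induced densities equals $\rho-\ln\rho-1$ exactly, for any $w$. The sketch below is illustrative only: the LID values $ID_F^{\ast}=3$ and $ID_G^{\ast}=2$ are arbitrary choices, and `tail_density` and `kl_tail` are hypothetical helper names, not functions from the paper.

```python
import math

def tail_density(t, w, lid):
    # Density induced by the idealized tail CDF F_w(t) = (t/w)^lid on (0, w]
    return lid * t ** (lid - 1) / w ** lid

def kl_tail(w, lid_f, lid_g, n=200_000):
    # Midpoint-rule quadrature of KL(f; g, w) = \int_0^w f(t) ln(f(t)/g(t)) dt
    h = w / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        f = tail_density(t, w, lid_f)
        g = tail_density(t, w, lid_g)
        total += f * math.log(f / g) * h
    return total

lid_f, lid_g = 3.0, 2.0
rho = lid_g / lid_f
closed_form = rho - math.log(rho) - 1          # limit value from the table
numeric = kl_tail(w=0.1, lid_f=lid_f, lid_g=lid_g)
print(closed_form, numeric)                    # the two values agree closely
```

For these idealized tail distributions the agreement does not depend on $w$, which reflects the fact that the limit values in the table are attained exactly by the LID-determined surrogate distributions.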


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bailey, J.; Houle, M.E.; Ma, X. Local Intrinsic Dimensionality, Entropy and Statistical Divergences. *Entropy* **2022**, *24*, 1220.
https://doi.org/10.3390/e24091220
