Abstract
In this work, we first consider the discrete version of the Fisher information measure and then propose the Jensen–Fisher information measure, along with some associated results. Next, we consider the Fisher information and Bayes–Fisher information measures for the mixing parameter vector of a finite mixture probability mass function and establish some results. We provide some connections between these measures and some known informational measures, such as the chi-square divergence, Shannon entropy, and the Kullback–Leibler, Jeffreys and Jensen–Shannon divergences.
1. Introduction
Over the last seven decades, several different criteria have been introduced in the literature for measuring uncertainty in a probabilistic model. Shannon entropy and Fisher information are the most important information measures that have been used rather extensively. Information theory started with Shannon entropy, introduced in the pioneering work of Shannon [1], based on a study of systems described by probability density (or mass) functions. About two decades earlier, Fisher [2] had proposed another information measure, describing the interior properties of a probabilistic model, that plays an important role in likelihood-based inferential methods. Fisher information and Shannon entropy are fundamental criteria in statistical inference, physics, thermodynamics and information theory. Complex systems can thus be described both by their behavior (Shannon information) and by their architecture (Fisher information). For more discussions, see Zegers [3] and Balakrishnan and Stepanov [4].
Let X be a discrete random variable with probability mass function (PMF) P. Then, the Shannon entropy of the random variable X is defined as
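In standard form, writing p(x) for the probability that X takes the value x, this is
\[
H(X) = -\sum_{x} p(x)\,\log p(x),
\]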
where “log” denotes the natural logarithm. For more details, see Shannon [1]. Following the work of Shannon [1], considerable attention has been paid to providing extensions of Shannon entropy. Jensen–Shannon (JS) divergence is one such important extension of Shannon entropy that has been widely used; see Lin [5]. The Jensen–Shannon divergence between two probability mass functions P and Q is defined as
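In its common equal-weight form, with H(·) now denoting the Shannon entropy of the indicated PMF, this can be written as
\[
\mathrm{JS}(P,Q) = H\!\left(\tfrac{P+Q}{2}\right) - \tfrac{1}{2}H(P) - \tfrac{1}{2}H(Q),
\]
where (P + Q)/2 is the equal-weight mixture of P and Q; weighted versions, as in Lin [5], replace the weights 1/2 by arbitrary non-negative weights summing to one.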
The JS divergence is a smoothed and symmetric version of the most important divergence measure of information theory, namely, the Kullback–Leibler divergence. Recently, Jensen–Fisher (JF) and Jensen–Gini (JG) divergence measures have been introduced by Sánchez-Moreno et al. [6] and Mehrali et al. [7], respectively.
In the present paper, motivated by the idea of JS divergence, we consider discrete versions of the Fisher information (DFI) and the Fisher information distance (DFID), and then develop a new information measure associated with the DFI measure. In addition, we provide some results for the Fisher information of a finite mixture probability mass function through a Bayesian perspective. The discrete Fisher information of a random variable X with PMF P is defined as
with .
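A standard discrete analogue of the Fisher information, in line with the discrete form discussed by Johnson [11] and used in the wavelet Fisher literature [9,10], and stated here in the assumed notation p(x) for the PMF of X, is
\[
I(X) = \sum_{x}\frac{\bigl(p(x+1)-p(x)\bigr)^{2}}{p(x)}
     = \sum_{x} p(x)\left(\frac{p(x+1)}{p(x)}-1\right)^{2},
\]
with the sum taken over the support of P.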
The Fisher information in (1) has been used in the processing of complex and stationary signals. For example, the discrete version of Fisher information has been used in detecting epileptic seizures in EEG signals recorded in humans and turtles, in detecting dynamical changes in many non-linear models such as the logistic map and the Lorenz model, and also in the analysis of geoelectrical signals; see Martin et al. [8], Ramírez-Pacheco et al. [9] and Ramírez-Pacheco et al. [10] for pertinent details.
The discrete Fisher information distance (DFID) between two probability mass functions P and Q is defined as
where, as above, . For some of its properties, one may refer to Ramírez-Pacheco et al. [10] and Johnson [11].
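A natural candidate for this distance, mirroring the continuous Fisher divergence and written in the notation assumed above, is
\[
D(P,Q) = \sum_{x} p(x)\left(\frac{p(x+1)}{p(x)}-\frac{q(x+1)}{q(x)}\right)^{2},
\]
with p and q denoting the mass functions of P and Q, respectively.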
With regard to informational properties of finite mixture models, one may refer to Contreras-Reyes and Cortés [12] and Abid et al. [13]. These authors have provided upper and lower bounds for the Shannon and Rényi entropies of finite mixtures of multivariate skew-normal and skew-t distributions, respectively. Kolchinsky and Tracey [14] have studied upper and lower bounds for the entropy of Gaussian mixture distributions using the Bhattacharyya and Kullback–Leibler divergences.
The first purpose of this paper is to propose the Jensen–Fisher information measure for a collection of discrete random variables with probability mass functions P_1, …, P_n. For this purpose, we first define the discrete version of Jensen–Fisher information for two PMFs P and Q, and then provide some results concerning this new information measure. This idea is then extended to the general case of n PMFs.
The second purpose of this work is to study the Fisher and Bayes–Fisher information measures for the mixing parameter of a finite mixture probability mass function. Let P_1, …, P_n be n probability mass functions. Then, a finite mixture probability mass function, indexed by the vector of mixing weights, is given as follows.
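In the notation assumed here, with the mixing weights collected in a vector θ, the mixture takes the form
\[
P_{\theta}(x)=\sum_{i=1}^{n}\theta_{i}\,P_{i}(x),\qquad \theta_{i}\ge 0,\quad \sum_{i=1}^{n}\theta_{i}=1,
\]
so that the last weight is determined by the first n − 1 components.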
Let X and Y be two discrete random variables with PMFs P and Q, respectively. Then, the Kullback–Leibler (KL) distance between X and Y (or between P and Q) is defined as
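In terms of the mass functions p and q of P and Q, this is the standard expression
\[
\mathrm{KL}(P,Q)=\sum_{x}p(x)\log\frac{p(x)}{q(x)}.
\]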
The Kullback–Leibler discrimination between Y and X can be defined similarly. For more details, see Kullback and Leibler [15]. The chi-square divergence between the PMFs P and Q is defined by
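In the same notation, the usual form is
\[
\chi^{2}(P,Q)=\sum_{x}\frac{\bigl(p(x)-q(x)\bigr)^{2}}{q(x)}=\sum_{x}\frac{p^{2}(x)}{q(x)}-1,
\]
although the roles of p and q in the denominator vary across references.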
For pertinent details, see Broniatowski [16] and Cover and Thomas [17].
The rest of this paper is organized as follows. In Section 2, we first consider the discrete version of Fisher information and then propose the discrete Jensen–Fisher information (DJFI) measure. We show that the DJFI measure can be represented as a mixture of discrete Fisher information distance measures. In Section 3, we consider a finite mixture probability mass function and establish some results for the Fisher information measure of the mixing parameter vector. We show that the Fisher information of the mixing parameter vector is connected to the chi-square divergence. Next, in Section 4, we discuss the Bayes–Fisher information for the mixing parameter vector of a finite mixture of probability mass functions under some prior distributions for the mixing parameter. We then show that this measure is connected to the Shannon entropy, Jensen–Shannon entropy, Kullback–Leibler and Jeffreys divergence measures. Finally, we present some concluding remarks in Section 5.
2. Discrete Version of Jensen–Fisher Information
In this section, we first give a result for the DFI measure based on the log-convexity and log-concavity properties of the probability mass function. Then, we define the discrete Jensen–Fisher information measure and establish some of its properties.
Theorem 1.
Let P be a probability mass function.
- (i)
- If P is log-concave, then
- (ii)
- If P is log-convex, then
Proof.
The PMF P is log-convex (log-concave) if p(x+1)^2 ≤ (≥) p(x) p(x+2) for all x in its support, that is, if the ratio p(x+1)/p(x) is non-decreasing (non-increasing) in x. So, from the definition of DFI in (1), we have
□
2.1. Discrete Jensen–Fisher Information Based on Two Probability Mass Functions P and Q
We first define a symmetric version of the DFID measure in (2), and then propose the discrete Jensen–Fisher information measure involving two probability mass functions.
Definition 1.
Let P and Q be two probability mass functions. Then, a symmetric version of the discrete Fisher information distance in (2) is defined as
Definition 2.
Let P and Q be two probability mass functions. Then, the discrete Jensen–Fisher information is defined as
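By analogy with the Jensen–Shannon divergence and with the (continuous) Jensen–Fisher divergence of Sánchez-Moreno et al. [6], a natural candidate expression, in the notation assumed here with I(·) denoting the DFI of the indicated PMF, is
\[
\mathrm{JF}(P,Q)=\tfrac{1}{2}\,I(P)+\tfrac{1}{2}\,I(Q)-I\!\left(\tfrac{P+Q}{2}\right),
\]
that is, the average of the discrete Fisher informations of P and Q minus the discrete Fisher information of their equal-weight mixture; since I(·) is convex in its argument, this quantity is non-negative.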
In the following theorem, we show that the discrete Jensen–Fisher information measure can be expressed as a mixture of Fisher information distance measures.
Theorem 2.
Let P and Q be two probability mass functions. Then,
Proof.
From the definition of DFID in (2), we get
In a similar way, we get
Upon adding the above two expressions, we obtain
as required. □
Example 1.
Let
and
The corresponding PMFs of the variables X and Y are denoted by P and Q, respectively. From Theorem 2, we then have
A 3D-plot of this is presented in Figure 1.
Figure 1. 3D-plot of the DJFI divergence between the PMFs P and Q.
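As an illustration of how such quantities can be evaluated numerically, the following minimal Python sketch computes the discrete Fisher information, the Fisher-type distance and the equal-weight Jensen–Fisher combination in the forms assumed above, for two toy truncated-geometric PMFs; the PMFs and function names here are illustrative and are not those of Example 1.

```python
import numpy as np

def dfi(p):
    """Discrete Fisher information: sum_x (p(x+1) - p(x))^2 / p(x)."""
    p = np.asarray(p, dtype=float)
    return np.sum((p[1:] - p[:-1]) ** 2 / p[:-1])

def dfid(p, q):
    """Fisher-type distance: sum_x p(x) * (p(x+1)/p(x) - q(x+1)/q(x))^2."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p[:-1] * (p[1:] / p[:-1] - q[1:] / q[:-1]) ** 2)

def jensen_fisher(p, q):
    """Equal-weight Jensen-Fisher combination: (I(P) + I(Q))/2 - I((P+Q)/2)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * dfi(p) + 0.5 * dfi(q) - dfi(0.5 * (p + q))

# Two toy geometric-type PMFs truncated to {0, ..., 20} and renormalized.
x = np.arange(21)
p = 0.3 * 0.7 ** x
p /= p.sum()
q = 0.5 * 0.5 ** x
q /= q.sum()

# The Jensen-Fisher value is non-negative, by convexity of the DFI functional.
print(dfi(p), dfi(q), dfid(p, q), jensen_fisher(p, q))
```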
2.2. Discrete Jensen–Fisher Information Based on n Probability Mass Functions
Let P_1, …, P_n be n probability mass functions. In the following definition, we extend the discrete Jensen–Fisher information measure in (4) to the case of n probability mass functions.
Definition 3.
Let P_1, …, P_n be n probability mass functions, and consider non-negative real weights summing to one. Then, the discrete Jensen–Fisher information (DJFI) based on the n probability mass functions is defined as
where
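By analogy with the two-distribution case, and writing a_1, …, a_n for the non-negative weights of Definition 3 (the symbols are assumed here), a natural weighted form is
\[
\mathrm{JF}_{a}(P_{1},\dots,P_{n})=\sum_{i=1}^{n}a_{i}\,I(P_{i})-I\!\left(\sum_{i=1}^{n}a_{i}P_{i}\right),
\]
in which the second term is the discrete Fisher information of the corresponding weighted (mixture) PMF.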
Theorem 3.
Let P_1, …, P_n be n probability mass functions. Then, the DJFI measure in (5) can be expressed as a mixture of the DFID measures in (2) as follows:
where the weighted PMF is the corresponding mixture of P_1, …, P_n formed with the given weights.
Proof.
From the definition in (5), we get
as required. □
3. Fisher Information of a Finite Mixture Probability Mass Function
In this section, we discuss the Fisher information for the mixing parameter vector of a finite mixture probability mass function.
Theorem 4.
Proof.
From the definition of Fisher information in (1) and for , we have
where the third equation follows from the fact that, for
□
4. Bayes–Fisher Information of a Finite Mixture Probability Mass Function
In this section, we discuss Bayes–Fisher information for the mixing parameter vector of the finite mixture probability mass function in (3) under some prior distributions for the mixing parameter vector. We now introduce two notations that will be used in the sequel. Consider the parameter vector , and then define and
Theorem 5.
The Bayes–Fisher information for parameter , of the finite mixture PMF in (3), under the uniform prior on is given by
where with
and with
and J corresponds to Jeffreys’ divergence.
Proof.
Theorem 6.
For the mixture model with PMF in (3), we have the following:
- (i)
- The Bayes–Fisher information for , under prior with PMF , is
- (ii)
- The Bayes–Fisher information for parameter , under prior with PMF , is
Proof.
By definition, and from (7), for , we have
as required for Part (i). Part (ii) can be proved in an analogous manner. □
Let us now consider the following general triangular prior for the parameter :
for some
Theorem 7.
Proof.
From the assumptions made, for , we have
as required. □
5. Concluding Remarks
In this paper, we have introduced the discrete version of the Jensen–Fisher information measure and have shown that this information measure can be expressed as a mixture of discrete Fisher information distance measures. Further, we have considered a finite mixture probability mass function and have derived the Fisher information and Bayes–Fisher information for the mixing parameter vector. We have shown that the Fisher information for the mixing parameter is connected to the chi-square divergence. We have also studied the Bayes–Fisher information for the mixing parameter of a finite mixture model under some prior distributions. These results have provided connections between the Bayes–Fisher information and some known informational measures, such as the Shannon entropy, Kullback–Leibler, Jeffreys and Jensen–Shannon divergence measures.
Author Contributions
All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data sharing not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Fisher, R.A. Tests of significance in harmonic analysis. Proc. R. Soc. Lond. A Math. Phys. Sci. 1929, 125, 54–59. [Google Scholar]
- Zegers, P. Fisher information properties. Entropy 2015, 17, 4918–4939. [Google Scholar] [CrossRef]
- Balakrishnan, N.; Stepanov, A. On the Fisher information in record data. Stat. Probab. Lett. 2006, 76, 537–545. [Google Scholar] [CrossRef]
- Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151. [Google Scholar] [CrossRef]
- Sánchez-Moreno, P.; Zarzo, A.; Dehesa, J.S. Jensen divergence based on Fisher’s information. J. Phys. A Math. Theor. 2012, 45, 125305. [Google Scholar] [CrossRef]
- Mehrali, Y.; Asadi, M.; Kharazmi, O. A Jensen-Gini measure of divergence with application in parameter estimation. Metron 2018, 76, 115–131. [Google Scholar] [CrossRef]
- Martin, M.T.; Pennini, F.; Plastino, A. Fisher’s information and the analysis of complex signals. Phys. Lett. A 1999, 256, 173–180. [Google Scholar] [CrossRef]
- Ramírez-Pacheco, J.; Torres-Román, D.; Rizo-Dominguez, L.; Trejo-Sanchez, J.; Manzano-Pinzón, F. Wavelet Fisher’s information measure of 1/f^α signals. Entropy 2011, 13, 1648–1663. [Google Scholar] [CrossRef]
- Ramírez-Pacheco, J.; Torres-Román, D.; Argaez-Xool, J.; Rizo-Dominguez, L.; Trejo-Sanchez, J.; Manzano-Pinzón, F. Wavelet q-Fisher information for scaling signal analysis. Entropy 2012, 14, 1478–1500. [Google Scholar] [CrossRef]
- Johnson, O. Information Theory and the Central Limit Theorem; World Scientific Publishers: Singapore, 2004. [Google Scholar]
- Contreras-Reyes, J.E.; Cortés, D.D. Bounds on Rényi and Shannon entropies for finite mixtures of multivariate skew-normal distributions: Application to swordfish (Xiphias gladius Linnaeus). Entropy 2016, 18, 382. [Google Scholar] [CrossRef]
- Abid, S.H.; Quaez, U.J.; Contreras-Reyes, J.E. An information-theoretic approach for multivariate skew-t distributions and applications. Mathematics 2021, 9, 146. [Google Scholar] [CrossRef]
- Kolchinsky, A.; Tracey, B.D. Estimating mixture entropy with pairwise distances. Entropy 2017, 19, 361. [Google Scholar] [CrossRef]
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
- Broniatowski, M. Minimum divergence estimators, Maximum likelihood and the generalized bootstrap. Entropy 2021, 23, 185. [Google Scholar] [CrossRef] [PubMed]
- Cover, T.; Thomas, J. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).