Article

On the α-q-Mutual Information and the α-q-Capacities

by Velimir M. Ilić 1,* and Ivan B. Djordjević 2
1 Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, 11000 Beograd, Serbia
2 Department of Electrical and Computer Engineering, University of Arizona, 1230 E. Speedway Blvd., Tucson, AZ 85721, USA
* Author to whom correspondence should be addressed.
Entropy 2021, 23(6), 702; https://doi.org/10.3390/e23060702
Submission received: 12 February 2021 / Revised: 19 May 2021 / Accepted: 26 May 2021 / Published: 1 June 2021
(This article belongs to the Special Issue The Statistical Foundations of Entropy)

Abstract

Measures of information transfer which correspond to non-additive entropies have been intensively studied in previous decades. The majority of this work concerns measures belonging to the Sharma–Mittal entropy class, such as the Rényi, the Tsallis, the Landsberg–Vedral and the Gaussian entropies. All of the considerations follow the same approach, mimicking some of the various and mutually equivalent definitions of Shannon information measures: the information transfer is quantified by an appropriately defined measure of mutual information, while the maximal information transfer is considered as a generalized channel capacity. However, all of the previous approaches fail to satisfy at least one of the ineluctable properties which a measure of (maximal) information transfer should satisfy, leading to counterintuitive conclusions and predicting nonphysical behavior even in the case of very simple communication channels. This paper fills the gap by proposing two-parameter measures named the α-q-mutual information and the α-q-capacity. In addition to the standard Shannon approach, special cases of these measures include the α-mutual information and the α-capacity, which are well established in the information theory literature as measures of additive Rényi information transfer, while the cases of the Tsallis, the Landsberg–Vedral and the Gaussian entropies can also be accessed by special choices of the parameters α and q. It is shown that, unlike the previous definitions, the α-q-mutual information and the α-q-capacity satisfy a set of properties, stated as axioms, by which they reduce to zero in the case of totally destructive channels and to the (maximal) input Sharma–Mittal entropy in the case of perfect transmission, which is consistent with the maximum likelihood detection error. In addition, they are non-negative and bounded above by the input and the output Sharma–Mittal entropies in general. Thus, unlike the previous approaches, the proposed (maximal) information transfer measures do not exhibit nonphysical behaviors such as sub-capacitance or super-capacitance, which qualifies them as appropriate measures of Sharma–Mittal information transfer.

1. Introduction

In the past, extensive work has been devoted to defining information measures which generalize the Shannon entropy [1], such as the one-parameter Rényi entropy [2], the Tsallis entropy [3], the Landsberg–Vedral entropy [4], the Gaussian entropy [5], and the two-parameter Sharma–Mittal entropy [5,6], which reduces to the former ones for special choices of the parameters. The Sharma–Mittal entropy can axiomatically be founded as the unique q-additive measure [7,8] which satisfies the generalized Shannon–Khinchin axioms [9,10], and it has been widely explored in research fields ranging from statistics [11] and thermodynamics [12,13] to quantum mechanics [14,15], machine learning [16,17] and cosmology [18,19]. The Sharma–Mittal entropy has also been recognized in the field of information theory, where the measures of conditional Sharma–Mittal entropy [20], Sharma–Mittal divergences [21] and the Sharma–Mittal entropy rate [22] have been established and analyzed.
Considerable research has also been done in the field of communication theory in order to analyze information transmission in the presence of noise when, instead of Shannon's entropy, the information is quantified with (instances of) the Sharma–Mittal entropy; in general, the information transfer is quantified by an appropriately defined measure of mutual information, while the maximal information transfer is considered as a generalized channel capacity. Thus, after Rényi's proposal for the additive generalization of Shannon entropy [2], several different definitions for the Rényi information transfer were proposed by Sibson [23], Arimoto [24], Augustin [25], Csiszár [26], Lapidoth and Pfister [27] and Tomamichel and Hayashi [28]. These measures have been explored thoroughly and their operational characterization in coding theory, hypothesis testing, cryptography and quantum information theory has been established, which qualifies them as reasonable measures of Rényi information transfer [29]. Similar attempts have also been made in the case of non-additive entropies. Thus, starting from the work of Daróczy [30], who introduced a measure of generalized information transfer related to the Tsallis entropy, several attempts followed for the measures which correspond to non-additive particular instances of the Sharma–Mittal entropy, so definitions for the Rényi information transfer were considered in [24,31], for the Tsallis information transfer in [32] and for the Landsberg–Vedral information transfer in [4,33].
In this paper we provide a general treatment of the Sharma–Mittal entropy transfer and a detailed analysis of the existing measures, showing that all of the definitions related to non-additive entropies fail to satisfy at least one of the ineluctable properties common to the Shannon case, which we state as axioms, by which the information transfer has to be non-negative, less than the input and output uncertainty, equal to the input uncertainty in the case of perfect transmission and equal to zero in the case of a totally destructive channel. Breaking some of these axioms implies unexpected and counterintuitive conclusions about the channels, such as achieving super-capacitance or sub-capacitance [4], which could be treated as nonphysical behavior. As an alternative, we propose the α-q-mutual information as a measure of Sharma–Mittal information transfer, whose maximum defines the α-q-capacity. The α-q-mutual information is defined as the q-difference between the input Sharma–Mittal entropy and an appropriately defined conditional Sharma–Mittal entropy given the output, and it generalizes the α-mutual information of Arimoto [24], while the α-q-capacity represents a generalization of Arimoto's α-capacity, which is recovered in the case of q = 1. In addition, several other instances can be obtained by specifying the values of the parameters α and q, which include the information transfer measures for the Tsallis, the Landsberg–Vedral and the Shannon entropies, as well as the case of the Gaussian entropy, which was not considered before in the context of information transmission.
The paper is organized as follows. The basic properties and special instances of the Sharma–Mittal entropy are listed in Section 2. Section 3 reviews the basics of communication theory, introduces the basic communication channels and establishes the set of axioms which information transfer measures should satisfy. The information transfer measures defined by Arimoto are introduced in Section 4, and the alternative definitions of Rényi information transfer measures are discussed in Section 5. Finally, the α-q-mutual information and the α-q-capacities are proposed and their properties analyzed in Section 6, while the previously proposed measures of Sharma–Mittal entropy transfer are discussed in Section 7.

2. Sharma–Mittal Entropy

Let the sets of positive and nonnegative real numbers be denoted with R₊ and R₀₊, respectively, and let the mapping η_q : R → R be defined as
$$\eta_q(x) = \begin{cases} x, & \text{for } q = 1 \\ \dfrac{2^{(1-q)x} - 1}{(1-q)\ln 2}, & \text{for } q \neq 1, \end{cases}$$
so that its inverse is given by
$$\eta_q^{-1}(x) = \begin{cases} x, & \text{for } q = 1 \\ \dfrac{1}{1-q}\log\big((1-q)\,x\ln 2 + 1\big), & \text{for } q \neq 1. \end{cases}$$
The mapping η_q and its inverse are increasing continuous (hence invertible) functions such that η_q(0) = 0. The q-logarithm is defined as
$$\mathrm{Log}_q(x) = \eta_q(\log x) = \begin{cases} \log x, & \text{for } q = 1 \\ \dfrac{x^{1-q} - 1}{(1-q)\ln 2}, & \text{for } q \neq 1, \end{cases}$$
and its inverse, the q-exponential, is defined as
$$\mathrm{Exp}_q(y) = \begin{cases} 2^y, & \text{for } q = 1 \\ \big[1 + (1-q)\,y\ln 2\big]^{\frac{1}{1-q}}, & \text{for } q \neq 1, \end{cases}$$
for 1 + (1−q) y ln 2 > 0. Using η_q, we can define the pseudo-addition operation ⊕_q [7,8],
$$x \oplus_q y = \eta_q\big(\eta_q^{-1}(x) + \eta_q^{-1}(y)\big) = x + y + (1-q)\,x\,y\ln 2; \quad x, y \in \mathbb{R},$$
and its inverse operation, the pseudo-subtraction,
$$x \ominus_q y = \eta_q\big(\eta_q^{-1}(x) - \eta_q^{-1}(y)\big) = \frac{x - y}{1 + (1-q)\,y\ln 2}; \quad x, y \in \mathbb{R}.$$
The operation ⊕_q can be rewritten in terms of the generalized logarithm by setting x = log u and y = log v, so that
$$\mathrm{Log}_q(u \cdot v) = \mathrm{Log}_q(u) \oplus_q \mathrm{Log}_q(v); \quad u, v \in \mathbb{R}_+.$$
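The q-deformed calculus above lends itself to a direct numerical check. The following is a minimal sketch, not taken from the paper; the helper names (eta_q, log_q, oplus_q, and so on) are illustrative assumptions. It verifies the inverse relations and the identity Log_q(uv) = Log_q(u) ⊕_q Log_q(v).

```python
# Minimal numerical sketch (not from the paper) of the q-deformed calculus:
# eta_q, its inverse, Log_q, Exp_q, pseudo-addition and pseudo-subtraction.
import numpy as np

LN2 = np.log(2.0)

def eta_q(x, q):
    return x if q == 1 else (2.0**((1 - q) * x) - 1) / ((1 - q) * LN2)

def eta_q_inv(x, q):
    return x if q == 1 else np.log2((1 - q) * x * LN2 + 1) / (1 - q)

def log_q(x, q):
    return eta_q(np.log2(x), q)                      # Log_q(x) = eta_q(log x)

def exp_q(y, q):
    return 2.0**y if q == 1 else (1 + (1 - q) * y * LN2)**(1 / (1 - q))

def oplus_q(x, y, q):                                # x (+)_q y
    return x + y + (1 - q) * x * y * LN2

def ominus_q(x, y, q):                               # x (-)_q y
    return (x - y) / (1 + (1 - q) * y * LN2)

q, u, v = 0.7, 3.0, 5.0
# closed form of (+)_q agrees with eta_q(eta_q^{-1}(x) + eta_q^{-1}(y))
assert np.isclose(oplus_q(1.2, 0.4, q), eta_q(eta_q_inv(1.2, q) + eta_q_inv(0.4, q), q))
# Log_q(u v) = Log_q(u) (+)_q Log_q(v), and Exp_q inverts Log_q
assert np.isclose(log_q(u * v, q), oplus_q(log_q(u, q), log_q(v, q), q))
assert np.isclose(exp_q(log_q(u, q), q), u)
# (-)_q undoes (+)_q
assert np.isclose(ominus_q(oplus_q(1.2, 0.4, q), 0.4, q), 1.2)
```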
Let the set of all n-dimensional distributions be denoted with
$$\Delta_n \equiv \left\{ (p_1, \ldots, p_n) \,\middle|\, p_i \geq 0,\ \sum_{i=1}^{n} p_i = 1 \right\}; \quad n > 1.$$
Let the function H_n : Δ_n → R₀₊ satisfy the following generalized Shannon–Khinchin axioms, for all n ∈ N, n > 1.
GSK1
H n is continuous in Δ n ;
GSK2
H_n takes its largest value for the uniform distribution, U_n = (1/n, …, 1/n) ∈ Δ_n, i.e., H_n(P) ≤ H_n(U_n), for any P ∈ Δ_n;
GSK3
H_n is expandable: H_{n+1}(p_1, p_2, …, p_n, 0) = H_n(p_1, p_2, …, p_n) for all (p_1, …, p_n) ∈ Δ_n;
GSK4
Let P = (p_1, …, p_n) ∈ Δ_n, PQ = (r_{11}, r_{12}, …, r_{nm}) ∈ Δ_{nm}, n, m ∈ N, n, m > 1, such that p_i = Σ_{j=1}^m r_{ij}, and Q|k = (q_{1|k}, …, q_{m|k}) ∈ Δ_m, where q_{j|k} = r_{kj}/p_k, and α ∈ R₀₊ is a fixed parameter. Then,
$$H_{nm}(PQ) = H_n(P) \oplus_q H_m(Q \mid P), \quad \text{where} \quad H_m(Q \mid P) = f^{-1}\!\left(\sum_{k=1}^{n} p_k^{(\alpha)}\, f\big(H_m(Q|k)\big)\right),$$
where f is an invertible continuous function and P^{(α)} = (p_1^{(α)}, …, p_n^{(α)}) ∈ Δ_n is the α-escort distribution of the distribution P ∈ Δ_n, defined as
$$p_k^{(\alpha)} = \frac{p_k^\alpha}{\sum_{i=1}^{n} p_i^\alpha}, \quad k = 1, \ldots, n, \quad \alpha > 0.$$
GSK5
$$H_2\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \mathrm{Log}_q(2).$$
As shown in [9], the unique function H_n which satisfies [GSK1]–[GSK5] is the Sharma–Mittal entropy [6].
In the following paragraphs we will assume that X and Y are discrete jointly distributed random variables taking values from the sample spaces {x_1, …, x_n} and {y_1, …, y_m}, and distributed in accordance with P_X ∈ Δ_n and P_Y ∈ Δ_m, respectively. In addition, the joint distribution of X and Y will be denoted by P_{X,Y} ∈ Δ_{nm}, and the conditional distribution of X given Y by P_{X|Y}(x|y) = P_{X,Y}(x,y)/P_Y(y), provided that P_Y(y) > 0. We will identify the entropy of a random variable X with the entropy of its distribution P_X, and the Sharma–Mittal entropy will be denoted with H_{α,q}(X) ≡ H_n(P_X).
Thus, for a random variable X distributed in accordance with P_X, the Sharma–Mittal entropy can be expressed as
$$H_{\alpha,q}(X) = \frac{1}{(1-q)\ln 2}\left[\left(\sum_x P_X(x)^\alpha\right)^{\frac{1-q}{1-\alpha}} - 1\right],$$
and it can equivalently be expressed as the η_q transformation of the Rényi entropy,
$$H_{\alpha,q}(X) \equiv \eta_q\big(R_\alpha(X)\big).$$
The Sharma–Mittal entropy is defined for α, q ∈ R₀₊ \ {1}, it is a continuous function of the parameters, and the sums run over the support of P_X. Thus, in the case of q = 1, α ≠ 1, the Sharma–Mittal entropy reduces to the Rényi entropy of order α [2],
$$R_\alpha(X) \equiv H_{\alpha,1}(X) = \frac{1}{1-\alpha}\log \sum_x P_X(x)^\alpha,$$
which further reduces to the Shannon entropy for α = q = 1 [34],
$$S(X) \equiv H_{1,1}(X) = -\sum_x P_X(x)\log P_X(x),$$
while in the case of q ≠ 1, α = 1 it reduces to the Gaussian entropy [5],
$$G_q(X) \equiv H_{1,q}(X) = \frac{1}{(1-q)\ln 2}\left[\prod_{i=1}^{n} P_X(x_i)^{-(1-q)P_X(x_i)} - 1\right].$$
In addition, the Tsallis entropy [3] is obtained for α = q ≠ 1,
$$T_q(X) \equiv \frac{1}{(1-q)\ln 2}\left[\sum_x P_X(x)^q - 1\right],$$
while in the case of q = 2 − α it reduces to the Landsberg–Vedral entropy [4],
$$L_\alpha(X) \equiv H_{\alpha,2-\alpha}(X) = \frac{1}{(\alpha-1)\ln 2}\left[\frac{1}{\sum_x P_X(x)^\alpha} - 1\right].$$
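As a numerical illustration of the reductions above, the following sketch computes the Sharma–Mittal entropy as η_q(R_α) and checks that it matches the Rényi, Tsallis and Landsberg–Vedral expressions for a sample distribution; all quantities are in bits. The helper names are assumptions made for illustration, not the authors' code.

```python
# Sketch of the Sharma-Mittal entropy H_{alpha,q} and its limiting cases.
import numpy as np

LN2 = np.log(2.0)

def renyi(p, alpha):
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))                 # Shannon entropy
    return np.log2(np.sum(p**alpha)) / (1 - alpha)

def sharma_mittal(p, alpha, q):
    r = renyi(p, alpha)
    if np.isclose(q, 1.0):
        return r
    return (2.0**((1 - q) * r) - 1) / ((1 - q) * LN2)  # eta_q(R_alpha)

p = np.array([0.5, 0.3, 0.2])
# Reductions: Renyi (q = 1), Tsallis (q = alpha), Landsberg-Vedral (q = 2 - alpha)
assert np.isclose(sharma_mittal(p, 1.7, 1.0), renyi(p, 1.7))
tsallis = (np.sum(p**1.7) - 1) / ((1 - 1.7) * LN2)
assert np.isclose(sharma_mittal(p, 1.7, 1.7), tsallis)
lv = (1 / np.sum(p**1.7) - 1) / ((1.7 - 1) * LN2)
assert np.isclose(sharma_mittal(p, 1.7, 2 - 1.7), lv)
```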

3. Sharma–Mittal Information Transfer Axioms

One of the main goals of information and communication theories is characterization and analysis of the information transfer between sender X and receiver Y, which communicate through a channel. The sender and receiver are described by probability distributions P X and P Y while the communication channel with the input X and the output Y is described by the transition matrix P Y | X :
$$P_{Y|X}(i,j) \equiv P_{Y|X}(y_j \mid x_i).$$
We assume that maximum likelihood detection is performed at the receiver, which is defined by the mapping d : {y_1, …, y_m} → {x_1, …, x_n} as follows:
$$d(y_j) = x_i \quad \Leftrightarrow \quad P_{Y|X}(y_j \mid x_i) > P_{Y|X}(y_j \mid x_k); \quad \text{for all } k \neq i,$$
assuming that the inequality in (19) is uniquely satisfied. Thus, if the input symbol x_i is sent and the output symbol y_j is received, x_i will be detected if x_i = d(y_j), and a detection error will be made otherwise. Accordingly, we define the error function φ : {x_1, …, x_n} × {y_1, …, y_m} → {0, 1} as
$$\phi(x_i, y_j) = \begin{cases} 0, & \text{if } x_i = d(y_j) \\ 1, & \text{otherwise}, \end{cases}$$
the detection error probability if a symbol x_i is sent,
$$P_{err}(x_i) = \sum_{y_j} P_{Y|X}(y_j \mid x_i)\, \phi(x_i, y_j); \quad \text{for all } x_i,$$
as well as the average detection error probability
$$\bar{P}_{err} = \sum_{x_i} P_X(x_i)\, P_{err}(x_i) = \sum_{x_i, y_j} P_{X,Y}(x_i, y_j)\, \phi(x_i, y_j).$$
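A small sketch of the maximum likelihood detector and the error probabilities defined above, for an arbitrary discrete channel; the function names and the example channel are assumptions made for illustration.

```python
# Sketch of the ML detector d(y) and the detection error probabilities.
import numpy as np

def ml_detector(W):
    # W[i, j] = P(y_j | x_i); d(y_j) = argmax_i W[i, j], assumed unique
    return np.argmax(W, axis=0)

def error_probabilities(Px, W):
    d = ml_detector(W)
    n, m = W.shape
    phi = np.ones((n, m))
    phi[d, np.arange(m)] = 0.0          # phi(x_i, y_j) = 0 iff x_i = d(y_j)
    Perr_given_x = np.sum(W * phi, axis=1)
    Pxy = Px[:, None] * W               # joint distribution P_{X,Y}
    return Perr_given_x, np.sum(Pxy * phi)

Px = np.array([0.4, 0.6])
W = np.array([[0.9, 0.1],
              [0.2, 0.8]])              # a noisy 2x2 channel
per_input, avg = error_probabilities(Px, W)
print(per_input, avg)                   # [0.1, 0.2], 0.4*0.1 + 0.6*0.2 = 0.16
```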
Totally destructive channel: A channel is said to be totally destructive if
$$P_{Y|X}(i,j) = P_{Y|X}(y_j \mid x_i) = P_Y(y_j) = \frac{1}{m}; \quad \text{for all } x_i,$$
i.e., if the sender X and receiver Y are described by independent random variables,
$$X \perp Y \quad \Leftrightarrow \quad P_{X,Y}(x,y) = P_X(x)\, P_Y(y),$$
where the relationship of independence is denoted by ⊥. In this case, φ(x_i, y_j) = 1 for all y_j, the probability of error is P_err(x_i) = 1 for all x_i, and the average probability of error is P̄_err = 1, which means that correct maximum likelihood detection is not possible.
Perfect communication channel: A channel is said to be perfect if for every x_i,
$$P_{Y|X}(y_j \mid x_i) > 0, \quad \text{for at least one } y_j,$$
and for every y_j,
$$P_{Y|X}(y_j \mid x_i) > 0, \quad \text{for exactly one } x_i.$$
Note that in this case P_{Y|X}(y_j|x_i) can still take a zero value for some y_j, and that φ(x_i, y_j) = 0 for any non-zero P_{Y|X}(y_j|x_i). Thus, the error probability is equal to zero, P_err(x_i) = 0 for all x_i, as well as the average probability of error, P̄_err = 0, which means that perfect detection is possible by means of a maximum likelihood detector.
Noisy channel with non-overlapping outputs: A simple example of a perfect transmission channel is the noisy channel with non-overlapping outputs (NOC), which is schematically described in Figure 1. It is a 2-input, m = 2k-output channel (k ∈ N) defined by the transition matrix
$$P_{Y|X} = \begin{bmatrix} P_{Y|X}(\cdot \mid x_1) \\ P_{Y|X}(\cdot \mid x_2) \end{bmatrix} = \begin{bmatrix} \frac{1}{k} & \cdots & \frac{1}{k} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \frac{1}{k} & \cdots & \frac{1}{k} \end{bmatrix}$$
(in this and in the following matrices, the symbol "⋯" stands for the k-time repetition). In the case of k = 1 and m = 2k = 2, the channel reduces to the noiseless channel. Although the channel is noisy, the input can always be recovered from the output (if y_j is received and j ≤ k, the input symbol x_1 was sent; otherwise x_2 was sent). Thus, it is expected that the information which is passed through the channel is equal to the information that can be generated by the input. Note that for a channel input distributed in accordance with
$$P_X = \begin{bmatrix} P_X(x_1) \\ P_X(x_2) \end{bmatrix} = \begin{bmatrix} a \\ 1-a \end{bmatrix}; \quad 0 \leq a \leq 1,$$
the joint probability distribution P_{X,Y} can be expressed as
$$P_{X,Y} = \begin{bmatrix} \frac{a}{k} & \cdots & \frac{a}{k} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & \frac{1-a}{k} & \cdots & \frac{1-a}{k} \end{bmatrix},$$
and the output distribution P_Y, which can be obtained by summation over the columns, is
$$P_Y = \big[P_Y(y_1), \ldots, P_Y(y_m)\big]^T = \left[\frac{a}{k}, \ldots, \frac{a}{k}, \frac{1-a}{k}, \ldots, \frac{1-a}{k}\right]^T.$$
Binary symmetric channel: The binary symmetric channel (BSC) is a two-input, two-output channel described by the transition matrix
$$P_{Y|X} = \begin{bmatrix} P_{Y|X}(\cdot \mid x_1)^T \\ P_{Y|X}(\cdot \mid x_2)^T \end{bmatrix} = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix},$$
which is schematically described in Figure 2. Note that for p = 1/2 the BSC reduces to a totally destructive channel, while in the case of p = 0 it reduces to a perfect channel.
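The two reference channels can be generated programmatically, which is convenient for the later numerical illustrations; the following sketch (assumed helper names) builds the NOC transition matrix (27) and the BSC matrix and checks the perfect and totally destructive classifications.

```python
# Sketch of the noisy channel with non-overlapping outputs (27) and the BSC.
import numpy as np

def noc_channel(k):
    # 2 inputs, m = 2k outputs; rows are P(. | x_1) and P(. | x_2)
    row1 = np.concatenate([np.full(k, 1.0 / k), np.zeros(k)])
    row2 = np.concatenate([np.zeros(k), np.full(k, 1.0 / k)])
    return np.vstack([row1, row2])

def bsc_channel(p):
    return np.array([[1 - p, p], [p, 1 - p]])

# Perfect channel: every output column has exactly one nonzero entry
W_noc = noc_channel(3)
assert np.all(np.count_nonzero(W_noc, axis=0) == 1)

# BSC with p = 1/2 is totally destructive (all entries equal 1/m = 1/2)
W_bad = bsc_channel(0.5)
assert np.allclose(W_bad, 0.5)

# Output distribution of the NOC for input P_X = [a, 1 - a]
a = 0.3
Py = np.array([a, 1 - a]) @ W_noc       # [a/k]*k followed by [(1-a)/k]*k
assert np.allclose(Py, np.concatenate([np.full(3, a / 3), np.full(3, (1 - a) / 3)]))
```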

Sharma–Mittal Information Transfer Axioms

In this paper, we search for information theoretical measures of the information transfer between sender X and receiver Y, which communicate through a channel, if the information is measured with the Sharma–Mittal entropy. Thus, we are interested in the information transfer measure I_{α,q}(X, Y), which is called the α-q-mutual information, and its maximum,
$$C_{\alpha,q}(P_{Y|X}) = \max_{P_X} I_{\alpha,q}(X, Y),$$
which is called the α-q-capacity and which is required to satisfy the following set of axioms.
(A1)
The channel cannot convey negative information, i.e.,
$$C_{\alpha,q}(P_{Y|X}) \geq I_{\alpha,q}(X,Y) \geq 0.$$
(A2)
The information transfer is zero in the case of a totally destructive channel, i.e.,
$$P_{Y|X}(y \mid x) = \frac{1}{m}, \ \text{for all } x, y \quad \Rightarrow \quad I_{\alpha,q}(X,Y) = C_{\alpha,q}(P_{Y|X}) = 0,$$
which is consistent with the conclusion that the average probability of error is one, P ¯ e r r = 1 , in the case of a totally destructive channel.
(A3)
In the case of perfect transmission, the information transfer is equal to the input information, i.e.,
$$X = Y \quad \Rightarrow \quad I_{\alpha,q}(X,Y) = H_{\alpha,q}(X), \quad C_{\alpha,q}(P_{Y|X}) = \mathrm{Log}_q\, n,$$
which is consistent with the conclusion that the average probability of error is zero, P ¯ e r r = 0 , in the case of a perfect transmission channel, so that all the information from the input is conveyed.
(A4)
The channel cannot transfer more information than it is possible to be sent, i.e.,
$$I_{\alpha,q}(X,Y) \leq C_{\alpha,q}(P_{Y|X}) \leq \mathrm{Log}_q\, n,$$
which means that a channel cannot add additional information.
(A5)
The channel cannot transfer more information than it is possible to be received, i.e.,
$$I_{\alpha,q}(X,Y) \leq C_{\alpha,q}(P_{Y|X}) \leq \mathrm{Log}_q\, m,$$
which means that a channel cannot add additional information.
(A6)
Consistency with the Shannon case:
$$\lim_{q \to 1,\, \alpha \to 1} I_{\alpha,q}(X,Y) = I(X,Y), \quad \text{and} \quad \lim_{q \to 1,\, \alpha \to 1} C_{\alpha,q}(P_{Y|X}) = C(P_{Y|X}).$$
Thus, the axioms (Ā2) and (Ā3) ensure that the information measures are consistent with the maximum likelihood detection (19)–(21). On the other hand, the axioms (Ā1), (Ā4) and (Ā5) prevent situations in which a physical system conveys information in spite of going through a completely destructive channel, or in which negative information transfer is observed, indicating that the channel adds or removes information by itself, which could be treated as nonphysical behavior without an intuitive explanation. Finally, the property (Ā6) ensures that the information transfer measures can be considered as generalizations of the corresponding Shannon measures. For these reasons, we assume that the satisfaction of the properties (Ā1)–(Ā5) is mandatory for any reasonable definition of Sharma–Mittal information transfer measures.

4. The α-Mutual Information and the α-Capacity

One of the first proposals for the Rényi mutual information goes back to Arimoto [24], who considered the following definition of mutual information:
$$I_\alpha(X,Y) = \frac{\alpha}{\alpha-1}\log\sum_y\left[\sum_x P_X^{(\alpha)}(x)\, P_{Y|X}^\alpha(y \mid x)\right]^{\frac{1}{\alpha}},$$
where the escort distribution P X ( α ) is defined as in (10), and he also invented an iterative algorithm for the computation of the α -capacity [35], which is defined from the α -mutual information:
$$C_\alpha(P_{Y|X}) = \max_{P_X} I_\alpha(X,Y).$$
Notably, Arimoto’s mutual information can equivalently be represented using the conditional Rényi entropy
$$R_\alpha(X \mid Y) = \frac{\alpha}{1-\alpha}\log_2\sum_y P_Y(y)\left[\sum_x P_{X|Y=y}(x)^\alpha\right]^{\frac{1}{\alpha}},$$
as in
$$I_\alpha(X,Y) \equiv R_\alpha(X) - R_\alpha(X \mid Y),$$
which can be interpreted as the input uncertainty reduction after the output symbols are received; in the case of α → 1, the previous definition reduces to the Shannon case. In addition, this measure is directly related to the famous Gallager exponent
$$E_0(\rho, P_X) = -\log\sum_y\left[\sum_x P_X(x)\, P_{Y|X}^{\frac{1}{1+\rho}}(y \mid x)\right]^{1+\rho},$$
which has been widely used to establish the upper bound of error probability in channel coded communication systems [36] via the relationship [29]
$$I_\alpha(X,Y) = \frac{\alpha}{1-\alpha}\, E_0\!\left(\frac{1}{\alpha} - 1,\ P_X^{(\alpha)}\right).$$
In addition, in the case of α → 1, it reduces to
$$I_1(X,Y) = \lim_{\alpha \to 1} I_\alpha(X,Y) = I(X,Y),$$
where
$$I(X,Y) = \sum_{x,y} P_{X,Y}(x,y)\log\frac{P_{X,Y}(x,y)}{P_X(x)\, P_Y(y)}$$
stands for Shannon’s mutual information [37].
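The equivalence between the closed form and the difference form R_α(X) − R_α(X|Y) of Arimoto's measure can be checked numerically. The sketch below uses assumed helper names and a toy binary channel; it is an illustration, not the authors' implementation.

```python
# Sketch comparing two representations of Arimoto's alpha-mutual information.
import numpy as np

def renyi(p, alpha):
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    return np.log2(np.sum(p**alpha)) / (1 - alpha)

def arimoto_cond_renyi(Px, W, alpha):
    # R_alpha(X|Y) = alpha/(1-alpha) * log2 sum_y P_Y(y) ||P_{X|Y=y}||_alpha
    Pxy = Px[:, None] * W
    Py = Pxy.sum(axis=0)
    Px_given_y = Pxy / Py
    norms = np.sum(Px_given_y**alpha, axis=0)**(1 / alpha)
    return alpha / (1 - alpha) * np.log2(np.sum(Py * norms))

def arimoto_mi(Px, W, alpha):
    # closed form with the alpha-escort input distribution
    escort = Px**alpha / np.sum(Px**alpha)
    inner = np.sum(escort[:, None] * W**alpha, axis=0)**(1 / alpha)
    return alpha / (alpha - 1) * np.log2(np.sum(inner))

Px = np.array([0.3, 0.7])
W = np.array([[0.8, 0.2], [0.1, 0.9]])             # a toy binary channel
for alpha in (0.5, 2.0, 5.0):
    closed = arimoto_mi(Px, W, alpha)
    diff = renyi(Px, alpha) - arimoto_cond_renyi(Px, W, alpha)
    assert np.isclose(closed, diff)
```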
The α-mutual information I_α(X,Y) and the α-capacity C_α(P_{Y|X}) satisfy the axioms (Ā1)–(Ā6) for q = 1 and α > 0, as stated by the following theorem, which further justifies their usage as measures of (maximal) information transfer.
Theorem 1.
The mutual information measures I α and C α satisfy the following set of properties:
(A1)
The channel cannot convey negative information, i.e.,
$$C_\alpha(P_{Y|X}) \geq I_\alpha(X,Y) \geq 0.$$
(A2)
The (maximal) information transfer is zero in the case of a totally destructive channel, i.e.,
$$P_{Y|X}(y \mid x) = \frac{1}{m}, \ \text{for all } x, y \quad \Rightarrow \quad I_\alpha(X,Y) = C_\alpha(P_{Y|X}) = 0.$$
(A3)
In the case of perfect transmission, the (maximal) information transfer is equal to the (maximal) input information, i.e.,
$$X = Y \quad \Rightarrow \quad I_\alpha(X,Y) = R_\alpha(X), \quad C_\alpha(P_{Y|X}) = \log n.$$
(A4)
The channel cannot transfer more information than it is possible to be sent, i.e.,
$$I_\alpha(X,Y) \leq C_\alpha(P_{Y|X}) \leq \log n;$$
(A5)
The channel cannot transfer more information than it is possible to be received, i.e.,
$$I_\alpha(X,Y) \leq C_\alpha(P_{Y|X}) \leq \log m.$$
(A6)
Consistency with the Shannon case:
$$\lim_{\alpha \to 1} I_\alpha(X,Y) = I(X,Y), \quad \text{and} \quad \lim_{\alpha \to 1} C_\alpha(P_{Y|X}) = C(P_{Y|X}).$$
Proof. 
As shown in [38], R_α(X|Y) ≤ R_α(X), and the nonnegativity property (A1) follows from the definition of Arimoto's mutual information (42). In addition, if X ⊥ Y, then P_{Y|X}(y|x) = P_Y(y), so that the definition (61) implies the property (A2). Furthermore, in the case of a perfect transmission channel, the mutual information (61) can be represented as
$$I_\alpha(X,Y) = \frac{\alpha}{\alpha-1}\log\frac{\sum_y\left[\sum_x P_X(x)^\alpha\, P_{Y|X}^\alpha(y \mid x)\right]^{\frac{1}{\alpha}}}{\left[\sum_x P_X(x)^\alpha\right]^{\frac{1}{\alpha}}} = \frac{\alpha}{\alpha-1}\log\frac{\sum_y\left[P_X(d(y))^\alpha\, P_{Y|X}^\alpha(y \mid d(y))\right]^{\frac{1}{\alpha}}}{\left[\sum_x P_X(x)^\alpha\right]^{\frac{1}{\alpha}}},$$
and since
$$\sum_y\left[P_X(d(y))^\alpha\, P_{Y|X}^\alpha(y \mid d(y))\right]^{\frac{1}{\alpha}} = \sum_y P_X(d(y))\, P_{Y|X}(y \mid d(y)) = \sum_x \sum_{y:\, d(y)=x} P_X(d(y))\, P_{Y|X}(y \mid d(y)) = \sum_x P_X(x) \sum_{y:\, d(y)=x} P_{Y|X}(y \mid x) = 1,$$
we obtain I_α(X,Y) = R_α(X), which is maximized by the uniform input distribution, proving the property (A3). Moreover, as shown in [38], Arimoto's conditional entropy is nonnegative and satisfies the weak chain rule R_α(X|Y) ≥ R_α(X) − log m, so that the properties (A4) and (A5) follow from the definition of Arimoto's mutual information (42). Finally, the property (A6) follows directly from the equation (45) and can be verified using L'Hôpital's rule, which completes the proof of the theorem. □
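The properties (A1)–(A5) can also be probed numerically. The following self-contained sketch (assumed names, toy channels) repeats the Arimoto computation and checks the bounds on random channels as well as the totally destructive and perfect cases; it is a sanity check, not a proof.

```python
# Monte-Carlo sanity check of (A1)-(A5) for Arimoto's alpha-mutual information.
import numpy as np

def renyi(p, alpha):
    p = p[p > 0]
    return np.log2(np.sum(p**alpha)) / (1 - alpha)     # alpha != 1 only

def arimoto_mi(Px, W, alpha):
    escort = Px**alpha / np.sum(Px**alpha)
    inner = np.sum(escort[:, None] * W**alpha, axis=0)**(1 / alpha)
    return alpha / (alpha - 1) * np.log2(np.sum(inner))

rng = np.random.default_rng(0)
alpha, n, m = 2.5, 2, 3
for _ in range(1000):
    Px = rng.dirichlet(np.ones(n))
    W = rng.dirichlet(np.ones(m), size=n)              # random n x m channel
    mi = arimoto_mi(Px, W, alpha)
    assert -1e-9 <= mi <= np.log2(n) + 1e-9            # (A1) and (A4)
    assert mi <= np.log2(m) + 1e-9                     # (A5)

Px = np.array([0.3, 0.7])
assert np.isclose(arimoto_mi(Px, np.full((2, 2), 0.5), alpha), 0.0)    # (A2)
assert np.isclose(arimoto_mi(Px, np.eye(2), alpha), renyi(Px, alpha))  # (A3)
```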

5. Alternative Definitions of the α-Mutual Information and the α-Channel Capacity

Since Rényi’s proposal, there have been several lines of research to find an appropriate definition and characterization of information transfer measures related to Rényi entropy, which are established by the substitution of the Rényi divergence measure
$$D_\alpha(P \,\|\, Q) = \frac{1}{\alpha-1}\log\sum_x P(x)^\alpha\, Q(x)^{1-\alpha},$$
instead of the Kullback–Leibler one,
$$D(P \,\|\, Q) = D_1(P \,\|\, Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)},$$
in some of the various definitions which are equivalent in the case of Shannon information measures (46) [29]:
$$I(X,Y) = \min_{Q_Y} D\big(P_{Y|X} P_X \,\|\, Q_Y P_X\big) = \min_{Q_Y} \mathbb{E}\big[D\big(P_{Y|X} \,\|\, Q_Y\big)\big] = \min_{Q_X}\min_{Q_Y} D\big(P_{X,Y} \,\|\, Q_X Q_Y\big) = D\big(P_{X,Y} \,\|\, P_X P_Y\big) = S(X) - S(X \mid Y),$$
where S ( X | Y ) stands for the Shannon conditional entropy,
$$S(X \mid Y) = -\sum_{x,y} P_{X,Y}(x,y)\log P_{X|Y}(x \mid y).$$
All of these measures are consistent with the Shannon case in view of the property (A6), but their direct usage as measures of Rényi information transfer leads to a breaking of some of the properties (A1)–(A5), which justifies the usage of Arimoto's measures from the previous section as the appropriate ones in the context of this research. In the following subsections, we review the alternative definitions.

5.1. Information Transfer Measures by Sibson

An alternative approach based on the Rényi divergence was proposed by Sibson [23] and considered later by several authors in the context of quantum secure communications [39,40,41,42,43,44]; Sibson introduced
$$J_\alpha^1(X;Y) = \min_{Q_Y} D_\alpha\big(P_{Y|X} P_X \,\|\, Q_Y P_X\big),$$
which can be represented as in [26]
$$J_\alpha^1(X,Y) = \frac{\alpha}{\alpha-1}\log\sum_y\left[\sum_x P_X(x)\, P_{Y|X}^\alpha(y \mid x)\right]^{\frac{1}{\alpha}}$$
and, in the discrete setting, can be related to the Gallager exponent as in [29]:
$$J_\alpha^1(X,Y) = \frac{\alpha}{1-\alpha}\, E_0\!\left(\frac{1}{\alpha} - 1,\ P_X\right),$$
which differs from Arimoto's definition (61), since in this case the escort distribution does not participate in the error exponent, but the ordinary one does. However, in the case of a perfect channel for which X = Y, the conditional distribution satisfies P_{Y|X}(y|x) = 1 for x = y and zero otherwise, so Sibson's measure (60) reduces to R_{1/α}(X), thus breaking the axiom (A3). This disadvantage can be overcome by the reparametrization α → 1/α, so that J_{1/α}^1(X,Y) is used as a measure of Rényi information transfer, and the properties of the resulting measure can be considered in a manner similar to the case of Arimoto.

5.2. Information Transfer Measures by Augustin and Csiszar

An alternative definition of Rényi mutual information was also presented by Augustin [25], and later by Csiszár [26], who defined
$$J_\alpha^2(X;Y) = \min_{Q_Y} \mathbb{E}\big[D_\alpha\big(P_{Y|X} \,\|\, Q_Y\big)\big].$$
However, in the case of perfect transmission, for which X = Y, the measure reduces to the Shannon entropy,
$$J_\alpha^2(X;Y) = S(X),$$
which breaks the axiom ( A 3 ).

5.3. Information Transfer Measures by Lapidoth, Pfister, Tomamichel and Hayashi

A similar obstacle to the case of the Augustin–Csiszar measure can be observed in the case of mutual information which was considered by Lapidoth and Pfister [27] and Tomamichel and Hayashi [28], who proposed
$$J_\alpha^3(X;Y) = \min_{Q_X}\min_{Q_Y} D_\alpha\big(P_{X,Y} \,\|\, Q_X Q_Y\big).$$
As shown in [27] (Lemma 11), if X = Y, then
$$J_\alpha^3(X;Y) = \begin{cases} \dfrac{\alpha}{1-\alpha}\,\displaystyle\lim_{\beta \to \infty} R_\beta(X), & \text{if } \alpha \in \left(0, \tfrac{1}{2}\right], \\[2mm] R_{\frac{\alpha}{2\alpha-1}}(X), & \text{if } \alpha > \tfrac{1}{2}, \end{cases}$$
so the axiom ( A 3 ) is broken in this case, as well.
Remark 1.
Despite the difference between the definitions of information transfer, in the discrete setting, the alternative definitions discussed above reach the same maximum over the set of input probability distributions P_X [26,29,45].

5.4. Information Transfer Measures by Chapeau-Blondeau, Delahaies, Rousseau, Tridenski, Zamir, Ingber and Harremoes

Chapeau-Blondeau, Delahaies and Rousseau [31], and independently Tridenski, Zamir and Ingber [46] and Harremoës [47], defined the Rényi mutual information directly by means of the Rényi divergence (55),
$$J_\alpha^4(X,Y) = D_\alpha\big(P_{X,Y} \,\|\, P_X P_Y\big)$$
for α > 0 and α ≠ 1, while in the case of α = 1 it reduces to the Shannon mutual information. However, the original definition can correspond only to a Rényi entropy of order 2 − α, since in the case of X = Y it reduces to J_α^4(X,Y) = R_{2−α}(X) (see also [47]), which can be overcome by the reparametrization α → 2 − α, similar to the case of Sibson's measure. This measure has been discussed in the past with various operational characterizations, and could also be considered as a measure of information transfer, although the satisfaction of all of the axioms (A1)–(A6) is not self-evident for general channels.

5.5. Information Transfer Measures by Jizba, Kleinert and Shefaat

Finally, we will mention the definition by Jizba, Kleinert and Shefaat [48],
$$J_\alpha^4(X,Y) \equiv R_\alpha(X) - \hat{R}_\alpha(X \mid Y),$$
which is defined in the same manner as in Arimoto’s case (42), but with another choice of conditional Rényi entropy
$$\hat{R}_\alpha(X \mid Y) = \frac{1}{1-\alpha}\log\sum_y P_Y^{(\alpha)}(y)\, 2^{(1-\alpha) R_\alpha(X \mid Y = y)},$$
which arises from the generalized Shannon–Khinchin axiom [GSK4] if the pseudo-additivity in the equation (9) is restricted to ordinary addition, in which case the GSK axioms uniquely determine the Rényi entropy [49]. However, despite its wide applicability in the modeling of causality and financial time series, this mutual information can take negative values, which breaks the axiom (A1), which is assumed to be mandatory in this paper. For further discussion of the physical meaning of negative mutual information in the domain of financial time series analysis, the reader is referred to [48].

6. The α-q Mutual Information and the α-q-Capacity

In the past, several attempts have been made to define an appropriate channel capacity measure which corresponds to instances of the Sharma–Mittal entropy class. All of them follow a similar recipe, by which the channel capacity is defined as in (32), as a maximum of an appropriately defined mutual information I_{α,q}. However, all of the previous approaches consider only special cases of the Sharma–Mittal entropy, and all of them fail to satisfy at least one of the properties (Ā1)–(Ā6) which an information transfer measure has to satisfy, as will be discussed in Section 7.
In this section we propose general measures, the α-q-mutual information and the α-q-capacity, by the requirement that the axioms (Ā1)–(Ā6) are satisfied, which qualifies them as appropriate measures of information transfer, without nonphysical properties. The special instances of the α-q (maximal) information transfer measures are also discussed and the analytic expressions for a binary symmetric channel are provided.

6.1. The α-q Information Transfer Measures and Its Instances

The α-q-mutual information is defined, in analogy with (42), using the q-subtraction (6), as follows:
$$I_{\alpha,q}(X,Y) = H_{\alpha,q}(X) \ominus_q H_{\alpha,q}(X \mid Y),$$
where we introduce the conditional Sharma–Mittal entropy H_{α,q}(X|Y) as
$$H_{\alpha,q}(X \mid Y) = \eta_q\big(R_\alpha(X \mid Y)\big) = \frac{1}{(1-q)\ln 2}\left[\left(\sum_y P_Y(y)\left(\sum_x P_{X|Y=y}(x)^\alpha\right)^{\frac{1}{\alpha}}\right)^{\frac{\alpha(1-q)}{1-\alpha}} - 1\right],$$
and R_α(X|Y) stands for Arimoto's definition of the conditional Rényi entropy (41). The expression (69) can also be obtained if the mapping η_q is applied to both sides of the equality (42), by which Arimoto's mutual information is defined, so we may establish the relationship
$$I_{\alpha,q}(X,Y) = \eta_q\big(I_\alpha(X,Y)\big) = \eta_q\left(\frac{\alpha}{\alpha-1}\log\sum_y\left[\sum_x P_X^{(\alpha)}(x)\, P_{Y|X}^\alpha(y \mid x)\right]^{\frac{1}{\alpha}}\right),$$
which can be represented using the Gallager error exponent (43) as
$$I_{\alpha,q}(X,Y) = \eta_q\left(\frac{\alpha}{1-\alpha}\, E_0\!\left(\frac{1}{\alpha}-1,\ P_X^{(\alpha)}\right)\right) = \frac{1}{(1-q)\ln 2}\left[2^{\frac{\alpha(1-q)}{1-\alpha} E_0\left(\frac{1}{\alpha}-1,\ P_X^{(\alpha)}\right)} - 1\right].$$
The α-q-capacity is now defined as
$$C_{\alpha,q} = \max_{P_X} I_{\alpha,q}(X,Y),$$
C α , q = max P X I α , q ( X , Y ) ,
and using the fact that η_q is increasing, it can be related to the corresponding α-capacity as
$$C_{\alpha,q} = \max_{P_X} I_{\alpha,q}(X,Y) = \max_{P_X} \eta_q\big(I_\alpha(X,Y)\big) = \eta_q\left(\max_{P_X} I_\alpha(X,Y)\right) = \eta_q\big(C_\alpha(P_{Y|X})\big).$$
Using the expressions (45) and (71), in the case of α = 1, the α-q-mutual information reduces to
$$I_{1,q} = \frac{1}{(1-q)\ln 2}\left[\prod_{x,y} 2^{(1-q)\, P_{X,Y}(x,y)\log\frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}} - 1\right] = \frac{1}{(1-q)\ln 2}\left[\prod_{x,y}\left(\frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}\right)^{(1-q)P_{X,Y}(x,y)} - 1\right].$$
The α-q-capacity is given by
$$C_{1,q} = \max_{P_X} \frac{1}{(1-q)\ln 2}\left[\prod_{x,y}\left(\frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}\right)^{(1-q)P_{X,Y}(x,y)} - 1\right],$$
and these measures can serve as (maximal) information transfer measures corresponding to Gaussian entropy, which was not considered before in the context of information transmission. Naturally, if in addition q 1 , the measures reduce to Shannon’s mutual information and Shannon capacity [37].
Additional special cases of the α-q (maximal) information transfer measures include the α-mutual information (42) and the α-capacity (40), which are obtained for q = 1; the measures which correspond to the Tsallis entropy can be obtained for q = α, and the ones which correspond to the Landsberg–Vedral entropy for q = 2 − α. These special instances are listed in Table 1.
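Computationally, the α-q-mutual information is just the η_q image of Arimoto's α-mutual information, so a single routine covers all of the instances in Table 1 by choosing q. A minimal sketch with assumed helper names follows.

```python
# Sketch of I_{alpha,q} = eta_q(I_alpha); q selects the particular instance.
import numpy as np

LN2 = np.log(2.0)

def eta_q(x, q):
    return x if np.isclose(q, 1.0) else (2.0**((1 - q) * x) - 1) / ((1 - q) * LN2)

def arimoto_mi(Px, W, alpha):
    escort = Px**alpha / np.sum(Px**alpha)
    inner = np.sum(escort[:, None] * W**alpha, axis=0)**(1 / alpha)
    return alpha / (alpha - 1) * np.log2(np.sum(inner))

def alpha_q_mi(Px, W, alpha, q):
    return eta_q(arimoto_mi(Px, W, alpha), q)

Px = np.array([0.3, 0.7])
W = np.array([[0.8, 0.2], [0.1, 0.9]])
alpha = 2.0
print(alpha_q_mi(Px, W, alpha, q=1.0))        # Renyi instance (q = 1)
print(alpha_q_mi(Px, W, alpha, q=alpha))      # Tsallis instance (q = alpha)
print(alpha_q_mi(Px, W, alpha, q=2 - alpha))  # Landsberg-Vedral instance (q = 2 - alpha)
```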
As discussed in Section 7, previously considered information measures cover only particular special cases and break at least one of the axioms (Ā1)–(Ā5), which leads to unexpected and counterintuitive conclusions about the channels, such as negative information transfer and achieving super-capacitance or sub-capacitance [4], which could be treated as nonphysical behavior. On the other hand, apart from their generality, the α-q information transfer measures proposed in this paper overcome these disadvantages, which qualifies them as appropriate measures, as stated in the following theorem.
Theorem 2.
The α-q information transfer measures I α , q and C α , q satisfy the set of the axioms ( A ¯ 1 )–( A ¯ 6 ).
Proof. 
The proof is a straightforward application of the mapping η_q to the equations in the α-mutual information properties (A1)–(A5), while (Ā6) follows from the above discussion. □
Remark 2.
Note that the symmetry I_{α,q}(X,Y) = I_{α,q}(Y,X) does not hold in general, neither for the α-q-mutual information nor for the α-mutual information [50,51], and if the mutual information is defined so that the symmetry is preserved, some of the axioms (Ā1)–(Ā6) might be broken. In addition, an alternative definition of the mutual information, I_{α,q}(Y,X) = H_{α,q}(Y) − H_{α,q}(Y|X), which uses the ordinary subtraction operator instead of the ⊖_q operation, can also be introduced, but in this case the property (Ā5) might not hold in general, as discussed in Section 7.

6.2. The α-q-Capacity of Binary Symmetric Channels

As shown by Cai and Verdú [45], the α -mutual information of Arimoto’s type I α is maximized for the uniform distribution P X = ( 1 / 2 , 1 / 2 ) , and Arimoto’s α -capacity has the value
$$C_\alpha(BSC) = 1 - r_\alpha(p),$$
where the binary entropy function r α is defined as
$$r_\alpha(p) = R_\alpha(p, 1-p) = \frac{1}{1-\alpha}\log\big(p^\alpha + (1-p)^\alpha\big),$$
for α > 0 , α 1 , while in the limit of α 1 , the expression (78) reduces to the well-known result for the Shannon capacity (see Fano [52])
$$C_1(BSC) = \lim_{\alpha \to 1} C_\alpha(BSC) = 1 + p\log p + (1-p)\log(1-p).$$
The analytic expressions for the α-q-capacities of binary symmetric channels can be obtained from the expressions (74) and (77), so that
$$C_{\alpha,q}(BSC) = \eta_q\big(C_\alpha(BSC)\big) = \frac{1}{(1-q)\ln 2}\left[2^{1-q}\big(p^\alpha + (1-p)^\alpha\big)^{\frac{1-q}{\alpha-1}} - 1\right];$$
in the case of q = 1, it reduces to the Rényi case while, in the case of α = 1, to the Gaussian case (77),
$$C_{1,q}(BSC) = \frac{1}{(1-q)\ln 2}\left[2^{1-q}\, p^{(1-q)p}\,(1-p)^{(1-q)(1-p)} - 1\right].$$
The analytic expressions for BSC α -q capacities for other instances can straightforwardly be obtained by specifying the values of the parameters, whose instances are listed in Table 1, while the plots of the BSC α -q-capacities, which correspond to the Gaussian and the Tsallis entropies, are shown in Figure 3 and Figure 4.
The α-q-capacity (80) can equivalently be expressed as
$$C_{\alpha,q}(BSC) = \mathrm{Log}_q\, 2 \ominus_q h_{\alpha,q}(p),$$
where the Sharma–Mittal binary entropy function is defined as
$$h_{\alpha,q}(p) = H_{\alpha,q}(p, 1-p) = \frac{1}{(1-q)\ln 2}\left[\big(p^\alpha + (1-p)^\alpha\big)^{\frac{1-q}{1-\alpha}} - 1\right],$$
which reduces to the Rényi binary entropy function, in the case of q = 1,
$$h_{\alpha,1}(p) = \lim_{q \to 1} h_{\alpha,q}(p) = R_\alpha(p, 1-p) = \frac{1}{1-\alpha}\log\big(p^\alpha + (1-p)^\alpha\big),$$
to the Tsallis binary entropy function, in the case of α = q,
$$h_{q,q}(p) = T_q(p, 1-p) = \frac{1}{(1-q)\ln 2}\left[p^q + (1-p)^q - 1\right],$$
to the Gaussian binary entropy function, in the case of α = 1,
$$h_{1,q}(p) = \lim_{\alpha \to 1} h_{\alpha,q}(p) = G_q(p, 1-p) = \frac{1}{(1-q)\ln 2}\left[p^{-(1-q)p}\,(1-p)^{-(1-q)(1-p)} - 1\right],$$
and to the Shannon binary entropy function, in the case of α = q = 1,
$$h_{1,1}(p) = \lim_{q,\alpha \to 1} h_{\alpha,q}(p) = S(p, 1-p) = -p\log p - (1-p)\log(1-p).$$
The expression (82) can be interpreted similarly to the Shannon case. Thus, a BSC with input X and output Y can be modeled with the input–output relation Y = X ⊕ Z, where ⊕ stands for the modulo-2 sum and Z is the channel noise taking values from {1, 0}, distributed in accordance with (p, 1−p). If we measure the information which is lost per bit during transmission with the Sharma–Mittal entropy H_{α,q}(Z) = h_{α,q}(p), then C_{α,q} stands for the useful information left over for every bit of information received.
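A short numerical sketch of the BSC expressions above (assumed helper names): it evaluates C_{α,q}(BSC) both as η_q(1 − r_α(p)) and through the closed form (80), and checks the limiting values for the totally destructive and perfect BSC.

```python
# Sketch of the BSC alpha-q-capacity, via eta_q(C_alpha) and the closed form.
import numpy as np

LN2 = np.log(2.0)

def r_alpha(p, alpha):
    return np.log2(p**alpha + (1 - p)**alpha) / (1 - alpha)

def bsc_alpha_q_capacity(p, alpha, q):
    c_alpha = 1.0 - r_alpha(p, alpha)                  # Arimoto capacity (78)
    if np.isclose(q, 1.0):
        return c_alpha
    return (2.0**((1 - q) * c_alpha) - 1) / ((1 - q) * LN2)

def bsc_alpha_q_capacity_closed(p, alpha, q):
    s = p**alpha + (1 - p)**alpha
    return (2.0**(1 - q) * s**((1 - q) / (alpha - 1)) - 1) / ((1 - q) * LN2)

p, alpha, q = 0.1, 2.0, 0.5
assert np.isclose(bsc_alpha_q_capacity(p, alpha, q),
                  bsc_alpha_q_capacity_closed(p, alpha, q))
# Sanity: zero for a totally destructive BSC, Log_q(2) for a perfect one
assert np.isclose(bsc_alpha_q_capacity(0.5, alpha, q), 0.0)
log_q_2 = (2.0**(1 - q) - 1) / ((1 - q) * LN2)
assert np.isclose(bsc_alpha_q_capacity(0.0, alpha, q), log_q_2)
```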

7. An Overview of the Previous Approaches to Sharma–Mittal Information Transfer Measures

In this section, we review the previous attempts at a definition of Sharma–Mittal information transfer measures, which are defined from the basic requirement of consistency with the Shannon measure as given by the axiom ( A ¯ 6 ). However, as we show in the following paragraphs, all of them break at least one of the axioms ( A ¯ 1 )–( A ¯ 5 ), which are satisfied in the case of the α -q (maximal) information transfer measures (69) and (73), in accordance with the discussion in Section 6.

7.1. Daróczy’s Capacity

The first considerations of generalized channel capacities and generalized mutual information for the q-entropy go back to Daróczy [30], who introduced the conditional Tsallis entropy
$$\bar{T}_q(Y \mid X) = \sum_x P_X^q(x)\, T_q(Y \mid X = x),$$
where the row entropies are defined as
$$T_q(Y \mid X = x) = \frac{1}{(1-q)\ln 2}\left[\sum_y P_{Y|X}(y \mid x)^q - 1\right]$$
and the mutual information is defined as
$$J_{\alpha,q}^5(X,Y) = T_q(Y) - \bar{T}_q(Y \mid X).$$
However, in the case of a totally destructive channel, X ⊥ Y, P_{Y|X}(y|x) = P_Y(y), T_q(Y|X=x) = T_q(Y) and
$$\bar{T}_q(Y \mid X) = T_q(Y)\sum_x P_X(x)^q,$$
so that
$$J_{\alpha,q}^5(X,Y) = T_q(Y)\left[1 - \sum_x P_X(x)^q\right] = \left[1 - \sum_x P_X(x)^q\right]\mathrm{Log}_q\, m.$$
This expression is zero for an input probability distribution P_X = (1, 0, …, 0) and its permutations but, in general, it is negative for q < 1, positive for q > 1, and zero only for q = 1, so the axiom (Ā2) is broken (see Figure 5). As a result, the channel capacity, which is defined in accordance with (32), is zero for q ≤ 1 and positive for q > 1, as illustrated in Figure 6 by the example of the BSC, for which Daróczy's channel capacity can be computed as [30,53]
$$C_q^5(BSC) = \frac{1 - 2^{1-q}}{(q-1)\ln 2} - \frac{2^{1-q}}{(q-1)\ln 2}\left[1 - (1-p)^q - p^q\right].$$
In the same figure, we plotted the graph for the α -q channel capacities proposed in this paper, and all of them remain zero in the case of a totally destructive BSC, as expected.
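The behavior shown in Figures 5 and 6 is easy to reproduce numerically. The sketch below (assumed names) evaluates Daróczy's mutual information for the totally destructive binary-input channel over a grid of input distributions and confirms that it is negative for q < 1 and positive for q > 1, breaking (Ā2), in contrast with the α-q-mutual information, which is zero for this channel.

```python
# Sketch of Daroczy's J^5 = (1 - sum_x P_X(x)^q) * Log_q(m) for the totally
# destructive binary-input channel (m = 2 outputs).
import numpy as np

LN2 = np.log(2.0)

def daroczy_mi_destructive(a, q, m=2):
    log_q_m = (m**(1 - q) - 1) / ((1 - q) * LN2)     # Log_q(m)
    return (1 - (a**q + (1 - a)**q)) * log_q_m

for q in (0.5, 2.0):
    vals = [daroczy_mi_destructive(a, q) for a in np.linspace(0.01, 0.99, 99)]
    print(q, min(vals), max(vals))
# q = 0.5: all values <= 0;  q = 2.0: all values >= 0 (zero only at a in {0, 1})
```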

7.2. Yamano Capacities

Similar problems to the ones mentioned above arise in the case of mutual information and corresponding capacity measures considered by Yamano [33], who addressed the information transmission characterized by Landsberg–Vedral entropy L q , given in (17).
Thus, the first proposal is based on the mutual information of the form
$$J_q^6(X,Y) = L_q(X) + L_q(Y) - L_q(X,Y),$$
where the joint entropy is defined as
$$L_q(X,Y) = \frac{1}{q-1}\left[\frac{1}{\sum_{x,y} P_{X,Y}(x,y)^q} - 1\right].$$
However, in the case of a fully destructive channel, P Y ( y ) = 1 / m and P X , Y ( x , y ) = P X ( x ) / m , so that
$$J_q^6(X,Y) = \frac{1}{q-1}\left[\frac{1}{\sum_x P_X(x)^q} - 1\right] + \frac{1}{q-1}\left[m^{q-1} - 1\right] - \frac{1}{q-1}\left[\frac{m^{q-1}}{\sum_x P_X(x)^q} - 1\right],$$
which can be simplified to
$$J_q^6(X,Y) = \frac{1 - m^{q-1}}{q-1}\left[\frac{1}{\sum_x P_X(x)^q} - 1\right].$$
Similarly to the case of Daróczy's capacity, this expression is zero for an input probability distribution P_X = (1, 0, …, 0) and its permutations but, in general, it is negative for q > 1, positive for q < 1, and zero only for q = 1, so the axiom (Ā2) is broken (see Figure 5). In Figure 6 we illustrate the Yamano channel capacity as a function of the parameter q for two-input channels with P_X = [a, 1−a]: the channel capacity is zero for q > 1 (which is attained for P_X = [1, 0]), and
$$C_q^6(BSC) = \frac{1}{q-1}\left(2^{q} - 1 - 2^{2q-2}\right)$$
for q < 1 (which is attained for P_X = [1/2, 1/2]). In the same figure, we plotted the graph for the α-q channel capacities proposed in this paper and, as before, all of them remain zero in the case of a totally destructive BSC, as expected.
Further attempts were made in [33], where the mutual information is defined in an analogous manner to (66), with the generalized divergence measure introduced in [54]. Thus, the alternative measure of mutual information is defined as
$$J_q^7(X,Y) = \frac{1}{(1-q)\ln 2}\,\frac{1}{\sum_{x,y} P_{X,Y}^q(x,y)}\left[1 - \sum_{x,y} P_{X,Y}(x,y)\left(\frac{P_X(x)\, P_Y(y)}{P_{X,Y}(x,y)}\right)^{1-q}\right].$$
However, in the case of the simplest perfect communication channel for which X = Y , the mutual information reduces to
$$J_q^7(X,Y) = \frac{1}{(1-q)\ln 2}\,\frac{1 - \sum_x P_X(x)^{2-q}}{\sum_x P_X(x)^q} \neq L_q(X),$$
which breaks the axiom ( A ¯ 3 ).

7.3. Landsber–Vedral Capacities

To avoid these problems, Landsberg and Vedral [4] proposed a mutual information measure and related channel capacities for the Sharma–Mittal entropy class H_{α,q}, particularly considering the choice of q = α, which corresponds to the Tsallis entropy, q = 2 − α, and the case of q = 1, which corresponds to the Rényi entropy:
$$J_{\alpha,q}^8(X,Y) = H_{\alpha,q}(Y) - \tilde{H}_{\alpha,q}(Y \mid X),$$
where the conditional entropy H̃_{α,q}(Y|X) is defined as
$$\tilde{H}_{\alpha,q}(Y \mid X) = \sum_x P_X(x)\, H_{\alpha,q}(Y \mid X = x)$$
and
$$H_{\alpha,q}(Y \mid X = x) = \frac{1}{(1-q)\ln 2}\left[\left(\sum_y P_{Y|X}(y \mid x)^\alpha\right)^{\frac{1-q}{1-\alpha}} - 1\right].$$
Although this definition bears some similarities to the α-q-mutual information proposed in formula (69), several key differences can be observed. First of all, it characterizes the information transfer as the output uncertainty reduction after the input symbols are known, instead of the input uncertainty reduction after the output symbols are received (42). In addition, it uses the ordinary subtraction operation instead of the ⊖_q one. Note also that the definition of the conditional entropy (102) generally differs from the definition proposed in (70).
The definition (101) resolves the issue of the axiom (Ā2) which appears in the case of the Daróczy capacity, since in the case of a totally destructive channel (X ⊥ Y), P_{Y|X}(y|x) = P_Y(y), H_{α,q}(Y|X=x) = H_{α,q}(Y) and H̃_{α,q}(Y|X) = H_{α,q}(Y), so that J_{α,q}^8(X,Y) = 0. However, problems remain with the axiom (Ā4), which can be observed in the case of a noisy channel with non-overlapping outputs if the number of channel inputs is lower than the number of channel outputs, n < m. Indeed, in the case of a noisy channel with non-overlapping outputs given by the transition matrix (27), both of the row entropies H_{α,q}(Y|X=x) have the same value, which is independent of x,
$$H_{\alpha,q}(Y \mid X = x) = \frac{k^{1-q} - 1}{(1-q)\ln 2} = \mathrm{Log}_q\, k; \quad \text{for } x = x_1, x_2,$$
and the maximal value of the Landsberg–Vedral mutual information (101) is obtained only by maximizing H_{α,q}(Y) over P_X, which is achieved if X is uniformly distributed, since in this case Y is uniformly distributed as well (a = 1/2 in (28)). Thus, the maximal value of the output entropy is H_{α,q}(Y) = Log_q(2k) and the mutual information is maximized for
$$C_{\alpha,q}^8(NOC) = \mathrm{Log}_q(2k) - \mathrm{Log}_q(k),$$
which is greater than Log_q(2) for q < 1 and k ≥ 2, i.e., for m ≥ 4 outputs, so the axiom (Ā4) is broken, which is illustrated in Figure 7.
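The violation can be reproduced with a few lines (assumed helper names): for q < 1 and k ≥ 2 the NOC value Log_q(2k) − Log_q(k) = k^{1−q} Log_q(2) exceeds Log_q(2) = Log_q(n), the input-side bound, in agreement with Figure 7.

```python
# Sketch of the Landsberg-Vedral-type NOC capacity Log_q(2k) - Log_q(k).
import numpy as np

LN2 = np.log(2.0)

def log_q(x, q):
    return np.log2(x) if np.isclose(q, 1.0) else (x**(1 - q) - 1) / ((1 - q) * LN2)

q, k = 0.5, 4
c_noc = log_q(2 * k, q) - log_q(k, q)
print(c_noc, log_q(2, q))      # the NOC value is larger than Log_q(2)
assert c_noc > log_q(2, q)
assert np.isclose(c_noc, k**(1 - q) * log_q(2, q))
```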

7.4. Chapeau-Blondeau–Delahaies–Rousseau Capacities

Following a similar approach to the one in Section 5.4, Chapeau-Blondeau, Delahaies and Rousseau considered a definition of mutual information which corresponds to the Tsallis entropy, based on the Tsallis divergence
$$D_{q,q}(P \,\|\, Q) = \frac{1}{(q-1)\ln 2}\left[\sum_x P(x)^q\, Q(x)^{1-q} - 1\right],$$
so that the mutual information can be written as
$$J_q^9(X,Y) = D_{q,q}\big(P_{X,Y} \,\|\, P_X P_Y\big) = \eta_{2-q}\big(D_q\big(P_{X,Y} \,\|\, P_X P_Y\big)\big) = \frac{1}{(1-q)\ln 2}\left[1 - \sum_{x,y} P_{X,Y}(x,y)^q\, P_X(x)^{1-q}\, P_Y(y)^{1-q}\right].$$
However, this definition is not directly applicable as a measure of information transfer for the Tsallis entropy with index q, since in the case of X = Y it reduces to J_q^9(X,Y) = T_{2−q}(X), and it requires the reparametrization q → 2 − q, similar to Section 5.4, while the satisfaction of the axioms (Ā4) and (Ā5) is not self-evident.

8. Conclusions and Future Work

A general treatment of the Sharma–Mittal entropy transfer was provided, together with an analysis of the existing measures of non-additive Sharma–Mittal information transfer. It was shown that the existing definitions fail to satisfy at least one of the axioms common to the Shannon case, by which the information transfer has to be non-negative, less than the input and output uncertainty, equal to the input uncertainty in the case of perfect transmission and equal to zero in the case of a totally destructive channel. Breaking some of these axioms implies unexpected and counterintuitive conclusions about the channels, such as achieving super-capacitance or sub-capacitance [4], which could be treated as nonphysical behavior. In this paper, alternative measures, the α-q-mutual information and the α-q-channel capacity, were proposed so that all of the axioms which are broken in the case of the previously considered Sharma–Mittal information transfer measures are satisfied, which qualifies them as physically consistent measures of information transfer.
Taking into account previous research on non-extensive statistical mechanics [3], where the linear growth of physical quantities has been recognized as a critical property in non-extensive [55] and non-exponentially growing systems [56], and taking into account previous research from the field of information theory, where the Sharma–Mittal entropy has been considered an appropriate scaling measure which provides extensive information rates [21], the α-q-mutual information and the α-q-channel capacity seem to be promising measures for the characterization of information transmission in systems where the Shannon entropy rate diverges or vanishes in the infinite time limit. In addition, as was shown in this paper, the proposed information transfer measures are compatible with maximum likelihood detection, which indicates their potential for the operational characterization of coding theory and hypothesis testing problems [26].

Author Contributions

Conceptualization, V.M.I. and I.B.D.; validation, V.M.I. and I.B.D.; formal analysis, V.M.I.; funding acquisition, I.B.D.; project administration, I.B.D.; writing—original draft preparation, V.M.I.; writing—review and editing, V.M.I. and I.B.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by NSF under grants 1907918 and 1828132 and by Ministry of Science and Technological Development, Republic of Serbia, Grants Nos. ON 174026 and III 044006. The APC was funded by NSF under grants 1907918 and 1828132.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ilić, V.M.; Stanković, M.S. A unified characterization of generalized information and certainty measures. Phys. A Stat. Mech. Appl. 2014, 415, 229–239. [Google Scholar] [CrossRef] [Green Version]
  2. Renyi, A. Probability Theory; North-Holland Series in applied mathematics and mechanics; North-Holland Publishing Company: Amsterdam, The Netherlands, 1970. [Google Scholar]
  3. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487. [Google Scholar] [CrossRef]
  4. Landsberg, P.T.; Vedral, V. Distributions and channel capacities in generalized statistical mechanics. Phys. Lett. A 1998, 247, 211–217. [Google Scholar] [CrossRef]
  5. Frank, T.; Daffertshofer, A. Exact time-dependent solutions of the Renyi Fokker-Planck equation and the Fokker-Planck equations related to the entropies proposed by Sharma and Mittal. Phys. A Stat. Mech. Appl. 2000, 285, 351–366. [Google Scholar] [CrossRef]
  6. Sharma, B.; Mittal, D. New non-additive measures of entropy for discrete probability distributions. J. Math. Sci. 1975, 10, 28–40. [Google Scholar]
  7. Tsallis, C. What are the numbers that experiments provide. Quim. Nova 1994, 17, 468–471. [Google Scholar]
  8. Nivanen, L.; Le Méhauté, A.; Wang, Q.A. Generalized algebra within a nonextensive statistics. Rep. Math. Phys. 2003, 52, 437–444. [Google Scholar] [CrossRef] [Green Version]
  9. Ilić, V.M.; Stanković, M.S. Generalized Shannon-Khinchin axioms and uniqueness theorem for pseudo-additive entropies. Phys. A Stat. Mech. Appl. 2014, 411, 138–145. [Google Scholar] [CrossRef] [Green Version]
  10. Jizba, P.; Korbel, J. When Shannon and Khinchin meet Shore and Johnson: Equivalence of information theory and statistical inference axiomatics. Phys. Rev. E 2020, 101, 042126. [Google Scholar] [CrossRef]
  11. Esteban, M.D.; Morales, D. A summary on entropy statistics. Kybernetika 1995, 31, 337–346. [Google Scholar]
  12. Lenzi, E.; Scarfone, A. Extensive-like and intensive-like thermodynamical variables in generalized thermostatistics. Phys. A Stat. Mech. Appl. 2012, 391, 2543–2555. [Google Scholar] [CrossRef]
  13. Frank, T.; Plastino, A. Generalized thermostatistics based on the Sharma-Mittal entropy and escort mean values. Eur. Phys. J. B Condens. Matter Complex Syst. 2002, 30, 543–549. [Google Scholar] [CrossRef]
  14. Aktürk, O.Ü.; Aktürk, E.; Tomak, M. Can Sobolev inequality be written for Sharma-Mittal entropy? Int. J. Theor. Phys. 2008, 47, 3310–3320. [Google Scholar] [CrossRef]
  15. Mazumdar, S.; Dutta, S.; Guha, P. Sharma–Mittal quantum discord. Quantum Inf. Process. 2019, 18, 1–26. [Google Scholar] [CrossRef] [Green Version]
  16. Elhoseiny, M.; Elgammal, A. Generalized Twin Gaussian processes using Sharma–Mittal divergence. Mach. Learn. 2015, 100, 399–424. [Google Scholar] [CrossRef] [Green Version]
  17. Koltcov, S.; Ignatenko, V.; Koltsova, O. Estimating Topic Modeling Performance with Sharma–Mittal Entropy. Entropy 2019, 21, 660. [Google Scholar] [CrossRef] [Green Version]
  18. Jawad, A.; Bamba, K.; Younas, M.; Qummer, S.; Rani, S. Tsallis, Rényi and Sharma-Mittal holographic dark energy models in loop quantum cosmology. Symmetry 2018, 10, 635. [Google Scholar] [CrossRef] [Green Version]
  19. Ghaffari, S.; Ziaie, A.; Moradpour, H.; Asghariyan, F.; Feleppa, F.; Tavayef, M. Black hole thermodynamics in Sharma–Mittal generalized entropy formalism. Gen. Relativ. Gravit. 2019, 51, 1–11. [Google Scholar] [CrossRef] [Green Version]
  20. Américo, A.; Khouzani, M.; Malacaria, P. Conditional Entropy and Data Processing: An Axiomatic Approach Based on Core-Concavity. IEEE Trans. Inf. Theory 2020, 66, 5537–5547. [Google Scholar] [CrossRef]
  21. Girardin, V.; Lhote, L. Rescaling entropy and divergence rates. IEEE Trans. Inf. Theory 2015, 61, 5868–5882. [Google Scholar] [CrossRef]
  22. Ciuperca, G.; Girardin, V.; Lhote, L. Computation and estimation of generalized entropy rates for denumerable Markov chains. IEEE Trans. Inf. Theory 2011, 57, 4026–4034. [Google Scholar] [CrossRef] [Green Version]
  23. Sibson, R. Information radius. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 1969, 14, 149–160. [Google Scholar] [CrossRef]
  24. Arimoto, S. Information Mesures and Capacity of Order α for Discrete Memoryless Channels. In Topics in Information Theory; Colloquia Mathematica Societatis János Bolyai; Csiszár, I., Elias, P., Eds.; North-Holland Pub. Co.: Amsterdam, The Netherlands, 1977; Volume 16, pp. 41–52. [Google Scholar]
  25. Augustin, U. Noisy Channels. Ph.D. Thesis, Universität Erlangen-Nürnberg, Erlangen, Germany, 1978. [Google Scholar]
  26. Csiszár, I. Generalized cutoff rates and Rényi’s information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34. [Google Scholar] [CrossRef]
  27. Lapidoth, A.; Pfister, C. Two measures of dependence. Entropy 2019, 21, 778. [Google Scholar] [CrossRef] [Green Version]
  28. Tomamichel, M.; Hayashi, M. Operational interpretation of Rényi information measures via composite hypothesis testing against product and Markov distributions. IEEE Trans. Inf. Theory 2017, 64, 1064–1082. [Google Scholar] [CrossRef] [Green Version]
  29. Verdú, S. α-mutual information. In Proceedings of the 2015 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 1–6 February 2015; pp. 1–6. [Google Scholar]
  30. Daróczy, Z. Generalized information functions. Inf. Control 1970, 16, 36–51. [Google Scholar] [CrossRef] [Green Version]
  31. Chapeau-Blondeau, F.; Rousseau, D.; Delahaies, A. Renyi entropy measure of noise-aided information transmission in a binary channel. Phys. Rev. E 2010, 81, 051112. [Google Scholar] [CrossRef] [Green Version]
  32. Chapeau-Blondeau, F.; Delahaies, A.; Rousseau, D. Tsallis entropy measure of noise-aided information transmission in a binary channel. Phys. Lett. A 2011, 375, 2211–2219. [Google Scholar] [CrossRef] [Green Version]
  33. Yamano, T. A possible extension of Shannon’s information theory. Entropy 2001, 3, 280–292. [Google Scholar] [CrossRef]
  34. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef] [Green Version]
  35. Arimoto, S. Computation of random coding exponent functions. Inf. Theory IEEE Trans. 1976, 22, 665–671. [Google Scholar] [CrossRef]
  36. Gallager, R. A simple derivation of the coding theorem and some applications. IEEE Trans. Inf. Theory 1965, 11, 3–18. [Google Scholar] [CrossRef] [Green Version]
  37. Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); John Wiley & Sons, Inc: Hoboken, NJ, USA, 2006. [Google Scholar]
  38. Fehr, S.; Berens, S. On the conditional Rényi entropy. Inf. Theory IEEE Trans. 2014, 60, 6801–6810. [Google Scholar] [CrossRef]
  39. Wilde, M.M.; Winter, A.; Yang, D. Strong converse for the classical capacity of entanglement-breaking and Hadamard channels via a sandwiched Rényi relative entropy. Commun. Math. Phys. 2014, 331, 593–622. [Google Scholar] [CrossRef] [Green Version]
  40. Gupta, M.K.; Wilde, M.M. Multiplicativity of completely bounded p-norms implies a strong converse for entanglement-assisted capacity. Commun. Math. Phys. 2015, 334, 867–887. [Google Scholar] [CrossRef] [Green Version]
  41. Beigi, S. Sandwiched Rényi divergence satisfies data processing inequality. J. Math. Phys. 2013, 54, 122202. [Google Scholar] [CrossRef] [Green Version]
  42. Hayashi, M.; Tomamichel, M. Correlation detection and an operational interpretation of the Rényi mutual information. J. Math. Phys. 2016, 57, 102201. [Google Scholar] [CrossRef]
  43. Hayashi, M.; Tajima, H. Measurement-based formulation of quantum heat engines. Phys. Rev. A 2017, 95, 032132. [Google Scholar] [CrossRef] [Green Version]
  44. Hayashi, M. Quantum Wiretap Channel With Non-Uniform Random Number and Its Exponent and Equivocation Rate of Leaked Information. IEEE Trans. Inf. Theory 2015, 61, 5595–5622. [Google Scholar] [CrossRef] [Green Version]
  45. Cai, C.; Verdú, S. Conditional Rényi Divergence Saddlepoint and the Maximization of α-Mutual Information. Entropy 2019, 21, 969. [Google Scholar] [CrossRef] [Green Version]
  46. Tridenski, S.; Zamir, R.; Ingber, A. The Ziv–Zakai–Rényi bound for joint source-channel coding. IEEE Trans. Inf. Theory 2015, 61, 4293–4315. [Google Scholar] [CrossRef]
  47. Harremoës, P. Interpretations of Rényi entropies and divergences. Phys. A Stat. Mech. Its Appl. 2006, 365, 57–62. [Google Scholar] [CrossRef] [Green Version]
  48. Jizba, P.; Kleinert, H.; Shefaat, M. Rényi’s information transfer between financial time series. Phys. A Stat. Mech. Appl. 2012, 391, 2971–2989. [Google Scholar] [CrossRef] [Green Version]
  49. Jizba, P.; Arimitsu, T. The world according to Rényi: Thermodynamics of multifractal systems. Ann. Phys. 2004, 312, 17–59. [Google Scholar] [CrossRef]
  50. Iwamoto, M.; Shikata, J. Information theoretic security for encryption based on conditional Rényi entropies. In Proceedings of the International Conference on Information Theoretic Security, Singapore, 28–30 November 2013; pp. 103–121. [Google Scholar]
  51. Ilić, V.; Djordjević, I.; Stanković, M. On a general definition of conditional Rényi entropies. Proceedings 2018, 2, 166. [Google Scholar] [CrossRef] [Green Version]
  52. Fano, R.M. Transmission of Information; M.I.T. Press: Cambridge, MA, USA, 1961. [Google Scholar]
  53. Ilic, V.M.; Djordjevic, I.B.; Küeppers, F. On the Daróczy-Tsallis capacities of discrete channels. Entropy 2015, 20, 2. [Google Scholar]
  54. Yamano, T. Information theory based on nonadditive information content. Phys. Rev. E 2001, 63, 046105. [Google Scholar] [CrossRef] [Green Version]
  55. Tsallis, C.; Gell-Mann, M.; Sato, Y. Asymptotically scale-invariant occupancy of phase space makes the entropy Sq extensive. Proc. Natl. Acad. Sci. USA 2005, 102, 15377–15382. [Google Scholar] [CrossRef] [Green Version]
  56. Korbel, J.; Hanel, R.; Thurner, S. Classification of complex systems by their sample-space scaling exponents. New J. Phys. 2018, 20, 093007. [Google Scholar] [CrossRef]
Figure 1. Noisy channel with non-overlapping outputs.
Figure 2. Binary symmetric channel.
Figure 3. The α-q-capacity of the BSC for the Gaussian entropy (the case of α = 1) as a function of q for various values of the channel parameter p, from 0.5 (totally destructive channel) to 0 (perfect transmission). All of the curves lie between 0 and Log_q 2, which is the maximum value of the Gaussian entropy.
Figure 4. The α-q-capacity of the BSC for the Tsallis entropy (the case of α = q) as a function of q for various values of the channel parameter p, from 0.5 (totally destructive channel) to 0 (perfect transmission). All of the curves lie between 0 and Log_q 2, which is the maximum value of the Tsallis entropy.
Figure 5. Daróczy's (solid lines) and Yamano's (dashed lines) mutual information in the case of a totally destructive BSC as functions of the input distribution parameter a, P_X = [a, 1−a]^T, for different values of q, taking negative values for q < 1 and q > 1, respectively, and breaking the axioms (Ā1) and (Ā2). The α-q-mutual information is zero for all q and satisfies (Ā1) and (Ā2).
Figure 6. Daróczy's (solid lines) and Yamano's (dashed lines) capacities in the case of a totally destructive BSC as functions of the parameter q. In the regions q < 1 and q > 1, respectively, the corresponding negative mutual information is maximized for P_X = [1, 0]^T (zero capacity), while it takes positive values outside these regions, breaking the axiom (Ā2). The α-q-capacity is zero for all q and satisfies (Ā2).
Figure 7. Landsberg–Vedral capacities for the Tsallis (solid lines) and the Landsberg–Vedral (dashed lines) entropies in the case of a (perfect) noisy channel with non-overlapping outputs with m outputs as functions of q, for different values of m. The axiom ( A ¯ 4 ) is broken for all m > 2 and satisfied in the case of corresponding α -q-capacities, C q , q and C q , 2 q .
Table 1. Instances of the α-q-mutual information for different values of the parameters and the corresponding expressions for the BSC α-q-capacities. Each row gives the parent entropy H_{α,q}, the mutual information I_{α,q}, and the BSC capacity C_{α,q}.
Shannon, S (α = q = 1):
$$I_{1,1}(X,Y) = \sum_{x,y} P_{X,Y}(x,y)\log\frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}, \qquad C_{1,1}(BSC) = 1 + p\log p + (1-p)\log(1-p).$$
Rényi, R_α (q = 1):
$$I_{\alpha,1}(X,Y) = \frac{\alpha}{1-\alpha}\, E_0\!\left(\frac{1}{\alpha}-1,\ P_X^{(\alpha)}\right), \qquad C_{\alpha,1}(BSC) = 1 - \frac{1}{1-\alpha}\log\big(p^\alpha + (1-p)^\alpha\big).$$
Tsallis, T_q (q = α):
$$I_{q,q}(X,Y) = \frac{1}{(1-q)\ln 2}\left[2^{\,q E_0\left(\frac{1}{q}-1,\ P_X^{(q)}\right)} - 1\right], \qquad C_{q,q}(BSC) = \frac{1}{(1-q)\ln 2}\left[2^{1-q}\big(p^q + (1-p)^q\big)^{-1} - 1\right].$$
Landsberg–Vedral, L_α (q = 2 − α):
$$I_{\alpha,2-\alpha}(X,Y) = \frac{1}{(\alpha-1)\ln 2}\left[2^{-\alpha E_0\left(\frac{1}{\alpha}-1,\ P_X^{(\alpha)}\right)} - 1\right], \qquad C_{\alpha,2-\alpha}(BSC) = \frac{1}{(\alpha-1)\ln 2}\left[2^{\alpha-1}\big(p^\alpha + (1-p)^\alpha\big) - 1\right].$$
Gaussian, G_q (α = 1):
$$I_{1,q}(X,Y) = \frac{1}{(1-q)\ln 2}\left[\prod_{x,y}\left(\frac{P_{X,Y}(x,y)}{P_X(x)P_Y(y)}\right)^{(1-q)P_{X,Y}(x,y)} - 1\right], \qquad C_{1,q}(BSC) = \frac{1}{(1-q)\ln 2}\left[2^{1-q}\, p^{(1-q)p}(1-p)^{(1-q)(1-p)} - 1\right].$$
In all rows, $E_0(\rho, P_X) = -\log\sum_y\left[\sum_x P_X(x)\, P_{Y|X}^{\frac{1}{1+\rho}}(y \mid x)\right]^{1+\rho}$ is the Gallager exponent (43).