Article

Essential Conditions for the Full Synergy of Probability of Occurrence Distributions

by Rubem P. Mondaini * and Simão C. de Albuquerque Neto
COPPE, Centre of Technology, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil
* Author to whom correspondence should be addressed.
Entropy 2022, 24(7), 993; https://doi.org/10.3390/e24070993
Submission received: 14 June 2022 / Revised: 11 July 2022 / Accepted: 12 July 2022 / Published: 18 July 2022
(This article belongs to the Special Issue Entropy and Its Applications across Disciplines III)

Abstract

In this contribution, we specify the conditions for assuring the validity of the synergy of the distribution of probabilities of occurrence. We also study the subsequent restriction on the maximal extension of the strict concavity region in the parameter space of Sharma–Mittal entropy measures, which was derived in a previous paper in this journal. The present paper is thus a necessary complement to that publication. The techniques introduced here are applied to protein domain families (Pfam databases, versions 27.0 and 35.0), and the results give evidence of their usefulness for testing the classification work performed with the methods of alignment that are used by expert biologists.

1. Introduction

We have been working since 2015 on the problem of testing the alignment of protein domain families which are proposed by expert biologists and bioinformaticians. We have found that the use of selected entropy measures is highly effective for testing the results published by those professionals, and that these measures favour a rigorous ANOVA statistical analysis [1]. In order to reduce the search space for admissible values of the entropy measures, we have emphasized the need to work in the region of strict concavity of these entropies. This study has been undertaken in a previous work, and we present in Section 2 a summary of those developments. In the present work, we aim to complement the results of a previous publication [2]: a subsequent restriction of the parameter space has to be performed in order to guarantee the synergy of the probability distributions to be tested. Non-synergetic distributions are not worth working with, because they do not preserve the fundamental property of obtaining more information about amino acids from $t$-sets of columns than by summing up the information obtained from individual columns. In Section 3, a brief digression is made to introduce the Sharma–Mittal class of entropy measures. Section 4 emphasizes the aspects of synergy of the distributions and their consequences for the reduction of the parameter space of Sharma–Mittal entropies. In Section 5, we treat the analysis of the maximal extension of the parameter space, and we repeat the reduction process imposed by the requirement of fully synergetic distributions of Section 4. We conclude the paper in Section 6 by studying the relation between the Hölder and the generalized Khinchin–Shannon (GKS) inequalities.

2. The Construction of the Probabilistic Space

Let us consider a set of $m_f$ domains ($m_f$ rows) from a chosen family of protein domains. In order to associate a rectangular array with this family, to be taken as its representative in the probabilistic space we are constructing, we specify its number of columns as $n_f = n$. This means that among the $m_f$ rows, we disregard all rows whose number of amino acids satisfies $n_f < n$, and we preserve the $\bar{m}_f$ rows whose number of amino acids satisfies $n_f \geq n$, but disregard $(n_f - n)$ amino acids in these $\bar{m}_f$ rows. We then choose $m$ rows from among the $\bar{m}_f$ rows to obtain $m \times n$ rectangular arrays. There are $\bar{m}_f!/[m!\,(\bar{m}_f - m)!]$ of these $m \times n$ rectangular arrays. Any one of them can be used as a representative of the domain family to be analysed in the statistical procedure to be implemented.
The next step is to assign a joint probability of occurrence of a set of variables $a_1, \ldots, a_t$ in columns $j_1, \ldots, j_t$, to be given by

$$p_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{n_{j_1 \ldots j_t}(a_1, \ldots, a_t)}{m}\,, \tag{1}$$

where $n_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ stands for the number of occurrences of the set $a_1, \ldots, a_t$ in the $t$ columns of the $m \times t$ subarray of the representative $m \times n$ array ($1 \leq t \leq n$). The symbols $a_1, \ldots, a_t$ will be running over the letters of the one-letter code for the twenty amino acids: $a_j$ ($1 \leq j \leq t$) $\in$ {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}.
We then have

$$\sum_{a_1, \ldots, a_t} n_{j_1 \ldots j_t}(a_1, \ldots, a_t) \equiv m. \tag{2}$$
We also introduce the conditional probabilities of occurrence, which are given implicitly by

$$p_{j_1 \ldots j_t}(a_1, \ldots, a_t) \equiv p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\, p_{j_t}(a_t), \tag{3}$$

where $p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)$ is the probability of occurrence of the amino acids in the columns $j_1, \ldots, j_{t-1}$, if the distribution of amino acids in the $j_t$-th column is known a priori.
Bayes' law for probabilities of occurrence [2,3] can be written as

$$\begin{aligned} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) &\equiv p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\, p_{j_t}(a_t) \\ &= p_{j_t j_1 \ldots j_{t-1}}(a_t | a_1, \ldots, a_{t-1})\, p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1}) \\ &= p_{j_t j_{t-1} j_1 \ldots j_{t-2}}(a_t, a_{t-1} | a_1, \ldots, a_{t-2})\, p_{j_1 \ldots j_{t-2}}(a_1, \ldots, a_{t-2}) \\ &= \cdots \\ &= p_{j_t \ldots j_3 j_1 j_2}(a_t, \ldots, a_3 | a_1, a_2)\, p_{j_1 j_2}(a_1, a_2) \\ &= p_{j_t \ldots j_2 j_1}(a_t, \ldots, a_2 | a_1)\, p_{j_1}(a_1) = p_{j_t \ldots j_1}(a_t, \ldots, a_1). \end{aligned} \tag{4}$$

The equality of the first three right-side members, as well as that of the last three, corresponds to the application of Bayes' law [2,3]. The symmetries of the joint probability distribution $p_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ are due to the ordering of the columns for the distributions of amino acids.
From the ordering $j_1 < j_2 < \cdots < j_t$, the values assumed by the variables $j_1, \ldots, j_t$ are respectively given by

$$\begin{aligned} j_1 &= 1, 2, \ldots, n-t+1 \\ j_2 &= j_1+1, j_1+2, \ldots, n-t+2 \\ &\;\;\vdots \qquad\qquad\qquad (1 \leq t \leq n) \\ j_{t-1} &= j_{t-2}+1, j_{t-2}+2, \ldots, n-1 \\ j_t &= j_{t-1}+1, j_{t-1}+2, \ldots, n. \end{aligned} \tag{5}$$

We then have $\binom{n}{t} = \frac{n!}{t!\,(n-t)!}$ geometric objects $p_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ of $t$ columns and $(20)^t$ components each.
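As a concrete illustration of this construction, the following Python sketch builds a joint probability of occurrence from a toy $m \times n$ alignment block; the block itself and the column choices are illustrative assumptions of ours, not data from the paper.

```python
from collections import Counter
from math import comb

# Toy m x n alignment block (m = 6 rows, n = 4 columns); in the paper the rows
# would come from a Pfam protein domain family (illustrative letters only).
block = ["ACDA", "ACDG", "AKDG", "GKDA", "ACEA", "AKEG"]
m, n = len(block), len(block[0])

def joint_prob(block, cols):
    """Joint probability of occurrence p_{j1...jt}(a1,...,at), Equation (1):
    the count of each t-set of amino acids in the chosen columns, over m."""
    counts = Counter(tuple(row[j] for j in cols) for row in block)
    return {tset: n_occ / len(block) for tset, n_occ in counts.items()}

# The t = 2 object for columns (j1, j2) = (0, 1); the counts sum to m,
# Equation (2), so the probabilities sum to 1.
p_01 = joint_prob(block, (0, 1))
assert abs(sum(p_01.values()) - 1.0) < 1e-12
print(p_01)

# There are C(n, t) such geometric objects, each with at most (20)^t components.
t = 2
print(comb(n, t), "objects of", t, "columns each")
```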

3. The Sharma–Mittal Class of Entropy Measures

As emphasized in Ref. [2], the introduction of random variable functions such as entropy measures associated with the probabilities of occurrence is suitable for providing an analysis of the evolution of these probabilities through the regions of the parameter space of entropies. The class of Sharma–Mittal entropy measures seems to be particularly well adapted to this task when related to the occurrence of amino acids in the objects $p_{j_1 \ldots j_t}(a_1, \ldots, a_t)$. The thermodynamic interpretation of the notion of entropy greatly helps to classify the distribution of its values associated with protein domain databases and to interpret its evolution through the Fokker–Planck equations to be treated in forthcoming articles in this line of research.
The two-parameter Sharma–Mittal class of entropy measures is usually given by

$$(SM)_{j_1 \ldots j_t}(r, s) = \frac{\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r}\,; \qquad (SM)_{j_t}(r, s) = \frac{\left(\alpha_{j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r}\,, \tag{6}$$

where

$$\alpha_{j_1 \ldots j_t}(s) = \sum_{a_1, \ldots, a_t} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s\,; \qquad \alpha_{j_t}(s) = \sum_{a_t} \left(p_{j_t}(a_t)\right)^s. \tag{7}$$
The parameters $r$, $s$ must bound a region corresponding to strict concavity in the parameter space. A necessary requirement to be satisfied [3] is

$$\frac{\partial^2 (SM)_{j_1 \ldots j_t}(r,s)}{\partial \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^2} = s \left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{s-r}{1-s}} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^{s-2} \left[\frac{s(s-r)}{(1-s)^2}\, \hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t) - 1\right] < 0, \tag{8}$$

where $\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t)$ stands for the escort probability associated with the joint probability $p_{j_1 \ldots j_t}(a_1, \ldots, a_t)$, or,

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{\left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s}{\alpha_{j_1 \ldots j_t}(s)}\,; \qquad \hat{p}_{j_t}(a_t) = \frac{\left(p_{j_t}(a_t)\right)^s}{\alpha_{j_t}(s)}. \tag{9}$$
Equation (8) leads to

$$r \geq s > 0. \tag{10}$$
Some special cases of one-parameter entropies are commonplace in the scientific literature [3,4,5,6,7,8,9]:

The $r = s$ region is the domain of the Havrda–Charvat [6] entropy measure $H_{j_1 \ldots j_t}(s)$,

$$H_{j_1 \ldots j_t}(s) = \frac{\alpha_{j_1 \ldots j_t}(s) - 1}{1-s}. \tag{11}$$

The $r = 2-s$, $0 < s \leq 1$, region will stand for the domain of the Landsberg–Vedral [7] entropy measure $L_{j_1 \ldots j_t}(s)$,

$$L_{j_1 \ldots j_t}(s) = \frac{\alpha_{j_1 \ldots j_t}(s) - 1}{(1-s)\,\alpha_{j_1 \ldots j_t}(s)} \equiv \frac{H_{j_1 \ldots j_t}(s)}{\alpha_{j_1 \ldots j_t}(s)}. \tag{12}$$
The Rényi $R_{j_1 \ldots j_t}(s)$ [8] and the “non-extensive” Gaussian $G_{j_1 \ldots j_t}(r)$ [9] entropy measures are obtained from limit processes:

$$R_{j_1 \ldots j_t}(s) \equiv \lim_{r \to 1} \frac{\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r} = \lim_{r \to 1} \frac{\frac{d}{dr}\left[\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1\right]}{\frac{d}{dr}(1-r)} = \lim_{r \to 1} \frac{e^{\frac{1-r}{1-s} \log \alpha_{j_1 \ldots j_t}(s)} \cdot \left(-\frac{1}{1-s}\right) \log \alpha_{j_1 \ldots j_t}(s)}{-1} = \frac{\log \alpha_{j_1 \ldots j_t}(s)}{1-s}\,, \tag{13}$$

$$G_{j_1 \ldots j_t}(r) \equiv \lim_{s \to 1} \frac{\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r} = \frac{1}{1-r}\left[\exp\left((1-r) \lim_{s \to 1} \frac{\frac{d}{ds} \log \alpha_{j_1 \ldots j_t}(s)}{\frac{d}{ds}(1-s)}\right) - 1\right] = \frac{1}{1-r}\left[\exp\left(-(1-r) \lim_{s \to 1} \frac{\frac{d}{ds}\, \alpha_{j_1 \ldots j_t}(s)}{\alpha_{j_1 \ldots j_t}(s)}\right) - 1\right]. \tag{14}$$
After using the definition of $\alpha_{j_1 \ldots j_t}(s)$, Equation (7), and $\lim_{s \to 1} \alpha_{j_1 \ldots j_t}(s) = 1$ from Equations (1) and (2), we get:

$$G_{j_1 \ldots j_t}(r) = \frac{e^{(1-r)\, S_{j_1 \ldots j_t}} - 1}{1-r}\,,$$

where $S_{j_1 \ldots j_t}$ is the Gibbs–Shannon entropy measure

$$S_{j_1 \ldots j_t} = -\sum_{a_1, \ldots, a_t} p_{j_1 \ldots j_t}(a_1, \ldots, a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_t). \tag{15}$$
The Gibbs–Shannon entropy measure, Equation (15), is also obtained by taking the convenient limits of the special cases of Sharma–Mittal entropies, Equations (11)–(14):

$$\lim_{s \to 1} H_{j_1 \ldots j_t}(s) = \lim_{s \to 1} L_{j_1 \ldots j_t}(s) = \lim_{s \to 1} R_{j_1 \ldots j_t}(s) = \lim_{r \to 1} G_{j_1 \ldots j_t}(r) = S_{j_1 \ldots j_t}. \tag{16}$$
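These limits can be checked numerically. The sketch below is our own illustration, with an arbitrary toy distribution: it evaluates the special cases of Equations (11)–(14) close to their limit points and compares them with the Gibbs–Shannon value of Equation (15).

```python
import numpy as np

# Toy probability-of-occurrence vector, standing in for a p_{j1...jt} object.
p = np.array([0.5, 0.25, 0.125, 0.125])

def alpha(p, s):
    """alpha(s) = sum_a p(a)^s, Equation (7)."""
    return np.sum(p ** s)

def gibbs_shannon(p):
    """Gibbs-Shannon entropy, Equation (15)."""
    return -np.sum(p * np.log(p))

def havrda_charvat(p, s):       # Equation (11), the r = s case
    return (alpha(p, s) - 1) / (1 - s)

def landsberg_vedral(p, s):     # Equation (12), the r = 2 - s case
    return (alpha(p, s) - 1) / ((1 - s) * alpha(p, s))

def renyi(p, s):                # Equation (13), the r -> 1 limit
    return np.log(alpha(p, s)) / (1 - s)

def gaussian(p, r):             # the closed form below Equation (14), s -> 1
    return (np.exp((1 - r) * gibbs_shannon(p)) - 1) / (1 - r)

# Equation (16): all four special cases converge to the Gibbs-Shannon value.
s = r = 1 - 1e-7
print(havrda_charvat(p, s), landsberg_vedral(p, s), renyi(p, s),
      gaussian(p, r), gibbs_shannon(p))
```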
We shall analyse in the next section the structure of the two-parameter space of Sharma–Mittal entropies by taking these special cases into consideration.

We now recall that, in the limit of the Gibbs–Shannon entropy, a conditional entropy measure is defined [3] by

$$S_{j_1 \ldots j_{t-1} | j_t} = -\sum_{a_1, \ldots, a_t} p_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t) \log p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t). \tag{17}$$
We then have, analogously, the conditional Sharma–Mittal entropy measure [3]

$$(SM)_{j_1 \ldots j_{t-1} | j_t} = \frac{\left[\sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t) \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s\right]^{\frac{1-r}{1-s}} - 1}{1-r}. \tag{18}$$
It is easy to show by a straightforward calculation that, analogously to Equation (16), we will have

$$\lim_{s \to 1} \lim_{r \to s} (SM)_{j_1 \ldots j_{t-1} | j_t} = \lim_{s \to 1} \lim_{r \to 2-s} (SM)_{j_1 \ldots j_{t-1} | j_t} = \lim_{s \to 1} \lim_{r \to 1} (SM)_{j_1 \ldots j_{t-1} | j_t} = \lim_{r \to 1} \lim_{s \to 1} (SM)_{j_1 \ldots j_{t-1} | j_t} = S_{j_1 \ldots j_{t-1} | j_t}. \tag{19}$$
From Equations (6), (7) and (18) and the application of Bayes' law, Equation (4), we can write

$$(SM)_{j_1 \ldots j_t} = (SM)_{j_t} + (SM)_{j_1 \ldots j_{t-1} | j_t} + (1-r)\,(SM)_{j_t}\,(SM)_{j_1 \ldots j_{t-1} | j_t}. \tag{20}$$
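The composition law of Equation (20) can be verified numerically. The following sketch is an illustration under the assumption of a random $t = 2$ joint distribution: it computes the escort-weighted conditional $\alpha$ of Equation (18) and confirms the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint probability of occurrence for two columns (t = 2), playing
# the role of p_{j1 j2}(a1, a2) over a reduced alphabet of 5 letters.
P = rng.random((5, 5))
P /= P.sum()

r, s = 1.7, 0.6     # an arbitrary point of the (s, r) parameter space

def sm_from_alpha(a, r, s):
    """Sharma-Mittal entropy as a function of alpha(s), Equation (6)."""
    return (a ** ((1 - r) / (1 - s)) - 1) / (1 - r)

alpha_joint = np.sum(P ** s)          # alpha_{j1 j2}(s)
p2 = P.sum(axis=0)                    # marginal p_{j2}(a2)
alpha_2 = np.sum(p2 ** s)             # alpha_{j2}(s)

# Escort-weighted conditional alpha, the bracket of Equation (18).
p_cond = P / p2                       # p_{j1 j2}(a1 | a2), Equation (3)
escort2 = p2 ** s / alpha_2           # escort probability, Equation (9)
alpha_cond = np.sum(escort2 * p_cond ** s)

sm_joint = sm_from_alpha(alpha_joint, r, s)
sm_2 = sm_from_alpha(alpha_2, r, s)
sm_cond = sm_from_alpha(alpha_cond, r, s)

# Composition law, Equation (20): joint = marginal + conditional + cross term.
rhs = sm_2 + sm_cond + (1 - r) * sm_2 * sm_cond
assert np.isclose(sm_joint, rhs)
print(sm_joint, rhs)
```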

4. Aspects of Synergy and the Reduction of the Parameter Space for Fully Synergetic Distributions

For the Gibbs–Shannon entropy measure, the inequality written by A. Y. Khinchin [3,10] is

$$S_{j_1 \ldots j_{t-1} | j_t} \leq S_{j_1 \ldots j_{t-1}}. \tag{21}$$
This inequality would be described by Khinchin as: “On average, the a priori knowledge of the distribution on the column $j_t$ can only decrease the uncertainty of the distribution on the columns $j_1, \ldots, j_{t-1}$”. We can write an analogous inequality for the Sharma–Mittal class of entropies:

$$(SM)_{j_1 \ldots j_{t-1} | j_t} \leq (SM)_{j_1 \ldots j_{t-1}}. \tag{22}$$
We then get from Equations (20) and (22)

$$(SM)_{j_1 \ldots j_t} \leq (SM)_{j_t} + (SM)_{j_1 \ldots j_{t-1}} + (1-r)\,(SM)_{j_t}\,(SM)_{j_1 \ldots j_{t-1}}. \tag{23}$$
After iterating this inequality, $t \to t-1 \to t-2 \to \cdots$, we can also write

$$(SM)_{j_1 \ldots j_t} \leq \frac{\prod_{l=1}^{t} \left[1 + (1-r)\,(SM)_{j_l}\right] - 1}{1-r}. \tag{24}$$
The inequalities in (21)–(24) are associated with what are called “synergetic conditions”. In this section, we also derive the fully synergetic conditions as GKS inequalities.
After using Equations (7) and (9) in Equation (23), we get

$$\frac{\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r} \leq \frac{\left(\alpha_{j_t}(s)\right)^{\frac{1-r}{1-s}} \cdot \left(\alpha_{j_1 \ldots j_{t-1}}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r}\,, \tag{25}$$

and after iteration and use of Equation (24),

$$\frac{\left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r} \leq \frac{\prod_{l=1}^{t} \left(\alpha_{j_l}(s)\right)^{\frac{1-r}{1-s}} - 1}{1-r}. \tag{26}$$
The hatched region of strict concavity in the parameter space of Sharma–Mittal entropies, $C = \{(s, r)\,|\,r \geq s > 0\}$, is depicted in Figure 1. The special cases corresponding to the Havrda–Charvat ($r = s$), Landsberg–Vedral ($r = 2-s$), Rényi ($r = 1$), and “non-extensive” Gaussian ($s = 1$) entropies are also represented.
We can identify three subregions in Figure 1. They will correspond to

$$R_{\mathrm{I}} = \{(s, r)\,|\,1 > r \geq s > 0\} \;\Rightarrow\; \alpha_{j_1 \ldots j_t}(s < 1) \leq \prod_{l=1}^{t} \alpha_{j_l}(s < 1), \tag{27}$$

$$R_{\mathrm{II}} = \{(s, r)\,|\,r \geq s > 1\} \;\Rightarrow\; \alpha_{j_1 \ldots j_t}(s > 1) \geq \prod_{l=1}^{t} \alpha_{j_l}(s > 1), \tag{28}$$

$$R_{\mathrm{III}} = \{(s, r)\,|\,r > 1 > s > 0\} \;\Rightarrow\; \alpha_{j_1 \ldots j_t}(s < 1) \leq \prod_{l=1}^{t} \alpha_{j_l}(s < 1), \tag{29}$$
where the ordering of the $\alpha$-symbols has been obtained from Equation (26). The subregions $R_{\mathrm{I}}$ and $R_{\mathrm{III}}$ are what we call fully synergetic subregions, and the corresponding inequalities are the GKS inequalities [2].
The subregions $R_{\mathrm{I}}$, $R_{\mathrm{II}}$, and $R_{\mathrm{III}}$ are depicted in Figure 2a–c, respectively. The union of the subregions $R_{\mathrm{I}}$ and $R_{\mathrm{III}}$ is the fully synergetic Khinchin–Shannon restriction to be imposed on the strict concavity region of Figure 1, and it is depicted in Figure 2d below.
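The sketch below illustrates these orderings on a deliberately correlated toy distribution of our own. In line with the theme of this paper, the GKS ordering is a property to be checked for each distribution rather than an unconditional theorem, so the toy example is chosen so that it holds.

```python
import numpy as np

def alpha(p, s):
    """alpha(s) = sum p^s, Equation (7)."""
    return np.sum(p ** s)

def gks_gap(P, s):
    """prod_l alpha_{jl}(s) - alpha_{j1...jt}(s).
    A non-negative gap for s < 1 is the GKS ordering of Equations (27)/(29)."""
    t = P.ndim
    marginals = [P.sum(axis=tuple(ax for ax in range(t) if ax != l))
                 for l in range(t)]
    return np.prod([alpha(q, s) for q in marginals]) - alpha(P, s)

# A strongly correlated toy joint distribution over two columns:
# mostly diagonal (same letter in both columns), with a uniform background.
k = 20
P = 0.8 * np.eye(k) / k + 0.2 * np.ones((k, k)) / k**2

print(gks_gap(P, s=0.5))   # positive here: the fully synergetic ordering
print(gks_gap(P, s=2.0))   # negative here: the reversed ordering, Equation (28)
```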

5. The Maximal Extension of the Parameter Space and Its Reduction for Fully Synergetic Distributions

In Figure 1 and Figure 2d, we have depicted the structure of the strict concavity region for Sharma–Mittal entropy measures and its reduction to a subregion by the application of the requirement of fully synergetic distributions, respectively. Our analysis has used a coarse-grained approach to concavity, given by Equations (8) and (10). We now introduce some necessary refinements for characterizing the probabilities of occurrence in subarrays of $m$ rows and $t$ columns, $m \times t$. For $t$ columns, there are $(20)^t$ possibilities of occurrence of amino acids, which could be a large number; however, instead of counting individual amino acids, we can count groups of $t$-sets of amino acids ($\mu$-groups) which appear in the $m$ rows of the $m \times t$ array. We characterize these $\mu$-groups by $\mu = 1, \ldots, m$, from all $\mu$-groups equal ($\mu = 1$) to $m$ different $\mu$-groups ($\mu = m$). We also call $q_\mu$ the number of equal $t$-sets of a given $\mu$-group.
In Equation (2), the sum is over all the amino acids that make up the geometric object defined in Equation (1), the probability of occurrence. We can now perform the sum over $\mu$-groups and write

$$\sum_{a_1^{q_\mu}, \ldots, a_t^{q_\mu}} p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right) = \sum_{a_1^{q_\mu}, \ldots, a_t^{q_\mu}} \frac{n_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)}{m} = \sum_{\mu=1}^{m} \frac{q_\mu}{m} = 1, \tag{30}$$

where $a_1^{q_\mu}, \ldots, a_t^{q_\mu}$ are the $t$-sets of a $\mu$-group. We also have, from Equation (7),

$$\sum_{a_1^{q_\mu}, \ldots, a_t^{q_\mu}} \left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^s = \sum_{a_1^{q_\mu}, \ldots, a_t^{q_\mu}} \left(\frac{n_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)}{m}\right)^s = \sum_{\mu=1}^{m} \left(\frac{q_\mu}{m}\right)^s = \alpha_{j_1 \ldots j_t}(s). \tag{31}$$
From Equations (30) and (31), we can now proceed to the calculation of the Hessian matrix of the Sharma–Mittal entropy measures. For the first derivative of $(SM)_{j_1 \ldots j_t}$, we have

$$\frac{\partial (SM)_{j_1 \ldots j_t}}{\partial p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)} = \frac{s}{1-s} \left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{s-r}{1-s}} \left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^{s-1}. \tag{32}$$
We then have for a generic element of the Hessian matrix [2]

$$H_{q_\mu q_\nu} = \frac{\partial^2 (SM)_{j_1 \ldots j_t}}{\partial p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right) \partial p_{j_1 \ldots j_t}\!\left(a_1^{q_\nu}, \ldots, a_t^{q_\nu}\right)} = s \left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{s-r}{1-s}} \left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^{s-2} \left[\frac{s(s-r)}{(1-s)^2} \frac{p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)}{p_{j_1 \ldots j_t}\!\left(a_1^{q_\nu}, \ldots, a_t^{q_\nu}\right)}\, \hat{p}_{j_1 \ldots j_t}\!\left(a_1^{q_\nu}, \ldots, a_t^{q_\nu}\right) - \delta_{\mu\nu}\right], \tag{33}$$

where $\hat{p}_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)$ is the escort probability associated to $p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)$, or

$$\hat{p}_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right) \equiv \frac{\left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^s}{\sum_{b_1^{q_\nu}, \ldots, b_t^{q_\nu}} \left(p_{j_1 \ldots j_t}\!\left(b_1^{q_\nu}, \ldots, b_t^{q_\nu}\right)\right)^s}. \tag{34}$$
The principal minors are given by

$$\det H_{q_\mu q_\nu}\ (\mu, \nu = 1, \ldots, k) = (-1)^{k-1} s^k \left(\alpha_{j_1 \ldots j_t}(s)\right)^{\frac{k(s-r)}{1-s}} \prod_{\mu=1}^{k} \left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^{s-2} \cdot \left[\frac{s(s-r)}{(1-s)^2} \sum_{\mu=1}^{k} \hat{p}_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right) - 1\right], \quad k = 1, \ldots, m, \tag{35}$$
and we have

$$\sum_{\mu=1}^{k} \hat{p}_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right) = \frac{\sum_{\mu=1}^{k} \left(p_{j_1 \ldots j_t}\!\left(a_1^{q_\mu}, \ldots, a_t^{q_\mu}\right)\right)^s}{\sum_{\mu=1}^{m} \left(\frac{q_\mu}{m}\right)^s} = \frac{\sum_{\mu=1}^{k} \left(\frac{q_\mu}{m}\right)^s}{\sum_{\mu=1}^{m} \left(\frac{q_\mu}{m}\right)^s} \equiv \sigma_k(s), \tag{36}$$

according to Equation (31).
From Equations (35) and (36), the requirement of strict concavity will lead to

$$\frac{s(s-r)}{(1-s)^2}\, \sigma_k(s) - 1 < 0. \tag{37}$$
We then have

$$\det H_{q_\mu q_\nu}\ (\mu, \nu = 1, \ldots, k) \begin{cases} < 0, & k \text{ odd}; \\ > 0, & k \text{ even}. \end{cases} \tag{38}$$
This corresponds to the criterion of negative definiteness of the Hessian matrix for the strict concavity of multivariate functions [11].
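As a concrete check (our own illustration; the $q_\mu$ values and the point $(r, s)$ are arbitrary assumptions), the sketch below assembles the Hessian of Equation (33) for a toy set of $\mu$-group probabilities and confirms the sign alternation of its principal minors, Equation (38).

```python
import numpy as np

# Toy mu-group occupation numbers q_mu for an m x t subarray (m = 8 rows,
# four mu-groups); illustrative values only, they must sum to m.
q = np.array([4, 2, 1, 1])
m = q.sum()
p = q / m                        # mu-group probabilities q_mu / m, Equation (30)

r, s = 0.9, 0.6                  # a point inside the strict concavity region

alpha = np.sum(p ** s)           # Equation (31)
escort = p ** s / alpha          # escort probabilities, Equation (34)

# Hessian with respect to the mu-group probabilities, Equation (33).
pref = s * alpha ** ((s - r) / (1 - s))
H = pref * p[:, None] ** (s - 2) * (
    (s * (s - r) / (1 - s) ** 2) * (p[:, None] / p[None, :]) * escort[None, :]
    - np.eye(len(p))
)

# Equation (38): principal minors alternate in sign, starting negative,
# i.e., the Hessian is negative definite and (SM) is strictly concave.
for k in range(1, len(p) + 1):
    minor = np.linalg.det(H[:k, :k])
    print(k, minor, "OK" if (minor < 0) == (k % 2 == 1) else "FAIL")
```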
Each $k$-value is associated with the $k$-epigraph region, which is the $k$-extension of the strict concavity region presented in Figure 1. These regions are given by

$$C_k^{\max} = \{(s, r)\,|\,r \geq s > 0\} \cup \left\{(s, r)\,\Big|\,s > r > s - \frac{(1-s)^2}{s\,\sigma_k(s)}\right\}, \quad k = 1, \ldots, m. \tag{39}$$
The greatest lower bound of the sequence of $k$-curves is obtained for $\sigma_m(s) = 1$. We then have

$$r_m(s) = s - \frac{(1-s)^2}{s} = 2 - \frac{1}{s}. \tag{40}$$
We can then write, for the maximal extended region of strict concavity,

$$C^{\max} = \{(s, r)\,|\,r \geq s > 0\} \cup \left\{(s, r)\,\Big|\,s > r > 2 - \frac{1}{s}\right\}. \tag{41}$$
The region corresponding to Equation (41) is depicted in Figure 3 below.
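The following sketch (with illustrative $q_\mu$ values of our own) evaluates the lower boundary curves of the $k$-epigraph regions of Equation (39) and shows that they increase with $k$ towards the limiting curve $r_m(s) = 2 - 1/s$ of Equation (40).

```python
import numpy as np

# Toy mu-group probabilities q_mu / m (illustrative values).
q = np.array([4, 2, 1, 1])
p = q / q.sum()

def sigma(p, k, s):
    """sigma_k(s), Equation (36): escort weight of the first k mu-groups."""
    return np.sum(p[:k] ** s) / np.sum(p ** s)

def r_k(p, k, s):
    """Lower boundary curve of the k-epigraph region, Equation (39)."""
    return s - (1 - s) ** 2 / (s * sigma(p, k, s))

s = 0.7
for k in range(1, len(p) + 1):
    print(k, r_k(p, k, s))

# sigma_m(s) = 1, so the k = m curve is the greatest of the lower bounds,
# r_m(s) = 2 - 1/s, Equation (40); the maximal region of Equation (41) is
# its epigraph joined with r >= s > 0.
print(2 - 1 / s)
```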
We are now ready to undertake the application of restrictions for fully synergetic distributions (validity of GKS inequalities) to the maximal strict concavity region of Figure 3.
We start by identifying two regions included in Figure 3. They will be given by

$$R_{\mathrm{IV}} = \left\{(s, r)\,\Big|\,1 > s > r \geq 2 - \frac{1}{s} > 0\right\} \;\Rightarrow\; \alpha_{j_1 \ldots j_t}(s < 1) \leq \prod_{l=1}^{t} \alpha_{j_l}(s < 1), \tag{42}$$

$$R_{\mathrm{V}} = \left\{(s, r)\,\Big|\,s > r \geq 2 - \frac{1}{s} > 1\right\} \;\Rightarrow\; \alpha_{j_1 \ldots j_t}(s > 1) \geq \prod_{l=1}^{t} \alpha_{j_l}(s > 1). \tag{43}$$
These regions are depicted in Figure 4a,b, respectively.
In order to find the reduced region corresponding to Figure 3, analogously to what has been done for Figure 1, we also need the subregions $R_{\mathrm{I}}$ and $R_{\mathrm{III}}$, Equations (27) and (29): the resulting subregion of fully synergetic distributions is given by $R_{\mathrm{IV}} \cup R_{\mathrm{I}} \cup R_{\mathrm{III}}$ and is depicted in Figure 5.
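For readers who wish to probe the parameter space programmatically, here is a minimal membership test for the maximal strict concavity region and for our reading of the fully synergetic reduction $R_{\mathrm{IV}} \cup R_{\mathrm{I}} \cup R_{\mathrm{III}}$; the treatment of boundary points is an assumption of ours.

```python
def in_strict_concavity_max(s, r):
    """Maximal strict concavity region C_max, Equation (41)."""
    return (r >= s > 0) or (s > r > 2 - 1 / s)

def in_fully_synergetic(s, r):
    """R_IV U R_I U R_III, Equations (27), (29) and (42): the fully
    synergetic (GKS) reduction depicted in Figure 5 (our reading)."""
    R1 = 1 > r >= s > 0
    R3 = r > 1 > s > 0
    R4 = 1 > s > r >= 2 - 1 / s > 0
    return R1 or R3 or R4

# A few probes of the (s, r) parameter space:
for (s, r) in [(0.5, 0.8), (2.0, 3.0), (0.8, 0.78), (0.9, 1.5), (0.4, 0.1)]:
    print((s, r), in_strict_concavity_max(s, r), in_fully_synergetic(s, r))
```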

6. Hölder Inequalities and GKS Inequalities: A Possible Conjecture

In this section, we study the relation between the GKS inequalities [2] and the Hölder inequalities by using examples of distributions obtained from databases of protein domain families. To start, some definitions and properties of the probabilistic space are in order.
Let us first introduce the definition of the conditional probability of occurrence of the escort probability of occurrence [12]. This is a simple application of Equation (3) to escort probabilities:

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t) = \frac{\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t)}{\hat{p}_{j_t}(a_t)}. \tag{44}$$
From the definitions of the escort probabilities, Equation (9), we can write

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_t) = \frac{\left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s}{\sum_{b_1, \ldots, b_t} \left(p_{j_1 \ldots j_t}(b_1, \ldots, b_t)\right)^s}\,, \tag{45}$$

and

$$\hat{p}_{j_t}(a_t) = \frac{\left(p_{j_t}(a_t)\right)^s}{\sum_{b_t} \left(p_{j_t}(b_t)\right)^s}. \tag{46}$$

In Equations (44)–(46), the symbols $a_1, \ldots, a_t$; $b_1, \ldots, b_t$ assume the representative letters of the one-letter code for the 20 amino acids: $a_j$; $b_j$ ($1 \leq j \leq t$) $\in$ {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}.
After substituting Equations (45) and (46) into Equation (44), we get

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t) = \frac{\left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s \left(p_{j_t}(a_t)\right)^s}{\sum_{b_1, \ldots, b_t} \left(p_{j_t}(b_t)\right)^s \left(p_{j_1 \ldots j_t}(b_1, \ldots, b_{t-1} | b_t)\right)^s} \cdot \left(\hat{p}_{j_t}(a_t)\right)^{-1}, \tag{47}$$
and from Equation (46)

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t) = \frac{\left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s}{\sum_{b_1, \ldots, b_t} \hat{p}_{j_t}(b_t) \left(p_{j_1 \ldots j_t}(b_1, \ldots, b_{t-1} | b_t)\right)^s}. \tag{48}$$
We also write the definition of the escort probability of occurrence of the conditional probability of occurrence [12]:

$$\widehat{p_{j_1 \ldots j_t}}(a_1, \ldots, a_{t-1} | a_t) = \frac{\left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s}{\sum_{b_1, \ldots, b_{t-1}} \left(p_{j_1 \ldots j_t}(b_1, \ldots, b_{t-1} | a_t)\right)^s}. \tag{49}$$
We can check the definitions of Equations (48) and (49) through the equality of the two escort probabilities with the original conditional probability for $s = 1$:

$$\hat{p}_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\Big|_{s=1} = \widehat{p_{j_1 \ldots j_t}}(a_1, \ldots, a_{t-1} | a_t)\Big|_{s=1} = p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t). \tag{50}$$
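The equality at $s = 1$, Equation (50), is easy to confirm numerically; the sketch below (with an arbitrary toy joint distribution of our own) implements both constructions and also shows that they differ for $s \neq 1$, which motivates the comparisons that follow.

```python
import numpy as np

rng = np.random.default_rng(3)
P = rng.random((6, 6))
P /= P.sum()          # joint p_{j1 j2}(a1, a2) for t = 2
p_t = P.sum(axis=0)   # p_{j2}(a2)
cond = P / p_t        # p_{j1 j2}(a1 | a2)

def escort_of_conditional(cond, s):
    """Equation (49): escort of the conditional, normalized per a_t column."""
    return cond ** s / (cond ** s).sum(axis=0)

def conditional_of_escort(P, p_t, s):
    """Equations (44)-(46): conditional of the escort probabilities."""
    escort_joint = P ** s / (P ** s).sum()
    escort_t = p_t ** s / (p_t ** s).sum()
    return escort_joint / escort_t

s = 1.0
# Equation (50): at s = 1 both constructions coincide with the conditional.
assert np.allclose(escort_of_conditional(cond, s), cond)
assert np.allclose(conditional_of_escort(P, p_t, s), cond)

# For s != 1 they differ, which is the starting point of this section.
s = 0.5
print(np.max(np.abs(escort_of_conditional(cond, s)
                    - conditional_of_escort(P, p_t, s))))
```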
We should note that the denominators of the right-hand sides of Equations (48) and (49), or,

$$Z \equiv \sum_{a_1, \ldots, a_t} \hat{p}_{j_t}(a_t) \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s = \frac{\alpha_{j_1 \ldots j_t}(s)}{\alpha_{j_t}(s)}\,, \tag{51}$$

and

$$X(a_t) \equiv \sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right)^s, \tag{52}$$
will be equal if all the amino acids in the $j_t$ column are equal; for instance, if the $j_t$ column is given by

$$j_t \to \underbrace{(A, A, A, A, \ldots, A)}_{m}\,. \tag{53}$$

The unit vectors of probabilities $\hat{p}_{j_t}$ and $p_{j_t}$ will then also be equal and given by

$$(\hat{p}_{j_t})^T = (p_{j_t})^T = \underbrace{(1, 0, 0, 0, \ldots, 0)}_{20}\,. \tag{54}$$
This means that for this special case of an event of rare occurrence, we also have the equality of the conditional of the escort probability and the escort probability of the conditional probability, i.e., of the left-hand sides of Equations (48) and (49), respectively.
For a $j_t$ column with a generic distribution of amino acids, the denominators $Z$ and $X(a_t)$ of the right-hand sides of Equations (48) and (49) will no longer be equal. An ordering of these denominators should be decided from the probabilities of amino acid occurrence in a chosen protein domain family.
This study is undertaken with the help of the functions $Z$ and $X(a_t)$ of Equations (51) and (52) and of the functions $J$ and $U$, defined below:

$$J \equiv \sum_{a_1, \ldots, a_{t-1}} \left[\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right]^s, \tag{55}$$

$$U \equiv \sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\right)^s \equiv \alpha_{j_1 \ldots j_{t-1}}(s). \tag{56}$$
Our method will then be the comparison of pairs of functions in order to proceed with the search for the effect of fully synergetic distributions of amino acids.
There are six comparisons to study:
(I)

$$X(a_t) \;\lessgtr\; Z\,,$$

or

$$\frac{1}{\left(p_{j_t}(a_t)\right)^s} \sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s \;\lessgtr\; \frac{\alpha_{j_1 \ldots j_t}(s)}{\alpha_{j_t}(s)}\,; \quad p_{j_t}(a_t) \neq 0. \tag{57}$$
(II)

$$X(a_t) \;\lessgtr\; J\,,$$

or

$$\frac{1}{\left(p_{j_t}(a_t)\right)^s} \sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s \;\lessgtr\; \frac{1}{\left(\alpha_{j_t}(s)\right)^s} \sum_{a_1, \ldots, a_{t-1}} \left[\sum_{a_t} \left(p_{j_t}(a_t)\right)^{s-1} p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right]^s; \quad p_{j_t}(a_t) \neq 0. \tag{58}$$
(III)

$$X(a_t) \;\lessgtr\; U\,,$$

or

$$\frac{1}{\left(p_{j_t}(a_t)\right)^s} \sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s \;\lessgtr\; \alpha_{j_1 \ldots j_{t-1}}(s). \tag{59}$$
(IV)

$$U \;\lessgtr\; J\,,$$

or

$$\sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\right)^s \equiv \alpha_{j_1 \ldots j_{t-1}}(s) \;\lessgtr\; \frac{\mathcal{H}}{\left(\alpha_{j_t}(s)\right)^s}\,, \tag{60}$$

where $\mathcal{H}$ is defined by

$$\mathcal{H} \equiv \sum_{a_1, \ldots, a_{t-1}} \left[\sum_{a_t} \left(p_{j_t}(a_t)\right)^{s-1} p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right]^s. \tag{61}$$
(V)

$$J \;\lessgtr\; Z\,,$$

or

$$\sum_{a_1, \ldots, a_{t-1}} \left[\sum_{a_t} \hat{p}_{j_t}(a_t)\, p_{j_1 \ldots j_t}(a_1, \ldots, a_{t-1} | a_t)\right]^s \equiv \frac{\mathcal{H}}{\left(\alpha_{j_t}(s)\right)^s} \;\lessgtr\; \frac{\alpha_{j_1 \ldots j_t}(s)}{\alpha_{j_t}(s)}. \tag{62}$$
(VI)

$$U \;\lessgtr\; Z\,,$$

or

$$\sum_{a_1, \ldots, a_{t-1}} \left(p_{j_1 \ldots j_{t-1}}(a_1, \ldots, a_{t-1})\right)^s \equiv \alpha_{j_1 \ldots j_{t-1}}(s) \;\lessgtr\; \frac{\alpha_{j_1 \ldots j_t}(s)}{\alpha_{j_t}(s)}. \tag{63}$$
Equations (57)–(59) should be multiplied by $\left(p_{j_t}(a_t)\right)^s$, and after that each one has to be summed over $a_t$. We then have, respectively,

$$\alpha_{j_1 \ldots j_t}(s) \;\lessgtr\; \alpha_{j_1 \ldots j_t}(s), \tag{64}$$

$$\left(\alpha_{j_t}(s)\right)^{s-1} \alpha_{j_1 \ldots j_t}(s) \;\lessgtr\; \mathcal{H}, \tag{65}$$

$$\alpha_{j_1 \ldots j_t}(s) \;\lessgtr\; \alpha_{j_t}(s) \cdot \alpha_{j_1 \ldots j_{t-1}}(s). \tag{66}$$
Equations (60), (62) and (63) can be written, respectively, as

$$\left(\alpha_{j_t}(s)\right)^s \cdot \alpha_{j_1 \ldots j_{t-1}}(s) \;\lessgtr\; \mathcal{H}, \tag{67}$$

$$\mathcal{H} \;\lessgtr\; \left(\alpha_{j_t}(s)\right)^{s-1} \cdot \alpha_{j_1 \ldots j_t}(s), \tag{68}$$

$$\alpha_{j_t}(s) \cdot \alpha_{j_1 \ldots j_{t-1}}(s) \;\lessgtr\; \alpha_{j_1 \ldots j_t}(s). \tag{69}$$
Hölder's inequality, as applied to probabilities of occurrence [3], is written as

$$\frac{1}{\left(\alpha_{j_t}(s)\right)^s} \left[\sum_{a_t} \left(p_{j_t}(a_t)\right)^{s-1} p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right]^s \;\geq\; \frac{\sum_{a_t} \left(p_{j_1 \ldots j_t}(a_1, \ldots, a_t)\right)^s}{\alpha_{j_t}(s)}. \tag{70}$$
After multiplying by $\left(\alpha_{j_t}(s)\right)^s$ and summing over $a_1, \ldots, a_{t-1}$, we get

$$\mathcal{H} \;\geq\; \left(\alpha_{j_t}(s)\right)^{s-1} \alpha_{j_1 \ldots j_t}(s)\,, \quad s \leq 1. \tag{71}$$
We also define

$$O \equiv \left(\alpha_{j_t}(s)\right)^{s-1} \alpha_{j_1 \ldots j_t}(s), \tag{72}$$

$$B \equiv \left(\alpha_{j_t}(s)\right)^s \alpha_{j_1 \ldots j_{t-1}}(s). \tag{73}$$
We then summarize the results obtained:
  • Equation (64) is only an identity: $\alpha_{j_1 \ldots j_t}(s) = \alpha_{j_1 \ldots j_t}(s)$.
  • Equations (65) and (68) can be ordered by Hölder's inequality, Equations (70) and (71).
  • Equations (66) and (69) can be ordered by the GKS inequalities, corresponding to fully synergetic distributions of amino acids: $\alpha_{j_1 \ldots j_t}(s < 1) \leq \alpha_{j_t}(s < 1) \cdot \alpha_{j_1 \ldots j_{t-1}}(s < 1)$.
  • Equation (67) cannot be ordered without additional experimental/phenomenological information on the probabilities of occurrence, to be obtained from updated versions of the protein domain family databases [13].
We now collect the formulae obtained from the analysis performed in this section. Equations (65) and (68) are ordered by Hölder's inequality. We write

$$\mathcal{H} - O \;\geq\; 0\,, \quad s < 1. \tag{74}$$

Equations (66) and (69) are ordered by the GKS inequality. We write

$$B - O \;\geq\; 0\,, \quad s < 1. \tag{75}$$

After using Equation (73), we can write Equation (67) as

$$B - \mathcal{H} \;\lessgtr\; 0. \tag{76}$$
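To make the roles of these functions concrete, the following sketch (our own illustration with a random toy distribution; the paper works with 3-sets from Pfam families) computes $Z$, $J$, $U$, $\mathcal{H}$, $O$, and $B$ and evaluates the differences of Equations (74)–(76). Only the Hölder difference $\mathcal{H} - O \geq 0$ is guaranteed for every distribution with $s < 1$; the signs of $B - O$ and $B - \mathcal{H}$ must be read off from the data, which is precisely the point of this section.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random toy joint probability of occurrence for t = 2 columns; in the paper
# these would be 3-sets of contiguous columns from the PF01926 family.
P = rng.random((20, 20))
P /= P.sum()
s = 0.5

p_t = P.sum(axis=0)                       # marginal p_{jt}(at)
alpha_joint = np.sum(P ** s)              # alpha_{j1...jt}(s)
alpha_t = np.sum(p_t ** s)                # alpha_{jt}(s)
alpha_rest = np.sum(P.sum(axis=1) ** s)   # alpha_{j1...j_{t-1}}(s)

Z = alpha_joint / alpha_t                 # Equation (51)
H = np.sum((P @ p_t ** (s - 1)) ** s)     # Equation (61)
J = H / alpha_t ** s                      # Equation (55), via Equation (62)
U = alpha_rest                            # Equation (56)
O = alpha_t ** (s - 1) * alpha_joint      # Equation (72)
B = alpha_t ** s * alpha_rest             # Equation (73)

print("Z =", Z, " J =", J, " U =", U)
print("Hoelder, Equation (74): H - O =", H - O)   # guaranteed >= 0 for s < 1
print("GKS,     Equation (75): B - O =", B - O)   # must be checked on data
print("         Equation (76): B - H =", B - H)   # must be checked on data
```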
In Figure 6a,b, we have depicted the curves corresponding to the functions $\mathcal{H} - O$ and $B - O$ for seven 3-sets of contiguous columns and 80 rows, chosen from the databases Pfam 27.0 and Pfam 35.0, respectively. There are also inset figures in order to show the curves for $s \to 1$.
In Figure 7a,b, we do the same for the differences $B - \mathcal{H}$. We emphasize that, since $B - O = (B - \mathcal{H}) + (\mathcal{H} - O)$, for the 3-sets such that $B - \mathcal{H} \geq 0$, $0 \leq s \leq 1$, the GKS inequalities $B - O \geq 0$ will result from the validity of Hölder's inequality. We have worked with the PF01926 protein domain family to perform all the calculations.

7. Concluding Remarks

The first comment we want to make on the present work is about the possibility of working in a region of the parameter space that preserves both the strict concavity and the fully synergetic structure of the Sharma–Mittal class of entropy measure distributions, to be visited by the solutions of a new statistical mechanics approach. The usual work with Havrda–Charvat distributions describes the evolution along the boundary ($r = s$) of the region ($r \geq s > 0$) that was correctly considered to correspond to strict concavity, but which is also known to be non-synergetic for $s > 1$. We now have the opportunity to develop this statistical mechanics approach along an extended boundary, preserving strict concavity and providing for the study of the evolution of fully synergetic entropy distributions. A first sketch of these developments will be presented in a forthcoming publication.
With respect to Figure 6 and Figure 7, we could hypothesize that a failure to obtain the ordering of $B$ and $\mathcal{H}$ would be due to the poor alignment of some of the protein domain families we have been using; however, we are not yet confident enough to do so, because we would need much more information “in silico”, obtained from many other protein domain families. In other words, we expect that a good alignment of a protein domain family will result in the ordering of $B$ and $\mathcal{H}$, but we need to verify this on a large number of families from different Pfam versions before we proceed with a proposal of a method to improve the Pfam database. This looks promising for good scientific work in the line of research we have been aiming to introduce in Ref. [2] and in this contribution.

Author Contributions

Conceptualization, R.P.M. and S.C.d.A.N.; methodology, R.P.M. and S.C.d.A.N.; formal analysis, R.P.M. and S.C.d.A.N.; writing—original draft preparation, R.P.M.; writing—review and editing, R.P.M. and S.C.d.A.N.; visualization, R.P.M. and S.C.d.A.N.; supervision, R.P.M.; project administration, R.P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviation is used in this manuscript:

GKS  Generalized Khinchin–Shannon

References

  1. Mondaini, R.P.; de Albuquerque Neto, S.C. The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In Trends in Biomathematics: Modeling Cells, Flows, Epidemics, and the Environment; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2020; pp. 169–207.
  2. Mondaini, R.P.; de Albuquerque Neto, S.C. Alternative Entropy Measures and Generalized Khinchin–Shannon Inequalities. Entropy 2021, 23, 1618.
  3. Mondaini, R.P.; de Albuquerque Neto, S.C. Khinchin–Shannon Generalized Inequalities for “Non-additive” Entropy Measures. In Trends in Biomathematics: Mathematical Modeling for Health, Harvesting, and Population Dynamics; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2019; pp. 177–190.
  4. Beck, C. Generalized Information and Entropy Measures in Physics. Contemp. Phys. 2009, 50, 495–510.
  5. Sharma, B.D.; Mittal, D.P. New Non-additive Measures of Entropy for Discrete Probability Distributions. J. Math. Sci. 1975, 10, 28–40.
  6. Havrda, J.; Charvat, F. Quantification Method of Classification Processes. Concept of Structural α-entropy. Kybernetica 1967, 3, 30–35.
  7. Landsberg, P.T.; Vedral, V. Distributions and Channel Capacities in Generalized Statistical Mechanics. Phys. Lett. A 1998, 247, 211–217.
  8. Rényi, A. On Measures of Entropy and Information. In Contributions to the Theory of Statistics, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547–561.
  9. Oikonomou, T. Properties of the “Non-extensive Gaussian” Entropy. Phys. A 2007, 381, 155–163.
  10. Khinchin, A.Y. Mathematical Foundations of Information Theory; Dover Publications, Inc.: New York, NY, USA, 1957.
  11. Marsden, J.E.; Tromba, A. Vector Calculus, 6th ed.; W. H. Freeman and Company Publishers: New York, NY, USA, 2012.
  12. Mondaini, R.P.; de Albuquerque Neto, S.C. The Maximal Extension of the Strict Concavity Region on the Parameter Space for Sharma–Mittal Entropy Measures. In Trends in Biomathematics: Stability and Oscillations in Environmental, Social and Biological Models; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2022.
  13. Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar, G.A.; Sonnhammer, E.L.L.; Tosatto, S.C.E.; Paladin, L.; Raj, S.; Richardson, L.J.; et al. Pfam: The Protein Families Database in 2021. Nucleic Acids Res. 2021, 49, D412–D419.
Figure 1. The strict concavity region $C = \{(s, r)\,|\,r \geq s > 0\}$ of the Sharma–Mittal class of entropy measures. It is the epigraph of the curve $r = s$, which corresponds to the Havrda–Charvat entropy and is depicted in brown. The Landsberg–Vedral ($r = 2-s$), Rényi ($r = 1$), and “non-extensive” Gaussian ($s = 1$) curves are depicted in green, blue, and red, respectively.
Figure 2. Subregions of the strict concavity region $C = \{(s, r)\,|\,r \geq s > 0\}$ of the Sharma–Mittal class of entropy measures. (a) Khinchin–Shannon subregion $R_{\mathrm{I}}$, fully synergetic; (b) the non-synergetic subregion $R_{\mathrm{II}}$; (c) Khinchin–Shannon subregion $R_{\mathrm{III}}$, fully synergetic; (d) Khinchin–Shannon subregion $R_{\mathrm{I}} \cup R_{\mathrm{III}}$, fully synergetic. The reduced subregion $R_{\mathrm{I}} \cup R_{\mathrm{III}}$ of Figure 1 is obtained by taking into consideration fully synergetic distributions only.
Figure 3. The maximal strict concavity region of the Sharma–Mittal class of entropy measures. The hatched region is the epigraph of the curve $r = 2 - 1/s$, which is depicted in black. The Havrda–Charvat ($r = s$) curve is in brown. The Landsberg–Vedral ($r = 2-s$), Rényi ($r = 1$), and “non-extensive” Gaussian ($s = 1$) curves are depicted in green, blue, and red, respectively.
Figure 4. Subregions of the maximal strict concavity region of the Sharma–Mittal class of entropy measures (Figure 3). (a) Khinchin–Shannon subregion $R_{\mathrm{IV}}$, fully synergetic; (b) the non-synergetic subregion $R_{\mathrm{V}}$.
Figure 5. $R_{\mathrm{IV}} \cup R_{\mathrm{I}} \cup R_{\mathrm{III}}$ is the reduction of the region of Figure 3 obtained by taking into consideration the fully synergetic distributions only.
Figure 6. Hölder ($\mathcal{H} - O \geq 0$, $0 \leq s \leq 1$) distributions (dashed curves) and Khinchin–Shannon ($B - O \geq 0$, $0 \leq s \leq 1$) distributions (continuous curves) of the PF01926 protein domain family, obtained from (a) Pfam 27.0 and (b) Pfam 35.0. The top-right insets show details of the curves for $s \to 1$.
Figure 7. $B - \mathcal{H}$ differences of the PF01926 protein domain family, obtained from (a) Pfam 27.0 and (b) Pfam 35.0. The top-right insets show details of the curves for $s \to 1$. $B - \mathcal{H} \geq 0$, $\mathcal{H} - O \geq 0 \Rightarrow B - O \geq 0$.
