The Rényi Entropies Operate in Positive Semifields
Abstract
1. Introduction
- Artificial intelligence (AI) [5] is an extensive field in which applications that minimize costs or maximize utilities abound. Semifields and their dual orderings (see Section 2.2 and Section 2.2.3) provide a perspective for mixing these two kinds of valuations.
- Machine learning (ML) [6] makes heavy use of Probability Theory, which is built around the positive semifield of the non-negative reals with their standard algebra and the negative logarithms thereof (called log-probabilities or log-likelihoods, depending on the point of view). Both of these are positive semifields, as shown in Section 2.2, Section 3.1.1, and Section 3.2.1.
- Computational intelligence (CI) [7] makes heavy use of positive semirings in the guise of fuzzy semirings. Although semifields cannot be considered “fuzzy” for several technical reasons, the name is sometimes an umbrella term under which non-standard algebras are included, many of which are semifields, e.g., the morphological semifield of morphological processing and memories, a special case of the semifields in Section 3.1.
- Other applications of positive semifields not related to modeling intelligence include electrical network analysis and synthesis (see the example in Section 2.2), queuing theory [8], and flow shop scheduling [1].
2. Materials and Methods
2.1. The Shifted Rényi Entropy
- The properties of the Rényi entropy, therefore, stem from those of the mean, inversion and the logarithm.
- This is not merely a cosmetic change, since it has the potential to allow the simplification of issues and the discovery of new ones in dealing with the Rényi magnitudes. For instance, since the means are defined for all there cannot be any objection to considering negative values for the index of the shifted entropy. This motivates calling the Rényi spectrum (of entropy).
- The definition makes it also evident that the shifted cross-entropy seems to be the more general concept, given that the shifted entropy and divergence are clearly instances of it.
- Lemma (1) rewrites the entropies in terms of the geometric means, which is by no means the only possible rewriting. Indeed, some would say that the arithmetic mean is more natural, and this is the program of information theoretic learning [14], where it is explored under the guise of .
- The shifting clarifies the relationship between quantities around the Rényi entropy: given (4), from every measure of information an equivalent average probability emerges. In particular:
Definition 2. Let with Rényi spectrum . Then the equivalent probability function of is the Hartley inverse of over all values of
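The relation between the power means, the shifted entropy and the equivalent probability can be made concrete with a short sketch. It assumes natural logarithms and the convention that the shifted entropy of order r is the negative logarithm of the power mean of order r of the distribution weighted by itself; the function names (`power_mean`, `shifted_renyi_entropy`, `equivalent_probability`) are illustrative and do not come from the paper.

```python
import math

def power_mean(weights, values, r):
    """Weighted Hoelder (power) mean of order r over the support of the weights.

    Assumes every value with positive weight is strictly positive,
    which holds when a distribution is averaged against itself.
    """
    pairs = [(w, x) for w, x in zip(weights, values) if w > 0]
    if r == math.inf:
        return max(x for _, x in pairs)
    if r == -math.inf:
        return min(x for _, x in pairs)
    if r == 0:  # geometric mean as the limiting case
        return math.exp(sum(w * math.log(x) for w, x in pairs))
    return sum(w * x ** r for w, x in pairs) ** (1.0 / r)

def shifted_renyi_entropy(p, r):
    """Assumed convention: -log of the self-weighted power mean, in nats."""
    return -math.log(power_mean(p, p, r))

def equivalent_probability(p, r):
    """Hartley inverse of the entropy: exp(-H), i.e. the power mean itself."""
    return math.exp(-shifted_renyi_entropy(p, r))

p = [0.5, 0.25, 0.25]
spectrum = {r: shifted_renyi_entropy(p, r)
            for r in (-math.inf, -1, 0, 1, math.inf)}
```

Under these assumptions, order 0 reproduces Shannon's entropy, order -1 gives Hartley's entropy (the logarithm of the support size), and the spectrum is non-increasing in r, including for negative orders.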
2.2. Positive Semifields
2.2.1. Complete and Positive Dioids
2.2.2. Positive Semifields
- 1. There is a pair of completed semifields over
- 2. In addition to the individual laws as positive semifields, we have the modular laws:
- 3. Further, if is a positive dioid, then the inversion operation is a dual order isomorphism between the dual order structures and with the natural order of the original semifield a suborder of the first structure.
- If then , whence , and symmetrically for . That is, ⊤ is the neutral element of addition and a monoid.
- If , then , whence and symmetrically for . This proves that ⊥ is the maximum element of , to be defined below.
- Otherwise, for , we have ,
- If then , whence , and symmetrically for b.
- If but , then , whence .
- Otherwise, for , we have .
- if , since the natural order is compatible with multiplication we multiply by to obtain whence, by cancellation, , or else , so the order is the dual on inverses.
- We have that , otherwise which asserts that is the “top” of the inverted order. Likewise we read from that , that is is the “bottom” in .
- The dot notation, from [16], is a mnemonic for where the multiplication of the bottom and top goes:
- The case analysis for the operators in the dual semifield allows us to write their definition-by-cases as follows:
- This is important for calculations, but notice that and only differ in the corner cases.
- The notation to “speak” about these semirings tries to follow a convention reminiscent of that of boolean algebra, where the inversion is the complement ([17], Chapter 12).
- Note that and seem to operate on different “polarities” of the underlying set: if one operates on two numbers, the other operates on their inverses, while this is not so for the respective multiplications. This proves extremely important for modeling physical quantities and other concepts with these calculi (see example below).
- The completed max-times and min-times semifields.
- The completed max-plus (schedule algebra, polar algebra) and min-plus semifields (tropical algebra).
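As a hedged illustration of a dual pair over the completed non-negative reals, the sketch below takes ordinary addition as the lower addition and the inverse of the sum of inverses as the upper addition, and resolves the product of bottom (0) and top (∞) downwards or upwards respectively. The function names are hypothetical, and the reading of the dot convention (lower operations send bottom times top to bottom, upper ones to top) is an assumption. In the electrical reading, `lower_add` combines resistances in series and `upper_add` combines them in parallel.

```python
import math

INF = math.inf

def inv(a):
    """Inversion on [0, inf]: swaps the bottom 0 and the top inf."""
    if a == 0:
        return INF
    if a == INF:
        return 0.0
    return 1.0 / a

def lower_add(a, b):
    """Ordinary addition: 0 is neutral and inf is absorbing."""
    return a + b

def upper_add(a, b):
    """Dual addition on inverses: inf is neutral and 0 is absorbing."""
    return inv(inv(a) + inv(b))

def lower_mul(a, b):
    """Multiplication with 0 * inf resolved to the bottom, 0."""
    if a == 0 or b == 0:
        return 0.0
    return a * b

def upper_mul(a, b):
    """Multiplication with 0 * inf resolved to the top, inf."""
    if a == INF or b == INF:
        return INF
    return a * b

series = lower_add(6.0, 3.0)    # two resistances in series
parallel = upper_add(6.0, 3.0)  # the same two resistances in parallel
```

With these definitions, inversion carries each lower operation onto the corresponding upper one, for example `inv(lower_add(a, b))` agrees with `upper_add(inv(a), inv(b))`, and the two multiplications differ only on the pair 0 and ∞.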
2.2.3. A Construction for Positive Semifields
- 1. the pseudo-addition,
- 2. the pseudo-multiplication,
- 3. neutral element,
- 4. inverse, ,
- 1. if g is strictly monotone and increasing such that and , then a complete positive semifield whose order is aligned with that of is:
- 2. order-dually, if g is strictly monotone and decreasing such that and , then a complete positive semifield whose order is aligned with that of is
- if then is strictly monotone increasing whence , , and , and the complete positive semifield generated, order-aligned with , is:
- if then is strictly monotone decreasing whence , , and , and the complete positive semifield generated, order-aligned with , or dually aligned with , is:
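A minimal sketch of this kind of construction, assuming the usual g-calculus convention in which a generator g transports the standard operations: the pseudo-addition is g⁻¹(g(u) + g(v)), the pseudo-multiplication is g⁻¹(g(u)·g(v)), the unit is g⁻¹(1), and the inverse is g⁻¹(1/g(u)). The helper names `make_semifield` and `power_semifield` are hypothetical, and the sketch only covers finite, strictly positive arguments, leaving out the corner cases at 0 and ∞.

```python
def make_semifield(g, g_inv):
    """Build pseudo-operations from a strictly monotone generator g.

    Only valid here for finite, strictly positive arguments.
    """
    return {
        "add": lambda u, v: g_inv(g(u) + g(v)),
        "mul": lambda u, v: g_inv(g(u) * g(v)),
        "unit": g_inv(1.0),
        "inverse": lambda u: g_inv(1.0 / g(u)),
    }

def power_semifield(r):
    """Generator g(x) = x**r for a non-zero real order r."""
    return make_semifield(lambda x: x ** r, lambda y: y ** (1.0 / r))

increasing = power_semifield(2.0)   # g increasing: order aligned
decreasing = power_semifield(-1.0)  # g decreasing: order inverted

hypotenuse = increasing["add"](3.0, 4.0)  # (3**2 + 4**2) ** 0.5
harmonic = decreasing["add"](2.0, 2.0)    # 1 / (1/2 + 1/2)
```

For a power generator the pseudo-multiplication coincides with the ordinary product, so only the addition is deformed: order 2 yields a Euclidean-style sum, and order -1 yields the inverse of the sum of inverses.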
3. Results
3.1. Entropic Semifields
3.1.1. The Basic Entropic Semifield
3.1.2. Constructed Entropic Semifields
- for we get , and
- for we obtain ,
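To make the entropic semifields of this section more tangible, the following sketch transports the probability operations through Hartley's function h(p) = -log p, and then through its composition with a power function of order r. Natural logarithms, the function names, and the exact form of the order-r addition are assumptions made for illustration, not the paper's notation.

```python
import math

def hartley(p):
    """Information of a probability-like value; maps 0 to inf."""
    return math.inf if p == 0 else -math.log(p)

def hartley_inv(h):
    """Inverse of Hartley's function; maps inf back to 0."""
    return math.exp(-h)

def info_add(h1, h2):
    """Addition of informations: the image of adding the probabilities."""
    return hartley(hartley_inv(h1) + hartley_inv(h2))

def info_mul(h1, h2):
    """Multiplication of informations: the image of the product, a plain sum."""
    return h1 + h2

def info_add_r(h1, h2, r):
    """Assumed order-r addition for finite h1, h2 and non-zero r."""
    return -math.log(math.exp(-r * h1) + math.exp(-r * h2)) / r

h_quarter, h_half = hartley(0.25), hartley(0.5)
```

In this sketch the order is reversed with respect to probabilities (the smaller probability has the larger information), infinite information is neutral for `info_add`, and for large positive or negative r the order-r addition approaches the minimum or the maximum of its arguments, which is the min-plus and max-plus behaviour.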
3.2. Applications
3.2.1. Rewriting the Viterbi Algorithm in Semifields
- A starting distribution , where , is the probability of starting at state i.
- A transition matrix W, where is the probability of a transition from state i to state j, and
- An emission distribution , where is the probability of emitting symbol from state .
- Actual non-linear computations in the Viterbi algorithm take the form of linear operations over a particular semifield, used here to minimize costs in . This takes the form of a linear matrix equation, an instance of “linear processing” in a non-linear algebra.
- Secondly, and more importantly, because of Theorem 3 we know that the log-probabilities operated on in the Viterbi algorithm are actually Rényi entropies with index . Hence we conclude that the Viterbi algorithm is a quantitative, information entropy-processing algorithm.
- Even such a non-linear process as pruning using a threshold can be characterized and carried out by linear processing in the semifield. This is an instance of the fact that linear operations in process information in (“standard” algebraic) non-linear ways.
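The "linear processing in a non-linear algebra" reading can be sketched as follows, assuming the min-plus semifield over negative log-probabilities: the Viterbi recursion becomes a vector-matrix product in which minimum plays the role of addition and ordinary addition plays the role of multiplication. The function names and the toy model values are hypothetical; back-pointers for recovering the best path are omitted for brevity.

```python
import math

def neg_log(p):
    """Cost of a probability: -log p, with probability 0 mapped to inf."""
    return math.inf if p == 0 else -math.log(p)

def minplus_vecmat(v, M):
    """Vector-matrix product with (min, +) in place of (+, *)."""
    return [min(v[i] + M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

def viterbi_cost(pi, W, B, obs):
    """Cost (negative log-probability) of the best state path for obs."""
    c_pi = [neg_log(p) for p in pi]
    c_W = [[neg_log(p) for p in row] for row in W]
    c_B = [[neg_log(p) for p in row] for row in B]
    # Initial costs: start in state i and emit the first symbol.
    delta = [c_pi[i] + c_B[i][obs[0]] for i in range(len(pi))]
    for o in obs[1:]:
        moved = minplus_vecmat(delta, c_W)  # "linear" step in min-plus
        delta = [moved[j] + c_B[j][o] for j in range(len(moved))]
    return min(delta)

# Toy two-state model with three output symbols (illustrative values only).
pi = [0.6, 0.4]
W = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
best_cost = viterbi_cost(pi, W, B, obs)
```

Exponentiating the negated `best_cost` recovers the probability of the most likely state sequence, so the same recursion could be run in the max-times semifield on probabilities directly; the min-plus form avoids numerical underflow on long observation sequences.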
3.2.2. Rewriting the Hölder Means and Rényi Entropies with Semifields
- either ;
- or .
- The case where is reasoned out by duality, with being dual to , being the dual to , ∞ dual to 0, and the upper multiplication and addition duals to the lower multiplication and addition. In the following we just use this “duality” argument to solve the case for .
- The rest of the cases to analyze essentially have and whence their complements are non-null . This means that the actual summation in (38) is extended to , with and .
3.3. Discussion: A Conjecture on the Abundance of Semifields in AI, ML and CI Applications
- First, shifting the definition of the Rényi entropy by in [9] leads to a straightforward relation (5) between the power means of the probability distribution and the shifted Rényi entropy. For a given probability function or measure the evolution of entropy with resembles an information spectrum . In a procedure reminiscent of defining an inverse transform, we may consider an equivalent probability , which is the Hölder path of , .
- The function used by Rényi to define the generalized entropy, when shifted, is the composition of two functions: Hartley’s information function and the power function of order r, which are monotone and invertible in the extended non-negative reals . They are also bijections:
- The power function is a bijection over of the extended non-negative reals, and
- Hartley’s is a bijection between the extended reals and the extended non-negative reals.
- But in Construction 1 both the power function and Hartley’s prove to be isomorphisms of positive semifields, which are semirings whose multiplicative structure is that of a group, while the additive structure lacks additive inverses. Positive semifields are all naturally ordered and the power function respects this order within the non-negative reals, being an order isomorphism for generic power r. Importantly, positive semifields come in dually-ordered pairs and the expressions mixing operations from both members in the pair are reminiscent of boolean algebras.
- (a) The power function with actually generates a whole family of semifields related to emphasizing smaller (with small r) or bigger values (with big r) in the non-negative reals . Indeed, the traditional weighted means are explained by Construction 2 as being power-deformed arithmetic means, also known as Kolmogorov-Nagumo means with the power function as generators. These semirings come in dually-ordered pairs for orders r and whose orders are aligned or inverted with respect to that of . Indeed, (Corollaries 1 and 2).
- (b) However, Hartley’s function is a dual-order isomorphism, entailing that the new order in the extended reals is the opposite of that on the non-negative reals (Corollary 3). It actually mediates between the (extended) probability semifield and the semifield of informations, notated as a homage to Hartley as (Theorem 2).
- Since the composition of the power mean and Hartley’s information function produces the function that Rényi used for defining his information measures, and this is a dual-order semifield isomorphism, we can see that entropies are actually operated in modified versions of Hartley’s semifields which come in pairs, as all completed positive semifields do (Theorem 3).
- Many of the and semifields appear in domains that model intelligent behaviour. Among the applications, we list the following:
- In AI, maximizing utilities and minimizing costs is used by many applications and algorithms, e.g., heuristic search, to mimic “informed” behaviour ([5], Chapter 3), decision theory ([5], Chapter 16), uncertainty and probability modelling ([5], Chapters 13–15). In most applications , for multiplicatively-aggregated costs and utilities, and , for additively aggregated ones are being used. Note that both a semifield and its order-dual are needed to express mixed utility-cost expressions, as in electrical network analysis with resistances and conductances.
- In ML, itself is used to model uncertainty as probabilities and as log-probabilities. Sometimes the idempotent versions of these spaces , e.g., and , and , e.g., and , are used, e.g., for A* stack decoding to find best candidates [32]. The Viterbi algorithm operates in , the semifield of max entropies, as required by the application of decoding Markov models ([25], and Section 3.2.1). Although many of the problems and solutions cited above for AI can also be considered as part of ML, a recent branch of ML is solely based upon the Rényi entropy with , [14]. Importantly, recall that every possible Hölder mean can be expressed as the arithmetic mean of a properly exponentiated kernel, whence the importance of this particular Rényi entropy would come.
- In CI, the sub-semifield obtained by the restriction of the operations to appears as a ternary sub-semifield distinct from the Boolean semifield, which in turn appears as a binary sub-semifield of every complete semifield by restricting the carrier set to . As an important example, this ternary sub-semifield, as seen in Proposition 22 and Theorem 3, is pivotal in Spohn’s logical rank theory [33], which essentially leverages the isomorphism of semifields between and the in logical applications.
- Finally, and also related to signal processing, mathematical morphology and morphological processing need to operate in the dual pair for image processing applications [34].
3.4. Historical Notes: The Rise of Positive Semifields
- From Barnard’s approach, it seems natural to use only in the KN means, and this is what Rényi chose.
- It is more difficult to guess why he decided to place the “origin” of his entropies at Hartley’s () implying the harmonic average, instead of at Shannon’s , related to the geometric average.
- -zero:
- -unit:
- -addition:
- -subtraction:
- -multiplication:
- -division: for
- -order:
- First, the motivation for our construction, viz. the modelling of operations on information, is clearly more specialized than Grossman and Katz’s, as purported in the above-mentioned quote.
- Second, and more importantly, in their framework, as in Barnard’s paper, seems to be assumed monotone, and so the properties of antitone generators and their results (leading to the second half of the dual pair) in Pap’s g-calculus are downplayed.
4. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
AI | Artificial intelligence |
CI | Computational intelligence |
KN | Kolmogorov–Nagumo |
ML | Machine learning |
References
- Butkovič, P. Max-linear Systems. Theory and Algorithms; Monographs in Mathematics; Springer: Heidelberg, Germany, 2010.
- Gondran, M.; Minoux, M. Graphs, Dioids and Semirings. New Models and Algorithms; Operations Research Computer Science Interfaces Series; Springer: Heidelberg, Germany, 2008.
- Renyi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, Berkeley, CA, USA, 20 June–30 July 1960; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
- Golan, J.S. Semirings and Their Applications; Kluwer Academic: Dordrecht, The Netherlands, 1999.
- Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd international ed.; Artificial Intelligence; Prentice Hall: Upper Saddle River, NJ, USA, 2010.
- Murphy, K.P. Machine Learning. A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Engelbrecht, A.P. Computational Intelligence. An Introduction; Wiley: Hoboken, NJ, USA, 2002.
- Baccelli, F.; Cohen, G.; Olsder, G.; Quadrat, J. Synchronization and Linearity; Wiley: Hoboken, NJ, USA, 1992.
- Valverde-Albacete, F.J.; Peláez-Moreno, C. The case for shifting the Renyi entropy. Entropy 2019, 21, 46.
- Renyi, A. Probability Theory; Courier Dover Publications: Mineola, NY, USA, 1970.
- Pap, E. g-calculus. In Zbornik Radova Prirodno-Matematichkog Fakulteta. Serija za Matematiku. Review of Research. Faculty of Science. Mathematics Series; University of Novi Sad: Novi Sad, Serbia, 1993; pp. 145–156.
- Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: Cambridge, UK, 1952.
- Beck, C.; Schlögl, F. Thermodynamics of Chaotic Systems: An Introduction; Cambridge University Press: Cambridge, UK, 1995.
- Principe, J.C. Information Theoretic Learning; Information Science and Statistics; Springer: New York, NY, USA, 2010.
- Valverde-Albacete, F.J.; Peláez-Moreno, C. The Spectra of irreducible matrices over completed idempotent semifields. Fuzzy Sets Syst. 2015, 271, 46–69.
- Moreau, J.J. Inf-convolution, sous-additivité, convexité des fonctions numériques. J. Math. Pures Appl. 1970, 49, 109–154. (In French)
- Ellerman, D.P. Intellectual Trespassing as a Way of Life. Essays in Philosophy, Economics and Mathematics; Rowman & Littlefield Publishers Inc.: Lanham, MD, USA, 1995.
- Maslov, V.; Volosov, K. Mathematical Aspects of Computer Engineering; Mir: Moscow, Russia, 1988.
- Pap, E.; Ralević, N. Pseudo-Laplace transform. Nonlinear Anal. Theory Methods Appl. 1998, 33, 533–550.
- Mesiar, R.; Pap, E. Idempotent integral as limit of g-integrals. Fuzzy Sets Syst. 1999, 102, 385–392.
- Valverde-Albacete, F.J.; Peláez-Moreno, C. Towards Galois Connections over Positive Semifields. In Information Processing and Management of Uncertainty in Knowledge-Based Systems; Springer International Publishing: Heidelberg, Germany, 2016; CCIS Volume 611, pp. 81–92.
- Grabisch, M.; Marichal, J.L.; Mesiar, R.; Pap, E. Aggregation functions: Construction methods, conjunctive, disjunctive and mixed classes. Inf. Sci. 2011, 181, 23–43.
- Shannon, C.E. A mathematical theory of Communication. Bell Syst. Tech. J. 1948, XXVII, 379–423.
- Viterbi, A.J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 1967, 13, 260–269.
- Theodosis, E.; Maragos, P. Analysis of the Viterbi algorithm using tropical algebra and geometry. In Proceedings of the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC-18), Kalamata, Greece, 25–28 June 2018; pp. 1–5.
- Forney, G. The Viterbi algorithm. Proc. IEEE 1973, 61, 268–278.
- Rabiner, L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 1989, 77, 257–286.
- Bellman, R.; Kalaba, R. On the role of dynamic programming in statistical communication theory. IRE Trans. Inf. 1957, 3, 197–203.
- Cuninghame-Green, R. Minimax Algebra; Number 166 in Lecture notes in Economics and Mathematical Systems; Springer: Heidelberg, Germany, 1979.
- Valverde-Albacete, F.J.; Peláez-Moreno, C. The spectra of reducible matrices over complete commutative idempotent semifields and their spectral lattices. Int. J. Gen. Syst. 2016, 45, 86–115.
- Ganter, B.; Wille, R. Formal Concept Analysis: Mathematical Foundations; Springer: Berlin/Heidelberg, Germany, 1999.
- Paul, D.B. An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model. In Proceedings of the workshop on Speech and Natural Language, HLT ’91, Harriman, NY, USA, 23–26 February 1992; Association for Computational Linguistics: Morristown, NJ, USA, 1992; p. 405.
- Spohn, W. The Laws of Belief: Ranking Theory and Its Philosophical Applications; Oxford University Press: Oxford, UK, 2012.
- Ronse, C. Why mathematical morphology needs complete lattices. Signal Process. 1990, 21, 129–154.
- Barnard, G. The theory of information. J. R. Stat. Soc. Ser. B 1951, 13, 46–64.
- Harremoës, P. Interpretations of Rényi entropies and divergences. Phys. A Stat. Mech. Appl. 2005, 365, 57–62.
- Grossman, M.; Katz, R. Non-Newtonian Calculus; Lee Press: Pigeon Cove, MA, USA, 1972.
- Burgin, M. Nonclassical models of the natural numbers. Uspekhi Matemat. Nauk. 1977, 32, 209–210.
- Czachor, M. Relativity of arithmetic as a fundamental symmetry of physics. Quantum Stud. Math. Found. 2016, 3, 123–133.
- Czachor, M. Waves along fractal coastlines: from fractal arithmetic to wave equations. Acta Phys. Pol. B 2019, 50, 813–831.
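Mean Name | Mean | Shifted Entropy | Entropy Name | $\alpha$ | r
---|---|---|---|---|---
Maximum | $\max_i p_i$ | $-\log \max_i p_i$ | min-entropy | ∞ | ∞
Arithmetic | $\sum_i p_i^2$ | $-\log \sum_i p_i^2$ | Rényi’s quadratic | 2 | 1
Geometric | $\prod_i p_i^{p_i}$ | $-\sum_i p_i \log p_i$ | Shannon’s | 1 | 0
Harmonic | $1/n$ | $\log n$ | Hartley’s | 0 | −1
Minimum | $\min_i p_i$ | $-\log \min_i p_i$ | max-entropy | −∞ | −∞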
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Valverde-Albacete, F.J.; Peláez-Moreno, C. The Rényi Entropies Operate in Positive Semifields. Entropy 2019, 21, 780. https://doi.org/10.3390/e21080780