Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies
Abstract
1. Introduction
1.1. The Information Geometry Perspective
1.2. Challenge of Geometric Selection
1.3. Research Contributions and Scope
- Mathematical Framework: We establish a comprehensive mathematical foundation connecting generalized entropies, deformed logarithms, and Mirror Descent updates, providing explicit formulations for numerous well-established trace entropy families.
- Algorithmic Innovations: We derive novel Generalized Exponentiated Gradient (GEG) algorithms with generalized multiplicative updates that leverage the flexibility of hyperparameter-controlled deformed logarithms, enabling adaptation to problem geometry.
The significance of this research extends beyond algorithmic development: it opens new avenues for understanding the geometric foundations of optimization and provides practical tools for increasingly complex machine learning problems. The unifying theoretical framework connects optimization theory, information geometry, statistical physics, and practical machine learning; it opens new research directions and offers principled approaches to algorithm design that respect the natural geometric structure of optimization problems.
2. Preliminaries: Mirror Descent (MD) and Standard Exponentiated Gradient (EG) Updates
2.1. Problem Statement
2.2. Mirror Descent Update Rules and Geometric Interpretation
- Mapping to the dual space: $\boldsymbol{\theta}_t = \nabla \Phi(\mathbf{w}_t)$,
- Taking a gradient step in the dual space: $\tilde{\boldsymbol{\theta}}_{t+1} = \boldsymbol{\theta}_t - \eta_t \nabla L(\mathbf{w}_t)$,
- Mapping back to the primal space: $\mathbf{w}_{t+1} = \nabla \Phi^{*}(\tilde{\boldsymbol{\theta}}_{t+1})$.
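The three steps above can be sketched in code. The following is a minimal illustrative example (not the authors' implementation) that uses the negative Shannon entropy as the mirror map, for which the three steps collapse to the classical multiplicative EG update:

```python
import math

def mirror_descent_step(w, grad, eta):
    # Mirror map Phi(w) = sum_i w_i * log(w_i) (negative Shannon entropy).
    # Then grad Phi(w) = log(w) + 1 and grad Phi*(theta) = exp(theta - 1),
    # so the three MD steps reduce to w <- w * exp(-eta * grad).
    theta = [math.log(wi) + 1.0 for wi in w]            # 1) map to dual space
    theta = [t - eta * g for t, g in zip(theta, grad)]  # 2) dual gradient step
    return [math.exp(t - 1.0) for t in theta]           # 3) map back to primal

w_new = mirror_descent_step([0.5, 0.5], [1.0, -1.0], 0.1)
# equals the direct multiplicative update [0.5*exp(-0.1), 0.5*exp(0.1)]
```

With a different mirror map only the two mapping lines change; this is exactly the degree of freedom that the deformed logarithms of the later sections exploit.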
2.3. Continuous-Time Formulation and Natural Gradient Connection
2.4. Discrete Natural Gradient Form
2.5. Canonical Examples and Some Geometric Insights
2.6. Motivation for Using Parameterized Deformed Logarithms
- Adapt to statistical properties of training distributions.
- Interpolate between different geometries (e.g., Euclidean, exponential family, power-law).
- Provide automatic regularization through geometric bias.
- Enable systematic geometry exploration rather than ad-hoc selection.
3. Why Trace Entropies and Deformed Logarithms in MD and GEG?
- Domain: $\ln_\phi(x)$ is defined for $x \in (0, \infty)$;
- Strictly monotonically increasing: $\dfrac{d}{dx}\ln_\phi(x) > 0$ for $x > 0$;
- Concavity (optional): $\dfrac{d^2}{dx^2}\ln_\phi(x) \le 0$;
- Scaling and normalization: $\ln_\phi(1) = 0$, $\left.\dfrac{d}{dx}\ln_\phi(x)\right|_{x=1} = 1$;
- Duality: there exists a dual deformed logarithm $\widehat{\ln}_\phi$ such that $\ln_\phi(1/x) = -\widehat{\ln}_\phi(x)$ (e.g., $\ln_q(1/x) = -\ln_{2-q}(x)$ for the Tsallis $q$-logarithm).
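These conditions can be checked numerically for a concrete family. The sketch below (illustrative code, assuming the standard Tsallis q-logarithm $\ln_q(x) = (x^{1-q}-1)/(1-q)$) verifies normalization, unit slope at $x = 1$, and the $q \leftrightarrow 2-q$ duality:

```python
def ln_q(x, q):
    # Tsallis q-logarithm, ln_q(x) = (x**(1-q) - 1) / (1-q), for q != 1.
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

q = 0.7
# normalization: ln_q(1) = 0
assert abs(ln_q(1.0, q)) < 1e-12
# unit slope at x = 1 (central finite-difference check)
h = 1e-6
slope = (ln_q(1.0 + h, q) - ln_q(1.0 - h, q)) / (2 * h)
assert abs(slope - 1.0) < 1e-6
# duality: ln_q(1/x) = -ln_{2-q}(x)
x = 2.5
assert abs(ln_q(1.0 / x, q) + ln_q(x, 2.0 - q)) < 1e-12
```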
4. MD and GEG Updates Using the Tsallis Entropy and Its Extensions
4.1. Properties of the Tsallis q-Logarithm and q-Exponential
4.2. MD and GEG Updates Using the Tsallis q-Logarithm
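As an illustration of the updates in this section, the following sketch (hypothetical helper names, not the paper's code) implements a q-GEG step of the form $\mathbf{w}_{t+1} = \exp_q(\ln_q(\mathbf{w}_t) - \eta \nabla L(\mathbf{w}_t))$ with the standard Tsallis pair; as $q \to 1$ it recovers the classical EG update:

```python
def ln_q(x, q):
    # Tsallis q-logarithm (q != 1)
    return (x**(1 - q) - 1) / (1 - q)

def exp_q(x, q):
    # Tsallis q-exponential with the [.]_+ truncation:
    # exp_q(x) = max(1 + (1-q)x, 0) ** (1/(1-q))
    base = max(1 + (1 - q) * x, 0.0)
    return base ** (1.0 / (1 - q))

def geg_update(w, grad, eta, q):
    # One q-GEG multiplicative step in the q-deformed geometry;
    # for q -> 1 this approaches the classical EG update w * exp(-eta*grad).
    return [exp_q(ln_q(wi, q) - eta * gi, q) for wi, gi in zip(w, grad)]

w_new = geg_update([0.5], [1.0], eta=0.1, q=0.999)
# close to the classical EG value 0.5 * exp(-0.1)
```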
4.3. MD and EG Using Schwämmle–Tsallis (ST) Entropy
5. MD and GEG Using the Kaniadakis Entropy and Its Extensions and Generalizations
5.1. Basic Properties of the κ-Logarithm and κ-Exponential
5.2. MD and GEG Using the Kaniadakis Entropy and κ-Logarithm
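A minimal sketch of the Kaniadakis pair assumed by these updates, using the standard definitions $\ln_\kappa(x) = (x^\kappa - x^{-\kappa})/(2\kappa)$ and its inverse $\exp_\kappa(x) = (\kappa x + \sqrt{1+\kappa^2 x^2})^{1/\kappa}$ (illustrative code):

```python
import math

def ln_kappa(x, k):
    # Kaniadakis kappa-logarithm, 0 < |k| < 1:
    # ln_k(x) = (x**k - x**(-k)) / (2k) = sinh(k * ln x) / k
    return (x**k - x**(-k)) / (2 * k)

def exp_kappa(x, k):
    # Kaniadakis kappa-exponential, the inverse of ln_kappa:
    # exp_k(x) = (k*x + sqrt(1 + k^2 x^2)) ** (1/k)
    return (k * x + math.sqrt(1 + (k * x) ** 2)) ** (1.0 / k)

k = 0.3
# inverse pair: exp_k(ln_k(x)) = x
assert abs(exp_kappa(ln_kappa(2.0, k), k) - 2.0) < 1e-10
# self-duality: ln_k(1/x) = -ln_k(x)
assert abs(ln_kappa(0.5, k) + ln_kappa(2.0, k)) < 1e-12
```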
5.3. Two-Parameter Logarithms Based on Generalized Kaniadakis–Lissia–Scarfone Entropy
1. Numerical Stability: the KLS logarithm becomes increasingly stable as the deformation parameters approach zero, but it exhibits potential numerical instability for values of x far from unity.
2. Parameter Sensitivity: small x values create higher sensitivity to parameter changes, requiring careful numerical handling.
3. Convergence Properties: the limiting behavior as the deformation parameters vanish requires special computational treatment, e.g., via L’Hôpital’s rule.
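The third point can be handled in practice by switching to a truncated series near the singular parameter value. The sketch below (illustrative, assuming the standard KLS two-parameter logarithm $\ln_{\kappa,r}(x) = x^r (x^\kappa - x^{-\kappa})/(2\kappa)$) avoids the $0/0$ cancellation as $\kappa \to 0$:

```python
import math

def ln_kls(x, kappa, r):
    # KLS two-parameter logarithm:
    # ln_{kappa,r}(x) = x**r * (x**kappa - x**(-kappa)) / (2*kappa)
    #                 = x**r * sinh(kappa * ln x) / kappa.
    # Near kappa = 0 the direct formula suffers catastrophic cancellation,
    # so we switch to the series sinh(k*u)/k = u * (1 + (k*u)**2/6 + ...),
    # which also realizes the L'Hopital limit ln_{0,r}(x) = x**r * ln x.
    u = math.log(x)
    if abs(kappa * u) < 1e-4:
        s = u * (1.0 + (kappa * u) ** 2 / 6.0)   # truncated series branch
    else:
        s = math.sinh(kappa * u) / kappa          # direct branch
    return x**r * s

# stable even for tiny kappa, where the direct formula would lose digits
value = ln_kls(2.0, 1e-8, 0.5)
```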
5.4. Exponential KLS Function and Its Properties
6. Generalization and Normalization of Mirror Descent
7. Conclusions and Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Deformed Algebra and Calculus
Appendix A.1. q-Algebra and Calculus
- q-sum: $x \oplus_q y = x + y + (1-q)\,xy$
- Neutral element of the q-sum: $x \oplus_q 0 = x$
- q-subtraction: $x \ominus_q y = \dfrac{x - y}{1 + (1-q)\,y}$
- q-product: $x \otimes_q y = \left[x^{1-q} + y^{1-q} - 1\right]_+^{\frac{1}{1-q}}$
- Neutral element of the q-product: $x \otimes_q 1 = x$
- q-division: $x \oslash_q y = \left[x^{1-q} - y^{1-q} + 1\right]_+^{\frac{1}{1-q}}$
- Inverse of the q-product: $x \otimes_q (1 \oslash_q x) = 1$.
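The q-algebra is consistent with the q-deformed pair: for instance, $\ln_q(xy) = \ln_q(x) \oplus_q \ln_q(y)$ and $\ln_q(x \otimes_q y) = \ln_q(x) + \ln_q(y)$. A small numerical check (illustrative code):

```python
def ln_q(x, q):
    # Tsallis q-logarithm (q != 1)
    return (x**(1 - q) - 1) / (1 - q)

def q_sum(a, b, q):
    # q-sum: a (+)_q b = a + b + (1-q)*a*b
    return a + b + (1 - q) * a * b

def q_product(x, y, q):
    # q-product: x (x)_q y = [x**(1-q) + y**(1-q) - 1]_+ ** (1/(1-q))
    base = max(x**(1 - q) + y**(1 - q) - 1.0, 0.0)
    return base ** (1.0 / (1 - q))

q, x, y = 0.6, 2.0, 3.0
# ln_q turns ordinary products into q-sums ...
assert abs(ln_q(x * y, q) - q_sum(ln_q(x, q), ln_q(y, q), q)) < 1e-10
# ... and q-products into ordinary sums
assert abs(ln_q(q_product(x, y, q), q) - (ln_q(x, q) + ln_q(y, q))) < 1e-10
```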
Appendix A.2. κ–Algebra and Calculus
- κ-sum: $x \oplus_\kappa y = x\sqrt{1+\kappa^2 y^2} + y\sqrt{1+\kappa^2 x^2}$,
- Neutral element of the κ-sum: $x \oplus_\kappa 0 = x$,
- κ-subtraction: $x \ominus_\kappa y = x \oplus_\kappa (-y)$,
- κ-product: $x \otimes_\kappa y = \exp_\kappa\!\left(\ln_\kappa(x) + \ln_\kappa(y)\right)$,
- Admits unity as the neutral element of the κ-product: $x \otimes_\kappa 1 = x$.
- The κ-product is commutative: $x \otimes_\kappa y = y \otimes_\kappa x$,
- The κ-product is associative: $(x \otimes_\kappa y) \otimes_\kappa z = x \otimes_\kappa (y \otimes_\kappa z)$,
- The inverse element of $x$ is $1/x$, i.e., $x \otimes_\kappa (1/x) = 1$,
- κ-division: $x \oslash_\kappa y = \exp_\kappa\!\left(\ln_\kappa(x) - \ln_\kappa(y)\right)$,
- Inverse of the κ-product: $x \otimes_\kappa (1 \oslash_\kappa x) = 1$.
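Analogously, the κ-sum composes κ-logarithms of products, $\ln_\kappa(xy) = \ln_\kappa(x) \oplus_\kappa \ln_\kappa(y)$, and the κ-product defined through the deformed pair admits unity as its neutral element and $1/x$ as the inverse of $x$. A small numerical check (illustrative code):

```python
import math

def ln_kappa(x, k):
    # Kaniadakis kappa-logarithm
    return (x**k - x**(-k)) / (2 * k)

def exp_kappa(x, k):
    # Kaniadakis kappa-exponential (inverse of ln_kappa)
    return (k * x + math.sqrt(1 + (k * x) ** 2)) ** (1.0 / k)

def kappa_sum(a, b, k):
    # kappa-sum: a (+)_k b = a*sqrt(1 + k^2 b^2) + b*sqrt(1 + k^2 a^2)
    return a * math.sqrt(1 + (k * b) ** 2) + b * math.sqrt(1 + (k * a) ** 2)

def kappa_product(x, y, k):
    # kappa-product on positive reals via the deformed pair:
    # x (x)_k y = exp_k(ln_k(x) + ln_k(y)), so 1 is its neutral element.
    return exp_kappa(ln_kappa(x, k) + ln_kappa(y, k), k)

k, x, y = 0.25, 2.0, 3.0
assert abs(ln_kappa(x * y, k) - kappa_sum(ln_kappa(x, k), ln_kappa(y, k), k)) < 1e-10
assert abs(kappa_product(x, 1.0, k) - x) < 1e-10      # unity is neutral
assert abs(kappa_product(x, 1.0 / x, k) - 1.0) < 1e-10  # inverse is 1/x
```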
References
- Nemirovsky, A.; Yudin, D.B. Problem Complexity and Method Efficiency in Optimization; John Wiley and Sons: Hoboken, NJ, USA, 1983.
- Amid, E.; Warmuth, M.K. Reparameterizing Mirror Descent as Gradient Descent. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20), Vancouver, BC, Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 8430–8439.
- Amid, E.; Warmuth, M.K. Winnowing with Gradient Descent. In Proceedings of the 33rd International Conference on Algorithmic Learning Theory, PMLR 125, Graz, Austria, 9–12 July 2020; pp. 163–182.
- Ghai, U.; Hazan, E.; Singer, Y. Exponentiated Gradient Meets Gradient Descent. In Proceedings of the 31st International Conference on Algorithmic Learning Theory, PMLR 117, San Diego, CA, USA, 8–11 February 2020; pp. 386–407.
- Shalev-Shwartz, S. Online learning and online convex optimization. Found. Trends Mach. Learn. 2011, 4, 107–194.
- Beck, A.; Teboulle, M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 2003, 31, 167–175.
- Amari, S. Natural gradient works efficiently in learning. Neural Comput. 1998, 10, 251–276.
- Amari, S. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
- Amari, S. Information Geometry and Its Applications: Convex Function and Dually Flat Manifold. In Emerging Trends in Visual Computing; Nielsen, F., Ed.; Springer Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 75–102.
- Amari, S. Alpha-divergence is unique, belonging to both f-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 2009, 55, 4925–4931.
- Amari, S.; Cichocki, A. Information geometry of divergence functions. Bull. Pol. Acad. Sci. 2010, 58, 183–195.
- Amid, E.; Nielsen, F.; Nock, R.; Warmuth, M.K. Optimal transport with tempered exponential measures. Proc. AAAI Conf. Artif. Intell. 2024, 38, 10838–10846.
- Raskutti, G.; Mukherjee, S. The information geometry of mirror descent. IEEE Trans. Inf. Theory 2015, 61, 1451–1457.
- Cichocki, A.; Cruces, S.; Amari, S.I. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 2011, 13, 134–170.
- Cichocki, A.; Cruces, S.; Sarmiento, A.; Tanaka, T. Generalized Exponentiated Gradient Algorithms and Their Application to On-Line Portfolio Selection. IEEE Access 2024, 12, 197000–197020.
- Kainth, A.S.; Wong, T.-K.L.; Rudzicz, F. Conformal mirror descent with logarithmic divergences. Inf. Geom. 2024, 7 (Suppl. 1), 303–327.
- Cichocki, A. Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm. arXiv 2025, arXiv:2502.17500.
- Cichocki, A. Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms. arXiv 2025, arXiv:2506.13984.
- Helmbold, D.P.; Schapire, R.E.; Singer, Y.; Warmuth, M.K. On-line Portfolio Selection Using Multiplicative Updates. Math. Financ. 1998, 8, 325–347.
- Kivinen, J.; Warmuth, M.K. Exponentiated Gradient versus Gradient Descent for Linear Predictors. Inf. Comput. 1997, 132, 1–63.
- Kivinen, J.; Warmuth, M.K. Additive Versus Exponentiated Gradient Updates for Linear Prediction. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, Las Vegas, NV, USA, 29 May–1 June 1995; pp. 209–218.
- Bregman, L. The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming. Comp. Math. Phys. USSR 1967, 7, 200–217.
- Burachik, R.S.; Dao, M.N.; Lindstrom, S.B. The generalized Bregman distance. SIAM J. Optim. 2021, 31, 404–424.
- Martinez-Legaz, J.E.; Tamadoni Jahromi, M.; Naraghirad, E. On Bregman-type distances and their associated projection mappings. J. Optim. Theory Appl. 2022, 193, 107–117.
- Nielsen, F.; Nock, R. Generalizing skew Jensen divergences and Bregman divergences with comparative convexity. IEEE Signal Process. Lett. 2017, 24, 1123–1127.
- Nock, R.; Nielsen, F. Bregman divergences and surrogates for learning. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 2048–2059.
- Cichocki, A.; Amari, S.I. Families of α-, β- and γ-divergences: Flexible and robust measures of similarities. Entropy 2010, 12, 1532–1568.
- Cichocki, A.; Zdunek, R.; Phan, A.H.; Amari, S.I. Multiplicative Iterative Algorithms for NMF with Sparsity Constraints. In Nonnegative Matrix and Tensor Factorizations; Chapter 3; John Wiley and Sons: Hoboken, NJ, USA, 2009; pp. 131–202.
- Cichocki, A.; Zdunek, R.; Amari, S. Csiszár’s Divergences for Nonnegative Matrix Factorization: Family of New Algorithms. In Independent Component Analysis and Signal Separation; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3889, pp. 32–39.
- Gunasekar, S.; Woodworth, B.; Srebro, N. Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, CA, USA, 13–15 April 2021; Volume 130, pp. 2305–2313.
- Nielsen, F. A Note on the Natural Gradient and Its Connections with the Riemannian Gradient, the Mirror Descent, and the Ordinary Gradient. Github Report. Available online: https://franknielsen.github.io/blog/NaturalGradientConnections/NaturalGradientConnections.pdf (accessed on 1 August 2020).
- Hristopulos, D.T.; da Silva, S.L.E.; Scarfone, A.M. Twenty Years of Kaniadakis Entropy: Current Trends and Future Perspectives. Entropy 2025, 27, 247.
- Naudts, J. Deformed exponentials and logarithms in generalized thermostatistics. Phys. A Stat. Mech. Its Appl. 2002, 316, 323–334.
- Tsallis, C. Entropy. Encyclopedia 2022, 2, 264–300.
- Wada, T.; Scarfone, A.M. Finite difference and averaging operators in generalized entropies. J. Phys. Conf. Ser. 2010, 201, 012005.
- Kaniadakis, G.; Scarfone, A.M. A new one-parameter deformation of the exponential function. Phys. A Stat. Mech. Its Appl. 2002, 305, 69–75.
- Kaniadakis, G. Statistical mechanics in the context of special relativity. Phys. Rev. E 2002, 66, 056125.
- Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Deformed logarithms and entropies. Phys. A Stat. Mech. Its Appl. 2004, 340, 41–49.
- Kaniadakis, G.; Lissia, M.; Scarfone, A.M. Two-parameter deformations of logarithm, exponential, and entropy: A consistent framework for generalized statistical mechanics. Phys. Rev. E 2005, 71, 046128.
- Tempesta, P. A theorem on the existence of trace-form generalized entropies. Proc. R. Soc. A Math. Phys. Eng. Sci. 2015, 471, 20150165.
- Wada, T.; Scarfone, A.M. On the Kaniadakis distributions applied in statistical physics and natural sciences. Entropy 2023, 25, 292.
- Gomez, I.S.; Borges, E.P. Algebraic structures and position-dependent mass Schrödinger equation from group entropy theory. Lett. Math. Phys. 2021, 111, 43.
- Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
- Tsallis, C. What are the numbers that experiments provide? Quim. Nova 1994, 17, 468–471.
- Ishige, K.; Salani, P.; Takatsu, A. Hierarchy of deformations in concavity. Inf. Geom. 2024, 7 (Suppl. 1), 251–269.
- Box, G.E.P.; Cox, D.R. An Analysis of Transformations. J. R. Stat. Soc. Ser. B 1964, 26, 211–252.
- Borges, E.P. A possible deformed algebra and calculus inspired in nonextensive thermostatistics. Phys. A Stat. Mech. Its Appl. 2004, 340, 95–101.
- Yamano, T. Some properties of q-logarithm and q-exponential functions in Tsallis statistics. Phys. A Stat. Mech. Its Appl. 2002, 305, 486–496.
- Nock, R.; Amid, E.; Warmuth, M.K. Boosting with Tempered Exponential Measures. arXiv 2023, arXiv:2306.05487.
- Borges, E.P.; Roditi, I. A family of nonextensive entropies. Phys. Lett. A 1998, 246, 399–402.
- Schwämmle, V.; Tsallis, C. Two-parameter generalization of the logarithm and exponential functions and Boltzmann-Gibbs-Shannon entropy. J. Math. Phys. 2007, 48, 113301.
- Cardoso, P.G.; Borges, E.P.; Lobao, T.C.; Pinho, S.T. Nondistributive algebraic structures derived from nonextensive statistical mechanics. J. Math. Phys. 2008, 49, 093509.
- Corcino, C.B.; Corcino, R.B. Three-Parameter Logarithm and Entropy. J. Funct. Spaces 2020, 2020, 9791789.
- Kaniadakis, G. Maximum entropy principle and power-law tailed distributions. Eur. Phys. J. B 2009, 70, 3–13.
- Mittal, D.P. On some functional equations concerning entropy, directed divergence and inaccuracy. Metrika 1975, 22, 35–45.
- Sharma, B.D.; Taneja, I.J. Entropy of type (α, β) and other generalized measures in information theory. Metrika 1975, 22, 205–215.
- Taneja, I.J. On generalized information measures and their applications. Adv. Electron. Electron Phys. 1989, 76, 327–413.
- Scarfone, A.M.; Suyari, H.; Wada, T. Gauss law of error revisited in the framework of Sharma-Taneja-Mittal information measure. Cent. Eur. J. Phys. 2009, 7, 414–420.
- Kaniadakis, G.; Scarfone, A.M.; Sparavigna, A.; Wada, T. Composition law of κ-entropy for statistically independent systems. Phys. Rev. E 2017, 95, 052112.
- Furuichi, S. An axiomatic characterization of a two-parameter extended relative entropy. J. Math. Phys. 2010, 51, 123302.
- Da Silva, G.B.; Ramos, R.V. The Lambert-Tsallis Wq function. Phys. A Stat. Mech. Its Appl. 2019, 525, 164–170.
| Entropy | Deformed Exponential | MD/GEG Update |
|---|---|---|
| Shannon | $\exp(x)$ | $\mathbf{w}_{t+1} = \mathbf{w}_t \odot \exp(-\eta_t \nabla L(\mathbf{w}_t))$ (EG) |
| Tsallis | $\exp_q(x) = [1+(1-q)x]_+^{1/(1-q)}$ | $\mathbf{w}_{t+1} = \exp_q\!\left(\ln_q(\mathbf{w}_t) - \eta_t \nabla L(\mathbf{w}_t)\right)$ (q-GEG) |
| Schwämmle–Tsallis | $\exp_{q,q'}(x)$ | $\mathbf{w}_{t+1} = \exp_{q,q'}\!\left(\ln_{q,q'}(\mathbf{w}_t) - \eta_t \nabla L(\mathbf{w}_t)\right)$ |
| Kaniadakis | $\exp_\kappa(x) = \left(\kappa x + \sqrt{1+\kappa^2 x^2}\right)^{1/\kappa}$ | $\mathbf{w}_{t+1} = \exp_\kappa\!\left(\ln_\kappa(\mathbf{w}_t) - \eta_t \nabla L(\mathbf{w}_t)\right)$ |
| KLS | $\exp_{\kappa,r}(x)$ | $\mathbf{w}_{t+1} = \exp_{\kappa,r}\!\left(\ln_{\kappa,r}(\mathbf{w}_t) - \eta_t \nabla L(\mathbf{w}_t)\right)$ |
| Generic | $\exp_\phi(x)$ | $\mathbf{w}_{t+1} = \exp_\phi\!\left(\ln_\phi(\mathbf{w}_t) - \eta_t \nabla L(\mathbf{w}_t)\right)$ |
Share and Cite
Cichocki, A.; Tanaka, T.; Nielsen, F.; Cruces, S. Mirror Descent and Exponentiated Gradient Algorithms Using Trace-Form Entropies. Entropy 2025, 27, 1243. https://doi.org/10.3390/e27121243