Combinatorial Optimization with Information Geometry: The Newton Method
2. Models on a Finite State Space
- Two different exponential families can be the same statistical model, in the sense that their sets of densities coincide. This follows both from the arbitrariness of the reference density and from the fact that the sufficient statistics form a vector basis of the vector space they generate. In a non-parametric approach, we can refer directly to the vector space of centered log-densities, while a change of reference density is geometrically interpreted as a change of chart. The set of all such charts defines a manifold.
- We make a specific interpretation of the tangent bundle as the vector space of Fisher's scores at each density and use such tangent spaces as spaces of coordinates. This produces a different tangent space/space of coordinates at each density, and the different tangent spaces are mapped onto one another by a suitable parallel transport, which is nothing other than the re-centering of random variables.
- If a basis is chosen, a parametrization is obtained, and such a parametrization is, in fact, a new chart whose values are real vectors. In the real parametrization, the natural scalar product on each space of scores is given by Fisher's information matrix.
- Riemannian gradients are defined in the usual way. It is customary in information geometry to call "natural gradient" the real-coordinate presentation of the Riemannian gradient. The natural gradient is computed by applying the inverse of the Fisher information matrix to the Euclidean gradient. It may seem that three gradients are involved, but they all represent the same object when correctly understood.
- The classical notion of expectation parameters for exponential families carries over as another chart on the statistical manifold, which gives rise to a further presentation of the same geometrical object.
- While the statistical manifold is unique, there are at least three relevant connections as structures on the vector bundles of the manifold: one relating to the exponential charts, one relating to the expectation charts and one depending on the Riemannian structure.
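The natural-gradient computation described above can be sketched concretely for the independent binary family discussed in Section 2.5.2. The following brute-force illustration (our own, not the authors' Sage code; function and variable names are assumptions) enumerates the state space {0,1}^n, evaluates the expectation parameters, the Euclidean gradient of the relaxed function F(θ) = E_θ[f] as the covariance Cov_θ(f, T), and the Fisher information matrix as Cov_θ(T, T), then solves for the natural gradient:

```python
import itertools
import numpy as np

def natural_gradient(f, theta):
    """Natural gradient of the stochastic relaxation F(theta) = E_theta[f]
    for the exponential family p_theta(x) = exp(<theta, x> - psi(theta))
    on {0,1}^n with sufficient statistics T_i(x) = x_i.

    Brute-force sketch: feasible only for small n (2^n states)."""
    n = len(theta)
    # Enumerate all 2^n binary states as rows of X.
    X = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    logits = X @ theta
    p = np.exp(logits - logits.max())
    p /= p.sum()                        # normalized densities p_theta(x)
    fx = np.array([f(x) for x in X])
    eta = p @ X                         # expectation parameters E_theta[T]
    centered = X - eta                  # re-centered sufficient statistics
    # Euclidean gradient of F: Cov_theta(f, T_i) = E_theta[f (T_i - eta_i)].
    euclid_grad = p @ (fx[:, None] * centered)
    # Fisher information matrix: Cov_theta(T_i, T_j).
    fisher = centered.T @ (p[:, None] * centered)
    # Natural gradient: inverse Fisher matrix applied to Euclidean gradient.
    return np.linalg.solve(fisher, euclid_grad), eta

# Example: f(x) = sum(x) at theta = 0 (uniform density). The Euclidean
# gradient is Var(x_i) = 1/4 and the Fisher matrix is diag(1/4), so the
# natural gradient is the vector of ones.
nat_grad, eta = natural_gradient(lambda x: x.sum(), np.zeros(3))
```

At θ = 0 the Fisher matrix is diagonal because the variables are independent; in a general exponential family it is a full covariance matrix, which is why the Euclidean and natural gradients differ in direction, not only in scale.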
2.1. Exponential Families as Manifolds
2.2. Change of Chart
2.3. Tangent Bundle
2.4.1. Expectation Parameters
2.4.2. Vector Fields
2.5.2. Binary Independent Variables
2.5.3. Escort Probabilities
2.5.4. Polarization Measure
3. Second Order Calculus
3.1. Metric Derivative (Levi–Civita connection)
3.2.1. Example: Binary Independent 2.5.2 Continued
3.3. Riemannian Hessian
4. Application to Combinatorial Optimization
4.1. Hessian of a Relaxed Function
4.1.1. Example: Binary Independent 2.5.2 and 3.2.1 Continued
4.2. Newton Method
4.3. Example: Binary Independent
5. Discussion and Conclusions
Conflicts of Interest
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Malagò, L.; Pistone, G. Combinatorial Optimization with Information Geometry: The Newton Method. Entropy 2014, 16, 4260-4289. https://doi.org/10.3390/e16084260