Entropies from coarse-graining: convex polytopes vs. ellipsoids

We examine the Boltzmann/Gibbs/Shannon $\mathcal{S}_{BGS}$ and the non-additive Havrda-Charv\'{a}t / Dar\'{o}czy/Cressie-Read/Tsallis \ $\mathcal{S}_q$ \ and the Kaniadakis $\kappa$-entropy \ $\mathcal{S}_\kappa$ \ from the viewpoint of coarse-graining, symplectic capacities and convexity. We argue that the functional form of such entropies can be ascribed to a discordance in phase-space coarse-graining between two generally different approaches: the Euclidean/Riemannian metric one that reflects independence and picks cubes as the fundamental cells and the symplectic/canonical one that picks spheres/ellipsoids for this role. Our discussion is motivated by and confined to the behaviour of Hamiltonian systems of many degrees of freedom. We see that Dvoretzky's theorem provides asymptotic estimates for the minimal dimension beyond which these two approaches are close to each other. We state and speculate about the role that dualities may play in this viewpoint.


1 Introduction
Entropy is one of the central concepts in Statistical Mechanics. Although initially introduced in thermodynamics, it has clearly superseded its modest origins and is currently widely used in numerous fields extending from dynamical systems and geometry to communication theory, complexity and far beyond. Due to its significance, considerable effort has been invested for almost 150 years, since its introduction by R. Clausius (ca. 1865), in understanding its meaning and ways to calculate it for specific systems. Fundamental contributions and interpretations, in various contexts, were made by L. Boltzmann, J.W. Gibbs, J. von Neumann, C. Shannon, A. Kolmogorov, A. Renyi, I. Csiszar, E. Jaynes as well as numerous other scientists and engineers ever since. The functional forms studied by most of the above people resemble each other very closely, so they tend to be treated together as one and the same entropy. For the purposes of the present work alone we will pretend they are the same, although conceptually they are clearly very distinct from each other even in the specific context of Statistical Mechanics [1,2,3].
Following this abusive bundling, we will be referring in the sequel to the Boltzmann/Gibbs/Shannon functional form $\mathcal{S}_{BGS} = -k_B \sum_{i \in I} p_i \log p_i$ (1), where $k_B$ is a constant which is identified in Statistical Physics as Boltzmann's constant. In (1), $i$ is an index taking values in a finite or countable set $I$, which is used to label the probabilities $p_i$ of the possible outcomes.
As is well-known, (1) has a subjective character in Classical Physics [1,2]. The probabilities $p_i$ appearing in (1) depend not only on the actual system under study but also on the level of ignorance of an experimenter/observer about the system. On many occasions, it is worthwhile to consider systems with continuous rather than discrete sets of outcomes. In such cases, we naively translate the definition (1) into the continuum, in which the $p_i$ morph into a probability density function $\rho : M \to \mathbb{R}_+$. The details and the exact way of taking such a continuum limit may be highly non-trivial, and one may have to introduce a metric or a homogeneous structure [1,2] etc. in order to reach a well-defined, and unique, result. Ignoring such subtle and important issues, one would naively get $\mathcal{S}_{BGS}[\rho] = -k_B \int_M \rho \log \rho \, d\mu$ (2), where $\mu$ is an appropriate measure on the sample space $M$. In the case of Hamiltonian systems of many degrees of freedom, which is our object of study, $M$ usually stands for the phase space of the system, endowed with a Riemannian metric $g$, with $\mu$ being the unique Riemannian measure associated to $g$.
There are several problems with (2) though. One is that it is not coordinate independent (diffeomorphism invariant) [1,2,3]. This can be traced back to the fact that, very much like (1) from which it was naively inferred, (2) does not depend on the graph of $\rho$, but just on its range of values. A second objection, related to the above comments, is that naively taking the limit of the discrete $p_i$ to the continuum $\rho$ gives a divergent expression, the infinite part of which has to be judiciously subtracted before (2) can be properly used or interpreted.
Fortunately, using a relative entropy expression, à la Kullback-Leibler for instance, addresses the first problem. So what one actually computes in classical Statistical Mechanics is not really the "absolute" entropy but rather a form of a relative entropy of a probability distribution with respect to an underlying background measure. If such a reference measure is independent of the details of the particular model at hand, then it results in an additive constant which eventually becomes irrelevant, as almost all experimentally verifiable quantities involve entropy variations rather than the "absolute" value of the entropy itself. One chooses as a reference measure the probability density function resulting from the continuum limit of a discretization of the phase space $M$, usually by cubes of side length $\sqrt{h}$.
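The contrast between the naive functional (2) and a relative entropy can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes one-dimensional Gaussian densities, a linear change of coordinates y = scale * x, and $k_B = 1$; all function names and parameter values are illustrative choices.

```python
import math

def log_gaussian(x, mean, sigma):
    """Logarithm of the Gaussian density N(mean, sigma^2) at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (k + 0.5) * step) for k in range(n))

def differential_entropy(mean, sigma):
    """-int rho log(rho) dx: the naive continuum functional with k_B = 1."""
    def integrand(x):
        lp = log_gaussian(x, mean, sigma)
        return -math.exp(lp) * lp
    return integrate(integrand, mean - 12.0 * sigma, mean + 12.0 * sigma)

def relative_entropy(mean, sigma, ref_mean, ref_sigma):
    """int rho log(rho / rho_ref) dx: a Kullback-Leibler relative entropy."""
    def integrand(x):
        lp = log_gaussian(x, mean, sigma)
        return math.exp(lp) * (lp - log_gaussian(x, ref_mean, ref_sigma))
    return integrate(integrand, mean - 12.0 * sigma, mean + 12.0 * sigma)

scale = 3.0        # assumed change of coordinates y = scale * x
rho = (0.0, 1.0)   # (mean, sigma) of the density in the x coordinate
ref = (0.5, 2.0)   # (mean, sigma) of the reference density in the x coordinate

entropy_shift = (differential_entropy(scale * rho[0], scale * rho[1])
                 - differential_entropy(*rho))
kl_x = relative_entropy(*rho, *ref)
kl_y = relative_entropy(scale * rho[0], scale * rho[1],
                        scale * ref[0], scale * ref[1])

print(entropy_shift, math.log(scale))  # the naive entropy shifts by log(scale)
print(kl_x, kl_y)                      # the relative entropy does not change
```

Under these assumptions the naive entropy picks up the logarithm of the Jacobian, while the relative entropy is the same in both coordinate systems.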
One could question some of these statements, presenting as a counter-argument the case of the Sackur-Tetrode equation, which gives an expression for the (absolute) entropy of a classical monatomic ideal gas, $\mathcal{S} = N k_B \left[ \log \left( \frac{V}{N} \left( \frac{4 \pi m U}{3 N h^2} \right)^{3/2} \right) + \frac{5}{2} \right]$ (3), where $U$ is the internal energy of the gas, $m$ the mass of each atom, $V$ the volume that the gas occupies and $N$ the number of atoms. It seems that (3) contradicts the above statements pertaining to the additive constant. However, the appearance in (3) of $h$, which is an arbitrarily chosen regularisation parameter in classical Physics and only acquires physical significance as Planck's constant in Quantum Physics, reinforces rather than contradicts the above statements. The implicit phase space discretisation by cubes of side $\sqrt{h}$ appearing in expressions such as (3) plays an important part in the present work. Its implicit, and therefore often forgotten, presence is also at the heart of a recent controversy, to be mentioned below, about the existence, appropriate form and physical significance of the continuum limit of the non-additive entropy $\mathcal{S}_q$.
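The role of $h$ as a regularisation parameter can be made concrete with a short numerical sketch. It assumes the standard Sackur-Tetrode form written per particle and in units of $k_B$; the helium-like numerical values, the function name and the rescaled parameter `h_other` are illustrative assumptions, not data from this work.

```python
import math

def entropy_per_particle(u, v, m, h):
    """Sackur-Tetrode entropy per particle, in units of k_B.

    u: internal energy per particle, v: volume per particle,
    m: mass of one atom, h: phase-space regularisation parameter.
    """
    return math.log(v) + 1.5 * math.log(4.0 * math.pi * m * u / (3.0 * h * h)) + 2.5

# Illustrative values in SI units (roughly helium near room conditions).
m = 6.6e-27              # kg
u = 6.2e-21              # J per atom
v = 4.1e-26              # m^3 per atom
h_planck = 6.62607015e-34
h_other = 10.0 * h_planck   # a different, arbitrary choice of cell size

# Changing h shifts the "absolute" entropy by a state-independent constant.
shift = entropy_per_particle(u, v, m, h_other) - entropy_per_particle(u, v, m, h_planck)
print(shift, -3.0 * math.log(10.0))

# Entropy differences between two states do not depend on the choice of h.
delta_planck = entropy_per_particle(u, 2.0 * v, m, h_planck) - entropy_per_particle(u, v, m, h_planck)
delta_other = entropy_per_particle(u, 2.0 * v, m, h_other) - entropy_per_particle(u, v, m, h_other)
print(delta_planck, delta_other, math.log(2.0))
```

The cell size only enters through an additive constant, so entropy variations are insensitive to it, which is the point made above.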
As seen in the previous paragraph, in order to overcome the infinities that inevitably creep in during the transition from (1) to (2) we can subtract a renormalisation constant. Alternatively, we can turn to Quantum Physics and use the von Neumann functional $\mathcal{S}_{vN} = -\mathrm{tr}\,(\hat{\rho} \log \hat{\rho})$ (4), where $\hat{\rho}$ stands for the density operator (matrix) on the (kinematic) Hilbert space of the wavefunctions and tr is the trace over a basis of such a Hilbert space. The von Neumann entropy may still need some form of regularisation and renormalisation, especially in Quantum Field Theory. Its fundamental "drawback" is that one has to know the underlying Quantum Theory, or at least to have a field theory description of the effective degrees of freedom, before it can be properly implemented. An underlying quantum theory may not be known, as in the case of a quantum theory of gravity, for instance. We would not want to even start a discussion about the technical obstacles of actually computing (4) for particular systems. However, becoming aware of such well-known and long-ago resolved issues pertaining to $\mathcal{S}_{BGS}$ shows the subtlety and care with which the concept of entropy has to be treated.
An additional incentive for looking into fundamental issues pertaining to entropy comes from the great success of $\mathcal{S}_{BGS}$ in quantifying the thermodynamic properties of systems at equilibrium. Such a success is seen in the agreement of the predictions derived by using $\mathcal{S}_{BGS}$ with numerous experimental observations, ever since the formulation of $\mathcal{S}_{BGS}$. But what is the source of such a spectacular success of $\mathcal{S}_{BGS}$? The dictum that we use $\mathcal{S}_{BGS}$ "because it works" or because it is the "canon" cannot possibly be satisfying, if someone is interested in getting a better understanding of why things work the way they do. It is not obvious, for instance, why or whether $\mathcal{S}_{BGS}$ precisely describes the collective behaviour of systems with long-range interactions, non-ergodic phase space evolution etc., not to even mention systems out of equilibrium.
To understand the limitations of a particular functional one can compare its properties, predictions etc. with those of another judiciously chosen or intelligently constructed functional. Two such functionals are the Havrda-Charvát/Daróczy/Cressie-Read/Tsallis entropy $\mathcal{S}_q$ [5,6,7,8,9,10] and the (Kaniadakis) κ-entropy $\mathcal{S}_\kappa$ [11,12,13,14,15], both to be defined and used below as "alternative" (meaning in different regimes, or for different systems) functionals to $\mathcal{S}_{BGS}$. In addition, numerous other entropic functionals that have been recently introduced and used in Statistical Mechanics [10], and more particularly the generalised exponential families analysed in [16,17,18], specific members of which are $\mathcal{S}_q$ and $\mathcal{S}_\kappa$, can also be considered as playing, in some part, such a role [18]. A better understanding of such functionals, and the determination of the essential physical features of the systems whose collective behaviour they describe, will not only help appreciate their significance but also set some boundaries to the complete dominance of $\mathcal{S}_{BGS}$ in Statistical Mechanics. Therefore, such an effort will also help us better understand $\mathcal{S}_{BGS}$ itself.
One of the many, still unanswered, questions pertaining to S q , S κ and the numerous other non-additive entropic functionals is their dynamical basis. If S BGS successfully describes systems having an ergodic evolution in their configuration or phase space, then what are the underlying dynamical features of systems, if any, described by S q , S κ etc? The present work is partly motivated by and echoes to some extent, the general viewpoint described in as well as some of the fundamental issues pointed out in [4]. Even though [4] was written more than a decade ago, and despite the intervening considerable activity in understanding aspects of S q , S κ and other non-additive entropic functionals, it is probably fair to state that most of the fundamental dynamical and statistical questions that [4] had pinpointed remain unclear to this date.
In an attempt to address such questions, we explored in our relatively recent work [19,20,21,22,23,24,25,26,27], some formal consequences of the definition of S q . At no point however did we deal in any part of these works with the actual nature of S q per se. We just confined ourselves to formal algebraic and geometric structures and conclusions stemming from its functional form. One of our key assumptions was that some of the algebraic properties of S q are not emergent from statistical averaging, but they directly reflect dynamical properties of the phase space of the system. In other words, we assumed that such algebraic properties of S q are "typical" of the underlying Hamiltonian system whose statistical behaviour is described by S q . A part of the present work is to investigate this assumed "typical" behaviour and try to determine how it may dictate, at least some parts of the functional form of the entropy used to describe such systems.
At the core of the present work is a question that appears trivial at first sight: if one knows the microscopic evolution of a Hamiltonian system of many degrees of freedom, can this person predict, or pick among various "reasonable" entropic functionals a unique one or, to be less ambitious, a class of entropies that would successfully describe the macroscopic behaviour of the system? The obvious answer appears to be negative, as statistics seems in the eyes of many to be completely independent/dissociated from the underlying dynamics. It seems that we can successfully do the former, as in the case of S BGS without knowing almost anything about the latter. This is certainly the viewpoint advocated, among others, by J.W. Gibbs, L.D. Landau and A.I. Khintchin who consider the underlying dynamics to be largely irrelevant, inasmuch as the ergodic hypothesis can be used to justify the choice of the micro-canonical ensemble. In this viewpoint the success of Statistical Mechanics is ascribed to the large numbers of degrees of freedom of such systems [29,32]. The quantification comes by the Central Limit Theorem which "justifies" the "ubiquity" of the Gaussians in physical, and not only, processes. However, one may wish to note that the Central Limit Theorem does not hold if the random variables are not independent or weakly correlated. When there are non-trivial ("strong") correlations among such variables the question naturally arises as to which statistics, hence which entropic functional, is appropriate for describing systems having such properties. This also emphasises the question about the meaning of "independence" and how it needs to be modified, if at all, for the cases of these "different kinds" of statistics.
By contrast, we follow the view of L. Boltzmann (in part), P. Ehrenfest and A. Einstein [33] according to which the underlying dynamics is at the core of the thermodynamic behaviour of a system. We believe that the recent emergence of numerous entropic functionals and the explorations into the realm of non-ergodic evolutions in phase space make the underlying dynamical explorations highly desirable and potentially enlightening. We view the emergence of an entropic functional form for a Hamiltonian system with many degrees of freedom as a manifestation of a dissonance in phase space: usually one coarse-grains [28,29,30,31] phase space by using cubes of side length $\sqrt{h}$. Such cubes however do not behave well under canonical transformations. Based on the symplectic non-squeezing theorem and the subsequent formulation of symplectic capacities as fundamental constructions in symplectic geometry, it is probably more prudent to coarse-grain $M$ in terms of ellipsoids. This happens because all symplectic capacities have the same value on ellipsoids. Generalizing cubes into convex polyhedra to take into account the composition of the newer, non-additive, entropic forms, we can see such an entropy as arising from the difference in coarse-graining between ellipsoids and convex polyhedra.
We choose the Banach-Mazur distance to quantify such a difference. Hence the problem reduces to determining the Banach-Mazur distance between polyhedra and ellipsoids in $M$ of typical side/radius length $\sqrt{h}$. Since ellipsoids are minimal from the viewpoint of dynamics/symplectic capacities but the polytopes do not have any a priori lower bound on their size, we will consider the distance between such polytopes and the largest spheres/ellipsoids that can be inscribed in them. A central result in the asymptotic limit of large $n$ is provided by Dvoretzky's theorem, the lower bound in the dimension of which gives rise to the functional form of $\mathcal{S}_{BGS}$ and provides the leading asymptotic form for non-additive entropies.
In Section 2, we present some of the properties of S q and S κ that we need in this work. In Section 3, we briefly discuss the geometry of "independence" and aspects of phase space coarse-graining. In Section 4, we present background material about Hamiltonian systems and symplectic geometry. In Section 5, we present basics of convex geometry/analysis needed to follow our exposition. In Section 6, we discuss Dvoretzky's theorem and dimension and the role played by dualities in this viewpoint. Section 7 contains conclusions and some speculations.
2 Structures induced by two non-additive entropies.
Two of the most commonly used non-additive entropies, which have attracted substantial attention recently are presented in this Section. Pertinent properties, for this work, are also stated.

2.1 The Tsallis entropy S q and its induced operations.
The Havrda-Charvát [5], Daróczy [6], Cressie-Read [7,8], Tsallis [9,10] entropy $\mathcal{S}_q$, introduced and developed, in part, in the context of Statistical Mechanics by Tsallis, is given for a discrete set of outcomes with probabilities $\{p_i\}$, parametrized by the index set $I$ with $i \in I$, by $\mathcal{S}_q[\{p_i\}] = k_B \frac{1}{q-1} \left( 1 - \sum_{i \in I} p_i^q \right)$ (5), where $k_B$ is the Boltzmann constant. Its continuum analogue for a sample space $\Omega$ equipped with a measure absolutely continuous with respect to the Lebesgue measure, with Radon-Nikodym density $\rho$, is naively assumed to be [9,10] $\mathcal{S}_q[\rho] = k_B \frac{1}{q-1} \left( 1 - \int_\Omega \rho^q \, dvol_\Omega \right)$ (6). Here $dvol_\Omega$ represents the infinitesimal volume element of $\Omega$ when it is a Riemannian manifold $M$, as is usually the case for the Hamiltonian systems of many degrees of freedom which are the focus of our attention. It should be noted that recently there has been a controversy regarding the validity of this naive extension of $\mathcal{S}_q$ to continuous variables, without either side being definitively convincing, in our opinion [34,35,36,37,38,39,31,40]. The controversy brought about by [34] is intimately related to the implicit normalization of any entropy functional, such as $\mathcal{S}_{BGS}$ for instance, required to make its definition coordinate independent (diffeomorphism invariant). This normalization is usually provided by the density distribution arising from the continuum limit of a discretization of $M$ in cubes of side length $\sqrt{h}$. For $\mathcal{S}_{BGS}$, due to the presence of the logarithm, it results in a constant that is additive and hence can be ignored, inasmuch as entropy differences are the only relevant quantities in physical predictions. By contrast, for $\mathcal{S}_q$ such a term is not additive, but rather multiplicative, and therefore cannot be omitted in (6) when considering entropy differences. Subsequently the above authors have presented their views on this and related matters that may make the use of (6) rather questionable.
This matter is of interest but not of central importance to the line of argument and viewpoint of the present work. Therefore we will sidestep these issues in what follows and keep using (6), pretending that this naive expression is indeed a valid generalisation to the continuous case.
The nonextensive parameter $q$ can generically take any value in $\mathbb{R}$. There has been a recent proposal to extend its validity to $q \in \mathbb{C}$, which is certainly worth looking into, as is the associated interpretation of such an extension [41]. To have desirable properties, such as relative insensitivity to rare events, convexity, and decay in a polynomial manner [10], and following our past work [19,20,21,22,23,24,25,26,27] as well as the more recent [38,39], we will assume that $q \in [0, 1] \subset \mathbb{R}$ everywhere in the sequel. We notice that for $q \to 1$ one recovers $\mathcal{S}_{BGS}$. We will set henceforth $k_B = 1$ for brevity.
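The statement that $q \to 1$ recovers $\mathcal{S}_{BGS}$ can be checked numerically for a discrete distribution. The sketch below assumes the standard discrete form of the Tsallis entropy with $k_B = 1$; the probability vector and the function names are illustrative.

```python
import math

def shannon_entropy(p):
    """Boltzmann/Gibbs/Shannon entropy -sum p_i log p_i, with k_B = 1."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def tsallis_entropy(p, q):
    """Discrete Tsallis entropy (1 - sum p_i^q) / (q - 1), with k_B = 1."""
    if q == 1.0:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0.0)) / (q - 1.0)

p = [0.5, 0.25, 0.125, 0.125]   # illustrative probabilities
for q in (0.5, 0.9, 0.99, 0.999):
    print(q, tsallis_entropy(p, q))
print("BGS", shannon_entropy(p))
```

For this example the printed values decrease towards the BGS value as $q$ approaches 1 from below.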
Conventionally, two subsystems $A, B \subset \Omega$ are considered independent [42] if their marginal probability distribution functions are related by $p_{A \cup B} = p_A \, p_B$ (7). Here $A \cup B$ indicates the system resulting from the interaction of $A$ and $B$. For such subsystems $\mathcal{S}_{BGS}$ is easily seen to be additive, namely $\mathcal{S}_{BGS}(A \cup B) = \mathcal{S}_{BGS}(A) + \mathcal{S}_{BGS}(B)$ (8). The entropy $\mathcal{S}_q$ however is not additive [10], at least not in the conventional sense, as it satisfies $\mathcal{S}_q(A \cup B) = \mathcal{S}_q(A) + \mathcal{S}_q(B) + (1-q) \, \mathcal{S}_q(A) \, \mathcal{S}_q(B)$ (9). This lack of additivity is usually ascribed to the long-range spatial and temporal correlations of the systems that the $\mathcal{S}_q$ entropy conjecturally describes [10]. For systems described by such $\mathcal{S}_q$, additivity is manifestly restored if the addition is redefined as [43,44] $x \oplus_q y = x + y + (1-q) x y$ (10). Then $\mathcal{S}_q(A \cup B) = \mathcal{S}_q(A) \oplus_q \mathcal{S}_q(B)$ (11). It took some time before a generalized product, distributive with respect to the addition (10), was discovered [45,19]. Even though [45] and [19] gave different forms of such a product, conjecturally equivalent, we will be using here the one introduced in [45] as more elegant and easier to work with. The generalised product turned out to be $x \otimes_q y = \frac{1}{1-q} \left[ (2-q)^{\frac{\log[1+(1-q)x] \, \log[1+(1-q)y]}{[\log(2-q)]^2}} - 1 \right]$ (12). The definition of the generalised product (12) appears to be somewhat arcane. However, the motivation behind its construction becomes more transparent, in our opinion, if we see it as a result of demanding the commutativity of the diagram (13) of [20]. Putting together the two generalized binary operations (10) and (12), we set up [20,21] a one-parameter family of deformations of the set of reals, denoted by $\mathbb{R}_q$. An explicit isomorphism between these two sets, $\tau_q : \mathbb{R} \to \mathbb{R}_q$, is given by [20,21] $\tau_q(x) = \frac{(2-q)^x - 1}{1-q}$ (14). In terms of this field isomorphism, the product (12) can be rewritten as $x \otimes_q y = \tau_q \left( \tau_q^{-1}(x) \cdot \tau_q^{-1}(y) \right)$ (15). By the above construction, we have in effect reduced the differences between $\mathcal{S}_{BGS}$ and $\mathcal{S}_q$ to the differences between $\mathbb{R}$ and $\mathbb{R}_q$. This can be seen more formally through a comparison between the axioms used to determine $\mathcal{S}_{BGS}$ [46,47] and $\mathcal{S}_q$ [48,49].
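The composition property of $\mathcal{S}_q$ under the generalised addition (10) and the algebraic role of the deformed operations can be checked numerically. The following is a minimal sketch: the generalised sum is taken from (10), while the explicit map `tau_q`, written as $((2-q)^x - 1)/(1-q)$, and the product built from it are assumptions of this sketch, chosen so that `tau_q(0) = 0` and `tau_q(1) = 1`. Function names and numerical values are illustrative.

```python
import math

def tsallis_entropy(p, q):
    """Discrete Tsallis entropy with k_B = 1, for q != 1."""
    return (1.0 - sum(x ** q for x in p if x > 0.0)) / (q - 1.0)

def q_sum(x, y, q):
    """Generalised addition (10)."""
    return x + y + (1.0 - q) * x * y

def tau_q(x, q):
    """Assumed explicit form of the map from the reals to the deformed reals."""
    return ((2.0 - q) ** x - 1.0) / (1.0 - q)

def tau_q_inverse(x, q):
    """Inverse of tau_q, defined for 1 + (1 - q) x > 0."""
    return math.log(1.0 + (1.0 - q) * x) / math.log(2.0 - q)

def q_product(x, y, q):
    """Generalised product obtained by transporting ordinary multiplication."""
    return tau_q(tau_q_inverse(x, q) * tau_q_inverse(y, q), q)

q = 0.7
p_a = [0.6, 0.4]
p_b = [0.2, 0.3, 0.5]
p_ab = [a * b for a in p_a for b in p_b]   # independent subsystems

# Entropy of the composite system versus the generalised sum of the parts.
composite = tsallis_entropy(p_ab, q)
combined = q_sum(tsallis_entropy(p_a, q), tsallis_entropy(p_b, q), q)
print(composite, combined)

# tau_q maps ordinary addition to the generalised addition.
x, y, z = 0.8, 1.5, 2.3
print(tau_q(x + y, q), q_sum(tau_q(x, q), tau_q(y, q), q))

# Distributivity of the generalised product over the generalised sum.
left = q_product(x, q_sum(y, z, q), q)
right = q_sum(q_product(x, y, q), q_product(x, z, q), q)
print(left, right)
```

Each printed pair should agree to rounding error, which is what the isomorphism picture of the deformed reals requires.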
In closing this Section, we should point out that there are several distinct and non-equivalent definitions of q-exponentials in the literature (see e.g. [50]), quite frequently associated to quantum groups, which have nothing obvious to do with $\mathcal{S}_q$. With an eye to the next subsection, the same words of caution also apply to the several κ-distributions existing in the literature (e.g. [51,52] in space plasmas), which have nothing obvious to do with $\mathcal{S}_\kappa$. Due to this lack of uniformity in nomenclature, one should be careful about the exact functional forms that are used on each occasion.

2.2 The κ-entropy S κ and its induced operations.
Among the many entropic functionals that have been constructed over the years, the κ-entropy S κ has also attracted some attention since its introduction [11,12,13,14,15]. Unlike S q whose origin can be traced to the thermodynamic formalism, the origins and possible scope of S κ are far more concrete: they rely on attempts to understand the thermodynamic behaviour of the free relativistic gas, a system whose thermodynamic behaviour has proved to be far harder to describe than could be naively suspected. Since we do live in a relativistic world, where locally the principle of Relativity describes many physical phenomena, determining how it dictates the collective behaviour of systems of many degrees of freedom may be of considerable importance.
The κ-entropy was introduced as a functional that generates, through the variational principle, a given κ-exponential distribution that arose from arguments pertaining to non-linear kinetics [11]. Lorentz invariance is already built into the underlying dynamics in this formalism.
The κ-entropy was defined directly for a continuous probability distribution with density $\rho$ on the sample space $\Omega$ by the functional (16) of [11,13], in which $Z$ is a normalisation constant and $\kappa \in \mathbb{R}$. So far as the author knows, there is no standing proposal to extend this to $\kappa \in \mathbb{C}$, although we do not see any reason why this would not be feasible, if the need arose and an appropriate physical motivation and interpretation were provided, as in the case of $\mathcal{S}_q$. We expect worries/objections about this continuous functional form similar to the ones that arose for $\mathcal{S}_q$, which were alluded to above. The discrete analogue of (16) for a set of outcomes $I$ with corresponding probabilities $\{p_i\}$, $i \in I$, would naively appear to be the expression (17), with $c(\kappa)$ as in (16). One should be quite careful though in providing such a naive discrete generalization, if one is interested in maintaining for (17) some form of Lorentz-invariance such as the one that gave rise to (16). It is well-known, and probably obvious, that discrete structures violate manifest Lorentz-invariance, something that has presented major technical challenges to proponents of quantum gravity theories. The solutions, which could also be adopted here, are either to forego completely any requirement for even remnants of Lorentz-invariance in (17), or to use arguments relying on randomness that preserve such a structure, as was done, for instance, in [53] for the case of causal sets.
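One can immediately observe that $\lim_{\kappa \to 0} \mathcal{S}_\kappa = \mathcal{S}_{BGS}$ (18). It is also immediately obvious that $\mathcal{S}_\kappa$ is not additive with respect to the usual addition and that, to restore manifest additivity, one will have to define the generalised sum as [13,15] $x \overset{\kappa}{\oplus} y = x \sqrt{1 + \kappa^2 y^2} + y \sqrt{1 + \kappa^2 x^2}$ (19), where $|\kappa| < 1$, mirroring (10). This can be re-written as $x \overset{\kappa}{\oplus} y = \frac{1}{\kappa} \sinh \left( \mathrm{arcsinh}(\kappa x) + \mathrm{arcsinh}(\kappa y) \right)$ (20), to resemble more closely the generalized product, to be defined next. The generalised product [13,15], which is distributive with respect to the generalised sum (19) and mirrors (12), turns out to be $x \overset{\kappa}{\otimes} y = \frac{1}{\kappa} \sinh \left( \frac{1}{\kappa} \, \mathrm{arcsinh}(\kappa x) \, \mathrm{arcsinh}(\kappa y) \right)$ (21). Then, in parallel to the case of $\mathcal{S}_q$, one can define [13,15] the deformed field $\mathbb{R}_\kappa = (\mathbb{R}, \overset{\kappa}{\oplus}, \overset{\kappa}{\otimes})$ and set up [13,15] a field isomorphism $\tau_\kappa : \mathbb{R} \to \mathbb{R}_\kappa$ which is given explicitly by $\tau_\kappa(x) = \frac{1}{\kappa} \sinh(\kappa x)$ (22), with an inverse $\tau_\kappa^{-1}(x) = \frac{1}{\kappa} \, \mathrm{arcsinh}(\kappa x)$ (23), in analogy with (14) for $\mathcal{S}_q$. Mimicking (15), we get $x \overset{\kappa}{\otimes} y = \tau_\kappa \left( \tau_\kappa^{-1}(x) \cdot \tau_\kappa^{-1}(y) \right)$ (24). We are not aware of an existing axiomatic formulation of the κ-entropy, but we do not consider this a drawback at a physical level, but rather an open question at the formal level that remains to be addressed, if such interest arises, in the future.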
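The algebraic relations among the κ-deformed operations can be verified numerically. The sketch below takes the product from (21); the closed form of the κ-sum and the map `tau_kappa(x) = sinh(kappa * x) / kappa` are written out here as assumptions that are consistent with (21). Function names and test values are illustrative.

```python
import math

def kappa_sum(x, y, kappa):
    """Generalised sum x sqrt(1 + k^2 y^2) + y sqrt(1 + k^2 x^2)."""
    return x * math.sqrt(1.0 + (kappa * y) ** 2) + y * math.sqrt(1.0 + (kappa * x) ** 2)

def kappa_product(x, y, kappa):
    """Generalised product (21)."""
    return math.sinh(math.asinh(kappa * x) * math.asinh(kappa * y) / kappa) / kappa

def tau_kappa(x, kappa):
    """Assumed isomorphism from the reals to the kappa-deformed reals."""
    return math.sinh(kappa * x) / kappa

def tau_kappa_inverse(x, kappa):
    """Inverse of tau_kappa."""
    return math.asinh(kappa * x) / kappa

kappa = 0.4
x, y, z = 0.8, -1.5, 2.3

# The sum written with square roots equals the hyperbolic form.
hyperbolic = math.sinh(math.asinh(kappa * x) + math.asinh(kappa * y)) / kappa
print(kappa_sum(x, y, kappa), hyperbolic)

# tau_kappa transports ordinary addition and multiplication.
print(tau_kappa(x + y, kappa), kappa_sum(tau_kappa(x, kappa), tau_kappa(y, kappa), kappa))
print(tau_kappa(x * y, kappa), kappa_product(tau_kappa(x, kappa), tau_kappa(y, kappa), kappa))

# Distributivity of the generalised product over the generalised sum.
left = kappa_product(x, kappa_sum(y, z, kappa), kappa)
right = kappa_sum(kappa_product(x, y, kappa), kappa_product(x, z, kappa), kappa)
print(left, right)

# For small kappa the ordinary operations are recovered.
small = 1e-6
print(kappa_sum(x, y, small), x + y)
print(kappa_product(x, y, small), x * y)
```

As in the $q$ case, each printed pair should agree up to rounding error.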

2.3 Features of S q and S κ entropies.
We observe from the above structures that even though $\mathcal{S}_q$ and $\mathcal{S}_\kappa$ are, arguably, the two most developed non-additive entropic functionals in Statistical Mechanics to date, they are not really all that different from each other in terms of their induced structures. For instance, one can easily argue as in [21,54] that $\mathcal{S}_\kappa$ also describes cases of "weak chaos". To be more precise [21,54], one can straightforwardly see that if a system is described by the κ-entropy and the composition (20) or (21) is a reflection of its phase space metric properties, rather than being an emergent property due to the statistical averaging, then the largest Lyapunov exponent of the underlying dynamical system will be zero, in complete analogy with the case of $\mathcal{S}_q$. This commonality can be traced back to the similarity between the functional forms of $\mathcal{S}_q$ and $\mathcal{S}_\kappa$: both (6) and (16) can be seen to have a functional form that is asymptotically exponential. These functional forms are actually suggestive of the different parametrizations of the hyperbolic space [55]. Of course, this does not mean that the actual functional forms of $\mathcal{S}_q$ and $\mathcal{S}_\kappa$ are the same, or that they will give rise to the same physical predictions, but they should asymptotically share some common features, such as describing weak chaos. It would be of great interest to compare the features of the systems that are described by each one of these two entropic functionals. We believe that one should be able to say similar things for many, if not necessarily all, probability distributions belonging to the exponential family, aspects of which have been developed in [16,17,18].
The non-uniqueness of $\mathcal{S}_q$, at least from the viewpoint of its composition properties, and the fact that it is a part of a larger family of functional forms that share many common features, were also briefly touched upon in [27]. It was noticed in [27] that even though $\mathcal{S}_q$ was an interesting case of a functional form belonging to the displacement convexity class $DC_N$, for $N$ as in (25), it was not unique by any means. Its uniqueness was restored when, in addition to (10), one could invoke the other axioms of [48,49]. It is not obvious, to us at least, that some of these axioms, even if reasonable, should necessarily describe the properties of the entropy for systems out of equilibrium, with long range temporal and spatial correlations, etc. In accordance with the functional forms of the generalized entropies used in defining the Bakry-Émery-Ricci curvature through optimal transportation, as presented in [27], it may be more prudent to consider $\mathcal{S}_q$ as just one interesting example of an entropy having a polynomial/power-law form, rather than as the unique entropy that may describe systems having the properties mentioned above. Therefore the aforementioned interest in the analysis of systems that are described by one of such entropies, but not by the others, may help clarify their range of applicability or even the physical mechanisms leading to their effectiveness in describing the macroscopic properties of such systems.
3 The "shape" of independence and phase-space coarse-graining.
In this Section we analyse the concept of "independence" and the subsequent shape it induces on the fundamental cells in phase-space coarse-graining, with a view toward S BGS and the non-additive entropies of the previous Section.

3.1 Independence and cubes.
The conventionally accepted formalisation of the concept of independent interacting subsystems was stated in (7) and is realised through the multiplicative character of the marginal probability distributions of the interacting subsystems. In the closely related case of random variables $X$ and $Y$, they are called "independent" if $E(XY) = E(X) \, E(Y)$ (26), where $E$ stands for the "expectation value" of the corresponding random variable [56]. At the set-theoretical level, "independence" is conventionally encoded via the Cartesian product of sets. From this viewpoint, the simplest set expressing set-theoretic independence is the unit cube in $\mathbb{R}^n$, indicated by $I_n$ in (27). From (7) it becomes obvious that the concept of probabilistic independence is intimately related to multiplicative-like structures [56]. Hence modifications in the definition of multiplication, as in (12), (21) for instance, will have significant implications for determining what constitutes "independent" outcomes. Through all this, we want to indicate that the introduction of (12), which was induced by $\mathcal{S}_q$, and of (21) for $\mathcal{S}_\kappa$, forces us to re-think and modify the concept of "independence" in the framework of the non-additive entropies (6), (16). This modification of "independence" is necessary due to the long-range temporal and spatial correlations of the systems that the non-additive entropies describe. When such correlations are present, the conventional definition (7) does not behave well ("covariantly") with respect to the structures induced by the underlying entropies such as $\mathcal{S}_q$ or $\mathcal{S}_\kappa$. As stated above, if we want to assume that the macroscopic algebraic and geometric properties are a direct reflection of the microscopic dynamics, and not emergent due to statistics, then a more "covariant" definition of independence at the microscopic level would be $p_{A \cup B} = p_A \otimes_q p_B$ (28) or $p_{A \cup B} = p_A \overset{\kappa}{\otimes} p_B$ (29), following (12) or (21) respectively.
Since there is no obvious generalisation of the Cartesian product in such cases, it is hard to see how one can find the counterparts of the unit cube (27) for the generalised products (12), (21).

3.2 Generalized independence and polytopes.
The question that therefore naturally arises is how to determine such generalised "cubes", whose shape would express generalized independence the same way that (27) expresses conventional independence. An answer is provided if one thinks of the cube in a metric, rather than in a set-theoretic, way. Consider $\mathbb{R}^n$. Its elements $a$ are ordered n-tuples of real numbers $a = (a_1, \ldots, a_n)$. Their $p$-norm, for $p \in \mathbb{R}$ with $p \geq 1$ so that the triangle inequality is satisfied, is defined by $\| a \|_p = \left( \sum_{i=1}^n |a_i|^p \right)^{1/p}$ (30), where $| \cdot |$ stands for the absolute value of its argument. The sup-norm in $\mathbb{R}^n$ can be seen either as $\| a \|_\infty = \max_{1 \leq i \leq n} |a_i|$ (31) or, equivalently, as the limit $\| a \|_\infty = \lim_{p \to \infty} \| a \|_p$ (32). With such norms, $\mathbb{R}^n$ is a Banach space, indicated as $l^n_p$ or $l^n_\infty$ respectively. The ball $B_r(x)$ of radius $r$ centered at a point $x$ of a metric space $X$ with distance function $d$ is defined by $B_r(x) = \{ y \in X : d(x, y) \leq r \}$ (33) and the sphere $S_r(x)$ of radius $r$ is defined by $S_r(x) = \{ y \in X : d(x, y) = r \}$ (34). One can easily see that the cube is the unit ball of $l^n_\infty$, namely $I_n = B^\infty_1(0)$ (35), where the superscript explicitly denotes the sup-norm. The advantage of this viewpoint is that it can be carried over directly to infinite dimensions, namely to the space of sequences $(a_1, \ldots, a_n, \ldots)$ with elements in $\mathbb{R}$, that is, to the Banach space $l_p$, $p \in [1, \infty]$. Actually there are several such reasonable infinite dimensional limits, depending on one's goals, but we will not enter into the details of this. Such infinite dimensional limits are useful if one wishes to be able to consider the "thermodynamic limit" $n \to \infty$ at some stage of these calculations. Moreover, such definitions can be generalised to uncountable spaces, such as the Lebesgue spaces $L^p(\mathbb{R}^n)$ of p-integrable functions, and to Orlicz, Sobolev and even more general function spaces [57] that may be useful.
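The metric description of the cube can be illustrated with a short computation of p-norms and of their limit. This is a minimal sketch with illustrative function names and an arbitrary sample vector; the rescaling by the largest component is only a numerical precaution against overflow for large $p$.

```python
import math

def p_norm(a, p):
    """p-norm of the tuple a for p >= 1, rescaled to avoid overflow."""
    largest = max(abs(x) for x in a)
    if largest == 0.0:
        return 0.0
    return largest * sum((abs(x) / largest) ** p for x in a) ** (1.0 / p)

def sup_norm(a):
    """Sup-norm: the largest absolute value among the components."""
    return max(abs(x) for x in a)

def in_unit_cube(a):
    """Membership in the unit ball of the sup-norm, i.e. the cube [-1, 1]^n."""
    return sup_norm(a) <= 1.0

a = [3.0, -4.0, 1.0, 2.0]
for p in (1, 2, 4, 16, 64, 500):
    print(p, p_norm(a, p))      # decreases towards the sup-norm
print("sup", sup_norm(a))

print(in_unit_cube([0.5, -1.0, 0.9]), in_unit_cube([0.5, 1.2]))
```

The p-norms decrease monotonically with $p$ and approach the sup-norm, whose unit ball is the cube.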
One can then use the generalized operations of the deformed fields $\mathbb{R}_q$ and $\mathbb{R}_\kappa$, instead of the usual addition and multiplication, to define the generalised cubes $I_n^q$ and $I_n^\kappa$ in exactly the same way as was done for $\mathbb{R}^n$ in the previous paragraph. This is possible because of the presence of the field isomorphisms (14), (23) which, being distance non-decreasing maps, also preserve the order structure of $\mathbb{R}$. Hence the topologies induced by the generalized operations of $\mathbb{R}_q$, $\mathbb{R}_\kappa$ are homeomorphic, the ordering of the elements of these sets is maintained, and therefore the supremum has an unambiguous meaning etc. Given such definitions, the polytopes playing the role of the cubes $I_n^q$, $I_n^\kappa$ for the generalized products (12), (21) respectively can all be seen to be given by $I_n^q = \tilde{\tau}_q (I_n)$ (36) and $I_n^\kappa = \tilde{\tau}_\kappa (I_n)$ (37), where the tilde denotes the n-dimensional extension of its underlying isomorphism.
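A small sketch may help visualise the construction (37). It assumes that the map `tau_kappa(x) = sinh(kappa * x) / kappa` is consistent with (21), that the n-dimensional extension acts componentwise, and that the cube is the sup-norm unit ball $[-1, 1]^n$; all three are assumptions of this illustration, and the names are hypothetical.

```python
import itertools
import math

def tau_kappa(x, kappa):
    """Assumed isomorphism from the reals to the kappa-deformed reals."""
    return math.sinh(kappa * x) / kappa

def extend(point, kappa):
    """Componentwise n-dimensional extension of tau_kappa (an assumption)."""
    return tuple(tau_kappa(c, kappa) for c in point)

n = 3
kappa = 0.5

# Vertices of the cube [-1, 1]^n and their images under the extended map.
vertices = list(itertools.product((-1.0, 1.0), repeat=n))
image = [extend(v, kappa) for v in vertices]
half_side = tau_kappa(1.0, kappa)
print(half_side)
print(image[0], image[-1])

# tau_kappa preserves the ordering and does not decrease distances.
samples = [-2.0, -0.7, 0.0, 0.3, 1.1, 2.5]
mapped = [tau_kappa(s, kappa) for s in samples]
order_preserved = all(u < w for u, w in zip(mapped, mapped[1:]))
non_contracting = all(
    abs(mapped[i] - mapped[j]) >= abs(samples[i] - samples[j]) - 1e-12
    for i in range(len(samples)) for j in range(len(samples))
)
print(order_preserved, non_contracting)
```

Under the componentwise assumption the image of the cube is again a cube in Euclidean coordinates, with half-side `tau_kappa(1, kappa)`.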

Euclidean and dynamical aspects of coarse-graining with cubes.
The definitions of such cubes are particularly important in the context of coarse-graining of the phase space [29,28,58,59,60,61]. Coarse-graining was introduced by P. Ehrenfest and T. Ehrenfest in an attempt to explain the origin of macroscopic irreversibility, in the face of microscopic reversible dynamics. Many of these ideas can be traced back to L. Boltzmann.
One way to implement coarse-graining is to divide the phase space of the microscopic dynamical system (Hamiltonian of many degrees of freedom, in the case of our interest) into cells and to substitute the smooth probability density $\rho$ on phase space by a density $\rho_{cg}$ that is constant in each of these cells. The size of each cell is assumed to be small but it should not approach zero. For Boltzmann's ideas about the behaviour of gases and the Sackur-Tetrode equation (3) the side length of each cube is taken to be $\sqrt{\hbar}$. Effectively what this approach to coarse-graining does is to combine elements of the microscopic evolution of the system with a periodic partial equilibration. The end result is to determine a macroscopic kinetic equation that does not retain any memory of the initial condition of the system but captures the evolution of these successive partial equilibrations [28,58,59].
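The substitution of a density by a cell-wise constant one can be sketched in a discrete toy setting. The example below is only an illustration under stated assumptions: the fine-grained density is represented by a hypothetical list of probabilities on a one-dimensional grid, cells are blocks of consecutive grid points, and the entropy is the discrete BGS form with the Boltzmann constant set to one.

```python
import math


def coarse_grain(probs, cell):
    """Replace each probability by the average over its block of `cell` entries."""
    out = []
    for start in range(0, len(probs), cell):
        block = probs[start:start + cell]
        out.extend([sum(block) / len(block)] * len(block))
    return out


def entropy(probs):
    """Discrete BGS entropy -sum p ln p (Boltzmann constant set to 1)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical fine-grained probabilities on 8 grid points (they sum to 1).
fine = [0.40, 0.10, 0.05, 0.05, 0.20, 0.10, 0.05, 0.05]
coarse = coarse_grain(fine, 2)
```

In this toy case the averaging conserves the total probability and does not decrease the entropy, which reflects the loss of fine-grained information that coarse-graining is meant to encode.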
The coarse-graining of the phase space, but in a different form than that described in the previous paragraph can be attributed to the approximate knowledge that we have about the system, even at the (quasi-)classical level [60,61]. In a physical situation there is always some uncertainty, either about the dynamics or about the exact initial conditions of the system, or about both. Such uncertainties are frequently encoded in dynamics as "noise" or some other stochastic process through which the system interacts with its environment. "Noise", or particular slowly varying background fields, can also be seen as encoding the collective effect of degrees of freedom in the system which although present may be considered of secondary importance at the energy, time etc scale of interest. This is the spirit behind the Langevin approach in constructing kinetic equations [62].
In addition, since it is impossible to prepare a system with absolute accuracy at some predetermined state, we are inevitably led to consider not only a desired, or convenient, initial condition in our models, but a set of initial conditions that are reasonably close to the desirable one for the level of accuracy that we can tolerate in our predictions. Hence, one has to consider the evolution of sets of initial conditions under the given dynamics, with or without stochastic sources. This uncertainty is expressed by performing a periodic "ε-fattening" of the phase space evolution (orbit) of the system. After some judiciously chosen, for a particular model, amount of time, one "fattens" the Hamiltonian orbit, so that it initially appears to be like a tube.
The question that arises is how such perturbations, assumed to be initially small, either in the initial conditions or through noise, affect the system under study. The initial hope that they would not affect the underlying system in any qualitatively significant way was proved to be too naive in [63] (see also [64]), in the case of systems with phase space dimension dimM > 2.
This instability directly questions the physical relevance, for Statistical Mechanics, of the single orbit analysis of any particular dynamical system, even if it were practically feasible. Hence one is forced to consider the behaviour of sets of orbits which are initially near each other. Then one uses the ergodic theorem to substitute averages over orbits with averages with respect to appropriate measures over the whole phase space. This is a reasonable choice, assumed to be true, as ergodic measures are precluded from having a "complicated" phase space behaviour such as possessing attracting sets etc. This is a direct implication of Birkhoff's ergodic theorem [65]. Either way, and irrespective of the reason or the way that one chooses to perform phase space coarse-graining, during such a process some of the features of the underlying microscopic evolution are lost, a fact which is desirable if one wishes to capture the thermodynamic behaviour of the system with the half-dozen or so (at most) macroscopic variables, as is usually the case.
The question that arises then is how to perform the coarse-graining of phase-space. The process appears, and largely is, ad hoc. But it is fundamental for the definition of any entropic functional. Due to its importance one may wish to make such a process a bit less ad hoc by employing even partial knowledge about the underlying dynamics of the system. The obvious choice is to assume that the phase-space is divided into cubical cells of side length $\sqrt{\hbar}$, each of which obviously has volume $\hbar^{n/2}$, if $\dim M = n$. That typical cells in the coarse-graining process should be cubes is supported not only by the fact that geometrically they express "independence", or by their geometric simplicity, but also by the quantum nature of the underlying physical mechanisms.
Probably the simplest realisation of this underlying quantum nature is the emergence of the unit cube of side $\sqrt{\hbar}$ in the asymptotic expression for the spectrum of the Laplacian on $M$ which is provided by Weyl's asymptotic formula [66]. Weyl's asymptotic formula applies to a bounded domain $\Omega \subset \mathbb{R}^n$, but this is not a problem in our case, since the cubes that we use to coarse-grain $M$ have such a small side that they can be considered effectively flat, to a first order approximation. Assume that such a domain $\Omega$ also has a smooth boundary and indicate by $\lambda_k$, $k = 0, 1, \ldots, n, \ldots$ the eigenvalues of the Laplacian on functions $f : \Omega \to \mathbb{R}$ subject to the Dirichlet boundary condition $f|_{\partial\Omega} = 0$. Then the number $N(\Lambda)$ of such eigenvalues which are smaller than $\Lambda > 0$ behaves asymptotically as $N(\Lambda) \sim \frac{\omega_n \, \mathrm{vol}(\Omega)}{(2\pi)^n \, \hbar^{n/2}} \, \Lambda^{n/2}$ (38) where $\omega_n$ denotes the volume of the unit ball of $\mathbb{R}^n$ and where we have used cubes of side length $\sqrt{\hbar}$. This counts the number of quantum states of the Laplacian inside $\Omega$, which can also be re-interpreted as the number of quantum cubical cells inside $\Omega$, for macroscopic values of $\Lambda$ such as the ones needed in thermodynamics, hence $\Lambda \to \infty$.
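Weyl's law can be checked numerically in the simplest case. The sketch below assumes the unit square in $\mathbb{R}^2$, whose Dirichlet eigenvalues are $\pi^2 (j^2 + k^2)$ with $j, k \geq 1$, and uses the standard leading term $\omega_n \mathrm{vol}(\Omega) \Lambda^{n/2} / (2\pi)^n$ in units where the cell side is one. The function names are hypothetical.

```python
import math


def count_dirichlet_eigenvalues(Lambda):
    """Count eigenvalues pi^2 (j^2 + k^2) < Lambda of the Dirichlet Laplacian
    on the unit square, with j, k = 1, 2, ..."""
    bound = Lambda / math.pi ** 2
    jmax = int(math.sqrt(bound)) + 1
    return sum(1
               for j in range(1, jmax + 1)
               for k in range(1, jmax + 1)
               if j * j + k * k < bound)


def weyl_estimate(Lambda, n=2, volume=1.0):
    """Leading Weyl term omega_n * vol * Lambda^(n/2) / (2 pi)^n,
    where omega_n is the volume of the unit ball of R^n."""
    omega_n = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return omega_n * volume * Lambda ** (n / 2) / (2 * math.pi) ** n


Lambda = 40000.0
ratio = count_dirichlet_eigenvalues(Lambda) / weyl_estimate(Lambda)
```

The ratio approaches one from below as $\Lambda$ grows; the deficit is a boundary effect that is subleading in $\Lambda$.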
Such validity relies tacitly on the fact that fundamental kinetic terms are always quadratic and that long memory effects that may give rise to non-Markovian evolutions described by anomalous kinetic terms are always an effective description arising due to the underlying statistics.
Despite the above plausibility arguments, the choice of the fundamental cells to be cubes still remains somewhat arbitrary. It should also be considered as not yet acceptable, because it ignores a central aspect of the Hamiltonian dynamics on phase space: its canonical transformation invariance, or in other words the existence of a symplectic structure on M. As we will argue in the next Sections, choosing cubical cells for coarse-graining is probably the worst choice that someone could make in a metric sense, but probably the best in a measure-theoretical one: the best choice of a shape for the fundamental cells from the viewpoint of the Hamiltonian evolution would be (Euclidean) balls/ellipsoids instead of cubes.

Riemannian aspects of phase space coarse-graining.
An additional subtlety stems from the fact that the phase space on which the Hamiltonian evolution takes place is not usually $\mathbb{R}^n$ but some Riemannian manifold M, with additional structure which we chose to overlook in the previous Sections. Even though any Riemannian manifold can be isometrically $C^1$-embedded [67] (or even smoothly, i.e. $C^k$, $3 \leq k \leq \infty$, embedded [68]) into some $\mathbb{R}^N$, for N large enough, an intrinsic description is sought that would allow us not to worry about intrinsic versus embedding features in the resulting geometric description. This is very much in the spirit of Geometry since the time of C.F. Gauss and was implemented in General Relativity, for instance. Riemannian manifolds are metrically almost Euclidean. Many of their metric properties can be expressed in terms of their sectional curvature, which determines locally (as a second order deviation from "flatness") the distances on M [69,70,71]. Among the by-products of the sectional curvature, the Ricci curvature determines the volumes of shapes lying in hyperplanes perpendicular to a given direction on M, such as the direction of the Hamiltonian evolution. As a result of such curvatures, an initial shape will be distorted even if it is parallel transported along a curve. Hence if someone starts by partitioning the phase space into cubical cells, for the purposes of coarse-graining, and wishes to follow the dynamics, the corresponding cells will become distorted cubes, i.e. 2n-face polytopes, along any orbit of the Hamiltonian system. As long as the underlying dynamics remains invertible, the number of faces of such polytopes will remain 2n, even if the areas of the faces are no longer equal to each other. If, for whatever reason (such as taking the thermodynamic limit), the dynamics loses its invertibility [65], then such cells may acquire a larger or smaller number of faces.
To summarise the discussion of this Section, coarse-graining and the curvature features of the phase space M force us to consider not only cubes but more general polytopes as the basic cells of coarse-graining of phase-space M. The need for such generalisation from cubes to polytopes becomes obvious, if one wishes to incorporate in the formalism the effects of generalized products such as (12), (21) via their induced generalised concepts of independence which are expressed geometrically through their induced "unit" cubes such as (36), (37). In all this discussion so far, we have (on purpose) ignored the dictates of the symplectic structure of M, which as will be seen in the next Section, point toward a very different, and largely incompatible, proposal on how to actually perform such a coarse-graining.

Symplectic basics: capacities and the role of ellipsoids.
In this Section, we provide some background on aspects of Symplectic Geometry/Topology that we need for our arguments, in an attempt to make the manuscript reasonably self-contained. Even though the concepts and facts that we present are very well-known to a mathematical audience, some of them are very non-trivial and have either only been proved relatively recently or they are still a subject of investigation. One might wish to consult some books, such as [72,73,74,75,76,77,78] or reviews [79,80,81] to get a grasp of such matters that we can only very superficially touch upon here.
It may, for a moment, be worth thinking about the role of the Hamiltonian approach to Mechanics. There are several well-known advantages over the Lagrangian formulation (and vice-versa). From our perspective, and for our purposes, the Hamiltonian approach is more suitable because it allows for more symmetries between its variables. By elevating the canonical coordinates $q_i$, $i = 1, \ldots, n$ and the canonical momenta (not probabilities!) $p_i$, $i = 1, \ldots, n$ to equal status, the number of independent variables is doubled, hence there is a greater possibility to detect and profitably use otherwise hidden symmetries or invariances. To make this easier to understand, we can start from a simple discrete case: suppose we have been given one point. Then there is very little in terms of operations and symmetries that one can detect, therefore very little latitude and a substantial lack of direction in building, detecting or utilising such structures. Now consider a set whose elements are multiple copies of this point. Then one can easily start by determining its automorphism group and its algebraic properties, one can build discrete geometric structures such as graphs or simplices etc. and then, by some form of reduction, one can go back to the induced properties of such structures pertinent to one point. This approach seems to have been appreciated first by E. Galois. The spirit of the Hamiltonian approach follows, to an extent, these lines.

Basics about symplectic vector spaces.
Let H denote the Hamiltonian of a system (of many degrees of freedom, eventually). Hamilton's equations, as is well-known, are $\dot{q}^i = \frac{\partial H}{\partial p_i}$ (39) and $\dot{p}_i = -\frac{\partial H}{\partial q^i}$ (40) where the dot indicates differentiation with respect to the evolution parameter ("time"). Since the canonical coordinates and momenta are on an equal footing in the Hamiltonian approach, we can put them side by side as coordinates of a vector $\xi = (q^1, \ldots, q^n, p_1, \ldots, p_n)$ and re-express Hamilton's equations as $\dot{\xi}^i = \omega^{ij} \frac{\partial H}{\partial \xi^j}$ (41) where the summation convention over repeated indices is assumed, and the matrix $\omega$ has elements $\omega^{ij}$, $i, j = 1, \ldots, 2n$ given by $\omega = \begin{pmatrix} 0_n & 1_n \\ -1_n & 0_n \end{pmatrix}$ (42) where $0_n$ and $1_n$ stand for the null and the unit $n \times n$ matrices with real entries. Moreover, we see that $\omega^T = -\omega$ and $\det \omega = 1$ (43). These statements are abstracted in the definition of a real (finite dimensional) symplectic vector space V, which is a finite dimensional real vector space of even dimension endowed with an antisymmetric and non-degenerate bilinear form $\omega$, namely $\omega : V \times V \to \mathbb{R}$ with $\omega(X, Y) = -\omega(Y, X)$ (44) and such that for any $X \in V$: $\omega(X, Y) = 0$ for all $Y \in V$ implies $X = 0$ (45). The last equation is a non-degeneracy condition providing an isomorphism between V and its dual $V^*$, namely $V \ni X \mapsto i_X \omega = \omega(X, \cdot) \in V^*$ (46) where $i_X$ denotes contraction of the symplectic form in the direction of the vector X. A different way to express the non-degeneracy of $\omega$ is by requiring that $\frac{1}{n!} \, \omega^n = \frac{1}{n!} \, \omega \wedge \cdots \wedge \omega$ (47) where the $n!$ just fixes a normalisation, be a volume form on V, which is unique (up to the normalisation). Following the standard algebraic practice, one defines W to be a symplectic subspace of V if $\omega$ is non-degenerate on the linear subspace W. Obviously, the antisymmetry condition of $\omega$ is satisfied on W.
Even at this level, one can see that symplectic vector spaces are substantially different from spaces endowed with symmetric bilinear forms. Requiring antisymmetry, instead of symmetry, of a bilinear form on such a space has proved to have profound consequences, some of which will be noted below. One such consequence is that the concepts of symplectic and Euclidean orthogonality are very different. Let U be a linear subspace of the symplectic vector space V. Then the (symplectic) orthogonal complement $U^\perp$ of U is defined by $U^\perp = \{X \in V : \omega(X, Y) = 0 \ \ \forall \ Y \in U\}$ (48). Unlike the case of Euclidean geometry, U and $U^\perp$ need not be complementary subspaces, even though the non-degeneracy condition (45) implies that $\dim U + \dim U^\perp = \dim V$ (49). On the one hand, if they are indeed complementary, namely if $U \cap U^\perp = \{0\}$ (50) then one can prove that this is equivalent to stating that U is a symplectic subspace W of V, which is also equivalent to stating that $V = U \oplus U^\perp$ (51). On the other hand, one can observe that every vector $X \in V$ is orthogonal to itself due to the antisymmetry of the symplectic form (42). These relationships between U and $U^\perp$ that are absent in Euclidean geometry can be generalized: U is called isotropic if $U \subset U^\perp$, co-isotropic if $U^\perp \subset U$ and Lagrangian if U is both isotropic and co-isotropic, namely $U = U^\perp$. Clearly, in 2 dimensions, the lines passing through the origin are Lagrangian subspaces of the plane. The subspace of canonical coordinates and that of canonical momenta are Lagrangian subspaces at a point of the phase-space of a Hamiltonian system.
All these definitions are "strange" by Euclidean standards, as the Euclidean metric has been explicitly defined to exclude such occurrences. However, the vanishing of a bilinear form may be more familiar in the context of Special and General Relativity. If $\omega$ were a symmetric, rather than antisymmetric, bilinear form then the fact that $\omega(X, X) = 0$ (52) would define the light-like vectors $X \in V$. Since the usual 4-dim Minkowski "distance function" $ds^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2$ (53) where c indicates the speed of light, can be formally seen as arising from the 4-dim Euclidean metric through the formal substitution $x^4 \to ict$ (54) one may be led to suspect that there is an intimate relation between a Euclidean metric, an (almost) complex structure and the symplectic structure in a vector space V. The (almost) complex structure of V can be defined as an anti-involution J, namely $J : V \to V$ such that $J^2 = -1$ (55). In more concrete terms and for $V = \mathbb{R}^{2n}$, with respect to a Cartesian base J has the antisymmetric form $J = \begin{pmatrix} 0_n & 1_n \\ -1_n & 0_n \end{pmatrix}$ (56). Then V can be made into a complex vector space by defining, for $a, b \in \mathbb{R}$ and $X \in V$, $(a + ib) X = aX - bJX$ (57) where the action of J on X has the effect of a multiplication by $-i$. It is no coincidence that (42) and (56) have a similar form: indeed, if we indicate by $\langle \cdot, \cdot \rangle$ the Euclidean inner product on $\mathbb{R}^{2n}$, then for $X, Y \in \mathbb{R}^{2n}$ we see that $\omega(X, Y) = \langle X, JY \rangle$ (58) or, since $J^2 = -1$, $\langle X, Y \rangle = -\omega(X, JY)$ (59). The compatibility conditions (58), (59) have profound consequences in the case of manifolds to which we will turn in the next paragraphs.
Before that though, it may be worth mentioning another difference between the symplectic and the Euclidean cases: one can prove that in any symplectic vector space V one can choose a "symplectic basis" in which the symplectic form will have essentially the same form as in (42), or to be more precise, one can pick a basis $e_i, f_j$, $i, j = 1, \ldots, n$ so that $\omega(e_i, e_j) = 0 = \omega(f_i, f_j)$ and $\omega(e_i, f_j) = \delta_{ij}$ (60). From our experience with Hamiltonian dynamics, one can see that this is an abstraction of the fact that, locally, the symplectic form looks like (42) in each 2-plane made up of the canonical coordinate $q_i$ and its conjugate canonical momentum $p_i$ for $i = 1, \ldots, n$. Hence (60) is the antisymmetric/symplectic analogue of the Gram-Schmidt diagonalization process of symmetric bilinear forms. We see that even though in the latter case there are numerous possibilities in this diagonalization process, in the symplectic case all symplectic vector spaces of the same dimension are the same. This is behind Darboux's theorem and the lack of local symplectic invariants for the case of symplectic manifolds, in sharp contrast to the Riemannian case.
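The relations satisfied by a symplectic basis can be verified directly in coordinates. The sketch below assumes the standard symplectic form on $\mathbb{R}^{2n}$ in coordinates $(x_1, \ldots, x_n, p_1, \ldots, p_n)$ with the sign convention $\omega(X, Y) = \sum_i (X_i Y_{n+i} - X_{n+i} Y_i)$; this convention and the helper names are assumptions made for illustration.

```python
def omega(X, Y):
    """Standard symplectic form on R^{2n}:
    sum_i (X_i * Y_{n+i} - X_{n+i} * Y_i)."""
    n = len(X) // 2
    return sum(X[i] * Y[n + i] - X[n + i] * Y[i] for i in range(n))


def unit(k, dim):
    """k-th Cartesian basis vector of R^dim."""
    return [1.0 if i == k else 0.0 for i in range(dim)]


n = 3
e = [unit(i, 2 * n) for i in range(n)]      # coordinate directions
f = [unit(n + i, 2 * n) for i in range(n)]  # conjugate momentum directions

# The coordinate subspace span(e) and the momentum subspace span(f) are
# isotropic of dimension n, hence Lagrangian; e_i pairs only with f_i.
pairing = [[omega(e[i], f[j]) for j in range(n)] for i in range(n)]
```

Here `pairing` is the identity matrix, while `omega` vanishes on any two vectors of `e` or any two vectors of `f`, and on any vector paired with itself.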
Let V be a symplectic vector space endowed with the symplectic form $\omega$. A linear map $\varphi : V \to V$ is called symplectic or canonical if it preserves the symplectic structure, namely if the pull-back form $\varphi^* \omega$ obeys $\varphi^* \omega = \omega$ (61). As is clear from the terminology, symplectic maps are the linear canonical transformations of Hamiltonian Mechanics. These maps can be represented as matrices $\Phi$, and it turns out that they obey $\det \Phi = 1$ (62) whose geometric interpretation is that they are volume-preserving. This is the linear formulation of Liouville's theorem. The obvious question of whether there is any difference between volume preserving and symplectic maps, in the case of manifolds, will be discussed in the sequel.
To complete the discussion of symplectic vector spaces, it turns out that if $(U_1, \omega_1)$ and $(U_2, \omega_2)$ are two symplectic vector spaces of the same dimension, then there is a linear isomorphism $\varphi : U_1 \to U_2$ such that $\varphi^* \omega_2 = \omega_1$. Hence symplectic vector spaces of the same dimension are symplectically equivalent (indistinguishable from a symplectic viewpoint), as was also previously mentioned.
To define symplectic manifolds M, one follows the same steps as in the Riemannian case, but substitutes the symplectic for the corresponding Euclidean structures. Hence someone picks local patches ("charts") of symplectic (instead of Euclidean) vector spaces and glues them together using symplectic (rather than regular) diffeomorphisms. The details of such an intuitively obvious, but non-trivial and at times cumbersome, construction can be found in the references. It is worth mentioning at this point one consequence of the fact that we use symplectic diffeomorphisms to glue together patches of symplectic vector spaces: the symplectic form $\omega$ is postulated to be closed on M, $d\omega = 0$ (63) where d denotes exterior differentiation on the space of differential forms of M. One way to interpret the requirement (63) is to use the canonical symplectic base (60): around any point of M one can find local coordinates $(q^i, p_i)$, $i = 1, \ldots, n$ in which $\omega = \sum_{i=1}^n dq^i \wedge dp_i$ (64). This theorem is due to G. Darboux. It expresses the fact that all symplectic manifolds are locally symplectically indistinguishable. Hence any non-trivial invariants of such manifolds will have to be global. Contrast this with the Riemannian case, in which there are plenty of local invariants, encoded through the Riemann tensor at each point of M and its multiple covariant (properly symmetrized) derivatives and their contractions. One can also see the lack of local structure of symplectic manifolds "equivariantly": symplectic structures can be seen as the "quotient" of a topological space locally homeomorphic to $\mathbb{R}^n$ under a set of "symmetries" (actually re-parametrizations, therefore more akin to gauge rather than global symmetries), namely under the action of the group of symplectic diffeomorphisms. If the set of such symmetries is large enough, it is entirely possible that the resulting structure is unique: this is what actually happens in the case of symplectic manifolds, locally at least.
The non-degeneracy condition (45) carried over to the case of a symplectic manifold M can be seen as expressing, via the isomorphism (46), an isomorphism between the vector fields and the one-forms of M, namely between its tangent $TM$ and cotangent $T^*M$ bundles. Consider a vector field X on M and the Lie derivative along it, indicated by [69] $L_X$, of $\omega$. Then according to Cartan's formula [69], $L_X \omega = i_X d\omega + d(i_X \omega)$ (65). If we assume that $\omega$ is closed ($d\omega = 0$), then $L_X \omega = d(i_X \omega)$ (66). Due to the non-degeneracy condition (45), for any smooth function $f : M \to \mathbb{R}$ ("Hamiltonian") there is a unique vector field $X_f$ on M ("Hamiltonian vector field") such that $i_{X_f} \omega = df$ (67). Substituting (67) into (66) one gets that $L_{X_f} \omega = d(df) = 0$ (68). Therefore $\omega$ remains invariant under the flow generated by $X_f$. This is very desirable and natural from a physical viewpoint: in the case that $f = H$, we would like the symplectic (canonical) form to remain invariant under the ("time") evolution of the system. Turning the argument around, we see that the invariance under evolution of the symplectic form is equivalent to requiring it to be closed, something which is usually assumed from the outset without further explanation.
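In matrix language, the condition that a linear map $\Phi$ preserves the standard symplectic form reads $\Phi^T J \Phi = J$, with $J$ the block matrix $\begin{pmatrix} 0 & 1_n \\ -1_n & 0 \end{pmatrix}$. The following sketch, in which all helper names are hypothetical and the coordinates are assumed to be ordered as $(x_1, x_2, p_1, p_2)$, checks this condition and the determinant for two maps of $\mathbb{R}^4$: a rotation in a conjugate plane, and a stretch that preserves volume without being symplectic.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def det(A):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def standard_J(n):
    """Block matrix [[0, 1_n], [-1_n, 0]] in coordinates (x_1..x_n, p_1..p_n)."""
    J = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def is_symplectic(Phi, tol=1e-12):
    """Check the linear symplectic condition Phi^T J Phi = J."""
    n = len(Phi) // 2
    J = standard_J(n)
    M = matmul(matmul(transpose(Phi), J), Phi)
    return all(abs(M[i][j] - J[i][j]) < tol
               for i in range(2 * n) for j in range(2 * n))


c, s = math.cos(0.7), math.sin(0.7)
# Rotation in the conjugate plane (x_1, p_1).
rotation = [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]
# Stretch x_1 and compress x_2: unit determinant, but not symplectic.
stretch = [[2, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Both matrices have unit determinant, but only `rotation` passes `is_symplectic`; in dimension four, volume preservation is therefore strictly weaker than symplecticity even at the linear level.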
A second point arising from the above short argument is to see that the local uniqueness of the symplectic structure, expressed through Darboux's theorem, can indeed be seen from an equivariant viewpoint as previously suggested. The set of "symmetries" of the symplectic structures is infinite dimensional since it is generated by the "Hamiltonian vector fields" X f corresponding to any smooth enough functions f . This is a typical situation in topological field theories, for instance, and it is quite extensively employed in field and string theories on models with enough supersymmetries etc. It may be worth comparing this to the Riemannian case in which the isometry group of any metric is finite dimensional, something that allows for local structure that is able to differentiate between different Riemannian spaces.
Consider now a level set $S_0$ of the Hamiltonian H, namely a hypersurface of M on which H takes a constant value. Applying (67), for $f = H$, to the Hamiltonian vector field $X_H$ itself and using the antisymmetry of $\omega$, one finds $dH(X_H) = \omega(X_H, X_H)$ (69) which gives $dH(X_H) = 0$ (70). Hence at each point of M, $X_H$ lies in the kernel of the map $dH : TM \to \mathbb{R}$. In other words, the Hamiltonian vector field $X_H$ is tangent to, and therefore it preserves, the level sets $S_0$. This result is not totally unexpected if one looks at it from the viewpoint of the local compatibility between the symplectic and the almost complex structure, expressed in the case of linear spaces in (58), (59). The role of J (56) is to generalise the complex unit i, hence its action can be seen to amount to a rotation by a right angle. The usual gradient vector field $\nabla H$ is perpendicular to the level set $S_0$. Hence to compute the symplectic gradient $X_H$, following (58), (59), one has to rotate the usual gradient by an appropriate right angle, thus making it tangential to the level set $S_0$. The above statements of this paragraph are a formal way of expressing the fact that the trajectory in phase space M of an isolated system evolves on the constant energy hyper-surface $S_0$, a realisation which lies at the foundation of the micro-canonical approach in Statistical Mechanics. Since the Lie derivative $L_X$ is a derivation, and given (68), we get $L_{X_f} \, \omega^n = 0$ (71), namely the invariance of the volume form (47) under the Hamiltonian flow, which is Liouville's theorem.
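The tangency of the Hamiltonian vector field to the level sets of H can be illustrated numerically. The sketch below assumes the coordinate expression $X_H = (\partial H / \partial p, -\partial H / \partial q)$, a finite-difference gradient, and a hypothetical quartic Hamiltonian with two degrees of freedom chosen only for illustration.

```python
def gradient(H, z, h=1e-6):
    """Central finite-difference gradient of H at the phase-space point z."""
    g = []
    for i in range(len(z)):
        up = list(z)
        up[i] += h
        dn = list(z)
        dn[i] -= h
        g.append((H(up) - H(dn)) / (2 * h))
    return g


def hamiltonian_vector_field(H, z):
    """X_H = (dH/dp, -dH/dq) for z = (q_1..q_n, p_1..p_n)."""
    n = len(z) // 2
    g = gradient(H, z)
    return g[n:] + [-x for x in g[:n]]


def H(z):
    # Hypothetical example: two coupled quartic oscillators.
    q1, q2, p1, p2 = z
    return (0.5 * (p1 ** 2 + p2 ** 2)
            + 0.25 * (q1 ** 4 + q2 ** 4)
            + 0.5 * q1 ** 2 * q2 ** 2)


z = [0.3, -1.2, 0.8, 0.5]
X = hamiltonian_vector_field(H, z)
# dH(X_H): the directional derivative of H along its own Hamiltonian vector field.
dH_along_X = sum(g * x for g, x in zip(gradient(H, z), X))
```

The quantity `dH_along_X` vanishes up to rounding, since the contributions of each conjugate pair cancel identically; this is the coordinate form of the statement that $X_H$ is the usual gradient rotated by a right angle.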

The symplectic non-squeezing theorem.
We saw that symplectic geometry, in sharp contrast to the Euclidean/Riemannian case, is fundamentally 2-dimensional, something which can also be justified in the following way. Let $D_1$ be a disk in $\mathbb{R}^2$ and $D_2$ a subset of $\mathbb{R}^2$ diffeomorphic to $D_1$. If $D_1$ and $D_2$ have the same area then there is a symplectic diffeomorphism $\varphi : D_1 \to D_2$. This is due to J. Moser. Hence in 2 dimensions, "volume preserving" and "symplectic" are adjectives that can be used interchangeably. Therefore in 2 dimensions what distinguishes symplectic manifolds from each other is their total volume. The question that arose is how much of all these statements can be carried over to higher dimensions. A step toward answering this question is the Gromov (-Eliashberg) alternative [82]: the group of symplectic diffeomorphisms of a 2n-dimensional connected symplectic manifold $(M, \omega)$ is either $C^0$-closed in the group of all diffeomorphisms of M, or its $C^0$-closure is the group of volume preserving diffeomorphisms of $(M, \omega)$. The fact that the former of these two alternatives is what actually occurs was proved in the fundamental [83]. This result is also known as the "symplectic non-squeezing theorem" or even as "the principle of the symplectic camel": consider the standard symplectic space $(\mathbb{R}^{2n}, \omega)$ parametrized by $(x_1, \ldots, x_n, p_1, \ldots, p_n)$ where each $p_i$ is canonically conjugate to the corresponding $x_i$. Let $Z_i(R)$ indicate the cylinder of radius R based on the symplectic 2-plane $(x_i, p_i)$, namely $Z_i(R) = \{(x_1, \ldots, x_n, p_1, \ldots, p_n) \in \mathbb{R}^{2n} : x_i^2 + p_i^2 \leq R^2\}$ (72). Then, if there is a symplectic diffeomorphism $\varphi : \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ embedding the ball $B_r$ (33) into $Z_i(R)$, we must have $r \leq R$. It should be noticed here that the exact choice of the symplectic 2-plane does not matter. Someone could choose any cylinder based on another symplectic 2-plane $(x_j, p_j)$ and the result would still hold.
However this conclusion does not hold if the cylinder is based on an isotropic 2-plane $(x_i, x_j)$ or $(p_i, p_j)$, as a rescaling, leaving all other coordinates unaffected, given by $(x_i, x_j, p_i, p_j) \mapsto (\lambda x_i, \lambda x_j, \lambda^{-1} p_i, \lambda^{-1} p_j)$, $\lambda > 0$ (73) is a volume preserving transformation which is moreover symplectic and which, for $\lambda$ small enough, can still embed the ball $B_r$ into such a cylinder of radius $R < r$. In words, what the symplectic non-squeezing theorem states is that it is impossible to squeeze a ball inside a symplectic cylinder if the ball's radius is larger than the radius of the cross-section (base) of the cylinder. This rigidity does not apply to isotropic cylinders. The non-squeezing theorem can be seen as providing an obstruction to the existence of symplectic embeddings and it clearly shows that the "symplectic" and "volume preserving" classes of diffeomorphisms are not the same in dimension higher than 2, unlike the 2-dimensional case.
The non-squeezing theorem can be seen as a counterpart to Liouville's theorem (71), which states that the symplectic volume of phase space remains invariant under a Hamiltonian (more generally: divergence-free) vector field. Liouville's theorem allows for the arbitrary change of shape of any subset of phase space M under symplectic/canonical transformations generated by a Hamiltonian vector field. This arbitrary change of shape, and in particular the fact that the image of such a subset under canonical transformations can become arbitrarily "thin" in phase space, very much like "oil in water", is at the heart of Boltzmann's explanation of the macroscopic time-irreversibility of physical processes in the face of their time-reversible microscopic dynamics [1,2,4,84]. The symplectic non-squeezing theorem states that such an arbitrary change of shape of a given phase space volume is simply not possible under canonical transformations and explicitly provides a limitation. Hence in higher dimensions "symplectic" and "volume-preserving" are quite different terms. It is currently unknown what the effect(s), if any, of the non-squeezing theorem would be on the description of a Hamiltonian system of many degrees of freedom. Would this provide some constraints on applying Boltzmann's irreversibility argument with macroscopic consequences, or would the presence of the many degrees of freedom "wash out" such "small-scale" features, as the Central Limit Theorem does for independently distributed random variables? In many ways, Boltzmann's irreversibility argument lies on the solid ground of Katok's lemma [85,65] which goes as follows: Let $U_1$, $U_2$ be two bounded domains of equal volume in $\mathbb{R}^{2n}$, both of which are diffeomorphic to the ball $B(r)$. Indicate by $A \triangle B$ the symmetric difference between the sets A and B. Then for every $\epsilon > 0$ there is a Hamiltonian H and an evolution parameter ("time") t, so that $\mathrm{Vol}(\varphi_t(U_1) \triangle U_2) < \epsilon$ (74) where $\varphi_t$ indicates the phase space flow generated by H calculated at time t.
Due to this lemma, indeed any subset of M can "turn and twist" and become thin, overall, under canonical transformations. What cannot happen however, according to the symplectic non-squeezing theorem, is for the projections of such a shape, as it evolves, along the symplectic 2-planes to become thinner than the original ones. As stated above, it is still unknown what the effects of such a limitation on the projections of the initial shape along the canonical 2-planes are. From the viewpoint of the present work, one can claim that the concept of entropy, for Hamiltonian systems of many degrees of freedom, can be seen as a manifestation of this underlying symplectic rigidity described by the symplectic non-squeezing theorem.
It may be worth noticing that, during the 30 years that have elapsed since the formulation and first proof of the non-squeezing theorem, several proofs different from the original one have also appeared in the literature, none of which is elementary or even relatively simple. In the face of this, and in order to get a better feeling for why this theorem is true, one may wish to be less ambitious and try to present a relatively simple proof of the non-squeezing theorem in the case of linear symplectic diffeomorphisms as indicated, for instance, in [86]. The intuitive advantage of such an approach is that such a proof involves concepts more familiar to physicists. We find it strange that 30 years have passed since the proof of the non-squeezing theorem, but its significance has not been widely appreciated in Physics. The most notable exception, in our opinion, is the work of M. de Gosson and his collaborators, who have been consistently emphasizing the interpretation and implications of the non-squeezing theorem, mainly for the physical cases of systems lying on the interface between Classical and Quantum Physics [86,87,88,89,90,91,92,93,94,95]. In this work, we rely considerably on the concept of "quantum blob" [89,86,93,95] which will be presented shortly.
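The contrast between isotropic and symplectic cylinders can be made concrete with diagonal linear maps of $\mathbb{R}^4$. The sketch below is an illustration under stated assumptions: coordinates are ordered as $(x_1, x_2, p_1, p_2)$, only maps of the form $\mathrm{diag}(d)$ are considered, and the helper names are hypothetical. For such maps the image of a ball of radius r projects onto a coordinate 2-plane as an ellipse with semi-axes $r|d_i|$, $r|d_j|$.

```python
import math


def is_symplectic_diagonal(d, tol=1e-12):
    """diag(d), in coordinates (x_1..x_n, p_1..p_n), is symplectic exactly
    when d[i] * d[n + i] = 1 for every conjugate pair."""
    n = len(d) // 2
    return all(abs(d[i] * d[n + i] - 1.0) < tol for i in range(n))


def volume_factor(d):
    """Determinant of diag(d)."""
    out = 1.0
    for x in d:
        out *= x
    return out


def shadow_radius(d, r, plane):
    """Radius of the smallest centred disk containing the projection, onto the
    coordinate 2-plane `plane`, of the image of the ball of radius r."""
    i, j = plane
    return r * max(abs(d[i]), abs(d[j]))


def shadow_area(d, r, plane):
    """Area of that projection (an ellipse with semi-axes r|d_i|, r|d_j|)."""
    i, j = plane
    return math.pi * r * r * abs(d[i] * d[j])


lam, r = 0.1, 1.0
# Rescaling of the type discussed above: squeezes the isotropic plane (x_1, x_2).
isotropic_squeeze = [lam, lam, 1 / lam, 1 / lam]
# Volume preserving, but squeezes the conjugate plane (x_1, p_1).
conjugate_squeeze = [lam, 1 / lam, lam, 1 / lam]
```

The first map is symplectic and places the unit ball inside an isotropic cylinder of radius 0.1, while its shadow on the conjugate plane $(x_1, p_1)$ keeps the area $\pi r^2$. The second map places the ball inside a symplectic cylinder of radius 0.1, and accordingly fails the symplectic condition, as the non-squeezing theorem requires.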

About symplectic capacities.
As it befits a fundamental work, the contribution of the non-squeezing theorem was profound in actually establishing symplectic geometry/topology as a distinct field of mathematics rather than an interesting, but mostly, afterthought. The work of Gromov [83] is also credited for developing concepts, such as the J-holomorphic curves that have proved to be enormously influential in a variety of contexts, some of which have significant overlap with developments in string/brane theories [96]. For our purposes, the significance of the non-squeezing theorem lies in that it provides an explicit construction for a class of (global) symplectic invariants called "symplectic capacities" [97,98,100,73,77] whose definition is the following: consider the class of symplectic manifolds (M, ω), possibly with a boundary, of dimension 2n. A symplectic capacity is a map c : (M, ω) → R + ∪ +∞ with c having the following properties • Conformality: • Normalization: where B r is the radius r ball and Z r is the cylinder of radius base r lying on a symplectic 2-plane, both in R 2n endowed with its standard symplectic structure (64).
It should be noticed that the conformality condition (76) can also be expressed, for U ⊂ M and a fixed symplectic structure, as • Conformality: The normalisation condition (77) can also be relaxed by just requiring • Weak normalization: It is a non-trivial fact that the symplectic capacities are invariant under symplectic diffemorphisms. The converse is partly true: a differentiable map ψ not necessarily invertible, of (R 2n , ω) that leaves the capacities invariant is either symplectic or anti-symplectic, namely it satisfies The existence of symplectic capacities is not obvious at all, but it is guaranteed by the validity of the symplectic non-squeezing theorem. Conversely, the existence of symplectic capacities implies the non-squeezing theorem. Obviously but for a symplectic cylinder Z r , which is also an unbounded set in R 2n , its symplectic capacity is bounded and given by (77). By contrast, for a cylinderZ r based on an isotropic 2-plane of R 2n the symplectic structure vanishes (by definition), therefore Based on the results of the non-squeezing theorem one can define the lower Gromov width (or just "Gromov width", or symplectic radius) of a subset U ⊂ R 2n by which means that the lower Gromov width is the maximum radius for which a ball of such a radius can be symplectically embedded through via ϕ into U. Hence, if c min (U) = r 0 , then B r cannot be symplectically embedded in U, if r > r 0 . By analogy, the upper Gromov width (or cylindrical capacity) of V ⊂ R 2n is defined as where Z R is a symplectic cylinder of base radius R lying on any symplectic 2-plane of R 2n . This also means that the smallest radius of the symplectic cylinder inside which V can be symplectically embedded via the symplectic map ψ is R and it is impossible to find a symplectic embedding of V to a symplectic cylinder with a smaller radius than R of its base. 
Given these two definitions, and based on the non-squeezing theorem, one sees that the lower Gromov width is the smallest possible capacity and the upper Gromov width is the largest possible. Moreover, one can check that any convex combination of them is also a symplectic capacity. Hence there is an infinity of symplectic capacities on $\mathbb{R}^{2n}$. Despite this fact, constructing such capacities explicitly has proved to be a substantial challenge: to this date several such capacities have been constructed, such as those in [83,97,98,99,100,80,101], none of which is easy to construct or to prove that it indeed obeys the axioms (75)-(77) or (75), (76) and (79). Without going into any details, we can indicatively mention that the Hofer-Zehnder capacity for compact, convex $U \subset \mathbb{R}^{2n}$ takes the form of an integral along the shortest periodic orbit $\gamma_0$ on the boundary of $U$ of the first Poincaré invariant $p \, dq$ familiar from Hamiltonian mechanics, namely $c_{HZ}(U) = \oint_{\gamma_0} p_i \, dq^i$, where the summation convention for $i = 1, \ldots, n$ has been assumed.

Symplectic capacities of ellipsoids and the uncertainty principle.
Calculating explicitly the symplectic capacities of manifolds, or even of subsets of $\mathbb{R}^{2n}$, has proved to be a difficult task. Among the very few cases for which an answer is known, the ellipsoids in $\mathbb{R}^{2n}$ stand out, because all symplectic capacities have the same value on them. This is straightforward to see based on the symplectic non-squeezing theorem. Suppose that we have a real ellipsoid, the smallest two axes of which are of equal length $l$ and span a symplectic 2-plane. Then a ball of radius $l$ can be barely embedded in the ellipsoid, and the ellipsoid itself barely fits in a symplectic cylinder of base radius $l$. Hence the upper and lower Gromov widths of this ellipsoid are both equal to $\pi l^2$, and since every symplectic capacity lies between these two, all its symplectic capacities are equal to $\pi l^2$.
This argument can be made more concrete as follows [86]. Let $A$ be a $2n \times 2n$ real positive-definite matrix, and $J$ be as in (56). The eigenvalues of the matrix $JA$ have the form $\pm i \lambda_k$ with $\lambda_k > 0$ and are called the symplectic eigenvalues of $A$. The set $\{ \lambda_k, \ k = 1, \ldots, n \}$ is called the symplectic spectrum of $A$. Williamson's theorem states that there is an element $B$ of the symplectic group $Sp(2n, \mathbb{R})$ such that $B^T A B = \mathrm{diag}(\Lambda_n, \Lambda_n)$, where the superscript $T$ stands for transposition and $\Lambda_n = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$; the symplectic spectrum is uniquely determined by $A$, up to ordering. Parametrize $\mathbb{R}^{2n}$ by the row vector $z = (z_i)$, $i = 1, \ldots, 2n$ and consider the ellipsoid $E = \{ z \in \mathbb{R}^{2n} : z A z^T \le 1 \}$ (88). Then for any symplectic capacity $c$ one finds that $c(E) = \pi / \lambda_{\max}$, where $\lambda_{\max} = \max \{ \lambda_1, \ldots, \lambda_n \}$. Hence, from a Hamiltonian/symplectic viewpoint, the most appropriate/natural cells into which one should divide the phase space during coarse-graining are the ellipsoids (88).
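The computation of the symplectic spectrum and of the resulting capacity of an ellipsoid can be illustrated numerically. The following is a minimal sketch, not taken from [86]: the block form of $J$ in the ordering $(q_1, \ldots, q_n, p_1, \ldots, p_n)$ is an assumed stand-in for (56), and the function names `standard_J`, `symplectic_spectrum` and `ellipsoid_capacity` are hypothetical.

```python
import numpy as np

def standard_J(n):
    """Standard symplectic matrix on R^{2n} in the ordering (q_1..q_n, p_1..p_n).
    Assumption: this block convention stands in for the paper's eq. (56)."""
    I = np.eye(n)
    Z = np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def symplectic_spectrum(A):
    """Symplectic eigenvalues lambda_1 <= ... <= lambda_n of a real symmetric
    positive-definite 2n x 2n matrix A, read off from the eigenvalues of J A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0] // 2
    eig = np.linalg.eigvals(standard_J(n) @ A)  # eigenvalues come as +/- i*lambda_k
    lam = np.sort(np.abs(eig.imag))             # each lambda_k now appears twice
    return lam[::2]                             # keep one copy of each

def ellipsoid_capacity(A):
    """Capacity pi / lambda_max of the ellipsoid {z : z A z^T <= 1}."""
    return np.pi / symplectic_spectrum(A).max()

# Ellipsoid (x1^2 + p1^2)/4 + (x2^2 + p2^2)/9 <= 1 in R^4
A = np.diag([1 / 4, 1 / 9, 1 / 4, 1 / 9])
print(symplectic_spectrum(A), ellipsoid_capacity(A) / np.pi)
```

In this example the smallest symplectic "disc" of the ellipsoid has radius 2, so the capacity is $4\pi$, in agreement with the informal argument based on the two smallest axes.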
It is a non-trivial fact [86], on which we will not elaborate, that the Heisenberg uncertainty principle, or its generalisation, the Robertson-Schrödinger inequality $(\Delta x_i)^2 (\Delta p_i)^2 \ge \Delta(x_i, p_i)^2 + \frac{1}{4} \hbar^2$, where $\Delta(x_i, p_i)$ stands for the covariance matrix element, can be re-cast in terms of the symplectic capacities as $c(W) \ge \frac{1}{2} h$, where $W$ is the Wigner ellipsoid associated to the covariances and $c$ a symplectic capacity. This falls within the framework of the Wigner-Weyl approach to quantisation which is extensively used in the more mathematically rigorous treatments and in those of the quantum/classical interface. In more familiar terms, one can see the content of the uncertainty principle as being expressed by the dictum that a function and its Fourier transform cannot be simultaneously sharply localized [102,103]. Since we know that coherent states (Gaussians) represent minimum uncertainty states, and that they are mapped into Gaussians under the Fourier transform, let us assume that a wave-function $\psi : \mathbb{R}^n \to \mathbb{C}$ and its Fourier transform $F[\psi]$ are bounded, in modulus, by such Gaussians, i.e. $|\psi(x)| \le C e^{-\frac{1}{2\hbar} A x^2}$ and $|F[\psi](p)| \le C e^{-\frac{1}{2\hbar} B p^2}$,
where $C$ is a constant and $A$, $B$ are real symmetric matrices. Then consider the phase-space ellipsoid $E : A x^2 + B p^2 \le \hbar$. The Robertson-Schrödinger uncertainty principle can be expressed via a symplectic capacity $c$ by $c(E) \ge \frac{1}{2} h$. These conditions imply that, from a symplectic viewpoint, the appropriate choice of cell that should be used for phase-space coarse-graining is an ellipsoid rather than a cube. This is also compatible with, if not necessarily dictated by, Quantum Mechanics and provides a justification, in part, for our approach and the interpretations presented in the sequel.

Symplectic vs Riemannian features.
It should be noticed that the symplectic capacities for a $2n$-dimensional manifold $M$ are genuinely new invariants that cannot be deduced from its Riemannian volume $\mathrm{vol}\, M$ by setting, for instance, $c(M) = (\mathrm{vol}\, M)^{1/n}$ (95), as this would violate (77) for a symplectic cylinder, whose volume is infinite. On the other hand, we see that if the underlying manifold $M$ is compact, then it will have a finite volume, hence $c(M) < +\infty$, which is true for all compact manifolds. In the special case of 2 (real) dimensions, it is known that all symplectic capacities coincide with the area, as long as the manifold $M$ is connected and simply connected [104,105]. Hence in two dimensions the symplectic and volume-preserving geometries coincide. But two dimensions are special in symplectic geometry. There is a suspicion / conjecture that this statement ("all symplectic capacities coincide") may be partly true for convex bodies in $\mathbb{R}^{2n}$, but there seems to be neither a proof nor a counterexample that would resolve it. A fundamental, generally still unresolved, question is whether there exist intermediate symplectic invariants [80]. The simple-sounding question about finding the necessary and sufficient conditions for an ellipsoid to be symplectically embeddable in another is still generally unresolved; an answer became known only recently in 4 dimensions [81].
At this stage, there is an obvious question about the possible relation between Riemannian and symplectic invariants of a manifold $M$. As noticed previously, no such obvious relation exists for a power of the volume (95). But the capacities $c$ are a symplectic way of measuring the size of a manifold. On top of that, in the phase space $M$ of a Hamiltonian system there is a Riemannian metric $g$, which is usually deduced from the quadratic term ("kinetic term") of the Hamiltonian. Therefore, one may wish to compare the areas of (real) 2-dimensional sub-manifolds $V \subset M$ expressed via their embeddings $\varphi : V \to M$. Such an area is computed symplectically via the pullback of the symplectic form, $A_\omega(V) = \int_V \varphi^* \omega$ (97), or in a Riemannian manner by the area formula [106] $A_g(V) = \int_V d\mathrm{vol}_{\varphi^* g}$ (98), where we use the volume on $V$ induced by the pullback metric given by the embedding $\varphi$. It turns out that (97), (98) are equal for pseudo-holomorphic curves $V$ [83,107], which come about due to the extension of (58), (59) to almost complex target manifolds $M$, if endowed with a tame symplectic structure $\omega$. Such pseudo-holomorphic curves turn out to be minimal surfaces in the Riemannian sense, hence they can be considered as the analogue of geodesics in symplectic geometry. As strings move in space-time by "sweeping out" surfaces that should be of stationary ("minimal") area with respect to variations of the area functional, according to the principle of "least"/stationary action, the pseudo-holomorphic curves have been of great interest to string theory for the last two decades.
Not much is generally known about the relation between symplectic and Riemannian invariants of manifolds. A prominent role in such a relation is played by the conjecture of C. Viterbo [108], which is formulated for convex domains in $\mathbb{R}^{2n}$. Let $U$ be such a convex domain and $B_1$ be the Euclidean unit radius ball in $\mathbb{R}^{2n}$ (33). For any symplectic capacity $c$ and any such $U$, the conjecture states that $\frac{c(U)}{c(B_1)} \le \left( \frac{\mathrm{vol}\, U}{\mathrm{vol}\, B_1} \right)^{1/n}$ (99), where vol denotes the Riemannian volume of the corresponding sets. One immediately sees that this conjecture follows the same philosophy as the failed attempt to relate the volume and the symplectic capacity (95). The big difference between (95) and (99) is that (99) makes a statement similar in spirit but expressed in relative terms, i.e. relative to the corresponding quantities for a ball. The meaning of this symplectic isoperimetric conjecture is that among all convex domains $U$ in $\mathbb{R}^{2n}$ with a given volume, the Euclidean ball has the maximal symplectic capacity.
Such a statement has a clearly isoperimetric flavour ("why is a droplet/bubble spherical?" or, among all shapes having a fixed volume, determine the one(s) with the least boundary/surface area). Viterbo's conjecture (99) is not fully proved yet, although weaker versions of it have been proved, which rely on inserting a constant $A(n)$ on the right hand side of (99), namely $\frac{c(U)}{c(B_1)} \le A(n) \left( \frac{\mathrm{vol}\, U}{\mathrm{vol}\, B_1} \right)^{1/n}$ (100). Initially C. Viterbo proved (100) [108] for $A(n) \sim n$, in particular $A(n) \sim 2n$ for convex domains symmetric with respect to the origin, and $A(n) \sim 32n$ for general convex domains. After that, [109] improved the estimate to $A(n) \sim (\log n)^2$. The best known estimate, to our knowledge, was furnished by [110] with $A(n)$ becoming an actual constant, i.e. independent of the dimension $n$ altogether, namely $A(n) = A_0$. It should be noticed that the assumption of $U$ being convex is essential: indeed, star-shaped domains were constructed in [111] having an arbitrarily small volume but fixed capacity, which violates (99). The conjecture itself is true for ellipsoids and convex Reinhardt domains [111]. In the exact opposite direction of Viterbo's conjecture, namely in finding the worst possible ratios of symplectic capacity to volume, one can see that the symplectic cylinders are candidates, since this ratio is zero in their case.
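That the conjectured inequality holds for ellipsoids can be checked directly from the formula for their capacity. The sketch below assumes an ellipsoid in normal form, $\sum_k (x_k^2 + p_k^2)/a_k^2 \le 1$ in $\mathbb{R}^{2n}$, for which the capacity is $\pi \min_k a_k^2$ and the volume is $\pi^n \prod_k a_k^2 / n!$; the function name `viterbo_sides` is hypothetical.

```python
import math

def viterbo_sides(radii):
    """For the ellipsoid sum_k (x_k^2 + p_k^2)/a_k^2 <= 1 in R^{2n}, return
    (c(E)/c(B_1), (vol E / vol B_1)^(1/n)), the two sides of the conjectured
    inequality. Assumes the ellipsoid is already in this normal form."""
    squares = [a * a for a in radii]
    n = len(squares)
    capacity_ratio = min(squares)  # pi * min(a_k^2) divided by pi
    # (prod a_k^2)^(1/n): the factors pi^n / n! cancel in the volume ratio
    volume_ratio_root = math.exp(sum(math.log(s) for s in squares) / n)
    return capacity_ratio, volume_ratio_root

print(viterbo_sides([1.0, 2.0, 3.0]))  # left side is the smaller one
print(viterbo_sides([2.0, 2.0, 2.0]))  # equality for a ball
```

The left side is a minimum and the right side a geometric mean of the same numbers $a_k^2$, so the inequality holds, with equality exactly when the ellipsoid is a ball.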
We see from the preceding analysis that ellipsoids are in some sense minimal and still are invariant under symplectic diffeomorphisms. Hence, from a purely Hamiltonian/symplectic viewpoint it makes more sense to use as cells in coarse-graining the phase space M ellipsoids rather than cubes. This is in sharp contrast to the Euclidean/Riemannian viewpoint of the previous Section for which, as we saw, cubical cells appear to be the most appropriate for such a coarse-graining process. Quantifying aspects of this mismatch between ellipsoids/spheres and cubes (or polytopes/convex bodies) to which we will ascribe the origin of entropy, is the topic of the next Section.

Basic concepts and implications of convexity.
In this Section we will be using convexity exclusively in $\mathbb{R}^n$. This is not as big a compromise as it may appear, since all manifolds, symplectic or not, are locally isometric to $\mathbb{R}^n$ up to first order deviations. Curvature appears as a second order deviation from the Euclidean metric of $\mathbb{R}^n$. Hence, if one focuses on local properties of a manifold, understanding convexity of subsets of $\mathbb{R}^n$ already accomplishes quite a bit. Moreover, as was mentioned above, Nash's embedding theorems show that a generic manifold is not "too different" from a Euclidean space since it can be isometrically embedded in a Euclidean space of high enough dimension. Working in $\mathbb{R}^n$ allows us to use its linear structure to arrive at results that would not otherwise be accessible. Since we are working with Hamiltonian systems of many degrees of freedom, we should be able to eventually consider the thermodynamic limit. Even though taking such a limit can be a non-trivial, model dependent, process, it may not be unreasonable to assume, naively, that it is related to the limit $n \to \infty$. Hence we are interested in the behaviour of convex subsets of $\mathbb{R}^n$, for $n$ very large. This is the realm of the "local theory of Banach spaces", or "asymptotic geometric analysis", or "asymptotic convex geometry". It turns out that there are highly non-trivial and unexpected / counter-intuitive results in this realm (such as the "concentration of measure"), which lies somewhere between linear algebra (for $n$ fixed and finite) and functional analysis (where one deals with infinite dimensional spaces). We would like to know about geometric structures that are, in a sense, "typical", so they can encode results of importance for Statistical Mechanics. The field of asymptotic convex geometry is highly developed.
For the very limited purposes of this work, we have drawn, to various degrees, material from the books [112,113,114,115,116,117,118] and the reviews [119,120,121,122], to which one can turn, as well as to the numerous other outstanding references, for details, proofs and authoritative, comprehensive expositions of these topics. In the sequel, we will confine ourselves to concepts and results needed in the present work.

Convex bodies, polar and functional dualities.
A convex set $K \subset \mathbb{R}^n$ is a set such that $t x + (1 - t) y \in K$ for all $x, y \in K$ and all $t \in [0, 1]$. A convex body is a compact, convex subset of $\mathbb{R}^n$ having a non-empty interior. Hence balls, ellipsoids, cubes, polytopes etc. are convex bodies. A fundamental theorem in convexity is the Hahn-Banach separation theorem, which implies that each convex body $K$ is an intersection of half-spaces and that at each point of the boundary $\partial K$ of such a convex body there is at least one supporting hyperplane. Intuitively, this should be obvious. We will be mostly interested in convex bodies that are symmetric with respect to the origin of $\mathbb{R}^n$. A convex body $K$ is symmetric with respect to the origin of $\mathbb{R}^n$ if $x \in K$ implies $-x \in K$. Symmetric convex bodies are of great interest for the following reason: consider a finite dimensional normed space $X$. Then it is possible to choose a bijection $X \to \mathbb{R}^n$ such that $X$ can be identified with $(\mathbb{R}^n, \| \cdot \|)$ for some norm $\| \cdot \|$. What we have in mind is either the usual Euclidean norm, or a "generalised" norm induced by generalized products such as (12) or (21). The unit balls $B_1(X)$ of such norms, centered at the origin, are convex bodies symmetric with respect to the origin. Conversely, to any symmetric convex body $K$ one can assign canonically a corresponding (Minkowski) norm $\| \cdot \|_K$ given by $\| x \|_K = \inf \{ t > 0 : x \in tK \}$, whose unit ball is $B_1(X) = K$. As examples, with the notation of (33), one can see that $B_1(l^n_2)$ is the Euclidean ball and $B_1(l^n_\infty)$ is the usual cube (35) whose edge is the interval $[-1, +1]$ of length 2 units, so it is symmetric with respect to the origin. Another example, needed for future reference, is the $n$-cross-polytope, which is defined as the convex hull (97) of the endpoints of the vectors $\pm e_i$, $i = 1, \ldots, n$, where the $e_i$ are the orthonormal coordinate vectors of the Cartesian basis of $\mathbb{R}^n$.
The metric having the cross-polytope as its unit ball is called the Manhattan/taxicab metric, or more concretely, the cross-polytope is $B_1(l^n_1)$. The significance of the cross-polytope, for our purposes, stems from the fact that it is the polar set of the unit cube $B_1(l^n_\infty)$. Consider a convex set $K \subset \mathbb{R}^n$. Then its polar set $K^\circ$ is defined by $K^\circ = \{ y \in \mathbb{R}^n : \langle x, y \rangle \le 1, \ \forall \, x \in K \}$ (104), where $\langle \cdot, \cdot \rangle$ is the Euclidean inner product on $\mathbb{R}^n$. What polarity does is to exchange the faces with the vertices. By inspection, a cube in $\mathbb{R}^n$ has $2n$ faces and $2^n$ vertices and the converse is true for the $n$-cross-polytope. Upon a more careful examination, one can see that the polar of an $n$-cube is the $n$-cross-polytope.
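The polar relation between the cube and the cross-polytope can be verified numerically in low dimensions. The following is a minimal sketch under the assumption that the polar set is $\{ y : \langle x, y \rangle \le 1 \ \forall x \in K \}$; since a linear function attains its maximum over a polytope at a vertex, it is enough to test the $2^n$ vertices of the cube. All function names are hypothetical.

```python
import itertools
import numpy as np

def cube_vertices(n):
    """All 2^n vertices of the cube [-1, 1]^n."""
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))

def cross_polytope_vertices(n):
    """The 2n vertices +/- e_i of the n-cross-polytope."""
    I = np.eye(n)
    return np.vstack([I, -I])

def support_function_of_cube(y):
    """max over x in the cube of <x, y>, evaluated on the vertices only."""
    y = np.asarray(y, dtype=float)
    return float(np.max(cube_vertices(len(y)) @ y))

def in_polar_of_cube(y, tol=1e-12):
    """y lies in the polar set iff <x, y> <= 1 for every x in the cube."""
    return support_function_of_cube(y) <= 1.0 + tol

rng = np.random.default_rng(0)
y = rng.normal(size=5)
# The support function of the cube is the l_1 norm, so the polar is B_1(l_1^n)
print(support_function_of_cube(y), np.sum(np.abs(y)))
print(len(cube_vertices(5)), len(cross_polytope_vertices(5)))  # 2^n versus 2n
```

The membership test reduces to $\| y \|_1 \le 1$, which is the statement that the polar of $B_1(l^n_\infty)$ is $B_1(l^n_1)$.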
There is an equivalent functional way of expressing the above polar facts, mainly because $\mathbb{R}^n$ is a linear space. Let $X$, $Y$ be Banach spaces, whose corresponding norms, even though different from each other, will still be indicated by $\| \cdot \|$ for simplicity of notation. Consider a linear operator $T : X \to Y$. Such an operator is bounded if there is some constant $C > 0$ such that $\| Tx \| \le C \| x \|$ for all $x \in X$. Then the infimum of such numbers is the operator norm of $T$, indicated by $\| T \|$, so it is defined by $\| T \| = \sup_{x \ne 0} \| Tx \| / \| x \|$. In the case that $Y = \mathbb{R}$, $T$ is called a linear functional on $X$. The space of continuous linear functionals on $X$ endowed with the operator norm is called the dual space of $X$ and it is indicated by $X^*$. The dual space of a Banach space turns out to be a Banach space too. As is well-known, such a $T$ is an isomorphism if it is a bijection and in addition both $T$ and $T^{-1}$ are bounded. Such an isomorphism is an isometry if $\| Tx \| = \| x \|$ for all $x \in X$. According to the Riesz representation theorem, every element of $X^*$ has the form $x \mapsto \langle x, b \rangle$ for some $b \in \mathbb{R}^n$. The unit ball of $X^*$ is therefore $B_1(X^*) = \{ b \in \mathbb{R}^n : | \langle x, b \rangle | \le 1, \ \forall \, x \in B_1(X) \}$ (108), which due to (103) can be rewritten as $B_1(X^*) = (B_1(X))^\circ$. Hence the convex concept of polarity of symmetric convex bodies in $\mathbb{R}^n$ corresponds exactly to the functional analytic concept of duality of $n$-dimensional normed spaces. Given the well-known duality $(l^n_p)^* = l^n_q$ with $\frac{1}{p} + \frac{1}{q} = 1$, we see that (108) is the generalisation of the polar relation indicated in the discussion preceding (104).
We would like to compare, quantitatively, the coarse-graining of phase space by cubes (and their images under generalised operations induced by non-additive entropies, which are convex polytopes) and by the balls/ellipsoids that should be the cells of coarse-graining of phase space as dictated by its symplectic structure. This can be accomplished by introducing a distance $d$ in the set of all convex bodies. To define such a distance consider two symmetric convex bodies $K_1$ and $K_2$ of $\mathbb{R}^n$. Then such a distance $d$ is given by $d(K_1, K_2) = \inf \{ \delta \ge 1 : \lambda K_1 \subseteq K_2 \subseteq \delta \lambda K_1 \ \mathrm{for \ some} \ \lambda > 0 \}$ (111). What this essentially states is that the distance between $K_1$, $K_2$ is the smallest number $\delta$ such that if a dilated copy of $K_1$ can be barely inscribed in $K_2$, then $\delta$ times that copy can be barely circumscribed around $K_2$, and vice versa, with the roles of $K_1$, $K_2$ interchanged. This definition relies on dilations of the symmetric convex bodies $K_1$, $K_2$ and therefore it is, not surprisingly, multiplicative. To get an additive distance, one should take the logarithm of such $d$. Moreover, since physical theories use metric concepts extensively, it may be worthwhile comparing (111) with the well-known Hausdorff distance as well as with the numerous other distance functions that one can come up with, many of which are discussed in [71].
Consider the $n$-dimensional spaces $X_1$, $X_2$ whose unit balls are $K_1$, $K_2$ respectively, as explained in the previous subsection. Then, following (111), their Banach-Mazur distance is defined by $d_{BM}(X_1, X_2) = \inf \{ d(K_1, T(K_2)) : T \in GL(n, \mathbb{R}) \}$ (112), where $T$ is an operator which is an element of the general linear (Lie) group $GL(n, \mathbb{R})$. Essentially what $T$ does is to move around "rigidly" $K_1$, $K_2$ until their mutual distance (111) becomes as small as possible, i.e. one of them "barely fitting" inside and outside the other, and vice versa.
This can be expressed more succinctly by stating that $d$ is the smallest number such that $K_1 \subseteq T(K_2) \subseteq d K_1$ for some $T \in GL(n, \mathbb{R})$. Since, as seen in the previous subsection, there is an intimate link between convex geometry in $\mathbb{R}^n$ and functional analysis, one can further translate the Banach-Mazur distance (112) into the functional analytic language as $d_{BM}(X_1, X_2) = \inf \{ \| T \| \, \| T^{-1} \| : T : X_1 \to X_2 \ \mathrm{is \ an \ isomorphism} \}$, which is a frequently encountered form of the Banach-Mazur distance between normed spaces.

5.3 The Banach-Mazur distance between a sphere and a cube.
Consider now the cube $I^n = [-1, +1]^n$ in $\mathbb{R}^n$. It is easy to see that one can inscribe in it a ball of radius 1 and can circumscribe around it a ball of radius $\sqrt{n}$, for any $n$, and one cannot do better than that. Hence, it is intuitively obvious that the Banach-Mazur distance between a cube and a ball is $\sqrt{n}$. Therefore as the dimension increases the cube looks less and less like a ball: the vertices of the cube "move" further and further away from the centre of the ball, assumed fixed, or conversely the ball "curves more" as $n$ increases and therefore it becomes smaller and smaller inside a fixed cube. To make this a bit more precise, one can start from the well-known formula for the volume of the Euclidean unit-radius ball in $\mathbb{R}^n$, $\mathrm{vol}\, B_1 = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}$ (114), where $\Gamma$ stands for the (Euler) gamma function. Using Stirling's approximation provides, to leading exponential order, the estimate $\mathrm{vol}\, B_1 \approx \left( \frac{2 \pi e}{n} \right)^{n/2}$ (115). Therefore one can see that, due to the curvature of the surface of the ball, the ball of unit radius has a volume that approaches zero very fast as $n \to \infty$. This is in sharp contrast with the case of the cube $I^n$, whose volume $2^n$ increases as $n \to \infty$. Hence it is expected that the distance between the cube and the ball of unit radius will increase as $n$ increases. Not only that, but such a distance is as large as possible for the ball and the cube, a direct outcome of F. John's theorem.
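The contrast between the two volumes is easy to tabulate. The sketch below evaluates the exact gamma-function formula and a Stirling estimate that, as an assumption of this illustration, also keeps the sub-leading factor $1/\sqrt{\pi n}$; the function names are hypothetical.

```python
import math

def unit_ball_volume(n):
    """Exact volume pi^(n/2) / Gamma(n/2 + 1) of the unit ball in R^n,
    computed through logarithms to avoid overflow for large n."""
    return math.exp(0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0))

def unit_ball_volume_stirling(n):
    """Stirling estimate (2*pi*e/n)^(n/2) / sqrt(pi*n); the square-root
    prefactor is the first sub-leading correction to the exponential term."""
    return (2.0 * math.pi * math.e / n) ** (0.5 * n) / math.sqrt(math.pi * n)

def cube_volume(n):
    """Volume of the cube [-1, 1]^n."""
    return 2.0 ** n

for n in (2, 10, 50, 100):
    print(n, unit_ball_volume(n), unit_ball_volume_stirling(n), cube_volume(n))
```

Already for moderate $n$ the unit ball occupies a negligible fraction of the cube in which it is inscribed, while the radius of the circumscribed ball grows only like $\sqrt{n}$.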
Based on the above and from a metric viewpoint, as was also previously remarked, coarse-graining the phase space $M$ with balls instead of cubes (or vice versa) should provide the greatest possible discrepancy, greater than that obtained by coarse-graining $M$ with any other regular polytopes.
But entropy, and this is probably clearest in the definition of the Kolmogorov-Sinai case [65], involves considering the supremum over all possible phase space partitions. Since coarse-graining involves a piece-wise (for each cell) constant probability density ρ cg , phase space measures are constant multiples, per cell, of their phase space volumes. As a result, substituting small enough cubic cells for balls (and vice versa) in coarse-graining will provide the maximal possible measure discrepancy, namely the maximal possible phase-space volume loss, which is what S BGS has been designed to capture.
It should be noticed that the polar duality is of fundamental importance in asymptotic convex geometry, hence it may be desirable to compare pairs of polar dual polytopes rather than single convex bodies. In this respect the Euclidean ball is unique in that it is the only convex body that is self-dual under polarity. If it is pairs of polar dual polytopes that are of fundamental interest, then one should also consider the Banach-Mazur distance between $B_1(l^n_2)$ and the cross-polytope $B_1(l^n_1)$. It should also be intuitively obvious that such a distance is $\sqrt{n}$ and this is as bad as things can get. Intuitively the cross-polytope and the cube are the "pointiest" of all convex bodies, hence their distance from the "roundest" of all of them, which is the ball, should be maximal.
For our purposes the ball, or more generally ellipsoids, are a manifestation of the symplectic structure of $\mathbb{R}^n$ and the cube encodes geometrically the concept of independence, as seen in the previous sections. Since, however, we would like to allow for generalised concepts of "independence" induced by entropies such as $\mathcal{S}_q$, $\mathcal{S}_\kappa$ etc., whose induced "cubes" are, in general, symmetric polytopes/convex bodies, what we would like to ask is what the Banach-Mazur distance between convex bodies and ellipsoids is. And since actually calculating the Banach-Mazur distance between convex bodies has proved to be quite hard, in general, one can settle either for upper bounds on that distance or for asymptotic estimates as $n \to \infty$ (the "thermodynamic limit"). That there is a unique ellipsoid of maximal volume that can barely fit inside a convex body is guaranteed by F. John's theorem, where conditions for this maximal ellipsoid to actually be the Euclidean ball are also spelled out.
Another result, intuitively plausible, if not obvious, is that one reason why a sphere and a cube are so different is that the cube has too few faces. What would happen if one allowed for a symmetric convex body with far more faces? Then the answer is that its distance from the sphere should decrease. And this is actually what happens, but the crucial matter, for our purposes, is that the increase in the number of faces has to be exponential in terms of the dimension. More precisely, consider a symmetric convex body $K \subseteq \mathbb{R}^n$ such that $d(K, B_1(l^n_2)) = d$. Then $K$ must have at least $\exp \left( \frac{n}{2 d^2} \right)$ faces. Applied to the central sections of a cube in $\mathbb{R}^m$, which has $2m$ faces, this bound shows that almost spherical sections can have dimension at most of order $\log m$; it turns out that a cube in $\mathbb{R}^m$ does have almost spherical sections of dimension of order $\log m$. A substantial generalisation of this fact, Dvoretzky's theorem, will be used in Section 6.
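The face-counting bound can be turned into a quick estimate of how large an almost spherical section of a cube may be. The sketch below only evaluates the necessary condition $\exp(k / 2d^2) \le 2m$ for a $k$-dimensional central section of the cube in $\mathbb{R}^m$, under the assumption that such a section has at most $2m$ faces; it says nothing about existence. The function names are hypothetical.

```python
import math

def min_faces(n, d):
    """Lower bound exp(n / (2 d^2)) on the number of faces of a symmetric
    convex polytope in R^n at distance d from the Euclidean ball."""
    return math.exp(n / (2.0 * d * d))

def max_section_dimension(m, d):
    """Largest k compatible with exp(k / (2 d^2)) <= 2m, i.e. an upper bound
    on the dimension of a central section of the cube in R^m that lies at
    distance d from a ball (a necessary condition only)."""
    return math.floor(2.0 * d * d * math.log(2.0 * m))

# The full cube in R^n is at distance sqrt(n): the bound is then only e^(1/2)
print(min_faces(100, math.sqrt(100)), 2 * 100)
# Sections at distance 1.1 from a ball, for cubes of growing dimension m
for m in (10**2, 10**4, 10**6):
    print(m, max_section_dimension(m, 1.1))
```

The admissible dimension grows only logarithmically with $m$, which is the scale that reappears in the Dvoretzky dimension of the cube.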
Having stated that the distance between a cube and a Euclidean ball is the maximum possible, we may turn to a desirable consequence already embedded in the definition of entropy [1,2,3]. Entropy, like any thermodynamic quantity, involves ignoring a lot of the details of the underlying dynamical processes, as noticed earlier in this subsection. All such details are contained in the phase space evolution of the underlying dynamical system. Hence skipping many of these details, for small enough cells in a coarse-graining of phase-space, is tantamount to glossing over or even ignoring a substantial volume of the cells, if we think that such an omission will not make, statistically, any substantial difference. Since $\mathcal{S}_{BGS}$ describes systems having a simple phase space evolution according to Birkhoff's ergodic theorem [65], so that the micro-canonical density $\rho$ is uniform on a constant energy hypersurface of the isolated system, we can simplify our considerations and use balls instead of cubes, as the distance between them is the largest possible, for coarse-graining. By doing that, we have to ignore most of the volume of these cubes, thus simplifying the description, without losing much, due to the uniformity of the micro-canonical measure and its proportionality to volume within each cell.
This procedure is effective due to the fact that the majority of the volume of the cube $I^n$ is close to its vertices, something that can be justified as follows: place the cube's centre of symmetry at the origin of $\mathbb{R}^n$ and express its volume in spherical coordinates. Let $\theta$ collectively express the angular coordinates, parametrising the unit sphere $S^{n-1}$, and let $r(\theta)$ be the radial coordinate of the boundary of $I^n$ in the direction $\theta$. Then $\mathrm{vol}(I^n) = \mathrm{vol}(B_1) \int_{S^{n-1}} r(\theta)^n \, d\Sigma$, where $d\Sigma$ expresses the infinitesimal area element of the unit sphere, normalised to have total mass 1. Since $I^n$ is a cube of side length 2, its volume is $2^n$. Therefore $\int_{S^{n-1}} r(\theta)^n \, d\Sigma = \frac{2^n}{\mathrm{vol}(B_1)}$, which, after using (112), gives approximately $\int_{S^{n-1}} r(\theta)^n \, d\Sigma \approx \left( \frac{2n}{\pi e} \right)^{n/2}$ (119). This corresponds to a cube of average radius about $\sqrt{2n / \pi e}$. Given that the distance between the vertices of the cube and its centre is $\sqrt{n}$, and that the centre of each face is 1 unit away from the cube's centre, we conclude from (119) that most of the volume of the cube is closer to its vertices than to the centres of its faces. Hence substituting the cubes $I^n$ by the Euclidean balls inscribed in them, which are of unit radius, omits most of the volume of the cube, which however is rather innocuous, as can be inferred from the above discussion.
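The "average radius" of the cube and the fraction of its volume captured by the inscribed ball can be evaluated directly. This is a minimal numerical sketch: the average radius is taken, as an assumption consistent with the estimate above, to be the $n$-th root of $\mathrm{vol}(I^n) / \mathrm{vol}(B_1)$, and the function names are hypothetical.

```python
import math

def log_unit_ball_volume(n):
    """log of pi^(n/2) / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)

def cube_average_radius(n):
    """n-th root of vol(I^n) / vol(B_1), with vol(I^n) = 2^n."""
    return math.exp(math.log(2.0) - log_unit_ball_volume(n) / n)

def inscribed_ball_fraction(n):
    """Fraction of the volume of [-1, 1]^n occupied by the inscribed unit ball."""
    return math.exp(log_unit_ball_volume(n) - n * math.log(2.0))

for n in (10, 100, 400):
    asymptotic = math.sqrt(2.0 * n / (math.pi * math.e))
    print(n, cube_average_radius(n), asymptotic, math.sqrt(n),
          inscribed_ball_fraction(n))
```

The average radius sits at a fixed fraction, roughly one half, of the distance $\sqrt{n}$ to the vertices, far beyond the unit distance to the face centres, while the inscribed ball carries a vanishing share of the volume.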
We should be very careful however, since such an argument may not necessarily apply for systems described by S q , S κ or any of the other non-additive entropies. Such systems will have more complicated phase space behaviour, potentially attracting sets and the like [123], which a regular measure ρ may not be able to adequately describe. As the case of the Sinai-Ruelle-Bowen [124] measures indicates, one may have to use measures some of the marginals of which may be nowhere absolutely continuous with respect to the phase space volume, thus invalidating the above arguments and complicating substantially the coarse-graining process.

Unit Euclidean balls and Gaussians.
There is another reason for considering, and quantifying, the discrepancy between balls/ellipsoids and cubes/convex bodies/polytopes. Consider the, arguably, simplest system of many degrees of freedom: the classical (non-relativistic) ideal gas of $N$ identical particles of mass $m = \frac{1}{2}$ units which is placed inside an isolated cubical box of side length $L$. The phase space of this system factorizes as $[0, L]^{3N} \times \mathbb{R}^{3N}$. The Hamiltonian is $H = \sum_{i=1}^{N} \frac{|p_i|^2}{2m} = \sum_{i=1}^{N} |p_i|^2$. Given that the system is isolated, its total energy is conserved. Set it equal to unity for simplicity. Hence the momentum space reduces from $\mathbb{R}^{3N}$ to the unit sphere $S^{3N-1}$. The thermodynamic limit corresponds to taking $N \to \infty$, which gives rise to the Maxwell distribution, which is Gaussian with respect to the components of the molecular velocities. Probabilistically, this is a simple manifestation of the Central Limit Theorem. This result, interpreted geometrically, shows that a unit radius sphere of high dimension ($N \to \infty$) should be an excellent approximation to the Gaussian distribution.
That this is indeed true had been stressed by É. Borel and later by P. Lévy. More recently, it has been emphasized in the work of V.D. Milman and M. Gromov. The argument can be made more precise as follows [119]: consider the Euclidean ball $B^n_R$ of radius $R$ in $\mathbb{R}^n$. Due to the homogeneity of volume, its volume is $\mathrm{vol}\, B^n_R = R^n \, \mathrm{vol}\, B_1$, where $\mathrm{vol}\, B_1$ is given by (114). Assume that $\mathrm{vol}\, B^n_R = 1$. Hence $R = (\mathrm{vol}\, B_1)^{-1/n}$. Hence a section of this ball of codimension 1 passing through the origin will be an $(n-1)$-dimensional ball $B^{n-1}_R$ of radius $R$ whose volume is, according to (118) and (119), $\mathrm{vol}\, B^{n-1}_R = R^{n-1} \, \mathrm{vol}\, B^{n-1}_1$. After using Stirling's approximation (115), we see that for large $n$ this section has a volume approximately equal to $\sqrt{e}$. Consider now an $(n-1)$-dimensional section of this ball at a distance $r$ from the ball's center. Its radius will be $(R^2 - r^2)^{\frac{1}{2}}$. As a result its volume will be, for large $n$, approximately given by $\sqrt{e} \left( 1 - \frac{r^2}{R^2} \right)^{\frac{n-1}{2}}$ (124). Following (116) we can see that a Euclidean ball of volume 1, for large $n$, has a radius of about $R \approx \sqrt{\frac{n}{2 \pi e}}$ (125). Substituting (125) in (124) gives for the volume of the section at distance $r$ from the centre approximately $\sqrt{e} \left( 1 - \frac{2 \pi e r^2}{n} \right)^{\frac{n-1}{2}} \approx \sqrt{e} \, \exp(- \pi e r^2)$. So, the distribution of the unit volume of the ball among its sections of co-dimension 1, as a function of their distance $r$ from the ball's centre, is for large $n$ almost a Gaussian distribution of variance $\frac{1}{2 \pi e}$. What we have done is a geometric re-formulation and "derivation" of the Central Limit Theorem.
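The approach of the section volumes to a Gaussian profile can be checked numerically without any asymptotic expansion. The sketch below computes, for a ball of unit volume in $\mathbb{R}^n$, the exact $(n-1)$-dimensional volume of the section at distance $r$ from the centre and compares it with $\sqrt{e} \exp(-\pi e r^2)$; the function names are hypothetical.

```python
import math

def log_unit_ball_volume(n):
    """log of pi^(n/2) / Gamma(n/2 + 1)."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)

def unit_volume_radius(n):
    """Radius R of the ball in R^n whose volume equals 1."""
    return math.exp(-log_unit_ball_volume(n) / n)

def slice_volume(n, r):
    """(n-1)-dimensional volume of the section of the unit-volume ball
    at distance r from its centre; the section has radius sqrt(R^2 - r^2)."""
    R = unit_volume_radius(n)
    if abs(r) >= R:
        return 0.0
    log_v = log_unit_ball_volume(n - 1) + 0.5 * (n - 1) * math.log(R * R - r * r)
    return math.exp(log_v)

def gaussian_limit(r):
    """Large-n profile sqrt(e) * exp(-pi e r^2): a Gaussian of variance 1/(2 pi e)."""
    return math.sqrt(math.e) * math.exp(-math.pi * math.e * r * r)

for n in (10, 100, 400):
    print(n, [round(slice_volume(n, r) / gaussian_limit(r), 4)
              for r in (0.0, 0.15, 0.3)])
```

The printed ratios drift towards 1 as $n$ grows, and the limiting profile integrates to 1 over $r$, as it must for a ball of unit volume.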
What we also observe in the above derivation is that most of the volume of the ball concentrates near a lower dimensional section passing through its center. This observation turns out to be independent of the validity of the Central Limit Theorem, even if this is not obvious from the above arguments. Counter-intuitive as it may be, it is frequently encountered in convex asymptotic geometry, i.e. in Banach spaces as their dimension increases to infinity. Extensions to the cases of Riemannian manifolds with a Ricci curvature bounded from below, or in terms of the behavior of the lowest non-trivial eigenvalue of the spectrum of their Laplacian, also exist. The same can be stated for smooth metric measure spaces under additional assumptions and generalizations of the definition of convexity. This underlying behavior has been called "concentration of measure" [125,126,127,128,114,129,71,130,118] and was the main avenue for V.D. Milman's proof of Dvoretzky's theorem, which we will come to in the next Section.
In the previous Sections we saw that there is a substantial discrepancy between the coarse-graining approaches that rely on cubes, which are the outcome of independence/simplicity arguments, and those that rely on balls/ellipsoids induced from symplectic capacities. Ultimately one would like to have a "natural way", to the extent possible, to decide on such coarse-graining. The existence of the fundamental constant $\hbar$ arising in Quantum Physics partially helps provide a scale for such a phase-space coarse-graining, but the exact shape of the fundamental cell employed is still a matter of choice, as has previously been pointed out. Since balls/ellipsoids minimise all symplectic capacities and their high dimensional sections are Gaussian, as seen in the previous subsection, one may be willing to use them as the fundamental cells for phase-space coarse-graining. This would be favourably supported by Quantum Mechanics where the minimum uncertainty, hence "as precise as possible", wave-functions for quadratic potentials, which are the lowest order approximations to any "generic" analytic potentials, are Gaussians. See also the discussion in subsection 4.5. In this subsection we rely on the work of M. de Gosson and collaborators [86,87,88,89,90,91,92,93,94,95] where many more details on similar topics can be found.
"Quantum blobs" are, very much like Gaussians, minimum uncertainty sets whose size is measured using symplectic capacities instead of volumes. Their advantage, as opposed to cubic cells, is that they remain invariant under canonical transformations, hence they preserve the Hamiltonian structure of the dynamical system. To be more specific, we work in $\mathbb{R}^{2n}$, and define a quantum blob $Q^{2n}(x_0)$ to be the image of the Euclidean ball $B_{\sqrt{\hbar}}(x_0) \subset \mathbb{R}^{2n}$ under a canonical/symplectic transformation. Using the results of subsection (4.5) we see that for a quantum blob $c(Q^{2n}(x_0)) = \frac{1}{2} h$ (127), which has the same order of magnitude as the symplectic capacity of a cube. By contrast, since symplectic transformations are volume-preserving, the volume of a quantum blob, following (118), is $\mathrm{vol}\, Q^{2n}(x_0) = \frac{(\pi \hbar)^n}{n!} = \frac{h^n}{n! \, 2^n}$. As a result, this volume is $n! \, 2^n$ times smaller than that of a cube of volume $h^n$. Since we are interested in the thermodynamic limit $n \to \infty$, the quantum blob has a far ("infinitely") smaller volume than that of a cube, despite the fact that its sections along all symplectic 2-planes are of comparable area.
In this Section we present Dvoretzky's theorem and the Dvoretzky dimension, and their implications for the definition of entropy. We do not claim that we can actually predict the functional form of entropy to be used on each occasion, as would be desirable, if feasible, in our opinion. However, we can at least see that the choices of $\mathcal{S}_{BGS}$ and $\mathcal{S}_q$ are plausible, asymptotically, in the "thermodynamic limit" ($n \to \infty$), when seen from a viewpoint induced by Dvoretzky's theorem and its implications. The choices of the appropriate deformation fields such as $\mathbb{R}_q$ and $\mathbb{R}_\kappa$ contribute to determining the shape of the fundamental cells into which the phase space $M$ of the Hamiltonian system should be divided in a coarse-graining process. However, such a choice, if it is also accompanied by the other axioms of Shannon / Khintchin or Abe / Santos etc., amounts to the choice of the entropic functional employed in any particular situation. Still, the viewpoint that we follow in this Section may be useful in seeing the entropy in a different light, which may allow inferences that may not be easily accessible otherwise, such as the origin of dualities in non-additive entropies.
The above considerations of cubes versus ellipsoids as fundamental cells of phase-space coarse-graining raise another question. If the two-dimensional sections of these two candidates for fundamental cells are of almost the same 2-dimensional "area" but their volumes are so different, is there any intermediate situation? Given that quantum blobs have a substantially smaller volume than the corresponding cubes and that inside each cube or convex body/polytope there is a maximal volume ellipsoid (F. John's theorem), is there any intermediate dimensional section of the cube which is close to being a ball/ellipsoid? We saw in subsection 5.3 that for a symmetric convex body to be close to a ball, it must have exponentially many faces. Equivalently, a cube in $\mathbb{R}^n$ has almost spherical sections whose dimension is of order $\log n$. Hence, if the dimension of a section of a cube is at most of order $\log n$, then the section can be reasonably close to a ball, and therefore the discrepancy between phase-space coarse-graining by cubes and by balls can be seen as non-significant. On the other hand, as seen in subsection 5.3, if such a section has dimension much larger than $\log n$, then the cube and the ball will be substantially different from each other in their volumes, hence a coarse-graining procedure would give quite different results for these two fundamental cells. Since cubes and balls are as distinct from each other as possible, as measured by their Banach-Mazur distance, getting the same coarse-graining results for both of them can be seen as relatively reassuring that we will get the same results by using as fundamental cell of phase space any other symmetric convex body (polytope).
From a physical viewpoint, one can interpret this way of thinking to mean that $\log n$, as $n \to \infty$, is asymptotically the optimal order of magnitude of the dimension that one can use in order for the coarse-graining results of phase space to be virtually independent of the exact shape of the fundamental cell. Hence the statistically important characteristics of the underlying dynamical system would be preserved, geometrically, but there would still be a reduction in the complexity as measured by the number of the effective degrees of freedom of the system. But this is exactly what the entropy was designed to capture. Of course, the entropy is associated to a measure, which may not be just the volume. However, if one assumes the validity of the ergodic hypothesis, the micro-canonical density is uniform on the constant energy hypersurface under consideration, as was also previously pointed out (subsection 5.3). Hence all the arguments pertaining to the micro-canonical measure are reduced to the corresponding ones about volumes. Therefore, in the most conventional sense, the entropy should have the form of a natural logarithm of the volume of the accessible phase space, in accordance with the ideas of Boltzmann, Gibbs, Shannon etc.
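The claim that low-dimensional sections of a high-dimensional cube can look nearly round can be probed numerically. The sketch below, which is only an illustration under stated assumptions and not part of the argument, draws a random $k$-dimensional subspace $E$ of $\mathbb{R}^n$ and samples the ratio $\|x\|_\infty / \|x\|_2$ over points of $E$. The function name `section_distortion`, the Gaussian sampling scheme and all parameter values are our own choices. The returned number compares the section with a round ball only (an ellipsoid could fit better) and, being based on finitely many samples, it can only underestimate the true distortion.

```python
import math
import random


def section_distortion(n: int, k: int, samples: int = 500, seed: int = 0) -> float:
    """Sampled ratio max/min of ||x||_inf over Euclidean-unit vectors x
    in a random k-dimensional subspace E of R^n (hypothetical helper).

    A value close to 1 suggests that the section of the cube by E is
    close to a round ball.
    """
    rng = random.Random(seed)
    # k Gaussian vectors; with probability one they span a k-dimensional subspace
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    lo, hi = math.inf, 0.0
    for _ in range(samples):
        coeffs = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # random point of E as a random linear combination of the basis
        x = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]
        l2 = math.sqrt(sum(v * v for v in x))
        # sup-norm of x after rescaling it to Euclidean length one
        sup = max(abs(v) for v in x) / l2
        lo, hi = min(lo, sup), max(hi, sup)
    return hi / lo


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, section_distortion(n, k=2, samples=300))
```

A ratio close to 1 indicates that the sup-norm is nearly constant on the Euclidean sphere of $E$, so that the section of the cube by $E$ is nearly a ball; one may compare, for instance, a fixed small $k$ at increasing $n$.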

One wonders whether anything similar can be stated about non-extensive Statistical Mechanics and the non-additive entropies such as $\mathcal{S}_q$ and $\mathcal{S}_\kappa$. After all, the underlying geometric structures are almost the same, the only difference being that cubes will have to be replaced by their images under (14), (23) and their generalisations, namely generalised shapes which are symmetric convex polytopes. The underlying symplectic structure, by contrast, remains the same, as Hamilton's equations are still assumed to be applicable for such systems (see also the related comment in subsection 3.3 after eq. (38)). Hence the question that arises, in analogy with the cube, is whether there is any dimension beyond which a symmetric convex body (polytope) has almost spherical sections. There is an additional level of difficulty however: it is not clear that the systems described by $\mathcal{S}_q$ or $\mathcal{S}_\kappa$ are ergodic. On the contrary, it has been conjectured that these non-additive entropies describe exactly non-ergodic systems (see also the last paragraph of subsection 5.3). Even though any "reasonable" measure admits a decomposition into ergodic components [65], this is not enough to justify the reduction of the phase-space measure to just the volume. To proceed, we assume that each phase-space cell is so small that such a reduction is possible. This can occur, for instance, when the variation of the measure density is slow compared to the spatial extent of each cell.
To make the question more precise and general, assume that $X$ is an $n$-dimensional Banach space whose unit ball expresses the generalized independence induced by some non-additive entropic functional. Then, is there a subspace $E$ of $X$ of dimension $k(\epsilon, n)$ such that $d(E, l_2^k) \le 1 + \epsilon$, for $\epsilon > 0$? In this expression $d$ stands for the Banach-Mazur distance between $E$ and the $k$-dimensional Euclidean (Hilbert) space $l_2^k$. Slightly more geometrically, one can start from the symmetric convex body $K$ in $\mathbb{R}^n$ which expresses the aforementioned independence. Does there exist a section $K \cap E$ of $K$ by a subspace $E$ of dimension $k(\epsilon, n)$ so that, for some ellipsoid $\mathcal{E}$ in $E$, one has the inclusion $\mathcal{E} \subseteq K \cap E \subseteq (1 + \epsilon) \mathcal{E}$? It should be noticed that the above questions are asymptotic, in the sense that $n$ is large, namely $n \to \infty$. The answer to these equivalent questions is affirmative, as was proved in 1961 by A. Dvoretzky [114,118,119,125,126,121,122], a result which is one of the cornerstones of Geometric Functional Analysis / Asymptotic Convex Geometry. The result is extremely counter-intuitive, as one can easily see by trying to imagine a section of a cube which is almost spherical. Such a result is possible exactly because the dimension $n$ is assumed to be quite large.
As befits a fundamental result, there are different re-formulations in slightly different contexts under the general title of "Dvoretzky's theorem" and several alternative proofs. One of the still unsettled questions is the optimal form of the Dvoretzky dimension $k(\epsilon, n)$ in the above statements. The best known estimate is $k(\epsilon, n) \ge c \, \epsilon^2 \log n$, $\epsilon \in (0, 1)$. The case of our interest (see also the discussion in the next subsection) involves the $n$-dimensional Banach spaces endowed with the norms (30), (31), which were indicated as $l_p^n$, $1 \le p \le \infty$. It is a non-trivial result that the corresponding Dvoretzky dimensions $k(l_p^n)$ are given asymptotically, as $n \to \infty$, by $k(l_p^n) \sim n$ for $1 \le p \le 2$, $k(l_p^n) \sim n^{2/p}$ for $2 < p < \infty$, and $k(l_\infty^n) \sim \log n$ (130). We observe that when $p > 2$ we have a power-law behaviour. If the Dvoretzky dimension of a cube $k(l_\infty^n)$ is responsible for giving rise to the logarithmic form of the entropy, as we previously suggested, then one can see that power-law entropic forms such as $\mathcal{S}_q$ and $\mathcal{S}_\kappa$ may be seen, at first glance, as arising from the Dvoretzky dimension of $l_p^n$, $p > 2$. Even though this would not be exactly correct, in view of the comments of the next subsection, at least it may be considered as suggestive, and indicative of not yet fully appreciated underlying structures. We observe in (130) the remarkable property that $k(l_p^n) \sim n$, $1 \le p \le 2$. Pushed to its limit, and substituting $q$ for $p$ in (130), if the above discussion is pertinent for $\mathcal{S}_q$, then the assumed range of entropic indices $q \in (0, 1)$ would cover all possible cases for the leading power-law behaviour of the entropic functional. If the conclusions of [38,39] prove to be correct, the range $q \in (0, 1)$ in $\mathcal{S}_q$ is the only acceptable one for the entropic index related to a Hamiltonian system of many degrees of freedom. Similar things can probably be stated about the entropic parameter $\kappa$ and the entropic functional $\mathcal{S}_\kappa$.
Potentially existing sub-leading terms in the non-additive power-law functionals cannot be detected by the asymptotic form (130), so far as we can see. It is probably exactly such sub-leading terms that determine the differences between functionals such as $\mathcal{S}_q$ and $\mathcal{S}_\kappa$, and therefore these terms may turn out to be able to distinguish which one of these functionals is more appropriate for describing which system. Still, referring to more general systems described by the above two, or any other, entropic functionals of power-law form, we are not quite certain about how to interpret the significance, if any, of the "phase transition" in the Dvoretzky dimension of $l_p^n$ exhibited between $p \in [1, 2]$ and $p \in (2, \infty)$.
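For orientation, the standard leading-order behaviour of the Dvoretzky dimension of $l_p^n$ (of order $n$ for $1 \le p \le 2$, of order $n^{2/p}$ for $2 < p < \infty$, and of order $\log n$ for $p = \infty$) can be tabulated with a few lines of Python. This is a sketch: the function name `dvoretzky_dimension_order` is hypothetical, and all multiplicative constants, including their dependence on $p$ and $\epsilon$, are deliberately dropped, so only the growth with $n$ at fixed $p$ is meaningful.

```python
import math


def dvoretzky_dimension_order(n: int, p: float) -> float:
    """Leading-order growth in n of the Dvoretzky dimension of l_p^n.

    Constants and the epsilon-dependence are dropped (assumption of this
    sketch); p = math.inf stands for the cube, i.e. l_infinity^n.
    """
    if n < 2 or p < 1:
        raise ValueError("need n >= 2 and p >= 1")
    if p <= 2:
        return float(n)          # proportional to the full dimension
    if math.isinf(p):
        return math.log(n)       # logarithmic regime of the cube
    return n ** (2.0 / p)        # power-law regime for 2 < p < infinity


if __name__ == "__main__":
    for p in (1.0, 2.0, 3.0, 4.0, math.inf):
        print(p, [dvoretzky_dimension_order(n, p) for n in (10**2, 10**4, 10**6)])
```

The change of behaviour at $p = 2$ is the "phase transition" referred to in the text.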

6.2
On the $(\epsilon, \tau)$ entropy and the use of $l_p^n$ spaces.
It may be of interest to elaborate a bit upon the dynamical origin and role of $\epsilon$ appearing in Dvoretzky's theorem. This parameter expresses, as was previously discussed in subsection 3.3, our fundamental inability to follow the evolution of the underlying dynamical system with absolute precision. One way to quantify this is through a modification of the Kolmogorov-Sinai entropy [65], such as the one presented in [131], called $\epsilon$-entropy: in the case of the Kolmogorov-Sinai entropy, one introduces partitions/cells of the phase space of size $\epsilon$ over which one defines $\mathcal{S}_{BGS}$. Eventually, one takes the size of the partition $\epsilon \to 0$. By contrast, in the $\epsilon$-entropy the size of the fundamental cell remains finite, but all other steps are the same as in the case of the Kolmogorov-Sinai entropy construction. One can refer to [131] for the advantages and difficulties that such a definition implies. What is important for our purposes, however, is that in this definition the finite, even if small, phase-space "resolution" $\epsilon$ enters explicitly the definition of the entropy. Hence this parameter will invariably appear, explicitly or implicitly, in the composition properties such as (8), (9) or (19) and therefore in $\mathbb{R}$, $\mathbb{R}_q$ or $\mathbb{R}_\kappa$ respectively. Since one defines the fundamental polytopes of phase-space coarse-graining, which express geometrically "independence", via these algebraic operations and structures, the shape of these cells/polytopes will invariably contain a dependence on $\epsilon$. Therefore there will be some finite uncertainty about the exact shape of the convex body that one should use in Dvoretzky's theorem, based on the above physical arguments. This uncertainty is quantified and appears in Dvoretzky's theorem as the upper bound requirement on the Banach-Mazur distance in the theorem, and explicitly in the Dvoretzky dimension $k(\epsilon, n)$ as its dependence on $\epsilon$. In this paragraph, we have used $\epsilon$ twice, in two quite different contexts. This is clearly a substantial abuse of notation: even though one may possibly expect to find a relation between the indeterminacies expressed by $\epsilon$ in these two contexts, one should not assume that they are equal, let alone the same.
A second point that may need clarifying is the extensive use of the $l_p^n$, $1 \le p \le \infty$, Banach spaces in the arguments of this work (subsection 3.2 onward). One reason for such use is that a lot of their features are reasonably well understood, when compared to general Banach spaces. This, however, does not make them physically more relevant, just formally more tractable. Based on $\mathcal{S}_{BGS}$ and $\mathcal{S}_q$, $\mathcal{S}_\kappa$ it appears that the concept of independence as expressed by cubes, namely the unit balls of $l_\infty^n$, should be sufficient for our purposes. After all, the field isomorphisms (14), (23) are invertible and distance non-decreasing, so they preserve the number of vertices of these cubes. All that they do is to uniformly expand the sides of the fundamental cubical cells of the phase-space partition used in coarse-graining. In addition, any legitimate cube should have clearly defined vertices. However, all $l_p^n$, $1 < p < \infty$, have unit spheres that are everywhere differentiable, hence they do not possess any clearly defined (point) vertices.
To address the last concern, one should refer to the ideas leading to the use of the $(\epsilon, \tau)$ entropy in the first paragraph of this subsection. In realistic models, even classical ones, there is always some uncertainty associated to the scale $\epsilon$ of phase-space coarse-graining. This should be reflected in the composition property of the pertinent entropic functional. This indeterminacy, in turn, makes the concept of independence become a bit "vague": the corresponding cubes do not have well-defined vertices and faces, but rather areas of small but finite "thickness" as faces, and areas of small spatial extent / "radius" as vertices. Hence one should not be able to distinguish between "cubes" in $l_p^n$ for, let's say, $p = 1$ and for $p = 1 + \delta$, where $0 < \delta \ll 1$. A second reason in favour of using not only the spaces $l_p^n$ but also more general Banach spaces in employing Dvoretzky's theorem for physically relevant cases is that we want to have a formalism that is flexible enough to accommodate many families of entropic functionals. If the price that one pays for such flexibility is small, then one is willing to go along with a slightly more elaborate, but far more general, formalism to accomplish these ends. Consider, for instance, models of highly anisotropic systems, as practical as materials possessing layered structures [132], or as exotic and conjectural as anisotropic (Hořava-Lifshitz [133] etc.) gravity. Then it may not be entirely unreasonable to propose a direction-dependent entropic form for such systems, as long as there is a relatively clear separation of the dynamics and scales in the different directions. This would elevate the non-extensive parameters of entropic functionals such as $\mathcal{S}_q$, $\mathcal{S}_\kappa$ into vectors. If this is true, then the corresponding unit cells, expressing independence, in phase-space coarse-graining will be anisotropic convex bodies, which however can still be accommodated by the convexity formalism presented here. To this date though, there has not been any compelling theoretical reason to introduce any such functionals with vector-valued entropic indices, so far as we know.
An issue that may be worth discussing is that of polar duality. From a geometric as well as analytical viewpoint, polarity has turned out to be of considerable significance, since the earliest days of Euclidean geometry. The functional analytic analogy of the polar $K^\circ$ of a convex body $K$ with the dual space $X^*$, presented in subsection 5.1, has important and far-reaching consequences. One of them is that the Banach-Mazur distance remains invariant under such a duality, namely for any normed spaces $X$, $Y$, the Banach-Mazur distance obeys $d(X, Y) = d(X^*, Y^*)$. An immediate implication is that since, according to John's theorem, $d(l_2^n, l_\infty^n) = \sqrt{n}$ (132), and because $(l_1^n)^* = l_\infty^n$ and the Hilbert space $l_2^n$ is self-dual (the Euclidean ball is the polar of itself), one also has $d(l_2^n, l_1^n) = \sqrt{n}$. Since the different $l_p^n$, via their unit balls, express different ways of defining independence, induced by the various non-additive entropies, one also needs the extension of the above to $d(l_p^n, l_q^n) = n^{1/p - 1/q}$, where either $1 \le p \le q \le 2$ or $2 \le p \le q \le \infty$. The remaining option, namely $1 \le p \le 2 < q \le \infty$, gives only bounds for the Banach-Mazur distance as $C_1 n^\beta \le d(l_p^n, l_q^n) \le C_2 n^\beta$, $\beta = \max\{1/p - 1/2, \, 1/2 - 1/q\}$, with $C_1$, $C_2$ being positive constants. Going back to Dvoretzky's theorem, the Figiel-Lindenstrauss-Milman theorem provides for the Dvoretzky dimension $k$ of a Banach space $X$ and its dual $X^*$ the lower bound $k(X) \, k(X^*) \ge C n^2 / d(X, l_2^n)^2$, which, since $d(X, l_2^n) \le \sqrt{n}$, gives $k(X) \, k(X^*) \ge C' n$, where $C$ and $C'$ are positive constants. As a result, for any such Banach space $X$, one has that either $k(X) \ge C \sqrt{n}$ or $k(X^*) \ge C \sqrt{n}$, a result that turns out to be sharp.
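The exponents governing the Banach-Mazur distances between the spaces $l_p^n$ discussed above can be collected in a small helper. The sketch below assumes the standard results: $d(l_p^n, l_q^n) = n^{|1/p - 1/q|}$ when $p$ and $q$ lie on the same side of 2, and a two-sided bound with exponent $\max\{1/p - 1/2, \, 1/2 - 1/q\}$ otherwise. The function names are ours, and all constants are ignored.

```python
import math


def _inv(p: float) -> float:
    """1/p with the convention 1/infinity = 0."""
    return 0.0 if math.isinf(p) else 1.0 / p


def banach_mazur_exponent(p: float, q: float) -> float:
    """Exponent beta such that d(l_p^n, l_q^n) is of order n**beta.

    Exact (d = n**beta) if p, q are on the same side of 2; otherwise
    beta only fixes d up to multiplicative constants.
    """
    if p < 1 or q < 1:
        raise ValueError("need p, q >= 1")
    p, q = min(p, q), max(p, q)
    if q <= 2 or p >= 2:
        return _inv(p) - _inv(q)
    return max(_inv(p) - 0.5, 0.5 - _inv(q))


def flm_product_order(n: int, d: float) -> float:
    """Order n**2 / d**2 of the Figiel-Lindenstrauss-Milman lower bound
    on k(X) k(X*), with the constant C dropped; d = d(X, l_2^n)."""
    return n ** 2 / d ** 2


if __name__ == "__main__":
    n = 10**4
    print(banach_mazur_exponent(2, math.inf), banach_mazur_exponent(1, 2))
    print(flm_product_order(n, math.sqrt(n)))  # worst case d = sqrt(n)
```

With the extremal value $d = \sqrt{n}$ allowed by John's theorem, the second function reduces to a quantity of order $n$, which is the bound quoted in the text.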
We refer to dualities in this work not only because they play an important role in Convex Geometry, but also because they may be of importance for the case of non-additive entropies. It has been surmised from some data, for $\mathcal{S}_q$ for instance, that some systems seem to be invariant under the entropic parameter changes (138). Clearly the case $q = 1$ in (138), which corresponds to $\mathcal{S}_{BGS}$, remains invariant under such transformations, hence issues that can be raised for $q \neq 1$ pertaining to (138) are undetectable for $\mathcal{S}_{BGS}$. Convex polarity of the unit balls and the corresponding Banach space duality may somehow be related to (138) in a currently not understood manner. However, a pattern that starts emerging, and that the above considerations are suggestive of, is that it may be worthwhile to analyse in parallel, and compare to each other, features of systems whose entropic indices are connected by some duality transformation. Assuming that such systems are described by different values of the entropic parameter of the same single-parameter family of functionals such as $\mathcal{S}_q$ or $\mathcal{S}_\kappa$, it may be worth investigating which features of such systems are common, or in some, still vague, sense "opposite"/"complementary".
To push this viewpoint a little bit further, it may be worth examining concurrently, from both a convex and a symplectic geometric viewpoint, properties of the unit balls of finite dimensional Banach spaces that are dual to each other: $X$ endowed with the norm $\| \cdot \|$ and $X^*$ endowed with the dual norm $\| \cdot \|_*$. To this end, one may consider examining convex and symplectic geometric properties of $X \times X^*$. A straightforward observation is that this vector space has a canonical symplectic structure: assume that $x, y \in X$ and that $x^*, y^* \in X^*$. Then the canonical symplectic structure $\omega$ on $X \times X^*$ is defined by $\omega((x, x^*), (y, y^*)) = x^*(y) - y^*(x)$ (139) and the corresponding Liouville form of $X \times X^*$ is, of course, $\omega^n / n!$. We saw throughout this work that the Euclidean ball and the cube are, in some sense, as different from each other as possible, and that even though the former behaves quite well under symplectic transformations the same is not true for the latter. Therefore, it may come as a complete surprise that for the case of the cube $I_n \subset \mathbb{R}^n$ and its polar, the cross-polytope $I_n^\circ \subset \mathbb{R}^n$, the interior of $I_n \times I_n^\circ$ turns out to be symplectically diffeomorphic to the interior of the Euclidean ball in $\mathbb{R}^{2n}$ of the same volume [76]. This again shows the unexpected features of symplectic geometry, where flabbiness and rigidity can be found in totally unexpected places.
On the geometric side, a question in the spirit of the isoperimetric problem [71,114,134], which was posed by Mahler (ca. 1939), was to find upper and lower bounds for the Liouville volume of $B_1 \times B_1^\circ$, where $B_1$, $B_1^\circ$ stand for the unit balls of $X$ and $X^*$ respectively. This volume $\mathrm{vol}(B_1 \times B_1^\circ)$, often called "Mahler volume", is invariant under invertible linear transformations. The upper bound was determined in 2 and 3 dimensions in [135] and generalised to any dimension in [136] (Blaschke-Santaló inequality). It is attained if and only if $X$ is the Euclidean space, the case of equality having been proved in [137]. Mahler conjectured that the lower bound is $4^n / n!$ and that it would be sharp. This lower bound would clearly apply to the pair of the cube and its polar, the cross-polytope. Mahler himself verified this conjecture in 2 dimensions, but in higher dimensions the conjecture remains unproven. What has been proved, though, is the conjecture up to a multiplicative factor whose best value known today is given in [138]. In an interesting recent development, [139,140] assumed the validity of the Viterbo conjecture (eqs. (99), (100) and the discussion around them), and proved that the Hofer-Zehnder capacity for a symmetric convex body $K$ and its polar $K^\circ$ is $c(K \times K^\circ) = 4$ (140) which, in turn, showed that the Viterbo conjecture implies the validity of the Mahler conjecture.
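The two extremal candidates for the Mahler volume mentioned above are easy to compare numerically. The following sketch assumes the standard volumes of the cube $[-1,1]^n$ (namely $2^n$), of its polar cross-polytope ($2^n / n!$) and of the Euclidean unit ball ($\pi^{n/2} / \Gamma(n/2 + 1)$); the function names are illustrative only.

```python
import math


def mahler_volume_cube(n: int) -> float:
    """vol(I_n) * vol(I_n polar) = 2**n * (2**n / n!) = 4**n / n!."""
    return 4.0 ** n / math.factorial(n)


def mahler_volume_ball(n: int) -> float:
    """Square of the volume of the Euclidean unit ball in R^n
    (the ball is its own polar)."""
    vol = math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)
    return vol * vol


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, mahler_volume_cube(n), mahler_volume_ball(n))
```

In dimension 1 the two values coincide, while for $n \ge 2$ the cube value $4^n / n!$ lies strictly below the Euclidean one, consistent with the conjectured lower bound and the Blaschke-Santaló upper bound.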
This subsection used some symplectic and convex geometric facts and conjectures to suggest that it may be formally fruitful to examine simultaneously pairs of systems, rather than single systems, described by entropies belonging to the same single-parameter family but having harmonically conjugate indices (which represent geometrically polarity and Banach space duality). It remains to be seen whether this approach may provide some insights into the nature of such systems, as well as into the possible invariances, and their origin, of non-additive entropies such as $\mathcal{S}_q$ under "duality" transformations such as (138).

7
Conclusions and discussion.
In this work we presented the view that the source of entropy can be ascribed to two mutually exclusive ways of performing phase-space coarse-graining for Hamiltonian systems with many degrees of freedom. The underlying Euclidean/Riemannian structure favours cells that are cubical. By contrast, the symplectic structure favours ellipsoids. We discussed ways to measure the discrepancy between these two approaches and also gave estimates, via the Dvoretzky dimension, of the minimal dimensions of the spaces in which they give almost the same results. So it is, in some sense, as if there is a set of variables present at the microscopic level that reflects the number of variables that one sees as the outcome of a statistical analysis, i.e. in thermodynamics. We cannot quite dare claim that these variables are the same or, even more so, that there is a "phase-space thermodynamic" behaviour. It just appears from Dvoretzky's theorem that at the microscopic level one can infer a number of variables that is of the same order of magnitude as the number needed for a macroscopic description of the system. Further investigation in this direction may be of some interest. Moreover, we saw some preliminary formal indications about the suspected presence and about the origin of dualities of non-additive entropies via polarity.
One could certainly expand this work in both the symplectic and the convex geometric directions, if deemed necessary. The symplectic capacities are still not very well understood objects. In case this looks too removed from the modelling of physical systems, it may be worth mentioning that there is an elaborate and flourishing research area on the interface between symplectic geometry and string theory. Even though the goals and approaches in this area may appear substantially different from the ones of Statistical Mechanics, some general ideas and technical approaches, especially pertaining to dualities [141,142,143], may be profitably adapted in the present context. After all, quantum perturbative string theory, like any quantum theory, has a statistical interpretation and explicitly uses methods of statistical mechanics relying on $\mathcal{S}_{BGS}$. To what extent one may wish to consider other functionals in such a statistical approach is unclear. However, string theory's origins in dual resonance models, which were eventually superseded by QCD, a theory that is asymptotically free and has a Wilson loop formulation [144], show us that low energy correlations become dominant, a feature of systems that non-additive entropies such as $\mathcal{S}_q$ claim to describe. Hence it may be worth looking into string theory from a non-additive entropy viewpoint. There is more than just pure speculation on this front: using the phenomenological asymptotic bootstrap approach of Hagedorn for strong interactions, some recent results suggest an important role that $\mathcal{S}_q$ may play in this regime. Such phenomenological approaches, relying partly on $\mathcal{S}_q$, seem to be, most importantly, in accordance with existing experimental data [145,146,147,148,149,150].
One can also use several well-known results of convex geometric analysis, such as the Bourgain distortion theorem or the Johnson-Lindenstrauss flattening lemma, to expand upon the results that just used Dvoretzky's theorem and the associated dimension [113,115,116,118,121,122]. Whether such results can be generalized and can lead to interesting conclusions pertinent to non-additive entropies is not clear to us. However, the thought of using a physical idea, such as a non-additive entropic functional, to potentially help prove a purely geometric conjecture such as that of Mahler is probably too enticing not to motivate someone to look carefully at, and further develop, the symplectic/asymptotic convex point of view.
In closing, and from a formalistic viewpoint, one cannot avoid mentioning a trend toward categorification that exists in some mathematical quarters. Such categorification may provide a formalism that is able to bring forth unexpected aspects of non-extensive statistical mechanics, and it touches upon some aspects of the topics discussed in this work. One application of this categorification that has touched upon Physics is that of Khovanov homology [151,152] in relation to Chern-Simons theory and the Jones polynomial. It may also be worth studying the case of the Fukaya categories related to Lagrangian Floer cohomology in symplectic geometry [153] and mirror symmetry. In the context of entropy, alas only for $\mathcal{S}_{BGS}$, and in the spirit of categorification, one may appreciate some unique insights and viewpoints explored in the recent [154,155], which may eventually turn out to be particularly useful and illuminating.