Common probability patterns arise from simple invariances

Shift and stretch invariance lead to the exponential-Boltzmann probability distribution. Rotational invariance generates the Gaussian distribution. Particular scaling relations transform the canonical exponential and Gaussian patterns into the variety of commonly observed patterns. The scaling relations themselves arise from the fundamental invariances of shift, stretch, and rotation, plus a few additional invariances. Prior work described the three fundamental invariances as a consequence of the equilibrium canonical ensemble of statistical mechanics or the Jaynesian maximization of information entropy. By contrast, I emphasize the primacy and sufficiency of invariance alone to explain the commonly observed patterns. Primary invariance naturally creates the array of commonly observed scaling relations and associated probability patterns, whereas the classical approaches derived from statistical mechanics or information theory require special assumptions to derive commonly observed scales.

It is increasingly clear that the symmetry [invariance] group of nature is the deepest thing that we understand about nature today. I would like to suggest something here that I am not really certain about but which is at least a possibility: that specifying the symmetry group of nature may be all we need to say about the physical world, beyond the principles of quantum mechanics.
The paradigm for symmetries of nature is of course the group symmetries of space and time. These are symmetries that tell you that the laws of nature don't care about how you orient your laboratory, or where you locate your laboratory, or how you set your clocks or how fast your laboratory is moving (Weinberg 1 , p. 73).
For the description of processes taking place in nature, one must have a system of reference (Landau and Lifshitz 2 , p. 1).

INTRODUCTION
I argue that three simple invariances dominate much of observed pattern. First, probability patterns arise from invariance to a shift in scaled measurements. Second, the scaling of measurements satisfies invariance to uniform stretch. Third, commonly observed scales are often invariant to rotation.
Feynman 3 described the shift-invariant form of probability patterns as

$$\frac{q(E)}{q(E')} = \frac{q(E+a)}{q(E'+a)}, \tag{1}$$

in which $q(E)$ is the probability associated with a measurement, $E$. Here, the ratio of probabilities for two different measurements, $E$ and $E'$, is invariant to a shift by $a$. Feynman derived this invariant ratio as a consequence of Boltzmann's equilibrium distribution of energy levels, $E$, that follows from statistical mechanics,

$$q(E) = \lambda e^{-\lambda E}. \tag{2}$$

Here, $\lambda = 1/\langle E \rangle$ is the inverse of the average measurement.
Feynman presented the second equation as primary, arising as the equilibrium from the underlying dynamics of particles and the consequent distribution of energy, E. He then mentioned in a footnote that the first equation of shift invariance follows as a property of equilibrium. However, one could take the first equation of shift invariance as primary. The second equation for the form of the probability distribution then follows as a consequence of shift invariance.
What is primary in the relation between these two equations: equilibrium statistical mechanics or shift invariance? The perspective of statistical mechanics, with eqn 2 as the primary equilibrium outcome, dominates treatises of physics. Jaynes 4,5 questioned whether statistical mechanics is sufficient to explain why patterns of nature often follow the form of eqn 2. Jaynes emphasized that the same probability pattern often arises in situations for which physical theories of particle dynamics make little sense. In Jaynes' view, if most patterns in economics, biology, and other disciplines follow the same distributional form, then that form must arise from principles that transcend the original physical interpretations of particles, energy, and statistical mechanics 6 .
Jaynes argued that probability patterns derive from the inevitable tendency for systems to lose information. By that view, the equilibrium form expresses minimum information, or maximum entropy, subject to whatever constraints may act in particular situations. In maximum entropy, the shift invariance of the equilibrium distribution is a consequence of the maximum loss of information under the constraint that total probability is conserved.
Here, I take the view that shift invariance is primary. My argument is that shift invariance and the conservation of total probability lead to the exponential-Boltzmann form of probability distributions, without the need to invoke Boltzmann's equilibrium statistical mechanics or Jaynes' maximization of entropy. Those secondary special cases of Boltzmann and Jaynes follow from primary shift invariance and the conservation of probability. The first part of this article develops the primacy of shift invariance.
Once one adopts the primacy of shift invariance, one is faced with the interpretation of the measurement scale, E. We must abandon energy, because we have discarded the primacy of statistical mechanics, and we must abandon Jaynes' information, because we have assumed that we have only general invariances as our basis.
We can of course end up with notions of energy and information that derive from underlying invariance. But that leaves open the problem of how to define the canonical scale, E, that sets the frame of reference for measurement.
We must replace the scaling relation E in the above equations by something that derives from deeper generality: the invariances that define the commonly observed scaling relations.
In essence, we start with an underlying scale for observation, $z$. We then ask what transformed scale, $z \to T_z \equiv E$, achieves the requisite shift invariance of probability pattern, arising from the invariance of total probability. It must be that shift transformations, $T_z \to a + T_z$, leave the probability pattern invariant, apart from a constant of proportionality.
Next, we note that a stretch of the scale, $T_z \to bT_z$, also leaves the probability pattern unchanged, because the inverse of the average value in eqn 2 becomes $\lambda = 1/b\langle T_z \rangle$, which cancels the stretch in the term $\lambda E = \lambda T_z$. Thus, the scale $T_z$ has the property that the associated probability pattern is invariant to the affine transformation of shift and stretch, $T_z \to a + bT_z$. That affine invariance generates the symmetry group of scaling relations that determine the commonly observed probability patterns 7-9.
The final part of this article develops rotational invariance of conserved partitions. For example, the Pythagorean partition $T_z = x^2(s) + y^2(s)$ splits the scaled measurement into components that add invariantly to $T_z$ for any value of $s$. The invariant quantity defines a circle in the $xy$ plane with a conserved radius $R_z = \sqrt{T_z}$ that is invariant to rotation around the circle, circumscribing a conserved area $\pi R_z^2 = \pi T_z$. Rotational invariance allows one to partition a conserved quantity into additive components, which often provides insight into underlying process.
If we can understand these simple shift, stretch, and rotational invariances, we will understand much about the intrinsic structure of pattern. An explanation of natural pattern often means an explanation of how particular processes lead to particular forms of invariance.

BACKGROUND
This section introduces basic concepts and notation. I emphasize qualitative aspects rather than detailed mathematics. The final section of this article provides historical background and alternative perspectives.

Probability increments
Define $q(z) \equiv q_z$ such that the probability associated with $z$ is $q_z \Delta\psi_z$. This probability is the area of a rectangle with height $q_z$ and incremental width $\Delta\psi_z$.
The total probability is constrained to be one, as the sum of the rectangular areas over all values of $z$, which is $\sum_z q_z \Delta\psi_z = 1$. When the $z$ values are discrete quantities or qualitative labels for events, then the incremental measure is sometimes set to one everywhere, $\Delta\psi_z \equiv 1$, with changes in the measure $\Delta\psi_z$ made implicitly by adjusting $q_z$. The conservation of probability becomes $\sum_z q_z = 1$. If a quantitative scale $z$ has values that are close together, then the incremental widths are small, $\Delta\psi_z \to d\psi_z$, and the distribution becomes essentially continuous in the limit. The probability around each $z$ value is $q_z\,d\psi_z$. Writing the limiting sum as an integral over $z$, the conservation of total probability is $\int q_z\,d\psi_z = 1$.
The increments may be constant-sized steps $d\psi_z = dz$ on the $z$ scale, with probabilities $q_z\,d\psi_z = q_z\,dz$ in each increment. One may transform $z$ in ways that alter the probability expression, $q_z$, or the incremental widths, $d\psi_z$, and study how those changes alter or leave invariant properties associated with the total probability, $\int q_z\,d\psi_z$.
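The sum of rectangular areas can be checked numerically. The following is a minimal Python sketch rather than anything from the original article: the function name `total_probability`, the grid, the rate value, and the two example pairs of height and measure are all illustrative assumptions.

```python
import math

def total_probability(q, psi, z_values):
    """Sum the rectangle areas q(z) * delta_psi(z) over consecutive grid points.

    q and psi are functions of the parameter z. Each rectangle takes its height
    from q at the interval midpoint and its width from psi(z1) - psi(z0).
    """
    total = 0.0
    for z0, z1 in zip(z_values, z_values[1:]):
        z_mid = 0.5 * (z0 + z1)
        total += q(z_mid) * (psi(z1) - psi(z0))
    return total

# Assumed values for illustration only.
lam = 2.0
z_grid = [i * 0.001 for i in range(20001)]  # z from 0 to 20

# Constant-sized steps, d(psi) = dz, with an exponential height.
area_dz = total_probability(lambda z: lam * math.exp(-lam * z),
                            lambda z: z, z_grid)

# Alternative measure, psi = z**2, with the height expressed on that scale.
area_dz2 = total_probability(lambda z: lam * math.exp(-lam * z * z),
                             lambda z: z * z, z_grid)

print(area_dz, area_dz2)  # both totals should be close to one
```

Changing the measure changes the widths of the rectangles, and the height must change with it for the total area to remain one.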

Parametric scaling relations
A probability pattern, $q_z\,d\psi_z$, may be considered as a parametric description of two scaling relations, $q_z$ and $\psi_z$, with respect to the parameter $z$. Geometrically, $q_z\,d\psi_z$ is a rectangular area defined by the parametric height, $q_z$, and the parametric width, $d\psi_z$, each taken with respect to the parameter $z$.
We may think of $z$ as a parameter that defines a curve along the path $(\psi_z, q_z)$, relating a scaled input measure, $\psi_z$, to a scaled output probability, $q_z$. The following sections describe how different invariances constrain these scaling relations.

SHIFT INVARIANCE AND THE EXPONENTIAL FORM
I show that shift invariance and the conservation of total probability lead to the exponential form of probability distributions in eqn 2. Thus, we may consider the main conclusions of statistical mechanics and maximum entropy as secondary consequences that follow from the primacy of shift invariance and conserved total probability.

Conserved total probability
This section relates shift invariance to the conservation of total probability. Begin by expressing probability in terms of a transformed scale, $z \to T_z$, such that $q_z = k_0 f(T_z)$ and

$$\int q_z\,d\psi_z = \int k_0 f(T_z)\,d\psi_z = 1.$$

The term $k_0$ is independent of $z$ and adjusts to satisfy the conservation of total probability.
If we assume that the functional form $f$ is invariant to a shift of the transformed scale by a constant, $a$, then by the conservation of total probability

$$\int k_a f(T_z + a)\,d\psi_z = 1. \tag{3}$$

The proportionality constant, $k_a$, is independent of $z$ and changes with the magnitude of the shift, $a$, in order to satisfy the constraint on total probability. Probability expressions, $q(z) \equiv q_z$, are generally not shift invariant with respect to the scale, $z$. However, suppose our transformed scale, $z \to T_z$, is such that we can write eqn 3 for any magnitude of shift, $a$, solely by adjusting the constant, $k_a$. Because the conservation of total probability sets the adjustment for $k_a$, the condition for $T_z$ to be a shift-invariant canonical scale for probability is

$$q_z = k_0 f(T_z) = k_a f(T_z + a), \tag{4}$$

which holds over the entire domain of $z$.
The key point here is that $k_a$ is an adjustable parameter, independent of $z$, that is set by the conservation of total probability. Thus, the conservation of total probability means that we are only required to consider shift invariance in relation to the proportionality constant $k_a$ that changes with the magnitude of the shift, $a$, independently of the value of $z$. Appendix A provides additional detail about the conservation of total probability and the shift-invariant exponential form.

Shift-invariant canonical coordinates
This section shows the equivalence between shift invariance and the exponential form for probability distributions.
Let $x \equiv T_z$, so that we can write the shift invariance of $f$ in eqn 4 as

$$f(x + a) = \alpha_a f(x),$$

in which $\alpha_a = k_0/k_a$. By the conservation of total probability, $\alpha_a$ depends only on $a$ and is independent of $x$.
If the invariance holds for any shift, $a$, then it must hold for an infinitesimal shift, $a = \epsilon$. By Taylor series, we can write

$$f(x + \epsilon) = f(x) + \epsilon f'(x) = \alpha_\epsilon f(x).$$

Because $\epsilon$ is small and independent of $x$, and $\alpha_0 = 1$, we can write $\alpha_\epsilon = 1 - \lambda\epsilon$ for a constant $\lambda$. Then the previous equation becomes

$$f'(x) = -\lambda f(x).$$

This differential equation has the solution

$$f(x) = \hat{k} e^{-\lambda x},$$

in which $\hat{k}$ may be determined by an additional constraint. Using this general property for shift-invariant $f$ in eqn 4, we obtain the classical exponential-Boltzmann form for probability distributions in eqn 2 as

$$q_z = k e^{-\lambda T_z} \tag{5}$$

with respect to the canonical scale, $T_z$. Thus, expressing observations on the canonical shift-invariant scale, $z \to T_z$, leads to the classical exponential form. If one accepts the primacy of invariance, the "energy," $E$, of the Boltzmann form in eqn 2 arises as a particular interpretation of the generalized shift-invariant canonical coordinates, $T_z$.
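The equivalence between shift invariance and the exponential form can be illustrated numerically. The sketch below is an assumption-laden illustration, not the article's method: the helper names, the rate value, and the power-law comparison function are hypothetical choices. It tests whether the ratio f(x + a) / f(x) is the same for every x, which is the shift-invariance condition with a proportionality constant that depends only on a.

```python
import math

def shift_ratios(f, a, xs):
    """Return f(x + a) / f(x) for each x; a constant ratio means shift invariance."""
    return [f(x + a) / f(x) for x in xs]

def is_shift_invariant(f, a, xs, tol=1e-9):
    """True when the ratio f(x + a) / f(x) does not vary with x (within tol)."""
    ratios = shift_ratios(f, a, xs)
    return max(ratios) - min(ratios) < tol

# Assumed values for illustration only.
lam = 1.5
shift = 0.7
xs = [0.1 * i for i in range(1, 50)]

def exponential(x):
    return math.exp(-lam * x)

def power_law(x):
    # Hypothetical non-exponential form used only for comparison.
    return (1.0 + x) ** -2.0

print(is_shift_invariant(exponential, shift, xs))  # constant ratio exp(-lam * shift)
print(is_shift_invariant(power_law, shift, xs))    # ratio varies with x
```

For the exponential, the constant ratio equals exp(-λa), which is the proportionality adjustment absorbed by the normalizing constant.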

Entropy as a consequence of shift invariance
The transformation to obtain the shift-invariant coordinate $T_z$ follows from eqn 5 as

$$-\log q_z = \lambda T_z - \log k.$$

This logarithmic expression of probability leads to various classical definitions of entropy and information 3,10. Here, the linear relation between the logarithmic scale and the canonical scale follows from the shift invariance of probability with respect to the canonical scale, $T_z$, and the conservation of total probability. I interpret shift invariance and the conservation of total probability as primary aspects of probability patterns. Entropy and information interpretations follow as secondary consequences.
One can of course derive shift invariance from physical or information theory perspectives. My only point is that such extrinsic concepts are unnecessary. One can begin directly with shift invariance and the conservation of total probability.

Example: the gamma distribution
Many commonly observed patterns follow the gamma probability distribution, which may be written as

$$q_z = k z^{\alpha\lambda} e^{-\lambda z}.$$

This distribution is not shift invariant with respect to $z$, because $z \to a + z$ alters the pattern,

$$q_z = k z^{\alpha\lambda} e^{-\lambda z} \ne k_a (a + z)^{\alpha\lambda} e^{-\lambda(a + z)}.$$

There is no value of $k_a$ that turns this expression into an equality for all $z$.
If we write the distribution in canonical form

$$q_z = k e^{-\lambda T_z} = k e^{-\lambda(z - \alpha \log z)},$$

then the distribution becomes shift invariant on the canonical scale, $T_z = z - \alpha \log z$, because $T_z \to a + T_z$ yields

$$q_z = k e^{-\lambda(a + T_z)} = k_a e^{-\lambda T_z},$$

with $k_a = k e^{-\lambda a}$. Thus, a shift by $a$ leaves the pattern unchanged apart from an adjustment to the constant of proportionality that is set by the conservation of total probability. The canonical scale, $T_z = z - \alpha \log z$, is log-linear. It is purely logarithmic for small $z$, purely linear for large $z$, and transitions between the log and linear domains through a region determined by the parameter $\alpha$.
The interpretation of process in relation to pattern almost always reduces to understanding the nature of invariance. In this case, shift invariance associates with log-linear scaling. To understand the gamma pattern, one must understand how process creates a log-linear scaling relation that is shift invariant with respect to probability pattern 7-9.
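The gamma example can be checked with a short numerical sketch. It assumes the gamma pattern has the form k z^(αλ) e^(-λz), which is what the canonical scale T_z = z - α log z implies; the function names and parameter values are hypothetical. The sketch compares a shift on the base scale z with a shift on the canonical scale.

```python
import math

def gamma_q(z, alpha, lam, k=1.0):
    """Gamma pattern on the base scale z: k * z**(alpha * lam) * exp(-lam * z)."""
    return k * z ** (alpha * lam) * math.exp(-lam * z)

def canonical_scale(z, alpha):
    """Log-linear canonical scale T_z = z - alpha * log(z)."""
    return z - alpha * math.log(z)

def canonical_q(T, lam, k=1.0):
    """Exponential form k * exp(-lam * T) on the canonical scale."""
    return k * math.exp(-lam * T)

# Assumed values for illustration only.
alpha, lam, a = 2.0, 1.5, 0.4
zs = [0.5, 1.0, 2.0, 4.0, 8.0]

# The two expressions describe the same pattern.
same_pattern = all(
    math.isclose(gamma_q(z, alpha, lam), canonical_q(canonical_scale(z, alpha), lam))
    for z in zs
)

# Shift on the canonical scale: the ratio is the same at every z.
ratios_T = [
    canonical_q(canonical_scale(z, alpha) + a, lam)
    / canonical_q(canonical_scale(z, alpha), lam)
    for z in zs
]

# Shift on the base scale: the ratio changes with z.
ratios_z = [gamma_q(z + a, alpha, lam) / gamma_q(z, alpha, lam) for z in zs]

print(same_pattern, ratios_T, ratios_z)
```

The constant ratio on the canonical scale equals exp(-λa), the adjustment carried by k_a.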

Conserved average values
Stretch invariance means that multiplying the canonical scale by a constant, $T_z \to bT_z$, does not change probability pattern. This condition for stretch invariance associates with the invariance of the average value.
To begin, note that for the incremental measure $d\psi_z = dT_z$, the constant in eqn 5 to satisfy the conservation of total probability is $k = \lambda$, because $\int_0^\infty \lambda e^{-\lambda T_z}\,dT_z = 1$, when integrating over $T_z$.
Next, define $\langle X \rangle_\psi$ as the average value of $X$ with respect to the incremental measure $d\psi_z$. Then the average of $\lambda T_z$ with respect to $dT_z$ is

$$\lambda \langle T \rangle_T = \int_0^\infty \lambda T_z\,\lambda e^{-\lambda T_z}\,dT_z = 1. \tag{7}$$

The parameter $\lambda$ must satisfy the equality. This invariance of $\lambda \langle T \rangle_T$ implies that any stretch transformation $T_z \to bT_z$ will be canceled by $\lambda \to \lambda/b$. See Appendix A for further details.
We may consider stretch invariance as a primary attribute that leads to the invariance of the average value, $\lambda \langle T \rangle_T$. Or we may consider invariance of the average value as a primary attribute that leads to stretch invariance.
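A numerical check of the conserved average may help here. The sketch below estimates the average of the canonical scale under the exponential form by a midpoint rule; the function names, truncation point, and step count are assumptions chosen for illustration. After a stretch by b, the stretched scale is exponential with rate λ/b, so the product of rate and average should remain one.

```python
import math

def mean_canonical(lam, n=50000):
    """Midpoint-rule estimate of the average of T under lam * exp(-lam * T) dT."""
    upper = 50.0 / lam  # truncation point; the neglected tail is of order exp(-50)
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t * lam * math.exp(-lam * t) * h
    return total

def conserved_average(lam, b=1.0):
    """Rate times average for the stretched scale b * T, whose rate is lam / b."""
    lam_b = lam / b
    return lam_b * mean_canonical(lam_b)

# Assumed rates and stretch factors for illustration only.
for lam, b in [(0.5, 1.0), (2.0, 3.0), (7.0, 0.25)]:
    print(lam, b, conserved_average(lam, b))  # each value should be close to one
```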

Alternative measures
Stretch invariance holds with respect to alternative measures, $d\psi_z \ne dT_z$. Note that for $q_z$ in eqn 5, the conservation of total probability fixes the value of $k$, because we must have

$$\int q_z\,d\psi_z = k_\psi \int e^{-\lambda T_z}\,d\psi_z = 1.$$

The average value of $\lambda T_z$ with respect to $d\psi_z$ is

$$\lambda \langle T \rangle_\psi = \int \lambda T_z\,k_\psi e^{-\lambda T_z}\,d\psi_z.$$

Here, we do not have any guaranteed value of $\lambda \langle T \rangle_\psi$, because it will vary with the choice of the measure $d\psi_z$. If we assume that $\langle T \rangle_\psi$ is a conserved quantity, then $\lambda$ must be chosen to satisfy that constraint, and, from the fact that $\lambda T_z$ occurs as a pair, $\lambda \langle T \rangle_\psi$ is a conserved quantity. The conservation of $\lambda \langle T \rangle_\psi$ leads to stretch invariance, as in the prior section. Equivalently, stretch invariance leads to the conservation of the average value.

Example: the gamma distribution
The gamma distribution from the prior section provides an example. If we transform the base scale by a stretch factor, $z \to bz$, then

$$q_z = k (bz)^{\alpha\lambda} e^{-\lambda b z}.$$

There is no altered value of $\lambda$ for which this expression leaves $q_z$ invariant over all $z$. By contrast, if we stretch with respect to the canonical scale, $T_z \to bT_z$, in which $T_z = z - \alpha \log z$ for the gamma distribution, then the compensating change $\lambda \to \lambda/b$ gives

$$q_z = k e^{-(\lambda/b)\,bT_z} = k e^{-\lambda(z - \alpha \log z)}.$$

Thus, if we assume that the distribution is stretch invariant with respect to $dz$, then the average value $\lambda \langle T \rangle_z = \lambda \langle z - \alpha \log z \rangle_z$ is a conserved quantity. Alternatively, if we assume that the average value is a conserved quantity, then stretch invariance of the canonical scale follows.
In this example of the gamma distribution, conservation of the average value with respect to the canonical scale is associated with conservation of a linear combination of the arithmetic mean, $\langle z \rangle_z$, and the logarithm of the geometric mean, $\langle \log z \rangle_z$, with respect to the underlying values, $z$. In statistical theory, one would say that the arithmetic and geometric means are sufficient statistics for the gamma distribution.

Relation between alternative measures
We can relate alternative measures to the canonical scale by $dT_z = T'\,d\psi_z$, in which $T' = |dT_z/d\psi_z|$ is the absolute value of the rate of change of the canonical scale with respect to the alternative scale. Starting with eqn 7 and substituting $dT_z = T'\,d\psi_z$, we have

$$\lambda \langle T T' \rangle_\psi = \int \lambda T_z\,\lambda e^{-\lambda T_z}\,T'\,d\psi_z = 1.$$

Thus, we recover a universally conserved quantity with respect to any valid alternative measure, $d\psi_z$.

Entropy
Entropy is defined as the average value of $-\log q_z$. From the canonical form of $q_z$ in eqn 2, we have

$$-\log q_z = \lambda T_z - \log k_\psi. \tag{8}$$

Average values depend on the incremental measure, $d\psi_z$, so we may write entropy 11 as

$$\langle -\log q_z \rangle_\psi = \lambda \langle T \rangle_\psi - \log k_\psi.$$

The value of $\log k_\psi$ is set by the conservation of total probability, and $\lambda$ is set by stretch invariance. The value of $\langle T \rangle_\psi$ varies according to the measure $d\psi_z$. Thus, the entropy is simply an expression of the average value of the canonical scale, $T$, with respect to some incremental measurement scale, $\psi$, adjusted by a term for the conservation of total probability, $k$.
When $\psi \equiv T$, then $k_\psi = \lambda$, and we have the classic result for the exponential distribution

$$\langle -\log q_z \rangle_T = 1 - \log \lambda,$$

in which the conserved value $\lambda \langle T \rangle_T = 1$ was given in eqn 7 as a consequence of stretch invariance.
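For the case in which the measure is the canonical scale itself, the normalizing constant equals λ and the conserved average equals one, so the entropy expression reduces to 1 - log λ. The following sketch checks that value by direct numerical averaging of -log q; the function name, truncation point, and step count are assumptions for illustration.

```python
import math

def entropy_exponential(lam, n=50000):
    """Midpoint-rule estimate of the average of -log(q) for q = lam * exp(-lam * T)."""
    upper = 50.0 / lam  # truncation point; the neglected tail is of order exp(-50)
    h = upper / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        q = lam * math.exp(-lam * t)
        total += -math.log(q) * q * h
    return total

# Assumed rates for illustration only.
for lam in (0.5, 1.0, 2.0):
    print(lam, entropy_exponential(lam), 1.0 - math.log(lam))
```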

Cumulative measure
Shift and stretch invariance lead to an interesting relation between $-\log q_z$ and the scale at which probability accumulates. From eqn 8, we have

$$-d\log q_z = \lambda\,dT_z.$$

Multiplying both sides by $q_z$, the accumulation of probability with each increment of the associated measure is

$$-q_z\,d\log q_z = \lambda q_z\,dT_z.$$

The logarithmic form for the cumulative measure of probability simplifies to

$$-q_z\,d\log q_z = -dq_z.$$

This expression connects the probability weighting, $q_z$, for each incremental measure, to the rate at which probability accumulates in each increment, $dq_z = -\lambda q_z\,dT_z$. This special relation follows from the expression for $q_z$ in eqn 2, arising from shift and stretch invariance and the consequent canonical exponential form.

Affine invariance and the common scales
Probability patterns are invariant to shift and stretch of the canonical scale, T z . Thus, affine transformations T z → a + bT z define a group of related canonical scales. In previous work, we showed that essentially all commonly observed probability patterns arise from a simple affine group of canonical scales 7-9 . This section briefly summarizes the concept of affine invariant canonical scales. Appendix B provides some examples.
A canonical scale, $T$, is affine invariant to a transformation, $G$, if

$$T[G(z)] = a + bT(z)$$

for some constants $a$ and $b$. We can abbreviate this notion of affine invariance as

$$T \circ G \sim T, \tag{9}$$

in which "∼" means affine invariance in the sense of equivalence for some constants $a$ and $b$.
We can apply the transformation $G$ to both sides of eqn 9, yielding the new invariance $T \circ G \circ G \sim T \circ G$. In general, we can apply the transformation $G$ repeatedly to each side any number of times, so that

$$T \circ G^n \sim T \circ G^m$$

for any nonnegative integers $n$ and $m$. Repeated application of $G$ generates a group of invariances, that is, a symmetry group. Often, in practical application, the base invariance in eqn 9 does not hold, but asymptotic invariance holds for large $n$. Asymptotic invariance is a key aspect of pattern 12.
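A small numerical test can make the notion of affine invariance concrete. The sketch reads the relation T∘G ∼ T as T(G(z)) = a + bT(z) for constants a and b, fits those constants from two points, and checks the remaining points. The logarithmic scale, the power and shift transformations, and all function names are hypothetical examples chosen here, not cases taken from the article's appendix.

```python
import math

def affine_constants(T, G, z1, z2):
    """Solve T(G(z)) = a + b * T(z) for (a, b) using two reference points."""
    b = (T(G(z2)) - T(G(z1))) / (T(z2) - T(z1))
    a = T(G(z1)) - b * T(z1)
    return a, b

def is_affine_invariant(T, G, zs, tol=1e-9):
    """True when one pair (a, b) satisfies T(G(z)) = a + b * T(z) at every z in zs."""
    a, b = affine_constants(T, G, zs[0], zs[1])
    return all(abs(T(G(z)) - (a + b * T(z))) < tol for z in zs)

zs = [1.5, 2.0, 3.0, 5.0, 9.0]

def power(z):
    return z ** 2    # hypothetical transformation G

def power_twice(z):
    return power(power(z))  # repeated application of G

def add_one(z):
    return z + 1.0   # hypothetical transformation that breaks the invariance

print(is_affine_invariant(math.log, power, zs))        # log(z**2) = 2 log z
print(is_affine_invariant(math.log, power_twice, zs))  # log(z**4) = 4 log z
print(is_affine_invariant(math.log, add_one, zs))      # no constants a, b work
```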

ROTATIONAL INVARIANCE AND THE GAUSSIAN RADIAL MEASURE
The following sections provide a derivation of the Gaussian form and some examples. This section highlights a few results before turning to the derivation.
Rotational invariance transforms the total probability $q_z\,dT_z$ from the canonical exponential form into the canonical Gaussian form

$$q_z\,d\psi_z = v e^{-\pi v^2 R_z^2}\,dR_z. \tag{10}$$

This transformation follows from the substitution $\lambda T_z \to \pi v^2 R_z^2$, in which the stretch-invariant canonical scale, $\lambda T_z$, becomes the stretch-invariant circular area, $\pi v^2 R_z^2$, with squared radius $v^2 R_z^2$. The new incremental scale, $v\,dR_z$, is the stretch-invariant Gaussian radial measure.
We can, without loss of generality, let $v = 1$, and write $\Lambda = \pi R_z^2$ as the area of a circle. Thus the canonical Gaussian form describes the probability, $-\log q_z = \Lambda$, in terms of the area of a circle, $\Lambda$, and the incremental measurement scale, $d\psi_z$, in terms of the radial increments, $dR_z$. Feynman 3 noted the relation between entropy, radial measure, and circular area. In my notation, that relation may be summarized as

$$q_z\,d\psi_z = e^{-\Lambda}\,dR_z = e^{-\pi R_z^2}\,dR_z. \tag{11}$$

However, Feynman considered the circular expression of entropy as a consequence of the underlying notion of statistical mechanics. Thus, his derivation followed from an underlying canonical ensemble of particles. By contrast, my framework derives from primary underlying invariances. An underlying invariance of rotation leads to the natural Gaussian expression of circular scaling. To understand how rotational invariance leads to the Gaussian form, it is useful to consider a second parametric input dimension, $\theta$, that describes the angle of rotation 13. Invariance with respect to rotation means that the probability pattern that relates $q(z, \theta)$ to $\psi(z, \theta)$ is invariant to the angle of rotation.

Gaussian distribution
I now show that rotational invariance transforms the canonical shift- and stretch-invariant exponential form into the Gaussian form, as in eqn 10. To begin, express the incremental measure in terms of the Gaussian radial measure as $\lambda\,dT_z = \pi v^2\,dR_z^2 = 2\pi v^2 R_z\,dR_z$, from which the canonical exponential form $q_z\,dT_z = \lambda e^{-\lambda T_z}\,dT_z$ may be expressed in terms of the radial measure as

$$q_z\,dT_z = 2\pi v R_z\,e^{-\pi v^2 R_z^2}\,v\,dR_z. \tag{12}$$

Rotational invariance means that for each radial increment, $v\,dR_z$, the total probability in that increment given in eqn 12 is spread uniformly over the circumference $2\pi v R_z$ of the circle at radius $vR_z$ from a central location.
Uniformity over the circumference implies that we can define a unit of incremental length along the circumferential path with a fraction $1/2\pi v R_z$ of the total probability in the circumferential shell of width $v\,dR_z$. Thus, the probability along an increment $v\,dR_z$ of a radial vector follows the Gaussian distribution

$$q_z\,v\,dR_z = v e^{-\pi v^2 R_z^2}\,dR_z$$

invariantly of the angle of orientation of the radial vector.
Here, the total probability of the original exponential form, q z dT z , is spread evenly over the two-dimensional parameter space (z, θ) that includes all rotational orientations. The Gaussian expression describes the distribution of probability along each radial vector, in which a vector intersects a constant-sized area of each circumferential shell independently of distance from the origin.
The Gaussian distribution varies over all positive and negative values, R z ∈ (−∞, ∞), corresponding to an initial exponential distribution in squared radii, R 2 z = T z ∈ (0, ∞). We can think of radial vectors as taking positive or negative values according to their orientation in the upper or lower half planes.
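The link between an exponential distribution of squared radii and a Gaussian distribution along any fixed direction can be explored by simulation. The sketch below is an illustration under stated assumptions, not the article's derivation: it draws squared radii from an exponential distribution with rate λ, draws rotation angles uniformly, and projects each point onto one axis. For a Gaussian with λ = 1/(2σ²), the projected values should have variance 1/(2λ) and a ratio of fourth central moment to squared variance near three. The function names, seed, and sample size are hypothetical.

```python
import math
import random

def radial_projection_samples(lam, n, seed=0):
    """Project points with exponential squared radius and uniform angle onto one axis."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        T = rng.expovariate(lam)                 # squared radius on the canonical scale
        theta = rng.uniform(0.0, 2.0 * math.pi)  # rotation angle, uniform by invariance
        xs.append(math.sqrt(T) * math.cos(theta))
    return xs

def central_moment(xs, power):
    """Average of (x - mean)**power over the sample."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** power for x in xs) / len(xs)

# Assumed values for illustration only.
lam = 1.0
samples = radial_projection_samples(lam, 100_000)
variance = central_moment(samples, 2)
kurtosis = central_moment(samples, 4) / variance ** 2

print(variance, 1.0 / (2.0 * lam))  # sample variance and the Gaussian expectation
print(kurtosis)                      # a Gaussian has a value of three
```

Matching two moments does not prove the Gaussian form, so the simulation should be read as a consistency check only.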

Radial shift and stretch invariance
The radial value, $R_z$, describes distance from the central location. Thus, the average radial value is zero, $\langle R \rangle_R = 0$, when evaluated over all positive and negative radial values. Shift invariance associates with no change in radial distance as the frame of reference shifts the location of the center of the circle to maintain constant radii.
Stretch invariance associates with the conserved value of the average circular area

$$\langle 2\pi v^2 R_z^2 \rangle_R = 2\pi v^2 \sigma^2 = 1,$$

in which the variance, $\sigma^2$, is traditionally defined as the average of the squared deviations from the central location. Here, we have squared radial deviations from the center of the circle averaged over the incremental radial measure, $dR_z$. When $\lambda = v^2 = 1$, we have $\sigma^2 = 1/2\pi$, and we obtain the elegant expression of the Gaussian as the relation between circular area and radial increments in eqn 11. This result corresponds to an average circular area of one, because $\langle 2\pi R^2 \rangle = 2\pi\sigma^2 = 1$.
It is common to express the Gaussian in the standard normal form, with $\sigma^2 = 1$, which yields $v^2 = 1/2\pi$, and the associated probability expression obtained by substituting this value into eqn 10.
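The relation between the stretch parameter and the variance can be verified by numerical integration. The sketch assumes that the Gaussian radial form has height exp(-π v² R²) with radial measure v dR, in line with the substitution of λT by π v² R² described earlier. The function name, integration limits, and step count are assumptions for illustration.

```python
import math

def gaussian_radial_moments(v, n=40000):
    """Midpoint estimates of total probability and variance for v * exp(-pi v^2 R^2) dR."""
    limit = 8.0 / v  # integrate over [-limit, limit]; the tails beyond are negligible
    h = 2.0 * limit / n
    total = 0.0
    second = 0.0
    for i in range(n):
        r = -limit + (i + 0.5) * h
        weight = v * math.exp(-math.pi * v * v * r * r) * h
        total += weight
        second += r * r * weight
    return total, second

# Case v**2 = 1: variance should be 1 / (2 pi).
total_unit, var_unit = gaussian_radial_moments(1.0)

# Case v**2 = 1 / (2 pi): variance should be one, the standard normal form.
total_std, var_std = gaussian_radial_moments(math.sqrt(1.0 / (2.0 * math.pi)))

print(total_unit, var_unit, 1.0 / (2.0 * math.pi))
print(total_std, var_std)
```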

Transforming distributions to canonical Gaussian form
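Rotational invariance transforms the canonical exponential form into the Gaussian form, as in eqn 10. If we equate $R_z = \sqrt{T_z}$ and $\lambda = \pi v^2$, we can write the Gaussian form as

$$q_z\,d\psi_z = \sqrt{\frac{\lambda}{\pi}}\,e^{-\lambda T_z}\,dR_z = \frac{1}{\hat{\sigma}\sqrt{2\pi}}\,e^{-T_z/2\hat{\sigma}^2}\,dR_z, \tag{13}$$

in which $\hat{\sigma}^2 = 1/2\lambda$ is a generalized notion of the variance. The expression in eqn 13 may require a shift of $T_z$ so that $T_z \in (0, \infty)$, with associated radial values $R_z = \pm\sqrt{T_z}$. The nature of the required shift is most easily shown by example.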

Example: the gamma distribution
The gamma distribution may be expressed as $q_z\,d\psi_z$ with respect to the parameter $z$ when we set $T_z = z - \alpha \log z$ and $d\psi_z = dz$, yielding

$$q_z\,d\psi_z = k e^{-\lambda(z - \alpha \log z)}\,dz$$

for $z \ge 0$. To transform this expression to the Gaussian radial scale, we must shift $T_z$ so that the corresponding value of $R_z$ describes a monotonically increasing radial distance from a central location.
For the gamma distribution, if we use the shift T z → T z − α = (z − α log z) − α for α ≥ 0, then the minimum of T z and the associated maximum of q z correspond to R z = 0, which is what we need to transform into the Gaussian form. In particular, the parametric plot of the points (±R z , q z ) with respect to the parameter z ∈ (0, ∞) follows the Gaussian pattern.
In addition, the parametric plot of the points (T z , q z ) follows the exponential-Boltzmann pattern. Thus we have a parametric description of the probability pattern q z in terms of three alternative scaling relations for the underlying parameter z: the measure dz corresponds to the value of z itself and the gamma pattern, the measure dR z corresponds to the Gaussian radial measure, and the measure dT z corresponds to the logarithmic scaling of q z and the exponential-Boltzmann pattern. Each measure expresses particular invariances of scale.
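The three parametric views of the gamma pattern can be computed side by side. The sketch below rests on an assumption that should be stated plainly: it shifts the canonical scale by its minimum value, which for T_z = z - α log z occurs at z = α, following the shift-by-minimum step described for the beta distribution. Heights are expressed relative to the peak, so the constant k drops out. All function names and parameter values are hypothetical.

```python
import math

def gamma_views(z, alpha, lam):
    """Return (shifted T, signed radius R, relative height q) for parameter z."""
    T = z - alpha * math.log(z)
    T_min = alpha - alpha * math.log(alpha)  # assumed shift: value of T at z = alpha
    T_shifted = max(T - T_min, 0.0)          # guard against tiny negative rounding
    R = math.copysign(math.sqrt(T_shifted), z - alpha)
    q = math.exp(-lam * T_shifted)           # exponential-Boltzmann view of the height
    return T_shifted, R, q

def gamma_relative_height(z, alpha, lam):
    """Gamma pattern z**(alpha*lam) * exp(-lam*z) divided by its value at z = alpha."""
    def g(u):
        return u ** (alpha * lam) * math.exp(-lam * u)
    return g(z) / g(alpha)

# Assumed values for illustration only.
alpha, lam = 2.0, 1.5
for z in (0.3, 1.0, 2.0, 3.5, 7.0):
    T_shifted, R, q = gamma_views(z, alpha, lam)
    # Columns: base scale z, canonical scale, radial scale, height three ways.
    print(z, T_shifted, R, q, math.exp(-lam * R * R), gamma_relative_height(z, alpha, lam))
```

Plotting q against z, against the shifted T, and against ±R would give the gamma, exponential, and Gaussian shapes, respectively, for the same set of heights.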

Example: the beta distribution
A common form of the beta distribution is

$$q_z = k z^{\alpha - 1}(1 - z)^{\beta - 1}$$

for $z \in (0, 1)$. We can express this distribution in canonical exponential form $ke^{-\lambda T_z}$ by the scaling relation

$$\lambda T_z = -(\alpha - 1)\log z - (\beta - 1)\log(1 - z),$$

with $\lambda > 0$. For $\alpha$ and $\beta$ both greater than one, this scaling defines a log-linear-log pattern 8, in the sense that $-\lambda T_z$ scales logarithmically near the endpoints of zero and one, and transitions to a linear scaling interiorly near the minimum of $T_z$ at

$$z^* = \frac{\alpha - 1}{\alpha + \beta - 2}.$$

When $0 < \alpha < 1$, the minimum (extremum) of $T_z$ is at $z^* = 0$. For our purposes, it is useful to let $\alpha = \lambda$ for $\lambda > 0$, and assume $\beta > 1$. Define $T^*$ as the value of $T_z$ evaluated at $z^*$. Thus $T^*$ is the minimum value of $T_z$, and $T_z$ increases monotonically from its minimum. If we shift $T_z$ by its minimum, $T_z \to T_z - T^*$, and use the shifted value of $T_z$, we obtain the three standard forms of a distribution in terms of the parameter $z \in (0, 1)$, as follows.
The measure dz and parametric plot (z, q z ) is the standard beta distribution form, the measure dR z and parametric plot (±R z , q z ) is the standard Gaussian form, and the measure dT z and parametric plot (T z , q z ) is the standard exponential-Boltzmann form.

ROTATIONAL INVARIANCE AND PARTITIONS
The Gaussian radial measure often reveals the further underlying invariances that shape pattern. Those invariances appear from the natural way in which the radial measure can be partitioned into additive components.

Overview
Conserved quantities may arise from an underlying combination of processes. For example, we might know that a conserved quantity, R 2 = x + y, arises as the sum of two underlying processes with values x and y. We do not know x and y, only that their conserved sum is invariantly equal to R 2 .
The partition of an invariant quantity into a sum may be interpreted as rotational invariance, because

$$R^2 = x + y = \left(\sqrt{x}\right)^2 + \left(\sqrt{y}\right)^2$$

defines a circle with conserved radius $R$ along the positive and negative values of the coordinates $(\sqrt{x}, \sqrt{y})$. That form of rotational invariance explains much of observed pattern, many of the classical results in probability and dynamics, and the expression of those results in the context of mechanics. The partition can be extended to a multidimensional sphere of radius $R$ as

$$R^2 = \sum_i \left(\sqrt{x_i}\right)^2. \tag{15}$$

One can think of rotational invariance in two different ways. First, one may start with a variety of different dimensions, with no conservation in any particular dimension. However, the aggregate may satisfy a conserved total that imposes rotational invariance among the components. Second, every conserved quantity can be partitioned into various additive components. That partition starts with a conserved quantity and then, by adding dimensions that satisfy the total conservation, one induces a higher dimensional rotational invariance. Thus, every conserved quantity associates with higher-dimensional rotational invariance.

Rotational invariance of conserved probability
In the probability expression $q_z\,d\psi_z$, suppose the incremental measure $d\psi_z$ is constant, and we have a finite number of values of $z$ with positive probability. We may write the conserved total probability as $\sum_z q_z = 1$. Then from eqn 15, we can write the conservation of total probability as a partition of $R^2 = 1$ confined to the surface of a multidimensional sphere,

$$\sum_z \left(\sqrt{q_z}\right)^2 = 1.$$

There is a natural square root spherical coordinate system, $\sqrt{q_z}$, in which to express conserved probability.
Square roots of probabilities arise in a variety of fundamental expressions of physics, statistics, and probability theory 14,15 .
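The square root coordinates can be illustrated with a few lines of code. The sketch places a discrete distribution on the unit sphere and then rotates two of the coordinates; the squared coordinates after rotation still sum to one, so they remain a valid set of probabilities. The example distribution, the rotation angle, and the function names are hypothetical.

```python
import math

def sqrt_coordinates(probabilities):
    """Map probabilities q_z to spherical coordinates sqrt(q_z)."""
    return [math.sqrt(q) for q in probabilities]

def squared_radius(coords):
    """Sum of squared coordinates; equals one for conserved total probability."""
    return sum(c * c for c in coords)

def rotate_pair(coords, i, j, theta):
    """Rotate coordinates i and j by angle theta, leaving the others unchanged."""
    rotated = list(coords)
    ci, cj = coords[i], coords[j]
    rotated[i] = ci * math.cos(theta) - cj * math.sin(theta)
    rotated[j] = ci * math.sin(theta) + cj * math.cos(theta)
    return rotated

# Assumed example distribution and angle.
q = [0.1, 0.2, 0.3, 0.4]
coords = sqrt_coordinates(q)
rotated = rotate_pair(coords, 0, 2, 0.35)
q_rotated = [c * c for c in rotated]

print(squared_radius(coords), squared_radius(rotated))
print(q_rotated, sum(q_rotated))
```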

Partition of the canonical scale
The canonical scale equals the square of the Gaussian radial scale, $T_z = R_z^2$. Thus, we can write a two-dimensional partition from eqn 15 as

$$T_z = R_z^2 = \left(\sqrt{x_1}\right)^2 + \left(\sqrt{x_2}\right)^2.$$

Define the two dimensions as

$$\sqrt{x_1} = w \equiv w(z, s), \qquad \sqrt{x_2} = \dot{w} \equiv \dot{w}(z, s),$$

yielding the partition for the canonical scale as

$$T_z = w^2 + \dot{w}^2.$$

This expression takes the input parameter $z$ and partitions the resulting value of $T_z = R_z^2$ into a circle of radius $R_z$ along the path $(w, \dot{w})$ traced by the parameter $s$.
The radial distance, $R_z$, and associated canonical scale value, $T_z = R_z^2$, are invariant with respect to $s$. In general, for each dimension we add to a partition of $T_z$, we can create an additional invariance with respect to a new parameter.

Partition into location and rate
A common partition separates the radius into dimensions of location and rate. Define $\dot{w} = \partial w/\partial s$ as the rate of change in the location $w$ with respect to the parameter $s$. Then we can use the notational equivalence $H_z \equiv T_z = R_z^2$ to emphasize the relation to a classic expression in physics for a conserved Hamiltonian as

$$H_z = w^2 + \dot{w}^2, \tag{17}$$

in which this conserved square of the radial distance is partitioned into the sum of a squared location, $w^2$, and a squared rate of change in location, $\dot{w}^2$. The squared rate, or velocity, arises as a geometric consequence of the Pythagorean partitioning of a squared radial distance into squared component dimensions. Many extensions of this Hamiltonian interpretation can be found in standard textbooks of physics. With the Hamiltonian notation, $H_z \equiv T_z$, our canonical exponential-Boltzmann distribution is

$$q_z\,d\psi_z = \lambda e^{-\lambda H_z}\,dH_z.$$

The value $H$ is often interpreted as energy, with $dH$ as the Gibbs measure. For the simple circular partition of eqn 17, the total energy is often split into potential, $w^2$, and kinetic, $\dot{w}^2$, components.
In this article, I emphasize the underlying invariances and their geometric relations as fundamental. From my perspective, the interpretation of energy and its components are simply one way in which to describe the fundamental invariances.
The Hamiltonian interpretation is, however, particularly useful. It leads to a natural expression of dynamics with respect to underlying invariance. For example, we can partition a probability pattern into its currently observable location and its rate of change, $e^{-\lambda H_z} = e^{-\lambda w^2} e^{-\lambda \dot{w}^2}$.
The first component, $w^2$, may be interpreted as the observable state of the probability pattern at a particular time. The second component, $\dot{w}^2$, may be interpreted as the rate of change in the probability pattern. Invariance applies to the combination of location and rate of change, rather than to either component alone. Thus, invariance does not imply equilibrium.

SUMMARY OF INVARIANCES
Probability patterns, $q_z$, express invariances of shift and stretch with respect to a canonical scale, $T_z$. Those invariances lead to an exponential form with respect to various incremental measures, $d\psi_z$. This probability expression may be regarded parametrically with respect to $z$. The parametric view splits the probability pattern into two scaling relations, $q_z$ and $\psi_z$, with respect to $z$, forming the parametric curve defined by the points $(\psi_z, q_z)$.
For the canonical scale, $T_z$, we may consider the sorts of transformations that leave the scale shift and stretch (affine) invariant, $T \circ G \sim T$, as in eqn 9. Essentially all of the canonical scales of common probability patterns 7-9 arise from the affine invariance of $T$ and a few simple types of underlying invariance with respect to $z$.
For the incremental measure scale, $d\psi_z$, four alternatives highlight different aspects of probability pattern and scale.
The scale $dz$ leads to the traditional expression of probability pattern, $q_z\,dz$, which highlights the invariances that set the canonical scale, $T_z$.
The scale $dT_z$ leads to the universal exponential-Boltzmann form, $q_z\,dT_z$, which highlights the fundamental shift and stretch invariances in relation to the conservation of total probability. This conservation of total probability may alternatively be described by a cumulative probability measure, $dq_z = -\lambda q_z\,dT_z$.
Finally, rotational invariance leads to the Gaussian radial measure, $dR_z$. That radial measure transforms many probability scalings, $q_z$, into Gaussian distributions, $q_z\,dR_z$.
Invariances typically associate with conserved quantities 16 . For example, the rotational invariance of the Gaussian radial measure is equivalent to the conservation of the average area circumscribed by the radial measure. That average circular area is proportional to the traditional definition of the variance. Thus, rotational invariance and conserved variance are equivalent in the Gaussian form.
The Gaussian radial measure often reveals the further underlying invariances that shape pattern. That insight follows from the natural way in which the radial measure can be partitioned into additive components.

THE PRIMACY OF INVARIANCE AND SYMMETRY
It was Einstein who radically changed the way people thought about nature, moving away from the mechanical viewpoint of the nineteenth century toward the elegant contemplation of the underlying symmetry principles of the laws of physics in the twentieth century (Lederman and Hill 17 , p. 153).
The exponential-Boltzmann distribution in eqn 2 provides the basis for statistical mechanics, Jaynesian maximum entropy, and my own invariance framework. These approaches derive the exponential form from different assumptions. The underlying assumptions determine how far one may extend the exponential-Boltzmann form toward explaining the variety of commonly observed patterns.
I claim that one must begin solely with the fundamental invariances in order to develop a proper understanding of the full range of common patterns. By contrast, statistical mechanics and Jaynesian maximum entropy begin from particular assumptions that only partially reflect the deeper underlying invariances.

Statistical mechanics
Statistical mechanics typically begins with an assumed, unseen ensemble of microscopic particles. Each particle is often regarded as identical in nature to the others. Statistical averages over the underlying microscopic ensemble lead to a macroscopic distribution of measurable quantities. The exponential-Boltzmann distribution is the basic equilibrium macroscopic probability pattern.
In contrast with the mechanical perspective of statistical physics, my approach begins with fundamental underlying invariances (symmetries).
Both approaches arrive at roughly the same intermediate point of the exponential-Boltzmann form. That canonical form expresses essentially the same invariances, no matter whether one begins with an underlying mechanical perspective or an underlying invariance perspective.
From my point of view, the underlying mechanical perspective happens to be one particular way in which to uncover the basic invariances that shape pattern. But the mechanical perspective has limitations associated with the unnecessarily particular assumptions made about the underlying microscopic ensemble.
For example, to derive the log-linear scaling pattern that characterizes the commonly observed gamma distribution in eqn 6, a mechanical perspective must make special assumptions about the interactions between the underlying microscopic particles.
Some may consider the demand for explicit mechanical assumptions about the underlying particles to be a benefit. But in practice, those explicit assumptions are almost certainly false, and instead simply serve as a method by which to point in the direction of the deeper underlying invariance that shapes the scaling relations and associated probability patterns.
I prefer to start with the deeper abstract structure shaped by the key invariances. Then one may consider the variety of different particular mechanical assumptions that lead to the key invariances. Each set of particular assumptions that are consistent with the key invariances define a special case.
There have been many powerful extensions to statistical mechanics in recent years. Examples include generalized entropies based on assumptions about underlying particle mechanics 18 , superstatistics as the average over heterogeneous microscopic sets 19 , and invariance principles applied to the mechanical aspects of particle interactions 20 .
My own invariance and scaling approach subsumes essentially all of those results in a simple and elegant way, and goes much further with regard to providing a systematic understanding of the commonly observed patterns 7-9 . However, it remains a matter of opinion whether an underlying mechanical framework based on an explicit microscopic ensemble is better or worse than a more abstract approach based purely on invariances.

Jaynesian maximum entropy
Jaynes 4,5 replaced the old microscopic ensemble of particles and the associated mechanical entropy with a new information entropy. He showed that maximum entropy, in the sense of information rather than particle mechanics, leads to the classic exponential-Boltzmann form. A large literature extends the Jaynesian framework 21 . Axiomatic approaches transcend the original justifications based on intuitive notions of information 22 .
Jaynes' exponential form has a kind of canonical scale, $T_z$. In Jaynes' approach, one sets the average value over the canonical scale to a fixed value, in our notation a fixed value of $\langle T \rangle_z$. That conserved average value defines a constraint (an invariance) that determines the associated probability pattern 23 . The Jaynesian algorithm is the maximization of entropy, subject to a constraint on the average value of some quantity, $T_z$.
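The Jaynesian algorithm can be sketched numerically for a small discrete case. The sketch below assumes $T_z = z$ on the values $z = 0, \ldots, 5$ and a hypothetical target mean of 1.5; the helper names are illustrative. It solves for $\lambda$ by bisection, then compares the entropy of the exponential pattern with that of a perturbed pattern that satisfies the same two constraints.

```python
import math

def exponential_pattern(lam, n):
    """Discrete exponential-Boltzmann pattern q_z = k * exp(-lam * z), z = 0..n-1."""
    weights = [math.exp(-lam * z) for z in range(n)]
    k = 1.0 / sum(weights)
    return [k * x for x in weights]

def mean(q):
    return sum(z * p for z, p in enumerate(q))

def entropy(q):
    return -sum(p * math.log(p) for p in q if p > 0)

def solve_lambda(target_mean, n, lo=-20.0, hi=20.0):
    """Bisection on lam; the mean decreases monotonically as lam increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(exponential_pattern(mid, n)) > target_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, target = 6, 1.5  # hypothetical problem size and constrained average
lam = solve_lambda(target, n)
q_max = exponential_pattern(lam, n)

# Perturbation that preserves total probability and the mean:
# the entries sum to zero, and 0*eps + 1*(-2*eps) + 2*eps = 0.
eps = 0.01
d = [eps, -2 * eps, eps, 0.0, 0.0, 0.0]
q_alt = [p + x for p, x in zip(q_max, d)]

print(entropy(q_max), entropy(q_alt))
```

The perturbed pattern meets both constraints but has lower entropy, as expected when the exponential form is the constrained maximum.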
Jaynes struggled to go beyond the standard constraints of the mean or the variance. Those constraints arise from fixing the average values of $T_z = z$ or $T_z = z^2$, which lead to the associated exponential or Gaussian forms. Jaynes did discuss a variety of additional invariances 6 and associated probability patterns. But he never achieved any systematic understanding of the common invariances and the associated commonly observed patterns and their relations.
I regarded Jaynes' transcendence of the particle-based microscopic ensemble as a strong move in the right direction. I followed that direction for several years 7-9,12 . In my prior work, I developed the intrinsic affine invariance of the canonical scale, $T_z$, with respect to the exponential-Boltzmann distribution of maximum entropy. The recognition of that general affine invariance plus the variety of common invariances of scale 24,25 led to my systematic classification of the common probability patterns and their relationships 7-9 .
In this article, I have taken the next step by doing away with the Jaynesian maximization of entropy. I replaced that maximization with the fundamental invariances of shift and stretch, from which I obtained the canonical exponential-Boltzmann form.
With the exponential-Boltzmann distribution derived from shift and stretch invariance rather than Jaynesian maximum entropy, I added my prior work on the general affine invariance of the canonical scale and the additional particular invariances that define the common scaling relations and probability patterns. We now have a complete system based purely on invariances.

Conclusion
Shift and stretch invariance set the exponential-Boltzmann form of probability patterns. Rotational invariance transforms the exponential pattern into the Gaussian pattern. These fundamental forms define the abstract structure of pattern with respect to a canonical scale.
In a particular application, observable pattern arises by the scaling relation between the natural measurements of that application and the canonical scale. The particular scaling relation derives from the universal affine invariance of the canonical scale and from the additional invariances that arise in the particular application.
Together, these invariances define the commonly observed scaling relations and associated probability patterns. The study of pattern often reduces to the study of how particular generative processes set the particular invariances that define scale.
Diverse and seemingly unrelated generative processes may reduce to the same simple invariance, and thus to the same scaling relation and associated pattern. To test hypotheses about generative process and to understand the diversity of natural pattern, one must understand the central role of invariance. Although that message has been repeated many times, it has yet to be fully deciphered.

APPENDIX A
Below eqn 7, I stated that the average value $\lambda \langle T \rangle_T = 1$ remains unchanged after stretch transformation, $T_z \to bT_z$. This section provides additional details. The problem begins with eqn 7, repeated here

$$\lambda \langle T \rangle_T = \int \lambda^2 T_z e^{-\lambda T_z}\,dT_z = 1.$$

Make the substitution $T_z \to bT_z$, which yields

$$\lambda \langle bT \rangle_T = \int \lambda^2 b^2 T_z e^{-\lambda b T_z}\,dT_z = 1,$$

noting that $T_z \to bT_z$ implies $dT_z \to b\,dT_z$, which explains the origin of the $b^2$ term on the right-hand side. Thus, eqn 7 remains one under stretch transformation, implying that $\langle T \rangle_T = 1/(\lambda b)$.
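The stretch argument can be checked numerically. The sketch below assumes that eqn 7 has the form $\int \lambda^2 T e^{-\lambda T}\,dT = 1$ over $T \ge 0$, and that the stretched density is $\lambda b\, e^{-\lambda b T}\,dT$; the function names, the midpoint integrator, and the parameter values are hypothetical.

```python
import math

def integrate(f, upper, steps=200_000):
    """Midpoint rule on [0, upper]; adequate for these smooth, decaying integrands."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

def scaled_average(lam, b):
    """Integral of (lam*b)^2 * T * exp(-lam*b*T) dT, the stretched form of eqn 7."""
    rate = lam * b
    return integrate(lambda T: rate**2 * T * math.exp(-rate * T), 60.0 / rate)

def average_T(lam, b):
    """Average of T under the stretched density lam*b*exp(-lam*b*T) dT."""
    rate = lam * b
    return integrate(lambda T: T * rate * math.exp(-rate * T), 60.0 / rate)

lam = 0.8  # hypothetical rate parameter
results = {b: (scaled_average(lam, b), average_T(lam, b)) for b in (0.5, 1.0, 3.0)}

for b, (unity, avg) in results.items():
    print(b, unity, avg, 1.0 / (lam * b))
```

For each stretch $b$, the first value stays at one while the average of $T$ tracks $1/(\lambda b)$.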

Primacy of invariance
This article assumes the primacy of shift and stretch invariance. The article then develops the consequences of primary invariance. There are many other ways of understanding the fact that the foundational exponential-Boltzmann distribution expresses shift and stretch invariance, and the Gaussian distribution expresses rotational invariance. One can derive those invariances from other assumptions, rather than assume that they are primary.
Classical statistical mechanics derives shift and stretch invariance as consequences of the aggregate behavior of many particles. Jaynesian maximum entropy derives shift and stretch invariance as consequences of the tendency for entropy to increase plus the assumptions that total probability is conserved and that the average value of some measurement is conserved. In my notation, the conservation of λT z is equivalent to the assumption of stretch invariance. Often, this kind of assumption is similar to various conservation assumptions, such as the conservation of energy.
Another way to derive invariance is by the classic limit theorems of probability. Gnedenko and Kolmogorov 26 beautifully summarized a key aspect: In fact, all epistemologic value of the theory of probability is based on this: that large-scale random phenomena in their collective action create strict, nonrandom regularity.
The limit theorems typically derive from assumptions such as the summation of many independent random components, or in more complicated studies, the aggregation of partially correlated random components. From those assumptions, certain invariances may arise as consequences.
It may seem that the derivation of invariances from more concrete assumptions provides a better approach. But from a mathematical and perhaps ultimate point of view, invariance is often tautologically related to supposedly more concrete assumptions. For example, conservation of energy typically arises as an assumption in many profound physical theories. In those theories, one could choose to say that stretch invariance arises from conservation of energy or, equivalently, that conservation of energy arises from stretch invariance. It is not at all clear how we can know which is primary, because mathematically they are often effectively the same assumption.
My point of departure is the opening quote from Weinberg, who based his statement on the overwhelming success of 20th century physics. That success has partly (mostly?) been driven by studying the consequences that follow from assuming various primary invariances. The ultimate basis for those primary invariances remains unclear, but the profoundly successful consequences of proceeding in this way are very clear. These issues are very important. However, a proper discussion would require probing the basis of modern physics as well as many deep recent developments in mathematics, which is beyond my scope. I simply wanted to analyze what would follow from the assumption of a few simple primary invariances.

Measurement theory
Classical measurement theory develops a rational approach to derive and understand measurement scales 24,25 . Roughly speaking, a measurement scale is defined by the transformations that leave invariant the relevant relations of the measurement process. Different approaches develop that general notion of invariance in different ways or expand into broader aspects of pattern (e.g., Grenander 27 ).
This article concerns probability patterns in relation to scale. The key is that probability patterns remain invariant to affine transformation, that is, to shift and stretch transformations. Thus different measurement scales lead to the same invariant probability pattern if they are affine similar. I discussed the role of affine similarity in several recent articles 7-9 . Here, I briefly highlight the main points.
Start with some notation. Let $T(z) \equiv T$ be a transformation of underlying observations $z$ that define a scale, $T$. Each scale $T$ has the property of being invariant to certain alterations of the underlying observations. Let a candidate alteration of the underlying observation be the generator, $G(z) \equiv G$. Invariance of the scale $T$ to the generator $G$ means that

$$T[G(z)] = T(z),$$

which we can write in simpler notation as

$$T \circ G = T.$$

Sometimes we do not require exact invariance, but only a kind of similarity. In the case of probability patterns, shift and stretch invariance mean that any two scales related by affine transformation, $\tilde{T} = a + bT$, yield the same probability pattern. In other words, probability patterns are invariant to affine transformations of scale. Thus, with regard to the generator $G$, we only require that $T \circ G$ fall within a family of affine transformations of $T$. Thus, we write the conditions for two probability patterns to be invariant to the generator $G$ as $T \circ G = a + bT \sim T$, and thus the key invariance relation for probability patterns is affine similarity expressed as

$$T \circ G \sim T,$$

which was presented in the text as eqn 9. My prior publications fully developed this relation of affine similarity and its consequences for the variety of scales that define the commonly observed probability patterns 7-9 . Appendix B briefly presents a few examples, including the linear-log scale.

APPENDIX B: INVARIANCE AND THE COMMON CANONICAL SCALES
The variety of canonical scales may be understood by the variety of invariances that hold under different circumstances. I introduced the affine invariance of the canonical scale in eqn 9. This section briefly summarizes further aspects of invariance and the common canonical scales. Prior publications provide more detail 7-9 .
Invariance can be studied by partition of the transformation, $z \to T_z$, into two steps, $z \to w \to T_z$. The first transformation expresses intrinsic invariances by the transformation $z \to w(z)$, in which $w$ defines the new base scale consistent with the intrinsic invariances.
The second transformation evaluates only the canonical shift and stretch invariances in relation to the base scale, $w \to a + bw$. This affine transformation of the base scale can be written as $T(w) = a + bw$. We can define $T(w) \equiv T_z$, noting that $w$ is a function of $z$.

Rotational invariance of the base scale
Rotational invariance is perhaps the most common base scale symmetry. In the simplest case, $w(z) = z^2$. If we write $x = z\cos\theta$ and $y = z\sin\theta$, then $x^2 + y^2 = z^2$, and the points $(x, y)$ trace a circle with a radius $z$ that is rotationally invariant to the angle $\theta$. Many probability distributions arise from rotationally invariant base scales, which is why squared values are so common in probability patterns. For example, if $w = z^2$ and $T_z \equiv w$, then the canonical exponential form that follows from shift and stretch invariance of the rotationally invariant base scale is

$$q_z = ke^{-\lambda z^2},$$

which is the Gaussian distribution, as discussed in the text. Note that the word rotation captures an invariance that transcends a purely angular interpretation. Instead, we have component processes or measurements that satisfy an additive invariance constraint. For each final value, $z$, there exist a variety of underlying processes or outcomes that satisfy the invariance $\sum_i x_i^2 = z^2$. The word rotation simply refers to the diversity of underlying Pythagorean partitions that sum to an invariant Euclidean distance. The set of invariant partitions falls on the surface of a sphere. That spherical property leads to the expression of invariant additive partitions in terms of rotation.

General form of base scale invariance
The earlier sections established that the canonical scale of probability patterns is invariant to shift and stretch. Thus we may consider as equivalent any affine transformation of the base scale w → a + bw.
We may describe additional invariances of $w$, such as rotational invariance, in the general form

$$w \circ G \sim w, \tag{18}$$

in which $w \circ G \equiv w[G(z)]$. We read eqn 18 as: the base scale $w$ is invariant to transformation by $G$, such that $w \circ G = a + bw$ for some constants $a$ and $b$. The symbol "$\sim$" abbreviates the affine invariance of $w$.
For example, we may express the rotational invariance of the prior section as $w(z, \theta) = z^2(\cos^2\theta + \sin^2\theta) = z^2$, because $\cos^2\theta + \sin^2\theta = 1$ for any value of $\theta$. We can describe rotation by the transformation $G(z, \theta) = (z, \theta + \epsilon)$, so that the invariance expression is

$$w \circ G = w(z, \theta + \epsilon) = z^2 \sim w.$$

Thus, the base scale $w$ is affine invariant to the rotational transformation generator, $G$, as in eqn 18. Although this form of rotational invariance seems trivial in this context, it turns out to be the basis for many classical results in probability, dynamics, and statistical mechanics. The invariance expression of eqn 18 sets the conditions for base scale invariances. Although there are many possible base scales, a few dominate the commonly observed patterns 7-9 . In this article, I emphasize the principles of invariance rather than a full discussion of the various common scales.
Earlier, I discussed the log-linear scale associated with the gamma distribution. This section presents the inverse linear-log scale, which is w(z) = α log(1 + βz).
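When $\beta z$ is small, $w$ is approximately $\alpha\beta z$, which is linear in $z$. When $\beta z$ is large, $w$ is approximately $\alpha \log(\beta z)$, which is logarithmic in $z$. This linear-log scale is affine invariant to transformations of the form $G(z) = [(1 + \beta z)^{\alpha} - 1]/\beta$, because $w \circ G = \alpha \log[(1 + \beta z)^{\alpha}] = \alpha w \sim w$. The transformation, $G$, is linear for small magnitudes of $z$ and power law for large magnitudes of $z$.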
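A short numerical sketch can confirm the affine invariance of the linear-log scale. The generator used here, $G(z) = [(1 + \beta z)^{\alpha} - 1]/\beta$, is an assumed form that satisfies $w \circ G = \alpha w$; the parameter values are hypothetical.

```python
import math

alpha, beta = 2.5, 0.7  # hypothetical parameter values

def w(z):
    """Linear-log base scale."""
    return alpha * math.log(1.0 + beta * z)

def G(z):
    """Assumed generator: roughly linear for small z, power law for large z."""
    return ((1.0 + beta * z) ** alpha - 1.0) / beta

zs = [0.001, 0.1, 1.0, 10.0, 1000.0]

# w(G(z)) / w(z) should equal alpha at every z, so w o G is a pure stretch of w.
ratios = [w(G(z)) / w(z) for z in zs]

# For small z, G(z) is close to alpha * z (the linear regime).
small_z_slope = G(1e-6) / 1e-6

print(ratios, small_z_slope)
```

The ratio is constant across five orders of magnitude in $z$, which is the stretch invariance $w \circ G = \alpha w \sim w$.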
The linear-log base scale, $w$, yields the probability distribution $q_z = ke^{-\lambda w} = k(1 + \beta z)^{-\gamma}$, for $\gamma = \lambda\alpha$. This expression is the commonly observed Lomax or Pareto type II distribution, which is equivalent to an exponential-Boltzmann distribution for small $z$ and a power law distribution in the upper tail for large $z$.
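The two regimes of the Lomax form can be seen in a brief sketch. The parameter values and function names are hypothetical, and the normalizing constant $k$ is set to one because only the shape matters here.

```python
import math

alpha, beta, lam = 1.5, 2.0, 2.0  # hypothetical values
gamma = lam * alpha

def q_from_scale(z):
    """exp(-lam * w) with the linear-log base scale w = alpha * log(1 + beta * z)."""
    return math.exp(-lam * alpha * math.log(1.0 + beta * z))

def q_power(z):
    """The same pattern written as (1 + beta * z) ** (-gamma)."""
    return (1.0 + beta * z) ** (-gamma)

def loglog_slope(z, factor=1.01):
    """Finite-difference slope of log q against log z."""
    return (math.log(q_power(z * factor)) - math.log(q_power(z))) / math.log(factor)

# Small z: close to the exponential form exp(-gamma * beta * z).
small_z = 1e-4
exponential_gap = abs(q_power(small_z) - math.exp(-gamma * beta * small_z))

# Large z: the log-log slope approaches -gamma, a power law tail.
tail_slope = loglog_slope(1e6)

print(exponential_gap, tail_slope)
```

The same expression therefore grades from exponential decay near the origin to a power law with exponent $-\gamma$ in the tail.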
We can combine base scales. For example, if we start with $w_1$, a rotationally invariant scale, $z \to z^2$, and then transform those rotationally invariant values to a linear-log scale, $w_2$, we obtain $w_2[w_1(z)] = \alpha \log(1 + \beta z^2)$. This scale corresponds to the generalized Student's distribution

$$q_z = k(1 + \beta z^2)^{-\gamma}.$$

For small magnitudes of $z$, this distribution is linear in scale and Gaussian in shape. For large magnitudes of $z$, this distribution has power law tails. Thus, a rotationally invariant linear-log scale grades from Gaussian to power law as magnitude increases.

The family of canonical scales
The canonical scale, $T_z$, determines the associated probability pattern, $q_z = ke^{-\lambda T_z}$. What determines the canonical scale? The answer has two parts.
First, each problem begins with a base scale, $w(z) \equiv w$. The base scale arises from the invariances that define the particular problem. Those invariances may come from observation or by assumption. The prior sections gave the examples of rotational invariance, associated with squared-value scaling, and linear to power-law invariance, associated with linear to log scaling. When the base scale lacks intrinsic invariance, we may write $w \equiv z$. Earlier publications provided examples of common base scales 7-9 .
Second, the canonical scale arises by transformation of the base scale, $T_z = T(w)$. The canonical scale must satisfy both the shift and stretch invariance requirements. If the base scale itself satisfies both invariances, then the base scale is the canonical scale, $T_z = w$. In particular, if the probability pattern remains invariant to affine transformations of the base scale $w \to \delta + \gamma w$, then the shift and stretch invariant distribution has the form

$$q_z = ke^{-\lambda w}. \tag{19}$$

Alternatively, $w$ may satisfy the shift invariance requirement, but fail the stretch invariance requirement 8,9 . We therefore need to find a canonical transformation $T(w)$ that achieves affine invariance with respect to the underlying shift, $G(w) = \delta + w$. The transformation

$$T(w) = e^{\beta w} \tag{20}$$

changes a shift invariance of $w$ into a stretch invariance of $T_z$, because

$$T(\delta + w) = e^{\beta(\delta + w)} = e^{\beta\delta}e^{\beta w} = bT \sim T$$

for $b = e^{\beta\delta}$. We can write $T(\delta + w) = T \circ G$, thus this expression shows that we have satisfied the affine invariance $T \circ G \sim T$ of eqn 9.
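The conversion of a shift in $w$ into a stretch in $T$ can be illustrated directly. The sketch assumes the exponential transformation $T(w) = e^{\beta w}$ and the shift generator $G(w) = \delta + w$ described in the text; the numerical values are hypothetical.

```python
import math

beta, delta = 0.6, 1.3  # hypothetical scale parameter and shift

def T(w):
    """Canonical transformation of the base scale."""
    return math.exp(beta * w)

def G(w):
    """Shift of the base scale."""
    return delta + w

b = math.exp(beta * delta)  # predicted stretch factor

ws = [-2.0, 0.0, 0.5, 3.0]
# T(G(w)) / T(w) should equal b for every w, so T o G = b * T.
ratios = [T(G(w)) / T(w) for w in ws]

print(b, ratios)
```

Because the ratio does not depend on $w$, the shifted scale is a pure stretch of the original canonical scale.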
Thus, shift invariance with respect to $w$ generates a family of scaling relations described by the parameter $\beta$. The one parameter family of canonical scales in eqn 20 expands the canonical exponential form for probability distributions to

$$q_z = ke^{-\lambda T_z} = ke^{-\lambda e^{\beta w}}. \tag{21}$$
The simpler form of eqn 19 arises as a limiting case for $\beta \to 0$, because $e^{\beta w} \approx 1 + \beta w \sim w$ when $\beta$ is small. That limiting form corresponds to the case in which the base scale, $w$, is itself both shift and stretch invariant 8,9 . Thus, we may consider the more familiar exponential form as falling within the expanded one parameter symmetry group of scaling relations in eqn 20. The expanded canonical form for probability patterns in eqn 21 and a few simple base scales, $w$, include essentially all of the commonly observed continuous probability patterns 8,9 .

Example: extreme values
In some cases, it is useful to consider the probability pattern in terms of the canonical scale measure, $dT_z = |T'|\,dz$. Using $T_z = e^{\beta w}$, distributions take on the form often found in the extreme value problems 8,9

$$q_z\,dz = kw'e^{\beta w - \lambda e^{\beta w}}\,dz,$$

in which $w' = |dw/dz|$. For example, $w = z$ yields the Gumbel distribution, and $w = \log z$ yields the Fréchet or Weibull form.
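The extreme value forms can be checked with a small sketch. It assumes $\beta > 0$ and the normalizing constant $k = \lambda\beta$, which follows from writing the pattern as $\lambda e^{-\lambda T}\,dT$ with $T = e^{\beta w}$; the function names, the midpoint integrator, and the parameter values are hypothetical.

```python
import math

def integrate(f, lower, upper, steps=200_000):
    """Midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))

def extreme_form(z, w, w_prime, beta, lam):
    """lam*beta * w' * exp(beta*w - lam*exp(beta*w)); k = lam*beta is assumed."""
    wz = w(z)
    return lam * beta * w_prime(z) * math.exp(beta * wz - lam * math.exp(beta * wz))

def weibull_density(z, shape, scale):
    """Standard two-parameter Weibull density, for comparison."""
    return (shape / scale) * (z / scale) ** (shape - 1) * math.exp(-((z / scale) ** shape))

beta, lam = 1.5, 0.8  # hypothetical values

# Base scale w = z: a Gumbel-type form on the whole real line; check normalization.
gumbel_total = integrate(
    lambda z: extreme_form(z, lambda x: x, lambda x: 1.0, beta, lam), -30.0, 10.0
)

# Base scale w = log z: matches a Weibull density with shape beta and scale lam**(-1/beta).
scale = lam ** (-1.0 / beta)
zs = [0.1, 0.5, 1.0, 2.0, 4.0]
weibull_gap = max(
    abs(extreme_form(z, math.log, lambda x: 1.0 / x, beta, lam) - weibull_density(z, beta, scale))
    for z in zs
)

print(gumbel_total, weibull_gap)
```

With the base scale $w = z$ the form integrates to one, and with $w = \log z$ it coincides pointwise with a Weibull density.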