Translation of Ludwig Boltzmann’s Paper “On the Relationship between the Second Fundamental Theorem of the Mechanical Theory of Heat and Probability Calculations Regarding the Conditions for Thermal Equilibrium”

Translation of the seminal 1877 paper by Ludwig Boltzmann which for the first time established the probabilistic basis of entropy. Includes a scientific commentary.


Introduction
Barely any of Boltzmann's original scientific work is available in translation. This is remarkable given his central role in the development of both equilibrium and non-equilibrium statistical mechanics, his statistical mechanical explanation of entropy, and our understanding of the Second Law of thermodynamics. What Boltzmann actually wrote on these subjects is rarely quoted directly, his methods are not fully appreciated, and key concepts have been misinterpreted. Yet his work remains relevant today. Witness, for example, the recent debate about Gibbs entropy vs. Boltzmann entropy [1][2][3][4]. The paper translated here (LB1877) occupies the high middle ground of his work, and was a key step in developing the fully probabilistic basis of entropy and the Second Law. LB1877 exemplifies several of Boltzmann's most important contributions to modern physics. These include: (1) the eponymous Boltzmann distribution, relating the probability of a state to the exponential of minus its energy, scaled by the mean kinetic energy (temperature). This distribution is so pervasive in statistical mechanics that one almost forgets that it had to be "discovered" in the first place. Deriving the correct distribution is absolutely necessary for Boltzmann's statistical mechanical definition of entropy, and it is derived twice, in Sections I and II, for discretized and continuous energy distributions, respectively. Boltzmann considers this point important enough to devote a whole section (IV) to apparently plausible ways of determining the probabilities of different state distributions which are physically incorrect, either because they do not produce the right dispersion in kinetic energies between molecules, or because the distributions do not depend correctly on the average kinetic energy (temperature), and so are not scale invariant. Previous work by Maxwell and Boltzmann was based on mechanical laws of motion and particle interaction in gases.
In contrast, Boltzmann's derivations here are much more general. They require only that particles can exchange kinetic energy, but they do not specify the mechanism. As Boltzmann predicted in the final sentences of this paper, his approach was applicable not just to gases, but to liquids and solids of any composition. Indeed, the Boltzmann distribution has also passed almost unchanged into the quantum world. Even without its connection to entropy, the Boltzmann distribution is of remarkably wide-ranging importance, explaining, amongst other phenomena, the density distribution of the atmosphere, cooling by evaporation, and Arrhenius-type chemical kinetics.
(2) Much of the theoretical apparatus of statistical mechanics is developed with great clarity in Sections I and II of LB1877. Here Boltzmann uses a hierarchy of three levels of description. First, there is the macro-state, the level of thermodynamic observables such as temperature and pressure. Second, there is the level at which the energy or velocity components of each molecule are specified. He calls this a Komplexion, which we translate literally as complexion. Finally, there is the intermediate level at which the number of molecules with each energy/velocity/position is specified, regardless of which molecules those are (Boltzmann's $w_0$, $w_1$, etc.). Boltzmann calls a particular set of w-values a Zustandsverteilung, translated as a "state distribution" or "distribution of states". His terminology and usage of the three levels is incisive, and in some ways superior to the two modern terms macro-state and micro-state: the latter is sometimes applied to both complexions and state distributions, eliding a distinction Boltzmann was careful to keep. Boltzmann then deploys the machinery of permutations, state counting, etc. to determine the all-important state distribution number: the number of complexions contained within any given state distribution, which he denotes by P. Boltzmann then shows how to find the state distribution $(w_0^{\max}, w_1^{\max}, \ldots)$ with the largest number of complexions, $P_{\max}$, subject to constraints on the temperature and number of molecules. Boltzmann's postulate is that $(w_0^{\max}, w_1^{\max}, \ldots)$ is the most probable state distribution, and that this corresponds to thermal equilibrium. Although Boltzmann introduces discrete energy levels in Section I as a device of mathematical convenience and considers them unphysical, he hedges his bets on this issue in Section II by showing that using continuous energies makes no material difference.
This indifference to details like discretization demonstrates again the great generality of the statistical mechanical world view Boltzmann was forging.
(3) The statistical mechanical formulation of entropy, Boltzmann's third great contribution, comes in Section V of LB1877. Boltzmann shows that the statistical mechanical quantity he denotes by Ω (multiplied by 2/3) is equal to the thermodynamic quantity entropy (S) as defined by Clausius, to within an additive constant. Boltzmann called Ω the Permutabilitätmass, translated here literally as "permutability measure". Given the permutability measure's central role, it is worth reviewing its definition. A state distribution is specified by the number of molecules having velocity components in each of the small intervals (u, u + du), (v, v + dv), (w, w + dw), for every u, v, and w value in the range (−∞, +∞), and having position coordinates in each small volume element (x, x + dx), (y, y + dy), (z, z + dz), ranging over the total volume V. For each state distribution there is a number of possible complexions. One particular state distribution has the most complexions and is therefore the most probable. Ω is given by the logarithm of the number of complexions of that most probable state distribution.
(4) In Section V of LB1877 Boltzmann also clearly demonstrates that there are two distinct contributions to entropy, arising from the distribution of heat (kinetic energy) and the distribution in space of atoms or molecules: compare his Equations (34) and (61) for Ω. In the initial effort to understand the nature of entropy, Carnot, Clausius, Maxwell, Kelvin and others focused almost entirely on the contribution from heat. Boltzmann welded the spatial and thermal parts together into a coherent statistical mechanical formulation. It is fitting that Boltzmann was the one to discover the third fundamental contribution to entropy, namely radiation, by deriving the Stefan-Boltzmann Law [5].
Careful reading of LB1877 is enlightening with regard to a number of apparent paradoxes subsequently encountered in the development of statistical mechanics. First, regarding extensivity, Ω is not the logarithm of a probability. That would be obtained by dividing the number of complexions P for a given state distribution by the total number of complexions J (see the text immediately preceding Equation (1)); Boltzmann gives this quotient a different symbol, W, from the first letter of the German word for probability, but he does not use it. Confusingly, Planck later chose to write Boltzmann's equation for entropy as $S = k \ln W + \text{constant}$ [6]. Crucially, Ω is an extensive quantity, which leads to Boltzmann's extensive Equation (62) for the entropy of an ideal gas. Thus Boltzmann never encounters the apparent Gibbs paradox for the entropy of mixing of identical gases. Furthermore, with Boltzmann's Permutabilitätmass method for counting states, there is no need for an a posteriori division by N! to "correct" the derivation using the "somewhat mystical arguments of Gibbs and Planck" [7], nor a need to appeal to quantum indistinguishability, which has been implausibly described as the appearance of quantum effects at the macroscopic classical level [8]. Subsequently, at least four distinguished practitioners of statistical mechanics have pointed out that correct counting of states à la Boltzmann obviates the need for the spurious indistinguishability/N! term: Ehrenfest [9], van Kampen [7], Jaynes [10], Swendsen [11] (and possibly Pauli [12]). This has had little impact on textbooks of statistical mechanics. An exception is the treatise by Gallavotti [13].
Second, regarding non-equilibrium states, Ω is not the logarithm of a volume in momentum-coordinate phase space, $dV = dq\,dp$, occupied by the system. Boltzmann notes, using Liouville's theorem, that dV remains constant in time (see Equation (41)), and so it cannot describe the entropy increase upon approach to equilibrium that Boltzmann was so concerned with. He thus avoids at the outset the considerable difficulty Gibbs had accounting for changes in entropy with time (see p. 144 in [14]). Boltzmann gave us, for the first time, a definition of entropy applicable to every state (distribution), at equilibrium or not: "Then the entropy of the initial and final states is not defined, (Tr: by Clausius' equation) but one can still calculate the quantity which we have called the permutability measure", LB1877, Section V. See also Equations (3) and (31), and the definition of a state distribution as any set of $w$ values satisfying the constraints of number and mean kinetic energy. By extension, every complexion can then be assigned an entropy, using the permutability measure of the state distribution to which that complexion belongs [15], opening the door to the statistical mechanics of non-equilibrium states and irreversible processes.
Finally, for a more extensive appraisal of Boltzmann's entire oeuvre we recommend reading the superb biography of Boltzmann: "The Man Who Trusted Atoms" [16].

On the Relationship between the Second Fundamental Theorem of the Mechanical Theory of Heat and Probability Calculations regarding the Conditions for Thermal Equilibrium by Ludwig Boltzmann
The relationship between the second fundamental theorem and calculations of probability became clear for the first time when I demonstrated that the theorem's analytical proof is only possible on the basis of probability calculations. (I refer to my publication "Analytical proof of the second fundamental theorem of the mechanical theory of heat derived from the laws of equilibrium for kinetic energy", Wien. Ber. 63, p. 8, reprinted as Wiss. Abhand. Vol I, reprint 20, p. 295 [note 1], and also to my "Remarks about several problems in the mechanical theory of heat", 3rd paragraph, Wiss. Abhand. Vol II, reprint 39.) This relationship is also confirmed by demonstrating that an exact proof of the fundamental theorems of the equilibrium of heat is most easily obtained if one demonstrates that a certain quantity, which I wish to define again as E [note 2], has to decrease as a result of the exchange of kinetic energy among the gas molecules, and therefore reaches its minimum value in the state of heat equilibrium. (Compare my "Additional studies about the equilibrium of heat among gaseous molecules", Wiss. Abhand. Vol I, reprint 22, p. 316.) The relationship between the second fundamental theorem and the laws of the equilibrium of heat is made even more compelling in light of the developments in the second paragraph of my "Remarks about several problems of the mechanical theory of heat". There I mentioned for the first time the possibility of a quite distinctive way of calculating the equilibrium of heat, using the following formulation: "It is clear that every single uniform state distribution which establishes itself after a certain time given a defined initial state is equally as probable as every single nonuniform state distribution, comparable to the situation in the game of Lotto where every single quintet is as improbable as the quintet 12345.
The higher probability that the state distribution becomes uniform with time arises only because there are far more uniform than nonuniform state distributions"; furthermore: "It is even possible to calculate the probabilities from the relationships of the number of different state distributions. This approach would perhaps lead to an interesting method for the calculation of the equilibrium of heat." It is thereby indicated that it is possible to calculate the state of heat equilibrium by finding the probability of the different possible states of the system. The initial state in most cases is bound to be highly improbable, and from it the system will always rapidly approach a more probable state until it finally reaches the most probable state, i.e., that of heat equilibrium. If we apply this to the second basic theorem, we will be able to identify that quantity which is usually called entropy with the probability of the particular state. Let us assume a system of bodies which are in a state of isolation, with no interaction with other bodies: e.g., one body with a higher and one with a lower temperature, and a so-called intermediate body which accomplishes the heat transfer between the two; or, to choose another example, a vessel with absolutely even and rigid walls, one half of which is filled with air of low temperature and pressure, whereas the other half is filled with air of high temperature and pressure. The hypothetical system of particles is assumed to have a certain state at time zero. Through the interaction between the particles the state is changed. According to the second fundamental theorem, this change has to take place in such a way that the total entropy of the particles increases. According to our present interpretation, this means that nothing changes except that the probability of the overall state for all particles gets larger and larger. The system of particles always changes from an improbable to a probable state.
It will become clear later what this means. After the publication of my last treatise on this topic, the same idea was taken up and developed further by Mr. Oskar Emil Meyer, totally independently of me {Die kinetische Theorie der Gase; Breslau 1877, p. 262}. He attempts to interpret, in the described manner, the equations of my continued studies concerning the equilibrium of heat particles. However, the line of reasoning of Mr. Meyer remained entirely unclear to me, and I will return to my concerns with his approach on page 172 (Wiss. Abhand. Vol II).

Note 1: Footnotes refer to this translation. Boltzmann's citations to "this book", "earlier work" etc., and citations in footnotes, have been replaced by explicit references indicated in the text by {}. Wiss. Abhand. refers to his Collected Works.

Note 2: E is not energy; in later publications Boltzmann used the symbol H, as in the H-theorem.
We have to take here a totally different approach, because our main purpose is not to limit our discussion to thermal equilibrium, but to explore the relationship of this probabilistic formulation to the second theorem of the mechanical theory of heat. We want first to solve the problem which I referred to above and already defined in my "Remarks on some problems of the mechanical theory of heat" [note 3], namely to calculate the probability of state distributions from the number of different distributions. We want first to treat as simple a case as possible, namely a gas of rigid, absolutely elastic spherical molecules trapped in a container with absolutely elastic walls (or of molecules which interact with central forces only within a certain small distance, but not otherwise; the latter assumption, which includes the former as a special case, does not change the calculations in the least). Even in this case, the application of probability theory is not easy. The number of molecules is not infinite in a mathematical sense, yet the number of velocities each molecule is capable of is effectively infinite. Given this last condition, the calculations are very difficult; to facilitate understanding, I will, as in earlier work, consider a limiting case.

I. Kinetic Energy Has Discrete Values
We assume initially that each molecule is only capable of assuming a finite number of velocities, such as
$$0, \frac{1}{q}, \frac{2}{q}, \frac{3}{q}, \ldots, \frac{p}{q},$$
where p and q are arbitrary finite numbers. Upon colliding, two molecules may exchange velocities, but after the collision both molecules still have one of the above velocities, namely $0, \frac{1}{q}, \frac{2}{q}, \ldots, \frac{p}{q}$. This assumption does not correspond to any realistic mechanical model, but it is easier to handle mathematically, and the actual problem to be solved is re-established by letting p and q go to infinity. Even if, at first sight, this seems a very abstract way of treating the problem, it rapidly leads to the desired objective; and when one considers that in nature all infinities are but limiting cases, one assumes that each molecule can behave in this fashion only in the limiting case where each molecule can assume more and more values of the velocity.
To continue, however, we will consider the kinetic energy, rather than the velocity, of the molecules. Each molecule can have only a finite number of values for its kinetic energy. As a further simplification, we assume that the kinetic energies of each molecule form an arithmetic progression, such as the following:
$$0, \epsilon, 2\epsilon, 3\epsilon, \ldots, p\epsilon.$$
We call P the largest possible value of the kinetic energy, $p\epsilon$. Before impact, each of two colliding molecules shall have a kinetic energy of $0$, or $\epsilon$, or $2\epsilon$, etc., up to $p\epsilon$, which means that after the collision each molecule still has one of the above values of kinetic energy. The number of molecules in the vessel is n. If we know how many of these n molecules have a kinetic energy of zero, how many have a kinetic energy of $\epsilon$, and so on, then we know the kinetic energy distribution. If at the beginning there is some state distribution among the gas molecules, this will in general be changed by the collisions. The laws governing this change have already been the subject of my previous investigations. But right away, I note that this is not my intention here; instead I want to establish the probability of a state distribution regardless of how it is created, or more specifically, I want to find all possible combinations of the p + 1 kinetic energy values allowed to each of the n molecules and then establish how many of these combinations correspond to each state distribution. The latter number then determines the likelihood of the relevant state distribution, as I have already stated in my published "Remarks about several problems in the mechanical theory of heat" (Wiss. Abhand. Vol II, reprint 39, p. 121). As a preliminary, we will use a simpler schematic approach to the problem, instead of the exact case. Suppose we have n molecules. Each of them is capable of having kinetic energy $0, \epsilon, 2\epsilon, 3\epsilon, \ldots, p\epsilon$,
and suppose these energies are distributed in all possible ways among the n molecules, such that the total energy is a constant, e.g., $\lambda\epsilon = L$. Any such distribution, in which the first molecule may have a kinetic energy of, e.g., $2\epsilon$, the second $6\epsilon$, and so on up to the last molecule, we call a complexion; so that each individual complexion can be easily enumerated, we write them in sequence (for convenience we divide through by $\epsilon$), specifying the kinetic energy of each molecule. We seek the number P of complexions where $w_0$ molecules have kinetic energy 0, $w_1$ molecules have kinetic energy $\epsilon$, $w_2$ have kinetic energy $2\epsilon$, up to the $w_p$ which have kinetic energy $p\epsilon$. We said earlier that, given how many molecules have kinetic energy 0, how many have kinetic energy $\epsilon$, etc., this distribution among the molecules specifies the number P of complexions for that distribution; in other words, it determines the likelihood of that state distribution. Dividing the number P by the number of all possible complexions, we get the probability of the state distribution. Since a state distribution does not determine the kinetic energy of each individual molecule, the goal is to describe the state distribution by writing as many zeros as there are molecules with zero kinetic energy ($w_0$), $w_1$ ones for those with kinetic energy $\epsilon$, etc. All these zeros, ones, etc. are the elements defining the state distribution. It is now immediately clear that the number P for each state distribution is exactly the same as the number of permutations of which the elements of the state distribution are capable, and that is why the number P is the desired measure of the permutability of the corresponding distribution of states. Once we have specified every possible complexion, we have also all possible state distributions, the latter differing from the former only by immaterial permutations of molecular labels.
All those complexions which contain the same number of zeros, the same number of ones, etc., differing from each other merely by different arrangements of elements, will result in the same state distribution; the number of complexions forming the same state distribution, which we have denoted by P, must be equal to the number of permutations of which the elements of the state distribution are capable. In order to give a simple numerical example, take n = 7, λ = 7, p = 7, so $L = 7\epsilon$, $P = 7\epsilon$. With 7 molecules, there are 8 possible values for the kinetic energy, $0, \epsilon, 2\epsilon, 3\epsilon, 4\epsilon, 5\epsilon, 6\epsilon, 7\epsilon$, to distribute in any possible way such that the total kinetic energy is $7\epsilon$. There are then 15 possible state distributions. We enumerate each of them in the above manner, producing the numbers listed in the second column of the following table. In the last column, under the heading P, is the number of possible permutations of members for each state.

No.   State distribution      P
 1    0000007                 7
 2    0000016                42
 3    0000025                42
 4    0000034                42
 5    0000115               105
 6    0000124               210
 7    0000133               105
 8    0000223               105
 9    0001114               140
10    0001123               420
11    0001222               140
12    0011113               105
13    0011122               210
14    0111112                42
15    1111111                 1

The first state distribution, for example, has 6 molecules with zero kinetic energy, and the seventh has kinetic energy $7\epsilon$. So $w_0 = 6$, $w_7 = 1$, $w_1 = w_2 = w_3 = w_4 = w_5 = w_6 = 0$. It is immaterial which molecule has kinetic energy $7\epsilon$, so there are 7 possible complexions which represent this state distribution. Denoting the sum of all possible complexions, 1716, by J, the probability of the first state distribution is 7/J; similarly, the probability of the second state distribution is 42/J; the most probable state distribution is the tenth, as its elements permit the greatest number of permutations. Henceforth, we call the number of permutations the relative likelihood of the state distribution; this can be defined in a different way, which we next illustrate with a specific numerical example, since generalization is straightforward. Suppose we have an urn containing an infinite number of paper slips. On each slip is one of the numbers 0, 1, 2, 3, 4, 5, 6, 7; each number is on the same number of slips, and has the same probability of being picked.
We now draw the first septet of slips and note the numbers on them. This septet provides a sample state distribution with a kinetic energy of $\epsilon$ times the number written on the first slip for molecule 1, and so forth. We return the slips to the urn, and draw a second septet which gives us a second state distribution, etc. After we draw a very large number of septets, we reject all those for which the total does not equal 7. This still leaves a large number of septets. Since each number has the same probability of occurrence, and the same elements in a different order form different complexions, each possible complexion will occur equally often. By ordering the numbers within each septet by size, we can classify each into one of the fifteen cases tabulated above. So the number of septets which fall into the class 0000007 relative to the 0000016 class will be 7:42. Similarly for all the other septets. The most likely state distribution is the one which produces the most septets, namely the 10th.
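The counting in the table and the urn argument can be checked mechanically. The sketch below is our own illustration, not part of Boltzmann's paper, and the function names (`partitions`, `permutation_number`, `state_distributions`, `urn_tally`) are hypothetical. It enumerates the state distributions of n = 7 molecules with total energy 7ε (energies in units of ε, each at most p) and computes P = n!/(w_0! w_1! ...) for each one. As an independent check, it also runs the urn experiment exhaustively: it goes through every possible septet once, rejects those whose total is not 7, and classifies the rest by their sorted pattern. Because labels are written as digit strings, the sketch assumes p ≤ 9.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def partitions(total, max_part, max_len):
    """Yield non-increasing tuples of positive integers that sum to `total`,
    with every part <= max_part and at most max_len parts."""
    if total == 0:
        yield ()
        return
    if max_len == 0:
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, first, max_len - 1):
            yield (first,) + rest


def permutation_number(energies):
    """P = n! / (w_0! w_1! ...), where w_k counts molecules with energy k."""
    w = Counter(energies)
    return factorial(len(energies)) // prod(factorial(c) for c in w.values())


def state_distributions(n, lam, p):
    """Return sorted (label, P) pairs; label = energies sorted, written as digits."""
    rows = []
    for parts in partitions(lam, p, n):
        energies = sorted(parts + (0,) * (n - len(parts)))
        rows.append(("".join(map(str, energies)), permutation_number(energies)))
    return sorted(rows)


def urn_tally(n, lam, p):
    """Exhaustive urn experiment: every septet once, reject totals != lam,
    and count the survivors by their sorted pattern."""
    tally = Counter()
    for septet in product(range(p + 1), repeat=n):
        if sum(septet) == lam:
            tally["".join(map(str, sorted(septet)))] += 1
    return tally


rows = state_distributions(n=7, lam=7, p=7)
J = sum(P for _, P in rows)
for i, (label, P) in enumerate(rows, start=1):
    print(f"{i:2d}  {label}  {P:4d}")
print("state distributions:", len(rows), " J =", J)
print("most probable:", max(rows, key=lambda r: r[1]))
print("urn tally agrees:", dict(urn_tally(7, 7, 7)) == dict(rows))
```

Running it should list the fifteen rows in the order of the table, give J = 1716 with ('0001123', 420) as the most probable distribution, and confirm that the exhaustive urn tally reproduces every P.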
(LB footnote [note 4]: If we divide the number of septets corresponding to a particular state by the total number of septets, we obtain the probability distribution. Instead of discarding all septets whose total is not 7, we could, after drawing a slip, remove from the urn all those other slips for which a total of 7 is now impossible; e.g., on drawing a slip with 6 on it, all other slips except those with 0 or 1 would be removed. If the first 6 slips all had 0 on them, only slips with 7 on them would be left in the urn. One more thing should be noted at this point: we construct all possible complexions. If we denote by $\bar{w}_0$ the arithmetic mean of all values of $w_0$ which belong to the different complexions, and form the analogous expressions $\bar{w}_1, \bar{w}_2, \ldots$, in the limit these quantities would also form the same state distribution.) (Translators' note: Boltzmann's comments on the results of Mr. Oskar Meyer beginning "Ich will hier einige Worte über die von Hrn. Oskar Meyer..." on p. 172 (Wiss. Ab.) and ending with "...Bearbeitung des allgemeinen Problems zurückkehren." on p. 175 (Wiss. Ab.) are of historical interest only and are omitted.) The first task is to determine the permutation number, previously designated by P, for any state distribution. Denoting by J the sum of the permutations P for all possible state distributions, the quotient P/J is the state distribution's probability, henceforth denoted by W. We would first like to calculate the permutations P for the state distribution characterized by $w_0$ molecules with kinetic energy 0, $w_1$ molecules with kinetic energy $\epsilon$, etc. It must be understood that
$$w_0 + w_1 + w_2 + \cdots + w_p = n \tag{1}$$
$$w_1 + 2w_2 + 3w_3 + \cdots + p\,w_p = \lambda \tag{2}$$
because the total number of molecules is n, and the total kinetic energy is $\lambda\epsilon = L$. Describing the state distribution as before, a complexion has $w_0$ molecules with zero energy, $w_1$ with one unit, and so on. The permutations, P, arise since, of the n elements, $w_0$ are mutually identical. Similarly with the $w_1$, $w_2$, etc. elements.
The total number of permutations is well known:
$$P = \frac{n!}{(w_0)!\,(w_1)!\,(w_2)!\cdots} \tag{3}$$
The most likely state distribution will be for those $w_0, w_1, \ldots$ values for which P is a maximum or, since the numerator is a constant, for which the denominator is a minimum. The values $w_0, w_1, \ldots$ must simultaneously satisfy the two constraints (1) and (2). Since the denominator of P is a product, it is easiest to determine the minimum of its logarithm, that is, the minimum of
$$\ln[(w_0)!] + \ln[(w_1)!] + \ln[(w_2)!] + \cdots \tag{4}$$
Here ln is the natural logarithm [note 5]. It is natural in our problem that only integer values of $w_0, w_1, \ldots$ are meaningful. However, to apply differential calculus, we will allow non-integer values, and so find the minimum of the expression
$$M_1 = \ln\Gamma(w_0 + 1) + \ln\Gamma(w_1 + 1) + \ln\Gamma(w_2 + 1) + \cdots,$$
which is identical to (4) for integer values of $w_0, w_1, \ldots$. We then get the non-integer values which, for constraints (1) and (2), maximize $M_1$ [note 6]. The solution to the problem will in any case be obtained if for $w_0, w_1$, etc. we select the closest set of integer values. If here and there a deviation of a few integers is required, the nearest complexion is easily found. The minimum of $M_1$ is found by adding to both sides of the equation for $M_1$ Equation (1) multiplied by the constant h, and Equation (2) multiplied by the constant k, and setting the partial derivatives with respect to each of the variables $w_0, w_1, w_2, \ldots$ to zero.

Note 4: The paper contains two long footnotes of a scientific nature. For readability these are interpolated in the text where they are cited, as indicated by the text beginning "LB footnote:".

Note 5: The ambiguous symbol "l" for logarithm in the original text has been replaced throughout by "ln".
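We thus obtain the following equations:
$$\frac{d\ln\Gamma(w_0+1)}{dw_0} + h = 0,\qquad \frac{d\ln\Gamma(w_1+1)}{dw_1} + h + k = 0,\qquad \frac{d\ln\Gamma(w_2+1)}{dw_2} + h + 2k = 0,\ \ldots \tag{5}$$
Exact solution of the problem through evaluation of the gamma function integral is very difficult; fortunately, the general solution for arbitrary finite values of p and n does not interest us here, but only the solution for the limiting case of a larger and larger number of molecules. Then the numbers $w_0, w_1, w_2$, etc. become larger and larger, so we introduce the function [note 7]
$$\varphi(x) = \ln\Gamma(x+1) - x(\ln x - 1) - \tfrac{1}{2}\ln 2\pi.$$
Then we can write the first equation of (5) as follows:
$$\ln w_0 + \frac{d\varphi(w_0)}{dw_0} + h = 0. \tag{6}$$
Similarly for the other equations of (5). It is also well known that
$$\varphi(x) = \frac{1}{2}\ln x + \frac{1}{12x} - \frac{1}{360x^3} + \cdots \tag{6a}$$
This series is not valid for x = 0, but here $x!$ and $\sqrt{2\pi}\,(x/e)^x$ should have the same value, and $\varphi(x) = 0$. Therefore the problem of finding the minimum of $w_0!\,w_1!\,w_2!\cdots$ is replaced by the easier problem of finding the minimum of
$$\sqrt{2\pi}\left(\frac{w_0}{e}\right)^{w_0}\sqrt{2\pi}\left(\frac{w_1}{e}\right)^{w_1}\sqrt{2\pi}\left(\frac{w_2}{e}\right)^{w_2}\cdots;$$
provided w is not zero, even at moderately large values of p and n both problems have matching solutions. From line (6a) it follows that
$$\frac{d\varphi(x)}{dx} = \frac{1}{2x} - \frac{1}{12x^2} + \cdots, \tag{6b}$$
which, for larger and larger values of $w_0$, vanishes compared with $\ln w_0$; the same also applies to the other $w$s, so Equation (6) can be written as follows:
$$\ln w_0 + h = 0;$$
likewise the equations for the remaining $w$s are
$$\ln w_1 + h + k = 0,\qquad \ln w_2 + h + 2k = 0,\ \ldots$$
One sees immediately that by neglecting the expression (6b), the minimum of the denominator of
$$\frac{n!}{\sqrt{2\pi}\,(w_0/e)^{w_0}\,\sqrt{2\pi}\,(w_1/e)^{w_1}\cdots}$$
is found instead of the minimum of the denominator of (3). So for problems involving $w!$, use of a well-known approximation (see Schlömilch's Comp., p. 438) amounts to substitution of $\sqrt{2\pi}\,(w/e)^w$ for $w!$. The last equations imply that
$$\frac{w_1}{w_0} = \frac{w_2}{w_1} = \frac{w_3}{w_2} = \cdots \tag{6c}$$
If we denote the common value of the quotient (6c) by x, we obtain
$$w_1 = w_0 x,\qquad w_2 = w_0 x^2,\qquad w_3 = w_0 x^3,\ \ldots \tag{7}$$
The two Equations (1) and (2) become
$$w_0\,(1 + x + x^2 + \cdots + x^p) = n, \tag{8}$$
$$w_0\,(x + 2x^2 + 3x^3 + \cdots + p\,x^p) = \lambda. \tag{9}$$
One sees immediately that these equations differ negligibly from Equation (42) and the preceding ones from my earlier work "Study of the thermal equilibrium of gas molecules".

Note 6: The original text reads "maximized" but should mean "minimized".

Note 7: Boltzmann approximates $\ln x!$ by $x\ln x - x + \frac{1}{2}\ln(2\pi)$ rather than $(x + \frac{1}{2})\ln x - x + \frac{1}{2}\ln(2\pi)$, as is now usual. For x >> 30 the relative difference is small.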
Summing the geometric series, we can write Equation (8) as
$$w_0\,\frac{1 - x^{p+1}}{1 - x} = n, \tag{10}$$
and we can use this to write the last equation, (9), as
$$w_0\,x\,\frac{d}{dx}\!\left(\frac{1 - x^{p+1}}{1 - x}\right) = \lambda.$$
Carrying out the differentiation in the last equation gives
$$w_0\,x\,\frac{1 - (p+1)x^{p} + p\,x^{p+1}}{(1 - x)^2} = \lambda.$$
Dividing this equation by Equation (10) and clearing the denominators gives
$$(np - \lambda)\,x^{p+2} - (np + n - \lambda)\,x^{p+1} + (n + \lambda)\,x - \lambda = 0. \tag{12}$$
One can see immediately from Descartes' theorem [note 8] that this equation cannot have more than three real positive roots, of which two are equal to 1. Again, it is easy to see that neither of these two roots is a solution of Equations (8) and (9), and so they do not solve the problem; they showed up in the final equation merely as a result of multiplying by the factor $(x - 1)^2$. To be convinced of this, one need only derive the final equation directly by dividing Equations (8) and (9). Following this division, having removed the variable x from the denominator and collected powers of x throughout, we get the equation
$$(pn - \lambda)\,x^{p} + [(p-1)n - \lambda]\,x^{p-1} + \cdots + (2n - \lambda)\,x^{2} + (n - \lambda)\,x - \lambda = 0, \tag{13}$$
which is an equation of pth degree, and whose roots supply the solution to the problem. Thus Equation (12) cannot have more positive roots than the solution requires. Negative or complex roots have no meaning for the solution to the problem. We note again that the largest allowed kinetic energy, $P = p\epsilon$, is very large compared to the mean kinetic energy of a molecule, $L/n = \lambda\epsilon/n$, from which it follows that p is very large compared to λ/n. The polynomial Equation (13), which shares the same real roots with Equation (12), is negative for x = 0; for x = 1, however, it has the value
$$\frac{(p+1)(pn - 2\lambda)}{2},$$
which is positive and very large, since p is very large compared to λ/n. The only positive root therefore occurs for x between 0 and 1, and we obtain it from the more convenient Equation (12). Since x is a proper fraction, the pth [note 9] and (p+1)th powers are small and can be neglected, in which case we obtain
$$x = \frac{\lambda}{n + \lambda}.$$
This is the value to which x tends for large p, and one can see the important fact that for reasonably large values of p the value of x depends almost exclusively on the ratio λ/n, and varies little with either λ or n provided their ratio is constant.
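The root-finding step can be illustrated numerically. This is a sketch of our own, not Boltzmann's procedure. Instead of working on the polynomial (12) directly, it bisects on the equivalent condition that the left side of Eq. (9) divided by that of Eq. (8) equals λ/n. That ratio is the mean of a distribution proportional to x^k, so it rises monotonically from 0 at x = 0 to p/2 at x = 1. The names `energy_ratio`, `solve_x` and `poly12` are hypothetical, and the assumption p > 2λ/n is what guarantees a root inside (0, 1).

```python
def energy_ratio(x, p):
    """(x + 2x^2 + ... + p x^p) / (1 + x + ... + x^p): Eq. (9) over Eq. (8)."""
    num = sum(k * x**k for k in range(p + 1))
    den = sum(x**k for k in range(p + 1))
    return num / den


def solve_x(n, lam, p, tol=1e-13):
    """Bisection for the unique root of energy_ratio(x, p) = lam / n in (0, 1).
    Assumes p > 2 * lam / n, so the ratio at x = 1 (namely p / 2) exceeds lam / n."""
    if not p > 2 * lam / n:
        raise ValueError("need p > 2*lam/n for a root below x = 1")
    target = lam / n
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if energy_ratio(mid, p) < target:
            lo = mid      # ratio too small: the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poly12(x, n, lam, p):
    """Left-hand side of Eq. (12); it should vanish at the root found above."""
    return ((n * p - lam) * x**(p + 2) - (n * p + n - lam) * x**(p + 1)
            + (n + lam) * x - lam)


for p in (7, 10, 20, 50):
    x = solve_x(7, 7, p)
    print(p, round(x, 6), poly12(x, 7, 7, p))
print("limit lam/(n+lam) =", 7 / (7 + 7))
print("same ratio, ten times larger system:", round(solve_x(70, 70, 7), 6))
```

For n = λ = 7 the root moves from about 0.50887 at p = 7 towards λ/(n + λ) = 1/2 as p grows. Scaling n and λ together multiplies every coefficient of (12) by the same factor, so the root does not change.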
Once one has found x, it follows from Equation (10) that
$$w_0 = \frac{n\,(1 - x)}{1 - x^{p+1}},$$
and Equation (7) gives the values of the remaining $w$s. It is seen from the quotients $w_0/n$, $w_1/n$, $w_2/n$, etc. that the probabilities of the various kinetic energy values for larger p are again dependent almost exclusively on the mean energy of the molecule. For infinitely large p we obtain the following limiting values:
$$x = \frac{\lambda}{n + \lambda},\qquad \frac{w_0}{n} = \frac{n}{n + \lambda},\qquad \frac{w_1}{n} = \frac{n\lambda}{(n + \lambda)^2},\qquad \frac{w_2}{n} = \frac{n\lambda^2}{(n + \lambda)^3},\ \ldots$$
To establish whether we have a maximum or a minimum, we need to examine the second variation of Expression (4). We note that $w_0, w_1, w_2$, etc. are very large, so we can use the approximation formula for $\ln\Gamma(w + 1)$ and, neglecting terms which have second or higher powers of w in the denominator, obtain
$$\delta^2 M_1 = \frac{(\delta w_0)^2}{w_0} + \frac{(\delta w_1)^2}{w_1} + \frac{(\delta w_2)^2}{w_2} + \cdots,$$
which is positive, so we do in fact have a minimum. I also want to remark on the size of the term previously designated J.

Note 8: Cardano's formula.

Note 9: This appears to be a typographic error. The (p + 2)th power makes mathematical sense.
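The limiting values for infinite p and the second-variation argument can both be checked numerically. What follows is our own illustration under stated assumptions. The limiting fractions w_k/n = n·λ^k/(n + λ)^(k+1) should sum to 1, have mean λ/n (in units of ε), and depend on n and λ only through their ratio. In addition, moving away from the solution in a direction that preserves both constraints should increase M_1 by roughly ½·Σ δ_k²/w_k, the second-order Taylor term obtained from d² ln Γ(w+1)/dw² ≈ 1/w. The perturbation δ = (t, −2t, t, 0, …) satisfies Σδ_k = 0 and Σkδ_k = 0. The system size n = λ = 10^6 and the step t = 1000 are arbitrary choices, `limiting_fractions` is a hypothetical helper, and `math.lgamma` stands in for ln Γ.

```python
from math import isclose, lgamma


def limiting_fractions(n, lam, kmax):
    """w_k / n for p -> infinity: n * lam**k / (n + lam)**(k + 1), k = 0..kmax."""
    x = lam / (n + lam)
    return [(1 - x) * x**k for k in range(kmax + 1)]


# Normalisation, mean energy (units of epsilon) and dependence on lam/n only.
f = limiting_fractions(7, 14, 200)
print(sum(f), sum(k * fk for k, fk in enumerate(f)))        # ~1.0 and ~2.0
print(all(isclose(a, b) for a, b in zip(f, limiting_fractions(70, 140, 200))))

# Second variation: perturb along delta = (t, -2t, t), which keeps both
# sum(w_k) and sum(k * w_k) fixed, and compare the change in
# M_1 = sum(ln Gamma(w_k + 1)) with the quadratic estimate.
n = lam = 10**6
w = [n * fk for fk in limiting_fractions(n, lam, 2)]        # only w_0..w_2 move
for t in (-1000.0, 1000.0):
    delta = (t, -2 * t, t)
    change = sum(lgamma(wk + dk + 1) - lgamma(wk + 1) for wk, dk in zip(w, delta))
    estimate = 0.5 * sum(dk**2 / wk for wk, dk in zip(w, delta))
    print(t, round(change, 3), round(estimate, 3))           # both positive
```

The change comes out positive for both signs of t and agrees with the quadratic estimate to within the small first- and third-order corrections. This is what one expects near a minimum rather than a maximum.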
One easily finds that, when terms that diminish with increasing λ or n are neglected, J is given by the following binomial coefficient: . . . Now λε/n is equal to the average kinetic energy μ of a molecule; therefore . . . So for large numbers one has . . ., and therefore, neglecting diminishing terms, . . . It goes without saying that these formulas are not derived here for the sake of finite values of p and n, which are unlikely to be of any practical importance. They are derived rather to obtain formulas that give the correct limiting values when p and n become infinite.
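This is a quick check of the counting behind J. It assumes that J denotes the total number of complexions, that is, the number of ways of giving λ energy units to n distinguishable molecules; J's definition lies outside this excerpt. For p ≥ λ, that count is the binomial coefficient C(λ + n − 1, λ). The sketch compares the binomial coefficient with brute force and with a leading-order Stirling estimate. The Stirling form used here is a standard one, not necessarily the expression Boltzmann writes.

```python
import math
from itertools import product


def count_complexions(n: int, lam: int, p: int) -> int:
    """Brute force: number of tuples (k_1, ..., k_n), each 0 <= k_i <= p, with sum lam."""
    return sum(1 for ks in product(range(p + 1), repeat=n) if sum(ks) == lam)


def log_complexions_exact(n: int, lam: int) -> float:
    """ln C(lam + n - 1, lam) via log-gamma; equals ln J when p >= lam."""
    return math.lgamma(lam + n) - math.lgamma(lam + 1) - math.lgamma(n)


def log_complexions_stirling(n: int, lam: int) -> float:
    """Leading-order Stirling estimate: (lam+n) ln(lam+n) - lam ln lam - n ln n."""
    total = lam + n
    return total * math.log(total) - lam * math.log(lam) - n * math.log(n)


if __name__ == "__main__":
    print(count_complexions(5, 5, 5), math.comb(9, 5))  # both 126
    print(math.comb(7 + 7 - 1, 7))                     # 1716 for n = lambda = 7
    n = lam = 100_000
    exact = log_complexions_exact(n, lam)
    approx = log_complexions_stirling(n, lam)
    print(exact, approx, abs(approx - exact) / exact)
```

The relative error of the Stirling form shrinks as n and λ grow. The error can be seen by comparing the exact and approximate values printed for n = λ = 100 000.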
Nevertheless, it may help to demonstrate with specific examples, involving only moderately large values of p and n, that these formulas are quite accurate and, though approximate, are of some value even there.
We first consider the earlier example, where n = λ = 7. That is, the number of molecules is 7 and the total kinetic energy is 7ε, so that the mean kinetic energy is also ε. Suppose first that p = 7, so each molecule can only have a kinetic energy of 0, ε, 2ε, 3ε, . . ., 7ε. Then Equation (12) becomes
$$6x^{9} - 7x^{8} + 2x - 1 = 0, \qquad (16)$$
from which it follows that
$$x = \frac{1}{2} + \frac{7}{2}x^{8} - 3x^{9}. \qquad (17)$$
Since x is close to 1/2, we can set x = 1/2 in the last two, very small, terms on the right-hand side and obtain x = 65/128 ≈ 0.5078. One could easily substitute this value back into the right side of Equation (17) to obtain a better approximation for x. However, since we already have an approximate value, a more rapid approach is to apply the ordinary Newton iteration method to Equation (16), which gives x ≈ 0.5089. From this, one finds the values of the w's in accordance with Equation (7). These numbers satisfy the condition that
$$\left(\frac{w_0}{e}\right)^{w_0}\left(\frac{w_1}{e}\right)^{w_1}\cdots\left(\frac{w_7}{e}\right)^{w_7}$$
is minimized, while the variables w obey the two constraints
$$w_0 + w_1 + \cdots + w_7 = 7, \qquad w_1 + 2w_2 + \cdots + 7w_7 = 7. \qquad (18)$$
Because of the first of Equations (18), this minimum, incidentally, coincides with the minimum of w_0^{w_0} w_1^{w_1} ⋯. This provides only an approximate solution to our problem. The problem asks for so many (w_0) zeros, so many (w_1) ones, etc., with as many permutations as the resulting complexion permits, while the w's simultaneously satisfy the constraints (18). Since p and n here are very small, one hardly expects great accuracy. Yet one already gets the solution to the permutation problem by taking the nearest integer for each w, with the exception of w_3, to which one must assign the value 1 instead of 0.4551. In this way one finds, as we also saw in the previous table, that the complexion 0001123 has the most permutations. We now consider the same special case with n = λ = 7, but set p = ∞; that is, the molecules may have kinetic energies of 0, ε, 2ε, 3ε, . . . up to ∞. We know, then, that the values of the variables w will differ little from those of the former case. In fact, Equation (15) gives w_s = 7/2^{s+1}, that is, w_0 = 3.5, w_1 = 1.75, w_2 = 0.875, w_3 = 0.4375, etc. We now consider a slightly more complicated example.
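The p = 7 numbers in this example can be recomputed directly, again assuming the geometric form w_s = w_0 x^s. Here x is found by bisection on the two constraints rather than by Newton's method on the polynomial. The sketch also evaluates the p = ∞ limiting values and the permutation count of the complexion 0001123. The helper names are hypothetical.

```python
import math


def occupation_numbers(n: int, lam: int, p: int, tol: float = 1e-13):
    """Return x and the real-valued w_0..w_p of a truncated geometric w_s = w_0 * x**s.

    x is chosen so that sum(w) = n and sum(s * w_s) = lam (assumes 0 < lam/n < p/2).
    """
    def mean_level(x: float) -> float:
        weights = [x**s for s in range(p + 1)]
        return sum(s * w for s, w in enumerate(weights)) / sum(weights)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_level(mid) < lam / n:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    w0 = n / sum(x**s for s in range(p + 1))
    return x, [w0 * x**s for s in range(p + 1)]


def limiting_occupation(n: int, lam: int, s: int) -> float:
    """p -> infinity limit: w_s = n**2/(n + lam) * (lam/(n + lam))**s."""
    return n**2 / (n + lam) * (lam / (n + lam)) ** s


x, w = occupation_numbers(7, 7, 7)
print(f"x = {x:.4f}")                                   # about 0.5089
print([round(v, 4) for v in w])                         # w_3 comes out near 0.4551
print([round(limiting_occupation(7, 7, s), 4) for s in range(5)])  # starts 3.5, 1.75, 0.875

# Rounding gives 3, 2, 1, 0, 0, ...: one molecule short, so w_3 is raised to 1,
# yielding the complexion 0001123 with 7!/(3! 2! 1! 1!) permutations.
print(math.factorial(7) // (math.factorial(3) * math.factorial(2)))  # 420
```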
Take n = 13, λ = 19, but we only treat the simpler case where p = ∞. Then we have . . .
Substituting the nearest integers for the w's, we obtain w_0 = 5, w_1 = 3, w_2 = 2, w_3 = w_4 = 1, and all higher w's equal to 0. From the fact alone that w_0 + w_1 + . . . must equal 13, it is seen that again one of the w's must be increased by one unit. Of the w's that were set to 0, w_5 differs least from the next higher integer. We therefore set w_5 = 1 and obtain the complexion 0000011122345, whose digit sum is indeed 19. The number of permutations of which this complexion is capable is 13!/(5!·3!·2!) = 4,324,320. A complexion whose digit sum is also 19, and which one might suppose to be capable of very many permutations, would be the following: 0000111222334.
The number of permutations is 13!/(4!·3!·3!·2!) = 3,603,600. This is less than the number of permutations of the first complexion, the one found from the approximate formula. Likewise, the number of permutations of the two complexions 0000111122335 and 0000111122344 is smaller still; for both complexions it is 13!/(4!·4!·2!·2!) = 2,702,700. Other possible complexions admit still fewer permutations, and it would be quite superfluous to pursue them here. The examples given here show that the above formula, even for very small values of p and n, gives values of w within one or two units of the true values. In the mechanical theory of heat we are always dealing with extremely large numbers of molecules. Such small differences therefore become negligible, and our approximate formula provides an exact solution to the problem. We also see that the most likely state distribution is consistent with the one known for gases in thermal equilibrium. According to Equation (15), the number of molecules having kinetic energy sε is
$$w_s = \frac{n^2}{n+\lambda}\left(\frac{\lambda}{n+\lambda}\right)^{s}.$$
Since λε/n is equal to the average kinetic energy μ of a molecule, which is finite, while ε is very small, n is very small compared to λ. So the following approximations hold:
$$\frac{n^2}{n+\lambda} \approx \frac{n\varepsilon}{\mu}, \qquad \left(\frac{\lambda}{n+\lambda}\right)^{s} = \left(1+\frac{\varepsilon}{\mu}\right)^{-s} \approx e^{-s\varepsilon/\mu}, \qquad w_s \approx \frac{n\varepsilon}{\mu}\,e^{-s\varepsilon/\mu}.$$
To achieve a mechanical theory of heat, these formulas must be developed further, particularly through the introduction of differentials and some additional considerations.
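The permutation counts for n = 13 and λ = 19 (with p = ∞, so any level is allowed) can be confirmed by exhaustive search. The search runs over every state distribution, written as a sorted digit string, and looks for the one with the most permutations. This brute-force search is only a checking aid; it is not Boltzmann's procedure, which rounds the approximate w's. The function names are hypothetical.

```python
import math
from collections import Counter


def partitions(total: int, max_parts: int, max_part=None):
    """Yield partitions of `total` into at most `max_parts` positive, non-increasing parts."""
    if max_part is None:
        max_part = total
    if total == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions(total - first, max_parts - 1, first):
            yield (first,) + rest


def permutations_of(levels) -> int:
    """Distinct orderings of a multiset of energy levels: n! / (w_0! w_1! ...)."""
    denominator = 1
    for count in Counter(levels).values():
        denominator *= math.factorial(count)
    return math.factorial(len(levels)) // denominator


def best_complexion(n: int, lam: int):
    """Exhaustively find the state distribution (sorted levels) with the most permutations.

    Levels are joined into a digit string, which is unambiguous while all levels are < 10.
    """
    best = None
    for parts in partitions(lam, n):
        levels = sorted(parts + (0,) * (n - len(parts)))
        count = permutations_of(levels)
        if best is None or count > best[1]:
            best = ("".join(map(str, levels)), count)
    return best


if __name__ == "__main__":
    print(best_complexion(13, 19))  # ('0000011122345', 4324320)
    for s in ("0000111222334", "0000111122335", "0000111122344"):
        print(s, permutations_of([int(c) for c in s]))
```

The search agrees with the text: the complexion found by rounding is the unique maximum, and the alternatives quoted have fewer permutations.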

II. Kinetic Energies Exchange in a Continuous Manner
In order to introduce differentials into our formula, we wish to illustrate the problem in the same manner as indicated on p. 171 (Wiss. Abhand. vol. II), because this seems to be the best way to clarify the matter. There, each molecule could only have one of the values 0, ε, 2ε, . . ., pε for its kinetic energy. We generated all possible complexions, i.e., all the ways of distributing the p + 1 values of the kinetic energy among the molecules subject to the constraints of the problem. We did this using a hypothetical urn containing infinitely many paper slips, with equal numbers of slips labeled with the kinetic energy values 0, ε, 2ε, etc. To generate the first complexion, we draw a slip of paper for each molecule and note the value of the kinetic energy assigned in this way to each molecule. Very many complexions are generated in the same way. Each is assigned to one state distribution or another, and then we determine the most probable one. We consider the state distribution that has the most complexions to be the most likely one, that is, the one corresponding to thermal equilibrium. Proceeding to the continuous kinetic energy case, the most natural approach is as follows. Taking ε to be some very small value, we assume that the urn contains very many slips of paper labeled with kinetic energy values between 0 and ε. The urn also contains equal numbers of paper slips labeled with kinetic energy values between ε and 2ε, 2ε and 3ε, and so on up to infinity. Since ε is very small, we can regard all molecules with kinetic energy between x and x + ε as having the same kinetic energy. The rest of the calculation proceeds as in Section I above. We assume some complexion has been drawn: w_0 molecules have kinetic energy between 0 and ε, w_1 molecules between ε and 2ε, w_2 between 2ε and 3ε, etc.
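The urn procedure for the discrete case can be made exact for the small example n = λ = 7, p = 7 used earlier. Drawing one slip per molecule and discarding draws with the wrong total energy leaves every surviving complexion equally likely. Enumerating those complexions and tallying them by state distribution therefore gives the exact frequencies. This is a sketch under that equal-likelihood reading; the generator and function names are hypothetical.

```python
from collections import Counter


def complexions(n: int, lam: int, p: int):
    """Yield every ordered assignment (k_1, ..., k_n) of levels 0..p with sum lam.

    Each tuple is one equally likely outcome of drawing one slip per molecule from
    an urn with equal numbers of slips 0, 1, ..., p, after discarding draws whose
    total energy is not lam.
    """
    if n == 0:
        if lam == 0:
            yield ()
        return
    for k in range(min(p, lam) + 1):
        for rest in complexions(n - 1, lam - k, p):
            yield (k,) + rest


def tally_state_distributions(n: int, lam: int, p: int) -> Counter:
    """Group complexions by state distribution, written as a sorted digit string (p <= 9)."""
    return Counter("".join(map(str, sorted(c))) for c in complexions(n, lam, p))


if __name__ == "__main__":
    tally = tally_state_distributions(7, 7, 7)
    print(sum(tally.values()))  # 1716 complexions in all
    print(len(tally))           # 15 state distributions
    print(tally.most_common(3))
```

The most frequent state distribution is 0001123, with 420 of the 1716 complexions. This matches the example of Section I.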
Here, because the variables w_0, w_1, w_2, etc. will be infinitely small, of the order of magnitude of ε, we prefer to write them as
$$w_0 = \varepsilon f(0), \qquad w_1 = \varepsilon f(\varepsilon), \qquad w_2 = \varepsilon f(2\varepsilon), \qquad \ldots$$
The probability of the state distribution in question is given, exactly as in Section I, by the number of permutations of which the elements of the state distribution are capable, i.e., by the number
$$\frac{n!}{w_0!\,w_1!\,w_2!\cdots}.$$
Again, the most likely state distribution, which corresponds to thermal equilibrium, is defined by the maximum of this expression, that is, by the minimum of its denominator. We again use the approximation of Section I, replacing w! by $\sqrt{2\pi}\,(w/e)^{w}$. We can omit the factor √2π, since it is a constant factor in the minimization. The key step is again to replace minimization of the denominator with minimization of its logarithm. We then obtain the condition for thermal equilibrium that
$$M = w_0\ln w_0 + w_1\ln w_1 + w_2\ln w_2 + \cdots - n$$
is a minimum, while the two constraints
$$n = w_0 + w_1 + w_2 + \cdots \qquad (20)$$
$$L = \varepsilon\,(w_1 + 2w_2 + 3w_3 + \cdots) \qquad (21)$$
are again satisfied; these constraints are identical with Equations (1) and (2). After substitution of the expressions for the w's, M and Equations (20) and (21) instead become sums over the values f(0), f(ε), f(2ε), . . . As ε is made still smaller, the allowed values of kinetic energy approach a continuum. For vanishingly small ε, the various sums in Equations (22)-(24) can be written in the form of integrals, leading to the following equations:
$$M = \int_0^\infty f(x)\ln f(x)\,dx + \text{const.}, \qquad (25)$$
$$n = \int_0^\infty f(x)\,dx, \qquad (26)$$
$$L = \int_0^\infty x\,f(x)\,dx. \qquad (27)$$
We seek the functional form of f(x) that minimizes expression (25) subject to the constraints (26) and (27), proceeding as follows. To the variation of the right side of Equation (25) we add the variation of Equation (26) multiplied by k and that of Equation (27) multiplied by h. Here x is the independent variable and f is the function to be varied. This results in
$$\int_0^\infty \left[\ln f(x) + 1 + k + hx\right]\delta f(x)\,dx = 0.$$
Setting the quantity in square brackets, which multiplies δf(x), equal to 0 and solving for the function f(x), we obtain
$$f(x) = C\,e^{-hx}, \qquad (28)$$
where the constant e^{−k−1} is denoted by C for brevity. The second variation of M is necessarily positive, since f(x) is positive for all values of x between 0 and ∞. By the calculus of variations, M is therefore a minimum.
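The minimum property of Equation (28) can be probed numerically. The sketch assumes n = 1 and h = 1, so L = 1 and f(x) = e^{−x}. It perturbs f by η(x² − 4x + 2)e^{−x}, a hypothetical test function chosen because its integrals against 1 and against x both vanish. Both constraints therefore stay satisfied, and M (expression (25) without its constant) should increase for any admissible η. The perturbation keeps f positive only for 0 ≤ η < 0.5, so the sketch stays in that range.

```python
import math


def integrate(func, upper: float = 50.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of func over [0, upper]."""
    dx = upper / steps
    return dx * sum(func((i + 0.5) * dx) for i in range(steps))


def perturbed_density(x: float, eta: float) -> float:
    """e^{-x} (n = 1, h = 1, so L = 1) plus eta * (x^2 - 4x + 2) e^{-x}.

    The added term integrates to zero against 1 and against x, so both
    constraints are untouched; f stays positive for 0 <= eta < 0.5.
    """
    return math.exp(-x) * (1.0 + eta * (x * x - 4.0 * x + 2.0))


def M(eta: float) -> float:
    """Integral of f ln f, i.e. expression (25) without its additive constant."""
    def integrand(x: float) -> float:
        f = perturbed_density(x, eta)
        return f * math.log(f)
    return integrate(integrand)


if __name__ == "__main__":
    for eta in (0.0, 0.1, 0.2):
        n_val = integrate(lambda x: perturbed_density(x, eta))
        l_val = integrate(lambda x: x * perturbed_density(x, eta))
        print(eta, round(n_val, 6), round(l_val, 6), round(M(eta), 6))
```

The printed n and L stay at 1 while M rises above its equilibrium value of −1. This is what a positive second variation, ½∫(δf)²/f dx, predicts.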
From Equation (28), the probability that the kinetic energy of a molecule lies between x and x + dx at thermal equilibrium is
$$\frac{C}{n}\,e^{-hx}\,dx.$$
The probability that the velocity of a molecule lies between ω and ω + dω would be
$$\frac{Cm}{n}\,\omega\,e^{-hm\omega^{2}/2}\,d\omega, \qquad (29)$$
where m is the mass of a molecule. Equation (29) gives the correct state distribution for elastic disks moving in two dimensions and for elastic cylinders with parallel axes moving in space, but not for elastic spheres moving in space. For the latter, the exponential function must be multiplied by ω² dω, not ω dω. To get the right state distribution for this case, we must set up the initial distribution of paper slips in our urn in a different way. So far we have assumed that the number of paper slips labeled with kinetic energy values between 0 and ε is the same as the number between ε and 2ε. The same holds for slips with kinetic energies between 2ε and 3ε, 3ε and 4ε, etc.
Now, however, let us assume that the three velocity components along the three coordinate axes, rather than the kinetic energies, are written on the paper slips in the urn. The idea is the same. The number of slips with u between 0 and ε, v between 0 and ζ, and w between 0 and η is the same as the number with u between ε and 2ε, v between 0 and ζ, and w between 0 and η. It is also the same as the number with u between ε and 2ε, v between ζ and 2ζ, and w between 0 and η. In general, the number of slips for which u, v, w lie between the limits u and u + ε, v and v + ζ, w and w + η is the same for all u, v, w. Here u, v, w can have any magnitude, while ε, ζ, η are infinitesimal constants. With this one modification of the problem, we arrive at the state distribution actually established among gas molecules.
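The difference between the two urns shows up in the mean kinetic energy. The ω dω weight of Equation (29) corresponds to an energy density proportional to e^{−hx}, with mean 1/h. The ω² dω weight corresponds to a density proportional to √x e^{−hx}, with mean 3/(2h). The sketch below checks this by midpoint integration over the speed. The values h = 2 and m = 1 are arbitrary illustrative choices.

```python
import math


def mean_kinetic_energy(dim: int, h: float = 2.0, m: float = 1.0,
                        upper: float = 20.0, steps: int = 200_000) -> float:
    """Mean of m*omega**2/2 for the speed weight omega**(dim-1) * exp(-h*m*omega**2/2).

    dim = 2: the omega d(omega) weight of Equation (29) (disks, parallel cylinders).
    dim = 3: the omega**2 d(omega) weight required for spheres moving in space.
    Midpoint rule on [0, upper]; the normalization constant cancels in the ratio.
    """
    d_omega = upper / steps
    norm = energy = 0.0
    for i in range(steps):
        omega = (i + 0.5) * d_omega
        weight = omega ** (dim - 1) * math.exp(-h * m * omega * omega / 2.0)
        norm += weight
        energy += weight * m * omega * omega / 2.0
    return energy / norm


if __name__ == "__main__":
    h = 2.0
    print(mean_kinetic_energy(2, h), 1 / h)        # uniform-in-energy urn
    print(mean_kinetic_energy(3, h), 3 / (2 * h))  # uniform-in-(u, v, w) urn
```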
(LB footnote: Instead of using finite quantities ε, ζ, η and then taking the limit as they go to zero, we can of course write du, dv, dw from the outset. The distribution of paper slips in the urn must then be such that the number for which u, v, w lie between u and u + du, v and v + dv, w and w + dw is proportional to the product du dv dw and independent of u, v and w. The earlier distribution of slips in the urn is characterized by the fact that, although ε could be replaced by dx, kinetic energies between 0 and dx, dx and 2dx, 2dx and 3dx, etc. occurred on the same number of slips.) If we now define
$$w_{abc} = \varepsilon\zeta\eta\, f(a\varepsilon, b\zeta, c\eta), \qquad (30)$$
we first assume that u takes only values between −pε and +pε, v between −qζ and +qζ, and w between −rη and +rη. The number of permutations of the state distribution is then
$$P = \frac{n!}{\prod_a\prod_b\prod_c w_{abc}!}.$$
Here again, the most likely state distribution occurs when this expression, or, if you will, its logarithm, is a maximum. We again replace n! by $\sqrt{2\pi}\,(n/e)^{n}$ and w! by $\sqrt{2\pi}\,(w/e)^{w}$. We can again immediately omit the factors √2π, as they simply contribute additive constants −½ ln 2π to ln P. Omitting also the constant term n ln n, the requirement for the most probable state distribution is that the sum
$$\sum_a\sum_b\sum_c w_{abc}\ln w_{abc}$$
be a minimum. Substituting for w_abc using Equation (30), one immediately sees that the triple sums can in the limit be expressed as definite integrals. Omitting an additive constant, the quantity to be maximized becomes
$$\Omega = -\iiint f(u,v,w)\ln f(u,v,w)\,du\,dv\,dw. \qquad (34)$$
The two constraint equations become
$$n = \iiint f(u,v,w)\,du\,dv\,dw, \qquad (35)$$
$$L = \frac{m}{2}\iiint (u^{2}+v^{2}+w^{2})\,f(u,v,w)\,du\,dv\,dw. \qquad (36)$$
The variable Ω differs from the logarithm of the number of permutations only by an additive constant. It is of special importance for this work, and we call it the permutability measure. I note, incidentally, that suppressing the additive constants has an advantage: the total permutability measure of two bodies is equal to the sum of the permutability measures of each body.
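Equation (34) can be illustrated by comparing Ω for two candidate distributions with the same n and the same total kinetic energy L. The first is an isotropic Gaussian in (u, v, w), the Maxwell form, whose variance σ² corresponds to 1/(hm). The second is a hypothetical alternative, uniform inside a velocity ball with the same mean square speed. Both entropies are used in their standard closed forms, so the sketch involves no numerical integration. Note that Ω here is the integral itself, without the additive constants the text drops.

```python
import math


def omega_maxwell(n: float, sigma2: float) -> float:
    """Omega = -integral of f ln f for f = n times an isotropic Gaussian of variance sigma2.

    sigma2 plays the role of 1/(h m); the mean square speed is 3 * sigma2.
    """
    return -n * math.log(n) + n * 1.5 * math.log(2.0 * math.pi * math.e * sigma2)


def omega_uniform_ball(n: float, sigma2: float) -> float:
    """Omega for f = n / V inside a velocity ball with the same mean square speed.

    For a uniform ball of radius R the mean square speed is 3 R^2 / 5, so R^2 = 5 * sigma2.
    """
    radius = math.sqrt(5.0 * sigma2)
    volume = 4.0 / 3.0 * math.pi * radius**3
    return -n * math.log(n) + n * math.log(volume)


if __name__ == "__main__":
    for sigma2 in (0.5, 1.0, 2.0):
        diff = omega_maxwell(1000.0, sigma2) - omega_uniform_ball(1000.0, sigma2)
        print(sigma2, round(diff, 3))  # same value for every sigma2 (about 410.2)
```

The Gaussian wins by about 0.41 per molecule, whatever the temperature. This is consistent with Maxwell's distribution being the maximum of (34) under (35) and (36).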
Thus it is the maximum of the quantity (34) subject to the constraints (35) and (36) that is sought. No further explanation of this problem is needed here. It is a special case of the problem I have already discussed in my treatise "On the thermal equilibrium of gases on which external forces act"¹⁰, in the section immediately preceding the appendix. There I provided evidence that this state distribution corresponds to the condition of thermal equilibrium. Thus one is justified in saying that the most likely state distribution corresponds to the condition of thermal equilibrium. For if an urn is filled with slips of paper labeled in the manner described above, the most likely sample will correspond to the state distribution for thermal equilibrium. We should not take this for granted, however, without first defining what is meant by the most likely state distribution. For example, if the urn were filled with slips labeled in the original manner, the statement would be incorrect.
The reasoning needed to arrive at the correct state distribution will not escape those experienced in working with such problems. The same considerations apply to the following circumstance. Suppose we group all the molecules whose coordinates at a particular time lie between the limits
ξ and ξ + dξ, η and η + dη, ζ and ζ + dζ, (37)
and whose velocity components lie between the limits
u and u + du, v and v + dv, w and w + dw, (38)
and let these molecules collide with other molecules under specific conditions. After a certain time their coordinates will lie between the limits
Ξ and Ξ + dΞ, H and H + dH, Z and Z + dZ, (39)
and their velocity components between the limits
U and U + dU, V and V + dV, W and W + dW. (40)
Then at any time
dξ · dη · dζ · du · dv · dw = dΞ · dH · dZ · dU · dV · dW. (41)
This is a general result. If at time zero the coordinates and velocity components of arbitrary molecules (material points) lie between the limits (37) and (38), and unspecified forces act between these molecules, so that at time t the coordinates and velocity components lie between the limits (39) and (40), then Equation (41) is still satisfied.
(Full details, a rigorous treatment and an even more general theorem can be found in Watson's book "A treatise on the kinetic theory of gases", p. 12.) Suppose that, instead of the velocity components, one uses the kinetic energy x and the velocity direction, defined by the two angles α and β, to describe the action of the forces. These variables would initially lie between the limits ξ and ξ + dξ, η and η + dη, ζ and ζ + dζ, x and x + dx, α and α + dα, β and β + dβ. After the action of the forces they would lie between the limits Ξ and Ξ + dΞ, H and H + dH, Z and Z + dZ, X and X + dX, A and A + dA, B and B + dB, and a corresponding relation holds between the products of these differentials. So the product of the differentials du·dv·dw becomes dU·dV·dW. Therefore the slips in the urn must be labeled uniformly with velocity components lying between u and u + du, v and v + dv, w and w + dw, whatever values u, v, w have. For a given value of the coordinates, the velocities must be described by the corresponding "momenta". On the other hand, √x dx goes over to √X dX. If kinetic energy is used instead, the slips must be labeled so that the number with kinetic energy between x and x + dx is proportional to √x dx, where dx is constant but x is completely arbitrary. This last statement agrees with my "Remarks on some problems of the mechanical theory of heat" (Wiss. Abhand. Vol II, reprint 39, p. 121). There I demonstrated that this is the only valid way to find the most likely state distribution corresponding to actual thermal equilibrium. Here we have demonstrated a posteriori that this leads to the correct state distribution for thermal equilibrium, namely the one that is most likely in our sense.
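The √x dx weighting follows from the usual change of variables from velocity components to speed and direction. The sketch below uses a polar angle θ and azimuth φ for the direction, which is an assumption; the text's α and β may be defined differently. It also uses the molecular mass m from Section II. This is a standard derivation, shown only to make the step visible.

```latex
\begin{aligned}
u &= \omega\sin\theta\cos\phi, \qquad v = \omega\sin\theta\sin\phi, \qquad w = \omega\cos\theta,\\
du\,dv\,dw &= \omega^{2}\sin\theta\;d\omega\,d\theta\,d\phi,\\
x &= \tfrac{1}{2}m\omega^{2} \;\Longrightarrow\; d\omega = \frac{dx}{m\omega} = \frac{dx}{\sqrt{2mx}},\\
\omega^{2}\,d\omega &= \frac{2x}{m}\cdot\frac{dx}{\sqrt{2mx}} = \frac{\sqrt{2}}{m^{3/2}}\,\sqrt{x}\,dx,\\
du\,dv\,dw &= \frac{\sqrt{2}}{m^{3/2}}\,\sqrt{x}\,dx\;\sin\theta\,d\theta\,d\phi .
\end{aligned}
```

Consider an urn with equal numbers of slips in equal cells du dv dw. For any fixed direction cell, it contains a number of slips with energy between x and x + dx proportional to √x dx. This is the √x dx weighting referred to above.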
Of course it is easy to analyze those cases where other conditions exist besides the principle of conservation of kinetic energy. Suppose, for example, that for a very large number of molecules (1) the total kinetic energy is constant, and the net velocity of the center of gravity is given along (2) the x-axis, (3) the y-axis and (4) the z-axis. The question arises what the most probable distribution of the velocity components among the molecules is, using the term in the previous sense. We then have exactly the same problem, except with four constraints instead of one. The solution gives us the most probable state distribution . . ., where C, h, α, β, γ are constants. This is in fact the state distribution for a gas in thermal equilibrium at a certain temperature, not at rest but moving with a constant net velocity. Similar problems, such as the rotation of a gas, can be treated in the same manner by adding the appropriate constraint equations. I have discussed this in my essay "On the definition and integration of the equations of molecular motion in gases" (Wiss. Abhand. Vol II, reprint 36).
Some comment is required here regarding the derivation of Equation (34) from Equation (31). The formula for x! is $x! \approx \sqrt{2\pi}\,x^{x+1/2}e^{-x}$. First note that, in determining the magnitude of P in the limit of a very large number of molecules n (and thus also of w_abc), other small quantities such as ε, ζ, η can be treated as infinitesimals. So all terms which have n or w_abc in the denominator can be neglected, and also the ½ in the term w_abc + ½. The terms containing w_abc scale with the total mass of the gas, while the related ½ terms refer only to a single molecule. The latter can therefore be neglected as the number of molecules increases.
We then get
$$\ln P = n\ln n - (p+q+r+1)\ln 2\pi - \sum_a\sum_b\sum_c w_{abc}\ln w_{abc}.$$
Substituting εζη f(aε, bζ, cη) for w_abc, we obtain
$$\ln P = n\ln n - (p+q+r+1)\ln 2\pi - n\ln(\varepsilon\zeta\eta) - \sum_a\sum_b\sum_c \varepsilon\zeta\eta\, f(a\varepsilon, b\zeta, c\eta)\ln f(a\varepsilon, b\zeta, c\eta).$$
One sees that, aside from the triple sum, the terms on the right-hand side are constant and can be omitted. We also let ε, ζ, η decrease while p, q, r increase without limit. The triple sum then goes over into a triple integral with limits −∞ to +∞, and from ln P we arrive immediately at the expression given by Equation (34) for the permutability measure Ω. The critical condition is that the number of molecules is very large. This means that w_abc is large compared to ½. It also means that the velocity components between the limits aε and (a + 1)ε, bζ and (b + 1)ζ, cη and (c + 1)η can be identified with those between u and u + du, v and v + dv, w and w + dw. This may appear strange at first sight, since the number of gas molecules is finite, albeit large, whereas du, dv, dw are mathematical differentials. But on closer consideration this assumption is self-evident. All applications of differential calculus to the theory of gases (diffusion, internal friction, heat conduction, etc.) are based on the same assumption. The assumption is that each infinitesimal volume element dx dy dz still contains infinitely many gas molecules whose velocity components lie between the limits u and u + du, v and v + dv, w and w + dw. The above assumption is nothing more than that very many molecules have velocity components lying within these limits for every u, v, w.
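Returning to the four-constraint example above (a gas with prescribed net velocity), the stationarity condition can be sketched with Lagrange multipliers. The multiplier names k, h, ν₁, ν₂, ν₃ are chosen here for illustration; k and h play the same roles as in Section II. The result has the form of a Maxwell distribution shifted by the constants α, β, γ, which then equal the components of the mean velocity.

```latex
\begin{aligned}
&\delta\!\left[-\!\iiint f\ln f\,du\,dv\,dw
 - k\!\iiint f\,du\,dv\,dw
 - h\,\frac{m}{2}\!\iiint (u^{2}+v^{2}+w^{2})\,f\,du\,dv\,dw
 + \sum_{i=1}^{3}\nu_i\!\iiint u_i\,f\,du\,dv\,dw\right] = 0,
 \qquad (u_1,u_2,u_3) = (u,v,w),\\[4pt]
&\Longrightarrow\; -\ln f - 1 - k - \frac{hm}{2}\,(u^{2}+v^{2}+w^{2}) + \nu_1 u + \nu_2 v + \nu_3 w = 0,\\[4pt]
&\Longrightarrow\; f(u,v,w) = C\,\exp\!\left\{-\frac{hm}{2}\left[(u-\alpha)^{2}+(v-\beta)^{2}+(w-\gamma)^{2}\right]\right\},
 \qquad \alpha = \frac{\nu_1}{hm},\; \beta = \frac{\nu_2}{hm},\; \gamma = \frac{\nu_3}{hm}.
\end{aligned}
```

The last step is only completing the square. The constant C absorbs e^{−1−k} together with the factor e^{(hm/2)(α²+β²+γ²)} left over from it.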

III. Consideration of Polyatomic Gas Molecules and External Forces
I will now generalize the formulas obtained so far, first by extending them to so-called polyatomic gas molecules and then by including external forces. In this way I will finally begin to extend the discussion to solids and liquids in general. In order not to consider too many examples, I will in each case deal with the most important case, in which there is no constraint other than the equation for the kinetic energy.
The first generalization can be applied to our formulas without difficulty. So far, we assumed that each molecule was an elastic sphere or a material point, so that its position in space was entirely defined by three variables (e.g., three orthogonal coordinates). We know that this is not the case for real gas molecules. We shall therefore assume that three coordinates are insufficient to specify completely the position of all parts of a molecule in space. Rather, r variables will be necessary: the so-called generalized coordinates p_1, p_2, . . . p_r. Three of them, p_1, p_2, p_3, are the orthogonal coordinates of the center of mass of the molecule. The others can be the coordinates of the individual atoms relative to the center of mass, angular directions, or whatever else specifies the location of every part of the molecule. We will also remove the restriction that only one type of gas molecule is present. We assume instead that there exists a second type, each molecule of which has the generalized coordinates p′_1, p′_2, . . . p′_{r′}; for the third type the generalized coordinates are p″_1, p″_2, . . . p″_{r″}. If there are v + 1 types of molecules, the generalized coordinates of the last type are p^{(v)}_1, p^{(v)}_2, . . . p^{(v)}_{r^{(v)}}. The first three coordinates are always the orthogonal coordinates of the center of mass. The necessary assumption, of course, is that many molecules of each type are present. Let l be the total kinetic energy of the first type of gas and χ its potential energy¹¹ (so that χ + l is constant if only internal forces are acting). Furthermore, q_1, q_2, q_3, . . . q_r are the momentum coordinates corresponding to p_1, p_2, . . . p_r. We can think of l as expressed in terms of the coordinates p_1, p_2, . . . p_r and their time derivatives ṗ_1, ṗ_2, . . . ṗ_r. We denote the quantities c_1 (dl/dṗ_1), c_2 (dl/dṗ_2), . . . by q_1, q_2, . . ., where c_1, c_2, . . . are arbitrary constants. I would like to note here an inconsistency with my essay "Remarks on some problems in the mechanical theory of heat", Section III (Wiss. Abhand. Vol II, reprint 39). There, the indexed variables designated p_i referred to the coordinate derivatives, whereas here these are designated by q. This mistake would probably not have caused any misunderstanding.
We denote the analogous quantities for the other types of molecules with the appropriate accents. According to the calculations of Maxwell, Watson and myself, in a state of thermal equilibrium the number of molecules for which the values of p_4, p_5, . . . p_r, q_1, q_2, q_3, . . . q_r lie between the limits
p_4 and p_4 + dp_4, p_5 and p_5 + dp_5, . . ., q_r and q_r + dq_r (43)
is given by
$$C\,e^{-h(\chi+l)}\,dp_4\,dp_5\cdots dq_r, \qquad (44)$$
where C and h are constants, independent of the p's and q's. Analogous expressions hold, of course, for the other molecular species, with the same value of h but different values of C. Exactly the same equation as (44) is also obtained using the methods of Sections I and II. Consider all those molecules of the first type for which the variables p_4, p_5, . . . q_r at time 0 lie between the limits (43). After some time t has elapsed, the values of the same variables lie between the limits
P_4 and P_4 + dP_4, P_5 and P_5 + dP_5, . . ., Q_r and Q_r + dQ_r. (45)
The general principle already invoked gives the following equation:
dp_4 · dp_5 ⋯ dq_r = dP_4 · dP_5 ⋯ dQ_r.
There is of course also dp_1 · dp_2 · dp_3 = dP_1 · dP_2 · dP_3.
So, in fact, for the variables p_4, p_5, . . . q_r the product of their differentials does not change over a given time interval. We must now imagine v + 1 urns. The first contains slips of paper on which all possible values of the variables p_4, p_5, . . . q_r are written. The number of slips with values within the limits (43), divided by the product dp_4 · dp_5 ⋯ dq_r, is a constant. The slips in the second urn are labeled in the same way with the variables p′_4, p′_5, . . . q′_{r′}, except that the constant can have a different value. The same applies to the other urns. We draw from the first urn for each molecule of the first type, from the second urn for each molecule of the second type, etc. We now suppose that the values of the variables for each molecule are determined by the relevant drawings. It is, of course, entirely chance that determines the state distributions of the gas molecules. We must first discard those state distributions which do not have the prescribed value of the total kinetic energy. It will then be most likely that the state distribution described by (44), i.e., the one corresponding to thermal equilibrium, is drawn. The proof of this is straightforward. So the results found in the first two sections can readily be generalized to this case.
We want to generalize the problem further, assuming that the gas is composed of molecules specified exactly as before, but now with so-called external forces acting. These are forces like gravity, which originate outside the gas. For details on the nature of these external forces and how to treat them, see my treatise "On the thermal equilibrium of gases on which external forces act"¹². The essence of the solution to the problem remains the same. Only now the state distribution will no longer be the same at all points of the vessel containing the gas; therefore dp_1 · dp_2 · dp_3 = dP_1 · dP_2 · dP_3 will no longer hold. We will now understand the generalized coordinates p_1, p_2, . . . p_r more generally, as determining the absolute position of the molecule in space and the relative position of its constituents.
The notion that p_1, p_2, p_3 are just the orthogonal coordinates of the center of gravity is dropped. The same is true for the molecules of all the other types of gas. There is one further point to notice. Previously, the only necessary condition was that very many molecules of each type were present throughout the vessel. Now it is required that very many molecules be present even in a small element of space over which the external forces do not vary significantly in either magnitude or direction. This condition, incidentally, must hold for any theoretical treatment of problems in which external forces on gases come into play. The reason is that our method of sampling presupposes that the states of many molecules can be considered equivalent, in the sense that the state distribution is not changed when the states of these molecules are exchanged. The probability of a state distribution is then determined by the number of complexions of which this state distribution is capable. This is why, for the case just considered, with v + 1 molecular species present, v + 1 urns must be constructed.
We assume first that a complexion has been drawn in which . . . These limits are so close that we can treat all the values between them as equal. It is then as if the variable p_1 could only take the values 0, α, 2α, 3α, etc., and the variable p_2 the values 0, β, 2β, 3β, etc. Let n be the total number of molecules of the first type. We again distinguish the variables for the other gases by the corresponding accents, so that
$$P = \frac{n!\;n'!\cdots n^{(v)}!}{\prod w_{ab\ldots}!\;\prod w'_{a'b'\ldots}!\cdots} \qquad (49)$$
is the possible number of permutations of the elements of this complexion, which we call the permutability. The products are to be read so that the indices a, b, . . ., a′, b′, . . . etc. run over all possible values, i.e., from −∞ to +∞ for orthogonal coordinates, from 0 to 2π for angular coordinates, and so on. Consider first the case where p_1 really can take only the values 0, α, 2α, 3α, . . ., and similarly for the other variables. Expression (49) is then just the number of complexions of which this state distribution is capable. According to the assumptions made above, this number is a measure of the probability of the state distribution. The variables w and n are all very large, so we can again replace w! by √(2π)(w/e)^w. We also denote the sum n ln n + n′ ln n′ + . . . + n^{(v)} ln n^{(v)} by N. We can then also replace n! by √(2π)(n/e)^n and immediately take the logarithm:
$$\ln P = N - C\ln 2\pi - \Big[\sum w_{ab\ldots}\ln w_{ab\ldots} + \sum w'_{a'b'\ldots}\ln w'_{a'b'\ldots} + \cdots\Big]. \qquad (50)$$
The sums are to be understood in the same sense as the products above. 2C is the number of factorials in the denominator of Equation (49) minus (v + 1).
Let us now substitute expression (47) for the variables w into Equation (50) and then take the limit of infinitesimal α, β, γ, . . . Omitting unnecessary constants, we obtain for the permutability measure, denoted by Ω,
$$\Omega = -\Big[\int\!\cdots\!\int f(p_1,p_2,\ldots,q_r)\ln f(p_1,p_2,\ldots,q_r)\,dp_1\,dp_2\cdots dq_r + \int\!\cdots\!\int f'(p'_1,p'_2,\ldots,q'_{r'})\ln f'(p'_1,p'_2,\ldots,q'_{r'})\,dp'_1\,dp'_2\cdots dq'_{r'} + \cdots\Big].$$
The integration is to extend over all possible values of the variables. In my paper "On the thermal equilibrium of gases on which external forces act", I have demonstrated that the expression in the square brackets is a minimum for a gas in a state of thermal equilibrium, subject, of course, to the kinetic energy constraint equation.

IV. On the Conditions for the Maximum of the Power-Exponent Free Product Determining the State Distribution Function
Before I go on to the treatment of the second law, I want to treat concisely a problem whose importance I believe I have shown in Section I, in the discussion of the work of Mr. Oskar Emil Meyer on this subject. This is the problem of finding the maximum of the product of the probabilities of all possible states. However, I want to deal with this problem only for monatomic gases, with no constraint other than the equation for the kinetic energy. We first consider the simplest case, where only a discrete set of kinetic energy values 0, ε, 2ε, . . . pε is possible. To start with, we use kinetic energies, not velocity components, as variables. We again denote by w_0, w_1, w_2, . . . w_p the numbers of molecules with kinetic energy 0, ε, 2ε, . . . pε.
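If we treat the subject in the usual way, the following relationship holds. The quantity
$$B = w_0\,w_1\,w_2\cdots w_p,$$
or, if you prefer, the quantity
$$\ln B = \ln w_0 + \ln w_1 + \ln w_2 + \cdots + \ln w_p, \qquad (52)$$
must be a maximum, subject to the constraints
$$n = w_0 + w_1 + w_2 + \cdots + w_p \qquad (53)$$
and
$$L = \varepsilon\,(w_1 + 2w_2 + 3w_3 + \cdots + p\,w_p). \qquad (54)$$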
Suppose we add to Equation (52) Equation (53) multiplied by h and Equation (54) multiplied by k, and then set the partial derivatives of the sum with respect to w_0, w_1, w_2, . . . equal to zero. We obtain the equations
$$\frac{1}{w_s} + h + k\,s\,\varepsilon = 0, \qquad s = 0, 1, \ldots, p.$$
Eliminating the constants h and k in favor of a = −h and b = −kε, we get
$$w_0 = \frac{1}{a}, \qquad w_1 = \frac{1}{a+b}, \qquad w_2 = \frac{1}{a+2b}, \qquad \ldots, \qquad w_p = \frac{1}{a+pb}.$$
Substituting these values into Equations (53) and (54), the two constants a and b can be determined:
$$n = \sum_{s=0}^{p}\frac{1}{a+sb}, \qquad \frac{L}{\varepsilon} = \sum_{s=0}^{p}\frac{s}{a+sb}.$$
The direct determination of the two unknowns a and b from these equations would be extremely lengthy. The method of regula falsi would provide a more rapid solution in each special case. I have not troubled myself with such calculations and will only give a general discussion of how the expected solutions can easily be obtained. These methods can only provide an approximate solution to the problem, since only positive integers are allowed, not fractional values. The first point to note is that the problem ceases to have any meaning as soon as p(p + 1)/2 is greater than L/ε. In that case at least one of the w's, and hence also the product B, must be zero, and there is no question of a maximum value of B. For the problem to make any sense, an excessive value of the kinetic energy must not be possible. If p(p + 1)/2 = L/ε, all the w's from w_1 onwards must be equal to 1 for B to be non-zero. A greater variation in values can occur only if smaller values of p are chosen. Then, when n is large, the above equations provide usable approximations. First, a will be significantly smaller than b, so w_0 is very large and w_1 much smaller; w_2 will be close to w_1/2, w_3 close to 2w_2/3, etc. In general, the decrease in w with increasing index is fairly insignificant when the maximum of w_0·w_1·w_2 ⋯ is sought, compared with when the minimum of w_0^{w_0} w_1^{w_1} w_2^{w_2} ⋯ is sought.
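The integer version of this problem is small enough to solve by exhaustive search. The sketch maximizes B = w_0 w_1 ⋯ w_p directly over all integer tuples that satisfy Equations (53) and (54), with the energy measured in units of ε. The case n = 10, L = 10ε, p = 3 and the function name are hypothetical choices for illustration.

```python
from itertools import product
from math import prod


def maximize_product(n: int, lam: int, p: int):
    """Exhaustive search over integer (w_0, ..., w_p) with sum n and energy sum lam
    (in units of epsilon) for the largest B = w_0 * w_1 * ... * w_p.

    Returns (None, 0) when every admissible tuple has some w_s = 0, e.g. when
    p * (p + 1) / 2 > lam, the case in which the problem has no meaning.
    """
    best, best_value = None, 0
    for ws in product(range(n + 1), repeat=p + 1):
        if sum(ws) != n or sum(s * w for s, w in enumerate(ws)) != lam:
            continue
        value = prod(ws)
        if value > best_value:
            best, best_value = ws, value
    return best, best_value


if __name__ == "__main__":
    print(maximize_product(10, 10, 3))  # ((4, 3, 2, 1), 24)
    print(maximize_product(10, 6, 3))   # ((7, 1, 1, 1), 7): p(p+1)/2 = 6 forces w_1 = w_2 = w_3 = 1
    print(maximize_product(10, 5, 3))   # (None, 0): p(p+1)/2 = 6 > 5
```

Brute force is used only because the example is tiny. For large n, the real-valued solution w_s = 1/(a + bs) with a and b fixed by the two sums above is the practical route.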
For much smaller values of p, a is not much less than b, so w_0 is also not much larger than the other w's; then w_2 is greater than w_1/2, w_3 greater than (2/3)w_2, etc. The decrease of w with increasing index is even smaller. As p decreases still further, a will dominate, and there will be hardly any decrease in w with increasing index. Finally, b becomes negative, and w will even increase with increasing index. The following cases provide examples; for each, the integer values of the w's which maximize B are given. Let us now turn to the case where the kinetic energy varies continuously. Taking first the kinetic energy x as the independent variable, the problem in our view is the following. The expression
$$\int_0^P \ln f(x)\,dx$$
is to be a maximum, while
$$\int_0^P f(x)\,dx = n \qquad\text{and}\qquad \int_0^P x\,f(x)\,dx = L$$
are constant. P is also constant. I have purposely set the upper integration limit to P, not ∞; it is then still straightforward to let P increase more and more. Proceeding accordingly, we obtain
$$f(x) = \frac{1}{a + bx}.$$
To determine these two constants we use the equations
$$n = \int_0^P \frac{dx}{a+bx}, \qquad L = \int_0^P \frac{x\,dx}{a+bx},$$
which, writing a/b as α, lead to
$$bn = \ln\!\left(1 + \frac{P}{\alpha}\right)$$
and also
$$(L + \alpha n)\ln\!\left(1 + \frac{P}{\alpha}\right) = Pn. \qquad (56)$$
α must be determined from this transcendental equation, after which a and b are easily obtained. Since Pn is the kinetic energy the gas would have if every molecule had the maximum possible kinetic energy P, we see immediately that, as P grows, Pn becomes infinitely greater than L (L/n being the average kinetic energy of a molecule). It is then easy to verify that P/α cannot be finite: for then Pn/αn would be finite, and in the expression L + αn the term L could be neglected. But then only terms in P/α would remain in Equation (56), and only vanishingly small values of P/α could satisfy it, which contradicts the assumption. Nor can P/α be vanishingly small, because then L would again be vanishingly small compared with αn; furthermore, ln(1 + P/α) could then be expanded in powers of P/α, and Equation (56) would yield a finite value for P/α. The only remaining possibility is that P/α is very large. Since $\frac{\alpha n}{Pn}\ln\frac{Pn}{\alpha n}$ then vanishes, Equation (56) gives

$$L\ln\frac{P}{\alpha} = Pn, \qquad\text{hence}\qquad \alpha = P\,e^{-Pn/L}, \qquad b = \frac{P}{L}, \qquad a = \alpha b = \frac{P^2}{L}\,e^{-Pn/L}.$$

With the approach used in this section, these equations show that, as P, n and L increase, the distribution of the kinetic energy among the molecules does not become determinate in terms of the mean kinetic energy of a molecule.

We now consider a second, more realistic problem. We take the three velocity components u, v, w parallel to the three coordinate axes as the independent variables, and seek the maximum of the expression

$$\iiint \ln f(u,v,w)\,du\,dv\,dw,$$

while the two expressions

$$\iiint f(u,v,w)\,du\,dv\,dw = n \qquad (58)$$

$$\iiint (u^2+v^2+w^2)\,f(u,v,w)\,du\,dv\,dw = \frac{2L}{m} \qquad (59)$$

remain constant, the integration extending over u, v, w. If the velocity magnitude is $\omega = \sqrt{u^2+v^2+w^2}$ and its direction is given by the two angles θ and ϕ (longitude and latitude¹³), we have, as is well known,

$$du\,dv\,dw = \omega^2\cos\phi\;d\omega\,d\theta\,d\phi.$$

If there are no external forces, f(u, v, w) is clearly independent of the direction of the velocity. Instead of integrating to infinity, we intentionally integrate ω only up to a finite value P. Determining f(ω) just as we did f(x) earlier, we obtain

$$f(\omega) = \frac{1}{a^2 + b^2\omega^2},$$

where we have set −h = a² and −k = b².
The two constants a and b are to be determined from Equations (58) and (59), which, with this value of f(ω), become

$$\frac{4\pi}{b^2}\left(P - \frac{a}{b}\arctan\frac{bP}{a}\right) = n, \qquad 4\pi\left[\frac{P^3}{3b^2} - \frac{a^2}{b^4}\left(P - \frac{a}{b}\arctan\frac{bP}{a}\right)\right] = \frac{2L}{m}. \qquad (60)$$

Using the first of these, the last equation gives

$$a^2 = \frac{1}{n}\left(\frac{4\pi P^3}{3} - \frac{2Lb^2}{m}\right).$$

Substituting this into the first equation of (60) gives Equation (60a). If, however, b² is negative, we put −b² in place of b², the arctangent being replaced by a logarithm, and obtain Equation (60b). From Equations (60a) and (60b) one must first calculate the ratio a/b; by the same means by which Equation (56) was analyzed, we first determine whether bP/a is infinitely small, finite, or infinitely large, the only difference being that the infinitely small quantity L/nP² now appears in Equation (60a). I will not discuss this point further, except to note that here too, as n and P grow larger, one cannot obtain a result that depends only on the average kinetic energy. Nor will I discuss in detail the cases in which there are other constraint equations besides that for the kinetic energy, as this would lead me too far afield.
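The closed forms that enter the two conditions of the velocity problem can be checked by quadrature. This sketch assumes the trial distribution f(ω) = 1/(a² + b²ω²), that the angular integration contributes the spherical-shell factor 4πω², and that the kinetic-energy condition is written with the integrand (u² + v² + w²)f; the values a = 1, b = 2, P = 5 and all helper names are hypothetical.

```python
import math


def simpson(fn, lo, hi, steps=2000):
    """Composite Simpson's rule on [lo, hi] with an even number of subintervals."""
    if steps % 2:
        raise ValueError("steps must be even")
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * fn(lo + j * h)
    return total * h / 3


def radial_closed_forms(a, b, P):
    """Closed forms, for f(om) = 1/(a^2 + b^2 om^2), of
    int_0^P 4 pi om^2 f d(om)        (number of molecules) and
    int_0^P 4 pi om^2 * om^2 f d(om) (integral of (u^2 + v^2 + w^2) f)."""
    bracket = P - (a / b) * math.atan(b * P / a)
    number = 4 * math.pi / b**2 * bracket
    second_moment = 4 * math.pi * (P**3 / (3 * b**2) - a**2 / b**4 * bracket)
    return number, second_moment


def f(om, a, b):
    """Hypothetical trial distribution 1/(a^2 + b^2 om^2)."""
    return 1.0 / (a**2 + b**2 * om**2)


a, b, P = 1.0, 2.0, 5.0
number_q = simpson(lambda om: 4 * math.pi * om**2 * f(om, a, b), 0.0, P)
moment_q = simpson(lambda om: 4 * math.pi * om**4 * f(om, a, b), 0.0, P)
number_c, moment_c = radial_closed_forms(a, b, P)
print(number_q, number_c)
print(moment_q, moment_c)
```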
To demonstrate how general the concept of the most probable state distribution of gas molecules is, I give here another definition of it. Suppose again that each molecule can take only a discrete set of kinetic energy values, 0, ε, 2ε, 3ε, …, ∞.
The total kinetic energy is L = λε. We determine the kinetic energy of each molecule in the following manner. An urn contains as many identical balls (n) as there are molecules, each ball corresponding to a particular molecule. We now make λ draws from this urn, returning the ball to the urn each time. The kinetic energy of the first molecule is then equal to ε times the number of times the ball corresponding to this molecule was drawn; the kinetic energies of all the other molecules are determined analogously. We have thereby produced a distribution of the kinetic energy L among the molecules (a complexion). We again make λ draws from the urn to produce a second complexion, then a third, and so on, many times (J), producing J complexions. We can then define the most probable state distribution in two ways. First, we find how often, in all J complexions, a molecule has kinetic energy 0, how often it has kinetic energy ε, 2ε, etc., and take the ratios of these numbers as the probabilities that a molecule has kinetic energy 0, ε, 2ε, etc. at thermal equilibrium. Second, for each complexion we form the corresponding state distribution; if some state distribution arises from P of the complexions, we take the quotient P/J as the probability of that state distribution. At first glance this definition seems very plausible. But we shall presently see that it should not be used, because under these conditions the distribution with the greatest probability would not correspond to thermal equilibrium. It is easy to cast the hypothesis that concerns us into formulas. Let us first discuss the first method of determining the probability. We consider the first molecule and assume that λ draws were made. The probability that the ball corresponding to the first molecule is picked on the first draw is 1/n, and the probability that another ball is drawn is (n − 1)/n. Thus the probability that on the 1st, 2nd, 3rd, …,
kth draws the ball corresponding to the first molecule is picked, and a different ball on each of the following draws, is

$$\left(\frac{1}{n}\right)^{k}\left(\frac{n-1}{n}\right)^{\lambda-k}.$$

The same expression is likewise the probability that the ball corresponding to the first molecule is picked on the 1st, 2nd, 3rd, …, (k − 1)th and then the (k + 1)th draws, etc. The probability that the ball corresponding to the first molecule is picked on any k of the draws, and not on the others, is

$$\frac{\lambda!}{k!\,(\lambda-k)!}\left(\frac{1}{n}\right)^{k}\left(\frac{n-1}{n}\right)^{\lambda-k}.$$

This probability that a molecule has kinetic energy kε is exactly the same for all the other molecules. Using the approximation formula for the factorial again, we obtain an expression which shows that the larger kinetic energies carry so disproportionate a weight that the whole expression does not approach a clearly identifiable limit as k, λ, 1/ε and n increase. We now proceed to the second possible definition of the most probable state distribution. We must consider all J complexions formed by the J drawings of λ balls from our urn. One of the possible complexions consists of λ drawings of the ball corresponding to the first molecule. We express this complexion symbolically by $m_1^{\lambda} \cdot m_2^{0} \cdot m_3^{0} \cdots m_n^{0}$. A second complexion, with λ − 1 draws of the ball corresponding to the first molecule and one draw of the ball corresponding to the second molecule, we express as $m_1^{\lambda-1} \cdot m_2^{1} \cdot m_3^{0} \cdots m_n^{0}$. We see that the different possible complexions are expressed exactly by the various terms of the power series obtained by expanding

$$(m_1 + m_2 + \cdots + m_n)^{\lambda} \qquad (A)$$

according to the multinomial theorem. The probability of each such complexion is thus exactly proportional to the coefficient of the corresponding term, as one sees by first forming the product of the λ factors $(m_1 + m_2 + \cdots + m_n)$ and then omitting the upper indices (the exponents) from it; each term then occurs exactly as many times as its multinomial coefficient. Then by the symbol $m_1 \cdot m_3 \cdot m_7 \cdots$
we understand that the first pick corresponded to the first molecule, the second pick to the third molecule, the third pick to the seventh molecule, etc. All possible products of the variables m_1, m_2, m_3, etc. represent equiprobable complexions. We want to know how often, among all the terms of the power series (A) (of which there are n^λ in total), there occur terms corresponding to a given state distribution. For example, consider the state distribution in which one molecule has all the kinetic energy and all the others have zero kinetic energy. This state distribution corresponds to the n terms $m_1^{\lambda}, m_2^{\lambda}, \ldots, m_n^{\lambda}$ of the power series (A), in each of which the exponent λ is 'undivided'. Similarly, for the state distribution in which w_0 molecules have kinetic energy zero, w_1 molecules have kinetic energy ε, w_2 molecules have kinetic energy 2ε, etc., there are

$$\frac{n!}{w_0!\,w_1!\,w_2!\cdots w_\lambda!}$$

members of the power series (A). Each of these members has the same multinomial coefficient, namely

$$\frac{\lambda!}{(0!)^{w_0}\,(1!)^{w_1}\,(2!)^{w_2}\cdots(\lambda!)^{w_\lambda}}.$$

In summary, therefore, according to the definition now adopted, the probability of this state distribution is

$$\frac{n!}{w_0!\,w_1!\cdots w_\lambda!}\cdot\frac{\lambda!}{(0!)^{w_0}\,(1!)^{w_1}\cdots(\lambda!)^{w_\lambda}}\cdot\frac{1}{n^{\lambda}}.$$

However, maximizing this quantity does not lead to the state distribution corresponding to thermal equilibrium either.
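For very small n and λ the urn scheme can be enumerated exhaustively, which checks both probability formulas exactly. The following is a minimal sketch with n = 3 molecules and λ = 4 draws (illustrative values), using exact rational arithmetic from Python's `fractions` module; energies are counted in units of ε and all helper names are hypothetical. For contrast, it also weights each state distribution by n!/(w_0! w_1! ⋯) divided by the total number of ways of assigning λ energy elements to n distinguishable molecules. This treats every energy assignment as equally likely, which is our reading of the complexion counting used earlier in the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

n, lam = 3, 4  # illustrative: 3 molecules (balls), 4 draws; small enough to enumerate
total = n ** lam  # each ordered sequence of draws has probability 1/n**lam
sequences = list(product(range(n), repeat=lam))

# First definition: how often molecule 0 ends up with energy k (units of epsilon).
k_counts = Counter(seq.count(0) for seq in sequences)
first_def = {k: Fraction(c, total) for k, c in k_counts.items()}
binomial = {k: Fraction(comb(lam, k) * (n - 1) ** (lam - k), n ** lam)
            for k in range(lam + 1)}


def state_of(seq):
    """State distribution (w_0, ..., w_lam): w_j = number of molecules drawn j times."""
    energies = [seq.count(i) for i in range(n)]
    return tuple(energies.count(j) for j in range(lam + 1))


# Second definition: relative frequency of each state distribution.
state_counts = Counter(state_of(seq) for seq in sequences)
second_def = {w: Fraction(c, total) for w, c in state_counts.items()}


def urn_formula(w):
    """n!/prod(w_j!) * lam!/prod((j!)**w_j) / n**lam."""
    n_terms = factorial(n) // prod(factorial(x) for x in w)
    coeff = factorial(lam) // prod(factorial(j) ** x for j, x in enumerate(w))
    return Fraction(n_terms * coeff, n ** lam)


# Contrast: every assignment of energies to distinguishable molecules equally likely.
n_assignments = comb(lam + n - 1, n - 1)
equal_assignments = {
    w: Fraction(factorial(n) // prod(factorial(x) for x in w), n_assignments)
    for w in second_def
}

for w in sorted(second_def):
    print(w, second_def[w], urn_formula(w), equal_assignments[w])
```

The two weightings disagree: under the urn scheme the state with all energy on one molecule has probability 3/81, whereas under equally likely energy assignments it has 3/15.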

V. Relationship of the Entropy to That Quantity Which I Have Called the Probability Distribution
In considering this relationship, let us first deal with the simplest and clearest case, a monatomic gas on which no external forces act. In this case formula (34) of Section II applies. To give it full generality, however, this formula must also include the coordinates x, y, z of the position of the molecule. The maximum of this generalized expression (34) then yields not only the distribution of the velocity components of the gas molecules, which sufficed for the case considered there, but also the distribution of the whole mass of gas in the enclosing vessel, whereas there it was taken for granted that the gas fills the vessel uniformly.
The generalization of (34) for the permutability measure is easily obtained from Equation (51) by substituting x, y, z, u, v, w for $p_1, p_2, \ldots, q_r$ and simply omitting the terms with primed variables. It reads:

$$\Omega = -\int\!\!\int\!\!\int\!\!\int\!\!\int\!\!\int f(x,y,z,u,v,w)\,\ln f(x,y,z,u,v,w)\;dx\,dy\,dz\,du\,dv\,dw, \qquad (61)$$

where f(x, y, z, u, v, w) dx dy dz du dv dw is the number of gas molecules for which the six variables x, y, z, u, v, w lie between the limits x and x + dx, y and y + dy, z and z + dz, …, w and w + dw; the integration extends from −∞ to +∞ for the velocities and over the interior of the vessel containing the gas for the coordinates. If the gas was not previously in thermal equilibrium, this quantity must increase. We want to compute its value once the gas has reached thermal equilibrium. Let V be the total volume of the gas, T the average kinetic energy of a gas molecule, N the total number of gas molecules, and m the mass of a gas molecule. For the state of thermal equilibrium we then have

$$f(x,y,z,u,v,w) = \frac{N}{V}\left(\frac{3m}{4\pi T}\right)^{3/2} e^{-\frac{3m}{4T}\left(u^2+v^2+w^2\right)}.$$

Substituting this value into Equation (61) gives

$$\Omega = N\ln V + \frac{3}{2}N\ln T + \frac{3}{2}N\left(1 - \ln\frac{3m}{4\pi}\right) - N\ln N. \qquad (62)$$

If dQ denotes the differential of heat supplied to the gas, where

$$dQ = N\,dT + p\,dV, \qquad (63)$$

$$pV = \frac{2}{3}NT, \qquad (64)$$

and p is the pressure per unit area, the entropy of the gas is

$$\int\frac{dQ}{T} = \frac{2}{3}N\ln\left(V\,T^{3/2}\right) + \text{const.},$$

and since N is regarded as constant, with a suitable choice of the constant,

$$\int\frac{dQ}{T} = \frac{2}{3}\,\Omega. \qquad (65)$$

It follows that for every so-called reversible change of state, in which the gas remains infinitely close to equilibrium throughout, the increase of the permutability measure Ω multiplied by 2/3 is equal to ∫dQ/T taken over the change of state, i.e., it is equal to the increase in entropy. Indeed, when a very small amount of heat dQ is supplied to a gas, its temperature and volume increase by dT and dV, and Equations (63) and (64) give

$$\frac{dQ}{T} = N\,\frac{dT}{T} + \frac{2}{3}N\,\frac{dV}{V},$$

while Equation (62) gives

$$d\Omega = \frac{3}{2}N\,\frac{dT}{T} + N\,\frac{dV}{V}.$$

It is known that if many reversible changes take place in a system of bodies, the total entropy of all these bodies remains constant.
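As a numerical check of the equilibrium value of Ω, the following sketch evaluates −∫ f ln f by quadrature for a gas that is uniform in space with the Maxwell distribution f = (N/V)(3m/4πT)^{3/2} exp[−3m(u² + v² + w²)/4T], and compares it with N ln V + (3/2)N ln T + (3/2)N(1 − ln(3m/4π)) − N ln N. It then checks dQ/T = (2/3)dΩ for a small reversible step, using dQ = N dT + p dV and pV = (2/3)NT. T is the mean kinetic energy per molecule, as in the text; the parameter values and function names are illustrative assumptions.

```python
import math


def omega_closed(N, V, T, m):
    """Closed form: N ln V + (3/2) N ln T + (3/2) N (1 - ln(3m/4pi)) - N ln N."""
    return (N * math.log(V) + 1.5 * N * math.log(T)
            + 1.5 * N * (1 - math.log(3 * m / (4 * math.pi))) - N * math.log(N))


def omega_quadrature(N, V, T, m, steps=4000):
    """-(integral of f ln f) for f uniform in space and Maxwellian in velocity.

    The spatial integral gives a factor V; the velocity integral is reduced
    to the speed c with the spherical-shell factor 4 pi c^2 (Simpson's rule)."""
    norm = (N / V) * (3 * m / (4 * math.pi * T)) ** 1.5
    beta = 3 * m / (4 * T)

    def integrand(c):
        f = norm * math.exp(-beta * c * c)
        return 4 * math.pi * c * c * f * math.log(f)

    c_max = 12 * math.sqrt(2 * T / (3 * m))  # 12 standard deviations; tail negligible
    step = c_max / steps
    total = integrand(0.0) + integrand(c_max)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(j * step)
    return -V * total * step / 3


N, V, T, m = 1000.0, 2.0, 1.5, 1.0  # illustrative values; T = mean kinetic energy
print(omega_quadrature(N, V, T, m), omega_closed(N, V, T, m))

# Small reversible step: dQ = N dT + p dV with p = 2NT/(3V).
dT, dV = 1e-6, 2e-6
dQ_over_T = N * dT / T + (2 * N / (3 * V)) * dV
d_omega = omega_closed(N, V + dV, T + dT, m) - omega_closed(N, V, T, m)
print(dQ_over_T, 2 / 3 * d_omega)
```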
On the other hand, if some of these processes are not reversible, then the total entropy of all the bodies must necessarily increase, as is well known from the fact that ∫dQ/T taken over a non-reversible cyclic process is negative.
According to Equation (65), the sum of the entropies of the bodies and 2/3 of their total permutability measure ΣΩ must increase by the same amount. Therefore, at thermal equilibrium, the permutability measure multiplied by a constant is identical to the entropy up to an additive constant; but it also retains its meaning during a non-reversible process, increasing continually. We can establish two principles. The first refers to a collection of bodies in which various changes of state have occurred, at least some of them irreversible, e.g., because some of the bodies were not always in thermal equilibrium. If the system was in thermal equilibrium before and after all these changes, then the sum of the entropies of all the bodies can be calculated before and after the changes without further ado, and it is always equal to 2/3 times the permutability measure of all the bodies. The first principle is that the total entropy after the changes of state is always greater than before. The same is, of course, true of the permutability measure. The second principle refers to a gas that undergoes a change of state without the requirement that it begin and end in thermal equilibrium. Then the entropy of the initial and final states is not defined, but one can still calculate the quantity we have called the permutability measure, and its value is necessarily larger after the change of state than before. We shall presently see that the latter proposition can be applied without difficulty to a system of several gases, and that it can also be extended to polyatomic gas molecules and to cases in which external forces act.
For a system of several gases, the permutability measure of the system must be defined as the sum of the permutability measures of the individual gases; if, on the other hand, one uses the number of permutations itself, then the number of permutations of the system is the product of the numbers of permutations of its constituents. If we assume that the latter principle applies to every body, then the two propositions just discussed are special cases of a single general theorem, which reads as follows. Consider any system of bodies that undergoes some changes of state, without requiring the initial and final states to be states of thermal equilibrium. Then the total permutability measure of the bodies increases continually during the changes of state, and can remain constant only so long as all the bodies remain infinitely close to thermal equilibrium throughout (reversible changes of state). As an example, consider a vessel divided into two halves by a very thin partition. The remaining walls of the vessel are also very thin, so that the heat they absorb can be neglected, and the vessel is surrounded by a large mass of another gas. One half of the vessel is filled with gas, while the other is initially completely empty. Suddenly pulling away the partition, which requires no significant work, causes the gas to spread at once throughout the vessel. Calculating the permutability measure of the gas, we find that it increases during this process, without any change in any other body. Now the gas is compressed reversibly back to its original volume by a piston. To accomplish this, we may, if we wish, assume that the piston is driven by a surrounding dense gas enclosed in infinitely thin walls. This gas is unchanged except that it moves downward in space.
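The bookkeeping of this divided-vessel example can be tallied with the equilibrium expression Ω = N ln V + (3/2)N ln T + const. for a monatomic gas. This sketch assumes that the free expansion leaves T unchanged (no work, no heat) and that the reversible compression is isothermal, the large surrounding gas taking up the heat Q = (2/3)NT ln 2 at temperature T and so gaining (3/2)Q/T in permutability measure, which follows from ∫dQ/T = (2/3)Ω. All numbers and names are illustrative.

```python
import math


def omega(N, V, T, m=1.0):
    """Equilibrium permutability measure of a monatomic gas (T = mean kinetic energy)."""
    return (N * math.log(V) + 1.5 * N * math.log(T)
            + 1.5 * N * (1 - math.log(3 * m / (4 * math.pi))) - N * math.log(N))


N, V, T = 1000.0, 1.0, 2.0

# Step 1: free expansion into the empty half; no work and no heat, so T is unchanged.
d_gas_1 = omega(N, 2 * V, T) - omega(N, V, T)

# Step 2: reversible isothermal compression back to V. The work
#   integral of p dV' with p = 2NT/(3V')  ->  (2/3) N T ln 2
# leaves the gas as heat Q and enters the surrounding gas at temperature T.
Q = (2.0 / 3.0) * N * T * math.log(2)
d_gas_2 = omega(N, V, T) - omega(N, 2 * V, T)
d_surroundings_2 = 1.5 * Q / T  # dOmega = (3/2) dQ/T for a reversible heat intake

total = d_gas_1 + d_gas_2 + d_surroundings_2
print(d_gas_1, N * math.log(2))  # increase during the free expansion
print(total)                     # net increase over both steps
```

The gas returns to its initial Ω, while the surrounding gas keeps the increase N ln 2 acquired during the compression, so the total permutability measure has grown.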
Since the permutability measure does not depend on absolute position in space, the permutability measure of the gas driving the piston does not change. That of the gas inside the vessel decreases to its initial value, since this gas has gone through a cyclic process. However, since this cycle was not reversible, ∫dQ/T taken over it is not equal to the difference between the initial and final values of the entropy, but is smaller, owing to the uncompensated transformation in the expansion. On the other hand, heat is transferred to the surrounding gas, so that in this process the permutability measure of the surrounding gas increases by as much as that of the gas enclosed in the vessel increased during the first process. Since the enclosed gas went through a cyclic process, its entropy decreased during the second process by as much as it had increased during the first, but not by ∫dQ/T; and because the second process was reversible, the entropy of the surrounding gas increased by as much as that of the enclosed gas decreased. The result is, as it must be, that the sum of the permutability measures of all the bodies of gas has increased. For a gas moving with constant speed α in the direction of the x-axis,¹⁴

$$f(x,y,z,u,v,w) = \frac{N}{V}\left(\frac{3m}{4\pi T}\right)^{3/2} e^{-\frac{3m}{4T}\left[(u-\alpha)^2+v^2+w^2\right]}.$$
If we substitute this expression into Equation (61), we obtain exactly Equation (62) again. Thus the translational motion of a mass of gas does not increase its permutability measure. The same is true of the kinetic energy arising from any other net motion of the mass (molar motion), because it arises from the translation of the individual volume elements, whose deformations and rotations are infinitely small of higher order and therefore entirely negligible. Here we obviously ignore the changes in the permutability measure due to internal friction or to temperature changes connected with these molecular motions. The temperature T of the moving gas is understood to be half the average value of m[(u − α)² + v² + w²]. So if friction and temperature changes are absent (e.g., if a gas falls freely together with its enclosing vessel), a net motion of the mass has no effect on the permutability measure until its kinetic energy is converted into heat; this is why molar motion is known as heat of infinite temperature. Let us now turn to a monatomic gas acted on by gravity. The permutability measure is given by the same Equation (51), but instead of the generalized coordinates we again introduce x, y, z, u, v, w, so that Equation (51) gives a value of Ω exactly the same as Equation (61). In the case of thermal equilibrium one has

$$f(x,y,z,u,v,w) = C\,e^{-\frac{3m}{4T}\left(\omega^2 + 2gz\right)},$$

where ω² = u² + v² + w² and g is the acceleration due to gravity. The constant C is determined by the density of the gas. Consider, e.g., a prism-shaped vessel of height h with a flat, horizontal bottom of area q. Further, let N be the total number of gas molecules in the vessel and z the height of a gas molecule above the bottom of the vessel; then

$$C = \frac{N}{q}\left(\frac{3m}{4\pi T}\right)^{3/2}\frac{3mg}{2T}\,\frac{1}{1 - e^{-3mgh/(2T)}}.$$

It can be seen that Equation (71) is identical to 3N/2 times Equation (18) of the aforementioned paper, except for an additive constant, the factor N indicating that Equation (18) applies to a single molecule only.
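How C follows from the total number of molecules can be checked numerically. This sketch assumes the equilibrium distribution f = C exp[−(3m/4T)(ω² + 2gz)], with g the gravitational acceleration and z measured upward from the bottom, so that after integrating over velocities the number density falls off as exp(−βz) with β = 3mg/2T. The closed form C = (N/q)(3m/4πT)^{3/2} β/(1 − e^{−βh}), the parameter values and the helper names are assumptions of this sketch.

```python
import math

# Illustrative parameters (arbitrary units): molecules, mass, mean kinetic energy,
# gravitational acceleration, vessel height and base area.
N, m, T, g, h, q = 1000.0, 1.0, 1.5, 0.2, 10.0, 4.0

beta = 3 * m * g / (2 * T)  # inverse barometric scale height
C = (N / q) * (3 * m / (4 * math.pi * T)) ** 1.5 * beta / (1 - math.exp(-beta * h))


def number_density(z):
    """Molecules per unit volume at height z: C times the velocity integral
    of exp(-3 m om^2 / (4T)), which equals (4 pi T / (3 m))**1.5."""
    return C * (4 * math.pi * T / (3 * m)) ** 1.5 * math.exp(-beta * z)


def simpson(fn, lo, hi, steps=2000):
    """Composite Simpson's rule with an even number of subintervals."""
    step = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * fn(lo + j * step)
    return total * step / 3


total_molecules = q * simpson(number_density, 0.0, h)
lower_half = q * simpson(number_density, 0.0, h / 2)
print(total_molecules)  # should reproduce N
print(lower_half / N)   # fraction of molecules in the lower half of the vessel
```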
The quantity in Equation (95) of "Further Studies" is, apart from its sign, what is denoted here as Ω, and thus also the entropy. Taken with a negative sign, however, it is greater than the Ω used here by N ln N. The former is because in "Further Studies" I was looking for a quantity that must decrease; the latter results from my having introduced there the quantity f* instead of f, which was, however, less clear. From this agreement it follows that our statement about the relationship between entropy and the permutability measure applies to the general case just as it does to a monatomic gas. Up to this point, these propositions can be demonstrated exactly using the theory of gases. If, however, one tries to generalize them to liquid drops and solid bodies, one must dispense with an exact treatment from the outset, since far too little is known about the nature of these states of matter and their mathematical theory is barely developed. But I have already given reasons in previous papers by virtue of which it is likely that, for these two states of aggregation, thermal equilibrium is reached when Equation (51) becomes a maximum, and that, when thermal equilibrium exists, the entropy is given by the same expression. It can therefore be described as likely that the validity of the principle I have developed is not limited to gases, but that it constitutes a general law of nature applicable to solid bodies and liquid droplets as well, although the exact mathematical treatment of these cases still seems to encounter extraordinary difficulties.