# Photons, Bits and Entropy: From Planck to Shannon at the Roots of the Information Age

## Abstract


## 1. Introduction

## 2. Planck and the Myth of Entropy

Thus we see that the electrodynamical state is not by any means determined by the thermodynamic data and that, in cases where, according to the laws of thermodynamics and according to all experience, an unambiguous result is to be expected, a purely electrodynamical theory fails entirely, since it admits not one definite result, but an infinite number of different results. Before entering on a further discussion of this fact and of the difficulty to which it leads in the electrodynamical theory of heat radiation, it may be pointed out that exactly the same case and the same difficulty are met with in the mechanical theory of heat, especially in the kinetic theory of gases.

… $d^{2}S/dU^{2}$. It gave $S$ as a logarithmic function of $U$, “which is suggested by probability calculus”. Two months later, on 14 December 1900, Planck’s famous derivation of his new blackbody law starts with the words: “Entropie bedingt Unordnung” (entropy implies disorder). Then Planck recalls the manifestation of disorder in his resonator, the temporal irregularity of its phase and amplitude. Accordingly, the degree of disorder compatible with the energy $U$ of the resonator is equal to the number of evolutions compatible with the time-average energy $U$. A semi-discrete version of this number is given by the number of distributions of the energy $E = NU$ over a large set of $N$ identical and independent resonators, since this set can be taken to represent a single resonator at $N$ different times” [4].

… $\times 10^{-27}$ erg sec. This constant, once multiplied by the common frequency of the resonators, gives the energy element $\varepsilon$ in ergs, and by division of $E$ by $\varepsilon$ we get the number $P$ of energy elements to be distributed over the $N$ resonators. When this quotient is not an integer, $P$ is taken to be a neighboring integer” [4].
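Planck's count of complexions is the number of ways to distribute $P$ indistinguishable energy elements over $N$ distinguishable resonators. The standard "stars and bars" result for this problem is $W = (N+P-1)!/\big((N-1)!\,P!\big)$. The sketch below evaluates $\ln W$ exactly through the log-gamma function. It then compares the entropy per resonator with its large-$N$ Stirling limit, $S/(Nk) = (1+x)\ln(1+x) - x\ln x$ with $x = U/\varepsilon = P/N$, which is the entropy expression that leads to Planck's law. The function names, $k = 1$ and the choice $U = 2\varepsilon$ are illustrative assumptions.

```python
import math

def log_complexions(N: int, P: int) -> float:
    """ln W for P indistinguishable energy elements over N distinguishable
    resonators, W = (N + P - 1)! / ((N - 1)! P!), via log-gamma."""
    return math.lgamma(N + P) - math.lgamma(N) - math.lgamma(P + 1)

def planck_entropy_per_resonator(x: float) -> float:
    """Stirling-limit entropy per resonator in units of k, with x = U / epsilon."""
    if x == 0:
        return 0.0
    return (1 + x) * math.log(1 + x) - x * math.log(x)

# Small case checked by hand: 3 elements over 2 resonators can be split as
# (3,0), (2,1), (1,2), (0,3), i.e. 4 complexions.
assert round(math.exp(log_complexions(2, 3))) == 4

for N in (10, 100, 10_000):
    P = 2 * N                      # mean energy U = 2 * epsilon per resonator
    exact = log_complexions(N, P) / N
    limit = planck_entropy_per_resonator(P / N)
    print(f"N={N:>6}: S/(Nk) exact={exact:.4f}  Stirling limit={limit:.4f}")
```

The exact value approaches the Stirling limit as $N$ grows. This is why Planck could work with a large set of resonators standing in for one resonator at many times.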

- Entropy is proportional to the logarithm of the probability of a state;
- The probability of any state is proportional to the number of corresponding complexions. Further, all complexions are equally probable.

… $w_0$ molecules with energy 0, $w_1$ molecules with energy $\varepsilon$, and so on is

$$\frac{N!}{w_0!\,w_1!\,w_2!\cdots}$$

[…$_{i}$] of a single configuration of $N$ indistinguishable particles among $P$ discrete states, using the expression:

…$^{N}$.
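The multinomial count $N!/(w_0!\,w_1!\cdots)$ is the bridge to Shannon's formula. Stirling's approximation gives $\ln W \approx -N\sum_i p_i \ln p_i$ with $p_i = w_i/N$. A minimal numerical check follows; the occupation fractions are chosen arbitrarily for illustration, and the function names are our own.

```python
import math

def log_multinomial(counts):
    """ln( N! / (w_0! w_1! ...) ), computed with log-gamma to avoid overflow."""
    N = sum(counts)
    return math.lgamma(N + 1) - sum(math.lgamma(w + 1) for w in counts)

def shannon_nats(probs):
    """-sum p ln p, skipping empty cells (0 ln 0 = 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

fractions = [0.5, 0.3, 0.2]        # illustrative occupation fractions w_i / N
for N in (10, 1_000, 100_000):
    counts = [round(f * N) for f in fractions]
    per_particle = log_multinomial(counts) / N
    print(f"N={N:>7}: (ln W)/N = {per_particle:.5f}, "
          f"H = {shannon_nats(fractions):.5f}")
```

The gap between $(\ln W)/N$ and $H$ shrinks roughly like $(\ln N)/N$.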

## 3. Darwin, Fowler, and Schrödinger and the Centrality of the Partition Function

… $a_1, a_2, \ldots, a_m$ states (characterized by different volumes in their example). Assuming that each complexion is equally probable yields the same expression used by Planck [12] and, earlier, by Boltzmann, that is, the multinomial expression given by Equation (15). They then continue: “Take, for example, a group of M A’s systems and suppose them immersed in a bath of a very much larger number of B’s. We can now define the entropy of the A’s when their specification is $a_1, a_2, \ldots, a_m$ as k times the logarithm of the probability of that specification. In calculating the probability, we are indifferent about the distribution among the B’s, so we sum the complexions involving all values of the b’s consistent with the selected values of the a’s”.

… $_{MN\,\mathrm{Max}}$ is the maximum of the multinomial expression associated with the given specification.

- N can be made arbitrarily large;
- No question about the individuality of the members of the assembly can ever arise, as it does, according to the new statistics, with particles.

… $e_1, e_2, \ldots, e_l$ with occupation numbers $a_1, a_2, \ldots, a_l$ (where the occupation number is how many of the $N$ systems are in state $1, 2, \ldots, l$). Then Schrödinger introduces (without proof) this statement:

… $e_1, e_2, \ldots$, and so on. This is the classical multinomial factor, that is, by definition, the number of ways to split $N$ distinct objects into $l$ distinct groups of sizes $a_1, a_2, \ldots, a_l$, respectively. In other words, $P$ represents a multiplicity factor, or the number of microstates belonging to the class of states whose total energy is $E$.

… $S_0 - E_0/T$. Of these, $E_0$ is seen to correspond to the arbitrary zero of energy of the system, which appears in each exponent of the partition function $Z$. The constant $S_0$ depends on the absolute values adopted for the weight factors (the weight factor of the exponential terms of the partition function, which was set equal to 1 in Schrödinger’s notation). We have made the convention of taking this as unity for simple quantized systems; but it is only a convention and quite without effect on the various average values, which are all that can ever be observed. Indeed, the only conditions attaching to the weight factor are precisely analogous to those attaching to the entropy in classical thermodynamics: a definite ratio is required between the weights of states of systems which can pass from one to the other, but as long as two systems are mutually not convertible into one another, it makes absolutely no difference what choice is made for their relative weight”.
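Maximizing the multiplicity at fixed $N$ and $E$ leads to the standard occupation fractions $a_l/N = e^{-e_l/kT}/Z$, with $Z = \sum_l e^{-e_l/kT}$ and all weight factors set to 1. The sketch below checks two of the points just made:

- the entropy follows from $Z$ as $S/(Nk) = \ln Z + U/(kT)$, which matches $-\sum_l p_l \ln p_l$;
- moving the arbitrary zero of energy by $E_0$ rescales $Z$ by $e^{-E_0/kT}$ but leaves every observable average unchanged.

The level energies, temperature and function names are illustrative assumptions, with $k = 1$.

```python
import math

def boltzmann_fractions(levels, kT):
    """Return (Z, [a_l / N]): partition function and occupation fractions,
    with every weight factor set to 1."""
    weights = [math.exp(-e / kT) for e in levels]
    Z = sum(weights)
    return Z, [w / Z for w in weights]

def entropy_from_Z(levels, kT):
    """S / (N k) = ln Z + U / kT, with U the mean energy per system."""
    Z, p = boltzmann_fractions(levels, kT)
    U = sum(pl * e for pl, e in zip(p, levels))
    return math.log(Z) + U / kT

levels = [0.0, 1.0, 2.0, 3.0]      # illustrative energies e_l, with k = 1
kT = 1.5

Z, p = boltzmann_fractions(levels, kT)
S_gibbs = -sum(pl * math.log(pl) for pl in p)        # -sum p ln p
print(f"Z = {Z:.4f}, S/Nk from Z = {entropy_from_Z(levels, kT):.6f}, "
      f"-sum p ln p = {S_gibbs:.6f}")

# Moving the zero of energy by E0 multiplies Z by exp(-E0/kT), yet the
# occupation fractions and the entropy are unchanged.
E0 = 0.7
Z_shift, p_shift = boltzmann_fractions([e + E0 for e in levels], kT)
print(f"Z ratio = {Z_shift / Z:.6f} vs exp(-E0/kT) = {math.exp(-E0 / kT):.6f}")
print(max(abs(a - b) for a, b in zip(p, p_shift)))
```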

## 4. Nyquist, Hartley, and the Dawn of Information

… $s^{n}$. In order that two such systems should be equivalent, the total number of characters that can be distinguished should be the same. In other words,

$$s_1^{n_1} = s_2^{n_2}.$$

… $\log 2^{5}$, or $5\log 2$. The information associated with 100 characters will be $500\log 2$.”

- (a) Information must increase linearly with time. In other words, a two-minute message will in general contain twice as much information as a one-minute message;
- (b) Information is independent of $s$ and $n$ if $s^{n}$ is held constant.
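Hartley's measure is $H = n\log s$ for $n$ selections among $s$ primary symbols. The sketch below reproduces the quoted figures: $5\log 2$ for a character made of five binary selections, and $500\log 2$ for 100 characters. It also illustrates both properties: $H$ grows linearly with $n$, and systems with the same $s^{n}$ carry the same information. Base-2 logarithms (so the unit is the bit) and the function name are our choices.

```python
import math

def hartley_information(s: int, n: int, base: float = 2.0) -> float:
    """Hartley's H = n log s: n selections from s distinguishable symbols."""
    return n * math.log(s, base)

per_char = hartley_information(2, 5)          # five binary selections
print(per_char, 100 * per_char)               # one character vs. 100 characters

# Property (a): doubling the number of selections doubles the information.
print(hartley_information(2, 10) / hartley_information(2, 5))

# Property (b): same s**n, same information (2**12 == 4**6 == 8**4 == 64**2).
for s, n in [(2, 12), (4, 6), (8, 4), (64, 2)]:
    print(s, n, s**n, hartley_information(s, n))
```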

## 5. Szilard, Brillouin, and Beyond: Physicists Discover Information

- (1) The period of measurement, when the piston has just been inserted in the middle of the cylinder and the molecule is trapped in either the upper or the lower part, so that we choose the origin of the coordinate x appropriately (choice 1 or choice 2, depending on the position of the molecule) and associate x with the parameter of the piston, y.
- (2) The period of utilization of the measurement, that is, “the period of decrease of entropy” (of the reservoir), during which the piston moves up or down “according to the value of y” (up if the molecule is in the lower part, down if the molecule is in the upper part). During this period “the molecule must bounce on the piston and transmit energy to it” [26].
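During the utilization period the one-molecule gas expands isothermally from half the cylinder to the whole of it. The work it can deliver is at most $W = \int_{V/2}^{V} (kT/V')\,dV' = kT\ln 2$. The reservoir supplies this as heat, so its entropy decreases by $k\ln 2$, the same limit Brillouin quotes below. The sketch integrates $p\,dV$ numerically and compares the result with the closed form. The 300 K temperature, the unit volume and the midpoint-rule integration are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)

def szilard_work(T: float, steps: int = 100_000) -> float:
    """Work from isothermal expansion of a one-molecule ideal gas V/2 -> V,
    integrating p dV = (kT / V) dV with the midpoint rule (V = 1)."""
    V0, V1 = 0.5, 1.0
    dV = (V1 - V0) / steps
    return sum(K_B * T / (V0 + (i + 0.5) * dV) * dV for i in range(steps))

T = 300.0
W_numeric = szilard_work(T)
W_closed = K_B * T * math.log(2)       # kT ln 2
print(f"W = {W_numeric:.4e} J (numeric), {W_closed:.4e} J (kT ln 2)")
print(f"entropy drawn from the reservoir: {W_closed / T:.4e} J/K = k ln 2")
```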

- (a) Information can be changed into negentropy and vice versa;
- (b) Any experiment by which information is obtained about a physical system corresponds, on average, to an increase of entropy in the system or in its surroundings. This average increase is always larger than (or equal to) the amount of information obtained. In other words, information must always be paid for in negentropy, the price paid being larger than (or equal to) the amount of information received. Correspondingly, when the information is lost, the entropy of the system is increased;
- (c) The smallest possible amount of negentropy required in an observation is of the order of $k$. A more detailed discussion gives the value $k\ln 2$ ($\approx 0.69\,k$, the minimum of information) as the exact limit, a result which concurs with our previous discussion of Szilard. In binary digits, this minimum represents just one bit;
- (d) These remarks lead to an explanation of the problem of Maxwell’s demon, which represents a device changing negentropy into information and back into negentropy;
- (e) When a communication channel is considered, the information formula used by Shannon is obtained. The average information per signal is $I = -k\sum_{i} p_i \ln p_i$.

… $\times 10^{-23}$ in SI system) introduced by the change from binary digits (bits) to thermodynamical units. The smallness of these terms is the fundamental reason why transmission of information by any practical method (writing, printing, telecommunications) is so inexpensive in entropy units, which also means inexpensive in dollar units. Modern life is based on these facts and would be completely different in a world where the negentropy of information would have a larger value” [31].
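Brillouin's formula $I = -k\sum_i p_i \ln p_i$ differs from Shannon's bit measure only by the factor $k$ and the base of the logarithm. One bit therefore corresponds to $k\ln 2 \approx 9.57 \times 10^{-24}$ J/K. A minimal conversion sketch makes the smallness of these numbers concrete. The one-gigabyte message, the 300 K temperature and the function names are illustrative choices.

```python
import math

K_B = 1.380649e-23                  # Boltzmann constant, J/K
BIT_TO_JK = K_B * math.log(2)       # thermodynamic entropy equivalent of one bit

def information_bits(probs):
    """Average information per signal in bits: -sum p log2 p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bits_to_entropy(bits):
    """Convert an amount of information in bits to entropy units (J/K)."""
    return bits * BIT_TO_JK

# One fair binary choice is one bit, i.e. k ln 2 in thermodynamic units.
print(f"k ln 2 = {bits_to_entropy(information_bits([0.5, 0.5])):.3e} J/K")

n_bits = 8e9                        # illustrative: one gigabyte of data
S = bits_to_entropy(n_bits)
print(f"8e9 bits -> {S:.3e} J/K; T*S at 300 K = {300 * S:.3e} J")
```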

## 6. Shannon and the Importance of Channel Capacity

- It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays tend to vary linearly with the logarithm of the number of possibilities: adding one relay doubles the number of possible states of the relays, doubling the time roughly squares the number of possible messages;
- It is nearer to our intuitive feeling as the proper measure… one feels, for example, that two punched cards should have twice the capacity of one for information storage and two identical channels twice the capacity of one for transmitting information;
- It is mathematically more suitable. Many limiting operations are simple in terms of the logarithm.

…$_{\text{permissible}}$

… $\alpha_0$ is the unique solution of the equation $Z(\alpha) = 1$, where $Z$ is the partition function evaluated on the alphabet of the language.
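For a discrete noiseless channel whose symbols have durations $t_i$, Shannon's capacity is $C = \log_2 X_0$, where $X_0$ is the largest real root of $\sum_i X^{-t_i} = 1$. Substituting $X = 2^{\alpha}$ turns this into $Z(\alpha) = \sum_i 2^{-\alpha t_i} = 1$, with $\alpha_0 = C$ in bits per unit time. We assume this is the form of the partition-function condition quoted above. Because $Z$ decreases monotonically in $\alpha$, bisection is enough to find the root. The durations and function names are illustrative.

```python
import math

def Z(alpha, durations):
    """'Partition function' of the alphabet: sum over symbols of 2^(-alpha t_i)."""
    return sum(2.0 ** (-alpha * t) for t in durations)

def capacity(durations, tol=1e-12):
    """Solve Z(alpha) = 1 by bisection; alpha_0 is the capacity in bits per
    unit time. Needs at least two symbols with positive durations."""
    if len(durations) < 2 or min(durations) <= 0:
        raise ValueError("need >= 2 symbols with positive durations")
    lo, hi = 0.0, 1.0                  # Z(0) = number of symbols > 1
    while Z(hi, durations) > 1.0:      # widen until Z(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Z(mid, durations) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(capacity([1, 1]))        # two unit-time symbols: 1 bit per unit time
# Durations 1 and 2: the root is log2 of the golden ratio.
print(capacity([1, 2]), math.log2((1 + math.sqrt(5)) / 2))
```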

… $_{ij}$ for the words allowable between the two states:

… $p_1, p_2, \ldots, p_n$ and asks: “Can we find a measure of how much ‘choice’ is involved in the selection of the event or of how uncertain we are of the outcome?” If there is such a measure, say $H(p_1, p_2, \ldots, p_n)$, it is reasonable to require of it the following properties:

- $H$ should be continuous in the $p_i$.
- If all the $p_i$ are equal, $p_i = 1/n$, then $H$ should be a monotonic increasing function of $n$. With equally likely events there is more choice, or uncertainty, when there are more possible events.
- If a choice is broken down into two successive choices, the original $H$ should be the weighted sum of the individual values of $H$.

… $p_i$ is the probability of a system being in cell $i$ of its phase space. $H$ is then, for example, the $H$ in Boltzmann’s famous $H$ theorem. We shall call $H = -K\sum_{i=1}^{n} p_i \log p_i$ the entropy of the set of probabilities $p_1, p_2, \ldots, p_n$.”
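Shannon's $H = -K\sum_i p_i \log p_i$ can be checked numerically against the second and third requirements listed earlier. With $K = 1$ and base-2 logarithms (both our choices, so $H$ is in bits), the sketch confirms that $H = \log n$ grows with $n$ for equally likely events. It also reproduces Shannon's own decomposition example, $H(\tfrac12, \tfrac13, \tfrac16) = H(\tfrac12, \tfrac12) + \tfrac12 H(\tfrac23, \tfrac13)$.

```python
import math

def H(probs, base=2.0):
    """Shannon entropy -sum p log p (K = 1, log to the given base); 0 log 0 = 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Requirement 2: for equally likely events H = log n, increasing with n.
uniform = [H([1 / n] * n) for n in range(1, 7)]
print(uniform)

# Requirement 3: choosing among 1/2, 1/3, 1/6 equals a fair binary choice
# followed, half of the time, by a choice between 2/3 and 1/3.
lhs = H([1 / 2, 1 / 3, 1 / 6])
rhs = H([1 / 2, 1 / 2]) + 0.5 * H([2 / 3, 1 / 3])
print(lhs, rhs)
```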

… $p_i$ is the probability of occurrence of the “primary symbols” $m$ (the same expression is also reported by Grandy [36] and Hamming [37]).

… $a_1, a_2, \ldots, a_n$ symbols emitted by the source) has symbols that occur with different probabilities $p(a_1), p(a_2), \ldots, p(a_n)$. We define the average length of the code words according to the parameter:

$$L = \sum_{i=1}^{n} p(a_i)\, l_i,$$

where $l_i$ is the length (number of symbols) of the code word (in binary code, the number of zeros or ones). An efficient code is one which has the smallest average length. For a binary instantaneous code, it can be shown (see Reference [37]) that a relation exists between the average length of the code words and the entropy $H$ of the source alphabet. It is:

… $S^{k}$, the $k$-th extension of the information source $S$:
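The average length $L = \sum_i p(a_i)\,l_i$ and its relation to $H$ can be explored with one concrete choice of lengths, $l_i = \lceil -\log_2 p(a_i) \rceil$. This is Shannon's construction; the choice is ours, and Reference [37] may use a different code. These lengths satisfy the Kraft inequality $\sum_i 2^{-l_i} \le 1$, so a binary instantaneous code with these lengths exists, and they give $H \le L < H + 1$. Applied to the $k$-th extension $S^{k}$, they give $H \le L_k/k < H + 1/k$, so the average length per source symbol approaches $H$ as $k$ grows. The source probabilities and function names are illustrative.

```python
import math
from itertools import product

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def shannon_lengths(probs):
    """Code-word lengths l_i = ceil(-log2 p_i) (Shannon's construction)."""
    return [math.ceil(-math.log2(p)) for p in probs]

def average_length(probs, lengths):
    return sum(p * l for p, l in zip(probs, lengths))

def extension(probs, k):
    """Probabilities of the k-th extension S^k (independent symbols)."""
    return [math.prod(block) for block in product(probs, repeat=k)]

source = [0.7, 0.2, 0.1]                 # illustrative p(a_1), p(a_2), p(a_3)
H = entropy_bits(source)
for k in range(1, 5):
    pk = extension(source, k)
    lk = shannon_lengths(pk)
    kraft = sum(2.0 ** -l for l in lk)
    per_symbol = average_length(pk, lk) / k
    print(f"k={k}: Kraft sum={kraft:.3f}, L_k/k={per_symbol:.4f}, H={H:.4f}")
```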

## 7. Jaynes or Synthesis

… $x_i$ ($i = 1, 2, \ldots, n$). We are not given the corresponding probabilities $p_i$; all we know is the expectation value of the function $f(x)$:

$$\langle f(x)\rangle = \sum_{i=1}^{n} p_i\, f(x_i).$$

… $f(x_i)$ assumes the values of the quantized energy $e_l$, the expectation value of $f(x)$ is the total energy $E$, and the probability $p_i$ is the normalized occupation number $a_l/N$. Moreover, we know that the quantity $x_i$ belongs to the multinomial formula suggested in Equation (25). Hence, a complete generalization of the methods employed by Darwin and Fowler is possible. Jaynes showed that this generalization applies in the domain of information theory, where it recovers the principal noiseless Shannon theorems. In Reference [32] he considers a “partition function” in base 2 (instead of base e), and recovers the partition function as

where $l_i$ is the length of the code word (in binary code) and $\lambda$ is a generic Lagrange multiplier. Shannon’s First Theorem, expressed by Equation (72), then becomes (in effect, since Shannon’s capacity is expressed in bit/s, equality holds when (86) is written in terms of the durations of the symbols)

… $f_i$, then for the future we should assign, as a rational procedure, $p_i = f_i$ (thus frequency equals probability). However, what should we do if the frequencies of occurrence are not measurable, as is the case with atoms and molecules, as well as in many other problems of statistical inference? The answer, according to Jaynes, is this (Jaynes’ “Principle of Minimum Prejudice”): assign that set of values to $p_i$ which is consistent with the given information and which maximizes the uncertainty [40].
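Jaynes' prescription can be made concrete. Maximizing $-\sum_i p_i \ln p_i$ under normalization and a known expectation $\langle f(x)\rangle$ gives, by Lagrange multipliers, $p_i = e^{-\lambda f(x_i)}/Z(\lambda)$. Only $\lambda$ has to be found numerically.

The sketch below solves for $\lambda$ by bisection and applies the result twice:

1. to a die of which only the mean face value is known;
2. to code words of lengths $l_i$ with a prescribed mean length.

In the second case the multiplier comes out as $\lambda = \ln 2$. That is the base-2 form $p_i = 2^{-l_i}$ with $\sum_i 2^{-l_i} = 1$, for which the entropy in bits equals the average code length. The numbers (mean 4.5, lengths 1, 2, 3, 3) and function names are illustrative assumptions.

```python
import math

def maxent_distribution(values, target_mean, tol=1e-12):
    """Maximum-entropy p_i subject to sum(p_i) = 1 and sum(p_i * f_i) = target.
    The maximiser has the form p_i = exp(-lam * f_i) / Z(lam); lam is found by
    bisection because the mean decreases monotonically as lam increases."""
    if not min(values) < target_mean < max(values):
        raise ValueError("target mean must lie strictly inside the value range")

    def dist(lam):
        shift = min(lam * v for v in values)          # keeps exponents <= 0
        w = [math.exp(-(lam * v - shift)) for v in values]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    lo, hi = -1.0, 1.0
    while mean(lo) < target_mean:       # more negative lam -> larger mean
        lo *= 2.0
    while mean(hi) > target_mean:       # more positive lam -> smaller mean
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return dist(lam), lam

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 1) A die about which we only know the average face value (illustrative 4.5).
p_die, _ = maxent_distribution([1, 2, 3, 4, 5, 6], 4.5)
print([round(p, 4) for p in p_die])

# 2) Code words of lengths 1, 2, 3, 3 with mean length 1.75: the solution is
#    p_i = 2^(-l_i), so lam / ln 2 equals 1 and the entropy in bits equals
#    the average length.
lengths = [1, 2, 3, 3]
p_code, lam = maxent_distribution(lengths, 1.75)
print(lam / math.log(2), entropy_bits(p_code))
```

Any other distribution with the same mean has lower entropy. For example, putting probability 1/2 on faces 3 and 6 also averages 4.5 but carries only 1 bit.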

## 8. Conclusions

… $_{i}$ distinguishable groups, allowing the accumulation of indistinguishable results generated by $N$ independent draws. Each draw is associated with an elective property of which only the expectation value is known. This mechanism produces a standard multinomial distribution from which Shannon’s entropy is derived.

## Acknowledgments

## Conflicts of Interest

## References

- Tolman, R.C. The Principles of Statistical Mechanics, 1st ed.; Oxford University Press: New York, NY, USA, 1938.
- Kuhn, T.S. Black-Body Theory and the Quantum Discontinuity 1894–1912; Oxford University Press: New York, NY, USA, 1978.
- Badino, M. The Bumpy Road: Max Planck from Radiation Theory to the Quantum (1896–1906); Springer: Berlin, Germany, 2015.
- Agassi, J. Radiation Theory and the Quantum Revolution; Birkhäuser: Basel, Switzerland, 1993.
- Darrigol, O. Statistics and Combinatorics in Early Quantum Theory. In Historical Studies in the Physical and Biological Sciences; University of California Press: Berkeley, CA, USA, 1988.
- Darrigol, O. Statistics and Combinatorics in Early Quantum Theory, II: Early Symptoms of Indistinguishability and Holism; University of California Press: Berkeley, CA, USA, 1991.
- Gearhart, C.A. Planck, the Quantum and the Historians. Phys. Perspect. **2002**, 4, 170–215.
- Loudon, R. The Quantum Theory of Light; Oxford University Press: New York, NY, USA, 1983.
- Scully, M.O.; Zubairy, M.S. Quantum Optics; Cambridge University Press: Cambridge, UK, 1997.
- Carnot, S. Reflections on the Motive Power of Fire; Dover: Mineola, NY, USA, 1960.
- Clausius, R. The Mechanical Theory of Heat; Hardpress: Sligo, Ireland, 2015.
- Clausius, R. Über verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie. Ann. Phys. **1865**, 125, 353–400.
- Planck, M. The Theory of Heat Radiation; The Maple Press: York, PA, USA, 1948.
- Tersoff, J.; Bayer, D. Quantum Statistics for Distinguishable Particles. Phys. Rev. Lett. **1983**, 50, 553–554.
- Niven, R.K.; Grendar, M. Generalized classical, quantum and intermediate statistics and the Polya urn model. Phys. Lett. A **2009**, 373, 621–626.
- Darwin, C.G.; Fowler, R.H. On the partition of energy. Philos. Mag. **1922**, 6, 450–479.
- Brillouin, L. Les Statistiques Quantiques et Leurs Applications; Les Presses Universitaires de France: Paris, France, 1930.
- Schrödinger, E. Statistical Thermodynamics, 1st ed.; Cambridge University Press: Cambridge, UK, 1948.
- Massieu, M. Thermodynamique: Sur les fonctions caractéristiques des divers fluides. Comptes Rendus **1869**, 69, 858–862. (In French)
- Darwin, C.G.; Fowler, R.H. On the partition of energy, Part II, Statistical principles and thermodynamics. Philos. Mag. **1922**, 6, 823–842.
- Schrödinger, E. What Is Life?; Cambridge University Press: Cambridge, UK, 1945.
- Nyquist, H. Certain factors affecting telegraph speed. Bell Syst. Tech. J. **1924**, 3, 324–346.
- Hartley, R. Transmission of Information. Bell Syst. Tech. J. **1928**, 7, 535–563.
- Tuller, W.G. Theoretical Limitations on the Rate of Transmission of Information. Proc. IRE **1949**, 37, 468–478.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Shannon, C.E. A mathematical theory of communication, Part 2. Bell Syst. Tech. J. **1948**, 27, 623–656.
- Szilard, L. On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings. Z. Phys. **1929**, 53, 840–856.
- Bennett, C.H. Demons, engines and the second law. Sci. Am. **1987**, 257, 108–116.
- Brillouin, L. The negentropy principle of information. J. Appl. Phys. **1953**, 24, 1152–1163.
- Brillouin, L. Maxwell’s Demon Cannot Operate: Information and Entropy. I. J. Appl. Phys. **1951**, 22, 334–337.
- Brillouin, L. Science and Information Theory; Academic Press: Waltham, MA, USA, 1956.
- Brillouin, L. Negentropy and Information in Telecommunications, Writing, and Reading. J. Appl. Phys. **1954**, 25, 595–599.
- Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. **1961**, 5, 183–191.
- Bennett, C.H. Notes on the history of reversible computation. IBM J. Res. Dev. **2000**, 44, 270–277.
- Khandekar, A.; McEliece, R.; Rodemich, E. The discrete noiseless channel revisited. Proc. ISCTA **1999**, 99, 115–137.
- Grandy, W.T. Entropy and the Time Evolution of Macroscopic Systems; Oxford University Press: New York, NY, USA, 2008.
- Hamming, R.W. Coding and Information Theory; Prentice-Hall: Upper Saddle River, NJ, USA, 1980.
- Jaynes, E.T. Information theory and Statistical Mechanics. Phys. Rev. **1957**, 106, 620–630.
- Jaynes, E.T. Note on Unique Decipherability. IRE Trans. Inf. Theory **1959**, 5, 98–102.
- Tribus, M. Information theory as the basis for thermostatics and thermodynamics. J. Appl. Mech. **1961**, 28, 1–8.
- Wannier, G.H. Statistical Physics, 1st ed.; Wiley & Sons: New York, NY, USA, 1966.

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Martinelli, M. Photons, Bits and Entropy: From Planck to Shannon at the Roots of the Information Age. *Entropy* **2017**, *19*, 341.
https://doi.org/10.3390/e19070341
