1. Preliminaries: Understanding Spinors and a New Approach to Quantum Mechanics
1.1. Clifford Algebra
The present paper is based on previous work of the author [1]. For the convenience of the reader, we provide in this section a minimum of information about that work, which formulates a new approach to quantum mechanics (QM) based on the geometrical meaning of spinors in group representation theory. This geometrical meaning of spinors is explained in [2]. I cannot insist enough that reference [2] contains information the reader is probably not aware of; reading at least pp. 3–16 of it is needed to make sense of the present paper and of the short introduction presented in Section 1. The basic underlying idea is that we generate the rotation group and the homogeneous Lorentz group from reflections. That was also Hamilton’s idea when he developed the quaternions.
From the reflections in
,
,
, we generate a group that contains not only the rotations of
but also reversals and reflections. By a reversal, we understand an operation obtained from an odd number of reflections. The rotations are obtained from an even number of reflections and form a subgroup of the group generated by the reflections. An analogous statement applies for the homogeneous Lorentz group, where we also call a reversal an operation obtained from an odd number of reflections. The operations obtained from an even number of reflections constitute the homogeneous Lorentz group which contains the group of rotations of
$\mathbb{R}^3$ as a subgroup. How the quest to find the representation matrices of the reflections leads to the definition of the Pauli matrices and the Dirac matrices is explained in Section 2.2 of [2]. Note that the development is algebraically equivalent to the way Dirac defined the Dirac matrices for the homogeneous Lorentz group. It is, however, conceptually completely different. Rather than trying to find some jaw-dropping square root of the Klein–Gordon equation, the true and geometrically clear issue is to define the reflection operators which can be used to generate the group. This gives an entirely different, geometrical meaning to the algebra, which is absent from Dirac’s approach and renders this algebra much clearer and more intuitive. The group is mathematically defined prior to any use of it in physics, and as we show below its representation theory contains only group elements. It does not contain vectors, four-vectors, four-gradients or knock-out square roots of d’Alembert operators. The development should be devoid of physical quantities such as ℏ, the electron rest mass $m_0$ and energy-momentum four-vectors, which do not have their place in a purely mathematical development of the group representation theory. For reasons of enhanced clarity and readability, we consistently use in our approach [1] a choice for the Dirac matrices that differs from Dirac’s and was introduced by Cartan in his monograph on spinors [3]:
This choice has the convenience that we can immediately spot whether a group element is obtained from an odd or an even number of reflections. The true Lorentz transformations have a block structure along the main diagonal, while the reversals have a block structure along the secondary diagonal. It will also allow us to spot immediately when we use a superposition state of a true Lorentz transformation and a reversal (see below), because then all four $2 \times 2$ blocks will be non-zero. In Dirac’s choice, making such distinctions is thwarted by the fact that not all four gamma matrices have their block structure along the same diagonal. We can use this alternative choice because it has been shown (by Pauli) that all valid choices for the gamma matrices are equivalent. Note that the information content of a column of a Lorentz transformation matrix corresponds to only four real parameters, while the definition of a general element of the Lorentz group requires six real parameters. In Chapter 4, p. 96 of [1], we discuss two $2 \times 2$ matrix representations SL(2,$\mathbb{C}$) of the Lorentz group which are based on the blocks and contain all six real parameters. In Section 2.2 of [2], we explain, starting from its Figure 1, how the product of two reflections in the rotation group defines a rotation and how this leads to the Rodrigues equation for a rotation matrix $\mathbf{R}$ in SU(2):
$$\mathbf{R}(\mathbf{s},\varphi) = \cos(\varphi/2)\,\mathbb{1} - \imath \sin(\varphi/2)\,[\,\mathbf{s}\cdot\boldsymbol{\sigma}\,]. \qquad (2)$$
Here, $\mathbf{s}$ is a unit vector along the axis of the rotation, $\varphi$ is the rotation angle, $\mathbb{1}$ is the $2 \times 2$ unit matrix and $\boldsymbol{\sigma}$ is a shorthand for the three Pauli matrices. The construction of the representation theory for the rotation groups in
, with
,
and the homogeneous Lorentz group is obtained by simply generalizing this idea, but in the present paper we focus only on the group of the rotations in $\mathbb{R}^3$ and on the homogeneous Lorentz group. As the reflections are defined by unit vectors $\mathbf{a}$ that are orthogonal to their reflection planes, we can use both $\mathbf{a}$ and $-\mathbf{a}$ to characterize the reflection, which is thus represented by both matrices $\pm[\,\mathbf{a}\cdot\boldsymbol{\sigma}\,]$. The result is that each group element is algebraically represented by two representation matrices. Hence, $\mathbf{R}$ and $-\mathbf{R}$ represent the same rotation: SU(2) is a double covering of SO(3). This is also true for the constructions in the Lorentz group. Here, the reflections are defined with respect to three-dimensional hyperplanes. They are the three canonical parity transformations P and the time reversal operation T. Using the same geometrical derivation within the Dirac representation as used in SU(2), it is easy to see that the rotation which corresponds to the SU(2) matrices $\pm\mathbf{R}$ is now represented by the two $4 \times 4$ matrices with the $2 \times 2$ block structure:
$$\pm\begin{pmatrix} \mathbf{R} & \\ & \mathbf{R} \end{pmatrix}. \qquad (3)$$
When we want to describe a spinning object at rest within the Dirac representation, it suffices to present the calculations in SU(2), because in the Dirac representation this would only amount to writing the same SU(2) matrix twice on the diagonal. (It is therefore a misconception to claim that the electron spin can only be correctly described within the relativistic framework of the Dirac representation.) As already mentioned, all the matrices we define in this way represent group elements.
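The construction of a rotation from two reflections, and the double covering, can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the matrix of a reflection with unit normal a to be a_x σ_x + a_y σ_y + a_z σ_z, lets a reflection act on a vector matrix V as V → −AVA, and assumes the sign convention cos(φ/2) 1 − i sin(φ/2) [s·σ] for the Rodrigues equation. All function names are hypothetical helpers, not notation from [1] or [2].

```python
import math

# Pauli matrices and the 2x2 unit matrix as nested lists
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
ID = [[1, 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lincomb(coeffs, mats):
    """Linear combination sum_k coeffs[k] * mats[k] of 2x2 matrices."""
    return [[sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(2)]
            for i in range(2)]

def vec(a):
    """Matrix a_x sigma_x + a_y sigma_y + a_z sigma_z of a vector or mirror normal."""
    return lincomb(a, (SX, SY, SZ))

def dagger(m):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[complex(m[j][i]).conjugate() for j in range(2)] for i in range(2)]

def rodrigues(s, phi):
    """cos(phi/2) 1 - i sin(phi/2) [s.sigma]; the sign convention is an assumption."""
    return lincomb((math.cos(phi / 2), -1j * math.sin(phi / 2)), (ID, vec(s)))

def components(m):
    """Read (v_x, v_y, v_z) back from a vector matrix via v_k = Tr(sigma_k m) / 2."""
    out = []
    for sigma in (SX, SY, SZ):
        p = matmul(sigma, m)
        out.append(((p[0][0] + p[1][1]) / 2).real)
    return out

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def rotate(R, v):
    """Two successive reflections act on a vector matrix V as V -> R V R^dagger."""
    return components(matmul(matmul(R, vec(v)), dagger(R)))

phi = 0.8
a = (1.0, 0.0, 0.0)                              # normal of the first mirror
b = (math.cos(phi / 2), math.sin(phi / 2), 0.0)  # second normal, at angle phi/2 from a
R = matmul(vec(b), vec(a))                       # reflect in a first, then in b

# The product of the two reflections is a rotation by phi around the z-axis
same_as_rodrigues = close(R, rodrigues((0.0, 0.0, 1.0), phi))

# R and -R rotate a vector in exactly the same way (double covering)
minus_R = lincomb((-1,), (R,))
image_plus = rotate(R, (1.0, 0.0, 0.0))
image_minus = rotate(minus_R, (1.0, 0.0, 0.0))
```

The two mirrors enclose an angle φ/2, while the resulting rotation angle is φ, which is where the half angles in the Rodrigues equation come from.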
As explained in Section 2.4 of [2], we can now also introduce a second algebra. This is a parallel formalism for vectors and multivectors. This formalism exploits the fact that we use the unit vector $\mathbf{a}$ to define the reflection operator. The matrices $[\,\mathbf{a}\cdot\boldsymbol{\sigma}\,]$ then occur in both algebras, such that the same algebraic quantity represents two geometrically completely different things: reflection operators in the first, pristine algebra of group elements, and vectors in the second algebra. The formalism of the second algebra is then easily extended to vectors which are not of unit length, and it is within this second algebra that Dirac’s approach is defined, as finding an algebra that linearizes the square root of a quadratic form. This misses a crucial point, because this second algebra does not refer to the group elements of the first algebra, such as rotations, which, as we show below, are essential for understanding the real meaning of the equation, which is that it expresses spinning motion. The consequence is that within this second algebra one obtains the Dirac equation without knowing what it means. It mystifies us by hiding what is really going on behind the scenes in the first algebra. Most of the time, it is not clearly pointed out that there are two different formalisms, such that the same algebraic expression can represent two different geometric objects. In the first, pristine algebra of group elements, the reflection operators are defined up to a sign, while in the second algebra the vectors are represented unambiguously.
By multiplying the matrices representing vectors, we obtain new quantities in the second algebra, which contain multivectors. The second algebra is thus an algebra of multivectors. The expressions we obtain in carrying out the matrix products can contain sums with terms of different types of multivectors, such that one has the impression that this is an algebra wherein we sum objects that we are not supposed to sum. It looks like summing kiwis and bananas. The definition of these awkward sums is in general the starting point of most texts about Clifford algebra. The stunning definition is introduced without any justification or discussion, which raises the question of whether this is really legitimate and makes one wonder what it might mean. It just descends from heaven. Our approach to the group representation theory makes it possible to understand where all this comes from. The strange definition relies on the algebraic feasibility of carrying out these operations within one and the same general matrix formalism. For example, the Rodrigues equation seems to be the sum of a scalar and an axial vector, which looks a priori absurd. In reality, we express the algebraic representations of geometrical objects of the first type (the group elements) in terms of algebraic representations of geometrical objects of the second type (the multivectors). It is all the consequence of the initial fact that reflections and vectors are represented by the same algebraic expression. The same is true mutatis mutandis in the Lorentz group.
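The appearance of such mixed sums can be made concrete with a small numerical check. The sketch below is an illustration, not taken from [2]: it multiplies the matrices of two arbitrary vectors a and b and separates the product into a multiple of the unit matrix, with coefficient a·b, and a traceless part, with coefficients i(a×b). It also shows that the square of a vector matrix is |a|² times the unit matrix, which is the sense in which the algebra linearizes the square root of a quadratic form. The vectors and helper names are hypothetical.

```python
# Pauli matrices as nested lists
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def vec(a):
    """Matrix a_x sigma_x + a_y sigma_y + a_z sigma_z representing the vector a."""
    ax, ay, az = a
    return [[az, ax - 1j * ay], [ax + 1j * ay, -az]]

def trace(m):
    return m[0][0] + m[1][1]

a = (1.0, 2.0, 3.0)    # arbitrary example vectors, not of unit length
b = (-2.0, 0.5, 4.0)

M = matmul(vec(a), vec(b))

# Part of M proportional to the unit matrix: the scalar a.b
scalar_part = (trace(M) / 2).real
# Traceless part of M: i times the axial vector a x b
axial_part = [(trace(matmul(sigma, M)) / 2j).real for sigma in (SX, SY, SZ)]

# Reference values computed with ordinary vector algebra
dot = sum(x * y for x, y in zip(a, b))
cross = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]

# The square of a vector matrix is |a|^2 times the unit matrix
square = matmul(vec(a), vec(a))
norm_squared = sum(x * x for x in a)
```

A single matrix product thus carries a scalar and an axial vector at the same time, which is the kind of mixed sum discussed above.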
Note that the notation $[\,\mathbf{a}\cdot\boldsymbol{\sigma}\,]$, which is used in the algebra, is misleading. It is a shorthand for $a_x\sigma_x + a_y\sigma_y + a_z\sigma_z$, which represents the vector $\mathbf{a}$, while the analogy with the notation for a scalar product it thrives on might make you think that it is a scalar, viz. the scalar product of a vector $\mathbf{a}$ with some “vector” $\boldsymbol{\sigma}$. However, the shorthand $\boldsymbol{\sigma}$ is not a vector of $\mathbb{R}^3$; it represents the trivector
, i.e., the triad of the three basis vectors of
. As already stated,
stands for
. Similar remarks apply mutatis mutandis in the Lorentz group, e.g.,
does not represent a scalar but the vector
. This becomes very important in
Section 2.2.
In SU(2), the $2 \times 2$ rotation matrix can be represented by its first column without any loss of information, as explained in Equation (4) of [2]. This $2 \times 1$ column matrix is a spinor. In SU(2), a spinor thus represents a rotation. A spinor is a rotation. The second column of the SU(2) rotation matrix is called the conjugated spinor, and as we show below it corresponds to a reversal (see Appendix A). That a spinor in SU(2) is just a rotation is much easier to understand than the textbook narrative that a spinor would be the square root of a vector. That relation is nevertheless also explained in Section 2.5 of [2]. We explain there that this idea cannot be fully generalized to
. The fact that in the Dirac equation we use superpositions of states (see below) has as a consequence that the column vectors still represent the complete information about the six parameters which define a general group element, as discussed on pp. 163–166 of [
1]. For this reason, these column matrices are called bi-spinors.
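That the first column of an SU(2) matrix carries the full information about the rotation can be illustrated as follows. This is a minimal sketch which assumes the sign convention cos(φ/2) 1 − i sin(φ/2) [s·σ] for the Rodrigues equation; the helper names are hypothetical. It rebuilds the whole matrix from the spinor (ξ0, ξ1) as [[ξ0, −ξ1*], [ξ1, ξ0*]] and checks that the spinor has unit norm.

```python
import math

def rodrigues(s, phi):
    """SU(2) matrix cos(phi/2) 1 - i sin(phi/2) [s.sigma] (assumed sign convention)."""
    c, n = math.cos(phi / 2), math.sin(phi / 2)
    sx, sy, sz = s
    return [[c - 1j * n * sz, -1j * n * (sx - 1j * sy)],
            [-1j * n * (sx + 1j * sy), c + 1j * n * sz]]

def first_column(R):
    """The spinor: first column of the rotation matrix."""
    return (R[0][0], R[1][0])

def from_spinor(xi):
    """Rebuild the full SU(2) matrix from its first column (xi0, xi1)."""
    xi0, xi1 = xi
    return [[xi0, -xi1.conjugate()],
            [xi1, xi0.conjugate()]]

s = (2 / 7, 3 / 7, 6 / 7)    # unit vector: 4 + 9 + 36 = 49
phi = 1.3                    # arbitrary rotation angle

R = rodrigues(s, phi)
xi = first_column(R)
rebuilt = from_spinor(xi)

# Largest deviation between the original matrix and the one rebuilt from the spinor
max_deviation = max(abs(R[i][j] - rebuilt[i][j]) for i in range(2) for j in range(2))

# A spinor of SU(2) has unit norm
norm = abs(xi[0]) ** 2 + abs(xi[1]) ** 2
```

The second column contains no independent parameters: it follows from the first one by complex conjugation and a sign change.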
We can take advantage of this remark about superpositions of states to address an important issue. Group theory is based on products of group elements. Sums of group elements are in general not defined. Spinors are not elements of a vector space but of a curved manifold. For this reason, summing spinors is not a defined operation. However, in QM, we are making linear combinations of spinors all the time. We must thus justify this use, because QM leads to meaningful results. It is explained in Section 2.3 of [2] that we can give such sums a meaning in terms of sets of group elements. In QM, these sets become statistical ensembles of physical states. This leads us naturally to a statistical interpretation of QM as proposed by Ballentine [4]. The strong point of our approach is that it underpins Ballentine’s interpretation, because his rules are now mathematically derived from the group theory (and from the construction of the Dirac equation from scratch in [1], sketched briefly in Section 1.2). Another strong point is that we can read what is happening in the physics from the geometry that corresponds to the algebra of the equations. Spinors offer us a key to understanding QM. In fact, we show below that the free-space Dirac equation just describes a statistical ensemble of spinning electrons in uniform motion. This is an insight the traditional approach denies us (because it is based on the second algebra and lacks the insight provided by the first algebra). The discussion in terms of sets can also be used to derive a Born rule (see [5], pp. 1–2, and [6], p. 25). It is based on associating each electron with a spinor that describes its state. The spinors $\psi$ of SU(2) satisfy the identity $\psi^{\dagger}\psi = 1$. If we have to count electrons in some formalism based on SU(2), we should thus use $\psi^{\dagger}\psi$.
1.2. Use of the Clifford Algebra to Derive the Dirac Equation from Scratch
We can use SU(2) and its spinors to represent spinning motion. Throughout the paper, we use the notation for the set of functions whose definition domain is the set A and which take values in the set B. In the same way as we use vector functions : to describe orbits in classical mechanics, we can describe the spinning motion of a spinning object as a top or a particle in its rest frame by a spinor function : , where is the proper time. In classical mechanics, we use to make the link between the geometrical parameters and the physical parameters. In relativity, we rather use . In QM, we need to describe spinning motion. The mathematical tool to do this is the spinor. Now, we use an equation for to make the link between the geometrical and the physical parameters. This link is provided completely at the end by introducing the minimal substitution in order to make the step from the free-space Dirac equation to the Dirac equation for an electron that moves in an electromagnetic field. The minimal substitution is not entirely rigorous because it only addresses the boost part of the spinor, but we cannot discuss this here.
As explained above, with Equation (3), for a spinning object at rest, we can derive the equations in SU(2) first. When we have obtained a feeling for the formalism in SU(2), we can then first lift it to the Dirac representation and then generalize it for a moving electron by covariance. The last step consists in introducing and attempting to justify the minimal substitution. Note that, in our derivation of the Dirac equation from scratch, we do not explain why the electron spins. Perhaps in the future somebody will be able to explain why the electron spins on the basis of a dynamical model for the electron. However, we do not know anything about this issue, and in traditional QM we do not even know that the issue exists. We are in a position of total ignorance similar to that of Newton, who introduced the expression for the gravitational force ex nihilo and could only lament that he did not understand how this force could act at a distance. Similarly, we introduce ex nihilo the ansatz that the electron spins and then show that one can derive the Dirac equation just starting from this basic assumption, which we laconically introduce without any further justification. The starting ansatz thus fulfills the same role in our approach as an axiom in mathematics. Historically, the intuition that the electron may spin has been around from the beginning, but this has been firmly denied by the standard dogma. Our approach, which contradicts the standard dogma, cannot be criticized from the standpoint of the traditional approach to QM, because it is a competing theory whose algebraic results are identical to those of the traditional approach.
The following derivation of the free-space Dirac equation from scratch is discussed in [1], especially on pp. 153–168, with additions scattered over various papers (the Appendix of [5], pp. 1–2 of [6]). For this reason, we provide here a synopsis that can serve as a guide for further study. This synopsis can only be presented in the form of a mere sketch. It is impossible to present the full argument in the present paper because it would require incorporating a large part of the monograph [1]. Nobody would like to read such a long and technical paper. Moreover, deriving the Dirac equation is not the scope of the present paper. In what follows, there are thus some gaps that can be filled by reading [1].
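We start from the Rodrigues equation, Equation (2), and replace the rotation angle by $\varphi = \omega_0 \tau$, where $\tau$ is the proper time. This now describes the spinning motion of an object, e.g., a particle or a top. The time derivative of $\mathbf{R}$ yields:
$$\frac{d}{d\tau}\mathbf{R} = -\imath\,\frac{\omega_0}{2}\,[\,\mathbf{s}\cdot\boldsymbol{\sigma}\,]\,\mathbf{R}, \qquad \frac{d}{d\tau}\psi = -\imath\,\frac{\omega_0}{2}\,[\,\mathbf{s}\cdot\boldsymbol{\sigma}\,]\,\psi, \qquad (4)$$
where $\mathbf{s}$ is the unit vector along the spin axis and the $2 \times 1$ spinor $\psi$ is the first column of $\mathbf{R}$. Note that, to derive Equation (4) from Equation (2), we must assume that $d\mathbf{s}/d\tau = 0$; otherwise the equation will contain extra terms and become considerably more complicated. In other words, we introduce the underlying assumption that the orientation of the spin axis remains fixed. We must thus remember in the further derivation of the Dirac equation that it is only valid for a spinning electron with a fixed orientation of the axis of its spinning motion. The case where the spin axis precesses is a priori not covered by this derivation.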
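The time derivative of the rotation matrix for a fixed spin axis can be verified numerically. The sketch below is an illustration under assumptions: it uses the sign convention cos(φ/2) 1 − i sin(φ/2) [s·σ] for the Rodrigues equation, writes the angular frequency as `omega`, and compares a central finite difference of R(t) with −i(ω/2)[s·σ]R(t). The numerical values and helper names are hypothetical.

```python
import math

def s_dot_sigma(s):
    """Matrix s_x sigma_x + s_y sigma_y + s_z sigma_z."""
    sx, sy, sz = s
    return [[sz, sx - 1j * sy], [sx + 1j * sy, -sz]]

def rodrigues(s, phi):
    """cos(phi/2) 1 - i sin(phi/2) [s.sigma] (assumed sign convention)."""
    c, n = math.cos(phi / 2), math.sin(phi / 2)
    S = s_dot_sigma(s)
    return [[c * (i == j) - 1j * n * S[i][j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

omega = 2.0                  # angular frequency of the spinning motion (arbitrary)
s = (2 / 7, 3 / 7, 6 / 7)    # fixed unit vector along the spin axis

def R_of_t(t):
    """Rotation matrix of the spinning object at time t, with phi = omega * t."""
    return rodrigues(s, omega * t)

t, h = 0.37, 1e-5
R_plus, R_minus = R_of_t(t + h), R_of_t(t - h)

# Central finite difference approximating dR/dt
numeric = [[(R_plus[i][j] - R_minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

# Closed form that holds when the axis s does not depend on time
SR = matmul(s_dot_sigma(s), R_of_t(t))
analytic = [[-1j * omega / 2 * SR[i][j] for j in range(2)] for i in range(2)]

max_error = max(abs(numeric[i][j] - analytic[i][j])
                for i in range(2) for j in range(2))
```

If the axis were allowed to depend on time, the finite difference would pick up additional terms from the derivative of s, and the simple closed form would no longer match.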
We thus cannot use the Dirac equation to study precession, a limitation one cannot become aware of if one just follows Dirac’s derivation of his equation.
Equations (5)–(8) present a first intuition about a possible roadmap for deriving the equation. However, the whole rationale would fall apart in the face of mathematical rigor. Nevertheless, the intuition is right and we show how we can repair the mistakes in order to obtain a rigorous mathematical proof. For
we have
. We obtain then:
In general,
, because a reversal (obtained by an odd number of reflections) can never be equal to a rotation (obtained by an even number of reflections). In general, we also have
, such that Equation (
5) is simply wrong in general. It is a one-time, punctual coincidence we discuss more in detail in
Appendix A. However, let us imagine that we can obtain an equation
that is generally true in SU(2) anyway, such that Equation (
5) also becomes true in general. If we now postulate
, we then obtain:
We can now lift this result to the Dirac representation. Now,
Hence, the meaning of the “square root” of the d’Alembert operator is:
The right-hand side of this equation corresponds thus to the partial derivative with respect to the proper time expressed in a frame wherein the electron is no longer at rest. We see that combining this with Equation (
6) may lead to a derivation of the Dirac equation from scratch, where all we assume is that the electron spins around a fixed axis with a frequency
and that
. The free-space Dirac equation could be obtained this way by covariance from the equation of the electron in its rest frame.
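Why the shortcut only works for a special choice of axis can be seen in a small numerical experiment. The sketch below is an interpretation, not a transcription of Equations (5)–(8): it assumes the sign convention cos(φ/2) 1 − i sin(φ/2) [s·σ], takes the spinor to be the first column of the rotation matrix, and tests whether [s·σ]ψ = ψ. This holds when the axis is the z-axis, where the spinor is (e^{−iωt/2}, 0), but fails for the x-axis, where both exponentials are present in the first column. All names and values are hypothetical.

```python
import cmath
import math

def s_dot_sigma(s):
    """Matrix s_x sigma_x + s_y sigma_y + s_z sigma_z."""
    sx, sy, sz = s
    return [[sz, sx - 1j * sy], [sx + 1j * sy, -sz]]

def spinor(s, phi):
    """First column of cos(phi/2) 1 - i sin(phi/2) [s.sigma] (assumed convention)."""
    c, n = math.cos(phi / 2), math.sin(phi / 2)
    sx, sy, sz = s
    return [c - 1j * n * sz, -1j * n * (sx + 1j * sy)]

def apply(m, psi):
    """Apply a 2x2 matrix to a two-component column."""
    return [m[0][0] * psi[0] + m[0][1] * psi[1],
            m[1][0] * psi[0] + m[1][1] * psi[1]]

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

omega, t = 2.0, 0.7          # arbitrary frequency and time
E_Z = (0.0, 0.0, 1.0)
E_X = (1.0, 0.0, 0.0)

psi_z = spinor(E_Z, omega * t)   # equals (exp(-i omega t / 2), 0)
psi_x = spinor(E_X, omega * t)   # equals (cos(omega t / 2), -i sin(omega t / 2))

# For the z-axis the operator [s.sigma] leaves the spinor unchanged ...
fixed_for_z = close(apply(s_dot_sigma(E_Z), psi_z), psi_z)
# ... but for the x-axis it does not
fixed_for_x = close(apply(s_dot_sigma(E_X), psi_x), psi_x)

single_exponential = abs(psi_z[0] - cmath.exp(-1j * omega * t / 2)) < 1e-12
```

Only in the first case does the matrix in front of the spinor in the time derivative reduce to a plain number.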
This appears great but we show below that it contains a hidden error. Let us first repair the fact that, in general,
. We see that the cheat of taking
combined with the use of the spinor
has produced the miracle that we can use
and more generally
as an energy operator. We could not have defined this energy operator if we had kept working with
even for the case
. That all this lacks generality is also obvious from the fact that in general both exponentials
and
will occur in the first column of the rotation matrix, as can be seen, e.g., in the example of Equation (
13) below. Nevertheless, we can still satisfy Equation (
6) if we replace the pure state
by a superposition state
defined by:
As explained above, this superposition transforms immediately the theory into a statistical theory, where
represents a statistical ensemble wherein half of the electrons are in the state
(which is a rotation) and half of them in the state
(which is a reversal). We call the mixed state
in an abus de langage also a spinor function. The ensemble is defined by its energy and its rotation axis, whereby the state can be a rotation or a reversal. On the new states
, the operator
can function again as an energy operator. All this can also be developed in the Dirac representation and be generalized by covariance. We have done this in [1] on pp. 153–168. As mentioned above towards the end of Section 1.1, where we discuss bi-spinors, it also requires introducing a superposition state, as explained on pp. 162–166 of [1].
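The effect of the superposition can be illustrated numerically. The sketch below rests on an explicit assumption: it takes the superposition state to be proportional to ψ + [s·σ]ψ, with an arbitrary normalization factor 1/2, which may differ from the exact form of Equation (9). With the sign convention cos(φ/2) 1 − i sin(φ/2) [s·σ], it checks for a general axis s that [s·σ] leaves this state unchanged and that its time derivative is −i(ω/2) times the state itself. All names and values are hypothetical.

```python
import math

def s_dot_sigma(s):
    """Matrix s_x sigma_x + s_y sigma_y + s_z sigma_z."""
    sx, sy, sz = s
    return [[sz, sx - 1j * sy], [sx + 1j * sy, -sz]]

def spinor(s, phi):
    """First column of cos(phi/2) 1 - i sin(phi/2) [s.sigma] (assumed convention)."""
    c, n = math.cos(phi / 2), math.sin(phi / 2)
    sx, sy, sz = s
    return [c - 1j * n * sz, -1j * n * (sx + 1j * sy)]

def apply(m, psi):
    """Apply a 2x2 matrix to a two-component column."""
    return [m[0][0] * psi[0] + m[0][1] * psi[1],
            m[1][0] * psi[0] + m[1][1] * psi[1]]

omega = 2.0                  # arbitrary angular frequency
s = (1 / 3, 2 / 3, 2 / 3)    # general unit vector: 1 + 4 + 4 = 9

def chi(t):
    """Assumed superposition (psi + [s.sigma] psi) / 2; normalization is arbitrary."""
    psi = spinor(s, omega * t)
    reversed_psi = apply(s_dot_sigma(s), psi)
    return [(psi[k] + reversed_psi[k]) / 2 for k in range(2)]

t, h = 0.45, 1e-5
state = chi(t)

# [s.sigma] acts as the identity on the superposition, because [s.sigma]^2 = 1
mapped = apply(s_dot_sigma(s), state)
invariance_error = max(abs(mapped[k] - state[k]) for k in range(2))

# Central finite difference of chi compared with -i (omega / 2) chi
plus, minus = chi(t + h), chi(t - h)
numeric = [(plus[k] - minus[k]) / (2 * h) for k in range(2)]
analytic = [-1j * omega / 2 * state[k] for k in range(2)]
derivative_error = max(abs(numeric[k] - analytic[k]) for k in range(2))
```

On this state, the time derivative produces a plain number in front of the state for any orientation of the axis, not only for the z-axis.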
The hidden error mentioned above is a much more surreptitious problem. Its treatment is also tedious due to its technicality. We therefore relegate it to Appendix B. Preferably, the reader should read Appendix B immediately and then come back here to the main text. Appendix B leads further to the insight that the Dirac equation describes a superposition state that must be interpreted statistically, as proposed by Ballentine [4]. Equation (A8) in Appendix B can be combined with lifting the steps in going from Equation (4) to Equation (6) to the Dirac representation, so as to yield the Dirac equation for an electron at rest. This in turn can then be transformed into the general free-space Dirac equation by covariance, as fully described in [1]. It is only after publishing [1] that we explicated the rigorous steps that are needed to extend the definition domain of the differential equation in Equation (9) from
to
(as described in
Appendix B and discussed in [6]). In [1], the analog within the Dirac representation of the superposition defined in Equation (9) is also discussed on pp. 162–166. Our derivation thus shows clearly that the free-space Dirac equation describes a statistical ensemble of spinning electrons in uniform motion. The minimal substitution required to study electrons in an electromagnetic field is also discussed in [1].
Let us now think of tops that are spinning clockwise or counterclockwise around an axis with angular frequencies . They obviously have the same energy. We can see from this that the energy must be rather than , which settles the riddle of the negative frequencies. As we can extrapolate the equations from SU(2) to the Dirac representation, we see that this must also be true within the context of the Dirac equation. The net energy needed to make the transition between the two states with algebraic energies is not . First, the state must lose its energy to grind its spinning motion to a halt. Then, we can start to make it spin in the opposite sense. It must then regain the same energy to recover the spinning motion with the opposite algebraic angular frequency. The net change of energy required is thus zero. On the other hand, when a positron and electron annihilate, we do not obtain a zero energy but two gamma rays of 511 keV each, which shows that the identification of negative energies with antiparticles is not justified.
We may observe further that SU(2) does not contain antiparticles. Our whole derivation is based on SU(2), and what does not go into a mathematical formalism cannot come out of it by magic. Similarly, the gauge symmetry used to justify the identification with antiparticles is not used in the derivation. Hence, once again, what does not come in cannot come out by magic. In our approach, we do not introduce the notion that negative frequencies correspond to antiparticles. Dirac did not originally introduce that notion either. We could of course introduce antiparticles and associate them with negative frequencies a posteriori. However, a negative frequency would then correspond to two different physical states. We have no need at all for such ambiguity and can therefore forget about the whole idea of negative energies. In our approach, everything becomes more logical and clear.
1.3. Consequences
Even if we cannot give all the details about it in the present paper, we have derived in [1,6] the Dirac equation meticulously from scratch with the absolute rigor of a mathematical proof. The Schrödinger equation can be derived from the Dirac equation. Hence, the spinor approach contains the basis for a lot of QM. The derivation of the Schrödinger equation introduces approximations that break the symmetry of the Dirac equation, rendering QM actually more difficult to understand. It is somewhat analogous to replacing
by announcing the less accurate identity
, with the effect that some people would no longer make the connection to the foundational idea with its perfect symmetry. Therefore, it is important to base our approach on the derivation of the Dirac equation to make the all-important role of the symmetry completely shine through in all its dazzling beauty. Rather than on some incredible, arcane “intuition”, our derivation sketched above is based on very simple ideas. Instead of deriving the energy and momentum operators from substitutions
$E \rightarrow -\frac{\hbar}{\imath}\frac{\partial}{\partial t}$, $\mathbf{p} \rightarrow \frac{\hbar}{\imath}\nabla$, which extrapolate a result obtained by educated guessing from the de Broglie ansatz for a scalar wave function, which itself was also guessed, we obtain it here by a very different, much more logical derivation. There is no guessing in our derivation; it just rolls out from the combination of Equations (6) and (8). In fact,
are the parameters
which define a boost. Any four-vector can this way be used to define a boost, which explains why there exists a special relation in QM between the energy-momentum four-vector and the four-gradient.
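The link between a four-vector and a boost can be illustrated with elementary relativistic kinematics. The sketch below is an assumption about what is meant here: it normalizes the energy-momentum four-vector (E, cp) by its invariant length, which gives the pair (γ, γβ) that characterizes a boost along the direction of motion. The function name and the numerical values are hypothetical; the rest energy of 511 keV is the rounded value quoted above for the annihilation gamma rays.

```python
import math

def boost_parameters(E, cp):
    """Return (gamma, gamma * beta) for a timelike four-vector (E, cp).

    E and cp must be given in the same energy units. Only the magnitude of
    the momentum is used, so the boost direction is not represented here.
    """
    rest = math.sqrt(E * E - cp * cp)   # invariant length, i.e. the rest energy
    return E / rest, cp / rest

REST_ENERGY_KEV = 511.0    # electron rest energy, rounded

beta = 0.6                                    # arbitrary velocity in units of c
gamma = 1.0 / math.sqrt(1.0 - beta * beta)    # Lorentz factor for this velocity

E = gamma * REST_ENERGY_KEV                   # total energy in keV
cp = gamma * beta * REST_ENERGY_KEV           # momentum times c in keV

g, gb = boost_parameters(E, cp)

# The pair (g, gb) lies on the unit hyperbola g^2 - gb^2 = 1
hyperbola_defect = g * g - gb * gb - 1.0
```

The same normalization can be applied to any timelike four-vector, which is the sense in which such a four-vector can be used to define a boost.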
There have been many attempts to make sense of QM, e.g., the many-worlds interpretation [7], Bohm’s approach [8] and Cramer’s transactional interpretation [9], just to name a few of them. Such attempts often introduce some new physical idea with classically forbidden traits. According to one’s personal taste, one will consider these transgressions of the classical common sense as credible or otherwise. When this physical idea is hard to verify, in the end, one is still left wondering if it is true or otherwise. We are walking on eggshells. The approach described here tries to avoid at all cost introducing physical ideas whose truth is hard to decide upon. The starting point is figuring out the geometrical meaning of spinors. This is pure mathematics and not open to discussion. It can only be right or wrong, and the reader can figure this out for himself by reading at least pp. 3–16 of [2], which gives all the details. He would see that it gives a very clear intuition for what spinors are. Once he has picked this up, the meaning of QM will just unfold itself. This is because the geometrical meaning of the algebra used in QM, which is Clifford algebra, is already given by the mathematics of the group theory itself prior to any application of the algebraic part of it to physics. Understanding the geometrical meaning of the algebra boils down to understanding the group theory and spinors. The reader can acquire this understanding by reading [2]. This will provide him with the key to make sense of the algebra of QM. There is thus no need for introducing puzzling additional physical assumptions in order to unravel the mysteries of QM. All we need is already contained in the mathematics. Understanding spinors even makes it possible to spot and correct flaws in the traditional theory.
What we gain in our new approach is that we know exactly which ingredients are used to derive the equation. In Dirac’s approach, one is left free to imagine that some very special quantum axioms may be needed to derive it, because one just does not know what the underlying axioms are and the experimental results it describes are baffling. This of course leaves the door open for introducing the destabilizing speculative ideas mentioned above. In Dirac’s approach, we also remain in the dark as to the geometrical meaning of the equation. The reader may be stunned by the fact that the derivation of this eminently quantum mechanical equation is purely classical. It may leave him incredulous. Where does the quantum magic then come from? This is discussed in great detail in [1,6]. The way we are able to derive the Dirac equation calls for caution. If the equation can be derived from such simple assumptions, several deductions drawn from the traditional approach may be overinterpretations that are just not warranted.
It turns out that, if one masters the group theory, one can derive many results of QM by just classical reasoning. The quantum mysteries disappear and the theory becomes intuitive and intelligible. We therefore undertake the quest to spot a phenomenon where we become obliged to introduce some quantum magic anyway. Some salient examples of our results are the derivation of the Dirac equation from scratch (with full details in [1] and an addition in [6]), the solution of the particle-wave duality in [6,10], the solution of the paradox of Schrödinger’s cat in Section 2.3.2 of [2] and in [5], and an explanation for the double-slit experiment in [6,10] (which can be further enriched by using the Appendix of [5] to deal with incoherent sources), but there are many more. I have never addressed tunneling because the work of Hansen and Ravndal [11] already explains it perfectly. These successes are obtained without the counterintuitive physical assumptions that are introduced in some other approaches. The latter assumptions thus introduce mystery and magic without a valid reason and are therefore misleading. In general, there is much less magic than we are used to thinking, and that is what makes our approach so interesting. The present paper shows how our approach also makes it possible to make sense of the Stern–Gerlach experiment.
Traditional QM has been discovered with rather stunning serendipity. Dirac just guessed his equation and many other rules were introduced ad hoc. Despite the shifting grounds of these shaky foundations, QM has proved extremely successful. However, I must insist on warning the over-sceptical reader that he cannot attack my work by using the traditional textbook wisdom as the ultimate touchstone for the truth, e.g., when my work flies in the face of accepted notions or if it draws him out of his comfort zone. That is because my approach, which is a reconstruction of QM from scratch based on the geometrical meaning of spinors, should be considered as a competing theory which leads to the same algebraic results. Competing theories cannot be compared by considering one of them as the absolute truth. The comparison must be based on other merits. Here, these merits are not the agreement of the algebra with the experimental results, because the algebra remains the same. The merit of my approach is that it is based on an already pre-existing clear geometrical meaning of that algebra, provided by the group theory, whereby the results are mathematically derived and proved. It must therefore a priori be considered as superior to the traditional approach, as a viewpoint that is developed from guesses and rather uses the algebra as a black box (under the motto “Shut up and calculate!”) cannot seriously pretend to prevail with authority over an approach based on mathematical derivations and proofs.
In this paper, I have to continue pointing out errors in the traditional theory as already done in the preceding subsections. The fact that I insist on pointing out errors may upset some readers. It may appear as a rant based on sheer arrogance or a lack of respect for Dirac. However, the true issue can only be that my work is an alternative approach to QM, which makes a lot of things that appeared mysterious intelligible. It is absolutely crucial to delineate what is wrong and what is right in this approach when it contradicts accepted notions of the traditional approach or else this would lead to confusion. Solving paradoxes requires pinpointing and neutralizing subliminal logical errors with surgical precision. Nobody is served with keeping such errors concealed, especially since QM is fraught with paradoxes. I cannot lie for reasons of respect. If people want to understand QM, they will have to accept that it may take correcting for mistakes. You cannot ask for a better understanding of QM and postulate at the same time that everything that is different from what you have learned must be wrong. What you have learned is not sacred and it is responsible for the conceptual impasse we are in. Pointing out errors and the differences between the new approach and the traditional approach is a necessary part of comparing them, especially in situations where we encounter conceptual difficulties in the traditional interpretation that can be solved in the new one. Yes, the new approach is non-canonical and it is easy to pooh-pooh it for that reason, although the development in this paper shows that it is in reality its strength. However, is it not madness to still think after hundred years that one would be able to break away from the conceptual difficulties we have in making sense of QM by just sticking to the traditional canonical approach? As identical causes ought to produce identical effects, the breakthrough may just have to come from a non-canonical approach.
1.4. Breakdown of the Standard Dirac Formalism in the Case of Precession
As noted above, all the preceding derivations assume that the orientation of the spin axis remains fixed. Hence, a priori the Dirac equation cannot be used to describe more complicated motions such as precession. This is something one cannot become aware of in Dirac’s approach. On pp. 313–316 of [
1], we have expressed such a precessing motion in SU(2). We have taken the expression for a spinning motion with an angular frequency
around a general spin axis defined by the unit vector
with spherical coordinates
. Note that we are using
and
(defined in Equation (
2)) as two different symbols in this article. Then, we have considered what we get by rotating this spinning object bodily around the
z-axis with a frequency
. At the moment
, a non-precessing spinning top is represented by the rotation matrix
. As what we get with precession has during the time
bodily been rotated around the
z-axis with a frequency
, we obtain for the precessing spinning top:
The matrix
describes the rotational motion around the
z-axis. The detailed expression for
is given by Equation (
13) in
Section 3. If the reader has doubts about the correctness of Equation (
10), he should think about two identical spinning tops, one that stays at a fixed position in space with respect to the center of the Earth and one that co-rotates with the Earth around its axis. Now, we can differentiate Equation (
10) with respect to
. We have made this calculation on pp. 313–316 of [
1] and it yields:
where
and
. A first observation here is that we no longer obtain a scalar in front of
(or its spinor
) by using
but a vector, while the energy definitely should be a scalar. There is thus definitely something wrong with the traditional energy operator in the new context which goes beyond the domain of applicability of the formalism of the Dirac equation and can therefore only be treated by a non-canonical approach. The prescription
for the energy operator ceases to be valid in the extended setting.
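The loss of a scalar prefactor can be checked numerically. The following is a minimal sketch and not the calculation of [1]: it assumes the standard SU(2) Rodrigues form cos(φ/2) 1 − ı sin(φ/2) [s·σ], assumes that precession is represented by left multiplication with a rotation about the z-axis, and sets ħ = 1. The names `omega` (spin frequency), `big_omega` (precession frequency) and all function names are hypothetical. Under these assumptions, ı d/dt applied to the precessing matrix returns that matrix multiplied by a non-diagonal, time-dependent matrix of the form ½ [(Ω e_z + ω s(t))·σ], i.e., by a vector-type quantity rather than a scalar.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Hermitian conjugate; for a unitary matrix this is the inverse."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def s_dot_sigma(s):
    """[s.sigma] for a real vector s = (sx, sy, sz), standard Pauli matrices."""
    sx, sy, sz = s
    return [[complex(sz, 0.0), complex(sx, -sy)],
            [complex(sx, sy), complex(-sz, 0.0)]]

def rodrigues(s, angle):
    """Assumed SU(2) Rodrigues form: cos(angle/2) 1 - i sin(angle/2) [s.sigma]."""
    c, sn = math.cos(angle / 2), math.sin(angle / 2)
    m = s_dot_sigma(s)
    return [[c * (i == j) - 1j * sn * m[i][j] for j in range(2)]
            for i in range(2)]

def precessing_top(s, omega, big_omega, t):
    """Spin about s at rate omega, rotated bodily about z at rate big_omega."""
    return mat_mul(rodrigues((0.0, 0.0, 1.0), big_omega * t),
                   rodrigues(s, omega * t))

def generator(s, omega, big_omega, t, h=1e-6):
    """i (dP/dt) P^dagger, with dP/dt taken from a central finite difference."""
    plus = precessing_top(s, omega, big_omega, t + h)
    minus = precessing_top(s, omega, big_omega, t - h)
    d = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
         for i in range(2)]
    g = mat_mul(d, dagger(precessing_top(s, omega, big_omega, t)))
    return [[1j * g[i][j] for j in range(2)] for i in range(2)]

def expected_generator(theta, phi, omega, big_omega, t):
    """(1/2) [(big_omega e_z + omega s(t)).sigma], s(t) being the precessed axis."""
    s_t = (math.sin(theta) * math.cos(phi + big_omega * t),
           math.sin(theta) * math.sin(phi + big_omega * t),
           math.cos(theta))
    v = (omega * s_t[0], omega * s_t[1], big_omega + omega * s_t[2])
    m = s_dot_sigma(v)
    return [[0.5 * m[i][j] for j in range(2)] for i in range(2)]

# Illustrative values only (hypothetical, chosen for the check).
theta, phi, omega, big_omega, t = 0.7, 0.3, 2.0, 0.5, 1.3
s = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))
g = generator(s, omega, big_omega, t)
e = expected_generator(theta, phi, omega, big_omega, t)
max_diff = max(abs(g[i][j] - e[i][j]) for i in range(2) for j in range(2))
```

With a tilted axis the off-diagonal entries of `g` do not vanish, so no single scalar can be read off as an energy eigenvalue. For a spin axis along z and no precession, the same function returns a diagonal matrix with entries ±ω/2.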
We are confronted with an analogous situation in Equation (
4) with its wave function
. We can recover the energy operator by replacing
by
as defined in Equation (
9). When in this formalism, we detect an electron with an angular frequency
in the state
the theory cannot tell us with certainty if it is in the state
or
. It is crucial to acknowledge that Equation (
4) is also a correct equation. It just does not yield the Dirac equation of QM, while our aim is validating our approach to the meaning of QM by deriving the Dirac equation. It is only to achieve this goal that we introduce
by Equation (
9).
We cannot apply the traditional energy operator on the wave function
of Equation (
4) because it does not yield the nice result
but
. The correct energy operator to be used with Equation (
4) would be
. This is feasible because
is constant anyway. However, there is no necessity to obtain
as the energy operator for a wave equation apart from the desire to stay within the formalism of the Dirac equation. When we want to describe precession, we are beyond the scope of the Dirac equation and there is no longer a gimmick that can help us to preserve the definition of the energy operator under the form
and drag the equation back into the field of applications of the Dirac equation. This is because there are now in any case angular frequencies with two different absolute values
or
occurring within a single column of
(see
Section 3), such that the energy operator can no longer project out a scalar energy eigenvalue in front of
or its spinor
. This is all fair enough. There is nothing wrong with it. It just signals that we are outside the scope of the Dirac equation, in the same way as Equation (
4) was outside the scope of the Dirac equation. However, we are not outside the broader scope of the group theory, which is our conceptual basis to formulate QM. The matrix
now describes a superposition state that contains four different algebraic angular frequencies in total.
Since in our non-canonical approach outlined in
Section 1 we gain a complete geometrical understanding of the ingredients that are needed to derive the Dirac equation, we can now derive a completely novel formalism within the same framework of group representation theory to deal with this new situation. Our hands are not tied to the canonical formalism of QM and its energy operator because our framework has a larger domain of applications than the Dirac equation for a fixed spin axis. Our framework is the group representation theory and its geometrical meaning, which we have already validated as a basis for a new and more intelligible approach to QM by showing that it can be used to derive the Dirac equation. The non-canonical approach can now outrun the canonical approach in its power to deal with novel situations, because we show that we can deal with precession. In our approach, we have to split the four-frequency superposition state into its two different energy components
and
, because it does not make sense to make a brute-force calculation of the energy of a superposition state that involves pure states of different energies. It is this brute-force calculation which gives rise to the unphysical feature of a varying energy in the QM treatment of precession, which is discussed in
Section 2.3. We must first calculate the energies of the pure states; if we wish, we can then calculate the average energy by statistical averaging. We now have all the prerequisites to understand how we can tackle the Stern–Gerlach experiment in our new approach.
3. Tabula Rasa Approach Based on Spinors
In view of all this confusion, typical of a wobbly theory, we must rebuild a theory from scratch and try to solve the paradox within the framework of our new approach. It will therefore be mathematically rigorous and based on a good understanding of spinors [
2]. Although I understand spinors quite well, the many contradictory images that live on in the intuitive folklore about the spin in a magnetic field amount to a formidable conceptual obstacle. They are a smoke screen that kept me in the dark for a very long time and rendered it extremely difficult to find the correct solution. I am confident that I am not the only one who has been running in circles for years trying to make sense of this spin-up/spin-down doctrine. As we show, it is focusing the attention on the supposed aligning of the spin axis with the magnetic field
that sends us irrevocably down the rabbit hole. It is the unshakable belief that the experiment unmistakably tells us that the spin must be aligned which keeps us in the total impossibility of breaking away from the conceptual death trap of space quantization. The fact that this enigma has remained unsolved for almost a century illustrates how difficult it was.
We must thus repeat our warning to the reader that he is in for a rough ride in which a lot of what he has become used to taking for granted will be ripped apart. Such a statement may cause irritation, as already discussed at the end of
Section 1.3, but I think that if you pick up the basics about spinors from [
2] and then read the present paper, you will feel rewarded for your efforts. Just as in our derivation of the Dirac equation from scratch in [
1] and in
Section 1.2, we start from the well-known Rodrigues formula (Equation (
2)) in SU(2) for a rotation over an angle
around the axis
and put
, where
is the proper time. The resulting equation models then an object that spins at the frequency
around the axis
. For an electron at rest, it suffices to make the calculations in SU(2), as explained above with the aid of Equation (
3). From the viewpoint of the traditional approach to QM (based on
guessed equations), this starting point may appear to be an extraneous development that is completely out of context and has nothing to do with the formalism of QM, but, as explained in
Section 1, the whole formalism of QM is in our approach
derived from this Rodrigues formula with the substitution
, such that the development fits completely into the context of our approach.
As easily checked and also derived in [
1] (see, e.g., [
1], p. 142), one can write the spinning motion in SU(2) in terms of a sum of two frequency components:
This is a simultaneous description of the mixed state
defined in Equation (
9) and another mixed state
we define in Equation (
A5) of
Appendix A. We work all of this out in full detail in
Appendix A. Both mixed states are characterized by the fact that they have a well-defined energy. Within the framework of QM, we can consider the two components of the matrix in Equation (
12) as two (mixed) beams. In fact, using Ehrenfest’s interpretation of superposition states (see [
2], p. 10, complemented by [
5], p. 2, for a group-theoretical justification), the presence of the two frequencies in Equation (
12) means that we are describing two mixed states simultaneously. Writing the two mixed states that occur in
simultaneously can be considered as just another way of writing a superposition state. We are not forced to consider such a (doubly) mixed beam, but the geometrical Equation (
12) offers us the possibility to do so. Let us now write Equation (
12) for a rotation with an axis
that is different from the
z-axis:
Here,
are the spherical coordinates of the spin axis
. As already pointed out in
Section 1.4, we use
and
as two different symbols in this article. Let us now inspect the two components. The
component is:
We recover here the result
from [
1] (see Equations (3.28) and (5.25)), where
is the spinor that corresponds to
. The algebraic expression that occurs in Equation (
14) is in reality not
, but rather its value
at the starting time
, whereby
is defined by
. The
component is:
This corresponds to
, where
is the conjugated spinor corresponding to
, i.e., the second column of
. Again, the quantity that occurs in Equation (
15) is rather
. Note that
and
are orthogonal.
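A two-frequency decomposition of this kind can be verified numerically. The sketch below is only an illustration under stated assumptions: it takes the standard SU(2) Rodrigues form cos(φ/2) 1 − ı sin(φ/2) [s·σ] and one common phase convention for the spinor of an axis with spherical coordinates (θ, φ); the conventions of [1] may differ by phase factors. All function names are hypothetical. It checks that the rotation matrix equals e^(−ıωτ/2) χχ† + e^(+ıωτ/2) χ′χ′†, where χ′ is the conjugated spinor, and that χ and χ′ are orthogonal.

```python
import cmath
import math

def rodrigues(theta, phi, angle):
    """cos(angle/2) 1 - i sin(angle/2) [s.sigma] for the axis s(theta, phi)."""
    sx = math.sin(theta) * math.cos(phi)
    sy = math.sin(theta) * math.sin(phi)
    sz = math.cos(theta)
    c, sn = math.cos(angle / 2), math.sin(angle / 2)
    return [[c - 1j * sn * sz, -1j * sn * (sx - 1j * sy)],
            [-1j * sn * (sx + 1j * sy), c + 1j * sn * sz]]

def spinor(theta, phi):
    """Assumed convention: first column of the rotation taking e_z to s."""
    return (cmath.exp(-0.5j * phi) * math.cos(theta / 2),
            cmath.exp(0.5j * phi) * math.sin(theta / 2))

def conjugated_spinor(theta, phi):
    """Second column (-b*, a*) belonging to the spinor (a, b) above."""
    return (-cmath.exp(-0.5j * phi) * math.sin(theta / 2),
            cmath.exp(0.5j * phi) * math.cos(theta / 2))

def outer(chi):
    """Tensor product chi chi^dagger; a 2x2 matrix with determinant zero."""
    return [[chi[i] * chi[j].conjugate() for j in range(2)] for i in range(2)]

def two_frequency_form(theta, phi, angle):
    """exp(-i angle/2) chi chi^dagger + exp(+i angle/2) chi' chi'^dagger."""
    p_plus = outer(spinor(theta, phi))
    p_minus = outer(conjugated_spinor(theta, phi))
    a, b = cmath.exp(-0.5j * angle), cmath.exp(0.5j * angle)
    return [[a * p_plus[i][j] + b * p_minus[i][j] for j in range(2)]
            for i in range(2)]

def inner(chi, psi):
    """Hermitian inner product chi^dagger psi."""
    return sum(chi[i].conjugate() * psi[i] for i in range(2))

# Illustrative values only; angle plays the role of omega * tau.
theta, phi, angle = 1.1, 0.4, 2.7
lhs = rodrigues(theta, phi, angle)
rhs = two_frequency_form(theta, phi, angle)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
overlap = inner(spinor(theta, phi), conjugated_spinor(theta, phi))
```

Each of the two terms carries a single frequency sign, which is what allows them to be treated as separate components in the text.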
Up to now, all calculations have been pure geometry. To introduce the physics, we rely on just one single idea (we first introduced this in [
14]), viz. that a magnetic field would make the spin vector precess, based on the following heuristics. For different radii of the circular motion within a magnetic field, the cyclotron frequency remains the same in the non-relativistic limit. Every local co-traveling frame will spin at the same frequency, just in the same way as your horse on a merry-go-round not only moves along a circle but also spins around its own axis with respect to the frame of the observers on the ground. If you shrink the circular orbit in the magnetic field to a point, the spinning motion with the cyclotron frequency around the axis remains. Therefore, a pointlike charged particle at rest in a magnetic field would be spinning even if it were initially spinless. However, if it initially already spins and its spin axis is tilted, then this axis is precessing, which corresponds to the intuitive narrative based on the analogy with a spinning top. We encounter this merry-go-round scenario also in Purcell’s explanation of the Thomas precession [
18]. It provides us with some classical intuition for the anomalous Zeeman effect. However, in the Bohr–Sommerfeld imagery of QM, these heuristics are thwarted by the fact that the orbits are quantized. For matters of rigor, we must therefore consider all these ideas as mere heuristics and we have absolutely no cogent a priori knowledge that would help us in deciding if these heuristics are correct or otherwise. We can only acknowledge that spin precession is a popular intuitive scenario. The final test of this merry-go-round scenario will be whether it reproduces the experimental results. For a magnetic field
aligned with the
z-axis, we obtain then an electron whose spin axis is precessing according to Equation (
10). Here,
is now the cyclotron frequency. Let us write the effect of this precession on both components of
. For the first component:
The matrices here are again tensor products. However, they are now of a novel type , which no longer provides a familiar link with some rotation axis as in the equation . This is quite normal because a precession has no fixed rotation axis. We are working all the time with matrices that can be written as tensor products because they have determinant zero. That a matrix with zero determinant can be written as a tensor product is a specificity of 2 × 2 matrices. Multiplying such a matrix with determinant zero by another matrix yields a new matrix that still has determinant zero, such that it can be written again as a tensor product, but it will no longer have the structure .
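The statement about determinants can be illustrated with a short sketch. It assumes plain 2 × 2 complex matrices stored as nested lists and writes the tensor product as a column vector times a row vector without complex conjugation, which is a simplification with respect to the spinor notation of the text. The function names are hypothetical. The sketch factorizes a non-zero singular matrix into such a product and shows that left multiplication by another matrix changes only the column factor, which is how hybrid terms with two unrelated factors can arise.

```python
def outer(u, v):
    """Column vector u times row vector v (no complex conjugation)."""
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]

def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_vec(a, u):
    """Matrix a applied to the column vector u."""
    return [sum(a[i][k] * u[k] for k in range(2)) for i in range(2)]

def factor_singular(m):
    """Return (u, v) with m[i][j] == u[i] * v[j].

    Valid for a non-zero 2x2 matrix with determinant zero: the column through
    the largest entry is used as u, the row through it (divided by that
    entry) as v.
    """
    r, c = max(((i, j) for i in range(2) for j in range(2)),
               key=lambda ij: abs(m[ij[0]][ij[1]]))
    pivot = m[r][c]
    u = [m[0][c], m[1][c]]
    v = [m[r][0] / pivot, m[r][1] / pivot]
    return u, v

def max_abs_diff(a, b):
    """Largest entrywise deviation between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

# Illustrative values only.
m = outer([1 + 2j, 3], [0.5, -1j])          # determinant zero by construction
u, v = factor_singular(m)
a = [[1, 2], [3, 4j]]                        # arbitrary second matrix
product = mat_mul(a, m)                      # still has determinant zero
hybrid = outer(mat_vec(a, u), v)             # new column factor, same row factor
```

Here `product` and `hybrid` agree, so the product is again a tensor product, but its column factor `a u` is no longer tied to the row factor `v` in the way the two factors of the original matrix were.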
We can actually trace back how such hybrid terms come about. Let us call the spinor of the rotation around the
z-axis
and its conjugated spinor
. The first term in Equation (
16), the one that goes with
, is obtained from multiplying:
It thus corresponds to
. We can multiply the underbraced matrices in the middle, which can be shown to be a correct procedure. We obtain then the scalar
and
. In this way, we obtain again the first term of Equation (
16). We see that it is obtained by combining
and
, which is why
occurs with a minus sign and
with a plus sign. The other component yields:
As justified in
Section 1.2 and discussed in [
1], we can consider that the two signs of the frequency
correspond both to the same energy
. We can then rearrange the terms according to their energies:
where we can factorize out the probability amplitude
, and:
where we can factorize out the probability amplitude
. Equations (
19) and (
20) describe the energy states if we send a mixed electron beam into a Stern–Gerlach filter. When the beam is not mixed, each energy state will only have one component.
It transpires from the calculations that there are two possible energies for the electron within the magnetic field, according to the criterion
outlined above. Here,
takes the values
. We avoid in this way using
as an energy operator in a context where it is no longer valid, as discussed in
Section 1.4. Now, we have found an analysis that yields the correct observed energies. It also explains the whole Stern–Gerlach experiment, provided we can still explain how these two energies lead to different trajectories (see below). Let us note that we represent the effect of the magnetic field on the charge by Equation (
10). This is not something we find in textbooks, but is based on our heuristics (first developed in [
14] in terms of vorticity). The algebra does not contain a current loop or a magnetic dipole. It just contains a rotating point charge. The intuition about a magnetic dipole is a wrong intuition. The fact that the magnetism produced by the spin does not need to be of the dipole type is shown by the exchange mechanism proposed by Heisenberg and Majorana, which is based on the Coulomb interaction and the exclusion principle.
The whole puzzle of why the magnetic moment would have to align with the field has now disappeared. We find the right energy without having to invoke alignments of axes with the magnetic field. Such alignments are just no longer part of the story. Furthermore, there is simply no longer a well-defined single fixed axis, as transpires from the weird terms
in the formalism. Equation (
19) describes a motion with energy
and which occurs with probability
, while Equation (
20) describes a motion with energy
and which occurs with probability
, in agreement with the experimental results. These are both complex motions that we cannot describe in simple terms as we do for a rotation around some axis. We can safely assume that these two components just describe precession (see
Section 4). The Stern–Gerlach filter separates these two energies into two different beams. It is one of those two rearranged combinations that in general would be fed into a next Stern–Gerlach apparatus if we performed an experiment with a sequence of Stern–Gerlach filters. The precession just adapts all the time to the magnetic field present and it stops when there is no magnetic field. There are never quantum jumps in the motion of the spin vector, while in the traditional paradigm such jumps appear inevitable. Note that the average energy is
, such that
is a macroscopic energy term, which is not applicable to individual fermions. It is not a potential energy. This average energy is no longer varying with time as in the brute-force QM calculation discussed in
Section 2.3.
The fact that we made our calculation on a mixed beam may raise the question of whether this is justified. We may interpret it in terms of clockwise and counterclockwise motion, but must be aware of the fact that we are talking about two mixed states. We perform the calculations on these two states simultaneously to be as general as possible. However, we can see that we could have excluded one state, e.g., by only considering the
component of Equation (
12). Both components lead to the same energies; the results only differ in the algebraic signs.
Most textbooks calculate the force exerted on the fermion starting from an equation for a “potential energy” and then using . However, the physical existence of such a potential energy is doubtful, because a magnetic field cannot do any work. The equation suggests that all directions of space are allowed, which is actually what, according to the traditional theory, the experiment would prove to be conceptually wrong. This traditional calculation for the trajectories is classical because the aim is to show that our classical notions are wrong. In principle, from the traditional point of view one must then still make a quantum mechanical calculation to render the theoretical approach correct. To avoid talking about chimerical potential energies, it is better to base the analysis on the expression . The force is the force responsible for the motion of the center of mass of the fermion through the Stern–Gerlach apparatus when the fermion is no longer at rest. One can imagine that it enters the device in uniform motion and then starts to feel a force. For an electron this would be (predominantly) the Lorentz force, but, for the Ag atoms, which are neutral, this will be this gradient force, just as in the original calculation of Stern and Gerlach. Instead of the expression , which is wrong, we must use here , which is correct. Using will lead then to the same result as in the textbook analysis of the trajectories, after postulating that is quantized. In being built on group theory, our calculation is entirely classical, and this suffices to explain the experimental results entirely correctly.
We may finally remark that the mathematical difficulties related to the errors described in
Section 2.2 are solved by the introduction of
defined in Equation (
9) and
defined in Equation (
A5) of
Appendix A. From these definitions, it follows that, for the special case
, which corresponds to
, we have
and
. This special case is the only case which can be treated by the Dirac equation because it does not give rise to precession. In
Section 2.2, which treats this special case, we encounter the riddle of what to do with the vector term
in an equation that is supposed to define an energy. Without knowing that the spinors in the Dirac equation describe superposition states of the type
or
, solving the puzzle of how we can replace the vector term
in the equation by a scalar
and obtain a true energy term
is just impossible because, for a pure state
, we have
. Dirac “solved” the problem by brute force using the error described in
Section 2.2. It can be hoped that this will convince the reader that this error cannot be hushed up or ignored. The mathematical truth must prevail and carrying out correctly the admittedly intricate algebra helps us in figuring out the physical truth.
4. More Traditional Formulation in Terms of a Differential Equation
In this section, we reformulate everything again in the more familiar differential calculus of standard textbook QM. In the calculations, we encounter tensor products of the type
. We know that they correspond to precession by construction. The tensor product
can be understood as a simultaneous description of the motions
and
and in this sense describes the precession, but it does not contain the correct time dependence
because
. If we accept the rule that we must replace
by
or
, we obtain a correct simultaneous description of
and
. There are two such terms in Equation (
19), and they are coming from the two beams we consider in Equation (
12). The motion described by Equation (
19) can be condensed into the form:
This equation describes a mixture of two states that occur when we are using a mixed beam. This energy state is thus a set that contains both mixed states. As easily seen,
SU(2)), such that this set is in some way interpretable as a rotation. We can compare this with the situation in projective geometry, where we can define a straight line as a set of all points which are incident with the line, but we can also define a point as a set of all lines which are incident with the point. A rotation can thus also be seen as a set, and other geometrical objects as well. For example, the quantities
and
are the eigenvectors of the reflection operator
and correspond to the sets
and
, respectively. We can thus consider the algebra in Equation (
21) as the construction of a mixed state (a set of states) with a constant energy that can be interpreted as a rotation. With the rotation in Equation (
21), we are now again on more familiar geometrical grounds. We know how to analyze such a matrix and we apply the spinor formalism to it. Differentiation with respect to
yields:
The inverse matrix of
is:
Hence,
is given by
where
is given by:
We could treat this geometrical object with a single energy within the scope of the Dirac equation if we introduced again a mixed state (as
instead of
in Equation (
9) within
Section 1.2). Indeed, this mixed state again yields a fixed energy
, i.e., we obtain
when we use the traditional energy operator on it. A legitimate question is then whether nature will provide the additional components that must enter the mixture. However, it is not at all our purpose here to bring the calculations back into the scope of the Dirac equation and its energy operator. There is no reason why this energy operator should be valid within the context of precession. What interests us here is not calculating the energy, which we already know. It is the fact that the set
can be interpreted as a rotation around the
z-axis when the original beam is mixed.
The result is rather amazing, because we have obtained in Equation (
25) the same type of differential equation as
for the Rodrigues formula expressing a simple spinning motion around the
z-axis, although the form of
is different from the form of
because it is not a diagonal matrix, whereas the matrix
that describes a spinning motion around the
z-axis is diagonal. With hindsight, we can see that we could have anticipated all this. The equations
or
describe any type of object that rotates with an angular frequency
around the
z-axis. In the usual approach, the object is a spinless electron that we rotate with a frequency
around the
z-axis to give the electron its spin. In the new situation, the object is an electron which is already spinning with a frequency
around an axis
, and we rotate this object bodily with a frequency
around the
z-axis, to describe the precession of the spinning electron within a magnetic field. That the new object is different from the initial one can be seen from the expression of the intervening matrix, which is different from the diagonal form we had before. This result shows that, whatever the level of complication in some hierarchy of precessions, we are always able to treat a fixed-energy component this way. We could have reached these conclusions also by observing that:
A surprising fact is that the whole energy is attributed to a rotation around the precession axis. However, this illustrates what is noted above, viz. that the energy is not a vector. We have an object that bodily rotates around the precession axis and its energy is
. The development for the equation of motion in Equation (
20) is analogous. It can be condensed in the form:
The inverse matrix of
is:
We can again construct a matrix
, which is now given by:
This is now the equation for a down state. The situation in Equations (
25) and (
31) thus actually corresponds exactly to a physical picture of up and down states, but these states are different from what we have been told. It is no longer the same type of object, viz. the spin, that has its rotation axis aligned up or down. In the old context, we start from a spinless electron and make it spin around an axis, while, in the new context, we start from an already spinning electron whose axis is not aligned and we make the whole thing bodily spin around a precession axis. It is this precession axis which can now be up or down, not the spin axis.
We should therefore have qualified the states as precession-up and precession-down rather than as spin-up and spin-down. Pauli [
19] just introduced pragmatically the experimental result of the Stern–Gerlach experiment into the theory under the form of an ad hoc postulate, without any true justification. He replaced explaining by describing. The spin-up/spin-down narrative was so highly counter-intuitive that it could only provoke intense bewilderment, as described in
Section 2. After almost a century, we have now the theoretical justification for Pauli’s ad hoc postulate, and we can appreciate that the directions in space are absolutely not “quantized”.
This solution of a real conceptual difficulty perfectly illustrates the philosophy of our alternative approach to QM. We must obtain the same correct algebraic results, but we can change the corresponding geometrical explanation which must be clear, devoid of mysteries and contradictions and in agreement with the meaning of the spinors. This result further validates our alternative approach. The reason for the confusion within the traditional approach is that the geometrical meaning of the spinors was not understood. It remained hidden due to the fact that Dirac’s derivation has been based on the second algebra rather than the first one. Meanwhile, it often remains very hard to find an explanation for the algebraic results. It requires a lot of mathematical creativity and the mental pictures inherited from the traditional interpretation, which are deeply engraved in our minds, can really make it difficult to break away from them. They can also trigger fierce resistance to the new approach. We want a perfect mathematical system, made of a geometry, an algebra and a dictionary that translates one into the other. The interplay between the algebra and the geometry turns such a system into a very powerful method that allows gaining deep insight if we carry out the mathematics meticulously, as pointed out at the end of
Section 3. Analytical Newtonian mechanics reaches this ideal to the point that it almost appears as a purely mathematical theory. With our spinor approach to the few sample cases selected, we seem to come close to this ideal as well.
5. The Pauli Exclusion Principle Remains Valid
Feynman [
20] gave an intuitive explanation for the Pauli principle. However, he did not write down his idea in algebraic form, such that a detailed proof is lacking. In the French translation of [
20], there is a footnote by Lévy-Leblond, which shows that the argument can lead to some confusion. Intuitively, when you exchange two electrons, each of them makes a turn over an angle of π. You may think that this will multiply their spinors by ı and therefore the tensor product of the two spinors by ı² = −1
. However, the moves involved in the exchange are, at least in appearance, taking place in space rather than inside the electron. They are of the position type such that they and the angle
which characterizes them (see below) should in principle not intervene in the argument, because the position coordinates do not belong to the set of parameters that define a spin state. The real exchange is thus not the swap of the positions but that of the spin states. However, these moves are accompanied by the rotation of the co-moving Fresnel frame, which is also characterized by
. This is a merry-go-round type of scenario. This rotational motion is of the spin type. In our development below, the phase
which intervenes is obtained by Lorentz transformation of the spin variable
, and therefore really of the spin type.
Due to its historical context, one may suspect that the Pauli principle relies on the assumption that the spins can only be up and down, i.e., on parallelism. Now that we have discovered that the energy states must rather be characterized in terms of precession-up and precession-down, one may raise concerns as to whether the Pauli principle remains valid. As the spins are no longer parallel, we might have just destroyed the Pauli principle. Certainly, there are still only two possible states for the energy, but there are now many more possible states of motion. The motion is no longer characterized by but by . In fact, the spins no longer need to be parallel in order to belong to the same energy state. Could the change of paradigm cause the meltdown of the Pauli principle?
We show that the Pauli principle is not under fire, but let us first try to write Feynman’s argument algebraically (in the non-relativistic limit), rendering our proof open to a detailed scrutiny of the effects of the change. Let us take for the spin-up and spin-down functions the wave functions for non-relativistic electrons moving on a circle:
The expressions in the exponentials come from integrating along the circle. The expression is the Lorentz invariant , whereby we drop the factor in in the non-relativistic limit. Here, ℓ is the curvilinear distance travelled along the circle and . In fact, by noting for the superluminal phase velocity w and putting , we obtain , which must be . Therefore, . The tangent vector permits following the Thomas precession of the Fresnel basis on the merry-go-round, which embodies the true rigid-body rotation of the whole two-electron configuration. We also note for the spin angle, in contrast with which is the precession angle (and does not intervene here). In fact, the electron does not have to move. When we freeze time, we can still move around the circle geometrically. We introduce the angle to specify the position of the electron on the circle. We have , such that . The angle is related to k and we can understand the value of also as the rotation angle of the co-moving Fresnel basis. Consider now two spin-up electrons at diametrically opposed positions on a circle of radius r. We can consider then two spin-up electrons positioned in , , , . The phase difference just expresses the different positions on the circle.
The expressions are pure spin functions. We consider Equation (
33) as the canonical situation. We treat other situations later on. An exchange of the two electrons can be obtained by a rotation over an angle of π
around the centre of the circle. Under such a rotation
over π
, we obtain
,
,
.
Hence, the rotation induces the substitutions
,
and
. We can see that the cause for the minus sign is the fact that position angles
occur under the form
in the spinor calculus. After this rotation
, the physical situation is indistinguishable from the situation before, because
transforms Electron 1 into Electron 2 and vice versa. This means that the wave function must be invariant under the rotation. This implies that
cannot be the wave function
. In fact,
would lead to
, where we express the exchange
. However, we have also calculated in Equation (
34) that
. This leads to
, such that
and
. Similarly, if we take:
then we obtain also
because
transforms
into
and
into
, while we also have
because
is an exchange. It follows then again that
. However, if we rather take:
we obtain
because now
transforms
into
and
into
. This is now consistent with the fact that
is an exchange. Hence,
in Equation (
36) is a wave function that takes the exchange into account correctly. The wave function has to be antisymmetric. The configuration of two electrons with parallel spins in the same place can be obtained by considering the special case
. When the spins are parallel, we have then
and
. We are thus obliged to take the spins antiparallel if we want to have them in the same place. This is the Pauli exclusion principle for spin-up and spin-down states.
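The algebraic core of this conclusion can be sketched in a few lines. The example is a generic textbook-style illustration and not the derivation given above: it ignores the position-dependent phases and works with bare two-component spin states, stored as tuples, whose tensor product is a list of four amplitudes. All names are hypothetical. It checks that the antisymmetric combination changes sign under exchange and vanishes identically when both electrons are assigned the same state.

```python
import math

def tensor(a, b):
    """Two-particle amplitudes a[i] * b[j], stored at index 2 * i + j."""
    return [a[i] * b[j] for i in range(2) for j in range(2)]

def exchange(psi):
    """Swap the roles of particle 1 and particle 2."""
    return [psi[2 * j + i] for i in range(2) for j in range(2)]

def antisymmetric(a, b):
    """(a (x) b - b (x) a) / sqrt(2)."""
    ab, ba = tensor(a, b), tensor(b, a)
    return [(x - y) / math.sqrt(2) for x, y in zip(ab, ba)]

def symmetric(a, b):
    """(a (x) b + b (x) a) / sqrt(2), shown for comparison."""
    ab, ba = tensor(a, b), tensor(b, a)
    return [(x + y) / math.sqrt(2) for x, y in zip(ab, ba)]

up = (1.0 + 0.0j, 0.0 + 0.0j)
down = (0.0 + 0.0j, 1.0 + 0.0j)

singlet = antisymmetric(up, down)       # antiparallel spins: non-zero
forbidden = antisymmetric(up, up)       # identical states: all amplitudes vanish
swapped = exchange(singlet)             # equals minus the original
```

The symmetric combination, by contrast, is unchanged by `exchange` and does not vanish for identical states, which is why it cannot represent the situation described in the text.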
Let us now investigate what this becomes with the new paradigm of precession-up and precession-down states. We can consider this as the non-canonical counterpart of the canonical state described above. We start from
We are thus considering the rotations
and
that relate the wave functions
to the wave functions
of the canonical configuration. We have again
, where we can take
. Then,
The two exponentials still exhibit a phase difference
leading to a factor
such that
or
In other words, . Combining these two identities leads to . Therefore, the wave function must still be antisymmetric. Hence, Pauli’s principle remains valid even when the two spins are not parallel.