1. Introduction
It is known that there is a huge number of interpretations of the Schrödinger equation, most of them incompatible with each other. They all assign different meanings to the function ψ. Some, as in Bohm’s approach [1,2], call it the generator of a concrete quantum mechanical field; others, such as Ballentine [3,4], call it a representative of an ensemble of identical physical systems; still others, such as Everett and DeWitt [5,6,7], say that it shows the necessity of considering the outcomes of experiments in different universes. We also have the suggestions that ψ represents consistent histories [8,9] or Bayesian calculations [10,11], or it is assumed, as in the Copenhagen interpretation [12,13], to represent single systems and therefore to be part of a reduction mechanism in the act of observation. The function ψ may also be considered as codifying averages taken from a stochastic support [14,15,16]. These are but a few of the interpretations proposed over almost a century.
Perhaps the issue of the interpretation of the Schrödinger equation would be clarified if one took a “step back” and presented a set of axioms by means of which the Schrödinger equation becomes a theorem, such that the interpretation of its symbols would be a mere extension of interpretations already contained in the axioms. This “step back” can be rephrased as the need for a mathematically sound quantization process.
Indeed, if one wants to pass from the classical to the quantum domain, one uses the rules
$$p\rightarrow -i\hbar\frac{\partial}{\partial x},\qquad E\rightarrow i\hbar\frac{\partial}{\partial t}. \tag{1}$$
This is the currently accepted heuristic of the quantization process. These rules have some limitations with respect to their generalization. For example, their generalization to curvilinear coordinates is not prescribed: we always quantize in Cartesian coordinates, as in (1), and then transform the differential operators to the desired curvilinear system [17,18]. We want to move from a heuristic approach to a formal, mathematically sound process.
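A minimal numerical sketch of what the heuristic rules do: applied to a free-particle plane wave, the operators −iℏ∂/∂x and iℏ∂/∂t return the values ℏk and ℏ²k²/2m, so the classical relation E = p²/2m becomes an operator statement. The units (ℏ = m = 1), the wave number, and the finite-difference step are illustrative assumptions; derivatives are approximated by central differences rather than taken exactly.

```python
import cmath

# Assumptions: units with hbar = m = 1; derivatives by central differences.
HBAR = 1.0
MASS = 1.0
STEP = 1e-5


def plane_wave(k):
    """Free plane wave exp(i(kx - wt)) with w = hbar k^2 / (2m)."""
    omega = HBAR * k**2 / (2 * MASS)
    return lambda x, t: cmath.exp(1j * (k * x - omega * t))


def apply_p(psi, x, t):
    """Rule p -> -i hbar d/dx, applied numerically."""
    return -1j * HBAR * (psi(x + STEP, t) - psi(x - STEP, t)) / (2 * STEP)


def apply_E(psi, x, t):
    """Rule E -> i hbar d/dt, applied numerically."""
    return 1j * HBAR * (psi(x, t + STEP) - psi(x, t - STEP)) / (2 * STEP)


k = 1.7
psi = plane_wave(k)
x0, t0 = 0.3, 0.2
p_value = apply_p(psi, x0, t0) / psi(x0, t0)   # close to hbar k = 1.7
E_value = apply_E(psi, x0, t0) / psi(x0, t0)   # close to hbar^2 k^2 / 2m = 1.445
print(p_value, E_value, p_value**2 / (2 * MASS))
```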
In previous papers [16,19,20], we found a different quantization process based on two fairly simple axioms, of which the Schrödinger equation is just a theorem. These axioms must, of course, pass all generalization tests if we are to accept their adequacy. For example, if we generalize them to a configuration space of higher dimension, they must yield the Schrödinger equation in this new context. Furthermore, they must allow us to derive the Schrödinger equation in any coordinate system simply by writing the axioms in that system. Finally, the axioms must give us the relativistic equations when written in their relativistic generalization [21]. All these results have already been shown mathematically [16,19,20,21]. We have also shown that our derivation is equivalent to Feynman’s path integral, except that it is carried out in phase space with the Hamiltonian rather than in configuration space with the Lagrangian, and that it is equivalent to stochastic derivations [22].
An axiomatic approach may present many advantages. One of them is uncovering unsuspected relations. Thus, with the characteristic function derivation, we have shown that the Bohr–Sommerfeld rules are inscribed in the formalism of quantum mechanics [23]. The usual criticisms of these rules were that (a) they do not take half-integral quantum numbers into account and (b) they cannot be applied to some physical systems, such as the helium atom. We have mathematically shown that these rules are a direct consequence of the axioms (they are theorems), just as the Schrödinger equation is, and that the half-integral numbers appear naturally in this mathematical context. Moreover, we have shown that these rules can be used only for systems that present some spatial symmetry (which the helium atom lacks) [23]. This is a fairly unsuspected result. Another unsuspected result was the mathematical connection between the Schrödinger equation and the central limit theorem, which was clarified in [24].
Furthermore, in [16], we proposed a new stochastic interpretation that crucially depends on the strength of the axioms. In fact, in [25], we explicitly advanced an epistemological constraint: the better an interpretation is, the closer it keeps its semantic propositions to its syntactic apparatus, that is, to its formalism. Thus, strengthening our confidence in the axioms supports the adequacy of our interpretation. We may now strengthen this confidence by showing two generalizations that were not previously envisaged: the extension of the derivation of the Schrödinger equation to electromagnetic fields, known as “minimal coupling”, and its extension to superpositions, since all previous derivations were based on pure states. As will become clear, both derivations are algebraically involved, and the fact that they yield the correct Schrödinger equations should increase our confidence in the adequacy of the axioms, which, in turn, should increase our confidence in the interpretation already proposed in [16].
The objective of this paper is to present the two generalizations mentioned above: a mathematical derivation of the Schrödinger equation corresponding to the minimal coupling prescription for electromagnetic potentials, and the generalization of the derivation to superposed states, both expressed in terms of generalizations of the axioms presented in [19,20].
The paper is organized as follows: in Section 2, we briefly present our previous derivation for pure states in one dimension. In Section 3, we generalize the axioms presented in Section 2 to show that they allow us to derive the Schrödinger equation in its minimal coupling form. In Section 4, we present the generalization of the axioms to cope with superposed states. In Section 5, we present our conclusions.
2. Previous Derivation of the Usual Schrödinger Equation
In previous papers [19,20], we mathematically derived the Schrödinger equation from two simple axioms, which were stated as follows.
Axiom 1. The marginal characteristic function Z(x, δx; t) of the phase space probability density function F(x, p; t), defined by
$$Z(x,\delta x;t)=\int F(x,p;t)\,e^{ip\,\delta x/\hbar}\,dp, \tag{2}$$
where ℏ is a universal parameter with dimensions of angular momentum, is such that it can be written as
$$Z(x,\delta x;t)=\psi^{*}\!\left(x-\frac{\delta x}{2};t\right)\psi\!\left(x+\frac{\delta x}{2};t\right), \tag{3}$$
and should be expanded to second order in the real parameter δx.

Axiom 2. For an isolated system, the joint phase space probability density function F(x, p; t) related to any quantum-mechanical phenomenon obeys the Fourier-transformed Liouville equation
$$\int e^{ip\,\delta x/\hbar}\,\frac{dF(x,p;t)}{dt}\,dp=0 \tag{4}$$
to second order in δx.

Before proceeding with the derivation, it is worth commenting on some important issues related to these axioms. First of all, Equation (3) may give the reader the impression that this is just the usual Wigner phase space distribution approach. However, the two approaches differ in some respects. Wigner starts from the representation of the phase space distribution as the Fourier inversion of (2), where it is already assumed that the functions ψ are solutions of the Schrödinger equation. From this expression, one can show that the distribution must satisfy a Wigner–Moyal equation in phase space, which resembles the Liouville equation but has terms of order ℏ² and higher in an infinite expansion. The present approach takes the inverse path: it defines the characteristic function in (2) as the Fourier transform of the probability density function (a genuine probability density rather than a quasi-probability distribution, since it is positive-definite). In this sense, it begins from the “classical side” and postulates that this characteristic function must be expanded to second order in δx, something completely absent from Wigner’s approach. From this assumption and Axiom 2, we can mathematically derive the Schrödinger equation. The expansion to second order in δx is crucial for the derivation. Moreover, one can show [19] that this approach is mathematically equivalent to Feynman’s path integral derivation of the Schrödinger equation precisely because the expansion to second order in δx corresponds to an expansion to second order in the spatial increment in Feynman’s formalism. The difference between the two is that the present approach is carried out in phase space, with the Hamiltonian function, while Feynman’s is carried out in configuration space, with the classical Lagrangian, as we have already shown [16,19]. As a last comment, Wigner’s approach gives a phase space distribution function that can take negative values in some phase space regions, while the present approach provides a positive-definite phase space probability density.
Moreover, it is of utmost importance to note that, just as with Feynman’s approach (for the time), the present expansion to second order in δx is not an approximation. In our case, we were able to show this result mathematically. Indeed, we have shown in [26] that this expansion of the characteristic function Z to second order in δx, together with its representation as in (3), implies that quantum mechanics has the central limit theorem implicit in it. In other words, the function F(x, p; t) is represented, for each x in configuration space, as a Gaussian function of p. Thus, the present derivation is mathematically exact.
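To make the Gaussian statement concrete in the simplest possible case, the sketch below assumes a Gaussian amplitude of width σ with a linear phase S(x) = p₀x (an illustrative choice, not the general argument of [26]). For this packet, ln Z is exactly quadratic in δx, so nothing is lost by stopping at second order, and numerically inverting the Fourier transform (2) gives, at each x, a Gaussian in p with mean p₀ and standard deviation ℏ/(2σ). Units (ℏ = 1) and all numerical values are assumptions.

```python
import cmath
import math

# Assumptions (not from the paper): hbar = 1, a Gaussian amplitude of
# width SIGMA and a linear phase S(x) = P0 * x.
HBAR = 1.0
SIGMA = 0.8
P0 = 1.3


def psi(x):
    """Gaussian packet psi(x) = R(x) exp(i S(x) / hbar)."""
    r = (2 * math.pi * SIGMA**2) ** -0.25 * math.exp(-x**2 / (4 * SIGMA**2))
    return r * cmath.exp(1j * P0 * x / HBAR)


def Z(x, dx):
    """Characteristic function Z(x, dx) = psi*(x - dx/2) psi(x + dx/2)."""
    return psi(x - dx / 2).conjugate() * psi(x + dx / 2)


def cubic_remainder(x, dx):
    """ln Z minus its second-order polynomial in dx; zero if Z is Gaussian in dx."""
    quadratic = cmath.log(Z(x, 0.0)) + 1j * P0 * dx / HBAR - dx**2 / (8 * SIGMA**2)
    return cmath.log(Z(x, dx)) - quadratic


def F_from_Z(x, p, half_width=20.0, n=4001):
    """Invert (2) with the trapezoid rule: F = (1/2 pi hbar) int Z e^{-i p dx/hbar} d(dx)."""
    step = 2 * half_width / (n - 1)
    total = 0.0 + 0.0j
    for j in range(n):
        dx = -half_width + j * step
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * Z(x, dx) * cmath.exp(-1j * p * dx / HBAR)
    return (total * step / (2 * math.pi * HBAR)).real


def F_gaussian(x, p):
    """Expected F: |psi(x)|^2 times a normal density in p (mean P0, std hbar/(2 SIGMA))."""
    s = HBAR / (2 * SIGMA)
    rho = abs(psi(x)) ** 2
    return rho * math.exp(-(p - P0) ** 2 / (2 * s**2)) / (math.sqrt(2 * math.pi) * s)


print(abs(cubic_remainder(0.4, 0.9)))            # close to zero
print(F_from_Z(0.4, 1.0), F_gaussian(0.4, 1.0))  # should agree
```

The phase values P0·δx are kept below π so that the principal branch of `cmath.log` matches the continuous phase.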
As a result of these developments, the characteristic function becomes the density matrix (for a pure state, a 1 × 1 “matrix”) in the configuration space representation, as its expression clearly indicates. In fact, Equations (2) and (3) give the method to obtain averages using the probability density function F(x, p; t). For example, one can simply take the “trace” of this 1 × 1 matrix by setting δx = 0 to obtain the probability density function on configuration space, Z(x, 0; t) = |ψ(x; t)|², as can be easily verified from definitions (2) and (3). The generalization to superposed states, to be made later in this paper, will introduce a full density matrix.
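The “trace” operation and the first momentum moment can be checked numerically. The sketch assumes ℏ = 1 and a hypothetical state ψ = R e^{iS/ℏ} with a Gaussian R and a quadratic S. Setting δx = 0 in Z gives |ψ|², and, because differentiating (2) under the integral makes −iℏ ∂Z/∂δx at δx = 0 equal to ∫pF dp, that quantity should equal R² ∂S/∂x.

```python
import cmath
import math

HBAR = 1.0  # assumption: units with hbar = 1


def R(x):
    """Hypothetical real amplitude (normalized Gaussian)."""
    return math.exp(-x**2 / 2) / math.pi**0.25


def S(x):
    """Hypothetical phase; dS/dx = x + 0.7 varies with position."""
    return 0.5 * x**2 + 0.7 * x


def psi(x):
    return R(x) * cmath.exp(1j * S(x) / HBAR)


def Z(x, dx):
    """Characteristic function (3) for a pure state."""
    return psi(x - dx / 2).conjugate() * psi(x + dx / 2)


def density(x):
    """'Trace' of the 1 x 1 density matrix: set dx = 0."""
    return Z(x, 0.0).real


def momentum_density(x, h=1e-5):
    """-i hbar dZ/d(dx) at dx = 0, i.e. the first p-moment of F at fixed x."""
    dZ = (Z(x, h) - Z(x, -h)) / (2 * h)
    return (-1j * HBAR * dZ).real


def total_probability(a=-10.0, b=10.0, n=2001):
    """Trapezoid rule for the integral of density(x) over [a, b]."""
    step = (b - a) / (n - 1)
    values = [density(a + j * step) for j in range(n)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


x0 = 0.3
print(density(x0), R(x0) ** 2)                        # should agree
print(momentum_density(x0), R(x0) ** 2 * (x0 + 0.7))  # rho * dS/dx
print(total_probability())                            # close to 1
```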
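Thus, the derivation of the Schrödinger equation is as follows. We can write (4) as
$$\int e^{ip\,\delta x/\hbar}\left(\frac{\partial F}{\partial t}+\frac{p}{m}\frac{\partial F}{\partial x}-\frac{\partial V}{\partial x}\frac{\partial F}{\partial p}\right)dp=0. \tag{5}$$
The equation for the characteristic function is obtained by applying the Fourier kernel in (4) and using that
$$\int e^{ip\,\delta x/\hbar}\frac{\partial F}{\partial t}\,dp=\frac{\partial Z}{\partial t},\qquad \int e^{ip\,\delta x/\hbar}\,p\,\frac{\partial F}{\partial x}\,dp=-i\hbar\frac{\partial^{2}Z}{\partial x\,\partial\delta x},\qquad \int e^{ip\,\delta x/\hbar}\frac{\partial F}{\partial p}\,dp=-\frac{i\,\delta x}{\hbar}\,Z, \tag{6}$$
where we used integration by parts in the last integration and the fact that F(x, p; t) → 0 as |p| → ∞, which means that the integral of its divergence over the entire momentum space is zero. Note that we are using the Hamiltonian equations of motion, ẋ = p/m and ṗ = −∂V/∂x, as usual. If we multiply the terms in (6) by the corresponding coefficients in (5), add them, and multiply the result by iℏ, we obtain the equation for the characteristic function as
$$i\hbar\frac{\partial Z}{\partial t}=-\frac{\hbar^{2}}{m}\frac{\partial^{2}Z}{\partial x\,\partial\delta x}+\delta x\,\frac{\partial V}{\partial x}\,Z. \tag{7}$$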
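We then write
$$\psi(x;t)=R(x;t)\,e^{iS(x;t)/\hbar} \tag{8}$$
and use this result and expression (3) expanded to second order in δx, as imposed by Axiom 1, to write
$$Z(x,\delta x;t)=\left\{R^{2}+\frac{\delta x^{2}}{4}\left[R\frac{\partial^{2}R}{\partial x^{2}}-\left(\frac{\partial R}{\partial x}\right)^{2}\right]\right\}\exp\left(\frac{i}{\hbar}\frac{\partial S}{\partial x}\,\delta x\right). \tag{9}$$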
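We now substitute this expression into Equation (7) and separate the real and imaginary parts to obtain the equations
$$\frac{\partial\rho}{\partial t}+\frac{\partial}{\partial x}\left(\frac{\rho\,\bar{p}}{m}\right)=0 \tag{10}$$
and
$$\frac{\partial(\rho\,\bar{p})}{\partial t}+\frac{\partial}{\partial x}\left(\frac{\rho\,\bar{p}^{2}}{m}\right)+\rho\,\frac{\partial V}{\partial x}-\frac{\hbar^{2}}{2m}\frac{\partial}{\partial x}\left[R\frac{\partial^{2}R}{\partial x^{2}}-\left(\frac{\partial R}{\partial x}\right)^{2}\right]=0, \tag{11}$$
where
$$\rho(x;t)=R^{2}(x;t),\qquad \bar{p}(x;t)=\frac{\partial S(x;t)}{\partial x} \tag{12}$$
are the probability density function and the average momentum, respectively. Equations (10) and (11) are known to be equivalent to the Schrödinger equation, since we obtain these equations if we write the probability amplitude as in (8), substitute it into the Schrödinger equation, and separate the real and imaginary parts. In fact, this is just Bohm’s decomposition, introduced in 1952 [1,27].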
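As an illustration of the continuity equation for ρ = R² and p̄ = ∂S/∂x, the sketch below uses the textbook free Gaussian packet (ℏ = m = 1, V = 0, initial width 1, all assumptions for the example) and evaluates both the free Schrödinger equation and ∂ρ/∂t + ∂(ρp̄/m)/∂x with central differences. The current is computed as (ℏ/m) Im(ψ* ∂ψ/∂x), which equals ρp̄/m for ψ = R e^{iS/ℏ}; both residuals should be at the level of the finite-difference error.

```python
import cmath
import math

# Assumptions: hbar = m = 1, V = 0, initial width 1 (textbook free packet).


def psi(x, t):
    a = 1 + 0.5j * t
    return (2 * math.pi) ** -0.25 * a ** -0.5 * cmath.exp(-x**2 / (4 * a))


def rho(x, t):
    return abs(psi(x, t)) ** 2


def current(x, t, h=1e-4):
    """rho * p_bar / m, computed as Im(psi* dpsi/dx) with hbar = m = 1."""
    dpsi = (psi(x + h, t) - psi(x - h, t)) / (2 * h)
    return (psi(x, t).conjugate() * dpsi).imag


def continuity_residual(x, t, h=1e-4):
    """d(rho)/dt + d(current)/dx by central differences; ~0 if (10) holds."""
    drho_dt = (rho(x, t + h) - rho(x, t - h)) / (2 * h)
    dj_dx = (current(x + h, t) - current(x - h, t)) / (2 * h)
    return drho_dt + dj_dx


def schrodinger_residual(x, t, h=1e-4):
    """i dpsi/dt + (1/2) d2psi/dx2 for the free equation; ~0 for this packet."""
    dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2 * h)
    d2psi = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2
    return 1j * dpsi_dt + 0.5 * d2psi


for x, t in [(0.5, 0.7), (-1.3, 2.0), (2.0, 0.1)]:
    print(x, t, abs(continuity_residual(x, t)), abs(schrodinger_residual(x, t)))
```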
Let us now extend this derivation to encompass the electromagnetic potentials from a mathematically sound, axiomatic approach.
3. Electromagnetic Fields and Minimal Coupling
If one is now interested in a quantization process that encompasses electromagnetic fields, in what is called the minimal coupling approach, the usual approach prescribes that it is only necessary to write rules (1) as
$$\mathbf{p}\rightarrow -i\hbar\nabla-e\mathbf{A},\qquad E\rightarrow i\hbar\frac{\partial}{\partial t},$$
where A is the vector potential. In this case, the potential energy in the Schrödinger equation is simply eφ, where φ is the electrostatic potential.
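The rationale for applying these rules comes from writing the classical Hamiltonian function in terms of the electromagnetic potentials. Thus, one starts by calculating the Lagrangian L of a particle in an electromagnetic field [28]. The acting force is the Lorentz force
$$\mathbf{F}=e\left(\mathbf{E}+\mathbf{v}\times\mathbf{B}\right).$$
Since F depends on the velocity, we have to find a generalized potential U that satisfies
$$\mathbf{F}=-\nabla U+\frac{d}{dt}\left(\nabla_{\mathbf{v}}U\right).$$
From Maxwell’s equations ∇·B = 0 and ∇×E = −∂B/∂t, we obtain
$$\mathbf{B}=\nabla\times\mathbf{A},\qquad \mathbf{E}=-\nabla\phi-\frac{\partial\mathbf{A}}{\partial t}.$$
Now, by plugging these results into the Lorentz force equation and carrying out some vector calculus, we end up with
$$\mathbf{F}=e\left[-\nabla\phi-\frac{d\mathbf{A}}{dt}+\nabla(\mathbf{v}\cdot\mathbf{A})\right],$$
where we made use of the fact that
$$\frac{d\mathbf{A}}{dt}=\frac{\partial\mathbf{A}}{\partial t}+(\mathbf{v}\cdot\nabla)\mathbf{A}.$$
To obtain the generalized potential U, a final observation is needed, namely,
$$\frac{d\mathbf{A}}{dt}=\frac{d}{dt}\left[\nabla_{\mathbf{v}}\left(\mathbf{v}\cdot\mathbf{A}-\phi\right)\right],$$
which is true since the electrostatic potential φ does not depend on the velocity. Comparing F with the equation for the generalized potential, we obtain
$$U=e\phi-e\,\mathbf{v}\cdot\mathbf{A},$$
which allows us to write down the Lagrangian as
$$L=\frac{1}{2}mv^{2}-e\phi+e\,\mathbf{v}\cdot\mathbf{A}.$$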
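The vector-calculus step relies on the standard identity v × (∇ × A) = ∇(v · A) − (v · ∇)A, valid when v does not depend on position, as for a particle’s velocity treated as an independent variable. The sketch below checks it numerically for a hypothetical vector potential A; both sides are assembled from the same finite-difference Jacobian, so they should agree to rounding error.

```python
import math


def A(r):
    """Hypothetical smooth vector potential, chosen only to exercise the identity."""
    x, y, z = r
    return (y * z, math.sin(x) * z, x * y**2)


def jacobian(f, r, h=1e-5):
    """J[i][j] = d f_j / d x_i by central differences."""
    J = []
    for i in range(3):
        rp, rm = list(r), list(r)
        rp[i] += h
        rm[i] -= h
        fp, fm = f(rp), f(rm)
        J.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return J


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curl(f, r):
    J = jacobian(f, r)
    return (J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0])


def grad_v_dot_A_minus_v_grad_A(v, r):
    """grad(v . A) - (v . grad) A, with v held fixed (independent of position)."""
    J = jacobian(A, r)
    grad_vA = [sum(v[j] * J[i][j] for j in range(3)) for i in range(3)]
    v_grad_A = [sum(v[i] * J[i][j] for i in range(3)) for j in range(3)]
    return [g - w for g, w in zip(grad_vA, v_grad_A)]


v = (0.5, -1.2, 2.0)
r = (0.3, -0.7, 1.1)
print(grad_v_dot_A_minus_v_grad_A(v, r))
print(cross(v, curl(A, r)))  # should agree component by component
```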
To derive the minimal coupling Hamiltonian, one has to pass from the classical kinetic momentum mv to the canonical momentum
$$\mathbf{p}=\nabla_{\mathbf{v}}L=m\mathbf{v}+e\mathbf{A}.$$
In quantum mechanics, the canonical momentum corresponds to the momentum operator −iℏ∇, so the kinetic momentum operator becomes −iℏ∇ − eA. The Hamiltonian may then be obtained by performing the Legendre transformation on L:
$$H=\mathbf{p}\cdot\mathbf{v}-L=\frac{(\mathbf{p}-e\mathbf{A})^{2}}{2m}+e\phi.$$
Being able to write down the Schrödinger equation for the electromagnetic potentials is particularly important given the role that these potentials play in quantum mechanics [29]. In the next subsection, we provide this derivation.
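The Legendre transformation can be checked numerically at a single phase space point. The mass, charge, and the values of φ and A at the particle’s position are hypothetical, and the units follow the text (no explicit factor of c). The sketch computes p = ∂L/∂v by central differences, confirms p = mv + eA, and compares p · v − L with (p − eA)²/2m + eφ.

```python
# Assumptions: hypothetical values at one point; units as in the text (no c).
MASS = 2.0
CHARGE = -1.5
PHI = 0.8                 # electrostatic potential at the particle's position
A_VEC = (0.3, -0.4, 1.1)  # vector potential at the particle's position


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lagrangian(v):
    """L = (1/2) m v^2 - e phi + e v . A."""
    return 0.5 * MASS * dot(v, v) - CHARGE * PHI + CHARGE * dot(v, A_VEC)


def canonical_momentum(v, h=1e-6):
    """p_i = dL/dv_i by central differences."""
    p = []
    for i in range(3):
        vp, vm = list(v), list(v)
        vp[i] += h
        vm[i] -= h
        p.append((lagrangian(vp) - lagrangian(vm)) / (2 * h))
    return p


def hamiltonian_legendre(v):
    """H = p . v - L, evaluated at the velocity v."""
    return dot(canonical_momentum(v), v) - lagrangian(v)


def hamiltonian_minimal(p):
    """H = (p - e A)^2 / (2m) + e phi, expressed in the canonical momentum p."""
    kinetic = [pi - CHARGE * ai for pi, ai in zip(p, A_VEC)]
    return dot(kinetic, kinetic) / (2 * MASS) + CHARGE * PHI


v = (1.0, 0.5, -2.0)
p = canonical_momentum(v)
print(p, [MASS * vi + CHARGE * ai for vi, ai in zip(v, A_VEC)])  # p = m v + e A
print(hamiltonian_legendre(v), hamiltonian_minimal(p))          # both equal m v^2/2 + e phi
```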
3.1. Derivation of the Schrödinger Equation for the Minimal Coupling
In what follows, we will present two different (and not equivalent) ways to derive a Schrödinger equation from the two basic postulates of the derivation of Section 2, now extended to include the electromagnetic potentials (or fields). This, of course, raises the question of which one is correct. We will show that the usual way of postulating the minimal coupling in quantum mechanics is not correct, and we will present the correct one.
3.2. Derivation Leading to the Usual Result
Let us begin by considering the scalar electric potential φ and the vector potential A
in relation to the Lorentz force (we will use
to simplify notation)
in which we are using Einstein’s summation convention, and where
Of course, this derivation is made in a non-relativistic context, which implies that we are not assuming Lorentz invariance. However, the same derivation can be made in a relativistic context, using four-vectors and the relativistic scalar product, to show that the relativistic quantum mechanical equations also follow as an extension of the present result [21]; in fact, the relativistic derivation is somewhat easier than the one we present in the following.
Now, to encompass the electromagnetic field, our axioms become
Axiom 3. The marginal characteristic function Z of the phase space probability density function F is defined by
where ℏ is a universal parameter with dimensions of angular momentum, A is the vector potential, and e is the electric charge. Moreover, this characteristic function must be written as
and should be expanded to second order in the real parameter δx.
Axiom 4. For an isolated system, the joint phase space probability density function F related to any quantum-mechanical phenomenon obeys the Fourier-transformed Liouville equation
to second order in δx, for all
. Our Liouville equation becomes
When we apply the Fourier kernel to the Liouville Equation (30) and perform the integrations, we obtain the following results:
For (D), we have
Thus,
, since it represents a divergence of a probability density function, and a probability density function must vanish at infinity; the term
, since
, and
, since
is an antisymmetric tensor. Thus, we end with
With these results, (30) becomes, by explicitly writing the tensor
,
where we note that the two terms involving ∂A/∂t cancel out; this is why the induction part of the electric field, −∂A/∂t, does not appear in the minimal coupling Hamiltonian.
Now, we expand the characteristic function to second order in δx to obtain
We now substitute (36) into (35) to obtain (up to first order in δx)
from which we obtain the two equations
which is the continuity equation, and
The last two equations are equivalent to the equation
$$i\hbar\frac{\partial\psi}{\partial t}=\frac{1}{2m}\left(-i\hbar\nabla-e\mathbf{A}\right)^{2}\psi+e\phi\,\psi, \tag{40}$$
which is the Schrödinger equation for minimal coupling, as we wanted to show.
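A standard property of the minimal coupling operator, illustrated here as a sketch rather than as part of the derivation above, is that a uniform vector potential can be absorbed into a phase: for any function χ, (−iℏ∂/∂x − eA)[e^{ieAx/ℏ}χ] = e^{ieAx/ℏ}(−iℏ∂χ/∂x), and the same holds for the squared operator. The one-dimensional setting, the constant A, the units ℏ = m = e = 1, and the test function χ are assumptions; derivatives are central differences.

```python
import cmath

# Assumptions: 1D, hbar = m = e = 1, uniform vector potential, V = 0.
HBAR = 1.0
MASS = 1.0
CHARGE = 1.0
A_CONST = 0.6   # hypothetical uniform vector potential
STEP = 1e-4     # finite-difference step


def chi(x):
    """Arbitrary smooth test function (hypothetical)."""
    return cmath.exp(-x**2 / 2 + 0.4j * x)


def phase(x):
    """Gauge phase exp(i e A x / hbar) for a uniform A."""
    return cmath.exp(1j * CHARGE * A_CONST * x / HBAR)


def gauged(x):
    return phase(x) * chi(x)


def momentum(f, a=0.0):
    """Return x -> (-i hbar d/dx - e a) f(x), using a central difference."""
    def g(x):
        df = (f(x + STEP) - f(x - STEP)) / (2 * STEP)
        return -1j * HBAR * df - CHARGE * a * f(x)
    return g


def kinetic_energy(f, a=0.0):
    """Return x -> (-i hbar d/dx - e a)^2 f(x) / (2m)."""
    twice = momentum(momentum(f, a), a)
    return lambda x: twice(x) / (2 * MASS)


x0 = 0.35
lhs = kinetic_energy(gauged, A_CONST)(x0)   # minimal coupling acting on e^{ieAx} chi
rhs = phase(x0) * kinetic_energy(chi)(x0)   # phase times the free operator on chi
print(abs(lhs - rhs))                       # small (finite-difference error only)
```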
4. The Generalization of the Derivation to Superposed States
We begin with the same axioms presented in Section 2 but introduce a generalization in the way the characteristic function is written. Axiom 1 now reads:
Axiom 5. The marginal characteristic function Z(x, δx; t) of the phase space probability density function F(x, p; t), defined by
$$Z(x,\delta x;t)=\int F(x,p;t)\,e^{ip\,\delta x/\hbar}\,dp, \tag{41}$$
where ℏ is a universal parameter with dimensions of angular momentum, is such that it can be written as
$$Z(x,\delta x;t)=\sum_{k,l}g_{kl}\,\psi_{k}^{*}\!\left(x-\frac{\delta x}{2};t\right)\psi_{l}\!\left(x+\frac{\delta x}{2};t\right), \tag{42}$$
where g is a matrix of constant parameters, which we will henceforth call the “metric”; this characteristic function must be expanded to second order in the parameter δx. The introduction of the metric g is necessary to correctly substitute each pure state ψ in (3) with a superposed state $\psi=\sum_k c_k\psi_k$.
Note that this is consistent with our previous understanding of the characteristic function as the density matrix. Indeed, in the case of a superposed state, Z(x, δx; t) = ⟨x + δx/2|ρ̂|x − δx/2⟩ is exactly the density matrix in the configuration space representation.
Thus, we generalize the characteristic function for pure states used in our previous derivations. Given the structure of the metric, we can have mixed states if g is diagonal, or superposed states for a general self-adjoint matrix g.
Indeed, for a superposition given by
$$\psi=\sum_{k}c_{k}\psi_{k},$$
we find, using (42), that the elements of the metric matrix for this case can be written as $g_{kl}=c_k^{*}c_l$.
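The relation between the metric and the superposition coefficients can be checked directly. Assuming the convention Z = Σ_{k,l} g_kl ψ_k*(x − δx/2) ψ_l(x + δx/2) and the choice g_kl = c_k* c_l, the sketch below compares this sum with ψ*(x − δx/2) ψ(x + δx/2) computed from ψ = Σ c_k ψ_k. The two basis functions (the first two harmonic oscillator eigenfunctions) and the coefficients are hypothetical; the basis functions are real, so their complex conjugates are omitted.

```python
import math

# Hypothetical two-state example: the first two harmonic-oscillator
# eigenfunctions (hbar = m = omega = 1) and arbitrary coefficients c_k.


def phi0(x):
    return math.pi ** -0.25 * math.exp(-x**2 / 2)


def phi1(x):
    return math.sqrt(2) * math.pi ** -0.25 * x * math.exp(-x**2 / 2)


BASIS = [phi0, phi1]
C = [complex(0.6, 0.0), complex(0.0, 0.8)]

# Metric of a pure superposition: g_kl = c_k^* c_l (convention assumed here).
G = [[C[k].conjugate() * C[l] for l in range(2)] for k in range(2)]


def psi(x):
    return sum(c * f(x) for c, f in zip(C, BASIS))


def Z_direct(x, dx):
    """psi*(x - dx/2) psi(x + dx/2) for the superposed state."""
    return psi(x - dx / 2).conjugate() * psi(x + dx / 2)


def Z_metric(x, dx):
    """sum_{k,l} g_kl psi_k(x - dx/2) psi_l(x + dx/2); basis functions are real."""
    return sum(G[k][l] * BASIS[k](x - dx / 2) * BASIS[l](x + dx / 2)
               for k in range(2) for l in range(2))


print(Z_direct(0.2, 0.5), Z_metric(0.2, 0.5))  # should agree
print(G)                                       # self-adjoint, rank one
```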
We may now repeat the steps made in Section 2 to obtain the equation for the characteristic function given by (7), since this equation does not depend on how we write this function.
Thus, we write the components of the probability amplitude vector as
$$\psi_{k}(x;t)=R_{k}(x;t)\,e^{iS_{k}(x;t)/\hbar},$$
since each $\psi_k$ refers to a pure state. Note that we are assuming one-dimensional physical systems; the generalization to more dimensions in configuration space is straightforward.
Expanding each probability amplitude $\psi_k$ up to second order, as we did in the previous sections, we obtain the following expression (we used symbolic computation throughout; these results can be found in the Supplementary Material as an algebraic computation file):
Note that if we assume mixed states, then g becomes diagonal, and we may write $g_{kl}=g_k\delta_{kl}$, where all $g_k$ are real and $\delta_{kl}$ is the Kronecker delta; in this case, the expression simplifies to
Equation (46) refers to mixed states, while (45) refers to superposed states in general. In the next subsection, we derive the Schrödinger equation for mixed states and then move on to the more involved superposed states.
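The difference between the diagonal (mixed) and the general (superposed) metric shows up already in the “trace” δx = 0. In the sketch below, which assumes two real orthonormal basis functions with equal weights (a hypothetical example), the superposed density differs from the mixed one by the interference term ψ₀ψ₁, which vanishes only after integration over x because the basis functions are orthogonal; both densities remain normalized.

```python
import math

# Hypothetical example: two real orthonormal basis functions (the first two
# harmonic-oscillator eigenfunctions with hbar = m = omega = 1).


def phi0(x):
    return math.pi ** -0.25 * math.exp(-x**2 / 2)


def phi1(x):
    return math.sqrt(2) * math.pi ** -0.25 * x * math.exp(-x**2 / 2)


BASIS = [phi0, phi1]
C = [1 / math.sqrt(2), 1 / math.sqrt(2)]                        # real coefficients
G_SUPER = [[C[k] * C[l] for l in range(2)] for k in range(2)]  # g_kl = c_k c_l
G_MIXED = [[0.5, 0.0], [0.0, 0.5]]                             # diagonal metric


def Z(metric, x, dx):
    """sum_{k,l} g_kl psi_k(x - dx/2) psi_l(x + dx/2) for real basis functions."""
    return sum(metric[k][l] * BASIS[k](x - dx / 2) * BASIS[l](x + dx / 2)
               for k in range(2) for l in range(2))


def integrate(f, a=-10.0, b=10.0, n=2001):
    """Trapezoid rule on [a, b]."""
    step = (b - a) / (n - 1)
    values = [f(a + j * step) for j in range(n)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


x0 = 0.5
interference = Z(G_SUPER, x0, 0.0) - Z(G_MIXED, x0, 0.0)
print(interference, phi0(x0) * phi1(x0))  # should agree
print(integrate(lambda x: Z(G_SUPER, x, 0.0)), integrate(lambda x: Z(G_MIXED, x, 0.0)))
```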
4.1. Mixed States Derivation
Mixed states are represented by the characteristic function
We begin with the characteristic function defined in (46) and substitute it into Equation (7) to find
Now, since the exponential terms are all linearly independent of each other, we must have, as in Section 2, the set of equation pairs (separating the real and imaginary parts of each pair in (48))
and
where
From what we have shown in Section 2, this result means that a Schrödinger equation must be valid for each probability amplitude $\psi_k$ that constitutes the mixed state.
4.2. Superposed States
Now, we must insert the characteristic function in (45) into Equation (7) and keep only terms up to first order in δx. If we proceed in this way, we are left with a result that gives exactly the same expression as shown in (48) for the diagonal terms.
Each of the non-diagonal terms is multiplied by an exponential term given by
Because the exponential functions in the expression obtained for different values of k and l are linearly independent, the coefficient of each of these exponential terms must vanish independently, given the results for the diagonal terms.
The terms corresponding to the diagonal elements reproduce exactly what we obtained in the previous subsection for mixed states. Thus, they suffice to show that each state of the superposition must satisfy a Schrödinger equation. It remains to show that the non-diagonal terms also vanish, assuming that the states of which they are composed satisfy a Schrödinger equation.
Since the non-diagonal terms are pairwise similar (apart from a complex conjugation, which renders the involved exponential functions linearly independent) and the functions
are real, we may treat only one term without loss of generality. We must thus focus on the substitution of
into (7). Let us write
which we already know is equivalent to the Schrödinger equation for
.
If we make that substitution and collect only terms up to first order in δx, we obtain the real part of the result as (see the Supplementary Material)
while, for the imaginary part, if we write
we obtain (see the Supplementary Material)
It should be noted that all terms of the real part (because of
and
) reduce to some form of Equation (50) for the functions
and
. In the same fashion, all terms of the imaginary part (because of
and
) reduce to some form of the continuity Equation (49) for the functions
and
.
Given the results for the diagonal elements, we know that
and Equations (55) and (57) are equal to zero. This means that the characteristic function built from a superposition of states also allows us to derive the Schrödinger equation (for each component of the superposition), even though the calculation is much more involved.
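The claim that the non-diagonal terms vanish when each component satisfies a Schrödinger equation can be illustrated numerically in the simplest setting. Assuming V = 0 and ℏ = m = 1, the characteristic-function equation obtained from the Fourier-transformed Liouville equation reads i ∂Z/∂t = −∂²Z/∂x∂δx; the sketch below builds a non-diagonal term ψ_k*(x − δx/2, t) ψ_l(x + δx/2, t) from two exact free Gaussian packets with different (hypothetical) mean momenta and checks that the residual of this equation is at the level of the finite-difference error. It does not reproduce the paper’s expansion to second order in δx.

```python
import cmath
import math

# Assumptions: hbar = m = 1 and V = 0; the packets and momenta are hypothetical.


def free_packet(k0):
    """Exact free Gaussian packet (initial width 1) boosted to mean momentum k0."""
    def psi(x, t):
        a = 1 + 0.5j * t
        y = x - k0 * t
        envelope = (2 * math.pi) ** -0.25 * a ** -0.5 * cmath.exp(-y**2 / (4 * a))
        return envelope * cmath.exp(1j * (k0 * x - 0.5 * k0**2 * t))
    return psi


PSI_K = free_packet(-0.8)
PSI_L = free_packet(1.5)


def z_cross(x, dx, t):
    """Non-diagonal term psi_k^*(x - dx/2, t) * psi_l(x + dx/2, t)."""
    return PSI_K(x - dx / 2, t).conjugate() * PSI_L(x + dx / 2, t)


def residual(x, dx, t, h=1e-4):
    """i dZ/dt + d2Z/(dx d(dx)) by central differences; ~0 if the equation holds."""
    dz_dt = (z_cross(x, dx, t + h) - z_cross(x, dx, t - h)) / (2 * h)
    d2z = (z_cross(x + h, dx + h, t) - z_cross(x + h, dx - h, t)
           - z_cross(x - h, dx + h, t) + z_cross(x - h, dx - h, t)) / (4 * h * h)
    return 1j * dz_dt + d2z


for point in [(0.3, 0.4, 0.5), (-1.0, 1.2, 1.5), (0.8, -0.6, 0.0)]:
    print(point, abs(residual(*point)))
```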
5. Conclusions
Adopting an axiomatic approach to a theory entails the burden of demonstrating that the axioms withstand all the generalizations necessary to derive the theorems of the theory in broader contexts, the Schrödinger equation being one of these theorems.
Thus, we have already shown [16,19] that the axioms allow us to derive the Schrödinger equation directly in any coordinate system; that our axioms are equivalent to Feynman’s path integral method (now represented in phase space); that their relativistic extension gives the relativistic counterparts of the Schrödinger equation [21]; and that they can be extended to encompass non-Hamiltonian dissipative systems [30].
In this paper, we have shown that the usual minimal coupling approach to quantizing physical systems in electromagnetic fields can be directly obtained, along with the generalization of the results to mixed and superposed states. Unlike the generalization of the usual Schrödinger equation to many dimensions, which is quite direct, all these other derivations are algebraically involved, as shown in the present paper for two of them (for the derivation of the Schrödinger equation in generalized coordinates, see [19]).
It should be noted that it was also possible to derive a dissipative Schrödinger equation, represented by the Caldirola–Kanai equation [31,32,33], from a slight modification of the axioms [30], since the present derivation method can be easily extended to non-Hamiltonian contexts.
These extensions provide additional support for the consistency and applicability of the proposed axiomatic framework. Note that we do not need to show that the axioms are unique. It is only necessary that they “do the job”, that is, that they actually allow us to derive the Schrödinger equation. As mentioned in the Introduction, this derivation was already shown to be equivalent to Feynman’s path integral approach and to many stochastic derivations of the Schrödinger equation. Another derivation of the Schrödinger equation from a different set of axioms was already made [23]. The important fact here is that all these derivations were shown to be equivalent [22,23]. We are not assuming that these axioms form a minimal set, although it is worth noting that there are only two axioms, which suggests that this may indeed be a minimal set, even if this is not relevant to judging the adequacy of the derivation. What is presented here is a formal derivation (and its extensions) that encompasses, as equivalent, many other derivations presented in the literature over the course of decades. In this sense, this derivation also provides a synthesis of the whole field of derivations of the Schrödinger equation.
Thus, as mentioned in the Introduction, in matters of interpretation, if we keep ourselves within the boundaries established by the axioms [25], we should be wary of a number of semantic constructs that do not appear in these axioms but are still present in many interpretations. Active observers, reduction of the wave packet, duality, complementarity, many worlds, consciousness, and wholeness, to cite but a few, should be considered in this context as helpful historical attempts to understand a field that has proven impervious to an interpretation devoid of extraneous semantic constructs. However, the semantic constructs that the axiomatic derivation does not introduce must be discarded at some point in the theory’s development. This interpretation is presented in [16] and is epistemologically justified in [25].
As a final note, we stress that all the previous developments were made within the context of the Schrödinger equation. It is not easy, if feasible at all, to think of extensions of the approach to include quantum field theory.