What Can We Learn from Entanglement and Quantum Tomography?

Entanglement has become a hot topic in nuclear and particle physics, although many physicists are not sure they know what it means. We maintain that an era of understanding and using quantum mechanics on a dramatically new basis has arrived. We review a viewpoint that treats the subject as being primarily descriptive and completely free of the intellectual straitjackets and mysticism argued over long ago. Quantum probability is an extension of classical probability, but with universal uses. Density matrices describe systems where entanglement or its absence is a classification tool. Most of these facts have been known for decades, but there is a new way of understanding them that is liberated from the narrow outlook of the early days.


What Is Happening with Entanglement?
"Entanglement" is a concept from quantum mechanics that has become a hot topic. A search for papers in the nuclear/particle physics database Inspire produced about 35 times more recent titles when using the keyword "entanglement" as compared to when "confinement" was used (see Figure 1). However, what is entanglement exactly, and why should we care about this feature of quantum mechanics? We believe that quantum mechanics is undergoing a renaissance for more than one reason. For a long time, nuclear and particle physics automated quantum mechanics with perturbative recipes. Yet since we care about strong interaction physics and want to understand it, should we not be more thoughtful? Moreover, for a long time, discussing quantum mechanics was taboo because verbiage from the Bohr era had devolved into mysticism (see Figure 2). Quantum probability is an extension of classical probability. It is a mathematical tool which simplifies the description of vector data. The schoolbook notion of "probability" as being estimated by counting frequencies [2] or by making distributions refers to a classical procedure that does not capture the broader features of quantum probability originating in the Born rule.

Axioms That Physics Contradicted
The history of probability is curious. First, "probability" is not straightforward to define. There is no surprise that Euler, Laplace, and others wrote about it early. It is interesting that the early thoughtful work was more Bayesian than frequentist, and that the classical frequentist dogma only appeared in the 20th century. Surprisingly late in history, Kolmogorov in 1933 elected himself to define probability [3]. He needed only three axioms. One of the consequences (sometimes swapped for an axiom) is a statement of additivity for probabilities in different channels A, B, which is expressed as follows:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B). (1)

The term subtracted avoids double counting. Evidently, Kolmogorov was not thinking about physics because by 1933 physics had already made Equation (1) and those axioms obsolete. In 1926, Born introduced the "Born rule", which sometimes violated Equation (1). Quantum probability is not based on Equation (1) and often contradicts classical probability. However, that is not what we have been told.
Early on, we were told that quantum physical systems are inherently probabilistic and perhaps impossible for the human brain to understand, and also that the mathematics of linear algebra was very difficult. We later found that the mathematics is easy, while unaware that a presentation framed in the classical notions of probability, distributions, and so on would lead to contradictions. That is the situation to this day. A mistake about probability begins with a statement [3] that the purpose of quantum mechanics is to make a wave function ψ(x) so that the probability distribution of point-like particles at x is f(x) = ψ*ψ(x). That gaffe makes the premises of the old quantum theory define the features of a theory that contradicts those premises. (The "particles" of quantum field theory and Feynman diagrams are delocalized plane waves.) The Born rule for probability does not explicitly refer to a probability distribution, and it should not be introduced as one. It is so simple that it is hidden in plain sight. We might consistently use f̃(k) = ψ̃*(k)ψ̃(k) as a probability distribution in an experiment involving a momentum k. However, a distribution transforms with a Jacobian factor. In quantum probability, we do not compute f̃(k) = |∂x/∂k| f(x), where |∂x/∂k| is the Jacobian determinant of the transformation from x to k. As youngsters, however, we were never told about this; we were instead told that ∫d³x ψ*ψ(x) = 1 was the total probability, along with Parseval's theorem ∫d³k ψ̃*ψ̃(k) = 1, which seemed to be enough to define probability. It did not.
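The distinction can be checked numerically. In this sketch (numpy; the Gaussian packet and grid parameters are our illustrative choices), both the position-space and momentum-space Born-rule quantities normalize to one by Parseval's theorem, yet the momentum distribution narrows as the position distribution widens, which no Jacobian change of variables could produce:

```python
import numpy as np

# Gaussian wave packet; the grid and width sigma are illustrative choices.
N, L, sigma = 2048, 40.0, 2.0
x = np.linspace(-L/2, L/2, N, endpoint=False)
dx = x[1] - x[0]
psi = (np.pi * sigma**2) ** (-0.25) * np.exp(-x**2 / (2 * sigma**2))

# Position-space Born-rule quantity f(x) = psi* psi(x), normalized to 1
f_x = np.abs(psi) ** 2
print(np.sum(f_x) * dx)          # ~1.0

# Momentum-space wave function by Fourier transform (continuum convention)
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)
dk = 2 * np.pi / L
psi_k = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi)
f_k = np.abs(psi_k) ** 2
print(np.sum(f_k) * dk)          # ~1.0, by Parseval's theorem

# f_k is NOT |dx/dk| f(x): it narrows as f(x) widens, with a fixed width product.
var_x = np.sum(x**2 * f_x) * dx
var_k = np.sum(k**2 * f_k) * dk
print(var_x * var_k)             # ~0.25, the minimum-uncertainty product
```

Rerunning with a different sigma rescales the two widths oppositely while both normalizations stay at one; the Born rule in k defines its own distribution, not a change of variables applied to f(x).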
The point is that quantum probability is an entirely new descriptive tool. After the first probability-based mistake (in the old days) came an intimidating axiomatic formulation, which was rather similar to Euclidean geometry for operators, observables, and measurement, and it was completely unlike any other subject in physics. Do physicists write or read axioms? The old presentation, which we strongly recommend changing, repeatedly suggested that the more rigorous the mathematics, the more contradictory the physical interpretation was, so that contradictions became necessary, and possibly desirable, until a form of enlightenment was achieved. We disagree. You can find our thoughts developed in a book [1].

The Power of Polarization
Problems started with not defining probability in the first place. Long before Born, since the early 19th century, physical systems violating Equation (1) had already been known. The main example is the behavior of polarized light. To the human eye, a linear polarizer is a passive filter (an absorber) transmitting 1/2 the power of unpolarized light. The "absorber theory" predicts that lining up two filters, one in front of the other, will allow 1/4 of the power to pass through. Yet, positioning two aligned polarizers allowed 1/2 of the light to pass through, which is impossible for the absorber theory to explain. The arrangement of two crossed polarizers also allowed no light to pass, thus creating another contradiction. At this point, the absorber theory had to evolve and propose two kinds of light: light polarized in the x or y direction, which is already a concept error when any orthogonal coordinate system can be used. Muddling along with a classical probabilistic description, one can define E_x, E_y as mutually exclusive types of light. Absorber theorists can "explain" the data by proposing a conditional probability, P(E_a|E_b) = δ_ab, the probability that species E_a is detected after the filter given that E_b comes into the filter; here δ_ab is the Kronecker delta. However, this is as far as one can ride the horse of Kolmogorov's axioms, because there will come a stream that horse cannot cross.
Here is the decisive experiment. Take two crossed polarizers; between them, insert a new polarizer with a 45° orientation (see Figure 3). Light is then transmitted as if the intermediate absorber created light. A certain level of intensity adds rather than subtracts in the intersecting region P(A ∩ B). We all know how to perform the calculation. A polarizer oriented to pass x-polarization is represented by the projection operator |x><x|, and so on. The sequence of crossed polarizers is represented by |y><y|x><x| = 0. The defect of the classical probabilistic description is that "objects" are grouped into mutually exclusive equivalence classes, which are either "apples" or "oranges". In comparison, vector quantities cannot be limited to mutually exclusive classes. Unlike edible fruits, any vector can be considered to be a superposition involving any other vector. This is the quickest road to demystifying the Born rule, which is a natural outcome of bookkeeping for vector-valued physical quantities. Any normalized vector |v> on any space can be considered as a multiple of any other normalized vector |w>, in addition to a component orthogonal to |w>. The formula is as follows:

|v> = |w><w|v> + (|v> − |w><w|v>). (2)

The second term, in parentheses, is orthogonal to |w>. The first term has the interpretation of the pre-existing, undisturbed vector |w> lurking inside of |v>, which will appear when any filter is aligned to pass |w>. The relative intensity of that projection is |<w|v>|². Any time the probability of an event is estimated in proportion to the intensity or to the collision rate or to the number of counts of a phototube, linear algebra predicts a conditional probability P(|w> given |v>) = |<w|v>|², unless something about the measurement disturbs the pre-existing projection. (Compare the tradition of associating the Born rule with the "irreducible disturbance of measurement due to the finite value of Planck's constant". It is specious.
When a system is disturbed enough, the simple projective recipe of the Born rule has no reason to apply.) One might feel that we have gone too quickly from "intensity" to "probability". One can think there is no need to invoke any notion of probability in order to measure classical radiation. However, a good experimentalist will tell us that everything measured is probabilistic and means nothing without that context and its error bars. When we evaluate experiments, we use the probability of the data (given the hypothesis) in order to obtain the probability of the hypothesis (given the data). It is not usually recognized that classical electrodynamics has long defined a distribution with the use of the Born rule. It appears in symbols such as dI/dωdΩ for the distribution of intensity in frequency ω and in the solid angle Ω. This is absolutely a distribution transforming under changes in frequency variables using the rule of a distribution. However, it is not the kind of classical distribution that comes from changing time to frequency with a Jacobian factor. The formulas are used by electrical engineers (and physicists [4,5]) with noisy radar problems as statistical quantities, but without noticing where the Born rule enters. The frequency distribution comes when going into the frequency and wave number domain of the electric field E. Just as in quantum theory, the Fourier transform from E(x, t) to Ẽ(k, ω) is squared to make the distribution. The argument that this quantum-style recipe is correct is remarkably shallow [6], rather similar to elementary quantum textbooks. The total ∫dt E²(t) = ∫dω Ẽ²(ω) is maintained by the Parseval theorem, and that is enough to have a "good distribution".
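The three-polarizer calculation above can be sketched numerically with the projection operators just described (numpy; representing unpolarized input light by the density matrix I/2 is our modeling assumption):

```python
import numpy as np

def projector(theta):
    """Projection operator |theta><theta| for a linear polarizer at angle theta."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

Px, P45, Py = projector(0.0), projector(np.pi / 4), projector(np.pi / 2)

rho = np.eye(2) / 2          # unpolarized light, relative intensity 1

def transmitted(chain, rho):
    """Relative intensity after a sequence of polarizers: tr(M rho M†), M = Pn...P1."""
    M = np.eye(2)
    for P in chain:
        M = P @ M
    return np.trace(M @ rho @ M.conj().T).real

print(transmitted([Px, Py], rho))        # crossed polarizers: 0.0
print(transmitted([Px, P45, Py], rho))   # inserting the 45° filter: ~0.125
```

Inserting the intermediate filter raises the transmission from zero to 1/8 of the input, exactly the "absorber that creates light" of the experiment.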
We now ask: has our discussion been about classical physics or quantum physics? We have been talking about what was already known in classical physics, which happens to appear again in quantum physics. Noticing such things can make one more comfortable with quantum mechanics, although there are difficult interpretational issues (Section 2.3) that cannot be dismissed. Hence, while we can talk about "intensity" in a self-explanatory way, a lot is hidden in the multiple uses of the word "probability". In effect, and according to an axiom that is even greater than Kolmogorov's, the word "probability" will mean what people decide it means. The word is not self-explanatory and cannot simply be reduced to mean a number defined by counting relative frequencies of events. (Probabilities can be estimated with information from counting frequencies, not defined.) Even in classical probability, members of the Bayesian school do not allow their subject to be limited by Kolmogorov's opinions.
Then, here are the facts: Since quantum probability uses the Born rule, it can sometimes violate the assumptions of classical probability, and that is no crisis. Classical probability itself is a far more subtle topic than probability defined by counting frequencies. Instead of imagining that quantum objects are "crazy, inexplicable jelly beans" described by classical probability, consider that quantum probability is an extension of classical probability that uses the Born rule, because that is manifestly true. We are not supposed to understand the correlations of entanglement, etc., which are features of the new mathematical description, in terms of the old description. Furthermore, a new mathematical description is not automatically a new discovery about the universe.

Stern-Gerlach and Calcite
We will now show that entanglement had been recognized and used in classical physics for about 100 years, before Schrödinger even coined the word for quantum physics [7].
The usual presentation is as follows: Two scientific gentlemen (with initials SG) sent silver atoms through a region with a non-uniform magnetic field. They were astonished to find two spots of silver plated out onto a paper card. Each spot can be described with (1, 0) or (0, 1), which are highly abstract notations for spin "up" or "down". Those numbers are the components of a polarization wave function |ψ> projected onto a basis with the eigenvalues of σ_z = m, thus ψ_m = <m|ψ>. Moreover, when the particular beam analyzed by (1, 0) is sent to a new SG apparatus rotated at the angle θ, it will split again into two beams by projecting it onto the new basis |m′(θ)> with the use of |m> = Σ_m′ |m′(θ)><m′(θ)|m>. The probability of |m′(θ)> is given by the Born rule |<m′(θ)|m>|². This is said to represent the discovery that quantum physics needs two axiomatic foundations: "Measurement of a quantity will yield the eigenvalue of the corresponding Hermitian operator and cast the state into the corresponding eigenvector" and "with probability given by the Born rule". To be fair, in the age of QIS, those are called "ideal projective measurements", which apply sometimes, but not always. This fact is not so well-known to the larger community of students and physicists, and so we continue.
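The projection bookkeeping for the rotated apparatus can be sketched as follows (numpy; the rotation about the y axis and the angle π/3 are illustrative choices):

```python
import numpy as np

def sg_basis(theta):
    """Spin-up/down eigenvectors of a Stern-Gerlach apparatus rotated by theta about y."""
    up = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    down = np.array([-np.sin(theta / 2), np.cos(theta / 2)])
    return up, down

theta = np.pi / 3
up0 = np.array([1.0, 0.0])        # the (1, 0) beam selected by the first apparatus
up_t, down_t = sg_basis(theta)

# Born-rule probabilities for the two beams out of the rotated apparatus
p_up = np.abs(up_t @ up0) ** 2
p_down = np.abs(down_t @ up0) ** 2
print(np.isclose(p_up, np.cos(theta / 2) ** 2))   # True: the familiar cos^2(theta/2)
print(np.isclose(p_up + p_down, 1.0))             # True: completeness of the new basis
```

Nothing beyond linear algebra enters: the two probabilities are the squared projections onto the rotated basis, and they sum to one because the basis is complete.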
In fact, entanglement is missing from that exercise validating circular axioms. It explains why one does not need them. The 19th century presentation is as follows: Light is described with an electric field E(x, t). This is an infinite dimensional matrix E_xa(t), where subscript x stands for x, and subscript a stands for the polarization. Any matrix has a singular value decomposition (svd) [8], namely

E_xa = Σ_αβ φ_x^(α) Λ_α δ_αβ ε_a^(β) = Σ_α Λ_α φ_x^(α) ε_a^(α).

The labels α, β are the names of orthonormal components, and the sum, at most, runs up to the dimension of the smaller space. The numbers Λ_α are positive and called singular values; they are invariants under orthogonal transformations in both spaces. The meaning of svd is that a vector in a direct product space (here, space ⊗ polarization) can be written as a diagonal sum of products of special orthonormal vectors customized to the object. This is very surprising. Without it, one would assume that vectors in product space must be expanded into the sums of all possible products of the basis vectors of the separate spaces, which can be quite ugly. The fascinating history of svd is reviewed in Ref. [9].
The existence of svd was clearly not known to many physicists before 1930. The lack of the theorem explains how a fact of mathematics was misunderstood as representing the discovery of a new physical principle. For example, Dirac's textbook discussion of the interferometer [10] comes right to the issue of classical polarization while lacking the information about entanglement with the vacuum; it finally capitulates to postulating what was observed. The world is cheated every time this sort of thing happens. As Feynman must have said, "Nothing is explained by a postulate made for what was not explained." While svd was not known in 1830, the fact of light having two polarizations was known, and so was the completeness of the ansatz for crystal optics (see Figure 4):

E_a(x, t) = φ^(1)(x, t) ε_a^(1) + φ^(2)(x, t) ε_a^(2). (3)

We absorbed the singular values into the φ symbols. One usually sees a less complete ansatz with one term and φ^(1)(x, t) = e^{i(k·x−ωt)}. Two terms are needed to understand the propagation of light in birefringent materials, where calcite crystals ("iceland spar") display exceptionally striking behavior; see Figure 5. The reader may not be prepared to learn that light passing through calcite splits into two beams, with each beam having a unique polarization that is orthogonal to the other. However, that is in fact what Equation (3) predicts. The spatial wave functions of the beams φ^(1), φ^(2) are orthogonal since they go in different directions, so each must be strictly correlated with an orthogonal polarization. The scientists of 1830 knew Snell's Law, that two beams split by calcite implied two refractive indices; they were actually savvy enough to know that a tensor index of refraction would become diagonal in the frame of its eigenvectors [11] (although this language developed later). Moreover, the intensity in each beam is simply the square of the projection onto the corresponding propagating polarization eigenstate.
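The svd bookkeeping is easy to verify numerically. In this sketch (numpy), random complex data stand in for a sampled field E_xa; however large the spatial grid, at most two terms appear:

```python
import numpy as np

rng = np.random.default_rng(0)

# A classical field sampled on 100 spatial points with 2 polarization components,
# stored as a matrix E[x, a] (purely illustrative data).
E = rng.normal(size=(100, 2)) + 1j * rng.normal(size=(100, 2))

# Singular value decomposition: E = sum_alpha Lambda_alpha phi^(alpha) eps^(alpha)
phi, Lam, eps_dag = np.linalg.svd(E, full_matrices=False)

# At most two singular values, however fine the spatial grid:
print(len(Lam))                                            # 2
# The spatial modes and the polarization modes are each orthonormal:
print(np.allclose(phi.conj().T @ phi, np.eye(2)))          # True
print(np.allclose(eps_dag @ eps_dag.conj().T, np.eye(2)))  # True
# The two-term diagonal sum of products reconstructs the field exactly:
E_rebuilt = sum(Lam[a] * np.outer(phi[:, a], eps_dag[a]) for a in range(2))
print(np.allclose(E, E_rebuilt))                           # True
```

This is the content of Equation (3): the polarization space has dimension two, so the whole field collapses to two products of customized orthonormal vectors.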

Figure 5. A single LED (light-emitting diode) light source is behind the calcite crystal in this paper author's hand. The beam of light splits into two beams, using the same mathematical fact as in the Stern-Gerlach experiment. The experiment and the mathematics behind it were major clues to the polarization of matter waves, which has often been misinterpreted. Figure is taken from Ref. [1].

Let us now define entanglement for wave functions. If a wave function in a direct product space A ⊗ B is a simple product of wave functions, |ψ(A, B)> = |ψ_1(A)> |ψ_2(B)>, it is separable and not entangled. Otherwise, it is entangled. (It is really that simple.) If Equation (3) were written with one spatial wave function times one polarization wave function, it would be separable. The expression shown is entangled. Again we ask: has our discussion been about classical physics or quantum physics? We have been talking about what was already known in classical physics, which happens to be recycled in quantum physics. The reason why the classical and quantum treatments share so much is that the topic is vector-valued data.
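Under this definition, deciding whether a two-component wave function is separable amounts to counting the nonzero singular values of its coefficient matrix. A small sketch (numpy; the example states are standard illustrations, not taken from the text):

```python
import numpy as np

def schmidt_rank(psi_AB, tol=1e-12):
    """Number of nonzero singular values of the coefficient matrix psi[a, b]."""
    s = np.linalg.svd(psi_AB, compute_uv=False)
    return int(np.sum(s > tol))

# Product state |0>_A (|0>_B + |1>_B)/sqrt(2): separable, one singular value
product = np.outer([1, 0], [1, 1]) / np.sqrt(2)
# Bell-type state (|00> + |11>)/sqrt(2): entangled, two singular values
bell = np.array([[1, 0], [0, 1]]) / np.sqrt(2)

print(schmidt_rank(product))  # 1 -> separable
print(schmidt_rank(bell))     # 2 -> entangled
```

A separable wave function is exactly an svd with one term; any second nonzero singular value signals entanglement.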

Observables and the Density Matrix
Very early in the conventional presentation of quantum theory, the states of the "quantum universe" were said to be given by wave functions |ψ> or "pure states". A symbol <A> = <ψ|A|ψ> was introduced and interpreted in terms of classical probabilities. In the presentation in [3], which we criticize for misleading students, but not experts in quantum information science, "probability" was set up as a naive, self-defining counting of frequencies that needed no definition. (The trick was to hide the depth of the concept. Compare to classical Bayesian probability for the opposite.) Representing |ψ> = Σ_n |a_n><a_n|ψ>, where A|a_n> = a_n|a_n>, algebra gives <A> = Σ_n |<a_n|ψ>|² a_n. As everyone knows, this is consistent with experiments that measure eigenvalues a_n with the Born rule probabilities |<a_n|ψ>|² and add them up with the rules of classical probability. In fact, the presentation appears to prevent any other interpretation. It is built to validate a "postulate" of ideal projective measurement that always measures an eigenvalue and casts the system into the corresponding eigenstate, with no exceptions. However, the whole presentation in fact lacks generality. It is about the exceptional case that a pure state would be relevant and would be measured as described.
Furthermore, a mathematical theorem says that only operators that commute can share eigenvectors. On an N-dimensional complex space, there are N commuting operators, of which one (the identity) carries no information. The limitation of measuring at most N − 1 real numbers of a wave function that is described by 2N − 2 real numbers (accounting for an unobservable norm and phase) seems to rigorously imply that a wave function cannot be observed. (The number of independent variables of a spin-vector and a density matrix happen to agree for a single spin-1/2 system. It creates an unfortunate tendency to think that polarization and spin describe the same thing. Counting for any other case shows this is not true.) Some early workers, who were committed in advance to a philosophical principle on the unobservability of the quantum world, found this to be a perfect confirmation of their bias. However, the conclusion lacks generality; it is no better than the assumptions leading to it.
In fact, not all measurements are ideal projective ones. Quantum probability is not limited to being described by wave functions. The fundamental object of quantum probability is the density matrix. Here is a simple motivation. Suppose we combine the measurements of <A>_α = <ψ_α|A|ψ_α>, where |ψ_α> are wave functions we may or may not know or control. At this point, <A>_α are numbers. We do not pretend to know or limit the experimental circumstances that have led to them. Let ρ_α > 0 be numbers interpreted as some kind of classical probability. We agree that Σ_α ρ_α = 1. An observable will thus be defined as an average that is weighted with those numbers:

<A> = Σ_α ρ_α <ψ_α|A|ψ_α> = tr(ρA), (4)

where

ρ = Σ_α ρ_α |ψ_α><ψ_α|. (5)

Here, tr stands for the trace. Equation (4) is the actual definition of an observable in quantum mechanics. While this is not made clear or is completely omitted in many textbooks, the book by Ballentine [12] can be recommended for giving it full attention.
Equation (5) looks like the representation of an operator in the basis of its orthonormal eigenvectors, often called the "spectral resolution". The eigenvector expansion is ρ = Σ_α ρ_α |ρ_α><ρ_α|, with ρ|ρ_α> = ρ_α|ρ_α>. We specifically did not impose that. In Equation (5), we intended |ψ_α> to be completely general. We also did not limit how the weights ρ_α were to be assigned. Instead of coming from the guts of an experimental apparatus, they might be assigned by a computer. We might measure the energy of a particle accelerator with a thermometer and never talk about the eigenvalues of a thermometer operator. The interesting fact is that once ρ is made, a great deal of information might be lost about how it was made. This is a clue to how quantum probability works. An enormous amount of information is "compressed", in the sense of image processing, by going from raw data to a density matrix that describes some of its features.
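The information loss is easy to exhibit. In this sketch (numpy; the two ensembles are our illustrative choices), two visibly different preparations compress to the same density matrix and therefore the same observables:

```python
import numpy as np

# Pauli-z observable
A = np.array([[1, 0], [0, -1]], dtype=complex)

def rho_from_ensemble(weights, states):
    """rho = sum_alpha rho_alpha |psi_alpha><psi_alpha|, as in Equation (5)."""
    return sum(w * np.outer(v, v.conj()) for w, v in zip(weights, states))

# Preparation 1: equal mix of spin-up and spin-down
rho1 = rho_from_ensemble([0.5, 0.5], [np.array([1.0, 0]), np.array([0, 1.0])])
# Preparation 2: equal mix of |+> and |->
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
rho2 = rho_from_ensemble([0.5, 0.5], [plus, minus])

# Different histories, identical density matrix: the preparation is "compressed" away.
print(np.allclose(rho1, rho2))      # True
print(np.trace(rho1 @ A).real)      # 0.0, the observable of Equation (4)
```

Once ρ is made, no measurement of the form tr(ρA) can tell the two preparations apart.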
The density matrix eliminates many elementary paradoxes. Everyone has heard the following: "In classical physics one adds probabilities in distinct channels, P(A + B) = P(A) + P(B) (with allowance for double counting, Equation (1)). In quantum physics one adds amplitudes, ψ(A + B) = ψ(A) + ψ(B), and then the probability P(A + B) = (ψ*(A) + ψ*(B))(ψ(A) + ψ(B)) = P(A) + P(B) + 2Re(ψ*(A)ψ(B)) is different due to interference". This is sometimes true, but believing it is always true causes mistakes. Dealing exclusively with pure states prevents such a simple thing as adding the intensity of two beams of sunshine or describing what is meant by an unpolarized system. There is no wave function for an unpolarized photon or electron. Indeed, there is no wave function for most things observed in physics. In nuclear and particle physics, we have "exclusive" and "inclusive" experiments. The inclusive experiments sum over unobserved channels and cannot be described with wave functions, but need density matrices. So long as we cannot control the Universe that controls the experiments, there may not exist a truly exclusive experiment.
Since the diagonal elements of a density matrix have the features of classical probability (being positive and adding up to one), a density matrix can predict a distribution. However, a distribution cannot replace a density matrix. The parton "distributions" are not distributions but density matrices [13], which makes all the difference when one starts measuring off-diagonal components.
Recapitulating: While we were motivated by a density matrix beginning with wave functions, it is much more general to begin with density matrices as the fundamental description of a quantum "state". In general, a density matrix is positive, written as ρ > 0, which means it has positive eigenvalues. Nothing else is needed. One will usually see two more fundamental requirements: ρ = ρ†, which is called Hermiticity, and Tr(ρ) = 1, a normalization postulate. However, positivity implies real eigenvalues, which in turn implies Hermiticity, so Hermiticity is not a separate postulate. (The condition ρ = ρ† is a demand for notation where it applies.) We can also dispense with requiring Tr(ρ) = 1 by defining our observables as <A> = Tr(ρA)/Tr(ρ). Then, Tr(ρ) = 1 is a convention.
Since density matrices define the general state (also called a "mixed state"), a system that can be described by wave functions is exceptional. A "pure state" is a rank-one density matrix: ρ = |ψ><ψ|. Given ρ, one derives |ψ> as its sole eigenvector. Eigenvectors do not have pre-determined normalizations or phases. This fact eliminates the need for a postulate about the unobservable phase and norm of wave functions. The ideal projective measurements that are given so much importance are then guaranteed circular outcomes when and if a rank-one density matrix commutes with an operator A, namely [ρ, A] = 0. Then, ρ = |a><a| for some eigenvector, and <A> = a will indeed be an eigenvalue. Thus, if one only deals with the exceptional case of pure states, quantum mechanics can be represented with wave functions. In the generic case, no wave function can do the job of a density matrix, and attempting to fully understand entanglement in those terms will fail.
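These statements are easy to check numerically. A sketch (numpy) of a rank-one pure state versus the unpolarized mixture; using the purity Tr(ρ²) to separate the two cases is our illustrative diagnostic, not something introduced in the text:

```python
import numpy as np

def purity(rho):
    """Tr(rho^2): equals 1 for a rank-one (pure) density matrix, less for mixtures."""
    return np.trace(rho @ rho).real

psi = np.array([1.0, 1.0j]) / np.sqrt(2)
rho_pure = np.outer(psi, psi.conj())   # rank-one: a pure state
rho_mixed = np.eye(2) / 2              # unpolarized: no wave function exists

print(purity(rho_pure))    # ~1.0
print(purity(rho_mixed))   # 0.5

# The wave function is recovered as the sole eigenvector with nonzero eigenvalue;
# its normalization and phase are conventions of the eigenvector routine.
w, v = np.linalg.eigh(rho_pure)
print(np.isclose(w[-1], 1.0))   # True; v[:, -1] equals psi up to a phase
```

For the mixed state, the eigenvalue spectrum (1/2, 1/2) singles out no eigenvector at all, which is the precise sense in which no wave function describes it.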

Entanglement with Density Matrices
Since wave functions describe an exceptional type of entanglement, there needs to be a definition of entanglement in terms of density matrices. Quantum theory was so delayed by early squabbles that the criteria for density matrix entanglement did not appear until the 1989 paper of Werner [14]. Two systems A, B are separable when

ρ(AB) = Σ_j P_j ρ_j(A) ⊗ ρ_j(B). (6)

Here, each factor ρ_j( ) is a positive matrix, and P_j > 0 are positive numbers. Systems that are not separable are entangled. Let the Hamiltonian be H(AB) = H_A + H_B + H_AB, where H_AB is the interaction term. Let U_A(t) = e^{−iH_A t}, and so on. Then, neglecting H_AB,

ρ(AB, t) = Σ_j P_j U_A(t) ρ_j(A) U_A†(t) ⊗ U_B(t) ρ_j(B) U_B†(t),

which has the same separable form. Thus, when separability exists, it is invariant under time evolution neglecting interactions. Separable systems are invariably used in zeroth-order perturbation theory. Separability in interacting systems does not usually happen.
It is interesting and powerful to recognize that the separable case of Figure 6 stands for a "factorization theorem". Add external legs to represent measured particle momenta (see Figure 6). The diagram is a basic factorization theorem, up to refinements for subtle (typically soft) perturbative interactions that may not factorize, but which can be dealt with in order to make a theorem work out. The point is that separability that is conditional upon a probe is factorization, and vice versa. Alternatively, systems with probes that produce factorization are separable and not entangled. This is new.
Perturbative QCD uses factorization, maintaining separability under subsystem time evolution to identify "universal" parton density matrices. However, this depends upon a probe that makes the separation self-consistent. Then, the parton distributions, which are density matrices, express some quantities using the rules of classical probability, as if quantum mechanics never existed. It is impossible for all quantities defined in quantum mechanics to be computed with classical probability, which is a good reason to keep thinking about quantum mechanics.

Figure 6. Diagrams classifying entanglement with density matrices. Left: A non-factorizable or entangled system with a generic probe Σ_AB. Right: A separable system conditional upon a special class of probe Σ_AB is also called factorized.

Quantum Tomography
Tomography refers to building up higher dimensional structures from lower dimensional information. Quantum tomography is the process by which measurements of the form < A 1 >, < A 2 > ... < A n > determine the features of a system's density matrix. With enough measurements, it is possible to completely construct the density matrix.
Matrices obey the addition and completeness rules of vectors, so there exists a Hilbert space of matrices. The inner product of two square matrices A, B in the space is (A, B) = Tr(A†B). While this is called the Hilbert-Schmidt inner product, it can be understood as simply flattening the columns of B and rows of A into long lists and computing the ordinary inner product. The observable number <A_1> = tr(ρA_1) can now be interpreted as the "overlap" of the system with the operator, a measure of how much the operator looks like the system (and vice versa). Orthonormal Hermitian operators are defined by (A_j, A_k) = Tr(A_j A_k) = δ_jk. The completeness of an orthonormal basis of operators is expressed with 1 = Σ_k A_k ⊗ A_k. For example, the N² generators of U(N) in standard form make a complete orthonormal basis for N × N Hermitian matrices.
Expanding a density matrix in such a set is achieved with the following equation:

ρ = Σ_k <A_k> A_k, where <A_k> = Tr(ρA_k). (7)

This is quantum tomography. It is clearly presented in the remarkable paper by Fano [15], which directly contradicted Bohr's mysticism of unobservability that was in vogue at that time. Bohr's disciples (and textbook writers) ignored it, and it was almost forgotten. Quantum tomography was more or less re-discovered with the onset of quantum computing, which could not possibly ignore it. By now, quantum tomography has been applied in a variety of domains [16][17][18][19][20][21][22][23].
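The expansion can be sketched for a single spin-1/2 (numpy; using the normalized Pauli matrices as the orthonormal operator basis, and the particular density matrix, are illustrative choices):

```python
import numpy as np

# Orthonormal Hermitian basis for 2x2 matrices: sigma_k / sqrt(2), Tr(A_j A_k) = delta_jk
sig = [np.eye(2),
       np.array([[0, 1], [1, 0]]),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]])]
basis = [s / np.sqrt(2) for s in sig]

# A density matrix to be "measured" (a partly polarized spin-1/2, for illustration)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

# Quantum tomography: each expectation value is an overlap Tr(rho A_k), and the
# density matrix is rebuilt as the sum of basis operators weighted by overlaps.
expectations = [np.trace(rho @ A).real for A in basis]
rho_rebuilt = sum(e * A for e, A in zip(expectations, basis))

print(np.allclose(rho, rho_rebuilt))   # True
```

Four measured numbers, one per basis operator, completely determine the 2 × 2 density matrix; nothing in the reconstruction refers to eigenvalues or projective measurement.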
Quantum tomography revises much of what is usually taught about quantum mechanics. The density matrix is observable in general. The wave function is the eigenvector of a density matrix, and it is observable when there is a pure state. The density matrix and wave function as well as quantum mechanics should not be viewed as lofty abstractions beyond human understanding, but rather as concrete summaries of what has been observed.
The question is, how much will be observed? We cannot observe an infinite amount of data to fully reconstruct an infinite-dimensional object; but that is not necessary. If some operators are not observed, one has no information about them, and they will not appear on the right-hand side in the sum of Equation (7). The left-hand side then represents what was observed, no more and no less. We have not seen this mentioned anywhere, and therefore in Ref. [24], we call it "the mirror trick". It contradicts the claims we have seen that "quantum tomography is exponentially difficult". That assumes that all possible information must be obtained to make a density matrix. When we view a density matrix as a reduction of information to an experimental summary, we escape the illusion of dealing with all possible information.
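A sketch of the partial expansion (numpy; which operators count as "measured" is our illustrative choice): keeping only measured operators in the sum of Equation (7) reproduces every measured expectation value exactly and simply says nothing about the rest:

```python
import numpy as np

sig = [np.eye(2),
       np.array([[0, 1], [1, 0]]),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]])]
basis = [s / np.sqrt(2) for s in sig]
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

# Suppose only the identity and sigma_z were measured:
measured = [basis[0], basis[3]]
rho_partial = sum(np.trace(rho @ A).real * A for A in measured)

# The partial matrix reproduces every measured expectation value exactly...
for A in measured:
    print(np.isclose(np.trace(rho_partial @ A).real,
                     np.trace(rho @ A).real))   # True
# ...while carrying zero information about the unmeasured sigma_x, sigma_y.
print(np.trace(rho_partial @ basis[1]).real)    # 0.0
```

The partial sum summarizes exactly what was observed; nothing exponential is needed unless one insists on reconstructing every component.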
Consider how this is related to inclusive reactions. Feynman was subtle in setting up the parton model in terms of probabilities extracted from experiments. He had no obligation to predict them from first principles. That was necessary because hadrons are non-perturbative objects. It was also conceptually clean because unobservable quantities never entered the discussion. The parton "distributions" of Feynman's approach were made to describe what was observed. Feynman's treatment of quantum mechanics and field theory was alarmingly casual. (Partons were the topic of Feynman's Photon-Hadron Interactions [25]. The prototype was the equivalent photon approximation. His use of quantum mechanics was too casual, leading to a mistake in his treatment of transversity. In writing Ref. [13], the author suggested to his dissertation advisor: "How about we write that Feynman made a mistake?" The advisor responded with: "How about we not write that Feynman made a mistake".) In fact, the parton density matrices are sophisticated devices coming directly from quantum mechanics; however, they continue to be called "distributions".

Difficult Issues
Students learning quantum mechanics are faced with a huge number of interpretational challenges. They undergo a remarkable evolution. Students who believe everything they are told give up. They have been told contradictions inherited from the old quantum theory. Those who ignore what they are told become physicists.
Somehow physicists come to understand most of quantum mechanics as being not so mysterious. There is nothing mysterious about linear algebra. Most agree that the uncertainty relation is a handy fact of Fourier analysis, not a mystery, and so on. By ordering assumptions carefully, one can absolutely understand most of quantum mechanics in a natural, intuitive way, and this is a good thing.
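The Fourier-analysis character of the uncertainty relation is easy to verify numerically. Here is a small sketch (with an arbitrarily chosen width, our own illustration) checking that a Gaussian wavepacket saturates the bound Δx·Δk = 1/2.

```python
import numpy as np

# A narrow function has a broad Fourier transform: compute the width
# product for a normalized Gaussian, which saturates Δx·Δk = 1/2.
x = np.linspace(-40, 40, 4096)
dx = x[1] - x[0]
sigma = 1.7                                  # arbitrary width (assumption)
psi = np.exp(-x**2 / (4 * sigma**2))
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)  # normalize to unit probability

var_x = np.sum(x**2 * np.abs(psi)**2) * dx   # position variance

# Transform to k-space; with this normalization Parseval gives unit norm.
k = 2 * np.pi * np.fft.fftfreq(x.size, d=dx)
dk = k[1] - k[0]
phi = np.fft.fft(psi) * dx / np.sqrt(2 * np.pi)
var_k = np.sum(k**2 * np.abs(phi)**2) * dk   # wavenumber variance

print(np.sqrt(var_x * var_k))  # ≈ 0.5, the Fourier (Heisenberg) bound
```

No operator formalism is needed; the bound emerges from the transform pair alone, which is the point of calling it a fact of Fourier analysis.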
At the same time, we need to remind the reader that serious interpretational challenges remain and are the topic of active research. Some of the topics we discuss have a deeper side that we have not covered. It pays to think about the difficult issues. The most important, in our view, are variations on the so-called Einstein-Podolsky-Rosen (EPR) thought experiment and the Bell/CHSH (Clauser-Horne-Shimony-Holt) inequalities. One can find an unlimited amount of wrong material online about these topics, along with a sizable literature of careful, precise analysis, some coming from deep theorists and some from philosophers. We cannot do justice to this topic. Concisely put, certain event-by-event correlations described by quantum probability have been experimentally observed, although they cannot exist in any classical probabilistic description conceived so far. (It is not about a classical explanation; there is not even a classical description.) Now, we have emphasized that the quantum and classical descriptions differ at an early stage. One then needs to ask whether the debates simply dramatize what should have been recognized earlier. It is much more profound: the non-local character of correlations at points with space-like separation raises the question of whether a "realist" view of the universe is tenable. But who will define "reality"? We are not going to define it, nor settle it, in these proceedings. We do believe the issues are so important that every physicist needs to study them thoroughly and face the conflicts that come up. It is absolutely fascinating.
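As a concrete instance of such correlations, the CHSH combination for the spin singlet can be evaluated directly from the quantum prediction E(a, b) = −cos(a − b). The sketch below (ours, with the standard angle settings) shows the quantum value 2√2 exceeding the classical bound of 2.

```python
import numpy as np

# CHSH for the spin singlet: quantum probability predicts the
# correlation E(a, b) = -cos(a - b) for analyzer angles a, b.
def E(a, b):
    """Singlet correlation between analyzer settings a and b."""
    return -np.cos(a - b)

a, ap = 0.0, np.pi / 2            # Alice's two settings
b, bp = np.pi / 4, 3 * np.pi / 4  # Bob's two settings (standard choice)

S = E(a, b) - E(a, bp) + E(ap, b) + E(ap, bp)
print(abs(S))  # 2*sqrt(2) ≈ 2.828, violating the classical bound |S| <= 2
```

Any classical (local hidden-variable) assignment of outcomes obeys |S| ≤ 2; the measured violation is what rules out the classical descriptions conceived so far.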

How to Perform Quantum Tomography: An Example
In Ref. [24], we provide a concrete procedure for performing quantum tomography with laboratory data. Computer code that produces density matrices from 4-vector inputs is included [26]. It is critical to maintain positivity, which is achieved with a Cholesky representation [27]. The topic of the paper happens to be lepton pairs produced in hadron collisions, but the method can be used for any subject where one has data. Here are the steps:
• The "system" to be explored will have an unknown density matrix ρ_X. Find another density matrix of the same dimension, called the "probe" ρ_probe(k). In a strong interaction laboratory, the symbol k stands for observed 4-momenta. Associate the probe with experimental data by the density matrix Born rule, P(k) = Tr(ρ_probe(k) ρ_X). This replaces and completely bypasses a great deal of unobservable theoretical superstructure.
• As the data k varies, ρ_probe(k) ranges over a certain space of matrices. The observable Tr(ρ_probe(k) ρ_X) tomographically measures the projection of ρ_X onto ρ_probe(k). By keeping track of the projections, the parts of ρ_X that can be observed are determined. There is never a need, and seldom an intention, to make an exhaustive measurement. This is exactly the standard inclusive experimental procedure: one reports what has been measured, summing over everything else.
• After a density matrix has been determined, one can conduct "quantum" experiments with it. For example, the quantum probability of a normalized state |ψ⟩ is P(|ψ⟩, ρ) = ⟨ψ|ρ|ψ⟩. This is not a classical probability, and we have observed peculiar behavior in certain data sets that theory has not predicted. Next, in any particular basis, the off-diagonal elements of ρ convey quantum information that is not equivalent to classical probability. Finally, the von Neumann entropy S = −Tr(ρ log ρ) carries information about entanglement that is not equivalent to classical measures of correlation.
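The steps above can be sketched in a few lines. This is a minimal illustration under our own conventions, not the code of Ref. [26]: a Cholesky factor keeps ρ positive with unit trace, the Born rule supplies probabilities, and the eigenvalues give the von Neumann entropy.

```python
import numpy as np

def density_from_cholesky(L):
    """rho = L L^dagger / Tr(L L^dagger): positive and unit trace by construction."""
    M = L @ L.conj().T
    return M / np.trace(M)

def born_probability(rho_probe, rho_X):
    """Density matrix Born rule: P = Tr(rho_probe rho_X)."""
    return np.real(np.trace(rho_probe @ rho_X))

def von_neumann_entropy(rho):
    """S = -Tr(rho log rho), computed from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]               # drop numerical zeros before the log
    return float(-np.sum(w * np.log(w)))

# An "unknown" system rho_X, here randomly generated for illustration.
rng = np.random.default_rng(0)
L = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho_X = density_from_cholesky(np.tril(L))

# A pure-state probe |0><0|, standing in for one observed configuration.
rho_probe = np.array([[1, 0], [0, 0]], dtype=complex)

p = born_probability(rho_probe, rho_X)
S = von_neumann_entropy(rho_X)
print(0.0 <= p <= 1.0, S >= 0.0)  # True True
```

Scanning the probe over the observed data and recording each projection p is the tomographic step; the Cholesky construction guarantees every candidate ρ_X remains a legitimate density matrix during fitting.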
Thus, the "art" of applied quantum probability adds value to the traditional classical statistics based on making distributions. What can one learn from entanglement and quantum tomography? One can learn everything that is possible to learn.