State Entropy and Differentiation Phenomenon

In the formalism of quantum theory, a state of a system is represented by a density operator. Mathematically, a density operator can be decomposed into a weighted sum of (projection) operators representing an ensemble of pure states (a state distribution), but such a decomposition is not unique. Various pure-state distributions are mathematically described by the same density operator. These distributions are categorized into classical ones, obtained from the Schatten decomposition, and other, non-classical, ones. In this paper, we define a quantity called the state entropy. It can be considered a generalization of the von Neumann entropy that evaluates the diversity of the states constituting a distribution. Further, we apply the state entropy to the analysis of non-classical states created at the intermediate stages of the process of quantum measurement. To do this, we employ the model of differentiation, where a system experiences step-by-step state transitions under the influence of environmental factors. This approach can be used for modeling various natural and mental phenomena: cell differentiation, evolution of biological populations, and decision making.


Introduction
In quantum theory, a state of a system is represented by a density operator. A density operator, e.g., ρ, can be decomposed into a weighted sum of (projection) operators representing "pure states". This linear combination represents a statistical distribution of pure states in an ensemble of systems. However, the same density operator ρ can be decomposed in various ways. Hence, numerous statistical state distributions are mathematically encoded by the same ρ, unless ρ coincides with a pure state.
One class of these statistical distributions, namely, obtained from "Schatten decompositions" of ρ, plays a special role. We remark that, for a density operator with degenerate spectrum, Schatten decomposition is not unique. Any selection of orthogonal bases in eigensubspaces of ρ generates some Schatten decomposition. Each Schatten decomposition corresponds to the statistical distribution of eigenstates of ρ. The crucial point is that these eigenstates may be distinguishable on the basis of measurement of some physical quantity X, because these states are orthogonal to each other. The eigenvalues are interpreted as the frequency probabilities of the measurement outcomes. In this sense, the distribution corresponding to the concrete Schatten decomposition of the density operator ρ is conceptually equivalent to a "classical" or "standard" probability distribution.
On the other hand, other decompositions of the same state ρ are "non-classical" or "non-standard" and represent ensembles of pure states which need not be orthogonal to each other. In Section 2, we discuss these points in more detail.
The main topic of this paper is a quantity that evaluates structural features of the various statistical state distributions encoded in the same density operator ρ. It is well known that the von Neumann entropy [1,2], defined as $-\mathrm{Tr}(\rho \log \rho)$, evaluates how far ρ deviates from a pure state, i.e., the degree of mixture of pure states. In fact, $-\mathrm{Tr}(\rho \log \rho)$ can be rewritten as $-\sum_k \lambda_k \log \lambda_k$, where $\{\lambda_k\}$ are the eigenvalues of ρ. It equals zero if and only if ρ is a pure state. Note that the quantity $-\sum_k \lambda_k \log \lambda_k$ is the Shannon entropy of the classical probability distribution $\{\lambda_k\}$. Thus, the von Neumann entropy evaluates only the classical distribution encoded in ρ, but not the non-classical ones.
In this paper, we define a quantity that reflects more detailed information about the structure of statistical state distributions, especially non-classical ones. Our discussion is fundamental, but straightforward. First, in Section 3, we mention the "differentiation phenomenon" which an ensemble of pure states experiences under a quantum measurement of some physical observable, say X. Each pure state is stochastically differentiated into an eigenstate of X. If the pure states in the statistical ensemble are different, the expectation values of X estimated from each of them are also generally different. In Section 4, we focus on the dispersion of these expectation values and discuss its mathematical property reflecting structural features of the state distribution. Finally, in Section 5, we define a "state entropy" (see Equation (17)). This quantity evaluates the "diversity" of the pure states constituting an ensemble. It increases with the number of pure states and decreases with the similarities among them.
We also point to the interrelation between the state entropy and the von Neumann entropy. It can be briefly described in the following way. If a state distribution, which is encoded in ρ, is classical, then its state entropy is equal to the von Neumann entropy. The state entropies of non-classical state distributions do not exceed the latter; see the inequality of Equation (18). In this sense, the state entropy is a generalization of the von Neumann entropy, which is extensively used in different types of quantum entropies, e.g., conditional, relative and mutual entropies [3][4][5].
State entropy evaluates non-classical statistical state distributions. To stress significance of the notion of state entropy, we explain the theoretical context of state distributions. We note that classical state distributions are always identified after completion of quantum measurements. Therefore, non-classical distributions may exist at the stages before measurements are completed, more generally, in the process of differentiation.
In Section 6, we focus on the model of differentiation that was discussed in Reference [6]. This model describes the accumulation of very small state transitions experienced by the system, and each transition is mathematically represented by a map in the state space, i.e., by a "quantum channel" in the terminology of quantum information theory. A quantum channel denoted by Λ* is given by Equation (28), which is concerned with "environmental elements" around the system. They interact weakly with the system, causing numerous small state transitions step by step, if differentiations of states occur sequentially. The above picture corresponds to an ideal "open quantum system dynamics". To describe the process of differentiation in the system, we consider a more complicated model, assuming differentiations not only of the system state, but also in the elements of the environment. The differentiation in each environmental element is similar to the determination of a "pointer basis" in the theory of quantum decoherence proposed by Zurek [7]. In our approach, the Lindblad equation [8,9], which is a traditional way to describe open quantum system dynamics, is not employed directly.
We believe that the described model can be applicable to a variety of natural and mental phenomena (not only in the micro-world). The process of creation of a diversity of states in an ensemble of systems, which were originally prepared in the same pure state Ψ, through mutual interaction with environmental factors is universal. Originally, the formalism of quantum theory was established to describe microscopic phenomena, but now it is widely used in psychology, decision making, and finance (see ). It is also applied to model behavior of biological systems, especially the functioning of genetic and epigenetic systems (see [39][40][41][42][43][44][45][46]). We plan to explore the novel mathematical apparatus developed in this paper (based on the state entropy) for such applications elsewhere.
In psychology, there has been extensive interest in employing classical entropy for quantifying uncertainty, e.g., in decision making (entropy minimization was used to model decision biases in [47]), categorization (as a way to formalize intuitions in spontaneous grouping [48]), and learning [49,50]. We plan to apply the apparatus of the quantum state entropy to these problems.
As shown in Figure 1, the accumulation of transitions generated by the channel Λ* represents an ideal differentiation process realized in the system. Further, in this modeling, non-classical state distributions in the intermediate stages are identified (see Equations (29)-(31)). We analyze them by means of the state entropy (see Figures 2 and 3).

State Representation by Density Operator
If a physical quantity X is measurable in a system, the frequency probabilities {P(x)} for observed values {x} may be estimated. Then, the quantity X is a "stochastic variable" in terms of probability theory, and the distribution {P(x)} is a "state of the system" which can be analyzed, e.g., by calculating the expectation value E(X) or the dispersion $V(X) = E(X^2) - (E(X))^2$, as is usual in statistics.
The mathematical framework of quantum theory includes probability theory, where the classical concepts of stochastic variable and probability distribution are extended using the notion of "operator". Firstly, a physical quantity is defined in the form of

$$X = \sum_{k=1}^{M} x_k\, |x_k\rangle\langle x_k|. \tag{1}$$

This is a Hermitian operator in the Hilbert space $H = \mathbb{C}^M$ with real eigenvalues $x_k \in \mathbb{R}$ ($k = 1, 2, \cdots, M$) and eigenvectors $\{|x_k\rangle\}$. (A vector $|x\rangle \in H$ whose norm is 1 is called a ket-vector, and $\langle x|$, which is the Hermitian conjugate of $|x\rangle$, i.e., $|x\rangle^{\dagger} = \langle x|$, is called a bra-vector.) The form of Equation (1) implies that after a non-degenerate value $x_k$ is observed, the system under measurement has the definite (pure) state represented by the operator $|x_k\rangle\langle x_k|$. Note that the trace of the product of X and $|x_k\rangle\langle x_k|$ is equal to $x_k$:

$$\mathrm{Tr}(X\, |x_k\rangle\langle x_k|) = \langle x_k| X |x_k\rangle = x_k.$$

For the calculation, the orthogonality of the vectors, i.e., $\langle x_k | x_{k'} \rangle = 0$ if $k \neq k'$, is used. Next, using the pure states $\{|x_k\rangle\langle x_k|\}$, let us construct the operator

$$\rho = \sum_{k=1}^{M} P(x_k)\, |x_k\rangle\langle x_k|, \tag{2}$$

where $\{P(x_k)\}$ corresponds to the frequency probabilities of the observed values $\{x_k\}$, and, in fact, the trace of Xρ is equal to the expected value E(X):

$$\mathrm{Tr}(X\rho) = \sum_{k=1}^{M} x_k P(x_k) = E(X). \tag{3}$$

Mathematically, ρ is a Hermitian matrix satisfying $\mathrm{Tr}(\rho) = 1$ and $\langle x|\rho|x\rangle \geq 0$, $\forall\, |x\rangle \in H = \mathbb{C}^M$. Such an operator is called a density operator and is used for representing a statistical mixture of pure states (a mixed state). A density operator may be given in the form of a Schatten decomposition, i.e., represented as a diagonal matrix:

$$\rho = \sum_{k=1}^{M} \lambda_k\, |\varphi_k\rangle\langle \varphi_k|, \tag{4}$$

where $\{\lambda_k \geq 0\}$ are the eigenvalues of the matrix (the same as the probabilities $\{P(x_k)\}$ of ρ), and $\{|\varphi_k\rangle \in H = \mathbb{C}^M\}$ are the corresponding eigenvectors (the same as $|x_k\rangle$ of ρ). From Equation (4), one can obtain a picture of a statistical mixture of $\{|\varphi_k\rangle\langle\varphi_k|\}$. (This mixture is denoted by $\{\varphi_k, \lambda_k\}$ hereafter.) As can be seen from the construction of ρ in Equation (2), to give a Schatten decomposition is conceptually equivalent to giving a probability distribution of the measurement of some physical quantity. In this sense, the state distribution $\{\varphi_k, \lambda_k\}$ is "classical".

We have to point out here that the decomposition of a density operator is, in general, not unique. By considering various linear combinations of $\{|\varphi_k\rangle\}$, one can find a set of vectors $\{|\Psi_i\rangle, i = 1, \cdots, N\}$ which satisfies

$$\rho = \sum_{i=1}^{N} P_i\, |\Psi_i\rangle\langle \Psi_i|. \tag{5}$$

Note that $N \geq M$ and the vectors $\{|\Psi_i\rangle \in H = \mathbb{C}^M\}$ need not be orthogonal to each other, that is, they need not be eigenstates of a single physical quantity: the state distribution $\{\Psi_i, P_i\}$ is "non-classical". There exist numerous state distributions corresponding to the same density operator, other than $\{\varphi_k, \lambda_k\}$ and $\{\Psi_i, P_i\}$, and they are non-classical.
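The non-uniqueness of the decomposition can be checked numerically. The following is a minimal sketch in plain Python, restricted to real-valued vectors in a two-dimensional space; the helper names (`outer`, `mix`) and the numerical values are illustrative assumptions, not taken from the paper. It builds the same density matrix once from an orthogonal (classical) ensemble and once from two non-orthogonal states.

```python
import math

def outer(v):
    """Projector |v><v| for a real vector v, as a nested list."""
    return [[a * b for b in v] for a in v]

def mix(states, probs):
    """Density matrix sum_i P_i |Psi_i><Psi_i| of an ensemble of real unit vectors."""
    dim = len(states[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for v, p in zip(states, probs):
        proj = outer(v)
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * proj[r][c]
    return rho

# Classical (Schatten) ensemble: orthogonal eigenvectors weighted by the eigenvalues.
lam = [0.7, 0.3]
phi = [[1.0, 0.0], [0.0, 1.0]]
rho_classical = mix(phi, lam)

# Non-classical ensemble: two equally weighted, non-orthogonal vectors
# sqrt(0.7)|phi_1> + sqrt(0.3)|phi_2> and sqrt(0.7)|phi_1> - sqrt(0.3)|phi_2>.
a, b = math.sqrt(0.7), math.sqrt(0.3)
psi = [[a, b], [a, -b]]
rho_nonclassical = mix(psi, [0.5, 0.5])

# Inner product of the two non-classical states (non-zero, so they are not orthogonal).
overlap = sum(x * y for x, y in zip(psi[0], psi[1]))

print(rho_classical)
print(rho_nonclassical)
print(overlap)
```

Both ensembles yield the same matrix up to rounding, although the second one consists of states with a non-zero overlap, so it cannot be read as the outcome distribution of a single observable.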

Differentiation Phenomenon in Quantum Measurement Process
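As shown in Equation (3), for the density operator $\rho = \sum_{k=1}^{M} P(x_k)\, |x_k\rangle\langle x_k|$, the relation $\mathrm{Tr}(X\rho) = E(X)$ holds. In this section, noting the non-uniqueness of the decomposition of a density operator, we mention a meaning of Tr(Xρ) that has not been discussed in the classical theory. Let us consider a different decomposition, $\rho = \sum_{i=1}^{N} P_i\, |\Psi_i\rangle\langle\Psi_i|$, for which

$$\mathrm{Tr}(X\rho) = \sum_{i=1}^{N} P_i\, \langle \Psi_i| X |\Psi_i\rangle = \sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}.$$

Each term, e.g., $\langle X \rangle_{\Psi}$, in the above is expanded as

$$\langle X \rangle_{\Psi} = \langle \Psi| X |\Psi\rangle = \sum_{k=1}^{M} x_k\, |\langle \Psi | x_k \rangle|^2.$$

($\sum_{k=1}^{M} |\langle \Psi | x_k \rangle|^2 = 1$ is satisfied.) The squared inner product $|\langle \Psi | x_k \rangle|^2$ is frequently called the "transition probability". It is related to the problem of measurement that has been discussed in quantum theory. In the concept of quantum measurement, the existence of the measurement device is considered first, because it is assumed that some interaction between the device and the system realizes the measurement of a physical quantity. Due to the interaction, the initial state of the system $|\Psi\rangle\langle\Psi|$ is transferred to one of $\{|x_k\rangle\langle x_k|\}$, and the values $\{x_k\}$ can be read out from the device. If $\langle X \rangle_{\Psi} = \sum_{k=1}^{M} x_k |\langle \Psi | x_k \rangle|^2$ means the average of the outputs, the value of $|\langle \Psi | x_k \rangle|^2$ corresponds to the probability of the transition from $|\Psi\rangle\langle\Psi|$ to $|x_k\rangle\langle x_k|$.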
We interpret the process of quantum measurement as a sort of "differentiation", in which a group of systems in one initial state is divided into groups having different states by means of external or environmental factors. The expected value $\langle X \rangle_{\Psi_i}$ comes from one differentiation denoted by $\Psi_i \to \{x_k\}$, and the value of $\mathrm{Tr}(X\rho) = \sum_{i=1}^{N} P_i \langle X \rangle_{\Psi_i}$ is calculated supposing a statistical mixture of N kinds of differentiations, $\{\Psi_i \to \{x_k\}\}$ $(i = 1, \cdots, N)$.
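As a small numerical illustration of this mixture of differentiations, the sketch below (plain Python, real-valued vectors) computes the transition probabilities and the expectation value for each state of an ensemble and compares their weighted average with a direct evaluation of Tr(Xρ). The observable, the ensemble, and the function names are assumptions chosen for illustration only.

```python
import math

# Assumed observable X = sum_k x_k |x_k><x_k| with |x_k> the standard basis vectors.
x_vals = [1.0, -1.0]

# Assumed ensemble {Psi_i, P_i} of real, non-orthogonal unit vectors.
s = math.sqrt(0.5)
ensemble = [([1.0, 0.0], 0.4), ([s, s], 0.6)]

def transition_probs(psi):
    """Transition probabilities |<Psi|x_k>|^2 to each eigenstate of X."""
    return [c * c for c in psi]

def expectation(psi):
    """<X>_Psi = sum_k x_k |<Psi|x_k>|^2 for one differentiation Psi -> {x_k}."""
    return sum(x * t for x, t in zip(x_vals, transition_probs(psi)))

# Average over the mixture of differentiations: sum_i P_i <X>_{Psi_i}.
mixture_average = sum(p * expectation(psi) for psi, p in ensemble)

# Direct evaluation of Tr(X rho) from the density matrix of the same ensemble.
rho = [[sum(p * psi[r] * psi[c] for psi, p in ensemble) for c in range(2)]
       for r in range(2)]
trace_x_rho = sum(x_vals[k] * rho[k][k] for k in range(2))

print(mixture_average, trace_x_rho)
```

The two numbers agree, while the individual expectation values of the two states differ; that difference is what the next section quantifies.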

Characteristic Quantity of State Distribution
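We assume that a definite state distribution denoted by $\{\Psi_i, P_i\}$ is given, and that the expectation values $\langle X \rangle_{\Psi_i}$ can be calculated. Their dispersion is

$$V(\{\langle X \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}^2 - \Big(\sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}\Big)^2.$$

Below, we prove the inequality

$$V(\{\langle X \rangle_{\Psi_i}, P_i\}) \leq V(\{x_i, P(x_i)\}), \tag{9}$$

where $V(\{x_i, P(x_i)\})$ is the dispersion of the observable X, i.e., its dispersion with respect to the probability distribution encoded in the Schatten decomposition (see Equation (2)), corresponding to the spectral decomposition of the observable X (see Equation (1)). Thus, the probability distribution corresponding to the spectral decomposition of X maximizes the dispersion with respect to the decompositions in Equation (5). The inequality for dispersions can be interpreted by the theory of weak measurements.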
The quantities $\langle X \rangle_{\Psi_i}$ can be interpreted as weak values. In this framework, the inequality in Equation (9) simply means that the dispersion of a weak measurement is always majorized by the dispersion of the "maximally disturbing measurement", represented by a Hermitian operator. At the same time, we are aware that the interpretation of weak values is a complex foundational problem in itself.
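To prove the inequality in Equation (9), let us consider the first term given by

$$D = \sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}^2. \tag{10}$$

Let us note the following inequality

$$\sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}^2 \leq \sum_{k=1}^{M} P(x_k)\, x_k^2, \tag{11}$$

which follows from the convexity of $y = x^2$, because

$$\langle X \rangle_{\Psi_i}^2 = \Big(\sum_{k=1}^{M} x_k\, |\langle \Psi_i | x_k \rangle|^2\Big)^2 \leq \sum_{k=1}^{M} x_k^2\, |\langle \Psi_i | x_k \rangle|^2$$

and since

$$\sum_{i=1}^{N} P_i\, |\langle \Psi_i | x_k \rangle|^2 = \langle x_k | \rho | x_k \rangle = P(x_k).$$

The second terms of the two dispersions in Equation (9) are both equal to $(\mathrm{Tr}(X\rho))^2$, so Equation (11) implies Equation (9). Such an inequality can be derived with the use of other convex functions, not limited to $y = x^2$. Even if the dispersion V is defined as

$$V_f(\{\langle X \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, f(\langle X \rangle_{\Psi_i}) - f\Big(\sum_{i=1}^{N} P_i\, \langle X \rangle_{\Psi_i}\Big)$$

using another convex function, e.g., f(x), the result holds true, that is, the inequality

$$V_f(\{\langle X \rangle_{\Psi_i}, P_i\}) \leq V_f(\{x_i, P(x_i)\})$$

is satisfied. We redefine the first term D of Equation (10) as

$$D(\{\langle X \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, f(\langle X \rangle_{\Psi_i}). \tag{16}$$

As discussed in the next section, we believe that, under proper choices of X and f(x), this D itself becomes a quantity that captures structural features of $\{\Psi_i, P_i\}$.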
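The dispersion inequality can be checked on a small example. The sketch below is plain Python with real-valued vectors and makes two assumptions: the generalized dispersion is taken in the form "average of f minus f of the average", and the observable, ensemble, and the second convex function (the exponential) are illustrative choices. It compares the dispersion of the expectation values over an ensemble with the dispersion over the spectral decomposition of X.

```python
import math

def dispersion(values, probs, f=lambda x: x * x):
    """Generalized dispersion sum_i p_i f(v_i) - f(sum_i p_i v_i) for a convex f."""
    mean = sum(p * v for v, p in zip(values, probs))
    return sum(p * f(v) for v, p in zip(values, probs)) - f(mean)

# Assumed observable (eigenvalues in the standard basis) and assumed ensemble.
x_vals = [1.0, -1.0]
s = math.sqrt(0.5)
states = [[1.0, 0.0], [s, s]]
weights = [0.4, 0.6]

# Expectation values <X>_{Psi_i} = sum_k x_k |<Psi_i|x_k>|^2.
expectations = [sum(x * c * c for x, c in zip(x_vals, psi)) for psi in states]

# Outcome probabilities P(x_k) = <x_k|rho|x_k> = sum_i P_i |<Psi_i|x_k>|^2.
outcome_probs = [sum(p * psi[k] ** 2 for psi, p in zip(states, weights))
                 for k in range(2)]

# Ordinary dispersion (f(x) = x^2) over the ensemble and over the spectrum of X.
v_ensemble = dispersion(expectations, weights)
v_spectral = dispersion(x_vals, outcome_probs)

# Same comparison with another convex function, here f(x) = exp(x).
vf_ensemble = dispersion(expectations, weights, math.exp)
vf_spectral = dispersion(x_vals, outcome_probs, math.exp)

print(v_ensemble, v_spectral)
print(vf_ensemble, vf_spectral)
```

In both cases the ensemble value does not exceed the spectral value, in line with the convexity argument.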

State Entropy
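In this section, we consider D of Equation (16) in the case of X = ρ and f(x) = − log x:

$$D(\{\langle \rho \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, (-\log \langle \rho \rangle_{\Psi_i}). \tag{17}$$

Here, we fix the state distribution $\{\Psi_i, P_i\}$ for the density operator ρ, and y = − log x is our choice of a convex function. What does the above D tell us about $\{\Psi_i, P_i\}$? To discuss this question, we first focus on the term

$$\langle \rho \rangle_{\Psi_i} = \langle \Psi_i | \rho | \Psi_i \rangle = \sum_{j=1}^{N} P_j\, |\langle \Psi_i | \Psi_j \rangle|^2.$$

Based on this, we interpret $\langle \rho \rangle_{\Psi_i}$ as a degree of "similarity" of $|\Psi_i\rangle\langle\Psi_i|$ and ρ. This interpretation of the quantity $\langle \rho \rangle_{\Psi_i}$ as the degree of similarity can also be illustrated by the representation of the operators $|\Psi_i\rangle\langle\Psi_i|$ and ρ as vectors in the Hilbert space of Hilbert-Schmidt operators endowed with the scalar product $\langle A | B \rangle = \mathrm{Tr}(A^{\dagger} B)$. We start with the remark that $\langle A | B \rangle = \cos\theta_{AB}\, \|A\|_2 \|B\|_2$, where $\|\cdot\|_2$ is the Hilbert-Schmidt norm; we also remark that, for a self-adjoint operator A, $\|A\|_2 = \sqrt{\mathrm{Tr}A^2}$. In particular, the norm of any pure state, i.e., of any rank-one projector, is equal to one. We have

$$\langle \rho \rangle_{\Psi_i} = \mathrm{Tr}(|\Psi_i\rangle\langle\Psi_i|\, \rho) = \big\langle\, |\Psi_i\rangle\langle\Psi_i|\, \big|\, \rho\, \big\rangle.$$

Hence,

$$\langle \rho \rangle_{\Psi_i} = \cos\theta\, \sqrt{\mathrm{Tr}\rho^2},$$

where θ is the angle between the vectors $|\Psi_i\rangle\langle\Psi_i|$ and ρ. The scaling coefficient $\sqrt{\mathrm{Tr}\rho^2}$ is the square root of the purity $\mathrm{Tr}\rho^2$ of the state ρ. Further, noting that y = − log x is a monotonically decreasing function, we interpret $-\log \langle \rho \rangle_{\Psi_i} = -\log\cos\theta - \tfrac{1}{2}\log \mathrm{Tr}\rho^2$ as a degree of orthogonality between the vectors $|\Psi_i\rangle\langle\Psi_i|$ and ρ. We note that the following inequality is satisfied: $-\log P_i \geq -\log \langle \rho \rangle_{\Psi_i} \geq 0$.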
In general, any convex and monotonically decreasing function is allowed as f(x). The average of the orthogonality $-\log \langle \rho \rangle_{\Psi_i}$, i.e., $\sum_{i=1}^{N} P_i (-\log \langle \rho \rangle_{\Psi_i})$, corresponds to D of Equation (17). Generally, the value of D tends to increase with the number of states and to decrease with the similarities among them. That is why we call the value D "state diversity" or "state entropy".
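The following inequality shows the significance of the state entropy D:

$$\sum_{k=1}^{M} \lambda_k (-\log \lambda_k) \;\geq\; \sum_{i=1}^{N} P_i (-\log \langle \rho \rangle_{\Psi_i}) \;\geq\; -\log(\mathrm{Tr}(\rho^2)). \tag{18}$$

It can be derived with the use of the convexity of y = − log x, in a similar way to the derivation of Equation (11).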
In the above form, $\{\lambda_k\}$ are the eigenvalues of $\rho = \sum_{k=1}^{M} \lambda_k |\varphi_k\rangle\langle\varphi_k|$. The term $\sum_{k=1}^{M} \lambda_k (-\log \lambda_k)$ on the left-hand side corresponds to the von Neumann entropy given by $-\mathrm{Tr}(\rho \log \rho)$. Further, $\mathrm{Tr}(\rho^2)$ on the right-hand side is a well-known quantity in quantum theory, too. The von Neumann entropy $-\mathrm{Tr}(\rho \log \rho)$ and $\mathrm{Tr}(\rho^2)$ are frequently used to evaluate the degree of "mixing" in ρ: if ρ is pure, then $-\mathrm{Tr}(\rho \log \rho) = 0$ and $\mathrm{Tr}(\rho^2) = 1$. If ρ is a mixed state, $-\mathrm{Tr}(\rho \log \rho) > 0$ and $\mathrm{Tr}(\rho^2) < 1$, and especially, when $\lambda_1 = \lambda_2 = \cdots = \lambda_M = 1/M$, $-\mathrm{Tr}(\rho \log \rho)$ takes the maximum value of log M, and $\mathrm{Tr}(\rho^2)$ takes the minimum value of 1/M. Mathematically, these two quantities satisfy the relation $-\mathrm{Tr}(\rho \log \rho) \geq -\log(\mathrm{Tr}(\rho^2))$. The inequality in Equation (18) implies that the intermediate values between these two correspond to other kinds of state entropy, which can be estimated for various non-classical state distributions reducing to ρ. In other words, the well-known $-\mathrm{Tr}(\rho \log \rho)$ and $-\log(\mathrm{Tr}(\rho^2))$ are newly interpreted as the maximum and minimum values of the state entropy.
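The ordering of the three quantities can be illustrated numerically. The following minimal sketch uses plain Python and real 2×2 matrices; the ensemble and the helper names (`expect`, `state_entropy`, `eigen_2x2`) are assumptions made for illustration. It evaluates the state entropy for a non-classical ensemble and for the Schatten (classical) ensemble of the same density matrix, together with the von Neumann entropy and −log Tr(ρ²).

```python
import math

def expect(rho, psi):
    """<rho>_Psi = <Psi|rho|Psi> for a real symmetric rho and a real unit vector Psi."""
    n = len(psi)
    return sum(psi[r] * rho[r][c] * psi[c] for r in range(n) for c in range(n))

def state_entropy(rho, ensemble):
    """D = sum_i P_i (-log <rho>_{Psi_i}) for an ensemble [(Psi_i, P_i), ...]."""
    return sum(-p * math.log(expect(rho, psi)) for psi, p in ensemble)

def eigen_2x2(rho):
    """Eigenpairs of a real symmetric 2x2 matrix whose off-diagonal entry is non-zero."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    pairs = []
    for lam in ((a + d + root) / 2, (a + d - root) / 2):
        v = [b, lam - a]                      # solves (rho - lam) v = 0
        norm = math.hypot(v[0], v[1])
        pairs.append((lam, [v[0] / norm, v[1] / norm]))
    return pairs

# Assumed non-classical ensemble: two non-orthogonal states.
s = math.sqrt(0.5)
nonclassical = [([1.0, 0.0], 0.4), ([s, s], 0.6)]

# rho = sum_i P_i |Psi_i><Psi_i|
rho = [[sum(p * psi[r] * psi[c] for psi, p in nonclassical) for c in range(2)]
       for r in range(2)]

# Classical ensemble from the Schatten decomposition of the same rho.
eig = eigen_2x2(rho)
classical = [(vec, lam) for lam, vec in eig]

von_neumann = sum(-lam * math.log(lam) for lam, _ in eig)
purity = sum(rho[r][c] * rho[c][r] for r in range(2) for c in range(2))
lower_bound = -math.log(purity)

d_nonclassical = state_entropy(rho, nonclassical)
d_classical = state_entropy(rho, classical)

print(von_neumann, d_classical, d_nonclassical, lower_bound)
```

For this example the classical ensemble reproduces the von Neumann entropy, and the non-classical ensemble gives a value between the two bounds.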
Note that the state entropy D is different from the generalized quantum entropic measures that have been proposed until now. This point is discussed in Appendix A.

Model of Differentiation and Calculation of State Entropy
As mentioned in Section 2, a Schatten decomposition of a density operator such as Equation (2) represents a probabilistic distribution of orthogonal pure states. Such an ensemble of states is postulated to be the resulting state of the system after the measurement of some physical quantity, whose eigenstates are orthogonal. On the other hand, using another decomposition of the density operator, a mixture of non-orthogonal pure states may be obtained, and we call such a mixture non-classical. In Section 3, we pointed out that the essence of quantum measurement is state differentiation caused by external or environmental factors. If the state distribution corresponding to a Schatten decomposition is the goal of differentiation, various non-classical ones will appear in intermediate stages before reaching the goal. Below, we model this mechanism as proposed in [6]. This model mathematically explains what state distributions may occur in the differentiation process. Our aim in this section is to evaluate the structural features of these state distributions by using the state entropy defined in Section 5.
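Let us consider a typical state transition caused by a quantum measurement, which is denoted by

$$\Psi \to \{\psi_k, P_k\}.$$

Here, Ψ means an initial state of the system represented by $|\Psi\rangle\langle\Psi|$, and $\{\psi_k, P_k\}$ means a distribution where the states $\{|\psi_k\rangle\langle\psi_k|\}$ exist with probabilities $\{P_k\}$. The vectors $\{|\psi_k\rangle\}$ correspond to eigenstates of some physical quantity defined in the Hilbert space $H = \mathbb{C}^M$, and the initial vector $|\Psi\rangle$ is expanded as

$$|\Psi\rangle = \sum_{k=1}^{M} \sqrt{P_k}\, |\psi_k\rangle,$$

where $\sqrt{P_k}$ means a complex number satisfying $|\sqrt{P_k}|^2 = P_k$, that is,

$$|\Psi\rangle\langle\Psi| = \sum_{k=1}^{M} P_k\, |\psi_k\rangle\langle\psi_k| + \sum_{k \neq k'} \sqrt{P_k}\, \overline{\sqrt{P_{k'}}}\, |\psi_k\rangle\langle\psi_{k'}|.$$

The first term $\sum_{k=1}^{M} P_k |\psi_k\rangle\langle\psi_k|$ corresponds to the distribution $\{\psi_k, P_k\}$, and, therefore, the vanishing of the second term, the process called "decoherence" in quantum theory, means the accomplishment of the measurement. The relation of Ψ and $\{\psi_k, P_k\}$ is represented as

$$\sum_{k=1}^{M} M_k\, |\Psi\rangle\langle\Psi|\, M_k = \sum_{k=1}^{M} P_k\, |\psi_k\rangle\langle\psi_k| \tag{20}$$

with the use of the projection operators $M_k = |\psi_k\rangle\langle\psi_k|$. (The transition probability $|\langle\psi_k|\Psi\rangle|^2$ is equal to $P_k$.) If the above transition is interpreted as a sort of differentiation, its development, i.e., what state distributions occur between Ψ and $\{\psi_k, P_k\}$, becomes a crucial concern. The model of differentiation, which was proposed in [6], presents the picture that the initial state Ψ is differentiated to $\{\psi_k, P_k\}$ step by step through many state transitions. Each state transition is described with the use of a map from state to state, which is denoted by $\Lambda^*$. The map is called a "quantum channel" in quantum information theory. The differentiation is then a chain of state transitions,

$$|\Psi\rangle\langle\Psi| \to \Lambda^*(|\Psi\rangle\langle\Psi|) \to \Lambda^*(\Lambda^*(|\Psi\rangle\langle\Psi|)) \to \cdots,$$

which is required to reach $\sum_{k=1}^{M} P_k |\psi_k\rangle\langle\psi_k|$ after many steps. A channel $\Lambda^*$ is to be defined based on the following picture. There exist numerous environmental elements around the system. Initially, the states of the system and of these elements are given independently. Let $|\Phi\rangle\langle\Phi|$ be the initial state of one element, which is defined on a space $K_1 = \mathbb{C}^L$. The initial compound state of the system and the element, on the space $H \otimes K_1$, is factorized as $|\Psi\rangle\langle\Psi| \otimes |\Phi\rangle\langle\Phi|$. At the next step, the states of the system and the element become non-separable. Such a compound state is generally defined as $U (|\Psi\rangle\langle\Psi| \otimes |\Phi\rangle\langle\Phi|) U^*$, using a unitary operator U on $H \otimes K_1$.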
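The unitary transformation U specifies a correlation generated between the system and the element, and, in the modeling, the following form is assumed:

$$U = \sum_{k=1}^{M} |\psi_k\rangle\langle\psi_k| \otimes u_k,$$

where $u_k$ is a unitary on $K_1$. Actually, by this U, the vector $|\Psi\rangle \otimes |\Phi\rangle$ is transformed to

$$U (|\Psi\rangle \otimes |\Phi\rangle) = \sum_{k=1}^{M} \sqrt{P_k}\, |\psi_k\rangle \otimes |\Phi_k\rangle,$$

where $|\Phi_k\rangle = u_k |\Phi\rangle$. Then, the states of the system and the element are "entangled", since the above form cannot be factorized into two vectors independently defined on H and $K_1$ if $|\Phi_k\rangle \neq |\Phi_{k'}\rangle$ for some $k \neq k'$. A compound state at the third step is described as

$$\sum_{j=1}^{L} (I \otimes M_j)\, U (|\Psi\rangle\langle\Psi| \otimes |\Phi\rangle\langle\Phi|) U^* (I \otimes M_j). \tag{23}$$

Here, $\{M_j\}$ are projection operators corresponding to the basis set of $K_1 = \mathbb{C}^L$, say $\{|\varphi_j\rangle\}$. As can be seen from Equation (20), a state transition given by a projection operator mathematically represents the accomplishment of differentiation. The operation of $\{M_j\}$ means that the state of the element is eventually differentiated into $\{|\varphi_j\rangle\langle\varphi_j|\}$. Note that the states of the system and the element are correlated at the second step. Thus, the state of the system is affected by the differentiation. Actually, Equation (23) may be rewritten as

$$\sum_{j=1}^{L} E_j\, |\Psi\rangle\langle\Psi|\, E_j^* \otimes |\varphi_j\rangle\langle\varphi_j|$$

by introducing the operator

$$E_j = \sum_{k=1}^{M} \langle \varphi_j | \Phi_k \rangle\, |\psi_k\rangle\langle\psi_k|. \tag{25}$$

The above form implies that the state of the element of the environment gets transformed to $|\varphi_j\rangle\langle\varphi_j|$ with probability

$$P_j = \langle \Psi | E_j^* E_j | \Psi \rangle,$$

and at the same time the state of the system transits to

$$\frac{E_j\, |\Psi\rangle\langle\Psi|\, E_j^*}{\langle \Psi | E_j^* E_j | \Psi \rangle}.$$

The operator $E_j$ introduced in Equation (25) is called a Kraus operator and satisfies $\sum_{j=1}^{L} E_j^* E_j = I$. (In general, a set of Hermitian positive operators $\{F_i\}$ with $\sum_{i=1}^{N} F_i = I$ is called a positive-operator valued measure (POVM).) With the use of $\{E_j\}$, a quantum channel $\Lambda^*$ is defined:

$$\Lambda^*(\rho) = \sum_{j=1}^{L} E_j\, \rho\, E_j^*. \tag{28}$$

$\Lambda^*(|\Psi\rangle\langle\Psi|) = \sum_{j=1}^{L} E_j |\Psi\rangle\langle\Psi| E_j^*$ means the density operator obtained from the partial trace of the compound state, $\mathrm{Tr}_{K_1}\big(\sum_{j=1}^{L} E_j |\Psi\rangle\langle\Psi| E_j^* \otimes |\varphi_j\rangle\langle\varphi_j|\big)$. The other environmental elements are defined in Hilbert spaces denoted by $K_2, K_3, \cdots$.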
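If they interact with the system in a similar way, $\rho(n) = \Lambda^*(\rho(n-1)) = \cdots = \Lambda^*(\Lambda^*(\cdots \Lambda^*(|\Psi\rangle\langle\Psi|) \cdots))$ is defined as the density operator of the system that is obtained after interacting with n environmental elements. From the definition of $\Lambda^*$ (see Equation (28)), this ρ(n) is decomposed as

$$\rho(n) = \sum_{j_1, j_2, \cdots, j_n} P_{\{j_1, j_2, \cdots, j_n\}}\, |\Psi_{\{j_1, j_2, \cdots, j_n\}}\rangle\langle\Psi_{\{j_1, j_2, \cdots, j_n\}}|, \tag{29}$$

where, with $E_{\{j_1, j_2, \cdots, j_n\}} = E_{j_n} \cdots E_{j_2} E_{j_1}$,

$$P_{\{j_1, j_2, \cdots, j_n\}} = \langle \Psi |\, E^*_{\{j_1, j_2, \cdots, j_n\}} E_{\{j_1, j_2, \cdots, j_n\}}\, | \Psi \rangle, \tag{30}$$

and

$$|\Psi_{\{j_1, j_2, \cdots, j_n\}}\rangle = \frac{1}{\sqrt{P_{\{j_1, j_2, \cdots, j_n\}}}}\, E_{\{j_1, j_2, \cdots, j_n\}} |\Psi\rangle. \tag{31}$$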
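The step-by-step differentiation can be simulated for a two-level system and two-level environmental elements. The sketch below is plain Python with real amplitudes and rests on several assumptions that are not taken from the paper: the unitaries u_k are plane rotations by the angles in `theta`, every element starts in |Φ⟩ = (1, 0) and is projected onto the standard basis, and all names are hypothetical. Under these assumptions the Kraus operators are diagonal in the basis {|ψ_k⟩}, and the ensemble after n elements is enumerated over all outcome sequences.

```python
import itertools
import math

# Assumed parameters: P_k = (0.7, 0.3); rotation angles for u_1 and u_2 are illustrative.
P = [0.7, 0.3]
theta = [0.3, 1.1]
psi0 = [math.sqrt(p) for p in P]          # |Psi> in the basis {|psi_k>}

# Kraus operators E_j = sum_k <phi_j|Phi_k> |psi_k><psi_k|, stored by their diagonals.
E = [[math.cos(t) for t in theta],        # j = 1: <phi_1|Phi_k> = cos(theta_k)
     [math.sin(t) for t in theta]]        # j = 2: <phi_2|Phi_k> = sin(theta_k)

def ensemble(n):
    """States and probabilities for all outcome sequences of n environmental elements."""
    out = []
    for seq in itertools.product(range(2), repeat=n):
        vec = list(psi0)
        for j in seq:
            vec = [E[j][k] * vec[k] for k in range(2)]
        prob = sum(c * c for c in vec)
        if prob > 1e-15:                  # skip branches that never occur
            norm = math.sqrt(prob)
            out.append(([c / norm for c in vec], prob))
    return out

def density(ens):
    """rho(n) = sum over sequences of P |Psi><Psi|."""
    return [[sum(p * v[r] * v[c] for v, p in ens) for c in range(2)] for r in range(2)]

def state_entropy(ens):
    """D = sum P (-log <Psi|rho(n)|Psi>) over the ensemble."""
    rho = density(ens)
    total = 0.0
    for v, p in ens:
        similarity = sum(v[r] * rho[r][c] * v[c] for r in range(2) for c in range(2))
        total -= p * math.log(similarity)
    return total

def von_neumann(rho):
    """Von Neumann entropy of a real symmetric 2x2 density matrix."""
    a, b, d = rho[0][0], rho[0][1], rho[1][1]
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    lams = [(a + d + root) / 2, (a + d - root) / 2]
    return sum(-l * math.log(l) for l in lams if l > 1e-15)

def purity(rho):
    return sum(rho[r][c] * rho[c][r] for r in range(2) for c in range(2))

for n in (0, 1, 2, 4, 8):
    ens = ensemble(n)
    rho_n = density(ens)
    print(n, von_neumann(rho_n), state_entropy(ens), -math.log(purity(rho_n)))
```

In this sketch the diagonal of ρ(n) stays at (0.7, 0.3) while the off-diagonal element shrinks by the factor cos(θ₁ − θ₂) per element, so the printed triples can be compared against the bounds of Equation (18) at each step.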

[Figure 1. Axis label: Population Rate.]

[...], $|\langle \psi_1 | \Psi_{\{i_1, i_2, \cdots, i_n\}} \rangle|^2$ takes a value near 1 (0). With increasing n, the state distribution approaches $\{\{\psi_1, \psi_2\}, \{0.7, 0.3\}\}$. Figure 2 shows the behavior of the state entropy D, the von Neumann entropy and $-\log(\mathrm{Tr}(\rho^2))$ for the distribution $\{\Psi_{\{i_1, i_2, \cdots, i_n\}}, P_{\{i_1, i_2, \cdots, i_n\}}\}$, which are calculated with the same setting of parameters. One can directly see that the inequality of Equation (18) is satisfied at any n. Note that the state entropy D takes values close to the von Neumann entropy at very large n. In fact, as shown in Figure 3, the difference between the von Neumann entropy and the state entropy is noticeable mostly at earlier stages. These results imply that the state distributions appearing in the differentiation process are non-classical in general.

Conclusions
The state entropy is a truly non-classical quantity because it depends not only on statistical probabilities, but also on similarities among states. The differentiation phenomenon is also non-classical, because it is interpreted as dynamics of the probabilities and similarities. Definition of the state entropy and modeling of the differentiation process are impossible in the framework of classical probability theory.
We believe that the evaluation of an ensemble of systems by the state entropy fits empirical reasoning: no matter how many systems are in the ensemble, we may not recognize high diversity if we know that their states are not very different. Further, we believe that, in various areas of nature, the dynamics of character change in a population of individuals is very much like the differentiation phenomenon. This strengthens the prospects of the quantum-like formalism.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. State Entropy and other Quantum Entropies
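In this paper, we propose the state entropy,

$$D(\{\langle \rho \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, (-\log \langle \rho \rangle_{\Psi_i}).$$

More generally, it is defined as

$$D(\{\langle \rho \rangle_{\Psi_i}, P_i\}) = \sum_{i=1}^{N} P_i\, f(\langle \rho \rangle_{\Psi_i})$$

by using a convex and monotone decreasing function f(x). Mathematically, $D(\{\langle \rho \rangle_{\Psi_i}, P_i\})$ depends on the way of decomposition of ρ. As discussed in Section 5, for $\rho = \sum_{i=1}^{N} P_i |\Psi_i\rangle\langle\Psi_i|$, $f(\mathrm{Tr}(\rho\, |\Psi_i\rangle\langle\Psi_i|)) = f(\langle \rho \rangle_{\Psi_i})$ is interpreted as the degree of orthogonality between $|\Psi_i\rangle\langle\Psi_i|$ and ρ. In this sense, D evaluates a sort of diversity in the state distribution $\{\Psi_i, P_i\}$, and it takes its maximal value, equal to $-\mathrm{Tr}(\rho \log \rho)$, for the Schatten decomposition (see Equation (18)).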
On the other hand, there are many mathematical extensions of the von Neumann entropy. As examples, the quantum version of the Rényi entropy [51],

$$S^{R}_{\alpha}(\rho) = \frac{1}{1-\alpha} \log \mathrm{Tr}(\rho^{\alpha}),$$

and that of the Tsallis entropy [52],

$$S^{T}_{\alpha}(\rho) = \frac{\mathrm{Tr}(\rho^{\alpha}) - 1}{1-\alpha},$$

are well known. These entropies approach the von Neumann entropy $S(\rho) = -\mathrm{Tr}(\rho \log \rho)$ in the limit α → 1. The index α, which is called the entropic parameter, is nonnegative and α ≠ 1. Further, such generalized entropies are uniformly represented in the form of the quantum version of the Salicrú entropy [53,54], which is given by $H_{(h,\phi)}(\rho) = h(\mathrm{Tr}\,\phi(\rho))$, where the functions $h : \mathbb{R} \to \mathbb{R}$ and $\phi : [0, 1] \to \mathbb{R}$ satisfy either of the following conditions: (i) h is increasing and φ is concave; or (ii) h is decreasing and φ is convex. In the form of the Rényi entropy, $h(x) = \frac{\log(x)}{1-\alpha}$ and $\phi(x) = x^{\alpha}$, and in the form of the Tsallis entropy, $h(x) = \frac{x-1}{1-\alpha}$ and $\phi(x) = x^{\alpha}$. Of course, the von Neumann entropy is also recovered with $h(x) = x$ and $\phi(x) = -x \log x$.
Here, we have to point out that $H_{(h,\phi)}(\rho)$ is practically calculated as

$$H_{(h,\phi)}(\rho) = h\Big(\sum_{k} \phi(\lambda_k)\Big)$$

by using the eigenvalues $\{\lambda_k\}$ of ρ, that is, any quantum entropic measure that reduces to $H_{(h,\phi)}(\rho)$ does not depend on the way of decomposition of ρ. The state entropy is different in this respect. Actually, it is clear that $D(\{\langle \rho \rangle_{\Psi_i}, P_i\})$ is not recovered in the form of $H_{(h,\phi)}(\rho)$.
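The eigenvalue-only character of these generalized entropies can be made concrete with a short sketch. The code below is plain Python; the function names and the eigenvalues (0.7, 0.3) are illustrative assumptions. It evaluates the Rényi and Tsallis entropies from the spectrum alone, checks their behavior near α = 1, and recovers the Rényi entropy from the Salicrú form.

```python
import math

def von_neumann(lams):
    """S(rho) = -sum_k lambda_k log lambda_k from the eigenvalues of rho."""
    return sum(-l * math.log(l) for l in lams if l > 0)

def renyi(lams, alpha):
    """Quantum Renyi entropy log(Tr rho^alpha) / (1 - alpha), for alpha != 1."""
    return math.log(sum(l ** alpha for l in lams if l > 0)) / (1 - alpha)

def tsallis(lams, alpha):
    """Quantum Tsallis entropy (Tr rho^alpha - 1) / (1 - alpha), for alpha != 1."""
    return (sum(l ** alpha for l in lams if l > 0) - 1) / (1 - alpha)

def salicru(lams, h, phi):
    """Salicru form h(Tr phi(rho)) = h(sum_k phi(lambda_k))."""
    return h(sum(phi(l) for l in lams))

eigenvalues = [0.7, 0.3]
s_vn = von_neumann(eigenvalues)

# Both generalized entropies come close to the von Neumann entropy near alpha = 1.
renyi_near_one = renyi(eigenvalues, 1.0001)
tsallis_near_one = tsallis(eigenvalues, 1.0001)

# Renyi entropy of order 2 written in the Salicru form with h(x) = log(x)/(1 - alpha).
alpha = 2.0
renyi_2 = salicru(eigenvalues,
                  lambda x: math.log(x) / (1 - alpha),
                  lambda x: x ** alpha)

print(s_vn, renyi_near_one, tsallis_near_one, renyi_2)
```

None of these functions takes an ensemble as input: they see only the eigenvalues, whereas the state entropy needs the states and weights of a particular decomposition. Note also that the Rényi entropy of order 2 equals −log Tr(ρ²), the lower bound in Equation (18).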