1. Introduction
In an interesting article, D. Šafránek and J. Thingna introduce the concept of entropy for quantum instruments [1]. Various important theorems are proved and applications are given. In quantum computation and information theory, one of the most important problems is to determine an unknown state by applying measurements on the system [2,3,4,5]. Entropy provides a quantification of the amount of information available to solve this so-called state discrimination problem [6,7,8]. In this article, we first define the entropy for the most basic measurement, namely a quantum effect $a$ [2,3,9,10]. If $\rho$ is a state, we define the $\rho$-entropy $S_\rho(a)$, which gives the amount of uncertainty (or randomness) that a measurement of $a$ provides about $\rho$. The smaller $S_\rho(a)$ is, the more information a measurement of $a$ provides about $\rho$. In Section 2, we give bounds on $S_\rho(a)$ and show that if $a$ is an effect then $S_\rho(a)\le\ln n$. We then prove a result concerning convex mixtures of effects. We also consider sequential products of effects and their $\rho$-entropies.
In Section 3, we employ $S_\rho(a)$ to define the entropy $S_\rho(A)$ for an observable $A$. Then $S_\rho(A)$ gives the uncertainty that a measurement of $A$ provides about $\rho$. We show that $S_\rho(A)$ directly gives the $\rho$-entropy $S_\rho(\mathcal{I})$ for an instrument $\mathcal{I}$. We establish bounds for $S_\rho(A)$ and characterize when these bounds are obtained. These give simplified proofs of results given in [1,5,11]. We also consider $\rho$-entropies for measurement models, sequential products of observables and coarse-grainings of observables. Various examples that illustrate the theory are provided. In this work, all Hilbert spaces are assumed to be finite-dimensional. Although this is a restriction, the work applies to quantum computation and information theory [2,3,9,10].
  2. Entropy for Effects
Let $H$ be a finite-dimensional complex Hilbert space with dimension $n$. We denote the set of linear operators on $H$ by $\mathcal{L}(H)$ and the set of states on $H$ by $\mathcal{S}(H)$. If $\rho\in\mathcal{S}(H)$ with nonzero eigenvalues $\lambda_1,\lambda_2,\ldots,\lambda_m$ including multiplicities, the von Neumann entropy of $\rho$ is [4,6,7,8]
$$S(\rho)=-\sum_{i=1}^m\lambda_i\ln\lambda_i$$
We consider $S(\rho)$ as a measure of the randomness or uncertainty of $\rho$, and smaller values of $S(\rho)$ indicate more information content. For example, $\rho$ is the completely random state $\rho=I/n$, where $I$ is the identity operator, if and only if $S(\rho)=\ln n$, and $\rho$ is a pure state if and only if $S(\rho)=0$. Moreover, it is well known that $0\le S(\rho)\le\ln n$ for all $\rho\in\mathcal{S}(H)$. The following properties of $S$ are well known [4,6,8]:
$$\sum_i\lambda_iS(\rho_i)\le S\Big(\sum_i\lambda_i\rho_i\Big)\le\sum_i\lambda_iS(\rho_i)-\sum_i\lambda_i\ln\lambda_i$$
where $\rho_i\in\mathcal{S}(H)$ with $\lambda_i\in(0,1]$, $\sum_i\lambda_i=1$.
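The two extreme cases just described can be checked numerically; a minimal sketch (the function and variable names are ours, not from the text):

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -sum_i lambda_i ln(lambda_i) over the nonzero eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

n = 4
random_state = np.eye(n) / n              # completely random state I/n
v = np.zeros(n); v[0] = 1.0
pure_state = np.outer(v, v)               # a pure state (rank-one projection)

# The completely random state attains the maximum ln n;
# a pure state attains the minimum 0.
print(von_neumann_entropy(random_state))
print(von_neumann_entropy(pure_state))
```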
An operator $a\in\mathcal{L}(H)$ that satisfies $0\le a\le I$ is called an effect [2,3,9,10]. We think of an effect $a$ as a two-outcome yes-no measurement. If a measurement of $a$ results in outcome yes, we say that $a$ occurs, and if it results in outcome no, then $a$ does not occur. The effect $a'=I-a$ is the complement of $a$, and $a'$ occurs if and only if $a$ does not occur. We denote the set of effects by $\mathcal{E}(H)$. If $a\in\mathcal{E}(H)$ and $\rho\in\mathcal{S}(H)$, then $0\le\operatorname{tr}(a\rho)\le1$ and we interpret $\operatorname{tr}(a\rho)$ as the probability that $a$ occurs when the system is in state $\rho$. If $\operatorname{tr}(a\rho)\ne0$ we define the $\rho$-entropy of $a$ to be
$$S_\rho(a)=-\operatorname{tr}(a\rho)\ln\frac{\operatorname{tr}(a\rho)}{\operatorname{tr}(a)}\qquad(1)$$
We interpret $S_\rho(a)$ as the amount of uncertainty that the system is in state $\rho$ resulting from a measurement of $a$. The smaller $S_\rho(a)$ is, the more information a measurement of $a$ gives about $\rho$. Such information is useful for state discrimination problems [2,3,4,5].
If $\rho$ is the completely random state $\rho=I/n$ then (1) becomes
$$S_\rho(a)=-\frac{\operatorname{tr}(a)}{n}\ln\frac{1}{n}=\frac{\operatorname{tr}(a)}{n}\ln n$$
Since $\operatorname{tr}(a)\le n$ we conclude that $S_\rho(a)\le\ln n$ for all $a\in\mathcal{E}(H)$.
. Another extreme case is when 
 for 
. We then have for any 
 that
      
      Thus, as 
 gets smaller, the more information we gain.
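The completely random case can be illustrated numerically, assuming the $\rho$-entropy takes the observational form $S_\rho(a)=-\operatorname{tr}(a\rho)\ln[\operatorname{tr}(a\rho)/\operatorname{tr}(a)]$ (our reading of Equation (1)); the helper names are ours:

```python
import numpy as np

def effect_entropy(a, rho):
    """Assumed form of Equation (1): S_rho(a) = -tr(a rho) ln[tr(a rho)/tr(a)]."""
    p = np.trace(a @ rho).real
    return float(-p * np.log(p / np.trace(a).real))

n = 3
rho = np.eye(n) / n                  # completely random state I/n
a = np.diag([1.0, 0.5, 0.25])        # an effect: 0 <= a <= I

S = effect_entropy(a, rho)
# For rho = I/n the entropy reduces to (tr a / n) ln n, which is at most ln n.
print(S, (np.trace(a).real / n) * np.log(n))
```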
A real-valued function $f$ with domain $J$, an interval in $\mathbb{R}$, is strictly convex if for any $x,y\in J$ with $x\ne y$ and $\lambda\in(0,1)$ we have
$$f\big(\lambda x+(1-\lambda)y\big)<\lambda f(x)+(1-\lambda)f(y)$$
If the opposite inequality holds, then $f$ is strictly concave. It is clear that $f$ is strictly convex if and only if $-f$ is strictly concave. Of special importance in this work are the strictly convex functions $x\ln x$ and $-\ln x$. We shall frequently employ Jensen's theorem, which says: if $f$ is strictly convex and $\lambda_i\in(0,1]$ with $\sum_{i=1}^m\lambda_i=1$, then
$$f\Big(\sum_{i=1}^m\lambda_ix_i\Big)\le\sum_{i=1}^m\lambda_if(x_i)$$
Moreover, we have equality if and only if $x_i=x_j$ for all $i,j$ [1].
Theorem 1.  If  with nonzero eigenvalues , , and  with , thenwhere  is the spectral decomposition of ρ. Moreover,  if and only if  in which case  and ifthen  for all  and  while if  for all  then .  Proof.  Letting 
, 
, we have that 
 and 
. Since 
 is strictly concave we obtain
        
        Since
        
        we have that
        
        If 
, then
        
        Conversely, if 
, then clearly 
. If (
2) holds, then we have equality for Jensen’s inequality. Hence, 
 for all 
. Since
        
        we conclude that
        
       Finally, suppose 
 for all 
. Then
        
        We conclude that
        
 □
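Assuming Theorem 1 asserts the Jensen-type lower bound that its proof suggests, with $\rho=\sum_i\lambda_iP_i$ the spectral decomposition and $f(x)=-x\ln x$ strictly concave, the key estimate can be sketched as:

```latex
S_\rho(a)
  = \operatorname{tr}(a)\, f\!\Big(\sum_i \mu_i\lambda_i\Big)
  \;\ge\; \operatorname{tr}(a)\sum_i \mu_i f(\lambda_i)
  = -\sum_i \lambda_i \operatorname{tr}(aP_i)\ln\lambda_i ,
\qquad \mu_i = \frac{\operatorname{tr}(aP_i)}{\operatorname{tr}(a)} .
```

Summing this bound over the effects of an observable $A$ and using $\sum_x A_x=I$ recovers the lower bound $S_\rho(A)\ge S(\rho)$ invoked in the proof of Theorem 6.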
For $a,b\in\mathcal{E}(H)$ we write $a\perp b$ if $a+b\in\mathcal{E}(H)$.
Theorem 2.  If $a\perp b$, then $S_\rho(a+b)\ge S_\rho(a)+S_\rho(b)$ for all $\rho\in\mathcal{S}(H)$. Moreover, equality holds if and only if $\operatorname{tr}(a\rho)/\operatorname{tr}(a)=\operatorname{tr}(b\rho)/\operatorname{tr}(b)$.
Proof.  Since $f(x)=-x\ln x$ is strictly concave, letting $p=\operatorname{tr}(a\rho)$, $q=\operatorname{tr}(b\rho)$, $s=\operatorname{tr}(a)$, $t=\operatorname{tr}(b)$ we obtain
$$S_\rho(a+b)=-(p+q)\ln\frac{p+q}{s+t}=(s+t)f\Big(\frac{p+q}{s+t}\Big)\ge sf\Big(\frac{p}{s}\Big)+tf\Big(\frac{q}{t}\Big)=S_\rho(a)+S_\rho(b)$$
We have equality if and only if $p/s=q/t$ which is equivalent to $\operatorname{tr}(a\rho)/\operatorname{tr}(a)=\operatorname{tr}(b\rho)/\operatorname{tr}(b)$.    □
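Assuming the same form of Equation (1), the superadditivity in Theorem 2 can be checked on a concrete example (names ours); the inequality is an instance of the log-sum inequality:

```python
import numpy as np

def effect_entropy(a, rho):
    """Assumed form: S_rho(a) = -tr(a rho) ln[tr(a rho)/tr(a)]."""
    p = np.trace(a @ rho).real
    return float(-p * np.log(p / np.trace(a).real))

# A state and two effects whose sum is still an effect (a + b <= I).
rho = np.diag([0.5, 0.2, 0.2, 0.1])
a = np.diag([0.3, 0.2, 0.4, 0.1])
b = np.diag([0.5, 0.1, 0.2, 0.6])

lhs = effect_entropy(a + b, rho)
rhs = effect_entropy(a, rho) + effect_entropy(b, rho)
print(lhs >= rhs)  # superadditivity; a consequence of the log-sum inequality
```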
Corollary 1.  $S_\rho(a)+S_\rho(a')\le\ln n$ and $S_\rho(a)+S_\rho(a')=\ln n$ if and only if $\operatorname{tr}(a\rho)=\operatorname{tr}(a)/n$.
 Proof.  Applying Theorem 2 we obtain
$$S_\rho(a)+S_\rho(a')\le S_\rho(a+a')=S_\rho(I)=-\operatorname{tr}(\rho)\ln\frac{\operatorname{tr}(\rho)}{\operatorname{tr}(I)}=\ln n$$
 □
Corollary 2.  $S_\rho(a)\le\ln n$ for all $a\in\mathcal{E}(H)$, $\rho\in\mathcal{S}(H)$.
Corollary 3.  If $a\le b$, then $S_\rho(a)\le S_\rho(b)$ for all $\rho\in\mathcal{S}(H)$.
 Proof.  If $a\le b$, then $b=a+(b-a)$ with $b-a\in\mathcal{E}(H)$. Hence, by Theorem 2,
$$S_\rho(b)=S_\rho\big(a+(b-a)\big)\ge S_\rho(a)+S_\rho(b-a)\ge S_\rho(a)$$
for every $\rho\in\mathcal{S}(H)$.    □
 Applying Theorem 2 and induction we obtain the following.
Corollary 4.  If $a_1,\ldots,a_m\in\mathcal{E}(H)$ with $\sum_ia_i\in\mathcal{E}(H)$, then $S_\rho\big(\sum_ia_i\big)\ge\sum_iS_\rho(a_i)$. Moreover, we have equality if and only if $\operatorname{tr}(a_i\rho)/\operatorname{tr}(a_i)=\operatorname{tr}(a_j\rho)/\operatorname{tr}(a_j)$ for all $i,j$.
Notice that $\mathcal{E}(H)$ is a convex set in the sense that if $a_i\in\mathcal{E}(H)$ and $\lambda_i\in[0,1]$ with $\sum_i\lambda_i=1$, then $\sum_i\lambda_ia_i\in\mathcal{E}(H)$.
Corollary 5.  (i) If $\lambda\in(0,1]$ and $a\in\mathcal{E}(H)$, then $S_\rho(\lambda a)=\lambda S_\rho(a)$ for all $\rho\in\mathcal{S}(H)$. (ii) If $a_i\in\mathcal{E}(H)$, $\lambda_i\in(0,1]$, with $\sum_i\lambda_i=1$, then $S_\rho\big(\sum_i\lambda_ia_i\big)\ge\sum_i\lambda_iS_\rho(a_i)$ for all $\rho\in\mathcal{S}(H)$. We have equality if and only if $\operatorname{tr}(a_i\rho)/\operatorname{tr}(a_i)=\operatorname{tr}(a_j\rho)/\operatorname{tr}(a_j)$ for all $i,j$.
 Proof.  (i) We have that
$$S_\rho(\lambda a)=-\operatorname{tr}(\lambda a\rho)\ln\frac{\operatorname{tr}(\lambda a\rho)}{\operatorname{tr}(\lambda a)}=-\lambda\operatorname{tr}(a\rho)\ln\frac{\operatorname{tr}(a\rho)}{\operatorname{tr}(a)}=\lambda S_\rho(a)$$
(ii) Applying (i) and Corollary 4 gives
$$S_\rho\Big(\sum_i\lambda_ia_i\Big)\ge\sum_iS_\rho(\lambda_ia_i)=\sum_i\lambda_iS_\rho(a_i)$$
together with the equality condition.    □
As with $\mathcal{E}(H)$, $\mathcal{S}(H)$ is a convex set and we have the following.
Theorem 3.  If $\rho_i\in\mathcal{S}(H)$, $\lambda_i\in(0,1]$, with $\sum_i\lambda_i=1$, then
$$S_{\sum_i\lambda_i\rho_i}(a)\ge\sum_i\lambda_iS_{\rho_i}(a)$$
for all $a\in\mathcal{E}(H)$. We have equality if and only if $\operatorname{tr}(a\rho_i)=\operatorname{tr}(a\rho_j)$ for all $i,j$.  Proof.  Letting $p_i=\operatorname{tr}(a\rho_i)$, $s=\operatorname{tr}(a)$, since $f(x)=-x\ln x$ is strictly concave, we obtain
$$S_{\sum_i\lambda_i\rho_i}(a)=sf\Big(\sum_i\lambda_i\frac{p_i}{s}\Big)\ge s\sum_i\lambda_if\Big(\frac{p_i}{s}\Big)=\sum_i\lambda_iS_{\rho_i}(a)$$
We have equality if and only if $p_i=p_j$ which is equivalent to $\operatorname{tr}(a\rho_i)=\operatorname{tr}(a\rho_j)$ for all $i,j$.    □
Theorem 4.  If , , , then
An operation on $H$ is a completely positive linear map $\mathcal{I}\colon\mathcal{L}(H)\to\mathcal{L}(H)$ such that $\operatorname{tr}\big(\mathcal{I}(\rho)\big)\le1$ for all $\rho\in\mathcal{S}(H)$ [2,3,6,9,10]. If $\mathcal{I}$ is an operation we define the dual of $\mathcal{I}$ to be the unique linear map $\mathcal{I}^*\colon\mathcal{L}(H)\to\mathcal{L}(H)$ that satisfies $\operatorname{tr}\big(\mathcal{I}(\rho)b\big)=\operatorname{tr}\big(\rho\,\mathcal{I}^*(b)\big)$ for all $\rho\in\mathcal{S}(H)$, $b\in\mathcal{L}(H)$. If $b\in\mathcal{E}(H)$ then for any $\rho\in\mathcal{S}(H)$ we have $0\le\operatorname{tr}\big(\rho\,\mathcal{I}^*(b)\big)=\operatorname{tr}\big(\mathcal{I}(\rho)b\big)\le1$ and it follows that $\mathcal{I}^*(b)\in\mathcal{E}(H)$. We say that $\mathcal{I}$ measures $a$ if $\operatorname{tr}\big(\mathcal{I}(\rho)\big)=\operatorname{tr}(a\rho)$ for all $\rho\in\mathcal{S}(H)$. If $\mathcal{I}$ measures $a$ we define the $\mathcal{I}$-sequential product $a\circ b=\mathcal{I}^*(b)$ for all $b\in\mathcal{E}(H)$ [12,13]. Although $a\circ b$ depends on the operation used to measure $a$, we do not include $\mathcal{I}$ in the notation for simplicity. We interpret $a\circ b$ as the effect that results from first measuring $a$ using $\mathcal{I}$ and then measuring $b$.
Theorem 5.  (i) If $b\in\mathcal{E}(H)$, then $a\circ b\in\mathcal{E}(H)$. (ii) $a\circ I=a$. (iii) $a\circ b\le a$ for all $b\in\mathcal{E}(H)$. (iv) $S_\rho(a\circ b)\le S_\rho(a)$ for all $\rho\in\mathcal{S}(H)$.
 Proof.  (i) For every $\rho\in\mathcal{S}(H)$ we obtain
$$0\le\operatorname{tr}\big(\rho(a\circ b)\big)=\operatorname{tr}\big(\rho\,\mathcal{I}^*(b)\big)=\operatorname{tr}\big(\mathcal{I}(\rho)b\big)\le1$$
Hence, $a\circ b\in\mathcal{E}(H)$. (ii) For all $\rho\in\mathcal{S}(H)$ we have
$$\operatorname{tr}\big(\rho(a\circ I)\big)=\operatorname{tr}\big(\mathcal{I}(\rho)\big)=\operatorname{tr}(a\rho)$$
Hence, $a\circ I=a$. (iii) By (i) and (ii) we have
$$\operatorname{tr}\big(\rho(a\circ b)\big)=\operatorname{tr}\big(\mathcal{I}(\rho)b\big)\le\operatorname{tr}\big(\mathcal{I}(\rho)\big)=\operatorname{tr}\big(\rho(a\circ I)\big)=\operatorname{tr}(a\rho)$$
It follows that $a\circ b\le a$. (iv) Since $a\circ b\le a$, by Corollary 3 we obtain $S_\rho(a\circ b)\le S_\rho(a)$ for all $\rho\in\mathcal{S}(H)$.    □
Theorem 5(iv) shows that $a\circ b$ gives more information than $a$ about $\rho$. We can continue this process and make more measurements as follows. If $\mathcal{I}_i$ measures $a_i$, $i=1,2$, we have
$$a_1\circ(a_2\circ a_3)=\mathcal{I}_1^*\big(\mathcal{I}_2^*(a_3)\big)$$
and it follows from Theorem 5(iv) that
$$S_\rho\big(a_1\circ(a_2\circ a_3)\big)\le S_\rho(a_1\circ a_2)\le S_\rho(a_1)$$
Notice that the probability of occurrence of the effect $a_1\circ(a_2\circ a_3)$ in state $\rho$ is
$$\operatorname{tr}\big(\rho\,\mathcal{I}_1^*\mathcal{I}_2^*(a_3)\big)=\operatorname{tr}\big(\mathcal{I}_2\mathcal{I}_1(\rho)a_3\big)$$
Thus, we begin with the input state $\rho$, then measure $a_1$ using $\mathcal{I}_1$, then measure $a_2$ using $\mathcal{I}_2$ and finally measure $a_3$.
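The Lüders operation of Example 1 below can be sketched concretely, using the standard form $\mathcal{L}^a(\rho)=a^{1/2}\rho a^{1/2}$ (the matrices are diagonal here so the square root is elementwise; names ours):

```python
import numpy as np

# Lüders operation L(rho) = a^(1/2) rho a^(1/2) for an effect a.
a = np.diag([1.0, 0.5, 0.25])
sqrt_a = np.sqrt(a)              # elementwise square root of a diagonal effect

def luders(rho):
    return sqrt_a @ rho @ sqrt_a

rho = np.diag([0.5, 0.3, 0.2])   # a state
b = np.diag([0.2, 0.9, 0.4])     # another effect

# L measures a: tr(L(rho)) = tr(a rho).
print(np.isclose(np.trace(luders(rho)), np.trace(a @ rho)))
# The sequential product a∘b = a^(1/2) b a^(1/2) satisfies
# tr((a∘b) rho) = tr(b L(rho)): first measure a, then measure b.
seq = sqrt_a @ b @ sqrt_a
print(np.isclose(np.trace(seq @ rho), np.trace(b @ luders(rho))))
```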
Example 1.  For $a\in\mathcal{E}(H)$ we define the Lüders operation $\mathcal{L}^a(\rho)=a^{1/2}\rho a^{1/2}$ [14]. Since
$$\operatorname{tr}\big(\mathcal{L}^a(\rho)b\big)=\operatorname{tr}\big(a^{1/2}\rho a^{1/2}b\big)=\operatorname{tr}\big(\rho\,a^{1/2}ba^{1/2}\big)$$
we have $(\mathcal{L}^a)^*(b)=a^{1/2}ba^{1/2}$ so $(\mathcal{L}^a)^*(b)\in\mathcal{E}(H)$. We have that $\mathcal{L}^a$ measures $a$ because
$$\operatorname{tr}\big(\mathcal{L}^a(\rho)\big)=\operatorname{tr}\big(a^{1/2}\rho a^{1/2}\big)=\operatorname{tr}(a\rho)$$
for every $\rho\in\mathcal{S}(H)$. We conclude that the $\mathcal{L}^a$ sequential product is
$$a\circ b=a^{1/2}ba^{1/2}$$
We also have that
Example 2.  For $a\in\mathcal{E}(H)$, $\alpha\in\mathcal{S}(H)$ we define the Holevo operation [15] $\mathcal{H}^{(a,\alpha)}(\rho)=\operatorname{tr}(a\rho)\alpha$. Since
$$\operatorname{tr}\big(\mathcal{H}^{(a,\alpha)}(\rho)b\big)=\operatorname{tr}(a\rho)\operatorname{tr}(\alpha b)=\operatorname{tr}\big(\rho\operatorname{tr}(\alpha b)a\big)$$
we have $(\mathcal{H}^{(a,\alpha)})^*(b)=\operatorname{tr}(\alpha b)a$. We have $\mathcal{H}^{(a,\alpha)}$ measures $a$ because
$$\operatorname{tr}\big(\mathcal{H}^{(a,\alpha)}(\rho)\big)=\operatorname{tr}(a\rho)\operatorname{tr}(\alpha)=\operatorname{tr}(a\rho)$$
for every $\rho\in\mathcal{S}(H)$. We conclude that the $\mathcal{H}^{(a,\alpha)}$ sequential product is
$$a\circ b=\operatorname{tr}(\alpha b)a$$
We also have that
If , , and we measure  with operations , , then
Moreover, it follows from Corollary 5(i) that
$$S_\rho(a\circ b)=\operatorname{tr}(\alpha b)S_\rho(a)$$
for all $\rho\in\mathcal{S}(H)$.
  3. Entropy of Observables and Instruments
We now extend our work on entropy of effects to entropy of observables and instruments. An observable on $H$ is a finite collection of effects $A=\{A_x\colon x\in\Omega_A\}$, $A_x\in\mathcal{E}(H)$, where $\sum_{x\in\Omega_A}A_x=I$ [2,3,9]. The set $\Omega_A$ is called the outcome space of $A$. The effect $A_x$ occurs when a measurement of $A$ results in the outcome $x$. If $\rho\in\mathcal{S}(H)$, then $\operatorname{tr}(A_x\rho)$ is the probability that outcome $x$ results from a measurement of $A$ when the system is in state $\rho$. If $\Delta\subseteq\Omega_A$, then
$$\Phi^A_\rho(\Delta)=\sum_{x\in\Delta}\operatorname{tr}(A_x\rho)$$
is the probability that $A$ has an outcome in $\Delta$ when the system is in state $\rho$, and $\Phi^A_\rho$ is called the distribution of $A$. We also use the notation $A(\Delta)=\sum_{x\in\Delta}A_x$ so $\Phi^A_\rho(\Delta)=\operatorname{tr}\big(A(\Delta)\rho\big)$ for all $\Delta\subseteq\Omega_A$. In this way, an observable is a positive operator-valued measure (POVM). We say that an observable $A$ is sharp if $A_x$ is a projection on $H$ for all $x\in\Omega_A$ and $A$ is atomic if $A_x$ is a one-dimensional projection for all $x\in\Omega_A$.
If $A$ is an observable and $\rho\in\mathcal{S}(H)$, the $\rho$-entropy of $A$ is $S_\rho(A)=\sum_xS_\rho(A_x)$, where the sum is over the $x\in\Omega_A$ such that $\operatorname{tr}(A_x\rho)\ne0$. Then $S_\rho(A)$ is a measure of the information that a measurement of $A$ gives about $\rho$. The smaller $S_\rho(A)$ is, the more information given. Notice that if $A$ is sharp, then $\operatorname{tr}(A_x)=\dim(A_xH)$, and if $A$ is atomic, then
$$S_\rho(A)=-\sum_x\operatorname{tr}(A_x\rho)\ln\operatorname{tr}(A_x\rho)$$
There are two interesting extremes for $S_\rho(A)$. If $\rho$ has spectral decomposition $\rho=\sum_x\lambda_xP_x$ and $A$ is the observable $A_x=P_x$, then
$$S_\rho(A)=-\sum_x\lambda_x\ln\lambda_x=S(\rho)$$
As we shall see, this gives the minimum entropy (most information). For the completely random state $\rho=I/n$ and any observable $A$ we obtain
$$S_\rho(A)=\sum_x\frac{\operatorname{tr}(A_x)}{n}\ln n=\ln n$$
We shall also see that this gives the maximum entropy (least information).
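The two extremes, and the bounds they suggest, can be checked numerically under the assumed entropy form (helper names ours):

```python
import numpy as np

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

def observable_entropy(effects, rho):
    """Assumed form: S_rho(A) = sum_x -tr(A_x rho) ln[tr(A_x rho)/tr(A_x)],
    skipping outcomes with tr(A_x rho) = 0."""
    total = 0.0
    for ax in effects:
        p = np.trace(ax @ rho).real
        if p > 1e-12:
            total += -p * np.log(p / np.trace(ax).real)
    return total

n = 3
rho = np.diag([0.6, 0.3, 0.1])
A = [np.diag([0.7, 0.2, 0.1]), np.diag([0.3, 0.8, 0.9])]  # effects sum to I

S = observable_entropy(A, rho)
print(vn_entropy(rho) <= S <= np.log(n))   # the two extremes bound S

# The spectral observable of rho attains the minimum S(rho).
spectral = [np.diag([1.0, 0, 0]), np.diag([0, 1.0, 0]), np.diag([0, 0, 1.0])]
print(np.isclose(observable_entropy(spectral, rho), vn_entropy(rho)))
```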
Theorem 6.  For any observable A and $\rho\in\mathcal{S}(H)$ we have $S(\rho)\le S_\rho(A)\le\ln n$.  Proof.  Applying Theorem 1 we obtain
$$S_\rho(A)=\sum_xS_\rho(A_x)\ge-\sum_x\sum_i\lambda_i\operatorname{tr}(A_xP_i)\ln\lambda_i=-\sum_i\lambda_i\ln\lambda_i=S(\rho)$$
Since $\ln x$ is concave and $\operatorname{tr}(A_x\rho)\ge0$, $\sum_x\operatorname{tr}(A_x\rho)=1$ we have by Jensen's inequality
$$S_\rho(A)=\sum_x\operatorname{tr}(A_x\rho)\ln\frac{\operatorname{tr}(A_x)}{\operatorname{tr}(A_x\rho)}\le\ln\Big(\sum_x\operatorname{tr}(A_x)\Big)=\ln n$$
 □
An observable $A$ is trivial if $A_x=\lambda_xI$, $\lambda_x\in[0,1]$, $\sum_x\lambda_x=1$.
Corollary 6.  (i) $S_\rho(A)=\ln n$ if and only if $\operatorname{tr}(A_x\rho)=\operatorname{tr}(A_x)/n$ for all $x\in\Omega_A$. (ii) $A$ is trivial if and only if $S_\rho(A)=\ln n$ for all $\rho\in\mathcal{S}(H)$. (iii) $\rho=I/n$ if and only if $S_\rho(A)=\ln n$ for all observables $A$. (iv) $S(\rho)=\ln n$ if and only if $\rho=I/n$.
 Proof.  (i) This follows from the proof of Theorem 6 because this is the condition for equality in Jensen’s inequality. (ii) Suppose 
A is trivial with 
. Then for every 
 we have
        
       Conversely, suppose 
 for all 
. By (i) we have that 
 for all 
. It follows that
        
        for every 
, 
. Hence, 
 so that
        
        We conclude that 
 for all 
 so 
A is trivial. (iii) If 
, we have shown in (
3) that 
 for all observables 
A. Conversely, if 
 for every observable 
A, as before, we have 
 for every observable 
A. Letting 
 be the observable given by the spectral decomposition 
 where 
A is atomic, we conclude that 
 for all 
. Hence, 
 and 
. (iv) If 
, by Theorem 6, 
 for every observable 
A. Applying (iii), 
. Conversely, if 
, then
        
 □
We now extend Corollary 5(ii) and Theorem 3 to observables. If $A^{(i)}$ are observables with the same outcome space $\Omega$, $i=1,2,\ldots,m$, and $\lambda_i\in(0,1]$ with $\sum_i\lambda_i=1$, then the observable $A=\sum_i\lambda_iA^{(i)}$, where $A_x=\sum_i\lambda_iA^{(i)}_x$, is called a convex combination of the $A^{(i)}$ [12].
Theorem 7.  (i) If A is a convex combination $A=\sum_i\lambda_iA^{(i)}$, then for all $\rho\in\mathcal{S}(H)$ we have
$$S_\rho(A)\ge\sum_i\lambda_iS_\rho\big(A^{(i)}\big)$$
(ii) If $\rho=\sum_i\lambda_i\rho_i$ with $\rho_i\in\mathcal{S}(H)$, $\lambda_i\in(0,1]$, $\sum_i\lambda_i=1$, and A is an observable, then
$$S_\rho(A)\ge\sum_i\lambda_iS_{\rho_i}(A)$$
 Proof.  (i) Applying Corollary 5(ii) gives
$$S_\rho(A)=\sum_xS_\rho\Big(\sum_i\lambda_iA^{(i)}_x\Big)\ge\sum_x\sum_i\lambda_iS_\rho\big(A^{(i)}_x\big)=\sum_i\lambda_iS_\rho\big(A^{(i)}\big)$$
(ii) Applying Theorem 3 gives
$$S_\rho(A)=\sum_xS_{\sum_i\lambda_i\rho_i}(A_x)\ge\sum_x\sum_i\lambda_iS_{\rho_i}(A_x)=\sum_i\lambda_iS_{\rho_i}(A)$$
 □
We say that an observable $B$ is a coarse-graining of an observable $A$ if there exists a surjection $f\colon\Omega_A\to\Omega_B$ such that
$$B_y=\sum_{\{x\colon f(x)=y\}}A_x$$
for every $y\in\Omega_B$ [2,12,16].
Theorem 8.  If B is a coarse-graining of A, then $S_\rho(B)\ge S_\rho(A)$ for all $\rho\in\mathcal{S}(H)$.
 Proof.  Let 
 for all 
 and let 
, 
 for all 
, 
. Then
        
        Let 
, 
 so that
        
       Since 
 is concave, we conclude that
        
 □
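Reading Theorem 8 as the statement that coarse-graining cannot decrease the $\rho$-entropy, a numeric check under the assumed entropy form (names ours):

```python
import numpy as np

def observable_entropy(effects, rho):
    """Assumed form: S_rho(A) = sum_x -tr(A_x rho) ln[tr(A_x rho)/tr(A_x)]."""
    total = 0.0
    for ax in effects:
        p = np.trace(ax @ rho).real
        if p > 1e-12:
            total += -p * np.log(p / np.trace(ax).real)
    return total

rho = np.diag([0.5, 0.25, 0.15, 0.1])
# A four-outcome (atomic) observable A ...
A = [np.diag([1.0, 0, 0, 0]), np.diag([0, 1.0, 0, 0]),
     np.diag([0, 0, 1.0, 0]), np.diag([0, 0, 0, 1.0])]
# ... and its coarse-graining B under f(0)=f(1)=0, f(2)=f(3)=1.
B = [A[0] + A[1], A[2] + A[3]]

print(observable_entropy(B, rho) >= observable_entropy(A, rho))
```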
 The equality condition for Jensen’s inequality gives the following.
Corollary 7.  An observable A possesses a coarse-graining  with  for all  if and only if for every  with  we have
A trace-preserving operation is called a channel. An instrument on $H$ is a finite collection of operations $\mathcal{I}=\{\mathcal{I}_x\colon x\in\Omega_\mathcal{I}\}$ such that $\sum_x\mathcal{I}_x$ is a channel [2,3,9]. We call $\Omega_\mathcal{I}$ the outcome space for $\mathcal{I}$. If $\mathcal{I}$ is an instrument, there exists a unique observable $A$ such that $\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)=\operatorname{tr}(A_x\rho)$ for all $\rho\in\mathcal{S}(H)$, $x\in\Omega_\mathcal{I}$, and we say that $\mathcal{I}$ measures A. Although an instrument measures a unique observable, an observable is measured by many instruments. For example, if $A$ is an observable, the corresponding Lüders instrument [14] is defined by
$$\mathcal{L}^A_x(\rho)=A_x^{1/2}\rho A_x^{1/2}$$
for all $x\in\Omega_A$. Then $\mathcal{L}^A$ is an instrument because
$$\sum_x\operatorname{tr}\big(\mathcal{L}^A_x(\rho)\big)=\sum_x\operatorname{tr}(A_x\rho)=\operatorname{tr}(\rho)$$
for all $\rho\in\mathcal{S}(H)$. Moreover, $\mathcal{L}^A$ measures $A$ because
$$\operatorname{tr}\big(\mathcal{L}^A_x(\rho)\big)=\operatorname{tr}\big(A_x^{1/2}\rho A_x^{1/2}\big)=\operatorname{tr}(A_x\rho)$$
for all $\rho\in\mathcal{S}(H)$. Of course, this is related to Example 1. Corresponding to Example 2, we have a Holevo instrument $\mathcal{H}^{(A,\alpha)}$ where $\alpha_x\in\mathcal{S}(H)$, $x\in\Omega_A$ and
$$\mathcal{H}^{(A,\alpha)}_x(\rho)=\operatorname{tr}(A_x\rho)\alpha_x$$
for all $\rho\in\mathcal{S}(H)$ [15]. To show that $\mathcal{H}^{(A,\alpha)}$ is an instrument we have
$$\sum_x\operatorname{tr}\big(\mathcal{H}^{(A,\alpha)}_x(\rho)\big)=\sum_x\operatorname{tr}(A_x\rho)\operatorname{tr}(\alpha_x)=\sum_x\operatorname{tr}(A_x\rho)=\operatorname{tr}(\rho)$$
Moreover, $\mathcal{H}^{(A,\alpha)}$ measures $A$ because
$$\operatorname{tr}\big(\mathcal{H}^{(A,\alpha)}_x(\rho)\big)=\operatorname{tr}(A_x\rho)$$
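The Holevo instrument as described above ($\mathcal{H}^{(A,\alpha)}_x(\rho)=\operatorname{tr}(A_x\rho)\alpha_x$, our reading of the gutted formulas) can be verified on a small example to be an instrument that measures $A$ (names ours):

```python
import numpy as np

# Holevo instrument for observable A with probe states alpha_x:
# H_x(rho) = tr(A_x rho) * alpha_x
A = [np.diag([0.8, 0.3]), np.diag([0.2, 0.7])]        # two-outcome observable
alphas = [np.diag([1.0, 0.0]), np.diag([0.5, 0.5])]   # arbitrary states

def holevo(x, rho):
    return np.trace(A[x] @ rho).real * alphas[x]

rho = np.diag([0.6, 0.4])

# Instrument property: the outcome operations sum to a channel (trace 1).
total = holevo(0, rho) + holevo(1, rho)
print(np.isclose(np.trace(total), 1.0))
# It measures A: tr(H_x(rho)) = tr(A_x rho) for each outcome x.
print(all(np.isclose(np.trace(holevo(x, rho)), np.trace(A[x] @ rho))
          for x in range(2)))
```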
Let $A,B$ be observables and let $\mathcal{I}$ be an instrument that measures $A$. We define the $\mathcal{I}$-sequential product $A\circ B$ [12,13] by $\Omega_{A\circ B}=\Omega_A\times\Omega_B$ and
$$(A\circ B)_{(x,y)}=\mathcal{I}_x^*(B_y)$$
Defining the surjection $f\colon\Omega_A\times\Omega_B\to\Omega_A$ by $f(x,y)=x$, we obtain
$$\sum_{\{(x,y)\colon f(x,y)=x\}}(A\circ B)_{(x,y)}=\sum_y\mathcal{I}_x^*(B_y)=\mathcal{I}_x^*(I)=A_x$$
We conclude that $A$ is a coarse-graining of $A\circ B$. Applying Theorem 8 we obtain the following.
Corollary 8.  If $A,B$ are observables, then $S_\rho(A\circ B)\ge S_\rho(A)$ for all $\rho\in\mathcal{S}(H)$. Equality holds if and only if for every $x$, $y$ we have
Extending this work to more than two observables, let $\mathcal{I},\mathcal{J}$ be instruments that measure the observables $A,B$, respectively. If $C$ is another observable, we have that
$$\big(A\circ(B\circ C)\big)_{(x,y,z)}=\mathcal{I}_x^*\big(\mathcal{J}_y^*(C_z)\big)$$
The next result follows from Corollary 8.
Corollary 9.  If $A,B,C$ are observables, then
$$S_\rho\big(A\circ(B\circ C)\big)\ge S_\rho(A\circ B)\ge S_\rho(A)$$
for all $\rho\in\mathcal{S}(H)$.
If $\mathcal{I}$ is an instrument, let $A$ be the unique observable that $\mathcal{I}$ measures so $\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)=\operatorname{tr}(A_x\rho)$ for all $\rho\in\mathcal{S}(H)$ and $x\in\Omega_\mathcal{I}$. We define the $\rho$-entropy of $\mathcal{I}$ as $S_\rho(\mathcal{I})=S_\rho(A)$. Since $\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)=\operatorname{tr}(A_x\rho)$ we have
$$\operatorname{tr}(A_x)=\operatorname{tr}\big(\mathcal{I}_x(I)\big)$$
Hence,
$$S_\rho(\mathcal{I})=-\sum_x\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)\ln\frac{\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)}{\operatorname{tr}\big(\mathcal{I}_x(I)\big)}$$
Now let $\mathcal{I},\mathcal{J}$ be instruments and let $A,B$ be the unique observables they measure, respectively. Denoting the composition of the two instruments by $\mathcal{J}\circ\mathcal{I}$ we have
$$\operatorname{tr}\big((\mathcal{J}\circ\mathcal{I})_{(x,y)}(\rho)\big)=\operatorname{tr}\big(\mathcal{J}_y(\mathcal{I}_x(\rho))\big)=\operatorname{tr}\big(B_y\mathcal{I}_x(\rho)\big)=\operatorname{tr}\big(\mathcal{I}_x^*(B_y)\rho\big)=\operatorname{tr}\big((A\circ B)_{(x,y)}\rho\big)$$
Hence, the observable measured by $\mathcal{J}\circ\mathcal{I}$ is $A\circ B$. It follows that
$$S_\rho(\mathcal{J}\circ\mathcal{I})=S_\rho(A\circ B)\ge S_\rho(A)=S_\rho(\mathcal{I})$$
We conclude that Theorems 1, 2 and 3 of [1] follow from our results. Moreover, our proofs are simpler since they come from the more basic concept of $\rho$-entropy for effects.
Let $A,B$ be observables on $H$ and let $\mathcal{I}$ be an instrument that measures $A$. The corresponding sequential product becomes
$$(A\circ B)_{(x,y)}=\mathcal{I}_x^*(B_y)$$
The $\rho$-entropy of $A\circ B$ has the form
$$S_\rho(A\circ B)=-\sum_{x,y}\operatorname{tr}\big(\mathcal{I}_x^*(B_y)\rho\big)\ln\frac{\operatorname{tr}\big(\mathcal{I}_x^*(B_y)\rho\big)}{\operatorname{tr}\big(\mathcal{I}_x^*(B_y)\big)}$$
If $\mathcal{I}$ is the Lüders instrument $\mathcal{L}^A$ we have $(\mathcal{L}^A_x)^*(B_y)=A_x^{1/2}B_yA_x^{1/2}$ and
$$S_\rho(A\circ B)=-\sum_{x,y}\operatorname{tr}\big(A_x^{1/2}B_yA_x^{1/2}\rho\big)\ln\frac{\operatorname{tr}\big(A_x^{1/2}B_yA_x^{1/2}\rho\big)}{\operatorname{tr}(A_xB_y)}$$
If $\mathcal{I}$ is the Holevo instrument $\mathcal{H}^{(A,\alpha)}$, $\mathcal{H}^{(A,\alpha)}_x(\rho)=\operatorname{tr}(A_x\rho)\alpha_x$ we obtain
$$S_\rho(A\circ B)=-\sum_{x,y}\operatorname{tr}(\alpha_xB_y)\operatorname{tr}(A_x\rho)\ln\frac{\operatorname{tr}(A_x\rho)}{\operatorname{tr}(A_x)}=S_\rho(A)$$
This also follows from Corollary 8 because
$$(A\circ B)_{(x,y)}=\operatorname{tr}(\alpha_xB_y)A_x$$
If $A$ is an observable on $H$ and $B$ is an observable on $K$ we form the tensor product observable on $H\otimes K$ given by $(A\otimes B)_{(x,y)}=A_x\otimes B_y$ where $\Omega_{A\otimes B}=\Omega_A\times\Omega_B$ [12].
Lemma 1.  If $\rho\in\mathcal{S}(H)$, $\sigma\in\mathcal{S}(K)$, then $S_{\rho\otimes\sigma}(A\otimes B)=S_\rho(A)+S_\sigma(B)$.  Proof.  From the definition of $S_{\rho\otimes\sigma}(A\otimes B)$ we obtain
$$S_{\rho\otimes\sigma}(A\otimes B)=-\sum_{x,y}\operatorname{tr}(A_x\rho)\operatorname{tr}(B_y\sigma)\ln\frac{\operatorname{tr}(A_x\rho)\operatorname{tr}(B_y\sigma)}{\operatorname{tr}(A_x)\operatorname{tr}(B_y)}=S_\rho(A)+S_\sigma(B)$$
 □
 We conclude that $A$ gives more information about $\rho$ than $A$ and $B$ give about $\rho\otimes\sigma$ and similarly for $B$.
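Lemma 1's additivity can be checked numerically under the assumed entropy form (names ours):

```python
import numpy as np

def observable_entropy(effects, rho):
    """Assumed form: S_rho(A) = sum_x -tr(A_x rho) ln[tr(A_x rho)/tr(A_x)]."""
    total = 0.0
    for ax in effects:
        p = np.trace(ax @ rho).real
        if p > 1e-12:
            total += -p * np.log(p / np.trace(ax).real)
    return total

rho = np.diag([0.7, 0.3])                      # state on H
sigma = np.diag([0.4, 0.35, 0.25])             # state on K
A = [np.diag([0.9, 0.2]), np.diag([0.1, 0.8])]
B = [np.diag([0.5, 0.3, 0.1]), np.diag([0.5, 0.7, 0.9])]

AB = [np.kron(ax, by) for ax in A for by in B]  # tensor product observable
lhs = observable_entropy(AB, np.kron(rho, sigma))
rhs = observable_entropy(A, rho) + observable_entropy(B, sigma)
print(np.isclose(lhs, rhs))  # additivity of the entropy (Lemma 1)
```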
A measurement model [2,3,9] is a 5-tuple $\mathcal{M}=(H,K,\nu,\alpha,P)$ where $H$ is the system Hilbert space, $K$ is the probe Hilbert space, $\nu$ is the interaction channel, $\alpha\in\mathcal{S}(K)$ is the initial probe state and $P$ is the probe observable on $K$. We interpret $\mathcal{M}$ as an apparatus that is employed to measure an instrument and hence an observable. In fact, $\mathcal{M}$ measures the unique instrument $\mathcal{I}$ on $H$ given by
$$\mathcal{I}_x(\rho)=\operatorname{tr}_K\big[\nu(\rho\otimes\alpha)(I\otimes P_x)\big]\qquad(4)$$
In this way, a state $\rho$ is input into the apparatus and combined with the initial state $\alpha$ of the probe system. The channel $\nu$ interacts the two states and a measurement of the probe $P$ is performed resulting in outcome $x$. The outcome state is reduced to $H$ by applying the partial trace over $K$. Now $\mathcal{M}$ measures a unique observable $A$ on $H$ that satisfies
$$\operatorname{tr}(A_x\rho)=\operatorname{tr}\big(\mathcal{I}_x(\rho)\big)=\operatorname{tr}\big[\nu(\rho\otimes\alpha)(I\otimes P_x)\big]$$
The $\rho$-entropy of $\mathcal{M}$ becomes
$$S_\rho(\mathcal{M})=S_\rho(\mathcal{I})=S_\rho(A)$$
where $\mathcal{I}$ is given by (4). Of course, $S_\rho(\mathcal{M})$ gives the amount of information that a measurement by $\mathcal{M}$ provides about $\rho$. A closely related concept is the observable $I\otimes P$ on $H\otimes K$, and $S_{\nu(\rho\otimes\alpha)}(I\otimes P)$ also provides the amount of information that a measurement by $\mathcal{M}$ provides about $\rho$. It follows from (4) that the distribution of $A$ in the state $\rho$ equals the distribution of $I\otimes P$ in the state $\nu(\rho\otimes\alpha)$. We now compare $S_\rho(A)$ and $S_{\nu(\rho\otimes\alpha)}(I\otimes P)$
. Applying (4) gives
$$S_{\nu(\rho\otimes\alpha)}(I\otimes P)=-\sum_x\operatorname{tr}(A_x\rho)\ln\frac{\operatorname{tr}(A_x\rho)}{\operatorname{tr}(I\otimes P_x)}$$
It follows that $S_\rho(A)=S_{\nu(\rho\otimes\alpha)}(I\otimes P)$ if and only if
$$\operatorname{tr}(A_x)=\operatorname{tr}(I\otimes P_x)\qquad(5)$$
for every $x$ with $\operatorname{tr}(A_x\rho)\ne0$. Now (5) may or may not hold depending on $A$, $\alpha$ and $P$. In many cases, $P$ is atomic [2,9] and then
$$\operatorname{tr}(I\otimes P_x)=n$$
so $S_\rho(A)\le S_{\nu(\rho\otimes\alpha)}(I\otimes P)$ for all $\rho\in\mathcal{S}(H)$. Also, (5) holds if $P$ is sharp.