Article

Intrinsic and Measured Information in Separable Quantum Processes

Complexity Sciences Center and Physics and Astronomy Department, University of California at Davis, One Shields Avenue, Davis, CA 95616, USA
*
Author to whom correspondence should be addressed.
Entropy 2025, 27(6), 599; https://doi.org/10.3390/e27060599
Submission received: 7 April 2025 / Revised: 26 May 2025 / Accepted: 29 May 2025 / Published: 3 June 2025
(This article belongs to the Special Issue Quantum Probability and Randomness V)

Abstract

Stationary quantum information sources emit sequences of correlated qudits—that is, structured quantum stochastic processes. If an observer performs identical measurements on a qudit sequence, the outcomes are a realization of a classical stochastic process. We introduce quantum-information-theoretic properties for separable qudit sequences that serve as bounds on the classical information properties of subsequent measured processes. For sources driven by hidden Markov dynamics, we describe how an observer can temporarily or permanently synchronize to the source’s internal state using specific positive operator-valued measures or adaptive measurement protocols. We introduce a method for approximating an information source with an independent and identically distributed, Markov, or larger memory model through tomographic reconstruction. We identify broad classes of separable processes based on their quantum information properties and the complexity of measurements required to synchronize to and accurately reconstruct them.

1. Introduction

Determining a quantum system’s state requires grappling with multiple sources of uncertainty, including several that do not arise in classical physics. Irreducible limits on measurement, in particular, have been a hallmark of quantum physics since Heisenberg introduced the position–momentum uncertainty principle in 1927 [1]. Similar incompatible measurements exist for generic pure quantum states [2].
For a 2-level quantum system or qubit, it is impossible to simultaneously measure the value of a spin in the x, y, and z directions. (Stated mathematically, the Pauli matrices $\sigma_x$, $\sigma_y$, and $\sigma_z$ do not commute). Additionally, a single measurement in each basis is insufficient. One must measure many copies in each basis to specify the distribution of outcomes. As a result, determining an unknown qubit state through quantum state tomography requires measuring a large ensemble of identical copies with a set of mutually unbiased bases [3] or a single informationally complete positive operator-valued measure (POVM) [4].
These sources of uncertainty are familiar in quantum physics. Contrast them with when an observer receives a sequence of correlated qubits. Measuring them one by one, what will they see? And, what then can they infer about the resources necessary to generate these qubit strings? The following answers these questions by teasing apart the sources of apparent randomness and correlation in measured quantum processes.

1.1. Quantum and Classical Randomness

Also in 1927, von Neumann formulated quantum mechanics in terms of statistical ensembles and quantified the entropy of these mixed quantum states. In doing so, he extended Gibbs’ work on statistical ensembles and classical thermodynamic entropies to the quantum domain [5]. A mixed quantum state $\rho$ has an entropy $S(\rho) = -\mathrm{tr}(\rho \log \rho)$, which is now known as the von Neumann entropy. ($\mathrm{tr}(\cdot)$ is the trace operator). $S(\rho) = 0$ if and only if $\rho$ is a pure (nonmixed) quantum state. On the one hand, the von Neumann entropy is key to understanding quantum systems, particularly those with entangled subsystems that exhibit nonclassical correlations. On the other, the uncertainty $S(\rho)$ quantifies a generic feature of statistical ensembles. It does not correspond to any particular quantum mechanical effect.
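To make the quantity concrete, here is a short numerical sketch (not from the paper; the states below are illustrative choices): the von Neumann entropy is computed from a density matrix’s eigenvalues, using log base 2 so that the entropy is measured in bits.

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -tr(rho log2 rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log2(evals)) + 0.0)  # +0.0 drops IEEE -0.0

pure = np.array([[1.0, 0.0],                  # |0><0|, a pure state
                 [0.0, 0.0]])
mixed = np.eye(2) / 2                         # maximally mixed qubit state

print(von_neumann_entropy(pure))              # → 0.0
print(von_neumann_entropy(mixed))             # → 1.0
```

Consistent with the text, the entropy vanishes exactly for the pure state and reaches one bit for the maximally mixed qubit.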
These two forms of uncertainty—due to ensembles and to quantum indeterminacy—are combined within the framework of quantum information theory, which generalizes classical information theory to quantum observables [6]. One notable example is noiseless coding. Shannon quantified the information content produced by a noiseless classical independent and identically distributed (i.i.d.) information source—one that emits a state drawn from the same distribution at each timestep [7]. Schumacher’s quantum noiseless coding theorem generalized this to quantum information sources. This gave a new physical interpretation of the von Neumann entropy: For an i.i.d. quantum source emitting state ρ , S ( ρ ) is the number of qubits required for a reliable compression scheme [8].

1.2. Sources with Memory

Non-i.i.d. stationary information sources inject additional forms of uncertainty. For example, a source may have an internal memory that induces correlations between sequential qubits and therefore between measurement outcomes. Such correlations may be purely classical or uniquely quantal in nature. As we will show, an experimenter who assumes (incorrectly) that such a source is i.i.d. and then applies existing tomographic methods will not detect these correlations and so will overestimate the source’s randomness and underestimate its compressibility.
Classical memoryful sources are described within the framework of computational mechanics, in which stationary dynamical systems serve as information sources with their own internal states and dynamic [9]. Sequential finite-precision measurements of a dynamical system form a discrete-time stochastic process. The resulting process’ statistics allow one to construct a model of the source and calculate its asymptotic entropy rate, internal memory requirements, and other physically relevant properties [10,11]. Importantly, the uncertainty associated with sequential measurements of a classical information source can be reduced, sometimes substantially, by an observer capable of synchronizing to the source’s internal states [12,13].
Subjecting an open quantum system to sequential qudit probes presents a similar but more general challenge, as the amount of information an observer can glean from an individual qudit through measurement is limited. Recent results established that applying particular measuring instruments to qubits induces complex behavior in measurement sequences [14]. Here, we extend these results by studying the properties of the quantum states themselves in addition to particular sequences of measurement outcomes.
The following introduces novel quantum-information-theoretic properties for sequences of separable—i.e., nonentangled—qudits. We build on previous results that focused on entropy rates, compression limits, and optimal coding strategies for stationary quantum information sources [15,16,17], as well as on results for specific experimentally motivated deviations from the i.i.d. assumption [18]. The approach is distinct from but complements recent efforts on quantum stochastic processes in which an observer measures a quantum system directly. This is complicated due to the latter’s interaction with an inaccessible environment that induces memory effects in sequential measurement outcomes [19,20,21,22,23].
Section 2 introduces classical processes, separable qudit processes, and methods of transforming from one to the other via classical-quantum channels and measurement channels. Then, Section 3, in concert with Appendix A, defines the entropies associated with quantum and classical processes, respectively. Adapting Ref. [11]’s entropy hierarchy, we employ discrete-time derivatives and integrals to obtain a family of distinct quantitative measures of quantum process randomness and correlation. We prove that, for projective or informationally complete measurements, the sequences of measurement outcomes form classical processes whose information properties are bounded by those of the quantum process being measured.
Section 4 then surveys examples of increasingly structured separable qubit and qutrit processes. Section 5 discusses how an observer can synchronize to a memoryful source—i.e., determine its internal state—through sequential measurement. Section 6 uses the resulting catalog of possible process behaviors to answer practical questions for an observer of a quantum process attempting to perform tomography. Finally, Section 7 draws out lessons and proposes future directions and applications, most notably extending the results to the experimentally realizable generation of arbitrary entangled qudit states [24,25] and using correlations as a resource to perform thermodynamic quantum information processing [26,27].

2. Stochastic Processes

We consider the output of an information source to be a discrete-time, stationary stochastic process. If the source output is a classical random variable—$X_t$ for each timestep $t$—we can directly apply the methods of computational mechanics [11]. Our goal is to extend these methods to describe separable sequences of qudits, each represented by a pure state in a $d$-dimensional Hilbert space: $|\psi_t\rangle \in \mathcal{H}_d$ at each timestep $t$. Given such a qudit sequence, one can perform repeated, identical measurements such that the outcomes form a classical stochastic process. Since one can choose to measure qudit states in many different bases, the properties of the classical measured process are determined by both the state of the correlated qudits and the measurement choice. Thus, the relationship between a quantum process and classical measured processes is one-to-many. Figure 1 illustrates this setup.

2.1. Classical Processes

A classical stochastic process is defined by a probability measure $\Pr(X)$ over a bi-infinite chain of random variables:
$$X = \cdots X_{-2} X_{-1} X_0 X_1 X_2 \cdots,$$
with each $X_t$ taking on values drawn from a finite alphabet $\mathcal{X}$. A block of consecutive random variables is denoted as $X_{0:l} = X_0 X_1 \cdots X_{l-1}$. The indexing is left-inclusive and right-exclusive. A particular bi-infinite process realization is denoted as $x = \cdots x_{-2} x_{-1} x_0 x_1 x_2 \cdots$, with events $x_t$ taking values in the discrete set $\mathcal{X}$. Realizations of a block of length $l$ are known as words and denoted as $x_{0:l}$. The set of all words of length $l$ is $\mathcal{X}^l$.
We consider processes that are stationary, meaning that word probabilities $\Pr(X_{0:l})$ are time-independent:
$$\Pr(X_{0:l}) = \Pr(X_{t:t+l}),$$
for all $t \in \mathbb{Z}$ and $l \in \mathbb{Z}^+$. A stationary process’s statistics are fully described by the set of length-$l$ word distributions $\Pr(X_{0:l})$. A block of length $l$ has at most $|\mathcal{X}|^l$ possible realizations (words).
One important subclass of processes are the independently and identically distributed (i.i.d.) processes. The joint block probabilities of an i.i.d. process take the form
$$\Pr(X_{0:l}) = \Pr(X_0)\Pr(X_1)\cdots\Pr(X_{l-1}), \qquad (1)$$
for all $l \in \mathbb{Z}^+$. This factoring of the block probabilities implies that there are no statistical correlations between any of the random variables; by stationarity, $\Pr(X_t) = \Pr(X_0)$ for all $t$ for an i.i.d. process.
Another commonly studied subclass consists of the Markov processes, for which the distribution for each $X_t$ depends only on the immediately preceding random variable $X_{t-1}$. For Markov processes, the joint probabilities for finite-length blocks factor as
$$\Pr(X_{0:l}) = \Pr(X_0)\Pr(X_1|X_0)\cdots\Pr(X_{l-1}|X_{l-2}), \qquad (2)$$
where $\Pr(X|Y)$ is the probability distribution of random variable $X$ conditioned on random variable $Y$.
Finally, there is the markedly larger subclass of hidden Markov processes that have an internal Markov dynamic that is not directly observable. Though the joint probabilities do not factor as in Equation (2), the internal Markov dynamic restricts the process statistics, as we describe next.

2.2. Presentations

A presentation of a process is a model consisting of a set of internal states and a transition dynamic between those states that together reproduce the process’s statistics exactly. A given process may have many presentations. We focus on those depicted with state transition diagrams (directed graphs) that generate stationary, discrete-time stochastic processes in a natural way.
A Markov chain is a process presentation defined by the pair $(\mathcal{X}, T)$:
  • A finite alphabet $\mathcal{X}$ of $m$ symbols $x \in \mathcal{X}$.
  • An $m \times m$ transition matrix $T$. That is, if the source has just emitted symbol $x_i$, it emits symbol $x_j$ next with probability $T_{ij} = \Pr(x_j | x_i)$.
The stationary distribution for a Markov chain is denoted as $\pi$; it is a distribution over internal states in $\mathcal{X}$ that satisfies $\pi = \pi T$. For a Markov chain, the set of internal states is exactly the set of emitted symbols, since the probability distribution for the next symbol is completely determined by the previous symbol. We represent each state as a node in a graph and each transition as a directed edge between nodes labeled by the associated probability.
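As an illustrative sketch (the two-symbol chain below is hypothetical, not taken from the paper), the stationary distribution satisfying $\pi = \pi T$ can be obtained as the left eigenvector of $T$ with eigenvalue 1, and realizations can be sampled directly from the transition rows:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical 2-symbol Markov chain: T[i, j] = Pr(x_j | x_i); rows sum to 1.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Stationary distribution pi = pi T: left eigenvector of T for eigenvalue 1.
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

def sample(length):
    """Generate a realization of the chain, starting from stationarity."""
    x = rng.choice(len(pi), p=pi)
    out = [int(x)]
    for _ in range(length - 1):
        x = rng.choice(T.shape[0], p=T[x])
        out.append(int(x))
    return out

print(pi)          # for this T: [5/6, 1/6]
print(sample(10))
```

For this particular $T$, symbol 0 is strongly self-reinforcing, so stationary realizations dwell mostly on 0.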
Markov chains are sufficient to represent Markov processes, but we can describe the more general class of hidden Markov processes by allowing for internal states that are not directly observable. These processes are generated by hidden Markov chains (HMCs), which are defined by the triple ( S , X , T ) :
  • A finite set $\mathcal{S} = \{\sigma_1, \ldots, \sigma_n\}$ of internal states.
  • A finite alphabet $\mathcal{X}$ of $m$ symbols $x \in \mathcal{X}$.
  • A set $\mathcal{T} = \{T^{(x)} : x \in \mathcal{X}\}$ of $m$ symbol-labeled $n \times n$ transition matrices. That is, if the source is in state $\sigma_i$, with probability $T^{(x)}_{ij} = \Pr(x, \sigma_j | \sigma_i)$, it emits symbol $x$ while transitioning to state $\sigma_j$.
We represent each possible transition between states as an edge between their nodes labeled with the emitted symbol and the transition probability. An HMC’s stationary distribution $\pi$ over $\mathcal{S}$ uniquely satisfies $\pi = \pi \sum_{x} T^{(x)}$.
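A minimal generator sketch, using the well-known Golden Mean Process as a hypothetical example: two hidden states A (index 0) and B (index 1), where B must emit 1, so the process never produces two consecutive 0s.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Symbol-labeled transition matrices T[x][i, j] = Pr(x, sigma_j | sigma_i)
# for the Golden Mean Process over states {A=0, B=1} and symbols {0, 1}.
T = {
    0: np.array([[0.0, 0.5],     # A emits 0 and moves to B
                 [0.0, 0.0]]),
    1: np.array([[0.5, 0.0],     # A emits 1 and stays in A
                 [1.0, 0.0]]),   # B emits 1 and returns to A
}

def generate(length, state=0):
    """Emit `length` symbols from the HMC while tracking the hidden state."""
    symbols = sorted(T)
    out = []
    for _ in range(length):
        probs = np.array([T[x][state].sum() for x in symbols])
        x = int(rng.choice(symbols, p=probs))
        state = int(np.argmax(T[x][state]))  # unique successor here (unifilar)
        out.append(x)
    return out

seq = generate(30)
print(seq)   # no two consecutive 0s ever appear
```

The hidden state is what forbids `00`: after a 0 the chain sits in B, which can only emit 1.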
Any HMC that exactly reproduces a process’s statistical features is a generative HMC. This is an important distinction, since only some of those also belong to the more restrictive class of predictive HMCs. An HMC is predictive if its state at time t + 1 is completely determined by the state at time t and the emitted symbol. This property is known as unifilarity.
At this point, we must emphasize the difference between a process and a particular presentation of that process. This distinction is critical when designating processes and models to be ‘classical’ or ‘quantum.’ A discrete-time classical stochastic process is classical because it consists of a chain of classical random variables. Markov chains and HMCs are classical models because their internal states and dynamics are both classical. One may instead construct a presentation of a classical process with a set of quantum states that the model transitions between via some quantum dynamic. An observer can recover the classical process’s statistics by taking sequential measurements on either the system or on ancilla qudits that interact with the system at each timestep. The simulation of classical stochastic processes with quantum resources is the objective of quantum computational mechanics. There, a class of quantum models (q-simulators) shows an advantage in terms of memory requirements over provably minimal classical predictive models ($\varepsilon$-machines) [28,29,30,31,32]. Likewise, different presentations of a quantum process may have an underlying dynamic that is either classical or quantum. We turn now to quantum processes and their presentations.

2.3. Quantum Processes

Discrete-time classical stochastic processes consist of one classical random variable for each timestep. Likewise, discrete-time quantum stochastic processes consist of one quantum state $|\psi_t\rangle \in \mathcal{H}_d$ at each timestep. We first describe an i.i.d. quantum information source and then generalize to sources with memory.

2.3.1. Memoryless

A discrete-time quantum information source emits a d-level quantum system or qudit at each timestep. The statistical mixture of the infinite qudit sequences emitted by a source is a quantum process. As in the classical setting, different classes of quantum processes are distinguished by their temporal correlations. Now, however, for quantum sources we must use quantum information theory to account for both classical and quantal correlations.
First, consider the output of an i.i.d. (memoryless) quantum information source. Let $\mathcal{H}$ be a $d$-dimensional Hilbert space with pure states $|\psi_x\rangle \in \mathcal{H}$. A $d$-level i.i.d. quantum information source consists of a set $\mathcal{Q}$ of pure qudit states and a probability distribution over those states such that $\Pr(|\psi_x\rangle) > 0$ for all $|\psi_x\rangle \in \mathcal{Q}$. We refer to $\mathcal{Q}$ as a quantum alphabet and consider only quantum alphabets with a finite number of pure states.
At each discrete timestep $t$, the source emits state $|\psi_x\rangle$ with probability $\Pr(|\psi_x\rangle)$. The resulting ensemble is described by the $d \times d$ density matrix:
$$\rho_{iid} = \sum_{|\psi_x\rangle \in \mathcal{Q}} \Pr(|\psi_x\rangle)\, |\psi_x\rangle\langle\psi_x|. \qquad (3)$$
This particular pure-state decomposition of ρ i i d is not unique. Moreover, an observer cannot determine Q through observations—unless Q consists of only one state—since many pure-state ensembles correspond to the same density matrix.
If an i.i.d. source emits $\rho_{iid}$ at each timestep, then the quantum process generated by the source is simply the infinite tensor product state:
$$|\Psi_{iid}\rangle = \rho_{iid} \otimes \rho_{iid} \otimes \rho_{iid} \otimes \cdots. \qquad (4)$$
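A brief numerical sketch of Equation (3) (the alphabet below is a hypothetical choice, not the paper’s): two nonorthogonal qubit states mixed into $\rho_{iid}$, illustrating that a different pure-state ensemble (here, the eigendecomposition) reproduces the same density matrix.

```python
import numpy as np

# Hypothetical quantum alphabet Q = {|0>, |+>} with Pr = {0.7, 0.3}.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
ensemble = [(0.7, ket0), (0.3, ketp)]

# rho_iid = sum_x Pr(|psi_x>) |psi_x><psi_x|   (Equation (3))
rho_iid = sum(p * np.outer(psi, psi) for p, psi in ensemble)
print(rho_iid)

# The eigendecomposition is a *different* pure-state ensemble for the
# same density matrix, so an observer cannot recover Q from rho_iid.
evals, evecs = np.linalg.eigh(rho_iid)
rho_check = sum(ev * np.outer(v, v) for ev, v in zip(evals, evecs.T))
assert np.allclose(rho_iid, rho_check)
```

This is exactly the nonuniqueness the text describes: many distinct preparations yield the same observable statistics.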

2.3.2. Memoryful

We cannot describe non-i.i.d. sources using a single probability distribution over $\mathcal{Q}$ but must introduce a probability distribution over sequences of states drawn from $\mathcal{Q}$. We do this by associating each element of $\mathcal{Q}$ with an element in the symbol alphabet $\mathcal{X}$ of an underlying classical stochastic process $X$. Infinite qudit sequences then inherit probabilities from $X$. This construction results in qudit sequences that are separable—i.e., not entangled.
We express the relationship between symbols and pure quantum states via a memoryless classical-quantum channel $\mathcal{E}: \mathcal{X} \to \mathcal{H}$, taking $x \mapsto |\psi_x\rangle$. This is also known as a preparation channel (or encoder); see Appendix B for more.
Preparation channels are dual to measurement channels, described later, that map quantum states to classical probability distributions and, via sampling, to particular symbols.
For the classical process $X$ whose realizations consist of symbols $x \in \mathcal{X}$, the associated quantum alphabet $\mathcal{Q}$ is constructed by passing each element of $\mathcal{X}$ through $\mathcal{E}$ such that $\mathcal{Q} = \{\mathcal{E}(x)\}$. Thus, $\mathcal{Q}$ is completely determined by $X$ and $\mathcal{E}$. For example, in the i.i.d. case, Equation (3) can also be written as $\rho_{iid} = \mathcal{E}(X)$, and each possible pure-state decomposition of $\rho_{iid}$ can now be interpreted as a different combination of classical random variable $X$ and preparation channel $\mathcal{E}$.
In a slight abuse of notation, we write $|\Psi_:\rangle = \mathcal{E}(X)$ to indicate that the quantum process $|\Psi_:\rangle$ is formed by passing each random variable of $X$ through the classical-quantum channel $\mathcal{E}$.
Note that an infinite qudit sequence (separable or entangled) can be viewed as a one-dimensional lattice of qudits indexed by t Z . These possibly entangled states can be described in full generality using an operator algebraic approach. We ground our formal definition of quantum processes in this mathematical setting. (Reference [17] provides a more detailed treatment of observable algebras for entangled qudit sequences over Z ).
Let $\mathcal{B}_t$ be the $d$-dimensional matrix algebra describing all possible observables on lattice site $t$. (For $d = 2$, the space of observables is spanned by the identity and the $2 \times 2$ complex Pauli (Hermitian, unitary) matrices). The state of the qudit at site $t$ can be described by the density matrix $\rho_t$ acting on $\mathcal{H}_t$ of dimension $d$. For a block of consecutive qudits, all observables can be described by the joint algebra over sites of the lattice, $\mathcal{B}_{0:l} = \bigotimes_{t=0}^{l-1} \mathcal{B}_t$, and the state of this block is $\rho_{0:l}$ acting on $\mathcal{H}_{0:l} = \bigotimes_{t=0}^{l-1} \mathcal{H}_t$, a Hilbert space of dimension $d^l$. Combining all local algebras allows one to define an algebra $\mathcal{B}$ over the infinite lattice. A quantum process is a particular state over the infinite lattice and can be written as $|\Psi_:\rangle$.
As a necessary first step and to more readily adapt information-theoretic tools from classical processes, we return to the more restricted case: separable sequences of qudits drawn from a finite alphabet $\mathcal{Q}$ of pure qudit states. Given a classical word $w = x_0 x_1 \cdots x_{l-1} \in \mathcal{X}^l$ and a preparation channel $\mathcal{E}: \mathcal{X} \to \mathcal{Q}$, a qudit sequence takes the form
$$|\psi_w\rangle = \mathcal{E}(w) = |\psi_{x_0}\rangle \otimes |\psi_{x_1}\rangle \otimes \cdots \otimes |\psi_{x_{l-1}}\rangle, \qquad (5)$$
where $l$ is the length of the sequence, and $|\psi_w\rangle \in \mathcal{H}_{0:l}$. Note that $\dim(\mathcal{H}_{0:l}) = d^l$, $\mathcal{Q} = \{\mathcal{E}(x)\}$, and the number of possible qudit sequences of length $l$ is $|\mathcal{Q}|^l$ or—assuming all $|\psi_x\rangle$ are distinguishable—$|\mathcal{X}|^l$.
A separable quantum process is then defined by $|\Psi_:\rangle = \mathcal{E}(X)$. Different preparations—i.e., different combinations of $X$ and $\mathcal{E}$—may produce the same quantum process.
The set of length-$l$-block density matrices for a quantum process is given by
$$\rho_{0:l} = \mathcal{E}(X_{0:l}) = \sum_{w \in \mathcal{X}^l} \Pr(|\psi_w\rangle)\, |\psi_w\rangle\langle\psi_w|, \qquad (6)$$
where the $|\psi_w\rangle$ are the separable vectors given in Equation (5). Conveniently, their probabilities are determined by those of the underlying classical stochastic process $\Pr(X)$: $\Pr(|\psi_w\rangle) = \Pr(w)$. Each $\rho_{0:l}$ is a finite subsystem of the pure quantum state $|\Psi_:\rangle$ over the infinite lattice. We use left-inclusive/right-exclusive indexing for density matrices as well.
For a given $\rho_{0:l}$, one can also obtain a purification in a finite-dimensional Hilbert space [33]. It is important to note that, since $\rho_{0:l}$ does not have a unique pure-state decomposition, one cannot generally reconstruct the probabilities $\Pr(|\psi_w\rangle)$ from it. Rather, $\rho_{0:l}$ contains only information accessible to an observer. And, if $\mathcal{Q}$ contains nonorthogonal qudit states ($\langle\psi_x|\psi_{x'}\rangle \neq 0$ for some $|\psi_x\rangle, |\psi_{x'}\rangle \in \mathcal{Q}$), then an observer cannot unambiguously distinguish them.
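A sketch of Equation (6) under an assumed setup (a two-symbol Markov chain and the preparation channel 0 → |0⟩, 1 → |+⟩, both hypothetical): building $\rho_{0:l}$ from word probabilities and checking that the source’s memory survives in the block state.

```python
import numpy as np
from functools import reduce
from itertools import product

# Hypothetical preparation channel E: 0 -> |0>, 1 -> |+>.
kets = {0: np.array([1.0, 0.0]),
        1: np.array([1.0, 1.0]) / np.sqrt(2)}

# Underlying classical Markov chain over {0, 1} with stationary pi = pi T.
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5 / 6, 1 / 6])

def block_density_matrix(l):
    """rho_{0:l} = sum_w Pr(w) |psi_w><psi_w| over all words w of length l."""
    rho = np.zeros((2 ** l, 2 ** l))
    for w in product((0, 1), repeat=l):
        p = pi[w[0]]
        for a, b in zip(w, w[1:]):
            p *= T[a, b]                      # Markov word probability
        psi_w = reduce(np.kron, [kets[x] for x in w])
        rho += p * np.outer(psi_w, psi_w)
    return rho

rho1 = block_density_matrix(1)
rho2 = block_density_matrix(2)
print(np.trace(rho2))                          # ≈ 1.0
# Memory shows up as rho_{0:2} differing from rho_{0:1} (x) rho_{0:1}:
print(np.allclose(rho2, np.kron(rho1, rho1)))  # → False
```

For an i.i.d. underlying process the second check would instead return `True`, recovering the product form discussed below.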
In addition to separability, we also focus on stationary quantum processes, meaning
$$\rho_{0:l} = \rho_{t:l+t},$$
for all $l \in \mathbb{Z}^+$ and $t \in \mathbb{Z}$. If $X$ is stationary, then $|\Psi_:\rangle = \mathcal{E}(X)$ will be stationary by construction.
For an i.i.d. quantum process, the joint probabilities of $X$ factor as in Equation (1), giving the quantum process the form of Equation (4). The length-$l$-block density matrix is represented by a product state as follows:
$$\rho_{0:l} = \bigotimes_{t=0}^{l-1} \rho_{iid},$$
with $\rho_{iid}$ taking the form in Equation (3).
For an underlying classical process $X$ that is Markov, there are additional subtleties. Joint probabilities of $X$ factor as in Equation (2), so the joint probabilities of $|\Psi_:\rangle = \mathcal{E}(X)$ also factor so that
$$\Pr(|\psi_w\rangle) = \Pr(|\psi_{x_0}\rangle)\,\Pr(|\psi_{x_1}\rangle \,\big|\, |\psi_{x_0}\rangle) \cdots \Pr(|\psi_{x_{l-1}}\rangle \,\big|\, |\psi_{x_{l-2}}\rangle).$$
However, an observer cannot reliably distinguish between different states | ψ x when measuring a quantum process, and the underlying Markov dynamic is hidden from observation. Thus, the general setting for memoryful quantum processes is that of hidden Markov processes. These are best introduced using concrete models that directly represent a process’s structure.

2.4. Presentations of Quantum Processes

A presentation for a quantum process is a model with internal states and a transition dynamic between them that emits pure quantum states rather than classical symbols. As for presentations of classical processes, we depict them with state transition diagrams. When $X$ is a Markov or hidden Markov process, $|\Psi_:\rangle = \mathcal{E}(X)$ can be represented with an extension of Ref. [14]’s classically controlled qubit sources (cCQSs) as follows.
A hidden Markov chain quantum source (HMCQS) is a triple $(\mathcal{S}, \mathcal{Q}, \mathcal{T})$ consisting of
  • A finite set $\mathcal{S} = \{\sigma_1, \ldots, \sigma_n\}$ of internal states.
  • A finite alphabet $\mathcal{Q} = \{|\psi_0\rangle, \ldots, |\psi_{m-1}\rangle\}$ of pure qudit states, with each $|\psi_x\rangle \in \mathcal{H}_d$.
  • A set $\mathcal{T} = \{T^{(x)} : |\psi_x\rangle \in \mathcal{Q}\}$ of $m$ $n \times n$ transition matrices. That is, if the source is in state $\sigma_i$, with probability $T^{(x)}_{ij} = \Pr(|\psi_x\rangle, \sigma_j | \sigma_i)$, it emits qudit $|\psi_x\rangle$ while transitioning to internal state $\sigma_j$.
As with HMCs, the stationary distribution for an HMCQS satisfies $\pi = \pi \sum_{x} T^{(x)}$.
Any HMCQS that exactly reproduces a quantum process is a generative HMCQS or generator of the process. Though quantum models cannot be predictive in the same sense as classical models, we can define an analog to classical unifilarity. An HMCQS is quantum unifilar if, for every state $\sigma \in \mathcal{S}$ at time $t$, there exists at most a single measurement that determines the internal state at time $t + 1$. This closely parallels the definition of unifilarity in classical stochastic processes. We discuss several implications of quantum unifilarity later.
We call an HMCQS a classical controller of a quantum process, since there is nothing quantal about its internal states or transition dynamic. This is in contrast to related classes of quantum models that evolve a finite quantum system according to a quantum operation (defined via a set of Kraus operators) at each timestep. These include Quantum Markov Chains (QMCs) [34] and Hidden Quantum Markov Models (HQMMs) [35,36,37]. While HMCQSs emit separable quantum states, QMCs and HQMMs generate sequences of measurement outcomes (each corresponding to a particular Kraus operator) that form classical stochastic processes.
Anticipating future effort, we consider it worthwhile to draw out several observations on entanglement between successive qudits at this point. Entanglement means that finite-length qudit sequences are not separable and so are not described by Equation (5). Moreover, their sequence probabilities cannot be straightforwardly defined with reference to an underlying classical stochastic process.
That said, there are systematic ways of defining stationary $|\Psi_:\rangle$ such that the set $\{\rho_{0:l}\}$ of marginals describes all measurements over blocks of qudits. For example, if the source’s internal structure consists of a $D$-dimensional quantum system interacting unitarily with one qudit per time step, it generates a matrix product state (MPS) with a maximum bond dimension of $D$ [38]. If the source operates stochastically (rather than unitarily), then many different MPSs can be emitted with varying probabilities. The collection is then described by matrix product density operators (MPDOs) [39]. We refer to these as entangled qudit processes. Their dynamical and informational analyses are left for elsewhere. The present goal is to lay out the basics for those efforts.

2.5. Measured Processes

An agent observing a quantum process has many ways to measure it. Let $M$ represent a measurement applied to the qudit in state $\rho$. In general, $M$ is a positive operator-valued measure (POVM) described by a set of positive semi-definite Hermitian operators $\{E_y\}$ on the Hilbert space $\mathcal{H}$ of dimension $d$. Each $E_y$ corresponds to a possible measurement outcome $y$, and the POVM elements must sum to the identity:
$$\sum_y E_y = I.$$
Projection-valued measures (PVMs) are an important subclass of POVMs with an additional constraint: the operators $E_y$ must be orthogonal projectors. PVMs have at most $d$ elements. A PVM consisting only of rank-one projectors on $\mathcal{H}_d$ is a von Neumann measurement and has exactly $d$ elements [5].
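A quick validity check, as a sketch (the trine POVM below is a standard textbook example, chosen here purely for illustration): a set of operators forms a POVM when each element is positive semidefinite and the elements sum to the identity.

```python
import numpy as np

def is_povm(elements, atol=1e-9):
    """True when every E_y is positive semidefinite and sum_y E_y = I."""
    d = elements[0].shape[0]
    psd = all(np.linalg.eigvalsh(E).min() >= -atol for E in elements)
    complete = np.allclose(sum(elements), np.eye(d), atol=atol)
    return psd and complete

# A 3-element "trine" POVM on a qubit: subnormalized projectors onto
# three real states whose Bloch vectors are 120 degrees apart.
trine = []
for k in range(3):
    theta = k * np.pi / 3
    phi = np.array([np.cos(theta), np.sin(theta)])
    trine.append((2 / 3) * np.outer(phi, phi))

print(is_povm(trine))                                # → True
# A projective measurement (PVM) is also a valid POVM:
pvm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
print(is_povm(pvm))                                  # → True
```

Note that the trine has three elements on a two-dimensional space, so it is a POVM but not a PVM, matching the constraint stated above.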
A set of measurements applied to a block of qudits is described by some block POVM $M_{0:l}$ with elements $\{E_{y_{0:l}}\}$ on the Hilbert space $\mathcal{H}_{0:l}$ of dimension $d^l$. $M_{0:l}$ may include measurements in the joint basis of multiple qudits—measurements essential for fully characterizing entangled processes.
For separable processes we focus on “local” measurements—operators on a single qudit. The measurement operator for a block of qudits then takes a tensor product structure:
$$M_{0:l} = \bigotimes_{t=0}^{l-1} M_t,$$
where each $M_t$ is a POVM on $\mathcal{H}_t$.
If we apply the same local POVM $M$ to each qudit, then
$$M_{0:l} = \bigotimes_{t=0}^{l-1} M. \qquad (11)$$
We refer to this as a repeated POVM measurement.
An observer can also have multiple POVMs at their disposal and apply different measurements at different time steps according to some measurement protocol. We describe measurement protocols in more detail shortly.
For simplicity, the following ignores $\rho_{0:l}$’s post-measurement state and considers only the measurement outcomes $y_{0:l}$. Thus, we take $M_{0:l}$ to be a stochastic map $\rho_{0:l} \mapsto Y_{0:l}$, with the random variables representing measurement outcomes.
When applying a measurement of the form of Equation (11) to a finite block of qudits, the outcomes factor into a block of classical random variables:
$$Y_{0:l} = Y_0 Y_1 \cdots Y_{l-1} = M_{0:l}(\rho_{0:l}),$$
where the possible values of each $Y_t$ are the POVM measurement outcomes $y \in \mathcal{Y}$. There are $|\mathcal{Y}|^l$ possible realizations of $Y_{0:l}$. We write a realization (word) of length $l$ as $y_{0:l}$.
The probability of any particular measurement outcome for a block of qudits in state $\rho_{0:l}$ is
$$\Pr(y_{0:l}) = \mathrm{tr}(E_{y_{0:l}}\, \rho_{0:l}). \qquad (12)$$
For $\rho_{0:l}$ with the separable form of Equation (6) and identical POVM measurements on each qudit as in Equation (11), we can decompose $E_{y_{0:l}}$ into local operators $E_{y_t}$ as follows:
$$\Pr(y_{0:l}) = \mathrm{tr}\Big(E_{y_{0:l}} \sum_{w \in \mathcal{X}^l} \Pr(w)\, |\psi_w\rangle\langle\psi_w|\Big) = \mathrm{tr}\Big(\sum_{w \in \mathcal{X}^l} \Pr(w) \bigotimes_{t=0}^{l-1} E_{y_t} |\psi_{x_t}\rangle\langle\psi_{x_t}|\Big).$$
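A numerical sketch of these outcome probabilities under assumed ingredients (an i.i.d. word distribution with Pr(0) = 0.7, the computational-basis PVM, and the preparation 0 → |0⟩, 1 → |+⟩, all hypothetical):

```python
import numpy as np
from functools import reduce
from itertools import product

E_loc = {0: np.diag([1.0, 0.0]),               # local PVM elements E_y
         1: np.diag([0.0, 1.0])}
kets = {0: np.array([1.0, 0.0]),               # preparation: 0 -> |0>
        1: np.array([1.0, 1.0]) / np.sqrt(2)}  #              1 -> |+>

def pr_word(w):
    """Hypothetical i.i.d. source: Pr(0) = 0.7, Pr(1) = 0.3."""
    return float(np.prod([0.7 if x == 0 else 0.3 for x in w]))

def pr_outcome(y):
    """Pr(y_{0:l}) = tr(E_{y_{0:l}} rho_{0:l}) with local E_{y_t} factors."""
    l = len(y)
    E_block = reduce(np.kron, [E_loc[yt] for yt in y])
    rho = np.zeros((2 ** l, 2 ** l))
    for w in product((0, 1), repeat=l):
        psi = reduce(np.kron, [kets[x] for x in w])
        rho += pr_word(w) * np.outer(psi, psi)
    return float(np.trace(E_block @ rho))

total = sum(pr_outcome(y) for y in product((0, 1), repeat=2))
print(total)              # ≈ 1.0: outcome probabilities are normalized
print(pr_outcome((1, 1))) # ≈ 0.0225, since Pr(y=1) = 0.3 * |<1|+>|^2 = 0.15 per site
```

Because both the source and the measurement factorize here, the block probability is just a product of single-site probabilities; a memoryful source would break that factorization.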
For a separable qudit process, a sequence $y_{0:l}$ of local measurement outcomes can also be interpreted as the result of sending the random variables $X_{0:l}$ from $X$ over the same memoryless noisy channel $C: \mathcal{X} \to \mathcal{Y}$. $C$ decomposes into the deterministic preparation $\mathcal{E}$ and our stochastic measurement $\mathcal{M}$:
$$C = \mathcal{M} \circ \mathcal{E}.$$
(Appendix B presents a more thorough description of the classical-quantum channels $\mathcal{E}$ and $\mathcal{M}$).
This construction makes it clear that the measurement outcomes correspond to classical random variables $Y_t$ that take values $y \in \mathcal{Y}$ and form a classical process $Y$, with probabilities defined by Equation (12). To express the relationship between a quantum process and a measured classical process, we write
$$Y = \mathcal{M}(|\Psi_:\rangle),$$
where $\mathcal{M}$ is a repeated, local POVM. If the qudit process is separable, we can also write
$$Y = C(X).$$

2.6. Adaptive Measurement Protocols

An observer does not need to repeat the same measurement on every qudit but may apply different POVMs at different time steps according to some algorithm. If the observer uses past measurement outcomes to inform their choice of POVM, we say that they are using an adaptive measurement protocol.
The following limits discussion to measurement protocols that have a deterministic finite automaton (DFA) as their underlying controller. Similar constructions combining quantum measurement and DFAs have appeared in the context of quantum grammars [40,41,42].
A deterministic quantum measurement protocol (DQMP) is defined by the quintuple $(\mathcal{S}, S_0, \mathcal{M}, \mathcal{Y}, \delta)$:
  • A finite set $\mathcal{S} = \{\sigma_1, \ldots, \sigma_n\}$ of internal states.
  • A unique start state $S_0 \in \mathcal{S}$.
  • A set of POVMs $\mathcal{M} = \{M_\sigma\}_{\sigma \in \mathcal{S}}$, one for each internal state.
  • An alphabet $\mathcal{Y}$ of $m$ symbols corresponding to different measurement outcomes.
  • A deterministic transition map $\delta: \mathcal{S} \times \mathcal{Y} \to \mathcal{S}$.
If $\mathcal{S}$ consists of only $S_0$, then the DQMP is a repeated POVM measurement for POVM $M_{S_0}$. When $\mathcal{S}$ has more than one internal state, the POVMs corresponding to different states may have the same or a different number of elements. Likewise, the symbol sets corresponding to their measurement outcomes may be disjoint, or symbols may be repeated.
We can place the following bounds on the size of the set $\mathcal{Y}$: $m \leq \sum_s |\{E_{s,y}\}|$, where $\{E_{s,y}\}$ is the set of operators corresponding to POVM $M_s$, and $m \geq \max_s |\{E_{s,y}\}|$, the size of the POVM with the most elements.
For DQMP M and qudit process | Ψ : , obtaining a measured process Y = M ( | Ψ : ) is generically more difficult than for the case of repeated POVM measurements. When an observer begins using protocol M at t = 0 , they experience two distinct operating regimes: first the transient dynamic and then the recurrent dynamic. We briefly outline this process and return to the subject when we describe synchronization—a task deeply related to the transient dynamic—in Section 5.
M begins in state S_0 at t = 0. For a given (stationary, ergodic) input |Ψ_:⟩, as t → ∞, the DQMP approaches a stationary distribution over a subset of its internal states, π = {Pr(σ_i) > 0, for all σ_i ∈ S_r}, where S_r ⊆ S is the set of recurrent states. This distribution (and even which states are in S_r) depends on |Ψ_:⟩. The recurrent dynamic is determined by this stationary distribution and the transition probabilities between states in S_r. Any state not in S_r is in the transient state set S_t.
The transient dynamic describes how M goes from S_0 at t = 0 to its recurrent dynamic over S_r, which may occur at a finite time t = t_sync or only asymptotically as t → ∞. In general, the two dynamics produce two distinct measured processes: Y_r, which is stationary and ergodic by construction, and Y_{0:t_sync}, which is not. The final measured process thus has two components, i.e., Y = {Y_r, Y_{0:t_sync}}.

2.7. Discussion

These nested layers of complication suggest working through a concrete example and restating the overall goals.
Imagine that an observer measures a single qubit from a quantum source that emitted state ρ_t, using a projective measurement in the computational basis M_01 = {|0⟩⟨0|, |1⟩⟨1|}. The possible measurement outcomes y_0 = 0 and y_0 = 1 occur with the following probabilities:
Pr(y_0 = 0) = tr(|0⟩⟨0| ρ_t), Pr(y_0 = 1) = tr(|1⟩⟨1| ρ_t),
respectively. These two values determine the distribution for the random variable Y_0 and, by applying the same projective measurement to ρ_{0:ℓ}, we completely determine the statistics Pr(Y_{0:ℓ}) of the measured block. Continuing this procedure for ℓ → ∞ defines the measured process Y.
Naturally, the observer can also choose to apply measurements in another basis, e.g., M_± = {|+⟩⟨+|, |−⟩⟨−|}, where |±⟩ = (1/√2)(|0⟩ ± |1⟩). This typically results in a measured process Y′ with radically different statistical features.
Finally, an observer could use an adaptive measurement protocol M. It starts in state s = S_0 and measures with M_01. If y_0 = 0, it stays in S_0 and continues using M_01. If y_0 = 1, it transitions to a new internal state and uses M_± on the next qubit. Regardless of the outcome of M_±, it returns to S_0 and measures the next qubit with M_01. The measured process Y″ will be distinct from both Y and Y′ and may consist of both a transient and a recurrent component.
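This two-state adaptive protocol can be simulated directly. Below is a minimal Python/NumPy sketch; the helper names (`measure`, `run_dqmp`, `M01`, `Mpm`) are ours, and the input source, an i.i.d. sequence of maximally mixed qubits, is an illustrative choice rather than anything fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(7)

# Projector elements for the two single-qubit measurements in the example.
P0 = np.array([[1.0, 0.0], [0.0, 0.0]])
P1 = np.array([[0.0, 0.0], [0.0, 1.0]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
M01 = {"0": P0, "1": P1}
Mpm = {"+": np.outer(plus, plus), "-": np.outer(minus, minus)}

def measure(rho, povm):
    """Sample an outcome y with probability Pr(y) = tr(E_y rho)."""
    outcomes = list(povm)
    probs = np.array([np.trace(povm[y] @ rho).real for y in outcomes])
    return outcomes[rng.choice(len(outcomes), p=probs / probs.sum())]

def run_dqmp(qudits, n):
    """Two-state DQMP from the text: S0 measures M01; outcome '1' switches
    to S1, which measures Mpm once and unconditionally returns to S0."""
    state, record = "S0", []
    for _ in range(n):
        rho = next(qudits)
        if state == "S0":
            y = measure(rho, M01)
            state = "S1" if y == "1" else "S0"
        else:
            y = measure(rho, Mpm)
            state = "S0"
        record.append(y)
    return "".join(record)

def iid_qubits():
    """Illustrative source: i.i.d. maximally mixed qubits."""
    while True:
        yield np.eye(2) / 2

ys = run_dqmp(iid_qubits(), 2000)
```

By construction, every outcome "1" is followed by a "+" or "−" symbol, and those symbols appear only there, so the measured alphabet mixes the two POVMs' outcome sets exactly as described above.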
With this setting laid out, we can now more precisely state the questions the following development answers:
  • Given the density matrices ρ 0 : l describing sequences of separable qudits, what are the general properties of sequences Y 0 : l of measurement outcomes? This is Section 3’s focus. There, ρ 0 : l ’s quantum information properties bound the classical information properties of measurement sequences Y 0 : l for certain classes of measurements.
  • Given a hidden Markov chain quantum source, when is an observer with knowledge of the source able to determine the internal state (synchronize)? Can the observer remain synchronized at later times? Section 5 addresses this.
  • If an observer encounters an unknown qudit source, how accurately can the observer estimate the informational properties of the emitted process through tomography with limited resources? How can they build approximate models of the source if they reconstruct ρ_{0:ℓ} for some finite ℓ? This is Section 6’s subject.
Additionally, Section 4 illustrates these general results and the required analysis methods using specific examples of qudit processes.

3. Information in Quantum Processes

We wish to develop an information-theoretic analysis of quantum processes for which the observed sequences depend on the observer’s choice of measurement. (Much of this parallels the classical information measures reviewed in Appendix A). This requires a more general approach using density matrices ρ_{0:ℓ} that contain all the information necessary to describe the outcome of any measurement performed on ℓ-qudit blocks. We use quantum information theory to study properties of the set of ρ_{0:ℓ} and then relate them to classical properties of measurement sequences described in Appendix A. We begin by briefly reviewing several basic quantities in quantum information theory. References [6,33] give a more complete picture of the subject.

3.1. von Neumann Entropy

In quantum information theory, the von Neumann entropy plays a role similar to that of the Shannon entropy in classical information theory. Given a mixed quantum state ρ , the von Neumann entropy is
S(ρ) = −tr(ρ log₂ ρ) = −∑_i λ_i log₂ λ_i,
where λ_i are the eigenvalues of the density matrix ρ. S(ρ) = 0 if and only if ρ is a pure state. We use log₂(·); therefore, the units of the von Neumann entropy will be bits.
From Equation (13), the von Neumann entropy is the Shannon entropy of the eigenvalue distribution of density matrix ρ . Therefore,
S(ρ) = min_M H[M(ρ)],
where H[·] is the Shannon entropy, and the minimum is taken over the set of all rank-one POVMs. The minimum is always achieved by a PVM whose projectors compose ρ’s eigenbasis [6]. We use brackets to indicate that M(ρ) is a classical probability distribution over the measurement outcomes.
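Equation (14) is easy to check numerically for a single qubit. The following is a minimal sketch (the helper names `shannon`, `von_neumann`, and `pvm_entropy` are ours, and the particular mixed state is an illustrative choice): measuring in ρ's eigenbasis reproduces S(ρ), while any other basis yields a larger outcome entropy.

```python
import numpy as np

def shannon(p):
    """Shannon entropy (bits) of a probability vector, ignoring zeros."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

def von_neumann(rho):
    """S(rho): Shannon entropy of the eigenvalue distribution."""
    return shannon(np.linalg.eigvalsh(rho))

def pvm_entropy(rho, basis):
    """Outcome entropy for the PVM whose projectors are the columns of
    the orthonormal matrix `basis`."""
    probs = [(basis[:, i] @ rho @ basis[:, i]).real for i in range(basis.shape[1])]
    return shannon(probs)

# A qubit state that is mixed and not diagonal in the computational basis.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = 0.6 * np.outer(plus, plus) + 0.4 * np.array([[1.0, 0.0], [0.0, 0.0]])

S = von_neumann(rho)
H_eigen = pvm_entropy(rho, np.linalg.eigh(rho)[1])  # eigenbasis: attains S(rho)
H_comp = pvm_entropy(rho, np.eye(2))                # computational basis: larger
```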
To monitor correlations between two quantum systems, we use the quantum relative entropy:
S(ρ ‖ σ) ≡ tr(ρ log₂ ρ) − tr(ρ log₂ σ),
where ρ and σ are the density operators of the two systems. The quantum relative entropy is non-negative:
S(ρ ‖ σ) ≥ 0,
with equality if and only if ρ = σ, a result known as Klein’s inequality [33].
The joint quantum entropy for a state ρ A B of a bipartite system A B is
S(A, B) = S(ρ_AB) = −tr(ρ_AB log₂ ρ_AB).
We can further define a conditional quantum entropy of system A conditioned on system B as
S ( A | B ) = S ( A , B ) S ( B ) ,
where S(B) = S(ρ_B). Note that S(A|B) ≠ S(B|A) in general.
In contrast to the classical case, the conditional quantum entropy may be negative—a phenomenon leveraged in super-dense coding protocols [6]. Equivalently, the conditional quantum entropy can be written using the quantum relative entropy as
S(A|B) = −S(ρ_AB ‖ I_A ⊗ ρ_B) = log₂(d_A) − S(ρ_AB ‖ (I_A/d_A) ⊗ ρ_B),
where I A is the identity operator on Hilbert space H A with dimension d A .
The quantum mutual information between quantum subsystems A and B is given by
S(A : B) = S(A) − S(A|B) = S(A) + S(B) − S(A, B).
The quantum mutual information is symmetric and non-negative. If the joint system A B is in a pure state, then S ( A , B ) will be zero, and S ( A ) = S ( B ) . It can also be expressed as a quantum relative entropy as
S(A : B) = S(ρ_AB ‖ ρ_A ⊗ ρ_B).
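These identities are straightforward to verify numerically. Below is a minimal sketch for a classically correlated two-qubit state (an illustrative choice, not taken from the text); the helper names `logm2` and `rel_ent` are ours:

```python
import numpy as np

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def logm2(rho):
    # Matrix logarithm (base 2) via eigendecomposition; eigenvalues are
    # clipped so that zero-support directions contribute nothing to traces.
    w, V = np.linalg.eigh(rho)
    return V @ np.diag(np.log2(np.maximum(w, 1e-15))) @ V.conj().T

def rel_ent(rho, sigma):
    """S(rho || sigma) = tr(rho log rho) - tr(rho log sigma)."""
    return float(np.trace(rho @ (logm2(rho) - logm2(sigma))).real)

# Classically correlated two-qubit state: (|00><00| + |11><11|) / 2
rho_AB = np.diag([0.5, 0.0, 0.0, 0.5])
rho_A = rho_AB.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)  # trace out B
rho_B = rho_AB.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)  # trace out A

mi_from_entropies = vn(rho_A) + vn(rho_B) - vn(rho_AB)       # Equation (19)
mi_from_rel_ent = rel_ent(rho_AB, np.kron(rho_A, rho_B))     # Equation (20)
```

For this state, both routes give one bit of mutual information: the two qubits share one bit of purely classical correlation.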
A few additional well-known properties of the von Neumann entropy facilitate later results. First, each ρ_{0:ℓ} for separable qudit sequences is a finite mixture of states formed from length-ℓ words of an underlying classical process, so the following will be useful.
Lemma 1.
Consider a random variable X that takes values x ∈ {0, 1, …, n} with corresponding probabilities {p_0, p_1, …, p_n}. Given a set of density matrices {ρ_0, ρ_1, …, ρ_n}, the following inequality holds [33]:
S(∑_{x=0}^{n} p_x ρ_x) ≤ H[X] + ∑_{x=0}^{n} p_x S(ρ_x),
with equality if and only if all the ρ_x have support on orthogonal subspaces.
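Lemma 1 can be illustrated with a small numerical check. A minimal sketch, using an equal mixture of two pure qubit states (our illustrative choice): orthogonal supports saturate the bound, while overlapping supports make it strict. Since the mixed-in states are pure, the second term on the right-hand side vanishes.

```python
import numpy as np

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

HX = 1.0  # Shannon entropy H[X] of a fair coin

k0 = np.array([1.0, 0.0])
k1 = np.array([0.0, 1.0])
kplus = np.array([1.0, 1.0]) / np.sqrt(2)

def mixture(ket_a, ket_b):
    """Equal mixture of two pure states; each term has S(rho_x) = 0."""
    return 0.5 * np.outer(ket_a, ket_a) + 0.5 * np.outer(ket_b, ket_b)

S_orth = vn(mixture(k0, k1))        # orthogonal supports: equality with H[X]
S_overlap = vn(mixture(k0, kplus))  # overlapping supports: strictly smaller
```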
Second, since we use quantum channels to both prepare and measure a qudit process, we make use of the fact that the quantum relative entropy is monotonic [6]:
S(ρ ‖ σ) ≥ S(E(ρ) ‖ E(σ)),
where E is any quantum channel. This inequality becomes an equality if and only if there exists a recovery map R such that R ( E ( ρ ) ) = ρ and R ( E ( σ ) ) = σ [43].

3.2. Quantum Block Entropy

Since stationary qudit processes are correlated across time, we explore how the von Neumann entropy for qudit blocks scales with block size. The following gives bounds on the possible measurement sequences one can observe from a quantum information source. As the von Neumann entropy generalizes the Shannon entropy, the results here (and many of the proofs) are natural generalizations of those in Appendix A. We also note that exactly determining ρ_{0:ℓ} becomes practically infeasible for large ℓ. And so, Section 6 addresses how to approximate properties and models for qudit processes when restricted to measurements of finite-length blocks.
For a qudit process, we define the quantum block entropy as the von Neumann entropy of a block of ℓ consecutive qudits:
S(ℓ) ≡ −tr(ρ_{0:ℓ} log₂ ρ_{0:ℓ}).
If ρ_{0:ℓ} is a pure state, S(ℓ) = 0. By the same logic as the classical case, S(0) ≡ 0.
Many properties of the classical block entropy hold for S ( l ) .
Proposition 1.
For a stationary qudit process, S ( l ) is a nondecreasing function of ℓ.
Proof. 
As a consequence of the strong subadditivity of the von Neumann entropy [33],
S(ρ_A) + S(ρ_C) ≤ S(ρ_AB) + S(ρ_BC).
Let ρ A B C = ρ 0 : 2 l + 1 , where ρ A = ρ 0 : l , ρ B = ρ l , and ρ C = ρ l + 1 : 2 l + 1 .
Incorporating qudit process stationarity, we rewrite Equation (22) as follows:
S(ρ_{0:ℓ}) + S(ρ_{ℓ+1:2ℓ+1}) ≤ S(ρ_{0:ℓ+1}) + S(ρ_{ℓ:2ℓ+1}),
S(ℓ) + S(ℓ) ≤ S(ℓ+1) + S(ℓ+1),
S(ℓ) ≤ S(ℓ+1),
where Equation (23) follows from stationarity.
Thus, S(ℓ) ≤ S(ℓ+1) for all ℓ ≥ 0, and S(ℓ) is a nondecreasing function of ℓ. □
Proposition 2.
For a stationary qudit process, S ( l ) is concave.
Proof. 
The von Neumann entropy is strongly subadditive [33], meaning that
S(ρ_ABC) + S(ρ_B) ≤ S(ρ_AB) + S(ρ_BC).
For ℓ ≥ 3, let ρ_ABC = ρ_{0:ℓ}, where ρ_A = ρ_0, ρ_B = ρ_{1:ℓ−1}, and ρ_C = ρ_{ℓ−1}.
We can rewrite Equation (24) by incorporating the stationarity of qudit processes:
S(ρ_{0:ℓ}) + S(ρ_{1:ℓ−1}) ≤ S(ρ_{0:ℓ−1}) + S(ρ_{1:ℓ}),
S(ℓ) + S(ℓ−2) ≤ S(ℓ−1) + S(ℓ−1),
S(ℓ) − 2S(ℓ−1) + S(ℓ−2) ≤ 0,
where Equation (25) follows from stationarity.
Thus, S ( l ) is concave. □
For a separable qudit process formed by passing a classical process through a classical-quantum channel, its block entropies are related in the following way:
Proposition 3.
Let | Ψ : = E ( X ) . The block entropies of | Ψ : and X obey
S(ρ_{0:ℓ}) ≤ H[X_{0:ℓ}],
for all ℓ, with equality if and only if Q consists of |X| orthogonal pure states in H of dimension d ≥ |X|.
Proof. 
Recall from Equation (6) that for separable qudit processes,
ρ_{0:ℓ} = ∑_{w ∈ X^ℓ} Pr(w) |ψ_w⟩⟨ψ_w|,
with each | ψ w taking the separable form of Equation (5) and w = x 0 : l .
We first note that, for all symbols in X to be associated with orthogonal qudit states, the minimum dimension of the Hilbert space is d_min = |X|.
With ρ 0 : l written as a mixture of separable qudit words, we apply Lemma 1 to obtain
S(ρ_{0:ℓ}) ≤ H[X_{0:ℓ}] + ∑_{w ∈ X^ℓ} Pr(w) S(|ψ_w⟩) = H[X_{0:ℓ}],
where w = x_{0:ℓ}. The second term evaluates to zero, since each |ψ_w⟩ is a pure state; that is, S(|ψ_w⟩) = 0 for all w ∈ X^ℓ.
Equality occurs if and only if the states |ψ_w⟩ have support on orthogonal subspaces, which requires that d ≥ |X| and all elements of Q are orthogonal. □
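Proposition 3 can be checked directly for small blocks. A minimal sketch, assuming an i.i.d. biased coin encoded into the (non-orthogonal) quantum alphabet {|0⟩, |+⟩}, which is our illustrative choice; the orthogonal encoding {|0⟩, |1⟩} saturates the bound:

```python
import numpy as np
from itertools import product

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def block_state(p1, kets, ell):
    """rho_{0:l} = sum_w Pr(w) |psi_w><psi_w| for i.i.d. coin flips."""
    dim = 2 ** ell
    rho = np.zeros((dim, dim))
    for w in product((0, 1), repeat=ell):
        pr = np.prod([p1 if x else 1.0 - p1 for x in w])
        psi = np.array([1.0])
        for x in w:
            psi = np.kron(psi, kets[x])
        rho += pr * np.outer(psi, psi)
    return rho

p1 = 0.3
h = -(p1 * np.log2(p1) + (1 - p1) * np.log2(1 - p1))  # H[X_{0:l}] = l * h
k0 = np.array([1.0, 0.0])
k1 = np.array([0.0, 1.0])
kplus = np.array([1.0, 1.0]) / np.sqrt(2)

S_skew = [vn(block_state(p1, [k0, kplus], l)) for l in (1, 2, 3)]  # strict
S_orth2 = vn(block_state(p1, [k0, k1], 2))                          # equality
```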
We cannot use S ( ρ 0 : l ) to bound the block entropy of a measured process Y for general POVM measurements. For the case where the measurement M 0 : l consists only of rank-one POVMs (including all PVMs) however, the following holds:
S(ρ_{0:ℓ}) ≤ H[M_{0:ℓ}(ρ_{0:ℓ})],
with equality if and only if the measurement is performed in the minimum-entropy (eigen)basis of ρ 0 : l . This follows directly from Equation (14).
Proposition 4.
Let Y = M ( | Ψ : ) , where M is a repeated rank-one POVM measurement. The block entropies of | Ψ : and Y then obey
S(ρ_{0:ℓ}) ≤ H[Y_{0:ℓ}],
for all ℓ, with equality if and only if | Ψ : is a separable process with an orthogonal alphabet Q and M uses a POVM whose operators include one projector for each element in Q .
Proof. 
The bound follows directly from Equation (26) because repeated rank-one POVM measurements are a subclass of the more general measurement sequence M 0 : l .
The condition for equality also follows from Equation (26) but requires more justification. First, we consider measuring the single-qubit marginal ρ_0 with POVM M. For equality, each element of the eigenbasis of ρ_0, {|e_i⟩}, must have a corresponding operator in M that is a projector onto that eigenspace, E_i = |e_i⟩⟨e_i|.
We can write ρ_0 = ∑_i p_i |e_i⟩⟨e_i|. Since we apply the same POVM to each qudit, all blocks of length ℓ must have eigenstates of the form ⊗_{t=0}^{ℓ−1} |e_t⟩, i.e., they must take the separable form of Equation (5), making |Ψ_:⟩ a separable process with a quantum alphabet Q of orthogonal states. The M consisting of projectors onto the states in Q is then the minimum-entropy measurement over blocks ρ_{0:ℓ}. Note that there may be other elements of the POVM that are not projectors if the probability of those measurement outcomes is 0 when applied to ρ_{0:ℓ}. (If the process does not make use of that part of Hilbert space, it does not matter how it is measured). □
To summarize, in the case of separable qudit processes, S ( l ) is upper-bounded by the underlying classical process’s block entropy H [ X 0 : l ] . For repeated measurement with rank-one POVMs, S ( l ) serves as a lower bound on the block entropy of all classical measured processes H [ Y 0 : l ] . There is no direct relationship between H [ X 0 : l ] and H [ Y 0 : l ] . Rather, it depends on the specifics of E and M .
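The lower-bound half of this summary can also be checked numerically. A minimal sketch, again using the illustrative i.i.d. {|0⟩, |+⟩} source: for a three-qubit block, the outcome distribution of the repeated computational-basis PVM is ρ's diagonal, whose Shannon entropy exceeds S(ℓ) because this is not ρ's eigenbasis.

```python
import numpy as np
from itertools import product

def shannon(p):
    """Shannon entropy (bits), ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 1e-12]
    return float(-(p * np.log2(p)).sum())

# Three-qubit block of an i.i.d. source emitting |0> w.p. 0.7 and |+> w.p. 0.3
kets = {0: np.array([1.0, 0.0]), 1: np.array([1.0, 1.0]) / np.sqrt(2)}
probs = {0: 0.7, 1: 0.3}
rho = np.zeros((8, 8))
for w in product((0, 1), repeat=3):
    psi = np.array([1.0])
    for x in w:
        psi = np.kron(psi, kets[x])
    rho += np.prod([probs[x] for x in w]) * np.outer(psi, psi)

S_block = shannon(np.linalg.eigvalsh(rho))  # S(l), from the eigenvalues
H_measured = shannon(np.diag(rho))          # repeated computational-basis PVM
```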

3.3. von Neumann Entropy Rate

The von Neumann entropy rate of a qudit process is
s = lim_{ℓ→∞} S(ℓ)/ℓ.
The units of s are bits per time step. This quantity is equivalent to the mean entropy, first introduced in the context of quantum statistical mechanics [44]. The limit exists for all stationary processes [45]. Operationally, s is the optimal coding rate for a stationary quantum source [17].
Proposition 5.
For a stationary qudit process, we can equivalently write the von Neumann entropy rate as
s = lim_{ℓ→∞} S(ρ_0 | ρ_{−ℓ:0}).
Proof. 
This proof closely follows the proof for the classical entropy rate in Ref. [46]. We begin by showing that lim_{ℓ→∞} S(ρ_0 | ρ_{−ℓ:0}) exists and then that it is equivalent to the limit in Equation (27). The limit exists if S(ρ_0 | ρ_{−ℓ:0}) is a decreasing, non-negative function of ℓ:
S(ρ_0 | ρ_{−ℓ:0}) = S(ρ_{−ℓ:1}) − S(ρ_{−ℓ:0})
= S(ℓ+1) − S(ℓ)
≥ 0,
where Equation (29) follows from stationarity, and Equation (30) follows from the nondecreasing nature of S ( l ) . This, combined with the fact that S ( l ) is concave, means that lim l S ( ρ 0 | ρ l : 0 ) exists.
Now, we establish that lim_{ℓ→∞} S(ρ_0 | ρ_{−ℓ:0}) = lim_{ℓ→∞} S(ℓ)/ℓ. Through repeated application of Equation (17) to a block of length ℓ, we obtain the following chain rule for the von Neumann entropy:
S(ρ_{0:ℓ}) = ∑_{j=0}^{ℓ−1} S(ρ_j | ρ_{j−1} ⋯ ρ_0).
We can modify indices (due to stationarity) and divide both sides by ℓ to obtain
S(ρ_{0:ℓ})/ℓ = (1/ℓ) ∑_{i=1}^{ℓ} S(ρ_i | ρ_{i−1} ⋯ ρ_1).
The final steps require the following result, known as the Cesàro mean [46].
Lemma 2.
If a_n → a and b_n = (1/n) ∑_{i=1}^{n} a_i, then b_n → a.
Taking the limit of both sides of Equation (32) and applying Lemma 2, we find
lim_{ℓ→∞} S(ρ_{0:ℓ})/ℓ = lim_{ℓ→∞} S(ρ_ℓ | ρ_{ℓ−1} ⋯ ρ_1).
Using stationarity, lim_{ℓ→∞} S(ρ_ℓ | ρ_{ℓ−1} ⋯ ρ_1) = lim_{ℓ→∞} S(ρ_0 | ρ_{−ℓ:0}). When combined with Equation (33), this proves that our two definitions of s are equivalent. □
To motivate a number of the following results, it is important to appreciate that a process having von Neumann entropy rate s, as given by Equation (27), does not imply that an observer can perform a measurement on any individual qudit whose outcome uncertainty is s. Rather, s corresponds to the measurement basis over the entire chain of qudits for which the distribution of outcomes has minimal Shannon entropy. For many nontrivial examples, this basis is highly nonlocal or otherwise experimentally infeasible. As in the classical case, s appears graphically as the slope of S(ℓ) as ℓ → ∞, as shown in Figure 2.
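The convergence of the finite-ℓ slopes S(ℓ) − S(ℓ−1) to s can be seen in a small example. A minimal sketch, assuming the Golden Mean process (a standard binary Markov chain forbidding consecutive 1s, with Pr(1|0) = 1/2; an illustrative choice, not one of the paper's examples) encoded into orthogonal qubit states, for which s equals the classical entropy rate of 2/3 bit per step:

```python
import numpy as np
from itertools import product

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

# Golden Mean process: no consecutive 1s; Pr(1|0) = 1/2, Pr(0|1) = 1
T = np.array([[0.5, 0.5], [1.0, 0.0]])   # T[a, b] = Pr(next = b | last = a)
pi = np.array([2 / 3, 1 / 3])            # stationary symbol distribution
kets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]   # orthogonal encoding

def S(ell):
    """Quantum block entropy S(l) of the orthogonally encoded process."""
    if ell == 0:
        return 0.0
    rho = np.zeros((2 ** ell, 2 ** ell))
    for w in product((0, 1), repeat=ell):
        pr = pi[w[0]]
        for a, b in zip(w, w[1:]):
            pr *= T[a, b]
        if pr == 0.0:
            continue
        psi = np.array([1.0])
        for x in w:
            psi = np.kron(psi, kets[x])
        rho += pr * np.outer(psi, psi)
    return vn(rho)

# Finite-l slopes S(l) - S(l-1) converge to s (here 2/3 bit) from above
slopes = [S(l) - S(l - 1) for l in (1, 2, 3)]
```

Because the underlying chain is first-order Markov and the encoding is orthogonal, the slope reaches its asymptotic value s already at block length 2.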
For separable qudit processes, we can also relate s to the classical entropy rate of the underlying process X as follows.
Proposition 6.
Let | Ψ : = E ( X ) . The von Neumann entropy rate s of | Ψ : then obeys the bound
s ≤ h_μ^X,
where h_μ^X is the Shannon entropy rate of X. Equality occurs if and only if Q consists of |X| orthogonal pure states in H_d of dimension d ≥ |X|.
Proof. 
Divide both sides of Proposition 3 by ℓ and take the limit of both sides as ℓ → ∞ to obtain
lim_{ℓ→∞} S(ρ_{0:ℓ})/ℓ ≤ lim_{ℓ→∞} H[X_{0:ℓ}]/ℓ.
From Equation (27), the left side is s, and from Equation (A1), the right side is h μ X . The condition for equality is inherited from Proposition 3, concluding the proof. □
Restricting once again to repeated measurements with rank-one POVMs, we can prove the following bound for measured processes.
Proposition 7.
Let Y = M ( | Ψ : ) , and let M be a repeated rank-one POVM. The measured entropy rate h μ Y then obeys
s ≤ h_μ^Y,
where s is the von Neumann entropy rate of | Ψ : , with equality if and only if | Ψ : is a separable process with an orthogonal alphabet Q and M uses a POVM whose operators include a projector for each element in Q .
Proof. 
Divide both sides of Proposition 4 by ℓ and take the limit of both sides as ℓ → ∞ to obtain
lim_{ℓ→∞} S(ρ_{0:ℓ})/ℓ ≤ lim_{ℓ→∞} H[Y_{0:ℓ}]/ℓ.
The left side is s, and the right side is h_μ^Y. The conditions for equality are inherited from Proposition 4, concluding the proof. □

3.4. Quantum Redundancy

Unlike a classical process, the maximum entropy rate for a qudit process depends on the size of the Hilbert space rather than on the size of the alphabet Q . For Hilbert space of dimension d, the largest possible value of s is log 2 ( d ) , corresponding to an i.i.d. sequence of qudits, each in a maximally mixed state ρ i i d = I / d .
A qudit process can be compressed down to its von Neumann entropy rate s. The amount that it can be compressed is the quantum redundancy:
R_q ≡ log₂(d) − s.
Statistical biases in individual qudits and temporal correlations between them offer opportunities for compression. R q includes the effects of both.
For separable qudit processes, we can bound the quantum redundancy using properties of the underlying classical process:
Proposition 8.
Let X be a classical process with redundancy R X , symbol alphabet X , and entropy rate h μ X , and let | Ψ : = E ( X ) be a qudit process with redundancy R q , Hilbert space of dimension d, and entropy rate s.
For d ≥ |X|,
R_q ≥ R_X,
with equality if and only if d = |X| and Q consists of |X| orthogonal pure states.
For d < |X|,
R_q < R_X + (h_μ^X − s),
where the term (h_μ^X − s) is always positive, as indicated by Proposition 6.
Proof. 
First, consider d = |X|:
R_q = log₂(|X|) − s = R_X + h_μ^X − s ≥ R_X.
The final line comes from Proposition 6, as does the condition for equality.
For d > |X|,
R_q = log₂(d) − s > log₂(|X|) − s ≥ R_X.
There is no opportunity for equality. In this case, Q will not span H , naturally leading to more redundancy.
Finally, for d < |X|,
R_q = log₂(d) − s < log₂(|X|) − s = R_X + (h_μ^X − s),
concluding the proof. □
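The d = |X| case of Proposition 8 is easy to verify numerically. A minimal sketch, assuming an i.i.d. biased coin source (our illustrative choice); for i.i.d. qudits, ρ_{0:ℓ} is a tensor power, so s = S(ρ) exactly, and the orthogonal encoding gives R_q = R_X while the non-orthogonal one gives R_q > R_X:

```python
import numpy as np

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

p = 0.3
h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
R_X = np.log2(2) - h                 # classical redundancy, |X| = 2

# i.i.d. source: rho_{0:l} = rho^{tensor l}, so the entropy rate s = S(rho)
k0 = np.array([1.0, 0.0])
k1 = np.array([0.0, 1.0])
kplus = np.array([1.0, 1.0]) / np.sqrt(2)

rho_orth = (1 - p) * np.outer(k0, k0) + p * np.outer(k1, k1)
rho_skew = (1 - p) * np.outer(k0, k0) + p * np.outer(kplus, kplus)

R_q_orth = 1.0 - vn(rho_orth)   # equals R_X (d = |X|, orthogonal alphabet)
R_q_skew = 1.0 - vn(rho_skew)   # strictly larger than R_X
```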
We can also compare the classical redundancy of a measured process (obtained through repeated use of a rank-one POVM) to the quantum redundancy of the qudit process being measured.
Proposition 9.
Let Y be a measured process such that Y = M ( | Ψ : ) and M be a repeated rank-one POVM. Let Y have redundancy R Y , and let | Ψ : have quantum redundancy R q . Then,
R_q ≥ R_Y,
with equality if and only if d = |Y|, |Ψ_:⟩ is a separable process with an orthogonal alphabet Q, and M uses a POVM whose operators include a projector for each element in Q.
Proof. 
A rank-one POVM on H must have at least d elements; therefore, |Y| ≥ d and
R_q = log₂(d) − s ≤ log₂(|Y|) − s = R_Y + h_μ^Y − s ≥ R_Y.
Going from the first line to the second provides a condition for equality: d = | Y | . Proposition 7 is used in the final line and provides the other conditions for equality. □

3.5. Quantum Entropy Gain

We can take discrete-time derivatives of S ( l ) , as was done for H [ l ] in [11]. This process is summarized in Appendix A. We call the first derivative of S ( l ) the quantum entropy gain as
ΔS(ℓ) ≡ S(ℓ) − S(ℓ−1),
for ℓ > 0. The units for the quantum entropy gain are bits per time step, and we set the boundary condition at length ℓ = 0 to ΔS(0) = log₂(d), where d is the Hilbert space dimension. Since S(ℓ) is monotonically increasing, ΔS(ℓ) ≥ 0.
The quantum entropy gain is the amount of additional uncertainty introduced by including the l th qudit in a block, where that uncertainty is quantified by the von Neumann entropy.
By combining Equations (17) and (36), we can write Δ S ( l ) as
ΔS(ℓ) = S(ρ_0 | ρ_{1−ℓ:0}).
This allows for relating the quantum entropy gain and the von Neumann entropy rate as follows:
s = lim_{ℓ→∞} ΔS(ℓ).
Thus, paralleling the classical case, the quantum entropy gain serves as a finite-ℓ approximation of the von Neumann entropy rate:
s(ℓ) ≡ ΔS(ℓ).
s(ℓ) serves as the best estimate for the entropy rate of a qudit process that can be made by an observer who only has access to measurement statistics for length-ℓ blocks of qudits.
The way in which the entropy rate estimate converges and its relationship to other information properties of a qudit process are summarized in Figure 3.

3.6. Quantum Predictability Gain

We call the second derivative of S ( l ) the quantum predictability gain, which is given by
Δ²S(ℓ) ≡ Δs(ℓ) = s(ℓ) − s(ℓ−1),
for ℓ > 0. The units of Δ²S(ℓ) are bits per (time step)². Since S(ℓ) is concave, Δ²S(ℓ) ≤ 0. |Δ²S(ℓ)| quantifies how much an observer’s estimate of the von Neumann entropy rate s improves if they enlarge their observations from blocks of ℓ−1 to blocks of ℓ qudits. The generic convergence behavior of Δ²S(ℓ) is shown in Figure 4.
For all higher-order discrete derivatives of S ( l ) (as with the classical block entropy),
lim_{ℓ→∞} ΔⁿS(ℓ) = 0, n ≥ 2.
This follows directly from the existence of the limit in Equation (27) for stationary quantum states.

3.7. Total Quantum Predictability

Up to this point, introducing new information-theoretic characteristics of separable quantum processes proceeded by taking discrete-time derivatives of the von Neumann block entropy. We can likewise integrate the functions Δ n S ( l ) , as is done for the classical case with Equation (A3). While this starts off straightforwardly, a number of interesting new informational quantities emerge.
These properties of qudit processes take the following general form:
I_n^q ≡ ∑_{ℓ=ℓ_0}^{∞} [ΔⁿS(ℓ) − lim_{ℓ′→∞} ΔⁿS(ℓ′)],
where ℓ_0 is the first value of ℓ for which ΔⁿS(ℓ) is defined.
I_2^q, the first of these, monitors the convergence of the quantum predictability gain Δ²S(ℓ) to its limit of 0 as ℓ → ∞. We use ℓ_0 = 1 to get the total quantum predictability G_q:
G_q ≡ I_2^q = ∑_{ℓ=1}^{∞} Δ²S(ℓ).
The units of G_q are bits per time step. Note that G_q ≤ 0 because Δ²S(ℓ) ≤ 0 for all ℓ.
G q can be interpreted by relating it to a previously established property of qudit processes: quantum redundancy.
Proposition 10.
For a stationary qudit process,
G_q = −R_q.
Proof. 
Applying Equation (A3) to Equation (39), we find that
G_q = lim_{ℓ→∞} ΔS(ℓ) − ΔS(0) = s − log₂(d) = −R_q.
Here, the second line follows from Equation (37) and the third from Equation (34). □
Thus, | G q | is the total amount of predictable information per time step for a qudit process.
Rather immediately, one sees that the amount of information in an individual qudit decomposes into
log 2 ( d ) = | G q | + s .
| G q | is the amount of quantum information within a qudit that is predictable, whereas s is the amount of information that is irreducibly random.
The relation | G q | = R q can be combined with Propositions 8 and 9 to prove two corollaries.
Corollary 1.
Let X be a classical process with total predictability G X , symbol alphabet X , and entropy rate h μ X , and let | Ψ : = E ( X ) be a qudit process with total quantum predictability G q , Hilbert space of dimension d, and entropy rate s.
For d ≥ |X|,
|G_q| ≥ |G_X|,
with equality if and only if d = |X| and Q consists of |X| orthogonal pure states.
For d < |X|,
|G_q| < |G_X| + (h_μ^X − s),
where the term (h_μ^X − s) is always positive, as derived from Proposition 6.
Proof. 
This follows immediately from combining Propositions 8 and 10. □
Corollary 2.
Let Y be a measured process such that Y = M(|Ψ_:⟩), and M is a repeated rank-one POVM. Let Y have total predictability G_Y, and let |Ψ_:⟩ have total quantum predictability G_q. Then,
|G_q| ≥ |G_Y|,
with equality if and only if d = | Y | , | Ψ : is a separable process with an orthogonal alphabet Q and M uses a POVM whose operators include a projector for each element in Q .
Proof. 
This follows immediately from combining Propositions 9 and 10. □
Graphically, the total quantum predictability is the area between the predictability gain curve and its linear asymptote of 0, as seen in Figure 4. The von Neumann entropy rate and total predictability lend insight into compression limits for stationary sources. They do not indicate, however, whether that compression is achievable due to bias within individual qudit states or correlations between qudits. For that, we must continue our way back up the entropy hierarchy.

3.8. Quantum Excess Entropy

The convergence of s ( l ) to the true von Neumann entropy rate s is quantified with the quantum excess entropy:
E_q ≡ I_1^q = ∑_{ℓ=1}^{∞} [ΔS(ℓ) − lim_{ℓ′→∞} ΔS(ℓ′)] = ∑_{ℓ=1}^{∞} [s(ℓ) − s].
The units for E q are bits. Paralleling the classical case, we refer to any qudit process with finite E q as finitary and those with infinite E q as infinitary.
We can further express E q in terms of the asymptotic behavior of S ( l ) .
Proposition 11.
The quantum excess entropy can be written as
E_q = lim_{ℓ→∞} [S(ℓ) − sℓ].
Proof. 
We evaluate the discrete integral in Equation (40) with Equation (A3) using partial sums:
E_q = lim_{ℓ→∞} ∑_{m=1}^{ℓ} [ΔS(m) − s] = lim_{ℓ→∞} [S(ℓ) − S(0) − sℓ] = lim_{ℓ→∞} [S(ℓ) − sℓ],
since S ( 0 ) = 0 by definition. □
For finitary quantum processes, E q is the area between the entropy gain curve and its asymptote s, as seen in Figure 3. It also appears in Figure 2 as the vertical offset of the linear asymptote to the S ( l ) curve.
This leads to a natural scaling of the quantum block entropy as
S(ℓ) ≈ E_q + sℓ,
as ℓ → ∞.
A clearer interpretation of E q as a quantum mutual information is provided by the following proposition.
Proposition 12.
The quantum excess entropy can be written as
E_q = lim_{ℓ→∞} S(ρ_{−ℓ:0} : ρ_{0:ℓ}),
where ρ_{−ℓ:0} and ρ_{0:ℓ} are two blocks of ℓ consecutive qudits with a shared boundary.
Proof. 
The quantum mutual information, from Equation (19), between two neighboring blocks of qudits can be expressed as
S(ρ_{0:ℓ} : ρ_{−ℓ:0}) = S(ρ_{0:ℓ}) − S(ρ_{0:ℓ} | ρ_{−ℓ:0}) = S(ℓ) − ∑_{t=0}^{ℓ−1} S(ρ_t | ρ_{−ℓ:t}),
where the final line is obtained through repeated application of Equation (17).
Taking ℓ → ∞,
lim_{ℓ→∞} S(ρ_{0:ℓ} : ρ_{−ℓ:0}) = lim_{ℓ→∞} [S(ℓ) − ∑_{t=0}^{ℓ−1} S(ρ_t | ρ_{−ℓ:t})] = lim_{ℓ→∞} [S(ℓ) − sℓ],
where the final line follows from Equation (32) and stationarity.
This final expression is equivalent to the form of E q derived in Proposition 11, concluding the proof. □
E q is therefore a measure of all the correlations between two halves of the infinite sequence of qudits. E q = 0 if and only if a source is i.i.d. (with S ( l ) = s l ).
We can relate E q for a separable qudit process to E X of the underlying classical process.
Proposition 13.
Let X be a classical process with alphabet X and excess entropy E X , and let | Ψ : = E ( X ) be a qudit process with alphabet Q and quantum excess entropy E q .
Then,
E_q ≤ E_X,
with equality if and only if Q consists of | X | orthogonal states.
Proof. 
Consider X_{−ℓ:ℓ}, a block of length 2ℓ of the classical process X. We can write realizations of X_{−ℓ:ℓ} into a classical register to form the following state:
ρ^C_{−ℓ:ℓ} = ∑_{x_{−ℓ:ℓ}} Pr(x_{−ℓ:ℓ}) |x_{−ℓ:ℓ}⟩⟨x_{−ℓ:ℓ}|,
where all |x_{−ℓ:ℓ}⟩ are orthogonal. Then, we pass each symbol through the preparation channel E to obtain blocks of our qudit process ρ_{−ℓ:ℓ} = E^{2ℓ}(ρ^C_{−ℓ:ℓ}), where E^{2ℓ} = ⊗_{i=0}^{2ℓ−1} E.
We can express the quantum mutual information as a quantum relative entropy, Equation (20), giving the following relation:
I[X_{−ℓ:0} : X_{0:ℓ}] = S(ρ^C_{−ℓ:0} : ρ^C_{0:ℓ}) ≥ S(E^ℓ(ρ^C_{−ℓ:0}) : E^ℓ(ρ^C_{0:ℓ})) = S(ρ_{−ℓ:0} : ρ_{0:ℓ}),
where Equation (43) comes from the monotonicity of the quantum relative entropy in Equation (21). The condition for equality comes from Equation (21) as well. The set of states for which the recovery map must exist is Q, and this is only possible if all |ψ_x⟩ are distinguishable, i.e., orthogonal.
Using Equation (A9) to write the excess entropy of X as a limit, we see that
lim_{ℓ→∞} I[X_{−ℓ:0} : X_{0:ℓ}] ≥ lim_{ℓ→∞} S(ρ_{−ℓ:0} : ρ_{0:ℓ}),
E_X ≥ E_q. □
A similar bound appears when we apply a repeated POVM measurement to the quantum process to obtain a classical process.
Proposition 14.
Let Y be a measured process such that Y = M ( | Ψ : ) , and let M be a repeated rank-one POVM. Let Y have excess entropy E Y , and let | Ψ : have quantum excess entropy E q . Then,
E_q ≥ E_Y,
with equality if and only if | Ψ : is a separable process with an orthogonal alphabet Q and M uses a POVM whose operators include a projector for each element in Q .
Proof. 
Consider ρ_{−ℓ:ℓ}, a block of length 2ℓ of the quantum process |Ψ_:⟩, and let M be a repeated measurement of rank-one POVM M with elements {E_y} so that M(ρ_t) = ∑_y Pr(y) |y⟩⟨y|, Pr(y) = tr(E_y ρ_t), and all |y⟩ are orthogonal. The repeated measurement applied over a block of length ℓ is then M_{0:ℓ} = ⊗_{i=0}^{ℓ−1} M.
We express the quantum mutual information as a quantum relative entropy (using Equation (20)) and apply Equation (21) to obtain
S(ρ_{−ℓ:0} : ρ_{0:ℓ}) ≥ S(M_{0:ℓ}(ρ_{−ℓ:0}) : M_{0:ℓ}(ρ_{0:ℓ})) = S(∑_{y_{−ℓ:0}} Pr(y_{−ℓ:0}) ⊗_{t=−ℓ}^{−1} |y_t⟩⟨y_t| : ∑_{y_{0:ℓ}} Pr(y_{0:ℓ}) ⊗_{t=0}^{ℓ−1} |y_t⟩⟨y_t|) = I[Y_{−ℓ:0} : Y_{0:ℓ}].
The condition for equality comes from Equation (21) as well. The set of states for which the recovery map must exist is { | y } , and this is only possible if all | ψ x Q are orthogonal and M contains a projector for each | ψ x . By the same argument as Proposition 4, | Ψ : must be a separable process, and Q must consist of orthogonal states.
Taking the limit ℓ → ∞, we see that
lim_{ℓ→∞} S(ρ_{−ℓ:0} : ρ_{0:ℓ}) ≥ lim_{ℓ→∞} I[Y_{−ℓ:0} : Y_{0:ℓ}],
E_q ≥ E_Y. □
Combining the above proofs, we obtain the following corollary relating the excess entropies of the underlying classical process X and the measured process Y .
Corollary 3.
Let X be a classical process with excess entropy E X and alphabet X , let | Ψ : = E ( X ) be a separable qudit process with alphabet Q , and let Y be a measured process with excess entropy E Y such that Y = M ( | Ψ : ) , where M is a repeated rank-one POVM.
Then,
E_X ≥ E_Y,
with equality if and only if Q consists of | X | orthogonal states and M uses a POVM whose operators include a projector for each element in Q .
Proof. 
This follows immediately from combining Propositions 13 and 14. □
An exact value of E_q typically requires characterizing infinite-length sequences of qudits. However, we can write a finite-ℓ estimate of E_q using Equation (41):
E_q(ℓ) = S(ℓ) − ℓ·s(ℓ),
which generally underestimates E q ’s true value.
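This estimator can be exercised on a small example. A minimal sketch, assuming the Golden Mean process (a binary Markov chain forbidding consecutive 1s; an illustrative choice) with an orthogonal qubit encoding, for which the true excess entropy is H(1) − h_μ and the estimate becomes exact once the block length exceeds the chain's memory:

```python
import numpy as np
from itertools import product

def vn(rho):
    """von Neumann entropy in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

# Golden Mean process, orthogonally encoded so that S(l) = H[X_{0:l}]
T = np.array([[0.5, 0.5], [1.0, 0.0]])
pi = np.array([2 / 3, 1 / 3])
kets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

def S(ell):
    if ell == 0:
        return 0.0
    rho = np.zeros((2 ** ell, 2 ** ell))
    for w in product((0, 1), repeat=ell):
        pr = pi[w[0]]
        for a, b in zip(w, w[1:]):
            pr *= T[a, b]
        if pr == 0.0:
            continue
        psi = np.array([1.0])
        for x in w:
            psi = np.kron(psi, kets[x])
        rho += pr * np.outer(psi, psi)
    return vn(rho)

def E_est(ell):
    """Finite-l estimate E_q(l) = S(l) - l * s(l), with s(l) = S(l) - S(l-1)."""
    return S(ell) - ell * (S(ell) - S(ell - 1))

H1 = -(2 / 3) * np.log2(2 / 3) - (1 / 3) * np.log2(1 / 3)
E_true = H1 - 2 / 3   # for an order-1 Markov chain, E = H(1) - h_mu
```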

3.9. Quantum Transient Information

We now turn to look at how the quantum block entropy curve converges to its linear asymptote E q + s l . We define the quantum transient information as
T_q ≡ −I_0^q = ∑_{ℓ=1}^{∞} [E_q + sℓ − S(ℓ)].
The units of T q are bits × time steps.
T q is represented graphically as the area between the S ( l ) curve and its linear asymptote for l , as seen in Figure 2. We will see that T q distinguishes between periodic qudit processes that cannot be distinguished with previous information quantities such as E q and s.
Proposition 15.
The transient quantum information T q can be written as
T_q = Σ_{l=1}^{∞} l [ s(l) − s ] .
Proof. 
The proof reduces to the straightforward proof for transient information of a classical stochastic process, as defined in Ref. [11]. It depends only upon Equations (46) and (A3), which have the same form in the quantum case as the classical case. □
This expression allows us to estimate T q for a given quantum process as
T_q(l) = Σ_{m=1}^{l−1} m [ s(m) − s(l) ] ,
which generally underestimates the true value of T q .
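Under the same assumption of precomputed block entropies, the finite-l estimate of T_q can be sketched as follows (our own illustration; the input list is hypothetical):

```python
def tq_estimate(S):
    """Finite-l estimate T_q(l) = sum_{m=1}^{l-1} m * [s(m) - s(l)],
    with entropy gains s(m) = S(m) - S(m-1) and s(l) used as a stand-in
    for the true entropy rate s. Generally underestimates T_q.

    S: block entropies [S(0)=0, S(1), ..., S(l)] in bits.
    """
    l = len(S) - 1
    s = [S[m] - S[m - 1] for m in range(1, l + 1)]   # s[m-1] = s(m)
    return sum(m * (s[m - 1] - s[l - 1]) for m in range(1, l))

# Classical period-2 process '0101...': S(l) = 1 bit for l >= 1, so
# s(1) = 1, s(m > 1) = 0, and the estimate converges to 1 bit x time step.
print(tq_estimate([0.0, 1.0, 1.0, 1.0, 1.0, 1.0]))  # -> 1.0
```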
T_q is related to the minimal amount of information necessary for an observer to synchronize to an HMCQS. We say an observer is synchronized when they are able to determine a source's internal state. If S ( l ) converges to its linear asymptote at finite l, then there exists an optimal POVM on ρ 0 : l (in the eigenbasis of ρ 0 : l ) that exactly determines the HMCQS's internal state. Note that this is not guaranteed to be a repeated POVM or even to consist of local POVMs. The information within that measurement that is useful for synchronization is quantified by T q . In contrast, if S ( l ) does not converge at any finite l, then no such POVM over any finite block of qudits exists, and an observer can (at best) only converge asymptotically to the source's internal state. We will see in Section 5 that even this is not possible for most sources when we restrict ourselves to local measurements.

3.10. Quantum Markov Order

The quantum Markov order corresponds to the number of previous qudits on which the next qudit is conditionally dependent. A process has quantum Markov order R q if R q is the smallest value for which the following property holds:
S(ρ_0 | ρ_{−R_q:0}) = S(ρ_0 | ρ_{−∞:0}) .
A graphical interpretation is that when the block size reaches R q , S ( l ) levels off to a constant slope of s, as seen in Figure 2. As a consequence, R q is the value of l at which Δ S ( l ) and Δ 2 S ( l ) converge to their asymptotic values, as seen in Figure 3 and Figure 4, respectively.
Note that a classical process is referred to as ‘Markov’ if R = 1 . If a separable qudit process has an underlying classical process that is Markov, then it obeys the property in Equation (9) but does not generally have R q = 1 .
Consider a separable quantum process | Ψ : = E ( X ) , where X has Markov order R X , and | Ψ : has quantum Markov order R q . R q can be equal to, less than, or greater than R X . We will give a simple example of each case:
  • R q = R X trivially if Q consists of orthogonal states, in which case they have identical block entropy curves via Proposition 3.
  • R q < R X if R X > 0 and all symbols in X are mapped to the same pure state | ψ x . In this case, R q = 0 , as the resulting process is i.i.d.
  • R q > R X when R X > 0 , | X | = | Q | , and Q consists of nonorthogonal states. Frequently, R q = ∞, since arbitrary sequences of nonorthogonal states cannot reliably be distinguished with a finite POVM.
Similar rules apply when comparing R q to the Markov order R Y of measured process Y = M ( | Ψ : ) :
  • R q = R Y if | Ψ : is a separable process with an orthogonal alphabet Q and M uses a POVM whose operators include a projector for each element in Q via Proposition 4.
  • R q < R Y if Q consists of orthogonal states, R q = 1 , and M is a repeated rank-one POVM that does not include projectors onto the states in Q .
  • R q > R Y if R q > 0 and M is a repeated POVM measurement with the one-element POVM, I . Note that this is not a rank-one POVM.
Now, with a toolbox of quantum information properties in hand, the following section calculates (or estimates) their values for paradigmatic examples of qudit processes.

4. Qudit Processes

We now present a variety of separable qudit processes organized roughly by increasing structural complexity and demonstrate the extent to which their behavior can be quantified with Section 3’s informational measures. Their properties are summarized in Table 1 near the end when we have their informational measures in hand. Note that the following results on qudits do not markedly change if one simplifies to qubits.

4.1. I.I.D. Processes

Recall that an i.i.d. (independent and identically distributed) qudit process has no classical or quantum correlation between any of the qudits, and its length-l density matrices are in a product state such that
ρ_{0:l} = ⊗_{i=0}^{l−1} ρ_iid ,
where ρ_iid ∈ H is the density matrix for a single time step.
The quantum block entropy takes the form S ( l ) = l S ( ρ i i d ) ; thus, the von Neumann entropy rate is s = S ( ρ i i d ) . This implies that a repeated projective measurement exists for which the measured process Y has a classical entropy rate of h μ = S ( ρ i i d ) . The measurement that realizes this bound consists of orthogonal projectors P y , each constructed from an eigenvector of ρ i i d . In the special case where ρ i i d is the maximally mixed state, s = log 2 ( d ) , and any set of orthogonal projectors on a block ρ 0 : l gives a uniform distribution over all measurement outcomes y 0 : l . Since there are no correlations between qudits, E q = 0 , T q = 0 , and R q = 0 trivially. The output of a single-qudit source with uncorrelated noise can be considered an i.i.d. qudit process.
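The additivity S(l) = l S(ρ_iid) is easy to verify numerically for a small block. The following minimal numpy sketch (our own construction, with an arbitrarily chosen single-qubit ρ_iid) checks it at l = 3:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho log2 rho], computed from the eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]        # discard numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

# Hypothetical single-qubit i.i.d. source with eigenvalues (3/4, 1/4):
rho_iid = np.diag([0.75, 0.25])
S1 = von_neumann_entropy(rho_iid)

# Length-3 block: rho_{0:3} = rho_iid (x) rho_iid (x) rho_iid
rho_block = np.kron(np.kron(rho_iid, rho_iid), rho_iid)
assert abs(von_neumann_entropy(rho_block) - 3 * S1) < 1e-9
```

Because the block is a tensor product, its eigenvalues are products of single-qudit eigenvalues, so the entropies add; E_q and T_q vanish accordingly.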

4.2. Quantum Presentations of Classical Processes

Any classical process with alphabet X can be represented by a qudit process with orthogonal alphabet Q , where each symbol x ∈ X corresponds to a pure state |ψ_x⟩ ∈ Q . In this case, the encoding E is trivial, and one can recover the underlying process X via repeated measurement with the orthogonal projectors { P_x = |ψ_x⟩⟨ψ_x| } . This requires that d ≥ | X | .
The information measures for the underlying process, the qudit process, and measurement outcomes obey
H[X_{0:l}] = S(ρ_{0:l}) ≤ H[Y_{0:l}] ,
with equality for repeated measurement with a rank-one POVM whose elements include the projector set { P x } .
From this relation, we can see that many quantum information quantities such as s, E q , T q , and R q are equal to the classical properties of X . Exceptions include quantities that depend on the relationship between d and | X | , such as quantum redundancy.
Since the quantum-classical channel E is trivial, the output process Y is the result of passing X through a noisy classical channel, where the level of noise depends on the particular measurement scheme M .
For a repeated rank-one POVM, H[X_{0:l}] ≤ H[Y_{0:l}] for all l from Propositions 3 and 4. Similar inequalities explicitly relate other classical process properties, such as the entropy rates ( h μ X ≤ h μ Y ), the predictabilities ( | G X | ≥ | G Y | ), and the excess entropies (see Corollary 3).

4.3. Periodic Processes

A classical stochastic process X is periodic with period p if it consists of repetitions of a template sequence—a length-p block of symbols. A periodic separable qudit process | Ψ : = E ( X ) is one for which the underlying classical process X is periodic.
For classical periodic processes, the block entropy curve reaches a maximal value at its Markov order R = p and thereafter remains constant with increasing l ( h μ = 0 ). That maximal value is the excess entropy, which is entirely determined by the period according to the formula E = log 2 ( p ) . Finally, an observer attempting to synchronize to different length-p templates may encounter more or less uncertainty in the process depending on the template itself, a feature captured by the transient information T [11].
Periodic qudit processes share many of these properties (as we show via Section 3's results) but also exhibit richer behavior, since they can consist of nonorthogonal qudit states. Figure 5 shows the quantum block entropies for the period-3 process consisting of the repeated quantum word |ψ_{00ϕ}⟩ = |0⟩|0⟩|ψ(ϕ)⟩, where |ψ(ϕ)⟩ = cos(ϕ/2)|0⟩ + sin(ϕ/2)|1⟩. For ϕ = π, we recover the block entropy for the classical period-3 word ‘001’. As ϕ decreases, |ψ(ϕ)⟩ becomes less distinguishable from |0⟩.
From Propositions 6 and 13, it follows that periodic qudit processes with period p have s = 0 and E q ≤ log 2 ( p ) . (Indeed, E q = log 2 ( p ) unless two classical symbols in X are sent to the same pure state in Q in a way that reduces the effective period of the qudit process to less than p.) However, the quantum block entropy curve does not necessarily reach its maximal value of E q at l = p if Q contains nonorthogonal states. In this case, R q = ∞, since an observer cannot unambiguously distinguish where one length-p block begins and another ends with any finite measurement.
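The block entropy curves in Figure 5 can be reproduced with a small numerical sketch. This is our own construction (not code from the paper): ρ_{0:l} for a period-p process is the uniform mixture of the p phase-shifted pure product words, feasible to build directly only for small l since the dimension is 2^l:

```python
import numpy as np

def psi(phi):
    """|psi(phi)> = cos(phi/2)|0> + sin(phi/2)|1> as a real vector."""
    return np.array([np.cos(phi / 2), np.sin(phi / 2)])

def block_entropy_period3(phi, l):
    """S(l) for the period-3 process repeating |0>|0>|psi(phi)>."""
    template = [np.array([1.0, 0.0]), np.array([1.0, 0.0]), psi(phi)]
    rho = np.zeros((2 ** l, 2 ** l))
    for k in range(3):                 # average over the 3 phases
        w = np.ones(1)
        for t in range(l):
            w = np.kron(w, template[(t + k) % 3])
        rho += np.outer(w, w) / 3
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

# phi = pi: orthogonal alphabet, the curve reaches E_q = log2(3) at l = 3.
assert abs(block_entropy_period3(np.pi, 3) - np.log2(3)) < 1e-6
# phi = pi/2: nonorthogonal alphabet, still below log2(3) at l = 6.
assert block_entropy_period3(np.pi / 2, 6) < np.log2(3)
```

For nonorthogonal ϕ, the three phase words overlap, so S(l) approaches log_2(3) only asymptotically, consistent with R_q = ∞.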
Though all period-p qudit processes have the same von Neumann entropy rate and quantum excess entropy, they may be distinguished by their values for the quantum transient information in two different ways.
First, different quantum alphabets Q give different values of T q . Figure 5 demonstrates that, for a two-state qubit alphabet, the quantum transient information increases as the states become less distinguishable. For ϕ = π (orthogonal alphabet), T q ≈ 2.33 bits × time steps, whereas for ϕ = π/2, T q ≈ 4.22 bits × time steps. These values of T q (and others in this section) are numerically approximated using Equation (47), with l = 12 .
Second, T q can distinguish between different length-p words. Reference [11] shows that T can distinguish between different period-5 classical words (‘00001’, ‘00011’, and ‘00101’), and T q generalizes this behavior. Whereas all period-3 words are equivalent to ‘001’ under global bit swap and translations, the same is not true for period-5 words. Section 5 discusses synchronizing to period-5 qudit sources in more detail and relates that task to the value of T q .

4.4. Quantum Golden Mean Processes

The classical Golden Mean process consists of all binary strings with no consecutive ‘1’s. It is a Markov process ( R = 1 ), since the joint probabilities Pr ( X 0 : l ) for blocks factor as in Equation (2), with Pr(0|0) = 1/2, Pr(1|0) = 1/2, Pr(0|1) = 1, and Pr(1|1) = 0. For the classical Golden Mean, h μ = 2/3 bits per symbol and E ≈ 0.2516 bits.
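These two values follow directly from the conditional probabilities above. A quick check (our own sketch; it uses the identity E = H(1) − h_μ, which holds for order-1 Markov processes):

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Golden Mean: Pr(0|0) = Pr(1|0) = 1/2, Pr(0|1) = 1; the stationary
# symbol distribution is Pr(0) = 2/3, Pr(1) = 1/3.
p0, p1 = 2 / 3, 1 / 3
h_mu = p0 * H([0.5, 0.5]) + p1 * H([1.0])   # H[X_1 | X_0]
E = H([p0, p1]) - h_mu                       # exact for order-1 Markov

assert abs(h_mu - 2 / 3) < 1e-12
assert abs(E - 0.2516) < 1e-4
```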
Replacing the classical symbol alphabet with the quantum alphabet Q = { |0⟩, |+⟩ }, where |+⟩ = (|0⟩ + |1⟩)/√2, gives the |0⟩-|+⟩ Quantum Golden Mean process introduced in Ref. [14]. Figure 6 shows its generator.
We can further generalize this process to the |0⟩-|ψ(ϕ)⟩ Quantum Golden Mean process with quantum alphabet { |0⟩, |ψ(ϕ)⟩ = cos(ϕ/2)|0⟩ + sin(ϕ/2)|1⟩ }. This process’s quantum entropy rate is shown in Figure 7 for different values of ϕ . For ϕ = π , |ψ(ϕ)⟩ = |1⟩, and we recover the classical Golden Mean process. As ϕ decreases to 0, the states in Q become less distinguishable and s decreases, as expected from Proposition 6.
Also in Figure 7, we see the entropy rate h μ Y of the measured processes obtained by applying a repeated PVM to the |0⟩-|ψ(ϕ)⟩ Quantum Golden Mean process. This PVM consists of projectors parametrized by the angle θ : M_θ = { |ψ(θ)⟩⟨ψ(θ)|, |ψ(θ+π)⟩⟨ψ(θ+π)| }. As mandated by Proposition 7, h μ Y ≥ s for all ϕ and θ .
Figure 8 demonstrates how another of Section 3’s quantum information properties, the quantum excess entropy E q , relates to the excess entropy E Y of the classical measured process. For all ϕ and θ , the bound from Proposition 14 ( E q ≥ E Y ) holds; E Y has maxima at ϕ = π and θ = 0 , π , i.e., when Q = { |0⟩, |1⟩ } and M = M 01 . These quantities were estimated using l = 10 .
Underlying the | 0 - | ψ ( ϕ ) Quantum Golden Mean is the classical Golden Mean process, which is Markov. Thus, it obeys the quantum Markov property of Equation (9) despite the fact that it has an infinite quantum Markov order for most values of ϕ . This has implications for an observer’s ability to synchronize to a Quantum Golden Mean source, which Section 5 explores.

4.5. 3-Symbol Quantum Golden Mean

Figure 9 shows the generator of the 3-Symbol Quantum Golden Mean process with alphabet Q = { |0⟩, |1⟩, |+⟩ }. Though its generator shares the same internal states and transition probabilities as the |0⟩-|+⟩ Quantum Golden Mean, the 3-Symbol Quantum Golden Mean does not have a one-to-one correspondence between the quantum alphabet Q and the generator states S ( |0⟩ ↔ A and |+⟩ ↔ B ). Instead, | Q | = 3 , and these three states cannot all be mutually orthogonal when d = 2 .
However, unlike the | 0 - | + Quantum Golden Mean, we can calculate the quantum entropy rate directly from the generator because (1) it has the property of quantum unifilarity, and (2) it is possible to synchronize to it. Both are discussed at length in Section 5. For now, an HMCQS is quantum unifilar if and only if, for every σ S , there exists some POVM M σ such that an observer knowing σ and the outcome of M σ can uniquely determine the internal state to which the HMCQS transitioned. The generator in Figure 9 meets this criterion with M A = M 01 . M B can be any POVM.
For a classical, unifilar HMC, h μ can be calculated as
h_μ = − Σ_{σ_i, σ_j ∈ S} π_i Σ_{x ∈ X} T_{ij}^x log_2 T_{ij}^x ,
where π is the stationary state distribution. This result dates back to the foundations of information theory [7].
Similarly, for a quantum unifilar HMCQS to which one can synchronize, we can write
s = − Σ_{σ_i, σ_j ∈ S} π_i Σ_{x ∈ X} T_{ij}^x log_2 T_{ij}^x .
Let us walk through this logic for the 3-Symbol Quantum Golden Mean. In state A ( π A = 2/3 ), the generator emits a qubit either in state |0⟩ or |1⟩, each with probability 1/2. The density matrix describing this qubit is the maximally mixed state, and any measurement performed on it involves 1 bit of irreducible randomness. If the generator is in state B ( π B = 1/3 ), it emits a qubit in state |+⟩ deterministically. Thus, averaging over the states, the von Neumann entropy rate is s = 2/3 bits/time step.
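The walkthrough amounts to evaluating the unifilar entropy-rate formula term by term. A minimal sketch (our own; the transition labels are read off the walkthrough above):

```python
import numpy as np

# Labeled transition probabilities T_ij^x for the 3-Symbol Quantum Golden
# Mean generator: from state A, emit |0> and stay in A or emit |1> and go
# to B (prob 1/2 each); from state B, emit |+> and return to A (prob 1).
T = {
    "A": {("A", "|0>"): 0.5, ("B", "|1>"): 0.5},
    "B": {("A", "|+>"): 1.0},
}
pi = {"A": 2 / 3, "B": 1 / 3}   # stationary state distribution

# s = -sum_{i,j,x} pi_i T_ij^x log2 T_ij^x
s = -sum(pi[i] * p * np.log2(p) for i in T for p in T[i].values())
assert abs(s - 2 / 3) < 1e-12
```

State A contributes 2/3 × 1 bit and state B contributes 1/3 × 0 bits, recovering s = 2/3 bits/time step.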
Despite this, when restricted to measuring with a repeated rank-one POVM, the entropy rate of the observed classical process cannot reach the lower bound of 2 3 bits per symbol because the minimum entropy basis when in state A is M 01 , while the minimum entropy basis when in state B is M ± . However, an experimenter using the adaptive measurement protocol in Figure 10 would observe symbol sequences with an entropy rate of 2 3 bits per symbol once they have synchronized.
They start by measuring in the M 01 = { |0⟩, |1⟩ } basis and use the outcome of the initial measurement to select a new basis. If the outcome is ‘0’, they are able to synchronize to the process generator, which is necessarily in state A. They continue measuring with POVM M 01 while the generator remains in state A until they observe a ‘1’, at which point the generator transitions to B and they measure with M ± , which is a zero-entropy measurement. They observe a ‘+’ and return to state A.
In this way, when using an adaptive measurement protocol, the recurrent part Y r of the measurement sequence may have a lower entropy rate than h μ Y for any repeated rank-one POVM measurement, even if the associated DQMP uses only rank-one PVMs, as in this example. These ideas are formalized and expanded upon in the next section.

4.6. Unifilar and Nonunifilar Qubit Sources

The last two classes of qubit processes to discuss are generated by the unifilar and nonunifilar qubit sources shown in Figure 11 and Figure 12, respectively. Both of these generators consist of two internal states ( S = { A , B } ) and emit four possible pure-qubit states ( Q = { |0⟩, |1⟩, |+⟩, |−⟩ } ) that form two orthogonal pairs. However, for each internal state of the unifilar qubit source, the two possible emitted states are orthogonal, and one can unambiguously determine the next internal state. In other words, it has the property of quantum unifilarity. The same is not true of nonunifilar qubit sources. The next section illustrates how this difference strongly affects an observer’s ability to synchronize.
By varying the parameter p, one can interpolate between several of the simpler processes already analyzed. Starting with the unifilar qubit source in Figure 11: for p = 0 , the generator becomes a period-2 source that emits the word |ψ_{0+}⟩. For p = 1/2 , we obtain a two-state model that generates the i.i.d. maximally mixed process. And, as p → 1, the generator emits longer strings of either only |1⟩ or only |−⟩ qubits, depending on whether the source is in A or B. At p = 1 , the process becomes nonergodic (here evidenced by the disconnection between internal states), and the source will only emit either |1⟩ or |−⟩ deterministically.
Similarly, the nonunifilar qubit source in Figure 12 simplifies for certain p’s. For p = 0 , we obtain a period-2 source, this time with word |ψ_{01}⟩, and for p = 1/2 , it also generates the i.i.d. maximally mixed process. As p → 1, it emits long sequences of either all |+⟩ or all |−⟩ and becomes nonergodic at p = 1 .

4.7. Unifilar Qutrit Source

Expanding beyond processes over qubits, we consider a single example of a qutrit ( d = 3 ) process whose generator is shown in Figure 13 and that employs a five-state alphabet Q = { |0⟩, |1⟩, |2⟩, |+⟩, |−⟩ }. Using a higher-dimensional Hilbert space makes more measurements available to an observer for synchronization. As a consequence, this example process exhibits behavior that is impossible with qubit processes alone: a subspace of Hilbert space (occupied by |2⟩) is always distinguishable from all other states in Q and can be reserved for synchronization. The other subspace (containing |0⟩, |1⟩, |+⟩, and |−⟩) consists of states that cannot be reliably distinguished. Note that this process is quantum unifilar, but one cannot remain synchronized by measuring in a single basis. The next section discusses multiple adaptive measurement protocols that can do so.

4.8. Discussion

Table 1 summarizes information properties for the above examples with either analytic results or numerical estimates. Together, these examples illustrate a range of different features of separable qudit processes. They demonstrate how the information properties defined and characterized in Section 3 are both indicative of underlying structural features of quantum information sources and strongly influenced by the distinguishability of states in a process’s quantum alphabet. Our analysis of the convergence of the quantum block entropy to its linear asymptote thus gives meaningful and interpretable ways of quantifying the randomness and correlation in a separable quantum process.
We continue by discussing two tasks that an observer might wish to perform when faced with a quantum information source emitting a separable quantum process. First, if they have prior knowledge of the internal structure of the source, they may want to determine the internal state it occupies during a given time step. This task is synchronization, and we discuss it in the next section. Second, if they have no knowledge of the source, they may want to measure the process it produces to infer its internal structure. This task is system identification, and we demonstrate how an observer can use a tomographic protocol to perform it in Section 6.
Table 1. Information properties for example quantum processes/sources. Decimal values were numerically estimated using Equations (38), (45) and (47), with l = 8 for the unifilar qutrit source, l = 10 for the unifilar and nonunifilar qubit sources, and l = 12 for all other processes. Other values were calculated analytically.
Process | s (Bits/Time Step) | |G_q| (Bits) | E_q (Bits) | T_q (Bits × Time Steps) | R_q (Time Steps)
--- | --- | --- | --- | --- | ---
I.I.D. Qubit Process | S(ρ_iid) | 1 − S(ρ_iid) | 0 | 0 | 0
Period-3 Process (ϕ = π) | 0 | 1 | log_2(3) | 2.33 | 3
Period-3 Process (ϕ = π/2) | 0 | 1 | log_2(3) | 4.22 | ∞
Quantum Golden Mean (ϕ = π) | 2/3 | 1/3 | 0.2516 | 1/3 | 1
Quantum Golden Mean (ϕ = π/2) | 0.4495 | 0.5505 | 0.1092 | 0.5687 | ∞
3-Symbol Quantum Golden Mean | 0.6667 | 0.3333 | 0.4652 | 0.8855 | ∞
Unifilar Qubit Source (p = 1/3) | 0.9184 | 0.0816 | 0.0808 | 0.1976 | ∞
Nonunifilar Qubit Source (p = 1/3) | 0.7306 | 0.2614 | 0.3217 | 0.4090 | ∞
Unifilar Qutrit Source | 0.8002 | 0.7848 | 1.290 | 2.156 | ∞

5. Synchronizing to a Quantum Source

How does an observer of a process with knowledge of its quantum generator determine its internal state? When an observer is certain about the internal state, the observer is synchronized to the quantum source. The following explores synchronizing to quantum processes—both the manner in which observations lead to inferring the source’s state and quantitative measures of partial and full synchronization.
Quantum measurement adds subtlety to this task in comparison to the task of synchronizing to a classical process given knowledge of its minimal unifilar model—the ε -machine—as described in Refs. [12,47].

5.1. States of Knowledge

Recall that a hidden Markov chain quantum source (HMCQS) consists of a set of internal states ( S ), a pure-state alphabet ( Q ), and a set of labeled transition matrices ( T ). We assume an observer has complete knowledge of the HMCQS that generates a process but can only infer the internal state at time t = l by applying block measurement M 0 : l and observing outcome y 0 : l . They have no access to the qudits that were emitted before t = 0 .
An observer’s best guess for the internal state of a source given different sequences of observations can be represented as distributions over the source’s internal states, known as mixed states (not to be confused with mixed quantum states, which are represented by density matrices). Classically, after observing a particular length-l word w = x 0 : l , the observer is in the mixed state η(w) = (p_A, p_B, …). After the next observation X l , they will transition to one of a set of new mixed states depending on the outcome: η ( w , x l = 0 ) , η ( w , x l = 1 ) , and so on. The word corresponding to the new mixed state is a concatenation of w and the new observation x l . The set of mixed states for a classical process and the dynamic between them define a process’s mixed-state presentation (MSP), which is unifilar by construction [48].
Using a classical process’s MSP rather than a nonunifilar generator of the process has many computational advantages. Two of interest are that it allows one to calculate the entropy rate for processes without finite unifilar presentations [49] and that it allows one to calculate the uncertainty an observer experiences while attempting to synchronize to a process’s generator [50].
For quantum processes, there is no unique MSP but rather a multiplicity of possible MSPs, each corresponding to a different choice of measurement protocol. For a given source and measurement protocol M , we can define a set of mixed states, each corresponding to a possible measurement sequence one can observe.
Consider the mixed states corresponding to length-l sequences of observations. We restrict M 0 : l to consist of local POVMs, allowing for adaptive measurement. Given that an observer has applied measurement M 0 : l and seen measurement outcomes y 0 : l , their best guess about the generator’s internal state is represented by the conditional distribution η ( y 0 : l | M 0 : l ) = { Pr ( σ | y 0 : l , M 0 : l ) for all σ ∈ S } .
For t = 0 , an observer has no measurement outcomes with which to inform their prediction about the source’s internal state. However, they do know the stationary state distribution π of the model (since it can be calculated directly from T ). This serves as a ‘best guess’ of the source’s internal state absent any measurements. If the initial state distribution S 0 is not π and the observer is aware of this fact, the mixed states for that process are η ( y 0 : l | M , S 0 ) . We omit the conditioning on S 0 if S 0 = π .
For most repeated PVM measurements of qubit processes, there are an uncountably infinite number of mixed states [14]. This measurement-induced complexity appears even for | S | = 2 . The examples in this section step back from this complexity to focus on measurements that are sufficiently informative to allow an observer to synchronize with a finite observation sequence y 0 : l and that result in only a finite or (at most) countably infinite set of recurrent mixed states.

5.2. Average State Uncertainty and Synchronization Information

To compare synchronization behavior for different measurement protocols, it is useful to work with entropic quantities rather than with the set of mixed states and their dynamic directly. In particular, we look at the entropy of the possible mixed states for length-l sequences of measurements.
An observer’s uncertainty about the source’s internal state after applying measurement sequence M 0 : l (according to some protocol M ) and observing outcome y 0 : l is given by the Shannon entropy of the corresponding mixed-state distribution:
H[η(y_{0:l} | M)] = H[Pr(σ | y_{0:l}, M)] = − Σ_{σ ∈ S} Pr(σ | y_{0:l}, M) log_2 Pr(σ | y_{0:l}, M) .
One can average over all measurement outcomes on a block of qudits to find the average state uncertainty:
H(l | M) ≡ ⟨ H[η(Y_{0:l} | M)] ⟩ = Σ_{y_{0:l}} Pr(y_{0:l} | M) H[η(y_{0:l} | M)] .
For a given measurement protocol M , this quantity generally converges to a finite value in the l → ∞ limit. This limit does not necessarily exist if the dynamics are not ergodic, for example, if there is some underlying periodicity. If such a limit exists, the asymptotic state uncertainty is
C(M) = lim_{l→∞} H(l | M) .
Both H ( l | M ) and C ( M ) are measured in bits.
If H [ η ( y 0 : l | M ) ] = 0 , then y 0 : l is a synchronizing observation, and an observer who sees outcome y 0 : l can precisely identify the source’s internal state. If H ( l | M ) = 0 , then any measurement outcome seen when applying protocol M to l qudits allows the observer to synchronize to the source. For classical and quantum processes, the Markov order is the first value of l for which H ( l ) vanishes (for some M 0 : l in the quantum case). This generally does not occur for an HMCQS with nonorthogonal states in Q except in the l → ∞ limit, since R q is generally infinite.
Being synchronized after measuring l qudits does not guarantee that the observer remains synchronized after measuring qudit l + 1 . For classical processes, persistent synchronization requires that the HMC the observer uses satisfies the additional condition of unifilarity. In the quantum setting, synchronization persists if and only if the underlying HMCQS is quantum unifilar and, when the observer knows the internal state is σ , their measurement protocol ensures they apply the measurement for which the internal state at time t + 1 is completely determined.
This is equivalent to the statement ‘There exists some protocol M with measurements M 0 : l and M l such that
H[Pr(σ | y_{0:l}, M_{0:l})] = 0 ⟹ H[Pr(σ | y_{0:l} y_l, M_{0:l} M_l)] = 0 ,
for all y_l ∈ Y ’.
The measurement that maintains synchronization when the source is in one internal state ( σ i ) does not need to be the same as the measurement that maintains synchronization when the source is in another internal state ( σ j ). When the measurement required to maintain synchronization depends upon the HMCQS’s current state, adaptive measurement protocols are capable of maintaining synchronization even when no fixed-basis measurement can, as the following demonstrates with multiple examples.
Finally, the total amount of state uncertainty that an observer encounters while synchronizing to a process using a given measurement protocol M is the synchronization information, which is given by
S(M) = Σ_{l=0}^{∞} H(l | M) .
Note that, if the asymptotic state uncertainty C ( M ) is greater than 0, S ( M ) is infinite. When C ( M ) = 0 , we can estimate S ( M ) by terminating the sum at some finite l.
For classical processes, the synchronization information is closely tied to the transient information T . We will similarly connect S ( M ) to the quantum transient information T q and use it as a way to compare synchronization via different measurement protocols.
The remainder of this section explores synchronization to various models from Section 4. We pair each with a variety of measurements, both repeated POVMs and adaptive protocols, to demonstrate the way H ( l | M ) is affected by this choice.

5.3. Synchronizing to Quantum Presentations of Classical Processes

An HMCQS with a qudit alphabet consisting entirely of orthogonal states ( ⟨ψ_x|ψ_{x′}⟩ = 0 for all distinct |ψ_x⟩, |ψ_{x′}⟩ ∈ Q ) is a quantum presentation of the underlying classical process X . An observer can perform a measurement using the orthogonal projectors P_x = |ψ_x⟩⟨ψ_x| for all |ψ_x⟩ ∈ Q that unambiguously discriminates between the pure qudit states the source emits.
The task of synchronizing to such an HMCQS is equivalent to synchronizing to a source emitting classical symbols. If it is quantum unifilar, then the underlying classical HMC is unifilar, and synchronization is exponentially fast (on average) [12,47]. Absent unifilarity, an observer may not be able to synchronize even asymptotically—i.e., C ( M ) > 0 for all M —despite measuring with orthogonal projectors.

5.4. Synchronizing to Periodic Processes

All periodic quantum processes have a vanishing von Neumann entropy rate ( s = 0 ), and each internal state has only one incoming and one outgoing transition. Due to these simplifications, to synchronize, an observer simply determines the source’s phase, i.e., which of the p internal states the source occupies at time t. Once this phase is determined for any t, it is determined for all other t. The initial phase uncertainty is equivalent to the initial state uncertainty H [ π ] = log 2 ( p ) , since π is uniform.
Any measurement protocol M that distinguishes between states in Q gives information about the phase and allows an observer to synchronize to the source asymptotically. If two states in Q cannot be distinguished by M , then synchronization may not be possible, even with an infinite number of measurements.
For classical periodic processes, an observer synchronizes at the Markov order l = p , and the synchronization information and the transient information are equal: S = T [11]. In contrast, quantum periodic processes generically have infinite Markov order if there are nonorthogonal states in Q , and they only synchronize asymptotically. Thus, S ( M ) ≥ T q . The condition for equality is that M is the optimal measurement over the process. For R q = ∞, this means that M must be a global measurement over the bi-infinite chain of qudits.
Figure 14 shows the average state uncertainty for an observer measuring three different period-5 qubit processes consisting of two alphabet states: | 0 and | ψ ( ϕ = 3 π / 4 ) . All other period-5 sequences with this qubit alphabet are equivalent to these three under shift and swap symmetries. As expected, H ( l ) decreases from H ( 0 ) = log 2 5 to C = 0 for all combinations of sequence and measurement basis. That noted, the rate at which synchronization occurs—and the total amount of uncertainty seen by an observer, the synchronization information—depends on both the particular sequence and the particular repeated measurement applied to it.
For the period-5 process with word ‘ 0000 ψ ’, T q ≈ 6.32 bits × time steps, using Equation (47) with l = 12 . When measuring it in basis M 01 ( M ϕ ), we find that S ( M 01 ) ≈ 7.92 bits ( S ( M ϕ ) ≈ 9.30 bits), also estimated by summing up to l = 12 . The bound between S ( M ) and T q holds similarly for the other period-5 words. For word ‘ 000 ψ ψ ’, T q ≈ 4.86 bits × time steps, S ( M 01 ) ≈ 6.84 bits, and S ( M ϕ ) ≈ 7.05 bits. For word ‘ 00 ψ 0 ψ ’, T q ≈ 5.51 bits × time steps, S ( M 01 ) ≈ 7.39 bits, and S ( M ϕ ) ≈ 7.47 bits.
At this point, we wish to emphasize that s, E q , and R q are equal for generic period-5 processes with nonorthogonal alphabets Q . Despite this, T q and S ( M ) identify physically relevant differences between these processes for the task of synchronization. These differences are intrinsic to the quantum process itself ( T q ) and also appear within the measurement outcomes an observer obtains ( S ( M ) ).
As a final comment on periodic processes, we note that while s = 0 for all periodic quantum processes, many measurement protocols (even those for which C = 0) give a measured classical process with a nonzero entropy rate h_μ^Y. In fact, no fixed-basis measurement of a periodic process results in an observed process with h_μ^Y = 0 unless (i) all states in Q are orthogonal and (ii) the measurement is in an orthogonal basis that includes one projector for each state in Q.
In contrast, an observer using an adaptive measurement protocol can easily have zero uncertainty in the measurement outcomes once synchronized. For example, Figure 15 shows the recurrent states for a DQMP that switches between two different measurements—M_01 and M_ϕ—depending on the internal state of the source. The sequence of measurement outcomes observed is deterministic, and the recurrent measured process is a period-5 process with word ‘00ϕ0ϕ’ and h_μ^{Y,r} = 0.

5.5. Synchronizing with PVMs

The | 0 - | + Quantum Golden Mean process has multiple synchronizing observations; see the generator in Figure 6. Measuring with M 01 and seeing a ‘1’ synchronizes the observer to internal state B; measuring with M ± and seeing a ‘−’ synchronizes the observer to internal state A. Let us consider the mixed states produced by these two repeated PVMs in turn, starting with M 01 .
Figure 16 shows the mixed states for the measured process obtained by repeatedly applying M_01. Observing synchronizing measurement y = 1 means that the source just transitioned to state B while emitting a |+⟩ qubit, i.e., H[η(1)] = 0. Additionally, the source in state B can only transition to state A while emitting a |0⟩ qubit, so the observer will see a ‘0’, and H[η(10)] = 0. However, once the source is in state A, an observer easily desynchronizes from the source if they observe another ‘0’, as this outcome is consistent with both transitions out of state A. As an observer sees more ‘0’s, they transition to mixed states further to the right of Figure 16. To summarize, measuring with M_01 makes use of synchronizing observations ‘1’ and ‘10’, but its MSP has a countably infinite number of recurrent states corresponding to sequences of n ‘0’s.
This measured process is an infinite-state classical renewal process whose information properties can be estimated using methods from Ref. [51]. After seeing n ‘0’s in a row, the next observation will be ‘1’ with the following probability:
Pr(1 | 0^n) = (1/4) Pr(A | 0^n) = (3/16) (1 − (−1/3)^n).
We find h_μ^Y ≈ 0.60 bits per symbol, E^Y ≈ 0.053 bits, and a single-symbol entropy of H[1] ≈ 0.65 bits. E^Y’s small value and the fact that h_μ^Y is not significantly lower than H[l = 1] indicate that the infinite-state renewal process presentation provides only a small predictive advantage over using a biased coin with Pr(0) = 5/6. This is further evidenced by how Pr(B | 0^n) = (1/4)(1 − (−1/3)^{n−1}) converges exponentially quickly to its asymptotic value of 1/4.
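These renewal-process values can be checked numerically by truncating the countably infinite state space and using the next-symbol probability Pr(1 | 0^n) given in the text. A minimal sketch (the truncation level N = 400 is an arbitrary choice; the stationary weights decay roughly like (13/16)^n, so the tail is negligible):

```python
import math

# Next-symbol probability of the renewal process induced by measuring the
# |0>-|+> Quantum Golden Mean with M_01: after n '0's since the last '1',
# Pr(1|0^n) = (3/16) * (1 - (-1/3)^n).
def p_one(n):
    return (3 / 16) * (1 - (-1 / 3) ** n)

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Stationary weight of the mixed state "n zeros since the last '1'" is
# proportional to the survival probability S_n = prod_{k<n} (1 - p_one(k)).
N = 400
S = [1.0]
for n in range(N):
    S.append(S[-1] * (1 - p_one(n)))
Z = sum(S)               # mean recurrence time of symbol '1'
pi = [s / Z for s in S]  # stationary distribution over mixed states

# Entropy rate: state-averaged uncertainty of the next measured symbol.
h = sum(pi[n] * binary_entropy(p_one(n)) for n in range(N + 1))
H1 = binary_entropy(1 / Z)  # single-symbol entropy; Pr('1') = 1/Z
```

Running this recovers the quoted values h_μ^Y ≈ 0.60 bits per symbol and H[1] ≈ 0.65 bits, with Pr(0) ≈ 5/6.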
Measuring with M_±, symbol ‘−’ is a synchronizing observation and indicates the source is in state A. The recurrent mixed states for this observed process are shown in Figure 17, and they also form a classical renewal process. Observing n ‘+’s since the last ‘−’, the probability that the generator is in state A is Pr(A | +^n) = (3/5)(1 − (−2/3)^{n+1}). The measured process obtained with M_± has h_μ^Y ≈ 0.90 bits per symbol, E^Y ≈ 0.020 bits, and H[l = 1] ≈ 0.91 bits. Again, the infinite-state MSP provides only a small predictive advantage over a biased coin with Pr(+) = 2/3. Also, Pr(A | +^n) converges exponentially quickly to its asymptotic value of 3/5.
Applying either of these two fixed-basis measurements gives an average state uncertainty that decreases monotonically with l, as shown in Figure 18. Since this source is not quantum unifilar, an observer repeatedly synchronizes and desynchronizes while measuring this process regardless of basis, and H(l) does not approach 0 as l → ∞. We find that C(M_01) ≈ 0.62 bits and C(M_±) ≈ 0.54 bits. Measuring with M_± not only results in less state uncertainty asymptotically but also results in less average uncertainty for all values of l.
Figure 18 also displays the average state uncertainty for two other relevant PVMs: M θ for θ = π / 4 and θ = 3 π / 4 . Recall that M θ is the PVM consisting of projectors onto orthogonal states | ψ ( θ ) and | ψ ( θ + π ) . For comparison, M 01 corresponds to θ = 0 , and M ± corresponds to θ = π / 2 .
For θ = π/4, the symbol states |0⟩ and |+⟩ give the observer the exact same distribution of measurement outcomes, and the observer cannot gain any information about the state of the source beyond the stationary state distribution. This can also be seen as the maximum of the asymptotic state uncertainty in Figure 19.
For θ = 3π/4—and for the majority of values of θ—M_θ has no synchronizing observations. Despite this, Figure 18 demonstrates that H(l | M_{θ=3π/4}) is lower for all l than for both bases that can exactly synchronize.
A measured process for generic θ does not have the renewal process structure of Figure 16 and Figure 17. Instead, in the absence of synchronizing observations, the number of mixed states generically grows exponentially with l. If we approximate the MSP using length-l observations, then there will be |Y|^l mixed states—one for each possible sequence of measurements—each corresponding to a different distribution over the source’s internal states. These infinite-state MSPs can be characterized by their statistical complexity dimension [52].
Despite this explosive growth in complexity, many of these PVMs give lower average state uncertainties than M_01 and M_±; θ = 3π/4 is a representative example. The asymptotic state uncertainty when applying M_θ is shown in Figure 19. Note that the maximum value C(M_θ) = H[π] occurs at θ = π/4, as discussed. The minimum asymptotic state uncertainty for this set of PVMs is C ≈ 0.49 bits, which occurs for θ ≈ 2.01. An observer may thus choose a basis whose MSP has exponentially many mixed states in order to lower their uncertainty in the source’s internal state, at the cost of having to track that larger set of mixed states.

5.6. Maintaining Synchrony with Adaptive Measurement

Adaptive measurement protocols are also capable of maintaining synchronization when no fixed-basis measurement can. To appreciate this, consider Figure 11’s unifilar qubit source.
For 0 < p < 1, there are no synchronizing observations; however, almost any measurement reduces the average state uncertainty below H[π]. Additionally, if an observer comes to know the source’s state by other means (perhaps the source is initialized in state A at time t = 0), then they are able to maintain synchronization with a simple adaptive measurement protocol M_adaptive.
This protocol, defined here only for the recurrent states, is as follows: if the source is in state A, apply measurement M_01, and if the source is in state B, apply measurement M_±. It can be defined this simply, while still maintaining synchronization, only because the source is quantum unifilar.
Figure 20 shows the behavior of H ( l ) as the observer desynchronizes from the source initialized in state A. For each repeated PVM measurement, it eventually reaches an asymptotic value, though H ( l ) may be nonmonotonic.

5.7. Synchronizing to a Qutrit Source

A qubit source ( d = 2 ) presents limited opportunities for unambiguous state discrimination and, hence, for synchronization. Qutrits allow for more general behavior and have been suggested for applications in quantum communication networks where one state is reserved specifically for synchronization [53].
Let us apply this logic to the unifilar qutrit source in Figure 13. The synchronizing observation sequences for this process are ‘2’, ‘1+’, and ‘1−’, which definitively place the source in states A, B, and C, respectively. Once synchronized, an observer may remain synchronized by measuring with M_012 = {|0⟩⟨0|, |1⟩⟨1|, |2⟩⟨2|} when in state A, M_±2 = {|+⟩⟨+|, |−⟩⟨−|, |2⟩⟨2|} when in state B, and either of the above when in state C.
We explored synchronizing to this source with five different measurement protocols and compare them in Figure 21. The first two are repeated PVMs in the M_012 basis and the M_±2 basis. The other three are adaptive measurement protocols that share a recurrent dynamic but have different transient states. Consider measuring in the M_012 (M_±2) basis until observing a ‘2’ and therefore synchronizing to source state A. Then, use M_012 when the source is in state A and M_±2 when the source is in state B or C. We refer to this protocol as M_{012,sync} (M_{±2,sync}). The fifth protocol M_adaptive is defined by the DQMP in Figure 22 that uses an adaptive protocol over three transient mixed states in addition to the protocol just defined for the recurrent states.
The average state uncertainties for an observer implementing these five measurement protocols are shown in Figure 21. The fixed-basis measurements do not lead to persistent synchronization and have a nonzero asymptotic state uncertainty (C(M_012) ≈ 0.40 bits and C(M_±2) ≈ 0.72 bits). If one measures in a fixed basis until synchronizing (by observing a ‘2’) and then takes advantage of the quantum unifilarity of the source to stay synchronized, the asymptotic state uncertainty vanishes. This is the case for M_{012,sync} and M_{±2,sync}, which have synchronization information values of S(M_{012,sync}) ≈ 3.91 bits and S(M_{±2,sync}) ≈ 3.60 bits. The extra complexity of the measurement protocol M_adaptive admits additional synchronizing words (‘1+’ and ‘1−’) and lower synchronization information (S(M_adaptive) ≈ 3.00 bits) than the simpler strategy of waiting to see a ‘2’. Note that these three synchronization information values are all greater than our estimate of the quantum transient information for this process (T_q ≈ 2.16 bits × symbols). These values of S(M) were estimated using l = 10.

5.8. Discussion

This section has detailed the process of synchronizing to a known qudit source. In contrast to sources of classical random variables, an observer may attempt to synchronize to a qudit source using a variety of measurement protocols. We can compare different protocols with two informational quantities introduced above: the asymptotic state uncertainty C ( M ) and the synchronization information S ( M ) .
For sources that are not quantum unifilar, no protocol can remain synchronized to the source. Nevertheless, for different measurement protocols M 0 and M 1 , we can compare C ( M 0 ) and C ( M 1 ) . The protocol with a lower value is better at the task of synchronization in the sense that an observer’s uncertainty in the source’s internal state will be lower on average.
This leads to a natural question: What is the best measurement protocol for synchronizing to a source? To answer it, we introduce a new protocol-independent property of a qudit process, the minimal asymptotic state uncertainty:
C_min = min_M C(M),
where the minimum is taken over all possible measurement protocols defined via DQMPs. In practice, determining C_min for a quantum process requires a proof that no measurement protocol can achieve a lower value. Fully exploring the space of measurement protocols is beyond the present scope. Nevertheless, we introduced candidates for C_min in the above examples and found the minimal asymptotic state uncertainty for a restricted class of repeated PVMs numerically; recall Figure 19.
For sources that are quantum unifilar, we explored several protocols that are capable of persistent synchronization. For such processes, C_min = 0 bits. We can compare different synchronizing measurement protocols M_0 and M_1 through their synchronization information values S(M_0) and S(M_1). A lower value means that the observer experiences less state uncertainty while synchronizing. What is the minimal amount of state uncertainty an observer can experience? We define the minimal synchronization information for a quantum process as
S_min = min_M S(M).
This minimum is also taken over all DQMPs, and establishing that a given protocol achieves it is similarly nontrivial.
One open question prompted by this work is ‘Is it always possible to synchronize to a source that is quantum unifilar?’ Equivalently, ‘Does C_min = 0 for all processes generated by quantum unifilar sources?’ Answering these questions will also require a greater understanding of the space of DQMPs. Progress may involve proving the existence or nonexistence of a protocol that is able to synchronize to the unifilar qubit source in Figure 11.
Finally, we recount several reasons why synchronization is an important task not only for determining a source’s internal state but also for improving predictions of future measurement outcomes. Generally, HMCQSs have inherent stochasticity. Periodic sources are an exception. Through synchronization, we may substantially reduce the uncertainty in measurement outcomes. In the extreme case where a source’s internal state has only one possible transition, this uncertainty vanishes, and we can measure the next qudit with a PVM which has a deterministic outcome.
For example, if we know a | 0 - | + Quantum Golden Mean generator is in state B, the next qudit is | 0 , and we will always see ‘0’ if we apply the measurement M 01 . Thus, we consider synchronization as a form of dynamical inference where an observer uses knowledge of both a source’s internal structure and a sequence of measurement outcomes to inform a more accurate prediction of future measurement outcomes. Each mixed state corresponds to a different prediction.

6. Quantum Process System Identification

Synchronization is a task that is performed when an observer has an accurate model (here, a HMCQS) of the quantum information source. The next natural question is the following: How does an observer create an accurate model of an unknown quantum information source? This section begins to answer this question. It starts by briefly reviewing how one infers a classical information source from data. It then discusses how to identify the state of a single qudit with quantum state tomography. By combining these two ideas we arrive at an inference method for stationary qudit sequences.

6.1. Classical System Identification

An observer of an unknown stationary classical process is limited to using only the available data—distributions of words over the symbol alphabet X—to infer the source’s structure. We denote the distribution of length-l words as P(l) = {Pr(w) : w ∈ X^l}.
For l = 1, we obtain a distribution over symbols in X. With P(1), we can reconstruct a memoryless (i.i.d.) model of the source with one internal state, where each Pr(x), x ∈ X, is obtained directly from P(1).
For l = 2, we use P(2) to create conditional probability distributions for the next symbol conditioned on the previous one, i.e., Pr(x_1 | x_0) for all x_0, x_1 ∈ X. From these conditional distributions, we may construct a Markov approximation of the source with |X| states, one corresponding to each symbol. The conditional distributions set the transition matrix of the Markov approximation, i.e., Pr(x_1 | x_0) = T_{x_0, x_1}.
Similarly, for l > 2, we obtain a length-l HMM approximation of the source by conditioning on length-(l − 1) words, each of which corresponds to a different internal state. In this simplified picture, the number of internal states of the model grows exponentially with l, as the number of possible words of length l is |X|^l. This leads to numerical problems when inferring processes with correlations over long periods of time. That said, there are methods both for merging states that make identical predictions of future symbols and for performing inference over machine topologies with a fixed number of states—known as Bayesian Structural Inference [54]—to determine the most likely ε-machine for the source.
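The l = 2 Markov approximation above is direct to implement. A minimal sketch, where the sample sequence is an illustrative toy (from a Golden Mean-like source with no consecutive ‘1’s) rather than data from the paper:

```python
from collections import Counter

def markov_approximation(sequence, alphabet):
    """Order-1 Markov model: estimate T[x0][x1] ~ Pr(x1|x0) from the
    empirical length-2 word distribution P(2)."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    total = sum(pair_counts.values())
    P2 = {w: c / total for w, c in pair_counts.items()}
    T = {}
    for x0 in alphabet:
        row = sum(P2.get((x0, x1), 0.0) for x1 in alphabet)
        T[x0] = {x1: P2.get((x0, x1), 0.0) / row if row else 0.0
                 for x1 in alphabet}
    return T

# Toy realization; real inference needs a much longer sample.
seq = "0010010001010010"
T = markov_approximation(seq, "01")
```

Since the sample never contains ‘11’, the estimated transition matrix has T_{1,1} = 0, as the Golden Mean structure requires.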

6.2. Tomography of a Qudit

We cannot directly apply the procedure for inferring classical source dynamics from word distributions to qudit processes, since observations depend on the measurement basis/protocol one uses. To fully characterize a stationary quantum source, we must instead take many measurements in different bases to reconstruct the qudit density matrices. This task is known as quantum state tomography. We begin by inferring an individual qudit state ρ 0 before introducing a general method for quantum system identification using separable sequences of qudits.
Tomographic reconstruction of a single unknown qudit density matrix ρ 0 through measurement is challenging for two main reasons:
  • ρ_0’s complete description requires a number of parameters that scales quadratically with the Hilbert-space dimension d (and, hence, exponentially with the number of constituent qubits).
  • Quantum measurement is probabilistic, so one must prepare and measure many copies of ρ 0 to estimate a single parameter.
Specific combinations of measurements are particularly useful for this task. For example, ρ 0 can be inferred by measuring with a set of mutually unbiased bases (MUBs) [3] or a single informationally complete POVM (IC-POVM) [4]. For a qubit, one possible MUB consists of the x, y, and z bases. By measuring many copies of the qubit in each of these three bases, one obtains three probabilities— Pr ( + x ) , Pr ( + y ) , and Pr ( + z ) —that uniquely determine the density matrix ρ 0 . We do not discuss the necessary number of measurements to determine these parameters to within a desired tolerance for general qudit tomography, a question which is well studied [33,55].
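For the qubit MUB case just described, the three probabilities fix the Bloch vector r_i = 2 Pr(+i) − 1, and ρ_0 = (I + r·σ)/2. A minimal sketch (the test state is hypothetical):

```python
import numpy as np

# Pauli matrices; the +x, +y, +z projectors are (I + sigma_i)/2.
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def mub_reconstruct(p_x, p_y, p_z):
    """Reconstruct a qubit state from Pr(+x), Pr(+y), Pr(+z)."""
    r = [2 * p_x - 1, 2 * p_y - 1, 2 * p_z - 1]  # Bloch vector
    return 0.5 * (I2 + r[0] * sx + r[1] * sy + r[2] * sz)

# Forward-simulate the statistics for a known test state, then invert.
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)
probs = [np.real(np.trace(rho @ (0.5 * (I2 + s)))) for s in (sx, sy, sz)]
rho_hat = mub_reconstruct(*probs)
```

The inversion is exact for any valid qubit density matrix, since {I, σ_x, σ_y, σ_z} spans the Hermitian 2 × 2 matrices.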
An example of an IC-POVM for a qubit consists of the projectors onto states:
|ϕ_1⟩ = |0⟩, |ϕ_2⟩ = (1/√3)|0⟩ + √(2/3)|1⟩, |ϕ_3⟩ = (1/√3)|0⟩ + √(2/3) e^{i2π/3}|1⟩, and |ϕ_4⟩ = (1/√3)|0⟩ + √(2/3) e^{i4π/3}|1⟩.
This is also a symmetric IC-POVM, or SIC-POVM, because every pair of distinct projectors has the same overlap: |⟨ϕ_i|ϕ_j⟩|² = 1/3 for i ≠ j.
By measuring many identical copies of ρ 0 with the same IC-POVM, one obtains a probability distribution over the d 2 possible measurement outcomes. This provides d 2 1 parameters (due to normalization) that uniquely determine the density matrix ρ 0 .
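This inversion can be sketched for the qubit SIC-POVM above using the standard linear reconstruction formula ρ = Σ_y (d(d+1) p_y − 1) Π_y / d (the test state is hypothetical):

```python
import numpy as np

# Qubit SIC-POVM from Equation (52): effects E_y = |phi_y><phi_y| / 2.
w = np.exp(2j * np.pi / 3)
kets = [np.array([1, 0], dtype=complex)] + \
       [np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w**k]) for k in range(3)]
effects = [0.5 * np.outer(k, k.conj()) for k in kets]
assert np.allclose(sum(effects), np.eye(2))  # completeness of the POVM

def sic_reconstruct(probs):
    # Standard SIC inversion for d = 2: rho = sum_y (6 p_y - 1) Pi_y / 2.
    return sum((6 * p - 1) * np.outer(k, k.conj()) / 2
               for p, k in zip(probs, kets))

# Hypothetical test state; simulate outcome statistics, then invert.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
probs = [np.real(np.trace(E @ rho)) for E in effects]
rho_hat = sic_reconstruct(probs)
```

In practice the four probabilities are estimated from repeated measurements, so the reconstruction carries statistical error; here the inversion is exact because the probabilities are computed directly from ρ.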
The existence and properties of SIC-POVMs in higher-dimensional Hilbert spaces is an active area of research [56].

6.3. Tomography of a Qudit Process

Now that we know how to estimate the density matrix for an individual qudit, we can begin analyzing length-l density matrices ρ_{0:l}. We will measure each qudit in turn rather than perform a joint measurement over the entire length-l block of qudits. This is a luxury afforded to us because we are focusing on separable qudit sequences—the generic case of entangled qudit processes requires measuring in nonlocal bases.
For l = 1 , we reconstruct ρ 0 as described above and obtain a memoryless (i.i.d.) estimate of the source that emits qudit ρ 0 at every time step, following Equation (4). Unless ρ 0 is a pure state, there are many single-state HMCQSs that generate this process, since many different pure-state ensembles correspond to the same density matrix. A unique memoryless model of the source may be obtained by diagonalizing ρ 0 and having the source emit each pure eigenstate | ψ i with probability equal to the corresponding eigenvalue λ i .
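The eigen-ensemble construction is a one-line diagonalization. A minimal sketch, taking for illustration ρ_0 = (2/3)|0⟩⟨0| + (1/3)|+⟩⟨+| (the one-qubit state of the |0⟩-|+⟩ Quantum Golden Mean):

```python
import numpy as np

# rho_0 = (2/3)|0><0| + (1/3)|+><+| as a matrix.
rho0 = np.array([[5 / 6, 1 / 6], [1 / 6, 1 / 6]])

# Diagonalize; the unique memoryless model emits eigenstate |psi_i>
# with probability lambda_i at every time step.
evals, evecs = np.linalg.eigh(rho0)
ensemble = [(lam, evecs[:, i]) for i, lam in enumerate(evals) if lam > 1e-12]

# The eigen-ensemble reproduces rho_0 exactly.
rho_check = sum(lam * np.outer(v, v.conj()) for lam, v in ensemble)
```

Note that the eigen-ensemble generally differs from the ensemble the source actually uses (here {|0⟩, |+⟩}); both are valid single-state HMCQSs for the same i.i.d. approximation.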
For l = 2, we must reconstruct the two-qudit density matrix ρ_{0:2}. (Recall that our indexing is left-inclusive and right-exclusive; therefore, ρ_{0:2} is the joint state of the qudits for t = 0, 1.) Due to stationarity, ρ_{0:2} must be consistent with the one-qudit marginals, i.e., tr_0(ρ_{0:2}) = ρ_1 = ρ_0 = tr_1(ρ_{0:2}).
We will describe the iterative procedure for reconstructing ρ_{0:l} for qubits in detail. When d = 2, ρ_{0:2} has 15 real parameters that must be determined via tomography. Tomography on the one-qubit marginals determines three parameters, and the condition of stationarity fixes three more. The state can be reconstructed fully by considering combinations of the set of mutually unbiased measurements. For two qubits, this means the 16 tensor products of Pauli matrices (σ_I ⊗ σ_x, σ_x ⊗ σ_x, σ_x ⊗ σ_y, and so on) [57]. To fully characterize ρ_{0:2}, nine of these expectation values must be determined—those not involving the identity operators, which are fixed by the one-qubit marginals. For d > 2, this procedure can be modified by using a set of mutually unbiased bases in that higher-dimensional Hilbert space.
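The two-qubit Pauli reconstruction ρ = (1/4) Σ_{i,j} tr(ρ σ_i ⊗ σ_j) σ_i ⊗ σ_j can be sketched directly. The test state below is a ρ_{0:2} built from assumed Golden Mean word probabilities (1/3 each for 00, 0+, +0); these are illustrative but consistent with Pr(|0⟩) = 2/3:

```python
import numpy as np

paulis = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]

def pauli_reconstruct(rho):
    """Rebuild a two-qubit state from its 16 Pauli expectation values."""
    out = np.zeros((4, 4), dtype=complex)
    for a in paulis:
        for b in paulis:
            op = np.kron(a, b)
            out += np.real(np.trace(rho @ op)) * op / 4
    return out

# Assumed two-qubit marginal: equal mixture of |00>, |0+>, |+0>.
zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)
words = [np.kron(zero, zero), np.kron(zero, plus), np.kron(plus, zero)]
rho02 = sum(np.outer(v, v.conj()) for v in words) / 3
rho_hat = pauli_reconstruct(rho02)
```

In an experiment, each of the nine non-identity expectation values would be estimated from repeated local measurements; the identity-containing terms come for free from the marginals.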
After determining ρ_{0:2}, one can continue on to determine ρ_{0:3} (63 real parameters for qubits). Many of these parameters are fixed by the previous tomography on the one-qubit marginals (3 parameters) and the two-qubit marginals (15 parameters) and by their stationarity conditions (6 and 15 parameters, respectively). Combinations of three one-site Pauli matrices are sufficient for full reconstruction. One may continue this procedure for larger l, typically until the number of measurements becomes experimentally infeasible.

6.4. Cost of I.I.D.

Quantum information sources are often assumed to be i.i.d. [6,33]. If an observer performing quantum state tomography assumes that an unknown quantum information source is i.i.d., then they will not go beyond determining the one-qudit marginal ρ_0. If the qudits are instead correlated, this leads to an overestimate of the source’s randomness: they will erroneously conclude that each qudit is in state ρ_0 and that the entropy rate is S(ρ_0) bits per time step. The latter overestimates the true entropy rate by the difference S(1) − s. An observer can obtain better estimates for the entropy rate and other informational quantities by following the above procedure and tomographically reconstructing blocks of qudits of length l.
To demonstrate the degree to which this assumption may mislead an observer, consider the process generated by the nonunifilar qubit source in Figure 12. Assume that the source begins in its stationary state distribution: (p_A = 1/2, p_B = 1/2). For any p such that 0 ≤ p < 1, ρ_0 is the maximally mixed state, and an observer assuming an i.i.d. process estimates that s = 1 bit per time step. This is only an accurate description of the source for p = 1/2, whereas, for many values of p, this source has significant correlations between subsequent qubits.
Let us take a closer look at the extreme values of p. For p = 0, the process is period-2 with s = 0 bits per time step and E_q = 1 bit. This pattern can be easily detected by measuring in the {|0⟩, |1⟩} basis, where a measurement of 0 (1) immediately synchronizes an observer to state B (A). As p → 1, the two source states become increasingly disconnected, s → 0, and E_q → 1. An observer measuring in the {|+⟩, |−⟩} basis who observes a + (−) is likely to measure another + (−). This source’s rich and varied behavior at different values of p goes entirely unappreciated when considering only the one-qubit density matrix ρ_0.

6.5. Finite-Length Estimation of Information Properties

We just saw an example of how, if an observer assumes a source is i.i.d., they will generally underestimate the structure and correlation of the quantum process and overestimate its entropy rate. The same is true if they only perform tomography on blocks of qudits up to a finite length l. Consider now an experimenter who tomographically reconstructs the density matrix ρ_{0:l} and then assumes there are no additional (longer-range) correlations within the qudit process.
Their estimates for the quantum entropy rate, quantum excess entropy, and quantum transient information are given by Equations (38), (45), and (47), respectively.
The difference between the length- estimate of an information property and its true value depends on the process’s internal structure and quantum alphabet. The estimates for the nonunifilar and unifilar qubit sources in Figure 23 and Figure 24 represent two extremes in this respect. As previously discussed, the single-qubit density matrix for the nonunifilar qubit source is the maximally mixed state. If an observer instead reconstructs ρ 0 : 2 , they significantly improve their estimate of s, E q , and T q . However, for l > 2 , these estimates do not improve dramatically. There is always a tradeoff between the number of experiments necessary to reconstruct the process tomographically and the accuracy of the estimates obtained from that reconstruction. For this source, l = 2 strikes a balance between those two resources.
In contrast, for the unifilar qubit source, the estimates of s, E_q, and T_q improve steadily as l increases. Longer-range correlations are captured by increasing the length of the reconstructed density matrices ρ_{0:l}. When observing this source, it is likely worth finding ρ_{0:l} for the largest l that is experimentally feasible.
When faced with an unknown quantum source, how does one pick an appropriate value of l? One strategy is to increase l until the correction made to the relevant information quantities by going from l to l + 1 falls below some threshold. Stationarity (and the resulting concavity of the quantum block entropy) ensures that future corrections will also be below that threshold.

6.6. Tomography with a Known Quantum Alphabet

If an observer has additional knowledge of what possible pure states an HMCQS may emit (i.e., the quantum alphabet Q ), they can leverage this knowledge to simplify the task of system identification by inferring the word probabilities of the underlying classical process X rather than performing full tomographic reconstruction. For qubits, we can represent this simplification geometrically via the Bloch sphere; see Figure 25. Each point on the surface of the Bloch sphere represents a pure qubit state in H 2 —such as | 0 and | 1 on the poles of the z axis—and each interior point represents a possible qubit density matrix. The following assumes that each element of Q is unique.

6.6.1. l = 1

The length-1 density matrix ρ 0 must satisfy the equation
ρ_0 = Σ_{|ψ_x⟩ ∈ Q} Pr(|ψ_x⟩) |ψ_x⟩⟨ψ_x|.
After finding ρ_0 via tomography, one may rearrange this equation to infer the length-1 word distribution of X, given that Pr(X = x) = Pr(|ψ_x⟩). The feasibility of this task depends on the relationship between |Q| and d. For the qubit case (d = 2), one can uniquely infer Pr(X_0) if |Q| ≤ 3, assuming no degenerate states in Q.
When Q is known, it may not be necessary to fully reconstruct ρ_0. We first present several simple examples before giving a general algorithm for finding ρ_{0:l} without performing full-state tomography.
For d = 2 and |Q| = 2, the possible values of ρ_0 are restricted to a chord within the Bloch sphere defined by ρ_0 = p|ψ_0⟩⟨ψ_0| + (1 − p)|ψ_1⟩⟨ψ_1| with 0 < p < 1. One needs only to determine the parameter p rather than reconstruct ρ_0 in its entirety. An observer can also pick a uniquely informative measurement to determine p. The optimal PVM for doing so is one whose antipodal projectors are connected by the diameter of the Bloch sphere that runs parallel to the line of possible values of ρ_0. For example, if Q = {|0⟩, |+⟩}, then the set of possible density matrices lies on the line segment in Figure 25a. The best PVM is then M_θ with θ = 3π/4 and measurement outcomes y_0 (corresponding to the projector onto the pure state |ψ(θ = 3π/4)⟩) and y_1 (corresponding to the orthogonal projector). In this case, it can easily be shown that p = (√2 + 1)/2 − √2 Pr(y_0 | ρ_0) and that cos²(3π/8) ≤ Pr(y_0 | ρ_0) ≤ sin²(3π/8). M_01 and M_± would also be able to determine p but require more samples to determine p to within some desired tolerance.
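This chord inversion is easy to verify numerically, assuming the Bloch-angle convention |ψ(θ)⟩ = cos(θ/2)|0⟩ + sin(θ/2)|1⟩ for the M_θ projectors:

```python
import numpy as np

zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)

# Projector for outcome y0 of M_theta with theta = 3*pi/4.
theta = 3 * np.pi / 4
psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
proj_y0 = np.outer(psi, psi)

# For several mixtures rho_0 = p|0><0| + (1-p)|+><+|, invert
# p = (sqrt(2)+1)/2 - sqrt(2) * Pr(y0|rho_0) and check recovery.
for p_true in (0.2, 0.5, 0.9):
    rho = p_true * np.outer(zero, zero) + (1 - p_true) * np.outer(plus, plus)
    pr_y0 = np.real(np.trace(rho @ proj_y0))
    p_est = (np.sqrt(2) + 1) / 2 - np.sqrt(2) * pr_y0
    assert abs(p_est - p_true) < 1e-12
```

The inversion works because this diameter of the Bloch sphere is parallel to the |0⟩-|+⟩ chord, so Pr(y_0) varies linearly, and maximally, with p.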
For d = 2 and |Q| = 3, the possible values of ρ_0 are confined to a simplex in the Bloch sphere defined by ρ_0 = p_0|ψ_0⟩⟨ψ_0| + p_1|ψ_1⟩⟨ψ_1| + (1 − p_0 − p_1)|ψ_2⟩⟨ψ_2|, with 0 < p_0, p_1 < 1 and p_0 + p_1 < 1. One may determine the parameters p_0 and p_1 rather than the three parameters usually required to characterize a qubit mixed state. There are two simple choices of measurements to do so: using an IC-POVM or using two different PVMs.
For the first case, consider measuring ρ 0 with a SIC-POVM with elements E y = 1 2 | ϕ y ϕ y | , with each | ϕ y described by Equation (52). The probability of observing measurement y can be written as
Pr(y | ρ_0) = Σ_x Pr(|ψ_x⟩) Pr(y | |ψ_x⟩) = Σ_x (p_x/2) |⟨ψ_x|ϕ_y⟩|².
One can rearrange the resulting system of four equations (one for each POVM element) to obtain a unique set of p_x’s.
Alternatively, one uses two PVMs whose projectors are connected by (ideally orthogonal) diameters of the Bloch sphere that are parallel to the simplex of possible ρ_0 values. This will yield two parameters that uniquely determine a point on the ρ_0 simplex. An example with Q = {|0⟩, |1⟩, |+⟩} is shown in Figure 25b, for which the possible values of ρ_0 are confined to the interior of a triangle in the Bloch sphere. One can determine p_0 and p_1 by measuring with the orthogonal PVMs M_01 and M_± (among many other combinations), in which case (1 − p_0 − p_1) = 2 Pr(‘+’ | ρ_0, M_±) − 1 and (p_0 − p_1) = 2 Pr(‘0’ | ρ_0, M_01) − 1.
For d = 2 and |Q| = 4, the possible values of ρ_0 are confined to a tetrahedron in the Bloch sphere whose vertices are the elements of Q, and one cannot uniquely infer the classical symbol distribution from a fully reconstructed ρ_0. For example, if Q = {|0⟩, |1⟩, |+⟩, |−⟩} and ρ_0 is the maximally mixed state, this could correspond to any mixture of the form ρ_0 = (p/2)|0⟩⟨0| + (p/2)|1⟩⟨1| + ((1 − p)/2)|+⟩⟨+| + ((1 − p)/2)|−⟩⟨−| with 0 < p < 1. Thus, for d = 2 and |Q| > 3, the tomographic advantage to knowing Q is reduced but not eliminated, as an observer can immediately exclude any value of ρ_0 that lies outside the convex polyhedron defined by the elements of Q. This is shown in Figure 25c, where the region of possible ρ_0 values is confined to less than 1/4 of the volume of the Bloch sphere.
Similar simplifications apply for d = 3 (qutrits) when Q is known. Full tomography of an arbitrary mixed qutrit state requires the determination of eight parameters, whereas determining the classical distribution given Q requires |Q| − 1 parameters. This presents an advantage in general when |Q| ≤ 8. We do not explicitly construct measurements that realize this advantage, as the geometry of mixed states over H_3 is an eight-dimensional generalization of the three-dimensional Bloch sphere describing mixed states over H_2 and is significantly more involved.
For a generic qudit, the number of parameters required for full tomography is d² − 1. And so, we expect that knowledge of Q gives a clear tomographic advantage (fewer parameters must be determined) when |Q| < d².
We are now prepared to give a general protocol for |Q| = n and arbitrary d. We wish to recover the underlying distribution over alphabet states (p_x = Pr(|ψ_x⟩), 0 ≤ x < n) from measurement statistics alone. First, we construct a POVM with n + 1 elements: E_y = c_y|ψ_y⟩⟨ψ_y| for each |ψ_y⟩ in Q, and E_n = I − Σ_{y=0}^{n−1} E_y. Each c_y is a parameter which can be varied to ensure that E_n is positive semi-definite. By applying this POVM to ρ_0, we obtain a distribution Pr(y | ρ_0) over the n + 1 possible measurement outcomes. The first n are related to our desired distribution by
Pr(y | ρ_0) = Σ_x p_x Pr(y | |ψ_x⟩) = Σ_x p_x c_y |⟨ψ_x|ψ_y⟩|²,
with one equation for each y < n. If a set of p_x’s solves this system of linear equations, it is consistent with the observed measurements. The solution will be unique for |Q| < d².
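A sketch of this protocol for a qubit with Q = {|0⟩, |+⟩} and weights c_y = 1/2 (a choice, among many, for which E_n is positive semi-definite); the true mixture (2/3, 1/3) is an illustrative test case:

```python
import numpy as np

def recover_distribution(Q, c, rho0):
    """Solve Pr(y|rho0) = sum_x p_x c_y |<psi_x|psi_y>|^2 for the p_x."""
    n = len(Q)
    effects = [c[y] * np.outer(Q[y], Q[y].conj()) for y in range(n)]
    # The leftover element E_n = I - sum_y E_y must be positive semi-definite.
    E_last = np.eye(len(Q[0])) - sum(effects)
    assert np.all(np.linalg.eigvalsh(E_last) > -1e-12), "weights c_y too large"
    # Linear system A p = b over the first n outcomes.
    A = np.array([[c[y] * abs(Q[x].conj() @ Q[y]) ** 2 for x in range(n)]
                  for y in range(n)])
    b = np.array([np.real(np.trace(rho0 @ E)) for E in effects])
    return np.linalg.solve(A, b)

zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)
rho0 = (2 / 3) * np.outer(zero, zero) + (1 / 3) * np.outer(plus, plus)
p_hat = recover_distribution([zero, plus], [0.5, 0.5], rho0)
```

Here the outcome probabilities are computed directly from ρ_0; in the lab they would be estimated from repeated measurements, and the linear solve would be replaced by a least-squares fit.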

6.6.2. l = 2

Knowing Q provides further advantage when considering the tomography of multiple qudits. The distribution over classical words of length l has |Q|^l − 1 parameters, whereas full tomography of l qudits requires the determination of d^{2l} − 1 parameters.
For l = 2 ,
$$\rho_{0:2} = \sum_{|\psi_{x_0}\rangle,\, |\psi_{x_1}\rangle \in \mathcal{Q}} \Pr\!\big(|\psi_{x_0}\rangle|\psi_{x_1}\rangle\big)\, |\psi_{x_0}\rangle|\psi_{x_1}\rangle \langle\psi_{x_0}|\langle\psi_{x_1}| .$$
Once again, we consider the case of d = 2 and | Q | = 2 explicitly. There are four length-2 classical word probabilities, but there are three constraints imposed by (i) normalization, (ii) stationarity, and (iii) consistency with the one-qubit marginal. Thus, one only needs to determine a single parameter to reconstruct ρ 0 : 2 .
Consider the task of reconstructing the length-2 density matrix produced by the |0⟩-|+⟩ Quantum Golden Mean generator in Figure 6 with the knowledge that Q = {|0⟩, |+⟩}. One would first analyze the one-qubit density matrix to find that Pr(|0⟩) = 2/3 and ρ0 = (2/3)|0⟩⟨0| + (1/3)|+⟩⟨+|.
The word probability Pr(|00⟩) is the only additional information necessary to find ρ0:2. The following simple measurement protocol determines Pr(|00⟩): Measure two consecutive qubits with M±. If a qubit is in state |+⟩, one will never see outcome ‘−’; if a qubit is in state |0⟩, one will see outcome ‘−’ with probability 1/2. And so, Pr(|00⟩) = 4 Pr(‘−−’ | ρ0:2, M± ⊗ M±).
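A quick numerical check of this identity, assuming NumPy and the classical Golden Mean length-2 word distribution Pr(00) = Pr(01) = Pr(10) = 1/3, Pr(11) = 0:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)

# Golden Mean length-2 word distribution with |psi_0> = |0>, |psi_1> = |+>.
words = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 0.0}
kets = [ket0, plus]

# rho_{0:2} = sum_w Pr(w) |psi_w><psi_w|
rho02 = sum(p * np.outer(np.kron(kets[a], kets[b]), np.kron(kets[a], kets[b]))
            for (a, b), p in words.items())

# Measure both qubits with M_± and read off Pr('--').
proj_minus = np.outer(minus, minus)
pr_mm = np.trace(np.kron(proj_minus, proj_minus) @ rho02).real

print(4 * pr_mm)  # equals Pr(|00>) = 1/3
```

Only the word |00⟩ can produce the outcome ‘−−’, each ‘−’ with probability 1/2, which is where the factor of 4 comes from.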
This procedure can be generalized to arbitrary two-element alphabets Q = { | ψ 0 , | ψ 1 } . First, measure two consecutive qudits with a PVM M 0 ˜ 1 , where one element is a projector onto | ψ 1 with outcome ‘1’ and the orthogonal projector corresponds to measurement outcome ‘ 0 ˜ ’. Second,
$$\Pr\!\big(|\psi_0\rangle|\psi_0\rangle\big) = \frac{\Pr\!\big(\text{‘}\tilde{0}\tilde{0}\text{’} \,\big|\, \rho_{0:2},\, M_{\tilde{0}1} \otimes M_{\tilde{0}1}\big)}{\big(\Pr(\text{‘}\tilde{0}\text{’} \,\big|\, |\psi_0\rangle, M_{\tilde{0}1})\big)^{2}} = \frac{\Pr\!\big(\text{‘}\tilde{0}\tilde{0}\text{’} \,\big|\, \rho_{0:2},\, M_{\tilde{0}1} \otimes M_{\tilde{0}1}\big)}{\big(1 - |\langle\psi_0|\psi_1\rangle|^{2}\big)^{2}} .$$
This provides a clear advantage over the usual nine parameters necessary to reconstruct ρ 0 : 2 , as it takes into account that the one-qubit marginals and stationarity each impose three constraints.
For d = 2 and |Q| ≥ 3, we must determine additional parameters of the underlying classical distribution over Q. We do so by repeatedly applying the SIC-POVM with elements E_y = (1/2)|φy⟩⟨φy|, with each |φy⟩ described by Equation (52).
The probability of observing the length-2 measurement sequence y 0 y 1 can be written as
$$\Pr(y_0 y_1 \mid \rho_{0:2}) = \sum_{x_0, x_1} p_{x_0,x_1} \Pr\!\big(y_0 y_1 \,\big|\, |\psi_{x_0}\rangle|\psi_{x_1}\rangle\big) = \sum_{x_0, x_1} \frac{p_{x_0,x_1}}{4}\, |\langle\psi_{x_0}|\phi_{y_0}\rangle|^{2}\, |\langle\psi_{x_1}|\phi_{y_1}\rangle|^{2} .$$
There are |Q|² parameters p_{x0,x1}. If a set of p_{x0,x1}'s is a solution to this system of equations, it is consistent with the measurement statistics, and the solution will be unique for |Q| = 3.
For | Q | = n and arbitrary d, we may construct a POVM from Q , as we did for l = 1 . It is then possible to infer a set of p x 0 , x 1 that solves the resulting system of equations from measurement statistics. For length-2 words, the equations are
$$\Pr(y_0 y_1 \mid \rho_{0:2}) = \sum_{x_0, x_1} p_{x_0,x_1} \Pr\!\big(y_0 y_1 \,\big|\, |\psi_{x_0}\rangle|\psi_{x_1}\rangle\big) = \sum_{x_0, x_1} p_{x_0,x_1}\, c_{y_0} c_{y_1}\, |\langle\psi_{x_0}|\psi_{y_0}\rangle|^{2}\, |\langle\psi_{x_1}|\psi_{y_1}\rangle|^{2} .$$
There are n² equations, one for each length-2 measurement sequence corresponding to the POVM elements E_{y0} ⊗ E_{y1}. Calculating the other possible outcomes (those corresponding to E_n) is redundant due to normalization.

6.6.3. l ≥ 3

Extending this analysis to length-l blocks, the density matrix ρ0:l takes the form
$$\rho_{0:l} = \sum_{|\psi_w\rangle \in \mathcal{Q}^{\otimes l}} \Pr(|\psi_w\rangle)\, |\psi_w\rangle\langle\psi_w| ,$$
where each |ψw⟩ has the form of Equation (5). One can determine the length-l word distributions uniquely for the general case where |Q|^l < d^{2l}.
The various measurement strategies explored above for l = 2 can be extended to arbitrary values of l. We will explicitly describe two: using a SIC-POVM for d = 2 and using a POVM constructed from Q for arbitrary d.
Consider repeatedly applying the SIC-POVM with elements E_y = (1/2)|φy⟩⟨φy|, with each |φy⟩ described by Equation (52). Taking l measurements, one observes a length-l word y0:l over Y, the four-element alphabet of measured symbols.
The probability of observing the length-l measurement sequence y0:l can be written as
$$\Pr(y_{0:l} \mid \rho_{0:l}) = \sum_{|\psi_w\rangle} \Pr(|\psi_w\rangle) \Pr\!\big(y_{0:l} \,\big|\, |\psi_w\rangle\big) = \sum_{|\psi_w\rangle} \frac{\Pr(|\psi_w\rangle)}{2^{l}}\, |\langle\psi_w|\phi_{y_{0:l}}\rangle|^{2} ,$$
where |φ_{y0:l}⟩ = ⊗_{t=0}^{l−1} |φ_{yt}⟩, and the factor of 2^{−l} comes from the POVM elements. Each of the 4^l sequences has a probability that can be estimated from measurement. This system of equations can be solved to find underlying length-l word probabilities that are consistent with measurements. If |Q| ≤ 3, this solution is unique.
We can also measure l times with a POVM constructed directly from Q. In this case, the resulting equations are
$$\Pr(y_{0:l} \mid \rho_{0:l}) = \sum_{|\psi_w\rangle} \Pr(|\psi_w\rangle) \Pr\!\big(y_{0:l} \,\big|\, |\psi_w\rangle\big) = \sum_{|\psi_w\rangle} p_w \left( \prod_{t=0}^{l-1} c_{y_t} \right) |\langle\psi_w|\psi_{y_{0:l}}\rangle|^{2} .$$
As before, one infers the |Q|^l underlying word probabilities (p_w's) uniquely in the case where |Q| < d².

6.7. Source Reconstruction

After observing a separable qudit process and finding the length-l density matrix ρ0:l, can one infer the HMCQS that generated it? The following reconstructs a source that generates the ρ0:l of an unknown process for different values of l. Note that a source that generates ρ0:l may fail to generate ρ0:l+1.

6.7.1. l = 1

After determining ρ0, an observer may construct an i.i.d. approximation of the quantum information source. Given any decomposition of ρ0 into pure states |ψx⟩—as in Equation (3)—the corresponding HMCQS consists of a single internal state with alphabet Q = {|ψx⟩}, and the transition probability for each |ψx⟩ is its probability Pr(|ψx⟩) in the decomposition of ρ0. A unique model may be obtained by taking ρ0's eigendecomposition. In this case, the model emits each eigenstate |ψi⟩ with a probability equal to the corresponding eigenvalue: Pr(|ψi⟩) = λi.
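A minimal sketch of this construction, assuming NumPy and taking the Golden Mean marginal ρ0 = (2/3)|0⟩⟨0| + (1/3)|+⟩⟨+| as an example input:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho0 = (2 / 3) * np.outer(ket0, ket0) + (1 / 3) * np.outer(plus, plus)

# Eigendecomposition gives the unique i.i.d. model: emit eigenstate |psi_i>
# with probability lambda_i.
lam, vecs = np.linalg.eigh(rho0)
model = [(lam[i], vecs[:, i]) for i in range(len(lam)) if lam[i] > 1e-12]

# The i.i.d. model reproduces rho0 exactly (though not the correlations of the source).
rho_model = sum(p * np.outer(v, v.conj()) for p, v in model)
print(np.round(lam, 4))
```

Note that this model matches the source only at the single-qudit level; any correlations between qudits are discarded.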

6.7.2. l = 2

With ρ 0 : 2 , an observer begins to model a source that generates correlations between qudits. Doing so requires finding a separable decomposition of ρ 0 : 2 of Equation (53)’s form. If Q is known, multiple procedures for finding such a separable decomposition have been introduced above. For a generic two-qudit density matrix, determining whether it is separable or entangled is generally NP-hard [58]. There are also many necessary and sufficient conditions for separability; for example, there is the Positive Partial Transpose (PPT) criterion [59].
The following assumes that the observer has a separable decomposition of ρ0:2 into alphabet states Q′ that may or may not be the source's alphabet Q. In general, the sets of basis states in the decomposition of a two-qudit density matrix may differ between the two qudits, but we require a symmetric decomposition such that the basis states of both qudits are Q′. From this decomposition, the observer can construct an HMCQS with an underlying Markov dynamic described by Equation (9). This HMCQS has |Q′| internal states and transition probabilities T_{σx,σx′} = Pr(|ψx′⟩ | |ψx⟩) that can be calculated from ρ0:2. Different separable decompositions of ρ0:2 yield different HMCQSs, whose statistics over longer-length sequences may differ. Determining which is the more accurate model of the source requires performing tomography on ρ0:3 to refine the model.
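The transition-probability calculation can be sketched in a few lines. Here we assume, as an example, that the given symmetric separable decomposition is the Golden Mean word distribution over a two-element alphabet:

```python
# Given a symmetric separable decomposition of rho_{0:2} -- here the Golden Mean
# word probabilities over Q = {|psi_0>, |psi_1>} (an assumed example) -- build the
# Markov HMCQS transition matrix T[x][x'] = Pr(|psi_x'> | |psi_x>).
pr2 = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 0.0}

# One-qudit marginal Pr(|psi_x>) from the first position (stationarity makes
# either position equivalent).
pr1 = {x: sum(p for (a, b), p in pr2.items() if a == x) for x in (0, 1)}

T = [[pr2[(x, xp)] / pr1[x] if pr1[x] > 0 else 0.0 for xp in (0, 1)]
     for x in (0, 1)]
print(T)
```

For this decomposition the result is the familiar Golden Mean dynamic: from state 0 the source emits either symbol with probability 1/2, while from state 1 it must emit symbol 0.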

6.7.3. l ≥ 3

Finding a separable decomposition becomes more computationally expensive as the number of qudits increases [60]. Nevertheless, if one obtains a separable decomposition of ρ0:3 in which the basis states for all three qudits are Q, then one can create a length-2 Markov approximation of the source. Each length-2 sequence of qudits in Q corresponds to a different internal HMCQS state. Thus, the model has |Q|² internal states, unless some length-2 sequences are forbidden. The transition probabilities take the form T_{σ_{x0,x1}, σ_{x1,x2}} = Pr(|ψ_{x2}⟩ | |ψ_{x0}⟩|ψ_{x1}⟩). Only transition probabilities that obey concatenation—those for which the origin and destination state words overlap consistently—are nonzero.
One can continue in this manner, approximating the source given a separable decomposition of ρ0:l, to obtain an HMCQS with |Q|^{l−1} states, or fewer if some sequences are forbidden. Each state corresponds to a word of length l − 1, where the pure states composing the word are drawn from Q. The number of internal states in the model grows exponentially with l, but not all of these states may lead to unique future predictions. If so, they can be combined without a loss of predictivity, as is done in classical computational mechanics. A general algorithm for doing so is beyond the present scope.

6.8. Discussion

The preceding section detailed many aspects of identifying an unknown quantum process by tomographically reconstructing the length-l density matrices ρ0:l. When correlations exist between qudits, this provides a predictive advantage over the common assumption that sources are i.i.d. Starting with the method for reconstructing classical processes, we developed a variety of measurement protocols for different values of d, |Q|, and l to find a process's statistics when Q is known. We then introduced effective models for quantum sources derived entirely from separable decompositions of the density matrices ρ0:l reconstructed via tomography.
Since our aim is to harness correlations to improve predictions of future measurement outcomes, it is worth asking: how accurately can future measurement outcomes be predicted? To begin to answer this, the following defines maximally predictive measurements for the special case of l = 2 and for arbitrary l.
For a correlated qudit process, an observer with knowledge of the length-2 density matrix ρ0:2 may perform a measurement on the first qudit and condition on the outcome to reduce their uncertainty when measuring the second qudit. They apply M0 (with possible outcomes y_i) to the first qudit ρ0. This leaves the joint system in the classical-quantum state:
$$\rho_{0:2}^{M_0} = \sum_{y_i} \Pr(y_i)\, |y_i\rangle\langle y_i| \otimes \rho_1^{y_i} ,$$
where the ρ 1 y i are the qudit density matrices conditioned on outcome y i .
The conditional von Neumann entropy of the second qudit is then
$$S\!\big(\rho_1 \,\big|\, M_0(\rho_0)\big) = \sum_{y_i} \Pr(y_i)\, S\!\big(\rho_1^{y_i}\big) .$$
The rank-one measurement with the minimal uncertainty in measurement outcomes is in the eigenbasis of ρ 1 y i . Different y i values generally correspond to different minimal-entropy measurements on the second qubit.
For two qubits, we can find the PVM for which the conditional von Neumann entropy of the second qubit is minimized. This leads to a basis-independent property of the process:
$$S_{\min}\!\big(\rho_1 \,\big|\, M_{\min}(\rho_0)\big) = \min_{M_0} S\!\big(\rho_1 \,\big|\, M_0(\rho_0)\big) ,$$
where the minimum is taken over all PVMs on ρ 0 .
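A brute-force sketch of this minimization for the Golden Mean ρ0:2 (an assumed example), parametrizing rank-one PVMs on the first qubit by Bloch angles and scanning a coarse grid. This assumes NumPy and finds only an approximate minimum; a finer grid or gradient-based search would sharpen it:

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy (bits) of a density matrix."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-(lam * np.log2(lam)).sum())

# Golden Mean rho_{0:2} over Q = {|0>, |+>} (assumed example).
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
kets = [ket0, plus]
words = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 0): 1 / 3, (1, 1): 0.0}
rho02 = sum(p * np.outer(np.kron(kets[a], kets[b]), np.kron(kets[a], kets[b]))
            for (a, b), p in words.items())

def cond_entropy(theta, phi):
    """S(rho_1 | M_0(rho_0)) for the rank-one PVM along Bloch direction (theta, phi)."""
    v0 = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    v1 = np.array([-np.exp(-1j * phi) * np.sin(theta / 2), np.cos(theta / 2)])
    S = 0.0
    for v in (v0, v1):
        M = np.kron(np.outer(v, v.conj()), np.eye(2))  # measure qubit 0 only
        sub = M @ rho02 @ M.conj().T
        p = np.trace(sub).real
        if p > 1e-12:
            # Conditional state of the second qubit given this outcome.
            rho1 = np.trace((sub / p).reshape(2, 2, 2, 2), axis1=0, axis2=2)
            S += p * vn_entropy(rho1)
    return S

# Coarse grid search over PVMs on the first qubit.
angles = np.linspace(0, np.pi, 61)
S_min = min(cond_entropy(t, f) for t in angles for f in angles)
print(round(S_min, 3))
```

The conditional second-qubit state is obtained by projecting, renormalizing, and tracing out the measured qubit.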
If an experimenter reconstructs ρ0:l, a measurement of all but the last qudit in the block with a measurement protocol M, with possible outcomes y0:l−1, leaves the block in the classical-quantum state:
$$\rho_{0:l}^{\mathcal{M}} = \sum_{y_{0:l-1}} \Pr(y_{0:l-1})\, |y_{0:l-1}\rangle\langle y_{0:l-1}| \otimes \rho_{l}^{y_{0:l-1}} .$$
To minimize the conditional von Neumann entropy of the lth qudit, we calculate
$$S_{\min}\!\big(\rho_l \,\big|\, \mathcal{M}_{\min}(\rho_{0:l-1})\big) = \min_{\mathcal{M}} S\!\big(\rho_l \,\big|\, \mathcal{M}(\rho_{0:l-1})\big) ,$$
where the minimum is taken over all local measurement protocols on ρ0:l−1.
Further development necessitates exploring the space of measurement protocols to find those with the greatest predictive advantage over i.i.d. models for arbitrary separable qudit processes.
Finally, we note that our procedure for source reconstruction requires a number of model states that grows exponentially with l. Future work on finding minimal models from density matrices will also require combining states with identical predictions and performing inference over possible model topologies, as with classical Bayesian Structural Inference.

7. Conclusions

Inspired by prior information-theoretic studies of classical stochastic processes, we introduced methods to quantify structure and information production for stationary quantum information sources. We identified properties related to the quantum block entropy S(l) that allow one to determine the amount of randomness and structure within a given qudit process. We gave bounds on informational properties of the resulting measured classical processes. In particular, we showed that they cannot have a lower entropy rate or block entropy (at any l) than the original quantum process.
We analyzed a number of hidden Markov chain quantum sources (HMCQSs), explaining how an observer synchronizes to a source’s internal states via measuring the emitted qudits. If the source allows for synchronizing observations, then we showed that adaptive measurement protocols are capable of synchronizing and maintaining synchronization when fixed-basis measurements cannot.
Sequels will extend these methods and results in a number of ways. Despite focusing here on separable quantum sequences for simplicity, entangled qudit sequences can similarly be studied by combining an HMCQS and a D-dimensional quantum system capable of sequentially generating matrix product states [24]. Doing so will open up the study of entropy convergence of matrix product operators [61].
Many results exist for classical stochastic processes that may be extended to quantum processes. For example, there exist closed-form expressions for informational measures for nondiagonalizable classical dynamics [50,62,63,64]. Extending these to quantum dynamics would allow for more accurate determination of the quantum information properties introduced here. Similarly, the preceding lays the groundwork for fluctuation theorems and large deviation theories of separable quantum processes. Finally, it will be worthwhile to develop a causal equivalence relation for quantum stochastic processes and develop quantum ε -machines by extending classical results [65].
Separable quantum sequences also serve as a resource for information processing by finite-state quantum information transducers that transform one quantum process to another. Beyond interest in their own right, such operations have thermodynamic consequences, either requiring work to operate (as overcoming dissipation induced by Landauer erasure [66]) or acting as a quantum version of information-powered engines capable of leveraging environmental correlations to perform useful work [26,67,68]. This behavior has already been demonstrated for certain quantum processes [69].
Finally, spatially extended, ground, and thermal states of spin chains under various Hamiltonians are quantum processes. And so, the quantum information measures introduced here can serve to classify these states according to the source complexity required to generate them.

Author Contributions

Both authors conceived of the presented ideas and developed the theory. D.G. performed the computations, developed the graphics, and verified the analytical methods. Both authors discussed the results and contributed to the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This material is based upon work supported by, or in part by, Grant Nos. FQXi-RFP-IPW-1902 and FQXi-RFP-1809 from the Foundational Questions Institute and Fetzer Franklin Fund (a donor-advised fund of the Silicon Valley Community Foundation) and grants W911NF-18-1-0028 and W911NF-21-1-0048 from the U.S. Army Research Laboratory and the U.S. Army Research Office.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank Fabio Anza, Sam Loomis, Alex Jurgens, Ariadna Venegas-Li, and participants of the Telluride Science Research Center Information Engines Workshops for helpful discussions. JPC acknowledges the kind hospitality of the Telluride Science Research Center, Santa Fe Institute, and California Institute of Technology during visits.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Information in Classical Processes

This section briefly recounts properties that quantify the randomness and correlation of classical stochastic processes as developed in Ref. [11]. Details and complete proofs can be found there. The main text here introduces versions appropriate to quantum processes.

Appendix A.1. Shannon Entropy

We quantify the amount of uncertainty in a discrete random variable X with possible outcomes {x1, x2, …, xn} by its Shannon entropy [7]:
$$H[X] \equiv -\sum_{i=1}^{n} \Pr(x_i) \log_2 \Pr(x_i) .$$
We use log 2 , in which case the units for Shannon entropy are bits.
To study correlations between multiple random variables, we use several additional information quantities related to the Shannon entropy. First, the joint entropy of two discrete random variables—X and Y, with possible outcomes {x1, x2, …, xn} and {y1, y2, …, ym}, respectively—is defined as
$$H[X, Y] \equiv -\sum_{i=1}^{n} \sum_{j=1}^{m} \Pr(x_i, y_j) \log_2 \Pr(x_i, y_j) .$$
X and Y are statistically independent if and only if the joint entropy decomposes as H [ X , Y ] = H [ X ] + H [ Y ] .
Second, the conditional entropy of X conditioned on Y is
$$H[X \mid Y] \equiv -\sum_{i=1}^{n} \sum_{j=1}^{m} \Pr(x_i, y_j) \log_2 \Pr(x_i \mid y_j) .$$
H[X|Y] is the uncertainty in the value of X after already knowing the value of Y. Note that H[X|Y] ≥ 0 and that conditional entropy is not symmetric: in general, H[X|Y] ≠ H[Y|X]. If X and Y are uncorrelated, then H[X|Y] = H[X].
From the above definitions, one can derive the following identity, linking conditional and joint entropies:
$$H[X \mid Y] = H[X, Y] - H[Y] .$$
Third and finally, the mutual information between two random variables is
$$I[X : Y] \equiv H[X] - H[X \mid Y] .$$
This is the amount of information one can gain about X by having complete knowledge of Y. The mutual information is symmetric, and I [ X : Y ] = 0 if and only if X and Y are statistically independent.
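These definitions translate directly into code. A small self-contained example (the joint distribution below is illustrative only):

```python
from math import log2

def H(dist):
    """Shannon entropy (bits) of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Joint distribution of two correlated bits (an illustrative example).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
pX = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

H_XY = H(joint)
H_X_given_Y = H_XY - H(pY)   # H[X|Y] = H[X,Y] - H[Y]
I_XY = H(pX) - H_X_given_Y   # I[X:Y] = H[X] - H[X|Y]
print(round(H_XY, 4), round(I_XY, 4))
```

Because the two bits are correlated, the mutual information comes out strictly positive, and the symmetry I[X:Y] = H[X] + H[Y] − H[X,Y] can be checked directly.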

Appendix A.2. Block Entropy

For a classical stochastic process, we quantify the amount of uncertainty in a block of l consecutive random variables by taking the joint entropy H[X0:l]. This is the block entropy:
$$H[l] \equiv H[X_{0:l}] = -\sum_{x_{0:l}} \Pr(x_{0:l}) \log_2 \Pr(x_{0:l}) ,$$
where the sum is taken over all words of length l and H[0] ≡ 0.
H[l] is monotonically increasing and concave down, and its behavior as l → ∞ is indicative of a process's correlations and randomness [11]. The generic behavior of H[l] and its relation to other information properties that we will define can be seen in Figure 2.

Appendix A.3. Shannon Entropy Rate

The following briefly summarizes Ref. [11]’s results. Refer there for a more thorough exploration of information-theoretic quantities related to multivariate systems and the block entropy.
First among these is the Shannon entropy rate  h μ :
$$h_\mu = \lim_{l \to \infty} \frac{H[l]}{l} .$$
The limit in Equation (A1) is guaranteed to exist for all stationary processes [70]. h_μ is the irreducible randomness produced by an information source. Its units are bits per symbol.
The Shannon entropy rate can equivalently be written using the conditional entropy:
$$h_\mu = \lim_{l \to \infty} H[X_0 \mid X_{-l:0}] .$$
Therefore, h_μ can equivalently be thought of as the average uncertainty in the next symbol if all preceding symbols are known.
To better appreciate h μ , we consider a few simple cases.
For an i.i.d. process, the block entropy is trivially H[l] = l H[X0], and therefore, h_μ = H[X0]. Since there are no correlations between variables, knowledge of past symbols cannot reduce the uncertainty of the next.
If a process is periodic (for example, consisting of alternating 0s and 1s), then a keen observer will note this pattern and be able to predict with certainty all future symbols of the process. In this case, h μ = 0 .
For stationary Markov and hidden Markov processes, an observer can leverage past observations to reduce their uncertainty about succeeding symbols. h μ will be their average uncertainty about the next symbol once they have accounted for all of the correlations with past symbols.
Graphically, h_μ corresponds to the slope of the block entropy curve as l → ∞, as shown in Figure 2.
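For a concrete case, the following sketch computes the finite-l slopes H[l] − H[l−1] for the classical Golden Mean process (a Markov process whose entropy rate is h_μ = 2/3 bits per symbol), confirming that the estimates reach h_μ already at l = 2:

```python
from math import log2

# Golden Mean process: after a 1, emit 0; after a 0, emit 0 or 1 with probability 1/2.
T = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}
pi = {0: 2 / 3, 1: 1 / 3}  # stationary state distribution

def word_probs(l):
    probs = {(s,): pi[s] for s in (0, 1)}
    for _ in range(l - 1):
        probs = {w + (s,): p * T[w[-1]][s]
                 for w, p in probs.items() for s in (0, 1)
                 if p * T[w[-1]][s] > 0}
    return probs

def block_H(l):
    return -sum(p * log2(p) for p in word_probs(l).values())

# Finite-l slopes H[l] - H[l-1]; for this order-1 Markov process they equal
# h_mu = 2/3 bit/symbol for every l >= 2.
h_est = [block_H(l) - block_H(l - 1) for l in range(2, 6)]
print([round(h, 4) for h in h_est])
```

For longer-memory processes the slopes would approach h_μ only gradually, which is exactly the convergence behavior the next subsections quantify.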

Appendix A.4. Redundancy

For a stochastic process with alphabet X , the maximum entropy rate is log 2 | X | , corresponding to i.i.d. random variables X t with uniform distributions over all measurement outcomes x t . Any other stationary process can be compressed down to its entropy rate h μ < log 2 | X | .
The amount that a particular source can be compressed is known as its redundancy, which is defined as
$$R \equiv \log_2 |\mathcal{X}| - h_\mu .$$
R includes two very different effects: bias within individual random variables and correlations between different random variables. To determine the relative importance of those two factors requires closer examination of H [ l ] .

Appendix A.5. Block Entropy Derivatives and Integrals

Since the limit in Equation (A1) exists, H [ l ] scales (at most) linearly. We are interested in how H [ l ] converges to its linear asymptote, and we will see that taking discrete derivatives of H [ l ] (and integrals of those derivatives) provides us with useful quantities for classifying processes.
Consider applying a discrete derivative operator Δ to a function F : Z R :
$$\Delta F(l) = F(l) - F(l-1) .$$
We can apply Δ to F multiple times to obtain higher-order derivatives:
$$\Delta^{n} F(l) = (\Delta \circ \Delta^{n-1}) F(l) .$$
Taking discrete derivatives of H [ l ] yields a set of functions Δ n H [ l ] . We then study how these discrete derivatives themselves converge to their asymptotic values. To do so, we take “integrals” of a discrete function Δ F ( l ) in the following manner:
$$\sum_{l=A}^{B} \Delta F(l) = F(B) - F(A-1) .$$
To study the convergence properties of each Δ^n H[l], we compare it at each l to its asymptotic value lim_{l→∞} Δ^n H[l]. We do so with the following general integral form:
$$\mathcal{I}^{n} \equiv \sum_{l = l_0}^{\infty} \Big[ \Delta^{n} H[l] - \lim_{l' \to \infty} \Delta^{n} H[l'] \Big] ,$$
where l0 is the first value of l for which Δ^n H[l] is defined.

Appendix A.6. Entropy Gain

The first derivative of H [ l ] is known as the entropy gain. It is defined as
$$\Delta H[l] \equiv H[l] - H[l-1] ,$$
for l > 0. We set ΔH[0] ≡ log2|X|. The entropy gain is the amount of additional uncertainty introduced by increasing the block size by one random variable, and its units are bits per symbol. Note that, because H[l] is monotone increasing and concave, ΔH[l] ≥ ΔH[l+1] ≥ 0 for all l. This behavior is shown in Figure 3.
The entropy gain can be rewritten as a conditional entropy:
$$\Delta H[l] = H[X_{l-1} \mid X_{0:l-1}] ,$$
in which case its relation to the entropy rate becomes clear. Using Equation (A2), we see that
$$h_\mu = \lim_{l \to \infty} \Delta H[l] .$$
For an observer with no prior knowledge of an information source, it will often be necessary to estimate the entropy rate using finite sequences of data. In this case, the entropy gain serves as a finite-l approximation of the true entropy rate:
$$h_\mu(l) \equiv \Delta H[l] = H[l] - H[l-1] .$$

Appendix A.7. Predictability Gain

The second derivative of H [ l ] is the predictability gain, which is defined as
$$\Delta^{2} H[l] \equiv \Delta h_\mu(l) = h_\mu(l) - h_\mu(l-1) ,$$
where l > 0. Note that Δ²H[l] ≤ 0.
|Δ²H[l]| is the average amount of additional predictive information an observer obtains when expanding their observations from blocks of length l − 1 to blocks of length l, and its units are bits per symbol². Large values of |Δ²H[l]| imply that the lth measurement is particularly informative to an observer and therefore greatly improves their estimate of the entropy rate, as given by Equation (A6).
One can also calculate higher-order discrete derivatives of H [ l ] . For our purposes, this is not necessary except to note that, for stationary processes,
$$\lim_{l \to \infty} \Delta^{n} H[l] = 0 , \qquad n \ge 2 .$$
For n = 2 , this follows from convergence of h μ ( l ) , and the argument for all n > 2 is similar.

Appendix A.8. Total Predictability

We can now integrate the functions Δ n H [ l ] . As a general heuristic, the larger the magnitude of these integrals, the more correlation or statistical bias exists within the process.
We begin by studying how the predictability gain Δ²H[l] converges to its asymptotic value lim_{l→∞} Δ²H[l] = 0. Since Δ²H[0] is undefined, we integrate using Equation (A4) with l0 = 1 to obtain the total predictability G:
$$\mathbf{G} \equiv \mathcal{I}^{2} = \sum_{l=1}^{\infty} \Delta^{2} H[l] .$$
Since Δ²H[l] ≤ 0 for all l, G ≤ 0 as well. The units of G are bits per symbol. Graphically, it is the area between the predictability gain curve and its asymptote of 0, as seen in Figure 4.
To interpret G ’s value, we apply Equation (A3) to get
$$\mathbf{G} = -\Delta H[0] + \lim_{l \to \infty} \Delta H[l] = -\log_2 |\mathcal{X}| + h_\mu = -R .$$
A process’s total predictability is then equal in magnitude to its redundancy, and we can interpret | G | as the amount of predictable information per symbol for a process. Here, we emphasize once more that for a given process with alphabet X , any random variable has a maximum entropy of log 2 ( | X | ) that consists of two kinds of information: h μ , the irreducible randomness, and | G | , the amount of information that an observer can possibly predict about it.
The total predictability is thus a function of the entropy rate for a given process and shares h μ ’s weakness: It cannot identify statistical correlations between random variables. A large value of | G | could be the result of either an i.i.d. process with heavily biased random variables or strong correlations between subsequent variables. Fortunately, our next quantity does distinguish between these cases.

Appendix A.9. Excess Entropy

Investigating the convergence of the entropy gain to the asymptotic entropy rate h μ leads to a well-studied process property, the excess entropy:
$$\mathbf{E} \equiv \mathcal{I}^{1} = \sum_{l=1}^{\infty} \Big[ \Delta H[l] - \lim_{l' \to \infty} \Delta H[l'] \Big] = \sum_{l=1}^{\infty} \big[ \Delta H[l] - h_\mu \big] .$$
Since ΔH[l] ≥ h_μ, we have E ≥ 0, and its units are bits.
Our interpretation of E becomes clearer after applying Equation (A3) to obtain
$$\mathbf{E} = \lim_{l \to \infty} \big( H[l] - h_\mu\, l \big) .$$
While h_μ determines the linear asymptotic behavior of H[l], E encapsulates all sublinear effects. We largely discuss processes for which E is finite, known as finitary processes. A process for which E is infinite (for example, if H[l] scales logarithmically with l) is known as an infinitary process. Graphically, E corresponds to the l = 0 intercept of the linear asymptote to the block entropy curve, as shown in Figure 2, as well as the area between the entropy gain curve and its asymptote h_μ, as seen in Figure 3.
The excess entropy can also be written as a mutual information between two halves of a process’ chain of random variables:
$$\mathbf{E} = \lim_{l \to \infty} I[X_{-l:0} : X_{0:l}] .$$
This suggests it is the total amount of information in a process’s past that is useful for predicting the future. It is therefore considered an indicator of the amount of process memory. A more structural approach reveals that E is a lower bound on the actual amount of memory required to predict a stochastic process [65].
Importantly, E easily distinguishes between i.i.d. processes (for which E = 0) and processes with correlations between random variables (E > 0). For all processes that are periodic with period p, h_μ = 0, and H[l] reaches a maximum value of log2 p for l ≥ p. Therefore, all period-p processes have E = log2 p.
As with h_μ, it is often useful for an observer to approximate the true excess entropy using only finite length-l symbol sequences. Using Equation (A8), we can estimate E as
$$\mathbf{E}(l) \equiv H[l] - l\, h_\mu(l) .$$
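To illustrate, the following sketch estimates E(l) for the period-3 process …001001…, for which E = log2 3 ≈ 1.585 bits; the estimate climbs from 0 at l = 1 to log2 3 once h_μ(l) reaches 0:

```python
from math import log2
from collections import Counter

# Period-3 process ...001001...: a uniformly random phase makes it stationary,
# so length-l word probabilities come from sampling one window per phase.
seq = "001" * 10

def block_H(l):
    counts = Counter(seq[s:s + l] for s in range(3))
    return -sum((c / 3) * log2(c / 3) for c in counts.values())

def h_est(l):   # entropy-gain estimate h_mu(l) = H[l] - H[l-1], with H[0] = 0
    return block_H(l) - (block_H(l - 1) if l > 1 else 0.0)

def E_est(l):   # finite-l excess-entropy estimate E(l) = H[l] - l * h_mu(l)
    return block_H(l) - l * h_est(l)

print([round(E_est(l), 4) for l in (1, 2, 3, 4)])  # climbs to log2(3)
```

The intermediate value at l = 2 reflects that the observer has not yet resolved the full period.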

Appendix A.10. Transient Information

Based upon our analysis of the excess entropy, we can now say that as l , the block entropy curve has a linear asymptote defined as
$$H[l] \simeq \mathbf{E} + h_\mu\, l .$$
We capture the way that H [ l ] converges to this asymptote by taking another discrete integral to obtain the transient information:
$$\mathbf{T} \equiv \mathcal{I}^{0} = \sum_{l=0}^{\infty} \big[ \mathbf{E} + h_\mu\, l - H[l] \big] .$$
The units of T are bits × symbol. T is represented graphically in Figure 2 as the area between the H[l] curve and its linear asymptote as l → ∞.
The transient information can be rewritten as
$$\mathbf{T} = \sum_{l=1}^{\infty} l\, \big[ h_\mu(l) - h_\mu \big] ,$$
indicating that T is a measure of how difficult it is to synchronize to a process. An observer is considered “synchronized” when they are able to infer exactly which internal state the source currently occupies. If they remain synchronized, then they can optimally predict future measurement outcomes. That is, their estimate of the source’s entropy rate h μ ( l ) is equal to its actual entropy rate h μ .
T is a notable quantity since, unlike E , it is capable of distinguishing between different period-p processes.

Appendix A.11. Markov Order

The final property we introduce for classical stochastic processes is the Markov order. A process has Markov order R if R is the minimum value for which the following condition holds:
$$\Pr(X_0 \mid X_{-R:0}) = \Pr(X_0 \mid X_{-\infty:0}) .$$
From Equation (2), we conclude that a Markov process is one for which R ≤ 1. More generally, for a process with Markov order R, the distribution for X_t depends only on the previous R symbols. Equivalently, R is the value of l at which the block entropy reaches its linear asymptote:
H [ R ] = E + h μ R .
This fact is represented graphically in Figure 2.
The overwhelming majority of finite-state hidden Markov processes have an infinite Markov order, meaning that H[l] < E + h_μ l for all l [71]. Nevertheless, an observer can still make use of finite-l estimates of the information properties defined here. In fact, these estimates typically converge exponentially fast with l [72].

Appendix B. Quantum Channels for Preparation and Measurement

The main text claims that a separable qudit process is the result of passing realizations of a classical process X through a classical-quantum channel that takes x ∈ X to |ψx⟩ ∈ H. It also claims that the act of measurement can be described as a quantum-classical channel taking ρ0:l to y0:l. The following elaborates on both claims, adopting the formalism of Ref. [6].
We first describe passing realizations of a classical process X through a conditional quantum encoder. Consider a classical quantum state composed of a classical register of dimension | X | and a qudit in H initialized in the state | 0 . Furthermore, let the classical register store the outcome of some classical random variable X t with outcomes x X such that
$$\rho_{XA} = \sum_{x} \Pr(x)\, |x\rangle\langle x|_X \otimes |0\rangle\langle 0|_A .$$
This state will serve as input to a conditional quantum encoder E_{XA→B} that consists of a set {E^x_{A→B}} of |X| completely positive trace-preserving (CPTP) maps. We construct each map such that it transforms the initial quantum state |0⟩ to the desired pure qudit state |ψx⟩, i.e.,
$$\mathcal{E}^{x}_{A \to B}\big(|0\rangle\langle 0|_A\big) = |\psi_x\rangle\langle\psi_x|_B ,$$
with E A B x being a unitary operation. (For d = 2 , we can consider rotations on the Bloch sphere).
The encoding is
$$\rho_B = \mathcal{E}_{XA \to B}(\rho_{XA}) = \mathrm{tr}_X \Big[ \sum_{x} \Pr(x)\, |x\rangle\langle x| \otimes \mathcal{E}^{x}_{A \to B}\big(|0\rangle\langle 0|\big) \Big] = \sum_{x} \Pr(x)\, |\psi_x\rangle\langle\psi_x|_B .$$
Now, taking classical registers that consist of length-l words x0:l of a classical process X, we can use E_{XA→B} to encode X0:l and obtain
$$\rho_{0:l} = \mathrm{tr}_{X_{0:l}} \Big[ \sum_{x_{0:l}} \Pr(x_{0:l}) \bigotimes_{t=0}^{l-1} |x_t\rangle\langle x_t| \otimes \mathcal{E}^{x_t}_{A \to B}\big(|0\rangle\langle 0|\big) \Big] = \sum_{w} \Pr(w)\, |\psi_w\rangle\langle\psi_w| ,$$
where | ψ w take the separable form of Equation (5), and ρ 0 : l therefore matches Equation (6).
The measurement of qudits is described similarly. Let M be a POVM with elements { E y } that acts on a qudit state ρ in such a way that it records each measurement outcome in a classical register Y. The distribution over values in Y is determined by
$$Y = \mathcal{M}(\rho) = \sum_{y} \mathrm{tr}(E_y\, \rho)\, |y\rangle\langle y| .$$
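As a check that M(ρ) defines a valid distribution over the classical register, the following sketch (assuming NumPy, and taking as an example the PVM M± together with the Golden Mean marginal ρ0 from the main text) computes tr(E_y ρ) for each outcome:

```python
import numpy as np

# Example PVM M_± = {|+><+|, |-><-|}; the channel records outcome y
# with probability tr(E_y rho).
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
E = {'+': np.outer(plus, plus), '-': np.outer(minus, minus)}

# rho0 = (2/3)|0><0| + (1/3)|+><+| written out as a matrix (assumed example).
rho = np.array([[5 / 6, 1 / 6], [1 / 6, 1 / 6]])

Y = {y: np.trace(Ey @ rho).real for y, Ey in E.items()}
print({y: round(p, 4) for y, p in Y.items()})
```

The outcome probabilities are nonnegative and sum to one, as required of the classical register's distribution.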
Likewise, consider M0:l to be a length-l sequence of measurements with possible measurement outcome sequences y0:l determined according to some protocol M. When applying M0:l to consecutive qudits in the joint state ρ0:l, we assume that M0:l consists of local measurements in the form of Equation (10). In this case, we factor the POVM elements corresponding to particular sequences of measurement outcomes: E_{y0:l} = ⊗_{t=0}^{l−1} E_{yt}. A length-l measurement outcome can be stored in Y0:l, a set of classical registers, with its associated probability tr(E_{y0:l} ρ0:l), so that
$$Y_{0:l} = \mathcal{M}_{0:l}(\rho_{0:l}) = \sum_{y_{0:l}} \mathrm{tr}\big(E_{y_{0:l}}\, \rho_{0:l}\big)\, |y_{0:l}\rangle\langle y_{0:l}| = \sum_{y_{0:l}} \mathrm{tr}\Big( \bigotimes_{t=0}^{l-1} E_{y_t}\, \rho_{0:l} \Big)\, |y_{0:l}\rangle\langle y_{0:l}| ,$$
where the last line assumes local measurements. Each | y 0 : l is then a separable state, and all | y 0 : l are orthogonal.
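For local measurements, this factorization makes outcome-sequence probabilities straightforward to compute. The two-qubit product state $|0\rangle|+\rangle$ and computational-basis PVM below are illustrative toys, not taken from the text.

```python
import numpy as np
from functools import reduce
from itertools import product

# Single-qudit POVM elements (computational-basis PVM on a qubit, for illustration)
povm = {'0': np.diag([1.0, 0.0]), '1': np.diag([0.0, 1.0])}

def sequence_distribution(rho_joint, povm, l):
    """Pr(y_{0:l}) = tr( (E_{y_0} (x) ... (x) E_{y_{l-1}}) rho_{0:l} ),
    using the factorized POVM elements of a local measurement sequence."""
    dist = {}
    for ys in product(povm, repeat=l):
        E_seq = reduce(np.kron, (povm[y] for y in ys))
        dist[''.join(ys)] = float(np.real(np.trace(E_seq @ rho_joint)))
    return dist

# Toy separable joint state |0>|+>
ket0 = np.array([1.0, 0.0])
ketplus = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.kron(ket0, ketplus)
rho_joint = np.outer(psi, psi)

seq_dist = sequence_distribution(rho_joint, povm, 2)
# First qubit always reads '0'; second reads '0' or '1' with probability 1/2 each
```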

Figure 1. A stationary quantum information source emits qudits that are correlated due to the source's internal memory. An experimenter measures these qudits in different ways ($\mathcal{M}$ or $\mathcal{M}'$), resulting in a family of classical stochastic processes.
Figure 2. Convergence of the block entropies to their linear asymptotes. H [ l ] is the block entropy for a finitary classical process with Markov order R (see Appendix A), and S ( l ) is the quantum block entropy for a finitary quantum process with infinite quantum Markov order R q . For the classical process, E is the excess entropy, and h μ is its Shannon entropy rate. Similarly, for a quantum process, E q is the quantum excess entropy, and s is its von Neumann entropy rate. The area of the red shaded region is the quantum transient information T q .
Figure 3. Convergence of Δ H ( l ) (for a finitary classical process with Markov order R) and Δ S ( l ) (for a finitary quantum process with infinite quantum Markov order R q ) to the processes’ entropy rates, h μ and s. The shaded areas are the classical (blue) and quantum (red) excess entropies.
Figure 4. Convergence of the predictability ( Δ 2 H ( l ) and Δ 2 S ( l ) ) to 0 for a finitary classical process with Markov order R and a finitary quantum process with infinite quantum Markov order R q . Recall Section III, which notes that changes in the magnitude of these second-order quantities—rates of change in a rate—indicate increasing or decreasing unpredictability. Said differently, their nonmonotonities reflect that both entropy rates capture length-dependent correlations. Note that the predictability is not monotonic (unlike the block entropy and its first derivative). The overlapping shaded areas represent the magnitude of the classical (blue) and quantum (red) predictabilities, G and G q , which are negative by convention.
Figure 5. Quantum block entropies $S(l)$ versus length for the periodic process emitting the state $|\psi_{00\phi}\rangle$ with different values of $\phi$. Each curve approaches a maximum value of $E_q = \log_2 3$. Larger values of $\phi$ correspond to more distinguishable alphabet states and lower values of $T_q$.
Figure 6. | 0 - | + Quantum Golden Mean process generator.
Figure 7. von Neumann entropy rate $s$ (lower, orange surface) and measured entropy rate $h_\mu^Y$ (higher, blue surface) for the $|0\rangle$-$|\phi\rangle$ Quantum Golden Mean process measured with the repeated PVM $\mathcal{M}_\theta$. $s$ increases as $\phi$ does, and the alphabet becomes more distinguishable. For $\phi = \pi$ and $\theta = 0, \pi$, we recover the classical Golden Mean. For $\theta - \phi = \pi/2$, $\mathcal{M}_\theta$ applied to $|\phi\rangle$ is a maximum-entropy PVM (the distribution of measurement outcomes is 50-50). Maxima of $h_\mu^Y$ lie in this region.
Figure 8. Quantum excess entropy $E_q$ (higher, orange surface) and measured excess entropy $E^Y$ (lower, blue surface) for the $|0\rangle$-$|\phi\rangle$ Quantum Golden Mean process measured with the repeated PVM $\mathcal{M}_\theta$. $E_q$ increases with $\phi$, since $|0\rangle$ and $|\phi\rangle$ become more distinguishable. $E^Y$ is maximized for $\theta - \phi = 0, \pi$, since $\mathcal{M}_\theta$ can then best determine whether $|\phi\rangle$ was emitted.
Figure 9. 3-Symbol Quantum Golden Mean process generator.
Figure 10. Adaptive measurement protocol (in the form of a DQMP) for the 3-Symbol Quantum Golden Mean process. To synchronize, an observer starts in $T_0$ (a transient state) and measures with $\mathcal{M}_{01}$. The probability of observing exactly $n$ '1's is $1/2^n$. Upon observing a '0', the observer synchronizes. States $A$ and $B$ correspond exactly to internal states $A$ and $B$ of the source in Figure 9. The '−' transition is not displayed because it has probability 0. The source is quantum unifilar; thus, one stays synchronized at all future times.
Figure 11. Unifilar qubit source. Each internal state emits one of two orthogonal states and then transitions—e.g., A emits either | 0 or | 1 that can be distinguished by measurement M 01 —giving this source the property of quantum unifilarity. p is a parameter that takes values from 0 to 1. Other processes correspond to particular p values: for example, a nonorthogonal period-2 process ( p = 0 ), the maximally mixed i.i.d. process ( p = 1 2 ), and a deterministic sequence of either | 1 or | ( p = 1 ).
Figure 12. Nonunifilar qubit source. Each internal state emits one of two nonorthogonal states and then transitions. An observer will not be able to determine which state the source transitioned to with any POVM. p takes values from 0 to 1. Other processes correspond to particular p values: for example, an orthogonal period-2 process ( p = 0 ), the maximally mixed i.i.d. process ( p = 1 2 ), and a deterministic sequence of either | 1 or | ( p = 1 ).
Figure 13. Unifilar qutrit source. When in internal states A and B, it emits a qutrit in the subspace of Hilbert space spanned by | 0 and | 1 . When in C, it emits | 2 , which can always be distinguished from all other states in Q . This demonstrates additional opportunities for synchronization in higher-dimensional Hilbert spaces.
Figure 14. Average state uncertainty H ( l ) for period-5 qudit processes: ψ denotes state | ψ ( ϕ ) . The associated PVM is M ϕ = { | ψ ( ϕ ) , | ψ ( ϕ + π ) } . We set ϕ = 3 π / 4 . The area under each curve is the synchronization information S for that process and measurement.
Figure 15. Adaptive measurement protocol for period-5 sequence | 00 ψ 0 ψ . Each state is labeled with the next measurement to perform, either M 01 , with possible outcomes ‘0’ and ‘1’, or M ϕ , the PVM with elements | ψ ( ϕ ) ψ ( ϕ ) | and | ψ ( ϕ + π ) ψ ( ϕ + π ) | and possible outcomes ϕ and ϕ + π . The five states shown are the measurement protocol’s recurrent states that an observer only encounters when synchronized. An observer who is synchronized and using this protocol sees a measured period-5 process with word ‘ 00 ϕ 0 ϕ .’
Figure 16. Mixed-state presentation for the | 0 - | + Quantum Golden Mean process measured with M 01 . n refers to the number of consecutive ‘0’s since the most recent ‘1’.
Figure 17. Mixed-state presentation for the | 0 - | + Quantum Golden Mean process measured with M ± . n refers to the number of consecutive ‘+’s since the most recent ‘−’.
Figure 18. Average state uncertainty H ( l ) for the | 0 - | + Quantum Golden Mean generator after measurements.
Figure 19. Asymptotic state uncertainty when applying the PVM M θ to the | 0 - | + Quantum Golden Mean.
Figure 20. Average state uncertainty H ( l ) for the unifilar qubit source ( p = 0.6 ) initialized in state A. Only the adaptive measurement protocol (described in the text) is able to maintain synchronization.
Figure 21. Average state uncertainty $\mathcal{H}(l)$ while measuring the process generated by the unifilar qutrit source. $\mathcal{M}_{012}$ and $\mathcal{M}_{\pm 2}$ are fixed-basis measurements. $\mathcal{M}_{012,\mathrm{sync}}$ and $\mathcal{M}_{\pm 2,\mathrm{sync}}$ measure in a fixed basis until they observe a '2' and stay permanently synchronized afterwards. $\mathcal{M}_{\mathrm{adaptive}}$ refers to the protocol in Figure 22.
Figure 22. Adaptive measurement protocol defined for the qutrit process generator in Figure 13. The three transient mixed states are labeled with the internal source-state probabilities $p_A$, $p_B$, $p_C$, and the three recurrent states correspond exactly to those source states. This adaptive protocol permanently synchronizes to the source, since the source is quantum unifilar. Transitions with zero probability are omitted.
Figure 23. Block entropy and length-$\ell$ estimates for information properties of the process generated by the nonunifilar qubit source in Figure 12, with $p = 0.05$. The slope and $y$-intercept of the linear estimates are $s$ and $E_q$, respectively. Note that the estimates do not improve significantly for $\ell > 2$, indicating that two-qubit correlations are the most significant for determining the information properties of the process.
Figure 24. Block entropy and length-$\ell$ estimates for information properties of the process generated by the unifilar qubit source in Figure 11, with $p = 0.05$. The slope and $y$-intercept of the linear estimates are $s$ and $E_q$, respectively. Note that the estimates improve steadily for larger $\ell$, indicating long-range correlations.
Figure 25. The Bloch sphere representation of the boundary of the set of possible length-1 density matrices ($\rho_0$) for the given $\mathcal{Q}$. (a) The set of valid states is a line segment. An observer only needs to determine one parameter. (b) The set of valid states is the interior of a triangle. An observer must determine two parameters. (c) The set of valid states is the interior of a tetrahedron. An observer must determine three parameters, and the decomposition into an ensemble of basis states is not unique. Here, $|y_+\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle + i|1\rangle\big)$.
Gier, D.; Crutchfield, J.P. Intrinsic and Measured Information in Separable Quantum Processes. Entropy 2025, 27, 599. https://doi.org/10.3390/e27060599