
The Rosencrantz Coin: Predictability and Structure in Non-Ergodic Dynamics—From Recurrence Times to Temporal Horizons

by
Dimitri Volchenkov
Department of Mathematics and Statistics, Texas Tech University, 1108 Memorial Circle, Lubbock, TX 79409, USA
Entropy 2025, 27(2), 147; https://doi.org/10.3390/e27020147
Submission received: 18 December 2024 / Revised: 28 January 2025 / Accepted: 30 January 2025 / Published: 1 February 2025
(This article belongs to the Section Statistical Physics)

Abstract

We examine the Rosencrantz coin, a coin that can "stick" in its states for extended periods. Its non-ergodic dynamics is marked by logarithmically growing block lengths in the generated sequences. The traditional decomposition of entropy into predictable and unpredictable components fails due to the absence of a stationary distribution. Instead, sequence structure is characterized by block probabilities and the Stirling numbers of the second kind, whose maximum singles out partitions into approximately $n/\log n$ blocks for a sequence of length $n$. For large $n$, combinatorial growth dominates probability decay, creating a deterministic-like structure. This approach shifts the focus from predicting states to predicting temporal horizons, providing insights into systems beyond traditional equilibrium frameworks.

1. Introduction

Predicting the future state of a system is a fundamental challenge across various fields, from physics and information theory to economics and cognitive science [1]. Simple systems, such as a biased coin, can encapsulate profound insights into uncertainty, entropy, and information dynamics. In traditional ergodic systems, where the dynamics ensure that time averages converge to ensemble averages, predictability is often assessed using entropy and its decomposition into predictable and unpredictable components [2]. These components are characterized by measures such as recurrence time, residence time, and repetition time, which capture the structure of state sequences and quantify the degree of uncertainty (see Section 2).
However, many real-world systems exhibit non-ergodic behavior, where the assumption of visiting all possible states uniformly over time breaks down. In such systems, conventional entropy-based methods become inadequate because there is no stationary distribution to describe the long-term behavior. Instead, non-ergodic systems are marked by prolonged persistence in certain states, leading to sequences dominated by blocks of repeated outcomes. A striking example can be found in Tom Stoppard's play Rosencrantz and Guildenstern Are Dead, where Rosencrantz experiences an improbable streak of 92 consecutive heads [3] that challenges traditional probabilistic frameworks. This anomaly prompts a deeper exploration of non-ergodic dynamics, where long-term correlations lead to prolonged persistence in specific states.
The Rosencrantz coin model (Section 3) provides a useful abstraction for studying systems with long-term correlations and memory effects. In this model, the barrier to switching states may remain constant or change intermittently, governed by a stochastic threshold. This behavior produces block-like structures within the sequences, with block lengths increasing logarithmically over time. The variance bounds derived in [4] establish limits on the uncertainty of entropy estimates, particularly under multinomial assumptions. The estimation of errors in mutual information and entropy, as detailed in [5], highlights the importance of accounting for finite-sample effects when interpreting dynamical systems. In the Rosencrantz coin model, similar principles can be applied to assess the variability of residence times and quantify the robustness of the power-law distributions observed in correlated regimes. The model offers insights into the interplay between randomness and determinism, highlighting how predictable patterns emerge even in fundamentally stochastic processes (see Section 5).
To analyze these patterns, we explore the combinatorial structures governing the emerging block sequences, particularly focusing on the Stirling numbers of the second kind, which count the ways to partition sequences into non-empty blocks. The Stirling numbers reveal that for sufficiently long sequences, the most probable partitioning involves blocks of length approximately ln n , where n is the sequence length. This combinatorial insight compensates for the lack of a stationary distribution and enables meaningful predictions of the system’s temporal behavior (Section 5).
Our analysis shows that in a non-ergodic system, the balance between combinatorial growth and the exponential decay of block probabilities results in a deterministic-like structure within a fundamentally random process. Instead of predicting individual outcomes, the focus shifts to predicting the length of sequences based on the observed block size. The concept of a temporal horizon, the characteristic length of time over which predictable patterns emerge, becomes central in non-ergodic systems: rather than forecasting the next state, one predicts the duration of structured behavior from observed block sizes (Section 5). The logarithmic utility of time for predicting the future, which reflects diminishing "returns" on prediction, further connects these ideas to the hyperbolic time discounting models often found in human and animal decision-making (Section 6). This approach offers a deeper understanding of the dynamics of non-ergodic systems, bridging concepts from information theory, combinatorics, and stochastic processes, and builds upon foundational methods in non-parametric entropy estimation as described by Harris [6], who demonstrated how entropy can be reliably estimated even in systems with vast or poorly defined state spaces.
The paper is organized as follows: Section 2 discusses entropy decomposition in stationary Markov chains, highlighting predictable and unpredictable components and introducing the notions of characteristic times. Section 3 introduces the Rosencrantz coin model and explores its stochastic dynamics. In this study, we also explore how the logarithmic asymptotic integral of motion governs the residence times in the Rosencrantz coin model. By maximizing entropy under a fixed logarithmic mean constraint, we demonstrate that residence time distributions transition from exponential to power-law behavior, emphasizing the dominance of rare, extended events and their connection to scale-invariant dynamics (see Section 4). Section 5 delves into the structure and dynamics of sequences in the non-ergodic Rosencrantz coin model, focusing on the combinatorial properties of block partitions. Section 6 presents a detailed discussion of the findings, and Section 7 concludes the paper by summarizing the key insights and implications for non-ergodic systems.

2. Decomposition of Entropy in Stationary Markov Chains—Predictable and Unpredictable Information

An outcome of each coin flip, governed by the transition matrix $\begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}$, encapsulates a single bit of information. The probability of state repetition, $0 \leq p \leq 1$, determines how this information is divided into predictable and unpredictable components [2]. When $p = 1$, the sequence becomes perfectly predictable, locking outcomes into stationary patterns like $H, H, \ldots$ (heads) or $T, T, \ldots$ (tails). Similarly, when $p = 0$, the sequence alternates deterministically: $H, T, H, T, \ldots$. In contrast, $p = \tfrac12$ corresponds to complete randomness, making future outcomes wholly unpredictable.
For long sequences ($n \gg 1$) generated by an $N$-state, irreducible, and recurrent Markov chain $\{X_t\}_{t \geq 0}$ with a stationary transition matrix $T_{ks} = \Pr\left[X_t = s \mid X_{t-1} = k\right]$, the relative frequency $n_k/n$ of visits to state $k$ converges to the stationary distribution $\pi_k$, such that $\sum_{k=1}^{N} \pi_k T_{ks} = \pi_s$ for all $s$. The asymptotic constraints $n_i/n \to \pi_i$ can be interpreted as the asymptotic integrals of motion, acting as conserved quantities in the long-sequence limit. The probability of each specific sequence $\{i_1, i_2, \ldots, i_n\}$ becomes extremely small for large $n$, decreasing as $\pi_{i_1}^{\,n\pi_{i_1}} \pi_{i_2}^{\,n\pi_{i_2}} \cdots \pi_{i_n}^{\,n\pi_{i_n}}$. Consequently, the fraction of distinct sequences consistent with the stationary distribution $\pi$ decreases exponentially with $n$, and their growth rate is given by

$$N_\pi(n) = \frac{n!}{(n\pi_1)! \cdots (n\pi_N)!} \simeq e^{-n \sum_{k=1}^{N} \pi_k \left(\ln(n\pi_k) - \ln n\right)} = \exp\left(n H_\pi\right), \qquad H_\pi \equiv -\sum_{k=1}^{N} \pi_k \ln \pi_k. \tag{1}$$
Here, $H_\pi$ is the entropy of the stationary distribution $\pi$, characterizing the exponential decay rate of the fraction of sequences constrained by the $N$ asymptotic integrals of motion, $n_i/n \to \pi_i$, as $n \to \infty$. While the total number of sequences grows factorially with $n$, the proportion of those consistent with the stationary distribution decreases exponentially as $n$ increases. In his analysis of entropy estimators, ref. [7] demonstrates that, under a regular Markov process, the plug-in estimators converge asymptotically to a normal distribution; this convergence is governed by the spectral properties of the transition matrix and the initial conditions. Up to a factor of $1/\ln N$, the decay rate $H_\pi$ corresponds to the Boltzmann–Gibbs–Shannon entropy quantifying the uncertainty of a Markov chain's state in equilibrium:
$$H(X_t) = -\sum_{k=1}^{N} \pi_k \log_N \pi_k = \sum_{k=1}^{N} \frac{\log_N R_k}{R_k}, \qquad R_k \equiv \frac{1}{\pi_k} = \lim_{n \to \infty} \frac{n}{n_k}. \tag{2}$$
The inverse frequency $R_k$ in (2) is the expected recurrence time of sequence returns to state $k$. Thus, $\ln R_k$ can be termed the utility function of recurrence time, quantifying the reduction in diversity of state sequences caused by the repetition of state $k$ in the most likely sequence patterns corresponding to the stationary distribution $\pi$.
The entropy (2) serves as a foundation for decomposing the system's total uncertainty into predictable and unpredictable components [2,8,9]. To analyze the information dynamics, we add and subtract conditional entropy quantities to $H(X_t)$, grouping the resulting terms into distinct informational quantities:

$$
\begin{aligned}
H(X_t) &= H(X_t) \pm H(X_{t+1} \mid X_t) \pm H(X_t \mid X_{t-1}) \pm H(X_{t+1} \mid X_{t-1})\\
&= \underbrace{H(X_t) - H(X_{t+1} \mid X_t)}_{E_{\mathrm{ex}}(X_t)}
+ \underbrace{H(X_{t+1} \mid X_{t-1}) - H(X_t \mid X_{t-1})}_{I(X_{t+1},\, X_t \mid X_{t-1})}\\
&\quad + \underbrace{H(X_{t+1} \mid X_t) + H(X_t \mid X_{t-1}) - H(X_{t+1} \mid X_{t-1})}_{H(X_t \mid X_{t+1},\, X_{t-1})}.
\end{aligned} \tag{3}
$$
The second term in the above decomposition, $H(X_{t+1} \mid X_{t-1}) - H(X_t \mid X_{t-1})$, can be naturally identified with the conditional mutual information,

$$I(X_{t+1}, X_t \mid X_{t-1}) = H(X_t \mid X_{t-1}) - H(X_t \mid X_{t+1}, X_{t-1}). \tag{4}$$
Rewriting $H(X_t \mid X_{t-1})$ as

$$H(X_t \mid X_{t-1}) = H(X_t) - I(X_t; X_{t-1}) \tag{5}$$
and expanding the conditional entropies in (3), we arrive at the following:

$$I(X_{t+1}, X_t \mid X_{t-1}) = H(X_{t+1} \mid X_{t-1}) - H(X_t \mid X_{t-1}), \tag{6}$$
which matches the second term of the decomposition.
The excess entropy $E_{\mathrm{ex}}(X_t)$ quantifies the influence of past states on the present and future, capturing the system's structural correlations and predictive potential. The conditional mutual information $I(X_{t+1}, X_t \mid X_{t-1})$ represents the information shared between the current state $X_t$ and the future state $X_{t+1}$, independent of past history. It vanishes for both fully deterministic and completely random ($p = \tfrac12$) systems, reflecting the absence of predictive utility in these extremes. The sum of the excess entropy and the conditional mutual information shown in the second line of (3) represents the predictable information in the system [2,8]. This component quantifies the portion of the total uncertainty in the future state $X_{t+1}$ that can be resolved using information about the current state $X_t$ and the past states within the system's dynamics. In contrast, the unpredictable component, $H(X_t \mid X_{t+1}, X_{t-1})$, given in the third line of (3), quantifies the intrinsic randomness in the system that remains unresolved even with full knowledge of its history. The entropy decomposition in (3) is closed, meaning it completely partitions the total entropy $H(X_t)$ into predictable and unpredictable components. For a fully deterministic system (e.g., $p = 1$ or $p = 0$), the unpredictable component vanishes, $H(X_t \mid X_{t+1}, X_{t-1}) = 0$, making the entire entropy predictable, $H(X_t) = E_{\mathrm{ex}}(X_t)$. Conversely, for a completely random system ($p = \tfrac12$), the predictable components disappear. In this case, $H(X_t) = H(X_t \mid X_{t+1}, X_{t-1})$, as observations of the prior sequence provide no information for predicting future states ($E_{\mathrm{ex}}(X_t) = 0$). Similarly, attempts to predict the next state by repeating or alternating the current state fail, as $I(X_{t+1}, X_t \mid X_{t-1}) = 0$.
For an ergodic process, the entropy of the system is invariant under time shifts, so $H(X_t) = H(X_{t+1}) = H(X_{t-1})$. This property ensures that the decomposition in (3) holds at any time step $t$, making it applicable to a wide class of systems and providing a comprehensive framework for analyzing information dynamics.
In a finite-state, irreducible Markov chain, the entropy rate quantifies the uncertainty associated with transitions from one state to the next. It is formally given by

$$H(X_{t+1} \mid X_t) = -\sum_{k=1}^{N} \pi_k \sum_{s=1}^{N} T_{ks} \log_N T_{ks} = -\sum_{k=1}^{N} \pi_k \log_N \prod_{s=1}^{N} T_{ks}^{\,T_{ks}}, \tag{7}$$
where the second formulation highlights the connection to the geometric mean of transition probabilities. The term $\prod_{s=1}^{N} T_{ks}^{\,T_{ks}}$ can be interpreted as the geometric mean time-averaged transition probability rate (per step) from the state $k$ over an infinitely long observation period:

$$\prod_{s=1}^{N} T_{ks}^{\,T_{ks}} = \lim_{n \to \infty} \prod_{s=1}^{N} T_{ks}^{\,n_{k,s}/n} \equiv Q_k, \tag{8}$$
where $n_{k,s}/n$ is the observed frequency of transitions from $k$ to $s$ over the most likely sequence of length $n$. The frequency converges to the transition probability $T_{ks}$ as $n \to \infty$. The inverse of the geometric mean transition probability rate, $\mathcal{R}_k \equiv Q_k^{-1}$, represents the average residence time in state $k$. The excess entropy $E_{\mathrm{ex}}(X_t)$ takes a form similar to the entropy (2):

$$E_{\mathrm{ex}}(X_t) = H(X_t) - H(X_{t+1} \mid X_t) = \sum_{k=1}^{N} \pi_k \left(\log_N R_k - \log_N \mathcal{R}_k\right) = \sum_{k=1}^{N} \frac{\log_N\left(R_k/\mathcal{R}_k\right)}{R_k}, \tag{9}$$
where the ratio of average recurrence and residence times, $R_k/\mathcal{R}_k$, measures the transience of state $k$ in a sequence consistent with the stationary distribution $\pi$. Low transience implies that typical sequences are more predictable. Conversely, in the case of maximally random sequences, where $\mathcal{R}_k \approx R_k$, states are visited regularly, and the excess entropy (9) does not enhance predictability. The formula for the excess entropy $E_{\mathrm{ex}}$ remains valid only for non-deterministic processes in which all states can be visited and exited with non-zero probability. If the system remains indefinitely in a single state, the excess entropy cannot be meaningfully defined, as it relies on transitions between states to quantify predictability and structural correlations. In such a case, the entropy of the system is zero, as there is no uncertainty in the system's behavior.
Similarly, the conditional entropy,

$$H(X_{t+1} \mid X_{t-1}) = -\sum_{k=1}^{N} \pi_k \log_N \prod_{s=1}^{N} \left(T^2\right)_{ks}^{\left(T^2\right)_{ks}} \equiv -\sum_{k=1}^{N} \pi_k \log_N Q_k^{(2)}, \tag{10}$$
quantifies the uncertainty associated with the statistics of trigrams involving two transitions. The transition probability rate,

$$Q_k^{(2)} = \lim_{n \to \infty} \prod_{s=1}^{N} \left(T^2\right)_{ks}^{\,n_{k,\bullet,s}/n} = \prod_{s=1}^{N} \left(T^2\right)_{ks}^{\left(T^2\right)_{ks}}, \tag{11}$$
represents the asymptotic frequency of trigrams starting with state $k$. Here, $n_{k,\bullet,s}/n$ denotes the number of occurrences of trigrams in which the first state is $k$, the last state is $s$, and the intermediate state, marked by the "black dot" ($\bullet$), can be any valid state. The relative frequency $n_{k,\bullet,s}/n$ converges to the corresponding element of the squared transition matrix, $\left(T^2\right)_{ks}$, over long, most likely sequences conforming to the stationary distribution $\pi$. The conditional mutual information,

$$I(X_{t+1}, X_t \mid X_{t-1}) = H(X_{t+1} \mid X_{t-1}) - H(X_t \mid X_{t-1}) = \sum_{k=1}^{N} \pi_k \log_N \frac{Q_k}{Q_k^{(2)}}, \tag{12}$$
measures the amount of information shared between the current state $X_t$ and the future state $X_{t+1}$, independently of the historical context provided by $X_{t-1}$. In the context of coin tossing, the mutual information (12) emerges from the uncertainty in choosing between alternating the present coin side ($p \to 0$) or repeating the current side ($p \to 1$) when predicting the coin's future state. The mutual information vanishes for the fair coin ($p = \tfrac12$), where transitions are completely random and independent. It also vanishes for a fully deterministic coin ($p = 1$ or $p = 0$), where past states provide no additional information for predicting future behavior.
When the past and present states of the chain are known, the predictable information about the future state can be expressed as follows:

$$P(X_t) = E_{\mathrm{ex}}(X_t) + I(X_{t+1}, X_t \mid X_{t-1}) = \sum_{k=1}^{N} \pi_k \log_N \frac{Q_k^{2}}{\pi_k\, Q_k^{(2)}} = \sum_{k=1}^{N} \frac{\log_N\left(R_k/\hat{R}_k\right)}{R_k}, \qquad \hat{R}_k \equiv \frac{Q_k^{(2)}}{Q_k^{2}}. \tag{13}$$
The conditional probability $Q_k^{2}/Q_k^{(2)}$ represents the likelihood that, in a two-step transition starting from $k$, both steps remain in $k$. Its inverse, the state repetition time $\hat{R}_k$, can be interpreted as the expected time between instances where state $k$ appears twice in a row. Predictable information, therefore, reflects the balance between recurrences (returns to states) and state repetitions (remaining in the same state). Systems with frequent recurrences and repetitions exhibit high predictability, indicating more organized and regular behavior. In contrast, systems with infrequent recurrences and frequent state changes exhibit low predictability, suggesting more random and chaotic behavior. According to (3), unpredictable information is then defined as follows:
$$U(X_t) = H(X_t) - P(X_t) = \sum_{k=1}^{N} \pi_k \left(\log_N R_k - \log_N \frac{R_k}{\hat{R}_k}\right) = \sum_{k=1}^{N} \frac{\log_N \hat{R}_k}{R_k}. \tag{14}$$
The logarithm of the state repetition time $\hat{R}_k$, which can also be interpreted as the utility of state repetition, emphasizes the growth of unpredictability as repetition times increase. Conversely, lower values of $\log_N \hat{R}_k$ indicate reduced uncertainty and more predictable behavior.
Figure 1 provides a detailed visualization of the entropy decomposition (3) for a biased coin modeled as a Markov chain with the transition matrix

$$T(p,q) = \begin{pmatrix} p & 1-p \\ 1-q & q \end{pmatrix}, \tag{15}$$

where $p$ and $q$ are the probabilities of repeating the current state.
In Figure 1a, the three surfaces illustrate different information quantities as functions of $p$ and $q$. The top surface represents the total entropy $H(X_t)$, capturing the overall uncertainty in the system's state. For a symmetric chain ($p = q$), the uncertainty reaches its maximum of 1 bit. The entropy decreases when $p \neq q$, as one state becomes more probable than the other, reducing overall uncertainty. The middle surface shows the entropy rate $H(X_{t+1} \mid X_t)$, measuring the uncertainty in predicting the next state given the past states. When $p = 1 - q$, this surface coincides with the top one, indicating that the system behaves like a coin with no memory of past states. The bottom surface illustrates the conditional mutual information $I(X_{t+1}, X_t \mid X_{t-1})$, which measures the predictive power of the current state for the next state, independent of past history. When $p = 1 - q$, the bottom surface highlights the tension between two simple prediction strategies, repeating or alternating the current state, reflecting the inherent randomness of a fair coin. The gaps between these surfaces reveal how the total uncertainty is partitioned. The gap between the top and middle surfaces corresponds to the excess entropy $E_{\mathrm{ex}}(X_t)$, capturing the structured, predictable correlations in the system. Together with the space below the bottom surface, these gaps represent the amount of predictable information $P(X_t)$. The gap between the middle and bottom surfaces represents the unpredictable component of the system, which diminishes as $p$ and $q$ approach deterministic values (0 or 1), reflecting minimal randomness.
In Figure 1b, the decomposition of entropy is presented more intuitively. The top, convex surface shows the unpredictable component, peaking at 1 bit for a fair coin ($p = q = \tfrac12$) and dropping to zero in deterministic scenarios ($p, q \in \{0, 1\}$). The bottom, concave surface represents the predictable component, which grows as the system becomes more deterministic. Together, these surfaces illustrate the dynamic balance between randomness and structure: the unpredictable component dominates for systems near maximal uncertainty ($p = q = \tfrac12$), while the predictable component reflects the emergence of structure as the state repetition probabilities deviate from $\tfrac12$.
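Because every quantity in (3)–(14) is an explicit function of the transition matrix, the closure $H(X_t) = P(X_t) + U(X_t)$ can be verified numerically. The sketch below is a minimal illustration assuming numpy is available; it is not the code behind Figure 1, but it evaluates the components of the decomposition for any $0 < p, q < 1$.

```python
import numpy as np

def entropy_decomposition(p, q, base=2):
    """Components of the decomposition (3) for the two-state chain T(p, q):
    total entropy H, excess entropy E_ex, conditional mutual information I,
    predictable part P, and unpredictable part U (in bits for base=2)."""
    T = np.array([[p, 1.0 - p],
                  [1.0 - q, q]])
    pi = np.array([1.0 - q, 1.0 - p]) / (2.0 - p - q)  # stationary: pi T = pi

    def H(probs):  # Shannon entropy of a probability vector
        probs = probs[probs > 0]
        return -np.sum(probs * np.log(probs)) / np.log(base)

    H_t = H(pi)                                           # H(X_t)
    rate = sum(pi[k] * H(T[k]) for k in range(2))         # H(X_{t+1} | X_t)
    rate2 = sum(pi[k] * H((T @ T)[k]) for k in range(2))  # H(X_{t+1} | X_{t-1})
    E_ex = H_t - rate       # excess entropy (9)
    I = rate2 - rate        # conditional mutual information (12)
    P = E_ex + I            # predictable information (13)
    U = H_t - P             # unpredictable information (14)
    return H_t, E_ex, I, P, U

# Closure check: H = P + U by construction; E_ex and I vanish for p = q = 1/2.
print(entropy_decomposition(0.9, 0.8))
print(entropy_decomposition(0.5, 0.5))
```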

3. Stochastic Dynamics of Rosencrantz’s Coin

In Tom Stoppard’s play “Rosencrantz and Guildenstern Are Dead” [3], Rosencrantz improbably wins 92 consecutive coin flips, prompting Guildenstern to speculate that their situation may be influenced by supernatural forces. Indeed, he is correct—for their lives are no longer governed by chance but by the will of the king. Here, we present a simple probabilistic model in which such an improbable sequence can naturally arise, where long-term correlations effectively “freeze” the coin in one of its states. Our model builds on previous approaches using stochastic thresholds to study ecological subsistence dynamics and systems near critical instability points [10].
In this model, each "coin flip" corresponds to a step in a stochastic process. At each step, a random variable $X$, representing the energy or capability to transition, is drawn from a probability distribution $\Pr\{X < x\} = F(x)$. Similarly, the barrier height $Y$ is a random variable with its own distribution $\Pr\{Y < y\} = G(y)$. The process updates $X$ at every step, reflecting the inherent randomness in the transition capability. The barrier height $Y$, however, is updated with a frequency controlled by the parameter $0 \leq \eta \leq 1$: with probability $1 - \eta$, a new $Y$ is drawn from $G(y)$; otherwise, $Y$ retains its current value. For $\eta = 0$, the barrier is updated at every step, making it maximally volatile. Conversely, when $\eta = 1$, the barrier height $Y$ remains constant, introducing no additional variability. The process remains in the current state at step $t$ if $X_t \leq Y_t$, but transitions to another state if $X_t > Y_t$. The total number of consecutive successful coin flips, $\tau$, represents the duration of Rosencrantz's improbable streak, that is, the time the process remains in the state where $X \leq Y$. This framework provides a simple probabilistic explanation for how extraordinary sequences of successes can arise.
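A minimal simulation of this update rule makes the streak statistics tangible. The sketch below is illustrative Python (the function name `streak_length` and the choice of uniform $F$ and $G$ are assumptions for the example, not the author's code); it reproduces the short exponential streaks of the volatile-barrier regime and the fat-tailed streaks of the frozen-barrier regime derived below.

```python
import random

def streak_length(eta, rng):
    """Sample one streak duration tau of the Rosencrantz coin.

    X ~ F and Y ~ G are uniform on [0, 1]. The process stays in its
    state while X <= Y. With probability 1 - eta the barrier Y is
    redrawn at each step; with probability eta it keeps its value.
    """
    y = rng.random()          # initial barrier height Y ~ G
    tau = 0
    while rng.random() <= y:  # X_t <= Y_t: one more successful flip
        tau += 1
        if rng.random() > eta:
            y = rng.random()  # barrier redrawn from G
    return tau

rng = random.Random(42)
for eta in (0.0, 1.0):
    samples = [streak_length(eta, rng) for _ in range(100_000)]
    # eta = 0: exponential decay, <tau> = 1; eta = 1: power-law tail,
    # so the sample mean keeps growing with the sample size.
    print(eta, sum(samples) / len(samples), max(samples))
```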
The analysis of this model (see [10] for details) depends on the specific distributions $F(x)$ and $G(y)$, as well as on the value of the barrier update probability $\eta$. When the barrier is maximally volatile ($\eta = 0$), $Y$ is updated at every step. In this regime, the probability of observing $\tau$ successful flips is given by

$$P_{\eta=0}(\tau) = \left(\int_0^1 dG(y)\, F(y)\right)^{\tau} \int_0^1 dG(y)\, \bigl(1 - F(y)\bigr), \tag{16}$$
which decays exponentially with $\tau$, regardless of the specific forms of $F(x)$ and $G(y)$. The expected duration of a streak is therefore always finite,

$$\langle \tau \rangle = \sum_{\tau=0}^{\infty} \tau \, P_{\eta=0}(\tau) < \infty. \tag{17}$$
For uniform distributions $dF(u) = dG(u) = du$ over $[0,1]$, the coin behaves like a fair coin, and the probability of $\tau$ consecutive successes simplifies to

$$P_{\eta=0}(\tau) = 2^{-(\tau+1)}, \tag{18}$$

and the expected streak duration equals $\langle \tau \rangle = \sum_{\tau=0}^{\infty} \tau \cdot 2^{-(\tau+1)} = 1$.
When $\eta = 1$, the barrier $Y$ remains constant throughout the process. In this case, the probability of $\tau$ successful flips is

$$P_{\eta=1}(\tau) = \int_0^1 dG(y)\, F(y)^{\tau} \bigl(1 - F(y)\bigr). \tag{19}$$
For uniform distributions $dF(u) = dG(u) = du$ over $[0,1]$, the probability (19) reduces to the Beta function $B(\tau+1, 2) = \int_0^1 u^{\tau}(1-u)\, du$. Thus, for large $\tau$, the streak probability decays algebraically:

$$P_{\eta=1}(\tau) = B(\tau+1, 2) = \frac{1}{(\tau+1)(\tau+2)} \sim \tau^{-2}. \tag{20}$$
The power-law decay in (20) implies that Rosencrantz's coin appears biased toward producing extended streaks of success: the expected streak duration diverges, $\langle \tau \rangle = \sum_{\tau=0}^{\infty} \tau \, P_{\eta=1}(\tau) = \infty$, indicating that blocks of any length will appear in sufficiently long sequences.
For intermediate values $0 < \eta < 1$ with uniform distributions $dF(u) = dG(u) = du$ over $[0,1]$, the barrier $Y$ alternates between stability and variability. This dynamic creates a mixture of exponential and power-law decay in the probability of $\tau$ successful flips [10]:

$$P_{\eta}(\tau) = \frac{\eta^{\tau}}{(\tau+1)(\tau+2)} + \sum_{k=1}^{\tau} \frac{\eta^{\tau-k}}{(\tau-k+1)(\tau-k+2)} \sum_{m=1}^{k} C_{m,k} \left(\frac{1-\eta}{\eta}\right)^{m}, \tag{21}$$
where the coefficients $C_{m,k}$ are defined by

$$C_{m,k} = m! \sum_{P(k)} \frac{\prod_{s=1}^{m} l_s}{(l_m+1) \prod_{s=1}^{m-1} \left[(l_s+1)\left(k - \sum_{r=1}^{s} l_r\right)\right]}, \tag{22}$$
and

$$P(k) = \left\{ \{l_1, \ldots, l_m\} \;\middle|\; l_i \geq 1,\; l_1 + \cdots + l_m = k,\; l_1 \geq l_2 \geq \cdots \geq l_m \right\}$$

represents the set of all integer partitions of $k$.
If $G(y) = 1 - (1-y)^{\varepsilon}$, $\varepsilon > 0$, the barrier height $Y$ is typically close to one, making transitions rare and sharp. When $X$ remains uniformly distributed, the probability of $\tau$ successful flips follows a Zipf-like behavior [10]:

$$P_{\eta=1}(\tau) = \frac{\tau^{-1-\varepsilon}}{\zeta(1+\varepsilon)}, \tag{23}$$
where $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ is the Riemann zeta function. This result reflects the dominance of rare, extended streaks of success, evoking a sense of order emerging from randomness. The Rosencrantz coin can also be described by an integral row-stochastic matrix, viz.,

$$T_{\eta_1,\eta_2}(t) = \begin{pmatrix} P_{\eta_1}(t) & 1 - P_{\eta_1}(t) \\ 1 - P_{\eta_2}(t) & P_{\eta_2}(t) \end{pmatrix}, \tag{24}$$
which captures the transition probabilities over a finite interval $t$. Unlike the stationary Markov chain $T(p,q)$ in (15), which describes instantaneous transitions, $T_{\eta_1,\eta_2}(t)$ reflects the cumulative effect of all transition probabilities up to time $t$.
In the absence of correlations ($\eta_{1,2} = 0$), when the height of the energy barrier $Y$ changes at every time step and all transitions are statistically independent, the stationary chain $T(\tfrac12, \tfrac12)$, corresponding to a fair coin, can be related to the Markov process generator $\mathbf{G} = \frac{1}{\ln 2}\left(T(\tfrac12, \tfrac12) - \mathbf{I}\right)$ for the integral transition matrix $T_{0,0}(t)$, where $\mathbf{I}$ is the identity matrix. In this case, the integral matrix $T_{0,0}(t) = \exp(t\, \mathbf{G})$ solves the differential equation $\dot{T}_{0,0}(t) = \mathbf{G}\, T_{0,0}$ with the initial condition $T_{0,0}(0) = \mathbf{I}$. However, when correlated transitions occur ($\eta_{1,2} > 0$), the integral matrix $T_{\eta_1,\eta_2}(t)$ no longer has a straightforward relationship with the instantaneous transition matrix $T(p,q)$.
The long-time behavior of recurrence times in the Rosencrantz coin model depends significantly on the parameter $\eta$. The recurrence time $R_k(t)$ is defined as the expected number of steps required for the system to return to state $k$ after leaving it. When $\eta_{1,2} = 0$, the barrier height $Y$ changes at every step, and the process corresponds to a fair coin: the probability of remaining in a state decays geometrically as $2^{-(t+1)}$. The recurrence time can be calculated as

$$R_k = \sum_{t=1}^{\infty} t \cdot P_{\eta=0}(t), \tag{25}$$
where $P_{\eta=0}(t) = 2^{-(t+1)}$ is the probability of returning to the state after $t$ steps. Substituting $P_{\eta=0}(t)$ into the summation, we find that the recurrence time quickly converges to 2:

$$R_k^{\eta_{1,2}=0}(t) = 2 - \frac{2t}{2^{t}} \;\xrightarrow[t \to \infty]{}\; 2. \tag{26}$$
This behavior corresponds to a fair coin with the stationary distribution of states $\pi_{1,2} = \tfrac12$. Here, the ergodic hypothesis holds, meaning the system visits all allowable states with equal frequency over a long enough time. Consequently, time averages and ensemble averages become equivalent.
When $\eta_{1,2} = 1$, the barrier height $Y$ remains fixed, and the recurrence time grows logarithmically. The probability of returning to the state after $t$ steps is given by $P_{\eta=1}(t) = 1/\bigl(t(t+1)\bigr)$, which decays more slowly than the geometric decay seen in the $\eta = 0$ case. The recurrence time is computed as follows:

$$R_k = \sum_{t=1}^{\infty} t \cdot P_{\eta=1}(t) = \sum_{t=1}^{\infty} \frac{t}{t(t+1)}. \tag{27}$$
This series is related to the digamma function,

$$\Psi(x) = -\gamma + \sum_{k=1}^{\infty} \left( \frac{1}{k} - \frac{1}{k+x-1} \right), \tag{28}$$
where $\gamma$ is Euler's constant. Using this, the recurrence time accumulated over the first $t$ steps becomes

$$R_k^{\eta_{1,2}=1}(t) = 2\,\Psi(t+2) + \frac{4}{t+2} - 4 + 2\gamma. \tag{29}$$
In this scenario, the ergodic hypothesis does not hold because the recurrence times are not constant. The system’s behavior becomes non-ergodic, meaning it does not visit all states with the same frequency, and ensemble averages lose their equivalence to time averages.
When correlations are present, the Rosencrantz coin shows increasingly prolonged intervals before returning to a given state, rendering the concept of a stationary distribution invalid. The recurrence time grows without settling into a steady pattern, indicating that time averages diverge from ensemble averages and the system’s dynamics are non-ergodic.
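The contrast between the two regimes is visible already in the partial sums of the series (25) and (27). The sketch below is a plain-Python illustration (not the computation behind Figure 2): with the frozen barrier, the summands $t\, P_{\eta=1}(t) = 1/(t+1)$ form a harmonic tail, so the accumulated recurrence time tracks $\ln t$, as the digamma representation (28) and (29) makes explicit.

```python
import math

def recurrence_partial_sums(t_max, frozen_barrier=True):
    """Partial sums R(t) = sum_{s<=t} s * P(s) at a few checkpoints.

    frozen_barrier=True uses P(s) = 1/(s(s+1)) (eta = 1), whose terms
    s * P(s) = 1/(s+1) accumulate logarithmically; otherwise it uses
    the geometric law P(s) = 2^{-(s+1)} (eta = 0), which saturates.
    """
    R, checkpoints = 0.0, {}
    for s in range(1, t_max + 1):
        P = 1.0 / (s * (s + 1)) if frozen_barrier else 2.0 ** -(s + 1)
        R += s * P
        if s in (10, 100, 1000, 10_000):
            checkpoints[s] = R
    return checkpoints

# For eta = 1 the accumulated recurrence time grows like ln t (non-ergodic);
# compare each partial sum with the logarithm of the horizon.
for t, R in recurrence_partial_sums(10_000).items():
    print(t, round(R, 4), round(math.log(t), 4))
```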

4. Logarithmic Integral of Motion and Power-Law Distributions of Residence Times

In stochastic systems such as the Rosencrantz coin model, the residence time, defined as the duration the system remains in a given state before transitioning, is governed by the cumulative system dynamics. Importantly, a logarithmic asymptotic "integral of motion", $\langle \log \tau \rangle$, reflecting the logarithmic growth of residence times with the number of steps (Figure 2), shapes the statistical behavior of residence times and their distribution. This growth, rather than the static probabilities of being in a specific state, determines the emergent long-term dynamics and reflects whether the system is ergodic or non-ergodic.
To maximize the entropy $H = -\sum_{\tau} P(\tau) \log P(\tau)$ under a fixed logarithmic mean, $\langle \log \tau \rangle = \sum_{\tau} P(\tau) \log \tau$, and the probability normalization condition $\sum_{\tau} P(\tau) = 1$, we define the functional

$$\mathcal{L} = -\sum_{\tau} P(\tau) \log P(\tau) + \lambda_1 \left( \sum_{\tau} P(\tau) - 1 \right) + \lambda_2 \left( \sum_{\tau} P(\tau) \log \tau - \langle \log \tau \rangle \right), \tag{30}$$
where $\lambda_{1,2}$ are the Lagrange multipliers. The variational problem for the entropy maximum under constraints involving the logarithmic function is considered in [11]. Taking the variation of the functional (30) with respect to $P(\tau)$, we obtain the following:

$$\frac{\partial \mathcal{L}}{\partial P(\tau)} = -\log P(\tau) - 1 + \lambda_1 + \lambda_2 \log \tau = 0. \tag{31}$$
Solving (31) for $P(\tau)$ yields $P(\tau) \propto \tau^{\lambda_2}$, where the exponent $\alpha = -\lambda_2$ is determined by the value of $\langle \log \tau \rangle$ and the normalization condition $\sum_{\tau} P(\tau) = 1$. The resulting probability distribution has the power-law form compatible with (20) and (23), viz.,

$$P(\tau) = \frac{\tau^{-\alpha}}{\zeta(\alpha)}, \tag{32}$$
where $\zeta(\alpha)$ is the Riemann zeta function ensuring normalization. This distribution reflects the dominance of rare, long residence times for small $\alpha$, a hallmark of systems exhibiting scale-invariant dynamics.
In systems where $\langle \log \tau \rangle$ is conserved, the logarithmic mean acts as a constraint, favoring configurations where the majority of residence times are short, but a non-negligible fraction exhibits extremely long durations. For the Rosencrantz coin model, a similar principle applies. When the barrier dynamics enforce a logarithmic constraint on residence times, the most likely distribution $P(\tau)$ transitions from exponential ($\eta = 0$) to power-law ($\eta = 1$) behavior,

$$P(\tau) \sim \tau^{-\alpha}, \qquad \alpha = 1 + \varepsilon, \tag{33}$$
where $\varepsilon > 0$ reflects the degree of long-range correlations in the system. This framework underscores the emergence of extended streaks of successes or failures as a natural consequence of the interplay between entropy maximization and logarithmic constraints. Our findings on the logarithmic growth of residence times align with Harris's emphasis on logarithmic utility functions for entropy estimation [6]. This shared focus underscores the importance of scaling laws in non-parametric systems, where predictability emerges not from state probabilities but from temporal patterns.
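For the zeta-normalized law (32), the conserved logarithmic mean pins down the exponent through $\langle \ln \tau \rangle = -\zeta'(\alpha)/\zeta(\alpha)$, obtained by differentiating the normalization sum $\sum_{\tau} \tau^{-\alpha} = \zeta(\alpha)$. A minimal sketch, assuming the mpmath library is available (the function name is an illustrative choice), recovers $\alpha$ from a prescribed logarithmic mean:

```python
import mpmath as mp

def alpha_from_log_mean(log_mean):
    """Solve -zeta'(alpha)/zeta(alpha) = <ln tau> for the exponent alpha
    of the maximum-entropy distribution P(tau) = tau^{-alpha}/zeta(alpha).
    The left-hand side decreases from +infinity (alpha -> 1+) toward 0,
    so the root is unique; a bracketing solver avoids the pole at 1."""
    f = lambda a: -mp.zeta(a, 1, 1) / mp.zeta(a) - log_mean
    return mp.findroot(f, (1.05, 30.0), solver='anderson')

# Larger logarithmic means (rarer, longer residence times) push alpha
# down toward 1, fattening the tail of P(tau).
for lm in (0.1, 0.5, 1.0, 2.0):
    print(lm, alpha_from_log_mean(lm))
```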

5. Structure and Dynamics of Sequences in the Non-Ergodic Rosencrantz Coin Model

For the sequences generated by the non-ergodic Rosencrantz coin, entropy and its decomposition into predictable and unpredictable components are inapplicable because there is no stationary distribution. Figuratively speaking, the coin "sticks" in each of its states, so the structure of observed sequences is characterized by blocks of varying length in which the coin remains in the same state for an extended period of time. As we have seen from the analysis of recurrence times (Section 3), the length of such blocks gradually increases, so that increasingly extended streaks of success are observed in sufficiently long sequences. Because the expected recurrence times grow logarithmically, the system's dynamics become increasingly non-stationary, and the usual assumptions of equilibrium and steady-state distributions no longer apply. The probability of observing a sequence consisting of $m$ consecutive blocks of lengths $n_1, \ldots, n_m$ is characterized by the product $\prod_{i=1}^{m} P_{\eta=1}(n_i)$, where $n = n_1 + \cdots + n_m$ is a partition of $n$ into $m$ terms.
The number of ways to partition a sequence of $n$ elements into $m$ blocks is given by the Stirling numbers of the second kind $S(n,m)$ [12], which satisfy the recurrence relation

$$S(n+1, m) = S(n, m-1) + m\, S(n, m). \tag{34}$$
The values of $S(n,m)$ are not uniformly distributed across all possible block sizes but exhibit a maximum at specific values of $m$. Figure 3a shows the normalized Stirling numbers of the second kind, representing the probability of partitioning a sequence of length $n$ into $m$ non-empty blocks. The normalization uses the corresponding Bell numbers, $B(n) = \sum_{m=1}^{n} S(n,m)$, which count the total number of partitions of an $n$-set. This normalization scales large combinatorial values into probabilities, enabling clear graphical representation. The horizontal axis uses a logarithmic scale to depict sequence lengths of 100, 500, 1000, and 5000, while the vertical axis shows the corresponding partition probabilities. The sharp peaks in the distributions highlight the most probable partition configurations for each sequence length.
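The recurrence (34) makes these partition probabilities straightforward to compute exactly. The following sketch (illustrative Python with exact integer arithmetic, not the code used for Figure 3a) builds the row $S(n, m)$, normalizes it by the Bell number, and locates the peak; comparing the peak with the leading-order estimate $n/\ln n$ shows that the correction terms discussed below remain noticeable at moderate $n$.

```python
import math

def stirling_row(n):
    """Row S(n, m), m = 0..n, from the recurrence (34):
    S(n+1, m) = S(n, m-1) + m * S(n, m), starting from S(0, 0) = 1."""
    row = [1]
    for _ in range(n):
        prev = row + [0]
        row = [(prev[m - 1] if m > 0 else 0) + m * prev[m]
               for m in range(len(prev))]
    return row

for n in (100, 500, 1000):
    row = stirling_row(n)
    bell = sum(row)                                   # Bell number B(n)
    m_peak = max(range(len(row)), key=row.__getitem__)
    # peak position vs. the leading-order estimate n / ln n
    print(n, m_peak, round(n / math.log(n), 1), row[m_peak] / bell)
```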
As n increases, certain partition configurations become increasingly dominant due to the combinatorial growth of the Stirling numbers. These configurations correspond to partitions where the block sizes are roughly balanced, maximizing the number of ways the sequence can be divided.
Taking the natural logarithm of $S(n,m)$, we obtain the following asymptotic estimate:

$$\ln S(n,m) \simeq m \ln \frac{n}{m} + O(m). \tag{35}$$
By differentiating (35) with respect to $m$ (keeping track of the $O(m)$ term) and setting the derivative to zero, we find the position of the maximum probability at $m_{\max} \approx n/\ln n$. A detailed analysis conducted in [13] provides a more accurate estimate:

$$m_{\max} \simeq \frac{n}{\ln n} + O\!\left( \frac{n\, (\ln \ln n)^{1/2}}{(\ln n)^{3/2}} \right). \tag{36}$$
Substituting this value of $m_{\max}$ back into the asymptotic expression for the logarithm of the maximum Stirling number, we have

$$\ln S_{\max} \simeq n \ln n - n - n \ln \ln n + 2n\, \frac{\ln \ln n}{\ln n}. \tag{37}$$
The above estimate suggests that the system's temporal horizon $n$ is most likely partitioned into blocks of size $\sim \ln n$, representing the typical duration spent in a state before switching. The combinatorial advantage that favors specific block sizes reflects the intrinsic structure determined by the Stirling numbers of the second kind for large $n$. The estimates in (37) become increasingly accurate as $n$ grows; however, for smaller $n$, correction terms can significantly influence the approximation (see Figure 3).
According to Rennie and Dobson [13], as $n$ increases, the most likely partition of the sequence into $r = (1 \pm \epsilon)\, n/\ln n$ blocks approaches the configuration described by (37). The deviation between the maximum Stirling number and the actual Stirling number for such a partition is given by

$$\ln S_{\max} - \ln S(n,r) \simeq n\, H, \qquad H \simeq \frac{(\ln \ln n)^2}{2 \ln n}. \tag{38}$$
The rate of convergence $H$ to the most likely partition structure in (38) can be interpreted as an entropy-like function, a measure of block-related entropy for the non-ergodic system. The value of $H$ varies only slightly over a wide range of $n$. Asymptotically, a sequence of length $n$ will most likely be divided into $m_{\max} = n/\ln n$ blocks, each of length $R \simeq \ln n$, which is the characteristic recurrence time for the Rosencrantz coin remaining in a particular state. Consequently, the probability of observing a sequence consistent with the maximum-likelihood configuration equals

$$\left[P_{\eta=1}(\ln n)\right]^{\,n/\ln n} \simeq \exp\left( -2n\, \frac{\ln \ln n}{\ln n} \right) = \exp\left( -2n\, \frac{\ln R}{R} \right), \tag{39}$$
being compensated by the last correction term in the asymptotic approximation for the maximum Stirling number (37): for blocks of the most probable size, the power-law decay over time is offset by the combinatorial growth of partitions. The term $2 \ln R / R$ in (39) corresponds to the entropy of a stationary coin, as discussed earlier in (2). It captures the balance between the combinatorial growth of the predominant partitions and the algebraic decay of block probabilities (20), which stabilizes the characteristic block size at $\ln n$. For sufficiently large $n$, the combinatorial increase in the number of predominant partitions compensates for the decay in block probabilities, ultimately dominating the system's behavior. Consequently, the structure of generated sequences resembles a deterministic schedule rather than a purely random process. The deterministic-like pattern emerges because the recurrence time $R$ aligns with the residence and repetition times, making the sequence's structure predictable. This conclusion fully aligns with the case of a biased coin $T(p,q)$ in which the probabilities $p$ and $q$ of repeating a state are close to one. Under these conditions, the coin's behavior becomes almost entirely predictable (see Figure 1).
For such a non-ergodic system, the relevant question is not whether the next state of the coin can be predicted, but rather what temporal horizon $n$, the length of the sequence of flips, can be anticipated from the observed characteristic block size $R$. Solving the equation $R = n/\ln n$ gives the following:

$$n(R) = -R\; W_{-1}\!\left(-R^{-1}\right), \tag{40}$$
where $W_{-1}(x)$ denotes the $-1$ branch of the Lambert function, defined as the inverse of $y e^{y} = x$. This branch is real-valued for $-e^{-1} \leq x < 0$ and decreases monotonically towards negative infinity as $x \to 0^{-}$ along the negative real axis, which forms the branch cut [14]. Figure 3b illustrates the relationship between the recurrence time $R$ and the temporal horizon $n$ given by (40). The curve demonstrates that as the recurrence time, derived from the optimal block size determined by combinatorial analysis, increases, the temporal horizon grows super-linearly. This relationship implies that the longer the system remains in a state before switching, the further into the future one can predict its behavior. Solving $R = n/\ln n$ via the Lambert function (40) thus provides a quantitative prediction of how long structured behavior will persist.
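Equation (40) is directly computable with any Lambert W implementation. A short sketch, assuming scipy is available (note that the $-1$ branch is real-valued only for $R \geq e$), evaluates the horizon and confirms that $n/\ln n$ returns the prescribed recurrence time:

```python
import numpy as np
from scipy.special import lambertw

def temporal_horizon(R):
    """Temporal horizon n(R) = -R * W_{-1}(-1/R), i.e., the solution of
    R = n / ln(n) on the branch with n > e (equation (40)); real for R >= e."""
    return float((-R * lambertw(-1.0 / R, k=-1)).real)

for R in (3, 5, 10, 20):
    n = temporal_horizon(R)
    print(R, round(n, 1), round(n / np.log(n), 2))  # last column recovers R
```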

6. Discussion—Temporal Horizons and the Predictive Economy of Non-Ergodic Systems

In the study of non-ergodic systems like the Rosencrantz coin model, the temporal horizon refers to the characteristic duration over which predictable patterns and structures emerge within the system’s evolution. Unlike ergodic systems, where time averages converge to ensemble averages, non-ergodic systems fail to visit all possible states uniformly. Consequently, the system’s long-term behavior cannot be fully captured by stationary probability distributions.
This limitation shifts the analytical focus to understanding the durations between key events, such as state switches or recurrences. In such cases, combinatorial insights compensate for the absence of a stationary distribution, enabling meaningful temporal predictions in systems governed by deterministic-like patterns amidst randomness. The interplay between the recurrence time and the temporal horizon provides a robust framework for predicting the extent of structured behavior in non-ergodic processes.
Partitioning these sequences into blocks of persistent states reveals a deep connection to combinatorial structures, particularly the Stirling numbers of the second kind. These numbers exhibit a sharp maximum for partitions into blocks of size $\sim \ln n$. For sufficiently large $n$, the combinatorial growth in the number of partitions compensates for the decay of the block probabilities $\prod_{i=1}^{m} P_{\eta=1}(n_i)$ for the most likely partitions $n = n_1 + \cdots + n_m$. This balance between increasing block lengths and decreasing block probabilities results in a deterministic-like structure within an inherently stochastic system.
The temporal horizon, defined as the duration over which predictable patterns emerge, offers a new perspective for analyzing non-ergodic processes. The logarithmic recurrence time $R \simeq \ln n$ can be interpreted as the prediction utility of time within a conceptual prediction economy. The logarithmic utility of time implies that predictable patterns arise over increasingly longer horizons. Although the first derivative $R'(n) > 0$, indicating that additional time enhances prediction, the second derivative $R''(n) < 0$ reflects diminishing returns: each subsequent unit of time (or coin flip) adds less predictive value than the previous one. The concave shape of the logarithmic utility highlights this diminishing predictive benefit.
This logarithmic structure leads naturally to hyperbolic time discounting in prediction. Hyperbolic discounting prioritizes short-term forecasts with higher certainty over long-term, uncertain ones. The Arrow–Pratt measure of risk aversion [15,16] and the Leland measure of prudence [17] formalize this idea:

$$-\frac{R''(n)}{R'(n)} = \frac{1}{n}, \qquad -\frac{R'''(n)}{R''(n)} = \frac{2}{n}. \tag{41}$$
These measures describe how knowledge of a block’s length improves the ability to predict the temporal horizon. This relationship aligns with the hyperbolic time discounting model, a well-established framework in human [18] and animal [19] intertemporal choice, where immediate rewards are weighted more heavily than future gains. In the Rosencrantz coin model, this perspective underscores the preference for short-term, reliable predictions over long-term, uncertain ones. In his exploration of entropy estimation, Harris [6] identifies significant challenges when probability mass is distributed across numerous small classes. In the Rosencrantz coin model, similar obstacles arise due to the absence of stationary distributions. By employing logarithmic scaling and Stirling numbers of the second kind, we provide a robust framework for analyzing residence times and sequence predictability.
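Both measures in (41) follow from the logarithmic utility $R(n) = \ln n$ by direct differentiation, which can be checked symbolically; a minimal sketch assuming sympy is available:

```python
import sympy as sp

n = sp.symbols('n', positive=True)
R = sp.log(n)  # logarithmic prediction utility of time

# Arrow-Pratt risk aversion and Leland prudence of the utility (41)
risk_aversion = sp.simplify(-sp.diff(R, n, 2) / sp.diff(R, n))
prudence = sp.simplify(-sp.diff(R, n, 3) / sp.diff(R, n, 2))
print(risk_aversion, prudence)  # prints 1/n and 2/n
```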

7. Conclusions

In this paper, we explored the dynamics of both ergodic and non-ergodic systems through the lens of a coin-flipping model, specifically focusing on the Rosencrantz coin that can “freeze” in one of its states. We compared the behavior of a conventional biased coin, characterized by predictable and unpredictable entropy components, to the non-ergodic Rosencrantz coin, where the absence of a stationary distribution alters the predictability framework.
In the ergodic case, entropy decomposition into predictable and unpredictable components relies on characteristic times such as recurrence, residence, and repetition times. These measures provide insights into the structure and behavior of state sequences generated by stationary Markov chains. However, in the non-ergodic Rosencrantz coin model, where the system can persist in a state for extended periods, traditional entropy decomposition becomes inapplicable. Harris’s seminal work on non-parametric entropy estimation provides critical insights into how entropy can be measured in systems with large or infinite state spaces [6]. While our model does not explicitly rely on sampling from a multinomial population, the estimation challenges addressed in Harris’s framework resonate with the Rosencrantz coin, where the combinatorial structure compensates for the lack of a stationary distribution.
Our analysis demonstrated that the Rosencrantz coin's non-ergodic dynamics lead to logarithmically increasing block lengths in sequences of persistent states. By leveraging the Stirling numbers of the second kind, we identified the most probable partitioning into blocks of size $\log n$ and showed that the combinatorial growth in the number of partitions compensates for the algebraic decay of block probabilities. This balance results in deterministic-like structures emerging from fundamentally stochastic processes. Our analysis also revealed that the logarithmic asymptotic integral of motion, $\langle \log \tau \rangle$, plays a pivotal role in shaping residence time distributions. The transition from exponential to power-law behavior reflects the system's ability to balance short-term predictability with the emergence of rare, extended events. This finding underscores the broader relevance of logarithmic constraints and entropy maximization in understanding the temporal dynamics of complex systems.
The concept of a temporal horizon n , defined by the recurrence time R , provides a framework for predicting the duration of structured behavior in non-ergodic systems. The logarithmic utility of time implies diminishing predictive returns as the temporal horizon extends, aligning with hyperbolic time discounting models seen in human and animal decision-making.
In conclusion, the Rosencrantz coin model reveals that in non-ergodic systems, predictability is less about individual outcomes and more about understanding the length of sequences governed by persistent state blocks. This study bridges information theory, combinatorics, and stochastic processes, offering a deeper perspective on the dynamics and predictability of systems where conventional equilibrium frameworks fail.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank Texas Tech University for the administrative and technical support.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Daza, Á.; Wagemakers, A.; Sanjuán, M.A.F. Multistability and unpredictability. Phys. Today 2024, 77, 44–50.
  2. Volchenkov, D. Memories of the Future. Predictable and Unpredictable Information in Fractional Flipping a Biased Coin. Entropy 2019, 21, 807.
  3. Stoppard, T. Rosencrantz and Guildenstern Are Dead; Grove Press: New York, NY, USA, 1967.
  4. Ricci, L.; Perinelli, A.; Castelluzzo, M. Estimating the variance of Shannon entropy. Phys. Rev. E 2021, 104, 024220.
  5. Roulston, M.S. Estimating the errors on measured entropy and mutual information. Phys. D Nonlinear Phenom. 1999, 125, 285–294.
  6. Harris, B. The Statistical Estimation of Entropy in the Non-Parametric Case. In Topics in Information Theory; Csiszár, I., Elias, P., Eds.; North-Holland: Amsterdam, The Netherlands, 1975; pp. 323–355.
  7. Ricci, L. Asymptotic distribution of sample Shannon entropy in the case of an underlying finite, regular Markov chain. Phys. Rev. E 2021, 103, 022215.
  8. James, R.G.; Ellison, C.J.; Crutchfield, J.P. Anatomy of a Bit: Information in a Time Series Measurement. Chaos 2011, 21, 037109.
  9. Volchenkov, D. Infinite Ergodic Walks in Finite Connected Undirected Graphs. Entropy 2021, 23, 205.
  10. Volchenkov, D. Survival under Uncertainty: An Introduction to Probability Models of Social Structure and Evolution; Understanding Complex Systems; Springer: Berlin/Heidelberg, Germany, 2016; ISBN 978-3-319-39419-0.
  11. Visser, M. Zipf's law, power laws and maximum entropy. New J. Phys. 2013, 15, 043021.
  12. Graham, R.L.; Knuth, D.E.; Patashnik, O. Concrete Mathematics; Addison–Wesley: Reading, MA, USA, 1988; p. 244; ISBN 0-201-14236-8.
  13. Rennie, B.C.; Dobson, A.J. On Stirling numbers of the second kind. J. Comb. Theory 1969, 7, 116–121.
  14. Corless, R.M.; Gonnet, G.H.; Hare, D.E.G.; Jeffrey, D.J.; Knuth, D.E. On the Lambert W Function. Adv. Comput. Math. 1996, 5, 329–359.
  15. Pratt, J.W. Risk Aversion in the Small and in the Large. Econometrica 1964, 32, 122–136.
  16. Arrow, K.J. Essays in the Theory of Risk-Bearing; Markham Publishing Co.: Chicago, IL, USA, 1971.
  17. Leland, H.E. Saving and uncertainty: The precautionary demand for saving. Q. J. Econ. 1968, 82, 465–473.
  18. Laibson, D. Golden Eggs and Hyperbolic Discounting. Q. J. Econ. 1997, 112, 443–477.
  19. Mazur, J.E. An Adjusting Procedure for Studying Delayed Reinforcement. In The Effect of Delay and of Intervening Events on Reinforcement Value; Commons, M.L., Mazur, J.E., Nevin, J.A., Rachlin, H., Eds.; Quantitative Analyses of Behavior, Volume 5; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1987; pp. 55–73.
Figure 1. Decomposition of the entropy $H(p,q)$ into its components (3) for a biased coin modeled as a Markov chain $T(p,q)$. (a) The top surface shows the total entropy $H(p,q)$, the middle surface shows the entropy rate (7), and the bottom surface shows the conditional mutual information (12). The gap between the top and middle surfaces illustrates the excess entropy (9). The gap between the middle and the bottom surfaces corresponds to the unpredictable information component (14) of the system. (b) The top, convex surface shows the unpredictable component of information, reaching a maximum of 1 bit for a fair coin ($p = q = \tfrac12$) but reducing to zero for a deterministic coin ($p, q \in \{0, 1\}$). The bottom, concave surface represents the predictable component of the system, which grows as the system becomes more deterministic. Together, these surfaces illustrate the interplay between randomness and structure, showing how predictability increases as $p$ and $q$ deviate from $\tfrac12$.
Figure 2. Recurrence times for the Rosencrantz coin model. The curves correspond to different values of $\eta_1$ and $\eta_2$: $\eta_{1,2} = 0$ (solid line) shows a stable recurrence time of 2, consistent with a fair coin and the ergodic hypothesis; $\eta_1 = 0$, $\eta_2 = 1$ (dotted line) shows gradually increasing recurrence times; $\eta_{1,2} = 1$ (dashed line) shows significantly increasing recurrence times.
Figure 3. (a) Partition probabilities for sequences of lengths 100, 500, 1000, and 5000. The graph displays the Stirling numbers of the second kind, normalized by the corresponding Bell numbers, representing the probability of partitioning a sequence of length $n$ into $m$ non-empty subsets. The sharp peaks, located at $m \sim O(n/\ln n)$, indicate the maxima of the Stirling numbers, highlighting the most probable partition configurations. The horizontal axis uses a logarithmic scale to effectively capture the wide range of sequence lengths. (b) Temporal horizon $n$ as a function of the recurrence time $R$ for a non-ergodic Rosencrantz coin.