Kolmogorov Complexity and Applications—Dedicated to Professor Paul Vitanyi on the Occasion of His 80th Birthday

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (31 March 2026) | Viewed by 11900

Editor


Guest Editor
1. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
2. Central China Research Institute of Artificial Intelligence, Zhengzhou 450046, China
Interests: bioinformatics; machine learning; Kolmogorov complexity; information distance

Special Issue Information

Dear Colleagues,

Over his distinguished career, Prof. Paul Vitanyi has worked on the theory of computation and Kolmogorov complexity. He has extended Kolmogorov complexity and its applications, bringing the subject out of obscure mathematics and to a wide public. His contributions to this modern information theory have influenced many researchers in many fields, from computer science to mathematics, cognitive science, biology, philosophy, and physics.

To celebrate his 80th birthday, this Special Issue aims to collect original research articles on the most recent research in Kolmogorov complexity, randomness, large language models, and compression, as well as comprehensive review articles covering these topics from either a theoretical or experimental viewpoint. A review can focus on either a wide context or the recent research contributions of the author(s) and related works of other researchers on the same topic.

We also welcome applications of Kolmogorov complexity, information distance and one-shot learning, incompressibility methods, theories of human learning, Solomonoff induction and large generative models, and Kolmogorov structure functions.

Prof. Dr. Ming Li
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Kolmogorov complexity
  • randomness
  • compression and LLM
  • information distance, theory, and applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (12 papers)

Research

11 pages, 249 KB  
Article
Dialectics for Artificial Intelligence
by Zhengmian Hu
Entropy 2026, 28(6), 611; https://doi.org/10.3390/e28060611 - 29 May 2026
Viewed by 367
Abstract
Can artificial intelligence discover, from raw experience and without human supervision, concepts that humans have discovered? One challenge is that human concepts themselves are fluid: conceptual boundaries can shift, split, and merge as inquiry progresses (e.g., Pluto is no longer considered a planet). To make progress, we need a definition of “concept” that is not merely a dictionary label, but a structure that can be revised, compared, and aligned across agents. We propose an algorithmic information viewpoint that treats a concept as an information object defined only through its structural relation to an agent’s total experience. The core constraint is determination: a set of parts forms a reversible consistency relation if any missing part is recoverable from the others (up to the standard logarithmic slack in Kolmogorov complexity). This reversibility prevents “concepts” from floating free of experience and turns concept existence into a checkable structural claim. To judge whether a decomposition is natural, we define excess information, measuring the redundancy overhead introduced by splitting experience into multiple separately described parts. On top of these definitions, we formulate dialectics as an optimization dynamics: as new patches of information appear (or become contested), competing concepts bid to explain them via shorter conditional descriptions, driving systematic expansion, contraction, splitting, and merging. Finally, we formalize low-cost concept transmission and multi-agent alignment using small grounds that allow another agent to reconstruct the same concept under a shared protocol, making communication a concrete compute-bits trade-off.

33 pages, 1964 KB  
Article
On the Empirical Agreement Between Compression and Program-Execution Approaches to Algorithmic Complexity: A Controlled Study Using BDM
by Zoe Leyva-Acosta, Eduardo Acuña Yeomans and Francisco Hernández-Quiroz
Entropy 2026, 28(6), 601; https://doi.org/10.3390/e28060601 - 27 May 2026
Viewed by 222
Abstract
Algorithmic complexity is a foundational notion in theoretical computer science, but its incomputability has led to two families of practical estimators: compression-based and program-execution-based (e.g., the Coding Theorem Method, CTM). Despite widespread use, the correspondence between these paradigms remains poorly understood. We present a systematic comparative framework that uses the Block Decomposition Method (BDM) to extend CTM-based estimates to longer strings, enabling direct comparison with compression-based estimators across multiple computational models. A control estimator (BDMId) isolates the contribution of block structure from algorithmic information, providing a rigorous baseline for interpreting correlations. Our results show that cross-paradigm correlations are weak and decrease systematically as model resolution decreases; for the lowest-resolution model, correlations are essentially null. In long strings, per-length correlations vanish, while global correlations appear high but are largely explained by the control estimator, indicating that they are driven primarily by trivial length effects rather than shared sensitivity to algorithmic structure. Crucially, for low-resolution models, BDMId outperforms BDM itself, indicating that the inclusion of CTM information does not improve (and may even reduce) agreement with compression-based estimators. These findings suggest that compression-based and program-execution-based estimators capture fundamentally different aspects of structure. Rather than invalidating either approach, this work provides a systematic methodology for assessing cross-paradigm correspondence and highlights the importance of explicit controls in empirical comparisons of algorithmic complexity.

16 pages, 2346 KB  
Article
Semantic Algorithmic Information Theory: From Kolmogorov Complexity to Semantic Equivalence
by Jiatong Wu, Sen Wang, Kai Niu, Yifei She and Ping Zhang
Entropy 2026, 28(5), 554; https://doi.org/10.3390/e28050554 - 14 May 2026
Viewed by 392
Abstract
Classical Algorithmic Information Theory (AIT) provides a rigorous foundation for information-based similarity measurement, but classical formulations and their compression-based approximations largely operate at the syntactic level, making them sensitive to surface-level variation and insufficient for semantic equivalence. To address this limitation, this paper introduces Semantic Algorithmic Information Theory. The contributions are organized around three core aspects. First, regarding algorithmic extension, we formalize the Semantic Turing Machine System (STMS) to decouple abstract concepts from their diverse syntactic realizations. Within this framework, Semantic Complexity is defined as the minimum program length required to generate some realization in a synonymous set, thereby characterizing compact meaning representation. Second, to enable approximate computation, we move from the ideal, uncomputable semantic information distance to a model-based direct estimator of the Normalized Semantic Information Distance (NSID), which uses neural autoregressive models as conditional probability estimators. Finally, through experimental validation and comparative analysis, we show that the NSID estimator suppresses syntactic variance while preserving semantic structure. Empirical results indicate that NSID provides a practical, computable surrogate for semantic distance and improves upon classical syntactic metrics in evaluating cross-representational equivalence.

31 pages, 985 KB  
Article
The Physics, Information, and Computation of Perennial Learning: Kolmogorov Complexity, Information Distance, and Port-Hamiltonian Thermodynamics
by Chandrajit Bajaj
Entropy 2026, 28(5), 551; https://doi.org/10.3390/e28050551 - 13 May 2026
Viewed by 355
Abstract
Real-world autonomous agents learn under nonstationarity, safety constraints, and finite energetic budgets. We develop a framework for perennial learning (agents that continuously refine their models while provably controlling the cost of forgetting) by unifying three classical pillars: Kolmogorov complexity, which equates scientific discovery with algorithmic compression; Landauer’s principle, which assigns a minimal thermodynamic cost of $k_B T \ln 2$ per erased bit to every irreversible model update; and port-Hamiltonian (PH) dynamics, whose $(J - R)\nabla H$ decomposition separates zero-cost reversible inference from costly irreversible forgetting by construction. The Maxwell demon analogy is formalized: each learning episode is a Szilard cycle in which information acquisition, belief transport, and memory erasure must balance thermodynamically. The information-distance framework, comprising the normalized information distance (NID) and normalized compression distance (NCD), provides a computable geometry for measuring learning progress and guiding curriculum design. We separate the ideal uncomputable regularizer based on prefix complexity from the practical compressor/MDL (minimum description length) surrogate that appears in optimization and prove a calibration lemma linking the two under a mild uniform-accuracy assumption. Under explicit regularity, compact-sublevel, and non-energy-extracting assumptions, we prove a passivity speed limit for curriculum-induced contractions of the effective feasible set. Under local asymptotic normality, we reprove that Fisher information is a local posterior codelength proxy rather than an exact theorem about algorithmic entropy. A conditional sequential information-budget proposition shows that the per-stage sample requirement scales as $\tilde{O}(\Delta k_t/\lambda)$, where $\Delta k_t$ is the number of materially changed model coordinates (not the total model complexity $k_t$); the $k \to \Delta k$ improvement is conditional on a warm-start assumption and a chosen cold-start baseline. A double-integrator running example with a moving obstacle illustrates the architecture.
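As a reader's aid for the normalized compression distance (NCD) mentioned in this abstract, here is a minimal sketch under stated assumptions. It is not taken from the paper: it uses the standard NCD formula with zlib as the stand-in compressor, and the names `compressed_len`, `ncd`, and the sample strings are hypothetical.

```python
import hashlib
import zlib


def compressed_len(data: bytes) -> int:
    """Byte length of the zlib-compressed input (a computable proxy for K)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical sample data: two near-identical texts and one unrelated blob.
text_a = b"the quick brown fox jumps over the lazy dog. " * 20
text_b = text_a.replace(b"lazy", b"idle")
# Concatenated SHA-256 digests serve as deterministic, hard-to-compress bytes.
noise = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(28))

print(f"NCD(text_a, text_b) = {ncd(text_a, text_b):.3f}")
print(f"NCD(text_a, noise)  = {ncd(text_a, noise):.3f}")
```

With a real compressor the values are only approximate, so NCD is best used to compare relative distances (related texts score lower than unrelated ones) rather than as an exact metric.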

11 pages, 274 KB  
Article
The Largest Number Representable in 64 Bits
by John Tromp
Entropy 2026, 28(5), 494; https://doi.org/10.3390/e28050494 - 26 Apr 2026
Viewed by 672
Abstract
We investigate how large an output can be computed by programs fitting inside a single register, using languages not designed for generating large outputs. We propose lambda calculus-based Busy Beaver functions that offer various advantages over the existing Turing machine-based ones, including a direct relation to Kolmogorov complexity.
26 pages, 360 KB  
Article
The Boltzmann Entropy and Randomness Tests
by Peter Gács
Entropy 2026, 28(4), 429; https://doi.org/10.3390/e28040429 - 11 Apr 2026
Viewed by 396
Abstract
In the context of the dynamical systems of classical mechanics, we introduce two new notions called “algorithmic fine-grain and coarse-grain entropy”. The fine-grain algorithmic entropy is, on the one hand, a simple variant of the randomness tests of Martin–Löf (and others) and is, on the other hand, a connecting link between description (Kolmogorov) complexity, Gibbs entropy and Boltzmann entropy. The coarse-grain entropy is a slight correction to Boltzmann’s coarse-grain entropy. Its main advantage is that it depends less on the partition, because algorithmic entropies for different coarse grainings are approximations of one and the same fine-grain entropy. It has the desirable properties of Boltzmann entropy in a wider range of systems, including those of interest in the “thermodynamics of computation”. It also helps explain the behavior of some unusual spin systems arising from cellular automata.
37 pages, 637 KB  
Article
AI Agents as Universal Task Solvers
by Alessandro Achille and Stefano Soatto
Entropy 2026, 28(3), 332; https://doi.org/10.3390/e28030332 - 16 Mar 2026
Viewed by 2281
Abstract
We describe AI agents as stochastic dynamical systems and frame the problem of learning to reason as in transductive inference: Rather than approximating the distribution of past data as in classical induction, the objective is to capture its algorithmic structure so as to reduce the time needed to solve new tasks. In this view, information from past experience serves not only to reduce a model’s uncertainty, as in Shannon’s classical theory, but to reduce the computational effort required to find solutions to unforeseen tasks. Working in the verifiable setting, where a checker or reward function is available, we establish three main results. First, we show that the optimal speed-up for a new task is tightly related to the algorithmic information it shares with the training data, yielding a theoretical justification for the power-law scaling empirically observed in reasoning models. Second, while the compression view of learning, rooted in Occam’s Razor, favors simplicity, we show that transductive inference yields its greatest benefits precisely when the data-generating mechanism is most complex. Third, we identify a possible failure mode of naïve scaling: in the limit of unbounded model size and computing, models with access to a reward signal can behave as savants, brute-forcing solutions without acquiring transferable reasoning strategies. Accordingly, we argue that a critical quantity to optimize when scaling reasoning models is time, the role of which in learning has remained largely unexplored.

29 pages, 674 KB  
Article
The Algorithmic Regulator
by Giulio Ruffini
Entropy 2026, 28(3), 257; https://doi.org/10.3390/e28030257 - 26 Feb 2026
Viewed by 1906
Abstract
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system (W,R) as a single self-delimiting program p via a constant-size wrapper that produces the world output string x fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, K(x) (regulation as compression). We define R to be a good algorithmic regulator if it reduces the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., $\Delta = K(O_{W,\varnothing}) - K(O_{W,R}) > 0$. We then prove that the larger Δ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap Δ>0 yields $\Pr((W,R) \mid x) \le C\, 2^{M(W:R)}\, 2^{-\Delta}$, making low M(W:R) exponentially unlikely as Δ grows. This is an AIT version of the idea that “the regulator contains a model of the world.” The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a canonical scalar objective and implicates a planner. On the realized episode, a regulator behaves as if it minimized the conditional description length of the readout.

6 pages, 210 KB  
Article
Why Turing’s Computable Numbers Are Only Non-Constructively Closed Under Addition
by Jeff Edmonds
Entropy 2026, 28(1), 71; https://doi.org/10.3390/e28010071 - 7 Jan 2026
Viewed by 737
Abstract
Kolmogorov complexity asks whether a string can be outputted by a Turing Machine (TM) whose description is shorter. Analogously, a real number is considered computable if a Turing machine can generate its decimal expansion. The modern ϵ-approximation definition of computability, widely used in practical computation, ensures that computable reals are constructively closed under addition. However, Turing’s original 1936 digit-by-digit notion, which demands the direct output of the n-th digit, presents a stark divergence. Though the set of Turing-computable reals is not constructively closed under addition, we prove that a Turing machine capable of computing x+y non-constructively exists. The core constructive computational barrier arises from determining the ones digit of a sum like $0.\overline{3} + 0.\overline{6} = 0.\overline{9} = 1.\overline{0}$. This particular example is ambiguous because both $0.\overline{9}$ and $1.\overline{0}$ are legitimate decimal representations of the same number. However, if any of the infinite number of 3s in the first term is changed to a 2 (e.g., $0.3332 + 0.\overline{6}$), the sum’s leading digit is definitely zero. Conversely, if it is changed to a 4 (e.g., $0.3334 + 0.\overline{6}$), the leading digit is definitely one. This implies an inherent undecidability in determining these digits. Recent papers and our work address this issue. Hamkins provides an informal argument, while Berthelette et al. present a more complicated formal proof, and our contribution offers a simple reduction to the Halting Problem. We demonstrate that determining when carry propagation stops can be resolved with a single query to an oracle that tells if and when a given TM halts. Because a concrete answer to this query exists, so does a TM computing the digits of x+y, though the proof is non-constructive. As far as we know, the analogous question for multiplication remains open. This, we feel, is an interesting addition to the story. This reveals a subtle but significant difference between the modern ϵ-approximation definition and Turing’s original 1936 digit-by-digit notion of a computable number, as well as between constructive and non-constructive proof. This issue of computability and numerical precision ties into algorithmic information and Kolmogorov complexity.
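The carry-propagation barrier described in this abstract can be illustrated with a small sketch. It is an illustration only, not the paper's construction: the function name `ones_digit_of_sum`, the digit-stream representation, and the `max_digits` cutoff are all assumptions made here. The scan decides the ones digit as soon as a digit pair sums to something other than 9, and gives up (returns `None`) when the cutoff is reached, which is exactly the case a real digit-by-digit machine cannot bound in advance.

```python
from itertools import chain, islice, repeat
from typing import Iterable, Optional


def ones_digit_of_sum(xs: Iterable[int], ys: Iterable[int],
                      max_digits: int) -> Optional[int]:
    """Scan the fractional digits of x and y (both in [0, 1)) from the left.

    Returns a valid ones digit of x + y (0 or 1), or None if the first
    max_digits digit pairs all sum to 9 and the carry is still undetermined.
    """
    for dx, dy in islice(zip(xs, ys), max_digits):
        pair_sum = dx + dy
        if pair_sum < 9:
            return 0  # the run of 9s so far can be written out without a carry
        if pair_sum > 9:
            return 1  # a carry is generated here and ripples through the 9s
        # pair_sum == 9: a later carry would pass straight through; keep going
    return None


sixes = repeat(6)  # digits of 0.666...
print(ones_digit_of_sum(chain([3, 3, 3, 2], repeat(3)), sixes, 1000))
print(ones_digit_of_sum(chain([3, 3, 3, 4], repeat(3)), sixes, 1000))
print(ones_digit_of_sum(repeat(3), sixes, 1000))
```

The first two calls terminate after four digit pairs; the third exhausts the cutoff, mirroring the 0.333... + 0.666... case where no finite prefix settles the digit.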
16 pages, 368 KB  
Article
A Physical Framework for Algorithmic Entropy
by Jeff Edmonds
Entropy 2026, 28(1), 61; https://doi.org/10.3390/e28010061 - 4 Jan 2026
Viewed by 711
Abstract
This paper does not aim to prove new mathematical theorems or claim a fundamental unification of physics and information, but rather to provide a new pedagogical framework for interpreting foundational results in algorithmic information theory. Our focus is on understanding the profound connection between entropy and Kolmogorov complexity. We achieve this by applying these concepts to a physical model. Our work is centered on the distinction, first articulated by Boltzmann, between observable low-complexity macrostates and unobservable high-complexity microstates. We re-examine the known relationships linking complexity and probability, as detailed in works like Li and Vitányi’s An Introduction to Kolmogorov Complexity and Its Applications. Our contribution is to explicitly identify the abstract complexity of a probability distribution K(ρ) with the concrete physical complexity of a macrostate K(M). Using this framework, we explore the “Not Alone” principle, which states that a high-complexity microstate must belong to a large cluster of peers sharing the same simple properties. We show how this result is a natural consequence of our physical framework, thus providing a clear intuitive model for understanding how algorithmic information imposes structural constraints on physical systems. We end by exploring concrete properties in physics, resolving a few apparent paradoxes, and revealing how these laws are the statistical consequences of simple rules.

Review

23 pages, 405 KB  
Review
Algorithmic Compression via Pretrained Neural Networks
by Tim Genewein, Jordi Grau-Moya, Li Kevin Wenliang, Laurent Orseau and Marcus Hutter
Entropy 2026, 28(6), 596; https://doi.org/10.3390/e28060596 - 27 May 2026
Viewed by 1795
Abstract
The success of large neural networks trained for sequential prediction via log-loss minimization over massive and diverse datasets has sparked debate regarding the fundamental limits of this paradigm. While these models are not explicitly programmed to perform planning and search, their behavior increasingly resembles complex reasoning and adaptive problem-solving. This paper reviews a series of theoretical and empirical works, aiming to bridge the gap between the practical success of LLMs and formal theories of computation and intelligence, namely algorithmic information theory and Universal Artificial Intelligence. Grounded in the framework of memory-based meta-learning, the main argument is that training sequence models to predict the next token across diverse tasks implicitly meta-trains them to perform algorithmic compression, thereby performing (amortized) Bayesian inference over the task in-context. Consequently, when pretrained on a sufficiently rich data distribution, the resulting neural networks behave as if compressing by inferring the generative algorithm producing the observed data. We discuss recent theoretical and empirical evidence demonstrating that this approach can approximate Solomonoff induction in the theoretical limit, match exact Bayesian inference on complex sources in practice, achieve strong compression on out-of-distribution data, and synthesize complex in-context algorithms like chessboard evaluations. As models become more capable and general, the theoretical understanding through the lens of algorithmic information theory, including hard theoretical limits and how far practical models are from them, becomes increasingly relevant. We thus conclude our paper by outlining a number of open research questions to further bridge the gap from well-understood theory to modern machine learning practice.

Other

8 pages, 250 KB  
Perspective
From Levin’s Universal Search to Policy-Guided Tree Search
by Ming Li
Entropy 2026, 28(4), 434; https://doi.org/10.3390/e28040434 - 13 Apr 2026
Viewed by 707
Abstract
Levin’s universal search embodies a striking principle: when a solution is efficiently verifiable, one can allocate search effort across candidate procedures according to a prior and obtain performance competitive (up to a constant) with the best procedure in the reference class. The result was first published in (Levin, 1972). The accompanying video and Levin’s paper (Levin 2023) describe the history of this seminal result from a first-person perspective. While Andrey Kolmogorov passed away in 1987, Kolmogorov’s last student, Leonid Levin, systematically developed the theory of Kolmogorov complexity, including universal search. This perspective revisits the conceptual core of Levin-style universal search and traces its relationship to Solomonoff induction, a universal Bayesian framework for prediction that mixes over all computable hypotheses. Like Solomonoff induction, which serves as a spiritual foundation of large language models (LLMs), Levin’s universal search has found important applications in AI. In this paper, we will follow one thread of this research on deterministic planning and reinforcement learning via policy-guided tree search.
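The allocation principle behind Levin-style universal search can be sketched in a few lines. This is a toy illustration under loud assumptions, not the method of the paper or policy-guided tree search: candidates are a hand-picked list of Python generators rather than an enumeration of all programs, each `yield` counts as one step, and the names `levin_search`, `trial_division`, `spin_forever`, `wrong_guess`, and the target `N` are hypothetical. In phase k, a candidate of description length l is restarted and given 2^(k - l) steps, so shorter candidates receive an exponentially larger share of the effort.

```python
from typing import Callable, Generator, List, Optional, Tuple

N = 91  # hypothetical target: find a nontrivial factor of N


def trial_division() -> Generator[None, None, int]:
    for d in range(2, N):
        yield  # one unit of work per candidate divisor
        if N % d == 0:
            return d
    return N


def spin_forever() -> Generator[None, None, int]:
    while True:  # a short candidate that never halts
        yield


def wrong_guess() -> Generator[None, None, int]:
    yield
    return 5  # halts quickly, but the verifier rejects it


def is_factor(f: int) -> bool:
    """Efficient verifier: accepts any nontrivial factor of N."""
    return 1 < f < N and N % f == 0


# (assumed description length in bits, generator factory)
Candidate = Tuple[int, Callable[[], Generator[None, None, int]]]
CANDIDATES: List[Candidate] = [(1, spin_forever), (2, wrong_guess), (3, trial_division)]


def levin_search(candidates: List[Candidate],
                 verify: Callable[[int], bool],
                 max_phase: int = 20) -> Optional[int]:
    for phase in range(1, max_phase + 1):
        for length, make in candidates:
            if length > phase:
                continue
            run = make()  # restart the candidate from scratch each phase
            try:
                for _ in range(2 ** (phase - length)):  # step budget
                    next(run)
            except StopIteration as stop:
                if verify(stop.value):
                    return stop.value
    return None


print(levin_search(CANDIDATES, is_factor))
```

The non-halting and wrong candidates cost only a bounded share of each phase, which is the intuition for the constant-factor overhead relative to the best candidate in the list.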