Algorithmic Compression via Pretrained Neural Networks
Abstract
1. Introduction
2. Background and Notation
2.1. Sequential Prediction and Compression
2.2. Bayesian Mixture Predictors
2.3. Kolmogorov Complexity and Solomonoff Induction
3. Theoretical Foundations: Meta-Learning and Bayesian Prediction
3.1. Meta-Learning and Bayesian Prediction
1. Samples a latent source or task from a task distribution;
2. Generates one or more task sequences from it;
3. Updates the model parameters to minimize log-loss over these sequences.
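
The three steps above can be sketched for the simplest case, where the Bayes-optimal predictor is known in closed form. The setup is an illustrative assumption and is not taken from the text. Each task is a coin whose bias θ is drawn uniformly from [0, 1]. The "network" is a hypothetical lookup table holding one predicted probability per context (n, k), where n is the number of bits seen so far and k is the number of ones among them. For such a table, the log-loss minimiser in each context is the empirical frequency of a 1, so the update step reduces to counting. The learned entries should therefore approach Laplace's rule (k + 1)/(n + 2), which is the Bayes-optimal prediction under a uniform prior.

```python
import random
from collections import defaultdict


def meta_train(num_tasks=100_000, seq_len=5, seed=0):
    """Hypothetical tabular meta-learner for Bernoulli tasks.

    Each entry is the predicted probability that the next bit is 1, given the
    context (n bits seen so far, k of them ones). For a tabular predictor the
    log-loss minimiser per context is the empirical frequency, so the
    'update' step is implemented by counting.
    """
    rng = random.Random(seed)
    ones = defaultdict(int)    # times a 1 followed context (n, k)
    visits = defaultdict(int)  # times context (n, k) occurred
    for _ in range(num_tasks):
        theta = rng.random()  # 1. sample a task: coin bias ~ Uniform(0, 1)
        seq = [int(rng.random() < theta) for _ in range(seq_len)]  # 2. generate a sequence
        k = 0
        for n, bit in enumerate(seq):  # 3. accumulate log-loss sufficient statistics
            visits[(n, k)] += 1
            ones[(n, k)] += bit
            k += bit
    return {ctx: ones[ctx] / visits[ctx] for ctx in visits}


def laplace(n, k):
    """Bayes-optimal P(next bit = 1 | k ones in n bits) under a uniform prior."""
    return (k + 1) / (n + 2)


table = meta_train()
for (n, k), p in sorted(table.items()):
    print(f"n={n} k={k}  meta-learned={p:.3f}  Laplace={laplace(n, k):.3f}")
```

In the setting of this section, the table would be replaced by a sequence model trained with stochastic gradient descent on the same log-loss. The table is used here only because its minimiser can be computed exactly.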
3.2. In-Context Learning as a Consequence of Bayes-Optimality
4. Empirical Verification
4.1. Binary i.i.d. Sources and Bandit Tasks
4.2. Piecewise Stationary Sources
4.3. Variable-Order Markov Sources
5. Towards Universal Prediction via Meta-Learning
6. Empirical Evidence at Scale
6.1. Language Modeling Is Compression
6.2. Compression Perspective on Scaling Laws
6.3. From Passive Compression to Amortized Planning
7. Limitations and the Gap to Agency
7.1. The Inference and Support Gap

7.2. Causal Delusions and the Knowing–Doing Gap
7.3. Self-AIXI
8. Open Questions and Future Directions
- Practical Bounds on Algorithmic Complexity and Approximation Gap: While we have theoretical bounds on the approximation gap (Section 5), the Kolmogorov complexity of real-world datasets is uncomputable. It can be upper-bounded by any lossless compressor, but tight bounds remain out of reach. Crucial steps toward grounding the theory in realistic settings and establishing practical benchmarks for universal prediction include developing practical Minimum Description Length (MDL) approximations [35] and verifiable information distances for large models [34]. Equally important is establishing tight bounds on the approximation gap under finite parameters, finite datasets, and imperfect optimization.
- Mechanistic Interpretability of Internalized Algorithms: Wenliang et al. [30] provided initial insights into the internal representations of in-context Bayesian inference, while Shinnick et al. [36] showed that Transformers trained on procedural data develop modular internal structures. Recent work on chess-playing neural networks, such as Leela Chess Zero, has revealed a form of learned look-ahead, with intermediate layers encoding future board states [38,39]. However, the full mechanisms by which models implement search-free algorithms—such as the Chess Transformer’s amortized value computation—remain largely unknown. Understanding these internal structures and the algorithmic limits implied by certain architectures remains an important open problem. Equally important is disentangling whether generalization on novel tasks stems from a sufficiently broad pretraining distribution (rendering a large set of tasks as “in-distribution”) or from architectural inductive biases enabling true out-of-distribution generalization.
- Scaling Laws for Algorithmic Priors: Current scaling laws characterize log-loss as a function of parameter count and dataset size. However, how does the algorithmic diversity of the dataset influence the learned prior? Investigating the sample complexity of learning universal predictors and establishing scaling laws with respect to task diversity is an important open question.
- Efficient Prompting of Universal Predictors: An interesting open theoretical question is whether an approximation to a universal predictor like Solomonoff induction can be efficiently steered. Genewein et al. [5] asked whether Solomonoff’s predictor can be computationally prompted using prefixes of (relatively) short length to guarantee optimality on arbitrary downstream target tasks. While (gradient-based) prompt-optimization techniques worked well in practice for amortized Bayesian predictors on simple algorithmic data sources, giving a theoretical answer to the question is an open problem.
- From Universal Prediction to Universal Agency: As discussed in Section 7, translating a universal predictor into an optimal agent requires overcoming causal confounding and characterizing the meta-distribution gap. Key challenges include developing meta-learning objectives that encourage causal models [50], amortized do-operations, or structural mechanisms for active decision-making [13].
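
The computable side of the first open question can be made concrete. Any lossless compressor C yields an upper bound K(x) ≤ |C(x)| + c, where the constant c accounts for describing the decompressor. In general, however, no procedure can certify how tight such a bound is. The sketch below uses Python's standard-library zlib as a stand-in compressor, purely for illustration. A learned predictor combined with arithmetic coding can play the same role.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of zlib's output: an upper bound on K(data), up to an
    additive constant for describing the decompressor itself."""
    return 8 * len(zlib.compress(data, level=9))


structured = b"01" * 5000  # highly regular, 10,000 bytes
# Low Kolmogorov complexity (a short seeded program produces it),
# but zlib cannot exploit that structure, so its bound here is loose.
noise = random.Random(0).randbytes(10_000)

for name, data in [("structured", structured), ("seeded noise", noise)]:
    print(f"{name}: {8 * len(data)} raw bits -> {compressed_bits(data)} bits")
```

The seeded pseudo-random bytes show how loose such bounds can be. They come from a few-line program and therefore have low Kolmogorov complexity. Yet zlib finds no structure in them and reports roughly their raw size.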
Additional Related Work
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Solomonoff, R.J. A formal theory of inductive inference. Part I. Inf. Control 1964, 7, 1–22.
2. Hutter, M. Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004.
3. Hutter, M.; Quarel, D.; Catt, E. An Introduction to Universal Artificial Intelligence; Chapman & Hall: London, UK, 2024.
4. Ortega, P.A.; Wang, J.X.; Rowland, M.; Genewein, T.; Kurth-Nelson, Z.; Pascanu, R.; Heess, N.; Veness, J.; Pritzel, A.; Sprechmann, P.; et al. Meta-learning of sequential strategies. arXiv 2019, arXiv:1905.03030.
5. Genewein, T.; Wenliang, L.K.; Grau-Moya, J.; Ruoss, A.; Orseau, L.; Hutter, M. Understanding Prompt Tuning and In-Context Learning via Meta-Learning. Adv. Neural Inf. Process. Syst. 2025, 38, 166910–166942.
6. Hutter, M. The Hutter Prize. Prize for Compressing Human Knowledge. 2006. Available online: http://prize.hutter1.net/ (accessed on 23 May 2026).
7. Grau-Moya, J.; Genewein, T.; Hutter, M.; Orseau, L.; Delétang, G.; Catt, E.; Ruoss, A.; Wenliang, L.K.; Mattern, C.; Aitchison, M.; et al. Learning Universal Predictors. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; PMLR: New York, NY, USA, 2024; pp. 16178–16205.
8. Genewein, T.; Delétang, G.; Ruoss, A.; Wenliang, L.K.; Catt, E.; Dutordoir, V.; Grau-Moya, J.; Orseau, L.; Hutter, M.; Veness, J. Memory-based meta-learning on non-stationary distributions. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: New York, NY, USA, 2023; pp. 11173–11195.
9. Delétang, G.; Ruoss, A.; Duquenne, P.; Catt, E.; Genewein, T.; Mattern, C.; Grau-Moya, J.; Wenliang, L.K.; Aitchison, M.; Orseau, L.; et al. Language Modeling Is Compression. In Proceedings of the Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, 7–11 May 2024.
10. Heurtel-Depeiges, D.; Ruoss, A.; Veness, J.; Genewein, T. Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data. In Proceedings of the Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, 13–19 July 2025.
11. Ruoss, A.; Delétang, G.; Medapati, S.; Grau-Moya, J.; Wenliang, L.K.; Catt, E.; Reid, J.; Lewis, C.A.; Veness, J.; Genewein, T. Amortized planning with large-scale transformers: A case study on chess. Adv. Neural Inf. Process. Syst. 2024, 37, 65765–65790.
12. Mikulik, V.; Delétang, G.; McGrath, T.; Genewein, T.; Martic, M.; Legg, S.; Ortega, P. Meta-trained agents implement Bayes-optimal agents. Adv. Neural Inf. Process. Syst. 2020, 33, 18691–18703.
13. Catt, E.; Grau-Moya, J.; Hutter, M.; Aitchison, M.; Genewein, T.; Delétang, G.; Wenliang, L.K.; Veness, J. Self-predictive universal AI. Adv. Neural Inf. Process. Syst. 2023, 36, 27181–27198.
14. Ortega, P.A.; Kunesch, M.; Delétang, G.; Genewein, T.; Grau-Moya, J.; Veness, J.; Buchli, J.; Degrave, J.; Piot, B.; Perolat, J.; et al. Shaking the foundations: Delusions in sequence models for interaction and control. arXiv 2021, arXiv:2110.10819.
15. Ruoss, A.; Pardo, F.; Chan, H.; Li, B.; Mnih, V.; Genewein, T. LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations. In Proceedings of the Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, 13–19 July 2025.
16. Li, M.; Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2008.
17. Rissanen, J.J. Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev. 1976, 20, 198–203.
18. Rissanen, J. Modeling by shortest data description. Automatica 1978, 14, 465–471.
19. Grünwald, P.D. The Minimum Description Length Principle; MIT Press: Cambridge, MA, USA, 2007.
20. Blier, L.; Ollivier, Y. The Description Length of Deep Learning Models. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
21. Jiang, Z.; Yang, M.Y.; Tsirlin, M.; Tang, R.; Dai, Y.; Lin, J. “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Association for Computational Linguistics: Stroudsburg, PA, USA, 2023; pp. 6810–6828.
22. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
23. von Oswald, J.; Niklasson, E.; Randazzo, E.; Sacramento, J.; Mordvintsev, A.; Zhmoginov, A.; Vladymyrov, M. Transformers learn in-context by gradient descent. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: New York, NY, USA, 2023; pp. 35151–35174.
24. Akyürek, E.; Schuurmans, D.; Andreas, J.; Ma, T.; Zhou, D. What learning algorithm is in-context learning? Investigations with linear models. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
25. von Oswald, J.; Schlegel, M.; Meulemans, A.; Kobayashi, S.; Niklasson, E.; Zucchet, N.; Scherrer, N.; Miller, N.; Sandler, M.; y Arcas, B.A.; et al. Uncovering mesa-optimization algorithms in Transformers. arXiv 2024.
26. Laskin, M.; Wang, L.; Oh, J.; Parisotto, E.; Spencer, S.; Steiber, R.; Strouse, D.; Hansen, S.S.; Filos, A.; Brooks, E.; et al. In-Context Reinforcement Learning with Algorithm Distillation. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
27. Lampinen, A.K.; Chan, S.C.Y.; Singh, A.K.; Shanahan, M. The broader spectrum of in-context learning. arXiv 2024, arXiv:2412.03782.
28. Falck, F.; Wang, Z.; Holmes, C.C. Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; PMLR: New York, NY, USA, 2024.
29. Lampinen, A.K.; Chaudhry, A.; Chan, S.C.Y.; Wild, C.; Wan, D.; Ku, A.; Bornschein, J.; Pascanu, R.; Shanahan, M.; McClelland, J.L. On the Generalization of Language Models from In-Context Learning and Finetuning: A Controlled Study. arXiv 2025, arXiv:2505.00661.
30. Wenliang, L.K.; Ruoss, A.; Grau-Moya, J.; Hutter, M.; Genewein, T. Why is prompting hard? Understanding prompts on binary sequence predictors. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain, 2–4 May 2026.
31. Veness, J.; White, M.; Bowling, M.; György, A. Partition Tree Weighting. arXiv 2012.
32. Willems, F.M.; Shtarkov, Y.M.; Tjalkens, T.J. The context-tree weighting method: Basic properties. IEEE Trans. Inf. Theory 1995, 41, 653–664.
33. Delétang, G.; Ruoss, A.; Grau-Moya, J.; Genewein, T.; Wenliang, L.K.; Catt, E.; Cundy, C.; Hutter, M.; Legg, S.; Veness, J.; et al. Neural Networks and the Chomsky Hierarchy. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
34. Shaw, P.; Cohan, J.; Eisenstein, J.; Toutanova, K. Bridging Kolmogorov Complexity and Deep Learning: Asymptotically Optimal Description Length Objectives for Transformers. In Proceedings of the International Conference on Learning Representations (ICLR), Rio de Janeiro, Brazil, 23–27 April 2026.
35. Bornschein, J.; Li, Y.; Hutter, M. Sequential Learning of Neural Networks for Prequential MDL. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023.
36. Shinnick, Z.; Jiang, L.; Saratchandran, H.; van den Hengel, A.; Teney, D. Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning. In Proceedings of the ICML 2025 Workshop on Methods and Opportunities at Small Scale, Vancouver, BC, Canada, 19 July 2025.
37. Li, K.; Hopkins, A.K.; Bau, D.; Viégas, F.; Pfister, H.; Wattenberg, M. Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
38. Jenner, E.; Kapur, S.; Georgiev, V.; Allen, C.; Emmons, S.; Russell, S. Evidence of Learned Look-Ahead in a Chess-Playing Neural Network. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37.
39. Cruz, D. Understanding the learned look-ahead behavior of chess neural networks. arXiv 2025, arXiv:2505.21552.
40. Monroe, D.; Chalmers, P.A. Mastering Chess with a Transformer Model. arXiv 2024, arXiv:2409.12272.
41. Deora, P.; Vasudeva, B.; Behnia, T.; Thrampoulidis, C. In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly. In Proceedings of the Conference on Language Modeling, Montreal, QC, Canada, 7–10 October 2025.
42. Elmoznino, E.; Marty, T.; Kasetty, T.; Gagnon, L.; Mittal, S.; Fathi, M.; Sridhar, D.; Lajoie, G. In-context learning and Occam’s razor. In Proceedings of the International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025.
43. Adaptive Agent Team; Bauer, J.; Baumli, K.; Baveja, S.; Behbahani, F.; Bhatt, A.; Bhoopchand, A.; Chang, M.; Clay, N.; Collister, A.; et al. Human-Timescale Adaptation in an Open-Ended Task Space. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; PMLR: New York, NY, USA, 2023.
44. Chan, S.C.Y.; Santoro, A.; Lampinen, A.K.; Wang, J.X.; Singh, A.K.; Richemond, P.H.; McClelland, J.L.; Hill, F. Data Distributional Properties Drive Emergent In-Context Learning in Transformers. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35.
45. Pearl, J. Causality: Models, Reasoning, and Inference; Cambridge University Press: Cambridge, UK, 2009.
46. de Haan, P.; Jayaraman, D.; Levine, S. Causal Confusion in Imitation Learning. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
47. Reed, S.; Zolna, K.; Parisotto, E.; Gomez Colmenarejo, S.; Novikov, A.; Barth-Maron, G.; Gimenez, M.; Sulsky, Y.; Kay, J.; Springenberg, J.T.; et al. A Generalist Agent. arXiv 2022, arXiv:2205.06175.
48. Paglieri, D.; Cupiał, B.; Coward, S.; Piterbarg, U.; Wolczyk, M.; Khan, A.; Pignatelli, E.; Kuciński, Ł.; Pinto, L.; Fergus, R.; et al. BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games. In Proceedings of the International Conference on Learning Representations, Singapore, 24–28 April 2025.
49. Ortega, P.A. Universal Artificial Intelligence as Imitation; Technical Report; Daios Technologies: London, UK, 2026; Available online: https://www.adaptiveagents.org/uiai (accessed on 23 May 2026).
50. Shao, D.; Kleine Buening, T.; Kwiatkowska, M. A Unifying Framework for Causal Imitation Learning with Hidden Confounders. In Proceedings of the ICLR 2025 Workshop on Spurious Correlation and Shortcut Learning, Singapore, 28 April 2025.
51. Xie, S.M.; Raghunathan, A.; Liang, P.; Ma, T. An Explanation of In-context Learning as Implicit Bayesian Inference. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
52. Garg, S.; Tsipras, D.; Liang, P.; Valiant, G. What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates, Inc.: Red Hook, NY, USA, 2022; Volume 35.
53. Kirsch, L.; Harrison, J.; Sohl-Dickstein, J.; Metz, L. General-Purpose In-Context Learning by Meta-Learning Transformers. arXiv 2022, arXiv:2212.04458.
54. Yadlowsky, S.; Doshi, L.; Tripuraneni, N. Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. arXiv 2023, arXiv:2311.00871.
55. Coda-Forno, J.; Binz, M.; Akata, Z.; Botvinick, M.; Wang, J.X.; Schulz, E. Meta-in-context learning in large language models. Adv. Neural Inf. Process. Syst. 2023, 36, 65189–65201.
56. Mirchandani, S.; Xia, F.; Florence, P.; Ichter, B.; Driess, D.; Arenas, M.G.; Rao, K.; Sadigh, D.; Zeng, A. Large Language Models as General Pattern Machines. In Proceedings of the Conference on Robot Learning, Atlanta, GA, USA, 6–9 November 2023; PMLR: New York, NY, USA, 2023.
57. Ravi, S.; Beatson, A. Amortized Bayesian Meta-Learning. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
58. Grant, E.; Finn, C.; Levine, S.; Darrell, T.; Griffiths, T.L. Recasting Gradient-Based Meta-Learning as Hierarchical Bayes. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
59. Müller, S.; Hollmann, N.; Pineda Arango, S.; Grabocka, J.; Hutter, F. Transformers can do Bayesian inference. In Proceedings of the International Conference on Learning Representations, Virtual, 25–29 April 2022.
60. Hollmann, N.; Müller, S.; Eggensperger, K.; Hutter, F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. In Proceedings of the International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
61. Radev, S.T.; Mertens, U.K.; Voss, A.; Ardizzone, L.; Köthe, U. BayesFlow: Learning complex stochastic models with invertible neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 1452–1466.
62. Reuter, A.; Rudner, T.G.J.; Fortuin, V.; Rügamer, D. Can Transformers Learn Full Bayesian Inference in Context? In Proceedings of the International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025; PMLR: New York, NY, USA, 2025.
63. Wan, J.; Mei, L. Large Language Models as Computable Approximations to Solomonoff Induction. arXiv 2025, arXiv:2505.15784.

| Reference | Setting | Key Results/Implications |
|---|---|---|
| Ortega et al. [4] | Meta-learning theory | Log-loss minimization yields amortized Bayesian predictors. |
| Mikulik et al. [12] | Empirical confirmation of theory | Meta-trained predictors and policies match exact Bayes-optimal solutions. |
| Genewein et al. [8] | Non-stationary sources (PTW) | LSTMs match exact Bayesian inference (PTW algorithm) on piecewise-stationary sources with unobserved switching points. |
| Grau-Moya et al. [7] | Variable-order Markov sources (VoMs) & Solomonoff | Transformers match Bayes-optimal performance on VoMs (CTW); meta-learning can theoretically reach universal prediction. |
| Delétang et al. [9] | Language modeling is compression | LLMs trained on text compress images/audio better than domain-specific compressors. |
| Genewein et al. [5] | In-context learning theory | In-context learning is a necessary feature of Bayesian predictors (meta-trained nets). |
| Ruoss et al. [11] | Amortized chess engine | Amortization of complex algorithm; emergent planning. |
| Catt et al. [13] | Self-predictive agent | Formal bridge from Solomonoff prediction to AIXI-optimal action by letting the predictor do the heavy lifting. |
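
Several entries in the table above, namely log-loss minimization and language modeling as compression, rest on the same identity. Under arithmetic coding, a sequential predictor that assigns probability p(x_t | x_<t) to each symbol compresses a sequence to about Σ_t −log2 p(x_t | x_<t) bits, plus a small constant overhead. The sketch below assumes a binary alphabet and uses the Laplace predictor as the model. This predictor is the Bayes mixture over coin biases under a uniform prior. By the chain rule, its ideal code length equals −log2 of the mixture probability of the whole sequence, which is log2((n + 1)·C(n, k)) for k ones in n bits.

```python
import math


def laplace_code_length(bits):
    """Ideal arithmetic-coding length (in bits) of a binary sequence under the
    adaptive Laplace predictor P(1 | k ones in n) = (k + 1) / (n + 2)."""
    total, k = 0.0, 0
    for n, b in enumerate(bits):
        p_one = (k + 1) / (n + 2)
        total += -math.log2(p_one if b else 1.0 - p_one)  # log-loss of this symbol
        k += b
    return total


seq = [1] * 90 + [0] * 10  # biased source, 100 raw bits
code_len = laplace_code_length(seq)
n, k = len(seq), sum(seq)
mixture_len = math.log2((n + 1) * math.comb(n, k))  # -log2 of the Bayes mixture probability
print(f"raw: {n} bits, coded: {code_len:.2f} bits, rate: {100 * code_len / n:.1f}%")
```

Using a stronger predictor, such as a pretrained sequence model, changes only the probabilities that enter the sum.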

| Compressor | enwik9, raw (%) | ImageNet, raw (%) | LibriSpeech, raw (%) | enwik9, adjusted (%) | ImageNet, adjusted (%) | LibriSpeech, adjusted (%) |
|---|---|---|---|---|---|---|
| gzip | 48.1 | 68.6 | 38.5 | 48.1 | 68.6 | 38.5 |
| LZMA2 | 50.0 | 62.4 | 38.2 | 50.0 | 62.4 | 38.2 |
| PNG | 80.6 | 61.7 | 37.6 | 80.6 | 61.7 | 37.6 |
| FLAC | 88.9 | 60.9 | 30.3 | 88.9 | 60.9 | 30.3 |
| Transformer 200K | 30.9 | 194.0 | 146.6 | 30.9 | 194.0 | 146.6 |
| Transformer 800K | 21.7 | 185.1 | 131.1 | 21.9 | 185.3 | 131.3 |
| Transformer 3.2M | 17.0 | 215.8 | 228.2 | 17.7 | 216.5 | 228.9 |
| Llama 2 (7B) | 8.9 | 53.4 | 23.1 | 1408.9 | 1453.4 | 1423.1 |
| Chinchilla 1B | 11.3 | 62.2 | 24.9 | 211.3 | 262.2 | 224.9 |
| Chinchilla 7B | 10.2 | 54.7 | 23.6 | 1410.2 | 1454.7 | 1423.6 |
| Chinchilla 70B | 8.3 | 48.0 | 21.0 | 14,008.3 | 14,048.0 | 14,021.0 |
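
The adjusted rates count the model's own parameters as part of the compressed output. This is why the largest models, which achieve the best raw rates, exceed 100% once adjusted. The sketch below makes two assumptions: 16-bit weights (2 bytes per parameter) and 1 GB (10^9 bytes) of evaluation data per modality. These assumptions reproduce the Llama 2 and Chinchilla rows above, and under them the adjustment is a single addition.

```python
def adjusted_rate(raw_rate_pct, num_params, bytes_per_param=2, data_bytes=10**9):
    """Compression rate (%) when the model's weights count toward the output.

    Defaults are illustrative assumptions: 16-bit weights and 1 GB of data.
    """
    model_pct = 100.0 * num_params * bytes_per_param / data_bytes
    return raw_rate_pct + model_pct


for name, raw, params in [
    ("Chinchilla 1B, enwik9", 11.3, 1e9),
    ("Chinchilla 70B, enwik9", 8.3, 70e9),
    ("Transformer 800K, enwik9", 21.7, 800e3),
]:
    print(f"{name}: raw {raw}% -> adjusted {adjusted_rate(raw, params):,.1f}%")
```

An adjusted rate above 100% means that transmitting the model together with its compressed output would take more space than sending the raw data.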

| Agent | Train | Search | Tournament Elo | Puzzle Acc. (%) |
|---|---|---|---|---|
| 9 M Transformer | SL | | | 88.9 |
| 136 M Transformer | SL | | | 94.5 |
| 270 M Transformer | SL | | | 95.4 |
| GPT-3.5-turbo-instruct | SSL | | — | 66.5 |
| AlphaZero (policy only) | RL | | | 56.1 |
| AlphaZero (value only) | RL | | | 82.0 |
| AlphaZero (400 MCTS sim.) | RL | ✓ | | 95.6 |
| Lc0 (policy only) | RL | | | 88.6 |
| Lc0 (value only) | RL | | | 95.9 |
| Lc0 (400 MCTS sim.) | RL | ✓ | | 99.6 |
| Stockfish 16 (50 ms/move) | SL | ✓ | | 99.8 |
| Stockfish 16 (1.5 s/board) | SL | ✓ | | 100.0 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Genewein, T.; Grau-Moya, J.; Wenliang, L.K.; Orseau, L.; Hutter, M. Algorithmic Compression via Pretrained Neural Networks. Entropy 2026, 28, 596. https://doi.org/10.3390/e28060596

