# A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity


## Abstract


## 1. Introduction

#### 1.1. Algorithmic Complexity

#### 1.2. Algorithmic Probability

#### 1.3. Convergence Rate and the Invariance Theorem

## 2. The Use and Misuse of Lossless Compression

#### 2.1. Building on Block Entropy

## 3. The Coding Theorem Method (CTM)

## 4. The Block Decomposition Method (BDM)

#### 4.1. l-Overlapping String Block Decomposition

#### 4.2. 2- and w-Dimensional Complexity

#### 4.3. BDM Upper and Lower Absolute Bounds

**Proposition 1.**

**Proof.**

**Corollary 1.**

**Proof.**

## 5. Dealing with Object Boundaries

#### 5.1. Recursive BDM

#### 5.2. Periodic Boundary Conditions

## 6. BDM versus Shannon Entropy

- CTM is uncomputable, but for the decidable cases covered it runs in exponential time.
- Non-overlapping string BDM and LD run in linear time, and in ${n}^{d}$ polynomial time for d-dimensional objects.
- Overlapping string BDM runs in $ns$ time, with s the overlapping offset.
- Full overlapping with $s=1$ runs in ${n}^{2}$ polynomial time as a function of the number of overlapping elements n.
- Smooth BDM runs in linear time.
- Mutual Information BDM runs in exponential time for strings and in d-fold exponential time for dimension d.
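To make the non-overlapping scheme concrete: BDM sums, over the unique base blocks of a decomposition, each block's CTM value plus the base-2 logarithm of its multiplicity. A minimal Python sketch, assuming a precomputed CTM lookup table (the `ctm` dictionary below contains made-up illustrative values, not real CTM estimates):

```python
from collections import Counter
from math import log2

def bdm_1d(s, ctm, block_size=4):
    """Non-overlapping 1D BDM: decompose s into blocks of block_size,
    then sum CTM(block) + log2(multiplicity) over the unique blocks."""
    blocks = [s[i:i + block_size] for i in range(0, len(s), block_size)]
    blocks = [b for b in blocks if len(b) == block_size]  # drop residue
    counts = Counter(blocks)
    return sum(ctm[b] + log2(n) for b, n in counts.items())

# Toy CTM table (illustrative values only; real CTM values come from
# exhaustive enumeration of small Turing machines).
ctm = {"0000": 3.0, "1111": 3.0, "0101": 4.2, "0110": 5.1}

print(bdm_1d("0000" * 3, ctm))      # one unique block with multiplicity 3
print(bdm_1d("000011110101", ctm))  # three distinct blocks, multiplicity 1 each
```

Note that a repeated block contributes only one CTM term plus a logarithmic multiplicity term, which is what makes non-overlapping BDM invariant to block permutations (cf. Figure 4).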

## 7. Error Estimations

- Trimming boundary condition: the residual regions R, L, T and B are ignored, i.e., $BDM(X)=BDM(X,R,L,T,B)$, with the undesired effect of a general underestimation for objects whose dimensions are not multiples of d. The error introduced (see Figure 8) is bounded between 0 (for matrices divisible by d) and ${k}^{2}/exp(k)$, where k is the size of X. The error is thus convergent ($exp(k)$ grows much faster than ${k}^{2}$) and can therefore be corrected, and it is negligible as a function of array size, as shown in Figure 8.
- Cyclic boundary condition (Figure 5 bottom): the matrix is mapped onto the surface of a torus so that there are no boundaries, and the application of the overlapping BDM version takes every part of the object into consideration. This produces an overestimation of the complexity of the object, but it will for the most part respect the ranking order of the estimations if the same overlapping values are used, with maximum overestimation $(d-1)\times \max\{CTM(b)\mid b\in X\}$, where $\max\{CTM(b)\mid b\in X\}$ is the maximum CTM value among all base matrices b in X after the decomposition of X.
- Full overlapping recursive decomposition: X is decomposed into ${(d-1)}^{2}$ base matrices of size $d\times d$ by traversing X with a sliding square block of size d. This produces a polynomial overestimation in the size of the object of up to ${(d-1)}^{2}$, but if consistently applied it will for the most part preserve ranking.
- Adding low-complexity rows and columns (we call this “add col”): if a matrix of interest is not a multiple of the base matrix size, we add rows and columns until completion to the next multiple of the base matrix, and then correct the final result by subtracting the borders that were artificially added.
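The trimming and cyclic strategies above can be sketched as decomposition routines. A minimal Python sketch (only the decomposition step is shown; the complexity lookup over the resulting blocks is omitted, and the function names are ours):

```python
def blocks_trim(M, d):
    """Trimming boundary: ignore residual rows/columns that do not fill
    a complete d-by-d block (underestimates for sizes not divisible by d)."""
    n, m = len(M), len(M[0])
    return [tuple(tuple(M[i + r][j:j + d]) for r in range(d))
            for i in range(0, n - d + 1, d)
            for j in range(0, m - d + 1, d)]

def blocks_cyclic(M, d):
    """Cyclic boundary: wrap the matrix on a torus so every entry is
    covered; the overlapping blocks overestimate complexity but mostly
    preserve ranking."""
    n, m = len(M), len(M[0])
    return [tuple(tuple(M[(i + r) % n][(j + c) % m] for c in range(d))
                  for r in range(d))
            for i in range(n) for j in range(m)]

M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(len(blocks_trim(M, 2)))    # 1 block: the third row/column is trimmed
print(len(blocks_cyclic(M, 2)))  # 9 blocks: one per starting position
```

The 3×3 example shows the trade-off directly: trimming discards a third of the matrix, while the cyclic version covers every entry at the price of counting overlapping blocks.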

#### 7.1. BDM Worst-Case Convergence towards Shannon Entropy

**Proposition 2.**

**Proof.**

## 8. Normalized BDM
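Figure 10 describes NBDM as assigning the value 1 to any object built entirely out of the highest-CTM base matrices. A hypothetical min-max sketch consistent with that description (the function signature, and the omission of BDM's log-multiplicity terms, are our illustrative assumptions, not the paper's exact definition):

```python
def nbdm(bdm_value, n_blocks, ctm_min, ctm_max):
    """Hypothetical min-max normalization of a BDM value into [0, 1]:
    0 for an object built only from the simplest base blocks, 1 for one
    built only from the most complex ones (BDM's log-multiplicity terms
    are ignored in this sketch)."""
    lo, hi = n_blocks * ctm_min, n_blocks * ctm_max
    return (bdm_value - lo) / (hi - lo)

# A toy object made of 4 base blocks whose CTM values span [3.0, 5.1]:
print(nbdm(4 * 5.1, n_blocks=4, ctm_min=3.0, ctm_max=5.1))  # maximal: 1.0
print(nbdm(4 * 3.0, n_blocks=4, ctm_min=3.0, ctm_max=5.1))  # minimal: 0.0
```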

## 9. CTM to BDM Transition

#### 9.1. Smooth BDM (and “Add Col”)

#### 9.2. Weighted Smooth BDM with Mutual Information

## 10. Testing BDM and Boundary Condition Strategies

`GraphData[]` repository. Graphs and their dual graphs were found by BDM to have estimated algorithmic complexities close to each other. While entropy and entropy rate do not perform well in any test compared to the other measures, compression retrieves values for cospectral graphs similar to BDM's, but is outperformed by BDM on the duality test. The best BDM version for duals differed from that for cospectrals: for the duality test, the smooth, fully overlapping version of BDM outperforms all others, while for cospectrality, overlapping recursive BDM outperforms all others. In [18], we showed that BDM behaves in agreement with the theory with respect to the algorithmic complexity of graphs and the size of the automorphism group to which they belong. This is because the algorithmic complexity $K(G)$ of G is effectively a tight upper bound on $K(Aut(G))$.

## 11. Conclusions

- CTM deals with all bit strings of length 1–12 (and, for some, 20–30 bits).
- BDM deals with 12 bits to hundreds of bits (with a cumulative error that grows with the length of the strings, if not applied in conjunction with CTM). The worst case occurs when substrings share information content with other decomposed substrings and BDM simply keeps adding their K values individually.
- CTM + BDM deals with any string length, but is computationally extremely expensive.
- Lossless compression deals with no fewer than 100 bits and is unstable up to about 1000 bits.

## Author Contributions

## Funding

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning
---|---
BDM | Block Decomposition Method
CTM | Coding Theorem Method
TM | Turing machine
K | Kolmogorov complexity

## Appendix A. Entropy and Block Entropy v CTM

**Figure A1.** The randomness of the digits of $\pi $ as measured by Shannon entropy, Block entropy and CTM. Strengthening the claim made in Figure 6, here we show the trend of the averages of Entropy and Block Entropy moving towards 1, while CTM's average remains the same with slightly reduced variance. The stronger the colour, the more digits taken into consideration. The direction of Block entropy is the clearest: first from a sample of 100 segments of length 12 bits from the first 1000 decimal digits of $\pi $ converted to binary (light orange), followed by a second run of 1000 segments of length 12 bits from the first 1 million decimal digits of $\pi $. When running CTM over a longer period of time, the invariance theorem guarantees convergence to 0.

#### Appendix A.1. Duality and Cospectral Graph Proofs and Test Complement

**Theorem A1.**

**Proof.**

**Theorem A2.**

**Corollary A1.**

**Proof.**

**NP**) that produces all relabellings (the simplest one is brute-force permutation) and can verify graph isomorphism in time class **P** [18].) Consider a computer program p that produces $Aut(G)$ for any G. With this program we can construct $Aut(G)$ from any graph ${G}^{\prime}\in Aut(G)$, and ${G}^{\prime}$ from $Aut(G)$ and the corresponding label n. Therefore $K({G}^{\prime})\le \left|p\right|+K(Aut(G))+\log(n)+O(1)$ and $K(Aut(G))\le \left|p\right|+K({G}^{\prime})$. ☐

**Theorem A3.**

**Proof.**

**Table A1.** List of strings with high entropy and high Block entropy but low algorithmic randomness, detected and sorted from lowest to highest values by CTM.

101010010101 | 010101101010 | 101111000010 | 010000111101 | 111111000000 |

000000111111 | 100101011010 | 011010100101 | 101100110010 | 010011001101 |

111100000011 | 110000001111 | 001111110000 | 000011111100 | 111110100000 |

000001011111 | 111101000001 | 111100000101 | 101000001111 | 100000101111 |

011111010000 | 010111110000 | 000011111010 | 000010111110 | 110111000100 |

001000111011 | 110111000001 | 100000111011 | 011111000100 | 001000111110 |

110100010011 | 110010001011 | 001101110100 | 001011101100 | 111110000010 |

101111100000 | 010000011111 | 000001111101 | 100000111110 | 011111000001 |

110101000011 | 110000101011 | 001111010100 | 001010111100 | 111100101000 |

111010110000 | 000101001111 | 000011010111 | 111100110000 | 000011001111 |

110000111010 | 101000111100 | 010111000011 | 001111000101 | 111100001010 |

101011110000 | 010100001111 | 000011110101 | 111011000010 | 101111001000 |

010000110111 | 000100111101 | 111000001011 | 110100000111 | 110011100010 |

101110001100 | 100011001110 | 011100110001 | 010001110011 | 001100011101 |

001011111000 | 000111110100 | 111010000011 | 110000010111 | 001111101000 |

000101111100 | 110011010001 | 100010110011 | 011101001100 | 001100101110 |

110101001100 | 110011010100 | 001100101011 | 001010110011 | 111000110010 |

110010100011 | 110001010011 | 101100111000 | 010011000111 | 001110101100 |

001101011100 | 000111001101 | 101100001110 | 100011110010 | 011100001101 |

010011110001 | 111000100011 | 110001000111 | 001110111000 | 000111011100 |

110000011101 | 101110000011 | 010001111100 | 001111100010 | 111101010000 |

000010101111 | 111010001100 | 110011101000 | 001100010111 | 000101110011 |

111000101100 | 110010111000 | 001101000111 | 000111010011 | 111011001000 |

000100110111 |

**Figure A2.** Scatterplots comparing the various BDM versions tested on dual and cospectral graphs that theoretically have the same algorithmic complexity up to a (small) constant. x-axis values for each top-row plot are sorted by BDM for one of the dual and for the cospectral graph series. Bottom rows: on top of each corresponding scatterplot are the Spearman $\rho $ values.

**Figure A3.** Scatterplots comparing other measures against the best BDM performance. x-axis values for each top-row plot are sorted by BDM for one of the dual and for the cospectral graph series. Bottom rows: on top of each corresponding scatterplot are the Spearman $\rho $ values.

#### Appendix A.2. The Online Algorithmic Complexity Calculator and Language Implementations

`acss` [45] R package.

**Figure A4.** The Online Algorithmic Complexity Calculator available at http://www.complexitycalculator.com. Full code for the R Shiny web server used is available at [46].

**Table A2.** Computer programs in different languages implementing various BDM versions. We have shown that all implementations agree with each other to varying degrees, the only differences having to do with under- or over-estimated values and with time complexity and scalability properties. They are extremely robust, thereby establishing that the use of the most basic versions (1D n-o, 2D n-o) is justified in most cases. “WL” stands for Wolfram Language, the language behind e.g., the Mathematica platform, “Online” for the online calculator, “Cyc” for Cyclic, “Norm” for normalized, “Rec” for recursive, “Smo” for “Smooth”, “N-o” for Non-overlapping and “addcol” for the method that adds rows and columns of lowest complexity to the borders up to the base string/array/tensor size. If not stated otherwise, a program supports overlapping. All programs are available at https://www.algorithmicdynamics.net/software.html.

Lang | 1D n-o | 1D 2D n-o | 2D Rec | 2D Cyc | 2D Smo | 2D Norm | 1-D LD | Addcol
---|---|---|---|---|---|---|---|---
Online | ✓ | ✓ | ✓ | × | × | × | × | ×
WL | ✓ | ✓ | ✓ | ✓ | ✓ | × | ✓ | ×
R | ✓ | ✓ | ✓ | ✓ | × | × | × | ×
Matlab | ✓ | × | ✓ | ✓ | × | ✓ | × | ✓
Haskell | ✓ | ✓ | ✓ | ✓ | × | × | × | ×
Perl | × | × | ✓ | × | × | × | × | ×
Python | × | × | ✓ | × | × | × | × | ×
Pascal | × | × | ✓ | × | × | × | × | ×
C++ | × | × | ✓ | × | × | × | × | ×

## References

- Zenil, H.; Kiani, N.A.; Tegnér, J. Low-algorithmic-complexity entropy-deceiving graphs. Phys. Rev. E **2017**, 96, 012308.
- Daley, R.P. An Example of Information and Computation Trade-Off. J. ACM **1973**, 20, 687–695.
- Levin, L.A. Universal sequential search problems. Probl. Inf. Transm. **1973**, 9, 265–266.
- Schmidhuber, J. The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia, 8–10 July 2002; Kivinen, J., Sloan, R.H., Eds.; Springer: Berlin, Germany, 2002; pp. 216–228.
- Li, M.; Vitányi, P. An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed.; Springer: Heidelberg, Germany, 2009.
- Cilibrasi, R.; Vitányi, P.M. Clustering by compression. IEEE Trans. Inf. Theory **2005**, 51, 1523–1545.
- Zenil, H.; Badillo, L.; Hernández-Orozco, S.; Hernández-Quiroz, F. Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability. Int. J. Parallel Emerg. Distrib. Syst. **2018**, 1–20.
- Zenil, H. Algorithmic Data Analytics, Small Data Matters and Correlation versus Causation. In Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data (Computability of the World? Philosophy and Science in the Age of Big Data); Ott, M., Pietsch, W., Wernecke, J., Eds.; Springer: Berlin, Germany, 2017; pp. 453–475.
- Zenil, H.; Soler-Toscano, F.; Delahaye, J.P.; Gauvrit, N. Two-dimensional Kolmogorov complexity and an empirical validation of the Coding Theorem Method by compressibility. PeerJ Comput. Sci. **2015**, 1, e23.
- Gauvrit, N.; Soler-Toscano, F.; Zenil, H. Natural scene statistics mediate the perception of image complexity. Vis. Cognit. **2014**, 22, 1084–1091.
- Gauvrit, N.; Singmann, H.; Soler-Toscano, F.; Zenil, H. Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method. Behav. Res. Methods **2016**, 48, 314–329.
- Kempe, V.; Gauvrit, N.; Forsyth, D. Structure emerges faster during cultural transmission in children than in adults. Cognition **2015**, 136, 247–254.
- Emmert-Streib, F.; Dehmer, M. Exploring statistical and population aspects of network complexity. PLoS ONE **2012**, 7, e34523.
- Dehmer, M.M. A novel method for measuring the structural information content of networks. Cybern. Syst. **2008**, 39, 825–842.
- Dehmer, M.M.; Barbarini, N.N.; Varmuza, K.K.; Graber, A.A. Novel topological descriptors for analyzing biological networks. BMC Struct. Biol. **2010**, 10, 18.
- Mowshowitz, A.; Dehmer, M.M. Entropy and the complexity of graphs revisited. Entropy **2012**, 14, 559–570.
- Holzinger, A.; Ofner, B.; Stocker, C.; Valdez, A.C.; Schaar, A.K.; Ziefle, M.; Dehmer, M. On graph entropy measures for knowledge discovery from publication network data. In International Conference on Availability, Reliability, and Security; Springer: Berlin/Heidelberg, Germany, 2013; pp. 354–362.
- Zenil, H.; Soler-Toscano, F.; Dingle, K.; Louis, A. Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks. Physica A **2014**, 404, 341–358.
- Zenil, H.; Kiani, N.A.; Tegnér, J. Methods of information theory and algorithmic complexity for network biology. In Seminars in Cell & Developmental Biology; Academic Press: Cambridge, MA, USA, 2016; Volume 51, pp. 32–43.
- Levin, L.A. Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Probl. Pereda. Inf. **1974**, 10, 30–35.
- Chaitin, G.J. On the length of programs for computing finite binary sequences. J. ACM **1966**, 13, 547–569.
- Kolmogorov, A.N. Three approaches to the quantitative definition of information. Probl. Inf. Transm. **1965**, 1, 1–7.
- Calude, C.S.; Salomaa, K.; Roblot, T.K. Finite state complexity. Theor. Comput. Sci. **2011**, 412, 5668–5677.
- Downey, R.G.; Hirschfeldt, D.R. Algorithmic Randomness and Complexity; Springer: Berlin, Germany, 2010.
- Martin-Löf, P. The definition of random sequences. Inf. Control **1966**, 9, 602–619.
- Solomonoff, R.J. A formal theory of inductive inference. Part I. Inf. Control **1964**, 7, 1–22.
- Calude, C.S. Information and Randomness: An Algorithmic Perspective; Springer: Berlin, Germany, 2002.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- Kirchherr, W.; Li, M.; Vitányi, P. The Miraculous Universal Distribution. Math. Intell. **1997**, 19, 7–15.
- Solomonoff, R.J. Complexity-Based Induction Systems: Comparisons and Convergence Theorems. IEEE Trans. Inf. Theory **1978**, 24, 422–432.
- Solomonoff, R.J. The Application of Algorithmic Probability to Problems in Artificial Intelligence. In Uncertainty in Artificial Intelligence; Kanal, L.N., Lemmer, J.F., Eds.; Elsevier: New York, NY, USA, 1986; pp. 473–491.
- Solomonoff, R.J. A System for Incremental Learning Based on Algorithmic Probability. In Proceedings of the Sixth Israeli Conference on Artificial Intelligence, Computer Vision and Pattern Recognition, Tel Aviv, Israel, 26–27 December 1989; pp. 515–527.
- Soler-Toscano, F.; Zenil, H.; Delahaye, J.P.; Gauvrit, N. Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLoS ONE **2014**, 9, e96223.
- Ziv, J.; Lempel, A. Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory **1978**, 24, 530–536.
- Delahaye, J.P.; Zenil, H. Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness. Appl. Math. Comput. **2012**, 219, 63–77.
- Zenil, H. Une Approche Expérimentale à la Théorie Algorithmique de la Complexité. Ph.D. Thesis, Université de Paris, Paris, France, 2013. (In French)
- Calude, C.S.; Stay, M.A. Most programs stop quickly or never halt. Adv. Appl. Math. **2008**, 40, 295–308.
- Rado, T. On non-computable functions. Bell Syst. Tech. J. **1962**, 41, 877–884.
- Zenil, H. From Computer Runtimes to the Length of Proofs: With an Algorithmic Probabilistic Application to Waiting Times in Automatic Theorem Proving. In Computation, Physics and Beyond: International Workshop on Theoretical Computer Science; Dinneen, M.J., Khousainov, B., Nies, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 223–240.
- Brady, A.H. The determination of the value of Rado’s noncomputable function Σ(k) for four-state Turing machines. Math. Comput. **1983**, 40, 647–665.
- Soler-Toscano, F.; Zenil, H.; Delahaye, J.P.; Gauvrit, N. Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures. Computability **2013**, 2, 125–140.
- Langton, C.G. Studying artificial life with cellular automata. Physica D **1986**, 22, 120–149.
- Morse, M.; Hedlund, G.A. Unending Chess, Symbolic Dynamics, and a Problem in Semigroups. Duke Math. J. **1944**, 11, 1–7.
- Bailey, D.H.; Borwein, P.B.; Plouffe, S. On the Rapid Computation of Various Polylogarithmic Constants. Math. Comput. **1997**, 66, 903–913.
- Gauvrit, N.; Singmann, H.; Soler-Toscano, F.; Zenil, H. Acss: Algorithmic Complexity for Short Strings, Package at the Comprehensive R Archive Network. Available online: https://cran.r-project.org/web/packages/acss/ (accessed on 13 June 2018).
- Rueda-Toicen, A.; Singmann, H. The Online Algorithmic Complexity Calculator, R Shiny Code Repository. Available online: https://github.com/andandandand/OACC (accessed on 13 June 2018).
- Soler-Toscano, F.; Zenil, H. Kolmogorov Complexity of 3 × 3 and 4 × 4 Squares, on the Wolfram Demonstrations Project. Available online: http://demonstrations.wolfram.com/KolmogorovComplexityOf33And44Squares/ (accessed on 13 June 2018).

**Figure 1.** Hypothetical behaviour of (non-)regular convergence rates of the constant involved in the invariance theorem. The invariance theorem guarantees that complexity values for a string s measured with different reference UTMs ${U}_{1}$ and ${U}_{2}$ will diverge by at most a constant c (the length of a program translating between ${U}_{1}$ and ${U}_{2}$) independent of s, yet it does not say how fast or in what way the convergence happens, particularly at the beginning. The invariance theorem only tells us that at the limit the curve will converge to a small constant value c; it tells us nothing about the rate of convergence or about transitional behaviour.

**Figure 2.** The best version of Shannon entropy can be rewritten as a function of variable block length, where the minimum value best captures the (possible) periodicity of a string, here illustrated with three strings of length 12: regular, periodic and random-looking. Because blocks larger than $n/2$ would result in only one block, and therefore entropy equal to 0, the largest possible block is $n/2$. The normalized version (bottom) divides the entropy value for each block size by the largest possible number of blocks for that size and alphabet (here binary).
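The procedure described in the caption can be sketched as follows: compute Shannon entropy over non-overlapping blocks for each block length up to $n/2$, and inspect the minimum (a minimal sketch; the normalized variant is omitted):

```python
from collections import Counter
from math import log2

def block_entropy(s, b):
    """Shannon entropy over non-overlapping blocks of length b."""
    blocks = [s[i:i + b] for i in range(0, len(s) - b + 1, b)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((n / total) * log2(n / total) for n in counts.values())

s = "010101010101"  # periodic string of length 12
# One value per block size up to n/2; the minimum exposes the period.
for b in range(1, len(s) // 2 + 1):
    print(b, round(block_entropy(s, b), 3))
```

For this periodic string, single-symbol entropy is maximal (1.0) but the entropy at block size 2 drops to 0, which is the periodicity the minimum over block sizes is designed to capture.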

**Figure 3.** (**A**) Observed data, a sequence of successive positive natural numbers. (**B**) The transition table of the Turing machine found by running all possible small Turing machines. (**C**) The same transition table in visual form (states are arrows). (**D**) The space-time evolution of the Turing machine starting from an empty tape. (**E**) Space-time evolution of the Turing machine implementing a binary counter, taking as halting criterion the leftmost position of the original Turing machine head as depicted in (**C**). (**F**) This small computer program, which our CTM and BDM methods find (cf. next Section), means that the sequence in (**A**) is not algorithmically random: the program represents a succinct generative causal model for the sequence at any arbitrary length, whereas Shannon Entropy (in the face of no other knowledge about the source) would have assigned it maximal randomness among all strings of the same length, despite its highly structured, algorithmically non-random nature. Entropy alone, equipped only to spot statistical regularities when there is no access to probability distributions, cannot find this kind of generative model. This illustrates how algorithmic complexity and entropy may diverge in practice.

**Figure 4.** Non-overlapping BDM calculations are invariant to block permutations (reshuffling base strings and matrices), even when these permutations may have different complexities due to the reorganization of the blocks, which can produce statistical or algorithmic patterns. For example, starting from a string of size 24 (**top**) or an array of size $8\times 8$ (**bottom**), with decomposition length $l=8$ for strings and decomposition block size $l=4\times 4$ for the array, all 6 permutations of the string and all 6 permutations of the array have the same BDM value regardless of the shuffling procedure.

**Figure 5.** One way to deal with the decomposition of n-dimensional tensors is to embed them in an n-dimensional torus ($n=2$ in the case depicted here), making the borders cyclic or periodic by joining the borders of the object. Depicted here are three examples of graph canonical adjacency matrices embedded in a 2-dimensional torus that preserves the object complexity on the surface: a complete graph, a cycle graph and an Erdös-Rényi graph with edge density 0.5, all with 20 nodes and free of self-loops. Avoiding borders has the desired effect of producing no residual matrices after the block decomposition with overlapping.

**Figure 6.** (**A**) Telling $\pi $ and the Thue-Morse sequence apart from truly (algorithmically) random sequences. CTM assigns significantly lower randomness (**B**,**D**–**F**) to known low-algorithmic-complexity objects. (**B**) If absolutely Borel normal (as strongly suspected and statistically demonstrated to any degree of confidence), $\pi $’s entropy and block entropy asymptotically approach 1 while, by the invariance theorem of algorithmic complexity, CTM asymptotically approaches 0. Smooth transitions between CTM and BDM are also shown (**C**,**D**) as a function of string complexity. Other smooth transition functions of BDM are explored and introduced in Section 9.1.

**Figure 7.** Strings that are assigned lower randomness than that estimated by entropy. **Top left**: Comparison between values of entropy, compression (`Compress[]`) and BDM over a sample of 100 strings of length 10,000 generated from a binary random variable following a Bernoulli distribution, normalized by maximal complexity values. Entropy follows the Bernoulli distribution and, unlike compression, which follows entropy, BDM values produce clear convex-shaped gaps on each side, assigning lower complexity to some strings than both entropy and compression do.

**Top right**: The results are confirmed using a popular lossless compression algorithm, BZip2 (and also confirmed, though not reported, with LZMA), on 100 random strings of 100 bits each (BZip2 is slower than Compress but achieves greater compression).

**Bottom left**: The $CT{M}_{low}(s)-{H}_{high}(s)$ gap between near-maximal entropy and low algorithmic complexity grows and is consistent across different string lengths, here from 8 to 12 bits. This gap is the one exploited by BDM and carried over to longer strings, which gives it the algorithmic edge over entropy and compression.

**Bottom right**: When strings are sorted by CTM, one notices that BZip2 collapses most strings to minimal compressibility. Over all ${2}^{12}=4096$ possible binary strings of length 12, entropy only produces 6 different entropy values, but CTM is much more fine-grained, and this is extended to the longer strings by BDM, which succeeds in identifying strings of lower algorithmic complexity that have near-maximal entropy and therefore no statistical regularities. Examples of such strings are in Section 6.

**Figure 8.** Error rate for 2-dimensional arrays. With no loss of generality, the error rate for n-dimensional tensors ${lim}_{k\to \infty}\frac{{k}^{n}}{{n}^{k}}=0$ is convergent and thus negligible, even for the discontinuities disregarded in this plot, which are introduced by some BDM versions, such as non-overlapping blocks, and discontinuities related to the trimming boundary condition.
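The caption's bound $\frac{k^{n}}{n^{k}}$ can be checked numerically; a quick sketch for the 2-dimensional case ($n=2$), showing how fast the boundary error vanishes as the object size $k$ grows:

```python
# Numeric check that the boundary-error bound k^n / n^k vanishes as the
# object size k grows (here the 2-dimensional case, n = 2).
def error_bound(k, n=2):
    return k ** n / n ** k

for k in [8, 16, 32, 64]:
    print(k, error_bound(k))  # strictly decreasing towards 0
```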

**Figure 9.** Box plot showing the error introduced by BDM, quantified by CTM. The sigmoid appearance comes from the fact that we have exact CTM values for bitstrings up to length 12, so $BDM(12)=CTM(12)$, but the slope of the curve indicates the error when assuming that BDM only has access to $CTM(i)$ for $i<12$ versus the actual $CTM(12)$. This means that the error grows linearly as a function of CTM and of the string length, and accuracy degrades smoothly and slowly towards entropy if CTM is not updated.

**Figure 10.** NBDM assigns the maximum value 1 to any base matrix with the highest CTM, or to any matrix constructed out of such base matrices. In this case, the 4 base matrices on the left are those with the highest CTM in the space of all base matrices of the same size, while the matrix on the right is assigned the highest value because it is built out of the maximum-complexity base matrices.

**Figure 11.** Spearman correlation coefficients ($\rho $) between CTM and BDM of all possible block sizes and overlap lengths for 12-bit strings, compared with the correlation between CTM and Shannon entropy, and the correlation between CTM and compression length (shown at the rightmost edge of the plot) in blue. $\rho $ coefficients for the 2048 strings below and above the median CTM value are shown in green and orange, respectively. BDM block size and overlap increase to the left. Compression length was obtained using Mathematica’s `Compress[]` function. All values were normalized as described in Section 8.

**Table 1.** Calculated empirical distributions from rulespace $(t,k)$. Letter codes: F full space, S sample, $R(t,k)$ reduced enumeration. Times are given in seconds (s), hours (h) and days (d).

(t,k) | Calculation | Number of Machines | Time
---|---|---|---
(2,2) | F (6 steps) | $\lvert R(2,2)\rvert = 2000$ | 0.01 s
(3,2) | F (21) | $\lvert R(3,2)\rvert = \mathrm{2,151,296}$ | 8 s
(4,2) | F (107) | $\lvert R(4,2)\rvert = \mathrm{3,673,320,192}$ | 4 h
(4,2)${}_{2D}$ | ${F}_{2D}$ (1500) | $\lvert R(4,2)_{2D}\rvert = \mathrm{315,140,100,864}$ | 252 d
(4,4) | S (2000) | $334\times {10}^{9}$ | 62 d
(4,5) | S (2000) | $214\times {10}^{9}$ | 44 d
(4,6) | S (2000) | $180\times {10}^{9}$ | 41 d
(4,9) | S (4000) | $200\times {10}^{9}$ | 75 d
(4,10) | S (4000) | $201\times {10}^{9}$ | 87 d
(5,2) | F (500) | $\lvert R(5,2)\rvert = \mathrm{9,658,153,742,336}$ | 450 d
(5,2)${}_{2D}$ | ${S}_{2D}$ (2000) | $1291\times {10}^{9}$ | 1970 d

**Table 2.** Summary of ranges of application and scalability of CTM and all versions of BDM. d stands for the dimension of the object.

Method | Short Strings $<100$ bits | Long Strings $>100$ bits | Scalability
---|---|---|---
Lossless compression | × | ✓ | $n$
Coding Theorem Method (CTM) | ✓ | × | $exp$ to ∞
Non-overlapping BDM | ✓ | ✓ | $n$
Full-overlapping Recursive BDM | ✓ | ✓ | ${n}^{d-1}$
Full-overlapping Smooth BDM | ✓ | ✓ | ${n}^{d-1}$
Smooth add col BDM | ✓ | ✓ | $n$

**Table 3.** Spearman $\rho $ values of various BDM versions tested on dual and cospectral graphs that theoretically have the same algorithmic complexity up to a (small) constant.

Test | Non-Overlapping BDM | Fully Overlapping Recursive BDM | Smooth Fully Overlapping BDM | Smooth Add Row or Column BDM
---|---|---|---|---
Duality test | 0.874 | 0.783 | 0.935 | 0.931
Cospectrality test | 0.943 | 0.933 | 0.9305 | 0.931

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zenil, H.; Hernández-Orozco, S.; Kiani, N.A.; Soler-Toscano, F.; Rueda-Toicen, A.; Tegnér, J.
A Decomposition Method for Global Evaluation of Shannon Entropy and Local Estimations of Algorithmic Complexity. *Entropy* **2018**, *20*, 605.
https://doi.org/10.3390/e20080605
