# Entropy of Artificial Intelligence

## Abstract

## 1. Introduction

#### What Does Science Teach Us about Learning?

## 2. Our Mathematical Model Space

- **Definition.** A complete model of a subset of all possible images, $C\subset X$, is a bijection to an N-bit string whose coordinates (single bits) are mutually independent. The probabilities ${\mathrm{Prob}}_{C}(x=\sigma )$ are either deterministic, taking the value zero or one, or uniformly distributed (for the irrelevant bits).

- **Definition.** The deterministic (zero or one) bits in the ‘cat’ set C are the overall relevant coordinates.
- **Definition.** Partially relevant coordinates are those independent bits that are either deterministic or uniformly distributed in every ${C}_{a}$ and in C, but deterministic in at least one subset.
- **Definition.** Irrelevant coordinates are those independent coordinates that are uniformly distributed in all subsets that are ‘cats.’
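These coordinate classes can be illustrated on a toy family of bit strings. The sketch below (plain Python; the data and the function name are our own illustrative choices, not taken from the text) labels each bit position as relevant, partially relevant, or irrelevant by checking where it is deterministic:

```python
def coordinate_types(subsets):
    """Label each bit position over a family of subsets of N-bit strings.

    'relevant'           : deterministic (constant) over the union of all subsets,
    'partially relevant' : deterministic in at least one subset, but not in the union,
    'irrelevant'         : non-deterministic (free) in every subset.
    """
    union = [s for subset in subsets for s in subset]
    labels = []
    for i in range(len(union[0])):
        constant_in_union = len({s[i] for s in union}) == 1
        constant_somewhere = any(len({s[i] for s in subset}) == 1 for subset in subsets)
        if constant_in_union:
            labels.append("relevant")
        elif constant_somewhere:
            labels.append("partially relevant")
        else:
            labels.append("irrelevant")
    return labels

# Two toy 'cat' classes of 3-bit images: bit 0 is fixed in both classes,
# bit 1 distinguishes the classes, and bit 2 is free noise.
C_a = ["100", "101"]
C_b = ["110", "111"]
print(coordinate_types([C_a, C_b]))
# ['relevant', 'partially relevant', 'irrelevant']
```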

- Classification: To separate elements belonging to disjoint subsets, the partially relevant bits have to be inspected. Moreover, if the leading bits disagree with those of the union set, one immediately concludes that the shown image is not an element of any pre-determined class: an outlier is identified.
- Regression, i.e., obtaining parameters of a function from noisy function values, can also be treated as a classification problem. Let, e.g., the sets $\mathsf{\Omega}(a,b)$ contain the noisy functions around the smooth one with parameters a and b. A pair $({x}_{i},{y}_{i})$ belongs to $\mathsf{\Omega}(a,b)$ if the probability, derived using a model of the noise, is maximal. Finally, once a common complete model has been learned, one decides about any further point pair by inspecting the partially relevant bits. These also indicate if the found numerical values do not fit into any of the classes. The AI may understand when it does not understand. Will that imply intelligence or awareness?
- Decoding: the AI task is to single out a random ‘cat,’ i.e., a random element of ${\mathsf{\Omega}}_{i}$. Since the relevant bits are constant over ${\mathsf{\Omega}}_{i}$, one evaluates$${x}^{-1}({\sigma}_{\mathrm{relevant}},\,{\sigma}_{\mathrm{irrelevant}}=\mathrm{random\ uniform})\in {\mathsf{\Omega}}_{i}.$$Since the distribution of the irrelevant coordinates is uniform, this chooses among the elements with equal probability.
- Data compression: knowing that the relevant bits are the same for all cats, it suffices to store only the irrelevant ones, shortening the required bit-strings. This compression is lossless and can be undone.
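The decoding and compression steps above can be sketched in a few lines of Python (a minimal illustration under the same toy bit-string setup; all function and variable names are our own, not from the text): decoding keeps the deterministic bits of a class template and draws the irrelevant bits uniformly, while compression stores only the irrelevant bits and restores the rest from the template.

```python
import random

def decode(template, labels):
    """Sample a uniformly random element of a class: keep the deterministic
    (relevant and partially relevant) bits of the template, and draw the
    irrelevant bits uniformly at random."""
    return "".join(
        random.choice("01") if label == "irrelevant" else bit
        for bit, label in zip(template, labels)
    )

def compress(x, labels):
    """Lossless compression: store only the irrelevant bits."""
    return "".join(b for b, lab in zip(x, labels) if lab == "irrelevant")

def decompress(stored, template, labels):
    """Undo the compression: re-insert the deterministic bits from the template."""
    it = iter(stored)
    return "".join(
        next(it) if lab == "irrelevant" else b
        for b, lab in zip(template, labels)
    )

# Class C_a = {"100", "101"}: bits 0 and 1 are deterministic, bit 2 is irrelevant.
labels = ["relevant", "partially relevant", "irrelevant"]
print(decode("100", labels))           # "100" or "101", each with probability 1/2
print(compress("101", labels))         # "1": a 3-bit image stored in 1 bit
print(decompress("1", "100", labels))  # "101"
```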

## 3. Entropy of a Representation

**Proposal 1.**

## 4. How Many Relevant Bits?

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest


**Figure 2.** One-bit pixel images obtained from (**a**) magnetic Ising model fluctuations, and (**b**) a black square on a white background.


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Biró, T.S.; Jakovác, A.
Entropy of Artificial Intelligence. *Universe* **2022**, *8*, 53.
https://doi.org/10.3390/universe8010053
