# A Technical Critique of Some Parts of the Free Energy Principle

## Abstract

## 1. Overview

**Condition 1.**

**Condition 2.**

- **Step 2:** Rewrite the vector field $f(\psi, s, a, \lambda)$ describing the dynamics of the system in terms of the gradient of the negative logarithm of the ergodic density $p^{*}(\psi, s, a, \lambda)$ of that system.
- **Step 3:** Rewrite the components $f_{\lambda}(s, a, \lambda)$ and $f_{a}(s, a, \lambda)$ of the vector field $f(\psi, s, a, \lambda)$ in terms of only partial gradients of the negative logarithm of $p^{*}(\psi, s, a, \lambda)$.
- **Step 4:** Assert (in the free energy lemma) the existence of a density $q(\psi \mid \lambda)$ over the external coordinates $\psi$ parameterized by the internal coordinates $\lambda$, and that $f(\psi, s, a, \lambda)$ can again be rewritten, this time in terms of a free energy depending on $q(\mathsf{\Psi} \mid \lambda)$ (here, and whenever it would otherwise be ambiguous, we use a capitalized $\mathsf{\Psi}$ to indicate full distributions, rather than the probability density for a specific value of $\psi$).
- **Step 5:** Claim that the equivalence of the equations of motion in Step 3 and Step 4 implies that certain partial gradients of the KL divergence between $q(\mathsf{\Psi} \mid \lambda)$ and the conditional ergodic density $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ must vanish.
- **Step 6:** Claim that it follows from Step 5 that $q(\mathsf{\Psi} \mid \lambda)$ and $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ are “rendered” equal.
- **Step 7:** Interpret:
  - $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ as a posterior over external coordinates given particular values of sensor, active, and internal coordinates,
  - $q(\mathsf{\Psi} \mid \lambda)$ as encoding Bayesian beliefs about the external coordinates by the internal coordinates, and
  - their equality as the internal coordinates appearing to “solve the problem of Bayesian inference”.
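
A minimal numerical sketch of the kind of rewriting performed in Step 2 is given below. It assumes the form $f(x) = (\mathsf{\Gamma} + R)\nabla \ln p^{*}(x)$ with constant symmetric $\mathsf{\Gamma}$, constant antisymmetric $R$, noise covariance $2\mathsf{\Gamma}$, and a zero-mean Gaussian $p^{*}$. The matrices and all names (`GAMMA`, `R`, `PRECISION`) are illustrative choices, not taken from the paper. Under these assumptions the drift is linear, and stationarity of $p^{*}$ reduces to a Lyapunov equation that can be checked directly.

```python
# Toy check (assumed setting): f(x) = (GAMMA + R) grad ln p*(x), Gaussian p*.

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical parameters, chosen only for illustration.
GAMMA = [[0.5, 0.0], [0.0, 0.2]]      # symmetric, positive definite
R = [[0.0, 0.3], [-0.3, 0.0]]         # antisymmetric
PRECISION = [[2.0, 0.5], [0.5, 1.0]]  # precision matrix of the Gaussian p*

# grad ln p*(x) = -PRECISION x, hence f(x) = -B x with B = (GAMMA + R) PRECISION.
B = matmul(add(GAMMA, R), PRECISION)
SIGMA = inverse_2x2(PRECISION)        # covariance of p*

# A Gaussian with covariance SIGMA is stationary for
# dx = -B x dt + noise with covariance 2 GAMMA
# if and only if B SIGMA + SIGMA B^T = 2 GAMMA.
lhs = add(matmul(B, SIGMA), matmul(SIGMA, transpose(B)))
residual = max(abs(lhs[i][j] - 2 * GAMMA[i][j])
               for i in range(2) for j in range(2))
print("Lyapunov residual:", residual)
```

The residual vanishes up to rounding because the antisymmetric part cancels in the sum. This toy case says nothing about whether the same form is available under more general noise or state-dependent matrices.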

- The re-expression of Equation (1) in the form chosen in Step 2 is derived under restrictive assumptions, including that the system is subject to Gaussian and Markov noise.
- Conditions 1 and 2 are independent of each other.
- Conditions 1 and 3 together lead to a system where the interpretation of $s$ and $a$ as sensory and active coordinates is questionable.
- Under both Conditions 1 and 2, the expressions of ${f}_{\lambda}(s,a,\lambda )$ and ${f}_{a}(s,a,\lambda )$ resulting from Step 3 are not as general as those contained in the result of Step 2. The more general alternative expression derived in [2] remains insufficiently general.
- Under both Conditions 1 and 2, the free energy lemma, when taken at face value, is wrong and cannot be salvaged by using alternatives in Step 3.
- Under both Conditions 1 and 2, contrary to Step 6, the vanishing of the gradient of the KL divergence does not imply the equality of $q(\mathsf{\Psi} \mid \lambda)$ and $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$.
- As a consequence, the basic preconditions for the interpretations in Step 7 are not implied by either of the two proposed Markov blanket conditions, Conditions 1 and 2.
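
The logical gap behind the point about Step 6 can be illustrated with a deliberately simple sketch. It is not the counterexample from Appendix D and does not model Conditions 1 and 2. It only shows that a KL divergence can have vanishing partial gradients with respect to $a$ and $\lambda$ while staying strictly positive. The two densities and all function names below are hypothetical.

```python
import math

def kl_divergence(q, p):
    """KL divergence between two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def q_given_lambda(lam):
    # Hypothetical variational density over two external states.
    # It deliberately ignores lam.
    return [0.5, 0.5]

def p_star_given(s, a, lam):
    # Hypothetical conditional ergodic density: depends on s only.
    w = 1.0 / (1.0 + math.exp(-s))
    return [w, 1.0 - w]

def kl(s, a, lam):
    return kl_divergence(q_given_lambda(lam), p_star_given(s, a, lam))

def central_difference(fn, x, h=1e-5):
    """Finite-difference estimate of the derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

s, a, lam = 1.0, -0.4, 0.7
grad_a = central_difference(lambda v: kl(s, v, lam), a)
grad_lam = central_difference(lambda v: kl(s, a, v), lam)
divergence = kl(s, a, lam)

print("d KL / d a      =", grad_a)
print("d KL / d lambda =", grad_lam)
print("KL              =", divergence)
```

Both partial derivatives are zero because neither density varies with $a$ or $\lambda$, yet the divergence is positive, so the two densities are not equal.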

## 2. Expression via the Gradient of the Ergodic Density

**Observation 1.**

**Proof.**

## 3. Re-Expression Using Only Partial Gradients

**Observation 2.**

**Proof.**

**Condition 3.**

**Observation 3.**

**Proof.**

## 4. Free Energy Lemma

- There is a $q(\mathsf{\Psi} \mid \lambda)$ such that the partial gradients $\nabla_{a}$ and $\nabla_{\lambda}$ of the KL divergence between the variational density and the conditional ergodic density are elements of the nullspaces of $(\mathsf{\Gamma}+R)_{aa}$ and $(\mathsf{\Gamma}+R)_{\lambda\lambda}$, respectively.
- There is a $q(\mathsf{\Psi} \mid \lambda)$ such that the gradients of the KL divergence to $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ are equal to the null vector:
  $$\nabla_{a}\mathrm{D}_{\mathrm{KL}}\left[q(\mathsf{\Psi} \mid \lambda)\,\|\,p^{*}(\mathsf{\Psi} \mid s, a, \lambda)\right]=0,$$
  $$\nabla_{\lambda}\mathrm{D}_{\mathrm{KL}}\left[q(\mathsf{\Psi} \mid \lambda)\,\|\,p^{*}(\mathsf{\Psi} \mid s, a, \lambda)\right]=0.$$
  They are then always elements of the nullspaces of $(\mathsf{\Gamma}+R)_{aa}$ and $(\mathsf{\Gamma}+R)_{\lambda\lambda}$, respectively.
- There is a $q(\mathsf{\Psi} \mid \lambda)$ such that $q(\mathsf{\Psi} \mid \lambda)=p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ (and hence, $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)=p^{*}(\mathsf{\Psi} \mid \lambda)$), which implies that the KL divergence to $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$ vanishes for all $a, \lambda$, so that the two partial gradients are always null vectors and therefore elements of the corresponding nullspaces.
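
The three readings above are ordered by strength: equality gives a zero gradient, and a zero gradient lies in any nullspace, but not conversely. The following sketch illustrates only the first gap. `SINGULAR_BLOCK` is a hypothetical stand-in for a block such as $(\mathsf{\Gamma}+R)_{aa}$, and the gradient vectors are made up. Since $R$ is antisymmetric, such a block can only be singular when the corresponding block of $\mathsf{\Gamma}$ is not positive definite, so the example assumes a degenerate diffusion block.

```python
def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def in_nullspace(matrix, vector, tol=1e-12):
    """True if matrix @ vector is (numerically) the null vector."""
    return all(abs(c) <= tol for c in matvec(matrix, vector))

def is_zero(vector, tol=1e-12):
    return all(abs(c) <= tol for c in vector)

# Hypothetical singular block (symmetric, positive semidefinite, rank one).
SINGULAR_BLOCK = [[1.0, -1.0], [-1.0, 1.0]]

nonzero_gradient = [2.0, 2.0]  # made-up gradient that is not the null vector
zero_gradient = [0.0, 0.0]

# First reading can hold while the second fails.
first_only = (in_nullspace(SINGULAR_BLOCK, nonzero_gradient)
              and not is_zero(nonzero_gradient))

# Second reading always implies the first.
second_implies_first = (is_zero(zero_gradient)
                        and in_nullspace(SINGULAR_BLOCK, zero_gradient))

print(first_only, second_implies_first)
```

When the block is invertible, its nullspace contains only the null vector and the first two readings coincide.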

**Observation 4.**

**(i)** **(ii)** **(iii)**

**Proof.**

## 5. Vanishing Gradients

## 6. Equality of $q(\mathsf{\Psi} \mid \lambda)$ and $p^{*}(\mathsf{\Psi} \mid s, a, \lambda)$

In other words, the flow of internal and active states minimizes free energy, rendering the variational density equivalent to the posterior density over external states.”

**Observation 5.**

**(i)** **(ii)** **(iii)**

**Proof.**

## 7. Interpretation

Because (by Gibbs inequality) this divergence [$\mathrm{D}_{\mathrm{KL}}\left[q(\psi \mid \lambda)\,\|\,p^{*}(\psi \mid s, a, \lambda)\right]$] cannot be less than zero, the internal flow will appear to have minimized the divergence between the variational and posterior density. In other words, the internal states will appear to have solved the problem of Bayesian inference by encoding posterior beliefs about hidden (external) states, under a generative model provided by the Gibbs energy.
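
The role of the Gibbs inequality in the quoted passage can be made concrete with the standard variational identity: free energy equals surprise plus the KL divergence to the posterior. The sketch below assumes that this is the identity the passage relies on. It uses a made-up discrete joint distribution for a single fixed sensory value, and `JOINT_AT_S` and the function names are hypothetical.

```python
import math

# Hypothetical joint probabilities p(psi, s) for one fixed s, psi in {0, 1}.
JOINT_AT_S = [0.3, 0.1]

def kl_divergence(q, p):
    """KL divergence between two discrete distributions given as lists."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def free_energy(q, joint):
    """Variational free energy: E_q[ln q(psi) - ln p(psi, s)]."""
    return sum(qi * (math.log(qi) - math.log(ji))
               for qi, ji in zip(q, joint) if qi > 0)

evidence = sum(JOINT_AT_S)                      # p(s)
posterior = [j / evidence for j in JOINT_AT_S]  # p(psi | s)
surprise = -math.log(evidence)                  # -ln p(s)

q = [0.5, 0.5]                                  # arbitrary variational density
gap = free_energy(q, JOINT_AT_S) - surprise

print("free energy - surprise:", gap)
print("KL to posterior       :", kl_divergence(q, posterior))
print("gap at q = posterior  :", free_energy(posterior, JOINT_AT_S) - surprise)
```

The gap between free energy and surprise equals the KL divergence, is non-negative, and closes exactly when the variational density is set to the posterior. The sketch does not show that any dynamics actually drive the divergence to zero.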

## 8. Consequences for Friston, K. et al. 2014

- The Markov blanket structure was not explicitly defined via Equation (2). Formally, it was introduced directly (see [4], Equation (10)) in a less general form corresponding to Equations (19) and (20). At the same time, [1] is referenced in connection with the Markov blanket, so there seems to be no intention to replace the original definition with the stronger one. Therefore, our observations concerning Steps 2 to 4 are not directly relevant to this paper.
- The internal coordinate $\lambda $ was renamed to r, and the role of matrix R was played by the matrix $-Q$.
- The proof of the free energy lemma given in [4] was different. It (implicitly) suggested setting the variational density equal to the ergodic conditional posterior.
- The proof of the free energy lemma no longer contained the proposition that the gradients of the KL divergence between the variational density and the ergodic conditional density vanish, i.e., Step 5.
- The proof also no longer contained the claim that the vanishing gradients of the KL divergence between the variational density and the ergodic conditional density imply the equality of those densities, i.e., Step 6 was not present.

**Observation 6.**

**Proof.**

## 9. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Counterexamples for Observation 1

## Appendix B. Counterexample for Step 3

## Appendix C. Counterexamples for Step 4

## Appendix D. Counterexample for Step 6

## Appendix E. Translating Systems into Generalized Coordinate Systems

## References

1. Friston, K. Life as we know it. J. R. Soc. Interface **2013**, 10, 2013.0475.
2. Friston, K. A free energy principle for a particular physics. arXiv **2019**, arXiv:1906.10184.
3. Parr, T.; Da Costa, L.; Friston, K. Markov blankets, information geometry and stochastic thermodynamics. Philos. Trans. R. Soc. A **2019**, 378, 2019.0159.
4. Friston, K.; Sengupta, B.; Auletta, G. Cognitive Dynamics: From Attractors to Active Inference. Proc. IEEE **2014**, 102, 427–445.
5. Friston, K.; Rigoli, F.; Ognibene, D.; Mathys, C.; Fitzgerald, T.; Pezzulo, G. Active inference and epistemic value. Cogn. Neurosci. **2015**, 6, 187–214.
6. Ao, P. Potential in stochastic differential equations: Novel construction. J. Phys. A **2004**, 37, L25–L30.
7. Kwon, C.; Ao, P.; Thouless, D.J. Structure of stochastic dynamics near fixed points. Proc. Natl. Acad. Sci. USA **2005**, 102, 13029–13033.
8. Kwon, C.; Ao, P. Nonequilibrium steady state of a stochastic system driven by a nonlinear drift force. Phys. Rev. E **2011**, 84, 061106.
9. Ma, Y.A.; Chen, T.; Fox, E.B. A complete recipe for stochastic gradient MCMC. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Volume 2; MIT Press: Montreal, QC, Canada, 2015; pp. 2917–2925.
10. Yuan, R.; Tang, Y.; Ao, P. SDE decomposition and A-type stochastic interpretation in nonequilibrium processes. Front. Phys. **2017**, 12, 120201.
11. Ao, P.; Chen, T.Q.; Shi, J.H. Dynamical Decomposition of Markov Processes without Detailed Balance. Chin. Phys. Lett. **2013**, 30, 070201.
12. Yuan, R.S.; Ma, Y.A.; Yuan, B.; Ao, P. Lyapunov function as potential function: A dynamical equivalence. Chin. Phys. B **2014**, 23, 010505.
13. Bishop, C. Pattern Recognition and Machine Learning; Information Science and Statistics; Springer: New York, NY, USA, 2006.
14. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
15. van Kampen, N.G. Stochastic Processes in Physics and Chemistry; North-Holland: Amsterdam, The Netherlands, 1981.
16. Oberguggenberger, M. Generalized Functions and Stochastic Processes. In Seminar on Stochastic Analysis, Random Fields and Applications. Progress in Probability; Bolthausen, E., Dozzi, M., Russo, F., Eds.; Birkhäuser: Basel, Switzerland, 1995; Volume 36, pp. 215–230.
17. Cornfeld, I.P.; Fomin, S.V.; Sinai, Y.G. Ergodic Theory; Springer: New York, NY, USA, 1982.

**Figure 1.** Argument visualization. Numbers labelling edges indicate corresponding steps in this paper. Struck out edges indicate implications that we prove incorrect. The main argument in [1] takes the left path. The box in the top right indicates the relations between Conditions 1 to 3 and their role in [3]. Merged edges indicate a logical AND combination of the parent nodes.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Biehl, M.; Pollock, F.A.; Kanai, R. A Technical Critique of Some Parts of the Free Energy Principle. *Entropy* **2021**, *23*, 293.
https://doi.org/10.3390/e23030293
