# Probabilistic Modeling with Matrix Product States


## Abstract


## 1. Introduction

## 2. The Problem Formulation

**Problem 1.**

## 3. Outline of Our Approach to Solving the Problem

## 4. Effective Versions of the Problem

## 5. The Exact Single-Site DMRG Algorithm

## 6. Experiments

## 7. Discussion

## 8. Conclusions and Outlook

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Multi-Site DMRG

- Use ${\psi}_{t}$ to define an isometric embedding ${\alpha}_{t+1}:{\mathcal{H}}_{\mathrm{eff}}\to \mathcal{H}$ with ${\psi}_{t}\in {\mathcal{H}}_{t+1}:={\alpha}_{t+1}\left({\mathcal{H}}_{\mathrm{eff}}\right).$
- Let ${\tilde{\psi}}_{t+1}$ be the unit vector in ${\mathcal{H}}_{t+1}$ closest to ${\psi}_{\widehat{\pi}}$.
- Perform a model repair of ${\tilde{\psi}}_{t+1}$ to obtain a vector ${\psi}_{t+1}\in \mathcal{M}.$ There are multiple ways to do the model repair.
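One standard choice of model repair is a truncated SVD, which produces the point labelled ${\psi}_{t+1}^{\mathrm{SVD}}$ in Figure A1: the two-site tensor is split back into two MPS cores, keeping only the largest singular values. The following is a minimal numpy sketch of that step; the function name, tensor shapes, and the fixed bond-dimension cap are illustrative assumptions, not from the paper.

```python
import numpy as np

def svd_model_repair(theta, d_left, d_phys1, d_phys2, d_right, chi):
    """Split a two-site tensor back into two MPS cores by a truncated SVD.

    theta: two-site tensor of shape (d_left, d_phys1, d_phys2, d_right).
    chi:   maximum bond dimension kept after truncation.
    Returns (A, B) with A left-isometric, so the MPS stays in canonical form.
    """
    # Group (left bond, first physical index) vs. (second physical index, right bond).
    mat = theta.reshape(d_left * d_phys1, d_phys2 * d_right)
    u, s, vh = np.linalg.svd(mat, full_matrices=False)
    # Keep at most chi singular values and renormalize so the state stays a unit vector.
    k = min(chi, len(s))
    s = s[:k] / np.linalg.norm(s[:k])
    A = u[:, :k].reshape(d_left, d_phys1, k)
    B = (np.diag(s) @ vh[:k]).reshape(k, d_phys2, d_right)
    return A, B
```

Because this keeps the largest singular values, it returns the element of $\mathcal{M}\cap {\mathcal{H}}_{t+1}$ closest to ${\tilde{\psi}}_{t+1}$; as Figure A1 illustrates, that choice is not guaranteed to be the one closest to ${\psi}_{\widehat{\pi}}$.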

**Figure A1.** The shaded region represents the model class $\mathcal{M}$. The red points all lie in ${\mathcal{H}}_{t+1}$. The vector ${\tilde{\psi}}_{t+1}$ is defined to be the unit vector in ${\mathcal{H}}_{t+1}$ closest to the target ${\psi}_{\widehat{\pi}}$. Note that ${\tilde{\psi}}_{t+1}$ does not lie in $\mathcal{M}$. The vector ${\psi}_{t+1}^{\mathrm{SVD}}$ is defined to be the vector in $\mathcal{M}\cap {\mathcal{H}}_{t+1}$ closest to ${\tilde{\psi}}_{t+1}$. In this picture, $\parallel {\psi}_{t+1}^{\mathrm{SVD}}-{\psi}_{\widehat{\pi}}\parallel >\parallel {\psi}_{t}-{\psi}_{\widehat{\pi}}\parallel .$ There may be a point, such as the one labelled ${\psi}_{t+1}^{\mathrm{better}}$, which lies in $\mathcal{M}\cap {\mathcal{H}}_{t+1}$ and is closer to ${\psi}_{\widehat{\pi}}$ than ${\psi}_{t+1}^{\mathrm{SVD}}$, notwithstanding the fact that it is further from ${\tilde{\psi}}_{t+1}$. This figure, to scale, depicts a scenario in which $\parallel {\psi}_{t}-{\psi}_{\widehat{\pi}}\parallel = 0.09$, $\parallel {\psi}_{t+1}^{\mathrm{SVD}}-{\psi}_{\widehat{\pi}}\parallel = 0.10$, $\parallel {\psi}_{t+1}^{\mathrm{better}}-{\psi}_{\widehat{\pi}}\parallel = 0.07$, $\parallel {\tilde{\psi}}_{t+1}-{\psi}_{\widehat{\pi}}\parallel = 0.06$, $\parallel {\psi}_{t+1}^{\mathrm{SVD}}-{\tilde{\psi}}_{t+1}\parallel = 0.07$, and $\parallel {\psi}_{t+1}^{\mathrm{better}}-{\tilde{\psi}}_{t+1}\parallel = 0.08$.


**Figure 1.** A bird's eye view of the training dynamics of exact single-site DMRG on the unit sphere. (**a**) The initial vector ${\psi}_{0}$ and the vector ${\psi}_{\widehat{\pi}}$ lie in the unit sphere of $\mathcal{H}$. (**b**) The vector ${\psi}_{0}$ is used to define the subspace ${\mathcal{H}}_{1}$. The unit vectors in ${\mathcal{H}}_{1}$ define a lower dimensional sphere in $\mathcal{H}$ (in blue). The vector ${\psi}_{1}$ is the vector in that sphere that is closest to ${\psi}_{\widehat{\pi}}$. (**c**) The vector ${\psi}_{1}$ is used to define the subspace ${\mathcal{H}}_{2}$. The unit sphere in ${\mathcal{H}}_{2}$ (in blue) contains ${\psi}_{1}$ but does not contain ${\psi}_{0}$. The vector ${\psi}_{2}$ is the unit vector in ${\mathcal{H}}_{2}$ closest to ${\psi}_{\widehat{\pi}}$. (**d**) The vector ${\psi}_{2}$ is used to define the subspace ${\mathcal{H}}_{3}$. The vector ${\psi}_{3}$ is the unit vector in ${\mathcal{H}}_{3}$ closest to ${\psi}_{\widehat{\pi}}$. And so on.
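The update in each panel is an orthogonal projection followed by a renormalization: ${\psi}_{t+1}=P_{t+1}\,{\psi}_{\widehat{\pi}}/\parallel P_{t+1}\,{\psi}_{\widehat{\pi}}\parallel$, where $P_{t+1}$ projects onto ${\mathcal{H}}_{t+1}$. A minimal numpy sketch of that step, assuming (for illustration only) that the subspace is handed to us as an orthonormal basis of column vectors:

```python
import numpy as np

def closest_unit_vector(basis, target):
    """Unit vector in span(basis) closest to target.

    basis:  (dim, k) matrix whose columns are an orthonormal basis of the subspace.
    target: vector of length dim (real case; use the conjugate for complex spaces).
    """
    # Orthogonal projection onto the subspace, then renormalize to the unit sphere.
    projected = basis @ (basis.T @ target)
    return projected / np.linalg.norm(projected)
```

In the actual algorithm the projection is never formed explicitly at this size; the MPS canonical form makes the equivalent computation local and cheap.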

**Figure 2.** A representative bias-variance tradeoff curve showing negative log-likelihood (base 2) as a function of bond dimension for exact single-site DMRG on the ${P}_{20}$ dataset. For bond dimension 3, the generalization gap is approximately $\epsilon =0.0237$. For reference, the uniform distribution on bitstrings has an NLL of 20. Memorizing the training data would yield an NLL of approximately $13.356$.

**Figure 3.** A representative bias-variance tradeoff curve showing negative log-likelihood (base 2) as a function of bond dimension for exact single-site DMRG on the div7 dataset. For bond dimension 8, the generalization gap is approximately $\epsilon =0.032$. For reference, the uniform distribution on bitstrings has an NLL of 20, the target distribution has an NLL of $17.192$, and memorizing the training data would yield an NLL of approximately $13.87$.
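Both captions use the average per-sample negative log-likelihood in bits, $\mathrm{NLL}=-\frac{1}{|T|}\sum_{x\in T}{\mathrm{log}}_{2}\,p(x)$, evaluated on training or held-out data. A minimal sketch of that evaluation (the `prob` callable is a stand-in for the model's probability function, not the paper's MPS code):

```python
import math

def nll_bits(samples, prob):
    """Average negative log-likelihood, base 2, of samples under prob."""
    return -sum(math.log2(prob(x)) for x in samples) / len(samples)
```

As a sanity check, scoring 20-bit strings under the uniform distribution $p(x)={2}^{-20}$ gives an NLL of exactly 20 bits, matching the reference value quoted in both captions.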

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Stokes, J.; Terilla, J.
Probabilistic Modeling with Matrix Product States. *Entropy* **2019**, *21*, 1236.
https://doi.org/10.3390/e21121236
