This section considers the problem of unsupervised learning of probability distributions on bitstrings of fixed length (code available online: https://github.com/TunnelTechnologies/dmrg-exact). The first problem we consider is the parity language $P_N$, which consists of bitstrings of length $N$ containing an even number of 1 bits. The goal of this task is to learn the probability distribution $p$ which assigns uniform mass to each bitstring in $P_N$ and zero elsewhere. More explicitly,

$$p(x) = \frac{1}{2^{N-1}}\,\mathbb{I}_{P_N}(x),$$

where $\mathbb{I}_{P_N}:\{0,1\}^N\to\{0,1\}$ denotes the indicator function of the subset $P_N\subset\{0,1\}^N$. The above unsupervised learning problem is harder than the parity classification problem considered in [12] because the training signal does not exploit data labels. Of the total $|P_N|=2^{N-1}$ such bitstrings, we reserved random disjoint subsets of size 2% for training, cross-validation, and testing purposes. A negative log-likelihood (NLL) of $N-1$ corresponds to the entropy of the uniform distribution on $P_N$. If the model memorizes the training set, it will assign to it an NLL of $N-1+\log_2(0.02)$, corresponding to the entropy of the uniform distribution on the training data. An NLL of $N$ corresponds to the entropy of the uniform distribution on all bitstrings of length $N$. The measure of generalization performance is the gap $\epsilon$ between the NLL of the training and testing data. We performed exact single-site DMRG over the real number field using the $P_{20}$ dataset for different choices of bond dimension, which refers to the dimensionality of the bond space $W$ in the effective Hilbert space $\mathcal{H}_{\mathrm{eff}}=W\otimes V\otimes W$. Training was terminated according to an early stopping criterion determined by the distance between the MPS state and the state of the cross-validation sample. Since the bond dimension controls the complexity of the model class, and since matrix product states are universal approximators of functions on $\{0,1\}^N$, we expect overfitting to occur for sufficiently large bond dimension. Indeed, the NLL as a function of bond dimension reported in Figure 2 displays the expected bias-variance tradeoff, with optimal model complexity occurring at bond dimension 3 with corresponding generalization gap $\epsilon=0.0237$.
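To make the setup concrete, the following is a minimal sketch (illustrative only, not taken from the paper's repository) that enumerates $P_n$ for a small $n$, confirms $|P_n| = 2^{n-1}$, and reproduces the three reference NLL values in bits for $N = 20$ with a 2% training split:

```python
import math
from itertools import product

def parity_language(n):
    """Enumerate P_n: length-n bitstrings containing an even number of 1 bits."""
    return ["".join(b) for b in product("01", repeat=n)
            if b.count("1") % 2 == 0]

n = 10  # small n for a quick check; the experiment uses N = 20
P = parity_language(n)
assert len(P) == 2 ** (n - 1)  # |P_n| = 2^(n-1)

# Uniform target distribution: p(x) = I_{P_n}(x) / 2^(n-1)
p = {x: 1.0 / len(P) for x in P}

# Reference NLL values (in bits) for N = 20 and a 2% training fraction
N, train_frac = 20, 0.02
nll_parity = N - 1                             # uniform on P_N: 19
nll_memorized = N - 1 + math.log2(train_frac)  # uniform on the training set: ~13.36
nll_all = N                                    # uniform on all of {0,1}^N: 20
```

A model that generalizes should approach the NLL of $N-1$ from above without dropping toward the memorization value.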

The second problem we consider is unsupervised learning of the divisible-by-7 language (DIV7), which consists of the binary representations of integers divisible by 7. The dataset was constructed using the first 149797 such integers, which lie in the range $[1, 2^{20}]$. We trained a length-20 MPS to learn the uniform distribution on the divisible-by-7 language as we did for $P_{20}$, except utilizing subsets of size 10% for training, testing, and cross-validation. Figure 3 illustrates that the model trained with exact single-site DMRG at bond dimension 8 learns the DIV7 dataset with nearly perfect accuracy, producing a model with a generalization gap of $\epsilon=0.032$.
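A dataset of this kind can be constructed along the following lines (a sketch under assumed conventions; the exact padding and ordering used by the authors are not specified in the text):

```python
# Sketch of a DIV7-style dataset: fixed-length binary representations of the
# positive multiples of 7 up to 2**20, zero-padded to match the 20 MPS sites.
def div7_dataset(n_bits=20):
    return [format(k, f"0{n_bits}b") for k in range(7, 2 ** n_bits + 1, 7)]

data = div7_dataset()
assert all(int(s, 2) % 7 == 0 for s in data)  # every string encodes a multiple of 7
assert all(len(s) == 20 for s in data)        # fixed length matches the MPS length
```

Zero-padding every integer to a common length is one natural way to obtain fixed-length strings for a length-20 MPS; other encodings are possible.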