Variational Autoencoder Reconstruction of Complex Many-Body Physics

Thermodynamics is a theory of principles that permits a basic description of the macroscopic properties of a rich variety of complex systems from traditional ones, such as crystalline solids, gases, liquids, and thermal machines, to more intricate systems such as living organisms and black holes to name a few. Physical quantities of interest, or equilibrium state variables, are linked together in equations of state to give information on the studied system, including phase transitions, as energy in the forms of work and heat, and/or matter are exchanged with its environment, thus generating entropy. A more accurate description requires different frameworks, namely, statistical mechanics and quantum physics to explore in depth the microscopic properties of physical systems and relate them to their macroscopic properties. These frameworks also allow to go beyond equilibrium situations. Given the notably increasing complexity of mathematical models to study realistic systems, and their coupling to their environment that constrains their dynamics, both analytical approaches and numerical methods that build on these models show limitations in scope or applicability. On the other hand, machine learning, i.e., data-driven, methods prove to be increasingly efficient for the study of complex quantum systems. Deep neural networks, in particular, have been successfully applied to many-body quantum dynamics simulations and to quantum matter phase characterization. In the present work, we show how to use a variational autoencoder (VAE)—a state-of-the-art tool in the field of deep learning for the simulation of probability distributions of complex systems. More precisely, we transform a quantum mechanical problem of many-body state reconstruction into a statistical problem, suitable for VAE, by using informationally complete positive operator-valued measure. We show, with the paradigmatic quantum Ising model in a transverse magnetic field, that the ground-state physics, such as, e.g., magnetization and other mean values of observables, of a whole class of quantum many-body systems can be reconstructed by using VAE learning of tomographic data for different parameters of the Hamiltonian, and even if the system undergoes a quantum phase transition. We also discuss challenges related to our approach as entropy calculations pose particular difficulties.


Introduction
The development of the dynamical theory of heat or classical equilibrium thermodynamics as we know it was possible only with empirical data collection, processing, and analysis, which led, through a phenomenological approach, to the definition of two fundamental physical concepts, the actual pillars of the theory: energy and entropy [1]. It is with these two concepts that the laws (or principles) of thermodynamics could be stated and the absolute temperature be given a first proper definition. Though energy remains as fully enigmatic as entropy from the ontological viewpoint, the latter concept is not completely understood from the physical viewpoint. This of course did not The cross-fertilization of quantum physics and thermodynamics has benefited much from the powerful quantum formalism and computational techniques; however, as thermodynamic concepts evolved from intuitive/phenomenological definitions to classical-mechanics constructs, extended with quantum physics and formalism when needed, thermodynamics, in spite of its undeniable theoretical and practical successes, never managed to fully mature into a genuine fundamental theory that firmly rests on strong basic postulates. On one hand, this led a growing number of physicists to consider thermodynamics as incomplete, and on the other, to think quantum theory as the underlying framework from which equilibrium and nonequilibrium thermodynamics emerge. Quantum thermodynamics [48,49] is a fairly recent field of play, where new ideas are tested while revisiting old problems related to cycles, engines, refrigerators, and entropy production, to name a few [50,51]. Further, quantum technology is a burgeoning field at the interface of physics and engineering, which seeks to develop devices able to harness quantum effects for computing and secure communication purposes [52,53]. The wide scale development of such a kind of systems, which irreversibly interact with an infinite environment, rests on the ability to properly simulate the open quantum dynamics of their many-body properties and analyze coherence and dissipation at the quantum level.
How fast quantum thermodynamics will progress is difficult to anticipate as there exist numerous unsolved problems, especially those related to the proper characterization of the physical processes, e.g., what qualifies as heat or work on ultrashort time and length scales, where averages become irrelevant is unclear, and how the laws of thermodynamics may be systematically adapted still may be debated. To mitigate risks of slow progress, one may resort to approaches that do not rely on models of systems, but rather on data, the idea being to gain actual knowledge and understanding from data irrespective of how complex the studied system is. Machine learning (ML) provides perfectly suited tools for that purpose [54]. ML has a rather long history that can be dated back with the works of Bayes (1763) on prior knowledge that can be used to calculate the probability of an event as formulated by Laplace (1812). Much later (1913), Markov chains were proposed as a tool to describe sequences of events, each being characterized by a probability of occurrence that depends on the actuality of the previous event only. The main milestone is in 1950, with Turing's machine that can learn [55], shortly followed in 1951 by the first neural network machine [56]. Thanks to the huge increase in computational power over the last two decades, ML is now used for a wide variety of problems [54], and quantum machine learning now shows extraordinary potential for faster and more efficient than ever treatment of complex quantum systems problems [57], one major challenge still residing in the development of the hardware capable to harness and transform this potentiality into actual tool.
With the recent success in the field of deep learning, tools other than those based on tensor networks work as well as an ansatz. Restricted Boltzmann machine has been successfully applied as an ansatz to a ground state search, dynamics calculation, and quantum tomography [58][59][60], as well as convolution neural network to the two-dimensional frustrated J 1 − J 2 model [61]. The deep autoregressive model was applied very efficiently and elegantly to a ground state search of many-body quantum system and to classical statistical physics as well [62,63]. It was also recently shown how ML can establish and classify with high accuracy the chaotic or regular behavior of quantum billiards models and XXZ spin chains [64]. Thus, it can be useful to transfer deep architectures from the field of deep learning to the area of many-body quantum systems. A variational autoencoder (VAE) was used for sampling from probability distributions of quantum states in [65]; in the present work, we show that state-of-the-art generative architecture called conditional VAE can be applied to describe the whole family of the ground states of a quantum many-body system. For that purpose, using quantum tomography (albeit in an approximate fashion as discussed below) and reconstruction tools developed in [66], we consider the paradigmatic Ising model in a transverse-field as an illustration of the usefulness and efficiency of our approach. The use of VAE in such a problem is justified by the simplicity of VAE training, as well as its expressibility [67].
The article is organized as follows. In Section 2, we give a brief recap of the physics of the Ising model in a transverse field. In Section 3, we develop our generative model in the framework of the tensor network. Section 4 is devoted to the variational autoencoder architecture. The results are shown and discussed in Section 5. The article ends with concluding remarks, followed a by a short series of appendices.

Transverse-Field Ising Model
Among the rich variety of condensed matter systems, magnetic materials are a source of many fruitful problems, whose studies and solutions inspired discussions and new models beyond their immediate scope. The Kondo effect (existence of a minimum of electrical resistivity at low temperature in metals due to the presence of magnetic impurities) is one such problem [68,69], as it provides an excellent basis for studies of quantum criticality and absolute zero-temperature phase transitions [70,71] and, also, on a more fundamental level, a concrete example of asymptotic freedom [69]. Assuming infinite on-site repulsion, the single-impurity Anderson model [68,72] was used to establish a correspondence between Hamiltonian language and path integral for the development of nonperturbative methods in quantum field theory [73,74]. One other important model is that of the Heisenberg Hamiltonian, defined for the study of ferromagnetic materials, and which, assuming a crystal subjected to an external magnetic field B, reads [75] as where, for ease of notations, we introduced h = gµ B B, with g being the Landé factor and µ B = eh/2m e being the Bohr magneton (e: elementary electric charge, and m e : electron mass); J ij is a parameter that characterizes the nearest-neighbors exchange interaction between electron spins on the crystal sites i and j (the quantum spinsŜ i andŜ j are vector operators whose components are proportional to the Pauli matrices). For simplicity, one may consider J ij ≡ J constant. If J > 0, then the system is ferromagnetic and if J < 0 the system is antiferromagnetic. Hereafter, we fix the electron's magnetic moment gµ B = 1.
Although Equation (1) has a fairly simple form, the exact calculation of the partition function is where β = 1/k B T is the inverse thermal energy, which is possible on the analytical level with the mean-field approximation that simplifies the Hamiltonian (1), and also for one-dimensional systems, one difficulty of the Heisenberg Hamiltonian being that the three components of a spin vector operator do not commute. That said, Heisenberg's Hamiltonian is very useful to, e.g., study spin frustration [76], entanglement entropy [77], and also serve as a test case for density-matrix renormalization group algorithms [78]. Under zero field, Heisenberg's Hamiltonian is also a simplified form of the Hubbard model at half-filling, thus including ferromagnetism in the scope of strongly correlated systems studies. A particular, but very important, approximation of Heisenberg's Hamiltonian, whose significance lies in physics, especially for the study of critical phenomena, cannot be underestimated: the so-called Ising model. In its initial formulation [79], Ising spins are N classical variables, which may take ±1 as values and form a one-dimensional (1D) system characterized by free or periodic boundary conditions. The classical partition function Z may be calculated analytically for the 1D Ising model, and quantities such as the average total magnetization are obtained directly [80]: In the present work, we consider a 1D quantum spin chain whose Hilbert space is given by H = N i C 2 . The system is described by the transverse-field Ising (TFI) Hamiltonian [81]: where σ i α (α ≡ x, z) is the Pauli matrix for the α-component of the i-th spin in the chain, and h x is the magnetic field applied in the transverse direction x. In this case, the spins are no longer the classical Ising ones and the two terms that compose the Hamiltonian H do not commute, therefore requiring a full quantum approach. An example of a real-world system that may be studied as a quantum Ising chain is cobalt niobate (CoNb 2 O 6 ); in this case, the spins that undergo the phase transition as the transverse field varies are those of the Co 2+ ions [82]. The spin states are denoted |+ i and |− i at ion site i. There are two possible ground states: when all N spins are in the state |+ or in the state |− , i.e., when they are all aligned, which defines the ferromagnetic phase.
The phase transition from the ferromagnetic phase to the paramagnetic phase that we speak of now is of a quantum nature, and not of a thermal nature, as here it is driven only by the external magnetic field. More precisely, when the transverse field h x is applied with sufficient strength, the spins align along the x direction, and the spin state at site i is given as the superposition (|+ i + |− i ) / √ 2, which is nothing else but the eigenstate of the x-component of the spin. Therefore, in this particular case, there is no need to raise the temperature of the system initially in the ferromagnetic phase beyond the Curie temperature to make it a paramagnet: the many-body system remains in its ground state, but its properties have changed. Further, note that unlike for the ferromagnetic phase, the quantum paramagnetic phase has spin-inversion symmetry. An insightful discussion on quantum criticality can be found in Reference [83]. Now, we briefly comment on the quantity β = 1/k B T in the context of quantum phase transitions, which, strictly speaking, can only occur at temperature T = 0 K. In fact, close to the absolute zero, where β → ∞, their signatures can be observed as quantum fluctuations dominate thermal fluctuations in the criticality region, where the quantum critical point lies. The imaginary time formalism [84], where exp(−βH) is interpreted as an evolution operator, and the partition function Z as a path integral, provides a way to map a quantum problem onto a classical one with the introduction of the imaginary time β resulting from a Wick rotation in the complex plane, thus yielding one extra dimension to the model. In classical thermodynamics, to observe a phase transition in a system requires that its size (i.e., the number of constituents N) tends to infinity so that the order parameter is non-analytic at the transition point; so, for the quantum transition, the thermodynamic limit entails the limit β → ∞ also: the 1D TFI model is mapped onto an equivalent 2D classical Ising model [85]. The imaginary time formalism permits implementation of classical Monte Carlo simulations to study quantum systems. Further discussion, including the sign problem for the quantum spin-1/2 system, is available in Reference [4].
We have chosen the transverse-field Ising model as an illustrative case for our study for several reasons. First, as this system is 1-dimensional, we can apply an MPS variational ground state solver [37], and therefore obtain the ground state solution in MPS representation. We can then perform fast and exact sampling for generation of large data sets for the training of the VAE. Next, this model can be solved analytically, which allows us to adequately benchmark our results. Finally, this model shows a nontrivial behavior around the quantum phase transition point at h x = 1, and thus constitutes an interesting example to apply a VAE.

Generative Model as a Quantum State
Many-body quantum physics is rich in high-dimensional problems. Often, however, with increasing dimensionality, these become extremely difficult or impossible to solve. One solving method is through the reformulation of the quantum mechanical problem as a statistical problem, when possible. This way, machine learning can be used to effectively solve such a problem, as machine learning is a tool for the solving of high-dimensional statistical problems [86]. Probabilistic interpretation allows for using powerful sampling-based methods that work efficiently with high dimensional data.
An example of the reformulation of a quantum problem as a statistical problem is with informationally complete (IC) positive-operator valued measures (POVMs) [87]. POVMs describe the most general measurements of a quantum system. Each particular POVM is defined by a set of positive semidefinite operators M α , with the normalization condition ∑ α M α = 1, where 1 is the identity operator. The fact that the POVM is informationally complete means that using measurement outcomes one can reconstruct the state of a system with arbitrary accuracy.
The probability of measurement outcome for a quantum system with the density operator ρ is governed by Born's rule: In other words, any density matrix can be mapped on a mass function, although not all mass functions can be mapped on a density matrix [88,89]. Some mass functions lead to non-positive semidefinite "density matrices", which is not physically allowed. As such, quantum theory is a constrained version of probability theory. For a many-body system, these constraints can be very complicated, and direct consideration of quantum theory as a constrained probability theory is not fruitful. However, if one can access the samples of the IC POVM induced mass function, which is by definition physically allowed, this mass function can be reconstructed using generative modeling [66,67]. Samples can be obtained either by performing generalized measurements over the quantum system or by in silico simulation.
In the present work, we simulate measurements of the ground state of a spin chain with the TFI Hamiltonian, Equation (4). As a local (one spin) IC POVM, we use the so-called symmetric IC POVM for qubits (tetrahedral) POVM [90]: Note that the many-spin generalization of local IC POVM can easily be obtained by considering the tensor product of local ones: To simulate measurements outcome under the IC POVM described above, we implement the following numerical scheme: First, we run a variational MPS ground state solver to obtain the ground state of the TFI model in the MPS form: where we use the tensor notation instead of the bra-ket notation for further simplicity, and we obtain the MPS representation of IC POVM induced mass function: whose diagrammatic representation [35] is shown in Figure 1. Next, we produce a set of samples of from the given probability. The sampling can be efficiently implemented as shown in Appendix B. We call this set of samples (outcome measurements) a data set, which may then be used to train a generative model p[α 1 , α 2 , . . . , α N |θ] to emulate the true mass function P[α 1 , α 2 , . . . , α N ]. Here, θ is the set of parameters of the generative model, which is trained by maximizing the logarithmic likelihood L(θ) with respect to the parameters θ [91]. The trained generative model fully characterizes a quantum state. The density matrix is obtained by applying an inverse transformation to the mass function [92]: the diagrammatic representation of which is given in Figure 2. Note that the summation included in the density matrix representation is numerically intractable, but we can estimate it using samplings from the generative model.  Our goal is to use a generative model as an effective representation of quantum states to calculate the mean values of observables such as, e.g., two-point and higher-order correlation functions. An explicit expression of the two-point correlation function obtained by sampling from the trained generative model is shown in Figure 3. To obtain the ground state of the TFI model, we use a variational MPS ground state search, and we pick the bond dimension of MPS equal to 25 and perform 5 DMRG sweeps to get an approximate ground state in the MPS form. We use the variational MPS solver provided by the mpnum toolbox [93].

Variational Autoencoder Architecture
In our work, we use a conditional VAE [94] to represent quantum states. A conditional VAE is a generative model expressed by the following probability distribution, where x is the data we want to simulate; θ represents the VAE parameters, which can be tuned to get the desired probability distribution over x; h is the condition; and z is a vector of latent variables. In our case, x is the quantum measurement outcome in one-hot notation. A collection of measurement outcomes is a matrix of size N × 4, where N is the number of particles in the chain and 4 is the number of possible outcomes of the tetrahedral IC POVM, which is either where π ij (z, h, θ) is the neural network in our architecture, and, more precisely, π ij is the probability of the j th outcome of the POVM for the i th spin with ∑ N j=1 π ij = 1 and π ij ≥ 0. The quantity p[z] is the prior distribution over latent variables, which is simply given by N (0, I) = 1 √ 2π N exp − 1 2 z T z , with I being the identical covariance matrix. We take the number of latent variables equal to the number of spins, N. Essentially, we want to optimize our VAE so that its probability matches the probability of the quantum measurement outcomes as closely as possible. This can be done using the well-known maximum likelihood estimation: is the data set of outcome measurements. We cannot simply maximize this function using, for example, a gradient descent method, due to the presence of hidden variables in the structure of this function. However, we can overcome this problem by using the Evidence Lower Bound (ELBO) [95] and the reparametrization trick shown in [96]. The detailed description of the procedure is given in the Appendix A.
Once trained, the VAE is a simple and efficient way to produce new samples from its probability distribution. It can be done in three steps. First, we produce a sample from the prior distribution p[z] = N (0, I). Next, we feed this sample and the external magnetic field value into the neural network decoder π ij (z, θ, h), which returns the matrix of probabilities. Finally, we sample from the matrix of probability π ij (z, θ, h) to generate "fake" outcome measurements. A visual representation of the sampling method is shown in Figure 4. In many problems, gradients of observables with respect to different model parameters yield quantities of interest. For example, one may consider the magnetic differential susceptibility tensor χ ij = ∂µ i /∂h j . It can be done efficiently by using backpropagation through the VAE architecture but, as samples from the VAE are discrete, a straightforward backpropagation is impossible. In recent papers [97][98][99], a method called the Gumbel-softmax was introduced to overcome this difficulty through continuous relaxation. The spirit, and therefore the physical meaning of the method, may be understood with a short discussion of the so-called simulated annealing technique, which is often used to solve discrete optimization problems. Broadly speaking, the simulated annealing rests on the introduction of a parameter that acts as an artificial "temperature", which varies continuously to modify the state of the system in search of a global optimum. Starting from a given state, for some values of the temperature, if the system mostly explores the neighboring states, moving among them and possibly in the vicinity of the "better" ones, i.e., with lower energy, it may get and remain close to a local optimum, or local energy minimum in the thermodynamic language; however, to avoid remaining in a locally optimal region, "bad" moves leading to worse (i.e., higher energy) states are useful to explore the temperature space more completely improving the chance to find a global optimum or at least to be near it. To each move an energy variation, ∆E, is associated; it is the continuous character of the fictitious temperature that makes the discrete problem continuous as the probability exp(−∆E)/k B T of acceptance of a state is continuous. Although this approach has been known for a long time [100], it remains topical and under active development [101,102]. The method of continuous relaxation we use also exploits such an artificial temperature to make discrete samples continuous.
The Gumbel-softmax trick, consists of three steps:
Finally, we take the softmax function of the result from the previous step: x fake soft (T) = softmax(Z/T), where T is a temperature of softmax. The softmax functions is defined by the The quantity x fake soft (T) has a number of remarkable properties: first, it becomes an exact one-hot sample when T → 0; second, we can backpropagate through soft samples for any T> 0. The method is validated in the next section.
Before we proceed to the presentation and discussion of our results, and to better see the added value of the VAE, it is instructive to compare MPS and VAE (NN) in terms of expressibility, i.e., "estimation of MPS states via incomplete local measurements" vs "VAE reconstruction". As the state of the system is assumed to be unknown, and some measurement outcomes are only known for different magnetic fields, these outcomes are too few for exact tomography. Further, it is known that for a given bond dimension d, the entangled entropy cannot be larger than log(d); in other words, the bond dimension of MPS places an upper bound on the entangled entropy. Thus, the MPS representation describes well only quantum states with low entangled entropy, i.e., quantum states which satisfy the area law [103,104]. The situation with neural network quantum states (NQS) is different: there is no such a restriction for NQS. Moreover, the existence of NQS with volume-law entanglement [105] shows a promising development of new, and possibly powerful, NN-based approaches to representing many-body quantum systems.

Results
Here, we show that the VAE trained on a set of preliminary measurements is capable to describe the physics of the whole family of TFI models. We validate our results by comparing VAE-based calculations with numerically exact calculations performed by variational MPS algorithm [35]. Additionally, to assess the capabilities of the VAE, we consider a spin chain with 32 spins. We calculate the MPS representation of the ground state and extract information from it by performing measurements over the state. The external field in the x-direction is varied from 0 to 2 with a step of 0.1. The VAE is trained on a data set (TFI measurement outcomes) consisting of 10.5 million samples in total: 21 external fields h x with 500,000 samples per field.
To evaluate the VAE performance, we simply compare directly the numerically exact correlation functions with those reconstructed with our VAE. Those of n = 1, . . . , 32, σ 1 z σ n z , and σ 1 x σ n x are shown in Figures 5 and 6, respectively, and we compare the numerically exact and the VAE-based average magnetizations along x, given by σ n x for each position of the spin along the chain, in Figure 7. We see that the VAE captures well the physics of the one-and two-point correlation functions. Figure 8 shows the total magnetizations, µ x and µ z , in the x and z directions, respectively, with µ i = 1 N ∑ N j=1 σ j i , and we see that the VAE is a tool well-suited for the description of the quantum phase transition and also finite-size effects: whereas for the infinite TFI chain, i.e., in the thermodynamic limit, the phase transition is observed at h x = 1, and the finite size of the system yields a shift of the critical point at h x ≈ 0.9. Also note that in the T → 0 limit, the magnetization M defined in Equation (3) coincides exactly with the magnetization µ defined above.
A backpropagation algorithm combined with the Gumbel-softmax trick may be used to evaluate the derivative of an output over an input. We use this approach to calculate some elements of a magnetic differential susceptibility tensor χ ij = ∂µ i /∂h j , in particular, χ xx and χ zx shown in Figure 9. The backpropagation-based magnetic differential susceptibility agrees well with the numerically calculated one (central differences). The main advantage of the backpropagation-based calculation is its numerical efficiency. The VAE may thus be trained with an arbitrary set of external parameters, i.e., not only h x , but also h y and h z , and yield the full differential susceptibility tensor.     At this stage, we could conclude that the VAE is capable to describe the physics of one-and two-point correlation functions, and therefore the TFI physics. However, notwithstanding the ability of the VAE to yield correlation functions that fit well numerically-exact correlation functions, this is not yet a full proof that it represents quantum states well. To address this point, we consider a small spin chain (five spins with TFI Hamiltonian and an external magnetic field h x = 0.9) for which we calculate both the exact mass function and that estimated from VAE samples. Figure 10 shows that the VAE result again fits the numerically exact mass function with high accuracy. Further, we calculate the Bhattacharyya coefficient [106]: BC(p vae , p exact [α] as a function of the external magnetic field h x . Results reported in Figure 11 show that BC(p vae , p exact ) > 0.99 over the whole h x range, which thus proves that the VAE represents a quantum state well, at least for small spin chains. The structure of the entanglement is an another interesting subject that we would like to validate. The essence of entanglement between two parts of the chain, which is split into n left spins and N − n right spins, can be described by the Réniy entropy of the left part of this chain: S α = 1 1−α log Trρ α n , where ρ n is the density matrix of the first n spins in the chain. We estimate the Rényi entropy of order 2: S 2 = − log(Trρ 2 ), as it can be efficiently calculated from the matrix product representation of the density matrix and from the VAE samples. However, as sample-based estimation of the entangled entropy has a variance that grows exponentially with the number of spins, we consider a small spin chain of size 10. A direct comparison between the numerically exact and the VAE-based entangled entropies is shown for different values of n in Figure 12. For this particular case, the VAE clearly overestimates the entangled entropy. This undesirable effect is indeed observed for all sizes of spin chains, and even for the spin chain of size 5, for which we have an excellent agreement between the numerically exact mass function and the VAE-based result. The entropy S 2 is sensitive to small errors in the mass function, but it also appears that the primary method of state reconstruction used in the present work has the following shortcomings.

1.
If one reconstructs a pure state, the VAE smooths the spectrum of the density matrix and approximates the pure state by a slightly mixed state, as illustrated with a simple example in Figure 13.

2.
The VAE does not account the positivity constraints, which yields negative eigenvalues for the density matrix. These negative eigenvalues even appear in the spectrum of the reduced density matrix, as shown in Figure 13. These drawbacks hinder a robust description of the entanglement structure. In addition to the mismatch between the Rényi entropies (S 2 ), the entropy of a reduced density matrix can be larger than the entropy of the whole density matrix, which is erroneous. This particular issue, now identified, may be resolved by introduction of a particular regularization term into the VAE loss. This is the object of future work.
Finally, it is also instructive to comment on the memory costs of the use of either MPS or VAE, which is somehow a tricky question, as it is unclear for any NN-based architecture what numbers of layers and neurons per layer are needed because there is no criterion for NN, whereas for the MPS and tensor networks, there is one. Thus, a direct comparison of NN architectures and tensor networks (MPS, etc.) is certainly a difficult task, and in our opinion, likely an impossible one. At this stage, we may say the following. For a given spin chain of size N and maximal entangled entropy between subchains S = −Trρ log ρ, the MPS requires to store approximately 2N exp (2S) complex numbers; this follows from the fact that one then considers N subtensors of size exp (S) × 2 × exp (S), where exp (S) is the typical (approximate) size of bond dimension. For a VAE, although it seems that there are no entropic restrictions, the proper quantitative characterization of the "neural network" complexity of a quantum state still is an open question (for tensor networks, it is the entangled entropy). A VAE contains two neural networks: encoder and decoder. To store a feed-forward neural network, one has to store ∑ i l i−1 × l i + l i real numbers, with l i being the number of neurons in the layer number i. In general, one may conclude that the MPS is preferable for low entangled states, and the VAE is preferable for highly entangled states.

Conclusions
The thermodynamic study of complex many-body quantum systems still requires the development of new methods, including those that may stem from machine learning. The quantum Ising model, which is of particular importance for practical purposes [107,108], provides a rich framework to test these new methods that are also useful to obtain deeper physical insight into its nonequilibrium dynamics properties such as, e.g., quantum fluctuations propagation [109]. In the present work, we studied the ability of a VAE to reconstruct the physics of quantum many-body systems, using the transverse-field Ising model as a nontrivial example. We used the IC POVM to map the quantum problem onto a probabilistic domain and vice versa. We trained the VAE on a set of samples from the transformed quantum problem, and our numerical experiments show the following results.
• For a large system (32 spins), the VAE's reliability is verified by comparing one-and two-point correlation functions. • For small system (five spins), the VAE's reliability is verified by direct comparison of mass functions. • The VAE can capture a quantum phase transition. • The response functions (magnetic differential susceptibility tensor) can be obtained using backpropagation through VAE. • Despite the very good agreement between the VAE-based mass function and the true mass function, the VAE shows limited performance with the determination of the entangled entropy. This is point is the object of further development.
Our method can be extended to any other thermodynamic system by introduction of the temperature as an external parameter, thereby considering also thermal phase transitions. As one can calculate different thermodynamic quantities by applying backpropagation through VAE, a worthwhile and highly complex system to study would be water under its difference phases, so as to test recent new ideas and models [110,111].
Our code for our numerical experiments is available on the GitHub repository website [112].  18-37-20073. This research was also partially supported by the Skoltech NGP Program (Skoltech-MIT joint project).

Acknowledgments:
The authors thank Stepan Vintskevich for fruitful discussions. The authors also thank Google Colaboratory for providing access to GPU for the acceleration of computations.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript.

Appendix A. VAE: Training and Implementation Details
When training our VAE, we find the arg maximum of the logarithmic likelihood L(θ) w.r.t. its parameters θ: where the rhs of this inequality is the lower bound of the log likelihood, as it will always be greater than or equal to the lower bound, and equality can always be achieved by a proper choice of q if it is in a complex enough family. Maximizing the lower bound is equivalent to maximizing the log likelihood. We can decompose this lower bound term into two terms: In our case, we picked the particular distribution forms that reflect the structure of our problem: where µ i and σ i are given by the encoder neural network, and π ij is given by the decoder neural network, with ∑ 4 j=1 π ij = 1 and π ij ≥ 0, which can be achieved by applying the softmax funtion to the output of the neural network. Now, we can use the reparametrization trick to change the variable in the integral z = σ j (x,θ, h)ε + µ j (x,θ, h), where ε j ∼ N (0, I), to simplify this expression to The first term is the cross-entropy, which pushes the probability distribution to be as close as possible to the data. The second term is the regularizer, which forces the latent variable z not to diverge too much from the normal distribution N (0, I), so that the VAE can be used to generate new data once it is trained. Note that both x ij and σ i must be positive. Instead of adding a constraint to the VAE, which would be difficult to do, we train the VAE for the variables Π = log π and ξ = 2 log σ. Equation (A7) then becomes x ij Π ij (e ξ i (x,θ,h)/2 ε + µ i (x,θ, h), θ, h) ε j ∼N (0,I) Now, ELBO(θ,θ) can be effectively optimized using gradient descent methods, averaging over ε can be done by sampling. Generalizing to a data set of size M: {x k } M k=1 can be easily done and is shown by A visual representation of the VAE architecture is shown in Figure A1. To solve the optimization problem, we use Adam optimizer [113] with standard parameters (lr = 0.001, β 1 = 0.9, β 2 = 0.999). For the encoder and decoder, we use fully-connected neural networks with two hidden layers and 256 neurons on each. We train the VAE using batches of size 100,000 samples and for 750 epochs.

Appendix B. Sampling from POVM-Induced Mass Function
The mass function induced by POVM P[α 1 , α 2 , . . . , α N ] has a form of matrix product state. Thus, one can easily calculate any marginal mass function because a summation over any α can be done locally. Any conditional mass functions can be also calculated by using marginal mass functions. Thus, one can calculate chain decomposition of the whole mass function: With this decomposition, one can produce a sampleα N from P[α N ] first, then a sampleα N−1 from P[α N−1 |α N ], and continue up to the end of the chain. The obtained set {α 1 ,α 2 , . . . ,α N } is a valid sample from the mass function.