Leveraging Stochasticity for Open Loop and Model Predictive Control of Spatio-Temporal Systems

Stochastic spatio-temporal processes are prevalent across domains ranging from the modeling of plasma, turbulence in fluids to the wave function of quantum systems. This letter studies a measure-theoretic description of such systems by describing them as evolutionary processes on Hilbert spaces, and in doing so, derives a framework for spatio-temporal manipulation from fundamental thermodynamic principles. This approach yields a variational optimization framework for controlling stochastic fields. The resulting scheme is applicable to a wide class of spatio-temporal processes and can be used for optimizing parameterized control policies. Our simulated experiments explore the application of two forms of this approach on four stochastic spatio-temporal processes, with results that suggest new perspectives and directions for studying stochastic control problems for spatio-temporal systems.


Introduction and Related Work
Many complex systems in nature vary spatially and temporally, and are often represented as stochastic partial differential equations (SPDEs). These systems are ubiquitous in nature and engineering, and can be found in fields such as applied physics, robotics, autonomy, and finance [1][2][3][4][5][6][7][8][9]. Examples of stochastic spatio-temporal processes include the Poisson-Vlassov equation in plasma physics, heat, Burgers' and Navier-Stokes equations in fluid mechanics, and Zakai and Belavkin equations in classical and quantum filtering. Despite their ubiquity and significance to many areas of science and engineering, algorithms for stochastic control of such systems are scarce.
The challenges of controlling SPDEs include significant control signal time delays, dramatic under-actuation, high dimensionality, regular bifurcations, and multi-modal instabilities. For many SPDEs, existence and uniqueness of solutions remains an open problem; when solutions exist, they often have a weak notion of differentiability, if at all. Their performance analysis must be treated with functional calculus, and their state vectors are often most conveniently described by vectors in an infinite-dimensional time-indexed Hilbert space, even for scalar one-dimensional SPDEs. These and other challenges together represent a large subset of the current-day challenges facing the fluid dynamics and automatic control communities, and present difficulties in the development of mathematically consistent and numerically realizable algorithms.
The majority of computational stochastic control methods in the literature have been dedicated to finite-dimensional systems. Algorithms for decision making under uncertainty of such systems typically rely on standard optimality principles, from the stochastic optimal control (SOC) literature, namely the dynamic programming (or Bellman) principle, and the stochastic Pontryagin maximum principle [10][11][12]. The resulting algorithms typically require solving the Hamilton-Jacobi-Bellman (HJB) equation: a backward nonlinear partial differential equation (PDE) of which solutions are not scalable to high dimensional spaces.
Several works (e.g., [13,14] for the Kuramoto-Sivashinsky SPDE) propose model predictive control based methodologies for reduced order models of SPDEs based on SOC principles. These reduced order methods transform the original SPDE into a finite set of coupled stochastic differential equations (SDEs). In SDE control, probabilistic representations of the HJB PDE can solve scalability via sampling techniques [15,16], including iterative sampling and/or parallelizable implementations [17,18]. These methods have been explored in a reinforcement learning context for SPDEs [19][20][21].
Recently, a growing body of work considers deterministic PDEs, and utilize finite dimensional machine learning methods, such as deep neural network surrogate models that utilize standard SOC-based methodologies. In the context of fluid systems, these approaches are increasingly widespread in the literature [22][23][24][25][26]. A critical issue in applying controllers that rely on a limited number of modes is that they can produce concerning emergent phenomena, including spillover instabilities [27,28] and failing latent space stabilizability conditions [25].
Outside the large body of finite dimensional methods for PDEs and/or SPDEs are a few works that attempt to extend the classical HJB theory for systems described by SPDEs. These are comprehensively explored in [29] and include both distributed and boundary control problems. Most notably, [30] investigates explicit solutions to the HJB equation for the stochastic Burgers equation based on an exponential transformation, and [31] provides an extension of the large deviation theory to infinite dimensional spaces that creates connections to the HJB theory. These and most other works on the HJB theory for SPDEs mainly focus on theoretical contributions and leave the literature with algorithms and numerical results tremendously sparse. Furthermore, the HJB theory for boundary control has certain mathematical difficulties, which impose limitations.
Alternative methodologies are derived, using information theoretic control. The basis of a subset of these methods is a relation between free energy and relative entropy in statistical physics, given by the following: Free Energy ≤ Work − Temperature × Entropy (1) This inequality is an instantiation of the second law in stochastic thermodynamics: increase in entropy results in minimizing the right hand side of the expression. In finite dimensions, connections between Equation (1) and dynamic programming motivate these methods. Essentially, there exist two different points of view on decision making under uncertainty that overlap for fairly general classes of stochastic systems, as depicted in Figure 1. These connections are extended to infinite-dimensional spaces [32] (see also Appendix F) and are leveraged in this letter to develop practical algorithms for distributed and boundary control of stochastic fields. Specifically, we develop a generic framework for control of stochastic fields that are modeled as semi-linear SPDEs. We show that optimal control of SPDEs can be cast as a variational optimization problem and then solved, using sampling of infinite dimensional diffusion processes. The resulting variational optimization algorithm can be used in either fixed or receding time horizon formats for distributed and boundary control of semilinear SPDEs and utilizes adaptive importance sampling of stochastic fields. The derivation relies on non-trivial generalization of stochastic calculus to arbitrary Hilbert spaces and has broad applicability.
This manuscript presents an open loop and model predictive control methodology for the control of SPDEs related to fluid dynamics, which are grounded on the theory of stochastic calculus in function spaces, which is not restricted to any particular finite representation of the original system. The control updates are independent of the method used to numerically simulate the SPDEs, which allows the most suitable problem dependent numerical scheme (e.g., finite differences, Galerkin methods, and finite elements) to be employed. Furthermore, deriving the variational optimization approach for optimal control entirely in Hilbert spaces overcomes numerical issues, including matrix singularities and SPDE space-time noise degeneracies that typically arise in finite dimensional representations of SPDEs. Thus, the work in this letter is a generalization of information theoretic control methods in finite dimensions [33][34][35][36] to infinite dimensions and inherits crucial characteristics from its finite dimensional counterparts.
However, the primary benefit of the information theoretic approach presented in this work is that the stochasticity inherent in the system can be leveraged for control. Namely, The inherent system stochasticity is utilized for exploration in the space of trajectories of SPDEs in Hilbert spaces, which provide a Newton-type parameter update on the parametrized control policy. Importance sampling techniques are incorporated to iteratively guide the sampling distribution, and result in a mathematically consistent and numerically realizable sampling-based algorithm for distributed and boundary controlled semi-linear SPDEs.

Preliminaries and Problem Formulation
At the core of our method are comparisons between sampled stochastic paths used to perform Newton-type control updates as depicted in Figure 2. Let H, U be separable Hilbert spaces with inner products ·, · H and ·, · U , respectively, σ-fields B(H) and B(U), respectively, and probability space (Ω, F , P) with filtration F t , t ∈ [0, T]. Consider the controlled and uncontrolled infinite-dimensional stochastic systems of the following form: where We view these dynamics in an iterative fashion in order to realize an iterative method. As such, the superscript (i) refers to the iteration number. The term W(t) ∈ U corresponds to a Hilbert space Wiener process, which is a generalization of the Wiener process in finite dimensions. When this noise profile is spatially uncorrelated, we call it a cylindrical Wiener process, which requires the added assumptions on A in ([2] Hypothesis 7.2) in order to form a contractive, unitary, linear semigroup, which is required to guarantee the existence and uniqueness of F t -adapted weak solutions X(t), t ≥ 0. A thorough description of the Wiener process in Hilbert spaces, along with its various forms, can be found in Appendix A. For generality, Equations (2) and (3) introduce the parameter ρ ∈ R, which acts as a uniform scaling of the covariance of the Hilbert space Wiener process. This parameter also appears as a "temperature" parameter in the context of Equation (1).
In what follows, ·, · S denotes the inner product in a Hilbert space S and C([0, T]; H) denotes the space of continuous processes in H for t ∈ [0, T]. Define the measure on the path space of uncontrolled trajectories produced by Equation (2) as L and define the measure on the path space of controlled trajectories produced by Equation (3) as L (i) . The notation E L denotes expectations over paths as Feynman path integrals.
Many physical and engineering systems can be written in the abstract form of Equation (2) by properly defining operators A , F and G along with their corresponding domains. Examples can be found in our simulated experiments, as well as Table 1, with more complete descriptions in ([2] Chapter 13)). The goal of this work is to establish control methodologies for stochastic versions of such systems.
Control tasks defined over SPDEs typically quantify task completion by a measurable functional J : C([0, T]; H) → R referred to as the cost functional, given by the following: where X(·, ω) ∈ C([0, T]; H) denotes the entire state trajectory, φ X(T), T is a terminal state cost and X(s), s is a state cost accumulated over the time horizon s ∈ [t, T].
With this, we define the terms of Equation (1). More information can be found in Appendix B.
Define the free energy of cost function J(X) with respect to the uncontrolled path measure L and temperature ρ ∈ R as follows [32]: Additionally, the generalized entropy of controlled path measure L (i) with respect uncontrolled path measure L is defined as follows: where "<<" denotes absolute continuity [32]. The relationship between free energy and relative entropy was extended to a Hilbert space formulation in [32]. Based on the free energy and generalized entropy definitions, Equation (1) with temperature T = 1 ρ becomes the so-called Legendre transformation, and takes the following form: with equilibrium probability measure in the form of a Gibbs distribution as follows: The optimality of L * is verified in [32]. The statistical physics interpretation of inequality Equation (7) is that maximization of entropy results in a reduction in the available energy. At the thermodynamic equilibrium, the entropy reaches its maximum and V = E − TS. Table 1. Examples of commonly known semi-linear PDEs in a fields representation with subscript x representing partial derivative with respect to spatial dimensions and subscript t representing partial derivatives with respect to time. The associated operators A and F(t, X) in the Hilbert space formulation are colored blue and violet, respectively.

Equation Name Partial Differential Equation Field State
Nagumo Phase of a material Navier-Stokes The free energy-relative entropy relation provides an elegant methodology to derive novel algorithms for distributed and boundary control problems of SDPEs. This relation is also significant in the context of SOC literature, wherein optimality of control solutions rely on fundamental principles of optimality, such as the Pontryagin maximum principle [10] or the Bellman principle of optimality [11]. Appendix F shows that by applying a properly defined Feynman-Kac argument, the free energy is equivalent to a value function that satisfies the HJB equation. This connection is valid for general probability measures, including measures defined on path spaces induced by infinite-dimensional stochastic systems.
Our derivation is general in the context of [30], wherein they apply a transformation that is only possible for state-dependent cost functions. The proof given in Appendix E is novel for a generic state and a time-dependent cost to the best knowledge of the authors. The observation that the Legendre transformation in Equation (7) is connected to optimality principles from SOC motivates the use of Equation (8) for the development of stochastic control algorithms.
Flexibility of this approach is apparent in the context of stochastic boundary control problems, which are theoretically more challenging due to the unbounded nature of the solutions [29,37]. The HJB theory for these settings is not as mature, and results are restricted to simplistic cases [38]. Nonetheless, since Equation (7) holds for arbitrary measures, the difficulties of related works are overcome by the proposed information theoretic approach. Hence, in either the stochastic boundary control or distributed control case, the free energy represents a lower bound of a state cost plus the associated control effort. Despite losing connections to optimality principles in systems with boundary control, our strategy in both distributed and boundary control settings is to optimize the distance between our parameterized control policies and the optimal measure in Equation (8) so that the lower bound of the total cost can be approached by the controlled system. Specifically, we look for a finite set of decision variables θ * that yield a Hilbert space control input U (·) that minimizes the distance to the optimal path measure as follows:

Stochastic Optimization in Hilbert Spaces
To optimize Equation (9), we apply the chain rule for the Radon-Nikodym derivative twice. While this is necessary on the right term for our control update, this is applied to the left term for importance sampling, which enhances algorithmic convergence. In each instance, the chain rule has the form: Note that the first derivative is given by Equation (8), while the second derivative is given by a change of measure between control and uncontrolled infinite dimensional stochastic dynamics. This change in measure arises from a version of Girsanov's Theorem, provided with a proof in Appendix C. Under the open-loop parameterization, the following holds: Girsanov's theorem yields the following change of measure between the two SPDEs: where x ∈ D ⊂ R n denotes the localization of actuators in the spatial domain D of the SPDEs and m ∈ U are design functions that specify how actuation is incorporated into the infinite dimensional dynamical system. This parameterization can be used for both openloop trajectory optimization as well as for model predictive control. In our experiments we apply model predictive control through re-optimization and turn Equation (12) into an implicit feedback-type control. Optimization using Equation (9) with policies that explicitly depend on the stochastic field is also possible and is considered, using gradient-based optimization in [19][20][21].
To simplify optimization in Equation (9), we further parameterize u(t; θ) as a simple measurable function. In this case, the parameters θ consist of all step functions {u i }. With this representation, we arrive at our main result-an importance sampled variational controller of the following form: Lemma 1. Consider the controlled SPDE in (3) and a parameterization of the control as specified by (12), with θ consisting of step functions {u i }. The iterative control scheme for solving the stochastic control problem u * = argmax S(L * ||L). (16) is given by the following expression: where and Proof. See Appendix D.  (17) and (18) and related explanations for a more complete explanation. Although the rollout images appear pictorially similar, they represent different realizations of the noise process dW t .
Lemma 1 yields a sampling-based iterative scheme for controlling semilinear SPDEs, and is depicted in Figure 2. An initial control policy, which is typically initialized by zeros, is applied to the semilinear SPDE. The controlled SPDE then evolves with different realizations of the Wiener process in a number of trajectory rollouts. The performance of these rollouts is evaluated on the importance sampled cost function in Equation (18). These are used to calculate the Gibbs averaged performance weightings exp(−ρJ (i) )/E (i) Finally, the outer expectation in Equation (17) is evaluated, and used to produce an update to the control policy. This procedure is repeated over a number of iterations. In the open-loop setting, the procedure considers the entire time window [0, T], and the entire control trajectory is optimized in a "single shot". In contrast, in the MPC setting, a shorter time window [t sim , T sim ] is considered for I iterations; the control at the current time step u I (t sim ) is applied to the system; and the window recedes backward by a time step ∆t. This procedure is explained in greater detail in Appendix J.
For the purposes of implementation, we perform the approximation as follows: where ∆β , and {e j } form a complete orthonormal system in U. This is based on truncation of the cylindrical Wiener noise expansion as follows: We note that the control of SPDEs with cylindrical Wiener noise, as shown above, can be extended to the case in [30] in which G(t, X) is treated as a trace-class covariance operator √ Q of a Q-Wiener process dW Q (t). See Appendix H for more details. The resulting iterative control policy is identical to Equation (17) derived above.

Comparisons to Finite-Dimensional Optimization
In light of recent work that applied finite dimensional control after reducing the SPDE model to a set of SDEs or ODEs, we highlight the critical advantages of optimizing in Hilbert spaces before discretizating. The main challenge with performing optimizationbased control after discretization is that SPDEs typically reduce to degenerate diffusion process for which importance sampling schemes are difficult. Consider the following finite dimensional SDE representation of Equation (2): whereX ∈ R d is a d-dimensional vector comprising the values of the stochastic field at particular basis elements. The terms A, F , and G are matrices associated with their respective Hilbert space operators. The matrix M ∈ R d×k , where k is the number of actuators placed in the field. The vector dβ ∈ R m collects noise terms and R collects associated finite dimensional basis vectors of Equation (22). The matrix R ∈ R d×m is composed of d rows, which is the number of basis elements used to spatially discretize the SPDE Equation (2), and m columns, which is the number of expansion terms of Equation (22) that are used. Girsanov's theorem for SDEs of the form Equation (23) requires the matrix R to be invertible as seen in the resulting change of measure: Deriving the optimal control in the finite dimensional space requires that (a) the noise term is expanded to at least as many terms as the points on the spatial discretization d ≤ m, and (b) the resulting diffusion matrix R in Equation (23) is full rank. Therefore, increasing finite dimensional approximation accuracy increases the complexity of the sampling process and optimal control computation. This is even more challenging in the case of SPDEs with Q-Wiener noise, where many of the eigenvalues in the expansion of W(t) must be arbitrarily close to zero.
Other finite dimensional approaches, as in [39], utilize Gaussian density functions instead of the measure theoretic approach. These approaches are not possible firstly due to the need to define the Gaussian density with respect to a measure other than the Lebesgue measure, which does not exist in infinite dimensions. Secondly, an equivalent Euler-Maruyama time discretization is not possible without first discretizing spatially. Finally, after spatial discretization, the use of transition probabilities based on density functions requires the invertibility of RR T (see Appendix I). These characteristics make Gaussian density-based approaches not suitable for deriving optimal control of SPDEs.

Numerical Results
Performing variational optimization in the infinite dimensional space enables a general framework for controlling general classes of stochastic fields. It also comes with algorithmic benefits from importance sampling and can be applied in either the open loop or MPC mode for both boundary and distributed control systems. Critically, it avoids feasibility issues in optimizing finite dimensional representations of SPDEs. Additional flexibility arises from the freedom to choose the model reduction method that is best suited for the problem without having to change the control update law. Details on the algorithm and more details on each simulated experiment can be found in Appendixes J and K.

Distributed Control of Stochastic PDEs in Fluid Physics
Several simulated experiments were conducted to investigate the efficacy of the proposed control approach. The first explores control of the 1D stochastic viscous Burgers' equation with non-homogeneous Dirichlet boundary conditions. This advection-diffusion equation with random forcing was studied as a simple model for turbulence [40,41].
The control objective in this experiment is to reach and maintain a desired velocity at specific locations along the spatial domain, depicted in black. In order to achieve the task, the controller must overcome the uncontrolled spatio-temporal evolution governed by an advective and diffusive nature, which produces an apparent velocity wave front that builds across the domain as depicted on the bottom left of Figure 3. The results suggest that both the open-loop and MPC schemes have comparable success in controlling the stochastic Burgers SPDE. The open-loop setting depicts the apparent rightward wavefront that is not as strong in the MPC setting. There is also quite a substantial difference in variance over the trajectory rollouts. The open-loop setting depicts a smaller variance overall, while the MPC setting depicts a variance that shrinks around the objective regions. The MPC performance is desirable since the performance metric only considers the objective regions. The root mean squared error (RMSE) and variance averaged over the desired regions is provided in Table 2. The stochastic Nagumo equation with homogeneous Neumann boundary conditions is a reduced model for wave propagation of the voltage in the axon of a neuron [42]. This SPDE shares a linear diffusion term with the viscous Burgers equation as depicted in Table 1. However, as shown in the bottom left subfigure of Figure 4, the nonlinearity produces a substantially different behavior, which propagates the voltage across the axon with our simulation parameters in about 5 s. This set of simulated experiments explores two tasks: accelerating the rate at which the voltage propagates across the axon, and suppressing the voltage propagation across the axon. This is analogous to either ensuring the activation of a neuronal signal, or ensuring that the neuron remains inactivated.   The results of the two stochastic Nagumo equation tasks suggest that both control schemes achieve success on both the acceleration and suppression tasks. While the performance appears substantially different outside the target region, the two control schemes have very similar performance on the desired region, which is the only penalized region in the optimization objective. In the top subfigures of Figures 4 and 5, the desired region is zoomed in on. The zoomed in views depict a higher variance in the state trajectories of the open-loop control scheme than the MPC scheme.
As in the stochastic viscous Burgers experiment, there is an apparent trade-off between the two control schemes. The MPC scheme yields a desirable lower variance in the region that is being considered for optimization, but produces state trajectories with very high variance outside the goal region. The open loop control is understood as seeking to achieve the task by reaching low variance trajectories everywhere, while the MPC scheme is understood as acting reactively (i.e., re-optimizes based on state measurements) to a propagating voltage signal. The RMSE and variance averaged over the desired region of 128 trials of each experiment are given in Table 3. The next simulated experiment explores scalability to 2D spatial domains by considering the 2D stochastic heat equation with homogeneous Dirichlet boundary conditions. This experiment can be thought of as attempting to heat an insulated metal plate to specified temperatures in specified regions while the edges remain at a temperature of 0 at some scale. The desired temperatures and regions associated with this experiment are depicted in the left subfigure of Figure 6. This experiment tests the MPC scheme. Starting from a random initial temperature profile, as in the second subfigure of Figure 6, and using a time horizon of 1.0 s, the MPC controller is able to achieve the desired temperature profile toward the end of the time horizon as shown in the fourth subfigure of Figure 6. The third subfigure of Figure 6 depicts the middle of the time horizon. The MPC controller used 5 optimization iterations at every timestep and 25 sampled trajectories per iteration.
This result suggests that in this case, this approach can handle the added complexity of 2D stochastic fields. As depicted in the right subfigure of Figure 6, the proposed MPC control scheme solves the task of reaching the desired temperature at the specified spatial regions.

Boundary Control of Stochastic PDEs
The control update in Equation (17) describes control of SPDEs by distributing actuators throughout the field. However, our framework can also handle systems with control and noise at the boundary. A key requirement is to write such dynamical systems in the mild form as follows: where the operator D corresponds to the boundary conditions of the problem, and is called the Dirichlet map (Neumann map, respectively) for Dirichlet (Neumann, respectively) boundary control/noise. These maps take operators defined on the boundary Hilbert space Λ 0 to the Hilbert space H of the domain. λ is a real number also associated with the boundary conditions. The operator dV describes a cylindrical Wiener process on the boundary Hilbert space Λ 0 . For further details, the reader can refer to the discussion in [29] Section 2.5, Appendix C.5, and Appendix G.
Studying optimal control problems with dynamics, as in Equation (25), is rather challenging. HJB theory requires additional regularity conditions, and proving convergence of Equation (25) becomes nontrivial, especially when considering Dirichlet boundary noise. Numerical results are limited to simplistic problems. Nevertheless, Equation (17) is extended to the case of boundary control by similarly using tools from Girsanov's theorem to obtain the change of measure as follows: which was also utilized in reference [43] for studying solutions of SPDEs, similar to Equation (25). Using the control parameterization of the distributed case above results in the same approach described in Equation (17) with inner products taken with respect to the boundary Hilbert space Λ 0 to solve stochastic boundary control problems. The stochastic 1D heat equation under Neumann boundary conditions was explored to conduct simulated experiments that investigate the efficacy of the proposed framework in stochastic boundary control settings. The objective is to track a time-varying profile that is uniform in space by actuation only at the boundary points. The MPC scheme of Equation (17), with 10 optimization iterations per time step is depicted in the left subfigure of Figure 7. The random sample of the controlled state trajectory, depicted in a violet to red color spectrum, remains close to the time-varying desired profile, depicted in magenta. The associated bounded actuation signals acting on the two boundary actuators are depicted in the right subfigure of Figure 7. As suggested by the results of the simulated experiments, the authors note a clear empirical iterative improvement of the control policy on each of the experiments. This necessitates a deeper theoretical analysis of the convergence of the proposed algorithm, and is influenced by several of the parameters that appear in Algorithms A1 and A2. The parameter ρ, which appears in the controlled and uncontrolled dynamics in Equations (2) and (3) as well as in the Legendre transformation Equation (7), influences the intensity of the stochasticity and the relative weightings of the terms in Equation (18), which in general leads to an exploration-exploitation trade off. The number of rollouts also has a significant effect on the empirical performance. In general, a larger number of rollouts is advantageous due to a more representative sampling of state space, as well as a better approximation of the expectation, yet can lead to a larger computational burden. In the MPC setting, the time horizon has a significant effect on the empirical performance. This is typical of MPC methods, as a short receding window can cause the algorithm to be myopic, while a large receding window recovers the "single shot" or open-loop performance. Finally, the spatial and temporal discretization size has a significant effect on algorithmic performance, due to the errors introduced in large spatial or temporal steps in the resulting discrete equations, which may ultimately fail the Courant-Friedrichs-Lewy conditions of the SPDE.
The above experiments were designed to cover stochastic SPDEs with nonlinear dynamics, multiple spatial dimensions, time-varying objectives, and systems with both distributed and boundary actuation. This range explores the versatility of the proposed framework to problems of many different types. Throughout these experiments, the control architecture produces state trajectories that solve the objective with high probability for the given stochasticity.

Conclusions
This manuscript presented a variational optimization framework for distributed and boundary controlled stochastic fields based on the free energy-relative entropy relation. The approach leverages the inherent stochasticity in the dynamics for control, and is valid for generic classes of infinite-dimensional diffusion processes. Based on thermodynamic notions that have demonstrated connections to established stochastic optimal control principles, algorithms were developed that bridge the gap between abstract theory and computational control of SPDEs. The distributed and boundary control experiments demonstrate that this approach can successfully control complex physical systems in a variety of domains.
This research opens new research directions in the area of control of stochastic fields that are ubiquitous in the domain of physics. Based on the use of forward sampling, future research on the algorithmic side will include the development of efficient methods for the representation and propagation of stochastic fields, using techniques in machine learning, such as deep neural networks. Other directions include explicit feedback parameterizations and, in the context of boundary control, HJB approaches in the information theoretic formulation.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data supporting the reported results were produced "from scratch" by the algorithms detailed in the manuscript.

Acknowledgments:
We would like to express our gratitude to Andrzej Swiech for our useful and pertinent discussions, which clarified certain aspects of SPDEs in Hilbert spaces.

Conflicts of Interest:
The authors declare no conflicts of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Description of the Hilbert Space Wiener Process
In this section, we provide formal definitions of various forms of the Hilbert space Wiener process. Some of these statements can be found in [2], Section 4.1.

Definition A1.
Let H denote a Hilbert space. A H-valued stochastic process W(t) with probability law L W(·) is called a Wiener process if the following hold: (i) W(0) = 0 (ii) W has continuous trajectories.
(iii) W has independent increments.
be a complete orthonormal system for the Hilbert Space H. Let Q denote the covariance operator of the Wiener process W(t). Note that Q satisfies Qe i = λ i e i , where λ i is the eigenvalue of Q that corresponds to eigenvector e i . Then, W(t) ∈ H has the following expansion: where β j (t) are real valued Brownian motions that are mutually independent of (Ω, F , P).
be a complete orthonormal system for the Hilbert space H. An operator A on H with the set of its eigenvalues {λ i } ∞ i=1 in a given basis {e i } ∞ i=1 is called a trace-class operator if the following holds: The two primary Wiener processes that are typically used to model spatio-temporal noise processes in the SPDE literature are the cylindrical Wiener process and the Q-Wiener process. These are both referred to in the main text, and are defined in the following two definitions.

Definition A3. A Wiener process W(t) on H with covariance operator Q is called a cylindrical
Wiener process if Q is the identity operator I.

Definition A4. A Wiener process W(t) on H with covariance operator Q is called a Q-Wiener process if Q is of trace-class.
An immediate fact following Definition A3 is that the cylindrical Wiener process acts spatially everywhere on H with equal magnitude. One can easily conclude that for a cylindrial Wiener process, the eigenvalues {λ i } ∞ i=1 of the covariance operator Q are all unity, thus the following holds: However, we note that in this case, the series in (A1) converges in another Hilbert space U 1 ⊃ U when the inclusion ι : U → U 1 is Hilbert-Schmidt. For more details, see [2]. On the other hand, immediately following Definition A4 is the fact that a Q-Wiener process must not have a spatially equal effect everywhere on the domain. More precisely, one has the following proposition.

Proposition A2. Let W(t) be a Q-Wiener process on H with covariance operator Q. Let {λ
denote the set of eigenvalues of Q in the complete orthonormal system {e i } ∞ i=1 . Then, the eigenvalues must fall into one of the following three cases: (i) For any ε > 0, there are only finitely many eigenvalues λ i of covariance operator Q such that |λ i | > ε. That is, the set {i ∈ N + : |λ i | > ε}, where N + is the positive natural numbers, has finite elements.
(ii) The eigenvalues λ i of covariance operator Q follow a bounded periodic function such that |λ i | > 0 ∀ i ∈ N + and ∑ ∞ i=1 λ i = 0. (iii) Both case (i) and case (ii) are satisfied. In this case, the eigenvalues follow a bounded and convergent periodic function with lim i→∞ λ i = 0.

Appendix B. Relative Entropy and Free Energy Dualities in Hilbert Spaces
In this section, we provide the relation between free energy and relative entropy. This connection is valid for general probability measures, including measures defined on path spaces induced by infinite-dimensional stochastic systems. In what follows, L p (1 ≤ p < ∞) denotes the standard L p space of measurable functions and P denotes the set of probability measures.
Definition A5. (Free Energy) Let L ∈ P a probability measure on a sample space Ω, and consider a measurable function J : L p → R + . Then, the following term, is called the free energy of J with respect to L and ρ ∈ R. The function log e denotes the natural logarithm Definition A6. Generalized entropy: Let L,L ∈ P, then the relative entropy ofL with respect to L is defined as follows.
where "<<" denotes absolute continuity ofL with respect to L. We say thatL is absolutely continuous with respect to L and we writeL << The free energy and relative entropy relationship is expressed by the following theorem: Theorem A1. Let (Ω, F ) be a measurable space. Consider L,L ∈ P and Definitions A5 and A6. Under the assumption thatL << L, the following inequality holds: where E L , EL denote expectations under probability measures L,L, respectively. Moreover, ρ ∈ R + and J : L p → R + . The inequality in (A5) is the so-called Legendre transformation.
By defining the free energy as temperature T = 1 ρ , the Legendre transformation has the following form: and the equilibrium probability measure has the classical form: To verify the optimality of L * , it suffices to substitute (A7) in (A5) and show that the inequality collapses to an equality [35]. The statistical physics interpretation of inequality (A6) is that maximization of entropy results in a reduction in the available energy. At the thermodynamic equilibrium, the entropy reaches its maximum and V = E − TS.
Let Ω be a sample space with a σ-algebra F . Consider the following H-valued stochastic processes: where X(0) =X(0) = x and W ∈ U is a cylindrical Wiener process with respect to measure P. Moreover, for each Γ ∈ C([0, T]; H), let the law of X be defined as L(Γ) := P(ω ∈ Ω|X(·, ω) ∈ Γ). Similarly, the law ofX is defined asL(Γ) := P(ω ∈ Ω|X(·, ω) ∈ Γ). Assume Then, the following holds: Proof. Define the process:Ŵ Under the assumption in (A10),Ŵ is a cylindrical Wiener process with respect to a measure Q determined by the following: The proof for this result can be found in ( [2], Theorem 10.14). Now, using (A13), (A8) can be rewritten as the following: Notice that the SPDE in (A16) has the same form as (A9). Therefore, under the introduced measure Q and noise profileŴ, X(·, ω) becomes equivalent toX(·, ω) from (A9). Conversely, under measure P, (A15) (or (A16)) behaves as the original system in (A8). In other words, (A8) and (A16) describe the same system on (Ω, F , P). From the uniqueness of solutions and the aforementioned reasoning, one has the following: The result follows from Equation (A14).

Appendix D. Proof of Lemma 1
Proof. Under the open loop parameterization U (x, t) = m(x) T u(t), the problem takes the following form: By using the change of measures in Equation (13) of the main text, minimization of the last expression is equivalent to the minimization of the following expression: As stated in Lemma 1, we apply the control in discrete time instances, and consider the class of step functions u i , i = 0, . . . , L − 1 that are constant over fixed-size intervals [t i , t i+1 ] of length ∆t . We have the following: where we have used the fact that M is constant with respect to time. Due to the symmetry of M, minimization of the expression above with respect to u i results in the following: Since we cannot sample directly from the optimal measure L * , we need to express the above expectation with respect to the measure induced by controlled dynamics, L (i) . We can then directly sample controlled trajectories based on L (i) and approximate the optimal control trajectory. The change in expectation is achieved by applying the Radon-Nikodym derivative. These so-called importance sampling steps are as follows. First define W (i) in a similar fashion to Equation (A13), as follows: Similar to Equation (A16), one can rewrite the uncontrolled dynamics as the following: Under the open-loop parameterization U (t)(x) = m(x) u j , where u j are step functions on each interval [t j , t j+1 ], the change of measures becomes the following: One can alternatively write this as the following: It follows that: In order to derive the iterative scheme, we perform one step of importance sampling and express the associated expectations with respect the measure induced by the controlled SPDE in Equation (3) of the main text. Let us begin by modifying Equation (A17) via the appropriate change of measures from (A20), as well as (A18): One can reorder Equation (A22) as the following: and plug it into Equation (A24) to yield the following: which is equivalent to Equation (17) in the main text with J (i) defined by Equation (18) in the main text.
evaluated on stochastic trajectories X T t,X generated by the infinite dimensional stochastic systems in Equations (2) and (3) of the main text and ρ ∈ R + . The trajectory dependent terms Φ X T t,X : L p → R + and J X T t,X : L p → R + are defined as follows: Additionally, let ψ(t, X) ∈ C 1,2 b ([0, T] × H). Then, the function ψ(t, X) satisfies the following equation: −∂ t ψ t, X(t) = −ρ t, X(t) ψ t, X(t) + ψ X , A X(t) + F X(t) (A29) Proof. The proof starts with the expectation in (A27), which is an expectation conditioned on the filtration F t . To keep the notation short, we drop the dependencies on t and X(t), and write φ T = φ T, X(T) , t = t, X(t) , and g t = g t, X(t) . We split the integrals inside the expectations to write the following: By using the law of iterated expectations between the two sub-sigma algebras F t ⊆ F t+δt we have the following: Next, we use the fact that the conditioning on the filtration F t+δt results in the following equality: By further using this property of independence we have the following: The last expression provides the backward propagation of the ψ t, X(t) by employing a expectation over ψ t + δt, X(t + δt) . To get the backward deterministic Kolmogorov equations for the infinite dimensional case, we subtract the term E ψ t + δt, X(t + δt) F t from both sides: Next, we take the limit as δt → 0 and we have the following: Thus, we have to compute three terms. We employ the Lebegue dominated convergence theorem to pass the limit inside the expectations: By using the Itô differentiation rule ([2] Theorem 4.32) for the case of infinite dimensional stochastic systems, we have the following: The next term is as follows: The third term is as follows: Combining the three terms above, we have shown that ψ t, X(t) satisfies the backward Kolmogorov equation for the case of the infinite dimensional stochastic system in Equation (3) of the main text.

Appendix F. Connections to Stochastic Dynamic Programming
In this section, we show the connections between stochastic dynamic programming and the free energy. Before proceeding, let C k,n b ([0, T] × H) denote the space of all functions ξ : [0, T] × H → R 1 that are k times continuously Fréchet differentiable with respect to time t and n times Gâteaux differentiable with respect to X. In addition, all their partial derivatives are continuous and bounded in [0, T] × H. Furthermore, trajectories starting at X ∈ E over the time horizon [t, T] are denoted X T t,X ≡ X(T, t, ω; X). Using this notation, we have that X(t, t, ω; X) = X. Finally, for real separable Hilbert space E, by the notation x ⊗ y, we mean a linear bounded operator on E such that the following holds: First, we perform the exponential transformation on the function ψ t, X(t) ∈ C 1,2 b ([0, T] × H) and show that the transformed function V t, X(t) ∈ C 1,2 b ([0, T] × H) satisfies the HJB equation for the case of infinite dimensional systems [29]. This result is derived with general Q-Wiener noise with covariance operator Q, however it holds also for cylindrical Wiener noise (Q = I). This requires applying the Feynman-Kac lemma and deriving the backward Chapman-Kolmogorov equation for the case of infinite-dimensional stochastic systems. The backward Kolmogorov equations result in the HJB equation after a logarithmic transformation is applied. We start from the free energy and relative entropy inequality in (A5) and define the function ψ t, X(t) : [0, T] × H → R as follows: which is simply the free energy as defined in Definition A5. By using the Feynman-Kac lemma we have that the function ψ(t, X) satisfies the backward Chapman-Kolmogorov equation specified as follows: where ∂ t ψ t, X(t) denotes the Fréchet derivative of ψ t, X(t) with respect to t, and ψ X and ψ XX denote the first and second Gâteaux derivatives of ψ t, X(t) with respect to X(t).
Next, we compute the functional derivatives V X and V XX as functions of the functional derivatives ψ X and ψ XX . This results in the following: The last equations simplifies to the following: From the definition of the trace operator Tr[A] := ∑ ∞ j=1 Ae j , e j for orthonormal basis {e j } over the domain of A, we have the following expression: Since (x ⊗ y)z = x y, z , we have the following: Substituting back to (A32) we have the HJB equation for the infinite dimensional case: In the same vein, one can also show that the relative entropy between the probability measures induced by the uncontrolled and controlled infinite dimensional systems in Equations (2) and (3) of the main text, respectively, result in an infinite dimensional quadratic control cost. This requires the use of the Radon-Nikodym derivative from our generalization of Girsanov's theorem for the case of infinite dimensional stochastic systems in Equations (2) and (3) of the main text.

Appendix G. SPDEs under Boundary Control and Noise
Let us consider the following problem with Neumann boundary conditions: where ∆ x corresponds to the Laplacian, λ ≥ 0 is a real number, O is a bounded domain in R d with regular boundary ∂O and ∂ ∂n denotes the normal derivative, with n being the outward unit normal vector. As shown in [29] and references therein, there exists a continuous operator D N : H s (∂O) → H s+3/2 (O) such that D N γ is the solution to (A33). Given this operator, stochastic parabolic equations with Neumann boundary conditions of the following type: which can be written in the mild abstract form: where G N := (λI − A N ) 3/4− D N , and the remaining terms are defined with respect to the space-time formulation of (A35). A similar expression can be obtained for Dirichlet conditions as well; however, the solution has to be investigated under weak norms, or in weighted L 2 spaces. More details can be found in ( [29] Appendix C) and references therein.

Appendix H. An Equivalence of the Variational Optimization Approach for SPDEs with Q-Wiener Noise
In this section, we briefly discuss how one obtains an equivalent variational optimization as in Section 3 of the main text, for control of SPDEs with Q-Wiener noise. Consider the uncontrolled and controlled version of an H-valued process be given, respectively, by the following: with initial condition X(0) =X(0) = ξ. Here, Q is a trace-class operator, and W ∈ U is a cylindrical Wiener process. The assumption that Q is of trace class is expressed as follows: Qe n , e n < ∞.
As opposed to the discussion following Equation (3) of the main text, in this case we do not require any contractive assumption on the operator A due to the nuclear property of the operator Q. The stochastic integral where the basis {e n } satisfies the eigenvalue-eigenvector relationship Qe n = λe n . The process W Q (t) satisfies the properties in Definition A4, and is therefore a Q-Wiener process.
The above case is an SPDE driven by Q-Wiener noise, which is quite different from the cylindrical Wiener process described in the rest of this work. In order to state the Girsanov's theorem in this case, we first define the Hilbert space Let Ω be a sample space with a σ-algebra F . Consider the following H-valued stochastic processes: where X(0) =X(0) = x and W Q ∈ U is a Q-Wiener process with respect to measure P. Moreover, for each Γ ∈ C([0, T]; H), let the law of X be defined as L(Γ) := P(ω ∈ Ω|X(·, ω) ∈ Γ). Similarly, the law ofX is defined asL(Γ) := P(ω ∈ Ω|X(·, ω) ∈ Γ). Then, we have the following: where we have defined ψ(t) := √ ρU t,X(t) ∈ U 0 and assumed the following: Proof. The proof is identical to the proof of Theorem A2.
Note that ψ(t) in this case is identical to ψ(t) in Theorem A2. As a result, despite having Q-Wiener noise, we have the same variational optimization for this case as in Section 3 of the main text.

Appendix I. A Comparison to Variational Optimization in Finite Dimensions
In what follows, we show how degeneracies arise for a similar derivation in finite dimensions. The stochastic dynamics are given by the following: where W(t) is a cylindrical Wiener process. Now, let the Hilbert space state vector X(t) ∈ H be approximated by a finite dimensional state vector X(t) ≈X(t) ∈ R d with arbitrary accuracy, where d is the number of grid points. In order to rewrite a finite dimensional form of (A42), the cylindrical Wiener noise term W(t) must be captured by a finite dimensional approximation. The expansion of W(t) in (A1) is restated here and truncated at m terms: where λ j = 1, ∀j ∈ N in the case of cylindrical Wiener noise, and β j (t) is a standard Wiener process on R. The stochastic dynamics in (A42) become a finite set of SDEs: The terms A, F , and G are matrices associated with the Hilbert space operators A , F, and G respectively. The matrix M has dimensionality M ∈ R d×k , where k is the number of actuators placed in the field. The vector dβ ∈ R m collects the Wiener noise terms in the expansion (A43), and the matrix R collects finite dimensional basis vectors from (A43). As noted in the main paper, the dimensionality of the R is R ∈ R d×m . The degeneracy arises when d > m for the case of the cylindrical noise. For the case of Q-Wiener noise, degeneracy may arise, even when d ≤ m and Rank(R) < d. In both cases, the issue of degeneracy prohibits the use of the Girsanov theorem for the importance sampling steps, due to the lack of invertibility of R. With respect to the approach relying on Gaussian densities, the derivation would require the following time discretization of the reduced order model in (A44): Without loss of generality, we simplify the expression above by assuming the G(t,X) = I d×d . The transition probability takes the following form: where the term µX(t + ∆t) is the mean and ΣX is the variance defined as follows: The existence of the transition probability densities requires invertibility of RR T , which is not possible when d < m or when Rank(R) < d for d ≥ m.

Appendix J. Algorithms for Open Loop and Model Predictive Infinite Dimensional Controllers
The following algorithms use equations derived in [42] for finite difference approximation of semi-linear SPDEs for Dirichlet and Neumann Boundary conditions. Spatial discretization is done as follows: pick a number of coordinate-wise discretization points J on the coordinate-wise domain D = [a, b] ⊂ R such that each spatial coordinate is discretized as x k = a + k b−a J where k = 0, 1, 2, . . . , J. For our experiments, the function that specifies how actuation is implemented by the infinite dimensional control is of the following form: where µ l denotes the spatial position of the actuator on [a, b] and σ l controls the influence of the actuator on nearby positions. Next, we provide two algorithms for infinite dimensional stochastic control. In particular, Algorithm A1 is for open-loop trajectory optimization and Algorithm A2 is for model predictive control that uses implicit feedback.

7:
Compute entries of the actuation matrixM by (A50) 8: Compute the control actions applied to each grid point, U (t) = u(t) TM
In the boundary control case, we make use of the 1D stochastic heat equation given as follows: For Dirichlet and Neumann boundary conditions, we have h(t, x) = γ(x), ∀x ∈ ∂O and h x (t, x) = γ(x), ∀x ∈ ∂O, respectively. Regarding our 1D boundary control example, we set = 1, σ = 0.1, h x (t, 0) = u 1 (t) and h x (t, a) = u 2 (t). In this case, m l (x) is simply given by the identity function and the corresponding inner products associated with Girsanov's theorem are given by the standard dot product. Finally, the cost function used is the same as above with S = {x|0 < x < a} and the following: h desired (t, x) = 1, for t ∈ [0, 0.4], 3, for t ∈ [0, 0.4] and t ∈ [0.8, 1.3].

Appendix K.2. Burgers SPDE
The 1D stochastic Burgers PDE with non-homogeneous Dirichlet boundary conditions is as follows: h(t, 0) = h(t, a) = 1.0 h(0, x) = 0, ∀x ∈ (0, a) where the parameter is the viscosity of the medium. (A53) considers a simple model of a 1D flow of a fluid in a medium with non-zero flow velocities at the two boundaries. The goal is to achieve and maintain a desired flow velocity profile at certain points along the spatial domain. As seen in the desired profile in Figure 3 of the main paper, there are three areas along the spatial domain with desired flow velocity such that the flow has to be accelerated, then decelerated, and then accelerated again while trying to overcome the stochastic forces and the dynamics governed by the Burgers PDE. Similar to the experiments for the heat SPDE, we consider actuators behaving as Gaussian-like exponential functions with the means co-located with the actuator locations at µ = 0.2a, 0.3a, 0.5a, 0.7a, 0.8a and the spatial effect (variance) of each actuator given by σ 2 l = (0.1a) 2 , ∀ l = 1, . . . , 5. The parameter a = 2.0 m is the length of the channel along which the fluid is flowing.
This spatial domain was discretized, using a grid of 128 points. The numerical scheme used semi-implicit forward Euler discretization for time and central difference approximation for both the 1st and 2nd order derivatives in space. The 1st order derivative terms in the advection term hh x were evaluated at the current time instant, while the 2nd order spatial derivatives in the diffusion term h xx were evaluated at the next time instant; hence, the scheme is semi-implicit. The following are values of some other parameters used in our experiments: time discretization ∆t = 0.01, total simulation time = 1.0 s, MPC time horizon = 0.1 s, and the scaling parameter κ = 100. The cost function considered for the experiments was defined as follows: where the function 1 S (x) is defined as in (A52) with S = ∪ 3 i=1 , where S 1 = [0.18a, 0.22a], S 2 = [0.48a, 0.52a], and S 3 = [0.78a, 0.82a]. In addition h desired (t, x) = 2.0 m/s for x ∈ S 1 ∪ S 3 , which is at the sides, and h desired (t, x) = 1.0 m/s for x ∈ S 2 which is in the central region.

Appendix K.3. Nagumo SPDE
The stochastic Nagumo equation with Neumann boundary conditions is as follows: The parameter α determines the speed of a wave traveling down the length of the axon and the rate of diffusion. By simulating the deterministic Nagumo equation with a = 5.0, = 1.0 and α = −0.5, we observed that after about 5 s, the wave completely propagates to the end of the axon. Similar to the experiments for the heat SPDE, we considered actuators behaving as Gaussian-like exponential functions with actuator centers (mean values) at µ = 0.2a, 0.3a, 0.4a, 0.5a, 0.6a, 0.7a, 0.8a and the spatial effect (variance) of each actuator given by σ 2 l = (0.1a) 2 , ∀ l = 1, . . . , 7. The spatial domain was discretized using a grid of 128 points. The numerical scheme used semi-implicit forward Euler discretization for time and central difference approximation for the 2nd order derivatives in space. The following are values of some other parameters used in our experiments: time discretization ∆t = 0.01, MPC time horizon = 0.1 s, total simulation time = 1.5 s for acceleration task and total simulation time = 5.0 s for the suppression task, and the scaling parameter κ = 10, 000. The cost function for this experiment was defined as follows: where h desired (t, x) = 0.0 V for the suppression task, and h desired (t, x) = 1.0V for the acceleration task, and the function 1 S (x) is defined as in (A52) with S = [0.7a, 0.99a].