Gaussian processes for data fulfilling linear differential equations

A method to reconstruct fields, source strengths and physical parameters based on Gaussian process regression is presented for the case where data are known to fulfill a given linear differential equation with localized sources. The approach is applicable to a wide range of data from physical measurements and numerical simulations. It is based on the well-known invariance of the Gaussian under linear operators, in particular differentiation. Instead of using a generic covariance function to represent data from an unknown field, the space of possible covariance functions is restricted to allow only Gaussian random fields that fulfil the homogeneous differential equation. The resulting tailored kernel functions lead to more reliable regression compared to using a generic kernel and makes some hyperparameters directly interpretable. For differential equations representing laws of physics such a choice limits realizations of random fields to physically possible solutions. Source terms are added by superposition and their strength estimated in a probabilistic fashion, together with possibly unknown hyperparameters with physical meaning in the differential operator.


Introduction
The larger context of the present work is the goal to construct reduced complexity models as emulators or surrogates that retain mathematical and physical properties of the underlying system. Similar to usual numerical models, such methods aim to represent infinite systems by exploiting finite information in some optimal sense. In the spirit of structure preserving numerics the aim here is to move errors to the "right place", in order to retain laws such as conservation of mass, energy or momentum.
This article deals with Gaussian process (GP) regression on data with additional information known in the form of linear, generally partial differential equations (PDEs). An illustrative application is the reconstruction of an acoustic sound pressure field and its sources from discrete microphone measurements. GPs, a special class of random fields, are used in a probabilistic rather than a stochastic sense: approximate a fixed but unknown field from possibly noisy local measurements. Uncertainties in this reconstruction are modeled by a normal distribution. For the limit of zero measured data a prior has to be chosen whose realizations take values in the expected order of magnitude. An appropriate choice of a covariance function or kernel guarantees that all fields drawn from the GP at any stage fulfill the underlying PDE. This may require to give up stationarity of the process.
Techniques to fit GPs to data from PDEs has been known for some time, especially in the field of geostatistics [1]. A general analysis including a number of important properties is given by [2]. In these earlier works GPs are usually referred to as Kriging and stationary covariance functions / kernels as covariograms. A number of more recent works from various fields [3,4,5] use the linear operator of the problem to obtain a new kernel function for the source field by applying it twice to a generic, usually squared exponential, kernel. In contrast to the present approach, that method is suited best for source fields that are non-vanishing across the whole domain. In terms of deterministic numerical methods one could say that the approach correspond to meshless variants of the finite element method (FEM). The approach in the present work instead represents a probabilistic variant of a procedure related to the boundary element method (BEM), also known as the method of fundamental solutions (MFS) or regularized BEM [6,7,8]. As in the BEM, the MFS also builds on fundamental solutions, but allows to place sources outside the boundary rather than localizing them on a layer. Thus the MFS avoids singularities in boundary integrals of the BEM while retaining a similar ratio of numerical effort and accuracy for smooth solutions. To the author's knowledge the probabilistic variant of the MFS via GPs has first been introduced by [9] to solve the boundary value problem of the Laplace equation and dubbed Bayesian boundary elements estimation method ((BE) 2 M). This work also provides a detailed treatment of kernels for the 2D Laplace equation. A more extensive and general treatment of the Bayesian context as well as kernels and their connection to fundamental solutions is available in [10] under the term probabilistic meshless methods (PMM).
While [9] is focused on boundary data of a single homogeneous equation, and [10] provides a detailed mathematical foundation, the present work aims to explore the topic further for application and extend the recent work in [11]. Starting from general notions some regression techniques are introduced with emphasis on the role of localized sources. For this purpose Poisson, Helmholtz and heat equation are considered and several kernels are derived and tested. To fit a GP to a homogeneous (source-free) PDE, kernels are built via according fundamental solutions. Possible singularities (sources) are moved outside the domain of interest. In particular, boundary conditions on a finite domain can be either supplied or reconstructed in this fashion. In addition contributions by internal sources are superimposed, using again fundamental solutions in the free field. For that part boundary conditions of the actual problem are irrelevant. The specific approach taken here is most efficient for source-free regions with possibly few localized sources that are represented by monopoles or dipoles.

GP regression for data from linear PDEs
Gaussian processes (GPs) are a useful tool to represent and update incomplete information on scalar fields 1 u(x), i.e. a real number u depending on a (multi-dimensional) independent variable x. A GP with mean m(x) and covariance function of kernel k(x, x ) is denoted as The choice of an appropriate kernel k(x, x ) restricts realizations of (1) to respect regularity properties of u(x) such as continuity or characteristic length scales. Often regularity of u does not appear by chance, but rather reflects an underlying law. We are going to exploit such laws in the construction and application of Gaussian processes describing u for the case described by linear (partial) differential equationŝ HereL is a linear differential operator, and q(x) is an inhomogeneous source term. In physical laws dimensions of x usually consist of space and/or time. Physical scalar fields u include e.g. pressure p, temperature T or the electrostatic potential φ e . Corresponding laws under certain conditions include Gauss' law of electrostatics for φ e with LaplacianL = ε∆, frequency-domain acoustics for p with Helmholtz operatorL = ∆ − k 2 0 or thermodynamics for T with heat/diffusion operatorL = ∂ ∂t − D∆. These operators contain free parameters, namely permeability ε, wavenumber k 0 , and diffusivity D, respectively. While ε may be absorbed inside q in a uniform material model of electrostatics, estimation of parameters k 0 or D is useful for material characterization.
For the representation of PDE solutions the weight-space view of Gaussian process regression is useful.
Bayesian inference starting from a Gaussian prior with covariance matrix Σ p for weights w yields a Mercer kernel The existence of such a representation is guaranteed by Mercer's theorem in the context of reproducing kernel Hilbert spaces (RKHS) [8]. More generally one can also define kernels on an uncountably infinite number of basis functions in analogy to (3) via whereφ is a linear operator acting on elements w(ζ ζ ζ ) of an infinite-dimensional weight space parametrized by an auxiliary index variable ζ ζ ζ , that may be multi-dimensional. We representφ via an inner product φ (x, ζ ζ ζ ), w(ζ ζ ζ ) in the respective function space given by an integral over ζ ζ ζ . The infinite-dimensional analogue to the prior covariance matrix is a prior covariance operatorΣ p that defines the kernel as a bilinear form Kernels of the form (6) are known as convolution kernels. Such a kernel is at least positive semidefinite, and positive definiteness follows in the case of linearly independent basis functions φ (x, ζ ζ ζ ) [8].

Construction of kernels for PDEs
For treatment of PDEs possible choices of index variables in (4) or (6) include separation constants of analytical solutions, or the frequency variable of an integral transform. In accordance with [10], using basis functions that satisfy the underlying PDE, a probabilistic meshless method (PMM) is constructed. In particular, if ζ ζ ζ parameterizes positions of sources, and φ (x, ζ ζ ζ ) = G(x, ζ ζ ζ ) in (6) is chosen to be a fundamental solution / Green's function G(x, ζ ζ ζ ) of the PDE, one may call the resulting scheme a probabilistic method of fundamental solutions (pMFS). In [10] sources are placed across the whole computational domain, and the resulting kernel is called natural. Here we will instead place sources in the exterior to fulfill the homogeneous interior problem, as in the classical MFS [6,7,8]. Technically, this is also achieved by setting Σ p (ζ ζ ζ , ζ ζ ζ ) = 0 for either ζ ζ ζ or ζ ζ ζ in the interior. For discrete sources localized ζ ζ ζ = ζ ζ ζ i one obtains again discrete basis functions φ i (x) = G(x, ζ ζ ζ i ) for (4). More generally, according to theorem 2 of [2], for linear PDE operatorsL in (2) with q = 0 we require a Gaussian process of non-zero mean m(x) withL Lk(x, x ) = 0.
HereL acts on the first argument of k(x, x ). Sources affect only the mean m(x) of the Gaussian process, whereas the kernel k(x, x ) should be based on the homogeneous equation. This hints to the technique of [12] discussed in [13] chapter 2.7 to treat m(x) via a linear model added on top of a zero-mean process for the homogeneous equation. In that case we consider is the superposition where h T (x)b is a linear model for m(x) with Gaussian prior mean b 0 and covariance B for the model coefficients. The homogeneous part (10) corresponds to a random process u h (x) where a source-free k is constructed according to (8). The inhomogeneous part (11) may be given by any particular solution u p (x) for arbitrary boundary conditions. Using the limit of a vague prior with b 0 = 0 and |B −1 | → 0, i.e. minimum information / infinite prior covariance [12,13], posteriors for meanū and covariance matrix cov(u, u) based on given training data y = u(X) + σ n with measurement noise variance σ 2 n arē cov(u(X ), u(X )) = K − K T K −1 Here X = (x 1 , x 2 , . . . x N ) contains the training points, X = (x 1 , x 2 , . . . , x N ) the evaluation or test points.
Functions of X and X are to be understood as vectors or matrices resulting from evaluation at different positions, i.e.ū(X ) ≡ (ū(x 1 ),ū(x 2 ), . . . ,ū(x N )) is a tuple of predicted expectation values. The matrix K ≡ k(X, X) is the kernel covariance of the training data with entries K i j ≡ k(x i , x j ) and cov(u(X ), u(X )) i j ≡ cov(u(x i ), u(x j )) are entries of the predicted covariance matrix for u evaluated in the test points x i . Furthermore K y ≡ k(X, X) + σ 2 n I, K ≡ k(X, X ), K ≡ k(X , X ), R ≡ H − HK −1 y K , and entries of H are

Linear modeling of sources
A linear model for m(x) fulfilling a PDE according to (8) follows directly from the source representation. Consider sources to be modeled as a linear superposition over basis functions with unknown source strength coefficients q = (q i ). To model the mean instead of the source functions themselves, one uses an according superposition of particular solutions u p i (x) from inhomogeneous equationŝ For the linear model (9) this means that b = q and h i (x) = u p i (x). Posterior mean of source strengths and their uncertainty areq One can easily check that the predicted meanū(x ) =ū h (x ) +ū p (x ) at a specific point x in (13) fulfills the linear differential equation (2). In the homogeneous partū h (x ) = k(x , X) K −1 y (y − H Tq ) sources are absent withLū h (x ) = 0, withL acting on x here. The particular solutionū p (x ) = h T (x )q = ∑ i u p i (x )q i adds source contributions q i ϕ i (x ) due to (17). For point monopole sources ϕ i (x) = δ (x − x q i ) placed at at positions x q i , the particular solution u p, i (x) equals the fundamental solution G(x, x q i ) evaluated for the respective source. In the absence of sources the part described in this subsection isn't modeled and (13)(14) reduce to posteriors of a GP with prior mean m(x) = 0 where matrix R vanishes.

Application cases
Here the general results described in the previous section are applied to specific equations. Regression is performed based on values measured at a set of sampling points x i and may also include optimization of hyperparameters β appearing as auxiliary variables inside the kernel k(x, x ; β ). The optimization step is usually performed in a maximum a posteriori (MAP) sense, choosing β MAP as fixed rather than providing a joint probability distribution function including β as random variables. We note that depending on the setting this choice may lead to underestimation of uncertainties in the reconstruction of u, in particular for sparse, low-quality measurements.

Laplace's equation in two dimensions
First we explore construction of kernels in (10) for a purely homogeneous problem in a finite and infinite dimensional index space, depending on the mode of separation. Consider Laplace's equation In contrast to the Helmholtz equation, Laplace's equation has no scale, i.e. permits all length scales in the solution. In the 2D case using polar coordinates the Laplacian becomes A well-known family of solutions for this problem based on the separation of variables is leading to a family of solutions r m cos(mθ ), r m sin(mθ ), r −m cos(mθ ), r −m sin(mθ ).
Since our aim is to work in bounded regions we discard the solutions with negative exponent that diverge at r = 0. Choosing a diagonal prior that weights sine and cosine terms equivalently [9] and introducing a length scale s as a free parameter we obtain a kernel according to (4) with A flat prior σ 2 m = 1 for all polar harmonics and a characteristic length scale s as a hyperparameter, yields This kernel is not stationary, but isotropic around a fixed coordinate origin. Introducing a mirror pointx with polar angleθ = θ and radiusr = s 2 /r we notice that (25) can be written as making a dipole singularity apparent at x =x . In addition k is normalized to 1 at x = 0. Choosing s > R 0 larger than the radius R 0 of a circle centered in the origin and enclosing the computational domain, we havē r > s 2 /s = s > R 0 . Thus all mirror points and the according singularities are moved outside the domain. Choosing a slowly decaying σ 2 m = 1/m, excluding m = 1 and adding a constant term yields a logarithmic kernel instead [9] with Instead of a dipole singularity that expression features a monopole singularity at x −x that is avoided as mentioned above. Using instead Cartesian coordinates x, y to separate the Laplacian provides harmonic functions like u = e ±κx e ±iκy .
Here all solutions yield finite values at x = 0, so we don't have to exclude any of them a priori. Introducing again a diagonal covariance operator in (6) and taking the real part yields Setting σ 2 (κ) ≡ e −2κ 2 and choosing a characteristic length scale s together with a possible rotation angle θ 0 of the coordinate frame yields the kernel Other sign combinations do not yield a positive definite kernel -similar to the polar kernel (26) before we couldn't obtain an fully stationary expression that depends only on differences between coordinates of x and x . Inside Ω the solution is represented with errors below 5%. This is also reflected in the error predicted by the posterior variance of the GP that remains small in the region enclosed by measurement points. The analogy in classical analysis is the theorem that the solution of a homogeneous elliptic equation is fully determined by boundary values. In comparison, a reconstruction using a generic squared exponential kernel k ∝ exp((x − x ) 2 /(2s 2 )) yields a result of similar approximation quality in Fig. 2. The posterior covariance of that reconstruction is however not able to capture the vanishing error inside the enclosed domain due to given boundary data. More severely, in contrast to the previous case, the posterior meanū doesn't satisfy Laplace's equation ∆ū = 0 exactly. This leads to a violation of the classical result that (differences of) solutions of Laplace's equation may not have extrema inside Ω, showing up in the difference to the reconstruction in Fig. 2. This kind of error is quantified by computation of the reconstructed charge densityq = ∆ū. This is fine if data from Poisson's equation ∆u = q with distributed charges should be fitted instead. However, to keep ∆u = 0 exact in Ω, one requires more specialized kernels such as (26).

Helmholtz equation: source and wavenumber reconstruction
To demonstrate the proposed method in full we now consider the Helmholtz equation with sources Stationary kernels based on Bessel functions for the homogeneous equation have been presented in [11]. These functions provide smoothing regularization on the order of the wavelength λ 0 = 2π/k 0 and have been demonstrated to produce excellent field reconstruction from point measurements. Here we consider the two-dimensional case. The method of source strength reconstruction is improved compared to [11], as it constitutes a linear problem according to (18)(19). Non-linear optimization is instead applied to wavenumber k 0 as a free hyperparameter to be estimated during the GP regression. The setup is the same as in [11]: a 2D cavity with various boundary conditions and two sound sources of strengths 0.5 and 1, respectively. Results for sound pressure fulfilling (32) are normalized to have a maximum of p/p 0 = 1. Fig. 3 shows reconstruction error in field reconstruction depending on the number of measurement positions. Here noise of σ n = 0.01 has been added to the samples. The obtained negative log-likelihood depending on k 0 permits an accurate reconstruction of this quantity that has the physical meaning of a wavenumber. A generic squared exponential kernel k ∝ exp((x − x ) 2 /(2(π/k 0 ) 2 )) leads to results of similar quality and a slightly less peaked spatial length scale hyperparameter without a direct physical interpretation.

Heat equation
Consider the homogeneous heat/diffusion equation for (x,t) ∈ R × R + . Integrating the fundamental solution G = 1/ 4π(t − τ) exp((x − ξ ) 2 /(4(t − τ)) from ξ = −∞ to ∞ at τ = 0, i.e. placing sources everywhere at a single point in time, leads to the kernel In terms of x this is a stationary squared exponential kernel and the natural kernel over the domain x ∈ R. The kernel broadens with increasing t and t . Non-stationarity in time can also be considered natural to the heat equation, since its solutions show a preferred time direction on each side of the singularity t = 0. The only difference of (34) to the singular heat kernel is the positive sign between t and t . If both of them are positive, k is guaranteed to takes finite values.
As for the Laplace equation it is also convenient to define a spatially non-stationary kernel by cutting out a finite source-free domain. Evaluating the integral over the fundamental solution in R\(a, b) without our domain interval (a, b) we obtain Incorporating the prior knowledge that there are no domain sources could potentially improve the reconstruction. Initial investigations on the initial-boundary value problem of the heat equation based on those kernels produce stable results showing natural regularization within the limits of the strongly ill-posed setting. Reconstruction of diffusivity D has proven to be a difficult task and requires further investigations.

Summary and Outlook
A framework for application of Gaussian process regression to data from an underlying partial differential has been presented. The method is based on Mercer kernels constructed from fundamental solutions and produces realizations that match the homogeneous problem exactly. Contributions from sources are superimposed via an additional linear model. Several examples for suitable kernels have been given for Laplace's equation, Helmholtz equation and heat equation. Regression performance has been shown to yield results of similar or higher quality to a squared exponential kernel in the considered application cases. Advantages of the specialized kernel approach are the possibility to represent exact absence of sources as well as physical interpretability of hyperparameters. In a next step reconstruction of vector fields via GPs could be formulated, taking laws such as Maxwell's equations or Hamilton's equations of motion into account. A starting point could be squared exponential kernels for divergence-and curl-free vector fields [14]. Such kernels have been used in [15] to perform statistical reconstruction, and [16] apply them to GPs for source identification in the Laplace/Poisson equation. In order to model Hamiltonian dynamics in phase-space, vector-valued GPs could possibly be extended to represent not only volume-preserving (divergence-free) maps but retain full symplectic properties, thereby conserving all integrals of motion such as energy or momentum.