Numerical Investigation of a Class of Nonlinear Time-Dependent Delay PDEs Based on Gaussian Process Regression

Abstract: Probabilistic machine learning and data-driven methods have gradually shown their high efficiency in solving the forward and inverse problems of partial differential equations (PDEs). This paper focuses on the forward problem of solving time-dependent nonlinear delay PDEs with multi-delays based on multi-prior numerical Gaussian processes (MP-NGPs), which we construct to solve complex PDEs that may involve fractional operators, multi-delays and different types of boundary conditions. We also quantify the uncertainty of the predicted solution through its posterior distribution. The core of MP-NGPs is to discretize time first and then, at each time step, run a Gaussian process regression based on multiple priors to obtain the solution at the next time step, repeating this procedure until the last time step. Different types of boundary conditions are studied in this paper, including Dirichlet, Neumann and mixed boundary conditions. Several numerical tests show that the methods considered in this paper work well in solving nonlinear time-dependent PDEs with delay, where delay partial differential equations, delay partial integro-differential equations and delay fractional partial differential equations are considered. Furthermore, in order to improve the accuracy of the algorithm, we construct Runge–Kutta methods under the frame of multi-prior numerical Gaussian processes. The numerical experiments show that the prediction accuracy of the algorithm is clearly improved when the Runge–Kutta methods are employed.


Introduction
In the last decade, more and more attention has been paid to the theoretical studies and practical applications concerned with probabilistic machine learning and data-driven methods [1,2], and some studies have gradually been extended to the field of differential equations. How to predict the solution of (nonlinear) dynamic systems represented by (delay) partial differential equations (PDEs) has attracted considerable interest [3][4][5][6][7][8]. Among these approaches, (deep) neural networks are favored and used more often by researchers [9][10][11][12][13][14][15][16], and the research targets may involve both forward and inverse problems [17].
Machine learning with Gaussian processes (GPs) [18] is not new, and GPs have been widely employed in many fields, such as geology [19], meteorology [20] and some financial applications [21]. Similar to other machine learning methodologies, Gaussian processes can be utilized in regression or classification problems. Moreover, Gaussian processes belong to a family of methods named kernel machines [22]. However, compared with other kernel machine methods, such as support vector machines (SVMs) [23] and relevance vector machines (RVMs) [24], the rigorous probabilistic derivation behind Gaussian processes has limited the popularization of this method in industrial circles. Another drawback of Gaussian processes is that the computational cost may be expensive, due to the Cholesky decomposition of the kernel matrices. As a kernel machine with very strong learning ability, Gaussian processes have been widely used in the forward and inverse problems of ordinary differential equations (ODEs) and PDEs in recent years.
Lange-Hegermann (2021) [25] constructed multi-output Gaussian process priors with realizations in the solution set of linear PDEs; the construction was fully algorithmic via Gröbner bases, although building GP priors through the parametrization of such systems was done via a pullback. Gahungu et al. (2022) [26] efficiently inferred inhomogeneous terms modeled as GPs by using a truncated basis expansion of the GP kernel, from the perspective of exact conjugate Bayesian inference. Gulian et al. (2022) [27] proposed a framework that combined co-Kriging with the linear transformation of GPs, together with kernels given by spectral expansions in eigenfunctions of the boundary value problem, to infer the solution of a well-posed boundary value problem (BVP). To estimate parameters of nonlinear dynamic systems represented by ODEs from sparse and noisy data, Yang et al. (2022) [28] bypassed the need for numerical integration and proposed a fast and accurate method, named manifold-constrained Gaussian process inference (MAGI), explicitly conditioned on the manifold constraint that derivatives of the Gaussian processes must satisfy the considered ODEs, based on a GP model over time series data. Raissi et al. [29] proposed a method for the forward problem of time-dependent PDEs based on GP regression, named numerical Gaussian processes (NGPs). The method can solve equations with high accuracy, and the frame of the algorithm is flexible enough for different problems. Generally speaking, the study of Gaussian processes is an important branch of probabilistic numerics [30][31][32][33][34][35] in numerical analysis.
On the basis of NGPs, we propose a new method, multi-prior numerical Gaussian processes (MP-NGPs), designed to solve complex PDEs. To the best of our knowledge, most methods based on Gaussian processes do not consider delay terms or high-dimensional problems, and only a few researchers consider fractional derivatives and integro-differential operators, which obviously cannot meet the requirements of practical applications. All of these problems are considered in MP-NGPs. We also give a rigorous mathematical derivation of MP-NGPs. Several PDEs are solved in the following sections to demonstrate the validity of MP-NGPs.
It is well known that Gaussian processes [18] possess a strict mathematical basis and coincide with the Bayesian method in nature. Unlike traditional methods, including the finite element method [36], spectral method [37] and finite difference method [38], MP-NGPs, like NGPs [29], solve PDEs by means of statistical methods. In MP-NGPs, the Bayesian method [39] is used to construct an efficient data learning machine, and then to infer the solution of the equations. This paper mainly exploits the possibility of MP-NGPs in solving the forward problem of a class of time-dependent nonlinear delay partial differential equations with multi-delays (1), where t is the temporal variable, x is the D-dimensional spatial vector, s_i (i = 1, 2, · · · , m) are delay terms (s_m > s_{m−1} > · · · > s_1 > 0), Ω ⊆ R^D, and ∂Ω is the boundary of Ω. u(x, t) is the solution of (1), f(x, t) is the inhomogeneous term, g(u, x), h_i(u, x) (i = 1, 2, · · · , m), ϕ(x, t), ψ(x, t) are given functions, and L_x, L_x^{s_i} (i = 1, 2, · · · , m) are linear operators. The linear operators considered in this paper comprise three different kinds: partial differential, integro-differential and fractional-order operators. We should point out that the fractional-order operators of (1) are considered in the sense of Caputo [40,41],

∂^α u(x, t)/∂x^α = (1/Γ(n − α)) ∫_0^x (x − s)^{n−α−1} ∂^n u(s, t)/∂s^n ds, (2)

where α > 0 is a constant (n − 1 < α ≤ n, n ∈ N^+) and Γ(·) is the Gamma function. Apparently, the boundary conditions of problem (1) are Dirichlet boundary conditions. Other types of boundary conditions are discussed, with corresponding treatments, in Section 4. In order to certify the high efficiency and flexibility of MP-NGPs further, we consider another type of time-dependent PDEs, wave equations, which still involve delay terms and inhomogeneous terms, in Section 5.
We arrange the outline of this paper as follows: Section 2 introduces MP-NGPs according to the workflow of solving nonlinear delay PDEs. In Section 3, Runge–Kutta methods based on MP-NGPs are constructed. Section 4 introduces the processing of Neumann and mixed boundary conditions in MP-NGPs. Section 5 gives two methods of how to solve wave equations in MP-NGPs. Finally, we present some discussions and conclusions in Sections 6 and 7, respectively.

Multi-Prior Numerical Gaussian Processes
In this section, we introduce MP-NGPs based on the workflow of solving nonlinear delay PDEs with multi-delays.

Prior
Use the backward Euler scheme in the first equation of (1) to discretize time, take the time step size as ∆t, and then obtain

u^n − ∆t [g(u^n, x) L_x u^n + Σ_{i=1}^m h_i(u^n, x) L_x^{s_i} u^{n−τ_i} + f^n] = u^{n−1}, (3)

where n = 1, 2, · · · , N, N = T/∆t, u^n = u(x, t_n), u^{n−τ_i} = u(x, t_n − s_i) and τ_i = s_i/∆t. We approximate the nonlinear terms g(u^n, x) L_x u^n and h_i(u^n, x) L_x^{s_i} u^{n−τ_i} (i = 1, 2, · · · , m) with g(µ^{n−1}, x) L_x u^n and h_i(µ^{n−1}, x) L_x^{s_i} u^{n−τ_i} (i = 1, 2, · · · , m), respectively, where µ^{n−1} denotes the predicted solution (posterior mean) of the previous time step. Then introduce the notations L_x^0 u^n = u^n − ∆t · g(µ^{n−1}, x) L_x u^n, L_x^i u^{n−τ_i} = −∆t · h_i(µ^{n−1}, x) L_x^{s_i} u^{n−τ_i} (i = 1, 2, · · · , m) and L_x^{m+1} f^n = −∆t · f^n, so we can obtain the simplified equation

L_x^0 u^n + Σ_{i=1}^m L_x^i u^{n−τ_i} + L_x^{m+1} f^n = u^{n−1}. (4)

Equation (4) captures the whole structure of the solution at each time step. We can link the solutions at different time steps by setting reasonable prior assumptions. Assume the following prior hypotheses [29],

u^n(x) ∼ GP(0, k_{u,u}^{n,n}(x, x′; θ_0)), u^{n−τ_i}(x) ∼ GP(0, k_{u,u}^{n−τ_i,n−τ_i}(x, x′; θ_i)) (i = 1, 2, · · · , m), f^n(x) ∼ GP(0, k_{f,f}^{n,n}(x, x′; θ_{m+1})), (5)

to be m + 2 mutually independent Gaussian processes. Considering computational feasibility, we set the prior assumption on u^n rather than u^{n−1}. Assume the covariance functions to be of the following squared exponential form [18],

k_j(x, x′; θ_j) = σ_j^2 exp(−(1/2) Σ_{d=1}^D (x_d − x′_d)^2 / ℓ_{j,d}^2), j = 0, 1, · · · , m + 1, (6)

where θ_j = (σ_j^2, ℓ_{j,1}, · · · , ℓ_{j,D}) and {θ_j | j = 0, 1, · · · , m + 1} denotes the hyper-parameters. In short, any information of the PDEs, such as delay terms, can be encoded into MP-NGPs through assuming multiple priors, as long as the structure constructed by these priors is feasible.
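As a quick sanity check on the covariance assumption, the following sketch (the function name, hyper-parameter values and jitter level are our own illustration, not taken from the paper) builds a squared exponential Gram matrix and verifies that it is symmetric positive definite, which is what the Cholesky factorizations used later rely on.

```python
import numpy as np

def se_kernel(x, xp, theta=1.0, ell=0.5):
    """Squared exponential covariance k(x, x') = theta * exp(-(x - x')^2 / (2 ell^2))."""
    r = x[:, None] - xp[None, :]
    return theta * np.exp(-0.5 * r**2 / ell**2)

x = np.linspace(0.0, 1.0, 25)
K = se_kernel(x, x) + 1e-10 * np.eye(len(x))   # tiny jitter for numerical stability

assert np.allclose(K, K.T)                     # symmetric
L = np.linalg.cholesky(K)                      # succeeds only if K is positive definite
```

The jitter term is a standard numerical safeguard: the SE kernel is positive definite in exact arithmetic, but its Gram matrix can be numerically singular for closely spaced points.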
By Mercer's theorem [42], given a positive definite covariance function k(x, x′), we can expand it in a set of orthogonal basis functions and construct a reproducing kernel Hilbert space (RKHS) [43], which is the so-called kernel trick [43]. The covariance function in Equation (6) does not have a finite decomposition. Different covariance functions, such as rational quadratic covariance functions and Matérn covariance functions, can be appropriately selected under different prior information [18].

Training
By the properties of GPs [18], we can obtain y ∼ GP(0, K), where K is the covariance matrix assembled from the kernels induced by the operators L_x^0, L_x^1, · · · , L_x^{m+1} acting on the priors (5). It is worth noting that K is a symmetric matrix, so only its upper triangular part is written out in this paper.
Consider additive noise in the observed data, and introduce the notations ε_{u^n} ∼ N(0, σ_{u^n}^2 I), ε_{f^n} ∼ N(0, σ_{f^n}^2 I), ε_{u^{n−1}} ∼ N(0, σ_{u^{n−1}}^2 I) and ε_{u^{n−τ_i}} ∼ N(0, σ_{u^{n−τ_i}}^2 I) (i = 1, 2, · · · , m). Assume ε_{u^n}, ε_{f^n}, ε_{u^{n−1}} and ε_{u^{n−τ_i}} (i = 1, 2, · · · , m) to be m + 3 mutually independent additive noise terms. Notice that k_{b,b}^{n,n} is equal to k_{u,u}^{n,n}. That is, the boundary points can be seen as known points used for the next time step, as Dirichlet boundary conditions are considered. For other types of boundary conditions, the corresponding treatment will be introduced in Section 4.
By the priors (5), the hyper-parameters {θ_j | j = 0, 1, · · · , m + 1} are trained by minimizing the following negative log marginal likelihood (NLML) function,

NLML = (1/2) y^T K^{−1} y + (1/2) log |K| + (N/2) log 2π, (7)

where N is the length of y. The quasi-Newton optimizer L-BFGS-B [44] is applied for training in this paper. It should be pointed out that other optimization algorithms, such as the whale optimization algorithm (WOA) [45] and the ant colony optimization algorithm [46], can also be used for training. The training procedure is one of the most important parts of the algorithm, and can be seen as the "regression nature" of MP-NGPs. Moreover, to numerically solve the forward problem of PDEs, MP-NGPs subtly turn an algebraic problem into an optimization problem, which is more natural for machine learning methods.
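The training step can be sketched as follows for a plain single-output GP; the kernel, data and noise level below are illustrative stand-ins for the multi-prior covariance matrix K of MP-NGPs, not the paper's actual setup.

```python
import numpy as np
from scipy.optimize import minimize

def nlml(log_params, x, y, sigma_n=1e-2):
    """Negative log marginal likelihood of a zero-mean GP with an SE kernel."""
    theta, ell = np.exp(log_params)            # positivity via log re-parameterization
    r = x[:, None] - x[None, :]
    K = theta * np.exp(-0.5 * r**2 / ell**2) + sigma_n**2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(x) * np.log(2 * np.pi)

x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x)                      # synthetic training outputs
x0 = np.log([1.0, 0.2])                        # initial log hyper-parameters
res = minimize(nlml, x0, args=(x, y), method="L-BFGS-B")
```

The Cholesky-based evaluation of the NLML is the usual numerically stable route; L-BFGS-B only needs the objective value here, with gradients estimated by finite differences.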

Posterior
Based on the conditional distribution of Gaussian processes, we can easily obtain the following posterior distribution of the predicted solution u^n(x_*^n),

u^n(x_*^n) | y ∼ N(q^T K^{−1} y, k_{u,u}^{n,n}(x_*^n, x_*^n) − q^T K^{−1} q), (8)

where q denotes the vector of covariances between u^n(x_*^n) and the training outputs y.
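The standard conditional Gaussian formulas, posterior mean µ = qᵀK⁻¹y and posterior variance k(x_*, x_*) − qᵀK⁻¹q, can be sketched in a few lines; this is a generic single-output GP with an illustrative kernel and length scale, not the full MP-NGP covariance structure.

```python
import numpy as np

def se(x, xp, ell=0.3):
    return np.exp(-0.5 * (x[:, None] - xp[None, :])**2 / ell**2)

def gp_posterior(x_train, y_train, x_star, sigma_n=1e-3, ell=0.3):
    """Posterior mean and pointwise variance of a zero-mean GP given noisy data."""
    K = se(x_train, x_train, ell) + sigma_n**2 * np.eye(len(x_train))
    q = se(x_star, x_train, ell)                       # cross-covariances q^T
    mean = q @ np.linalg.solve(K, y_train)
    cov = se(x_star, x_star, ell) - q @ np.linalg.solve(K, q.T)
    return mean, np.diag(cov)

x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train)
x_star = np.linspace(0.0, 1.0, 50)
mean, var = gp_posterior(x_train, y_train, x_star)
```

The diagonal of the posterior covariance is exactly the pointwise uncertainty band plotted in the paper's figures.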

Basic Workflow
Summing up the above introduction, we have the following algorithm workflow:
1. Train the hyper-parameters by minimizing the NLML (7), using the initial data {x_u^0, u^0}, the delay term data {x_u^{1−τ_i}, u^{1−τ_i}} (i = 1, · · · , m), the inhomogeneous term data {x_f^1, f^1} and the boundary data {x_b^1, u_b^1}.
2. From the obtained hyper-parameters of step 1, (8) is used to predict the solution for the first time step and obtain artificially generated data {x_u^1, u^1}, where points x_u^1 are randomly sampled from Ω, and u^1 is the corresponding random vector. We obtain the delay term data {x_u^{2−τ_i}, u^{2−τ_i}} (i = 1, · · · , m) and the inhomogeneous term data {x_f^2, f^2} by randomly selecting points on Ω, then obtain the boundary data {x_b^2, u_b^2} by randomly selecting points on ∂Ω.
3. Train the hyper-parameters again by minimizing the NLML, using the data obtained in step 2.
4. From the obtained hyper-parameters of step 3, (10) is used to predict the solution for the second time step and obtain artificially generated data {x_u^2, u^2}, where points x_u^2 are randomly sampled from Ω. We also obtain the delay term data {x_u^{3−τ_i}, u^{3−τ_i}} (i = 1, · · · , m) and the inhomogeneous term data {x_f^3, f^3} by randomly selecting points on Ω, then obtain the boundary data {x_b^3, u_b^3} by randomly selecting points on ∂Ω.
5. Steps 3 and 4 are repeated until the last time step.
At each time step, we thus exploit Gaussian process regression to predict the solution at the next time step. Figure 1 shows the basic idea of MP-NGPs. At each iteration, the optimized hyper-parameters of the current time step are saved and used as initial values of the hyper-parameters in the next time step, which can significantly speed up the training process.
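The workflow can be made concrete on the simplest possible instance: the heat equation u_t = u_xx with homogeneous Dirichlet conditions, no delay term and fixed hyper-parameters (so the training step is omitted). Each backward Euler step conditions the GP prior on the boundary data and on u^{n−1} = (I − ∆t ∂²/∂x²)u^n at the artificially generated points, using the closed-form derivatives of the SE kernel. This is our own stripped-down sketch of the idea, not the paper's full multi-prior algorithm.

```python
import numpy as np

def k_se(x, xp, ell=1.0):
    r = x[:, None] - xp[None, :]
    return np.exp(-0.5 * r**2 / ell**2)

def d2k(x, xp, ell=1.0):
    # d^2 k / dx'^2 (equal to d^2 k / dx^2 for the SE kernel)
    r = x[:, None] - xp[None, :]
    return (r**2 / ell**4 - 1.0 / ell**2) * k_se(x, xp, ell)

def d2d2k(x, xp, ell=1.0):
    # mixed fourth derivative d^4 k / (dx^2 dx'^2)
    r = x[:, None] - xp[None, :]
    return (r**4 / ell**8 - 6.0 * r**2 / ell**6 + 3.0 / ell**4) * k_se(x, xp, ell)

def ngp_step(x_d, u_prev, x_b, u_b, x_star, dt, ell=1.0, jitter=1e-8):
    """One backward Euler NGP step for u_t = u_xx, conditioning on u^{n-1} = (I - dt d2/dx2) u^n."""
    K_bb = k_se(x_b, x_b, ell)
    K_bd = k_se(x_b, x_d, ell) - dt * d2k(x_b, x_d, ell)
    K_dd = (k_se(x_d, x_d, ell) - 2.0 * dt * d2k(x_d, x_d, ell)
            + dt**2 * d2d2k(x_d, x_d, ell))
    K = np.block([[K_bb, K_bd], [K_bd.T, K_dd]]) + jitter * np.eye(len(x_b) + len(x_d))
    y = np.concatenate([u_b, u_prev])
    q = np.hstack([k_se(x_star, x_b, ell),
                   k_se(x_star, x_d, ell) - dt * d2k(x_star, x_d, ell)])
    return q @ np.linalg.solve(K, y)           # posterior mean of u^n at x_star

dt, n_steps = 0.01, 10
x_d = np.linspace(0.0, np.pi, 30)              # artificially generated data points
x_b = np.array([0.0, np.pi])                   # Dirichlet boundary, u = 0
u = np.sin(x_d)                                # initial condition; exact u = exp(-t) sin(x)
for _ in range(n_steps):
    u = ngp_step(x_d, u, x_b, np.zeros(2), x_d, dt)
err = np.max(np.abs(u - np.exp(-dt * n_steps) * np.sin(x_d)))
```

Each iteration feeds the posterior mean at x_d back in as the "artificially generated data" of the next step, exactly mirroring steps 2 to 5 of the workflow above.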

Processing of Fractional-Order Operators
For simplicity, we consider the following delay problem (11) to show the processing of the fractional-order operators, where the continuous real function u(x, t) is absolutely integrable in R². Assume arbitrary-order partial derivatives of u(x, t) to be continuous and absolutely integrable in R². Apply the backward Euler scheme to (11) with g(u^n, x) and h(u^n, x) replaced by g(µ^{n−1}, x) and h(µ^{n−1}, x), respectively. From (12), we obtain the simplified equation with the operators L_x^0 u^n, L_x^1 u^{n−τ} and L_x^2 f^n = −∆t · f^n. Assume the Gaussian process prior as in Section 2.1; we can then easily obtain the corresponding results from Section 2.2. How to deal with the first term on the right side of (15) is one of the most important tasks. Denote k_in(x, x′) = L_x^0 L_{x′}^0 k_{u,u}^{n,n}(x, x′). According to [40], the Fourier transform is carried out with respect to the spatial variables (x, x′) on k_{u,u}^{n,n}(x, x′) to obtain k̃_{u,u}^{n,n}(ω, ω′); the excellent properties of the squared exponential covariance function k_{u,u}^{n,n}(x, x′) assure the feasibility of the Fourier transform of its derivatives, since k_{u,u}^{n,n}(x, x′) and its partial derivatives vanish as x² + x′² → +∞. Then we can obtain the intermediate kernels k̃_{u,u}^{n,n−1}(ω, ω′) and k̃_in(ω, ω′). At last, the inverse Fourier transform is performed on k̃_{u,u}^{n,n−1}(ω, ω′) and k̃_in(ω, ω′) to obtain the kernels k_{u,u}^{n,n−1}(x, x′) and k_in(x, x′).
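Although the fractional kernels above are obtained analytically via Fourier transforms, the Caputo derivative (2) itself is easy to check numerically; the sketch below uses the standard L1 discretization (our own illustration, independent of MP-NGPs) and compares it against the closed-form Caputo derivative of a monomial, D^α x^p = Γ(p+1)/Γ(p+1−α) x^{p−α}.

```python
import numpy as np
from math import gamma

def caputo_l1(f, x, alpha, h=1e-3):
    """L1 scheme for the Caputo derivative of order 0 < alpha < 1 at x > 0."""
    n = int(round(x / h))
    grid = np.linspace(0.0, n * h, n + 1)
    df = np.diff(f(grid))                              # f(x_{k+1}) - f(x_k)
    j = np.arange(n)
    b = (j + 1)**(1.0 - alpha) - j**(1.0 - alpha)      # L1 weights b_j
    # weight b_j multiplies the difference ending at x_{n-j}
    return h**(-alpha) / gamma(2.0 - alpha) * np.sum(b * df[::-1])

alpha = 0.5
approx = caputo_l1(lambda t: t**2, 1.0, alpha)
exact = gamma(3.0) / gamma(3.0 - alpha)                # D^0.5 x^2 = Γ(3)/Γ(2.5) x^1.5 at x = 1
```

The L1 scheme converges at rate O(h^{2−α}), so with h = 10⁻³ the agreement here is to roughly five digits.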

Numerical Tests
To verify the validity of the proposed method, three examples are provided. Firstly, we introduce the maximum absolute error E_∞ to denote the difference between the conditional posterior mean of the predicted solution and the exact solution:

E_∞(t_n) = max_{x_*} |u^n(x_*) − ū^n(x_*)|,

where t_n is the current time, x_* is a point randomly sampled from Ω, u^n(x_*) is the exact solution, and ū^n(x_*) is the posterior mean of u^n(x_*) (the predicted solution). We also introduce the notation max E_∞(∆t) = max_{1≤n≤N} E_∞(t_n), which can be seen as a measure of the prediction error under a given time step size ∆t.
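In code, these error measures amount to the following helpers (the function names are ours), together with the relative spatial error L² used in the experiments:

```python
import numpy as np

def e_inf(u_exact, u_pred):
    """Maximum absolute error E_inf(t_n) over the sampled spatial points."""
    return np.max(np.abs(u_exact - u_pred))

def max_e_inf(errors_per_step):
    """max E_inf(dt) = maximum of E_inf(t_n) over the time steps n = 1, ..., N."""
    return np.max(errors_per_step)

def rel_l2(u_exact, u_pred):
    """Relative spatial L2 error over the sampled spatial points."""
    return np.linalg.norm(u_exact - u_pred) / np.linalg.norm(u_exact)

u_exact = np.array([1.0, 2.0, 3.0])
u_pred = np.array([1.1, 1.9, 3.05])
```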
Then we introduce the relative spatial error L² to represent the relative error of the prediction [29],

L²(t_n) = ‖u^n − ū^n‖_2 / ‖u^n‖_2,

evaluated over the sampled spatial points. Example 1 is a fractional partial differential equation with one delay (19), where 0 < α, β < 1. The fractional partial derivatives ∂^α u(x, t)/∂x^α and ∂^β u(x, t)/∂x^β are defined by (2) in the Caputo sense [40].
We fix (α, β) to be (0.25, 0.75), (0.5, 0.5) and (0.75, 0.25) in turn. The exact solution of (19) is u(x, t) = e^{0.5x}(1 + t²), and the inhomogeneous term f(x, t) changes accordingly when (α, β) is changed. Numerical tests of solving (19) are performed with noiseless data and noisy data, respectively, where the backward Euler scheme is employed. We only add noise into the initial data u^0, and the additive noise of u^0 is ε_{u^0} ∼ N(0, 0.1² I). Figures 2–4 show the posterior distribution of the predicted solution at some time nodes, where we fix the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) to be 20 and ∆t to be 0.01, and take (α, β) to be (0.25, 0.75), (0.5, 0.5) and (0.75, 0.25) in turn. In each of these figures, (a) presents the results of the experiments with noiseless data, and (b) presents the results with noisy data. Moreover, Table 1 shows the maximum absolute error E_∞ between the conditional posterior mean of the predicted solution and the exact solution at some time nodes in detail, where noiseless data are used in the experiments. Table 2 shows the maximum absolute error E_∞ at some time nodes, where noisy data are used. It is found that the method works well when solving nonlinear delay fractional equations, and, as expected, higher prediction accuracy is obtained with noiseless data. Table 1. Fractional equation of Example 1: maximum absolute error E_∞ at some time nodes, where ∆t = 0.01, (α, β) is fixed to be (0.25, 0.75), (0.5, 0.5) and (0.75, 0.25) in turn, and noiseless data are used.
Example 2 is a partial integro-differential equation with one delay (21), whose exact solution is given analytically. Numerical tests of solving (21) are performed with noiseless data and noisy data, respectively, where the backward Euler scheme is employed. We add noise into the initial data u^0 and the inhomogeneous term data f^n (n = 1, 2, · · · , 100), and the additive noise of u^0 and f^n is ε_{u^0} ∼ N(0, 2² I) and ε_{f^n} ∼ N(0, 4² I), respectively. Figure 5 shows the posterior distribution of the predicted solution at different time nodes, where we fix the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) to be 10 and ∆t to be 0.01. In Figure 5, the five pictures on the upper side present the results of the experiments with noiseless data, and the five pictures on the lower side present the results with noisy data. Moreover, Table 3 shows the maximum absolute error E_∞ at some time nodes in detail, where noiseless data and noisy data are used in the experiments, respectively. It is found that the method works well when solving nonlinear delay integro-differential equations. It is worth noting that the computational accuracy of the predicted solution is not sensitive to noise, even though the level of noise added into the data is not low.
Example 3 defines a problem (22) with two delays and three spatial dimensions, whose exact solution is given analytically. Numerical tests applying the backward Euler scheme to solve (22) are performed with noiseless data. Figure 6 shows the relative spatial error L² and the maximum absolute error E_∞ as time increases, where we fix the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) to be 20 and ∆t to be 0.01. The number of training data points on each boundary line is fixed to be 5. Furthermore, Table 4 shows the relative spatial error L² and the maximum absolute error E_∞ at some time nodes in detail. According to the experimental outcomes, we can conclude that the method is still effective when solving nonlinear time-dependent delay equations with high dimensions and multi-delays.

Runge-Kutta Methods
For the purpose of improving the accuracy of the methodology considered in Section 2, we will consider Runge-Kutta methods [47,48] based on the frame of MP-NGPs.

MP-NGPs Based on Runge-Kutta Methods
Apply Runge–Kutta methods with q stages to the first equation of (1), with appropriate approximation of the nonlinear terms, and we can obtain the scheme (24) [47]. Reforming (24), we have (25), where u^{n+ς_j} = u(x, t_n + ς_j ∆t), u^{n+ς_j−τ_k} = u(x, t_n + ς_j ∆t − s_k), and f^{n+ς_j} = f(x, t_n + ς_j ∆t).
Prior assumptions are set on u^{n+1}, u^{n+ς_i}, f^{n+ς_i}, u^{n+ς_i−τ_k} (i = 1, 2, · · · , q; k = 1, 2, · · · , m), and Equations (25) turn into a regression problem. We then directly write the joint distribution of u^{n+1}, u^{n+ς_i}, f^{n+ς_i}, u^{n+ς_i−τ_k}, u_{q+1}^n, u_i^n (i = 1, 2, · · · , q; k = 1, 2, · · · , m). For simplicity, we introduce the Runge–Kutta methods by solving the problem (26). The trapezoidal rule is applied under the structure of Runge–Kutta methods to (26), with g(u^{n+1}, x), h(u^{n+1}, x), g(u^n, x) and h(u^n, x) replaced by g(µ^n, x) and h(µ^n, x), respectively. Then we have (27). Reforming (27), we introduce the auxiliary variables u_1^n := u^n, u_2^n and u_3^n, and obtain (28). Note that Equation (28) coincides with the structure of Runge–Kutta methods when q = 2, ς_1 = 0, ς_2 = 1, b_1 = b_2 = 1/2, a_11 = a_12 = 0, and a_21 = a_22 = 1/2. Reforming (28), we obtain (29), where L_x^0, L_x^1, L_x^2 and L_x^3 are linear operators. Assume the prior hypotheses to be mutually independent Gaussian processes, with covariance functions of the squared exponential form (6). By the properties of Gaussian processes, we obtain the joint distribution with blocks such as k_{1,1}^{n,n} = k_{u,u}^{n,n}(x_u^n, x_u^n) + σ²_{u^n} I, k_{1,2}^{n,n} = L_x^3 k_{u,u}^{n,n}(x_u^n, x_u^n) and k_{2,2}^{n,n} = L_x^0 L_{x′}^0 k_{u,u}^{n,n}(x_u^n, x_u^n); all other kernels are assumed to be 0. We can write the posterior distribution of the prediction u^{n+1}(x_*^{n+1}) as in (32). Since the conditioning data are artificially generated, we cannot directly use the outcome of the conditional distribution (32) in the next time step. To propagate the uncertainty, we should marginalize u^{n−1} (and u^{n−τ}, u^{n+1−τ}) to obtain (34), where Σ_{2×2}^{n,n} denotes the corresponding 2 × 2 block covariance matrix.
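As a baseline for the convergence results reported in the next subsection, recall that the classical trapezoidal rule, i.e., the q = 2 tableau above, is second-order accurate on a scalar test equation; a minimal check (our own illustration, independent of the GP machinery):

```python
import numpy as np

def trapezoidal(lam, dt, T):
    """Implicit trapezoidal rule for u' = lam * u, u(0) = 1."""
    u = 1.0
    for _ in range(int(round(T / dt))):
        u *= (1.0 + 0.5 * dt * lam) / (1.0 - 0.5 * dt * lam)
    return u

errs = [abs(trapezoidal(-1.0, dt, 1.0) - np.exp(-1.0)) for dt in (0.1, 0.05, 0.025)]
orders = np.log(np.array(errs[:-1]) / errs[1:]) / np.log(2.0)   # observed orders, both near 2
```

Halving the step size reduces the error by roughly a factor of four, confirming second order for the pure time discretization; any lower observed order in the full algorithm must therefore come from the other error sources in the GP pipeline.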

Numerical Test for the Problem with One Delay Applying the Trapezoidal Rule
In this subsection, the methods applying the backward Euler scheme and the trapezoidal scheme are applied to solve Example 4, and numerical tests are carried out. Moreover, the impact of data noise and of the amount of training data on the maximum absolute error E_∞ is investigated.
The exact solution of (35) is u(x, t) = e^{−t}(cos x + sin x), and the inhomogeneous term is f(x, t) = e^{−t}(cos x + sin x){2x + e^{−2t}(cos x + sin x)² + sin[e^{−t}(cos x + sin x)]} − e^{0.2−2t} x cos 2x. Table 5 presents the maximum absolute error E_∞ between the conditional posterior mean of the predicted solution and the exact solution at five time nodes for different time step sizes ∆t under the backward Euler scheme, where the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) is fixed to be 20. It is found that the method applying the backward Euler scheme is first-order accurate in time, which is in line with the first-order convergence property of the backward Euler scheme. Since calculating the convergence order in time requires the maximum absolute errors under two consecutive time step sizes, we omit the first value in the last column of Tables 5 and 6. Figure 7a shows the impact of the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) on max E_∞(∆t) = max_{1≤n≤N} E_∞(t_n), where we take ∆t as 10^{−1}, 10^{−2} and 10^{−3} in turn. The results show that when ∆t is equal to 10^{−1}, the prediction accuracy quickly reaches saturation as the number of training data points increases, but when ∆t is taken smaller, increasing the amount of training data still has the potential to improve the prediction accuracy, and the saturation behavior occurs much later.
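The temporal convergence order reported in the tables follows from errors at two consecutive step sizes via p ≈ log(E(∆t₁)/E(∆t₂)) / log(∆t₁/∆t₂); a small helper, exercised on synthetic first-order data (illustrative numbers, not the values from Table 5):

```python
import numpy as np

def observed_orders(dts, errors):
    """Observed temporal convergence orders between consecutive step sizes."""
    h, e = np.asarray(dts, float), np.asarray(errors, float)
    return np.log(e[:-1] / e[1:]) / np.log(h[:-1] / h[1:])

# synthetic data with E(dt) = C * dt, i.e., exact first-order behavior
dts = [0.1, 0.05, 0.01]
errors = [2.0e-2, 1.0e-2, 2.0e-3]
p = observed_orders(dts, errors)
```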
Moreover, we investigate the impact of noise on the prediction accuracy. We add noise into the initial data u^0, the delay term data u^{−τ} (τ = 1, 2, · · · , 19) and the inhomogeneous term data f^n (n = 1, 2, · · · , 100) to investigate the impact of the level of the noise on the accuracy. The additive noise of u^0, u^{−τ} and f^n is ε_{u^0} ∼ N(0, σ² I), ε_{u^{−τ}} ∼ N(0, σ² I) and ε_{f^n} ∼ N(0, (σ²/4) I), respectively. Take t = 0.2, 0.6, 1.0, and observe the change of E_∞ under different σ at these time nodes (see Figure 7b). The experimental outcome shows that the larger σ is, the lower the prediction accuracy. As the iteration progresses, the impact of the noise on the accuracy of the current time step becomes smaller, and the accuracy gradually tends to increase, as the hyper-parameters that contain the knowledge of the considered problem are trained better and better. Table 6 shows the maximum absolute error E_∞ between the conditional posterior mean of the predicted solution and the exact solution at five time nodes for different time step sizes ∆t under the trapezoidal scheme, where the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) is fixed to be 20. It is found that the method applying the trapezoidal rule is still first-order accurate in time, which is apparently not consistent with the second-order convergence property of the trapezoidal rule. Therefore, we can conclude from these results that the convergence rate of MP-NGPs using Runge–Kutta methods is also first order. Fortunately, the accuracy of the method applying the trapezoidal scheme (Runge–Kutta methods) does exceed that of the backward Euler scheme when the other experimental conditions are fixed, which means that applying Runge–Kutta methods in MP-NGPs is still very useful when the current accuracy is not satisfactory.
The results of Figure 8 also prove that the Runge–Kutta methods do exceed the backward Euler scheme in prediction accuracy. Figure 7. Results of Example 4: (a) impact of the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) on max E_∞(∆t) = max_{1≤n≤N} E_∞(t_n) when the backward Euler scheme is employed, with ∆t fixed to be 10^{−1}, 10^{−2} and 10^{−3} in turn; (b) impact of the level of the additive noise on the maximum absolute error E_∞ when the backward Euler scheme is used, with the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) fixed to be 20 and ∆t = 0.01.

Processing of Neumann and Mixed Boundary Conditions
In the above sections, we did not mention the processing of boundary conditions other than Dirichlet boundary conditions. This section will introduce how to solve PDEs with Neumann or mixed boundary conditions in the two-dimensional setting of one temporal and one spatial variable; that is, the spatial vector x is one-dimensional.
As is well known, Dirichlet, Neumann and mixed boundary conditions have the following general forms, respectively, in the two-dimensional problem (1). In order to avoid repetition, we focus on the processing of Neumann and mixed boundary conditions in this section.

Neumann Boundary Conditions
For the Neumann boundary conditions (37), the boundary data prescribe the spatial derivative v^n = ∂u^n/∂x on ∂Ω. Note L_x^v u^n = ∂u^n/∂x, and L_x^v is a linear operator. Returning to problem (1), we can then get y ∼ GP(0, K), where the covariance blocks involving the boundary outputs are obtained by applying L_x^v to the kernel k_{u,u}^{n,n}. The additive noise ε_{v^n} is assumed to be independent of the other noise terms.
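The operator L_x^v enters the covariance structure through derivatives of the kernel; for the squared exponential kernel, cov(u(x), ∂u(x′)/∂x′) is available in closed form, which a finite difference quickly confirms (the length scale and evaluation points below are our own choices):

```python
import numpy as np

def k_se(x, xp, ell=0.4):
    return np.exp(-0.5 * (x - xp)**2 / ell**2)

def k_u_du(x, xp, ell=0.4):
    """cov(u(x), u'(x')) = d/dx' k(x, x') for the SE kernel."""
    return (x - xp) / ell**2 * k_se(x, xp, ell)

x, xp, h = 0.3, 0.7, 1e-6
fd = (k_se(x, xp + h) - k_se(x, xp - h)) / (2.0 * h)   # central finite difference in x'
```

Because differentiation is linear, the derivative observations remain jointly Gaussian with the solution values, which is exactly what lets Neumann data enter y without changing the inference machinery.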

Mixed Boundary Conditions
For the mixed boundary conditions (38), the processing of boundary information is more complicated.

Numerical Tests for a Problem with Two Delays and Different Types of Boundary Conditions
In this subsection, we will solve a nonlinear PDE with two delays, and Dirichlet, Neumann and mixed boundary conditions will be considered in this PDE, in turn.
Example 5 is a nonlinear PDE with two delays (51). The exact solution of (51) is u(x, t) = (1 + t²)(x/(2π) + sin x), and the inhomogeneous term f(x, t) is determined accordingly. Dirichlet, Neumann and mixed boundary conditions are considered for Example 5 in turn. Numerical tests of solving (51) are performed with noiseless data. Figure 9 shows the posterior distribution of the predicted solution at different time nodes, where we fix the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) to be 10 and ∆t to be 0.01. Figure 9a–c presents the results of the experiments with Dirichlet, Neumann and mixed boundary conditions, respectively, and Figure 9d shows the maximum absolute error E_∞ as time increases. Moreover, Table 7 shows the maximum absolute error E_∞ at some time nodes in detail, where Dirichlet, Neumann and mixed boundary conditions are all considered. It is found that the method based on MP-NGPs works well when solving nonlinear delay PDEs with Dirichlet, Neumann or mixed boundary conditions, and the prediction accuracy under the different types of boundary conditions is close.

Processing of Wave Equations
To further demonstrate the efficiency of MP-NGPs, we give the processing of another type of time-dependent PDEs, wave equations (55), where φ_1(t), φ_2(t), ψ(x, t) and ϕ(x) are given functions. Apparently, delay terms and inhomogeneous terms are also considered in the wave equations.
We will give two methods of how to solve wave equations by the algorithm of MP-NGPs.

Second-Order Central Difference Scheme
The first method is to apply a second-order central difference scheme to the second-order time derivative in the first equation of (55) to discretize the temporal variable:

(u^{n+1} − 2u^n + u^{n−1}) / ∆t² = ∂²u^n/∂x² + ∂u^{n−τ}/∂x + f^n, (56)

where n = 0, 1, · · · , N − 1, N = T/∆t, u^n = u(x, t_n), u^{n−τ} = u(x, t_n − s), τ = s/∆t and f^n = f(x, t_n). Rearranging (56), we obtain

u^{n+1} = 2u^n + ∆t² ∂²u^n/∂x² − u^{n−1} + ∆t² ∂u^{n−τ}/∂x + ∆t² f^n. (57)

Obviously, the scheme (56) is a multi-step method, while the backward Euler scheme is a one-step method. Then introduce the notations L_x^0 u^n = 2u^n + ∆t² ∂²u^n/∂x², L_x^1 u^{n−1} = −u^{n−1}, L_x^2 u^{n−τ} = ∆t² ∂u^{n−τ}/∂x and L_x^3 f^n = ∆t² f^n, so we can obtain the simplified equation u^{n+1} = L_x^0 u^n + L_x^1 u^{n−1} + L_x^2 u^{n−τ} + L_x^3 f^n. Assume the Gaussian process prior as in Section 2.1. According to the properties of Gaussian processes, we can then obtain y ∼ GP(0, K), whose blocks include k_{u,u}^{n,n} = k_{u,u}^{n,n}(x_u^n, x_u^n) + σ²_{u^n} I; any kernel not mentioned above is equal to 0. By (5), train the hyper-parameters {θ_j | j = 0, 1, 2, 3} by minimizing the NLML (60). After the training part, the posterior distribution of u^{n+1}(x_*^{n+1}) can be directly written based on the conditional distribution of Gaussian processes. On the basis of Section 2.4, in order to accurately propagate the uncertainty, marginalize u^n, u^{n−1} (and u^{n−τ} when n > τ) to obtain the predictive distribution with mean µ^{n+1}(x_*^{n+1}) and S = diag(Σ^{n,n}, Σ^{n−1,n−1}, Σ^{n−τ,n−τ}, 0, 0) (if n = 0, replace Σ^{n,n} by 0; if n < 2, replace Σ^{n−1,n−1} by 0; if n ≤ τ, replace Σ^{n−τ,n−τ} by 0).
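Stripped of the GP machinery and of the delay term, the time discretization (56) is the classical leap-frog scheme; the following finite difference sketch on the delay-free test problem u_tt = u_xx (our own illustration, not part of MP-NGPs) shows its mechanics, including the start-up step that a multi-step method requires:

```python
import numpy as np

def lap(u, dx):
    """Second-order spatial Laplacian with homogeneous Dirichlet boundaries."""
    out = np.zeros_like(u)
    out[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    return out

dx, dt, nt = np.pi / 100, 0.005, 200
x = np.linspace(0.0, np.pi, 101)
u_prev = np.sin(x)                                   # u(x, 0); exact u = cos(t) sin(x)
u_curr = u_prev + 0.5 * dt**2 * lap(u_prev, dx)      # u^1 via Taylor start (u_t(x, 0) = 0)
for _ in range(nt - 1):                              # leap-frog: u^{n+1} = 2u^n - u^{n-1} + dt^2 u_xx
    u_prev, u_curr = u_curr, 2.0 * u_curr - u_prev + dt**2 * lap(u_curr, dx)
err = np.max(np.abs(u_curr - np.cos(nt * dt) * np.sin(x)))
```

The need for two previous levels u^n and u^{n−1} at each step is what forces the marginalization over two (or, with delay, three) past states described above.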

Backward Euler Scheme
Firstly, write ∂u(x, t)/∂t = v(x, t); then we obtain the first-order system ∂u^n/∂t = v^n, ∂v^n/∂t = ∂²u^n/∂x² + ∂u^{n−τ}/∂x + f^n, to which the backward Euler scheme of Section 2 is applied. Any kernel not mentioned above is equal to 0. By (5), train the hyper-parameters {θ_j | j = 0, 1, 2, 3} by minimizing the NLML (69). After the training part, the joint posterior distribution of u^{n+1}(x_*^n) can be directly written with the covariance vector q^T = [k_{u,u}^{n,n}(x_*^n, x_b^n), k_{u,u}^{n,n−τ}(x_*^n, x_u^{n−τ}), k_{u,f}^{n,n}(x_*^n, x_f^n), k_{u,u}^{n,n−1}(x_*^n, x_u^{n−1}), k_{u,v}^{n,n−1}(x_*^n, x_u^{n−1})]. On the basis of Section 2.4, in order to accurately propagate the uncertainty, marginalize u^{n−1}, v^{n−1} (and u^{n−τ} when n > τ), conditioning on the data u_b^1, · · · , u_b^n, u^{1−τ}, · · · , u^{n−τ}, f^1, · · · , f^n, u^0 and v^0.

Example 6 is the wave equation with one delay (74). The exact solution of (74) is u(x, t) = e^{−t}(cos 2x + sin x), and the inhomogeneous term is f(x, t) = e^{−t}(5 cos 2x + 2 sin x) − e^{0.2−t}(cos x − 2 sin 2x).
Numerical tests of solving (74) are performed with noiseless data. Figure 10 shows the posterior distribution of the predicted solution at different time nodes, where we fix the number of initial data points (inhomogeneous term data points, artificially generated data points and delay term data points) to be 15 and ∆t to be 0.01. Figure 10a,b presents the results of the experiments employing the second-order central difference scheme and the backward Euler scheme, respectively, and Figure 10c shows the maximum absolute error E_∞ as time increases. Moreover, Table 8 shows the maximum absolute error E_∞ at some time nodes in detail. According to the results of the numerical tests, we conclude that the two methods based on MP-NGPs effectively solve the wave equation with one delay. The second method, applying the backward Euler scheme, behaves better in prediction accuracy than the first method, which applies the second-order central difference scheme.
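The reduction to the first-order system u_t = v, v_t = u_xx + · · · used by the second method can be illustrated with a delay-free finite difference analogue; the sketch below (our own test problem, not the GP formulation) applies backward Euler as (I − ∆t A)w^{k+1} = w^k to the semi-discretized wave equation u_tt = u_xx:

```python
import numpy as np

m = 50
dx = np.pi / m
xi = np.linspace(0.0, np.pi, m + 1)[1:-1]             # interior points (Dirichlet u = 0)
D2 = (np.diag(-2.0 * np.ones(m - 1)) + np.diag(np.ones(m - 2), 1)
      + np.diag(np.ones(m - 2), -1)) / dx**2          # discrete Laplacian
n = m - 1
A = np.block([[np.zeros((n, n)), np.eye(n)],          # w = (u, v), w' = A w
              [D2, np.zeros((n, n))]])
dt, nt = 0.01, 100
M = np.eye(2 * n) - dt * A                            # backward Euler: (I - dt A) w^{k+1} = w^k
w = np.concatenate([np.sin(xi), np.zeros(n)])         # u(x, 0) = sin x, v(x, 0) = 0
for _ in range(nt):
    w = np.linalg.solve(M, w)
u = w[:n]                                             # exact: u = cos(t) sin(x) at t = nt * dt
err = np.max(np.abs(u - np.cos(nt * dt) * np.sin(xi)))
```

Backward Euler mildly damps the oscillation (its amplification factor has modulus below one), which is consistent with the first-order-in-time behavior reported for this method.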

Discussions
In this paper, a new method, called multi-prior numerical Gaussian processes (MP-NGPs), is designed and proposed on the basis of numerical Gaussian processes (NGPs) to solve complex time-dependent PDEs. Compared with NGPs, the proposed MP-NGPs possess a structure with a more rigorous mathematical derivation, and their application is more flexible: delay terms and inhomogeneous terms can be incorporated into MP-NGPs. The main idea of MP-NGPs is to discretize time, apply Gaussian process regression to transform the potential information of the PDEs into a scheme controlled by several operators, and predict the solution from a statistical perspective. MP-NGPs build a Gaussian process regression at each time step to predict the solution at the next time step. The basic idea of MP-NGPs is very similar to that of NGPs, but the applications of MP-NGPs are explored more deeply. Moreover, the methodology of MP-NGPs has natural advantages in handling noisy data and quantifying the uncertainty of the predicted solution. It should be mentioned that MP-NGPs can only solve a class of fractional-order PDEs composed of the special fractional-order derivatives (2); other kinds of fractional-order derivatives still deserve further research in MP-NGPs. Moreover, multi-spatial-dimension irregular domain problems, the inverse problems of PDEs, such as estimating constant or variable coefficients based on Gaussian processes, and some specific application problems of MP-NGPs may be the directions of our future investigation.

Conclusions
In this manuscript, the possibility of MP-NGPs in solving nonlinear delay partial differential equations with multi-delays is explored. The equations investigated include partial differential equations, partial integro-differential equations, fractional partial differential equations and wave equations. We also consider Neumann and mixed boundary conditions of PDEs in MP-NGPs. Numerical experiments show that the proposed method can solve nonlinear delay partial differential equations with satisfactory precision, and the algorithm applying the backward Euler scheme is stably first-order accurate in time. To improve the prediction accuracy, Runge–Kutta methods under the frame of MP-NGPs are proposed in this manuscript. The algorithm applying the Runge–Kutta methods can effectively improve the prediction accuracy, although the temporal convergence rate of this algorithm is still first order according to the numerical experiments.