Pseudo-Likelihood Estimation for Parameters of Stochastic Time-Fractional Diffusion Equations

: Although stochastic fractional partial differential equations have received increasing attention in the last decade, the parameter estimation of these equations has been seldom reported in literature. In this paper, we propose a pseudo-likelihood approach to estimating the parameters of stochastic time-fractional diffusion equations, whose forward solver has been investigated very re-cently by Gunzburger, Li, and Wang (2019). Our approach can accurately recover the fractional order, diffusion coefﬁcient, as well as noise magnitude given the discrete observation data corresponding to only one realization of driving noise. When only partial data is available, our approach can also attain acceptable results for intermediate sparsity of observation.


Introduction
Stochastic fractional partial differential equations (SFPDEs) have received a fair amount of attention in the last decade. Research in that direction involves constructing new SFPDE models, proving well-posedness of model solution, and developing new numerical solution methods. Some of the SFPDEs can be written in the following general form: where β is the order of Caputo fractional time derivative, c is diffusion coefficient,α(x) is the spatially variable order of fractional Laplacian, f (x, t) is a prescribed source term and I 1−β t is fractional time integration operator,Ẇ(x, t) is a space-time white noise, and is the magnitude of the noise. Mijena and Nane [1] proved the existence and uniqueness of mild solutions to the space-time fractional nonlinear PDE (1) with β ∈ (0, 1), α(x) ≡ α, and a Lipschitz continuous σ(·). Anh, Leonenko, and Ruiz-Medina [2] derived the weak-sense Gaussian solution to the SFPDE (1) with β ∈ (0, 1), σ(·) ≡ 1, and pseudo-differential operator (−∆)˜α (x)/2 . Gunzburger, Li, and Wang [3] proposed a time discretization scheme for stochastic time-fractional PDE (1) with β ∈ (0, 1),α(x) ≡ 2, and σ(·) ≡ 1. Some other authors considered SFPDEs that differ from the general form (1). Bolin, Kirchner, and Kovács [4] presented a numerical solution method for an elliptic SFPDE (κ − ∆) β u(x) = W(x), where W(x) is a Gaussian white noise. Anh, Olenko, and Wang [5] constructed a SFPDE to model the evolution of a random tangent vector field on the unit sphere. Mohammed citemohammed2021approximate and Xia and Yan [7] considered FPDEs driven by multiplicative Brownian motion and fractional Brownian motion, respectively.
In contrast to the rapid development of numerical solution to forward problems, parameter estimation for SFPDEs has not yet been fully investigated, although there have been many works on parameter estimation for SPDEs [8][9][10][11]. Cialenco, Lototsk, and Pospisil [12] and Cialenco [13] studied the parameter estimation of diffusion coefficient and Hurst index for standard (integer-order) diffusion driven by additive and multiplicative fractional noises. When the fractional ingredient in the driving noise is excluded, the equations they considered are more like stochastic integer-order PDEs, which are not of the concern of current study. Geldhauser and Valdinociun [14] did consider a SFPDE, and they estimated the order of fractional Laplacian in the context of optimal control.
Parameter estimation for deterministic and stochastic FPDEs are different. Inversion for deterministic FPDEs often involves solving forward problems, while inversion for stochastic FPDE does not. Specifically, for deterministic FPDEs, once the source term f (x, t) and initial/boundary conditions are known, one can solve forward problems under different equation parameters, and after utilizing certain inversion techniques such as regularizing nonlinear least squares [15], surrogate models [16][17][18] and physics-informed deep learning [19], one can select the parameters that reconstruct the observation of u as the estimated parameters. For stochastic FPDEs, however, one does not know the specific realization of noise that corresponds to the observation, and thus cannot solve the forward problems even though other conditions are prescribed. Maximum likelihood estimation (MLE) is a powerful means to handle the parameter estimation problems for SPDEs, which can bypass solving forward problems. To the best of our knowledge, there are very few reports on MLE and its variants for parameter estimation of SFPDEs. On the other hand, there have been some efforts on theoretical analyses, such as consistency and asymptotic efficiency, for maximum likelihood estimators for parameters in SPDEs, for example, in [8,11], but the corresponding numerical studies are fewer than the theoretical studies.
In this paper, we propose a pseudo-likelihood estimation approach for SFPDE (1) with β ∈ (0, 1),α(x) ≡ 2, and σ(·) ≡ 1. We extend the pseudo-likelihood approach for solving stochastic ordinary differential equations (SODEs) to SFPDEs, despite the fact that the extension is not trivial. The paper is organized as follows. Section 2 introduces the SFPDE we consider and defines the parameter estimation problem. Section 3 elaborates on pseudo-likelihood approach for the SFPDE, before which we briefly review the approach for SODEs since it inspires us. Section 4 demonstrates some numerical results for fabricated observation data. The last section gives some remarks on the proposed approach.

Parameter Estimation Problem
We consider a one-dimensional stochastic time-fractional diffusion problem [3]   where u(x, t) is a space-time distribution of concentration of certain particles, the diffusion coefficient c > 0, the fractional order α ∈ (0, 1), and the initial condition η(x) and the source term f (x, t) are prescribed. The stochastic fractional PDE is driven by a space-time white noise ∂ t W(x, t) :=Ẇ(x, t), which is the time derivative of a cylindrical Wiener process W(x, t) in L 2 ((0, 1)) defined by [20] W( with φ j (x) being the normalized eigenfunction of negative Laplacian −∆, i.e., φ j (x) = √ 2 sin(jπx) for x ∈ (0, 1) and {W j (t)} +∞ j=1 being independent one-dimensional Wiener processes. The positive constant before the white noise determines the noise magnitude. The domain of Laplacian ∆ is {ψ ∈ H 1 0 ((0, 1)) : ∆ψ ∈ L 2 ((0, 1)). The left-sided Caputo fractional time derivative is considered: where Γ(α) is the Gamma function. Note that for ψ(x, 0) = 0 the left-sided Caputo fractional time derivative is equivalent to left-sided Riemann-Liouville fractional time derivative. Moreover, applying fractional integration operator ∂ α−1 t to both hand sides of Equation (2) yields an equivalent equation in which we have used the identity (cf. Lemma 2.22 in [21]) The parameter estimation problem can be defined as follows. Given the concentration dataū(x i , t n ) observed at space-time grid points (x i , t n ), we will find the parameters (c, α, ) that maximize the probability that such concentration data are observed. Notice that we do not know what specific sample (or realization) of the white noiseẆ(x, t) corresponds to the current observation, but we only know the observation. We will estimate the magnitude of the noise only utilizing the observation.

Pseudo-Likelihood Approach
In this section, we first review the pseudo-likelihood approach for parameter estimation of stochastic ODEs discussed in Section 3.2 of [22] and then embark on the extension of the approach to stochastic PDEs.

Pseudo-Likelihood Estimation for Stochastic ODEs
We take the parameter estimation of the following Black-Scholes equation as an example: with deterministic intial value X(t 0 ) = x 0 and Wiener process W(t). Given the observation X(t n ) for n = 1, 2, · · · , N, we will estimate the parameters (θ 1 , θ 2 ). Denote by p θ 1 ,θ 2 (x 2 , s 2 |x 1 , s 1 ) (s 1 < s 2 ) the conditional density of the random variable X(s 2 ) given X(s 1 ), and the likelihood function for the observation {X(t n )} is The MLE (θ 1 ,θ 2 ) satisfies (θ 1 ,θ 2 ) = arg min When the explicit form of conditional density is known, the corresponding approach is called exact-likelihood approach. For instance, as the conditional density for Black-Scholes equation is the density of a log-normal random variable [22], we can employ the exactlikelihood approach to infer the parameters. For general diffusion processes, however, the conditional density may not be known explicitly. In this case, we can infer parameters using pseudo-likelihood approach instead.
To make pseudo-likelihood estimation, we need to discretize the SODEs first. We still consider the Black-Scholes equation, and discretize it using the Euler scheme for n = 1, 2, · · · , N. As the increment of the Wiener process W(t n ) − W(t n−1 ) is a normal random variable with zero mean and variance t n − t n−1 , namely, W(t n ) − W(t n−1 ) ∼ N(0, ∆t), we see that the residual X(t n ) − X(t n−1 ) − θ 1 X(t n−1 )∆t := Y n is a normal random variable Y n ∼ N(0, θ 2 2 X 2 (t n−1 )∆t).
Thus, the probability we jointly observeȲ n =X(t n ) −X(t n−1 ) − θ 1X (t n−1 )∆t is determined by the density, Then, the joint probability we observe {X(t n )} N n=1 is The pseudo-likelihood estimation satisfies (θ 1 ,θ 2 ) = arg min The conditional density q θ 1 ,θ 2 in pseudo-likelihood (12) generally differs from the true conditional density p θ 1 ,θ 2 in (7) as mentioned in preceding paragraphs. The approximation q to p is good when the sampling step ∆t is very small. For example, for N(∆t) 3 → 0 as N → +∞ the maximum likelihood estimator built upon pseudo-likelihood is consistent and asymptotically normal [23].

Pseudo-Likelihood Estimation for Stochastic PDEs
We now consider the parameter estimation of fractional order α, diffusion coefficient c, as well as noise magnitude in problem (2) by leveraging the pseudo-likelihood approach we reviewed in the last subsection. Recalling that the first step we implement the approach is to discretize the Black-Scholes equation using the Euler scheme, we, first of all, need to seek a proper discretization scheme for our stochastic fractional PDE.

Spatio-Temporal Discretization Scheme
We adopt the time discretization scheme proposed in [3] and central difference for temporal and spatial discretizations, respectively. For the computational domain {(x, t) : (x, t) ∈ (0, 1) 2 } of problem (2), we denote by x i = i N x , i = 0, 1, · · · , N x the spatial grid and by t n = n N t , n = 0, 1, · · · , N t the temporal grid. The problem (2) can be discretized as for n = 1, 2, · · · , N t , i = 1, 2, · · · , N x − 1, and U 0 i := η(x i ). The grid function U n i approximates u(x i , t n ) and F n i := f (x i , t n ). We truncate the infinite series expression of cylindrical Wiener process (3) to the first M terms. For convenience, we let M = N x . The increment of Wiener process W j (t n ) − W j (t n−1 ) is a normal random variable, namely,Z j (t n−1 ) := W j (t n ) − W j (t n−1 ) ∼ N(0, ∆t). Letting Z j (t n−1 ) ∼ N(0, 1) yieldsZ j (t n−1 ) = Z j (t n−1 ) √ ∆t. The Caputo fractional time derivative is approximated by using the convolution quadrature scheme [24,25] (also known as Grünwald-Letnikov approximation): where b j,α , j = 0, 1, 2, · · · are the coefficients in the power series expansion (1 − z) α = ∑ +∞ j=0 b j,α z j . There exists an iterative formula for computing these coefficients: b 0,α = 1, and b k,α = 1 − α+1 Rearranging the discretization scheme (14) yields the following matrix form: (16) with The matrix A is the difference matrix for central difference: The column vector z n (ω) : is a realization of standard Gaussian random vector corresponding to a specific sample point ω ∈ Ω which is a probability sample space for the random vector. Moreover, z 0 , z 1 , · · · , z N t −1 are mutually independent standard Gaussian vectors. Solving the linear system (16) for each U n gives the numerical solution to the forward problem driven by a specific realization of white noise: with ω := {ω 0 , ω 1 , · · · , ω N t −1 }. We next solve the inverse problem that given the obser-vationŪ := [Ū 0 ,Ū 1 , · · · ,Ū N t ], we estimate the parameters (α, c, ). Before proceeding on pseudo-likelihood approach, we first clarify two types of observation we consider in the paper: (i) Full observation. We denote byŪ ∆x,∆t := {ū(i∆x, n∆t)} for integers i, n the discrete observation of concentration in the computational domain {(x, t) : (x, t) ∈ (0, 1) 2 }. A full observation is defined as the case where the spatial sampling step ∆x and temporal sampling step ∆t are taken to be the smallest values that are allowed in practice. Due to the limitation of economic cost of placing concentration sensors and restriction of measurement precision of sensors, in reality, the spatial and temporal steps cannot be arbitrarily small. We denote by ∆ 0 x and ∆ 0 t the smallest steps that are allowed in practice. The full observation is the most ideal case for parameter estimation, as we can extract most information from an observation. We assume that usually one can accurately estimate parameters from such an observation. (ii) Partial observation. Sometimes we cannot achieve a full observation due to shrinking budget and geological constraints for placing sensors. For example, when monitoring wells have to be digged for measuring contaminant concentration in groundwater, the budget for placing sensors has been halved for certain reason and the remaining budget only allows a less dense spatial distribution of monitoring wells. We suppose that there exists a full observationŪ ∆ 0 x,∆ 0 t , from which we can accurately estimate model parameters. Then, the partial observation is defined as a subset of U ∆ 0 x,∆ 0 t , namely,Ū r x ∆ 0 x,r t ∆ 0 t for sampling ratios r x , r t ∈ N + . When the sampling ratios r x = r t = 1, the partial observation is the same as the full observation.

Pseudo-Likelihood Estimation for Full Observation
Given the full observationŪ ∆ 0 x,∆ 0 t , we aim at finding optimal parameters α, c, and that make it most likely that we see such an observation. Recalling the discretization scheme (16) for problem (2) and denoting B := I − c(∆t) α A, we now define a random vector Y n as Analogue to (10), we easily see that the residual vector Y n is a Gaussian random vector, We The probability that we make such observation is determined by the density Then, the joint probability we observe {Ū n } N 0t n=1 (N 0t : The pseudo-likelihood estimation can be obtained by minimizing the negative log of the above joint probability: After some rearrangement, we finally arrive to the following proposition for pseudolikelihood estimation. The proof of Proposition 1 is given in Appendix A.

Proposition 1. Given a full observationŪ
n=0 ) for the stochastic fractionaltime diffusion problem (2), the pseudo-likelihood estimates for fractional order α, diffusion coefficient c, and noise magnitude are where andȲ n is defined in (21). The lower triangular matrix L is the Cholesky decomposition of matrix

Pseudo-Likelihood Estimation for Partial Observation
The pseudo-likelihood estimation for the case of partial observation is given in the following Proposition 2. The proof of Proposition 2 is the same as that of Proposition 1 except that we replace the spatio-temporal steps ∆ 0 x, ∆ 0 t with r x ∆ 0 x, r t ∆ 0 t. Proposition 2. Given a partial observationŪ r x ∆ 0 x,r t ∆ 0 t with r x > 1 and r t > 1 for the stochastic fractional-time diffusion problem (2), the pseudo-likelihood estimates for fractional order α, diffusion coefficient c, and noise magnitude are where andȲ n r is defined as The notationsŪ n r andF n r are defined as with N rx := N 0x /r x . The matrices A r and B r are defined as and B r := I − c(r t ∆ 0 ) α A r , respectively. The lower triangular matrix L r is the Cholesky decomposition of matrix B r and L r,ii is the i-th diagonal of L r .

Numerical Results
In this section, we consider the problem (2) with the source term and zero initial condition η(x) ≡ 0. The analytical solution to the deterministic problem which is the exact mean of the stochastic solutions. We assume the spatial and temporal steps for full observation are ∆ 0 x = ∆ 0 t = 2 −9 , and fabricate the full observationŪ ∆ 0 x,∆ 0 t using the true parameter α * = 0.5, c * = 1, * = 0.1 as well as a fixed sample point ω 0 in (18). We plot in Figure 1 the full observationŪ ∆ 0 x,∆ 0 t for deterministic case ( = 0) and noisy case ( = 0.1). The other parameters α * and c * are fixed. We see that the driving noise is already strong enough to produce negative values for the numerical solution, while the exact mean function u d is always non-negative. Note that a specific sample of noise Z(ω 0 ) is drawn and displayed in Figure 2. In Appendix B, we demonstrate the mean and standard deviation of 1000 numerical solutions to the forward problem (2) as well as the effect of noise magnitude on numerical solutions. Using the full observation, we next discuss the performance of pseudo-likelihood approach for cases of one-parameter, two-parameter, and three-parameter estimations.

One-Parameter Estimation
We first consider the full observationŪ 2 −9 , 2 −9 shown in the right panel of Figure 1, which corresponds to the noise sample point ω 0 . We plot the exact negative log-likelihood function −logL(α, c, ) by varying one of the three parameters (α, c, ) but fixing the other two to their true values. In the left panel of Figure 3, we plot a red dotted line corresponding to − logL(α, c * , * ) for α = 0.1, 0.2, · · · , 0.9 and ∆ 0 x = ∆ 0 t = 2 −9 . We see that the optimal α that minimizes the negative log-likelihood is 0.5, and it agrees with the true α * = 0.5. Now, we consider the partial observation by increasing the spatial step ∆ 0 x to 4∆ 0 x, 16∆ 0 x, and 64∆ 0 x, and we plot the corresponding exact negative log-likelihoods with blue, green, and cyan dotted lines. The optimal α does not alter with increasing ∆x. This indicates that the spatial step does not affect too much the estimation of α when other two parameters are fixed to their true values. This is not the case for estimating c, however. In the right panel of Figure 3, the optimal c also matches the true value c * = 1 for the full observation case, whereas the optimal c's for partial observation cases begin to shift to the right of the true value. Furthermore, the larger the spatial step is, the more the optimal c shifts to the right. This implies that the estimation of c is more sensitive to increasing spatial step than the estimation of α.
We have considered the effect of spatial step on the optimal α and c for fixed temporal step. We next consider the opposite, namely, the effect of temporal step for fixed spatial step. From Figure 4, we see that estimations of α and c are both sensitive to the varying temporal steps. This suggests that when recording the concentration data using sensors we can place a small number of sensors in space but we must ensure that the data can be recorded with high sampling frequency in time.
We last consider the sensitivity of estimated α and c to the magnitude of noise . We see from Figure 5 that for a full observation the noise magnitude has no impact on the optimal α and c. In contrast, Figure 6 illustrates that for the partial observation with r x = 16 and r t = 32 the optimal α and c can differ from their true values; moreover, the larger the noise is, the more likely the shift occurs. The temporal step is fixed to that of full observation, i.e., ∆ 0 t = 2 −9 ; the spatial steps vary, and ∆x = 2 −9 corresponds to the full observation case, while the other three spatial steps correspond to the partial observation cases with r x = 4, 16, and 64. The vertical dotted lines intersect x-axis at the optimal α or c that minimizes the normalized negative log-likelihood. The optimal parameters are regarded as the estimated parameters. Note that a vertical line having a specific color corresponds to the likelihood curve having the same color, and vertical lines can overlap each other when the optimal parameters are the same.

Two-Parameter Estimation
We now estimate jointly parameters α and c from the full observation shown in the right panel of Figure 1. In first case, we fix the temporal step to ∆ 0 t = 2 −9 and change the spatial step. Figure 7 plots how the optimal (α, c) (the red disk in the figure) alters its position in the contour plot when the spatial step varies. We see that for a full observation, the optimal parameters can always match the true parameters, whereas for partial observations, the diffusion coefficient c is more difficult to be estimated than the fractional order α. In second case, we fix the spatial step to ∆ 0 x = 2 −9 but change the temporal step. From Figure 8 we see that increasing the temporal step makes both the optimal α and c deviate from their true values, which implies, just as in the one-parameter estimation case, that a small temporal step is preferred to a small spatial step when high accuracy of partial observation is expected. In third case, we fix spatio-temporal steps while varying the noise magnitude . In Figure 9, we fix the spatio-temporal steps to be r x = r t = 1 (a full observation), and we see that the magnitude of noise does not affect the optimal parameters; this is the same as in the case of one-parameter estimation. In Figure 10, we consider a partial observation by setting the sampling ratios to r x = 16 and r t = 8, and we observe that the larger the noise magnitude is, the more the optimal parameters deviate from the true parameters. The noise magnitude is taken to its true value * = 0.1. The true values for α and c are α * = 0.5 and c * = 1.0. The full observation case r x = 1 is compared with other three partial observation cases with r x = 4, 16, and 64. For all plots, the other controlling parameter for partial observation is fixed to r t = 1. The red disk represents the optimal parameter pair that minimizes the negative log-likelihood. The noise magnitude is taken to its true value * = 0.1. The true values for α and c are α * = 0.5 and c * = 1.0. The full observation case r t = 1 is compared with other three partial observation cases with r t = 2, 8, and 32. For all plots, the other controlling parameter for partial observation is fixed to r x = 1. The red disk represents the optimal parameter pair that minimize the negative log-likelihood.  So far, we have estimated parameters using a trial-and-error approach. Specifically, we computed the exact negative log-likelihood for all parameters on a uniform grid in the parameter space and then selected the optimal parameter among the parameters on this grid. Sometimes, however, the true parameter, say, α * = 0.345, is located on a rather dense grid, and employing the trial-and-error approach could be time-consuming. We next utilize certain optimization algorithm to find the optimal parameters. In this paper, we employ a type of quasi-Newton method, called L-BFGS-B algorithm [26] to minimize the negative log-likelihood − logL(α, c, ). To implement the algorithm, we adopt the optimization algorithm package provided in SciPy (see scipy.optimization.minimize()), and set the optional parameters in the algorithm routine to their default values.
As different full observations will yield different estimates of parameters, we consider 20 different full observations obtained by solving the forward problems under different sample-points of noise: ω k for k = 0, 1, · · · , 19. In Table 1, we show the mean values of 20 pairs of parameters (α(ω k ),ĉ(ω k )) where we put the noise sample point ω k in the parentheses behind each parameter to emphasize the dependency of parameter on sample point. Table 2 displays the standard deviation of these 20 pairs of parameters. Table 1. Two-parameter estimation: Mean of the estimated (α, c) from 20 different full observations. The effect of sparsity of observation (i.e., sampling rations r x and r t ) on the mean of the estimated parameters is demonstrated. The true parameters are α * = 0.5 and c * = 1. The noise magnitude is fixed to * = 0.1. A quasi-Newton optimization algorithm, L-BFGS-B algorithm, is employed to minimize the negative log-likelihood. From Table 1, we see that keeping r t small yields more accurate estimates than keeping r x small, which accords with our comment for Figure 8. Moreover, a full observation with r x = r t = 1, again, achieves the highest accuracy for parameter estimation, which validates our approach. From Table 2 we can see that the standard deviation of diffusion coefficient is obviously larger than that of fractional order, and this implies that the former is more difficult to be estimated than the latter. This implication matches our observation for the one-parameter estimation case. Additionally, we see from Table 2 that the more accurate the mean of estimates is, the smaller the standard deviation of estimates is. This gives a suggestion that when we do not know the true parameters we can judge if the estimated parameters are reliable by checking the magnitude of standard deviation. Table 3 gives the estimated parameters from the full observation shown in the right panel of Figure 1. The initial guess for the three parameters in the L-BFGS-B algorithm is (α 0 , c 0 , 0 ) = (0.8, 0.5, 0.5) while the true parameters are (α * , c * , * ) = (0.5, 1.0, 0.1). Like the one-parameter and two-parameter estimation cases, the full observation case (r x = r t = 1) again achieves highest estimation accuracy. Jointly estimating three parameters from a partial observation, however, could not be reliable, even though r t and r x are both small. For example, for r t = 1 and r x = 2, the estimated diffusion coefficient is 1.523, while the true parameter is only 1.0. This indicates that for this case, we need to approach a full observation by keeping the spatio-temporal steps as small as possible so as to attain high estimation accuracy. Among the three parameters that are estimated, the diffusion coefficient is again most difficult to be estimated. This agrees with our observation for the one-parameter and two-parameter estimation cases. Table 3. Three-parameter estimation: Estimated (α, c, ) from the full observation shown in the right panel of Figure 1. The effect of sparsity of observation on estimated parameters is demonstrated. The true parameters are α * = 0.5, c * = 1, and * = 0.1. A quasi-Newton optimization algorithm, L-BFGS-B algorithm is employed to minimize the negative log-likelihood. We next consider the effect of true fractional order on the parameter estimation. We have fixed the true fractional order to α * = 0.5 so far, but now we will see if we can still recover parameters when α * is changed. We freeze the spatial-temporal steps to ∆ 0 x = ∆ 0 t = 2 −9 , fix the sample point to ω 0 as before, but generate the full observations U 0.1 ∆ 0 x,∆ 0 t andŪ 0.9 ∆ 0 x,∆ 0 t using α * = 0.1 and α * = 0.9, respectively, where the superscript ofŪ represents the dependency on the true fractional order. We have seen from Table 3 that the estimated parameters forŪ 0.5
On the other hand, the negative log-likelihood can also become flatter near the global minimizer when the final observation time decreases. The final observation time we previously consider is T = 1, but now we reset it to T = 1/4 and T = 1/16, with spatio-temporal steps and noise sample point fixed as before. The true fractional order is taken to 0.5. We first consider the case of default ftol and gtol. For T = 1/4, the estimated parameters are (0.5008, 1.0932, 0.1088), whose accuracy is acceptable, whereas for T = 1/16, the estimated parameters are (0.4978, 1.3775, 0.1387), which is not that accurate. Now we switch to ftol = 1e-12 and gtol = 1e-9, and see that the estimated parameters for T = 1/4 do not change, but those for T = 1/16 become much better ones (0.4962, 1.0268, 0.1045). Therefore, for a small final observation time, it is safe to keep using small iteration termination tolerances.
Here are some comments on time complexity in implementing the pseudo-likelihood approach proposed in the paper. The time complexity is where M is the number of evaluations of negative log-likelihood function in the L-BFGS-B algorithm. The number M is affected by the number of parameters to be estimated (or the dimensionality of parameter space) and the iteration termination tolerances aforementioned. For example, for a full observation with default tolerances, a two-parameter estimation requires M = 45, while a three-parameter estimation with the default tolerances requires M = 140. The term (N 0t /r t ) 2 comes from the time discretization of fractional time derivative ∂ 1−α t AŪ(t), and (N 0x /r x ) 2 arises from the multiplication of difference matrix A and observation vectorŪ n . The last term (N 0x /r x ) 3 appears due to the inversion of and Cholesky decomposition of B. In our computational experiment, it took 7523 seconds to estimate (α, c, ) for the full observation case (ω 0 ), in which M = 140, N 0x = N 0t = 512, and r x = r t = 1. The code was run at a laptop workstation with Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz and 64 GB memory.
We provide at Github (The code examples can be downloaded at https://github.com/ Derek2021Pang/pseudo_likelihood_SFPDE) two Python code examples. The first example is a forword solver, which can generate full observation for given spatio-temporal steps and white noise sample, and the second example is a pseudo-likelihood parameter estimator, which can estimate the parameters from the full observation by employing the L-BFGS-B algorithm.

Concluding Remarks
In the paper, we propose a pseudo-likelihood approach to estimating the parameters of a one-dimensional stochastic time-fractional diffusion problem (2). We consider full observation and partial observation cases when different amounts of information are available. For the full observation case, our approach can accurately estimate equation (or model) parameters for one-parameter, two-parameter, and three-parameter estimation problems. For the partial observation case, the accuracy of estimated parameters are affected by the sparsity of the observation data, which is controlled by the spatio-temporal sampling steps ∆x and ∆t (or r x and r t ). Our computational experiments produce the following observations: i The larger the spatio-temporal sampling steps are, the lower the accuracy of estimated parameters is, which is intuitive. ii Keeping temporal sampling step small is more important than keeping spatial step small in terms of increasing the parameter estimation accuracy for partial observation. iii Among the three parameters being estimated, namely, fractional order, diffusion coefficient, and noise magnitude, the diffusion coefficient is most difficult to be estimated, since it is most sensitive to varying spatio-temporal steps in partial observation. iv The high accuracy of mean of estimated parameters is usually related to the low standard deviation of estimated parameters, when we fortunately have multiple observations, corresponding to different realizations of driving noise, to obtain multiple groups of estimated parameters. v Estimating more parameters jointly leads to larger variability of estimated parameters when spatio-temporal steps increase. Making spatio-temporal steps as small as possible is suggested for a joint estimation of a large number of parameters.
We need to point out, however, that a limitation of the current approach is that the observed data must be distributed at a uniform grid in the space and time. Otherwise, we cannot compute the negative-likelihood function using the finite difference schemes in the forward solver. The approach cannot handle the case where scattered observation data is present.
In reality, when confronted by discrete observation data, we need to pay attention to three points: First, we do not know in advance whether or not the observation is a full observation. The judgment largely depends on what equation model we consider, what discretization scheme we adopt, and what spatial-temporal steps we take. As we have pointed out, when one already chooses the right model and right discretization, one could make a good parameter estimation if the observation data is dense in time but sparse in space. Second, we may take into account model selection problem when we do not have preference for a specific model but have several candidate models. To select a good model, we can consider a scoring strategy. We score each candidate model with the minimum of the corresponding negative log-likelihood and choose the model having the lowest score. Third, different discretization schemes will lead to different pseudo-likelihood functioñ L(·). Thus, it could be sensible for a given model to select a discretization scheme that yields a pseudo-likelihood estimation more robust to spatial-temporal sampling steps.
Unlike the maximum likelihood estimators presented in [8,11], we do not make any theoretical analyses on asymptotic properties of our pseudo-likelihood estimators. This is one of our future works. It is straightforward to extend the current pseudo-likelihood approach to two-and three-dimensional problems. However, extending the approach to other types of equations, such as space-time fractional stochastic PDEs (1) and even the stochastic versions of variable-order fractional PDEs [27,28], is not that trivial. Reliable discretization schemes for target equations are needed to be developed for those equations, and proper parametrization is required for the variable fractional orders. Furthermore, we will consider in the future the parameter estimation for fractional diffusion equations perturbed by fractional Brownian motion and Lévy motion.

Appendix B. Numerical Solution to Forward Problem
We solve the forward problem (2) using 1000 realizations of white noise, namely, Z(ω k ), k = 0, 1, · · · , 999. We take the equation parameters α * = 0.5, c * = 1.0, * = 0.1, the spatial step ∆ 0 x = 2 −9 , and the temporal step ∆ 0 t = 2 −9 . We compare in Figure A1 the exact and computed mean functions of the 1000 numerical solutions driven by 1000 groups of the aforementioned white noise. We show the exact and computed means at t = 1 as well as one standard deviation band in the left panel of Figure A2. In the right panel of the figure, we also display the numerical solution at t = 1 for two specific realizations of white noise ω 0 and ω 1 . We can see that the regularity of numerical solution is low. To demonstrate the effect of noise magnitude on numerical solutions, we plot the numerical solutions for four increasing noise magnitudes in Figure A3.