Article

Pseudo-Likelihood Estimation for Parameters of Stochastic Time-Fractional Diffusion Equations

School of Mathematics, Southeast University, Nanjing 211189, China
*
Author to whom correspondence should be addressed.
Fractal Fract. 2021, 5(3), 129; https://doi.org/10.3390/fractalfract5030129
Submission received: 24 August 2021 / Revised: 3 September 2021 / Accepted: 15 September 2021 / Published: 18 September 2021
(This article belongs to the Special Issue Fractional Deterministic and Stochastic Models and Their Calibration)

Abstract

Although stochastic fractional partial differential equations have received increasing attention in the last decade, the parameter estimation of these equations has seldom been reported in the literature. In this paper, we propose a pseudo-likelihood approach to estimating the parameters of stochastic time-fractional diffusion equations, whose forward solver was investigated very recently by Gunzburger, Li, and Wang (2019). Our approach can accurately recover the fractional order, the diffusion coefficient, and the noise magnitude given discrete observation data corresponding to only one realization of the driving noise. When only partial data are available, our approach can also attain acceptable results for intermediate sparsity of observation.

1. Introduction

Stochastic fractional partial differential equations (SFPDEs) have received a fair amount of attention in the last decade. Research in this direction involves constructing new SFPDE models, proving the well-posedness of their solutions, and developing new numerical solution methods. Many SFPDEs can be written in the following general form:
$$\partial_t^{\beta} u(x,t) = -c\,(-\Delta)^{\tilde{\alpha}(x)/2} u(x,t) + f(x,t) + \epsilon\, I_t^{1-\beta}\!\bigl(\sigma(u)\,\dot{W}(x,t)\bigr), \tag{1}$$
where $\beta$ is the order of the Caputo fractional time derivative, $c$ is the diffusion coefficient, $\tilde{\alpha}(x)$ is the spatially variable order of the fractional Laplacian, $f(x,t)$ is a prescribed source term, $I_t^{1-\beta}$ is the fractional time integration operator, $\dot{W}(x,t)$ is a space-time white noise, and $\epsilon$ is the noise magnitude. Mijena and Nane [1] proved the existence and uniqueness of mild solutions to the space-time fractional nonlinear PDE (1) with $\beta\in(0,1)$, $\tilde{\alpha}(x)\equiv\alpha$, and a Lipschitz continuous $\sigma(\cdot)$. Anh, Leonenko, and Ruiz-Medina [2] derived the weak-sense Gaussian solution to the SFPDE (1) with $\beta\in(0,1)$, $\sigma(\cdot)\equiv 1$, and a pseudo-differential operator $(-\Delta)^{\tilde{\alpha}(x)/2}$. Gunzburger, Li, and Wang [3] proposed a time discretization scheme for the stochastic time-fractional PDE (1) with $\beta\in(0,1)$, $\tilde{\alpha}(x)\equiv 2$, and $\sigma(\cdot)\equiv 1$. Some other authors considered SFPDEs that differ from the general form (1). Bolin, Kirchner, and Kovács [4] presented a numerical solution method for an elliptic SFPDE $(\kappa-\Delta)^{\beta} u(x) = W(x)$, where $W(x)$ is a Gaussian white noise. Anh, Olenko, and Wang [5] constructed an SFPDE to model the evolution of a random tangent vector field on the unit sphere. Mohammed [6] and Xia and Yan [7] considered FPDEs driven by multiplicative Brownian motion and fractional Brownian motion, respectively.
In contrast to the rapid development of numerical solutions to forward problems, parameter estimation for SFPDEs has not yet been fully investigated, although there have been many works on parameter estimation for SPDEs [8,9,10,11]. Cialenco, Lototsky, and Pospisil [12] and Cialenco [13] studied the parameter estimation of the diffusion coefficient and the Hurst index for standard (integer-order) diffusions driven by additive and multiplicative fractional noises. When the fractional ingredient in the driving noise is excluded, the equations they considered are more like stochastic integer-order PDEs, which are not the concern of the current study. Geldhauser and Valdinoci [14] did consider an SFPDE, and they estimated the order of the fractional Laplacian in the context of optimal control.
Parameter estimation for deterministic and stochastic FPDEs is different. Inversion for deterministic FPDEs often involves solving forward problems, while inversion for stochastic FPDEs does not. Specifically, for deterministic FPDEs, once the source term $f(x,t)$ and the initial/boundary conditions are known, one can solve forward problems under different equation parameters and, after utilizing certain inversion techniques such as regularizing nonlinear least squares [15], surrogate models [16,17,18], and physics-informed deep learning [19], select as estimates the parameters that reconstruct the observation of $u$. For stochastic FPDEs, however, one does not know the specific realization of the noise that corresponds to the observation and thus cannot solve the forward problems even when the other conditions are prescribed. Maximum likelihood estimation (MLE) is a powerful means of handling parameter estimation problems for SPDEs that bypasses solving forward problems. To the best of our knowledge, there are very few reports on MLE and its variants for parameter estimation of SFPDEs. On the other hand, there have been some efforts on theoretical analyses, such as consistency and asymptotic efficiency, of maximum likelihood estimators for parameters in SPDEs, for example, in [8,11], but the corresponding numerical studies are fewer than the theoretical ones.
In this paper, we propose a pseudo-likelihood estimation approach for the SFPDE (1) with $\beta\in(0,1)$, $\tilde{\alpha}(x)\equiv 2$, and $\sigma(\cdot)\equiv 1$. We extend the pseudo-likelihood approach for stochastic ordinary differential equations (SODEs) to SFPDEs, although the extension is not trivial. The paper is organized as follows. Section 2 introduces the SFPDE we consider and defines the parameter estimation problem. Section 3 elaborates on the pseudo-likelihood approach for the SFPDE, after briefly reviewing the approach for SODEs that inspires it. Section 4 presents numerical results for fabricated observation data. The last section gives some remarks on the proposed approach.

2. Parameter Estimation Problem

We consider a one-dimensional stochastic time-fractional diffusion problem [3]
$$\begin{aligned}
&\partial_t u(x,t) - c\,\Delta\,\partial_t^{1-\alpha} u(x,t) = f(x,t) + \epsilon\,\partial_t W(x,t), && (x,t)\in(0,1)^2, \\
&u(0,t) = u(1,t) = 0, && t\in(0,1), \\
&u(x,0) = \eta(x), && x\in(0,1),
\end{aligned} \tag{2}$$
where $u(x,t)$ is the space-time distribution of the concentration of certain particles, the diffusion coefficient $c>0$, the fractional order $\alpha\in(0,1)$, and the initial condition $\eta(x)$ and the source term $f(x,t)$ are prescribed. The stochastic fractional PDE is driven by a space-time white noise $\partial_t W(x,t) := \dot{W}(x,t)$, which is the time derivative of a cylindrical Wiener process $W(x,t)$ in $L^2((0,1))$ defined by [20]
$$W(x,t) = \sum_{j=1}^{+\infty} \phi_j(x)\, W_j(t), \tag{3}$$
with $\phi_j(x)$ being the normalized eigenfunctions of the negative Laplacian $-\Delta$, i.e., $\phi_j(x) = \sqrt{2}\sin(j\pi x)$ for $x\in(0,1)$, and $\{W_j(t)\}_{j=1}^{+\infty}$ being independent one-dimensional Wiener processes. The positive constant $\epsilon$ in front of the white noise determines the noise magnitude. The domain of the Laplacian $\Delta$ is $\{\psi\in H_0^1((0,1)) : \Delta\psi\in L^2((0,1))\}$. The left-sided Caputo fractional time derivative is considered:
$$\partial_t^{1-\alpha}\psi(x,t) = \frac{1}{\Gamma(\alpha)}\int_0^t (t-s)^{\alpha-1}\,\partial_s\psi(x,s)\,\mathrm{d}s, \qquad \alpha\in(0,1), \tag{4}$$
where $\Gamma(\alpha)$ is the Gamma function. Note that for $\psi(x,0)=0$, the left-sided Caputo fractional time derivative coincides with the left-sided Riemann–Liouville fractional time derivative. Moreover, applying the fractional integration operator $\partial_t^{\alpha-1}$ to both sides of Equation (2) yields the equivalent equation
$$\partial_t^{\alpha} u(x,t) - c\,\Delta\bigl(u(x,t) - \eta(x)\bigr) = \partial_t^{\alpha-1}\bigl(f(x,t) + \epsilon\,\dot{W}(x,t)\bigr), \tag{5}$$
in which we have used the identity (cf. Lemma 2.22 in [21]) $\partial_t^{\alpha-1}\partial_t^{1-\alpha}\psi(x,t) = \psi(x,t) - \psi(x,0)$.
The parameter estimation problem can be defined as follows. Given the concentration data $\bar{u}(x_i,t_n)$ observed at space-time grid points $(x_i,t_n)$, we seek the parameters $(c,\alpha,\epsilon)$ that maximize the probability that such concentration data are observed. Notice that we do not know which specific sample (or realization) of the white noise $\dot{W}(x,t)$ corresponds to the current observation; we only know the observation itself. In particular, we will estimate the noise magnitude $\epsilon$ using only the observation.

3. Pseudo-Likelihood Approach

In this section, we first review the pseudo-likelihood approach for the parameter estimation of stochastic ODEs discussed in Section 3.2 of [22] and then extend the approach to our stochastic fractional PDE.

3.1. Pseudo-Likelihood Estimation for Stochastic ODEs

We take the parameter estimation of the following Black–Scholes equation as an example:
$$\partial_t X(t) = \theta_1 X(t) + \theta_2 X(t)\,\dot{W}(t), \qquad \theta_1\in\mathbb{R},\ \theta_2>0, \tag{6}$$
with deterministic initial value $X(t_0)=x_0$ and Wiener process $W(t)$. Given the observations $\bar{X}(t_n)$ for $n=1,2,\dots,N$, we estimate the parameters $(\theta_1,\theta_2)$.
Denote by $p_{\theta_1,\theta_2}(x_2,s_2\,|\,x_1,s_1)$ ($s_1<s_2$) the conditional density of the random variable $X(s_2)$ given $X(s_1)$. The likelihood function for the observations $\{\bar{X}(t_n)\}$ is
$$L(\theta_1,\theta_2) = \prod_{n=1}^{N} p_{\theta_1,\theta_2}\bigl(\bar{X}(t_n), t_n \,\big|\, \bar{X}(t_{n-1}), t_{n-1}\bigr). \tag{7}$$
The MLE $(\hat{\theta}_1,\hat{\theta}_2)$ satisfies
$$(\hat{\theta}_1,\hat{\theta}_2) = \arg\min_{\theta_1,\theta_2}\,\bigl\{-\log L(\theta_1,\theta_2)\bigr\}. \tag{8}$$
When the explicit form of the conditional density is known, the corresponding approach is called the exact-likelihood approach. For instance, since the conditional density for the Black–Scholes equation is that of a log-normal random variable [22], we can employ the exact-likelihood approach to infer the parameters. For general diffusion processes, however, the conditional density may not be known explicitly. In this case, we can infer parameters using the pseudo-likelihood approach instead.
To perform pseudo-likelihood estimation, we first need to discretize the SODE. We again consider the Black–Scholes equation and discretize it using the Euler scheme
$$X(t_n) - X(t_{n-1}) = \theta_1 X(t_{n-1})\,\Delta t + \theta_2 X(t_{n-1})\bigl(W(t_n) - W(t_{n-1})\bigr), \tag{9}$$
for $n=1,2,\dots,N$. Since the increment of the Wiener process $W(t_n)-W(t_{n-1})$ is a normal random variable with zero mean and variance $t_n - t_{n-1}$, namely $W(t_n)-W(t_{n-1})\sim\mathcal{N}(0,\Delta t)$, we see that the residual $Y_n := X(t_n) - X(t_{n-1}) - \theta_1 X(t_{n-1})\,\Delta t$ is a normal random variable,
$$Y_n \sim \mathcal{N}\bigl(0,\ \theta_2^2\, X^2(t_{n-1})\,\Delta t\bigr). \tag{10}$$
Thus, the probability that we observe $\bar{Y}_n = \bar{X}(t_n) - \bar{X}(t_{n-1}) - \theta_1 \bar{X}(t_{n-1})\,\Delta t$ is determined by the density
$$q_{\theta_1,\theta_2}(y=\bar{Y}_n) = \frac{1}{\sqrt{2\pi\,\theta_2^2\,\bar{X}^2(t_{n-1})\,\Delta t}}\,\exp\!\left(-\frac{1}{2}\,\frac{\bar{Y}_n^2}{\theta_2^2\,\bar{X}^2(t_{n-1})\,\Delta t}\right). \tag{11}$$
Then, the joint probability that we observe $\{\bar{X}(t_n)\}_{n=1}^{N}$ is
$$\tilde{L}(\theta_1,\theta_2) = \prod_{n=1}^{N} q_{\theta_1,\theta_2}(\bar{Y}_n). \tag{12}$$
The pseudo-likelihood estimate satisfies
$$(\hat{\theta}_1,\hat{\theta}_2) = \arg\min_{\theta_1,\theta_2}\,\bigl\{-\log \tilde{L}(\theta_1,\theta_2)\bigr\}. \tag{13}$$
The conditional density $q_{\theta_1,\theta_2}$ in the pseudo-likelihood (12) generally differs from the true conditional density $p_{\theta_1,\theta_2}$ in (7), as mentioned above. The approximation of $p$ by $q$ is good when the sampling step $\Delta t$ is very small. For example, if $N(\Delta t)^3 \to 0$ as $N\to+\infty$, the maximum likelihood estimator built upon the pseudo-likelihood is consistent and asymptotically normal [23].
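To make the recipe concrete, here is a minimal Python sketch (all names and simulation settings are our own choices for illustration): we simulate a Black–Scholes path with the Euler scheme (9), form the residuals $Y_n$, and minimize the negative logarithm of the pseudo-likelihood (12) with SciPy.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Euler simulation of dX = theta1*X dt + theta2*X dW, cf. scheme (9)
theta1_true, theta2_true = 0.5, 0.2
N, dt, x0 = 5000, 1e-3, 1.0
X = np.empty(N + 1)
X[0] = x0
dW = rng.normal(0.0, np.sqrt(dt), N)
for n in range(N):
    X[n + 1] = X[n] + theta1_true * X[n] * dt + theta2_true * X[n] * dW[n]

def neg_log_pseudo_likelihood(theta, X=X, dt=dt):
    theta1, theta2 = theta
    Y = X[1:] - X[:-1] - theta1 * X[:-1] * dt   # residuals Y_n
    var = theta2**2 * X[:-1]**2 * dt            # Var(Y_n), cf. (10)
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + Y**2 / var)

res = minimize(neg_log_pseudo_likelihood, x0=[0.0, 0.5], method='L-BFGS-B',
               bounds=[(None, None), (1e-6, None)])
theta1_hat, theta2_hat = res.x
```

Note that the diffusion parameter $\theta_2$ is recovered much more sharply than the drift $\theta_1$: information about the drift grows only with the total observation window $N\Delta t$, while information about $\theta_2$ grows with the number of increments.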

3.2. Pseudo-Likelihood Estimation for Stochastic PDEs

We now consider the estimation of the fractional order $\alpha$, the diffusion coefficient $c$, and the noise magnitude $\epsilon$ in problem (2) by leveraging the pseudo-likelihood approach reviewed in the last subsection. Recalling that the first step in that approach was to discretize the Black–Scholes equation with the Euler scheme, we first need a suitable discretization scheme for our stochastic fractional PDE.

3.2.1. Spatio-Temporal Discretization Scheme

We adopt the time discretization scheme proposed in [3] and a central difference for the temporal and spatial discretizations, respectively. For the computational domain $\{(x,t) : (x,t)\in(0,1)^2\}$ of problem (2), we denote by $x_i = i/N_x$, $i=0,1,\dots,N_x$, the spatial grid and by $t_n = n/N_t$, $n=0,1,\dots,N_t$, the temporal grid, with $\Delta x = 1/N_x$ and $\Delta t = 1/N_t$. Problem (2) can be discretized as
$$\frac{U_i^n - U_i^{n-1}}{\Delta t} - c\,(\Delta t)^{\alpha-1}\sum_{j=0}^{n} b_{n-j,1-\alpha}\,\frac{U_{i+1}^j - 2U_i^j + U_{i-1}^j}{(\Delta x)^2} = F_i^n + \epsilon\sum_{j=1}^{M}\phi_j(x_i)\,\frac{W_j(t_n) - W_j(t_{n-1})}{\Delta t}, \tag{14}$$
for $n=1,2,\dots,N_t$, $i=1,2,\dots,N_x-1$, and $U_i^0 := \eta(x_i)$. The grid function $U_i^n$ approximates $u(x_i,t_n)$ and $F_i^n := f(x_i,t_n)$. We truncate the infinite series expansion (3) of the cylindrical Wiener process to its first $M$ terms; for convenience, we let $M=N_x$. The increment $W_j(t_n)-W_j(t_{n-1})$ of the Wiener process is a normal random variable, namely $\tilde{Z}_j(t_{n-1}) := W_j(t_n)-W_j(t_{n-1}) \sim \mathcal{N}(0,\Delta t)$. Letting $Z_j(t_{n-1})\sim\mathcal{N}(0,1)$ yields $\tilde{Z}_j(t_{n-1}) = Z_j(t_{n-1})\sqrt{\Delta t}$. The Caputo fractional time derivative is approximated using the convolution quadrature scheme [24,25] (also known as the Grünwald–Letnikov approximation):
$$\partial_t^{1-\alpha}\psi(x,t_n) \approx \frac{1}{(\Delta t)^{1-\alpha}}\sum_{j=0}^{n} b_{n-j,1-\alpha}\,\psi(x,t_j), \tag{15}$$
where $b_{j,\alpha}$, $j=0,1,2,\dots$, are the coefficients in the power series expansion $(1-z)^{\alpha} = \sum_{j=0}^{+\infty} b_{j,\alpha}\, z^j$. There is an iterative formula for computing these coefficients: $b_{0,\alpha} = 1$ and $b_{k,\alpha} = \left(1 - \frac{\alpha+1}{k}\right) b_{k-1,\alpha}$ for $k\geq 1$.
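The recurrence for the convolution quadrature weights is cheap to implement. A small sketch (the function name is ours), cross-checked against the closed form $b_{j,\alpha} = (-1)^j\binom{\alpha}{j}$:

```python
import numpy as np
from scipy.special import binom

def gl_coefficients(alpha, n):
    """Weights b_{k,alpha} of (1 - z)**alpha = sum_k b_{k,alpha} z**k,
    computed via b_0 = 1 and b_k = (1 - (alpha + 1)/k) * b_{k-1}."""
    b = np.empty(n + 1)
    b[0] = 1.0
    for k in range(1, n + 1):
        b[k] = (1.0 - (alpha + 1.0) / k) * b[k - 1]
    return b

# cross-check against the binomial closed form b_k = (-1)**k * C(alpha, k)
alpha = 0.7
b = gl_coefficients(alpha, 10)
b_exact = np.array([(-1.0)**k * binom(alpha, k) for k in range(11)])
assert np.allclose(b, b_exact)
```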
Rearranging the discretization scheme (14) yields the following matrix form:
$$\bigl(I - c(\Delta t)^{\alpha} A\bigr)\, U^n = U^{n-1} + F^n\,\Delta t + c(\Delta t)^{\alpha} \sum_{j=0}^{n-1} b_{n-j,1-\alpha}\, A\, U^j + \epsilon\sqrt{\Delta t}\,\Phi\, z^{n-1}(\omega), \tag{16}$$
with $U^n := [U_1^n, U_2^n, \dots, U_{N_x-1}^n]^T$, $F^n := [f(x_1,t_n), f(x_2,t_n), \dots, f(x_{N_x-1},t_n)]^T$, and $\Phi := [\phi_j(x_i)] \in \mathbb{R}^{(N_x-1)\times M}$ for $i=1,2,\dots,N_x-1$ and $j=1,2,\dots,M$. The matrix $A$ is the central-difference matrix:
$$A := \frac{1}{(\Delta x)^2}\begin{bmatrix} -2 & 1 & & \\ 1 & -2 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & -2 \end{bmatrix} \in \mathbb{R}^{(N_x-1)\times(N_x-1)}. \tag{17}$$
The column vector $z^{n-1}(\omega) := [Z_1(t_{n-1}), Z_2(t_{n-1}), \dots, Z_M(t_{n-1})]^T \in \mathbb{R}^M$ is a realization of a standard Gaussian random vector corresponding to a specific sample point $\omega\in\Omega$, where $\Omega$ is the probability sample space of the random vector. Moreover, $z^0, z^1, \dots, z^{N_t-1}$ are mutually independent standard Gaussian vectors.
Solving the linear system (16) for each U n gives the numerical solution to the forward problem driven by a specific realization of white noise:
$$Z(\omega) := \sqrt{\Delta t}\,\Phi\,\bigl[z^0(\omega_0),\ z^1(\omega_1),\ \dots,\ z^{N_t-1}(\omega_{N_t-1})\bigr] \in \mathbb{R}^{(N_x-1)\times N_t}, \tag{18}$$
with $\omega := \{\omega_0, \omega_1, \dots, \omega_{N_t-1}\}$. We next consider the inverse problem: given the observation $\bar{U} := [\bar{U}^0, \bar{U}^1, \dots, \bar{U}^{N_t}]$, estimate the parameters $(\alpha, c, \epsilon)$. Before proceeding with the pseudo-likelihood approach, we first clarify the two types of observation considered in this paper:
(i)
Full observation. We denote by $\bar{U}^{\Delta x,\Delta t} := \{\bar{u}(i\Delta x, n\Delta t)\}$, for integers $i,n$, the discrete observation of the concentration in the computational domain $\{(x,t) : (x,t)\in(0,1)^2\}$. A full observation is defined as the case where the spatial sampling step $\Delta x$ and the temporal sampling step $\Delta t$ take the smallest values allowed in practice. Due to the economic cost of placing concentration sensors and the limited measurement precision of sensors, the spatial and temporal steps cannot be arbitrarily small in reality. We denote by $\Delta_0 x$ and $\Delta_0 t$ the smallest steps allowed in practice. The full observation is the most favorable case for parameter estimation, as it lets us extract the most information from an observation. We assume that one can usually estimate parameters accurately from such an observation.
(ii)
Partial observation. Sometimes a full observation cannot be achieved due to a shrinking budget or geological constraints on placing sensors. For example, when monitoring wells have to be dug for measuring contaminant concentration in groundwater, the budget for placing sensors may be halved for some reason, and the remaining budget only allows a sparser spatial distribution of monitoring wells. We suppose that there exists a full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$ from which we can accurately estimate the model parameters. A partial observation is then defined as a subset of $\bar{U}^{\Delta_0 x,\Delta_0 t}$, namely $\bar{U}^{r_x\Delta_0 x,\, r_t\Delta_0 t}$ for sampling ratios $r_x, r_t \in \mathbb{N}^+$. When the sampling ratios $r_x = r_t = 1$, the partial observation coincides with the full observation.

3.2.2. Pseudo-Likelihood Estimation for Full Observation

Given the full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$, we aim to find the optimal parameters $\alpha$, $c$, and $\epsilon$ that make such an observation most likely. Recalling the discretization scheme (16) for problem (2) and denoting $B := I - c(\Delta t)^{\alpha} A$, we define the random vector $Y^n$ as
$$Y^n := U^n - B^{-1}\left( U^{n-1} + F^n\,\Delta t + c(\Delta t)^{\alpha} \sum_{j=0}^{n-1} b_{n-j,1-\alpha}\, A\, U^j \right) = \epsilon\sqrt{\Delta t}\, B^{-1}\Phi\, z^{n-1}(\omega). \tag{19}$$
Analogously to (10), we easily see that the residual vector $Y^n$ is a Gaussian random vector,
$$Y^n \sim \mathcal{N}\bigl(0,\ \epsilon^2\,\Delta t\, B^{-1}\Phi\Phi^T B^{-T}\bigr). \tag{20}$$
We let $\bar{U}^n := [\bar{u}(\Delta_0 x, n\Delta_0 t), \bar{u}(2\Delta_0 x, n\Delta_0 t), \dots, \bar{u}((N_0^x-1)\Delta_0 x, n\Delta_0 t)]^T$ with $N_0^x = 1/\Delta_0 x$. Given the full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$, the corresponding observation of $Y^n$ is
$$\bar{Y}^n = \bar{U}^n - B_0^{-1}\left( \bar{U}^{n-1} + F^n\,\Delta_0 t + c(\Delta_0 t)^{\alpha} \sum_{j=0}^{n-1} b_{n-j,1-\alpha}\, A_0\, \bar{U}^j \right), \tag{21}$$
where $B_0 := I - c(\Delta_0 t)^{\alpha} A_0$ and
$$A_0 := \frac{1}{(\Delta_0 x)^2}\begin{bmatrix} -2 & 1 & & \\ 1 & -2 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & -2 \end{bmatrix} \in \mathbb{R}^{(N_0^x-1)\times(N_0^x-1)}. \tag{22}$$
The probability that we make such an observation is determined by the density
$$q_{\alpha,c,\epsilon}(y=\bar{Y}^n) = \frac{1}{(2\pi)^{(N_0^x-1)/2}\,\bigl|\epsilon^2\Delta_0 t\, B_0^{-1}\Phi\Phi^T B_0^{-T}\bigr|^{1/2}}\, \exp\!\left( -\frac{1}{2}\, \frac{(\bar{Y}^n)^T\bigl(B_0^{-1}\Phi\Phi^T B_0^{-T}\bigr)^{-1}\bar{Y}^n}{\epsilon^2\,\Delta_0 t} \right). \tag{23}$$
Then, the joint probability that we observe $\{\bar{U}^n\}_{n=1}^{N_0^t}$ ($N_0^t := 1/\Delta_0 t$) is
$$\tilde{L}(\alpha,c,\epsilon) = \prod_{n=1}^{N_0^t} q_{\alpha,c,\epsilon}(y=\bar{Y}^n). \tag{24}$$
The pseudo-likelihood estimate can be obtained by minimizing the negative logarithm of the above joint probability:
$$(\hat{\alpha},\hat{c},\hat{\epsilon}) = \arg\min_{\alpha,c,\epsilon}\,\bigl\{-\log\tilde{L}(\alpha,c,\epsilon)\bigr\} = \arg\min_{\alpha,c,\epsilon}\left\{ -\sum_{n=1}^{N_0^t} \log q_{\alpha,c,\epsilon}(y=\bar{Y}^n) \right\}. \tag{25}$$
After some rearrangement, we finally arrive at the following proposition for pseudo-likelihood estimation. The proof of Proposition 1 is given in Appendix A.
Proposition 1.
Given a full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$ (namely $\{\bar{U}^n\}_{n=0}^{N_0^t}$) for the stochastic time-fractional diffusion problem (2), the pseudo-likelihood estimates of the fractional order $\alpha$, diffusion coefficient $c$, and noise magnitude $\epsilon$ are
$$(\hat{\alpha},\hat{c},\hat{\epsilon}) = \arg\min_{\alpha,c,\epsilon}\,\bigl\{-\log\tilde{L}(\alpha,c,\epsilon)\bigr\} = \arg\min_{\alpha,c,\epsilon}\,\bigl\{ G_1(\alpha,c,\epsilon) + G_2(\epsilon) + C \bigr\}, \tag{26}$$
where
$$\begin{aligned}
G_1(\alpha,c,\epsilon) &:= -2N_0^t \sum_{i=1}^{N_0^x-1} \log(L_{ii}) + \frac{1}{2}\sum_{n=1}^{N_0^t} \frac{\|B_0 \bar{Y}^n\|_2^2}{\epsilon^2 N_0^x \Delta_0 t}, \\
G_2(\epsilon) &:= \frac{N_0^t(N_0^x-1)}{2}\,\log\bigl(\epsilon^2 N_0^x \Delta_0 t\bigr), \\
C &:= \frac{N_0^t(N_0^x-1)}{2}\,\log(2\pi),
\end{aligned} \tag{27}$$
and $\bar{Y}^n$ is defined in (21). The lower triangular matrix $L$ is the Cholesky factor of the matrix $B_0 := I - c(\Delta_0 t)^{\alpha} A_0$, i.e., $B_0 = LL^T$, and $L_{ii}$ is the $i$-th diagonal entry of $L$.
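Proposition 1 transcribes directly into code. The following sketch (function and variable names are ours; it uses dense linear algebra and recomputes the convolution weights per call for clarity rather than speed) evaluates $-\log\tilde{L} = G_1 + G_2 + C$ for a given observation array:

```python
import numpy as np
from scipy.linalg import cholesky, solve

def neg_log_likelihood(params, Ubar, F, dx, dt):
    """G1 + G2 + C of Proposition 1.

    Ubar, F: arrays of shape (Nx-1, Nt+1) holding the observed concentration
    and the source term at the interior grid points and all time levels.
    """
    alpha, c, eps = params
    Nx, Nt = round(1.0 / dx), round(1.0 / dt)
    A0 = (np.diag(-2.0 * np.ones(Nx - 1))
          + np.diag(np.ones(Nx - 2), 1)
          + np.diag(np.ones(Nx - 2), -1)) / dx**2
    B0 = np.eye(Nx - 1) - c * dt**alpha * A0
    L = cholesky(B0, lower=True)            # B0 = L L^T
    b = np.empty(Nt + 1)                    # weights b_{k,1-alpha}
    b[0] = 1.0
    for k in range(1, Nt + 1):
        b[k] = (1.0 - (2.0 - alpha) / k) * b[k - 1]
    quad = 0.0
    for n in range(1, Nt + 1):
        memory = c * dt**alpha * sum(b[n - m] * (A0 @ Ubar[:, m]) for m in range(n))
        Yn = Ubar[:, n] - solve(B0, Ubar[:, n - 1] + F[:, n] * dt + memory)  # Eq. (21)
        quad += np.sum((B0 @ Yn)**2)        # ||B0 Ybar^n||_2^2
    G1 = -2.0 * Nt * np.sum(np.log(np.diag(L))) + 0.5 * quad / (eps**2 * Nx * dt)
    G2 = 0.5 * Nt * (Nx - 1) * np.log(eps**2 * Nx * dt)
    C = 0.5 * Nt * (Nx - 1) * np.log(2.0 * np.pi)
    return G1 + G2 + C
```

This scalar function of $(\alpha, c, \epsilon)$ is the quantity that the optimization runs in Section 4 minimize.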

3.2.3. Pseudo-Likelihood Estimation for Partial Observation

The pseudo-likelihood estimate for the partial observation case is given in the following Proposition 2. Its proof is the same as that of Proposition 1, except that the spatio-temporal steps $\Delta_0 x, \Delta_0 t$ are replaced by $r_x\Delta_0 x, r_t\Delta_0 t$.
Proposition 2.
Given a partial observation $\bar{U}^{r_x\Delta_0 x,\, r_t\Delta_0 t}$ with $r_x > 1$ and $r_t > 1$ for the stochastic time-fractional diffusion problem (2), the pseudo-likelihood estimates of the fractional order $\alpha$, diffusion coefficient $c$, and noise magnitude $\epsilon$ are
$$(\hat{\alpha},\hat{c},\hat{\epsilon}) = \arg\min_{\alpha,c,\epsilon}\,\bigl\{-\log\tilde{L}_r(\alpha,c,\epsilon)\bigr\} = \arg\min_{\alpha,c,\epsilon}\,\bigl\{ G_{r,1}(\alpha,c,\epsilon) + G_{r,2}(\epsilon) + C_r \bigr\}, \tag{28}$$
where
$$\begin{aligned}
G_{r,1}(\alpha,c,\epsilon) &:= -\frac{2N_0^t}{r_t} \sum_{i=1}^{N_0^x/r_x - 1} \log(L_{r,ii}) + \frac{1}{2}\sum_{n=1}^{N_0^t/r_t} \frac{\|B_r \bar{Y}_r^n\|_2^2}{\epsilon^2\, \frac{N_0^x}{r_x}\, r_t \Delta_0 t}, \\
G_{r,2}(\epsilon) &:= \frac{(N_0^t/r_t)(N_0^x/r_x - 1)}{2}\,\log\Bigl(\epsilon^2\, \frac{N_0^x}{r_x}\, r_t\Delta_0 t\Bigr), \\
C_r &:= \frac{(N_0^t/r_t)(N_0^x/r_x - 1)}{2}\,\log(2\pi),
\end{aligned} \tag{29}$$
and $\bar{Y}_r^n$ is defined as
$$\bar{Y}_r^n = \bar{U}_r^n - B_r^{-1}\left( \bar{U}_r^{n-1} + F_r^n\, r_t\Delta_0 t + c(r_t\Delta_0 t)^{\alpha} \sum_{j=0}^{n-1} b_{n-j,1-\alpha}\, A_r\, \bar{U}_r^j \right). \tag{30}$$
The vectors $\bar{U}_r^n$ and $F_r^n$ are defined as
$$\begin{aligned}
\bar{U}_r^n &:= \bigl[\bar{u}(r_x\Delta_0 x,\, n r_t\Delta_0 t),\ \bar{u}(2r_x\Delta_0 x,\, n r_t\Delta_0 t),\ \dots,\ \bar{u}((N_r^x-1) r_x\Delta_0 x,\, n r_t\Delta_0 t)\bigr]^T, \\
F_r^n &:= \bigl[f(r_x\Delta_0 x,\, n r_t\Delta_0 t),\ f(2r_x\Delta_0 x,\, n r_t\Delta_0 t),\ \dots,\ f((N_r^x-1) r_x\Delta_0 x,\, n r_t\Delta_0 t)\bigr]^T,
\end{aligned} \tag{31}$$
with $N_r^x := N_0^x / r_x$. The matrices $A_r$ and $B_r$ are defined as
$$A_r := \frac{1}{(r_x\Delta_0 x)^2}\begin{bmatrix} -2 & 1 & & \\ 1 & -2 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & -2 \end{bmatrix} \in \mathbb{R}^{(N_r^x-1)\times(N_r^x-1)}, \tag{32}$$
and $B_r := I - c(r_t\Delta_0 t)^{\alpha} A_r$, respectively. The lower triangular matrix $L_r$ is the Cholesky factor of $B_r$, and $L_{r,ii}$ is the $i$-th diagonal entry of $L_r$.

4. Numerical Results

In this section, we consider the problem (2) with the source term
$$f(x,t) = 2t\,x^2(1-x)^2 - \frac{2\,t^{1+\alpha}}{\Gamma(2+\alpha)}\bigl(2 - 12x + 12x^2\bigr), \tag{33}$$
and zero initial condition $\eta(x)\equiv 0$. The analytical solution to the deterministic problem ($\epsilon=0$) is $u_d(x,t) = t^2 x^2(1-x)^2$, which is the exact mean of the stochastic solutions. We assume that the spatial and temporal steps for full observation are $\Delta_0 x = \Delta_0 t = 2^{-9}$ and fabricate the full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$ using the true parameters $\alpha^* = 0.5$, $c^* = 1$, $\epsilon^* = 0.1$ as well as a fixed sample point $\omega^0$ in (18). We plot in Figure 1 the full observation $\bar{U}^{\Delta_0 x,\Delta_0 t}$ for the deterministic case ($\epsilon=0$) and the noisy case ($\epsilon=0.1$); the other parameters $\alpha^*$ and $c^*$ are fixed. We see that the driving noise is already strong enough to produce negative values in the numerical solution, while the exact mean function $u_d$ is always non-negative. Note that a specific sample of the noise $\epsilon Z(\omega^0)$ is drawn and displayed in Figure 2. In Appendix B, we present the mean and standard deviation of 1000 numerical solutions to the forward problem (2) as well as the effect of the noise magnitude $\epsilon$ on the numerical solutions.
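As a quick consistency check of the source term above: with $u_d = t^2 x^2(1-x)^2$ and $c=1$, we have $\partial_t u_d = 2t\,x^2(1-x)^2$, and the Caputo derivative of the power function gives $\partial_t^{1-\alpha} t^2 = 2t^{1+\alpha}/\Gamma(2+\alpha)$, so the stated $f$ follows once $\frac{d^2}{dx^2}\bigl[x^2(1-x)^2\bigr] = 2 - 12x + 12x^2$. The spatial factor can be verified with NumPy's polynomial class:

```python
import numpy as np

# Spatial part of u_d: x**2 (1-x)**2 = x**2 - 2 x**3 + x**4 (coefficients low -> high)
p = np.polynomial.Polynomial([0.0, 0.0, 1.0, -2.0, 1.0])
p_xx = p.deriv(2)                     # second spatial derivative
expected = np.polynomial.Polynomial([2.0, -12.0, 12.0])   # 2 - 12x + 12x**2
assert np.allclose(p_xx.coef, expected.coef)

# Temporal parts: d/dt t**2 = 2t, and the Caputo derivative of order 1-alpha of
# t**2 is 2 t**(1+alpha)/Gamma(2+alpha) (standard power-function formula), so
# f = 2t x**2(1-x)**2 - (2 t**(1+alpha)/Gamma(2+alpha)) (2 - 12x + 12x**2) with c = 1.
```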
Using the full observation, we next discuss the performance of the pseudo-likelihood approach for one-parameter, two-parameter, and three-parameter estimation.

4.1. One-Parameter Estimation

We first consider the full observation $\bar{U}^{2^{-9},2^{-9}}$ shown in the right panel of Figure 1, which corresponds to the noise sample point $\omega^0$. We plot the exact negative log-likelihood function $-\log\tilde{L}(\alpha,c,\epsilon)$ by varying one of the three parameters $(\alpha,c,\epsilon)$ while fixing the other two at their true values. In the left panel of Figure 3, we plot a red dotted line corresponding to $-\log\tilde{L}(\alpha,c^*,\epsilon^*)$ for $\alpha = 0.1, 0.2, \dots, 0.9$ and $\Delta_0 x = \Delta_0 t = 2^{-9}$. We see that the optimal $\alpha$ minimizing the negative log-likelihood is 0.5, which agrees with the true value $\alpha^* = 0.5$. We next consider partial observations obtained by increasing the spatial step $\Delta_0 x$ to $4\Delta_0 x$, $16\Delta_0 x$, and $64\Delta_0 x$, and we plot the corresponding exact negative log-likelihoods with blue, green, and cyan dotted lines. The optimal $\alpha$ does not change as $\Delta x$ increases. This indicates that the spatial step does not much affect the estimation of $\alpha$ when the other two parameters are fixed at their true values. This is not the case for estimating $c$, however. In the right panel of Figure 3, the optimal $c$ also matches the true value $c^* = 1$ in the full observation case, whereas the optimal $c$'s in the partial observation cases begin to shift to the right of the true value; furthermore, the larger the spatial step, the more the optimal $c$ shifts to the right. This implies that the estimation of $c$ is more sensitive to an increasing spatial step than the estimation of $\alpha$.
We have considered the effect of the spatial step on the optimal $\alpha$ and $c$ for a fixed temporal step. We next consider the opposite, namely, the effect of the temporal step for a fixed spatial step. From Figure 4, we see that the estimates of $\alpha$ and $c$ are both sensitive to the varying temporal steps. This suggests that when recording concentration data with sensors, we can place a small number of sensors in space, but we must ensure that the data are recorded with a high sampling frequency in time.
Finally, we consider the sensitivity of the estimated $\alpha$ and $c$ to the noise magnitude $\epsilon$. We see from Figure 5 that for a full observation, the noise magnitude has no impact on the optimal $\alpha$ and $c$. In contrast, Figure 6 illustrates that for the partial observation with $r_x = 16$ and $r_t = 32$, the optimal $\alpha$ and $c$ can differ from their true values; moreover, the larger the noise, the more likely the shift occurs.

4.2. Two-Parameter Estimation

We now estimate the parameters $\alpha$ and $c$ jointly from the full observation shown in the right panel of Figure 1. In the first case, we fix the temporal step to $\Delta_0 t = 2^{-9}$ and change the spatial step. Figure 7 plots how the optimal $(\alpha, c)$ (the red disk in the figure) moves in the contour plot as the spatial step varies. We see that for a full observation, the optimal parameters always match the true parameters, whereas for partial observations, the diffusion coefficient $c$ is more difficult to estimate than the fractional order $\alpha$. In the second case, we fix the spatial step to $\Delta_0 x = 2^{-9}$ but change the temporal step. From Figure 8, we see that increasing the temporal step makes both the optimal $\alpha$ and $c$ deviate from their true values, which implies, just as in the one-parameter estimation case, that a small temporal step is preferable to a small spatial step when high accuracy is expected from a partial observation. In the third case, we fix the spatio-temporal steps while varying the noise magnitude $\epsilon$. In Figure 9, we fix $r_x = r_t = 1$ (a full observation) and see that the noise magnitude does not affect the optimal parameters, the same as in the one-parameter estimation case. In Figure 10, we consider a partial observation with sampling ratios $r_x = 16$ and $r_t = 8$ and observe that the larger the noise magnitude, the more the optimal parameters deviate from the true parameters.
So far, we have estimated parameters using a trial-and-error approach: we computed the exact negative log-likelihood for all parameters on a uniform grid in the parameter space and then selected the optimal parameter among those on the grid. Sometimes, however, the true parameter, say $\alpha^* = 0.345$, lies only on a rather dense grid, and the trial-and-error approach becomes time-consuming. We therefore utilize an optimization algorithm to find the optimal parameters. In this paper, we employ a quasi-Newton method, the L-BFGS-B algorithm [26], to minimize the negative log-likelihood $-\log\tilde{L}(\alpha,c,\epsilon)$. To implement the algorithm, we adopt the optimization routine provided in SciPy (see scipy.optimize.minimize()) and set the optional parameters of the routine to their default values.
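In SciPy, the corresponding call looks like the following sketch, where the Rosenbrock test function stands in for the (much more expensive) negative log-likelihood; the ftol and gtol options shown are the tolerances discussed in Section 4.3 for nearly flat likelihoods:

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([0.8, 0.5, 0.5])   # initial guess, as in the three-parameter runs

# default iteration-termination tolerances
res_default = minimize(rosen, x0, method='L-BFGS-B')

# tightened tolerances, helpful when the objective is nearly flat near its minimizer
res_tight = minimize(rosen, x0, method='L-BFGS-B',
                     options={'ftol': 1e-12, 'gtol': 1e-9})
```

Tightening ftol and gtol only makes the solver iterate longer before declaring convergence; it never changes the objective itself.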
As different full observations yield different parameter estimates, we consider 20 different full observations obtained by solving the forward problem under different noise sample points $\omega^k$ for $k=0,1,\dots,19$. In Table 1, we show the mean values of the 20 pairs of estimates $(\hat{\alpha}(\omega^k), \hat{c}(\omega^k))$, where the noise sample point $\omega^k$ in parentheses emphasizes the dependence of each estimate on the sample point. Table 2 displays the standard deviation of these 20 pairs of estimates.
From Table 1, we see that keeping $r_t$ small yields more accurate estimates than keeping $r_x$ small, which accords with our comment on Figure 8. Moreover, a full observation with $r_x = r_t = 1$ again achieves the highest accuracy of parameter estimation, which validates our approach. From Table 2, we see that the standard deviation of the diffusion coefficient is clearly larger than that of the fractional order, implying that the former is more difficult to estimate than the latter; this matches our observation in the one-parameter estimation case. Additionally, Table 2 shows that the more accurate the mean of the estimates, the smaller their standard deviation. This suggests that when the true parameters are unknown, we can judge whether the estimated parameters are reliable by checking the magnitude of the standard deviation.

4.3. Three-Parameter Estimation

Table 3 gives the estimated parameters from the full observation shown in the right panel of Figure 1. The initial guess for the three parameters in the L-BFGS-B algorithm is $(\alpha_0, c_0, \epsilon_0) = (0.8, 0.5, 0.5)$, while the true parameters are $(\alpha^*, c^*, \epsilon^*) = (0.5, 1.0, 0.1)$. As in the one-parameter and two-parameter estimation cases, the full observation case ($r_x = r_t = 1$) again achieves the highest estimation accuracy. Jointly estimating three parameters from a partial observation, however, may not be reliable even when $r_t$ and $r_x$ are both small. For example, for $r_t = 1$ and $r_x = 2$, the estimated diffusion coefficient is 1.523, while the true value is only 1.0. This indicates that in this case, we need to approach a full observation by keeping the spatio-temporal steps as small as possible in order to attain high estimation accuracy. Among the three parameters, the diffusion coefficient is again the most difficult to estimate, which agrees with our observations in the one-parameter and two-parameter estimation cases.
We next consider the effect of the true fractional order on the parameter estimation. We have fixed the true fractional order to $\alpha^* = 0.5$ so far, but now we examine whether we can still recover the parameters when $\alpha^*$ is changed. We freeze the spatio-temporal steps to $\Delta_0 x = \Delta_0 t = 2^{-9}$ and fix the sample point to $\omega^0$ as before, but generate the full observations $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.1}$ and $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.9}$ using $\alpha^* = 0.1$ and $\alpha^* = 0.9$, respectively, where the superscript of $\bar{U}$ represents the dependence on the true fractional order. We have seen from Table 3 that the estimated parameters for $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.5}$ are $(\hat{\alpha},\hat{c},\hat{\epsilon}) = (0.500, 1.038, 0.104)$. We observe that the estimated parameters for the other two full observations are sufficiently accurate as well, i.e., $(\hat{\alpha},\hat{c},\hat{\epsilon}) = (0.100, 1.035, 0.104)$ for $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.1}$ and $(\hat{\alpha},\hat{c},\hat{\epsilon}) = (0.900, 1.002, 0.100)$ for $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.9}$. We point out that for a smaller $\alpha^*$, say 0.1 here, the negative log-likelihood appears flatter in the neighborhood of the global minimizer $\alpha^* = 0.1$ than for a larger $\alpha^*$, say 0.5. For example, for $\bar{U}_{\Delta_0 x,\Delta_0 t}^{0.1}$, the negative log-likelihood at the parameter triple $(0.0999, 1.528, 0.1526)$ is approximately $-3{,}335{,}910$, while the negative log-likelihood at the true parameters $(0.1, 1.0, 0.1)$ is approximately $-3{,}335{,}916$. In the neighborhood of the true parameters, the negative log-likelihood thus changes by only $\frac{|-3{,}335{,}916 - (-3{,}335{,}910)|}{3{,}335{,}916} \approx 0.0002\%$. Such a minor variation can challenge optimization algorithms. For the L-BFGS-B algorithm, if we set the iteration termination tolerances ftol and gtol to larger values, say the default values ftol = 1e-9 and gtol = 1e-5, we arrive at the aforementioned parameters $(0.0999, 1.528, 0.1526)$; resetting ftol = 1e-12 and gtol = 1e-9 gives the much better estimates $(0.100, 1.035, 0.104)$.
Note that for the parameter estimation in the other two full observations, we still use the default tolerances for the sake of computational cost.
On the other hand, the negative log-likelihood can also become flatter near the global minimizer when the final observation time decreases. The final observation time considered previously is $T = 1$; we now reset it to $T = 1/4$ and $T = 1/16$, with the spatio-temporal steps and noise sample point fixed as before. The true fractional order is taken to be 0.5. We first consider the case of default ftol and gtol. For $T = 1/4$, the estimated parameters are $(0.5008, 1.0932, 0.1088)$, whose accuracy is acceptable, whereas for $T = 1/16$, the estimated parameters are $(0.4978, 1.3775, 0.1387)$, which are not that accurate. Switching to ftol = 1e-12 and gtol = 1e-9, the estimates for $T = 1/4$ do not change, but those for $T = 1/16$ improve to the much better values $(0.4962, 1.0268, 0.1045)$. Therefore, for a small final observation time, it is safer to keep using small iteration termination tolerances.
Here are some comments on the time complexity of implementing the pseudo-likelihood approach proposed in this paper. The time complexity is
$$
M\left[\,O\!\left(\Bigl(\tfrac{N_0^x}{r_x}\Bigr)^{2}\Bigl(\tfrac{N_0^t}{r_t}\Bigr)^{2}\right)
+O\!\left(\Bigl(\tfrac{N_0^x}{r_x}\Bigr)^{3}\right)\right],
$$
where M is the number of evaluations of the negative log-likelihood function in the L-BFGS-B algorithm. The number M is affected by the number of parameters to be estimated (i.e., the dimensionality of the parameter space) and by the iteration termination tolerances mentioned above. For example, for a full observation with default tolerances, a two-parameter estimation requires M = 45, while a three-parameter estimation requires M = 140. The term (N_0^t/r_t)^2 comes from the time discretization of the fractional time derivative ∂_t^{1−α} A Ū(t), and (N_0^x/r_x)^2 arises from the multiplication of the difference matrix A with the observation vector Ū^n. The last term (N_0^x/r_x)^3 is due to the inversion and Cholesky decomposition of B. In our computational experiment, it took 7523 s to estimate (α, c, ϵ) for the full observation case (ω_0), in which M = 140, N_0^x = N_0^t = 512, and r_x = r_t = 1. The code was run on a laptop workstation with an Intel(R) Core(TM) i7-10850H CPU @ 2.70 GHz and 64 GB of memory.
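The complexity estimate above can be turned into a rough cost model for comparing full and partial observations. The sketch below is purely illustrative: it ignores constant factors, and `estimated_cost` is a hypothetical helper, not code from the paper.

```python
# Rough operation-count model for one run of the pseudo-likelihood estimator,
# following the estimate M [ O((N0x/rx)^2 (N0t/rt)^2) + O((N0x/rx)^3) ].
def estimated_cost(M, N0x, N0t, rx=1, rt=1):
    nx, nt = N0x // rx, N0t // rt   # effective numbers of sampled grid points
    return M * (nx**2 * nt**2 + nx**3)

# Full observation (rx = rt = 1) versus a sparser partial observation.
full = estimated_cost(140, 512, 512)
partial = estimated_cost(140, 512, 512, rx=4, rt=4)
```

For instance, subsampling by r_x = r_t = 4 reduces the dominant term by a factor of roughly 4^4 = 256, which suggests why partial observations are much cheaper to process.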
We provide on GitHub (https://github.com/Derek2021Pang/pseudo_likelihood_SFPDE) two Python code examples. The first is a forward solver, which generates a full observation for given spatio-temporal steps and a given white noise sample; the second is a pseudo-likelihood parameter estimator, which estimates the parameters from the full observation using the L-BFGS-B algorithm.

5. Concluding Remarks

In this paper, we propose a pseudo-likelihood approach to estimating the parameters of a one-dimensional stochastic time-fractional diffusion problem (2). We consider full observation and partial observation cases, in which different amounts of information are available. For the full observation case, our approach can accurately estimate the equation (or model) parameters in one-parameter, two-parameter, and three-parameter estimation problems. For the partial observation case, the accuracy of the estimated parameters is affected by the sparsity of the observation data, which is controlled by the spatio-temporal sampling steps Δx and Δt (or r_x and r_t). Our computational experiments yield the following observations:
(i) The larger the spatio-temporal sampling steps, the lower the accuracy of the estimated parameters, which is intuitive.
(ii) For partial observation, keeping the temporal sampling step small is more important for parameter estimation accuracy than keeping the spatial step small.
(iii) Among the three parameters estimated, namely the fractional order, the diffusion coefficient, and the noise magnitude, the diffusion coefficient is the most difficult to estimate, since it is the most sensitive to varying spatio-temporal steps in partial observation.
(iv) When multiple observations, corresponding to different realizations of the driving noise, are available to produce multiple groups of estimated parameters, a high accuracy of the mean of the estimated parameters usually goes together with a low standard deviation.
(v) Estimating more parameters jointly leads to larger variability of the estimated parameters as the spatio-temporal steps increase. Making the spatio-temporal steps as small as possible is therefore recommended when jointly estimating a large number of parameters.
We need to point out, however, that a limitation of the current approach is that the observed data must lie on a uniform grid in space and time; otherwise, we cannot compute the negative log-likelihood function using the finite difference schemes of the forward solver. The approach thus cannot handle scattered observation data.
In practice, when confronted with discrete observation data, three points deserve attention. First, we do not know in advance whether the observation is a full observation; the judgment largely depends on which equation model we consider, which discretization scheme we adopt, and which spatio-temporal steps we take. As we have pointed out, once the right model and the right discretization are chosen, one can obtain a good parameter estimation if the observation data is dense in time but sparse in space. Second, we may consider the model selection problem when we have no preference for a specific model but several candidate models. To select a good model, we can adopt a scoring strategy: score each candidate model with the minimum of the corresponding negative log-likelihood and choose the model with the lowest score. Third, different discretization schemes lead to different pseudo-likelihood functions L̃(·). Thus, for a given model, it is sensible to select a discretization scheme whose pseudo-likelihood estimation is more robust to the spatio-temporal sampling steps.
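The scoring strategy for model selection can be sketched as follows. The candidate likelihoods below are toy stand-ins, and `select_model` is a hypothetical helper, not code from the paper.

```python
from scipy.optimize import minimize

# Score each candidate model by the minimum of its negative log-likelihood
# and select the model with the lowest score.
def select_model(candidates, initial_guesses):
    scores = {}
    for name, neg_log_lik in candidates.items():
        result = minimize(neg_log_lik, initial_guesses[name], method="L-BFGS-B")
        scores[name] = result.fun        # minimized negative log-likelihood
    best = min(scores, key=scores.get)   # lowest score wins
    return best, scores

# Two toy "models" whose minimized scores differ (illustrative only).
candidates = {
    "model_A": lambda th: (th[0] - 0.5) ** 2 + 1.0,
    "model_B": lambda th: (th[0] - 0.3) ** 2 + 0.2,
}
best, scores = select_model(candidates, {"model_A": [0.0], "model_B": [0.0]})
```

Here model_B attains the lower minimized score and would be selected.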
Unlike the maximum likelihood estimators presented in [8,11], we have not carried out any theoretical analysis of the asymptotic properties of our pseudo-likelihood estimators; this is left for future work. It is straightforward to extend the current pseudo-likelihood approach to two- and three-dimensional problems. However, extending the approach to other types of equations, such as space-time fractional stochastic PDEs (1) and even stochastic versions of variable-order fractional PDEs [27,28], is nontrivial: reliable discretization schemes need to be developed for those equations, and a proper parametrization is required for the variable fractional orders. Furthermore, we will consider in the future the parameter estimation for fractional diffusion equations perturbed by fractional Brownian motion and Lévy motion.

Author Contributions

Conceptualization, G.P. and W.C.; methodology, G.P.; software, G.P.; validation, G.P. and W.C.; formal analysis, G.P.; investigation, G.P.; resources, G.P. and W.C.; data curation, G.P.; writing—original draft preparation, G.P.; writing—review and editing, G.P. and W.C.; visualization, G.P.; supervision, G.P.; project administration, G.P.; funding acquisition, G.P. and W.C. Both authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the National Natural Science Foundation of China (11671083, 12071073).

Acknowledgments

The authors would like to thank the anonymous referees for providing comments and suggestions that helped to improve the paper. The first author also thanks the support by the Research Foundation for Young Scholars, Southeast University.

Conflicts of Interest

The authors declare no conflict of interest.


Appendix A. Proof of Proposition 1

Proof. 
We first simplify the expression of the density (23) by proving $\Phi\Phi^T = N_0^x I$. In fact, we only need to prove $\sum_{j=1}^{N_0^x}\phi_j^2(x_k) \equiv N_0^x$ for $k = 1, 2, \dots, N_0^x - 1$. Recalling that $\phi_j(x) = \sqrt{2}\sin(j\pi x)$, we have
$$
\begin{aligned}
2\sum_{j=1}^{N_0^x}\sin^2(j\pi x)
&= \sum_{j=1}^{N_0^x}\bigl(1-\cos(2j\pi x)\bigr)
 = N_0^x-\operatorname{Re}\sum_{j=1}^{N_0^x}e^{2j\pi x i}\qquad(\text{with } i=\sqrt{-1})\\
&= N_0^x-\operatorname{Re}\frac{e^{2\pi x i}\bigl(1-e^{2\pi x N_0^x i}\bigr)}{1-e^{2\pi x i}}
 = N_0^x-\operatorname{Re}\frac{e^{2\pi x i}\,e^{\pi x N_0^x i}\bigl(e^{-\pi x N_0^x i}-e^{\pi x N_0^x i}\bigr)}{e^{\pi x i}\bigl(e^{-\pi x i}-e^{\pi x i}\bigr)}\\
&= N_0^x-\operatorname{Re}\left[e^{\pi x(N_0^x+1)i}\,\frac{\sin(\pi x N_0^x)}{\sin(\pi x)}\right]
 = N_0^x-\cos\bigl(\pi x(N_0^x+1)\bigr)\frac{\sin(\pi x N_0^x)}{\sin(\pi x)}.
\end{aligned}
$$
Noting that $x_k = k/N_0^x$, we have in the numerator of the last expression that $\sin(\pi x_k N_0^x) = \sin(k\pi) \equiv 0$. Thus, $\sum_{j=1}^{N_0^x}\phi_j^2(x_k) \equiv N_0^x$ and $\Phi\Phi^T = N_0^x I$.
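The identity $\Phi\Phi^T = N_0^x I$ can also be checked numerically. The sketch below assumes $\Phi$ has entries $\Phi_{kj} = \phi_j(x_k)$ with $\phi_j(x) = \sqrt{2}\sin(j\pi x)$ and $x_k = k/N_0^x$, which matches the setup of the proof.

```python
import numpy as np

# Numerical check of Phi Phi^T = N0x * I for the scaled sine basis.
N0x = 8
k = np.arange(1, N0x)        # interior grid indices k = 1, ..., N0x - 1
j = np.arange(1, N0x + 1)    # basis indices j = 1, ..., N0x
x = k / N0x                  # grid points x_k = k / N0x
# Phi[k-1, j-1] = phi_j(x_k) = sqrt(2) * sin(j * pi * x_k)
Phi = np.sqrt(2) * np.sin(np.pi * np.outer(x, j))
```

The product `Phi @ Phi.T` then equals `N0x` times the identity matrix of size $N_0^x - 1$, up to round-off.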
Inserting the simplified version of (23) into the negative log-likelihood in (25) yields
$$
\begin{aligned}
-\log\tilde{L}(\alpha,c,\epsilon)
&=\sum_{n=1}^{N_0^t}\left[\frac{N_0^x-1}{2}\log(2\pi)
 +\frac{1}{2}\log\bigl|\epsilon^2\Delta_0^t N_0^x\,B_0^{-1}B_0^{-T}\bigr|
 +\frac{1}{2}\,\frac{(\bar{Y}^n)^T B_0^T B_0\,\bar{Y}^n}{\epsilon^2\Delta_0^t N_0^x}\right]\\
&=\frac{N_0^t(N_0^x-1)}{2}\log(2\pi)
 +\frac{N_0^t}{2}\left[\log\bigl((\epsilon^2\Delta_0^t N_0^x)^{N_0^x-1}\bigr)
 +\log\bigl|B_0^{-1}B_0^{-T}\bigr|\right]
 +\frac{1}{2}\sum_{n=1}^{N_0^t}\frac{(B_0\bar{Y}^n)^T(B_0\bar{Y}^n)}{\epsilon^2\Delta_0^t N_0^x}.
\end{aligned}
$$
Denoting by $L$ the Cholesky factor of $B_0$ (so that $B_0 = LL^T$), we obtain
$$
\log\bigl|B_0^{-1}B_0^{-T}\bigr|
=\log\bigl|(LL^T)^{-1}(LL^T)^{-T}\bigr|
=\log|L^{-1}|^4
=-\log|L|^4
=-4\log|L|
=-4\log\prod_{i=1}^{N_0^x-1}L_{ii}
=-4\sum_{i=1}^{N_0^x-1}\log L_{ii}.
$$
We thereby bypass computing the determinant of a matrix, since for a large matrix (i.e., a large $N_0^x$) the direct computation of its determinant can be problematic due to round-off errors. Substituting the above equation into (A2), we arrive at
$$
-\log\tilde{L}(\alpha,c,\epsilon)
=\frac{N_0^t(N_0^x-1)}{2}\log(2\pi)
+\frac{N_0^t(N_0^x-1)}{2}\log(\epsilon^2\Delta_0^t N_0^x)
-2N_0^t\sum_{i=1}^{N_0^x-1}\log L_{ii}
+\frac{1}{2}\sum_{n=1}^{N_0^t}\frac{\|B_0\bar{Y}^n\|_2^2}{\epsilon^2\Delta_0^t N_0^x},
$$
which is exactly (29). Following maximum likelihood estimation theory [29], the optimal parameters should minimize the negative log-likelihood $-\log\tilde{L}(\alpha,c,\epsilon)$. □
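Formula (29) translates directly into code. The sketch below is an illustrative transcription, not the paper's implementation: the argument names and the convention that `Ybar` stacks the vectors $\bar{Y}^n$ row-wise are assumptions, and it uses the Cholesky log-determinant trick from the proof in place of a direct determinant.

```python
import numpy as np

def neg_log_likelihood(B0, Ybar, eps, dt, N0x):
    """Negative log-likelihood (29). B0 is the (N0x-1) x (N0x-1) SPD matrix;
    Ybar has shape (N0t, N0x-1), with row n holding the vector Ybar^n."""
    N0t, m = Ybar.shape                   # m = N0x - 1
    L = np.linalg.cholesky(B0)            # B0 = L L^T
    quad = np.sum((Ybar @ B0.T) ** 2)     # sum_n ||B0 Ybar^n||_2^2
    scale = eps**2 * dt * N0x
    return (0.5 * N0t * m * np.log(2 * np.pi)
            + 0.5 * N0t * m * np.log(scale)
            - 2.0 * N0t * np.sum(np.log(np.diag(L)))   # Cholesky log-det trick
            + 0.5 * quad / scale)
```

Minimizing this function over $(\alpha, c, \epsilon)$, with $B_0$ reassembled for each trial $(\alpha, c)$, recovers the estimation procedure described in the paper.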

Appendix B. Numerical Solution to Forward Problem

We solve the forward problem (2) using 1000 realizations of white noise, namely Z(ω_k), k = 0, 1, …, 999. We take the equation parameters α* = 0.5, c* = 1.0, ϵ* = 0.1, the spatial step Δ_0^x = 2^{-9}, and the temporal step Δ_0^t = 2^{-9}. Figure A1 compares the exact mean function with the computed mean of the 1000 numerical solutions driven by these realizations. The left panel of Figure A2 shows the exact and computed means at t = 1 together with a one-standard-deviation band; the right panel displays the numerical solution at t = 1 for two specific realizations of white noise, ω_0 and ω_1. We can see that the regularity of the numerical solution is low. To demonstrate the effect of the noise magnitude on the numerical solutions, we plot the numerical solutions for four increasing noise magnitudes in Figure A3.
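The ensemble statistics in this appendix can be computed with a helper along the following lines. Here `solve_forward` is a hypothetical stand-in for the forward solver (the toy used below simply draws Gaussian vectors); the actual solver is the one provided in the GitHub repository.

```python
import numpy as np

# Monte Carlo post-processing over noise realizations, assuming solve_forward
# returns the numerical solution (e.g., at t = 1) on the spatial grid for a
# given noise seed.
def ensemble_stats(solve_forward, n_samples=1000):
    samples = np.stack([solve_forward(seed=k) for k in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Toy stand-in solver: i.i.d. standard normal "solutions" on a 4-point grid.
toy_solver = lambda seed: np.random.default_rng(seed).normal(0.0, 1.0, 4)
mean, std = ensemble_stats(toy_solver, n_samples=1000)
```

With the real solver plugged in, `mean` and `std` give the computed mean curve and the standard-deviation band plotted in Figures A1 and A2.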
Figure A1. Solution to forward problem: Exact mean (left) and computed mean (right) of 1000 solutions under 1000 realizations of white noise.
Figure A2. Solution to forward problem: Exact and computed means of solutions at t = 1 and one standard deviation band (left), and solutions at t = 1 under two realizations of white noise (right).
Figure A3. Solution to forward problem: Effect of noise magnitude ϵ on numerical solution for fixed realization of white noise ω 0 .

References

  1. Mijena, J.B.; Nane, E. Space–time fractional stochastic partial differential equations. Stoch. Process. Appl. 2015, 125, 3301–3326.
  2. Anh, V.V.; Leonenko, N.N.; Ruiz-Medina, M.D. Fractional-in-time and multifractional-in-space stochastic partial differential equations. Fract. Calc. Appl. Anal. 2016, 19, 1434–1459.
  3. Gunzburger, M.; Li, B.Y.; Wang, J.L. Sharp convergence rates of time discretization for stochastic time-fractional PDEs subject to additive space-time white noise. Math. Comput. 2019, 88, 1715–1741.
  4. Bolin, D.; Kirchner, K.; Kovács, M. Numerical solution of fractional elliptic stochastic PDEs with spatial white noise. IMA J. Numer. Anal. 2020, 40, 1051–1073.
  5. Anh, V.V.; Olenko, A.; Wang, Y.G. Fractional stochastic partial differential equation for random tangent fields on the sphere. arXiv 2021, arXiv:2107.03717.
  6. Mohammed, W.W. Approximate solutions for stochastic time-fractional reaction–diffusion equations with multiplicative noise. Math. Methods Appl. Sci. 2021, 44, 2140–2157.
  7. Xia, D.F.; Yan, L.T. Some properties of the solution to fractional heat equation with a fractional Brownian noise. Adv. Differ. Equ. 2017, 107, 1–16.
  8. Huebner, M.; Rozovskii, B.L. On asymptotic properties of maximum likelihood estimators for parabolic stochastic PDE's. Probab. Theory Relat. Fields 1995, 103, 143–163.
  9. Bishwal, J.P.N. Parameter Estimation in Stochastic Differential Equations; Springer: Berlin/Heidelberg, Germany, 2007.
  10. Rao, B.L.S.P. Statistical Inference for Fractional Diffusion Processes; John Wiley & Sons: Hoboken, NJ, USA, 2011.
  11. Huebner, M.; Khasminskii, R.; Rozovskii, B.L. Two examples of parameter estimation for stochastic partial differential equations. In Stochastic Processes: A Festschrift in Honour of Gopinath Kallianpur; Cambanis, S., Ghosh, J.K., Karandikar, R.L., Sen, P.K., Eds.; Springer: New York, NY, USA, 1993; pp. 149–160.
  12. Cialenco, I.; Lototsky, S.V.; Pospíšil, J. Asymptotic properties of the maximum likelihood estimator for stochastic parabolic equations with additive fractional Brownian motion. Stoch. Dyn. 2009, 9, 169–185.
  13. Cialenco, I. Parameter estimation for SPDEs with multiplicative fractional noise. Stoch. Dyn. 2010, 10, 561–576.
  14. Geldhauser, C.; Valdinoci, E. Optimizing the fractional power in a model with stochastic PDE constraints. Adv. Nonlinear Stud. 2018, 18, 649–669.
  15. Aster, R.C.; Borchers, B.; Thurber, C.H. Parameter Estimation and Inverse Problems; Elsevier: Amsterdam, The Netherlands, 2005.
  16. Pang, G.F.; Perdikaris, P.; Cai, W.; Karniadakis, G.E. Discovering variable fractional orders of advection–dispersion equations from field data using multi-fidelity Bayesian optimization. J. Comput. Phys. 2017, 348, 694–714.
  17. Yan, L.; Guo, L. Stochastic collocation algorithms using l1-minimization for Bayesian solution of inverse problems. SIAM J. Sci. Comput. 2015, 37, A1410–A1435.
  18. Garcia, L.A.; Shigidi, A. Using neural networks for parameter estimation in ground water. J. Hydrol. 2006, 318, 215–231.
  19. Pang, G.F.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626.
  20. Da Prato, G.; Zabczyk, J. Stochastic Equations in Infinite Dimensions; Cambridge University Press: Cambridge, UK, 2014.
  21. Kilbas, A.A.; Srivastava, H.M.; Trujillo, J.J. Theory and Applications of Fractional Differential Equations; Elsevier: Amsterdam, The Netherlands, 2006.
  22. Iacus, S.M. Simulation and Inference for Stochastic Differential Equations: With R Examples; Springer: Berlin/Heidelberg, Germany, 2009.
  23. Yoshida, N. Estimation for diffusion processes from discrete observation. J. Multivar. Anal. 1992, 41, 220–242.
  24. Lubich, C. Convolution quadrature and discretized operational calculus. I. Numer. Math. 1988, 52, 129–145.
  25. Lubich, C. Convolution quadrature and discretized operational calculus. II. Numer. Math. 1988, 52, 413–425.
  26. Byrd, R.H.; Lu, P.H.; Nocedal, J.; Zhu, C.Y. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
  27. Zheng, X.C.; Wang, H. An error estimate of a numerical approximation to a hidden-memory variable-order space-time fractional diffusion equation. SIAM J. Numer. Anal. 2020, 58, 2492–2514.
  28. Zheng, X.C.; Wang, H. Wellposedness and regularity of a variable-order space-time fractional diffusion equation. Anal. Appl. 2020, 18, 615–638.
  29. Hogg, R.V.; McKean, J.; Craig, A.T. Introduction to Mathematical Statistics; Pearson Education: London, UK, 2005.
Figure 1. Full observations with Δ_0^x = Δ_0^t = 2^{-9} for noise magnitudes ϵ = 0 (left panel) and ϵ = 0.1 (right panel). The fractional order and diffusion coefficient are fixed to α* = 0.5 and c* = 1.0, respectively.
Figure 2. The realization of the noise term Z ( ω 0 ) in (18) that yields the full observation in the right panel of Figure 1. The noise magnitude is set to ϵ * = 0.1 .
Figure 3. One-parameter estimation: Negative log-likelihood curves for varying α ∈ [0.1, 0.9] and fixed c* = 1.0 (left) and for varying c ∈ [0.1, 2] and fixed α* = 0.5 (right). The effect of the spatial step on the optimal parameters is demonstrated. The noise magnitude is fixed to ϵ* = 0.1. Full observation and partial observation curves are displayed in different colors. Since the curves' y values span several orders of magnitude, for ease of comparison each curve is normalized by dividing its y values by their maximum absolute value. The temporal step is fixed to that of the full observation, i.e., Δ_0^t = 2^{-9}; the spatial steps vary, with Δx = 2^{-9} corresponding to the full observation case and the other three spatial steps to the partial observation cases with r_x = 4, 16, and 64. The vertical dotted lines intersect the x-axis at the optimal α or c that minimizes the normalized negative log-likelihood; these optimal parameters are regarded as the estimated parameters. Note that a vertical line of a given color corresponds to the likelihood curve of the same color, and vertical lines can overlap when the optimal parameters coincide.
Figure 4. One-parameter estimation: Negative log-likelihood curves for varying α and fixed c* = 1.0 (left) and for varying c and fixed α* = 0.5 (right). The effect of the temporal step on the optimal parameters is demonstrated. The noise magnitude is fixed to ϵ* = 0.1. Full observation and partial observation curves are displayed in different colors. The spatial step is fixed to that of the full observation, i.e., Δ_0^x = 2^{-9}; the temporal steps vary, with Δt = 2^{-9} corresponding to the full observation case and the other three temporal steps to the partial observation cases with r_t = 2, 8, and 16.
Figure 5. One-parameter estimation for a full observation: Negative log-likelihood curves for varying α and fixed c* = 1.0 (left) and for varying c and fixed α* = 0.5 (right). The effect of the noise magnitude on the optimal parameters is demonstrated. Curves for different noise magnitudes are displayed in different colors. The spatio-temporal steps are fixed to those of a full observation, i.e., Δ_0^t = Δ_0^x = 2^{-9}.
Figure 6. One-parameter estimation for a partial observation: Negative log-likelihood curves for varying α and fixed c * = 1.0 (left) and for varying c and fixed α * = 0.5 (right). The effect of noise magnitude on the optimal parameters is demonstrated. Curves for different noise magnitudes are displayed in different colors. The spatio-temporal steps are fixed to those of a specific partial observation with r x = 16 and r t = 32 .
Figure 7. Two-parameter estimation: Contour plots of the normalized negative log-likelihood for pairs (α, c) ∈ [0.3, 0.9] × [0.5, 1.5]. The effect of the spatial step on the estimated parameters is demonstrated. The noise magnitude is taken to be its true value ϵ* = 0.1. The true values of α and c are α* = 0.5 and c* = 1.0. The full observation case r_x = 1 is compared with three partial observation cases with r_x = 4, 16, and 64. For all plots, the other controlling parameter for partial observation is fixed to r_t = 1. The red disk represents the optimal parameter pair that minimizes the negative log-likelihood.
Figure 8. Two-parameter estimation: Contour plots of the normalized negative log-likelihood for pairs (α, c) ∈ [0.3, 0.9] × [0.5, 1.5]. The effect of the temporal step on the estimated parameters is demonstrated. The noise magnitude is taken to be its true value ϵ* = 0.1. The true values of α and c are α* = 0.5 and c* = 1.0. The full observation case r_t = 1 is compared with three partial observation cases with r_t = 2, 8, and 32. For all plots, the other controlling parameter for partial observation is fixed to r_x = 1. The red disk represents the optimal parameter pair that minimizes the negative log-likelihood.
Figure 9. Two-parameter estimation for a full observation: Contour plots of normalized negative log-likelihood for pairs of ( α , c ) [ 0.3 , 0.9 ] × [ 0.5 , 1.5 ] . The effect of noise magnitude on the estimated parameters is demonstrated. The true values for α and c are α * = 0.5 and c * = 1.0 . The plots for different noise magnitudes are compared. For all plots, the controlling parameters for partial observation are fixed to r x = r t = 1 (namely, a full observation). The red disk represents the optimal parameter pair that minimizes the negative log-likelihood.
Figure 10. Two-parameter estimation for a partial observation: Contour plots of normalized negative log-likelihood for pairs of ( α , c ) [ 0.3 , 0.9 ] × [ 0.5 , 1.5 ] . The effect of noise magnitude on the estimated parameters is demonstrated. The true values for α and c are α * = 0.5 and c * = 1.0 . The plots for different noise magnitudes are compared. For all plots, the controlling parameters for partial observation are fixed to r x = 16 and r t = 8 . The red disk represents the optimal parameter pair that minimizes the negative log-likelihood.
Table 1. Two-parameter estimation: Mean of the estimated (α, c) from 20 different full observations. The effect of the sparsity of observation (i.e., the sampling ratios r_x and r_t) on the mean of the estimated parameters is demonstrated. The true parameters are α* = 0.5 and c* = 1. The noise magnitude is fixed to ϵ* = 0.1. A quasi-Newton optimization algorithm, the L-BFGS-B algorithm, is employed to minimize the negative log-likelihood.
Mean α̂ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          0.673   0.634   0.649   0.664   0.673   0.682   0.682   0.683
16          0.621   0.603   0.612   0.628   0.637   0.645   0.648   0.651
8           0.581   0.573   0.589   0.603   0.611   0.617   0.620   0.621
4           0.555   0.557   0.569   0.579   0.582   0.586   0.588   0.590
2           0.528   0.526   0.536   0.544   0.546   0.548   0.550   0.551
1           0.491   0.485   0.492   0.497   0.498   0.499   0.500   0.500

Mean ĉ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          1.455   1.259   1.277   1.307   1.314   1.324   1.264   1.100
16          1.428   1.297   1.317   1.365   1.386   1.400   1.351   1.183
8           1.409   1.316   1.379   1.438   1.466   1.478   1.423   1.238
4           1.400   1.371   1.424   1.473   1.485   1.487   1.432   1.248
2           1.361   1.326   1.375   1.410   1.411   1.408   1.355   1.177
1           1.233   1.186   1.211   1.223   1.219   1.205   1.155   1.000
Table 2. Two-parameter estimation: Standard deviation (std) of the estimated ( α , c ) from 20 different full observations. The effect of sparsity of observation on the std of the estimated parameters is demonstrated. The noise magnitude is fixed to ϵ * = 0.1 .
Std α̂ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          0.094   0.062   0.044   0.032   0.023   0.019   0.014   0.009
16          0.071   0.045   0.035   0.025   0.016   0.012   0.009   0.007
8           0.056   0.033   0.024   0.017   0.013   0.010   0.007   0.005
4           0.035   0.023   0.016   0.011   0.007   0.006   0.004   0.003
2           0.027   0.017   0.011   0.008   0.007   0.005   0.003   0.002
1           0.021   0.010   0.007   0.006   0.005   0.004   0.003   0.002

Std ĉ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          0.392   0.225   0.167   0.121   0.088   0.072   0.049   0.278
16          0.350   0.211   0.173   0.123   0.077   0.062   0.045   0.028
8           0.327   0.182   0.140   0.102   0.081   0.066   0.046   0.027
4           0.254   0.151   0.112   0.080   0.060   0.050   0.032   0.020
2           0.215   0.122   0.084   0.065   0.055   0.038   0.027   0.015
1           0.161   0.075   0.054   0.046   0.041   0.028   0.019   0.011
Table 3. Three-parameter estimation: Estimated (α, c, ϵ) from the full observation shown in the right panel of Figure 1. The effect of the sparsity of observation on the estimated parameters is demonstrated. The true parameters are α* = 0.5, c* = 1, and ϵ* = 0.1. A quasi-Newton optimization algorithm, the L-BFGS-B algorithm, is employed to minimize the negative log-likelihood.
α̂ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          0.495   0.596   0.661   0.668   0.665   0.694   0.683   0.691
16          0.518   0.583   0.609   0.607   0.614   0.650   0.651   0.662
8           0.518   0.558   0.576   0.586   0.597   0.624   0.627   0.632
4           0.505   0.515   0.548   0.565   0.569   0.586   0.587   0.590
2           0.498   0.497   0.512   0.533   0.536   0.547   0.549   0.553
1           0.456   0.456   0.473   0.487   0.491   0.496   0.500   0.500

ĉ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          1.382   1.152   1.137   1.170   1.162   1.174   1.155   1.101
16          1.281   1.192   1.190   1.252   1.246   1.260   1.217   1.096
8           1.246   1.211   1.272   1.378   1.391   1.376   1.448   1.079
4           1.720   1.686   1.845   2.000   1.980   1.950   1.496   1.360
2           1.500   1.554   1.726   1.880   1.862   1.857   1.441   1.337
1           1.369   1.463   1.622   1.744   1.730   1.709   1.523   1.038

ϵ̂ (rows: r_t; columns: r_x)

r_t \ r_x   128     64      32      16      8       4       2       1
32          0.162   0.109   0.091   0.091   0.093   0.085   0.091   0.098
16          0.130   0.102   0.094   0.078   0.096   0.087   0.088   0.089
8           0.109   0.099   0.098   0.101   0.100   0.090   0.099   0.084
4           0.146   0.146   0.141   0.142   0.140   0.131   0.105   0.108
2           0.122   0.132   0.139   0.139   0.138   0.132   0.107   0.112
1           0.126   0.140   0.146   0.148   0.147   0.143   0.131   0.104
