The Combined Estimator for Stochastic Equations on Graphs with Fractional Noise

In the present paper, we study the problem of estimating a drift parameter in stochastic evolution equations on graphs. We focus on equations driven by fractional Brownian motions, which are particularly useful e.g., in biology or neuroscience. We derive a novel estimator (the combined estimator) and prove its strong consistency in the long-span asymptotic regime with a discrete-time sampling scheme. The promising performance of the combined estimator for finite samples is examined under various scenarios by Monte Carlo simulations.


Introduction
Evolution equations on graphs and networks have proven to be a useful tool for modeling dynamics of various phenomena in many diverse fields, such as biology, engineering, statistical physics, social sciences, economics etc., for several applications of such models (such as sensor networks, nanosystems, multiagent coordination, social networks etc.); see e.g., [1] and references therein. Many real-world phenomena are affected by random effects (random forcing). These effects can be included in an evolution equation as a stochastic noise term. Evolution equations on graphs with stochastic noise have been studied e.g., in [1]. A direct application in neurobiology motivated the study of a specific type of dynamical models on graphs with fractional noise (the noise generated by a fractional Brownian motion) presented in [2]. Fractional Brownian motion has become a popular tool to model random noise with memory (with auto-correlation), which is desirable in many diverse fields (see e.g., monographs [3,4] for discussion).
Another motivation for studying stochastic evolution equations on graphs comes from the discretization of parabolic stochastic partial differential equations (SPDEs) in space domain, where the selected spatial discretization method determines the evolution operator in the equation (cf. [5]).
In this paper, we consider the problem of estimating an unknown drift parameter in stochastic evolution equations (sEE) on graphs with fractional noise. Estimating a drift parameter in various stochastic differential equations has been considered by many authors. A great amount of literature has been devoted to equations with white noise (generated by a standard Brownian motion with uncorrelated increments), where the maximum-likelihood approach dominates; see for example the classical monograph [6] for 1-dimensional equations or the recent overview paper [7] for equations in ∞-dimensional spaces, to name just a few. Specific problems arise in equations with fractional noise (generated by a fractional Brownian motion with autocorrelated increments); see e.g., books [8,9] for discussion of various approaches to parameter estimation in this setting. We are not aware of any study on parameter estimation in the context of sEE on graphs with fractional noise, although it may be of a great interest for both researchers and practitioners, as it corresponds to inferring certain properties of a random dynamical system on a graph based on observing its evolution in time. This paper is supposed to bridge this gap.
Our paper is organized as follows. In Section 2, we specify the basic setting of the estimation problem. In Section 3, we recall the definition of a fractional Brownian motion, its basic properties and calculus. Moreover, we provide a short overview of the concepts of mild solutions to stochastic evolution equations in both physical and frequency domain. Section 4 is devoted to the introduction, calculation and strong consistency of the combined estimator-the main contribution of this paper. Illustration of the actual performance of the combined estimator in several scenarios is presented in Section 5 with Monte Carlo simulations. Main findings and results of this paper are summarized in Section 6.

Problem Setup
Consider an undirected, connected, finite graph G = (V, E) with a set of vertices V = {v 1 , . . . , v N } and a set of edges E. Denote by R V a finite-dimensional Hilbert space of the real-valued functions on V with the canonical scalar product defined by x, y = ∑ v∈V x(v)y(v). Note that the indicator functions e v = 1 v for v ∈ V form the canonical orthonormal basis of R V . Recall that the combinatorial graph Laplacian L on a graph G is a self-adjoint, positive semidefinite, linear operator on R V , characterized by the following matrix: where deg(v) is number of edges connected to v (the degree of v). As a motivation (and a toy model) for further studies, consider a simple heat equation on the graph G with stochastic forcing, where x(t, v) is a real-valued state variable at vertex v and time t, λ ∈ R + is an unknown parameter (heat conductivity), L is the combinatorial graph Laplacian and η(t, v) represents a noise (stochastic) term, being a formal time-derivative of a (fractional) Brownian motion.
Since the classical (strong) derivative of a (fractional) Brownian motion does not exist, we reformulate the above stochastic heat equation rigorously as a sEE on the finite-dimensional Hilbert space R V : where {X t } t≥0 = {x(t, .)} t≥0 is an R V -valued random process, λ ∈ R + is an unknown drift parameter and L is the combinatorial graph Laplacian. For the noise (stochastic) part, B H denotes the cylindrical fractional Brownian motion on R V with Hurst index H and Φ, being a bounded operator on R V , determines the correlations between vertices. Motivated by the above example, we consider the parameter estimation problem in an evolution equation on R V of the following form in this paper: where the drift operator A(λ) is a linear, self-adjoint, positive semidefinite operator (matrix) on R V . It depends on an unknown (potentially d-dimensional) parameter λ ∈ C ⊂ R d , with C being compact.
In the context of graph theory, the operator A may represent (weighted) graph Laplacian, in-degree Laplacian (for oriented graphs), d-path Laplacian (for non-local interactions, see [10]) or other linear operators. B H is again the cylindrical fractional Brownian motion on R V with Hurst index H ≥ 1/2 (the regular case with positively auto-correlated increments) and Φ a bounded linear operator on R V . We address the problem of estimating the unknown drift parameter λ based on a single trajectory of a solution to evolution Equation (5) observed at discrete time instants with a time step h > 0, i.e., based on {X ih } i=0,...n . We assume that the time step h > 0 is fixed and the time horizon goes to infinity, T = nh → ∞, i.e., the discrete sampling scheme with long-span asymptotics. Literature on drift estimation for multidimensional sEE with the discrete sampling scheme and long-span asymptotics is rather sparse. In this respect, let us mention papers [11][12][13] considering SPDEs with white-in-time noise, and paper [14] focused on infinite-dimensional sEE with fractional noise. The so-called minimum-contrast estimator (MCE) studied in the latter paper is plausible in our setting, if the projection of a solution to the eigenspaces corresponding to strictly positive eigenvalues is considered (to ensure ergodicity). However, the comparison in the recent paper [15] shows that the ergodic estimator (the counterpart to the MCE in one-dimensional setting) provides less accurate estimates compared to the conditional least-squares estimator (introduced in [15]), which turned out to be a very powerful estimator of the unknown drift in models with autocorrelated noise. Hence, we generalize the idea of the conditional least-squares approach into the multi-dimensional framework. To enhance the efficiency of the estimator, we combine the conditional least-squares approach in the time domain (where the covariance structure is complicated) with a maximum-likelihood approach in the space domain (with rather simple covariance structure), which leads to a novel estimator-the combined estimator (CoE).
The value of the Hurst index H is considered to be known. If this is not the case, it can be estimated in advance (before an evaluation of the combined estimator) by some of many available methods-see for example [16][17][18] to name just a few. The estimated H can subsequently be used in the CoE of the drift parameter similarly to the approach applied in [19]. The simultaneous estimation of λ and H in sEE on graphs is outside the scope of this work and could be a potential direction for further research.

Fractional Brownian Motion and Its Calculus
In this subsection, we briefly introduce the fractional Brownian motion (fBm) and specifics of the stochastic calculus w.r.t. the fBm. For further reading, we refer the interested reader to the monographs [3,4] and references therein.
The one-dimensional fractional Brownian motion {β H (t)} t≥0 is a centered continuous real-valued Gaussian process starting from zero and having the following covariance structure The covariance depends on a parameter H ∈ (0, 1), which is called the Hurst index. If H = 1/2, fBm has independent increments and it coincides with a standard Brownian motion (the Wiener process).
If H > 1/2, fBm has positively correlated increments and it exhibits a long-range dependence. It enjoys increasing popularity in many applications where long-range dependence is observed. It also has more regular trajectories than the Wiener process (in terms of the Hölder continuity). The case H < 1/2 (not considered in this work) leads to negatively correlated increments, less regular trajectories and serious difficulties when integrating w.r.t. the fBm. A two-sided fBm is again a centered continuous Gaussian process {β H (t)} t∈R with t ranging over the whole R, β H (0) = 0 and is an N-dimensional random process whose coordinates are mutually independent real-valued fBm with the same Hurst index H.
Consider a fBm {β H (t)} t≥0 with H > 1/2 and a deterministic step function ψ(s) = As a consequence, we obtain the following isometry for any pair of deterministic step functions ψ and φ: This isometry enables us to extend the definition of the Wiener integral w.r.t. a fBm to all elements of the completion of the space of deterministic step functions w.r.t. the scalar product ., . defined above. In result, Formula (6) holds true for all functions from this completion, see also [20]. This formula is extremely useful for the construction of the combined estimator below.

Solution to sEE in Physical and Frequency Domain
For the reader's convenience, we recall the concept of mild solution to (5) in physical domain. It reads: The existence of the mild solution was established (in more general setting of sEE on Hilbert spaces) in papers [21] for H > 1/2 and [22] for H < 1/2, or in paper [23]. When studying the solution, we can switch from the physical to the frequency domain. It provides us with an easy-to-handle formula of the solution, which will be useful both for the simulation and parameter estimation in the sequel. Start with the spectral decomposition of the self-adjoint, positive semidefinite matrix A(λ): where G(λ) is a N × N matrix whose columns are orthonormal eigenvectors {g k (λ)} k=1...N of A(λ) and D(λ) = diag{α 1 (λ), . . . , α N (λ)} is a diagonal matrix formed by the corresponding non-negative eigenvalues. It easily follows: With this decomposition at hand, we can consider the solution to (5) in frequency domain: which is given by the mild formula Since this system is diagonal, we can write explicit formulas for individual entries of F t . Denote Then { f k (t)} t≥0 are real-valued fractional Ornstein-Uhlenbeck processes, as seen from the formulas To eliminate the stochastic Wiener integral from the expression above, we can apply the integration by parts formula to get The solution formula above is particularly useful for simulations, because it contains only Riemann integral and no stochastic integration is needed.

Combined Estimator
To estimate the unknown parameter λ in (5), we observe a single realization of the process {X ih } i=0,...n . It is an (N × (n + 1))-dimensional Gaussian vector, so a natural choice would be to take the maximum-likelihood estimator. However, the evaluation of such an estimator would be extremely consuming, both in terms of CPU time and memory, especially for large n. This is due to the complicated structure of the covariance matrix with respect to time domain i = 0, . . . n resulting from the long-range dependence of the fractional Brownian motion with H > 1/2. Hence, some approximation to MLE has to be considered (cf. e.g., [24]). In this section, we study an approximation based on the conditional least-squares technique. This approach has recently been introduced and studied (with promising results) in [15] for real-valued fractional Ornstein-Uhlenbeck processes.

Combined Estimator in Physical Domain
The main idea of the conditional least-squares technique is to consider the following linear model where E λ [X t+h |X t ] means the conditional expectation of X t+h , conditioned by X t , where the process X is given by (5) with parameter value λ. Conditioning eliminates the unwanted correlations between explanatory variables and error terms in model (15), which enables the application of the least-squares technique.
To design the ultimate form of the least-squares criterion to be minimized, we have to decide about the space-norm to be applied to the vectors X t+h − E λ [X t+h |X t ]. In this aspect, we stick to the maximum-likelihood technique, which is well suited e.g., for the white-in-space noise (Φ is diagonal). Notice the N-dimensional Gaussian conditional distribution and the corresponding log-likelihood function (for a fixed t): To express µ t and Σ t explicitly, recall Now consider the more complicated case H > 1/2 (the simpler case H = 1/2 is discussed in Remark 1). Utilizing isometry (6), we get the (co)variance matrices var The joint multivariate normality of X t and ξ t leads to The combination of the conditional least-squares approach in the time domain with the maximum-likelihood approach in the space domain results in the following combined estimator: where C is a compact subset of R d , Λ is an unknown representing the parameter to be estimated (λ is its true value) and with µ t and Σ t explicitly expressed in (20) and (21), respectively.

Remark 1.
If H = 1/2, which corresponds to the white-in-time noise, the combined estimator coincides with classical MLE. Indeed, X t and ξ t are independent in this case, the Gaussian process {X ih } i=0,...,n has Markov property and its joint distribution corresponds to the product of the conditional distributions (16). As a result, minimizing (22) is equivalent to minimizing log-likelihood function of the process. Moreover, we have simplified calculations

Combined Estimator in Diagonal Case
If the number of vertices N is large, evaluation of the criterion function in Formula (22) might be computationally demanding, mainly due to inversions of N × N matrices [var(X t )] −1 and [Σ t (Λ)] −1 for all t ∈ {h, 2h . . . , nh}. This issue can be resolved by considering the whole problem in the spectral domain in the diagonal case. For this reason, we assume throughout this subsection that the noise term is white in space, i.e., Φ is an identity matrix (possibly multiplied by a constant characterizing noise intensity): Assumption (WN): Recall the spectral decomposition of A(λ) and e −A(λ)t given in (8) and (9), respectively. In addition, assume that the eigenvectors of operator A(λ) do not depend on the parameter λ, i.e., Assumption (EV): Such an assumption is satisfied if the unknown parameter λ affects only the eigenvalues of A(λ), for example, if λ is a multiplicative constant (diffusivity or heat conductivity): To simplify the exposition, sort the eigenvalues in descendent order, so that the first M eigenvalues are positive and the others are zeros: and after substitutions in integrals on the diagonal, where we denoted Note that using change-of-variable formula (u = r + s, v = r − s) and analytical integrating over v yields the following formula with simple one-dimensional integrals, which is numerically more stable and easier to implement: The other two (co)variance matrices can be treated similarly: and where we define After changing variables and analytical integration, we can again simplify this double integral into simple one-dimensional integrals, which are easier to approximate numerically: Ease of inversion of diagonal matrices provides us with the simplified formula If we switch from physical to frequency domain, see (10), we get Now we can construct the spectral version of the criterion (23). Let Λ be again an unknown representing the parameter to be estimated (λ is its true value) and denote by f k (t) the coefficients in frequency domain, i.e., ( f 1 (t), . . . , f N (t)) T = (g T 1 X t , . . . , g T N X t ) T = G T X t .
Using this notation, we can write log(s k ) with s k = s k (t, Λ) defined in (30) and m k = m k (t, Λ) in (29). Note that by using spectral decomposition, we eliminated the matrix inversions from the criterion formula and integrals of matrices were substituted by real-valued integrals of the form (27) and (28). Moreover, if k = M + 1, . . . , N, then neither of α k (Λ), s k (Λ, t) and m k (Λ, t) depend on Λ. As a result, minimizing Q(Λ) with respect to Λ is the same as minimizing its reduced version: which utilizes only projections to eigenvectors corresponding to strictly positive eigenvalues. This is not surprising, because in the white noise case, the projections { f k (t)} t≥0 , k = 1, . . . , N are independent processes and Hence, { f k (t)} t≥0 for k = M + 1, . . . , N represent (re-scaled and shifted) fractional Brownian motions that do not carry any information about the unknown parameter λ. They are, however, suitable for potential inference about σ. Such inference is outside the scope of this paper.

Remark 2. If H = 1/2, the frequency version of the conditional expectation and variance is
The corresponding reduced criterion function for the combined estimator is given by the formula (33)

Consistency of the Combined Estimator in Diagonal Case
In this subsection, we prove the strong consistency of the spectral version of the combined estimator. The diagonality assumptions (WN) and (EV), cf. Equations (24) and (25), are considered to hold throughout this subsection. In addition, let us assume the boundedness of the non-zero eigenvalues Motivated by the proof of the strong consistency of the conditional least-squares estimator in [15], we start with limiting behavior of the criterion function. Denote by the stationary fractional Ornstein-Uhlenbeck processes corresponding to (13). Note that (after some calculations) and (37) Let us recall the function f defined in Equation (13) in [15] and considered in Lemma 3 therein. We have Then Proof. To avoid repeating arguments and calculations from [15], we start by noting that ∑ t∈{0,h...,(n−1)h} where S n is the criterion for conditional LSE defined in Equation (26) in [15]. Indeed, { f k (t)} t≥0 is a fractional Ornstein-Uhlenbeck process with drift value α k (λ), and is the conditional expectation for the fractional Ornstein-Uhlenbeck processes with drift value α k (Λ). Similarly, with the function S ∞ defined in Lemma 3 in [15]. The above assumption on boundedness of eigenvalues (34) enables us to apply this lemma to get Another important ingredient is the uniform convergence of s(α k (Λ), t). Observe Consider a bounded interval [α L , α U ] ⊂ (0, ∞) and the uniform convergences |J(x, y) − J(x, ∞)| = 0, and the bounded limits These considerations imply the uniform convergence of s: and, in combination with (34), The last step is to show the positivity of the lower bound of s(α, ∞). Instead of analytical calculations; we use a simple probabilistic argument The uniform convergence (39) is now an easy consequence of (40)-(42).
Next, we show that the limiting criterion Q where λ is the true value of the drift parameter of the process underlying the observed data.

Lemma 2.
Let the estimation problem be well-identified: and let the eigenvalues depend continuously on the estimated parameter: Then the function Q where the last equality is due to relations (36) and (37), which imply

Using this decomposition ands
The inequality , ∀Λ ∈ C, now follows from the fact that To demonstrate the uniqueness of the minimum: note thatm(., ∞) is strictly decreasing, which follows from Lemma 2 in [15] and the fact that m(x, ∞) = f (x).
where {z k (t)} t≥0 is the stationary Ornstein-Uhlenbeck process defined in (35) with H = 1/2. By using arguments similar to those used in the proof of Lemma 1, including the boundedness assumption (42), we can show the uniform convergence To show that λ is the unique minimum of Q (∞) M , use the following notatioñ Employing the identification assumption (43) and the continuity assumption (44) results in the desired property (45). Note that the continuity of Q (∞) M on C is immediate.
We are in a position to formulate the main result of this section-the strong consistency of the combined estimator.

Proof.
The strong consistency can be demonstrated by standard arguments utilizing the compactness of C together with Lemmas 1 and 2 for H > 1/2 and Remark 3 for H = 1/2, respectively. For details, see, for example, the proof of Theorem 3 in [15].

Algorithm for Simulations
Simulating stochastic evolution equations with fBm as the noise source is not an easy task. To avoid complications with simulating stochastic integral w.r.t. fBm and safe computational resources, we simulated the solution to (5) in frequency domain with use of the Formula (14), which replaces the stochastic integral with the standard Riemann integral. The algorithm we used for simulations is outlined below: (12). 3. Find eigenvalues {α k (λ)} k=1...N and eigenvectors {g k (λ)} k=1...N of A(λ). 4. Transform initial condition into frequency domain f k (0) = g k (λ) T X 0 . 5. Calculate the whole solution in the frequency domain by formula (14). 6. Transform the solution back to physical domain by X t = G(λ)F t .

Examples
We choose to illustrate the performance of the combined estimator on two simple examples of stochastic heat equations on graphs.
In both examples, we consider stationary and non-stationary initial conditions and two values of the Hurst index H ∈ {0.5, 0.75}. Moreover, to illustrate the effect of increasing number of vertices N on the combined estimator, we study an example with no interactions (edges) between vertices, so that the drift operator is a diagonal matrix.

Example 2.
In the second example, we study a stochastic heat Equation (4) on the complete graph shown in Figure 3. Such an equation models situation when the diffusion of heat (or other quantity) occurs between all pairs of vertices.  Boxplots of the resulting estimates obtained from the Monte Carlo simulations with 100 independent samples for each scenario are shown in Figure 5 (Example 1) and in Figure 6 (Example 2). The related root mean square errors (RMSE) can be found in Table 1.   Estimating λ from solutions with the non-stationary initial condition provides significantly more accurate results compared to estimating from the stationary solution. This indicates that the combined estimator can efficiently utilize the valuable information about λ encoded in the speed of the convergence of the solution process to the stationary state in the initial phase. The solution in this phase is affected mainly by the drift operator; the effect of the random noise is marginal. This is in contrast to the stationary phase, where random noise prevails.
In line with the strong consistency of the combined estimator, we observe that increasing time horizon improves the accuracy of the estimates. This property is evident in the stationary case. In non-stationary case, only minor improvement has been observed, because most of the information about λ is stored in initial non-stationary phase. Time horizon T would have to be much longer to surpass this effect.
Analysis of the misspecified model reveals that it provides significantly biased estimates for stationary processes. Hence, there is a strong need to use specific methods for processes with fractional (autocorrelated) noise. The combined estimator overcomes this issue by the correction term for autocorrelation contained in Formula (32), but not in (33). In the non-stationary case, the effect of the initial condition suppresses the effect of the autocorrelated noise and the misspecified estimator shows competitive performance.
Comparison of Example 1 with Example 2 shows slightly better estimates for the latter (in terms of RMSE). More edges in the complete graph in Example 2 represent more transfer paths and more heat diffusion. In result, it contains more information about λ within the fixed time-window. Figure 7 illustrates the path-wise (almost sure) convergence of the estimator to the true value of λ. It shows the evolution of the estimates with increasing time horizon T for three samples of the solution to Example 1 with H = 0.75 and X 0 = (10, 10, 10, 10).

Example 3
The dependence of the behavior of the combined estimator on the number of vertices in the setting of the diagonal example with no interactions between vertices (Example 3) is studied below.
The results of Monte Carlo simulations (with 100 runs for each setting) are shown in the table with corresponding RMSE (Table 2). Obviously, with an increasing number of vertices, the estimator improves, which suggests the space-consistency of the combined estimator. This consistency can be conjectured due to the fact that the combined estimator follows the maximum likelihood approach in the space domain. However, a detailed study of space asymptotics of the combined estimator is not in the scope of this paper, and may be the direction of the further research. We expect that the asymptotic behavior of eigenvalues of the drift operator A(λ) will play important role here.

Conclusions
In this paper, we initiated a study of parameter estimation in the setting of stochastic evolution equations on graphs driven by fractional Brownian motions. We illustrated on the numerical simulation that such models can not be treated directly by estimation techniques known for equations with standard Brownian motion (see the misspecified model). Hence, we introduced a novel approach, which combines the conditional least-squares technique in the time domain with the maximum-likelihood technique in the space domain. This leads to a new estimator that we call the combined estimator. It has favourable asymptotic properties; we proved its strong consistency in the long-span asymptotic regime. Unlike the minimum-contrast estimator studied in [14], which could also be applied in our setting, the combined estimator is very efficient also for non-stationary solutions. We also derived an easy-to-implement formula in diagonal setting and illustrate its behaviour in various scenarios by Monte Carlo simulations.
Although we documented promising properties of the combined estimator above, some important questions remain open and may determine the direction of further research. These include (but are not limited to) the speed of convergence and asymptotic distribution, the space asymptotics (increasing number of nodes), modification for the singular case with Hurst index H < 1/2 (negatively correlated increments), simultaneous estimation of H, Φ (or its magnitude) and λ, or parameter estimation in more complex and interesting examples with the application of more sophisticated graph theory.