Gradient Profile Estimation Using Exponential Cubic Spline Smoothing in a Bayesian Framework

Attaining reliable gradient profiles is of utmost relevance for many physical systems. In many situations, the estimation of the gradient is inaccurate due to noise. It is common practice to first estimate the underlying system and then compute the gradient profile by taking the subsequent analytic derivative of the estimated system. The underlying system is often estimated by fitting or smoothing the data using other techniques. Taking the subsequent analytic derivative of an estimated function can be ill-posed. This becomes worse as the noise in the system increases. As a result, the uncertainty generated in the gradient estimate increases. In this paper, a theoretical framework for a method to estimate the gradient profile of discrete noisy data is presented. The method was developed within a Bayesian framework. Comprehensive numerical experiments were conducted on synthetic data at different levels of noise. The accuracy of the proposed method was quantified. Our findings suggest that the proposed gradient profile estimation method outperforms the state-of-the-art methods.


Introduction
Estimating the gradient of a system from a discrete set of data has a vast number of applications in many fields, such as biology, engineering, and physics. For instance, determining the particle velocity from the discrete time-position data in particle image velocimetry and particle tracking velocimetry experiments is important in plasma physics [1][2][3][4]. Applications of velocity estimation in motion control systems using discrete-time data have increased with the invention of microprocessors (see [5], and the references therein). Moreover, with improved technology, faster equipment is now available to measure highspeed discrete data [1]. The state-of-the-art method for determining the gradient from data is to estimate an underlying smoothing function of the data and to take its subsequent analytic derivative [6][7][8][9][10][11][12][13] (similar approaches with Bayesian methods are used in [14][15][16][17][18][19]). Finite differencing is also used for this purpose [2]. However, when there is noise in the data, the results can be ill-posed because the derivative tends to blow up the uncertainty in the estimates [5]. The estimates become worse when these data are measured in shorter intervals, especially when position data are measured using high-speed cameras in particle tracking experiments [1,2].
On the other hand, the cubic splines used in these methods of estimating the data can produce drastic results depending on the recorded speed of data or noise levels. Exponential cubic splines are superior to cubic splines because of their capacity to capture abrupt changes in the data due to its extra parameter of smoothness [20,21]. However, special attention must be paid to this extra smoothing parameter because its extreme values can produce unrealistic results. In the literature, Jeffreys prior was used for this smoothing parameter as a means of scaling down [1,20,21].
In our study, motivated by the work in [1], we present a detailed investigation on gradient profile estimation as a Bayesian inference problem by directly estimating the gradient, which avoids taking the analytic derivative of noisy data. Moreover, we use exponential cubic spline as the underlying smoothing function of the gradient. We also introduce a more meaningful choice of prior distribution for the smoothing parameter of this spline. Moreover, we present a comprehensive sensitivity analysis of noise on the gradient estimates and, additionally, we present the estimates for position and acceleration obtained by the subsequent integration and derivative of our gradient estimates.
The paper is organized as follows. Section 2 develops the idea of separating spaces with mathematical background. The Bayesian framework of the algorithm is demonstrated in Section 3. Thereafter, the computational details are presented in Sections 4 and 5. In particular, in these sections, our findings are compared to those obtained by means of more traditional methods, such as smoothing data with the subsequent analytic derivative. Our concluding remarks are given in Section 6.

Separating Spaces in Bayesian Context
In this section, we explain how the spaces are separated, which permits one to directly infer the gradient along with its mathematical forms. From this point forward, we use the terms (and mathematical notations) x-space (U) and v-space (V) to denote the space where measured data lives and the space where the gradient resides, respectively. We then use Bayes' theorem to map the information between these two spaces (examples are given in [22][23][24][25][26][27][28]). The mapping is done using the obvious relationship between an object's position and its velocity, where x(t) and v(t) represent the object's position and velocity, respectively. By the use of our notations, x(t) lives in x-space and v(t) lives in v-space. In our approach, a functional form (mathematical model) is given to velocity, rather than to measured data as in the traditional approach. Therefore, our mapping is such that the positions in x-space are 'obtained' by integrating the functional form placed in v-space. The Bayes' theorem allows us to do this as an inference problem to infer the unknowns of the functional form in v-space with the information available in x-space. The exponential cubic spline is proven to capture abrupt changes due to its high flexibility compared to the cubic spline [1,20,21]; thus we use the exponential cubic spline (denoted by S v ) as the functional form to represent the velocity, i.e., the exponential cubic spline lives in v-space (shown by the subscript v). This spline is the solution to the following minimal bending energy functional [1]: Here, f v i is the function value at ξ v i and λ v i is proportional to tension between two knots, ξ v i , ξ v i+1 . A numerically convenient reformulation of the spline at the i-th interval is then given by [21,29] S respectively. Using the arrow notations for vectorial quantities, the exponential cubic spline in Equation (3) can also be viewed in matrix form as [1] S In Equation (5), W denotes the design matrix of t-locations of the function values f v , the support points (knots) ξ v , and tension parameters λ v . The second derivative of S v , M v , can be found explicitly by solving the tri-diagonal system of equations in Equation (5). We used the exponential cubic spline to model the gradient. Since the gradient is the unknown quantity here, the variables, Therefore, it is important at this stage to identify data and parameters in this inference problem and they are given in Table 1. Table 1. The definition of data and parameters of the inference problem. The spaces where they reside are given in parentheses.
Now, the general relationship given in Equation (1) can be rewritten precisely using to our model for an arbitrary i-th data point (i = 1, . . . , n where n is the total number of measured data values) as (3). That is, i-th position (i-th data point) is the integrated spline from first time point (time of the initial position) to the i-th time point (time of the i-th position). The relationship in Equation (6) can be further reduced to The starting and ending values of x i in Equation (7) are the initial and the last positions recorded. Here, S v i−1 is the spline in (i − 1) th interval. We point out that the exponential cubic spline is analytically integrable and its technical details are given in Appendix A. When a relationship (as in (1) and (7)) is used to match an unknown quantity (gradient/velocity) and the observable data (position), the relationship should be able to generate data as close as possible to the observable information if the desired/unknown quantity is known. This is called the forward problem in Bayesian language. However, what is required here in our study is to be able to solve the inverse problem. That is, making inferences about the desired/unknown quantity using the observed information and the functional form of the unknown (here, the exponential cubic spline) [30][31][32]. An example of solving a forward problem by computing positions when the velocity is known is shown in Figure 1.

The Bayesian Recipe
The inverse problem for the position-velocity problem discussed in the previous section can be written using Bayes' rule as follows: which shows how the joint posterior probability distribution is built to infer the velocity (gradient) profile when the position profile is known. Bayes' rule in Equation (8) can be rewritten using data and parameters given in Table 1 as follows: where p x| f v , λ v , ξ v , E v , t, I is the likelihood and the quantity I represents all the relevant background information. We assume the spline variables are independent. It then enables us to write the following: The special case of when the number of knots and the position of knots are known (decided prior to computing the spline), we can write (10) Substituting Equation (10) into Equation (9), we get the general form of the posterior distribution: The evidence p( x|I) in Equation (9) is given by The relationship between the gradient and position in Equation (7) is included in the likelihood. We then assume the noise (or the uncertainty) in the measured data, where the quantity µ is the (n × 1)-dimensional mean vector and Σ is a (n × n)-dimensional covariance matrix of the noise. We further assume µ = 0 and the noise is uncorrelated which, in turn, makes Σ a diagonal matrix with diagonal elements, {σ 1 , . . . , σ n }. Omitting the arrow notation for vectorial quantities, we can write the following: In Equation (14), the first line (case 1) showcases when the time t i , in which x i is measured, falls between two knots of the spline, in which case S v i , calculated from ξ v i to t i , is a partial interval. The second line (case 2) showcases when t i falls exactly on a knot, in which case S v γ , calculated at t i = ξ v i , is a full interval. Note that case 2 is a special case of case 1.
In other words, case 2 can be achieved via case 1 when t i = ξ v i . Therefore, we rewrite Equation (14) for the general case subjecting the noise: for i = 1, . . . , n. By the assumption of iid (independent and identically distributed), we write our likelihood function: Combining Equations (15) and (16) together with the additional assumption that σ i = σ e =constant for any 1 ≤ i ≤ n, the likelihood distribution becomes (case 1 of Equation (14) is assumed here for generality) where, The Choice of Prior Probability Distributions According to Equation (11), we have three prior distributions to look at. We assume that there is no prior information about the spline values, p f v |E v , I , or the position of knots, p ξ v |E v , I . However, we give our attention to the tension parameter, λ v , because the shape of the spline between knots depends on it. The tension value is a scale parameter and it can take any real number. With the change in λ v i from 0 to ∞, the spline changes from a polygonal function to a cubic spline [21,29]. A polygonal function lacks the smoothness required by a moving object and the cubic spline can sometimes overestimate the curvature by producing cubic curves when knots are placed close. Hence, estimating an optimum tension value plays a big role in getting a reliable gradient estimate profile and choosing an appropriate prior adds a great value to it. Previous studies in the literature [21,33] used Jeffreys prior to merely scale down large tension values to avoid unrealistic results. In this work, we used a Gaussian distribution as a prior for the tension (Jeffreys and uniform priors were used in previous work by the same authors related to this problem [34]) because: (1) it allows one to have a higher probability for sensible values for the spline and lower probability for nonsensible values for the spline; and (2) a conjugate prior makes the posterior a Gaussian family of distribution (the known parametric form), thus making the computations easy. As stated earlier, polygonal functions lack the smoothness of a moving object and thus values of tension that produce polygonal functions are considered nonsensible values in this problem.
Assuming the prior distribution of tension parameters, λ v , are iid Gaussian distributions with mean µ i and variance σ θ λ i , we can write Finally, the posterior distribution p f v , λ v , ξ, E v | t, x, I in Equation (9) can be written as is given in Equation (18). In this work, the Gaussian prior distribution uses µ i = 0 and σ θ λ = 5. As noted previously [34], even though lambda has a range of 0 to ∞, the impact of lambda on the spline starts to flatten around 10. Thus, by setting σ θ λ = 5, 2σ θ λ covers 95% of the prior distribution.

Numerical Simulation
In this section, we discuss the numerical method followed to simulate the posterior distribution for inferring the velocity/gradient using positions/distances. We first created synthetic data from the velocity curve showcased for forward problem in Figure 1. We generated a sample of n = 121 position data points. Thereafter, we added Gaussian noise of µ = 0 and different levels of σ e to create different datasets of different noise levels (σ e ). In this work, we hand-picked the number of knots and their positions for simplicity and to demonstrate the inference (Equation (11)). Thus, we have 2E v − 1 number of parameters to infer: spline values, f v , and tension values λ v . However, in future work, it is important that these two sets of parameters are inferred.
For the synthetic data, following the work in [29], we selected E v = 8 knots at ξ v = [0, 1,3,6,8,10,11,12] in inferring the spline. Then, the posterior distribution with different priors (see Equation (12)) were simulated using a hybrid MCMC algorithm called DRAM [35,36]. The simulation begins with an initial minimization process. It should be noted that the posterior distribution is undefined at λ v = 0 due to numerical instability of the spline. Therefore, the smallest value that was used for λ v was 0.001. Thus, the values of the parameters were initialized with the following values for the initial minimization process: The initial minimization acts as a jump start to the DRAM algorithm. The proposal distributions of the parameters used in DRAM that generate the candidate sample values are Gaussian distributions. The parameter values of the proposal distributions (i.e., the mean and variance of Gaussian distributions) are user inputs and are showcased in Table 2, where LB and UB are the lower and upper bounds of the parameters, respectively. Table 2. The sets of parameters in initial probability distributions of DRAM.

Parameter Proposal Distribution Prior Distribution
The optimum values f v * and λ v * produced by the initial minimization were used as the mean values of the proposal distributions (µ v i , µ λ i ), so that the initial proposal distributions are centered around f v * and λ v * . The information of prior distributions can be specified with prior means µ θ v i , µ θ λ i and prior variances σ θ v i , σ θ λ i . The DRAM procedure takes the variance of the prior as the initial variance of the parameters if the covariance matrix is not specified by the user. When the prior variance is not specified, the variances of the proposal distributions are calculated as In other words, if no prior information is available about the variance of the parameter, 5% of the mean of the initial distribution is taken as the initial variance. Finally, samples were drawn from these Gaussian proposals and the marginal densities of the parameters were computed. The MCMC simulations were run until convergence was observed. Each sampling procedure of length Ω was repeated M = 100 times. At each of the M-th repetitions, a different dataset of same noise level was created followed by an initial minimization process. All the sample chains were stored and, in addition, for a given sample chain, the first few samples were burnt during Ω × p% period. The quantity p denotes the burn-in time. The average of M marginal densities of each parameter were then computed. In what follows, we assume that ω denotes the remaining number of samples after the burn-in period, that is, ω def = (1 − p)% × Ω. Then, for any k = 1, . . . , E v , the mean values f * v k of those average marginal densities were calculated as where f v ki denote the average sample values of f v k after M number of iterations, Following the same line of reasoning presented for Equations (22) and (23), we also have for any s = 1, . . . , E v − 1 with λ vsi denoting the average sample values of tension after M number of iterations: Finally, the standard deviations σ * v k and σ * λ s of the parameters are calculated from the average of all chains after M iterations as for any k = 1, . . . , E v and for any s = 1, . . . , E v − 1, respectively.

Results
This section presents the results obtained from the methodology built throughout the previous sections. The sensitivity of gradient estimates at different noise levels (of the position data) were tested. The noise levels (denoted by σ e ) on x-space are depicted in Figure 2 with five different noise levels: σ e ∈ {0.001, 0.3, 0.7, 1.0, 1.3}. The noise levels were selected so that they cover almost all possibilities of the standard Gaussian noise. That is, by the definition of the Gaussian distribution, the first four noise levels cover a probability of 68.2% and the last noise level covers a probability of 95%.
The sensitivity of noise levels to the posterior distribution was investigated when the Gaussian prior was used for the tension parameter. These results of the sensitivity analysis were compared against a common traditional method of fitting the same type of exponential cubic spline in x-space and the gradient was obtained by analytically differentiating the resulting fitted spline function. The notation "i-fit (i-space) with i = x, v" denotes that i-fit was fitted on the i-space, where the x-fit is the distance profile, while the v-fit is the gradient profile. A constant force was observed between the times 0-8 (a.u.) and short impulses were observed between the times 8-11 (a.u.). Moreover, it was observed that as the noise level increased, the data around the short impulses tended to become fuzzier so that it was hard to identify the trajectory. When the noise increased, the width of the marginal distributions were expanded. This, in turn, resulted in an increased length of the error bars of the parameter estimates. As a demonstration, we show the marginal probability distribution of f v 1 in Figure 3 at all noise levels. The gradient estimates at all noise levels are depicted in Figure 4 and the uncertainties of f v i parameters are shown in the same figure using error bars (top panel) and Bayesian credible intervals (bottom panel). The estimates almost overlap with the ground truth data at all noise levels, except at the boundaries and near t = 10 (a.u.), where the short impulses were present in the position profile.  The uncertainties of both velocity and tension parameters are characterized in Figure 5. The uncertainties of the velocity parameters with the noise level exhibit a linear relationship reflecting the linear relationship of velocity parameters in the likelihood function. The uncertainties of the tension parameters converge to the Gaussian prior uncertainty when the data noise exceeds the prior uncertainty. To help illustrate this, additional noise levels were added in these graphs in Figure 5 to show the relationship more clearly. The gradient estimates are compared with the analytic derivative of the x-fit obtained in U. The x-fit is obtained by fitting the same exponential cubic spline to the time-position data in U. This was performed at the same knot positions. The gradient profile, the acceleration profile, and the position profile were obtained from the two methods (via v-fit from V and x-fit from U) and are compared in Figures 6-8, respectively. The acceleration profile from our method (v-fit in V) was obtained by differentiating the exponential cubic spline, i.e., the analytic derivative of exponential cubic spline. It should be noted that ideally the acceleration should be inferred directly. Then, velocity and position could be found via integration. However, the focus and motivation of this paper was velocity. We are only using this as a quick comparison to illustrate how poor the acceleration fit would be when differentiating twice. The same was obtained using the traditional method by finite-differencing its gradient profile, i.e., finite differencing the analytic derivative of the exponential cubic spline.   The gradient estimates that emerge from the two spaces (namely, the x-space and v-space) clearly show some remarkable differences. In particular, the difference in accuracy is shown in Table 3 using the error norm: where L is the sample size [37]. The acceleration characterizes the acting forces on the object. The acceleration estimate (x-space) could not achieve constant force where it is expected. The errors at short impulses around t = 10 a.u. started becoming very large, which indicates the possibility of infinite forces. Moreover, the acceleration profile from the x-space (red dashed-dot lines) is not realistic with its very sharp turns. Therefore, the acceleration estimate from the velocity space is a reliable estimator of acting forces. The goodness of fit was tested by studying the squared sum of errors (SSE) defined as [38] (29) wheref i represent the estimated function values. The quantity f i in Equation (29) is replaced with noisy f i to obtain the second and the third columns in Table 4, while f i is replaced with the original f i without noise to calculate the fourth and fifth columns in the same table. The trajectory estimate (position profiles) is a bonus product from the gradient profile (v-space) and is obtained by integrating the gradient estimate (v-space). They are compared in Figure 8. The x-fit (x-space) follows the noisy data closely. This is because, when fitting a curve in the same space as data, that curve is trying to minimize the SSE, which means it tries to match the data as much as possible. Therefore, the x-fit from U tries to satisfy the noisy data as much as possible. However, the x-fit (v-space) follows the ground truth data more than it follows the noisy data (see Table 4). When integrating the v-fit (v-space), an additional order of differentiability is added to the trajectory estimate. Therefore, the x-fit (v-space) has a higher order of differentiability than that of the x-fit (x-space). This additional order of differentiability ensures the continuity of the velocity and, therefore, it satisfies the constraint of the finite force of the object.
The calculated SSE with true data and noisy data are shown in the Table 4. The SSE values for noisy data are smaller for the x-fit (x-space). This, in turn, reflects the fact that the x-fit (x-space) better follows noisy data. However, the SSE values for true data are smaller for the x-fit (v-space). This, instead, reflects the fact that the x-fit (v-space) better follows true data.

Conclusions
In this paper, we proposed a method to compute the gradient profile of a noisy system using the Bayesian inference method. It allowed us to infer the unobservable quantity, the gradient (velocity), by building a meaningful relationship between the unobservable quantity in velocity space and the observable quantity in data space. Furthermore, the unobservable quantity (the gradient) was modeled using the exponential cubic spline in velocity space. Using Bayesian methods, the parameters of this exponential cubic spline were inferred. The results of this new method were compared against those of a traditional method of modeling noisy data, which uses the exponential cubic spline in data space and takes the subsequent analytic derivative to obtain the gradient.
The results show that the gradient estimates obtained by modeling in velocity space are more accurate than the estimates obtained via the traditional method (see Table 3). Moreover, our method was able to produce better acceleration estimates with reliable and accurate values, where a constant force is expected (see Figure 7). We also compared the trajectory profile. We conclude that when the traditional method is used, the trajectory estimates tend to follow noisy data, whereas when our method is used, the estimates tend to follow the ground truth data, which, in return, produce more accurate estimates (see Figure 8 and Table 4). It is argued that by integrating the model in velocity space to compute the trajectory values, an extra order of differentiability is added. This argument can be further extended to suggest that the acceleration should be inferred first. Then, velocity and trajectories can be inferred by integration. However, this was not the focus of the paper. In conclusion, the method demonstrated in this paper is superior when estimating the velocity of a moving object under finite force as compared to others in the literature (for instance, see [1] and the references therein). It provided better results in all three estimates: trajectory, velocity, and acceleration.
It is necessary to note that, although our main focus in this paper was on the proposal of a novel theoretical method to estimate a gradient profile from discrete noisy data, a number of improvements can be sought in its practical implementation. For instance, the use of an MCMC algorithm gains precision at the expense of speed. A faster algorithm (such as the expectation maximization algorithm [39]) may be needed for real-time estimation. Furthermore, the amount and placement of the knots lacks a systematic guiding principle. In the future, we plan to use Bayesian model selection for determining the amount of knots. For the placement issue, we can adopt a hierarchical approach by including spline knot location algorithms [40] in conjunction with our main algorithm. This would provide estimates for the values associated with the knots and where they should be located. Finally, we hope to apply our Bayesian estimation technique to more realistic problems in which acceleration, velocity, and trajectory estimations are needed. In particular, from both theoretical and applied perspectives, in the future, we think it may be worth exploring the possibility of characterizing velocity profiles in ferromagnetic fluids [41] with the use of entropic inference methods that encompass Bayesian techniques [42] and, moreover, show promising results in inferring the ferromagnetic properties of materials [43]. α 1 (x, y) = min π(y)q(y, x) π(x)q(x, y) , 1 if π(x)q(x, y) > 0, 1 otherwise. (A6a) q(z, y) q(x, y) q(z, y, x) q(x, y, z) [1−α 1 (z, y)] [1−α 1 (x, y)] , 1 if π(x)q(x, y)q(x, y, z)[1 − α 1 (x, y)] > 0, 1 otherwise. (A6b) where q(θ) represents the proposal distribution and Θ (j) represents the set of parameters of the proposal distribution at j-th iteration.