Abstract
This paper presents the convergence analysis of a novel data-driven feedback control algorithm designed for generating online controls based on partial noisy observational data. The algorithm comprises a particle filter-enabled state estimation component, which estimates the controlled system’s state from indirect observations, and an efficient stochastic maximum principle-type optimal control solver. By integrating weak convergence techniques for the particle filter with convergence analysis for the stochastic maximum principle control solver, we derive a weak convergence result for the optimization procedure in search of the optimal data-driven feedback control. Numerical experiments are performed to validate the theoretical findings.
Keywords:
stochastic optimal control; nonlinear filtering; data driven; maximum principle; stochastic optimization
MSC:
49N35
1. Introduction
In this paper, we carry out numerical analysis demonstrating the convergence of a data-driven feedback control algorithm designed for generating online control based on partial noisy observational data.
Our focus lies on the stochastic feedback control problem, which aims to determine optimal controls. These control actions are used to guide a controlled state dynamical system towards meeting certain optimality conditions, leveraging feedback from the system’s current state. There are two practical challenges in solving the feedback control problem. First, when the control problem’s dimension is high, the computational cost for searching the optimal control escalates exponentially. This is known as the “curse of dimensionality”. Second, in numerous scenarios, the state of the controlled system is not directly observable and must be inferred through detectors or observation facilities. These sensors are typically subject to noise that originates from the device itself or the surrounding environment. For instance, radar receives noisy data and processes them through the arctangent function. Therefore, state estimation techniques become necessary to estimate the current state for designing optimal control, with observations gathered to aid in estimating the hidden state.
To address the aforementioned challenges, a novel online data-driven feedback control algorithm has been developed [1]. This algorithm introduces a stochastic gradient descent optimal control solver within the stochastic maximum principle framework to combat the high dimensionality issue in optimal control problems. Traditionally, stochastic optimal control problems are solved using dynamic programming or the stochastic maximum principle, both requiring numerical simulations for large differential systems [2,3,4]. However, the stochastic maximum principle stands out for its capability to handle random coefficients in the state model and finite-dimensional terminal state constraints [5]. In the stochastic maximum principle approach, a system of backward stochastic differential equations (BSDEs) is derived as the adjoint equation of the controlled state process. Then, the solution of the adjoint BSDE is utilized to formulate the gradient of the cost function with respect to the control process [6,7]. However, solving BSDEs numerically entails significant computational costs, especially in high-dimensional problems, demanding a large number of random samples [8,9]. To bolster efficiency, a sample-wise optimal control solver method has been devised [10], where the solution of the adjoint BSDE is represented using only one realization or a small batch of samples. This approach justifies the application of stochastic approximation in the optimization procedure [11,12], and it shifts the computational cost from solving BSDEs to searching for the optimal control, thereby enhancing overall efficiency [13].
In data-driven feedback control, optimal filtering methods also play a pivotal role in dynamically estimating the state of the controlled system. Two prominent approaches for nonlinear optimal filtering are the Zakai filter and the particle filter. While the Zakai filter aims to compute the conditional probability density function (pdf) for the target dynamical system using a parabolic-type stochastic partial differential equation known as the Zakai equation [14], the particle filter, also known as a sequential Monte Carlo method, approximates the desired conditional pdf using the empirical distribution of a set of random samples (particles) [15]. Although the Zakai filter theoretically offers more accurate approximations for conditional distributions, the particle filter is favored in most practical applications due to the high efficiency of the Monte Carlo method in approximating high-dimensional distributions [16].
The aim of this study is to examine the convergence of the data-driven feedback control algorithm proposed in [1], providing mathematical validation for its performance. While convergence in particle filter methods has been well studied [17,18,19], this work adopts the analysis technique outlined in [18] to establish weak convergence results for the particle filter regarding the number of particles. Analysis techniques for BSDEs alongside classical convergence results for stochastic gradient descent [13,20] are crucial for achieving convergence in the stochastic gradient descent optimal control solver. The theoretical framework of this analysis merges the examination of particle filters with the analysis of optimal control, and the overarching objective of this paper is to derive a comprehensive weak convergence result for the optimal data-driven feedback control.
In this paper, we present two numerical examples to demonstrate the baseline performance and convergence trend of our algorithm. The first example involves a classic linear quadratic optimal control problem, comparing the analytical control with the estimated control. The second example addresses a nonlinear scenario, specifically a Dubins vehicle maneuvering problem, where both the system and observations exhibit significant nonlinearity.
2. An Efficient Algorithm for Data-Driven Feedback Control
We first briefly introduce the data-driven feedback control problem that we consider in this work. Then, we describe our efficient algorithm for solving the data-driven feedback control problem by using a stochastic gradient descent-type optimization procedure for the optimal control.
2.1. Problem Setting for the Data-Driven Optimal Control Problem
In a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we consider the following augmented system on the time interval $[0, T]$
where is the -dimensional controlled state process with dynamics , is the diffusion coefficient for a d-dimensional Brownian motion W that perturbs the state X, and u is an m-dimensional control process valued in some set U that controls the state process X. In the case that the state X is not directly observable, we have an observation process M that collects partial noisy observations on X with observation function , and B is a p-dimensional Brownian motion that is independent from W.
Let be the filtration of B augmented by all the -null sets in , and be the filtration generated by W and B (augmented by -null sets in ). Under mild conditions, for any square integrable random variable independent of W and B, and any -progressively measurable process u (valued in U), Equation (1) admits a unique solution which is -adapted. Next, we let be the filtration generated by M (augmented by all the -null sets in ). Clearly, , and , , in general. The progressively measurable control processes, denoted by , are control actions driven by the information contained in observational data.
We introduce the set of data-driven admissible controls as
and the cost functional that measures the performance of data-driven control is defined as
where f is the running cost, and h is the terminal cost.
The goal of the data-driven feedback control problem is to find the optimal data-driven control such that
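For orientation, the objects just described typically take the following standard form; this is a sketch consistent with the text above, and the paper's exact Equations (1)–(3) may use different symbols or carry additional arguments:

```latex
% Controlled state and observation dynamics (standard form, symbols assumed):
dX_t = b(t, X_t, u_t)\, dt + \sigma(t, X_t, u_t)\, dW_t,
\qquad
dM_t = g(X_t)\, dt + dB_t, \qquad t \in [0, T],
% Cost functional with running cost f and terminal cost h, and the goal:
J(u) = \mathbb{E}\!\left[\int_0^T f(t, X_t, u_t)\, dt \;+\; h(X_T)\right],
\qquad
u^* = \operatorname*{arg\,min}_{u \in \mathcal{U}_{ad}[0,T]} J(u).
```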
2.2. The Algorithm for Solving the Data-Driven Optimal Control Problem
To solve the data-driven feedback control problem, we will use the algorithm from [1], which is derived from the stochastic maximum principle.
2.2.1. The Optimization Procedure for Optimal Control
When the optimal control is in the interior of , the gradient process of the cost functional with respect to the control process on time interval can be derived using the Gâteaux derivative of and the stochastic maximum principle in the following form:
where stochastic processes Y and Z are solutions of the following forward–backward stochastic differential equations (FBSDEs) system:
where Z is the martingale representation of Y with respect to W and is the martingale representation of Y with respect to B.
To solve the data-driven feedback optimal control problem, we also use gradient descent-type optimization, and the gradient process is defined in (4). Then, we can use the following gradient descent iteration to find the optimal control at any time instant
where r is the step size for the gradient. We know that the observational information is gradually increased as we collect more and more data over time. Therefore, at a certain time instant t, we target finding the optimal control with accessible information . Since the evaluation for requires trajectories as and are solved backwards from T to t, we take the conditional expectation of the gradient process , i.e.,
where , and correspond to the estimated control . For the gradient descent iteration (6) on the time interval , by taking conditional expectation , we obtain
When , the observational information is not available at time t. We use conditional expectation to replace since it provides the best approximation for given the current observational information . We denote
and then the gradient descent iteration is
where can be obtained by solving the following FBSDEs
and can be evaluated effectively using the numerical algorithm that will be introduced later.
When the controlled dynamics and the observation function are nonlinear, we will use optimal filtering techniques to obtain the conditional expectation . Before applying the particle filter method, which is one of the most important particle-based optimal filtering methods, we define
for . With the conditional probability density function (pdf) that we obtain through optimal filtering methods and the fact that is a stochastic process depending on the state of random variable , the conditional gradient process in (7) can be obtained by the following integral
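As a concrete illustration, the following is a minimal Python sketch of this integration against the particle approximation of the conditional pdf; the particle array, the weights, and the pathwise gradient routine `grad_J` are illustrative placeholders rather than the paper's notation.

```python
import numpy as np

def conditional_gradient(particles, weights, u, grad_J):
    """Approximate the conditional gradient E[ grad_J(X_t, u) | observations ]
    by integrating the pathwise gradient against the particle approximation
    of the conditional pdf.  `grad_J(x, u)` is a user-supplied function that
    returns the gradient of the cost with respect to the control at state x."""
    # Weighted average over the particle cloud approximates the integral
    # against the conditional distribution obtained from the filter.
    grads = np.array([grad_J(x, u) for x in particles])
    return np.average(grads, axis=0, weights=weights)
```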
2.2.2. Numerical Approach for Data-Driven Feedback Control by PF-SGD
For the numerical framework, we need the temporal partition
and we use the control sequence to represent the control process over the time interval .
- Numerical Schemes for FBSDEs
For the FBSDEs system, we adopt the following schemes:
where are numerical approximations for , respectively.
Then, the standard Monte Carlo method can approximate expectations with K random samples:
where is a set of random samples following the standard Gaussian distribution that we use to describe the randomness of .
The above schemes solve the FBSDE system (5) as a recursive algorithm, and the convergence of these schemes is well studied (cf. [20,21]).
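To illustrate, below is a minimal sketch of one backward step of an Euler-type scheme for the adjoint BSDE, with the conditional expectations approximated by Monte Carlo, in the spirit of the schemes above; the function names, the simplified state propagation, and the driver `f_driver` are assumptions for illustration rather than the paper's exact scheme (14).

```python
import numpy as np

def bsde_backward_step(x_n, y_next_fn, f_driver, dt, K=1000):
    """One backward Euler-type step for the adjoint BSDE at a fixed state x_n:
        Y_n ~ E_n[Y_{n+1}] + dt * f_driver(x_n, E_n[Y_{n+1}]),
        Z_n ~ E_n[Y_{n+1} * dW] / dt,
    with the conditional expectations approximated by K Monte Carlo samples.
    `y_next_fn(x)` evaluates the approximate solution Y_{n+1} at state x;
    the unit-diffusion one-step propagation below is a simplification."""
    dW = np.sqrt(dt) * np.random.randn(K)        # Brownian increments
    x_next = x_n + dW                            # simplified one-step state propagation
    y_next = np.array([y_next_fn(x) for x in x_next])
    y_mean = y_next.mean()                       # Monte Carlo estimate of E_n[Y_{n+1}]
    z_n = np.mean(y_next * dW) / dt              # martingale representation term
    y_n = y_mean + dt * f_driver(x_n, y_mean)    # backward Euler update for Y_n
    return y_n, z_n
```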
- Particle Filter Method for Conditional Distribution
To apply the particle filter method, we consider the controlled process on time interval
Assume that at time instant , we have S particles, denoted by , that follow an empirical distribution as an approximation for . The prior pdf that we want to find in the prediction stage is approximated as
where is sampled from and is the transition probability derived from the state dynamics (15). As a result, the sample cloud provides an approximate distribution for the prior . Then, in the update stage, we have
In this way, we obtain a weighted empirical distribution that approximates the posterior pdf with the importance density weight . Then, to avoid the degeneracy problem, we need the resampling step. Thus, we have
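The prediction, update, and resampling stages just described can be summarized in the following minimal sketch of a bootstrap particle filter step; `propagate` and `likelihood` are user-supplied placeholders for the controlled transition density and the observation likelihood, not the paper's exact notation.

```python
import numpy as np

def particle_filter_step(particles, propagate, likelihood, obs):
    """One prediction-update-resampling cycle of a bootstrap particle filter.
    `propagate(x)` samples from the controlled state transition (prediction);
    `likelihood(obs, x)` evaluates the observation density (Bayesian update);
    multinomial resampling is used to avoid weight degeneracy."""
    S = len(particles)
    # Prediction: push each particle through the controlled state dynamics.
    predicted = np.array([propagate(x) for x in particles])
    # Update: weight each predicted particle by the observation likelihood.
    weights = np.array([likelihood(obs, x) for x in predicted])
    weights /= weights.sum()
    # Resampling: draw S particles according to the normalized weights.
    idx = np.random.choice(S, size=S, p=weights)
    return predicted[idx]
```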
- Stochastic Optimization for Control Process
In this subsection, we combine the numerical schemes for the adjoint FBSDEs system (10) and the particle filter algorithm to formulate an efficient stochastic optimization algorithm to solve the optimal control process .
On a time instant , we have
where is a time instant after .
Then, we use the approximate solutions of FBSDEs from schemes (14) to replace and the conditional distribution is approximated by the empirical distribution obtained from the particle filter algorithm (16)–(18). Then, we can solve the optimal control through the following gradient descent optimization iteration
Then, the standard Monte Carlo method can approximate expectation by samples:
We can see from the above Monte Carlo approximation that in order to approximate the expectation in one gradient descent iteration step, we need to generate samples. This is even more computationally expensive when the controlled system is a high-dimensional process.
Thus, we want to apply the idea of stochastic gradient descent (SGD) to improve the efficiency of classic gradient descent optimization and combine it with the particle filter method. Instead of using the fully calculated Monte Carlo simulation to approximate the conditional expectation, we use only one realization of to represent the expectation. For the conditional distribution of the controlled process, we use the particles to describe it. Thus, we have
where l is the iteration index, and the index indicates that the random generation of the controlled process varies among the gradient descent iteration steps. indicates a randomly generated realization of the controlled process with a randomly selected initial state from the particle cloud .
Then, we have the following SGD schemes:
where is the approximate solution corresponding to the random sample , and the path of is generated as follows
where . Then, an estimate for our desired data-driven optimal control at time instant is
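A minimal sketch of this single-realization SGD update is shown below; `sample_gradient` stands for the one-sample gradient obtained from the FBSDE solution along a randomly generated path, and all names are illustrative assumptions rather than the paper's scheme.

```python
import numpy as np

def sgd_control_update(u, particles, sample_gradient, lr, num_iters):
    """SGD iteration for the control at one time instant: at each step, one
    particle is drawn from the current particle cloud as the initial state,
    one realization of the controlled path (and the corresponding FBSDE
    solution) is generated by `sample_gradient`, and its single-sample
    gradient replaces the full Monte Carlo average."""
    for l in range(num_iters):
        x0 = particles[np.random.randint(len(particles))]  # random initial state from the cloud
        g = sample_gradient(x0, u)   # one-realization gradient from the FBSDE solution
        u = u - lr * g               # gradient descent step with step size lr
    return u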
The scheme for FBSDEs is
Then, we have the following Algorithm 1:
Algorithm 1: PF-SGD algorithm for the data-driven feedback control problem.
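Since the algorithm box is only summarized briefly here, the following outline sketches how the filtering, optimization, and prediction stages interleave over the temporal partition; the helper routines are assumed to wrap the schemes sketched above (with dynamics, likelihoods, and FBSDE solvers bound inside them), and their names are illustrative.

```python
def pf_sgd_feedback_control(x_particles, controls, observations,
                            filter_update, optimize_control, propagate_under_control):
    """High-level outline of the PF-SGD loop over the temporal partition.
    At each time step: (i) the particle filter assimilates the new observation
    to update the conditional distribution of the state; (ii) SGD refines the
    control for the current step using particles as initial states; (iii) the
    particles are propagated forward under the updated control."""
    for n, obs in enumerate(observations):
        # (i) Filtering: update the conditional distribution of the state.
        x_particles = filter_update(x_particles, obs)
        # (ii) Optimization: estimate the data-driven control for this step.
        controls[n] = optimize_control(controls[n], x_particles)
        # (iii) Prediction: propagate particles to the next step under the control.
        x_particles = propagate_under_control(x_particles, controls[n])
    return controls
```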
3. Convergence Analysis
Our convergence analysis aims to show the convergence of the distribution of the state to the "true state" under the temporal discretization N. We also show the convergence of the estimated control to the "true control" under the expectation restricted to a compact set. To proceed, we first introduce our notation and the assumptions required in the proofs in Section 3.1. Then, in Section 3.2, we provide the main convergence theorems.
3.1. Notations and Assumptions
- Notations
- We use to denote the control process that starts from time and ends at time T. We use to denote the collection of the admissible controls starting at time .
- We define the control at time to be , the conditional distribution coming from a particle filter algorithm.
- We define where the superscript means that the measure is obtained through the particle filter method, and so it is random.
- We use to denote the sampling operator: and to denote the updating step in the particle filter. We use to denote the transition operator (the prediction step) under the SGD–particle filter framework. is the deterministic transition operator for the exact case (the control is exact in SGD). We mention “deterministic” here to distinguish the case where the control may be random due to the SGD optimization algorithm.
- We use to denote the deterministic inner product, i.e., if , then
- We define . We then have . We remark that is a process that starts from time , and so is essentially the initial condition of the diffusion process.
- We define the distance between two random measures to be the following, where the expectation is taken over the randomness of the measure:
- We use the total variation distance between two deterministic probability measures (standard forms of both distances are recalled right after this list):
- We use to denote the total number of iterations taken in the SGD algorithm at time ; we use to denote the total number of particles in the system. We use C to denote a generic constant which may vary from line to line.
- Abusing the notation, we will denote in the following way where the argument can be a vector of any length :
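The two distances referenced above are typically written as follows; this is a standard form consistent with, e.g., [18,22], and the exact conventions in the paper's definitions may differ slightly.

```latex
% Distance between two random measures \mu and \nu (expectation over their randomness):
d\big(\mu, \nu\big) \;=\; \sup_{\|f\|_\infty \le 1} \mathbb{E}\Big[\, \big|\, \mu(f) - \nu(f) \,\big| \,\Big],
\qquad \mu(f) := \int f \, d\mu,
% Total variation distance between two deterministic probability measures:
d_{TV}\big(\mu, \nu\big) \;=\; \sup_{\|f\|_\infty \le 1} \big|\, \mu(f) - \nu(f) \,\big|.
```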
- Assumptions
- We assume that satisfy the following strong condition: for any , there exists a constant such that for all : Notice that (30) implies that such an inequality is true for any , and it can be seen by simply fixing all the , to be 0. This is a very strong assumption, and one should consider relaxing it to the following: That is, this relation holds in expectation instead of point-wise.
- Both b and are deterministic and in the space variable x and the control u.
- are all uniformly Lipschitz in and uniformly bounded.
- satisfies the uniform elliptic condition.
- The initial condition .
- The terminal (loss) function is and positive, and has at most linear growth at infinity.
- We assume that the function (related to the Bayesian step) has the following bound: there exists such that
3.2. The Convergence Theorem for the Data-Driven Feedback Control Algorithm
Our algorithm combines the particle filter method and the stochastic gradient descent method. Lemma 1 (combining Lemmas 4.7–4.9 from the book [22]) provides the convergence result for the particle filter method alone. It shows that each prediction and updating step is guaranteed to be convergent.
Recall that is the sampling operator where we sample particles. denotes the transition operator (the prediction step) under the SGD–particle filter framework. denotes the deterministic transition operator assuming that SGD gives the exact control. denotes the updating step in the particle filter method.
Lemma 1.
We assume that there exists . The following is true:
Given Lemma 1, Theorem 4.5 in [22] tells us that the particle filter framework is convergent. Then, following Lemma 1, we can bound the distance between the true distribution of the state and the distribution estimated through the SGD–particle filter framework.
where in the above inequalities, we have used triangle inequalities and Lemma 1.
Hence, if we can show that the inequality of the following form holds
for some constants and that we can tune, then, by recursion and using (34), we can show that the convergence holds.
Remark.
We point out that the difficulty lies in showing (35). Recall that the distance between two random measures defined in (27) involves testing against all measurable functions bounded by 1. However, we will see later that it is more desirable to test against Lipschitz functions. Hence, since the underlying measure is a finite Borel probability measure, we first identify the function with a continuous function on a compact set (Lusin's theorem). Then, we approximate this continuous function uniformly by a Lipschitz function, since the domain is now compact. In this way, we can roughly show that a form close to (35) is true.
Remark.
Notice that the first measure in has two sources of randomness: the randomness in which comes from the SGD algorithm used to find the control, and the randomness in the measure . However, we do not distinguish the two when we take the expectation.
To prove the convergence, we need to create a subspace where all particles (obtained from the particle filter method) at any time n are within this bounded subspace. Or we can relax it to the statement that the probability of any particles escaping from a very large region is very small. Lemma 2 shows we can restrict the particles to a compact subspace with the radius M by starting from any particles and any admissible control .
Lemma 2.
There exists M and constant C, such that under any admissible control
Proof.
See Appendix A.1. □
Remark.
Lemma 2 states that, starting from any random selection of particles and any admissible control , at any time t, all particles are restricted to a compact set with diameter , such that
We will use the following result extensively later
The following Lemma 3 describes the difference between the estimated optimal control and the true control . Let . We can see that knowing essentially means that we know the control in the SGD framework at time , since according to our scheme, the control is measurable.
Lemma 3.
Under a fixed temporal discretization number N, with the particle cloud , a deterministic and a compact domain , (such that and ), we have for any iteration number K that the following holds
Remark.
The value of depends on , which is obtained from the previous step, and it does not depend on the current . As a result, we can see that, as long as
(39) can be made arbitrarily small on any compact domain , and this indicates the point-wise convergence for u at any time .
Proof.
For simplicity in the proof, we denote the control as , where K is the iteration index in the SGD and n is the current time . Let denote the SGD process using the estimated control , and denote the process using the true control .
where is drawn from the current distribution and by the optimality condition. We take the difference between (40) and (41), square both sides, and take the conditional expectation , which is taken with respect to the following sources of randomness:
- The randomness comes from the selection on the initial point .
- The randomness comes from the pathwise approximated Brownian motion used for FBSDEs.
- The randomness comes from the accumulation of the past particle sampling.
We can write , which can be seen from the following
Then, we take the squared norm on both sides, multiply by an indicator function, and take the conditional expectation . Noticing that is measurable and is deterministic, we obtain the following
where in the last line, we used the following lemma from [13], which states that there exists C such that
Recalling that , we then have
Then, we take the expectation on both sides over the randomness and we have
Notice that for the control , for a fixed x, is uniformly bounded:
Thus, we have
where we have absorbed the constant term N in C. □
Lemma 3 shows that when the empirical distribution is close enough to the true distribution , the difference between and under the expectation restricted to a compact set is bounded by the difference between the true distribution of the state and the estimated distribution. Thus, suppose we can show the convergence of the distribution. In that case, we have the convergence result of the estimated control to the "true control" under the expectation restricted to a compact set.
Next, we want to show that by moving forward in one step, the distance between the true distribution of the state and the estimated distribution through the SGD–particle filter framework is bounded by the distance of the previous step with some constants in Lemma 4.
Lemma 4.
For each , there exist such that the following inequality holds
Proof.
The key step is to estimate the quantity in (34). Without loss of generality, we assume that the supremum is attained by a function f with ; then, we have
Notice that is the prediction operator that uses the control which carries the randomness from SGD, and uses the control . Then is a random measure, and we comment that both and are deterministic.
Without loss of generality, we use and to denote the random control and the random measure. (Even though the randomness can be different, we can concatenate to define them as in general.)
We have for the fixed randomness , and by Fubini’s theorem
where the inner conditional expectation is taken with respect to .
Now, since we can pick to be a large compact set containing the origin, then
To deal with , we see that it is desirable that the function f has the Lipschitz property. However, it is only measurable in general. The strategy to overcome this difficulty is to first use Lusin’s Theorem to find a continuous identification with f on a large compact set; then, on this compact set, we can approximate uniformly by a Lipschitz function.
We see that
Then, by taking expectation on both sides over all the randomness in this quantity, we have
We know that there exists a big compact (so a large ) containing the origin such that
and a continuous with by Lusin’s theorem.
Thus, we know that , and we also have the following inequality:
Moreover, since both and are compact, is also compact with . From Lemma 2, we know that there exists some constant C such that for any that one obtains from the particle filter–SGD algorithm, or :
Hence, we have that
To deal with , notice that by the choice of f, we have the following.
by Lemma 2.
To deal with , we have by the density of Lipschitz functions that there exists with Lipschitz constant . We point out that may depend on , and the function . Now, by taking the expectation on both sides and using the Lipschitz property, we have
We realize that * is the SGD optimization part of the algorithm in expectation, and we note that we have dropped the inner expectation. The expectation means that given the initial condition , with , one wants to find the difference in expectation between and . The outer expectation means averaging over all the randomness in both the measure and the SGD.
Now, by using (50) in Lemma 3, absorbing N in the constant C, we obtain the following
By definition of the distance between two random measures, we have that:
Remark.
Lusin’s theorem requires the underlying measure to be a finite regular Borel measure, and in this case, we are looking at the measure defined as follows: for , . This is clearly a probability measure induced on the Polish space , and so it is tight by the converse implication of Prokhorov’s theorem (or we can use the fact that all finite Borel measures defined on a complete separable metric space are tight). Thus, it is inner regular; since the measure is also clearly locally finite, outer regularity follows as well.
Finally, we can use Lemma 4 repeatedly to show the convergence result:
Theorem 5.
By taking , there exist , and such that
where . Then, for any , by picking , , large enough and small enough, the following holds
for some fixed constant C which depends only on κ.
Proof.
Since we know that , we now just need to show that (70) vanishes when gets large and gets small, . Notice that comes from the domain truncation for each time step and comes from the uniform approximation which is free to choose. The choice of will potentially determine the value of .
We fix , where N is the number of time discretization steps and M is potentially a large number.
Then, we define , through the following:
Here, we define .
Notice that one should apply (71) and (72) iteratively, since defining will determine the Lipschitz constant at stage i, which is needed for the definition of .
Then, we have that
and we also have
By picking to be large, we then can have
Last but not least, by taking so large that
we can see that (70) converges to 0 by taking M to be very large. □
Remark.
Notice that in Theorem 5, it is natural to have terms that depend on and . The presence of and is due to technical difficulties. basically gives the growth of the particles in the worst-case scenario (we want our domain to be compact), while and come from the Lipschitz approximation of the test function f.
4. Numerical Example
In this section, we carry out two numerical examples. In the first example, we consider a classic linear quadratic optimal control problem, in which the optimal control can be derived analytically. We use this example as a benchmark example to show the baseline performance and the convergence trend of our algorithm. In the second example, we solve a more practical Dubins vehicle maneuvering problem, and we design control actions based on bearing angles to let the target vehicle follow a pre-designed path.
4.1. Example 1. Linear Quadratic Control Problem with Nonlinear Observations
Assume B, K are symmetric, positive definite. The forward process Y and the observation process M are given by
The cost functional is given by
and we want to find .
4.1.1. Experimental Design
An interesting feature of this example is that one can construct a time-deterministic exact solution that depends only on .
By simplifying (78), we have
Then, we define:
Hence, we see that
and (81) is true because all the terms are deterministic in time given . Moreover, we observe that
As a result, we see that (79) now takes the form:
and by performing a simple integration by parts, we have
As a result, we have the following standard deterministic control problem:
Then, one can form the following Hamiltonian
where p is , and v is the value function.
Then, to find the optimal control, we have
which is
Thus, we obtain
Additionally, notice that
and then,
Then, set
Set B, R, K and Q as identity matrices. With , we have the following solution according to this setup.
To solve (92), let
Then, we have
Let be
where . Then,
Then, to find the exact form by following the trajectory of in this setup, one will have to solve the following coupled forward–backward ODE.
with . As a result, we have
That is, we need to solve the above coupled FBODE. Then, seeing that , and writing , we have
To solve (102) numerically, we conduct a numerical discretization:
We can put (103) into a large linear system and solve it numerically.
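A minimal sketch of this assembly step is given below for a generic linear coupled forward–backward ODE with an initial condition on the forward variable and a terminal condition on the backward variable; the coefficient matrices and the implicit Euler stencil are placeholders and do not reproduce the exact coefficients of (102) and (103).

```python
import numpy as np

def solve_linear_fbode(A, B, C, D, x0, pT, N, T):
    """Sketch: discretize the generic linear coupled forward-backward ODE
        x' = A x + B p,  x(0) = x0,      p' = C x + D p,  p(T) = pT,
    with an implicit Euler scheme on N steps, collect all unknowns
    (x_1..x_N, p_0..p_{N-1}) into one linear system, and solve it directly."""
    d = len(x0)
    h = T / N
    M = np.zeros((2 * N * d, 2 * N * d))
    rhs = np.zeros(2 * N * d)
    I = np.eye(d)
    ix = lambda n: slice((n - 1) * d, n * d)                   # block of x_n, n = 1..N
    ip = lambda n: slice(N * d + n * d, N * d + (n + 1) * d)   # block of p_n, n = 0..N-1
    for n in range(N):
        rx = slice(n * d, (n + 1) * d)                  # rows of the forward equation
        rp = slice(N * d + n * d, N * d + (n + 1) * d)  # rows of the backward equation
        # Forward step: (x_{n+1} - x_n)/h = A x_{n+1} + B p_n
        M[rx, ix(n + 1)] += I - h * A
        if n >= 1:
            M[rx, ix(n)] -= I
        else:
            rhs[rx] += x0                               # known initial condition
        M[rx, ip(n)] += -h * B
        # Backward step: (p_{n+1} - p_n)/h = C x_{n+1} + D p_n, with p_N = pT known
        M[rp, ip(n)] += -I - h * D
        M[rp, ix(n + 1)] += -h * C
        if n + 1 <= N - 1:
            M[rp, ip(n + 1)] += I
        else:
            rhs[rp] += -pT                              # terminal condition moved to the rhs
    sol = np.linalg.solve(M, rhs)
    return sol[:N * d].reshape(N, d), sol[N * d:].reshape(N, d)
```

A call such as `solve_linear_fbode(A, B, C, D, x0, pT, N=200, T=1.0)` then returns the discretized forward trajectory and its adjoint on the partition.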
4.1.2. Performance Experiment
We set the total number of time discretization steps to be . We set the number of iterations to , the number of particles in each dimension to 128, , and .
In Figure 1, we present the estimated data-driven control and the true optimal control.
Figure 1.
Estimated control vs. true optimal control.
In Figure 2, we show the estimated state trajectories with respect to true state trajectories in each dimension.
Figure 2.
Estimated state vs. true state.
We can see from these figures that our data-driven feedback control algorithm works very well for this 4-D linear quadratic control problem, despite the nonlinear observations.
4.1.3. Convergence Experiment
In this experiment, we demonstrate the convergence performance of our algorithm, and we study the error decay of the algorithm in the norm with respect to the number of particles used. Each result is an average over 50 independent tests.
Specifically, we set and we just increase the number of particles S = {2, 8, 32, 128, 512, 2048, 4096, 8192, 16,384, 32,768}, and we obtained the result in Figure 3.
Figure 3.
Error vs. number of particles.
Set the number of particles , and . We obtained the result in Figure 4.
Figure 4.
Error vs. number of steps.
From the results above, we can see that the error will decrease and converge as we increase the number of particles and the number of iterations.
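For reference, the particle-count convergence study can be organized along the following lines; `run_experiment` is an assumed wrapper that runs the PF-SGD algorithm once with the given number of particles and returns the control error, and is not part of the paper's code.

```python
import numpy as np

# Sketch of the particle-count convergence study: for each particle number S,
# run 50 independent tests and average the control error.
particle_counts = [2, 8, 32, 128, 512, 2048, 4096, 8192, 16384, 32768]
average_errors = []
for S in particle_counts:
    errors = [run_experiment(num_particles=S) for _ in range(50)]
    average_errors.append(np.mean(errors))  # averaged over 50 independent tests
```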
4.2. Example 2. Two-Dimensional Dubins Vehicle Maneuvering Problem
In this example, we solve a Dubins vehicle maneuvering problem. The controlled process is described by the following nonlinear controlled dynamics:
where the pair gives the position of a car-like robot moving in the 2D plane, is the steering angle that controls the moving direction of the robot, which is governed by the control action , and is the noise that perturbs the motion and control actions. Assume that we do not have direct observations on the robot. Instead, we use two detectors located on different observation platforms at and to collect bearing angles of the target robot as indirect observations. Thus, we have the observation process . Given the expected path , the car should follow it and arrive at the terminal position on time. The performance cost functional based on observational data that we aim to minimize is defined as:
In our numerical experiments, we let the car start from (0, 0) and travel to (1, 1). The expected path is . Other settings are , , i.e., , , , , , and the initial heading direction is . To emphasize the importance of following the expected path and arriving at the target location at the terminal time, let .
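For illustration, a minimal sketch of the controlled Dubins-type dynamics and the bearing-angle observations is given below; the noise structure, the coefficients, and the platform coordinates are placeholders rather than the exact values used in this experiment.

```python
import numpy as np

def dubins_step(state, u, dt, sigma=0.1):
    """One Euler-Maruyama step of a simplified controlled Dubins-type model:
        dx = cos(theta) dt,  dy = sin(theta) dt,  dtheta = u dt + noise.
    The exact noise structure in the paper's model may differ."""
    x, y, theta = state
    x_new = x + np.cos(theta) * dt
    y_new = y + np.sin(theta) * dt
    theta_new = theta + u * dt + sigma * np.sqrt(dt) * np.random.randn()
    return np.array([x_new, y_new, theta_new])

def bearing_observation(state, platforms, noise_std=0.05):
    """Noisy bearing angles of the vehicle measured from the observation
    platforms; arctan2 gives the angle of the line of sight."""
    x, y, _ = state
    bearings = [np.arctan2(y - py, x - px) for (px, py) in platforms]
    return np.array(bearings) + noise_std * np.random.randn(len(platforms))
```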
In Figure 5, we plot our algorithm’s designed trajectory and the estimated trajectory. We can see from this figure that the car moves towards the target along the designed path and is “on target” at the final time with a very small error.
Figure 5.
Controlled trajectory from (0,0) to (1,1).
We set and we increase the number of particles . To demonstrate the convergence of our algorithm in solving this Dubins vehicle maneuvering problem, we repeated the above experiment 50 times, and we obtained the error shown in Figure 6, where .
Figure 6.
Error vs. number of particles.
We set the number of particles and . We obtained the error shown in Figure 7, where the error is the average of .
Figure 7.
Error vs. number of steps.
From the results above, we can see that the error will decrease and converge as we increase the number of particles and the number of iterations.
5. Conclusions
In this paper, we present the weak convergence of the data-driven feedback control algorithm proposed in [1]. We do not discuss the convergence rate due to the challenge of determining the radius M of the compact subspace that bounds all particles . However, in practice, given a terminal time T, one can use Monte Carlo simulations to find an M that satisfies a certain probability in Lemma 2. Our numerical experiments indicate that both the estimated control and estimated distribution converge at a rate related to the number of particles and iterations.
Future work can focus on analyzing the convergence rate and error bounds for a given state system. This will provide clarity on the number of particles and iterations required to achieve the desired estimation accuracy when applying the algorithm from [1].
Author Contributions
Conceptualization, S.L., H.S. and F.B.; methodology, F.B. and R.A.; software, S.L.; validation, S.L. and F.B.; formal analysis, S.L. and H.S.; investigation, S.L.; resources, S.L.; data curation, S.L.; writing—original draft preparation, S.L. and H.S.; writing—review and editing, F.B. and R.A.; visualization, S.L.; supervision, F.B.; project administration, F.B.; funding acquisition, F.B. and R.A. All authors have read and agreed to the published version of the manuscript.
Funding
This work is partially supported by U.S. Department of Energy through FASTMath Institute and Office of Science, Advanced Scientific Computing Research program under the grant DE-SC0022297. FB would also like to acknowledge the support from U.S. National Science Foundation through project DMS-2142672.
Data Availability Statement
All code written as part of this study will be made available on GitHub upon completing the peer review process for this article.
Conflicts of Interest
Author Hui Sun was employed by the company Citigroup Inc. The research work performed by the author does not represent any corporate opinion. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| BSDEs | Backward stochastic differential equations |
| FBSDEs | Forward–backward stochastic differential equations |
| PF | Particle filter |
| SGD | Stochastic gradient descent |
Appendix A
Appendix A.1
Proof of Lemma 2.
Proof.
We start with time .
Step 1. Starting from with , and by fixing an arbitrary control , we have for the prediction step:
Step 2. We denote the distribution , and then the particle method will perform a random resampling from such a distribution and obtain a random distribution
Hence, for , taking the expectation over all randomness in the measure, we have
where are i.i.d. random samples, contains the sampling randomness, and . The conditional expectation is meant to show that all the particles are conditionally independent (since there is other randomness that has accumulated in the history if we want to apply this argument recursively). Thus, by (A1),
Step 3. We now have the random measure , and we proceed to the analysis step. We have by definition
where is the distribution of the terminal state from the previous step. Recalling assumption 7, we give an estimate over :
Step 4. Now, we again apply the random sampling step
where . Then, for , we have
where and is the filtration that builds on and the randomness of the current sampling. Then, by (A6), we have
and this completes all the estimates for the first time step. Hence, after one time step, we have
which means that by applying the same argument, we will have the following recursion in general:
As a result, by picking an arbitrary and using this same argument repeatedly until N, we have that for all :
and we notice that is increasing in n. As a result, we know that for any , we have that
Hence, by Chebyshev’s inequality, we have
and then we have that
Since the control values were picked arbitrarily, we have that
□
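For completeness, the Chebyshev step used above typically takes the following generic form under a second-moment bound of the type established in the recursion; the paper's exact constants and the way the maximum over particles and time steps is handled may differ.

```latex
% If \mathbb{E}\big[\,|X^{(i)}_n|^2\,\big] \le C_n for every particle i and time step n, then
\mathbb{P}\big( |X^{(i)}_n| > M \big)
  \;\le\; \frac{\mathbb{E}\big[\,|X^{(i)}_n|^2\,\big]}{M^2}
  \;\le\; \frac{C_n}{M^2},
% and a union bound over the S particles and N time steps gives
\mathbb{P}\Big( \max_{0 \le n \le N}\; \max_{1 \le i \le S} |X^{(i)}_n| > M \Big)
  \;\le\; \frac{S\,(N+1)\,\max_{n} C_n}{M^{2}},
% which can be made arbitrarily small by taking M large.
```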
References
- Archibald, R.; Bao, F.; Yong, J.; Zhou, T. An efficient numerical algorithm for solving data driven feedback control problems. J. Sci. Comput. 2020, 85, 51. [Google Scholar] [CrossRef]
- Bellman, R. Dynamic programming. Science 1966, 153, 34–37. [Google Scholar] [CrossRef] [PubMed]
- Feng, X.; Glowinski, R.; Neilan, M. Recent developments in numerical methods for fully nonlinear second order partial differential equations. SIAM Rev. 2013, 55, 205–267. [Google Scholar] [CrossRef]
- Peng, S. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim. 1990, 28, 966–979. [Google Scholar] [CrossRef]
- Yong, J.; Zhou, X.Y. Stochastic Controls: Hamiltonian Systems and HJB Equations; Springer Science & Business Media: Cham, Switzerland, 2012. [Google Scholar]
- Gong, B.; Liu, W.; Tang, T.; Zhao, W.; Zhou, T. An efficient gradient projection method for stochastic optimal control problems. SIAM J. Numer. Anal. 2017, 55, 2982–3005. [Google Scholar] [CrossRef]
- Tang, S. The maximum principle for partially observed optimal control of stochastic differential equations. SIAM J. Control Optim. 1998, 36, 1596–1617. [Google Scholar] [CrossRef]
- Zhang, J. A numerical scheme for BSDEs. Ann. Appl. Probab. 2004, 14, 459–488. [Google Scholar] [CrossRef]
- Zhao, W.; Fu, Y.; Zhou, T. New kinds of high-order multistep schemes for coupled forward backward stochastic differential equations. SIAM J. Sci. Comput. 2014, 36, A1731–A1751. [Google Scholar] [CrossRef]
- Archibald, R.; Bao, F.; Yong, J. A stochastic gradient descent approach for stochastic optimal control. East Asian J. Appl. Math. 2020, 10, 635–658. [Google Scholar] [CrossRef]
- Sato, I.; Nakagawa, H. Approximation analysis of stochastic gradient Langevin dynamics by using Fokker-Planck equation and Ito process. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Volume 32, pp. 982–990. [Google Scholar]
- Shapiro, A.; Wardi, Y. Convergence analysis of gradient descent stochastic algorithms. J. Optim. Theory Appl. 1996, 91, 439–454. [Google Scholar] [CrossRef]
- Archibald, R.; Bao, F.; Cao, Y.; Sun, H. Numerical analysis for convergence of a sample-wise backpropagation method for training stochastic neural networks. SIAM J. Numer. Anal. 2024, 62, 593–621. [Google Scholar] [CrossRef]
- Zakai, M. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 1969, 11, 230–243. [Google Scholar] [CrossRef]
- Gordon, N.J.; Salmond, D.J.; Smith, A.F. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F (Radar Signal Process.) 1993, 140, 107–113. [Google Scholar] [CrossRef]
- Morzfeld, M.; Tu, X.; Atkins, E.; Chorin, A.J. A random map implementation of implicit filters. J. Comput. Phys. 2012, 231, 2049–2066. [Google Scholar] [CrossRef]
- Andrieu, C.; Doucet, A.; Holenstein, R. Particle Markov chain Monte Carlo methods. J. R. Statist. Soc. B 2010, 72, 269–342. [Google Scholar] [CrossRef]
- Crisan, D.; Doucet, A. A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 2002, 50, 736–746. [Google Scholar] [CrossRef]
- Künsch, H.R. Particle filters. Bernoulli 2013, 19, 1391–1403. [Google Scholar] [CrossRef]
- Bao, F.; Cao, Y.; Meir, A.; Zhao, W. A first order scheme for backward doubly stochastic differential equations. SIAM/ASA J. Uncertain. Quantif. 2016, 4, 413–445. [Google Scholar] [CrossRef]
- Zhao, W.; Zhou, T.; Kong, T. High order numerical schemes for second-order FBSDEs with applications to stochastic optimal control. Commun. Comput. Phys. 2017, 21, 808–834. [Google Scholar] [CrossRef]
- Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation; Springer: Cham, Switzerland, 2015. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).