Tracking Control for Output Probability Density Function of Stochastic Systems Using FPD Method

Output probability density function (PDF) tracking control of stochastic systems has always been a challenging problem in both theoretical development and engineering practice. Focused on this challenge, this work proposes a novel stochastic control framework so that the output PDF can track a given time-varying PDF. Firstly, the output PDF is characterised by the weight dynamics following the B-spline model approximation. As a result, the PDF tracking problem is transferred to a state tracking problem for weight dynamics. In addition, the model error of the weight dynamics is described by the multiplicative noises to more effectively establish its stochastic dynamics. Moreover, to better reflect the practical applications in the real world, the given tracking target is set to be time-varying rather than static. Thus, an extended fully probabilistic design (FPD) is developed based on the conventional FPD to handle multiplicative noises and to track the time-varying references in a superior way. Finally, the proposed control framework is verified by a numerical example, and a comparison simulation with the linear–quadratic regulator (LQR) method is also included to illustrate the superiority of our proposed framework.


Introduction
In recent years, there has been a growing interest in stochastic control, as many systems in the real world, such as those in aerospace, chemical, textile machinery and ships, can all be modelled as stochastic dynamic systems [1]. In the study of stochastic system control, for the Gaussian processes, the distribution can be controlled by only controlling its mean and variance [2]. However, many stochastic systems contain non-Gaussian processes, such as the scale distribution of flocculating particles in the paper process [3], the flame shape distribution [4], and the chemical polymerisation reaction molecular weight distribution [5]. For such stochastic systems with non-Gaussian variables, the mean and variance cannot describe the full statistical information, so that traditional methods are no longer productive [6]. In order to solve this problem, Wang proposed a new stochastic control method by introducing the PDF into the control field in 1996 [7,8]. In this method, the B-spline function is applied to model the PDF, which releases the limitation of the Gaussian assumption of the traditional stochastic control methods [9]. Based on that, a series of stochastic control frameworks have been presented in both practical and theoretical disciplines. On the practical side, Li applied the PDF control method to the fibre length distribution in the grinding process to predict the random distribution of the fibre length and achieved a good control effect [3]. Sun applied this method to the flame temperature field, realised the modelling of the flame temperature field and its iterative learning control, and achieved the purpose of improving the control effect [4]. In theoretical terms, a PDF shape control method using the FPK equation for nonlinear stochastic systems with an arbitrary expression was proposed in [10], where the exact stationary solution of the FPK equation was derived and the controller was designed for a non-polynomial nonlinear function. Reference [11] presented the rational square root B-spline model and introduced the concept of the pseudo-weight, which makes the control algorithm almost unconstrained and the system analysis relatively simple. In these studies, most correlation methods ignore the model error of the B-spline function or simply assume the error as steady-state residuals. Thus, the control methods that have been applied in most of these studies are LQR methods. However, for some complicated practical systems, the PDF model error can be involved in a stochastic dynamical manner [12]. Ignoring that would affect the control performance and even lead to divergence. For such complex practical processes, multiplicative noise, which is also known as state-dependent noise, can better characterise the uncertainty of the model parameters and be more in line with the actual systems [13]. In this way, the LQR method is no longer suitable for dealing with noises and uncertainties, which are strongly dependent on the system state. Thus, finding a more advanced control algorithm aiming at multiplicative noise is crucial.
Target tracking has always been a hot topic in both theoretical research and practical applications [14]. Unlike the general tracking problem, for the stochastic systems, it usually requires the system to track a distribution rather than a single value. For the Gaussian processes, the controller can be simply designed to track the mean and variance of the given distribution. For the non-Gaussian processes, there are many algorithms and literature works published under the B-spline model framework proposed by Wang. For instance, Zhou designed the observer to control the observed state to track the actual state of the system [15]. Luan added noise to the model to control the output PDF by the optimal control algorithm [16]. In most of the related studies, they only considered the static tracking targets and neglected the time-varying targets. However, in many industrial plants, in order to meet the field requirements, the tracking target can be time-varying, which increases the difficulties for controller design.
In summary, based on the aforementioned problems, the challenges of stochastic systems include, but are not limited to the following: (1) most existing stochastic control algorithms are limited to the Gaussian assumption and have poor control performance once the systems contain non-Gaussianity or nonlinearity; (2) for the published control frameworks under the B-spline approximation model proposed by Wang [8], even though they have successfully lifted the Gaussian assumption of the traditional stochastic method, they have failed to involve the uncertainties and errors of the model parameters in a stochastic dynamical manner; (3) The majority of stochastic tracking problems consider static references other than the time-varying reference, which brings problems to practical applications. To address these issues, this paper proposes a novel B-spline model-based control framework using the fully probabilistic control design (FPD) [17,18]. In this method, a linear B-spline model is applied to describe the distribution of the system output [11,16]. Besides the B-spline model, there are other approximation models that can also be used to fit the PDF curve such as the RBF model [19], the input and output ARMAX model [20], the neural network PDF model, and so on [21]. For the RBF model, the existing research shows that there are few intermediate parameters in RBF model control, but the dynamical relationship between the input and output cannot be reflected. Furthermore, the model accuracy of the RBF model is relatively poor especially when few basis functions are chosen. Although the input-output ARMAX model can reflect the dynamic relationship between the input and output to some extent, it has difficulty controlling the shape of the system when designing the controller. The neural network PDF model can achieve satisfactory control results; however, for a multi-input multi-output system, the number of basis functions will increase exponentially with the increase of the complexity of the system. Compared with the other types of approximation models mentioned above, the B-spline model is the most widely used and is easier to convert to the state-space dynamical weight model. Furthermore, compared to the other types of B-spline model such as the square root B-spline model [22], rational B-spline model [23], and rational square root B-spline model [24], the linear B-spline model is relatively simple and unique in expression.
Based on the B-spline approximation principle [25], this model approximates the system output probability density function by n pre-selected B-spline basis functions. Due to the distribution constraints, only n − 1 of the n weights in this model are independent of each other [26]. Therefore, the control of the system output PDF can be achieved by controlling n − 1 independent weights. Thus, the control goal is transferred from altering the shape of the PDF to a desired PDF to controlling the weights of the B-spline model to a pre-selected weight set. Besides, the PDF model error is characterised as multiplicative noise [27,28], indicating that the weight error of the PDF model is involved in a dynamical manner. Considering that, the randomised controller FPD was implemented in this paper to achieve the control goal [29,30]. The FPD is evaluated by minimising the KL-divergence between the distribution of the system dynamics and the desired distribution. To better cope with multiplicative noises, the extended FPD proposed by Zhou [12] was applied in this paper. This extended FPD is strongly intuitively appealing and provides an explicit minimisation strategy that exhibits better convergence, as well as shorter response times when dealing with multiplicative noise [30].
To sum up, under the B-spline framework proposed by Wang, the purpose of this paper is to design a randomised control method so that the weight of the dynamics can track a target time-varying weight, thus realising PDF shape tracking and providing a theoretical basis for PDF tracking for non-Gaussian stochastic systems in the actual industry. The contribution of this paper can be summarised as follows: 1 The linear B-spline model is implemented to characterise the system PDF, thus converting the PDF shape control problem into the weight control problem; 2 The PDF model error is represented as multiplicative noise, indicating the stochasticity and dynamics of the weight error; 3 The dynamics of the weights are characterised by the stochastic state space model, thus providing full stochastic properties; 4 The extended FPD is applied to better cope with the multiplicative noises existing in the dynamics of the weight model. Compared with the conventional FPD [29], the extended FPD can better cope with multiplicative noises by modifying the Riccati equations. Moreover, we improved the extended FPD in [12] so that the system state can track the time-varying targets.
The rest of this paper is organised as follows. Section 2 states a description of the problem. In Section 3, the optimal control law is solved based on the performance metrics and the implementation algorithm is provided. In Section 4, the controller is applied to a numerical example and a comparison simulation with the LQR method is also included. Finally, the conclusion and future work are summarised in Section 5.

PDF Description Based on B-Spline
For the output PDF of the controlled system, if the PDF is obtained by solving partial differential equations, it is very challenging to model by first principles, thus bringing difficulties to obtaining an effective control strategy [31]. To address that, the B-spline model can be applied to fit the PDF curve by the relationship between the weights and basis functions. Assuming that the interval [a, b] is known and the output PDF γ(y) is continuous and bounded on the interval [a, b], the output PDF can be represented using n B-spline basis functions as follows: where w i , (i = 1, 2, ..., n) is the weight and B i (y) is the pre-selected n basis functions. There are different functions that can be chosen as basic functions such as the Gaussian function, radial basis function, and wavelet basis function. From the properties of the B-spline function, it can be used to approximate any continuous function defined on a compact set [25]. A notable advantage of the B-spline method is the exact and smooth surfacewise description of the curve [32]. Compared with the RBF basis function, when the number of basis functions is low, the B-spline basis function is more suitable for curve fitting. This will be proven later by the comparison simulation results in Section 4. Since γ(y) is a PDF defined on the interval [a, b], it should satisfy the following constraint: b a γ(y) = 1.
As Equation (2) needs to be guaranteed, it follows that only the n − 1 weights are independent of each other, for which the distribution can be written in the following form: where x is the weight set, C 0 (y) is the basis function vector, and L(y) is the basis function scalar. From Equations (5) and (6), we can see that C 0 (y) and L(y) are known once the basis functions are chosen. According to Equations (1)-(6), we can see that, by using the B-spline model, controlling the output PDF shape can be realised by controlling n − 1 independent weight vectors [33]. Figure 1 demonstrates the tracking diagram of the system B to the system A. The target system A has an unknown structure and can monitor the output in real time, which means that for any k moments, the output PDF distribution g k (y) of the system A is known. System B is a known controlled system with control input u and output γ k (y, u k ). The goal here is to make the output distribution of System B track the output distribution of System A, where D characterises the difference between the two distributions. According to Figure 1, the system and tracking control problems mentioned above will be described in detail in the following section. Consider the stochastic system with output PDF γ k (y), whose dynamics is described by:

PDF Tracking Control Problem
where the distribution of the system output y is γ k (y) and u k is the system input. By applying the B-spline model (3), the output PDF γ k (y) of the tracking system B is described as follows: where x k are the weights corresponding to the basis functions. The tracking target g k (y) is a dynamic target PDF that varies dynamically with time in the following form: where V k are the pre-set target weights corresponding to each basis function. Note that the B-spline basis functions need to be selected in advance. Then, the shaping distribution problem of the output PDF γ(y, u k ) on the interval [a, b] can be realised by controlling the weight state x k . Based on the B-spline model framework given in [8], the dynamics of the weight states x k of the B-spline model-based PDF are described as follows: where G ∈ R n−1×n−1 , H ∈ R n−1×1 are the corresponding weight system parameters and E k represents the state-based model randomness whose distribution is given by where M is the variance of E k . The control flow chart is shown above in Figure 2. After pre-selecting the B-spline basis function, the time-varying target weight V k in Figure 2 can be obtained according to the target distribution through the B-spline principle. The system input u k is obtained by evaluating the target weight and the system weight through the FPD controller, which will be introduced in the next section. The weight x k+1 is then updated by B-spline principle modelling and input u k . According to the relationship between the weight and the basis function, the output distribution is then obtained. In addition, it should be noted that the model error part E k is represented by multiplicative noise, and D is the weight matrix with the appropriate dimension. Finally, the weights are iteratively updated according to the model to achieve output distribution control.
Feedback session Following this sequence, our control goal was to design a randomised controller so that the weight x k can follow the target weight V k . The PDF shape-tracking control problem is transformed into a weight tracking control problem. The controller design will be introduced in the next section.

Control Algorithm
In this section, the FPD control algorithm is introduced to achieve the tracking goals for the weight state. The main reason we chose FPD is that we considered the multiplicative noise in the weight dynamical system. Moreover, FPD not only fully respects the stochastic nature of the system, but also provides a very detailed implementation procedure. The details will be given as follows.

General Control Solution of FPD
In FPD, the difference between the actual distribution f (D) of the system and the target distribution f I (D) is measured by the Kullback-Leibler divergence (KLD) [34], where D = (x k , u k ) represents the observation data. The closer the distance between the output distribution and the target distribution, the greater the degree of similarity and, conversely, the smaller the degree of similarity [35]. The formula for the KLD is given by where D is the relative entropy, directional divergence, or KLD and D has the key property of non-negativity, i.e., D ( f || f I ) ≥ 0. Assuming that u k is the input to the system and k ∈ k 0 , k 1 , ..., k n , all states x k of the system are assumed to be measurable; the joint PDF of the closed-loop system from moment k 0 to moment k n is: where s(x k |x k−1 , u k−1 ) denotes the distribution of the system dynamics at moment k and c(u k−1 |x k−1 ) denotes the distribution of the controller at moment k. Similar to the joint PDF system, the target PDF f I (D) is given by where s I (x k |x k−1 , u k−1 ) denotes the desired distribution at moment k, while c I (u k−1 |x k−1 ) denotes the desired distribution of the controller at moment k. The control strategy is to find an optimal randomised controller to bring the distribution of the system dynamics as close as possible to a target distribution, thus minimising the KLD given in Equation (12). The following performance indicators can be established according to Equations (12)- (14): where, for any τ ∈ k 0 , k 1 , ..., k n , the recursive form of Equation (15) can be obtained according to the dynamic programming method as follows: Thus, the optimal control law c * (u k−1 |x k−1 ) that minimises the performance index (16) can then be evaluated as follows based on FPD: where The proof of Equation (17) can be found in [12].

FPD Control Solution for the Weight Dynamic System
To apply the FPD, all the variables will be characterised as stochastic distributions. Based on Equation (10), the weight dynamic distribution is given as follows with µ k as the mean and Z k as the covariance: where where V k is the target weight of the system distribution at instant k and R k is its ideal covariance. Note that V k is dynamic and time-varying, so as to better simulate the real situation. The distribution of the ideal controller can also be formulated as where Γ is the ideal covariance of the controller and θ k−1 is the ideal mean of the optimal control signal at each moment of the controller, which can be evaluated as follows: Therefore, given the system output distribution (18) and the target distribution (19) and (20), following the general PFD scheme Equation (17), the optimal distribution of the tracking controller is given by the following theorem.

Theorem 1.
Under the PDF (18) describing the dynamic weights of the system, the optimal controller distribution for minimising the performance index function (17) is given as follows: and we establish the following performance indicator function: where where Equation (24) is the Riccati expression of the performance index, S k is the quadratic Riccati equation, P k is the linear term of the Riccati equation, and q k−1 stands for some normal numbers that do not contribute to the controller, thus omitted here. In addition,ū k−1 is the mean of the optimal controller, Γ k−1 stands for the covariance, and K k−1 is the controller feedback gain. Moreover, d k−1 is a linear shift caused by the considered tracking control problem.
Proof. The proof of this theorem can be evaluated following the same procedure in our previous publication [12], thus omitted here. (17) is the general solution to the fully probabilistic control design irrespective of the type of distribution describing the system dynamics or whether the system is linear or nonlinear. It can be derived numerically or analytically. The specific solution of the FPD given by Theorem 1 is the analytical solution for this specific case.

Remark 1. The FPD algorithm given by Equation
The proposed control framework for the B-spline modelled PDF shaping using the FPD method is summarised by Algorithm 1.

Algorithm 1:
Tracking control framework with output probability density function.
Input: Target PDF γ g (y, k) with time dynamics Output: PDF 1 Choose n B-spline basis functions; 2 Model the PDF, and obtain the weight dynamics using B-spline models as Equation (10); 3 Initialise: u 0 ← m, x 0 ← m, S k ← m, P k ← m, γ 0 ← C 0 (y)x 0 + L(y), and m here denotes random generation; 4 for k = 0 do 5 Model the target PDF, and obtain the target weight distribution using the B-spline model inEquation (19); 6 Evaluate S k , P k following Equation (25); 7 According to Equation (22), formulate a randomised optimal controller by calculating K k , d k , using the obtained S k , P k ;

8
Update the new weight distribution as (18); 9 Obtain the PDF using the updated weight distribution x k+1 and B-spline model (8); 10 k = k + 1;
where j is the order of nodes, n j is the total number of nodes of the RBF basis function, c j and b j are the centre value and width, and variable y ∈ [a, b]. The same as the B-spline model, four basis functions of the RBF model were chosen to fit the target PDF curve.
The simulation result is shown in Figures 3 and 4. Figure 3 shows the fitting effect of the B-spline basis function and RBF basis function on the curve, where the red real line is the fitting curve of the B-spline model, the green real line is the fitting curve of the RBF model, and the black dotted line is the target curve. Figure 4 demonstrates the fitting errors of the two models, where the red line is the fitting error of the B-spline model and the green solid line is the fitting error of the RBF model. From Figure 3, it can be seen that, for the same number of basis functions, the B-spline model has a better fitting performance than the RBF model. It can also be proven by Figure 4, which shows clearly that the B-spline model has a much smaller fitting error than the RBF model. Therefore, the result proves that the B-spline model is more suitable for PDF fitting when the number of basis functions is low. As a state space model, the weight dynamic system from the approximation model generally has a lower number of basis functions, which makes the B-spline model more suitable in such cases.  To verify the control effectiveness of the proposed control algorithm, the following example, which uses the same four basis functions (26), is given as below. Due to the existence of the constraint condition (2), only three corresponding weights will be required out of four B-spline basis functions. The fourth weight is linearly related to the first three weights so that the model order is reduced to three. The coefficient matrix of the system model is where G is the state weight matrix, H is the control matrix, D is the random weighting matrix of the noise term, and E k is the Gaussian noise. The target weight is given by V g = 0.4229 0.1217 0.1487 T for the first 100 steps and V g = 0.5908 0.1701 0.2077 T for the remaining steps. In addition, the system initial weight state is x 0 = 0.2673 0.0969 0.1897 T ; the covariance of the noise is subjected to the distribution given by E k N(0, 0.004); the ideal state covariance R was chosen as R = diag 0.4 0.004 10 .
In order to verify the tracking performance of the algorithm proposed in this paper, the results were compared with the conventional LQR algorithm. The following simulation results can be obtained following Algorithm 1. The state feedback gain matrix K = −0.2544 −2.2287 0.535 is calculated according to Equation (23). The state feedback gain matrix K L = 0.2814 −0.9455 0.3609 is calculated according to LQR algorithm. Figure 5 shows the four weights of the FPD algorithm and LQR algorithm and their target reference curves, respectively. More specifically, in Figure 5, the red line is the weighttracking curve under the FPD algorithm, while the blue line represents the weight-tracking curve under the LQR algorithm. Note that the weights V 1 , V 2 , and V 3 are the controlled states, while the weight V 4 has a linear relationship with the other three weights, which is not included as the controlled state due to the existence of the constraints. Therefore, the weight V 4 is represented by a dotted line. From Figure 5, it can be seen that, even with the effect of the multiplicative noises, both the FPD algorithm and the LQR algorithm can successfully track the given references and manage to keep tracking the new reference values when the target changes at the 100th step. Compared with the LQR algorithm, we can see that the FPD algorithm shows a better tracking effect and smaller tracking error under the effect of multiplicative noise. Figure 6 indicates the system control inputs, where the red line is the control input curve under the FPD algorithm and the blue line is the control input curve under the LQR algorithm. The results of Figure 6 show that, due to the change in the shape of the target, the input of the control target will also change synchronously. Compared with the LQR algorithm, the input of the FPD algorithm contains less randomness under multiplicative noise. Figure 7 shows the iterative process of the two algorithms controlling the output PDF curve. Compared with the results of the system under the action of the two algorithms, the FPD algorithm has a smaller error and a smoother tracking process from the initial curve to the target curve. When the target curve changes, the system control output curve will still track the new target. Therefore, it can be concluded that, compared with the LQR algorithm, the extended FPD algorithm can show a better tracking effect in the dynamic random B-spline function-based weight dynamical system with multiplicative noise and time-varying target.

Conclusions
This paper investigated the output distribution shape-tracking problem for a class of stochastic distribution systems under the B-spline model framework. The linear B-spline model was implemented to fit the PDF curve and simplified the PDF shaping problem to a dynamic weight-altering problem. At the same time, the randomness and the model error of the dynamic weight system were characterised as state-dependent noises, which more realistically simulate the actual complex system. In addition, the target distribution shape of the system was changed from a fixed shape to a time-varying shape, which makes the control goal more realistic and convenient to operate in the actual working process. The extended FPD algorithm was then implemented to achieve the time-varying tracking goal under the effect of the multiplicative noises. Moreover, the implementation procedure was provided step by step. Finally, the simulation results were obtained following the implementation procedure. Meanwhile, to make the experimental results more convincing, the conventional LQR algorithm was also added for comparison. As a result, the simulation showed that the proposed control framework can make the dynamic weights track their time-varying targets under the effect of multiplicative noises. Compared with the LQR algorithm, the proposed extended FPD algorithm has smaller tracking errors and stronger robustness to the multiplicative noises in the model. In addition, in order to better illustrate the suitability of the B-spline model selected in this paper, the RBF model was also added for comparison. By analysing the simulation results, it was concluded that the B-spline model has a smaller fitting error than the RBF model when the number of basis functions is small.
Overall, the method proposed in this paper can effectively solve the time-varying distribution shape-tracking problem for a class of stochastic systems. In practice, the application of the B-spline framework can productively model the non-Gaussian PDF, where most of the practical systems contain non-Gaussian variables. Moreover, using stochastic distributions to characterise the dynamics of weights can fully represent the stochastic properties. In addition, the proposed controller structure is simple and easy to follow for practical application. Thus, the proposed control method can further promote the application of PDF control. In future work, we will consider the application of the proposed framework to real-world systems.