Online Inverse Optimal Control for Time-Varying Cost Weights

Inverse optimal control is a method for recovering the cost function underlying expert demonstrations of an optimal control problem. Most studies on inverse optimal control have built the unknown cost function as a linear combination of given features with unknown cost weights, which are generally assumed to be constant. However, in many real-world applications, the cost weights may vary over time. In this study, we propose an adaptive online inverse optimal control approach based on a neural-network approximation to address the challenge of recovering time-varying cost weights. We conduct a well-posedness analysis of the problem and derive a condition on the adaptive goal under which the neural-network weights generated to achieve this goal are unique for the corresponding inverse optimal control problem. Furthermore, we propose an updating law for the weights of the neural network to ensure stable convergence of the solutions. Finally, simulation results for an example linear system demonstrate the effectiveness of the proposed strategy. The proposed method is applicable to a wide range of problems requiring real-time inverse optimal control calculations.


Introduction
The integration of biological principles with robotic technology heralds a new era of innovation, with a significant focus on applying optimal control and optimization methods to analyze animal motion. This approach guides the development of robotic movement, as is evident in [1], which explores the intricate control systems of mammalian locomotion. Such research underpins the development of robots that emulate the efficiency and adaptability found in nature.
These advancements in understanding animal locomotion through optimal control methods set the stage for the relevance of inverse optimal control (IOC). IOC offers a retrospective analysis of expert movements (human or animal) to infer the underlying cost functions optimized in these motions. This methodology is crucial when direct modeling of optimal strategies is complex or unknown.
The use of inverse optimal control (IOC) to identify suitable cost functions from the observable control inputs and state trajectories of experts is becoming increasingly important. Several successful applications of IOC in estimating the cost weights of multiple features have been reported. For example, the knowledge and expertise of specialists can be categorized and exploited in several fields, including robot control and autonomous driving. The authors of [2], who employed game theory in tailoring robot-human interactions, proposed a method for estimating the human cost function and selecting the robot's cost function based on the results, leading to a Nash equilibrium in human-robot interactions. The authors of [3] applied IOC to analyze taxi drivers' route choices. To investigate the cost combination of human motion, the authors of [4] conducted an experiment using IOC techniques to study human motion during the performance of a goal-achieving task using one arm. Additionally, the authors of [5] represented the learning of biological behavior as an inverse linear quadratic regulator (LQR) problem and proposed adaptive methods for modeling and analyzing human reach-to-grasp behavior. Furthermore, the authors of [6] employed an IOC method to segment human movement.
Linear quadratic regulation is a common optimal control method for linear systems. In the 1960s and 1970s, numerous researchers offered solutions to the inverse LQR problem [7][8][9]. Recently, the theory of linear matrix inequalities was employed to solve the inverse LQR problem [5,10,11]. Regarding the application of IOC methods to nonlinear systems, several approaches involving methods such as passivity-based condition monitoring [12] or robust design [13] have been reported.
Recent studies in the field of IOC have demonstrated significant advancements. The authors of [14] provided a comprehensive review of the methodologies and applications of inverse optimization, highlighting its growing importance across various domains. The authors of [15] introduced a novel method for sequential calculation in discrete-time systems, enhancing the IOC model's efficacy under noisy data conditions. The authors of [16] employed a multi-objective IOC approach to explore motor control objectives in human locomotion, which has implications for predictive simulations in rehabilitation technology. Furthermore, the authors of [17] delved into cost uniqueness in quadratic costs and control-affine systems, shedding light on the non-uniqueness cases in IOC. Moreover, a recent thesis [18] introduced a collage-based approach for solving unique inverse optimal control problems, leveraging the Collage method for ODE inverse problems in conjunction with Pontryagin's maximum principle.
Feature-based IOC methods, which model the cost function as a linear combination of various feature functions with unknown weights, have gained acclaim in recent years [19][20][21][22]. However, it may be difficult to apply these methods to the analysis of complex, long-term behaviors using simple feature functions, e.g., analyzing human jumping [23]. To address this challenge, the authors of [24] proposed a technique for recovering phase-dependent weights that switch at unknown phase-transition points. This method employs a moving window along the observed trajectory to identify the phase-transition points, with the window length determined by a recovery matrix aimed at minimizing the number of observations required for successful cost-weight recovery. Although this method is effective in estimating phase-dependent cost weights, its complex computational requirements limit its use in real-time applications, such as human-robot collaboration tasks. Additionally, in this method, the cost weights in each phase are assumed to be fixed, which may not be generalizable. For example, the human jump motion in [23] was analyzed using time-varying, continuous cost weights.
Overall, IOC still has several shortcomings that need to be addressed, particularly when applied to approximating complex, multi-phase, continuous cost functions in real time. In this paper, we propose a method for recovering the time-varying cost weights in the IOC problem for linear continuous systems using neural networks. Our approach involves constructing an auxiliary estimation system that closely approximates the behavior of the original system, followed by determining the necessary conditions for tuning the weights of the neurons in the neural network to obtain a unique solution for the IOC problem. We demonstrate that the unique solution corresponds to achieving a zero error between the original system state and the auxiliary estimated system state, as well as a zero error between the original costate and the integral of the estimated costate. Based on this analysis, we develop two neural-network frameworks: one for approximating the cost-weight function and the other for addressing the error introduced by the auxiliary estimation system. Additionally, we discuss the necessary requirements for the feature functions to ensure the well-posedness of our online IOC method. Finally, we validate the effectiveness of our method through simulations. This work makes several significant contributions:
• We provide a solution for the recovery of time-varying cost weights, essential for analyzing real-world animal or human motion.
• Our method operates online, making it suitable for a broad spectrum of real-time calculation problems. This contrasts with previous online IOC methods, which mainly focused on constant cost weights for discrete-time system control.

• We introduce a neural network and state-observer-based framework for online verification and refinement of estimated cost weights. This innovation addresses the critical need for solution uniqueness and robustness against data noise in IOC applications.

System Description and Problem Statement
Consider an object's system dynamics formulated as

ẋ = Ax + Bu    (1)

where A ∈ R^{n×n} and B ∈ R^{n×m} are two time-invariant matrices, x ∈ R^n represents the system states, and u = [u_1, ..., u_m]^T ∈ R^m denotes the control input of the system [25].
The classic optimal control problem is to design the optimal control input u*(t), and thereby generate a sequence of optimal states x*(t), that minimizes the following cost function subject to dynamics (1):

J = ∫_{t_0}^{t_f} L_0(x, u) dt    (2)

(Superscript * stands for the optimal condition.) Here, L_0 has the following form:

L_0 = q^T F(x) + r^T G(u)    (3)

where q = [q_1, q_2, ..., q_{n_f}]^T ∈ R^{n_f} and r = [r_1, r_2, ..., r_m]^T ∈ R^m (with r_i > 0 for all i) represent the cost-weight vectors, F(x) is referred to as the general union feature vector with respect to x, and G(u) indicates the feature vector that is only relevant to the control input u [26]. n_f denotes the number of features, which may differ from the dimension of the system states.
For simplicity, we assume that r^T G(u) = u^T R u, where R is an unknown positive-definite matrix.
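As a concrete illustration (with hypothetical feature choices, not the paper's simulation setup), the running cost L_0 = q^T F(x) + r^T G(u) of (3) can be evaluated numerically; with quadratic features, r^T G(u) reduces to u^T R u with R = diag(r), matching the assumption above.

```python
import numpy as np

def running_cost(q, F, r, G, x, u):
    # L0 = q^T F(x) + r^T G(u): weighted sum of state and control features
    return float(np.dot(q, F(x)) + np.dot(r, G(u)))

# Hypothetical quadratic features: F_i(x) = x_i^2 and G_j(u) = u_j^2,
# so r^T G(u) = u^T diag(r) u.
F = lambda x: x**2
G = lambda u: u**2

q = np.array([1.0, 2.0])   # cost weights on the state features
r = np.array([1.0, 1.0])   # cost weights on the control features (all r_i > 0)
x = np.array([0.5, -1.0])
u = np.array([0.2, 0.0])

L0 = running_cost(q, F, r, G, x, u)   # 1*0.25 + 2*1.0 + 1*0.04 + 1*0.0 = 2.29
```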

Maximum Principle in Forward Optimal Control
To minimize the cost function (2) with L_0 defined in (3), there exists a costate variable vector λ that satisfies Pontryagin's maximum principle as follows:

λ̇ = −F_x^T q − A^T λ    (4)
∂H/∂u = 0    (5)

where F_x = ∂F(x)/∂x and λ ∈ R^n denotes the costate variable vector. These two equations are derived from Pontryagin's maximum principle by taking the partial derivatives of the Hamiltonian function defined by H(x, u, λ) = L_0 + λ^T(Ax + Bu), specifically λ̇ = −∂H/∂x and ∂H/∂u = 0. The initial value of λ is denoted λ_0.
The optimal control input u* of the system expressed by (1) is given as

u* = −R^{−1} B^T λ    (6)

where λ is unknown. Thus, using this optimal control input, we have

ẋ = Ax − Hλ    (7)

where H denotes the matrix H = B R^{−1} B^T. Notably, H is invertible provided that B has full row rank. In addition, since B is a bounded constant matrix, there exists a positive scalar δ_H such that H satisfies ||H|| ≤ δ_H. Additionally, differentiating (7) and substituting (4), the time derivatives of the system dynamics can be formulated as follows:

ẍ = Aẋ + H F_x^T q + H A^T λ,  with λ = H^{−1}(Ax − ẋ) from (7).    (8)
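The quantities in (6) and (7) are straightforward to compute numerically. The sketch below uses a hypothetical system with a square, full-rank B (so that H = B R^{-1} B^T is invertible) and an illustrative costate value; none of these numbers come from the paper.

```python
import numpy as np

# Hypothetical system with square, full-rank B, so H = B R^{-1} B^T is invertible.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.eye(2)
R = np.eye(2)                          # R = I, as adopted later in the paper

H = B @ np.linalg.inv(R) @ B.T         # H = B R^{-1} B^T
lam = np.array([0.3, -0.7])            # illustrative costate value

u_star = -np.linalg.inv(R) @ B.T @ lam # u* = -R^{-1} B^T lambda, Eq. (6)
x = np.array([1.0, 0.0])
x_dot = A @ x - H @ lam                # closed-loop dynamics, Eq. (7)
```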

Analysis of the IOC Problem
We assume that the system states x[t, t_f] and the control input u[t, t_f], which represent the time series of the system states and control inputs from time point t to t_f, constitute the solution to the optimal minimization of the cost function (2). In addition, we assume that the optimal system states and control input satisfy the boundary conditions ||x|| ≤ δ_x and ||u|| ≤ δ_u, where δ_x and δ_u are positive scalars. The objective of the IOC problem is to recover the unknown cost-weight vector q(t). IOC may, for example, be employed to analyze how different occasions affect the relative importance of certain human-motion feature functions. For such applications, a rigorous analysis of whether the derived cost weights can recreate the original data x[t, t_f], u[t, t_f] is required. To begin, we consider two problems:
• What happens when a different feature function is selected?
In previous studies, it was assumed that the cost-weight vector q is either a constant [19] or a step function with multiple phases [24]. These assumptions have been effective in recovering the cost weights used in the analysis of optimal control methods for a robot's motion control, such as analyzing the motion of a robot controlled by an LQR approach. However, it may be inappropriate to assume that the cost weights are constants or step functions when analyzing the complex behaviors of natural objects, such as human motion. In particular, deciding which feature function to adopt when evaluating the motion of natural objects can pose a challenge.

Proposition 1. Depending on the selection of feature functions F(x) for the IOC, an originally constant cost weight q may become a time-varying continuous function.
Proof. From (8), for the object's original feature function, we have

ẍ − Aẋ − H A^T H^{−1}(Ax − ẋ) = H F_{ox}^T q_o    (9)

where q_o denotes the original time-invariant cost-weight vector and F_{ox} denotes the partial derivative with respect to x of the original feature function. When we choose a different feature function F_n(x), the above equation becomes

ẍ − Aẋ − H A^T H^{−1}(Ax − ẋ) = H F_{nx}^T q_n    (10)

where F_{nx} denotes the partial derivative with respect to x of the newly selected feature function and q_n denotes the corresponding cost weights on F_{nx}. Thus, we have

F_{nx}^T q_n(t) = F_{ox}^T q_o.

From this equation, it follows that q_n may be a time-varying function when F_{ox} and F_{nx} are not equivalent, and since F_{ox} and F_{nx} are continuous functions, we can reasonably conclude that q_n is also a continuous function.
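Proposition 1 can be checked numerically in the scalar case. The sketch below uses hypothetical features F_o(x) = x^2 (original, with constant weight q_o) and F_n(x) = x^4 (new); matching F_{nx} q_n = F_{ox} q_o gives q_n = q_o * (2x) / (4x^3) = q_o / (2 x(t)^2), which varies along any non-constant trajectory.

```python
import numpy as np

# Scalar illustration of Proposition 1 with hypothetical features:
# F_o(x) = x^2 (constant weight q_o) and F_n(x) = x^4 (new feature).
q_o = 3.0
x_traj = np.array([2.0, 1.0, 0.5])   # samples of a decaying state trajectory

F_ox = 2.0 * x_traj                   # dF_o/dx along the trajectory
F_nx = 4.0 * x_traj**3                # dF_n/dx along the trajectory
q_n = q_o * F_ox / F_nx               # equivalent weight on the new feature

# q_n = q_o / (2 x^2) = [0.375, 1.5, 6.0]: a constant q_o becomes time-varying.
```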
Based on this proposition, it is crucial to expand the definition of cost weights to include time-varying values, as this will facilitate a more accurate analysis of the motion of increasingly complex natural objects. Despite the need for time-varying cost-weight recovery in many applications, it has received minimal research attention thus far.

• Whether the given set x[t, t_f], u[t, t_f] in the IOC problem has a unique solution {q(t), r}.
The uniqueness of the solution to the IOC problem when the cost weights are constant has been discussed in many studies [15,17,18,22]. In this work, we determine whether there is still a unique solution to the IOC problem when q is a time-varying function.
From (10), we can find different continuous functions q(t) satisfying the equation for different values of R (and hence different values of H). This implies that if q is treated as a time-varying function, the set {q(t), r} will not have a unique solution.
Therefore, when we consider the unique solution of the IOC problem with a time-varying function q(t), it is necessary to introduce additional conditions to ensure that the IOC problem has a unique solution and that the resulting solution is meaningful.
In this study, for simplicity, we assume that R = I [27,28], where I is the identity matrix. In actual optimal control cost functions, when we focus on reducing one of the control inputs u_i, the convergence of the i-th system state x_i related to u_i will also be affected. Consequently, the final control result shows that the change in each state of the system is influenced not solely by the chosen cost weights q(t) but also by R. In the IOC problem, setting R = I allows the effect of different weights on different control inputs in the original system to be reflected in the current estimate of q(t). This enables us to view the estimated weights on the system states as representing the relative importance of each state in the system's dynamic evolution, without considering the impact of the control input on these weights.
Based on our conclusion that q may be time-varying when different feature functions are chosen and on the corresponding conditions under which a unique solution exists, we can define the IOC problem to be solved in this study as follows:

Problem 1 (Online Estimation of Time-Varying Cost Weights q(t)).
Given: (1) the measured system state x and control input u; (2) R = I.
Goal: estimate the time-varying q(t) online utilizing the given x and u.

Adaptive Observer-Based Neural Network Approximation of Time-Varying Cost Weights
In this study, we estimate time-varying cost-weight functions online using an observer-based adaptive neural-network estimation approach, as opposed to earlier studies that required a large number of time series of x and u to recover fixed cost weights offline.

Construction of the Observer
Let q̂(t) ∈ R^{n_f} denote the estimate of q(t). Using q̂, we define an estimate λ̂ of the associated costate variable λ through the costate dynamics (4) evaluated with the estimated quantities, where F̂_x = ∂F(x̂)/∂x̂ denotes the partial derivative of the feature functions with respect to the estimated system state x̂, which is obtained by inserting λ̂ into (7):

x̂̇ = Ax̂ − Hλ̂

where the initial state x̂_0 of this system is selected as x̂_0 = x_0. Thus, compared with the original system, the error generated by the estimation system can be expressed as

x̃̇ = Ax̃ − Hλ̃

where λ̃ = λ − λ̂ and x̃ = x − x̂. Here, the feature function is selected such that its partial derivative with respect to x is bounded. Moreover, from (6) and (7), λ can be calculated as

λ = H^{−1}(Ax − ẋ).
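A minimal Euler-integration sketch of the auxiliary estimator x̂' = A x̂ − H λ̂, initialized at x̂(0) = x(0). The matrices are illustrative assumptions, and the zero costate signal is a placeholder for the estimate that the actual method would supply.

```python
import numpy as np

# Euler sketch of the auxiliary estimator: x_hat' = A x_hat - H lam_hat,
# with x_hat(0) = x(0). Matrices are illustrative; lam_hat is a placeholder
# zero signal standing in for the costate estimate.
A = np.array([[0.0, 1.0],
              [0.0, -1.0]])
H = np.eye(2)
dt, steps = 0.01, 100

x_hat = np.array([1.0, 0.0])           # x_hat(0) = x(0)
for _ in range(steps):
    lam_hat = np.zeros(2)              # placeholder costate estimate
    x_hat = x_hat + dt * (A @ x_hat - H @ lam_hat)
```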

Neural Network-Based Approximation of Time Varying Cost Weights
In this section, a neural-network-based cost-weight approximation algorithm is proposed. To approximate the time-varying vector q, we adopt a neural network whose input is u_I = [x_0^T, u^T]^T, where x_0 denotes the initial state of the system (1). Based on this, we assume that a time-invariant weight matrix W ∈ R^{l×n_f} exists that satisfies

q(t) = W^T φ(u_I) + ε_1(u_I)

where φ(u_I) denotes the activation function vector and ε_1(u_I) denotes the structural approximation error of the neural network. In addition, the selected activation function and its partial derivative satisfy the boundary conditions ||φ(u_I)|| ≤ δ_p and ||∂φ(u_I)/∂u_I|| ≤ δ_pu, where δ_p and δ_pu represent two positive scalars. Additionally, ||ε_1(u_I)|| ≤ ε_n, where ε_n is a positive scalar.
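A sketch of evaluating the network output W^T φ(u_I) with Gaussian activations, which satisfy the boundedness conditions above (each φ_i lies in (0, 1]). The dimensions, centers, and weights are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Sketch of q = W^T phi(u_I) with Gaussian activations; each phi_i lies in
# (0, 1], so ||phi|| and its gradient are bounded as required.
rng = np.random.default_rng(0)
l, n_f, dim = 5, 2, 3                  # nodes, features, input dimension
centers = rng.uniform(-1.0, 1.0, (l, dim))
nu = 1.0

def phi(u_I):
    d2 = ((u_I - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / nu)

W = rng.standard_normal((l, n_f))      # stand-in for the ideal weights
u_I = np.array([0.1, -0.2, 0.3])       # stacked input [x_0; u] (illustrative)
q_approx = W.T @ phi(u_I)              # network approximation of q(t)
```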
The estimate of vector q is constructed as

q̂ = Ŵ^T φ(u_I)

where Ŵ denotes the estimate of W. In this paper, we combine two estimators Ŵ_1 and Ŵ_2 to estimate W, as shown in Section 4.1. Before presenting the details of the estimators, we first discuss the necessary conditions for the estimation.
Based on the setting of the estimator Ŵ, the error of estimating q can be expressed as

q̃ = q − q̂ = W̃^T φ(u_I) + ε_1(u_I)

where W̃ = W − Ŵ denotes the error of estimating W. Substituting q̃ into (16) yields the error dynamics used below. To comprehend the necessary condition for the convergence of the estimation error W̃, we define uniform ultimate boundedness (UUB) below.

Definition 1. A time-varying signal σ(t) is said to be UUB if there exists a compact set S ⊂ R^n such that for every σ(t_0) ∈ S, there exist a bound µ ≥ 0 and a time T such that ||σ(t)|| ≤ µ for all t ≥ t_0 + T.
where t_1 ≤ t_i ≤ t_f, and every term of C satisfies the persistent excitation (PE) condition defined below, with lower bound β_j, where β_j is a positive value.
Proof. From (21), since ∫ s dt → 0 and s → 0 reach a steady state and A_r is a constant, we can obtain

|| ∫_{t_{i−1}}^{t_i} s dt || ≤ δ_si

where δ_si denotes a small positive scalar. Additionally, with both ε_1(u_I) and T_x being bounded, this leads to

|| ∫ T_x ε_1(u_I) dt || ≤ δ_Tε

where δ_Tε denotes a small positive scalar. The term ∫ T_x ε_1(u_I) dt captures the effect of the structural error of the neural network on the state s. Since T_x is bounded, when the neural network approximates the cost-weight function adequately, the value of ε_1(u_I) decreases, which in turn reduces the overall integral value. In other words, a well-selected neural-network structure with a good approximation of the cost-weight function will produce a small structural error and, therefore, a small overall integral value. Similarly, we can obtain an analogous relation for the duration [t_0, t_1]. From (27) and (28), a combined bound over the whole interval follows. Furthermore, considering the bounds δ_x and δ_q of x and q, respectively, the corresponding inequality holds. In this case, when dŴ/dt approaches zero, a linear relation in W̃ emerges. Based on this relation, with C defined in (22) being full row rank, and from (23), W̃ is UUB. Notably, β_j evaluates the lower bound of the norm of the PE integral; it can increase when the data x cause the norm of the integral to deviate significantly from zero. The sizes of δ_ζ and δ_si are related to the minimization of s and ∫_{t_0}^{t_i} s dt, and the size of δ_Tε is related to the approximation ability of the chosen neural network. The bound of W̃ after t_1 can be minimized by the excited x, successfully minimizing s and ∫_{t_0}^{t_i} s dt while appropriately designing the structure of the neural network.

Construction of the Neural Network
As shown in Lemma 1, the convergence of ∫_{t_0}^{t} s dτ is essential for the convergence of W̃ to 0. Therefore, it is necessary to incorporate this consideration into the approximation design.
First, we divide the estimate of the neural-network weights into two parts:

q̂ = q̂_1 + q̂_2  and  Ŵ = Ŵ_1 + Ŵ_2

where q̂_1 = Ŵ_1^T φ(u_I) and q̂_2 = Ŵ_2^T φ(u_I). The necessity of employing two distinct estimators, Ŵ_1 and Ŵ_2, is rooted in their specialized roles in minimizing the tracking error s. This dual-estimator approach ensures that q̂(t) closely aligns with the desired trajectory q(t). While the adaptive tuning of Ŵ_1 is primarily aimed at steering s toward zero, the inherent residual errors of its adaptive process necessitate the deployment of Ŵ_2 for error compensation and enhanced accuracy in tracking the ideal cost weight q(t). To gain a deeper understanding of this system, we begin by examining the error dynamics, which form a fundamental basis for the subsequent detailed exploration of the tuning laws for each estimator.
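The combination q̂ = q̂_1 + q̂_2 = (Ŵ_1 + Ŵ_2)^T φ(u_I) can be sketched as follows; the weight and activation values are placeholders for illustration, not outputs of the tuning law.

```python
import numpy as np

# Combining the two estimators:
# q_hat = q1_hat + q2_hat = (W1_hat + W2_hat)^T phi(u_I).
l, n_f = 4, 2
W1_hat = np.full((l, n_f), 0.1)    # estimator driving the tracking error s to zero
W2_hat = np.full((l, n_f), -0.05)  # estimator compensating the residual error
phi_val = np.full(l, 0.5)          # activation vector phi(u_I) (illustrative)

q1_hat = W1_hat.T @ phi_val
q2_hat = W2_hat.T @ phi_val
q_hat = q1_hat + q2_hat            # equals (W1_hat + W2_hat)^T phi(u_I)
```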
The state equation describing the error dynamics can be obtained from the preceding definitions. Further, to effectively minimize the residual tracking error, we suppose that an ideal time-invariant weight matrix W_2 ∈ R^{l×n_f} exists that guarantees the corresponding compensation. The estimation error of the neural network and the error signal e can then be represented accordingly, and (41) becomes the error system used below.

Tuning Law of the Neural Network for the Estimation of q(t)

An updating law for a neural network that estimates q(t) is presented in Theorem 1, based on the error-system dynamics derived in (45).
Theorem 1. If we choose the updating laws for the neural-network weights Ŵ_1 and Ŵ_2 as shown in (46), where Γ_1, Γ_2, and k_e are positive scalar constants, then the state s, the integral ∫_{t_0}^{t} s dτ, and the error e will be UUB.
In addition, if there exist positive constants t_δ, β_1, β_2, β_3, and β_4 such that the inequalities in (47) are satisfied for all initial times t_0, then the signals W̃_1 and W̃_2 will also be UUB.
Proof. A proof of this theorem can be found in Appendix A.
Applying (46) results in s, ∫_{t_0}^{t} s dτ, and e being UUB, as shown in Theorem 1. Additionally, (46) shows that when s and e decrease, dŴ_1/dt and dŴ_2/dt decrease as well, resulting in a decrease in dŴ/dt = dŴ_1/dt + dŴ_2/dt. At this point, as stated in Lemma 1, if matrix C (defined in Lemma 1) is full row rank, then W̃ = W̃_1 + W̃_2 will also be UUB. Thus, the solution to the IOC problem can be derived by applying (38).
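The paper's updating law (46) is not reproduced here; as a generic stand-in, a gradient-style adaptive update illustrates how the gains Γ_1 and Γ_2 scale the update speed of the two estimators over one Euler step. The signals and dimensions are illustrative assumptions.

```python
import numpy as np

# Generic gradient-style stand-in (NOT the paper's law (46), which also
# involves k_e): it only shows the role of Gamma_1 and Gamma_2 as gains.
Gamma1, Gamma2, dt = 1.0, 1.0, 0.01
l, n_f = 4, 2
W1_hat = np.zeros((l, n_f))
W2_hat = np.zeros((l, n_f))

phi_val = np.full(l, 0.5)              # activation vector (illustrative)
s = np.array([0.2, -0.1])              # tracking-error signal (illustrative)
e = np.array([0.05, 0.0])              # auxiliary error signal (illustrative)

W1_hat += dt * Gamma1 * np.outer(phi_val, s)   # larger Gamma_1 -> faster update
W2_hat += dt * Gamma2 * np.outer(phi_val, e)
```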

Basic Simulation Conditions
To verify the effectiveness of our method, we performed simulations using a sample linear system controlled by the optimal control method, with the original cost-weight matrix R selected in two cases.
The cost function selected in these simulations is formulated as

J = ∫ (θ^T Q(t) θ + u^T R u) dt

where all elements of θ satisfy |θ_i| ≤ θ_rl, Q(t) = diag(q_1(t), q_2(t)) is the continuous time-varying cost-weight matrix on the system states θ, and R = diag(r_1, r_2) represents the cost weights on the control inputs. Moreover, in our simulations, we select 0 as the initial value of all elements of both Ŵ_1 and Ŵ_2. The activation function φ(u_I) was selected as φ(u_I) = [φ_1(u_I), ..., φ_i(u_I), ..., φ_l(u_I)]^T with φ_i(u_I) designed as a Gaussian function of ||u_I − ψ_i||, where ν denotes a positive scalar and ψ_i denotes the center of the respective activation function. We initialized the activation-function centers on a four-dimensional grid to match the dimension of u_I, ensuring a uniform distribution across the input space and enhancing network adaptability.
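Initializing the activation-function centers ψ_i on a uniform four-dimensional grid can be sketched as below. The grid resolution (2 points per axis, giving 16 centers) and the range [-1, 1] are illustrative assumptions; the paper's 49-node network presumably uses a different resolution.

```python
import numpy as np
from itertools import product

# Centers psi_i on a uniform grid over the 4-dimensional input [x_0; u].
points_per_axis = np.linspace(-1.0, 1.0, 2)
centers = np.array(list(product(points_per_axis, repeat=4)))  # 2^4 = 16 centers

nu = 1.0
def phi(u_I):
    # Gaussian activation of the distance to each center
    return np.exp(-((u_I - centers) ** 2).sum(axis=1) / nu)

vals = phi(np.zeros(4))    # all 16 corner centers are equidistant from origin
```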
The overall implementation for recovering the time-varying cost weights is shown in Algorithm 1.

• In the first case, we apply the optimal control of the sample system with the cost weights on θ set to q_1(t) = 1 + cos(t) and q_2(t) = 2 + sin(t). The proposed IOC method is employed online to estimate the cost weights, with simultaneous online recovery of the original system trajectory. Parameters Γ_1 and Γ_2 in the updating law are set to Γ_1 = 1 and Γ_2 = 1, respectively. Parameters k and k_p are set to k = 50 and k_p = 625, respectively. The initial values of Ŵ_1 and Ŵ_2 are set to matrices with all elements equal to zero. The original r_1 and r_2 are set to r_1 = 1 and r_2 = 1, respectively. The simulation also uses 49 nodes in the neural network.

• In the second case, we perform the simulation of our IOC method with the original r_1 and r_2 set to r_1 = 3 and r_2 = 4, respectively. All other simulation settings are the same as in the first case.
Similar to the simulation sections in previous works [6,24], we use the control input from the simulation, ignoring the measurement issues with the control input and the measurement errors that may occur in real-world applications. This allows us to evaluate purely the performance of our method in solving the IOC problem. In actual applications, the control input can be calculated by substituting the measured θ into (48), as described in [24].

Results
The simulation results are shown in the figures below. In Figure 1, the blue solid line represents the original variation in the cost weights, whereas the gray solid line represents the estimated cost weights. After a brief period of oscillation at the initial time, our method accurately recovers the original cost weights when R = I. Notably, as in other adaptive control methods and adaptive neural-network-based control methods, the initial oscillation results from the adaptive initialization of the weights in (46) due to the large initial errors in W̃_1 and W̃_2. In Figures 3-5, we show the results for the error e, the state s, and ∫_{t_0}^{t} s dτ in the two cases. The blue lines show the results of the first case, whereas the gray dotted lines show the results of the second case. From the figures, we can observe that all the values effectively decrease to a low range during the simulation, and most importantly, in the second case, the different selections
of R do not affect the convergence of these values. This demonstrates the effectiveness of our method and highlights that, even with different values of R, the recovered cost weights are still feasible solutions to the IOC problem, as they can be utilized to regenerate a similar system trajectory and control inputs.
In (46), Γ_1 and Γ_2 decrease the error by regulating the updating speed of the estimated values. Adjusting these two terms may successfully reduce the impact of data noise to a certain degree. Their roles are similar to that of a low-pass filter's time constant. For example, in the setting of the first case, when noise exists, x ∼ N(0, 10^{-1}) and u ∼ N(0, 10^{-4}), the simulation results show that different sets of Γ_1 and Γ_2 (e.g., Γ_1 = 10, Γ_2 = 10; Γ_1 = 1, Γ_2 = 1) can significantly influence the noise-reduction performance.
As shown in Figure 6, while relatively small values of Γ_1 and Γ_2 may result in a low convergence rate, they effectively reduce the impact of data noise. Our method demonstrates robustness against noise by allowing for the adjustment of parameters Γ_1 and Γ_2.
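The low-pass-filter interpretation of the gains can be illustrated with a scalar estimate driven toward a noisy constant target: a smaller gain converges more slowly but yields a smaller steady-state variance. This is a schematic analogy, not the paper's update law.

```python
import numpy as np

# Scalar analogy for the low-pass role of the gains: an estimate driven toward
# a noisy constant target; smaller gain -> slower convergence, less noise.
rng = np.random.default_rng(1)
target, dt, steps = 1.0, 0.01, 20000
noise = rng.normal(0.0, 1.0, steps)

def steady_state_variance(gamma):
    est, tail = 0.0, []
    for k in range(steps):
        est += dt * gamma * ((target + noise[k]) - est)
        if k > steps // 2:          # discard the transient, keep steady state
            tail.append(est)
    return float(np.var(tail))

var_small_gain = steady_state_variance(1.0)
var_large_gain = steady_state_variance(10.0)   # noisier steady-state estimate
```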

Calculation Complexity and Real-Time Calculation
The proposed algorithm has low computational complexity, as it involves only dot products between matrices and vectors and the summation of vectors. Additionally, it does not require any iterative or optimization calculations. This makes it an efficient solution for real-time calculation. In fact, our simulation shows that a single iteration of the algorithm with the case 1 settings takes only approximately 0.23 ms in Matlab 2016b to complete the online IOC calculation, which is fast enough to meet real-time calculation requirements.
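A rough timing sketch of one update step (an outer-product weight update plus a matrix-vector product), averaged over many repetitions, in the spirit of the reported per-iteration figure. The 49 nodes match the simulated network; the update form itself is a placeholder.

```python
import time
import numpy as np

# Time one update step: only dot products and vector sums, no optimization.
l, n_f = 49, 2
W_hat = np.zeros((l, n_f))
phi_val = np.random.default_rng(2).random(l)
s = np.array([0.1, -0.2])

reps = 1000
t0 = time.perf_counter()
for _ in range(reps):
    W_hat += 1e-3 * np.outer(phi_val, s)   # one adaptive update step
    q_hat = W_hat.T @ phi_val              # one cost-weight evaluation
per_step_ms = (time.perf_counter() - t0) / reps * 1e3
```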

Advantages of Using R = I
The simulation results suggest that one of the key advantages of setting R to the constant identity I is that it effectively consolidates the impact of the cost weights on state convergence, which would otherwise be influenced by different settings of R, into the estimated value of q(t). This allows for a comprehensive evaluation of the system-state convergence, as it depends only on q(t), without needing to account for additional considerations. Furthermore, by maintaining a consistent value of R = I, it is possible to standardize the analysis of the same motion across multiple agents, which is crucial for various applications.

Conclusions
In this paper, we proposed a neural-network-based method for recovering the time-varying cost weights in the IOC problem for linear continuous systems. Our approach involved constructing an auxiliary estimation system that closely approximates the behavior of the original system, followed by determining the necessary conditions for tuning the weights of the neurons in the neural network to obtain a unique solution for the IOC problem. We discussed the necessary requirements for the preceding settings to ensure the well-posedness of our online IOC method. We showed that the unique solution corresponds to achieving a nearly zero error between the original system state and the auxiliary estimated system state, as well as a nearly zero error between the original costate and the integral of the estimated costate. Based on this analysis, we developed two neural-network frameworks: one for approximating the cost-weight function and the other for addressing the error introduced by the auxiliary estimation system and the related terms. Finally, we validated the effectiveness of our method through simulations, highlighting its ability to recover time-varying cost weights and its robustness against different original choices of R. Overall, our method represents a significant advancement in the field of online IOC, and it is applicable to a wide range of problems requiring real-time IOC calculations.

Figure 2. Impact of selecting R = I on the estimation results when the original R value is arbitrary. The solid blue line represents the original time-varying cost weights, whereas the dashed gray line represents the final estimated values. Although the estimated values differ from the original values, the general trend of the changes is preserved. In addition, the gray line represents the mutual weights in the dynamics of the system state, whereas the original weights among the control inputs are reflected in the current estimate of q(t). From the figure, we can observe that the bottom blue and gray lines represent the values of the original and estimated q_2, respectively. Evidently, the blue line for q_2 is larger than that for q_1 from 4.8 s to 5 s. Additionally, in the original settings, r_2 is 4, which confers greater importance to the decrease in u_2 than r_1 = 3 does, weakening the convergence of the θ_2 term associated with u_2. In our estimates, the dashed gray line for the estimated q_2, which also incorporates the impact of the original setting of R, is not greater than that of the estimated q_1 between 4.8 s and 5 s. This indicates that the convergence of θ_2 is weakened when the impact of the cost weights on the control input is taken into account. The dashed gray line thus reflects the actual situation more accurately than the blue line.

Discussion

Robustness of the Proposed Method to Noisy Data

Lemma 1. If the following conditions are satisfied, W̃ becomes UUB.