Learning Output Reference Model Tracking for Higher-Order Nonlinear Systems with Unknown Dynamics

## Abstract

## 1. Introduction

## 2. Output Model Reference Control for Unknown Dynamics Nonlinear Processes

#### 2.1. The Process

**Assumption 1 (A1).**

**Assumption 2 (A2).**

**Assumption 3 (A3).**

#### 2.2. Output Reference Model Control Problem Definition

## 3. Solution to the ORM Tracking Problem

#### IMF-AVI Convergence Analysis with Approximation Errors for ORM Tracking

**Algorithm 1.** VI-based Q-learning.

- S1: Initialize the controller ${\mathbf{C}}_{0}$ and the Q-function to ${Q}_{0}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})=0$; initialize the iteration index $j=1$.
- S2: Apply the one-step backup equation (13) to the Q-function.
- S3: Improve the controller using Equation (14).
- S4: Set $j=j+1$ and repeat steps S2 and S3 until convergence.
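To make steps S1 to S4 concrete, the following is a minimal tabular sketch on a hypothetical toy problem, not the paper's process: an integer state on a grid, three actions, deterministic dynamics and a nonnegative tracking cost. Because (13) and (14) are not reproduced in this section, the sketch assumes they take the usual form $Q_j(\mathbf{x}_k^E,\mathbf{u}_k)=\pi(\mathbf{x}_k^E,\mathbf{u}_k)+\gamma Q_{j-1}(\mathbf{x}_{k+1}^E,\mathbf{C}_{j-1}(\mathbf{x}_{k+1}^E))$ and $\mathbf{C}_j(\mathbf{x}_k^E)=\arg\min_{\mathbf{u}}Q_j(\mathbf{x}_k^E,\mathbf{u})$, where the stage cost $\pi$ and discount $\gamma$ are assumed symbols. The names `N_STATES`, `TARGET`, `GAMMA`, `step` and `cost` are illustrative only.

```python
# Hypothetical toy problem (not the paper's process): integer state on a grid.
N_STATES = 11
ACTIONS = (-1, 0, 1)
TARGET = 7      # hypothetical constant reference to be tracked
GAMMA = 0.9     # assumed discount factor


def step(x, u):
    """Deterministic toy dynamics x_{k+1} = clip(x_k + u_k, 0, N_STATES - 1)."""
    return min(max(x + u, 0), N_STATES - 1)


def cost(x, u):
    """Nonnegative stage cost: squared tracking error plus a small input penalty."""
    return (x - TARGET) ** 2 + 0.1 * u ** 2


def vi_q_learning(tol=1e-9, max_iter=1000):
    states = range(N_STATES)
    # S1: Q_0 = 0 for every (x, u); C_0 is arbitrary because Q_0 is flat.
    q = {(x, u): 0.0 for x in states for u in ACTIONS}
    ctrl = {x: ACTIONS[0] for x in states}
    history = [q]
    for _ in range(max_iter):
        # S2: one-step backup Q_j(x, u) = cost(x, u) + GAMMA * Q_{j-1}(x', C_{j-1}(x')).
        q_new = {
            (x, u): cost(x, u) + GAMMA * q[(step(x, u), ctrl[step(x, u)])]
            for x in states
            for u in ACTIONS
        }
        # S3: greedy improvement C_j(x) = argmin_u Q_j(x, u).
        ctrl = {x: min(ACTIONS, key=lambda u: q_new[(x, u)]) for x in states}
        history.append(q_new)
        delta = max(abs(q_new[key] - q[key]) for key in q)
        q = q_new
        # S4: stop once the Q-function no longer changes.
        if delta < tol:
            break
    return q, ctrl, history


q, ctrl, history = vi_q_learning()
# The learned controller drives the state toward TARGET: ctrl[0] == 1, ctrl[10] == -1.
```

Keeping every iterate in `history` also allows a direct numerical check, for this toy case, of the monotonicity stated in Theorem 1.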

**Lemma 1.**

**Proof.**

**Lemma 2.**

- (1) $0\le {Q}_{j}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})\le B({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$, where $B({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$ is an upper bound.
- (2) If there exists a solution ${Q}^{*}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$ to (8), then $0\le {Q}_{j}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})\le {Q}^{*}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})\le B({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$.

**Proof.**

**Theorem 1.**

- (1) $\{{Q}_{j}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})\}$ is a non-decreasing sequence, i.e., ${Q}_{j+1}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})\ge {Q}_{j}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$ holds $\forall j,\forall ({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$; and
- (2) $\lim_{j\to \infty}{\mathbf{C}}_{j}({\mathbf{x}}_{k}^{E})={\mathbf{C}}^{*}({\mathbf{x}}_{k}^{E})$ and $\lim_{j\to \infty}{Q}_{j}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})={Q}^{*}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$.
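Part (1) can be checked with the standard induction argument for value iteration started from zero. The sketch below is a generic argument, not necessarily the authors' proof, and it assumes that the backup (13) reads $Q_{j}(\mathbf{x}_k^E,\mathbf{u}_k)=\pi(\mathbf{x}_k^E,\mathbf{u}_k)+\gamma\min_{\mathbf{u}}Q_{j-1}(\mathbf{x}_{k+1}^E,\mathbf{u})$, with a nonnegative stage cost $\pi$ and a discount factor $0<\gamma\le 1$ (both symbols are assumed here):

$$
\begin{aligned}
&\text{Base: } Q_1(\mathbf{x}_k^E,\mathbf{u}_k)=\pi(\mathbf{x}_k^E,\mathbf{u}_k)+\gamma\min_{\mathbf{u}}Q_0(\mathbf{x}_{k+1}^E,\mathbf{u})=\pi(\mathbf{x}_k^E,\mathbf{u}_k)\ge 0=Q_0(\mathbf{x}_k^E,\mathbf{u}_k),\\
&\text{Step: } Q_j\ge Q_{j-1}\ \ \forall(\mathbf{x}^E,\mathbf{u})\ \Rightarrow\ \min_{\mathbf{u}}Q_j(\mathbf{x}_{k+1}^E,\mathbf{u})\ge\min_{\mathbf{u}}Q_{j-1}(\mathbf{x}_{k+1}^E,\mathbf{u})\\
&\qquad\Rightarrow\ Q_{j+1}(\mathbf{x}_k^E,\mathbf{u}_k)=\pi(\mathbf{x}_k^E,\mathbf{u}_k)+\gamma\min_{\mathbf{u}}Q_j(\mathbf{x}_{k+1}^E,\mathbf{u})\ge\pi(\mathbf{x}_k^E,\mathbf{u}_k)+\gamma\min_{\mathbf{u}}Q_{j-1}(\mathbf{x}_{k+1}^E,\mathbf{u})=Q_j(\mathbf{x}_k^E,\mathbf{u}_k).
\end{aligned}
$$

Together with the upper bound $Q_j\le B$ of Lemma 2, a non-decreasing and bounded sequence converges pointwise, which is the starting point for part (2).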

**Proof.**

**Algorithm 2.** IMF-AVI.

- S1: Initialize the controller ${\tilde{\mathbf{C}}}_{0}$ and the Q-function ${\tilde{Q}}_{0}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})=0,\forall ({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$; initialize the iteration index $j=1$.
- S2: Update the approximate Q-function using Equation (24).
- S3: Improve the approximate controller using Equation (25).
- S4: Set $j=j+1$ and repeat steps S2 and S3 until convergence.
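The data-driven character of Algorithm 2 can be illustrated with a deliberately small sketch. Equations (24) and (25) are not reproduced in this section, so it assumes that S2 is a least-squares regression of a linearly parameterized $\tilde{Q}_j=\boldsymbol{\theta}_j^\top\mathsf{\Phi}$ onto one-step backup targets computed from recorded transitions, and that S3 is the greedy minimization of $\tilde{Q}_j$ over the input. The scalar linear process, the cost weights `QX` and `RU`, the discount `GAMMA` and all helper names are hypothetical; the learner only uses the tuples `(x, u, x_next)` and the stage cost, never `A` or `B`.

```python
import random

# Hypothetical scalar linear process x_{k+1} = A*x_k + B*u_k (unknown to the learner).
A, B = 0.9, 0.5
QX, RU = 1.0, 0.1   # assumed stage cost QX*x^2 + RU*u^2
GAMMA = 0.95        # assumed discount factor


def collect_transitions(n=200, seed=0):
    """Random exploration; only (x, u, x_next) tuples are stored."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, u = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        data.append((x, u, A * x + B * u))
    return data


def phi(x, u):
    """Quadratic basis: non-repeated products of z = [x, u]."""
    return [x * x, x * u, u * u]


def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    sol = [0.0] * 3
    for c in range(2, -1, -1):
        sol[c] = (a[c][3] - sum(a[c][k] * sol[k] for k in range(c + 1, 3))) / a[c][c]
    return sol


def min_over_u(theta):
    """Return p such that min_u theta^T phi(x, u) = p * x^2 (p = 0 for the zero Q_0)."""
    t_xx, t_xu, t_uu = theta
    if t_uu <= 0.0:
        return 0.0
    return t_xx - t_xu ** 2 / (4.0 * t_uu)


def imf_avi(data, iters=200):
    theta = [0.0, 0.0, 0.0]  # S1: approximate Q_0 = 0
    for _ in range(iters):
        # S3 of the previous iteration is implicit: the minimum is attained by u = -K x.
        p = min_over_u(theta)
        # S2: least-squares fit of theta^T phi(x, u) to the backup targets.
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for x, u, x_next in data:
            f = phi(x, u)
            target = QX * x * x + RU * u * u + GAMMA * p * x_next * x_next
            for i in range(3):
                aty[i] += f[i] * target
                for k in range(3):
                    ata[i][k] += f[i] * f[k]
        theta = solve3(ata, aty)
    gain = theta[1] / (2.0 * theta[2])  # greedy controller u = -gain * x
    return theta, gain


def riccati_gain(iters=200):
    """Model-based check only: uses A and B, which the learner never sees."""
    p = 0.0
    for _ in range(iters):
        p = QX + GAMMA * A * A * p - (GAMMA * A * B * p) ** 2 / (RU + GAMMA * B * B * p)
    return GAMMA * A * B * p / (RU + GAMMA * B * B * p)
```

On this noise-free linear toy problem the backup targets are exactly quadratic, so the regression is exact and the learned gain coincides with `riccati_gain()`; with noisy data or a nonlinear process such as TITOAS, the residual of the fit corresponds to the approximation errors considered in the convergence analysis of this section.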

**Theorem 2.**

**Proof.**

## 4. Validation Case Studies

#### 4.1. ORM Tracking for a Linear Process

$\mathbf{P}$ and the basis function vector ${\mathsf{\Phi}}^{\top}({\mathbf{x}}_{k}^{E},{\mathbf{u}}_{k})$ is formed from the non-repeated terms of the Kronecker product of all the Q-function input arguments.
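As an illustration of how such a basis can be built, the sketch below forms the non-repeated products $z_i z_j$, $i\le j$, of a generic argument vector $\mathbf{z}=[(\mathbf{x}_k^E)^\top,\mathbf{u}_k^\top]^\top$; the symbol $\mathbf{z}$ and both function names are hypothetical. For $n$ arguments it yields $n(n+1)/2$ terms. It also assumes the standard correspondence for quadratic Q-functions, in which a symmetric matrix $\mathbf{P}$ maps to basis weights by doubling its off-diagonal entries, so that $\mathbf{z}^\top\mathbf{P}\mathbf{z}=\boldsymbol{\theta}^\top\mathsf{\Phi}(\mathbf{z})$.

```python
from itertools import combinations_with_replacement


def quadratic_basis(z):
    """Non-repeated terms z_i * z_j (i <= j) of the Kronecker product of z with itself."""
    return [z[i] * z[j] for i, j in combinations_with_replacement(range(len(z)), 2)]


def weights_from_symmetric(p):
    """Weights theta with theta^T quadratic_basis(z) == z^T P z for a symmetric P."""
    n = len(p)
    return [p[i][j] if i == j else 2 * p[i][j]
            for i, j in combinations_with_replacement(range(n), 2)]


z = [1.0, 2.0, 3.0]
print(quadratic_basis(z))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

Dropping the repeated products $z_i z_j = z_j z_i$ keeps the regression matrix full-rank, which a raw Kronecker product would not.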

#### 4.2. IMF-AVI on the Nonlinear TITOAS Aerodynamic System

#### 4.3. Initial Controller with Model-Free VRFT

#### 4.4. Input–State–Output Data Collection

#### 4.5. Learning State-Feedback Controllers with Linearly Parameterized IMF-AVI

#### 4.6. Learning State-Feedback Controllers with Nonlinearly Parameterized IMF-AVI Using NNs

#### 4.7. Comments on the Obtained Results

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 1.** Closed-loop collection of state-transition data for Example 1: (top) ${y}_{k}$ (black), ${r}_{k}$ (blue), ${y}_{k}^{m}$ (red); (bottom) ${u}_{k}$.

**Figure 2.** Convergence results of the linearly parameterized iterative model-free approximate Value Iteration (LP-IMF-AVI) for the linear process example.

**Figure 4.** Open-loop input–output (IO) data from the two-input–two-output aerodynamic system (TITOAS) for Virtual Reference Feedback Tuning (VRFT) controller tuning [46].

**Figure 5.** IO data collection with the linear controller: (**a**) ${u}_{k,1}$; (**b**) ${y}_{k,1}$ (black), ${y}_{k,1}^{m}$ (red), ${r}_{k,1}$ (black dotted); (**c**) ${u}_{k,2}$; (**d**) ${y}_{k,2}$ (black), ${y}_{k,2}^{m}$ (red), ${r}_{k,2}$ (black dotted).

**Figure 6.** The LP-IMF-AVI convergence on TITOAS (© 2019 IEEE [12]).

**Figure 7.** The IMF-AVI convergence on TITOAS: ${y}_{k,1}^{m},{y}_{k,2}^{m}$ (red); ${u}_{k,1},{u}_{k,2},{y}_{k,1},{y}_{k,2}$ for LP-IMF-AVI (black), for NP-IMF-AVI with NNs (blue), and for the initial VRFT controller used for transition collection (green) (© 2019 IEEE [12]).

