Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach

A hierarchical learning control framework (HLF) has been validated on two affordable control laboratories: an active temperature control system (ATCS) and an electrical rheostatic braking system (EBS). The proposed HLF is data-driven and model-free, while being applicable on general control tracking tasks which are omnipresent. At the lowermost level, L1, virtual state-feedback control is learned from input–output data, using a recently proposed virtual state-feedback reference tuning (VSFRT) principle. L1 ensures a linear reference model tracking (or matching) and thus, indirect closed-loop control system (CLCS) linearization. On top of L1, an experiment-driven model-free iterative learning control (EDMFILC) is then applied for learning reference input–controlled outputs pairs, coined as primitives. The primitives’ signals at the L2 level encode the CLCS dynamics, which are not explicitly used in the learning phase. Data reusability is applied to derive monotonic and safely guaranteed learning convergence. The learning primitives in the L2 level are finally used in the uppermost and final L3 level, where a decomposition/recomposition operation enables prediction of the optimal reference input assuring optimal tracking of a previously unseen trajectory, without relearning by repetitions, as it was in level L2. Hence, the HLF enables control systems to generalize their tracking behavior to new scenarios by extrapolating their current knowledge base. The proposed HLF framework endows the CLCSs with learning, memorization and generalization features which are specific to intelligent organisms. This may be considered as an advancement towards intelligent, generalizable and adaptive control systems.


Introduction
A hierarchical primitive-based learning framework (HLF) for trajectory tracking has been proposed and extended recently in [1][2][3]. Its main goal is to make control systems (CSs) capable of extending a current knowledge base of scenarios (or experiences), consisting of different memorized tracking tasks, towards new tracking tasks that have not been seen before. While the knowledge base of tracking tasks is improved repetitively in a trial- or iteration-based manner with respect to an optimality criterion, a new tracking task requires the previously unseen trajectory to be tracked optimally without the chance of improvement by repetition. Therefore, the CS is required to extrapolate its current knowledge base to new, previously unseen scenarios. This is a form of generalization ability which is specific to living organisms and can be regarded as a form of intelligence under the name of cognitive control.
The HLF achieves such generalization as follows [3]: first, the lower level L1 is dedicated to learning output- or state-feedback controllers for the underlying nonlinear system with unknown dynamics. Thus, this is a form of model-free or data-driven control. The L1 learning aims to ensure that the closed-loop CS (CLCS) matches a linear reference model in response to a given reference input.
The HLF has been validated on a number of complex nonlinear mono- and multivariable systems such as an aerodynamic system [4], a robotic arm [2] and electrical voltage control [1,3]. This paper's goal is to prove the framework's applicability and effectiveness on other applications which are very different in nature: the active temperature control system (ATCS) and the electrical braking system (EBS). Both the ATCS and the EBS occur widely in industry; therefore, they impact many potential applications. Hopefully, this will further elucidate the framework's generalization ability and bring CSs a step closer to the desirable features of intelligent control: learnability, adaptability, generalization and robustness in harsh environments. The realistic experimental validation on hardware shows that the HLF's intermediate levels exhibit robustness against noise, against the CLCS's only approximately linear behavior and against varying settings of the desired trajectory, such as length and constraints. As a secondary objective, the proposed HLF shows that control system techniques can leverage modern machine learning methods (supervised learning in particular) to reach capabilities beyond their classical scope. To this end, a long short-term memory (LSTM) nonlinear recurrent neural network (NN) controller was used for the first time with the VSFRT approach. The resulting nonlinear controller showed superior behavior with respect to a plain feedforward NN controller trained with the same VSFRT principle.
The explanation lies with the LSTM's ability to learn longer-term dependencies for time sequences. Function approximation theory is again employed in the third level (level L3) for predicting optimized reference inputs in the context of dynamical systems. This paper discusses basic theoretical assumptions about the controlled systems and introduces the model reference control problem in Section 2. Application of the proposed primitive-based learning framework to the ATCS is detailed in Section 3, whereas the application to the EBS is presented in Section 4. Concluding remarks are outlined in Section 5.

The Unknown Dynamic System Observability
The nonlinear unknown controlled system has the input-output discrete-time description ($k$ indexes the time sample)
$$y_k = f(y_{k-1}, \ldots, y_{k-n_y}, u_{k-1}, \ldots, u_{k-n_u}), \quad (1)$$
fulfilling the following assumptions:
A1. The input $u_k = [u_{k,1}, \ldots, u_{k,m_u}]^T \in \Omega_U \subset \mathbb{R}^{m_u}$ has known domain $\Omega_U$, and the output $y_k = [y_{k,1}, \ldots, y_{k,m_y}]^T \in \Omega_Y \subset \mathbb{R}^{m_y}$ has known domain $\Omega_Y$.
where $\sigma_k = [\sigma_{k,1}, \ldots, \sigma_{k,n}]^T \in \Omega_\Sigma \subset \mathbb{R}^{n}$ is the system's state of unknown order $n$ and unknown domain $\Omega_\Sigma$, which is again unmeasured.
A5. The nonlinear system (1) is input-output-controllable and the pair $(g, h)$ is observable.
Definition 1 [3]. The unknown observability index of (1) is the minimal value $\tau_{min}$ of $\tau$ for which
where $g$ is a partially unknown system function and $s_k$ is called the virtual state, from whose definition it clearly results that $s_{k,1} = y_k, \ldots, s_{k,2\tau+1} = u_{k-\tau}$. Additionally, $s_k$ is an alias for $\sigma_k$ from another dimension/space, the two being related through an unknown transformation $\sigma_k = T(s_k)$.
Proof of Theorem 1. The proof is based on Theorem 1 from [6], using assumptions A1-A5.
Observation 1. The virtual state-space (3) is fully state-measurable.
Observation 2. The virtual state-space model (3) is input-output-controllable and has the same input-output behavior as (1) and (2).
Observation 3. Input delays in (1) can still lead to transformations (3) by the appropriate introduction of additional states. Time delay affects the relative degree of the basic system (1) and can be measured from input-output data. To accommodate this case, another assumption follows. A6. The system's (1) relative degree is known.
For the subsequent output model reference tracking design, the minimum-phase assumption about the system (1) is also enforced. The motivation is that non-minimum-phase behavior is more troublesome to handle within model reference control with unknown system dynamics.
Observation 4. The size of s k built from input-output historical data can be much greater than the size of the true state vector σ k . Dimensionality reduction techniques specific to machine learning, such as principal component analysis (PCA) or autoencoders (AEs) are employed to retain the relevant transformed features emerging from the virtual state s k [4] .

The Reference Model
A linear, strictly causal reference model is described in state-space form (4), where $s^{RM}_k = [s^{RM}_{k,1}, \ldots, s^{RM}_{k,n_m}]^T \in \Omega_{S^{RM}} \subset \mathbb{R}^{n_m}$ is the $n_m$-dimensional reference model state. The reference input $\rho_k = [\rho_{k,1}, \ldots, \rho_{k,m_y}]^T \in \Omega_\rho \subset \mathbb{R}^{m_y}$ simultaneously excites the reference model and the CLCS, and there is a one-to-one relationship between the components of $\rho_k$, $y_k$ and $y^{RM}_k$, where $y^{RM}_k = [y^{RM}_{k,1}, \ldots, y^{RM}_{k,m_y}]^T \in \Omega_{Y^{RM}} \subset \mathbb{R}^{m_y}$: each component of $\rho_k$ drives a corresponding component of $y_k$ and $y^{RM}_k$, respectively. The reference model has the input-output pulse transfer matrix form $y^{RM}_k = M(q)\rho_k$, where $q^{-1}$ is the one-step delay operator acting on discrete-time signals. For model reference control, the selection of $M(q)$ must carefully account for the non-minimum-phase behavior of (1), together with its relative degree and bandwidth. These are classical requirements of the model reference control problem, where the controller tuning for the nonlinear system (1) should make its output $y_k$ track $y^{RM}_k$ when both the CLCS and the reference model are excited by $\rho_k$. $M(q)$ is mostly diagonal, to obtain decoupled control channels.

The Model Reference Control
The model reference control tracking problem can formally be written as an optimal infinite-horizon control problem [3]: minimize the cost $V^{\infty}_{RM}$ subject to the dynamics (3) (or (1)) and (4). This is problem (5), in which $V^{\infty}_{RM}$ is the cost function measuring the deviation of the CLCS output from the reference model output. The closed-form VSFRT solution to (5) is expressed as $u_k = C(s^{ext}_k)$, with $C(\cdot)$ being a linear/nonlinear map over an extended state $s^{ext}_k$ comprising $s_k$ and $\rho_k$. Both $s_k$ and $\rho_k$ will be replaced by their offline-calculated (virtual) counterparts, following the VSFRT principle. Problem (5) is indirectly solved as the equivalent controller identification problem (6) [3,4], with cost $V^{N}_{VR}$, where $\pi$ is the controller function parameter, leading to the notation $C(s^{ext}_k, \pi)$ (here, the controller can be an NN or another type of approximator) [4,5]. In [4,5], it was motivated why the reference model state $s^{RM}_k$ should not be included within $s^{ext}_k$: the former correlates with $\rho_k$. Additionally, [4] proposed a theoretical stability analysis of the CLCS with the resulting controller and showed how $V^{N}_{VR}$ from (6) and $V^{\infty}_{RM}$ from (5) are related. For other solutions to the model reference tracking problem (5), such as reinforcement learning, a different $s^{ext}_k$ is required in order to ensure the Markov decision process (MDP) assumptions about the controlled process [1,3–6].
After solving the model reference control problem at level L1, learning level L2 takes place, using the EDMFILC strategy. The intention is to learn the primitive pairs in this level and use them to populate the primitive's library. The proposed three-levelled HLF is completed with the final level, L3. Here, the primitive outputs of the learned pairs are used for decomposing the desired new trajectory, whereas the primitive inputs are used to recompose the optimized reference input [1][2][3]. The HLF architecture is captured in the diagram in Figure 1.

System Description
The active temperature control system (ATCS) is an Arduino-centered device dedicated to temperature control, operated in a temperature-controlled room [58]. Its active heating module is a TIP31C power transistor. Its active cooler is a fan driven by a DC motor with a nominal current consumption of 0.5 amps (A) at about 120 revolutions per second. Using an analogue temperature sensor based on the LM35DZ, the main power transistor's temperature is read and used for feedback control. The equipment is small in scale and is depicted in Figures 2 and 3. A single power supply of 12 volts and maximum 2 amps is provided by a commercially available DC-DC buck-boost converter. The power supply alternately drives the power transistor and the DC fan via control logic: only one element is active at a time. Both elements are driven by pulse width modulation (PWM).

The fan control circuit uses a 1N4001 protection diode and a BC637 (up to 1.5 watts) transistor which allows for varying fan speed, thus accelerating the cooling process by heat dissipation (which otherwise would be a slow process given the system's nature).
The heater is the power transistor itself, capable of a maximum 40 watts and gradually controlling its temperature through the PWM switching logic. A heatsink is attached to the TIP31C's body to better dissipate heat, while the LM35DZ temperature sensor is physically connected to the power transistor using thermal paste for better heat transfer.
A sampling time $T_s$ of 20 s is sufficient to capture the ATCS's dynamics. The TIP31C's surface temperature is measured as the voltage $V_{out}$ of the analogue sensor and then converted via the Arduino's ADC port #0. Finally, the controlled output $y_k$ [°C/100] is just the normalized equivalent temperature in degrees Celsius divided by 100 and used for feedback control. The control to the heater and cooler transistors uses the PWM output ports (herein, ports #3 and #5) from the Arduino. The alternating high-level switching logic activating the cooler/heater (just one at a time) uses the equation [58]
$$\begin{cases} V_{i1} = \max(\min(0.2 + |u|, 1), 0) \times 5\,\mathrm{V}, \quad V_{i2} = 0\,\mathrm{V}, & \text{when } u \ge 0, \\ V_{i1} = 0\,\mathrm{V}, \quad V_{i2} = \max(\min(0.15 + |u|, 1), 0) \times 5\,\mathrm{V}, & \text{when } u < 0. \end{cases}$$
In the above equation, $V_{i1}$ and $V_{i2}$ are the voltages controlling the heater and the cooler, respectively (see Figure 3), whereas the thresholds 0.2 and 0.15 compensate the dead zones in the heater and cooler, respectively. The TIP31C does not conduct current below 1 V (hence, no heat is produced) and the fan DC motor does not spin for a supply voltage under 0.75 V. The min and max functions in the switching equation ensure proper saturation of the voltages. The term $u$ represents the signed dimensionless control input $u_k \in [-1; 1]$, which is interpreted as the duty cycle of the two PWMs, with the sign ensuring alternate functioning of the heater and the cooler, respectively.
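The switching logic can be illustrated with a minimal Python sketch. The function name `pwm_voltages` is hypothetical and the 5 V full-scale value is taken from the equation; on the actual hardware the voltages would be produced as PWM duty cycles on the Arduino, which is not modelled here.

```python
from typing import Tuple


def pwm_voltages(u: float) -> Tuple[float, float]:
    """Map the signed control input u in [-1, 1] to (V_i1, V_i2) in volts.

    V_i1 drives the heater and V_i2 drives the cooler; only one is non-zero.
    """
    def saturate(x: float) -> float:
        # Clamp the duty cycle to the interval [0, 1].
        return max(min(x, 1.0), 0.0)

    if u >= 0:
        # Heater active: the 0.2 offset compensates the heater dead zone (1 V).
        return saturate(0.2 + abs(u)) * 5.0, 0.0
    # Cooler active: the 0.15 offset compensates the fan dead zone (0.75 V).
    return 0.0, saturate(0.15 + abs(u)) * 5.0


if __name__ == "__main__":
    for u in (-1.0, -0.1, 0.0, 0.5, 1.0):
        print(u, pwm_voltages(u))
```

Note that for $u = 0$ the heater branch is selected and receives the dead-zone voltage of 1 V, exactly as the equation prescribes.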
Next, the input-output data collection step, intended for the virtual state feedback control learning process, is unveiled.

ATCS Input-Output Data Collection for Learning Low-Level L1 Control Dedicated to Model Reference Tracking
The open-loop input-output data collection uses a piecewise-constant signal $u_k$ whose levels are randomly distributed in the range [−0.3; 0.8]. The switching period of these levels is 2200 s (the system has high inertia, being a thermal process). To capture all relevant dynamics of the system, exploration is stimulated by an additive noise modelled similarly to the base signal: its levels are uniformly distributed in [−0.5; 0.5] and its switching period is 100 s. The resulting input-output data are presented in Figure 4 for $N = 4000$ samples.
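A sketch of how such an excitation signal could be generated is given below. The helper name `piecewise_constant`, the random seed, the uniform distribution of the base levels and the final clipping to [−1; 1] are assumptions made for illustration; the text only specifies the level ranges, the switching periods and the number of samples.

```python
import numpy as np


def piecewise_constant(n_samples, hold, low, high, rng):
    """Random levels in [low, high], each held for `hold` samples."""
    n_levels = -(-n_samples // hold)  # ceiling division
    levels = rng.uniform(low, high, size=n_levels)
    return np.repeat(levels, hold)[:n_samples]


Ts = 20.0   # sampling period in seconds
N = 4000    # number of samples
rng = np.random.default_rng(0)  # hypothetical seed, for repeatability

# Base signal: levels in [-0.3, 0.8], switching every 2200 s (110 samples).
base = piecewise_constant(N, int(2200 / Ts), -0.3, 0.8, rng)
# Exploration noise: levels in [-0.5, 0.5], switching every 100 s (5 samples).
noise = piecewise_constant(N, int(100 / Ts), -0.5, 0.5, rng)

# Assumption: keep the sum inside the admissible duty-cycle range [-1, 1].
u = np.clip(base + noise, -1.0, 1.0)
```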
To learn a model-free controller using the collected input-output data, the VSFRT procedure is applied, as thoroughly described in [2,4,58]. The VSFRT paradigm ensures the procedure for designing a linear (or nonlinear) virtual state-feedback controller which matches the closed-loop control system to a reference model.
First, a reference model is selected as $M(s) = 1/(500s + 1)$ ($s$ is the Laplace-domain operator of the continuous-time transfer function). Its selection is qualitative, based on several key observations: the ATCS is highly damped; it has no dead-time, neither with respect to the data acquisition process nor in its intrinsic dynamics; and the reference model's bandwidth is matched with the natural open-loop ATCS bandwidth, being slightly higher (a faster response in closed loop than in open loop is a common control requirement). This reference model is discretized using zero-order hold for a sampling interval of 20 s, to render the discrete-time filter $M(q)$. The fact that $M(q)$ is linear indirectly requires the virtual state-feedback controller to render a linear CLCS over the ATCS.
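For a first-order lag with unit gain, the zero-order-hold discretization has a well-known closed form: $M(q) = (1 - a)q^{-1}/(1 - a q^{-1})$ with $a = e^{-T_s/T}$. The sketch below evaluates it for $T = 500$ s and $T_s = 20$ s; the function names are hypothetical.

```python
import math


def zoh_first_order(time_constant: float, Ts: float):
    """ZOH discretization of 1/(time_constant*s + 1).

    Returns (a, b) such that y[k+1] = a*y[k] + b*rho[k].
    """
    a = math.exp(-Ts / time_constant)
    return a, 1.0 - a


def reference_model_response(rho, a, b, y0=0.0):
    """Simulate the discrete reference model; returns y[1], ..., y[len(rho)]."""
    y, out = y0, []
    for r in rho:
        y = a * y + b * r
        out.append(y)
    return out


a, b = zoh_first_order(500.0, 20.0)
step = reference_model_response([1.0] * 25, a, b)
print(f"a = {a:.6f}, b = {b:.6f}, y after 500 s = {step[-1]:.4f}")
```

After 25 samples (one time constant) the unit step response reaches $1 - e^{-1}$, which gives a quick sanity check on the discretization.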
Then, the steps below are followed, in order:
Step 1. Define the observability index $\tau = 2$, and form the trajectory $\{u_k, y_k, s_k\}$, where the virtual state $s_k = [y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2}]$ is built by assuming that the nonlinear ATCS is observable. Using the discrete time index $k = 1, \ldots, N$, a total of 3998 tuples of the form $\{u_k, y_k, s_k\}$ are obtained.
Step 2. The virtual reference input is computed as $\rho_k = M^{-1}(q)\, y^{f}_k$, where $y^{f}_k$ is the low-pass-filtered version of $y_k$ through the filter $0.45/(1 - 0.55 q^{-1})$, because $y_k$ is slightly noisy. Notably, $M^{-1}(q)$ involves a non-causal filtering operation, which is not problematic because it is performed offline.
Step 3. Construct the regressor state as $s^{ext}_k = [s^{T}_k, 1, \rho_k]^{T}$, $1 \le k \le 3998$. The constant "1" is added into the regressor to allow for the offset coefficient identification, leading to a linear affine virtual state-feedback controller.
Step 4. Parameterize the virtual state-feedback controller in a linear fashion, as $u_k = K^{T} s^{ext}_k$. The VSFRT goal is to achieve model reference matching by indirectly solving the controller identification problem (8) [2,4,53].
Step 5. Problem (8) is posed as an overdetermined linear system of equations and solved accordingly for $K^{*}$.
Following the previous steps, the linear virtual state-feedback compensator matrix is $K^{*} = [-8.532, -1.9441, 9.8722, 0.5199, 0.4131, 1.0127]^{T} \in \mathbb{R}^{6}$, where the fifth value 0.4131 represents the offset gain. Testing the controller in closed loop shows the behavior in Figure 5.
A satisfactory model reference tracking performance is achieved, as seen in Figure 5, thus ensuring indirect CLCS feedback linearization. The VSFRT controller used here is linear; however, nonlinear structures such as NNs have also been employed extensively [2,4]. Uniformly ultimately bounded (UUB) stability of the CLCS with the proposed nonlinear VSFRT controllers was analyzed according to Theorem 1 and Corollary 1 from [4]. The learning process is also one-shot: no iterations are performed, unlike in other learning paradigms such as value iteration reinforcement Q-learning [1,3,5,53].
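The five VSFRT steps can be sketched end to end on synthetic data. Everything specific here is an assumption for illustration: the stand-in plant `plant_step` is a hypothetical noise-free first-order linear system (not the ATCS), the output low-pass filter of Step 2 is skipped because the data are noise-free, the experiment is much shorter than the real one, and the overdetermined system is solved by ordinary least squares via `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reference model M(q) = b q^-1 / (1 - a q^-1), ZOH of 1/(500s+1) at Ts = 20 s.
a = np.exp(-20.0 / 500.0)
b = 1.0 - a


def plant_step(y, u):
    """Hypothetical stand-in plant (NOT the ATCS), unknown to the designer."""
    return 0.9 * y + 0.1 * u


# Open-loop experiment with a piecewise-constant random input.
N = 400
u = np.repeat(rng.uniform(-0.3, 0.8, size=N // 5), 5)
y = np.zeros(N + 1)
for k in range(N):
    y[k + 1] = plant_step(y[k], u[k])

# Step 1: virtual state s_k = [y_k, y_{k-1}, y_{k-2}, u_{k-1}, u_{k-2}].
ks = np.arange(2, N)
# Step 2: virtual reference rho_k = M^-1(q) y_k (non-causal, done offline).
rho = (y[ks + 1] - a * y[ks]) / b
# Step 3: regressor s_ext_k = [s_k^T, 1, rho_k]^T, one row per sample.
S = np.column_stack([y[ks], y[ks - 1], y[ks - 2],
                     u[ks - 1], u[ks - 2], np.ones(ks.size), rho])
U = u[ks]
# Steps 4-5: linear controller u_k = K^T s_ext_k fitted by least squares.
K, *_ = np.linalg.lstsq(S, U, rcond=None)


def closed_loop(K, ref):
    """Run the learned controller in closed loop next to the reference model."""
    n = len(ref)
    yc, uc, yrm = np.zeros(n + 1), np.zeros(n), np.zeros(n + 1)
    for k in range(n):
        y1 = yc[k - 1] if k >= 1 else 0.0
        y2 = yc[k - 2] if k >= 2 else 0.0
        u1 = uc[k - 1] if k >= 1 else 0.0
        u2 = uc[k - 2] if k >= 2 else 0.0
        s_ext = np.array([yc[k], y1, y2, u1, u2, 1.0, ref[k]])
        uc[k] = K @ s_ext
        yc[k + 1] = plant_step(yc[k], uc[k])
        yrm[k + 1] = a * yrm[k] + b * ref[k]
    return yc, yrm


y_cl, y_rm = closed_loop(K, np.full(200, 0.5))
print("max model-matching error:", np.max(np.abs(y_cl - y_rm)))
```

For this noise-free linear stand-in the regressor columns are linearly dependent, so `lstsq` returns the minimum-norm solution; with real, noisy data the fit is only approximate and the output filtering of Step 2 becomes relevant.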

Intermediate L2 Level Primitives Learning with EDMFILC
The closed-loop feedback control system is treated as a linear dynamical system from the reference input ρ k to the controlled output y k . To apply the primitive-based prediction mechanism for high-performance tracking without learning by repetitions, the primitives (the pairs of reference inputs-controlled outputs) must be learned in the first instance. The reason is that the primitive outputs must describe a shape having good approximation capacity (e.g., a Gaussian shape or others, according to the function approximation theory). This is enabled by employing the EDMFILC theory to learn such primitive pairs by trials/iterations/repetitions. This is only achieved once, to populate the library of primitives, after which the optimized reference input prediction does not require relearning by repetition.
The EDMFILC theory has been developed for linear multi-input, multi-output (MIMO) systems [1][2][3]. In the case of the SISO ATCS, the EDMFILC is particularized as follows.
Let the ATCS closed-loop reference input at the current iteration $j$ be defined, in lifted (or supervectorial) notation spanning an $N$-sample experiment, as the vector $R^{j}$; the controlled output $Y^{j}$, the desired output $Y^{d}$ and the tracking error $E^{j} = Y^{j} - Y^{d}$ are stacked in the same way. Here, $y^{d}_k$ is the $k$th sample of the desired output, constant for all iterations, and $e^{j}_k$ is the $k$th sample of the output tracking error at iteration $j$. Non-zero initial conditions, delays, offsets and non-minimum-phase responses must be properly considered when defining the desired trajectory.
The optimal reference input $\rho^{*}_k$ ($R^{*}$ in lifted notation) ensuring zero tracking error is iteratively searched using the gradient descent update law (10), $R^{j+1} = R^{j} - \chi\, \partial J(R^{j})/\partial R$, where $\chi$ is the positive definite learning gain and $\partial J(R^{j})/\partial R$ is the gradient of the cost function $J(R) = \frac{1}{N}\|E(R)\|_2^2$ with respect to its argument $R$, evaluated at the current iteration reference input vector $R^{j}$. This cost function penalizes the tracking error over the entire trial. For linear systems, the gradient is experimentally obtainable in a model-free manner, as shown by the application steps of the EDMFILC [3]:
Step 1. With each iteration $j$, follow the next steps.
Step 2. Set $R^{j}$ as the reference input to the closed-loop ATCS and record the current iteration tracking error $E^{j} = Y^{j} - Y^{d}$. Let this be the nominal experiment.
Step 3. Flip $E^{j}$ upside down to obtain $\mathrm{udf}(E^{j})$.
Step 4. Scale $\mathrm{udf}(E^{j})$ in amplitude by the scalar multiplication gain $\mu$.
Step 5. Use $\mu \cdot \mathrm{udf}(E^{j})$ as an additive disturbance for the current iteration reference $R^{j}$. Use $R^{j} + \mu \cdot \mathrm{udf}(E^{j})$ as the reference input to what is called "the gradient experiment" and record the output $Y^{j}_{G}$ from this non-nominal experiment.
Step 6. As shown in [3], the gradient in (10) is computable from the outputs of the nominal and gradient experiments.
Step 7. Update $R^{j}$ based on (10).
Step 8. Repeat from Step 2 until the maximum number of iterations is reached or the gradient norm $\|\partial J(R^{j})/\partial R\|_2$ is below some predefined threshold.
After Step 8, the learned primitive $\{R^{j}, Y^{j}\}$ is stored in the library as $\{R^{[m]}, Y^{[m]}\}$, with $m$ indexing the $m$th primitive. Here, $R^{[m]}$ is the $m$th primitive input, whereas $Y^{[m]}$ is the $m$th primitive output.
The choice of the learning gain factor $\chi$ was proposed in [1][2][3], such that it ensures safe learning convergence. The reference input and the controlled output data from the closed-loop test of Figure 5 allow for the identification of a linear output-error (OE) approximation model $T(q)$ of the closed-loop ATCS. Furthermore, $M(q)$ is another good approximation model of the closed-loop ATCS, resulting from the model reference matching solved via VSFRT. Based on these two models, the learning gain selection problem of [1][2][3] is solved to obtain the value $\chi^{*} = 99.76$ which, for the closed-loop ATCS, is the most conservative learning gain that ensures zero tracking error in the long-term iteration domain when applying EDMFILC.
For the ATCS, two primitives are learned by EDMFILC. Experiments are performed in a room with a controlled temperature environment, ensuring strong repeatability. The first primitive is defined by the desired trajectory $y^{d}_k = 0.4 + 0.2\, e^{-(k T_s - 1000)^2/50000}$, $1 \le k \le 100$, having a Gaussian shape and lasting for 2000 s at the 20-s sampling period. The factor 0.4 defines the operating point (corresponding to 40 °C), the factor 0.2 sets the Gaussian height, the factor 1000 sets the Gaussian center and the factor 50,000 sets its time-width. The scaling factor for the upside-down flipped error in the gradient experiment is $\mu = 3$, for a maximum of 40 EDMFILC iterations.
The second learned primitive is defined by the desired trajectory $y^{d}_k = 0.4 - 0.1\, e^{-(k T_s - 1000)^2/50000}$, $1 \le k \le 100$, again having a Gaussian shape, but this time pointing downwards. All the parameters preserve the same interpretation as for the first primitive. The learning gain and the scaling factor are the same. The resulting learning history is shown in Figure 6 for 30 iterations.
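The EDMFILC iteration can be sketched in simulation for the first primitive. Several assumptions are made: the closed loop is replaced by the discretized reference model (a hypothetical stand-in for the real closed-loop ATCS, started at the 0.4 operating point), the learning gain `chi = 50` is an illustrative value rather than the $\chi^{*}$ reported above, and the gradient is computed as $\frac{2}{\mu N}\,\mathrm{udf}(Y^{j}_{G} - Y^{j})$. This expression follows from $J(R) = \frac{1}{N}\|E(R)\|_2^2$ for a causal linear time-invariant map with repeatable initial conditions, whose lifted Toeplitz matrix is transposed by flipping input and output; it is not quoted from [3].

```python
import numpy as np

Ts, N, y0 = 20.0, 100, 0.4
a = np.exp(-Ts / 500.0)
b = 1.0 - a


def run_experiment(R):
    """Hypothetical closed loop: y[k+1] = a*y[k] + b*R[k], y[0] = y0."""
    Y, y = np.empty(N), y0
    for k in range(N):
        y = a * y + b * R[k]
        Y[k] = y
    return Y


# Desired trajectory of the first primitive (Gaussian bump around 0.4).
k = np.arange(1, N + 1)
Yd = 0.4 + 0.2 * np.exp(-(k * Ts - 1000.0) ** 2 / 50000.0)

chi, mu, max_iter = 50.0, 3.0, 40   # chi is an illustrative assumption
R = Yd.copy()                       # initial reference guess
J_hist = []
for j in range(max_iter):
    Y = run_experiment(R)                     # Step 2: nominal experiment
    E = Y - Yd
    J_hist.append(np.mean(E ** 2))
    udf_E = E[::-1]                           # Step 3: upside-down flip
    YG = run_experiment(R + mu * udf_E)       # Steps 4-5: gradient experiment
    grad = (2.0 / (mu * N)) * (YG - Y)[::-1]  # Step 6: model-free gradient
    R = R - chi * grad                        # Step 7: gradient descent update

primitive = (R, run_experiment(R))  # learned pair {R, Y} for the library
print("J first / last:", J_hist[0], J_hist[-1])
```

In this sketch the two experiments per iteration are the only information used about the closed loop, which mirrors the model-free character of the procedure.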
From the implementation viewpoint, each trial requires repeatability of the initial conditions, i.e., reaching the vicinity of the initial temperature of the desired profile $y^{d}_k$, after which the data logging starts. Although the closed-loop ATCS is not perfectly linear, but rather smooth and nonlinear, the EDMFILC is applicable and robust to such behavior. It is a purely data-driven technique relying on input-output data to learn trajectory tracking by repetitions/trials. The resulting primitive pairs are $\{R^{[1]}, Y^{[1]}\}$ and $\{R^{[2]}, Y^{[2]}\}$, and are memorized in the primitives' library. They are called the original primitives. Each original primitive contains the reference input and the closed-loop ATCS controlled output from the last EDMFILC iteration. Therefore, each primitive intrinsically encodes the CLCS dynamics within its signals. These encoded dynamics will be used, although not explicitly, to predict the optimal reference ensuring tracking of new desired trajectories.

Optimal Tracking Using Primitives at the Uppermost Level L3
The final application step of the primitive-based HLF concerns the optimal reference input prediction that ensures a new desired trajectory is tracked as accurately as possible. This has to be performed without relearning the reference inputs on a trial-by-trial basis, as it was with EDMFILC. The concept has been thoroughly described in [1][2][3].
The new desired trajectory for the ATCS is $y^d_k = \min\{0.5, \max\{0.3, 0.4 + 0.05 \sin(0.002 k T_s) + 0.00001 k T_s\}\}$, $k = 1, \ldots, 400$. This trajectory's length is four times greater than the length of each of the two learned primitives (lasting for 100 samples each). A linear regression dedicated to approximation purposes is to be solved at the level L3, which is costly when the length of $y^d_k$ is large. The strategy is to divide $y^d_k$ into shorter segments (herein four), then predict and execute the tracking on each resulting segment (or subinterval). The first segment is the part of $y^d_k$ corresponding to $1 \le k \le 100$, consisting of $N = 100$ samples. However, the length of a segment does not have to equal that of a primitive, although it should be of about the same order of magnitude. After predicting the optimal reference for this first segment, the next segment from $y^d_k$ is extracted, the optimal reference input for its tracking is predicted, and so on until $y^d_k$ is entirely processed. Therefore, the discussion about how to predict the optimal reference input is detailed for a single segment. For such a segment of length $N$ (assumed even, without loss of generality), let the desired trajectory in lifted notation be $Y^d \in \mathbb{R}^{N \times 1}$. The steps enumerated below are performed.
Step 1. Extend $Y^d$ (lifted notation of $y^d_k$) to length $2N$ to get $Y^{d[e]} \in \mathbb{R}^{2N \times 1}$, by padding the leftmost $N/2$ samples with the value of the first sample of $y^d_k$ and the rightmost $N/2$ samples with the value of the last sample of $y^d_k$.
Step 2. Extend the original primitives to length $2N$ by padding as well, to obtain the extended primitives.
Step 3. A number of $M$ random copies of the extended primitives are memorized.
Step 4. Each copied primitive is delayed/advanced by an integer value $\theta_\pi$ drawn from a uniform distribution. The delayed copies are indexed as $\{R^{[\theta_\pi e]}, Y^{[\theta_\pi e]}\}$, $\pi = 1, \ldots, M$. Padding has to be used again because the delay is performed without circular shifting. To this extent, a number of $|\theta_\pi|$ samples will be padded with the value of the first or last unshifted samples from $R^{[\pi e]}$ and $Y^{[\pi e]}$, respectively.
Step 5. An output basis function matrix is built from the delayed primitive outputs as $\mathbf{Y} = [Y^{[\theta_1 e]}, \ldots, Y^{[\theta_M e]}] \in \mathbb{R}^{2N \times M}$, whose columns were extended, delayed and padded as indicated in the previous steps. The columns of $\mathbf{Y}$ serve as approximation functions for the extended and padded desired trajectory $Y^{d[e]}$, by linearly combining them as $\mathbf{Y}\beta$, $\beta = [\beta_1, \ldots, \beta_M]^T \in \mathbb{R}^{M}$.
Step 6. Find the optimal $\beta^* = \arg\min_{\beta} \|\mathbf{Y}\beta - Y^{d[e]}\|_2^2$ by solving the overdetermined linear equation system with least squares.
Step 7. Employing the linear systems superposition principle in order to obtain the optimal reference input leading to optimal tracking of $Y^{d[e]}$, we compute $R^{*[e]} = \sum_{\pi=1}^{M} \beta^*_\pi R^{[\theta_\pi e]}$. Here, $R^{*[e]} \in \mathbb{R}^{2N}$, and it was shown in Theorem 1 from [3] that $R^{*[e]}$ theoretically ensures the smallest tracking error, bounded only by the approximation error of the difference $\mathbf{Y}\beta^* - Y^{d[e]}$.
Step 8. To return to $N$-length signals, the true reference input $R^*$ is obtained by clipping the middle interval of $R^{*[e]}$. The signal $R^*$ ($\rho^*_k$ in time-based notation) is the predicted optimal reference input which is to be set as reference input to the closed-loop ATCS, to execute the tracking task on the current segment.
For the application of the previous steps to the ATCS, we used $M = 2400$ copies of the two original primitives, with the corresponding delays drawn as uniformly random integers within [−100; 100], hence spanning 200 samples for extended trajectories of 200 samples.
A secondary aspect is the constraint satisfaction addressed by the proposed primitive-based learning framework. As discussed in [1][2][3], the straightforward approach to indirectly address controlled output magnitude constraints is to enforce magnitude constraints upon the new trajectory $y^d_k$. In this case, the role of the max, min operators used within the definition of $y^d_k$ is to enforce such constraints by trajectory magnitude clipping. This is a form of soft-constraint handling [3]; the accuracy is evaluated only after executing the tracking task and may vary. Other types of constraints, such as rate constraints, may be handled similarly, in the presented indirect style. Constraints on other CLCS characteristic signals are not considered relevant, be they magnitude or rate inequality constraints: the ones on the CLCS's inputs are too "embedded" and they negatively influence the model matching achievement, whereas the ones on the reference input again affect the trajectory tracking accuracy at the CLCS's output.
The trajectory tracking results on a segment-by-segment basis are shown in Figure 7.
Figure 7. The output $y_k$ (red) when the optimal reference input ($\rho^*_k$ from $R^*$) is computed using primitives and the output $y_k$ (black) when the reference input is $\rho_k = y^d_k$. The green boxes highlight the tracking errors at the end of each segment.
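The prediction steps for a single segment can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`extend`, `shift_no_wrap`, `predict_reference`) are hypothetical, the primitives are assumed to have the same length as the segment, and the demo primitives come from a made-up first-order closed loop (`toy_closed_loop`) rather than from ATCS data. Note that `numpy.linalg.lstsq` returns the minimum-norm least-squares solution when the matrix has more columns than rows, which happens whenever the number of copies exceeds 2N.

```python
import numpy as np

def extend(x, n_pad):
    """Pad both ends with the first/last sample value, giving len(x) + 2*n_pad samples."""
    return np.concatenate([np.full(n_pad, x[0]), x, np.full(n_pad, x[-1])])

def shift_no_wrap(x, theta):
    """Delay (theta > 0) or advance (theta < 0) x without circular shifting.
    Vacated samples are padded with the first or last unshifted value."""
    if theta == 0:
        return x.copy()
    if theta > 0:
        return np.concatenate([np.full(theta, x[0]), x[:-theta]])
    return np.concatenate([x[-theta:], np.full(-theta, x[-1])])

def predict_reference(primitives, y_d, n_copies, max_delay, rng):
    """Predict the reference for one segment from (R, Y) primitive pairs.
    Assumes every primitive has the same length N as the segment y_d."""
    n = len(y_d)
    half = n // 2
    y_d_ext = extend(y_d, half)                                # Step 1
    ext = [(extend(r, half), extend(y, half)) for r, y in primitives]  # Step 2
    picks = rng.integers(0, len(ext), size=n_copies)           # Step 3
    thetas = rng.integers(-max_delay, max_delay + 1, size=n_copies)    # Step 4
    R = np.column_stack([shift_no_wrap(ext[p][0], t) for p, t in zip(picks, thetas)])
    Y = np.column_stack([shift_no_wrap(ext[p][1], t) for p, t in zip(picks, thetas)])  # Step 5
    beta, *_ = np.linalg.lstsq(Y, y_d_ext, rcond=None)         # Step 6
    r_ext = R @ beta                                           # Step 7 (superposition)
    residual = Y @ beta - y_d_ext
    return r_ext[half:half + n], residual                      # Step 8 (clip the middle)

def toy_closed_loop(r, a=0.8):
    """Hypothetical first-order closed loop, used only to generate demo primitives."""
    y = np.empty_like(r)
    y[0] = r[0]
    for i in range(1, len(r)):
        y[i] = a * y[i - 1] + (1 - a) * r[i - 1]
    return y

rng = np.random.default_rng(0)
k = np.arange(100)
bump = np.exp(-((k - 50) ** 2) / 200.0)
primitives = [(r, toy_closed_loop(r)) for r in (0.4 + 0.1 * bump, 0.4 - 0.1 * bump)]
y_d = 0.4 + 0.05 * np.sin(2 * np.pi * k / 100)
r_star, residual = predict_reference(primitives, y_d, n_copies=300, max_delay=100, rng=rng)
```

The returned `residual` is the approximation error of the extended desired trajectory by the basis of delayed primitive outputs, which is the quantity that bounds the achievable tracking error under the linearity assumption.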
Several remarks are given. The tracking accuracy is expressed as the cumulated squared tracking error divided by the number of samples, which is the common mean squared error (MSE). The MSE obtained when the reference input is optimally predicted with the primitive-based approach is $4.75 \times 10^{-4}$. The same indicator measured when the reference input is $\rho_k = y^d_k$ is $1.36 \times 10^{-3}$, nearly three times larger. This shows that the primitive-based approach effectively achieves higher tracking accuracy. Its anticipatory character is revealed in the sense that the noncausal filtering operations involved lead to a reference input which makes the CLCS respond immediately when the desired trajectory changes. Therefore, it eliminates the lagged response of the naturally low-pass CLCS.
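For reference, the accuracy indicator and the ratio quoted above can be written out explicitly. The symbol $N_{tot}$ for the total number of samples of the tracking task is introduced here only for illustration.

$$
\mathrm{MSE} = \frac{1}{N_{tot}} \sum_{k=1}^{N_{tot}} \left(y_k - y^d_k\right)^2, \qquad \frac{1.36 \times 10^{-3}}{4.75 \times 10^{-4}} \approx 2.86 .
$$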
Furthermore, the green boxes in Figure 7 highlight the tracking errors at the end time of the tracking execution on each segment. These errors do not build up, being regarded as non-zero initial conditions for the next segment tracking task, with their effect vanishing in time.
The constraint imposed on the upper-clipped magnitude of the desired trajectory in the fourth segment is not reflected very accurately in the controlled output. This, as well as the only three-fold improvement in the tracking accuracy with the primitive-based approach, is caused by the CLCS not being sufficiently linear (not perfectly matching the reference model $M(q)$). The importance of achieving high-quality model reference matching (and therefore indirect closed-loop linearization) was identified as crucial [3]. When the linearity assumption holds, the accuracy may be improved up to 100-fold, and the output constraints are thoroughly enforced. This has been reported in other applications [1][2][3]. Therefore, ensuring the low-level model reference matching (or tracking) of the CLCS is critical.

System Description
A rheostatic brake emulator is next considered as a representative case study (please refer to Figure 8 below). Such a process has wide applicability in resistive-based braking in cars, trains or in wind turbine generators [58]. Suppose there exists a variable voltage source $V_{source}$ (due to irrelevant conditions); the goal would be to regulate a constant voltage $V_{gen}$ across a section of the circuit, to ensure, e.g., a constant power delivery over some load. In practice, the (resistive) load consumer is changed accordingly, to adjust for the voltage level. By Ohm's law, keeping a constant load while changing the current achieves an equivalent effect. Hence, $V_{gen}$ is controlled by varying the current through the blue line in Figure 8, which is achieved by changing the current flow through a (power) transistor (indicated on the blue path in the figure). From a practical perspective, however, it is easier to maintain $V_{source}$ at a constant level and instead control the voltage level $V_{gen}$, to basically illustrate the same effect.
Some technical facts about the circuit are detailed. The source voltage is 9 V and a TIP31C power transistor is used (capable of up to 40 W in switching operation mode, colored red in Figure 8). The transistor's base voltage is compatible with the voltage level obtained from the PWM output of an Arduino board, in this case represented by $V_{in} \in [0; 5]$ V. Therefore, the actual control input $u_k$ will be the duty cycle that sets the variation of $V_{in}$ within its domain. MATLAB software-side processing uses the values {0, …, 255} to write the PWM port; thus, the equation deriving the voltage $V_{in}$ as a function of $u_k$ relies on the operator $\mathrm{sat}_L^H(\cdot)$, which saturates its argument within $[L; H]$, and on the value "30", an offset which ensures a voltage of $V_{in} = 3.5$ V at the transistor's base, around the linear operating point. Therefore, if $u_k$ increases, $V_{in}$ decreases, the transistor opens (starts acting as an open switch), the current decreases and $V_{gen}$ increases; the converse also holds. For our case, the domain of the non-dimensional duty cycle factor $u_k$ is [0; 1].
The voltage $V_{in}$ is low-pass-filtered through an RC stage made up of a 10 kΩ resistor and a 10 µF capacitor. Therefore, the resulting filtered output will actually drive the TIP31C transistor in its linear operation mode. Additional stage elements use a 100 µF capacitor to clean $V_{source}$ from noise and a voltage divider to reduce $V_{gen}$ to $V_{out}$, within the voltage levels [0; 5] V acceptable for the ADC input port of the Arduino. The resulting voltage level $V_{out}$ is software-processed, multiplied by two and filtered through $1/(0.2s + 1)$ to recover the original value $V_{gen}$. For the given $V_{source} = 9$ V and the other electrical components, the effect is that the controlled output is $y_k = V_{gen} \in [2; 6.7]$ V, whose level is to be controlled within its entire domain. The sampling period $T_s = 0.05$ s is suitable for data acquisition and control inference. A picture of the realized EBS hardware attached to the Arduino board is rendered in Figure 9. The system response is rather fast and subject to noise, making it challenging for all control stages. Importantly, the EBS module is fairly cheap and can be used by many practitioners.
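The numbers quoted for the two filtering stages can be checked with a short sketch. The component values, the factor of two, the filter $1/(0.2s + 1)$ and the sampling period are taken from the text. The zero-order-hold discretization, the initial condition and the function name `recover_v_gen` are assumptions made here for illustration, since the text does not state how the software filter is implemented.

```python
import math

R_OHM = 10e3     # RC stage resistor, 10 kOhm (from the text)
C_FARAD = 10e-6  # RC stage capacitor, 10 uF (from the text)
TS = 0.05        # sampling period in seconds (from the text)
TAU_SW = 0.2     # time constant of the software filter 1/(0.2 s + 1)

tau_rc = R_OHM * C_FARAD                # RC time constant in seconds (0.1 s)
f_cut = 1.0 / (2.0 * math.pi * tau_rc)  # -3 dB cutoff frequency in Hz (about 1.59 Hz)

def recover_v_gen(v_out_samples, ts=TS, tau=TAU_SW):
    """Scale V_out by two, then low-pass filter it.
    Assumption: zero-order-hold discretization, y[k+1] = a*y[k] + (1-a)*x[k]."""
    a = math.exp(-ts / tau)
    y = 2.0 * v_out_samples[0]  # assumed initial condition: filter starts settled
    out = []
    for v in v_out_samples:
        out.append(y)
        y = a * y + (1.0 - a) * 2.0 * v
    return out

v_gen = recover_v_gen([2.5] * 10)  # a constant 2.5 V at the ADC maps to 5 V
```

With these values, the software filter pole is $e^{-0.05/0.2} \approx 0.78$, so the recovered signal settles within roughly one second after a step.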

EBS Input-Output Data Collection for Learning Low-Level L1 Control Dedicated to Model Reference Tracking
A dataset of input-output samples is first measured from the EBS, to learn the level L1 controller. Exploration quality is important because it stimulates all system dynamics [58,59]. Long experiments ensure that many combinations of $u_k$ and $y_k$ are visited; however, it is of interest to accelerate the collection phase, i.e., to obtain more variation from the signals in the same unit of time. This can only be obtained with the help of a closed-loop controller which compensates the system's dynamics, ensuring faster transients. To this extent, a discrete-time version of the integral-type controller $C(s) = 1/s$ is used. The reference input driving the CLCS over the EBS is a staircase signal which switches amplitude every five seconds and whose amplitudes are uniformly random values in [2.2; 6] V. Additive stair-like noise perturbs the reference input, with a shorter switching period of 0.1 s and uniform random amplitudes within [−0.1; 0.1]. The noise's role is to further break the time correlation between successive samples. The resulting explored input-output data are depicted in Figure 10.
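The exploration reference described above can be generated as follows. This is a sketch under assumptions: the helper name `staircase`, the random seed and the record length of 2000 samples (100 s at $T_s = 0.05$ s) are illustrative choices, not values stated in the text.

```python
import numpy as np

def staircase(n_samples, hold, low, high, rng):
    """Piecewise-constant signal: a new uniform random level every `hold` samples."""
    n_levels = -(-n_samples // hold)  # ceiling division
    levels = rng.uniform(low, high, size=n_levels)
    return np.repeat(levels, hold)[:n_samples]

TS = 0.05                          # sampling period in seconds (from the text)
rng = np.random.default_rng(1)
n = 2000                           # assumed experiment length in samples

# Main staircase: switches every 5 s, amplitudes uniform in [2.2, 6] V.
base = staircase(n, hold=round(5.0 / TS), low=2.2, high=6.0, rng=rng)
# Stair-like noise: switches every 0.1 s, amplitudes uniform in [-0.1, 0.1].
noise = staircase(n, hold=round(0.1 / TS), low=-0.1, high=0.1, rng=rng)
reference = base + noise
```

At the given sampling period, the main staircase holds each level for 100 samples and the noise holds each level for 2 samples.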
A nonlinear long short-term-memory (LSTM) recurrent neural network controller $u_k = C(s^{ext}_k)$ is learned as suggested in [12], based on the input-output controller data sequence $\{s^{ext}_k = [s_k^T, \rho_k]^T, u_k\}$, which is calculated offline according to the VSFRT principle, after measuring $\{u_k, y_k\}$. The controller's LSTM network is modelled in discrete time, based on the LSTM cell as

$$
\begin{aligned}
i_k &= \mathrm{logsig}(W_i s^{ext}_k + R_i h_{k-1} + b_i), \\
f_k &= \mathrm{logsig}(W_f s^{ext}_k + R_f h_{k-1} + b_f), \\
g_k &= \tanh(W_g s^{ext}_k + R_g h_{k-1} + b_g), \\
o_k &= \mathrm{logsig}(W_o s^{ext}_k + R_o h_{k-1} + b_o), \\
c_k &= f_k \otimes c_{k-1} + i_k \otimes g_k, \\
h_k &= o_k \otimes \tanh(c_k),
\end{aligned}
$$

where $\mathrm{logsig}(\cdot)$ is the sigmoid function applied element-wise, $\tanh(\cdot)$ is the hyperbolic tangent applied element-wise, $\otimes$ multiplies vectors element-wise, $h_k \in \mathbb{R}^{n_h}$ is the hidden LSTM state of size $n_h$ at time step $k$, $c_k \in \mathbb{R}^{n_h}$ is the LSTM cell state at step $k$, $s^{ext}_k \in \mathbb{R}^{\xi}$ is the exogenous input sequence of size $\xi$, $i_k \in \mathbb{R}^{n_h}$ is from the input gate, $f_k \in \mathbb{R}^{n_h}$ is from the forget gate, $g_k \in \mathbb{R}^{n_h}$ is the cell candidate and $o_k \in \mathbb{R}^{n_h}$ is from the output gate. $W_j \in \mathbb{R}^{n_h \times \xi}$, $j \in \{i, f, g, o\}$ are the cell input weights, $R_j \in \mathbb{R}^{n_h \times n_h}$, $j \in \{i, f, g, o\}$ are the cell recurrent weights and $b_j \in \mathbb{R}^{n_h \times 1}$, $j \in \{i, f, g, o\}$ are the cell offsets. The LSTM network output is $y^{LSTM}_k = W_y h_k + b_y \in \mathbb{R}^{n_{LSTM}}$; it depends linearly on $h_k$ through the network output weights $W_y \in \mathbb{R}^{n_{LSTM} \times n_h}$, $b_y \in \mathbb{R}^{n_{LSTM} \times 1}$. Here, $\pi = \{W_j, R_j, b_j, W_y, b_y\}$, $j \in \{i, f, g, o\}$ collates all the trainable elements of the LSTM network.
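A single forward step of such a controller can be sketched in NumPy, assuming the standard LSTM cell update with the gates, weights and offsets named as in the text. The function names and the small random weights are placeholders for illustration only; a trained controller would use the learned parameters.

```python
import numpy as np

def logsig(x):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(s_ext, h_prev, c_prev, W, R, b):
    """One LSTM cell update. W, R, b are dicts keyed by gate name: 'i', 'f', 'g', 'o'."""
    pre = {j: W[j] @ s_ext + R[j] @ h_prev + b[j] for j in "ifgo"}
    i, f, o = logsig(pre["i"]), logsig(pre["f"]), logsig(pre["o"])
    g = np.tanh(pre["g"])          # cell candidate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

XI, N_H, N_LSTM = 8, 10, 1         # sizes quoted in the text
rng = np.random.default_rng(0)
W = {j: 0.1 * rng.standard_normal((N_H, XI)) for j in "ifgo"}   # placeholder weights
R = {j: 0.1 * rng.standard_normal((N_H, N_H)) for j in "ifgo"}
b = {j: np.zeros(N_H) for j in "ifgo"}
W_y = 0.1 * rng.standard_normal((N_LSTM, N_H))
b_y = np.zeros(N_LSTM)

s_ext = rng.standard_normal(XI)    # placeholder regressor [s_k^T, rho_k]^T
h, c = lstm_step(s_ext, np.zeros(N_H), np.zeros(N_H), W, R, b)
u = W_y @ h + b_y                  # controller output u_k
```

In closed-loop operation, `h` and `c` would be carried over from one sampling instant to the next, which is what gives the controller its memory.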
The cell input weights are initialized with the Xavier algorithm, the cell recurrent weights are initialized orthogonally, and the offsets are all zero except for $b_f$, which are set to one. $W_y$ is also initialized with Xavier, whereas $b_y$ are all zero at first. The Adam algorithm is used for training for a maximum of 1000 epochs, over minibatches of 64 elements, with an initial learning rate of 0.01 and a gradient clipping threshold of 5; 80% of the dataset is used for training and the remaining 20% is used for validation after every 10 epochs. The training loss is the mean squared error (MSE) with an $L_2$ weight regularization factor of $10^{-4}$.
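The initialization scheme can be illustrated with NumPy. This sketch assumes the uniform variant of Xavier (Glorot) initialization and a QR-based orthogonal initializer; the text does not state which variants were used, and the helper names and the `TRAINING_OPTIONS` dictionary are hypothetical containers for the settings listed above.

```python
import numpy as np

def xavier_uniform(rows, cols, rng):
    """Glorot uniform initialization: U(-limit, limit), limit = sqrt(6 / (rows + cols))."""
    limit = np.sqrt(6.0 / (rows + cols))
    return rng.uniform(-limit, limit, size=(rows, cols))

def orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix keeps q orthogonal

XI, N_H, N_LSTM = 8, 10, 1
rng = np.random.default_rng(0)
W = {j: xavier_uniform(N_H, XI, rng) for j in "ifgo"}  # cell input weights
R = {j: orthogonal(N_H, rng) for j in "ifgo"}          # cell recurrent weights
b = {j: np.zeros(N_H) for j in "ifgo"}                 # cell offsets
b["f"] = np.ones(N_H)                                  # forget-gate offsets set to one
W_y = xavier_uniform(N_LSTM, N_H, rng)
b_y = np.zeros(N_LSTM)

# Training settings quoted in the text, collected for convenience.
TRAINING_OPTIONS = {
    "optimizer": "adam", "max_epochs": 1000, "minibatch_size": 64,
    "initial_learning_rate": 0.01, "gradient_clip_threshold": 5.0,
    "l2_regularization": 1e-4, "train_fraction": 0.8, "validate_every_epochs": 10,
}
```

Setting the forget-gate offsets to one biases the cell towards retaining its state early in training, which is a common choice for LSTM networks.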
The VSFRT LSTM-based recurrent neural network controller is found after the next steps are applied in order [58].
Step 1. Define the observability index $\tau = 3$ and construct the trajectory $\{u_k, y_k, s_k\}$, where $s_k = [y_k, y_{k-1}, y_{k-2}, y_{k-3}, u_{k-1}, u_{k-2}, u_{k-3}]^T \in \mathbb{R}^7$ is a virtual state for the EBS. Using the discrete time index $k = 1, \ldots, N$, 1997 tuples of the form $\{u_k, y_k, s_k\}$ are built.
Step 2. The virtual reference input is obtained as $\rho_k = M^{-1}(q) y^f_k$, where $y^f_k$ is the low-pass-filtered version of $y_k$ through the filter $0.1/(1 - 0.9q^{-1})$, which is needed because $y_k$ is noisy.
Step 3. Construct the regressor state as $s^{ext}_k = [s_k^T, \rho_k]^T \in \mathbb{R}^8$.
Step 4. Parameterize the LSTM-based VSFRT controller $u_k = C(s^{ext}_k, \pi)$ with $\xi = 8$, $n_h = 10$, $n_{LSTM} = 1$. Initialize the controller network parameters according to the settings. The VSFRT goal is to ensure model reference matching by indirectly solving the controller identification problem [1,4,12,53] which, in fact, means training the LSTM network with input sequences $\{s^{ext}_k\}$ and output sequences $\{u_k\}$ in order to minimize the mean squared prediction errors with weight regularization.
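The data preparation in Steps 1 to 3 can be sketched as below. Several elements are assumptions and are not taken from the text: the reference model is replaced by a hypothetical first-order $M(q) = (1-m)q^{-1}/(1 - mq^{-1})$ with pole `m_pole`, the record length $N = 2000$ is chosen because it reproduces the 1997 tuples, the filter initial condition is arbitrary, and the input-output data are synthetic placeholders. Because $M^{-1}(q)$ is noncausal, this sketch loses one more sample at the end of the record.

```python
import numpy as np

def lowpass(y, a=0.9):
    """Filter 0.1 / (1 - 0.9 q^-1): yf[k] = a * yf[k-1] + (1 - a) * y[k]."""
    yf = np.empty(len(y))
    yf[0] = y[0]  # assumed initial condition
    for k in range(1, len(y)):
        yf[k] = a * yf[k - 1] + (1 - a) * y[k]
    return yf

def virtual_states(u, y, tau=3):
    """Rows s_k = [y_k, ..., y_{k-tau}, u_{k-1}, ..., u_{k-tau}] for k = tau, ..., N-1."""
    rows = []
    for k in range(tau, len(y)):
        past_y = [y[k - d] for d in range(tau + 1)]
        past_u = [u[k - d] for d in range(1, tau + 1)]
        rows.append(past_y + past_u)
    return np.array(rows)

def virtual_reference(yf, m_pole):
    """Invert the hypothetical M(q): rho[k] = (yf[k+1] - m * yf[k]) / (1 - m)."""
    return (yf[1:] - m_pole * yf[:-1]) / (1.0 - m_pole)

rng = np.random.default_rng(0)
N = 2000                          # assumed record length
u = rng.uniform(0.0, 1.0, N)      # synthetic placeholder for the measured input
y = rng.uniform(2.0, 6.7, N)      # synthetic placeholder for the measured output

S = virtual_states(u, y)                          # shape (N - 3, 7)
rho = virtual_reference(lowpass(y), m_pole=0.8)   # shape (N - 1,)
s_ext = np.column_stack([S[:-1], rho[3:]])        # regressors for k = 3, ..., N - 2
targets = u[3:-1]                                 # controller outputs to be predicted
```

The pairs (`s_ext`, `targets`) are what a controller identification step would consume, regardless of whether the controller is linear-affine or an LSTM network.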
Following the previous steps, the resulting LSTM-based VSFRT controller is tested in closed loop against a linear-affine VSFRT controller $u_k = K^T s^{ext}_k$ (this time with $s^{ext}_k$ including an extra feature "1" to model the affine term), as learned in [58]. The results are shown in Figure 11. The superiority of the nonlinear, recurrent LSTM controller is clear in terms of smaller errors and fewer oscillations at higher setpoint values, where the EBS changes its character. The reason is that LSTM is better for learning long-term dependencies from time series. Subsequently, the EBS closed loop is considered to be sufficiently linearized to match $M(q)$; hence, the level L2 learning phase is next attempted.
Figure 11. Closed-loop control test for the EBS, with the linear-affine VSFRT controller from [58] vs. the proposed LSTM-based controller. The reference model's output is green, the actual closed-loop output is orange with the linear-affine controller and magenta with the LSTM controller.

Intermediate L2 Level Primitives Learning with EDMFILC
For the EBS, the two primitives are learned by the same EDMFILC procedure that was also applied for the ATCS.
The first primitive is defined by the desired trajectory $y^d_k = \zeta_1 + \zeta_2 e^{-(k \cdot T_s - \zeta_3)^2/\zeta_4}$, $1 \le k \le 200$, having a Gaussian shape and lasting for 8 s at the 0.05 s sampling period, starting after 2 s during which the closed-loop system stabilizes its output at 3 V [59]. The factor $\zeta_1 = 3$ defines the 3 V operating point offset level, the factor $\zeta_2 = 1.5$ sets the Gaussian magnitude, the factor $\zeta_3 = 6$ fixes the Gaussian center and the factor $\zeta_4 = 0.5$ fixes its time-width. The scaling factor for the upside-down flipped error in the gradient experiment is $\mu = 3$, for a maximum of 10 EDMFILC iterations. The learning gain is safely chosen as $\chi^* = 153.7$ based on the procedure from Equation (11) (using $M(q)$ and an identified first-order OE model $0.1227q^{-1}/(1 - 0.8797q^{-1})$), in order to guarantee learning convergence [3]. The learning process of the first primitive is shown in Figure 12.
The second learned primitive is defined by the desired trajectory $y^d_k = \zeta_1 - \zeta_2 e^{-(k \cdot T_s - \zeta_3)^2/\zeta_4}$, $1 \le k \le 200$ ($\zeta_1 = 3$, $\zeta_2 = 1$, $\zeta_3 = 6$, $\zeta_4 = 0.5$), again having a Gaussian shape, but this time pointing downwards. All the parameters preserve the same interpretation as for the first primitive. The learning gain and the scaling factor are the same. The resulting learning history is shown in Figure 13 for 10 iterations.
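The two Gaussian-shaped desired trajectories can be generated directly from the expressions and parameter values given above. The function name `gaussian_trajectory` is an illustrative choice; the downward-pointing primitive is obtained by passing a negative magnitude.

```python
import numpy as np

TS = 0.05  # sampling period in seconds (from the text)

def gaussian_trajectory(z1, z2, z3, z4, n=200, ts=TS):
    """y_d[k] = z1 + z2 * exp(-(k*ts - z3)^2 / z4) for k = 1, ..., n."""
    t = np.arange(1, n + 1) * ts
    return z1 + z2 * np.exp(-((t - z3) ** 2) / z4)

# First primitive: upward Gaussian of magnitude 1.5 V centered at 6 s.
y_d_1 = gaussian_trajectory(3.0, 1.5, 6.0, 0.5)
# Second primitive: downward Gaussian of magnitude 1 V centered at 6 s.
y_d_2 = gaussian_trajectory(3.0, -1.0, 6.0, 0.5)
```

Both trajectories stay practically at the 3 V operating point during the first 40 samples (2 s), peak at 4.5 V and dip to 2 V, respectively, which keeps them inside the controllable output range of [2; 6.7] V.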
From a practical validation viewpoint, all normal and gradient EDMFILC experiments start when the controlled voltage output reaches 3 V, which is the operating point. For this reason, only $N = 160$ samples are actual primitives (Figures 12 and 13), and the first 40 samples out of the 200 allow for the EBS closed loop to reach the operating point.
Although challenging, the noisy closed-loop EBS is capable of learning the primitives under the linearity assumption about the closed-loop, even in a low signal-to-noise ratio environment.
The resulting original primitives are $\{R^{[1]}, Y^{[1]}\}$ and $\{R^{[2]}, Y^{[2]}\}$, and they are memorized in the primitives' library. Each original primitive contains the reference input and the closed-loop EBS controlled output measured at the last EDMFILC iteration. In the next step, the final level L3 learning occurs, where the original primitives will be used to predict the optimal reference input which ensures that a new desired trajectory is optimally tracked, without having to relearn tracking by EDMFILC trials.
[Learning-history figure caption, partially recovered] Intermediate trajectories throughout the learning process are in grey.

Optimal Tracking Using Primitives at the Uppermost Level L3
The new desired trajectory at the EBS's output is $y^d_k = \min\{4, \max\{2.5, 3 + \sin(1.8 k T_s) + 0.05 k T_s\}\}$ for $k = 1, \ldots, 640$, which is right-shifted with a left padding of value 3 for the first 40 samples. Thus, $y^d_k$ is four times longer than each of the two learned primitives (each one has 160 samples). The strategy is to divide $y^d_k$ into four segments and to predict and execute the tracking on each resulting segment (or subinterval). The first segment is the part of $y^d_k$ corresponding to $1 \le k \le 160$, consisting of $N = 160$ samples. After predicting the optimal reference for this first segment, it is set as the reference input to the closed-loop EBS, and the trajectory tracking is performed. At the end, the next segment from $y^d_k$ is extracted, its corresponding optimal reference input is predicted and fed to the closed loop, tracking is executed, and so on.
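The clipped desired trajectory and its division into four segments can be reproduced with the sketch below. The expression, the sampling period and the segment length are from the text; the right shift with left padding mentioned above is not modelled here, since its exact alignment is not recoverable from the description.

```python
import numpy as np

TS = 0.05     # sampling period in seconds (from the text)
N_SEG = 160   # segment length, equal to the primitive length (from the text)

k = np.arange(1, 641)
raw = 3.0 + np.sin(1.8 * k * TS) + 0.05 * k * TS
# Magnitude constraints enforced by clipping with the min/max operators.
y_d = np.minimum(4.0, np.maximum(2.5, raw))

# Four consecutive segments of 160 samples, processed one after another.
segments = y_d.reshape(4, N_SEG)
```

Each row of `segments` would then be passed, in turn, to the reference prediction procedure, with the closed loop executing the tracking before the next segment is processed.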
The approach is similar to the first case study of the ATCS. The subsequent steps for the EBS are performed in this order [59]:
Step 1. Extend $Y^d$ (lifted notation of $y^d_k$) to length $2N$ (320 samples in this case) to get $Y^{d[e]} \in \mathbb{R}^{2N \times 1}$, by padding the leftmost $N/2$ samples with the value of the first sample of $y^d_k$ and the rightmost $N/2$ samples with the value of the last sample of $y^d_k$.
Step 2. Extend the original primitives to length $2N$ by padding as well, to obtain the extended primitives.
Step 3. A number of $M$ random copies of the extended primitives are memorized.
Step 4. Each copied primitive is delayed/advanced by an integer value $\theta_\pi$ drawn from a uniform distribution. The delayed copies are indexed as $\{R^{[\theta_\pi e]}, Y^{[\theta_\pi e]}\}$, $\pi = 1, \ldots, M$. Padding has to be used again because the delay is performed without circular shifting. To this extent, a number of $|\theta_\pi|$ samples will be padded with the value of the first or last unshifted samples from $R^{[\pi e]}$ and $Y^{[\pi e]}$, respectively.
Step 5. An output basis function matrix is built from the delayed primitive outputs as $\mathbf{Y} = [Y^{[\theta_1 e]}, \ldots, Y^{[\theta_M e]}] \in \mathbb{R}^{2N \times M}$.
Step 6. The optimal weights $\beta^* = \arg\min_{\beta} \|\mathbf{Y}\beta - Y^{d[e]}\|_2^2$ are found by solving the overdetermined linear equation system with least squares.
Step 7. The reference input ensuring that $Y^{d[e]}$ is optimally tracked is computed as $R^{*[e]} = \sum_{\pi=1}^{M} \beta^*_\pi R^{[\theta_\pi e]}$. Here, $R^{*[e]} \in \mathbb{R}^{2N}$.
Step 8. To return to $N$-length signals, the useful reference input $R^*$ is obtained by clipping the middle interval of $R^{*[e]}$. The signal $R^*$ ($\rho^*_k$ in time-based notation) is the predicted optimal reference input which is to be set as the reference input to the closed-loop EBS, to execute the tracking task on the current segment.
Dividing longer desired trajectories into segments lowers the complexity of solving for the regression coefficients $\beta^*$. This matters because the number of coefficients is in the hundreds and the problem does not scale well with the desired trajectory's length (more equations are added to the overdetermined linear system).
Again, magnitude constraints on the desired trajectory $y^d_k$ are enforced by clipping it with the max and min operators. The final trajectory tracking results, on a segment-by-segment basis, are shown in Figure 14, averaged over four runs.
Figure 14. The output $y_k$ (red) when the optimal reference input ($\rho^*_k$ from $R^*$) is computed using primitives, and the output $y_k$ (black) when the reference input is $\rho_k = y^d_k$. The green boxes highlight the tracking errors at the end of each segment. The red and black trajectories are averaged over four runs.
The tracking accuracy is again measured using the MSE index. With the reference input optimally predicted using primitives, the MSE is $5.12 \times 10^{-3}$. With the reference input $\rho_k = y^d_k$, the MSE is $1.03 \times 10^{-2}$, which is about two times larger. This shows that the primitive-based approach achieves higher tracking accuracy. Its anticipatory character stems from the noncausal filtering operations involved, which lead to a reference input that makes the CLCS respond immediately when the desired trajectory changes. Therefore, it eliminates the lagged response of the naturally low-pass CLCS.
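The comparison can be checked with a short sketch. The `mse` helper is hypothetical and assumes the index is the mean of the squared tracking errors over all samples; the two numerical values are the ones reported above.

```python
def mse(y, y_d):
    """Mean squared tracking error between output y and desired y_d
    (assumed definition: average of squared sample-wise errors)."""
    if len(y) != len(y_d):
        raise ValueError("signals must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_d)) / len(y)

# Values reported in the text for the EBS case study.
mse_primitives = 5.12e-3  # reference input predicted from primitives
mse_direct = 1.03e-2      # reference input set to the desired trajectory
ratio = mse_direct / mse_primitives  # roughly a two-fold difference
```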
The tracking errors at the end of each segment do not build up after the segment tracking episodes. They are seen as non-zero initial conditions for the next segment tracking task, and their effect vanishes relatively rapidly.
The upper and lower constraints imposed on the desired trajectory do not transfer accurately to the controlled EBS output, although the result is still better than with the reference input $\rho_k = y^d_k$. Both this and the merely two-fold improvement in tracking accuracy with the primitive-based approach are caused by the closed loop not being sufficiently linear (it does not perfectly match the reference model $M(q)$). The importance of achieving high-quality model reference matching (and therefore indirect linearization of the closed loop) is again emphasized.

Conclusions
The proposed hierarchical primitive-based learning framework has been validated on two affordable lab-scale nonlinear systems. At the lowest level (level L1), VSFRT was employed to learn linear-affine or nonlinear LSTM-like virtual state-feedback neuro-controllers dedicated to linear model reference matching. The learning phase relies on the assumption that the nonlinear controlled system is observable, and builds a virtual state-space representation from present and past input-output data. Hence, VSFRT is purely data-driven and overcomes the unavailability of the dynamical system's model.
At the secondary level, EDMFILC shows resilience to smooth closed-loop nonlinearity and high-amplitude noise, despite being based on linearity assumptions. The ATCS and EBS are mono-variable (SISO) systems; however, EDMFILC has been shown to be equally effective on multivariable (MIMO) systems. In fact, the number of gradient experiments has been reduced to one, regardless of the number of control channels (complexity reduced from $O(N^3)$ to $O(N^2)$ [3]). EDMFILC should be used whenever the output primitive shape is not desirable for approximation purposes in the L3 phase. Hence, the purely data-driven, model-free level L2 learning phase has a lesser impact on the final tracking quality. The anticipative character of the final tracking response, which is due to the non-causal filtering operation, is the qualitative trait of the EDMFILC.
The uppermost L3 learning phase is based on the primitives obtained after sequentially applying the L1 and L2 learning phases. The final tracking performance and constraint satisfaction critically depend on the quality of the level L1 model reference matching. To this end, most efforts should be concentrated on successful level L1 learning.
Besides demonstrating the effectiveness of the primitive-based HLF, this work has shown how machine learning techniques can help improve and extend the scope of classical control system techniques. Further validation on applications of a more diverse nature will test the framework's ability to endow CSs with some of the intelligent features of living organisms, based on memorization, learnability, generalization, adaptation and robustness.