Coordination of Lateral Vehicle Control Systems Using Learning-Based Strategies

Abstract: The paper proposes a novel learning-based coordination strategy for lateral control systems of automated vehicles. The motivation of the research is to improve the performance level of the coordinated system compared to the conventional model-based reconfigurable solutions. During vehicle maneuvers, the coordinated control system provides torque vectoring and front-wheel steering angle in order to guarantee the various lateral dynamical performances. The performance specifications are guaranteed on two levels, i.e., primary performances are guaranteed by Linear Parameter Varying (LPV) controllers, while secondary performances (e.g., economy and comfort) are maintained by a reinforcement-learning-based (RL) controller. The coordination of the control systems is carried out by a supervisor. The effectiveness of the proposed coordinated control system is illustrated through high-velocity vehicle maneuvers.


Introduction and Motivation
The design of intelligent automated road vehicles has required the automation of several vehicle, road and transportation systems in the last decade. This automation has been made possible by the increased number of sensors and information sources. Although the automation of the individual systems can improve the performance of road vehicles, the advantages of automation can be achieved through the effective coordination of the subsystems. From the aspect of automated vehicles, at least three levels of coordination can be defined: the level of vehicle control systems, the level of vehicle control and human driving, and the level of vehicle and transportation system. This paper focuses on coordination on the level of the vehicle control system, but the motivations of all of these coordination tasks are briefly introduced below.
Coordination on the level of vehicle control systems means the integration of smart actuators, which have been developed to achieve various automated vehicle functionalities. For example, the maneuvering of the vehicle can be carried out through automated steering, torque vectoring, differential braking and variable-geometry suspension. Despite the similarities in their functionalities, the operational capability and the cost of each intervention can differ. In the literature, various methods for achieving safe and cost-effective coordination have been developed. The most important challenges are the handling of nonlinearities and uncertainties, the provision of an energy-optimal actuator selection strategy and the assessment of performance issues in automated systems. Reconfigurable robust parameter-varying methods can provide an effective solution for the design of local controllers in coordinated control systems [1]. Through the scaling of the parameter-dependent weighting functions in the robust control design, the joint intervention of the actuators is feasible and the performance of the control intervention can be guaranteed. Moreover, the reconfiguration can provide fault-tolerant operation for the vehicle [2,3]. However, a drawback of reconfiguration-based coordination is that it can be difficult to formulate a provably energy-optimal actuator selection strategy [4]. A further challenge for model-based vehicle control and coordination design is to find coordination methods for vehicle control systems with which the advantages of the model-based and the learning-based methods are preserved while their drawbacks are eliminated, i.e., a minimum performance level is ensured while the achievable performance level is maximized. In [24] an iterative learning-based model predictive control (MPC) method is proposed, in which the terminal cost and the terminal set of the model-based control are learned.
Another example is found in [25], where the Linear Parameter-Varying (LPV) method is used to design a safe controller that operates alongside the learning-based controller. Moreover, learning features can have an impact on the formulation of control-oriented state-space models, see e.g., [26]. In spite of these promising results, a design method for vehicle system coordination that jointly uses learning-based and model-based methods cannot be found, to the best of the authors' knowledge.
The goal of this paper is to provide a coordination method for vehicle control systems on the vehicle level, which is able to combine the advantages of learning-based and model-based approaches. Thus, the proposed method guarantees a minimum performance level on selected control performances, while an increased maximum performance level can be achieved during the operation of the coordinated control system. The coordination method is presented in the context of lateral control design for automated vehicles. The contribution of the paper is a control design method for steering and torque vectoring, with which the energy-optimal coordination of the actuation is achieved, while the minimum level of selected safety performances is guaranteed by the design method. The design is based on the joint application of the robust LPV and the reinforcement learning (RL) methods. The aim of the robust LPV method is to guarantee the minimum level of selected safety performances, and the role of the RL method is to maximize the level of all performances.
The paper introduces the design process as follows. In Section 2 the performance specifications for lateral vehicle control and their incorporation in the control framework are presented. The design of the robust LPV control and the RL-based control is proposed in Section 3. The effectiveness of the resulting coordinated control system is illustrated in Section 4. Finally, in Section 5 the results are summarized and some future challenges regarding the proposed method are discussed.

Performance Specifications and Control Framework
Control performance specifications have high importance in the coordination of the actuators on the vehicle level. There can be strict performance specifications, which must be guaranteed during the entire operation of the vehicle control system. In the context of automated vehicle control, these are the safety performance specifications, e.g., the limitations on the path or velocity tracking errors. Strict performance specifications can also result from the physical limitations of the actuators, such as the maximum torque of an actuator or the limited achievable steering angle. In this paper, the group of strict performances is called primary performances.
Furthermore, there is another type of performance specification, which is recommended to be maintained but can be violated in critical situations during the operation of the control system. Since primary performances have priority, in these situations the controller focuses on guaranteeing them. In the case of automated vehicles, these performance specifications are typically the economy and comfort specifications, e.g., the minimization of the energy consumption or the maximization of the traveling comfort. The group of these performances is called secondary performances.
This paper focuses on the coordinated control design of lateral vehicle interventions, i.e., the coordination of steering and torque vectoring interventions. The automated steering intervention is achieved on the front wheels of the vehicle without human driver actuation. The torque vectoring is considered to be realized by differential electric driving, also on the front wheels. It is also considered that the rear wheels can be driven, and thus the velocity tracking functionality of the vehicle is achieved through traction force compensation by the same forces on the rear wheels. On this basis, the performance specifications for the coordinated control system are formed in the context of the actual problem.
The primary performance specifications in the design process of the control system are as follows.
• The goal of the coordinated control system is to guarantee the tracking of a predefined path for the automated vehicle. For safety reasons the tracking error must be limited, so that the keeping of the actual lane can be guaranteed. Thus, a primary performance for the control design is formed as
$$z_1 = y_{ref} - y, \tag{1}$$
where $z_1$ represents the definition of the performance, $y_{ref}$ is the requested reference of the path and $y$ is the lateral position of the vehicle. The specification on (1) is formed as
$$|z_1| \leq z_{1,\max}, \tag{2}$$
where $z_{1,\max}$ represents the predefined maximum path tracking error.
• Due to the physical limits of the steering control intervention, the steering angle on the front wheels must be limited. Thus, a further primary performance $z_2$ is defined as
$$z_2 = \delta, \tag{3}$$
where $\delta$ is the steering angle on the front wheel, and the specification is formed as
$$|z_2| \leq z_{2,\max}, \tag{4}$$
where $z_{2,\max}$ represents the maximum of the steering intervention.
• The limitation of torque vectoring has at least two reasons. First, the electric-driven wheels have limitations on the torque actuation, which means that the intervention has physical limits. Second, the driving torque is limited in order to avoid wheel skidding, i.e., to limit the longitudinal slip. Therefore, the achievable torque value from the torque vectoring must be limited, which leads to the definition of performance $z_3$, such as
$$z_3 = M_{vect}, \tag{5}$$
where $M_{vect}$ represents the torque around the vertical axis through the vehicle center of gravity, which results from the differential driving on the front wheels. The primary performance specification is formed as
$$|z_3| \leq z_{3,\max}, \tag{6}$$
where $z_{3,\max}$ is the limitation on the torque vectoring. Since it is necessary to avoid the skidding of the wheel, the selection of $z_{3,\max}$ can depend on the operation of the vehicle, e.g., $z_{3,\max}$ can be different for a conventional passenger car and for an off-road vehicle. In the proposed method $z_{3,\max}$ is considered to be a constant value during the operation of the vehicle, and its selection can be influenced by the operating circumstances of the vehicle.
The secondary specifications in the design of the coordinated control are related to the economy and comfort requirements.

• Due to energy management aspects on the vehicle level, the minimization of the control interventions is recommended. It is related to the performances $z_2$ and $z_3$, such as
$$|z_2| \rightarrow \min, \qquad |z_3| \rightarrow \min. \tag{7}$$
The minimization of $|z_2|$ and $|z_3|$ are not independent of each other. Since the lateral motion of the vehicle can be carried out through both $\delta$ and $M_{vect}$, it is necessary to find a balance between their interventions. In spite of the similarities between the actuation of $\delta$ and $M_{vect}$, they can also have different impacts. The intervention of $M_{vect}$ modifies the longitudinal slip on the front wheels and it can have an influence on the longitudinal dynamics. Moreover, the intervention through $\delta$ can require less electric power, but the steering of the front wheels can also modify the longitudinal dynamics slightly. The role of the coordination is to find an optimal balance between the interventions of $\delta$ and $M_{vect}$, which can be a difficult task through purely model-based principles.
• Comfort has high importance in the operation of automated vehicle control systems, because it is directly relevant to the passengers. The lateral control systems can improve the traveling comfort through the minimization of the lateral jerk [27], such as
$$z_4 = \dot{a}_y, \tag{8}$$
where $z_4$ represents the definition of the performance on the jerk and $a_y = \ddot{y}$ is the lateral acceleration of the vehicle. Thus, the performance specification is the minimization of $z_4$, such as
$$|z_4| \rightarrow \min. \tag{9}$$
The primary and secondary performance specifications show that some performances overlap. In (4) and in (6) the absolute values of $\delta$ and $M_{vect}$ are limited, while in (7) their absolute values are minimized. The primary performance specifications are formed as hard constraints, while the secondary specifications are soft constraints. However, it is possible to handle all of these constraints in the model-based control design. For example, in the case of an MPC design, the cost function contains the secondary specifications and the optimization constraints contain the primary specifications. In the reconfigurable robust LPV design of this paper the minimization in (7) is incorporated in the control design, and the constraints (4) and (6) are incorporated in the coordination strategy through the actuator selection method [4]. Nevertheless, the coordination in (7) can be difficult in model-based strategies. Moreover, the extension of the vehicle dynamics with the lateral jerk can also cause difficulties in the control design due to the increase of the state vector. Therefore, maintaining the secondary performances motivates the tuning of the coordinated control through RL.
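The performance signals defined in this section can be evaluated from logged simulation data. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the jerk is approximated by a finite difference of the sampled lateral acceleration, and the limit values used in any call are placeholders rather than values from the paper.

```python
import numpy as np

def performances(y_ref, y, delta, m_vect, a_y, ts):
    """Return z1..z4 as arrays for signals sampled with period ts (s)."""
    z1 = np.asarray(y_ref, dtype=float) - np.asarray(y, dtype=float)  # path tracking error
    z2 = np.asarray(delta, dtype=float)    # steering angle
    z3 = np.asarray(m_vect, dtype=float)   # torque-vectoring yaw moment
    # Lateral jerk approximated by a finite difference of a_y (assumption).
    z4 = np.gradient(np.asarray(a_y, dtype=float), ts)
    return z1, z2, z3, z4

def primary_ok(z1, z2, z3, z1_max, z2_max, z3_max):
    """True if the hard limits on |z1|, |z2|, |z3| hold at every sample."""
    return bool(
        np.all(np.abs(z1) <= z1_max)
        and np.all(np.abs(z2) <= z2_max)
        and np.all(np.abs(z3) <= z3_max)
    )
```

The secondary performances are deliberately not checked against limits here, since they are minimized rather than bounded.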
In the rest of this section a control design framework is proposed, with which primary and secondary performances can be considered through different techniques. Since the primary performances have priority over further performances, and they must be guaranteed during the entire operation of the system, the primary performances are considered in the robust LPV coordinated control design, because it provides theoretical guarantees on the achieved primary performance level [1]. Thus, (2) is incorporated in the control design and (4) and (6) in the coordination strategy. Although the LPV design also involves the secondary performance specifications (7), they can be effectively incorporated in the RL-based control design. Thus, the role of the RL-based controller is to improve the primary performance (2) and the secondary performances (7) and (9).
The framework for the coordinated control design is illustrated in Figure 1. In the first step of the control design process, the robust LPV controllers for the steering and the torque vectoring interventions are designed together with the actuator selection coordination strategy. Then, in the second step, the multiple-output learning-based agent is trained through a number of episodes in an RL process. In the training process, the LPV-based coordinated control and the supervisor are used. The role of the supervisor is to guarantee that the control input vector $u = [\delta, M_{vect}]^T$ is always inside a predefined neighborhood of $u_{LPV} = [\delta_{LPV}, M_{vect,LPV}]^T$. Thus, the output of the RL-based controller $u_{RL} = [\delta_{RL}, M_{vect,RL}]^T$ is not able to violate the primary performance specifications.

Design of the Elements in the Control Framework
The goal of this section is to propose the design of the elements in the control framework of Figure 1, i.e., the supervisor, the coordinated LPV controller and the RL-based controller.

Formulation of the Supervisory Strategy
Firstly, the formulation of the supervisory strategy is proposed, which can help to understand the concept behind the design of the two control elements. The goal of the supervisor is to provide the control input $u = [\delta, M_{vect}]^T$ based on the signals of the RL-based and the robust LPV controllers.
The concept of the design is that the robust LPV-based coordinated control is able to guarantee the primary performance specifications and some of the secondary performance specifications under all vehicle dynamic scenarios. Thus, $u = u_{LPV}$ can be a suitable input for the automated vehicle. Nevertheless, the level of the secondary performances can be improved with $u = u_{RL}$ under several vehicle dynamic scenarios. Therefore, it is necessary to find a supervisory strategy with which the benefits of both $u_{LPV}$ and $u_{RL}$ can be achieved.
The idea behind the supervisory strategy is that $u$ can differ from $u_{LPV}$ within a limited domain $\Delta_{\max} = [\Delta_{\max,\delta}, \Delta_{\max,M}]^T$, and the primary performances are guaranteed in the entire domain. Thus, if $u_{RL}$ lies inside this domain around $u_{LPV}$, the selection of $u = u_{RL}$ is acceptable, because the primary performance specifications are guaranteed, and the secondary performances can be improved.
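The supervisory selection strategy for $u$, which is embedded in the supervisor, is formed as an optimization problem, such as
$$\min_{\Delta} \; (u - u_{RL})^T W (u - u_{RL}), \qquad W = \begin{bmatrix} 1 & 0 \\ 0 & Q \end{bmatrix}, \tag{10a}$$
$$\text{subject to} \quad -\Delta_{\max} \leq \Delta \leq \Delta_{\max}, \tag{10b}$$
where the role of the scalar $Q$ is to unify steering and torque vectoring actuation due to their different units and values,
$$u = u_{LPV} + \Delta, \tag{11}$$
and where $\Delta$ represents the actual bounded difference of $u$ from $u_{LPV}$. The optimization problem (10) expresses that $u$ should track the control input signal $u_{RL}$, but the tracking must be limited by $\Delta_{\max}$ to preserve the guarantees on the primary performance specifications. Using (11), the cost function in (10a) can be transformed as
$$(u_{LPV} + \Delta - u_{RL})^T W (u_{LPV} + \Delta - u_{RL}) = \Delta^T W \Delta + 2 (u_{LPV} - u_{RL})^T W \Delta + (u_{LPV} - u_{RL})^T W (u_{LPV} - u_{RL}), \tag{12}$$
which shows that only the first and the second terms depend on the optimization variable $\Delta$. Thus, the third term can be omitted during the optimization, which means that the control input selection strategy in the supervisor is formulated as a constrained optimization problem, such as
$$\min_{\Delta} \; \Delta^T W \Delta + 2 (u_{LPV} - u_{RL})^T W \Delta \tag{13a}$$
$$\text{subject to} \quad -\Delta_{\max} \leq \Delta \leq \Delta_{\max}. \tag{13b}$$
The result of the optimization is $\Delta$, from which $u$ is computed through (11). The optimization problem is solved online during the operation of the control system.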
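The supervisory selection can be illustrated with a small Python sketch. It assumes that the cost is a squared distance between u and u_RL with a positive diagonal weighting and that the bound on ∆ is an elementwise box constraint; under these assumptions the problem separates per actuator and the minimizer is obtained by clipping u_RL − u_LPV to ±∆_max, so no numerical solver is needed. The function name `supervise` and the numerical values in the usage example are hypothetical.

```python
import numpy as np

def supervise(u_lpv, u_rl, delta_max):
    """Select u = u_lpv + delta with |delta| <= delta_max (elementwise).

    Assumes a positive diagonal weighting in the cost, for which the
    constrained minimizer is the projection of (u_rl - u_lpv) onto the box.
    """
    u_lpv = np.asarray(u_lpv, dtype=float)
    u_rl = np.asarray(u_rl, dtype=float)
    delta_max = np.asarray(delta_max, dtype=float)
    delta = np.clip(u_rl - u_lpv, -delta_max, delta_max)
    return u_lpv + delta, delta

# Usage: steering stays inside the domain, torque vectoring saturates.
u, delta = supervise([0.02, 500.0], [0.025, 2000.0], [0.01, 1000.0])
```

Under the stated diagonal-weighting assumption the weight only rescales each actuator's term and does not change the minimizer; a weighting with off-diagonal terms would require a general quadratic programming solver instead.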

Introduction to Robust LPV-Based Design for Coordinated Lateral Vehicle Control
The role of the robust LPV-based coordinated control is to provide the control signal $u_{LPV}$, with which the primary performance specifications are guaranteed during the entire operation of the system. The LPV-based control design is based on the existing results in the field of vehicle control coordination, see [1,4]. Thus, in this paper, the concept of the LPV-based coordinated control design is only briefly introduced. Furthermore, the extension of the existing results concerning the proposed design framework in Figure 1 is presented.
The LPV-based coordinated control design is based on a reconfiguration method, which contains local controllers for each actuator and an actuator selection strategy. Each local controller, e.g., for the steering and torque vectoring functionalities, is designed based on the dynamic lateral model of the vehicle [28], such as
$$J\ddot{\psi} = C_1 l_1 \alpha_1 - C_2 l_2 \alpha_2 + M_{vect}, \qquad m v (\dot{\psi} + \dot{\beta}) = C_1 \alpha_1 + C_2 \alpha_2, \tag{14}$$
where $J$, $m$ are the inertia and mass values of the vehicle, $l_1$, $l_2$ are the distances between the center of gravity and the front/rear axles, and $C_1$, $C_2$ are the cornering stiffness values on the front and the rear axles. The front/rear lateral slip values are formulated as
$$\alpha_1 = \delta - \beta - \frac{l_1 \dot{\psi}}{v}, \qquad \alpha_2 = -\beta + \frac{l_2 \dot{\psi}}{v},$$
where $\dot{\psi}$ is the yaw-rate, $\beta$ is the side-slip of the vehicle and $v$ represents the longitudinal velocity. The dynamic formulation of the lateral vehicle motion can be transformed to a state-space representation, such as
$$\dot{x} = A(v) x + B(v) u, \tag{15}$$
where the state vector is $x = [\dot{\psi}, \beta, \dot{y}, y]^T$ and $A(v)$, $B(v)$ are velocity-dependent matrices.
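To make the velocity dependence of the model concrete, the following Python sketch builds A(v) and B(v) for a linear single-track model with yaw-rate, side-slip, lateral velocity and lateral position as states and with steering angle and yaw moment as inputs. It is a sketch under assumptions, not the paper's implementation: the standard linear slip relations are used, the lateral acceleration is taken as v(ψ̇ + β̇), and all default parameter values are hypothetical placeholders.

```python
import numpy as np

def lateral_model(v, m=1500.0, J=2500.0, l1=1.2, l2=1.5, C1=8.0e4, C2=9.0e4):
    """Return A(v), B(v) for x = [yaw rate, side-slip, dy/dt, y], u = [delta, M_vect].

    All default parameters are illustrative placeholders (SI units).
    """
    k = C2 * l2 - C1 * l1
    # Yaw dynamics: J * d(yaw rate)/dt = C1*l1*alpha1 - C2*l2*alpha2 + M_vect
    row_yaw = [-(C1 * l1**2 + C2 * l2**2) / (J * v), k / J, 0.0, 0.0]
    # Side-slip dynamics from m*v*(yaw rate + d(beta)/dt) = C1*alpha1 + C2*alpha2
    row_beta = [k / (m * v**2) - 1.0, -(C1 + C2) / (m * v), 0.0, 0.0]
    # Lateral acceleration, assumed equal to v*(yaw rate + d(beta)/dt)
    row_acc = [k / (m * v), -(C1 + C2) / m, 0.0, 0.0]
    # Lateral position is the integral of lateral velocity
    row_pos = [0.0, 0.0, 1.0, 0.0]
    A = np.array([row_yaw, row_beta, row_acc, row_pos])
    B = np.array([
        [C1 * l1 / J, 1.0 / J],
        [C1 / (m * v), 0.0],
        [C1 / m, 0.0],
        [0.0, 0.0],
    ])
    return A, B

A, B = lateral_model(v=20.0)  # matrices at 20 m/s
```

Evaluating the function on a grid of velocities gives the family of linear models over which a velocity-scheduled controller would be designed.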
In the design of the robust LPV controllers the variation between $u_{LPV}$ and $u$ is considered as a robustness issue. The bound of the variation of the control inputs $\Delta_{\max}$ is built in the control design as an input additive uncertainty, which transforms (15) through (11) as
$$\dot{x} = A(v) x + B(v) u_{LPV} + B(v) \Delta, \tag{16}$$
where $\Delta$ is a disturbance signal. In the LPV-based design, $\Delta$ is handled as an unknown uncertainty. In practice, $\Delta$ is computed by the supervisor, but this information is not used in the design. A future challenge of the LPV-based design can be the handling of $\Delta$ as a known uncertainty, with which the conservativeness of the resulting controller can be reduced. Nevertheless, in the current design process the bound of $\Delta$, i.e., $\Delta_{\max}$, is incorporated in the weighting function on the signal $\Delta$.
In the design of the local controllers for the $\delta_{LPV}$ and $M_{vect,LPV}$ interventions, the primary performance of lateral error $z_1$ and the secondary performance specifications on the minimization of $z_2$, $z_3$, i.e., the control interventions, are considered. The primary performance specification on $z_1$ (2) is guaranteed through a minimization, such as
$$|z_1| \rightarrow \min. \tag{17}$$
Although (17) is an objective in the control design, while (2) is a constraint, in the practice of the LPV-based design the controller can be effectively tuned to achieve a suitable approximation. In the weighting strategy of the LPV-based control design, it is possible to incorporate the specification $z_{1,\max}$ on $|z_1|$ through a weighting function. Although the hard constraint is transformed to a soft constraint, in practice the original performance specification on $z_1$ can be guaranteed. The details on the selection strategy of the weighting function are found in [1,4]. Thus, the control design problem contains the minimization-based performance specifications (17) and (7).
The reconfiguration strategy between the control interventions requires the possibility to scale the control interventions individually. It is carried out by parameter-dependent weighting functions on the performances $z_2$ and $z_3$. It means that scheduling variables $\rho_\delta$ and $\rho_M$ are defined, whose values can vary between 0 and 1. For example, if $\rho_\delta = 0$ is selected then the steering intervention is deactivated, while $\rho_\delta = 1$ represents full activation and the selection of $0 < \rho_\delta < 1$ is related to partial activation. In the reconfiguration strategy each intervention can be independently activated through $\rho_\delta$, $\rho_M$.
Thus, two independent controllers are designed, one related to $\delta_{LPV}$ and another related to $M_{vect,LPV}$. In the case of the steering controller, two scheduling variables are incorporated in the design, namely $v$ and $\rho_\delta$, and in the case of torque vectoring the scheduling variables are $v$ and $\rho_M$. In the rest of this paper, the set of the scheduling variables is represented with $\rho$, but it has different meanings for each controller. Similarly, in the case of the steering control design $M_{vect,LPV}$, $\Delta_M$ are not considered, and in the design of the torque vectoring control $\delta_{LPV}$, $\Delta_\delta$ are omitted. Therefore, $w$ represents $\Delta_\delta$ in the steering control design and it represents $\Delta_M$ for torque vectoring. The measured signal of both controllers is the same, namely the lateral error of the vehicle $e_y = y_{ref} - y$.
The goal of the LPV-based control design is to guarantee the quadratic stability of the closed-loop system. Simultaneously, the induced $\mathcal{L}_2$ norm $\gamma$ from the disturbance $w$ to $z = [z_1, z_2, z_3]^T$ is guaranteed to be less than 1. The stability and the performance level of the closed-loop system are guaranteed by the design procedure [29,30]. The control design leads to the selection of a parameter-varying controller $K(\rho, e_y)$, whose output vector contains the steering and torque vectoring interventions. It results in a minimization problem, such as
$$\inf_{K(\rho, e_y)} \; \sup_{\rho} \; \sup_{\|w\|_2 \neq 0, \, w \in \mathcal{L}_2} \frac{\|z\|_2}{\|w\|_2}, \tag{18}$$
where the results of the two independent minimization tasks are the steering controller $K(\rho_\delta, v, e_y)$ and the torque vectoring controller $K(\rho_M, v, e_y)$.
The coordination of the resulting LPV controllers is achieved by an actuator selection strategy, which is mathematically represented by the function $F$. The outputs of $F$ are $\rho_\delta$ and $\rho_M$, which are the scheduling variables of each controller. The inputs of $F$ are the interventions $\delta$, $M_{vect}$ and further vehicle dynamic signals, e.g., the longitudinal velocity or the slip values of the front wheels. The primary performance specifications (4) and (6) are guaranteed through $F$. For example, if $\delta_{LPV}$ is close to $\delta_{\max}$, then $\rho_\delta$ is reduced, which results in less activation of the steering intervention and thus, (4) is guaranteed. The advantage of incorporating the longitudinal slip values in the actuator selection strategy is that the skidding of the wheels can be avoided. If the longitudinal slips are monitored, the torque intervention on the wheels can be reduced through the decrease of $\rho_M$, which leads to reduced longitudinal slip.
The form of the function $F$ can be determined through model-based analysis and through economic and empirical considerations. A method for the analysis of the vehicle dynamics to formulate a coordination strategy can be found in [4]. For example, under normal vehicle dynamic conditions the steering intervention can be preferred over torque vectoring for economy reasons, which is expressed through the selection of $\rho_\delta$ and $\rho_M$. In that study, the response speed of the interventions has also been examined and it has been stated that differential braking and torque vectoring can be advantageous at high velocities. Formally, $F$ can be formed through piecewise linear functions, whose results are the actual $\rho_\delta$, $\rho_M$ values. The method for the selection of $F$ and the details on the mathematical formulation of the empirical considerations can be found in [1].
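A piecewise linear scheduling rule of the kind described above can be sketched in Python as follows. The shape of the function, the breakpoint `start` and the function name are hypothetical illustrations; the actual form of F is given in [1] and is not reproduced here. The sketch keeps the scheduling variable at 1 while the intervention is well below its limit and reduces it linearly to 0 as the limit is approached.

```python
import numpy as np

def rho_from_usage(u, u_max, start=0.8):
    """Hypothetical piecewise linear scheduling variable in [0, 1].

    Returns 1 while |u| <= start * u_max, then decreases linearly,
    reaching 0 at |u| = u_max (and staying 0 beyond it).
    """
    ratio = abs(u) / u_max
    return float(np.interp(ratio, [0.0, start, 1.0], [1.0, 1.0, 0.0]))

# Usage: steering at 90 % of its limit is only partially activated.
rho_delta = rho_from_usage(0.09, 0.1)
```

A rule driven by longitudinal slip instead of actuator usage could be built in the same way, with the slip magnitude on the horizontal axis.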
The results of the LPV-based coordinated control design are two independent steering and torque vectoring controllers and the actuator selection strategy $F$ for their coordination. The coordinated control strategy is able to provide guarantees on the primary performance specifications, and some of the secondary performances are maintained. However, during the LPV-based coordinated control design, it can be difficult to formulate and improve some secondary performances, e.g., in this paper, $z_4$ is not improved by the LPV design. Moreover, the control design can result in conservativeness in the operation of the vehicle control system, e.g., it can be difficult to manage the advanced reduction of the control interventions. Furthermore, the formulation of $F$ can contain several simplifications, which can result in a reduced performance level on secondary performances. This motivates the extension of the control strategy with the RL-based coordinated design.

Design of RL-Based Coordinated Control
The goal of the RL-based coordinated control is to improve the secondary performances, i.e., the comfort and economy performances. In this paper, it is achieved through a neural network with the output $u_{RL}$. The neural network is trained via an RL process through several episodes.
In the training process, the previously designed robust LPV-based coordinated control and the supervisor are incorporated. The structure of the RL process is shown in Figure 2. The environment for the training process, i.e., the vehicle with LPV control and supervisor, provides a guaranteed minimum performance level on primary performances, independently of $u_{RL}$. The goal of the RL process is to improve the achievable maximum performance for all performances. The measured signals of the RL-based controller are path tracking error, lateral jerk and yaw-rate. The tuning process of the parameters in the multiple-output neural network is based on a reward function $r$, which contains some primary and secondary performances, such as
$$r = \sum_{k=1}^{n} \left( Q_1 z_1^2(k) + Q_2 z_2^2(k) + Q_3 z_3^2(k) + Q_4 z_4^2(k) \right), \tag{19}$$
where the negative scalars $Q_1$, $Q_2$, $Q_3$ and $Q_4$ are design parameters, which scale the importance of each term and the balance between them. In (19) the value $n$ represents the number of samples of a given episode and $k$ is the sample index. The reasons for selecting the terms of $r$ are the following.
• The performance specification on $z_1$ is guaranteed by the robust LPV-based coordinated control, which leads to a guaranteed minimum performance level on $z_1$. However, it is beneficial to include $z_1$ in the reward, because the maximization of the further performances ($z_2$, $z_3$, $z_4$) in $r$ alone can lead to a $u_{RL}$ signal that would often violate (2). To avoid the violation, the optimization in the supervisor then results in $u \neq u_{RL}$. It means that, due to the saturation of $\Delta$ by $\Delta_{\max}$, a $\Delta$ with which $u - u_{RL}$ is close to zero can rarely be found. Consequently, the benefits of the RL-based controller, i.e., the improved performance level on $z_2$, $z_3$, $z_4$, can often be lost. Therefore, the incorporation of $z_1$ in $r$ is recommended.
• The minimization of the control interventions is a secondary performance requirement, see (7). The balance between the steering and torque vectoring interventions is set by the weights $Q_2$ and $Q_3$. To find adequate control interventions, a high number of episodes with various vehicle dynamic scenarios is performed during the training process. Through the training under the various scenarios, the intervention capabilities of the actuators are explored, and this experience is built into the RL-based controller. It provides a significant advantage for the coordination of the interventions compared to the actuator selection strategy in the LPV-based design, where $F$ results from simplified relations.
• The minimization of the lateral jerk is a performance specification (9), which is incorporated only in the formulation of the RL-based controller. Thus, the resulting controller is able to improve the comfort criterion compared to the LPV-based coordinated controller.
The reward function (19) shows that its maximum value is zero, which is reached if all of the quadratic terms are zero. Nevertheless, zero is a theoretical maximum, because it is not possible to reduce all performances to zero at the same time. Thus, the maximum reward leads to the best achievable maximum performance level. The selection of the $Q_i$ values for achieving the maximum performance level has two aspects. First, it is necessary to select $Q_i$ values with which the different performances can be compared. For example, the path tracking error is around ±0.03 m, while $M_{vect}$ is between ±5000 Nm. Therefore, it is necessary to select a high $|Q_1|$ value for $z_1^2$ and a low $|Q_3|$ value for $z_3^2$. Second, the selection of $Q_i$ must express the relative importance of each performance, with which the priorities can be guaranteed.
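The scaling argument can be checked numerically with a short Python sketch of the episode reward, written as a sum over the samples of negatively weighted squared performances. The function name is hypothetical; the default weights are the design parameters reported in the simulation section of the paper.

```python
import numpy as np

# Weights Q1..Q4 as reported in the simulation section of the paper.
Q_DEFAULT = (-1.0, -10.0, -1e-10, -5000.0)

def episode_reward(z1, z2, z3, z4, q=Q_DEFAULT):
    """Sum over all samples of Q1*z1^2 + Q2*z2^2 + Q3*z3^2 + Q4*z4^2."""
    z = np.vstack([z1, z2, z3, z4]).astype(float)   # shape (4, n)
    weights = np.asarray(q, dtype=float)[:, None]   # shape (4, 1)
    return float(np.sum(weights * z**2))

# Contribution of a 0.03 m tracking error versus a 5000 Nm yaw moment.
r_track = episode_reward([0.03], [0.0], [0.0], [0.0])
r_torque = episode_reward([0.0], [0.0], [5000.0], [0.0])
```

With these weights a single sample of 0.03 m tracking error and a single sample of 5000 Nm yaw moment contribute terms of the same order of magnitude, which is the comparability that the first selection aspect asks for.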

Figure 2. Structure of the RL process (blocks: RL-based controller; vehicle with LPV control and supervisor; tuning process. Signals: $u_{RL}$, $e_y$, $\dot{a}_y$, $\dot{\psi}$, reward).
The goal of the reinforcement learning process is to maximize the reward (19) during the episodes. In this paper the training process is carried out through a deep deterministic policy gradient (DDPG) method, which is a model-free, online, off-policy RL method, see [31]. A DDPG agent is an actor-critic reinforcement learning agent that computes an optimal policy, which maximizes the long-term reward. In the applied method, actor and critic approximators are used. Both approximators use the observations $e_y$, $\dot{a}_y$, $\dot{\psi}$, $v$, which are represented by $S$. The purpose of the actor approximator $\mu(S)$ is to find the action $A$, i.e., $u_{RL}$, which maximizes the long-term future reward. The role of the critic $Q(S, A)$ is to estimate the expected value of the long-term future reward for the task.
The result of the learning process is an RL-based controller, which is able to maximize the performance level of $z_1$, $z_2$, $z_3$, $z_4$ through its control intervention $u_{RL}$. The resulting neural network can be implemented directly in the control structure in Figure 1.
Note that an advantage of the proposed RL-based design method is that the controlled system can be used even during the training process, because the primary performances are guaranteed in every episode. Nevertheless, the maximum performance level of the system can be low at the beginning of the training. As a consequence, the method provides the capability to improve the performance level of the system during the entire life cycle of the automated vehicle. It means that after an initial learning phase the vehicle system can be operated and, simultaneously, the actuation, reward and observation signals can be logged, which can serve for further training. During service occasions the RL-based controller can be updated, which can lead to the improvement of the performance level. However, the elaboration of the entire logging, training and updating process, together with its infrastructural and cyber-security concerns, is a future challenge, which is out of the scope of this paper.

Illustration of the Control Efficiency
In this section, the effectiveness of the proposed control algorithm is illustrated through simulation examples. The examples present vehicle dynamic scenarios, in which the proposed coordinated control structure is used. It is compared with further simulations, which are related to the preliminary results of the learning, i.e., the training has been stopped at a given episode. The goal of the illustration is to show that the secondary performances, especially $z_4$, can be improved through the proposed coordinated control structure, while the primary performance specifications are simultaneously guaranteed. Thus, accurate path tracking with limited coordinated control intervention and with improved traveling comfort can be achieved.
During the design of the control system, two optimization problems must be offline solved. The design of the robust LPV controller requires the offline solution of the optimization problem (18), which is carried out through an LPV Toolbox for Matlab, see [32]. The design of the controller requires low computation time, e.g., under 30 s. The other optimization process is required by reinforcement learning. It can have high computation time, because it requests the running of a high number of scenarios. Thus, it highly depends on the complexity of the vehicle dynamic model and the traffic environment. In the recent paper the performing of 120 scenarios together with the optimization process between each scenario requests around 1 h with Matlab 2020a Reinforcement Learning Toolbox [33] on Intel i7 CPU.
In the learning process of the RL-based controller, neural networks with the following structures have been trained. The actor network has six neurons in the input layer, three fully connected layers with 48 neurons and Rectified Linear Unit (ReLU) functions in each layer, and three neurons with hyperbolic tangent functions in the output layer. The critic network has the same structure, but it has a further input, namely the action itself in the previous step. The sampling time in each episode is set to $T = 0.01$ s and 120 episodes are carried out. The terms in the reward function are considered with the same design parameters in every episode, namely $Q_1 = -1$, $Q_2 = -10$, $Q_3 = -10^{-10}$ and $Q_4 = -5000$.
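The actor structure described above can be sketched as a plain forward pass. This is an illustrative sketch in Python, not the Matlab implementation used for the results; the weight initialization, the function names and the sample observation are assumptions, and the six input neurons are assumed to carry six observed signals. A critic with the same hidden structure and an additional action input is omitted.

```python
import math
import random

def dense(n_in, n_out, rng):
    """Create one fully connected layer: small random weights, zero biases (assumed init)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def relu(v):
    return max(0.0, v)

def forward(layer, x, activation):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def build_actor(rng, n_obs=6, n_hidden=48, n_act=3):
    """6 inputs, three hidden layers of 48 neurons, 3 outputs (sizes from the text)."""
    sizes = [n_obs, n_hidden, n_hidden, n_hidden, n_act]
    return [dense(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def actor(layers, observation):
    """ReLU on the hidden layers, hyperbolic tangent on the output layer."""
    x = observation
    for layer in layers[:-1]:
        x = forward(layer, x, relu)
    return forward(layers[-1], x, math.tanh)

def count_parameters(layers):
    """Number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

rng = random.Random(0)                      # fixed seed, only for repeatability
actor_layers = build_actor(rng)
sample_observation = [0.1, -0.2, 0.0, 0.3, 0.05, -0.1]   # hypothetical observation
action = actor(actor_layers, sample_observation)
n_params = count_parameters(actor_layers)
```

Because of the hyperbolic tangent output, each of the three actions lies in [-1, 1]; any scaling to physical actuator ranges is outside this sketch.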
The 40 s long scenarios for the training are generated as follows. The longitudinal velocity of the vehicle is selected in the form of a sinusoidal signal, whose bias, amplitude and frequency values are selected randomly. The velocity of the vehicle can vary between 30 and 130 km/h and its frequency can be between 0.01 and 2 Hz, which covers both slight motion and powerful maneuvering scenarios. The reference signal $y_{ref}$ for the vehicle is composed as a complex signal, which contains chirp, step and ramp signal elements. In the scenarios, the amplitude of the step signal and the slope of the ramp have also been selected randomly in order to cover a high variety of signals.
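The scenario generation described above can be sketched as follows. The sketch is a hedged illustration only: the way the chirp, step and ramp elements are concatenated, the segment boundaries, the chirp frequencies and the ranges of the random step amplitude and ramp slope are assumptions, since the text specifies only the velocity range, the frequency range, the scenario length and the sampling time.

```python
import math
import random

T = 0.01          # sampling time [s], from the text
DURATION = 40.0   # scenario length [s], from the text
N_SAMPLES = int(round(DURATION / T)) + 1

def velocity_profile(rng, v_min=30.0, v_max=130.0, f_min=0.01, f_max=2.0):
    """Sinusoidal velocity [km/h] with random bias, amplitude and frequency."""
    amplitude = rng.uniform(0.0, (v_max - v_min) / 2.0)
    # The bias is drawn so that bias +/- amplitude stays inside [v_min, v_max].
    bias = rng.uniform(v_min + amplitude, v_max - amplitude)
    frequency = rng.uniform(f_min, f_max)
    return [bias + amplitude * math.sin(2.0 * math.pi * frequency * k * T)
            for k in range(N_SAMPLES)]

def reference_signal(rng, chirp_amp=1.0, f0=0.05, f1=0.5, t_step=15.0, t_ramp=27.0):
    """Reference built from a chirp, then a step, then a ramp (assumed composition)."""
    step_height = rng.uniform(-2.0, 2.0)   # hypothetical range
    ramp_slope = rng.uniform(-0.5, 0.5)    # hypothetical range
    y = []
    for k in range(N_SAMPLES):
        t = k * T
        if t < t_step:
            # Linear chirp: the frequency sweeps from f0 to f1 over the chirp segment.
            phase = 2.0 * math.pi * (f0 * t + 0.5 * (f1 - f0) / t_step * t * t)
            y.append(chirp_amp * math.sin(phase))
        elif t < t_ramp:
            y.append(step_height)
        else:
            y.append(step_height + ramp_slope * (t - t_ramp))
    return y

rng = random.Random(1)                 # fixed seed, only for repeatability
v_kmh = velocity_profile(rng)
y_ref = reference_signal(rng)
```

Drawing the bias after the amplitude keeps every generated velocity profile inside the 30 to 130 km/h band without clipping the sinusoid.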
The achieved value of the reward function in each scenario is illustrated in Figure 3. It can be seen that the value of the cumulative reward increases significantly by the end of the training process, i.e., from −100 to −7. The illustration shows the convergence of the function, even though reduced values also occur during the training process.
The effectiveness of the resulting control algorithm is presented on a comparative example. In the example two scenarios are compared, i.e., the robust LPV-based coordinated control and the proposed learning-based coordination (LPV and LPV-RL in the legends of the figures, respectively). The vehicle moves along a curvy trajectory, and at the end of the scenarios a hook motion is performed, see Figure 4a. The velocity of the vehicle is varied during the simulation in a high range between 80 and 150 km/h, see Figure 4b. The primary performance $z_1$, i.e., the tracking error of the path, can be evaluated through Figure 4c. It shows that the error has low values in both scenarios and that the performance is not degraded by the RL-based controller, which means that the minimum performance level $|z_1| < 0.2$ m is guaranteed. At the same time, the secondary performance $z_4$, i.e., the minimization of the lateral jerk, is improved. Figure 4d shows that the peak values of the jerk signal are reduced by the proposed LPV-RL controller by around 15%, see e.g., the section between 0 and 400 m. Thus, the results in Figure 4 show that the proposed coordinated control design algorithm has a guaranteed minimum primary performance level on the tracking error, while the secondary performance on the jerk is improved due to the training.
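The peak-jerk comparison above can be reproduced from logged lateral acceleration signals along the following lines. This is a minimal sketch under stated assumptions: the jerk is approximated by a forward difference with the sampling time of the simulations, and the two acceleration traces are synthetic placeholders (the second is simply a scaled copy of the first), not data from the paper.

```python
import math

T = 0.01  # sampling time [s], from the text

def lateral_jerk(a_y, dt=T):
    """Approximate the jerk as the forward difference of the lateral acceleration."""
    return [(a1 - a0) / dt for a0, a1 in zip(a_y[:-1], a_y[1:])]

def peak(signal):
    """Largest absolute value of a signal."""
    return max(abs(s) for s in signal)

def peak_reduction_percent(baseline, improved):
    """Relative reduction of the peak value with respect to the baseline, in percent."""
    return 100.0 * (peak(baseline) - peak(improved)) / peak(baseline)

# Synthetic placeholder signals, for illustration only.
t = [k * T for k in range(1001)]
a_lpv = [3.0 * math.sin(2.0 * math.pi * 0.5 * tk) for tk in t]
a_lpv_rl = [0.85 * a for a in a_lpv]

reduction = peak_reduction_percent(lateral_jerk(a_lpv), lateral_jerk(a_lpv_rl))
```

With real logs, the two lists would be replaced by the lateral acceleration of the LPV and LPV-RL simulations over the same road section.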
The control interventions are illustrated in Figure 5. The steering and torque vectoring interventions for both scenarios are found in Figure 5a,b. It can be seen that the signals of $\delta$ are close to each other, but $\Delta\delta$ (Figure 5c) can cause abrupt changes in the steering signal. The torque vectoring intervention differs in the two scenarios. Due to $\Delta M$ (Figure 5d) the intervention $M_{vect}$ is increased by around 2000 Nm, but in both scenarios the limit $z_{3,\max} = 5000$ Nm is not violated, similarly to the primary performance specification on $z_2$. As a result of the increased torque vectoring intervention, $|z_1|$ is kept below $z_{1,\max}$, see Figure 4a, while the abrupt changes in $\delta$ and $M_{vect}$ play a role in the reduction of the jerk.
In the rest of this section, the results of another vehicle dynamic scenario are presented. In this example the vehicle travels along a simplified road section with a constant velocity of 50 km/h. Moreover, in this example the reward function (19) is extended with a further comfort factor, i.e., a term on the lateral acceleration $a_y$, where $z_5 = a_y$ and the weight $Q_5 < 0$ is related to the minimization of the lateral acceleration. The motivation for considering $z_5$ is that several comfort objectives incorporate lateral jerk and lateral acceleration simultaneously, see e.g., ISO 2631 [34] and the UIC ride quality note [35]. Thus, in this example a new training process for achieving the RL-based controller has been performed. The cumulative reward is illustrated in Figure 6. It shows that the reward has an increasing tendency with reducing variation, which is the consequence of the improvement of the agent during the training process.
Some vehicle dynamic signals of the second simulation scenario are found in Figure 7. In this example, the vehicle travels along the curvy road section illustrated in Figure 7a. The lateral errors with the LPV-based controller and with the proposed learning-based coordinated controller are found in Figure 7b.
It can be seen that in this scenario the reduction of the lateral acceleration (Figure 7c) and of the lateral jerk (Figure 7d) requires an increased $e_y$. Nevertheless, the value of $e_y$ is acceptable with respect to the performance requirements on $z_1$. The control inputs $\delta$ and $M_{vect}$ are shown in Figure 7e,f.
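The effect of adding the lateral acceleration term to the reward in this second example can be sketched as follows. The exact form of the reward function (19) is not reproduced in this section, so the weighted sum of squared performances used here is an assumption, as are the value of $Q_5$ (the text states only $Q_5 < 0$), the sample performance values and the function name. The weight $Q_3$ is read here as $-10^{-10}$.

```python
# Weights of the first four terms as given in the text (Q_3 read as -1e-10).
Q_BASE = {1: -1.0, 2: -10.0, 3: -1e-10, 4: -5000.0}

# Hypothetical value: the text only states that Q_5 is negative.
Q5_ASSUMED = -1.0

def step_reward(z, weights):
    """Assumed reward form: sum of weights[i] * z[i]**2 over the weighted terms."""
    return sum(q * z[i] ** 2 for i, q in weights.items())

# Extended weight set including the lateral acceleration term z_5 = a_y.
Q_EXTENDED = dict(Q_BASE)
Q_EXTENDED[5] = Q5_ASSUMED

# Hypothetical performance values at one time step, for illustration only.
z_sample = {1: 0.1, 2: 0.02, 3: 1000.0, 4: 0.5, 5: 2.0}

r_base = step_reward(z_sample, Q_BASE)
r_extended = step_reward(z_sample, Q_EXTENDED)
```

Under this assumed form, any nonzero lateral acceleration lowers the extended reward relative to the original one, which is how the agent would be pushed to trade some tracking accuracy for reduced acceleration.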
The results of the simulation are compared through the factors of the international standard ISO 2631 and the UIC ride quality note. The computation of the factor in ISO 2631 is based on a frequency-weighted root mean square of the lateral acceleration data [34]. The computation of the UIC ride quality note is based on the statistics of the lateral acceleration and lateral jerk signals, i.e., their 50th and 95th percentiles. The comparison of the results shows that a 15% reduction of the ISO factor and a 53% reduction of the UIC factor can be achieved. This improvement stems from the reduction of the lateral acceleration at the end of the simulation scenario. Although it leads to an increased lateral tracking error, the error remains limited through the design of the LPV-based controller.
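The signal statistics underlying these comfort factors can be sketched as follows. The sketch is deliberately simplified and should not be read as an implementation of either standard: the root mean square is computed without the ISO 2631 frequency weighting, the percentiles use the nearest-rank definition on absolute values, and the combination of the percentiles into the UIC note is not reproduced. The function names are assumptions.

```python
import math

def rms(signal):
    """Unweighted root mean square (ISO 2631 additionally applies a frequency weighting)."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def percentile(signal, p):
    """Nearest-rank percentile of the absolute values, with p in (0, 100]."""
    ordered = sorted(abs(s) for s in signal)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]

def comfort_statistics(a_y, jerk):
    """Collect the statistics named in the text for one simulation run."""
    return {
        "a_rms": rms(a_y),
        "a_p50": percentile(a_y, 50),
        "a_p95": percentile(a_y, 95),
        "jerk_p50": percentile(jerk, 50),
        "jerk_p95": percentile(jerk, 95),
    }

def relative_reduction_percent(baseline, improved):
    """Reduction of a comfort factor with respect to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline
```

Applied to the logged lateral acceleration and jerk of the LPV and LPV-RL runs, `relative_reduction_percent` gives the kind of percentage comparison reported above, once the full standard-specific factors are computed from these statistics.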

Conclusions
This paper proposed a novel method for the coordination of vehicle control systems. The effectiveness of the design method was presented through simulation examples. The comparison of the scenarios illustrated that the resulting coordinated control is able to improve the secondary performances of the controlled system while the minimum primary performance level of the system is guaranteed. The resulting vehicle control system is able to operate with an increased performance level under high velocity and powerful maneuvering, which is achieved through the variety of training scenarios.
The proposed coordinated control design framework raises several future challenges in the field of automated vehicle control. The framework contains fixed LPV controllers and a fixed coordination strategy, which means that these elements are unchangeable during the training process of the RL-based controller. Nevertheless, it might be fruitful to modify the LPV-based controller and the coordination strategy through a parallel learning process. For example, the parameters of the control-oriented model can be adapted to their real values, and thus the results of this paper and of [26] can be combined. Another example of the extension is to provide a training process for the setting of $\Delta_{\max}$, which can be formed as a variable, see the results of [25]. Moreover, the variety of application fields also provides future challenges. Learning-based control elements are also needed on further coordination levels, e.g., on the level of human-vehicle intervention coordination and in the coordination of the automated vehicle and the transportation system. For example, advanced transportation control systems contain several data-driven prediction and learning-based route selection algorithms, which have an impact on the actuator intervention on the vehicle level. Moreover, the coupling effect between the different vehicle dynamics, e.g., lateral and longitudinal tire forces, motivates the coordinated design of several vehicle control subsystems. Thus, the proposed coordinated control design framework can bring advances in various application fields.