Optimization-Based Reference Generator for Nonlinear Model Predictive Control of Legged Robots

Model Predictive Control (MPC) approaches are widely used in robotics, since they guarantee feasibility and allow the computation of updated trajectories while the robot is moving. They generally require heuristic references for the tracking terms and proper tuning of the parameters of the cost function in order to obtain good performance. For instance, when a legged robot has to react to disturbances from the environment (e.g., recover after a push) or track a specific goal with statically unstable gaits, the effectiveness of the algorithm can degrade. In this work, we propose a novel optimization-based Reference Generator which exploits a Linear Inverted Pendulum (LIP) model to compute reference trajectories for the Center of Mass while taking into account the possible under-actuation of a gait (e.g., in a trot). The obtained trajectories are used as references for the cost function of the Nonlinear MPC presented in our previous work. We also present a formulation that ensures guarantees on the response time to reach a goal without the need to tune the weights of the cost terms. In addition, footholds are corrected using the optimized reference to drive the robot towards the goal. We demonstrate the effectiveness of our approach both in simulations and experiments in different scenarios with the Aliengo robot.

N ∈ R NMPC horizon. N g ∈ R Reference horizon. δ ∈ R 4×N g Sequence of gait status. p f ∈ R 12×N g Footholds sequence. x g c ∈ R 4×(N g +1) State of the optimized reference generator. p g c,k ∈ R 2 X-Y COM reference position at time k. v g c,k ∈ R 2 X-Y COM reference velocity at time k. w g k ∈ R 2 ZMP position at time k. s k ∈ R 2×1 slack variables at time k.
GRFs computed by the QP Mapping. u des ∈ R n u ×N Predicted GRFs by the NMPC. x des c 1. Introduction

Related Work
Legged robots are becoming popular nowadays, thanks to their ability to operate on irregular and complex terrains. The challenge is represented by the design of a proper control strategy that allows the robots to execute their tasks. Early developed methods involve the use of heuristic approaches, e.g., [2]. They demonstrated good performance and succeeded in hardware experiments, but are tailored to specific motions and scenarios. More recently, Trajectory Optimization (TO) techniques [3][4][5] were introduced, since constraints and cost functions can ensure dynamic feasibility and desired performance. In particular, Model Predictive Control (MPC) approaches can compensate for uncertainties and changes in the environment, by computing a new trajectory online, while the robot moves. In our previous work [1], we presented an Nonlinear Model Predictive Control (NMPC) formulation, which runs at 25 Hz and allows the Hydraulically actuated Quadruped (HyQ) robot [6] to perform omni-directional motions, detect a pallet and step on it with improved leg mobility. Minniti et al. [7] integrated Control Lyapunov Functions into their MPC to guarantee stability when the robot has to interact with unknown objects. Hong et al. [8] presented an NMPC implementation with a set of different gaits, while Amatucci et al. [9] exploited a Monte Carlo Tree Search to optimize also for the contact schedule. The last two works, however, have only been tested in simulation. The importance of the MPC approaches is well evident in the case of disturbances, since they allow the robot to comply with external pushes (e.g., in [11]) thanks to high-frequency re-planning. One of the shortcomings of a standard MPC implementation is related to the fact that the robot becomes "transparent" to external pushes, i.e., at each sample the new trajectory starts from the actual position. Therefore an additional effort is required to track a specific reference Cartesian position or to return, after a push, to the original one. A common practice is to have user-defined values as a reference for the MPC [12], but in these approaches the user would have to manually change the reference velocity in order to compensate for the deviation caused by the push. Another possible solution would be to add in parallel to the MPC a simple Cartesian Proportional-Derivative (PD) control action to attract the Center of Mass (CoM) to the reference position [13]. However, this control strategy has some notable limitations, in particular: 1) the added wrench does not consider the wrench produced by the MPC layer (hence, there will be a fight between the two controllers with the risk of violating feasibility), 2) the PD is not aware of the hybrid dynamics of the legged robot, i.e., the intermittent contacts and the (possible) under-actuation. Aliengo Robot Figure 1. Overview of the locomotion framework. The Reference Generator presented in this work is integrated with the NMPC presented in our previous work [1]. The controller introduced in [10] allows the robot to track the trajectories computed by the NMPC. We used the quadruped robot Aliengo for the simulations and experiments.
Moreover, footholds play an important role in the robot agility, since their coherence with the CoM can improve the stability when the robots has to "react" to an external disturbance. 3) Finally, another drawback of PD approaches is that they cannot change feet trajectories, which is crucial to deal with lateral pushes. For example, Barasuol et al. [14] presented a Push Recovery module which modifies the footholds to counteract the disturbance and track the position with the PD. As explained, the drawback of this approach is that the computed locations of the feet give no guarantee that the resulting wrench can be generated by the robot. Another possibility to track a fixed goal is adding it in the cost function. This would solve the issue number 1 of the PD controller, because the constant position is embedded in the cost function of the MPC, hence the "conflict" is dealt with at the cost level resulting in feasible contact forces. However, this solution suffers from the fact that it is only able to apply a limited resistive force before the legs lose their control authority (e.g., when the CoM/Zero Moment Point (ZMP) goes out of the support polygon), and therefore is of restricted applicability. Also in this case, footholds are not generally designed to be consistent with the resulting CoM trajectory, since they are computed with simple heuristics. Cebe et al. [15] optimize for CoM and footholds, but they can re-plan only at the touchdown moments, due to the high computational time required. In addition, the MPCs usually employ references for contact forces that come from crude heuristic computations (i.e., dividing the gravity weight along the legs based on static conditions). These values (i.e., purely vertical forces) are often not feasible for the motion of the base. Undoubtedly, accurately tuning the weights for the different cost terms is a tedious task [16] and having more physically meaningful references has been shown to be a preferable solution [17]. To address this problem, Bjelonic et al. [18] use the result of an offline TO as cost terms for their MPC. In this work, we propose a novel optimization-based reference generator layer that supplies a NMPC with references for CoM and Ground Reaction Forces (GRFs) that are suitable to the task of the robot. The novelty of our approach is that the reference trajectories are computed online (differently from [18]), solving a simplified optimization problem that takes into account the future robot behaviour, and the intermittent contact schedule to generate the references. It is worth highlighting the difference of our approach with the Reference Governors [19][20][21]. In fact, a Governor filters the set-points (i.e., it takes a reference signal vector and changes it to another reference signal of the same type) just to enforce state and input constraints. In the proposed approach, instead, we are interested in considering the intrinsic under-actuation of a trot (chosen as a template gait) when only two feet are in contact with the ground. Moreover, we impose additional features, such as a desired interval in which the robot has to reach a fixed goal or recover its position automatically from big pushes. Another advantage of our algorithm is that it is able to affect CoM trajectories and footholds at the same time, even though the latter are not directly included among the optimization variables. A possible drawback of such a cascade optimization setting is represented by the computational effort, which can result in a reduction of the NMPC frequency. Aiming to find a compromise between accuracy and computational efficiency, we employ models of different complexity using the simplest Linear Inverted Pendulum (LIP) [22] in the reference generator and a more complex one (the Single Rigid Body Dynamics (SRBD) [23]) for the NMPC. The latter, in fact, computes the state trajectories and GRFs that are then sent to the robot, so it requires more accuracy. Thanks to the optimal references, we can avoid using a full dynamics model as in [24].

Proposed Approach and Contribution
In this work we introduce our optimization-based reference generator which endows the cost function of an NMPC [1] with physically informed reference trajectories to be tracked. We integrated this module with our previous works, see Fig 1. To summarize, the contributions of the paper are: • the presentation of a novel reference generator which drives the robot to accomplish a task (optionally in a user-defined time interval), taking into account the under-actuation of statically unstable gaits, like the trot. Footholds are heuristically computed to be coherent with the CoM motion, and optimized GRFs are obtained in order to follow those trajectories. The formulation is lightweight enough to maintain the re-planning frequency of 25 Hz of the NMPC. • simulations and experiments to demonstrate the effectiveness of the proposed approach in three different scenarios: (a) straight motion, (b) fixed lateral goal and (c) recovery after a push. We also compared in simulation our algorithm with a state-of-the-art approach (NMPC + PD action) for the scenario (c). • as an additional minor contribution, we demonstrate the generality of the approach showing it was able to deal with different dynamic gaits, i.e., trot and pace.

Outline
The paper is organized as follows: Section 2 gives an overview of our planning framework, highlighting the main features of the reference generator. Section 3 describes the optimization problem with the LIP model and how it is used to compute CoM position velocity and GRFs references. Simulations and experiments with our Aliengo 1 robot are presented in Section 4. Finally, we draw the conclusions in Section 5. Figure 2 gives an overview of our entire locomotion framework. The user decides the leg sequence and the values of both linear v usr c ∈ R 2 and heading ω usr z ∈ R CoM reference velocities, i.e., the velocities that the robot should follow. The velocities can be changed during the motion and the NMPC will immediately react accordingly. In this work, we use two modules already presented in [1]: the gait scheduler and the robocentric stepping (also used in [25]). Given the user-defined leg sequence, the gait scheduler returns the gait status (either swing or stance) δ i,k ∈ R of each leg i and for each time instant k in the reference horizon N g , reconciliating it with the real condition of the robot, e.g., early or late touchdown. The reference horizon N g corresponds to the maximum response time T f of the reference generator. Note that  If the norm of the error e between the goal and the average CoM position during the last gait cyclep c is smaller than the threshold, simple heuristic techniques are used (see [1]). In the other case, the reference generator is in the optimize status. CoM position and velocity to accomplish the task are computed with a LIP model; a QP mapping is used to obtain the GRFs corresponding to the ZMP computed by the LIP optimization. In addition, the CoM velocity is used to improve the footholds. Finally, the NMPC computes the optimal trajectories for CoM quantities x des c and GRFs u des . A Whole-Body Controller and a PD-Joint controller compute the optimal torques τ des that are sent to the robot. N g can be different from the horizon N used in the NMPC, with N g ≥ N. We use the symbol δ ∈ R 4×N g to refer to the entire sequence of gait status. The robocentric stepping module, on the other hand, has the important task of computing footholds for each leg over the entire horizon. We use the symbol p f ∈ R 12×N g to denote these quantities for the whole horizon.

Locomotion Framework Description
The key idea of robocentric stepping is that the touchdown points are computed with respect to the hip position and they are offset, with respect to that, depending on the CoM reference velocity. Applying the robocentric stepping with the optimized reference velocities allows us to obtain the coherence between CoM and footholds. Variables p f and δ are also used as parameters in the SRBD model of the NMPC. If the error between the goal and average CoM position is lower than a threshold (see Section 2.1), the reference generator does not perform optimizations and user-defined velocities (heuristic references) are used as references also for the NMPC. We refer to this condition as heuristic reference generator. Instead, when the reference generator is in the optimize state, the sequences p f and δ are the input parameters to a first-stage optimization that employs the LIP model (Section 3.1).
Here we compute the optimal X-Y CoM trajectory p g c , v g c to reach the goal p goal c where dynamic stability is satisfied (i.e., ZMP always inside the support polygon). However, since the footholds are computed the first time with the simple heuristic approach, we need to recompute them using the optimized velocity. This will result in a new set of footholds that will be used as inputs for a second optimization. We iterate this until a stop condition is reached, e.g., the maximum number of iterations or difference between two consecutive solutions below the defined threshold. As a safety check, if, instead, the solver is not able to converge to a feasible solution, the reference generator is set back to the heuristic status and v usr c /gravity compensation are used as references. The CoM velocity trajectory v g c computed by the LIP model optimization is then used as velocity reference for the NMPC. 2 A Quadratic Program (QP)-based mapping (Section 3.2) computes the references for the GRFs u ref ∈ R 12×N . Finally, the output of the NMPC are the CoM trajectories (position, orientation 3 , linear and angular velocities) x des c ∈ R 12×(N+1) and GRFs u des ∈ R 12×N that will be sent to the Controller block. The Controller is composed of a 250 Hz Whole-Body Control (WBC) [10] and a 1 kHz PD-Joint controller. They generate the torque references τ des for the low-level joint controller of the robot. The State Estimator module [26] provides the actual values x act c of the robot at a frequency of 500 Hz.

Goal Setting and Status of the Reference Generator
As already explained in Section 1, the robot cannot follow a user-defined velocity in the presence of non-idealities or external disturbances, 4 due to the MPC transparency. For this reason, we define the goal p goal c as the position the robot would have reached if it had followed the user-commanded velocities v usr c ∈ R 2 . The goal is updated at each iteration of the NMPC. It is initialized with p goal = p act c,x,y + N g v usr c T s at the beginning of an experiment and at each iteration it is incremented by v usr c T s . Variable T s is the sampling time of the reference generator and it is equal to 1/ f s where f s is the planning loop frequency. 5 As an alternative, the user can decide a fixed goal, either for one coordinate or both. A SLAM algorithm [27,28] goes beyond the scope of this paper, so we assume that there are no obstacles and the goal can be reached. In order to change the status of the reference generator from heuristic to optimize in a meaningful way (e.g., no transition with the normal sway of the robot) we consider the averagep c of the X-Y CoM position during the last gait cycle. To the average position, we add the offset due to the desired motion in the horizon N g and compare it with the goal. We compute the error as: c T s and when its norm e 6 goes beyond a threshold the reference generator status is set to optimize. Algorithm 1 illustrates the pseudo-code of the different computation phases of the reference generator.

Formal Guarantees on Response Time
In an optimization problem, if we set the tracking of the goal as either a running cost or a terminal cost, the response will depend on the tuning of the weights of the cost itself. In addition, with this approach, we cannot impose a predefined time T f in which the robot reaches the goal (response interval). 7 For these two reasons a hard constraint to impose that the CoM position matches the goal can be added to guarantee a response interval equal to T f , starting from the first time instant when the reference generator is set to the optimize status. We also need to impose the same hard constraint on all the samples after M = T f /T s . Indeed, at every iteration of the NMPC, the interval without constraint should shrink because otherwise the target would never be reached, see Fig. 3. For this reason we enforce hard constraints on the goal, with the number of samples M that gradually reduces to zero. In this way, we ensure to reach the goal in a time that does not depend on the tuning of the weights. In practice, using hard constraints might lead the optimization to be trapped in an infeasible solution [29]. Therefore, an implementation with slacks variables in the constraints, that allows us to relax them, is preferable. Penalizing their value in the cost function of problem (2), we have the guarantee that the robot reaches the target in a predefined time interval and, if this is not possible, the solver will find the best feasible solution.

Optimized Reference Generator
As already mentioned, the task of the reference generator is to compute along a horizon N g the linear (i.e., longitudinal and lateral) CoM velocity reference trajectories v g c to reach a desired goal/recover from an external disturbance in a predefined time T f and the trajectory of GRFs u ref to follow it.

LIP Model Optimization
One of the features of the proposed reference generator is to take into account the intrinsic under-actuated nature of a robot when only two feet are on the ground [30]. Therefore, we employ the simplest model that is able to capture this under-actuation: the LIP model [22]. Indeed, this allows to compute the optimal trajectory for the CoM, while imposing a desired behavior for the ZMP. The ZMP is defined as "a point on the ground at which the tangential component of the moment generated by the ground reaction force/moment becomes zero" [31], and its position determines the direction and magnitude of the CoM acceleration. Guaranteeing that the GRFs are such that the resulting ZMP is inside the support polygon ensures that they also satisfy the unilateral constraints [32] (the legs can only push and not pull the ground); therefore they can be effectively realized by a real robot. Since the support polygon boils down to a line connecting the stance feet during a two-leg-stance phase, the ZMP will be able to move only on that segment. Even though the ZMP-based models have been efficiently used in MPC approaches, e.g., [33], we are aware that the LIP model presents some assumptions (vertical and angular dynamics are neglected). As already mentioned, our NMPC uses the SRBD model that will consider these dynamics in the lower optimization stage. We define the state of the reference generator x -Y position of the ZMP at time k. Footholds p f,k and gait status δ k for all the feet at each sample k are the parameters used to compute the support polygon at each node of the reference horizon N g . Additional parameters are initial Z CoM position p act c,z and g ∈ R = 9.81m/s 2 . To obtain x g c and w g we cast the following optimization problem: The terms (1a) and (1b) of the cost function aims to minimize the distance between CoM position and the goal, the norm of the velocities and the distance of the ZMP from the center of the support polygon w ref k (for robustness purposes). In addition, the minimization of velocity term (1a) prevents having large velocities that could determine footholds outside of the workspace of the leg due to the robocentric stepping. Matrices Q v ∈ R 2×2 and Q w ∈ R 2×2 are positive definite weighting matrices. The initial condition (1c) is expressed by setting indicates the set of integer numbers in the closed interval [0 , N g − 1]. The optimization problem (1) does not have any guarantee on the response time. As already discussed in Section 2.2 a formulation with slack variable can be used to impose a predefined instant in which the robot has to reach the target: The slack variable s k ∈ R 2×1 is added in the cost term (2b) and thanks to Equation (2f), and (2g) it allows to impose that the robot CoM coincides with the goal after M samples. Differently from the optimization problem (1), the cost function does not explicitly include a position term, since it is enforced with slacks into Equation (2b). Matrices Q s,q ∈ R 2×2 and Q s,l ∈ R 1×2 are the additional weights for the quadratic and linear term for the slack variables. Equation (2c), (2d), (2e) are the same as Equation (1c), (1d), (1e) respectively.

QP Mapping
The reference generator computes CoM trajectories which the NMPC must follow, but the latter also requires reference GRFs u ref ∈ R 12×N . Since the output of the optimization problem (1) or (2) is a trajectory for the ZMP w g , we need to map this into a set of consistent GRFs, thus moving from a bi-dimensional to the higher dimensional space of contact forces. For this reason, we define the set of indices of the legs in contact with the ground by C. A parametric QP is solved to find the vector of GRFs u qp ∈ R 3|C| which corresponds to the ZMP location w g . If a foot is in the swing phase, its GRFs are set to 0 by default: where i is the leg index. Note that we need to solve a QP for each sample k in the horizon N, therefore to avoid overloading the notation, we do not specify the subscript k in the following quantities. The parameters of the model are the footholds p f , the X-Y components of the CoM position p g c , the ZMP trajectory w g , the initial Z CoM position p act c,z and the mass of the robot m ∈ R. Thus, for every k we solve: The first term of Equation (4a) is the regularization term on the X-Y components of the GRFs, with Q u ∈ R 2×2 as weighting matrix, while the second one minimizes the angular momentum rate, and it is weighted by Q k ∈ R 3×3 . Variable · x represents the skew-symmetric matrix associated with the cross product. Equation (4b) is the definition of ZMP, Equation (4c) represents the gravity compensation. Equation (4d) guarantees that the X-Y CoM acceleration computed with the LIP model in the reference generator (left term) coincides with the one of the SRBD used in NMPC. It is worth highlighting that when the robot is on two legs, the ZMP lies on a line, and so it can only move in a 1D manifold. During that phase, imposing the Eq. (4b) for the X component is enough to guarantee that also the Y coordinate of the ZMP respects the same constraint. Along the horizon N g , each problem is independent from the others, so they can be solved in parallel. In this way the computation effort remains low (a couple of ms to solve a QP problem with linear constraints) and therefore the new reference generator can be integrated into our high-frequency NMPC scheme.

Simulation and Experimental Results
The optimization-based reference generator endows the NMPC planner with the capability to recover from disturbances and avoid drifting in the face of non-idealities. To show its effectiveness, we tailored three template scenarios: (a) motion along a straight line, (b) reaching a fixed goal, and (c) recover from external pushes. We performed the simulations and experiments on the 22 kg quadruped robot AlienGo of Unitree. We consider the trot as a template gait for our experiments because of its inherently unstable nature. Indeed, any asymmetry in the real robot can make it drift when setting a pure forward velocity. The quadruped is thus not able to follow a straight line. The variable T s is set to 40 ms and the horizons of both reference generator (N g ) and NMPC (N) are 50 nodes, corresponding to an interval of 2 s. We keep, thus, the same planning loop frequency of 25Hz as our previous work. The trot parameters are cycle time T c = 1 s, duty factor D = duration of stance phase / T c = 0.65. The values of the weighting matrices are reported in Table 1. Their value has been tuned with a trial and error procedure. We refer to Table 2 of [1] for the values of the weights of the NMPC and the WBC + PD Joint Controller. Without any lack of generalization, slack variables are considered only for the Y component. We used the HPIPM [34] solver integrated into acados [35] library to find the solutions of the problem (2). The problem (4), instead, is solved using eiquadprog [36], since it offers a more straightforward interface for the QPs. Both reference generator and NMPC run on an Intel Core i7-10750H CPU @ 2.60 GHz. The LIP Model Optimization takes 2-4 ms with the prediction horizon of 2 s and thus N equal to 50 , while the time required by each QP (Eq. 4) is negligible, around 0.01 ms. In this section, we will call reference the output of the reference generator, desired the output of the NMPC, and actual the real values measured by the State Estimator.

Simulations
The four-leg-stance phase in a walking trot is the only moment of the gait where the robot has full control authority and is able to track the reference velocities. Neglecting this fact would lead to failure or unpredictably longer response times. In our algorithm, the reference generator   c,x is updated at each iteration of the NMPC according to Algorithm 1 and continuously tracked. The top plot demonstrates that the reference generator is able to compute reference CoM trajectories (yellow line) that allow the actual CoM values (red line) to accomplish the task. Since the reference quantities satisfy the under-actuation of the gait, reference velocities are nicely tracked by the NMPC (respectively yellow and blue line, bottom part of Fig. 4) and consequently by the actual ones. Figure 5, instead, shows the reference w g and actual X-Y components of the ZMP locations for the same simulation (computed by Equation (4b)). As it can be seen, the robot is able to track the reference values for the entire cycle, for both X-Y coordinates, i.e., they have the same trend. In this simulation we decided to keep the reference generator always in the optimize status to show how the algorithm is able to track (once the target has been reached) a zero user-defined lateral velocity v usr c,y . Figure 6 reports scenario (b) with the same goal (-0.2 m) but different response interval (T f = 3 s). Due to the smaller interval, the response is more aggressive and presents an overshoot which is then recovered within the 3 s interval. The task is achieved by simply modifying the response interval T f ; no tuning of the weights of the cost function is required. Shaded areas highlight the response interval. Figure 7 shows the tracking error between the actual and desired Y CoM position. It confirms that the error is negligible and the desired values are tracked by the robot. Next, we want to validate our approach by comparing it with a simple cartesian PD approach in the scenario (c). This is typically implemented computing a feed-forward force f ff,y ∈ R which depends on the error between the actual state and the goal, i.e., f ff,y = K p p goal c,y − p act c,y + K d v usr c,y − v act c,y . In this scenario, for the first 5 seconds of the motion the robot is pushed with a force of 15 N in the Y direction. The user velocity v usr c,y is zero, so the task is to come back to the initial Y position. In Fig. 8 we report the actual Y CoM position p act c,y in three different cases. The green line represents the case in which the system is modeled as a second order with critically damped response. From the control theory, we know that the settling time is equal to T f = 4 ζω n , with ζ equal to damping ratio and ω n natural frequency. To have a critically damped response we aim at ζ = 1.  Fig. 4, yellow, blue, and red lines correspond respectively to reference, desired, and actual quantities. gain are manually tuned such that the robot reaches the goal faster. The result is K p = 170 N/m and K d = 122 Ns/m. As it can be noticed in the plot, once the disturbance is removed, the robot moves towards the goal, but it is not able to follow it once it has been reached. In fact, the robot keeps drifting, moving away from the initial position. We can conclude thus that the approach with a feed-forward force results in a slow response, with a steady state error with respect to the goal. The common practice of setting the PD parameters considering the system behaving as a second order system (i.e. spring/mass /damper) is not valid due to the under-actuation, showing the limitation of the approach. In addition, even hand-tuning of the parameters does not allow the user to obtain the desired behaviour, e.g., the values of K p and K d which have good performance in recovering from the push (purple line) are not suitable for the straight forward motion. The result of our approach, instead, is reported with light blue line. Once the disturbance has been removed, the reference generator computes a velocity trajectory that brings the robot towards the goal in the desired time T f (shaded area) and without steady state error.

Experiments
In this section we present preliminary experiments for the scenarios (a), (b), (c) carried out with the real robot platform. In these experiments, we employed the problem (1) without ensuring guarantees on the response time. As first experiment we performed scenario (a) on the robot, as illustrated in Fig. 9. The figure demonstrates that the NMPC with only a heuristic reference generator (dashed lines) does not succeed in moving on a straight line for a statically unstable gait as trot. Indeed, the robot suffers lateral and backward (less visible) drifts because of two reasons: 1) the trot being an unstable gait the CoM always diverges between two four-leg-stance events in opposite directions. Any little asymmetry in the robot results in a cumulative drift in one direction. 2) since Aliengo has c-shaped legs, they create nonzero moments about the pitch axis during a swing. Enabling the optimal reference generator, instead, the robot is able to reach the goal and prevent the trajectory from drifting (see the continuous line in Fig. 9). Figure 10 shows the change in the reference lateral velocity (yellow line) done by the reference generator. When the average Y position (p y , red line) exceeds the bounds (p bound y , dashed lines), the status is changed to optimize and the reference generator brings back the CoM close to the goal. A threshold of 1 cm around the constant goal has been chosen for the activation. Once the goal has been reached, the reference generator automatically resets to heuristic, and the reference velocity becomes equal to the user one (zero). The continuous changing of the status of the reference generator demonstrates the need to have an external module which corrects the reference trajectories during a trot. Figure 11 shows the Y position of the robot in the scenario (b) with a fixed goal of -0.15 m. Figure 12. Experiments, scenario (c), sequence of screenshots. The robot moves forward (picture 1), and is suddenly pushed with a stick (picture 2). Once the push is removed (picture 3), the optimized reference generator automatically drives the robot back to its initial position. Finally, the robot follows the user-defined velocity (picture 4).
As in simulation, the robot is able to track the reference value, due to the fact that the velocity takes into account the under-actuation of the trot gait. In this case, we decided to keep the reference generator always set to optimize to demonstrate that it is able to work properly also when the target has been reached, compensating drifts as in scenario (a). In the last experiment, we show how we can use our reference generator to react to external disturbances, see Fig. 12. As we have already mentioned in the Introduction, the task is not to reject the disturbance, but to cope with it and later recover from its effect. An analysis of techniques to reject disturbances goes beyond the scope of this work. Figure 13 shows the Y position of the CoM in a real hardware experiment in scenario (c) when the robot receives two manually applied pushes. The threshold on the error is set to 1 cm (dashed purple line). During the push, the robot tries to resist to the disturbance and, thanks to the high-frequency re-planning of the NMPC, it is able to keep the stability and avoid falling. Once the pushing force is removed, the reference generator drives the robot back towards the initial position. As in the previous cases, the reference generator is set to optimize when the robot is diverging from the goal. The readers are encouraged to check the experiments corresponding to the mentioned results in the accompanying video.

Conclusions
In this work, we presented a novel optimization-based reference generator for quadruped locomotion, which deals with the under-actuation of statically unstable gaits and with external disturbances. It exploits the LIP model to compute feasible reference trajectories that allow the robot to accomplish a tracking task. A QP mapping is used to determine the GRFs which correspond to the ZMP location computed by the LIP model. Velocities and GRFs are used as informative tracking references by a 25 Hz lower-stage NMPC planner introduced in [1]. This results in the absence of conflicting tasks in the cost function, which simplifies the tuning of the cost weights of the NMPC. We validated our approach by performing simulations and experiments with the 22 kg quadruped robot Aliengo in three different scenarios: (a) straight motion, (b) fixed lateral goal and (c) recovery after a push. For the last scenario, we demonstrated that the simple solution of adding a Cartesian PD in parallel to the NMPC is not enough to return in a predefined time to the required position. In addition, we presented and validated in simulations a formulation with slack variables that can guarantee to reach the goal in a specified time, without the need to further tune any parameters. Future work involves extending the optimize reference generator also to the heading (i.e., yaw) orientation, and to perform additional experiments with the real platform with the formulation that respects a specified response time.