Feature-Based MPPI Control with Applications to Maritime Systems †

: In this paper, a novel feature-based sampling strategy for nonlinear Model Predictive Path Integral (MPPI) control is presented. Using the MPPI approach, the optimal feedback control is calculated by solving a stochastic optimal control (OCP) problem online by evaluating the weighted inference of sampled stochastic trajectories. While the MPPI algorithm can be excellently parallelized, the closed-loop performance strongly depends on the information quality of the sampled trajectories. To draw samples, a proposal density is used. The solver’s and thus, the controller’s performance is of high quality if the sampled trajectories drawn from this proposal density are located in low-cost regions of state-space. In classical MPPI control, the explored state-space is strongly constrained by assumptions that refer to the control value’s covariance matrix, which are necessary for transforming the stochastic Hamilton–Jacobi–Bellman (HJB) equation into a linear second-order partial differential equation. To achieve excellent performance even with discontinuous cost functions, in this novel approach, knowledge-based features are introduced to constitute the proposal density and thus the low-cost region of state-space for exploration. This paper addresses the question of how the performance of the MPPI algorithm can be improved using a feature-based mixture of base densities. Furthermore, the developed algorithm is applied to an autonomous vessel that follows a track and concurrently avoids collisions using an emergency braking feature. Therefore, the presented feature-based MPPI algorithm is applied and analyzed in both simulation and full-scale experiments.


Introduction
Recently, empowered by the progresses in efficiency of optimization algorithms and the available computing power, sample-based nonlinear model predictive control (NMPC) approaches can be used to solve nonlinear stochastic OCP in real time [1]. Using the path integral control framework [2], the stochastic HJB equation that corresponds to a restricted class of stochastic OCPs can be transformed into a linear partial second-order differential equation [3]. This class is characterized by arbitrary but input affine dynamics, containing additive white Gaussian noise (AWGN) at the inputs, and the cost function is only restricted to be quadratic in the controls [4]. According to [3], a Feynman-Kac path integral can be used to express the solution for both, the optimal controls and the optimal value function. Thus, the solution of the transformed stochastic HJB is formulated as a conditional expectation value with respect to the system dynamics. As a result, the optimal control can be estimated using Monte Carlo methods drawing samples of stochastic trajectories [5]. While in general the resulting optimal feedback control function has an unknown structure, there are different approaches for its representation. Besides reinforcement learning (RL) approaches based on offline learning a parametrized policy [6,7], NMPC has become the de-facto technological standard [8,9]. Based on path integrals, a new type of sample-based NMPC algorithm was presented in [8]. Using a free energy definition described in [10] and [11], the input affine requirement is completely removed by [4]. Because MPPI is based on Monte Carlo simulation, the information content of the drawn samples is highly dependent on the proposal density [12,13]. Due to this fact, the system's performance can be significantly improved by provisioning a good proposal density. While recently in [14] a robust MPPI version, in [15] a covariance steering approach and in [16] a learningbased algorithm are introduced, in this paper, a novel feature-based MPPI extension is presented. In Section 2, the basics of the MPPI approach are described. This is followed by the derivation of a feature-based extension of the MPPI algorithm in Section 3. In Section 4, an application scenario is defined containing the equations of motions of a vessel and cost functions, which are combined as an OCP. Furthermore, an emergency braking feature is defined. In Section 5, the controller design is presented including the architecture, the controller parameters and the cost function parameters. A comparison of usual MPPI control and its feature-based extension in a simulation environment is given in Section 6. In Section 7, full-scale experiments with the research vessel Solgenia are used to compare the MPPI algorithm and its feature-based extension. This is followed by the conclusions of this paper including ideas for future work in Section 8.

Model Predictive Path Integral Control
Recently, a sample-based NMPC algorithm was derived by [4] using the path integral framework. This so-called MPPI approach calculates the optimal control inputs by numerically solving the time-discrete nonlinear stochastic OCP and v t = u t + t , t ∼ N (0, Σ), R = λΣ −1 , λ ∈ R + in real time, where F : R n × R m → R n denotes the time-discrete nonlinear system dynamics of the stochastic system state X t ∈ R n , and the actual system input denoted by v t ∈ R m that is given by the commanded system's input u t ∈ R m with time discrete AWGN t ∈ R m with covariance matrix Σ ∈ R m×m . The so-called temperature of the stochastic system is denoted by λ. The objective function is given by the expected value of the terminal costs denoted by φ(X T ), the instantaneous stage costs denoted by C(X t ) and a quadratic input term with weighting matrix R ∈ R m×m . The objective function to be minimized in (1a) evaluates the expected costs subject to the commanded system inputs U = {u 0 , u 1 , . . . , u T−1 } ∈ R m×T under the measurement distribution Q corresponding to the probability density function (PDF) where V = {v 0 , v 1 , . . . , v T−1 } denotes the sequence of actual input values. The distribution of the controlled system denoted by Q corresponds to an open-loop sequence of manipulated variables u t = u P t for t = 0, 1, . . . , T − 1. According to [17], the expectation value under a measurement is defined as E Q (ρ(V )) := ρ(V )q(V ) dV , where ρ denotes a scalar function. In this section, the basics of this novel algorithm are described before it will be extended in the next section. According to [4], the density function of the uncontrolled system denoted by P with zero input u t = 0, t = 0, 1, . . . , T − 1 leads to the PDF An initial state x 0 and a realized sequence of input values V k can be uniquely assigned to a trajectory without stochastic influence by recursively applying (1b). Introducing the cumulated state-dependent path costs S(V k ) = φ(x k,T ) + ∑ T−1 t=0 C(x k,t ) according to [4], the value function of the OCP (1a) is given by with respect to the uncontrolled dynamics [4]. To express the value function with respect to Q, the likelihood ratio p(V) q(V) must be introduced, which yields Applying Jensen's inequality according to [4] yields where the bound is tight with where η ∈ R denotes a normalization constant. In [4], it is shown that the associated optimal control values are given by where D KL denotes the Kullback-Leibler divergence, and Q * denotes the abstract optimal distribution. According to [4], the optimal input is given by where Ω V denotes the image of the sample space, and the importance weighting can be calculated using the PDFs (2), (3) and (7). Using Monte Carlo simulation, (9a) can be estimated via the iterative update law where N samples are drawn from the system dynamics (1b) with the commanded control input sequence U = {u 0 , u 1 , . . . , u T−1 }. The iterative procedure described in Algorithm 1 is used to estimate the optimal commanded control input, and to improve the required importance sample distribution Q simultaneously.

Extension to Feature-Based Proposal Density
In this section, the exploration problem of standard MPPI control is described, the idea of feature-based extension is presented and the resulting feature-based MPPI algorithm is presented.

Exploration Problem of Classical MPPI Control
To improve the sampling efficiency, both the value function (5) and the optimal control sequence (9b) can be calculated by sampling trajectories under the probability measure Q with the proposal PDF (2). Due to the restrictive assumption R = λΣ −1 , the proposal PDF is only parametrized by a sequence of inputs U P = {u P 0 , u P 1 , . . . , u P T−1 }. The sampling efficiency is dependent on the quality of the proposal density [13]. In [4], the recent estimate U t of the optimal control sequence U * is used to determine the proposal PDF for the next sampling iteration. The objective function (1a) is not necessarily convex. Using an infinite number of samples, the MPPI algorithm is a global minimizer. However, due to a finite number of drawable samples, the explored state-space is concentrated around the last approximation. In order not to remain in a local minimum, the explored state-space must be enlarged. An exemplary possibility to enlarge the explored state-space is shown in Figure 1.

Feature-Based Extension of the Search Space
An enlargement of the explored state-space is possible by introducing an additional feature-based proposal density to draw a part of the samples from. Therefore, a feature is implicitly defined by solving the artificially designed OCP where φ F : R n → R denotes the terminal costs of a feature and C F : R n → R denotes the instantaneous costs of a feature. Thus, the feature-based proposal density is given by parametrized by a sequence of inputs U F = {u F 0 , u F 1 , . . . , u F T−1 }. Analogous to the previous section, the stochastic OCP (12a) can be solved by using the MPPI algorithm.

Resulting Feature-Based MPPI Algorithm
The resulting feature-based MPPI algorithm is presented in Algorithm 2. First, the control sequence is optimized using Algorithm 1, and the predicted costs are evaluated. Then, all feature control sequences are improved, and their performances regarding the main cost function are evaluated. The best control sequence is chosen to be the main control sequence. Then, the first element of the control sequence is applied. Subsequently, the sequence is shifted, and the last element is initialized. Note: In the implementation, lines 4 and 5 can be combined to reduce the computational effort.

Maritime Application Scenario
Autonomous mobility of maritime systems is a central problem of our time. Due to advances in science and technology, the requirements for the performance are constantly increasing. For example, recent publications such as [18][19][20] show how NMPC algorithms are used in the maritime environment to solve complex control engineering problems. Moreover, algorithms in the context of RL such as Deep Q-learning [21], Deep Deterministic Policy Gradients [22] or Actor-Critic [23] are increasingly used, which achieve good results for various applications, but the majority of them are only validated in simulation. While MPPI control was used to dock a fully actuated vessel in full-scale autonomously [24], a gap in this research area is the feature-based MPPI control of an autonomous vessel for collision avoidance. In order to also being able to perform autonomous maneuvers, such as heading for a defined target outside the port, it must be ensured that the hazards during these maneuvers are minimized. As regulated by the International Regulations for Preventing Collisions at Sea (COLREG) [25], in the event of a predicted collision, it is mandatory to take evasive action. In poor visibility conditions such as fog, heavy rain or due to obstacles appearing through the water surface, there are scenarios where the vessel's sensors will not detect potential collision partners until a short distance away. In such a scenario, using the standard MPPI approach, all sampled trajectories would result in a collision. This overloads the controller and makes it impossible to avoid a collision. Thus, the question arises how feature-based MPPI control can be applied to avoid a collision even in such a scenario. To answer the question, the feature-based MPPI control algorithm is applied in an emergency braking scenario for an autonomous vessel. In the scenario being discussed, the vessel is traveling at full speed along a predefined path,where an obstacle appears at a certain point. Therefore, the vessel has to autonomously perform a so-called last-minute maneuver to avoid a collision. To provide an overview, first the state and dynamics of the fully actuated research vessel Solgenia are presented. Then, a standard MPPI controller is parameterized, which causes the vessel to follow a predefined path. This is then extended to a feature-based MPPI controller by introducing an emergency brake feature.

Dynamics of the Vessel
In [26], a detailed description of the dynamics of the research vessel Solgenia including identification of the model parameters and the calculation of the actuator thrusts is outlined. Nevertheless, a short description of the modeling is given in the following since the model is an elementary component of the MPPI approach used in the scenario. A photography of the research vessel Solgenia is shown in Figure 2a. According to [27], the vessel's state vector is given by a combination of the 2D pose η = (x y ψ) and the velocity vector ν = (u v r) in body-fixed coordinates, shown in Figure 2b. This model was extended in [24] to also consider the dynamics of the actuators. Thus, the transfer properties of the i = 1, 2, 3 controlled actuators can be modeled in each case by a first-order low-pass filter with an equivalent time constant T E , limited slopeã, and dead time T T represented bẏ where w i denotes the i-th actuator's desired state and a i denotes the i-th actuator's actual state. Thus, the actual actuator states a = (a 1 a 2 a 3 ) = (n AT α n BT ) , where the speed of the azimuth thruster (AT) is denoted by n AT , the orientation of the AT is denoted by α and n BT denotes the speed of the bow thruster (BT), and the desired actuator states w = (n d,AT α d n d,BT ) denoting the desired variables of a get a part in the system state The whole dynamic model of the vessel is given bẏ where in (16a) the desired variable vector w includes the integral action u. In (16b), the actuators' dynamics are considered by d : R 3 × R 3 → R 3 whose components are given in (14). According to [27], the time derivative of the body-fixed velocityν is implicitly described in (16c) as a function of the body-fixed velocity ν, the mass matrix the Coriolis matrix and the damping matrix including the system parameters listed in Table 1, the input vector τ c and disturbance vector τ d . The kinematics equation (16d) describes the transformation of the body-fixed velocity into local coordinates as a function of the rotation matrix J(ψ). The influence of unmodeled effects and environmental disturbances are represented by the disturbance vector τ d . The controlled force vector depends on the geometric parameters L BT , L AT and α and the thrusts F AT and F BT , shown in Figure 2b. The thrust F AT is generated by the AT, and F BT denotes the thrust generated by the BT. These thrust forces can be modeled according to [28] dependent on various physical constants, the body fixed velocity vector ν and the actual states of the actuators a. Thus, the force generated by the BT is given by where K T denotes a unitless constant with ρ denotes the density of the water, the diameter of the propeller is denoted by d BT . Furthermore, the quadratic damping of the thrust dependent on the surge velocity component is scaled by a coefficient denoted by c b . Using this modeling structure, the influence of the relative speed in the axial direction of the BT is assumed to be small and hence neglected. This assumption cannot be used when modeling the force F AT , since the relative axial velocity component of the AT is large in the given application scenario and given by where L AT denotes the distance between the AT and the vessel's center of gravity (CG). According to [26], the force generated by the AT is modeled by where d AT denotes the AT's diameter, and the coefficients c 1 and c 2 are given by where a 1 , a 2 , b 1 , b 2 ∈ R denote the coefficients of the AT. The parameters identified by [26] are listed in Table 2. Note that, according to [26], the parameters b 1 , b 2 and c b are eliminated during the evaluation phase of the identification process. For more detailed information about the dynamics, the reader is referred to [24,26,27].  Table 2. Identified parameters of the actuators' thrust models according to [26].

Parameter Value
Parameter Value

Inequality Constraints
Due to the maximum speed of the actuators, two inequality constraints are defined. By considering the actuator dynamics (14) in the model, no further inequality constraints are required.

Equality Constraints
In the treated scenario, the most important aim is to avoid collisions. In order to achieve this, first the indicator function is defined, where η col is the subset of the state-space where the vessel causes a collision and η fine is a disjoint subset. Consequently, the equality constraint to be fulfilled is given by where only the indicator function (27) is used. In the next part, the derivation of the cost function is presented for the given scenario.

Cost Function
The instantaneous state-dependent cost function C(x) significantly governs the behavior of the controlled system, since it is used to evaluate the sampled stochastic trajectories. However, for a complex task, the stage cost function can be learned from expert behavior; using inverse reinforcement learning [29][30][31] for the given scenario, we can define the linguistic criteria for the cost function and then express them as simple equations and subsequently add them up. The vessel should move on a predefined trajectory shown in Figure 3 and should avoid collisions. This task is complicated by significant current, wind and further disturbance effects that occur when moving in the Rhine River. For a good system behavior, the vessel should meet the following criteria, which are sorted in descending order of importance:

1.
A collision should be prevented.

2.
The actuators must not be overloaded.

3.
The position should follow a predefined the trajectory.

4.
The orientation should be in line with the trajectory.

5.
While the surge velocity component should match a reference, the absolute value of the sway component and yaw-rate component should be minimized.  [32]. The locations of P1 and P2 are specified in Table 3. The desired path is drawn in green. The borders to the inadmissible area next to the river bank are visualized in red. Due to the application scenario, the width and orientation of the river are approximated by 2b = 120 m and ψ start = −31.96 • .
While criteria 1 and 2 are already treated introducing the inequality and equality constraints, to meet criteria 3-5, the quality of sampled trajectories is evaluated based on two parts C(x) = C pos (η) + C vel (ν), (29) where C pos (η) denotes the position and orientation dependent costs used to meet the criteria 3 and 4, C vel (ν) denotes the part of the cost function that is dependent on the body-fixed velocity ν and thus is introduced w.r.t. meeting criterion 5. In the following, the parts of the cost function (29) are presented.

Costs Dependent on the Position
For a clear presentation of the costs dependent on the position, these are defined in transformed local coordinates, which are generated by the linear transformation where the start orientation of the trajectory is denoted by ψ start and η start denotes the 2D pose located at P1 given in Table 3. Subsequently, the costs dependent on the transformed position are given by where c x , c y , c ψ ∈ R + denote weighting coefficients, and x des ∈ R + denotes the distance between P1 and P2 shown in Figure 3. The difference between the body-fixed velocity components and the desired reference velocity is penalized using where c u , c v , c r ∈ R + denote the weighting coefficients. Note that this velocity dependent part of the cost function is minimized by the optimal velocity ν * = (u des 0 0) . Thus, a drifting behavior of the vessel is penalized. In the next section, the resulting problem formulation is given.

Resulting Problem Formulation
To minimize the cumulated costs (29) subject to the given dynamics (16a)-(16d), for the inequality constraints (26) and the equality constraints (28) using MPPI control, the OCP has to be formulated in the assumed structure (1a)-(1b). Consequently, the equality and inequality constraints must be considered in the cost function with being defined, where c pen ∈ R + denotes coefficients of the penalty terms. In addition, the terminal cost function φ(x) must be determined. Because this function is not needed in the given scenario, it is defined as In MPPI control according to [4], a discrete-time system dynamics with additive input noise is assumed in (1b). Therefore, the time continuous system dynamics given in (16a)-(16d) is discretized using the explicit fourth order Runge-Kutta method with a step size h. Therefore, the input signal is chosen to be piecewise constant within this step size. Consequently, the discrete-time vessel dynamics that maps a system state is given by Finally, the assumption that the input of the system is disturbed with time discrete AWGN yields u t ∼ N (u des,t , Σ), where u t denotes the actual and u des,t the desired system input at time instance t. The covariance matrix of the AWGN is denoted by Σ. Using these assumptions, the resulting stochastic OCP is given by where the sequence of desired inputs is denoted by U des = {u des,0 , u des,1 , . . . , u des,T−1 }. In the next part, the usual MPPI approach is extended to the feature-based MPPI to improve the system's behavior.

Feature Definition
To use the presented feature-based MPPI algorithm, at least one feature has to be chosen. As already discussed in Section 4.4, the selection of the features can be done by inverse reinforcement learning [30,31] or by the definition of a linguistic quality criterion, which is subsequently formulated mathematically. It is important to note that only wellchosen features lead to an improvement of the system's behavior. In this context, wellchosen means that, in some cases, low-cost regions of the original cost function (33) of state-space are explored by sampling trajectories with controls, which minimizes the feature costs. Note, for the given scenario, the following linguistic quality criterion is defined: a reduction in speed can mitigate or even prevent a collision with an obstacle. For this purpose, an emergency break feature is defined by choosing where c Fu , c Fv , c Fr ∈ R + denote the coefficients of the velocity dependent costs. Consequently, a control sequence that minimizes this feature leads to the exploration of the region of the state-space decreasing the vessel's velocity. In the following section, the presented scenario is used to compare the performance of classical MPPI and feature-based MPPI control.

Controller Design
In this section, the design for both the standard MPPI controller presented in Section 2 and the feature-based MPPI controller presented in Section 3 is shown with application to maritime scenario presented in Section 4. For this purpose, the modularized and distributed control structure used is described. Note that an Unscented Kalman Filter (UKF) according to [33] is used to estimate the system state based on the sensor signals and the actuators' states. These are controlled in subordinate control loops. This is followed by a description of how the control parameters are determined. Finally, the selection of the cost function's coefficients is presented.

Architecture
An overview of all parts of the used control architecture is given in Figure 4. The state estimation and the communication are performed by the rapid prototyping system MicroAutoBox2 (MABX2). While the supervisory MPPI algorithm is running on the MABX3, three subordinate 1232SE control units by Curtis are used to control the drives. An RTK-GPS and an IMU sensor send their measurement data to the MABX2 using a UDP protocol. Using a UKF, this information is fused with the measured actuator state a to estimate the system's 2D pose denoted byη, the body-fixed velocity vectorν and the disturbance forcesτ d , which are sent to the MABX3 via UDP protocol. The MPPI algorithm uses the provided state estimation to calculate the desired system input u. Because the system input is defined as the zero-order-hold time derivative of the desired values of the actuator control loops, an explicit Euler integrator can be used to calculate w implemented on the MABX3. Subsequently, w is sent via UDP protocol to the MABX2, which then provides w to the subordinated actuator controllers via CAN protocol. The encoders' measurements of a are denoted by a m and collected with high quality; therefore, no additional signal processing is needed. The measured actuator states are sent from the encoders to the 1232SE drive controllers, which then send them via UDP to the MABX2 for state estimation.

Controller Parameters
The choice of controller parameters determines the properties of the sampled trajectories, which are realizations over Q. They have a significant influence on the quality of the estimation of the optimal control sequence.

Numerical Solution of the Initial Value Problem
As already discussed, the corresponding initial value problem can be solved numerically using the explicit fourth order Runge-Kutta method with a step size of h = 40 ms to sample the trajectories. It should be noted that, in principle, there is no information about the future course of the disturbance variables. Due to the river current, the disturbance vector τ d is assumed to be constant within the predicted horizon.

Choice of Controller Step Size
The controller step size denoted by T CTRL should be selected as large as possible to reduce the dimension of the control sequence in the optimization problem, but also small enough to be able to induce a desired system behavior. Furthermore, it has to be considered that T CTRL is a multiple of h = 40 ms. Simulations have shown that good performance can be achieved with T CTRL = 360 ms.

Prediction Horizon
The dimension of the control sequence, as well as the influence of the inaccuracies of the model, increases with rising prediction horizon denoted by T. However, a sufficiently large prediction horizon is essential for evaluating the result of the choice of control sequence. Based on the vessel's dynamics, the prediction horizon is determined based on the time until the maximum surge velocity is achieved. To reach u max = 3 m/s with a maximal acceleration of 0.2 m/s 2 , a duration of 15 s is needed and thus a prediction horizon of T = 41 time steps is chosen.

Covariance Matrix of Additive Noise
In path integral control framework [12], the choice of the covariance matrix Σ of the AWGN, the quadratic costs of the input is determined by Σ = λR. The quadratic penalty term u Ru relates to the accelerations of the propellers and the rotational velocity of the AT. We assume uncorrelated noise processes to excite the system's behavior and thus the covariance matrix is chosen. The elements of Σ are chosen proportionately to the corresponding actuator dynamics listed in Table 1. While the maximum dynamics of the AT is given directly, the maximum acceleration of the BT is calculated using a first order Taylor approximation of the delay time, this yields an equivalent time constant of T T + T E = 350 ms. Thus, the variances of the AWGN are given by where parameter c is introduced to scale the variances. Regarding the explore-exploit dilemma [12], the optimal value c * = 0.09 is approximated by numerically solving the minimization problem where T sim denotes the number of simulation steps,ũ k denotes the approximated optimal system input using MPPI control andx k denotes the resulting state at the time instance k withx k+1 = F(x k ,ũ k ). Using the initial state η 0 = ν 0 = a 0 = w 0 = 0 and T sim = 3T = 123 simulation steps c * is found, which parameterizes the MPPI algorithm leading to minimal cost of the scenario. Note that, using the MPPI update law (11), the noise of the AWGN and thus its scaling factor c has a huge influence of the approximated optimal controlsũ.

Temperature
According to [12], the temperature λ with scales the quadratic input costs matrix R. The relation between R and C MPPI has a significant impact on the resulting system behavior. However, C MPPI has already been scaled freely. Thus, λ = 1 has been chosen.

Number of Predicted Trajectories
Using the MPPI update law (11), the expected value is approximated by the mean over the sampled trajectories. Thus, the quality of the approximation increases with the number of drawn samples used. However, the required computing time also increases with the number of realizations. A buffer t buf > 15 ms is targeted to make the real-time capability of the controller robust. Empirically determined, one core of the TI AM5K2E04 processor with four ARM Cortex-A15 cores 1.4 GHz on the rapid prototyping system MABX3 requires 343 ms to simulate and evaluate 9000 trajectories. Thus, the real-time capability of the controller with a buffer t buf = 17 ms > 15 ms is ensured. The choice of all controller parameters determined in this subsection. To provide a structured overview, the parameters are listed in Table 4. Note that, for an objective comparison of the standard MPPI and the feature-based MPPI approaches, the parameters are chosen to be equal.

Cost Function Parameters
The structure of the cost function (33) and the structure of the feature costs (38) are already derived in previous sections. Since their coefficients have a huge influence on the behavior of the system, they must be determined with a similar diligence as the control parameters. The units of the coefficients are chosen in such a way that the corresponding product is unitless. With respect to the parameters c x , c y and c ψ , which are associated with the positions, reaching the desired position x des = 409.9 m and keeping the desired orientation is weighted more than leaving the path orthogonally. Thus, c x = 500 1/s > c y = 200 1/s and c ψ = 2500 1/rad are chosen. With reference to c u = 5000 s/m, c v =10 s/m and c r = 7500 s/rad, which are assigned to the velocity components, errors of the surge speed w.r.t. the desired surge speed u des = 2 m/s and yaw rate is weighted more heavily than the occurrence of a sway speed component. The feature cost coefficients with c Fu = 300 s/m, c Fv = 30 s/m and c Fr = 10 s/rad are chosen to weight the surge speed component by far the most, since it can be assumed that this component dominates in the application scenario. Subsequently, it is assumed that there is an obstacle which in the case of x > x collision = 80 m would cause a collision. Furthermore, it is assumed that the vessel detects the obstacle if its position is x > x appear = 60 m. An overview of the coefficients is given in Table 5.

Simulation Results
The MPPI controller presented in Section 2 and its feature-based extension introduced in Section 3 are embedded in a simulation environment to compare their performances when controlling the vessel model in the maritime application scenario described in Section 4. Therefore, the parameters derived in Section 5 are used. The vessel model including the actuator dynamics, the UKF according to [33], the MPPI controller and the feature-based MPPI controller are implemented in a simulation environment. According to [33], a constant river current is assumed with a velocity of 0.45 m/s and an orientation of 200 • to the x-axis. The simulation time is determined to 60 s. To compare the vessel's behavior, both the standard MPPI approach and subsequently the feature-based MPPI approach are used to control the vessel. The resulting trajectories are compared in Figure 5. While the upper plot shows the performance of standard MPPI control, the lower plot shows the performance of the feature-based MPPI control approach. Both plots use transformed coordinates calculated by (30). The green lines visualize the reference trajectories in both plots. After passing the black line, the vessel detects the obstacle visualized as a red line.
In the absence of an obstacle, both trajectories are quite similar in orientation and position. Note the position difference in y occurs since it is only penalized slightly in (1a).
After the obstacle appears, the different control approaches cause different behavior. While the system controlled by the standard MPPI controller causes a collision with the obstacle, the feature-based MPPI control approach can prevent this collision by performing a braking maneuver. The time courses of the body-fixed surge velocity and the vessel's velocity in the direction of the x-axis are shown in Figure 6. In the absence of the obstacle for t ≤ 44 s, the plotted velocity components show the same behavior. After a learning phase of about 5 s immediately after activating the controllers, the vessel accelerates till it reaches the desired surge velocity. After the acceleration phase, the surge velocity is kept exact at 2 m/s. For t > 44 s using the feature-based approach, the vessel decelerates and reachesẋ = 0 m/s at t = 59 s. A comparison of the desired and the actual actuator values is given in Figure 7. Furthermore, in Figure 8, the caused costs using the different approaches are plotted over time. In contrast, using the standard MPPI controller, the vessel's velocity does not change after the vessel appears. This effect results from the fact that, due to the small explored state-space using conventional MPPI control, all predicted trajectories cause a collision for t > 44 s. Consequently, the vessel controlled by the standard MPPI approach collides with the obstacle at full speed at t = 52.5 s. Regarding the input trajectories shown in Figure 7, the desired and actual actuator values are well fitting. Furthermore, in absence of an obstacle for t ≤ 44 s, the input trajectories show nearly the same behavior. To keep the surge velocity at 2 m/s, the AT is driven with about 950 rpm after it reaches its maximal value during the acceleration phase. While the AT is used to reach and keep the desired velocity, the BT is used to compensate the disturbances due to the river current. Using the feature-based MPPI approach for t > 44 s, the actuators' trajectories lead to the braking behavior of the system. To reach the maximal deceleration of the vessel, the AT's orientation is inverted while its speed is maximized. Regarding the time series of the stage costs shown in Figure 8 in the absence of an obstacle for t ≤ 44 s, both approaches cause the same costs. Both controllers find inputs leading to decreasing costs. After the obstacle is detected, the feature-based control approach applies the learned inputs corresponding to the defined feature. These inputs lead to increasing cost due to the decelerating surge velocity; however, in contrast to the standard MPPI approach, a collision is prevented and thus being in the high cost region of state-space is avoided. In this section, it was shown that, using the feature-based MPPI control approach, a significant improvement of the vessel behavior in a maritime application scenario is reached.

Full-Scale Results
To validate the simulation results, full-scale scale experiments are used. For this purpose, the algorithms used in the simulation will be transferred to the rapid prototyping systems of the research vessel Solgenia. A distributed structure of the components is used as already described in Section 5. In addition, the same controller parameters and cost functions are used as in the previous sections. The data are recorded directly during the experiments on the vessel. Regarding the disturbances immediately before the experiments, a river current of 0.46 m/s was determined in the vicinity of the trajectory. This current has an orientation of about ψ curr = 165 • and is thus almost opposite to the movement of the vessel on the trajectory. However, it should be noted that, although the current values were measured by allowing the research vessel to drift uncontrolled in the vicinity of the trajectory, they vary along the river and can therefore only be regarded as rough estimates, which, in reality, are overlaid by a stochastic component. In the experimental phase, the task described in Section 4 was executed once with a standard MPPI controller and once with a feature-based MPPI controller. For this purpose, the vessel's position is stabilized at the beginning of the trajectory using feedback linearization according to [33] until the MPPI controller is activated. Then, the MPPI controller is activated for 60 s because both a collision or a collision avoidance maneuver could occur within this time span. In Figure 9, a comparison of the vessel's behavior is shown for both experiments. In this figure, the vessel's trajectories are plotted in local coordinates. Furthermore, the vessel's contour and thus its orientation are shown at the time instances t i = i(∆T), with i = {0, 1, 2, 3, 4, 5, 6} and ∆T = 10 s. While using the standard MPPI control visualized in the upper plot of Figure 9, the vessel collides with the obstacle; using feature-based MPPI control, a collision can be prevented. Compared to the simulation results presented in the previous section, in the full-scale experiments, the vessel tracks the reference trajectory with higher errors. This behavior is to be expected due to model inaccuracies and the disturbance vector assumed to be constant within the prediction horizon. In the application example described, special focus is placed on the surge velocity component, since it is the most weighted in the selection of the coefficients of the cost function and thus the most important property in the case of the absence of an obstacle. However, if an obstacle is detected in this scenario, it is interesting to see how the speed develops in the direction of this obstacle. Thus, the surge velocity in body-fixed frame denoted by u and the velocity in direction of the obstacle denoted byẋ are shown in Figure 10. The corresponding actuators' signals are shown in Figure 11. The minimal path costs β are plotted over time in Figure 12 in logarithmic scale. Using the described subordinated actuator control loops, the actual values of the actuators follow the desired values pretty good. Both the conventional MPPI controlled system and the feature-based MPPI controlled system show almost the same behavior for t ≤ 44.6 s in the absence of an obstacle.  Standalone paragraphs require initial indentation, i change to indentation, please confirm At t = 5.1 s, the surge velocity increases until a maximum is reached with u = 1.93 m/s at t = 15.5 s. After that, the surge velocity is kept at about 1.9 m/s by both controllers. It should be noted that both controllers achieve a good response to the disturbances. The difference between the desired and actual surge velocity is assumed to be due to the limitation of the propeller speed of the AT and model inaccuracies in this state region of high speed. At t = 44.6 s, the vessel collides with the obstacle within the prediction horizon. The standard MPPI controller does not decelerate after the detection of the obstacle. This effect results from the fact that, due to the small explored state-space using conventional MPPI control, all predicted trajectories cause a collision for t > 51.2 s. Consequently, predicted trajectories with surge velocity close to u des cause lower costs. In contrast, the feature-based MPPI controlled system explores more important areas of the state-space shown in Figure 1. This allows the feature-based MPPI controller to apply the learned braking trajectory, which is optimal regarding the emergency braking feature (38), as soon as the obstacle is detected. Thus, it can also react excellently for t > 44.6 s and brake in time to avoid a collision. As can be seen in Figure 9, braking is achieved by counterclockwise rotation of the vessel, which allows the hydrodynamic effects of the vessel to be utilized in addition to the thruster forces of the propellers. As a consequence, while using the standard MPPI controller, the vessel's velocity in the direction of the obstacle is 1.85 m/s at the moment hitting the obstacle; using the feature-based MPPI approach, the feature-based MPPI controller decelerates the velocity in the direction of the obstacle down to 0.05 m/s and thus avoids a collision. The minimum path costs, which are shown in Figure 12, are equal due to the equivalent system behavior of the two control approaches up to the detection of the obstacle. Using the standard MPPI approach after the detection, a low-cost region of state-space next to η col is explored. However, this leads to the fact that, already for t > 51 s, all predicted trajectories cause a collision. In contrast, using feature-based MPPI control, directly after the obstacle detection, the explored state-space is located at a region around the braking trajectory. This yields to higher costs, but even prevent the real high state-dependent costs caused by a collision. Using an MPC approach, the consideration of costs directly provides a quantitative quality criterion. Thus, if the cost curves in Figures 8 and 12 are compared, it can be seen that the quality with regard to the cost function to be minimized is almost identical. In this section, it was shown that feature-based MPPI control in full-scale conditions can greatly expand and improve the behavior of a vessel and prevent collision without increasing the computational cost.

Conclusions
In this paper, an extension of the MPPI algorithm [4] is presented improving the sample efficiency by a knowledge-based feature definition. This is the first possibility to incorporate information subject to a scenario additive to the usual cost function into the control algorithm. The conventional MPPI and the presented feature-based MPPI control approaches are compared in a collision avoidance scenario. Therefore, a stochastic nonlinear OCP for enabling a vessel to follow a track and avoid collisions is presented and subsequently solved with both approaches. The performances of the algorithms are evaluated using data out of a simulation environment and full-scale experiments. In both simulation and full-scale experiments, while the standard MPPI would collide with an obstacle, the feature-based MPPI can prevent the collision although the same number of trajectories were drawn in both cases. In the absence of an obstacle, the feature-based MPPI controller learns an optimal braking trajectory that minimizes the feature costs subject to the modeled system dynamics. When the obstacle is detected, this is then successfully applied to avoid a collision. Thus, by inserting an emergency braking feature, the performance and usability of the algorithm could be increased significantly. Concerning other NMPC approaches, the MPPI control algorithm is characterized by a very good parallelization capability, since the samples can be drawn in different kernels. This property is not affected by the presented feature-based extension. Thus, the number of features that could be explored and learned can be increased by adding further parallel computing power. In future work, an evaluation of how the presented feature-based approach can be used in other sample-based NMPC algorithms could be scientifically valuable.  Data Availability Statement: The recorded data from simulations and real experiments, which are presented in this paper, can be downloaded using the link https://www.htwg-konstanz.de/fileadmin /pub/ou/isd/Regelungstechnik/Regelungstechnik_Data/MDPI_Machines22_Feature_Based_MPP I_Data_Simulation_and_Experiment.zip. Accessed on 23 May 2022.