Incorporating Human Preferences in Decision Making for Dynamic Multi-Objective Optimization in Model Predictive Control

Abstract: We present a new two-step approach for automatized a posteriori decision making in multi-objective optimization problems, i.e., selecting a solution from the Pareto front. In the first step, a knee region is determined based on the normalized Euclidean distance from a hyperplane defined by the furthest Pareto solution and the negative unit vector. The size of the knee region depends on the Pareto front's shape and a design parameter. In the second step, preferences for all objectives formulated by the decision maker, e.g., 50–20–30 for a 3D problem, are translated into a hyperplane which is then used to choose a final solution from the knee region. This way, the decision maker's preference can be incorporated, while its influence depends on the Pareto front's shape and a design parameter, at the same time favoring knee points if they exist. The proposed approach is applied in simulation for the multi-objective model predictive control (MPC) of the two-dimensional rocket car example and the energy management system of a building.


Introduction
Throughout the last few decades, multi-objective optimization (MOO) has attracted a lot of attention, especially from the evolutionary optimization community. With increasing computational capacities, new methods to determine (an approximation of) the Pareto front arise regularly. The selection of a solution from the Pareto front is usually left to a human decision maker (DM), which is suitable in the context of one-time optimization, e.g., in product design.
However, multi-objective optimization can be used for more than one-time optimizations. In previous works, we proposed to combine it with Model Predictive Control (MPC) [1,2], i.e., to utilize multi-objective optimization in the permanent (optimal) control of a dynamic system. The main principle is to repetitively formulate a multi-objective optimal control problem at every time step, derive an approximation of its Pareto front, and then select a single Pareto solution. This Pareto solution corresponds to a sequence of inputs (i.e., the decision variables), from which the first element is applied to the system. Then, in the next time step, the entire process is repeated.
For the real-time control of, e.g., an energy management system, this means that a decision has to be made approximately every 15 min, which is too tedious for a human. For systems with faster dynamics, this would be even worse. Thus, this process has to be automated. Different methods to this end exist. Unfortunately, most of them do not incorporate the preferences of a human decision maker. If they do, they usually have other drawbacks, e.g., being limited to two objectives or lacking a clear interpretation of how the preferences affect the final choice. Moreover, these methods rely significantly on the Pareto front's extreme points. However, since the process of determining the Pareto front is also automated and the optimization problem's objectives as well as constraints may vary over time, it is not guaranteed that the actual extreme points are found. This setting of objective functions and constraints varying over time is also called dynamic multi-objective optimization [3].
In this work, we aim to overcome these problems in dynamic multi-objective optimization. Our main contributions are that we
• Formulate a new adaptation of a well-known deterministic method to sample an approximation of the Pareto front, which is more apt for the dynamic multi-objective optimization case where objectives may correlate sometimes;
• Present a new two-step approach for the automated decision making process, which is again designed for the use in dynamic multi-objective optimization and
• In its first step uses a definition of a knee region which depends less on accurate extreme points;
• In its second step uses a geometric interpretation of a hyperplane representing the preferences of a decision maker formulated a priori; and
• Show in a simulation study of an energy management system that the approach leads to a proper representation of the preferences not only in the limited time horizon of the multi-objective optimization problem but also in the long-term costs.
Figure 1 illustrates the incorporation of our approach in an MPC application with multiple objectives.
The rest of the paper is structured as follows. We define the MOO problem, show how it can be solved with the normal boundary intersection (NBI) method, and review the available methods for the decision making process in Section 2. The methodology of the proposed two-step decision making approach is described in Section 3. We formulate the focus point boundary intersection (FPBI) method as an alternative to the normal boundary intersection in Section 4. We discuss the consequences of incorporating multi-objective optimization in Model Predictive Control and apply the proposed methods to two examples of different complexity in Section 5 before we finish with a conclusion in Section 6.
Figure 1. Illustration of the presented approach in the automatized decision making process for multi-objective MPC.
As usual in MPC, at every time step, an optimization problem (i.e., an optimal control problem) over some time horizon is formulated. Here, the multiple objectives lead to a multi-objective optimal control problem. The result is a new Pareto front (at every time step) from which a solution has to be chosen. In our approach, we first identify reasonable areas of the Pareto front, i.e., we exclude too extreme solutions. Then, we incorporate the decision maker's preferences to finally choose a compromise. The solution represents a control (input) plan which is then applied to the system (at least the first step in case of MPC). Afterwards, the resulting system state is measured, and the process is repeated for the next time step.
Notation: $\mathbb{R}^n_+$ denotes the n-dimensional space of nonnegative real numbers, $\mathbb{R}^n_{\geq 0}$. For vectors, a subscript i as in $J_i$ denotes the i-th entry. A superscript j marks the vector as a specific point from a set of points, e.g., $J^j \in \mathcal{J}$.

Problem Formulation
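A MOO problem can be formulated as
$$\min_{z \in Z} \; J(z) = \begin{bmatrix} J_1(z) & \dots & J_n(z) \end{bmatrix}^\top \quad (1a)$$
$$\text{s.t.} \quad g_j(z) \leq 0, \quad j = 1, \dots, m_{ineq}, \quad (1b)$$
$$h_l(z) = 0, \quad l = 1, \dots, m_{eq}, \quad (1c)$$
where z ∈ Z is the decision variable vector, n is the number of objectives and $m_{ineq}$ and $m_{eq}$ are the numbers of inequality and equality constraints, respectively. Since there typically is no single solution which minimizes all objectives $J_i$ at the same time, the concept of Pareto optimality is used. A solution $z^*$ is Pareto optimal if it is not dominated by any other solution, i.e., there is no solution z for which
$$J_i(z) \leq J_i(z^*) \;\; \forall i \in \{1, \dots, n\} \quad \text{and} \quad J_j(z) < J_j(z^*) \text{ for at least one } j. \quad (2)$$
The Utopia point then consists of the single minima of all objectives and is thus generally not attainable,
$$J_{utopia} = \begin{bmatrix} \min_z J_1(z) & \dots & \min_z J_n(z) \end{bmatrix}^\top. \quad (3)$$
Similarly, the Nadir point is the combination of all objectives' worst values on the front, i.e.,
$$J_{nadir} = \begin{bmatrix} \max_{z \in Z^*} J_1(z) & \dots & \max_{z \in Z^*} J_n(z) \end{bmatrix}^\top, \quad (4)$$
where $Z^*$ denotes the set of Pareto optimal solutions.
In general, three categories of how a solution (or decision) to the MOO problem (1) can be derived exist. A priori (or explicit) methods respect the preferences or interests of the decision maker by calculating Pareto solutions on specific areas of the Pareto front. Interactive (or progressive) methods ask the decision maker for input during the optimization process itself, also to focus on specific areas. A posteriori (or implicit) methods respect the decision maker's preferences only after the Pareto front has been approximated to select a compromise from it. Our approach presented in this paper belongs to the last group, i.e., the a posteriori methods.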
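The definitions of Pareto dominance, the Utopia point and the Nadir point can be illustrated on a finite set of candidate objective vectors. The following is a minimal sketch in Python, assuming minimization of all objectives; the function names and the sample data are hypothetical and not part of the paper. Note that the Utopia and Nadir points are computed here from the sampled front, so they are only as accurate as the extreme points it contains.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b (minimization): nowhere worse, strictly better at least once."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def utopia_point(front: List[Point]) -> List[float]:
    """Best (smallest) value of every objective on the front."""
    return [min(p[k] for p in front) for k in range(len(front[0]))]


def nadir_point(front: List[Point]) -> List[float]:
    """Worst (largest) value of every objective on the front."""
    return [max(p[k] for p in front) for k in range(len(front[0]))]


# Hypothetical objective vectors (J_1, J_2) of five candidate solutions.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = pareto_front(candidates)
print(front, utopia_point(front), nadir_point(front))
```

In this example, (3, 4) is dominated by (2, 3) and (5, 2) by (4, 1), so three solutions remain on the front.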
In the following, we will first shortly explain how an approximation of the Pareto front can be obtained. Afterwards, we focus on the different available decision making strategies to illustrate the necessity and novelty of our proposed approach.

Determining the Pareto Front
Two different main options exist to obtain an approximation of the Pareto front for the multi-objective optimization problem (1): meta-heuristic (evolutionary strategies, genetic algorithms, etc.) or deterministic (mathematical programming) methods. Meta-heuristic methods can be considered the standard choice. Their biggest advantage is that they can be used for any optimization problem, even with black box models, as long as one can evaluate the objective function, e.g., by simulation. However, this comes at the cost of high and possibly unpredictable computation times and the uncertainty regarding whether a global (or even local) optimum has been found. This makes them less apt for the setup considered here, i.e., the repeated solving of multi-objective optimization problems for, e.g., real-time control. Therefore, we omit any further descriptions of meta-heuristic methods here and refer the interested reader to [4,5].
Approximating the Pareto front with deterministic methods means repeatedly solving a single-objective optimization problem with different parameters. These parameters are varied iteratively such that a different point of the Pareto front is determined each time. Such a combination of the objectives into a single scalar objective function (instead of the objective vector as in (1a)) is called scalarization. Two groups of scalarization methods are commonly used for the above purpose. The first utilizes weighted sums, possibly with exponential expressions of the objectives. The second we call, for lack of a better term, intersection methods, since they aim at finding the intersection of the Pareto front with some geometric entity, usually a vector.
The idea to scalarize the multi-objective optimization problem (1) by, for example, maximizing the length of a vector, dates back to the 1970s [6] and has been varied since then [7]. In general, a geometrization of the objective space is used to reformulate the optimization problem such that the actual objective function appears in the constraints only. The normal boundary intersection (NBI) was then introduced as a method of systematically varying the scalarizations to obtain a reasonable approximation of the Pareto front [8]. The procedure is as follows. First, the extreme points have to be determined. Second, a simplex connecting the extreme points is constructed, which is called the convex hull of individual minima (CHIM). Then, this simplex is sampled evenly. This can be expressed with the n × n matrix Φ, whose i-th column is
$$\Phi_i = J(z^{i*}) - J_{utopia}, \quad (5)$$
where $z^{i*}$ denotes the minimizer of the i-th objective. The CHIM is sampled by Φβ with a varying (n × 1)-vector β, s.t.
$$\sum_{i=1}^{n} \beta_i = 1, \quad \beta_i \geq 0. \quad (6)$$
A Pareto solution is then obtained by maximizing the length κ of the CHIM's normal vector $\hat{n}$ pointing toward the Pareto front, with the constraint that the vector's tip ends at the Pareto solution itself. For a given β, the MOO problem (1) is then replaced by
$$\min_{z, \kappa} \; -\kappa \quad (7a)$$
$$\text{s.t.} \quad \Phi\beta + \kappa \hat{n} = J(z) - J_{utopia}, \quad (7b)$$
together with the constraints of (1). Note that (7) is the same if (7a) is replaced by max κ. Furthermore, the optimization problem's solvability might be changed, since all possible nonlinearities are shifted to the constraints instead of the objective function, which is one of its disadvantages, next to its susceptibility to weakly Pareto optimal solutions. The original normal boundary intersection as described here has been modified in different ways since its introduction in 1998 [9][10][11][12]. However, since this is not the focus of this work, we omit a further description of the detailed differences.
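The even sampling of the CHIM can be sketched as follows. This is a minimal illustration, assuming that β is taken from a regular grid on the unit simplex with a user-chosen number of steps; the function names and the grid resolution are hypothetical, and the subsequent optimization over κ is not shown because it requires a solver and a concrete problem.

```python
from itertools import product
from typing import List, Sequence


def chim_weights(n: int, steps: int) -> List[List[float]]:
    """All beta on a regular grid with beta_i >= 0 and sum(beta) == 1."""
    weights = []
    for combo in product(range(steps + 1), repeat=n - 1):
        rest = steps - sum(combo)  # share left for the last entry
        if rest >= 0:
            weights.append([c / steps for c in combo] + [rest / steps])
    return weights


def chim_point(phi_columns: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Phi * beta, i.e., a convex combination of the columns of Phi."""
    n = len(phi_columns[0])
    return [sum(b * col[k] for b, col in zip(beta, phi_columns)) for k in range(n)]


# Hypothetical 2D example: columns of Phi are the shifted extreme points.
phi = [(0.0, 1.0), (1.0, 0.0)]
for beta in chim_weights(n=2, steps=4):
    print(beta, chim_point(phi, beta))
```

For n objectives and a given number of steps, the grid contains as many points as there are ways to distribute the steps over the n entries, e.g., 5 points for n = 2 and 15 points for n = 3 with four steps.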

Decision Making (Choosing a Solution)
Once an approximation of the Pareto front has been obtained, a single solution has to be chosen. To this end, various types of a posteriori methods exist. They can be categorized by whether they
• Select a final solution by themselves or only identify a subset of solutions which are then presented to the decision maker (DM);
• Aim at selecting a good compromise in general (i.e., a compromise solution) or rank the solutions in dependence of the Pareto front's shape (i.e., try to identify a knee point);
• Do or do not incorporate preferences of a decision maker.
In the following, we give an overview of the most prevalent methods in the literature. Note that, however, different combinations of the above categories exist, such that the following order is partially arbitrary.
The most common approach is to select a final (compromise) solution using Euclidean distance-based metrics. For example, LINMAP minimizes the weighted distance to the Utopia point [13]. One could argue that the weights represent the decision maker's preferences. However, since weighting can be problematic in general, frequently, the unweighted but normalized distance is minimized instead [1,14]. TOPSIS is an algorithm which considers both distances to the Utopia and the Nadir point [15].
Fuzzy logic is utilized in many methods for different goals but usually to select a single compromise solution, too. For example, it can be used to address uncertain objectives, constraints or decision variables [16] but also to incorporate preferences from linguistic values [17]. In [18], it is used on top of the concept of k-optimality to loosen the crisp definition of Pareto optimality. Overall, the literature on different fuzzy approaches is rich.
An alternative to fuzzy logic for decision making under uncertainties is Evidential Reasoning [19]. Multiple attributes are weighted according to their importance. For each attribute, possible grades are defined, and the likelihood of a solution's attribute to match them is assessed, e.g., a likelihood of 0.3 to be 'good' and 0.6 to be 'very good'. Then, a single overall score of the solution can be derived, and all Pareto solutions can be ranked accordingly.
Another concept is the use of Shannon Entropy [20]. For each objective, the solutions' entropy is calculated, which depends on their diversity. From these, weights for every objective are derived. Then, the (normalized) solution which fits the weights best is selected.
In contrast to the compromise solutions described above, the possibly more popular aim is the selection of a knee point, which in general is a solution on the Pareto front from which a small improvement in one direction (objective) would lead to a large(r) deterioration in all others. Thus, the shape of the Pareto front is essential.
Different possibilities to define (or find) a knee point exist. Multiple approaches do so based on the point's angle to other parts of the front, e.g., the reflex angle [21], the bend angle [22], the extended angle dominance [23] or the angle utility [23]. Utility-based methods generally define a knee by the best trade-off, i.e., the best ratio of improvements vs. deteriorations compared to all other solutions [22,24,25]. This approach is extended to multiple regions of the Pareto front in [26], i.e., the best trade-off for each region is determined. In [27], knee points are identified by mapping the Pareto front onto a hyperplane. Then, a solution is considered to be a knee point if the other solutions are densely located around it. According to [21], a point is a knee point if it is the result of the optimization of a weighted sum for multiple (different) weight combinations. In an early work, Das [28] characterizes the point with the largest distance to the convex hull of individual minima as the knee.
As an alternative to selecting a single solution (either a compromise solution or a knee point), a subset or multiple subsets of the Pareto front which show knee-like behavior or other properties of interest are often determined and presented to the decision maker. Then, the decision maker has to select a final solution from this subset manually. Note that, as mentioned before, this is not applicable for the use case proposed in this paper. However, since the possibilities to do so are relevant for our proposed approach, we cover the most important methods.
If the assignment of a Pareto solution to the subset of interest is based on a metric as explained above, the subset is usually called the knee region. Examples are the trade-off-based knee region [22] or the bulge of points with the largest distance to the convex hull of individual minima [29].
If the assignment is based on the decision maker's preference, the subset might be called region of interest. In [30], the decision maker defines a cost reference point, i.e., an arbitrary chosen Note that a drawback of this method is that it is unclear how large the region of interest will be. In [31], the decision maker defines a starting point and a preference direction. Then, the part of the Pareto front which lies within a pre-defined preference radius around the preference direction is defined as the region of interest. Again, no final solution is provided, and possible knee points are ignored.
In summary, there are many methods to choose a solution to the MOO problem (1) once an approximation of the Pareto front has been determined. However, there is no method which (1) selects a single solution (instead of a subset of solutions), (2) thereby prefers knee points (if they exist) and (3) at the same time includes preferences of a decision maker in a comprehensible way. We try to fill this gap with the approach explained next.

Proposed Automatized Decision Making Approach
We assume that an approximation $\mathcal{J}$ of the Pareto front is available. Then, the approach consists of two parts. First, the knee region is determined. Second, a final solution is chosen depending on the decision maker's preferences.
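All further calculations are done in the normalized space $\tilde{\mathcal{J}}$. Namely, all objective values $J^i \in \mathcal{J}$ are normalized as
$$\tilde{J}^i_k = \frac{J^i_k - J_{utopia,k}}{J_{nadir,k} - J_{utopia,k}}, \quad k = 1, \dots, n. \quad (8)$$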

Knee Region Determination
For the definition of our knee region, we use a metric similar to [28]; i.e., for each Pareto solution, we calculate its Euclidean distance to a geometric object at the edge of the Pareto front. However, the individual minima (also called extreme points) are often hard to find [24,32]. Thus, instead of maximizing the distance from the convex hull of the extreme points, we use a hyperplane which we refer to as the distance plane in the following,
$$\mathcal{D} = \{ \tilde{J} \in \mathbb{R}^n \mid (-e)^\top (\tilde{J} - \tilde{J}^q) = 0 \}. \quad (9)$$
Note that $\tilde{J}^q \in \tilde{\mathcal{J}}$ is the point of the (normalized) Pareto front with the largest Euclidean distance to the normalized Utopia point $\tilde{J}_{utopia} = [0 \dots 0]^\top$ and that we use the negative unit vector $-e = -1_{n \times 1}$ as the distance plane's normal vector to avoid sensitivity to possibly unreliable extreme points. Then, the distance of every solution $\tilde{J}^i \in \tilde{\mathcal{J}}$ to $\mathcal{D}$ is calculated as
$$d^i = \frac{\left| (-e)^\top (\tilde{J}^i - \tilde{J}^q) \right|}{\lVert e \rVert}. \quad (10)$$
Finally, similar to [29,33], we define the knee region $\tilde{\mathcal{J}}_{knee} \subseteq \tilde{\mathcal{J}}$ according to (11), where $r_{lim} \in [0, 1]$ is a design parameter with which the influence of the decision maker's preferences can be adjusted. Furthermore, (11) can be understood as a bulge of the Pareto front in the direction of the Utopia point, whose size depends on the Pareto front's shape, as illustrated in Figure 2. Note that in contrast to the commentary in [29], while the bulge is hard to comprehend in more than three dimensions, this is not necessary for our approach, since the final decision making is automatized, too. Figure 3 summarizes the procedure. For fronts without a knee point (a), the knee region is larger than for fronts with a knee point (b) for any $r_{lim}$. Note that for convex 2D fronts, the distance plane is equivalent to the convex hull of the minima from [28] (if normalized).
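The knee region determination can be sketched in a few lines. The following is one plausible reading rather than the authors' implementation: it assumes min-max normalization over the sampled front, the distance to a plane through the furthest point with normal vector −e, and a thresholding rule that keeps every solution whose distance is at least r_lim times the largest distance. The thresholding rule and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def normalize(front: List[Sequence[float]]) -> List[List[float]]:
    """Min-max normalization per objective (assumes no objective is constant)."""
    n = len(front[0])
    lo = [min(p[k] for p in front) for k in range(n)]
    hi = [max(p[k] for p in front) for k in range(n)]
    return [[(p[k] - lo[k]) / (hi[k] - lo[k]) for k in range(n)] for p in front]


def plane_distances(norm_front: List[Sequence[float]]) -> List[float]:
    """Distance of each point to the plane through the furthest point, normal -e."""
    n = len(norm_front[0])
    # Furthest point from the normalized Utopia point (the origin).
    q = max(norm_front, key=lambda p: math.sqrt(sum(v * v for v in p)))
    return [abs(sum(q) - sum(p)) / math.sqrt(n) for p in norm_front]


def knee_region(norm_front: List[Sequence[float]], r_lim: float) -> List[int]:
    """Indices with distance >= r_lim * largest distance (assumed rule)."""
    d = plane_distances(norm_front)
    d_max = max(d)
    return [i for i, di in enumerate(d) if di >= r_lim * d_max]


# Hypothetical normalized fronts: one with a pronounced knee, one linear.
kneed = [(0.0, 1.0), (0.1, 0.2), (0.2, 0.1), (1.0, 0.0)]
linear = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(knee_region(kneed, 0.85), knee_region(linear, 0.85))
```

Under these assumptions, the kneed front yields a small region around the bulge, while the linear front has no point that stands out, so the whole front is kept. This matches the qualitative statement that fronts without a knee point lead to a larger knee region.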

Choosing a Solution
After the knee region $\tilde{\mathcal{J}}_{knee}$ has been determined, one of its solutions has to be chosen. First, the preferences of the decision maker are formulated as the preference vector $p \in \mathbb{R}^n_+$ for all n objectives. Since we work in the normalized space, the objectives' possibly different magnitudes can be ignored. Then, p can be interpreted as the normal vector of a hyperplane
$$\mathcal{P} = \{ \tilde{J} \in \mathbb{R}^n \mid p^\top (\tilde{J} - \tilde{J}^b) = 0 \}, \quad (12)$$
where $\tilde{J}^b \in \tilde{\mathcal{J}}_{knee}$ is the hyperplane's base point. In the following, we will refer to $\mathcal{P}$ as the preference plane.
As base point $\tilde{J}^b$, we choose the knee region's solution to which the preference plane is 'tangential', i.e., the solution $\tilde{J}^b \in \tilde{\mathcal{J}}_{knee}$ for which the halfspace bounded by the preference plane lies below all other solutions, such that
$$p^\top (\tilde{J}^i - \tilde{J}^b) \geq 0 \quad \forall \tilde{J}^i \in \tilde{\mathcal{J}}_{knee}. \quad (13)$$
In 2D, this halfspace is the area below a line, and the line passes through $\tilde{J}^b$ and is orthogonal to p. In the unlikely event that multiple solutions in the knee region fulfill (13), any of them can be selected. Figure 4 illustrates different preference planes and the resulting selections for a 2D front; Figure 5 summarizes the selection procedure.
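The tangency condition can be checked numerically: a solution of the knee region whose preference plane has all other knee solutions on or above it is exactly a minimizer of the inner product with p over the knee region. The sketch below assumes this reading; the function name, the sample front and the knee indices are hypothetical.

```python
from typing import List, Sequence


def select_solution(norm_front: List[Sequence[float]],
                    knee_indices: List[int],
                    p: Sequence[float]) -> int:
    """Index of the knee solution at which the plane with normal p is tangential."""
    def score(i: int) -> float:
        return sum(pk * jk for pk, jk in zip(p, norm_front[i]))
    return min(knee_indices, key=score)


# Hypothetical normalized 2D front; indices 1..3 are assumed to form the knee region.
front = [(0.0, 1.0), (0.1, 0.4), (0.2, 0.2), (0.4, 0.1), (1.0, 0.0)]
knee = [1, 2, 3]

for p in [(50, 50), (80, 20), (20, 80)]:
    print(p, front[select_solution(front, knee, p)])
```

A higher preference on the first objective moves the selection toward lower values of the first objective, but never outside the knee region, which is how the knee region limits the influence of the preferences.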

Influence of Imperfect Extreme Points
As stated at the beginning of this section, we assume to have an approximation $\mathcal{J}$ of the Pareto front, which includes the extreme points for all objectives. These may influence the final decision significantly due to the normalization scheme (8). However, the determination of the (real) extreme points is often challenging. Thus, in the following, we analyze the effect of imperfect (i.e., underestimated) extreme points for an artificial Pareto front with significantly different magnitudes of two objectives, given by (14). Since $\lim_{J_1 \to 0} J_2 = \infty$, we restrict $J_1$ to $J_1 \in [0.001, 1]$, which leads to $J_2 \in [1.44, 1000.50]$. The critical extreme point is the one for $J_2$. Thus, we compare the calculated knee regions and selected solutions for various underestimated $J_{extreme,2}$. Figure 6 shows the results for different settings, which illustrate the dependence of the selected solution on the assumed extreme points. However, this is not a specific weakness of the proposed approach but a problem that all decision making approaches presented in Section 2.3 share, since they use a normalization scheme similar to (8) and/or use the extreme points in their utility calculations, e.g., for the angle of a single solution to the extreme points (bend angle).
If, for a specific problem, the accurate determination of the extreme points is problematic and the objectives' magnitudes differ significantly, it might be beneficial to normalize the Pareto front with fixed values instead of the dynamic normalization in dependence of the extreme points. Values for such a fixed normalization scheme can be obtained from long-term simulations, as is explained in [2].
Figure 6. Knee regions and selected solutions for the artificial Pareto front (14) for various underestimated $J_{extreme,2}$ and three different preference settings. $r_{lim} = 0.85$. As expected, both the knee region and the final decision shift to the right, i.e., to higher values of $J_1$, for lower $J_{extreme,2}$. The shift is more severe for higher preferences on $J_2$ (see magenta square). (a) Complete Pareto front with $J_{extreme,2} = 1000.50$, (b-f) incomplete Pareto fronts with underestimated $J_{extreme,2}$ from 750 to 50.

Discussion of the Preference for Knee Points
As stated in Section 2.3, Ref. [21] defines the knee point of a 2D MOO problem as the point which is the solution for the most $\lambda_i$ in the weighted sum
$$\min_z \; \lambda_i J_1(z) + (1 - \lambda_i) J_2(z), \quad (15)$$
where $\lambda_i$ is chosen from a large but finite set $\Lambda \subseteq [0, 1]$. However, as illustrated in [22] (Figure 1) and explained in [34], the minimization of (15) can be interpreted as shifting a plane with angle $\alpha(\lambda_i)$ to the origin until it is tangential to the Pareto front. Furthermore, this interpretation is also applicable with n-dimensional hyperplanes, see e.g., [35]. Thus, our approach of constructing a hyperplane a posteriori and choosing the solution at which it is tangential to the Pareto front inherently prefers knee points, since multiple preferences p (and thus preference planes) will satisfy (13) for the same $\tilde{J}^i$ if it is a knee point (as the small illustration in Figure 4b suggests). However, note that this does not allow the conclusion that our approach could be replaced by solving a weighted sum with the according weights instead. First, the reduction of possible decisions to a knee region prevents too extreme (and thus uninteresting) points from being selected, independently of the formulated preferences. Second, our approach allows us to use an approximation of the Pareto front which can be derived from any method, not just from the minimization of a weighted sum.
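The claim that a knee point is the minimizer of the weighted sum for many different weights can be illustrated on a small sampled front. The sketch below assumes a discrete 2D front and an equidistant grid of 21 weights; both the data and the function name are hypothetical.

```python
from typing import List, Sequence


def weighted_sum_winner(front: List[Sequence[float]], lam: float) -> int:
    """Index of the front point minimizing lam * J_1 + (1 - lam) * J_2."""
    return min(range(len(front)),
               key=lambda i: lam * front[i][0] + (1 - lam) * front[i][1])


# Hypothetical normalized front with a knee at (0.2, 0.2).
front = [(0.0, 1.0), (0.1, 0.4), (0.2, 0.2), (0.4, 0.1), (1.0, 0.0)]
lambdas = [k / 20 for k in range(21)]

counts = [0] * len(front)
for lam in lambdas:
    counts[weighted_sum_winner(front, lam)] += 1

print(counts)
```

The point at the knee collects the largest share of the weights, which is the same effect that makes many different preference planes tangential to the front at a knee point.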

Focus Point Boundary Intersection Method
In the following, we present an adaptation of the normal boundary intersection method. It is more apt for the proposed setup of multi-objective optimization in combination with Model Predictive Control. Namely, due to varying conditions over time, objectives may correlate sometimes. This would lead to a degenerate Pareto front [36]. Even if they do not correlate perfectly, some extreme points may end up very close to each other. If this is the case for two out of three objectives, the resulting simplex (i.e., the convex hull of individual minima (CHIM)) is a very narrow triangle. Then, in combination with the search direction being strictly orthogonal to the simplex, this might lead to almost no real Pareto solutions being found.
Thus, we propose the focus point boundary intersection (FPBI) method. In contrast to the normal boundary intersection, it (1) constructs a hyperplane which depends less on the extreme points and (2) enables the decision maker to define a search direction to increase the probability of finding solutions in the area of interest. If no specific goal is available, we use the Utopia point. Figure 7 gives an overview of the procedure, which is as follows. We assume that the Pareto front's extreme points $\{J_{extreme,1}, \dots, J_{extreme,n}\}$ are known. Moreover, all further calculations are again done after the normalization $J \to \tilde{J}$ of the solutions as in (8), such that each objective lies within [0, 1] in the normalized space $\tilde{\mathcal{J}}$.
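First, we determine the extreme points a, b between which the distance is the longest,
$$(a, b) = \arg\max_{i, j \in \{1, \dots, n\}} \lVert \tilde{J}_{extreme,i} - \tilde{J}_{extreme,j} \rVert. \quad (16)$$
With a, b known, we determine the center point between them,
$$\tilde{J}_{center} = \tfrac{1}{2} \left( \tilde{J}_{extreme,a} + \tilde{J}_{extreme,b} \right). \quad (17)$$
The search direction is then defined from $\tilde{J}_{center}$ to the focus point,
$$n_s = \frac{\tilde{J}_{focus} - \tilde{J}_{center}}{\lVert \tilde{J}_{focus} - \tilde{J}_{center} \rVert}. \quad (18)$$
If no specific focus point is given, $\tilde{J}_{focus} = \tilde{J}_{utopia} = [0, \dots, 0]^\top$ is used, which usually gives good results.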
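The first steps of the method (farthest pair of extreme points, center point, search direction) can be sketched as follows. This is a minimal illustration, assuming normalized extreme points, a midpoint as the center and a search direction scaled to unit length; the function names and the sample extreme points are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def farthest_pair(extremes: List[Sequence[float]]) -> Tuple[int, int]:
    """Indices (a, b) of the two extreme points with the largest distance."""
    return max(combinations(range(len(extremes)), 2),
               key=lambda ab: math.dist(extremes[ab[0]], extremes[ab[1]]))


def center_and_direction(extremes: List[Sequence[float]],
                         focus: Optional[Sequence[float]] = None):
    """Center between the farthest pair and unit search direction toward the focus."""
    a, b = farthest_pair(extremes)
    center = [(x + y) / 2 for x, y in zip(extremes[a], extremes[b])]
    if focus is None:
        focus = [0.0] * len(center)  # normalized Utopia point as default focus
    diff = [f - c for f, c in zip(focus, center)]
    norm = math.sqrt(sum(v * v for v in diff))
    return center, [v / norm for v in diff]


# Hypothetical normalized extreme points; the second and third lie close together,
# as may happen when two objectives correlate.
extremes = [(0.0, 1.0, 1.0), (1.0, 0.0, 0.9), (1.0, 0.05, 0.0)]
center, n_s = center_and_direction(extremes)
print(farthest_pair(extremes), center, n_s)
```

Because only the farthest pair enters the construction, two extreme points that nearly coincide do not collapse the sampling plane the way they would collapse the CHIM.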
The main idea is to use a hyperplane between the farthest extreme points (a, b), sample it equidistantly in every direction, and then solve an optimization problem similar to the one in the normal boundary intersection method, i.e., maximizing the length of a vector with the direction $n_s$ from the hyperplane to the Pareto front. $\tilde{J}_{center}$ is used as the base vector of the hyperplane. Thus, we further need n − 1 (orthonormal) direction vectors $d_i$ to describe it. For n = 2 objectives, the connection between the two extreme points already constitutes the hyperplane and is its only direction vector $d_1$. For n = 3 objectives, the necessary second direction vector can directly be determined as the cross product of the search direction and the first direction vector, $d_2 = n_s \times d_1$. For n ≥ 4 objectives, we have additional degrees of freedom. For ease of representation, assume that a = 1, b = 2. This is no limitation, since it can be achieved by simple (temporary) re-ordering. Then, we first construct n − 2 auxiliary direction vectors $\hat{d}_j$ from the extreme points, since we can assume that the resulting vectors are linearly independent.
The direction vectors are then determined in increasing order by subsequently calculating the cross product of the search direction vector $n_s$, the already known direction vectors $d_i$ and the auxiliary direction vectors $\hat{d}_j$ for all other directions. To increase readability, we use the symbol $\bigtimes$ for the cross product of multiple vectors in the following, with which the ℓ-th direction vector is determined from these n − 1 vectors. The generalized cross product of n − 1 vectors $v_1, \dots, v_{n-1}$ can be calculated as the determinant of an extended matrix, i.e.,
$$\bigtimes(v_1, \dots, v_{n-1}) = \det \begin{bmatrix} \vec{e}_1 & v_{1,1} & \cdots & v_{n-1,1} \\ \vdots & \vdots & & \vdots \\ \vec{e}_n & v_{1,n} & \cdots & v_{n-1,n} \end{bmatrix}. \quad (23)$$
Note that we exceptionally use the vector symbol $\vec{e}_i$ here to emphasize that these are the unit vectors, e.g., $\vec{e}_1 = [1, 0, \cdots, 0]^\top$, and not scalar values. Equation (23) can be solved by using the Laplace expansion along the first column. In doing so, the purpose of the unit vectors becomes clear, too: they transform the minors into a vector again.
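The generalized cross product via Laplace expansion along the first column can be sketched as follows. The code is a minimal illustration for small n using a recursive determinant; the function names are hypothetical, and the sign convention is chosen such that the result coincides with the ordinary cross product for n = 3.

```python
from typing import List, Sequence


def det(m: List[List[float]]) -> float:
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c, val in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * val * det(minor)
    return total


def generalized_cross(vectors: List[Sequence[float]]) -> List[float]:
    """Vector orthogonal to the given n - 1 vectors in n dimensions."""
    n = len(vectors) + 1
    # Row k holds the k-th entries of all vectors (the columns right of e_k).
    rows = [[v[k] for v in vectors] for k in range(n)]
    # Expanding along the unit-vector column: entry k is the signed minor
    # obtained by deleting row k.
    return [(-1) ** k * det(rows[:k] + rows[k + 1:]) for k in range(n)]


print(generalized_cross([[1, 0, 0], [0, 1, 0]]))
vs = [[1, 2, 3, 4], [0, 1, 0, 2], [2, 0, 1, 1]]
w = generalized_cross(vs)
print(w, [sum(a * b for a, b in zip(w, v)) for v in vs])
```

Each entry of the result is one signed minor, which is the role of the unit vectors in the extended matrix: they collect the minors into a vector that is orthogonal to all input vectors.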
With the hyperplane defined by $\tilde{J}_{center}$, $n_s$ and the direction vectors, we need to sample it to determine starting points for the optimization problem. Here, the user can control the resolution by defining a number $r_F$ of steps along each direction. Thus, the total number of optimization problems is $r_F^{\,n-1}$. The step sizes are collected in a 1 × $r_F$ step size vector γ. Let $s_i \in \{1, \dots, r_F\}$ for i = 1, …, n − 1 be the sample indices along the n − 1 direction vectors. For a combination $(s_1, s_2, \dots, s_{n-1})$, the (n × 1)-dimensional starting vector in the optimization problem is then given by
$$\Theta(s_1, s_2, \dots, s_{n-1}) = \tilde{J}_{center} + \sum_{i=1}^{n-1} \gamma_{s_i} d_i.$$
The corresponding optimization problem (27) again minimizes −κ. Note that we use ≥ instead of = in the constraint (27b), since this led to faster convergence in practice.

Exemplary Case Studies
In this section, we apply the proposed decision making approach to two exemplary systems in simulation, i.e., the rocket car example and the energy management system of a building, and compare it to a simpler baseline approach. Both systems are controlled using Model Predictive Control. Before that, we comment on the combination with Model Predictive Control in general.

Remarks on the Consequences of the Application within MPC
The proposed decision making algorithm is well suited to combine multi-objective optimization with MPC. However, as with most multi-objective MPC schemes, theoretical properties such as stability or feasibility become hard to prove. Some works do so for a specific MOO setting.
For example, in [37], a general MOO MPC scheme for nonlinear systems is proposed. They consider a finite number of objectives and show that, given some mild assumptions in addition to the usual, the max() of all objectives as the cost function can be used as a Lyapunov function to guarantee stability. In [38], a weighted sum is used. However, the weights are updated in every time step, thus choosing different Pareto solutions. It is shown that under some conditions on the objectives, e.g., joint convexity, closed-loop stability can be guaranteed. However, for the updating of the weights, a linear programming problem which is not jointly convex in general has to be solved in every time step. An economic MPC scheme with a compromise solution is formulated in [39]. Namely, the authors directly minimize the (unweighted) distance to the Utopia point. However, they only consider steady-state control and show that if the objectives satisfy a Lipschitz continuity property and strong duality, stability can be guaranteed.
To employ more sophisticated (and possibly interactive) MOO schemes such as the one proposed here in combination with MPC, we suggest ensuring stability indirectly for systems such as the one presented in Section 5.4. First, the proposed algorithm should be used only for systems with inherently stable system dynamics. For example, for a discrete linear system with the system matrix A, states x(k), input matrix B and inputs u(k), the autonomous subsystem x(k + 1) = Ax(k) should be Lyapunov stable. Second, the constraints x ∈ X, u ∈ U should be chosen such that every feasible state is acceptable and that for every x(k) ∈ X, a feasible solution exists such that $x(k + 1), \dots, x(N_{pred}) \in X$ and $u(k), \dots, u(N_{pred} - 1) \in U$. If so, the optimal control problem is always feasible, independently of the previously chosen solution.

Comparison Approaches
To compare the effectiveness of our proposed approach in the following examples, we present a simpler strategy as a baseline. Assume n = 3 objectives and p = [20 %, 70 %, 10 %]. The preferences determine the order in which the objectives are considered in the following. For the above example, all Pareto solutions would be ranked by J 2 first. Then, the worst 70 % (in terms of J 2 ) are removed from the set of possible solutions. Next, the remaining solutions are ranked by J 1 , from which the worst 20 % (in terms of J 1 ) are then removed. Finally, the remaining solutions are ranked by J 3 and the solution which is better than the worst 10 % (in terms of J 3 ) is selected.
In the simulation results presented in Sections 5.3.3, 5.4.3 and 5.4.4, we also optionally combine this simple approach with the limitation to a knee region as proposed in Section 3.1. If so, the knee regionJ is determined first as usual, and then, we select a solution fromJ by the rules described above (instead of using the preference plane).
Note that the preferences have to be normalized first, such that $\sum_{i=1}^{q} p_i = 100\,\%$. If only n = 2 objectives are considered, the solution which splits the set in terms of the preferences can be selected directly.
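The baseline procedure described above can be sketched in Python as follows. This is a minimal sketch, not the implementation used for the simulations: the function name `baseline_select`, the rounding of the number of removed solutions, and the rule that at least one solution always remains are assumptions made for illustration. Ties in the preferences are resolved in the order of the objectives. The direct 2D shortcut is not treated separately.

```python
import numpy as np

def baseline_select(costs, prefs):
    """Select one Pareto solution with the simple baseline rule.

    costs: (m, n) array, one row per Pareto solution (minimization).
    prefs: length-n preferences; normalized here to sum to 1 (100 %).
    Returns the row index of the selected solution.
    """
    costs = np.asarray(costs, dtype=float)
    prefs = np.asarray(prefs, dtype=float)
    prefs = prefs / prefs.sum()
    # Objectives are processed in order of decreasing preference;
    # the stable sort keeps the objective order for equal preferences.
    order = np.argsort(-prefs, kind="stable")
    candidates = np.arange(costs.shape[0])
    for step, obj in enumerate(order):
        # Rank the remaining candidates by the current objective, best first.
        ranked = candidates[np.argsort(costs[candidates, obj], kind="stable")]
        # ASSUMPTION: the share of removed solutions is rounded to the
        # nearest integer, and at least one candidate is always kept.
        n_worst = min(int(round(prefs[obj] * len(ranked))), len(ranked) - 1)
        keep = ranked[: len(ranked) - n_worst]
        if step == len(order) - 1:
            # Last objective: take the solution just better than the
            # worst share, i.e., the worst of the kept candidates.
            return int(keep[-1])
        candidates = keep
```

For p = [20 %, 70 %, 10 %], the sketch first discards the worst 70 % in terms of J 2, then the worst 20 % of the remainder in terms of J 1, and finally picks the solution that is just better than the worst 10 % in terms of J 3.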

Example 1: Rocket Car
As a toy example, we first apply the proposed approach to the so-called rocket car, which is controlled using multi-objective Model Predictive Control. In the following, we describe the system dynamics and the resulting optimization problem and compare the simulation results of our proposed decision making approach to the simpler baseline approaches presented above.

System Description
We consider the rocket car in two dimensions. Thus, it consists of two separate double integrators. Its coordinates are z 1, z 2, and the corresponding velocities are v z 1, v z 2. Together, they form the state vector x = (z 1, z 2, v z 1, v z 2). It has two inputs, which are the accelerations in both directions, u = (a z 1, a z 2). The dynamics are described by the time-continuous linear state space system

$$\dot{x} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} x + \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} u.$$

The discretization with the sampling time T s leads to the discrete linear state space system (30), which shall be driven into the set point (z 1,goal, z 2,goal) = (10, 5) using Model Predictive Control. As the prediction horizon, we choose N pred = 20 steps with T s = 0.5 s. In the following, we denote the global time step by k and time steps within the prediction horizon by i. We optimize two competing objectives, i.e., the deviation of the current position from the set point, J pos, and the energetic expense, J energy. We force the system to be in a box of side-length 0.2 around (z 1,goal, z 2,goal) at the end of the prediction horizon by terminal constraints. Additionally, both the velocities and the accelerations are limited. Together with the discrete dynamics (30), these objectives and constraints define the multi-objective optimal control problem.
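To make the double-integrator structure concrete, the following Python sketch builds the continuous-time matrices implied by the state and input definitions above and discretizes them. Note the assumption: a zero-order hold on the inputs is used here, which may differ from the discretization behind the discrete system (30). The function name `rocket_car_zoh` is hypothetical. Because the continuous system matrix is nilpotent (its square is zero), the matrix exponential reduces to a finite sum, so no numerical integration is needed.

```python
import numpy as np

def rocket_car_zoh(Ts):
    """Zero-order-hold discretization of two decoupled double integrators.

    State x = (z1, z2, v_z1, v_z2), input u = (a_z1, a_z2).
    ASSUMPTION: zero-order hold; the original discretization may differ.
    """
    A = np.array([[0, 0, 1, 0],
                  [0, 0, 0, 1],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]], dtype=float)
    B = np.array([[0, 0],
                  [0, 0],
                  [1, 0],
                  [0, 1]], dtype=float)
    I = np.eye(4)
    # A @ A == 0, hence exp(A*Ts) = I + A*Ts exactly.
    Ad = I + A * Ts
    # Integral of exp(A*t) over [0, Ts] is I*Ts + A*Ts^2/2.
    Bd = (I * Ts + A * Ts**2 / 2) @ B
    return Ad, Bd

Ad, Bd = rocket_car_zoh(0.5)  # sampling time T_s = 0.5 s as in the text
```

With T s = 0.5 s, each position is advanced by 0.5 times its velocity plus 0.125 times the applied acceleration per step, and each velocity by 0.5 times the acceleration.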

Implementation
Both system dynamics and Pareto optimization, i.e., the determination of the Pareto fronts and the automatized selection as described in Section 3, have been implemented with the MATLAB MPC framework PARODIS [40]. The approximation of a single Pareto front with the focus point boundary intersection method from Section 4, resulting in 19 Pareto optimal solutions at each time step, takes ≈0.8 s on a single core of an Intel i7-8550U CPU with 1.80 GHz. An entire simulation with 40 time steps takes ≈32 s.
Note that for all results presented in the following, the minimum possible costs for each objective have been subtracted. Namely, we run the simulations with each objective separately to obtain the lowest values which cannot be avoided, which are $(J_{\mathrm{pos}}^{\min}, J_{\mathrm{energy}}^{\min}) = (632.29, 0.0364)$. This way, the effect of the preferences can be interpreted appropriately.

Simulation Results
We vary the preference p pos on J pos from 0 to 100 and set the preference on J energy accordingly, such that p pos + p energy = 100. The simulation results are shown in Figure 8. Figure 8a,b show the resulting costs for the simple baseline approach described in Section 5.2. The position costs are extremely high for p pos ≤ 10, reaching 1247, but they decrease with increasing p pos. However, the energy costs show an unexpected increase at p pos = 60, i.e., they are higher for p pos = 60, p energy = 40 than for p pos = 70, p energy = 30, which is unwanted.
If the simple baseline approach is combined with the prior limitation to the knee region as in Figure 8c,d, the extreme position costs are limited to 230. The unwanted bump in the energy costs for p pos = 60 disappears, too, i.e., the long-term results show a better representation of the preferences. However, the transitions between the preference settings are still not smooth. For example, the results for p pos = 0, 10, 20 are all the same, and the differences in J pos when increasing p pos from 20 → 30 → 40 are high, low and again high.
The proposed decision making approach in Figure 8e,f shows a smoother and more predictable behavior. The effect of the limitation to the knee region is still observable, since the results for the extreme preference settings p pos = 0, 10, 20 are close. However, when p pos is further increased, the quadratic nature of J pos is observable. The energy costs J energy also increase smoothly with every increase in p pos. In conclusion, our proposed approach represents the preferences in the long-term costs most predictably.

Example 2: Building Energy Management System
As a more sophisticated example, the energy management system of an office building is controlled using multi-objective MPC. Note that the energy management problem for buildings or microgrids has been a popular application for multi-objective optimization both in design [41] and in operation [42]. This is due both to the necessity of respecting multiple criteria and to the relatively large step sizes, which make the use of computationally expensive optimization methods possible.

System Description
For the considered office building, the system states are the building temperature ϑ b and the stored energy E of a stationary battery. The controllable inputs are an air conditioning unit Q̇ cool, a gas radiator Q̇ heat, a combined heat and power plant P chp and the connection to the public electricity grid P grid. The building's electricity demand P dem, a photovoltaic plant P ren and the outside air temperature ϑ air are modeled as uncontrollable disturbances. For details, the reader is referred to [1,2]. The building is controlled using MPC with a time horizon of 24 h (split into N p = 48 steps of T s = 0.5 h) and up to three objectives, i.e., monetary, comfort and degradation costs.

[Figure 8: Rocket car example, position vs. energy costs; panel title: Simple Baseline Approach.]

Every cost term is calculated over the entire prediction horizon. The monetary costs consist of gas costs for the CHP and the heating, and electricity costs (or profits) from buying (or selling) power P ren to the public grid [2]. For c grid (i), real-world data of the German intraday market from July 2018 is used. In this period, the costs for 1 kWh varied from 0.003 to 0.098 € with an average of 0.0494 €.
The comfort costs describe the quadratic deviation from a desired temperature set point. The third objective consists of the main factors of battery degradation, i.e., the energy throughput, the charging rate and the average state of charge [43,44], with the weights w bat,E = 10, w bat,CR = 0.1 and w bat,SoC = 1, and with C bat being the battery capacity and P charge,max being the maximum charging rate. The charging power is not a decision variable by itself, as it is implicitly determined by

$$P_{\mathrm{charge}}(i) = P_{\mathrm{grid}}(i) + P_{\mathrm{chp}}(i) + \frac{\dot{Q}_{\mathrm{cool}}(i)}{\varepsilon_c} + P_{\mathrm{ren}}(i) + P_{\mathrm{dem}}(i), \tag{44}$$

where ε c is the energy efficiency ratio of the cooling machine.

Implementation
Again, both system dynamics and Pareto optimization have been implemented with the MATLAB MPC framework PARODIS [40]. All results presented in the following are derived from simulations of a time frame of 30 days with real-world data from July 2018, i.e., for the intraday electricity price c grid , the building's power demand P dem and the outside air temperature ϑ air . For the determination of the Pareto front approximations, the focus point boundary intersection method from Section 4 is used. In the 3D case, it formulates 378 single optimization problems in every time step, and one simulation with its 30 · 48 = 1440 time steps in total takes about 2.5 h on a single core of an Intel Xeon CPU E5-1607 v4 with 3.10 GHz. For the 2D case, the simulation time is reduced to approx. 9 min.
Note that for all results presented in the following, the minimum possible costs for each objective have been subtracted. Namely, we run the simulations with each objective separately (e.g., w mon = 1, w comf = 0, w bat = 0 if only monetary costs are to be minimized) to obtain the lowest values, which cannot be avoided. In this way, the effect of the preferences can be interpreted appropriately.

Simulation Results for 2 Objectives
For the 2D simulations, we vary p mon from 0 to 100 while p mon + p comf = 100. The results are shown in Figure 9. Figure 9a,b show the monetary and comfort costs for the simple baseline approach. The preferences are respected, i.e., every increase in p mon leads to a decrease in monetary costs and consequently to an increase in comfort costs. However, the trade-offs for higher preference values are extreme, especially the resulting comfort costs for p mon ≥ 80. This can be overcome by limiting all possible selections to the knee region as we propose. If so, even the simple selection shows good results in the 2D case; see Figure 9c,d. The highest comfort costs are limited to ≈100, instead of >4000. Note that for p mon = {0, 10, 20} and p mon = {80, 90, 100}, respectively, the results are (nearly) the same, because the knee regions were so small that rounding (nearly) always selected their extreme points. This would be different for denser samplings.

[Figure 9: Building energy management system, monetary vs. comfort costs; panel title: Simple Baseline Approach.]
The proposed approach (Figure 9e,f) also incorporates the preferences in the long-term costs as expected. Furthermore, the knee region limitation leads to identical results only for p mon = {0, 10} and p mon = {80, 90, 100}. However, here, this is not due to the sampling density and rounding but is rather intended behavior. Namely, the resulting preference planes are so steep that they choose the extreme points of the knee region every time. Note that this would change for increasing knee region sizes, i.e., for r lim < 0.85.
Concluding, in the 2D case, the simple baseline approach is inappropriate for the dynamic decision making due to choices and trade-offs which are too extreme if the preferences are not set cautiously. The limitation of possible selections to a knee region, i.e., the first step of our two-step approach, can overcome this problem even in combination with a simpler selection technique than the proposed preference hyperplane (the second step of our proposed approach). However, most likely, this only holds because the occurring Pareto fronts are all convex. Furthermore, in the following, we will see that the selection based on the preference hyperplane is superior if three objectives are considered.

Simulation Results for 3 Objectives
The battery degradation costs are now considered as an additional third objective. Since only the relationship between the elements of the preference vector p = (p mon, p comf, p bat) is relevant, we vary both p mon and p comf over {25, 50, 75, 100}, while we keep p bat = 50 constant.

Figure 10a-c show the simulation results for the simple baseline approach. As in the 2D case, the costs become extreme for larger differences between the preferences, especially the comfort costs in Figure 10b. Furthermore, in contrast to the 2D case, the resulting long-term costs do not follow the preferences as expected. For example, in Figure 10a, the monetary costs are first halved if the preferences are changed from (p mon, p comf, p bat) = (25, 100, 50) to (50, 100, 50), but then they increase for (75, 100, 50). These considerable jumps and changes in direction can partly be explained by the necessary ordering in the algorithm. Namely, the order in which the objectives are considered in removing parts of the Pareto front is relevant. For equal preferences of two objectives, J mon is respected before J comf, which is respected before J bat. However, this does not explain all of the unwanted behavior. Consider the row for p comf = 50 in Figure 10a. The monetary costs increase instead of decreasing if p mon is increased from 50 to either 75 or 100, although the order in which the objectives are considered is the same, i.e., first J mon, then J comf and then J bat. The battery costs in Figure 10c are even more erratic. For p comf = 25, they decrease instead of increasing with increasing p mon, and they show drastic jumps in general.

Figure 10d-f show the simulation results for the simple baseline approach if the selection is limited to the knee region. As expected, the extreme solutions are avoided, i.e., the maximum comfort costs are reduced from 4040.14 to 271.12, and the battery costs are reduced from 387.56 to 86.49. However, the unwanted behavior is otherwise mostly the same. In contrast to the 2D case, the limitation to the knee region in combination with the simple baseline approach is not sufficient for an appropriate representation of the preferences in the long-term simulation costs.

Figure 10g-i show the simulation results for our proposed approach. In contrast to the baseline approach, the long-term costs for the monetary and comfort objectives differ when varying p mon and p comf, just as expected. The jumps between the different preference settings are smaller and more evenly distributed. Every increase in a preference leads to a decrease in the corresponding long-term costs and vice versa.
For the battery costs, some simulations still show unexpected results, e.g., the total J bat is slightly lower for p = (100, 50, 50) than for p = (100, 75, 50). However, this can be explained by the weak influence of J bat. Battery and comfort costs are nearly independent and only implicitly linked via the monetary costs, or possibly if P grid were at its limit. The monetary costs are in direct conflict with the battery costs because they can be reduced by buying energy at lower prices, storing it temporarily and selling it at higher prices. However, the assumed battery capacity and charging power are so low that the vast majority of possible monetary costs are due to the possible (but not necessary) cooling and heating of the building. Thus, the Pareto fronts become extremely steep, as the example in Figure 11 shows. The Pareto fronts are almost degenerate [36].

Figure 10. Monetary, comfort and battery degradation costs for the 30 days of 3D simulations with different preferences p mon and p comf and p bat = 50 for (a-c) the simple baseline approach, (d-f) the simple baseline approach with limitation to the knee region and (g-i) the proposed preference-based decision making approach, with r lim = 0.85 for the latter two cases. Note the different camera angles for better readability and especially the inverted axis for p comf in (c,f,i). Subtracted minimum costs for each objective have been determined by single-objective optimizations.
However, our approach still handles this problem sufficiently well, as Figure 10i shows a clear trend of increasing costs J bat from p mon = p comf = 25 to p mon = p comf = 100. Furthermore, in contrast to the baseline approach (even with the limitation to the knee region), the battery costs are significantly lower, with a maximum of 13.18 instead of 86.49 overall. The long-term results can actually be considered better overall, as our approach outperforms both the simple approach (e.g., (25, 25, 50) vs. (75, 100, 50)) and the simple approach with prior limitation to the knee region (e.g., (75, 25, 50) vs. (55, 25, 50)) for some preference combinations.

Figure 12 shows how r lim affects the possible influence of the decision maker. For every r lim, we simulated the three possible extremes p 1 = (1, 0, 0), p 2 = (0, 1, 0) and p 3 = (0, 0, 1) and calculated the maximum difference for each objective, e.g.,

$$\Delta J_{\mathrm{mon}}(r_{\mathrm{lim}}) = \max_{p \in \{p_2, p_3\}} J_{\mathrm{mon}}(r_{\mathrm{lim}}, p) - J_{\mathrm{mon}}(r_{\mathrm{lim}}, p_1).$$
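The maximum difference per objective can be computed from the long-term costs of the three extreme-preference simulations as in the following Python sketch. The function name `max_cost_difference` and the data layout are assumptions made for illustration, and the numbers in the usage line are purely hypothetical placeholders, not simulation results.

```python
def max_cost_difference(long_term_costs, objective):
    """Maximum difference of one objective over the extreme preferences.

    long_term_costs[p][j] is the long-term cost of objective j obtained
    when only objective p is preferred (ASSUMED data layout), for one
    fixed value of r_lim.
    """
    # Cost of the objective when it is the only preferred one.
    own = long_term_costs[objective][objective]
    # Costs of the same objective under the other extreme preferences.
    others = [row[objective]
              for p, row in enumerate(long_term_costs) if p != objective]
    return max(others) - own

# Hypothetical long-term costs, rows: p1, p2, p3; columns: J_mon, J_comf, J_bat.
example = [[10.0, 200.0, 30.0],
           [50.0, 20.0, 25.0],
           [40.0, 150.0, 5.0]]
delta_mon = max_cost_difference(example, 0)
```

Repeating this for every simulated r lim gives one curve per objective, which is the kind of quantity plotted over r lim.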

[Figure 11: Exemplary preference planes on a 3D Pareto front.]
The difference in monetary costs shown in Figure 12a is nearly (anti)proportional to r lim. The possible differences ∆J comf seem to decrease quadratically with an increasing r lim in Figure 12b, which is probably due to the quadratic form of J comf (42). The battery costs in Figure 12c again have an outlier for r lim = 0.75, which can be explained by the bad conditioning of J bat in comparison to J comf as discussed before. However, the trend of the decrease in ∆J bat with an increasing r lim is clear, too. The average number of Pareto points which are determined as part of the knee region correlates nearly linearly with r lim, as Figure 12d shows. However, this depends on the shapes of the Pareto fronts and cannot be generalized.

Conclusions
We presented a two-step approach for automated decision making from an available Pareto front. It allows a decision maker to formulate the preferences of each objective independently of their scales. At the same time, it (1) ensures that only good compromises can be selected by limiting possible choices to a knee region, which (2) depends on the Pareto front's shape, (3) gives the decision maker a design parameter with which he can comprehensibly choose a priori how strong his influence should be, and (4) has a built-in proclivity for knee points, if they exist. Thus, it enables the use of MOO in continuous processes where decisions have to be made repeatedly, such as in multi-objective (economic) MPC, where varying circumstances may lead to very different possible decisions regularly. The simulation results of a toy example as well as a more sophisticated case study of a building energy management system showed superior results in comparison to simpler selection techniques especially for n = 3 objectives.

Conflicts of Interest:
This study has been designed and performed as part of the first author's PhD project at the Technical University of Darmstadt, which has been supported financially by the Honda Research Institute Europe GmbH. The third author was a co-supervisor of the PhD project. There are no financial interests associated with the results of the study. The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: