Combinatorial Integral Approximation Decompositions for Mixed-Integer Optimal Control

: Solving mixed-integer nonlinear programs (MINLPs) is hard from both a theoretical and practical perspective. Decomposing the nonlinear and the integer part is promising from a computational point of view. In general, however, no bounds on the objective value gap can be established and iterative procedures with potentially many subproblems are necessary. The situation is different for mixed-integer optimal control problems with binary variables that switch over time. Here, a priori bounds were derived for a decomposition into one continuous nonlinear control problem and one mixed-integer linear program, the combinatorial integral approximation (CIA) problem. In this article, we generalize and extend the decomposition idea. First, we derive different decompositions and analyze the implied a priori bounds. Second, we propose several strategies to recombine promising candidate solutions for the binary control functions in the original problem. We present the extensions for ordinary differential equations-constrained problems. These extensions are transferable in a straightforward way, though, to recently suggested variants for certain partial differential equations, for algebraic equations, for additional combinatorial constraints, and for discrete time problems. We implemented all algorithms and subproblems in AMPL for a proof-of-concept study. Numerical results show the improvement compared to the standard CIA decomposition with respect to objective function value and compared to general-purpose MINLP solvers with respect to runtime.


General Context
The goal of optimal control is to find control functions and state trajectories that are feasible and optimal. State trajectories are solutions of systems of differential equations for given control functions and boundary conditions. Feasibility refers to constraints on control functions and differential states; optimality refers to an objective functional of controls and states. Mixed-integer optimal control problems (MIOCPs) do have additional integrality constraints on some of the control functions. Hence, the differential equations of the underlying system depend on the value of integer control functions, or equivalently, one can switch instantly between several differential equations [1][2][3][4]. This problem class is ubiquitous in various application areas, e.g., water, gas, traffic, and supply chain networks [5][6][7][8][9][10][11], distributed autonomous systems [12], processes in chemical engineering that involve valves [13,14], cardiac assist devices [15], or the choice of gears in automotive control [16,17]. We choose problems from a web-based MIOCP benchmark collection [18] to evaluate our algorithms.
Several families of methods have been proposed for solving MIOCPs. This comprehends indirect or first-optimize-then-discretize approaches [19], dynamic programming or Hamilton-Jacobi-Bellman equations [20], switching time optimization [16,[21][22][23][24][25], and direct αpτq´ωpτq dτˇˇˇˇˇˇˇˇ. (1) This motivates us to solve the relaxed problem (OCP) to obtain αp¨q, to calculate an integer control function ω in a second step such that the maximum on the right-hand side of (1) is as small as possible, and to obtain yp¨q by solving the initial value problem (IVP). The continuity of constraint and objective functions implies "good" behavior with respect to feasibility and objective function value in the original MIOCP. The combinatorial integral approximation (CIA) problem in the second step can be formulated as an MILP [32] and can be solved either with generic MILP methods, with specifically tailored branch-and-bound methods [32], and in some cases even in linear time, with the sum-up rounding (SUR) method [27].

Review of the State of the Art
The idea of CIA decomposition has also been used in the context of hyperbolic partial differential equations [36,37], differential-algebraic equation systems [19], with additional combinatorial constraints [10,32], and based on discrete time formulations [10]. Recently, algorithms to solve CIA problems have been implemented into the software package pycombina [38].
In this publication, we generalize all of these approaches by presenting alternative ways to calculate ω based on α, using different MILPs. We formulate them for the case of initial value problems with ordinary differential equations, although they are applicable to all mentioned variants in a straightforward way.
Our results are also independent of the method that is applied to solve the relaxed problem (OCP). For the numerical results, we are going to use a first-discretize-then-optimize approach with Radau collocation. First-discretize refers (a) to approximating the control functions with parameterized basis functions, such as finitely many piecewise constant functions, and (b) to relaxing path and control constraints from the domain of a time horizon to finitely many time points. Then-optimize refers to solving the resulting finite dimensional optimization problem numerically to optimality. An overview of direct methods for continuous optimal control problems can be found in, e.g., [39,40]. For comparison, we also apply this approach directly to the MIOCP and solve the resulting mixed-integer nonlinear program (MINLP) with the general purpose solver Bonmin.
For a general problem class of MIOCPs, the integer gap depends linearly on the control discretization [41]. For many applications, this control discretization is not fixed, but can be refined. Thus, the integer gap can be driven to zero (usually at the expense of frequent switching). For practical reasons, good or even optimal solutions for a fixed control discretization are of interest, which is our main focus.
Recently, the MIOCP class has been increasingly studied in the context of combinatorial constraints that couple over time. For example, as part of the CIA decomposition, the switching-cost aware rounding problem (SCARP) has been proposed to solve it [42,43]. In addition, the CIA decomposition has been studied in the context of multibang and total variation regularization [44,45]. Switching constraints have been also investigated for mixed-integer partial differential equation optimal control problems [46,47]. Switching costs can also be included into the switching time optimization approach based on cardinality constraints [48].

Contributions
We derive different versions of the CIA problem, leading to multiple MILPs. Noting that the computational effort to solve one MILP is usually small compared to the solution of the relaxed nonlinear problem, and even smaller compared to the deterministic solution of the original MIOCP, we propose solving several approximation problems. These solutions are candidate solutions themselves, and can be recombined into new switching sequences. We derive theoretical a priori bounds for them, discuss computational (dis)advantages, and show numerically the improvement to existing decomposition approaches with respect to objective function value, and to general-purpose MINLP solvers with respect to runtime.

Outline of the Article
We define the considered MIOCP class and propose a general decomposition framework in Section 2. The algorithm consists of several MILP formulations, which are discussed in Section 3. We discuss theoretical properties in Section 4. In Section 5, we look at strategies that combine and improve existing binary control functions. In Section 6, we provide and discuss numerical results for the MIOCP benchmark library [18]. Finally, we conclude the article in Section 7, where we also summarize our findings.

Problem Class, Definitions, and Main Algorithm
We denote the considered time horizon by T :" rt 0 , t f s Ă R, and write "for a.a. t P T " for all t P T , except on a set of measure zero. Let rns :" t1, . . . , nu, rns 0 :" 0 Y rns, for n P N. The null vector is written as o. We are interested in the following class of mixed-integer optimal control problems.
1 " We minimize a Mayer term Φ P C 1 pR n x , Rq over differential states x P W 1,8 pT , R n x q for binary control functions ω P L 8 pT , t0, 1u n ω q. The system of ordinary differential equations (ODE), (2b), is written using partial outer convexification to model the switched system, using a one-hot encoding (1hot) constraint (2d), a drift term f 0 , and n ω ě 2 functions f i , ref. [27], both out of C 0 pR n x`1 , R n x q. We assume fixed initial values x 0 P R n x for the differential states. The functions c P C 1 pR n x , R n c q model nonlinear state inequalities.
In the following, we assume that (MIOCP) has an optimal solution, so that we can write "min" instead of "inf". We refer to [19,29,49] for a discussion of the generality of (MIOCP) and extensions to cope with, e.g., Lagrange functionals, boundary and multi-point constraints, vanishing constraints, free final time, control values, and the like. Particularly interesting are continuous control functions u P L 8 pT , U q that often enter (2b) and (2f) in practical applications. From a theoretical point of view, in the interest of comparability, and for computational speed, it is convenient to consider the continuous controls up¨q as fixed to the solution that was obtained by solving the continuous relaxation of (MIOCP) in our approach. It is also possible, though, and often makes sense in practice, to improve the objective function value by reoptimizing u when (MIOCP) is evaluated for fixed ω. For notational convenience and without loss of generality, we omit the continuous controls u in the following. An evaluation of (MIOCP) for fixed ω is then the solution of an initial value problem. We stress that there are only n ω possible solutions for the integer part of the problem due to the constraint (2d).
Our algorithm solves continuous relaxations of (MIOCP), defined as follows.
The problem (OCP) in function space can be solved by different approaches, as mentioned above. For using MILPs to approximate control functions, we map between function space and r0, 1s n ωˆM , using a time grid as follows.
, Ω, Ω M ) Let the ordered set G ω :" tt 0 ă . . . ă t M " t f u denote a time grid with ∆ j :" t j`1´tj and ∆ max :" max j ∆ j for j P rM´1s 0 . We define the mapping: ϕ : r0, 1s n ωˆM Ñ L 8 pT , r0, 1s n ω q, α " ϕpaq using n ω piecewise constant functions: A mapping in reverse direction: is defined by extracting integrals on the grid G ω : We denote integrality and (2d) using the sets: To scale control variables, we are going to need function evaluations and adjoint (dual) variables on the grid G ω . Definition 4 (pλ,f q). Let x˚be the optimal solution of (OCP). The evaluated right-hand side function terms tf i,j,k u kPrn x s from (2b) are defined as the entries of: We denote, byλ j,k P R, t j P G ω , k P rn x s, the discretized and evaluated dual variables of the ODE constraints (2b) in (OCP).
Our decomposition algorithm is based on the algorithmic choices of the sets S CIA and S REC , which we define next.
Definition 5 (S CIA , S REC ). We introduce the set of CIA problems S CIA as: S CIA :" tpCIAmaxq, pCIA1q, pCIAmaxBq, pCIA1Bq, pλCIA1q, pλCIA1Bq, pSCIAmaxq, pSCIA1q, pSCIAmaxBq, pSCIA1Bqu, where we define the specific CIA problems in the next section. For a subsetS CIA Ď S CIA , we denote, with n CIA :" |S CIA |, the number of different CIA problem formulations. Let the elements ofS CIA be numbered by 1, . . . , n CIA . Let the set S REC of recombination mappings F rec P S REC be defined via: where w k denotes the optimal solution of the problem (milp) k PS CIA .
We propose to use Algorithm 1 to approximate the solution of (MIOCP) efficiently with a priori bounds. Relaxing (MIOCP) to (OCP) results in state and control trajectories (line 1). We solve different MILPs to approximate the relaxed controls with binary ones in lines 2-3. Their performance is evaluated in line 4 by calculating their corresponding (feasible) state trajectories and objective values. In lines 6-7, we create new candidate binary controls in several recombination heuristics based on the existing binary controls, which we evaluate as well (line 8). As a final step, we select the best-performing binary control as the solution in line 10.

Algorithm 1: Decomposition of (MIOCP).
Input : (MIOCP) instance, grid G ω , algorithmic choices of setsS CIA Ď S CIA and S REC Ď S REC . 1 Solve (OCP) Ñ Φ rel , x˚, α˚, a˚" ϕ´1pα˚q. 2 for milp PS CIA do 3 Solve milp with MILP solver a˚Ñ w milp ; 4 Simulate (MIOCP) with fixed ω milp :" ϕpw milp q Ñ Φ milp , x; 5 end 6 for F rec PS REC do 7 Create w rec using F rec pw milp q, Φ milp from all milp PS CIA ; 8 Simulate (MIOCP) with fixed ω rec :" ϕpw rec q Ñ Φ rec , x; 11 return: px, w, Φq " px opt , w opt , Φ opt q; The main idea of Algorithm 1 is to decouple controls and states. We approximate the relaxed control function α˚that is optimal for (OCP) with binary controls ω, such that a good objective function value is obtained when (MIOCP) is evaluated for ω. Which MILPs and recombination heuristics are used in the algorithm depends on the definition of the sets S CIA andS REC , which we discuss in Sections 3 and 5. A theoretical motivation and error bounds on the approximation quality are given in Section 4. Algorithm 1 is a generalization of the decomposition approach in [27,32], for whichS REC is empty andS CIA contains only one CIA problem formulation.

Combinatorial Integral Approximation MILPs
In the following, we define the MILP formulations of CIA type for S CIA in Algorithm 1. CIA, λCIA, and SCIA refer to different scalings (Section 3.1), "1" and "max" to different norms }¨} (Section 3.2), and the presence of "B" to a reversal of time (Section 3.3).

Combinatorial Integral Approximation and Scaled Variants
As part of the approximation step, we aim at finding binary control values that are close to the relaxed values with respect to the accumulated difference over all grid points. The following definition specifies the so-far-applied (CIA) problem together with two novel variants. We let the vector norm }¨} be unspecified here, but discuss applicable norms later.
Definition 6 (θC IA , θS CIA , θλ CIA , cf. Definition 4.17 in [50] ). Let a˚be the given optimal control solution of (OCP), and let the evaluated model function valuesf and dual variablesλ be given as introduced in Definition 4. Consider a vector norm }¨}. We introduce the following optimization problems:

Norm Dependent MILP Formulation
By introducing the auxiliary variable θ, we can reformulate (4) and (5) with the maximum norm. Definition 7. (CIAmax, SCIAmax) Let a˚be the given optimal control solution of (OCP) and let the evaluated model function valuesf be given as introduced in Definition 4. We define (CIAmax) as: and (SCIAmax) as: pai ,j´wi,j q∆ jfi,j,k , for k P rn x s, l P rM´1s 0 .
We introduce the MILP analogue formulations for the Manhattan norm with auxiliary variables s i,j ě 0, i P rn ω s, j P rM´1s 0 . In this way, we specify the norm choices for Definition 6. Definition 8. (CIA1, SCIA1, λCIA1) Consider a˚as the given optimal control solution of (OCP). Let the evaluated model function valuesf and dual variablesλ be given as introduced in Definition 4. Based on the auxiliary variables s i,l we define (CIA1) as: pai ,j´wi,j q∆ j , for i P rn ω s, l P rM´1s 0 .
With a different dimension for s k,l , we define (SCIA1) as: pai ,j´wi,j q∆ jfi,j,k , for k P rn x s, l P rM´1s 0 . (12c) Finally, we define (λCIA1) by modifying (12c) with the dual variables: pai ,j´wi,j q∆ jfi,j,k , for k P rn x s, l P rM´1s 0 .
Of course, also other norms, such as the Euclidean norm, can be used. We do not consider them here because of the resulting nonlinearity of the constraints.

Chronologically Ordered Constraints
We consider the possibility to modify the (CIA) problems by altering the chronological order in the constraints for the accumulated difference } ř lPrjs pa˚, l´w¨, l q∆ l } for j P rM´1s 0 . We may use an arbitrary ordering of time intervals, instead of starting from the first interval j " 0. Here, we consider backward accumulation, starting from the interval with index j " M´1, i.e., rt M´1 , t M s: Let the problem where (14) replaces (7b) in (CIAmax) be denoted by (CIAmaxB). The other introduced MILPs can be modified analogously with backward time accumulation and are named accordingly; e.g., (SCIA1B) refers to (SCIA1) with backward accumulation.

Combinatorial Constraints
One advantage of using an MILP for obtaining binary controls after the relaxation step, rather than using SUR, is the possibility to impose combinatorial constraints that couple over time. An example of such constraints is to limit the number of allowed switches σ max P N between activated binary controls, which can be formulated as: We will omit this constraint class in the next section, but refer to [51,52] for a priori bounds and reformulations. Nevertheless, we are going to come back to these constraints in the numerical experiments section for also testing the different MILP approaches in this situation.

A Priori Bounds for CIA Decompositions
We revise results for the a priori bounds resulting from a CIA decomposition [41] and extend them to alternative decompositions that may be used in S CIA in Algorithm 1. We stress here that Manns et al. [53] have recently presented a proof of improved regularity conditions. Nevertheless, we revise the theorem from [41] here, because it results in natural algorithmic extensions.

Combinatorial Integral Approximation
We recapitulate a variant of Grönwall's Lemma from [41], which is needed for the main theorem.

Lemma 1 (A variant of Grönwall's Lemma, see [41], Lemma 1).
Let z 1 , z 2 : T Ñ R be real-valued integrable functions and let z 2 also belong to L 8 pT , Rq. If, for a constant L ě 0, the following holds: for a.a. t P T , then we have: Proof. See [41], proof of Lemma 1.
In the following results, we analyze the evolution of two trajectories, x and y, based on the same ODE system (2b) but driven by two different controls, α and ω. The following theorem gives a statement on the distance between the two trajectories depending on the distance of the controls. Theorem 1. Consider α and ω P Ω. We reuse the model functions f 0 , f i : TˆR n x Ñ R n x from Definition 2 for i P rn ω s. Let xp¨q and yp¨q be the unique solutions of the IVP: where x 0 , y 0 P R n x . Assume that there are positive constants L, C P R`, together with a vector norm ||¨||, such that, for a.a., t P T holds: Furthermore, let f i p¨, xp¨qq, i P rn ω s be essentially bounded by B P R`on T , and assume that for all t P T . It holds that:ˇˇˇˇˇˇˇˇż with the constant θ ě 0. Then, for a.a. t P T we also have: Proof. This Theorem is an (equivalent) reformulation of Theorem 2 from [41]. The only differences are modified notations and the usage of f i , i P rn ω s 0 as differentiable mapping instead of A in [41].
We first recognize that Theorem 1 is applicable for the CIA problem with any vector norm, which can be guessed from the equivalence of norms.

Corollary 1 (Approximation bounds via (CIA), cf. Corollary 5.2 in [50]).
Consider the setting of Theorem 1; in particular, let the regularity assumptions on f i , i P rn ω s 0 , hold. Assume that x and y CIA are the solutions of the IVP (2b) and (2c), where x is based on a given relaxed control a and y CIA is based on w˚, which is the optimal solution of (CIAno), no P tmax, 1u, with objective value θC IA from Definition 6. Then, the state approximation error is bounded for a.a. t P T by: Proof. This corollary is a direct result from Theorem 1 with x 0 " y 0 . We note that θC IA represents the norm of the accumulated control deviation, which appears in the proof of the Theorem and is bounded by θn ω so that this replaced term is settled.

Scaled Combinatorial Integral Approximation
The proof of Theorem 1 contains a motivation for the (SCIA) problems as derived in the following result.

Corollary 2 (Approximation bounds via (SCIA), cf. Corollary 5.3 in [50]).
Consider the setting of Theorem 1, and let }¨} no refer to the maximum or 1-norm, i.e., no P tmax, 1u. Assume that x and y CIA are the solutions of the IVP (2b) and (2c), where x is based on a given a, and y CIA is driven by w˚, which is the optimal solution of (SCIAno). Then, for a.a. t P T , the state approximation error is bounded by: where θS CIA is the optimal objective value of (SCIAno), no P tmax, 1u.
Proof. From the second and last inequalities in the proof of Theorem 2 from [41] and Corollary 1, it follows that, for a.a. t P T and any ω P Ω: Let ω CIA denote the control based on the optimal solution w˚of (CIAno), no P tmax, 1u. We take the minimum in the above inequality and obtain: We conclude from Corollary 1 that the approximation and convergence results of the decomposition still hold if (SCIAno), no P tmax, 1u, is used to construct the binary control. The approximation bound based on (SCIAno) is tighter than the existing (CIAno)-related bound. Hence, it is an obvious choice to consult these alternative binary controls for an approximation study. Ideally, the binary control constructed in this way will result in an improved state approximation and objective value for (MIOCP). However, we oppose this hope next.

Remark 1 (Construction by (SCIA) does not guarantee superior quality)
. The binary control based on the optimal solution of (SCIAno), no P tmax, 1u and used in the decomposition does not necessarily result in a state approximation or objective value that is superior to that obtained using (CIAno). It may hold that: }xptq´y CIA ptq} ă }xptq´y SCIA ptq} ă θS CIA , for some t P T , where we use the notation from Corollaries 1 and 2. Because of a possible non-convex objective, the computed trajectories may lead to a superior objective value for the solution based on (CIAno), compared with that based on (SCIAno), even if the above inequality does not hold.

λ-Combinatorial Integral Approximation
The λ-approximation stems from the aforementioned theorem of approximating differential states, but with assessing the difference of the cost-to-go function. We define it in a more general MIOCP setting with a Bolza objective: where L P C 1 pR n x`1ˆR n ω , Rq.
Definition 9 (Cost-to-go function by Hamilton-Jacobi-Bellman). Let the cost-to-go function J P L 8 pR n xˆT , Rq with Bolza objective (20) be implicitly defined as: We recognize that BJ Bx can be interpreted as Lagrange multiplier or dual variables of the ODE constraints in (MIOCP). Therefore, we write, in short, λptq instead of BJ Bx . With the groundwork above, we are ready to deduce the corresponding bound.

Corollary 3 (Approximation bounds via (λCIA), cf. Corollary 5.4 in [50]).
Consider the setting of Theorem 1. In particular, let the regularity assumptions (17d), (17c), and the essential boundedness of f be true. Assume that x and y λCIA are the solutions of the IVP (2b) and (2c), where x is based on a given a, and y λCIA is based on w˚, which is the optimal solution of (λCIA1).
Let J be the cost-to-go function as defined in Definition 9 for (MIOCP) and λptq be the adjoint vector at t P T for the ODE system (2b). For a.a. t P T , it follows that: |Jpxptq, tq´Jpy λCIA ptq, tq| ď θλ CIA e Lpt´t 0 q`o´} xptq´y λCIA ptq} 2¯, where o refers to Landau's little-o notation.
Proof. Let us consider the difference of the cost-to-go functions by approximation with a partial first-order Taylor expansion around Jpxptq, tq. We perform the approximation with respect to the trajectories x, y λCIA . Thus, we apply Taylor's theorem for t P T : As pointed out above, the dual variables of the ODE constraint (2b) are equal to dJ dx pxptq, tq. We use the notation }xptq} λptq :"ˇˇř kPrn x s λ k ptqx k ptqˇˇfor t P T , which defines a semi-norm. Next, for a.a. t P T , we transfer the proof of Theorem 1 to this notation and to (21): The third summand of the last inequality is equal to the objective θλ CIA that is to be minimized in (λCIA1). Finally, we apply x 0 " y 0 and use the Grönwall Lemma 1 with the integrable functions: so that the claim is proven.
Due to the first-order Taylor approximation, (λCIA1) requires that the relaxed trajectory x can be well approximated by a trajectory y that is based on binary controls. On the other hand, if there is no such trajectory in a close neighborhood of x, then (λCIA1) may have an unintuitive binary control as its optimal solution.

Backwards Accumulating Constraints
If we adapt the MIOCP instance with fixed final states x f P R n x and with a Lagrangian objective type, we can also apply this modified setting to Theorem 1. We express this issue in the following corollary.

Corollary 4 (Approximation bounds via backward constraints).
Consider the setting of Theorem 1. Let x and y be the state trajectory solutions of the terminal value problems (2b), with xpt f q " x f and ypt f q " y f for x f , y f P R n x . Assume that for all t P T , θ b P R`, it holds that: Then, for a.a., t P T it also holds that: ||xptq´yptq|| ď´ˇˇˇˇx f´y fˇˇ`θb n ω´B`C pt f´t q¯¯e Lpt f´t q . (23) Proof. We apply the proof of Theorem 1 to the altered setting, in which we integrate over rt, t f s instead of integrating over rt 0 , ts.
With the assumption that }xpt f q´ypt f q} is small, the backward (CIA) rounding problem approach from Section 3.3 is not only applicable for terminal constraint problems but is also appropriate for an (MIOCP) instance with a given initial value x 0 and variable final-state values.

Connection to Decomposition Algorithm and Optimization Problem
As a last step in this chapter, the previous results are related to (MIOCP) and the decomposition Algorithm 1.
Remark 2 (Arbitrary close approximation of (OCP) solution). We could extend Algorithm 1 with an outer loop that checks if Φ opt is sufficiently close to Φ rel , and if not, we would refine the grid G ω . In [29], an arbitrary close approximation of the (OCP) solution has been deduced with this procedure and based on (CIA). With the assumption of Φp¨q, cp¨q being continuous and that there exists a feasible trajectory x for (OCP), it follows that for any ą 0 there exists a grid G ω , with grid size ∆ max , such that there is a feasible trajectory typt j qu t j PG ω with: The proof uses the sum-up rounding scheme that derives the binary control approximation with θC IA ď Constpn ω q∆ max [35,41], where Constpn ω q is a constant depending on n ω . In case we extend Algorithm 1 with the refinement procedure, and if (CIA) or (SCIA) are chosen to be elements of S CIA , the same approximation result holds.
Corollary 5 (Solution accuracy of differential states of Algorithm 1). Let θS CIA,max , θS CIA,1 be the optimal objective values of (SCIAmax) and (SCIA1), respectively. Consider the setting of Theorem 1. In particular, let the regularity assumptions (17d), (17c), and essential boundedness of f be true. Assume that x and y are the solutions of the IVP (2b) and (2c), where x is based on a given a, and y is based on Algorithm 1. Let G ω be the applied grid. It follows for t i P G ω that: Proof. The claim is a direct result of Corollary 2.
We have deliberately chosen the tightest bound from the previous corollaries, but could also use others. The received grid-specific rounding error bounds aim primarily at approximating the differential states. With (λCIA) or Lipschitz continuity of the objective, there are also tools to discuss the rounding error of the objectives. The recombination heuristics work in the area of objective approximation and, hence, are to be discussed next.

Recombination Heuristics
We present several recombination heuristics that recombine different binary controls w to new candidate solutions with potentially smaller objective values. The general framework is open to apply different heuristics, such as genetic algorithms [54], that are not introduced in this article.

GreedyTime
Algorithm 2 is a routine for using the MILP solutions in a greedy pattern with the aim of constructing binary controls w that exhibit an improved objective value Φpϕpwqq. In GreedyTime, we iterate over all intervals j P rM´1s 0 in chronological order (line 1). On every interval, we check if there are MILP pairs pm 1 , m 2 q that differ in their binary control vectors in line 2. We recombine for each of these pairs the m 1 solution with the binary control vector from m 2 at interval j to construct a temporary control solution r w m 1 (line 3). Based on this construction, we evaluate the objective of this new control solution in line 4 and overwrite the binary control w m 1 with the recombined solution r w m 1 if that latter results in an improved objective value (lines 5-6). We proceed in the same way with the second solution m 2 when the (same) pair pm 2 , m 1 q appears in the inner loop.
A large number of calculated MILPs may result in a large number of pairs with unequal control solutions. In this case, it is advisable to only swap the w m 2 solution with the currently smallest objective value, instead of swapping and testing each variation for every w m 1 .
If the control problem at hand involves no continuous controls u, it is straightforward to evaluate the (MIOCP) in line 4 with the previously found and fixed x until grid interval j. This speeds up the process in problems with fine grids and numerous MILP solutions, because (MIOCP) needs to be solved iteratively. Moreover, if an MILP solution m 1 differs from two MILPs m 2 , m 3 with identical binary control vectors w m 2 ,j " w m 3 ,j , it is sufficient to test the recombination with only one of the two. We illustrate an example recombination step for the pairs (CIA, SCIA) and (SCIA, CIA) in Figure 1.

1.
We can also apply the outer loop in Algorithm 2 backward in time and name the backward version GreedyTimeBackward; 2.
We may consider only singular arcs, instead of looping over all intervals, since the constructed binary controls are likely to be equal on bang-bang arcs. With singular arcs, we mean the intervals where ă ai ,j ă 1´ holds for the optimal control solution a˚of (OCP), with a certain threshold ą 0; 3.
Greedy-cost-to-go modification: Assume we have obtained the dual variablesλ j,k , j P rM´1s 0 , k P rn x s, of the state equations of (OCP). Then, re-sort the intervals rM´1s 0 in descending order according to ř kPrn x s |λ j,k |, j P rM´1s 0 . In this way, we construct a new ordered grid G λ ω to be iterated over in Algorithm 2.

Singular Arc Recombination
If a˚is (almost) binary on certain intervals, w should typically attain these binary values as well as an optimal solution of a CIA rounding problem-regardless of the MILP choice. To this end, we formalize singular arcs of a˚as sets of consecutive intervals on which the relaxed control takes values smaller than or larger than 1´ . Here, ą 0 is a chosen small tolerance. .
Let the number of singular arcs n sing be defined as: We aim to recombine singular arc realizations of the different MILP solutions from S CIA , which is performed in Algorithm 3.
We initialize the set of visited binary controls as empty in the algorithm and set the so-far best objective value Φ rec to infinity (line 1). Next, the temporary binary control w tmp on the bang-bang arcs is set as equal to the rounded relaxed control (line 2). In line 3, we test every possible variation of the different MILP solutions on the singular arcs. In this way, we fill up the singular arcs of the temporary binary control w tmp (line 4). The constructed control w tmp is checked if it has already been visited (line 5), and if so, the algorithm jumps to the next iteration (line 6). Otherwise, we include w tmp in the set of visited controls (line 8), and we evaluate its objective value (line 9). When a recombination has an improved objective value than the so-far best control, it will be saved as the so-far best control (lines [10][11][12]. We illustrate the algorithm in Figure 2. Set S vis " S vis Y tw tmp u; 10 Evaluate (MIOCP) with w " w tmp fixed Ñ Φ`ϕ`w tmp˘˘; 11 if Φ`ϕ`w tmp˘˘ă Φ rec then 12 Set Φ rec " Φ`ϕ`w tmp˘˘; 13 Set w rec " w tmp ; We have to take care of the number of possible variations of singular blocks and MILP solutions |S CIA | n arc to avoid a combinatorial explosion. Therefore, it is advisable to choosẽ S CIA in Algorithm 1 with a small number of MILPs. Only a few singular arcs result usually after solving (OCP). We may modify Algorithm 3 to be greedy, i.e., to apply the idea of GreedyTime on arcs instead of on single intervals, if there are more than four singular arcs.
The singular arc recombination yields an objective value that is at least as good as those previously constructed via the MILPs. However, there currently exists no framework for quantifying these possible improvements in terms of new rounding errors of the objective.

Software Implementation and Instances
We implemented Algorithm 1 in AMPL [55] using the code ampl_mintoc, which is a modeling framework to solve optimal control problems. It features different discretization schemes of ODEs, though we used only a Radau collocation from [39]. The tool is advantageous for our purpose, since it includes automatic differentiation, interfaces to MILP solvers, and its problem formulation stays close to mathematics. In addition, AMPL provides the dual variables λ. Throughout the numerical study, we applied Gurobi 8.1 as the MILP solver and IPOPT 3.12.4 as the NLP solver, with default settings in each case, to solve the discretized (OCP). We assumed that the choice of the MILP solver has little influence on solution quality and verified this by testing also with CPLEX 12.9. We tested our algorithms also with CasADi and received similar results as with ampl_mintoc. All results were obtained on a workstation with four Intel i5-4210U CPUs (1.7 GHz) and 7.7 GB RAM.
We included MIOCPs from the benchmark collection site mintoc.de [18] in our numerical study, which we specify further in the following subsections. For these problems, we chose a differential states discretization with N intervals such that it was fine enough respect to the objective value. In this way, the objective value differs only to the fifth decimal place, with respect to a finer discretization for constant M. Afterwards, we varied M with fixed N in order to construct different instances. We refer to Appendix A for further details. Solving the binary approximation problem and then solving the MIOCP with fixed binary controls might result in infeasible solutions for problems involving path or terminal constraints. To this end, we relaxed these constraints and applied a merit function that penalizes constraint violation to be added to the objective with a sufficiently high penalty factor.

Scaled Combinatorial Integral Approximation
Our hypothesis is that the MILPs based on a scaled combinatorial integral approximation perform the best on instances where the binary control enters the control dependent right-hand side terms f i of the ODE in an affine way, i.e.,: where c i P R. On the other hand, if f i depends on xptq, it may change rapidly over time. To this end, this results in possibly inaccurate ω solutions, since we use only the discretized state trajectory xptq value. The MIOCPs "Double tank (Multimode)" and "Lotka-Volterra (absolute fishing variant)" are identified as candidate problems with the above right-hand side structure and constructed their solutions for different discretizations and both with and without the combinatorial constraint (15); see Appendix A for details. The results are presented in Figure 3. We chose to evaluate the MIOCP solutions according to the distance in the }¨} 8 -norm of the differential state trajectories corresponding to either binary or relaxed controls. We argued in Section 4 that the CIA decomposition is built on this distance, and, particularly, the proximity of objective values and constraint satisfaction follows as derived in Section 4.5. The differential state trajectories based on the (SCIA1) and (SCIAmax) solutions are significantly closer to the relaxed solution compared with their CIA counterparts, as shown in the performance plot. There are hardly differences between }¨} 1 -and }¨} 8 -norm results, although a tendency can be detected of better-performing }¨} 8 variants.
We examined whether the (SCIA1) and (SCIAmax) are outperformed by (CIA1) or (CIAmax), if the state trajectory distance is measured in }¨} 1 -norm, or if the objective value deviation to the relaxed solution is taken into account, but the SCIA variants remained the clear winners. Analogously, the result remains similar when comparing the algorithms solely on instances (not) including combinatorial constraints.

λ-Combinatorial Integral Approximation
We derived (λCIA) as an approximation of the cost-to-go function difference to the relaxed solution. Since this approximation is linear, the standard (CIA) approach is more suitable on most of the (nonlinear) MIOCPs. Our hypothesis is that the situation is different when a regularization term enters the objective function, accounting for the cost of activating binary controls in the form of, e.g.,: where c i P R. The problem "Quadrotor (binary variant)" includes a cost function, where the controls enter in the above form, so that we used it with different discretizations and both with and without the combinatorial constraint (15) for comparing the (λCIA) solutions with the ones obtained via (CIA). We present the computation results in Figure 4.
In contrast to the previous section, we compared here the objective deviations from the relaxed solution in percentages, since the λ-combinatorial integral approximation aims directly at improving the objective values. However, we remark that the latter algorithm performs worse than (CIA1) and (CIAmax) if the distance to the relaxed solution is measured in differential state space. The performance plot shows that (λCIA) provides solutions with improved objective values on some instances, but on many others it does not. Since (λCIA) turned out to provide even weaker approximations of the relaxed solutions for other MIOCPs, as will be shown in Section 6.5, we do not recommend to use it in general as a single MILP approximation step. It serves, however, as beneficial candidate solution for recombination and might be useful for not-yet-explored problem classes.

Backwards Accumulating Constraints
Our hypothesis for the MILP variants based on backwards accumulated constraints is that they are beneficial if the MIOCP involves terminal equality constraints on the differential states. Because the deviation to the relaxed solution can become large, the standard (CIAmax) approach may construct a solution that does not satisfy this constraint. Nevertheless, the direct incorporation of terminal constraints into the MIOCP may lead to numerical difficulties already for the relaxed problem. To this end, we considered soft constraints, meaning that we introduce slack variables in order to penalize a deviation of the differential states from a desired terminal value. We identified the MIOCP "Lotka-Volterra (terminal constraint violation)" as a candidate problem and calculated its solutions for different discretizations and both with and without the combinatorial constraint (15). We illustrate the objective deviation from the relaxed solution in percentage of (CIA) and its backward variant solutions in Figure 5. We chose the objective deviation as the performance measure for our comparison study because the objective accounts for a violation of the terminal constraints via a slack variable penalty term. The graphs of (CIAmaxB) and (CIA1B) indicate that their respective MIOCP solutions involve smaller objective values than their forward CIA counterparts.
We observed that this result seems to be independent from the chosen norm, since the performance differences of (CIAmaxB) to (CIA1B) are neglectable.

Recombination Heuristics
We used the MILP solutions by (CIAmax), (CIA1), (SCIAmax), (λCIA1), and (CIA-maxB) as a base for running the recombination heuristics on a set of 13 MIOCPs from the benchmark collection site mintoc.de with different discretizations (see Appendix A for details). The box plot in Figure 6 illustrates the numerical results with respect to objective deviation of each algorithm to the relaxed solution in percentages. The boxes including median values of the SCIA and backwards approaches appear to be slightly larger and their mean values and their outliers a bit smaller, respectively, than the ones of the (CIA) MILPs. The numerical study revealed several instances where (SCIAmax) or (SCIA1) ran into a binary solution with active controls on some intervals with relaxed values close to zero. Under the assumption that the combinatorial approximation is conducted mainly on singular arcs, these cases might be called degenerated. We experienced underperforming objective values for SCIA in case of degenerated controls, which explains some of the underperforming instances. We conclude that (SCIA1), (SCIAmax), and (CIAmaxB) should be used with caution. For specific problem classes, as shown in the previous chapters, they can be very helpful. Here, we have not specifically selected the problems, and on this general problem class there is no guarantee that these algorithms provide any real improvement.
The solutions of (λCIA1) clearly underperform, but we stress, as mentioned in Section 6.3, their importance for recombination. As a comparative calculation, we have also computed the solutions based on SUR and see that they provide similarly good, albeit somewhat worse, solutions compared with (CIA1) and (CIAmax). Note that depending on the selected algorithm, some instances resulted in a deviation of more than 100%, which can be explained by highly penalized infeasible solutions of path-or terminal-constrained problems.
The depicted recombination heuristics provide significantly better solutions in terms of the objective than the MILP-constructed binary solutions. The median values are reduced by a factor of about 2 (ArcRecombination) to a factor of 3 (GreedyTime) in comparison with (CIA). The other characteristics such as mean values, box borders, and outliers also reflect the improvements. Particularly noteworthy is the GreedyTime heuristic, which is robust against outliers as well as constructs on average solutions with small objective values. The ArcRecombination selects the solution of the MILP algorithms with the smallest objective value in the case of only one singular arc. Since many of the selected problems have only one singular arc, the box plot illustrates that this minimum over all MILPs can already provide a significant improvement.   Figure 7. Log plot of run time and objective value deviation from the relaxed solution of the constructed solutions for different approaches and for the Lotka-Volterra multimode problem, with differential state discretization N = 12,000. The numbers in the plot indicate the applied corresponding number of control grid points M. We illustrate the outcomes of the solutions constructed by (CIAmax), OptpS CIA q, the GreedyTime recombination heuristic, and the MINLP solver Bonmin. By OptpS CIA q, we denote the best objective value outcome of over all MILP solutions. For each control discretization, we connect the outcomes of the four approaches with lines in order to compare the behavior for different discretizations. One observes the convergence of all approaches towards the lower bound provided by the relaxed solution and the closure of the gap between (CIAmax) and Bonmin solutions for a fixed discretization. GreedyTime is roughly two orders of magnitude slower than (CIAmax), but is faster than Bonmin.

Runtime Evaluation
The average runtime over all instances for (CIAmax) was about a few seconds and increased slightly for (SCIAmax); see Appendix A.2. Gurobi needed on average more than one minute for the MILPs with 1-norm and thus considerably more time. For instances involving a fine discretization, the runtime increased enormously, so we set a time limit of 30 min.
We remark that the greedy heuristics and the ArcRecombination are to be used cautiously, because an input of many MILPs leads to a high number of recombinations that have to be evaluated. ArcRecombination is relatively inexpensive and offers a solution that is at least as good as the best MILP. The algorithm is most beneficial in cases of several singular arcs, in contrast to most applied problem instances where there is only one arc. The greedy algorithm variants are quite expensive (run times of up to 15 min), yet provide solutions with objective function values very close to those of the relaxed problem.
A way to significantly reduce the computation time of the MILPs is to apply branchand-bound [32] or to use SUR for constructing approximate solutions. These algorithms are implemented in the open-source software package pycombina (see https://github.com/ adbuerger/pycombina) (accessed on 14 February 2022) [56] and might be adapted to the scaled MILP case as part of a future study. If the binary controls enter linearly into the dynamics, as in Equation (25), then the modification is straightforward, since only all differences pa i´wi q have to be scaled with the factors c i . Finally, run times of days, or even weeks when it comes to the MINLP solver, cast a positive light on the proposed decomposition algorithm including recombination.

Summary and Conclusions
We have extended the decomposition approach based on combinatorial integral approximation [32], using multiple MILP formulations and recombination heuristics in an outer loop. At the price of additional MILP solutions and MIOCP evaluations, we obtain an improvement of the objective function value for every fixed control discretization grid.
A numerical study with benchmark problems shows that the novel MILP solutions indeed improve the existing CIA solutions in terms of objective value on specific problem classes. We conclude that the CIA decomposition can be reasonably modified for certain MIOC subproblem classes. Furthermore, the computational results for a set of non-specific MIOCPs resulted in a substantial improvement of the CIA solution through recombination strategies.
The main added value of this study is a decomposition algorithm that works much faster than Bonmin, but still offers qualitatively similar solutions and performs on average better than the existing (CIA) approach. The framework is open to extensions, both on the MILP and on the post-processing level.
Additional work is necessary to incorporate other constraints, such as vanishing constraints, to derive further tailored MILP formulations for specific problem classes and to develop numerical algorithms that generalize SUR and/or branch-and-bound algorithms to the various MILP formulations.
Furthermore, it would be interesting to include and to compare the results of this study computationally with recent approaches for the CIA decomposition [43,44].
Author Contributions: C.Z. conducted the implementations and the numerical study, and was responsible for writing most of the article. T.W. contributed major algorithmic ideas and proofread the manuscript. S.S. contributed to the research design and discussions of the algorithmic ideas and wrote parts of the paper. All authors have read and agreed to the published version of the manuscript. For generating the performance and box plots in Section 6, we applied Algorithm 1 on the following discretized problems: