Fast Approximation of Over-Determined Second-Order Linear Boundary Value Problems by Cubic and Quintic Spline Collocation

: We present an efﬁcient and generic algorithm for approximating second-order linear boundary value problems through spline collocation. In contrast to the majority of other approaches, our algorithm is designed for over-determined problems. These typically occur in control theory, where a system, e.g., a robot, should be transferred from a certain initial state to a desired target state while respecting characteristic system dynamics. Our method uses polynomials of maximum degree three/ﬁve as base functions and generates a cubic/quintic spline, which is C 2 / C 4 continuous and satisﬁes the underlying ordinary differential equation at user-deﬁned collocation sites. Moreover, the approximation is forced to fulﬁll an over-determined set of two-point boundary conditions, which are speciﬁed by the given control problem. The algorithm is suitable for time-critical applications, where accuracy only plays a secondary role. For consistent boundary conditions, we experimentally validate convergence towards the analytic solution, while for inconsistent boundary conditions our algorithm is still able to ﬁnd a “reasonable” approximation. However, to avoid divergence, collocation sites have to be appropriately chosen. The proposed scheme is evaluated experimentally through comparison with the analytical solution of a simple test system. Furthermore, a fully documented C++ implementation with unit tests as example applications is provided.


Literature Review
In almost every field of natural science and engineering we face differential equations, which are typically used for modeling dynamical systems.Especially in engineering, the more specific case of boundary value problems (BVP) is very prominent.In other words, one often searches for the behavior of the investigated system between some kind of fixed, i.e., known, spatial or temporal boundaries.One can try to derive the analytical solution to the problem, however for complex systems this can be difficult or even impossible.In such cases, it can be sufficient to approximate the solution for which a variety of techniques exists.
Methods based on finite differences approximate the derivatives by difference quotients to obtain a system of equations which depends only on the primal function.This allows a solution to be found by formulating a linear system of (algebraic) equations representing the transformed system at certain grid points.In contrast, shooting methods aim at the iterative solution of an equivalent initial value problem (IVP), which is typically easier to handle than the original BVP.While for single shooting the IVP is evaluated over the complete time interval, multiple shooting considers a partitioned time domain.Another technique is given by the finite element approach, which is mainly used for partial differential equations (PDE) as they occur, for example, in structural-, thermo-, and fluid dynamics.Typically finite element methods are based on a weak formulation of the residual and on splitting the considered domain into elements on which the local base functions to approximate the solution are defined.Although designed for PDEs, they can be applied to the simpler case of ordinary differential equations (ODE), which are the focus of this contribution (see, for example, [1]).A variety of other techniques exist.However, these three groups appear to be used the most in the field of engineering.
In the following, we approximate the solution through spline collocation using piecewise polynomial trial functions, which is a well-known technique to solve BVPs, see, for example, [2][3][4].Over the past decades, various algorithms emerged, which can be classified into two types.The first type, also known as smoothest spline collocation (or just spline collocation as in [5]), aims at matching the differential equation at one collocation site per spline segment, which is typically a knot or the mid-point of the segment, while simultaneously forcing the spline to have maximum smoothness, i.e., highest possible continuity [5].The second type, in [5] called Gaussian collocation, removes the constraints on higher order continuity and instead uses more collocation sites, which are typically chosen to be the Gaussian points of each segment.This class, which is also called orthogonal spline collocation [6], originates from [3] and aims at maximizing the order of convergence.In order to also provide a competitive convergence for the first type of methods, special variants for quadratic and quintic spline collocation have been proposed in [7] and [8], respectively.Optimal methods for quadratic and cubic splines on non-uniform grids, i.e., for an inhomogeneous segmentation of the spline, have been presented in [5].
Note that in addition to general purpose codes, such as COLSYS (FORTRAN) for non-linear mixed-order systems of multi-point boundary value ODEs [9], highly specialized algorithms, which aim at obtaining the best possible approximation for certain use cases, have also been published.Recent examples are methods for integro-differential equations [10] or fractional differential equations [11], which occur in certain material models exhibiting memory effects.Along with ODEs, various kinds of (multi-variable) PDEs have been investigated.See [6] for a survey on corresponding Gaussian collocation methods.Note that, for PDEs, the time domain is typically discretized using finite differences, e.g., by the Crank-Nicholson approach as used in [6], or the second-order backward difference in [10], while spline collocation is used for approximating the spacial variables.Lastly, methods for differential algebraic equations (DAE) have also been developed, e.g., the COLSYS extension COLDAE [12] for semi-explicit DAEs of index 2 and fully implicit DAEs of index 1.
In our opinion, despite the variety of techniques, there is still a lack of simple methods prioritizing execution time over approximation quality, which is essential for time critical control applications.The aim of this contribution is to provide an algorithm satisfying these needs while focusing on the special case of second-order linear ODEs, which are very common for dynamic systems.Moreover, we focus on over-determined BVPs, i.e., where more boundary conditions (BC) are given than necessary.This may at first seem to be a restriction of our algorithm since it needs more information than other implementations; however, it allows us to also consider inconsistent BVPs, i.e., the case where no exact solution to the problem exists.

Motivation
It might appear strange to search for an approximation of something that actually does not exist.However, we face exactly this situation during walking pattern generation for our humanoid robot LOLA, cf. Figure 1 left.In particular, we use a simplified model of the robot's multi-body dynamics, cf. Figure 1 right, to plan the center of mass (CoM) motion over a certain time horizon.The planned CoM motion resembles the dynamics of the model, which can be formulated as second-order linear ODE, and is constrained to certain values on position-, velocity-and acceleration-level at the boundaries of the planning horizon.This leads to an overdetermined BVP of the type investigated in this contribution.The BCs at the beginning reflect the current state of the robot, e.g., the standing pose in front of a platform, while the BCs at the end represent the target state, e.g., the standing pose on the other side of the obstacle, cf. Figure 2.For a seamless motion of the robot it is crucial to guarantee the satisfaction of the BCs, i.e., a perfect match of the boundary states.In contrast, it is sufficient to approximate the dynamics of the underlying ODE since it is derived from a simplified model which is an approximation in itself.This is the key idea behind the formulation of an over-determined BVP, which may thus not have a proper "real" solution.To the authors knowledge, all comparable algorithms assume that a solution of the BVP exists, while most of them also require it to be unique and "sufficiently" smooth.The proposed algorithm is used within the walking pattern generation framework of LOLA, see [13] for details.The robot is 1.8 m tall and weights about 60 kg.Left: photo and kinematic configuration of the system with 24 actuated degrees of freedom.Right: simplified model of multi-body dynamics with torso mass m t , foot mass m f and torso mass moment of inertia Θ t which (together with the ground reaction forces/torques) contribute to the ordinary differential equation (ODE) describing the center of mass (CoM) dynamics (blue).
Explaining the technical details of the walking pattern generation framework of LOLA goes beyond the scope of this paper.Instead, the interested reader is referred to [13], in which the application of the proposed algorithm is presented.The publication [13] comes with an accompanying video, see [14], which visualizes the sequential steps of the walking pattern generation pipeline of LOLA.The simulations and experiments presented in the video show the successful application of our algorithm.In the following, we do not further restrict ourselves to this special application.Since the proposed algorithm is generic, we derive it in the most general way since it may be useful also for different applications in robotics.

Additional Remarks
Our method can be seen as smoothest spline collocation, i.e., it belongs to the "first" type as classified in Section 1.1.We do not apply special techniques to increase the order of convergence, but instead adhere to its basic form.This leads to a much simpler derivation and implementation.In addition it makes the algorithm faster by sacrificing approximation quality.This complies with the needs of our target application as explained in Section 1.2.We emphasize that our focus lies on simplicity, robustness, and efficiency.Thus we will not search for a mathematical formulation of the convergence order.Instead, the algorithm is evaluated mainly through experiments, where runtime performance is our primary concern.
As summarized in Section 1.1, there exist numerous methods for solving linear BVPs or spline interpolation/collocation in general.For most approaches, solving a large-sparse or small-dense linear system of equations (LSE) represents the main workload.To overcome this bottleneck, parallelized algorithms have been developed, which typically exploit the special structure of the involved matrices, e.g., the scheme proposed in [15] for staircase matrix structures.Although we aim at efficiency, we do not consider explicit parallelization as acceleration technique.This is because our algorithm is designed for embedded systems which typically feature only few physical CPU cores running also other time-critical tasks.Moreover, the use of GPUs, e.g., through CUDA or OPENCL, is often not feasible since the CPU-GPU interface lacks capabilities for hard real-time requirements.Finally, dedicated to our target application, we only consider (comparatively) small problems with runtimes ≈ 1 ms.For such systems the performance boost obtained through parallelization is likely to be canceled out by the synchronization overhead.Nevertheless, our algorithm may also be used for large scale problems.In this case an "off the shelf" parallel solver for dense LSEs may be used, cf.[16].However, one should keep in mind that by using an iterative scheme execution time is not deterministic anymore, and, even worse, the solver might not converge.As an alternative, there exists an efficient way for the parallel solution of decoupled, multi-dimensional BVPs, which takes advantage of intermediate results, see Appendix C for details.

Materials and Methods
In the following, two versions of our algorithm are presented: one using a cubic spline and the other using a quintic spline for approximating the BVP.As the derivations and resulting algorithms are similar, we show the connecting links by presenting both methods in parallel.The version based on cubic splines is naturally simpler, although, using quintic splines leads to a smoother approximation.Indeed it is C 4 -instead of C 2 -continuous, which can be preferable for some use cases.Moreover, quintic splines allow us to directly preset first and second-order derivatives at both boundaries, which otherwise requires the introduction of virtual control points and in turn can lead to poor results, as discussed in Section 3.Although deriving the proposed collocation algorithm for quintic splines is more advanced than its cubic counterpart, overall performance is superior, because the same approximation quality can be obtained with less collocation sites and hence with less computational effort as shown in Section 3.However, this requires the underlying ODE to be sufficiently smooth.Note that, in contrast to our intention, most other investigations choose quintic splines for approximating fourth-order ODEs, e.g., in [8,10,11,17] (similar to choosing cubic splines for second-order ODEs).
We highlight that the proposed method is not only inspired by, but is also heavily based on the interpolation and collocation algorithms presented in [18,19], respectively.The main contribution of this paper is the combination, extension and runtime optimization of those methods.Moreover, we provide a detailed and self-contained derivation together with a fully documented open-source C++ reference implementation.
Having a background in mechanical engineering, the authors intention is to present a simple and self-contained derivation of the proposed algorithm, which can be easily understood, implemented, and extended by readers also lacking a dedicated mathematical background.

Problem Statement
Consider the second-order linear time-variant ODE where Ḟ(t) and F(t) denote the first and second derivative of the unknown function F(t) with respect to time t, i.e., Note that t does not have to represent time.However, this synonym is used in the following due to the typical appearance of (1) in dynamical systems.The coefficients α, β, γ, and the right-hand side τ are arbitrary, in general nonlinear, but known functions of t.Let system (1) be constrained by the BCs where t 0 and t n define the considered time interval t ∈ [t 0 , t n ] and F 0 , Ḟ0 , F0 , F n , Ḟn , Fn are user-defined constants.Then system (1) together with the BCs (2) represents the second-order linear two-point BVP for which an approximation is to be found.Note that (2) considers Dirichlet, Neumann, and second-order BCs independently of each other.In contrast, various other algorithms assume Robin BCs, i.e., a linear combination of Dirichlet and Neumann BCs, which is not equivalent to our approach.Due to (2), the BVP is over-determined and the existence of a solution F(t) depends on the consistency of the BCs with the ODE.
In the following, we investigate the approximation of F(t) through spline collocation, i.e., we generate a spline y(t) which satisfies the underlying ODE (1) at a user-defined set of distinct collocation sites {t k }, numbered in increasing order, which lie within the considered interval (t 0 , t n ), i.e., Moreover, y(t) is forced to fulfill the BCs (2) at t 0 and t n , i.e., Here we use y(t) to denote the approximating spline while the exact solution is represented by F(t).For clarity, we also use different denominations for {t k } and {y k } by using the terms collocation sites and collocation points, respectively.While {t k } are user-defined parameters, {y k } describe the solution to be found.
Since the proposed collocation algorithm is strongly related to the interpolation of cubic/quintic splines, which may not be common to some readers, spline interpolation is recapitulated in Section 2.3.Then, the proposed collocation method is derived in Section 2.7.Moreover, we reuse core elements of the interpolation algorithm during collocation, thus we cannot omit its derivation.

Spline Parametrization
Before diving into the derivation of algorithms, one first has to decide which spline representation to use.In the literature, formulations such as basis splines (B-splines) are common, since they feature inherent continuity and local control, which typically leads to banded systems [20].In general, B-splines do not pass through their control points, which seems to make interpolation difficult at first sight; however, efficient algorithms for interpolation and collocation exist, see, for example, [21].In [20], it has been shown that B-splines might not be as stable and efficient as other representations, namely monomial and Hermite type bases, especially when it comes to implementation.In particular, monomial bases have been recommended due to their superior condition, and thus lower roundoff errors.For all three forms, B-spline, Hermite type, and monomial, the core operation during Gaussian collocation is typically the solution of an almost block diagonal (ABD) linear system of equations [6,21].A generic solver for these type of systems is SOLVEBLOK [22], while the special structure occurring for monomial bases is exploited by ABDPACK [23] with increased speed and lower memory consumption.Unfortunately, for smoothest spline collocation as presented in the following, the corresponding collocation matrix is dense, thus we cannot apply these algorithms.However, when compared to Gaussian collocation, the count of collocation sites and thus the dimension of the corresponding LSE is much smaller, which can lead to comparable performance.
Despite the popularity of B-splines, we use the so-called piecewise polynomial (PP) form [21], which describes the spline through the coefficients of interconnected, but independently defined, polynomial segments.We use a special type of monomial bases, namely the canonical form of the polynomials, which may not be as efficient as the choice in [20]; however, it makes our algorithm much simpler.By using this formulation, continuity between the spline segments needs to be explicitly established.The evaluation of the resulting spline, however, boils down to the evaluation of a single polynomial belonging to the corresponding segment, which is in general much quicker than evaluating the equivalent B-spline form.The evaluation of B-splines of degree p with the well-known de Boor's algorithm [24] takes O(p 2 ) + O(p) operations [25].There are optimized versions of it as proposed in [25,26]; however, these are numerically less stable [27].In contrast, evaluating a polynomial of degree p with Horner's method [28] takes only 2 p, i.e., O(p), operations [29].This is essential for time-critical applications, where the resulting spline has to be evaluated as quickly as possible.Note that we are free to construct the spline in B-spline formulation and convert it to the corresponding PP form in a post-processing step, see [21].However, this introduces an additional (expensive) step which we try to avoid since, in our case, not only the evaluation but also the construction of the spline is time critical.
Let the spline y(t) be defined as where s i represents the i-th of the n > 1 spline segments parametrized by the normalized interpolation parameter ξ i .We call t ∈ [t 0 , t n ] and ξ i ∈ [0, 1] the global and local interpolation parameters, respectively, for which we choose the linear mapping with h i > 0 as the duration of the i-th segment h i = t i+1 − t i and its reciprocal The partitioning of the spline into n segments is visualized in Figure 3.In the following, we predominantly derive expressions in local segment space, i.e., with respect to ξ i , since this makes the notation clearer, especially in Section 2.7.Note that, in contrast to some other approaches, we do not require homogeneous partitioning, thus the segmentation can be chosen arbitrarily as long as spline knots do not coincide.However, in Section 2.3, we show that for best numerical stability uniform partitioning should be used.In the case of cubic splines (left subscript C), each segment C s i represents a polynomial of degree three, where C a i , C b i , C c i , and C d i are its constant coefficients.The first two derivatives of C s i (ξ i (t)) with respect to t are obtained by applying the chain rule: For quintic splines (left subscript Q), each segment Q s i represents a polynomial of degree five, where and Q f i are its constant coefficients.As in the cubic case, we obtain the first four derivatives of Q s i (ξ i (t)) with respect to t through the chain rule:

Spline Interpolation: Preliminaries
In the following, we recall how a given set of n + 1 data points {(t i , y i )} with i = 0, . . ., n can be interpolated with a C 2 or C 4 smooth cubic or quintic spline, respectively.The derivation explicitly uses the PP form of the spline and leads to the same algorithm as presented in [18], except for slight modifications in notation.Note that [18] only deals with quintic splines.However, the method for cubic segments presented in this paper is a simplified version of the same scheme.Moreover, the derivation is investigated in more detail than it is in [18].Readers not interested in these details are referred to Algorithm 1 which summarizes the results of this section.
In contrast to [18], we only consider the case of predefined first-and second-order derivatives at the boundaries of the quintic spline, i.e., we assume ẏ0 , ÿ0 , ẏn , and ÿn to be given (as indicated in Figure 3).For the cubic counterpart, we lose two degrees of freedom, allowing us to predefine only two constraints out of { ẏ0 , ÿ0 , ẏn , ÿn }.For the remainder of this section, we restrict ourselves to the case of predefined second-order time derivatives ÿ0 and ÿn as this allows an efficient algorithm similar to the one presented for quintic splines.Note that this choice includes the common case of natural cubic splines, i.e., ÿ0 = ÿn = 0.For cubic splines, we postpone the enforcement of the remaining boundary conditions, i.e., ẏ0 and ẏn , to the end of Section 2.7.

Cubic Spline Interpolation: Derivation
Since our goal is a spline which passes through the given data points {(t i , y i )}, we enforce the interpolation constraints Furthermore, we aim at C 2 continuity, thus we further require that C ṡi (ξ i ) Inserting ( 7) and ( 9) into (15) and in the second row of ( 16) allows us to reformulate the spline coefficients as where ÿi for i = 1, . . ., n − 1 are the n − 1 unknowns which have to be computed using the remaining n − 1 equations given by the first row of (16).For this purpose, we expand the first row of ( 16) with (8) and insert C a i , C b i , C c i and C c i+1 from ( 17) to obtain which holds for i = 1, . . ., n − 1.This equation can be considered as defining intermediate unknowns ÿi and can be written as LSE of the form and their elements Herein C L i , C D i , and C U i ∈ R represent the lower diagonal, diagonal, and upper diagonal elements of C A, respectively.Note that C A only depends on the choice of {t i }, thus it can be reused in case we need to perform further interpolations with the same segmentation {t i }.While C A represents the interpolation sites {t i }, the interpolation points {y i } are encoded in C B. The additional term C Λ i incorporates the BCs such that we can write the system in the neat form of (20).From the solution C X we can directly obtain the unknowns ÿi which in turn can be used together with the data points {(t i , y i )} and BCs to compute the segment coefficients (17) such that the spline y(t) is fully defined.Note that C A is symmetric, i.e., C A = C A T ; however, we do not make use of this property.
Although one can compute C X from ( 19) using an arbitrary solver for linear systems of equations, there is a more efficient way for doing so: since C A is tridiagonal, we can solve (19) with the Thomas algorithm [30,31].Derived from an LU decomposition of C A, one performs a recursive forward elimination followed by a backward substitution Computing C X out of C A and C B thus boils down to 5 n − 9 operations.Note that in contrast to [31] where A ∈ R n×n in our case C A ∈ R (n−1)×(n−1) , thus n changes to n − 1 for computing the count of operations in total [31].Since C A is symmetric and positive definite, one may think of using an algorithm based on Cholesky factorization instead of LU decomposition, as this has proven to be approximately twice as efficient where applicable.However, this rule of thumb seems to be no longer valid for the special case of tridiagonal matrices: the Cholesky factorization T = L D −1 L L T in [32], where the computation of the lower diagonal matrix L exploits the special structure and the diagonal matrix D L is used to avoid evaluating expensive square roots, leads to 7 n − 10 operations in total.
The numerical stability of cubic spline interpolation is shown in Appendix A.1.

Quintic Spline Interpolation: Derivation
As previously mentioned, the following is a detailed version of the derivation given in [18].Just as in the cubic case, our task is to pass through the given data points {(t i , y i )}, thus we enforce the interpolation constraints We use the additional degrees of freedom to enforce not only C 2 , but instead C 4 continuity with Q ṡi (ξ i ) i (ξ i ) i (ξ i ) Inserting ( 10), (11), and ( 12) into (28) and into the first two rows of (29) allows us to reformulate the spline coefficients to where ẏi and ÿi for i = 1, . . ., n − 1 are the 2 (n − 1) unknowns which we still have to determine using the remaining 2 (n − 1) equations given by the last two rows of (29).In particular, we expand the fourth row of ( 29) with ( 14) and insert Q a i , Q b i , and Q b i+1 from (30) to obtain In the same manner, we expand the third row of ( 29) with (13) and Both (31) and (32) hold for i = 1, . . ., n − 1 which allows casting them in the form of 1) given by and their block components (38) Just as in the cubic case, Q L i , Q D i , and Q U i ∈ R 2×2 represent the lower diagonal, diagonal, and upper diagonal blocks of Q A, respectively.As suggested in [18], the additional parameter κ in the definitions ( 35)-( 38) is chosen as From an analytical point of view, κ has no influence on the solution Q X (at least if κ = 0).However, it improves numerical stability which is verified in Appendix A.2.
As for the cubic spline Q A is symmetric and only depends on the choice of {t i }, while the interpolation points {y i } are contained in Q B. As before, the additional term Q Λ i incorporates the BCs such that we can write the system in the form of (34).In contrast to the cubic case, the solution Q X represents not only ÿi , but also ẏi , which is now additionally required to compute the segment coefficients from (30).
Since Q A is block-tridiagonal, we can again solve (33) efficiently with the generalization of the Thomas algorithm to block-tridiagonal matrices [31].Based on an LU decomposition of Q A, we first run a recursive forward elimination followed by a backward substitution Computing Q X out of Q A and Q B requires at a maximum 36 n − 60 operations.Again [31] uses A ∈ R 2 n×2 n while in our case Q A ∈ R 2(n−1)×2(n−1) holds, thus n changes to n − 1 for computing the count of operations in total [31].For this upper bound, explicit computation of the inverse of Q D 1 and ) by Gaussian elimination is assumed, which in practice should be avoided by solving a 2 × 2 LSE instead [31].Thus, a corresponding implementation can be expected to require even less operations.
In Appendix A.2, the interested reader can find considerations on the numerical stability of quintic spline interpolation.

Algorithm for Cubic/Quintic Spline Interpolation
Since the presented derivation is rather lengthy, the key steps for interpolating cubic/quintic splines following the proposed method are summarized in Algorithm 1.
begin Cubic: Setup C L i , C D i , C U i , and C B i from ( 21), ( 22), (23), and ( 24) 35), ( 36), (37), and (38) Cubic: Compute C X i using the recursive scheme given in (25), (26), and (27) Quintic: Compute Q X i using the recursive scheme given in (40), (41), and (42) Cubic: Extract ÿi from C X i and compute segment coefficients C a i , C b i , C c i , and C d i using (17 and Q f i using ( 30) end For details on convergence order and approximation error of quintic spline interpolation, the interested reader is referred to [18] where these issues have been experimentally investigated for various examples.

Spline Collocation: Derivation
The following is based on the collocation algorithm presented in [19,33].However, we extend the method from cubic to quintic splines.Moreover, we do not use natural splines, but instead integrate the boundary conditions directly into the scheme.Lastly, in contrast to [19,33], we do not need to modify the right-hand side of (1), thus leading to a "true" collocation of the ODE for all collocation sites, which are chosen to be the interior spline knots.As runtime performance is of the highest priority for our application, we choose smoothest spline collocation.This minimizes the count of (expensive) collocation sites, thus reduces the count of equations to solve, and instead uses the available degrees of freedom to force C 2 (cubic spline) or C 4 (quintic spline) continuity.Moreover, in our application, y(t) is used as input for controlling the motion of a robot, thus a smooth y(t) is equivalent to small changes in joint accelerations, i.e., motor jerks, which in turn improves overall stability during locomotion.
As stated in Section 2.1, we require the approximation y(t) to fulfill the underlying ODE at certain collocation sites {t k }, see (3), while simultaneously satisfying the boundary conditions as specified in (4).Note that we use the index k instead of i to highlight that our new task consists in collocating the ODE at the interior knots, i.e., k = 1, . . ., n − 1 rather than the previously investigated interpolation at all knots, i.e., i = 0, . . ., n.Furthermore, it should be pointed out that although (3) holds, this does not imply that y(t k ) = F(t k ), ẏ(t k ) = Ḟ(t k ), or ÿ(t k ) = F(t k ).In other words, y(t) will not coincide with the real solution F(t) at the collocation sites {t k }.However, it will behave similarly at these spots (meaning that they will satisfy the same Equation ( 1)), which is illustrated in Figure 4.The approximation satisfies the underlying ODE at the specified collocation sites {t k } (blue), and fulfills the boundary conditions (BC) at t 0 and t n (orange), but do not necessarily coincide at the collocation points.
As first step, we introduce the auxiliary variables λ, η, and r which are defined as where C λ, C η, C r and Q λ, Q η, Q r are the corresponding counterparts for the case of cubic and quintic splines, respectively.While λ represents the (yet unknown) collocation points {y k }, η contains their corresponding first (and second) time derivatives, which can be seen as "internal" unknowns, as they will be implicitly defined through an embedded spline interpolation.Lastly, r depicts the BCs, where we lack ẏ0 and ẏn in the case of cubic splines as has been previously explained.From ( 17), (30), and (43) we observe that the spline segments s i are linear with respect to λ, η, and r, i.e., known for i = 0, . . ., n − 1 (44) holds.The gradients are fully defined by the spline partitioning {t i }, which is assumed to be known.Thus the construction of the spline y(t) is equivalent to the search for a corresponding λ and η.Note that to obtain (44), we used ( 17) and (30), which in turn were derived from fulfilling the interpolation condition together with enforcing continuity of the second time derivative (cubic spline), or first and second time derivative (quintic spline) at the interior knots.In order to accomplish full C 2 and C 4 continuity, we further make use of A X = B from ( 19) and (33), which represents continuity of the first time derivative (cubic spline) or third and fourth time derivative (quintic spline), respectively.
In particular, we observe from ( 23), ( 24), (37), and (38), that B is linear with respect to λ and r, thus holds, where the gradients again depend only on the known partitioning {t i }.We further observe that, according to the definitions ( 21), ( 35) and (43), we can write the mapping X = S η with C S := I and Q S := diag ( Q S 1 , . . . ,Q S n−1 ) where Q S i := −1 0 0 with I being the identity matrix of appropriate size.Since A and S are constant, by "constant" we mean in this context that an expression does not depend on the yet unknown λ or η, and assumed to be non-singular, it is clear from (45) that not only B, but also X and thus η are linear with respect to λ and r.Hence  . ( which we can use to compute the yet unknown gradients in (47).Note that this can be done very efficiently due to the (block-)tridiagonal form of A as already discussed in Section 2.3.Since S is diagonal, this property also holds for the product A S. However, for best numerical stability, one should solve for the gradients of X first and use the mapping (46) to obtain the gradients of η afterwards, which is of negligible cost since S, and thus also S −1 , is diagonal.The right-hand sides necessary to solve (48) only depend on {t i } and are derived in Appendix B. Lastly, we insert η from (47) into (44) and obtain or equivalently with the known spline gradients We can obtain the corresponding expressions for the first and second time derivatives by following the exact same scheme.In particular, we get with ∇ λ si (ξ i ) := ∂s i (ξ i ) ∂λ (54) Although computing the spline gradients can be done very efficiently, their mathematical representation is rather lengthy.A formulation of the gradients, which is ready for implementation, is given in Appendix B.
Lastly, we fulfill the dynamics of the ODE by inserting (50) and ( 52) into (3), which leads to for t 0 < t k < t n and t k < t k+1 with k = 1, . . ., n − 1.This can be formulated as LSE with where the k-th row of A coll and B coll corresponds to the collocation site t k and is given by Note that we choose a partitioning of the spline such that the collocation sites coincide with the starting ("left") knot of each segment, i.e., ξ k = 0, see Figure 4.This allows us to skip the computation of certain spline gradients.Since the underlying ODE is of order two, only gradients of the last three coefficients, i.e., C b i , C c i , C d i (cubic) or Q d i , Q e i , Q f i (quintic), have to be computed, for details see Appendix B. This simplifies the implementation and improves the overall performance.Solving (56) for λ represents the key operation (i.e., bottleneck for large n) of the proposed collocation method, since A coll is in general dense while all other operations are either simple explicit expressions or linear systems in (block-)tridiagonal form, which can be solved efficiently.This justifies our strategy to minimize the count of collocation points (and thus the dimension of A coll ) and instead force high order continuity.
As soon as λ has been obtained, we can compute η directly from (47), since the gradients of η with respect to λ and r are already available as by-products of computing λ, see (48).From λ, η and r the segment coefficients can finally be computed using (17) or (30).

Satisfying First Order Boundary Conditions for Cubic Splines
The presented method for cubic spline collocation respects y 0 , ÿ0 , y n , and ÿn as BCs.However, our task was to fulfill all BCs given in (4), which seems at first to be only possible with quintic spline collocation.Also satisfying the first-order BCs, i.e., ẏ0 and ẏn , can be achieved with moderate effort.For that purpose we insert two auxiliary knots, the so-called virtual control points t virt,1 and t virt,2 , which give us the necessary degrees of freedom to introduce additional constraints.The term "virtual" highlights that these points are not used as collocation sites.Both, t virt,1 and t virt,2 , have to lie within the specified start-and endtime and must not coincide with the collocation sites.This way the spline remains properly partitioned.For simplicity, we place the virtual control points at the centers of the (originally) first and last segments, i.e., Obviously, inserting two knots leads to a different segmentation of the spline, i.e., n → n + 2; however, the boundaries and collocation sites remain unchanged.If we adapt the indexing such that t 1 := t virt,1 and t n−1 := t virt,2 , all findings derived so far are also valid for this case.The only difference is that we do not force y(t) to fulfill the underlying ODE at t 1 and t n−1 anymore.Instead, we satisfy the BCs ẏ0 and ẏn by replacing the first and last row of A coll and B coll with In this way the resulting cubic spline fulfills all BCs given in (4), satisfies the underlying ODE at the specified collocation sites, and is C 2 continuous at the interior spline knots (which include t virt,1 and t virt,2 ).

Algorithm for Cubic/Quintic Spline Collocation
We summarize our findings in Algorithm 2. Note that, for the case of cubic splines, the modification necessary to satisfy first-order BCs is already integrated.usability, the library is designed to be header-only and uses the (also header-only) cross-platform linear algebra system EIGEN [35] as sole dependency.Although BROCCOLI also has other dependencies, only EIGEN is required for the functionality discussed in this paper.Aside from basic matrix and vector operations, we use EIGEN to solve dense linear systems of equations such as 40) and (41).In particular, we use a Householder rank-revealing QR decomposition with column-pivoting, see ColPivHouseholderQR in EIGEN, since it provides the best trade-off between speed, accuracy, and robustness for our use case.Note that for large scale systems solving A −1 coll B coll turns out to be the bottleneck, cf. Figure 11.Thus, in this case one might choose a parallel solver for (56) instead, for example PartialPivLU in EIGEN with OPENMP enabled.In addition to the variants CubicSplineCollocator and QuinticSplineCollocator for spline collocation, see ode module of BROCCOLI, cubic and quintic spline interpolation, see curve module of BROCCOLI, as specified in Algorithm 1, is also implemented.An overview over the structure of the source code related to our collocation method is given in Appendix C. Lastly, unit tests, similar to the evaluation presented in the following, are shipped with BROCCOLI.These can be used as example applications.

Test System
In order to evaluate the proposed method and its variants, a simple mass-spring-damper system is considered, see Figure 5 (left).For the sake of simplicity, we do not consider external excitation, i.e., τ(t) := 0. The corresponding ODE (1) describing the system dynamics simplifies to where α, β, and γ are constants representing the mass and (linear) damping/stiffness coefficients of the system, respectively.Using textbook mathematics, we can find the analytical solution given by where we used the initial conditions F(t 0 = 0) = F 0 and Ḟ(t 0 = 0) = Ḟ0 , and the abbreviations The characteristic shape of each branch of ( 63) is visualized in Figure 5 (right) for the parametrization α = 1, β = 1, γ = 10 in the underdamped case, α = 1, β = 10, γ = 10 in the overdamped case, and α = 1, β = 10, γ = 25 in the critically damped case.For the rest of this paper we will adhere to this parametrization.Moreover, we assume the initial conditions to be given with F 0 = 1, Ḟ0 = 0. Note that by choosing σ 1 < 0 we obtain asymptotically stable behavior.As we never constrained the underlying ODE to be stable, our algorithm can also be used to approximate instable systems.Since tests with β = −1 showed results comparable to the underdamped case (with β = 1), we omit an explicit discussion of this case for brevity.Lastly, we point out that although (62) describes a very simple system, we also successfully applied the proposed algorithm to the much more complex walking-pattern generation system of our humanoid robot LOLA, cf.[13].However, the properties and characteristics of the proposed method can be better investigated and explained by means of a less complex test system.

Convergence for Consistent and Inconsistent Boundary Conditions
As already mentioned, we focus on over-determined BVPs.Thus, we consider two cases for evaluating our algorithm.For the first analysis, we use the initial conditions y 0 = F 0 = 1, ẏ0 = Ḟ0 = 0 and correspondingly ÿ0 = F0 = −γ/α, see (62), together with the analytical solution given in (63) to compute the BCs y n = F n , ẏn = Ḟn , and ÿn = Fn at t n = 5.In this way, the BCs are guaranteed to be consistent because they belong to the same analytic solution F(t).In the second case, we keep the initial BCs unchanged, but force y n = ẏn = ÿn = 0 for t n = 5 which deviates from the previous solution F n , Ḟn , and Fn , especially in the underdamped case, as can be clearly seen in Figure 5 right.Thus, the second analysis handles inconsistent BCs.
In the following, we only focus on the underdamped and overdamped case, since the critically damped case can be seen as a special form of these with σ 2 → 0. By evaluating both cases, we aim at covering oscillating and non-oscillating dynamics.Moreover, we run tests for different counts of collocation sites ν := |{t k }|, where we use a uniform segmentation of the spline, i.e., h i = h i+1 ∀ i.For cubic and quintic spline collocation without virtual control points ν = n − 2 holds.In contrast, ν = n − 4 holds for cubic spline collocation with virtual control points.Note that an inhomogeneous partitioning h i = h i+1 is evaluated per default in the unit tests provided with BROCCOLI.The approximation of the BVP by spline collocation with consistent BCs and ν = 1, . . ., 100 is depicted in Figure 6.Note that for cubic spline collocation without virtual control points, the BCs ẏ0 and ẏn are violated (see Section 2.7).
For the case of consistent BCs, we observe that the approximation y(t) indeed converges for increasing ν to the analytical solution F(t).From a qualitative point of view, the convergence order using a quintic spline (Figure 6 right column) is clearly higher than the corresponding cubic counterparts (Figure 6 left and center column).In order to compare convergence using a quantitative measure, we use the root mean square (RMS) of the approximation error e(t) and of the residual r(t), defined as Figure 6.Consistent BCs: convergence of the approximation y(t) (blue and green) towards the analytic solution F(t) (black, dashed) for ν = 1, . . .9, 30, 50, 70, 100.The top row belongs to the underdamped case while the bottom row represents the overdamped case.From left to right: approximation with cubic spline without virtual control points (left), cubic spline with virtual control points (center) and quintic spline (right).The corresponding best approximation ν max is drawn in bold blue.
For numerical evaluation of (65) and (66), we discretize the embedded integral using a time step size of ∆t = 0.01 for which we obtain the results presented in Figure 7 left.Since we stop the test series at ν max = 100, we find the optimum for all variants to be at ν opt = ν max .However, due to convergence, we expect the theoretical optimum to be at ν opt → ∞.We observe that also from an quantitative point of view, quintic spline collocation clearly outperforms the cubic variants for the same ν.Furthermore, forcing first-order BCs through virtual control points of the cubic spline seems to have a negative influence, especially for increasing ν.The influence of virtual control points on the residual is visualized in Figure 8 where peaks in r(t) occur at these spots.Note that without virtual control points, the first-order boundary conditions at t 0 = 0 and t n = 5 are missed.Thus, in contrast to the collocation sites, the residual does not drop to zero at the boundaries.By comparing the left and right plot of Figure 8, we observe that r(t) behaves similarly for consistent and inconsistent BCs.
For consistent BCs, we can state that, at least for the investigated test system, the proposed algorithm leads to an approximation which converges to the real solution where the "speed" of convergence depends on the chosen variant of Algorithm 2. For inconsistent BCs, however, our analysis draws a different picture: while we can find an optimal count of collocation sites ν opt for each variant and test case, the approximation diverges for ν → ∞, see Figure 7 right and Figure 9.Note that this was expected since we are attempting to find an approximation of a solution which does not actually exist.For small ν, we still find a "reasonable" y(t) where the spline is still able to smooth out the wrong BCs.However, as we refine the segmentation of the spline, it gets harder to compensate the error, which in turn leads to undesired oscillations of y(t).Note that although we call this behavior divergence, our collocation algorithm still satisfies all constraints that we defined: using a given partitioning, it fulfills the BCs and satisfies the underlying ODE at the given collocation sites.Thus, the divergence is not a fault of the proposed algorithm, but rather is due to the non-existence of the solution F(t).Moreover, the sensitivity to undesired oscillations depends heavily on the underlying BVP: for our target application of real-time motion planning, we did not observe any oscillations up to ν = 10 4 .The critical value for ν is expected to be even higher.However, we could not determine it due to memory limitations of our test system.65) and (66).The left subscript differs between cubic C and quintic Q spline collocation.Moreover, for cubic spline collocation, we identify the variants without virtual control points (i.e., free first-order boundaries ẏ0 , ẏn ) and with virtual control points by the right subscript free and virt, respectively.The left plot belongs to the case of consistent BCs while the right one was obtained using inconsistent BCs.Note that for inconsistent BCs an analytic solution does not exist, thus the error e(t) is not defined and we consider only the residual r(t).

Figure 8.
Residual r(t) as defined in (66) for consistent (left) and inconsistent (right) BCs in the underdamped case.For best presentation, the count of collocation sites is chosen as ν = 4 such that the collocation sites (dots) are given by {t k } = {1, 2, 3, 4}.For cubic spline collocation, the left subscript C and for the quintic counterpart Q is used.Moreover, the right subscripts free and virt are used to differentiate between the variants without and with virtual control points, respectively.The virtual control points are highlighted with circles.

Figure 9.
Inconsistent BCs: approximation y(t) (blue, green, and orange) and reference system F(t) (black, dashed) for ν = 1, . . ., 4, ν opt , 13, 15, 17, 20.The top row belongs to the underdamped case while the bottom row represents the overdamped case.From left to right: approximation with cubic spline (no virtual control points, left), cubic spline (with virtual control points, center) and quintic spline (right).The corresponding best approximation ν opt is drawn in bold blue.Diverging approximations for ν > ν opt are colored orange.
In order to avoid undesired oscillations, an optimal ν has to be chosen, which seems to be difficult at the first sight, since in practice we do not know the analytical solution, and thus cannot evaluate e(t).However, one can use the residual r(t) as measure for the approximation error and use this to formulate a governing optimization for finding ν opt .Moreover, one can also use a non-uniform segmentation to give specific sections more weight.Such adaptive techniques for automatic mesh refinement have also been developed for other collocation methods, see, for example, [36].However, for our application it is sufficient to choose h i once (fixed), thus we withhold this idea for future investigations.

Runtime Analysis
In the following, we present measurements, which have been made by using the implementation given in BROCCOLI with the version primo (v1.0.0 -commit 89280c69).We used an AMD Ryzen 7 1700X 8x (16x) @3.40GHz with 32GB DDR4 RAM @2133MHz as hardware backend and Ubuntu 18.04.2LTS 64bit (Linux kernel 4.15.0-51)together with Clang (version 6.0.0-1) on optimization level 3 as software basis.Although our algorithm is sequential, we run different test cases on four physical cores of the CPU in parallel.For all tests simultaneous multi-threading (SMT) was disabled in the BIOS.For runtime evaluation, we take 1000 samples for every code section in Algorithm 2 and choose the minimum execution time as reference to minimize the risk of wrong measurements due to high system load and context switching effects.
In Figure 10 (left), we present runtime measurements for all three variants of our algorithm.We restrict our analysis to the case of an underdamped parametrization and consistent BCs since we expect comparable results for the other test cases.Since our algorithm has a deterministic runtime which only depends on the chosen variant (cubic/quintic, with/without virtual control points) and ν, there is no reason why it should be faster or slower with another parametrization.We observe that quintic spline collocation is more expensive than the cubic counterparts for the same ν, which complies with theory since the (block-)tridiagonal system for quintic splines is twice the size of the tridiagonal system for cubic splines.However, for increasing ν, the gap becomes smaller since solving the collocation equations, where A coll is of the same dimension for cubic and quintic splines, becomes the bottleneck, see Figure 11.Note that there is only a small difference in runtime between cubic spline collocation with and without virtual control points.This also complies with theory, since the only difference is two additional spline knots, which increases the dimension of A coll by two.Lastly, the small ripples in Figure 10 (left) are due to vectorization and SIMD optimizations handled by EIGEN and the compiler, which give slightly better performance if the dimension of arrays in memory are a multiple of two.Comparing runtimes for the same ν might not be a meaningful basis for choosing a method.Instead, one is typically interested in getting the best approximation in the shortest time.For this purpose, the RMS of the residual r(t) is plotted over runtime in Figure 10 (right).As can be seen, quintic spline collocation significantly outperforms both other variants.Moreover, bad convergence of cubic spline collocation with virtual control points is also visible in this comparison.In addition to measuring total runtime, we also performed a detailed analysis on the relative cost of each code section of Algorithm 2 during quintic spline collocation.Figure 11 demonstrates that in the vicinity of ν ≈ 160, evaluating the block-tridiagonal systems to enforce BCs and continuity, and solving A coll λ = B coll for actual collocation share approximately the same portion of total runtime.With increasing ν, solving for λ becomes more relevant since the corresponding system is dense while the block-tridiagonal LSE can be solved efficiently using the recursive scheme discussed in Section 2.3.
Lastly, we want to point out that the condition of A coll gets worse for increasing count of collocation sites ν, see Figure 10 left.Thus, there might be an upper limit of ν for the proposed algorithm.

Discussion
As shown in the previous section, the presented algorithm performs well if the BCs are consistent and fully known.Even in the case of inconsistent BCs, we still obtain a reasonable approximation as long as we carefully pick the collocation sites.However, if we exceed the optimum, undesired oscillations may occur, which is an indicator for putting too much emphasis on satisfying the underlying ODE while simultaneously trying to compensate the "broken" BCs.In order to automatically determine an optimal partitioning of the spline, a higher level optimization may be applied, which will be the focus of further investigations.
Obviously, if the investigated BVP is well-posed, i.e., if it is not over-determined, other techniques should be preferred over our approach.However, for applications where the enforcement of certain BCs is more important than approximating the underlying dynamics, the proposed method seems to be a valid approach.Moreover, we want to emphasize that no variable recursion or iteration is involved.This makes execution time predictable, which is especially relevant for real-time applications such as in our use case.
Comparing different variants of our algorithm showed that collocation using a quintic spline is in general superior to using the somewhat simpler cubic splines.Note that although its derivation is more involved, the final implementation is of approximately the same complexity, since virtual control points have to be introduced in the cubic case.At this point, we also want to emphasize that, based on the results of our study, we do not recommend to use cubic spline collocation with virtual control points.Although the full set of BCs is satisfied, the additional knots seem to significantly downgrade convergence.However, other choices of t virt,1 and t virt,2 may lead to different results.
Within this contribution, we only considered second-order BVPs.An extension to ODEs of higher order seems to be straightforward, since only the collocation equations, see (58) and (59) have to be extended by the corresponding gradients while the overall dimension of A coll and B coll stays the same.Moreover, our approach may be applied to nonlinear systems as well, by using the common approach of linearization and embedding the scheme into a Newton iteration.
Lastly, we want to highlight that our focus lies on runtime efficiency.However, for embedded systems especially, memory consumption may also be a limiting factor.Although we expect our algorithm to have similar requirements when compared to other techniques, we have not looked into this issue so far.
and further which shows that Γ i does not depend on g i for a constant ratio ω.It can be easily verified through numerical evaluation of (A4) that a homogeneous partitioning of the spline, i.e., ω = 1, results in the lowest value for Γ i .Using the spectral matrix norm and the special choice of κ given in (39) leads to Unfortunately, condition (A2) is violated, even for the ideal case of a homogeneously partitioned spline.However, (A2) represents a sufficient but not necessary condition for numerical stability, thus we can still use Γ i as a measure for the pivotal growth [37] which should be minimized.In doing so, we can conclude that for best numerical stability one should use a "reasonable" ratio ω which is close to 1.
Note that the special choice of κ in (39) has been suggested in [18] to minimize Γ i and thus optimize numerical stability.This intention becomes clear when observing that for κ 2 ≥ 64 3 where λ max (. . . ) denotes the maximum eigenvalue of a given matrix.Since both Q Li (ω) , increase with growing |κ|, the optimum (39) is chosen.This finding can be easily verified through numerical investigation.Note that numerical stability only depends on |κ|, thus one could also choose the negative form κ = − √ 64/3.

Appendix B. Spline Gradients
In the following, explicit expressions for the spline gradients used in (51), (53), and (54) are given.Note that the gradients differ depending on the type of the underlying spline (cubic/quintic).

Appendix B.1. Cubic Spline Gradients
From ( 7), (8), and (9) we obtain for each variable C ρ ∈ { C λ, C η, C r} and each segment C s i with i = 0, . . ., n − 1 )  which can easily be derived from (17) and the definition of C λ, C η, and C r.Note that for ξ i = 0 the computation of can be skipped entirely since up to the second time derivative of the gradients of s i (ξ i = 0) these terms are multiplied with zero and have no effect anyway.In order to solve (48), we have to further compute the corresponding right-hand sides which are given by

. Quintic Spline Gradients
From ( 10), (11), and ( 12) we obtain for each variable Q ρ ∈ { Q λ, Q η, Q r} and each segment Q s i with i = 0, . . ., n − 1 15 for u = i , −15 for u = i + 1 , 0 else for i = 0 ∧ w = 3 , 0 else , (A27) 0 else , (A28) which can easily be derived from (30) and the definition of Q λ, Q η, and Q r.Note that for ξ i = 0 the computation of can be skipped entirely since up to the second time derivative of the gradients of s i (ξ i = 0) these terms are multiplied with zero and have no effect anyway.In order to solve (48), we further have to compute the corresponding right-hand sides, which are given by

Figure 1 .
Figure 1.Humanoid robot LOLA developed at the Chair of Applied Mechanics, Technical University of Munich (TUM).The proposed algorithm is used within the walking pattern generation framework of LOLA, see[13] for details.The robot is 1.8 m tall and weights about 60 kg.Left: photo and kinematic configuration of the system with 24 actuated degrees of freedom.Right: simplified model of multi-body dynamics with torso mass m t , foot mass m f and torso mass moment of inertia Θ t which (together with the ground reaction forces/torques) contribute to the ordinary differential equation (ODE) describing the center of mass (CoM) dynamics (blue).

Figure 2 .
Figure 2. Visual interpretation of the over-determined boundary value problem (BVP): the start and end pose (left/right) represent the boundary conditions, while the intermediate motion (transparent) tries to approximate the inherent system dynamics of the simplified model.

Figure 3 .
Figure 3. Segmentation and parametrization of the investigated spline y(t).The spline consists of n interconnected segments, which share the interior knots with their neighbors.Each segment is described through the local interpolation parameter ξ i ∈ [0, 1] (blue).

Figure 4 .
Figure 4.The computed spline y(t) (black) approximating the real solution F(t) (green).The approximation satisfies the underlying ODE at the specified collocation sites {t k } (blue), and fulfills the boundary conditions (BC) at t 0 and t n (orange), but do not necessarily coincide at the collocation points.

Figure 7 .
Figure 7. Root mean square (RMS) of error e(t) and residual r(t) as defined in (65) and (66).The left subscript differs between cubic C and quintic Q spline collocation.Moreover, for cubic spline collocation, we identify the variants without virtual control points (i.e., free first-order boundaries ẏ0 , ẏn ) and with virtual control points by the right subscript free and virt, respectively.The left plot belongs to the case of consistent BCs while the right one was obtained using inconsistent BCs.Note that for inconsistent BCs an analytic solution does not exist, thus the error e(t) is not defined and we consider only the residual r(t).

Figure 10 .
Figure10.Left: (minimum) runtime T and condition C of A coll for running Algorithm 2 over count of collocation sites ν.Right: root mean square (RMS) of residual r(t) as defined in (66) over (minimum) runtime T. The left subscript C and Q belong the cubic or quintic spline version of the algorithm, respectively.Moreover, the right subscript indicates if cubic spline collocation was performed without (free) or with (virt) virtual control points.All measurements were performed using the underdamped parametrization and consistent BCs.

Figure 11 .
Figure11.Runtime (percentile) of relevant steps of Algorithm 2 relative to total runtime for quintic spline collocation.Code sections with negligible execution time are not plotted and also not accounted for total runtime.The measurements were obtained by using an extended time horizon t n = 50 and the parametrization α = 1, β = 0.1, γ = 10 (underdamped) to allow a better representation of high counts of collocation sites of up to ν = 300.