The Adaptive Dynamic Programming Toolbox

The paper develops the adaptive dynamic programming toolbox (ADPT), a MATLAB-based software package that computationally solves optimal control problems for continuous-time control-affine systems. The ADPT produces approximate optimal feedback controls by employing the adaptive dynamic programming technique and solving the Hamilton–Jacobi–Bellman equation approximately. A novel implementation method is derived to reduce the memory consumption of the ADPT throughout its execution. The ADPT supports two working modes: a model-based mode and a model-free mode. In the former, the ADPT computes optimal feedback controls given the system dynamics. In the latter, optimal feedback controls are generated from measurements of system trajectories, without requiring knowledge of the system model. Multiple setting options are provided so that various customized circumstances can be accommodated. Compared to other popular software toolboxes for optimal control, the ADPT features computational precision and time efficiency, which is illustrated with its application to a highly non-linear satellite attitude control problem.


Introduction
Optimal control is an important branch of control engineering. For continuous-time dynamical systems, finding an optimal feedback control involves solving the so-called Hamilton–Jacobi–Bellman (HJB) equation [1]. For linear systems, however, the HJB equation simplifies to the well-known Riccati equation, which results in the linear quadratic regulator [2]. For non-linear systems, solving the HJB equation is generally a formidable task due to its inherently non-linear nature. As a result, a great deal of research has been devoted to approximately solving the HJB equation. Al'brekht proposed a power series method for smooth systems to solve the HJB equation [3]. Under the assumption that the optimal control and the optimal cost function can be represented in Taylor series, by plugging the series expansions of the dynamics, the cost integrand function, the optimal control and the optimal cost function into the HJB equation and collecting terms degree by degree, the Taylor expansions of the optimal control and the optimal cost function can be obtained recursively. Similar ideas can be found in [4,5]. A recursive algorithm has been developed that, starting from an admissible control, sequentially improves the control law until it converges to the optimal one [6]. This recursive algorithm is commonly referred to as policy iteration (PI) and can also be found in [7][8][9]. The common limitation of these methods is that complete knowledge of the system is required.
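To make the policy iteration idea concrete, the following sketch (a standard textbook calculation on a made-up scalar plant, not code from any toolbox cited here) evaluates and improves linear policies u_i(x) = k_i x for ẋ = ax + bu with cost ∫(qx² + ru²)dt, until the gain reaches the LQR solution of the associated Riccati equation:

```python
import math

# Policy iteration for the scalar plant xdot = a*x + b*u with cost
# integral of q*x^2 + r*u^2; ansatz V_i(x) = p_i*x^2 and u_i(x) = k_i*x.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k = -2.0                      # initial admissible (stabilizing) gain: a + b*k < 0
for _ in range(12):
    # Policy evaluation: 2*p*(a + b*k) + q + r*k^2 = 0 along the closed loop
    p = -(q + r * k * k) / (2.0 * (a + b * k))
    # Policy improvement: u_{i+1}(x) = -(1/2) r^{-1} b * dV_i/dx = -(b*p/r) x
    k = -b * p / r
print(p, k)   # p -> 1 + sqrt(2), k -> -(1 + sqrt(2)): the Riccati solution
```

Each evaluation step solves the linear Lyapunov-like equation exactly, so the iteration converges quadratically; for this plant the limit matches the positive root of 2ap − b²p²/r + q = 0.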
In the past few decades, reinforcement learning (RL) [10] has provided a means to design optimal controllers in an adaptive manner from the viewpoint of learning. Adaptive or approximate dynamic programming (ADP), an iterative RL-based adaptive optimal control design method, has been proposed in [11][12][13][14][15]. An approach that employs ADP is proposed in [11] for linear systems without requiring a priori knowledge of the system matrices. An ADP strategy is presented for non-linear systems with partially unknown dynamics in [12], and the requirement of knowledge of the system model is fully removed in [13][14][15].
Together with the growth of optimal control theory and methods, several software tools for optimal control have been developed. Notable examples are the non-linear systems toolbox [16], the control toolbox [17], ACADO [18], its successor ACADOS [19], and GPOPS-II [20]. A common feature of these packages is that they require the system equations. In addition, the optimal controls generated by [17][18][19][20] are open-loop, so an optimal control is computed for each initial state; if the initial state changes, the optimal control needs to be computed again. In contrast, the non-linear systems toolbox [16] produces an optimal feedback control by solving the HJB equation.
The primary objective of this paper is to develop a MATLAB-based toolbox that solves optimal feedback control problems computationally for control-affine systems in the continuous-time domain. More specifically, employing the adaptive dynamic programming technique, we derive a computational methodology to compute approximate optimal feedback controls, based on which we develop the adaptive dynamic programming toolbox (ADPT). In the derivation, the Kronecker product used in [11,14] is replaced by Euclidean inner product for the purpose of memory saving during execution of the ADPT. The ADPT supports two working modes: the model-based mode and the model-free mode. The knowledge of system equations is required in the model-based mode. In the model-free mode, the ADPT produces the approximate optimal feedback control from measurements of system trajectories, removing the requirement of the knowledge of system equations. Moreover, multiple options are provided, such that the user can use the toolbox with much flexibility.
The remainder of the paper is organized as follows. Section 2 reviews the standard optimal control problem for a class of continuous-time non-linear systems and the modelfree adaptive dynamic programming technique. Section 3 provides implementation details and software features of the ADPT. In Section 4, the ADPT is applied to a satellite attitude control problem in both the model-based mode and the model-free mode. Conclusions and potential future directions are given in Section 5. The codes of the ADPT are available at https://github.com/Everglow0214/The_Adaptive_Dynamic_Programming_Toolbox, accessed on 10 August 2021.

Review of Adaptive Dynamic Programming
We review the adaptive dynamic programming (ADP) technique to solve optimal control problems [13,14]. Consider a continuous-time control-affine system given by

$$\dot{x} = f(x) + g(x)u, \qquad (1)$$

where x ∈ R^n is the state, u ∈ R^m is the control, f : R^n → R^n and g : R^n → R^{n×m} are locally Lipschitz mappings with f(0) = 0. It is assumed that (1) is stabilizable at x = 0 in the sense that the system can be locally asymptotically stabilized by a continuous feedback control. To quantify the performance of a control, an integral cost associated with (1) is given by

$$J(x_0, u) = \int_0^\infty \big( q(x(t)) + u(t)^T R\,u(t) \big)\,dt, \qquad (2)$$

where x_0 = x(0) is the initial state, q : R^n → R_{≥0} is a positive definite function and R ∈ R^{m×m} is a symmetric, positive definite matrix. A feedback control u : R^n → R^m is said to be admissible if it stabilizes (1) at the origin and makes the cost J(x_0, u) finite for all x_0 in a neighborhood of x = 0. The objective is to find a control policy u that minimizes J(x_0, u) given x_0. Define the optimal cost function V* : R^n → R by

$$V^*(x) = \min_{u}\, J(x, u)$$

for x ∈ R^n, where the minimum is taken over admissible controls. Then, V* satisfies the HJB equation

$$\min_{u} \big[ \nabla V^*(x)\,(f(x) + g(x)u) + q(x) + u^T R\,u \big] = 0,$$

and the minimizer in the HJB equation is the optimal control, which is expressed in terms of V* as

$$u^*(x) = -\tfrac{1}{2}\,R^{-1} g(x)^T \nabla V^*(x)^T.$$

Moreover, the state feedback u* locally asymptotically stabilizes (1) at the origin and minimizes (2) over all admissible controls [2]. Solving the HJB equation analytically is extremely difficult in general except for linear cases. Hence, approximate or iterative methods are needed to solve the HJB equation, and the well-known policy iteration (PI) technique [6] is reviewed in Algorithm 1. Let {V_i(x)}_{i≥0} and {u_{i+1}(x)}_{i≥0} be the sequences of functions generated by PI in Algorithm 1. It is shown in [6] that V_{i+1}(x) ≤ V_i(x) for i ≥ 0, and the limit functions V(x) = lim_{i→∞} V_i(x) and u(x) = lim_{i→∞} u_i(x) are equal to the optimal cost function V* and the optimal control u*, respectively.
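As a consistency check on the formulas above (a standard derivation, not taken from the paper), specializing the HJB equation to a linear system with quadratic cost recovers the algebraic Riccati equation mentioned in the introduction:

```latex
% Linear-quadratic specialization: f(x) = Ax, g(x) = B, q(x) = x^T Q x.
% Ansatz: V^*(x) = x^T P x with P = P^T > 0, so \nabla V^*(x)^T = 2Px.
% The minimizer gives u^*(x) = -\tfrac{1}{2} R^{-1} B^T (2Px) = -R^{-1} B^T P x.
% Substituting into
%   \min_u \big[ \nabla V^*(x) (Ax + Bu) + x^T Q x + u^T R u \big] = 0
% and collecting the quadratic form in x yields
\[
  A^T P + P A - P B R^{-1} B^T P + Q = 0,
\]
% the continuous-time algebraic Riccati equation underlying the LQR.
```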

Algorithm 1 Policy iteration
Input: An initial admissible control u_0(x), and a threshold ε > 0.
Output: The approximate optimal control u_{i+1}(x) and the approximate optimal cost function V_i(x).
1: Set i ← 0.
2: while i ≥ 0 do
3: Policy evaluation: solve for the continuously differentiable cost function V_i(x), with V_i(0) = 0, from
$$\nabla V_i(x)\,\big(f(x) + g(x)u_i(x)\big) + q(x) + u_i(x)^T R\,u_i(x) = 0. \qquad (3)$$
4: Policy improvement: update the control policy by
$$u_{i+1}(x) = -\tfrac{1}{2}\,R^{-1} g(x)^T \nabla V_i(x)^T. \qquad (4)$$

5: if ‖u_{i+1}(x) − u_i(x)‖ ≤ ε for all x then
6: break
7: end if
8: Set i ← i + 1.
9: end while

As proposed in [13,14], consider approximating the solutions to (3) and (4) by ADP instead of obtaining them exactly. For this purpose, choose an admissible feedback control u_0 : R^n → R^m for (1) and let {V_i(x)}_{i≥0} and {u_{i+1}(x)}_{i≥0} be the sequences of functions generated by PI in Algorithm 1 starting with the control u_0(x). Following [13,14], choose a bounded time-varying exploration signal η : R → R^m, and apply the sum u_0(x) + η(t) to (1) as follows:

$$\dot{x} = f(x) + g(x)\,\big(u_0(x) + \eta(t)\big). \qquad (5)$$

Assume that solutions to (5) are well defined for all positive time. Let T(x, u_0, η, [r, s]) = {(x(t), u_0(x(t)), η(t)) | r ≤ t ≤ s} denote the trajectory x(t) of the system (5) with the input u_0 + η over the time interval [r, s] with 0 ≤ r < s. The system (5) can be rewritten as

$$\dot{x} = f(x) + g(x)u_i(x) + g(x)\,\big(u_0(x) - u_i(x) + \eta(t)\big), \qquad (6)$$

where u_i is the ith policy generated by PI. Combined with (3) and (4), the time derivative of V_i(x) along the trajectory x(t) of (6) is obtained as

$$\dot{V}_i(x) = -q(x) - u_i(x)^T R\,u_i(x) - 2\,u_{i+1}(x)^T R\,\big(u_0(x) - u_i(x) + \eta(t)\big) \qquad (7)$$

for i ≥ 0. By integrating both sides of (7) over any time interval [r, s] with 0 ≤ r < s, one gets

$$V_i(x(s)) - V_i(x(r)) = -\int_r^s \big[ q(x) + u_i^T R\,u_i + 2\,u_{i+1}^T R\,(u_0 - u_i + \eta) \big]\,dt. \qquad (8)$$

Let φ_j : R^n → R and ϕ_j : R^n → R^m, with j = 1, 2, . . ., be two infinite sequences of continuous basis functions on a compact set in R^n containing the origin as an interior point that vanish at the origin [13,14]. Then, V_i(x) and u_{i+1}(x) for each i ≥ 0 can be expressed as infinite series of the basis functions. For each i ≥ 0, let V̂_i(x) and û_{i+1}(x) be approximations of V_i(x) and u_{i+1}(x) given by

$$\hat{V}_i(x) = \sum_{j=1}^{N_1} c_{i,j}\,\varphi_j(x), \qquad (9)$$

$$\hat{u}_{i+1}(x) = \sum_{j=1}^{N_2} w_{i,j}\,\phi_j(x), \qquad (10)$$

where N_1 > 0 and N_2 > 0 are integers and c_{i,j}, w_{i,j} ∈ R are coefficients to be found for each i ≥ 0. Then, Equation (8) is approximated by V̂_i(x) and û_{i+1}(x) as follows:

$$\hat{V}_i(x(s)) - \hat{V}_i(x(r)) \approx -\int_r^s \big[ q(x) + \hat{u}_i^T R\,\hat{u}_i + 2\,\hat{u}_{i+1}^T R\,(u_0 - \hat{u}_i + \eta) \big]\,dt. \qquad (11)$$

Suppose that we have K trajectories T(x, u_0, η, [r_k, s_k]) available, k = 1, . . . , K, where x(t), u_0(t), and η(t) satisfy (6) over the K time intervals [r_k, s_k], k = 1, . . . , K.
Then, we have K equations of the form (11) for each i ≥ 0, which can be written as

$$e_{i,k} \approx 0, \quad k = 1, \ldots, K, \qquad (13)$$

where

$$e_{i,k} := \hat{V}_i(x(s_k)) - \hat{V}_i(x(r_k)) + \int_{r_k}^{s_k} \big[ q(x) + \hat{u}_i^T R\,\hat{u}_i + 2\,\hat{u}_{i+1}^T R\,(u_0 - \hat{u}_i + \eta) \big]\,dt.$$

Then, the coefficients {c_{i,j}}_{j=1}^{N_1} and {w_{i,j}}_{j=1}^{N_2} are obtained by minimizing

$$\sum_{k=1}^{K} e_{i,k}^2.$$

In other words, the K equations in (13) are solved in the least squares sense for the coefficients. According to ([14], Cor. 3.2.4), for any arbitrary ε > 0, there exist integers i* > 0, N_1* > 0 and N_2* > 0, such that the approximations V̂_{i*} and û_{i*+1} generated from (11) with N_1 ≥ N_1* and N_2 ≥ N_2* are within ε of the optimal cost function V* and the optimal control u* on the compact set. Remark 1. The ADP algorithm relies only on the measurements of states, the initial control policy and the exploration signal, lifting the requirement of knowing the precise system model, while the conventional policy iteration algorithm in Algorithm 1 requires the knowledge of the exact system model. Hence, the ADP algorithm is entirely data-based and model-free.

Remark 2. Equation (11) remains valid when each trajectory is generated with its own exploration signal and its own initial state, with (11) evaluated along the corresponding trajectory for r_k ≤ t ≤ s_k; in that case one still obtains K equations of the form (11) for each i ≥ 0, which can be written in the same least squares form. For the sake of simplicity of presentation, however, in this paper we will fix η and the initial states and vary only the time intervals to generate trajectory data.
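The data-driven iteration can be illustrated end to end on a small example. The sketch below (illustrative only: a hypothetical scalar plant, one monomial basis function per unknown, and a hand-picked exploration signal; it is not ADPT code) simulates the system under u_0 + η, assembles one equation of the form (11) per time interval, and solves them in the least squares sense at each iteration. The plant parameters are used only to generate data, never inside the iteration:

```python
import numpy as np

# Hypothetical scalar plant xdot = a*x + b*u, cost integrand q*x^2 + r*u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
k0 = -2.0                                             # admissible u0(x) = k0*x
eta = lambda t: 0.5*np.sin(5*t) + 0.3*np.sin(7*t)     # exploration signal

# Collect one long trajectory of xdot = a*x + b*(u0(x) + eta(t)) with RK4.
dt, T, K = 1e-3, 5.0, 10                              # K intervals [r_k, s_k]
steps = int(T/dt); seg = steps // K
f = lambda x, t: a*x + b*(k0*x + eta(t))
xs = np.empty(steps + 1); xs[0] = 1.0
for n in range(steps):
    t, x = n*dt, xs[n]
    k1 = f(x, t); k2 = f(x + 0.5*dt*k1, t + 0.5*dt)
    k3 = f(x + 0.5*dt*k2, t + 0.5*dt); k4 = f(x + dt*k3, t + dt)
    xs[n + 1] = x + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0
ts = np.linspace(0.0, T, steps + 1)

trap = lambda y: float(np.sum(0.5*(y[1:] + y[:-1]))*dt)   # uniform trapezoid
rows = []
for m in range(K):
    i0, i1 = m*seg, (m + 1)*seg
    xseg = xs[i0:i1 + 1]
    rows.append((xs[i1]**2 - xs[i0]**2,                # Vhat basis difference
                 trap(xseg**2),                        # integral of x^2
                 trap(xseg*eta(ts[i0:i1 + 1]))))       # integral of x*eta

# Model-free iteration: unknowns (c_i, w_i) in Vhat_i = c_i*x^2,
# uhat_{i+1} = w_i*x; each interval yields one equation of the form (11).
k = k0
for _ in range(10):
    A = np.array([[d2, 2*r*((k0 - k)*I2 + Ie)] for d2, I2, Ie in rows])
    rhs = np.array([-(q + r*k*k)*I2 for _, I2, _ in rows])
    c, w = np.linalg.lstsq(A, rhs, rcond=None)[0]
    k = w                                              # policy improvement
print(c, k)   # approach 1 + sqrt(2) and -(1 + sqrt(2)), the LQR optimum
```

With exact integrals the least squares solution reproduces the exact policy iteration sequence; the small residual error here comes only from the numerical quadrature.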

Implementation Details and Software Features
We now discuss implementation details and features of the adaptive dynamic programming toolbox (ADPT). We provide two modes to generate approximate optimal feedback controls; one mode requires knowledge of the system model, while the other eliminates this requirement, giving rise to the ADPT's unique capability of handling model-free cases.

Implementation of Computational Adaptive Dynamic Programming
To approximate V_i(x) and u_{i+1}(x) in (3) and (4), monomials composed of state variables are selected as basis functions. For a pre-fixed number d ≥ 1, define a column vector Φ_d(x) by ordering all monomials of degree 1 up to d in graded reverse lexicographic order [21] as

$$\Phi_d(x) = [\,x_1, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_n^d\,]^T \in \mathbb{R}^N,$$

where x = (x_1, x_2, . . . , x_n) ∈ R^n is the state, d ≥ 1 is the highest degree of the monomials, and N is given by

$$N = \binom{n+d}{d} - 1.$$

For example, if n = 3 and d = 3, the corresponding ordered monomials are x_1, x_2, x_3, x_1^2, x_1 x_2, x_2^2, x_1 x_3, x_2 x_3, x_3^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^2 x_3, x_1 x_2 x_3, x_2^2 x_3, x_1 x_3^2, x_2 x_3^2, x_3^3. According to (9) and (10), the cost function V_i(x) and the control u_{i+1}(x) are approximated as

$$\hat{V}_i(x) = c_i\,\Phi_{d+1}(x), \qquad \hat{u}_{i+1}(x) = W_i\,\Phi_d(x),$$

where d ≥ 1 is the approximation degree, and c_i ∈ R^{1×N_1} and W_i ∈ R^{m×N_2} are composed of coefficients corresponding to the monomials in Φ_{d+1}(x) and Φ_d(x), with

$$N_1 = \binom{n+d+1}{d+1} - 1, \qquad N_2 = \binom{n+d}{d} - 1.$$

We take the highest degree of monomials to approximate V_i greater by one than the approximation degree, since u_{i+1} is obtained by taking the gradient of V_i in (4) and g(x) is constant in most cases.
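The graded reverse lexicographic ordering and the count N can be reproduced in a few lines (an illustrative sketch; the helper names are ours, not the ADPT's):

```python
from itertools import product
from math import comb

def monomials_grevlex(n, d):
    """Exponent vectors of all monomials of degree 1..d in n variables,
    listed by increasing degree, grevlex within each degree."""
    out = []
    for deg in range(1, d + 1):
        exps = [e for e in product(range(deg + 1), repeat=n) if sum(e) == deg]
        # grevlex (descending) == ascending lexicographic order of reversed tuple
        exps.sort(key=lambda e: tuple(reversed(e)))
        out.extend(exps)
    return out

def pretty(e):
    return "*".join(f"x{i+1}^{p}" if p > 1 else f"x{i+1}"
                    for i, p in enumerate(e) if p) or "1"

basis = monomials_grevlex(3, 3)
print(len(basis), comb(3 + 3, 3) - 1)   # both 19, matching N = C(n+d, d) - 1
print([pretty(e) for e in basis[:9]])
# ['x1', 'x2', 'x3', 'x1^2', 'x1*x2', 'x2^2', 'x1*x3', 'x2*x3', 'x3^2']
```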

Theorem 1. Let a set of trajectories be defined as

$$S_T = \{\, T(x, u_0, \eta, [r_k, s_k]) \mid k = 1, \ldots, K \,\}.$$

Then the coefficients c_i and W_i satisfy, in the least squares sense,

$$A_i \begin{bmatrix} c_i^T \\ \mathrm{vec}(W_i) \end{bmatrix} = b_i,$$

where, for i = 1, 2, . . . and k = 1, 2, . . . , K, the kth rows of A_i and b_i are assembled from the data in S_T: they consist of the differences Φ_{d+1}(x(s_k)) − Φ_{d+1}(x(r_k)) and of integrals of the basis functions, the applied inputs and the exploration signal over [r_k, s_k], expressed through Euclidean inner products. Here the operator ⟨·,·⟩ denotes the Euclidean inner product with ⟨E, F⟩ = ∑_{ij} E_{ij} F_{ij} for matrices E = [E_{ij}] and F = [F_{ij}] of equal size, and the operator vec(·) is defined as

$$\mathrm{vec}(Z) = [\,z_1^T, z_2^T, \ldots, z_n^T\,]^T \in \mathbb{R}^{mn},$$

with z_j ∈ R^{m×1} being the jth column of a matrix Z ∈ R^{m×n} for j = 1, . . . , n.
We now give the computational adaptive dynamic programming algorithm in Algorithm 2 for practical implementation. To solve the least squares problem in line 5 of the algorithm, a sufficiently large number K of trajectories is needed, such that the minimization problem can be solved well numerically. The approximate optimal feedback control generated by the algorithm is û_{i+1}(x) = W_i Φ_d(x).

Algorithm 2 Computational adaptive dynamic programming
Input: An approximation degree d ≥ 1, an initial admissible control u_0(x), an exploration signal η(t), and a threshold ε > 0.
Output: The approximate optimal control û_{i+1}(x) and the approximate optimal cost function V̂_i(x).
1: Apply u = u_0 + η as the input during a sufficiently long period and collect the necessary data.
2: Set i ← 0.
3: while i ≥ 0 do
4: Generate A_i and b_i.

5: Obtain c_i and W_i by solving the minimization problem
$$\min_{c_i,\,W_i}\ \Big\| A_i \begin{bmatrix} c_i^T \\ \mathrm{vec}(W_i) \end{bmatrix} - b_i \Big\|^2.$$
6: if ‖û_{i+1}(x) − û_i(x)‖ ≤ ε for all x of interest then
7: break
8: end if
9: Set i ← i + 1.
10: end while
11: return û_{i+1}(x) = W_i Φ_d(x) and V̂_i(x) = c_i Φ_{d+1}(x).

Remark 4.
In Theorem 1, the Kronecker product that is used in [11,14] for practical implementation is replaced by the Euclidean inner product. Notice that the matrix $\int_{r_k}^{s_k} \gamma(x)\,dt \in \mathbb{R}^{N_2 \times N_2}$ is symmetric for k = 1, . . . , K. Thus, only the upper triangular elements of these matrices need to be stored, whereas with the Kronecker product all of their elements have to be saved. As a result, less memory of the processor is occupied under Theorem 1, especially when the number of basis functions representing the approximate optimal control is large.
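The memory argument of Remark 4 is easy to quantify. The sketch below (a stand-alone illustration with a random symmetric stand-in matrix; the variable names are ours, not the ADPT's) packs and unpacks the upper triangle for the satellite-sized case n = 7, d = 3:

```python
import numpy as np
from math import comb

# The matrices of the form "integral of gamma(x) dt" are N2 x N2 and symmetric,
# so only the upper triangle needs storing. For n = 7 states and degree d = 3:
n, d = 7, 3
N2 = comb(n + d, d) - 1            # number of monomials of degree 1..d: 119

M = np.random.rand(N2, N2)
M = (M + M.T) / 2                  # symmetric stand-in for an integral matrix

iu = np.triu_indices(N2)
packed = M[iu]                     # N2*(N2+1)/2 = 7140 stored numbers ...
full = N2 * N2                     # ... versus 14161 with the full matrix

# Unpack on demand: mirror the triangle, subtracting the doubled diagonal.
Mrec = np.zeros((N2, N2))
Mrec[iu] = packed
Mrec = Mrec + Mrec.T - np.diag(np.diag(Mrec))
print(packed.size, full)           # roughly a factor-of-two saving per matrix
```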

Remark 5.
In the situation where the system dynamic equations are known, the ADPT uses the Runge-Kutta method to simultaneously compute the trajectory points x(r_k) and x(s_k) and the integral terms that appear in A_i and b_i. In the case when the system equations are not known but trajectory data are available, the ADPT applies the trapezoidal method to evaluate these integrals numerically. In this case, each trajectory T(x, u_0, η, [r_k, s_k]) is represented by a set of its sample points {(x(t_{k,ℓ}), u_0(t_{k,ℓ}), η(t_{k,ℓ}))}_{ℓ=1}^{L_k}, where {t_{k,ℓ}}_{ℓ=1}^{L_k} is a finite sequence that satisfies r_k = t_{k,1} < t_{k,2} < . . . < t_{k,L_k−1} < t_{k,L_k} = s_k, and the trapezoidal method is applied on these sample points to numerically evaluate the integrals over the time interval [r_k, s_k]. If intermediate points in the interval [r_k, s_k] are not available, so that partitioning the interval [r_k, s_k] is impossible, then we use the two end points r_k and s_k to evaluate the integral by the trapezoidal method as

$$\int_{r_k}^{s_k} h(t)\,dt \approx \frac{s_k - r_k}{2}\,\big[\,h(r_k) + h(s_k)\,\big] \qquad (21)$$

for a function h(t).
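The two integration regimes of Remark 5 can be contrasted on a toy integrand (h(t) = t² on [0, 1] is a made-up example with exact value 1/3; the helper name is ours):

```python
import numpy as np

def trap(ts, hs):
    """Trapezoidal rule on (possibly non-uniform) sample times t_{k,1..L_k}."""
    ts, hs = np.asarray(ts, float), np.asarray(hs, float)
    return float(np.sum(0.5 * (hs[1:] + hs[:-1]) * np.diff(ts)))

ts = np.linspace(0.0, 1.0, 101)          # many intermediate sample points
dense = trap(ts, ts**2)                  # close to the exact value 1/3
two_pt = trap([0.0, 1.0], [0.0, 1.0])    # end points only, as in (21): gives 0.5
print(dense, two_pt)
```

With intermediate samples the quadrature error shrinks quadratically in the step size, which is why the ADPT partitions [r_k, s_k] whenever sample points are available.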

Symbolic Expressions
It is of great importance for an optimal control package that the user can describe functions, such as system equations and cost functions, in a convenient manner. The ADPT uses symbolic expressions for this purpose. Consider an optimal control problem where the system model is in the form (1), with the state x = (x_1, x_2) ∈ R^2, the control u ∈ R, and system parameters k_1, k_2, k_3, k_4 ∈ R, and where the cost function is in the form (2). Then, in the ADPT, the system dynamics and the cost function can be defined as in lines 1-17 of Listing A1 provided in the Appendix A.

Working Modes
Two working modes are provided in the ADPT: the model-based mode and the model-free mode. The model-based mode deals with the situation where the system model is given, while the model-free mode addresses the situation where the system model is not known but only trajectory data are available. An example of the model-based mode is given in Listing A1, where, after defining the system model (22), the cost function (23) and the approximation degree d in lines 1-20, the function adpModelBased returns the coefficients W_i and c_i for the control û_{i+1} and the cost function V̂_i, respectively, in line 21.
An example of the model-free mode is shown in Listing A2 in the Appendix A, where the system model (22) is treated as unknown and the controller is computed from trajectory data. In both the model-based and model-free modes, the approximate control is saved in the file uAdp.m, which is generated automatically and can be applied by calling u=uAdp(x) without dependence on other files. Similarly, the user may check the approximate cost through the file VAdp.m.

Options
Multiple options are provided such that the user may customize optimal control problems in a convenient way. We here illustrate the usage of some of the options, referring the reader for the other options to the user manual available at https://github.com/Everglow0214/The_Adaptive_Dynamic_Programming_Toolbox, accessed on 10 August 2021.
In the model-based mode, the user may set option values through the function, adpSetModelBased, in a name-value manner before calling adpModelBased. That is, the specified values may be assigned to the named options. An example is shown in Listing A3 in the Appendix A, where two sets of initial states, time intervals and exploration signals are specified in lines 1-9. Then, in line 15 the output of adpSetModelBased should be passed to adpModelBased for the options to take effect. Otherwise, the default values would be used for the options as in line 21 in Listing A1.
For the command adpModelFree, option values can be modified with the function adpSetModelFree in the same name-value manner. Among the options, 'stride' enables the user to record values of states, initial controls and exploration signals at a high frequency over a long time, while using only a portion of them in the iteration process inside adpModelFree. To illustrate, let each trajectory in the set S_T of trajectories in the statement of Theorem 1 be represented by two sample points at times r_k and s_k; that is, the trapezoidal method evaluates integrals over [r_k, s_k] by taking values at r_k and s_k as in (21). Suppose that the trajectories in S_T are consecutive, that is, s_k = r_{k+1} for k = 1, 2, . . . , K − 1. By setting 'stride' to a positive integer δ, the data used to generate A_i and b_i in Algorithm 2 become {T(x, u_0, η, [r_{1+iδ}, s_{(i+1)δ}]) | i ∈ N, (i + 1)δ ≤ K}. For example, consider 3 consecutive trajectories T(x, u_0, η, [r_k, r_{k+1}]) with k = 1, 2, 3. If 'stride' is set to 1, one will have three equations from (11), one for each interval [r_k, r_{k+1}], k = 1, 2, 3, which contribute three rows to A_i and three rows to b_i as in Theorem 1. If 'stride' is set to 3, one will have only one equation from (11), over the merged interval [r_1, r_4] (Equation (24)), where the integrals over [r_1, r_4] are evaluated by the trapezoidal method with the interval [r_1, r_4] partitioned into the three sub-intervals [r_1, r_2] ∪ [r_2, r_3] ∪ [r_3, r_4], i.e., with the points at r_1, r_2, r_3, and r_4. Equation (24) contributes one row to A_i and one row to b_i as in Theorem 1. Provided that A_i still has full rank with 'stride' set to 3, the number of equations in the minimization problem in Algorithm 2 is two thirds less than with 'stride' set to 1, and as a result, the computational load of the numerical minimization is reduced. It is remarked that with 'stride' equal to 3, all four points r_1, . . . , r_4 are used by the trapezoidal method to evaluate the integrals over the interval [r_1, r_4] in (24), producing a more precise value of the integral than would be obtained with the two end points r_1 and r_4 only. An example of calling adpSetModelFree is shown in Listing A4 in the Appendix A. Similarly, adpModelFree takes the output of adpSetModelFree as an argument for the specified options to take effect.
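The effect of 'stride' on the trapezoidal integrals can be verified directly: merging three consecutive intervals keeps all interior sample points, so the merged integral equals the sum of the per-interval ones while contributing a single equation (a stand-alone sketch with made-up sample times, not ADPT code):

```python
import numpy as np

def trap(t, h):
    """Trapezoidal rule over the sample points (t, h(t))."""
    return float(np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(t)))

ts = np.array([0.0, 0.5, 1.0, 1.5])   # r1 < r2 < r3 < r4 (hypothetical times)
hs = np.sin(ts)                        # sampled values of some integrand

# stride = 1: three equations, one integral per sub-interval (three rows).
per_interval = [trap(ts[k:k + 2], hs[k:k + 2]) for k in range(3)]
# stride = 3: one equation; its integral uses all four points (one row).
merged = trap(ts, hs)
print(merged, sum(per_interval))       # identical up to round-off
```

This additivity is exactly why increasing 'stride' reduces the row count of A_i without discarding any of the recorded samples.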

Applications to the Satellite Attitude Stabilizing Problem
In this section, we apply the ADPT to the satellite attitude stabilizing problem, since a stabilization problem can be formulated as an optimal control problem. In the first example, the system model is known and the controller is computed by the function adpModelBased. The same problem is solved again in the second example by the function adpModelFree, where the system dynamics are treated as unknown. The source codes for these two examples are available at https://github.com/Everglow0214/The_Adaptive_Dynamic_Programming_Toolbox (accessed on 10 August 2021), where more applications of the toolbox can be found.

Model-Based Case
Let H denote the set of quaternions and S^3 = {q ∈ H | ‖q‖ = 1}. The equations of motion of the continuous-time fully actuated satellite system are given by

$$\dot{q} = \tfrac{1}{2}\,q\Omega, \qquad (25)$$

$$I\,\dot{\Omega} = (I\Omega) \times \Omega + u, \qquad (26)$$

where q ∈ S^3 represents the attitude of the satellite, Ω ∈ R^3 is the body angular velocity vector, I ∈ R^{3×3} is the moment of inertia matrix and u ∈ R^3 is the control input. The quaternion multiplication is carried out for qΩ on the right-hand side of (25), where Ω is treated as a pure quaternion. By the stable embedding technique [22], the system (25) and (26) defined on S^3 × R^3 is extended to the Euclidean space H × R^3 [23,24] as

$$\dot{q} = \tfrac{1}{2}\,q\Omega + \alpha\,(1 - \|q\|^2)\,q, \qquad (27)$$

$$I\,\dot{\Omega} = (I\Omega) \times \Omega + u, \qquad (28)$$

where q ∈ H, Ω ∈ R^3 and α > 0. Consider the problem of stabilizing the system (27) and (28) at the equilibrium point (q_e, Ω_e) = ((1, 0, 0, 0), (0, 0, 0)). The error dynamics are obtained from (27) and (28) by substituting q = e_q + q_e and Ω = e_Ω + Ω_e, where e_q = q − q_e and e_Ω = Ω − Ω_e are the state errors. Since the problem of designing a stabilizing controller can be solved by designing an optimal controller, we pose an optimal control problem with the cost integral (2) with q(x) = x^T Qx, where x = (e_q, e_Ω) ∈ R^7 and Q = 2I_{7×7}, and R = I_{3×3}. The inertia matrix I is set to I = diag(0.1029, 0.1263, 0.0292). The parameter α that appears in the above error dynamics is set to α = 1. We set the option 'xInit' with three different initial states. For each initial state, the option 'tSpan' is set to [0,15]. We use the option 'explSymb' to set exploration signals; for the usage of the option 'explSymb', refer to the user manual available at https://github.com/Everglow0214/The_Adaptive_Dynamic_Programming_Toolbox (accessed on 10 August 2021). For the initial control u_0, the default initial control is used, which is an LQR controller computed for the linearization of the error dynamics around the origin with the weight matrices Q = 2I_{7×7} and R = I_{3×3}. We then call the function adpModelBased to generate controllers of degree d = 1, 2, 3.
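The dynamics (25)-(28) can be sanity-checked in simulation. The sketch below (illustrative only: u = 0, a made-up off-sphere initial state, and our own helper names; it is not ADPT code) integrates the embedded system with RK4 and shows the stable-embedding term α(1 − ‖q‖²)q pulling ‖q‖ back to 1:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product; a quaternion is stored as (scalar, vector) in R^4."""
    p0, pv, q0, qv = p[0], p[1:], q[0], q[1:]
    return np.concatenate(([p0*q0 - pv @ qv], p0*qv + q0*pv + np.cross(pv, qv)))

Imat = np.diag([0.1029, 0.1263, 0.0292])   # inertia matrix from the example
Iinv = np.linalg.inv(Imat)
alpha = 1.0

def dyn(s, u):
    """Embedded satellite dynamics (27)-(28): s = (q, Omega) in R^7."""
    q, W = s[:4], s[4:]
    qdot = 0.5*qmul(q, np.concatenate(([0.0], W))) + alpha*(1.0 - q @ q)*q
    Wdot = Iinv @ (np.cross(Imat @ W, W) + u)   # I*Wdot = (I*W) x W + u
    return np.concatenate((qdot, Wdot))

def rk4(s, u, dt):
    k1 = dyn(s, u); k2 = dyn(s + 0.5*dt*k1, u)
    k3 = dyn(s + 0.5*dt*k2, u); k4 = dyn(s + dt*k3, u)
    return s + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

s = np.array([1.2, 0.0, 0.0, 0.0, 0.1, -0.2, 0.05])   # ||q|| = 1.2 initially
for _ in range(2000):                                  # 10 s with dt = 0.005
    s = rk4(s, np.zeros(3), 0.005)
print(np.linalg.norm(s[:4]))   # close to 1: the embedding attracts S^3
```

Since the rotational part of (27) preserves ‖q‖, the norm dynamics reduce to d‖q‖²/dt = 2α(1 − ‖q‖²)‖q‖², which is why the unit sphere is attracting for α > 0.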
The computation time taken by the function adpModelBased to produce the controllers is recorded in Table 1. For the purpose of comparison, we also apply Al'brekht's method with the non-linear systems toolbox (NST) [16] to produce controllers of degree d = 1, 2, 3 for the same optimal control problem, and record their respective computation times in Table 1. For comparison in terms of optimality, we apply the controllers to the system (27) and (28) for the initial error state x_0 = ((cos(θ/2) − 1, sin(θ/2), 0, 0), (0, 0, 0)) with θ = 1.99999π and record the corresponding values of the cost integral in Table 1. Since we do not know the exact optimal value of the cost integral J(x_0, u) for this initial state, we employ the software package ACADO [18] to numerically produce the optimal control for this optimal control problem with the given initial state. We note that both NST and ACADO are model-based.
We can see in Table 1 that ADPT in the model-based mode is superior to NST in terms of optimality, and ADPT (model-based) for d = 2, 3 is on par with ACADO in terms of optimality. Notice however that ACADO produces an open-loop optimal control for each given initial state, which is a drawback of ACADO, while ADPT produces a feedback optimal control that is independent of initial states. Moreover, even for the given initial state ACADO takes a tremendous amount of time to compute the open-loop optimal controller. From these observations, we can say that ADPT in the model-based mode is superior to NST and ACADO in terms of optimality, speed, and usefulness all taken into account.

Model-Free Case
Consider solving the same optimal control problem as in Section 4.1, but now the system dynamics in (25) and (26), or equivalently the error dynamics, are not available. Since we do not have real trajectory data available, for the purpose of demonstration we generate trajectories with four initial states for the error dynamics, using the same initial control u_0 and exploration signals η as in the model-based case in Section 4.1. The simulation for data collection is run over the time interval [0, 20] with a recording period of 0.002 s, producing 10,000 = 20/0.002 sampled points for each run. For the function adpModelFree, the option 'stride' is set to 4. Then, the function adpModelFree is called to generate controllers of degree d = 1, 2, 3, and the computation time taken for each is recorded in Table 1. For the purpose of comparison in terms of optimality, we apply the controllers generated by adpModelFree to the system (27) and (28) with the initial error state x_0 = ((cos(θ/2) − 1, sin(θ/2), 0, 0), (0, 0, 0)) with θ = 1.99999π and compute the corresponding values of the cost integral; see Table 1 for the values.
From Table 1, we can see that ADPT in the model-free mode takes more computation time than ADPT in the model-based mode, and the cost integrals by ADPT in the model-free working mode are slightly higher than those in the model-based working mode, since the integrals in the iteration process are evaluated less accurately. However, ADPT in the model-free mode is superior to NST in terms of optimality and to ACADO in terms of computation time. More importantly, the result of model-free ADPT is comparable to that of model-based ADPT, which shows the power of data-based adaptive dynamic programming and of the ADP toolbox.
To see how the computed optimal controller works in terms of stabilization, the norm of the state error under the control with d = 3 generated by ADPT in the model-free mode is plotted in Figure 1 together with the norm of state error by the NST controller with degree 3. We can see that the convergence to the origin is faster with the model-free ADP controller than with the controller by NST that is model-based. This comparison result is consistent with the comparison of the two in terms of optimality.

Discussion
To compare with other toolboxes on ADP or RL, we investigate the MATLAB reinforcement learning toolbox with the same control problem. Equations (27) and (28) are discretized using the 4th-order Runge-Kutta method to construct the environment in the reinforcement learning toolbox. The integrand in (2) is taken as the reward function. The deep deterministic policy gradient (DDPG) algorithm [25] is selected to train the RL agent since the control input in (26) is continuous. However, it is found in simulations that the parameters of the agent generally diverge even after a long training time, and the system cannot be stabilized. A probable reason is that, since the user specifies only parameters of a normally distributed exploration signal, such as its mean and standard deviation, rather than choosing an exploration signal of a specific form, the system states may go to infinity in some episodes. Although one may stop an episode early in such a situation, the experiences saved in the replay buffer may be detrimental to the training. On the other hand, the options provided by the ADPT allow the user to determine what kind of trajectories are to be used, so that the optimal feedback control may be found quickly.

Conclusions and Future Work
The adaptive dynamic programming toolbox, a MATLAB-based package for optimal control of continuous-time control-affine systems, has been presented. Employing the adaptive dynamic programming technique, we propose a computational methodology to approximately produce the optimal control and the optimal cost function, where the Kronecker product used in the previous literature is replaced by the Euclidean inner product for lower memory consumption at runtime. The ADPT can work in the model-based mode or in the model-free mode. The model-based mode deals with the situation where the system model is given, while the model-free mode handles the situation where the system dynamics are unknown but system trajectory data are available. Multiple options are provided, such that the ADPT can be easily customized. The optimality, the running speed, and the utility of the ADPT are illustrated with a satellite attitude stabilizing problem.
Currently control policies and cost functions are approximated by polynomials in the ADPT. As mathematical principles of neural networks are being revealed [26,27], we plan to use deep neural networks in addition to polynomials in the ADPT to approximately represent optimal controls and optimal cost functions to provide users of the ADPT more options.