In this section, we present the CC framework based on the gPC approximation within the OCP, which approximates the probability of not being in the failure set, i.e.,
${\mathbb{P}}_{OCP}({\mathbf{y}}_{i}^{\left(j\right)}\notin \mathcal{F}),\forall i$. In
Section 3.1, the general formulation of CCs in the deterministic OCP is introduced. Afterwards,
Section 3.2 introduces a differentiable approximation of the sharp CCs and a homotopy strategy to iteratively sharpen this differentiable CC representation. The SubSim method and its incorporation within CC–OC to calculate rare-event failure probabilities are described in
Section 3.3.
3.1. Derivation of Chance Constraint Formulation
Sampling techniques such as the Metropolis–Hastings algorithm (MHA) [
21] or importance sampling [
22] are frequently used to approximate the probability of an event (in this case: fulfilling a CC) when its pdf is difficult to sample from or integrate. A drawback of these methods is a nondeterministic evaluation procedure of the probability. Generally, this study still tries to apply samplingbased algorithms to estimate the probability
${\mathbb{P}}_{OCP}({\mathbf{y}}_{i}\notin \mathcal{F})$ in the OCP (Equation (6)). Additionally, rare events should be covered, which makes SubSim a viable choice [
12]. The basic SubSim method uses a modified Metropolis–Hastings algorithm (MMHA), i.e., random sampling, to explore the failure region and calculate the failure probability. A further issue arises when using direct methods to solve OCPs with Newton-type NLP solvers: the samples cannot be redrawn in each iteration of the NLP solution process, as this would yield a stochastic Newton-type optimization procedure. In the context of sampling techniques such as SubSim, redrawing would generally be necessary, which ultimately creates problems in defining accurate step sizes and exit criteria in the NLP. Thus, this study uses a homotopy strategy that copes with these issues by moving the creation of new samples from the NLP iteration to a homotopy step.
In order to apply the mentioned sampling techniques, we need a good approximation of the probabilistic quantity, i.e., the quantity with respect to which the CC is defined, depending on the stochastic disturbance. When applying gPC, the gPC expansion in Equation (3) provides this approximation. Thus, whenever the expansion coefficients are available within the NLP, as e.g., in Equation (6) (recall that Equations (2), (3) and (5) can be applied to calculate the expansion coefficients for any output quantity based on the known physical trajectories at the SC nodes for the states used in Equation (6)), we can sample the gPC expansion for thousands of samples via a matrix-vector operation in an MCA-type way, but with improved efficiency due to the simple evaluation. Consider
${n}_{s}$ random samples obtained from the pdf of
$\mathit{\theta}$, labeled
${\mathit{\theta}}^{\left(1\right)},\cdots ,{\mathit{\theta}}^{\left({n}_{s}\right)}$. It should be noted that these samples can now be drawn randomly, in contrast to the SC method, as we are not approximating the integral in Equation (4) but the probability of the CC. These samples of the uncertain parameters yield corresponding samples of the output
$\mathbf{y}$, given by:
such that the output samples are provided by a simple matrix-vector multiplication operating on the expansion coefficients
$\widehat{\mathbf{y}}$, which are part of the OCP formulation due to Equations (2), (3) and (5). With the samples available from Equation (10), the general condition for fulfilling a CC, i.e., not being in the failure set, is given as follows:
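As an illustration of this matrix-vector sampling step (Equation (10)), a one-dimensional sketch might look as follows; the Hermite basis, expansion order, and coefficient values are illustrative assumptions, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def hermite_basis(theta, order):
    """Probabilists' Hermite polynomials He_0..He_order evaluated at theta."""
    Phi = np.empty((theta.size, order + 1))
    Phi[:, 0] = 1.0
    if order >= 1:
        Phi[:, 1] = theta
    for k in range(2, order + 1):
        # Recurrence: He_k(x) = x He_{k-1}(x) - (k-1) He_{k-2}(x)
        Phi[:, k] = theta * Phi[:, k - 1] - (k - 1) * Phi[:, k - 2]
    return Phi

y_hat = np.array([1.0, 0.5, 0.1])            # expansion coefficients (assumed)
theta = rng.standard_normal(10_000)          # n_s random samples of Gaussian theta
y_samples = hermite_basis(theta, 2) @ y_hat  # one matrix-vector product
```

The key point is that, once the coefficients are known, evaluating thousands of output samples costs only one dense matrix-vector product.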
Here,
$\mathcal{I}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\right)$ is the indicator function, defined as:
It should be noted that the indicator function
$\mathcal{I}$ is trivial to evaluate but non-differentiable, and can therefore create difficulties when used in the context of a Newton-type NLP solver. Thus, we introduce a smooth approximation
s of the indicator function with the following properties:
3.2. Approximation of Chance Constraints by Sigmoids
A class of functions that can approximate the indicator function while fulfilling the conditions given in Equation (13) is the class of logistic functions. An example of this class is the sigmoid function, which is defined as follows for a scalar output
y:
The parameters
$a\in \mathbb{R}$ and
$b\in \mathbb{R}$ are the scaling and offset parameters of the sigmoid, respectively. These are used to shape the sigmoid in order to suitably approximate the desired CC domain. Their design, using a homotopy strategy while solving the CC–OC problem, is illustrated in Algorithm 1.
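A minimal sketch of such a sigmoid, assuming the standard logistic parameterization with scaling $a$ and offset $b$ (the exact form of Equation (14) is not reproduced in this excerpt):

```python
import numpy as np

def sigmoid(y, a, b):
    # Assumed logistic form: s(y; a, b) = 1 / (1 + exp(-a (y - b))).
    # Larger a sharpens the transition at y = b toward a step function,
    # i.e., toward the sharp indicator.
    return 1.0 / (1.0 + np.exp(-a * (y - b)))
```

For growing $a$, this function approaches the one-sided indicator of $y > b$, which is exactly the approximation property exploited by the homotopy strategy in Algorithm 1.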
Algorithm 1 Implemented homotopy strategy for the sigmoid scaling and offset parameters in the CC–OC framework.

Require: Define the homotopy factor ${a}_{hom}$ and the desired final sigmoid parameter ${a}_{desired}$.
1: Initialize the sigmoid parameter $a$ and the confidence level $CL$.
2: Define the bound value of the sigmoid ${y}_{bound}$ (i.e., the bound value of the CC).
3: while $a<{a}_{desired}$ do
4: Calculate the sigmoid values:
5: $c=\frac{\mathrm{ln}\left(\frac{1}{CL}-1\right)}{a}$
6: $b={y}_{bound}-c$
7: Solve the CC–OC problem including SubSim in Algorithm 4.
8: Increase $a$ by the homotopy factor: $a={a}_{hom}\cdot a$.
9: end while
10: return Robust optimal trajectory.

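Under the same assumed logistic form $s(y;a,b)=1/(1+e^{-a(y-b)})$, the parameter schedule of Algorithm 1 can be sketched as follows; the restored minus signs in the updates for $c$ and $b$ are an assumption about the partly garbled formulas, and the inner CC–OC solve is replaced by simply recording the parameters:

```python
import math

def homotopy_schedule(a, a_desired, a_hom, CL, y_bound):
    """Sketch of Algorithm 1's parameter loop (the CC-OC solve is stubbed
    out): sharpen the sigmoid scaling a by the homotopy factor a_hom until
    a_desired is reached, recomputing the offset b each time."""
    params = []
    while a < a_desired:
        c = math.log(1.0 / CL - 1.0) / a  # ties the sigmoid value at the bound to CL
        b = y_bound - c                   # offset placing the transition near y_bound
        params.append((a, b))             # here: solve the CC-OC problem
        a *= a_hom                        # sharpen the sigmoid
    return params
```

Each homotopy step therefore hands the NLP a slightly sharper, but still smooth, CC approximation, which keeps the Newton-type solver well behaved.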
Furthermore, the sigmoid in Equation (14) has a very simple derivative that can be used to efficiently calculate the gradient that is necessary for OC. It is given as follows:
The sigmoid in Equation (14) can be combined by multiplication in order to approximate the indicator function for
$\mathcal{F}$ being an interval in
$\mathbb{R}$. This is depicted in
Figure 1, which shows the multiplication of two sigmoids (solid blue), one with a gradual descent (dashed green; number 1) and one with a steep ascent (dashed red; number 2), to approximate a box constraint on a scalar output. Here, one sigmoid
$s\left(y;{a}_{lb},{b}_{lb}\right)$ describes the lower bound, while the other sigmoid
$s\left(y;{a}_{ub},{b}_{ub}\right)$ describes the upper bound.
For the sake of simplicity, we further assume box constraints on all our CCs, i.e., we assume that $\mathcal{F}$ is a hyperrectangle with lower and upper bounds $l{b}_{i}$, $u{b}_{i}$ in each dimension $i=1,\dots ,{n}_{y}$. This is a viable assumption for most OCP applications, as box constraints are very prominent in OC. The proposed approach can be trivially extended to any set $\mathcal{F}$ that can be described by a set of smooth inequality constraints.
We can then form the hyperrectangle by applying the basic sigmoid in Equation (14) as follows:
Here,
$\mathbf{a}={\left[{a}_{l{b}_{1}},{a}_{u{b}_{1}},\cdots ,{a}_{l{b}_{{n}_{y}}},{a}_{u{b}_{{n}_{y}}}\right]}^{T}$ and
$\mathbf{b}={\left[{b}_{l{b}_{1}},{b}_{u{b}_{1}},\cdots ,{b}_{l{b}_{{n}_{y}}},{b}_{u{b}_{{n}_{y}}}\right]}^{T}$ are simplifying notations. Note that the derivative of Equation (16) can be formed using the chain rule and Equation (15).
In order to calculate the probability, we simply sum the function values of the multidimensional sigmoid in Equation (16) over all samples and divide by the number of samples as follows:
This approximation can now be used within the OCP (Equation (6)). In order to include rare failure events, the next subsection introduces the SubSim method, which builds on the CC modeling of this subsection.
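The smooth probability estimate can be sketched for a scalar box constraint; the logistic sigmoid form, the bounds, and the Gaussian toy output are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(y, a, b):
    # Assumed logistic form with a clipped exponent for numerical safety.
    return 1.0 / (1.0 + np.exp(np.clip(-a * (y - b), -500.0, 500.0)))

# Average, over the samples, the product of an ascending sigmoid at the lower
# bound and a descending one at the upper bound: a differentiable stand-in
# for the sample-mean of the sharp indicator.
lb, ub, a = -1.0, 1.0, 80.0
y = rng.standard_normal(100_000)                 # toy output samples
smooth = sigmoid(y, a, lb) * sigmoid(y, -a, ub)
p_smooth = smooth.mean()                          # smooth, differentiable estimate
p_hard = ((y >= lb) & (y <= ub)).mean()           # sharp indicator estimate
```

For a sufficiently steep scaling, the smooth estimate is close to the sharp sample average while remaining differentiable with respect to the decision variables that shape the output samples.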
3.3. Subset Simulation in ChanceConstrained Optimal Control
The probability approximation in Equations (11) and (17) converges for reasonably low choices of
${n}_{s}$ only if rather loose bounds on the probability (e.g., domain of
$\eta =99\%$) are considered. For tighter bounds typically used for rare events, as often required in e.g., reliability engineering (where
$\eta =99.9999\%$ is common), better-suited algorithms to calculate and sample the probability are required. Indeed, a reliable estimation of the probability of rare events normally requires a very large number of samples. A classical approach to circumvent this difficulty is the use of SubSim, which is tailored to evaluate the probability of rare events [
11,
12,
23].
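To see why plain sampling is hopeless here, recall the standard result (not from this text) that an MCA estimate of a probability $p$ from $n$ independent samples has a coefficient of variation of $\sqrt{(1-p)/(np)}$; for rare events this forces enormous $n$:

```python
import math

def mc_cov(p, n):
    # Coefficient of variation of the plain Monte Carlo estimator of a
    # probability p from n independent samples (binomial-variance result).
    return math.sqrt((1.0 - p) / (n * p))
```

For $p = 10^{-6}$, even $n = 10^{8}$ samples only reach a c.o.v. of about 0.1, whereas SubSim attains comparable accuracy with a few thousand samples per level.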
SubSim methods are based on an MCMC algorithm, typically relying on an MMHA, which ensures that the failure region is properly covered by the samples. To that end, the choice of samples is stratified iteratively in order to draw significantly more samples from the failure region than classical MCA sampling would. SubSim methods are based on a factorization of the failure probability into a product of conditional probabilities:
In Equation (18),
$\mathcal{F}$ is the set of failure events and
${\mathcal{F}}_{1}\supset {\mathcal{F}}_{2}\supset \dots \supset {\mathcal{F}}_{m}=\mathcal{F}$ is a sequence of sets of events with decreasing probability of occurrence. The conditional probability
$\mathbb{P}\left({\mathcal{F}}_{i}\mid{\mathcal{F}}_{i-1}\right)$ describes the probability that an event in
${\mathcal{F}}_{i}\subset {\mathcal{F}}_{i-1}$ occurs, assuming that an event in
${\mathcal{F}}_{i-1}$ has already occurred. Thus, instead of evaluating the rare-event probability
$\mathbb{P}\left(\mathcal{F}\right)=\mathbb{P}\left({\mathcal{F}}_{m}\right)$, one can evaluate a chain of relatively likely conditional probabilities
$\mathbb{P}\left({\mathcal{F}}_{i}\mid{\mathcal{F}}_{i-1}\right)$, each of which is relatively easy to evaluate via sampling.
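As a numeric illustration of this factorization (the values are illustrative, not from the text): with $m$ levels of equal conditional probability ${p}_{0}$, the product reaches ${p}_{0}^{m}$, so each level only ever estimates a moderately rare event:

```python
# Chain of m conditional probabilities, each p0 = 0.1: the product reaches a
# 1e-6 rare-event probability while every single factor stays easy to sample.
p0, m = 0.1, 6
p_failure = 1.0
for _ in range(m):
    p_failure *= p0  # P(F) = P(F_1) * prod_i P(F_i | F_{i-1})
```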
The evaluation of the conditional probabilities is the main task in SubSim, achieved using, e.g., the MMHA approach. The MMHA, working on each component of a random vector (RVec) (i.e., a vector of one-dimensional random variables), is introduced in Algorithm 2 [
11,
12,
24].
Algorithm 2 Modified Metropolis–Hastings algorithm for subset simulation, applied to each component ${\theta}_{i}$ of the random vector to create a new sample for the random parameter (after [11]).

Require: Current sample for the random parameter: ${\theta}_{i}$.
1: Define a symmetric proposal pdf, ${\rho}_{i}^{*}\left(\tilde{\theta}\mid{\theta}_{i}\right)={\rho}_{i}^{*}\left({\theta}_{i}\mid\tilde{\theta}\right)$, centered around the current random sample ${\theta}_{i}$.
2: Generate a candidate sample $\tilde{\theta}$ from ${\rho}_{i}^{*}\left(\tilde{\theta}\mid{\theta}_{i}\right)$ and calculate the system response (e.g., by Equations (1), (3), or (6)) of this candidate.
3: Calculate the ratio between the candidate and the current sample evaluated at the target pdf, i.e., $r=\frac{\rho \left(\tilde{\theta}\right)}{\rho \left({\theta}_{i}\right)}$ with $\rho \propto N\left(0,1\right)$ (this is the stationary pdf according to the central limit theorem ([17], p. 23)).
4: Set the acceptance probability of the candidate sample $\tilde{\theta}$ as follows: $a=\mathrm{min}\{1,r\}$.
5: Draw a random sample from the uniform distribution: $s\sim \mathcal{U}\left(0,1\right)$.
6: if $s\le a$ then
7: Set: ${\theta}_{p}=\tilde{\theta}$
8: else
9: Set: ${\theta}_{p}={\theta}_{i}$
10: end if
11: Update the new sample by the rule:
12: if ${\theta}_{p}\in {\mathcal{F}}_{i}$ then
13: Set: ${\theta}_{\mathrm{new},i}={\theta}_{p}$
14: else
15: Set: ${\theta}_{\mathrm{new},i}={\theta}_{i}$
16: end if
17: return New sample for the random parameter: ${\theta}_{\mathrm{new},i}$.

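A sketch of one MMHA move for a single scalar component, assuming a standard-normal target pdf and a uniform symmetric proposal (both assumptions; the actual proposal pdf and failure sets are problem-specific):

```python
import numpy as np

rng = np.random.default_rng(2)

def mmha_step(theta_i, in_failure, width=1.0):
    """One modified Metropolis-Hastings move (after Algorithm 2) for a scalar
    component: propose locally, accept by the target-density ratio, and keep
    the move only if it stays inside the current failure subset F_i."""
    cand = theta_i + rng.uniform(-width, width)    # symmetric proposal
    r = np.exp(-0.5 * (cand**2 - theta_i**2))      # N(0,1) density ratio
    theta_p = cand if rng.uniform() <= min(1.0, r) else theta_i  # accept/reject
    return theta_p if in_failure(theta_p) else theta_i           # subset check
```

Because rejected or infeasible moves simply repeat the current sample, every iterate of the resulting chain stays inside the conditioning subset by construction.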
Normally, the MCMC algorithm based on the MMHA in Algorithm 2 converges quickly, especially since the sampling of new candidates is done locally around the current sample point. Thus, the acceptance rate is normally very high and progress is made quite fast. An issue of the MMHA is the choice of an appropriate proposal distribution
${\rho}_{i}^{*}$. Here, generally a uniform or a Gaussian pdf is chosen, in order to have a simple evaluation and the symmetry property. The general behavior of the MMHA in SubSim can be visualized as in
Figure 2: after the random sampling by MCA in level 0, the samples are shifted toward the failure domain, i.e., the arcs in the corners of the domain. This is done until a sufficient number of samples is located in the failure domain.
We detail the SubSim method as in Algorithm 3 [
11,
12]. The SubSim starts with a general MCA and afterwards successively explores the failure region, yielding the chain of conditional probabilities. It should be noted that the choice of the intermediate failure events
${\mathcal{F}}_{1,\dots ,m}$ is critical for the convergence speed of the SubSim. In [
11], the “critical demandtocapacity ratio” is introduced that is based on normalized intermediate threshold values. Based on this ratio, it is common to choose the thresholds adaptively such that the conditional probabilities are equal to a fixed, predefined value ([
12], p. 158). This is done by appropriately ordering the previous samples and their results. A commonly used and normally very efficient conditional probability value is
${p}_{0}=0.1$ [
11].
Finally, we can estimate the failure probability of the SubSim regarding the desired threshold
b and the
$(m-1)\mathrm{th}$ Markov chain element, which is the last one of the SubSim, as follows ([
12], p. 179) (see Algorithm 3 line 12):
Algorithm 3 General algorithm used for a subset simulation in connection with generalized polynomial chaos (after [12], p. 158ff).

Require: Define the number of samples per level ${n}_{s}$, the conditional probability ${p}_{0}$, and the critical threshold $b$.
1: Calculate the number of Markov chains ${n}_{c}={p}_{0}\cdot {n}_{s}$ and the number of samples ${n}_{sc}={p}_{0}^{-1}$ for each of the chains.
2: Initialize the SubSim by creating the random sample set $\{{\left(\mathit{\theta}\right)}_{k}^{\left(0\right)}:k=1,\cdots ,{n}_{s}\}$.
3: Calculate the output set $\{{\mathbf{y}}_{k}^{\left(0\right)}\left(\mathbf{z};{\left(\mathit{\theta}\right)}_{k}^{\left(0\right)}\right):k=1,\cdots ,{n}_{s}\}$ by Equation (3) related to $\{{\left(\mathit{\theta}\right)}_{k}^{\left(0\right)}:k=1,\cdots ,{n}_{s}\}$.
4: Sort $\{{\mathbf{y}}_{k}^{\left(0\right)}\left(\mathbf{z};{\left(\mathit{\theta}\right)}_{k}^{\left(0\right)}\right):k=1,\cdots ,{n}_{s}\}$ in ascending order to create $\{{\mathbf{b}}_{k}^{\left(0\right)}:k=1,\cdots ,{n}_{s}\}$. Here, ${\mathbf{b}}_{k}^{\left(0\right)}$ is an estimate of the exceedance probability $\mathbb{P}\left[\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)>\mathbf{b}\right]=\frac{{n}_{s}-k}{{n}_{s}}$.
5: Set ${\mathbf{b}}_{1}={\mathbf{b}}_{{n}_{s}-{n}_{c}}^{\left(0\right)}$ and $\{{\left(\mathit{\theta}\right)}_{j0}^{\left(1\right)}:j=1,\cdots ,{n}_{c}\}$ corresponding to $\{{\mathbf{b}}_{{n}_{s}-{n}_{c}+j}^{\left(0\right)}:j=1,\cdots ,{n}_{c}\}$ as the threshold and the seeds for the next level.
6: for $i=1,\dots ,m-1$ do
7: Use, e.g., the MMHA (Algorithm 2) to generate the samples $\{{\left(\mathit{\theta}\right)}_{jk}^{\left(i\right)}:k=1,\cdots ,{n}_{sc}\}$ of the conditional pdf ${\rho}_{i}^{*}\left(\cdot \mid{\mathcal{F}}_{i}\right)$ for each seed $\{{\left(\mathit{\theta}\right)}_{j0}^{\left(i\right)}:j=1,\cdots ,{n}_{c}\}$. This creates ${n}_{c}$ Markov chains with ${n}_{sc}$ samples each.
8: Calculate the output set $\{{\mathbf{y}}_{jk}^{\left(i\right)}\left(\mathbf{z};{\left(\mathit{\theta}\right)}_{jk}^{\left(i\right)}\right):j=1,\cdots ,{n}_{c},k=1,\cdots ,{n}_{sc}\}$ by Equation (3) related to $\{{\left(\mathit{\theta}\right)}_{jk}^{\left(i\right)}:j=1,\cdots ,{n}_{c},k=1,\cdots ,{n}_{sc}\}$.
9: Sort $\{{\mathbf{y}}_{jk}^{\left(i\right)}\left(\mathbf{z};{\left(\mathit{\theta}\right)}_{jk}^{\left(i\right)}\right):j=1,\cdots ,{n}_{c},k=1,\cdots ,{n}_{sc}\}$ in ascending order to create $\{{\mathbf{b}}_{k}^{\left(i\right)}:k=1,\cdots ,{n}_{s}\}$. Here, ${\mathbf{b}}_{k}^{\left(i\right)}$ is an estimate of the exceedance probability $\mathbb{P}\left[\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)>\mathbf{b}\right]={p}_{0}^{i}\frac{{n}_{s}-k}{{n}_{s}}$.
10: Set ${\mathbf{b}}_{i+1}={\mathbf{b}}_{{n}_{s}-{n}_{c}}^{\left(i\right)}$ and $\{{\left(\mathit{\theta}\right)}_{j0}^{(i+1)}:j=1,\cdots ,{n}_{c}\}$ corresponding to $\{{\mathbf{b}}_{{n}_{s}-{n}_{c}+j}^{\left(i\right)}:j=1,\cdots ,{n}_{c}\}$ as the threshold and the seeds for the next level.
11: end for
12: Calculate the failure probability ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ based on Equation (19).
13: return Failure probability ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$.

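A compact, self-contained sketch of Algorithm 3 for a scalar standard-normal parameter; the identity response $g(\theta)=\theta$ stands in for the analytic gPC response surface, and the proposal width, sample counts, and target event $\theta > 3$ (true probability about $1.35\times 10^{-3}$) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def subset_simulation(g, n_s=1000, p0=0.1, b_crit=3.0, max_levels=10):
    """Estimate P(g(theta) > b_crit) for theta ~ N(0,1) via subset simulation
    (sketch after Algorithm 3, with an MMHA inner loop as in Algorithm 2)."""
    n_c = int(p0 * n_s)                     # number of Markov chains (seeds)
    n_sc = int(round(1.0 / p0))             # samples per chain
    theta = rng.standard_normal(n_s)        # level 0: plain MCA
    p = 1.0
    for _ in range(max_levels):
        y = g(theta)
        order = np.argsort(y)
        b_i = y[order[n_s - n_c]]           # adaptive intermediate threshold
        if b_i >= b_crit:                   # final level reached:
            return p * np.mean(y > b_crit)  # count true exceedances
        p *= p0                             # one more conditional factor
        seeds = theta[order[n_s - n_c:]]    # seeds inside the new subset
        samples = []
        for t in seeds:                     # MMHA chains conditioned on > b_i
            for _ in range(n_sc):
                cand = t + rng.uniform(-1.0, 1.0)
                accept = rng.uniform() <= min(1.0, np.exp(-0.5 * (cand**2 - t**2)))
                t_p = cand if accept else t
                t = t_p if g(t_p) > b_i else t
                samples.append(t)
        theta = np.asarray(samples)
    return p * np.mean(g(theta) > b_crit)

p_rare = subset_simulation(lambda t: t)     # P(theta > 3) for a N(0,1) parameter
```

With ${n}_{s}=1000$ samples per level, the estimate lands near the true tail probability, whereas a plain MCA of the same size would typically see one or two exceedances at best.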
It should be noted that, for the OCP in Equation (1) or Equation (6), the calculated probability in Equation (19) must be subtracted from 1 as the CC in Equation (6) is defined for not being in the failure set. Here,
$\tilde{\mathcal{I}}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\right)$ is the complementary indicator function from Equation (12) defined for the failure region:
Note that the accuracy of Equation (19) can be quantified using the coefficient of variation (c.o.v.):
Here,
$\mathbb{E}\left[\mathbb{P}\left(\mathcal{F}\right)\right]$ is given by Equation (19), while the standard deviation of the failure probability can be calculated by a Beta pdf fit as proposed in ([
25], p. 293). Overall, we can compare the resulting c.o.v. with literature values [
24] to assess the viability. Generally, a small c.o.v. indicates that the standard deviation of our failure probability estimate is smaller than its expected/mean value. Thus, the goal is a small c.o.v., as then the dispersion of the data is small and we can be confident that the CC is fulfilled.
In this study, we propose to introduce the SubSim algorithm into the CC–OC algorithm. Our procedure is to calculate the subset samples based on the analytic response surface of the gPC expansion (Equation (3)), obtained from the initial solution of the OCP (Equation (1)) by MCA. The samples are then used to run a new optimization fulfilling the desired rare-event probability. A new response surface is calculated, from which new samples are generated using a SubSim. This procedure is repeated until both the SubSim probability ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ (Equation (19)) and the probability level assigned to the constraint ${\mathbb{P}}_{OCP}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ (Equation (6)) in the OCP fulfill the desired rare-event probability ${\mathbb{P}}_{des}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$. The procedure is described in Algorithm 4. It should be noted that this procedure can generally be applied as long as the underlying OCP in Equation (6) can be solved.
Algorithm 4 Basic strategy of the subset simulation algorithm within the CC–OC framework.

Require: OCP as in Equation (6) with initial guess for decision variables $\mathbf{z}$.
1: Calculate an optimal solution for a likely failure (e.g., $\mathbb{P}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)=99\%$) using MCA.
2: Obtain the subset probability ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ and samples, based on the analytic gPC response surface (Equation (3)) and by applying Algorithm 3.
3: while ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)<{\mathbb{P}}_{des}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ or ${\mathbb{P}}_{OCP}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)<{\mathbb{P}}_{des}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ do
4: Assign the SubSim samples to the evaluation routine of the CC within the OCP.
5: Solve the CC–OC problem (Equation (6)).
6: if the optimization is not successful then
7: Reduce the constraint probability ${\mathbb{P}}_{OCP}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ to relax the OCP (e.g., by a factor of 10).
8: else
9: if ${\mathbb{P}}_{OCP}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)\ne {\mathbb{P}}_{des}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ then
10: Increase the constraint probability within the OCP (e.g., by a factor of 10).
11: end if
12: end if
13: Obtain the new subset probability ${\mathbb{P}}_{ss}\left(\mathbf{y}\left(\mathbf{z};\mathit{\theta}\right)\notin \mathcal{F}\right)$ from Equation (19) and samples based on the new analytic gPC response surface, using Algorithm 3.
14: end while
15: return Optimal decision variables $\mathbf{z}$.

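The outer loop of Algorithm 4 can be sketched as pure control flow; here `solve_ccoc` and `run_subsim` are hypothetical stand-ins for solving the OCP (Equation (6)) and running Algorithm 3, all probabilities are "not in the failure set" levels, and the factor-of-10 steps are assumed to act on the complementary failure probability:

```python
def ccoc_with_subsim(p_des, solve_ccoc, run_subsim, factor=10.0, p_ocp=0.99):
    """Iterate OCP solves and subset simulations until both the SubSim
    estimate and the OCP constraint level reach the desired probability."""
    history = [p_ocp]                 # constraint level per homotopy step
    p_ss = run_subsim()               # SubSim on the initial gPC response surface
    while p_ss < p_des or p_ocp < p_des:
        ok = solve_ccoc(p_ocp)        # solve the CC-OC problem at level p_ocp
        if not ok:
            p_ocp = 1.0 - (1.0 - p_ocp) * factor   # relax: allow more failure
        elif p_ocp < p_des:
            p_ocp = 1.0 - (1.0 - p_ocp) / factor   # tighten toward p_des
        p_ocp = min(p_ocp, p_des)
        history.append(p_ocp)
        p_ss = run_subsim()           # new samples from the new response surface
    return history
```

The point of the structure is that the expensive sample regeneration happens only between OCP solves, so each NLP sees a fixed, deterministic sample set.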
Regarding the homotopy strategy, it should be noted that, by using the SubSim samples calculated from the last optimal solution within the new optimization, we might introduce a bias, as the samples drawn from the Markov chain are based on the optimal results created by the last NLP solution. Generally, they would have to be adapted in each iteration of the NLP as the system response changes. As we do not update the samples within the NLP, but within the homotopy step after the optimal solution has been calculated, we technically solve the CC and its rare-event probability using biased samples compared to the ones that would be calculated within the SubSim. We cope with this issue in this paper by checking the fulfillment of the CC both in the OCP as well as after the OCP is solved by the SubSim, i.e., with the new response surface. Thus, the CC–OC problem is only considered solved if both results show that the CC is fulfilled to the desired level. Within this paper, the OCP converges well, but further studies should explore the effects and influences of this bias and how to reduce it (e.g., by importance sampling).