Pontryagin Neural Networks with Functional Interpolation for Optimal Intercept Problems

In this work, we introduce Pontryagin Neural Networks (PoNNs) and employ them to learn the optimal control actions for unconstrained and constrained optimal intercept problems. PoNNs are a particular family of Physics-Informed Neural Networks (PINNs) specifically designed to tackle optimal control problems via the application of the Pontryagin Minimum Principle (PMP), i.e., the indirect method. The PMP provides first-order necessary optimality conditions, which result in a Two-Point Boundary Value Problem (TPBVP). More precisely, PoNNs learn the optimal control actions from the unknown solutions of the arising TPBVP, modeling them with Neural Networks (NNs). The characteristic feature of PoNNs is the use of PINNs combined with a functional interpolation technique, named the Theory of Functional Connections (TFC), which together form the so-called PINN-TFC based frameworks. According to these frameworks, the unknown solutions are modeled via the TFC's constrained expressions using NNs as free functions. The results show that PoNNs can be successfully applied to learn optimal controls for the class of optimal intercept problems considered in this paper.


Introduction
Over the last century, optimal control theory has garnered interest in many fields of study. Understanding the dynamics of systems and optimization methods is a difficult challenge for researchers in various scientific fields, from space guidance to epidemiology. Many numerical methods have been developed for optimizing the cost function of optimal control problems, which usually depends on the state and control variables governing the problem. In general, solution methods for optimal control problems fall into two categories: direct methods and indirect methods [1]. In direct methods, the system's state and control are discretized and, thus, the problem is transcribed into a Non-Linear Optimization Problem or Non-Linear Programming Problem (NLP). These NLPs can be solved using well-known optimization techniques, such as the Trust Region Method, the Nelder-Mead Method, or Interior Point Methods [2]. According to these techniques, the optimal solution is found by transcribing the infinite-dimensional optimization problem into a finite-dimensional one. Conversely, indirect methods are based on the first-order necessary conditions retrieved by direct application of the Pontryagin Minimum Principle (PMP). The necessary conditions result in a Two-Point Boundary Value Problem (TPBVP) in the state-costate pair that, if an analytical solution does not exist, must be solved via numerical techniques (e.g., single and multiple shooting methods [3,4], orthogonal collocation [5], or pseudo-spectral methods [6]). Generally, the indirect method provides a solution that is guaranteed to be optimal by definition, which is why it is often preferred over the direct method. However, the numerical solution of the TPBVP is often challenging, as it is highly sensitive to the initial guess on the unknown solutions. To mitigate this issue, in this work we use a robust PINN model, developed by the authors, which merges NNs and TFC [12,13]. That is, we will use a PINN-TFC based model called X-TFC.
This PINN-TFC based framework approximates the latent solution of the DEs via the so-called Constrained Expressions (CEs). The CEs are the critical ingredient of the Theory of Functional Connections (TFC), developed by Mortari [14], which is a method for functional interpolation where functions are approximated using these CEs. A constrained expression is a functional made up of the sum of two terms: the first term is a free function, while the second term is a functional that analytically fulfills the constraints independently of the choice of the free function [14,15]. Thanks to the CEs analytically embedding the constraints, TFC has many applications. In particular, TFC is widely used for the solution of DEs, because the CEs remove the "curse of the equation constraints" [16][17][18]. Additionally, TFC has already been used to solve several classes of optimal control space guidance problems, such as energy-optimal landing on large and small planetary bodies [19,20], fuel-optimal landing on large planetary bodies [21], and energy-optimal relative motion problems subject to Clohessy-Wiltshire dynamics [22]. For solving DEs, the standard TFC method uses as free function a linear combination of orthogonal polynomials, such as Legendre or Chebyshev polynomials [16,17]. Using orthogonal polynomials as the free function makes the standard TFC framework suffer from the curse of dimensionality, especially when solving large-scale PDEs. In order to overcome this limitation, X-TFC uses a single-layer NN trained via the Extreme Learning Machine (ELM) algorithm [23] to represent the free function. This is the reason why X-TFC can be labeled as a PINN framework.
Although, in this work, X-TFC is used to learn the state-costate solution of the TPBVP arising from the application of the PMP solely in a physics-driven fashion, being a PINN framework, it can also be used for the data-driven solution of DEs, data-driven DE parameter discovery, and data-driven discovery of DEs [8]. The manuscript is organized as follows. First, we explain how PoNNs are modeled and trained to learn control actions for optimal control problems tackled via the indirect method. PoNNs are then employed to learn optimal control actions for the minimum time-energy optimal intercept problem, both with unbounded and bounded control. Finally, the results are discussed in the remaining sections.

Pontryagin Neural Networks for Optimal Control Problems
PoNNs represent a particular family of PINNs specifically designed to tackle optimal control problems via the Pontryagin Minimum Principle (PMP) application (i.e., the indirect method). The term "Pontryagin Neural Networks" (PoNNs) is coined here by the authors to refer in a more compact form to "PINNs designed to learn optimal control via the indirect method". The PMP provides first-order necessary optimality conditions, which result in a Two-Point Boundary Value Problem (TPBVP). More precisely, PoNNs learn the optimal control actions from the unknown solutions of the arising TPBVP, modeling them with Neural Networks (NNs). That is, PoNNs provide a NN representation of functions for which the residuals of the TPBVP are as close to zero as possible, i.e., the solutions of the TPBVP. This NN representation of the TPBVP unknown solutions is obtained by combining PINNs with TFC. Together, PINNs and TFC form the so-called PINN-TFC based framework. According to this framework, a generic differential equation's unknown solutions are modeled via the TFC's constrained expressions using NNs as free functions. When Deep NNs (DNNs) are used as the free function, the PINN-TFC framework is known as Deep-TFC [12]. When shallow NNs, trained via the Extreme Learning Machine (ELM) algorithm, are used as the free function, the PINN-TFC framework is known as Extreme TFC (X-TFC) [13]. In particular, PoNNs use the X-TFC approach.
In this section, all of the key elements of the PoNNs are explored in detail. First, we present how Optimal Control Problems (OCPs) are tackled via the indirect method; bounds on the control are also taken into consideration. We then explain how standard PINNs and X-TFC are employed to learn the unknown solutions of the arising TPBVP, which in turn leads to retrieving the optimal control actions.

Optimal Control Problems via Indirect Method
In this section, Unconstrained Optimal Control Problems (UOCPs) are first introduced, and then the extension to Constrained Optimal Control Problems (COCPs) is presented, following the procedure proposed in [24]. UOCPs are generally based on a cost function, whose expression is given by

J = Φ(x(t_f), t_f) + ∫_{t_0}^{t_f} L(x(t), u(t), t) dt,   (1)

subject to the dynamic constraints, expressed as

ẋ(t) = f(x(t), u(t), t),   (2)

and the boundary conditions

x(t_0) = x_0,  x(t_f) = x_f,   (3)-(4)

where x(t) is the state vector, u(t) is the control vector, t is the independent variable, t_0 is the initial time, and t_f is the final time (components of the boundary conditions may be free). The terms Φ and L appearing in Equation (1) are the end-point cost and the Lagrangian, respectively [2]. If the Pontryagin Maximum (or Minimum) Principle (PMP) [25] is employed within the indirect method, then the Hamiltonian of the problem needs to be defined as

H(x, u, λ, t) = L(x, u, t) + λ^T f(x, u, t),   (5)

where λ represents the costate (or adjoint) variables. According to the first-order optimality conditions of the PMP, the optimal control is derived as follows,

∂H/∂u = 0.   (6)

Once Equation (6) is solved for u, the result is plugged into (5), and the remaining first-order necessary conditions for the state and costate variables are derived,

ẋ = ∂H/∂λ,  λ̇ = −∂H/∂x.   (7)-(8)

Finally, the transversality conditions have to be applied, if any (e.g., when the corresponding state variable is free). The possible transversality conditions are the following,

λ(t_0) = −∂Φ/∂x(t_0),  λ(t_f) = ∂Φ/∂x(t_f),  H(t_0) = ∂Φ/∂t_0,  H(t_f) = −∂Φ/∂t_f.   (9)-(12)

Equations (7) and (8), along with the transversality conditions on the Hamiltonian, represent a BVP which, in the proposed method, is solved using PINNs, in particular X-TFC. One should note that the transversality conditions on the costate are a priori satisfied with the X-TFC approach, as will be explained below; thus, they do not explicitly appear in the BVP.
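As an illustration of the PMP first-order conditions, they can be derived symbolically for a toy scalar energy-optimal problem (our own example, not one from the paper): minimize J = ½∫u² dt subject to x′ = v, v′ = u.

```python
import sympy as sp

x, v, u, lx, lv = sp.symbols('x v u lambda_x lambda_v')

# Hamiltonian H = L + lambda^T f for the toy double-integrator problem
H = sp.Rational(1, 2) * u**2 + lx * v + lv * u

# Optimality condition dH/du = 0 yields the optimal control
u_star = sp.solve(sp.diff(H, u), u)[0]    # u* = -lambda_v

# Costate dynamics: lambda_dot = -dH/dx
lx_dot = -sp.diff(H, x)                   # lambda_x' = 0
lv_dot = -sp.diff(H, v)                   # lambda_v' = -lambda_x

print(u_star, lx_dot, lv_dot)
```

The same mechanics (solve the stationarity condition for u, substitute back, read off the costate ODEs) is what produces the TPBVP in the problems below.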
To consider control constraints in COCPs, some modifications must be introduced to the previous formulation (see [24] for more detail). First, we define a new unconstrained control variable w_i for each control constraint present in the problem,

d_i^− ≤ u_i ≤ d_i^+.   (13)

The idea is to replace the constrained control with a saturation function, as follows,

u_i = φ_i(w_i),  for instance  φ_i(w_i) = d_i^− + (d_i^+ − d_i^−)/(1 + e^{−c w_i}),   (14)

where φ_i is a smooth, monotonically increasing function and c is a constant useful for modifying the slope of φ_i at w_i = 0. The advantage of using a saturation function is that it is defined within the range [d_i^−, d_i^+] and approaches the saturation limits asymptotically as w_i → ±∞. Other saturation functions, such as tanh, can also be employed. One can note that control constraints represent input constraints, in the sense that the order of derivative at which the saturation function appears in the Hamiltonian is zero, as shown by Equation (16). Because a new control variable is introduced, a new term, called the regularization term, is added to the cost function, as follows,

J̃ = J + ε Σ_i ∫_{t_0}^{t_f} w_i² dt,   (15)

where ε is the regularization parameter. A low value of ε means that the new control problem with the new unconstrained control becomes similar to the original constrained control problem, as desired. However, this parameter can affect either the saturation function or its derivative, according to the order of the constraint. Indeed, decreasing ε results in higher values of the augmented control w, which approaches the asymptotic limits. Therefore, if ε approaches 0, w could tend to infinity, leading to numerical problems. To summarize, even if an ε close to 0 brings the solution of the new UOCP closer to that of the original COCP, one should be careful in setting the value of ε. For more information about the influence of ε on the control and the other optimization parameters, the reader can refer to Ref. [26].
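To make the saturation idea concrete, the sketch below (our own illustration: a scaled logistic standing in for φ_i, slope constant c = 1, and the bounds ±0.7 m/s² used later in the paper) checks numerically that the mapped control stays strictly inside the bounds, is monotonically increasing, and approaches the limits asymptotically:

```python
import numpy as np

# Scaled logistic saturation mapping an unconstrained w into (d_minus, d_plus).
# The exact phi_i and c of the paper may differ; this is only illustrative.
def phi(w, d_minus=-0.7, d_plus=0.7, c=1.0):
    return d_minus + (d_plus - d_minus) / (1.0 + np.exp(-c * w))

w = np.linspace(-20.0, 20.0, 1001)
u = phi(w)
print(u.min(), u.max())   # strictly inside (-0.7, 0.7) for finite w
```

However large w grows during optimization, the recovered control u = φ(w) can never violate the bounds, which is exactly why the substitution removes the inequality constraint.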
In order to obtain accurate solutions, a continuation procedure on ε is usually exploited to bring the new UOCP to match the original COCP. According to Equation (15), the new Hamiltonian of the problem is

H̃ = L + ε Σ_i w_i² + λ^T f + Σ_i ν_i (u_i − φ_i(w_i)),   (16)

where the ν_i represent additional multipliers that take the new equality constraints into account. It is worth noting that the optimality condition on the control u (∂H̃/∂u = 0) may take a transcendental form and no longer provide a closed-form solution for u. Because a new unconstrained control has been added, another equation should be added to the optimality conditions on the control,

∂H̃/∂w_i = 2 ε w_i − ν_i φ_i′(w_i) = 0,   (17)

where φ_i′ indicates the derivative of the saturation function with respect to w_i. Moreover, the following constraint equation has to be added to the BVP,

u_i − φ_i(w_i) = 0.   (18)

Finally, the new UOCP is completely defined: the lower the value of ε, the closer the solution of the new UOCP approaches the solution of the original COCP.
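The effect of the continuation on ε can be illustrated on a static toy analogue (our own construction, not the paper's solver): minimize F(w) = ½(φ(w) − 2)² + ε w² with the saturation φ(w) = 0.7 tanh(w). The unconstrained optimum of the underlying problem (u = 2) violates the bound, so as ε shrinks by a factor of 10 per continuation step, the recovered control u = φ(w) creeps toward the active bound 0.7:

```python
import numpy as np

def phi(w):  return 0.7 * np.tanh(w)            # saturation into (-0.7, 0.7)
def dphi(w): return 0.7 / np.cosh(w)**2         # its derivative

def grad(w, eps):                               # stationarity dF/dw = 0
    return (phi(w) - 2.0) * dphi(w) + 2.0 * eps * w

u_hist = []
for eps in 0.5 * 0.1 ** np.arange(8):           # eps = 0.5, 0.05, ..., 5e-8
    lo, hi = 0.0, 30.0                          # grad < 0 at lo, > 0 at hi
    for _ in range(80):                         # bisection on dF/dw = 0
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if grad(mid, eps) < 0 else (lo, mid)
    u_hist.append(phi(0.5 * (lo + hi)))

print(u_hist[0], u_hist[-1])                    # u approaches the 0.7 bound
```

Note how the optimal w itself grows without bound as ε → 0, which is the numerical hazard the text warns about; warm-starting each continuation step keeps the problem tractable.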

Physics-Informed Neural Networks and Functional Interpolation
Raissi et al. defined PINNs [7], following the original idea introduced by Lagaris et al. [27], to tackle two main problem categories: the data-driven discovery and the data-driven solution of PDEs. Even though, in [7], PINNs were introduced to tackle problems governed by PDEs, the same methodology is also used to solve problems involving ODEs. A PINN uses a single neural network (either shallow or deep) to approximate the solution of DEs. In particular, the training set is represented by points randomly sampled from the full high-resolution data-set and by collocation points randomly chosen to enforce the BCs, the DE, and its related constraints. Once the network's output is obtained, it is substituted into the residuals of both the DE and the BCs. The weight and bias parameters are learned through a training process that minimizes the physics-based loss function via gradient-based methods. Hence, the main characteristic of PINNs is to include the DEs and BCs describing the physics of the problem in their cost functions. The theoretical formulation to learn a generic TPBVP is reported below in order to better understand how a PINN works.

PINN for TPBVPs
The vector differential equation of a generic TPBVP in the time domain can be expressed, in its implicit form, as follows,

F_i(t, y_j(t), ẏ_j(t)) = 0,   (19)

where i indexes the differential equations forming the ODE system, and j indexes the unknown functions y_j(t) that are the solutions of the system. The independent variable is the time t ∈ [t_0, t_f]. According to the standard PINN framework, the unknown solutions of the system (19) are approximated with a NN. Following [28], since we deal with a system of ODEs, we use a Single Input Multiple Output (SIMO) NN. More precisely, the input is represented by the time t, and the outputs by the y_j's. That is,

y_j(t) ≈ y_j^NN(t, θ),   (20)

where θ are the NN parameters that are trained via gradient-based methods. Equation (20) represents the PoNN; that is, the PoNN is a particular NN trained to satisfy the PMP by learning the solution of the arising TPBVP. To train the PoNN, the Mean Square Error (MSE) is minimized with respect to the NN parameters,

MSE = MSE_F + MSE_{BC_0} + MSE_{BC_f},   (21)

where

MSE_F = (1/N_F) Σ_{k=1}^{N_F} Σ_{i=1}^{N_eq} |F_i(t_k^F, y_j^NN(t_k^F, θ), ẏ_j^NN(t_k^F, θ))|²,   (22)

with

MSE_{BC_0} = Σ_{j=1}^{N_y} |y_j^NN(t_0, θ) − y_j(t_0)|²,  MSE_{BC_f} = Σ_{j=1}^{N_y} |y_j^NN(t_f, θ) − y_j(t_f)|².   (23)-(24)

Here, MSE_F, MSE_{BC_0}, and MSE_{BC_f} are the MSEs that take the dynamics (e.g., the ODEs forming the TPBVP, for our purposes), the initial conditions, and the final conditions, respectively, into account. N_eq is the number of ODEs in the system (19), N_y is the number of unknown solutions y_j, and {t_k^F}_{k=1}^{N_F} represent the training points for each equation. Finally, the derivatives of the y_j^NN(t, θ) with respect to time are computed via automatic differentiation.
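A minimal sketch (our own toy example) of assembling the composite PINN loss of the form above for the scalar BVP y′ = −y, y(0) = 1, y(1) = e^{−1}, with analytic stand-ins for the network output and its derivative in place of automatic differentiation:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)    # collocation (training) points

def pinn_mse(y, y_dot):
    mse_F   = np.mean((y_dot(t) + y(t))**2)     # dynamics residual (MSE_F)
    mse_bc0 = (y(0.0) - 1.0)**2                 # initial condition (MSE_BC0)
    mse_bcf = (y(1.0) - np.exp(-1.0))**2        # final condition (MSE_BCf)
    return mse_F + mse_bc0 + mse_bcf            # competing objectives summed

# Exact solution y = e^{-t} drives all three terms to zero ...
exact = pinn_mse(lambda s: np.exp(-s), lambda s: -np.exp(-s))
# ... while a candidate that only matches y(0) pays in the other terms.
wrong = pinn_mse(lambda s: 1.0 - s, lambda s: -np.ones_like(s))
print(exact, wrong)
```

The three terms are summed into one scalar loss; nothing in the standard formulation prevents the optimizer from trading boundary accuracy for interior accuracy, which is the competition discussed next.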
The attentive reader can clearly see that the MSE is made up of multiple objectives which, competing during the training, can prevent properly learning the underlying DE solution. Furthermore, when dealing with a system of ODEs, the number of competing objectives increases: the ODE latent solution must be learnt within the domain for all of the equations of the system, and on the boundaries for all of the unknowns. As previously mentioned, in this research all of these issues are removed thanks to the combination of the classic PINN methods with the TFC.

X-TFC for TPBVPs
As previously stated, in this work we use the PINN-TFC based method called X-TFC. Here, we present in detail how to apply X-TFC to solve a generic TPBVP in the time domain. Considering the system expressed by (19), the first step of the X-TFC method is the derivation of the constrained expressions, and their derivatives, as developed by the TFC,

y_j^(ℓ)(t) ≃ g_j^(ℓ)(t) + Σ_{k=1}^{n_j} η_{k_j} s_k^(ℓ)(t),   (26)-(27)

where the superscript (ℓ) refers to the ℓth derivative with respect to the independent variable, n_j is the number of constraints on the jth unknown function and/or its derivatives, the η_{k_j} are coefficients, and g_j(t) is the free function. According to [14], the support functions s_k(t) can be selected as

s_k(t) = t^{k−1}.

Once the constrained expression is defined, substituting the constraints on y_j(t) and/or its derivatives at the time boundaries (e.g., t_0 and t_f) into the constrained expression creates a set of linear algebraic equations, which is then solved for the η_{k_j} coefficients. Once the η_{k_j} coefficients are calculated, the boundary constraints of Equation (19) are analytically embedded within the constrained expression. Now, substituting the constrained expressions into the F_i differential equations transforms them into a new set of equations, denoted by F̃_i. This new set of equations is a function only of the free functions g_j(t), their derivatives, and the independent variable t. That is,

F̃_i(t, g_j(t), ġ_j(t)) = 0.   (28)

This vector differential equation is unconstrained, because the boundary conditions are embedded within it through the derived η_{k_j} values. To solve Equation (28), X-TFC uses as free functions g_j(t) a shallow Single Input Single Output (SISO) NN trained via the ELM algorithm [23]. That is,

g_j(z) = Σ_{q=1}^{L} β_{q,j} σ(w_q z + b_q),   (29)

where b_q is the bias of the qth hidden neuron, w_q ∈ R is the input weight connecting the qth hidden neuron and the input node, β_{q,j} (with q = 1, ..., L) is the qth output weight connecting the qth hidden neuron and the output node, L is the number of hidden neurons, and σ(·) is the activation function of the free function g_j(z). Equation (29) represents the key difference between X-TFC and the standard TFC, and the reason why the X-TFC framework belongs to the family of PINN frameworks. Using a NN as the free function instead of orthogonal polynomials allows the X-TFC approach to (1) strongly mitigate the curse of dimensionality in ODE/PDE problems with respect to the standard TFC and (2) be framed as a PINN method. Because we use the ELM algorithm to train the NN, the only unknowns to compute are the output weights β_j = [β_{j,1}, ..., β_{j,L}]^T. The attentive reader will notice the use of a different independent variable, z. This is because the domain of the activation functions, z, and the problem domain, t, usually do not coincide. Therefore, we must map the domain t into the domain z and vice versa,

z = z_0 + c (t − t_0),  with  c = (z_f − z_0)/(t_f − t_0),   (30)-(31)

where c is the mapping coefficient. Because c is always a positive number, it is convenient to rewrite it as c = b². Thanks to the mapping, all subsequent derivatives of g_j(t) are defined as follows,

d^n g_j / dt^n = c^n d^n g_j / dz^n.   (32)

It should be noticed that, for optimal control problems where the final time is free, the mapping coefficient becomes an unknown that must be computed along with all of the β_j's. Hence, the transformation of the free function and its derivatives between the t domain and the z domain can be summarized as follows,

g_j^(n)(t) = c^n σ^(n)(z) β_j,   (33)

where σ′(z) is the abbreviation for dσ(z)/dz. Equation (28) in the z domain then becomes

F̃_i(z, g_j(z), c ġ_j(z)) = 0.   (34)

The z domain must be discretized into n points in order to numerically solve this TPBVP. One can note that z is usually discretized with equally spaced points or with the Chebyshev-Gauss-Lobatto points, but any desired quadrature scheme can be used.
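The key property of a constrained expression — the boundary values hold for any free function — can be verified numerically. The sketch below (our own illustration, with arbitrary constraint values and a random ELM-style free function) builds a CE with support functions s_1 = 1, s_2 = z for the two constraints y(z_0) = y_0 and y(z_f) = y_f:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow ELM-style free function g(z): random fixed input weights/biases,
# and output weights beta left random on purpose, since the point is that
# the CE meets the boundary constraints for ANY free function.
L_neurons = 16
w_in = rng.uniform(-1, 1, L_neurons)
b    = rng.uniform(-1, 1, L_neurons)
beta = rng.normal(size=L_neurons)

def g(z):
    return np.tanh(np.outer(z, w_in) + b) @ beta

# Constrained expression y(z) = g(z) + eta1 * 1 + eta2 * z, with eta solved
# from the two boundary constraints (illustrative values below).
z0, zf, y0, yf = -1.0, 1.0, 2.0, -3.0

def y_ce(z):
    A = np.array([[1.0, z0], [1.0, zf]])
    eta = np.linalg.solve(A, np.array([y0 - g(np.array([z0]))[0],
                                       yf - g(np.array([zf]))[0]]))
    return g(np.atleast_1d(z)) + eta[0] + eta[1] * np.atleast_1d(z)

print(y_ce(z0), y_ce(zf))   # boundary values hold regardless of beta
```

Because the constraints are satisfied identically, the optimizer only ever searches over functions that already meet the boundary conditions — the boundary MSE terms of the standard PINN loss disappear.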
The unconstrained set of differential equations in Equation (34) can then be expressed as loss functions at each discretization point,

L_i = F̃_i(z, g_j(z), c ġ_j(z)).   (35)

Now, by combining the differential equations for each dimension, an augmented loss function can be written as follows,

L = [L_1, ..., L_i, ..., L_{N_eq}]^T,   (36)

and, for a true solution, this vector must be equal to 0. This allows the β_j coefficients to be learnt via different optimization schemes, e.g., least-squares for linear problems [16] and iterative least-squares for nonlinear problems [17]. If the iterative least-squares method is needed, then the estimate of the unknowns is updated at each iteration, as follows,

Ξ_{k+1} = Ξ_k + ΔΞ_k,   (37)

where Ξ is the augmented vector containing all the vectors β_j (and, when the final time is free, the square root of the mapping coefficient, b), and the subscript k indicates the current iteration. In general, the ΔΞ_k term can be obtained by performing a classic linear least-squares step at each iteration of the iterative least-squares procedure,

ΔΞ_k = −(J^T J)^{−1} J^T L(Ξ_k),   (38)

where J is the Jacobian matrix containing the derivatives of the losses with respect to all of the unknowns. The Jacobian matrix can be computed either by hand or by means of computing tools, such as symbolic or automatic differentiation toolboxes. The iterative process is repeated until the following condition is met,

L_2[L(Ξ_k)] < ε,   (39)

where ε defines some user-prescribed tolerance. Once convergence is achieved by meeting the criterion L_2[L(Ξ_k)] < ε, the nonlinear DE can instead be solved using the stopping criterion L_2[L(Ξ_k)] ≥ L_2[L(Ξ_{k−1})]. In this way, convergence is driven by round-off error, and the use of a user-prescribed tolerance is completely avoided. Thus, the solution accuracy is pushed to its limit (e.g., the best solution accuracy possible for the specific DE is reached).
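The iterative least-squares update above is a Gauss-Newton scheme. A minimal sketch (our own toy two-equation nonlinear system, not one of the paper's TPBVPs) shows the update in action:

```python
import numpy as np

# Toy nonlinear residual vector L(Xi) = [x^2 + y - 3, x + y^2 - 5],
# whose root is (x, y) = (1, 2).
def loss(Xi):
    x, y = Xi
    return np.array([x**2 + y - 3.0, x + y**2 - 5.0])

def jac(Xi):                                   # Jacobian of the losses
    x, y = Xi
    return np.array([[2.0 * x, 1.0], [1.0, 2.0 * y]])

Xi = np.array([1.5, 1.5])                      # initial guess
for _ in range(25):
    J, Lv = jac(Xi), loss(Xi)
    dXi = -np.linalg.solve(J.T @ J, J.T @ Lv)  # linear least-squares step
    Xi = Xi + dXi                              # Xi_{k+1} = Xi_k + dXi_k
    if np.linalg.norm(loss(Xi)) < 1e-12:       # user-prescribed tolerance
        break

print(Xi)   # converges to approximately [1, 2]
```

In X-TFC the residual vector stacks the losses at every collocation point and the unknowns are the output weights (plus b when the final time is free), but the update formula is exactly the one shown.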
When the X-TFC method is used to learn the solution of the TPBVP arising from the PMP, Equation (26) represents the PoNN. For the convenience of the reader, Figure 1 presents a schematic summarizing how the PoNNs are modeled and trained for learning optimal control actions for a generic unconstrained OCP.

Optimal Intercept Problem: Formulations and Results
In this section, we show how PoNNs are modeled and trained to learn optimal control actions for the minimum-time energy optimal intercept problem [19]. Unbounded and bounded control actions will both be considered.
All of the problems tackled in this manuscript have been coded in Matlab R2020a and run on a PC with an Intel Core i7-9700 CPU and 64 GB of RAM.

Unconstrained Problem
The optimal control problem considered here is the minimum time-energy optimal intercept problem. This problem describes the motion of an interceptor (M) chasing a target (T), both subject to the same gravitational acceleration. The equations of motion in an inertial reference frame can be written as follows,

ṙ = v,  v̇ = a_T − a_M,   (40)-(41)

where r = r_T − r_M and v = v_T − v_M are, respectively, the relative position and velocity between the target and the interceptor, while a_T and a_M are the thrust accelerations. The optimization problem to be addressed is a minimum time-energy optimal problem,

min J = Γ t_f + ½ ∫_{t_0}^{t_f} a_M^T a_M dt,   (42)

where Γ is a non-negative coefficient, t_f is the free final time to be optimized, and the final relative velocity is chosen to be free. The Hamiltonian for this problem is

H = ½ a_M^T a_M + λ_r^T v + λ_v^T (a_T − a_M).   (44)

The control optimality comes from

∂H/∂a_M = a_M − λ_v = 0  ⟹  a_M = λ_v.   (45)

By plugging Equation (45) into Equation (44), the Hamiltonian can be rewritten as

H = λ_r^T v + λ_v^T a_T − ½ λ_v^T λ_v.   (46)

The first-order necessary conditions for the state and costate are

ṙ = v,  v̇ = a_T − λ_v,  λ̇_r = 0,  λ̇_v = −λ_r.   (47)-(50)

In addition, two transversality conditions need to be imposed,

H(t_f) + Γ = 0,  λ_v(t_f) = 0.   (51)-(52)

The system expressed by Equations (47)-(52) is a TPBVP, representing the physics constraints that drive the PoNN training.
The transversality condition (51) is necessary because the problem has free final time (i.e., the final time appears in the cost function). This means that the Hamiltonian at the converged final time must be equal to −Γ. As the system is autonomous (i.e., there is no explicit dependency on time in the dynamics), the Hamiltonian must be constant over time, that is, H(t) = −Γ. Because H(t_f) = −Γ (from the transversality condition), it follows that H(t) + Γ = 0 for all t.
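The constancy of the Hamiltonian can be checked numerically by propagating the state-costate dynamics with arbitrary illustrative costate initial values (our own, not the paper's converged ones) and evaluating the reduced Hamiltonian along the trajectory:

```python
import numpy as np

a_T = np.array([1.0, -2.0, 0.1])   # constant target acceleration (paper's value)

# State-costate dynamics: r' = v, v' = a_T - lam_v, lam_r' = 0, lam_v' = -lam_r,
# with the optimal control a_M = lam_v already substituted.
def f(s):
    v, lam_r, lam_v = s[3:6], s[6:9], s[9:12]
    return np.concatenate([v, a_T - lam_v, np.zeros(3), -lam_r])

def hamiltonian(s):
    v, lam_r, lam_v = s[3:6], s[6:9], s[9:12]
    a_M = lam_v
    return 0.5 * a_M @ a_M + lam_r @ v + lam_v @ (a_T - a_M)

# Paper's initial relative state; costate values below are illustrative only.
s = np.concatenate([[500., -600., 500.], [-50., 60., 5.],
                    [0.02, -0.03, 0.01], [0.5, -0.4, 0.1]])

dt, H_vals = 0.01, []
for _ in range(4500):                  # integrate 45 s with classic RK4
    H_vals.append(hamiltonian(s))
    k1 = f(s); k2 = f(s + 0.5*dt*k1); k3 = f(s + 0.5*dt*k2); k4 = f(s + dt*k3)
    s = s + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)

H_vals = np.array(H_vals)
print(H_vals.max() - H_vals.min())     # spread near round-off: H is conserved
```

Any admissible costate initialization yields a constant H along the flow; the boundary conditions of the TPBVP then pin that constant to −Γ.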
By writing the equations in components (j = 1, 2, 3), one obtains

v̇_j = a_{T,j} − λ_{v,j},  λ̇_{r,j} = 0,  λ̇_{v,j} = −λ_{r,j},   (53)-(55)

where Equation (47) is removed, since it is redundant when employing the CEs approximation, and Equation (52) is also removed, as the boundary conditions are analytically satisfied by the CEs. Thus, the states and costates are approximated using the TFC's CEs, where the Ω_k(z) appearing in them are called switching functions and are reported in Appendix A. Hence, the PoNN's parameters to be learned are collected in the vector Ξ, which contains all the output weights β_j and the square root of the mapping coefficient, b. An iterative least-squares approach can be used to learn the PoNN's parameters by minimizing the following losses,

L_{v,j} = v̇_j − a_{T,j} + λ_{v,j},  L_{λ_r,j} = λ̇_{r,j},  L_{λ_v,j} = λ̇_{v,j} + λ_{r,j},  L_H = H + Γ,   (57)-(60)

where j = 1, 2, 3 and the last loss is evaluated at the final time.
For what concerns the derivatives of the losses with respect to the unknowns (e.g., the Jacobian matrix), it can be computed either analytically, thanks to symbolic computation, or numerically, by automatic differentiation.

Results
The initial conditions for this problem are taken from Furfaro and Mortari [19]. The initial relative position and velocity vectors are chosen to be r_0 = [500, −600, 500] m and v_0 = [−50, 60, 5] m/s, respectively, whereas the initial absolute velocity of the target is v_T,0 = [100, 100, 10] m/s. The target is located at the origin of the absolute frame at the initial time instant and is subject to a constant acceleration command a_T = [1, −2, 0.1] m/s². The activation function used in this problem is the hyperbolic tangent. The weights and biases are sampled from U(−1, 1). Because the problem is nonlinear, an initial guess on the unknowns is required to initialize the iterative least-squares. In this work, as a first guess, the PoNN's parameters Ξ have been randomly sampled from U(0, 1). Figure 2 shows the trajectories of the target and the interceptor learnt by the PoNN, and the time histories of the states and costates, with Γ = 1. For these results, we set N = 20 and L = 16. The training converges in nine iterations, with a training time of 0.01 s. The average loss and Hamiltonian are 1.3 × 10^−9 and 2.5 × 10^−8, respectively; the Hamiltonian at the final time is 7.3 × 10^−13. Moreover, Figure 3 shows that H(t) + Γ is approximately equal to zero, as expected. The learnt time of flight is 45.54 s, and the associated cost function is J = 65.30. Table 1 shows that the average error on the dynamics and the optimality of the learnt solutions improve as N and L increase, at the expense of the training time, always for Γ = 1. It is worth noting that the learned trajectories are still accurate and optimal when low N and L hyperparameters are chosen. Furthermore, the training time is such that the algorithm could be used for real-time applications (e.g., a closed control loop made of a series of open control loops computed instant by instant).
A major difference between the formulation of the minimum time-energy intercept problem reported here and the one solved by Furfaro and Mortari [19] is that they fixed the final time to 50 s, while, in this work, we also optimize the final time, which is another parameter to be learnt by the PoNN. Table 2 also reports the solutions for different values of the penalty Γ. As expected, increasing the penalty decreases the time of flight at the expense of higher commanded accelerations (see Figure 4b). Even for the case of Γ = 10, Figure 5 shows that H(t) + Γ is approximately equal to zero, as expected, proving the optimality of the results. In order to test the reliability of our results, the software GPOPS-II [29] has been run, for this example only, with Γ = 1 and Γ = 10. As can be seen from the GPOPS-II results, the cost function and time of flight are comparable, while the accuracy on the quantity |H(t) + Γ| is higher for PoNNs.

Constrained Problem
Here, the optimization problem to be addressed is a minimum time-energy optimal problem with bounded control. The OCP is cast as follows,

min J = Γ t_f + ½ ∫_{t_0}^{t_f} a_M^T a_M dt + ε ∫_{t_0}^{t_f} w^T w dt,  subject to (40)-(41) and u_{min,i} ≤ a_{M,i} ≤ u_{max,i},

where Γ is a non-negative coefficient, u_{min,i} and u_{max,i} are the bounds on the control acceleration vector components, t_f is the free final time to be optimized, and the final relative velocity is chosen to be free. In particular, w = [w_x, w_y, w_z]^T is the new fictitious control acceleration vector required to transform the original constrained OCP into its unconstrained version. The Hamiltonian for this problem is

H = ½ a_M^T a_M + ε w^T w + λ_r^T v + λ_v^T (a_T − a_M) + ν^T (a_M − φ(w)),

with ν = [ν_x, ν_y, ν_z]^T and φ(w) = [φ_x(w_x), φ_y(w_y), φ_z(w_z)]^T being the additional multipliers vector and the saturation functions vector, respectively. One can note that, in this case, one equality constraint is added for each component of a_M. The first-order necessary conditions for the state and costate equations and the control optimality are given by

ṙ = v,  v̇ = a_T − a_M,  λ̇_r = 0,  λ̇_v = −λ_r,  a_M − λ_v + ν = 0,  2 ε w − ν ⊙ φ′(w) = 0,  a_M − φ(w) = 0,   (66)-(70)

where the symbol (·)′ indicates the derivative with respect to the new fictitious control and the symbol ⊙ identifies the Hadamard product (i.e., element-wise multiplication). In addition, two transversality conditions need to be imposed,

H(t_f) + Γ = 0,  λ_v(t_f) = 0.   (71)-(72)

This system of equations, along with the transversality condition on the Hamiltonian at the final time, forms the TPBVP that represents the physics constraints driving the PoNN's training.
The transversality condition (71) is necessary because the problem has free final time (i.e., the final time appears in the cost function). This means that the Hamiltonian at the converged final time must be equal to −Γ. Because the system is autonomous (i.e., there is no explicit dependency on time in the dynamics), the Hamiltonian must be constant over time, that is, H(t) = −Γ. As H(t_f) = −Γ (from the transversality condition), it follows that H(t) + Γ = 0 for all t.
By writing the equations in components and substituting the control a_M with φ(w), one obtains

v̇_j = a_{T,j} − φ_j(w_j),  λ̇_{r,j} = 0,  λ̇_{v,j} = −λ_{r,j},

where Equation (66) is removed, since it is redundant. The unknown solutions are approximated with the TFC's CEs; even for this case, the switching functions Ω_k(z) are reported in Appendix A. Hence, the parameters of the PoNN to be learnt are collected in an augmented vector Ξ, analogously to the unconstrained case. An iterative least-squares approach can be used to minimize the losses and train the PoNN.

Results
The initial conditions and target constant acceleration command for this problem are the same as in the previous unconstrained version. The bounds on the components of the control acceleration vector have all been set to u_{min,i} = d_i^− = −0.7 m/s² and u_{max,i} = d_i^+ = 0.7 m/s². The coefficient c of the saturation function appearing in Equation (14) has been set to 0.005 after a trial-and-error procedure, in order to achieve good accuracy. Moreover, a continuation procedure on the parameter ε has been carried out to increase the accuracy of the solution as the unconstrained OCP approaches its constrained version. In particular, the value of ε is decreased by a factor of 0.1 at each iteration, starting from 0.5 and arriving at 5 × 10^−11. The activation function used in this problem is the logistic function. The weights and biases are sampled from U(−1, 1). Even for this case, all of the PoNN's parameters have been initialized by randomly sampling them from U(0, 1). For this problem, we set L = 16 and N = 100, where the discretization points employed are the Chebyshev-Gauss-Lobatto points, represented by the equation z_i = cos(iπ), where i is equally spaced from −1 to 0. The results are reported here just for Γ = 1, and they are shown in Figures 6-8. As can be seen from the control in Figure 6b, the control is indeed bounded within the defined limits, as expected from Equations (67)-(70). The optimality of the results is proved by the Hamiltonian loss (namely |H(t) + Γ|), whose mean and final values are 4.95 × 10^−7 and 1.76 × 10^−6, respectively.
As can be observed, the PoNN is able to learn the optimal control action, even for this case of bounded control, with a good accuracy.

Discussions
The results presented in this paper show that PoNNs can effectively learn the optimal control for the class of optimal intercept problems with integral quadratic cost. In particular, the framework converges to the solution even with a random initialization of the PoNN's parameters, which is an important advantage, since training can be significantly affected by the initial guess when more traditional deterministic algorithms are used.
Importantly, once the optimal control is learned on the training points, its analytical representation is automatically available through the NNs. Therefore, it can be evaluated at points unseen during the training (e.g., test points) without resorting to interpolation techniques or additional computational effort.
Although the PoNN was trained solely in a physics-driven fashion to generate the results, the training could also be carried out in a data-physics-driven fashion. This becomes extremely important if the DEs do not precisely model the physics of the problem, for instance, when perturbations are present and/or when uncertain dynamical systems are considered, which is common when dealing with OCPs for space applications. Thanks to this feature, the proposed PoNNs could be used alone or in synergy with other state-of-the-art methods for OCPs. For instance, one could in principle use PoNNs with GPOPS-II when the problem to tackle is sufficiently complex that either method alone would struggle. Within this scenario, optimal trajectories could be generated with GPOPS-II and then sampled to generate data, which can be added as additional terms in the PoNNs' loss function to better drive the training in a data-physics-driven manner. The same approach could be used with real data that intrinsically carry information about perturbations and/or unmodeled terms of the real dynamics that were not accounted for in the model.

Conclusions
Pontryagin Neural Networks (PoNNs) are introduced and tested to learn optimal control actions for instances of an optimal intercept problem. PoNNs represent a particular family of PINNs, specifically designed to learn control actions while satisfying the physics constraints coming from the PMP application. PoNNs are based on a newly developed PINN framework, named the Extreme Theory of Functional Connections (X-TFC). PoNNs are SISO NNs, which are modeled and trained to satisfy the PMP via learning the solution of the arising TPBVP. In particular, the use of the Constrained Expressions (CEs), defined within the TFC, enables the analytical satisfaction of the boundary conditions. This removes the issues related to having competing objectives in the loss function, which represent the major drawback of the original PINN methods. The Extreme Learning Machine (ELM) algorithm is employed to train the NN within the CEs. Thanks to the ELM, the network's training reduces to a simple least-squares problem, significantly reducing the training time compared to gradient-based methods. The results show how PoNNs successfully learn optimal control actions for the optimal intercept problem, with both unbounded and bounded control. Although the training is always initialized with random initial guesses for both the unknown coefficients of the CEs and the final time, the algorithm converges to a feasible solution in all cases, proving the proposed method's reliability. Furthermore, the results have shown good accuracy in satisfying the dynamics and in the optimality of the learnt solutions.
Future works will focus on applying PoNNs to other classes of OCPs for space applications involving discontinuities in the control, such as bang-bang types of solutions.

Acknowledgments: The authors would like to acknowledge Kristopher Drozd for providing computational resources and precious insights that helped in the realization of this work.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Switching Functions
The detailed derivation of the switching functions is reported in the work of Johnston et al. [21]; here, for the convenience of the reader, we just report them. We define Δz = z_f − z_0 and z* = z − z_0. The switching functions for a constrained expression with one constraint on f are given in Table A1. The switching functions for a constrained expression with one constraint on f′ are given in Table A2. The switching functions for a constrained expression with two constraints on f are given in Table A3. The switching functions for a constrained expression with two constraints, one on f and one on f′, are given in Table A4. The switching functions for a constrained expression with three constraints, two on f and one on f′, are given in Table A5. The switching functions for a constrained expression with four constraints, two on both f and f′, are given in Table A6.