1. Introduction
Addressing optimal control problems (OCPs) stands as a critical endeavor in crafting precise and efficient maneuvers for space missions. Traditionally, OCPs are approached through either direct or indirect methods. The direct method entails discretizing the problem to transform it into a Non-Linear Programming (NLP) problem, which is subsequently solvable via established optimization techniques like the trust region method, Nelder–Mead method, or interior point method [
1]. However, a significant concern with direct methods is that general NLP problems are NP-hard, so no polynomial-time algorithm is known for solving them. As a result, the computational effort required to reach the optimal solution has no predefined bound, and neither optimality nor convergence is guaranteed, which casts doubt on the reliability of these approaches. Conversely, the indirect method relies on the calculus of variations and the Pontryagin Minimum Principle (PMP) [
2]. First-order necessary conditions for optimality are derived from the problem’s Hamiltonian in terms of states and costates. This leads to a system of Ordinary Differential Equations (ODEs) that constitutes a Two-Point Boundary Value Problem (TPBVP), typically tackled through single and multiple shooting methods [
3,
4], orthogonal collocation [
5], or pseudo-spectral methods [
6]. However, a notable drawback of indirect optimization lies in its narrow convergence region, which depends heavily on the initial guess of the unknown initial costates. Furthermore, costates often lack a direct physical interpretation, which makes an initial guess harder to estimate. Consequently, despite the theoretical guarantees of optimality, obtaining the optimal control via the indirect method can prove arduous.
Among the OCPs employed for space exploration missions, minimum time trajectories and fuel optimal trajectories, which are also considered in this paper, play a key role especially during the orbit transfer phases and the proximity maneuvers around a target body. Applications of such optimal trajectories already studied in the literature include cislunar trajectories [
7,
8,
9,
10,
11], low−thrust orbit transfer around the Earth [
12], interplanetary trajectories [
13], and solar sail optimal trajectories [
14]. Specifically, the primary objective is to reduce propellant consumption (or, equivalently, to maximize final mass), whereas the secondary objective is to minimize the time of flight. The common aspect of these two types of problem is that a discontinuous control is usually involved when traditional thrusters are employed. This occurs because the control input, often represented by the throttle factor (or engine thrust ratio), appears linearly in the Hamiltonian of the problem, thus leading to a bang−off−bang or bang−bang type of control. In particular, two major difficulties arise when fuel optimal problems are tackled via an indirect method. First, an estimate for initializing the costates is necessary, and providing one can be challenging because the costates lack a direct physical interpretation. Second, the number of control switches and their temporal locations are usually unknown. To address these difficulties, an accurate and extensive study is presented by Taheri and Junkins [
15] with the goal of generating minimum-fuel switching surfaces and computing the solution of N-impulse fuel optimal interplanetary rendezvous and Earth orbit transfers. Moreover, the link between impulsive and continuous-thrust trajectories is also proved via optimal switching surfaces. Given the importance of fuel-optimal (and minimum time) trajectories, many other works in the literature have been dedicated to studying them and to mitigating the difficulties that arise with discontinuous control. In particular, three main techniques can be highlighted (possibly used in combination): the homotopic continuation procedure, the convexification technique, and the smoothing function technique.
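The origin of the discontinuous control can be sketched as follows. The notation is an assumption made here for illustration: the Hamiltonian is split into a part $H_0$ that does not depend on the throttle $u$ and a part that is linear in $u$, with the sign of the switching function $S$ chosen so that thrust is on when $S > 0$. Other works adopt the opposite sign convention.

```latex
\begin{align}
H(\mathbf{x},\boldsymbol{\lambda},u) &= H_0(\mathbf{x},\boldsymbol{\lambda}) - S(\mathbf{x},\boldsymbol{\lambda})\,u,
\qquad u \in [0,1], \\
u^{*} = \arg\min_{u \in [0,1]} H &=
\begin{cases}
1, & S > 0, \\
0, & S < 0, \\
\text{undetermined (singular arc)}, & S \equiv 0 \ \text{on a finite interval.}
\end{cases}
\end{align}
```

Because $H$ is linear in $u$, its minimum over $[0,1]$ is always attained at one of the two bounds unless $S$ vanishes identically, which is why the switch times coincide with the zeros of $S$.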
The homotopic continuation procedure is widely employed and effective in solving fuel or time-optimal problems [
10,
11,
12,
13,
14]. It consists of linking the original (difficult) OCP to a family of problems that are easier to solve. The technique is based on a homotopic continuation parameter, whose value is usually set to one for the easiest OCP. This problem is solved first, and progressively more difficult OCPs are then solved step by step by slowly decreasing the continuation parameter toward zero, with each solution initializing the next. This continuation procedure yields an accurate solution of the original OCP. Within homotopy continuation, three perturbing functions are introduced by Bertrand and Epenoy [
16]: the quadratic, logarithmic and extended logarithmic functions. Different perturbing functions lead to different formulations of the optimal throttle input, while the optimal thrust unit direction always remains the same. The quadratic perturbing function serves as a bridge connecting an energy-optimal problem to a fuel-optimal problem and has been utilized in numerous studies. To cite some examples, it was employed in [
17] to derive low−thrust fuel−optimal trajectories, in [
18] to calculate fuel−optimal low−thrust Earth−orbit transfers, accounting for shadow eclipses, and in [
19] to study fuel optimal soft landing trajectories on asteroids. The logarithmic perturbing function is exploited, for example, by Izzo and Öztürk [
20] to obtain the fuel−optimal trajectories required to build a dataset and train a Deep Neural Network (DNN) in a supervised fashion. The resulting model appears to hold promise for the potential real−time onboard implementation of an optimal guidance and control system for a spacecraft. Even in the case of a homotopic continuation procedure, an initial guess of the initial costates is still required. A methodology to approximate the initial costates for a fuel−optimal descent trajectory on asteroids is proposed in Ref. [
21], where a two−impulse descent trajectory computed via an irregular gravitational Lambert solver is used for the costates initialization, thus removing the issue and showing feasible results in low computational times. Continuation procedures are also employed in [
22,
23]. The former work achieves fuel−optimal orbital transfers by employing Lawden’s primer vector theory and implementing a continuation procedure on the thrust amplitude. Additionally, the proposed strategy enables the development of an automated algorithm that does not require any initial guess for the costate variables. In the latter study, a novel homotopy continuation technique is introduced, connecting the original fuel−optimal low−thrust trajectory with the time−optimal problem. Moreover, the dynamical model introduces new variables to reduce the number of unknown initial costates, allowing the mass costate to be expressed analytically in logarithmic form.
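As an illustration of how a perturbing function links the two problems, the quadratic case is commonly written as below. The symbols $\varepsilon$ (continuation parameter) and $u$ (throttle) are notation assumed here, and constant scale factors in the cost are omitted.

```latex
\begin{equation}
J_{\varepsilon} = \int_{t_0}^{t_f} \left[\, u - \varepsilon\, u\,(1-u) \,\right] \mathrm{d}t,
\qquad \varepsilon \in [0,1], \quad u \in [0,1].
\end{equation}
% \varepsilon = 1:  J_1 = \int u^2 \,\mathrm{d}t   (energy-optimal problem, continuous control)
% \varepsilon = 0:  J_0 = \int u   \,\mathrm{d}t   (fuel-optimal problem, discontinuous control)
```

Setting $\varepsilon = 1$ gives the integrand $u^2$, and setting $\varepsilon = 0$ gives $u$, so decreasing $\varepsilon$ from one to zero moves continuously from the energy-optimal to the fuel-optimal cost.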
Another important methodology to rapidly and accurately solve OCPs is the convexification technique, which transforms nonconvex problems into convex ones. Convex problems are easier to solve, and theoretical guarantees on solution convergence and computational efficiency are generally available. Convexification has been exploited widely within the aerospace community for fuel−optimal problems, involving (but not limited to) the landing on Mars [
24] and asteroids [
25], transfer trajectories between periodic orbits in the cislunar space [
26], cooperative rendezvous [
27], and interplanetary low−thrust trajectories using a subsequent optimization process [
28]. Finally, fuel−optimal and minimum time trajectories are linked and computed via convex optimization in [
29], where the authors first solve fuel−optimal trajectories in order to compute accurate minimum time trajectories. For further details about convex optimization for aerospace applications, the reader can refer to Ref. [
30].
Finally, smoothing techniques are based on the approximation of the sign function involved in the discontinuous control with smooth functions. In this case too, a smoothing parameter is slowly decreased through a continuation procedure so that the discontinuous control is recovered accurately. Many smoothing techniques have been proposed in the literature. As an example, a trigonometric−based regularization is employed in [
31] to study fuel−optimal trajectories that also include path constraints. The same kind of problem is addressed in Refs. [
32,
33,
34], where a hyperbolic tangent smoothing function is employed. The hyperbolic tangent is also the smoothing function adopted in this paper to approximate the discontinuous control.
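The following is a minimal sketch of hyperbolic tangent smoothing, not the implementation used in this work. It assumes a throttle in [0, 1] and the sign convention that thrust is on where the switching function S is positive. The sample values of S and the continuation schedule for the smoothing parameter `rho` are hypothetical.

```python
import numpy as np

def smoothed_throttle(S, rho):
    """Smooth approximation of the bang-bang throttle.

    Assumed convention: u -> 1 where S > 0 and u -> 0 where S < 0 as rho -> 0.
    """
    return 0.5 * (1.0 + np.tanh(S / rho))

# Hypothetical switching-function samples (illustrative, not from the paper)
S = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])

# Hypothetical continuation schedule: rho decreased on a logarithmic scale
for rho in np.logspace(0, -3, 4):
    print(f"rho={rho:.0e}", np.round(smoothed_throttle(S, rho), 3))
```

As `rho` decreases, the throttle values near S = 0 move toward 0 or 1 and the profile approaches the discontinuous control, while remaining differentiable for every nonzero `rho`.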
This work delves into fuel−optimal trajectories with fixed time of flight, tackled through the combination of indirect methods and a machine learning approach known as Pontryagin Neural Networks (PoNNs), a specialized framework within Physics-Informed Neural Networks (PINNs). As defined in [
35], PoNNs are a subset of PINNs specifically trained to learn optimal control actions conforming to the Pontryagin Minimum Principle (PMP). By leveraging PoNNs, solutions to the Two−Point Boundary Value Problems (TPBVPs) associated with fuel−optimal scenarios are learned in terms of states and costates. Notably, the PINN framework utilized in PoNNs is the Extreme Theory of Functional Connections (X−TFC), which combines the functional interpolation technique known as the Theory of Functional Connections (TFC), pioneered by Mortari [
36], and the Extreme Learning Machine (ELM) [
37]. According to TFC, latent solutions are represented by constrained expressions (CEs), comprising a free function and a functional component that consistently satisfies boundary conditions analytically. The analytical fulfillment of these boundary constraints offers a significant advantage in solving TPBVPs.
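A minimal sketch of a constrained expression for the simplest case, a scalar function with one value prescribed at each end of the time interval, is given below. The function name `constrained_expression` and the free function `g` are hypothetical and chosen only for illustration.

```python
import numpy as np

def constrained_expression(g, t, y0, yf, t0=0.0, tf=1.0):
    """Return y(t) that equals y0 at t0 and yf at tf for any free function g."""
    w0 = (tf - t) / (tf - t0)   # weight equal to 1 at t0 and 0 at tf
    wf = (t - t0) / (tf - t0)   # weight equal to 0 at t0 and 1 at tf
    return g(t) + w0 * (y0 - g(t0)) + wf * (yf - g(tf))

# Arbitrary free function chosen only for illustration
g = lambda t: np.sin(3.0 * t) + t**2

t = np.linspace(0.0, 1.0, 11)
y = constrained_expression(g, t, y0=2.0, yf=-1.0)
print(y[0], y[-1])  # boundary values hold regardless of g
```

Since the boundary values are satisfied for any choice of `g`, training only has to reduce the residuals of the differential equations inside the domain.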
Within X−TFC, the free function is represented by a shallow neural network (NN) trained via ELM. ELM is a training algorithm in which the input weights and biases are randomly sampled from continuous distributions and are not tuned during training. Consequently, the only parameters adjusted during training are the output weights. Typically, least−squares (LS) methods are employed within ELM for training, with proofs of convergence provided in [
37]. In this work, Chebyshev Neural Networks (ChNN) are utilized as the free function [
38]. X−TFC emerges as a versatile tool applicable to various domains, ranging from data−driven parameter discovery of Ordinary Differential Equations (ODEs) [
39] to solving zero−finding problems by identifying promising homotopy paths [
40]. Furthermore, owing to their efficacy in solving Two−Point Boundary Value Problems (TPBVPs), both TFC and X−TFC have been employed to solve optimal control problems (OCPs) in aerospace applications. For instance, they have been utilized in energy−optimal landing problems on small and large planetary bodies [
41,
42], energy−optimal circumnavigation trajectories around asteroids with collision avoidance [
43], energy−optimal relative motion problems [
44], optimal planar orbit transfers [
45], and intercept problems [
35]. However, in a previous work addressing fuel−optimal landing on large planetary bodies with constant gravity through TFC [
46], the number of control switches was known in advance, allowing for the problem to be simplified by explicitly segmenting the time domain into three parts. Conversely, in the present study, the number of switches for discontinuous control is not assumed to be known a priori. To the best of the authors’ knowledge, this study marks the first instance where PoNNs are employed to learn solutions of OCPs featuring discontinuous control in aerospace applications. Specifically, two distinct fuel−optimal problems are tackled to evaluate the proposed approach: a low−thrust interplanetary transfer from Earth to Mars orbit and a landing trajectory on Mars. For both scenarios, the obtained solutions are compared with other state−of−the−art methods, such as the shooting method and adaptive Gaussian quadrature collocation method.
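The ELM training idea can be sketched on a plain regression task. This is an illustration under stated assumptions rather than the X−TFC implementation: a hyperbolic tangent activation, input weights and biases drawn from a uniform distribution on [-1, 1], and a sine target are all choices made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(x, y, n_neurons=40):
    """Fit a single-hidden-layer network in the ELM fashion."""
    # Input weights and biases: sampled once and never trained
    w = rng.uniform(-1.0, 1.0, n_neurons)
    b = rng.uniform(-1.0, 1.0, n_neurons)
    H = np.tanh(np.outer(x, w) + b)  # hidden-layer output matrix
    # Output weights: the only trained parameters, from linear least squares
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return lambda xq: np.tanh(np.outer(xq, w) + b) @ beta

x = np.linspace(-1.0, 1.0, 200)
model = elm_fit(x, np.sin(np.pi * x))
max_err = np.max(np.abs(model(x) - np.sin(np.pi * x)))
print(f"max training error: {max_err:.2e}")
```

Because the hidden layer is fixed, the fit reduces to a single linear least-squares solve. When the unknowns enter the residuals nonlinearly, as in the TPBVPs considered here, the same output weights are instead found with an iterative least-squares scheme.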
Finally, the main contributions of this paper are the following: (1) the extension of the PoNN framework to OCPs with discontinuous control, through its combination with hyperbolic tangent smoothing and a continuation procedure; (2) the autonomous detection of the number of control switches and of their temporal locations, without any a priori knowledge; (3) the convergence of the solution from random initial guesses of the output weights of the PoNN; (4) the availability, through the X−TFC constrained expressions (CEs), of an analytical approximation of the optimal trajectory and control, which removes the need for interpolation at points not encountered during training and avoids the associated loss of accuracy.
This paper is organized as follows.
Section 2 is dedicated to a brief recall of the indirect method and the presentation of the proposed strategy to solve OCPs via PoNNs. Afterwards, fuel−optimal problems are formulated for both the interplanetary transfer and the landing trajectory, and the latent solutions approximation via CEs is provided.
Section 4 and
Section 5 report the obtained results and related discussions, respectively.
Section 6 provides concluding remarks.
4. Results
The methodology presented in the preceding sections is applied here to two distinct problems: a fuel−optimal low−thrust interplanetary rendezvous trajectory from Earth to Mars, and a fuel−optimal landing on Mars. For the landing problem on a large planetary body under the assumption of constant gravity (i.e., short times of flight), the number of control switches is known a priori and the control follows a fixed switching structure. This happens because the time derivative of the switching function, $\dot{S}$, is proved to change sign at most once, leading to at most two changes of sign of $S$. Therefore, the control profile presents at most three subarcs with two switches. On the other hand, under a central gravity field assumption (e.g., the interplanetary transfer), $\dot{S}$ can change sign more than once, possibly leading to more than three subarcs in the control profile. For more information and detailed theoretical proofs, the reader can refer to Ref. [
52]. For both problems, the time of flight is fixed. The initial and final conditions of the two problems, together with the parameters associated with the spacecraft engine, are reported in
Table 1. For the interplanetary problem, the same values used in Ref. [
31] are employed, while for the landing, the same parameters as in Ref. [
46] are used. Furthermore,
Table 1 also reports the reference values employed to make the problem dimensionless (one each for distance, time, and mass). All the other quantities can be made dimensionless through a combination of these three reference values. Finally, the X−TFC parameters are shown in
Table 2. Note that when Chebyshev Neural Networks are employed, the input weights and biases are constants equal to one and zero, respectively. All the simulations have been implemented in MATLAB R2023b and run on a PC with an Intel Core i7−9700 CPU and 64 GB of RAM.
4.1. Interplanetary Trajectory
In tackling this problem, the smoothing parameter
has been progressively decreased from 1 to
over 20 iterations, employing a logarithmically spaced vector. The initial guesses for the unknown coefficients of the constrained expressions have been randomly sampled from a standard uniform distribution
for [
]. For the mass and mass costate,
and
have been computed so that the first guess solutions are a constant value equal to
for
m and equal to one for
. The PoNNs are trained with the Levenberg−Marquardt algorithm. The results regarding the switching function
S and the throttle function are shown in
Figure 2 together with a comparison with the shooting method. As can be seen, the PoNNs locate the switching points in time reasonably well. However, the control still differs from the one obtained via the shooting technique, especially at the discontinuities. A final refinement step is therefore carried out to increase the accuracy: once the switch intervals are identified, 100 discretization points per switch interval are added to
n, and the number of neurons
L is increased to 80. The X−TFC framework is then run again, and the results after the refinement of the solution are shown in
Figure 3,
Figure 4 and
Figure 5. In particular, the interplanetary transfer and the direction of the thrust are illustrated in
Figure 3 and
Figure 4, respectively. The switching function
S, the control, the mass and mass costate are shown in
Figure 5 together with the comparison with the shooting method. The refined solution is now very close to the one obtained via the shooting method, apart from a small difference at the last switch of the control thrust. In particular, the fuel consumption computed via PoNNs is 396.85 kg, which differs by 0.79 kg from the minimum solution obtained via the shooting method (396.06 kg). This value is also consistent with the one reported in Ref. [
31]. The results obtained with the current simulation are very promising, since the PINN is able to learn the solution together with the correct number of control switches and the switching times without any previous knowledge. Moreover, this is achieved without splitting the time domain into multiple segments, as was done for a fuel−optimal landing on Mars solved via TFC in a previous work [
46].
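The refinement step can be sketched as follows. The interpretation of a "switch interval" as a grid interval over which the switching function changes sign is an assumption made here, as are the function name `refine_grid` and the sample switching function. Only the count of 100 added points per interval is taken from the text.

```python
import numpy as np

def refine_grid(t, S, n_add=100):
    """Add n_add points inside every grid interval where S changes sign."""
    idx = np.where(np.sign(S[:-1]) * np.sign(S[1:]) < 0)[0]
    extra = [np.linspace(t[i], t[i + 1], n_add + 2)[1:-1] for i in idx]
    return np.sort(np.concatenate([t, *extra])), idx

# Hypothetical switching function, used only to exercise the routine
t = np.linspace(0.0, 1.0, 21)
S = np.sin(2.0 * np.pi * t) - 0.3

t_refined, switch_idx = refine_grid(t, S)
print(len(t), "->", len(t_refined), "points;", len(switch_idx), "switch intervals")
```

Concentrating the new points around the detected switches increases the resolution where the control changes most rapidly, without a priori knowledge of how many switches exist.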
4.2. Landing Trajectory
For this problem, the smoothing parameter
has been decreased from 1 to
over 20 iterations using a logarithmically spaced vector. The initial guesses of the unknown coefficients of the constrained expressions have been set randomly for [
]. For the mass and mass costate,
and
have been computed so that the first−guess solutions are a constant value equal to
for
m and equal to 1 for
. The training algorithm employed is the trust−region−reflective algorithm. The time history of the control thrust is shown in
Figure 6 together with a comparison carried out via the adaptive Gaussian quadrature collocation method, as implemented in GPOPS−II [
53]. In this case too, the PoNNs locate the switching points in time reasonably well. The solution is then refined to increase the accuracy: once the switch intervals are identified, 100 discretization points per switch interval are added to
n, whereas the number of neurons
L is left unchanged. The PoNN framework is then run again, and the results after the refinement of the solution are shown in
Figure 7,
Figure 8 and
Figure 9. In particular, the landing trajectory and the direction of the thrust are illustrated in
Figure 7 and
Figure 8, respectively. The switching function
S, the control, the mass and mass costate are shown in
Figure 9, together with the comparison with GPOPS−II. The refined solution is now very close to the one obtained via GPOPS−II, apart from a small difference in the switching points. In particular, the fuel consumption computed via PoNNs is 268.30 kg, which differs by 0.37 kg from the minimum solution obtained via GPOPS−II (267.93 kg). This value is also similar to the one reported in Ref. [
46], where the fuel consumption is shown to be 275.205 kg. Nevertheless, the approach pursued in this work requires neither a priori knowledge of the number of switches nor a split of the domain to handle the discontinuities in the control, as done in [
46]. Another advantage of the proposed approach with respect to Ref. [
46] is that an additional optimizer to obtain the switching times, represented by an outer loop on top of the TFC solver, is no longer required, since the entire solution is computed with the PoNN framework alone. However, the method based on the split domain is useful to improve accuracy at the switches. In fact, with the proposed approach, the loss functions exhibit jumps at the discontinuities. This indicates that the two methods can be used in combination: the current one to obtain a first−guess solution with the number of switches and their time locations, and the one based on domain splitting to improve accuracy and performance.