A Hard-Constrained PMP-Based Warm-Start Framework for Nonlinear Optimal Control Using Physics-Informed Learning

Du, Zhuo; Wang, Xu

doi:10.3390/math14101614


by Zhuo Du ¹,² and Xu Wang ¹,²,*

¹ School of Mathematics and Statistics, Ningxia University, Yinchuan 750021, China

² Ningxia Basic Science Research Center of Mathematics, Ningxia University, Yinchuan 750021, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(10), 1614; https://doi.org/10.3390/math14101614

Submission received: 8 April 2026 / Revised: 4 May 2026 / Accepted: 6 May 2026 / Published: 9 May 2026

(This article belongs to the Section E: Applied Mathematics)


Abstract

Indirect methods based on Pontryagin’s Maximum Principle (PMP) offer theoretical rigor for nonlinear optimal control but suffer from extreme sensitivity to costate initialization. Physics-Informed Neural Networks (PINNs) provide a promising data-free approach to globally approximate trajectories and overcome this initialization barrier. However, they often lack strict numerical precision due to their reliance on soft penalty constraints. To bridge this gap, this paper proposes a hybrid framework that synergizes the global search capability of a structurally modified PINN with the rigorous precision of high-order Chebyshev–Gauss–Lobatto (CGL) spectral discretization. Within this framework, we first introduce a novel neural architecture that enforces the PMP stationarity condition as a hard constraint by analytically eliminating control inputs via costates, thereby reducing the optimization search space and ensuring strict optimality during training. The neural-generated trajectories subsequently provide a high-quality warm start for a CGL pseudospectral solver, transforming the problem into a single-shot convex quadratic programming formulation. Numerical experiments on the Van der Pol oscillator and elliptic PDE optimal control problems demonstrate that this strategy effectively mitigates the initialization sensitivity of indirect methods. The results show that the proposed method achieves superior accuracy and convergence stability compared to standalone PINN solvers, providing a robust initialization-free approach for complex nonlinear optimal control.

Keywords:

nonlinear optimal control; PMP; PINN; hard constraints; warm start; pseudospectral method

MSC:

93C10

1. Introduction

Nonlinear optimal control plays a central role in providing a mathematical framework for decision-making in complex dynamical systems, and has been extensively applied in a wide range of engineering domains, including spacecraft trajectory optimization [1], robotics [2,3], and process control in energy and manufacturing systems [4,5].

Numerical approaches for solving nonlinear optimal control problems are commonly classified into two major categories: direct methods and indirect methods [6].

Direct methods transcribe the original continuous-time optimal control problem into a finite-dimensional nonlinear programming (NLP) problem. For example, Betts [7] established foundational surveys on trajectory optimization transcription, while Garg et al. [8] generalized this into a unified framework for direct collocation. The resulting NLP problems are then solved using advanced programming tools. Benefiting from advances in solution algorithms and discretization theory, direct collocation nonlinear programming methods have developed rapidly in recent years [9]. In particular, pseudospectral methods have attracted significant attention due to their concise formulation and spectral (exponential) convergence properties for smooth solutions. Foundational works have established various global polynomial interpolation schemes at Gaussian quadrature points. For instance, Elnagar et al. [10] introduced the pseudospectral Legendre method, Ross and Fahroo [11] formalized the costate mapping for these discretizations, and Garg et al. [12] developed the Radau pseudospectral approach for handling optimal control problems. However, as global interpolation schemes, classical pseudospectral methods suffer from deteriorated approximation accuracy when the optimal solution exhibits non-smooth features or switching structures. This limitation has motivated the development of mesh refinement strategies to balance local resolution and global accuracy. Notably, Ross and Fahroo [13] introduced pseudospectral knotting methods specifically for handling nonsmooth dynamics, while recent work by Zhang et al. [14] demonstrated the efficacy of hp-adaptive strategies in highly dynamic pursuit–evasion games.

In contrast to direct methods, indirect methods—typically based on Pontryagin’s Maximum Principle (PMP)—transform the optimal control problem into a Two-Point Boundary Value Problem (TPBVP). To solve the resulting TPBVP, researchers have developed diverse numerical techniques. In the realm of shooting methods, approaches range from the successive linear shooting proposed by Filipov et al. [15] and generating functions by Park et al. [16], to higher-order finite-difference schemes [17] and double shooting algorithms for multibody dynamics [18]. Alternatively, quasilinearization algorithms have been extensively studied, with significant contributions including the Bellman–Kalaba–Lakshmikantham method [19], iterative Chebyshev approximations [20], and successive convexification frameworks for systems with proportional delays [21]. Despite their theoretical elegance, indirect methods remain underutilized in practice primarily due to their extreme sensitivity to costate initialization. As highlighted by classical studies on modified quasilinearization [22] and more recent exact penalty formulations [23], minor deviations in the initial costate guess often lead to numerical divergence, especially for long-horizon or stiff systems.

To alleviate the initialization challenges associated with indirect methods, Physics-Informed Neural Networks (PINNs) have recently emerged as a promising data-free learning paradigm for solving differential equation-constrained optimization problems [24]. By embedding governing equations directly into the training loss via automatic differentiation, PINNs are capable of learning physically consistent state trajectories without explicit discretization or costate initialization [25]. Recent advancements, such as Physics-Informed Deep Operator Control [26,27] and the Physics-Informed Neural Nets for Control framework [28], have further expanded PINNs’ utility in handling nonlinearities. Nevertheless, standalone PINNs often struggle to strictly guarantee dynamic constraint satisfaction due to the soft penalty nature of their loss functions.

From this perspective, PINNs offer a natural mechanism for generating high-quality, physically consistent trajectory estimates, which can be leveraged to mitigate the costate initialization sensitivity inherent in classical indirect methods.

To address the intrinsic sensitivity of indirect methods to costate initialization while preserving their theoretical rigor, we propose a hybrid PMP–PINN warm-start framework that systematically integrates physics-informed learning with high-order pseudospectral discretization. A dedicated PINN architecture is constructed to jointly approximate state and costate trajectories, providing high-quality initial guesses that substantially improve the robustness of subsequent indirect optimization.

In contrast to existing PMP-PINN formulations that treat the control variable as a free network output enforced through penalty-based constraints, our framework analytically eliminates the control variable via the stationarity condition. This design enforces the PMP optimality condition as a hard architectural constraint, guaranteeing its exact satisfaction throughout the training process. The stationarity condition therefore holds at the functional level for every parameter value, rather than only asymptotically as the loss is driven toward zero. To explicitly highlight how these structural advantages position our proposed framework relative to prior works, Table 1 provides a systematic comparison of existing optimal control methodologies.

The main contributions of this work can be summarized as follows:

A PMP-consistent PINN architecture is developed, wherein the stationarity condition is enforced as a hard constraint to structurally eliminate the control variable. On this basis, a physics-informed warm-start strategy is established to significantly alleviate the costate initialization sensitivity of classical indirect methods.
The neural-generated trajectories are integrated with a Chebyshev–Gauss–Lobatto (CGL) pseudospectral discretization, enabling the optimization to be formulated as a single-shot convex quadratic programming problem. The superior accuracy and robustness of this hybrid framework are demonstrated through extensive numerical experiments on nonlinear ODE and elliptic PDE optimal control problems.

From a numerical optimal control perspective, the key novelty of this work lies in the structural elimination of control variables through the PMP stationarity condition, which fundamentally alters the feasible search manifold explored by physics-informed learning.

Scope and limitations: while the proposed hybrid framework significantly enhances the robustness and accuracy of indirect optimal control methods, its applicability is subject to the following boundaries:

Assumptions on differentiability: The framework fundamentally relies on the continuous formulation of Pontryagin’s Maximum Principle. Consequently, it requires the system dynamics and objective functions to be sufficiently differentiable to analytically derive the Hamiltonian and costate equations.
Problem classes handled: The current framework, coupled with CGL spectral refinement, is highly effective for problems yielding relatively smooth optimal trajectories. However, for systems exhibiting highly discontinuous “bang-bang” control or singular arcs, the spectral solver may experience Gibbs phenomena unless coupled with additional mesh-refinement strategies (e.g., hp-adaptive methods) during the online phase.
Computational overhead: A notable trade-off of this approach is the computational overhead associated with the offline phase. Training the hard-constrained PMP-PINN to accurately capture the global topology requires considerable GPU time and hyperparameter tuning. Thus, the framework is best suited for scenarios where substantial offline training is permissible to guarantee rapid, initialization-free online execution.
Problem dimensionality: Like most neural network-based PDE solvers, the proposed PINN architecture is susceptible to the “curse of dimensionality”. Applying this method to systems with exceedingly high-dimensional state spaces (e.g., complex multi-agent systems) would demand an exponentially larger number of collocation points and network capacity. This scalability challenge remains an open area for future research.

The organization of this work is as follows: In Section 2, we formally define the nonlinear optimal control problem. The proposed PMP-PINN architecture, which explicitly introduces the hard-constrained stationarity condition, is presented in Section 3. This is followed by Section 4 on the Chebyshev–Gauss–Lobatto pseudospectral discretization and the integration of the hybrid warm-start framework. Extensive numerical experiments demonstrating the accuracy and robustness of the proposed method are given in Section 5. Finally, in Section 6, we provide concluding remarks and outline directions for future research.

2. Problem Formulation

The problem we are addressing is to find the optimal control

u^{*} (t)

defined on the time interval

t \in [t_{0}, t_{f}]

that minimizes the following Lagrange cost function:

J = \frac{1}{2} \int_{t_{0}}^{t_{f}} (x^{T} Q x + u^{T} R u) d t,

(1)

subject to the system dynamics and initial conditions:

\dot{x} (t) = A x (t) + B u (t), x (t_{0}) = x_{0},

(2)

and terminal state constraints:

Φ (x (t_{f})) = 0,

(3)

where

x (t) \in ℝ^{n}

is the state vector,

u (t) \in ℝ^{m}

is the control input vector,

A \in ℝ^{n \times n}

and

B \in ℝ^{n \times m}

are the constant system matrices, and

x_{0}

is the initial state vector.

Remark 1.

Among the weight matrices in the cost functional,

Q \in ℝ^{n \times n}

is a symmetric positive semi-definite matrix (

Q ≽ 0

) penalizing state deviations, and

R \in ℝ^{m \times m}

is a symmetric positive definite matrix (

R ≻ 0

) penalizing control energy.

Remark 2.

It should be noted that while the system dynamics in Equation (2) are formulated as a linear time-invariant system to facilitate the analytical derivation of the single-shot convex quadratic programming formulation in Section 4, the proposed PMP-PINN warm-start framework inherently accommodates general nonlinear optimal control problems. Specifically, for general nonlinear dynamics

\dot{x} = f (x, u)

, the continuous vector field is directly incorporated into the residual evaluation of the physics-informed loss function via automatic differentiation. The validity of this generalization is subsequently corroborated by the non-convex optimal control experiments detailed in Section 5.

3. PMP-PINN Warm Start Mechanism

To construct a Physics-Informed Neural Network, one must first derive the Hamiltonian system for this optimal control problem. According to PMP, we define the Hamiltonian function:

H (x, u, λ) = \frac{1}{2} x^{T} Q x + \frac{1}{2} u^{T} R u + λ^{T} (A x + B u),

(4)

where

λ (t) \in ℝ^{n}

is the costate vector or adjoint variable.

The first-order necessary conditions for optimality are given by the following set of canonical equations.

State equation:

\dot{x} = \frac{\partial H}{\partial λ} = A x + B u .

(5)

Costate equation:

\dot{λ} = - \frac{\partial H}{\partial x} = - (Q x + A^{T} λ) .

(6)

Stationarity condition:

\frac{\partial H}{\partial u} = 0 \Rightarrow R u + B^{T} λ = 0 \Rightarrow u^{*} (t) = - R^{- 1} B^{T} λ (t) .

(7)

Transversality condition:

λ (t_{f}) = \frac{\partial Φ}{\partial x} |_{t_{f}} = 0 .

(8)

It should be explicitly noted that the derivation of this analytical control law strictly relies on the assumption that the control penalty matrix

R

is invertible. As defined in Remark 1,

R

is a symmetric positive definite matrix (

R ≻ 0

), which mathematically guarantees its invertibility and ensures the existence of a unique global minimum for the Hamiltonian with respect to

u

.

The core innovation of our framework lies in the specific architectural design that distinguishes between hard constraints and soft constraints.

We construct a fully connected deep neural network

N (t; θ)

with parameters

θ

, taking time

t

as input. Unlike standard PINNs that might output all variables

(x, u, λ)

, our network only outputs the state and costate approximations:

[\hat{x} (t), \hat{λ} (t)] = N (t; θ) .

(9)

(1): Hard constraint: stationarity condition

To strictly enforce physical consistency and reduce the optimization search space, we treat the control variable

u

via a hard constraint mechanism. According to Equation (7), the optimal control

u^{*}

has a rigorous algebraic relationship with the costate

λ

. We explicitly embed this relationship into the network’s computational graph:

\hat{u} (t) = - R^{- 1} B^{T} \hat{λ} (t) .

(10)

Unlike conventional PINN formulations where all variables are treated as free outputs, the control input in our framework is explicitly eliminated via the stationarity condition, thereby enforcing the PMP optimality condition as a hard constraint. This design ensures that the stationarity condition

\partial H / \partial u = 0

is mathematically guaranteed to be satisfied exactly at every training iteration, regardless of the training loss.
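The elimination in Equation (10) can be illustrated with a minimal NumPy sketch. The function names, the matrices `B` and `R`, and the random costate samples are illustrative assumptions, not the authors' implementation; the samples stand in for arbitrary network outputs. The point is that the stationarity residual from Equation (7) vanishes for any costate value, trained or not.

```python
import numpy as np

def control_from_costate(lam, R, B):
    """Eq. (10): u = -R^{-1} B^T lam, for lam of shape (n,) or (n, K)."""
    return -np.linalg.solve(R, B.T @ lam)

def dH_du(u, lam, R, B):
    """Stationarity residual dH/du = R u + B^T lam from Eq. (7)."""
    return R @ u + B.T @ lam

# Hypothetical data: n = 2 states, m = 1 control, 50 sample times.
rng = np.random.default_rng(0)
B = np.array([[0.0], [1.0]])
R = np.array([[2.0]])
lam_hat = rng.normal(size=(2, 50))   # stands in for untrained network output

u_hat = control_from_costate(lam_hat, R, B)
residual = np.abs(dH_du(u_hat, lam_hat, R, B)).max()
print(residual)  # zero up to floating-point round-off
```

Using `np.linalg.solve` rather than forming the inverse of `R` explicitly is a numerical design choice of this sketch; both realize the same algebraic map.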

Furthermore, in practical optimal control scenarios, the control variables are frequently subjected to physical inequality constraints, typically bounded by

u (t) \in [\underline{u}, \bar{u}]

. Under the PMP framework, the stationarity condition naturally accommodates these bounds through Pontryagin’s minimization, resulting in a piecewise continuous control law. To incorporate this into our architecture, we extend the unconstrained algebraic relationship in Equation (10) into a bounded saturation projection within the network’s computational graph:

\hat{u} (t) = clip (- R^{- 1} B^{T} \hat{λ} (t), \underline{u}, \bar{u}),

(11)

where the

clip (\cdot)

function enforces the hard saturation limit. While the

clip (\cdot)

function is technically non-differentiable exactly at the boundary points, in our numerical implementation, this saturation is strictly enforced via a standard nested min–max bounding operator. Modern automatic differentiation frameworks (e.g., PyTorch 2.7.0+cu128) natively support this operator by assigning valid sub-gradients at the exact points of non-smoothness.

It is worth discussing the impact of this non-smooth saturation function on the neural network’s backpropagation. Because the

clip (\cdot)

function fails to be differentiable only at the two bound values, its gradient is well defined almost everywhere, and a valid sub-gradient covers the remaining points. Backpropagation therefore remains well defined and stable, allowing the neural network to learn the bounded optimal control law without structural instability.
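The saturation in Equation (11) and its effect on gradients can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name and the sample costates are hypothetical, and the mask below picks the value 0 at the exact bound values, which is one admissible sub-gradient. Which value a particular automatic differentiation framework selects at those points is not stated in the text.

```python
import numpy as np

def bounded_control(lam, R, B, u_min, u_max):
    """Eq. (11): clip the unconstrained control -R^{-1} B^T lam to [u_min, u_max].

    Also returns d(clip)/d(u_free): 1 strictly inside the bounds, 0 elsewhere
    (the value 0 at the bounds is an assumed sub-gradient choice).
    """
    u_free = -np.linalg.solve(R, B.T @ lam)
    u = np.clip(u_free, u_min, u_max)
    pass_through = ((u_free > u_min) & (u_free < u_max)).astype(float)
    return u, pass_through

# Hypothetical costate samples for a two-state, single-input system
# with the bound |u| <= 0.75.
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
lam_hat = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                    [-2.0, -0.5, 0.0, 0.5, 2.0]])

u_hat, mask = bounded_control(lam_hat, R, B, -0.75, 0.75)
print(u_hat)  # saturated at +/-0.75 where the unconstrained value exceeds the bound
print(mask)   # zero gradient flows through the saturated samples
```

In the saturated samples the costate no longer influences the control, so the gradient of the loss with respect to the costate passes only through the other terms of the residuals.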

(2): Soft constraints: dynamics and boundaries

Conversely, the differential constraints and boundary conditions are treated as soft constraints, incorporated into the loss function

L (θ)

via penalty terms. The total loss is defined as:

L (θ) = w_{d y n} L_{d y n} + w_{a d j} L_{a d j} + w_{b c} L_{b c} .

(12)

The specific residual terms are derived using automatic differentiation.

The dynamics residual is:

L_{d y n} = \frac{1}{N_{c}} \sum_{i = 1}^{N_{c}} {‖\frac{d \hat{x}}{d t} (t_{i}) - (A \hat{x} (t_{i}) - B R^{- 1} B^{T} \hat{λ} (t_{i}))‖}^{2} .

(13)

The adjoint residual is:

L_{a d j} = \frac{1}{N_{c}} \sum_{i = 1}^{N_{c}} {‖\frac{d \hat{λ}}{d t} (t_{i}) + (Q \hat{x} (t_{i}) + A^{T} \hat{λ} (t_{i}))‖}^{2} .

(14)

The boundary condition residual is:

L_{b c} = ‖ \hat{x} (t_{0}) - x_{0} ‖^{2} + ‖ \hat{λ} (t_{f}) - 0 ‖^{2},

(15)

where

\{t_{i}\}_{i = 1}^{N_{c}}

represents the set of collocation points. This hybrid constraint formulation ensures that while the differential equations are approximated via gradient descent, the coupling between control and costate remains rigid and physically exact.
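The residuals in Equations (13)–(15) can be evaluated as in the following NumPy sketch. Two assumptions are made for illustration: the time derivatives are passed in as arrays (in a PINN they would come from automatic differentiation), and the test trajectory is a hypothetical scalar problem with A = 0, B = Q = R = 1, t_0 = 0, t_f = 1, x_0 = 1, whose exact solution is x(t) = cosh(1 − t)/cosh(1) and λ(t) = sinh(1 − t)/cosh(1). This problem is not one of the paper's examples.

```python
import numpy as np

def pmp_losses(x, lam, dx, dlam, A, B, Q, R, x0):
    """Evaluate L_dyn, L_adj, L_bc of Eqs. (13)-(15).

    x, lam, dx, dlam have shape (n, Nc); columns are ordered from t_0 to t_f.
    """
    G = B @ np.linalg.solve(R, B.T)            # B R^{-1} B^T
    r_dyn = dx - (A @ x - G @ lam)             # Eq. (13) residual
    r_adj = dlam + (Q @ x + A.T @ lam)         # Eq. (14) residual
    L_dyn = np.mean(np.sum(r_dyn**2, axis=0))
    L_adj = np.mean(np.sum(r_adj**2, axis=0))
    L_bc = np.sum((x[:, 0] - x0)**2) + np.sum(lam[:, -1]**2)   # Eq. (15)
    return L_dyn, L_adj, L_bc

# Hypothetical scalar test problem with a closed-form solution.
A = np.array([[0.0]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
x0 = np.array([1.0])
t = np.linspace(0.0, 1.0, 101)[None, :]
x = np.cosh(1.0 - t) / np.cosh(1.0)
lam = np.sinh(1.0 - t) / np.cosh(1.0)
dx = -np.sinh(1.0 - t) / np.cosh(1.0)
dlam = -np.cosh(1.0 - t) / np.cosh(1.0)

exact_losses = pmp_losses(x, lam, dx, dlam, A, B, Q, R, x0)
perturbed_losses = pmp_losses(x + 0.1, lam, dx, dlam, A, B, Q, R, x0)
print(exact_losses)      # all three terms vanish up to round-off
print(perturbed_losses)  # the adjoint and boundary terms become positive
```

Note that the control never appears as a separate argument: it has already been eliminated through the stationarity condition, which is why the dynamics residual contains the costate directly.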

To address potential gradient pathologies caused by competing loss terms, this work adopts an adaptive weighting strategy based on multi-task learning, inspired by the AW-EL-PINNs framework [29]. Rather than assigning fixed heuristic weights, the minimization of the dynamics, adjoint, and boundary residuals is treated as distinct learning tasks. By introducing learnable weight parameters, the network dynamically balances these loss components throughout the training process.

Proposition 1 (Exact Satisfaction of Stationarity Condition).

Under the proposed architecture, the PMP stationarity condition is satisfied identically for all training iterations, independent of the optimization of the PINN loss function. This fundamentally distinguishes the proposed framework from penalty-based PINNs, where optimality conditions are only asymptotically satisfied.

4. Quasilinearization of the System

The proposed framework synergizes PMP-PINN’s global search with CGL’s local spectral precision. As illustrated in Figure 1, the process operates in two stages: First, the PMP-PINN (Stage 1) generates a physically consistent warm start via hard constraints, ensuring robustness to strong nonlinearities. These trajectories then initialize the CGL spectral solver (Stage 2). This architecture is particularly potent for linear–quadratic optimal control (LQOC) and linear PDE problems, enabling single-shot spectral solutions via a convex QP formulation. For general nonlinear systems, it mitigates initialization sensitivity, significantly accelerating iterative convergence.

Having outlined the framework’s workflow, this section establishes the mathematical foundations of the CGL method by formulating the time-domain mapping, defining the Lagrange interpolating basis functions, and deriving the corresponding spectral differentiation matrices alongside the Clenshaw–Curtis quadrature weights.

Chebyshev polynomials and their orthogonality properties are defined on the standard interval

τ \in [- 1, 1]

. To handle physical time

t \in [t_{0}, t_{f}]

, an affine mapping is introduced as:

t (τ) = \frac{t_{f} - t_{0}}{2} τ + \frac{t_{f} + t_{0}}{2} .

(16)

The corresponding inverse mapping is:

τ (t) = \frac{2}{t_{f} - t_{0}} t - \frac{t_{f} + t_{0}}{t_{f} - t_{0}} .

(17)

The chain rule relationship for the differential operator is defined as:

\frac{d}{d t} = \frac{d τ}{d t} \frac{d}{d τ} = \frac{2}{t_{f} - t_{0}} \frac{d}{d τ} .

(18)

In the following, let

S = \frac{2}{t_{f} - t_{0}}

be the time scaling factor.

CGL nodes are the extrema of the

N th

order Chebyshev polynomial

T_{N} (τ) = \cos (N \arccos τ)

, plus the interval endpoints. They are defined as:

τ_{j} = \cos (\frac{j π}{N}), j = 0, 1, \dots, N .

(19)

Note that

τ_{0} = 1

corresponds to physical time

t_{f}

, and

τ_{N} = - 1

corresponds to physical time

t_{0}

. This is opposite to the usual chronological order, so special attention must be paid to the physical meaning of indices when constructing matrices.
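The mapping of Equations (16)–(17) and the reversed node ordering of Equation (19) can be checked with a short NumPy sketch. The function names and the values N = 4, t_0 = 0, t_f = 5 are illustrative assumptions.

```python
import numpy as np

def cgl_nodes(N):
    """Eq. (19): tau_j = cos(j*pi/N), j = 0..N, so tau_0 = 1 and tau_N = -1."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def to_physical_time(tau, t0, tf):
    """Eq. (16): affine map from [-1, 1] to [t0, tf]."""
    return 0.5 * (tf - t0) * tau + 0.5 * (tf + t0)

def to_reference(t, t0, tf):
    """Eq. (17): inverse map from [t0, tf] to [-1, 1]."""
    return 2.0 * t / (tf - t0) - (tf + t0) / (tf - t0)

N, t0, tf = 4, 0.0, 5.0
tau = cgl_nodes(N)
t = to_physical_time(tau, t0, tf)
print(t)  # starts at t_f and decreases to t_0
```

Index 0 therefore carries terminal quantities and index N carries initial quantities, which is the convention used when the initial condition is imposed on the last component of each state vector.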

The Lagrange interpolating polynomial

P_{N} (τ)

for a function

x (τ)

at CGL nodes can be expressed as:

x (τ) \approx P_{N} (τ) = \sum_{j = 0}^{N} x (τ_{j}) l_{j} (τ),

(20)

where

l_{j} (τ)

denotes the interpolating basis functions:

l_{j} (τ) = \frac{(- 1)^{j + 1} (1 - τ^{2}) T_{N}' (τ)}{c_{j} N^{2} (τ - τ_{j})},

(21)

where the normalization constants

c_{j}

are specified as:

c_{j} = \begin{cases} 2, & j = 0, N \\ 1, & 1 \leq j \leq N - 1 \end{cases} .

(22)

The spectral differentiation matrix

D \in ℝ^{(N + 1) \times (N + 1)}

maps function values at nodes to derivative values at nodes, i.e.,

\dot{x} = D x

. The analytical expressions for its elements

D_{k j} = {\dot{l}}_{j} (τ_{k})

are derived as follows.

By applying trigonometric identities, the off-diagonal entries (

k \neq j

) simplify to:

D_{k j} = \frac{c_{k}}{c_{j}} \frac{{(- 1)}^{k + j}}{τ_{k} - τ_{j}} .

(23)

For the diagonal elements (

k = j

), the expressions for the interior nodes (

k \neq 0, N

) are given by:

D_{k k} = - \frac{τ_{k}}{2 (1 - τ_{k}^{2})}, k \neq 0, N .

(24)

Boundary entries:

D_{00} = \frac{2 N^{2} + 1}{6}, D_{N N} = - \frac{2 N^{2} + 1}{6} .

(25)

To improve numerical stability, the “Negative Sum Trick” is applied, utilizing the property that the derivative of a constant function is zero, computing diagonal elements via off-diagonal elements:

D_{k k} = - \sum_{\begin{matrix} j = 0 \\ j \neq k \end{matrix}}^{N} D_{k j} .

(26)

Furthermore, utilizing the centrosymmetric property

D_{k j} = - D_{N - k, N - j}

can further reduce computational errors.
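A minimal NumPy sketch of the construction in Equations (23)–(26) is given below. The off-diagonal entries follow Equation (23) and the diagonal is filled with the negative sum trick of Equation (26); the function name and the choice N = 8 are illustrative assumptions. The checks compare the result with the closed-form boundary entries of Equation (25) and with the centrosymmetry property.

```python
import numpy as np

def cgl_diff_matrix(N):
    """Chebyshev differentiation matrix on the CGL nodes tau_j = cos(j*pi/N)."""
    j = np.arange(N + 1)
    tau = np.cos(np.pi * j / N)
    c = np.ones(N + 1)
    c[[0, N]] = 2.0                      # Eq. (22)
    s = c * (-1.0) ** j                  # c_k * (-1)^k
    # Eq. (23): D_kj = (c_k / c_j) * (-1)^(k+j) / (tau_k - tau_j), k != j.
    # The identity added to the denominator only avoids 0/0 on the diagonal.
    D = np.outer(s, 1.0 / s) / (tau[:, None] - tau[None, :] + np.eye(N + 1))
    np.fill_diagonal(D, 0.0)
    # Eq. (26): negative sum trick, each row of D must sum to zero.
    D -= np.diag(D.sum(axis=1))
    return tau, D

N = 8
tau, D = cgl_diff_matrix(N)
deriv_error = np.abs(D @ tau**3 - 3.0 * tau**2).max()
print(deriv_error)          # a cubic is differentiated exactly up to round-off
print(D[0, 0], D[N, N])     # compare with +/-(2 N^2 + 1) / 6
```

Because the interpolant reproduces any polynomial of degree at most N, the cubic test function is differentiated without truncation error, so the printed error reflects round-off only.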

To discretize the integral cost function into an algebraic sum, the Clenshaw–Curtis quadrature method is adopted. This method integrates the interpolated function at CGL nodes, offering accuracy far superior to the trapezoidal rule for smooth functions, and is computable with

O (N \log N)

complexity.

\int_{- 1}^{1} f (τ) d τ \approx \sum_{j = 0}^{N} w_{j} f (τ_{j}) .

(27)

The explicit summation formula for weights

w_{j}

is derived based on the integration properties of Chebyshev polynomials:

w_{j} = \frac{c_{j}}{N} [1 - \sum_{k = 1}^{⌊N / 2⌋} \frac{b_{k}}{4 k^{2} - 1} \cos (\frac{2 k j π}{N})],

(28)

where the coefficients

b_{k}

are defined as:

b_{k} = \begin{cases} 1, & k = N / 2 \\ 2, & k < N / 2 \end{cases} .

(29)

Here, the factor

c_{j}

in Equation (28) is not the constant of Equation (22) but its reverse: it equals 1 at the endpoints (j = 0, N) and 2 at the interior nodes.
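The explicit sum in Equation (28) can be evaluated directly, as in the following NumPy sketch. It uses the endpoint factor 1 and interior factor 2 in front of the bracket, and an O(N²) loop rather than the FFT-based O(N log N) evaluation mentioned above; the function name is an illustrative assumption. For N = 2 the formula reduces to Simpson's rule, which gives a quick sanity check.

```python
import numpy as np

def clenshaw_curtis_weights(N):
    """Clenshaw-Curtis weights on the CGL nodes via the explicit sum in Eq. (28)."""
    j = np.arange(N + 1)
    factor = np.full(N + 1, 2.0)
    factor[[0, N]] = 1.0                      # 1 at the endpoints, 2 otherwise
    bracket = np.ones(N + 1)
    for k in range(1, N // 2 + 1):
        b_k = 1.0 if 2 * k == N else 2.0      # Eq. (29)
        bracket -= b_k / (4 * k**2 - 1) * np.cos(2 * k * j * np.pi / N)
    return factor / N * bracket

w2 = clenshaw_curtis_weights(2)
print(w2)  # Simpson's rule on [-1, 1]: 1/3, 4/3, 1/3

N = 8
tau = np.cos(np.pi * np.arange(N + 1) / N)
w = clenshaw_curtis_weights(N)
print(w.sum(), w @ tau**2)  # integrals of 1 and tau^2 over [-1, 1]: 2 and 2/3
```

All weights are positive, which is the property used later to argue that the discretized cost remains convex.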

By employing the aforementioned spectral discretization, the infinite-dimensional LQOC problem is reformulated into a finite-dimensional quadratic programming problem. Given the linearity of the system dynamics, the traditional iterative quasilinearization process is bypassed, allowing for a precise solution to be obtained through a single QP optimization.

To construct sparse and structurally clear QP matrices, we define the decision vector

Z

. Let

x_{i} (τ_{j})

denote the value of the

i

-th state variable at the

j

-th time node. We stack all variables in a “variable-first” manner, facilitating the expression of constraints using Kronecker products:

X^{(i)} = {[x_{i} (τ_{0}), x_{i} (τ_{1}), \dots, x_{i} (τ_{N})]}^{T} \in ℝ^{N + 1},

(30)

U^{(l)} = {[u_{l} (τ_{0}), u_{l} (τ_{1}), \dots, u_{l} (τ_{N})]}^{T} \in ℝ^{N + 1} .

(31)

The aggregate decision vector

Z \in ℝ^{(n + m) (N + 1)}

is defined as:

Z = {[{(X^{(1)})}^{T}, \dots, {(X^{(n)})}^{T}, {(U^{(1)})}^{T}, \dots, {(U^{(m)})}^{T}]}^{T} \in ℝ^{(n + m) (N + 1)} .

(32)

The original objective function is given by:

J = \frac{1}{2} \int_{t_{0}}^{t_{f}} (\sum_{i = 1}^{n} \sum_{k = 1}^{n} Q_{i k} x_{i} (t) x_{k} (t) + \sum_{p = 1}^{m} \sum_{q = 1}^{m} R_{p q} u_{p} (t) u_{q} (t)) d t .

(33)

Using the affine transformation coefficient and the diagonal weight matrix

W = diag (w_{0}, \dots, w_{N})

, the discretized algebraic form is:

J \approx \frac{1}{2} S^{- 1} (\sum_{i, k} Q_{i k} {(X^{(i)})}^{T} W X^{(k)} + \sum_{p, q} R_{p q} {(U^{(p)})}^{T} W U^{(q)}) .

(34)

This can be written in the standard QP objective form

\frac{1}{2} Z^{T} H Z

. Using the Kronecker product, the Hessian matrix

H

has the following block diagonal structure (assuming no coupling terms between states and controls):

H = S^{- 1} [\begin{matrix} Q \otimes W & 0 \\ 0 & R \otimes W \end{matrix}] .

(35)

Specifically,

(Q \otimes W)

is an

n (N + 1) \times n (N + 1)

matrix composed of

n \times n

blocks, each being

Q_{i k} W

. Since

Q ≽ 0

,

R ≻ 0

, and the diagonal entries of

W

are positive,

H

is positive semi-definite, so the QP is convex.

The system dynamics

\dot{x} (t) = A x (t) + B u (t)

must be satisfied at all

N + 1

collocation nodes:

{\dot{X}}^{(i)} = S D X^{(i)} .

(36)

The right-hand side (RHS) is a linear combination of the states and controls. For the set of equations corresponding to the

i

-th state variable across all collocation nodes:

S D X^{(i)} - \sum_{k = 1}^{n} A_{i k} X^{(k)} - \sum_{p = 1}^{m} B_{i p} U^{(p)} = 0 .

(37)

We need to assemble this system of equations into the form

A_{d y n} Z = 0

. The entire dynamic constraint matrix

A_{d y n} \in ℝ^{n (N + 1) \times (n + m) (N + 1)}

can be compactly represented as:

A_{d y n} = [(I_{n} \otimes S D) - (A \otimes I_{N + 1}) | - (B \otimes I_{N + 1})],

(38)

where the left block corresponds to state variables

X

, and the right block corresponds to control variables

U

.

The initial condition

x (t_{0}) = x_{0}

corresponds to the node

τ_{N} = - 1

. This means for each state

i

, the

N

-th component (0-indexed) of vector

X^{(i)}

must equal

x_{0, i}

. We define the selection vector

e_{N} = [0, \dots, 0, 1] \in ℝ^{N + 1}

. The initial condition constraint matrix

A_{i n i t} \in ℝ^{n \times (n + m) (N + 1)}

is:

A_{i n i t} = [\begin{matrix} e_{N}^{T} & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & e_{N}^{T} & \dots & 0 & 0 & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & e_{N}^{T} & 0 & \dots & 0 \end{matrix}],

(39)

and the corresponding right-hand side vector is

B_{i n i t} = x_{0}

.

Combining the above derivations, the LQOC problem is transformed into the following standard convex quadratic programming problem:

\begin{aligned} \min_{Z} \quad & \frac{1}{2} Z^{T} H Z \\ \text{s.t.} \quad & \begin{bmatrix} A_{d y n} \\ A_{i n i t} \end{bmatrix} Z = \begin{bmatrix} 0_{n (N + 1)} \\ x_{0} \end{bmatrix} . \end{aligned}

(40)

Before summarizing the overall algorithm, it is essential to clarify the interface mechanism between the soft-constrained Stage 1 (PINN) and the hard-constrained Stage 2 (CGL solver). To address potential feasibility conflicts, we deliberately avoid explicit geometric projections or artificial smoothing, choosing instead to feed the raw PINN trajectories directly into the NLP solver. Mathematically, within the framework of interior-point methods, a minor initial boundary violation merely manifests as a non-zero residual on the right-hand side of the KKT system, which does not compromise the structural rank of the constraint Jacobian. Since the PINN-generated guess is already in the immediate vicinity of the true optimal manifold, the solver naturally computes a stable Newton step to absorb this slight infeasibility during the very first iteration. By avoiding such artificial distortions, this direct-feed strategy preserves the physical dynamics captured by the neural network.

Ultimately, this seamless transition underscores how the proposed methodology synergizes the global search capabilities of neural networks with the rigorous precision of spectral-based optimization. By leveraging a neural-generated warm start to initialize the subsequent interior-point solver, the framework effectively converts the soft penalty constraints inherent in standalone PINNs into hard optimality conditions. This hybrid architecture not only ensures spectral-level accuracy and strict physical consistency but also successfully surmounts the convergence and precision bottlenecks typical of purely neural approaches.

5. Computational Results

Before presenting the specific numerical examples, it is important to clarify that the high-density CGL pseudospectral method, employed as the exact reference solution in the following evaluations, effectively represents the performance capability of the mainstream direct and indirect NLP approaches discussed in the introduction.

Example 1 (Van der Pol Oscillator).

In this section, the well-known Van der Pol oscillator problem is considered. The system state equations and boundary conditions are defined as follows:

\begin{cases} {\dot{x}}_{1} (t) = x_{2} (t) \\ {\dot{x}}_{2} (t) = - x_{1} (t) + x_{2} (t) (1 - x_{1}^{2} (t)) + u (t) \end{cases} .

(41)

The optimization objective is to minimize the following Lagrange-type performance index subject to the control constraint

| u (t) | \leq 0.75

:

J = \int_{0}^{5} \frac{1}{2} (x_{1}^{2} (t) + x_{2}^{2} (t) + u^{2} (t)) d t .

(42)

The boundary conditions are set to define a transition from the initial state

x_{0} = {[1, 0]}^{⊤}

to the target state

x_{f} = {[- 1, 0]}^{⊤}

.

According to Fahroo and Ross [30], for systems with infinitely differentiable dynamics such as the Van der Pol oscillator, the error of the Chebyshev pseudospectral approximation is not merely convergent, but exhibits exponential (spectral) convergence. Therefore, we adopt

N = 150

, at which point the truncation error is strictly bounded by

O (c^{- N})

(where

c > 1

), ensuring that the numerical solution has fully converged.
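The exponential decay invoked here can be observed on a simple smooth function. The sketch below differentiates sin(3τ) with the CGL differentiation matrix for increasing N; the test function, the function name, and the values of N are illustrative assumptions and are unrelated to the Van der Pol reference computation with N = 150.

```python
import numpy as np

def cgl_diff_matrix(N):
    """Chebyshev differentiation matrix on the CGL nodes (negative sum trick)."""
    j = np.arange(N + 1)
    tau = np.cos(np.pi * j / N)
    c = np.ones(N + 1)
    c[[0, N]] = 2.0
    s = c * (-1.0) ** j
    D = np.outer(s, 1.0 / s) / (tau[:, None] - tau[None, :] + np.eye(N + 1))
    np.fill_diagonal(D, 0.0)
    D -= np.diag(D.sum(axis=1))
    return tau, D

# Maximum nodal error of the spectral derivative of sin(3*tau).
errors = {}
for N in (8, 16, 24):
    tau, D = cgl_diff_matrix(N)
    errors[N] = np.abs(D @ np.sin(3.0 * tau) - 3.0 * np.cos(3.0 * tau)).max()
    print(N, errors[N])
```

The error drops by several orders of magnitude each time N increases by eight until it reaches the round-off floor, which is the qualitative behavior a bound of the form c^{-N} predicts for analytic data.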

In this evaluation, the spectral method with

N = 150

collocation points is used to generate the exact high-precision reference solution. Figure 2 presents a comprehensive comparison of the optimal trajectories produced by the standalone PINN, the proposed hybrid method, and this reference solution. Figure 2a illustrates the evolutionary trajectories of the state variables

x_{1}

and

x_{2}

. While the standalone PINN successfully captures the macroscopic dynamic modes, its performance deteriorates significantly at the boundaries, struggling to precisely satisfy the terminal constraints. In contrast, the proposed hybrid trajectory perfectly overlaps with the exact spectral solution, strictly anchoring to the target terminal state (see Table 2).

While the offline training phase introduces additional computational overhead for well-behaved problems like the Van der Pol oscillator, this baseline example primarily serves to validate the numerical fidelity of the proposed framework. It demonstrates that the hard-constrained architecture successfully corrects the inherent errors of standalone PINNs and exactly recovers the spectral precision.

Furthermore, the evaluation of computational efficiency highlights a practical offline–online trade-off (Table 2). The proposed hybrid method requires roughly 97.6 s of offline GPU pre-training, after which the online solve takes 59.54 ms and 13 iterations, compared with 73.93 ms and 16 iterations for the cold-started pseudospectral solver. While the offline cost is non-trivial, it shifts the burden of global exploration away from the online execution phase. In contrast to standalone direct methods, which can suffer from unpredictable computation times or divergence when trapped in local minima online, our framework restricts the online NLP solver to local refinement.

Figure 3 compares the dynamics equation residuals of the standalone PINN, the proposed hybrid method, and the high-density spectral reference over the entire time domain on a logarithmic scale. The standalone PINN (dashed line) cannot reduce the dynamic residuals below an error floor of approximately $O(10^{-2})$, limited by the inherent optimization challenges of competing soft penalty terms. In contrast, the proposed hybrid framework drives the dynamics violation down to the $O(10^{-13}) \sim O(10^{-15})$ range, approaching the machine precision limit. This improvement demonstrates that using the hard-constrained PINN as a warm start allows the CGL pseudospectral method to realize its exponential convergence, achieving an accuracy comparable to the computationally expensive high-density reference.

To validate the necessity of the proposed hard-constrained framework, we conducted a comprehensive ablation study comparing three initialization strategies: a standard PINN without PMP guidance, a PMP-based PINN with soft constraints, and our proposed hard-constrained PMP-PINN. The quantitative results are summarized in Table 3.

As shown in Table 3, the standard PINN achieves negligible ODE residuals via inverse dynamics but fails to satisfy boundary conditions ($L_{BC} \approx 2.09$) or control limits, rendering the trajectory infeasible. The soft-PMP approach reduces boundary errors but retains algebraic inconsistencies in stationarity ($\sim 10^{-2}$), creating gradient conflicts for downstream solvers. In contrast, the proposed hard-PMP framework structurally enforces stationarity and path constraints to machine precision, while achieving a boundary accuracy three orders of magnitude superior to the soft-constrained baseline. This confirms that structural encoding of KKT conditions is essential for generating strictly feasible and optimal warm starts.

Example 2 (1D Elliptic Optimal Control).

In this example, we consider a 1D elliptic optimal control problem:

\begin{cases} -y''(x) = u(x) + f(x), & x \in [-1, 1], \\ y(-1) = 0, \quad y(1) = 0. \end{cases}

(43)

The optimization objective is to minimize the following performance index:

J = \int_{- 1}^{1} \frac{1}{2} (y^{2} (x) + u^{2} (x)) d x .

(44)

where the source term $f(x)$ is selected so that the exact solutions are $y = \pi^{2} \sin(\pi x)$ and $u = -\sin(\pi x)$.
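The source term is not written out here. As a worked check under the optimality conditions derived later in this example (Equations (46) to (48)), the stated exact pair determines the adjoint and the source term as follows; this is a reconstruction from those equations, not a formula quoted from the paper.

```latex
% Assumes u^{*} = -\lambda (stationarity, Eq. (48)) and -\lambda'' - y = 0 (adjoint, Eq. (47)).
\lambda(x) = -u(x) = \sin(\pi x), \qquad
y(x) = -\lambda''(x) = \pi^{2} \sin(\pi x),
\\
f(x) = -y''(x) - u(x) = \pi^{4} \sin(\pi x) + \sin(\pi x) = (\pi^{4} + 1) \sin(\pi x).
```

Both $y$ and $\lambda$ vanish at $x = \pm 1$, so the pair is consistent with the homogeneous boundary conditions.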

Unlike the ODE systems discussed in Section 3, the optimal control of PDEs involves the calculus of variations in infinite-dimensional function spaces. To ensure theoretical rigor and explicitly derive the stationarity condition for this elliptic PDE, we construct the Lagrangian functional $L(y, u, \lambda)$ by appending the PDE constraint to the cost functional using the adjoint variable $\lambda(x)$:

L (y, u, λ) = \int_{- 1}^{1} [\frac{1}{2} (y^{2} (x) + u^{2} (x)) + λ (x) (y^{″} (x) + u (x) + f (x))] d x .

(45)

By taking the Fréchet derivative of the Lagrangian with respect to the state $y$ and the control $u$, and applying integration by parts with the state boundary conditions $y(\pm 1) = 0$, we obtain the first-order necessary optimality conditions (KKT conditions) for the PDE system.

The state equation is:

- y^{″} (x) = u (x) + f (x), y (\pm 1) = 0 .

(46)

The adjoint equation is:

- λ^{″} (x) - y (x) = 0, λ (\pm 1) = 0 .

(47)

The stationarity condition is:

u (x) + λ (x) = 0 \Rightarrow u^{*} (x) = - λ (x) .

(48)
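Because the conditions (46) to (48) are linear, eliminating the control via $u = -\lambda$ leaves a single linear system in $(y, \lambda)$. The sketch below solves it by Chebyshev collocation and compares against the exact pair. It assumes the source term $f(x) = (\pi^{4} + 1) \sin(\pi x)$, which follows from substituting the stated exact solutions into (46); the function names and the choice $N = 32$ are illustrative.

```python
import numpy as np


def cheb(N):
    """CGL nodes and the standard Chebyshev differentiation matrix."""
    j = np.arange(N + 1)
    x = np.cos(np.pi * j / N)
    c = np.ones(N + 1)
    c[0] = c[-1] = 2.0
    c = c * (-1.0) ** j
    dX = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    return D - np.diag(D.sum(axis=1)), x


def solve_elliptic_kkt(N=32):
    """Solve the state/adjoint system with u = -lambda on the interior CGL nodes."""
    D, x = cheb(N)
    D2 = (D @ D)[1:-1, 1:-1]   # drop boundary rows/columns: y = lambda = 0 at x = +-1
    xi = x[1:-1]
    n = xi.size
    I = np.eye(n)
    f = (np.pi**4 + 1.0) * np.sin(np.pi * xi)  # assumed source term (see text above)
    # Unknowns z = [y; lambda]:
    #   -y''      + lambda = f   (state equation (46) with u = -lambda)
    #   -lambda'' - y      = 0   (adjoint equation (47))
    A = np.block([[-D2, I], [-I, -D2]])
    b = np.concatenate([f, np.zeros(n)])
    z = np.linalg.solve(A, b)
    y, lam = z[:n], z[n:]
    return xi, y, -lam         # control recovered from the stationarity condition (48)


xi, y, u = solve_elliptic_kkt()
err_y = np.max(np.abs(y - np.pi**2 * np.sin(np.pi * xi)))
err_u = np.max(np.abs(u + np.sin(np.pi * xi)))
```

The errors `err_y` and `err_u` are maximum absolute nodal errors, matching the $L_{\infty}$ metric used in the tables.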

Table 4 presents the quantitative comparison of the numerical simulations. Note: the metrics $\| x_{err} \|$ and $\epsilon_{dyn}$ are quantified using the $L_{\infty}$ norm (maximum absolute error). Figure 4 presents the comparison of the optimal state and control trajectories, while Figure 5 illustrates the absolute errors of the state solutions on a logarithmic scale.

Example 3 (Non-Convex Optimal Navigation in an Adverse Fluid Flow).

To further demonstrate the global search capabilities and the robustness of the proposed hybrid framework, we consider a highly non-convex 2D navigation problem in this example. An agent must navigate from an initial state $x(0) = [-2.0, 0.0]^{T}$ to a terminal state $x(t_{f}) = [2.0, 0.0]^{T}$ within $t_{f} = 4.0$ s, traversing a localized adverse Gaussian fluid flow.

The system dynamics are formulated as:

{\dot{x}}_{1} (t) = u_{1} (t) - 5.0 \exp (- \frac{x_{1}^{2} + x_{2}^{2}}{0.5}),

(49)

{\dot{x}}_{2} (t) = u_{2} (t) .

(50)

The objective is to minimize the performance index, which balances spatial deviation and required control effort:

J = \frac{1}{2} \int_{0}^{t_{f}} (0.1 x_{1}^{2} (t) + 0.1 x_{2}^{2} (t) + u_{1}^{2} (t) + u_{2}^{2} (t)) d t .

(51)

This problem presents a severe topological obstacle: The direct line-of-sight path passes squarely through the peak of the adverse wind field. This symmetrical setup creates a high-energy local minimum that frequently traps traditional gradient-based direct solvers.
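To make the obstacle concrete, the sketch below evaluates the adverse drift term of (49) along two candidate paths: the straight line between the boundary states and a hypothetical unit-amplitude sinusoidal detour. Both paths and the uniform progress in $x_{1}$ are chosen here only for illustration and are not optimal trajectories.

```python
import numpy as np


def headwind(x1, x2):
    """Magnitude of the adverse drift in (49): 5 exp(-(x1^2 + x2^2) / 0.5)."""
    return 5.0 * np.exp(-(x1**2 + x2**2) / 0.5)


t_f = 4.0
t = np.linspace(0.0, t_f, 401)
x1 = -2.0 + 4.0 * t / t_f  # uniform progress from x1 = -2 to x1 = 2 (assumption)

# Straight line-of-sight path: x2 = 0 throughout, so it crosses the wind core.
straight_peak = headwind(x1, np.zeros_like(t)).max()

# Hypothetical lateral detour x2 = sin(pi t / t_f), passing one unit from the core.
detour_peak = headwind(x1, np.sin(np.pi * t / t_f)).max()
```

The straight path meets the full drift of 5.0 at the origin, while the detour only sees the tail of the Gaussian, which is why a path around the core can be cheaper despite being longer.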

Following the proposed methodology, the PMP stationarity condition is enforced as a hard constraint ($u_{1} = -\lambda_{1}$, $u_{2} = -\lambda_{2}$) within the PINN architecture. To bypass the symmetrical trap during the initial training phase, a transient symmetry-breaking loss $L_{sym}$ is introduced, guiding the network to explore the broader geometric manifold before local refinement.

Specifically, the symmetry-breaking loss is defined as a Mean Squared Error (MSE) penalty that encourages the lateral coordinate $x_{2}$ to temporarily follow a predefined heuristic curve $x_{bias}(t)$, formulated as:

L_{s y m} = w_{s y m} \frac{1}{N_{c}} \sum_{i = 1}^{N_{c}} {({\hat{x}}_{2} (t_{i}) - x_{b i a s} (t_{i}))}^{2},

(52)

where $x_{bias}(t) = \sin(\pi t / t_{f})$ serves as an artificial spatial perturbation. To ensure that this auxiliary loss does not compromise the exactness of the physical optimality conditions in the final solution, we employ a hard-cutoff annealing strategy. During the initial global exploration phase, the weight $w_{sym}$ is set to 1.0, which pushes the trajectory out of the high-energy local minimum. Subsequently, $w_{sym}$ is dropped to 0.0, so that the remaining training is driven only by the physical loss terms.
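The symmetry-breaking term and its hard-cutoff schedule can be sketched as follows. The cutoff epoch is not specified in the text, so `cutoff_epoch` is a hypothetical parameter, and the function names are illustrative.

```python
import numpy as np


def x_bias(t, t_f):
    """Heuristic lateral curve x_bias(t) = sin(pi t / t_f)."""
    return np.sin(np.pi * t / t_f)


def sym_loss(x2_hat, t, t_f, w_sym):
    """Equation (52): weighted MSE between the predicted x2 and the bias curve."""
    return w_sym * np.mean((x2_hat - x_bias(t, t_f)) ** 2)


def w_sym_schedule(epoch, cutoff_epoch):
    """Hard-cutoff annealing: 1.0 during exploration, 0.0 afterwards."""
    return 1.0 if epoch < cutoff_epoch else 0.0


# Example: a straight-line prediction (x2 = 0) is penalized only before the cutoff.
t = np.linspace(0.0, 4.0, 401)
loss_early = sym_loss(np.zeros_like(t), t, 4.0, w_sym_schedule(10, cutoff_epoch=1000))
loss_late = sym_loss(np.zeros_like(t), t, 4.0, w_sym_schedule(2000, cutoff_epoch=1000))
```

Because the weight is exactly zero after the cutoff, the bias curve has no influence on the stationary point that training finally converges to.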

Figure 6 demonstrates the internal consistency validation of the PMP-PINN training results. In optimal control theory, for autonomous systems without explicit time dependence, the Hamiltonian $H$ must remain constant along the optimal trajectory, so its time derivative $\dot{H}$ must be identically zero. As illustrated in the figure, the fluctuations in the network-predicted $H$ values and its derivative are suppressed to within a small error margin, consistent with the stationarity condition ($\partial H / \partial u = 0$) enforced by the architecture. Furthermore, the predicted trajectories for both the costates and velocities closely align with the theoretical values. This output provides a reliable physical prior and initial guess for the subsequent Chebyshev pseudospectral collocation method.
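A consistency check of this kind can be sketched as follows, assuming the Hamiltonian $H = \frac{1}{2}(0.1 x_{1}^{2} + 0.1 x_{2}^{2} + u_{1}^{2} + u_{2}^{2}) + \lambda_{1}(u_{1} - w(x)) + \lambda_{2} u_{2}$ built from (49) to (51), with $w(x)$ the Gaussian drift. The helper names are illustrative; the finite-difference gradient is only a numerical check, not part of the authors' method.

```python
import numpy as np


def wind(x):
    """Adverse drift term of (49)."""
    return 5.0 * np.exp(-(x[0] ** 2 + x[1] ** 2) / 0.5)


def hamiltonian(x, u, lam):
    """H = running cost of (51) + lambda^T f(x, u) for the dynamics (49)-(50)."""
    cost = 0.5 * (0.1 * x[0] ** 2 + 0.1 * x[1] ** 2 + u[0] ** 2 + u[1] ** 2)
    f = np.array([u[0] - wind(x), u[1]])
    return cost + lam @ f


def dH_du(x, u, lam, h=1e-6):
    """Central finite-difference gradient of H with respect to u."""
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = h
        g[i] = (hamiltonian(x, u + e, lam) - hamiltonian(x, u - e, lam)) / (2 * h)
    return g


def hamiltonian_drift(H_values):
    """Spread max(H) - min(H) along a sampled trajectory; zero for an exact extremal."""
    H_values = np.asarray(H_values)
    return H_values.max() - H_values.min()


# With the hard constraint u = -lambda, the gradient of H in u vanishes.
x = np.array([0.3, -0.2])
lam = np.array([0.7, -1.1])
grad_at_hard_constraint = dH_du(x, -lam, lam)
```

Evaluating `hamiltonian` at each collocation point of a predicted trajectory and passing the values to `hamiltonian_drift` gives a single scalar diagnostic of how well $H$ is conserved.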

Figure 7 illustrates the trajectory planning results under the influence of a localized strong wind field. Figure 7a defines the initial state (blue triangle), target state (red circle), and central wind core. Figure 7b highlights the fundamental limitation of the standalone NLP solver. Relying on a naive linear cold start, it becomes trapped in a suboptimal, high-energy local minimum that directly crosses the maximum resistance of the wind core. In contrast, by leveraging the physical global topology predicted by the PMP-PINN as a warm-start guess, Figure 7c demonstrates that the proposed hybrid solver successfully circumvents the wind core to achieve a topologically superior and energy-efficient feasible path.

In summary, Example 3 highlights a scenario where conventional direct methods fall short. While the standalone pseudospectral solver handles the simpler optimal control problems of the previous examples, it fails when confronted with the topological obstacle introduced by the localized wind field: starting from a naive cold start, it converges to the high-energy local minimum that passes directly through the wind core. The proposed hybrid framework, by using the PMP-PINN to capture the correct global topology, supplies a physically consistent warm-start guess from which the spectral solver converges to the lower-cost path around the wind core.

Example 4 (Boost Converter with Constant Power Load).

To further demonstrate the superiority of the proposed framework in handling highly nonlinear and physically stiff engineering systems, we investigate the optimal control of a DC-DC boost converter feeding a constant power load (CPL). In modern microgrids and aerospace power systems, CPLs introduce severe instability due to their negative incremental impedance characteristics. The governing bilinear differential equations are given by:

{\dot{x}}_{1} = \frac{E}{L} - \frac{1 - u}{L} x_{2},

(53)

{\dot{x}}_{2} = \frac{1 - u}{C} x_{1} - \frac{P}{C x_{2}},

(54)

where $x_{1}$ and $x_{2}$ denote the inductor current and capacitor voltage, respectively. The control input $u \in [0, 1]$ represents the duty cycle of the switching device. The term $-P / (C x_{2})$ introduces strong non-convexity and a potential singularity as $x_{2} \to 0$, rendering traditional numerical solvers highly sensitive to initialization.

The objective is to drive the system from a perturbed initial state $x_{0} = [x_{1,0}, x_{2,0}]^{T}$ to a desired equilibrium reference $x_{ref} = [x_{1,ref}, x_{2,ref}]^{T}$ within a fixed terminal time $t_{f}$, while minimizing the transient energy deviation. The cost functional is defined as:

J = \frac{1}{2} \int_{0}^{t_{f}} (q_{1} {(x_{1} - x_{1, r e f})}^{2} + q_{2} {(x_{2} - x_{2, r e f})}^{2} + r {(u - u_{r e f})}^{2}) d t .

(55)

The system parameters are set as $E = 10.0$ V, $L = 0.01$ H, $C = 0.002$ F, and $P = 15.0$ W. The reference equilibrium point is $x_{ref} = [1.5, 30.0]^{T}$, with a steady-state duty cycle $u_{ref} = 2/3$. The weighting coefficients are chosen as $q_{1} = 1.0$, $q_{2} = 1.0$, and $r = 10.0$. The initial condition is set to $x_{0} = [1.0, 20.0]^{T}$, and the terminal time is $t_{f} = 0.1$.
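As a quick sanity check on these values, the sketch below evaluates the right-hand sides of (53) and (54) at the stated reference point; both should vanish if $(x_{ref}, u_{ref})$ is an equilibrium. The function name is illustrative.

```python
import numpy as np

# Parameters as stated in the text (SI units).
E, L, C, P = 10.0, 0.01, 0.002, 15.0


def boost_rhs(x, u):
    """Right-hand sides of (53) and (54) for state x = [x1, x2] and duty cycle u."""
    x1, x2 = x
    dx1 = E / L - (1.0 - u) / L * x2
    dx2 = (1.0 - u) / C * x1 - P / (C * x2)
    return np.array([dx1, dx2])


x_ref = np.array([1.5, 30.0])
u_ref = 2.0 / 3.0
residual_ref = boost_rhs(x_ref, u_ref)               # expected to vanish up to round-off
residual_x0 = boost_rhs(np.array([1.0, 20.0]), u_ref)  # non-zero at the perturbed start
```

At the reference, $(1 - u_{ref}) x_{2,ref} = 10 = E$ and $(1 - u_{ref}) x_{1,ref} = 0.5 = P / x_{2,ref}$, so both residuals are zero up to floating-point error.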

According to Pontryagin's Maximum Principle (PMP), the analytical control law is derived by minimizing the Hamiltonian with respect to $u$. In our hard-PMP PINN architecture, the duty cycle is eliminated from the neural network outputs and analytically embedded into the computational graph via the saturated stationarity condition:

u = \operatorname{clip} (u_{ref} + \frac{1}{r} (\lambda_{2} \frac{x_{1}}{C} - \lambda_{1} \frac{x_{2}}{L}), 0, 1) .

(56)
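Equation (56) translates directly into a vectorized function. The sketch below uses the parameter values stated above; the function name is illustrative and not taken from the authors' code.

```python
import numpy as np

# Parameters as stated in the text.
L, C = 0.01, 0.002
r, u_ref = 10.0, 2.0 / 3.0


def duty_cycle(x1, x2, lam1, lam2):
    """Equation (56): stationary point of H in u, clipped to the admissible set [0, 1]."""
    u_free = u_ref + (lam2 * x1 / C - lam1 * x2 / L) / r
    return np.clip(u_free, 0.0, 1.0)
```

Because $x_{2} / L$ and $x_{1} / C$ are large for these parameters, even moderate costate values push the unconstrained expression outside $[0, 1]$, which is consistent with the saturated phases reported for this example.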

To evaluate the numerical performance, we compare the proposed hybrid framework ($N = 60$) against a standalone soft-constrained PINN and a high-density classical CGL pseudospectral method ($N = 150$), which serves as the reference solution.

The optimal state and control trajectories are illustrated in Figure 8a,b. The optimal duty cycle exhibits a distinctive “bang-singular” structure, maintaining strict saturation at $u = 1.0$ during the initial transient phase to rapidly inject energy and counteract the voltage drop caused by the CPL. Due to the gradient pathology in soft-constrained optimization and the inherent spectral bias of neural networks, the standalone PINN fails to capture this sharp switching behavior. It outputs a severely smoothed, suboptimal control sequence, resulting in large tracking errors in both current and voltage recovery. In contrast, by enforcing the PMP stationarity condition as a hard architectural constraint, the proposed hybrid method overlaps with the high-density spectral reference, accurately capturing the non-smooth control corner without inducing high-frequency Gibbs oscillations.

Furthermore, Figure 9 shows the absolute ODE residuals evaluated on a dense temporal grid. The standalone PINN exhibits unacceptably high physical violations of order $O(10^{0})$. Conversely, with the offline hard-PMP PINN providing a topologically correct warm start, the subsequent online CGL refinement reaches residuals of order $O(10^{-12})$, close to machine precision. This indicates that the proposed framework mitigates the initialization sensitivity of the nonlinear program while preserving the numerical rigor of the optimal solution.

6. Conclusions

This work presents a hybrid PMP–PINN warm-start framework for nonlinear optimal control that systematically bridges physics-informed learning and high-order spectral discretization. By embedding PMP into the neural network architecture, the proposed method leverages data-free learning to generate physically consistent state–costate trajectories, which effectively mitigate the initialization sensitivity inherent in classical indirect methods. Our goal is not to replace mature direct solvers, but to enhance the robustness of PMP-based indirect methods, which remain theoretically attractive yet numerically fragile.

A key distinguishing feature of the proposed framework is the enforcement of the stationarity condition as a hard architectural constraint. Unlike existing PMP-PINN approaches that treat the control variable as a free network output subject to penalty-based regularization, the control in our method is analytically eliminated via the PMP stationarity condition. This design guarantees exact satisfaction of the optimality condition throughout the training process, while significantly reducing the optimization search space.

The neural-generated trajectories are subsequently integrated with a CGL pseudospectral discretization, enabling a single-shot convex quadratic programming formulation with spectral accuracy. This hybrid strategy effectively converts the soft-constraint nature of standalone PINNs into hard-constrained optimal control formulations, combining the global approximation capability of neural networks with the numerical rigor of spectral methods.

Rather than proposing another PINN variant, this work demonstrates how physics-informed learning can be used as a structural tool to stabilize classical indirect optimal control solvers.

Looking forward, expanding the scalability of this hybrid framework to high-dimensional complex systems remains a critical frontier for future research. While the current study successfully validates the methodology on low-dimensional dynamic models, applying indirect methods to high-dimensional systems inherently introduces severe algebraic complexities. Specifically, the manual derivation of analytical Hamiltonians and the formulation of highly coupled costate equations become intractable as the state dimension increases. To overcome this scalability bottleneck, future work will focus on integrating automated symbolic computation engines directly into the PINN computational graph to automate the derivation of the necessary PMP conditions. Furthermore, combating the “curse of dimensionality” inherent in neural network training across high-dimensional state spaces will necessitate the exploration of advanced adaptive collocation strategies. Successfully navigating these challenges will further unlock the potential of the proposed neural warm-start paradigm for large-scale, real-world engineering applications.

Author Contributions

Conceptualization, Z.D. and X.W.; methodology, Z.D. and X.W.; software, Z.D.; validation, Z.D.; formal analysis, Z.D. and X.W.; investigation, Z.D.; data curation, Z.D.; writing—original draft preparation, Z.D.; writing—review and editing, X.W. and Z.D.; visualization, Z.D.; supervision, X.W.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Ningxia (No. 2026AAC030090).

Data Availability Statement

The original data presented in the study are openly available in an open-source repository at https://github.com/mumu1cn/pinn-of-oc (accessed on 7 April 2026).

Acknowledgments

The authors express gratitude to Wenshuai Wang at Ningxia University for the valuable guidance and insightful discussions throughout this research. Additionally, during the preparation of this manuscript, the authors used Gemini 3 Pro for the purposes of literature translation and grammatical polishing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following mathematical variables and abbreviations are used in this manuscript:

Nomenclature
$t_{0}, t_{f}$	Initial time and final time
$t$	Time variable
$x (t)$	State vector
$u (t)$	Control input vector
$λ (t)$	Costate vector
$J$	Objective (cost) functional
$H$	Hamiltonian function
$N$	Number of collocation nodes
$τ$	Nodes in the standard Chebyshev interval $[- 1, 1]$
$T_{k} (τ)$	Chebyshev polynomial of degree $k$
$D$	Spectral differentiation matrix
$Φ, L$	Terminal cost and running cost
Abbreviation
PMP	Pontryagin’s Maximum Principle
PINNs	Physics-Informed Neural Networks
CGL	Chebyshev–Gauss–Lobatto
NLP	Nonlinear Programming
TPBVP	Two-Point Boundary Value Problem

References

Zhang, O.; Liu, Z.; Shao, X.; Yao, W.; Wu, L.; Liu, J. Learning-Based Task Space Trajectory Planning Framework with Preplanning and Postprocessing for Uncertain Free-Floating Space Robots. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 6325–6338.
Pan, H.; Xin, M. Nonlinear Robust and Optimal Control of Robot Manipulators. Nonlinear Dyn. 2014, 76, 237–254.
Rigatos, G.; Abbaszadeh, M. Nonlinear Optimal Control for multi-DOF Robotic Manipulators with Flexible Joints. Optim. Control Appl. Methods 2021, 42, 1708–1733.
Mirlekar, G.; Li, S.; Lima, F.V. Design and Implementation of a Biologically Inspired Optimal Control Strategy for Chemical Process Control. Ind. Eng. Chem. Res. 2017, 56, 6468–6479.
Schwenzer, M.; Ay, M.; Bergs, T.; Abel, D. Review on Model Predictive Control: An Engineering Perspective. Int. J. Adv. Manuf. Technol. 2021, 117, 1327–1349.
Bryson, A.E. Applied Optimal Control: Optimization, Estimation, and Control; Routledge: New York, NY, USA, 1975.
Betts, J.T. Survey of Numerical Methods for Trajectory Optimization. J. Guid. Control Dyn. 1998, 21, 193–207.
Garg, D.; Patterson, M.; Hager, W.W.; Rao, A.V.; Benson, D.A.; Huntington, G.T. A Unified Framework for the Numerical Solution of Optimal Control Problems Using Pseudospectral Methods. Automatica 2010, 46, 1843–1851.
Garg, D.; Hager, W.W.; Rao, A.V. Pseudospectral Methods for Solving Infinite-Horizon Optimal Control Problems. Automatica 2011, 47, 829–837.
Elnagar, G.; Kazemi, M.A.; Razzaghi, M. The Pseudospectral Legendre Method for Discretizing Optimal Control Problems. IEEE Trans. Autom. Contr. 1995, 40, 1793–1796.
Ross, I.M.; Fahroo, F. A Pseudospectral Transformation of the Covectors of Optimal Control Systems. IFAC Proc. Vol. 2001, 34, 543–548.
Garg, D.; Patterson, M.A.; Francolin, C.; Darby, C.L.; Huntington, G.T.; Hager, W.W.; Rao, A.V. Direct Trajectory Optimization and Costate Estimation of Finite-Horizon and Infinite-Horizon Optimal Control Problems Using a Radau Pseudospectral Method. Comput. Optim. Appl. 2011, 49, 335–358.
Ross, I.M.; Fahroo, F. Pseudospectral Knotting Methods for Solving Nonsmooth Optimal Control Problems. J. Guid. Control Dyn. 2004, 27, 397–405.
Zhang, Z.; Zhang, Y.; Wang, B. Application of the Hp-Adaptive Pseudospectral Method in Spacecraft Orbit Pursuit-Evasion Game. Adv. Space Res. 2024, 73, 1597–1610.
Filipov, S.M.; Gospodinov, I.D.; Faragó, I. Replacing the Finite Difference Methods for Nonlinear Two-Point Boundary Value Problems by Successive Application of the Linear Shooting Method. J. Comput. Appl. Math. 2019, 358, 46–60.
Park, C.; Scheeres, D.J. Determination of Optimal Feedback Terminal Controllers for General Boundary Conditions Using Generating Functions. Automatica 2006, 42, 869–875.
Zhanlav, T.; Batgerel, B.; Otgondorj, K.; Buyantogtokh, D.; Ulziibayar, V.; Mijiddorj, R.-O. Higher-Order Finite-Difference Schemes for Nonlinear Two-Point Boundary Value Problems. J. Math. Sci. 2024, 279, 850–865.
Eichmeir, P.; Steiner, W. A Double Shooting Method for Two-Point Boundary Value Problems in Multibody Dynamics. Multibody Syst. Dyn. 2026.
Ahmad, B.; Nieto, J.J.; Shahzad, N. The Bellman–Kalaba–Lakshmikantham Quasilinearization Method for Neumann Problems. J. Math. Anal. Appl. 2001, 257, 356–363.
Wu, D.; Yu, C.; Wang, H.; Bai, Y.; Teo, K.-L.; Toh, K.-C. Iterative Chebyshev Approximation Method for Optimal Control Problems. ISA Trans. 2024, 152, 277–289.
Wang, X.; Liu, J.; Peng, H.; Zhao, X. An Iterative Framework to Solve Nonlinear Optimal Control with Proportional Delay Using Successive Convexification and Symplectic Multi-Interval Pseudospectral Scheme. Appl. Math. Comput. 2022, 435, 127448.
Miele, A.; Iyer, R.R.; Well, K.H. Modified Quasilinearization and Optimal Initial Choice of the Multipliers Part 2: Optimal Control Problems. J. Optim. Theory Appl. 1970, 6, 381–409.
Jiang, C.; Lin, Q.; Yu, C.; Teo, K.L.; Duan, G.-R. An Exact Penalty Method for Free Terminal Time Optimal Control Problem with Continuous Inequality Constraints. J. Optim. Theory Appl. 2012, 154, 30–53.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
Barry-Straume, J.; Sarshar, A.; Popov, A.A.; Sandu, A. Physics-Informed Neural Networks for PDE-Constrained Optimization and Control. Commun. Appl. Math. Comput. 2025.
Zhai, H.; Sands, T. Comparison of Deep Learning and Deterministic Algorithms for Control Modeling. Sensors 2022, 22, 6362.
Zhai, H.; Sands, T. Controlling Chaos in Van Der Pol Dynamics Using Signal-Encoded Deep Learning. Mathematics 2022, 10, 453.
Antonelo, E.A.; Camponogara, E.; Seman, L.O.; de Souza, E.R.; Jordanou, J.P.; Hubner, J.F. Physics-Informed Neural Nets for Control of Dynamical Systems. Neurocomputing 2024, 579, 127419.
Li, C.; Zeng, R. AW-EL-PINNs: A Multi-Task Learning Physics-Informed Neural Network for Euler-Lagrange Systems in Optimal Control Problems. Neural Netw. 2026, 199, 108694.
Fahroo, F.; Ross, I.M. Direct Trajectory Optimization by a Chebyshev Pseudospectral Method. In Proceedings of the 2000 American Control Conference; ACC (IEEE Cat. No. 00CH36334); IEEE: Chicago, IL, USA, 2000; Volume 6, pp. 3860–3864.

Figure 1. Schematic of the proposed hybrid framework.

Figure 2. Comparison between the PINN and the proposed method. (a) Optimal state trajectories. (b) Optimal control input.

Figure 3. Dynamics violation errors.

Figure 4. Comparison between the PINN and the proposed hybrid method for Example 2. (a) Optimal state trajectories. (b) Optimal control input.

Figure 5. Dynamics violation errors among the different methods for Example 2 (log scale).

Figure 6. Internal consistency validation of the PMP-PINN training results.

Figure 7. Trajectory optimization comparison in a wind field. (a) Problem setup and wind field. (b) Standalone NLP failure. (c) Proposed hybrid solver.

Figure 8. Comparison of optimal trajectories among the standalone PINN, the proposed hybrid framework, and the high-density spectral reference for Example 4. (a) Optimal state trajectories ($x_{1}$ and $x_{2}$). (b) Optimal control input $u$.

Figure 9. Comparative analysis of the absolute ODE residuals for Example 4.

Table 1. Systematic comparison of methodologies.

Method	Core Mechanism	Accuracy	Init. Sens.	Constraints	Limitations
Indirect (PMP) [6]	TPBVP solvers	Very high	High	Exact	Diverges without precise costate guess
Pseudospectral [8]	NLP collocation	High	Medium	Asymptotic	Prone to local minima
Standard PINNs [25]	Auto-diff approx.	Low–med	Low	Soft	Inexact dynamics and optimality
Proposed hybrid	Hard-PMP + Spectral	Very high	Low	Hard	Offline training; assumes smoothness

Table 2. Results of Example 1.

Metric	Standalone PINN	Pseudospectral	Proposed Method
$J$	2.1407	2.1366	2.1366
$‖ x_{e r r} ‖$	$1.23 \times 10^{- 2}$	$< 1.0 \times 10^{- 12}$	$< 1.0 \times 10^{- 12}$
$ϵ_{d y n}$	$1.27 \times 10^{- 1}$	$1.50 \times 10^{- 12}$	$7.99 \times 10^{- 13}$
Solver iterations	-	16	13
Offline training time (GPU)	~97.60 s	N/A	~97.60 s
Online solving time (CPU)	N/A	73.93 ms	59.54 ms

Table 3. Assessment of warm-start quality.

Method	ODE Residual	Boundary Error	Stationarity Error	Path Constraint Violation
PINN	$1.15 \times 10^{- 7}$	$2.09 \times 10^{0}$	N/A	$1.28 \times 10^{- 1}$
Soft-PMP PINN	$5.50 \times 10^{- 3}$	$5.85 \times 10^{- 4}$	$9.75 \times 10^{- 3}$	$2.49 \times 10^{- 2}$
Hard-PMP (Ours)	$2.97 \times 10^{- 4}$	$3.73 \times 10^{- 7}$	Exact value	Exact value

Table 4. Results of Example 2.

Metric	Standalone PINN	Proposed Method	Improvement Magnitude
$J$	2.1407	2.1366	$Δ J \approx - 0.0041$
$‖ x_{e r r} ‖$	$1.23 \times 10^{- 2}$	$< 1.0 \times 10^{- 12}$	$\sim 10^{10}$ reduction
$ϵ_{d y n}$	$1.27 \times 10^{- 1}$	$7.99 \times 10^{- 13}$	$\sim 10^{11}$ reduction

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Du, Z.; Wang, X. A Hard-Constrained PMP-Based Warm-Start Framework for Nonlinear Optimal Control Using Physics-Informed Learning. Mathematics 2026, 14, 1614. https://doi.org/10.3390/math14101614

