A Regularized Physics-Informed Neural Network to Support Data-Driven Nonlinear Constrained Optimization

Abstract: Nonlinear optimization (NOPT) is a meaningful tool for solving complex tasks in fields such as engineering, economics, and operations research, among others. However, NOPT struggles with data variability and noisy input measurements that lead to incorrect solutions. Furthermore, nonlinear constraints may result in outcomes that are either infeasible or suboptimal, as in nonconvex optimization. This paper introduces a novel regularized physics-informed neural network (RPINN) framework as a new NOPT tool for both supervised and unsupervised data-driven scenarios. Our RPINN contribution is threefold: First, by using custom activation functions and regularization penalties within an artificial neural network (ANN), RPINN can handle data variability and noisy inputs. Second, it employs physics principles to construct the network architecture, computing the optimization variables from network weights and learned features. Third, it uses automatic-differentiation training to make the system scalable and to cut down computation time through batch-based back-propagation. Test results for both supervised and unsupervised NOPT tasks show that our RPINN provides solutions that are competitive with state-of-the-art solvers. In turn, the robustness of RPINN against noisy input measurements makes it particularly valuable in environments with fluctuating information. Specifically, we test a uniform mixture model and a gas-powered system as NOPT scenarios. Overall, RPINN's ANN-based foundation offers significant flexibility and scalability.


Introduction
Optimization approaches have emerged as tools for solving complex problems across various disciplines. Unlike traditional linear models, nonlinear optimization (NOPT) methods are capable of incorporating the intricate and interdependent relationships inherent in real-world scenarios [1]. These techniques are particularly valuable in fields such as engineering, economics, and operations research, where they enable the formulation and solution of models that more accurately reflect the underlying dynamics [2]. By leveraging advanced algorithms and computational solvers, NOPT facilitates improved decision-making and implementation, thereby enhancing efficiency and effectiveness in tackling multifaceted challenges. As research and technology continue to evolve, the significance of these methods in achieving optimal outcomes in diverse applications is becoming increasingly evident [3,4]. Nonetheless, NOPT comprises salient issues: First, data variability and noisy input measurements yield erroneous and fluctuating solutions. Second, nonlinear constraints greatly complicate the task of achieving optimal outputs [5]. Moreover, system scalability should be considered.
Data variability and noisy samples, in particular, are known to make stochastic measurements less accurate and to increase the number of errors in NOPT [6]. The presence of unwanted effects in the data not only reduces the solution quality but also complicates the computation, making it more difficult to choose suitable optimization parameters [7]. This instability greatly impedes the optimization process, rendering the algorithm vulnerable to external effects and significantly reducing its overall efficiency [8]. The intricacies of nonlinear constraints might result in outcomes that are either infeasible or suboptimal [9]. Moreover, NOPT may exhibit a slow rate of convergence, with a tendency to become trapped at a local minimum. This presents a challenge when both speed and accuracy are crucial [10]. Hence, optimization techniques become impractical for large-scale applications [11], and as the number of variables increases, scalability becomes a significant hindrance, underscoring the pressing need for specialist software and more processing time [12]. Consequently, it is important to deal with large optimization problems, reduce runtime, and simplify the inherent complexity of noisy inputs and nonlinear constraints [11]. Indeed, many NOPT tasks are nondeterministic polynomial time hard (NP-hard), making it difficult to find an exact solution for large instances because no polynomial-time algorithm is known that works well or that does not introduce errors into the final output [13]. Additionally, some NOPT tasks involve nonconvex nonlinear programming (NLP) issues. The latter are especially challenging because they combine many nonconvex and integer functions [14].
Typically, mathematical programming or other classical techniques solve NOPT. These methods are capable of effectively handling nonlinearities and discontinuities [9]. Customized strategies are also implemented to refine the iterative search [15]. Gradient-based techniques, mostly built on descent methods, have also shown they can deal with problems involving nonlinear and convex constraints [16]. Similarly, decomposition methods simplify complexity by segmenting the optimization into more manageable subproblems [17]. Additionally, search approaches and metaheuristics are crucial for maintaining a proper balance between exploration and exploitation [18], which enhances efficiency in finding optimal outputs. However, conventional methods often converge on solutions that may not be useful, especially in stochastic and noisy environments with high uncertainty and intrinsic data variability, which can reduce their accuracy [19].
Nowadays, artificial neural networks (ANNs) employ supervised learning to tackle nonlinear and stochastic problems through regression tasks. These networks are trained to find complex patterns and make accurate predictions even under considerable uncertainty using data-driven strategies [20]. Commonly, ANN-based approaches employ automatic differentiation (AD), a computational technique used to evaluate the derivatives of functions efficiently and accurately. Unlike numerical alternatives, which can suffer from precision issues, or symbolic differentiation, which can be computationally expensive, AD works by breaking down functions into elementary operations for which derivatives are known and applying the chain rule systematically [21]. This process ensures that the derivative calculations are exact to machine precision and enables the calculation of loss function gradients with respect to network parameters, which is essential for gradient-based optimization algorithms like back-propagation.
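To make the AD principle concrete, the following minimal sketch implements forward-mode AD with dual numbers. It is only an illustration of the chain-rule decomposition described above; the experiments in this paper rely on TensorFlow's reverse-mode AD instead.

```python
# Forward-mode AD sketch with dual numbers: each value carries its derivative,
# and every elementary operation propagates both via the chain rule.
import math

class Dual:
    """Number carrying a value (val) and its derivative (dot)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)

def sin(d):
    # Chain rule for an elementary function with a known derivative
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

# f(x) = x*x + sin(x); seeding dx/dx = 1 yields f'(x) = 2x + cos(x)
x = Dual(3.0, 1.0)
f = x * x + sin(x)
print(f.val, f.dot)  # f(3) ≈ 9.141, f'(3) = 6 + cos(3) ≈ 5.010
```

The derivative is exact to machine precision, with no step-size tuning as in finite differences.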
Recently, physics-informed neural networks (PINNs) have emerged as an effective ANN-based optimization technique. Designed to align training with relevant physical principles, they have proven successful in various NOPT applications [22]. Commonly, the Karush-Kuhn-Tucker (KKT) criteria are used to represent constraints and integrate them into the network's cost function during supervised training [23]. Additionally, a novel approach for integrating constraints using Runge-Kutta (RK) schemes in unsupervised training has been proposed in [24]. Nevertheless, putting these networks into practice is hard, especially when it comes to defining the right loss functions, choosing the best hyperparameters, and ensuring that computations run quickly while complex systems are being trained [25]. Also, although PINNs have remarkable capabilities, their ability to generalize to nonlinear optimization problems is limited [26].
In this paper, we present a novel regularized PINN framework, termed RPINN, as a NOPT tool for both supervised and unsupervised data-driven scenarios. As a result, we deal with three key NOPT issues. First, we address data variability and noisy input measurements by appropriately adapting custom activations and regularization penalties within an ANN scheme. Second, we effectively integrate nonlinear constraints into the network architecture, adhering to the principles of model physics. Specifically, we utilize the network weights and/or learned features within a functional composition framework to determine the NOPT variables. Third, our ANN-based strategy employs AD training, which favors system scalability and computational time through batch-based back-propagation. Experimental results from both supervised and unsupervised data-driven NOPT tasks confirm that our proposal is robust and competitive against state-of-the-art optimization approaches. The primary advantage of our proposal lies in its stability against noisy input measurements, making it a particularly valuable solution in contexts with fluctuating information. Furthermore, because RPINN is based on ANNs, it offers flexibility in terms of the network architecture.
The agenda for this paper is as follows: Section 2 summarizes the related work. Section 3 describes the materials and methods. Sections 5 and 6 depict the experiments and discuss the results. Lastly, Section 7 outlines the conclusions and future work.

Related work
Some studies have shown that mathematical programming has become a crucial tool in numerical optimization. A notable example is the analysis by [9], which employs a sequential linear programming algorithm to address nonlinearities and discontinuities. In this context, the simplex method proves essential, being a classic technique effective for solving linear programming problems through iterative adjustments of solutions within a feasible set [27]. Similarly, the study by [15] explores a solution via quadratic programming (QP). Mixed-integer programming (MIP), on the other hand, is an optimization strategy that uses both integer and continuous variables. It is widely used to solve difficult problems [28], focusing on how the branch-and-cut (BC) algorithm can be employed to find the best solution [29]. Furthermore, second-order cone programming (SOCP) facilitates effective solutions for problems involving linear and quadratic constraints [30]. New studies, like [31], look into semidefinite programming (SDP), and the work in [32] uses convexification techniques. Likewise, exponential programming (EXP) models NOPT objectives and constraints through exponential functions [33]. Additionally, power cone programming (PCP) is considered for modeling product and square relationships [34]. Yet, these classical methods face challenges such as scalability, computation time, convergence, and practical precision, underscoring their inherent complexity and limitations. Furthermore, the use of relaxations or approximations affects the optimization accuracy [31].
On the other hand, the efficiency and precision of gradient methods in identifying optimal solutions highlight their relevance for practical optimization tasks. The work in [35] uses the Dai-Liao conjugate gradient method and hyperplane projections for global convergence to solve nonlinear equations. In addition, ref. [36] faces the nonconvex issue based on a set of starting points. Moreover, nonlinear decomposition using linear programming (LP) and gradient descent has also been proposed [37]. Further, the work in [38] examines the Newton-based search to deal with convergence issues in poorly conditioned systems. Also, the semismooth Newton technique is applied for optimization in Hilbert spaces [39]. For noisy problems, the authors in [40] use piecewise polynomial interpolation and box reformulations, along with an interior-point (IP) method. The authors in [41] tackle similar problems with integrated penalty techniques. Overall, gradient methods are effective at solving NOPT tasks, but they struggle to converge and are expensive to run in noisy and nonlinear settings [42]. Also, it can be challenging to choose the best learning rate, and they run the risk of getting stuck in local minima [43]. As seen in [44], it is also important to ensure that at least first-order differentiability is maintained when using techniques like the conjugate gradient, IP, and Newton-based approaches.
Of note, most of the available optimization solvers are based on the classical approaches mentioned above. Among them, Clarabel stands out for its versatility in optimizing a wide variety of problems; however, it still faces significant challenges in areas such as MIP [45]. Gurobi is renowned for its proficiency in MIP due to its extensive range of techniques, including simplex and IP methods; however, because it is proprietary software, it might not be usable in situations that require license flexibility [46]. Mosek is efficient concerning the IP approach, but its support for MIP is relatively limited, and its aptitude for NLP remains under debate, which could be a hindrance for developers who prefer open-source solutions [47]. Xpress specializes in solving MIP, offering conditional support for NLP, but is a closed-license alternative [48]. In turn, SCS, leveraging its open-source status, promotes adaptability and collaborative development, although its limitations in NLP reduce its effectiveness in certain optimization areas [49]. IPOPT excels at solving NLP problems, and its open access allows for flexibility [50]. Now, in this multifaceted optimization environment, the integration of tools such as MATPOWER, GEKKO, and CVXPY significantly expands the available options. MATPOWER is essential for solving energy system issues and supports solvers like Gurobi, Xpress, and IPOPT for linear, mixed-integer, and nonlinear programming [51][52][53]. GEKKO specializes in dynamic systems and nonlinear models, offering a holistic and open-source Python platform [54,55]. CVXPY is an open-source modeling language for convex optimization problems embedded in Python. It allows users to express problems naturally, mirroring the mathematical formulation rather than conforming to the restrictive standard form required by solvers [56,57]. Table 1 summarizes the mentioned solvers.
Recently, ANNs have positioned themselves as fundamental tools in optimization by incorporating deep learning techniques, effectively addressing the complexity and nonlinearities of various problems. Conventional ANNs employ supervised learning to tackle nonlinear and stochastic problems through regression tasks. To this end, historical data or solutions precomputed by specialized NOPT tools are used to train these networks [59]. This approach enables ANNs to learn complex patterns and make accurate predictions even under significant uncertainty [20]. Typically, ANN-based approaches utilize AD, a computational method for efficiently and accurately evaluating function derivatives. Instead of numerical or symbolic differentiation, which can have issues with accuracy or require substantial computing power, AD breaks functions down into simple operations whose derivatives are known and applies the chain rule consistently [21]. Thereby, AD ensures machine-level accuracy in derivative calculations and simplifies the determination of loss function gradients with respect to network parameters, enabling the use of gradient-based search with back-propagation. The work in [60] combines quasi-Newton methods and ANNs for NOPT. Furthermore, the authors in [59] utilize deep learning to solve optimal flow problems. Similarly, the work in [61] introduces an integrated training technique that, while effective, requires larger neural networks and presents challenges in generalization. Concurrently, ref. [62] uses elastic layers and incremental training as optimization-based solvers. Furthermore, the method in [63] combines convex relaxation with graph neural networks.
PINNs have recently emerged as powerful optimization tools. These training approaches have proven effective in various NOPT applications, integrating relevant physical principles within ANNs [22]. The KKT criteria are applied to formulate constraints that are incorporated into an ANN cost function during supervised training [23]. In [64], a PINN framework is detailed that imposes penalties for constraint violations in the loss function. The study in [65] proposes a loss function that combines errors from differential and algebraic states with normative equation violations. Additionally, a novel strategy has been proposed to include constraints in unsupervised training using an RK-based technique [24]. Nevertheless, approaches based solely on ANNs and PINNs face challenges such as optimality degradation. In response, advanced alternatives like [66] have emerged, integrating system constraints into the cost function and applying penalties for violations. Furthermore, ref. [67] introduces an algorithm to address nonlinear problems modeled by partial differential equations with noisy data through Bayesian physics-informed neural networks (B-PINNs). Additionally, ref. [68] proposes a parametric differential equation-based approach holding functional connections to enhance the robustness and accuracy of PINNs. In turn, ref. [69] presents a truncated Fourier decomposition, termed Modal-PINNs, to optimize the reconstruction of periodic signals. However, these alternatives often lack adequate precision, generalization capability, and scalability [23]. Finally, supervised data are usually required, complicating their application in various NOPT scenarios.

Nonlinear Optimization Fundamentals (NOPT)
Let x ∈ R^P be a vector of P variables. The conventional NOPT problem can be summarized as follows:

min_x ϱ(x)  s.t.  h_L(x) = 0,  h_N(x) = 0,  ξ_min ≤ x ≤ ξ_max,  (1)

where the objective function ϱ : R^P → R is real-valued. Also, the bound constraints are given by ξ_min, ξ_max ∈ R^P. The linear and nonlinear constraints are described by h_L : R^P → R^{C_L} and h_N : R^P → R^{C_N}, where C_L ∈ N and C_N ∈ N.
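As a concrete illustration of this generic form (not one of the paper's case studies), a toy instance with P = 2, one linear and one nonlinear equality constraint, and box bounds can be solved with SciPy's SLSQP routine; all functions below are made up for demonstration.

```python
# Toy NOPT instance in the form of Equation (1), solved with SciPy's SLSQP.
from scipy.optimize import minimize

rho = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2   # objective rho(x)
h_L = lambda x: x[0] + x[1] - 2.0                    # linear constraint h_L(x) = 0
h_N = lambda x: x[0]**2 + x[1]**2 - 2.0              # nonlinear constraint h_N(x) = 0

res = minimize(rho, x0=[0.5, 0.5],
               bounds=[(0.0, 2.0), (0.0, 2.0)],      # xi_min <= x <= xi_max
               constraints=[{"type": "eq", "fun": h_L},
                            {"type": "eq", "fun": h_N}],
               method="SLSQP")
print(res.x)  # the only feasible point, x = (1, 1)
```

Here the line x1 + x2 = 2 and the circle x1^2 + x2^2 = 2 intersect at the single point (1, 1), so the solver simply recovers the unique feasible solution.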
Figure 1 depicts the main pipeline of the classical approaches for NOPT. First, it includes the physical system's parameters, constraints, limits, and the objective function to be optimized. Second, starting from an initial point, the optimization algorithm iterates until convergence. Of note, the number of iterations, the level of improvement, and the objective function thresholding are the relevant stopping criteria to return the final output.

Regularized Physics-Informed Neural Network (RPINN)
Let {y_r ∈ Y, z_r ∈ Z}_{r=1}^{R} be an input-output set holding R samples. Our data-driven RPINN approach aims to couple the optimization problem in Equation (1) as a penalty-based loss with bounded constraints from both network weights and learned features as follows: where f : Z → Y is an ANN-based mapping function, L : Y × Y → R is a given loss, X holds the network parameters, and Z gathers the learned features along layers. Also, h_Li(•, •) and h_Nj(•, •) are the i-th linear and j-th nonlinear penalty functions that enforce the NOPT constraints set by the regularization terms λ_L, λ_Li, λ_Nj ∈ [0, 1], where i ∈ {1, 2, . . ., C_L} and j ∈ {1, 2, . . ., C_N}. Furthermore, ζ_min and ζ_max collect the network parameter limit values, and ψ_min and ψ_max capture the network output and feature bounds.
For a given input z ∈ Z, our deep learning-based function with L layers yields the composition in Equation (3). In the l-th layer of Equation (3), where l ∈ {1, 2, . . ., L}, the weights and bias are x_l, b_l ∈ X, the learned feature vector is z_l ∈ Z, and ν_l(•) is a nonlinear activation function to deal with both network representation and customized bounds to fulfill the Equation (2) limit constraints. Furthermore, the RPINN optimization problem can be solved via gradient descent with AD and back-propagation [70].
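A minimal TensorFlow sketch of one such penalty-based training step is given below; the penalty functions, their weights, and the tiny architecture are our own illustrative assumptions, not the exact terms of Equation (2).

```python
# Sketch of a penalty-based RPINN loss and one AD/back-propagation step.
# The two penalties are hypothetical stand-ins for h_Li and h_Nj.
import tensorflow as tf

def rpinn_loss(model, z, y, lam_lin=0.1, lam_non=0.1):
    y_hat = model(z)
    data_term = tf.reduce_mean(tf.keras.losses.huber(y, y_hat))
    # Hypothetical linear penalty on the network weights: output-layer
    # weights should sum to one (squared violation).
    w = model.layers[-1].kernel
    lin_pen = tf.square(tf.reduce_sum(w) - 1.0)
    # Hypothetical nonlinear penalty on the outputs: stay below 1.
    non_pen = tf.reduce_mean(tf.nn.relu(y_hat - 1.0)**2)
    return data_term + lam_lin * lin_pen + lam_non * non_pen

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(5,))])
opt = tf.keras.optimizers.Adam(1e-3)
z = tf.random.uniform((32, 5))
y = tf.random.uniform((32, 1))
with tf.GradientTape() as tape:            # AD over the batch
    loss = rpinn_loss(model, z, y)
opt.apply_gradients(zip(tape.gradient(loss, model.trainable_variables),
                        model.trainable_variables))
```

Batching, as above, is what keeps the scheme scalable: each step touches only a 32-sample slice of the data.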
It is worth noting that our baseline RPINN studies a supervised scenario for simplicity, but by addressing its regularized loss, we can easily achieve an unsupervised extension. Figure 2 depicts the RPINN main sketch.

Tested Scenarios for NOPT Using RPINN
We study two main datasets to test our RPINN as a data-driven NOPT approach: (i) a constrained uniform mixture model with nonlinear loss and supervised target, and (ii) a constrained flow and pressure gas-powered system optimization with unsupervised loss.Below, we provide a detailed description of each experiment.

Supervised Constrained Optimization: Uniform Mixture Model
This task comprises a linear and bound-constrained optimization of a nonlinear cost [71]: where y_r ∈ R_+ is the r-th target output, x ∈ R^P denotes the mixing coefficients, and z_r ∈ R^P holds random samples drawn from a uniform distribution as z_rp ∼ U(z | p − 1, p). 0 and 1 are all-zero and all-one vectors of proper size. Figure 3 depicts the uniform mixture model task. The optimization problem in Equation (4) can be solved through our RPINN as in Equation (5). For concrete testing and to mitigate noisy samples, a Huber-based loss is used in Equation (5): where ϵ ∈ R_+. Next, we fix a scaled exponential linear unit (SELU) activation for the network function composition, where θ, ϑ ∈ R. Then, the mixing coefficients are recovered from the output-layer weights x_L. To fulfill the former NOPT limit restriction, the RPINN weights at the output layer L hold an l1-based max constraint.
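For reference, the Huber-based loss of Equation (6) and a SELU activation of the Equation (7) type can be sketched in NumPy as follows; the θ, ϑ values below are the standard SELU constants, which we assume here.

```python
# NumPy sketches of the Huber loss and the SELU activation.
import numpy as np

def huber(err, eps=1.0):
    """l2 for |err| <= eps, l1 beyond it: robust to outliers."""
    a = np.abs(err)
    return np.where(a <= eps, 0.5 * err**2, eps * (a - 0.5 * eps))

def selu(z, theta=1.6733, vartheta=1.0507):
    """Scaled exponential linear unit (standard constants assumed)."""
    return np.where(z > 0, vartheta * z,
                    vartheta * theta * (np.exp(z) - 1.0))

print(huber(np.array([0.5, 3.0])))  # 0.125 (l2 region), 2.5 (l1 region)
print(selu(np.array([-1.0, 2.0])))  # negative inputs saturate smoothly
```

The quadratic-to-linear switch at ϵ is what damps the influence of large, outlier-driven errors during training.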

Unsupervised Constrained Optimization: Gas-Powered System
We study a gas-powered system as a function of flow and pressure. For this purpose, a synthetic network of eight nodes is used, as detailed in [72] and illustrated in Figure 4. In particular, the NOPT problem is written as in Equation (8), where a ∈ R^P represents the gas transport costs for the P flows in x ∈ R^P. The incidence matrix B ∈ R^{W×P} encodes the gas network structure, with W nodes and z ∈ R^W the input gas demand. The first equality constraint encodes the linear flow and gas demand equilibrium along the network nodes. Next, the node pressure is stored in π ∈ R^W. In turn, the q-th flow x_q ∈ x is selected according to the network structure in B to fulfill the Weymouth equality with k_q ∈ R and Q ≤ P [53]. Then, the function w(q) extracts the related pressure π_w(q) ∈ π regarding such a Weymouth-based physics constraint. Furthermore, π_n, π_n′ ∈ π choose the inlet and outlet pressures to fulfill the system compression ratio, with V components (n, n′ ∈ {1, 2, . . ., V}, V ≤ W) and compression factor limits β_min(n, n′), β_max(n, n′) ∈ R_+. Also, γ_min, γ_max ∈ R^W and δ_min, δ_max ∈ R^P are the minimum and maximum pressure and flow limits, respectively. Now, let {z_r ∈ R^W}_{r=1}^{R} be an unsupervised input set concerning the required gas demand for R observations. Our RPINN solution of Equation (8) is as follows: Given the r-th gas demand vector z_r ∈ R^W, ẑ_r ∈ R^P predicts the flow vector based on f†, and π̂_r ∈ R^W the corresponding pressure vector using f‡. Moreover, the notation L_H(•, •; ϵ•) stands for a Huber-based penalty (see Equation (6)), and φ(π̂_r; B) ∈ R^Q holds the Weymouth-related elements given in Equations (11) and (12). It is worth mentioning that the custom penalty in Equation (10) aims to deal with noisy inputs while preserving the NOPT limits and constraints. In particular, L_H(•, •; B, β, ϵ_N2) penalizes pressures that are far from the middle of the compression factor range, according to β_min(n, n′), β_max(n, n′) ∈ β. Finally, a scaled sigmoid function σ(•) ∈ [u_min, u_max] addresses the predicted flow and pressure limits in Equation (9) via Equation (13), where α, ι ∈ R.
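One plausible form of this bounded activation, with our assumed placement of the slope α and offset ι, is the following NumPy sketch:

```python
# Hypothetical scaled sigmoid: outputs are squashed smoothly into
# the interval [u_min, u_max], enforcing the flow/pressure bounds.
import numpy as np

def scaled_sigmoid(u, u_min, u_max, alpha=1.0, iota=0.0):
    s = 1.0 / (1.0 + np.exp(-(alpha * u + iota)))
    return u_min + (u_max - u_min) * s

p = scaled_sigmoid(np.linspace(-10.0, 10.0, 5), u_min=2.0, u_max=8.0)
print(p)  # every value lies inside [2, 8]
```

Because the mapping is smooth and monotone, the bound is enforced by construction rather than by a penalty, so no gradient is wasted on infeasible outputs.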

Experimental Set-Up
The scenarios in Section 4 are used to test our RPINN in both supervised and unsupervised settings. They allow us to examine sample variability, noisy input measurements, and nonlinear constraints.

Deep Learning Architectures
To address the uniform mixture model NOPT (supervised constrained optimization), our RPINN consists of two dense layers as shown in Figure 5 and Table 2.

Next, as seen in Figure 6 and Table 3, a wide ANN architecture is proposed for our RPINN-based gas-powered system scenario. We specifically focus on the essential variables, flows and pressures, in our sketch, adapting it to the unique characteristics of the gas network. To achieve this, our model incorporates blocks of dense layers designed to map input data, as well as batch normalization layers that help stabilize and normalize the features and gradients along the back-propagation. Additionally, it includes custom layers named custom dense, bounded dense, source switching, and unsupply gas switching. We design these to encode the source behavior of the system, manage unmet demand, and delineate system boundaries. As seen, a shallow and straightforward architecture suffices for the simple uniform mixture model NOPT task. Additionally, in order to mitigate overfitting and accommodate numerous constraints and a linear loss in the gas-powered system, we implement a shallow and wide network. Nevertheless, our RPINN approach is adaptable in terms of network architecture, enabling the implementation of more complex schemes as needed.

Training Details and Method Comparison
To evaluate the effectiveness of our methodology in addressing optimization problems, we utilize the mean absolute percentage error (MAPE) as the primary performance measure across all conducted experiments, defined as: where ỹ_r, ŷ_r ∈ R stand for the r-th target and predicted values, MAPE(•, •) ∈ [0, 100][%], and | • | is the absolute value operator. Now, for the uniform mixture model, we generate 500 samples, each composed of five variables. We train our RPINN architectures on a total of 400 samples, allocating 30% for the validation phase. We use the remaining 100 samples to evaluate the model's performance. To assess how well NOPT copes with noisy inputs, we add white Gaussian noise to the model output while keeping the signal-to-noise ratio (SNR) value within the set {−1, 3, 5}. Further, for the gas-powered system, we define three distinct scenarios to evaluate the network's capacity under varying demand conditions. This process yields a total of 20,000 samples, of which 30% is designated for testing. We produce 320 samples using GEKKO v1.0.6 to compare the model's performance with IPOPT v3.12 [73].
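For clarity, the MAPE measure defined above can be written in NumPy as:

```python
# Mean absolute percentage error over R target/prediction pairs.
import numpy as np

def mape(y_true, y_pred):
    """MAPE in [0, 100] %; assumes nonzero targets."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

print(mape(np.array([100.0, 200.0]), np.array([110.0, 190.0])))  # 7.5
```

Note that the division by the target makes MAPE scale-free but undefined at zero-valued targets, which is why all tested quantities here are strictly positive.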
We implement RPINN using Python 3.10.12 and the TensorFlow API 2.15.0 on Google Colaboratory. For training, we fix 600 epochs, a batch size of 32 samples, an Adam optimizer, and a learning rate of 1 × 10^−3 in the supervised constrained optimization. Likewise, the unsupervised constrained NOPT scenario uses a batch size of 256 and an Adamax optimizer. Also, an initial learning rate of 1 × 10^−2 with a decreasing schedule is employed. The regularization hyperparameters, namely λ• in Equation (2), are experimentally fixed within the range [0, 1]. Since IPOPT excels at solving NOPT, not to mention its open access, we fix it as the comparison method [50]. Our codes and studied datasets are publicly available at https://github.com/UN-GCPDS/python-gcpds.optimization (accessed on 1 March 2024).

Supervised Constrained Optimization Results
As shown in Figures 7 (left) and 8, for noise-free data on the uniform mixture model scenario, both our proposal and the IPOPT solution exhibit similar results. The similarity of the results stems from the fact that the problem defined in Equation (4) is convex. Next, for noisy inputs, our RPINN, based on the Huber loss function, shows greater robustness against data variability and noise issues. In fact, the Huber function applies the l1-norm for errors exceeding a defined threshold, reducing sensitivity to extreme values, while for smaller errors it uses the l2-norm, ensuring accuracy by penalizing small deviations. In contrast, the classical IPOPT technique uses an objective function based on the l2-norm, which is sensitive to outliers because it significantly penalizes large deviations. The weight distributions support the latter hypothesis: noise-free data lead to predictions of similar strength for both RPINN and IPOPT, whereas for noisy inputs, our proposal regularizes the network weights, yielding concentrated values that capture the main output dynamics, and outperforms IPOPT regarding the MAPE for all considered SNR values.

Unsupervised Constrained Optimization Results
Figure 9 depicts our RPINN regularized penalty illustration for the gas-powered system NOPT. We adopt a standard variant of the Huber loss for the node balance and Weymouth constraints. As shown, the threshold ϵ• controls the transition between the l1 and l2 norms. Regarding the constraint on the compression ratio limit, it is essential to alter the structure due to its inequality behavior. This enhancement stabilizes the transition between the l2 and l1 norms at zero, based on the distance to the central value of the required range. Furthermore, it is crucial to correctly integrate these cost functions into our RPINN. Then, the right plot in Figure 9 shows the Weymouth (blue), compression ratio (orange), and compression factor constraint (green) penalty evolution. The resulting loss shows a decreasing trend, indicating that the Huber-based approach can handle the physical limitations of the gas-powered NOPT.
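Our reading of this inequality-adapted penalty can be sketched as follows; this is a hypothetical form, in which the violation is measured from the center of the allowed range and only the excess beyond its half-width is penalized, Huber-style.

```python
# Hypothetical range-aware Huber penalty: zero inside [b_min, b_max],
# quadratic just outside it, and linear for large violations.
import numpy as np

def range_huber(v, b_min, b_max, eps=0.1):
    mid = 0.5 * (b_min + b_max)
    half = 0.5 * (b_max - b_min)
    excess = np.maximum(np.abs(v - mid) - half, 0.0)  # 0 inside the range
    return np.where(excess <= eps,
                    0.5 * excess**2,
                    eps * (excess - 0.5 * eps))

print(range_huber(np.array([1.5, 2.5]), b_min=1.0, b_max=2.0))
```

Centering the penalty on the middle of the range makes it symmetric with respect to the two inequality bounds and smooth at the boundary, which stabilizes the gradients during training.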
In turn, we design three evaluation scenarios in comparison with the IPOPT framework to validate the performance of the regularization functions in data generation. In the first scenario, data remain below the source's maximum capacity. In the second scenario, 50% of the samples exceed this capacity, while in the third, about 100% of the data surpass it. Figure 10 shows that even though IPOPT attains a lower MAPE, its precision (variance) changes considerably over the iterations. This means that conventional methods for NOPT are not robust against data variability and nonlinear constraints. In contrast, our RPINN achieves an acceptable MAPE with low variability across experiments due to its ANN-based regularized strategy. In fact, both approaches share similar costs and adhere to the compression ratio constraints. In the first two cases, traditional solutions to the Weymouth equation work better than ours. But in the third case, our proposal is better because it is more stable and less affected by outliers, thanks to the Huber-based penalty.


The RPINN framework, while innovative and effective in addressing many challenges of NOPT, has several limitations that need to be considered. One significant limitation is the complexity involved in defining appropriate loss functions and selecting optimal hyperparameters, which can make the implementation process cumbersome. Additionally, extremely high levels of noise or complex nonlinear constraints can hinder the performance of RPINN, despite its robustness against data variability and noisy inputs. Although AD has improved the model's scalability, it may still face challenges when applied to very large-scale problems due to computational resource limitations.

385
Furthermore, integrating precise physical principles into the network architecture can 386 be intricate and may not always generalize well across different types of NOPT problems.387 Current trends in PINNs emphasize improving these models' generalization capabilities 388 and computational efficiency [76].To better solve the problems of scalability and accuracy, 389 researchers are focusing on hybrid approaches that mix PINNs with other advanced 390 optimization methods, like metaheuristics and gradient-based methods.The latter indicates 391 Figure 9. Gas-powered system regularized loss illustration.Left: node balance and Weymouth penalties based on conventional Huber-loss.Middle: Compression factor limit constraint using our Huber-based enhancement.(see Equation ( 10)).Right: Gas-powered system custom penalty evolution (Blue: Weymouth equality constraint; Orange: compression ratio limit constraint; Green: Compression factor constraint).

Finally, to see if our RPINN model can handle the limits in Equation (9) well, we look at the results of the flow and pressure prediction layers and how they behave, as shown in Figure 11. The parameters analyzed, including injection and pipe flows as well as pipeline pressures, remain within acceptable limits. This behavior is attributed to the custom activation in Equation (13), which ensures a smooth and steady transition between the established ranges.

Figure 11. Gas-powered system-bound constraint MAPE results. The star symbol on this graph denotes the defined limits for each of the sources, compressors, pipelines, and pressures, as well as their behavior. The number on the x-axis indicates the node to which the information belongs. MMSCFD: million standard cubic feet per day. psia: pounds per square inch absolute.
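The bound-preserving behavior attributed to the custom activation can be sketched as a scaled sigmoid that maps any pre-activation smoothly into a prescribed range. The exact form of Equation (13) may differ; the pressure bounds here are illustrative:

```python
import numpy as np

def bounded_activation(z, lo, hi):
    """Scaled sigmoid: maps any pre-activation smoothly into [lo, hi],
    so predicted flows and pressures cannot leave their physical limits."""
    return lo + (hi - lo) / (1.0 + np.exp(-z))

# Example: keep predicted pipeline pressures within [200, 800] psia.
z = np.array([-50.0, 0.0, 50.0])          # arbitrary network pre-activations
p = bounded_activation(z, 200.0, 800.0)
print(p)  # stays inside the bounds; z = 0 maps to the midpoint 500
```

Because the mapping is smooth and saturates at the limits, bound constraints are satisfied by construction rather than penalized after the fact.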

Computational Cost Results
Figure 12 shows the training and prediction times needed by the RPINN compared to IPOPT. Our model needs more time during the training phase because it must perform both forward and backward passes in each iteration within an ANN-based framework. However, in the prediction phase, our RPINN outperforms IPOPT, yielding significantly shorter prediction times. This is because our approach only requires forward passes after weight training. These results demonstrate the capability of RPINN to generate fast and accurate predictions for NOPT solutions, not only by reducing processing times but also by narrowing interquartile ranges.
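The asymmetry between training and prediction cost follows from prediction requiring only a forward pass. A toy illustration follows, where the dimensions, learning rate, and plain gradient-descent update are arbitrary stand-ins for the actual AD-based training:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 2)) * 0.1       # single linear layer, kept tiny

def predict(x):
    """Prediction after training: a single forward pass, no gradients."""
    return x @ W

def train_step(x, y_true, lr=0.05):
    """Training: a forward pass plus a backward (gradient) pass per batch."""
    global W
    y = predict(x)                           # forward
    grad = x.T @ (y - y_true) / len(x)       # backward: l2-loss gradient
    W -= lr * grad
    return float(np.mean((y - y_true) ** 2))

x = rng.standard_normal((32, 4))             # batch of input scenarios
y_true = x @ np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
losses = [train_step(x, y_true) for _ in range(200)]
assert losses[-1] < losses[0]                # loss decreases during training
```

Each `train_step` does roughly twice the work of `predict`, and real training repeats it over many epochs, whereas deployment amortizes that cost into cheap forward-only queries.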

Limitations
The RPINN framework, while innovative and effective in addressing many challenges of NOPT, has several limitations that need to be considered. One significant limitation is the complexity involved in defining appropriate loss functions and selecting optimal hyperparameters, which can make the implementation process cumbersome. Additionally, extremely high levels of noise or complex nonlinear constraints can hinder the performance of RPINN, despite its robustness against data variability and noisy inputs. Although AD has improved the model's scalability, it may still face challenges when applied to very large-scale problems due to computational resource limitations.
Furthermore, integrating precise physical principles into the network architecture can be intricate and may not always generalize well across different types of NOPT problems. Current trends in PINNs emphasize improving these models' generalization capabilities and computational efficiency [74]. To better address scalability and accuracy, researchers are focusing on hybrid approaches that combine PINNs with other advanced optimization methods, such as metaheuristics and gradient-based methods. The latter indicates a growing recognition of the need for more flexible and adaptive frameworks that can handle a broader range of NOPT scenarios.

Conclusions
We introduce a novel regularized physics-informed neural network (RPINN) framework, presenting a significant advancement in addressing the challenges associated with nonlinear constrained optimization. By integrating custom activation functions and regularization penalties within an ANN architecture, RPINN effectively handles data variability and noisy inputs. The incorporation of physics principles into the network architecture allows for the computation of optimization variables based on network weights and learned features, leading to competitive performance compared to state-of-the-art solvers. Furthermore, the use of automatic differentiation for training enhances scalability and reduces computation time, making RPINN a robust solution for various NOPT tasks. Experimental results covered two scenarios involving supervised and unsupervised datasets.
The uniform mixture model experiments (supervised constrained NOPT) show that the RPINN copes well with data variability and noisy samples. For noise-free data, both RPINN and the IPOPT solver achieve similar results due to the convex nature of the problem. However, in scenarios with noisy inputs, RPINN significantly outperforms IPOPT. The RPINN framework, leveraging the Huber loss function, shows greater robustness against noise by effectively regularizing the network weights. This results in more accurate and stable output predictions compared to IPOPT, which relies on an objective function based on the l2-norm and is more sensitive to outliers. The concentrated RPINN weight distributions show that the model captures the main output dynamics even in the presence of noise, as evidenced by the lower mean absolute percentage error across all signal-to-noise ratio values.
Then, the results of the gas-powered system (unsupervised constrained optimization) highlight the capability of the RPINN framework to effectively manage complex, nonlinear constraints under varying gas-demand conditions. Compared to IPOPT, the RPINN shows consistent performance with small changes in the mean absolute percentage error (MAPE), especially when the gas demand exceeds the source's maximum capacity. While IPOPT shows lower MAPE in terms of node balance and Weymouth constraints, its precision fluctuates significantly with data variability. In contrast, RPINN maintains stable performance, ensuring compliance with physical constraints such as the Weymouth equation and compression ratio limits. The custom penalty functions within RPINN facilitate this stability, proving particularly valuable when traditional methods struggle with outliers and extreme values. Overall, RPINN offers a robust, scalable solution with reduced prediction times.
As future work, the authors plan to include Bayesian hyperparameter optimization for RPINN fine-tuning [75]. We will also explore normalized and information-theoretic learning-based losses as ways to deal with noisy inputs and complicated constraints [76,77]. Finally, Bayesian PINNs and graph neural networks will be coupled with our RPINN to enhance representation learning [67,78].

Figure 2 .
Figure 2. Main sketch of the regularized physics-informed neural network for data-driven nonlinear constrained optimization.

Figure 4 .
Figure 4. Optimizing gas-powered systems. An eight-node gas network is studied. The diagram depicts the nodes as points, and the arrows indicate flow direction. The trapezoidal shapes represent the pressure compressors. Numbers represent each node within the gas network.

Figure 9
Figure 9. Gas-powered system regularized loss illustration. Left: node balance and Weymouth penalties based on the conventional Huber loss. Middle: compression factor limit constraint using our Huber-based enhancement (see Equation (10)). Right: gas-powered system custom penalty evolution (blue: Weymouth equality constraint; orange: compression ratio limit constraint; green: compression factor constraint).

Figure 10 .
Figure 10. Gas-powered system objective cost and constraint compliance MAPE results. Upper left: node balance. Upper right: Weymouth constraint. Bottom left: compression ratio constraint. Bottom right: cost difference (objective function) between RPINN and IPOPT.

Figure 12 .
Figure 12. RPINN vs. IPOPT computational cost results. The graph compares solution times for the test data between the classical technique (IPOPT, in blue) and our strategy (RPINN, in green). Training times are shown on the left, prediction times on the right.

Table 1 .
State-of-the-art solvers for optimization. (*) Except mixed-integer SDP. (**) Features available with the licensed version only.

Table 3 .
RPINN architecture details for the gas-powered system NOPT. R: batch size for AD-based back-propagation. Source switching, unsupply gas switching, custom dense, and bounded dense stand for specific switching, limited, and scaled layers, as explained in Section 4.2. Param. #: number of trainable parameters. Total # of parameters: 12,707 (49.67 KB).