Constrained Self-Adaptive Physics-Informed Neural Networks with ResNet Block-Enhanced Network Architecture

Abstract: Physics-informed neural networks (PINNs) have been widely adopted to solve partial differential equations (PDEs) and thereby simulate physical systems. However, the accuracy of PINNs often falls short of industrial requirements and degrades severely when the PDE solution has sharp transitions. In this paper, we propose a ResNet block-enhanced network architecture to better capture such transitions. Meanwhile, a constrained self-adaptive PINN (cSPINN) scheme is developed to shift the PINN's training focus toward areas of the physical domain that are difficult to learn. To demonstrate the performance of our method, we present numerical experiments on the Allen-Cahn equation, the Burgers equation, and the Helmholtz equation. We also show results for the Poisson equation on different geometries, illustrating the strong geometric adaptivity of cSPINNs. Finally, we report the performance of cSPINNs on a high-dimensional Poisson equation to further demonstrate the capability of our method.


Introduction
Deep learning has achieved breakthroughs in many scientific fields and impacts data analysis, decision making, and pattern recognition. Recently, deep learning methods have been applied to solving partial differential equations (PDEs), with physics-informed neural networks (PINNs) [1,2] being a prominent example. The main idea is to represent the solution of a PDE by a neural network and to optimize it under the constraint of a physics-informed loss computed via automatic differentiation (AD). In the last few years, PINNs have been employed to solve PDEs from different fields, including problems in mechanical engineering, geophysics [3], vascular fluid dynamics [4,5], and biomedicine [6].
To further enhance the accuracy and efficiency of PINNs, a series of extensions to the original formulation of Raissi et al. [1] have been proposed [7-11]. For example, on the data side, re-sampling methods adaptively change the distribution of residual points during training [7,8], which helps improve the accuracy of PINNs on stiff PDEs. The standard loss function in PINNs is the mean squared error (MSE), which is not always well suited to training [12,13]. In [9], an adjustment method among the different loss terms was proposed to mitigate gradient pathologies. Causal PINNs [10] solve time-dependent PDEs by means of an adaptive adjustment scheme for loss weights in the temporal domain. Meanwhile, regularization on the differential forms of PDEs has also been demonstrated to improve accuracy over the original PINNs [14]. In addition, the architecture [9,11,15,16] of a PINN greatly influences the final prediction, and some works [15,17] have focused on embedding methods, which are useful for feature enhancement and are also applicable to soft/hard boundary enforcement [18].
PINNs with fully connected neural networks are widely used to solve PDEs, as the derivatives appearing in the PDE can be computed directly by automatic differentiation (AD). Other architectures have also been used to solve PDEs, e.g., CNNs [19] and UNets [20]; however, these require a finite-difference approach to calculate the PDE derivatives. Bayesian neural networks (BNNs) [21] and generative adversarial networks (GANs) [22] have likewise been applied to PDE problems. Despite this variety, fully connected feed-forward networks remain the most common architecture in PINNs, and their hyper-parameters, such as depth, width, and the connection pattern between hidden layers, greatly influence the final results. In [16], adjusting the width and depth of fully connected neural networks (FCNNs) was shown to yield different PINN accuracies. In [11], a ResNet block was used to enhance the connections between hidden layers and performed better than FCNNs for parameter identification in the Navier-Stokes equations. In [9], a modified neural network was proposed that projects the input variables into a high-dimensional feature space and fuses them through an attention-like mechanism. All these cases show that the chosen architecture is essential to the final predictive results of PINNs.
Another line of work attaches a trainable weight to each individual residual point in the residual loss function and adaptively updates these pointwise weights during training [12]. As for the architecture, it influences the predicted results and is likewise essential for improving the accuracy of PINN methods. Our contributions in this paper are summarized below:

•	We develop a constrained self-adaptive physics-informed neural network (cSPINN), which achieves better accuracy in our numerical experiments; moreover, the dynamics of the residual weights change more steadily during training.
•	To better capture solutions with sharp transitions in the physical domain, we develop a ResNet block-enhanced modified MLP architecture, whose identity mappings also help tackle the vanishing-gradient problem, even for deep architectures.

Related Works
In this section, we first introduce the model problem, and then provide a brief overview of physics-informed neural networks (PINNs) for solving forward partial differential equations (PDEs).

Model Problem
In this subsection, we introduce the model problem. Given the spatial domain Ω and the temporal domain t ∈ [0, T], we consider a partial differential equation (PDE) of the general form

N_t[u](x, t) + N_x[u](x, t) = 0, x ∈ Ω, t ∈ (0, T],
u(x, 0) = u_0(x), x ∈ Ω,
B[u](x, t) = g(x, t), x ∈ ∂Ω, t ∈ (0, T],

where N_t[·] and N_x[·] are general differential operators that may include any combination of linear and non-linear temporal and spatial derivatives, u_0(x) gives the initial condition at t = 0, and B[·] is a boundary operator, which could encode Dirichlet, Neumann, Robin, or periodic boundary conditions, enforcing the condition g(x, t) on the boundary ∂Ω.

PINNs Formulation
To solve the PDE via PINNs [1], we construct a neural network û(x, t; w) with trainable parameters w, taking the spatial coordinate x ∈ Ω and the time t ∈ [0, T] as inputs, to approximate the solution u(x, t). We then train a physics-informed model by minimizing the loss function

L(w) = λ_r L_r(w) + λ_ic L_ic(w) + λ_bc L_bc(w),

where

L_r(w) = (1/N_r) Σ_{i=1}^{N_r} |N_t[u_w](x_r^i, t_r^i) + N_x[u_w](x_r^i, t_r^i)|²,
L_ic(w) = (1/N_ic) Σ_{i=1}^{N_ic} |u_w(x_ic^i, 0) − u_0(x_ic^i)|²,
L_bc(w) = (1/N_bc) Σ_{i=1}^{N_bc} |B[u_w](x_bc^i, t_bc^i) − g(x_bc^i, t_bc^i)|².

Here, L_r, L_ic, and L_bc are the loss terms for the PDE residual in the physical domain, the initial condition, and the boundary condition, respectively, and u_w denotes the output of the neural network parameterized by w. The weights λ_r, λ_ic, and λ_bc influence the convergence rates of the different loss components and the final accuracy of PINNs [9,12]; hence, an appropriate weighting strategy during training is important. {(x_r^i, t_r^i)}_{i=1}^{N_r}, {x_ic^i}_{i=1}^{N_ic}, and {(x_bc^i, t_bc^i)}_{i=1}^{N_bc} are the training points inside the domain, on the initial slice, and on the boundary, respectively.
In this paper, we use the relative L_2 error between the PINNs' prediction and the reference solution, defined as

error = ( Σ_{k=1}^{N} |û(x_k, t_k) − u(x_k, t_k)|² )^{1/2} / ( Σ_{k=1}^{N} |u(x_k, t_k)|² )^{1/2},

where u(x_k, t_k) is the reference solution and û(x_k, t_k) is the neural network prediction at a set of testing points {(x_k, t_k)}_{k=1}^{N} ⊂ Ω × (0, T].
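For concreteness, the relative L_2 error above can be computed in a few lines (a minimal NumPy sketch; the function name is ours):

```python
import numpy as np

def relative_l2_error(u_pred, u_ref):
    """Relative L2 error between a prediction and a reference solution,
    both evaluated at the same set of testing points."""
    u_pred, u_ref = np.asarray(u_pred), np.asarray(u_ref)
    return np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref)
```

An exact prediction yields an error of 0, and a prediction uniformly off by 1% of the reference yields an error of 0.01.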

Formulation of Constrained Self-Adaptive PINNs (cSPINNs)
In this section, we first present the constrained self-adaptive weighting scheme for PINNs, which adaptively adjusts the weights of residual points during training. Next, we propose a modified network architecture enhanced by ResNet blocks to further improve the performance of cSPINNs.

Constrained Self-Adaptive Weighting Scheme
In PINNs, the residual loss L_r(w) enforces the governing equation at the sample points inside the domain, i.e., {(x_r^i, t_r^i)}_{i=1}^{N_r}. In this formulation, however, every residual point carries equal weight, so PINNs cannot focus on the areas that are difficult to learn during training (as shown in Figure 1). One effective remedy is to attach an individual adjustable weight to each residual point according to the distribution of the residual in the physical domain, automatically raising the weights of inner points with relatively high loss values. We formulate this self-adaptive adjustment as the min-max optimization problem

min_w max_{λ̂_r} L(w, λ_r, λ_ic, λ_bc, λ̂_r)  subject to  Σ_{i=1}^{N_r} λ̂_r^i = C,

where λ̂_r = (λ̂_r^1, ..., λ̂_r^{N_r}) are the pointwise weights and C is a constant that constrains their range. Here, we set C to the expected total weight in standard PINNs, i.e., C = E(Σ_{i=1}^{N_r} λ̂_r^i) = N_r. The loss function is

L(w, λ_r, λ_ic, λ_bc, λ̂_r) = λ_r L_r(w, λ̂_r) + λ_ic L_ic(w) + λ_bc L_bc(w),

where L_r(w, λ̂_r) = (1/N_r) Σ_{i=1}^{N_r} λ̂_r^i |N_t[u_w](x_r^i, t_r^i) + N_x[u_w](x_r^i, t_r^i)|² is the weighted residual loss and the other terms are the same as in the original formulation. The inner optimization of the min-max problem above could be solved trivially by selecting the residual point with the largest residual loss, giving it weight C, and setting the weights of all other points to zero. With such a strategy, however, PINNs would optimize only a single point in each training iteration, which defeats the balancing among different residual points. In [12], McClenny et al. proposed a self-adaptive method that approximates the inner maximization with a single gradient-ascent step, so that different residual points receive appropriate weights during training. They update λ̂_r during training as

λ̂_r^{k+1} = λ̂_r^k + η_k ∇_{λ̂_r} L(w, λ_r, λ_ic, λ_bc, λ̂_r^k),

where η_k is the learning rate at iteration k.
However, λ̂_r^k is an unbounded weight vector during training, which means the training of PINNs can become unstable due to rapidly changing weights. We therefore modify the updating rule for λ̂_r^k as

λ̃_r^{k+1} = λ̂_r^k + η_k ∇_{λ̂_r} L(w, λ_r, λ_ic, λ_bc, λ̂_r^k),
λ̂_r^{k+1,i} = N_r λ̃_r^{k+1,i} / Σ_{j=1}^{N_r} λ̃_r^{k+1,j}, i = 1, ..., N_r,

where r_i denotes the i-th residual point in {(x_r^i, t_r^i)}_{i=1}^{N_r}, k and k + 1 are training iteration numbers, and λ̃_r^{k+1} is an intermediate variable before normalization. In other words, we first take the gradient-ascent step and then normalize λ̃_r^{k+1} so that the weights sum to N_r, keeping them bounded throughout training.
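The two-step update above, a gradient-ascent step followed by a normalization that pins the total weight at N_r, can be sketched in NumPy (function and variable names are ours; in a real PINN, the gradient of the weighted residual loss with respect to a weight is proportional to that point's squared residual):

```python
import numpy as np

def csa_weight_update(lam, grad, eta):
    """One constrained self-adaptive update of the pointwise residual weights.

    lam  : current weights, shape (N_r,), positive and summing to N_r
    grad : gradient of the weighted residual loss w.r.t. the weights
           (non-negative for a squared-residual loss)
    eta  : learning rate eta_k of the ascent step
    """
    lam_tilde = lam + eta * grad                   # gradient-ascent step (unbounded)
    return len(lam) * lam_tilde / lam_tilde.sum()  # normalize so the weights sum to N_r
```

Points with larger residuals end up with larger weights, while the mean weight stays at 1, so the overall scale of the residual loss is preserved.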

ResNet Block-Enhanced Modified Network
Continuous improvement of network architecture design is one of the drivers of the development of deep learning methods and their applications. For example, convolutional neural networks [23-25] were designed for, and have been widely used in, computer vision tasks such as image classification, image segmentation, and object recognition. Similarly, recurrent neural networks and their variants [26-28] show great performance in natural language processing and sequential modeling because of their ability to capture long-term dependencies in sequences. In [9], a modified MLP framework was proposed to correctly capture the solutions of complex PDEs, and it has been widely used in many cases. PINNs with ResNet blocks have also been used to improve the representational capacity of the network when solving PDEs. Inspired by these ideas, we propose a ResNet block-enhanced modified MLP framework to better represent the solution of PDEs:

U = φ(XW_U + b_U), V = φ(XW_V + b_V),
H^(1) = φ(XW_0 + b_0),
Z^(k) = H^(k) + F(H^(k)), k = 1, ..., L,
H^(k+1) = (1 − Z^(k)) ⊙ U + Z^(k) ⊙ V, k = 1, ..., L,
f = H^(L+1) W_o + b_o,

where φ is an activation function, ⊙ denotes element-wise multiplication, X is the network input, and f is the final output of the network. The updating rule of Z^(k) is similar to residual learning, which was first proposed in [23] and achieved great success. More specifically, H^(k) and Z^(k) are the input and output of the ResNet block, respectively, and F is an operation consisting of fully connected layers and activation functions, defined as F(X) := φ(φ(XW_{1,k} + b_{1,k})W_{2,k} + b_{2,k}), with input X, hidden-layer parameters {W_{1,k}, b_{1,k}, W_{2,k}, b_{2,k}}, and activation function φ. The element-wise addition H^(k) + F(H^(k)) is carried out by a shortcut connection. The effectiveness of such a block in PINNs was demonstrated in [11], and here we use it as a feature-enhancing sub-structure of our network.
The features are fused by a shortcut connection and fed into the hidden-layer update through element-wise multiplication with U and V, as shown in Figure 3. Compared to plain fully connected neural networks, our architecture enhances the representative ability of the hidden layers via ResNet blocks, which makes it easier for the network to learn the desired solution. Meanwhile, the embedding of the inputs from the low-dimensional space into a higher-dimensional feature space can also be viewed as being fused through an attention mechanism during the forward pass.
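To make the data flow concrete, here is a minimal NumPy sketch of one forward pass through the ResNet block-enhanced modified network (weights are freshly initialized inside the function purely for illustration; in practice they would be trainable parameters in a deep learning framework, and the initialization scheme is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = np.tanh  # activation function used throughout the paper

def dense(d_in, d_out):
    # Xavier-style initialization; a sketch, not a tuned scheme
    W = rng.normal(0.0, np.sqrt(2.0 / (d_in + d_out)), (d_in, d_out))
    return W, np.zeros(d_out)

def forward(X, n_blocks=4, width=64):
    (Wu, bu), (Wv, bv), (Wh, bh) = dense(X.shape[1], width), dense(X.shape[1], width), dense(X.shape[1], width)
    U, V = phi(X @ Wu + bu), phi(X @ Wv + bv)  # two feature embeddings of the input
    H = phi(X @ Wh + bh)                       # first hidden state H^(1)
    for _ in range(n_blocks):
        (W1, b1), (W2, b2) = dense(width, width), dense(width, width)
        F = phi(phi(H @ W1 + b1) @ W2 + b2)    # F(H): two dense layers
        Z = H + F                              # ResNet shortcut: Z^(k) = H^(k) + F(H^(k))
        H = (1.0 - Z) * U + Z * V              # fuse with U, V by element-wise multiplication
    Wo, bo = dense(width, 1)
    return H @ Wo + bo                         # linear output layer
```

The block count and width here match the experimental setup described below (4 ResNet blocks of width 64).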

Numerical Experiments
We demonstrate the performance of the proposed cSPINNs in solving several PDE problems. In all of the examples, we used the ResNet block-enhanced modified network with the tanh function as the activation φ. The architecture had 2 input neurons and consisted of 4 ResNet blocks, each with a width of 64. The output layer contained a single neuron for the solution of the PDE.

1D Allen-Cahn Equation
The Allen-Cahn equation is a stiff PDE whose solution exhibits sharp transitions in space and time. The 1D Allen-Cahn problem reads

u_t − 0.0001u_xx + 5u³ − 5u = 0, x ∈ [−1, 1], t ∈ [0, 1],
u(x, 0) = x² cos(πx),
u(−1, t) = u(1, t), u_x(−1, t) = u_x(1, t).

We used the same physical parameters of the Allen-Cahn equation as in [7] to better compare the results, and the ResNet block-enhanced modified network architecture described above to better fit the sharp transition. To implement cSPINNs for the Allen-Cahn equation, the loss function comprised:
•	the constrained self-adaptive loss for the residual of the governing equation;
•	the mean squared loss on the initial condition;
•	the mean squared loss on the (periodic) boundary conditions;
where û is the prediction of the neural network. We sampled N_r = 25,600 residual points, N_b = 400 boundary points, and N_ic = 512 points on the initial condition.
Here, we used the Adam optimizer for 10,000 epochs followed by the L-BFGS optimizer for 1000 epochs to train the network. During training, we set the boundary and initial weights w_b = w_i = 100, which helped expedite convergence. Figure 4 shows the numerical results of constrained self-adaptive PINNs (cSPINNs) compared with the reference solution obtained by the Chebfun method [29], together with the training loss history. The relative L_2 error was 1.472 × 10^−2, which is better than the time-adaptive approach of [7] and the original PINNs [1].
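The pointwise residual that feeds the constrained self-adaptive loss is simply the left-hand side of the Allen-Cahn equation evaluated at the network output. A small sketch (with the derivative values passed in explicitly, as automatic differentiation would supply them in a PINN): the constant states u = −1, 0, 1 are equilibria of the equation, so their residual vanishes.

```python
def allen_cahn_residual(u, u_t, u_xx):
    """Pointwise residual r = u_t - 0.0001*u_xx + 5*u^3 - 5*u of the 1D Allen-Cahn equation."""
    return u_t - 1e-4 * u_xx + 5.0 * u**3 - 5.0 * u

# the constant equilibrium states have zero residual
for u in (-1.0, 0.0, 1.0):
    assert allen_cahn_residual(u, 0.0, 0.0) == 0.0
```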

1D Viscous Burgers' Equation
The viscous Burgers equation is widely used in various areas of applied mathematics, such as fluid mechanics, traffic flow, and gas dynamics. The 1D viscous Burgers problem reads

u_t + uu_x − (0.01/π)u_xx = 0, x ∈ [−1, 1], t ∈ [0, 1],
u(x, 0) = −sin(πx),
u(−1, t) = u(1, t) = 0.

To better compare the results, we used the same physical parameters of the Burgers equation as in [1]. To implement the cSPINN scheme for the Burgers equation, the modified residual loss described in the formulation of cSPINNs was used, with the following terms:
•	the constrained self-adaptive loss for the residual of the governing equation;
•	the mean squared loss on the initial condition;
•	the mean squared loss on the boundary conditions.
Here, we trained the network with the constrained self-adaptive scheme, and û is the prediction of the neural network. In this case, we sampled N_r = 25,600 residual points, N_b = 256 boundary points, and N_ic = 512 points on the initial condition. We set the weights of the initial and boundary conditions to λ_ic = λ_bc = 100. Training was performed with 10,000 Adam iterations and 1000 L-BFGS epochs. The predicted solution of cSPINNs and the loss history are shown in Figure 5. Despite a sharp transition in the center of the domain, the cSPINN solution remained accurate in the whole domain, yielding a relative L_2 error of 4.796 × 10^−4.
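Analogously, the Burgers residual entering the weighted loss is the PDE's left-hand side; a minimal sketch with the derivative values passed in explicitly:

```python
import numpy as np

NU = 0.01 / np.pi  # viscosity used in [1]

def burgers_residual(u, u_t, u_x, u_xx):
    """Pointwise residual r = u_t + u*u_x - (0.01/pi)*u_xx of the 1D viscous Burgers equation."""
    return u_t + u * u_x - NU * u_xx
```

Any constant state is a steady solution, so its residual is zero; in a PINN, u_t, u_x, and u_xx come from automatic differentiation of the network output.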

2D Helmholtz Equation
The Helmholtz equation is widely used to describe wave propagation and can be mathematically formulated as follows:

∆u(x, y) + k²u(x, y) − q(x, y) = 0, (x, y) ∈ Ω,
u(x, y) = 0, (x, y) ∈ ∂Ω,

where

q(x, y) = −(a_1π)² sin(a_1πx) sin(a_2πy) − (a_2π)² sin(a_1πx) sin(a_2πy) + k² sin(a_1πx) sin(a_2πy)

is a forcing term that results in the closed-form analytical solution

u(x, y) = sin(a_1πx) sin(a_2πy).

The exact solution above is the same as in [9], which helps us better compare the results.

The loss consisted of:
•	the constrained self-adaptive loss for the residual of the governing equation;
•	the mean squared loss on the boundary conditions.
Here, we solved the problem with a_1 = 1 and a_2 = 4 to allow a direct comparison with the results reported in [9]. The ResNet block-enhanced modified network was trained with 10,000 Adam epochs and 1000 L-BFGS epochs. As for the training points, we sampled N_r = 10,000 residual points and N_b = 400 boundary points (100 per boundary). The prediction results of the cSPINNs are shown in Figure 6. We achieved a relative L_2 error of 1.626 × 10^−3, which exhibits better performance than the learning-rate annealing weighting scheme proposed in [9] and the self-adaptive PINNs proposed in [12]. Meanwhile, our method required less computational cost, owing to the stability of the designed self-adaptive weights.
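The manufactured Helmholtz solution can be verified numerically. In the sketch below (our own check, with a central finite-difference Laplacian standing in for automatic differentiation), the residual of the analytical solution with a_1 = 1, a_2 = 4, and k = 1 vanishes up to discretization error:

```python
import numpy as np

a1, a2, k = 1.0, 4.0, 1.0

def u(x, y):
    return np.sin(a1 * np.pi * x) * np.sin(a2 * np.pi * y)

def q(x, y):
    # forcing term chosen so that u above solves  Δu + k^2 u - q = 0
    return (-(a1 * np.pi)**2 - (a2 * np.pi)**2 + k**2) * u(x, y)

def helmholtz_residual(x, y, h=1e-4):
    # second-order central differences approximate the Laplacian
    lap = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4.0 * u(x, y)) / h**2
    return lap + k**2 * u(x, y) - q(x, y)
```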

2D Poisson Equation on Different Geometries
Poisson's equation is an elliptic partial differential equation widely used in the description of potential fields. The 2D Poisson problem can be written as

−∆u(x, y) = f(x, y), (x, y) ∈ Ω,
u(x, y) = g(x, y), (x, y) ∈ ∂Ω.

To further demonstrate the performance of cSPINNs, we used the periodic exact solution

u(x, y) = (1/(2(4π)²)) sin(4πx) sin(4πy)

and obtained f(x, y) and g(x, y) directly from it:

f(x, y) = sin(4πx) sin(4πy), g(x, y) = (1/(2(4π)²)) sin(4πx) sin(4πy) on ∂Ω.

We then had the following loss terms:
•	the constrained self-adaptive loss for the residual of the governing equation, with pointwise residual R(x_r^i, y_r^i) := −∆û(x_r^i, y_r^i) − sin(4πx_r^i) sin(4πy_r^i), (x_r^i, y_r^i) ∈ Ω;
•	the mean squared loss on the boundary conditions (taking the rectangular domain as an example).

In this case, we first tested the performance of cSPINNs on a rectangular domain Ω_1 = [0, 0.25] × [0, 0.25]. We sampled N_r = 10,000 residual points in the inner domain and N_b = 1000 points distributed on the boundary. The ResNet block-enhanced modified network was trained with 10,000 Adam epochs and 1000 L-BFGS epochs. Meanwhile, different geometries, including circular, triangular, and pentagonal domains, were also tested to demonstrate the advantages of cSPINNs. The L_2 errors between the predicted and reference solutions on the different geometries are shown in Table 1. It is worth noting that we magnified the loss value by a constant c = 10,000, because the true values in the solution are relatively small (the maximum value of the exact solution is about 0.003), to ensure a normal gradient flow during training. For the irregular domains, we sampled the same number of points as for the rectangular domain. We found that cSPINNs achieved good performance on this problem, as shown in Figure 7, whereas the original PINNs failed, as shown in Figure 1. Moreover, we provide a comparison between cSPINNs and the reference solution on the domain Ω_2 = [0, 1] × [0, 1], which is hard for PINNs to solve due to the high frequency of the solution, as shown in Figure 8.
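Because cSPINNs are mesh-free, adapting to a new geometry only requires sampling residual points inside it and boundary points on its boundary. As an illustrative sketch (the disk and its parameters are our own example, not the paper's exact setup), rejection sampling from a bounding box handles arbitrary shapes:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_in_disk(n, cx=0.5, cy=0.5, r=0.5):
    """Uniformly sample n residual points inside a disk by rejection
    sampling from its bounding box; any geometry with an inside test works."""
    pts = []
    while len(pts) < n:
        x = rng.uniform(cx - r, cx + r)
        y = rng.uniform(cy - r, cy + r)
        if (x - cx)**2 + (y - cy)**2 <= r**2:  # inside test for the disk
            pts.append((x, y))
    return np.array(pts)
```

Swapping the inside test is all that changes between rectangular, circular, triangular, and pentagonal domains.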
We show the relative L_2 errors between the predicted and exact solutions u(x, y) obtained by cSPINNs on the different geometries in Table 2. To further test the performance of cSPINNs, we provide numerical results on the L-shaped domain, a classic concave geometry. In this case, we set f(x, y) = 1 and g(x, y) = 0 to allow a direct comparison with PINNs, as in [30]. The loss terms were as follows:
•	the constrained self-adaptive loss for the residual of the governing equation;
•	the mean squared loss on the boundary condition.
We tested the performance of cSPINNs on the L-shaped domain and show the results in Figure 9. The maximum point-wise error was about 6 × 10^−3 and the relative L_2 error was 4.257 × 10^−3. In [30], PINNs achieved accurate results with a maximum point-wise error of about 0.02 on the same L-shaped domain; in [31], hp-VPINNs were also tested on this case and likewise achieved a maximum point-wise error of about 0.02. Therefore, cSPINNs performed well even on such a concave geometry.
10D Poisson Equation

Finally, we tested cSPINNs on a 10D Poisson problem. The exact solution of this problem is u(x) = Σ_{k=1}^{5} x_{2k−1} x_{2k}, and we computed the error of cSPINN against this exact solution.
•	Mean squared loss on the boundary condition.
We computed the relative L_2 error between the cSPINN solution and the exact solution: it was 1.028 × 10^−3, smaller than that of the Deep Ritz method [32] (about 0.4%). In this case, we sampled N_r = 1000 residual points in the inner domain and N_b = 100 points distributed on the boundary. The ResNet block-enhanced modified network was trained with 20,000 Adam epochs and 1000 L-BFGS epochs. The training loss history of cSPINN is shown in Figure 10. Finally, we also provide the relative computational cost of cSPINNs versus PINNs in the different cases in Table 3. When measuring the cost, we configured cSPINNs and PINNs with the same network depth, width, and number of training epochs for a fair comparison.

At the end of this section, we also tested the impact of the following three architectures: the multilayer perceptron (MLP), the modified multilayer perceptron (MMLP), and the ResNet block-enhanced modified network (ResMNet). During the test, the depth and width of all networks were fixed at 6 and 128, respectively. We also provide results demonstrating the effectiveness of our proposed constrained self-adaptive weighting scheme (cSA) compared to the plain L_2 loss function. As can be seen in Table 4, the ResMNet yielded the highest accuracy compared to the MLP and the MMLP. Therefore, both the constrained self-adaptive weighting scheme (cSA) and the ResNet block-enhanced modified network (ResMNet) are desirable components of cSPINNs.
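A useful observation about the 10D benchmark above: each term x_{2k−1}x_{2k} of the exact solution is harmonic, so u satisfies Δu = 0 and the problem amounts to a Laplace equation with Dirichlet boundary data taken from u. A quick finite-difference check (our own sanity test, not part of the paper's pipeline):

```python
import numpy as np

def u_exact(x):
    """u(x) = sum_{k=1}^{5} x_{2k-1} * x_{2k} for a 10-dimensional input x (0-based indexing)."""
    return sum(x[2 * k] * x[2 * k + 1] for k in range(5))

def laplacian_fd(x, h=1e-3):
    # central second differences along each of the 10 coordinates
    lap = 0.0
    for i in range(10):
        e = np.zeros(10)
        e[i] = h
        lap += (u_exact(x + e) - 2.0 * u_exact(x) + u_exact(x - e)) / h**2
    return lap
```

Since u is bilinear in each coordinate pair, the central differences are exact up to round-off, and the computed Laplacian is numerically zero at any point.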

Conclusions and Future Work
In this paper, we proposed constrained self-adaptive PINNs (cSPINNs), which adaptively adjust the weights of individual residual points and remain more robust during training due to the bounded weights. Meanwhile, a ResNet block-enhanced modified neural network was proposed to enhance the predictive ability of PINNs.
We demonstrated the effectiveness of our method on various PDEs, including the Allen-Cahn equation, the Burgers equation, the Poisson equation, and the Helmholtz equation. Our method showed good performance in all the cases considered and outperformed PINNs, especially on the Poisson equation with a periodic solution, regardless of the geometry of the computational domain. Even with sharp transitions in the physical domain, cSPINNs remained robust when solving the Allen-Cahn equation, which is difficult for the original PINNs. Compared with PINNs, cSPINNs improve accuracy and can be implemented in just a few lines of code, which makes it possible to combine our method with other PINN extensions to further improve performance. The constrained self-adaptive weighting scheme attaches higher weights to difficult-to-learn regions during training, making it possible to solve complicated problems. In this paper, we also provided numerical results of cSPINNs for the 10D Poisson equation and achieved better performance than the Deep Ritz method. In the future, we will further generalize cSPINNs to solve higher-dimensional PDEs and multi-physics problems.