Dynamic Weight Strategy of Physics-Informed Neural Networks for the 2D Navier–Stokes Equations

When PINNs solve the Navier–Stokes equations, the loss function has a gradient imbalance problem during training. It is one of the reasons why the efficiency of PINNs is limited. This paper proposes a novel method of adaptively adjusting the weights of loss terms, which can balance the gradients of each loss term during training. The weight is updated by the idea of the minmax algorithm. The neural network identifies which types of training data are harder to train and forces it to focus on those data before training the next step. Specifically, it adjusts the weight of the data that are difficult to train to maximize the objective function. On this basis, one can adjust the network parameters to minimize the objective function and do this alternately until the objective function converges. We demonstrate that the dynamic weights are monotonically non-decreasing and convergent during training. This method can not only accelerate the convergence of the loss, but also reduce the generalization error, and the computational efficiency outperformed other state-of-the-art PINNs algorithms. The validity of the method is verified by solving the forward and inverse problems of the Navier–Stokes equation.


Introduction
Numerical simulation of fluid systems relies on solving partial differential equations using computational fluid dynamics (CFD) methods, including finite element, finite volume and finite difference methods [1][2][3]. However, it is usually expensive and time-consuming to generate meshes for solving equations in complex regions. Solving the inverse problem by CFD methods first requires tedious data assimilation and does not guarantee convergence [4]. Therefore, the use of CFD models in practical applications and real-time predictions is limited. It is important to develop an effective Navier-Stokes solver that can overcome these limitations.
In recent years, deep neural networks have received extensive attention in the field of scientific machine learning [5]. It can be used to construct new methods for solving partial differential equations, based on their well-known capability as universal function approximators [6]. The process of analyzing complex physical and engineering systems usually requires a large amount of data. The cost of data acquisition is often prohibitive, and we inevitably face the challenge of making decisions with partial information. When the governing equation is known, a neural network trained under the constraints of this equation can learn the solution of the partial differential equation with a small amount of data. The potential of using neural networks to solve partial differential equations has been recognized. With huge advances in computational power and training algorithms [7], and the invention of automatic differentiation methods [8], physics-informed neural networks(PINNs) were able to take this approach to a different level [9]. PINNs does not require mesh generation to solve fluid mechanics problems [10][11][12][13], so it has advantages for solving problems in complex regions. This is a good strategy to tackle the problem of the curse of dimensionality.
The baseline PINNs has provided new research ideas for solving Navier-Stokes equations, stochastic PDEs, and fractional PDEs [14][15][16][17]. However, it has been argued that the convergence and accuracy of PINNs are still of tremendous challenge, especially for Navier-Stokes with multi-scale characteristics. Wang [18] explored the problems existing in the training process of PINNs, and found that the gradient value of each residual term in the loss was greatly different in the process of backpropagation, and the training could not be balanced. Therefore, a method of adaptively adjusting the weights between different components in the loss is proposed to improve the training convergence. The problems of gradient vanishing and gradient explosion limit the application of this method. Zhao et al. also observed that the performance of PINNs is closely related to the appropriate combination of loss terms [10]. It introduces a non-adaptive weighting strategy and time adaptive strategy of loss function. However, using fixed weights is always time-consuming, labor-intensive, and prone to errors and omissions. If gradient descent optimizes multiple objectives consisting of fixed weights, there is a high probability of obtaining a local optimal solution [19]. Ref. [20] introduced adaptive weights for configuration and boundary losses, which are updated through the Neural Tangent Kernel (NTK). The distribution of the eigenvalues of NTK does not change, and it is more difficult to calculate the eigenvalues of NTK. The performance improvement is slight. A method that updates the adaptation weights of configuration points in the loss function concerning the network parameters was suggested [21]. It performs well for the AC equation. For other equations, the objective function has a convergence problem due to the existence of the Max system. A strategy for adaptive resampling of configuration points is proposed to improve the convergence and accuracy of some partial differential equations with steep gradient solutions [22]. After each resampling step, the number of residual points grows, increasing computational complexity.
This paper introduces a dynamic weight strategy for physics-informed neural networks(dwPINNs) to balance the contribution of each loss item to the network. The mechanism of weight update in this paper is completely different from other PINNs literature. It uses trainable weights as a soft multiplicative mask reminiscent of the attention mechanism used in computer vision [23]. Adaptive weights are trained simultaneously with network parameters, and the data are automatically weighted in the loss function, forcing the approximation of these data to improve. This is achieved by training the network to minimize loss and maximize weights. The weight update is based on the information of mean square error. This strategy is conducive to the convergence of loss and can reduce the generalization error. This paper is organized as follows: Section 1 is the Introduction, and Section 2 gives a brief discussion of the effect of the loss terms on the network parameters in the optimization method. In Section 3 we discuss the construction of the loss function and some theoretical results on the convergence of weight sequence in the proposed method. Section 4 gives the results and detailed discussions on dwPINNs solving the Navier-Stokes. Finally, in Section 5, we summarize the conclusions of our work.

Partial Differential Equations
The incompressible Navier-Stokes equations is: where u(t, x) = (u(t, x), v(t, x)) is a velocity vector field, p is a scalar pressure field. λ 1 , λ 2 are the unknown parameters, and Ω is the solution domain. Γ D is the boundaries of the computational domain. Formula (1a) is the Momentum equation, Formula (1b) is the conservation of mass equation, Formula (1c) is Initial conditions, and Formula (1d) is Dirichlet boundary conditions. In this paper, we shall solve both forward problems where solutions of partial differential equations are inferred given fixed model parameters λ 1 , λ 2 as well as inverse problems, where the unknown parameters λ 1 , λ 2 are learned from the observed data.

Fully Connected Neural Networks
Neural networks have well-known function approximation properties [24], so they can be used to approximate the solution of partial differential equations. The (L − 1) -hidden layer feed-forward neural network is defined by and Z 1 (x) = W 1 x + b 1 , where in the last layer, the activation function is identity. By letting Θ = (W k , b k ) as the collection of all weights, biases. Φ is a nonlinear activation function. We can write the output of the neural network as where Z L (x;Θ) emphasizes the dependence of the neural network output Z L (x) onΘ. The loss function can be defined as Here, x i u , u i N u i=1 denote the training data on u(x). N u is the number of training data. The training process requires the loss function minimized with respect to the weights and biases in each network layer. Fitting in the sense of least squares requires the minimum mean square error between the fitting function and original data, and it is an approximation in the overall sense.

Optimization Method
We seek to findΘ * that minimizes the loss function L Θ . There are several optimization algorithms available to minimize the loss function. In general, a gradient-based optimization method is employed for the training of parameters [25]. Weights and biases are initialized from known probability distributions. In the basic form, given an initial value of parametersΘ using Xavier initialization, the parameters are updated as where η l is the learning rate.Θ m is the network parameter for step m. Next, we illustrate the effect of the error on the gradient in backpropagation. The gradient of the loss function to the network parameters in the backpropagation algorithm [26] is as follows ∂L ∂L ∂b (l) = δ l .
(2d) represents the Hadamard product; n l represents the number of neurons in layer l; f (·) represents the activation function of the neuron; W (l) ∈ R n l ×n l−1 represents the weight ∈ R n l represents the output of neurons in the l layer; ∈ R n l represents the activation value of neurons in the l layer. Formula (2c) is the gradient of the loss function to the weight of the output layer and the hidden layer. Formula (2d) is the gradient of the loss function to the bias of the output layer and the hidden layer. We calculate the δ (l) of the l layer through the δ (l+1) of the l + 1 layer, and then obtain the gradient of the weight of the hidden layer by the Formula (2c).
The error has a great influence on the backpropagation gradient, which can be obtained from (2). When the objective function consists of many terms, it always tends to optimize the loss term with a larger error. Formulas (2b) and (2d)shows that the more hidden layers in the network, the easier the problem of gradient disappearance and explosion. Therefore, the network structure used in the experiment is a wide and shallow structure, and the number of hidden layers does not exceed five layers.

Methodology
The combination of multiple loss functions plays a significant role in the convergence of PINNs [10]. The most common way to combine losses of each constraint is the weighted summation. These are either nonadaptive or require training many times at an increased computational cost. Here, we propose a simple procedure using fully trainable weights. It is in line with the idea of neural network adaptation, that is, the dynamic weights in the loss function are updated together with network parameters through backpropagation.

Dynamic Weights Strategy for Physics-Informed Neural Networks
We define residuals to be given by the left-hand-side of Equations (1a) and (1b); i.e., and proceed by approximating u(t, x) by neural networks. This assumption, along with Equation (2b) results in physical constraints f 1 , f 2 . f 1 indicates that the numerical solution satisfies the conservation of momentum, and f 2 indicates that the numerical solution satisfies the conservation of mass. The physical constraints of the network can be derived by applying the chain rule for differentiating compositions of functions using automatic differentiation. In order to balance the training of the residuals in each part of the loss, we multiply the trainable weights before each residual term of the PINNs loss function. The objective function is defined as follows where , and Here, w u , w 0 , w b , w f are the newly introduced balance parameters.
denote the initial and boundary training data on u(t, x) and specify the collocations points for f 1 (t, x), f 2 (t, x). N u , N 0 , N b , N f is the number of corresponding data. The sampling method of initial and boundary training data is random selection at the corresponding boundary. The selection method of the collocations points is Latin hypercube sampling. We determine unknown parameters by performing the following tasks This can be accomplished by a gradient descent/ascent procedure, with updates given byΘ where η k is the learning rate for the kth step in the process of updating the network parameters, η k w is the learning rate for the kth step in the process of updating the balance parameters. Considering the dynamic weight w 0 , to fix ideas, we see that The sequence of weights w k 0 ; k = 1, 2 . . . is monotonically nondecreasing, provided that w 1 0 is initialized to a non-negative value. Furthermore, (7) shows that the magnitude of the gradient, and therefore of the update, is larger when the mean squared error MSE k 0 is large. This progressively penalizes the network more for not fitting the initial points closely. Notice that any of the weights can be set to fixed, non-trainable values if desired. For example, by setting w k b ≡ 1, only the weights of the initial and collocation points would be trained. If necessary, the weight can also be changed to other types of functions. The convergence of the weight sequence plays an important role in the stability of the Min system. Next, we prove the convergence of w k 0 . From (6), we can obtain According to (7) Therefore w k+1 . According to the principle of compression mapping, {w k 0 } is convergent. {w k 0 } has an upper bound. The analysis for w k u ,w k b ,w k f is the same as w k 0 . (8), it can be seen that the convergence of the sequence {w k 0 } depends on the monotonous decrease in the MSE k 0 . The mean square error is theoretically monotonically decreased, so the weight sequence is theoretically convergent. However, in actual training, the mean square error is not strictly monotonically decreasing, which may have a little adverse effect on the performance of our method.

Remark 1. From Formula
In order to overcome the above problems as much as possible, Algorithm 1 is adopted. To strengthen the condition of balancing weight update, we update the weight only when the condition MSE k+1 0 < MSE k 0 is satisfied. As far as possible, to make the weight sequence meet the convergence conditions, reducing the fluctuation of MSE has an adverse effect on our method. In our implementation of dwPINNs, we use Tensorflow with a fixed number of iterations of Adam followed by another fixed number of iterations of the L-BFGS quasi-newton method [27,28]. This is consistent with the PINNs formulation in [9], as well as follow-up literature [11]. The adaptive weights are only updated in the Adam training steps and are held constant during L-BFGS training. The dwPINNs algorithm is summarized.

Algorithm 1: Dynamic weights strategy for PINNs
Step 1: Set Training steps K, the learning rate η, η w , initial values balance weights w = w u , w 0 , w b , w f and neural network parametersΘ.
Step 2: Consider a physics-informed neural network to define the weighted loss function J Θ , w u , w 0 , w b , w f based on (4).
Step 3: Then use K steps of a gradient descent algorithm to update the parameters w andΘ as: Tune the balance weights w via Adam to maximize the meeting constraints w k+1 ← Adam1 J w k ;Θ; K ; η; η w Update the parametersΘ via Adam to minimize J Θ k+1 ← Adam2 J w k ;Θ k ; K ; η; η w end for

A Brief Note on the Errors Involved in the dwPINNs Methodology
Let F and u be the family of functions that can be represented by the chosen neural network and the exact solution of PDE. Then, we define u a = arg min f ∈F f − u as the best approximation to the exact solution u. Let u g = arg minΘ J Θ be the solution of net at global minimum and u t = arg minΘ J Θ be the solution of net at local minimum. Therefore, the total error consists of an approximation error E app = u − u a , the optimization error E opt = u t − u g and the generalization error E gen = u g − u a . In PINNs, the number and location (distribution) of residual points are two important factors that affect the generalization error [13]. The optimization error is introduced due to the complexity of the loss function. The performance of PINNs is closely related to the appropriate combination of loss terms, which may avoid local optimization. Thus, the total error in PINNs as

1.
The optimization error can be reduced by using the dynamic weight strategy for physics-informed neural networks. During training, each part of the loss function can be dropped more evenly, and the loss can become smaller and converge faster.

2.
This method can reduce the generalization error by increasing the weights of hard-totrain points during training. It also makes the error of such hard-to-train points smaller.

Numerical Examples
In this section We apply the proposed dwPINNs to simulate different incompressible Navier-Stokes flows. First, we consider two-dimensional unsteady equations with the analytic solution to investigate the effectiveness of dwPINNs. Then, we employ the dynamic weights strategy to steady lid-driven cavity flow in two dimensions. Compared with other PINNs methods, the efficiency of this method is highlighted. Finally, the inverse problem of the flow around a cylinder is solved by the dwPINNs method. In the numerical experiments, we strictly control the non-experimental variables of the two test methods to be the same, such as training data, network structure, optimization method, etc. The activation functions used in the following numerical experiments are tanh. G.Cybenkot [29] found that it has a strong linear superposition approximation ability and is more suitable for function approximation. We keep the amount of training data roughly equal to the parameters of the neural network to avoid the overfitting problem. To illustrate the efficiency of the proposed method, the accuracy of the trained model is assessed through the relative L 2 error of the exact valueū(x i , t i ) and the trained approximation u(x i , t i ) inferred by the network at the data t i , x i N t i=1 , N t is the number of test data. The deep learning framework used in this experiment is tensorflow2.3. In terms of hardware, the CPU is Intel CORE i5 7th Gen, the memory is 4G, and the GPU is NVIDIA GeForce 940MX.

(10)
Based on Section 3.1, the problem loss is defined as follows ). The numerical results of the two methods are shown in Table 1. We see that the dwPINNs perform better than the PINNs, and also that applying the dynamic weights can improve the simulation accuracy. A snapshot of the velocity fields together with the absolute errors at t = 1 is displayed in Figure 1. It shows that this method is feasible and qualitatively accurate. The convergence of dynamic weights during the training process is displayed in Figure 2.
Next, considering whether dwPINNs can still work in the middle and later stages of training, we design the following experiment. After PINNs are normally trained 5000 times, we conduct dynamic weight strategy training 5000 times and take PINNs as the control group of the experiment. The results are shown in Figure 3. It can be clearly seen from the figure that the dynamic weight strategy can accelerate the convergence of error in the middle and later stages of training.

Comparison of the Different PINNs Methods for 2D Navier-Stokes Equations
The lid-driven cavity flow is a standard test case for verifying the accuracy of new computational methods for incompressible Navier-Stokes equations. Although, there are many papers in the literature that present results of the lid-driven cavity with different formulations, grids and numerical methods, we shall compare the results with Wang [18], where they used a Learning rate annealing for PINNs. The domain is [0, 1] × [0, 1] and no-slip boundary conditions are applied at the left, bottom and right boundaries. The top boundary moves with constant velocity in the positive x direction. u(x) = (u(x), v(x)) is a velocity vector field, p is a scalar pressure field. λ 1 = 1, λ 2 = 0.01.
Parameters during training are as follows: N f = 10,000, N b = 3000. The network structure is a 4-layer deep fully-connected network with 50 neurons per layer. The results of this experiment are summarized in Table 2. It represents average relative L2 error of |u(x)| = u 2 (x) + v 2 (x) across 10 runs with random restarts. Figure 4 summarizes the results of our experiment, which shows the magnitude of the predicted solution u, v and absolute point-wise error predicted by dwPINNs, after 32,000 stochastic gradient descent updates using the Adam optimizer. Figure 5 is the convergence process of dynamic weights and the relative L2 error of u and v.   Learning rate annealing for PINNs is one of the important loss balancing methods [18]. Compared with Learning rate annealing for PINNs, the error of dwPINNs is reduced. The SAPINNs outperformed other state-of-the-art PINN algorithms in L2 error by a wide margin [21]. However, the SAPINNs failed to solve the Navier-Stokes equations. Therefore, the performance of dwPINNs is the best at present.
In order to further analyze the performance of our method, we carried out the following systematic research to quantify its prediction accuracy for different numbers of training points and configuration points, as well as different neural network structures. In Table 3, we report the relative L2 error obtained under the conditions of different initial and boundary training data N u and different configuration points N F , while keeping the 4-layer network architecture fixed. With a sufficient number of configuration points N f , the overall trend of prediction accuracy continues to improve with the increase in the total number of training data N u . Finally, Table 4 shows the resulting relative L2 for different numbers of hidden layers, and different numbers of neurons per layer, while the total number of training and collocation points is kept fixed to N u = 3000 and N f = 10, 000, respectively. As expected, we observed that the prediction accuracy improved with the increase in the number of layers and neurons. Table 3. Relative L2 error between the predicted and the exact |u| for different number of initial and boundary training data N u , and different number of collocation points N f . Here, the network architecture is fixed to 4 layers with 50 neurons per hidden layer.

Inverse Problem: Two-Dimensional Navier-Stokes Equations
Here, we use a dynamic weight strategy for Physics-informed neural networks to simulate the 2D vortex shedding behind a circular cylinder at Re = 100. The cylinder diameter D is 1. The simulation domain size is [0,8] × [−2, −2]. In this example, λ 1 = 1.0, λ 2 = 0.01. High-fidelity data from [9] is used as a reference and for providing training data for dwPINNs. We sample N f = 5000 collocation points, N u = 5000 exact points. Our goal is to identify the unknown parameters λ 1 , λ 2 , as well as to obtain a qualitatively accurate reconstruction of the entire pressure field p(t, x, y) in the cylinder wake, which by definition can only be identified up to a constant. Each loss item is as follows Numerical results of the Navier-Stokes equation in the way of the dwPINNs are displayed in Table 5. Compared with PINNs, dwPINNs have less training time and higher prediction accuracy. Then, after applying noise to the original training data, the calculation was performed using the dwPINNs method. We observed that even if the training data are corrupted with 1% uncorrelated Gaussian noise, the method can identify unknown parameters very accurately λ 1 and λ 1 , indicating that our method has good stability. Figure 6 shows the loss history of two methods and dynamic weights history. Additionally, the convergence of loss dwPINNs is quicker than PINNs in Figure 6. The scalar w u increases rapidly and is more punitive to the MSE u , which leads to faster convergence. The loss of PINNs would attain 9.268 × 10 −1 , while dwPINNs would converge to 1.206 × 10 −1 . Also plotted are representative snapshots of the predicted velocity components u(t, x, y), v(t, x, y) after the model was trained in Figure 7. Based on the predicted versus instantaneous pressure field p(x, y, t) shown in Figure 7, the error between the predicted value and the true value is extremely low in the entire calculation domain. An interesting result is that, in the absence of any training data on the pressure itself, the network can provide a qualitatively accurate prediction of the entire pressure field p(t, x, y). The neural network architecture used here consists of 4 hidden layers with 50 neurons in each layer. The predicted λ 1 = 1.00065, λ 2 = 0.00991.

Conclusions
In this paper, we introduced a dynamic weights strategy for loss balanced in physicsinformed neural networks. The strategy is helpful to the training of PINNs in solving incompressible Navier-Stokes equations. It can effectively improve unbalanced backpropagated gradients during model training. The most appropriate weights are added to the optimization system through the objective function minimax alternating optimization, which makes the training more balanced. It is theoretically analyzed that the balanced weights are convergent under the condition of monotonic mean square error. The advantages of the method are verified by studying the forward and inverse problems of the Navier-Stokes equation. Various experimental results can support our view that the loss function decays slightly faster and the relative error is lower than the state-of-the-art PINNs. This paper also provides a reference for the application of adaptive mechanisms in the PINNs framework. Code and data accompanying this manuscript are publicly available at https://github.com/1shirong/dwPINNs.git (accessed on 7 July 2022). In this method, the convergence of the maximum system and the convergence of the minimum system are interdependent. Adam and other algorithms can not guarantee to find the global minimum, which will limit the optimal performance of this method. How to design an optimization algorithm specifically for PINNs is an open and meaningful problem.