Abstract
The Fornberg–Whitham equation, which contains high-order nonlinear derivatives, is widely recognized as a prominent model for describing shallow water dynamics. We explore physics-informed neural networks (PINNs) in conjunction with transfer learning to investigate numerical solutions of the FW equation. The proposed domain adaptation in transfer learning for PINNs transforms the complex problem defined over the entire spatiotemporal domain into simpler problems defined over smaller subdomains. Training a neural network on these subdomains provides extra supervised learning data, effectively addressing optimization challenges associated with PINNs. Consequently, it enables the resolution of anisotropic and long-term predictive problems for these types of equations. Moreover, it enhances prediction accuracy and accelerates loss convergence by circumventing local optima in the specified scenarios. The method efficiently handles both forward and inverse FW equation problems, excelling at cost-effective, accurate predictions for inverse problems. The efficiency and accuracy of our proposed approaches are demonstrated through numerical examples.
Keywords:
physics-informed neural networks; Fornberg–Whitham equation; transfer learning; inverse problem
MSC:
65J15
1. Introduction
Partial differential equations (PDEs) play a critical role in modeling diverse physical phenomena in applied sciences and engineering. Solving the nonlinear PDEs associated with these phenomena is essential for a thorough understanding of many problems in nature. Notably, many nonlinear PDEs admit traveling wave solutions known as solitary wave solutions. An important instance is the Fornberg–Whitham (FW) equation, which captures wave-breaking behavior, fluid-mechanical dynamics, and water wave propagation. The FW equation is stated in the form
$$u_t - u_{xxt} + u_x + u u_x - u u_{xxx} - 3 u_x u_{xx} = 0, \qquad (1)$$
which was proposed to model the behavior of wave breaking []. The variable $u(x,t)$ denotes the wave profile, with $x$ and $t$ representing the spatial and temporal coordinates. A generalized Fornberg–Whitham equation (Equation (2)) introduces a constant coefficient into this model.
Equations (1) and (2) describe the standard and parameterized Fornberg–Whitham models, respectively. To further investigate the decisive role of nonlinear effects on wave behavior, a modified Fornberg–Whitham equation is introduced. This model mathematically originates from the enhancement of the nonlinear convective term, replacing $u u_x$ with $u^2 u_x$. It provides an ideal mathematical model for studying higher-amplitude waves, more intense wave–wave interactions, and potential wave singularities. The modified Fornberg–Whitham equation is as follows:
$$u_t - u_{xxt} + u_x + u^2 u_x - u u_{xxx} - 3 u_x u_{xx} = 0. \qquad (3)$$
The structure of the solutions has so far been analyzed in depth for the FW equation []. The global existence of a solution to the viscous Fornberg–Whitham equation has been demonstrated in [], while the investigation of boundary control was discussed in []. There is a pressing need for further exploration of this area, both mathematically and numerically. Developing a stable, higher-order-accurate, and reliable numerical scheme poses significant challenges. In addition to the computationally intensive nonlinear term $u u_x$, two less explored third-order derivative terms are present: the mixed space–time linear term $u_{xxt}$ and the nonlinear dispersion term $u u_{xxx}$, both of which can spread localized waves. Direct and simple approximations of these third-order dispersion terms must be avoided to ensure a more precise and accurate numerical representation of wave steepening and spreading. The study by Hornik et al. [] demonstrated that deep neural networks serve as universal function approximators. Additionally, the work presented in reference [] introduced automatic differentiation techniques for neural networks capable of handling high-order differential operators and nonlinear terms. Hence, it is plausible to explore the application of neural networks to solving the FW equation.
Deep learning has emerged as a groundbreaking technology in various scientific domains [,,]. Its exceptional capacity for nonlinear modeling has garnered significant interest in computational mechanics [,] in recent years. However, training deep neural networks (DNNs) for black-box surrogate modeling typically necessitates a substantial amount of labeled data, a limitation frequently encountered in scientific applications. In 1998, Lagaris et al. [] pioneered the use of machine learning techniques to solve partial differential equations (PDEs) by employing artificial neural networks (ANNs). By formulating an appropriate loss function, the output generated by the network satisfies the prescribed equation and boundary conditions. Building upon this foundation, Raissi et al. [] harnessed automatic differentiation techniques and introduced physics-informed neural networks (PINNs), where the residual of the PDE is integrated into the loss function of fully connected neural networks as a regularizer. This incorporation effectively restricts the solution space to physically feasible solutions. PINNs exhibit the favorable characteristic of operating effectively with limited data, since any available measurements can be seamlessly integrated into the loss function.
Shin et al. [] conducted a theoretical study on the solution of partial differential equations using PINNs, suggesting that the solution obtained by PINNs can converge to the true solution of the equation under specific conditions. Mishra et al. [] provided an abstract analysis of the sources of generalization errors in PINNs when studying PDEs. PINNs have found widespread applications in solving various types of PDEs, including fractional PDEs [,] and stochastic PDEs [,], even when faced with limited training data. The loss function of PINNs typically comprises multiple terms, leading to a scenario where these terms compete during training, making it challenging to minimize all terms simultaneously. An effective strategy to enhance training performance is to adjust the weights assigned to the supervised and residual terms in the loss function [,]. A significant drawback of PINNs is the high computational cost associated with training, which can detrimentally impact their performance. Jagtap et al. [] proposed strategies for both spatial and temporal parallelization to mitigate training costs more efficiently. Additionally, Mattey and Ghosh [] highlighted that the accuracy of PINNs may be compromised in the presence of strong nonlinearity and higher-order partial differential operators. The FW equation contains high-order differential operators and nonlinear terms, thus requiring improvements to the PINN method to solve it effectively.
In some studies on continuous-time PINN methods, the distinct roles of the time and space variables are not treated explicitly []. For evolutionary PDEs, accurate initial predictions are essential for PINN effectiveness, and neglecting the time-dependent structure can cause training difficulties because the underlying optimization problem is non-convex. Similar problems may arise on broader spatiotemporal domains. Therefore, in the design and implementation of PINN methods, it is crucial to address the treatment of spatiotemporal variables. Incorporating additional supervised data points aligns with the principles of the label propagation methodology [], significantly enhancing training performance. In this scenario, transfer learning principles should be considered to address this issue. Transfer learning is inspired by the human ability to leverage previously acquired knowledge to address new problems with improved solutions [,]. This approach may mitigate the effects of non-convexity, leading to efficient and accurate optimization in PINNs [,]. Domain adaptation is a mature research direction in transfer learning, with the core goal of transferring knowledge learned from a “source domain” to a related but differently distributed “target domain” through techniques such as feature alignment and distribution matching [].
In this study, we explore the use of PINNs for solving Fornberg–Whitham-type equations. To improve prediction accuracy and training robustness, we introduce a new methodology named domain adaptation for PINNs (DA-PINN). It efficiently resolves large-scale spatiotemporal problems and demonstrates superior predictive accuracy compared to the baseline PINN method, as validated through tests on Equation (1). The method also addresses Equation (2) with minimal computational cost, while maintaining satisfactory prediction accuracy for unknown coefficients in noisy data environments. Additionally, DA-PINN is adept at solving anisotropic problems. Compared to the variational iteration method (VIM) [], it exhibits enhanced predictive accuracy, with its computational efficacy confirmed for Equation (3).
This study is organized in the following manner. Section 2 introduces the background and domain adaptation for PINNs. Section 3 describes details about the proposed method. In Section 4, various examples are showcased to illustrate the effectiveness of the DA-PINN approach. The final conclusions are drawn in Section 5.
2. Background
In this section, we discuss fully connected neural networks and the issues of the baseline PINN.
2.1. Fully Connected Neural Networks
The feed-forward neural network with $L-1$ hidden layers is characterized by
$$N^0(\mathbf{x}) = \mathbf{x}, \qquad N^{\ell}(\mathbf{x}) = \sigma\!\left(W^{\ell} N^{\ell-1}(\mathbf{x}) + b^{\ell}\right), \quad 1 \le \ell \le L-1,$$
and $N^{L}(\mathbf{x}) = W^{L} N^{L-1}(\mathbf{x}) + b^{L}$; in the terminal layer, an identity activation function is applied. Defining $\theta = \{W^{\ell}, b^{\ell}\}_{\ell=1}^{L}$ as the complete set of weights and biases, and taking $\Theta$ as the parameter space, we write the output of the neural network as
$$u(\mathbf{x};\theta) = N^{L}(\mathbf{x}),$$
where the notation underscores the dependence of the network output on the parameter set $\theta \in \Theta$. Typically, the weights and biases are initialized from predefined probability distributions to ensure a balanced onset for the training process. Given labeled data $\{(\mathbf{x}_i, u_i)\}_{i=1}^{N}$, we construct the objective function
$$\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left|u(\mathbf{x}_i;\theta) - u_i\right|^2.$$
The network parameters are then determined in the least-squares sense: training on the labeled data yields the parameters that minimize this objective.
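To make the construction above concrete, the following is a minimal sketch of such a fully connected network and its least-squares objective. The paper states an implementation in TensorFlow 1.3; for readability, the sketch uses the modern tf.keras API, and the layer sizes, activation, and synthetic labels are illustrative placeholders rather than the settings used in the experiments.

```python
import numpy as np
import tensorflow as tf

# Fully connected feed-forward network: hidden layers with a smooth activation,
# identity activation in the terminal layer, Xavier (Glorot) initialization.
def make_fcnn(n_hidden_layers=4, width=50, activation="swish"):
    layers = [tf.keras.layers.Dense(width, activation=activation,
                                    kernel_initializer="glorot_normal")
              for _ in range(n_hidden_layers)]
    layers.append(tf.keras.layers.Dense(1, kernel_initializer="glorot_normal"))
    return tf.keras.Sequential(layers)

# Least-squares objective over labeled pairs: mean squared mismatch between
# the network output and the data.
def mse_loss(model, xt, u_data):
    return tf.reduce_mean(tf.square(model(xt) - u_data))

# Illustrative usage with synthetic labeled data (placeholder values).
model = make_fcnn()
xt = np.random.rand(100, 2).astype(np.float32)      # (x, t) input pairs
u_data = np.sin(xt[:, :1]) * np.cos(xt[:, 1:])      # placeholder labels
print(float(mse_loss(model, xt, u_data)))
```

Training then amounts to minimizing this objective with a gradient-based optimizer, as discussed in Section 3.3.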
2.2. Physics-Informed Neural Networks
The solution of the equation is approximated by a neural network:
$$u(x,t) \approx u_{\theta}(x,t),$$
where $u(x,t)$ is the solution to the equation, $u_{\theta}(x,t)$ is its neural network approximation, $(x,t)$ are the input variables, and $\theta$ denotes the network parameters.
We define $f(x,t)$ to be given by the left-hand side of Equation (1); i.e.,
$$f := u_t - u_{xxt} + u_x + u u_x - u u_{xxx} - 3 u_x u_{xx},$$
and a deep neural network is used to approximate $u(x,t)$. Coupled with Equation (4), this approach leads to the formulation of a physics-informed neural network, denoted by $f_{\theta}(x,t)$. This network is developed by employing the chain rule to differentiate function compositions through automatic differentiation. It shares the parameters of the network that models $u(x,t)$, though it features distinct activation functions as a result of the differential operators acting on the network output. Empirical evidence suggests that this structured methodology acts as a regularization mechanism, enabling the use of comparatively straightforward feed-forward architectures trained with limited data. The objective function is as follows:
$$\mathcal{L}(\theta) = w_u\, MSE_u + w_f\, MSE_f,$$
where
$$MSE_u = \frac{1}{N_u}\sum_{i=1}^{N_u}\left|u_{\theta}(x_u^i, t_u^i) - u^i\right|^2$$
and
$$MSE_f = \frac{1}{N_f}\sum_{i=1}^{N_f}\left|f_{\theta}(x_f^i, t_f^i)\right|^2.$$
Here, $\{x_u^i, t_u^i, u^i\}_{i=1}^{N_u}$ denotes the initial and boundary training data for $u(x,t)$, and $\{x_f^i, t_f^i\}_{i=1}^{N_f}$ specifies the collocation points for $f(x,t)$. The loss $MSE_u$ corresponds to the initial and boundary data, while $MSE_f$ enforces the structure imposed by Equation (4) at a finite set of collocation points. The weights $w_u$ and $w_f$ are predetermined parameters that can accelerate loss convergence.
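A sketch of how the residual and the composite loss can be formed with automatic differentiation is given below. Since the typeset equations are not reproduced here, the residual assumes the standard FW form $u_t - u_{xxt} + u_x + u u_x - u u_{xxx} - 3 u_x u_{xx} = 0$; the helper names (fw_residual, pinn_loss) are illustrative, and any differentiable model, such as the make_fcnn sketch above, can be passed in. The sketch uses TensorFlow 2 gradient tapes rather than the paper's TensorFlow 1.3 graph API.

```python
import tensorflow as tf

# Residual f(x, t) of the FW equation (assumed standard form), computed with
# nested gradient tapes to obtain mixed and third-order derivatives.
def fw_residual(model, x, t):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    t = tf.convert_to_tensor(t, dtype=tf.float32)
    with tf.GradientTape(persistent=True) as g3:
        g3.watch(x); g3.watch(t)
        with tf.GradientTape(persistent=True) as g2:
            g2.watch(x); g2.watch(t)
            with tf.GradientTape(persistent=True) as g1:
                g1.watch(x); g1.watch(t)
                u = model(tf.concat([x, t], axis=1))
            u_x = g1.gradient(u, x)
            u_t = g1.gradient(u, t)
        u_xx = g2.gradient(u_x, x)
    u_xxx = g3.gradient(u_xx, x)
    u_xxt = g3.gradient(u_xx, t)
    return u_t - u_xxt + u_x + u * u_x - u * u_xxx - 3.0 * u_x * u_xx

# Composite PINN objective: weighted sum of the data loss on the
# initial/boundary points and the residual loss at the collocation points.
def pinn_loss(model, xt_data, u_data, x_f, t_f, w_u=1.0, w_f=1.0):
    mse_u = tf.reduce_mean(tf.square(model(xt_data) - u_data))
    mse_f = tf.reduce_mean(tf.square(fw_residual(model, x_f, t_f)))
    return w_u * mse_u + w_f * mse_f
```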
3. Proposed Method
When mathematicians study Fornberg–Whitham-type equations, a central objective is to seek their traveling wave solutions, which take the form $u(x,t) = \phi(x - ct)$, where $c$ represents the wave speed. For such solutions, it is conventionally assumed that the wave profile decays at infinity, i.e., $\phi(\xi) \to 0$ as $|\xi| \to \infty$. This asymptotic behavior acts as a boundary condition, referred to as the far-field boundary condition. Although not imposed at finite spatial points, it constrains the solution behavior at the boundaries of the unbounded domain. According to the characteristics of FW-type model equations, we use the PINN framework with different techniques to solve these problems. For the FW equation, the boundary condition is specified as the far-field boundary condition, and it is appropriate to use transfer learning based on the spatial variable. By using multi-scale neural networks to capture high-frequency information, more accurate solutions can be obtained.
3.1. Mscale-DNN
The F-Principle describes the tendency of neural networks to fit low-frequency components of a target function first and to converge slowly on high-frequency components, which poses significant challenges to training and generalization that are not readily mitigated by mere parameter adjustments []. Employing a sequence of scale factors, extending from 1 to a large value, allows for the construction of a Mscale-DNN framework. This structure is instrumental in accelerating convergence for solutions that span a broad spectrum of frequencies, while maintaining consistent precision across these frequencies. The Mscale-DNN network structure is shown in Figure 1.
Figure 1.
Mscale-DNN.
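A minimal sketch of the idea behind Figure 1 follows: each subnetwork receives the input multiplied by a different scale factor, so high-frequency content in the original coordinates appears as lower-frequency content to some subnetwork, and the subnetwork outputs are combined. The scale factors, widths, and depth below are illustrative assumptions, not the values used in the paper.

```python
import tensorflow as tf

# Minimal multi-scale network: one subnetwork per scale factor; the input is
# multiplied by the scale before entering the subnetwork, and the subnetwork
# outputs are averaged.
class MscaleDNN(tf.keras.Model):
    def __init__(self, scales=(1.0, 2.0, 4.0, 8.0), width=50, depth=4):
        super().__init__()
        self.scales = scales
        self.subnets = [
            tf.keras.Sequential(
                [tf.keras.layers.Dense(width, activation="swish") for _ in range(depth)]
                + [tf.keras.layers.Dense(1)])
            for _ in scales]

    def call(self, xt):
        outputs = [net(s * xt) for s, net in zip(self.scales, self.subnets)]
        return tf.add_n(outputs) / float(len(outputs))
```

An MscaleDNN instance can be used anywhere the fully connected model is used in the earlier sketches, since it maps the same $(x, t)$ input to a scalar output.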
3.2. Domain Adaptation in Transfer Learning for PINN
As a well-established research branch in machine learning, domain adaptation essentially addresses the problem of “cross-domain knowledge transfer”. Specifically, when the “source domain” and “target domain” are correlated but differ in data distribution, it enables the knowledge trained on the source domain to be effectively adapted to the target domain through methods such as feature alignment and distribution matching. This logical framework is perfectly applicable to our work: each subproblem we handle can be clearly defined as the “source domain” and “target domain” within this framework. In the context of solving partial differential equations (PDEs), to avoid solution discontinuities (e.g., jumps in the solution field at the boundary of the source and target domains), we introduce a source domain supervision term into the target domain’s loss function. This facilitates the transfer and alignment of features or knowledge of solutions between subdomains, thereby stabilizing training and improving the overall solution accuracy.
The main idea of domain adaptation in transfer learning for PINNs is to transform the anisotropic problem into several small-scale subproblems. The solving process is divided into two stages. In the initial stage, a traditional PINN is used to obtain the solution of the initial subproblem. In particular, Equation (1) is trained over an initial temporal subinterval or spatial subdomain using the conventional PINN approach. The length of the source-domain training interval is an adjustable parameter whose value depends on the specific problem at hand. In the transfer learning stage, the PINN carrying the training information from the previous step is used to solve the problem step by step, expanding the time domain or the spatial domain. This improved PINN avoids the tendency of traditional PINNs to become stuck in local optima. In the source-domain training phase, the dataset consists of residual points and initial/boundary points sampled from the source subdomain. The formulation of the optimization problem for training in the source domain is restated as follows:
The parameter set $\theta$ is typically initialized randomly in the process of solving Equation (5). During formal training, the inclusion of additional supervised information can enhance the performance of the PINN []. The solution derived from training on the source domain in the initial phase is therefore used as extra supervised learning data in the subsequent formal training. The loss function is delineated as follows:
$$\mathcal{L}(\theta) = w_u\, MSE_u + w_f\, MSE_f + w_s\, \mathcal{L}_s(\theta),$$
where $w_s$ represents the weight of the extra supervised learning component. The dataset $S$ is composed of pairs $(x_s^i, t_s^i, u_s^i)$ obtained by evaluating the source-domain solution, and $\mathcal{L}_s$ denotes the supervised learning loss, defined as
$$\mathcal{L}_s(\theta) = \frac{1}{N_s}\sum_{i=1}^{N_s}\left|u_{\theta}(x_s^i, t_s^i) - u_s^i\right|^2,$$
with the points of $S$ sampled either from the source time interval or from the source spatial subdomain, and $N_s$ representing the number of points in the dataset. The optimization problem
$$\theta^{*} = \arg\min_{\theta \in \Theta} \mathcal{L}(\theta)$$
is solved to obtain a numerical solution of Equation (1) on the full domain. The specific steps of the algorithm can be found in Algorithm 1.
Remark 1.
The concept of incorporating additional supervised data is closely linked to the label propagation technique [], and the augmentation of supervised learning contributes to enhancing training performance. The benefit is more pronounced for complex problems, a phenomenon validated by the numerical examples in this study.
| Algorithm 1: Domain adaptation for PINNs |
| Input: Neural network structure; source task; target task. |
| Step 1: Set the number of training steps K and initialize the neural network parameters. |
| Step 2: Incorporate the physical laws into the design of the neural network's loss function based on (5); solve the source-task optimization problem (6) in the first stage; output the trained neural network parameters. |
| Step 3: Domain adaptation: generate the supervised learning data and the training set on the next interval; employ the network parameters obtained at the end of the preceding training phase as the initial estimate for the optimization of Equation (7). |
| Output: The approximate solution of the PDE. |
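The following is a schematic of the two-stage procedure in Algorithm 1, not the authors' code. The physics loss is passed in as a callable (for instance, one built from the residual sketch in Section 2.2), so the domain-adaptation logic reads independently of the particular equation; the subdomain split, point counts, supervised weight w_s, and step counts are placeholder values.

```python
import numpy as np
import tensorflow as tf

def sample_points(x_range, t_range, n):
    # Uniform random collocation points in the given subdomain.
    x = np.random.uniform(*x_range, size=(n, 1)).astype(np.float32)
    t = np.random.uniform(*t_range, size=(n, 1)).astype(np.float32)
    return x, t

def train(model, loss_fn, optimizer, steps):
    # Plain gradient-descent training on a scalar loss callable.
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = loss_fn()
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

def da_pinn(model, pde_loss_fn, data_loss_fn, optimizer,
            source_t=(0.0, 0.05), target_t=(0.0, 0.1), x_range=(-5.0, 5.0),
            w_s=1.0, n_f=2000, n_s=500, steps=5000):
    # Stage 1: conventional PINN training restricted to the source subdomain.
    xs, ts = sample_points(x_range, source_t, n_f)
    train(model, lambda: data_loss_fn(model) + pde_loss_fn(model, xs, ts),
          optimizer, steps)

    # Domain adaptation: the stage-1 solution provides extra supervised data.
    x_sup, t_sup = sample_points(x_range, source_t, n_s)
    xt_sup = np.concatenate([x_sup, t_sup], axis=1)
    u_sup = model(xt_sup).numpy()   # frozen targets from the source stage

    # Stage 2: warm-started training on the target domain with the extra
    # supervised term weighted by w_s.
    xf, tf_ = sample_points(x_range, target_t, n_f)
    def target_loss():
        sup = tf.reduce_mean(tf.square(model(xt_sup) - u_sup))
        return data_loss_fn(model) + pde_loss_fn(model, xf, tf_) + w_s * sup
    train(model, target_loss, optimizer, steps)
    return model
```

Because the same model object is trained in both stages, the target-domain optimization starts from the source-domain parameters, matching the warm start described in Step 3.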
3.3. Optimization Method
We seek the parameters $\theta^{*}$ that minimize the loss function $\mathcal{L}(\theta)$. Numerous optimization algorithms exist for this purpose, and gradient-based techniques are commonly used in the parameter training process. In the basic form, given an initial value of the parameters $\theta_0$ (with $K$ denoting the number of training subintervals), the parameters are updated as
$$\theta_{k+1} = \theta_k - \eta_k \nabla_{\theta}\mathcal{L}(\theta_k),$$
where $\eta_k$ is the learning rate. Stochastic gradient descent (SGD) represents a prevalent variant in which subsets of data points are randomly chosen to approximate the gradient direction in each iteration. This method proves effective in circumventing poor local minima during the training of deep neural networks, particularly under the one-point convexity condition.
3.4. Error Analysis
Define $\mathcal{F}$ as the set of functions representable by a specified neural network architecture, and let $u$ denote the exact solution of the partial differential equation under consideration. We define $u_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} \|f - u\|$ as the best approximation to the exact solution $u$. Let $u_{\mathcal{T}}$ be the solution obtained by the trained network and $u_{\mathcal{G}}$ be the network solution at the global minimum of the loss. The total error therefore consists of the approximation error $\|u_{\mathcal{F}} - u\|$, the generalization error $\|u_{\mathcal{G}} - u_{\mathcal{F}}\|$, and the optimization error $\|u_{\mathcal{T}} - u_{\mathcal{G}}\|$. Enhanced network expressivity in a deep neural network can effectively reduce the approximation error, yet it may lead to a substantial generalization error, characterized by the bias–variance trade-off. Within the PINN framework, the number and distribution of residual points play pivotal roles in determining the generalization error, potentially influencing the loss function configuration, especially when a sparse set of residual points is employed near steep solution changes. The optimization error stems from the intricate nature of the loss function, with factors such as network structure (depth, width, and connections) exerting significant influence. Fine-tuning hyperparameters such as the learning rate and number of iterations can further mitigate optimization errors. We have
$$\|u_{\mathcal{T}} - u\| \le \|u_{\mathcal{T}} - u_{\mathcal{G}}\| + \|u_{\mathcal{G}} - u_{\mathcal{F}}\| + \|u_{\mathcal{F}} - u\|.$$
3.5. Advantages of PINN Method for Transfer Learning
- In this method, the incorporation of an additional supervised learning component enhances training performance and contributes to a reduction in computational time.
- The method can decrease the approximation error by employing multi-scale neural networks, which capture high-frequency information.
- This improved PINN avoids the problem of traditional PINNs getting stuck in local optima, thereby reducing optimization errors.
4. Numerical Examples
In this section, we provide multiple numerical illustrations encompassing the Fornberg–Whitham and modified Fornberg–Whitham equations to show the effectiveness of domain adaptation in transfer learning for PINNs. To demonstrate the feasibility of the proposed algorithm, the baseline PINN method is first used to solve the forward and inverse problems of the Fornberg–Whitham equation, and its hyperparameters are analyzed. Then, the improved PINN method is used to reduce the error near the domain boundary, and the proposed method is shown to perform well on anisotropic problems. Next, the modified Fornberg–Whitham equation is tested, and the proposed method obtains more accurate numerical results in a shorter time.
To evaluate the performance of our methods, we adopt the relative $L_2$ error. The methodology is implemented in TensorFlow version 1.3 with variables of the float32 data type. Throughout the experiments, the Swish activation function is utilized. For the Adam optimizer, an exponential decay of the learning rate is applied every 50 steps with a decay rate of 0.98. The configuration and termination conditions for the L-BFGS optimizer adhere to the recommendations provided in []. Prior to initiation of the training phase, the neural network parameters are randomly determined following the Xavier initialization method, as outlined in [].
The performance of the neural network is evaluated using the relative $L_2$ error
$$\text{Error} = \frac{\sqrt{\sum_{i=1}^{N}\left|\hat{u}(x_i, t_i) - u(x_i, t_i)\right|^2}}{\sqrt{\sum_{i=1}^{N}\left|u(x_i, t_i)\right|^2}},$$
where $u$ represents either the exact solution or the reference solution, and $\hat{u}$ denotes the prediction made by the neural network at a point of the test data. The test data are formulated through stochastic sampling of points from the spatiotemporal domain.
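For completeness, this error measure corresponds to the following small helper (the function name and the flattened-vector layout are illustrative):

```python
import numpy as np

# Relative L2 error between the network prediction and the reference solution,
# both given as flat arrays over the same set of test points.
def relative_l2_error(u_pred, u_ref):
    return np.linalg.norm(u_pred - u_ref, 2) / np.linalg.norm(u_ref, 2)
```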
4.1. Fornberg–Whitham Equation
Consider Equation (1) with the following initial condition:
We have the exact solution:
The computation is carried out on the spatiotemporal domain specified for this example. In this assessment, a four-layer fully connected network architecture is employed, featuring 50 neurons in each layer and utilizing the Swish activation function. We also investigate the performance of PINNs with respect to the number of sampling points and boundary points, using the numbers of points reported for the trials shown. For the optimization strategy, the Adam optimizer is first applied to optimize the trainable parameters of the neural network; subsequently, the L-BFGS-B optimizer is deployed to fine-tune the network, aiming to achieve higher accuracy. The termination of the L-BFGS-B training process is automatically governed by a predefined increment tolerance. Unless otherwise specified, all experiments in this study adopt this unified setup and parameter configuration to ensure consistency and reproducibility of results.
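A sketch of the stated Adam configuration is given below; the base learning rate is not reported in the text and is an assumed value, and the subsequent L-BFGS-B fine-tuning stage (for example, via scipy.optimize.minimize on the flattened network parameters) is only indicated in a comment.

```python
import tensorflow as tf

# Adam with the stated exponential learning-rate decay: every 50 steps,
# decay rate 0.98. The initial learning rate below is an assumption.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,   # assumed value, not given in the text
    decay_steps=50,
    decay_rate=0.98,
    staircase=True)
adam = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
# After the Adam stage, an L-BFGS-B fine-tuning pass can be run, e.g. with
# scipy.optimize.minimize(method="L-BFGS-B") on the flattened parameters.
```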
4.1.1. Baseline PINN
In this section, we evaluate the performance of the baseline PINN method in data-driven solutions of the Fornberg–Whitham equation and data-driven discovery of Fornberg–Whitham equation. We systematically analyze various hyperparameters within the PINN framework to better understand their influence on the predictive accuracy and overall performance of the model.
Firstly, consider the issue of data-driven solutions to the FW equation. Figure 2 encapsulates our findings regarding the data-driven solution to the FW equation. The predicted solution is consistent with the true solution, and the error between them is sufficiently small, confirming that it is feasible to solve the FW equation with the PINN method. With merely a limited set of initial data, the physics-informed neural network effectively captures the complex nonlinear dynamics inherent in the FW equation. This behavior is notoriously challenging to resolve with precision using conventional numerical techniques, which require meticulous spatial and temporal discretization of the equation. To further evaluate the efficacy of our proposed method, we conduct a series of systematic investigations to measure its predictive accuracy across varying numbers of training and collocation points, as well as different neural network architectures. The relative errors associated with different quantities of initial and boundary training data and various numbers of collocation points are detailed in Table 1. The observed trend clearly demonstrates an improvement in predictive accuracy as the quantity of training data $N_u$ increases, contingent upon an adequate number of collocation points $N_f$. This finding accentuates a principal advantage of physics-informed neural networks: the incorporation of the inherent structure of the physical laws via collocation points yields a learning algorithm that is both more precise and more data-efficient. Finally, Table 2 presents the relative errors obtained by varying the number of hidden layers and the number of neurons per layer, while keeping the numbers of initial training points and collocation points fixed.
Figure 2.
(Left): Plot of the approximation $\hat{u}(x,t)$ obtained by the PINN. (Middle): Plot of the exact solution $u(x,t)$. (Right): Error between the exact solution and the predicted solution.
Table 1.
The relative $L_2$ norm error between the predicted and exact solutions for various quantities of initial and boundary training data $N_u$ and differing numbers of collocation points $N_f$.
Table 2.
The relative $L_2$ norm discrepancy between the predicted output and the exact solution, evaluated for varying numbers of hidden layers and neurons per layer, while keeping the total numbers of training and collocation points fixed.
4.1.2. Multiple DA-PINN
Although the PINN method is feasible for the forward problem of the FW equation, it may not always achieve the necessary accuracy, especially near the termination time T = 0.1. Therefore, we apply DA-PINN, as described in Section 3, as a more appropriate approach to this problem. Specifically, in the DA-PINN process, the source domain is taken as a small initial portion of the computational domain and the target domain is the full domain.
The results demonstrate that the relative error of DA-PINN on the test set is 4.051218, significantly lower than the relative error of 8.7981 obtained with the PINN method. Furthermore, Figure 3 and Figure 4 illustrate the error distributions of the two methods. The figures reveal that the errors at T = 0.1 generated by DA-PINN are substantially smaller than the corresponding errors obtained with the PINN. This underscores the superior precision and effectiveness of the DA-PINN method for the FW equation compared to the standard PINN approach. The errors of the numerical solution of Equation (1) obtained by DA-PINN on different spatiotemporal slices are shown in Table 3.
Figure 3.
Distribution of absolute errors of the numerical solution by PINN for Fornberg–Whitham equation.
Figure 4.
Distribution of absolute errors of the numerical solution obtained by DA-PINN.
Table 3.
Errors of the numerical solution for Equation (1) solved by DA-PINN under different spatiotemporal slices.
Next, we consider the same equation on a larger spatiotemporal domain. To maintain the precision of the source task, it is essential that the length of the source-task interval remains reasonably short. Yet, in more challenging scenarios, a single source task with a brief pre-training period might fall short of offering an adequate preliminary estimate for the substantive training phase. To address this issue, we introduce a strategy encompassing multiple instances of transfer learning. This strategy entails several transfer learning phases, as outlined below:
In the first transfer learning interval, the standard transfer learning step is reused, and the resulting neural network parameters are recorded. A transfer learning dataset is generated for each subsequent interval. In any later transfer learning interval, consider the following optimization problem:
The training set of two methods is listed in Table 4.
Table 4.
The training sets of the PINN, the 1-interval DA-PINN, and the 2-interval DA-PINN for Equation (1).
We solve this larger-domain problem with Multiple DA-PINN. When using the PINN, we observe a relative error of 6.05, which indicates that the solution is invalid. In contrast, Multiple DA-PINN produces a relative error of 1.51, indicating that the solution is valid. We therefore conclude that Multiple DA-PINN is a reliable method for this particular problem. Our findings demonstrate the effectiveness of Multiple DA-PINN in solving large-space problems with high accuracy.
4.1.3. Prediction Results over a Long Time Interval
Given the computational instability encountered when utilizing the standard PINN for long-term evolution simulations, we investigate the potential improvements offered by DA-PINN. The spatial domain for this study is set as [−5, 5], with a time span of T = 20. The relative error for the PINN method in solving this particular problem is measured at 1.27 . In contrast, the relative error for the DA-PINN method in addressing the same issue is notably lower at 2.25 , indicating the effectiveness of the DA-PINN approach in mitigating the aforementioned challenges.
We next evaluate the convergence speed of the proposed method. Figure 5 illustrates the loss convergence of both the PINN and DA-PINN methods. Figure 6 illustrates the relative error of both the PINN and DA-PINN methods. Our results demonstrate that the DA-PINN approach exhibits a faster relative error convergence rate and significantly reduces errors in comparison to the standard PINN method. These findings suggest that DA-PINN provides an effective solution to addressing the numerical instability encountered when utilizing PINNs for long-term evolution simulations. The improved convergence speed of DA-PINN may facilitate better modeling accuracy and reduce computational cost, thus offering a promising avenue for addressing the aforementioned challenges in numerical simulations.
Figure 5.
The loss convergence process of PINN and DA-PINN.
Figure 6.
The L2 error convergence process of PINN and DA-PINN.
4.2. Data-Driven Discovery of the FW Equation
In this segment of our research, we focus on the task of data-driven discovery of partial differential equations []. When partial observations of the flow field governed by Equation (2) are available and the equation parameter is unknown, we can employ the PINN method to reconstruct the complete flow field and infer the unknown parameter. Our approach involves approximating $u(x,t)$ with a deep neural network. Under this framework, together with Equation (2), we derive a physics-informed neural network for the residual. This derivation is facilitated by employing the chain rule to differentiate function compositions via automatic differentiation. Notably, the unknown coefficient of the differential operator is treated as an additional trainable parameter of the physics-informed neural network. Numerical results obtained with the PINN are displayed in Figure 7.
Figure 7.
(Top): Plot of the approximation obtained by the PINN. (Bottom): Snapshots of the approximation vs. the exact solution at various time points during the temporal evolution.
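A sketch of how the unknown coefficient can be inferred jointly with the network weights is shown below. Because Equation (2) is not reproduced here, the placement of the coefficient on the convective term is an assumption made for illustration; the essential point is that the coefficient is a trainable variable included in the gradient update alongside the network parameters.

```python
import tensorflow as tf

# Unknown equation coefficient treated as a trainable variable, optimized
# jointly with the network weights from observed flow-field data.
lam = tf.Variable(1.0, dtype=tf.float32)

def fw_residual_inverse(model, lam, x, t):
    # Residual with the unknown coefficient lam on the convective term
    # (assumed placement, for illustration only).
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    t = tf.convert_to_tensor(t, dtype=tf.float32)
    with tf.GradientTape(persistent=True) as g3:
        g3.watch(x); g3.watch(t)
        with tf.GradientTape(persistent=True) as g2:
            g2.watch(x); g2.watch(t)
            with tf.GradientTape(persistent=True) as g1:
                g1.watch(x); g1.watch(t)
                u = model(tf.concat([x, t], axis=1))
            u_x = g1.gradient(u, x)
            u_t = g1.gradient(u, t)
        u_xx = g2.gradient(u_x, x)
    u_xxx = g3.gradient(u_xx, x)
    u_xxt = g3.gradient(u_xx, t)
    return u_t - u_xxt + u_x + lam * u * u_x - u * u_xxx - 3.0 * u_x * u_xx

def inverse_step(model, lam, xt_obs, u_obs, x_f, t_f, optimizer):
    # One joint update of the network weights and the unknown coefficient.
    variables = list(model.trainable_variables) + [lam]
    with tf.GradientTape() as tape:
        data_loss = tf.reduce_mean(tf.square(model(xt_obs) - u_obs))
        phys_loss = tf.reduce_mean(tf.square(
            fw_residual_inverse(model, lam, x_f, t_f)))
        loss = data_loss + phys_loss
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss
```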
The objective of our study is to estimate the unknown parameter, even in the presence of noisy data, utilizing the PINN. Table 5 summarizes our findings for this case. The results demonstrate that the physics-informed neural network identifies the unknown parameter with high precision, even when the training data are noisy. Specifically, the estimation error for the parameter is 0.44% under noise-free conditions and 0.55% when the training data are corrupted by 1% uncorrelated Gaussian noise.
Table 5.
Relative errors of the approximate solution for Equation (2) with clean data and noisy data.
The Mscale-PINN method is an improved version of the PINN method that enhances solution accuracy by introducing an adaptive scale parameter in the first layer of the network structure. To conduct the comparison, both the Mscale-PINN model and the traditional PINN model are trained using the same dataset and training procedure, with the relative $L_2$ error serving as the evaluation metric. A summary of our results for this example is presented in Table 6. Experimental results demonstrate that the Mscale-PINN method achieves a lower relative error than the traditional PINN method when solving Equation (2). Further analysis reveals that the Mscale-PINN method, by dynamically adjusting the scale parameter, adapts better to the physical processes at different scales, thereby providing more accurate solutions.
Table 6.
Comparison of relative errors between Mscale-PINN and PINN in solving Equation (2).
4.3. Modified Fornberg–Whitham Equation
In this section, we compare the proposed DA-PINN method with the variational iteration method (VIM) [], as well as with the baseline PINN. The VIM is an analytical approximation technique that solves nonlinear differential equations iteratively by constructing correction functionals, and it serves here as a reference from traditional methods. As a well-established scheme, the VIM provides solutions with high accuracy within its convergence region and is often used to validate newly proposed numerical methods. Its results can be regarded as a ‘quasi-exact’ solution, allowing a fair assessment of the accuracy of the different numerical approaches. For this example, consider Equation (3) subject to the following initial condition:
where c is the wave speed given by
and the corresponding exact solution is
The solution region for this problem is [−5, 5], with T = 0.1. The efficacy and applicability of DA-PINN for the current problem are presented in Table 7, which lists the absolute error of the solution for different values of t and x. The data demonstrate that the absolute errors produced by DA-PINN are considerably lower than those generated using the VIM []. The exact and approximated solutions are depicted in Figure 8 and Figure 9.
Table 7.
error for VIM, PINN, and DA-PINN for Equation (3) when x = 2.5.
Figure 8.
Contrast exact solution (middle) for Equation (3) with the result obtained by DA-PINN (left) based on the numerical error (right).
Figure 9.
Visual representations of the approximation compared to the high-fidelity solution are depicted at various time points throughout its temporal evolution for Equation (3).
4.3.1. The Problem of Large Space–Time Domain
In the subsequent analysis, we expand our research to a larger spatiotemporal domain. This expansion enables us to assess the performance of Multiple DA-PINN on problems of increasing complexity. By applying Multiple DA-PINN to this extended domain, we aim to ascertain its efficacy in capturing spatiotemporal dynamics in larger-scale problems. The relative error of solving this problem with the Multiple DA-PINN method is 8.578, which is 82% lower than that obtained with the baseline PINN method.
We see in Figure 10 that the DA-PINN prediction is very accurate and indistinguishable from the exact solution. For the test problem, the absolute errors are conveyed in Table 8 for different values of t and x.
Figure 10.
Figure of numerical and exact solutions of Equation (3) solved by DA-PINN in the spatiotemporal domain .
Table 8.
errors of Equation (3) estimated by DA-PINN at various time scales.
4.3.2. Ablation Analysis Experiments
To systematically evaluate the contribution of each component in the proposed DA-PINN method and its sensitivity to key hyperparameters, a series of ablation and sensitivity experiments are conducted in this section. All experiments are performed on the Fornberg–Whitham equation introduced in Section 4.2. The primary goal of these experiments is to isolate the two core components of the DA-PINN framework, domain adaptation and the multi-scale structure (Mscale-DNN), to verify their necessity. Under the stated spatiotemporal domain settings, four model configurations are compared:
- (a) Baseline PINN: the standard PINN method.
- (b) PINN + Mscale: only the multi-scale network structure, without domain adaptation.
- (c) DA-PINN (ours): domain adaptation with a standard fully connected neural network structure.
- (d) DA-PINN + Mscale: our complete proposed model, integrating both domain adaptation and the multi-scale network structure.
The relative error is adopted as the evaluation metric, and the results are presented in Table 9. The convergence process of relative L2 error is shown in Figure 11.
Table 9.
Ablation study on relative errors of different model configurations.
Figure 11.
The descending curves of errors under the four configurations in the spatiotemporal domain .
It is evident from the results that introducing transfer learning yields an order-of-magnitude improvement in accuracy. This indicates that for problems with a certain spatial scale, decomposing complex tasks via domain adaptation strategies is crucial for enhancing PINN performance. Regarding the multi-scale structure, its standalone application brings limited improvements. However, when combined with transfer learning, it further enhances accuracy. This suggests that Mscale-DNN effectively improves the model’s expressive capacity, enabling better fitting of high-frequency components of the solution, thus forming a strong complement to the domain adaptation strategy.
5. Conclusions
In this study, we propose the domain adaptation in transfer learning for PINNs (DA-PINN) method for solving Fornberg–Whitham-type equations. This study represents the first attempt to employ neural networks for solving the FW equation. Our results demonstrate the feasibility of using neural networks to solve the FW equation, with particularly strong performance on inverse FW problems. The integration of transfer learning strategies within the PINN framework significantly enhances training effectiveness. For larger spatiotemporal scales, DA-PINN serves as a viable solution where PINNs fail. Our method effectively alleviates the issue of becoming trapped in local optima during optimization, thus enhancing predictive accuracy. Moreover, it accelerates loss convergence and reduces computational time in long-term predictive scenarios. Compared to the traditional VIM, DA-PINN exhibits higher accuracy in solving the modified FW equation, highlighting the potential of our proposed approach for numerically solving the FW equation.
Author Contributions
Conceptualization, S.L. (Shaoyong Lai); methodology, S.L. (Shirong Li); software, S.L. (Shirong Li), H.G. and M.M.; validation, S.L. (Shirong Li); formal analysis, H.G. and S.L. (Shirong Li); investigation, M.M. and S.L. (Shirong Li); resources, S.L. (Shaoyong Lai); data curation, S.L. (Shirong Li); writing—original draft preparation, S.L. (Shirong Li); writing—review and editing, S.L. (Shirong Li) and H.G.; visualization, S.L.; supervision, S.L. (Shirong Li); project administration, S.L. (Shirong Li); funding acquisition, S.L. (Shirong Li) and M.M. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by Natural Science Foundation of Xinjiang Uygur Autonomous Region (NO.2025D01B15) and the Initial Fund of Kashi University (NO. (2024) 2921).
Data Availability Statement
The data presented in this study are openly available in [GitHub] at [https://github.com/1shirong/DA-PINN, the access date of the URL is 23 October 2025].
Conflicts of Interest
The authors declare no conflict of interest.
References
- Camacho, J.C.; Rosa, M.; Gandarias, M.L.; Bruzón, M.S. Classical symmetries travelling wave solutions and conservation laws of a generalized Fornberg-Whitham equation. J. Comput. Appl. Math. 2017, 318, 149–155. [Google Scholar] [CrossRef]
- Whitham, G.B. Linear and Nonlinear Waves; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Tian, L.; Gao, Y. The global attractor of the viscous Fornberg-Whitham equation. Nonlinear Anal. 2009, 71, 5176–5186. [Google Scholar] [CrossRef]
- Meng, Y.; Tian, L. Boundary control on the viscous Fornberg-Whitham equation. Nonlinear Anal. Real World Appl. 2010, 11, 827–837. [Google Scholar] [CrossRef]
- Elbrächter, D.; Perekrestenko, D.; Grohs, P.; Bölcskei, H. Deep neural network approximation theory. IEEE Trans. Inform. Theory. 2021, 67, 2581–2623. [Google Scholar] [CrossRef]
- Baydin, A.G.; Pearlmutter, B.A.; Radul, A.A.; Siskind, J.M. Automatic differentiation in machine learning: A survey. J. Mach. Learn. Res. 2018, 18, 1–43. [Google Scholar]
- Wang, J.; Feng, X.; Xu, H. Adaptive sampling points based multi-scale residual network for solving partial differential equations. Comput. Math. Appl. 2024, 169, 223–236. [Google Scholar] [CrossRef]
- Wang, C.; Ma, H. Research on a Rapid Three-Dimensional Compressor Flow Field Prediction Method Integrating U-Net and Physics-Informed Neural Networks. Mathematics 2025, 13, 2396. [Google Scholar] [CrossRef]
- Zhou, M.; Mei, G. Transfer Learning-Based Coupling of Smoothed Finite Element Method and Physics-Informed Neural Network for Solving Elastoplastic Inverse Problems. Mathematics 2023, 11, 2529. [Google Scholar] [CrossRef]
- Kutz, J.N. Deep learning in fluid dynamics. J. Fluid. Mech. 2017, 814, 1–4. [Google Scholar] [CrossRef]
- Froio, A.; Bonifetto, R.; Carli, S.; Quartararo, A.; Savoldi, L.; Zanino, R. Design and optimization of artificial neural networks for the modelling of superconducting magnets operation in tokamak fusion reactors. J. Comput. Phys. 2016, 321, 476–491. [Google Scholar] [CrossRef]
- Lagaris, I.E.; Likas, A.; Fotiadis, D.I. Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans. Neural. Netw. 1998, 9, 987–1000. [Google Scholar] [CrossRef] [PubMed]
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
- Shin, Y.; Darbon, J.; Karniadakis, G.E. On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Commun. Comput. Phys. 2020, 28, 2042–2074. [Google Scholar] [CrossRef]
- Mishra, S.; Molinaro, R. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA J. Numer. Anal. 2023, 43, 1–43. [Google Scholar] [CrossRef]
- Pang, G.; Lu, L.; Karniadakis, G.E. fPINN: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, 2603–2626. [Google Scholar] [CrossRef]
- Kharazmi, E.; Cai, M.; Zheng, X.; Zhang, Z.; Lin, G.; Karniadakis, G.E. Identifiability and predictability of integer-and fractional-order epidemiological models using physics-informed neural networks. Nat. Comput. Sci. 2021, 1, 744–753. [Google Scholar] [CrossRef]
- Zhang, D.; Ling, G.; Karniadakis, G.E. Learning in modal space: Solving time-dependent stochastic PDEs using physics-informed neural networks. SIAM J. Sci. Comput. 2020, 42, 639–665. [Google Scholar] [CrossRef]
- Yang, L.; Zhang, D.; Karniadakis, G.E. Physics-informed generative adversarial networks for stochastic differential equations. SIAM J. Sci. Comput. 2020, 42, A292–A317. [Google Scholar] [CrossRef]
- Wang, S.; Yu, X.; Perdikaris, P. When and why PINN fail to train: A neural tangent kernel perspective. J. Comput. Phys. 2022, 449, 110768. [Google Scholar] [CrossRef]
- Wang, S.; Teng, Y.; Perdikaris, P. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM J. Sci. Comput. 2021, 43, 3055–3081. [Google Scholar] [CrossRef]
- Jagtap, A.D.; Kharazmi, E.; Karniadakis, G.E. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput. Methods. Appl. Mech. Eng. 2020, 365, 113028. [Google Scholar] [CrossRef]
- Mattey, R.; Ghosh, S. A novel sequential method to train physics informed neural networks for Allen Cahn and Cahn Hilliard equations. Comput. Methods. Appl. Mech. Eng. 2022, 390, 114474. [Google Scholar] [CrossRef]
- Lu, L.; Meng, X.; Mao, Z.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 2021, 63, 208–228. [Google Scholar] [CrossRef]
- Haitsiukevich, K.; Ilin, A. Improved training of physics-informed neural networks with model ensembles. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; pp. 1–8. [Google Scholar]
- Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE. Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Guan, Y.; Chattopadhyay, A.; Subel, A.; Hassanzadeh, P. Stable a posteriori les of 2d turbulence using convolutional neural networks: Backscattering analysis and generalization to higher re via transfer learning. J. Comput. Phys. 2022, 458, 111090. [Google Scholar] [CrossRef]
- Goswami, S.; Anitescu, C.; Chakraborty, S.; Rabczuk, T. Transfer learning enhanced physics informed neural network for phase-field modeling of fracture. Theor. Appl. Fract. Mech. 2020, 106, 102447. [Google Scholar] [CrossRef]
- Chakraborty, S. Transfer learning based multi-fidelity physics informed deep neural network. J. Comput. Phys. 2021, 426, 109942. [Google Scholar] [CrossRef]
- Farahani, A.; Voghoei, S.; Rasheed, K.; Arabnia, H.R. A brief review of domain adaptation. In Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020; Springer: Cham, Switzerland, 2021; pp. 877–894. [Google Scholar]
- Lu, J. An analytical approach to the Fornberg-Whitham type equations by using the variational iteration method. Comput. Math. Appl. 2011, 61, 2010–2013. [Google Scholar] [CrossRef]
- Xu, Z.-Q.J.; Zhang, Y.; Xiao, Y. Training behavior of deep neural network in frequency domain. In Neural Information Processing; Springer International Publishing: Cham, Switzerland, 2019; pp. 264–274. [Google Scholar]
- He, Q.Z.; Barajas-Solano, D.; Tartakovsky, G.; Tartakovsky, A.M. Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport. Adv. Water. Resour. 2020, 141, 103610. [Google Scholar] [CrossRef]
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006. [Google Scholar]
- Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).