Learning Traveling Solitary Waves Using Separable Gaussian Neural Networks

In this paper, we apply a machine-learning approach to learn traveling solitary waves across various physical systems described by families of partial differential equations (PDEs). Our approach integrates a novel, interpretable neural network (NN) architecture, called Separable Gaussian Neural Networks (SGNN), into the framework of Physics-Informed Neural Networks (PINNs). Unlike traditional PINNs that treat spatial and temporal data as independent inputs, the present method leverages wave characteristics to transform the data into the so-called co-traveling wave frame. This reformulation effectively addresses the issue of propagation failure that PINNs suffer from when applied to large computational domains. Here, the SGNN architecture demonstrates robust approximation capabilities for single-peakon, multi-peakon, and stationary solutions (known as "leftons") within the (1+1)-dimensional b-family of PDEs. In addition, we expand our investigation and explore not only peakon solutions in the ab-family but also compacton solutions in the (2+1)-dimensional Rosenau-Hyman family of PDEs. A comparative analysis with multi-layer perceptrons (MLPs) reveals that SGNN achieves comparable accuracy with fewer than a tenth of the neurons, underscoring its efficiency and potential for broader application in solving complex nonlinear PDEs.


Introduction
Physics-informed Neural Networks (PINNs) [1,2] have emerged as a promising data-driven approach to solving partial differential equations (PDEs) by synthesizing data and physical laws. Moreover, they have received considerable traction because they can be efficiently adapted to solving PDEs defined on domains with arbitrary geometry. Remarkable results with PINNs have been achieved across multiple domains and physical situations, such as heat transfer [3], the Navier-Stokes [4] and Euler equations [5], nonlinear dynamical lattices [6,7], and medical image processing [8], to name a few.
However, many examples of PINNs are limited to "toy" problems situated in low-dimensional spaces with small spatio-temporal, i.e., computational domains. It has been observed that PINNs often converge to incorrect or trivial solutions across a broad spectrum of problems [9-11] (see also [6] for a case where they fail to respect symmetries). This issue becomes more pronounced in problems with larger domains, where a phenomenon known as propagation failure [12] frequently occurs. This challenge arises because PINNs utilize an unsupervised learning scheme to solve PDEs by minimizing the residual errors of the underlying governing equations. Minimizing the residual alone does not ensure convergence to an accurate solution, as numerous trivial solutions can also exhibit zero residuals. Therefore, as the learning process attempts to extend the solution from the initial and/or boundary conditions to the interior points, it often becomes "trapped" in trivial solutions. This phenomenon is particularly common when PINNs are applied to solve problems with large domains. Indicatively, Fig. 1 highlights this issue in the Camassa-Holm (CH) equation [13] for a single-peakon solution.
To address the pathologies of PINNs, multiple methods have been developed, including ones that consider embedding Fourier features [14], adaptive sampling [12,15], and those respecting causality [16]. In fact, besides the physical laws embodied in the PDEs themselves, the mathematical properties of their solutions can be leveraged too. For example, traveling waves (TWs) of PDEs are solutions of the form u(x ± ct), where c is their speed (the "-" and "+" signs correspond to TWs moving, i.e., traveling, to the right and left, respectively). However, few efforts have been devoted to embedding such mathematical properties of solutions into PINNs (see [6] for the development of symmetry-preserving PINNs) such that the output of neural networks (NNs) automatically respects the corresponding features of the solution. This is expected to improve the efficiency of training and increase the chance for NNs to converge to correct solutions.
In this paper, we aim to enhance PINNs by pursuing this route. More specifically, we will focus on seeking TW solutions to nonlinear PDEs using PINNs with input transformed into a frame that co-moves with the solution, i.e., the co-traveling frame. This idea has been explored in the recent work [18], in which the characteristics of hyperbolic PDEs are encoded in the network by adding a characteristic layer. Herein, we will use this structure to learn TWs, i.e., solitary waves, in multiple families of one- and two-dimensional nonlinear PDEs. Those include the b- and ab-families of peakon equations [19,20], which contain the (completely integrable) Camassa-Holm (CH) and Degasperis-Procesi equations [13,21] (see also [22]), and the Rosenau-Hyman compacton equations [23] (see also [24]). In addition, a novel interpretable NN, the Separable Gaussian Neural Network (SGNN) [25], will be employed to extract solution forms in the sense of generalized Gaussian radial-basis functions. The description of this network is deferred to Section 2, along with a discussion of its advantages.
The rest of the paper is organized as follows. In Section 2, we introduce the architecture of SGNN with its input transformed to co-traveling frames. Our aim is to integrate this structure into the framework of PINNs and identify TWs of the nonlinear PDEs of interest in this work. In Sections 3 and 4, we demonstrate the applicability of the method to the study of peakons in the b- and ab-families of peakon equations, respectively. Then, we extend this approach in Section 5 to identify 2D compacton configurations. We mention in passing that the architecture can easily predict such higher-dimensional solutions, which, to the best of our knowledge, have not been studied in the realm of PINNs. Finally, we perform an extensive comparison of PINNs with different network structures in Section 6, where the advantages, disadvantages, and limitations of SGNN are discussed. We conclude our findings in Section 7, and present future research directions.

Architecture of SGNN for traveling waves
Inspired by [18], a d-dimensional (in space) TW is mapped into a frame that co-moves with it by performing the coordinate transformation ζ_i = x_i − c_i t, where c_i is its constant velocity in the i-th dimension. Under such a transformation, a TW becomes a stationary wave in the co-traveling frame. As shown in Fig. 2, the coordinates ζ_i (i = 1, 2, . . ., d) become the input of the SGNN. The coordinates are then divided according to their dimensions, and sequentially fed to the feedforward layers. This results in a number of layers that is equal to the number of spatial dimensions. The neurons of each layer we consider are generalized univariate Gaussian functions of the form φ(ζ) = exp(−|ζ − μ|^α / σ^α), where μ and σ are the center and width of the neuron, and α ∈ R − {0}. When α = 2, φ is the regular univariate radial-basis Gaussian function. In this paper, we will adopt α = 1 for peakon solutions, and α = 2 for other solutions.
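In code, the transformation and the generalized Gaussian neuron can be sketched as follows. This is a minimal NumPy illustration; the function names and the parametrization φ(ζ) = exp(−|ζ − μ|^α / σ^α) are our assumptions for exposition, not the exact implementation of [25]:

```python
import numpy as np

def to_cotraveling(x, t, c):
    """Map lab-frame coordinates to the co-traveling frame: zeta = x - c*t."""
    return x - c * t

def gaussian_neuron(zeta, mu, sigma, alpha):
    """Generalized univariate Gaussian neuron (assumed parametrization):
    alpha = 2 recovers the standard RBF, alpha = 1 gives an exponential
    (peakon-like) cusp profile."""
    return np.exp(-np.abs(zeta - mu) ** alpha / sigma ** alpha)

# A right-moving wave u(x, t) = phi(x - c*t) is stationary in the
# co-traveling frame: the profile at t = 0 and t = 5 coincides.
x, c = np.linspace(-10.0, 10.0, 201), 1.0
u0 = gaussian_neuron(to_cotraveling(x, 0.0, c), mu=0.0, sigma=1.0, alpha=1)
u5 = gaussian_neuron(to_cotraveling(x + 5.0, 5.0, c), mu=0.0, sigma=1.0, alpha=1)
print(np.max(np.abs(u0 - u5)))  # ~0, up to floating-point round-off
```

The shifted evaluation at t = 5 lands on the same ζ values as the t = 0 snapshot, which is precisely why the transformed problem becomes stationary.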
The first hidden layer of SGNN receives a single input: the partial coordinate ζ_1. Subsequent hidden layers take two inputs: the output from the preceding hidden layer and a coordinate in the traveling frame. The network culminates in a dense output layer, which aggregates the outputs from the final hidden layer. The mathematical representation of SGNN [25] is

N_i^{(1)} = φ_i^{(1)}(ζ_1),   N_i^{(l)} = φ_i^{(l)}(ζ_l) Σ_{j=1}^{N_{l−1}} w_{ij}^{(l)} N_j^{(l−1)}  (l = 2, . . ., d),   ū = Σ_{i=1}^{N_d} w_i N_i^{(d)},

where N_i^{(l)} represents the output of the i-th neuron of the l-th layer, N_l stands for the number of neurons of the l-th layer, and ū is the output of SGNN. When d > 2, the weights of the output layer are set to 1.
Thanks to the separability of Gaussian radial-basis functions, the forward propagation of such univariate Gaussian functions yields the summation of multiple chains of univariate Gaussian functions, which is equivalent to the summation of high-dimensional Gaussian radial-basis functions. In other words, the output of an SGNN is equivalent to the output of a GRBFNN [26], in the form ū(ζ) = Σ_j W_j G_j(ζ), where G_j is a d-dimensional Gaussian radial-basis function.
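The separability property is easy to verify numerically: for α = 2, a product of univariate Gaussians sharing a width coincides with a single d-dimensional radial-basis Gaussian with the same centers. A small NumPy check (illustrative only):

```python
import numpy as np

# Product of univariate Gaussians (alpha = 2) equals one 2-dimensional
# radial-basis Gaussian with the same centers and width.
rng = np.random.default_rng(0)
pts = rng.uniform(-3, 3, size=(1000, 2))      # (zeta_1, zeta_2) samples
mu, sigma = np.array([0.5, -1.0]), 1.3

separable = np.exp(-(pts[:, 0] - mu[0]) ** 2 / sigma ** 2) * \
            np.exp(-(pts[:, 1] - mu[1]) ** 2 / sigma ** 2)
radial = np.exp(-np.sum((pts - mu) ** 2, axis=1) / sigma ** 2)

print(np.max(np.abs(separable - radial)))     # ~1e-16: the two forms agree
```

This identity is what lets a chain of cheap univariate layers stand in for a full d-dimensional GRBF expansion.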
The SGNN offers several advantages. Firstly, it is interpretable: the parameters of a neuron depict its local geometric information (center and width). Without the composition of nonlinear activation functions, a human-interpretable explicit output of the form of Eq. 6 can be obtained, in the sense of Gaussian radial-basis functions. Secondly, SGNN is easier to tune than an MLP. This is because the depth of SGNN is identical to the number of dimensions; therefore, the only tunable hyperparameter is the width of each layer. Lastly, it can achieve results that are several orders of magnitude more accurate than an MLP when approximating complex functions. The interested reader can consult [25] for detailed comparisons between SGNN and MLP.

Physics-informed Machine Learning
The SGNN is adopted to approximate the solution u(x, t) of PDEs, subject to boundary and initial conditions (abbreviated hereafter as BCs and ICs, respectively). The loss function is defined as

L = λ_r L_r + λ_bc L_bc + λ_ic L_ic,

where λ_r, λ_bc, λ_ic are scaling factors. Here, L_r, L_bc, and L_ic represent the MSE (mean-squared error) of the PDE residuals, BCs, and ICs, respectively. The collocation points, denoted as {x_r^i, t_r^i} and {x_bc^i, t_bc^i}, are randomly sampled in the domain and on the boundary, respectively. In addition, {x_ic^i} are spatial points sampled at t = 0.
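A minimal sketch of the composite loss follows, assuming plain mean-squared residuals and illustrative scaling factors; in practice the interior residual is computed by automatic differentiation of the network output:

```python
import numpy as np

def mse(v):
    """Mean-squared error of a residual vector."""
    return np.mean(v ** 2)

def pinn_loss(res_interior, res_bc, res_ic,
              lam_r=1.0, lam_bc=100.0, lam_ic=1000.0):
    """Weighted composite loss L = lam_r*L_r + lam_bc*L_bc + lam_ic*L_ic,
    where the terms are the MSE of the PDE residual at interior collocation
    points, the BC mismatch, and the IC mismatch (weights are illustrative)."""
    return lam_r * mse(res_interior) + lam_bc * mse(res_bc) + lam_ic * mse(res_ic)

# Residuals of an exact solution vanish, so the loss is zero.
print(pinn_loss(np.zeros(4096), np.zeros(512), np.zeros(512)))  # 0.0
```

The large IC weight mirrors the trial-and-error choice λ_ic = 1000 reported later in the paper; the exact values are problem-dependent.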

Training scheme
In the 1D problems presented next, we employ a two-stage training process for SGNN, initially using the ADAM optimizer [27] followed by the L-BFGS algorithm [28]. This approach allows us to leverage the L-BFGS algorithm's capability to enhance convergence accuracy after the preliminary optimization with ADAM. In contrast, for 2D problems, we rely solely on the ADAM optimizer due to the computational demands of running L-BFGS with larger datasets. The training dataset is randomly sampled using the Sobol method, which empirically can yield better results [15]. The validation dataset is created through even partitioning across the domain and boundaries, ensuring comprehensive coverage and testing of the model's predictive capabilities. Throughout the training phase, the SGNN's weights are initialized from a uniform random distribution. Additionally, the initial centers of the univariate Gaussian neurons are distributed evenly across the respective dimensions, with their initial widths defined by the distance between adjacent centers.

Peakons in b-family
The first model-PDE we consider in this work is the well-known b-family of peakon equations,

u_t − u_xxt + (b + 1)uu_x = bu_x u_xx + uu_xxx,    (15)

that was introduced in [19]. It has been proposed as a model for the propagation of shallow water waves [19], with the parameter b related to the Kodama transformation group of shallow water equations [29,30]. Moreover, Eq. (15) contains two completely integrable models for b = 2 and b = 3, known as the Camassa-Holm equation [13,31] (see also [22]) and the Degasperis-Procesi equation [21], respectively. A striking feature of the b-family of Eq. (15) is that it possesses explicit single-peakon and multi-peakon solutions for all values of b,

u(x, t) = Σ_{j=1}^{N} p_j(t) e^{−|x−q_j(t)|},    (16)

where q_j and p_j are the position and amplitude of the j-th peakon, with N representing the number of peakons, i.e., j = 1, . . ., N. The peakon solution of Eq. (16) (and similarly, its multi-peakon version) is not differentiable at its center, rendering its analytical and numerical study (from the PDE point of view) an extremely challenging task (see [32] for the spectral stability analysis of peakons). It should be noted in passing that alongside the existence of peakon solutions, the b-family possesses explicit stationary solutions known as "leftons" [32] (and references therein), parametrized by their amplitude A and center x_0 [cf. Eq. (18)]. These solutions also emerge numerically upon propagating Gaussian initial data in Eq. (15) for b < −1. Even more, the propagation of Gaussian initial data in the b-family with −1 < b < 1 results in the emergence of "ramp-cliffs" (see [32] and references therein), that is, solutions that involve a combination of ramp-like solutions of the Burgers equation (evolving as x/t) together with an exponentially decaying tail, i.e., a cliff.
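For reference, the explicit (multi-)peakon profile can be evaluated directly. The following NumPy sketch (with an illustrative grid) checks that the single-peakon crest of u(x, t) = c e^{−|x−ct|} sits at x = ct with height c:

```python
import numpy as np

def peakon(x, t, c):
    """Single peakon of the b-family: u(x, t) = c * exp(-|x - c*t|)."""
    return c * np.exp(-np.abs(x - c * t))

def multi_peakon(x, p, q):
    """Superposition u(x) = sum_j p_j exp(-|x - q_j|), with amplitudes p_j
    and positions q_j (time-dependent in the full multi-peakon dynamics)."""
    return sum(pj * np.exp(-np.abs(x - qj)) for pj, qj in zip(p, q))

x = np.linspace(-20, 20, 801)
u = peakon(x, t=3.0, c=1.0)
print(x[np.argmax(u)], u.max())   # crest near x = c*t = 3 with height ~ c = 1
```

With N = 1, the multi-peakon form reduces to the single peakon, i.e., p_1 = c and q_1 = ct.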
Having introduced the model of interest, we will use the SGNN to approximate both one-peakon and multi-peakon solutions in the next section.
The data collection process involves the sampling of 2^12 = 4096 points within the specified domain. Additionally, we use the Sobol sampling scheme to gather 2^9 = 512 boundary points, and another 512 spatial points satisfying the initial condition. It should be noted that the number of samples is larger than the numbers reported in the literature. This increase in sample size is attributed to the comparatively larger domain size in our analysis. We train SGNN for 5,000 epochs using ADAM [27], followed by L-BFGS [28] to refine the results. The dataset is divided into 8 mini-batches. The learning rate of ADAM is 1e-2 for the first 100 epochs, and 1e-3 for the rest. We report that the mean-squared loss is 8.43e-3 when training finishes. It should also be noted that this value is scaled by a relatively large scaling factor (λ_ic = 1000) that is selected by trial and error. On the other hand, the mean-squared validation error is much smaller, with a value of 7.21e-6. The maximum absolute validation error is 3.90e-2. As illustrated in Fig. 3(b), the inferred peakon solution with c = 1.0 accurately approximates the exact solution, with the error maximized at the crest of the peakon. The good agreement is also demonstrated in Fig. 3(c), where the "x" markers stand for the exact solution [cf. Eq. (16)], and lines for the predicted solution by SGNN at two different instants of time (see the legend in the figure).
A case corresponding to an anti-peakon solution with c = −1.0 is presented in Fig. 4. The prediction by SGNN yields a mean-squared loss of 1.94e-11. This means that the inference of SGNN precisely matches the exact solution. The largest error occurs on the characteristic curve x + t = −10, with a magnitude of order 1e-5.

Other values of b
We next investigate the emergence of peakons using SGNN for different values of b. Indeed, Fig. 5(a) presents a peakon solution predicted by SGNN with b = 0.8 and c = 1.5. While the temporal domain remains [0, 10], the spatial domain is enlarged to [−30, 30] in order to accommodate the higher velocity, so that the peakon "fits" in the computational domain over its propagation. As shown in Fig. 5(b), the prediction matches the exact solution very well. The mean-squared validation error is 4.09e-6, and the maximum absolute error is 0.0274. The maximum absolute error appears in the region where u(x, t) reaches its peak value. The training loss after 5,000 epochs is reduced to 9.18e-2. The waveforms at t = 0 and t = 10 are depicted in Fig. 5(c), where lines represent SGNN's prediction, and "x" markers represent the exact solution, respectively. The predicted peakon solution with b = −1, c = 0.8 is presented in Fig. 6. Likewise, a good agreement between the inference by SGNN and the exact solution u(x, t) = 0.8e^{−|x−0.8t|} is achieved, with a training loss of 3.1e-2, a mean-squared validation loss of 3.5e-5, and a maximum absolute validation error of 5.92e-3.

Interacting peakons
Having discussed the prediction of single-peakon (and anti-peakon) solutions in the b-family, we now turn our focus to cases involving two-peakon configurations in the CH equation (b = 2), thus emulating their interactions. In particular, we focus on the following three specific scenarios: (1) peakons traveling along the same direction with identical speed, (2) peakons traveling in the same direction but at different velocities, and (3) peakons moving in opposite directions. Given that peakons can travel at varying speeds and in distinct directions in space (i.e., either left or right), we employ multiple SGNNs to approximate these peakons, allocating one SGNN per peakon. The sum of such SGNNs produces the NN-solution of Eq. (15), and the NN structure in this case is shown in Fig. 7. During the training stage, the loss functions associated with the PDE and BCs are identical to those in Eqs. (11) and (12). However, it is necessary to modify the loss function of the ICs such that the output of each SGNN at t = 0 accurately reflects the corresponding peakon solution at t = 0. We inspect the response from t = 0 to t = 10, within the spatial domain [−30, 30]. Two one-layer SGNNs with 40 neurons each are used. Each training dataset is generated by randomly sampling 2^13 = 8192 collocation points within the domain, and 2^10 = 1024 points on the boundary. The dataset is then divided into 8 mini-batches. The results are obtained with 5,000 training epochs by ADAM, followed by refinement by L-BFGS (as before). The validation set is generated by uniformly sampling a 50 × 100 grid in the domain, including BCs and ICs.
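The summed-network ansatz can be sketched as follows, with each "network" reduced to a single α = 1 exponential neuron purely for illustration (the actual SGNNs have many trainable centers, widths, and weights):

```python
import numpy as np

def net_j(zeta, centers, widths, weights):
    """Toy stand-in for one SGNN: a weighted sum of alpha = 1 neurons."""
    return sum(w * np.exp(-np.abs(zeta - m) / s)
               for w, m, s in zip(weights, centers, widths))

def two_peakon_ansatz(x, t, c1, c2, params1, params2):
    """Sum of two networks, each fed its own co-traveling coordinate x - c_j*t."""
    return net_j(x - c1 * t, *params1) + net_j(x - c2 * t, *params2)

# With one neuron per network, the ansatz reproduces the t = 0 initial
# condition u(x, 0) = exp(-|x + 2|) + exp(-|x - 2|) of the equal-speed case.
x = np.linspace(-30, 30, 601)
p1 = ([-2.0], [1.0], [1.0])   # (centers, widths, weights) for peakon 1
p2 = ([2.0], [1.0], [1.0])    # (centers, widths, weights) for peakon 2
u0 = two_peakon_ansatz(x, 0.0, 1.0, 1.0, p1, p2)
exact0 = np.exp(-np.abs(x + 2)) + np.exp(-np.abs(x - 2))
print(np.max(np.abs(u0 - exact0)))  # ~0
```

Allocating one frame per network is what lets peakons with different speeds (or directions) each be stationary in their own coordinates.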
In Fig. 8, two peakons traveling towards the right with identical speed c = 1 are presented. The IC employed here is u(x, 0) = e^{−|x+2|} + e^{−|x−2|}, which forms a bimodal shape. The training error is 4.27e-3, with scaling factors λ_ic = λ_bc = 100. As shown in Fig. 8(a), the peakons maintain their distance during propagation. Moreover, it can be discerned from Fig. 8(b) that SGNN is capable of making very good predictions of such configurations. Indeed, the mean-squared and maximum absolute errors are 2.99e-5 and 5.22e-2, respectively, for this case. The good agreement between the SGNN and exact solutions is further demonstrated in Fig. 8(c), where "x" markers are for exact solutions and solid lines are predictions by SGNN.
The complementary case corresponding to the interaction of two anti-peakons traveling at different speeds is presented in Fig. 9. In particular, we consider a configuration involving two anti-peakons: one centered at x = −5 with speed 0.8, and another one whose center is (symmetrically) placed at x = 5 and travels with velocity 2.2. The respective IC that describes this configuration is u(x, 0) = −0.8e^{−|x+5|} − 2.2e^{−|x−5|}. The training error is 0.0188, with scaling factors λ_ic = λ_bc = 100. On the validation dataset, the mean-squared error is 2.99e-5. In addition, the maximum absolute error is 5.22e-2, which is reflected in Fig. 9(b). The interaction of these two anti-peakons is shown in Fig. 9(a). The second anti-peakon (in darker red), possessing a higher velocity, eventually overtakes the first one (in orange), despite initially lagging behind. A good agreement between the SGNN prediction and the exact solution is demonstrated in Fig. 9(c). The second anti-peakon catches up with the first one at t = 7.14, where their peaks add up, as shown in Fig. 9(d). Finally, Fig. 10 shows a more realistic scenario: the (elastic) collision between a peakon and an anti-peakon. In this case, the IC considered is given by u(x, 0) = −e^{−|x−2|} + e^{−|x+2|}, where the training error is 6.12e-3, with scaling factors λ_ic = λ_bc = 100. The mean-squared validation error is 2.11e-5, while the maximum absolute validation error is 5.54e-2, as illustrated in Fig. 10(b). As expected, the peakon (light red) and anti-peakon (in darker red) move towards each other with the same speed, as shown in Fig. 10(a), until they collide at t = 2. Indeed, Fig. 10(d) showcases the predicted solution at the time of collision, where the waveforms cancel each other. Then, at later times, i.e., t > 2, the anti-peakon and peakon re-emerge, and they maintain their shapes after the collision, as shown in Fig. 10(c).

Lefton solutions
The last case that we consider using SGNNs is the lefton regime, i.e., b < −1, whose explicit solution form is given by Eq. (18). Herein, we study such solutions at b = −2. For our training dataset, we randomly selected 2^12 = 4096 points within the domain alongside an additional 2^9 = 512 points, subsequently dividing this dataset into 8 mini-batches. The chosen time domain is t ∈ [−10, 10], and the spatial domain x ∈ [−20, 20]. As depicted in Figure 11, there is a high degree of concordance between the SGNN predictions and the exact solutions. The training loss, adjusted by scaling factors λ_ic = 1000 and λ_bc = 1, was recorded at 0.054. The mean-squared validation error stands at 4.62e-6, with the maximum absolute validation error reaching 1.12e-2.

Peakons in ab-family
In this section, we turn our focus to the applicability of SGNN to the so-called ab-family [20] of peakon equations. The ab-family is a generalization of the b-family [cf. Eq. (15)] in the sense that it corresponds to cubic (in its nonlinearity) CH-type equations, unlike the quadratic CH-type equations of the b-family [20]. Interestingly, the ab-family admits a one-peakon solution. For the applicability of SGNN, we inspect the peakon solution in the spatial domain x ∈ [−20, 20] and time domain t ∈ [0, 10]. An SGNN with 80 neurons is used to approximate the one-peakon solution in the ab-family. To generate the training dataset, we randomly generate 2^13 = 8192 collocation samples within the domain and 2^10 = 1024 samples on the boundary. The training dataset is evenly split into 8 mini-batches. In the loss function, λ_ic = 1000 and λ_bc = 100 are applied to penalize the initial and boundary conditions. As before, the ADAM method is used to train the SGNN, followed by refinement with L-BFGS.
Distinct from the members of the b-family, both peakons and anti-peakons of the ab-family propagate in the same direction. This behavior is confirmed using parameters b = 2.0, a = 1/3, and c = 1, as illustrated in Figs. 12 and 13. The training losses for the peakon and anti-peakon solutions are recorded at 0.0141 and 0.0138, respectively. For the peakon solution, the mean-squared error across the validation set is measured at 8.63e-6, with the maximum absolute error reaching 3.7e-2. Similarly, the anti-peakon solution exhibits a mean-squared error of 8.24e-6 over the validation set, and its maximum absolute error is 4.5e-2.

2D compactons
In this section, we depart from the previous one-dimensional (in space) studies, and apply SGNN in order to predict TWs in two-dimensional nonlinear wave equations. More specifically, we focus on TWs that have compact support, which are referred to as compactons, and were introduced in [24] (and references therein). Following the notation in [24], there exists a family of PDEs, denoted as C_N(m, a + b), that possesses compacton TWs with m ≥ max(1, a − 1), b > 0. Here, C_N(m, a + b) represents an N-dimensional compacton (with N = 1, 2, 3) with parameter set {m, a, b}. In the following, we restrict ourselves to N = 2. According to [23], Eq. (21) supports traveling compactons traversing in the x direction. In that light, the characteristic takes the form s = x − λt, where λ is the velocity of the compacton. The case with C_2(m = 1 + b, 1 + b) yields an explicit solution expressed through a profile F(R), where u vanishes elsewhere (i.e., compact support). In Eq. (23), R = √(s² + y²), and F(R) = J_0(√b R), where J_0 is the zeroth-order Bessel function, and √b R* is the root of the first-order Bessel function.
We use an SGNN to approximate the compacton C_2(3, 1 + 2). The SGNN has two layers, with 50 neurons per layer, and the approximation is performed in the spatial domain x ∈ [−10, 30] and time domain t ∈ [0, 10]. To generate the training dataset, we randomly sampled 2^16 = 65536 collocation points within the domain, along with 2^12 = 4096 points on the boundary. The dataset is then evenly split into 8 mini-batches. Mini-batch ADAM is used to train the SGNN, with loss functions minimizing the residual errors of the PDE, initial conditions, and boundary conditions.
In Fig. 14, the SGNN's prediction (left column) is presented against the exact (right column) compacton solution C_2(3, 1 + 2). The training loss is stopped at 9.97e-3, with λ_ic = 100 and λ_bc = 10. The mean-squared validation error is 1.58e-4, while the maximum absolute error is 0.371. The compacton travels along the x-axis with a velocity of λ = 1. At t = 0 (top panel), the compacton commences with its center placed at x = 0. The middle and bottom panels present snapshots of the compacton at t = 5 and t = 10, respectively. We report that the SGNN's prediction captures the main characteristics of the exact solution, although minor errors appear around the edges of the compacton.
As a last case, we consider the compacton C_2(m = 2, a + b = 3), whose explicit solution has compact support, with u vanishing elsewhere. Following Ref. [23], we pick N = 2, m = 2, a = 0, and b = 3. The SGNN's prediction and the exact solution of the C_2(2, 0 + 3) compacton are presented in the left and right columns of Fig. 15, respectively. Snapshots of the solutions are shown at t = 0 (top), t = 5 (middle), and t = 10 (bottom) therein. In this case, we report that the predictions precisely match the exact solution at these times. The scaling factors in the loss function are λ_ic = 100 and λ_bc = 10, and the training loss is stopped at 1.23e-3 after 300 epochs. The mean-squared validation error is 8.28e-5, while the maximum absolute error is 0.135. In summary, we present the training losses and validation errors of all the previous results (see also the reference Figure in the left column) in Table 1.

Comparison and discussion
To compare the performance of the traditional and new structures of PINNs, we use them to approximate a peakon solution of the CH equation with c = 1. The spatial domain employed is [−20, 20], while the temporal domain is [0, 10]. The size of the training set is 2^12 = 4096, with 2^9 = 512 samples at the initial and boundary conditions. The selection of width and depth for the models is informed by the configurations reported in the existing literature. Additionally, we also compare the performance of SGNN vs. MLP. As shown in Table 2, in the traditional PINN framework, neither SGNN nor MLP can successfully converge to a TW on a large spatial and temporal domain. Despite small training losses, all NN structures get stuck at the trivial solution, as it is very difficult to overcome propagation failure when dealing with enlarged domains. By introducing a training method that respects causality [16] or performs adaptive sampling [15], one may be able to address this failure.
On the other hand, as illustrated in Table 3, with the TW coordinate transformation, identical NN structures of SGNN and MLP with the ReLU activation all converge to the correct solution. Although the training losses of the MLP with hyperbolic tangent and sigmoid activations are relatively large, they all capture the characteristics of the TW. Sigmoid and hyperbolic tangent functions have difficulties approximating the non-differentiable peak of the peakon, with about 0.3 error in the peak value. Notably, these results can be improved by modifying the sampling method, training scheme, and loss functions. With increased depth and width, MLPs with ReLU and sigmoid activations can further reduce the loss values. The loss values of SGNN also gradually drop as the width increases. SGNN excels through its compact structure, which requires fewer than 1/10 of the training parameters. In addition, SGNN can give an explicit solution form of the PDEs in the sense of Gaussian radial-basis functions. However, further increasing the number of neurons in SGNN does not appreciably reduce the loss value. This could be remedied by modifying the training and sampling schemes. Why can the modified structure of PINNs avoid propagation failure and lead to better results? We attempt to answer this question next. By mathematically transforming the NN input to the traveling coordinate x − ct, we inherently produce an output of the form u(x − ct). This representation naturally aligns with the solution form of TWs, maintaining the integrity of the solution's structure. From a physical perspective, this transformation converts a dynamic problem into a static one (i.e., a TW becomes stationary in a frame that co-moves with the solution), thus simplifying the problem considerably. Algorithmically, this transformation effectively reduces the input dimension by one, which can lead to a decrease in the required data size for training. Furthermore, the functional form of TWs
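The first of these points can be checked numerically: any output of the form u(x, t) = g(x − ct) automatically satisfies the traveling-wave constraint u_t + c u_x = 0, regardless of the profile g. A finite-difference sketch, with an illustrative Gaussian stand-in for the trained network:

```python
import numpy as np

# Any function of the co-traveling coordinate x - c*t satisfies the
# advection constraint u_t + c*u_x = 0; verify with central differences.
c, h = 1.5, 1e-5
g = lambda z: np.exp(-z ** 2)          # stand-in for a trained network
u = lambda x, t: g(x - c * t)

x0, t0 = 0.7, 2.0
u_t = (u(x0, t0 + h) - u(x0, t0 - h)) / (2 * h)   # time derivative
u_x = (u(x0 + h, t0) - u(x0 - h, t0)) / (2 * h)   # spatial derivative
print(abs(u_t + c * u_x))              # ~0, up to finite-difference error
```

This is why the transformed network cannot drift toward a non-traveling (e.g., trivial static) solution in the lab frame: the TW structure is built into its output by construction.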

Figure 7 .
Figure 7. The NN architecture used for the study of multiple-peakon configurations.

Table 1 .
The training losses and validation errors of the presented results in sections 2-5.