1. Introduction
Partial differential equations (PDEs) pose some of the most significant challenges in scientific computing and have been rigorously approached using different methods [
1]. Numerical modeling is a powerful mathematical tool in medical, industrial, and academic fields [
3]. Regardless of the field of application, it is essential to understand the system. Modeling a particular system can provide a clear view of its most significant components and influencing factors, thereby unlocking insights for development, control, and maintenance [
3].
In particular, numerical methods play a significant role in modeling ultrasound waves, characterizing acoustic fields, designing ultrasound transducers, and planning ultrasound treatments [
4,
5,
6]. Studying the physical nature of acoustics, especially in ultrasound therapeutics, represents a substantial contribution to noninvasive medical procedures. The ability to simulate the propagation of ultrasound waves within a domain in the human body has an extensive impact on confidence and success prior to the initiation of therapy. This reduces the possibility of erroneous setups, validates safety factors, reduces treatment planning and patient waiting times, and eventually reduces the overall cost of the medical procedure [
4,
7,
8].
Partial differential wave equations are traditionally modeled using tools such as the Finite Difference Method (FDM) [
9,
10], Finite Element Method (FEM) [
11,
12], or spectral methods [
13]. These methods typically rely on polynomials, piecewise polynomials, and other basic functions. Given the methodology of these approaches, the modeled problem must be set up on a mesh (grid) of finite points. Although they are considered elegant and practical strategies, their applicability is easily hindered as the number of dimensions increases. Owing to their mesh-based nature, the increase in dimensions is paired with an increase in computational operations and resource allocation. This modeling complication is referred to as the Curse of Dimensionality (CoD) [
14]. This is one of the most common obstacles in PDE modeling. Another concern that accompanies mesh-based approaches is discretization: the PDE is discretized and then solved through time-stepping. When the grid size is not sufficiently small to capture the desired resolution of the modeled system, discretization error can produce incorrect results [
15]. The term “traditional methods” here also covers methods that provide a solution in a converging series form, such as the Taylor series [
16]. This introduces another layer of complexity into the solution process, as the solution may require many series terms to ensure minimal error and rapid convergence.
Continuous research on artificial intelligence, along with advancements in computing power, has spawned a new field of modeling techniques utilizing Deep Learning (DL) [
17,
18]. Neural Networks (NNs) have been considered universal PDE modeling tools since the 1990s [
19]. One popular option for solving forward problems is Deep Neural Networks (DNNs), where they are trained to predict the solution to a defined physics problem [
20,
21,
22]. Despite their potential, such data-driven approaches require a relatively large amount of training data, which is commonly lacking for many specific problems. In addition, the DNN training process can be challenging because of the difficulty of determining the optimal hyperparameters for the NN.
A recently introduced class of DNNs was explicitly used for solving PDEs by exploiting the physics of the problem. This class of DNNs is referred to as Physics-Informed Neural Networks (PINNs) [
23]. Unlike a standard DNN, which requires a previously computed solution of the PDE to train on input-output pairs, a PINN accounts for the physics of the problem by incorporating the governing physical law, along with its initial and boundary conditions, into the loss function. The PINN is then trained to minimize this loss. During training, the PINN employs automatic differentiation to compute the partial derivative terms with respect to space and time. PINNs are therefore a mesh-free approach [
24,
25,
26,
27].
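As a concrete illustration of how automatic differentiation supplies the partial-derivative terms of the residual, the following is a minimal sketch in DeepXDE, the library used later in this work; the ordering of the space and time input columns is an assumption for illustration:

```python
import deepxde as dde

def wave_residual(x, u, c=1.0):
    """Residual of the 1D linear wave equation u_tt - c^2 * u_xx = 0.

    x is the network input with x[:, 0:1] = space and x[:, 1:2] = time (assumed ordering);
    u is the network output approximating the wavefield.
    """
    u_xx = dde.grad.hessian(u, x, i=0, j=0)  # d^2u/dx^2 obtained via automatic differentiation
    u_tt = dde.grad.hessian(u, x, i=1, j=1)  # d^2u/dt^2 obtained via automatic differentiation
    return u_tt - c ** 2 * u_xx
```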
PINNs can overcome the CoD problem faced in traditional modeling methods by predicting the PDE solution without the need to construct detailed grids. A few differences between PINNs and traditional methods are highlighted here. Instead of using a mesh for spatiotemporal stepping, PINNs rely on irregularly sampled points from the defined domain via different sampling distributions [
28]. To approximate the PDE solution, PINNs use the nonlinear representation of the NN instead of the linear piecewise polynomials used in traditional methods. The parameters to be optimized in PINNs are the NN weights and biases, whereas in traditional methods, the optimization focus is on the point values on the formed grid. The PDE is embedded in the form of a loss function in PINNs instead of as an algebraic matrix (system) in traditional methods. In addition, gradient optimizers [
29] serve as the error minimizers in PINNs, in contrast to the linear solvers used in traditional methods.
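For concreteness, the composite loss that a PINN minimizes typically takes a weighted form such as the following (a generic formulation for the 1D wave equation; the exact loss used in this work is given later as Equation (8)):

```latex
\mathcal{L}(\theta) = w_{f}\,\mathcal{L}_{f}(\theta) + w_{ic}\,\mathcal{L}_{ic}(\theta) + w_{bc}\,\mathcal{L}_{bc}(\theta),
\qquad
\mathcal{L}_{f}(\theta) = \frac{1}{N_{f}} \sum_{i=1}^{N_{f}} \left| u_{tt}(x_{i}, t_{i}; \theta) - c^{2}\, u_{xx}(x_{i}, t_{i}; \theta) \right|^{2},
```

where the IC and BC terms are analogous mean-squared mismatches evaluated at the sampled initial and boundary points.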
Using the location of domain points as the training set, PINNs have the distinctive feature of not requiring a previously computed solution for the training process. However, like any other NN used for modeling forward problems, the training process can be a strenuous task for problems exhibiting high-frequency or multiscale features. Designing PINNs, although conceptually simple, requires significant trial and testing to determine the best PINN model for the PDE problem, especially because PINN models are highly problem-dependent [
28].
The wave equation has been modeled by PINNs previously in [
30], showing the possibility of accurately modeling the wave equation in 2D and in an inhomogeneous domain. However, that PINN model defines the initial state as a single-pulse source point. Hence, the model focused on the propagation of a single wave rather than a continuous or periodic time-dependent wave source. The wave equation was also solved with a PINN-based model to complement the limited available data in [
31], where, similar to the previous approach, the source was implemented as a perturbation captured in the initial condition. This implementation of the wave source is simpler than our focus on periodically generating waves from a time-dependent function. In ref. [
32], the wave equation was modeled using PINNs and compared to the solution of the Gaussian Process (GP). The focus of that work was mainly on exploring the accuracy and noise tolerance of the two approaches instead of the setup of the problem constraints.
PINN has also been used to solve the wave equation in the frequency domain [
33]; assuming the wave propagates in an infinite domain, that PINN-based model did not use boundary conditions, and therefore no particular attention was dedicated to establishing this condition. In ref. [
34], the effects of enlarging the PINN architecture and increasing the number of randomly selected training points on the loss value were discussed. In that work, an extension to the PINN architecture was implemented to correspond to a perfectly matched layer at the boundaries, which reportedly increases the cost of training. The PINN model studied there demonstrated reasonable predictions for the real and imaginary parts of the wavefield in different media. However, the solution was studied in the frequency domain, and no particular attention was dedicated to the implementation of the constraint statuses.
Although the previous literature on using PINNs to solve the wave equation reports several successful approaches to modeling wave propagation, there is still a lack of specialized studies on the best PINN IC and BC constraint implementations for the wave equation. Moreover, the available literature has yet to address implementations of a continuous (periodically generated) wave from a time-dependent point source function, as needed for some ultrasound therapeutic applications.
Since the initial and boundary conditions of a PDE problem can be implemented in PINNs as soft or hard constraints, the primary question we would like to answer in this research is: How can we achieve the most accurate prediction of the forward wave equation, given the options of soft or hard constraint implementations of the initial and boundary conditions (ICs and BCs)? In this work, we introduce a comprehensive comparison of different combinations of soft and hard constraints to implement ICs and BCs in a homogeneous domain. The wavefield model considered a single sinusoidal time-dependent function as the source point. Each PINN prediction was compared to the FDM solution. A series of experiments was performed to compare the performance of PINNs using different constraint statuses while applying the most suitable tested hyperparameters for each experiment. We then provide the average L2 relative error values to compare each case with its peer constraint combinations. To the best of our knowledge, we propose the first study on the differences between using soft and hard constraints to implement the ICs and BCs of the wave equation. In addition, instead of using the common PINN implementation of the source point as a perturbation in the initial state of the problem, we employ the boundary condition as a time-dependent point source function. Using the results of our work, we demonstrate the flexibility of using soft constraints, the forcing effect of hard constraints, their effects on the average error values, and the trade-offs of each.
For the remainder of this article,
Section 2 presents the significance of the wave equation and PINN design for the wave equation forward model along with the studied constraint statuses.
Section 3 exhibits the performance of using different constraint combinations when using PINNs and reveals the best constraint implementation.
Section 4 presents concluding remarks and future directions of this work.
3. Results and Discussion
A series of PINN models was tested to observe the prediction behavior under multiple PINN setups with different constraint statuses. The models were designed to assess the effect of implementing the initial and boundary conditions as soft constraints, hard constraints, or a combination of both. While experimenting with each PINN model, we monitored its influence on prediction accuracy. All modeling trials were performed on a machine with an NVIDIA RTX 3090 GPU running the Windows operating system. All the designed PINNs were implemented using the Python library DeepXDE [
28] with a TensorFlow backend. DeepXDE is a well-known library used for implementing PINNs. Several other tools and libraries can also be used to implement PINNs [
64,
65,
66]. An overview of each library and the differences between them is beyond the scope of this study.
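To make the workflow concrete, the following is a minimal DeepXDE assembly of the 1D forward wave problem with soft IC and BC constraints. The domain bounds, wave speed, source function, architecture, and point counts shown here are illustrative placeholders rather than the exact values used in this work:

```python
import numpy as np
import deepxde as dde

# Minimal DeepXDE skeleton for the 1D forward wave problem (illustrative values throughout).
c = 1.0
geom = dde.geometry.Interval(0.0, 1.0)
timedomain = dde.geometry.TimeDomain(0.0, 1.0)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

def pde(x, u):
    # Residual of u_tt - c^2 * u_xx = 0; x[:, 0:1] is space, x[:, 1:2] is time.
    u_xx = dde.grad.hessian(u, x, i=0, j=0)
    u_tt = dde.grad.hessian(u, x, i=1, j=1)
    return u_tt - c ** 2 * u_xx

def on_left(x, on_boundary):   # source boundary at x = 0 (assumed)
    return on_boundary and np.isclose(x[0], 0.0)

def on_right(x, on_boundary):  # far boundary at x = 1 (assumed)
    return on_boundary and np.isclose(x[0], 1.0)

# Zero initial displacement; the initial-velocity condition is omitted here for brevity.
ic = dde.icbc.IC(geomtime, lambda x: np.zeros((len(x), 1)), lambda _, on_initial: on_initial)
# Hypothetical time-dependent point source at the left boundary; homogeneous far boundary.
bc_left = dde.icbc.DirichletBC(geomtime, lambda x: np.sin(4 * np.pi * x[:, 1:2]), on_left)
bc_right = dde.icbc.DirichletBC(geomtime, lambda x: np.zeros((len(x), 1)), on_right)

data = dde.data.TimePDE(geomtime, pde, [ic, bc_left, bc_right],
                        num_domain=1600, num_boundary=160, num_initial=160)
net = dde.nn.FNN([2] + [64] * 5 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
losshistory, train_state = model.train(iterations=10000)
```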
Through the process of model testing, it was noticed that different model setups require a set of different hyperparameters to obtain the best prediction results for that particular model (i.e., the best hyperparameters for obtaining a good prediction while using hard initial and boundary constraints are different from the best hyperparameters to obtain a good prediction while using soft initial and boundary constraints). Therefore, to ensure a fair comparison between the error values, we performed trials of the best-tested hyperparameters of one constraint status combination in all other constraint combinations. This allowed us to observe the prediction accuracy of the set of hyperparameters that worked best for one of the four main cases in the remaining three constraint combinations. The word “trial” here refers to a single run of fully training a PINN model and using it for prediction. The error values reported in the tables in
Section 3 are the average values of 10 independent trials with the same randomization seed for each setup, discarding outlier values (unreasonable values that can occur due to initial parameter randomization). Differences in results were noticed even when using the same randomization seed with the DeepXDE library; hence, trials were repeated with the same randomization seed. This was done to mitigate the effect of this variability on the reproducibility of the results when attempting to replicate the experiments in this work.
Each PINN model prediction was compared to an FDM solution obtained previously and treated as the ground truth to measure the solution accuracy. The FDM solution has the same geometry and PDE parameters as those applied to the PINN problem. When performing our trials, we were looking for a model that produces the lowest L2 relative error value with an acceptably low mean residual error (MRE) value, which in turn reflects the PINN’s ability to predict the correct solution of the forward linear wave problem in one dimension.
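For reference, the following is a minimal sketch of an explicit (leapfrog) FDM solver of the kind used as ground truth, together with the L2 relative error computation; the geometry, wave speed, source function, and grid sizes are illustrative assumptions, not the exact setup of this work:

```python
import numpy as np

# Explicit leapfrog FDM for u_tt = c^2 * u_xx on [0, L] with a time-dependent point source
# at the left boundary; all numerical values here are illustrative assumptions.
c, L, T = 1.0, 1.0, 1.0
nx, nt = 201, 801
x = np.linspace(0.0, L, nx)
t = np.linspace(0.0, T, nt)
dx, dt = x[1] - x[0], t[1] - t[0]
C2 = (c * dt / dx) ** 2                      # squared Courant number; stability needs c*dt/dx <= 1

f = lambda tn: np.sin(4 * np.pi * tn)        # hypothetical source function at x = 0

u = np.zeros((nt, nx))                        # zero initial displacement and velocity
u[1, 0] = f(t[1])                             # first-order start consistent with the zero ICs
for n in range(1, nt - 1):
    u[n + 1, 1:-1] = (2 * u[n, 1:-1] - u[n - 1, 1:-1]
                      + C2 * (u[n, 2:] - 2 * u[n, 1:-1] + u[n, :-2]))
    u[n + 1, 0] = f(t[n + 1])                 # Dirichlet source at the left boundary
    u[n + 1, -1] = 0.0                        # homogeneous Dirichlet at the right boundary

# Given a PINN prediction u_pinn evaluated on the same (t, x) grid, the L2 relative error is:
# np.linalg.norm(u_pinn - u) / np.linalg.norm(u)
```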
In our first set of trials, we performed a series of model tests to determine the best hyperparameters for implementing hard initial and boundary constraints (hard–hard). The use of hard constraints for both the initial and boundary conditions removes the initial and boundary loss terms from the overall loss in Equation (8). Because we do not have any additional labeled data, the model loss equation depends solely on the physics loss term (
). In other words, Dirichlet initial and boundary conditions are strictly imposed on the NN prediction using the output transform function in Equation (12), which can be thought of as an additional layer appended to the approximator NN.
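A generic illustration of how such an output transform is attached to the network in DeepXDE is sketched below; the multiplicative form used here is a stand-in for homogeneous Dirichlet ICs and BCs on an assumed unit domain, not the exact Equation (12), which additionally switches the enforced value to the source function at the source boundary:

```python
import deepxde as dde

# Generic hard-constraint output transform (illustrative stand-in, not the exact Equation (12)).
# x[:, 0:1] is space on an assumed unit interval, x[:, 1:2] is time, and y is the raw output of
# the approximator NN. Multiplying by t forces u(x, 0) = 0, and multiplying by x * (1 - x)
# forces u = 0 at both boundaries; a boundary source term would be added on top of this.
def hard_ic_bc_transform(x, y):
    x_space, t = x[:, 0:1], x[:, 1:2]
    return t * x_space * (1.0 - x_space) * y

net = dde.nn.FNN([2] + [100] * 5 + [1], "tanh", "Glorot normal")
net.apply_output_transform(hard_ic_bc_transform)
```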
Setting up PINNs with hard constraints reduces the computational cost by reducing the number of loss terms whose values must be minimized [
67]. However, this affects the process of fine-tuning the PINN hyperparameters, requiring more testing per model trial to find hyperparameters that yield more precise predictions. As shown in
Table 1, the PINN hyperparameters used to obtain an MRE of 0.38 in the hard–hard constraints setup yield lower MRE values under the other IC and BC constraint statuses. This value is also offset by roughly 0.4 relative to the other cases using the same hyperparameters. However, the average L2 relative error value is the lowest
when using this particular set of hyperparameters, yielding the closest prediction to the FDM solution. As shown in
Figure 2a,f, the PINN prediction is more accurate at these time instances than at the middle-time instances. This shows the forcing effect of the output transform function in Equation (12), where the prediction output
is enforced to 0 when
or
, while it changes the enforced value to
whenever
. In the middle-time instances, the enforcement effect is affected by the outcome of the approximator NN
. To improve the approximator NN prediction outcome, multiple repeated trials with different hyperparameters are required. Thus far, the results shown are the best of our hyperparameter tests. The PINN model that achieved the best prediction in this case (hard–hard constraints) contained five hidden layers with 100 neurons in each. The learning rate for training was
, and the activation function was a non-adaptive
. This PINN architecture was trained using 50,000 Adam epochs, followed by 10,000 L-BFGS-B epochs. In every epoch, 1600 points were uniformly sampled from the domain and 160 points from the boundary (very close to the boundaries).
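A sketch of this training schedule, reusing geomtime, pde, and the hard-constraint transform from the earlier sketches, is given below; the learning rate (1e-3) and activation ("tanh") are placeholders for the values reported above, which are not reproduced here:

```python
import deepxde as dde

# Hard-hard training schedule as reported above: 5 hidden layers of 100 neurons, 50,000 Adam
# epochs followed by 10,000 L-BFGS-B epochs, with 1600 domain and 160 boundary points.
# geomtime, pde, and hard_ic_bc_transform are reused from the sketches above; the learning
# rate and activation below are placeholders rather than reported values.
data = dde.data.TimePDE(geomtime, pde, [], num_domain=1600, num_boundary=160)  # no soft IC/BC losses
net = dde.nn.FNN([2] + [100] * 5 + [1], "tanh", "Glorot normal")
net.apply_output_transform(hard_ic_bc_transform)   # hard IC and BC enforcement

model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=50000)

dde.optimizers.config.set_LBFGS_options(maxiter=10000)
model.compile("L-BFGS-B")
model.train()
```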
One of the simplest ways to implement IC and BC in PINNs is to treat them as soft model constraints, usually represented by the loss term
in the overall loss function [
67]. The use of soft–soft IC and BC constraints in modeling the wave equation is reflected in the results shown in the second row of
Table 2. After conducting a series of experiments, the best-performing set of hyperparameters produced a prediction with an MRE of
and an L2 relative error of
. These average error values were consistent across the trials, as indicated by the average predicted solution in
Figure 3. No output transform function was used in these trials, which may have influenced the shape of the curve in the initial time solution prediction. The prediction output in
Figure 3a is not rigidly enforced, as is the case with hard–hard IC and BC constraints, but is instead approximated to best meet the soft IC, starting from the Glorot normal initialization [
68] of the NN trainable parameters in the approximator. During the testing trials, the set of hyperparameters used to train the soft–soft constraint model performed well, despite minor fluctuations in the solution prediction values at the initial time. However, when the same set of hyperparameters was used with the other combinations of IC and BC constraints, as shown in the remaining rows of
Table 2, their predictions were not as accurate. This confirms that applying soft–soft constraints in modeling the wave equation with PINNs requires a different set of hyperparameters than those needed to achieve the best prediction results with other constraint statuses. Additionally, fine-tuning the loss weights in the loss function was easier, given the clear impact of changes on the convergence of the training error. The PINN architecture that achieved this accurate prediction with soft–soft constraints consisted of six hidden NN layers, with the first two layers consisting of 128 neurons, and the remaining four layers consisting of 64 neurons. The PINN was trained with a learning rate of
, which decreased over 2000 iterations out of the total 10,000 Adam training epochs. The activation function used was an adaptive
with a slope factor of five. This was followed by 10,000 additional L-BFGS-B training epochs. The physics, initial, and left-boundary loss terms were assigned higher weights in the loss function to balance the minimization across all the loss terms contributing to the overall loss equation.
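A sketch of this soft–soft configuration is given below, reusing geomtime, pde, ic, bc_left, and bc_right from the earlier problem-assembly sketch; the base activation wrapped by the adaptive function, the learning rate, the decay rate, and the loss-weight values are assumptions, since only the slope factor, layer sizes, and epoch counts are reported above:

```python
import deepxde as dde

# Soft-soft configuration as described above: hidden layers (128, 128, 64, 64, 64, 64),
# adaptive activation with slope factor 5, learning-rate decay over 2000 of the 10,000 Adam
# epochs, then 10,000 L-BFGS-B epochs, with higher weights on the physics, initial, and
# left-boundary loss terms. Numeric values flagged below are assumptions.
data = dde.data.TimePDE(geomtime, pde, [ic, bc_left, bc_right],
                        num_domain=1600, num_boundary=160, num_initial=160)
# "LAAF-5 tanh" requests DeepXDE's layer-wise adaptive activation with scale 5 (base tanh assumed).
net = dde.nn.FNN([2, 128, 128, 64, 64, 64, 64, 1], "LAAF-5 tanh", "Glorot normal")

model = dde.Model(data, net)
model.compile("adam", lr=1e-3,                      # learning-rate value is an assumption
              decay=("inverse time", 2000, 0.5),    # decay rate is an assumption
              loss_weights=[10, 10, 10, 1])         # order: [PDE, IC, BC_left, BC_right]; values assumed
model.train(iterations=10000)

dde.optimizers.config.set_LBFGS_options(maxiter=10000)
model.compile("L-BFGS-B")
model.train()
```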
We also conducted trials with different combinations of initial and boundary conditions (ICs and BCs) in the same PINN model, using either hard ICs and soft BCs or vice versa, as shown in
Table 3 and
Table 4. In these two sets of trials, the output transformation functions (Equations (13) and (14)) were applied to the NN predictions in the hard–soft and soft–hard cases, respectively. Equation (13) was used to enforce a hard initial constraint in the hard–soft model, while Equation (14) was applied to the soft–hard model to impose hard boundary constraints. It is important to note that the main difference between Equations (12) and (14) is the presence of
t in the first term of the equation. The presence of
t in Equations (12) and (13) is used to control the ICs. For example, when
t equals zero, the network output is suppressed and the prediction is forced to zero.
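As a generic illustration of this mechanism (not the exact Equations (12)–(14) of this work), multiplying the raw network output by t guarantees the zero initial state, while a spatial factor vanishing at the boundaries would analogously enforce homogeneous BCs:

```latex
\hat{u}(x, t) = t \,\mathcal{N}(x, t; \theta)
\;\Longrightarrow\; \hat{u}(x, 0) = 0 \;\; \text{for all } x,
\qquad
\hat{u}(x, t) = x\,(L - x)\,\mathcal{N}(x, t; \theta)
\;\Longrightarrow\; \hat{u}(0, t) = \hat{u}(L, t) = 0 .
```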
In
Table 3, the average error values of using the best-tested set of hyperparameters in the hard IC constraint and soft BC (hard–soft) constraint model are shown. The trials show a stable average MRE of
and L2 relative error of
. To obtain these error values, the PINN model was constructed from seven hidden NN layers, where the first two layers had 256 neurons each and the remaining five layers had 64 neurons each. The learning rate was set to
, and an inverse time decay was applied over the 10,000 Adam training epochs. The activation function used was an adaptive
with a slope factor of two. This was followed by 10,000 L-BFGS-B training epochs. Training was performed over 1600 domain points and 160 boundary points, uniformly sampled. We observed that assigning higher weights to the physics (
) and left boundary conditions in the total loss equation produced better prediction values. We assume a potential reason for this occurrence is that learning a function requires more emphasis than learning a constant value in PINNs. The same set of hyperparameters performed very poorly in terms of the MRE for the hard–hard constraints model, as shown in the first row in
Table 3. While these are the best-tested hyperparameters for this case, applying them to the soft–soft constraints model yields lower error values, translating to better performance. The solution prediction results for this trial set are summarized in
Figure 4. Using hard constraints for the initial time in
Figure 4a enforces a value of zero for all predictions at that time instance. However, the prediction results at the remaining time instances in
Figure 4b–f become less accurate than in the soft–soft case. This occurrence is possibly the result of omitting the initial condition loss term from the loss function and instead relying on enforcing it through the output transform function. This leaves the approximator NN training process dependent on minimizing the physics (
) and BC loss (
) values alone instead of considering the IC loss value as well.
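A sketch of this hard–soft combination, reusing geomtime, pde, bc_left, and bc_right from the earlier sketches, is given below; the factor t in the output transform is a generic stand-in for Equation (13), and the learning rate, base activation, decay rate, and loss-weight values are assumptions:

```python
import deepxde as dde

# Hard IC / soft BC (hard-soft) combination as described above: 7 hidden layers
# (256, 256, 64, 64, 64, 64, 64), adaptive activation with slope factor 2, inverse-time decay
# over the 10,000 Adam epochs, then 10,000 L-BFGS-B epochs. Multiplying the output by t is a
# generic stand-in for the hard zero IC of Equation (13); the BCs remain soft loss terms.
data = dde.data.TimePDE(geomtime, pde, [bc_left, bc_right],
                        num_domain=1600, num_boundary=160)
net = dde.nn.FNN([2, 256, 256, 64, 64, 64, 64, 64, 1], "LAAF-2 tanh", "Glorot normal")
net.apply_output_transform(lambda x, y: x[:, 1:2] * y)   # t * N(x, t) enforces u(x, 0) = 0

model = dde.Model(data, net)
model.compile("adam", lr=1e-3,                           # learning-rate value is an assumption
              decay=("inverse time", 10000, 0.5),        # decay rate is an assumption
              loss_weights=[10, 10, 1])                  # emphasize [PDE, BC_left]; values assumed
model.train(iterations=10000)
model.compile("L-BFGS-B")
model.train()
```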
In
Table 4, a trial set was executed to assess the prediction accuracy of the soft–hard IC and BC constraint PINN model using the best-tested hyperparameters for this case. As shown in the fourth row of
Table 4, the best-tested hyperparameters for this case yielded an average MRE value of 0.018 and an L2 relative error of
over the training and prediction trials performed. The PINN setup used for this case is composed of five hidden NN layers containing 64 neurons each. The learning rate used was
, and the activation function was a non-adaptive
. The PINN was trained through 12,000 Adam epochs followed by 10,000 L-BFGS-B epochs. Maintaining the same set of hyperparameters while changing the IC and BC constraint status yields the error results summarized in
Table 4. Although these PINN hyperparameters produce reasonable results for modeling the wave equation with soft–hard IC and BC constraints, they yield a degraded MRE value with hard IC and BC constraints and a poor L2 relative error value when applied to a soft–soft IC and BC constraint setup. Nevertheless, the best-tested hyperparameters for the soft–hard constraints perform better when switched to hard–soft constraints. The plots in
Figure 5 display the average prediction results for the soft–hard constraints model relative to the FDM solution. The enforcing effect of applying hard BC constraints is clearly visible in the last time instance in
Figure 5f. The small-amplitude fluctuations shown in
Figure 5a reflect the PINN prediction of the soft IC in the initial time instance. The average solution prediction of the middle time instances in
Figure 5b–e is the outcome of training PINNs with the best-tested set of hyperparameters for the soft–hard constraints combination. At its best-tested performance, this case does not show a better average prediction than the soft–soft constraints model.
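For completeness, a sketch of this soft–hard combination, reusing geomtime, pde, and ic from the earlier sketches, is given below; the spatial factor in the output transform is a generic homogeneous-Dirichlet stand-in for Equation (14) on an assumed unit interval (the boundary source would be added on top), and the learning rate and activation are placeholders:

```python
import deepxde as dde

# Soft IC / hard BC (soft-hard) combination as described above: 5 hidden layers of 64 neurons,
# 12,000 Adam epochs followed by 10,000 L-BFGS-B epochs. The factor x * (1 - x) is a generic
# stand-in for the hard boundary enforcement of Equation (14); the IC remains a soft loss term.
data = dde.data.TimePDE(geomtime, pde, [ic],
                        num_domain=1600, num_boundary=160, num_initial=160)
net = dde.nn.FNN([2] + [64] * 5 + [1], "tanh", "Glorot normal")  # lr/activation are placeholders
net.apply_output_transform(lambda x, y: x[:, 0:1] * (1.0 - x[:, 0:1]) * y)

model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=12000)
model.compile("L-BFGS-B")
model.train()
```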
The four IC and BC constraint cases studied in this work reveal important behaviors when modeling the one-dimensional forward wave problem. Using soft constraints allows flexibility in composing the governing ICs and BCs in the domain of interest while still enabling control over the loss weights, as the physics, initial, and boundary loss terms are each defined independently in the total loss equation. The trial sets show that the soft–soft constraints model achieved the lowest L2 relative error value. Choosing the optimal hyperparameters for PINNs is highly dependent on the problem design and the available computational resources; the problem design here includes the constraint status chosen to implement the ICs and BCs of the system.
As for the time performance of each of the constraint statuses,
Table 5 shows the training and prediction times for a single PINN model trial of each of the four studied constraint combinations with their most suitable sets of hyperparameters. Regardless of the constraint status chosen during PINN training, the prediction times remain consistent at approximately
s. However, the training time for a PINN model with hard–soft constraints is the longest, while that for a PINN model with soft–hard constraints is the shortest. This can be explained by the approximator NN size, along with its suitable set of hyperparameters, used to achieve the results reported in
Section 3. The training times do not necessarily reflect better performance of one combination over another because of factors such as the size of the PINN; the number of neurons and layers alone can play a major role in increasing or decreasing the training time. The differences among the most suitable sets of hyperparameters further explain the training time differences among the studied cases. For solving the 1D wave equation problem,
Table 5 shows the FDM solution time of
s, which is the fastest among the prediction times. This is particular to the 1D case; however, in higher-dimensional problems, the FDM is expected to require exponentially more time, given its mesh-based nature. This is not the case for PINN models, whose prediction time is not reported to suffer from such an increase with dimensionality [
31].