Abstract
Partial differential equations are common models in biology for predicting and explaining complex behaviors. Nevertheless, deriving the equations and estimating the corresponding parameters from data remains challenging. In particular, a fine description of the interactions between species requires care in order to take into account various regimes, such as saturation effects. We apply a method based on neural networks to discover the underlying PDE systems, which may involve fractional terms as well as integration terms, from observed data. Our proposed framework, called Frac-PDE-Net, adapts PDE-Net 2.0 by adding layers that are designed to learn fractional and integration terms. The key technical challenge of this task is the identifiability issue. More precisely, one needs to identify the main terms and combine similar terms among a huge number of candidates in fractional form generated by the neural network scheme due to the division operation. In order to overcome this barrier, we set up certain assumptions according to realistic biological behavior. Additionally, we use an $L^2$-norm based term selection criterion and sparse regression to obtain a parsimonious model. It turns out that Frac-PDE-Net is capable of recovering the main terms with accurate coefficients, allowing for effective long-term prediction. We demonstrate the interest of the method on a biological PDE model proposed to study the pollen tube growth problem.
1. Introduction
Two-component reaction–diffusion systems often model the interaction of two chemicals, leading to the formation of non-uniform spatial patterns of chemical concentration, or morphogenesis, under certain conditions due to chemical reactions and spreading. Since Turing’s groundbreaking work [], reaction–diffusion systems have been extensively used in developmental biology modeling. For example, let u and v represent the concentrations of two chemical species, which may either enhance or suppress each other depending on the context. The system of u and v can be modeled as follows:
$$ \begin{cases} u_t = d_1\,\Delta u + f(u,v),\\[2pt] v_t = d_2\,\Delta v + g(u,v), \end{cases} \qquad (1) $$
where $\Delta$ denotes the Laplacian operator, $d_1$ and $d_2$ are positive diffusion coefficients, and $f(u,v)$ and $g(u,v)$ are the interactions between u and v. The functions f and g are sums of various reaction terms that can be derived from physical or chemical principles such as mass-action laws, Michaelis–Menten kinetics, or products that represent competition or cooperation effects. We refer the readers to ([], Section 2.2) for more discussion. Hence, f and g are sums of meaningful functions that represent specific mechanisms: if we are able to identify these terms and discover the explicit formulas for f and g, then we can learn more about the nature of the interactions and predict future behaviors well. This situation arises commonly in biological applications such as chemotaxis, pattern formation in developmental biology, and the cell polarity phenomenon [,].
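To make the setting concrete, the following minimal sketch simulates a generic two-component system of the form (1) with forward Euler time stepping and a five-point Laplacian under zero-flux (Neumann) boundaries. The kinetics f and g below are toy placeholders chosen for illustration, not the reaction terms studied in this paper.

```python
import numpy as np

def laplacian(w, dx):
    """5-point Laplacian with zero-flux (Neumann) boundaries via edge padding."""
    wp = np.pad(w, 1, mode="edge")
    return (wp[:-2, 1:-1] + wp[2:, 1:-1] + wp[1:-1, :-2] + wp[1:-1, 2:] - 4 * w) / dx**2

def step(u, v, f, g, d1, d2, dx, dt):
    """One forward-Euler step of u_t = d1*Lap(u) + f(u,v), v_t = d2*Lap(v) + g(u,v)."""
    u_new = u + dt * (d1 * laplacian(u, dx) + f(u, v))
    v_new = v + dt * (d2 * laplacian(v, dx) + g(u, v))
    return u_new, v_new

# Toy kinetics (placeholders, not the kinetics of the paper):
f = lambda u, v: u - u**3 - v
g = lambda u, v: 0.1 * (u - v)
u = np.random.rand(64, 64); v = np.random.rand(64, 64)
for _ in range(1000):
    u, v = step(u, v, f, g, d1=1e-3, d2=5e-3, dx=1.0, dt=0.01)
```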
Cell polarity plays a vital role in cell growth and function for many cell types, affecting cell migration, proliferation, and differentiation. A classic example of polar growth is pollen tube growth, which is controlled by the Rho GTPase (ROP1) molecular switch. Recent studies have revealed that the localization of active ROP1 is regulated by both positive and negative feedback loops, and calcium ions play a role in ROP1’s negative feedback mechanism. Initially, ROP1 resides in the cell interior. During positive feedback, some of the ROP1 is recruited to the membrane. At the same time, negative feedback causes some of it to return to the cell interior, while the rest diffuses on the membrane. Calcium ions follow a similar process, with their own positive-feedback, negative-feedback, and diffusion rates. In [,], the following 2D reaction–diffusion system (2) is introduced:
with suitable initial and boundary conditions being proposed to quantitatively describe such spatial and temporal connection between ROP1 and calcium ions, leading to rapid oscillations in their distributions on the cell membrane. Here, subscripts such as $u_t$, $v_t$, $u_{xx}$, and $v_{xx}$ are abbreviated notations for partial derivatives with respect to the time t or the spatial variable x. Moreover, the non-linear negative-feedback function characterizes how calcium ions play a role in ROP1’s negative feedback loop. Specifically, active ROP1 causes an increase in calcium levels, leading to a reduction in ROP1 activity and a decrease in its levels. Meanwhile, the flow of calcium slows down as ROP1 drops. Ref. [] proposed an explicit form of this function, involving a positive constant, to describe such spatial–temporal patterns of calcium. Based on this model, ref. [] developed a modified gradient matching procedure for parameter estimation. However, it requires that the nonlinear negative-feedback function in (2) is known. In this work, we propose to apply neural network methods to uncover this function or, more broadly, to learn the interaction terms f and g in general reaction–diffusion PDEs (1), which may contain fractional expressions (Figure 1).

Figure 1.
ROP1 and calcium polarization dynamics. Left: ROP1 dynamics; Right: calcium dynamics.
In the past decade, the artificial intelligence community has focused increasingly on neural networks, which have become crucial in many applications, especially for PDEs. Deep learning-based approaches to PDEs have made substantial progress and are well-studied, both for forward and inverse problems. For forward problems with appropriate initial and boundary conditions in various domains, several methods have been developed to accurately predict dynamics (e.g., [,,,,,,,,,,]). For inverse problems, there are two classes of approaches. The first class focuses on inferring coefficients from known data (e.g., [,,,,,]). An example of this is the widely known PINN (Physics-informed Neural Networks) method [], which uses PDEs in the loss function of neural networks to incorporate scientific knowledge. Ref. [] improved the efficiency of PINNs with the residual-based adaptive refinement (RAR) method and created a library of open-source codes for solving various PDEs, including those with complex geometry. However, this method is only capable of estimating coefficients for fixed known terms in PDEs, and may not work well for discovering hidden PDE models. Although [] extended the PINN method to find unknown dynamic systems, the nonlinear learner function remains a black box, and no explicit expressions of the discovered terms in the predicted PDE are available, making it difficult to interpret their physical meaning. The second class of approaches not only estimates coefficients, but also discovers hidden terms (e.g., [,,,,,,,,]). An example is the PDE-Net method [], which combines numerical approximations of convolutional differential operators with symbolic neural networks. PDE-Net can learn differential operators through convolution kernels, a natural method for solving PDEs that has been well-studied in []. This approach is capable of recovering terms in PDE models with explicit expressions and relatively accurate coefficients, but often produces many noisy terms that lack interpretation. In order to produce parsimonious models, refs. [,] proposed to create a regression model with the response variable $u_t$ and a matrix $\Theta$ whose columns collect spatial and polynomial derivative functions (e.g., $u$, $u_x$, $uu_x$): $u_t = \Theta\,\xi$. The estimation of differential equations by modeling the time variations of the solution is known to produce consistent estimates []. In addition, Ridge regression with hard thresholding can be used to approximate the coefficient vector $\xi$. This sparse regression-based method generally results in a PDE model with accurately predicted terms and high-accuracy coefficients. However, few existing studies have focused on effectively recovering interaction terms in fractional form (say, one polynomial term divided by another polynomial term) in hidden partial differential equations, which is the focus of this paper.
Previous methods for identifying the hidden terms in reaction–diffusion partial differential equation models have mostly focused on polynomial forms. However, as indicated in Equation (2), the model for ROP1 and calcium ion distribution also involves fractional and integral forms, which can pose identifiability issues when combined with polynomial forms. Furthermore, we want to attain a parsimonious model, as the interpretability of the PDE model is important for biologists to comprehend biological behavior and phenomena revealed by the model.
In this paper, we utilize a combination of a modified PDE-Net method (which adds fractional and integration terms to the original PDE-Net approach), an $L^2$-norm based term selection criterion, and an appropriate sparse regression. This combination proves to produce meaningful and stable terms with accurate estimation of coefficients. For ease of reference to this combination, we call it Frac-PDE-Net.
The paper is organized as follows. In Section 2, we explain the main idea and the framework of our proposed method Frac-PDE-Net. In Section 3, we apply Frac-PDE-Net to discover some biological PDE models based on simulation data. Then, in Section 4, we make some predictions to test the effectiveness of the models learned in Section 3. Finally, we summarize our findings and present some possible future works in Section 5.
2. Methodology
The main idea of the PDE-Net method, as described in [], is to use a deep convolutional neural network (CNN) to study generic nonlinear evolution partial differential equations (PDEs) as shown below:
$$ u_t = F\big(u, \nabla u, \nabla^2 u, \ldots\big), \qquad (3) $$
where u is a function (scalar-valued or vector-valued) of the space variable x and the temporal variable t. Its architecture is a feed-forward network that combines the forward Euler method in time with the second-order finite difference method in space, through the implementation of special filters in the CNN that imitate differential operators. The network is trained to approximate the solution to the above PDEs, and then it is used to make predictions for the subsequent time steps. The authors of [] show that this approach is effective for solving a range of PDEs and can achieve satisfactory accuracy and computational efficiency compared to traditional numerical methods. In this paper, we follow a similar framework to PDE-Net, but with modifications to the symbolic network framework (SymNet) to better align with biological models.
2.1. PDE-Net Review
The feed-forward network consists of several δt-blocks, all of which use the same parameters optimized by minimizing a loss function. For simplicity, we will only show one δt-block for two-dimensional PDEs, as repeating it generates multiple δt-blocks, and the concept extends easily to higher-dimensional PDEs.
Denote the space variable in (3) to be $(x, y)$ since we are dealing with the two-dimensional case. Let $\tilde u(t_0,\cdot) = u(t_0,\cdot)$ be the given initial data. For $i \ge 0$, $\tilde u(t_{i+1},\cdot)$ denotes the predicted value of u at time $t_{i+1}$ calculated from the predicted (true, when $i = 0$) value $\tilde u(t_i,\cdot)$ at time $t_i$ using the following procedure:
$$ \tilde u(t_{i+1},\cdot) = \tilde u(t_i,\cdot) + \delta t \cdot \mathrm{SymNet}\big(D_{00}\tilde u,\; D_{10}\tilde u,\; D_{01}\tilde u,\; D_{20}\tilde u,\; D_{11}\tilde u,\; D_{02}\tilde u,\; \ldots\big), $$
where SymNet is an approximation operator of F. Here, the operators $D_{ij}$ are convolution operators with the underlying filters $q_{ij}$, i.e., $D_{ij}u = q_{ij} \circledast u$. These operators approximate differential operators:
$$ D_{ij}\, u \approx \frac{\partial^{\,i+j} u}{\partial x^{i}\,\partial y^{j}}. $$
For a general filter q of size $N \times N$, where $N = 2k+1$, the action of q on u is given by the discrete convolution
$$ (q \circledast u)(x,y) = \sum_{k_1,k_2=-k}^{k} q[k_1,k_2]\; u(x + k_1\delta x,\; y + k_2\delta y). \qquad (4) $$
By Taylor expansion,
$$ (q \circledast u)(x,y) = \sum_{i,j=0}^{N-1} M(q)_{i,j}\;\delta x^{\,i}\,\delta y^{\,j}\; \frac{\partial^{\,i+j} u}{\partial x^{i}\,\partial y^{j}}(x,y) + \text{higher-order terms}, \qquad (5) $$
where the moment matrix $M(q)$ is defined by
$$ M(q)_{i,j} = \frac{1}{i!\,j!} \sum_{k_1,k_2=-k}^{k} k_1^{\,i}\, k_2^{\,j}\, q[k_1,k_2], \qquad 0 \le i,j \le N-1. $$
In particular, if choosing $N = 3$, then $M(q)$ is the $3 \times 3$ matrix $\big(M(q)_{i,j}\big)_{0 \le i,j \le 2}$.
As a result, the training of q can be performed through the training of $M(q)$, since the linear map between a filter q and its moment matrix $M(q)$ is invertible. It is important to note that the trainable moment matrices M (or, equivalently, the filters q) must be carefully constrained to match differential operators.
For example, to approximate $\partial u/\partial x$ by $\frac{1}{\delta x}(q \circledast u)$, or equivalently $\delta x\,\partial u/\partial x$ by $q \circledast u$, for a $3 \times 3$ filter q, we may choose
$$ M(q) = \begin{pmatrix} 0 & 0 & \ast \\ 1 & \ast & \ast \\ \ast & \ast & \ast \end{pmatrix} \qquad\text{or}\qquad M(q) = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & \ast \\ 0 & \ast & \ast \end{pmatrix}, \qquad (6) $$
where ∗ means no constraint on the corresponding entry. Generally, the fewer instances of ∗ present, the more restrictions are imposed, leading to increased accuracy. In this example of (6), the first choice of $M(q)$ ensures 1st-order accuracy and the second choice guarantees 2nd-order accuracy. More precisely, if we plug the first constraint into (5) with $N = 3$, then $q \circledast u = \delta x\, u_x + O(\delta^2)$ (here $\delta = \max\{\delta x, \delta y\}$), which implies $\frac{1}{\delta x}(q \circledast u) = u_x + O(\delta)$. Similarly, if we plug the second constraint into (5), then $\frac{1}{\delta x}(q \circledast u) = u_x + O(\delta^2)$. In PDE-Net 2.0, all moment matrices are trained subject to partial constraints so that the accuracy is at least 2nd order.
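As a quick numerical check of this machinery (a sketch assuming numpy; the indexing convention follows (4) and (5)), the standard central-difference filter for $\partial/\partial x$ indeed satisfies the second-order constraint in (6):

```python
import numpy as np
from math import factorial

def moment_matrix(q):
    """Moment matrix M(q)_{ij} = sum_{k1,k2} k1^i k2^j q[k1,k2] / (i! j!), cf. (5)."""
    n = q.shape[0]; k = (n - 1) // 2
    idx = np.arange(-k, k + 1)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = (idx[:, None]**i * idx[None, :]**j * q).sum() / (factorial(i) * factorial(j))
    return M

# Central-difference filter for d/dx (rows index the x offset): q*u ~ dx * u_x
q = np.zeros((3, 3)); q[2, 1], q[0, 1] = 0.5, -0.5
print(moment_matrix(q))  # M[1,0]=1, all other low-order moments 0 -> 2nd-order pattern in (6)
```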
The SymNet network, modeled after CNNs, is employed to approximate the multivariate nonlinear response function F. It takes an m-dimensional vector as input and consists of k hidden layers. As depicted in Figure 2, the network has two hidden layers, where each unit performs a dyadic multiplication whose output is appended, together with the layer’s input, to the next hidden layer.

Figure 2.
The scheme of one SymNet.
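The following toy sketch illustrates the SymNet mechanics described above: each hidden layer forms one product of two learned linear combinations of the current features and appends it to the feature vector. It is a simplified stand-in (the actual SymNet layers also carry biases and are trained jointly with the filters), intended only to show how polynomial terms emerge from stacked dyadic multiplications.

```python
import numpy as np

class SymNetSketch:
    """Toy SymNet: layer i maps the current m+i features to two scalars whose
    product is appended; a final linear readout combines all features."""
    def __init__(self, m, k, rng=np.random.default_rng(0)):
        self.W = [rng.normal(0, 0.1, (2, m + i)) for i in range(k)]
        self.w_out = rng.normal(0, 0.1, m + k)

    def forward(self, x):            # x: feature vector (u, v, u_x, ...)
        for W in self.W:
            a, b = W @ x             # two linear combinations of current features
            x = np.append(x, a * b)  # dyadic multiplication unit
        return self.w_out @ x

net = SymNetSketch(m=4, k=2)
print(net.forward(np.array([1.0, 0.5, -0.2, 0.3])))
```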
The loss function for this method has three components and is defined as follows:
$$ L = L_{\mathrm{data}} + \lambda_1\, L_{\mathrm{moment}} + \lambda_2\, L_{\mathrm{SymNet}}. \qquad (7) $$
Here, $L_{\mathrm{data}}$ measures the difference between the true data and the prediction. Consider the data set $\{u_j(t_i,\cdot) : 0 \le i \le n,\ 1 \le j \le N\}$, where n is the number of δt-blocks, N is the total number of samples, and $N_g$ is the number of space grids. The index j indicates the jth solution path with a certain initial condition of the unknown dynamics, and the index i represents the solution at time $t_i$. Then, we define
$$ L_{\mathrm{data}} = \frac{1}{nN} \sum_{j=1}^{N} \sum_{i=1}^{n} l_{ij}. $$
Here, $l_{ij} = \frac{1}{N_g}\,\big\| \tilde u_j(t_i,\cdot) - u_j(t_i,\cdot) \big\|_2^2$, where $u_j(t_i,\cdot)$ represents the real data and $\tilde u_j(t_i,\cdot)$ denotes the predicted data. For a given threshold s, recall Huber’s loss function defined as
$$ \ell_s(x) = \begin{cases} \dfrac{x^2}{2s}, & |x| \le s, \\[4pt] |x| - \dfrac{s}{2}, & |x| > s. \end{cases} \qquad (8) $$
We then define the following:
$$ L_{\mathrm{moment}} = \sum_{i,j} \sum_{k_1,k_2} \ell_s\big( M(q_{ij})[k_1,k_2] \big), $$
where the $q_{ij}$s are filters, $M(q_{ij})$ is the moment matrix of $q_{ij}$, and the sum runs over the unconstrained entries. Using the same Huber loss function as in (8), we define
$$ L_{\mathrm{SymNet}} = \sum_{p} \ell_s(w_p), $$
where the $w_p$s are the parameters in SymNet. The coefficients $\lambda_1$ and $\lambda_2$ in Equation (7) serve as regularization weights that help control the magnitude of the parameters, preventing them from becoming too large and overfitting the training data.
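A schematic version of the objective (7)–(8) can be written as follows; the arrays and the threshold s = 0.01 are illustrative assumptions, and in the actual framework the gradient of this scalar is back-propagated through the δt-blocks.

```python
import numpy as np

def huber(x, s):
    """l1-like Huber penalty in (8): quadratic near 0, linear in the tails."""
    ax = np.abs(x)
    return np.where(ax <= s, ax**2 / (2 * s), ax - s / 2)

def total_loss(pred, data, sym_params, moment_entries, lam1, lam2, s=0.01):
    """L = L_data + lam1 * L_moment + lam2 * L_SymNet, as in (7)."""
    l_data = np.mean((pred - data) ** 2)                          # data misfit
    l_moment = sum(huber(m, s).sum() for m in moment_entries)     # free moment entries
    l_sym = huber(sym_params, s).sum()                            # SymNet weights
    return l_data + lam1 * l_moment + lam2 * l_sym
```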
2.2. mPDE-Net (Modified PDE-Net)
In mPDE-Net, we do not include multiplications between derivatives of u and v, as these interactions are not commonly present in biological phenomena. Additionally, to handle interactions in fractional or integral forms, such as those in Equation (2), mPDE-Net incorporates integral terms and division operations into SymNet. However, there was a challenge with identifiability in mPDE-Net. For instance, consider a two-component input vector (u, v). mPDE-Net may produce results such as $\frac{uv}{v+\epsilon}$ or $\frac{u(v+\epsilon_1)}{v+\epsilon_2}$, where $\epsilon$, $\epsilon_1$, and $\epsilon_2$ are small numbers due to noise. Although both of these terms essentially represent the same term u, mPDE-Net is unable to automatically identify them as such. Keeping all similar terms, such as $\frac{uv}{v+\epsilon}$, $\frac{u(v+\epsilon_1)}{v+\epsilon_2}$, and u, at the same time would result in a complex model, and the real fractional term would not be effectively trained.
To address the identifiability issue, restrictions were imposed on the nonlinear interaction term by assuming that it takes the form $g \cdot h$, where either g or h is linear and the other one may contain a fractional term with the order of the denominator larger than that of the numerator. For instance, the terms $\frac{uv}{v+\epsilon}$ and $\frac{u(v+\epsilon_1)}{v+\epsilon_2}$ are further decomposed as follows:
$$ \frac{uv}{v+\epsilon} = u - \frac{\epsilon\, u}{v+\epsilon}, \qquad \frac{u(v+\epsilon_1)}{v+\epsilon_2} = u + \frac{(\epsilon_1 - \epsilon_2)\, u}{v+\epsilon_2}. $$
As seen, the main part of the above two terms is u, while the remainders, such as $\frac{\epsilon u}{v+\epsilon}$ and $\frac{(\epsilon_1-\epsilon_2)u}{v+\epsilon_2}$, are considered perturbations, since $\epsilon$ and $\epsilon_1 - \epsilon_2$ are very small. This allows mPDE-Net to identify and combine the main parts of terms, resulting in a compact model.
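This main-term/perturbation split is exactly a partial-fraction decomposition, which can be checked symbolically. The snippet below uses sympy (our illustration tool here, not part of the original pipeline) to recover the decomposition of $uv/(v+\epsilon)$:

```python
import sympy as sp

u, v = sp.symbols("u v", positive=True)
eps = sp.Rational(1, 100)            # a small epsilon, as produced by noisy training

term = u * v / (v + eps)             # learned fraction, essentially the term u
main_plus_rest = sp.apart(term, v)   # partial fractions in v: u - eps*u/(v + eps)
print(main_plus_rest)
```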
Figure 3 presents an example of a system involving the derivatives of u and v up to the second order. The symbolic neural network in this example has five hidden layers. The operators in the earlier hidden layers are multiplication units, i.e., $(a, b) \mapsto a \cdot b$, and those in the later hidden layers are division units, i.e., $(a, b) \mapsto a/b$. Additionally, a power unit is included to incorporate fractional powers, such as the power term in (2). The algorithm corresponding to this example is outlined in Algorithm 1.
Algorithm 1 Scheme of mPDE-Net.
Input: the vector I of candidate terms, collecting u, v, and their spatial derivatives up to the second order. Output: the learned symbolic approximations of the two interaction terms.

Figure 3.
The scheme of mPDE-Net.
To further demonstrate the mPDE-Net approach, we present a concrete example. To simplify the notation, we introduce the row vector $e_i$ with a 1 in the ith component and 0 in all other components, i.e.,
$$ e_i = (0, \ldots, 0, 1, 0, \ldots, 0), $$
where the number “1” is in the ith position. Then, we set the network parameters as follows:
According to Algorithm 1,
Therefore,
Let $\mathcal{L}_{\mathrm{PDE}}$ denote the library for PDE-Net 2.0 and $\mathcal{L}_{\mathrm{mPDE}}$ denote the library for mPDE-Net. It is clear that $\mathcal{L}_{\mathrm{PDE}}$ and $\mathcal{L}_{\mathrm{mPDE}}$ are distinct. Typically, $\mathcal{L}_{\mathrm{PDE}}$ only seeks to identify multiplication terms and has the form:
where
Conversely, $\mathcal{L}_{\mathrm{mPDE}}$ is engineered to learn both multiplication terms and fractional terms, subject to certain constraints. In our paper, we make a particular choice of $\mathcal{L}_{\mathrm{mPDE}}$,
which is much larger than $\mathcal{L}_{\mathrm{PDE}}$. Therefore, our framework of neural networks, built upon $\mathcal{L}_{\mathrm{mPDE}}$, is more challenging to implement than the original framework, which is based on $\mathcal{L}_{\mathrm{PDE}}$.
2.3. Optimizing Hyperparameters
In this section, we explain the process of tuning the hyperparameters $\lambda_1$ and $\lambda_2$ in the loss function (7). Firstly, the ranges of the spatial and temporal variables in the training set are denoted by $\Omega$ and $[0, T]$, respectively. Then, using the finite difference method, we generate a dataset that acts as the “true data”. Additionally, we consider M initial conditions. The observation times are determined by two step sizes: $\delta t$, the time step size for computing the “true data”, and $\Delta t$, the time step size for selecting the “observational data”. Typically, $\delta t$ is chosen to be much smaller than $\Delta t$. The solution corresponding to the mth initial condition is denoted as $u_m(\cdot\,,\cdot)$, where the first “·” refers to the spatial variable and the second “·” represents the temporal variable. If the solution is evaluated at the kth time step, it is written as $u_m(\cdot\,, t_k)$, with “·” representing the spatial variable.
The M initial values from the M initial conditions are divided into three separate groups, resulting in $M = M_{\mathrm{train}} + M_{\mathrm{val}} + M_{\mathrm{test}}$, where $M_{\mathrm{train}}$, $M_{\mathrm{val}}$, and $M_{\mathrm{test}}$ represent the sizes of the training set, validation set, and test set, respectively. The solutions produced by these initial values are designated as follows:
Training set: $\{u_m : 1 \le m \le M_{\mathrm{train}}\}$;
Validation set: $\{u_m : M_{\mathrm{train}} < m \le M_{\mathrm{train}} + M_{\mathrm{val}}\}$;
Testing set: $\{u_m : M_{\mathrm{train}} + M_{\mathrm{val}} < m \le M\}$.
We use the training set to train our models, the validation set to find the best parameters, and the testing set to evaluate the performance of the trained models.
Assume we divide the time range into K blocks, with cutting points denoted as $T_k$ for $1 \le k \le K$. Then, for any m and for any k, we define
$$ e_m(k) = \big\| \tilde u_m(\cdot\,, T_k) - u_m(\cdot\,, T_k) \big\|_{L^2(\Omega)}, $$
where $\|\cdot\|_{L^2(\Omega)}$ denotes the $L^2$ norm with respect to the space variable on $\Omega$, $u_m$ is the “true solution”, and $\tilde u_m$ is the “predicted solution” produced by a neural network. Based on this, the training loss, validation loss and testing loss are defined as follows:
- Training loss: $\mathrm{Loss}_{\mathrm{train}} = \frac{1}{M_{\mathrm{train}}\,K}\sum_{m=1}^{M_{\mathrm{train}}}\sum_{k=1}^{K} e_m(k)$;
- Validation loss: $\mathrm{Loss}_{\mathrm{val}} = \frac{1}{M_{\mathrm{val}}\,K}\sum_{m=M_{\mathrm{train}}+1}^{M_{\mathrm{train}}+M_{\mathrm{val}}}\sum_{k=1}^{K} e_m(k)$; (10)
- Testing loss: $\mathrm{Loss}_{\mathrm{test}} = \frac{1}{M_{\mathrm{test}}\,K}\sum_{m=M_{\mathrm{train}}+M_{\mathrm{val}}+1}^{M}\sum_{k=1}^{K} e_m(k)$.
A computational sketch of these block-wise losses is given below.
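The following minimal sketch computes the block-wise errors $e_m(k)$ and averages them over a split, assuming the solutions are stored as arrays sampled at the cutting points $T_k$:

```python
import numpy as np

def block_errors(u_true, u_pred, dx):
    """e_m(k) = || u_pred(., T_k) - u_true(., T_k) ||_{L2}, one value per time block.
    u_true, u_pred: arrays of shape (K, N_grid) sampled at the cutting points T_k."""
    return np.sqrt(np.sum((u_pred - u_true) ** 2, axis=1) * dx)

def set_loss(paths_true, paths_pred, dx):
    """Average the block errors over all solution paths in a split (train/val/test)."""
    return np.mean([block_errors(t, p, dx).mean()
                    for t, p in zip(paths_true, paths_pred)])
```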
We choose the hyperparameters $\lambda_1$ and $\lambda_2$ in the loss function (7) using the validation sets. We gradually increase the time points of the training and validation sets. For instance, if K = 15, the training and validation sets can be selected as in the table below. The performance metric is the same as the validation loss in (10).
Training | Validation | Validation Loss
Furthermore, we tune the hyperparameters using Hyperopt [], which uses Bayesian optimization to explore the hyperparameter space more efficiently than a brute-force grid search. Specifically, mPDE-Net is nested inside the objective function of Hyperopt, which optimizes the average validation loss of the models.
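Concretely, a Hyperopt loop of this kind can be set up as below; train_and_validate is a hypothetical stand-in for training mPDE-Net and returning the average validation loss (10), and the log-uniform search ranges are illustrative assumptions:

```python
from hyperopt import fmin, tpe, hp, Trials

def train_and_validate(lam1, lam2):
    # Hypothetical stand-in: train mPDE-Net with (lam1, lam2) and return the
    # average validation loss (10); replace with the real training loop.
    return (lam1 - 1e-3) ** 2 + (lam2 - 1e-2) ** 2

def objective(params):
    return train_and_validate(params["lam1"], params["lam2"])

space = {"lam1": hp.loguniform("lam1", -12, 0),   # search lambda_1 on a log scale
         "lam2": hp.loguniform("lam2", -12, 0)}   # search lambda_2 on a log scale
trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```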
The selection procedure is described in Algorithm 2.
Algorithm 2 Optimizing Hyperparameters using Hyperopt
2.4. Frac-PDE-Net
We have noted that mPDE-Net accurately fits the data and recovers terms, but it may not always simplify the learned PDE, making the result challenging to interpret. To address this, we implement sparsity-encouraging methods such as the Lasso approach. However, even with Lasso and hyperparameters chosen from the validation sets, the predicted equation still contains redundant terms. This is likely due to correlated data and linear dependencies in the data, which prevent Lasso from fully shrinking the extra coefficients to zero. To overcome this, we employ two approaches. The first, the $L^2$-norm based term selection criterion, weakens or eliminates linear dependencies in the data. The second, sequential threshold ridge regression (STRidge), creates concise models through hard thresholding. We discuss these approaches in more detail below.
- $L^2$-norm based term selection criterion. Consider the underlying PDE in the form of
$$ u_t = \Theta\,\xi, $$
where the columns of $\Theta$ collect the candidate (library) terms evaluated on the data and $\xi$ is the coefficient vector. To address the issue of excessive terms in the learned PDE, we apply the $L^2$-norm based term selection criterion. This involves normalizing the columns of $\Theta$ to obtain
$$ u_t = \hat\Theta\,\hat\xi, \qquad \hat\Theta_{\cdot j} = \Theta_{\cdot j} / \|\Theta_{\cdot j}\|_2, \quad \hat\xi_j = \|\Theta_{\cdot j}\|_2\,\xi_j. $$
By removing the terms in $\hat\Theta$ whose adjusted coefficients $\hat\xi_j$ are significantly smaller than the largest one, we shorten the vector $\hat\xi$ to $\tilde\xi$. The corresponding columns in $\hat\Theta$ form a new matrix $\tilde\Theta$ with reduced linear dependency between its columns. This results in a simplified approximation of the PDE:
$$ u_t \approx \tilde\Theta\,\tilde\xi. \qquad (13) $$
- Sparse regression: STRidge. After using the $L^2$-norm based term selection criterion to select terms, as discussed previously, we move on to sparse regression to further improve the compactness of the representation of the hidden PDE model (13). Here, a tolerance threshold “tol” is introduced to select coefficients for sparse results: coefficients smaller than “tol” are discarded, and the remaining ones are re-fitted, until the number of terms stabilizes. The sparse regression process is outlined in Algorithm 3, and a minimal sketch of both steps is given after this list. For further information, see [].
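The sketch below illustrates both post-processing steps under simplifying assumptions (a dense least-squares fit for the pre-selection and a plain ridge solve inside STRidge); keep_ratio and the regularization weight are illustrative values, not those used in the experiments.

```python
import numpy as np

def select_terms(Theta, ut, keep_ratio=0.05):
    """L2-norm based pre-selection: fit on column-normalized Theta and keep only
    terms whose adjusted coefficients are not far smaller than the largest one."""
    norms = np.linalg.norm(Theta, axis=0)
    xi_adj = np.linalg.lstsq(Theta / norms, ut, rcond=None)[0]
    return np.abs(xi_adj) >= keep_ratio * np.abs(xi_adj).max()   # boolean mask

def stridge(Theta, ut, lam, tol, iters=10):
    """Sequential threshold ridge regression (cf. Rudy et al.): alternate a ridge
    solve with hard thresholding of small coefficients until the support is stable."""
    n = Theta.shape[1]
    xi = np.linalg.solve(Theta.T @ Theta + lam * np.eye(n), Theta.T @ ut)
    for _ in range(iters):
        small = np.abs(xi) < tol
        xi[small] = 0.0
        big = ~small
        if big.sum() == 0:
            break
        sub = Theta[:, big]          # re-fit on the surviving columns only
        xi[big] = np.linalg.solve(sub.T @ sub + lam * np.eye(big.sum()), sub.T @ ut)
    return xi
```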
To summarize, the mPDE-Net approach allows us to achieve relatively accurate predictions for the unknown function and its derivatives. We then employ the $L^2$-norm based term selection criterion and sparse regression to obtain a concise model; we refer to the complete procedure as Frac-PDE-Net. Algorithm 4 summarizes this procedure.
Algorithm 3: STRidge($\Theta$, $u_t$, $\lambda$, tol, iters)
Algorithm 4: $L^2$-norm selection criterion + STRidge($\Theta$, $u_t$, $\lambda$, tol, iters)
2.5. Kolmogorov–Smirnov Test
After applying the Frac-PDE-Net procedure, a simplified, interpretable model has been created. Our next goal is to determine whether this model can be further compressed. We designate Model 1 as the system learned by Frac-PDE-Net, and Model 2 as the system obtained by removing from Model 1 the interaction term with the smallest $L^2$ norm. To determine whether Model 1 and Model 2 produce solutions from the same distribution, we use the Kolmogorov–Smirnov test (K-S test).
Since our examples involve systems of two PDEs, a two-dimensional K-S test is appropriate. The time range is discretized with a fixed time step size, giving time grids denoted as $t_i$ for $1 \le i \le n_t$. At a fixed time $t_i$, we aim to test the proximity of the two samples associated with Model 1 and Model 2, respectively, at time $t_i$. For each i, we specify:
Hypothesis 1
(Null). The two samples generated by Model 1 and Model 2 at time $t_i$ come from a common distribution.
Hypothesis 2
(Alternative). The two samples generated by Model 1 and Model 2 at time $t_i$ do not come from a common distribution.
Let $H_{0,i}$ and $p_i$ denote the null hypotheses and the corresponding p-values, respectively, for $1 \le i \le n_t$. In this paper, we employed the Bonferroni [], Holm [] and Benjamini–Hochberg (B-H) [] methods for multiple testing adjustment. Note that the Bonferroni method is the most conservative among these three methods. Under the complete null hypothesis of a common distribution across all time points, no more than a small fraction (the chosen significance level) of the total time points should be rejected.
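A simplified version of this testing pipeline is sketched below. Note one deliberate simplification: instead of a genuine two-dimensional K-S statistic, the snippet flattens the spatial samples and applies scipy’s one-dimensional two-sample test at each time point, followed by the three multiple-testing adjustments.

```python
import numpy as np
from scipy.stats import ks_2samp
from statsmodels.stats.multitest import multipletests

def compare_models(sol1, sol2, alpha=0.05):
    """Per-time-point two-sample K-S tests between Model 1 and Model 2 solutions
    (flattened over space as a 1-D surrogate), with three adjustments.
    sol1, sol2: arrays of shape (n_times, n_grid)."""
    pvals = np.array([ks_2samp(a.ravel(), b.ravel()).pvalue
                      for a, b in zip(sol1, sol2)])
    rej_bonf = pvals < alpha / len(pvals)                       # Bonferroni
    rej_holm = multipletests(pvals, alpha, method="holm")[0]    # Holm
    rej_bh = multipletests(pvals, alpha, method="fdr_bh")[0]    # Benjamini-Hochberg
    return rej_bonf.mean(), rej_holm.mean(), rej_bh.mean()      # rejection rates
```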
3. Numerical Studies: Convection-Diffusion Equations with the Neumann Boundary Condition
In this section, we showcase numerical examples to demonstrate the efficacy of our proposed method, Frac-PDE-Net. The training, validation, and testing data are generated based on the underlying governing equation. Our aim is to use Frac-PDE-Net on these data to obtain a concise and interpretable model for the PDE. The governing PDEs under consideration in this paper are of the following form:
$$ \begin{cases} u_t = d_1\,\Delta u + f(u,v), \\[2pt] v_t = d_2\,\Delta v + g(u,v), \end{cases} \qquad (14) $$
where
$$ f = p_1 + r_1, \qquad g = p_2 + r_2. \qquad (15) $$
Here, $d_1$ and $d_2$ are positive diffusion coefficients, $r_1$ and $r_2$ represent fractional functions of $(u, v)$, and $p_1$ and $p_2$ denote combinations of power functions and integration operators of u and v through addition and multiplication. For example, $p_1$ can be $u^{1.5}\int_\Omega u\,dx$, and $r_1$ can be $\frac{u}{1+v^2}$.
3.1. Example 1: A 2-Dimensional Model
Our first example is taken from Equation (2.8) in Section 2.2 of []. In this example, we consider (14) under the Neumann boundary condition on a 2-dimensional domain, with the diffusion coefficients and interaction terms specified below.
Thus, Equation (14) is reduced to
with
The observations are generated with Equations (16) and (17), and then split into training data, validation data and testing data. The PDE is solved by applying a finite difference scheme on a spatial mesh grid, using the central difference scheme for the Laplacians in space and a second-order Runge–Kutta temporal discretization (see []) with a fine time step size $\delta t$.
In addition, the observations are obtained from various initial values: this introduces extra variability into the datasets, which is necessary if we want to be able to generalize well to arbitrary initial conditions. We assume that we have 12 different solutions, coming from 12 different initial values. The initial functions are random, defined through random parameters, some following the standard normal distribution N(0, 1) and others following uniform distributions. Then, we generate the 12 initial values by setting
where
For any given initial data, we denote the corresponding solution by (u, v). When noise is allowed, we assume the perturbed data to be
$$ \hat u = u + \epsilon\,\sigma_u\,W_1, \qquad \hat v = v + \epsilon\,\sigma_v\,W_2, $$
where $\epsilon$ is the level of Gaussian noise added, $W_1$ and $W_2$ are random variables that follow the standard normal distribution, $W_i \sim N(0,1)$ for $i = 1, 2$, and $\sigma_u$ (or $\sigma_v$, resp.) is the standard deviation of the true data u (or v, resp.).
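The perturbation above (and its absolute-value variant used later in Example 2) amounts to the following one-liner, with the noise level and the random seed as illustrative assumptions:

```python
import numpy as np

def add_noise(u, eps, rng=np.random.default_rng(0), clip_sign=False):
    """Perturb data with Gaussian noise scaled by the standard deviation of the
    true field; clip_sign=True applies the absolute value used in Example 2."""
    noisy = u + eps * u.std() * rng.standard_normal(u.shape)
    return np.abs(noisy) if clip_sign else noisy
```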
Since the time ranges from 0 to 0.15, there are 15 time blocks, and we denote $T_k = 0.01\,k$ for $1 \le k \le 15$. For the spatial variables, we have $N_g$ space grid points. Therefore, the dataset is
$$ \big\{ \big(u_m(\cdot\,, T_k),\; v_m(\cdot\,, T_k)\big) : 1 \le m \le 12,\; 1 \le k \le 15 \big\}, $$
where both $u_m(\cdot\,, T_k)$ and $v_m(\cdot\,, T_k)$ are recorded as matrices on the spatial grid. Table 1 and Table 2 show a summary of the parameters for Frac-PDE-Net.

Table 1.
Fixed parameters for Frac-PDE-Net.

Table 2.
Hyper-parameters selected by validation procedure Section 2.3 for Frac-PDE-Net.
Our goal is to discover the terms f and g on the right-hand side of (16), whose true expressions are given by (17). For convenience of notation, we denote by $\hat f$ and $\hat g$ our predicted operators for f and g. Based on some existing models (see, e.g., Section 2.2 in []), we adopt some assumptions before discovering $\hat f$ and $\hat g$. More precisely, we assume that
$$ \hat f = p_1 + r_1, \qquad \hat g = p_2 + r_2, $$
where the diffusion coefficients are positive constants, $p_1$ and $p_2$ are polynomials of u and v up to order 2, and both the fractional terms $r_1$ and $r_2$ are of the form $r$ or $l \cdot r$, where l means a linear function and r denotes a fractional function in which the numerator is linear and the denominator is quadratic.
Based on these assumptions, we consider the following library for training our model.
The filters q (as defined in (4)) are selected to be of a fixed size. The total number of parameters in SymNet (as defined in Algorithm 1) for approximating $\hat f$ and $\hat g$ is 56, and the number of trainable parameters in the moment matrices M (as defined in (6)) is 52. To optimize the parameters, we use the BFGS algorithm instead of the Adam or SGD optimizers, since the BFGS algorithm is faster and also stable.
In the following, we outline the notation used and summarize the key steps of our framework.
- 1.
- mPDE-Net denotes the result of applying the modified PDE-Net to our model.
- 2.
- Next, we utilize the $L^2$-norm based selection criterion and sparse regression on mPDE-Net to obtain a more concise and interpretable model, referred to as mPDE-Net-s. The “s” in mPDE-Net-s represents the application of sparse regression.
- 3.
- Subsequently, we fix the terms in mPDE-Net-s and retrain its coefficients to produce a final model named mPDE-Net-sr. This is the end result of our Frac-PDE-Net scheme. The “r” in mPDE-Net-sr signifies the process of retraining the coefficients.
- 4.
- Finally, to verify that no further terms can be eliminated after Frac-PDE-Net, we compare two models: Model 1, generated by Frac-PDE-Net; and Model 2, which is identical to Model 1 but removes the term with the smallest $L^2$ norm from $\hat f$ and $\hat g$. The coefficients in Model 2 are retrained, and the resulting model is referred to as mPDE-Net-PH. “PH” in mPDE-Net-PH represents the post-hoc selection in Model 2. The comparison between Model 1 and Model 2 is conducted using the Kolmogorov–Smirnov test as outlined in Section 2.5.
For this case, we added 5% noise to the generated data to form the observational data. The results are displayed in Table 3, which shows that mPDE-Net (the modified PDE-Net framework) accurately identifies the terms in Example 1 and estimates their corresponding coefficients. However, it also produces unnecessary terms with low weights after training. By applying the $L^2$-norm based selection and sparse regression (+SP), we successfully remove these extra terms in mPDE-Net-s. After the terms in $\hat f$ and $\hat g$ are identified, we retrain the model with these fixed terms to obtain the final coefficients in mPDE-Net-sr.

Table 3.
PDE model discovery with 5% noise.
To test whether Model 1 (mPDE-Net-sr) and Model 2 (mPDE-Net-PH) are similar or not, we compare their predictions obtained with the finite difference scheme. The prediction time range is discretized with a fixed time step, giving 50 time grids, denoted as $t_i$ for $1 \le i \le 50$. Fixing a time $t_i$, we introduce the residuals $R_1(t_i)$ and $R_2(t_i)$, obtained by subtracting from the true solution the predicted solutions based on Model 1 and Model 2, respectively, at time $t_i$. We test whether the residuals $R_1(t_i)$ and $R_2(t_i)$ have similar distributions. The null hypothesis is that they come from a common distribution, and the alternative hypothesis is that they do not. Applying the Bonferroni method, the Holm method and the B-H procedure for multiple testing adjustment, as discussed in Section 2.5, the test results are presented in Table 4.

Table 4.
Hypothesis tests with observation noise.
The results in Table 4 show that Model 1 (Frac-PDE-Net) is significantly different from Model 2, meaning all terms in Model 1 should be kept. Hence, the final discovered terms for and are represented by Model 1 (Frac-PDE-Net) in Table 4.
To assess the stability of the results shown above, we repeated the experiments 100 times, and the results are presented in Figure 4 and Figure 5. The process of merging similar terms is outlined in Appendix A.1. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare: the median of these terms is 0, indicating that they appear infrequently.

Figure 4.
Simulation results for true positive discovery with 5% noise.

Figure 5.
Simulation results for false positive discovery with 5% noise.
3.2. Example 2: A 1-Dimensional Model
Our second example is taken from []. In this example, we consider (14) under the Neumann boundary condition on a one-dimensional domain.
The training data, validation data and testing data are generated, based on (20), by applying a finite difference scheme on a 600-point spatial mesh grid, then restricted to a 200-point spatial mesh grid, using the central difference scheme for the second-order spatial derivatives and an implicit Euler temporal discretization with a time step size of 0.01. Furthermore, we evaluate 14 different initial values, 10 of which were selected from a set of solutions with periodic patterns. The remaining initial values were generated by combining elementary functions. The reason for using different ways to produce initial values is to test whether this method still works for periodic solutions.
We also add noise to the generated data in the following form:
$$ \hat u = \big| u + \epsilon\,\sigma_u\,W_1 \big|, \qquad \hat v = \big| v + \epsilon\,\sigma_v\,W_2 \big|, $$
where $\epsilon$ is the level of Gaussian noise added, $W_1$ and $W_2$ are random variables that follow the standard normal distribution, $W_i \sim N(0,1)$ for $i = 1, 2$, and $\sigma_u$ (or $\sigma_v$, resp.) is the standard deviation of u (or v, resp.). The reason for imposing the absolute value sign is to avoid negative values, which may cause trouble when evaluating power functions with non-integer exponents, such as $u^\alpha$ with non-integer $\alpha$.
We choose 15 blocks for the time interval and denote the cutting points by $T_k$ for $1 \le k \le 15$. For the spatial variable, we have $N_g = 200$ space grid points. Therefore, the dataset is
$$ \big\{ \big(u_m(\cdot\,, T_k),\; v_m(\cdot\,, T_k)\big) : 1 \le m \le 14,\; 1 \le k \le 15 \big\}, $$
where both $u_m(\cdot\,, T_k)$ and $v_m(\cdot\,, T_k)$ are recorded as matrices on the spatial grid. Table 5 and Table 6 show a summary of the parameters for Frac-PDE-Net.

Table 5.
Fixed parameters for Frac-PDE-Net.

Table 6.
Hyper-parameters selected for Frac-PDE-Net by the validation procedure as in Section 2.3.
In [], some assumptions are made on the model based on existing experimental knowledge of the biological behavior. For example, it is assumed that one interaction operator is linear in both u and v, while the other is nonlinear in both u and v, following the form in (15).
In [], the nonlinear dependence on u is via the combination of a power function $u^\alpha$ and the integration operator $\int_\Omega u\,dx$, where the exponent $\alpha$ is further restricted to a prescribed range. On the other hand, the other component is assumed to be linear in u but nonlinear in v, and the nonlinear dependence on v is via a fractional function whose denominator is a quadratic polynomial. Thanks to these a priori constraints, we consider one library for the power–integral part and another library for the fractional part, the latter being constrained so that its denominator remains positive.
The filters q are of a fixed size. The total number of parameters for approximating $\hat f$ and $\hat g$ is 29, and the number of trainable parameters in the moment matrices M is 32. To optimize the parameters, we again use the BFGS algorithm.
For this case, we added 1% noise to the generated data to form the observational data. The results are displayed in Table 7, in which the notations are consistent with those in Table 3.

Table 7.
PDE model discovery with 1% noise level.
Similar to the post hoc selection procedure performed in Example 1, we also need to compare Model 1 (mPDE-Net-sr) and Model 2 (mPDE-Net-PH) and determine whether they differ significantly. The prediction time range is discretized with a fixed time step, giving 200 time grids, denoted as $t_i$ for $1 \le i \le 200$. At each time $t_i$, we introduce the residuals $R_1(t_i)$ and $R_2(t_i)$, associated with Model 1 and Model 2, respectively, and we would test whether these residuals have similar distributions. However, analogous to the previous case, we see from Table 7 that the coefficient in front of the diffusion term in Model 2 is a negative number, -0.026, which leads to rapid concentration rather than a diffusion effect. With this being said, Model 2 is essentially different from Model 1, and the distributions of $R_1$ and $R_2$ are totally different.
To assess the stability of the results shown above, we repeated the experiments 100 times and the results are presented in Figure 6 and Figure 7. The plots show that there are some instances where the three methods fail to eliminate certain redundant terms. However, these instances are rare, as the median of these terms is 0, indicating that they appear infrequently.

Figure 6.
Simulation results with 1% noise. (a) True positive discovery. (b) True positive discovery. (c) False positive discovery.

Figure 7.
Simulation results with 1% noise: true positive discovery.
4. Prediction
4.1. Example 1: The 2-Dimensional Model
In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 1 by performing predictions with the following non-typical initial values $u_0$ and $v_0$:
We use the finite difference method, with spatial step sizes $\delta x$ and $\delta y$ and time step size $\delta t$, to generate the “true data” in the forward direction using the known coefficients and terms in (16) and (17). We then simulate the data using the trained model from Table 3 up to t = 0.5.
In Figure 8, both the true solution and the predicted solution of the model trained by Frac-PDE-Net are plotted at several time instances. One can see from Figure 8 that the predicted solution is very close to the true one.

Figure 8.
The first (second, resp.) row shows the true dynamics of u (v, resp.) at several times. The third (fourth, resp.) row shows the predicted dynamics of u (v, resp.) with 5% noise level using Frac-PDE-Net.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 8, while the predicted solutions are displayed in Figure 9. Although PDE-Net 2.0 only utilizes polynomials, the predicted images still have a similar shape to the true ones. To further evaluate the performance, the prediction errors are analyzed quantitatively using two norms on the space domain, as seen in Table 9. The results show that Frac-PDE-Net has smaller errors compared to PDE-Net 2.0, highlighting its advantage.

Table 8.
PDE model discovered by PDE-Net 2.0.

Figure 9.
Images of the predicted dynamics using PDE-Net 2.0 with 5% noise level.

Table 9.
Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
4.2. Example 2: The One-Dimensional Model
In this section, we validate the robustness of the model discovered by Frac-PDE-Net in Example 2 in Section 3.2 by performing predictions with the following periodic initial values $u_0$ and $v_0$:
We use the finite difference method, with spatial step size $\delta x$ and time step size $\delta t$, to generate the “true” data in the forward direction using the known coefficients and terms in (20) and (21). We then simulate the data using the trained model from Table 7 over the same time period. In Figure 10, both the true solution and the predicted solution of the model trained by Frac-PDE-Net are plotted. One can see from Figure 10 that the predicted solution is very close to the true one.

Figure 10.
The first row shows the true dynamics. The second row presents the predicted dynamics with 1% noise level by Frac-PDE-Net.
The results of the comparison between Frac-PDE-Net and PDE-Net 2.0 are presented in both graphical and quantitative form. The model discovered by PDE-Net 2.0 is shown in Table 10, while the predicted solutions are displayed in Figure 11. We can clearly see that the predicted images by PDE-Net 2.0 are far from satisfactory compared to the true ones in Figure 10. To further evaluate the performance, the prediction errors are analyzed quantitatively using two norms on the space-time region in Table 11. The results show that Frac-PDE-Net has much smaller errors compared to PDE-Net 2.0, highlighting its advantage.

Table 10.
PDE model discovered by PDE-Net 2.0.

Figure 11.
Images of the predicted dynamics using PDE-Net 2.0 with 1% noise level.

Table 11.
Errors of predicted solutions for u and v by Frac-PDE-Net and PDE-Net 2.0.
5. Conclusions
Our approach, Frac-PDE-Net, builds on the symbolic approach developed in PDE-Net for addressing the discovery of realistic and interpretable PDEs from data. While the neural network remains very efficient for generating and learning dictionaries of functions, typically polynomials, we have shown that if we enrich the dictionaries with large (typically uncountable) families of functions, extra care is needed for selecting the important terms, by penalization and by evaluating and testing the impact of a reaction term on the predicted solution. Quite remarkably, we can extract a sparse equation with readable terms and with good estimates of the associated parameters.
The introduction of rich families of functions, such as fractions (rational functions), is often necessary: not only are they widely used by modelers, but they can also avoid the limitations of the approximation capacity of polynomials. Indeed, a polynomial expansion might need numerous terms in order to approximate the unknown reaction terms correctly. As a matter of fact, we have introduced a very flexible family of fractions that avoids truncated power expansions. While we then learn the numerator and denominator coefficients of the fractions, our approach is incorporated seamlessly into the symbolic differentiable neural network framework of PDE-Net through the introduction of extra layers.
Our work is originally motivated by the discovery and estimation of reaction–diffusion PDEs with possibly complex terms, such as fractions, non-integer powers, or non-local terms (such as an integral), as introduced for the pollen tube growth problem []. Nevertheless, our selection approach could be used to handle other dictionaries, or in the presence of advection terms, as our methodology exploits the reaction–diffusion structure only for imposing some constraints on the dictionaries of interest, and because of the interpretability of each term in that case. As a next step, the Frac-PDE-Net methodology can be improved by considering more advanced numerical schemes for the time discretization, say implicit Euler or second-order Runge–Kutta; in that case, we expect better accuracy and stability for model recovery and prediction. Another possible improvement would be to enrich the dictionaries of fractions by allowing rational functions with denominators that depend on both u and v. Finally, we emphasize that Frac-PDE-Net reaches a trade-off: it discovers the main terms of the PDE and accurately estimates each coefficient in order to gain interpretability, while also allowing effective long-term prediction, even for unseen initial conditions.
Author Contributions
Conceptualization, N.J-B.B., X.C.; methodology, S.C., X.Y., N.J-B.B., X.C.; software, S.C., X.Y.; validation, S.C., X.Y., N.J-B.B., X.C.; formal analysis, S.C., X.Y., N.J-B.B., X.C.; writing—original draft preparation, S.C., X.Y., N.J-B.B., X.C.; writing—review and editing, S.C., X.Y., N.J-B.B., X.C.; supervision, N.J-B.B., X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by United States Department of Agriculture (USDA) National Institute of Food and Agriculture (NIFA) Hatch Project AES-CE award (CA-R-STA-7132-H) and NSF DMS 1853698.
Institutional Review Board Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank all anonymous reviewers for their constructive comments and suggestions.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Appendix A.1. Term Combination after Simulation
During the simulations, if only the addition and multiplication operators are involved, then combining terms is not an issue, as the program can easily identify identical terms and add their coefficients together. However, combining similar terms can be difficult when fractional terms are present. To address this issue, we classify the simulation results into various groups before combining them.
As an example, we consider the scenario where the nonlinear term takes the product form $g \cdot h$, and one of the following two structures is assumed.
- (i)
- g is linear and h is a fractional function whose denominator is a second order polynomial:
- (ii)
- h is linear and g is a fractional function whose denominator is a second order polynomial:
Therefore, the outcomes have 32 possibilities if we only classify terms and signs:
- Numerator (4 possibilities): u, v, uv, 1.
- Denominator (2 possibilities): quadratic function in u or v.
- Signs (4 possibilities): the sign of each of the two denominator coefficients, $b_1$ and $b_2$, can be either positive or negative.
There are now 32 groups. In each of them, all members share the same main terms and the same signs in the denominator, while the coefficients are allowed to differ. For example, in the group with the form
$$ \frac{c\,u}{1 + b_1 v + b_2 v^2}, \qquad b_1 > 0,\ b_2 > 0, $$
all members share the same term u in the numerator, the same terms $v^2$ and $v$ in the denominator, and the same signs of $b_1$ and $b_2$, while the specific values of $c$, $b_1$, and $b_2$ may vary.
Based on the above groups, we adopt the following general principle. If two terms live in distinct groups, then they are considered different and will not be combined. If two terms live in the same group, then we further quantify how close their denominator coefficients (say $(b_1, b_2)$ and $(b_1', b_2')$) are. If these coefficients are close enough, then we regard them as the “same” term and combine them by adding their numerator coefficients (say c and c') together. So, the next question is how to quantify the distance between two members of the same group with possibly different coefficients.
We illustrate the criterion by studying the specific form $\frac{c\,u}{1 + b_1 v + b_2 v^2}$. More precisely, suppose there are two terms $T_1$ and $T_2$ as below,
$$ T_1 = \frac{c\,u}{1 + b_1 v + b_2 v^2}, \qquad T_2 = \frac{c'\,u}{1 + b_1' v + b_2' v^2}, $$
then we define their distance to be
$$ d(T_1, T_2) = \max\!\left\{ \frac{|b_1 - b_1'|}{\min\{|b_1|, |b_1'|\}},\; \frac{|b_2 - b_2'|}{\min\{|b_2|, |b_2'|\}} \right\}. $$
According to this concept, we combine $T_1$ and $T_2$ together if and only if $d(T_1, T_2) < \tau$ for a prescribed threshold $\tau$, that is, when the relative difference between the coefficients is less than $\tau$. In such a case, we add the coefficients c and c' to obtain
$$ T_1 + T_2 \approx \frac{(c + c')\,u}{1 + \bar b_1 v + \bar b_2 v^2}, $$
where $\bar b_1$ and $\bar b_2$ denote the combined (e.g., averaged) denominator coefficients. A small computational sketch of this merging step is given below.
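Under the assumptions above, the grouping-and-merging rule can be sketched as follows; the threshold τ = 0.1 and the averaging of the denominator coefficients are illustrative choices consistent with the description, not necessarily the exact values used in our experiments.

```python
import numpy as np

def distance(b, b_prime, eps=1e-12):
    """Relative distance between denominator coefficient pairs (b1, b2), (b1', b2')."""
    b, b_prime = np.asarray(b, float), np.asarray(b_prime, float)
    scale = np.minimum(np.abs(b), np.abs(b_prime)) + eps   # guard against zeros
    return float(np.max(np.abs(b - b_prime) / scale))

def merge_terms(terms, tau=0.1):
    """Greedily merge terms c*u/(1 + b1*v + b2*v^2) within one group.

    terms: list of (c, b1, b2); terms with close denominators are combined by
    summing the numerator coefficients and averaging the denominator ones."""
    merged = []
    for c, b1, b2 in terms:
        for k, (C, B1, B2) in enumerate(merged):
            if distance((b1, b2), (B1, B2)) < tau:
                merged[k] = (C + c, (B1 + b1) / 2, (B2 + b2) / 2)
                break
        else:  # no close member found: start a new representative
            merged.append((c, b1, b2))
    return merged

# Two nearly identical denominators are merged; the third term stays separate.
print(merge_terms([(0.9, 1.00, 2.00), (0.2, 1.02, 1.98), (-0.5, 4.0, 0.3)]))
```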
References
- Turing, A.M. The Chemical Basis of Morphogenesis. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 1952, 237, 37–72. [Google Scholar]
- Murray, J.D. Mathematical Biology, II, 3rd ed.; Interdisciplinary Applied Mathematics; Springer: New York, NY, USA, 2003; Volume 18, p. xxvi+811. [Google Scholar]
- Mori, Y.; Jilkine, A.; Edelstein-Keshet, L. Wave-pinning and cell polarity from a bistable reaction-diffusion system. Biophys. J. 2008, 94, 3684–3697. [Google Scholar] [CrossRef]
- Mogilner, A.; Allard, J.; Wollman, R. Cell polarity: Quantitative modeling as a tool in cell biology. Science 2012, 336, 175–179. [Google Scholar] [CrossRef]
- Tian, C. Parameter Estimation Procedure of Reaction Diffusion Equation with Application on Cell Polarity Growth. Ph.D. Thesis, UC Riverside, Riverside, CA, USA, 2018. [Google Scholar]
- Tian, C.; Shi, Q.; Cui, X.; Guo, J.; Yang, Z.; Shi, J. Spatiotemporal dynamics of a reaction-diffusion model of pollen tube tip growth. J. Math. Biol. 2019, 79, 1319–1355. [Google Scholar] [CrossRef] [PubMed]
- Lu, L.; Meng, X.; Mao, Z.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. arXiv 2019, arXiv:1907.04502. [Google Scholar] [CrossRef]
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations. arXiv 2017, arXiv:1711.10561. [Google Scholar]
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 2018, 19, 932–955. [Google Scholar]
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
- Meng, X.; Li, Z.; Zhang, D.; Karniadakis, G.E. PPINN: Parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Eng. 2020, 370, 113250. [Google Scholar] [CrossRef]
- Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626. [Google Scholar] [CrossRef]
- Chen, Z.; Xiu, D. On generalized residual network for deep learning of unknown dynamical systems. J. Comput. Phys. 2021, 438, 110362. [Google Scholar] [CrossRef]
- Wu, K.; Xiu, D. Data-driven deep learning of partial differential equations in modal space. J. Comput. Phys. 2020, 408, 109307. [Google Scholar] [CrossRef]
- Zhou, Z.; Wang, L.; Yan, Z. Deep neural networks for solving forward and inverse problems of (2 + 1)-dimensional nonlinear wave equations with rational solitons. arXiv 2021, arXiv:2112.14040. [Google Scholar]
- Long, Z.; Lu, Y.; Dong, B. PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network. J. Comput. Phys. 2019, 399, 108925. [Google Scholar] [CrossRef]
- Long, Z.; Lu, Y.; Ma, X.; Dong, B. PDE-Net: Learning pdes from data. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 3208–3216. [Google Scholar]
- Pakravan, S.; Mistani, P.; Aragon-Calvo, M.; Gibou, F. Solving inverse-PDE problems with physics-aware neural networks. J. Comput. Phys. 2021, 440, 110414. [Google Scholar] [CrossRef]
- Daneker, M.; Zhang, Z.; Karniadakis, G.; Lu, L. Systems Biology: Identifiability analysis and parameter identification via systems-biology informed neural networks. arXiv 2022, arXiv:2202.01723. [Google Scholar]
- Both, G.; Choudhury, S.; Sens, P.; Kusters, R. DeepMoD: Deep learning for model discovery in noisy data. J. Comput. Phys. 2021, 428, 109985. [Google Scholar] [CrossRef]
- Xu, H.; Chang, H.; Zhang, D. DL-PDE: Deep-learning based data-driven discovery of partial differential equations from discrete and noisy data. arXiv 2019, arXiv:1908.04463. [Google Scholar] [CrossRef]
- Chen, Y.; Luo, Y.; Liu, Q.; Xu, H.; Zhang, D. Symbolic genetic algorithm for discovering open-form partial differential equations (SGA-PDE). Phys. Rev. Res. 2022, 4, 023174. [Google Scholar] [CrossRef]
- Zhang, Z.; Liu, Y. Robust data-driven discovery of partial differential equations under uncertainties. arXiv 2021, arXiv:2102.06504. [Google Scholar]
- Bhowmick, S.; Nagarajaiah, S. Data-driven theory-guided learning of partial differential equations using simultaneous basis function approximation and parameter estimation (SNAPE). arXiv 2021, arXiv:2109.07471. [Google Scholar]
- Rudy, S.H.; Brunton, S.L.; Kutz, J.N. Data-driven discovery of partial differential equations. Sci. Adv. 2017, 3, e1602614. [Google Scholar] [CrossRef] [PubMed]
- Rudy, S.; Alla, A.; Brunton, S.L.; Kutz, J.N. Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dyn. Syst. 2019, 18, 643–660. [Google Scholar] [CrossRef]
- Cai, J.; Dong, B.; Osher, S.; Shen, Z. Image restoration: Total variation, wavelet frames, and beyond. J. Amer. Math. Soc. 2012, 25, 1033–1089. [Google Scholar] [CrossRef]
- Brunel, N. J-B. Parameter estimation of ODE’s via nonparametric estimators. Electron. J. Statist. 2008, 2, 1242–1267. [Google Scholar] [CrossRef]
- Bergstra, J.; Yamins, D.; Cox, D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 115–123. [Google Scholar]
- Dunn, O. Multiple comparisons among means. J. Am. Stat. Assoc. 1961, 56, 52–64. [Google Scholar] [CrossRef]
- Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70. [Google Scholar]
- Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 1995, 57, 289–300. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).