Equation Discovery Using Fast Function Extraction: a Deterministic Symbolic Regression Approach

: Advances in machine learning (ML) coupled with increased computational power have enabled identiﬁcation of patterns in data extracted from complex systems. ML algorithms are actively being sought in recovering physical models or mathematical equations from data. This is a highly valuable technique where models cannot be built using physical reasoning alone. In this paper, we investigate the application of fast function extraction (FFX), a fast, scalable, deterministic symbolic regression algorithm to recover partial differential equations (PDEs). FFX identiﬁes active bases among a huge set of candidate basis functions and their corresponding coefﬁcients from recorded snapshot data. This approach uses a sparsity-promoting technique from compressive sensing and sparse optimization called pathwise regularized learning to perform feature selection and parameter estimation. Furthermore, it recovers several models of varying complexity (number of basis terms). FFX ﬁnally ﬁlters out many identiﬁed models using non-dominated sorting and forms a Pareto front consisting of optimal models with respect to minimizing complexity and test accuracy. Numerical experiments are carried out to recover several ubiquitous PDEs such as wave and heat equations among linear PDEs and Burgers, Korteweg–de Vries (KdV), and Kawahara equations among higher-order nonlinear PDEs. Additional simulations are conducted on the same PDEs under noisy conditions to test the robustness of the proposed approach.


Introduction
Partial differential equations (PDEs) are ubiquitous in all branches of science and engineering.These mathematical models are generally derived from conservation laws, sound physical arguments, and empirical heuristics drawn from experiments by an insightful researcher.However, there are many complex systems (some being neuroscience, weather forecasting, disease control modeling, finance, and ecology) that are still awaiting quantitative models to be built on physical arguments.The counter approach is to build models based on observing complex patterns in enormous amounts of readily available data which can loosely be termed as reverse-engineering nature.This approach has been used in previous studies, for example by Kepler, to approximate elliptic orbits solely from planetary positions.This reverse-engineering is appropriate today as we can leverage computers to identify patterns from observing data that might not be comprehensible to humans.Recent progress in machine learning and data science [1,2], combined with an increase in computational power, has generated innovative algorithms to distill physical models from data.These algorithms are termed as data-driven tools, as they can infer correlations and extract patterns from high-dimensional data or measurements.Some popular data-driven tools for extracting dynamical system from data are artificial identification problems.Traditional linear and nonlinear regression algorithms assume a mathematical form and only find parameters that best fit the data.On the other hand, SR approaches aim to simultaneously find parameters and learn functional form of the model from observed data.This is generally achieved by searching mathematical abstractions with a preselected set of arithmetic operators while minimizing the error metrics.Finally, optimal model is selected from Pareto front analysis with respect to minimizing accuracy versus model complexity.Genetic programming (GP) [26] is a popular choice leveraged by most of the SR frameworks.GP is an evolutionary computation technique that is inspired by Darwin's theory of natural evolution.A seminal work was done in extracting nonlinear dynamics [28] from input-output response using the GP approach.The use of GP-based SR approaches has appeared in various system identification problems [29][30][31].Furthermore, GP has been applied to identify closed-loop feedback control for turbulent separated flows [32,33].Different improved versions of GP have been proposed recently; for instance, gene expression programming (GEP) [27], parse matrix evolution (PME) [34], and linear genetic programming (LGP) [35].GEP has been exploited recently to recover functional models approximating nonlinear behavior of stress tensors in Reynolds-averaged Navier-Stokes (RANS) and large eddy-simulation (LES) turbulence models [36,37].Generally, GP-based SR approaches can identify models with complex nonlinear compositions given enough computational time.
Recently, researchers proposed fast and deterministic SR approaches by using results from CS and sparse optimization.Thus, a family of deterministic SR approaches were proposed.The first non-evolutionary SR algorithm was fast function extraction (FFX) [38] where feasible models were confined to generalized linear models (GLM) [39] and best bases and their corresponding coefficients were found by pathwise regularized learning that is also called elastic net algorithm [17].The final models are selected through non-dominated filtering with respect to accuracy and model complexity.FFX draws influences from both GP and CS to distill better models from data.FFX solves quadratic optimization problems and thus, computation cost increases quadratically as we increase the number of bases in search space.FFX and GP has been applied to various problems such as dynamical system recovery and solar power prediction based on energy production data [40].Elite base regression (EBR) [41] is a recent advancement in non-evolutionary computation where only elite bases are selected by measuring the correlation coefficient of basis functions with respect to the target model.These elite bases are spanned in search space and use the parse matrix encoding scheme to propagate the algorithm further to recover mathematical model.Prioritized grammar enumeration (PGE) [42] is another deterministic approach where genetic operators and random numbers from GP are replaced with grammar production rules and systematic choices.PGE approach also aims for the substantial reduction of search space.
In this paper, we demonstrate the use of FFX, a deterministic symbolic regression algorithm, to identify and recover the target PDEs representing both linear and nonlinear dynamical systems.We build candidate basis functions consisting of partial derivative terms of varying orders approximated by the central finite-difference formulas.For testing, we use exact analytical solutions of PDEs to get input data and use FFX Python package [38] to demonstrate its feasibility.First, we recover simple linear PDEs such as wave and heat equations.We then recover higher-order nonlinear PDEs such as Burgers, KdV, and Kawahara equations.We further add noise to input data originated from the same PDEs to test the robustness of FFX.
The rest of the paper is organized as follows.Section 2 gives a brief description of the FFX algorithm.In Section 3, FFX is tested on different canonical PDEs.We demonstrate performance and robustness of FFX by inferring dynamics from both clean and noisy input data.Section 4 gives the summary of our findings and limitations that need further investigation.

Methodology
Fast function extraction is a deterministic symbolic regression algorithm that draws its influences from compressive sensing and genetic programming.This method was predominately proposed to improve speed and scalability while dealing with symbolic regression problems in deterministic sense.The following sections discuss the methodology used in the current paper for recovering PDEs using only data.

FFX Problem Statement
FFX aims to find white-box models or expressions, Model = {model 1 , model 2 , . ..} from a given set of data { Θ, V}.The input data Θ is arranged as a matrix with several rows equal to the number of input samples/measurements and several columns equal to the number of candidate feature/basis functions.The corresponding output data V is arranged as a column vector with several rows equal to the number of input samples/measurements.The recovered models undergo additional filtering to form a Pareto front consisting of optimum set with respect to tradeoff between minimizing model complexity f 1 (model) and expected test error f 2 = E x,y L(model), where L(model) is squared loss error of the model.In the current paper, we restrict the PDEs to be recovered to quadratic nonlinearity and up to fifth order in space.The general PDE to be recovered is in the form: where subscripts denote the order of partial differentiation and σ is an arbitrary parameter.
For illustrative example, consider the problem of discovering PDE is the viscous Burgers equation as shown below: where u(x, t) is generally velocity field and ν is viscous coefficient.The solution field u(x, t) ∈ R m×n , where m is the number of time snapshots and n is the number of spatial locations, is formed by solving Equation ( 2) analytically or numerically.The solution field is arranged as shown below: Higher-order partial derivative terms can be approximated using any available numerical schemes once recording of discrete data set given by Equation (3) is available.We use the leapfrog scheme for approximating the derivative in time, and the central difference schemes for space as follows: where temporal and spatial step sizes are given by dt and dx, respectively.In Equation (4), the spatial location is denoted using subscript index j, and the temporal location using superscript index p.
We note that the other finite-difference schemes (or automated differentiation scheme) can be easily used within our study.FFX takes the input library consisting of basis functions (candidate terms) that are built using Equations ( 2) and (3).This core library is shown below: For each column in Equation ( 5), the solution u(x, t) and its derivatives in space and time are arranged with size m • n × 1 where m is the number of time snapshots and n is the number of spatial locations.For example, the basis functions U and U 2x are arranged as follows: where subscripts j and p denote the spatial and temporal collocation points of data, respectively.The basis functions in core library Θ(U) is expanded to include interacting basis functions (with itself and each other) limiting to quadratic nonlinearity.In addition, a constant term and original basis functions in Θ(U) are added to form the library of candidate features Θ(U).The expansion procedure by FFX is discussed in Section 2.2.The final expanded library is in the form: where the size of library is Θ (U) ∈ R m•n×N B , where N B is number of basis functions (i.e., N B = 28).For example, if we have 501 spatial points and 101 time snapshots with 28 candidate bases, then Θ (U) (Equation ( 7)) contains 501 × 101 rows and 28 columns.The Burgers PDE given by Equation (2) or any other PDE under consideration can be written in the form of linear system representation in terms of V(t) and Θ(U): where where N B is number of basis functions (candidate terms) in library Θ(U).Thus, we form a linear system problem Equation ( 8) with over determined system (number of measurements greater than candidate terms).The goal of FFX is to find sparse coefficient vector β that only consists of active basis and rest of bases are set to zero.For example, in Burgers equation given by Equation ( 2), FFX has to find coefficient vector β consisting of only two basis functions, namely uu x and u 2x and all others basis functions set to zero.
The system in Equation ( 8) can be solved for β using the LS problem.But LS minimization tries to fit all the basis functions resulting in all non-zero values in coefficient vector β.Thus, solving Equation (8) using LS gives radically different functional form from the inferred dynamics and generally gives overfitted models.Hence sparse optimization is used which can identify the required basis functions along with coefficient estimation.FFX uses one of the sparse optimization algorithms called pathwise regularized learning (also known as elastic net approach) which is discussed in Section 2.3 for feature selection and parameter estimation.The sparse optimization in FFX returns multiple models of different complexity (number of basis functions).These models are further filtered using non-dominated sorting inspired by genetic programming to extract Pareto optimal set of models with respect to minimizing complexity and test error.This filtering approach is discussed in Section 2.4.In summary, FFX algorithm performs three major steps to recover model from the data.They are:

•
Construct a large library of linear and nonlinear basis functions.This step is known as basis function expansion or feature construction in ML.

•
Use regularized least square algorithm to regress the coefficients and simultaneously select basis functions mapping to target model.FFX uses pathwise regularized learning also known as elastic net algorithm to get the vector of coefficients and corresponding bases that best fit the data.This step is termed as model building.

•
Avoid over fitting using non-dominated filter (Pareto front analysis) with respect to model complexity (number of bases) versus the testing error.Final step is called model selection.
Please note that each of the above steps and associated algorithms elaborated in subsequent Sections 2.2-2.4 are derived from the original FFX algorithm [38] and only listed here for the sake of completeness.Readers are encouraged to refer the original algorithm [38] for in depth discussion.

Basis Function Expansion (Feature Construction)
FFX restricts the model form to be recovered to GLM [39].GLMs are generalization on classic linear regression model.GLM aims to find f ( x) in a finite dimensional space spanned by a set of given basis functions as shown below: where θ i s are basis functions and β i s are corresponding coefficients.The important observation in Equation ( 9) is that the basis functions θ i ( x) are allowed to be nonlinear, but the whole ensemble process of f ( x) in GLM is considered linear combinations of these nonlinear basis functions.
The input library Θ is passed to FFX where the library is expanded to Θ which consists of additional constant and nonlinear basis functions.FFX builds a massive set of nonlinear basis functions by interacting the bases with itself and each other limiting to quadratic nonlinearity.Additionally, it adds constant term and original basis functions to the expanded library.Algorithm 1 gives the outline of building interacting bases limiting to quadratic nonlinearity.In FFX, we can also construct uni-variate bases such as sqrt, exp, log, etc.But in our problem, we exclude this part as we restrict to only interacting bases.The OK function in Algorithm 1 shields from illegal bases which results in inf or NaN caused by division by zero.The basis expansion step in FFX resembles genetic programming where a large set of bases and their interactions are spanned in search space and randomly selected for evolution.However, in FFX we fill the global search space in one step and later find the optimal models deterministically as discussed in next Section 2.3.

Model Building
This step forms the core of FFX which concerns with identifying basis functions that may be sparse in nature and simultaneously regressing corresponding coefficients, β i s, using given data or measurements.The coefficient vector, β, can be computed using the LS minimization.However, LS minimization method is prone to overfitting resulting in high model sensitivity on training data and finding extremely large coefficients to represent the given data.Hence, two regularizing terms, L 1 and L 2 norms on the coefficients β, are added to the least squares objective function to avoid overfitting and tame the coefficients in finding sparse approximation of β.The resulting regularized LS objective function is as follows: where λ 1 and λ 2 are regularizing weights for L 1 and L 2 penalty functions, respectively.Regularization in compressive sensing refers to forcing certain characteristics of model-based scientific inference, for example, discouraging complex models or extreme explanations, even if they fit the input data.Solving Equation (10) in FFX is termed as pathwise regularized learning which in compressive sensing is a popular hybrid regularized LS method called elastic net approach [17,18].Elastic net approach combines popular regularizing LS algorithms, namely LASSO regression [11,14] and ridge (or Tikhonov) regression [16,43].LASSO and its variants including ridge regression are fundamental to the field of compressive sensing, especially in recovery of sparse signals from random data.LASSO penalizes LS objective function by adding sum of the absolute values (L 1 norm) of the regression coefficients.This forces the shrinkage of certain coefficients simultaneously performing feature elimination as L 1 norm promotes sparsity.In ridge regression, sum of the squares (L 2 norm) of the regression coefficients is added to LS objective function.Ridge regression has effect of grouping several correlated variables with increased stable convergence.
Hence by adding both L 1 norm and L 2 norm to LS objective function, we arrive at elastic net approach with objective function as shown in Equation (10).This hybrid approach helps in finding both sparse and dense models which can be later filtered to find optimum model.The λ 1 and λ 2 in Equation (10) control the amount of sparsity by putting more weight on penalty terms and resulting in shrinkage of coefficients and avoiding overfitting.The λ 1 and λ 2 in Equation (10) are modified by introducing a single variable λ ∈ [0,1] as a mixing parameter and α 0 called regularizing weight.The resulting modified objective function is shown as: where the mixing parameter λ is also called L 1 ratio as λ = λ 1 in Equation (11).Hence, λ = 1 corresponds to pure LASSO regression and λ = 0 corresponds to pure ridge regression.The regularizing weight α multiplies the penalty terms and dictates the strength of sparsity needed.Hence, α = 0 corresponds to ordinary LS and the larger the α, the greater the number of non-zero terms (sparsity) in the resulting model.The benefit of introducing α is that by varying the strength of α, we can get models with different complexities.Figure 1 is used to illustrate this effect of α on standard diabetic set [44] where increase of α adds more non-zero coefficients to discovered model.In FFX algorithm, elastic net procedure is termed as pathwise regularized learning as it sweeps many values of α automatically by initializing number of αs that need to be searched.Along the path of various αs, Algorithm 2 gives different models with varying complexity and accuracy which are stored for further filtering.The length of the model is restricted through number of maximum bases, which is similar to genetic programming where the maximum length/depth of the tree is restricted as to explain the data with compact model which is easier to interpret.The elastic net method is implemented in FFX using SCIKIT Learn package [44].We should note that pathwise regularized learning needs to solve quadratic optimization problem.Hence, computation cost increases quadratically with increase in bases N B .// Pathwise Learning()

Model Selection
As we sweep through the regularized path by iterating through different regularizing weights, αs, we are storing models with varying complexity (number of bases) to explain the input data.These candidate models are filtered using standard non-dominated sorting to get the Pareto optimum set of best models.A sorting algorithm uses some ranking mechanism to distill best models.For example, NSGA2 [45] uses a heuristic metric called crowding distance to compare two models.This step in FFX is called non-dominated filtering.Algorithm 3 outlines the filtering approach.
The objective of non-dominated filtering is to minimize model complexity (number of bases) along with testing error.As per the Ockham's razor ("law of parsimony"), a simple solution is considered superior to a complicated one [46,47] as it is easier to interpret.In addition, complex functions are prone to overfitting performing too good on training data but worse on unseen (test) data.We look for the set of candidate solutions that are of minimum complexity and test error.The test or training accuracy is calculated by normalized root mean squared error (NRMSE) given by the following expression: where V(t) is the predicted output, V(t) is the actual output and n • m is number of input samples or measurements.Equation ( 12) is used to determine how well the given model performs on the outside data set which is not shown to the algorithm while training.It is reported in percentage by simply multiplying NRMSE error with 100 and generally called as test error.
Hence, through this filtering, we form Pareto front along which optimal models lie.As this set of solutions satisfy at least minimizing one of the objectives, they are also called non-dominated candidate solutions.These solutions are not worse than any other solutions in the populations compared on all objectives.
We implement FFX setup to recover canonical linear and nonlinear PDEs.We restrict PDEs to be tested up to fifth order in space.The summary of PDEs that are selected for recovery along with their domain and discretization is listed in Table 1.For keeping it simple, we use exact available analytical solutions for PDEs to get the input data for building the library Θ.Furthermore, all the selected PDEs are non-dimensionalized with respect to length of the domain L and total time T to make the regression process more effective.The resulting non-dimensional forms of selected PDEs are listed in Table 2. Table 3 shows different hyper-parameters values of FFX used in this problem.The L l ratio, λ, is set at high value (0.95) to promote sparsity in recovered model.Additionally, this value for L 1 ratio (λ) in our setup gives best prediction of coefficients corresponding to basis functions.Future investigations will include more rigorous studies on selecting hyper-parameters using techniques such as cross validation.
To test the robustness of FFX algorithm, we added artificial noise to the analytical solution u(x, t) with different magnitudes of standard deviation of u(x, t).In this framework, noise can be introduced in two ways: (i) first computing the basis functions (i.e., partial derivatives) from recorded data, u(x, t), and then adding noise to each input feature basis function, or (ii) first adding noise to recorded data, u(x, t), and then compute derivatives from noised data and establish a set of basis functions.In this paper, we follow the second approach.We note that this second approach is challenging since the data is noised before we compute the derivatives (candidate basis functions).

Wave Equation
Our first test case is wave equation which is a first order linear PDE.The equation takes the form as follows: where a is constant wave speed.The traveling single wave solution that satisfies the PDE in Equation ( 13) is used to get velocity field at various spatial locations and time snapshots which are used as input data for building library of basis functions.The exact analytical solution is given as: where the wave speed is taken as a = 1.The solution, Equation ( 15), has spatial domain x ∈ [0, 1] and temporal domain t ∈ [0, 10].The original PDE is non-dimensionalized using length scale L and time scale T. We define dimensionless form and omit the tilde from x and t when they refer to the partial derivatives.The resulting non-dimensional PDE has the form: where L = 1 and T = 10 in our set up.
Figure 2 shows the plot of analytical solution for non-dimensional PDE given by Equation ( 16).Exact solution data is used to form candidate terms consisting of different higher-order partial derivatives.After non-dominated filtering, two models enter into Pareto optimal set as shown in Figure 3 with respect to test error and model complexity (number of bases).Zero complexity in Figure 3 refers to a constant value with high test error.The best model is presented in the colored box in Figure 3. Additionally, for wave equation, FFX does not identify models with intermediate complexity as shown in Figure 3.The recovered model identified by FFX has three bases whereas the wave equation has only one main base term, namely u x .This is due to FFX adding the terms with corresponding coefficients that are small compared to the main basis function ( u x ) to the model.If we look closely, the recovered model can be cleaned by removing the non-significant bases with small values of coefficients and show that the model has only one base term i.e., u x .This simplification of model to recognize that original PDE being recovered is applied for all numerical experiments carried out in this paper.
Table 4 lists the target and identified PDE recovered from using both clean and noisy data along with their associated test error.As expected, FFX works extremely well using clean data with test error 0.01% compared to their noisy counter parts.This small test error for model identified using clean data is due to the additional terms that are added by FFX to explain the data.As the magnitude of noise is increased, the accuracy of predicted coefficient corresponding to main base term ( u x ) drastically decline as shown in Table 4. Finally, at 30% noise, FFX misidentifies the PDE to be recovered.

Heat Equation
We test heat equation which is a second order linear PDE.Heat equation takes the form: where the thermal conductivity is taken as α = 1 π 2 .Exact solution used in the current paper is given by: u(x, t) = −sin(πx)e −t (18) where the spatial domain x ∈ [−1, 1] and temporal domain t ∈ [0, 1].We nondimensionalize the PDE given by Equation ( 17) using domain length L and total time T. Resulting non-dimensional PDE is of the form: where L = 2 and T = 1 in our set up.
Non-dimensional analytical solution to Equation ( 19) is plotted in Figure 4. Figure 5 shows the Pareto optimal set of models with best model labeled in the colored box.This best model can be cleaned by removing non-significant bases with coefficients vary small compared to the main base term i.e., u 2x .Table 5 shows the recovered PDE using both clean and noise data along with associated test error.The small test error for predicted model using clean data in Table 5 is due to addition of these extra basis functions by FFX to explain the data.Recovering heat equation in our set up but FFX is less robust compared to wave equation.Wave equation is recovered by FFX up to 25% noise whereas heat equation is misidentified at 5% noise even though heat equation being linear PDE.

Burgers Equation
Burgers equation is a fundamental PDE occurring in various areas such as fluid mechanics, nonlinear acoustics and gas dynamics.The interest in Burgers equation arises due to nonlinear term uu x and present a challenge to FFX algorithm to recover the dynamics.Burgers equation takes the form: where ν is viscosity coefficient.We use available analytical equation to Equation (20) to get solution u(x, t) data.The exact solution is given by: where a = 1.25 and ν = 0.01.The spatial domain x ∈ [0, 1] and temporal domain t ∈ [0, 100] are taken in Equation (21).The PDE in Equation ( 20) is nondimensionalized using domain length L and total time T. The resulting non-dimensional PDE is given by: where L = 1 and T = 100 in our set up.Non-dimensional analytical solution is plotted in Figure 6. Figure 7 shows the Pareto optimal set of models with best model highlighted in colored box.This best model can be cleaned by removing non-significant bases with corresponding coefficients close to zero to recover Burgers PDE.Table 6 shows the recovered PDE using both clean and noise data along with the test error.FFX shows promise in recovering nonlinear PDEs as we can see from this current numerical simulation.FFX misidentifies Burgers PDE at 5% noise level as shown in Table 6.

Korteweg-de Vries Equation
Korteweg-de Vries (KdV) equation is used to model the behavior of magneto hydrodynamics waves in warm plasma, acoustic waves in an inharmonic crystal and ion-acoustic waves [48].Many different forms KdV equations exist but we use the form as given below: We use analytical equation for KdV equation (Equation ( 23)) given in [49] which is an interacting wave solution.The solution with multiple soliton interactions exhibit highly nonlinear behavior which challenges FFX apart from being higher-order PDE.The exact equation is given as: where the spatial domain x ∈ [−10, 10] and temporal domain t ∈ [0, 0.5] are taken for solving Equation (24).The two solitons interacting with each other is shown in the Figure 8.The resulting non-dimensional PDE with respect to domain length L and total time T is shown below: where L = 20 and T = 0.5 in our setup.Pareto optimal set with best model in colored box is plotted in Figure 9.This best model can be cleaned by removing non-significant bases with corresponding coefficients very small compared to main basis functions, namely u u x and u 3x to recover KdV equation.FFX recovers the original non-dimensional PDE for both clean and noise cases as shown in Table 7.This test case shows FFX is reliable for recovering dynamics governed by complex multiple interacting waves.Furthermore, KdV equation is identified by FFX at 1% noise level but fails to recover at 5% noise.

Kawahara Equation
We consider a fifth-order nonlinear PDE called Kawahara equation introduced in [50].This equation is also known as a fifth-order KdV equation or singularly perturbed KdV equation.The fifth-order KdV equation is one of the most known nonlinear evolution equation which is used in the theory of magneto-acoustic waves in a plasma [50], capillary-gravity waves [51] and in the theory of shallow water waves [52].The Kawahara equation takes the form: We use single traveling solitary solution for Equation (26) to get the solution field u(x, t).The exact wave form solution listed below is taken from [53]: where the wave travels with the speed a = 36 169 , the spatial domain x ∈ [−50, 50] and temporal domain t ∈ [0, 6].The Equation ( 26) is nondimensionalized using domain length L and total time T resulting in the form as shown below: where L = 100 and T = 6 in our setup.The non-dimensional solution is plotted in Figure 10 show the single wave soliton clearly.This test case is interesting as the analytical solution also satisfies the wave equation with wave speed, a = 36 169 .Pathwise regularized learning (elastic net approach) finds the models that are sparser in nature.Hence FFX converges to wave equation with speed a.This is proved in Figure 11 which shows the wave equation in colored box being recovered for the same solution given by Equation (27).
The ambiguity in identifying intended models with input data satisfying multiple equations is a concern and actively being pursued for future research.In the current paper, to test the performance of FFX to recover higher-order PDEs, we remove the basis function u x from the library Θ to discourage FFX to converge to wave equation.Figure 12 shows the Pareto front of optimal model with best model highlighted in colored box.Table 8 shows the recovery of PDE for both clean and noise data.This workaround demonstrates the FFX algorithm to recover higher-order PDEs such as Kawahara equation.Furthermore, at 5% noise level, FFX misidentifies Kawahara equation just like other nonlinear PDEs tested before.

Summary and Conclusions
Machine-learning methods can be extremely useful for researchers in inferring complex models from data or measurements.FFX mainly leverages recent advances in compressive sensing to learn analytically tractable mathematical models using only data.This makes FFX a deterministic symbolic regression approach.The core of FFX is pathwise regularized learning also called elastic net algorithm, which is a popular sparse regression approach.The sparse regression methods regularize the ordinary least squares regression problem by adding L 1 and L 2 penalty terms to discourage overfitting and tame the coefficients.This regularization of LS problem promotes sparsity there by recovering only a subset of candidate bases to explain the data.This property is what made sparse regression approaches such as LASSO, ridge, and their variants fundamental to compressive sensing.FFX enumerates the given basis functions to add nonlinear terms and uses pathwise regularized learning to extract several models of varying complexity (number of bases) and prediction accuracy.These models are then filtered using non-dominated sorting favoring lower-complexity models with best test accuracy.The enumeration of basis functions to add nonlinear bases to library and non-dominated sorting of learned models with respect to model complexity and test accuracy are inspired by GP.The core of FFX is pathwise regularized learning (elastic net) inspired by CS which is a regularized LS problem.
In this paper, we demonstrate the use of FFX to extract different linear and nonlinear PDEs by exploring patterns in data.We especially build and enumerate large candidate features using input data to leverage sparse optimization algorithm of FFX to discover parsimonious equations.The numerical experiments of several canonical PDEs shows that FFX is a promising machine-learning technique to capture true features and associated coefficients accurately.Additionally, input sensor data is slightly distorted by adding noise to test the robustness of the algorithm (i.e., adding noise before computing candidate basis functions).FFX works reasonably well up to 1% of noise but fails for higher amounts of noise.Additionally, wave equation is recovered for higher levels of noise (up to 25%) compared to other PDEs.This aspect of algorithm needs further investigation as the real-world data sets might have higher levels of noise.However, we note that adding noise to input data which is later used to enumerate features might be more challenging than adding noise directly to candidate basis functions (not shown in this study).Additionally, there is a challenge in identifying the desired model if the solution satisfies multiple equations.This is due to the sparsity-promoting behavior of sparse regression methods which converges to the optimal model containing the least number of bases to explain the data, e.g., it finds wave equations even though the solution satisfies both wave and Kawahara equations.Overall, FFX is a deterministic and scalable algorithm that can be exploited as a potential data-driven tool for recovering hidden physical structures or parameterizations representing high-dimensional systems such as turbulent geophysical flows, traffic models, and weather forecasting using only data.States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof.The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Figure 1 .
Figure1.Regularization path for elastic net by varying α on diabetic data set illustrated in[44].Each line traces the coefficient value of basis functions (features).As α tends to 0, more bases are added to explain data and finally resulting in ordinary LS solution.

Algorithm 3 :
Non-dominated filter Input: Coeff, B // Coefficient vectors, basis functions Output: Model // Pareto optimal Models // Build Models M cand = {} for i = 1 to length( Coeff ) do a = Coeff[i] a 0 = a[0] a nz = non-zero values in a B nz = expressions in B corr. to non-zero values in a m = model(a 0 ,a nz , B nz ) add m to M cand end // Non-dominated filtering

Figure 4 .
Figure 4. Non-dimensional analytical solution of heat equation.

Figure 7 .
Figure 7. Predicted models with varied test error and complexity for Burgers equation.

Figure 9 .
Figure 9. Predicted models with varied test error and complexity for KdV equation.

Author Contributions:
Conceptualization, H.V. and O.S.; Data curation, H.V.; Formal analysis, H.V.; Visualization, H.V.; Methodology, H.V. and O.S.; Supervision, O.S.; Writing-original draft, H.V.; and Writing-review and editing, H.V. and O.S.Funding: This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0019290.Disclaimer: This report was prepared as an account of work sponsored by an agency of the United States Government.Neither the United

Table 1 .
Model Summary of different canonical PDEs selected for recovery using FFX.

Table 2 .
Non-dimensional form of selected PDEs used to test the FFX.

Table 3 .
FFX hyper-parameters used in our setup.

Table 8 .
Identifying Kawahara equation using FFX.Wave equation: u t = 0.01278 u x Figure11.Predicted wave equation rather the intended Kawahara equation as the input data satisfies both PDEs.FFX identifies the sparser (lesser bases) model which is the wave equation..999uu x 5.76 × 10 6 u 3x 6.24 × 10 10 u 5x Kawahara equation: u t = uu x 6.0 × 10 6 u 3x 6.0 × 10 10 u 5x Figure 12.Predicted models with varied test error and complexity for Kawahara equation.