An Artificial Neural Network Approach for Solving Space Fractional Differential Equations

The linear algebraic systems generated by discretizing fractional differential equations are nonsymmetric, and, owing to the nonlocality of fractional-order operators, solving such problems numerically is more complex than solving symmetric ones. In this paper, we propose an artificial neural network (ANN) algorithm to approximate the solutions of fractional differential equations (FDEs). First, we replace the unknown function in the equation with a truncated series expansion; we then use a neural network to learn the series coefficients, so that the resulting series solution drives the norm of the loss function below a satisfactory error. Numerical experiments verify that the proposed ANN algorithm achieves high accuracy and good stability.


Introduction
Fractional calculus studies the mathematical properties and applications of integrals and derivatives of arbitrary order. Fractional operators are well suited to describing materials with hereditary and memory properties. Since the beginning of the 21st century, they have been widely applied in many fields, such as high-energy physics, anomalous diffusion, complex viscoelastic materials, geophysics, biomedical engineering and others (see [1][2][3][4][5][6]). The study of fractional differential equations has consequently attracted extensive attention and gradually become a new and active research field.
Due to the nonlocality of fractional operators, numerically solving such problems is more complex than solving symmetric ones. Although it is hard to derive analytical solutions of fractional differential equations, researchers have become increasingly interested in approximate methods and numerical solutions. In 2006, Meerschaert and Tadjeran [7] proposed a finite difference approximation to discretize the two-dimensional fractional dispersion equation with variable coefficients; in Ref. [8], Sun et al. gave a detailed introduction to the finite difference method for variable-order time-fractional diffusion equations on a finite domain; in Ref. [9], Deng proposed a finite element method to solve the time-space fractional Fokker-Planck equation; in Ref. [10], Li et al. studied three typical Caputo-type partial differential equations using the finite difference method/local discontinuous Galerkin finite element method; in Ref. [11], Wang et al. developed an indirect finite element method for Dirichlet boundary value problems of Caputo fractional differential equations. In addition, many researchers have achieved fruitful results on discretized fractional differential equations [12][13][14][15][16][17][18]. Because fractional differential operators are nonlocal, numerical methods for spatial fractional differential equations usually produce a full stiffness matrix, which is traditionally solved by Gaussian elimination. This requires O(N^3) operations and O(N^2) memory per iteration, a significant increase in computational cost and memory load compared with numerical methods for high-order diffusion equations [19].
Compared with traditional numerical methods, ANN approximation is not very sensitive to the spatial dimension. An ANN provides an adaptive representation without explicitly handling a mesh; one only needs to solve an optimization problem. Building on these facts and advantages, neural-network methods for PDE problems were pioneered by Lagaris et al. [20,21] and have since been applied to a large number of approximation problems for partial differential equations [22][23][24][25]. At present, several works use neural networks to solve fractional partial differential equations effectively. In Ref. [26], Raissi et al. proposed physics-informed neural networks (PINNs) to solve forward and inverse problems involving nonlinear partial differential equations and studied their convergence. In Ref. [27], Pang et al. solved space-time fractional advection-diffusion equations by fractional PINNs (fPINNs), focusing on the parameter identification problem for fractional partial differential equations; fPINNs obtain superior accuracy and can handle high-dimensional problems on irregular domains. Furthermore, a wavelet neural network was proposed to solve fractional differential equations in Ref. [28]. In Ref. [29], Gao et al. obtained numerical methods using a triangle neural network to solve fractional differential equations. In recent years, many other scholars have contributed work on solving fractional differential equations with neural networks [30][31][32][33][34][35].
Recently, ANNs have become more and more important because of their many advantages: they can approximate arbitrarily complex nonlinear relationships, and they exhibit strong fault tolerance and robustness because quantitative and qualitative information is stored distributively across the neurons of the network. They admit parallel distributed processing, which makes large-scale computation fast, and they can learn and adapt to unknown or uncertain systems. Moreover, the power series method can handle complex mathematical problems by providing a solution as a series polynomial whose coefficients are determined by a suitable method. In this paper, we combine artificial neural networks (ANNs) with the truncated power series method in an iterative minimization algorithm to obtain approximate solutions of the fractional diffusion equation. We use the fractional derivative in the sense of Caputo. In Section 2, we introduce the definitions and notation of fractional calculus and give the framework and basic description of the problem. In Section 3, we present the implementation of the ANN algorithm: the solution is expressed as an appropriate truncated series expansion, and by using the minimum mean square error function the problem is transformed into a minimization problem. In Section 4, three numerical examples are given to verify the effectiveness of the proposed method.

Preliminaries and Notation
In this section, we first present some basic definitions and notations about the fractional calculus.

Definition 1 ([36]). Let f(x) be a continuously differentiable function on the finite interval [a, b] of the real axis R. The (left-sided Riemann–Liouville) fractional integral of order α > 0 is defined as
$$ {}_{a}I_{x}^{\alpha} f(x) = \frac{1}{\Gamma(\alpha)} \int_{a}^{x} (x-t)^{\alpha-1} f(t)\, dt, \tag{1} $$
where Γ(·) denotes the Gamma function.

Definition 2 ([37]). Let f(x) be a continuously differentiable function on the finite interval [a, b] of the real axis R. The left-sided and right-sided Caputo fractional derivatives of order α > 0 are defined by
$$ {}^{C}_{a}D^{\alpha}_{x} f(x) = \frac{1}{\Gamma(n-\alpha)} \int_{a}^{x} (x-t)^{n-\alpha-1} f^{(n)}(t)\, dt, \tag{2} $$
$$ {}^{C}_{x}D^{\alpha}_{b} f(x) = \frac{(-1)^{n}}{\Gamma(n-\alpha)} \int_{x}^{b} (t-x)^{n-\alpha-1} f^{(n)}(t)\, dt, \tag{3} $$
respectively, where n − 1 < α ≤ n, n ∈ N, and Γ(·) denotes the Gamma function.
Next, we present the Caputo fractional derivative of the power function,
$$ {}^{C}_{a}D^{\alpha_k}_{x} (x-a)^{p} = \frac{\Gamma(p+1)}{\Gamma(p+1-\alpha_k)} (x-a)^{p-\alpha_k}, \qquad p \ge \lceil \alpha_k \rceil, $$
together with ${}^{C}_{a}D^{\alpha_k}_{x} C = 0$, where ⌈α_k⌉ denotes the smallest integer greater than or equal to α_k and C is a constant; the derivative also vanishes for integer powers p < ⌈α_k⌉. In the following work, we write ${}^{C}_{a}D^{\alpha_k}_{x}$ simply as ${}_{a}D^{\alpha_k}_{x}$ for the sake of simplicity.
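The power-function rule above can be checked numerically. The following sketch (a pure-Python illustration, not the paper's code; the function names are ours) compares the closed form against direct quadrature of the Caputo integral for 0 < α < 1:

```python
import math

def caputo_power(alpha, p, x, a=0.0):
    # Closed-form Caputo derivative of (x - a)**p of order alpha, for p >= ceil(alpha):
    #   Gamma(p + 1) / Gamma(p + 1 - alpha) * (x - a)**(p - alpha)
    return math.gamma(p + 1) / math.gamma(p + 1 - alpha) * (x - a) ** (p - alpha)

def caputo_quadrature(alpha, p, x, m=200_000):
    # Direct midpoint quadrature of the left-sided Caputo integral (a = 0, 0 < alpha < 1):
    #   (1 / Gamma(1 - alpha)) * integral_0^x (x - t)**(-alpha) * d/dt[t**p] dt
    # The midpoint rule avoids evaluating the weakly singular kernel at t = x.
    h = x / m
    total = 0.0
    for i in range(m):
        t = (i + 0.5) * h
        total += (x - t) ** (-alpha) * p * t ** (p - 1)
    return total * h / math.gamma(1 - alpha)
```

For instance, with α = 1/2 and p = 2 both routines reproduce ${}_0D^{1/2}_x\, x^2 = \frac{2}{\Gamma(5/2)} x^{3/2}$.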

Problem Description
In this paper, we consider the following multi-term fractional differential equation
$$ \sum_{k} P_k(x)\, {}_{a}D^{\alpha_k}_{x} u(x) = g(x), \qquad x \in (a, b), \tag{4} $$
subject to the boundary condition u(a) = 1. Here 1 < α_k are the fractional orders, and P_k and g(x) are given real-valued analytical functions. The traditional power series method is used to solve ordinary and partial differential equations. Since it is still very difficult to find analytical solutions of fractional differential equations, this paper mainly studies power series solutions of fractional differential equations. In this method, a truncated power series expansion replaces the unknown function in the equation. Once the unknown function is replaced, a system of equations at discrete points with unknown coefficients is obtained. The artificial neural network method is then used to solve this set of algebraic equations.
Here, we consider the following power series expansion
$$ u(x) \approx u_n(x) = \sum_{i=0}^{n} a_i (x - a)^{i}, \tag{5} $$
where a_i are the unknown coefficients. Substituting (5) into (4) and applying the Caputo derivative of the power function, we have
$$ \sum_{k} P_k(x) \sum_{i=\lceil \alpha_k \rceil}^{n} a_i \frac{\Gamma(i+1)}{\Gamma(i+1-\alpha_k)} (x - a)^{i-\alpha_k} = g(x). \tag{6} $$
We define h_1 := (b − a)/(N_1 + 1), the grid size in the x-direction, with x_j := a + j h_1 for j = 0, 1, . . . , N_1 + 1. Then we obtain the following discretization scheme
$$ \sum_{k} P_k(x_j) \sum_{i=\lceil \alpha_k \rceil}^{n} a_i \frac{\Gamma(i+1)}{\Gamma(i+1-\alpha_k)} (x_j - a)^{i-\alpha_k} = g(x_j), \tag{7} $$
where n is the order of the power-series polynomial.
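To make the scheme concrete, the collocation system above can be assembled as a matrix acting on the coefficient vector. The sketch below (an illustrative single-term, constant-coefficient setup; `design_matrix` and the helper names are ours, not the paper's code) takes a = 0 for simplicity:

```python
import math

def design_matrix(xs, n, alphas, P):
    # Collocation matrix of the discrete scheme: entry (j, i) is the factor
    # multiplying the unknown series coefficient a_i at collocation point x_j,
    #   sum_k P_k(x_j) * Gamma(i+1)/Gamma(i+1-alpha_k) * x_j**(i - alpha_k),
    # using the Caputo power rule (terms with i < ceil(alpha_k) vanish).
    A = []
    for x in xs:
        row = []
        for i in range(n + 1):
            entry = 0.0
            for alpha, Pk in zip(alphas, P):
                if i >= math.ceil(alpha):
                    entry += Pk(x) * math.gamma(i + 1) / math.gamma(i + 1 - alpha) * x ** (i - alpha)
            row.append(entry)
        A.append(row)
    return A
```

With a single term α = 1/2, P_1 ≡ 1 and exact solution u(x) = x² (coefficients a = (0, 0, 1)), the residual A·a − g vanishes at every collocation point.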

Implement ANNs
According to the universal approximation theorem [38], a feedforward neural network with a linear output layer and at least one hidden layer using a "squashing" activation function can, given sufficiently many hidden neurons, approximate any continuous function on a bounded closed subset of real space to any accuracy. A neural network can thus be seen as a "universal" function and used to approximately solve complex FDEs. The ANN used in this work is shown in Figure 1. The architecture consists of multiple layers: an input layer, one or more hidden layers and an output layer. In Figure 1, each neuron in the hidden layers evaluates a function of the linear combination of its inputs with a weight matrix (the model parameters w_i are optimized by the learning algorithm). The output of each neuron is either the output of the ANN (when the neuron lies in the output layer) or the input of another neuron of the ANN. In this paper, the ANN employed to solve the FDE (4) can be represented mathematically by Formula (7).
In the neural structure, the relationship between adjacent layers can be written as
$$ c_l = \sigma(W_l\, c_{l-1} + b_l), \tag{8} $$
where σ is the activation function, W_l and b_l denote the connection weights and bias term of the l-th layer, respectively, and c_l is the output of the l-th hidden layer and the input of the (l+1)-th. Here, we discretize x on the interval [a, b] to obtain the points x_j, j = 1, . . . , s.
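The layer recurrence can be sketched in a few lines of plain Python (an illustrative toy, not the paper's PyTorch implementation; the weights here are supplied by hand rather than learned):

```python
def relu(v):
    # ReLU activation applied elementwise
    return [max(0.0, x) for x in v]

def dense(W, b, c):
    # Affine map of one layer: W @ c + b
    return [sum(wij * cj for wij, cj in zip(row, c)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    # Forward propagation c_l = relu(W_l c_{l-1} + b_l); the final layer is
    # linear, so its outputs can serve directly as the series coefficients a_i.
    c = [x]
    for W, b in layers[:-1]:
        c = relu(dense(W, b, c))
    W, b = layers[-1]
    return dense(W, b, c)
```

For example, a network with one hidden layer of two neurons maps a scalar input to an output vector; because of the ReLU, the two-neuron pair (w = 1 and w = −1) responds symmetrically to inputs of either sign.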
For example, we take s = 10, s = 15 and s = 20, respectively, in the numerical experiments. These points and the corresponding values g(x_j) are the given sample points. When the neural network is employed to solve problem (4) numerically, the error at each collocation point is
$$ E_j = \Big( \sum_{k} \zeta_{k,j} - g(x_j) \Big)^{2}, $$
and the loss function is defined as
$$ L(w, b) = \frac{1}{s} \sum_{j=1}^{s} E_j + \lambda \sum \|w\|_{2}^{2}, \tag{9} $$
where the value of λ is chosen manually; the L2 regularization drives outlier weights close to 0, but not exactly to 0. For simplicity, the symbols ζ_{k,j} can be expressed as
$$ \zeta_{k,j} = P_k(x_j) \sum_{i=\lceil \alpha_k \rceil}^{n} a_i \frac{\Gamma(i+1)}{\Gamma(i+1-\alpha_k)} (x_j - a)^{i-\alpha_k}. $$
The coefficients a_i are produced by the neural network; substituting the discrete points x_j, the left-hand side of Equation (7) can be compared with the true right-hand value g(x_j). The total error is obtained by summing the error functions at all collocation points, yielding the optimization problem arg min_{w,b} L(w, b). In Figure 1, Equation (8) describes a neuron in the hidden layer. The forward propagation of Figure 1 produces the coefficients a_i; substituting a_i into the left-hand side of Equation (7) and the discrete points x_j into the right-hand side gives the discrete values g(x_j), and Equation (9) is then evaluated. When updating the weights and biases, Equation (9) serves as the loss function. In this network we use the Rectified Linear Unit (ReLU) as the activation function σ, because ReLU has the following advantages. First, it makes training faster, because its derivative is easier to compute than those of the sigmoid and tanh functions. Second, it increases the nonlinearity of the network: being nonlinear itself, it allows the network to fit nonlinear mappings. Third, it helps prevent vanishing gradients: when inputs are very large or very small the derivatives of sigmoid and tanh are close to 0, whereas ReLU is an unsaturated activation function and does not suffer from this. Finally, it makes the network sparse, since the negative part is mapped to zero while the positive part is preserved, which helps reduce overfitting.
To minimize the loss function, we choose the Adaptive moment estimation (Adam) algorithm, which is effective in practical applications. It offers faster convergence and more effective learning than other adaptive learning-rate methods, and it corrects issues present in other optimization techniques, such as vanishing learning rates, overly slow convergence, and large fluctuations of the loss function caused by high-variance parameter updates.
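For illustration, the Adam update rule can be sketched in pure Python and applied to a toy instance of the loss above. Here we optimize the series coefficients directly, which is a deliberate simplification of training the full network, and all names are ours rather than the paper's:

```python
import math

def adam_minimize(grad, theta, steps, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: exponential moving averages of the gradient (m) and its square (v),
    # with bias correction, drive per-coordinate adaptive step sizes.
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i in range(len(theta)):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
            theta[i] -= lr * (m[i] / (1 - b1 ** t)) / (math.sqrt(v[i] / (1 - b2 ** t)) + eps)
    return theta

# Toy instance: single-term problem D^{1/2} u = g on (0, 1] whose exact solution
# is u(x) = x**2, i.e. series coefficients a = (0, 0, 1).
alpha = 0.5
xs = [0.1 * j for j in range(1, 11)]

def basis(i, x):
    # Caputo power rule applied to x**i (zero for i < ceil(alpha))
    if i < math.ceil(alpha):
        return 0.0
    return math.gamma(i + 1) / math.gamma(i + 1 - alpha) * x ** (i - alpha)

g = [basis(2, x) for x in xs]  # right-hand side generated from the exact solution

def grad(a):
    # Gradient of the mean-squared collocation residual with respect to each a_i
    r = [sum(a[i] * basis(i, x) for i in range(3)) - gj for x, gj in zip(xs, g)]
    return [2.0 / len(xs) * sum(rj * basis(i, x) for rj, x in zip(r, xs)) for i in range(3)]
```

Running `adam_minimize(grad, [0.0, 0.0, 0.0], steps=5000, lr=0.02)` drives the collocation residual toward zero, recovering a_2 ≈ 1.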

Numerical Simulations
To verify the validity and efficiency of the proposed ANN algorithm, we give three examples in this section. All numerical results in the following examples were obtained with Python 3.9.1. In the three examples, each hidden layer contains 30 neurons. In Example 1, the proposed ANN algorithm uses three layers: one input layer, one output layer and one hidden layer; in Examples 2 and 3, we use five layers, including three hidden layers. In addition, when the right-hand side of the equation is simple and the order of the power-series polynomial is relatively small, we adopt a learning rate of 10^−3; when the order is 9, we usually choose 5 × 10^−4, and in general the learning rate is adjusted between 10^−3 and 5 × 10^−4. The initial output-layer connection weights w_i (for i = 1, . . . , n) are initialized with small random values drawn from the interval (0, 1) to start the procedure. In the numerical experiments, to facilitate comparison, we compute the following mean square error
$$ \mathrm{MSE} = \frac{1}{m} \sum_{j=1}^{m} \big( u(x_j) - u_n(x_j) \big)^{2}, $$
where m denotes the number of discrete samples in the domain [0, 1], u is the exact solution and u_n the approximate series solution.
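The error measure can be computed as follows (a minimal sketch; `u_exact` and `u_approx` are placeholder callables, not names from the paper's code):

```python
def mean_square_error(u_exact, u_approx, xs):
    # Mean square error between the exact solution and the approximate
    # series solution at the m = len(xs) sample points in [0, 1]
    return sum((u_exact(x) - u_approx(x)) ** 2 for x in xs) / len(xs)
```

For instance, an approximation offset from the exact solution by a constant 0.1 yields an MSE of 0.01.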

Example 1
We consider solving the Bagley-Torvik equation with initial condition as in [39]:
Table 1 shows that, as the number of iterations increases, the error can at best reach 10^−6. Compared with the data in Table 2, the error of the ANN method in Table 2 reaches at most 10^−5. For example, when n = 9 and s = 20, the accuracy of the ANN method obtained in this paper reaches 10^−6, while the method in Table 2 only reaches 10^−4. This shows that the ANN method in this paper converges better during iteration. From Figure 2, we observe that when the number of iteration steps reaches 25, the value of the loss function drops sharply and reaches relatively good accuracy. In Figure 3, the curve of the exact solution almost coincides with the curves of the approximate solutions, which verifies the effectiveness of the proposed ANN algorithm; Figure 4 supports the same conclusion. In addition, the total errors over different orders of the power-series polynomial are plotted in Figure 5 for 2000 iteration steps.

Example 2
Consider the following Bagley-Torvik equation with initial condition:
Test Example 2 shows trends quite similar to those of Example 1. Table 3 shows that the mean square error tends to decrease as the order n of the power-series polynomial increases. For example, when n = 3 and s = 20 the error only reaches about 10^−3, while for n = 7 with s = 20 it reaches 10^−6. The total errors over different orders of the power-series polynomial are plotted in Figure 6 for 2000 iteration steps. Figure 7 shows that after about 125 iteration steps the value of the loss function has essentially dropped to its lowest point, indicating that the convergence speed of our method is quite fast. As shown in Figure 8, our approximate solution is very close to the exact solution, and Figure 9 supports this conclusion.

Example 3
Consider the following Bagley-Torvik equation with initial condition u(0) = 1:
As in the previous examples, Table 4 shows essentially the same behavior. The values of the loss function for different numbers of collocation points are plotted in Figure 10. Moreover, the relationship between the exact solution and the approximate solutions is presented in Figures 11-13.

Conclusions
In this work, we have shown that the behavior of fractional differential equations can be approximated by an ANN algorithm when the derivative is of fractional order. The advantage of the proposed ANN algorithm lies in its feasibility: the neural network produces a satisfactory approximate solution with a mesh-free discretization. Furthermore, the effectiveness and applicability of the algorithm are verified by solving the multi-term FDEs above.
Remark: The neural network structure was built by the authors using the PyTorch library. We have uploaded the main source code to GitHub: https://github.com/hangzhounormal/sfdeNN (accessed on 12 February 2022). All computations were carried out in Python 3.