Article

Neural Fractional Differential Equations: Optimising the Order of the Fractional Derivative

by Cecília Coelho 1,*, M. Fernanda P. Costa 1 and Luís L. Ferrás 1,2
1 Centre of Mathematics (CMAT), University of Minho, 4710-057 Braga, Portugal
2 CEFT—Centro de Estudos de Fenómenos de Transporte, Department of Mechanical Engineering (Section of Mathematics), FEUP, University of Porto, 4200-465 Porto, Portugal
* Author to whom correspondence should be addressed.
Fractal Fract. 2024, 8(9), 529; https://doi.org/10.3390/fractalfract8090529
Submission received: 30 July 2024 / Revised: 4 September 2024 / Accepted: 6 September 2024 / Published: 10 September 2024

Abstract:
Neural Fractional Differential Equations (Neural FDEs) represent a neural network architecture specifically designed to fit the solution of a fractional differential equation to given data. This architecture combines an analytical component, represented by a fractional derivative, with a neural network component, forming an initial value problem. During the learning process, both the order of the derivative and the parameters of the neural network must be optimised. In this work, we investigate the non-uniqueness of the optimal order of the derivative and its interaction with the neural network component. Based on our findings, we perform a numerical analysis to examine how different initialisations and values of the order of the derivative (in the optimisation process) impact its final optimal value. Results show that the neural network on the right-hand side of the Neural FDE is able to adjust its parameters to fit the FDE to the data dynamics for any given order of the fractional derivative. Consequently, Neural FDEs do not require a unique α value; instead, they can use a wide range of α values to fit data. This flexibility is beneficial when fitting to given data is required and the underlying physics is not known.

1. Introduction

Real-world systems are often modelled using integral/differential equations, which are then numerically solved to predict the system behaviour and evolution. This process can be time-consuming, as numerical simulations sometimes take months, and finding the correct model parameters is often challenging. However, with significant advancements in Neural Networks (NNs) that can learn patterns, real-world systems are increasingly being modelled using a combination of integral/differential models and NNs or even NNs alone [1,2,3,4].
Neural Ordinary Differential Equations (Neural ODEs) were introduced in 2018 [5] (see also [6,7]) as a continuous version of the discrete Residual Neural Networks and claimed to offer a continuous modelling solution for real-world systems that incorporate time-dependence, mimicking the dynamics of a system using only discrete data. Once trained, the Neural ODEs result in a hybrid ODE (part analytical, part NN-based) that can be used for making predictions by numerically solving the resulting ODEs. The numerical solution of these hybrid ODEs is significantly simpler and less time-consuming compared to the numerical solution of complex governing equations, making Neural ODEs an excellent choice for modelling time-dependent, real-world systems [8,9,10]. However, the simplicity of ODEs sometimes limits their effectiveness in capturing complex behaviours characterised by intricate dynamics, non-linear interactions, and memory. To address this, Neural Fractional Differential Equations (Neural FDEs) were recently proposed [11,12].
Neural FDEs, as described by Equation (1), are an NN architecture designed to fit the solution $h(t)$ to given data $\{x_0, x_1, \ldots, x_N\}$ (for example, experimental data) over a specified time range $[t_0, T]$. The Neural FDE combines an analytical part, ${}^{C}_{0}D^{\alpha}_{t} h(t)$, with an NN-based part, $f_\theta(t, h(t))$, leading to the initial value problem
$$ {}^{C}_{0}D^{\alpha}_{t} h(t) = f_\theta(t, h(t)), \qquad h(t_0) = x_0, \qquad t \in [t_0, T]. \tag{1} $$
Here, ${}^{C}_{0}D^{\alpha}_{t} g(t)$ denotes the Caputo fractional derivative [13,14], defined for $0 < \alpha < 1$ (and considering a generic scalar function $g(t)$) as
$$ {}^{C}_{0}D^{\alpha}_{t} g(t) = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} (t-s)^{-\alpha}\, g'(s)\, ds, \tag{2} $$
where Γ is the Gamma function.
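For example, for $g(t) = t$ and $0 < \alpha < 1$, definition (2) gives
$$ {}^{C}_{0}D^{\alpha}_{t}\, t = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} (t-s)^{-\alpha}\, ds = \frac{t^{1-\alpha}}{(1-\alpha)\,\Gamma(1-\alpha)} = \frac{t^{1-\alpha}}{\Gamma(2-\alpha)}, $$
which recovers the classical derivative $\tfrac{d}{dt}\, t = 1$ as $\alpha \to 1$.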
In this study, we focus on the Caputo fractional derivative, although several other definitions of fractional derivative exist in the literature, such as the Riemann–Liouville definition,
$$ {}^{RL}_{\;\;0}D^{\alpha}_{t} g(t) = \frac{1}{\Gamma(1-\alpha)} \frac{d}{dt}\int_{0}^{t} (t-s)^{-\alpha}\, g(s)\, ds. \tag{3} $$
Unlike the Riemann–Liouville derivative, solving differential equations with the Caputo definition does not require specifying fractional order initial conditions. Moreover, if g ( t ) is constant, its Caputo derivative is zero, while the Riemann–Liouville fractional derivative is not. These two definitions are closely related, and under certain continuity conditions on g ( t ) , it can be shown (see [15], Lemma 2.12) that
$$ {}^{RL}_{\;\;0}D^{\alpha}_{t} g(t) = {}^{C}_{0}D^{\alpha}_{t} g(t) + g(0)\,\frac{t^{-\alpha}}{\Gamma(1-\alpha)}. \tag{4} $$
Therefore, if g ( 0 ) = 0 , the two definitions are equivalent.
An important feature of Neural FDEs is their ability to learn not only the optimal parameters θ of the NN $f_\theta(t, h(t))$ but also the order of the derivative α (when $\alpha = 1$, we obtain a Neural ODE). This is achieved using only information from the time-series dataset $\{x_0, x_1, \ldots, x_N\}$, where each $x_i = (x_{i1}, x_{i2}, \ldots, x_{id}) \in \mathbb{R}^d$, $i = 0, \ldots, N$, is associated with a time instant $t_i$.
In [12], the α value is learnt by another NN $\alpha_\phi$ with parameters ϕ. Therefore, if $\mathcal{L}(\theta, \phi)$ represents the loss function, we can train the Neural FDE by solving the minimisation problem (5). The parameters θ and ϕ are optimised by minimising the error between the predicted values $\hat{h}(t_i)$, $i = 1, \ldots, N$, and the ground-truth values $\{x_0, x_1, \ldots, x_N\}$ (in this work, we consider the solution outputted by the solver to be the predicted output, although this might not always be the case, e.g., in image classification):
$$ \underset{\theta \in \mathbb{R}^{n_\theta},\; \phi \in \mathbb{R}^{n_\phi}}{\text{minimise}} \quad \mathcal{L}(\theta, \phi) = \frac{1}{N}\sum_{i=1}^{N} \left\| \hat{h}(t_i) - x_i \right\|_2^2 \quad \text{subject to} \quad \hat{h}(t_i) = \text{FDESolve}\big(\alpha_\phi, f_\theta, x_0, \{t_0, t_1, \ldots, t_N\}\big). \tag{5} $$
The popular Mean Squared Error (MSE) loss function was considered in [12] and also in this work. Here, $\text{FDESolve}(\alpha_\phi, f_\theta, x_0, \{t_0, t_1, \ldots, t_N\})$ refers to any numerical solver used to obtain the numerical solution $\hat{h}(t_i)$ at each instant $t_i$.
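To make problem (5) concrete, the following sketch trains a Neural FDE on a toy dataset, assuming PyTorch. It is not the implementation of [12]: $f_\theta$ is a small MLP, the order α is a single learnable parameter passed through a sigmoid (the formulation in Section 2 also allows a learnable parameter in place of the NN $\alpha_\phi$), FDESolve is a basic explicit fractional-Euler discretisation of the equivalent Volterra integral equation (one possible solver choice, kept differentiable so that gradients reach both θ and α), and the target trajectory is a hypothetical placeholder for $\{x_0, \ldots, x_N\}$.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

f_theta = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))  # right-hand side NN
raw_alpha = nn.Parameter(torch.tensor(0.0))   # sigmoid(0.0) = 0.5, i.e. alpha initialised at 0.5

def fde_solve(alpha, f, h0, ts):
    """Explicit fractional-Euler scheme for C_0 D_t^alpha h = f(h), h(t_0) = h0."""
    dt = ts[1] - ts[0]
    gamma = torch.exp(torch.lgamma(alpha + 1.0))
    hs, fs = [h0], []
    for n in range(1, len(ts)):
        fs.append(f(hs[-1]))                                    # f evaluated at previous states
        p = torch.arange(1, n + 1, dtype=ts.dtype).pow(alpha)   # [1^a, 2^a, ..., n^a]
        c = p - torch.cat([torch.zeros(1, dtype=ts.dtype), p[:-1]])  # c_m = m^a - (m-1)^a
        mem = torch.dot(c.flip(0), torch.stack(fs))             # memory term: sum_j c_{n-j} f_j
        hs.append(h0 + dt.pow(alpha) / gamma * mem)
    return torch.stack(hs)

# Hypothetical target trajectory standing in for the data {x_0, ..., x_N}.
ts = torch.linspace(0.0, 10.0, 101)
x = 1.0 - torch.exp(-0.5 * ts)

opt = torch.optim.Adam(list(f_theta.parameters()) + [raw_alpha], lr=1e-3)
for it in range(200):
    alpha = torch.sigmoid(raw_alpha)             # keeps the learnable order in (0, 1)
    h_hat = fde_solve(alpha, lambda h: f_theta(h.reshape(1, 1)).reshape(()), x[0], ts)
    loss = torch.mean((h_hat - x) ** 2)          # MSE loss of problem (5)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"learnt alpha = {torch.sigmoid(raw_alpha).item():.4f}, final loss = {loss.item():.2e}")
```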
Since Neural FDEs are a recent research topic, there are no studies on the uniqueness of the parameter α and its interaction with the NN f θ ( t , h ( t ) ) . In [12], the authors provided the values of α learnt by Neural FDEs for each dataset; however, a closer examination reveals that these values differ significantly from the ground-truth values, which were derived from synthetic datasets. The authors attributed this discrepancy to the approximation capabilities of NNs, meaning that, during training, f θ ( t , h ( t ) ) adapts to any given α (this is a complex interaction since, in [12], α is also learnt by another NN). Additionally, α must be initialised in the optimisation procedure, and yet, no studies have investigated how the initialisation of α affects the learned optimal α and the overall performance of Neural FDEs.
In this work, we address these key open questions about the order of the fractional derivative α in Neural FDEs. We show that Neural FDEs are capable of modelling data dynamics effectively, even when the learnt value of α deviates significantly from the true value. Furthermore, we perform a numerical analysis to investigate how the initialisation of α affects the performance of Neural FDEs.
This paper is organised as follows: In Section 2, we provide a brief overview of FDEs and Neural FDEs, highlighting the theoretical results regarding the existence and uniqueness of solutions. We also discuss how the solution depends on the given data. Section 3 presents a series of numerical experiments on the non-uniqueness of the learnt α values. The paper ends with the discussion and conclusions in Section 4.

2. Neural Networks and Theory of Fractional Initial Value Problems

As shown in the introduction, a Neural FDE is composed of two NNs:
  • An NN with parameters θ , denoted as f θ , that models the right-hand side of an FDE,
    $$ {}^{C}_{0}D^{\alpha}_{t} h(t) = f_\theta(t, h(t)), \tag{6} $$
    where h ( t ) is the state of the system at time step t;
  • An NN with parameters ϕ (or a learnable parameter), referred to as α ϕ , that models α ,
    $$ \alpha = \alpha_\phi. \tag{7} $$
As shown in Figure 1, in this work we constructed $f_\theta$ with 3 layers: an input layer with 1 neuron and a hyperbolic tangent (tanh) activation function; a hidden layer with 64 neurons and tanh; and an output layer with 1 neuron. For learning α, we considered an NN $\alpha_\phi$ with 3 layers: an input layer with 1 neuron and a hyperbolic tangent (tanh) activation function; a hidden layer with 32 neurons and tanh; and an output layer with 1 neuron and a sigmoid activation function (which helps keep the value of α within the interval (0, 1)). For ease of understanding, we consider $\hat{h}(t)$ to be a scalar in Figure 2 and in Equations (8) and (9). We use $\hat{h}(t)$ instead of $h(t)$ because it is assumed that $h(t)$ is being evaluated during the training of the Neural FDE, where a numerical solution is needed ($\hat{h}(t)$ is a numerical approximation of $h(t)$ [12]). After training, we obtain the final/optimal Neural FDE model and again use the notation $f_\theta(t, h(t))$.
In the NN $f_\theta(t, \hat{h}(t))$ (Figure 1), the values $w_i$ and $w_i^o$ for $i = 1, \ldots, 64$ are the weights of the hidden and output layers, respectively, and b is the bias, with $\theta = \{w_1, \ldots, w_{64}, w_1^o, \ldots, w_{64}^o, b\}$, where 64 is an arbitrary number. The output of the NN $f_\theta(t, \hat{h}(t))$ can be written as
$$ w_1^o \tanh\big(w_1 \hat{h}(t) + b\big) + \cdots + w_{64}^o \tanh\big(w_{64} \hat{h}(t) + b\big) = \sum_{i=1}^{64} w_i^o \tanh\big(w_i \hat{h}(t) + b\big). \tag{8} $$
An important feature is that there is no activation function in the output layer of $f_\theta(t, \hat{h}(t))$, allowing the NN to approximate any function $f(t, h(t))$. If we opt to use, for example, a tanh in the output layer, it would constrain the fractional derivative ${}^{C}_{0}D^{\alpha}_{t} h(t)$ to vary only from −1 to 1, thus limiting the fitting capabilities of the Neural FDE. This limitation can be mitigated by normalising the given data $\{x_0, x_1, \ldots, x_N\}$.
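A sketch of these two networks, assuming PyTorch, is given below. The layer sizes follow the description above (1-64-1 with tanh for $f_\theta$, 1-32-1 with a final sigmoid for $\alpha_\phi$), but standard per-neuron biases are used instead of the single shared bias b of Equation (8), and feeding the initial guess $\alpha_{in}$ as the scalar input of $\alpha_\phi$ is an assumption.

```python
import torch
import torch.nn as nn

# f_theta: models the right-hand side of the FDE, Equation (6)
f_theta = nn.Sequential(
    nn.Linear(1, 64), nn.Tanh(),    # hidden layer: 64 neurons, tanh
    nn.Linear(64, 1),               # output layer: 1 neuron, no activation
)

# alpha_phi: models the order of the derivative, Equation (7)
alpha_phi = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),    # hidden layer: 32 neurons, tanh
    nn.Linear(32, 1), nn.Sigmoid(), # sigmoid keeps alpha inside (0, 1)
)

h_hat = torch.tensor([[0.1]])       # example state value
alpha_in = torch.tensor([[0.5]])    # hypothetical scalar input (initialisation of alpha)
print(f_theta(h_hat).item(), alpha_phi(alpha_in).item())
```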
Remark 1.
In Figure 1 and Equation (8), we observe h as a function of t. However, the NN depicted on the left side of Figure 1 does not use t as an input. Instead, the NN is called at each iteration of the numerical solver that addresses a discretised version of Equation (1). This solver defines all time instants through a mesh over the interval [ t 0 , T ] [12], and consequently, each evaluation of f θ is always associated with a specific time instant.
Since, for a differentiable function $g(h)$ with continuous derivative on $[h_1, h_2]$, we have (using the mean value theorem)
$$ |g(h_1) - g(h_2)| \le \max_{h \in [h_1, h_2]} |g'(h)|\; |h_1 - h_2|, $$
and $|\tanh'(h)| = |1 - \tanh^2(h)| \le 1$ (because $-1 \le \tanh(h) \le 1$), we can say that tanh is 1-Lipschitz, that is,
$$ |\tanh(h_1) - \tanh(h_2)| \le 1 \cdot |h_1 - h_2|. $$
Define g ( h ) as
$$ g(h) = \sum_{i=1}^{64} w_i^o \tanh(w_i h + b), \tag{9} $$
which is a weighted sum of 64 tanh functions. Since each term $h \mapsto \tanh(w_i h + b)$ is $|w_i|$-Lipschitz, $g(h)$ is L-Lipschitz with $L \le 64\, w_{\max}$, where $w_{\max} = \max\{|w_1^o w_1|, \ldots, |w_{64}^o w_{64}|\}$. This property is important to guarantee the uniqueness of the solution of (1).
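Explicitly, combining the triangle inequality with the 1-Lipschitz property of tanh gives
$$ |g(h_1) - g(h_2)| \le \sum_{i=1}^{64} |w_i^o| \left| \tanh(w_i h_1 + b) - \tanh(w_i h_2 + b) \right| \le \sum_{i=1}^{64} |w_i^o|\, |w_i|\, |h_1 - h_2| \le 64\, \max_i |w_i^o w_i|\; |h_1 - h_2|. $$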

2.1. Fractional Differential Equations

We now provide some theoretical results that are fundamental in understanding the expected behaviour of the Neural FDE model. Consider the following fractional initial value problem:
$$ {}^{C}_{0}D^{\alpha}_{t} z(t) = f(t, z(t)), \qquad z(t_0) = z_0. \tag{10} $$
For ease of understanding, we restrict the analyses to the case where z ( t ) is a scalar function and 0 < α < 1 .

2.1.1. Existence and Uniqueness

The following Theorem [14,15] provides information on the existence of a continuous solution to problem (10):
Theorem 1.
Let $0 < \alpha$, $x_0 \in \mathbb{R}$, $Q > 0$, and $T^* > 0$. Define $G := [0, T^*] \times [x_0 - Q,\, x_0 + Q]$, and let the function $f : G \to \mathbb{R}$ be continuous. Furthermore, we define $M := \sup_{(x,y) \in G} |f(x,y)|$ and
$$ T := \begin{cases} T^*, & \text{if } M = 0, \\[4pt] \min\!\left\{ T^*,\; \left( \dfrac{Q\,\Gamma(\alpha+1)}{M} \right)^{1/\alpha} \right\}, & \text{otherwise.} \end{cases} $$
Then, there exists a function $z(t) \in C^0[0, T]$ solving the initial value problem (10).
Note that this continuous solution may be defined in a smaller interval $[0, T]$ compared to the interval $[0, T^*]$ where the function $f(t, z(t))$ is defined ($T \le T^*$). From Theorem 1, we can infer that high values of $|f_\theta(t, h(t))|$ (see Equation (1)) decrease the interval within which we can guarantee a continuous solution. However, these conclusions should be approached with caution. As shown later, we only access discrete values of the function $f_\theta(t, h(t))$ in the numerical solution of (1). Note that α also affects the size of the interval. Its contribution can be either positive or negative, depending on the values of Q and M.
The following Theorem establishes the conditions for which we can guarantee the uniqueness of the solution z ( t ) [14,15]:
Theorem 2.
Let $0 < \alpha$, $x_0 \in \mathbb{R}$, $Q > 0$, and $T^* > 0$. Define the set $G = [0, T^*] \times [x_0 - Q,\, x_0 + Q]$, and let the function $f : G \to \mathbb{R}$ be continuous and satisfy a Lipschitz condition with respect to the second variable:
$$ |f(t, z_1) - f(t, z_2)| \le L\, |z_1 - z_2|, $$
where $L > 0$ is a constant independent of t, $z_1$, and $z_2$. Then, there exists a uniquely defined function $z \in C^0[0, T]$ solving the initial value problem (10).
As shown above, the function on the right-hand side of the Neural FDE model is Lipschitz (see Equation (9)). Therefore, we can conclude that the solution to Equation (1) is unique.

2.1.2. Analysing the Behaviour of Solutions with Perturbed Data

Other results of interest for Neural FDEs pertain to the dependencies of the solution on f ( t , z ( t ) ) and α . In Neural FDEs, both f ( t , z ( t ) ) and α are substituted by NNs that vary with each iteration of the Neural FDE training [12].
Let u ( t ) be the solution of the initial value problem
$$ {}^{C}_{0}D^{\alpha}_{t} u(t) = \tilde{f}(t, u(t)), \qquad u(t_0) = z_0, \tag{11} $$
where $\tilde{f}(t, u(t))$ is a perturbed version of f, which satisfies the same hypotheses as f.
Theorem 3.
Let
$$ \varepsilon := \max_{(x_1, x_2) \in G} \left| f(x_1, x_2) - \tilde{f}(x_1, x_2) \right|. $$
 If ε is sufficiently small, there exists some T > 0 such that both functions z (Equation (10)) and u (Equation (11)) are defined on [ 0 , T ] , and we have
$$ \sup_{0 \le t \le T} |z(t) - u(t)| = O\!\left( \max_{(x_1, x_2) \in G} \left| f(x_1, x_2) - \tilde{f}(x_1, x_2) \right| \right), $$
 where z ( t ) is the solution of (10).
This Theorem provides insight into how the solution of (1) changes in response to variations in the NN f θ ( t , h ( t ) ) . While the variations in both the solution and the function are of the same order for small changes of the function, it is crucial to carefully interpret these results given the NN f θ ( t , h ( t ) ) as defined by Equation (9).
When training the Neural FDE, one must solve the optimisation problem (5), where the weights and biases are adjusted until an optimal Neural FDE model is obtained (training 1). If a second, independent training (training 2) of the Neural FDE is conducted with the same stopping criterion for the optimisation process, a new Neural FDE model with different weights and biases may be obtained.
The NN learns model parameters based on a set of ordered data, meaning the number of elements in the set significantly influences the difference between z ( t ) and u ( t ) , as in Theorem 3. This effect is illustrated in Figure 2, where a training dataset of only two observations can be fitted by two distinct functions.
Therefore, Figure 2 tells us that when modelling with Neural FDEs, it is important to have some prior knowledge of the underlying physics of the data. This is crucial because the number of data points available for the training process may be beyond our control. For instance, the data can originate from real experiments where obtaining results is challenging.
Regarding the influence of the order of the derivative on the solution, we have the following [14,15]:
Theorem 4.
Let u ( t ) be the solution of the initial value problem
$$ {}^{C}_{0}D^{\tilde{\alpha}}_{t} u(t) = f(t, u(t)), \qquad u(t_0) = z_0, $$
where $\tilde{\alpha}$ is a perturbed α value. Let $\varepsilon := |\tilde{\alpha} - \alpha|$. Then, if ε is sufficiently small, there exists some $T > 0$ such that both the functions u and z are defined on $[0, T]$, and we have that
$$ \sup_{0 \le t \le T} |u(t) - z(t)| = O\big( |\tilde{\alpha} - \alpha| \big), $$
 where z ( t ) is the solution of (10).
Once again, for small changes in α, the variations in both the solution and the order of the derivative are of the same order. This property is explored numerically and in more detail later in this work when solving (5). It is important to note that the NN $f_\theta(t, h(t))$ is not fixed in our problem (1) (its structure is fixed, but the weights and bias change along the training optimisation procedure), whereas Theorem 4 assumes that the function $f(t, u(t))$ is fixed. Therefore, Theorem 4 gives us an idea of the changes in our solution but does not allow a full understanding of its variation along the optimisation procedure.

2.1.3. Smoothness of the Solution

For the classical case, $\alpha = 1$ in Equation (10), we have (under some hypotheses on the interval $[a, b]$) that if $f \in C^{k-1}([a, b])$, then $z(t)$ is k times differentiable.
For the fractional case, even if $f \in C^{\infty}$, it may happen that $z(t) \notin C^1$. This means that the solutions may not behave well, and solutions with singular derivatives are quite common. See [16] for more results on the smoothness properties of solutions to fractional initial value problems.
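For instance, rewriting (10) as the equivalent Volterra integral equation $z(t) = z_0 + \frac{1}{\Gamma(\alpha)} \int_0^t (t-s)^{\alpha-1} f(s, z(s))\, ds$ shows that, near the origin, the solution typically behaves like
$$ z(t) = z_0 + \frac{f(0, z_0)}{\Gamma(\alpha+1)}\, t^{\alpha} + o(t^{\alpha}), \qquad \text{so} \qquad z'(t) \sim \frac{f(0, z_0)}{\Gamma(\alpha)}\, t^{\alpha - 1} \to \infty \quad \text{as } t \to 0^{+}, $$
whenever $f(0, z_0) \neq 0$ and $0 < \alpha < 1$; that is, $z \notin C^1[0, T]$ even for smooth f.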
These smoothness properties (or lack of smoothness) make it difficult for numerical methods to provide fast and accurate solutions for (1), thus making Neural FDEs more difficult to handle compared to Neural ODEs. It should be highlighted that during the learning process, the NN f θ ( t , h ( t ) ) always adjusts the Neural FDE model to the data, independent of the amount of error obtained in the numerical solution of the FDE.

3. Neural Fractional Differential Equations and the Learnt Order of the Derivative— α

In Neural FDEs, the goal is to model systems where the order of the derivative, α, is a parameter that needs to be learnt along with the parameters of an NN $f_\theta(t, h(t))$. A key challenge in this approach is the potential variability and non-uniqueness of the learnt α values, especially when the ground-truth α used to generate synthetic datasets is known a priori. This variability arises from the highly flexible nature of NNs, which can adjust their parameters to fit the data well regardless of the specific value of α.

3.1. Numerical Experiments

The results obtained in [12] suggest the existence of parameters $\alpha_1, \alpha_2, \ldots, \alpha_n$ and corresponding parameter vectors $\theta_1, \theta_2, \ldots, \theta_n$ that satisfy the following condition:
$$ \mathcal{L}(\theta_i, \alpha_i) \approx 0 \quad \text{for } i = 1, 2, \ldots, n. $$
In theory, this implies that, for each i, the loss function L can converge to zero.
There are some results in the literature on the universal approximation capabilities of Neural ODEs and ResNets. Interested readers should consult [17,18] for more information.
Based on these observations, we now conduct numerical experiments to observe the practical outcomes in detail. Specifically, we employ a Neural FDE to model population dynamics with different ground-truth α values. Our goal is to examine how different initialisations of α impact the final learnt α values.
We analyse the α values learnt across multiple runs and observe how α evolves during training. Additionally, we fix the α value and allow the Neural FDE to learn only the right-hand side of the FDE, comparing the loss for different α values.
Consider a population of organisms that follows a fractional-order logistic growth. The population size P ( t ) at time t is governed by the following FDE of order α :
$$ {}^{C}_{0}D^{\alpha}_{t} P(t) = r\, P(t)\left(1 - \frac{P(t)}{K}\right), \tag{14} $$
with initial condition $P(0) = P(t_0) = 100$, growth rate $r = 0.1$, and carrying capacity $K = 1000$.
To conduct the experiments, three training datasets were generated by numerically solving (14) over the interval $t \in (0, 10)$ with a step size of 0.1 for three different values of α, namely α = 0.3, 0.5, and 0.99, chosen arbitrarily. These datasets are denoted as $P_{\alpha=0.3}$, $P_{\alpha=0.5}$, and $P_{\alpha=0.99}$, respectively. For the experiments, the Adam optimiser [19] was used with a starting learning rate of 1 × 10⁻³.
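A minimal sketch of how such datasets can be generated is given below, assuming NumPy and an explicit fractional-Euler (rectangle-rule) discretisation of the Volterra form of (14); the solver actually used by the authors is not specified, so this is only one possible choice, and the helper name fractional_logistic is ours.

```python
import numpy as np
from math import gamma

def fractional_logistic(alpha, r=0.1, K=1000.0, P0=100.0, t_end=10.0, dt=0.1):
    """Solve C_0 D_t^alpha P(t) = r P (1 - P/K) on (0, t_end] with step dt."""
    n_steps = int(round(t_end / dt))
    P = np.empty(n_steps + 1)
    P[0] = P0
    f = lambda p: r * p * (1.0 - p / K)
    for n in range(1, n_steps + 1):
        m = np.arange(1, n + 1)
        c = m**alpha - (m - 1)**alpha           # memory weights c_m = m^a - (m-1)^a
        mem = np.dot(c[::-1], f(P[:n]))         # sum_j c_{n-j} f(P_j), j = 0..n-1
        P[n] = P0 + dt**alpha / gamma(alpha + 1.0) * mem
    return P

t = np.linspace(0.0, 10.0, 101)                                     # time grid, step 0.1
datasets = {a: fractional_logistic(a) for a in (0.3, 0.5, 0.99)}    # P_{alpha=0.3}, etc.
print({a: round(d[-1], 2) for a, d in datasets.items()})            # final population sizes
```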

3.1.1. Case Study 1: Varying α and Fixed Number of Iterations

The generated datasets were used to train the Neural FDE model (see Equation (1)) with four different initialisations of the derivative order: α = 0.1 , 0.3 , 0.5 , and 0.99 . For each initialisation, three independent runs were conducted considering 200 iterations.
The results obtained for each dataset ($P_{\alpha=0.3}$, $P_{\alpha=0.5}$, and $P_{\alpha=0.99}$) using the various α initialisation values are presented in Table 1, Table 2 and Table 3. The evolution of the training losses can be observed in Figure 3, Figure A1 and Figure A2, and the evolution of the α values in Figure 4, Figure A3 and Figure A4.
The results presented in Table 1, Table 2 and Table 3 indicate that the initial value of α does not significantly impact the final learnt value of α . While there is some effect, it is challenging to discern a clear pattern, and the observed variations fall within a certain range of randomness. However, the ground-truth value of α in the datasets influences the learnt value, with higher ground-truth values leading to higher learnt values. For example, in Table 3, all learnt α values are above 0.50 , whereas for other datasets, the learnt α values are lower. This is expected since the Neural FDE model should fit the given data (that depends on α ).
Furthermore, the results demonstrate the approximation capabilities of NNs, which allow them to model the data dynamics with low errors even when the learnt α is far from the ground-truth value. For example, in Table 1, the ground-truth α is 0.3 , but the lowest training loss was achieved with a learnt α of 0.4497 . Similarly, in Table 3, the ground-truth α is 0.99 , but the lowest training loss values were achieved with learnt α of 0.518 and 0.5876 . It is interesting to note that the values of α are significantly different from 1. The results obtained suggest that the optimisation procedure may be placing greater importance on the NN f θ ( t , h ( t ) ) .
The evolution of the loss and of the learnt α values along training (Figure 3 and Figure 4, respectively, for the case $P_{\alpha=0.3}$) indicates that the α initialisation and the ground-truth α do not significantly influence the Neural FDE training process. In Figure 4, a pattern is observed: initially, the α values increase drastically and then slowly decrease, approaching a plateau. In some cases, after around 100 iterations, the values begin to increase again. This behaviour results from the complex interplay between the two different NNs. The behaviour is even more irregular for the cases $P_{\alpha=0.5}$ and $P_{\alpha=0.99}$, as shown in Appendix A.

3.1.2. Case Study 2: Fixed α and Fixed Number of Iterations

To demonstrate the effectiveness of Neural FDEs in minimising the loss function L ( θ , α ) for various α values, we performed experiments with fixed α values of 0.1 , 0.3 , 0.5 , and 0.99 . In these experiments, the Neural FDEs used the NN f θ exclusively to fit the solution h ( t ) to the three distinct datasets P α = 0.3 , P α = 0.5 , P α = 0.99 . The experimental conditions remained the same, except that α was not learnable. Note that the stopping criterion was set to a fixed number of iterations, specifically 200.
The final training losses obtained for three runs for each fixed α are summarised in Table 4, Table 5 and Table 6. The evolution of the training losses can be visualised in Figure 5, Figure 6 and Figure 7.
The results in Table 4, Table 5 and Table 6 show that the final training loss values are generally similar for different values of α . Even when the fixed α matches the ground-truth α of the dataset, in general, the final loss values are comparable to other fixed α values.
The evolution of the training losses in Figure 5, Figure 6 and Figure 7 shows similar behaviour for the different fixed α values.

3.1.3. Case Study 3: Fixed α and Fixed Loss Threshold

As a final experiment, we aimed to numerically demonstrate that Neural FDEs are capable of achieving the same fit to the data, independent of the order α. To show this unequivocally, we modified the stopping criterion of our Neural FDE training from a maximum number of iterations to a loss threshold of 1 × 10⁻⁵. This approach ensures that the Neural FDEs are trained until the threshold is achieved, allowing us to demonstrate that they can reach the same loss values regardless of the α value.
In this experiment, we performed one run for each dataset P α = 0.3 , P α = 0.5 , and P α = 0.99 with fixed α values of 0.1, 0.3, 0.5, and 0.99. The results are organised in Table 7, Table 8 and Table 9.
The results presented in Table 7, Table 8 and Table 9 show that the NN f θ is capable of optimising the parameters θ to accurately model the Neural FDE to the data dynamics, regardless of whether the imposed value of α is close to or far from the ground truth.
These results may lead some to believe that changing α does not impact memory. They may also suggest that the Neural FDE can model systems with varying memory levels independent of α . However, this only happens because the NN f θ ( t , h ( t ) ) , which models the FDE’s right-hand side, adjusts its parameters θ to fit the FDE for any α value (effectively mimicking the memory effects associated with α ). Therefore, Neural FDEs do not need a unique α value for each dataset. Instead, they can work with an infinite number of α values to fit the data. This flexibility is beneficial when fitting to given data is required and the underlying physics is not known.

4. Conclusions

In this work, we present the theory of fractional initial value problems and explore its connection with Neural Fractional Differential Equations (Neural FDEs). We analyse both theoretically and numerically how the solution of a Neural FDE is influenced by two factors: the NN f θ , which models the right-hand side of the FDE, and the NN that learns the order of the derivative.
We also investigate the numerical evolution of the order of the derivative ( α ) and training loss across several iterations, considering different initialisations of the α value. For this experiment, with a fixed number of iterations at 200, we created three synthetic datasets for a fractional initial value problem with ground-truth α values of 0.3, 0.5, and 0.99. We tested four different initialisations for α : 0.1, 0.3, 0.5, and 0.99. The results indicate that both the initial α value and the ground-truth α have minimal impact on the Neural FDE training process. Initially, the α values increase sharply and then slowly decrease towards a plateau. In some cases, around 100 iterations, the values begin to rise again. This behaviour results from the complex interaction between the two NNs, and it is particularly irregular for α = 0.5 and α = 0.99 . The loss values achieved are low across all cases.
We then repeated the experiments, fixing the α value, meaning there was no initialisation of α , and the only parameters changing in the minimisation problem were those of the NN f θ ( t , h ( t ) ) . The results confirm that the final training loss values are generally similar across different fixed values of α . Even when the fixed α matches the ground-truth α of the dataset, the final loss remains comparable to other fixed α values.
In a final experiment, we modified the stopping criterion of our Neural FDE training from a maximum number of iterations to a loss threshold. This ensures that the Neural FDEs are trained until the loss threshold is achieved, demonstrating that they can reach similar loss values regardless of the α value. We conclude that $f_\theta$ is able to adjust its parameters θ to fit the FDE to the data for any given derivative order. Consequently, Neural FDEs do not require a unique α value for each dataset. Instead, they can use a wide range of α values to fit the data, suggesting that $f_\theta$ is a universal approximator.
Comparing the loss values obtained across all three experiments, we conclude that there is no significant performance gain in approximating α using an NN compared to fixing it. However, learning α may require more iterations to achieve the same level of fitting performance. This is because allowing α to be learnt provides the Neural FDE with greater flexibility to adapt to the dynamics of the data under study.
If we train the model using data points obtained from an unknown experiment, then the flexibility of the Neural FDE proves to be an effective method for obtaining intermediate information about the system, provided the dataset contains sufficient information. If the physics involved in the given data is known, it is recommended to incorporate this knowledge into the loss function. This additional information helps to improve the extrapolation of results.

Author Contributions

Conceptualization, C.C. and L.L.F.; Methodology, C.C.; Software, C.C.; Validation, C.C.; Formal analysis, C.C. and L.L.F.; Investigation, C.C. and L.L.F.; Data curation, C.C.; Writing—original draft, C.C. and L.L.F.; Writing—review & editing, C.C., M.F.P.C. and L.L.F.; Visualization, C.C.; Supervision, M.F.P.C. and L.L.F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the funding by Fundação para a Ciência e Tecnologia (Portuguese Foundation for Science and Technology) through CMAT projects UIDB/00013/2020 and UIDP/00013/2020 and the funding by FCT and Google Cloud partnership through projects CPCA-IAC/AV/589164/2023 and CPCA-IAC/AF/589140/2023. C. Coelho would like to thank FCT the funding through the scholarship with reference 2021.05201.BD. This work was also financially supported by national funds through the FCT/MCTES (PIDDAC), under the project 2022.06672.PTDC—iMAD—Improving the Modelling of Anomalous Diffusion and Viscoelasticity: solutions to industrial problems.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Evolution of Loss and α along Training

Figure A1. Training loss evolution for Neural FDE when modelling P α = 0.5 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 initialisation (loss (vertical axis) vs. number of iterations (horizontal axis)).
Figure A2. Training loss evolution for Neural FDE when modelling P α = 0.99 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 initialisation (loss (vertical axis) vs. number of iterations (horizontal axis)).
Figure A3. Evolution of α along the iterations. Case P α = 0.5 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 initialisation ( α (vertical axis) vs. number of iterations (horizontal axis)).
Figure A4. Evolution of α along the iterations. Case P α = 0.99 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 and (d) α = 0.99 initialisation ( α (vertical axis) vs. number of iterations (horizontal axis)).

References

1. Raissi, M.; Perdikaris, P.; Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
2. Salami, E.; Salari, M.; Ehteshami, M.; Bidokhti, N.; Ghadimi, H. Application of artificial neural networks and mathematical modeling for the prediction of water quality variables (Case study: Southwest of Iran). Desalin. Water Treat. 2016, 57, 27073–27084.
3. Jin, C.; Li, Y. Cryptocurrency Price Prediction Using Frequency Decomposition and Deep Learning. Fractal Fract. 2023, 7, 708.
4. Ramadevi, B.; Kasi, V.R.; Bingi, K. Hybrid LSTM-Based Fractional-Order Neural Network for Jeju Island's Wind Farm Power Forecasting. Fractal Fract. 2024, 8, 149.
5. Chen, R.T.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. 2018, 31.
6. Massaroli, S.; Poli, M.; Park, J.; Yamashita, A.; Asama, H. Dissecting neural ODEs. Adv. Neural Inf. Process. Syst. 2020, 33, 3952–3963.
7. Dupont, E.; Doucet, A.; Teh, Y.W. Augmented neural ODEs. Adv. Neural Inf. Process. Syst. 2019, 32.
8. Niu, H.; Zhou, Y.; Yan, X.; Wu, J.; Shen, Y.; Yi, Z.; Hu, J. On the applications of neural ordinary differential equations in medical image analysis. Artif. Intell. Rev. 2024, 57, 236.
9. Bonnaffé, W.; Sheldon, B.C.; Coulson, T. Neural ordinary differential equations for ecological and evolutionary time-series analysis. Methods Ecol. Evol. 2021, 12, 1301–1315.
10. Owoyele, O.; Pal, P. ChemNODE: A neural ordinary differential equations framework for efficient chemical kinetic solvers. Energy AI 2022, 7, 100118.
11. Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Tracing footprints: Neural networks meet non-integer order differential equations for modelling systems with memory. In Proceedings of the Second Tiny Papers Track at ICLR 2024, Vienna, Austria, 11 May 2024.
12. Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Neural Fractional Differential Equations. arXiv 2024, arXiv:2403.02737.
13. Caputo, M. Linear Models of Dissipation whose Q is almost Frequency Independent–II. Geophys. J. Int. 1967, 13, 529–539.
14. Diethelm, K. The Analysis of Fractional Differential Equations: An Application-Oriented Exposition Using Differential Operators of Caputo Type; Springer: Berlin/Heidelberg, Germany, 2010.
15. Diethelm, K.; Ford, N.J. Analysis of fractional differential equations. J. Math. Anal. Appl. 2002, 265, 229–248.
16. Diethelm, K. Smoothness properties of solutions of Caputo-type fractional differential equations. Fract. Calc. Appl. Anal. 2007, 10, 151–160.
17. Zhang, H.; Gao, X.; Unterman, J.; Arodz, T. Approximation Capabilities of Neural ODEs and Invertible Residual Networks. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; Volume 119, pp. 11086–11095.
18. Augustine, M.T. A Survey on Universal Approximation Theorems. arXiv 2024, arXiv:2407.12895.
19. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Figure 1. Schematic of the NNs ($f_\theta$ and $\alpha_\phi$) used in the Neural FDE. (Left) $f_\theta$: the input $\hat{h}(t)$ refers to a single value, while the output $f_\theta(t, \hat{h}(t))$ refers to an NN that approximates a continuous function $f(t, h(t))$. $w_i$ and $w_i^o$ are the weights associated with the different layers. (Right) $\alpha_\phi$: the value of $\alpha_{in}$ is initialised.
Figure 2. Schematic of a Neural FDE model fitted to data, considering two different training runs.
Figure 3. Training loss evolution for Neural FDE when modelling P α = 0.3 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 initialisation (loss (vertical axis) vs. number of iterations (horizontal axis)).
Figure 4. Evolution of α along the iterations. Case P α = 0.3 for (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 initialisation ( α (vertical axis) vs. number of iterations (horizontal axis)).
Figure 5. Training loss evolution of the Neural FDE when modelling P α = 0.3 for a fixed (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 (loss (vertical axis) vs. number of iterations (horizontal axis)).
Figure 6. Training loss evolution of the Neural FDE when modelling P α = 0.5 for a fixed (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 (loss (vertical axis) vs. number of iterations (horizontal axis)).
Figure 7. Training loss evolution of the Neural FDE when modelling P α = 0.99 for a fixed (a) α = 0.1 , (b) α = 0.3 , (c) α = 0.5 , and (d) α = 0.99 (loss (vertical axis) vs. number of iterations (horizontal axis)).
Table 1. Learnt α and training loss obtained for three runs of a Neural FDE model, considering different α initialisations. Case P α = 0.3 .
α Initialisation    Learnt α                      Training Loss
0.1                 0.3498 | 0.3417 | 0.2639      3.60 × 10⁻⁴ | 2.50 × 10⁻⁴ | 3.89 × 10⁻⁴
0.3                 0.2248 | 0.478 | 0.4374       1.44 × 10⁻⁴ | 8.40 × 10⁻⁵ | 8.00 × 10⁻⁵
0.5                 0.3507 | 0.3878 | 0.2921      3.90 × 10⁻⁴ | 1.67 × 10⁻⁴ | 3.53 × 10⁻⁴
0.99                0.4427 | 0.3367 | 0.4497      3.50 × 10⁻⁵ | 1.30 × 10⁻⁴ | 6.00 × 10⁻⁶
Table 2. Learnt α and training loss obtained for three runs of a Neural FDE model, considering different α initialisations. Case P α = 0.5 .
α Initialisation    Learnt α                      Training Loss
0.1                 0.2927 | 0.4924 | 0.4873      1.36 × 10⁻³ | 2.00 × 10⁻⁶ | 5.20 × 10⁻⁵
0.3                 0.455 | 0.4744 | 0.3923       3.00 × 10⁻⁶ | 6.00 × 10⁻⁶ | 9.52 × 10⁻⁴
0.5                 0.5162 | 0.2955 | 0.468       6.76 × 10⁻⁴ | 1.02 × 10⁻³ | 3.17 × 10⁻⁴
0.99                0.4191 | 0.5372 | 0.503       9.20 × 10⁻⁵ | 5.00 × 10⁻⁶ | 3.00 × 10⁻⁶
Table 3. Learnt α and training loss obtained for three runs of a Neural FDE model, considering different α initialisations. Case P α = 0.99 .
α Initialisation    Learnt α                      Training Loss
0.1                 0.6216 | 0.5173 | 0.5407      7.00 × 10⁻⁶ | 3.11 × 10⁻³ | 1.26 × 10⁻³
0.3                 0.2738 | 0.5429 | 0.5364      7.48 × 10⁻³ | 4.00 × 10⁻⁶ | 8.73 × 10⁻⁴
0.5                 0.518 | 0.5586 | 0.5876       5.00 × 10⁻⁶ | 1.00 × 10⁻⁵ | 8.00 × 10⁻⁶
0.99                0.5652 | 0.5141 | 0.5666      1.60 × 10⁻⁵ | 4.92 × 10⁻⁴ | 1.00 × 10⁻⁵
Table 4. Final training loss for three runs of the Neural FDE (fixed α ). Case P α = 0.3 .
Fixed α    Training Loss
0.1        3.50 × 10⁻⁵ | 4.50 × 10⁻⁵ | 4.90 × 10⁻⁵
0.3        7.70 × 10⁻⁵ | 2.44 × 10⁻⁴ | 1.00 × 10⁻⁶
0.5        4.36 × 10⁻⁴ | 1.00 × 10⁻⁵ | 4.30 × 10⁻⁵
0.99       3.60 × 10⁻⁵ | 2.10 × 10⁻⁵ | 4.00 × 10⁻⁵
Table 5. Final training loss for three runs of the Neural FDE (fixed α ). Case P α = 0.5 .
Fixed α    Training Loss
0.1        4.53 × 10⁻⁴ | 3.78 × 10⁻⁴ | 4.27 × 10⁻⁴
0.3        1.00 × 10⁻⁶ | 1.70 × 10⁻³ | 3.00 × 10⁻⁶
0.5        1.70 × 10⁻⁴ | 3.00 × 10⁻⁶ | 4.00 × 10⁻⁶
0.99       3.60 × 10⁻⁴ | 6.70 × 10⁻⁵ | 1.00 × 10⁻³
Table 6. Final training loss for three runs of the Neural FDE (fixed α ). Case P α = 0.99 .
Fixed α    Training Loss
0.1        5.44 × 10⁻³ | 3.10 × 10⁻³ | 5.22 × 10⁻³
0.3        1.02 × 10⁻² | 8.68 × 10⁻³ | 7.37 × 10⁻³
0.5        4.00 × 10⁻⁶ | 6.10 × 10⁻⁵ | 7.05 × 10⁻⁴
0.99       2.00 × 10⁻⁶ | 8.18 × 10⁻³ | 1.00 × 10⁻⁸
Table 7. Number of iterations needed to achieve a final training loss of 1 × 10⁻⁵ with the different fixed α values when modelling $P_{\alpha=0.3}$.
Fixed α    Number of Iterations
0.1        791 ± 90
0.3        934 ± 76
0.5        265 ± 2
0.99       1035 ± 462
Table 8. Number of iterations needed to achieve a final training loss of 1 × 10⁻⁵ with the different fixed α values when modelling $P_{\alpha=0.5}$.
Fixed α    Number of Iterations
0.1        2837 ± 1697
0.3        4659 ± 1114
0.5        193 ± 47
0.99       3255 ± 2003
Table 9. Number of iterations needed to achieve a final training loss of 1 × 10⁻⁵ with the different fixed α values when modelling $P_{\alpha=0.99}$.
Fixed α    Number of Iterations
0.1        - *
0.3        3119 ± 460
0.5        128 ± 28
0.99       240 ± 137
* In 10,000 iterations, it was not possible to achieve a final training loss of 1 × 10⁻⁵ in both runs. However, the final training loss was 1.2 × 10⁻⁵.
