Article

Methodologies for Improved Optimisation of the Derivative Order and Neural Network Parameters in Neural FDE Models

by Cecília Coelho 1,2,*, M. Fernanda P. Costa 2, Oliver Niggemann 1 and Luís L. Ferrás 2,3,4

1 Institute for Artificial Intelligence, Helmut Schmidt University, 22043 Hamburg, Germany
2 Centre of Mathematics (CMAT), University of Minho, 4710-057 Braga, Portugal
3 Centro de Estudos de Fenómenos de Transporte (CEFT), Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal
4 ALiCE Associate Laboratory in Chemical Engineering, Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal
* Author to whom correspondence should be addressed.
Fractal Fract. 2025, 9(7), 471; https://doi.org/10.3390/fractalfract9070471
Submission received: 18 June 2025 / Revised: 10 July 2025 / Accepted: 17 July 2025 / Published: 20 July 2025

Abstract

This work presents and compares different methodologies for the joint optimisation of the fractional derivative order and the parameters of the right-hand-side neural network in Neural Fractional Differential Equation models. The proposed strategies aim to tackle the training difficulties typically encountered when learning the fractional order α together with the network weights. One approach is based on regulating the gradient magnitude of the loss function with respect to α , encouraging more stable and effective updates. Another strategy introduces an online pre-training scheme, where the network parameters are initially optimised over progressively longer time intervals, while α is updated more conservatively using the full time trajectory. The study focuses only on a foundational setting with one-dimensional problems, and numerical experiments demonstrate that the proposed techniques improve both training stability and accuracy. Nonetheless, the issue of non-uniqueness in the optimal derivative order remains, particularly in less well-posed scenarios, suggesting the need for further research in data-driven modelling of fractional-order systems.

1. Introduction

Neural Fractional Differential Equations (Neural FDEs) form a class of neural architectures aimed at approximating solutions h ( t ) for time-dependent data { x 0 , x 1 , , x N } , where each x i represents data collected at time t i , typically from experiments or simulations over a time interval [ t 0 , T ] [1,2]. Neural FDEs have attracted attention and are currently an emerging architecture in the literature, with extensions to Graph Neural Networks [3], variable-order operators [4], integrating physical knowledge [5], and more efficient adjoint algorithms to accelerate training [6]. While Neural FDEs can also be applied to model problems beyond time-series data, such applications are beyond the scope of this work [7].
These models incorporate a fractional derivative of Caputo type (the Caputo derivative definition is used due to its popularity in real-world formulations, as it uses integer-order initial conditions) [8,9] (or any other fractional derivative [10,11,12,13,14,15,16]), denoted ${}^{C}_{0}D^{\alpha}_{t}\, h(t)$, into a neural network-driven dynamical system:
$${}^{C}_{0}D^{\alpha}_{t}\, h(t) = f_{\theta}(t, h(t)), \qquad h(t_0) = x_0, \quad t \in (t_0, T],$$
where $f_{\theta}$ is a neural network with parameters $\theta$, and $\alpha$ is the order of the fractional derivative. The Caputo fractional derivative for a scalar function $g(t)$ with $0 < \alpha < 1$ is defined as follows:
$${}^{C}_{0}D^{\alpha}_{t}\, g(t) = \frac{1}{\Gamma(1-\alpha)} \int_{0}^{t} (t-s)^{-\alpha}\, g'(s)\, ds,$$
with $\Gamma(\cdot)$ representing the Gamma function, $\Gamma(z) = \int_{0}^{\infty} t^{z-1} e^{-t}\, dt$.
A remarkable feature of this framework is the flexibility to learn not only the network parameters θ but also the fractional order α . When α = 1 , the formulation recovers the well-known Neural ODEs [17].
In the approach proposed in [2], the fractional order α is parametrised by a dedicated neural network, denoted by α ϕ , with trainable parameters ϕ . This formulation is especially advantageous when dealing with systems where the fractional order of differentiation is a function of time, accommodating non-stationary dynamics. It is worth noting that, when α is constant, it can be treated as a single trainable parameter of the model, rather than being represented by a neural network, as is done in this work. Thus, learning involves jointly optimising θ and ϕ (or α ) by minimising the difference between the predicted trajectory h ^ ( t i ) (obtained from the numerical solution of (1)) and the true data x i ; that is:
$$\underset{\theta \in \mathbb{R}^{n_\theta},\; \phi \in \mathbb{R}^{n_\phi}}{\text{minimize}} \quad \mathcal{L}(\theta, \phi) = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{h}(t_i) - x_i \right\|_2^2 \quad \text{subject to} \quad \hat{h}(t_i) = \text{FDESolve}(\alpha_\phi, f_\theta, x_0, \{t_0, \dots, t_N\}),$$
where $\|\cdot\|_2$ denotes the Euclidean ($L_2$) norm.
Here, FDESolve ( · ) denotes an appropriate numerical solver for the Fractional Differential Equation (FDE) (1) (see [2] for more details), and the loss function is defined as the Mean Squared Error (MSE) between the model prediction and the observed data.
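For concreteness, FDESolve can be instantiated by any standard numerical scheme for Caputo FDEs. The sketch below implements a minimal explicit product-rectangle (fractional Euler) method on a uniform mesh; it is an illustrative stand-in, not the solver of [2] or FDEint, and the function names are our own. For α = 1 it reduces to the forward Euler method.

```python
from math import gamma

def fde_solve(alpha, f, x0, ts):
    """Explicit product-rectangle (fractional Euler) scheme for the Caputo
    problem D^alpha h(t) = f(t, h(t)), h(ts[0]) = x0, on a uniform mesh ts.
    A minimal illustrative stand-in for FDESolve, not the solver used here."""
    dt = ts[1] - ts[0]
    coef = dt ** alpha / gamma(alpha + 1)
    h = [x0]
    for n in range(1, len(ts)):
        # rectangle-rule weights b_{n,j} = (n-j)^alpha - (n-j-1)^alpha
        acc = sum(((n - j) ** alpha - (n - j - 1) ** alpha) * f(ts[j], h[j])
                  for j in range(n))
        h.append(x0 + coef * acc)
    return h

# Sanity check: for alpha = 1 every weight equals 1, so the scheme collapses
# to forward Euler; here we solve D^1 y = -y, y(0) = 1, whose solution is e^(-t)
ts = [i * 0.01 for i in range(101)]
sol = fde_solve(1.0, lambda t, y: -y, 1.0, ts)
```

Note that the full history `h[0..n-1]` enters every step, reflecting the non-local memory of the Caputo operator; this is also why training Neural FDEs is markedly more expensive than training Neural ODEs.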
Although the study in [2] reports that the fractional order α can be inferred from data for various datasets, the estimated values often differ significantly from the true (ground-truth) values, and different initialisations of α lead to different optimal α values.
Building upon this, the work in [18] further investigated numerically the dependency between α and f θ , showing that the neural network in the right-hand side of the Neural FDE struggles to consistently adjust its parameters to accurately represent the system’s dynamics for arbitrary values of α . As a result, Neural FDEs do not necessarily converge to a unique optimal fractional order; instead, they may rely on a wide range of α values to achieve a reasonable fit to the data. This flexibility, however, comes at a cost. Every time the value of α is modified during training, the underlying function f θ must be re-learnt, as the optimal parameter set θ for one fractional order might differ substantially from that of another. This results in an inefficient and potentially unstable optimisation process, since the function learnt by the network may change substantially with small variations in α .
To better understand the complex dependence of the neural network on the order of the derivative, it is helpful to first examine an analytical problem (specifically, a fractional initial value problem) for which there are known results concerning the continuity of the solution with respect to the problem data [9,19]:
Theorem 1.
Let 0 < α < 1 , t 0 0 , and z 0 R . Consider the Caputo fractional initial-value problem
$${}^{C}_{0}D^{\alpha}_{t}\, z(t) = f\big(t, z(t)\big), \qquad z(t_0) = z_0,$$
and two perturbed problems on the same interval [ t 0 , T ] :
$${}^{C}_{0}D^{\alpha}_{t}\, u(t) = \tilde{f}\big(t, u(t)\big), \quad u(t_0) = z_0, \qquad {}^{C}_{0}D^{\alpha+\varepsilon}_{t}\, v(t) = f\big(t, v(t)\big), \quad v(t_0) = z_0,$$
where f ˜ satisfies the same Lipschitz (with respect to the second variable) and continuity hypotheses as f, and ε is a small scalar perturbation of the derivative order.
Define
$$\delta_f := \max_{(t,z) \in D} \big| f(t,z) - \tilde{f}(t,z) \big|, \qquad \delta_\alpha := |\varepsilon|,$$
with D a suitable domain where the solution exists and is unique. Then, there exists T > t 0 and a constant C > 0 such that the unique solutions z ( t ) , u ( t ) and v ( t ) exist on [ t 0 , T ] and satisfy
$$\sup_{t_0 \le t \le T} |z(t) - u(t)| \le C\, \delta_f, \qquad \sup_{t_0 \le t \le T} |z(t) - v(t)| \le C\, \delta_\alpha,$$
or, equivalently,
$$\sup_{t_0 \le t \le T} |z(t) - u(t)| = O(\delta_f), \qquad \sup_{t_0 \le t \le T} |z(t) - v(t)| = O(\delta_\alpha).$$
This means that small perturbations in either f or the fractional order α lead to only small changes in the solution, so the problem is well posed in the sense of Hadamard [9,19]. Note that similar results can be obtained for the perturbation of the initial value z 0 .
In light of this, one can formulate the following lemma:
Lemma 2.
Let f be Lipschitz continuous in its second argument with constant L, then
$$\sup_{t_0 \le t \le T} \big| f\big(t, z(t)\big) - f\big(t, v(t)\big) \big| = O(\delta_\alpha).$$
Proof. 
Since f is Lipschitz continuous in its second argument with constant L, the previous theorem gives
$$\sup_{t_0 \le t \le T} |z(t) - v(t)| \le C\, \delta_\alpha$$
for some C > 0 . Hence
$$\sup_{t_0 \le t \le T} \big| f\big(t, z(t)\big) - f\big(t, v(t)\big) \big| \le L \sup_{t_0 \le t \le T} |z(t) - v(t)| \le L C\, \delta_\alpha.$$
In other words, there exists a constant K = L C such that
$$\sup_{t_0 \le t \le T} \big| f\big(t, z(t)\big) - f\big(t, v(t)\big) \big| \le K\, \delta_\alpha,$$
showing that a perturbation $\delta_\alpha$ in the fractional order causes at most an $O(\delta_\alpha)$ change in the right-hand side along the solution.    □
These estimates should be regarded with care, since their practical relevance depends on the magnitude of the constant K (or C). While they guarantee continuous dependence of the solution and the right-hand side on the data, a large constant may amplify even a small perturbation $\delta$ into a significant deviation. Moreover, because the bounds only control the supremum over $t \in [t_0, T]$, there can be instants at which
$$|z(t) - v(t)| \quad \text{or} \quad \big| f\big(t, z(t)\big) - f\big(t, v(t)\big) \big|$$
attains values that, although bounded by the supremum, are still large, thus producing notable differences in the behaviour of $f(t, z(t))$ versus $f(t, v(t))$.
Another point to note is that, during the training of a Neural FDE model, the optimiser often drifts toward α ≈ 1. Three factors may contribute to this bias: the gradient can become unstable for smaller α values due to the singular behaviour of the solution, right-hand-side function or fractional kernel; the loss landscape typically features a broader, more stable basin around α = 1; and accumulated numerical errors in the approximate solution of the FDE are minimised when α is close to one.
Based on the above, there is a clear need for new techniques to regulate both the learning of the optimal value of α and its interaction with the parameters θ during the optimisation process. To this end, we propose novel strategies aimed at enhancing the numerical robustness of the training procedure, with a particular emphasis on reducing the sensitivity of the learnt parameters θ to variations in the fractional order α .
This paper is organised as follows. Section 2 introduces two distinct methodologies designed to address the challenges associated with learning the order of the fractional derivative in the Neural FDE model. In Section 3, we present numerical results obtained using the Neural FDE model (together with the newly proposed methodologies) under various ground-truth scenarios. The paper ends with the Conclusion in Section 4. Furthermore, Appendix A contains additional plots of the numerical experiments.

2. Method

We propose two methodologies to address the challenges associated with learning the order of the fractional derivative in Neural FDEs: Neural FDE training with α-gradient clipping (Algorithm 1) and Neural FDE training with online pre-training (Algorithm 2).
Algorithm 1 Neural FDE training with α-gradient clipping.
1: Input: start time t_0, end time t_N = T, initial condition y(t_0) = y_0, mesh, maximum number of epochs MAXITER, clipping value c;
2: Choose Optimiser;
3: f_θ = DynamicsNN();
4: Initialise θ;
5: Initialise α_ϕ;
6: for k = 1 : MAXITER do
7:     {ĥ(t_i)}_{i=1,…,N} ← FDESolve(α_ϕ, f_θ, y_0, {t_0, t_1, …, t_N});
8:     Evaluate loss L;
9:     ∇L ← Compute gradients of L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
10:    ∇_{α_ϕ}L ← Clip(∇_{α_ϕ}L, c);
11:    θ, ϕ ← Optimiser.Step(∇L);
12: end for
13: Return: θ, α_ϕ;
Algorithm 2 Neural FDE training with online pre-training.
1: Input: start time t_0, end time t_N = T, initial condition y(t_0) = y_0, mesh, maximum number of epochs MAXITER, loss change threshold;
2: Choose Optimiser;
3: f_θ = DynamicsNN();
4: Initialise θ;
5: Initialise α_ϕ;
6: Initialise loss_diff, loss_prev;
7: itrs ← 0;
8: while loss_diff > threshold do            ▹ online pre-training stage
9:     itrs += 1;
10:    for t_i = [t_0, …, T] do
11:        ŷ(t_i) ← FDESolve(α_ϕ, f_θ, y_0, {t_0, t_i});
12:        Evaluate loss L(θ, ϕ; y(t_i), ŷ(t_i));
13:        ∇L ← Compute gradients of L(θ, ϕ; y(t_i), ŷ(t_i));
14:        θ ← Optimiser.Step(∇L(θ, ϕ; y(t_i), ŷ(t_i)));
15:    end for
16:    {ĥ(t_i)}_{i=1,…,N} ← FDESolve(α_ϕ, f_θ, y_0, {t_0, t_1, …, t_N});
17:    Evaluate loss L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
18:    ∇L ← Compute gradients of L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
19:    ϕ ← Optimiser.Step(∇L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N}));
20:    loss_diff ← loss_prev − L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
21:    loss_prev ← L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
22: end while
23: for k = itrs : MAXITER do            ▹ original Neural FDE training
24:    {ĥ(t_i)}_{i=1,…,N} ← FDESolve(α_ϕ, f_θ, y_0, {t_0, t_1, …, t_N});
25:    Evaluate loss L;
26:    ∇L ← Compute gradients of L(θ, ϕ; {h(t_i)}_{i=1,…,N}, {ĥ(t_i)}_{i=1,…,N});
27:    θ, ϕ ← Optimiser.Step(∇L);
28: end for
29: Return: θ, α_ϕ;

2.1. α -Gradient Clipping

Following the experimental results presented in [2,18], we propose clipping the gradients of α_ϕ to prevent rapid changes in the α parameter during training. To maintain the generality of the methodologies throughout this work, the general notation α_ϕ is adopted to represent a neural network-based approximation of the fractional order. However, the numerical experiments focus on the case where α is a single scalar parameter, as this is the most common modelling approach in the context of FDEs.
Since α ϕ and θ are jointly optimised, large or erratic gradient updates in α ϕ can hinder the optimisation process. Thus, by bounding the gradients we reduce the risk of such adverse interactions and promote more stable training and better convergence [20,21]. Gradient clipping can be easily integrated into the Neural FDE training process as described in Algorithm 1.
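The effect of bounding the α-gradient can be illustrated on a toy one-parameter problem. Everything below is a hypothetical stand-in: a steep quadratic plays the role of L(α), and the learning rate and clipping value are illustrative, not the settings used in the experiments.

```python
def clip(g, c):
    """Clip a scalar gradient to the interval [-c, c]."""
    return max(-c, min(c, g))

# Hypothetical stand-in for the loss landscape in alpha: steep around an
# optimum at 0.5, so raw gradient steps with this learning rate overshoot.
grad = lambda a: 100.0 * (a - 0.5)

lr, c = 0.025, 1.0
a_plain = a_clipped = 0.99
for _ in range(200):
    a_plain -= lr * grad(a_plain)                 # unclipped: overshoots, diverges
    a_clipped -= lr * clip(grad(a_clipped), c)    # clipped: bounded, stable steps
```

With this step size, the unclipped iterate is multiplied by a factor of modulus greater than one at every update and diverges, while the clipped iterate stays confined to a small neighbourhood of the minimiser.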

2.2. Online Pre-Training

Another source of instability in the Neural FDE training process is updating ϕ at the same frequency as θ since, as stated in Section 1, the optimal parameter set θ for one fractional order α may differ substantially from that of another. Thus, even small changes in α can require substantial re-learning of f_θ.
To address this, we introduce an online pre-training stage in the Neural FDE optimisation process. Instead of solving the FDE over the entire time domain and updating the parameters based on the full trajectory predictions in each epoch, this stage adopts a progressive, short-horizon training approach. The FDE is solved by gradually increasing the integration horizon, and at each step, only the prediction at the current terminal time point t_i is used to compute the loss and update the dynamics parameters θ. This process, named online training [22], effectively performs one update per time point over the interval t ∈ [t_0, T], resulting in N iterations per epoch (one per time point), while keeping the fractional-order parameters ϕ fixed.
This gradual training allows the model to first capture local dynamics and stabilise its predictions before being exposed to global behaviour, mitigating gradient instability and poor convergence. Once a full epoch of short-horizon updates is completed, ϕ is updated using a loss computed over the entire trajectory while θ is fixed. The online pre-training stage ends when the change in loss between epochs falls below a predefined threshold, avoiding unnecessary computation and reducing the risk of overfitting. This stage can be easily introduced in the Neural FDE optimisation process by adding a second training loop before the traditional optimisation, as in Algorithm 2.
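The two-stage schedule can be sketched on a toy regression problem. All names and values below are hypothetical stand-ins: θ mimics the dynamics parameters, updated once per time point with ϕ frozen, and ϕ mimics the fractional-order parameter, updated once per epoch on the full trajectory, with the same loss-change stopping rule.

```python
# Toy stand-in: fit y = theta * t + phi to a "trajectory" generated with
# theta = 2 and phi = 0.5 (all names and values hypothetical).
data = [(i / 10, 2.0 * (i / 10) + 0.5) for i in range(11)]

def predict(theta, phi, t):
    return theta * t + phi

theta, phi, lr = 0.0, 0.0, 0.1
loss_prev, threshold = float("inf"), 1e-6

for epoch in range(5000):
    # Stage 1 (online updates): one theta step per time point, phi frozen
    for t, y in data:
        err = predict(theta, phi, t) - y
        theta -= lr * 2 * err * t                 # d(err^2)/d(theta)
    # Stage 2: one conservative phi step on the full trajectory, theta frozen
    residuals = [predict(theta, phi, t) - y for t, y in data]
    loss = sum(r * r for r in residuals) / len(data)
    phi -= lr * 2 * sum(residuals) / len(data)    # d(MSE)/d(phi)
    if abs(loss_prev - loss) < threshold:         # pre-training stopping rule
        break
    loss_prev = loss
```

In the actual Neural FDE setting, the per-time-point updates additionally require re-solving the FDE up to each t_i, which is what makes this stage progressive in the integration horizon.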

3. Numerical Experiments

To analyse and compare the behaviour of the Neural FDE model trained with α-gradient clipping (clipped Neural FDE, Algorithm 1) and with online pre-training (pre-trained Neural FDE, Algorithm 2) against the baseline approach from [2] (original Neural FDE), we present numerical results for three carefully selected case studies, each chosen for its distinct characteristics. In each case, the datasets were generated using the analytical solution of the FDE under study, for different values of the fractional order α. The objective was to assess the performance of each training method by evaluating the MSE and the accuracy in learning the correct α value. Results are reported as averages over three independent runs, together with the corresponding standard deviations (std). In addition, the evolution of α throughout training, along with its gradient ∂L/∂α_ϕ and the corresponding loss function, were analysed and compared.
All experiments were implemented in Python 3.8 using PyTorch [23] and the FDEint 0.1.1 library [24]. Training was conducted using the Adam optimiser [25] with a learning rate of 0.0025. In the numerical experiments, α_ϕ was treated as a single trainable parameter. Gradient clipping was applied within the range [−1, 1], and the stopping criterion for the online pre-training phase was set to 1 × 10⁻⁶. Each Neural FDE model was trained for a total of 2000 epochs. It is worth noting that, in the online pre-training methodology, the two training stages together comprise the 2000 epochs, ensuring a fair comparison with the original and gradient clipping approaches.

3.1. Case Study 1

Consider the following initial value problem involving a Caputo fractional derivative of order α ( 0 , 1 ) :
$${}^{C}_{0}D^{\alpha}_{t}\, y(t) = -\lambda\, y(t), \quad t \in (0, T], \qquad y(0) = 1,$$
where λ R is a constant parameter. The analytical solution to this problem is given by [26],
$$y(t) = E_\alpha(-\lambda t^\alpha),$$
where the Mittag-Leffler function $E_\alpha(z)$ is defined as $E_\alpha(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(\alpha k + 1)}$, $\alpha > 0$. More generally, the two-parameter Mittag-Leffler function is given by
$$E_{\alpha, \gamma}(z) = \sum_{k=0}^{\infty} \frac{z^k}{\Gamma(\alpha k + \gamma)}, \qquad \alpha, \gamma > 0,$$
and satisfies $E_{\alpha, 1}(z) = E_\alpha(z)$.
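For moderate |z|, the series can be evaluated directly by truncation. The snippet below is a simple sketch (robust evaluation of E_α over larger arguments requires more careful algorithms); since E_1(z) = e^z, the α = 1 case recovers the classical exponential decay y(t) = e^{−t}.

```python
from math import gamma

def mittag_leffler(alpha, z, n_terms=100):
    """Truncated series E_alpha(z) = sum_{k>=0} z^k / Gamma(alpha*k + 1).
    Adequate for moderate |z|; not a production-grade evaluator."""
    return sum(z ** k / gamma(alpha * k + 1) for k in range(n_terms))
```

Evaluating `mittag_leffler(alpha, -t ** alpha)` over a time grid reproduces the analytical solutions used to generate the datasets in this case study.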
Note that the right-hand-side function $f(t, y(t)) = -\lambda\, y(t)$ is proportional to the solution itself. Therefore, any perturbation in the solution directly translates into a proportional perturbation in the right-hand-side function. As such, the absolute difference between solutions resulting from different values of α coincides (up to the multiplicative factor λ) with the corresponding difference in the right-hand-side functions.
We consider λ = 1 and compare the solution for α = 0.80 with a perturbed order α = 0.85. Table 1 shows the values of y(t) for both cases and the absolute difference $|\Delta y(t)| = |y_{0.85}(t) - y_{0.80}(t)|$ at selected time points. Varying α induces small changes in both the solution and the right-hand-side function.
  • Smoothness and Singularities: The Mittag-Leffler function $E_\alpha(-t^\alpha)$ is smooth for $t > 0$ and continuous at $t = 0$. Its small-t expansion
    $$y(t) = 1 - \frac{t^\alpha}{\Gamma(1+\alpha)} + \frac{t^{2\alpha}}{\Gamma(1+2\alpha)} - \cdots$$
    yields
    $$y'(t) \approx -\frac{\alpha}{\Gamma(1+\alpha)}\, t^{\alpha-1}, \qquad y''(t) \approx -\frac{\alpha(\alpha-1)}{\Gamma(1+\alpha)}\, t^{\alpha-2} \quad (t \to 0^+).$$
    Since $\alpha - 1 < 0$ and $\alpha - 2 < -1$, both $y'(t)$ and $y''(t)$ diverge as $t \to 0^+$. The singularity in the slope is of order $t^{\alpha-1}$ (weakly integrable), while the curvature diverges more strongly, like $t^{\alpha-2}$. For larger t, y decreases monotonically from 1 toward 0, beginning with strong convexity (since $y'' > 0$ near zero) and transitioning eventually to mild concave-down behaviour as the algebraic decay dominates. The right-hand side $f(t, y) = -y(t)$ is equally smooth on $(0, 4]$ and continuous at 0, with $f' = -y'$ and $f'' = -y''$. Thus f has the same singular orders but opposite-sign curvature: strongly concave near $t = 0$ and gently concave elsewhere.
The Neural FDEs employed in this case study consist of a neural network f θ with the following architecture: an input layer with a single neuron and a rectified linear unit (ReLU) activation function, one hidden layer with 10 neurons also using ReLU activation, and an output layer with a single neuron. The fractional order α was initialised to 0.99 .
Four datasets corresponding to α = 0.4 , 0.5 , 0.8 , 0.99 were generated using the analytical solution. Each training set consisted of 500 points over the interval t [ 0 , 3 ] , while the testing sets included 700 points over t [ 0 , 4 ] .
It is crucial to emphasise that the training data originates from an analytical solution, which is then numerically transformed using an approximation of the Caputo derivative. The neural network f θ learns from this transformed data, meaning the accuracy of the numerical approximation directly impacts the reliability of the training data and, consequently, the optimisation outcomes. This numerical treatment may either suppress or artificially smooth potential singularities in the right-hand-side function f ( t , y ( t ) ) .
The results are summarised in Table 2, Table 3 and Table 4, while the evolution of the learnt α, its gradient ∂L/∂α_ϕ, and the loss throughout training are illustrated in Figure 1, Figure 2, Figure 3 and Figure 4 for the case α = 0.5. Additional results for α = 0.4, 0.8, and 0.99 are provided in Appendix A.1, Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7, Figure A8, Figure A9, Figure A10, Figure A11 and Figure A12.
Table 2 (final training losses) and Table 3 (test MSE over 3 runs) confirm that, as expected, there are no significant differences in performance between the original Neural FDE method [2] and the two strategies proposed in this work.
However, Table 4 shows that while both the original Neural FDE and the pre-trained version produce predicted values of α close to the ground truth, the pre-trained model significantly outperforms the original method by achieving notably more accurate estimates of α , demonstrating a clear improvement in learning.
In contrast, the worst performance is observed with the gradient clipping approach, as illustrated also in Figure 1, which displays the data fitting for the case α = 0.5 .
The good fitting performance of the original Neural FDE method (also visible in Figure 1) may be attributed to one of two reasons: either the neural network with parameters θ adapted effectively to the learnt value α ≈ 0.69, or the discrepancy between the true value α = 0.5 and the learnt one was not significant enough to affect the prediction quality in this particular case. The latter explanation is consistent with the sensitivity analysis discussed earlier (Table 1).
Figure 2 (evolution of α during training) and Figure 3 (evolution of the loss function) indicate that gradient clipping negatively affects the update dynamics of α . Specifically, the clipped- α strategy appears to slow down convergence, requiring more training epochs to achieve the same level of accuracy as the other two methods.
Figure 4 shows that the gradients of the loss function with respect to α are very small in this case study. This may explain why the gradient clipping technique does not yield improved results. The Adam optimiser combines momentum and adaptive learning rates by computing exponential moving averages of both the gradients and their squared values. As a result, it adjusts the step size dynamically based on the history of gradients. However, when the gradients are consistently small, the adaptive nature of Adam may result in updates that are too conservative, thereby limiting the optimiser’s ability to escape flat regions of the loss landscape or to sufficiently update α . The effectiveness of the gradient clipping methodology will be better assessed in the following case studies, where larger gradients are expected to arise.
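For reference, a single Adam update for a scalar parameter can be sketched as follows (the standard formulation of [25], not PyTorch's exact implementation). The toy loop at the end is a hypothetical illustration: with sign-alternating gradients of tiny magnitude, the bias-corrected first moment stays near zero while the second moment does not vanish, so the effective steps on α remain very small.

```python
def adam_step(param, grad, m, v, t, lr=0.0025, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (standard formulation).
    m, v are the running first/second moment estimates; t is the step count (>= 1)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v

# Hypothetical illustration: oscillating gradients of tiny magnitude produce a
# near-zero m_hat but a non-vanishing sqrt(v_hat), so alpha barely moves.
alpha, m, v = 0.69, 0.0, 0.0
for t in range(1, 101):
    g = 1e-6 if t % 2 else -1e-6           # sign-alternating small gradient
    alpha, m, v = adam_step(alpha, g, m, v, t)
```

By contrast, a gradient of constant sign, however small, yields steps of roughly the full learning rate, since m_hat and sqrt(v_hat) then have comparable magnitude.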
For the remaining values of α, namely 0.4, 0.8 and 0.99, the results are qualitatively similar to those observed for α = 0.5. These results are presented in Appendix A.1 to avoid cluttering the main text with figures.

3.2. Case Study 2

Consider the following initial value problem involving a Caputo fractional derivative of order α ( 0 , 1 ) :
$${}^{C}_{0}D^{\alpha}_{t}\, y(t) = f(t, y(t)), \qquad t \in (0, T],$$
subject to the initial condition y(0) = 1, where the right-hand side is given by,
$$f(t, y(t)) = \frac{2}{\Gamma(3-\alpha)}\, t^{2-\alpha} - \frac{1}{\Gamma(2-\alpha)}\, t^{1-\alpha}.$$
The analytical solution to this problem is $y(t) = t^2 - t + 1$ [26].
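The right-hand side can be cross-checked term by term using the Caputo rule for power functions, $D^{\alpha} t^{p} = \frac{\Gamma(p+1)}{\Gamma(p+1-\alpha)}\, t^{p-\alpha}$ (with $D^{\alpha} 1 = 0$). A quick sketch, which for α = 1 must reduce to the classical derivative y′(t) = 2t − 1:

```python
from math import gamma

def f_rhs(t, alpha):
    """Caputo derivative of y(t) = t**2 - t + 1, term by term:
    D^a t^2 = 2 t^(2-a) / Gamma(3-a),  D^a t = t^(1-a) / Gamma(2-a),  D^a 1 = 0."""
    return 2 * t ** (2 - alpha) / gamma(3 - alpha) - t ** (1 - alpha) / gamma(2 - alpha)
```

Note that `f_rhs(0.0, alpha)` returns 0 for any α ∈ (0, 1), consistent with the smoothness discussion below.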
This particular example is noteworthy because the analytical solution does not depend on the value of α , allowing the model to converge to an arbitrary fractional order. However, it is important to note that the neural network f θ is trained to approximate the fractional derivative itself, not the solution y ( t ) . As such, its output inherently depends on the order α . Although it is not possible to directly compute the error between the predicted and the true value of α in this case, the model’s behaviour during the search for the optimal solution provides insight into how it adapts to the problem structure.
  • Smoothness and Singularities: The polynomial y(t) is $C^\infty$ on [0, 4] with
    $$y'(t) = 2t - 1, \qquad y''(t) = 2,$$
    so it exhibits uniform convexity and a single turning point at t = 0.5. No singularities arise in y.
    The function f(t) is $C^\infty$ on (0, 4] and tends to zero as $t \to 0^+$, because both exponents $2 - \alpha > 1$ and $1 - \alpha \in (0, 1)$ are positive. Its derivatives,
    $$f'(t) = \frac{2(2-\alpha)}{\Gamma(3-\alpha)}\, t^{1-\alpha} - \frac{1-\alpha}{\Gamma(2-\alpha)}\, t^{-\alpha},$$
    $$f''(t) = \frac{2(2-\alpha)(1-\alpha)}{\Gamma(3-\alpha)}\, t^{-\alpha} + \frac{\alpha(1-\alpha)}{\Gamma(2-\alpha)}\, t^{-\alpha-1},$$
    diverge as $t \to 0^+$ with orders $t^{-\alpha}$ and $t^{-\alpha-1}$, respectively. Thus f has a mild slope singularity but a strong curvature singularity at the origin. On (0, 4], f changes sign once and its slope f′(t) also changes sign at an interior point, making f non-monotonic overall.
The Neural FDEs considered in this case study consist of a neural network f θ with one input layer containing a single neuron using a ReLU activation function; one hidden layer with 10 neurons and ReLU activation; and one output layer with a single neuron.
A dataset was generated from the analytical solution, with each training set comprising 500 points over the interval t [ 0 , 3 ] , and testing sets containing 700 points over t [ 0 , 4 ] . To examine the influence of the learnt α and its initialisation on the Neural FDE models, we present results for two initial values of α : 0.5 and 0.99.
The outcomes are summarised in Table 5, Table 6 and Table 7, while Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 display the training evolution of the learnt α, its gradient ∂L/∂α_ϕ, and the loss for both initialisations.
From Table 5 and Table 6, it is evident that the initialisation of α slightly influences the training performance, with an initial value of 0.99 consistently yielding better results across all three methodologies. In the testing dataset, performances are comparable among the different initialisations; however, initialising at 0.5 results in less consistency across training runs, as reflected by a higher standard deviation.
All three Neural FDE training approaches learn values of α that remain extremely close to their initialisation value of 0.5 (and 0.99 ) (see Table 7 and Table 8). This indicates that, in this particular case where any α should (in theory) yield a good solution (the analytical solution does not depend on α ), the methodology is robust enough not to deviate significantly from the initialised α .
Note that when α is initialised at 0.5 , the learnt values across the three runs exhibit greater variability (see Table 7).
Figure 5 and Figure 6 display the predicted and ground-truth curves from the best runs in Case Study 2, with α initialised at 0.5 and 0.99, respectively. For the case α = 0.5 , the pre-trained methodology appears to deviate from the ground truth in the testing region; however, overall, this approach delivers the best results.
During training (Figure 7, where α = 0.5 ), the parameter α exhibits a sharp increase for both the original and gradient-clipped Neural FDEs. A similar trend is observed when α = 0.99 , although in this case, the clipped model displays a smoother evolution compared to the scenario with α = 0.5 (Figure 8).
Furthermore, both the original and clipped models show significant oscillations in the gradient ∂L/∂α throughout training. While the pre-trained Neural FDE exhibits similar behaviour, the oscillations are noticeably more stable, suggesting that the pre-training phase contributes to a more robust and consistent learning process (see Figure 9 and Figure 10).
It is also worth noting that the amplitude of the oscillations is substantially larger in the case where α = 0.5 (the scales used in Figure 9 and Figure 10 are different). For values of α closer to 1, the right-hand-side function becomes smoother and, in principle, easier to learn. As a result, the optimisation process may tend to favour higher values of α within the range [ 0 , 1 ] .
These oscillations are reflected in the behaviour of the loss function. Figure 11 and Figure 12 show that the pre-training method exhibits a smoother evolution of the loss compared to the other two approaches.

3.3. Case Study 3

Consider the following initial value problem involving the Caputo fractional derivative of order α ( 0 , 1 ) :
$${}^{C}_{0}D^{\alpha}_{t}\, y(t) = f(t, y(t)), \qquad t \in (0, T],$$
subject to the initial condition y(0) = 0, where the right-hand side is defined as,
$$f(t, y(t)) = \frac{40320}{\Gamma(9-\alpha)}\, t^{8-\alpha} - 3\, \frac{\Gamma(5+\alpha/2)}{\Gamma(5-\alpha/2)}\, t^{4-\alpha/2} + \frac{9}{4}\, \Gamma(\alpha+1) + \left( \frac{3}{2}\, t^{\alpha/2} - t^{4} \right)^{3} - y(t)^{3/2}.$$
We consider the interval t [ 0 , 1 ] , for which the analytical solution is given by
$$y(t) = t^{8} - 3\, t^{4+\alpha/2} + \frac{9}{4}\, t^{\alpha},$$
as established in [9]. In this case, the analytical solution includes a term of the form t α , which introduces a more pronounced non-smoothness to the solution.
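The non-smoothness induced by the $t^\alpha$ term can be seen numerically by evaluating the solution's forward difference quotients near t = 0 (an illustrative sketch):

```python
def y_exact(t, alpha):
    """Analytical solution of Case Study 3: y(t) = t^8 - 3 t^(4 + a/2) + (9/4) t^a."""
    return t ** 8 - 3 * t ** (4 + alpha / 2) + 9 / 4 * t ** alpha

# The difference quotient (y(h) - y(0)) / h grows without bound as h -> 0+,
# reflecting the slope singularity of order t^(alpha - 1) at the origin.
alpha = 0.5
slopes = [(y_exact(h, alpha) - y_exact(0.0, alpha)) / h for h in (1e-2, 1e-4, 1e-6)]
```

This steep initial layer is precisely what the uniform-mesh solver and the neural network f_θ must cope with, and it helps explain the generalisation difficulties reported below.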
We compare the solution for α = 0.80 with a perturbed order α = 0.85. Table 9 shows the values of y(t) for both cases and the absolute differences $|\Delta y(t)| = |y_{0.85}(t) - y_{0.80}(t)|$, $|\Delta f(t, y(t))| = |f_{0.85}(t) - f_{0.80}(t)|$ at selected time points.
Varying α induces small changes in both the solution and the right-hand-side function. However, these variations are slightly more pronounced than those observed in Case Study 1.
  • Smoothness and Singularities:
    $$y'(t) = 8 t^{7} - 3 \left(4 + \frac{\alpha}{2}\right) t^{3+\alpha/2} + \frac{9}{4}\, \alpha\, t^{\alpha-1},$$
    $$y''(t) = 56 t^{6} - 3 \left(4 + \frac{\alpha}{2}\right)\left(3 + \frac{\alpha}{2}\right) t^{2+\alpha/2} + \frac{9}{4}\, \alpha (\alpha - 1)\, t^{\alpha-2}.$$
    Near $t = 0$, the term $\frac{9}{4} \alpha t^{\alpha-1}$ in $y'$ and $\frac{9}{4} \alpha(\alpha-1) t^{\alpha-2}$ in $y''$ dominate, producing a weak slope singularity of order $t^{\alpha-1}$ and a strong curvature singularity of order $t^{\alpha-2}$. As t increases, the high-degree monomials $t^{7}$ and $t^{6}$ prevail, introducing inflection points and reversals of concavity.
    The function $f(t, y(t))$ is a sum of fractional powers and highly nonlinear terms. Its first and second derivatives consist of sums of powers $t^{p}$, where near $t = 0$ the most singular exponent arises from differentiating $(t^{\alpha/2})^{3}$. One finds curvature singularities of order up to $t^{3\alpha/2 - 2}$, which can be exceptionally strong if α is small. Throughout [0, 1], f exhibits multiple sign changes, several inflection points, and rapid oscillations of curvature, reflecting high analytical complexity.
For this case study, three datasets were generated corresponding to different values of the fractional order, namely α = 0.5, 0.8, and 0.99. The training set consists of 300 data points uniformly distributed over the interval t ∈ [0, 0.6], while the testing set includes 400 points in the extended range t ∈ [0, 0.7].
The Neural FDE models employ a neural network f_θ with the following architecture: one input layer with a single neuron using a tanh activation function; two hidden layers with 128 neurons each, where the first uses the exponential linear unit (ELU) activation and the second uses tanh; and a final output layer with one neuron. The fractional order α was initialised at 0.99. In this case, a more expressive neural network was considered in order to deal with the characteristics of the problem at hand.
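For concreteness, the layer layout just described can be sketched as follows. The paper's models are implemented in PyTorch [23]; this dependency-free Python version with untrained random weights only illustrates one plausible reading of the architecture, in which the input-layer tanh is applied directly to the scalar input:

```python
import math
import random

random.seed(0)

def dense(n_in: int, n_out: int):
    """Random weights and zero biases as stand-ins for trained parameters."""
    return ([[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

def forward(layers, activations, x):
    """Apply each affine layer followed by its activation."""
    h = x
    for (W, b), act in zip(layers, activations):
        h = [act(sum(w * v for w, v in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
    return h

elu = lambda z: z if z > 0.0 else math.exp(z) - 1.0  # exponential linear unit
identity = lambda z: z

# 1 neuron (tanh on input) -> 128 (ELU) -> 128 (tanh) -> 1 (linear output)
layers = [dense(1, 128), dense(128, 128), dense(128, 1)]
activations = [elu, math.tanh, identity]

def f_theta(y: float) -> float:
    """Right-hand-side network evaluated at a scalar state y."""
    return forward(layers, activations, [math.tanh(y)])[0]

out = f_theta(0.5)
```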
The corresponding results are presented in Table 10, Table 11 and Table 12. Figure 13, Figure 14, Figure 15 and Figure 16 illustrate the training dynamics for the case α = 0.5, including the evolution of the learnt order α, the gradient of the loss with respect to α, and the loss function. Additional results for α = 0.8 and α = 0.99 are provided in Appendix A.2, in Figure A13, Figure A14, Figure A15, Figure A16, Figure A17, Figure A18, Figure A19 and Figure A20.
From Table 10 and Table 11, it can be seen that all three methodologies achieve similar training and testing performance. Furthermore, the values of α learnt by the three methodologies are similar, close to 0.99, independently of the ground-truth α of the dataset being modelled. These results highlight the difficulty the model faces in accurately learning the correct value of α using any of the methods: the loss function consistently favours values of α close to 1 (even when considering different initialisations of α).
Remark: 
It should be noted, however, that increasing the number of training epochs leads to a gradual decrease in the value of α towards its expected value. Nevertheless, the convergence rate is so slow that the associated computation times become prohibitively high, rendering this approach inefficient in practice.
From Figure 13, Figure 14, Figure 15, Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7, Figure A8, Figure A9, Figure A10, Figure A11, Figure A12, Figure A13, Figure A14, Figure A15, Figure A16, Figure A17, Figure A18, Figure A19 and Figure A20, unlike in the previous case study, both the loss and the gradient with respect to α remain stable throughout training. However, when analysing the prediction curves (Figure 13, Figure A13 and Figure A17), it becomes clear that although the models fit the data well up to t = 0.5 , they struggle to generalise beyond that point.
This weaker performance in the latter part of the time interval is likely due to a shift in the underlying dynamics, which were not present in the training data. For the models to successfully extrapolate, additional training data in the interval t [ 0.6 , 0.7 ] would be necessary.

3.4. Case Study 4

Consider again the initial value problem:
$${}_{0}^{C}D_{t}^{\alpha}\, y(t) = \lambda\, y(t), \quad t \in (0, T], \qquad y(0) = 1,$$
but now with λ = 1. We compare the solution y(t) = E_α(λt^α) for α = 0.80 with a perturbed order α = 0.85. Table 13 shows the values of y(t) for both cases and the absolute difference |Δy(t)| = |y_{0.85}(t) − y_{0.80}(t)| at selected time points:
For t = 10, we observe that
$$\left| f\bigl(10, z(10)\bigr) - f\bigl(10, v(10)\bigr) \right| = \left| E_{0.80}\bigl(10^{0.80}\bigr) - E_{0.85}\bigl(10^{0.85}\bigr) \right| \approx 1619.583.$$
This difference satisfies the inequality
$$\left| f\bigl(10, z(10)\bigr) - f\bigl(10, v(10)\bigr) \right| \le K\, \delta_{\alpha},$$
where δ_α = 0.85 − 0.80 = 0.05, which implies K ≳ 32,392.
Such a large value of K highlights the strong sensitivity of the solution with respect to small perturbations in the fractional order α, particularly for large t. This indicates that the problem may exhibit ill-conditioned behaviour, in which small variations in the problem data result in substantial changes in the solution (or in the right-hand-side function).
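The figures above can be checked by summing the Mittag-Leffler series E_α(z) = Σ_{k≥0} z^k / Γ(αk + 1) directly. Below is a minimal Python sketch; the 400-term truncation is an ad hoc choice, comfortably sufficient for these arguments:

```python
import math

def mittag_leffler(alpha: float, z: float, n_terms: int = 400) -> float:
    """One-parameter Mittag-Leffler function E_alpha(z) for z > 0,
    with terms evaluated in log space to avoid intermediate overflow."""
    log_z = math.log(z)
    return sum(math.exp(k * log_z - math.lgamma(alpha * k + 1.0))
               for k in range(n_terms))

t = 10.0
diff = abs(mittag_leffler(0.80, t**0.80) - mittag_leffler(0.85, t**0.85))
K_min = diff / 0.05  # implied lower bound on K for delta_alpha = 0.05
print(f"|f difference| ~ {diff:.3f},  K >= {K_min:.0f}")
```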
  • Smoothness and Singularities: Similar to Case Study 1. Both y′ and y″ diverge with orders t^{α−1} and t^{α−2}, respectively. For t > 0, all derivatives exist and remain positive, so y grows monotonically and stays strictly convex (no inflection). The right-hand side f(t, y) = y(t) shares these properties exactly.
Using the analytical solution, three datasets were generated for α = 0.5, 0.8, and 0.99. Each dataset contains 1500 points in the interval t ∈ [0, 3] for training and 1500 points in the interval t ∈ [0, 4] for testing. The Neural FDE models use a neural network f_θ with the following architecture: one input layer with a single neuron and a hyperbolic tangent (tanh) activation function; two hidden layers with 64 neurons each and ReLU activations; and one output layer with a single neuron. The fractional order α was initialised at 0.99 for all experiments.
The corresponding results are summarised in Table 14, Table 15 and Table 16. The evolution of the learnt order α, the gradient of the loss with respect to α, and the training loss are depicted in Figure 17, Figure 18, Figure 19 and Figure 20 for α = 0.5, and, for α = 0.8 and α = 0.99, in Appendix A.3, Figure A21, Figure A22, Figure A23, Figure A24, Figure A25, Figure A26, Figure A27 and Figure A28.
The results presented in Table 14, Table 15 and Table 16 indicate that all three methodologies exhibit similar performance in terms of fitting the data and predicting the solution’s evolution. However, as shown in Figure 18, the best results for the most successful run were achieved using the pre-trained methodology.
It is worth noting that the clipped methodology produced very poor results, despite achieving a low final training loss. This discrepancy suggests a potential issue or instability with the training process that may depend on the initialisations used.
Once again, the model failed to accurately learn the correct order of the derivative. In all cases, the optimal value of α converged to approximately 1, regardless of the true value used during training (see also Figure 18). Furthermore, the gradient of α exhibited large-amplitude oscillations when using the pre-trained method, suggesting that this approach struggles to handle the irregular behaviour observed in this particular case study (see Table 13).

4. Conclusions

This work presents a fundamental study of two methodologies for the joint optimisation of the fractional derivative order and the parameters of the right-hand-side neural network in Neural FDE models. One approach involves bounding the gradient magnitude of the loss function with respect to α , promoting more stable and effective updates. Another strategy introduces an online pre-training scheme, in which the network parameters are first optimised over progressively longer time intervals, while α is updated more conservatively using the full time trajectory.
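The gradient-bounding strategy amounts to a clipped scalar update of α. As a rough sketch (the learning rate, clipping threshold, and admissible interval for α below are illustrative assumptions, not the values used in this work):

```python
def clipped_alpha_step(alpha: float, grad: float, lr: float = 1e-3,
                       clip: float = 0.1, lo: float = 0.01, hi: float = 0.99) -> float:
    """One gradient-descent update of the fractional order alpha, with the
    gradient of the loss w.r.t. alpha clipped to [-clip, clip] and the
    result projected back into the admissible interval [lo, hi]."""
    g = max(-clip, min(clip, grad))          # bound the gradient magnitude
    return max(lo, min(hi, alpha - lr * g))  # update and project

# A huge gradient moves alpha by at most lr * clip per step
alpha = clipped_alpha_step(0.99, grad=250.0)
```

This limits how far a single noisy gradient can drag α, which is the stabilising effect sought by the clipped methodology.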
Several case studies were analysed, encompassing problems where the analytical solution and the right-hand-side function exhibit different levels of regularity and varying degrees of dependence on data.
In every case, the solution y(t) and the forcing f(t, y(t)) are C^∞ on the open interval, but typically admit a weak slope singularity of order t^{α−1} and a stronger curvature singularity of order t^{α−2} (or worse) at t = 0. The quadratic case (Study 2) is the only example without singularities in y itself; all others exhibit algebraic bends near the origin whose strength and complexity depend on the lowest fractional exponent. The curvature profiles range from simple constant concavity (Study 2) to multiple inflections and high nonlinearity (Study 3).
In scenarios with mildly smooth solutions, the proposed methodologies successfully converged to values of α that approximate the ground truth. In cases where the solution itself does not depend on α , but the right-hand-side function does, the optimisation algorithm was able to maintain the initial value of α , without drifting unnecessarily.
The final two case studies demonstrated the limitations of the proposed methods. In Case Study 3, convergence to the correct fractional order was not achieved within a practical number of epochs, although it is likely that significantly increasing the training time could improve results. In the last case study, small perturbations in the fractional order caused large variations in the right-hand-side function. As a result, despite a good fit to the data, the optimisation algorithm consistently drifted towards α = 1 (even for different initialisations of α ).
This work aims to encourage further developments towards methods that not only achieve a good fit to data (a goal successfully attained in the present study) but also ensure convergence to the ground-truth order of the derivative. While the proposed approaches demonstrate promising results in favourable scenarios, they fall short in more challenging problems, particularly those involving low regularity or high sensitivity to the fractional order. These limitations highlight the need for more robust optimisation strategies capable of addressing such complexities.
While our experiments demonstrate the efficacy of Neural FDEs, fractional derivatives are also known to enhance the modelling of periodic and chaotic dynamics in nonlinear systems. For instance, fractional-order variants of the Lorenz and Rössler systems exhibit rich bifurcation behaviours and chaotic attractors, while fractional extensions of the KdV and nonlinear Schrödinger equations support soliton solutions with anomalous dispersion [27,28,29,30]. Although such cases are beyond the scope of this work, our framework is, in principle, compatible with these systems, as it learns the fractional dynamics directly from data without presuming a specific structure. Future research will explore Neural FDEs for chaotic systems and soliton dynamics, where the non-local memory effects of fractional derivatives could prove particularly advantageous.

Author Contributions

Conceptualisation, C.C., M.F.P.C. and L.L.F.; methodology, C.C., M.F.P.C. and L.L.F.; software, C.C.; validation, C.C. and L.L.F.; formal analysis, C.C. and L.L.F.; investigation, C.C. and L.L.F.; resources, C.C. and L.L.F.; data curation, C.C. and L.L.F.; writing—original draft preparation, C.C. and L.L.F.; writing—review and editing, C.C., M.F.P.C. and L.L.F.; visualisation, C.C.; supervision, M.F.P.C., O.N. and L.L.F.; project administration, O.N. and L.L.F.; funding acquisition, C.C., O.N. and L.L.F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge funding by Fundação para a Ciência e Tecnologia (Portuguese Foundation for Science and Technology) through CMAT projects UIDB/00013/2020 and the support of the High-Performance Computing Center at the University of Évora funded by FCT I.P. under the project “OptXAI: Constrained optimisation in NNs for Explainable, Ethical and Greener AI”, reference 2024.00191.CPCA.A1, platform Vision. C. Coelho would like to thank the KIBIDZ project funded by dtec.bw—Digitalization and Technology Research Center of the Bundeswehr; dtec.bw is funded by the European Union—NextGenerationEU. This work was also financially supported by national funds through the FCT/MCTES (PIDDAC) under the project 2022.06672.PTDC—iMAD (Improving the Modelling of Anomalous Diffusion and Viscoelasticity: solutions to industrial problems), DOI 10.54499/2022.06672.PTDC (https://doi.org/10.54499/2022.06672.PTDC); and by the projects LA/P/0045/2020 (ALiCE), UIDB/00532/2020, and UIDP/00532/2020 (CEFT). It was also financially supported by Fundação “la Caixa”|BPI and FCT through project PL24-00057: “Inteligência Artificial na Otimização da Rega para Olivais Resilientes às Alterações Climáticas”.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Additional Plots

Appendix A.1. Case Study 1

Figure A1. Plot of the predicted and ground-truth curves from the best run in Case Study 1 with α initialised at 0.4 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A2. Evolution of the best run’s α during training for Case Study 1 with α = 0.4 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A3. Training loss across epochs for the best run in Case Study 1 with α = 0.4 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A4. Evolution of the best run's α gradient during training for Case Study 1 with α = 0.4: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A5. Plot of the predicted and ground-truth curves from the best run in Case Study 1 with α initialised at 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A6. Evolution of the best run’s α during training for Case Study 1 with α = 0.8 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A7. Training loss across epochs for the best run in Case Study 1 with α = 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A8. Evolution of the best run's α gradient during training for Case Study 1 with α = 0.8: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A9. Plot of the predicted and ground-truth curves from the best run in Case Study 1 with α initialised at 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A10. Evolution of the best run’s α during training for Case Study 1 with α = 0.99 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A11. Training loss across epochs for the best run in Case Study 1 with α = 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A12. Evolution of the best run's α gradient during training for Case Study 1 with α = 0.99: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.

Appendix A.2. Case Study 3

Figure A13. Plot of the predicted and ground-truth curves from the best run in Case Study 3 with α initialised at 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A14. Evolution of the best run’s α during training for Case Study 3 with α = 0.8 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A15. Training loss across epochs for the best run in Case Study 3 with α = 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A16. Evolution of the best run's α gradient during training for Case Study 3 with α = 0.8: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A17. Plot of the predicted and ground-truth curves from the best run in Case Study 3 with α initialised at 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A18. Evolution of the best run’s α during training for Case Study 3 with α = 0.99 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A19. Training loss across epochs for the best run in Case Study 3 with α = 0.99: (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A20. Evolution of the best run's α gradient during training for Case Study 3 with α = 0.99: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.

Appendix A.3. Case Study 4

Figure A21. Plot of the predicted and ground-truth curves from the best run in Case Study 4 with α initialised at 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A22. Evolution of the best run’s α during training for Case Study 4 with α = 0.8 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A23. Training loss across epochs for the best run in Case Study 4 with α = 0.8 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A24. Evolution of the best run's α gradient during training for Case Study 4 with α = 0.8: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A25. Plot of the predicted and ground-truth curves from the best run in Case Study 4 with α initialised at 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A26. Evolution of the best run’s α during training for Case Study 4 with α = 0.99 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure A27. Training loss across epochs for the best run in Case Study 4 with α = 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure A28. Evolution of the best run's α gradient during training for Case Study 4 with α = 0.99: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.

References

  1. Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Tracing Footprints: Neural Networks Meet Non-integer Order Differential Equations For Modelling Systems with Memory. In Proceedings of The Second Tiny Papers Track at ICLR 2024, Tiny Papers @ ICLR 2024, Vienna, Austria, 11 May 2024. [Google Scholar]
  2. Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Neural Fractional Differential Equations. Appl. Math. Model. 2025, 144, 116060. [Google Scholar] [CrossRef]
  3. Kang, Q.; Zhao, K.; Ding, Q.; Ji, F.; Li, X.; Liang, W.; Song, Y.; Tay, W.P. Unleashing the potential of fractional calculus in graph neural networks with FROND. arXiv 2024, arXiv:2404.17099. [Google Scholar] [CrossRef]
  4. Cui, W.; Kang, Q.; Li, X.; Zhao, K.; Tay, W.P.; Deng, W.; Li, Y. Neural variable-order Fractional Differential Equation networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 16109–16117. [Google Scholar]
  5. Vellappandi, M.; Lee, S. Physics-informed Neural Fractional Differential Equations. Appl. Math. Model. 2025, 145, 116127. [Google Scholar] [CrossRef]
  6. Kang, Q.; Li, X.; Zhao, K.; Cui, W.; Zhao, Y.; Deng, W.; Tay, W.P. Efficient training of neural fractional-order differential equation via adjoint backpropagation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 17750–17759. [Google Scholar]
  7. Zhang, X.; Zhang, L.; Wei, W.; Sun, Y.; Tian, C.; Zhang, Y. FDE-Net: A memory-efficiency densely connected network inspired from fractional-order differential equations for single image super-resolution. Neurocomputing 2024, 600, 128143. [Google Scholar] [CrossRef]
  8. Caputo, M. Linear Models of Dissipation whose Q is almost Frequency Independent—II. Geophys. J. Int. 1967, 13, 529–539. [Google Scholar] [CrossRef]
  9. Diethelm, K. The Analysis of Fractional Differential Equations: An Application-Oriented Exposition Using Differential Operators of Caputo Type; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar] [CrossRef]
  10. Podlubny, I. Fractional Differential Equations: An Introduction to Fractional Derivatives, Fractional Differential Equations, to Methods of Their Solution and Some of Their Applications; Elsevier: Amsterdam, The Netherlands, 1998; Volume 198. [Google Scholar]
  11. Kilbas, A.A.; Srivastava, H.M.; Trujillo, J.J. Theory and Applications of Fractional Differential Equations; North-Holland Mathematics Studies; Elsevier Science & Technology: Amsterdam, The Netherlands, 2014. [Google Scholar]
  12. Jin, B. Fractional Differential Equations: An Approach via Fractional Derivatives; Springer International Publishing: Cham, Switzerland, 2021. [Google Scholar] [CrossRef]
  13. Zhou, Y.; Wang, J.; Zhang, L. Basic Theory of Fractional Differential Equations; World Scientific: Singapore, 2016. [Google Scholar] [CrossRef]
  14. Abbas, S.; Benchohra, M.; N’Guérékata, G.M. Topics in Fractional Differential Equations; Springer: New York, NY, USA, 2012. [Google Scholar] [CrossRef]
  15. Milici, C.; Drăgănescu, G.; Tenreiro Machado, J. Introduction to Fractional Differential Equations; Springer International Publishing: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
  16. Almeida, R.; Bastos, N.R.O.; Monteiro, M.T.T. Modeling some real phenomena by Fractional Differential Equations. Math. Methods Appl. Sci. 2015, 39, 4846–4855. [Google Scholar] [CrossRef]
  17. Chen, R.T.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  18. Coelho, C.; Costa, M.F.P.; Ferrás, L.L. Neural Fractional Differential Equations: Optimising the Order of the Fractional Derivative. Fractal Fract. 2024, 8, 529. [Google Scholar] [CrossRef]
  19. Diethelm, K.; Ford, N.J. Analysis of Fractional Differential Equations. J. Math. Anal. Appl. 2002, 265, 229–248. [Google Scholar] [CrossRef]
  20. Zhang, J.; He, T.; Sra, S.; Jadbabaie, A. Why gradient clipping accelerates training: A theoretical justification for adaptivity. arXiv 2019, arXiv:1905.11881. [Google Scholar]
  21. Zhang, B.; Jin, J.; Fang, C.; Wang, L. Improved analysis of clipping algorithms for non-convex optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 15511–15521. [Google Scholar]
  22. Jain, L.C.; Seera, M.; Lim, C.P.; Balasubramaniam, P. A review of online learning in supervised neural networks. Neural Comput. Appl. 2014, 25, 491–509. [Google Scholar] [CrossRef]
  23. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
  24. Zimmering, B.; Coelho, C.; Niggemann, O. Optimising Neural Fractional Differential Equations for Performance and Efficiency. In Proceedings of the 1st ECAI Workshop on “Machine Learning Meets Differential Equations: From Theory to Applications”, Santiago de Compostela, Spain, 20 October 2024; Volume 255, pp. 1–22. [Google Scholar]
  25. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  26. Liu, Y.; Roberts, J.; Yan, Y. A note on finite difference methods for nonlinear fractional differential equations with non-uniform meshes. Int. J. Comput. Math. 2018, 95, 1151–1169. [Google Scholar] [CrossRef]
  27. Luo, C.; Wang, X. Chaos in the fractional-order complex Lorenz system and its synchronization. Nonlinear Dyn. 2012, 71, 241–257. [Google Scholar] [CrossRef]
  28. Yu, Y.; Li, H.X. The synchronization of fractional-order Rössler hyperchaotic systems. Phys. A Stat. Mech. Its Appl. 2008, 387, 1393–1403. [Google Scholar] [CrossRef]
  29. El-Wakil, S.A.; Abulwafa, E.M.; Zahran, M.A.; Mahmoud, A.A. Time-fractional KdV equation: Formulation and solution using variational methods. Nonlinear Dyn. 2011, 65, 55–63. [Google Scholar] [CrossRef]
  30. Laskin, N. Fractional Schrödinger equation. Phys. Rev. E 2002, 66, 056108. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Plot of the predicted and ground-truth curves from the best run in Case Study 1 with α = 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 2. Evolution of the best run’s α during training for Case Study 1 with α = 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 3. Training loss across epochs for the best run in Case Study 1 with α = 0.5: (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 4. Evolution of the best run's α gradient during training for Case Study 1 with α = 0.5: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 5. Plot of the predicted and ground-truth curves from the best run in Case Study 2 with α initialised at 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 6. Plot of the predicted and ground-truth curves from the best run in Case Study 2 with α initialised at 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 7. Evolution of the best run’s α during training for Case Study 2 with α initialised at 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 8. Evolution of the best run’s α during training for Case Study 2 with α initialised at 0.99 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 9. Evolution of the best run's α gradient during training for Case Study 2 with α initialised at 0.5: (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 10. Evolution of the best run’s α gradient during training for Case Study 1 with α initialised at 0.99 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 11. Training loss across epochs for the best run in Case Study 2 with α initialised at 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 12. Training loss across epochs for the best run in Case Study 2 with α initialised at 0.99 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 13. Plot of the predicted and ground-truth curves from the best run in Case Study 3 with α = 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 14. Evolution of the best run’s α during training for Case Study 3 with α = 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 15. Training loss across epochs for the best run in Case Study 3 with α = 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 16. Evolution of the best run’s α gradient during training for Case Study 3 with α = 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 17. Plot of the predicted and ground-truth curves from the best run in Case Study 4 with α = 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Figure 18. Evolution of the best run’s α during training for Case Study 4 with α = 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 19. Evolution of the best run’s α gradient during training for Case Study 4 with α = 0.5 : (a) original Neural FDE, (b) clipped Neural FDE, and (c) pre-trained Neural FDE.
Figure 20. Training loss across epochs for the best run in Case Study 4 with α = 0.5 : (a) the original, (b) clipped, and (c) pre-trained Neural FDE models.
Table 1. Comparison between the solutions of problem (4), with λ = 1 , for the fractional orders α = 0.80 and α = 0.85 .
| t | y_0.80(t) | y_0.85(t) | \|Δy(t)\| |
|---|---|---|---|
| 0.1 | 0.846 | 0.863 | 0.017 |
| 0.5 | 0.562 | 0.572 | 0.010 |
| 1.0 | 0.387 | 0.381 | 0.006 |
| 5.0 | 0.088 | 0.066 | 0.022 |
| 7.5 | 0.057 | 0.040 | 0.017 |
| 10.0 | 0.043 | 0.029 | 0.014 |
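The entries in Table 1 are consistent with problem (4) being the linear fractional relaxation equation with a Caputo derivative, D^α y(t) = −λ y(t), y(0) = 1, whose solution is the Mittag-Leffler function y(t) = E_α(−λ t^α) (and, for the growing solutions of Table 13, y(t) = E_α(λ t^α)). As a minimal sketch under that assumption (the function names below are ours, not from the paper), the tabulated values can be reproduced by a direct power-series evaluation of E_α:

```python
import math

def mittag_leffler(alpha: float, z: float, tol: float = 1e-15) -> float:
    """One-parameter Mittag-Leffler function E_alpha(z) via its power series,
    E_alpha(z) = sum_{k>=0} z^k / Gamma(alpha*k + 1)."""
    total, k = 0.0, 0
    while True:
        g = alpha * k + 1.0
        if g > 170.0:  # math.gamma overflows past ~171; safe cut-off
            break
        term = z**k / math.gamma(g)
        total += term
        if k > 0 and abs(term) < tol:
            break
        k += 1
    return total

def y_relax(alpha: float, t: float, lam: float = 1.0) -> float:
    """Assumed solution of the Caputo relaxation problem
    D^alpha y = -lam * y, y(0) = 1, i.e. y(t) = E_alpha(-lam * t**alpha)."""
    return mittag_leffler(alpha, -lam * t**alpha)
```

For example, `y_relax(0.8, 1.0)` gives approximately 0.387, matching the Table 1 row at t = 1.0, while the positive-argument value E_0.8(1) ≈ 3.295 matches the t = 1.0 entry of Table 13. Note that the plain series becomes numerically delicate for large negative arguments; for the moderate time horizons in these tables, double precision suffices.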
Table 2. Final training loss (average MSE ± standard deviation over 3 runs) of the Neural FDE model using the three different training methods for Case Study 1.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.4 | 0.000090 ± 0.000050 | 0.000120 ± 0.000060 | 0.000080 ± 0.000070 |
| 0.5 | 0.000030 ± 0.000020 | 0.000130 ± 0.000100 | 0.000100 ± 0.000100 |
| 0.8 | 0.000030 ± 0.000020 | 0.000030 ± 0.000010 | 0.000010 ± 0.000001 |
| 0.99 | 0.000001 ± 0.000001 | 0.000010 ± 0.000010 | 0.000001 ± 0.000001 |
Table 3. Test MSE (average MSE ± standard deviation over 3 runs) of the Neural FDE using the three training methods, applied to Case Study 1.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.4 | 0.00016 ± 0.00008 | 0.00014 ± 0.00002 | 0.00015 ± 0.00016 |
| 0.5 | 0.00004 ± 0.00003 | 0.00017 ± 0.00014 | 0.00021 ± 0.00023 |
| 0.8 | 0.00011 ± 0.00004 | 0.00013 ± 0.00007 | 0.00003 ± 0.00001 |
| 0.99 | 0.00001 ± 0.00001 | 0.00018 ± 0.00024 | 0.00002 ± 0.00002 |
Table 4. Learnt values of α across three runs of the Neural FDE model using the three different training methods for Case Study 1. Results are reported as the mean ± standard deviation.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.4 | 0.60053 ± 0.03442 | 0.79190 ± 0.17455 | 0.55023 ± 0.29157 |
| 0.5 | 0.68963 ± 0.10462 | 0.83263 ± 0.16090 | 0.47243 ± 0.15194 |
| 0.8 | 0.94540 ± 0.04414 | 0.95633 ± 0.02652 | 0.88103 ± 0.06997 |
| 0.99 | 0.98280 ± 0.00332 | 0.97977 ± 0.00466 | 0.98657 ± 0.00180 |
Table 5. Final training loss (average MSE ± standard deviation over 3 runs) of the Neural FDE model using the three different training methods for Case Study 2.
| α Initialisation | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.061450 ± 0.035730 | 0.046410 ± 0.030470 | 0.043890 ± 0.031410 |
| 0.99 | 0.041230 ± 0.029030 | 0.035010 ± 0.007570 | 0.023510 ± 0.005310 |
Table 6. Test MSE (average MSE ± standard deviation over 3 runs) of the Neural FDE using the three training methods, applied to Case Study 2.
| α Initialisation | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 2.140550 ± 0.476330 | 2.386290 ± 3.141570 | 2.527410 ± 3.354300 |
| 0.99 | 2.253340 ± 2.918830 | 1.505450 ± 0.456910 | 0.465760 ± 0.514360 |
Table 7. Learnt values of α across three runs of the Neural FDE model using the three different training methods for Case Study 2. α initialised at 0.5 .
| Run | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 1 | 0.6072 | 0.6151 | 0.5827 |
| 2 | 0.5881 | 0.6515 | 0.7150 |
| 3 | 0.5510 | 0.5496 | 0.6826 |
Table 8. Learnt values of α across three runs of the Neural FDE model using the three different training methods for Case Study 2. α initialised at 0.99 .
| Run | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 1 | 0.9908 | 0.9925 | 0.9934 |
| 2 | 0.9921 | 0.9934 | 0.9921 |
| 3 | 0.9964 | 0.9941 | 0.9906 |
Table 9. Comparison between the solutions and right-hand-side terms of the fractional ODE for α = 0.80 and α = 0.85 .
| t | y_0.80(t) | y_0.85(t) | \|Δy(t)\| | f_0.80(t) | f_0.85(t) | \|Δf(t, y(t))\| |
|---|---|---|---|---|---|---|
| 0.10 | 0.356482 | 0.317708 | 0.038773 | 2.093102 | 2.124758 | 0.031655 |
| 0.25 | 0.735508 | 0.686033 | 0.049475 | 2.027857 | 2.052026 | 0.024169 |
| 0.50 | 1.154093 | 1.112515 | 0.041579 | 1.307284 | 1.264756 | 0.042528 |
| 0.75 | 1.041513 | 1.022050 | 0.019464 | −0.781984 | −0.971532 | 0.189548 |
| 1.00 | 0.250000 | 0.250000 | 0.000000 | −2.571448 | −2.747086 | 0.175638 |
Table 10. Final training loss (average MSE ± standard deviation over 3 runs) of the Neural FDE model using the three different training methods for Case Study 3.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.000140 ± 0.000020 | 0.000170 ± 0.000020 | 0.000100 ± 0.000001 |
| 0.8 | 0.000020 ± 0.000001 | 0.000010 ± 0.000001 | 0.000001 ± 0.000001 |
| 0.99 | 0.000001 ± 0.000001 | 0.000001 ± 0.000001 | 0.000010 ± 0.000010 |
Table 11. Test MSE (average MSE ± standard deviation over 3 runs) of the Neural FDE using the three training methods, applied to Case Study 3.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.001690 ± 0.000110 | 0.001810 ± 0.000100 | 0.001190 ± 0.000050 |
| 0.8 | 0.000610 ± 0.000040 | 0.000610 ± 0.000010 | 0.000290 ± 0.000060 |
| 0.99 | 0.000330 ± 0.000010 | 0.000330 ± 0.000020 | 0.000150 ± 0.000050 |
Table 12. Learnt values of α across three runs of the Neural FDE model using the three different training methods for Case Study 3. Results are reported as the mean ± standard deviation.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.978230 ± 0.002030 | 0.980170 ± 0.000250 | 0.974870 ± 0.000210 |
| 0.8 | 0.988530 ± 0.000050 | 0.988700 ± 0.000080 | 0.988400 ± 0.000370 |
| 0.99 | 0.990430 ± 0.000050 | 0.990430 ± 0.000050 | 0.989570 ± 0.000260 |
Table 13. Comparison between the solutions of problem (11), with λ = 1 , for the fractional orders α = 0.80 and α = 0.85 .
| t | y_0.80(t) | y_0.85(t) | \|Δy(t)\| |
|---|---|---|---|
| 0.1 | 1.189 | 1.163 | 0.026 |
| 0.5 | 1.928 | 1.846 | 0.082 |
| 1.0 | 3.295 | 3.125 | 0.169 |
| 5.0 | 185.471 | 174.573 | 10.898 |
| 7.5 | 2260.018 | 2127.085 | 132.932 |
| 10.0 | 27,533.053 | 25,913.470 | 1619.583 |
Table 14. Final training loss (average MSE ± standard deviation over 3 runs) of the Neural FDE model using the three different training methods for Case Study 4.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.000020 ± 0.000030 | 0.000110 ± 0.000140 | 0.000180 ± 0.000260 |
| 0.8 | 0.000040 ± 0.000030 | 0.000020 ± 0.000020 | 0.005830 ± 0.002250 |
| 0.99 | 0.000040 ± 0.000001 | 0.000050 ± 0.000040 | 0.000200 ± 0.000260 |
Table 15. Test MSE (average MSE ± standard deviation over 3 runs) of the Neural FDE using the three training methods, applied to Case Study 4.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.021130 ± 0.003410 | 0.024390 ± 0.012660 | 0.027110 ± 0.019670 |
| 0.8 | 0.014270 ± 0.010150 | 0.014860 ± 0.006110 | 0.032400 ± 0.010170 |
| 0.99 | 0.020810 ± 0.006660 | 0.009430 ± 0.001780 | 0.035060 ± 0.013860 |
Table 16. Learnt values of α across three runs of the Neural FDE model using the three different training methods for Case Study 4. Results are reported as the mean ± standard deviation.
| α | Original Neural FDE | Clipped Neural FDE | Pre-Trained Neural FDE |
|---|---|---|---|
| 0.5 | 0.995400 ± 0.000670 | 0.996800 ± 0.001120 | 0.997230 ± 0.001340 |
| 0.8 | 0.996070 ± 0.000740 | 0.995330 ± 0.001600 | 0.990330 ± 0.001670 |
| 0.99 | 0.995600 ± 0.001950 | 0.996330 ± 0.001140 | 0.997270 ± 0.001430 |
Share and Cite

MDPI and ACS Style

Coelho, C.; Costa, M.F.P.; Niggemann, O.; Ferrás, L.L. Methodologies for Improved Optimisation of the Derivative Order and Neural Network Parameters in Neural FDE Models. Fractal Fract. 2025, 9, 471. https://doi.org/10.3390/fractalfract9070471

