Article

A Physics-Informed Surrogate Model for the Bi-Flux Bevilacqua–Galeão Anomalous Diffusion Equation

by Douglas Ferraz Corrêa 1,2,*, Cláudio Motta Toledo 3, David A. Pelta 2 and Antônio Silva Neto 1

1 Department of Computational Modelling, Polytechnic Institute, Rio de Janeiro State University, Nova Friburgo 28625-570, Brazil
2 Department of Computer Science and Artificial Intelligence, University of Granada, 18014 Granada, Spain
3 Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
* Author to whom correspondence should be addressed.
Eng 2026, 7(6), 293; https://doi.org/10.3390/eng7060293 (registering DOI)
Submission received: 13 May 2026 / Revised: 11 June 2026 / Accepted: 12 June 2026 / Published: 14 June 2026

Abstract

Accurate modeling of bi-flux anomalous diffusion presents significant computational challenges in engineering. This paper investigates the effectiveness of physics-informed neural networks as surrogate models for the bi-flux anomalous diffusion equation. We investigate one-dimensional linear and nonlinear cases. Optimal hyperparameter configurations are determined using a modified differential evolution algorithm, guided by an objective function that leverages a combination of loss values. This optimization approach enables a rigorous evaluation of different neural network setups, providing valuable insights and practical guidance for researchers working with bi-flux anomalous diffusion phenomena. A comparison between physics-informed neural networks and conventional multilayer perceptrons is presented for the analyzed model. Finally, the capability of the best-performing models to act as virtual sensors is evaluated. This work provides guidance on the use of neural networks to efficiently and accurately tackle complex bi-flux anomalous diffusion problems, potentially accelerating research and development in fields where such processes are critical.

1. Introduction

Many transport processes encountered in engineering and the applied sciences depart from the behavior predicted by classical diffusion theory. Phenomena such as contaminant migration in heterogeneous soils, mass transport through porous media, and particle dispersion in biological or anisotropic substrata frequently exhibit anomalous diffusion, in which the mean-squared displacement of the diffusing quantity does not grow linearly in time, giving rise to sub-diffusive or super-diffusive regimes [1]. Fick’s law, expressed as a second-order parabolic equation, cannot reproduce this class of behavior; in particular, it cannot represent the temporary retention of particles that characterizes many non-Fickian processes [2]. By modeling retention through a discrete formulation and taking the continuous limit, ref. [3] obtained an equation whose negative fourth-order term is essential to represent retention. This formulation was later consolidated analytically for reactive media [4] and extended into a bimodal-flux theory in which particles in distinct energy states follow the primary and secondary fluxes, yielding a generalized law for multi-flux processes [5]. We will refer to this model as the Bevilacqua–Galeão model (BG model).
A central modeling choice in the BG framework is the law governing the redistribution parameter $\beta$, which controls the qualitative behavior of the solution. A concentration-dependent (sigmoid) law was adopted by [6], while [7] showed, through a second-moment analysis, that the model can reproduce both sub-diffusive and super-diffusive behavior depending on the parameters employed. Time-dependent exponential laws were explored by [2] from energy-transfer principles, revealing richer dynamics such as a hidden internal flux under state migration, while ref. [8] analyzed alternative secondary-flux definitions for anisotropic media and motivated the dependence of $\beta$ on the reactivity coefficient. These studies indicate that constant, spatially varying, and concentration-dependent forms of $\beta$ are all physically admissible, which motivates the parameter configurations considered here.
The identification of the BG model parameters has also been addressed through inverse analysis. Ref. [9] estimated the redistribution and retention parameters using sensitivity analysis to locate optimal measurements and Monte Carlo propagation to quantify uncertainty, an approach later extended with stochastic and Bayesian techniques, including Differential Evolution and the Markov Chain Monte Carlo method [10]. In this context, a modified Differential Evolution algorithm that contracts the search space as the objective function decreases was shown to lower the computational cost of estimation without a significant loss of accuracy [11]. Since the fourth-order operator makes the direct numerical solution costly, particularly when the equation must be solved repeatedly, these inverse studies further motivate the use of an efficient surrogate model.
In recent years, physics-informed neural networks (PINNs) have been increasingly employed to solve partial differential equations. If effective, they could provide a computationally efficient alternative to conventional numerical solvers such as finite element, finite difference, and finite volume methods. Along these lines, many authors have tested the usefulness of PINNs for partial differential equations [12,13,14], including heat conduction equations [15,16] and fourth-order partial differential equations [17,18]. While other studies have focused on improving neural network configurations [19,20,21,22], some have investigated the use of metaheuristics to optimize PINNs [23,24]. After carefully implementing PINNs for several well-known partial differential equations, ref. [25] concluded that PINNs have some drawbacks; for example, training can be time-consuming, and because the principles governing optimal network design are still not well established, hyperparameter tuning for stable and efficient convergence remains challenging. In that regard, ref. [26] proposed an automated framework for architecture and hyperparameter selection to enhance PINN performance, although only a limited set of hyperparameters was optimized.
Building on inverse-problem methodology, and following a strategy similar to that of [26], this work addresses this limitation by optimizing the architecture of a Physics-Informed Neural Network (PINN) designed to serve as an efficient surrogate model for the Bevilacqua–Galeão bi-flux equation. To achieve this, hyperparameter selection is formulated as an optimization problem within an inverse-analysis framework, as is common in many engineering applications [27]. A metaheuristic search strategy is used to identify viable architectures, with the metaheuristic acting as a feasible solution generator [28], and the PINN is trained using high-fidelity synthetic data obtained from the finite difference method.
The results not only highlight the potential of PINNs to reduce computational cost compared with traditional numerical methods but also provide practical guidelines for researchers seeking to leverage machine learning in the modeling of non-standard diffusion phenomena. To demonstrate the practical engineering utility of this optimized surrogate, it is employed to solve a complex inverse problem: estimating the heterogeneous physical properties and reconstructing the spatio-temporal concentration profile of a simulated contaminant spill. The main contributions of this work are:
(i)
A systematic, metaheuristic-driven evaluation of PINN hyperparameter configurations for the fourth-order bi-flux anomalous diffusion equation;
(ii)
A controlled comparison between physics-informed and purely data-driven surrogates sharing the same architecture, initialization, data, and training budget;
(iii)
The demonstration of the optimized surrogate in a practical engineering task, namely the estimation of heterogeneous-medium parameters and the reconstruction of the concentration of a simulated contaminant spill, with an approximately 52% reduction in recurring computational cost.
In Section 2, the main physical problem of interest, the bi-flux Bevilacqua–Galeão anomalous diffusion model, is described and its mathematical formulation is presented.
Section 3 presents the inverse problem methodology used to find the best hyperparameters, which includes a brief explanation of the differential evolution algorithm used.
In Section 4, the results of the study are presented and analyzed. Key findings are illustrated with figures and tables and are followed by a comprehensive discussion.
Finally, Section 5 presents concluding remarks on the results obtained in this research and indicates directions for future work.

2. Formulation of the Direct Problem

2.1. Bi-Flux Bevilacqua–Galeão Anomalous Diffusion Model

One commonly used mathematical equation to represent the balance of mass in a fixed control volume can be written as
$$\frac{\partial}{\partial t}\int_{cv} C(x,t)\,dV = -\oint_{cs} \Phi \cdot n\,dS, \tag{1}$$
where $C$ is the particle concentration at position $x$ and time $t$, $V$ is the volume, $cv$ is the control volume being analyzed, $\Phi$ is the flux of particles, $n$ is the unit normal to the control surface (positive outwards), $S$ is the surface area and, finally, $cs$ denotes the control surface.
The key point in the bi-flux anomalous diffusion theory is the hypothesis of a composite flux, which combines Fick’s classical diffusion flux and a secondary flux, proportional to the first but in the opposite direction. Here, we define this composite flux as
$$\Phi = \beta(x,t)\,\Phi_F + \left(1-\beta(x,t)\right)\Phi_S, \tag{2}$$
where $\Phi_F$ is the Fickian diffusion flux, defined as
$$\Phi_F = -D\,\nabla c,$$
and $\Phi_S$ is the secondary flux, given by
$$\Phi_S = \beta(x,t)\,R\,\nabla\left(\nabla\cdot(d\,\nabla c)\right),$$
where $\beta$ represents the fraction of particles that spread according to the classical diffusion formulation, $D$ is Fick's diffusion coefficient, $d$ is a proportional coefficient, and $R$ is the reactivity coefficient as described in [5,8].
Substituting Equation (2) into Equation (1) and applying the Gauss divergence theorem yields, after some algebraic manipulation, a general 1D bi-flux anomalous diffusion equation, written here as
$$\frac{\partial c}{\partial t} = A_1\frac{\partial c}{\partial x} + A_2\frac{\partial^2 c}{\partial x^2} + A_3\frac{\partial^3 c}{\partial x^3} + A_4\frac{\partial^4 c}{\partial x^4}, \tag{3}$$
where $A_1$, $A_2$, $A_3$ and $A_4$ are functions that depend on the nature of $\beta$, $D$, $d$ and $R$.
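The algebra behind the general equation can be sketched for one representative case. The derivation below assumes a 1D medium in which $D$, $R$ and $d$ are constants and $\beta$ depends only on $x$ (with $\beta' = d\beta/dx$); these are illustrative assumptions, and other dependencies of $\beta$ would change the intermediate steps.

```latex
\begin{align*}
\Phi &= -\beta D\,\frac{\partial c}{\partial x}
        + \beta(1-\beta)\,R\,d\,\frac{\partial^{3} c}{\partial x^{3}}, \\[4pt]
\frac{\partial c}{\partial t} &= -\frac{\partial \Phi}{\partial x}
   = \frac{\partial}{\partial x}\!\left(\beta D\,\frac{\partial c}{\partial x}\right)
   - \frac{\partial}{\partial x}\!\left(\beta(1-\beta)\,R\,d\,\frac{\partial^{3} c}{\partial x^{3}}\right) \\[4pt]
 &= \underbrace{D\beta'}_{A_1}\frac{\partial c}{\partial x}
   + \underbrace{D\beta}_{A_2}\frac{\partial^{2} c}{\partial x^{2}}
   \underbrace{-\,R\,d\,(1-2\beta)\,\beta'}_{A_3}\frac{\partial^{3} c}{\partial x^{3}}
   \underbrace{-\,R\,d\,\beta(1-\beta)}_{A_4}\frac{\partial^{4} c}{\partial x^{4}} .
\end{align*}
```

The factor $(1-2\beta)\beta'$ comes from differentiating the product $\beta(1-\beta)$. Setting $\beta' = 0$ recovers the constant-coefficient case, in which only the second- and fourth-order terms remain.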
An analytical solution for Equation (3) depends strongly on the assumptions adopted and is beyond the scope of the present work. Equation (3) requires four boundary conditions and one initial condition. The first set of boundary conditions used in this study was defined as
$$c(x=0,t) = 0,$$
and
$$c(x=L,t) = 0,$$
while the second set can be written as
$$\left.\frac{\partial c}{\partial x}\right|_{x=0} = 0,$$
and
$$\left.\frac{\partial c}{\partial x}\right|_{x=L} = 0,$$
with the following initial condition
$$c(x,t=0) = \sin\left(\frac{\pi x}{L}\right).$$
To better investigate the PINN for the Bevilacqua–Galeão bi-flux anomalous diffusion phenomenon, three distinct parameter configurations were evaluated. The first case assumed constant coefficients, the second postulated a logarithmic spatial dependence for the parameter $\beta$, and the third assumed that $\beta$ depends on the concentration. Despite these constitutive differences, the same initial and generalized boundary conditions were imposed across all three configurations. The aim was to assess the behavior of the PINN when dealing with different governing partial differential equations derived from the same underlying bi-flux theory while covering the main scenarios presented in the literature for the BG model.

2.1.1. The Constant Redistribution

By treating $\beta$, $D$, $d$ and $R$ as constants, the medium is modeled as homogeneous and time-invariant, meaning the diffusion behavior is decoupled from specific spatial or temporal coordinates.
Consequently, this formulation, with the representation adopted in this study, leads to
$$A_1 = 0,$$
$$A_2 = \beta D,$$
$$A_3 = 0,$$
and
$$A_4 = -\beta(1-\beta)\,R\,d,$$
which, when substituted in Equation (3), results in
$$\frac{\partial c}{\partial t} = \beta D\,\frac{\partial^2 c}{\partial x^2} - \beta(1-\beta)\,R\,d\,\frac{\partial^4 c}{\partial x^4}.$$

2.1.2. The Logarithmic Law for the Redistribution

One way to characterize the redistribution behavior as a function of the domain is to postulate the parameter $\beta$ as a logarithmic function of the spatial coordinate $x$. This spatial dependence is formulated as
$$\beta(x) = \log(x+1),$$
which leads to
$$A_1(x) = D\,\frac{d\beta(x)}{dx},$$
$$A_2(x) = D\,\beta(x),$$
$$A_3(x) = -R\,d\,\left(1-2\beta(x)\right)\frac{d\beta(x)}{dx},$$
and
$$A_4(x) = -R\,d\,\beta(x)\left(1-\beta(x)\right).$$
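A minimal sketch of how these coefficients can be evaluated at a given position is shown below. It assumes that the logarithm is the natural logarithm and that all four terms of Equation (3) enter with a plus sign, so that the retention effect appears as negative values of $A_3$ and $A_4$. The function name is hypothetical and not taken from the authors' code.

```python
import math


def log_law_coefficients(x, D, R, d):
    """Return (A1, A2, A3, A4) at position x for beta(x) = ln(x + 1).

    Assumes D, R and d are constants and the natural logarithm is intended.
    """
    beta = math.log(x + 1.0)
    dbeta = 1.0 / (x + 1.0)  # derivative of ln(x + 1) with respect to x
    A1 = D * dbeta
    A2 = D * beta
    A3 = -R * d * (1.0 - 2.0 * beta) * dbeta
    A4 = -R * d * beta * (1.0 - beta)
    return A1, A2, A3, A4


# Example with the medium properties quoted later for the heterogeneous test case.
A1, A2, A3, A4 = log_law_coefficients(x=0.5, D=0.05, R=0.001, d=1.0)
```

At $x = 0$ the law gives $\beta = 0$, so $A_2$ and $A_4$ vanish there and only the first- and third-order terms act on the solution.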

2.1.3. The Sigmoid Law for the Redistribution

The dependence of $\beta$ on the concentration has long been motivated by the authors of the model and was later analyzed in greater detail in [7]. The idea is to define
$$\beta(c) = \beta_{max} - \frac{\beta_{max} - \beta_{min}}{1 + e^{-\gamma(c - c_0)}},$$
where $c = c(x,t)$. This results in
$$A_1(x) = D\,\frac{\partial \beta(c)}{\partial x},$$
$$A_2(x) = D\,\beta(c),$$
$$A_3(x) = -R\,d\,\left(1-2\beta(c)\right)\frac{\partial \beta(c)}{\partial x},$$
and
$$A_4(x) = -R\,d\,\beta(c)\left(1-\beta(c)\right).$$

2.2. Numerical Solution

The finite difference method requires the definition of a spatial and temporal discretization grid. For the one-dimensional case considered in this study, the domain discretization was analyzed in both space and time in order to select a grid that is sufficiently fine for the purpose of this study.
Reference [29] analyzed the order of the truncation error associated with different derivative approximations, and those results guided the selection of the schemes below. The first time derivative follows the Forward-Time Centered-Space implicit approach,
$$\frac{\partial c(x,t)}{\partial t} \approx \frac{\phi_i^{n+1} - \phi_i^{n}}{\Delta t}, \quad O(\Delta t),$$
where $\phi_i^n$ is the concentration at the $i$-th spatial node and the $n$-th time node, i.e., $\phi_i^n = c(i\Delta x, n\Delta t)$. For the spatial derivatives, the scheme used is given as follows
$$\begin{aligned}
\frac{\partial c(x,t)}{\partial x} &\approx \frac{\phi_{i-2}^{n+1} - 8\phi_{i-1}^{n+1} + 8\phi_{i+1}^{n+1} - \phi_{i+2}^{n+1}}{12\,\Delta x}, & O(\Delta x)^4 \\
\frac{\partial^2 c(x,t)}{\partial x^2} &\approx \frac{-\phi_{i-2}^{n+1} + 16\phi_{i-1}^{n+1} - 30\phi_{i}^{n+1} + 16\phi_{i+1}^{n+1} - \phi_{i+2}^{n+1}}{12\,\Delta x^2}, & O(\Delta x)^4 \\
\frac{\partial^3 c(x,t)}{\partial x^3} &\approx \frac{-\phi_{i-2}^{n+1} + 2\phi_{i-1}^{n+1} - 2\phi_{i+1}^{n+1} + \phi_{i+2}^{n+1}}{2\,\Delta x^3}, & O(\Delta x)^2 \\
\frac{\partial^4 c(x,t)}{\partial x^4} &\approx \frac{\phi_{i-2}^{n+1} - 4\phi_{i-1}^{n+1} + 6\phi_{i}^{n+1} - 4\phi_{i+1}^{n+1} + \phi_{i+2}^{n+1}}{\Delta x^4}, & O(\Delta x)^2
\end{aligned}$$
The resulting linear system was solved using Gaussian elimination with back substitution.
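The five-point stencils can be checked independently of the solver by applying them to a smooth function with known derivatives. The sketch below does this for $\sin(x)$; it is only a consistency check of the weights, not a reproduction of the implicit scheme or of its boundary treatment, and the function name is hypothetical.

```python
import math


def stencil_derivatives(f, x, h):
    """Five-point central approximations of the first four derivatives of f at x."""
    fm2, fm1, f0, fp1, fp2 = (f(x + k * h) for k in (-2, -1, 0, 1, 2))
    d1 = (fm2 - 8 * fm1 + 8 * fp1 - fp2) / (12 * h)
    d2 = (-fm2 + 16 * fm1 - 30 * f0 + 16 * fp1 - fp2) / (12 * h ** 2)
    d3 = (-fm2 + 2 * fm1 - 2 * fp1 + fp2) / (2 * h ** 3)
    d4 = (fm2 - 4 * fm1 + 6 * f0 - 4 * fp1 + fp2) / h ** 4
    return d1, d2, d3, d4


# Compare against the exact derivatives of sin(x) at an arbitrary point.
x0, h = 0.3, 1e-2
approx = stencil_derivatives(math.sin, x0, h)
exact = (math.cos(x0), -math.sin(x0), -math.cos(x0), math.sin(x0))
errors = [abs(a - e) for a, e in zip(approx, exact)]
```

With five points, the first- and second-derivative stencils are noticeably more accurate than the third- and fourth-derivative ones, which is why the higher-order terms tend to dictate the spatial resolution required.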

2.3. The Neural Network Configuration

To establish a robust surrogate model capable of capturing the complex redistribution dynamics of the Bevilacqua–Galeão bi-flux formalism, a Physics-Informed Neural Network (PINN) architecture was implemented based on the approach proposed by [30]. The fundamental topology consists of a Multilayer Perceptron (MLP) with six input variables: the position $x$, the time $t$, and the coefficients $A_1$, $A_2$, $A_3$, and $A_4$. The output layer has one node, representing the concentration $c(x,t)$ for the given parameters $A_1$, $A_2$, $A_3$, $A_4$.
However, the internal topology, specifically the hyperparameters and activation mechanisms, was not fixed a priori. Instead, this work proposes an optimization framework for identifying suitable network topologies for bi-flux anomalous diffusion.
The search space $S$ is defined by the lower and upper bounds of these parameters, as detailed in Table 1. The optimization is driven by a Differential Evolution (DE) metaheuristic, which iteratively evolves the population of network configurations to identify the parameters within the search space $S$ that minimize the objective function $J$.
As can be seen from Table 1, this framework integrates a comprehensive dictionary of 17 activation functions, ranging from standard ReLU to functions such as Mish, GELU, and Swish (SiLU). This allows the evolutionary algorithm to identify activation functions capable of accurately representing the bi-flux dynamics.
Figure 1 illustrates the basic neural network structure used.
The training process for each candidate configuration utilizes the Adam optimizer, subject to an early stopping mechanism monitoring the validation loss with a patience parameter equal to 10% of the maximum number of epochs. The performance metric used to guide the evolutionary selection process is the sum of the training and validation losses, ensuring the model generalizes well to unseen data within the Bevilacqua–Galeão framework.
The Artificial Neural Network loss function was customized to account for the partial differential equation; this loss is presented here as
$$\mathcal{L}_{PINN} = \frac{1}{N}\sum_{j}^{N} \left\| f(x_j|\theta) - y_j \right\|_2^2 + \frac{1}{M}\sum_{i}^{M} \left\| g\left(x_i, f(x_i|\theta)\right) \right\|_2^2,$$
where $f(x_j|\theta)$ is the neural network being evaluated, which approximates $c(x_j, t_j)$, and $g$ represents the sum of losses at collocation points plus boundary and initial conditions, that is
$$g\left(x_i, f(x_i|\theta)\right) = \lambda\,\mathcal{L}_{pde,i} + \mathcal{L}_{bc,i} + \mathcal{L}_{ic,i}.$$
All derivatives for $\mathcal{L}_{pde}$ are obtained by automatic differentiation of the network output at the $i$-th collocation point. The boundary term enforces the two boundary-condition sets, and the initial term enforces $c(x,0) = \sin(\pi x / L)$. The PDE residual was evaluated at 8096 collocation points per epoch, sampled uniformly over the space–time domain. No additional weighting was applied beyond the factor $\lambda$.
Likewise, to isolate and assess the specific advantages of physical regularization, an MLP model is evaluated alongside the PINN. This is achieved methodologically by setting the physics-residual term ($g$) to zero by defining $\lambda = 0$ in the source code, thereby stripping the network architecture of all physical information.

3. Formulation of the Inverse Problem

3.1. Hyperparameter Estimation

In this work, the inverse problem of parameter estimation is formulated as an optimization task. To determine the optimal PINN configuration, we minimize an objective function ($J_1$) that quantifies the discrepancy between the model predictions and the reference data. To account for both optimization performance and generalization capability, $J_1$ is defined as the arithmetic mean of the training loss ($TL$) and the validation loss ($VL$):
$$J_1(TL, VL) = \frac{TL + VL}{2}.$$
Because both the training and validation losses are highly sensitive to the model's configuration, minimizing $J_1$ effectively translates into a hyperparameter optimization problem. The specific hyperparameters selected for this tuning process are detailed in Table 1.

3.2. Heterogeneous Medium Parameter Estimation

Once the optimal PINN architecture is established, it can be used as a fast forward solver for practical engineering problems. In this study, we apply the surrogate to an inverse parameter-estimation problem in a heterogeneous medium. The objective function $J_2$ was formulated as the residual sum of squares between the measured concentration profiles and the trial values
$$J_2(C_{exp}, C_{trial}) = \sum_{i=0}^{N} \left(C_{exp,i} - C_{trial,i}\right)^2,$$
where $N$ is the number of available data points. In this particular case, the reference data ($C_{exp}$) was synthetically generated using the finite difference method. To simulate low-noise experimental conditions, Gaussian noise with zero mean and a standard deviation of 0.01 was added to this numerical solution. Conversely, the predicted concentrations ($C_{trial}$) were evaluated using the PINN surrogate model based on the candidate parameters $A_1(x)$, $A_2(x)$, $A_3(x)$, and $A_4(x)$. This strategy was used to reduce the computational cost of the optimization process.
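The objective and the noise model described above can be sketched in a few lines. The example below uses plain Python lists and a fixed random seed; the function names and the placeholder profile are hypothetical and stand in for the finite difference reference and the surrogate prediction.

```python
import random


def residual_sum_of_squares(c_exp, c_trial):
    """J2: sum of squared differences between reference and trial concentrations."""
    return sum((e - t) ** 2 for e, t in zip(c_exp, c_trial))


def add_gaussian_noise(values, sigma=0.01, seed=0):
    """Return a noisy copy of values (zero-mean Gaussian, standard deviation sigma)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Placeholder profile standing in for the finite difference reference solution.
clean_profile = [0.0, 0.31, 0.59, 0.81, 0.95, 1.0, 0.95, 0.81, 0.59, 0.31, 0.0]
c_exp = add_gaussian_noise(clean_profile, sigma=0.01)

# A trial equal to the clean profile leaves only the noise contribution in J2.
j2_noise_floor = residual_sum_of_squares(c_exp, clean_profile)
```

Because the noise is added to the reference data, the objective does not reach zero even at the true parameters; its minimum sits at a noise floor of roughly $N\sigma^2$.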

3.3. Normalized Sensitivity Coefficient

The sensitivity coefficient ($X_{A_i}$) is defined in terms of the first-order partial derivative of the observable variable, $c$, with respect to the parameter of interest, $A_i$. The normalized sensitivity coefficient is used and is defined as
$$X_{A_i} = A_i\,\frac{\partial c}{\partial A_i}, \quad i = 1, 2, \ldots, 4.$$
For reliable parameter estimation, the system under consideration must be sufficiently sensitive to the parameters being estimated. In other words, small variations in a given parameter should produce measurable changes in the observable variable, allowing the influence of that parameter on the system response to be assessed. This analysis also indicates which positions and response intervals provide the most favorable conditions for parameter estimation and which parameters can be estimated simultaneously [27]. When two or more parameters are estimated at the same time, their sensitivity coefficients must be linearly independent; otherwise, the inverse problem may not have a unique solution [31].
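A normalized sensitivity coefficient of this kind can be approximated with a central finite difference around the nominal parameter value. The sketch below applies the idea to a toy closed-form model, a single sine mode that satisfies the constant-coefficient equation and the zero-concentration boundary conditions but not the zero-gradient ones. The toy model, the function names and the parameter values are illustrative assumptions rather than the solver used in the study.

```python
import math


def single_mode_concentration(x, t, A2, A4, L=1.0):
    """Toy solution of c_t = A2*c_xx + A4*c_xxxx with c(x, 0) = sin(pi*x/L)."""
    k = math.pi / L
    return math.exp((-A2 * k ** 2 + A4 * k ** 4) * t) * math.sin(k * x)


def normalized_sensitivity(model, params, name, rel_step=1e-4):
    """Approximate X = p * dc/dp for parameter `name` by a central difference."""
    base = params[name]
    if base == 0.0:
        return 0.0  # the normalized coefficient vanishes when the parameter is zero
    h = rel_step * abs(base)
    up = dict(params, **{name: base + h})
    down = dict(params, **{name: base - h})
    return base * (model(**up) - model(**down)) / (2.0 * h)


params = {"x": 0.5, "t": 4.0, "A2": 0.01, "A4": -1e-5}
X_A2 = normalized_sensitivity(single_mode_concentration, params, "A2")
X_A4 = normalized_sensitivity(single_mode_concentration, params, "A4")
```

Evaluating the same function over a grid of positions and times gives sensitivity curves of the kind used to choose measurement locations.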

3.4. Optimization Strategy

Proposed by Storn and Price [32], Differential Evolution is a heuristic optimization method based on vector operations in which the weighted difference between two vectors is added to a third vector to generate candidate solutions. We summarize our implemented version of this method as follows:
  • Generation of an initial random population
    $$q_{k,j}^{t=0} = q_{L,k} + rand_{k,j}\,(q_{U,k} - q_{L,k}), \quad j = 1, 2, \ldots, N_{pop}$$
    where $q_{U,k}$ and $q_{L,k}$ are the upper and lower bounds of the $k$-th variable, $rand_{k,j}$ is a random number between 0 and 1, and $N_{pop}$ is the size of the population.
  • Mutation operation in order to generate a candidate
    $$v_l^t = q_{r0}^t + F\,(q_{r1}^t - q_{r2}^t)$$
    where $F$ is a perturbation factor, and the vectors $q_{r0}^t$, $q_{r1}^t$ and $q_{r2}^t$ are randomly chosen from within the population and must be distinct from each other. If the sizes of $q_{r0}^t$, $q_{r1}^t$ or $q_{r2}^t$ differ, a random number is sampled from a uniform distribution between 0 and 1; if this number is larger than the probability of adding an element, $p_a = 0.6$, the missing element is added to the smaller vector. This process is repeated until all vectors have the same size, so that the mutation operation can take place. These changes are made on a copy of the original vector, which is left unchanged.
  • The next step is the crossover operation, where the generated vector can be accepted or not depending on the criterion
    $$q_l^{t+1} = \begin{cases} v_l^t, & \text{if } CR \geq rand_{k,l} \\ q_l^t, & \text{otherwise} \end{cases}$$
    where $CR$ is the crossover probability.
  • Finally, if the new vector $v_l^t$ provides a better value for the objective function than vector $q_l^t$, the latter is replaced by the former in the next generation; otherwise $q_l^t$ remains in the population for one more generation.
  • Repeat steps 2–4 until a predefined maximum number of generations is achieved.
Each individual (referred to as a gene) is a structure whose elements represent the hyperparameters, according to Table 1.
The idea is that each generation refines the population to converge towards an optimal configuration. After the last generation is achieved, the best-performing individual (the one with the lowest objective function value) is selected as the optimal architecture. The search-space bounds in Table 1 were deliberately broad. In particular, the learning-rate upper bound of 1.0 was chosen so that the metaheuristic itself, rather than prior assumptions, would discard divergent configurations: candidates with excessively large learning rates fail to reduce the validation loss and are naturally eliminated by the selection step, at a modest computational cost. The stopping criterion was fixed to 20 generations with a population of 10 individuals, and each (CR, F) setting was repeated four times under independent random seeds to account for the stochastic nature of the algorithm.
Because the presented algorithm requires numerical inputs, categorical parameters like activation functions were mapped to discrete integer values (Table 2). This integer encoding allowed the algorithm to seamlessly apply mathematical operations during the mutation and crossover phases.
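The steps listed above can be sketched for the simpler case of fixed-length real vectors. The version below is an illustrative assumption rather than the authors' implementation: it omits the variable-length handling governed by $p_a$ and the integer encoding of categorical hyperparameters, replaces individuals in place as soon as a better candidate is found, and clips candidates to the bounds as an added safeguard.

```python
import random


def differential_evolution(objective, lower, upper, n_pop=10, n_gen=20,
                           F=0.5, CR=0.9, seed=0):
    """Minimize `objective` over the box [lower, upper] with a basic DE loop."""
    rng = random.Random(seed)
    dim = len(lower)
    # Step 1: random initial population inside the bounds.
    pop = [[lower[k] + rng.random() * (upper[k] - lower[k]) for k in range(dim)]
           for _ in range(n_pop)]
    cost = [objective(q) for q in pop]
    for _ in range(n_gen):
        for l in range(n_pop):
            # Step 2: mutation from three distinct members other than l.
            r0, r1, r2 = rng.sample([j for j in range(n_pop) if j != l], 3)
            v = [pop[r0][k] + F * (pop[r1][k] - pop[r2][k]) for k in range(dim)]
            # Added safeguard: keep the candidate inside the search space.
            v = [min(max(v[k], lower[k]), upper[k]) for k in range(dim)]
            # Step 3: component-wise crossover with probability CR.
            u = [v[k] if rng.random() <= CR else pop[l][k] for k in range(dim)]
            # Step 4: greedy selection between the candidate and the current member.
            cu = objective(u)
            if cu < cost[l]:
                pop[l], cost[l] = u, cu
    best = min(range(n_pop), key=cost.__getitem__)
    return pop[best], cost[best]


# Usage on a simple test function (sum of squares) instead of a network training run.
best_q, best_cost = differential_evolution(
    lambda q: sum(v * v for v in q), [-5.0] * 3, [5.0] * 3, n_pop=20, n_gen=200)
```

In the hyperparameter search, each call to `objective` corresponds to training one candidate network, so the population size and number of generations directly set the number of training runs.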

3.5. Training Data

To train the model, a comprehensive dataset comprising the input vector $(x, t, A_1, A_2, A_3, A_4)$ is required. The data generation process employed a hierarchical sampling strategy, beginning with the establishment of bounding intervals for the fundamental physical parameters. Specifically, the parameter spaces were defined as $R \in [5 \times 10^{-5}, 2 \times 10^{-4}]$, $d \in [0.005, 0.02]$, and $D \in [0.005, 0.02]$. Samples for these parameters were then drawn from a uniform distribution within the defined limits. Subsequently, these values were substituted into the governing equations to compute the corresponding derived coefficients $A_1$ through $A_4$. The parameters were then tested to ensure they generated a physically feasible solution (mainly a positivity test), and the accepted dataset was partitioned using a 70/15/15 split for training, validation, and testing, respectively.
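The sampling and partitioning steps can be sketched as follows, reading the bounds as $R \in [5 \times 10^{-5}, 2 \times 10^{-4}]$ and $d, D \in [0.005, 0.02]$. The helper names are hypothetical, and the physical-feasibility (positivity) test is not reproduced because its exact form is not given.

```python
import random

# Bounding intervals for the fundamental physical parameters.
BOUNDS = {"R": (5e-5, 2e-4), "d": (0.005, 0.02), "D": (0.005, 0.02)}


def sample_parameters(n, seed=0):
    """Draw n parameter sets uniformly within BOUNDS."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}
            for _ in range(n)]


def split_70_15_15(items, seed=0):
    """Shuffle a copy of items and split it into train/validation/test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = (70 * len(shuffled)) // 100
    n_val = (15 * len(shuffled)) // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


samples = sample_parameters(100)
train_set, val_set, test_set = split_70_15_15(samples)
```

Splitting at the level of parameter sets, rather than individual space-time points, keeps every point generated from one parameter set in the same partition; whether the original pipeline does this is not stated.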

4. Results

4.1. Numerical Validation and Verification

As can be seen from Figure 2, the error falls rapidly and levels off once $N_x$ reaches roughly 700, allowing us to define $N_x = 1101$ with confidence. The same behaviour can be seen in the temporal mesh refinement of Figure 3, which shows the relative $L_2$ error norm as a function of the number of temporal nodes $N_t$, with the spatial grid held fixed. The error decreases monotonically and saturates once $N_t$ reaches roughly 1000, which led us to use $N_t = 1501$.

4.2. Surrogate Model

All experiments were executed on a Linux-based machine with an AMD Ryzen 9 9950X CPU, an NVIDIA RTX 3060 GPU, and 192 GB of Kingston DDR5 RAM (Fountain Valley, CA, USA). Each differential-evolution search used a population of 10 evolved over 20 generations and was repeated four times for every ( C R , F ) setting with independent random seeds.
The optimization results are detailed in Table 3 for the MLP approach with different values of the crossover rate and mutation factor and Table 4 for the PINN approach.
Surprisingly, Table 3 shows that the Differential Evolution (DE) algorithm consistently favored a simple single hidden-layer architecture over more complex neural network topologies. This streamlined MLP proved highly efficient, reducing the computational cost of the concentration simulation at $t = 4$ by 80% in comparison with the finite difference method while maintaining acceptable validation accuracy. Its predictive capability, however, was confined to the interpolation domain.
The subsequent optimization results for the PINN framework are summarized in Table 4, which shows that the evolutionary search favored smooth activation functions, with tanh dominating the global distribution.
Figure 4 presents the training history of the best PINN configuration, showing that the total loss and each of its four components (the data loss, the PDE residual, and the boundary and initial-condition terms) decrease together over training and settle after roughly $10^4$ epochs.
To compare the two approaches under controlled conditions, the best architecture identified by the Differential Evolution search was trained twice using identical settings, namely the same initialization, data, optimizer, and number of epochs. The two runs differed only in the physics weight, with λ = 1 for the PINN and λ = 0 for the purely data-driven MLP.
Figure 5 compares the predictive capabilities of the optimized models when evaluated on input data outside the training range; more specifically, the values of $R$ and $D$ were 10% higher than the upper bounds of their training intervals. As the input parameters deviate further from the training bounds, the purely data-driven predictions progressively fail to capture the fundamental physical characteristics of the modeled phenomenon.
To quantify this behavior, Table 5 reports the RMSE of both models with respect to the solver reference, evaluated on the test set (interpolation) and on the extrapolation scenarios. In interpolation, the two models achieve comparable accuracy, with the MLP attaining a slightly lower RMSE than the PINN, an expected consequence of the physics residual acting as a regularizer, which can marginally penalize the in-distribution fit in exchange for better generalization. In extrapolation, however, the RMSE of the MLP increases by over two orders of magnitude, whereas that of the PINN increases by about tenfold.

4.3. Training Stability and Robustness

To evaluate the training stability and robustness of the chosen architecture, the best configuration was probed along two further axes.
Initialization: Retraining under several random seeds produced final errors with a small standard deviation. The relative $L_2$ error varied by under 10% of its mean across seeds, so convergence was reproducible and not the result of a fortunate initialization.
Noise: Adding zero-mean Gaussian noise to the training targets, with standard deviation up to 0.05, changed the RMSE of the solution only marginally, while a standard deviation greater than 0.1 led to the non-convergence of the PDE loss.

4.4. Sensitivity Analysis

Figure 6 presents the sensitivity coefficient as a function of position x at several time instants for each coefficient of the BG model.
The low sensitivity to $A_3$ and $A_4$ indicates that these coefficients are intrinsically harder to recover from concentration data. The analysis also identifies the regions of higher sensitivity, which are the most suitable locations for sensors.

4.5. Parameter Estimation of a Heterogeneous Medium

The true advantage of a surrogate model becomes evident in scenarios requiring repeated evaluations of Equation (3), such as inverse problems of parameter estimation. As outlined in the previous section, we applied the optimized surrogate to estimate the heterogeneous properties of a medium following a contaminant spill. Our primary objective is to identify the underlying spatial parameters necessary to accurately replicate this transport phenomenon.
For this application, the true heterogeneous medium was simulated assuming a logarithmic redistribution behavior, with $D = 0.05$, $R = 0.001$, and $d = 1$, yielding the reference values for $A_1(x)$ through $A_4(x)$. The optimization results, obtained after 30 executions of the Differential Evolution algorithm minimizing $J_2$, are illustrated in Figure 7. The figure compares the algorithm's mean estimated profiles against the true values for $A_1(x)$, $A_2(x)$, $A_3(x)$, and $A_4(x)$, together with 95% confidence intervals representing the variability obtained across the 30 executions of the Differential Evolution algorithm.
Notably, the spatial regions adjacent to the domain edges exhibit negligible sensitivity to these parameters, consistent with the sensitivity analysis results in Figure 6. This localized lack of sensitivity is primarily driven by the enforced fixed boundary conditions, and the spatial extent of this boundary influence is clearly visible in the confidence bands shown. The execution time of the optimization loop was also recorded. Using the traditional FDM as the solver, a complete execution of the Differential Evolution algorithm required approximately 1319.11 s to converge. In contrast, replacing the FDM with the optimized PINN surrogate reduced the runtime of the same optimization process to 635.01 s, a saving of about 684 s (roughly 52%) per execution. For a fair assessment of this comparison, the one-off training cost of the surrogate must also be considered. Training the best PINN configuration required approximately 3700 s on the hardware described in Section 4.2. Taking this into account, the surrogate model becomes advantageous whenever more than 5 DE executions are required; in the present study, the 30 executions used to construct the confidence intervals amortize the training cost by a wide margin.
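The break-even point quoted above follows directly from the three reported timings. The short calculation below reproduces it; the variable and function names are hypothetical, and the training time is the approximate figure given in the text.

```python
import math

T_FDM = 1319.11    # seconds per DE execution with the finite difference solver
T_PINN = 635.01    # seconds per DE execution with the PINN surrogate
T_TRAIN = 3700.0   # seconds, approximate one-off training cost of the surrogate

saving_per_run = T_FDM - T_PINN
relative_reduction = saving_per_run / T_FDM
# Smallest whole number of executions for which the surrogate route is cheaper.
break_even_runs = math.ceil(T_TRAIN / saving_per_run)


def total_time(n_runs, use_surrogate):
    """Total wall time for n_runs DE executions, including training if applicable."""
    if use_surrogate:
        return T_TRAIN + n_runs * T_PINN
    return n_runs * T_FDM


fdm_total_30 = total_time(30, use_surrogate=False)
pinn_total_30 = total_time(30, use_surrogate=True)
```

With these figures the surrogate route is slower for five executions and faster from the sixth onward, and over the 30 executions used for the confidence intervals the total time drops from about 39,573 s to about 22,750 s, training included.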

4.6. Reconstruction of the Concentration Curve

In scenarios where the availability of concentration sensors is limited, it is crucial that these sensors are strategically placed in regions where the model exhibits high sensitivity to the target parameters. Although these optimal locations can be identified through sensitivity analysis of Equation (3), as demonstrated by [9], sparse physical data still leave gaps in spatial observation. To address this limitation, the proposed surrogate model serves as a computationally efficient virtual sensor. It provides a low-cost method for reconstructing the full concentration curve in regions lacking physical instrumentation. The mean parameter values estimated via the inverse problem were subsequently used to simulate the time evolution of the concentration profile. Figure 8 compares the concentration generated using the true parameters ($C_{exp}$) with that generated using the estimated parameters ($C_{predicted}$), alongside the absolute error between the two.

5. Concluding Remarks

This work investigated the optimization of Physics-Informed Neural Network architectures for solving the bi-flux Bevilacqua–Galeão anomalous diffusion equation. The Differential Evolution algorithm proved effective in navigating the complex hyperparameter search space, consistently identifying configurations that achieved objective-function values $J_1$ in the range $10^{-3}$ to $10^{-2}$. A particularly noteworthy finding is that for the purely data-driven MLP, shallow architectures with one or two hidden layers dominated; for the PINN, the top-performing configurations concentrated between 2 and 7 hidden layers, although competitive solutions with up to 10 layers were also found (Table 4).
However, when tested on out-of-distribution parameters, the purely data-driven MLP performed markedly worse than the PINN (Table 5), demonstrating an inability to extrapolate. This structural limitation indicates the need to embed physical knowledge into the learning process, thereby motivating the transition to a Physics-Informed Neural Network.
In the physics-informed approach, the evolutionary search consistently favored smooth activation functions, such as hyperbolic tangent (tanh), with the best architectures ranging from 2 to 10 hidden layers. A key observation from this optimization process is the direct correlation between physical complexity and network topology. When generalizing across varying redistribution laws, the model required deeper, more complex configurations to capture the underlying physics; conversely, scenarios with constant parameters allowed for simpler architectures. We anticipate that these insights will provide practical, data-driven guidance for researchers developing PINN-based surrogate models for bi-flux anomalous diffusion phenomena.
Furthermore, the practical utility of the optimized PINN was successfully demonstrated through an engineering inverse problem. By using the surrogate as a fast solver, the Differential Evolution algorithm accurately estimated the spatially varying parameters of a heterogeneous medium and successfully reconstructed the complete concentration of a contaminant spill with an acceptable error margin. Notably, using the surrogate model for this task reduced computational time by approximately 52%.
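The inverse-problem loop summarized above can be sketched as follows. This is a minimal illustration of the classic DE/rand/1/bin scheme of Storn and Price [32], not the modified variant used in this work. The forward model `toy_forward`, its two parameters, the bounds, and the population settings are hypothetical stand-ins chosen so that the example runs quickly; the values of F and CR are borrowed from the settings listed in Tables 3 and 4.

```python
import math
import random


def differential_evolution(cost, bounds, pop_size=20, F=0.8, CR=0.5,
                           generations=300, seed=0):
    """Classic DE/rand/1/bin minimizer (not the authors' modified variant)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    value = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def toy_forward(params, xs):
    """Hypothetical fast solver standing in for the FDM or the PINN surrogate."""
    amplitude, decay = params
    return [amplitude * math.exp(-decay * x) for x in xs]


def make_misfit(forward, xs, observed):
    """Sum of squared residuals between the forward model and the observations."""
    def misfit(params):
        return sum((m - o) ** 2 for m, o in zip(forward(params, xs), observed))
    return misfit


xs = [i / 10 for i in range(11)]
true_params = (1.0, 2.0)
observed = toy_forward(true_params, xs)  # synthetic, noise-free "measurements"

estimate, best_cost = differential_evolution(
    make_misfit(toy_forward, xs, observed), bounds=[(0.0, 5.0), (0.0, 5.0)]
)
print("estimated parameters:", [round(p, 4) for p in estimate])
print("final misfit:", best_cost)
```

Because the solver enters only through the `forward` argument, swapping the FDM for a trained surrogate leaves the optimization loop unchanged, which is why the runtime saving scales with the number of cost evaluations.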
Future work should extend this analysis to two-dimensional domains and investigate the surrogate’s performance under varying physical parameters beyond the training regime. The incorporation of uncertainty quantification mechanisms and the exploration of learning strategies for adapting the optimized architectures to related diffusion phenomena represent promising directions for advancing this research.

Author Contributions

D.F.C.: conceptualization, data curation, investigation, methodology, software, validation, visualization, and writing—original draft. C.M.T.: supervision, validation. D.A.P.: supervision, conceptualization. A.S.N.: project administration, supervision, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

D. Corrêa also acknowledges CAPES for the scholarship for his stay at the University of Granada (Grant CAPES/PrInt No. 88887.717186/2022-00). A. Silva Neto also acknowledges the financial support provided by SENAI CIMATEC. D. Pelta acknowledges support from project PID2023-146575NB-I00, MICIU/AEI/10.13039/501100011033, including FEDER Funds, UE.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
BG: Bevilacqua–Galeão
MLP: Multi-Layer Perceptron
PINN: Physics-Informed Neural Networks
CR: Crossover Rate
F: Mutation Factor
PDE: Partial Differential Equation

References

  1. Metzler, R.; Jeon, J.-H.; Cherstvy, A.G.; Barkai, E. Anomalous diffusion models and their properties: Non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking. Phys. Chem. Chem. Phys. 2014, 16, 24128–24164.
  2. Bevilacqua, L.; Jiang, M.; Silva Neto, A.J.; Galeão, A.C.R.N. An evolutionary model of bi-flux diffusion processes. J. Braz. Soc. Mech. Sci. Eng. 2015, 38, 1421–1432.
  3. Bevilacqua, L.; Galeão, A.C.N.R.; Costa, F.P. On the significance of higher order differential terms in diffusion processes. J. Braz. Soc. Mech. Sci. Eng. 2011, 33, 166–175.
  4. Bevilacqua, L.; Galeão, A.C.N.R.; Costa, F.P. A new analytical formulation of retention effects on particle diffusion process. An. Acad. Bras. Cienc. 2011, 83, 1443–1464.
  5. Bevilacqua, L.; Galeão, A.C.N.R.; Simas, J.G.; Doce, A.P.R. A new theory for anomalous diffusion with a bimodal flux distribution. J. Braz. Soc. Mech. Sci. Eng. 2013, 35, 431–440.
  6. Lugon Junior, J.; Vasconcelos, J.V.; Bevilacqua, L.; Silva Neto, A.J. Desenvolvimento de solução para problema unidimensional de advecção e difusão bi-fluxo. In Proceedings of the 3º Seminário Internacional de Estatística com R, Niterói, Brazil, 22–24 May 2018; Galoá: Campinas, Brazil, 2018; Available online: https://proceedings.science/iii-ser-2018/papers/desenvolvimento-de-solucao-para-problema-unidimensional-de-adveccao-e-difusao-bi (accessed on 10 April 2026).
  7. Junior, J.L.; Vasconcellos, J.F.V.; Knupp, D.C.; Marinho, G.M.; Bevilacqua, L.; Silva Neto, A.J. Solution of Fourth-Order Diffusion Equations and Analysis Using the Second Moment. DDF 2020, 399, 10–20.
  8. Jiang, M.; Bevilacqua, L.; Silva Neto, A.J.; Galeão, A.C.N.R.; Zhu, J. Bi-flux theory applied to the dispersion of particles in anisotropic substratum. Appl. Math. Model. 2018, 64, 121–134.
  9. Silva, L.G.; Knupp, D.C.; Bevilacqua, L.; Galeão, A.C.N.R.; Silva Neto, A.J. Inverse problem in anomalous diffusion with uncertainty propagation. Comput. Assist. Methods Eng. Sci. 2014, 21, 245–255.
  10. Silva, L.G.D.; Knupp, D.C.; Bevilacqua, L.; Galeão, A.C.N.R.; Silva Neto, A.J. Formulação E Solução De Um Problema Inverso De Difusão Anômala Com Técnicas Estocásticas. Ciência E Nat. 2014, 36, 82–96.
  11. da Silva, L.G.; Knupp, D.C.; da Silva Neto, A.J.; Câmara, L.D.T.; Bevilacqua, L.; Galeão, A.C.N.R.; Santiago, O.L. Uma Versão Modificada do Método Evolução Diferencial para Estimativa de Parâmetros do Modelo de Difusão Anômala Assimétrica. Proc. Ser. Braz. Soc. Comput. Appl. Math. 2015, 3, 010452.
  12. He, Q.; Solano, D.B.; Tartakovsky, G.; Tartakovsky, A.M. Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport. Adv. Water Resour. 2020, 141, 103610.
  13. Bandai, T.; Ghezzehei, T.A. Forward and inverse modeling of water flow in unsaturated soils with discontinuous hydraulic conductivities using physics-informed neural networks with domain decomposition. Hydrol. Earth Syst. Sci. 2022, 26, 4469–4495.
  14. Almajid, M.M.; Abu-Al-Saud, M.O. Prediction of porous media fluid flow using physics informed neural networks. J. Pet. Sci. Eng. 2022, 208, 109205.
  15. Jalili, D.; Jang, S.; Jadidi, M.; Giustini, G.; Keshmiri, A.; Mahmoudi, Y. Physics-informed neural networks for heat transfer prediction in two-phase flows. Int. J. Heat Mass Transf. 2024, 221, 125089.
  16. Laubscher, R. Simulation of multi-species flow and heat transfer using physics-informed neural networks. Phys. Fluids 2021, 33, 087101.
  17. Pratap, V.; Kumar, P.; Rao, C.; Gilchrist, M.D.; Tripathi, B.B. Modelling fourth-order hyperelasticity in soft solids using physics informed neural networks without labelled data. Brain Res. Bull. 2025, 224, 111318.
  18. Zhang, W.; Li, J. The robust physics-informed neural networks for a typical fourth-order phase field model. Comput. Math. Appl. 2023, 140, 64–77.
  19. De Luca, P.; Marcellino, L. Towards Numerical Method-Informed Neural Networks for PDE Learning. Mathematics 2025, 13, 2392.
  20. Sitzmann, V.; Martel, J.; Bergman, A.; Lindell, D.; Wetzstein, G. Implicit neural representations with periodic activation functions. Adv. Neural Inf. Process. Syst. 2020, 33, 7462–7473.
  21. Moseley, B.; Markham, A.; Nissen-Meyer, T. Finite basis physics-informed neural networks (FBPINNs): A scalable domain decomposition approach for solving differential equations. arXiv 2021, arXiv:2107.07871.
  22. Rodriguez-Torrado, R.; Ruiz, P.; Cueto-Felgueroso, L.; Green, M.C.; Friesen, T.; Matringe, S.; Togelius, J. Physics-informed attention-based neural network for solving non-linear partial differential equations. arXiv 2021, arXiv:2105.07898.
  23. Ahmad, T.; Sulaiman, M.; Bassir, D.; Alshammari, F.S.; Laouini, G. Enhanced Numerical Solutions for Fractional PDEs Using Monte Carlo PINNs Coupled with Cuckoo Search Optimization. Fractal Fract. 2025, 9, 225.
  24. Aslam, M.N.; Aslam, M.W.; Arshad, M.S.; Afzal, Z.; Hassani, M.K.; Zidan, A.M.; Akgül, A. Neuro-computing solution for Lorenz differential equations through artificial neural networks integrated with PSO-NNA hybrid meta-heuristic algorithms: A comparative study. Sci. Rep. 2024, 14, 7518.
  25. Baty, H. A hands-on introduction to Physics-Informed Neural Networks for solving partial differential equations with benchmark tests taken from astrophysics and plasma physics. arXiv 2024, arXiv:2403.00599.
  26. Wang, Y.; Han, X.; Chang, C.; Zha, D.; Braga-Neto, U.; Hu, X. Auto-PINN: Understanding and Optimizing Physics-Informed Neural Architecture. arXiv 2023, arXiv:2205.13748.
  27. Silva Neto, A.J.; Becceneri, J.C.; Campos Velho, H.F. Computational Intelligence Applied to Inverse Radiative Transfer Problems; EdUERJ: Rio de Janeiro, Brasil, 2016. (In Portuguese)
  28. Raoui, H.E.; Cabrera-Cuevas, M.; Pelta, D.A. The Role of Metaheuristics as Solutions Generators. Symmetry 2021, 13, 2034.
  29. Vasconcellos, J.F.V.; Marinho, G.M.; Zani, J.H. Análise numérica da equação da difusão anômala com fluxo bimodal. Rev. Int. Métodos Numéricos Para Cálculo Diseño Ing. 2017, 33, 242–249.
  30. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707.
  31. Beck, J.V.; Blackwell, B.; St Clair, C.R., Jr. Inverse Heat Conduction—Ill-Posed Problems, 1st ed.; John Wiley & Sons: New York, NY, USA, 1985.
  32. Storn, R.; Price, K. Differential Evolution—A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Glob. Optim. 1997, 11, 341–359.
Figure 1. Schematic representation of the multi-layer perceptron neural network and some of its hyperparameters. Green nodes represent the input layer variables, blue nodes are the hidden layer neurons, and the pink node is the output layer variable.
Figure 2. Relative $L_2$ error norm of the computed concentration, measured against the solution obtained on the finest grid ($N_x = 5001$, $N_t = 10{,}001$), as a function of the number of spatial nodes $N_x$, with the temporal grid held fixed.
Figure 3. Relative $L_2$ error norm of the computed concentration, measured against the solution obtained on the finest grid ($N_x = 5001$, $N_t = 10{,}001$), as a function of the number of temporal nodes $N_t$, with the spatial grid held fixed.
Figure 4. Training history of the best PINN configuration. (a) Data, PDE-residual, boundary-condition, and initial-condition loss components versus epoch; (b) validation mean-squared error.
Figure 5. Comparison of the best MLP and PINN models obtained from Differential Evolution for values outside the training-data range.
Figure 6. Local sensitivity of the concentration field to the bi-flux coefficients.
Figure 7. True and estimated values, with 95% confidence intervals, for $A_1(x)$, $A_2(x)$, $A_3(x)$, and $A_4(x)$.
Figure 8. Spatiotemporal concentration profiles generated using the true parameters ($C_{exp}$) and the estimated parameters ($C_{predicted}$), alongside the absolute error distribution.
Table 1. Hyperparameter search space boundaries.

| Hyperparameter | Lower Bound | Upper Bound |
|---|---|---|
| Hidden Layers | 1 | 15 |
| Neurons per Layer (Layer Width) | 1 | 512 |
| Learning Rate | $10^{-5}$ | 1.0 |
| Epochs | 10 | 30,000 |
| Batch Size | 2 | 32,768 |
| Optimizer | 0 | 9 |
| Activation Functions | 0 | 16 |
Table 2. Mapping of indices to activation functions in the neural network hyperparameter optimization code.

| Index | Activation Function |
|---|---|
| 0 | sigmoid |
| 1 | relu |
| 2 | softmax |
| 3 | linear |
| 4 | tanh |
| 5 | softplus |
| 6 | softsign |
| 7 | elu |
| 8 | selu |
| 9 | exponential |
| 10 | leaky_relu |
| 11 | relu6 |
| 12 | silu |
| 13 | hard_silu |
| 14 | mish |
| 15 | gelu |
| 16 | log_softmax |
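To illustrate how the bounds of Table 1 and the index mapping of Table 2 can be combined, the Python sketch below decodes a real-valued candidate into a usable configuration. The clip-then-round decoding and the flat gene with a single width and a single activation are assumptions made for illustration; the encoding actually used in this work, which stores per-layer widths and activations, is not reproduced here. The dictionary keys and the example gene are hypothetical.

```python
# Index-to-name mapping from Table 2.
ACTIVATIONS = [
    "sigmoid", "relu", "softmax", "linear", "tanh", "softplus", "softsign",
    "elu", "selu", "exponential", "leaky_relu", "relu6", "silu", "hard_silu",
    "mish", "gelu", "log_softmax",
]

# Search-space bounds from Table 1 (key names are illustrative).
BOUNDS = {
    "hidden_layers": (1, 15),
    "layer_width": (1, 512),
    "learning_rate": (1e-5, 1.0),
    "epochs": (10, 30000),
    "batch_size": (2, 32768),
    "optimizer": (0, 9),
    "activation": (0, 16),
}

# Every hyperparameter except the learning rate is integer-valued.
INTEGER_KEYS = set(BOUNDS) - {"learning_rate"}


def decode(gene):
    """Clip each entry to its bounds and round integer entries (assumed scheme)."""
    config = {}
    for key, (lo, hi) in BOUNDS.items():
        clipped = min(max(gene[key], lo), hi)
        config[key] = int(round(clipped)) if key in INTEGER_KEYS else clipped
    # Replace the activation index with its name.
    config["activation"] = ACTIVATIONS[config["activation"]]
    return config


# Hypothetical real-valued candidate, as a DE mutation step might produce.
gene = {
    "hidden_layers": 3.2,
    "layer_width": 600.0,   # outside the bounds, will be clipped to 512
    "learning_rate": 0.0037,
    "epochs": 25354.4,
    "batch_size": 9283.0,
    "optimizer": 2.7,
    "activation": 4.1,      # rounds to index 4, i.e. tanh
}

config = decode(gene)
print(config)
```

Clipping before rounding keeps every decoded value inside the search box, so a mutated candidate never maps to an undefined activation or optimizer index.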
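Table 3. Best genes found for different parameter settings with $g = 0$.

| CR | F | | $J_1(TL, VL)$ | Hidden Layers | Layer Widths | Learning Rate | Activations | Epochs | Batch Size |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.8 | #1 | 0.0118 | 2 | [63, 98] | 0.0086 | [3, 5] | 24,600 | 3112 |
| 0.5 | 0.8 | #2 | 0.0215 | 1 | [36] | 0.0069 | [3] | 15,437 | 390 |
| 0.5 | 0.8 | #3 | 0.0230 | 1 | [173] | 0.0073 | [3] | 23,358 | 4177 |
| 0.5 | 0.8 | #4 | 0.0119 | 1 | [512] | 0.0061 | [3] | 24,749 | 2639 |
| 0.5 | 0.4 | #1 | 0.0150 | 2 | [8, 3] | 0.0127 | [5, 3] | 25,188 | 64 |
| 0.5 | 0.4 | #2 | 0.0085 | 2 | [74, 109] | 0.0089 | [11, 8] | 23,566 | 3183 |
| 0.5 | 0.4 | #3 | 0.0099 | 1 | [154] | 0.0046 | [3] | 24,485 | 544 |
| 0.5 | 0.4 | #4 | 0.0178 | 1 | [101] | 0.0096 | [5] | 22,718 | 1157 |
| 0.7 | 0.8 | #1 | 0.0230 | 2 | [64, 41] | 0.5575 | [15, 10] | 14,910 | 1617 |
| 0.7 | 0.8 | #2 | 0.0157 | 1 | [89] | 0.1597 | [1] | 24,761 | 1427 |
| 0.7 | 0.8 | #3 | 0.0089 | 2 | [36, 117] | 0.5295 | [7, 3] | 23,755 | 7518 |
| 0.7 | 0.8 | #4 | 0.0191 | 1 | [32] | 0.1512 | [3] | 24,210 | 573 |
| 0.7 | 0.4 | #1 | 0.0205 | 3 | [225, 132, 82] | 0.0162 | [1, 0, 9] | 18,947 | 3298 |
| 0.7 | 0.4 | #2 | 0.0153 | 1 | [122] | 0.2593 | [3] | 14,213 | 10,101 |
| 0.7 | 0.4 | #3 | 0.0086 | 1 | [253] | 0.0514 | [7] | 29,663 | 256 |
| 0.7 | 0.4 | #4 | 0.0165 | 1 | [32] | 0.3175 | [10] | 17,953 | 728 |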
Table 4. Best genes found for different parameter settings.

| CR | F | | $J_1(TL, VL)$ | Hidden Layers | Layer Widths | Learning Rate | Activations | Epochs | Batch Size |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.8 | #1 | 0.033 | 10 | [193, 334, 288, 248, 32, 512, 32, 140, 398, 290] | 0.0029 | [4, 4, 1, 12, 4, 12, 4, 12, 4, 4] | 30,000 | 3681 |
| 0.5 | 0.8 | #2 | 0.015 | 6 | [136, 32, 32, 147, 202, 155] | 0.0146 | [4, 4, 12, 4, 4, 15] | 19,771 | 3468 |
| 0.5 | 0.8 | #3 | 0.008 | 9 | [364, 32, 263, 99, 32, 490, 32, 32, 512] | 0.0006 | [4, 4, 15, 12, 4, 15, 15, 1, 12] | 13,640 | 512 |
| 0.5 | 0.8 | #4 | 0.007 | 9 | [86, 32, 32, 296, 508, 32, 257, 32, 475] | 0.0011 | [4, 1, 4, 1, 12, 4, 15, 4, 12] | 16,504 | 2694 |
| 0.5 | 0.4 | #1 | 0.003 | 4 | [147, 32, 32, 239] | 0.0243 | [12, 15, 1, 12] | 1000 | 570 |
| 0.5 | 0.4 | #2 | 0.039 | 3 | [383, 121, 268] | 0.0001 | [4, 4, 4] | 18,977 | 7249 |
| 0.5 | 0.4 | #3 | 0.018 | 8 | [367, 385, 69, 59, 176, 309, 330, 387] | 0.0044 | [12, 1, 15, 4, 12, 4, 1, 12] | 20,069 | 14,910 |
| 0.5 | 0.4 | #4 | 0.020 | 3 | [337, 417, 125] | 0.0125 | [12, 1, 15] | 26,681 | 9018 |
| 0.7 | 0.8 | #1 | 0.010 | 7 | [163, 497, 118, 153, 194, 495, 213] | 0.0016 | [12, 12, 12, 12, 15, 1, 4] | 26,113 | 512 |
| 0.7 | 0.8 | #2 | 0.023 | 2 | [321, 143] | 0.0068 | [15, 4] | 16,864 | 8987 |
| 0.7 | 0.8 | #3 | 0.002 | 3 | [335, 32, 32] | 0.0037 | [12, 15, 12] | 25,354 | 9283 |
| 0.7 | 0.8 | #4 | 0.063 | 7 | [163, 380, 113, 143, 194, 495, 238] | 0.0016 | [12, 15, 12, 12, 4, 4, 1] | 27,025 | 1861 |
| 0.7 | 0.4 | #1 | 0.020 | 6 | [512, 144, 170, 512, 387, 309] | 0.0007 | [12, 4, 12, 12, 1, 12] | 14,467 | 4186 |
| 0.7 | 0.4 | #2 | 0.015 | 4 | [318, 90, 301, 276] | 0.0007 | [4, 4, 1, 15] | 23,099 | 7880 |
| 0.7 | 0.4 | #3 | 0.004 | 4 | [231, 132, 378, 151] | 0.0030 | [4, 4, 1, 15] | 16,931 | 9678 |
| 0.7 | 0.4 | #4 | 0.003 | 7 | [512, 32, 344, 124, 369, 452, 32] | 0.0021 | [4, 1, 15, 12, 12, 1, 12] | 25,554 | 15,259 |
Table 5. RMSE results of the best MLP and PINN architectures in two scenarios: interpolation and extrapolation.

| Regime | Model | Data Size | RMSE |
|---|---|---|---|
| Interpolation | PINN | 449,764 | $1.2334 \times 10^{-3}$ |
| Interpolation | MLP | 449,764 | $1.0806 \times 10^{-3}$ |
| Extrapolation | PINN | 449,764 | $1.1775 \times 10^{-2}$ |
| Extrapolation | MLP | 449,764 | $2.9694 \times 10^{-1}$ |
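For reference, the RMSE metric and the gap between the two models in Table 5 can be reproduced with a few lines of Python. The sketch assumes the standard definition of RMSE and reads the tabulated exponents as negative powers of ten; the dictionary `TABLE5_RMSE` and the function name are illustrative.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally long sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


# RMSE values from Table 5, keyed by (regime, model).
TABLE5_RMSE = {
    ("interpolation", "PINN"): 1.2334e-3,
    ("interpolation", "MLP"): 1.0806e-3,
    ("extrapolation", "PINN"): 1.1775e-2,
    ("extrapolation", "MLP"): 2.9694e-1,
}

# How many times larger the MLP error is than the PINN error in each regime.
ratios = {
    regime: TABLE5_RMSE[(regime, "MLP")] / TABLE5_RMSE[(regime, "PINN")]
    for regime in ("interpolation", "extrapolation")
}

for regime, ratio in ratios.items():
    print(f"{regime}: MLP RMSE / PINN RMSE = {ratio:.2f}")
```

Under this reading, the two models are comparable inside the training range, while outside it the MLP error is roughly 25 times that of the PINN.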
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Corrêa, D.F.; Toledo, C.M.; Pelta, D.A.; Silva Neto, A. A Physics-Informed Surrogate Model for the Bi-Flux Bevilacqua–Galeão Anomalous Diffusion Equation. Eng 2026, 7, 293. https://doi.org/10.3390/eng7060293
