Abstract
Solving stiff partial differential equations with neural networks remains challenging due to the presence of multiple time scales and numerical instabilities that arise during training. This paper addresses these limitations by embedding the mathematical structure of implicit–explicit time integration schemes directly into neural network architectures. The proposed approach preserves the operator splitting decomposition that separates stiff linear terms from non-stiff nonlinear terms, inheriting the stability properties established for these numerical methods. We evaluate the methodology on Allen–Cahn equation dynamics, where interface evolution exhibits the multi-scale behavior characteristic of stiff systems. The structure-preserving architecture achieves improvements in solution accuracy and long-term stability compared to conventional physics-informed approaches, while maintaining proper energy dissipation throughout the evolution.
Keywords:
structure-preserving neural networks; implicit–explicit methods; stiff partial differential equations; Allen–Cahn equation; operator splitting; physics-informed learning
MSC:
35Q68; 35K57; 68T07; 65L04
1. Introduction
The intersection of numerical analysis and machine learning has emerged as a frontier in computational science, particularly in the context of solving partial differential equations (PDEs). While physics-informed neural networks (PINNs) have demonstrated success in approximating solutions to various classes of PDEs [1,2], they often struggle with the long-term stability and conservation properties that are naturally preserved by well-established numerical methods. The fundamental challenge lies in the fact that conventional neural network training procedures do not inherently respect the mathematical structure that governs the stability and accuracy of numerical schemes. Traditional numerical methods for evolutionary PDEs, particularly those exhibiting stiff dynamics or multi-scale phenomena, have been designed to preserve essential structural properties such as energy dissipation, mass conservation, and monotonicity. Among these, implicit–explicit (IMEX) schemes represent a particularly important class of methods that achieve good stability by treating different terms in the PDE with appropriate temporal discretizations [3]. The stiff linear terms are handled implicitly to ensure unconditional stability, while the nonlinear terms are treated explicitly to maintain computational efficiency.
The Allen–Cahn equation is used as a test model problem for investigating these phenomena. Originally introduced in the context of phase transitions in binary alloys, this equation exhibits a mathematical structure including gradient flow dynamics, energy dissipation, and sharp interface formation. The equation is given by

$$\frac{\partial u}{\partial t} = \varepsilon^2\, \frac{\partial^2 u}{\partial x^2} + u - u^3, \quad (x, t) \in \Omega \times (0, T], \qquad u(x, 0) = u_0(x), \quad x \in \Omega, \qquad (1)$$
where $u(x,t)$ represents the order parameter, $\varepsilon > 0$ controls the interface width, $\Omega$ is the spatial domain, and $u_0(x)$ specifies the initial configuration. Periodic boundary conditions are employed to eliminate boundary effects and focus on the interfacial dynamics. The nonlinear term $u - u^3$ creates a double-well potential that drives the solution toward the stable states $u = \pm 1$, while the diffusion term $\varepsilon^2\, \partial_{xx} u$ regularizes the interface between these phases. This work specifically focuses on double-well interface configurations, which represent some of the most challenging and physically relevant scenarios in Allen–Cahn dynamics. These configurations exhibit complex energy landscapes with multiple stable states and intricate interface evolution, making them ideal test cases for evaluating the robustness and accuracy of numerical methods. Due to the combined effects of diffusion and reaction, Equation (1) presents significant challenges, leading to stiff dynamics when $\varepsilon$ is small. Standard explicit methods suffer from severe timestep restrictions, while fully implicit methods require the solution of nonlinear systems at each timestep. IMEX schemes provide a practical compromise by treating the linear diffusion term implicitly and the nonlinear reaction term explicitly, thereby achieving both stability and efficiency. In this work, we introduce a novel neural network architecture that directly embeds the mathematical structure of IMEX schemes, termed IMEX-informed neural networks (IINNs). Rather than attempting to enforce the PDE through penalty terms in the loss function, we design the neural network to inherently respect the operator splitting structure that guarantees stability in the numerical method. This approach fundamentally differs from existing physics-informed methods by preserving the discrete-time structure of proven numerical algorithms within the continuous function approximation framework of neural networks. In fact, by preserving this structure in the neural network architecture, we inherit the stability and conservation properties that have been established for the numerical method. In other words, we shift away from the traditional approach of constraining neural networks through physics-based loss terms toward a methodology that embeds mathematical structure directly into the computational graph of the network.
The main contributions of this work are as follows:
- A neural network architecture that preserves the mathematical structure of IMEX time integration schemes through operator decomposition.
- Theoretical analysis of stability properties inherited from established numerical methods.
- Evaluation on Allen–Cahn double-well interface dynamics demonstrating improved accuracy and energy dissipation behavior.
- Investigation of structure-preserving approaches as a potential framework for stiff partial differential equation problems.
The rest of this work is organized as follows. In Section 2, we begin with a review of existing approaches in physics-informed machine learning and structure-preserving neural networks. We then, in Section 3, develop the mathematical framework for IMEX-informed neural networks, showing the foundations for structure preservation and stability inheritance. The numerical modeling, in Section 4, provides implementation details and algorithmic specifications. Finally, in Section 5, we present numerical experiments demonstrating the performance of this approach compared to standard methodologies. Section 6 closes the paper with conclusions.
2. Current and Related Works
Here, we introduce a review of existing approaches in physics-informed machine learning [4,5] and structure-preserving neural networks [6,7].
The integration of machine learning with numerical PDE solution techniques has witnessed unprecedented growth, driven by neural networks’ universal approximation capabilities and ability to handle high-dimensional problems that traditional methods find computationally prohibitive, as shown in [8]. PINNs, presented by Raissi et al. [1], established the foundational framework for incorporating differential equation constraints directly into neural network training. For a general PDE $\mathcal{F}[u] = 0$, the PINN loss function takes the following form:

$$\mathcal{L}_{\text{PINN}} = \lambda_{\text{data}}\,\mathcal{L}_{\text{data}} + \lambda_{\text{PDE}}\,\mathcal{L}_{\text{PDE}} + \lambda_{\text{BC}}\,\mathcal{L}_{\text{BC}} + \lambda_{\text{IC}}\,\mathcal{L}_{\text{IC}},$$
where the individual components correspond to data fitting, PDE residual minimization, boundary conditions, and initial conditions. While PINNs have demonstrated success across various applications, fundamental limitations include difficulty balancing loss components, gradient pathologies during training, and lack of structure preservation. Essential physical properties such as energy conservation or mass balance may be violated during learning, leading to unphysical solutions in long-time integration scenarios [9,10]. Recent developments in neural operator learning address some PINN limitations by learning solution operators rather than individual solutions. DeepONet [11] parameterizes operators through a branch–trunk architecture that separates input function encoding from evaluation coordinates. In the Allen–Cahn context, operators map initial conditions to later-time solutions: $u_0 \mapsto u(\cdot, t)$. Fourier Neural Operators [12] represent another advancement, operating in frequency space and leveraging FFT for efficient convolution operations. The core component learns kernels in Fourier space through learnable linear transformations. While these approaches offer improved generalization, they typically require extensive training data from traditional solvers and may not inherently preserve underlying mathematical structure. Some research has explored integrating numerical methods with neural networks through training data generation or incorporating discretizations into architectures [5]. Finite element neural networks combine finite element spatial discretizations with neural network temporal integration, while spectral neural networks leverage spectral methods for spatial derivatives. However, most approaches treat numerical methods and neural networks as separate components or attempt structure incorporation through loss function modifications rather than architectural design.
Recently, several approaches have emerged to address the limitations of standard PINNs for operator learning and structure preservation. Fourier neural operators (FNOs) [12] represent solutions as convolutions in Fourier space, leveraging FFT for efficient computation and demonstrating excellent performance on periodic domains. While FNOs handle spectral structure well, they may not naturally preserve the operator splitting essential for stiff systems. Deep operator networks (DeepONet) [11] parameterize operators through branch–trunk architectures, learning mappings between function spaces rather than individual solutions. However, these approaches typically require extensive training data from traditional solvers and may not inherently preserve the mathematical structure of underlying numerical schemes. Structure-preserving neural networks [6] have demonstrated success in maintaining geometric properties such as symplecticity and energy conservation. Hamiltonian neural networks [7] preserve energy conservation by parameterizing Hamiltonian functions, while Lagrangian neural networks extend this framework to Lagrangian mechanics. However, these approaches focus primarily on conservative systems and may not address the dissipative dynamics characteristic of Allen–Cahn equations. Neural ordinary differential equations (neural ODEs) [13] introduce continuous-depth networks where forward passes are defined through ODE solutions, providing natural frameworks for temporal dynamics. Despite their theoretical elegance, neural ODEs typically do not incorporate the operator splitting strategies that are crucial for stiff PDE stability. Recent work in multiscale neural approaches [10] has emphasized the importance of causality and temporal structure in physics-informed learning, highlighting challenges that arise when multiple time scales interact in stiff systems.
Table 1 shows a comparison of existing approaches in terms of structure preservation, stiffness handling, and temporal integration strategies.
Table 1.
Comparison of structure-preserving and physics-informed neural network approaches.
Despite these advances, a significant gap remains in systematically incorporating proven numerical method structures into neural architectures. While existing approaches either focus on general structure preservation or operator learning, none directly embed the time integration scheme mathematics that ensures stability for stiff systems. This observation motivates our approach of architectural embedding of IMEX operator splitting, inheriting established stability properties while maintaining neural network flexibility.
3. Numerical Method-Informed Neural Networks
In this section, we present the theoretical foundations of the proposed approach. The idea is that well-designed numerical methods incorporate important mathematical features through their algorithmic structure. These features can be used to guide the construction of neural network models, allowing them to reflect some of the properties of classical numerical schemes while taking advantage of the flexibility of machine learning. We first introduce the mathematical formulation of IMEX methods for time-dependent evolution equations. Then, we show how the main elements of these methods can be integrated into neural network architectures to improve their performance and interpretability.
The numerical treatment of evolutionary partial differential equations arising in phase field theory presents significant computational challenges due to the presence of multiple time scales and varying degrees of numerical stiffness. Consider a general evolution equation of the form

$$\frac{\partial u}{\partial t} = \mathcal{L}(u) + \mathcal{N}(u),$$
where $\mathcal{L}$ represents a linear differential operator and $\mathcal{N}$ denotes a nonlinear operator. In the context of the Allen–Cahn equation (1), originally introduced to describe antiphase boundary motion in crystalline alloys, we have $\mathcal{L}(u) = \varepsilon^2\, \partial_{xx} u$ and $\mathcal{N}(u) = u - u^3$.
The computational difficulty in solving such equations stems from the disparate stiffness characteristics exhibited by different terms. The linear diffusion operator introduces severe time step restrictions when treated explicitly, as the stable time step scales as $\Delta t = \mathcal{O}(h^2/\varepsilon^2)$, where h is the spatial mesh size [14]. This scaling becomes prohibitively restrictive for the small mesh sizes that are typically required to accurately resolve sharp interfaces. In contrast, the nonlinear reaction term exhibits mild stability constraints and can be efficiently handled with explicit methods. This observation motivates the development of IMEX time stepping schemes, which exploit the differential stiffness by treating the linear operator implicitly and the nonlinear operator explicitly. Consider a uniform time discretization of the continuous time domain obtained by partitioning the interval $[0, T]$ into N subintervals of equal length $\Delta t = T/N$. The discrete time grid is defined as

$$t_n = n\,\Delta t, \qquad n = 0, 1, \ldots, N,$$
where $t_0 = 0$ and $t_N = T$. We denote by $u^n$ the numerical approximation to the exact solution at time $t_n$, i.e.,

$$u^n \approx u(\cdot, t_n) \in V,$$
where $V$ represents an appropriate function space. The temporal evolution from $t_n$ to $t_{n+1}$ is governed by the discrete solution operator, which we seek to construct through IMEX methodology. In the context of first-order IMEX schemes, this advancement involves the introduction of an intermediate state $u^*$, which is used as a temporary computational variable that decouples the treatment of stiff and non-stiff terms within a single time step. More precisely, the IMEX–Euler method proceeds through a two-stage process: first, the intermediate state is computed by explicitly advancing only the nonlinear terms from the current solution $u^n$. Such an approach yields a solution operator of the form

$$u^{n+1} = \mathcal{S}_{\Delta t}(u^n), \qquad (4)$$
where $\mathcal{S}_{\Delta t}$ represents the IMEX advancement operator from time $t_n$ to $t_{n+1}$.
The practical implementation of IMEX methods relies on the mathematical decomposition of the solution operator into constituent parts that reflect the underlying physics. For first-order schemes, this decomposition follows naturally from operator splitting theory [15], and the IMEX advancement operator can be expressed as the composition

$$\mathcal{S}_{\Delta t} = \mathcal{S}^{\text{im}}_{\Delta t} \circ \mathcal{S}^{\text{ex}}_{\Delta t}. \qquad (5)$$
The operator $\mathcal{S}_{\Delta t}$ maps the current solution to the next time step through the complete two-stage process, representing the discrete-time evolution of the Allen–Cahn dynamics. In more detail, $\mathcal{S}^{\text{ex}}_{\Delta t}$ represents the explicit treatment of nonlinear terms, while $\mathcal{S}^{\text{im}}_{\Delta t}$ handles the implicit resolution of linear operators. This sequential approach, while introducing a splitting error of order $\mathcal{O}(\Delta t)$, provides significant computational advantages by avoiding the need to solve nonlinear systems [16].
The most elementary implementation of this decomposition is represented by the first-order IMEX–Euler scheme, which advances the solution through two distinct stages corresponding to the implicit and explicit components. In the explicit stage, the nonlinear terms are advanced using a forward Euler step:

$$u^* = u^n + \Delta t\, \mathcal{N}(u^n), \qquad (6)$$
followed by an implicit stage that incorporates the linear operator:

$$u^{n+1} = u^* + \Delta t\, \mathcal{L}(u^{n+1}). \qquad (7)$$
This approach ensures that the most restrictive stability constraints arise from the explicit treatment of the typically non-stiff nonlinear terms, while the stiff linear operator is handled with unconditional stability. The implicit stage in Equation (7) yields a linear system of the form

$$\big(\mathcal{I} - \Delta t\, \mathcal{L}\big)\, u^{n+1} = u^*, \qquad (8)$$
which, for the Allen–Cahn equation under periodic boundary conditions, becomes

$$\big(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta\big)\, u^{n+1} = u^*.$$
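For concreteness, the two stages above can be realized in a few lines once the periodic Laplacian is diagonalized by the FFT; the resolvent solve then reduces to a division in Fourier space. The following NumPy sketch advances one IMEX–Euler step using the standard second-difference discretization introduced in Section 4; the grid size, $\varepsilon$, and $\Delta t$ are illustrative assumptions, not the paper's experimental settings:

```python
import numpy as np

def imex_euler_step(u, dt, eps, h):
    """One first-order IMEX-Euler step for u_t = eps^2 u_xx + u - u^3 with
    periodic boundary conditions, solving the implicit stage exactly in
    Fourier space (the periodic Laplacian is diagonal there)."""
    M = u.size
    u_star = u + dt * (u - u**3)                            # explicit stage (6)
    k = np.arange(M)
    mu = -(4.0 / h**2) * np.sin(np.pi * k / M) ** 2         # Laplacian eigenvalues
    u_hat = np.fft.fft(u_star) / (1.0 - dt * eps**2 * mu)   # implicit stage (8)
    return np.fft.ifft(u_hat).real

# Relax a tanh interface for a few steps (illustrative parameters).
M, eps, dt = 256, 0.1, 1e-2
h = 2.0 * np.pi / M
x = h * np.arange(M)
u = np.tanh((x - np.pi) / (np.sqrt(2.0) * eps))
for _ in range(10):
    u = imex_euler_step(u, dt, eps, h)
```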
In the context of linear problems, where the operators $\mathcal{L}$ and $\mathcal{N}$ can be approximated or represented by matrices with known eigenvalue spectra, the stability of each Fourier mode offers valuable insight into the global behavior of the scheme [17]. In order to make this more clear, consider the linearized version of the evolution equation:

$$\frac{\partial u}{\partial t} = \mathcal{L}u + \mathcal{N}u,$$
where $\mathcal{L}$ and $\mathcal{N}$ are assumed to be diagonalizable operators. Let $\lambda^{\mathcal{L}}_k$ and $\lambda^{\mathcal{N}}_k$ denote the eigenvalues of $\mathcal{L}$ and $\mathcal{N}$, respectively, associated with the k-th Fourier mode. Applying the first-order IMEX–Euler scheme to this linear system yields an update rule for each mode whose amplification factor is given by

$$R_k = \frac{1 + \Delta t\, \lambda^{\mathcal{N}}_k}{1 - \Delta t\, \lambda^{\mathcal{L}}_k}. \qquad (9)$$
By doing so, under the conditions $\lambda^{\mathcal{L}}_k \le 0$ (e.g., a diffusive linear operator) and $|1 + \Delta t\, \lambda^{\mathcal{N}}_k| \le 1$, the magnitude of the amplification factor satisfies $|R_k| \le 1$, ensuring unconditional stability for the stiff linear part of the equation [18].
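This mode-wise bound is easy to verify numerically. The sketch below evaluates $R_k$ for the linearized Allen–Cahn splitting, with $\lambda^{\mathcal{L}}_k = -\varepsilon^2 k^2$ and $\lambda^{\mathcal{N}}_k = 1$ (the linearization of $u - u^3$ around $u = 0$); the parameter values are illustrative assumptions:

```python
import numpy as np

# Amplification factor R_k = (1 + dt*lam_N) / (1 - dt*lam_L) of the
# first-order IMEX-Euler scheme for the linearized splitting.
eps, dt = 0.05, 1e-2
k = np.arange(1, 200)
lam_L = -(eps**2) * k**2   # implicit (diffusive) eigenvalues, <= 0
lam_N = 1.0                # explicit (reaction) eigenvalue at u = 0
R = (1.0 + dt * lam_N) / (1.0 - dt * lam_L)

# Because lam_L <= 0, the implicit factor is a contraction for every mode,
# so |R_k| <= |1 + dt*lam_N| uniformly in k: the stiff diffusive spectrum
# never restricts the time step.
assert np.all(np.abs(R) <= abs(1.0 + dt * lam_N) + 1e-15)
print(R[:3], R[-3:])  # low modes near 1 + dt, high modes strongly damped
```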
Recent advances in scientific machine learning have demonstrated the potential of PINNs [1] to solve partial differential equations by directly parameterizing the solution through neural networks while incorporating governing equations as soft constraints in the loss function. In the PINN framework, a neural network approximates the solution, and the training process minimizes a composite loss that includes both the residual of the governing PDE and boundary/initial conditions. While this approach has shown promise across various applications, it faces significant challenges when applied to stiff problems such as the Allen–Cahn equation, where the presence of multiple time scales can lead to training difficulties and poor long-time integration behavior [10]. The main limitation of standard PINNs for stiff problems is that they ignore the advanced numerical methods developed over the years to solve these systems efficiently. By learning the solution directly without incorporating knowledge of optimal time-stepping strategies, PINNs often struggle to capture the correct temporal dynamics, particularly in regimes where explicit methods would be unstable and implicit methods are necessary for stability.
This observation highlights the need to move from solution-based learning to operator-based approaches, building on the valuable knowledge developed through classical numerical techniques. Rather than parameterizing the solution and attempting to satisfy the governing PDE through residual minimization, we propose to parameterize the IMEX solution operator $\mathcal{S}_{\Delta t}$ in Equation (5) itself using neural networks. This operator-centric methodology preserves the carefully constructed mathematical architecture of IMEX schemes while exploiting the representational power and computational efficiency of neural networks. More specifically, the decomposition into explicit and implicit components with their associated stability properties is built into the network. By embedding the IMEX structure directly into the neural network architecture, the resulting model inherits the favorable stability properties established in Equation (9), ensuring robust behavior for stiff problems.
Mathematical Background of IMEX-Informed Neural Networks
The operator decomposition established in the previous analysis suggests a corresponding neural network architecture that preserves this mathematical structure.
Throughout this analysis, we employ standard function space notation [19,20]. The $L^2(\Omega)$ norm is denoted

$$\|u\| = \left( \int_\Omega |u(x)|^2\, dx \right)^{1/2},$$

while the operator norm is defined as

$$\|\mathcal{A}\| = \sup_{\|u\| = 1} \|\mathcal{A}u\|$$

for linear operators $\mathcal{A}$. For the Allen–Cahn equation (1), we decompose operators as follows: the implicit operator $\mathcal{S}^{\text{im}}_{\Delta t}$ represents the diffusion term $\varepsilon^2\, \partial_{xx} u$, while the explicit operator $\mathcal{S}^{\text{ex}}_{\Delta t}$ handles the reaction term $u - u^3$. The operators $\mathcal{N}$ and $\mathcal{L}$ denote the corresponding nonlinear and linear components, respectively.
Definition 1
(IMEX-informed neural network). An IMEX-Informed Neural Network (IINN) is a composite neural architecture that approximates the IMEX solution operator $\mathcal{S}_{\Delta t}$ through the composition:

$$\mathcal{S}^{\theta,\phi}_{\Delta t} = \mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta},$$

where $\mathcal{S}^{\text{ex}}_{\theta}$ approximates $\mathcal{S}^{\text{ex}}_{\Delta t}$ with parameters $\theta$, and $\mathcal{S}^{\text{im}}_{\phi}$ approximates $\mathcal{S}^{\text{im}}_{\Delta t}$ with parameters $\phi$.
Remark 1.
The IINN architecture is designed to approximate the action of the complete IMEX time-stepping operator $\mathcal{S}_{\Delta t}$, inheriting its stability properties through the architectural decomposition rather than through loss function constraints.
To be more specific, consider the explicit component: recalling Equation (6) for the Allen–Cahn equation and substituting $\mathcal{N}(u) = u - u^3$ yields

$$u^* = u^n + \Delta t\, \big(u^n - (u^n)^3\big).$$
Since the reaction term involves no spatial derivatives, this transformation can be computed pointwise. We therefore define

$$\mathcal{S}^{\text{ex}}_{\theta}(u) = u + \Delta t\, (u - u^3),$$

which requires no learnable parameters, setting $\theta = \varnothing$.
For the implicit component, we take into account the linear system from Equation (8),

$$\big(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta\big)\, u^{n+1} = u^*,$$

and multiplying both sides by the resolvent operator $(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta)^{-1}$ gives

$$u^{n+1} = \big(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta\big)^{-1} u^*.$$

The implicit component approximates this resolvent action:

$$\mathcal{S}^{\text{im}}_{\phi}(u^*) \approx \big(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta\big)^{-1} u^*.$$
In order to analyze the properties this network must learn, consider the eigenfunction expansion. Under periodic boundary conditions, the Laplacian has eigenfunctions $e^{ikx}$ with eigenvalues $-k^2$ [21,22]. The resolvent operator [23] acts on each mode as

$$\big(\mathcal{I} - \Delta t\, \varepsilon^2 \Delta\big)^{-1} e^{ikx} = \frac{1}{1 + \Delta t\, \varepsilon^2 k^2}\, e^{ikx}.$$
This shows that $\mathcal{S}^{\text{im}}_{\phi}$ must approximate a low-pass filter with mode-dependent attenuation factors [24]. High-frequency modes (large $|k|$) are strongly attenuated, while low-frequency modes pass through with minimal modification. Therefore, the approximation quality of the IINN depends critically on how well $\mathcal{S}^{\text{im}}_{\phi}$ captures this spectral behavior. In particular, the network must reproduce the correct scaling with respect to both the time step $\Delta t$ and the interface parameter $\varepsilon$.
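The following sketch makes the filtering interpretation concrete: it evaluates the exact attenuation factors $1/(1 + \Delta t\, \varepsilon^2 k^2)$ on a periodic grid and applies them to a noisy interface profile, which is precisely the spectral behavior $\mathcal{S}^{\text{im}}_{\phi}$ is trained to reproduce (parameter values are illustrative assumptions):

```python
import numpy as np

# Exact spectral attenuation factors of the resolvent on a periodic grid.
eps, dt, M = 0.1, 1e-2, 256
x = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
k = np.fft.fftfreq(M, d=1.0 / M)                 # integer wavenumbers
atten = 1.0 / (1.0 + dt * eps**2 * k**2)         # low-pass filter profile

# Applying the resolvent damps high-frequency noise while leaving the
# smooth interface content nearly unchanged.
u = np.tanh((x - np.pi) / (np.sqrt(2.0) * eps))
noisy = u + 0.1 * np.random.default_rng(0).standard_normal(M)
filtered = np.fft.ifft(atten * np.fft.fft(noisy)).real
```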
We clarify the nature of the operators in our analysis. The implicit operator $\mathcal{S}^{\text{im}}_{\Delta t}$ and the linear component $\mathcal{L}$ are linear differential operators admitting spectral analysis. The explicit operators $\mathcal{S}^{\text{ex}}_{\Delta t}$ and $\mathcal{N}$ are generally nonlinear, as exemplified by $\mathcal{N}(u) = u - u^3$ in the Allen–Cahn equation. For nonlinear operators, $\|\cdot\|$ denotes the operator norm of the Fréchet derivative evaluated at the current solution state.
The following result provides a bound on how operator composition is affected by small perturbations in the constituent operators.
Proposition 1
(Stability Under Operator Perturbations). Let $\mathcal{S}^{\text{ex}}_{\Delta t}$ and $\mathcal{S}^{\text{im}}_{\Delta t}$ be bounded linear operators with $\|\mathcal{S}^{\text{ex}}_{\Delta t}\| \le C_{\text{ex}}$ and $\|\mathcal{S}^{\text{im}}_{\Delta t}\| \le C_{\text{im}}$. Assume the composition satisfies $\|\mathcal{S}^{\text{im}}_{\Delta t} \circ \mathcal{S}^{\text{ex}}_{\Delta t}\| \le 1$.
Let $\mathcal{S}^{\text{ex}}_{\theta}$ and $\mathcal{S}^{\text{im}}_{\phi}$ be bounded linear operators such that

$$\big\|\mathcal{S}^{\text{ex}}_{\theta} - \mathcal{S}^{\text{ex}}_{\Delta t}\big\| \le \epsilon_1, \qquad \big\|\mathcal{S}^{\text{im}}_{\phi} - \mathcal{S}^{\text{im}}_{\Delta t}\big\| \le \epsilon_2.$$

Then, the composed operator satisfies

$$\big\|\mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta}\big\| \le 1 + C_{\text{im}}\, \epsilon_1 + C_{\text{ex}}\, \epsilon_2 + \epsilon_1\, \epsilon_2.$$
Proof.
We analyze the composition by decomposing the perturbations. Let $\mathcal{E}^{\text{ex}} = \mathcal{S}^{\text{ex}}_{\theta} - \mathcal{S}^{\text{ex}}_{\Delta t}$ and $\mathcal{E}^{\text{im}} = \mathcal{S}^{\text{im}}_{\phi} - \mathcal{S}^{\text{im}}_{\Delta t}$. The composed operator can be written as

$$\mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta} = \mathcal{S}^{\text{im}}_{\Delta t}\mathcal{S}^{\text{ex}}_{\Delta t} + \mathcal{S}^{\text{im}}_{\Delta t}\mathcal{E}^{\text{ex}} + \mathcal{E}^{\text{im}}\mathcal{S}^{\text{ex}}_{\Delta t} + \mathcal{E}^{\text{im}}\mathcal{E}^{\text{ex}}.$$

For any $v$ with $\|v\| = 1$, applying the triangle inequality:

$$\big\|\mathcal{S}^{\text{im}}_{\phi}\mathcal{S}^{\text{ex}}_{\theta}\, v\big\| \le \big\|\mathcal{S}^{\text{im}}_{\Delta t}\mathcal{S}^{\text{ex}}_{\Delta t}\, v\big\| + C_{\text{im}}\,\epsilon_1 + C_{\text{ex}}\,\epsilon_2 + \epsilon_1\epsilon_2 \le 1 + C_{\text{im}}\,\epsilon_1 + C_{\text{ex}}\,\epsilon_2 + \epsilon_1\epsilon_2.$$
Taking the supremum over all unit vectors yields the desired bound. □
Corollary 1
(Practical Stability Conditions). Let $\mathcal{S}^{\text{ex}}_{\Delta t}$ and $\mathcal{S}^{\text{im}}_{\Delta t}$ be the explicit and implicit operators of a first-order IMEX scheme applied to the Allen–Cahn Equation (1). Under the spectral bounds $\|\mathcal{S}^{\text{im}}_{\Delta t}\| \le 1$ and $\|\mathcal{S}^{\text{ex}}_{\Delta t}\| \le 1 + C\Delta t$ for some constant $C > 0$, the IINN approximation satisfies $\|\mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta}\| \le 1 + \delta$ for small $\delta > 0$ provided:

$$\epsilon_1 \le \frac{\delta}{3}, \qquad \epsilon_2 \le \frac{\delta}{3\,(1 + C\Delta t)}, \qquad \epsilon_1\,\epsilon_2 \le \frac{\delta}{3}.$$

These conditions follow directly from Proposition 1 by setting $C_{\text{im}} = 1$ and $C_{\text{ex}} = 1 + C\Delta t$, and requiring the bound $C_{\text{im}}\,\epsilon_1 + C_{\text{ex}}\,\epsilon_2 + \epsilon_1\epsilon_2 \le \delta$.
Remark 2
(Stability Condition). To ensure practical stability from the bound in the proposition (i.e., $\|\mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta}\| \le 1 + \delta$), we require

$$C_{\text{im}}\,\epsilon_1 + C_{\text{ex}}\,\epsilon_2 + \epsilon_1\,\epsilon_2 \le \delta.$$
We now establish conditions under which the IINN inherits the favorable stability properties of the underlying IMEX method.
Theorem 1
(Inherited Stability). Let $\mathcal{S}_{\Delta t}$ be a stable IMEX operator with $\|\mathcal{S}_{\Delta t}\| \le 1$ and let $\mathcal{S}^{\theta,\phi}_{\Delta t} = \mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta}$ be an ϵ-approximate IMEX operator. If $\epsilon \le \epsilon_0$ for some threshold $\epsilon_0$ depending only on the problem parameters, then $\mathcal{S}^{\theta,\phi}_{\Delta t}$ satisfies the same stability bound: $\|\mathcal{S}^{\theta,\phi}_{\Delta t}\| \le 1 + \delta$.
Proof.
This follows directly from Proposition 1 and Corollary 1. The threshold $\epsilon_0$ is determined by the conditions in Corollary 1, with $\delta$ chosen to ensure practical stability. □
4. Numerical Modeling of IINNs
This section introduces the algorithmic framework, discusses implementation details, and analyzes the computational properties of the proposed approach.
The fundamental challenge in implementing IINNs lies in maintaining the delicate balance between the explicit and implicit components while ensuring that the neural network approximation preserves the stability characteristics of the reference numerical method. We begin by establishing the discrete mathematical framework that governs the implementation.
We consider a uniform spatial grid consisting of M equally spaced points:

$$x_j = j\,h, \qquad j = 0, 1, \ldots, M - 1,$$

where the spatial mesh size is defined as $h = |\Omega| / M$. The choice of periodic boundary conditions ensures that $x_M$ and $x_0$ are identified:

$$u(x_M, t) = u(x_0, t).$$
Throughout this section, we use the notation $\mathbf{u}^n$ to denote the spatially discretized solution at time $t_n$, where $u^n_j$ represents the approximate solution value at grid point $x_j$. Recalling the time stepping in Equation (4), at each discrete time level $t_n$, we compact the spatially discretized solution as a vector $\mathbf{u}^n \in \mathbb{R}^M$, where each component corresponds to the approximate solution value at a grid point:

$$\mathbf{u}^n = \big(u^n_0, u^n_1, \ldots, u^n_{M-1}\big)^\top.$$
Similarly, we define the intermediate vector and the advanced solution vector according to

$$\mathbf{u}^* = \big(u^*_0, \ldots, u^*_{M-1}\big)^\top, \qquad \mathbf{u}^{n+1} = \big(u^{n+1}_0, \ldots, u^{n+1}_{M-1}\big)^\top,$$
where $u^*$ represents the intermediate state function obtained after the explicit step of the IMEX scheme. All vector operations involving nonlinear terms are computed component-wise. Specifically, the notation $(\mathbf{u}^n)^3$ denotes the element-wise cubing operation, $\big((\mathbf{u}^n)^3\big)_j = (u^n_j)^3$, and similarly for other vector expressions such as $\mathbf{u}^n - (\mathbf{u}^n)^3$, where subtraction is also performed component-wise. The spatially semi-discretized Allen–Cahn equation takes the form of a system of coupled ordinary differential equations in $\mathbb{R}^M$, as

$$\frac{d\mathbf{u}}{dt} = \varepsilon^2 D_2\, \mathbf{u} + \mathbf{u} - \mathbf{u}^3,$$
where $\mathbf{u}(t) \in \mathbb{R}^M$ represents the discrete solution vector and $D_2 \in \mathbb{R}^{M \times M}$ denotes the discrete Laplacian operator, which arises from the standard second-order central finite difference approximation of the continuous Laplacian. When u is sufficiently smooth, the second derivative at each interior grid point is approximated by

$$u''(x_j) \approx \frac{u_{j+1} - 2u_j + u_{j-1}}{h^2}.$$
The periodic boundary conditions ensure that the stencil applies consistently at all grid points, with $u_{-1} = u_{M-1}$ and $u_M = u_0$. Therefore, the matrix $D_2$ has the circulant structure

$$D_2 = \frac{1}{h^2}\begin{pmatrix} -2 & 1 & & & 1 \\ 1 & -2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2 & 1 \\ 1 & & & 1 & -2 \end{pmatrix}. \qquad (17)$$
The eigenvalues of this matrix are given by

$$\mu_k = -\frac{4}{h^2}\sin^2\!\left(\frac{\pi k}{M}\right), \qquad k = 0, 1, \ldots, M - 1. \qquad (18)$$
These eigenvalues are all non-positive, with the most negative eigenvalue corresponding to the highest frequency mode. The first-order IMEX–Euler discretization of this system proceeds as follows:

$$\mathbf{u}^* = \mathbf{u}^n + \Delta t\,\big(\mathbf{u}^n - (\mathbf{u}^n)^3\big), \qquad (19)$$

$$\mathbf{u}^{n+1} = \mathbf{u}^* + \Delta t\,\varepsilon^2 D_2\, \mathbf{u}^{n+1}. \qquad (20)$$
The implicit step in Equation (20) requires solving a linear system at each time step. Rearranging the implicit equation yields

$$\big(\mathbf{I} - \Delta t\,\varepsilon^2 D_2\big)\,\mathbf{u}^{n+1} = \mathbf{u}^*,$$

which can be written in matrix form as

$$A\,\mathbf{u}^{n+1} = \mathbf{u}^*.$$

We define the coefficient matrix of this linear system as

$$A = \mathbf{I} - \Delta t\,\varepsilon^2 D_2.$$
The spectral properties of $A$ govern both the stability and computational efficiency of the IMEX scheme. Since $D_2$ is circulant with eigenvalues given in Equation (18), the eigenvalues of $A$ are

$$\lambda_k(A) = 1 - \Delta t\,\varepsilon^2 \mu_k = 1 + \frac{4\,\Delta t\,\varepsilon^2}{h^2}\sin^2\!\left(\frac{\pi k}{M}\right).$$

Since $\mu_k \le 0$, all eigenvalues satisfy $\lambda_k(A) \ge 1$, which ensures that $A$ is invertible and the implicit step is well-posed. The condition number of $A$ is bounded by

$$\kappa(A) = \frac{\max_k \lambda_k(A)}{\min_k \lambda_k(A)} \le 1 + \frac{4\,\Delta t\,\varepsilon^2}{h^2}.$$
4.1. Network Architecture Details
The preceding details allow us to parameterize the solution operators through neural networks while preserving this mathematical structure. The explicit operator is implemented as

$$\mathcal{S}^{\text{ex}}_{\theta}(\mathbf{u}) = \mathbf{u} + \Delta t\,\big(\mathbf{u} - \mathbf{u}^3\big),$$

where the element-wise operations preserve the vector structure and require no approximation.
The implicit operator approximates the linear transformation $A^{-1} = (\mathbf{I} - \Delta t\,\varepsilon^2 D_2)^{-1}$. Rather than computing this inverse directly, we employ a convolutional neural network architecture that learns to approximate the inverse operator. This approach stems from the fact that $A^{-1}$ can be expressed through its spectral decomposition:

$$A^{-1} = F^{-1}\,\Lambda^{-1}\,F,$$

where $F$ contains the eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues. Due to the circulant structure with periodic boundary conditions, the eigenvectors are the discrete Fourier modes, allowing efficient computation through FFT operations. The convolutional architecture takes the form

$$\mathcal{S}^{\text{im}}_{\phi} = \mathcal{C}_L \circ \sigma \circ \mathcal{C}_{L-1} \circ \cdots \circ \sigma \circ \mathcal{C}_1,$$

where each $\mathcal{C}_\ell$ represents a one-dimensional convolution with kernel size $k_\ell$ and $\sigma$ denotes the activation function.
We employ the Swish activation function due to its smooth properties and bounded derivatives, which contribute to training stability:

$$\sigma(x) = x\,\mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}},$$

whose derivative exhibits favorable properties for gradient-based optimization:

$$\sigma'(x) = \sigma(x) + \mathrm{sigmoid}(x)\,\big(1 - \sigma(x)\big),$$

which avoids the vanishing gradient problem associated with saturation-prone activation functions.
4.2. Architecture Specification and Training Protocol
Table 2 provides complete model specifications addressing reproducibility requirements.
Table 2.
Complete hyperparameter specification for all models.
The implicit component employs a multi-scale convolutional architecture with kernels of sizes 3, 5, 7, and 11, capturing both local interface structure and global diffusive effects. The architecture incorporates three residual blocks with Swish activation functions and attention mechanisms for adaptive scale combination, totaling 39,701 parameters.
4.3. Training Data Generation
In this subsection, we introduce the construction of a training dataset that respects the mathematical structure of Allen–Cahn dynamics. The training set must span relevant function spaces while respecting physical constraints inherent to the equation.
We construct the training data using four distinct classes of initial conditions that capture different aspects of Allen–Cahn behavior.
Steady-state interface profiles. We base the primary training data on equilibrium interface solutions:

$$u_0(x) = \tanh\!\left(\frac{x - x_c}{\sqrt{2}\,w}\right), \qquad (29)$$

where the interface centers $x_c$ are uniformly distributed over the spatial domain and the interface width w is randomly sampled around the physical value $\varepsilon$. These profiles represent the fundamental building blocks of Allen–Cahn dynamics, possessing the characteristic interface width $\mathcal{O}(\varepsilon)$ and, for $w = \varepsilon$, satisfying the stationary ordinary differential equation $\varepsilon^2 u'' + u - u^3 = 0$.
Sinusoidal perturbations. In order to assess the response to smooth variations, we include sinusoidal perturbations:

$$u_0(x) = A \sin(kx),$$

where the amplitude A is randomly sampled below unit magnitude and the frequency k is chosen uniformly among low wavenumbers, ensuring $\|u_0\|_{\infty} \le 1$.
Multi-phase step functions. We construct piecewise constant initial data representing multiple phase regions:

$$u_0(x) = \sum_{i=1}^{K} c_i\, \chi_{I_i}(x),$$

where K is the number of phase regions (typically 2–3), $\chi_{I_i}$ are characteristic functions of disjoint intervals $I_i$, and $c_i \in \{-1, +1\}$ are randomly assigned phase values. The interface locations are sampled with a minimum separation to prevent immediate collision.
Smooth stochastic fields. To improve robustness across diverse initial data [10,25], we generate smoothed random fields by mollifying uniform noise with a Gaussian kernel:

$$u_0(x) = \Pi\!\left[(G_\sigma * \eta)(x)\right],$$

where $\eta$ is uniform random noise in $[-1, 1]$, $G_\sigma$ is a normalized Gaussian kernel with width $\sigma$, and $\Pi$ denotes pointwise clipping that enforces the physical bounds $|u_0| \le 1$. Periodic boundary conditions are built into the circulant matrix structure (17) through its corner entries.
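A compact NumPy sketch of the four generators is given below; the sampling ranges are illustrative assumptions, since the exact ranges used in the experiments are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def ic_interface(x, eps):
    """Steady-state interface: tanh profile with random center and width."""
    x0 = rng.uniform(x.min(), x.max())
    w = eps * rng.uniform(0.5, 2.0)    # width range: illustrative assumption
    return np.tanh((x - x0) / (np.sqrt(2.0) * w))

def ic_sinusoidal(x):
    """Smooth sinusoidal perturbation with amplitude below 1."""
    A = rng.uniform(0.1, 0.9)          # amplitude range: assumption
    k = rng.integers(1, 5)             # low wavenumbers: assumption
    return A * np.sin(k * x)

def ic_steps(x, K=3):
    """Piecewise-constant multi-phase data with values in {-1, +1}."""
    edges = np.sort(rng.uniform(x.min(), x.max(), K - 1))
    vals = rng.choice([-1.0, 1.0], K)
    return vals[np.searchsorted(edges, x)]

def ic_smooth_noise(x, sigma=0.2):
    """Uniform noise mollified by a periodic Gaussian kernel, clipped to [-1, 1]."""
    eta = rng.uniform(-1.0, 1.0, x.size)
    k = np.fft.fftfreq(x.size, d=x[1] - x[0])           # wavenumbers (cycles)
    kernel_hat = np.exp(-0.5 * (2.0 * np.pi * k * sigma) ** 2)
    smooth = np.fft.ifft(kernel_hat * np.fft.fft(eta)).real
    return np.clip(smooth, -1.0, 1.0)
```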
The temporal evolution strategy employs randomized sampling. Instead of generating sequential time series, we implement single-step sampling from random trajectory points. The temporal extent $T_{\max}$ is chosen to capture the characteristic Allen–Cahn relaxation timescale. This choice is motivated by the energy dissipation estimate

$$\frac{d}{dt}E[u(t)] = -\big\| \partial_t u \big\|_{L^2(\Omega)}^2 \le 0$$

for solutions near equilibrium, yielding exponential decay toward the stable states.
The following Algorithm 1 lists the generation of training pairs through randomized temporal sampling:
| Algorithm 1 Randomized Training Data Generation |
for i = 1 to N_samples do
  Generate initial condition u⁰ from the distribution ensemble
  Initialize: u ← u⁰, t ← 0
  Sample evolution time: t_eval uniformly from [0, T_max]
  while t < t_eval do
    Explicit step: u* ← u + Δt (u − u³)
    Implicit step: solve (I − Δt ε² D₂) u = u*
    Update: t ← t + Δt
  end while
  Compute IMEX step: v ← S_Δt(u)
  Store pair: (u, v)
end for
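A direct Python translation of Algorithm 1 might look as follows, using the FFT-based resolvent for the implicit stage; the function signature and the injected initial-condition sampler are assumptions for illustration:

```python
import numpy as np

def generate_pairs(n_samples, M, length, eps, dt, t_max, sample_ic, rng):
    """Algorithm 1: evolve each initial condition to a random time with the
    reference IMEX-Euler scheme, then store one further step as the target."""
    h = length / M
    k = np.arange(M)
    mu = -(4.0 / h**2) * np.sin(np.pi * k / M) ** 2       # eigenvalues of D2
    resolvent = 1.0 / (1.0 - dt * eps**2 * mu)            # Fourier-space inverse

    def imex_step(u):
        u_star = u + dt * (u - u**3)                      # explicit stage
        return np.fft.ifft(resolvent * np.fft.fft(u_star)).real  # implicit stage

    x = h * np.arange(M)
    pairs = []
    for _ in range(n_samples):
        u, t = sample_ic(x), 0.0
        t_eval = rng.uniform(0.0, t_max)
        while t < t_eval:                                 # reach a random point
            u, t = imex_step(u), t + dt
        pairs.append((u.copy(), imex_step(u)))            # (input, target) pair
    return pairs
```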
The full training procedure (Algorithm 2) enhances Algorithm 1 with the complete hyperparameter specifications reported in Table 2:
| Algorithm 2 Enhanced IINN Training Procedure |
4.4. Training Methodology
The training procedure for IINN schemes incorporates both theoretical insights and practical stability considerations established in the previous sections. The total loss function combines multiple components that ensure both accuracy and structure preservation:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{op}}\,\mathcal{L}_{\text{op}} + \lambda_{\text{phys}}\,\mathcal{L}_{\text{phys}} + \lambda_{\text{cons}}\,\mathcal{L}_{\text{cons}} + \lambda_{\text{stab}}\,\mathcal{L}_{\text{stab}}.$$
Operator loss. The primary loss ensures accurate approximation of the IMEX solution operator using the training pairs $(\mathbf{u}_i, \mathbf{v}_i)$ generated by Algorithm 1:

$$\mathcal{L}_{\text{op}} = \frac{1}{N_s}\sum_{i=1}^{N_s} \Big\| \mathcal{S}^{\theta,\phi}_{\Delta t}(\mathbf{u}_i) - \mathbf{v}_i \Big\|_2^2.$$
Physics loss. This term verifies that the learned operator satisfies the underlying Allen–Cahn PDE structure through finite difference approximations:

$$\mathcal{L}_{\text{phys}} = \frac{1}{N_s}\sum_{i=1}^{N_s} \left\| \frac{\hat{\mathbf{v}}_i - \mathbf{u}_i}{\Delta t} - \varepsilon^2 D_2\,\hat{\mathbf{v}}_i - \big(\mathbf{u}_i - \mathbf{u}_i^3\big) \right\|_2^2, \qquad \hat{\mathbf{v}}_i = \mathcal{S}^{\theta,\phi}_{\Delta t}(\mathbf{u}_i),$$

where $D_2$ is the circulant discrete Laplacian matrix defined in Equation (17).
Conservation loss. This component enforces the energy dissipation property, an essential feature of Allen–Cahn gradient flow dynamics:

$$\mathcal{L}_{\text{cons}} = \frac{1}{N_s}\sum_{i=1}^{N_s} \max\!\Big(0,\; E_h\big[\mathcal{S}^{\theta,\phi}_{\Delta t}(\mathbf{u}_i)\big] - E_h[\mathbf{u}_i]\Big),$$

where the discrete energy functional is defined as

$$E_h[\mathbf{u}] = h \sum_{j=0}^{M-1} \left[ \frac{\varepsilon^2}{2} \left( \frac{u_{j+1} - u_{j-1}}{2h} \right)^2 + \frac{1}{4}\big(u_j^2 - 1\big)^2 \right],$$

where the spatial derivatives in the energy functional employ centered finite differences with periodic indexing $u_{-1} = u_{M-1}$ and $u_M = u_0$. We enforce energy dissipation but not mass conservation, as the standard Allen–Cahn equation naturally allows mass changes during phase separation, which is physically correct for the underlying phase field dynamics.
Stability loss. This term ensures that the neural network operator preserves the contractivity properties inherited from the reference IMEX scheme:

$$\mathcal{L}_{\text{stab}} = \frac{1}{N_p}\sum_{(i,j)} \max\!\left(0,\; \frac{\big\|\mathcal{S}^{\theta,\phi}_{\Delta t}(\mathbf{u}_i) - \mathcal{S}^{\theta,\phi}_{\Delta t}(\mathbf{u}_j)\big\|_2}{\big\|\mathbf{u}_i - \mathbf{u}_j\big\|_2} - L \right),$$

where $(\mathbf{u}_i, \mathbf{u}_j)$ are randomly selected pairs from the training dataset, promoting Lipschitz continuity with a constant $L$ consistent with the stability properties established in Theorem 1.
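Putting the four terms together, a possible PyTorch implementation of the composite loss is sketched below. The hinge-style penalties for the conservation and stability terms and the default weights are assumptions consistent with the descriptions above, not the exact formulation used in the experiments:

```python
import torch

def iinn_loss(step, u, v, dt, eps, h, D2, weights=(1.0, 0.1, 0.1, 0.01)):
    """Composite IINN training loss; `step` maps u -> predicted next state.
    u, v: (batch, M) training pairs from Algorithm 1; D2: (M, M) Laplacian."""
    w_op, w_phys, w_cons, w_stab = weights
    v_hat = step(u)

    # Operator loss: match the reference IMEX step.
    loss_op = ((v_hat - v) ** 2).mean()

    # Physics loss: residual of the semi-discrete Allen-Cahn IMEX update.
    residual = (v_hat - u) / dt - eps**2 * (v_hat @ D2.T) - (u - u**3)
    loss_phys = (residual ** 2).mean()

    # Conservation loss: penalize any increase of the discrete energy.
    def energy(w):
        grad = (torch.roll(w, -1, dims=-1) - torch.roll(w, 1, dims=-1)) / (2 * h)
        return h * (0.5 * eps**2 * grad**2 + 0.25 * (w**2 - 1) ** 2).sum(dim=-1)
    loss_cons = torch.relu(energy(v_hat) - energy(u)).mean()

    # Stability loss: promote a Lipschitz constant <= 1 on random pairs.
    perm = torch.randperm(u.shape[0])
    num = (step(u[perm]) - v_hat).norm(dim=-1)
    den = (u[perm] - u).norm(dim=-1).clamp_min(1e-12)
    loss_stab = torch.relu(num / den - 1.0).mean()

    return w_op * loss_op + w_phys * loss_phys + w_cons * loss_cons + w_stab * loss_stab
```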
The current training approach employs randomized temporal sampling from trajectory points, which provides computational efficiency but may not guarantee strict satisfaction of initial conditions. Recent advances in physics-informed learning have introduced hard constraint enforcement methods that directly embed initial and boundary conditions into the network architecture. Luong et al. [26] developed decoupled physics-informed neural networks that automatically satisfy boundary conditions through admissible function selections while handling initial conditions via Galerkin formulations, demonstrating superior accuracy and computational efficiency. While our operator-based approach focuses on preserving temporal evolution structure, future work could investigate hybrid strategies that combine operator splitting preservation with such hard constraint enforcement methods, potentially yielding enhanced performance in problems where initial condition accuracy is critical.
4.5. Optimization Procedure
The optimization employs adaptive learning rate schedules and gradient clipping to ensure stable training dynamics. We adopt the Adam optimizer with initial learning rate $\eta_0$ and exponential decay:

$$\eta_k = \eta_0\, e^{-\gamma k},$$

where the decay rate $\gamma$ is chosen to achieve convergence within the specified training horizon. Gradient clipping prevents exploding gradients that can occur due to the composition structure of the IINN established in Definition 1:

$$\mathbf{g} \leftarrow \mathbf{g}\cdot \min\!\left(1,\; \frac{\tau}{\|\mathbf{g}\|_2}\right),$$

where $\mathbf{g}$ represents the computed gradients and $\tau$ is the clipping threshold chosen to maintain numerical stability during the training process.
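A training-loop sketch combining these elements is shown below; it reuses the ImplicitOperatorNet, iinn_step, and iinn_loss sketches from earlier, and assumes batches u_batch, v_batch and the quantities dt, eps, h, D2 have been prepared. All numeric values (learning rate, decay factor, clipping threshold, epoch count) are illustrative assumptions:

```python
import torch

model = ImplicitOperatorNet()                       # architecture sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)
clip_threshold = 1.0

for epoch in range(5000):
    opt.zero_grad()
    loss = iinn_loss(lambda w: iinn_step(w, dt, model),
                     u_batch, v_batch, dt, eps, h, D2)
    loss.backward()
    # Global gradient-norm clipping, as in the update rule above.
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_threshold)
    opt.step()
    sched.step()                                    # exponential decay of eta
```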
5. Results
In this section, we present numerical experiments comparing the proposed IINN with three baseline methods across five double-well interface configurations. The evaluation focuses on solution accuracy, conservation properties, and long-term stability characteristics of the Allen–Cahn equation.
5.1. Experimental Setup
We evaluate four neural network architectures: the proposed IINN, Advanced_PINN with Fourier feature embedding, Fourier_PINN with multi-scale frequency components, and ResNet_PINN employing deep residual connections. Each scenario was evolved for 300 time steps to assess long-term behavior. The scenarios are defined through specific initial conditions: Standard Double-Well uses a centered equilibrium profile

$$u_0(x) = \tanh\!\left(\frac{x - x_c}{\sqrt{2}\,w}\right),$$

Narrow Double-Well employs the same profile with a reduced interface width for increased stiffness, Wide Double-Well uses an enlarged width for broader interfaces, Asymmetric Double-Well shifts the interface center off-axis, and Complex Multi-Well combines two interfaces as

$$u_0(x) = \tanh\!\left(\frac{x - x_1}{\sqrt{2}\,w}\right)\tanh\!\left(\frac{x_2 - x}{\sqrt{2}\,w}\right), \qquad x_1 < x_2.$$

These configurations test interface width effects, asymmetry handling, and multi-interface dynamics, respectively. All test scenarios employ steady-state interface profiles from Equation (29), as these represent the most physically relevant configurations for Allen–Cahn double-well dynamics and provide clear interface structures for evaluating the proposed method. Training data were generated using the same IMEX method in (5) to ensure a fair comparison across all approaches. All models utilized similar training protocols with Adam optimization, exponential learning rate decay, and gradient clipping for numerical stability.
5.2. Network Architecture
The proposed IINN implements the mathematical decomposition $\mathcal{S}^{\theta,\phi}_{\Delta t} = \mathcal{S}^{\text{im}}_{\phi} \circ \mathcal{S}^{\text{ex}}_{\theta}$ directly into the network architecture. The explicit component computes $\mathbf{u} + \Delta t\,(\mathbf{u} - \mathbf{u}^3)$ without learnable parameters, while the implicit component approximates the resolvent operator $(\mathbf{I} - \Delta t\,\varepsilon^2 D_2)^{-1}$ through a multi-scale convolutional architecture with four parallel processing paths (kernel sizes 3, 5, 7, and 11) and attention mechanisms.
The baseline methods represent established approaches in physics-informed learning. Advanced_PINN incorporates Fourier feature embedding with parallel residual branches, Fourier_PINN uses multi-scale Fourier features at several fixed frequencies, and ResNet_PINN employs deep residual blocks with bottleneck design. All architectures maintain comparable parameter counts (150 k–280 k) to ensure fair comparison.
Table 3 presents comprehensive timing analysis demonstrating the computational efficiency of IINN. Despite sophisticated multi-scale architecture, IINN achieves competitive performance with significantly fewer parameters.
Table 3.
Computational performance comparison.
IINN shows a gain in parameter efficiency, achieving comparable inference speeds with 28× fewer parameters than baseline methods. The multi-scale CNN architecture maintains computational tractability while preserving mathematical structure of IMEX schemes.
5.3. Training Stability Analysis
Figure 1 demonstrates comprehensive training diagnostics for IINN, validating convergence stability and physics-informed loss evolution.
Figure 1.
Training stability diagnostics showing: (top row) total loss evolution, MSE component, and learning rate schedule; (bottom row) energy conservation loss, mass conservation loss, and gradient norm evolution. The multi-component loss function ensures both accuracy and physical consistency.
The training exhibits stable convergence with proper physics-informed behavior:
- Total Loss: Smooth exponential decay without oscillations
- Energy Conservation: Consistent improvement ensuring gradient flow properties
- Mass Conservation: Stable evolution maintaining physical bounds
- Gradient Norm: Controlled magnitudes indicating numerical stability
5.4. Solution Accuracy Analysis
Table 4 presents the relative error between predicted and reference solutions across all test scenarios. Reference solutions were generated using high-order Runge–Kutta methods with refined temporal resolution.
Table 4.
Relative error comparison across double-well scenarios.
The results show that IINN achieves relative errors on the order of $10^{-4}$, which are notably lower than baseline approaches that exhibit errors in the range from $10^{-2}$ to $10^{-1}$. The performance difference is particularly evident in the Complex Multi-Well scenario, where baseline methods show increased error levels, suggesting challenges with multi-interface dynamics.
5.5. Ablation Analysis of Physics-Informed Components
Table 5 quantifies the contribution of individual loss terms to overall performance.
Table 5.
Loss component ablation analysis.
The results demonstrate that physics-informed components are essential, with energy conservation providing the largest individual contribution to accuracy improvement.
5.6. Validation of Learned Operator Behavior
Figure 2 validates that $\mathcal{S}^{\text{im}}_{\phi}$ successfully approximates the theoretical resolvent operator $(\mathbf{I} - \Delta t\,\varepsilon^2 D_2)^{-1}$.
Figure 2.
Learned operator analysis. Left: attenuation factors for different Fourier modes showing agreement with theoretical predictions. Right: learned convolutional kernels demonstrating the spatial filtering characteristics of the network.
The network achieves small relative errors across all tested frequencies, as also shown in Table 6, confirming that the multi-scale architecture correctly approximates the resolvent operator $(\mathbf{I} - \Delta t\,\varepsilon^2 D_2)^{-1}$ and captures the spectral properties that are essential for stability in stiff systems.
Table 6.
Learned vs. theoretical attenuation factors.
5.7. Energy Dissipation Properties
Energy dissipation, expressed as $\frac{d}{dt}E[u(t)] \le 0$, constitutes the fundamental thermodynamic principle governing Allen–Cahn dynamics. Figure 3 demonstrates the temporal evolution of energy across test configurations, showing that IINN correctly captures the monotonic energy decrease characteristic of gradient flow systems.
Figure 3.
Energy dissipation over 300 time steps for all test scenarios. IINN demonstrates proper monotonic energy decrease while baseline methods show energy drift and violations of thermodynamic principles.
IINN maintains proper energy dissipation, with small, non-increasing average energy changes consistent with the expected gradient flow behavior. The baseline methods exhibit substantially larger energy fluctuations for the PINN variants, indicating violations of this fundamental thermodynamic principle.
5.8. Interface Evolution Characteristics
Figure 4, Figure 5 and Figure 6 present solution snapshots at different time instances for representative test scenarios, revealing qualitative differences in interface preservation and evolution characteristics.
Figure 4.
Standard Double-Well scenario showing symmetric interface evolution. IINN preserves interface structure throughout evolution, while baseline methods exhibit interface distortion.
Figure 5.
Asymmetric Double-Well scenario evolution. IINN preserves interface structure while baseline methods show profile distortion.
Figure 6.
Complex Multi-Well scenario evolution. IINN maintains multi-interface dynamics while baseline methods show solution degradation.
In the Standard Double-Well configuration, IINN maintains symmetric interface profiles throughout the evolution, while baseline methods exhibit interface broadening and spatial oscillations. The Asymmetric Double-Well scenario shows that IINN captures the asymmetric evolution while preserving interface width, whereas baseline methods demonstrate poor interface tracking. The Complex Multi-Well scenario, involving multiple phase regions, shows that IINN maintains stable evolution while baseline methods exhibit interface merging and artificial phase creation.
5.9. Discussion
The results suggest that embedding IMEX operator splitting structure directly into neural network architectures provides benefits for the Allen–Cahn equation. The architectural preservation of the decomposition appears to contribute to improved accuracy and conservation properties compared to standard physics-informed approaches. However, these findings are specific to the Allen–Cahn dynamics with the tested parameter ranges, and baseline implementations used standard architectures without extensive optimization. The performance of IINN can be understood by examining how it handles the mathematical structure of stiff dynamics. The Allen–Cahn equation couples two processes operating on vastly different timescales: rapid diffusive relaxation governed by $\varepsilon^2\, \partial_{xx} u$ and slower interface motion driven by the nonlinear term $u - u^3$. When sharp interfaces demand fine spatial resolution, explicit numerical methods become unstable because they cannot adequately resolve the fast diffusive timescale without prohibitively small time steps. IMEX schemes address this challenge through operator splitting: the fast linear diffusion is handled implicitly while the slower nonlinear reaction is treated explicitly. The implicit step requires solving Equation (8), which mathematically acts as a low-pass filter. High-frequency spatial modes that could trigger numerical instabilities are attenuated by factors $\big(1 + \Delta t\, \varepsilon^2 k^2\big)^{-1}$, while low-frequency modes representing the physical interface structure pass through with minimal modification. The key insight is that the IINN architecture learns to approximate this filtering operation directly through its convolutional structure. Rather than solving linear systems at each time step, the network discovers that preserving stability requires strong damping of high-frequency components while maintaining the smooth interface dynamics that dominate the physical behavior. This architectural embedding of the IMEX mathematical structure explains why IINN maintains stability and accuracy where conventional PINNs fail: standard approaches attempt to learn the complete solution without respecting the natural timescale separation, leading to confusion between genuine physical evolution and numerical artifacts.
The current approach has several important limitations. Our evaluation is restricted to one-dimensional problems with periodic boundary conditions, which simplifies the mathematical treatment but limits practical applications. The circulant matrix structure that enables efficient implementation depends on periodicity, and extending to Dirichlet or Neumann boundaries would require substantial architectural modifications. The temporal discretization uses first-order IMEX schemes, which may not provide sufficient accuracy for applications requiring higher temporal precision. Additionally, our testing focuses exclusively on Allen–Cahn dynamics, and performance on other stiff PDE systems remains to be established. Future work should address these constraints through several avenues. Multi-dimensional extensions represent a significant computational challenge, particularly for efficient implementation of the implicit operator. Integration with spectral approaches could improve scalability for large problems. Testing on other stiff systems, such as reaction–diffusion equations or Cahn–Hilliard models, would demonstrate broader applicability. Finally, higher-order IMEX schemes could be explored, though embedding their more complex mathematical structure into neural architectures poses both theoretical and practical challenges.
6. Conclusions
This work introduces IMEX-informed neural networks (IINNs), which embed an implicit–explicit integration scheme structure directly into neural network architectures for solving partial differential equations. Unlike conventional physics-informed neural networks that enforce constraints through loss penalties, the proposed approach incorporates the proven IMEX operator decomposition within the network design itself. Numerical evaluation on Allen–Cahn equation dynamics across five double-well scenarios demonstrates measurable improvements in solution accuracy and conservation properties. IINNs achieved relative errors of approximately $10^{-4}$, two to three orders of magnitude lower than baseline methods exhibiting errors of $10^{-2}$ to $10^{-1}$. More importantly, the method maintained mass conservation within 1.25% and appropriate energy dissipation, while baseline approaches showed mass deviations exceeding 75% and significant energy violations. The architectural preservation of numerical method structure provides inherent stability advantages for stiff interface problems, where conventional approaches face time step restrictions and training difficulties with multiple time scales. However, these findings are constrained to Allen–Cahn dynamics with specific parameter ranges, and baseline implementations used standard architectures without extensive optimization. The results suggest that structure-preserving neural architectures represent a promising direction for stiff PDEs when the underlying physics exhibits well-understood mathematical structure amenable to operator splitting.
Author Contributions
Conceptualization, P.D.L. and L.M.; Methodology, P.D.L.; Software, P.D.L.; Validation, P.D.L. and L.M.; Formal analysis, P.D.L. and L.M.; Investigation, P.D.L.; Data curation, P.D.L.; Writing—original draft, P.D.L. and L.M.; Writing—review & editing, P.D.L. and L.M.; Visualization, P.D.L. and L.M.; Supervision, P.D.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Acknowledgments
De Luca P. and Marcellino L. are members of the Gruppo Nazionale Calcolo Scientifico-Istituto Nazionale di Alta Matematica (GNCS-INdAM).
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| PDEs | Partial Differential Equations |
| PINNs | Physics-Informed Neural Networks |
| IINNs | IMEX-Informed Neural Networks |
| IMEX | Implicit–Explicit |
References
- Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
- Valentino, C.; Pagano, G.; Conte, D.; Paternoster, B.; Colace, F.; Casillo, M. Step-by-step time discrete Physics-Informed Neural Networks with application to a sustainability PDE model. Math. Comput. Simul. 2025, 230, 541–558. [Google Scholar] [CrossRef]
- Boscarino, S.; Pareschi, L.; Russo, G. Implicit-Explicit Methods for Evolutionary Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2023. [Google Scholar]
- Cuomo, S.; di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics-informed neural networks: Where we are and what’s next. J. Sci. Comput. 2022, 92, 88. [Google Scholar] [CrossRef]
- Mishra, S.; Molinaro, R. Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning. Acta Numer. 2023, 32, 1–99. [Google Scholar]
- Hernández, Q.; Badías, A.; González, D.; Chinesta, F.; Cueto, E. Structure-preserving neural networks. J. Comput. Phys. 2020, 426, 109950. [Google Scholar] [CrossRef]
- Greydanus, S.; Dzamba, M.; Yosinski, J. Hamiltonian neural networks. Adv. Neural Inf. Process. Syst. 2019, 32, 15353–15363. [Google Scholar]
- Ciaramella, A.; De Angelis, D.; De Luca, P.; Marcellino, L. Accelerated Numerical Simulations of a Reaction-Diffusion-Advection Model Using Julia-CUDA. Mathematics 2025, 13, 1488. [Google Scholar] [CrossRef]
- Di Vicino, A.; De Luca, P.; Marcellino, L. First experiences on exploiting physics-informed neural networks for approximating solutions of a biological model. In Proceedings of the International Conference on Computational Science, Singapore, 7–9 July 2025; Springer International Publishing: Cham, Switzerland, 2025. Chapter 2, LNCS 15910, Part VI. [Google Scholar] [CrossRef]
- Wang, S.; Teng, Y.; Perdikaris, P. Respecting causality is all you need for training physics-informed neural networks. arXiv 2022, arXiv:2203.07404. [Google Scholar] [CrossRef]
- Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. [Google Scholar] [CrossRef]
- Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier neural operator for parametric partial differential equations. arXiv 2021, arXiv:2010.08895. [Google Scholar] [CrossRef]
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. 2018, 31, 6572–6583. [Google Scholar]
- Bartels, S. The Allen–Cahn Equation. In Numerical Methods for Nonlinear Partial Differential Equations; Springer International Publishing: Cham, Switzerland, 2015; pp. 153–182. [Google Scholar]
- Strang, G. On the construction and comparison of difference schemes. SIAM J. Numer. Anal. 1968, 5, 506–517. [Google Scholar] [CrossRef]
- González-Pinto, S.; Hernández-Abreu, D.; Pérez-Rodríguez, M.S.; Sarshar, A.; Roberts, S.; Sandu, A. A unified formulation of splitting-based implicit time integration schemes. J. Comput. Phys. 2022, 448, 110766. [Google Scholar] [CrossRef]
- Kassam, A.-K.; Trefethen, L.N. Fourth-order time-stepping for stiff PDEs. SIAM J. Sci. Comput. 2005, 26, 1214–1233. [Google Scholar] [CrossRef]
- Hundsdorfer, W.; Verwer, J.G. Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 33. [Google Scholar]
- Adams, R.A.; Fournier, J.J.F. Sobolev Spaces, 2nd ed.; Academic Press: Cambridge, MA, USA, 2003. [Google Scholar]
- Gilbarg, D.; Trudinger, N.S. Elliptic Partial Differential Equations of Second Order; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
- Folland, G.B. Introduction to Partial Differential Equations; Princeton University Press: Princeton, NJ, USA, 2020; Volume 102. [Google Scholar]
- Evans, L.C. Partial Differential Equations. Graduate Studies in Mathematics; American Mathematical Society: Providence, RI, USA, 1998; Volume 19. [Google Scholar]
- Reed, M.; Simon, B. Methods of Modern Mathematical Physics I: Functional Analysis; Academic Press: Cambridge, MA, USA, 1980. [Google Scholar]
- Schmid, P.J.; Li, L.; Juniper, M.P.; Pust, O. Applications of the dynamic mode decomposition. Theor. Comput. Fluid Dyn. 2011, 25, 249–259. [Google Scholar] [CrossRef]
- Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 1989, 2, 303–314. [Google Scholar] [CrossRef]
- Luong, K.A.; Wahab, M.A.; Lee, J.H. Simultaneous imposition of initial and boundary conditions via decoupled physics-informed neural networks for solving initial-boundary value problems. Appl. Math. Mech.-Engl. Ed. 2025, 46, 763–780. [Google Scholar] [CrossRef]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).