1. Introduction
Multi-phase flow in porous media underpins critical applications in subsurface engineering—from groundwater remediation to carbon sequestration. The governing equations, nonlinear partial differential equations (PDEs) that capture the intricate interactions among fluid phases and the heterogeneous porous structure, are notoriously difficult to solve numerically. Traditional solvers, whether implemented in commercial software or open-source tools, require extremely fine discretizations to capture the essential dynamics, which in turn leads to prohibitive computational costs and long turnaround times, especially when conducting multi-parameter studies or producing high-resolution predictions.
While deep learning has revolutionized fields such as imaging, speech recognition, and natural language processing [1], the application of machine learning techniques that explicitly incorporate physical laws is relatively new. Early contributions such as the physics-informed neural networks (PINNs) introduced in [2] and later refined in [3,4] have demonstrated that embedding governing equations into the training process can guide the solution toward physically consistent behavior. Yet, these methods have often yielded modest gains when compared with the dramatic successes observed in other areas of deep learning.
A particularly transformative approach comes from the emerging field of operator learning, which seeks to approximate mappings between infinite-dimensional function spaces. In this context, the Fourier Neural Operator (FNO) introduced in [5] and extended in [6] represents a paradigm shift. By reformulating the kernel integral in Fourier space and truncating the high-frequency components, FNO achieves spectral convolution that is both computationally efficient and inherently suited to enforcing physical constraints. Crucially, its invariance with respect to discretization allows the learned operator to generalize seamlessly across arbitrary grids without requiring re-training—a property of immense value for high-resolution simulations.
In porous media flow, where multi-scale heterogeneities and strong nonlinearities prevail, integrating physics-informed loss terms into the FNO framework (as demonstrated in physics-informed neural operators [7,8]) has proven essential. This integration ensures that the surrogate model rigorously adheres to the underlying PDEs, thereby reducing the computational effort needed to achieve accurate predictions. For example, implementations on platforms such as NVIDIA Modulus have shown that FNO-based approaches can reduce simulation times by up to three orders of magnitude while preserving accuracy across different discretizations [9]. More recently, several studies have adapted FNO-style architectures specifically to multiphase flow in porous media and related subsurface applications, including enhanced multiphase FNO variants, residual-corrected neural operators for nonlinear PDE inverse problems, and operator-based surrogates for field-scale reservoir simulations [10,11,12,13].
However, existing applications of Fourier Neural Operators to porous-media flow still face important limitations. Most studies focus on single-phase or weakly coupled settings and treat only one state variable (typically pressure or saturation), thereby neglecting the strong two-way coupling that characterizes multiphase displacement. Purely data-driven FNOs also tend to oversmooth sharp saturation fronts and underestimate channelized flow in highly heterogeneous formations such as SPE10, particularly during early transients. In many implementations, time is handled as a simple scalar feature or as an additional spatial coordinate, which can degrade accuracy over long horizons and lead to drift between transient and pseudo-steady regimes. Moreover, existing FNO surrogates rarely enforce PDE residuals during training, so saturation and pressure predictions may violate mass balance and Darcy’s law, reducing robustness when extrapolating across injection scenarios.
This work presents an advanced framework for applying FNO to the physics-informed simulation of multi-phase flow in porous media. Building on a dual-branch formulation (DB-AFNO), the physics-based loss functions are integrated with different temporal encodings (e.g., Time2Vec, TFT/TST) to honor the governing PDE constraints in both saturation and pressure fields. Our comprehensive computational analysis demonstrates that DB-AFNO outperforms conventional, purely data-driven FNO approaches across diverse tracer transport configurations, reducing errors during early transient phases and maintaining accuracy in later pseudo-steady states. In particular, DB-AFNO with Time2Vec and TFT/TST concentrates the majority of saturation predictions in the lower error ranges, preserving critical flow structures that the conventional FNO often oversmooths. Blind testing on the top and bottom layers of the 10th SPE Comparative Solution Project [14] highlights the framework's potential for scalable, high-resolution simulations without incurring the computational costs of traditional full-order solvers.
The uniqueness of this approach lies in explicitly coupling the two PDE variables using a multitask, time-aware architecture. While conventional FNOs typically address a single field, our model retains the core spectral convolution operations and extends them via key modifications. First, a shared spectral encoder predicts both pressure and saturation, extracting spatial features that naturally encode interdependencies. Second, two distinct time encoders—one for saturation and one for pressure—transform the scalar time t into high-dimensional representations that capture differing temporal sensitivities while preserving mutual spatial information. Finally, after decoding via separate branches, the outputs are fused through an additional convolutional layer that enforces consistency between the predictions through cross-information exchange.
2. Methods
This section presents a technical description of the problem using the dynamical state-space framework. The focus of this formulation is on the incompressible and immiscible displacement of two phases within a porous medium. More recently, the same principles have been extended to carbon sequestration efforts, specifically in the displacement of brine by CO2 [15]. In this work, we adopt a Darcy-scale, incompressible and immiscible two-phase formulation on a fixed Cartesian grid. The two phases are denoted wetting and non-wetting, with saturations $S_w$ and $S_n$ and corresponding pressures $p_w$ and $p_n$. The physical domain $\Omega$ is discretized into $N$ control volumes on a structured grid, and the primary unknowns collected in the state vector are the cell-wise saturation and pressure fields. The fluid system is isothermal, with constant viscosities and densities, and rock properties (porosity and absolute permeability) are time-invariant and defined per grid cell. Flow is governed by Darcy's law under a single-continuum representation, without explicit geomechanical deformation, fracture propagation, or matrix–fracture transfer terms, and without capillary-pressure hysteresis. Gravity is included through a constant acceleration vector, while molecular diffusion and dispersion are neglected so that transport is driven by advection and sources/sinks only. Under these assumptions, the saturation constraint reduces to $S_w + S_n = 1$, and a monotone capillary-pressure law $p_c(S_w) = p_n - p_w$ together with Corey-type relative permeability functions $k_{rw}(S_w)$ and $k_{rn}(S_n)$ closes the system. The governing equations fully describe the displacement process considered here. Gas-phase transport, solution-gas liberation, and evaporation/condensation processes are neglected; consequently, no gas saturation or gas pressure variables are introduced into the state vector, and all volumetric source terms represent injection and production of the two liquid phases only. A concise summary of the formulation is provided in Table 1, Table 2 and Table 3.
The above formulation was coded in Python using PyTorch and CUDA to allow for interoperable data transfer between the simulator and the neural operator architecture. In the following sections, we present the numerical methods used for the simulation along with the neural operator architectures.
2.1. Model Description
In this study, we used the SPE10 Model 2 dataset [14], an open-source benchmark designed to evaluate upscaling methods for aquifer flow and tracer transport; here it serves as an analogue of an underground water aquifer for tracer transport. The dataset comprises a Brent sequence mapped onto a Cartesian grid of dimensions $60 \times 220 \times 85$, resulting in 1,122,000 cells. The model encompasses two distinct formations: the shallow-marine Tarbert formation occupying the upper 35 layers, characterized by relatively smooth permeability, and the fluvial Upper-Ness formation constituting the lower 50 layers, noted for its significant permeability variations spanning 8 to 12 orders of magnitude. These formations present markedly different permeability structures, as highlighted in Figure 1, where the model is inverted to emphasize the heterogeneity within the Upper-Ness formation. This complex permeability and porosity distribution poses a rigorous challenge, making it an ideal test case for operator learning.
2.2. Numerical Methods
We have implemented a framework that incorporates an adaptive trapezoidal integration scheme. The Generalized Conjugate Residual (GCR) method serves as a robust solver for the linear systems arising within each Newton iteration, handling sparse matrix structures efficiently and enhancing the convergence characteristics of the overall method. Newton's algorithm, integrated with the GCR solver, is designed to leverage its fast convergence properties, significantly reducing the computational overhead associated with direct methods such as LU decomposition while effectively handling nonlinear systems. This integration of the adaptive trapezoidal method with iterative linear solvers is crucial in our simulations, striking a balance between computational efficiency and the accuracy of transient dynamic responses in complex multiphase flows.
Two main challenges were encountered in this work: (1) building a stable model to serve as a base for our neural operator implementation, and (2) ensuring that the model is computationally efficient.
To address these challenges, we implemented several advanced numerical techniques and optimization methods that enhance both the stability and efficiency of the model: an adaptive dynamic stepping scheme, GCR with a banded LU preconditioner, and automatic differentiation for efficient Jacobian computation.
2.2.1. Preconditioned Iterative Solvers
Iterative methods were deemed more suitable for our approach because the accuracy required for the phase saturations can be specified in advance. However, iterative methods, and GCR in particular, may not be efficient for sparse matrices out of the box. A preconditioner is therefore important to achieve fast convergence while retaining control over the desired accuracy.
Banded LU/ILU preconditioners are particularly suitable for systems where the matrix $A$ exhibits a banded structure [16]. This approach is advantageous for large sparse systems with a banded pattern, as it significantly reduces the fill-in compared with a full LU decomposition, thereby preserving computational efficiency.
Alternatively, when the Jacobian matrix $A$ is symmetric and positive-definite, a Cholesky decomposition $M = L L^{T}$ can be used for preconditioning [17], and the preconditioned system $M^{-1} A x = M^{-1} b$ is solved in its place. Multi-grid preconditioners are also common for fluid-flow problems; however, no direct GPU implementation comparable to PyAMG on the CPU was available in Python, so they were not included in our benchmark. The choice between banded LU and Cholesky preconditioning depends on the structure and properties of the Jacobian matrix, with Cholesky preferred for symmetric positive-definite systems because of its computational efficiency. In our model, we found ILU to be more stable for our Neumann boundary condition pressure solver.
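As an illustration of this combination, the following is a minimal sketch of a GCR iteration wrapped around an incomplete-LU preconditioner via SciPy's `spilu`; the test matrix, drop tolerance, and fill factor are illustrative assumptions, not our production settings.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def gcr(A, b, M_solve, tol=1e-8, max_iter=200):
    """Generalized Conjugate Residual with preconditioning applied via M_solve."""
    x = np.zeros_like(b)
    r = b - A @ x
    Ps, APs = [], []                               # search directions and A @ directions
    for _ in range(max_iter):
        p = M_solve(r)                             # preconditioned residual as new direction
        Ap = A @ p
        for pj, Apj in zip(Ps, APs):               # orthogonalize Ap against previous A @ p_j
            beta = np.dot(Ap, Apj) / np.dot(Apj, Apj)
            p, Ap = p - beta * pj, Ap - beta * Apj
        alpha = np.dot(r, Ap) / np.dot(Ap, Ap)
        x, r = x + alpha * p, r - alpha * Ap       # residual-minimizing update
        Ps.append(p); APs.append(Ap)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x

# Illustrative banded system preconditioned with an incomplete LU factorization.
n = 1000
A = sp.diags([-1.0, 2.5, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
ilu = spla.spilu(A, drop_tol=1e-5, fill_factor=10)
x = gcr(A, np.ones(n), ilu.solve)
```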
2.2.2. Adaptive Dynamic Stepping
In the adaptive time-stepping scheme implemented within the simulation framework, the trapezoidal method is utilized to preserve numerical stability without the need to impose a CFL-type timestep restriction, which is particularly crucial for the stiff systems encountered in multiphase flow simulations. The integration process begins with a prediction step that takes advantage of the explicit forward Euler method to approximate the saturation at the subsequent timestep, denoted $S^{\text{pred}}_{n+1}$. This preliminary prediction is calculated according to the following equation:

$$S^{\text{pred}}_{n+1} = S_n + \Delta t_n\, f(S_n),$$

where $S_n$ represents the current saturation state, $\Delta t_n$ signifies the current timestep, and $f(S_n)$ is the rate of change in saturation evaluated at the current state. This predictor step serves as an initial approximation for the subsequent implicit corrector step, which refines the estimate to enhance stability and accuracy. The corrector phase is carried out using an implicit trapezoidal integration, where the corrected saturation $S_{n+1}$ is determined by resolving the nonlinear equation:

$$S_{n+1} = S_n + \frac{\Delta t_n}{2}\left[ f(S_n) + f(S_{n+1}) \right].$$

Here, $f(S_{n+1})$ denotes the rate of saturation change evaluated at the next predicted state. This equation is solved iteratively using the Newton–Raphson method.
To dynamically adjust the timestep $\Delta t$, an error estimation mechanism is incorporated that compares the norm of the discrepancy between the predictor and corrector solutions against a predefined tolerance. This adaptive time-step control is vital for managing computational efficiency and accuracy throughout the varying dynamics of the modeled system. If the error estimate exceeds the tolerance threshold, $\Delta t$ is reduced to enhance resolution; conversely, if the error is significantly below the tolerance, $\Delta t$ is increased to expedite computations without sacrificing significant accuracy.

In the context of dynamically regulating the time step $\Delta t$ within numerical simulations, a sophisticated approach is required to ensure stability and precision under varying conditions. The reduction factor for adjusting $\Delta t$ can be defined using an exponential function, providing a smooth and continuous regulation mechanism. The reduction factor $R$ for the time step $\Delta t$ is given by the following equation:

$$R(\epsilon) = R_{\min} + \left( R_{\max} - R_{\min} \right) e^{-k \epsilon},$$

where $R_{\min}$ is the minimum reduction factor, $R_{\max}$ is the maximum reduction factor, $\epsilon$ is the estimated error, and $k$ is a scaling parameter controlling the steepness of the exponential decay.

In this formulation, as $\epsilon$ increases, the reduction factor $R$ approaches the lower bound $R_{\min}$, whereas as $\epsilon$ decreases, $R$ approaches the upper bound $R_{\max}$. To dynamically adjust the time step $\Delta t$, the new time step $\Delta t_{\text{new}}$ is defined as

$$\Delta t_{\text{new}} = \max\left( R\, \Delta t,\ \Delta t_{\min} \right),$$

where $\Delta t_{\min}$ is the minimum allowable step size, preventing excessively small steps that would be computationally inefficient. The stability constraints for $\Delta t$ are derived from the spectral properties of the system's Jacobian $J$, using the inverse of its maximum eigenvalue, $1/|\lambda_{\max}(J)|$, together with a more conservative bound scaled from it. These limits ensure that timestep updates remain within a range that preserves the numerical stability of the integration method for multiphase flow. All pseudocode listings for the above algorithms, Algorithms A1–A4, are provided in Appendix A.
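The complete pseudocode is given in Appendix A (Algorithms A1–A4); as a complement, the following is a minimal NumPy sketch of one adaptive predictor–corrector step, with `fsolve` standing in for the Newton–GCR solve and all tolerances and factor bounds chosen for illustration only.

```python
import numpy as np
from scipy.optimize import fsolve

def adaptive_step(f, S_n, dt, tol=1e-4, dt_min=1e-6, R_min=0.2, R_max=2.0, k=5.0):
    """One predictor-corrector step with exponential step-size regulation.
    fsolve stands in for the Newton solve used in the actual framework."""
    while True:
        f_n = f(S_n)
        S_pred = S_n + dt * f_n                               # forward Euler predictor
        g = lambda S: S - S_n - 0.5 * dt * (f_n + f(S))       # implicit trapezoidal corrector
        S_corr = fsolve(g, S_pred)
        err = np.linalg.norm(S_corr - S_pred)                 # predictor/corrector discrepancy
        R = R_min + (R_max - R_min) * np.exp(-k * err / tol)  # exponential reduction factor
        if err <= tol or dt <= dt_min:
            return S_corr, dt, max(R * dt, dt_min)            # state, dt used, next dt
        dt = max(R * dt, dt_min)                              # reject: shrink and retry

# Example on a stiff scalar decay dS/dt = -10 S over t in [0, 1].
S, dt, t = np.array([1.0]), 0.1, 0.0
while t < 1.0:
    S, used, dt = adaptive_step(lambda s: -10.0 * s, S, min(dt, 1.0 - t))
    t += used
```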
2.3. Neural Operator Architecture
We use a DB-AFNO neural operator that fuses spectral, spatial, and physics-based information to predict the mean saturation $\mu_S$ and its variance $\sigma_S^2$. Static geological properties—porosity $\phi$ and permeability $K$—initial states $S_0$ and $P_0$, and phase viscosities $\mu_w$ and $\mu_n$ are combined with a temporal encoding via Time2Vec and a causal transformer that conditions each step on the time index and the preceding state. Fourier layers capture the coupled space–time response, and training minimizes a composite loss that blends data misfit with PDE residuals enforcing mass conservation and Darcy flow. In this way, the network advances the state in time and returns high-resolution, physically consistent pressure $P$ and saturation $S$ fields at arbitrary query times, as summarized in Figure 2.
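The following is a minimal, self-contained sketch of how such a dual-branch layout can be wired in PyTorch; the plain convolutional encoder and linear time encoders are stand-ins for the spectral trunk and Time2Vec modules described here, and all widths and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    """Sketch of the dual-branch layout: a shared encoder (stand-in for the
    spectral FNO trunk), separate time encoders for saturation and pressure,
    separate decoders, and a fusion convolution for cross-variable consistency."""
    def __init__(self, in_ch=6, width=64, t_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                # stand-in for the shared spectral encoder
            nn.Conv2d(in_ch + t_dim, width, 1), nn.GELU(),
            nn.Conv2d(width, width, 1), nn.GELU())
        self.time_sat = nn.Linear(1, t_dim)          # stand-in for the saturation time encoder
        self.time_prs = nn.Linear(1, t_dim)          # stand-in for the pressure time encoder
        self.dec_sat = nn.Conv2d(width, 1, 1)
        self.dec_prs = nn.Conv2d(width, 1, 1)
        self.fuse = nn.Conv2d(2, 2, 3, padding=1)    # fusion layer coupling S and P

    def forward(self, x, t):                         # x: (B, C, H, W), t: (B, 1)
        B, _, H, W = x.shape
        tile = lambda e: e.view(B, -1, 1, 1).expand(-1, -1, H, W)
        z_s = self.encoder(torch.cat([x, tile(self.time_sat(t))], dim=1))
        z_p = self.encoder(torch.cat([x, tile(self.time_prs(t))], dim=1))
        S, P = self.dec_sat(z_s), self.dec_prs(z_p)
        S, P = self.fuse(torch.cat([S, P], dim=1)).chunk(2, dim=1)
        return S, P

S, P = DualBranchNet()(torch.randn(2, 6, 60, 220), torch.rand(2, 1))
```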
2.3.1. Data Preparation
The data preparation pipeline for the physics-informed Fourier Neural Operator framework integrates simulation outputs—including grid parameters, $K$ fields, well configurations, initial saturation distributions, and fluid properties—stored in serialized files. The raw data is reformatted into a consistent spatial representation; for example, the $K$ tensor is converted to an effective scalar field via magnitude computation. Temporal dynamics are captured using a multi-frequency positional encoding with sine and cosine functions over logarithmically spaced scales, which is then spatially broadcast and concatenated with the static input channels to form a comprehensive feature tensor. Global statistical measures (mean and standard deviation) are computed over the entire dataset and applied to both input and target fields to standardize the data, reducing numerical variability and aiding convergence during training. Early time steps are replicated to better represent rapid transients. Over 1000 simulation runs at half fidelity were performed, with the data partitioned 80% for training and 20% for validation, plus additional cases reserved for blind testing. These runs span a wide range of heterogeneous SPE10 Model 2 realizations, as well as idealized cases with spatially uniform permeability and porosity that are included explicitly in the training set. Across all runs, we vary initial water saturations, viscosity ratios, well locations, and injection schedules, resulting in substantial variability in flood-front shapes, breakthrough times, and recovery factors. A statistical overview of the key petrophysical variables is provided in Figure 3, and descriptive statistics are listed in Table 4, demonstrating that the training set covers a broad portion of the oil–water displacement behavior within this benchmark setting. This pipeline is implemented in a modular Python framework built on PyTorch (version 2.9.1, CUDA 12.6), supporting efficient file handling, dynamic sample generation, and mini-batch processing for large-scale data, with the simulations executed on hardware platforms equipped with RTX Titan GPUs.
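A minimal sketch of the two key preprocessing steps, namely the multi-frequency temporal encoding and channel standardization with spatial broadcasting, is given below; the scale range is an illustrative assumption, and the per-sample statistics here stand in for the dataset-wide statistics computed once in the actual pipeline.

```python
import numpy as np

def time_encoding(t, n_freq=8, t_min=1e-2, t_max=1e3):
    """Multi-frequency encoding of scalar time t using sine/cosine over
    logarithmically spaced scales (scale range is an assumed placeholder)."""
    scales = np.logspace(np.log10(t_min), np.log10(t_max), n_freq)
    return np.concatenate([np.sin(t / scales), np.cos(t / scales)])   # (2*n_freq,)

def build_features(static_ch, t):
    """Standardize static channels (C, H, W), then concatenate the spatially
    broadcast time encoding to form the feature tensor."""
    mu = static_ch.mean(axis=(1, 2), keepdims=True)   # in practice: dataset-wide stats
    sd = static_ch.std(axis=(1, 2), keepdims=True) + 1e-8
    x = (static_ch - mu) / sd
    enc = time_encoding(t)[:, None, None]             # (2*n_freq, 1, 1)
    enc = np.broadcast_to(enc, (enc.shape[0],) + x.shape[1:])
    return np.concatenate([x, enc], axis=0)           # (C + 2*n_freq, H, W)

feats = build_features(np.random.rand(4, 60, 220), t=12.5)
```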
2.3.2. Pure Data-Driven Approach
The model processes inputs $a(x)$ through lifted representations $z_\ell(x)$ updated via

$$z_{\ell+1}(x) = \sigma\!\left( W_\ell\, z_\ell(x) + \mathcal{F}^{-1}\!\big( R_\ell \cdot \mathcal{F}(z_\ell) \big)(x) \right),$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier and inverse Fourier transforms, $R_\ell$ are learnable spectral weights, and $\mathcal{F}(z_\ell)$ represents the learned Fourier features. Subsequent layers combine spectral, spatial, and physics-guided paths:

$$z_{\ell+1} = \sigma\!\left( \mathcal{K}_{\text{spec}}(z_\ell) + \mathcal{K}_{\text{dw}}(z_\ell) + W_\ell\, z_\ell \right),$$

where $\mathcal{K}_{\text{spec}}$ and $\mathcal{K}_{\text{dw}}$ correspond to the spectral convolution and the spatial depthwise convolution, respectively.
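For concreteness, the following is a minimal PyTorch sketch of a 2-D spectral convolution layer in this spirit; the mode counts and initialization are illustrative assumptions rather than our exact configuration.

```python
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    """2-D spectral convolution: FFT, multiply a learnable complex weight on the
    lowest (modes1 x modes2) frequencies, inverse FFT back to physical space."""
    def __init__(self, in_ch, out_ch, modes1=12, modes2=12):
        super().__init__()
        self.modes1, self.modes2 = modes1, modes2
        scale = 1.0 / (in_ch * out_ch)
        self.w = nn.Parameter(scale * torch.randn(2, in_ch, out_ch, modes1, modes2,
                                                  dtype=torch.cfloat))

    def forward(self, x):                               # x: (B, C, H, W)
        B, C, H, W = x.shape
        x_ft = torch.fft.rfft2(x)                       # (B, C, H, W//2+1), complex
        out_ft = torch.zeros(B, self.w.shape[2], H, W // 2 + 1,
                             dtype=torch.cfloat, device=x.device)
        m1, m2 = self.modes1, self.modes2
        out_ft[:, :, :m1, :m2] = torch.einsum(          # keep low positive frequencies
            "bixy,ioxy->boxy", x_ft[:, :, :m1, :m2], self.w[0])
        out_ft[:, :, -m1:, :m2] = torch.einsum(         # keep low negative frequencies
            "bixy,ioxy->boxy", x_ft[:, :, -m1:, :m2], self.w[1])
        return torch.fft.irfft2(out_ft, s=(H, W))       # back to physical space

y = SpectralConv2d(3, 8)(torch.randn(2, 3, 64, 64))
```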
2.3.3. Dual Output Approach
In this physics-informed multi-task framework, the model jointly predicts the transient pressure ($P$) and saturation ($S$) fields governed by two-phase flow dynamics.

The composite loss function is designed to enforce both data fidelity and the underlying physical constraints through a dual-task formulation. It is defined as

$$\mathcal{L} = \mathcal{L}_{\text{data}}(S, P) + \lambda_{\text{PDE}}\, \mathcal{L}_{\text{PDE}},$$

where $\lambda_{\text{PDE}}$ is a weighting hyperparameter. The PDE residuals are expressed as

$$\mathcal{R}_S = \phi\, \frac{\partial S}{\partial t} + \nabla \cdot \big( f(S)\, \mathbf{u} \big) - q, \qquad \mathcal{R}_P = \nabla \cdot \mathbf{u} - q_t, \qquad \mathbf{u} = -K\, \lambda_t(S)\, \nabla P,$$

where $f(S)$ is the fractional-flow function and $\lambda_t(S)$ the total mobility. In this formulation, pressure gradients drive the propagation of the saturation front via the nonlinear interaction expressed by the product $\lambda_t(S)\, \nabla P$, while porosity modulates the effective velocity in the aquifer. This dual-output approach therefore not only minimizes discrepancies between the predicted and true fields but also rigorously enforces the physical laws governing two-phase flow in porous media.
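A minimal sketch of such a composite loss with finite-difference residuals on a uniform grid follows; the Corey-like fractional-flow model, the simplified advection term, and the weight `lam` are illustrative placeholders, not the exact residual implementation used in training.

```python
import torch
import torch.nn.functional as F

def pde_residual(P, S, S_prev, K, phi, dt):
    """Simplified two-phase residuals on a uniform unit grid."""
    gx = P[..., :, 1:] - P[..., :, :-1]               # face pressure differences (x)
    gy = P[..., 1:, :] - P[..., :-1, :]               # face pressure differences (y)
    Kx = 0.5 * (K[..., :, 1:] + K[..., :, :-1])       # face-averaged permeability
    Ky = 0.5 * (K[..., 1:, :] + K[..., :-1, :])
    ux, uy = -Kx * gx, -Ky * gy                       # Darcy fluxes on faces
    div_u = (ux[..., 1:-1, 1:] - ux[..., 1:-1, :-1]) \
          + (uy[..., 1:, 1:-1] - uy[..., :-1, 1:-1])  # cell-centered divergence
    f = S ** 2 / (S ** 2 + (1 - S) ** 2 + 1e-8)       # illustrative fractional flow
    mass = phi[..., 1:-1, 1:-1] * (S - S_prev)[..., 1:-1, 1:-1] / dt \
         + f[..., 1:-1, 1:-1] * div_u                 # crude advection proxy
    return div_u.pow(2).mean() + mass.pow(2).mean()   # incompressibility + mass balance

def composite_loss(S_hat, P_hat, S, P, S_prev, K, phi, dt, lam=0.1):
    data = F.smooth_l1_loss(S_hat, S) + F.smooth_l1_loss(P_hat, P)
    return data + lam * pde_residual(P_hat, S_hat, S_prev, K, phi, dt)
```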
For this coupling, handling the temporal dimension becomes essential. The temporal part of the PDE was addressed by incorporating a temporal representation into the input features, with the goal of capturing both short-term variations and long-term trends. In our work, we consider three complementary approaches:
Temporal Sinusoidal Encoding
Temporal sinusoidal encoding relies on fixed periodic functions to map a scalar time $t$ into a high-dimensional vector. At time step $t$, the encoding is defined as

$$\mathrm{PE}(t)_{2i} = \sin\!\left( \omega_i\, t \right), \qquad \mathrm{PE}(t)_{2i+1} = \cos\!\left( \omega_i\, t \right),$$

where $\omega_i = 1/10000^{2i/d}$ and $d$ is the embedding dimension. This encoding is then concatenated with the static input features $x_{\text{static}}$ (e.g., permeability, well locations) and the previous saturation $S_{t-1}$ to form the model input $x_t = \left[ x_{\text{static}},\, S_{t-1},\, \mathrm{PE}(t) \right]$. This approach, introduced in the Transformer architecture [18], remains a popular and effective means of providing temporal context.
Time2Vec
Time2Vec is a learnable temporal encoding method that transforms the scalar time $t$ into a vector composed of one linear component and several periodic components. The encoding is given by

$$\mathbf{t2v}(t)[i] = \begin{cases} \omega_i\, t + \varphi_i, & i = 0, \\ \sin\!\left( \omega_i\, t + \varphi_i \right), & 1 \le i \le k, \end{cases}$$

where $\omega_i$ and $\varphi_i$ are learnable parameters. By learning these parameters from data, Time2Vec is capable of capturing complex, nonlinear temporal patterns. This method has demonstrated advantages over fixed encodings in various time-series applications [19].
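A minimal PyTorch sketch of a Time2Vec layer following this definition is shown below; the embedding size is an illustrative choice.

```python
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    """Time2Vec: one learnable linear term plus (k-1) learnable periodic terms,
    t2v(t)[0] = w0*t + b0 and t2v(t)[i] = sin(wi*t + bi) for i >= 1."""
    def __init__(self, k=64):
        super().__init__()
        self.w = nn.Parameter(torch.randn(k))
        self.b = nn.Parameter(torch.randn(k))

    def forward(self, t):                          # t: (B, 1)
        v = self.w * t + self.b                    # broadcast to (B, k)
        return torch.cat([v[:, :1], torch.sin(v[:, 1:])], dim=-1)

emb = Time2Vec()(torch.rand(8, 1))                 # (8, 64)
```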
2.4. Training and Optimization
All models were implemented in PyTorch and trained on a single NVIDIA GPU. Inputs were standardized per channel using statistics computed once on the training set; saturation and pressure targets were standardized separately. Losses were computed in standardized space, and predictions were de-standardized for reporting. The network uses per-channel normalization in the spectral trunk and BatchNorm in the convolutional decoder; dropout is applied only in the pressure decoder. A learned 64-dimensional temporal/positional embedding of the scalar time step is concatenated to the static inputs and broadcast over the spatial grid. No geometric augmentation was used; to emphasize early transients, we oversampled early steps by a factor of 2.
Optimization used AdamW; the initial learning rate and weight decay are listed in Table 5. A cosine-annealing schedule with warm restarts (CosineAnnealingWarmRestarts) was stepped per optimizer update and not reset across curriculum phases. The mini-batch size was 64, with global-norm gradient clipping at 2. The data term was a Smooth-$L_1$ (Huber) loss, applied to saturation and pressure with equal weights. A physics-residual term (conservation and Darcy consistency) was weighted by a schedule that depends on the global epoch index $e$; a total-variation penalty on the pressure head was available but set to zero in the main runs.
Training followed a staged curriculum over the simulation horizon. In the first phase, the network was trained only on the earliest portion of each trajectory (up to time index 25), and in subsequent phases the maximum time index was progressively increased to 100, 500, and finally the full trajectory, using 50, 50, 50, and 500 epochs per phase, respectively. This schedule forces the model to first capture early-time transients and sharp displacement fronts before fitting the later-time, quasi-steady-state behavior over the entire horizon. A concise summary of the training and configuration settings is provided in Table 5.
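The sketch below shows how these pieces (AdamW, a warm-restart cosine schedule stepped per update, the Huber data loss, gradient clipping, and the staged curriculum) fit together; all numeric hyperparameters here are placeholders (the values actually used are summarized in Table 5), and the one-layer model and random batches stand in for DB-AFNO and the data loader.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = nn.Conv2d(6, 2, 1)                                 # stand-in for DB-AFNO
# Placeholder hyperparameters; the values used in our runs are listed in Table 5.
opt = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
sched = CosineAnnealingWarmRestarts(opt, T_0=50, T_mult=2, eta_min=1e-6)
huber = nn.SmoothL1Loss()

phases = [(25, 50), (100, 50), (500, 50), (None, 500)]     # (max time index, epochs)
step = 0
for max_t, epochs in phases:                               # staged curriculum
    for epoch in range(epochs):
        # Random tensors stand in for batches truncated at time index max_t.
        x, y = torch.randn(64, 6, 32, 32), torch.randn(64, 2, 32, 32)
        opt.zero_grad()
        loss = huber(model(x), y)                          # + physics residual term
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 2.0)
        opt.step()
        sched.step(step); step += 1                        # per update, never reset
```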
4. Discussion
Across the 2-D benchmarks, DB-AFNO consistently outperforms a purely data-driven FNO. The error distributions show that the Time2Vec- and TFT/TST-encoded variants concentrate early-time errors in the lowest error bands, whereas Positional Encoding and the conventional FNO exhibit heavier tails (Figure 6). Near pseudo-steady state, Time2Vec and TFT/TST remain closely matched and continue to dominate Positional Encoding. Visual comparisons corroborate these statistics: the baseline FNO produces over-smoothed fields with disconnected saturation artifacts, while DB-AFNO retains sharp, physically plausible fronts and spatial connectivity (Figure 7). These gains arise from (i) the explicit coupling of pressure and saturation through dual branches, (ii) temporal conditioning that disambiguates early transients, and (iii) physics-informed losses that penalize mass-balance and Darcy residuals.
For zero-shot super-resolution, a train-low, infer-high regimen enables coarse-to-fine mapping without retraining. A single coarse/fine resolution pair is sufficient to learn mesh-agnostic upscaling, aided by a Cartesian coordinate-grid embedding for absolute spatial context and an incremental multi-resolution curriculum that increases the retained Fourier modes and spatial resolution only after validation plateaus. This curriculum halves wall-clock training time and reduces GPU memory by ∼40% relative to single-shot high-resolution training, while the deployed pipeline (coarse TPFA + one forward pass) achieves ∼3× speedup over full-order fine-grid simulation (Table 6). On five unseen layers, accuracy remains high, with more than 90% of voxels exhibiting small absolute errors. Three-dimensional renderings confirm preservation of water-flood fronts, channel continuity, and high-saturation ridges under the volumetric up-resolution considered (Figure 8), and layer-wise scatter plots indicate tight agreement at the final time step, with similar behavior at other times (Figure 9).
A key technical limitation of the zero-shot path is its dependence on the fidelity of the coarse solution: the mapping presumes that the coarse grid resolves the dominant transport features. When the input was degraded to a coarser resolution, the same accuracy was not recovered, indicating a lower bound on the coarse-grid resolution required for reliable upscale inference. Minor discrepancies in the validation figures are consistent with numerical smoothing choices rather than architectural deficiencies.
Overall, the results indicate that (i) enforcing coupled physics with dual-branch decoding and temporal encodings improves early-time accuracy and suppresses over-smoothing; (ii) the operator generalizes to unseen layers within SPE10 when conditioned on physically meaningful inputs; and (iii) zero-shot super-resolution provides accurate fine-grid fields at a fraction of the computational cost, provided the coarse simulation captures the essential flow physics.
The DB-AFNO surrogates presented here are trained entirely on incompressible, immiscible two-phase displacements, and thus are most reliable for formations and operating conditions with similar heterogeneity statistics, viscosity ratios, and flow regimes; all results in this study pertain to such systems with no explicit gas phase or compositional effects. When the underlying physics changes significantly—e.g., strong capillary hysteresis, gas evolution or gas caps, or fracture-dominated flow—retraining or transfer learning on simulations from the new regime would be required. While the dual-branch architecture is, in principle, extensible to three-phase or compositional flow (e.g., via an additional output branch and corresponding physics residuals), this lies beyond the scope of the present work and is left for future investigation.