Article

An Energy Minimization-Based Deep Learning Approach with Enhanced Stability for the Allen-Cahn Equation

1 College of General Education, Guangdong University of Science and Technology, Dongguan 523668, China
2 School of Advanced Manufacturing, Guangdong University of Technology, Jieyang 515200, China
* Authors to whom correspondence should be addressed.
Axioms 2025, 14(11), 806; https://doi.org/10.3390/axioms14110806
Submission received: 30 September 2025 / Revised: 24 October 2025 / Accepted: 28 October 2025 / Published: 30 October 2025

Abstract

The Allen-Cahn equation is a fundamental model in materials science for describing phase separation phenomena. This paper introduces an Energy-Stabilized Scaled Deep Neural Network (ES-ScaDNN) framework to solve the Allen-Cahn equation by energy minimization. Unlike traditional numerical methods, our approach directly approximates the steady-state solution of the Allen-Cahn equation by minimizing the associated energy functional using a deep neural network. ES-ScaDNN incorporates two key innovations. The first is a scaling layer designed to map the network output to the physical range of the Allen-Cahn phase variable. The second is a variance-based regularization term designed to promote clear phase separation. We demonstrate the accuracy and efficiency of ES-ScaDNN through comprehensive numerical experiments in both one and two dimensions. Our results show that ReLU activation functions are particularly well-suited for one-dimensional cases, while tanh functions are more suitable for two-dimensional problems due to their superior ability to maintain solution smoothness. Furthermore, we investigate how training epochs and the interface parameter ε influence the behavior of the solution. ES-ScaDNN provides a novel, accurate, and efficient deep learning framework for solving the Allen-Cahn equation, paving the way for tackling more complex phase-field problems.

1. Introduction

Partial differential equations (PDEs) play a fundamental role in describing complex phenomena across natural sciences and engineering. The Allen-Cahn (AC) equation, first proposed by Allen and Cahn in 1979 [1], is widely applied in materials science to describe phase separation processes and interface dynamics. This equation, associated with a double-well potential, drives the system toward one of two stable states depending on initial conditions, which is crucial for simulating phase transitions and understanding pattern formation in complex materials [2,3,4,5]. To provide a visual illustration of this process, Figure 1 shows typical phase separation dynamics in both one- and two-dimensional domains, where an initial state evolves to a stable, phase-separated steady state.
The classical approach to solving the Allen-Cahn equation involves transforming the energy minimization problem into a time-dependent partial differential equation (PDE) through gradient flow, which allows various numerical techniques to be applied to approximate the solution effectively. Various time discretization schemes (implicit, semi-implicit, and explicit) are employed to solve the ordinary differential equations obtained after spatial discretization [6,7,8]. The spatial discretization typically utilizes finite difference, finite element, or spectral methods [9,10,11]. However, the nonlinearity of the Allen-Cahn equation and sharp interface transitions pose major challenges for traditional numerical methods, particularly in capturing long-term behavior and ensuring solution stability in higher dimensions.
Recent advancements have focused on developing energy-stable numerical schemes that preserve critical properties such as the maximum bound principle and energy dissipation law. For instance, Chen et al. proposed a fourth-order structure-preserving method that conserves mass and adheres to the maximum principle under reasonable time step-size restrictions [12]. Li et al. utilized an adaptive discontinuous finite volume element method to enhance accuracy and stability in simulations of the Allen-Cahn equation [13]. Doan et al. presented fully discrete error estimates for first-order low-regularity integrators [14]. A linear doubly stabilized Crank-Nicolson scheme has been developed to improve stability [15]. Additionally, efforts are being made to approximate the stochastic Allen-Cahn equation that incorporates random diffusion coefficients and multiplicative noise [16]. While traditional numerical methods provide a foundation for solving the Allen-Cahn equation, their inherent limitations in handling complex scenarios have motivated the exploration of novel approaches.
Neural networks, particularly Physics-Informed Neural Networks (PINNs), have emerged as a promising alternative due to their ability to handle nonlinear problems and approximate complex functions. PINNs, which integrate governing physical laws directly into their loss functions, have achieved considerable success in scientific computing [17]. These networks have shown significant potential in approximating solutions to the Allen-Cahn (AC) equation in multi-phase systems. Zhao et al. applied PINNs to solve the Allen-Cahn equation. Recent advancements include constrained self-adaptive architectures and adversarial training strategies [18,19]. Despite these successes, the introduction of time sampling points in PINNs brings several challenges. The discretization in the time domain often leads to an increased number of data points, and this added complexity can slow down the training process. Moreover, for steady-state or long-term equilibrium problems, the time-dependent formulation can be inefficient and may struggle to accurately capture the final equilibrium state.
An alternative approach, which bypasses the time dimension entirely, is to reformulate the problem as an energy minimization problem. Energy minimization formulations are particularly suitable for problems that possess a natural minimization principle, such as many problems encountered in solid mechanics [20], phase-field models [21], and ground state problems [22]. Poudel et al. recently proposed the Functional Optimization using Neural Networks (FONN) method, which directly minimizes the system’s energy through neural networks processing information at discrete grid points [23]. Their progressive energy reduction technique demonstrates the potential of neural networks in handling energy minimization problems efficiently. Liu et al. solved the linear elasticity equations with a neural network based on an energy minimization formulation [24]. Bao et al. proposed a deep neural network to compute the ground states of Bose-Einstein condensation via the minimization of the energy functional [25]. However, to the best of our knowledge, energy minimization principles have not been directly employed within a deep learning framework specifically for solving the Allen-Cahn equation’s steady state or quasi-steady states, which is the focus of this work.
The Allen-Cahn equation can be viewed as a gradient flow of an energy functional. In this paper, we shall propose a deep neural network framework, which we call Energy-Stabilized Scaled Deep Neural Network (ES-ScaDNN), for solving the Allen-Cahn equation through energy minimization. The “enhanced stability” highlighted in our title is achieved through several key innovations. Firstly, by directly targeting the energy minimization problem, our approach circumvents time discretization, thus inherently avoiding the stability constraints associated with traditional time-dependent solvers. Secondly, we introduce a specialized scaling layer that enforces the physical [−1, 1] bounds on the network output, which prevents numerical divergence and enhances training stability. Finally, a novel variance-based regularization term is incorporated into our composite loss function to stabilize the optimization, actively promoting phase separation and preventing convergence to non-physical, homogeneous states. This robust framework, which includes a pretraining mechanism for the 1D case, demonstrates good scalability and is successfully applied to both one-dimensional and two-dimensional problems while maintaining high accuracy and computational efficiency.
The rest of the paper is organized as follows. In Section 2, we present the mathematical framework of the Allen-Cahn equation and derive its energy functional formulation. Section 3 introduces our proposed Energy-Stabilized Scaled Deep Neural Network (ES-ScaDNN) method, including the network architecture, scaling layer, and loss function design. Section 4 investigates the ES-ScaDNN for solving the one-dimensional Allen-Cahn equation, with a detailed analysis of activation functions, solution accuracy, and parameter effects. Section 5 extends the method to the two-dimensional case and presents comprehensive performance evaluations. Finally, conclusions are drawn in Section 6.

2. Problem Statement

In this section, we provide a detailed explanation of the mathematical framework for the Allen-Cahn equation. First, we introduce the Allen-Cahn equation along with its boundary conditions. Next, we derive its corresponding energy minimization formulation. Finally, we explain the design of the loss function employed in our model.
The Allen-Cahn equation is a partial differential equation (PDE) used to model the phase separation process in multicomponent systems, such as anti-phase domain coarsening in alloys, crystal growth dynamics, and other phenomena involving interface motion [1,16]. Its expression is given as:
$$\frac{\partial u(\mathbf{x},t)}{\partial t} = \epsilon^{2}\,\Delta u(\mathbf{x},t) - f\big(u(\mathbf{x},t)\big), \tag{1}$$
where u(x, t) represents the phase variable, and ϵ > 0 is the interface parameter that controls the interface width. The term f(u) = u³ − u is a nonlinear term governing the phase transition process, with two stable states at u = −1 and u = 1. In this equation, the diffusion term ϵ²Δu smooths the scalar field, while the nonlinear term f(u) drives the system toward the two stable equilibrium states. The parameter ϵ plays a critical role in controlling the thickness of the interface between the two phases: the smaller the value of ϵ, the sharper the interface becomes.
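To make the role of ϵ concrete, a classical worked example (added here only as an illustration, not part of the original derivation) is the one-dimensional equilibrium interface profile on the whole real line, where boundary effects can be ignored. The profile
$$u(x) = \tanh\!\left(\frac{x - x_{0}}{\sqrt{2}\,\epsilon}\right)$$
satisfies the steady equation ϵ²u″ = u³ − u exactly, since ϵ²u″ = −tanh(y)\,sech²(y) = u³ − u with y = (x − x₀)/(√2 ϵ), and its transition from −1 to +1 occurs over a layer whose width is proportional to ϵ.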
It is crucial to carefully select appropriate boundary conditions to comprehensively characterize the evolution behavior of the system. In this study, we adopt Neumann boundary conditions,
$$\frac{\partial u(\mathbf{x},t)}{\partial \mathbf{n}} = 0 \quad \text{on } \partial\Omega \times (0, T].$$
This means that the rate of change of the phase variable at the boundary is zero, indicating that the interface normally does not cross the boundary. This condition is reasonable for many physical systems, especially when the phase separation process occurs within a closed system. Employing Neumann boundary conditions not only simplifies the problem but also ensures physical consistency at the boundaries. Combined with the energy minimization approach, our model can effectively simulate the evolution of interfaces during the phase separation process.
The computational domains Ω considered in this study are Ω = [−1, 1] for the 1D case and Ω = [0, 1] × [0, 1] for the 2D case. The system’s evolution starts from a specified initial condition, formally defined as u(x, 0) = u₀(x).
Next, we derive the Allen-Cahn equation from the perspective of an energy functional, which provides an important theoretical foundation for constructing our model. The Allen-Cahn equation can be viewed as the L 2 gradient flow of the total free energy functional, meaning that the evolution of the scalar field u progresses in the direction of minimizing the total free energy. The corresponding energy functional is defined as,
$$E[u] = \int_{\Omega} \left[ \frac{\epsilon^{2}}{2}\,|\nabla u|^{2} + F(u) \right] \mathrm{d}\mathbf{x},$$
where F(u) = (1/4)(u² − 1)² is a double-well potential that penalizes deviations from the stable phases u = ±1, while the gradient term (ϵ²/2)|∇u|² represents the interfacial energy.
To further understand the connection between the energy functional and the Allen-Cahn equation, we derive the Allen-Cahn equation starting from the L² gradient flow of the energy functional E[u]. The gradient flow of the energy functional is expressed as:
$$\frac{\partial u}{\partial t} = -\frac{\delta E}{\delta u}, \tag{2}$$
where δE/δu represents the variational derivative of the energy functional E[u] with respect to u. By introducing a small perturbation δu to u, i.e., replacing u with u + δu, and expanding the energy functional E[u + δu] to the first-order term in δu, we have:
$$E[u + \delta u] = \int_{\Omega} \left[ \frac{\epsilon^{2}}{2}\,\big|\nabla (u + \delta u)\big|^{2} + F(u + \delta u) \right] \mathrm{d}\mathbf{x},$$
expanding this expression, we obtain:
$$E[u + \delta u] = E[u] + \int_{\Omega} \left[ \epsilon^{2}\,\nabla u \cdot \nabla(\delta u) + F'(u)\,\delta u \right] \mathrm{d}\mathbf{x} + O\!\left(\|\delta u\|^{2}\right),$$
using integration by parts and assuming ∂u/∂n = 0 on the boundary ∂Ω, we simplify the variation as:
$$E[u + \delta u] - E[u] = \int_{\Omega} \left[ -\epsilon^{2}\,\Delta u + F'(u) \right] \delta u \,\mathrm{d}\mathbf{x},$$
thus, the functional derivative δ E / δ u is given by:
$$\frac{\delta E}{\delta u} = -\epsilon^{2}\,\Delta u + F'(u). \tag{3}$$
Substituting Equation (3) into Equation (2), we obtain the Allen-Cahn Equation (1). It is remarked that the steady-state condition δE/δu = 0 is the Euler-Lagrange equation corresponding to the minimization of the energy functional E[u], whose minimizer can be expressed as:
$$U_{g} = \arg\min_{u} E[u].$$
This highlights the variational structure of the Allen-Cahn equation, where the solution evolves as a gradient flow to minimize the total energy. Consequently, solving the steady-state problem of the Allen-Cahn equation (i.e., setting ∂u/∂t = 0 in Equation (1)) can be transformed into seeking the minimum value of the energy functional, namely U_g.
In this study, we aim to address the efficient solution of the Allen-Cahn (AC) equation using a deep learning-based approach. A critical step in applying deep learning algorithms is the appropriate design of the loss function. Given the variational nature of the AC equation, it is natural and theoretically sound to consider the energy functional E u as the principal component of the loss function. This approach not only ensures that the deep learning model adheres to the physical principles embedded in the equation but also facilitates the enforcement of physical constraints, such as energy decay and boundary conditions, during the training process. Recent studies have demonstrated that incorporating the energy function into the loss function allows for a direct connection between the solution approximation and the underlying physical laws, thereby enhancing the accuracy and stability of the numerical solutions [23]. Accordingly, in our framework, the energy function serves as the foundation for defining the loss function, ensuring compliance with the inherent physics of the AC equation while leveraging the flexibility and efficiency of deep learning techniques.

3. Energy Minimization Neural Network Method

Neural networks have been extensively studied and applied over the past decades. By incorporating multiple hidden layers, deep neural networks (DNNs) emulate the hierarchical learning process of the human brain, enabling them to capture complex and nonlinear relationships from large datasets. This capability has led to significant achievements in various areas, including medical image processing and fault diagnosis [26,27].
In any deep learning task, four fundamental components must be considered:
  • The data preparation, which provides the foundation for the model’s learning process.
  • The model architecture, which defines how the input data is transformed or represented.
  • The loss function (or performance metric), which quantifies the model’s effectiveness by evaluating its performance and identifying areas of strength or limitation.
  • The optimization algorithm, which iteratively adjusts the model’s parameters to minimize the objective function and enhance overall performance.

3.1. Data Preparation

In an n-dimensional domain, the input layer of the network consists of n neurons corresponding to the spatial coordinates. Specifically, let x = (x₁, x₂, …, xₙ) ∈ Ω represent the spatial coordinates within the domain Ω, obtained through uniform sampling. These coordinates serve as the input features of the neural network. The network predicts the solution u_N(x), which approximates the solution of the Allen-Cahn equation at the points x.
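A minimal sketch of this sampling step is shown below. The use of PyTorch, the function names, and the grid sizes are our own illustrative assumptions; the paper does not specify an implementation.

```python
import torch

def uniform_grid_1d(a: float = -1.0, b: float = 1.0, n: int = 256) -> torch.Tensor:
    """Uniformly sampled 1D coordinates in [a, b], shaped (n, 1) as network input."""
    return torch.linspace(a, b, n).reshape(-1, 1)

def uniform_grid_2d(n: int = 64) -> torch.Tensor:
    """Uniform tensor-product grid on [0, 1]^2, shaped (n * n, 2) as network input."""
    s = torch.linspace(0.0, 1.0, n)
    xx, yy = torch.meshgrid(s, s, indexing="ij")
    return torch.stack([xx.reshape(-1), yy.reshape(-1)], dim=1)

x_1d = uniform_grid_1d()        # inputs for the 1D test case, shape (256, 1)
xy_2d = uniform_grid_2d(n=64)   # inputs for the 2D test case, shape (4096, 2)
```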

3.2. Model Architecture

Based on fully connected neural networks (FCNNs), this paper proposes an improved network architecture called Energy-Stabilized Scaled Deep Neural Network (ES-ScaDNN). A conventional feedforward DNN transforms an input vector through a series of nonlinear operations to produce an output. Formally, given an input vector y₀ ∈ ℝ^{n₀} with n₀ ∈ ℕ⁺, a DNN with L hidden layers defines a function:
$$f_{\theta}(\mathbf{y}_{0}) = F_{L+1} \circ \sigma \circ F_{L} \circ \sigma \circ \cdots \circ \sigma \circ F_{1}(\mathbf{y}_{0}), \qquad \theta = \{(W_{l}, b_{l}) : 1 \le l \le L+1\},$$
where each F_l : ℝ^{n_{l−1}} → ℝ^{n_l} is an affine transformation defined by:
$$F_{l}(\mathbf{y}_{l-1}) = W_{l}\,\mathbf{y}_{l-1} + b_{l}, \qquad \mathbf{y}_{l-1} \in \mathbb{R}^{n_{l-1}}, \quad 1 \le l \le L+1,$$
with weight matrices W_l ∈ ℝ^{n_l × n_{l−1}}, bias vectors b_l ∈ ℝ^{n_l}, and n_l ∈ ℕ⁺. The final layer F_{L+1} serves as the output layer, while F_l for 1 ≤ l ≤ L are the hidden layers, each containing n_l neurons. For simplicity, we focus on fully connected neural networks (FCNNs) and assume that all hidden layers have the same number of neurons, i.e., W = n_1 = n_2 = ⋯ = n_L. We refer to the total number of hidden layers L as the “depth,” and the number of neurons W in each hidden layer as the “width.”
The activation function σ : R R , applied element-wise after each affine transformation, plays a crucial role in introducing nonlinearity into the network. The selection of σ in our study is based on a comprehensive consideration of the model’s performance and the characteristics of the Allen-Cahn equation. We explore the effectiveness of both hyperbolic tangent (tanh) and Rectified Linear Unit (ReLU) activation functions:
  • Hyperbolic Tangent Activation Function (tanh): The tanh function outputs values in the range [−1,1], enabling it to capture both positive and negative values, which is crucial for modeling phase transitions in the Allen-Cahn equation. It performs well in capturing nonlinear behaviors and maintaining the physical consistency of the solution. Moreover, the tanh function helps mitigate the vanishing gradient problem in deep networks [28]. The mathematical expression of the tanh function is:
    $$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}.$$
  • Rectified Linear Unit (ReLU) Activation Function: The ReLU function, defined as ReLU(z) = max(0, z), has a simple non-saturating form that effectively prevents the vanishing gradient problem and accelerates model convergence. It is especially suitable for shallower network structures [29].
In subsequent sections, we will discuss in detail the impact of activation function choices on the model’s performance in solving one-dimensional and two-dimensional problems.
To ensure effective learning, it is essential to initialize the weights properly and choose an appropriate activation function. We initialize the weights of each layer using Xavier initialization [30], which scales the weights to prevent the vanishing or exploding gradient problem during training. Mathematically, the weights W l of each layer are initialized as:
$$W_{l} \sim \mathcal{N}\!\left(0,\; \frac{2}{\mathrm{fan}_{\mathrm{in}} + \mathrm{fan}_{\mathrm{out}}}\right),$$
where fan_in and fan_out refer to the number of input and output units in the layer, respectively. The biases b_l are initialized to zero.
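The backbone described above can be sketched as follows; the PyTorch implementation, the class name, and the default width/depth (taken from the 1D settings in Section 3.6) are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class FCNN(nn.Module):
    """Fully connected backbone: `depth` hidden layers of `width` neurons each."""

    def __init__(self, in_dim: int = 1, width: int = 70, depth: int = 3,
                 activation: str = "relu"):
        super().__init__()
        act = nn.ReLU if activation == "relu" else nn.Tanh
        layers, n_prev = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(n_prev, width), act()]
            n_prev = width
        layers.append(nn.Linear(n_prev, 1))      # scalar output u(x)
        self.net = nn.Sequential(*layers)
        self.apply(self._init_weights)

    @staticmethod
    def _init_weights(module):
        # Xavier (Glorot) initialization with zero biases, as described above
        if isinstance(module, nn.Linear):
            nn.init.xavier_normal_(module.weight)
            nn.init.zeros_(module.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```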

3.3. Scaling Layer

A critical aspect of the Allen-Cahn phase variable u is that its physically meaningful values lie within a specific range, typically [−1, 1], representing the two stable phases. To ensure the network output aligns with this characteristic during training, we introduce a global min-max scaling layer applied after the final linear transformation of the network. Let U(x) denote the raw output of the final linear layer for all sample points in the domain (or the current training batch). The scaling layer computes the scaled output U_scaled(x) using the following transformation:
$$U_{\mathrm{scaled}}(\mathbf{x}) = \frac{2\,\big(U(\mathbf{x}) - \min U\big)}{\max U - \min U} - 1,$$
where max U and min U are the maximum and minimum predicted values of U(x) over the entire domain, respectively. This operation maps the entire set of network outputs to the range [−1, 1]. While this is a global operation, it serves a crucial numerical purpose during optimization. It guarantees that the values fed into the energy functional calculation (specifically the F(u) = (1/4)(u² − 1)² term) span the full [−1, 1] range where the double-well potential is defined. This explicit range enforcement across the domain can help prevent the network from converging to trivial solutions where the output collapses to a very narrow range (e.g., near zero), potentially stabilizing the energy minimization process and ensuring the network actively explores solutions within the physically relevant bounds. The overall neural network structure incorporating this layer is illustrated in Figure 2.
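A possible realization of this scaling layer, together with a thin wrapper that places it after the backbone, is sketched below; the small constant added to the denominator is our own safeguard against division by zero and is not mentioned in the text.

```python
import torch
import torch.nn as nn

def scale_to_phase_range(u_raw: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Global min-max scaling of the raw network output to [-1, 1].

    The minimum and maximum are taken over all sample points of the current
    batch/domain, so this is a global (not per-point) operation.
    """
    u_min, u_max = u_raw.min(), u_raw.max()
    return 2.0 * (u_raw - u_min) / (u_max - u_min + eps) - 1.0

class ESScaDNN(nn.Module):
    """Backbone network followed by the global scaling layer."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return scale_to_phase_range(self.backbone(x))
```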

3.4. Loss Function Design

The loss function plays a crucial role in guiding the training of the neural network to find the solution that minimizes the energy functional of the Allen-Cahn equation. The primary components of the loss function include energy loss, boundary loss, and variance-based regularization term.
  • Energy Loss: The energy functional associated with the Allen-Cahn equation serves as the main part of the loss function. By minimizing this energy, the network seeks solutions that are physically meaningful and stable. The energy loss term L_E is defined as:
$$L_{E} = \frac{1}{N} \sum_{i=1}^{N} \left[ \frac{\epsilon^{2}}{2}\,\big|\nabla u_{p}(\mathbf{x}_{i})\big|^{2} + \frac{1}{4}\big(u_{p}(\mathbf{x}_{i})^{2} - 1\big)^{2} \right] \Delta V,$$
where N is the number of sampling points, u_p is the predicted solution, ∇u_p represents the gradient of the predicted solution, and ΔV is the volume element. This formulation corresponds to the discretized version of the Allen-Cahn energy functional.
  • Boundary Loss: In the training of neural networks for solving partial differential equations (PDEs), satisfying boundary conditions is essential. We add a boundary loss term L_B to the loss function to ensure that the network’s predicted solution satisfies the boundary conditions. Assuming Neumann boundary conditions (∂u/∂n = 0), we define the boundary loss by minimizing the discrepancy in the gradients between the predicted values and the true boundary conditions:
$$L_{B} = \frac{1}{N_{b}} \sum_{i=1}^{N_{b}} \left( \frac{\partial u_{p}}{\partial \mathbf{n}}\big(\mathbf{x}_{b}^{(i)}\big) \right)^{2},$$
where N_b is the number of sampling points on the boundary, and ∂u_p/∂n(x_b^(i)) is the predicted normal derivative at the boundary point x_b^(i). By minimizing L_B, we ensure that the network’s solution satisfies the Neumann boundary conditions.
  • Variance-based regularization term: The Allen-Cahn equation naturally leads to phase separation, where the solution u(x) predominantly takes values near the potential minima (e.g., −1 and 1), resulting in significant spatial variance. A common failure mode in numerical optimization, especially with complex models like neural networks, is convergence to a non-physical, homogeneous state (e.g., u ≈ 0 everywhere). Such states correspond to low spatial variance and represent undesired local minima or saddle points in the optimization landscape. To counteract this tendency and actively promote phase separation, we introduce a threshold variance-based regularization term. This term penalizes the solution only if its spatial variance drops below a predefined minimum threshold σ²_min, which represents the minimal level of variance expected for a phase-separated state. The loss term L_V is defined as:
$$L_{V} = \Big( \max\big(0,\; \sigma_{\min}^{2} - \mathrm{Var}(U_{p})\big) \Big)^{2},$$
where Var(U_p) is the variance of the scaled network output U_p = U_scaled computed over the sampling points within the domain (or batch), and σ²_min is a small positive hyperparameter.
Combining the above components, we formulate a comprehensive loss function. The overall training objective is to minimize the following weighted loss function:
$$\mathrm{Loss} = L_{E} + \lambda_{B} L_{B} + \lambda_{V} L_{V},$$
where L_E is the energy loss, L_B is the boundary loss, L_V is the threshold variance loss, and λ_B, λ_V are positive weighting coefficients. These coefficients balance the contributions of boundary enforcement and phase separation promotion against the primary energy minimization objective.
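For the one-dimensional problem, the three terms can be assembled as in the sketch below, where the gradient of the prediction is obtained by automatic differentiation. The function name, the use of the two grid end points as boundary samples, and the default hyperparameter values (taken from Section 3.6) are illustrative assumptions rather than the authors’ exact implementation.

```python
import torch

def composite_loss_1d(model, x, eps=0.01, lam_b=1.0, lam_v=1.5, sigma_min_sq=1.0):
    """Energy + Neumann boundary + threshold variance loss on a uniform 1D grid."""
    dv = float(x[1, 0] - x[0, 0])                 # uniform spacing = volume element
    xg = x.clone().requires_grad_(True)           # sample points, shape (N, 1)
    u = model(xg)                                 # scaled prediction in [-1, 1]
    du = torch.autograd.grad(u.sum(), xg, create_graph=True)[0]

    # Discretized Allen-Cahn energy (L_E)
    density = 0.5 * eps**2 * du.pow(2) + 0.25 * (u.pow(2) - 1.0).pow(2)
    loss_e = (density * dv).mean()

    # Neumann boundary loss (L_B): du/dx = 0, averaged over the N_b = 2 end points
    loss_b = 0.5 * (du[0].pow(2).sum() + du[-1].pow(2).sum())

    # Threshold variance regularization (L_V) against homogeneous states
    loss_v = torch.clamp(sigma_min_sq - u.var(), min=0.0).pow(2)

    return loss_e + lam_b * loss_b + lam_v * loss_v
```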

3.5. Optimization Algorithm

We employ the Adam optimizer for training the neural network. The Adam optimizer is advantageous for optimization problems with noisy gradients and has been shown to perform well in training deep neural networks for PDEs [30,31]. By computing adaptive learning rates from estimates of the first and second moments of the gradients, it accelerates convergence and improves the stability of the training process, which is beneficial for minimizing the energy functional associated with the Allen-Cahn equation and for navigating its complex energy landscape.

3.6. Implementation Details and Hyperparameter Selection

The neural network architecture and hyperparameters were chosen to balance model capacity, training stability, and computational cost. For the 1D experiments, we used a fully connected neural network with 3 hidden layers and 70 neurons per layer. For the 2D experiments, the network consisted of 5 hidden layers with 100 neurons each. These architectures were found to provide sufficient capacity to capture the solution’s complexity without excessive overfitting.
The network was trained using the Adam optimizer with a fixed learning rate of 1 × 10⁻³, which was empirically found to provide stable and efficient convergence. To optimize computational efficiency and avoid unnecessary calculations after convergence, we implemented an early stopping strategy. The training was set to terminate when the change in the loss function value over a set number of epochs fell below a threshold of 1 × 10⁻⁶. The minimum variance threshold σ²_min was set to 1.0, a value determined from preliminary tests to be effective in preventing convergence to homogeneous states while not overly constraining the optimization process.
The weighting coefficients for the boundary and variance losses in the composite loss function were set to λ_B = 1 and λ_V = 1.5, respectively. The boundary loss weight λ_B ensures that the Neumann boundary conditions are strongly enforced, while the variance loss weight λ_V serves as a soft regularization to encourage phase separation without overpowering the primary energy minimization objective. These values were determined empirically to yield the best performance across our test cases.
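A minimal training loop reflecting these settings might look as follows; it reuses the loss sketch from Section 3.4, and the length of the early-stopping window (`patience`) is an assumption, since the paper only states the 1 × 10⁻⁶ threshold.

```python
import torch

def train(model, x, eps=0.01, max_epochs=5000, lr=1e-3, tol=1e-6, patience=50):
    """Adam training with early stopping once the loss change stays below `tol`."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss, stable_epochs = float("inf"), 0

    for epoch in range(max_epochs):
        opt.zero_grad()
        loss = composite_loss_1d(model, x, eps=eps)   # loss sketched in Section 3.4
        loss.backward()
        opt.step()

        # Early stopping: loss change below tol for `patience` consecutive epochs
        if abs(prev_loss - loss.item()) < tol:
            stable_epochs += 1
            if stable_epochs >= patience:
                break
        else:
            stable_epochs = 0
        prev_loss = loss.item()
    return model
```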

4. Test Case 1: One-Dimensional Allen-Cahn Equation

Although deep neural networks are ultimately designed to address complex high-dimensional problems, this study adopts a step-by-step approach by initially focusing on the one-dimensional case. This strategy facilitates a more thorough understanding of the core concepts and operational mechanisms of ES-ScaDNN, laying a solid foundation for future extensions to higher-dimensional problems. For the one-dimensional case, we consider the spatial domain Ω = [−1, 1]. A uniform sampling method is employed to ensure sufficient spatial information is captured across the entire computational domain. This approach helps the neural network approximate the global characteristics of the solution more accurately, avoiding issues of under-sampling or over-sampling in certain regions, thereby improving the overall stability and accuracy of the solution.

4.1. Initial Condition Pretraining

To investigate the influence of initial conditions on the solution, we conduct experiments with three different initial conditions: u(x, 0) = cos(πx), u(x, 0) = sin(πx), and u(x, 0) = −sin(πx). These initial conditions represent distinct initial states of the system, enabling us to observe the evolution of the phase separation process. Prior to formal training, we introduce a pretraining phase designed to allow the neural network to approximate the initial condition u_0(x). During pretraining, the loss function is defined as the mean square error (MSE) between the predicted values u_p(x) and the true initial condition u_0(x):
$$L_{\mathrm{pre}} = \frac{1}{N} \sum_{i=1}^{N} \big( u_{p}(x_{i}) - u_{0}(x_{i}) \big)^{2},$$
where N denotes the number of training sample points, and x_i represents the positions of the sample points. By minimizing L_pre, the network learns the initial state, which provides a solid foundation for the subsequent minimization of the energy functional.
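The pretraining step amounts to a plain MSE regression onto u_0(x), e.g., as sketched below (an illustration using the sampling and model conventions from the earlier sketches).

```python
import math
import torch

def pretrain(model, x, u0_fn, epochs=2000, lr=1e-3):
    """Fit the network to the initial condition u0 before energy minimization."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    target = u0_fn(x)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((model(x) - target) ** 2)   # L_pre in the text
        loss.backward()
        opt.step()
    return model

# The three initial conditions used in the 1D experiments
u0_cos = lambda x: torch.cos(math.pi * x)
u0_sin = lambda x: torch.sin(math.pi * x)
u0_neg_sin = lambda x: -torch.sin(math.pi * x)
```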

4.2. Choice of Activation Functions in the One-Dimensional Case

In solving the one-dimensional problem, we conducted comparative experiments using two commonly employed activation functions: ReLU and tanh. The results are illustrated in Figure 3.
By comparing the loss curves, we observed that the ReLU activation function demonstrates faster convergence in the one-dimensional case. For instance, the loss for the ReLU activation function drops below 0.02 in approximately 50 epochs, whereas the tanh activation function requires approximately 100 epochs to reach the same level, as shown in the inset. Unlike tanh, the non-saturating nature of ReLU helps mitigate the vanishing gradient problem, especially in relatively shallow network architectures, enabling a more efficient approximation of the solution’s local features. Specifically, while tanh excels at capturing nonlinear behaviors, its limited output range can result in slower convergence during the initial stages of training in one-dimensional problems.
Therefore, for the one-dimensional case, the ReLU activation function outperforms tanh, particularly in terms of quickly approximating the initial solution and improving training efficiency. As a result, subsequent experiments for the one-dimensional scenario will use ReLU as the activation function. However, it is worth noting that tanh is better suited for scenarios involving higher-dimensional problems or those emphasizing physical consistency.

4.3. Steady-State Solutions with Different Initial Conditions

Through training the neural network, we obtain the steady-state solutions of the one-dimensional Allen-Cahn equation. Figure 4 presents the results for the different initial conditions u(x, 0) = cos(πx), u(x, 0) = sin(πx), and u(x, 0) = −sin(πx), along with the corresponding loss value evolution during training.

Reference Numerical Method 

To validate our 1D results and obtain the reference energy values, we computed reference steady-state solutions using a standard Finite Difference Method (FDM), consistent with the ‘Traditional Method’ label used in our comparisons. The specific implementation details, including the second-order central difference scheme for spatial discretization, an implicit time-marching scheme (e.g., Crank-Nicolson) to reach steady state, and stability analysis, are well-established and documented in previous work, such as [32]. For our simulations, a fine spatial grid with Δx = 1/1024 was used to ensure high accuracy.
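For readers who wish to reproduce a comparable reference, the sketch below marches the 1D equation to a (quasi-)steady state with a simple explicit scheme and reflected ghost values for the Neumann boundaries. It is only an illustrative stand-in: the reference solver described above is implicit (Crank-Nicolson-type), and the step sizes and tolerance here are our own choices.

```python
import numpy as np

def fdm_steady_state_1d(u0, eps=0.01, dx=1.0 / 1024, dt=1e-4,
                        max_steps=500_000, tol=1e-9):
    """March u_t = eps^2 u_xx - (u^3 - u) to steady state with explicit Euler."""
    u = np.asarray(u0, dtype=float).copy()
    for _ in range(max_steps):
        # Reflect end values to impose homogeneous Neumann boundaries
        u_ext = np.concatenate(([u[1]], u, [u[-2]]))
        lap = (u_ext[2:] - 2.0 * u_ext[1:-1] + u_ext[:-2]) / dx**2
        du = dt * (eps**2 * lap - (u**3 - u))
        u += du
        if np.max(np.abs(du)) < tol:       # stop once the update is negligible
            break
    return u

x = np.linspace(-1.0, 1.0, 2049)           # grid spacing dx = 1/1024 on [-1, 1]
u_ref = fdm_steady_state_1d(np.cos(np.pi * x), eps=0.01, dx=x[1] - x[0])
```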
From the results, it can be observed that the loss function converges rapidly, and the system ultimately reaches a steady-state phase-separated configuration. In the spatial domain, the solution converges to u = ±1, which aligns perfectly with theoretical predictions. Specifically, for all three initial conditions (u(x, 0) = cos(πx), u(x, 0) = sin(πx), and u(x, 0) = −sin(πx)), the solutions exhibit distinct phase separation characteristics: clear interfaces form within the spatial domain, dividing it into two regions: one where u approaches +1, and the other where u approaches −1.
To quantitatively validate our deep learning approach, we compared the minimum energy obtained through our ES-ScaDNN method with reference solutions obtained using the Finite Difference Method (FDM). The total energy E_min is the value of the energy functional E[u] evaluated over the entire domain Ω. Table 1 presents the absolute differences between these energy values for various initial conditions and interface parameters. These absolute differences are consistently small (on the order of 10⁻² to 10⁻³), indicating good agreement between our method and the reference solution. The results show that our method achieves high accuracy, with absolute errors consistently below 0.03 across all test cases. Notably, when the interface parameter ε = 0.01, the absolute errors are particularly small (around 0.003–0.006), indicating excellent agreement with the FDM reference solutions. The slightly larger discrepancies observed at ε = 0.05 might be attributed to the increased interface width, though these differences remain within acceptable bounds.
This quantitative comparison further validates that our deep learning approach not only captures the qualitative features of phase separation but also achieves high numerical accuracy in terms of energy minimization.
It is noteworthy that, despite the differences in initial conditions, the system ultimately evolves into similar phase-separated configurations. This phenomenon highlights the asymptotic behavior of the solutions to the Allen-Cahn equation, demonstrating the system’s tendency to form configurations that minimize the total energy. The deep learning-based solution method effectively captures this behavior, showing that the system gravitates toward energy-minimizing states. Our deep learning approach not only accurately reproduces this physical process but also provides an efficient numerical solution strategy.

4.4. Impact of Scaling Layer on Solution Accuracy and Physical Constraints

The Allen-Cahn equation inherently requires its solutions to remain within the interval [−1, 1] due to its double-well potential nature and physical implications. However, traditional neural network approaches may generate solutions that violate this constraint, potentially leading to physically meaningless results. To address this limitation, we introduce a Scaling Layer into the neural network architecture. This modification not only ensures that the network outputs comply with the physical constraints but also potentially improves the solution accuracy. To quantitatively assess the effectiveness of this approach, we conduct a systematic comparison between neural networks with and without the Scaling Layer, examining their performance across various initial conditions and interface parameters.
Table 2 presents a comprehensive comparison of the numerical results obtained from neural networks with and without the Scaling Layer against FDM solutions. We examined three different initial conditions: cos(πx), sin(πx), and −sin(πx), each under two interface parameters (ε = 0.01 and 0.05). The comparison metrics include both the L_∞ norm (maximum absolute error) and the L_1 norm (mean absolute error). The results demonstrate that the introduction of the Scaling Layer significantly improves the accuracy of the neural network solutions. A crucial advantage of incorporating the Scaling Layer is that it constrains the output values within the interval [−1, 1], which aligns with the physical characteristics of the Allen-Cahn equation’s energy minimization principle. Without the Scaling Layer, the network outputs may exceed this physically meaningful range, leading to unrealistic solutions. For instance, with cos(πx) as the initial condition and ε = 0.01, the L_∞ error decreases from 1.9315 to 0.9969, representing a 48.4% improvement. Similarly, the L_1 error shows a substantial reduction from 7.0425 × 10⁻² to 2.3070 × 10⁻², indicating a 67.2% improvement in average accuracy. This enhancement pattern is consistent across all initial conditions and interface parameters, suggesting that the Scaling Layer effectively enhances the neural network’s ability to capture the solution dynamics while maintaining physical constraints, particularly in cases with smaller interface parameters.
Here, x_i and y_i denote the two solutions being compared (the ES-ScaDNN prediction and the FDM reference) at the i-th sampling point, and the error norms are defined as:
$$L_{\infty} = \max_{i} \left| x_{i} - y_{i} \right|, \qquad L_{1} = \frac{1}{n} \sum_{i=1}^{n} \left| x_{i} - y_{i} \right|.$$
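Both error measures can be computed directly from the sampled solutions, for example:

```python
import numpy as np

def error_norms(u_pred: np.ndarray, u_ref: np.ndarray):
    """Return the maximum (L_inf) and mean (L_1) absolute errors."""
    diff = np.abs(u_pred - u_ref)
    return diff.max(), diff.mean()
```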

4.5. Impact of Variance Loss on Solution Accuracy

To further enhance the solution accuracy of our ES-ScaDNN method, we investigated the effect of incorporating variance loss into the training process. The variance loss term helps regularize the network’s output distribution and promotes more stable convergence to physically meaningful solutions.
Table 3 presents a comprehensive comparison of numerical results obtained with and without the variance loss term. For both scenarios, we evaluated the performance using the L_∞ error (maximum absolute error) and the L_1 error (mean absolute error) across different initial conditions (cos(πx), sin(πx), and −sin(πx)) and interface parameters (ε = 0.01 and 0.05). The incorporation of variance loss consistently leads to substantial improvements in solution accuracy across all test cases.
The comparative analysis reveals significant improvements in solution accuracy when variance loss is incorporated. For the initial condition cos(πx) with ε = 0.01, the L_∞ error decreases dramatically from 1.9992 to 0.9969, representing a 50.1% reduction. More notably, the L_1 error shows an even more substantial improvement, declining from 8.6914 × 10⁻¹ to 2.3070 × 10⁻², indicating a 97.3% reduction in average error. Similar improvements are observed across different initial conditions and interface parameters.
Figure 5 provides a visual comparison of the solutions obtained with and without variance loss for the initial condition cos(πx). The solid blue line represents the FDM solution, while the dashed lines represent the ES-ScaDNN solutions obtained with and without variance loss. The results demonstrate that incorporating variance loss leads to solutions that more closely align with the FDM reference. Without variance loss, the neural network tends to produce solutions with larger deviations from the expected physical behavior, particularly near the interface regions where x ≈ ±0.5. The addition of variance loss effectively constrains the solution space, resulting in more physically consistent results that better capture the interface dynamics of the Allen-Cahn equation.
These findings suggest that variance loss serves as an essential component in our deep learning framework, significantly improving both the numerical accuracy and physical consistency of the solutions.

4.6. Impact of Different Training Epochs on Solution Evolution

The number of training epochs determines the number of iterations in neural network training, which directly affects the model’s convergence and the accuracy of the results. To further illustrate the evolution of the solution from the initial state to the steady state, we conducted experiments using different numbers of training epochs: 0 (i.e., immediately after pretraining), 10, 100, and 5000. These experiments were performed using sin(πx) as the initial condition. By analyzing these different epoch values, we can observe the gradual evolution of the predicted solution by the neural network. The results are illustrated in Figure 6.
The experimental results indicate that as the number of training epochs increases, the predicted solution gradually evolves from an initial mixed state to a steady-state phase-separated configuration. During the early training stages (e.g., epochs = 0 and 10), the solution undergoes significant changes, as the system has not yet been sufficiently trained. This is reflected in the incomplete emergence of the phase separation process. At this stage, the network is still rapidly adapting to the characteristics of the input data, resulting in considerable fluctuations in the predicted solution.
By the time the number of epochs increases to 100, the changes in the solution begin to stabilize, and the system starts to approach a steady state. At this stage, the neural network can accurately simulate the evolution of the solution, with the phase separation phenomenon becoming increasingly distinct. Finally, at epochs = 5000, the model achieves extremely high accuracy, with the solution fully evolving into the steady-state configuration. The interfaces of the phase-separated regions become sharp and clear, indicating that the system has reached a physically stable state.
These experimental results demonstrate that a sufficient number of training epochs is crucial for the neural network to find the global optimal solution. Short training periods (e.g., epochs = 0 and 10) may cause the network to converge to a local optimal solution, resulting in less pronounced phase separation. In contrast, longer training periods (e.g., epochs = 100 and 5000) enable the network to explore the solution space more comprehensively, ultimately yielding a more accurate steady-state solution.

4.7. Analysis of the Effect of Parameter ϵ on the Steady-State Solutions of the Allen-Cahn Equation

In the context of solving the Allen-Cahn equation by minimizing the energy functional using neural networks, it is crucial to understand the influence of the parameter ϵ on the interface width and phase separation behavior. The Allen-Cahn equation is widely applied in modeling phase separation and interface evolution, where ϵ governs the sharpness of the interface and the system’s dynamic properties. By varying the value of ϵ, we can gain deeper insights into its role in energy functional minimization.
With an initial condition of sin(πx), we investigate the effect of ϵ on the steady-state solutions by selecting ϵ values of 0.01, 0.02, 0.05, and 0.1. The number of training epochs is fixed at 5000, and all other parameters remain unchanged. The results are illustrated in Figure 7.
The results reveal that as ϵ decreases, the interface becomes steeper, and the transition region narrows, exhibiting more pronounced phase separation. This is because a smaller ϵ lowers the energetic cost of steep gradients relative to the double-well potential, so deviations from the pure phases are confined to a narrower transition layer and the system settles rapidly into the stable phases. From an energetic perspective, the double-well potential term (u² − 1)² ensures that u = ±1 are the system’s energy minima, explaining the mechanism by which solutions tend toward these two stable states. Meanwhile, the gradient term ϵ²(du/dx)² determines the width of the interface. Our numerical results clearly illustrate this competition between the two energy terms.
Specifically, when ϵ =   0.01 , the sharpest interface structure is observed, which aligns well with theoretical predictions. In contrast, for ϵ =   0.1 , the interface exhibits a relatively large diffuse transition. These observations not only validate the accuracy of theoretical analyses but also highlight the capability of deep learning methods to capture complex interface dynamics.
It is worth noting that this parameter dependence has significant implications for understanding and controlling phase separation phenomena in real-world physical systems, such as phase transitions in materials science and the formation of biological membranes. By adjusting the value of ϵ , we can precisely control the interface properties, enabling effective regulation of the phase separation process.

5. Test Case 2: Two-Dimensional Allen-Cahn Equation

Building on the one-dimensional problem discussed earlier, we extend the method to a two-dimensional case to further demonstrate the neural network’s capability in solving the Allen-Cahn equation in higher dimensions. We consider the spatial domain Ω = [0, 1] × [0, 1], where sample points are collected using uniform sampling. The steady-state solution of the system is obtained by minimizing the energy functional.
Unlike the 1D case where specific initial functions u 0 ( x ) were used, for the two-dimensional problem, we do not perform a pretraining step to fit a predefined initial condition. Instead, the network optimization process starts directly from its random weight initialization. This standard practice effectively places the initial predicted state u ( x , y ) in a high-energy, disordered configuration, analogous to starting from random noise. The subsequent energy minimization then drives the system towards a stable, phase-separated steady state.

5.1. Design of the Energy Functional Loss Function in Two Dimensions

For the two-dimensional case, the energy functional is expressed as:
$$E[u] = \int_{\Omega} \left[ \frac{\epsilon^{2}}{2} \left( \Big(\frac{\partial u}{\partial x}\Big)^{2} + \Big(\frac{\partial u}{\partial y}\Big)^{2} \right) + \frac{1}{4}\big(u^{2} - 1\big)^{2} \right] \mathrm{d}x\,\mathrm{d}y,$$
to facilitate computation, we discretize this energy functional and derive the following numerical approximation of the energy loss function:
$$L_{E} = \sum_{i=1}^{N_{x}} \sum_{j=1}^{N_{y}} \left[ \frac{\epsilon^{2}}{2} \left( \Big(\frac{\partial u}{\partial x}(x_{i}, y_{j})\Big)^{2} + \Big(\frac{\partial u}{\partial y}(x_{i}, y_{j})\Big)^{2} \right) + \frac{1}{4}\big(u(x_{i}, y_{j})^{2} - 1\big)^{2} \right] \Delta x\,\Delta y,$$
where Δx and Δy are the discretization steps in the spatial directions, and N_x and N_y are the number of discrete points in the x and y directions, respectively. As in the one-dimensional case, minimizing this loss function guides the neural network toward approximating the optimal solution of the energy functional, thereby obtaining the steady-state solution.
To ensure that the boundary conditions are satisfied, we introduce a boundary loss function L_B, defined as:
$$L_{B} = \sum_{j=1}^{N_{y}} \left[ \Big(\frac{\partial u}{\partial x}(0, y_{j})\Big)^{2} + \Big(\frac{\partial u}{\partial x}(1, y_{j})\Big)^{2} \right] + \sum_{i=1}^{N_{x}} \left[ \Big(\frac{\partial u}{\partial y}(x_{i}, 0)\Big)^{2} + \Big(\frac{\partial u}{\partial y}(x_{i}, 1)\Big)^{2} \right],$$
this term ensures that the Neumann boundary condition of the Allen-Cahn equation is satisfied, i.e., the derivative at the boundaries is zero.
In the two-dimensional case, we use the ES-ScaDNN network structure to solve the Allen-Cahn equation. Similar to the one-dimensional problem, the input to the network consists of the two-dimensional spatial coordinates (x, y), and the output is the predicted solution û(x, y). Since the optimization starts from the network’s random weight initialization, which effectively provides a noisy initial state, the pretraining phase is omitted. We directly optimize the neural network parameters by minimizing the total loss function:
$$\mathrm{Loss} = L_{E} + \lambda_{B} L_{B} + \lambda_{V} L_{V},$$
where λ_B and λ_V are the weighting coefficients for the boundary loss and the variance-based regularization term, respectively.
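In two dimensions, the discretized energy and boundary losses can again be evaluated with automatic differentiation over the tensor-product grid, as in the sketch below. It assumes the grid was built as in the Section 3.1 sketch (meshgrid with `indexing="ij"`, so the first index runs over x), and it is an illustrative PyTorch implementation rather than the authors’ code.

```python
import torch

def loss_2d(model, xy, n_x, n_y, eps=0.01, lam_b=1.0, lam_v=1.5, sigma_min_sq=1.0):
    """Energy + Neumann boundary + variance loss on a uniform n_x-by-n_y grid over [0, 1]^2."""
    xy = xy.clone().requires_grad_(True)              # shape (n_x * n_y, 2)
    u = model(xy)
    grad = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
    ux, uy = grad[:, 0:1], grad[:, 1:2]

    dx, dy = 1.0 / (n_x - 1), 1.0 / (n_y - 1)
    energy = (0.5 * eps**2 * (ux**2 + uy**2)
              + 0.25 * (u**2 - 1.0) ** 2).sum() * dx * dy

    # Neumann boundaries: u_x = 0 on x = 0, 1 and u_y = 0 on y = 0, 1
    ux_g, uy_g = ux.reshape(n_x, n_y), uy.reshape(n_x, n_y)
    loss_b = (ux_g[0, :] ** 2 + ux_g[-1, :] ** 2).sum() \
             + (uy_g[:, 0] ** 2 + uy_g[:, -1] ** 2).sum()

    # Threshold variance regularization, as in the 1D case
    loss_v = torch.clamp(sigma_min_sq - u.var(), min=0.0) ** 2
    return energy + lam_b * loss_b + lam_v * loss_v
```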

5.2. Choice of Activation Functions in the Two-Dimensional Case

In the two-dimensional case, we again conducted comparative experiments using the ReLU and tanh activation functions. The results are shown in Figure 8.
The comparison reveals that, under the same number of training epochs, the loss values obtained using the tanh activation function are significantly lower than those with ReLU. Moreover, the solutions obtained with tanh are smoother and more stable in the two-dimensional setting. This is because the tanh activation function is better suited to handling complex boundary conditions and ensuring the continuity of solutions in multidimensional spaces. In contrast, ReLU tends to produce discontinuous gradients when dealing with high-dimensional problems, leading to abnormal fluctuations in the loss values and difficulty in converging to the optimal solution.
Therefore, in the two-dimensional case, the tanh activation function outperforms ReLU, especially in capturing the dynamics of phase separation. Subsequent experiments in the two-dimensional scenario will adopt the tanh activation function for implementation.

5.3. Solution of the 2D Problem and the Impact of Training Epochs on Solution Evolution

Through multiple experiments, we observed that, in the two-dimensional case, the phase separation process of the Allen-Cahn equation is similar to that in the one-dimensional case. The system evolves from a disordered mixed state to a steady-state phase-separated configuration. Figure 9 shows the predicted solutions at different training epochs (100, 1000, 10,000, and 20,000).
As shown, the phase separation process becomes more apparent with an increasing number of training epochs. For fewer training epochs (e.g., epochs = 100 and 1000), the solution undergoes significant changes, and the phase separation phenomenon is not yet clear. As the training progresses (epochs = 10,000 and 20,000), the solution gradually stabilizes, the phase interfaces become sharper, and the system ultimately reaches a steady state.
The experimental results demonstrate that sufficient training epochs are also critical for the neural network to find the global optimal solution in the two-dimensional problem. Compared to the one-dimensional case, the increased complexity of the two-dimensional problem requires more training epochs to achieve convergence. Short training periods may lead the network to fall into a local optimum, resulting in less pronounced phase separation. In contrast, longer training periods allow the network to explore the solution space more comprehensively, ultimately yielding an accurate steady-state solution. This is consistent with the observations from the one-dimensional experiments and further confirms the broad applicability of neural networks to high-dimensional problems.

5.4. Hardware Scalability Analysis: CPU vs. GPU Implementation

To evaluate the computational efficiency of our approach, we compared the execution times between CPU and GPU implementations for solving the two-dimensional Allen-Cahn equation. The experiments were conducted across different grid resolutions to assess how the computational advantage of GPU acceleration scales with problem size. Table 4 presents the comparison of total training time between CPU and GPU implementations.
The results demonstrate computational speedup advantages of GPU acceleration, particularly as the problem size increases. As shown in Table 4, the speedup factor ranges from 7.33× for the smallest grid (64 × 64) to 15.64× for the largest grid (512 × 512), indicating that GPU acceleration becomes more beneficial for larger problem sizes. This scalability is particularly important for practical applications where high-resolution simulations are required. The most substantial speedup improvement is observed for the 512 × 512 grid, where the GPU implementation reduces the computation time by approximately 93.6% compared to the CPU implementation.
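Timings of this kind can be collected by simply switching the device that holds the model and the sampling points; the sketch below reuses the training loop from Section 3.6 and is, again, only an illustration.

```python
import time
import torch

def timed_training(model, x, device="cuda", **train_kwargs):
    """Run the training loop on the chosen device and report wall-clock time."""
    dev = torch.device(device if torch.cuda.is_available() else "cpu")
    model, x = model.to(dev), x.to(dev)
    t0 = time.perf_counter()
    train(model, x, **train_kwargs)          # training loop sketched in Section 3.6
    if dev.type == "cuda":
        torch.cuda.synchronize()             # wait for queued GPU work to finish
    return time.perf_counter() - t0
```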

5.5. The Effect of Parameter ϵ on the Phase Separation Process

Similar to the one-dimensional case, the parameter ϵ controls the width of the phase separation interface. In the two-dimensional scenario, we also conducted experiments with different ϵ values (epochs = 20,000) to observe its effect on the phase separation process. The results are shown in Figure 10.
The results indicate that, as the value of ϵ decreases, the phase interfaces become steeper, and the separation process becomes more pronounced. This is because smaller ϵ values concentrate the interface energy, resulting in narrower interfaces and clearer boundaries between phases.
By extending the method from the one-dimensional problem to the two-dimensional case, we successfully demonstrated the neural network’s capability to solve high-dimensional problems. The experimental results show that neural networks can effectively capture the phase separation process of the Allen-Cahn equation. The system evolves from a disordered state to a steady state, ultimately yielding distinct phase interfaces. The training epochs and the ϵ parameter have significant effects on the evolution of the solutions, and an appropriate training strategy ensures that the network finds the global optimal solution. We have validated the applicability of the ES-ScaDNN method to two-dimensional problems, providing a solid foundation for further extensions to more complex problems in higher dimensions.

6. Conclusions

In this paper, we proposed an Energy-Stabilized Scaled Deep Neural Network (ES-ScaDNN) to efficiently solve for the steady-state solutions in phase-field modeling via energy minimization. Our key innovations—a scaling layer enforcing physical bounds [−1, 1] and a variance-based regularization term preventing collapse to homogeneous states—demonstrably enhance solution stability and accuracy. Specifically, our 1D analysis showed that including the variance loss term improves accuracy dramatically, yielding a reduction in L1 error of up to 97.3%. Through extensive 1D and 2D experiments, we validated the effectiveness and robustness of ES-ScaDNN across various initial conditions and interface parameters (ϵ), confirming its reliability in capturing phase separation dynamics. We identified ReLU activation for optimal 1D convergence speed and tanh for superior 2D smoothness. Furthermore, the framework exhibits excellent hardware scalability; GPU implementations achieve significant computational speedups, reaching up to 15.64 times faster than CPU for large-scale 2D problems. These results establish ES-ScaDNN as a promising, accurate, and computationally tractable deep learning framework specifically for steady-state phase-field problems.
Moving forward, extending the ES-ScaDNN framework to three-dimensional cases and coupled systems will be a key focus of our future work. While any 3D simulation presents a significant computational challenge due to the curse of dimensionality, our energy-based approach is theoretically well-suited for such an extension. The underlying methodology is directly applicable, requiring straightforward modifications to the network input and the energy loss formulation. More importantly, the inherent parallelism of neural networks, which proved highly effective in our 2D GPU analysis, offers a scalable path to mitigate the computational burden in 3D. Future research will explore optimized network architectures and sampling strategies to fully leverage this advantage for large-scale three-dimensional phase-field problems.

Author Contributions

Conceptualization, X.H. and R.Z.; Methodology, X.H. and R.Z.; Software, Y.W., R.W. and J.G.; Validation, Y.W.; Writing—original draft, Y.W.; Writing—review & editing, Y.W. and R.Z.; Visualization, Y.W. and R.W.; Supervision, R.Z.; Project administration, R.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the PhD Research Startup Foundation of Guangdong University of Science and Technology (GKY-2022BSQD-36).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

This work was supported by the Research Projects Foundation of Guangdong University of Science and Technology (GKY-2023KYZDK-14), and the PhD Research Startup Foundation of Guangdong University of Science and Technology (GKY-2022BSQD-36).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Allen, S.M.; Cahn, J.W. A microscopic theory for antiphase boundary motion and its application to antiphase domain coarsening. Acta Metall. 1979, 27, 1085–1095. [Google Scholar] [CrossRef]
  2. Hensel, S.; Moser, M. Convergence rates for the Allen-Cahn equation with boundary contact energy: The non-perturbative regime. Calc. Var. Partial Differ. Equ. 2022, 61, 201. [Google Scholar] [CrossRef]
  3. Wang, X.; Kou, J.; Gao, H. Linear energy stable and maximum principle preserving semi-implicit scheme for Allen-cahn equation with double well potential. Commun. Nonlinear Sci. 2021, 98, 105766. [Google Scholar] [CrossRef]
  4. Yuan, W.; Zhang, C. Long-term dynamics of a stabilized time-space discretization scheme for 2D time-fractional Allen-cahn equation with double well potential. J. Comput. Appl. Math. 2024, 448, 115952. [Google Scholar] [CrossRef]
  5. Fischer, J.; Marveggio, A. Quantitative convergence of the vectorial Allen-Cahn equation towards multiphase mean curvature flow. Ann. De L’institut Henri Poincaré C 2024, 41, 1117–1178. [Google Scholar] [CrossRef]
  6. Tang, T.; Qiao, Z. Efficient numerical methods for phase-field equations. Sci. Sin. Math. 2020, 50, 775–794. (In Chinese) [Google Scholar] [CrossRef]
  7. Wang, L.; Trojak, W.; Witherden, F.; Jameson, A. Nonlinear p-Multigrid Preconditioner for Implicit Time Integration of Compressible Navier—Stokes Equations with p-Adaptive Flux Reconstruction. J. Sci. Comput. 2022, 93, 81. [Google Scholar] [CrossRef]
  8. Song, H.; Shu, C. Unconditional Energy Stability Analysis of a Second Order Implicit–Explicit Local Discontinuous Galerkin Method for the Cahn–Hilliard Equation. J. Sci. Comput. 2017, 73, 1178–1203. [Google Scholar] [CrossRef]
  9. Hughes, T.J.R. The Finite Element Method: Linear Static and Dynamic Finite Element Analysis; Prentice-Hall: Englewood Cliffs, NJ, USA, 1987. [Google Scholar]
  10. LeVeque, R.J. Finite Difference Methods for Ordinary and Partial Differential Equations: Steady-State and Time-Dependent Problems; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2007. [Google Scholar]
  11. Lee, H.G. A second-order operator splitting Fourier spectral method for fractional-in-space reaction-diffusion equations. J. Comput. Appl. Math. 2018, 333, 395–403. [Google Scholar] [CrossRef]
  12. Chen, X.; Qian, X.; Song, S. Fourth-order structure-preserving method for the conservative Allen-Cahn equation. Adv. Appl. Math. Mech. 2023, 15, 159–181. [Google Scholar] [CrossRef]
  13. Li, J.; Zeng, J.; Li, R. An adaptive discontinuous finite volume element method for the Allen-Cahn equation. Adv. Comput. Math. 2023, 49, 55. [Google Scholar] [CrossRef]
  14. Doan, C.; Hoang, T.; Ju, L. Fully discrete error analysis of first-order low regularity integrators for the Allen-Cahn equation. Numer. Methods Partial. Differ. Equ. 2023, 39, 3594–3608. [Google Scholar] [CrossRef]
  15. Hou, D.; Qiao, Z.; Ju, L. A linear doubly stabilized Crank-Nicolson scheme for the Allen-Cahn equation with a general mobility. arXiv 2023, arXiv:2310.19663. [Google Scholar] [CrossRef]
  16. Qi, X.; Azaiez, M.; Huang, C.; Xu, C. An efficient numerical approach for stochastic evolution PDEs driven by random diffusion coefficients and multiplicative noise. arXiv 2022, arXiv:2207.01258. [Google Scholar] [CrossRef]
  17. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
  18. Li, Y.; Shi, S.; Guo, Z.; Wu, B. Adversarial Training for Physics-Informed Neural Networks. arXiv 2023, arXiv:2310.11789. [Google Scholar] [CrossRef]
  19. Heinlein, A.; Howard, A.A.; Beecroft, D.; Stinis, P. Multifidelity domain decomposition-based physics-informed neural networks and operators for time-dependent problems. arXiv 2024, arXiv:2401.07888. [Google Scholar]
  20. Zhang, Q.; Chen, L.; Xu, Y. A Minimization Method for the Double-Well Energy Functional. arXiv 2018, arXiv:1809.01839v2. [Google Scholar] [CrossRef]
  21. Dai, S.; Li, B.; Luong, T. Minimizers for the Cahn–Hilliard Energy Functional under Strong Anchoring Conditions. SIAM J. Appl. Math. 2020, 80, 2299–2317. [Google Scholar] [CrossRef]
  22. Bao, W.; Tang, W. Ground-state solution of Bose-Einstein condensate by directly minimizing the energy functional. J. Comput. Phys. 2003, 187, 230–254. [Google Scholar] [CrossRef]
  23. Poudel, S.; Wang, X.; Lee, S. A novel technique for minimizing energy functional using neural networks. Eng. Appl. Artif. Intel. 2024, 133, 108313. [Google Scholar] [CrossRef]
  24. Liu, M.; Cai, Z.; Ramani, K. Deep Ritz method with adaptive quadrature for linear elasticity. Comput. Methods Appl. Mech. Eng. 2023, 415, 116229. [Google Scholar] [CrossRef]
  25. Bao, W.; Chang, Z.; Zhao, X. Computing ground states of Bose-Einstein condensation by normalized deep neural network. J. Comput. Phys. 2025, 520, 113486. [Google Scholar] [CrossRef]
  26. Chiwariro, R.; Wosowei, J.B. Comparative Analysis of Deep Learning Convolutional Neural Networks based on Transfer Learning for Pneumonia Detection. Int. J. Res. Appl. Sci. Eng. Technol. 2023, 11, 1161–1170. [Google Scholar] [CrossRef]
  27. Li, R. A review of recent advances in fault diagnosis based on deep neural networks. Adv. Eng. Technol. Res. 2024, 9, 637. [Google Scholar] [CrossRef]
  28. Shen, S.; Zhang, N.; Zhou, A.; Yin, Z. Enhancement of neural networks with an alternative activation function tanhLU. Expert Syst. Appl. 2022, 199, 117181. [Google Scholar] [CrossRef]
  29. He, J.; Li, L.; Xu, J.; Zheng, C. ReLU Deep Neural Networks and Linear Finite Elements. J. Comput. Math. 2020, 38, 502–527. [Google Scholar] [CrossRef]
  30. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010; pp. 249–256. [Google Scholar]
  31. Bock, S.; Weiß, M. A Proof of Local Convergence for the Adam Optimizer. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8. [Google Scholar]
  32. Huo, J.; Liu, H.; Wen, X.; Zhang, R.; Wei, X. Energy Stability Analysis and Numerical Simulation of a Class of Phase-Field Equations. J. Jilin Univ. (Sci. Ed.) 2022, 60, 721–728. (In Chinese) [Google Scholar]
Figure 1. Visualization of typical phase separation processes governed by the Allen-Cahn equation. (Top Row) 1D case, showing the evolution from a smooth initial condition (cos(πx)) to a stable steady state with sharp interfaces. (Bottom Row) 2D case, showing the evolution from a random initial phase distribution to a final state with two clearly separated phases.
Figure 2. Illustration of the architecture of ES-ScaDNN.
Figure 3. Training Loss Comparison Between Neural Networks with ReLU and Tanh Activations in the One-Dimensional Case (for the sin(πx) initial condition with ε = 0.01).
Figure 4. Numerical solutions of the 1D Allen-Cahn equation with different initial conditions: (a1) result for the initial condition u(x,0) = cos(πx), (a2) intermediate results during training with the loss function based on u(x,0) = cos(πx); (b1) result for u(x,0) = sin(πx), (b2) intermediate results during training with the loss function based on u(x,0) = sin(πx); (c1) result for u(x,0) = −sin(πx), (c2) intermediate results during training with the loss function based on u(x,0) = −sin(πx).
Figure 5. Comparison of numerical solutions for the initial condition cos(πx) with ε = 0.01. (a) FDM (blue) vs. ES-ScaDNN without variance loss (red); (b) FDM (blue) vs. ES-ScaDNN with variance loss (red).
Figure 6. Numerical solutions of the 1D Allen-Cahn equation with initial condition −sin(πx) at different training epochs (0, 10, 100, and 5000). (a) Epoch 0, (b) Epoch 10, (c) Epoch 100, (d) Epoch 5000.
Figure 7. Steady-state solutions of the 1D Allen-Cahn equation with initial condition sin(πx) for different values of ε (0.01, 0.02, 0.05, and 0.1).
Figure 8. Training Loss Comparison Between Neural Networks with ReLU and Tanh Activations in Two-Dimensional Case.
Figure 9. Numerical solutions of the 2D Allen-Cahn equation at different training epochs (100, 1000, 10,000, and 20,000). (a) Epoch 100, (b) Epoch 1000, (c) Epoch 10,000, (d) Epoch 20,000.
Figure 10. Phase separation patterns of the 2D Allen-Cahn equation under different ε values (ε = 0.015, 0.055, 0.15, 0.25) at steady state. The simulation was conducted with 20,000 epochs. (a) ε = 0.015, (b) ε = 0.055, (c) ε = 0.15, (d) ε = 0.25.
Table 1. Absolute differences in minimum energy between FDM and ES-ScaDNN solutions.
Initial Condition | Interface Parameter (ε) | |E_min(FDM) − E_min(ES-ScaDNN)|
u(x,0) = cos(πx) | 0.01 | 0.0064
u(x,0) = cos(πx) | 0.05 | 0.0249
u(x,0) = sin(πx) | 0.01 | 0.0032
u(x,0) = sin(πx) | 0.05 | 0.0119
u(x,0) = −sin(πx) | 0.01 | 0.0033
u(x,0) = −sin(πx) | 0.05 | 0.0117
Table 2. Error Analysis of Neural Networks With and Without Scaling Layer for the Allen-Cahn Equation.
Initial Condition | ε | L∞ Error (without scaling layer) | L1 Error (without scaling layer) | L∞ Error (with scaling layer) | L1 Error (with scaling layer)
cos(πx) | 0.01 | 1.9315 | 7.0425 × 10⁻² | 9.9686 × 10⁻¹ | 2.3070 × 10⁻²
cos(πx) | 0.05 | 6.2911 × 10⁻¹ | 1.2974 × 10⁻¹ | 5.9905 × 10⁻¹ | 5.7749 × 10⁻²
sin(πx) | 0.01 | 1.9644 | 4.8865 × 10⁻² | 9.8013 × 10⁻¹ | 1.3598 × 10⁻²
sin(πx) | 0.05 | 5.4945 × 10⁻¹ | 7.0016 × 10⁻² | 5.2099 × 10⁻¹ | 3.3362 × 10⁻²
−sin(πx) | 0.01 | 1.5575 | 3.2492 × 10⁻² | 9.8914 × 10⁻¹ | 1.5187 × 10⁻²
−sin(πx) | 0.05 | 5.4249 × 10⁻¹ | 7.1158 × 10⁻² | 5.2261 × 10⁻¹ | 3.4077 × 10⁻²
Table 3. Comparison of Error Metrics With and Without Variance Loss.
Initial Condition | ε | L∞ Error (without variance loss) | L1 Error (without variance loss) | L∞ Error (with variance loss) | L1 Error (with variance loss)
cos(πx) | 0.01 | 1.9992 | 8.6914 × 10⁻¹ | 9.9686 × 10⁻¹ | 2.3070 × 10⁻²
cos(πx) | 0.05 | 1.9930 | 6.0745 × 10⁻¹ | 5.9905 × 10⁻¹ | 5.7749 × 10⁻²
sin(πx) | 0.01 | 1.9991 | 9.3253 × 10⁻² | 9.8013 × 10⁻¹ | 1.3598 × 10⁻²
sin(πx) | 0.05 | 1.4456 | 1.2102 × 10⁻¹ | 5.2099 × 10⁻¹ | 3.3362 × 10⁻²
−sin(πx) | 0.01 | 9.9780 × 10⁻¹ | 1.8964 × 10⁻² | 9.8914 × 10⁻¹ | 1.5187 × 10⁻²
−sin(πx) | 0.05 | 5.9544 × 10⁻¹ | 4.9832 × 10⁻² | 5.2261 × 10⁻¹ | 3.4077 × 10⁻²
Table 4. Comparison of total training time between CPU and GPU implementations for solving the two-dimensional Allen-Cahn equation with different grid resolutions. The speedup is calculated as the ratio of CPU time to GPU time.
Problem Size (Grid Points) | CPU Time (s) | GPU Time (s)
64 × 64 | 3191.47 | 435.33
128 × 128 | 6178.06 | 440.90
256 × 256 | 8400.00 | 664.87
512 × 512 | 27,873.63 | 1782.93
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
