Improved Physics-Informed Neural Networks Combined with Small Sample Learning to Solve Two-Dimensional Stefan Problem

With the remarkable development of deep learning in the field of science, deep neural networks provide a new way to solve the Stefan problem. In this paper, deep neural networks combined with small sample learning and a general deep learning framework are proposed to solve the two-dimensional Stefan problem. In the case of adding less sample data, the model can be modified and the prediction accuracy can be improved. In addition, by solving the forward and inverse problems of the two-dimensional single-phase Stefan problem, it is verified that the improved method can accurately predict the solutions of the partial differential equations of the moving boundary and the dynamic interface.


Introduction
For a fixed solution of a partial differential equation, it is usually restricted to a certain region. If this region varies with time, we call it the moving boundary problem. If part of the boundary of the fixed region is to be determined simultaneously with the solution of the fixed problem, we call it the free boundary problem, and the unknown boundary is called the free boundary. For free boundary problems, in addition to the usual fixed solution conditions, boundary conditions (Stefan conditions) must be added to the free boundary.
The Stefan problem is a class of heat conduction problems. It was first formulated by the Austrian physicist Joseph Stefan in the late 19th Century. The background of this problem is closely related to the industrial production during the industrial revolution of that time. In industrial production, many substances needed to be heat-treated, so the laws and methods of heat conduction needed to be studied to better control and utilize heat energy.
The difficulty of the Stefan problem is tracing the location of the interface, for example to model the process of change at the intersection of ice and water. Since the interface changes with time, these problems are also called free boundary problems. The study of free boundary problems has a wide practical background, such as plasma physics, percolation mechanics, and plasticity mechanics [1][2][3][4][5][6], which have presented various forms of constant and indeterminate free boundary problems. Furthermore, chemical vapor deposition [7], the vapor permeation of thermally cracked carbon in chemistry [8], tumor growth in medicine [9][10][11][12][13], the expansion propagation problem of biological populations [14][15][16][17], and the U.S. option pricing problem [18,19] also have free boundary problems. In fact, all free boundary problems are nonlinear problems, and it is important to solve them with the solution of the free boundary, which will be determined together with the solution of the fixed problem. Since these problems are closely related to practical applications, the efficient algorithmic implementation of free boundary problems is of great importance for scientific research and production practice.
Currently, a variety of numerical methods to the Stefan problems have emerged. The boundary integral method numerically solves the integral equation at a moving interface [20]. The interface tracing method explicitly represents the free boundary by using a fixed grid of points [1]. The immersion interface method [21] uses a fixed spatial grid for some physical quantities and a moving grid for the free boundary, and the information between the free boundary and the fixed grid is obtained by the immersion interface method. Segal [22] proposed an adaptive grid method for solving the free boundary problem, where the movement of the grid is determined by the control equations. Murray and Landis [23] compared the fixed-grid method with the adaptive-grid method. The adaptive-grid method can obtain the free boundary location more accurately, while the fixed-grid method can obtain certain physical quantities (temperature distribution over the whole solution area). More accurately, the enthalpy difference method [24] is an implicit method that represents the heat of the whole system by introducing an enthalpy function with a jump discontinuity at the free boundary, and this discontinuity helps to determine the location of the interface. The moving grid method [25] is a common method for solving free boundary problems, where the free boundary position is always fixed at the nth grid point, and the grid needs to be updated at each time step due to the movement of the free boundary. In recent years, the phase field method [26] has became a popular method based on the phase field function, which corresponds to a fixed constant in each phase and the interface region between the two values. This method considers a fuzzy boundary between the two phases, which is different from the classical Stefan problem, where the phase transition occurs in this interface region, where the thickness of the region is an artificially given parameter. The level-set approach [27] has also received increasing attention in recent years, by introducing a level-set function, which defines the interface position as a zero level set, obtained from the advection equation related to the velocity field, which varies considerably in different applications of the level-set approach. Sussman used the velocity of the fluid to model compressible two-phase flow [28], and Chen extended the interface moving velocity to the whole region by the advection equation in the solidification problem [27]. In contrast to the moving grid method, the level set method uses a fixed grid and avoids updating the grid at each time step. In contrast to the fixed-gridmethod, which finds the interface position at a fixed grid point, the phase-field method does not track the interface position precisely, and therefore, the discretization at the interface position is not as accurate as the fixed-grid method.
All the above methods have proven their high accuracy for some specific Stefan problems, and each has its own advantages and disadvantages. However, a general framework for solving the Stefan forward and inverse problems is still missing in all the methods at this stage.
In recent years, with the continuous development of neural networks, the use of neural networks to solve partial differential equations has gradually become popular [29][30][31][32]. Raissi et al. [33] proposed physics-informed neural networks for solving forward and inverse problems of partial differential equations, and deep-learning-based physics-informed neural networks have also been proposed for solving free boundary problems [34], as well as neural networks to solve the Stefan problems using a lattice-free grid-free automatic differentiation technique, which breaks the limitations of the above methods.
Deep learning models have achieved advanced results in solving various types of partial differential equations. However, the success of deep learning models relies heavily on a large amount of training data. In some specialized fields, the cost of data acquisition is very large. In addition, labeling samples requires much effort. Therefore, in recent years, a new learning approach, small sample learning, has gained popularity [35]. Small sample learning has been successfully applied to many new fields, such as: neural networks translation, target detection, etc. By using small sample learning, the accuracy of the model can be improved with few labeled data.
In this paper, we extended the recently emerging physics-informed neural networks framework [33] to solve the general Stefan problems. As we know, the original framework of physics-informed neural networks does not deal well with free boundary problems with time variation. To achieve this goal, we propose and modified the neural network framework proposed by Wang [34]. The specific contributions of this paper can be summarized as follows: Firstly, we incorporated the idea of small sample learning into neural network training, i.e., a small sample loss is added to the loss function optimization in order to improve the training accuracy and correct the model. Secondly, we changed the loss function to cope with the outliers that may arise from free boundary shifts. Finally, we applied the proposed framework to irregular regions and irregular free boundaries to test its performance. In summary, the proposed method provides a new general framework for solving the Stefan problems.
This paper is structured as follows: Section 2 introduces the Stefan problem and its mathematical model. Section 3 introduces the knowledge of neural networks and the PINNs' improvement strategy. Section 4 verifies the accuracy and applicability of the proposed framework with numerical arithmetic examples. The conclusions and outlook are given in Section 5.

Model Issues
In this section, we introduce the mathematical model of the Stefan problem and the corresponding boundary conditions using a one-dimensional single-phase Stefan problem in the solidification or melting process as an example.
As shown in Figure 1, assume that a semi-infinite solid occupying 0 x < ∞ is in the process of solidification or melting. For any moment t > 0, the 0 x < ∞ region consists of a solid and a liquid. The liquid is located in the 0 x < α(t) region, and the solid is located in the α(t) < x < ∞ region. If the volume change due to solidification or melting is not considered and the region 0 x < α(t) is considered, the temperature u(x, t) satisfies the classical diffusion equation over the region: and for the initial and boundary conditions: at the interface α(t), the following Stefan conditions need to be met: where α(t) denotes the free boundary, (4) denotes the initial position of the free boundary, and (5) denotes the temperature at the time of freezing. For the forward problem, in thermal physics, it is the simultaneous solution of the temperature distribution and the free boundary for which various parameters are known. Each Stefan problem corresponds to a class of inverse problems, and similar to other mathematical physics inverse problems, the inverse Stefan problem is not definite, while the uniqueness and stability of the solution are not always guaranteed.

Numerical Methods
In this section, the main methods and ideas of feedforward neural networks and physics-informed neural networks for solving partial differential equations are reviewed, and a general model of improved physics-informed neural networks combined with small sample learning for solving the Stefan problem is proposed.

Feedforward Neural Networks
Mathematically, a neural network is a specific class of complex functions, the simplest of which is a feedforward neural network (FNN), which is also called a multilayer perceptron (MLP). Let F P (x) : R d in → R d out be a P-layer feedforward neural networks with P−1 hidden layers; the number of neurons in the input layer is N 0 = d in , and the number of neurons in the output layer is N P = d out . The network weight of the pth layer is W p ∈ R N p ×N p −1 , and the network bias is b p ∈ R N p . If given the activation function σ, the feedforward neural network (FNN) can be expressed as

Physics-Informed Neural Networks
We considered the following parameterized partial differential equation: where x ∈ Ω ∈ R n , F denotes the residuals of the partial differential equation, F in (∂ x u, ∂ t u, . . .) denotes the space-time differential operator, λ = [λ 1 , λ 2 , λ 3 . . .] represents the parameters of the partial differential equation, u(x, t) is the solution of the partial differential equation, Ω denotes the solution region of the partial differential equation, ∂Ω is the boundary of Ω, and B represents any of the boundary operators, including Dirichlet, Neumann, Robin, and periodic. PINNs [33] use a fully connected feedforward neural networks consisting of multiple hidden layers to approximate the solution of the partial differential equation with spatiotemporal coordinates u(x, t) as the input. The neural network is displayed in Figure 2. First, we establish the loss function based on the form of the partial differential equation and the initial margin value condition: where where L f , L ic , and L bc denote the loss functions satisfying the control equations, initial conditions, and boundary conditions and λ f , λ ic , and λ bc denote the weights of the loss functions L f , L ic , and L bc , respectively, where the derivatives involved are automatically differentiated by the neural networks.
After the loss function L total is established, the optimal parameters θ * = (W * , b * ) of the neural networks are obtained by iterative updating with the objective of minimizing the loss function, considering that the loss function is nonlinear and nonconvex for θ * , so a gradient-based optimizer is used to minimize the loss function, such as: Adagrad, AdaDelta, Adam, momentum, and RMSProp.

Neural Network Improvement Strategies for Stefan Problem
PINNs have been applied in various scientific and engineering fields to solve various complex physical problems. In the free boundary problem, the simple PINNs can no longer meet the requirements of the Stefan problem, and certain structures need to be changed accordingly. Therefore, we propose the following improvement strategies.

Neural Networks' Basic Structure
Recall that, for the Stefan problem, we aim to find both the temperature solution u(x, t) and the moving boundary α(t). To this end, we build two deep neural networks u θ (x, t) and α δ (t) with θ and δ as independent parameter spaces to approximate the solutions u(x, t) and α(t). Then, we approximate the area temperature solution and the moving boundary by minimizing a composite loss function.

Adding Additional Terms to the Loss Function
The original PINN loss function is shown in (9), and in this paper, we considered adding a loss function with small sample data, as follows: where L ssl is the loss function for the small sample data, which is used to correct the model. Adding small sample data can improve the prediction accuracy of the model, especially in the case of fewer data. Meanwhile, it can improve the generalization ability and robustness of the model to avoid the risk of overfitting.

Loss Function Improvement Strategy
The mean-squared error (MSE) is the most-commonly used regression loss function in deep learning and is the sum of squares of the distance between the target variable and the predicted value. Considering the specificity of Stefan problem, sharp or irregular regions will appear during the melting or freezing process, which leads to outliers, and the meansquared error is very sensitive to outliers; thus, we introduce the log-cosh loss function, for a small error; log(cosh(x)) is similar to x 2 2 , while for large errors, log(cosh(x)) is similar to |x| − log 2, which means that the log-cosh loss function can have the advantage of the mean-squared error while not being affected by too many outliers. It is also quadratically derivable at every point. The introduction of the log-cosh loss function allows for better model generality and model prediction accuracy. The log-cosh loss function is expressed as follows: where the log-cosh loss function is smooth and has a continuous derivative property, which allows the model to be trained using derivative-based optimization algorithms such as gradient descent, and the log-cosh loss function is a symmetric loss function; it has equal loss for positive and negative errors. This symmetry allows the model to better handle both positive and negative errors, thus improving the model fitting ability.
To summarize, our proposed algorithm is displayed in Algorithm 1 and visualized in the schematic diagram in Figure 3, where we explain each step in detail. First, we construct two neural networks u θ (x, t) and α δ (t). θ = W k , b k 1 k K is the set of weight matrices and bias vectors in u; δ = W j , b j 1 j J is the set of weight matrices and bias vectors in α; for the neural networks as a substitute for u and α, we can use auto-differentiation and the chain rule to differentiate the functions, then we need to restrict the two neural networks to satisfy the physical conditions imposed by the PDE and Stefan condition, while it is harder to achieve if we constrain over the whole region, so we constrain u and α at scattered points and various training sets and measure the difference between the two neural networks and the constraint; we define the loss function as the sum of the hyperbolic cosine logarithms of the equations and the residuals of the boundary conditions (where the derivatives involved are handled automatically by AD), and in the last step, we minimize the total loss function to obtain the optimal parameters and, thus, the optimal solution. Since the loss is nonlinear and nonconvex, we usually use gradient-based optimizers to minimize the loss function, such as gradient descent, Adam [36], and L-BFGS [37].  1: Two neural networks u θ (x, t) and α δ (t) with θ and δ as parameters and propagating forward independently of each other are constructed. 2: The training sets K f for the temperature control equation, the training sets K u 0 and K u nc for the initial margin condition, the training sets K δ 0 for the initial margin condition of the moving interface, K δ nc , K δ ic , K δ bc , and the training set K u ssl for the residuals of the internal small sample data. 3: The total loss function is assigned by summing the hyperbolic cosine logarithms of the control equation, the initial margin conditional residuals, the moving boundary initial margin conditional residuals, and the interior point residuals. 4: The optimal parameters θ and δ are obtained by minimizing the loss function and training the neural network to obtain the optimal solution.

Numerical Examples
In this section, we consider several two-dimensional classical forward and inverse Stefan problems to demonstrate the robustness and power of the proposed model. We validated the proposed model for different numbers of small sample data points (0, 50, 100, 200). The results showed that small sample data can effectively improve the detection accuracy. In Section 4.1, one two-dimensional forward Stefan problem is given to test the accuracy of the proposed model, and in Section 4.2, two two-dimensional inverse Stefan problems are given to verify the generality of the model.
For error evaluation, we used the relative L 2 norm: Here, M represents the number of all points in the training process of two mutually independent neural networks, u pred represents the predicted values of the corresponding coordinate points, and u exact represents the exact values of the corresponding coordinate points.

Two-Dimensional Single-Phase Stefan Forward Problem
In this subsection, a classical two-dimensional solid melting problem model is used as an example to illustrate the effectiveness of the method [38].
As depicted in Figure 4, let D ∈ R 2 , for any 0 < t < T, Ω(t), be the time-dependent bounded subdomain in D, ∂Ω(t) be its boundary, and ∂Ω(t) = Γ(t) ∪ Σ(t), where Γ = ∪ 0<t<T Γ(t) represents the fixed boundary and Σ = ∪ 0<t<T Σ(t) represents the moving boundary. The above Stefan problem can be described in mathematics as u| Ω(0) = u 0 , where ∆ is the Laplace operator for spatial variables, n is the normal vector of the definition field Ω with respect to the time variables, and the definition field Ω is α(y, t) denotes the unknown free boundary; the artificial region is defined as: where Φ(x, y, t) = x − α(y, t). Define the following parameters: The boundary conditions are defined as Construct the following true solution: u(x, y, t) = exp As shown in Algorithm 1, we built two fully connected neural networks u θ (x, y, t) and α δ (y, t) that propagate independently of each other to approximate the solution of the control equation u(x, y, t) and the moving boundary α(y, t), and we trained these two neural networks by minimizing the loss function.
L(θ, δ) = L f (θ) + L u 0 (θ) + L u bc (θ) + L α 0 (δ) + L α bc (θ, δ) + L α nc (θ, δ) + L u ssl (θ), (25) where In particular, since the moving interface α(y, t) is unknown in advance, the residual point region of the equation was set to the artificial region Ω * , and the neural networks was trained to restrict the neural network prediction solution to the defined domain Ω. Two Stefan neural networks u θ (x, t) and α δ (t) were constructed with independent forward propagation, and the neural network parameters were set as follows: u θ (x, t) has a network structure of 3 + 100 × 3 + 1; the input layer has three variables and three hidden layers; each layer contains 100 neurons; the output layer outputs the neural networks prediction solution. The network structure of α δ (t) is 2 + 100 × 3 + 1; similarly, the input layer has two variables, and the output layer has three hidden layers containing 100 neurons each, while the moving boundary of the neural network prediction is the output. The training set was selected as N f with 256 points, 256 points for each of the initial border conditions (including the initial border conditions on the free boundary), and 0, 50, 100, and 200 for the small sample data points in the region. The results are shown in Table 1. The same initial learning rate of 10 −3 was set for both neural networks, and the number of iterations was 40,000 with Adam's algorithm, using tanh as the activation function and using Xavier to initialize both neural networks. By training the neural networks with the above parameters, Figure 5 shows the L 2 error plot of the predicted solution and the true solution, and Figure 6 shows the L 2 error plot of the predicted free interface α(y, t) and the true free boundary of the neural networks. It is worth mentioning that the number of small samples u ssl is set to 100 in Figures 5 and 6, and we can see that the L 2 errors reached 8.57 × 10 −3 and 3.25 × 10 −2 , respectively, which is a significant improvement in accuracy compared with the original neural networks [34].

Two-Dimensional Single-Phase Stefan Inverse Problem I
For the two-dimensional single-phase Stefan inverse problem, we assumed that the boundary conditions are not known and provide additional information at the final moment T: u| Ω(T) = u T . (26) Then, this class of inverse problems means that we know the data at the moment T = 1 and need to find the temperature solution u(x, y, t) and the free boundary α(y, t) and satisfy (14), (16), and (17)- (19).
As shown in Algorithm 1, we built two fully connected neural networks u θ (x, y, t) and α δ (y, t) that propagate independently of each other to approximate the solution of the control equation u(x, y, t) and the moving boundary α(y, t), and we trained these two neural networks by minimizing the loss function.
L(θ, δ) = L r (θ) + L u 0 (θ) + L u T (θ) + L α bc (θ, δ) + L α nc (θ, δ) + L α 0 (δ) + L u ssl (θ), (28) where the loss function definitions are all the same as in Section 4.1, except that L u T (θ): We trained the neural networks using exactly the same parameters as in Section 4.1, as shown in Table 2, which visualizes the L 2 error of the solution and moving boundaries when the small sample data are 0, 50, 100, 150, and 200, respectively.  Figure 7 gives the error plots of the predicted and exact temperature solutions at moments t = 0.2, 0.4, 0.6, and 0.8, and Figure 8 shows the error plots of the predicted moving boundaries and the true moving boundaries. In Figures 7 and 8, the number of small samples u ssl was set to 100, and we can see that the L 2 error reached 3.66 × 10 −2 and 3.15 × 10 −2 , which improved the accuracy of the neural networks compared with the original neural networks.

Two-Dimensional Single-Phase Stefan Inverse Problem II
This part mainly discusses the Stefan problem of inversion of the phase change boundary in the case of known temperature and heat changes at the boundary of a homogeneous medium, without considering any initial conditions or boundary conditions at the moving interface; only some given temperature measurements in the definition domain were considered, and neural networks were used to approximate the temperature solution and the unknown position of the moving boundary, then the above inverse problem is described by the mathematical formula: Similarly, two fully connected neural networks u θ (x, y, t) and α δ (y, t) that propagate independently of each other were constructed to approximate the solution of the equation u(x, y, t) and the moving boundary α(y, t), and we trained these two neural networks by minimizing the following loss functions: where log cosh u θ α δ y k bc , t k bc , y k bc , t k In particular, the first loss function was constructed from the residuals of the temperature observation points in the region, where (x i , y i , t i ), u i N i=1 is a randomly sampled data point in the domain. The neural networks parameters were set as follows: the network structures of the two neural networks u θ (x, t) and α δ (t) were the same as in the previous example, and the training set was selected as follows: the residuals of the equations and the initial margin conditions (including the initial margin conditions on the free boundary) were each selected as 256 points. The results are shown in Table 3. The initial learning rate was set to 10 −3 for both neural networks, and the number of iterations was set to 40,000 and optimized by Adam optimizer. Xavier initialized both neural networks. From Figures 9 and 10, it can be seen that the L 2 error of the solution of the inverse problem and the moving interface can reach 9.97 × 10 −3 and 1.26 × 10 −2 for the improved deep neural networks with small sample data of 100, which is a significant improvement compared with the original neural network.   The above two-dimensional single-phase Stefan inverse problem II is based on the temperature measurement value M = 50, and then, we tested the performance of our proposed neural network framework by adding white noise κ with magnitude equal to 1%, 2%, 5%, and 10% of the L ∞ norm of the solution function u(x, y, t), where the small sample data points were 100. The relative L 2 errors obtained for the prediction solutions u(x, y, t) and α(y, t) are shown in Tables 4 and 5, respectively. We can observe that the prediction accuracy of both u(x, y, t) and α(y, t) improved as the total number of temperature measurement data M increased, but became lower as the noise level κ increased. The latter case verified our conjecture that the dataset became increasingly inaccurate due to higher levels of noise. Another important observation is that, given a sufficient amount of temperature measurement data (e.g., M = 200), even with noise corruption up to 10%, the relative L 2 error between the predicted solutions u(x, y, t) and α(y, t) can reach 6.78 × 10 −2 and 3.77 × 10 −2 , and it can be seen that the improved physics-informed neural networks with small sample learning can accurately identify the moving boundaries despite the presence of a large amount of noisy data. Figure 11 shows the drop plots of the loss function for the three cases when the noise level κ is 10%, and it can be seen that the loss function image became relatively stable with the addition of the small sample loss function, which also validated the relevant narratives in Sections 3.3.2 and 3.3.3. Figure 11. The two-dimensional inverse Stefan problem II with a 10% noise level of the MSE, log-cosh and log-cosh loss function with the addition of small sample loss decreases the plot.

Irregular Area Stefan Problem
To make our proposed algorithmic framework more convincing, the main work of this subsection is to consider numerical solutions and the free boundary of the two-dimensional Stefan problem on an L-shaped complex computational domain that differs from the traditional [a, b] × [c, d] rectangular domain, which is more capable of testing the performance of our algorithmic framework. It is worth mentioning that the equations and Stefan conditions used in this section are the same as in Section 4.1, except that the boundary information is different due to the complex computational domain. Figure 12 represents the plan view of the selected training points on the L-shaped complex computational domain. Unlike the above example, we randomly selected the training points on the boundary, and after the complex computational domain was triangulated and dissected, the internal points were selected at the nodes instead of selecting random samples as internal information. We used the same hyperparameters in the numerical example above and constructed the same loss function in (15), minimizing the loss function by the Adam optimization algorithm to obtain the optimal solution and the free boundary, as shown in Figure 13, where we can see that our proposed method can also achieve good accuracy for complex computational domains. The relative L 2 error between the predicted solutions u(x, y, t) and α(y, t) can reach 5.74 × 10 −3 and 1.56 × 10 −2 . Snapshots of moments for t = 0.2 s, t = 0.4 s, and t = 0.8 s are given. It can be seen that the model predicts the temperature solution and moving boundary, in good agreement with the exact solution.

Irregular Free Boundary Stefan Problem
In this subsection, consider an irregular free boundary problem, the initial shape of which is described by a unit circle of x 2 + y 2 = 1 with equations and boundary conditions as described in Section 4.1, t ∈ [0, 1], and consider the following analytic solution: u(x, y, t) = e t (t + 1)x 2 + (5t + 1)y 2 − t 2 − 1 (32) α(x, y, t) = (t + 1)x 2 + (5t + 1)y 2 − t 2 − 1 Similar to the description in the previous section, we randomly selected training points on the boundary, and after the complex computational domain was triangulated and dissected, internal points were selected at the nodes. The distribution of training points is shown in the Figure 14.
Next, we also used the same loss function construction as in (15) to predict u(x, y, t) and α(x, y, t). We give snapshots of the predictions when t = 0.2 s, t = 0.4 s, and t = 0.8 s in Figure 15. From that, we can see that our proposed neural network framework can also have a good prediction for general irregular moving boundaries; meanwhile, the relative L 2 error between the predicted solutions u(x, y, t) and α(x, y, t) can reach 3.51 × 10 −2 and 2.47 × 10 −2 .

Conclusions
In this paper, by improving the neural network structure of physics-informed neural networks (PINNs) and combining the idea of small sample learning, a generalized neural network framework for solving the Stefan problems was proposed, which can be directly applied to various types of Stefan problems with only small changes. After several numerical experiments, it was proven that the improved deep neural networks based on small sample learning improved the computational accuracy by 2-3-times compared with the original neural network, and the model generalization ability was significantly improved. Meanwhile, this paper demonstrated that the proposed method had a good prediction effect and accuracy through the examples of the irregular region and irregular free boundary. The study of Stefan problems with sharp, irregular geometries and topological variations, as well as three-dimensional or even higher-dimensional Stefan problems is the focus of the future work.