A Preliminary Study on the Resolution of Electro-Thermal Multi-Physics Coupling Problem Using Physics-Informed Neural Network (PINN)

Abstract: The problem of electro-thermal coupling is widely present in integrated circuits (ICs). The accuracy and efficiency of traditional solution methods, such as the finite element method (FEM), are tightly related to the quality and density of the mesh. Recently, the physics-informed neural network (PINN) was proposed as a method for solving differential equations. This method is mesh-free and generalizes the process of solving PDEs regardless of the equations' structure. Therefore, an experiment is conducted to explore the feasibility of PINN in solving electro-thermal coupling problems, which involve the electrokinetic field and the steady-state thermal field. We utilize two neural networks trained sequentially to approximate the electric field and the thermal field, respectively. The experimental results show that PINN provides good accuracy in solving electro-thermal coupling problems.


Introduction
The resolution of multi-physics problems comes down to the computation of partial differential equation (PDE) solutions. General numerical methods for handling PDEs include the finite element method (FEM), the finite difference method (FDM), the finite volume method (FVM), etc. All these methods require a discrete representation of the domain and utilize interpolation functions to obtain the solution outside the discrete set. The discretization represents the domain well for low-dimensional problems, but not for high-dimensional ones, as the number of elements increases exponentially with the dimensionality. In addition, these methods only solve the PDEs at discrete points and thus require interpolation of the values or slopes at other points; this makes the solution of the state variables at the interpolated points less accurate [1], particularly the derivatives of the state variables.
Deep learning has recently achieved great success in the fields of science and business [2][3][4]. Owing to these advances, many scientists have been working to embrace deep learning in the computation of physical problems. Most of these studies are data driven [5,6]: they learn certain corresponding relationships between the input data and the output data and then output corresponding predictions for new input data. The correspondence is usually unknown and not obvious; hence, it requires a large amount of training data to learn, which may be arduous to prepare. Furthermore, the training process usually behaves like a "black box," and the physical meaning of the hidden layers remains unknown, making the network training time consuming. To alleviate these drawbacks, loss functions that embed the governing physical equations were introduced, leading to the physics-informed neural network (PINN) framework.

The rest of the paper is organized as follows. Section 2 recalls the idea behind the PINN framework and presents the method we use for applying PINN to electro-thermal analysis. Section 3 utilizes a concrete case to solve the electro-thermal coupling problem; we then compare the accuracy and continuity of the gradient field between the first-order FEM and the PINN. Section 4 discusses the experimental results and provides some prospects for future work. Section 5 concludes this work.

Materials and Methods
In this section, we first introduce the principle of the deep neural network (DNN), followed by an overview of PINN.

DNN
A DNN is a form of multi-layer perceptron, which can be interpreted as a universal function approximator [32]. The DNN consists of L layers plus an input vector called the input layer, where the L layers comprise L − 1 hidden layers and the L-th layer is the output layer. We let N_L : R^{C_i} → R^{C_o} denote a deep neural network of L layers, where C_i and C_o denote the dimensions of the input and output vectors, respectively, and N_k (1 ≤ k ≤ L) denotes the number of neurons in the k-th layer (N_L = C_o). Let W_k ∈ R^{N_k × N_{k−1}} and b_k ∈ R^{N_k} denote the weight matrix and bias vector of the k-th layer, respectively. The activation function of the k-th layer is denoted by σ_k(·), regardless of its type; commonly used activation functions are Sigmoid, ReLU, Tanh, etc. After the k-th hidden layer receives an input vector z_{k−1} (z_0 = x), it multiplies the vector by the weight matrix W_k and then adds the bias b_k. If k is equal to L, the result is output directly; otherwise, an activation function is applied to the result, which is then passed to the next layer. In summary, the feed-forward process of the DNN is given in Equation (1): the first hidden layer takes the input x and outputs z_1; the remaining hidden layers take an input z_{k−1} (2 ≤ k ≤ L − 1) and output z_k; and, since a regression task usually requires no activation in the last layer, the L-th layer takes z_{L−1} and outputs y, as follows.
z_1 = σ_1(W_1 x + b_1),
z_k = σ_k(W_k z_{k−1} + b_k), 2 ≤ k ≤ L − 1, (1)
y = W_L z_{L−1} + b_L.

Figure 1 shows the basic structure of the DNN. In a DNN, the hyperparameters are a set of parameters used to control the learning process, e.g., the number of hidden layers and the number of neurons in each layer.
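As an illustration, the feed-forward recurrence of Equation (1) can be sketched in a few lines of NumPy (a minimal sketch; the layer sizes and parameter values are illustrative, not the networks used in this work):

```python
import numpy as np

def forward(x, weights, biases, act=np.tanh):
    # Hidden layers: z_k = act(W_k z_{k-1} + b_k); the output layer is linear.
    z = x
    for W, b in zip(weights[:-1], biases[:-1]):
        z = act(W @ z + b)
    return weights[-1] @ z + biases[-1]

# A tiny 2 -> 3 -> 1 network with arbitrary parameters.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
biases = [np.zeros(3), np.zeros(1)]
y = forward(np.array([0.1, -0.2]), weights, biases)  # y has shape (1,)
```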

PINN
Machine learning is able to act as a surrogate model, discovering the potential relations between the input and output, classifying various images, and predicting system responses under different conditions [33]. Most methods that are based on machine learning for solving PDEs are data driven and do not consider any physical constraints [27]. Hence, a great amount of data is often required to train the model. Preparing training data in advance is undoubtedly an arduous process. On the contrary, PINN introduces physical constraints and reformulates solving equations as an optimization problem, which reduces the amount of required data, alleviates overfitting issues, and therefore improves the robustness of the trained model [34].
The following describes the formulation of PINN for solving general PDEs. Consider the following PDE with a boundary condition:

L(u(x)) = f(x), x ∈ Ω, (2)
u(x) = g(x), x ∈ Γ, (3)

where u(x) denotes the solution to this PDE, L(·) is a differential operator, f(x) is a forcing function, Ω represents the domain of interest, ∂Ω is a symbol of all boundaries, and Γ ⊂ ∂Ω denotes the specific boundary where the boundary condition is imposed. As mentioned previously, we can approximate the solution of PDEs by DNNs. The essence of PINN is to use the PDE as a loss function for the optimization; as for any NN, training requires an optimization process. Figure 2 graphically introduces the structure of PINN. The input data x is a set of randomly sampled points from the domain and the boundaries. After each evaluation of the loss function, both the loss value and the iteration count are checked to determine whether to enter the next step: ε denotes the cut-off threshold of the loss value, and it denotes the maximal iteration number. Once the loss value is less than the threshold ε, or the iteration count is greater than the maximal number it, the training ends. With the help of PyTorch's automatic differentiation module, we can obtain the output's partial derivatives of any order, which simplifies converting the network output into the governing equation format. We let Θ = {W_1, W_2, . . . , b_1, b_2, . . .} ∈ V be the set of trainable parameters, i.e., all the layers' weights W_k and biases b_k (1 ≤ k ≤ L), where V denotes the parameter space. We also let u(x; Θ) denote the output function of PINN. The loss function J(Θ) can be defined as [16]

J(Θ) = MSE_f + MSE_B, (4)

with

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |L(u(x_f^i; Θ)) − f(x_f^i)|², (5)

MSE_B = (1/N_B) Σ_{i=1}^{N_B} |u(x_B^i; Θ) − u^i|². (6)
Equation (5) denotes the governing equation constraint, and Equation (6) the boundary condition constraint, where N_f and N_B are the numbers of domain samples and boundary samples, respectively; x_f^i denotes the sampling points in the domain, u^i the target outputs, and x_B^i the sampling points on the boundaries. Once the value of J(Θ) converges toward zero, we end up with an approximation of the PDE's solution. By adjusting the training parameters Θ to minimize the loss function, we seek the optimal parameters Θ* that satisfy

Θ* = arg min_{Θ ∈ V} J(Θ). (7)

The optimization of the loss function depends on backpropagation and the optimizer; common optimizers include SGD, Adam, and L-BFGS. Furthermore, to satisfy the boundary conditions, there are currently two methods, namely the "soft" and the "hard" boundary conditions. The "soft" boundary condition method is more commonly used, since it embeds the boundary conditions into the loss function directly without recourse to preprocessing; the "hard" method constructs a particular solution function, usually used to automatically satisfy the Dirichlet boundary conditions, thereby making the optimization more efficient [13]. In addition, it reduces the number of training points required, because we only need to sample points from the inside of the domain and from the boundaries where a "soft" condition is applied, thus reducing the training cost [1,35]. The corresponding ansatz for the solution with a "hard" boundary condition is

û(x; Θ) = G(x) + D(x) u(x; Θ), (8)

where G(x) denotes a smooth extension function which satisfies the Dirichlet boundary condition, D(x) is a smooth distance function that gives the distance from x ∈ Ω to Γ, and û(x; Θ) represents the new output function designed to automatically fulfill the Dirichlet constraints based on the raw output u(x; Θ). Since the Dirichlet boundary can be automatically satisfied, it is no longer considered in the loss function.
Nevertheless, it is still necessary to train for the Neumann boundary and Robin boundary terms by adding the boundary constraints to the loss function.
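For concreteness, the MSE_f term of Equation (5) for a Laplace operator can be assembled with PyTorch's automatic differentiation as follows (a minimal sketch; the small network and the sample points are placeholders, not the trained models of this work):

```python
import torch

def laplace_residual(model, xy):
    # PDE residual of a Laplace problem, nabla^2 u = 0, via autograd.
    # `model` maps (N, 2) points to (N, 1) outputs.
    xy = xy.clone().requires_grad_(True)
    u = model(xy)
    grad = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]
    u_x, u_y = grad[:, 0], grad[:, 1]
    u_xx = torch.autograd.grad(u_x.sum(), xy, create_graph=True)[0][:, 0]
    u_yy = torch.autograd.grad(u_y.sum(), xy, create_graph=True)[0][:, 1]
    return u_xx + u_yy  # zero for an exact solution

model = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1))
pts = torch.rand(64, 2) - 0.5                        # interior samples
loss_f = (laplace_residual(model, pts) ** 2).mean()  # MSE_f of Equation (5)
loss_f.backward()                                    # gradients for the optimizer
```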

PINNs for the Electro-Thermal Coupling Problem
In this section, we introduce the principle of solving a two-dimensional electro-thermal coupling problem with PINNs. The electrokinetic field problem seeks to find the electric field or current density distribution in a conducting region. The problem can be described by the electric scalar potential in the form of the Laplace equation. Considering u(x, y) as the spatial distribution of the electric potential, σ the electric conductivity, ∂u/∂n the normal derivative of u along the outward direction of the Neumann boundary Γ_N, and B(x, y) the given value of u on the Dirichlet boundary Γ_D, u(x, y) satisfies the following partial differential equation and boundary conditions:

∇ · (σ∇u) = 0 in Ω, (9)
u = B(x, y) on Γ_D, (10)
σ ∂u/∂n = 0 on Γ_N. (11)

The electric current generates Joule heat, which causes the temperature of the conductor to change. The Joule heat Q at each point is

Q = J · E, (12)

where J denotes the electric current density and E the electric field, which is expressed as E = −∇u. The constitutive equation is given by

J = σE. (13)

The thermal field induced by the Joule heat takes the form of steady-state heat conduction. Considering T as the spatial temperature distribution, k as the thermal conductivity, and Q as the Joule heat source, which can be computed by Equations (12) and (13), the governing equation and boundary condition are given by

−∇ · (k∇T) = Q in Ω, (14)
T = T_0 on ∂Ω, (15)

where T_0 is set to 273 K.
To handle the electro-thermal coupling problem, we employ two PINN networks to approximate the two solutions separately and train the networks sequentially. The flow chart is shown in Figure 3: both neural networks take the same sampling points as input, and their outputs are the electric scalar potential and the temperature, denoted by u and T, respectively. The corresponding loss functions of the two problems are listed below. Since we employ the "hard" boundary condition method here, the loss term for the Dirichlet boundary is eliminated. Equations (16) and (17) denote the PDE constraint and the Neumann boundary constraint of the steady electric field problem, respectively:

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |∇ · (σ∇û(x_f^i, y_f^i; Θ))|², (16)

MSE_N = (1/N_N) Σ_{i=1}^{N_N} |σ ∂û/∂n (x_N^i, y_N^i; Θ)|², (17)

where û(x_f^i, y_f^i; Θ) denotes the output of the PINN at the domain sample points, and û(x_N^i, y_N^i; Θ) the output at the Neumann boundary sample points.
Equation (18) is the PDE constraint of the thermal field problem:

MSE_f = (1/N_f) Σ_{i=1}^{N_f} |∇ · (k∇T̂(x_f^i, y_f^i; Θ)) + Q(x_f^i, y_f^i)|², (18)

where T̂(x_f^i, y_f^i; Θ) is the output of the PINN at the temperature domain sample points. No MSE_B term is needed, since the thermal field problem only has Dirichlet boundary conditions.
Firstly, we train the neural network of the steady electric field problem, whose loss function embeds the governing equation and boundary conditions of Equations (9)–(11). When the loss function drops below a designated cut-off threshold, or when the iteration count reaches a specified number, the PINN training ends. Subsequently, we calculate the gradient of u to obtain the distributions of the electric field and the current density, and then compute the Joule heat source from these results. The heat source is passed to the next PINN network to approximate the thermal field of Equations (14) and (15). Note that, in both PINNs, the "hard" condition is applied to the Dirichlet boundary conditions. After the same workflow, we finally obtain the distributions of both the electric field and the thermal field.
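The hand-off between the two networks amounts to evaluating Q = σ|∇u|², which follows from Equations (12) and (13), at the thermal sample points. A sketch, with an untrained placeholder network standing in for the trained electric-field PINN:

```python
import torch

def joule_heat(u_model, xy, sigma=1.0):
    # Q = J . E = sigma * |grad u|^2 (Equations (12)-(13)),
    # evaluated from the electric-field network via autograd.
    xy = xy.clone().requires_grad_(True)
    u = u_model(xy)
    grad_u = torch.autograd.grad(u.sum(), xy)[0]
    return sigma * (grad_u ** 2).sum(dim=1)  # per-point heat source

# Sequential coupling sketch: the heat source from the (trained) electric
# network becomes the fixed forcing term of the thermal residual.
u_model = torch.nn.Sequential(
    torch.nn.Linear(2, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1))
pts = torch.rand(32, 2) - 0.5
Q = joule_heat(u_model, pts).detach()  # detached: constant for the thermal PINN
```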

Electro-Thermal Coupling Problem
In this section, the two deep neural networks whose loss functions are defined in Equation (4) are employed to study the electro-thermal coupling issue. The governing equations and boundary conditions of the electrokinetic problem are defined in Equations (9)–(11), and the counterparts of the thermal problem are defined in Equations (14) and (15). The deep learning framework we choose is PyTorch. The computer configuration is an Intel Core i7-9750H CPU, an Nvidia GTX 1660 Ti 6 GB GPU, and 16 GB of RAM. We use PyTorch's automatic differentiation [36] to establish the governing equations and the Neumann boundary constraints.

A Rectangle Electro-Thermal Coupling Problem
We choose a model with a square geometry of [−0.5, 0.5] × [−0.5, 0.5] m². As a proof of concept, we take the electric conductivity and the thermal conductivity as isotropic scalars, and we set σ = 1 S/m and k = 1 W/(m·K). The sampling results are shown in Figure 4. There are 100 sample points in the domain for both the electric and the thermal field problems. Each field has different boundary conditions: for the steady electric field, a voltage of 1 V is applied to the upper border of the model, the lower border is grounded, and the Neumann boundary condition is applied to the lateral lines. Since we use the "hard" boundary method to process the Dirichlet boundary condition, the upper and lower boundaries do not need to be sampled for the electric field problem; as a result, only 40 points on the two lateral boundaries are sampled. For the thermal field, all the boundaries are Dirichlet boundaries, and their conditions are also automatically satisfied by the "hard" boundary method; hence, there is no need to sample on these boundaries. The points in the inner region are sampled by the Latin hypercube sampling method [37], and the boundary data are sampled randomly from a uniform distribution. For our study, the electrokinetic field's governing equation is the Laplace equation. Here, we construct a "hard" boundary model adapted to the geometry and boundary conditions. The smooth extension function G(x, y) and the smooth distance function D(x, y) defined in Equation (8) are specifically defined as follows, respectively:

G(x, y) = y + 0.5, (19)
D(x, y) = y² − 0.5². (20)
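The sampling described above can be reproduced, for example, with SciPy's quasi-Monte Carlo module (an assumed implementation; the paper does not name the library it used for Latin hypercube sampling):

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube samples in the unit square, rescaled to [-0.5, 0.5]^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
interior = qmc.scale(sampler.random(n=100), [-0.5, -0.5], [0.5, 0.5])

# Uniform random samples on the two lateral (Neumann) boundaries x = +/- 0.5.
rng = np.random.default_rng(0)
y_side = rng.uniform(-0.5, 0.5, size=40)
x_side = np.where(rng.random(40) < 0.5, -0.5, 0.5)
boundary = np.column_stack([x_side, y_side])
```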
After the "hard" boundary condition is applied, and since we treat the electric conductivity as an isotropic scalar, the final PDE expressions of the electric field problem based on the raw NN output u(x, y; Θ) are given by

∂²û/∂x² + ∂²û/∂y² = 0, with û(x, y; Θ) = G(x, y) + D(x, y) u(x, y; Θ), (21)
∂û/∂n = 0 on Γ_N. (22)

Since the Dirichlet boundary is compulsorily satisfied, the loss of the steady electric field network model only includes the governing equation and the Neumann boundary constraints.
When the training of the electric field is over, we can obtain a satisfactory electric field solution; the Joule heat is then calculated from the obtained result. Next, the heat source data are transferred to the thermal field to continue the calculation. For the heat conduction equation, we also construct "hard" boundary conditions. Since we set the temperature on the boundaries to 273 K, we have G(x, y) = 273. The distance function is given by D(x, y) = (x² − 0.5²)(y² − 0.5²).
Similarly, the final PDE expression of the thermal field problem based on the raw NN output T(x, y; Θ) is given by

−k(∂²T̂/∂x² + ∂²T̂/∂y²) = Q, with T̂(x, y; Θ) = G(x, y) + D(x, y) T(x, y; Θ). (23)

Two separate PINNs are employed to approximate the steady electric field and the thermal field, respectively. After parameter tuning of the neural networks, we obtain a well-trained model. For both neural networks, we choose 6 hidden layers with 20 neurons in each layer. The activation function we select is Tanh, the optimization algorithm is Adam [38], and the initial learning rate of both networks is lr = 1 × 10⁻². For the training data, we use the full batch size. Moreover, we set the cut-off threshold of the loss function to 1 × 10⁻⁵ and the maximum number of epochs to 20,000; when the neural network training meets either of these two conditions, the training ends. The weights and biases of a neural network usually need to be initialized by some initialization method; here, we choose a widely used one, Xavier initialization [39]. Based on the experimental results, the numbers of epochs needed for the two field problems are 1108 for the steady electric field and 1182 for the heat conduction field. The lowest loss function values and the time spent on each neural network training are displayed in Table 1. We notice that, because of the non-homogeneous distribution of the thermal field, learning the temperature distribution is more difficult than learning the electric field. Our training results are shown in Figure 5. We use the FEM results as reference solutions to evaluate the training accuracy. The FEM results, shown in Figure 6, are obtained with a mesh of 6592 points in a CPU time of 0.87 s. To further compare the difference between our training results and the reference solutions, Figure 7 shows the absolute error distributions between the PINN solutions and the reference solutions. The relative errors are below 4 × 10⁻⁴ for both problems.
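The network configuration described above can be written down as follows (a sketch; we assume the uniform variant of Xavier initialization, which the paper does not specify):

```python
import torch

def make_pinn(width=20, depth=6):
    # 6 hidden layers of 20 Tanh neurons, as stated in the text,
    # Xavier-initialized, trained with Adam at lr = 1e-2.
    layers, in_dim = [], 2
    for _ in range(depth):
        lin = torch.nn.Linear(in_dim, width)
        torch.nn.init.xavier_uniform_(lin.weight)  # Xavier initialization [39]
        torch.nn.init.zeros_(lin.bias)
        layers += [lin, torch.nn.Tanh()]
        in_dim = width
    out = torch.nn.Linear(in_dim, 1)               # linear output layer
    torch.nn.init.xavier_uniform_(out.weight)
    layers.append(out)
    return torch.nn.Sequential(*layers)

model = make_pinn()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
```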
We can conclude that the DNN's training results are in good agreement with those calculated by FEM; however, the large computation time of the DNN compared to that of FEM is an apparent shortcoming that needs to be improved.

The Comparison between FEM and PINN
The trained PINN can be viewed as an analytical approximation of the latent solution; hence, for further exploitation, such as computing the derivatives of the state variables (e.g., gradient manipulation), the PINN can proceed analytically and easily [1]. This advantage makes post-processing operations convenient. In addition, the PINN does not need to be retrained to evaluate the solution and its gradient at new sampling points; hence, we can obtain a smooth derivative distribution. However, FEM solves the differential equations only at discrete points; when utilizing the first-order FEM to compute the state variables' gradients, such as the field strength, the value inside each element is constant and discontinuous at the element interfaces. Therefore, for FEM, the gradients of the solution state variables are less accurate.
To compare the accuracy and the continuity of the gradient field calculated by FEM and by PINN, respectively, we choose an equation that has an analytical solution. The research domain is a square of [0, 1] × [0, 1] m², and the PDE with its boundary condition is given by

−∇²u = 2π² sin(πx) sin(πy) in Ω, (24)
u = 0 on ∂Ω, (25)

whose analytical solution is

u(x, y) = sin(πx) sin(πy). (26)

We first discretize the model, acquire the solutions of the scalar potential on the nodes by first-order FEM, and then calculate the gradient inside every element. The mesh diagram is shown in Figure 8a, where the number of grid nodes is 983. In order to facilitate the comparison and the illustration, we compute the gradient modulus on the line y = 0.2, as illustrated by the horizontal solid line in the figure. Figure 8b shows the gradient modulus comparison between the first-order FEM and the analytical solution. It can be seen that, for the FEM, the gradient modulus distribution of the scalar potential is discontinuous along the line; this is because the gradient obtained by the first-order FEM is constant per element.
Secondly, we utilize PINN to solve the equation. After tuning the deep learning parameters, we choose 1000 sample points, 2 hidden layers, and 60 neurons per hidden layer. The activation function and the optimizer we select are the same as in the previous section, while the learning rate is lr = 5 × 10⁻³. We set the maximum number of epochs to 20,000 and the cut-off threshold of the loss function to 1 × 10⁻³. The training takes 1166 epochs in 78 s, and the lowest loss value is 9.98 × 10⁻⁴. After training the model, we resample points on the y = 0.2 line and calculate their solution, gradients, and the corresponding gradient modulus. Figure 9 illustrates the gradient modulus comparison between the PINN and the analytical solution. It can be seen that the result of PINN is in good agreement with the analytical solution, given by a smooth distribution curve.
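Because the trained network is a global function, the resampling step reduces to one autograd call (sketched below with an untrained placeholder model of the same 2-layer, 60-neuron architecture):

```python
import torch

def gradient_modulus(model, xs, y_line=0.2):
    # Resample points on the line y = y_line and evaluate |grad u|
    # with autograd; no retraining or interpolation is needed.
    pts = torch.stack([xs, torch.full_like(xs, y_line)], dim=1)
    pts = pts.requires_grad_(True)
    u = model(pts)
    g = torch.autograd.grad(u.sum(), pts)[0]  # (du/dx, du/dy) per point
    return g.norm(dim=1)

model = torch.nn.Sequential(
    torch.nn.Linear(2, 60), torch.nn.Tanh(),
    torch.nn.Linear(60, 60), torch.nn.Tanh(),
    torch.nn.Linear(60, 1))
mods = gradient_modulus(model, torch.linspace(0.0, 1.0, 200))
```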
However, FEM solves this problem with a mesh of 983 points with a computational time of merely 0.12 s. Compared with FEM, the large computational time is still a limitation of PINN for now.

Empirical Properties of the PINN
For the problem defined by Equations (24) and (25), which has an analytical solution, we performed a study to quantify the training time and the predictive error for two different variables: the network's size and the number of samples. Some empirical conclusions will be drawn. Specifically, the neural network's size includes the network's width and depth; the width refers to the number of neurons in each layer, and the depth refers to the total number of hidden layers. In the following, we study each of these two variables while the other is fixed at a reasonable value. Hyperparameters that are not studied in this work are set to be the same as in the previous section. For each combination, we set the maximum epoch to 20,000 and the cut-off threshold of the loss function to 1 × 10⁻³. The root mean squared error (RMSE) is used to evaluate the prediction error. Considering u_i as the analytical solution, û_i as the neural network's predictive value, and n as the number of test points, the RMSE is given by

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (û_i − u_i)² ). (27)

The results for the varying network sizes are displayed in Tables 2 and 3. According to the results in Table 2, we observe that, except for the first set (one hidden layer with 5 neurons, which reaches the maximum epoch before reaching the cut-off threshold), all combinations' loss functions converge below the cut-off threshold within the maximum epoch. Furthermore, it is worth mentioning that the prediction errors, i.e., the RMSEs, of all the tested sizes are below 5 × 10⁻⁴, which satisfies our requirement. Moreover, based on Table 3, we can conclude that as the size of the neural network increases, the convergence speed gradually increases at the beginning; however, beyond a certain size, further increases make the improvement in computation time stagnate or even become worse. For the considered example, the best result is obtained with 2 layers of 60 neurons.
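The RMSE metric used above amounts to the following (a trivial sketch):

```python
import numpy as np

def rmse(u_true, u_pred):
    # Root mean squared error between the analytical and predicted values.
    u_true = np.asarray(u_true, dtype=float)
    u_pred = np.asarray(u_pred, dtype=float)
    return float(np.sqrt(np.mean((u_pred - u_true) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # sqrt(1/3) ~ 0.577
```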

Effect of Number of Samples on the Prediction Error and Training Time
In this section, we investigate the influence of the number of samples on the training time and prediction error of the neural network. We test various numbers of samples; with respect to the size of the neural network, we use 2 hidden layers with 60 neurons in each layer. Table 4 displays the training time, the RMSEs of the predictions compared to the analytical solution, and the actual number of epochs for this experiment. We can see that all combinations' loss functions converge below the cut-off threshold within the maximum epoch. The Xavier initialization sets different initial values for the weight matrices of the model, which makes the training results slightly different each time; consequently, the training time corresponding to 1000 points in Table 4 differs slightly from that of the neural network with 2 hidden layers and 60 neurons per layer in Tables 2 and 3.
Finally, we observe that too few sample points, such as 10, lead to the fastest convergence but a high prediction error. Apart from this case, the convergence speed increases as the sampling number goes up at the beginning; however, beyond a certain level, the improvement in convergence speed stagnates or even becomes worse.

Discussions
From these studies, we can summarize the following observations.
• PINN embeds physical constraints into the loss function of the neural network by using automatic differentiation. The imposition of the "hard" boundary makes the approximate solution automatically meet the Dirichlet boundary, which accelerates the convergence and improves the prediction accuracy [1].
• In addition to being mesh-free, PINN also generalizes the construction process of various PDEs. On top of that, classical methods such as FEM can only obtain the solution on discrete points, and further interpolation is required for other points; this makes the solution of the state variables at interpolated points less accurate. For PINN, points that do not appear in the training set require no interpolation scheme to obtain the solutions.
• With the help of automatic differentiation, the derivative of each state variable can be easily calculated, and hence the derivative distribution is smooth. For the first-order FEM, however, the first derivative of the state variable inside an element is constant, which makes the derivative distribution discontinuous.
• Based on the experimental results, the convergence speed of PINN gradually increases as the size of the neural network grows at the beginning; beyond a certain point, however, the improvement in computation time stagnates or even becomes worse. Likewise, when studying the influence of the number of training samples on the convergence speed, except for the case where the sampling points are too few, the convergence speed increases with the number of training samples at the beginning; beyond a certain number, however, the improvement in convergence speed stagnates or even becomes worse.
• According to our experiments, although PINN offers unique advantages for solving PDEs, its computational efficiency is an obvious disadvantage compared to FEM. Therefore, figuring out how to accelerate the training of PINN is an important research topic. Ref. [40] introduced an efficient approach based on an adaptive sampling strategy, which speeds up the computation of the PINN. In [19], parallel calculation of the PINN was successfully implemented, which can easily handle arbitrarily complex regional problems, but the improvement in computing time is still on the way. Huang et al. [41] sped up convergence for high-frequency wavefield solutions by initializing the PINN from a pre-trained model instead of randomly.
• PINN can be conveniently utilized to generate a surrogate model for parametric analysis. In [42], the authors conducted a sensitivity analysis experiment: they first trained the PINN with merely a few values of a specific parameter and then utilized the trained neural network to predict the solution for this parameter within a range. The result is less accurate but still useful under the specific condition, which shows the potential of PINN in tackling this kind of issue; we will work on this subject in our future study.

Conclusions
In this article, PINN is applied to solve an electro-thermal coupling problem. Two networks are trained, one to approximate each field. The coupling process is sequential: the electric field is evaluated first, and the thermal field follows once the heat source computed from the electric field is available. In order to speed up the training and improve the training accuracy, we employ the "hard" boundary method. The numerical results show the feasibility of PINN in tackling this kind of problem. A well-trained PINN provides a global and smooth approximation of the state variable, which is convenient for the evaluation of derivatives. However, compared with FEM, the long computation time is still an obvious disadvantage of PINN and needs to be further addressed. Since the PINN has been successfully used to generate surrogate models, we will work on parametric analysis based on it in the future.