An Artiﬁcial Neural Network Based Solution Scheme for Periodic Computational Homogenization of Electrostatic Problems

: The present work addresses a solution algorithm for homogenization problems based on an artiﬁcial neural network (ANN) discretization. The core idea is the construction of trial functions through ANNs that fulﬁll a priori the periodic boundary conditions of the microscopic problem. A global potential serves as an objective function, which by construction of the trial function can be optimized without constraints. The aim of the new approach is to reduce the number of unknowns as ANNs are able to ﬁt complicated functions with a relatively small number of internal parameters. We investigate the viability of the scheme on the basis of one, two-and three-dimensional microstructure problems. Further, global and piecewise-deﬁned approaches for constructing the trial function are discussed and compared to ﬁnite element (FE) and fast Fourier transform (FFT) based simulations.


Introduction
Artificial neural networks (ANNs) have attracted a lot of attention in the last few years due to their excellent universal approximation properties.Originally developed to model nervous activity in living brains [1,2], they nowadays grow popular in data-driven approaches.Tasks such as image and speech recognition [3,4] or the prediction of users' behavior on social and commercial websites are characterized by a large amount of accessible data compared to a difficult analytic mathematical description.The use of ANNs or other machine learning algorithms such as anomaly detection [5] and support vector machines [6] is suited for such problems as it enables the fitting of even highly complex data with high accuracy.Recent trends in machine learning concern the physical constraining of data driven methods for even higher convergence rate and accuracy, as done in [7].
Due to the aforementioned advantages and improvements, machine learning algorithms gained entry into the field of continuum mechanics and material modeling as well.Successful implementations were performed for the prediction of material response based on the fitting of experimental data through ANNs [8][9][10].Another interesting application is the reduction of microstructure data of a given material through pattern recognition in order to reduce computational demands (see, e.g., [11,12]).
In the present work, we employ ANNs to seek the solution of homogenization problems.Homogenization aims for the prediction of the macroscopic response of materials that have microstructures described on length scales far lower than the macroscopic dimension.In terms of analytical homogenization, Voigt [13] and Reuss [14] were the first to provide bounds of effective properties [15].Hashin and Shtrikman [16] and Willis [17] improved the theory in terms of tighter bounds.Further estimates were developed using the self-consistent method [18,19] and the Mori-Tanaka method [20] afterwards.
In the case of a rather fine microstructures with complex topology and non-linear material behavior, those bounds are only able to make rough predictions on the effective properties.To describe microscopic fields and effective properties in a more detailed and accurate fashion, several computational approaches have been developed in the last decades.Two of the most commonly used discretization techniques are finite element (FE) methods [21,22] and fast Fourier transform (FFT) based solvers [23,24].However, for materials with a microstructure that need fine discretization, the memory cost and solution time of the solvers increases vastly, making multiscale simulations uneconomical.Promising approaches to mitigate these problems are model order reduction methods (see, e.g., [25,26]).
In the present work, a memory efficient solution scheme based on the discretization through ANNs is presented.We therefore follow the idea of Lagaris et al. [27], who introduced a method for solving differential equations through ANN-based trial functions.The functions are constructed in a way that they a priori fulfill the given boundary conditions and the squared error residual of the differential equation is used as an objective that needs to be optimized with respect to the ANNs' weights.The construction of the trial function might be a difficult task for complicated boundary value problems.However, in conventional homogenization problems, we usually deal with rather simple boundary geometries.In the present work, we consider square representative volume elements (RVE) under periodic boundary conditions, as described in Section 2. In Section 3.1, the concept of the ANN-based trial function according to Lagaris et al. [27] is explained.In contrast to Lagaris et al. [27], the optimization objective in our problem is not the squared error residual of a differential equation but emerges from a global energy potential.Sections 3.2 and 3.3 give the ANNs' structure used in the present work as well as the derivatives necessary to optimize the objective function.Finally, in Section 4, the robustness of the presented method is validated for one-, two-and three-dimensional problems and is compared to FE-and FFT-based computations.Further, we compare a global with a piecewise-defined approach for constructing the trial function.In the global approach, the solution is represented by only one global function using several ANNs and the topology of the microstructure must be learned by the neural networks itself.On the other hand, in a piecewise-defined approach, the solution is represented by many neural networks that are piecewise defined on different sub-domains of the RVE.A conclusion is provided in Section 5.

Homogenization Framework
In the present work, we consider first-order homogenization of electrostatic problems.The fundamental laws of electrostatics and the corresponding variational formulation are given.In terms of the homogenization framework, we assume separation of length scales between macro-and microscale in the sense that the length scale at which the material properties vary quickly, namely the microscale, is much smaller than the length scale at the same order of the body's dimensions.A connection for the micro-and macrofields is given by the celebrated Hill-Mandel condition [28], which allows for the derivation of consistent boundary conditions for the microscopic boundary value problem.

Energy Formulation for Electrostatic Problems
We now recall the fundamental equations of electrostatics in the presence of matter [29].The focus lies on a variational formulation as the energetic potential is later needed in the optimization principle.Assuming that there are neither free currents nor charges, the fundamental physics of electric fields in a body B are governed by the reduced Maxwell equations curl E = 0 and div D = 0 in B, where E denotes the electric field and D the electric displacement.In vacuum, the electric field and displacement are connected through the vacuum permittivity κ 0 ≈ 8.854 • 10 −12 As Vm as D = κ 0 E. In the presence of matter, the permittivity κ = κ 0 • κ r must be adapted accordingly.To solve Equation (1), we choose the electric field as our primary variable and construct it as the negative gradient of some scalar electric potential φ according to and thus Equation (1) 1 is automatically fulfilled.Next, we want to solve Equation (1) 2 under some boundary conditions.We therefore introduce an energy potential where Ψ(E) is a constitutive energy density, ∂B q and ∂B φ denote boundaries along with electric charges q and electric potential φ * as prescribed.From the latter potential, it can be shown that, for equilibrium, i.e. δΠ = 0, Equation (1) 2 is solved in a weak sense for all δφ.Here, N denotes the unit normal vector pointing outwards of ∂B.By choosing Ψ = −1/2κ E • E, the static Maxwell equations are recovered.

Microscopic Boundary Value Problem
In the present work, we consider homogenization problems governed by the existence of so-called representative volume elements (RVEs).They are chosen in a way that they are statistically representative for the overall microstructure of the material.Fields emerging on the microscale are driven by macroscopic fields, which are assumed to be constant in the RVE due to the separation of length scales.The separation of length scales leads to a degeneration of the RVEs to points on the macroscale.Scale transition rules can be derived from the Hill-Mandel condition [28] by postulating energy conservation between the microscopic RVE and a macroscopic observation in the form where the macroscopic energy potential Π is obtained at equilibrium of the microscopic energy potential Π(φ).According to Equation (3), the internal energy density appearing in the potential is now a function of the electric field.In line with the assumption of first-order homogenization and separation of length scales, the electric field vector is decomposed into a macroscopic contribution which is constant on the microscale, and the gradient of the fluctuative scalar electric potential φ that acts as primary variable.The system is closed by applying appropriate boundary conditions on the RVE.There are several boundary conditions that fulfill the Hill-Mandel condition (5).In the present work, we focus on periodic boundary conditions of the form where x ± indicate opposite coordinates at the boundary of the RVE [30][31][32].Note that we only need to prescribe one of the two boundary conditions given in the latter equation as the other one will emerge naturally in equilibrium.

Artificial Neural Network Based Solution Scheme
In this section, a solution procedure based on artificial neural networks (ANNs) for finding the equilibrium state of the potential in Equation ( 3) is presented.Core idea is the construction of trial functions, which consist of ANNs and multiplicative factors fulfilling the given set of boundary conditions in Equation (8).We then recapitulate the two main ANN structures used in the present work, namely the single layer perceptron net and the multilayer perceptron.Additionally, the derivatives of those nets with respect to its inputs as well as with respect to its weights are given as they are needed when using gradient descent methods.

Optimization Principle
Consider the previously introduced RVE occupying the space B. Our goal is to find the microscopic equilibrium state of a given global potential for this body.Under prescribed periodic Dirichlet boundary conditions in Equation (8) 1 , the potential in Equation (3) takes the form Having the physical problem covered, the question arises how to approximate the solution fields.While finite element approaches employ local shape functions and Fourier transform based methods use global trigonometric basis functions, in the present work, we want to investigate an approximation method based on artificial neural networks.Following the idea of Lagaris et al. [27], we construct a trial function φ t using arbitrary artificial neural networks N i (x, p i ) as where A i (x) are functions ensuring that the boundary conditions are fulfilled a priori.As a generic one-dimensional example, one could think of a scalar electric potential that should fulfill the boundary conditions φ(0) = φ(1) = 1.In this case, we would have A 0 = 1 and A 1 = x(1 − x), yielding the trial function according to Equation (10) . The corresponding electric field E t (x, p) in line with Equation ( 6) can be calculated analytically according to the neural network derivatives given in Sections 3.2 and 3.3.Using the gradient field along with Equation ( 9) gives the global potential in terms of the neural network's parameters as follows Finally, the objective function in our machine learning problem is obtained from the Hill-Mandel condition in Equation (5) in combination with the global potential in Equation (11) approximated by neural networks.It appears as where the optimization is carried out with respect to the neural network's parameters.By having the periodic boundary conditions fulfilled by construction, the optimization can be carried out without any constraints on the parameters p.

Single Layer Perceptron (SLP)
The single layer perceptron (SLP) is one of the most basic versions of artificial neural networks [33].It is a reduced version of the general structure depicted in Figure 1.An SLP only consists of one hidden layer that connects the input and the output.Assuming an input vector x of dimension d, the response N of an SLP is calculated as where v i , w ij , u i and b are the weights and biases of the hidden unit and the output bias, respectively, and H is the overall number of neurons in the hidden layer.Those weights and biases are assembled in the neural network parameter vector p.Here, σ denotes an activation function of a neuron.The activation functions may be chosen problem-dependent and can have a large impact on the training behavior of the artificial neural network.Figure 2 shows the three activation functions used in the present work, namely the logistic sigmoid, the hyperbolic tangent and the softplus function.Despite its popularity in machine learning tasks, we are not using the rectifier linear unit (ReLu) activation function in the present work.First tests using the ReLu function resulted in poor convergence rates.We suspect that this stems from errors in the numerical integration when using the ReLu function and its discontinuous derivative.The derivatives of the SLP net with respect to its input then appear as where σ (z i ) denotes the derivative of the sigmoid function with respect to its argument.The spatial derivative can be perceived as an SLP with modified weights and activation function but it is now a gradient field.To use efficient gradient based solvers when optimizing the weights of the neural network, it is convenient to have the explicit derivatives of the ANNs with respect to the weights.These can be obtained as Note that, in the derivatives above, the indices are not treated by Einstein's summation convention.Indices appearing twice are rather multiplied pointwise in MATLAB [34] convention.

Multilayer Perceptron (MLP)
The multilayer perceptron (MLP) works similarly to the single layer perceptron.However, it is constructed by a higher number L of hidden layers.It can be shown that this deep structure enables a more general approximation property of the neural network and might lead to better training behavior [35].In the present work, we focus on MLPs with only two hidden layers.The output is then computed as where we have now additional weights θ ki and biases c k associated with the second hidden layer.The spatial derivative of the MLP appears as The derivatives of the MLP with respect to the weights are computed as

Numerical Examples
In this section, we test the robustness and reliability of the proposed method on a set of one-, twoand three-dimensional problems.The influence of global versus piecewise-defined constructions of the trial functions as well as the impact of the neuron count on the simulation results are explored.The one-dimensional example is implemented in MATLAB [34] while the two-and three-dimensional examples are carried out by a Fortran 77 code that utilizes the L-BFGS-B optimization algorithms [36,37].For the sake of simplicity, all the following simulations are performed with normalized units.

One-Dimensional Example
We first consider a one-dimensional problem to demonstrate the features of the proposed method.The RVE is a simple two-phase laminate of unit length l = 1 and unit cross section, which is loaded with a macroscopic electric field E = 0.01.The global potential takes the form where κ 1 = 1 for x < 0.5 and κ 2 = 2 for x > 0.5.Having the decomposition in Equation ( 6) along with the boundary conditions φ(0) = 0 and φ(1) = 0, the analytical solution for the electric field reads and for the electric potential To find the electric scalar potential that optimizes the energy potential in Equation ( 19), we construct a global trial function according to Equation (10) that automatically fulfills the boundary conditions given above as where N is a neural network that takes x as an input and has the weights and biases p.The derivatives in this case can be computed explicitly as The derivatives then allow us to compute the global potential in terms of the neural network's weights and biases as Finally, we need a numerical integration scheme to evaluate the integral.In the present work, we use quadrature points in terms of equidistant grid points x k with distance ∆x on the interval of [0, 1] as {∆x/2, 3∆x/2, . . ., 1 − ∆x/2}, yielding the discrete objective The maximum of this objective function can be found by means of the gradient descent method.The gradients of the objective with respect to the weights p that are needed for such an iterative solver can be computed using Equation (15).We then have everything at hand to carry out the first numerical example.As for the ANN architecture, we use an SLP with H = 10 neurons in the hidden layer.As for the activation function, the logistic sigmoid function σ(z) = 1/(1 + e −z ) is chosen, and the weights and biases p are randomly initalized with a uniform distribution between 0 and 1. Figure 3 shows the result of the numerical experiment (green) compared to the analytical results (black).One can see that the numerical scalar electric potential φ t is close to the analytical solution.However, having a look at the gradients in the form of E reveals the occurrence of oscillations around the discontinuity.The MATLAB [34] code that generates these results can be found in Appendix A. Note that the tolerances for the step size and the optimality as well as the maximum number of function evaluations and iterations is set different from the MATLAB [34] default values to obtain reasonable results.One might decrease the oscillations by having even higher iteration numbers.However, the jump in the solution at the vicinity of the material jump cannot be captured by the global smooth neural network function.The more neurons we use, the better accuracy we obtain, which leads to a trade-off between accuracy and speed.analytical solution for a global trial function approach using an SLPs with 10 neurons and a random initialization of p (0) ∼ U (0, 1), where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent.

Piecewise-defined Neural Net Approach on Equidistant Grid
To overcome the oscillations observed in the global approach, we next construct a trial function that is piecewise defined for the RVE's different material regions Its derivatives can be computed according to The global potential in terms of the neural network's weights and biases then appears as In analogy to the global approach, numerical integration is performed using equidistant grid points x k with distance ∆x as quadrature points in the interval [0, 1] to obtain the discrete objective The gradient of the objective function with respect to the parameters p can again be obtained through the derivatives of Equation (15).However, the output and training behavior of artificial neural networks is dependent on the initialization of the weights and biases p.In a first run, we set the number of neurons in the two SLPs' hidden layers to 10 and initialize the weights randomly between 0 and 1.As for the activation function, we choose the logistic sigmoid function σ(z) = 1/(1 + e −z ).
Figure 4 shows the numerical results in green compared to the analytical results in black.There is a distinct deviation between them.Having a look at the output layer of the SLP in Equation ( 13) and its derivative in Equation ( 14), one can see that the initial output for 10 neurons for an initialization of all weights between 0 and 1 is far from the exact solution.This difference becomes even larger for higher number of neurons.In contrast to the global approach, we use the default values of the unconstrained MATLAB [34] minimizer and, apparently, there are too few iterations.
Next, we want to have fewer iterations of the solver within the default values by using an adaptive way of initializing the weights.The key idea is to initialize the weights in a way that the net and its derivative output values are roughly in the range of the values we would expect with respect to the given macroscopic load.We therefore use a simple modification of the weight initialization given as where U (0, 1) is a vector of random numbers uniformly distributed along 0 and 1. Please note that we use boldface calligraphic U to indicate the vector notation instead of univariate distribution.The results of the computation using the adaptive initialization method can be seen in Figure 5.The results are closer to the analytical solution and are independent of the number of hidden neurons used, making the overall method much more reliable.The MATLAB [34] code used for generating Figure 5 can be found in Appendix B. , where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent., where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent.

Two-Dimensional Example
Next, we consider a two-dimensional microstructure with a circular inclusion, as shown in Figure 6.The radius of the inclusion is r 0 = 0.178l, corresponding to 10% volume fraction.The global energy potential per unit out-of-plane thickness is given as the integral of the internal energy density over the RVE's domain In this example, the material parameter is set to κ = 1 in the matrix and κ = 10 in the inclusion.The square RVE of unit length l = 1 is loaded with the constant macroscopic field E 1 = 1 and E 2 = 0 under the periodic Dirichlet boundary conditions in Equation (8) 1 .For reference, a simulation using the finite element method is performed for this optimization problem.The mesh is discretized by linear quadrilateral elements with four Gauss points.The optimized global potential is calculated to |Π| FEM = 0.588652.Figure 6 shows the contour plot of the microscopic field E 1 .The finite element results serve as a reference for the neural network based approaches in the following examples.

Global Neural Net Approach on Equidistant Grid
First, we want to build a global trial function, which is covering the whole RVE.As we want to implement periodic boundary conditions, a set of three neural networks is used to construct the trial function φ t : N x 1 (x 1 ) acting in the x 1 -direction, N x 2 (x 2 ) acting in the x 2 -direction and N 1 (x) acting in both directions.The trial function then takes the form Figure 7 shows a visualization of the functions ensuring the boundary conditions.Here, we use SLPs for the boundary networks and a two-layer MLP for the two-dimensional neural network.As for the activation function for the neurons, the hyperbolic tangent σ(z) = tanh(z) is chosen.The negative gradient of the trial function can then be computed to obtain the trial field where we drop dependencies on x and p in our notation.According to Equation (32), we arrive at the global potential as an objective The spatial gradient of the trial function ∇ φ t as well as gradients of the global potential ∂Π(p)/∂p used in optimization algorithms can be obtained analytically through the derivatives given in Sections 3.2 and 3.3.We use equidistant grid points for establishing a regular mesh of elements and employ nine Gauss points per element as the quadrature points for numerical integration.The objective is optimized without any constraints on the parameters p, as we satisfy the boundary conditions a priori by the construction of φ t .
Figure 8 shows the contour plot of E t1 of the neural network after 20,000 iterations.One can see that the neural network in Case (a) localizes in an unphysical state.This could be a problem related to overfitting.Case (b), with the same amount of integration points but lower neuron count, shows a qualitatively better result.The overall optimization seems to become more reliable as the integration is performed more accurately, as seen in Case (c).Quantitatively, the global potentials are: (a) |Π|(p) = 0.213522; (b) |Π|(p) = 0.590266; and (c) |Π|(p) = 0.589887 compared to the FEM potential of |Π| FEM (E) = 0.588652, taken from the simulation that creates Figure 6.In two dimensions and with complicated geometries at hand, it can be difficult to estimate the magnitude of the solution fields appearing a priori.A rather simple approach for initializing the weights as described in the one-dimensional case did not seem to significantly improve the method.Here, we test a second approach of initializing the weights according to a normal distribution where we set the mean µ = 0 and the variance σ 2 = 1. Figure 8 shows the contour plot of E t1 of the neural network after 20,000 iterations for the normal distribution.Case

Piecewise-defined Neural Net Approach
To further improve the training and approximation properties of the trial functions, we next construct it in a way that it captures the discontinuity of material properties in the microstructure a priori.This is possible due to the simple topology of the microstructure at hand.However, for more complicated microstructures, defining explicit expressions for the trial function might be difficult.We now need a set of four ANNs for the construction of φ t : N x 1 (x 1 ) acting in the x 1 -direction, N x 2 (x 2 ) acting in the x 2 -direction, N 1 (x) acting in the matrix and N 2 (x) acting in the inclusion.The global trial function is defined piecewise in two sub-domains as follows One can see that the above equations automatically fulfill periodicity at the boundaries as well as the transition condition at the phase interface of the circular inclusion.Here, we implement the trial function in Cartesian coordinates for the matrix and in polar coordinates for the inclusion.Thus, the neural network of the inclusion takes the coordinate vector r = (r, ϕ) in terms of the radius r and the angle ϕ as its input.Having the origin of our coordinate system in the bottom left corner, the transformation used here takes the form x 1 = 0.5 + r cosϕ and x 2 = 0.5 + r sinϕ.(39) With the coordinate transformation at hand, one can calculate the gradient in the matrix and inclusion as where E is the applied macroscopic electric field.For a more detailed derivation, see Appendix C. Finally, the potential to optimize in this example takes the form As a numerical integration scheme, we use a simple one-point quadrature rule, where we have an equidistant grid of 3240 integration points in the matrix and 3600 integration points in the inclusion.The MLP N 1 (x, p 1 ) has two layers with five neurons in each, the SLP N 2 (x, p 2 ) has one layer with four neurons and the boundary SLPs N x 1 (x 1 , p x 1 ) and N x 2 (x 2 , p x 2 ) have seven neurons in the hidden layer.This sums up to a total of 116 degrees of freedom p.In the present example, we use the hyperbolic tangent σ(z) = tanh(z) as the activation function of the neurons.The initial weights are randomly initialized with uniform distribution in the range of −1 to 1.As in the previous examples, the applied macroscopic load in Equation ( 40) is E 1 = 1.0 and E 2 = 0.0.
Figure 9 shows the result for E t1 after 10,000 iterations for the local construction of the trial function.Compared with the previous simulations using only one global trial function, the fields have fewer oscillations.Additionally, the maximum energy after 10,000 iterations is |Π|(p) = 0.588414, being lower than in the previous simulations and lower than the maximum energy computed with finite elements.Interestingly, in the first iterations, the learning rates are higher for the global schemes, whether the weights are initialized according to a normal distribution or a uniform distribution (see Figure 9).

Three-Dimensional Example
In the present example, we apply the method to a three-dimensional cubic RVE of unit length l = 1 with a spherical inclusion of radius r 0 = 0.178l.We first construct a global trial function, for which we need a set of seven neural networks: N x 1 (x 1 ) acting in the x 1 -direction, N x 2 (x 2 ) acting in the x 2 -direction, N x 3 (x 3 ) acting in the x 3 -direction, N x 12 (x 1 , x 2 ) acting in the x 1 x 2 -plane, N x 13 (x 1 , x 3 ) acting in the x 1 x 3 -plane, N x 23 (x 2 , x 3 ) acting in the x 1 x 2 -plane and N 1 (x) acting in the RVE's volume (see Figure 10).The trial function for the matrix then appears as where the functions A i take the form By construction, the trial function fulfills the periodic boundary conditions.The negative gradient of the trial function can then be computed analytically according to Equation (40) (see also Appendix D for a more detailed derivation).With the gradient at hand, we are able optimize the global potential in Equation ( 32) with respect to the weights of the ANNs.As seen in the previous sections, the use of one global trial function might lead to oscillations in the solution field.In line with Section 4.2.2, we additionally construct a piecewise-defined trial function for comparison where we add an eighth neural network for the inclusion and the function The transformation between the spherical coordinates r = (r, ϕ, θ) and the Cartesian coordinates x = (x 1 , x 2 , x 3 ) is given as The gradient in Equation ( 40) can then be computed accordingly in Cartesian coordinates for the matrix and in spherical coordinates for the inclusion, as derived in Appendix D, allowing us to optimize the global potential For our numerical experiment, the RVE depicted in Figure 10 is loaded with a macroscopic field of E 1 = 1.0,E 2 = 0.0 and E 3 = 0.0.The material parameters are chosen as κ 1 = 1 and κ 2 = 10.For the global approach, the mesh consists of 43 × 43 × 43 equidistant integration points.The ANNs acting on the edges and surface boundaries are SLPs, having four neurons along each edge ANN and five neurons along each surface ANN.The ANN acting in the volume is a two-layer perceptron with eight neurons in each layer.This trial function totals 256 parameters to optimize in the ANNs.As for the activation function, we choose the softplus function, as it provides the best results in our examples.As for the piecewise-defined trial function in Equation ( 44), the ANNs are constructed in the same way.The additional ANN acting in the inclusion is an SLP with eight neurons and also has the softplus activation function.The mesh used in this case consists of 32,768 integration points in the matrix and 16,384 integration points in the inclusion.Additionally, an FFT-based simulation [23] using a conjugate gradient solver [24,[39][40][41][42] is carried out for comparison, where the microstructure is discretized on a 43 × 43 × 43 grid.
Figure 11 shows the contour plot of E t1 for the approach using the global and the piecewise-defined trial function after 20,000 iterations.In comparison to the FFT-based solution, both simulations produce qualitatively good results.The global approximation scheme captures the expected jump of E t1 across the phase interface a little less distinctly compared with the piecewise-defined one.However, the piecewise-defined approximation is more difficult to construct and will be quite challenging to implement in the case of complicated microstructure geometries.Quantitatively, the optimized global potentials after 20,000 iterations are quite close to each other, with |Π|(p) = 0.529926 for the global approach and |Π| * (p) = 0.529692 for the piecewise-defined approach, compared to a optimum energy |Π| FFT = 0.527590 obtained by the FFT-based simulation.

Conclusion
We presented a solution scheme to periodic boundary value problems in homogenization based on the training of artificial neural networks.They were employed in tandem with the multiplicative factors in a way that the resulted trial function fulfilled the boundary conditions a priori and thus we arrived at the unconstrained optimization problem.The numerical examples showed that physically reasonable results can be obtained with rather low amounts of neurons, which allows for low memory demand.A construction of trial functions by defining them piecewise in separate sub-domains led to lower oscillations and a generally more stable training behavior compared with a global approach but was geometrically more challenging to construct.The scheme carried over to three dimensions quite well.We assume this to stem from the ratio of neurons compared with the number of integration points: In the considered example of a cube-shaped matrix with spherical inclusion, the solution could be approximated using a quite low neuron count while the number of integration points grew a lot.For future progress, the training speed of the neural network needs to be improved.More sophisticated ANN structures such as deep nets or recurrent nets might further improve the approximation and training behavior of the method, while methods such as dropout or regularization might assist to avoid problems of overfitting.

% Number o f hidden neurons
ensure the satisfaction of the boundary conditions.As the matrix material (r ≥ r 0 ) is described in Cartesian coordinates, the gradient can be straightforward computed as where the derivatives of the factors A i take the form

(A10)
As for the inclusion, the transformation between the spherical coordinates r = (r, ϕ, θ) and the Cartesian coordinates x = (x 1 , x 2 , x 3 ) is given as x 1 = 0.5 + r sinϕ cosθ, x 2 = 0.5 + r sinϕ sinθ and x 3 = r cosϕ. (A11) One could now either directly substitute the latter transformation into the trial function in Equation (A7) and take the derivatives with respect to r.Alternatively, the Jacobian for the matrix can be computed as which allows for the computation of the gradient of φ * t in spherical coordinates through where one has to appropriately add the constant macroscopic loads to the fluctuations in Equations (A9) and (A13) in Cartesian and polar coordinates.

Figure 1 .
Figure 1.General structure of a multilayer perceptron with input x, L hidden layers and output N.Each neuron evaluates its inputs through predefined activation functions.

Figure 2 .
Figure 2. Different types of popular activation functions: logistic sigmoid, hyperbolic tangent and softplus function.

Figure 3 .
Figure3.Numerical vs. analytical solution for a global trial function approach using an SLPs with 10 neurons and a random initialization of p (0) ∼ U (0, 1), where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent.

Figure 4 .
Figure 4. Numerical vs. analytical solution for 2 SLPs with 10 neurons each and a random initialization of p (0) ∼ U (0, 1), where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent.

Figure 5 .
Figure 5. Numerical vs. analytical solution for 2 SLPs with 10 neurons each and a random initialization of p (0), * ∼ E H * U (0, 1), where U (0, 1) denotes a vector of numbers generated from a uniform distribution between 0 and 1.These parameters are statistically independent.

Figure 6 .
Figure 6.Microstructure of length l with a circular inclusion having the radius r 0 = 0.178l.On the right, a finite element solution for a phase contrast of 10 and a loading of E 1 = 1 and E 2 = 0 is displayed.

Figure 7 .
Figure 7. Visualization of the functions A 1 , A 2 and A 3 ensuring periodicity of the two-dimensional trial function φ t in a square RVE of unit length l = 1.One can see that A 1 covers the volume, A 2 satisfies the periodicity in x 1 -direction and A 3 satisfies the periodicity in x 2 -direction.
(a) now shows a more physical state.The energies are calculated as: (a) |Π|(p) = 0.590054; (b) |Π|(p) = 0.592981; and (c)|Π|(p) = 0.590544.At this point, there is surely some potential left for improving weight initialization.This is subject to other fields of machine learning as well, especially in the context of training speed and vanishing gradient problems[38].

Figure 8 .
Figure 8. Contour plot of E t1 for a set of parameters: (a) 51 × 51 elements, 15 neurons per layer in N 1 and 5 neurons each for N x 1 and N x 2 ; (b) 51 × 51 elements, 10 neurons per layer in N 1 and 5 neurons each for N x 1 and N x 2 ; and (c) 101 × 101 elements, 15 neurons per layer in N 1 and 5 neurons each for N x 1 and N x 2 .

Figure 9 .
Figure 9. (left) Contour plot of E t1 for the piecewise-defined trial function in Equation (38) after 10,000 iterations in the optimization procedure; and (right) optimization of Π(p) vs. iteration count for the piecewise-defined trial function with uniformly distributed weight initialization and the global trial function (33) with uniformly and normally distributed weight initialization.

Figure 10 .
Figure10.A representation of the artificial neural networks used in the construction of the trial function φ t .There are separate ANNs for the edges, surface boundaries and volumes.In the piecewise-defined approach, there is one additional ANN acting in the inclusion.

Figure 11 .
Figure 11.Contour plot of E 1 for a global and a piecewise-defined construction of the trial function as well as an FFT-based simulation for comparison.The results of the ANN simulations are close to the FFT-based simulation.The jump discontinuity is more distinct in the piecewise-defined approach compared with the global approach.

19 % 22 % 24 '
i t i a l i z e weights and b i a s e s 17 wb0 = rand ( 3 * nn + 1 , 1 ) ; 18 Anonymous f u n c t i o n handle 20 f = @(wb) neuralapprox (wb, KK, E0 , x , dx , nn ) ; 21 C a l l u n c o n s t r a i n e d minimizer 23 o p t s = optimoptions ( @fminunc , ' Algorithm ' , ' quasi−newton ' , . . .S t e p T o l e r a n c e ' , 1 e −12 , ' O p t i m a l i t y T o l e r a n c e ' , 1 e − 1 2 , . . .

25 '
e c i f y O b j e c t i v e G r a d i e n t ' , t ru e , ' CheckGradients ' , t ru e , . . .