Mesh-Free Surrogate Models for Structural Mechanic FEM Simulation: A Comparative Study of Approaches

The technical world of today fundamentally relies on structural analysis in the form of design and structural mechanic simulations. A traditional and robust simulation method is the physics-based finite element method (FEM) simulation. FEM simulations in structural mechanics are known to be very accurate; however, the higher the desired resolution, the more computational effort is required. Surrogate modeling provides a robust approach to address this drawback. Nonetheless, finding the right surrogate model and its hyperparameters for a specific use case is not a straightforward process. In this paper, we discuss and compare several classes of mesh-free surrogate models based on traditional and emerging machine learning (ML) and deep learning (DL) methods. We show that relatively simple algorithms (such as k-nearest neighbor regression) can be competitive in applications with low geometrical complexity and extrapolation requirements. For tasks exhibiting higher geometric complexity, our results show that recent DL methods at the forefront of the literature (such as physics-informed neural networks) are complicated to train and parameterize and thus require further research before they can be put to practical use. In contrast, we show that already well-researched DL methods, such as the multi-layer perceptron, are superior with respect to interpolation use cases and can be easily trained with available tools. With our work, we thus present a basis for the selection and practical implementation of surrogate models.


Introduction
Assessing the properties of mechanical structures with real physical experiments is expensive, as it costs both time and resources. To reduce these costs of knowledge enrichment in the field of structural analysis, computer simulations of structural mechanics have become crucial. An essential simulation method is the finite element method (FEM), in which the simulation domain is represented by a finite number of connected elements. Space- and time-dependent behavior between connected elements and within the elements themselves is governed by physical equations. Observation of real physical experiments provides the coefficients for these governing equations. Since most geometries and use cases cannot be solved analytically, an approximation of the proposed physical equations is obtained by numerical methods [1]. However, solving complex problems with FEM is time-consuming and computationally expensive. In order to reduce the computational effort, surrogate modeling offers a promising solution [2].
Surrogate models are trained in a supervised manner and are designed to learn the function mapping between inputs and outputs for a given FEM simulation use case. With a sufficient amount of training data for the use case, such a model is able to substitute for the FEM simulation up to a certain accuracy.
There is already a considerable number of related works concerning surrogate modeling of structural mechanics simulations with machine learning (ML) or deep learning (DL) approaches. In the following, we present the works most important for this paper. Artificial neural networks (ANN) are used in the work of Roberts et al. [3] to predict damage development in forged brake discs reinforced with Al-SiC particles, using damage maps. The ANN is a multilayer perceptron (MLP), and training data are obtained from FEM simulations using the commercial DEFORM simulation software. For rapid estimation of forming and cutting forces for given process parameters, Hans Raj et al. [4] investigate a method using MLP models. The researchers focus on two processes: hot upsetting and extrusion. Each process, represented by an MLP, is trained with FEM simulation results from the FORGE2 commercial FEM simulation software. García-Crespo et al. [5] predict the projectile response after impact with steel armor using an MLP; their surrogate model is trained with data from FEM simulations of the use case. Nourbakhsh et al. [6] explore generalizable surrogate models for 3D trusses, using MLP and FEM training data. Chan et al. [7] estimate the performance of hot-forged product designs, using an MLP trained on FEM results obtained with the commercial software DEFORM. D'Addona and Antonelli [8] use single-layer feedforward ANNs instead of FEM as a metamodel in a sequential approximate optimization (SAO) algorithm. In a case study on hot forging of a steel disk, they compare their results with an ANN trained on FEM simulation results and the FEM simulation software QForm3D. Gudur and Dixit [9] predict the velocity field and location of the neutral point in cold flat rolling with an MLP trained on rigid-plastic FEM simulation results. Pellicer-Valero et al. [10] predict the mechanical behavior of different livers with MLPs trained from FEM simulations.
Abueidda et al. [11] estimate the mechanical properties of a two-dimensional checkerboard composite using a convolutional neural network (CNN) trained with FEM results. Regarding mesh-based approaches, Pfaff et al. [12] present a framework to train graph neural networks (GNN) on mesh-based simulations and show the applicability in aerodynamics, structural mechanics, and fabric.
Surrogate models were also obtained using classical, i.e., non-neural ML, approaches. For example, the authors of [3] apply Gaussian process regression (GPR) besides ANN in their approach. Loghin and Ismonov [13] predict the stress intensity factors, using GPR trained with FEM results of a classical bolt-nut assembly. Ming et al. [14] model the electrical discharge machining process with GPR trained from data generated with numerical FEM simulation.
Using support vector regression (SVR), Pan et al. [15] construct a metamodel in an optimization approach for lightweight vehicle design. Training data are generated using design-of-experiment approaches with FEM simulations. To predict the stress at the implant-bone interface, Li et al. [16] utilize SVR in order to replace FEM simulation. Hu and Li [17] estimate cutting coefficients in a mechanistic milling force model with SVR trained on FEM simulation data.
Employing tree-based models, Martínez-Martínez et al. [18] estimate the biomechanical behavior of breast tissue under compression, using three different tree-based models trained from FEM simulations. The models are trained with FEM data in terms of nodal coordinates and nodal tissue membership. Zhang et al. [19] estimate the base failure stability for braced excavations in anisotropic clay using extreme gradient boosting, random forest regression (RFR) and data obtained from FEM simulation results. Qi et al. [20] utilize a decision tree regressor to predict the mechanical properties of carbon fiber reinforced plastics with data obtained from FEM simulations. Besides MLPs, Pellicer-Valero et al. [10] utilize RFRs to predict the biomechanics of livers.
A recent neural-network-based approach is the physics-informed neural network (PINN). PINNs are trained simultaneously on data and governing differential equations and can be used for the solution and inversion of equations governing physical systems. Utilizing PINNs, Haghighat and Juanes [21] substitute a particular FEM simulation of a perforated strip under uniaxial extension. In [22], Haghighat et al. present a surrogate modeling approach with PINNs and a specific use case. Focusing on consistency, Shin [23] evaluates findings regarding PINNs with Poisson's equation and the heat equation. Yin et al. [24] use PINNs to predict permeability and viscoelastic modulus from thrombus deformation data, described by the fourth-order Cahn-Hilliard and Navier-Stokes equations. In addition to the application of PINNs to structural mechanics problems, there is also a considerable number of papers in other areas, especially in computational fluid dynamics [25][26][27][28][29].
Related work shows the capabilities of surrogate modeling, thus demonstrating the feasibility of supervised learning models trained with FEM simulations. From our analysis of the existing literature, we identify the following drawbacks:

- In most cases, the surrogate model substitutes for only a subset of the considered computational domain. Such an approach focuses only on a region of interest and cannot be used to evaluate the entire computational domain (notable exceptions are [12,22]).
- Surrogate models representing the complete discretized computational domain (mesh) are solely fitted and evaluated on one use case; generalization to unseen data is only achieved with respect to the discretization of the computational domain, but not with respect to other use-case-specific parameters (a notable exception concerning material parameters is [22]).
- Due to differences in FEM use cases and data, the comparison of related work is useful only in some cases.
- Replication of published experiments is often not achievable because important parameters are not reported, e.g., the number of finite elements, the type of finite elements (bilinear, biquadratic, reduced integration, etc.), the method of discretization (meshing), as well as hyperparameters of the ML models, such as learning parameters or activation functions.
To address these drawbacks, we present the following contributions of our paper:

1. We present the main DL and ML methods together with a compact description and mathematical notation to equip practitioners with a reference for surrogating FEM simulations mesh-free, and we assess the feasibility and maturity of the novel PINN method.

2. We utilize three classic use cases in structural mechanics and evaluate these models in terms of performance on unseen configurations (inter- and extrapolation) in order to assess their ability to generalize across different use-case-specific parameters.

3. We discuss the characteristics of all DL and ML models, and their practical implications, in the context of the use cases.
With our work, we pave the way for mesh-free surrogate modeling in practical use: we provide a basis for efficient model and hyperparameter selection with regard to use case and performance metrics. These insights shall not only assist the domain expert during model selection, but will also help in consolidating the current research in mesh-free surrogate modeling for structural mechanics applications.
We report all information needed to make our experiments reproducible. If certain model settings are not mentioned, they are left at default values. Moreover, our FEM simulations are performed with Abaqus Student Edition 2019 (Dassault Systèmes, Velizy-Villacoublay, France), and thus, the process of data generation is not tied to paid commercial software, which makes it possible for everyone to connect to our research.
The remainder of this paper is organized as follows. In Section 2, we present the materials and methods of our experiments, first providing insights into the process of data generation, using the FEM simulations in Section 2.1, then describing the datasets obtained from the FEM simulations in Section 2.2, followed by the ML and DL models used in Section 2.3. Section 3 shows the results, which are discussed in Section 4. In Section 5, we present the conclusion of our work and an outlook for the future.

Materials and Methods
In this section, we present all relevant information about the methodology of our experiments. First, Section 2.1 provides an overview of the data generation process, using three classic FEM simulation use cases. Then, Section 2.2 describes the datasets obtained from the FEM simulations, and Section 2.3 presents the ML and DL models used. A more detailed overview of the mathematical background and assumptions of the ML and DL models can be found in the Appendix. When predicting a particular use case with a surrogate model, the individual nodes discretizing the particular geometry of the use case (i.e., mesh) are sequentially input into the surrogate model with the appropriate generalization variable. The surrogate model then predicts the output of each node in sequence; see Figure 1.

Figure 1. Principle of our surrogate model approach: all N nodes (i.e., their coordinates), together with the respective generalization variable, are sequentially entered into a surrogate model, which then sequentially predicts the outcome of the respective coordinates (i.e., the displacements, strains, and stresses of the respective node).
It should be noted that there are no constraints on the discretization (mesh), i.e., the node coordinates can be freely chosen within the simulation domain and nodes are not connected to each other. Therefore, we refer to our approach as mesh-free, but we want to clearly distinguish ourselves from other mesh-free methods, such as smoothed particle hydrodynamics, the diffuse element method, the moving particle finite element method, etc. The predictions of the individual nodes together constitute the prediction for the simulation domain of the particular use case. By adding the nodal displacement outputs of the surrogate model to the initial node coordinates, we obtain the new deformed geometry. Further surrogate model outputs (e.g., stresses, strains) describe the queried nodes and thus the complete simulation domain in more detail.
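The querying scheme described above can be sketched as follows; this is an illustrative outline, not the paper's actual code, and the names (`surrogate`, `predict_domain`, `deformed_geometry`) as well as the scikit-learn-style `.predict` interface are assumptions:

```python
import numpy as np

def predict_domain(surrogate, nodes, generalization_value):
    """Query a fitted surrogate at every node of an arbitrary discretization."""
    # Each input row: [x, y, generalization variable], as in Figure 1.
    X = np.column_stack([nodes, np.full(len(nodes), generalization_value)])
    # One output row per node (displacements, strains, stresses).
    return surrogate.predict(X)

def deformed_geometry(nodes, outputs):
    """Add the predicted nodal displacements to the initial coordinates."""
    # Assumes the displacement components (u, v) are the first two outputs.
    return nodes + outputs[:, :2]
```

Because the node coordinates are free inputs, the same fitted surrogate can be queried at arbitrary resolution within the simulation domain.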

FEM Use Cases
For illustration, we base our evaluation on three classic use cases from structural mechanics, covering (1) tensile load, (2) bending load and (3) compressive load:

1. Elongation of a plate with a perforation;
2. Bending of a beam;
3. Compression of a block with four perforations.
See Table 1 and Figure 2. We utilize an isotropic elasto-plastic rate-independent material model (i.e., a perfectly plastic material). The kinematic relations for our 2D plane strain use cases are defined by the total strain components ε_xx = ∂u_x/∂x, ε_yy = ∂u_y/∂y, ε_xy = (1/2)(∂u_x/∂y + ∂u_y/∂x) and ε_zz = 0, with displacements u_x and u_y, and by the deviatoric strain components e_xx = ε_xx − ε_vol/3, e_yy = ε_yy − ε_vol/3, e_xy = ε_xy and e_zz = −ε_vol/3. Since there is no volumetric plastic strain in the von Mises yield function, the volumetric strain ε_vol is purely elastic. The equivalent plastic strain of the von Mises model is ε̄^pl = ε̄ − σ_Y/(3µ) ≥ 0, where σ_Y is the yield stress and µ the second Lamé parameter. The total equivalent strain is defined by ε̄ = sqrt((2/3) Σ_{i,j∈{x,y}} e_ij e_ij) with deviatoric strain components e_ij. The decomposition of the strain is ε_ij = ε^el_ij + ε^pl_ij with elastic component ε^el_ij and plastic component ε^pl_ij of the respective strain matrices. The equivalent stress is defined by q = sqrt((3/2) s_ij s_ij) with deviatoric stress components s_ij. In our PINN approach, we utilize the definitions of the total strain components, the deviatoric strain and stress components and the plastic strain components in the respective regularization term.
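The plane-strain quantities defined above can be evaluated numerically as follows; this is an illustrative helper (names and function signatures are our own, not the paper's code), following the definitions given in the text:

```python
import numpy as np

def deviatoric_strain(eps_xx, eps_yy, eps_xy):
    """Deviatoric strain components for plane strain (eps_zz = 0)."""
    eps_vol = eps_xx + eps_yy                  # volumetric strain (trace)
    e_xx = eps_xx - eps_vol / 3.0
    e_yy = eps_yy - eps_vol / 3.0
    e_zz = -eps_vol / 3.0
    return e_xx, e_yy, eps_xy, e_zz            # e_xy = eps_xy

def equivalent_strain(e_xx, e_yy, e_xy):
    """Total equivalent strain, summing i, j over {x, y} as defined above;
    the off-diagonal term e_xy appears twice by symmetry."""
    return np.sqrt(2.0 / 3.0 * (e_xx**2 + e_yy**2 + 2.0 * e_xy**2))

def von_mises_stress(s_xx, s_yy, s_zz, s_xy):
    """Equivalent stress q = sqrt(3/2 * s_ij s_ij) from deviatoric stresses."""
    return np.sqrt(1.5 * (s_xx**2 + s_yy**2 + s_zz**2 + 2.0 * s_xy**2))
```

As a sanity check, the deviatoric strain components are trace-free, and for the deviatoric stress state of uniaxial tension (s_xx = 2σ/3, s_yy = s_zz = −σ/3) the equivalent stress recovers σ.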
We use quarter symmetry in use cases 1 and 3 to make efficient use of computational resources. Additional information regarding the variation of parameters in the simulations is presented in Table 2, where simulations marked in bold are used for the test and evaluation of the surrogate models and are not in the training dataset. Conversely, simulations not marked in bold represent the training dataset and are not in the test dataset. In use cases exhibiting varied geometry parameters (i.e., the elongation of a plate and compression of a block use cases), the mesh is also different in each simulation. Thus, we train and evaluate the surrogate models on use cases with different meshes (i.e., in each simulation, the node coordinates differ). The first use case, a perforated steel strip under tensile load, is similar to the nonlinear solid mechanics use case of [21,22]. However, in our approach, we evaluate the generalization over the perforation diameter and use material properties for steel and a top-edge displacement of 5 mm in the positive y-direction to consider a more challenging use case.
We execute different simulation settings, where the generalization variable (diameter of the perforation) is changed in each simulation; see Figure 2a and Table 2. In our second use case, we simulate a bending beam whose end is displaced by 5 mm in the positive x-direction; see Figure 2b. We vary the yield stress generalization variable in each simulation setting; see Table 2. In our third use case, we simulate a quarter-symmetric block with four perforations under compressive load, which is compressed by 5 mm in the negative y-direction; see Figure 2c. In this use case, we vary two generalization variables (yield stress and width of the block) in each simulation; see Table 2.
We evaluate our models on interpolation tasks (i.e., the generalization variables for testing are within the range of the generalization variables observed during training) and extrapolation tasks (i.e., the generalization variables for testing are outside the range of the generalization variables observed during training). In Table 2, we mark interpolation tasks with superscript (i) and extrapolation tasks with superscript (e).
In Figure 3, we present the nonlinear, perfectly plastic material behavior of our use cases. The Young's modulus is 210 GPa, the Poisson's ratio is 0.3 and the yield stress is 900 MPa. In our first use case, the perforated plate, we use this setting in each simulation. In the other two use cases, the yield stress varies, while the remaining material parameters stay the same.

All parts are meshed using plane strain 4-node bilinear quadrilateral elements with reduced integration and hourglass control. Please note that although [22] recommends the use of higher-order elements for the approximation of body forces, we use bilinear elements since we do not use body forces in our surrogate modeling approaches. We create a finer mesh near additional geometric details (i.e., the perforations in the plate and block use cases) and seed the perforation edge of the plate with an approximate size of 3.8 mm and the remaining edges with an approximate size of 5 mm. The perforation edges of the block are seeded with an approximate size of 3 mm and the remaining edges with an approximate size of 4 mm. The beam exhibits no comparable geometric details; thus, we seed all edges with an approximate size of 1.5 mm.

We obtain our FEM simulation results in the context of general static simulations. Details of the simulation steps are shown in Table 3. Simulation control parameters that are not listed are left at default values.

Dataset
The nodal data from our Abaqus FEM simulations constitute the datasets. For each use case, the nodal data are split into a training and a test dataset, respectively. The training dataset D = {X_1, . . . , X_n} with number of training instances n and the test dataset T = {X_{n+1}, . . . , X_{n+m}} with number of test instances m are generated from several FEM simulations; see Tables 2 and 6, where the simulations marked in bold belong to T and the remaining ones to D. Thus, we split our data according to different generalization variables and not randomly. We denote each instance with index i, i ∈ {1, 2, . . . , n + m}. An instance X_i = (x_i, y_i) consists of an input vector x_i ∈ R^p and an output vector y_i ∈ R^q. Each input vector x_i is composed of the initial x- and y-coordinates of a FEM node and the respective generalization variable(s) (i.e., perforated plate: diameter; beam: yield stress; block with four perforations: width and yield stress) of the FEM simulation; see Table 4. Thus, we have p = 3 in the plate and beam use cases, and p = 4 in the block use case. In our setting, each output vector y_i contains 13 (q = 13) output variables obtained from the FEM simulation with input x_i, namely the total strain components ε^t_xx, ε^t_xy and ε^t_yy, the plastic strain components ε^p_xx, ε^p_xy, ε^p_yy and ε^p_zz, the normal and shear stress components σ_xx, σ_xy, σ_yy and σ_zz, and the displacements in the x- and y-directions, u and v, of each node; see Table 5 and Figure 4. We split the data into a training and a test dataset (see Table 6) and standardized the data by removing the mean and scaling to unit variance. In Figure 4, we present graphical results with visible mesh obtained from the Abaqus FEM simulation of the output variables used for a block use case.
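The dataset layout and standardization step described above can be sketched as follows; the arrays are random stand-ins for the FEM nodal data (all names are illustrative), and, as in the paper, the scaler removes the mean and scales to unit variance, with statistics taken from the training split only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# e.g. plate use case: p = 3 inputs (x, y, diameter), q = 13 outputs per node
X_train, y_train = rng.normal(size=(1000, 3)), rng.normal(size=(1000, 13))
X_test = rng.normal(size=(200, 3))

# Fit the scalers on the training data only, then reuse them for the test data.
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(y_train)

X_train_s = x_scaler.transform(X_train)   # zero mean, unit variance per column
y_train_s = y_scaler.transform(y_train)
X_test_s = x_scaler.transform(X_test)     # uses the training statistics
```

At prediction time, the surrogate's standardized outputs are mapped back to physical units with `y_scaler.inverse_transform`.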

Surrogate Models
In this section, we give an overview of the surrogate models used and their general assumptions; to highlight the differences as well as the advantages and disadvantages between them, we present a detailed mathematical background in Appendix A. We have selected models from different learning paradigms:

Results
For evaluation, we split the data into a training and test dataset to fit and test our surrogate models; see Table 6 for the dataset sizes and Table 2 for more details regarding the data split.
As a next step, we need to define hyperparameters for each model and each use case. We performed hyperparameter optimization using only training data; no test data were used. In our PINN approaches, the adaptation of hyperparameters was based on the work of [21,22]. Our MLPs were designed to be similar to our PINNs to allow for fair comparisons. We varied hyperparameters in our neural network approaches (MLP and PINN) following best practices and guidelines, where we optimized the number of hidden layers, the number of neurons per hidden layer, the activation function, the validation split, the early-stopping patience and the batch size per training epoch. For the rest of our models, we applied a grid search with five-fold cross-validation, utilizing the training data to obtain the best hyperparameters. The hyperparameters for each use case are listed in Appendix B and Tables A1-A6.
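The grid-search step with five-fold cross-validation can be sketched as follows, shown here for the KNNR model; the grid values and data are illustrative, not the paper's actual search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train @ rng.normal(size=(3, 13))   # stand-in for the 13 FEM targets

# Grid search over an assumed hyperparameter grid, scored by R2 and
# cross-validated five-fold on the training data only.
grid = {"n_neighbors": [2, 5, 10], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsRegressor(), grid, cv=5, scoring="r2")
search.fit(X_train, y_train)
print(search.best_params_)
```

The best configuration found on the training folds is then refitted on the full training set before evaluation on the held-out simulations.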
Our evaluation is based on R2-scores with respect to the FEM results and on inference time. For models that contain inherent randomness, such as MLPs, GBDTR and PINNs, a five-fold cross-validation was conducted. For these models, we report the mean values and standard deviation of the R2-score. For the sake of brevity, we report only the average R2-scores across all 13 targets in this section; see Tables 7-9. The R2-scores for individual targets are provided in Appendix C. The inference times are based on the mean value of three measurements. Inferences were run on a machine with 16 GB RAM, 8 CPUs and an Intel(R) i7-8565 2.0 GHz processor. To compare the inference time of our surrogate models with the computation time required to run FEM simulations, we have included the latter also in Tables 7-9.

The GBDTR, KNNR, GPR and SVR algorithms were implemented with the scikit-learn library version 0.24.0 in Python. The SVR and GBDTR algorithms were constructed with the MultiOutputRegressor scikit-learn API to fit one regressor per target. Regarding our DL algorithms, the utilized MLPs were implemented with the keras API version 2.4.3 and our PINNs were implemented with the sciann API version 0.5.5.0 in Python 3.8.5. We used the PDEs from [21,22], but instead of the inversion part, we trained our PINNs additionally with plastic strain data, the same as for the rest of the surrogate models.

In the elongation of a perforated plate use case, our approach is based on a total of nine FEM simulations. We used five simulations for training and four simulations to evaluate the fitted models; see Table 2. We report the average of the R2-scores across all outputs in Table 7 with the corresponding inference times.
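The per-target wrapping mentioned above can be sketched as follows: SVR is inherently single-output, so MultiOutputRegressor fits one independent SVR per each of the 13 targets (data and hyperparameter values here are illustrative, not the paper's tuned settings):

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = X @ rng.normal(size=(3, 13))            # 13 stand-in targets

# One SVR is fitted per target column of Y.
model = MultiOutputRegressor(SVR(kernel="rbf", C=10.0))
model.fit(X[:250], Y[:250])

# r2_score averages uniformly over targets by default, matching the
# "average R2 across all 13 targets" reported in Tables 7-9.
print(r2_score(Y[250:], model.predict(X[250:])))
```

The same wrapper applies to GBDTR; models such as KNNR and the MLP handle multiple outputs natively and need no wrapping.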
Regarding extrapolation, the absolute errors of each surrogate model with respect to σ_xy of Simulation 1 are shown in Figure 5. We plot the absolute errors of each surrogate model for σ_zz of Simulation 4 in Figure 6 as an example of interpolation. In addition, we show in both figures the ground truth of the corresponding output variable obtained from the FEM simulation. For both interpolation and extrapolation, the errors are large near the shear band. As far as extrapolation is concerned, in addition to the errors near the shear band, most models have significant errors near the maximum negative xy shear stresses; see the blue areas in Figure 5g. GBDTR performs well overall, though the error increases in various locations; while PINNs have a similar average performance, they perform better outside the shear band in terms of absolute errors. The MLP shows the best overall results, followed by KNNR.
In the bending beam use case, similar to the perforated plate use case, we trained our models on five simulations and tested them using the remaining four; see Table 2. We present the average R2-scores across all outputs and inference times in Table 8 for the test simulations 1, 4, 6 and 9. We provide a graphical representation of the absolute error of the surrogate models regarding ε^t_yy in Figure 7a-f with the FEM simulation result in (g) as one instance of interpolation. Absolute errors of the surrogate models regarding ε^p_xx and extrapolation are shown in Figure 8. Overall, higher errors can be observed near the encastre boundary condition of the beam for some models for that output. While the PINN shows a competitive average R2-score regarding interpolation, its performance on this single target shows significant weaknesses.

The compression of a block with four perforations use case presents a more complex setting because we generalize over two generalization variables (yield stress and block width). Therefore, we utilize more training data for this use case; see Table 2. We report the average results of the R2-scores with corresponding standard deviations, if applicable, in Table 9. As an instance of interpolation, the absolute errors regarding ε^p_xx can be seen in Figure 9a-f with the Abaqus FEM simulation result in (g). Respectively, an instance of extrapolation is shown in Figure 10 with absolute errors (a-f) and the FEM ground truth (g). Some models show higher prediction errors near shear bands (high ε^p_xx regions) in the interpolation task. However, SVR and GPR cannot extract meaningful information from the training data, especially in the space free of plastic deformation. This is indicated by the low average R2-scores compared to the other models. Considering the absolute errors of σ_xy and extrapolation, the MLP, which otherwise performs well, shows weaknesses and is in general outperformed by the KNNR.

Discussion
All classes of surrogate models that we considered in this work share several key characteristics: (1) they are mesh-free and thus can deliver results with infinite resolution; (2) the computation time required to obtain the target values at predefined positions is orders of magnitude lower than for FEM simulations; (3) since a different mesh is created for each simulation setup where the geometry changes, our results indicate that all classes of surrogate models generalize (interpolate) reasonably well across training data positions; (4) furthermore, all surrogate model classes generalize at least to some extent across use case parameters, such as changes in geometry or material parameters. Finally, all surrogate model classes must be used with care, as they do not extrapolate well to data positions and/or use case parameters unseen during training. Our findings show this in the extrapolation result of the beam use case, Simulation 9: due to the greater yield stress, almost no plastic deformation occurs; thus, the surrogate models are not able to learn such material behavior. Similar findings can be seen in the extrapolation results of the block use case, Simulations 1, 2, 12 and 13: approaches utilizing PINNs and SVRs are not able to predict acceptable strain components, leading to overall worse averaged R2-scores. In general, it can be stated that the surrogate models used show similar behavior with respect to inter- and extrapolation, but differ with respect to individual components, i.e., some models are better than others at predicting individual components (e.g., strains) for unknown generalization variables (e.g., yield strength). Another example concerns the symmetric nature of a use case: although it is redundant to evaluate, e.g., stresses at negative x-positions, the proposed surrogate models will still respond with stress values there, which consequently cannot be considered meaningful.
Similarly, while the surrogate models may well be evaluated at physically meaningless use case parameters, e.g., negative radii, the thus obtained results must be considered meaningless as well. Therefore, all surrogate models must be treated with this in mind, which is a fundamental difference to FEM simulations that do not offer such modes of failure. With these considerations in mind, we now turn to discuss specific characteristics of each surrogate model class.
Our KNNR approach, which can be considered simple compared to the other algorithms, gave competitive results; moreover, this approach showed the best results regarding extrapolation (i.e., Simulations 1, 2, 12 and 13) in the block use case.
The algorithms we constructed with MultiOutputRegressor (SVR and GBDTR) could give better results if the hyperparameters were tuned to each target separately. However, we did not do this for fairness reasons, since our other algorithms are also fitted to the overall use case and not to each target individually. We intend to investigate this in the future.
In our setting, the GPR algorithm did not deliver good results. Tuning the kernel function could deliver better results; however, we do not believe that it would be practical to modify it for each new simulation use case. Thus, we do not intend to head in this direction. However, we plan to investigate whether other Bayesian methods (e.g., Bayesian neural networks [30] or neural processes [31]) could be beneficial.
Our MLP approaches delivered the overall best results in our comparison, especially regarding interpolation (i.e., in the plate and beam use cases Simulations 4 and 6 and in the block use case Simulation 7). They achieved high accuracies (R2-score > 0.992), while reducing the inference time by a factor of over 100 in comparison to FEM simulations. As mentioned before, designing the architecture is not a straightforward process; however, if the network is deep enough and suitable optimization methods are available (e.g., the Adam optimizer), the network can also be trained efficiently utilizing early stopping.
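An MLP surrogate of the kind described above can be sketched in Keras with the Adam optimizer and early stopping; the layer sizes, activation, patience and data below are assumptions for illustration, not the paper's tuned hyperparameters (those are in Appendix B):

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)).astype("float32")   # [x, y, generalization variable]
Y = (X @ rng.normal(size=(3, 13))).astype("float32")  # 13 stand-in targets

model = keras.Sequential([
    keras.layers.Input(shape=(3,)),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(64, activation="tanh"),
    keras.layers.Dense(13),                       # one linear output per target
])
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")

# Early stopping monitors the validation loss and restores the best weights.
stop = keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)
model.fit(X, Y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[stop], verbose=0)
```

Inputs and targets would be standardized as described in Section 2.2 before training.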
As already reported in the literature [32][33][34][35], we experienced in our setting that PINNs are not straightforward to design and train. Due to several plateaus in the loss function, early stopping did not prove to be effective. Therefore, we set a fixed number of training epochs. One reason for our observation could be the existence of a non-convex Pareto frontier [36]. In the multi-objective optimization problem, the optimizer might attempt to adjust the model parameters while situated between the different losses, leading it to favor one loss at the expense of the other [37]. Possible approaches to overcome this problem are adaptive optimizers [38], adaptive losses [39], and adaptive activation functions [40]. Moreover, PINNs are the subject of current research and will gain more and more attention in the future. Besides other fundamental methods, we plan to pursue this direction as well for improved surrogate modeling.

Conclusions
In this work, we deliver a comprehensive evaluation of generalizable and mesh-free ML and DL surrogate models based on FEM simulations and show that surrogate modeling leads to fast predictions with infinite resolution for practical use. In the context of our evaluation, we show which ML and DL models are suitable at which level of complexity with respect to prediction accuracy and inference time, which can serve as a basis for the practical implementation of surrogate models (in, for example, production for real-time prediction, cyber-physical systems, and process design).
In future work, we plan to conduct more complex experiments, e.g., generalizing across more input variables regarding geometry (e.g., consideration of all component dimensions) and material parameters (e.g., non-perfect nonlinear material behavior, time-dependent material properties, grain growth, and phase transformation). We will moreover explore extended surrogate models with more complex output variables (e.g., grain size, grain structure, and phase transformation).

The authors would also like to thank the developers of the sciann API for making their work available and for responding promptly to our questions.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Surrogate Models
We follow the notation introduced in Section 2.2, with data instance X_i = (x_i, y_i) containing input vector x_i and output vector y_i; the number of training instances is n and the number of test instances is m. Notation specific to individual models is introduced where needed.

Appendix A.1. GBDTR
Boosting methods are powerful techniques in which the final "strong" regressor model is based on an iteratively formed ensemble of "weak" base regressor models [41]. The main idea behind boosting is to sequentially add new models to the ensemble, iteratively refining the output. In GBDTR models, boosting is applied to arbitrary differentiable loss functions. In general, GBDTR models are additive models, where in each iteration the training labels are replaced by the negative gradient of the loss, while the input distribution is held constant [42].
The additive model of GBDTR is the following:

\hat{y}_i = F_G(x_i) = \sum_{g=1}^{G} h_g(x_i),

where \hat{y}_i is the prediction for a given input x_i, and the h_g are the fitted base tree regressors. The constant G is the number of base tree regressors. The GBDTR algorithm is greedy: a newly added tree regressor h_g is fitted to minimize the loss L_g of the resulting ensemble F_g = F_{g-1} + h_g, i.e.,

h_g = \arg\min_h L_g = \arg\min_h \sum_{i=1}^{n} l(y_i, F_{g-1}(x_i) + h(x_i)).

Here, l(y_i, F(x_i)) is defined by the loss parameters, and h(x_i) is the candidate base regressor. With the utilization of a first-order Taylor approximation

l(y_i, z) \approx l(y_i, a) + (z - a) \left. \frac{\partial l(y_i, F(x_i))}{\partial F(x_i)} \right|_{F(x_i) = a},

where z corresponds to F_{g-1}(x_i) + h_g(x_i) and a corresponds to F_{g-1}(x_i), we can approximate the value of l with the following:

l(y_i, F_{g-1}(x_i) + h(x_i)) \approx l(y_i, F_{g-1}(x_i)) + h(x_i) \frac{\partial l(y_i, F_{g-1}(x_i))}{\partial F_{g-1}(x_i)}.

We denote the derivative of the loss by g_i and remove constant terms:

h_g = \arg\min_h \sum_{i=1}^{n} h(x_i)\, g_i.

L_g is minimized if h(x_i) is fitted to predict a value proportional to the negative gradient -g_i.
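As an illustration, the additive scheme above can be sketched in plain NumPy using squared loss (so the negative gradient is simply the residual) and depth-one trees (stumps) as base regressors; the function names and the toy data are our own and not part of the study:

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to the residuals r."""
    best = (np.inf, None, r.mean(), r.mean())
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, vl, vr = best
    return lambda z: np.where(z <= t, vl, vr)

def gbdt_fit(x, y, G=50, lr=0.1):
    """Greedy additive fitting: each stump h_g is fitted to the negative gradient."""
    F = np.full_like(y, y.mean())
    stumps = []
    for _ in range(G):
        g = y - F                      # negative gradient of the squared loss
        h = fit_stump(x, g)
        stumps.append(h)
        F = F + lr * h(x)
    const = y.mean()
    return lambda z: const + lr * sum(h(z) for h in stumps)

# toy 1-D regression problem
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
model = gbdt_fit(x, y)
```

The learning-rate factor `lr` shrinks each stump's contribution, which is the usual regularization in gradient boosting; larger G with smaller `lr` typically generalizes better.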

Appendix A.2. KNNR
The KNNR algorithm is mathematically relatively simple compared to the other algorithms presented here. The model stores all instances of the training dataset D and predicts the numerical target ŷ_j of a test query instance x_j with n < j ≤ (n + m) based on a similarity measure (e.g., a distance function). The algorithm computes the distance-weighted average of the numerical targets of the K nearest neighbors of x_j in D [43].
Specifically, we introduce a distance metric d that measures the distance between each training instance x_i with i ≤ n and a test instance x_j. Next, the training instances are sorted in ascending order of their distance to the test instance, i.e., there is a permutation π_j of the training indices i such that

d(x_{\pi_j(1)}, x_j) \le d(x_{\pi_j(2)}, x_j) \le \dots \le d(x_{\pi_j(n)}, x_j).

Then, the estimate \hat{y}_j(x_j) is given as the following:

\hat{y}_j(x_j) = \frac{\sum_{k=1}^{K} w_k \, y_{\pi_j(k)}}{\sum_{k=1}^{K} w_k}, \qquad w_k = \frac{1}{d(x_{\pi_j(k)}, x_j)},

where the number of neighbors K must be specified as a hyperparameter.
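A minimal NumPy sketch of the distance-weighted estimate, assuming the Euclidean norm as distance metric d and a small constant to guard against division by zero (both our choices, not prescribed by the study):

```python
import numpy as np

def knnr_predict(X_train, y_train, x_query, K=3, eps=1e-12):
    """Distance-weighted K-nearest-neighbor regression estimate."""
    d = np.linalg.norm(X_train - x_query, axis=1)   # distances d(x_i, x_j)
    nearest = np.argsort(d)[:K]                     # indices pi_j(1), ..., pi_j(K)
    w = 1.0 / (d[nearest] + eps)                    # inverse-distance weights
    return np.sum(w * y_train[nearest]) / np.sum(w)

# toy data: targets equal the inputs, so the prediction at a training
# point is dominated by that point's (near-infinite) weight
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
pred = knnr_predict(X, y, np.array([1.0]))
```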

Appendix A.3. GPR
Gaussian process regression modeling is a non-parametric Bayesian approach [44]. In general, a Gaussian process is a generalization of the Gaussian distribution. The Gaussian distribution describes random variables or random vectors, while a Gaussian process describes a function f (x) [45].
In general, a Gaussian process is completely specified by its mean function µ(x) and covariance function K(x, x′) (also called kernel).
If the function f(x) under consideration is modeled by a Gaussian process, then we have, for all x and x′,

\mu(x) = \mathbb{E}[f(x)], \qquad K(x, x') = \mathbb{E}[(f(x) - \mu(x))(f(x') - \mu(x'))].

Thus, we can define the Gaussian process as the following:

f(x) \sim \mathcal{GP}(\mu(x), K(x, x')).

We use the notation that matrix D = (X_D, Y_D) contains the training data with input data matrix X_D = (x_1, ..., x_n) and output data matrix Y_D = (y_1, ..., y_n), and test data matrix T = (X_T, Y_T) contains the test data with X_T = (x_{n+1}, ..., x_{n+m}) as input and Y_T = (y_{n+1}, ..., y_{n+m}) as output. Assuming a zero-mean prior distribution, the training and test outputs are jointly Gaussian:

\begin{pmatrix} Y_D \\ Y_T \end{pmatrix} \sim \mathcal{N}\!\left( 0, \begin{pmatrix} K(X_D, X_D) & K(X_D, X_T) \\ K(X_T, X_D) & K(X_T, X_T) \end{pmatrix} \right).

The Gaussian process makes a prediction Y_T for X_T in a probabilistic way, where, as stated before, the posterior distribution can be fully described by the mean and the covariance:

\mu^* = K(X_T, X_D)\, K(X_D, X_D)^{-1}\, Y_D,
\Sigma^* = K(X_T, X_T) - K(X_T, X_D)\, K(X_D, X_D)^{-1}\, K(X_D, X_T).
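The posterior mean and covariance above can be computed directly in NumPy; the sketch below assumes a squared-exponential (RBF) kernel and a small noise term on the diagonal for numerical stability (both common choices, not prescribed by the study):

```python
import numpy as np

def rbf_kernel(A, B, length=1.0):
    """Squared-exponential covariance K(x, x')."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X_D, Y_D, X_T, noise=1e-8):
    """Posterior mean and covariance of Y_T given the training data (X_D, Y_D)."""
    K_DD = rbf_kernel(X_D, X_D) + noise * np.eye(len(X_D))
    K_TD = rbf_kernel(X_T, X_D)
    K_TT = rbf_kernel(X_T, X_T)
    alpha = np.linalg.solve(K_DD, Y_D)              # K_DD^{-1} Y_D
    mean = K_TD @ alpha
    cov = K_TT - K_TD @ np.linalg.solve(K_DD, K_TD.T)
    return mean, cov

# toy data: noisy-free observations of sin(x)
X_D = np.array([[0.0], [0.5], [1.0]])
Y_D = np.sin(X_D).ravel()
mean, cov = gp_posterior(X_D, Y_D, np.array([[0.25]]))
```

With near-zero noise, the posterior mean interpolates the training targets exactly, and the posterior covariance collapses to zero at the training inputs.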

Appendix A.4. SVR
The SVR approach is a generalization of the SVM classification problem, introducing an ε-insensitive region around the approximated function, also called an ε-tube. The optimization task in SVR contains two steps: first, finding a convex ε-insensitive loss function that needs to be minimized, and second, finding the smallest ε-tube that contains most of the training instances.
The convex optimization problem has a unique solution and is solved using numerical optimization algorithms. One of the main advantages of SVR is that the computational complexity does not depend on the dimensionality of the input space [46]. To deal with otherwise intractable constraints of the optimization problem, we introduce slack variables ξ_i and ξ_i* [47]. The positive constant C determines the trade-off between the flatness of the function and the magnitude up to which deviations larger than ε are tolerated. The primal quadratic optimization problem of SVR is defined as the following:

\min_{\omega, b, \xi, \xi^*} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)

subject to the following:

y_i - \omega^T x_i - b \le \varepsilon + \xi_i,
\omega^T x_i + b - y_i \le \varepsilon + \xi_i^*,
\xi_i, \xi_i^* \ge 0.

Here, ω is the weight vector and b the bias to be adjusted. The constrained quadratic optimization problem can be solved by minimizing the Lagrangian

L = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) - \sum_{i=1}^{n} \alpha_i (\varepsilon + \xi_i - y_i + \omega^T x_i + b) - \sum_{i=1}^{n} \alpha_i^* (\varepsilon + \xi_i^* + y_i - \omega^T x_i - b) - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^* \xi_i^*)

with non-negative Lagrange multipliers α_i, α_i*, η_i, η_i*. The minimum of L can be found by taking the partial derivatives with respect to the primal variables and setting them equal to zero (Karush-Kuhn-Tucker (KKT) conditions). With the KKT condition ∂L/∂ω = 0, we can state the following:

\omega = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i.

The Lagrange multipliers that are zero correspond to instances inside the ε-tube, while the support vectors have non-zero Lagrange multipliers. The function estimate depends only on the support vectors; hence, this representation is sparse. More specifically, we can derive the following function approximation to predict ŷ_j(x_j):

\hat{y}_j(x_j) = \sum_{i=1}^{n_{SV}} (\alpha_i - \alpha_i^*)\, x_i^T x_j + b,

with the number of support vectors n_SV. For nonlinear SVR, we replace ω^T x_i in the primal problem by ω^T φ(x_i) and the inner product x_i^T x_j in the function approximation by the kernel K(x_i, x_j).
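Substituting the slack variables back, the primal problem is equivalent to minimizing the unconstrained ε-insensitive objective ½‖ω‖² + C Σ max(0, |y_i − ω^T x_i − b| − ε). The sketch below (our own toy data; a derivative-free optimizer is used because the objective is non-smooth) illustrates this form for linear SVR:

```python
import numpy as np
from scipy.optimize import minimize

def svr_fit(X, y, C=10.0, eps=0.1):
    """Linear SVR via the unconstrained epsilon-insensitive form of the primal."""
    n, p = X.shape

    def objective(theta):
        w, b = theta[:p], theta[p]
        # slack terms xi_i + xi_i*: deviations beyond the eps-tube
        slack = np.maximum(np.abs(y - (X @ w + b)) - eps, 0.0)
        return 0.5 * w @ w + C * slack.sum()

    res = minimize(objective, np.zeros(p + 1), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x[:p], res.x[p]

# toy linear data y = 2x: the flattest function whose eps-tube
# covers all points has a slope slightly below 2
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 2.0 * X.ravel()
w, b = svr_fit(X, y)
```

Note how the regularizer ½‖ω‖² pulls the slope below the data-generating value as long as all residuals stay within the ε-tube, which is exactly the flatness/deviation trade-off controlled by C.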

Appendix A.5. MLP
A neural network is a network of simple processing elements, also called neurons. The neurons are arranged in layers. In a fully-connected multi-layer network, a neuron in one layer is connected to every neuron in the layer before and after it. The number of neurons in the input layer is the number of input features p and the number of neurons in the output layer is the number of targets q [48]. MLPs have several theoretical advantages compared to other ML algorithms. Due to the universal approximation theorem, an MLP can approximate any function if the activation functions of the network are appropriate [49][50][51]. The MLP makes no prior assumptions about the data distribution and, in many cases, can be trained to generalize to data not yet seen [52]. However, finding the right architecture and the setting of training parameters is not straightforward and is usually done by trial and error, guided by the literature and existing guidelines.
A neural network output ŷ corresponding to an input x can be represented as a composition of functions, where the output of layer L − 1 acts as input to the following layer L. For example, for non-linear activation function σ_L, weight matrix W_L, and bias vector b_L of the respective layer L, we obtain the following:

\hat{y}(x) = \sigma_L(W_L\, \sigma_{L-1}(W_{L-1} \cdots \sigma_1(W_1 x + b_1) \cdots + b_{L-1}) + b_L).

With the neural network estimate ŷ(x) and the respective target y of an input x, we can denote a loss function L. A very common loss function for MLPs in regression tasks is the mean-squared error

\mathcal{L}(W, b) = \frac{1}{n} \sum_{i=1}^{n} \|\hat{y}(x_i) - y_i\|^2,

where W and b are the collections of all weight matrices and bias vectors, respectively. Optimal weight W* and bias b* terms for each layer are identified by minimizing the loss function L via back-propagation [53].
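The layer-wise composition can be written out directly in NumPy; the two-layer sketch below uses tanh hidden activations and a linear output layer (the architecture, widths, and random weights are illustrative only):

```python
import numpy as np

def mlp_forward(x, layers):
    """Evaluate y_hat(x): each hidden layer applies sigma(W h + b); output is linear."""
    h = x
    for W, b in layers[:-1]:
        h = np.tanh(W @ h + b)          # hidden layers
    W, b = layers[-1]
    return W @ h + b                    # linear output layer

def mse_loss(Y_hat, Y):
    """Mean-squared error over the instances."""
    return np.mean(np.sum((Y_hat - Y) ** 2, axis=-1))

rng = np.random.default_rng(0)
p, hidden, q = 3, 8, 2                  # input, hidden, and output widths
layers = [(0.1 * rng.standard_normal((hidden, p)), np.zeros(hidden)),
          (0.1 * rng.standard_normal((q, hidden)), np.zeros(q))]
y_hat = mlp_forward(np.ones(p), layers)  # maps R^p to R^q
```

In practice, the weights would of course not stay random: back-propagation computes the gradient of the loss with respect to every W_L and b_L, which an optimizer such as Adam then uses to update them.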

Appendix A.6. PINN
In PINNs, the network is trained simultaneously on data and governing differential equations. PINNs are regularized such that their function approximation ŷ(x) obeys known laws of physics that apply to the observed data. This type of network is well suited for solving and inverting equations that govern physical systems and finds application in fluid and solid mechanics as well as in dynamical systems [21,35].
PINNs share similarities with common ANNs, but the loss function has an additional part that describes the physics behind the use case setting. More specifically, the loss L is composed of the data-driven loss L_data and the physics-informed loss L_physics:

\mathcal{L} = \mathcal{L}_{data} + \mathcal{L}_{physics}.

While the data-driven loss is often a standard mean-squared error, the physics-informed loss accounts for the degree to which the function approximation solves a given system of governing differential equations. For further details, we refer the reader to [23,35,54] in general and to the Python package of [21,22] in particular for a simple implementation of structural mechanics use cases.
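To illustrate the composed loss, the following toy sketch (our own, not the sciann implementation used in the paper) fits a cubic trial function, standing in for the network, to the ODE u′ = −u with the single data point u(0) = 1; a real PINN would obtain the derivative by automatic differentiation rather than analytically:

```python
import numpy as np
from scipy.optimize import minimize

x_data = np.array([0.0])               # observed data: u(0) = 1
u_data = np.array([1.0])
x_phys = np.linspace(0.0, 1.0, 25)     # collocation points for the physics loss

def u(c, x):                           # cubic trial function (stand-in for the network)
    return c[0] + c[1] * x + c[2] * x ** 2 + c[3] * x ** 3

def du(c, x):                          # analytic derivative (autodiff in a real PINN)
    return c[1] + 2 * c[2] * x + 3 * c[3] * x ** 2

def loss(c):
    L_data = np.mean((u(c, x_data) - u_data) ** 2)             # data-driven loss
    L_physics = np.mean((du(c, x_phys) + u(c, x_phys)) ** 2)   # residual of u' = -u
    return L_data + L_physics

c_opt = minimize(loss, np.zeros(4)).x  # approximates u(x) = exp(-x) on [0, 1]
```

Even this toy problem exposes the multi-objective character discussed above: the two loss terms generally have different scales, and weighting them (or adapting the weights during training) changes which term the optimizer favors.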