Data-Driven Microstructure Property Relations

: An image based prediction of the effective heat conductivity for highly heterogeneous microstructured materials is presented. The synthetic materials under consideration show different inclusion morphology, orientation, volume fraction and topology. The prediction of the effective property is made exclusively based on image data with the main emphasis being put on the 2-point spatial correlation function. This task is implemented using both unsupervised and supervised machine learning methods. First, a snapshot proper orthogonal decomposition (POD) is used to analyze big sets of random microstructures and thereafter compress signiﬁcant characteristics of the microstructure into a low-dimensional feature vector. In order to manage the related amount of data and computations, three different incremental snapshot POD methods are proposed. In the second step, the obtained feature vector is used to predict the effective material property by using feed forward neural networks. Numerical examples regarding the incremental basis identiﬁcation and the prediction accuracy of the approach are presented. A Python code illustrating the application of the surrogate is freely available.


Introduction
In material analysis and design of heterogeneous materials, multiscale modeling can be used for the discovery of microstructured materials with tuned properties for engineering applications.Thereby, it contributes to the improvement of the technical properties, reduces the amount of resources invested into the construction and enhances the reliability of "material modeling"/"the description of material behaviour".However, the discovery of materials with the desired material property, which is characterized by the microstructure of the solid, constitutes a highly challenging inverse problem.
The basis for all multiscale models and simulations is information on the microstructure and on the microscale material behaviour.If at hand, physical experiments can be replaced by -often costly -computations in order to determine the material properties by virtual testing [1][2][3].Separation of structural and microstructural length scales can often be assumed.This enables the use of the representative volume element (RVE) [4] equipped with the preferable periodic fluctuation boundary conditions [5].The RVE characterizes the highly heterogeneous material using a single frame (or image) and the (analytical or numerical) computation can be conducted on this frame.
The concurrent simulation of the underlying microstructure (e.g., through nested FE simulations, cf., e.g., [6,7], or e.g.considering microstructure behaviour in the constitutive laws, e.g.[8]) and of the problem on the structural scale is computationally intractable.In view of the correlation between computational complexity and energy consumption, nested FE simulations should be limited in application.Therefore, efficient methods giving reliable prediction of the material property are an active field of research: POD-driven reduced order models with hyper-reduction (e.g., [9,10]), with multiple reduced bases spanning also internal variable [11,12] and for finite strains (e.g., [13,14]).See also general review articles on the topic such as [15,16].
Supposing that two similar images representing microstructured materials are considered, it is natural to expect similar effective properties in many physically relevant problems such as elasticity, thermal and electric conduction to mention only two applications.The main task, thus, persists in finding low-dimensional parameterizations of the images that capture the relevant information, use these parameterizations to compress the image information into few numbers and build a surrogate model operating only on the reduced representation.A black-box approach exploiting precomputed data for the construction of the surrogate is the use of established machine learning methods which is also the topic of this paper.
As the no free lunch theorem [17] states, an algorithm can not be arbitrarily fast and arbitrarily accurate at the same time.Hence, there has to be a compromise either in accuracy, computational speed or in versatility.At the cost of generality, i.e. by focusing on subclasses of microstructures, fast and accurate models can be deployed while still allowing for considerable variations of the microstructures.This does not mean that these subclasses must be overly confined: for instance, inclusion volume fractions ranging from 20 up to 80% are considered in this work.Using a limited number of computations performed on relevant microstructure images, machine learned methods can be trained for the subclass under consideration.The sampling of the data, the feature extraction and the training of the ML algorithm constitute the offline phase in which the surrogate model is built.Typically, the evaluation of the surrogate can be realized almost in real-time (at least this is the aspired and ambitious objective), thereby enabling previously infeasible applications in microstructure tayloring, interactive user interfaces and computations on mobile devices.
A currently active research for microstructure property linkage is the material knowledge system (MKS) framework [18].Many branches thereof exist, all trying to attain low-dimensional microstructure descriptors from the truncation of selected n-point correlation functions.For instance, a PCA of the n-point correlation functions of the microstructure is performed and the principal scores are used to in a polynomial regression model in order to predict material properties.The MKS is actively researched for different material structures [19][20][21].For instance, [19,20] successfully predict the elastic strain and yield stress for the underlying microstructure using the MKS approach, however they confine their focus on either the topological features of the microstructure or a confined range of allowed volume fractions (0-20%), often held constant in individual studies.
The goal in the present study is to make accurate image based predictions for RVEs spanning large subsets, e.g. in terms of volume fraction, morphological and topological variations, of microstructure materials.
Similarly to key ideas of the MKS approach, a reduced basis is deployed to reduce the dimensionality of the microstructural features contained in the n-point correlation functions.With the sheer amount of samples required, conventional methods fail to capture the key features of the all the microstuctures and we propose three novel incremental reduced basis updates to make the computation possible.Using synthetic microstructures/RVE, the costly training of the reduced basis and of the artificial neural network (e.g., [22]) becomes feasible, allowing to build a surrogate model for the image-property linkage.The surrogate accepts binarized image representations of bi-phasic materials as inputs.The outputs constitute the effective heat conductivity of the considered material.
In Section 2 the microstructure classification and the three different incremental snapshot POD procedures used during feature extraction are presented (unsupervised learning).In Section 3 the use of feed forward artificial neural networks for the processing of the extracted features is discussed.Numerical examples are presented in Section 4 including different inclusion morphologies and an investigation of the relaxation of the microstructure subclass confinement of the procedure by using mixed data sets.A Python code illustrating the application of the surrogate is freely available Github.

Microstructure Classification
The microstructure is defined by the representative volume element (RVE) [4], which is one periodic frame or image characterizing the heterogeneous material under consideration, see Figure 1 for examples of the microstructure and its 2-point spatial correlation function (see below for its definition).Due to their favorable properties regarding the size of the RVE, periodic fluctuation boundary conditions are used for the computations during the offline phase [5].
The n-point spatial correlation functions represent a widely used mathematical framework for microstructural characterization [23,24].Roughly described, the n-point correlation is obtained by placing a polyline consisting of (n − 1) nodes defined relative to the first point by vectors r 1 , r 2 , . . . .By placing the first point uniformly randomly into the microstructure and computing the mean probability of finding a prescribed sequence of material phases at the nodes of the polyline (including the initial point) denotes the n-point correlation c n (r 1 , r 2 , . . ., r n−1 ; m 1 , m 2 , . . ., m n ), where m k is the material label expected to be found at the kth node.
For example, the 1-point spatial correlation function, i.e. the probability of finding phase m (m ∈ {a, b, . . .}), yields the phase volume fraction f m of phase m.In the present study bi-phasic materials are considered.Here m = a corresponds to the matrix material (drawn blue in Figure 1) and m = b to the inclusion phase (drawn yellow in Figure 1).The trivial relation holds.The 2-point spatial correlation function (2PCF) c 2 (r; a, b) places the vector r in each pixel/voxel x of the RVE and states the probability of starting in the matrix phase a and ending in the inclusion phase b.Mathematically we have with χ (m) being the indicator function of phase m, r the point offset and • x denoting the averaging operator over the RVE.The 2PCF is efficiently computed in Fourier space by making use of the algorithmically sleek fast Fourier transform (FFT) [25,26]: In addition to that, a key property of the 2PCF is its invariance with respect to translations of the periodic microstructure.This property is of essential importance when it comes to the comparison of several images under consideration, i.e. during the evaluation of similarities within images.
Examples of c 2 (r; b, b) (referred to also as auto-correlation of the inclusion phase) are depicted by the lower set of images in Figure 1.By the metric of vision, the following characteristics can be observed: • the maximum of c 2 (r; b, b) occurs at the corners of the domain (corresponding to r = 0); • preferred directions of the inclusion placement and/or orientation correspond to laminate-like images (best seen in the third microstructure from the left); • the domain around r = 0 partially reflects the average inclusion shape; • some similarities are found, particularly with respect to shape of the 2PCF at the corners and in the center.
These observations hint at the existence of a low-dimensional parametrization of relevant microstructural features.In the following this property is exploited by using a snapshot proper orthogonal decomposition (snapshot POD) in order to capture reoccurent patterns of the 2PCF.By working on the two-point function the afore-mentioned elimination of possible translations of the images is an important feature.The influence of higher order spatial correlation functions has been investigated in the literature [e.g., 23,27].These considerations often yield minor gains relative to the additional computations and the increased dimensionality * .Although it has been demonstrated that the two point function does not suffice to uniquely describe the microstructure in periodic domains [28], there is evidence that the level of microstructural ambiguity for identical 2PCF can be considered low.Therefor, only the n-point correlation functions up to second order are accounted for in the present study.

Unsupervised Learning via Snapshot Proper Orthogonal Decomposition
The snapshot POD [29] can be used to construct a reduced basis (RB) [30][31][32] that provides an optimal subspace for approximating a given snapshot matrix S ∈ R n×n s .The matrix S consists of n s individual snapshots s i ∈ R n with the size n being the dimension of the discrete representation of the unreduced field information.In the case of the 2PCF n denotes the total number of pixels within the RVE, i.e. the discrete two-dimensional 2PCF (representing image data) is recast into vector * For instance, the 3PCF takes to vectors r 1 , r 2 ∈ Ω as inputs.Hence, the full 3PCF is basically inaccessible in practice but only after major truncation.
format for further processing ( c 0 2 (m, m) ∈ R n ).In the present study, the constructed RB is used for information compression, i.e. for the extraction of relevant microstructural features from the image data.The reduced basis B ∈ R n×N retains the N most salient features of the data contained in S in a few eigenmodes represented by the orthonormal columns of B.
The actual snapshot data contained in S is thus constructed from the discrete 2-point function data s 0 i ∈ R n via where 1 ∈ R n is a vector containing ones at all entries.The reduced basis is computed under the premise to minimize the overall relative projection error with respect to the Frobenius norm • F .The RB can be constructed with multiple methods, e.g. with the snapshot correlation matrix C S and its eigenvalue decomposition, which is given by The following properties of the sorted eigenvalue decomposition hold: and δ ij denotes the Kronecker delta.The dimension of the reduced basis is determined by the POD threshold, i.e. the truncation criterion is given by where ε > 0 is a given tolerance denoting the admissible approximation error.Then, the reduced basis is computed via after truncation of the eigenvalue-and eigenvector matrices to reduced dimension N represented with Θ ∈ R N×N and V ∈ R n×N , respectively.The sorting of the eigenvalues with their corresponding eigenvectors leads to the property that the least reoccurent information given in S is omitted.Hence, the first eigenmode in B has the most dominant pattern, the second eigenmode the second most etc.The properties of the reduced basis computed with the snapshot correlation matrix remain the same as for the singular value decomposition (SVD) introduced below.
The SVD [33] of the snapshot matrix is given by with the following properties (asserting and the sorted non-negative singular values σ i such that The criterion for determining the reduced dimension N matching (14) takes the form Then, the reduced basis is given by truncation of the columns of More specifically, the left subspace associated with the leading singular values is the RB.Both introduced methods yield the exact same result for the same snapshot matrix S.

Incremental Generation of the Reduced Basis B
The RB is deployed in order to compress the information contained in n s snapshots into an N-dimensional set of eigenmodes stored in the columns of B ∈ R n×N , where N n s is asserted.Since the RB is computed with the snapshot matrix alone, the information contained in S needs to contain data representing the relevant microstructure range, i.e. covering the parameter range used in the generation of the synthetic materials, in order for B to be representative for the problem under consideration.
In the case of microstructural images containing n pixels, a ludicrous amount of 2 n states could theoretically be considered when allowing for fully arbitrary microstructures.When limiting attention to certain microstructure classes, then less information is required.Still, thousands of snapshots are usually required, at least.In the following, attention is limited to synthetic materials generated using random sequential adsorption of morphological prototypes with variable size, orientation, aspect ratio, overlap and phase volume fraction.Due to the high variability of such microstructures (see, e.g., Figure 1), a large number of snapshots is required that can usually not be stored in memory, i.e. a monolithic snapshot matrix S is not available.Although attention is limited to two-dimensional model problems in this study, the problem aggravates considerably for three-dimensional images which imply technical challenges of various sort (storage, processing time, data management, . . .).In order to be able to generate a rich RB accounting for largely varying microstructural classes, the incremental basis generation represents a core concept within the present work.It enables the RB generation based on a sequence of input snapshots but without the need to store previously considered data except for the current RB.Three different methods are proposed, two of which rely on approximations of the snapshot correlation matrix C S , and one of which relies on the SVD of an approximate snapshot matrix.The general incremental scheme depicted in Figure 2 remains the same for all the procedures, i.e. the only difference is found during the step labeled adjust.
The algorithm is initialized by a small sized set of initial snapshots of the shifted 2-point correlation function (see Section 2.2).Further, the algorithmic variables n δ = 0 and ∆S = 0 are set.The initial RB is computed classically using either the correlation matrix or the SVD (see previous section for details).After computation of the RB, the snapshots are stored neither in memory nor on a hard drive.The algorithm then takes input snapshots in the order of appearance.For each new snapshot s i the relative projection error with respect to the current RB is computed: If P δ is greater than the tolerance ε > 0 the snapshot is considered as inappropriately represented by the existing RB.Consequently, s i is appended to a buffer ∆S containing candidates for the next basis enrichment and the counter n δ is incremented.Once the buffer contains a critical number of n a elements the actual enrichment is triggered and the buffer is emptied thereafter.Thereby the computational overhead is reduced.The three different update procedures are described later on in detail.The procedure is continued until n c > 0 consecutive snapshots were found to be approximated up to the relative tolerance ε.Then the basis is considered as converged for the microstructure class under consideration.
In the following three methods for the update procedure are described.Formally, the update of an existing basis B with a block of snapshots contained in the buffer ∆S is sought-after.The new basis is required to remain orthonormal.

Method A: Append Eigenmodes to B
A trivial enrichment strategy is given in terms of appending new modes to the existing basis while preserving orthonormality of the basis.Therefore, the projection of ∆S onto the existing RB is subtracted in a first step: It is readily seen that ∆ S is orthogonal to B. Then the correlation matrix of the additional data and its eigen-decomposition are computed according to Eventually, the enrichment is given through the truncated matrices V and Θ: The new basis is then obtained by Method A simply adds modes generated from the projection residual ∆ S in a decoupled way, i.e. the existing basis is not modified.In order to compute the basis update, only the existing RB B and the temporarily stored snapshots ∆S are required.

Remarks on Method A
A. 1 The truncation parameter must be chosen carefully such that In particular, the normalization with respect to the original data prior to projection onto the existing RB must be taken.A.2 By appending orthonormal modes to the existing basis it is a priori guaranteed that the accuracy of previously considered snapshots cannot worsen, i.e. an upper bound for the relative projection error of all snapshots considered until termination of the algorithm is given by the truncation parameter and n a :

Method B: Approximate Reconstruction of the Snapshot Correlation Matrix
The goal of this iterative update scheme is the accurate approximation of the new correlation matrix Here S denotes all snapshots considered in the RB so far and ∆S contains the candidate snapshots.However, the previously used snapshots formally written as S are no longer available since they can not be stored due to storage limitations.Using the previously computed matrices B, V, Θ the following approximations are available: where the accuracy of the approximation is governed by the truncation threshold δ N .Using these approximations and using the property of the eigenvalue decomposition, the snapshot matrix S can be approximated by The snapshot correlation matrix C that considers the additional snapshots can be approximated as In order to compute the updated basis, the inexpensive eigenvalue decomposition of Analogously to the previous RB computation in equation ( 15), the adjusted and enriched basis is computed by To update the RB the truncated eigenvector matrix (B , V ← W ∈ R n×N ) need to be stored as well as the diagonal eigenvalue matrix Θ.

Remarks on Method B
B. 1 The existing RB is not preserved but it is updated using the newly available information.Thereby, the accuracy of the RB for the approximation of the previous snapshots is not guaranteed a priori.However, numerical experiments have shown no increase in the approximation errors of previously well-approximated snapshots.B. 2 In contrast to Method A the dimension of the RB can remain constant, i.e. a mere adjustment of existing modes is possible.The average number of added modes per enrichment is well below that of Method A. B. 3 The additional storage requirements are tolerable and the additional computations are of low algorithmic complexity.

Method C: Incremental SVD
Method C is closely related to Method B. However, instead of building on the use of the correlation matrix, it relies on the use of an updated SVD, i.e. an approximate truncated SVD is sought-after: Since the original snapshot matrix S can not be stored, only an approximation of the actual truncated SVD in (33) can be computed.Methods to compute an incremental SVD are introduced in [34,35], with the latter referring to Brand's incremental algorithm [36] which is used in the present study with minor modifications.With the previously computed basis B at hand, the approximation of S is known Introducing the projection residual ∆ S of the enrichment snapshots ∆S together with its SVD and using (33), (34) and (35) the full snapshot matrix can be approximated after some algebra by Considering the SVD of , which is inexpensive due to the sparsity of Γ, approximation (36) is further rewritten as It is easily shown that the matrices U * and W * are column-orthogonal and that Σ * is diagonal and non-negative.Therefore, the three matrices constitute an approximate SVD of the enlarged snapshot matrix at low computational expense.This implies the following updates after each enrichment step after truncation of B, where the truncation criteria needs to ensure that B does not decrease in size.
To compute the enrichment of the RB, B ∈ R n×N and the sparse singular values Σ ∈ R N×N after truncation need to be stored.

Remarks on Method C
C. 1 As highlighted for Method B in remark B.1, the existing RB is not preserved but adjusted by considering the newly added information.This ensures the accuracy of the approximation error of previously well-approximated snapshots, but a priori guarantees regarding the subset approximation accuracies cannot be made.C. 2 In contrast to Method A the dimension of the RB can remain constant, i.e. a mere adjustment of existing modes is possible.The average number of added modes per enrichment is well below that of Method A. C.3 Each update step in equation ( 38) is computed separately and, consequently, storing W is not required since only the RB B is of interest.C. 4 The diagonal matrix Σ has low storage requirements corresponding to that of a vector in R N .

Supervised learning using Feed Forward Neural Network
During the supervised learning phase, the machine is provided with data sets consisting of inputs and the related outputs.Hence, the supervised learning phase tries to learn a function relating inputs to outputs without or with limited prior knowledge of the structure of the unknown mapping.Artificial neural networks (ANN) are a powerful machine learning tool which have gained wide popularity in the recent decades due to the surge in computational power [22,37].
The functionality of the ANN is inspired by the (human) brain, propagating a signal (input) through multiple neurons where it is lastly transformed into an action (output).Various types of neural networks have been invented, e.g.feed forward, recurrent, convolutional, being applicable to almost any field of interest [38][39][40][41].In the present study a regression model from the input, i.e. the feature

Input layer
Hidden layer 1 Hidden layer L

Output layer
Figure 3.The basic functionality of a dense feed forward neural network is depicted in simplified form.
vector ξ which is derived with the converged basis B, to the output, i.e. the effective heat conduction tensor κ, is deployed with a dense feed forward ANN.
In a dense feed forward ANN (Figure 3) a signal is propagated through the hidden layers where every output of the previous layer a l−1 affects the activation z l of the current layer l (l = 1, . . ., L + 1).The activation of each layer gets wrapped into an activation function f where the output of each neuron in the layers is computed, i.e. a l = f (z l ).Note that matrix/vector notation is used, where each entry in the vectors denotes one neuron in the respective layer.
The basic learning algorithm/optimizer usually employed for a feed forward ANN is the back propagation algorithm [42] and modifications thereof.The learning of the network consists in the numerical identification of randomly initialized and unknown weights W l and biases b l that minimizes a given cost function.The latter gives an indication of the quality of the ANN prediction.For instance, the gradient back propagation computes gradients of the cost function with respect to the weights and biases, in order to compute suitable corrections for these parameters.
The learning itself is an iterative procedure in which the training data is cycled multiple times through the ANN (one run called an epoch).In each epoch the internal parameters are updated with the aim of improving the mapping from the input to the output.The optimization problem itself is (usually) high-dimensional.In most situations it is not well-posed and local minima and maxima can hinder convergence.Therefor, multiple random instantations of the network parameters are usually required to assure that a good set of parameters is found, even if the network layout remains unaltered.
Notably, the training requires a substantial input data set as inputs.It is important to note that the training of the ANN usually results in a parameter set that is able to approximate the training data with high accuracy under the given meta-parameters describing the network architecture (number of layers, number of neurons per layer, type of activation function).However, the approximation quality of the ANN may be different for other queries.Thus, it is important to validate the generality of the discovered mapping for the underlying problem setting.Therefor an additional validation data set is introduced, where only the evaluation of the cost function is tracked during the training.Generally when overfitting † occurs, the errors for the validation set increase whereas the errors of the training set decrease.The training should be halted if such a scenario is detected.
Since the choice of activation function as well as the number of hidden layers and the number of neurons within the individual layers are arbitrary (describing the ANN architecture), these meta-parameters should be tailored specifically for the desired mapping.Finding the best neural network architecture is not straight-forward and usually relies on intuition, experience and many numerical experiments.In order to find a well suited ANN, various (random) realizations of each tested ANN architecture need to be computed, before a decision regarding the optimal layout can be made.
In the present study the ANN training is performed using Tensorflow in Python [43].Tensorflow is an open source project by the Google team, providing highly efficient algorithms for ANN implementation.The Adam [44] optimizer, which is a modification of the gradient back propagation, has been deployed for the learning.

Generation of Synthetic Microstructures
All of the used synthetic microstructures have been generated by a random sequential adsortion algorithm with some examples shown in Figure 1.Two morphological prototypes were used: spheres and rectangles.The parameters used to instantiate the generation of a new microstructure were modeled as uniformly distributed variables: M.1 the phase volume fraction f b of the inclusions (0.2-0.8); M.2 the size of each inclusion (0.0-1.0); M.3 for rectangles: the orientation (0-π) and the aspect ration (1.0-10.0);M.4 the admissible relative overlap for each inclusion (0.0-1.0).For = 0 and the spherical inclusion, a boolean model of hard spheres is obtained.Setting = 1 induces a boolean model without placement restrictions, i.e. new inclusions can be placed independent of the existing ones.The generated microstructures were stored as images with resolution 400×400.After the generation of the RVE, the 2-point spatial correlation function was computed for the RVE.This was then shifted, see Section 2.2, and used as a snapshot s i for the identification of the reduced basis.
Additionally, a smaller random set of RVEs used for the supervised learning phase was simulated using the recent Fourier-based solver FANS [3] in order to compute the effective heat conduction tensor κ.The heat conductivity of the matrix and of the inclusion phase are prescribed as Overfitting relates to the fact that a subset of the data is nicely matched but small variations in the inputs can lead to substantial loss in accuracy, similar to oscillating higher-order polynomial interpolation functions.
Here R > 0 denotes the material contrast.In the present study, R = 5 was considered, i.e. the matrix of the microstructure has a five times higher conductivity than the inclusions.These values can be seen as typical values for metal ceramic composites (Figure 4).An inverse phase contrast has exemplarily been studied, i.e. inclusions with κ b = 1 and κ a = κ b 5 (corresponding to R = 1 5 ) has also been investigated.Qualitatively, the results for the inverse phase contrast did not show any new findings or qualitative differences.Therefor, the following results focus on R = 5, corresponding to rather insulating inclusions.The symmetric tensor κ can be represented as a three-dimensional vector κ using the normalized Voigt notation For the supervised learning of the ANNs (see Section 3), multiple files each containing 1500 data sets for different inclusion morphologies were generated (circle only; rectangle only; mixed).Each data set contains the image of the microstructure, the respective autocorrelation of the inclusion phase c 2 (•; b, b) and the effective heat conductivity κV .

Unsupervised Learning
First, the reduced basis is identified using the iterative procedure presented in Section 2.3.All three proposed methods were considered and for each of these, three different sets of microstructures were considered: The first set of microstructures consisted of RVEs with only circular inclusions, the second set consisted of RVEs with only rectangular inclusions, and the third set was divided into equal parts, each part consisting of RVEs with either circular or rectangular inclusions ‡ , respectively.Each type of microstructure was processed using each of the three incremental RB schemes introduced in Section 2.3.Hence, a total of nine different trainings were conducted, each using different randomly generated snapshots.
For the iterative enrichment process, the initial RB was computed from 200 snapshots S 0 .Thereafter, snapshots were randomly generated and processed by the enrichment algorithm sketched in Figure 2. The number of snapshots per enrichment step has been set to n a = 75 and the number of consecutive snapshots with P δ < ε , used to indicate convergence, has been set to n c = 100.The relative projection tolerance ε = 0.025 was chosen.Note that this corresponds to the maximum value of the mean relative • L 2 -error that is considered exact for the shifted snapshots.The actual accuracy ‡ (i.e. each structures contained exclusively one of the two morphological prototypes and the same number of realizations for each prototype was enforced) Table 1.Data of the unsupervised learning (incremental RB identification) for the nine considered scenarios; the parameters ε = 0.025 , n c = 100 and n a = 75 were used.Some numbers are rounded for easier readability.in the reproduction of the 2PCF c 2 (r; b, b) is significantly lower than this (results are given in Figure 7 below).Key attributes for each of the nine trainings are provided in Table 1.There is an obvious discrepancy between Method A and the remaining methods in basically all outputs.While Method A claims the lowest computing times, it yields approximately twice the number of modes.However, the number of snapshots needed is substantially lower which can be relevant if the generation of the synthetic microstructures is computationally involved.

Method
Note that methods B and C yield similar results, although for the rectangular and circular training method C needed significantly more snapshots, method B needed significantly more snapshots for the mixed training.The outliers in the number of snapshots needed are due to the randomness of the materials and the chosen convergence criterion.The resulting basis size of methods B and C indicate very similar results from these methods.
Also note that the calculation of the relative projection error P δ grows linearly with the dimension of the RB, i.e. the faster offline time of method A can quickly be compensated by the costly online procedure induced by the high dimension of the RB in comparison to the competing techniques.
To compare the accuracy of the resulting basis as well as during the training, the relative projection error P δ of the snapshots used for the original basis construction S 0 are plotted in Figure 5   Since each training conducted a different amount of incremental updates, the abscissa has been rescaled such that the relative progress of the basis generation is shown vs. the number of enrichments divided by n it (Figure 5 and Table 1).
Method B and C yield again very similar results whereas method A achieves a lower projection error on convergence, but at the expense of a considerably improved dimension of the RB.Since there seems to be an obvious correlation between resulting accuracy and the final basis size for the initial snapshots S 0 (see Figure 5, Table 1), the general quality for arbitrary stochastic inputs must be investigated.In order to quantify the quality of the RB, the accuracy can be expressed in terms of the relative projection error of approximating additional, newly generated snapshot data S as a function of the method (A, B, C) and the number of modes N ≥ 1 via in Matlab notation.This measure captures to what extend the first N basis functions represent the 2PCF of the underlying microstructure class.In the current work sets of 1500 newly generated snapshots assure an unbiased validation, i.e. the data was used in neither of the three training procedures.The results are stated in Figure 6.Again, methods B and C yield similar results, achieving lower projection errors with fewer eigenmodes compared to Method A, i.e. the basis produced by method A cannot catch up with its two competitors.On a side note, the rectangular inclusions apparently lead to significantly richer microstructure information which can be seen by direct comparison of the left to the middle plot in Figure 6.For methods B and C and for circular inclusions the relative error of 5% is reached for approximately 15 modes while rectangular inclusions require more than 60 modes to attain a similar accuracy.This is supported also by the rightmost plot determined from a sort of blend of the two microstructural types.
Since all of the previous error measures are given on the shiftedsnapshot according to equation (10), the true relative projection error on the unshifted snapshot is also investigated as a function of the basis size.It describes the relative accuracy of the approximation of the 2PCF c 2 (r; b, b) as a function of the basis size.The errors in the shifted data (Figure 7, left) and the corresponding reconstructed 2PCF (Figure 7, right) for five individual randomly selected snapshots show that the actual relative error in the 2PCF reconstruction is below 5% for 10 reduced coefficients even for the challenging rectangular inclusion morphology, while the error in the shifted and shifted snapshots is on the order  of 50%.This highlights the statement made earlier regarding the choice of ε which is not directly the accepted mean error in the 2PCF, but only after application of the shift.The high discrepancy in the two relative projection errors is due to the fact that the shifted snapshots fluctuate closely around 0, i.e. the homogeneous part of the 2PCF is obviously of high relevance.
The development, i.e. the stabilization of the mode shapes over the enrichment steps, of a few selected eigenmodes is shown in Figure 8 using RVEs with circular inclusions for training of method C. Similar results are expected for method B, whereas for method A the eigenmodes would remain unconditionally unchanged over the enrichment steps, i.e. a pure enlargement of the basis takes place.The faster stabilization of the leading eigenmodes indicates a quick stabilization of the lower order statistics of the microstructure ensemble, while the tracking of higher order fluctuations is more involved.

Supervised Learning
After the training of the RB, the input for the neural network, the feature vector ξ was derived using the 1-and 2-point spatial correlation functions of the ith RVE as The size of the feature vector is determined by the amount of reduced coefficients 1 ≤ h ≤ N, i.e. the snapshot is projected onto the leading h eigenmodes of B.
Since the inputs and outputs have a highly varying magnitude, they need to be shifted such that they are equally representative.Therefore, each entry of the feature vector is separately shifted and scaled such that its distribution of all samples has zero mean and a standard deviation of one.The output is shifted combinedly such that the mean of κV is 0. The transformed inputs and outputs are then given to the ANN for the training phase.Thus, the outputs of the ANN need to undergo an inverse scaling in order to yield the sought-after vector representation of the heat conduction tensor.These shifts and scalings need to be extracted from the available training data.Hence, every data set used for training purposes has its own parameters.
The training for the neural network has been conducted for all of the three microstructure classes, i.e. using only RVEs with circular inclusions, only RVEs with rectangular inclusions and lastly using RVEs with either circular or rectangular inclusions (the split was up to 60:40, which was randomly assigned before the training).In order to derive the feature vector, the converged basis of method C has been used.Note that depending on the training set, only circular/rectangular or mixed inclusions contributed to the RB.
The training of the ANN was conducted with an early stop algorithm: up to 10000 epochs were considered and the best-not necessarily last-parameter set has been saved.The decision on the best ANN was taken on the basis of the cost function of the validation set (which came out of the same parameter range as the training set).Each data set consisted of 1500 samples.These were shuffled randomly and split into the training set (n t = 1000) and the validation set (n v = 500) before each ANN training.
In the following, the error measurements used and the term of unbiased testing refers to the prediction of 1500 unseen data points for each of the three microstructure classes named test sets.In order to find a good overall ANN, the network architecture has been intensely studied: the accuracy of the prediction after the training has been evaluated with various sizes of the feature vector, different network layouts and for different activation functions (Figure 9).The depicted error measure (Figure 9) is the mean/max absolute derivation, i.e. the absolute error (corresponding to the Euclidean norm of the vector-valued error) for the prediction and the actual heat conductivity of the microstructure of κ V over each test set.
The conductivity κ 12 fluctuates mildly around zero for all inputs.In order to accurately capture this fluctuation, only the specific training and RB dimensions of four or higher (h ≥ 4) are required cf. Figure 10.However, the error of κ 12 for other inclusion morphologies than that of the training increase with h, albeit the values can be considered small in comparison to the κ 11 and κ 22 errors.
Since the trend in Figure 9 of the rectangular and mixed training looks promising, a further study using more reduced coefficients has been conducted, however the results for retangular training showed a stronger tendency towards overfitting.Rectangular training with 17 ≤ h ≤ 25 had very similar results to circular training with 12 ≤ h ≤ 17 reduced coefficients.For the mixed training the overall results worsened with a higher amount of reduced coefficients and the ANN did not seem to find any mapping generalizing the property linkage for both types of deployed RVE with the given training data.
The increased overfitting with respect to the training microstructure class for increased dimension of the feature vector can be explained by the rather limited number of 1000 input samples.For instance, considering a six-dimensional feature vector induces that for the rather limited number of ten samples per independent direction a total of 10 6 data points would be needed.The dilemma is that each of the computations is expensive, particularly when considering three-dimensional simulations, i.e. millions of samples can not be realized in practice.Therefore, the number of samples must be rather low which could be a limiting factor in view of the number of features that can be accounted for.Overall, a correlating trend with the accuracy for the RB (Figure 6) and the behaviour of the ANN training could be seen (Figure 9).Investigating the circular training, with more specific information about the 2PCF available, a slightly better mapping is found for RVEs with circular inclusions, whereas the prediction for the other two microstructure classes slightly worsen.Similarly with the rectangular training, more reduced coefficients increase the fit on the RVE with rectangular inclusions, though more reduced coefficients are required to deliver the same accuracies as the circular training, as more number of eigenmodes (of the basis) are required to yield approximately the same relative projection error.
When training with both types of RVEs (i.e. for the mixed input set), the training seemed more random than the others.Some resulting ANNs (which are not shown) had more of the property of a circular training, whereas some were more like the rectangular training.Although the ANN was trained with general RVEs, it overall failed to give good predictions for every type of RVE when using more than two or three reduced coefficients.After training with the mixed set, the prediction of κ11 and κ22 was generally better for RVEs with rectangular inclusions, whereas κ12 was better for RVEs with circular inclusions.During the training, the ANN found most likely some strange local minimum which fitted the training data quite well, however it was not a general mapping for all the microstructure classes, which hints at a too low number of input samples, as discussed earlier.
The ANN architecture did not seem to have a big impact on the quality of the prediction, there have been ANNs with a single hidden layer and six hidden neurons which delivered results comparable to an ANN comprising six hidden layers with more than 100 hidden neurons.
The used activation functions were the sigmoid, relu, tanh and softplus, where only some combinations delivered poor results.Not a clear trend of ANN architecture and quality of prediction could be seen and, consequently, the best ANN were randomly found based on the lowest error on the test set.The prediction accuracies for each test set of three differently trained ANNs, which have been deemed the best, is given in Figure 11.As to be expected, the lowest errors are achieved on the diagonal, i.e. training set = validation set.
The training and architecture of the best ANNs in Figure 11 had the following properties:  2, again using the same three ANNs which have been explicitly shown.Note that since the values of κ12 vary closely around 0 (Figure 4), relative errors are not sensible for the quantity of interest.A GUI code is provided in Github, where the user can choose between the three proposed ANN, the input for the prediction is a 400 × 400 image in matrix format written in a text file or a TIFF image and the output is the prediction for the heat conduction tensor as described above.In order to compile the code, Python3 with Tensorflow is required.Some exemplary RVE with their respective heat conductivity are uploaded in a subfolder.

Summary and Concluding Remarks
The computational homogenization of highly heterogeneous microstructures is a challenging procedure with massive computational requirements.In the present study a method to efficiently and accurately predict the heat conductivity for any RVE with the image and no further information is proposed.Key ideas of the Materials Knowledge System (MKS) [21,27] have been adopted in the sense that a subset of the POD compressed 2-point correlation function is used to identify a low-dimensional microstructure description.In contrast to [27] the 2PCF is not truncated to a small neighborhood, but the full field information is considered.Similar to other works related to the MKS [18], a truncated PCA of the 2-point information is used to extract microstructural key features.
However, the classical truncated PCA used, e.g., in [18] is not applicable to the considered rich class of microstructures due to the high number of needed samples and the related unmanageable computational resources.Therefor, our proposal is founded on a novel incremental procedure for the generation of the RB of the 2PCF.Similar techniques have not been considered in the literature to the best of the authors' knowledge.The shifting of the images of 2PCF before entering the POD is another feature that can help in reducing the impact of the inclusion volume fraction, i.e. the shifted function has zero mean.
Other than in [27] no higher-order statistics are used.This is by purpose as the selection of the relevant entries of the higher order PCF is ambiguous and a challenge in itself.Instead, the present study focus on the variability of the input images in terms phase volume fractions in a broad range (20-80%) alongside topological variations (impenetrable, partial overlap, unrestricted placement) and different morphologies (circles and rectangles).Generally speaking, a much higher microstructural variation is accounted for, than in previous studies.Therefore, the current study also investigates how the proposed technique and similar MKS related approach can possibly generalize towards truly arbitrary input images (e.g.millions of snapshots) in order to built a super-database.
In order to cope with the variability of the 2PCF, the truncated PCA or snapshot POD effected during unsupervised learning phase is replaced by novel incremental procedures for the construction of a small-sized reduced microstructure parameterization.Three increment POD methods are proposed and their results are compared regarding the computational effort, the projection accuracy of the snapshots and the quality of the basis in view of capturing random inputs.
The learned reduced bases are used to extract a low-dimensional feature vector which denotes the input of a fully connected feed forward artificial neural network.The ANN is used to predict the homogenized heat conductivity of the material defined by the microstructure.The mean relative error of the surrogate is lower than 2% for the majority of the considered test data.This is remarkable in view of the phase contrast R = 5 and the particle volume fractions ranging from 0.2-0.8, as well as morphological and topological variations.Further, an immense speedup in computing time is achieved by the surrogate over FE or FFT simulations.
Importantly, the presented methodology can immediately be adopted to different physical settings such as thermo-elastic properties, fluid permeability, dielectricity constants etc.The same holds for three-dimensional problems.However, the limited number of samples in 3d could be problematic as more features are likely required to attain a sufficiently accurate RB.

Discussion and Outlook
A weakness of the current approach remains the computational complexity of the method: Although the feature vector is rather low-dimensional, it requires the evaluation of the 2PCF using the FFT which is of complexity O(n log(n)) where n is the number of pixels/voxels in the image.In order to extract the reduced coefficient vector from the 2PCF, the latter must be projected onto the RB.This operation scales with O(n N).Both operations are at least linear to the number of pixels/voxels of the image which can be critical, especially in three-dimensional settings.Therefor, future investigations aiming at a reduced effort for gathering the features are required in our opinion.
Another extension of the current scheme could account for variable phase contrast R which was fixed as R = 5 in this work.Thereby, the dimension of the feature vector will be incremented which can add to the already existing data scarcity dilemma observed when considering a sufficient number of reduced coefficients: the number of input samples for the supervised learning should grow exponentially with the dimension of the feature vector.In the authors opinion this dependence is the most pronounced short-coming of the method and future studies should focus on limiting the number of required input samples in order to fight the curse of dimensionality: more reduced coefficients require an exponential growth in the available data, making the offline procedure unaffordable, today.
Advantages of the current scheme comprise the independence of the underlying simulation scheme.This does allow for heterogeneous simulation environments, the use of commercial software, multi-fidelity input data and blended sources of information (e.g. in silico data supported by experimental results).

Figure 1 .
Figure 1.Depicting some exemplary microstructures with their respective 2-point spatial correlation functions c 2 (r; b, b) below

Figure 2 .
Figure 2. Graphical overview of the incremental update of the reduced basis

12 Figure 4 .
Figure 4.The range of each κ entry computed with 15000 microstructures is of the mixed set.Only 1000 discrete values are shown in each plot.

Figure 5 .
Figure 5. Development of the relative projection error P δ of the snapshots S 0 with respect to the relative enrichment progress.

Figure 6 .
Figure 6.Relative projection error for three different microstructure classes as a function of the number of eigenmodes.The relative projection error is determined for a validation set of 1500 newly generated microstructures for each class.

Figure 7 .
Figure 7. Using the RB of method C, the relative projection error on the shifted snapshot is given on the left for five random samples.For comparison the relative projection error of the reconstruction of the actual 2-point correlation function is given on the right for the same samples.

Figure 8 .
Figure 8.The development of a few selected eigenmodes over the enrichment are shown.Note that these results are generated with n a = 15 and ε = 0.01 using method C. The procedure comprised a total of 87 basis enrichments/adjustment.

Figure 9 .
Figure 9.The given error measures over the test sets are shown for the ANN which achieved the lowest MSE (cost) on the training set for each number of reduced coefficients and training type.

Figure 10 .
Figure 10.The mean absolute error (MAE) of κ 12 is given for each of the trainings and test sets.

Figure 11 .
Figure 11.Results for the best of all tested for the test sets.The graphs represent probability distribution of the absolute error in the components of κ. .
For an easier readability, the percentage mean and max errors for each ANN training and prediction are given in Table

Table 2 .
Percentage errors for κ11 and κ22 given for each of the best ANNs, validated on every test set.