A Two-Stage Reconstruction of Microstructures with Arbitrarily Shaped Inclusions

The main goal of our research is to develop an effective method with a wide range of applications for the statistical reconstruction of heterogeneous microstructures with compact inclusions of any shape, such as highly irregular grains. The devised approach uses multi-scale extended entropic descriptors (ED) that quantify the degree of spatial non-uniformity of configurations of finite-sized objects. This technique is an innovative development of previously elaborated entropy methods for statistical reconstruction. Here, we discuss the two-dimensional case, but this method can be generalized into three dimensions. At the first stage, the developed procedure creates a set of black synthetic clusters that serve as surrogate inclusions. The clusters have the same individual areas and interfaces as their target counterparts, but random shapes. Then, from a given number of easy-to-generate synthetic cluster configurations, we choose the one with the lowest value of the cost function defined by us using extended ED. At the second stage, we make a significant change in the standard technique of simulated annealing (SA). Instead of swapping pixels of different phases, we randomly move each of the selected synthetic clusters. To demonstrate the accuracy of the method, we reconstruct and analyze two-phase microstructures with irregular inclusions of silica in rubber matrix as well as stones in cement paste. The results show that the two-stage reconstruction (TSR) method provides convincing realizations for these complex microstructures. The advantages of TSR include the ease of obtaining synthetic microstructures, very low computational costs, and satisfactory mapping in the statistical context of inclusion shapes. Finally, its simplicity should greatly facilitate independent applications.


Introduction
Modeling of microstructures is undoubtedly an important area of research in materials science.This kind of synthetic microstructures can be obtained using theoretical models or statistical reconstruction based on real data.The latter usually contain incomplete microstructural information."An effective reconstruction procedure enables one to generate accurate structures at will, and subsequent analysis can be performed to obtain desired macroscopic properties of the media" -this sentence in Torquato book [1], p. 294, briefly describes why modelling of heterogeneous materials is so important in the computational materials science [2][3][4][5][6][7][8].Even prototype three-dimensional heterogeneous structures reconstructed approximately using limited statistical information coming from the cross-section of a given sample, are still an important element in modelling and predicting physical properties [9][10][11][12][13][14][15][16].
Usually, different methods of microstructure characterization and reconstruction reveal certain differences in microstructural features sensitive to length scale, cf.reconstructed speckle patterns in [17,18].This is important from the point of view of the macroscopic properties of the material, because some of them may be sensitive to subtle microstructural differences.Hence the need to develop specialized methods of statistical reconstruction.On the other hand, the advanced high-resolution imaging techniques such as transmission electron microscopy or tomographic analysis, expensive though they are regarding their applications, significantly support the modeling of the so-called structure/property relationships [19][20][21].
An important element in statistical reconstruction is the quantitative characterization of the associated multi-scale spatial inhomogeneity.To extract this kind of structural information, different types of descriptors or various reconstructing methods can be used, see e.g.[22][23][24][25][26][27][28][29][30][31][32][33][34][35].Some statistical descriptors naturally define the 'energy' cost function for the Simulated Annealing (SA) method preferred here [36][37][38].For structurally complex random composites, hybrid combination of two-point correlation function and cluster function [39] is particularly useful, as well as entropic measures of spatial inhomogeneity [40] and statistical complexity [41].The hybrid combination of the above descriptors has been successfully tested for patterns of poly-dispersed islands [42].However, as a general technique, the SA approach is rather slow.
In the current attempt, we deal with structurally similar pattern, but the optimization is devised in a completely new way.In particular, we focus on two-phase materials containing the 'black-phase' represented by well-isolated compact inclusions (granules, grains) of different sizes and any shapes that are dispersed in the 'white-phase' matrix treated as background, cf.Fig. 8 in [29].Usually for this type of polydispersity, it is difficult to obtain synthetic microstructures with similar macroscopic properties due to very variable spatial inhomogeneity even on large length scales.In addition, such reconstructions require a further reduction in computational costs.In this work, we try to solve both problems as follows.
The SA standard approach poses difficulties related to optimization of various spatial features of a microstructure during its reconstructing, e.g. when modifying clusters shape and their relative locations.Therefore, the division of these processes into separate stages, which are algorithmically easier, should significantly simplify and accelerate the entire process of statistical reconstruction.Interestingly, in relation to cluster microstructures [39] or random packings of hard convex lens-shaped particles [43], the authors used the idea of separating features directly related to the cluster properties and independent of their location by means of a two-point cluster function C2 from the features of the spatial distribution of clusters characterized by a two-point correlation function S2.However, these partially complementary statistical features were simultaneously calculated as part of multi-component objective function.In this spirit, focusing on the further development of the methodology of entropic descriptors within SA, we go one step further.Namely, we separate the process of creating synthetic clusters (equivalents of target inclusions) from the process of optimizing their spatial distribution.We call these steps the first and second stages, and the proposed approach is called two-stage reconstruction (TSR).
In the first, we prepare a set of synthetic clusters whose number and individual interfaces are consistent with the goal.For each of the expected synthetic clusters, the procedure begins with the initial quasi-rectangle of the required area.The designed random procedure then optimizes the specific two-component function to modify the interface and shape of the generated cluster.
At the second stage, instead of accidentally placing individual black pixels, we generate the several trial configurations but from the previously created synthetic clusters.From these configurations, we choose the one with the lowest initial energy value for the cost function defined by the extended ED for spatial inhomogeneity.Now, as part of the Monte Carlo approach, instead of single-phase pixels, previously prepared synthetic clusters enter the game.For this purpose, we use a special SA program tailored directly to any compact clusters.
During the TSR procedure, each randomly selected synthetic cluster can move in a random direction and with a random step length if its ending position does not coincide with other clusters.Each attempt is accepted or rejected according to standard SA conditions.However, these conditions apply directly to synthetic clusters, not pixels.The movements of entire synthetic clusters are key to speeding up the statistical reconstruction process.Our findings show that this type of optimization also has an additional advantage.The current TSR produces results of similar accuracy compared to the results obtained in the standard way by the combination of two-point correlation function S2 and cluster function C2 [39].The example of a reconstructed complex microstructure confirms the reliability of our method by using the linear path function.
The article has the following structure.First, we will introduce a specific procedure that creates a set of synthetic clusters with appropriate statistical properties.Thus, any of their random settings can be used as the initial configuration.Next, we present a suitable set of three entropic descriptors.These descriptors are involved in defining the energy cost function for discrete length scales associated with a given target pattern.Then, we test our approach on a morphologically complicated example of silica dispersed in a rubber matrix and, finally, summarize the main results.

Creating synthetic clusters -stage one
In this section, we briefly describe the first stage of our approach.In the starting point, we have a digitized binary image (target pattern) with black compact inclusions in a white matrix.To create a set of statistically equivalent synthetic clusters for all compact inclusions in a target pattern, we use two auxiliary functions, f1 and f2.For each of the compact inclusions, we minimize the difference of the respective interfaces using the simple function Here, Itarget and I are the values of the interfaces of the target inclusion and its current synthetic counterpart, respectively.At the same time, to achieve a certain degree of similarity in shape between the considered target inclusion and related synthetic cluster, we use the second function This function is defined as the sum of squares of normalized differences between the values htarget(i) and h(i) of respective surface histograms.Notice that index i relates to each of the groups of linear distances between the surface pixels, which belong to the subsequent intervals, [i, i+1).Each of the N intervals contains the respective counts for the related distances.
Keeping those functions in mind, the following practical approach is preferred.Suppose that area (in pixels) of a given target inclusion fulfils the inequality: n 2 < Atarget  (n+1) 2 , where n  0 is the corresponding integer.Then, the initial synthetic cluster in the form of a quasi-rectangle is created by pre-filling the square with a linear size (n+1) with the Atarget number of black pixels being unit elements of the main phase.Individual columns of the square are filled from the left to the right until the Atarget capacity is exhausted.Now, the considered synthetic cluster is modified according to some rules.To find the socalled central black pixel, we randomly select a black pixel belonging to the surface of the nontrivial synthetic cluster until the parameter Wcentr-black = 2.This value describes the needed number of walls among those marked by bold lines in Fig. 1 that separate the black pixels from the white pixels currently located in the Moore neighborhood.For the case shown in Fig. 1, according to the definition Wcentr-black = 4, the random selection of the central black pixel must be repeated.Similarly, the central white pixel belonging to the surroundings of the considered synthetic cluster is randomly selected until Wcentr-white = 2.After a successful selection of the central black pixel and central white pixel, the central pixels are swapped in this pair.Accordingly, the two above conditions exclude the risk of dividing the synthetic cluster into parts, as well as the appearance of pores within this cluster.Let Q be the current number of rejected attempts.We define Qmax  3(the number of surface pixels of the considered cluster) as the maximum number of subsequent rejections.Table 1 shows all possible cases during the process of creating a synthetic cluster.
Table 1 Summary activities for all possible situations when creating a synthetic cluster Summarizing, the function f1 never increases, while the function f2 can do so, but after a long series of unsuccessful attempts.Optimization of both goal functions is stopped when the arbitrarily set total number of attempts exceeds 310 3 Qmax.
In Figs.2a and b, we present the edges of the chosen exemplary target inclusion and its synthetic counterpart that was prepared using the above way.In addition, for the chosen length interval [i=4, i+1=5), the corresponding collections of htarget(i) and h(i) are displayed for the linear distances between edge pixels.For clarity, Fig. 3 shows relevant histograms of the linear distance between the edge pixels for the above clusters.It should be emphasized that the procedure for generating a basic set of synthetic clusters corresponding to a given set of target inclusions provides the same dimensionless shape index defined as q = (cluster area) 1/2 /(interface of cluster) in each pair containing real inclusion and its surrogate cluster.In this way, the procedure ensures that also the average shape index < q > is equal for both sets.This type of compatibility did not occur in our earlier models for creating initial cluster configurations [42].

Reconstruction using entropic descriptors -stage two
In this article, we present a statistical two-stage reconstruction (TSR) of microstructure using length scale dependent extended ED associated with the statistical properties of configuration of finite size objects [44].The basis of this method is the assumption that given ED can describe quantitatively the specific spatial properties of the microstructure for different length scales.More details with reference to binary patterns are given in the Appendix while for grey-level patterns see, e.g.[24,46].
The hybrid pair, {S(k), CS (k)}, is often used for multi-scale statistical reconstruction via entropic descriptors [24,42,44,45].The first function, S(k)  [Smax(k)  S(k)]/(k), can be treated as a mean measure of spatial inhomogeneity per cell [18,24,40].Here, S(k) is the current entropy for the real configuration, and Smax(k) denotes the entropy for the theoretically most homogeneous system, while (k) = [L  k + 1] 2 describes the number of partially overlapping sampling cells kk for a given length scale k.In turn, the second function CS (k)  S(k) (k) can be applied as a measure of the spatial statistical complexity [41], where the following Better structural accuracy of statistical reconstructions can be expected using a set of three functions, S(k)  (k) with  = 0, 1 and 2, which we will call extended ED-triplet, {S(k), S(k) (k), S(k) 2 (k)} [44].Some changes must be made to the preferred simulated annealing (SA) technique to make it compatible with our two-stage method for clusters, not pixels.First, the cluster SA technique should consider the modified cost function associated with the extended ED-triplet.The modified objective function E is treated here as the average "energy" per the descriptor curve and the length scale.The energy E is related to the quantity commonly used in statistics as a formal goodness of fit test [37].Here, the cost multi-scale function is the sum of the squared and normalized differences between the respective ED functions for current configuration and target pattern.The differences are normalized with respect to maximum target values.The corresponding simple formulas can be saved as , target target 0 For a given tolerance value, this combined objective function improves the structural accuracy of the reconstruction compared to the case where only one pair with one target curve was used.In particular, in Sec. 4 we consider a target sample of linear size L = 170 (in pixels).To reduce computational costs, we use every second scale of length k = 2, 4,…, 170 in Eq. ( 4).Thus, the actual number of analyzed length scales equals n = 85.After creating a set of {M} initial random configurations consisting of synthetic clusters (previously prepared according to the procedure described in Sec. 2), the value E(M) of the cost function ( 4) is calculated for each of these configurations.Then the configuration mi  {M} with the lowest value E(mi) is selected as the initial one.This ensures a good starting position for the next step.
As mentioned earlier, we adapted the SA technique in these studies to use it for entire synthetic clusters.Instead of swapping pixels in different colors, each attempt concerns a synthetic cluster picked at random.It can move in a random direction and at a distance within the allowable range of step length if the synthetic cluster is not in contact with any of the other clusters.This point is crucial to speed up the process of statistical reconstruction.Further steps are quite typical.For a given temperature, loop and current configuration with Eold energy, the next configuration with Enew energy (called the new state of the system) is generated by randomly changing the position of a selected synthetic cluster.This is accepted with probability p() given by the Metropolis-MC acceptance rule [1] where  = Enew  Eold is the change in the energy between two successive states.After acceptance, the trial configuration becomes a current one and the procedure is repeated.A quite aggressive cooling schedule, T(l) ⁄ T(0) = 0.82, was used for temperature loops of increasing length.Here l numerates the loops, the initial fictitious temperature is T(0) = 510 -5 and for the chosen tolerance value  = 710 5 we get sufficiently accurate reconstructions.In the next section, we will apply this method for a difficult case from the viewpoint of standard statistical reconstruction using SA technique.Our method is particularly convenient to use for this type of aggregated isotropic materials that contain highly irregular inclusions.

Numerical example
In this part, we would like to examine our approach using the example of a microstructure with 20.04 % silica in a rubber matrix adapted from [29].In the cited article, the author introduces and tests a general methodology for characterization and reconstruction based on supervised learning that can be applied to a wide range of microstructures.For our purposes, we select a representative section 170170 treated further as the target pattern; see Fig. 4a.It is clearly seen that the black phase is characterized by non-uniform spatial distribution of 113 irregular silica inclusions.The corresponding collection of synthetic clusters provides the generating procedure described in Sec. 2 with the same average shape index per cluster < q > = < qtarget > = 0.20.From several random trial configurations, we choose the one with the lowest initial energy value for the cost function defined by formula (4).In this article, we construct the cost function using a family of three extended multi-scale entropic descriptors.Accordingly, in Fig. 4b we present the synthetic clusters configuration chosen as the initial one with the lowest value of the cost function, Estart = 710 3 .Now, we are ready to begin the second stage of the statistical reconstruction process.Using the SA program developed directly for clusters we obtain an optimized microstructure corresponding to the first appearance of such a value of the cost function (4), which ensures adequate reconstruction accuracy; see Fig. 4c.For our purposes, the appropriate tolerance value  = 710 5 is sufficient to obtain the energy value final = 6.710 5 < .In addition, two small frames (the orange online) focus our attention on a certain compact inclusion in Fig. 4a and its corresponding synthetic cluster in Fig. 4c.In turn, Figs.5a and b shows the quality of the TSR.A comparison of the extended entropic descriptors {S(k)  (k)} with  = 0, 1 and 2 for the target pattern (cf.Fig. 4a), the solid lines, with those for the optimized microstructure (cf.Fig. 4c), the open circles, is presented in Fig. 5a.The ED values were computed using hard-wall boundary conditions (HWC).The results related to the initial configuration (cf.Fig. 4b) are represented by the dashed lines.The SA technique for clusters provides an optimized microstructure with sufficiently good fitting to the target curves after just 792 accepted MC-steps.This suggests the high efficiency of the proposed method of TSR for microstructures with arbitrarily shaped compact inclusions.Additionally, Fig. 5b reveals the interesting confirmation of the quality of our TSR.When two-point correlation function S2(k) and orthogonal lineal-path function L(k) are computed for the initial configuration, microstructure of target and finally optimized pattern by mean of TSR, the corresponding values show a satisfactory agreement.Note that within HWC, the values of L(k) calculated for the initial and optimized configuration are the same.Thus, only the dashed and solid lines are compared for this function.
Moreover, because the values of L(k) do not depend on the spatial distribution of these clusters, it seems reasonable to carry out the preliminary selection among a few generated sets of synthetic clusters.For example, as the preferred basic set, we can choose the one (*) for which we obtain the smallest value of the sum over k for [L * (k)  Ltarget(k)] 2 .Only now do we use (4) to select the minimum value of the cost function E(M * ) calculated for each of the prepared M * random configurations of synthetic clusters belonging to the preferred set (*).This initial configuration would be optimal at the second stage for future applications of modified SA for clusters.
It is noteworthy that the SA modified for clusters shows a significant advantage over other methods when the microstructure contains clusters relatively large in comparison with the sample size.This observation is suggested by preliminary results obtained for a sample of concrete originally analyzed in [39].On the other hand, the current approach can also be adapted to the reconstruction of porous microstructures provided we know at least the complete interface.Finally, our method can also be extended to a more general case of non-compact clusters and to three dimensions.
Shortly after this study had been completed, the authors became aware of the existence of a conceptually similar article [47].The algorithm proposed there is a three-step process that generates statistically equivalent representative volume elements (RVE) of the size 150150150 (in pixels) from the particle shape repository.This repository reflects realistic microstructures obtained from micro-computed tomography scanning of mechanoluminescent material.At the final stage, the difference between the histograms of the targeted and reconstructed cumulative distribution function is minimized by the SA technique.Using the S2(r) for verification of their method, the authors of [47] present Fig. 10 with two diagrams.
(N.B.The corresponding descriptions, 11 and 12, of ordinate axes should be interchanged.) According to the authors, a slight gap between the reconstructed and the target model occurring in the characteristic range of distances, 10 < r < 30, could be explained with the stochastic characteristics of the reconstruction and small differences of particle sizes.
In our method applied to the 170170 target pattern, such an effect for the S2(k) in the characteristic range of length scales, 5 < k < 20, is negligible, cf.Fig. 5b.It should be emphasized that the areas and interfaces of the target inclusions are accurately reproduced by synthetic clusters.Nevertheless, their shapes differ in the stochastic context.The final planar configuration of synthetic clusters is optimized within the modified SA.

Conclusions
We offer an innovative two-stage method that can be used to generate statistically equivalent synthetic microstructures.This approach uses selected length scale dependent entropic descriptors.Particular emphasis was placed on binary microstructures with arbitrarily shaped inclusions.For a given binary target pattern, the proposed method for reconstructing any heterogeneous microstructure with truly irregular inclusions is divided into two basic stages.At the first stage, a set of synthetic clusters is created, whose number corresponds to the target pattern.The procedure used ensures the same clusters areas and interface values as in the target pattern.This way individual shape indices as well as their average value per cluster are also preserved.Thus, some significant statistical characteristics of the target pattern are taken into account.At the second stage, the approach uses the simulated annealing technique adapted to clusters.As a result, the computational cost in the two-stage method is much lower compared to the standard approach.Our method has been tested on an adapted sample of silica inclusions of various sizes and very irregular shapes widely dispersed in a rubber matrix.Final statistical reconstructions suggest that even for morphologically complex material, this low cost method easily generates statistically reliable synthetic patterns.Due to the packaging problem, with unusual or branched shapes of inclusions, the second stage of our method is effective when the concentration of inclusions is less than 0.5 or slightly exceeds this value.In addition, two-point correlation functions calculated for comparison purposes for a given reconstruction and target pattern are in satisfactory agreement.This further confirms the flexibility of our approach.We believe that the method presented here can be useful for fast and low-cost reconstruction of a wide range of cluster microstructures.

Appendix
For better readability, we briefly present the most important details related to binary entropy descriptors (ED) defined for black pixels treated here as unit objects.The ED make use of micro-canonical configurational entropy S(k) = ln  (k), its maximum possible value, Smax(k) = ln max(k), as well as the minimum one, Smin(k) = ln min(k).Here the Boltzmann constant kB = 1.The length-scale is determined by the side length of a square sampling cell of the size k  k sliding by a unit lattice constant.Thus, the number of allowed positions for the sliding cell is equal to (k) = [L  k + 1] 2 , which leads to a set of cell occupation numbers {ni(k)}, i = 1, 2,…, (k).To simplify the notation, we skip the symbol k whenever possible.
"Maps" composed of sampled cells placed in a non-overlapping manner can be regarded as representative because they clearly reproduce on each scale k the overall structure of the real initial sample.Such an approach allows computing the actual entropy S(k) related to the configurational actual macrostate AM(k)  {ni  (0, 1,..., k 2 )} realized by the number of microstates The entropic descriptor Squantifies the averaged per cell pattern's spatial inhomogeneity (a measure of configurational non-uniformity) by taking into account the average departure of a system's entropy S from Smax.When a system's actual entropy S  Smin, the spatial inhomogeneity becomes maximal.For a more detailed spatial analysis of a given binary pattern, the simplest hybrid approach using a pair {S, CS} can be recommended.Here, the CS -descriptor measures the so-called statistical spatial complexity.The entropic descriptor CS  S is able to quantify the complex spatial behaviour by taking simultaneously into account the average departures of a system's entropy S from both its maximum possible value Smax and its minimum possible value Smin [41].This becomes clear after recalling the definition If the two departures, Smax  S and S  Smin, are comparable then statistical complexity is maximal.Of course, we can use similar ideas to obtain grey-level equivalents of the above ED, which can be useful for multi-phase materials [18,24,46].

Fig. 1 .
Fig. 1.Sketch explaining the meaning of the auxiliary parameter Wcentr-black for a case where its value equals 4.

Fig. 2 .
Fig.2.The two collections htarget(i) and h(i) for distances, marked by the orange lines, within [i=4, i+1=5) between the edge pixels are presented: (a) for the chosen target inclusion with 95 connections, and (b) for the synthetic cluster with 87 connections.Only the edge pixels affecting the value of the related interfaces are sketched.Both clusters have area=143 and interface=76, so their shape index q  0.157 (the definition of q used is given in the text).

Fig. 3 .
Fig.3.The histograms of linear distances between the edge pixels, htarget(i), the drop black line, and h(i), the drop grey line, for the chosen target inclusion and the related synthetic cluster from Fig.2, respectively.The common value htarget(i=0) = h(i=0) corresponds to the number of the edge pixels.

Fig. 5 .
Fig. 5. Accuracy of the TSR of the microstructure with compact silica inclusions in a rubber matrix (adapted from [29]) supported by a comparison of the values of related initial, target and optimized functions: (a) extended ED-triplet that is S(k)  (k) with  = 0, 1 and 2 for hard-wall boundary conditions (HWC); (b) two-point correlation function S2(k) for periodic boundary conditions (PBC) in both directions and orthogonal lineal-path function L(k) for HWC.
possible value Smax(k) is associated with the most uniform macrostate RMmax(k)  {n i  (n 0 , n 0 + 1)}max, with   r 0 number of cells occupied by n 0  (0, 1, ..., k 2  1) and r 0 cells populated by n 0 + 1 of black pixels.The number of RMmax-realizations is given by 0 = N mod  and n 0 = (N  r 0 )/ .In turn, the reference minimum possible value Smin(k) corresponds to the most spatially inhomogeneous macrostate RMmin(k)  {ni  (0, 0 < n < k 2 , k 2 )}min, with   q0  1 of empty cells, at the most one cell occupied with the number n of pixels and q0 of fully occupied cells.The obvious relation holds:N = n + q0 k 2 ,where n = N mod k 2 , q0 = (N  n)/k 2 and q0  (0, 1, ...,   1) and the number of RMminmicrostates can be written as