InfPolyn, a Nonparametric Bayesian Characterization for Composition-Dependent Interdiffusion Coefficients

Wei W. Xing; Ming Cheng; Kaiming Cheng; Wei Zhang; Peng Wang

doi:10.3390/ma14133635

Abstract

Composition-dependent interdiffusion coefficients are key parameters in many physical processes. However, finding such coefficients for a system with multiple components is challenging due to the underdetermination of the governing diffusion equations, the lack of data in practice, and the unknown parametric form of the interdiffusion coefficients. In this work, we propose InfPolyn (Infinite Polynomial), a novel statistical framework to characterize composition-dependent interdiffusion coefficients. Our model is a generalization of the commonly used polynomial fitting method with extended model capacity and flexibility, and it is combined with the numerical inversion-based Boltzmann–Matano method for interdiffusion coefficient estimation. We assess InfPolyn on ternary and quaternary systems with predefined polynomial, exponential, and sinusoidal interdiffusion coefficients. The experiments show that InfPolyn outperforms the competitors, the SOTA numerical inversion-based Boltzmann–Matano methods, by a large margin in terms of relative error (10× more accurate). Its performance is also consistent and stable, while the number of samples required remains small.

Keywords:

Gaussian process; nonparametric Bayesian; interdiffusion coefficient; Boltzmann–Matano analysis

1. Introduction

In many industrial processes that involve diffusion, e.g., alloy solidification, heat treatment, coating, and electronic packaging, the characterization of composition-dependent interdiffusion coefficients is a crucial task, as it quantifies the diffusion process. The classic approach is based on Boltzmann–Matano analysis [1,2], which transforms the diffusion system into a linear system of equations. However, the Boltzmann–Matano analysis is only applicable to a binary system and becomes problematic for a system with three or more components, as it generates an under-determined system of equations that mathematically does not yield a unique solution. To address such a challenge, a number of methods have been developed over the years. Kirkaldy et al. [3] introduced the Kirkaldy–Matano method and provided extra equations to the linear system by adding additional M diffusion paths with intersection points. Although the results have been shown to be accurate, this method cannot generalize well to a multi-component system because the difficulty in experimentally generating intersection points grows drastically with the number of components

M + 1

[4]. Alternatively, methods based on one diffusion couple were proposed. Dayananda and Sohn [5] suggested integrating over certain composition ranges along the diffusion path to evaluate an average interdiffusion coefficient. Cermak and Rothova [6] later extended this method by choosing an infinitely small integration interval. Nevertheless, as pointed out by Cheng et al. [7], the integration approach can lead to ill-conditioned problems. A pseudobinary approach was also introduced, which considers only two components diffusing into the diffusion zone. This method takes advantage of its time independence in the first-order linear equations and thus is very efficient when the pseudobinary condition is strictly satisfied in experiments. In practice, such experimental conditions may be difficult to meet, and in addition, for a multi-component system with a limited number of experimental samples, the linear equations are not capable of eliminating the extra solutions [8,9]. Separately, Zhang and Zhao [10] suggested a forward-simulation approach that iteratively optimizes the interdiffusion coefficients with repeated forward simulations, similar to the classic inference approach for inverse problems. Although such a method is shown to be accurate and stable, it incurs an overwhelming computational cost because each iteration requires a complete diffusion simulation with a fine spatial-temporal grid.

Another branch of the one-diffusion-couple method lies in assuming a polynomial functional form for the interdiffusivities. Ideally, with a proper design of the polynomial function, one can compute the coefficients of the polynomial functions to estimate the interdiffusion coefficients using a numerical inverse method [11]. This numerical inverse approach was adopted by Chen et al. [4], who included the atomic mobility [12] to study diffusion in the solution phase of a multicomponent system. Cheng et al. [13] recast the original parabolic inverse problem [11] as a linear multi-objective optimization to improve computational efficiency while maintaining similar accuracy. The optimization algorithm places weak limits on the experimental samples and has been applied to the interdiffusivities of solid solutions as well as various alloy systems [14,15]. This approach was recently improved by Qin et al. [16], who suggest solving an underdetermined linear system using compressed sensing, a popular regularization technique, to increase stability against high-order polynomial functions. However, the

L^{1}

penalty imposed by compressed sensing may introduce inappropriate prior assumptions, leading to inferior overall performance. We will examine this issue in detail in the experiment section.

Despite the notable performance and the popularity of the polynomial functional interdiffusion coefficient approaches, they share a fatal issue: how does one design the polynomial functions? Considering a quaternary system (

M = 3

), we have

3 \times 3

polynomial functions requiring careful designs; modifying one function will affect the results of the other two. The challenge grows quadratically with the number M. Without proper design and repeated validations, the polynomial approach will lead to overfitting or underfitting, making this approach infeasible in practice.

One way to resolve this challenge is to use a complicated enough model with many polynomial terms and utilize classic Bayesian inference techniques [17] to estimate the posterior of the polynomial coefficients. In particular, Girolami [18] proposed an interesting Markov chain Monte Carlo (MCMC) for nonlinear and complex differential equations where the fully analytic expressions for the posterior distribution do not exist, which is similar to our problem. Despite its elegance and great accuracy, an MCMC approach often suffers from slow convergence and poor mixing, making it less practical for complex applications. To improve inference efficiency, the approximate Bayesian computation (ABC) and their variations, e.g., MCMC ABC and sequential Monte Carlo ABC (SMC ABC) are put forth by Alahmadi et al. [19]. However, despite being accurate and easy to implement, these types of sampling methods do not scale well with the number of parameters to be inferred. With unknown polynomials, the large number of parameters makes such methods impractical even with the latest accelerated variations [20,21].

Recently, the Gaussian process (GP) [22] has been utilized in dealing with data that are generated from a system of differential equations. As a black-box regression model, the GP has been proposed for fast parameter posterior estimation using the derivative information of the differential equations, even with partially observed data [23]. The explicit derivative information is further utilized to improve a general GP's performance for data that are generated from differential equations [24]. The derivative in a given system of differential equations is further harnessed through a constraint manifold such that the derivatives of the Gaussian process must match an ordinary differential equation (ODE) [25]. Despite their success, these works generally require explicitly known differential equations. Thus, they cannot be applied directly to our problem.

A closely related work is [26], where GP is used as a generalization for a parametric function for binary images. However, their work cannot be directly implemented in our problem because our systems of equations will lead to a mixture of GPs that are augmented by the derivative of concentrations, whereas there is normally only one GP to estimate in most of the previous works [23,24,25,26,27].

To address the challenge of stable characterization of the interdiffusion coefficients, we introduce InfPolyn (Infinite Polynomial), a nonparametric Bayesian framework for the characterization of composition-dependent interdiffusion coefficients.

In particular, we first extend the general polynomial fitting method with an infinite number of polynomial terms. We then integrate out the polynomial coefficients with a Gaussian prior to derive a nonparametric functional form for the interdiffusion coefficients. To further improve our model with prior assumptions about an interdiffusion system, we introduce a diagonal-dominant prior for the functions of the interdiffusion coefficients. Unlike in most Bayesian fitting problems, the interdiffusion coefficients are not directly observable. Thus, we introduce latent variables, the virtual ghost interdiffusion coefficients, to address this issue. Finally, we derive a tractable joint likelihood function for model training. We compare InfPolyn with the state-of-the-art Matano-based numerical inverse methods and their variations. In ternary and quaternary systems with polynomial, exponential, and sinusoidal interdiffusion coefficients, InfPolyn shows a significant improvement over the competitors in terms of relative errors. In most of the experiments, our model shows excellent performance with only 40 EPMA measurements, which is very desirable in practical interdiffusion coefficient estimation.

Essentially, InfPolyn is a functional estimation method tailored for the characterization of interdiffusion coefficients by imposing a mixture of the SOTA nonparametric models, GPs, and particular prior knowledge. Unlike the classic Bayesian inference approaches [18,19], InfPolyn does not require a time-consuming sampling process and is thus much more efficient. The highlights of this work for interdiffusion coefficient characterizations are as follows:

InfPolyn does not require assumptions for the particular functional form of the interdiffusion coefficient; it is robust against overfitting and underfitting.
InfPolyn does not require a significant number of training data.
Prior knowledge of the interdiffusion system can be added easily in the framework of InfPolyn.

We hope the success of the nonparametric Bayesian framework can inspire more interesting applications in other interdiffusion coefficient estimation methods, e.g., the forward-simulation approach [10], in the materials community. Thus, we publish our code and will maintain it as an open source toolbox on GitHub (https://github.com/wayXing/InfPolyn, accessed on 26 June 2021).

The rest of this paper is organized as follows. The interdiffusion coefficient estimation problem is introduced in Section 2, followed by a brief summary of the Matano–Boltzmann numerical inverse method with polynomial functions in Section 3. Our method is presented in Section 4, including the derivation, prior knowledge assumptions, and model training. The comparisons to the other SOTA methods on ternary and quaternary systems are demonstrated in Section 5. Finally, Section 6 summarizes our work.

2. Statement of the Problem

We first formulate our problem mathematically as a foundation of this work. Consider a general one-dimensional diffusion system with

(M + 1)

components. According to Fick’s second law [28], the diffusion process is fully characterized by

\begin{matrix} \frac{\partial c^{i}}{\partial t} = \nabla (\sum_{j = 1}^{M} D_{i j} \nabla c^{j}), i = 1, \dots, M, \end{matrix}

(1)

where

\nabla

is the partial derivative operator,

c^{i}

is the concentration of component i (note that the concentration is a function of space and time

c^{i} (t, x)

);

D_{i j}

is the interdiffusion coefficient w.r.t. the concentration gradient of component j. In many textbook examples,

D_{i j}

is assumed constant, but in practice,

D_{i j}

depends on the concentrations of all components

c = {(c^{1}, \dots, c^{M})}^{T}

. Our goal is to find

D_{i j} (c)

for all

i, j = 1, \dots, M

with, ideally, a concentration profile

C = {(c {(t_{e}, x_{1})}^{T}, \dots, c {(t_{e}, x_{N})}^{T})}^{T} \in R^{N \times M}

at some terminal time

t_{e}

and spatial locations

{x_{n}}_{n = 1}^{N}

, where N is the number of sampling points at different locations. To avoid clutter, we denote

c_{n} = c (t_{e}, x_{n})

. One may notice that an important factor, temperature, is not considered in the formulation. This is due to the general process of the experiment. To conduct the experiment and obtain the concentration profile, one first bonds two blocks of materials together and holds them at a certain temperature to activate interdiffusion at the initial interface. The annealing procedure may last from hours to days, depending on the speed of forming an interdiffusion zone wide enough for analysis. The temperature remains constant during the long-lasting annealing process except for the beginning and ending stages, which take only a short time. Thus, the temperature is considered constant for the interdiffusion coefficient characterization. To characterize just one diffusion couple, around 50–100 sample points are often selected along a line parallel to the direction of element diffusion within the interdiffusion zone. Each sample point is analyzed through electron probe micro-analysis (EPMA), which requires several minutes for the equipment to detect the concentrations. As a result, the experiment is time-consuming, and only a small number of samples, i.e., small N, can be provided.

3. Boltzmann–Matano Polynomial Interdiffusion Coefficients

We follow the original work of the Boltzmann–Matano method [2], which is widely used to extract concentration-dependent interdiffusion coefficient

{D_{i j}}

from experimental concentration profiles. The Boltzmann–Matano method first integrates Fick’s law of diffusion (1) in time to obtain the following system,

\begin{matrix} \frac{1}{2 t} \int_{0}^{c^{i}} (x - x_{0}) d c^{i} = - \sum_{j = 1}^{M} D_{i j} \nabla c^{j}, i = 1, \dots, M, \end{matrix}

(2)

where

c^{i}

denotes the terminal concentration of component i,

\nabla c^{j}

is the concentration gradient, and

x_{0}

is the known Matano plane, defined by

\begin{matrix} \int_{- \infty}^{x_{0}} (1 - c (x)) d x = \int_{x_{0}}^{+ \infty} c (x) d x . \end{matrix}

(3)
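As a minimal sketch of how the Matano plane of Equation (3) might be located numerically, the snippet below assumes a normalized profile that falls from 1 on the far left to 0 on the far right. Under that assumption, Equation (3) rearranges to x_0 = x_left + ∫ c dx over the sampled domain. The function name `matano_plane` and the error-function test profile are illustrative assumptions, not part of the original work.

```python
import math

def matano_plane(x, c):
    """Locate the Matano plane x0 of Equation (3) for a normalized profile.

    Assumes c falls from 1 (far left) to 0 (far right) on an increasing grid x.
    Under that assumption Equation (3) reduces to x0 = x[0] + integral of c dx.
    """
    area = 0.0
    for i in range(len(x) - 1):
        # Trapezoidal rule on each grid interval.
        area += 0.5 * (c[i] + c[i + 1]) * (x[i + 1] - x[i])
    return x[0] + area

# Hypothetical test profile: an error-function step centred at x = 0.3.
xs = [-1.0 + 2.0 * i / 400 for i in range(401)]
cs = [0.5 * math.erfc((xi - 0.3) / 0.1) for xi in xs]
x0 = matano_plane(xs, cs)
```

For this symmetric test profile the recovered plane should sit close to the centre of the step; measured profiles would first need to be rescaled to the [0, 1] range.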

For a binary system, i.e.,

M + 1 = 2

, there is only one composition-dependent interdiffusion coefficient

D_{11}

to determine with one diffusion couple. Based on Equation (2), we can directly compute

D_{11} (c_{n})

for

n = 1, \dots, N

and then use any curve-fitting method to characterize the function of

D_{11} (c)

. For a ternary system, i.e.,

M = 2

, we need to determine

D_{i j} (c)

for

i = {1, 2}

and

j = {1, 2}

. For each sample

c_{n}

, we can write only two equations whereas there are four unknown parameters. This is an underdetermined system of equations to solve and will lead to multiple solutions. An effective and efficient solution is to assume a continuous function of interdiffusivity in a polynomial form, e.g., an independent quadratic form,

D_{i j} (c) = w_{i j}^{(0)} + \sum_{m = 1}^{M} (w_{i j}^{(m)} c^{m} + w_{i j}^{(M + m)} {(c^{m})}^{2}),

(4)

where w is the weight coefficient in the polynomial function. Denote the flux of the L.H.S. of Equation (2) as u: we have

u^{i} = (u_{1}^{i}, \dots, u_{N}^{i})

, where

u_{n}^{i} = \int_{0}^{c^{i} (x_{n})} (x - x_{0}) d c^{i} / 2 t

. Estimation of

D_{i j}

for

j = 1, \dots, M

can then be computed by solving the system of equations

u_{n}^{i} = - \sum_{j = 1}^{M} D_{i j} (c_{n}) \nabla c_{n}^{j},

(5)

where

D_{i j} (c_{n})

is the polynomial function fully determined by its weight coefficients given a particular functional form and

c_{n}

. All weight coefficients

W = {w_{i j}^{k}}

in the polynomial functions can be computed by solving the optimization problem,

\underset{W}{argmin} \sum_{n = 1}^{N} {∥u_{n}^{i} + \sum_{j = 1}^{M} D_{i j} (c_{n}) \nabla c_{n}^{j}∥}^{2},

(6)

where

{∥\cdot∥}^{2}

denotes the

L^{2}

norm, which can be replaced with other norms.
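The binary case of Equation (2) can be checked numerically before moving to the multicomponent setting. The sketch below is an illustration under stated assumptions: the grid is increasing, the profile decays to zero at the right end, and the Matano plane x_0 is already known. The function name `boltzmann_matano_binary` and the test values (constant D = 0.01, t = 1) are hypothetical; for a constant diffusivity the profile is an error function, so the recovered values should be close to the constant.

```python
import math

def boltzmann_matano_binary(x, c, x0, t):
    """Return D(c_n) at interior grid points for a binary couple (Equation (2), M = 1).

    Assumes x is increasing and c decays to 0 at the right end of the grid.
    """
    n = len(x)
    # integral[k] approximates the integral of (x - x0) dc from c = 0 up to c[k],
    # accumulated from the right end where c is (nearly) zero.
    integral = [0.0] * n
    for k in range(n - 2, -1, -1):
        mid = 0.5 * (x[k] + x[k + 1]) - x0
        integral[k] = integral[k + 1] + mid * (c[k] - c[k + 1])
    d = {}
    for k in range(1, n - 1):
        grad = (c[k + 1] - c[k - 1]) / (x[k + 1] - x[k - 1])  # central difference
        if abs(grad) > 1e-6:  # skip flat regions where the ratio is ill-defined
            d[k] = -integral[k] / (2.0 * t * grad)
    return d

# Hypothetical constant-diffusivity test: c = 0.5 * erfc((x - x0) / (2 sqrt(D t))).
d_true, t_end = 0.01, 1.0
xs = [-2.0 + 4.0 * i / 800 for i in range(801)]
cs = [0.5 * math.erfc(xi / (2.0 * math.sqrt(d_true * t_end))) for xi in xs]
est = boltzmann_matano_binary(xs, cs, 0.0, t_end)
```

The estimates are only trustworthy where the gradient is appreciable, which is one reason the analysis is usually restricted to the interior of the interdiffusion zone.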

Remark 1.

Since the estimation of

D_{i j} (c)

for each

i = 1, \dots, M

only depends on

u^{i}

and is computed independently, we omit the index i and reformulate the Matano–Boltzmann method with polynomial interdiffusion coefficients to avoid clutter,

\underset{W}{argmin} \sum_{n = 1}^{N} {∥u_{n} + \sum_{j = 1}^{M} d_{n}^{j} \nabla c_{n}^{j}∥}^{2} = \underset{W}{argmin} \sum_{n = 1}^{N} {∥u_{n} + \nabla c_{n}^{T} d_{n}∥}^{2},

(7)

where

u_{n}

is the flux for any arbitrary component, and

\nabla c_{n}^{j}

is the concentration gradient for component j, both of which are computed from the profile

C

;

d^{j} (c_{n})

is the j-th column of any arbitrary row of

D_{i j} (c)

that matches the chosen flux at concentration

c_{n}

;

d_{n} = {(d^{1} (c_{n}), \dots, d^{M} (c_{n}))}^{T}

is the collection. We aim to reveal

d^{j} (c)

for

j = 1, \dots, M

.
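To make the least-squares problem of Equation (7) concrete, the following sketch fits the weights of a first-order polynomial model for M = 2 by solving the normal equations. It is a minimal illustration, not the authors' implementation: the feature map (1, c^1, c^2), the helper names, and the synthetic weights are all assumptions chosen for the example.

```python
import random

def features(c):
    """Hypothetical linear feature map (1, c^1, ..., c^M)."""
    return [1.0] + list(c)

def design_row(c, grad):
    """Row of the linear system: grad^j * features(c), stacked over j."""
    phi = features(c)
    return [g * p for g in grad for p in phi]

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * w[k] for k in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_weights(cs, grads, us):
    """Least-squares minimizer of sum_n (u_n + grad_n^T d_n)^2 over the weights."""
    rows = [design_row(c, g) for c, g in zip(cs, grads)]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * (-u) for r, u in zip(rows, us)) for i in range(p)]
    return solve(ata, atb)

# Synthetic check with made-up weights (not values from the paper).
rng = random.Random(0)
w_true = [1.0, 0.2, -0.1, 0.05, 0.3, 0.1]
cs = [(rng.random(), rng.random()) for _ in range(40)]
grads = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
us = [-sum(a * w for a, w in zip(design_row(c, g), w_true)) for c, g in zip(cs, grads)]
w_hat = fit_weights(cs, grads, us)
```

In this synthetic setting the concentrations and gradients are drawn independently, so the system is well conditioned. Along a real diffusion path the samples are strongly correlated, and the same normal equations can become nearly singular, which is where the choice of polynomial form starts to matter.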

Optimization for Polynomial Fitting

Equation (7) is a convex optimization problem provided that we have

N \geq 3 (K + 1) M

EPMA samples and we use a K-order polynomial function of Equation (4) for all

d^{j} (c)

; the closed-form solution is presented in Appendix A. Such a sample size is certainly impractical for large M and/or K. In this case, regularization techniques, e.g.,

L^{2}

-norm minimization or compressed sensing, can be implemented to solve such an underdetermined system. The polynomial fitting approach with regularization is efficient in terms of computational time, space complexity, and implementation simplicity, thanks to many excellent software solutions, e.g.,

l_{1}

-magic, SPGL1, and SeDuMi [29,30,31].

4. InfPolyn for Interdiffusion Coefficients

The challenge of the discussed polynomial based approach is the lack of guidelines on how to build the model, i.e., the selection of the order of the polynomial and the polynomial form. It is unclear how many polynomial terms are needed for each diffusion coefficient such that the model is not overfitting or underfitting. Although regularization techniques [16] can be implemented, the underlying assumptions of regularization are unclear, which can lead to unexpected performance. We need a systematic way to specify the diffusion coefficients with correct prior knowledge in order to achieve better results. To this end, we propose a nonparametric Bayesian approach that is flexible enough to capture the complex nonlinear relation while restricting itself from overfitting the data by integrating all possible solutions.

4.1. Infinite Order Polynomial Model

To start with, we write the polynomial regression, e.g., Equation (4), in a compact form,

d^{j} (c) = w_{j}^{T} ϕ_{j} (c) + β_{j},

(8)

where the polynomial terms are denoted compactly as

ϕ_{j} (c) = {(c^{1}, {(c^{1})}^{2}, \dots, c^{2}, {(c^{2})}^{2}, \dots)}^{T}

, where

ϕ_{j} (\cdot)

is the predefined feature mapping that encodes the polynomial functional form. Essentially, we can project the concentration

c

onto an

r

–dimensional feature space using an arbitrary mapping

ϕ_{j} (c) \in R^{r}

. Note that the constant term can also be absorbed into the feature mapping by setting the first element as 1. In the linear model case, the feature mapping is simply

ϕ_{j} (c) = {(1, c^{T})}^{T}

.

Obviously, this polynomial approach is only accurate and stable when we roughly know the functional form of

ϕ_{j} (c)

. Furthermore, it requires a large number of parameters

{w_{j}, β_{j}}

to be estimated. Rather than estimating the weight parameters, we consider a multivariate Gaussian prior for the weight vector

w_{j}

,

w_{j} \sim N (0, Ω_{j}) = \frac{1}{\sqrt{{(2 π)}^{r} | Ω_{j} |}} exp (\frac{- w_{j}^{T} Ω_{j}^{- 1} w_{j}}{2}),

(9)

where,

Ω_{j} \in R^{r \times r}

indicates the correlation between the weight components. We then integrate out the weights and directly work with the marginal, which admits a closed-form solution,

\begin{matrix} p (d^{j} | c) & = \int p (d^{j} | w_{j}, c) p (w_{j}) d w_{j} \\ & = \int (w_{j}^{T} ϕ_{j} (c) + β_{j}) N (w_{j} | 0, Ω_{j}) d w_{j} \\ & = N (β_{j}, {(ϕ_{j} (c))}^{T} Ω_{j} ϕ_{j} (c)) . \end{matrix}

(10)

This is also known as the Gaussian process (GP) [22]. If we use a countably infinite feature space, i.e.,

r \to \infty

, we formally define a sum over infinite polynomial terms. Thus, we call our model InfPolyn, infinite polynomial. Our model now becomes a nonparametric model that contains no explicit parameters

w_{j}

. The model parameters are now encoded in

{(ϕ_{j} (c))}^{T} Ω_{j} ϕ_{j} (c)

, which indeed indicates an inner product in the feature space spanned by

ϕ_{j} (c)

.
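The closed-form variance in Equation (10) can be sanity-checked by sampling. The sketch below draws weights from a Gaussian prior with a diagonal covariance, evaluates the polynomial at one concentration, and compares the empirical variance with ϕ^T Ω ϕ. The two-term feature map, the diagonal Ω, and the numerical values are assumptions made for illustration only.

```python
import random

def phi(c):
    """Hypothetical two-term feature map (c, c^2) for a scalar concentration."""
    return [c, c * c]

def sample_d(c, omega_diag, beta, rng):
    """Draw d = w^T phi(c) + beta with w ~ N(0, diag(omega_diag))."""
    w = [rng.gauss(0.0, v ** 0.5) for v in omega_diag]
    return sum(wi * pi for wi, pi in zip(w, phi(c))) + beta

rng = random.Random(1)
c, omega_diag, beta = 0.4, [1.0, 0.5], 0.3
draws = [sample_d(c, omega_diag, beta, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
# Closed-form variance phi^T Omega phi from Equation (10).
var_exact = sum(v * p * p for v, p in zip(omega_diag, phi(c)))
```

The empirical mean should settle near β and the empirical variance near the closed-form value, which is the sense in which the weights never need to be estimated explicitly.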

4.2. Kernel Formulation

Note that

Ω_{j}

is p.s.d. by its definition, and we can encode the inner product using a compact function, i.e.,

k_{j} (c, c^{'}) = {(ϕ_{j} (c))}^{T} Ω_{j} ϕ_{j} (c^{'})

. This is known as the kernel trick, which works by replacing the explicit feature mapping and covariance with a kernel function

k_{j} (c, c^{'})

to indicate an inner product in the feature space. Different kernels can capture different functional features. For instance, a periodic kernel can capture periodic functions such as sinusoidal functions. If we do not know the explicit form of the kernel function, which is true in most cases, the automatic relevance determination (ARD) kernel,

k_{j} (c, c^{'}) = θ_{j 0} exp (- {(c - c^{'})}^{T} {(I ⊙ ({\tilde{θ}}_{j} {\tilde{θ}}_{j}^{T}))}^{- 1} (c - c^{'})),

(11)

is commonly adopted as it generally provides good performance in most cases, especially in regression problems [22]. In this formulation, ⊙ denotes the Hadamard product,

I

is an identity matrix,

θ_{j 0}

is the scaling factor for the kernel function, and

{\tilde{θ}}_{j} \in R^{M \times 1}

is a vector with scaling factors for each input component, i.e., the concentrations of different elements. We denote

θ_{j} = {(θ_{j 0}, {\tilde{θ}}_{j}^{T})}^{T}

for clarity. These parameters

θ_{j}

are known as the hyperparameters because they control the random process (10) statistically rather than in a deterministic way (e.g., the aforementioned polynomial fitting). In this work, we use the ARD kernel throughout unless stated otherwise.
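A minimal sketch of the ARD kernel of Equation (11) is given below. With a diagonal scaling matrix, the quadratic form reduces to a sum of squared differences, each divided by the square of its own scaling factor. The function names and the example hyperparameter values are assumptions for illustration.

```python
import math

def ard_kernel(c1, c2, theta0, lengthscales):
    """ARD kernel of Equation (11): theta0 * exp(-sum_m ((c1_m - c2_m) / l_m)^2)."""
    s = sum(((a - b) / l) ** 2 for a, b, l in zip(c1, c2, lengthscales))
    return theta0 * math.exp(-s)

def gram(points, theta0, lengthscales):
    """Covariance matrix [K]_{gg'} = k(z_g, z_g') for a list of concentrations."""
    return [[ard_kernel(p, q, theta0, lengthscales) for q in points] for p in points]

# Hypothetical hyperparameters: theta0 = 2, per-component scales (0.5, 1.0).
k_val = ard_kernel((0.1, 0.2), (0.3, 0.2), 2.0, (0.5, 1.0))
k_mat = gram([(0.1, 0.2), (0.3, 0.2), (0.5, 0.1)], 2.0, (0.5, 1.0))
```

A large scaling factor for one component flattens the kernel along that direction, which is how the model can learn that a coefficient barely depends on a given concentration.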

4.3. Ghost Interdiffusion Coefficients

Given that

d^{j} (c)

is a Gaussian process as stated in Equation (10), any number of observations form a joint Gaussian distribution, based on which a closed-form likelihood can be easily calculated. Unfortunately, unlike the classic regression problems, we do not have any direct observations of

d^{j} (c)

, and we cannot directly obtain the optimized hyperparameters

{θ_{j}, β_{j}}

. To resolve this problem, we borrow the pseudo-inducing points idea [32] and introduce a set of virtual ghost interdiffusion coefficients,

{h_{j g} = d^{j} (z_{j g})}_{g = 1}^{G}

, that are sampled from the function

d^{j} (c)

for virtual concentrations

{z_{j g}}_{g = 1}^{G}

. These latent variables must form a joint Gaussian distribution (because they are sampled from a Gaussian process of (10)),

h_{j} \sim N (β_{j} 1, K_{j}),

(12)

where

h_{j} = {(h_{j 1}, \dots, h_{j G})}^{T}

is the collection of the ghost interdiffusion coefficients and

{[K_{j}]}_{g g^{'}} = k_{j} (z_{j g}, z_{j g^{'}})

is the covariance matrix computed through the kernel function and the latent locations

Z_{j} = {z_{j g}}_{g = 1}^{G}

. Normally,

h_{j}

and

Z_{j}

are latent variables that need to be integrated out during the model training and predictions.

4.4. Diagonal-Dominating Prior

Following Occam’s razor, if the dominant diagonal diffusion coefficients

D_{i i} (c)

for

i = 1, \dots, M

can fully explain the diffusion process, it is reasonable to suppress the non-diagonal diffusion coefficients

D_{i j} (c)

for

i \neq j

to encourage a simpler model. To encode this model preference, we design a special Laplace prior for the mean value of each Gaussian process of (12),

β_{j} \sim Laplace (0.01^{(1 - δ (i, j))}, 0.1),

(13)

where

δ (\cdot, \cdot)

is the delta function and i is the row that matches the choice of

d^{j} (c)

. We use a Laplace prior rather than a Gaussian prior to encourage sparsity of the interdiffusion coefficients at the off-diagonal locations. The particular prior parameters may be adjusted for a different system to reflect our prior knowledge.
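The prior of Equation (13) is simple to evaluate. In the sketch below, the location parameter is 0.01^{(1 - δ(i, j))}, i.e., 1 for diagonal entries and 0.01 for off-diagonal entries, and the scale is 0.1. The function name `log_laplace_prior` is an assumption for illustration.

```python
import math

def log_laplace_prior(beta, i, j, scale=0.1):
    """Log density of the prior in Equation (13) for the mean of D_ij.

    The location is 0.01 ** (1 - delta(i, j)): 1 on the diagonal, 0.01 elsewhere.
    """
    loc = 0.01 ** (1 - (1 if i == j else 0))
    return -math.log(2.0 * scale) - abs(beta - loc) / scale

# An off-diagonal mean of 1.0 is penalized relative to a mean of 0.01.
penalty_gap = log_laplace_prior(0.01, 1, 2) - log_laplace_prior(1.0, 1, 2)
```

Because the penalty grows linearly rather than quadratically with the distance from the location, moderately large off-diagonal means remain possible when the data support them.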

4.5. Joint Model Training

With each interdiffusion coefficient

d^{j} (c)

fully specified previously, the observed flux

u_{n}

can be recovered by

u_{n} = f_{n} + ϵ_{n} = \nabla c_{n}^{T} d (c_{n}) + ϵ_{n},

(14)

where we use the noise term

ϵ_{n}

to capture the model inadequacy, uncertainty, and noise as a Gaussian distribution,

ϵ_{n} \sim N (0, σ^{2})

, for the observed flux;

f_{n}

denotes the unknown true flux.

Finally, the last piece of this work is the estimation of the posterior of all hyperparameters

Θ = {θ_{j}}_{j = 1}^{M}, B = {β_{j}}_{j = 1}^{M}

,

Z = {Z_{j}}_{j = 1}^{M}

,

H = {h_{j}}_{j = 1}^{M}

, and

σ

. Although MCMC can be directly implemented to compute all model parameter posteriors, the computational time is overwhelming considering the large number of hyperparameters and the efficiency of an MCMC procedure. Instead, we opt for the maximum a posteriori (MAP) approach. The log posterior decomposes into the log likelihood and the prior information,

\underset{Θ, B, Z, H, σ}{argmax} (L (Θ, B, Z, H, σ) + log p (B)),

(15)

where

L (Θ, B, Z, H, σ) = log p (u)

is the log likelihood of our model, which can be computed by comparing the predicted flux f and the observed flux u. More specifically, the log marginal likelihood can be computed by

\begin{matrix} log p (u) = log \int p (u | f) p (f) d f \end{matrix}

(16)

To complete the integration in Equation (16), we first notice that

p (u | f) = N (u | f, σ^{2} I)

(17)

is simply a Gaussian;

p (f)

is a weighted sum of M Gaussians, which is also Gaussian because

\begin{matrix} p (f) & = \sum_{j = 1}^{M} \nabla c^{j} ⊙ d^{j} (c) = \sum_{j = 1}^{M} \nabla c^{j} ⊙ N (μ_{j}, Q_{j}) \\ & = \sum_{j = 1}^{M} N (\nabla c^{j} ⊙ μ_{j}, \nabla c^{j} Q_{j} \nabla {(c^{j})}^{T}) = N (μ, Q) . \end{matrix}

(18)

In this equation,

μ_{j} = β_{j} 1 + {\tilde{K}}_{j} {(K_{j})}^{- 1} (h_{j} - β_{j} 1)

is the predicted interdiffusion expectations for j;

Q_{j} = {\hat{K}}_{j} - {\tilde{K}}_{j} K_{j}^{- 1} {\tilde{K}}_{j}^{T}

is the covariance matrix, with

{[{\tilde{K}}_{j}]}_{n g} = k_{j} (c_{n}, z_{j g}) \in R^{N \times G}

being the covariance between

C

and

Z_{j}

and

{[{\hat{K}}_{j}]}_{n n^{'}} = k_{j} (c_{n}, c_{n^{'}}) \in R^{N \times N}

being the covariance for

C

.

μ = \sum_{i = 1}^{M} μ_{i} ⊙ \nabla c^{i}

is the joint expectations;

Q = \sum_{j = 1}^{M} \nabla c^{j} Q_{j} \nabla {(c^{j})}^{T}

is the joint covariance matrix. Substituting Equations (17) and (18) into Equation (16) to derive the joint log likelihood, we get,

log p (u) = - \frac{1}{2} {(μ - u)}^{T} {(Q + σ^{2} I)}^{- 1} (μ - u) - \frac{1}{2} log | Q + σ^{2} I | - \frac{N}{2} log (2 π) .

(19)

We can now use any optimization techniques, e.g., gradient descent, to finish the MAP optimization. Although the fully independent training conditional (FITC) approximation [33] can be used to force

Q_{j}

to be a diagonal matrix and thus to enable quick computations [32], due to the multiplier

\nabla c^{j}

,

Q

is generally non-diagonal, and this computation acceleration will not work in our case. The main computation for the joint likelihood (19) is the inverse of joint covariance matrix

{(Q + σ^{2} I)}^{- 1}

and its log determinant

log | Q + σ^{2} I |

. Using an LU decomposition trick [22], we can compute these two terms at time complexity

O (N^{3})

and space complexity

O (N^{2})

. For the interdiffusion problem, most of the time we have

N \leq 100

EPMA samples, making our method practically efficient.
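The evaluation of Equation (19) can be sketched as follows. Note that this illustration uses a Cholesky factorization rather than the LU decomposition mentioned above; Cholesky is a common choice for symmetric positive definite matrices and gives the quadratic form and the log determinant from a single factor. The function names and the small test matrix are assumptions.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_marginal(u, mu, q, sigma2):
    """Evaluate Equation (19) given flux u, mean mu, covariance q, noise variance sigma2."""
    n = len(u)
    a = [[q[i][j] + (sigma2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(a)
    # Forward substitution: solve L y = (mu - u), so that y.y = r^T A^{-1} r.
    r = [m - v for m, v in zip(mu, u)]
    y = []
    for i in range(n):
        y.append((r[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in y)
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)

# Hypothetical two-sample example: Q + sigma^2 I = [[2, 0.5], [0.5, 1]].
ll = log_marginal([0.3, -0.2], [0.0, 0.0], [[1.5, 0.5], [0.5, 0.5]], 0.5)
```

Both the factorization and the substitution scale cubically and quadratically in N, respectively, so the cost stays negligible for the sample sizes typical of EPMA measurements.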

4.6. Interdiffusion Coefficients Predictions

With all model parameters being optimized, we can derive the posterior of the diffusion coefficients for any concentration

c_{*}

as

\begin{matrix} d^{j} (c_{*}) & = N (μ_{*}^{j}, v_{*}^{j}), \\ μ_{*}^{j} & = β_{j} 1 + {(k_{j}^{*})}^{T} {(K_{j})}^{- 1} (h_{j} - β_{j} 1), \\ v_{*}^{j} & = {[K_{j}]}_{* *} - {(k_{j}^{*})}^{T} {(K_{j})}^{- 1} k_{j}^{*}, \end{matrix}

(20)

where

k_{j}^{*} = {(k_{j} (c_{*}, z_{j 1}), \dots, k_{j} (c_{*}, z_{j G}))}^{T}

is the covariance between

c_{*}

and the other ghost coefficient locations

z_{j g}

and

{[K_{j}]}_{* *} = k_{j} (c_{*}, c_{*})

. The derivation details are shown in Appendix A for clarity.
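The predictive equations in (20) can be illustrated with a small scalar-concentration example. The sketch below assumes a squared-exponential kernel with fixed, made-up hyperparameters and three hypothetical ghost pairs (z_g, h_g); it is an illustration of the formulas, not the trained model.

```python
import math

def kernel(a, b, theta0=1.0, length=0.3):
    """Squared-exponential kernel for scalar concentrations (hypothetical settings)."""
    return theta0 * math.exp(-((a - b) / length) ** 2)

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))) / m[i][i]
    return w

def predict(c_star, z, h, beta):
    """Posterior mean and variance of d(c_star) from ghost pairs (z_g, h_g), Equation (20)."""
    k_mat = [[kernel(p, q) for q in z] for p in z]
    k_star = [kernel(c_star, q) for q in z]
    alpha = solve(k_mat, [hg - beta for hg in h])  # K^{-1} (h - beta 1)
    v = solve(k_mat, k_star)                       # K^{-1} k_*
    mean = beta + sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(c_star, c_star) - sum(k * x for k, x in zip(k_star, v))
    return mean, var

# Hypothetical ghost locations and values.
z, h, beta = [0.1, 0.4, 0.7], [1.2, 1.0, 0.9], 1.0
mean_at_ghost, var_at_ghost = predict(0.1, z, h, beta)
mean_far, var_far = predict(5.0, z, h, beta)
```

At a ghost location the prediction reproduces the ghost value with vanishing variance, while far from all ghost locations it reverts to the prior mean β with the prior variance.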

5. Results

In practical experiments, the interdiffusion coefficients are unknown and uncontrollable, leading to difficulties for unbiased evaluations. Thus, we first assess InfPolyn on numerical examples of ternary (

M = 2

) and quaternary (

M = 3

) systems. To imitate a real system but not to lose generality, we use polynomial and exponential functions to construct the interdiffusion coefficient functions. To give an example, the fourth-order polynomial function in a two-component system is represented as

D_{i j} (c^{1}, c^{2}) = a_{i j}^{0} + \sum_{m = 1}^{2} a_{i j}^{m, 1} c^{m} + \sum_{m = 1}^{2} a_{i j}^{m, 2} {(c^{m})}^{2} + \sum_{m = 1}^{2} a_{i j}^{m, 3} {(c^{m})}^{3} + \sum_{m = 1}^{2} a_{i j}^{m, 4} {(c^{m})}^{4},

(21)

where for each coefficient in the polynomial

a_{i j}^{m, r}

, the superscript r represents the degree of the polynomial term, and their values are generated independently from uniform distributions

U (0, 1)

. We put constraints on the high order terms to prevent the diffusion coefficients from increasing/decreasing drastically with the concentrations

c

; the diffusion matrix is considered symmetric to ensure numerical stability for the diffusion simulations. Note that this symmetric structure prior information is not injected into InfPolyn or other competing models. For the ternary system, the initial conditions for the forward simulation are

c^{1} (t = 0, x) = 0.6 \cdot 𝟙 (0.5 - x)

(22)

c^{2} (t = 0, x) = 0.4 \cdot 𝟙 (x - 0.5),

(23)

where

𝟙 (z)

denotes the Heaviside step function, which equals 0 when

z < 0

and equals 1 when

z \geq 0

. Similarly, for the quaternary system, we defined the initial condition as

c^{1} (t = 0, x) = 0.6 \cdot 𝟙 (0.5 - x)

(24)

c^{2} (t = 0, x) = 0.25 \cdot 𝟙 (x - 0.5)

(25)

c^{3} (t = 0, x) = 0.15 \cdot 𝟙 (x - 0.5) .

(26)

With the defined initial condition and the interdiffusion coefficient functions, we use a finite difference (FD) diffusion forward solver to simulate a diffusion process until the terminal time and obtain the terminal concentration profile

C

. To remain numerically stable and accurate, we use a second-order central difference for space and a fourth-order Runge–Kutta for time. The forward simulation solver uses a spatial step

Δ x = 0.000625

, which gives 1601 grid points on the space domain

[0, 1]

; the terminal time is set to

10^{4} Δ t

.
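A forward solver of this kind can be sketched on a much coarser grid. The snippet below uses second-order central differences in a conservative (interface-flux) form and a fourth-order Runge–Kutta step, with zero-flux ends. The conservative form, the boundary treatment, the coarse grid, the time step, and the symmetric `diffusivity` function are all assumptions made for illustration; they are not the settings or coefficients used in the experiments.

```python
def diffusivity(c1, c2):
    """Hypothetical symmetric 2x2 interdiffusion matrix (illustrative values only)."""
    return ((1.0 + 0.1 * c1, 0.05), (0.05, 0.8 + 0.1 * c2))

def rhs(c, dx):
    """Time derivative of Equation (1) in conservative form with zero-flux ends."""
    n = len(c[0])
    flux = [[0.0] * (n + 1) for _ in range(2)]  # fluxes at cell interfaces
    for k in range(1, n):
        mid = [0.5 * (c[j][k - 1] + c[j][k]) for j in range(2)]
        d = diffusivity(mid[0], mid[1])
        for i in range(2):
            flux[i][k] = -sum(d[i][j] * (c[j][k] - c[j][k - 1]) / dx for j in range(2))
    return [[-(flux[i][k + 1] - flux[i][k]) / dx for k in range(n)] for i in range(2)]

def axpy(c, k, s):
    """Return c + s * k, element by element."""
    return [[ci + s * ki for ci, ki in zip(cr, kr)] for cr, kr in zip(c, k)]

def rk4_step(c, dx, dt):
    """Advance the concentrations by one classical fourth-order Runge-Kutta step."""
    k1 = rhs(c, dx)
    k2 = rhs(axpy(c, k1, 0.5 * dt), dx)
    k3 = rhs(axpy(c, k2, 0.5 * dt), dx)
    k4 = rhs(axpy(c, k3, dt), dx)
    out = c
    for k, wgt in ((k1, 1.0), (k2, 2.0), (k3, 2.0), (k4, 1.0)):
        out = axpy(out, k, dt * wgt / 6.0)
    return out

n, dx = 41, 1.0 / 40
dt = 0.2 * dx * dx  # small enough for explicit stability with D of order 1
xs = [k / 40 for k in range(n)]
# Step initial conditions of Equations (22) and (23).
c0 = [[0.6 if 0.5 - x >= 0 else 0.0 for x in xs],
      [0.4 if x - 0.5 >= 0 else 0.0 for x in xs]]
c = c0
for _ in range(200):
    c = rk4_step(c, dx, dt)
```

Writing the update in terms of interface fluxes means the total amount of each component is conserved up to rounding, which gives a cheap consistency check on any such solver.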

We then take equally spaced samples from the terminal concentration profile to mimic the EPMA process to provide the terminal concentration profile

C

. Unless stated otherwise, the terminal concentration profile consists of 40 samples. Since we are concerned with the center areas where the diffusion process is significant, the EPMA samples are limited in the range of

[0.44, 0.56]

in order to avoid numerical errors close to the boundaries for all Boltzmann–Matano methods. All variables are considered dimensionless in the experiments. To evaluate the performance of different methods, we follow Cheng et al. [13] and use the relative error (RE),

{RE}_{i j} (c) = \frac{{\tilde{D}}_{i j} (c) - D_{i j} (c)}{D_{i j} (c)},

(27)

where

{\tilde{D}}_{i j} (c)

and

D_{i j} (c)

are the predicted and true interdiffusion coefficients for concentration

c

, respectively. As a Boltzmann–Matano numerical inversion-based method, InfPolyn is compared with the other SOTA Boltzmann–Matano numerical inversion-based methods, i.e., the polynomial interdiffusion methods [13] with 3rd- and 4th-order polynomials, the compressed sensing approach [16] combined with a 4th-order polynomial (a model of sufficiently high order to capture subtle changes), and the

L^{2}

regularization approach, which replaces the

L^{1}

penalty term in the work of Qin et al. [16] with an

L^{2}

penalty term, combined with a 4th-order polynomial function.
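The sampling and evaluation steps above can be illustrated with a short sketch. It assumes that the equally spaced EPMA samples are read off the simulated profile by linear interpolation, which the text does not specify; the function names are hypothetical.

```python
import numpy as np

def epma_samples(x, c, n_samples=40, lo=0.44, hi=0.56):
    """Equally spaced pseudo-EPMA samples of a profile c of shape (n_comp, n_grid).

    Linear interpolation between grid points is an assumption of this sketch.
    """
    xs = np.linspace(lo, hi, n_samples)
    cs = np.stack([np.interp(xs, x, ci) for ci in c])
    return xs, cs

def relative_error(D_pred, D_true):
    """Pointwise relative error of Eq. (27)."""
    return (D_pred - D_true) / D_true

# Toy profile on the 1601-point grid used by the forward solver.
x_grid = np.linspace(0.0, 1.0, 1601)
c_grid = np.stack([0.6 * (1.0 - x_grid), 0.4 * x_grid])
xs, cs = epma_samples(x_grid, c_grid)
re = relative_error(np.array([1.1e-5]), np.array([1.0e-5]))
```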

5.1. Case Study 1: Polynomial Diffusion Coefficients

In this case study, we assess InfPolyn in a ternary system and a quaternary system with 4th-order polynomial interdiffusion coefficients:

\begin{matrix} D_{i j} (c) = a_{i j}^{m, 0} + \sum_{m = 1}^{M} \sum_{r = 1}^{4} a_{i j}^{m, r} {(c^{m})}^{r}, \end{matrix}

(28)

where each coefficient

a_{i j}^{m, r}

is randomly generated from an independent uniform distribution. To ensure the symmetrical structure of matrix

A^{m, r}

, we force

a_{i j}^{m, r} = a_{j i}^{m, r}

by taking their average. In a general interdiffusion process, the interdiffusion coefficients are expected to be smooth and close to constant, which also prevents instability in the numerical forward solver. To encode this prior knowledge, we constrain the polynomial coefficients by

a_{i j}^{m, r} \sim U (0, 1) 10^{(r - 5)}

. The specific values used are given in Appendix A. The REs for

x \in [0.4, 0.6]

for the ternary and the quaternary system are shown in Figure 1 and Figure 2. We omit areas outside

[0.4, 0.6]

because the REs there are simply extended flat lines that carry no further information. As expected, the 4th-order polynomial method has a strong model capacity and can thus achieve the lowest REs at a few locations, as shown in some panels of Figure 1 and Figure 2. However, over the whole area of interest, its overall performance is the worst among all methods. In particular, due to overfitting, the 4th-order polynomial method shows strongly fluctuating performance, which is highly undesirable in real applications. Unsurprisingly, the 3rd-order polynomial approach shows slightly fewer fluctuations but also attains the lowest RE less often. This is precisely the aforementioned dilemma of model selection for polynomial-based methods. Similar to the results shown in [16], adding a regularization term of

L^{1}

can ease overfitting and greatly reduce the performance fluctuations in both Figure 1 and Figure 2. Unfortunately, the improvement comes at the price of reduced model capacity, leading to a rather flat RE curve. The 4th-order polynomial method combined with an

L^{2}

regularization term shows a similar improvement. It is, however, difficult to tell which regularization term is better. The

L^{1}

regularization works better with the ternary system in Figure 1, whereas the

L^{2}

approach outperforms the

L^{1}

by a large margin in most cases of Figure 2. The inconsistency in performance of the

L^{1}

and

L^{2}

regularization approaches certainly hinders their application to practical problems. In contrast, guided by the correct priors and benefiting from its nonparametric nature, InfPolyn shows a consistent and accurate fit and outperforms the competitors by a significant margin. Thanks to its model flexibility, InfPolyn can capture the dramatic changes in the center while maintaining a good fit in the other, flat areas. In all cases, InfPolyn not only remains stable (indicated by a smooth RE curve) but also achieves the lowest REs in most areas. Furthermore, note that the diagonal interdiffusion coefficients in general show a lower relative error. This is because, in the simulation setting, the diagonal interdiffusion coefficients play a dominant role in the diffusion process. For the non-diagonal interdiffusion coefficients, the REs are amplified because they are divided by smaller true interdiffusion coefficients.
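The construction of the random test coefficients in Eq. (28) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' generator: the exponent rule defaults to the printed r - 5 and is exposed as the hypothetical argument `exponent`, because the coefficient tables in Appendix A list magnitudes that shrink as r grows; the array layout and function names are also choices made here.

```python
import numpy as np

def sample_poly_coeffs(n, order=4, seed=0, exponent=lambda r: r - 5):
    """Draw symmetric coefficient matrices for Eq. (28).

    Returns a0 of shape (n, n) and a of shape (n, order, n, n), where
    a[m, r - 1] holds the matrix multiplying (c^m)^r. Entries are drawn
    as U(0, 1) * 10**exponent(r) and symmetrized by averaging.
    """
    rng = np.random.default_rng(seed)
    sym = lambda A: 0.5 * (A + np.swapaxes(A, -1, -2))
    a0 = sym(rng.uniform(size=(n, n))) * 10.0 ** exponent(0)
    r = np.arange(1, order + 1)
    scale = 10.0 ** exponent(r).astype(float)
    a = sym(rng.uniform(size=(n, order, n, n))) * scale[None, :, None, None]
    return a0, a

def poly_D(c, a0, a):
    """Evaluate Eq. (28) at concentrations c of shape (n, n_points)."""
    order = a.shape[1]
    powers = np.stack([c ** r for r in range(1, order + 1)], axis=1)
    return a0[:, :, None] + np.einsum("mrij,mrk->ijk", a, powers)

a0, a = sample_poly_coeffs(n=2)                     # ternary system: 2x2 matrix
c_demo = np.array([[0.0, 0.3, 0.6], [0.0, 0.2, 0.4]])
D_demo = poly_D(c_demo, a0, a)                      # shape (2, 2, 3)
```

Passing, for example, `exponent=lambda r: -(r + 5)` reproduces the decreasing magnitudes seen in the tables; which rule was actually used is not something this sketch can confirm.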

Figure 1. The relative errors (REs) of predictive diffusion coefficients

{\tilde{D}}_{i j} (c (x))

in the center areas

x \in [0.4, 0.6]

for the evaluated methods in a random ternary system.

Figure 2. The relative errors (REs) of predictive diffusion coefficients

{\tilde{D}}_{i j} (c (x))

in the center areas

x \in [0.4, 0.6]

for the evaluated methods in a quaternary system.

5.2. Case Study 2: Exponential Diffusion Coefficients

In general, the diffusion coefficients can be so complex that they do not take polynomial forms. To imitate such challenging situations, in this case study, we assess InfPolyn in ternary and quaternary systems with the following interdiffusion coefficient, which combines an exponential term and a sinusoidal term,

D_{i j} (c) = a_{i j}^{0} + \sum_{m = 1}^{M} a_{i j}^{m, 1} \exp (- c^{m}) - \sum_{m = 1}^{M} a_{i j}^{m, 2} \cos (c^{m}),

(29)

where the functional coefficients

a_{i j}^{m, r}

are similarly sampled from different uniform distributions, i.e.,

a_{i j}^{0} \sim U (0, 1) \times 10^{- 5}

,

a_{i j}^{m, 1} \sim U (0, 1) \times 10^{- 6}

, and

a_{i j}^{m, 2} \sim U (0, 1) \times 10^{- 6}

. Similarly, to ensure the forward diffusion stability, we use the previous approach to ensure the symmetrical structure of matrices

A^{0}

,

A^{m, 1}

, and

A^{m, 2}

. The exact values of the functional coefficients used are given in Appendix A. The model performances measured by REs are shown in Figure 3 and Figure 4.
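Eq. (29) can be evaluated with a few lines of code. The sketch below draws illustrative coefficients with the scalings stated above and symmetrizes them by averaging; the array layout, the seed, and the function name are assumptions made here, and the sampled values are not those of Appendix A.

```python
import numpy as np

def exp_cos_D(c, a0, a_exp, a_cos):
    """Evaluate Eq. (29) at concentrations c of shape (M, n_points).

    a0 has shape (n, n); a_exp and a_cos have shape (M, n, n) and hold the
    matrices multiplying exp(-c^m) and cos(c^m), respectively.
    """
    return (a0[:, :, None]
            + np.einsum("mij,mk->ijk", a_exp, np.exp(-c))
            - np.einsum("mij,mk->ijk", a_cos, np.cos(c)))

rng = np.random.default_rng(0)
sym = lambda A: 0.5 * (A + np.swapaxes(A, -1, -2))   # enforce a_ij = a_ji
a0_ec = sym(rng.uniform(size=(2, 2))) * 1e-5
a_exp = sym(rng.uniform(size=(2, 2, 2))) * 1e-6
a_cos = sym(rng.uniform(size=(2, 2, 2))) * 1e-6
c_ec = np.array([[0.0, 0.3], [0.0, 0.2]])
D_ec = exp_cos_D(c_ec, a0_ec, a_exp, a_cos)          # shape (2, 2, 2)
```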

Figure 3. The relative errors (REs) of predictive diffusion coefficients

{\tilde{D}}_{i j} (c (x))

in the center areas

x \in [0.4, 0.6]

for the evaluated methods in a ternary system.

Figure 4. The relative errors (REs) of predictive diffusion coefficients

{\tilde{D}}_{i j} (c (x))

in the center areas

x \in [0.4, 0.6]

for the evaluated methods in a quaternary system.

In this case study, the 3rd-order polynomial approach slightly outperforms the 4th-order polynomial approach in most cases in both Figure 3 and Figure 4. Nevertheless, the performance of both the 3rd-order and 4th-order polynomial approaches is degraded by fluctuations across the domain. Furthermore, note that the REs for the polynomial approaches in Figure 4 are flat and smooth, indicating that a rich model capacity does not necessarily lead to performance fluctuations in all cases. The

L^{1}

and

L^{2}

regularization combined with the 4th-order polynomial degrade the model performance rather than improve it in many cases in Figure 4. This is evidence that inappropriate implicit prior assumptions introduced by the

L^{1}

and

L^{2}

regularization terms can hurt model performance. It might be possible to circumvent this issue by adjusting the penalty weight. However, this creates the new issue of how to choose the value of the penalty weight properly, taking us back to the dilemma of model selection. In contrast, InfPolyn shows a consistent and accurate performance; it outperforms the competitors by a large margin in all cases except for

{\tilde{D}}_{31}

of the quaternary system in the left area in Figure 4. We would also like to point out that many methods actually fail on the quaternary system in Figure 4: their REs are larger than 1, which means a total prediction failure.

5.3. Case Study 3: Uncertainty Quantification Analysis

Finally, to assess the consistency of InfPolyn, we repeated the ternary system experiment of Case Study 1 with five distinct random sets of polynomial coefficients, which yield five different sets of diffusion coefficients, and report the performance statistics. To also investigate the influence of the number of EPMA samples, we ran each experiment with

{20, 30, 40, 50}

EPMA samples. The minimum number of the EPMA samples was 20 because the 4th-order polynomial has 18 coefficients and thus requires at least 18 EPMA samples to work. For each experiment with the given EPMA samples, the model performance was evaluated by average relative error (ARE),

{ARE}_{i j} = \frac{\int_{X} {RE}_{i j} (c (x)) d x}{\int_{X} d x}

(30)

where

X

indicates the whole spatial domain. We show the statistics of

{ARE}_{11}

and

{ARE}_{22}

over the five different diffusion coefficients in Figure 5 using Tukey box plots. The most immediate observation is the superiority of InfPolyn over the competitors in terms of accuracy and consistency. We also notice that, for all methods except the 4th-order polynomial, the performance does not improve steadily as the number of EPMA samples increases. We believe that each method can already approach reasonable diffusion coefficients (by minimizing the loss function) with only 20 EPMA samples. In this case, more samples do not bring improvement, whereas the performance can fluctuate with different EPMA concentration profiles. Comparing the fluctuations, InfPolyn shows a modest level of change, whereas the most unstable method is the 4th-order polynomial with

L^{2}

regularization. The most stable method for both

{\tilde{D}}_{11}

and

{\tilde{D}}_{22}

is the 3rd-order polynomial, which may indicate a lack of model capacity, i.e., underfitting. The only method whose performance improves is the 4th-order polynomial, which gets better with more EPMA samples. This is a clear sign of overfitting, which can be addressed by introducing more training data, and it explains the overfitting phenomena we previously encountered in Case Studies 1 and 2. Will the performance keep improving and outperform InfPolyn if the trend shown in Figure 5 continues? This may happen with more than 200 EPMA samples, which is infeasible in practice. Furthermore, the decreasing trend should slowly disappear at some point, which is already happening for

{\tilde{D}}_{22}

.
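The average relative error of Eq. (30) can be approximated numerically as sketched below. The trapezoidal rule and the function name are assumptions of this sketch; the text does not state which quadrature was used.

```python
import numpy as np

def average_relative_error(x, re):
    """Approximate Eq. (30): the integral of RE over X divided by the length of X.

    x is an increasing grid of positions and re holds the RE at those positions;
    the integral is evaluated with the trapezoidal rule.
    """
    widths = np.diff(x)
    integral = np.sum(0.5 * widths * (re[1:] + re[:-1]))
    return integral / (x[-1] - x[0])

x_are = np.linspace(0.0, 1.0, 101)
are_linear = average_relative_error(x_are, 2.0 * x_are)            # linear RE profile
are_const = average_relative_error(x_are, np.full_like(x_are, 0.05))
```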

Figure 5. The Tukey box plot of average relative error of

{\tilde{D}}_{11}

(top) and

{\tilde{D}}_{22}

(bottom), computed from concentration profiles consisting of

{20, 30, 40, 50}

EPMA samples.

It is also noticeable that the

L^{1}

and

L^{2}

regularization techniques can indeed improve the performance of the 4th-order polynomial by a large margin for every number of EPMA samples considered, which is consistent with the finding in [16].

5.4. Case Study 4: Experiment Verification

To demonstrate the practical applicability of InfPolyn, we apply it to reproduce the interdiffusion fluxes from experimental data of the Mg-Al, Mg-Al-Zn, and Mg-Al-Zn-Cu systems collected from the previous literature [13]. These experimental data include composition profiles of the annealed diffusion couples of Mg-Al at 781 K for 36,960 s, Mg-Al-Zn at 868 K for 5400 s, and Mg-Al-Zn-Cu at 755 K for 75,530 s. Since the experimental measurements are taken non-uniformly on the spatial domain for all of the components, they are reprocessed with local polynomial interpolation to provide values on a uniform grid, which is the common preprocessing step for Matano-based approaches. The derivative and integral terms in the Matano equation are then obtained. Given all the preprocessed data as inputs, we then use all of the samples, a random half of the samples, and a random quarter of the samples from the diffusion systems to test the robustness of the tested methods. As shown in Figure 6, the curves computed by InfPolyn for all three cases fit the experimental data well, and the data lie within the 95% confidence regions, indicating a good uncertainty quantification for the predictions. With half and a quarter of the training data, the predictions in the left areas oscillate in some intervals. However, InfPolyn still captures the major tendency of the fluxes with slightly increasing uncertainty.

Figure 6. The actual and predicted diffusion fluxes for the Mg-Al, Mg-Al-Zn, and Mg-Al-Zn-Cu systems (from left to right columns) using 100%, 50%, and 25% of all available samples (from top to bottom rows).

6. Conclusions

In this paper, we propose InfPolyn, a novel nonparametric Bayesian framework to estimate the interdiffusivity coefficients, and demonstrate its superiority in terms of accuracy and consistency by combining it with the numerical inverse Boltzmann–Matano method [13]. This combination is also a limitation of InfPolyn, because the numerical inverse Boltzmann–Matano method has limitations of its own. For instance, it cannot generalize to a wide variety of complicated 2D diffusion processes and complex engineering interdiffusion scenarios, e.g., in semiconductors. Nevertheless, InfPolyn can be combined with other methods (such as the forward-simulation approach [10]) to fulfill its potential in the estimation of interdiffusion coefficients. This is outside the scope of this paper, and we thus leave it for future work.

The main novelty of our work is the nonparametric Bayesian framework, which allows automatic model selection (resistant to overfitting and underfitting) and the injection of meaningful prior knowledge. The problem of recovering interdiffusion coefficients from concentrations is an ill-posed inverse problem. Thus, the injection of proper priors is necessary to recover the ground-truth diffusion coefficients. Unlike methods such as [16] that impose nonphysical priors, our method provides an easy way to inject intuitive priors; e.g., the diagonal diffusion coefficients normally play a dominant role in an interdiffusion process.

Author Contributions

Conceptualization, P.W. and K.C.; methodology, W.W.X.; validation, M.C.; formal analysis, M.C. and W.W.X.; investigation, W.W.X. and M.C.; resources, W.Z. and P.W.; data curation, K.C.; writing—original draft preparation, W.W.X.; writing—review and editing, W.W.X., M.C., K.C. and P.W.; visualization, W.W.X.; supervision, K.C., W.Z. and P.W.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China grant number 2017YFB0701700 and 2018YFB0703902.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are available in Appendix A, the numerical experiments, and the references.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. A Gaussian Process and Its Predicted Posterior

Consider training points

y_{i} = η (ξ_{i})

,

i = 1, \dots, M

and design points

ξ_{i}

. In a GP model, we place a prior distribution over

η (ξ)

indexed by

ξ

:

η (ξ) | θ \sim \mathcal{GP} (m_{0} (ξ), c (ξ, ξ^{'} | θ))

(A1)

with mean and covariance functions:

\begin{matrix} m_{0} (ξ) = E [η (ξ)], c (ξ, ξ^{'} | θ) = E [(η (ξ) - m_{0} (ξ)) (η (ξ^{'}) - m_{0} (ξ^{'}))] \end{matrix}

(A2)

in which

E [\cdot]

is the expectation operator. The hyperparameters

θ

are estimated during the learning process. The mean function can be assumed to be a constant,

m_{0} (ξ) \equiv μ

, by virtue of centering the data. Alternative choices are possible, e.g., a linear function of

ξ

, but are rarely adopted unless a priori information on the form of the function is available. The covariance function can take many forms, the most common being the automatic relevance determination (ARD) kernel:

c (ξ, ξ^{'} | θ) = θ_{0} \exp (- {(ξ - ξ^{'})}^{T} \operatorname{diag} (θ_{1}, \dots, θ_{l}) (ξ - ξ^{'}))

(A3)

The hyperparameters are

θ = {(θ_{0}, \dots, θ_{l})}^{T}

.

θ_{1}^{- 1}, \dots, θ_{l}^{- 1}

in this case are called the squared correlation lengths. For any fixed

ξ

,

η (ξ)

is a random variable. A collection of values

η (ξ_{i})

,

i = 1, \dots, M

, on the other hand, is a partial realization of the GP. Realizations of the GP are deterministic functions of

ξ

. The main property of GPs is that the joint distribution of

η (ξ_{i})

,

i = 1, \dots, M

, is multivariate Gaussian.

Starting from the prior (A1) and using the available data, we obtain a posterior GP distribution conditional on

θ

, with new mean and covariance functions. Letting

t = {(y_{1}, \dots, y_{M})}^{T}

, the likelihood of the data (given

θ

) is

p (t | θ) = N (μ 1, C (θ))

. Here,

N (\cdot, \cdot)

denotes a multivariate normal distribution with the given mean (here

μ 1

, where 1 is a vector of ones) and covariance matrix

C (θ) = [C_{i j}]

, in which

C_{i j} = c (ξ_{i}, ξ_{j} | θ)

,

i, j = 1, \dots, M

. The joint distribution of

t

and

η (ξ)

(for a test input

ξ

) has the distribution

p (η (ξ), t | θ) = N (μ 1, C^{'} (θ))

, where:

C^{'} (θ) = \begin{bmatrix} C (θ) & c (ξ) \\ c {(ξ)}^{T} & c (ξ, ξ | θ) \end{bmatrix}

(A4)

in which

c (ξ) = {(c (ξ_{1}, ξ | θ), \dots, c (ξ_{M}, ξ | θ))}^{T}

. Conditioning on

t

provides the conditional predictive distribution at

ξ

[22]:

\begin{aligned} η (ξ) | t, θ &\sim \mathcal{GP} (m^{'} (ξ | θ), c^{'} (ξ, ξ^{'} | θ)) \\ m^{'} (ξ | θ) &= μ + c {(ξ)}^{T} C {(θ)}^{- 1} (t - μ 1) \\ c^{'} (ξ, ξ^{'} | θ) &= c (ξ, ξ^{'} | θ) - c {(ξ)}^{T} C {(θ)}^{- 1} c (ξ^{'}) \end{aligned}

(A5)

The expected value

E [η (ξ)]

is given by

m^{'} (ξ | θ)

, and

c^{'} (ξ, ξ | θ)

is the predictive variance. The hyperparameters

θ

are normally obtained from point estimates [34,35]. The maximum likelihood estimate (MLE), for example, is found by maximizing the log of the likelihood:

θ_{MLE} = \arg \max_{θ} \left( - \frac{M}{2} \ln (2 π) - \frac{1}{2} {(t - μ 1)}^{T} C {(θ)}^{- 1} (t - μ 1) - \frac{1}{2} \ln | C (θ) | \right) .

(A6)
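The predictive equations (A3), (A5), and (A6) can be sketched with NumPy as follows. This is a generic GP regression sketch with a constant mean, not the InfPolyn implementation; the Cholesky-based solves, the small `jitter` added to the diagonal for numerical stability, and all function names are assumptions introduced here.

```python
import numpy as np

def ard_kernel(X1, X2, theta0, theta):
    """ARD kernel of Eq. (A3) between the rows of X1 (a, l) and X2 (b, l)."""
    diff = X1[:, None, :] - X2[None, :, :]
    return theta0 * np.exp(-np.einsum("abl,l,abl->ab", diff, theta, diff))

def gp_posterior(X, t, X_new, mu, theta0, theta, jitter=1e-10):
    """Posterior mean and covariance of Eq. (A5) at the rows of X_new."""
    C = ard_kernel(X, X, theta0, theta) + jitter * np.eye(len(X))
    c_new = ard_kernel(X, X_new, theta0, theta)          # shape (M, N)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, t - mu))
    mean = mu + c_new.T @ alpha
    v = np.linalg.solve(L, c_new)
    cov = ard_kernel(X_new, X_new, theta0, theta) - v.T @ v
    return mean, cov

def log_likelihood(X, t, mu, theta0, theta, jitter=1e-10):
    """Log marginal likelihood maximized in Eq. (A6)."""
    C = ard_kernel(X, X, theta0, theta) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(C)
    w = np.linalg.solve(L, t - mu)
    # -0.5 * ln|C| equals minus the sum of the log-diagonal of the Cholesky factor
    return (-0.5 * len(t) * np.log(2.0 * np.pi)
            - 0.5 * w @ w - np.sum(np.log(np.diag(L))))

X_train = np.linspace(0.0, 1.0, 6)[:, None]
t_train = np.sin(2.0 * np.pi * X_train[:, 0])
mean_train, cov_train = gp_posterior(X_train, t_train, X_train,
                                     mu=0.0, theta0=1.0, theta=np.array([10.0]))
ll_single = log_likelihood(np.array([[0.0]]), np.array([0.3]),
                           mu=0.0, theta0=2.0, theta=np.array([1.0]))
```

In practice the hyperparameters would be chosen by passing the negative of `log_likelihood` to a numerical optimizer; that step is omitted here.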

Appendix A.1. Solving a Ternary System Using Boltzmann–Matano Inverse Method

Consider a ternary system where we take EPMA samples at four random locations,

[c^{1} (x_{n}), c^{2} (x_{n})]

,

(n = 1, \dots, 4)

, on the “true” concentration-distance curve at a terminal time. By substituting them into the Boltzmann–Matano equations, we have

\begin{bmatrix}
\frac{\partial c_{1}}{\partial x} |_{x_{1}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{1}} & 0 \\
0 & \frac{\partial c_{1}}{\partial x} |_{x_{1}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{1}} \\
\frac{\partial c_{1}}{\partial x} |_{x_{2}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{2}} & 0 \\
0 & \frac{\partial c_{1}}{\partial x} |_{x_{2}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{2}} \\
\frac{\partial c_{1}}{\partial x} |_{x_{3}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{3}} & 0 \\
0 & \frac{\partial c_{1}}{\partial x} |_{x_{3}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{3}} \\
\frac{\partial c_{1}}{\partial x} |_{x_{4}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{4}} & 0 \\
0 & \frac{\partial c_{1}}{\partial x} |_{x_{4}} & 0 & \frac{\partial c_{2}}{\partial x} |_{x_{4}}
\end{bmatrix}
\begin{bmatrix} D_{11} \\ D_{21} \\ D_{12} \\ D_{22} \end{bmatrix}
=
\begin{bmatrix} u_{1} |_{x_{1}} \\ u_{2} |_{x_{1}} \\ u_{1} |_{x_{2}} \\ u_{2} |_{x_{2}} \\ u_{1} |_{x_{3}} \\ u_{2} |_{x_{3}} \\ u_{1} |_{x_{4}} \\ u_{2} |_{x_{4}} \end{bmatrix} .

(A7)

We assume second-order polynomials for the functional relationships between diffusion coefficients and concentrations,

\begin{bmatrix} D_{11} \\ D_{21} \\ D_{12} \\ D_{22} \end{bmatrix}
=
\begin{bmatrix}
α_{11}^{(0)} + α_{11}^{(1)} c_{1} + α_{11}^{(2)} c_{2} + α_{11}^{(3)} c_{1}^{2} + α_{11}^{(4)} c_{2}^{2} \\
α_{21}^{(0)} + α_{21}^{(1)} c_{1} + α_{21}^{(2)} c_{2} + α_{21}^{(3)} c_{1}^{2} + α_{21}^{(4)} c_{2}^{2} \\
α_{12}^{(0)} + α_{12}^{(1)} c_{1} + α_{12}^{(2)} c_{2} + α_{12}^{(3)} c_{1}^{2} + α_{12}^{(4)} c_{2}^{2} \\
α_{22}^{(0)} + α_{22}^{(1)} c_{1} + α_{22}^{(2)} c_{2} + α_{22}^{(3)} c_{1}^{2} + α_{22}^{(4)} c_{2}^{2}
\end{bmatrix}
=
\begin{bmatrix} ϕ & 0 \\ 0 & ϕ \end{bmatrix}
\begin{bmatrix} a_{1} \\ a_{2} \end{bmatrix},

(A8)

where

ϕ = \begin{bmatrix} I & c^{1} I & c^{2} I & {(c^{1})}^{2} I & {(c^{2})}^{2} I \end{bmatrix}

(A9)

= \begin{bmatrix} 1 & 0 & c^{1} & 0 & c^{2} & 0 & {(c^{1})}^{2} & 0 & {(c^{2})}^{2} & 0 \\ 0 & 1 & 0 & c^{1} & 0 & c^{2} & 0 & {(c^{1})}^{2} & 0 & {(c^{2})}^{2} \end{bmatrix}

(A10)

a_{i} = {\begin{bmatrix} α_{1 i}^{(0)} & α_{2 i}^{(0)} & α_{1 i}^{(1)} & α_{2 i}^{(1)} & α_{1 i}^{(2)} & α_{2 i}^{(2)} & α_{1 i}^{(3)} & α_{2 i}^{(3)} & α_{1 i}^{(4)} & α_{2 i}^{(4)} \end{bmatrix}}^{T}, i = 1, 2 .

(A11)

We can now simply solve the linear system of equations to obtain the polynomial coefficients (and thus the diffusion coefficients as polynomial functions of the concentrations). The procedure is the same with more EPMA samples or higher-order polynomials. The only difference is that we may have to solve an overdetermined system under the criterion of minimizing the

L^{2}

loss.
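The assembly and solution of the linear system in Eqs. (A7) to (A11) can be sketched as follows. The inputs (concentrations, their spatial derivatives, and the right-hand-side terms u at the sample locations) are taken as given, and the demonstration data are synthetic random numbers rather than a real diffusion path; the function names are hypothetical. `numpy.linalg.lstsq` returns the least-squares solution for an overdetermined system and the minimum-norm solution when the system is underdetermined.

```python
import numpy as np

def phi(c1, c2):
    """2x10 basis matrix of Eqs. (A9)-(A10): [I, c1 I, c2 I, c1^2 I, c2^2 I]."""
    basis = np.array([[1.0, c1, c2, c1 ** 2, c2 ** 2]])
    return np.kron(basis, np.eye(2))

def design_matrix(c, dcdx):
    """Stack the two rows of Eq. (A7) per sample after substituting Eq. (A8).

    c and dcdx have shape (N, 2); the result has shape (2N, 20).
    """
    rows = []
    for (c1, c2), (g1, g2) in zip(c, dcdx):
        P = phi(c1, c2)
        rows.append(np.hstack([g1 * P, g2 * P]))     # acts on [a_1; a_2]
    return np.vstack(rows)

def solve_coefficients(c, dcdx, u):
    """Least-squares estimate of a_1 and a_2; u has shape (N, 2)."""
    G = design_matrix(c, dcdx)
    a, *_ = np.linalg.lstsq(G, u.reshape(-1), rcond=None)
    return a[:10], a[10:]

def D_matrix(c1, c2, a1, a2):
    """Diffusion matrix [[D11, D12], [D21, D22]] at concentrations (c1, c2)."""
    P = phi(c1, c2)
    return np.column_stack([P @ a1, P @ a2])

# Synthetic consistency check with known coefficients (not experimental data).
rng = np.random.default_rng(0)
N = 30
c_s = rng.uniform(0.1, 0.9, size=(N, 2))
g_s = rng.normal(size=(N, 2))
a_true = rng.normal(size=20)
u_s = (design_matrix(c_s, g_s) @ a_true).reshape(N, 2)
a1_hat, a2_hat = solve_coefficients(c_s, g_s, u_s)
```

On a real diffusion couple the concentrations and gradients all lie along a single diffusion path, so the system is far less well conditioned than in this synthetic check.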

Appendix A.2. Experimental Details

Table A1 and Table A2 show the experimental setting of functional coefficients in Case Study 1; Table A3 and Table A4 show the experimental setting of functional coefficients in Case Study 2.

Table A1. The polynomial coefficients in the random ternary interdiffusion system, where

a_{i j}

represents the entries in the coefficient matrix on position

{i, j}, i = 1, 2, j = 1, 2

.

A	$a_{11}$	$a_{21}$	$a_{22}$
$A_{0}$	$6.05 \times 10^{- 5}$	$4.81 \times 10^{- 6}$	$5.12 \times 10^{- 5}$
$A_{1}^{1}$	$3.08 \times 10^{- 6}$	$4.18 \times 10^{- 7}$	$3.28 \times 10^{- 6}$
$A_{2}^{1}$	$1.82 \times 10^{- 6}$	$6.42 \times 10^{- 7}$	$2.86 \times 10^{- 6}$
$A_{1}^{2}$	$8.83 \times 10^{- 7}$	$6.07 \times 10^{- 8}$	$1.07 \times 10^{- 7}$
$A_{2}^{2}$	$2.96 \times 10^{- 7}$	$4.21 \times 10^{- 8}$	$9.63 \times 10^{- 7}$
$A_{1}^{3}$	$4.09 \times 10^{- 8}$	$2.82 \times 10^{- 9}$	$1.26 \times 10^{- 8}$
$A_{2}^{3}$	$1.19 \times 10^{- 8}$	$7.29 \times 10^{- 9}$	$1.10 \times 10^{- 8}$
$A_{1}^{4}$	$1.26 \times 10^{- 8}$	$5.02 \times 10^{- 10}$	$1.66 \times 10^{- 9}$
$A_{2}^{4}$	$6.04 \times 10^{- 9}$	$6.34 \times 10^{- 10}$	$1.09 \times 10^{- 8}$

Table A2. The polynomial coefficients in the random quaternary interdiffusion system, where

a_{i j}

represents the entries in the coefficient matrix on position

{i, j}, i = 1, 2, 3; j = 1, 2, 3

.

A	$a_{11}$	$a_{12}$	$a_{13}$	$a_{22}$	$a_{23}$	$a_{33}$
$A_{0}$	$2.03 \times 10^{- 5}$	$6.50 \times 10^{- 6}$	$5.15 \times 10^{- 6}$	$7.13 \times 10^{- 5}$	$3.58 \times 10^{- 7}$	$3.27 \times 10^{- 5}$
$A_{1}^{1}$	$9.15 \times 10^{- 6}$	$3.13 \times 10^{- 7}$	$6.87 \times 10^{- 7}$	$9.46 \times 10^{- 6}$	$1.35 \times 10^{- 7}$	$1.76 \times 10^{- 6}$
$A_{2}^{1}$	$9.56 \times 10^{- 6}$	$8.08 \times 10^{- 7}$	$3.01 \times 10^{- 7}$	$1.61 \times 10^{- 6}$	$4.89 \times 10^{- 7}$	$1.18 \times 10^{- 6}$
$A_{3}^{1}$	$2.01 \times 10^{- 6}$	$9.05 \times 10^{- 7}$	$2.47 \times 10^{- 7}$	$5.48 \times 10^{- 6}$	$7.41 \times 10^{- 7}$	$2.95 \times 10^{- 6}$
$A_{1}^{2}$	$2.61 \times 10^{- 7}$	$7.28 \times 10^{- 8}$	$6.95 \times 10^{- 8}$	$2.18 \times 10^{- 7}$	$5.91 \times 10^{- 8}$	$1.58 \times 10^{- 7}$
$A_{2}^{2}$	$5.90 \times 10^{- 7}$	$2.23 \times 10^{- 8}$	$4.55 \times 10^{- 8}$	$7.08 \times 10^{- 7}$	$5.79 \times 10^{- 8}$	$3.94 \times 10^{- 7}$
$A_{3}^{2}$	$2.17 \times 10^{- 9}$	$2.96 \times 10^{- 8}$	$5.60 \times 10^{- 8}$	$8.11 \times 10^{- 7}$	$3.82 \times 10^{- 8}$	$6.70 \times 10^{- 8}$
$A_{1}^{3}$	$9.00 \times 10^{- 8}$	$2.45 \times 10^{- 9}$	$2.52 \times 10^{- 10}$	$9.00 \times 10^{- 8}$	$5.90 \times 10^{- 9}$	$2.48 \times 10^{- 8}$
$A_{2}^{3}$	$2.70 \times 10^{- 8}$	$2.57 \times 10^{- 9}$	$2.11 \times 10^{- 9}$	$6.50 \times 10^{- 10}$	$6.48 \times 10^{- 9}$	$6.01 \times 10^{- 9}$
$A_{3}^{3}$	$8.04 \times 10^{- 8}$	$2.55 \times 10^{- 9}$	$3.11 \times 10^{- 9}$	$8.91 \times 10^{- 8}$	$2.76 \times 10^{- 9}$	$1.03 \times 10^{- 8}$
$A_{1}^{4}$	$4.82 \times 10^{- 9}$	$3.18 \times 10^{- 10}$	$1.87 \times 10^{- 10}$	$9.07 \times 10^{- 9}$	$2.45 \times 10^{- 10}$	$3.91 \times 10^{- 9}$
$A_{2}^{4}$	$6.58 \times 10^{- 9}$	$9.28 \times 10^{- 10}$	$2.44 \times 10^{- 10}$	$2.05 \times 10^{- 9}$	$6.07 \times 10^{- 10}$	$7.94 \times 10^{- 9}$
$A_{3}^{4}$	$9.38 \times 10^{- 9}$	$1.30 \times 10^{- 10}$	$9.50 \times 10^{- 10}$	$9.17 \times 10^{- 9}$	$6.05 \times 10^{- 10}$	$1.46 \times 10^{- 9}$

Table A3. The coefficient matrices of the exponential and cosine functions in the ternary system of Case Study 2, where

A_{i} \text{ and } A_{i}^{'}

represent the entries in each coefficient matrix at position

{i, j}, i = 1, 2; j = 1, 2

.

A	$a_{11}$	$a_{21}$	$a_{22}$
$A_{0}$	$6.15 \times 10^{- 5}$	$5.08 \times 10^{- 7}$	$5.46 \times 10^{- 5}$
$A_{1}$	$1.25 \times 10^{- 7}$	$3.77 \times 10^{- 7}$	$8.50 \times 10^{- 8}$
$A_{2}$	$9.09 \times 10^{- 7}$	$3.51 \times 10^{- 7}$	$9.08 \times 10^{- 7}$
$A_{1}^{^{'}}$	$4.85 \times 10^{- 7}$	$2.85 \times 10^{- 8}$	$7.39 \times 10^{- 7}$
$A_{2}^{^{'}}$	$4.67 \times 10^{- 7}$	$7.29 \times 10^{- 7}$	$8.27 \times 10^{- 7}$

Table A4. The coefficient matrices of the exponential and cosine functions in the quaternary system of Case Study 2, where

A_{i} \text{ and } A_{i}^{'}

represent the entries in each coefficient matrix at position

{i, j}, i = 1, 2, 3; j = 1, 2, 3

.

A	$a_{11}$	$a_{12}$	$a_{13}$	$a_{22}$	$a_{23}$	$a_{33}$
$A_{0}$	$4.82 \times 10^{- 5}$	$4.61 \times 10^{- 6}$	$4.17 \times 10^{- 6}$	$4.26 \times 10^{- 5}$	$4.34 \times 10^{- 6}$	$6.27 \times 10^{- 5}$
$A_{1}$	$8.77 \times 10^{- 8}$	$8.40 \times 10^{- 8}$	$1.07 \times 10^{- 8}$	$9.65 \times 10^{- 8}$	$8.20 \times 10^{- 8}$	$2.62 \times 10^{- 8}$
$A_{2}$	$5.15 \times 10^{- 8}$	$8.39 \times 10^{- 9}$	$1.87 \times 10^{- 8}$	$9.39 \times 10^{- 8}$	$1.55 \times 10^{- 8}$	$9.78 \times 10^{- 8}$
$A_{3}$	$6.81 \times 10^{- 8}$	$2.67 \times 10^{- 8}$	$2.23 \times 10^{- 8}$	$7.68 \times 10^{- 9}$	$5.41 \times 10^{- 9}$	$3.78 \times 10^{- 8}$
$A_{1}^{^{'}}$	$3.45 \times 10^{- 8}$	$3.79 \times 10^{- 8}$	$3.48 \times 10^{- 8}$	$3.11 \times 10^{- 9}$	$1.42 \times 10^{- 8}$	$5.00 \times 10^{- 8}$
$A_{2}^{^{'}}$	$1.21 \times 10^{- 9}$	$3.53 \times 10^{- 8}$	$6.64 \times 10^{- 8}$	$7.51 \times 10^{- 8}$	$2.52 \times 10^{- 8}$	$5.45 \times 10^{- 8}$
$A_{3}^{^{'}}$	$6.21 \times 10^{- 8}$	$3.77 \times 10^{- 8}$	$6.93 \times 10^{- 8}$	$1.23 \times 10^{- 9}$	$6.45 \times 10^{- 8}$	$1.75 \times 10^{- 8}$

References

Boltzmann, L. Zur integration der diffusionsgleichung bei variabeln diffusionscoefficienten. Ann. Phys. 1894, 53, 959–964. [Google Scholar] [CrossRef] [Green Version]
Matano, C. On the Relation between Diffusion-Coefficients and Concentrations of Solid Metals. Jpn. J. Phys. 1933, 8, 109–113. [Google Scholar]
Kirkaldy, J.S.; Lane, J.E.; Mason, G.R. Diffusion in multicomponent metallic systems: VII. Solutions of the multicomponent diffusion equations with variable coefficients. Can. J. Phys. 1963, 41, 2174–2186. [Google Scholar] [CrossRef]
Chen, W.; Zhang, L.; Du, Y.; Tang, C.; Huang, B. A pragmatic method to determine the composition-dependent interdiffusivities in ternary systems by using a single diffusion couple. Scr. Mater. 2014, 90–91, 53–56. [Google Scholar] [CrossRef]
Dayananda, M.A.; Sohn, Y.H. A new analysis for the determination of ternary interdiffusion coefficients from a single diffusion couple. Metall. Mater. Trans. A Phys. Metall. Mater. Sci. 1999, 30, 535–543. [Google Scholar] [CrossRef]
Cermak, J.; Rothova, V. Concentration dependence of ternary interdiffusion coefficients in Ni3Al/Ni3Al–X couples with X= Cr, Fe, Nb and Ti. Acta Mater. 2003, 51, 4411–4421. [Google Scholar] [CrossRef]
Cheng, K.; Chen, W.; Liu, D.; Zhang, L.; Du, Y. Analysis of the Cermak–Rothova method for determining the concentration dependence of ternary interdiffusion coefficients with a single diffusion couple. Scr. Mater. 2014, 76, 5–8. [Google Scholar] [CrossRef]
Dash, A.; Esakkiraja, N.; Paul, A. Solving the issues of multicomponent diffusion in an equiatomic NiCoFeCr medium entropy alloy. Acta Mater. 2020, 193, 163–171. [Google Scholar] [CrossRef]
Esakkiraja, N.; Gupta, A.; Jayaram, V.; Hickel, T.; Divinski, S.V.; Paul, A. Diffusion, defects and understanding the growth of a multicomponent interdiffusion zone between Pt-modified B2 NiAl bond coat and single crystal superalloy. Acta Mater. 2020, 195, 35–49. [Google Scholar] [CrossRef]
Zhang, Q.; Zhao, J. Extracting interdiffusion coefficients from binary diffusion couples using traditional methods and a forward-simulation method. Intermetallics 2013, 34, 132–141. [Google Scholar] [CrossRef]
Bouchet, R.; Mevrel, R. A numerical inverse method for calculating the interdiffusion coefficients along a diffusion path in ternary systems. Acta Mater. 2002, 50, 4887–4900. [Google Scholar] [CrossRef]
Andersson, J.; Ågren, J. Models for numerical treatment of multicomponent diffusion in simple phases. J. Appl. Phys. 1992, 72, 1350–1355. [Google Scholar] [CrossRef]
Cheng, K.; Zhou, J.; Xu, H.; Tang, S.; Yang, Y. An effective method to calculate the composition-dependent interdiffusivity with one diffusion couple. Comput. Mater. Sci. 2018, 143, 182–188. [Google Scholar] [CrossRef]
Cheng, K.; Xu, H.; Ma, B.; Zhou, J.; Tang, S.; Liu, Y.; Sun, C.; Wang, N.; Wang, M.; Zhang, L.; et al. An in situ study on the diffusion growth of intermetallic compounds in the Al–Mg diffusion couple. J. Alloys Compd. 2019, 810, 151878. [Google Scholar] [CrossRef]
Cheng, K.; Sun, J.; Xu, H.; Wang, J.; Zhan, C.; Ghomashchi, R.; Zhou, J.; Tang, S.; Zhang, L.; Du, Y. Diffusion growth ϕ ternary intermetallic compound in the Mg-Al-Zn alloy system: In-situ observation and modeling. J. Mater. Sci. Technol. 2020, in press. [Google Scholar]
Qin, Y.; Narayan, A.; Cheng, K.; Wang, P. An efficient method of calculating composition-dependent inter-diffusion coefficients based on compressed sensing method. Comput. Mater. Sci. 2020, 188, 110145.
Robert, C. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007.
Girolami, M. Bayesian inference for differential equations. Theor. Comput. Sci. 2008, 408, 4–16.
Alahmadi, A.A.; Flegg, J.A.; Cochrane, D.G.; Drovandi, C.C.; Keith, J.M. A comparison of approximate versus exact techniques for Bayesian parameter inference in nonlinear ordinary differential equation models. R. Soc. Open Sci. 2020, 7, 191315.
Wilkinson, R. Accelerating ABC methods using Gaussian processes. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014; pp. 1015–1023.
Conrad, P.R.; Marzouk, Y.M.; Pillai, N.S.; Smith, A. Accelerating Asymptotically Exact MCMC for Computationally Intensive Models via Local Approximations. J. Am. Stat. Assoc. 2016, 111, 1591–1607.
Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006.
Calderhead, B.; Girolami, M.; Lawrence, N.D. Accelerating Bayesian inference over nonlinear differential equations with Gaussian processes. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; pp. 217–224.
Albert, C.G. Gaussian processes for data fulfilling linear differential equations. Proceedings 2019, 33, 5.
Yang, S.; Wong, S.W.; Kou, S. Inference of dynamic systems from noisy and sparse data via manifold-constrained Gaussian processes. Proc. Natl. Acad. Sci. USA 2021, 118, e2020397118.
Xing, W.; Elhabian, S.; Kirby, R.; Whitaker, R.; Zhe, S. Infinite ShapeOdds: Nonparametric Bayesian Models for Shape Representations. In Proceedings of the AAAI 2020: The Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
Zhe, S.; Qi, Y.; Park, Y.; Molloy, I.; Chari, S. DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements. arXiv 2013, arXiv:1311.2663.
Fick, A. Ueber Diffusion. Pogg. Ann. 1855, 94, 59–86.
Candes, E.; Romberg, J. l1-Magic: Recovery of Sparse Signals via Convex Programming. 2005, Volume 4, p. 14. Available online: www.acm.Caltech.Edu/l1magic/downloads/l1magic.pdf (accessed on 26 June 2021).
Van den Berg, E.; Friedlander, M.P. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput. 2008, 31, 890–912.
Sturm, J.F. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Method Softw. 1999, 11, 625–653.
Snelson, E.; Ghahramani, Z. Sparse Gaussian Processes using Pseudo-inputs. In Advances in Neural Information Processing Systems 19; MIT Press: Cambridge, MA, USA, 2006; pp. 1257–1264.
Quiñonero-Candela, J.; Rasmussen, C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959.
Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2001, 63, 425–464.
Bastos, L.; O’Hagan, A. Diagnostics for Gaussian Process Emulators. Technometrics 2009, 51, 425–438.

Figure 1. The relative errors (REs) of predictive diffusion coefficients $\tilde{D}_{ij}(c(x))$ in the center areas $x \in [0.4, 0.6]$ for the evaluated methods in a random ternary system.

Figure 2. The relative errors (REs) of predictive diffusion coefficients $\tilde{D}_{ij}(c(x))$ in the center areas $x \in [0.4, 0.6]$ for the evaluated methods in a quaternary system.

Figure 3. The relative errors (REs) of predictive diffusion coefficients $\tilde{D}_{ij}(c(x))$ in the center areas $x \in [0.4, 0.6]$ for the evaluated methods in a ternary system.

Figure 4. The relative errors (REs) of predictive diffusion coefficients $\tilde{D}_{ij}(c(x))$ in the center areas $x \in [0.4, 0.6]$ for the evaluated methods in a quaternary system.

Figure 5. Tukey box plots of the average relative error of $\tilde{D}_{11}$ (top) and $\tilde{D}_{22}$ (bottom), computed from concentration profiles consisting of $\{20, 30, 40, 50\}$ EMPA samples.

Figure 6. The actual and predicted diffusion fluxes for the Mg-Al, Mg-Al-Zn, and Mg-Al-Zn-Cu systems (left to right columns) using 100%, 50%, and 25% of all available samples (top to bottom rows).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
