Article

Diffusion Equation-Assisted Markov Chain Monte Carlo Methods for the Inverse Radiative Transfer Equation

Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53705, USA
* Author to whom correspondence should be addressed.
Entropy 2019, 21(3), 291; https://doi.org/10.3390/e21030291
Submission received: 18 December 2018 / Revised: 28 February 2019 / Accepted: 8 March 2019 / Published: 18 March 2019
(This article belongs to the Special Issue Information Theory and Stochastics for Multiscale Nonlinear Systems)

Abstract:
Optical tomography is the process of reconstructing the optical properties of biological tissue using measurements of incoming and outgoing light intensity at the tissue boundary. Mathematically, light propagation is modeled by the radiative transfer equation (RTE), and optical tomography amounts to reconstructing the scattering coefficient in the RTE using the boundary measurements. In the strong scattering regime, the RTE is asymptotically equivalent to the diffusion equation (DE), and the inverse problem becomes reconstructing the diffusion coefficient using Dirichlet and Neumann data on the boundary. We study this problem in the Bayesian framework, meaning that we examine the posterior distribution of the scattering coefficient after the measurements have been taken. However, sampling from this distribution is computationally expensive, since to evaluate each Markov Chain Monte Carlo (MCMC) sample, one needs to run the RTE solvers multiple times. We therefore propose the DE-assisted two-level MCMC technique, in which bad samples are filtered out using DE solvers that are significantly cheaper than RTE solvers. This allows us to make sampling from the RTE posterior distribution computationally feasible.

1. Introduction

Optical tomography is a medical imaging technique in which near infrared light is sent into biological tissue, and the reflected and transmitted outgoing light at the surface of the tissue is measured [1,2,3,4]. Using information about the incoming and outgoing light, one can determine the properties of the tissue. The use of low-energy light makes optical tomography cheaper and less invasive than traditional methods such as X-ray imaging, although the reconstruction of the tissue properties can be more difficult than when high-energy photons are used. Optical imaging can be used to study and monitor many different kinds of tissue, with applications including brain, breast, and joint imaging, as well as monitoring blood oxygenation [5,6,7].
The process of collecting information about the incoming and outgoing light and using it to reconstruct the tissue's properties is an inverse problem. There are two associated forward models, both of which map the incoming data (light intensity sent into the tissue) to the outgoing data (light intensity measured coming off the tissue). The first forward model corresponds to the radiative transfer equation (RTE) and is called the albedo operator. Using the albedo operator, the inverse problem may be solved to obtain the scattering coefficient in the RTE. The second forward model corresponds to the diffusion equation (DE) and is called the Dirichlet-to-Neumann (DtN) map. Using the DtN map, one may similarly solve the inverse problem and reconstruct the diffusion coefficient in the DE. Typically, the RTE is used for high-energy photons and the DE for low-energy photons. The photon energy determines the mean free path of free transport, which in turn determines the strength of scattering. Mathematically, this is characterized by the Knudsen number, and one can show that in the small Knudsen number regime the two equations are equivalent, as will be made clearer in Section 3. It was shown in [8] that the coefficients of the RTE are uniquely recoverable in 3D when the entire albedo operator is known, and it was shown in [9] that this reconstruction is Lipschitz stable. On the other hand, using the DtN map to reconstruct the corresponding coefficients of the DE is the famously ill-posed Calderón problem. The uniqueness of the reconstruction was established in [10], and the logarithmically ill-posed nature of the problem was proved in [11].
These two forward models are related, as we will describe. The RTE describes light propagating through a material with some optical properties, here taken to be the scattering coefficient of the material. Let $f(x, v)$ denote the distribution of particles at location $x$ with direction $v$, where $x \in \Omega \subset \mathbb{R}^d$ and $v \in \mathbb{S}^{d-1}$, the unit sphere in $\mathbb{R}^d$, where $d$ is the dimension of the problem. In other words, all particles are taken to move with constant unit speed. In the later parts of the paper, we use "velocity" and "direction" interchangeably when no ambiguity is present. For simplicity, we work with a version of the RTE in which the scattering coefficient $\sigma(x)$ depends only on position. Then the RTE is
$$v \cdot \nabla_x f(x, v) = \sigma(x)\, \mathcal{L} f(x, v),$$
where $\mathcal{L}$ is an integral operator describing photons colliding with the media and scattering off. Its explicit formulation is given in Section 3.
The diffusion equation, on the other hand, is a simplified model derived from the RTE, accurate for low-energy photons in the high-scattering, low-absorption regime. Since the scattering is very strong, the distribution equilibrates in the velocity domain, and the light intensity, also known as fluence, becomes a function of physical space alone. Suppose that $\rho(x)$ is the light intensity at position $x$, and $a(x)$ is the diffusion coefficient (corresponding to $1/\sigma(x)$ from the RTE, as will be shown in Theorem 2). Then the DE is
$$C\, \nabla \cdot \big( a(x) \nabla \rho(x) \big) = 0,$$
where $C$ is a constant depending on the dimension. In this case, the map from the Dirichlet data (light intensity or fluence injected into the tissue) to the Neumann data (light propagating out) is used to reconstruct the diffusion coefficient $a(x)$. This map is known as the Dirichlet-to-Neumann (DtN) map.
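In one spatial dimension, this forward map can be sketched with a simple finite-difference solver: solve the DE with prescribed Dirichlet data, then read off the boundary flux. The following is an illustrative sketch only (the function name, discretization, and sign convention are ours, not the paper's):

```python
import numpy as np

def solve_de_1d(a, xi_left, xi_right, n=200):
    """Finite-difference solve of -(a(x) rho'(x))' = 0 on [0, 1]
    with Dirichlet data rho(0) = xi_left, rho(1) = xi_right."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    a_half = a(0.5 * (x[:-1] + x[1:]))   # coefficient at cell midpoints
    # Tridiagonal system for the interior nodes.
    main = a_half[:-1] + a_half[1:]
    A = (np.diag(main)
         - np.diag(a_half[1:-1], 1)
         - np.diag(a_half[1:-1], -1)) / h**2
    rhs = np.zeros(n - 1)
    rhs[0] += a_half[0] * xi_left / h**2
    rhs[-1] += a_half[-1] * xi_right / h**2
    rho = np.empty(n + 1)
    rho[0], rho[-1] = xi_left, xi_right
    rho[1:-1] = np.linalg.solve(A, rhs)
    # Neumann data -a * rho' at the left boundary (our sign convention).
    flux_left = -a_half[0] * (rho[1] - rho[0]) / h
    return rho, flux_left
```

For a constant coefficient with data $\rho(0) = 1$, $\rho(1) = 0$, the solution is linear and the recovered boundary flux is $1$, matching the exact DtN value.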
There is a relationship between the two forward models, the RTE and the DE. One would like to understand, physically and mathematically, why the stability of each model is different in the inverse setting. It turns out that if $a(x)$ and $\sigma(x)$ satisfy certain relations, the RTE and DE are asymptotically close in the near infrared case. This is made precise below in Theorem 2. Physically, in the forward setting, high-energy photons experience little scattering before exiting the domain, whereas low-energy photons scatter frequently in the tissue before they exit. As a result, the reconstruction of the tissue using high-energy photons is generally crisp, whereas low-energy photons produce more blurred images. We use the "Knudsen number" $\epsilon$ to quantify how much scattering a photon experiences in the material. In the low-energy regime, scattering increases, the Knudsen number shrinks to zero, and the RTE converges to the DE in the forward setting. On the inverse side, the inverse problem for the RTE converges to the inverse problem for the DE, meaning that the information carried in the albedo operator is almost the same as that in the DtN map, and therefore the reconstruction of the tissue properties converges in this limit as well. This has been numerically observed in [12,13], and further proved rigorously in [14,15].
A Bayesian solution to the inverse problem is seen as the posterior distribution for the quantity of interest (QoI), in our case the scattering coefficient in the RTE and the diffusion coefficient in the DE. Bayesian methods are particularly useful for inverse problems, because noise in the measurement as well as prior information about the QoI are taken into account naturally [16]. If we observe some noisy data in order to obtain information about the QoI, the solution of the inverse problem is the probability distribution of the QoI given this data, known as the posterior distribution [17]. Bayes' theorem allows us to determine this distribution given a guess for the probability distribution of the QoI (the prior distribution) and the probability distribution for the data given the QoI (the likelihood function, obtainable from the forward model). As such, there is a sharp distinction between this probabilistic view and other deterministic tools, and the Bayesian formulation gives one the ability to "regularize" an inverse problem with some prior knowledge.
While we theoretically have the posterior distribution in hand, accessing concrete information about it, such as its mean and variance, can be challenging. One way to obtain this information is to create a list of samples from the distribution by some means, such as the Markov chain Monte Carlo (MCMC) method. Even this, however, is not always straightforward. One sample from the RTE posterior represents an entirely new configuration of the media, meaning that one must re-compute the forward albedo map for each MCMC sample. This is especially expensive because the RTE is posed over phase space: if the domain is three-dimensional, then $f$ is supported on $(x, v) \in \Omega \times \mathbb{S}^{d-1}$, a five-dimensional space. The same process applies for the DE as well, but since the DE is posed only over physical space, these computations are much faster. Knowing that the RTE converges to the DE in the high scattering regime, one wishes to combine the two and speed up the computation.
There is a balance to be struck here, and the goal of this paper is to find a way to sample from the posterior distribution for σ in a way that combines the DE and RTE. To that end, we employ the two-level MCMC method [18], also known as the two-stage method. The two-level method uses two models, which we will call the low-resolution model and the high-resolution model. The high-resolution model gives rise to the desired target distribution—for us, the posterior distribution for the RTE. The low-resolution model gives rise to a distribution that approximates the target distribution, and is ideally fast to compute. It is used to filter out poor draws so we need not waste time evaluating the target distribution on them. We know the DE is fast to compute, and that when scattering is sufficiently strong, the posterior based on the DE is a good approximation to the posterior based on the RTE, so we use the posterior based on the DE as the low-resolution model. Then, we can use the DE to reduce the number of times we must solve the RTE by rejecting bad draws. This method combines the inverse problem for the DE and the inverse problem for the RTE to create a faster method for sampling from the inverse problem for the RTE in the diffusion limit.
There are similar methods to approach related problems, such as the multilevel Monte Carlo method for path simulation [19] or the multilevel Monte Carlo method for parametric integration [20]. These methods also combine low-resolution models with a high-resolution model. The algorithm our method relies on, introduced in [18], is similar to the methods proposed in [21,22]. In this view, our algorithm is a two-stage method that increases the number of MCMC iterations for a given computational cost. It uses a two-stage delayed acceptance, in which the candidate sample y has to be accepted with a low-resolution model before it is passed on to be either accepted or rejected by the high-resolution model. However, typically for these methods, a coarse-grid approximation is used for the low-resolution model [23]. No matter how coarse the grids are for the RTE, however, they are still on phase space and the dimensionality issue is still left open. In our method, however, a completely different model is used: the inverse diffusion equation is used as a low-resolution model for the inverse radiative transfer equation.
This paper is organized as follows. In Section 2, we provide some necessary background, including a discussion of Bayesian methods for inverse problems, as well as an introduction to Markov chain Monte Carlo methods. In Section 3, we discuss the diffusion limit of the radiative transfer equation in the forward setting, and go on to discuss the inverse setting convergence of the posterior distributions. In Section 4, we discuss the DE-assisted two-level MCMC method, and prove its convergence. We also discuss the dependence of the second-level acceptance rate on ϵ , the Knudsen number. In Section 5, we present our numerical evidence. In particular, we show the convergence of the two forward models, the convergence of the posterior distribution functions quantified using the Hellinger distance, and the improvement of the DE-assisted two-level MCMC over the standard MCMC.

2. Background

In this section, we present preliminaries for studying the inverse problem for the RTE in the diffusion limit. In particular, we will first review basic concepts of the Bayesian formulation, and then review the previous algorithms on which our algorithm is based.
These results will be used to study the convergence of the posterior distribution with the RTE or the DE as the forward model, which will be used to develop our numerical method that combines the two posterior distributions.

2.1. Bayesian Formulation for Inverse Problems

Bayesian inference is a technique for estimating the distribution of some quantity of interest when some measurements are available. It is based on Bayes’ theorem. A given physical problem may be denoted as
$$b = \mathcal{G}(\sigma) + \eta,$$
where G is the forward map that takes a parameter σ to the measurement b, with an added noise η . The forward problem is to find b = G ( σ ) for a given σ , while most problems in practice are naturally inverse problems, meaning that one conducts experiments and obtains data b, and tries to use it to reconstruct the quantity of interest σ .
The Bayesian formulation is a method for retrieving information about $\sigma$ using $b$. It requires knowledge of the following two probability distributions $\mu_0$ and $\mu_{\mathrm{error}}$ ahead of time:
-
without knowledge of the measurement $b$, a priori $\sigma$ obeys a certain law,
$$\sigma \sim \mu_0(\sigma),$$
-
and the noise is distributed as:
$$\eta = b - \mathcal{G}(\sigma) \sim \mu_{\mathrm{error}}(\eta).$$
In many cases, both distributions can be assumed to be normal, i.e., to have a Gaussian-type density function. Suppose that $\mu_0$ is a Gaussian distribution centered at $m_0$ with covariance $\mathcal{C}_0$, and the noise is centered at zero with covariance $\mathcal{C}_{\mathrm{error}}$. If $\sigma$ is finite dimensional, then we may express the prior $\mu_0$ and likelihood $\mu^\sigma$ as follows:
$$\mu_0(\sigma) = \mathcal{N}(m_0, \mathcal{C}_0) \propto \exp\left( -\tfrac{1}{2} (\sigma - m_0)^\top \mathcal{C}_0^{-1} (\sigma - m_0) \right),$$
$$\mu^\sigma(b) = \mu_{\mathrm{error}}(\eta) \propto \exp\left( -\tfrac{1}{2} \big( b - \mathcal{G}(\sigma) \big)^\top \mathcal{C}_{\mathrm{error}}^{-1} \big( b - \mathcal{G}(\sigma) \big) \right).$$
Analogous formulae are also available in the infinite dimensional setting; see [16,17] and the references therein.
With this, the posterior distribution of $\sigma$, under the condition that $b$ is obtained in experiment, is given by Bayes' theorem,
$$\mu^b(\sigma) = \frac{1}{Z} \mu_0(\sigma)\, \mu^\sigma(b) = \frac{1}{Z} \exp\left( -\tfrac{1}{2} \left[ (\sigma - m_0)^\top \mathcal{C}_0^{-1} (\sigma - m_0) + \big( b - \mathcal{G}(\sigma) \big)^\top \mathcal{C}_{\mathrm{error}}^{-1} \big( b - \mathcal{G}(\sigma) \big) \right] \right),$$
where
$$Z = \int \mu^\sigma(b)\, \mathrm{d}\mu_0(\sigma)$$
is the normalization factor.
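With Gaussian prior and Gaussian noise, the unnormalized log posterior above is cheap to evaluate once the forward map is given. The following is a minimal sketch (the function and argument names are ours, for illustration only):

```python
import numpy as np

def log_posterior(sigma, b, G, m0, C0, C_err):
    """Unnormalized log of mu^b(sigma) for a Gaussian prior N(m0, C0)
    and Gaussian noise N(0, C_err); G is the forward map."""
    r_prior = sigma - m0           # prior misfit
    r_data = b - G(sigma)          # data misfit
    return -0.5 * (r_prior @ np.linalg.solve(C0, r_prior)
                   + r_data @ np.linalg.solve(C_err, r_data))
```

Working with the log posterior avoids underflow; the normalization constant $Z$ cancels in all the MCMC acceptance ratios used later.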
In optical tomography, there are two fundamental models for describing light propagation: the radiative transfer equation (RTE) and the diffusion equation (DE). They rely on the albedo operator and the DtN map, respectively, for the reconstruction. In the optically thick case (i.e., when photons scatter frequently), the two models are asymptotically close in some sense,
$$\mathcal{G}_{\mathrm{RTE}}(\sigma) \approx \mathcal{G}_{\mathrm{DE}}(\sigma).$$
Correspondingly, the posterior distributions given by the two models are close to each other [24], meaning:
$$\mu^b_{\mathrm{DE}}(\sigma) \approx \mu^b_{\mathrm{RTE}}(\sigma).$$
To be more precise, there are multiple ways to quantify the distance between two distribution functions. Typically, this distance is quantified by the Kullback–Leibler (KL) divergence or the Hellinger distance [25]. For any two distributions $\mu$ and $\mu'$ supported on the same space, they are defined as follows,
-
KL divergence
$$d_{\mathrm{KL}}(\mu, \mu') = \int \log\frac{\mathrm{d}\mu}{\mathrm{d}\mu'}\, \mathrm{d}\mu,$$
-
Hellinger distance
$$d_{\mathrm{Hell}}(\mu, \mu')^2 = \frac{1}{2} \int \left( \sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\lambda}} - \sqrt{\frac{\mathrm{d}\mu'}{\mathrm{d}\lambda}} \right)^2 \mathrm{d}\lambda,$$
where $\lambda$ is any pre-chosen reference distribution. The Hellinger distance is invariant to the choice of $\lambda$. Again, these formulae have interpretations in the infinite dimensional setting; see the Appendix of [17]. In Theorem 3, we quantify the similarity between $\mu^b_{\mathrm{DE}}(\sigma)$ and $\mu^b_{\mathrm{RTE}}(\sigma)$ and find that the two distributions are $\mathcal{O}(\epsilon)$ apart in the diffusion limit. Therefore, since $\mu^b_{\mathrm{DE}}$ is close to $\mu^b_{\mathrm{RTE}}$ in the optically thick case, the main task in this paper is to use $\mu^b_{\mathrm{DE}}$ to sample from $\mu^b_{\mathrm{RTE}}$.
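For discrete distributions on a common finite support, both quantities reduce to finite sums (with $\lambda$ taken as the counting measure). A minimal numerical sketch, assuming probability vectors that each sum to one:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between discrete distributions p and q."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def kl_divergence(p, q):
    """KL divergence D(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))
```

Note the asymmetry: $d_{\mathrm{KL}}(\mu, \mu') \neq d_{\mathrm{KL}}(\mu', \mu)$ in general, whereas the Hellinger distance is a true metric bounded by $1$.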

2.2. The Markov Chain Monte Carlo Method

We first present the standard Metropolis–Hastings (MH) algorithm. Given a probability density $\mu$, called the target distribution and defined on $X$, the MH algorithm constructs a Markov chain on $X$ that is stationary with respect to $\mu$. The elements of the Markov chain are then regarded as samples from the distribution $\mu$. More specifically, the MH algorithm starts with an initial guess $x_0$ and draws new samples according to a proposal distribution $q$. By adjusting the acceptance rate using the target distribution $\mu$, the MH algorithm accepts or rejects the draws so that the accepted samples form an empirical distribution that resembles the target distribution $\mu$. We now present the MH algorithm, shown in Algorithm 1.
Algorithm 1 Metropolis–Hastings
  • Given $x_k$, draw $y \sim q(\cdot, x_k)$.
  • Let
    $$\alpha(x_k, y) := \min\left\{ 1, \frac{q(y, x_k)\, \mu(y)}{q(x_k, y)\, \mu(x_k)} \right\}.$$
  • With probability $\alpha$, accept $y$ and set $x_{k+1} = y$. Otherwise, set $x_{k+1} = x_k$.
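Algorithm 1 can be sketched in a few lines. Below is an illustrative random-walk implementation with a symmetric Gaussian proposal, so the $q$-ratio in $\alpha$ cancels (a sketch under our own naming, not the paper's code):

```python
import numpy as np

def metropolis_hastings(log_mu, x0, n_steps, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting the density mu,
    supplied through its (unnormalized) log, log_mu."""
    rng = np.random.default_rng(rng)
    x, lx = x0, log_mu(x0)
    chain = [x]
    for _ in range(n_steps):
        y = x + step * rng.standard_normal(np.shape(x))
        ly = log_mu(y)
        # alpha = min(1, mu(y)/mu(x)) since the proposal is symmetric.
        if np.log(rng.uniform()) < ly - lx:
            x, lx = y, ly
        chain.append(x)
    return np.array(chain)
```

Targeting a standard normal via `log_mu = lambda x: -0.5 * x**2`, the empirical mean and standard deviation of the chain approach $0$ and $1$ after burn-in.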
The transition kernel $P$ of this algorithm is
$$P(x_k, A) = \mathbb{P}\{ x_{k+1} \in A \mid x_0, \dots, x_k \}.$$
In order to demonstrate the convergence of MCMC in Theorem 1, it is necessary to examine the transition kernel, which is the probability that the next draw $x_{k+1}$ is in the set $A$ given the previous elements of the chain $x_0, \dots, x_k$. It may be written
$$P(x, y) = p(x, y) + \delta_y(x)\, r(x),$$
where the off-diagonal density of the kernel is
$$p(x, y) = \begin{cases} q(x, y)\, \alpha(x, y) & \text{if } x \neq y \\ 0 & \text{if } x = y, \end{cases}$$
and the probability that the process remains at $x$ is
$$r(x) = 1 - \int p(x, y)\, \mathrm{d}y.$$
The goal is to show that the target distribution $\mu$ is an invariant distribution of the Markov chain $\{x_k\}_{k \geq 0}$, in the sense that all measurable sets $A$ satisfy
$$\mu(A) = \int \mu(x)\, P(x, A)\, \mathrm{d}x.$$
This means that elements of the Markov chain generated by Algorithm 1 give a good representation of the distribution $\mu$. To show that $\mu$ is the invariant distribution, one first needs $p(x, y)$ to satisfy the detailed balance lemma.
Lemma 1.
The off-diagonal density $p(x, y)$ satisfies the following equation, known as "detailed balance",
$$\mu(x)\, p(x, y) = \mu(y)\, p(y, x).$$
This lemma allows us to show the following theorem.
Theorem 1.
The target distribution $\mu$ is an invariant distribution of the Markov chain $\{x_n\}$ with transition kernel $P$, i.e.,
$$\mu(A) = \int P(x, A)\, \mu(x)\, \mathrm{d}x.$$
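On a finite state space, the kernel $P$ can be assembled explicitly and the statements of Lemma 1 and Theorem 1 checked numerically. A toy illustration (ours, not part of the paper) with a three-state target and a uniform symmetric proposal:

```python
import numpy as np

mu = np.array([0.5, 0.3, 0.2])     # target distribution
q = np.full((3, 3), 1.0 / 3.0)     # symmetric uniform proposal

# Build the MH transition matrix P(x, y).
P = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        if x != y:
            alpha = min(1.0, (q[y, x] * mu[y]) / (q[x, y] * mu[x]))
            P[x, y] = q[x, y] * alpha      # off-diagonal density p(x, y)
    P[x, x] = 1.0 - P[x].sum()             # r(x): stay-put probability

# Invariance: mu P = mu, the discrete analogue of mu(A) = ∫ mu(x) P(x, A) dx.
print(np.allclose(mu @ P, mu))  # True
```

Detailed balance holds entrywise here: $\mu_x P_{xy} = q \min(\mu_x, \mu_y) = \mu_y P_{yx}$ for $x \neq y$, from which invariance follows by summing over $x$.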
For our problem, each new proposal $y$ is a new configuration of the media $\sigma(x)$. Thus, to compute $\alpha(x_k, y)$, one must evaluate $\mu(y)$ and thus re-compute the forward map, which requires solving the RTE many times. As this is computationally prohibitive, we instead use the two-level MCMC method, described in the next section. The cost comparison is discussed further in Section 5.

2.3. Two-Level MCMC Method

The two-level MCMC method is a method to increase the efficiency of sampling from the target distribution $\mu$. It requires two distributions: the target distribution $\mu$, and a second distribution $\mu^*$. Here, the target distribution $\mu$ calls for the evaluation of a "high-resolution" model, but if it can be approximated in some sense by $\mu^*$, which only calls for the evaluation of some "low-resolution" model, then $\mu^*$ can serve as a good filter to reject poor draws in the MCMC sampling. More specifically, the algorithm evaluates each proposed sample in two levels. First, the sample is evaluated using the low-resolution model and is either pre-accepted or rejected. If it is pre-accepted, the algorithm goes on to evaluate the sample using the high-resolution model. This "pre-acceptance" stage filters out poor draws, allowing one to make bolder proposals without wasting time evaluating the expensive forward model on them. We present the two-level scheme below in Algorithm 2.
Algorithm 2 Two-level Metropolis–Hastings
  • Given $x_k$, draw $y \sim q(\cdot, x_k)$.
  • Let
    $$\alpha(x_k, y) := \min\left\{ 1, \frac{q(y, x_k)\, \mu^*(y)}{q(x_k, y)\, \mu^*(x_k)} \right\}.$$
  • With probability $\alpha$, pre-accept $y$ and continue to the fourth step. Otherwise, set $x_{k+1} = x_k$ and start over.
  • The second-level proposal is now $y$, effectively drawn from
    $$q_2(y, x_k) = \alpha(x_k, y)\, q(y, x_k) + \delta_{x_k}(y) \int \big( 1 - \alpha(x_k, y') \big)\, q(y', x_k)\, \mathrm{d}y'.$$
    Then set
    $$\beta(x_k, y) := \min\left\{ 1, \frac{q_2(y, x_k)\, \mu(y)}{q_2(x_k, y)\, \mu(x_k)} \right\}.$$
    With probability $\beta(x_k, y)$, accept and set $x_{k+1} = y$. Otherwise, set $x_{k+1} = x_k$.
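The two-level scheme can be sketched in code with a symmetric proposal, in which case the second-level ratio reduces to $\mu(y)\mu^*(x_k) / (\mu(x_k)\mu^*(y))$, as shown later in Lemma 2. This is an illustrative sketch in which the cheap and expensive posteriors are stand-ins supplied by the caller (all names are ours):

```python
import numpy as np

def two_level_mh(log_mu_star, log_mu, x0, n_steps, step=0.5, rng=None):
    """Two-level (delayed-acceptance) Metropolis-Hastings: a cheap
    approximation mu* screens proposals before the expensive target
    mu is evaluated. Returns the chain and the number of expensive
    evaluations actually performed."""
    rng = np.random.default_rng(rng)
    x, lsx, lx = x0, log_mu_star(x0), log_mu(x0)
    chain, n_expensive = [x], 0
    for _ in range(n_steps):
        y = x + step * rng.standard_normal(np.shape(x))
        lsy = log_mu_star(y)
        # Level 1: pre-accept/reject using only the cheap model.
        if np.log(rng.uniform()) >= lsy - lsx:
            chain.append(x)
            continue
        # Level 2: evaluate the expensive target only for survivors.
        n_expensive += 1
        ly = log_mu(y)
        # beta = min(1, mu(y) mu*(x) / (mu(x) mu*(y))) in log form.
        if np.log(rng.uniform()) < (ly - lx) + (lsx - lsy):
            x, lsx, lx = y, lsy, ly
        chain.append(x)
    return np.array(chain), n_expensive
```

The chain remains exactly invariant for $\mu$; the savings come entirely from `n_expensive` being smaller than the number of MCMC steps.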
Similarly to the MCMC method (Algorithm 1), in the two-level MCMC method, the draw of $x_{k+1}$ depends only on the evaluation at $x_k$; the previous draws $x_1, \dots, x_{k-1}$ are irrelevant. The transition kernel that brings $x_k$ to $x_{k+1}$ is denoted by $P_2$, which we will discuss in more detail in Section 4.1.
In this paper, the desired high-resolution model will be the posterior distribution for the RTE. The low-resolution model will be the posterior distribution for the DE. Evaluating $\mu^*(y)$ only involves solving the DE, which is much faster since the DE is supported on the physical domain. Once the DE posterior accepts the proposal $y$, it is passed to the RTE posterior $\mu(y)$. Thus, we compute the RTE forward model fewer times overall, which saves time.
In Section 4, we discuss the convergence of this method and the dependence of the second-level acceptance rate $\beta$ on the Knudsen number $\epsilon$, our limit parameter.

3. Diffusion Limit

In this section, we examine the diffusion limit of our problem in the forward and inverse case. We first study the convergence of the radiative transfer equation to the diffusion equation. Then, we explain how the inverse problems are solved using the forward map. Next, we discuss the convergence of the forward map for the radiative transfer equation to that of the diffusion equation. Finally, we discuss the convergence of the posterior distribution for the radiative transfer equation to the posterior for the diffusion equation.

3.1. Diffusion Limit of the Radiative Transfer Equation

The optical "thickness" of the material physically corresponds to the number of times a photon scatters between entering a medium and escaping. This is quantified by the Knudsen number, the ratio of the mean free path to the domain length. The mean free path is the average distance a photon travels before being scattered. When the Knudsen number is small, photons, on average, scatter many times before they are emitted, and the material is thus regarded as optically "thick". In this regime, the two mathematical models for light propagation carry the same information; namely, the radiative transfer equation and the diffusion equation are asymptotically equivalent.
The radiative transfer equation takes a statistical mechanics viewpoint and describes the distribution of photons over the phase domain. Let $f(x, v)$ denote the number of photons at position $x \in \Omega \subset \mathbb{R}^d$, a bounded domain, moving in direction $v \in \mathbb{S}^{d-1}$, the unit sphere in $\mathbb{R}^d$ (i.e., the speed is normalized to be 1). This distribution satisfies the RTE,
$$v \cdot \nabla_x f(x, v) = \sigma(x)\, \mathcal{L} f(x, v),$$
where the collision operator is
$$\mathcal{L} f = \int f(x, v')\, \mathrm{d}v' - f = \langle f \rangle_v - f.$$
In the equation, the term $v \cdot \nabla_x f$ on the left shows that the photons move with direction $v$, and the term on the right shows the photons colliding with the media and being scattered. We have used the notation $\langle \cdot \rangle_v$ to denote normalized integration over $v$, and $\mathrm{d}v$ is the normalized unit measure, meaning
$$\int_{\mathbb{S}^{d-1}} 1\, \mathrm{d}v = 1.$$
The equation has a unique solution when it is equipped with the incoming boundary condition, which is the analogue of a Dirichlet boundary condition for equations lacking velocity space. Let
$$\Gamma_\pm = \{ (x, v) : x \in \partial\Omega,\ \pm v \cdot n_x > 0 \}$$
denote the collection of coordinates on the boundary $x \in \partial\Omega$ such that the velocity $v$ points into/out of the domain: $\pm v \cdot n_x > 0$. Here, $n_x$ is the normal vector at $x$ pointing out of $\Omega$. The incoming boundary condition is imposed on $\Gamma_-$, whereas $\Gamma_+$ represents the particles going out of $\Omega$. For a unique solution to (7), boundary conditions must be imposed on $\Gamma_-$,
$$f|_{\Gamma_-} = \phi(x, v).$$
Remark 1.
In fact, a more general model of the radiative transfer equation is
$$v \cdot \nabla_x f(x, v) = \int k(x, v, v')\, f(x, v')\, \mathrm{d}v' - \sigma_a(x, v)\, f(x, v).$$
It concerns the case when the scattering coefficient, now seen as $k(x, v, v')$, may depend on the change of velocity of the particles during a collision, and it also includes an absorption coefficient $\sigma_a(x, v)$ representing the photons being absorbed into the material and lost. A standard $k$ which takes into account other kinds of scattering is given by the Henyey–Greenstein model. The absorption coefficient can be taken to be zero if scattering is sufficiently strong, and this is the case we focus on.
The equation is asymptotically equivalent to the diffusion equation in the optically thick regime, when the Knudsen number is small. We denote the Knudsen number by $\epsilon$ and rescale the problem by setting $\sigma \to \sigma/\epsilon$. Then, as the Knudsen number becomes small, the scattering effect dominates. Equation (7) may then be written as
$$v \cdot \nabla_x f = \frac{1}{\epsilon}\, \sigma \mathcal{L} f, \quad (x, v) \in \Omega \times \mathbb{S}^{d-1}, \qquad f|_{\Gamma_-} = \phi(x, v).$$
In the small ϵ regime, it was conjectured in [26] and then proved in [27,28] that the equation is asymptotically equivalent to the diffusion equation. One can make the convergence explicit under the following assumptions.
Assumption 1.
Both the media and the boundary conditions are bounded.
  • the admissible media are bounded, meaning that there is a constant $\mathcal{C}$ so that:
    $$\max\{ \|\sigma\|_{L^\infty(\Omega)},\ \|\sigma^{-1}\|_{L^\infty(\Omega)},\ \|\nabla\sigma\|_{L^\infty(\Omega)} \} < \mathcal{C};$$
  • and the boundary conditions are bounded, meaning:
    $$\max\{ \|\xi\|_{L^\infty(\partial\Omega)},\ \|\phi\|_{L^\infty(\Gamma_-)} \} < \mathcal{C}.$$
We also term the set of admissible media:
$$\mathcal{A} = \{ \sigma \in W^{1,\infty}(\Omega) : \max\{ \|\sigma\|_{L^\infty(\Omega)},\ \|\sigma^{-1}\|_{L^\infty(\Omega)},\ \|\nabla\sigma\|_{L^\infty(\Omega)} \} < \mathcal{C} \}.$$
With the assumption, we have the following theorem.
Theorem 2.
Suppose that $f(x, v)$ satisfies Equation (9); then as $\epsilon \to 0$, $f(x, v) \to \rho(x)$, which satisfies
$$C_d\, \nabla \cdot \left( \frac{1}{\sigma} \nabla \rho \right) = 0, \quad x \in \Omega \subset \mathbb{R}^d, \qquad \rho|_{\partial\Omega} = \xi(x),$$
where $\xi(x)$ is defined by $\phi(x)$. $C_d$ is a constant depending on the dimension $d$ and could be dropped from Equation (11). In particular, with compatible boundary conditions at different orders, one approximates $f$ in different forms:
  • if $\phi(x, v) = \xi(x)$:
    $$\| f - \rho \|_{L^\infty(\Omega)} < C_{\mathcal{A}}\, \epsilon;$$
  • if $\phi(x, v) = \xi(x) - \epsilon\, \frac{1}{\sigma}\, v \cdot \nabla \xi$:
    $$\left\| f - \rho + \frac{\epsilon}{\sigma}\, v \cdot \nabla \rho \right\|_{L^\infty(\Omega)} < C_{\mathcal{A}}\, \epsilon^2.$$
Here, the constant $C_{\mathcal{A}}$ depends on $\mathcal{C}$, the upper bound in Assumption 1 for the admissible set.
The proof, which we omit here, relies on asymptotic expansion away from the boundary.
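For intuition, the formal Hilbert expansion underlying such results can be sketched as follows (a heuristic interior-expansion outline only, not the full proof, which must also control boundary layers). Insert the ansatz $f = f_0 + \epsilon f_1 + \epsilon^2 f_2 + \cdots$ into Equation (9) and match orders of $\epsilon$:
$$\mathcal{O}(\epsilon^{-1}): \quad \sigma \mathcal{L} f_0 = 0 \;\Rightarrow\; f_0 = \rho(x) \text{ (independent of } v\text{)},$$
$$\mathcal{O}(1): \quad v \cdot \nabla_x f_0 = \sigma \mathcal{L} f_1 \;\Rightarrow\; f_1 = -\frac{1}{\sigma}\, v \cdot \nabla_x \rho,$$
$$\mathcal{O}(\epsilon): \quad v \cdot \nabla_x f_1 = \sigma \mathcal{L} f_2.$$
The last equation is solvable only if its left-hand side has zero velocity average. Since $\langle v \otimes v \rangle_v = \frac{1}{d} I$ on the unit sphere, this solvability condition reads
$$\nabla_x \cdot \left( \frac{1}{d\, \sigma} \nabla_x \rho \right) = 0,$$
which is the diffusion equation (11) with $C_d = 1/d$ under this normalization. Note also that $f \approx \rho - \frac{\epsilon}{\sigma}\, v \cdot \nabla \rho$ is exactly the second-order approximation appearing in Theorem 2.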

3.2. Convergence in the Inverse Setting

We examine the convergence in the inverse setting in this section. To describe light propagation in a given tissue in Ω , there are two models, the radiative transfer equation that gives a statistical description, and the diffusion equation that characterizes the macroscopic behavior. The two models are asymptotically equivalent, as discussed in the previous section.
In optical tomography, light with a known intensity is injected into the material, and detectors are placed on the tissue boundary to collect the light current emitted from the material. For the RTE, the map from the incoming data to the outgoing data is termed the albedo operator, denoted $\mathcal{H}_{\mathrm{RTE}}$,
$$\mathcal{H}_{\mathrm{RTE}}(\sigma) : \phi(x) \mapsto h_{\mathrm{RTE}}(x) = \frac{1}{C_d\, \epsilon} \int v \cdot n_x\, f|_{\Gamma_+}\, \mathrm{d}v,$$
where $f$ satisfies (9). It may also be written:
$$h_{\mathrm{RTE}} = \mathcal{H}_{\mathrm{RTE}}(\sigma)\, \phi.$$
In practice, finitely many incoming data $\phi_k$ are injected, and finitely many measurements are taken at the boundary locations $l_j$ per experiment. We define the map, determined by the to-be-reconstructed $\sigma$, from the known incoming data to the measured outgoing data as:
$$b_{j,k} = l_j(h_{\mathrm{RTE}}) + \eta_{j,k} = l_j\big( \mathcal{H}_{\mathrm{RTE}}(\sigma)\, \phi_k \big) + \eta_{j,k}, \quad (j, k) \in \{1, \dots, J\} \times \{1, \dots, K\},$$
or in a compact form:
$$b = \mathcal{G}_{\mathrm{RTE}}(\sigma) + \eta,$$
where $\eta$ is a vector of length $JK$ that contains the noise in the measurements. Clearly, the length-$JK$ vector $b$ is the result of a forward map $\mathcal{G}$ acting on the quantity of interest $\sigma$, with a small perturbation due to the noise. We assume that the noise is Gaussian,
$$\eta \sim \mathcal{N}(0, \gamma^2 I) \propto \exp\left( -\frac{1}{2\gamma^2} \|\eta\|^2 \right),$$
meaning:
$$b \mid \sigma \sim \mathcal{N}\big( \mathcal{G}_{\mathrm{RTE}}(\sigma), \gamma^2 I \big) \propto \exp\left( -\frac{1}{2\gamma^2} \big( b - \mathcal{G}_{\mathrm{RTE}}(\sigma) \big)^\top \big( b - \mathcal{G}_{\mathrm{RTE}}(\sigma) \big) \right).$$
According to Bayes' theorem, one then has:
$$\mu^b_{\mathrm{RTE}}(\sigma) = \frac{1}{Z_{\mathrm{RTE}}}\, \mu^\sigma_{\mathrm{RTE}}(b)\, \mu_0(\sigma).$$
For the DE model, we consider the forward map to be the map that takes the Dirichlet data to the Neumann outflow. It is termed the DtN map:
$$\mathcal{H}_{\mathrm{DE}}(\sigma) : \xi(x) \mapsto h_{\mathrm{DE}}(x) = \frac{1}{\sigma} \frac{\partial \rho}{\partial n},$$
where $\rho$ satisfies (11). Another way to write it is:
$$h_{\mathrm{DE}} = \mathcal{H}_{\mathrm{DE}}(\sigma)\, \xi.$$
Again, in practice, finitely many incoming data $\xi_k$ are injected, and finitely many measurements are taken at the boundary locations $l_j$ per experiment. We define the map, determined by the to-be-reconstructed media $\sigma$, from $\xi_k$ to the measured data as:
$$b_{j,k} = l_j(h_{\mathrm{DE}}) + \eta_{j,k} = l_j\big( \mathcal{H}_{\mathrm{DE}}(\sigma)\, \xi_k \big) + \eta_{j,k}, \quad (j, k) \in \{1, \dots, J\} \times \{1, \dots, K\},$$
or:
$$b = \mathcal{G}_{\mathrm{DE}}(\sigma) + \eta,$$
where $\eta$ is the same pollution in the measurement. Again, the vector $b$ is the result of the forward map for the diffusion equation acting on the to-be-reconstructed $\sigma$, with the perturbation from the noise. Then, similarly, we have
$$\mu^b_{\mathrm{DE}}(\sigma) = \frac{1}{Z_{\mathrm{DE}}}\, \mu^\sigma_{\mathrm{DE}}(b)\, \mu_0(\sigma).$$
It is proved in [24] that the two forward maps converge, namely:
Proposition 1.
Under Assumption 1, the forward maps $\mathcal{G}_{\mathrm{RTE}}$ and $\mathcal{G}_{\mathrm{DE}}$ satisfy
$$\| \mathcal{G}_{\mathrm{RTE}}(\sigma) - \mathcal{G}_{\mathrm{DE}}(\sigma) \| = \mathcal{O}(\epsilon)$$
for all $\sigma \in \mathcal{A}$.
Using this convergence, it was also proved that the posterior distributions are close in the diffusion limit, as in the following theorem.
Theorem 3.
Under Assumption 1, the Hellinger distance between the two posterior distributions is bounded by $\epsilon$ in the optically thick regime as $\epsilon \to 0$, namely,
$$d_{\mathrm{Hell}}\big( \mu^b_{\mathrm{RTE}}, \mu^b_{\mathrm{DE}} \big) = \mathcal{O}(\epsilon).$$
Similarly, the Kullback–Leibler divergence between the posterior distribution for the RTE and the posterior distribution for the DE is also $\mathcal{O}(\epsilon)$,
$$d_{\mathrm{KL}}\big( \mu^b_{\mathrm{RTE}}, \mu^b_{\mathrm{DE}} \big) = \mathcal{O}(\epsilon).$$
The proof largely depends on the Lipschitz continuity of the Gaussian form in the likelihood function, and the convergence result in Proposition 1. We omit the details and refer interested readers to [24].
Using Theorem 3, when $\epsilon$ is sufficiently small, one may use the diffusion equation posterior to approximate the radiative transfer equation posterior. This approximation allows us to speed up the MCMC computation by setting $\mu^b_{\mathrm{DE}} = \mu^*$ as the low-resolution model in the first level. This filters out bad draws, passing better draws to $\mu^b_{\mathrm{RTE}} = \mu$ on the second level.

4. Algorithm

In this section, we discuss the DE-assisted two-level MCMC method and its convergence for our case, the inverse problem for the RTE in the diffusion limit. We present a result about the second-level acceptance rate of two-level MCMC and its dependence on ϵ and a result about the computational cost of our algorithm compared to the one-level MCMC method.

4.1. DE-Assisted Two-Level MCMC Method

In this section, we discuss our algorithm, the DE-assisted two-level MCMC method. It is shown below in Algorithm 3.
Algorithm 3 DE-assisted two-level Metropolis–Hastings
  1. Given \( x_k \), draw \( y \sim q(\cdot, x_k) \).
  2. Let
\[ \alpha(x_k, y) := \min\left\{ 1, \frac{q(y, x_k)\, \mu_{\mathrm{DE}}^{b}(y)}{q(x_k, y)\, \mu_{\mathrm{DE}}^{b}(x_k)} \right\}. \]
  3. With probability α, pre-accept y and continue to step 4. Otherwise, set \( x_{k+1} = x_k \) and return to step 1.
  4. The second-level proposal is now y, effectively drawn from
\[ q_2(y, x_k) = \alpha(x_k, y)\, q(y, x_k) + \delta_{x_k}(y) \left( 1 - \int \alpha(x_k, y)\, q(y, x_k)\, \mathrm{d}y \right). \]
     Then set
\[ \beta(x_k, y) := \min\left\{ 1, \frac{q_2(y, x_k)\, \mu_{\mathrm{RTE}}^{b}(y)}{q_2(x_k, y)\, \mu_{\mathrm{RTE}}^{b}(x_k)} \right\}. \]
     With probability \( \beta(x_k, y) \), accept and set \( x_{k+1} = y \). Otherwise, set \( x_{k+1} = x_k \).
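As a concrete illustration, here is a minimal sketch of Algorithm 3 on a toy one-dimensional problem. The two log-posteriors are hypothetical stand-ins: in the actual method, evaluating the fine model would require the expensive RTE solves, and the surrogate the cheap DE solves. The proposal is a symmetric random walk, so the q-ratios in α cancel, and β is computed in the simplified posterior-ratio form derived below in Lemma 2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: log-posteriors of the fine ("RTE") and surrogate
# ("DE") models; in practice each evaluation would be a PDE solve.
def log_mu_rte(x):
    return -0.5 * x ** 2            # fine target ~ N(0, 1)

def log_mu_de(x):
    return -0.5 * (x - 0.1) ** 2    # slightly-off cheap surrogate

def two_level_mh(n_steps, step=1.0, x0=0.0):
    x = x0
    chain = np.empty(n_steps)
    for k in range(n_steps):
        y = x + step * rng.standard_normal()   # symmetric proposal q
        # Level 1: pre-accept/reject with the cheap surrogate posterior.
        if np.log(rng.uniform()) < log_mu_de(y) - log_mu_de(x):
            # Level 2: correct with the fine posterior; the intractable
            # q2-ratio reduces to a ratio of posterior values.
            log_beta = (log_mu_de(x) - log_mu_de(y)
                        + log_mu_rte(y) - log_mu_rte(x))
            if np.log(rng.uniform()) < log_beta:
                x = y
        chain[k] = x
    return chain

chain = two_level_mh(50000)
```

Despite the biased surrogate, the chain targets the fine posterior: its sample mean and standard deviation are close to 0 and 1.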
The transition kernel \( P_2 \) may be written
\[ P_2(x, y) = p_2(x, y) + r_2(x)\, \delta_x(y), \]
where
\[ p_2(x, y) = \begin{cases} q_2(x, y)\, \beta(x, y), & x \neq y, \\ 0, & x = y, \end{cases} \]
is called the second-level off-diagonal density, and \( r_2(x) = 1 - \int p_2(x, y)\, \mathrm{d}y \). As before, to demonstrate the convergence of MCMC, we will examine the transition kernel \( P_2(x, A) \), which is the probability that the next draw \( x_{k+1} \) lies in the set A given the previous elements of the chain \( x_0, \ldots, x_k \).
For the two-level MCMC method, one desires that the high-resolution target distribution \( \mu_{\mathrm{RTE}}^{b} \) is an invariant distribution of the Markov chain \( \{x_k\}_{k \ge 0} \). By definition, this is true if, for all measurable sets A,
\[ \mu_{\mathrm{RTE}}^{b}(A) = \int \mu_{\mathrm{RTE}}^{b}(x)\, P_2(x, A)\, \mathrm{d}x. \]
This means that elements of the Markov chain generated by Algorithm 3 give a good representation of the distribution μ RTE b . To show that μ RTE b is the invariant distribution, we first need the following two lemmas.
Lemma 2.
The second-level acceptance rate may be written
\[ \beta(x_k, y) = \min\left\{ 1, \frac{q_2(y, x_k)\, \mu_{\mathrm{RTE}}^{b}(y)}{q_2(x_k, y)\, \mu_{\mathrm{RTE}}^{b}(x_k)} \right\} = \min\left\{ 1, \frac{\mu_{\mathrm{DE}}^{b}(x_k)\, \mu_{\mathrm{RTE}}^{b}(y)}{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x_k)} \right\}, \]
where \( q_2 \) is defined as in Algorithm 3.
Proof. 
For \( x_k \neq y \), the definition of \( q_2 \) in Algorithm 3 gives
\[ \frac{q_2(y, x_k)}{q_2(x_k, y)} = \frac{\alpha(y, x_k)\, q(y, x_k)}{\alpha(x_k, y)\, q(x_k, y)} = \frac{\mu_{\mathrm{DE}}^{b}(x_k)}{\mu_{\mathrm{DE}}^{b}(y)}, \]
where the last equality follows from the identity \( \alpha(x, y)/\alpha(y, x) = q(y, x)\mu_{\mathrm{DE}}^{b}(y) \big/ \big( q(x, y)\mu_{\mathrm{DE}}^{b}(x) \big) \), which can be checked directly from the definition of α.
Plugging this into our definition of β, we find
\[ \beta(x_k, y) = \min\left\{ 1, \frac{\mu_{\mathrm{DE}}^{b}(x_k)\, \mu_{\mathrm{RTE}}^{b}(y)}{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x_k)} \right\}. \]
 □
The importance of this lemma is that it equates β, which depends on \( q_2 \), with a form that is computable; note that \( q_2 \) has a complicated integral form and is therefore numerically challenging to evaluate. This form of the second-stage acceptance rate β was calculated in [18]. Later, we will discuss the dependence of β on ϵ through the posterior distribution of the inverse problem for the RTE, and show that when the RTE and DE are close, β is close to one.
We next discuss a property of β that we will need to show the convergence of the two-level MCMC method.
Lemma 3.
The second-level off-diagonal density satisfies the detailed balance equation
\[ p_2(x, y)\, \mu_{\mathrm{RTE}}^{b}(x) = p_2(y, x)\, \mu_{\mathrm{RTE}}^{b}(y). \]
Proof. 
Considering the form of β in Equation (24), when \( \mu_{\mathrm{DE}}^{b}(x)\mu_{\mathrm{RTE}}^{b}(y) < \mu_{\mathrm{DE}}^{b}(y)\mu_{\mathrm{RTE}}^{b}(x) \),
\[ \beta(x, y) = \frac{\mu_{\mathrm{DE}}^{b}(x)\, \mu_{\mathrm{RTE}}^{b}(y)}{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x)}, \qquad \beta(y, x) = 1. \]
Similarly, when \( \mu_{\mathrm{DE}}^{b}(x)\mu_{\mathrm{RTE}}^{b}(y) > \mu_{\mathrm{DE}}^{b}(y)\mu_{\mathrm{RTE}}^{b}(x) \),
\[ \beta(x, y) = 1, \qquad \beta(y, x) = \frac{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x)}{\mu_{\mathrm{DE}}^{b}(x)\, \mu_{\mathrm{RTE}}^{b}(y)}, \]
so dividing them gives, in both cases,
\[ \frac{\beta(x, y)}{\beta(y, x)} = \frac{\mu_{\mathrm{DE}}^{b}(x)\, \mu_{\mathrm{RTE}}^{b}(y)}{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x)} = \frac{\mu_{\mathrm{RTE}}^{b}(y)\, q_2(y, x)}{\mu_{\mathrm{RTE}}^{b}(x)\, q_2(x, y)}, \]
using Lemma 2. Rearranging gives
\[ \mu_{\mathrm{RTE}}^{b}(x)\, q_2(x, y)\, \beta(x, y) = \mu_{\mathrm{RTE}}^{b}(y)\, q_2(y, x)\, \beta(y, x), \]
which is the detailed balance equation, since \( p_2 = q_2\beta \) off the diagonal.
 □
This lemma gives rise to the following theorem.
Theorem 4.
The high-resolution target distribution \( \mu_{\mathrm{RTE}}^{b} \) is the invariant distribution of the Markov chain \( \{x_k\} \) with second-level transition kernel \( P_2(x, A) \), i.e.,
\[ \mu_{\mathrm{RTE}}^{b}(A) = \int P_2(x, A)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x, \]
where the transition kernel is defined through
\[ P_2(x, y) = p_2(x, y) + r_2(x)\, \delta_x(y). \]
Proof. 
Consider
\[ \int P_2(x, A)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x = \int\!\!\int_A p_2(x, y)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}y\, \mathrm{d}x + \int r_2(x)\, \delta_x(A)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x = \int_A \int p_2(y, x)\, \mu_{\mathrm{RTE}}^{b}(y)\, \mathrm{d}x\, \mathrm{d}y + \int_A r_2(x)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x, \]
using Lemma 3 and using the delta function to perform the integration over x in the second term. Then, from the definition of \( r_2 \),
\[ \int P_2(x, A)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x = \int_A \big( 1 - r_2(y) \big)\, \mu_{\mathrm{RTE}}^{b}(y)\, \mathrm{d}y + \int_A r_2(x)\, \mu_{\mathrm{RTE}}^{b}(x)\, \mathrm{d}x = \int_A \mu_{\mathrm{RTE}}^{b}(y)\, \mathrm{d}y = \mu_{\mathrm{RTE}}^{b}(A). \]
 □
This theorem demonstrates that the high-resolution target distribution, for us μ RTE b , is the invariant distribution of P 2 . Thus, the two-level MCMC method gives a list of elements { x k } that can be regarded as samples from μ RTE b .
To generate samples from the posterior based on the RTE using the one-level MCMC method, the forward map must be computed for each new proposal. To do this, one injects K boundary data \( \phi_k \) and measures J outgoing data \( l_j \), meaning that the RTE is solved K times and each solution is evaluated at J locations. As this is computationally expensive, one looks for a cheaper way to sample from the distribution. From Theorem 3, in the diffusion limit the posterior based on the RTE is close to the posterior based on the DE, and the diffusion equation is significantly faster to solve because it is posed over physical space only. Therefore, in the strong scattering regime, we can save computation by using the DE-assisted two-level MCMC method, in which the DE is used as a surrogate model to reject bad draws cheaply, so that the expensive RTE posterior has to be evaluated fewer times overall.

4.2. Properties of DE-Assisted Two-Level MCMC

In this section, we first present our result concerning the acceptance rate of the DE-assisted MCMC method and its dependence on ϵ , and then present the computational cost of our method compared to the one-level MCMC method.

4.2.1. Acceptance Rate of DE-Assisted Two-Level MCMC

There are many ways to improve MCMC. Different sampling methods such as the Gibbs sampler or independence samplers can make the algorithm more efficient. There are also delayed acceptance/rejection methods, and adaptive methods in which the low-resolution model is improved at each stage using results from the high-resolution model [23]. However, this is not the focus of the current paper. Our method is a delayed acceptance method. It relies on the two-level MCMC algorithm and the diffusion limit to improve the efficiency of sampling from the posterior distribution for the RTE.
We emphasize that the DE-assisted two-level MCMC algorithm in Algorithm 3 has two acceptance rates, α and β. The first-level acceptance rate α depends only on the posterior based on the DE, so it has no ϵ dependence. The second-level acceptance rate β depends on ϵ through the posterior based on the RTE. As ϵ → 0, the posterior based on the RTE becomes closer to the posterior based on the DE, so we expect that the samples that pass the first-level selection criterion have a high chance of being accepted in the second step as well. We quantify this in the following proposition.
Proposition 2.
In the diffusion limit, the acceptance rate β is high:
\[ |\beta - 1| = \mathcal{O}(\epsilon^{1-\alpha}), \]
as long as the proposal is reasonably close to the measured data,
\[ \left\| G_{\mathrm{DE}}(y) - b \right\|^2 \le -2\gamma^2 \alpha \ln\epsilon, \]
where \( \gamma^2 \) is the variance of the noise and α (here an exponent, not the first-level acceptance rate) is any constant between 0 and 1.
Proof. 
Suppose \( \| G_{\mathrm{DE}}(y) - b \|^2 \le -2\gamma^2\alpha\ln\epsilon \). Then, the likelihood function for the inverse diffusion equation satisfies
\[ \mu_{\mathrm{DE}}^{\sigma}(y) = \exp\left( -\frac{1}{\gamma^2} \frac{\| G_{\mathrm{DE}}(y) - b \|^2}{2} \right) > \exp\left( -\frac{1}{\gamma^2} \big( \gamma^2 (-\alpha \ln\epsilon) \big) \right). \]
In other words,
\[ \mu_{\mathrm{DE}}^{\sigma}(y) > \epsilon^{\alpha}. \]
Since the likelihood function is bounded below by \( \epsilon^{\alpha} \) and the prior has no ϵ dependence, the posterior distribution obeys a lower bound of the same order,
\[ \mu_{\mathrm{DE}}^{b}(y) \propto \mu_0(y)\, \mu_{\mathrm{DE}}^{\sigma}(y) = \mathcal{O}(\epsilon^{\alpha}). \]
Considering the form of β in Equation (24), we have
\[ \beta = \min\left\{ 1, \frac{\mu_{\mathrm{DE}}^{b}(x)\, \mu_{\mathrm{RTE}}^{b}(y)}{\mu_{\mathrm{DE}}^{b}(y)\, \mu_{\mathrm{RTE}}^{b}(x)} \right\} = \min\left\{ 1, \frac{\mu_{\mathrm{DE}}^{b}(x) \big( \mu_{\mathrm{DE}}^{b}(y) + C(y)\epsilon \big)}{\mu_{\mathrm{DE}}^{b}(y) \big( \mu_{\mathrm{DE}}^{b}(x) + C(x)\epsilon \big)} \right\}, \]
where Theorem 3 is used to write \( \mu_{\mathrm{RTE}}^{b} = \mu_{\mathrm{DE}}^{b} + C\epsilon \) for a bounded function C. Considering
\[ \frac{\mu_{\mathrm{DE}}^{b}(x) \big( \mu_{\mathrm{DE}}^{b}(y) + C(y)\epsilon \big)}{\mu_{\mathrm{DE}}^{b}(y) \big( \mu_{\mathrm{DE}}^{b}(x) + C(x)\epsilon \big)} = \frac{1 + C(y)\epsilon/\mu_{\mathrm{DE}}^{b}(y)}{1 + C(x)\epsilon/\mu_{\mathrm{DE}}^{b}(x)} = \left( 1 + \frac{C(y)\epsilon}{\mu_{\mathrm{DE}}^{b}(y)} \right) \left( 1 - \frac{C(x)\epsilon}{\mu_{\mathrm{DE}}^{b}(x)} + \cdots \right) = 1 + \mathcal{O}(\epsilon^{1-\alpha}), \]
where the Taylor expansion is justified because, by Equation (27), \( \epsilon/\mu_{\mathrm{DE}}^{b} = \mathcal{O}(\epsilon^{1-\alpha}) \) is small. Therefore, for small ϵ,
\[ |\beta - 1| = \mathcal{O}(\epsilon^{1-\alpha}). \]
 □
This proposition demonstrates that, as ϵ → 0 and the posterior based on the RTE becomes closer to the posterior based on the DE, more and more samples are accepted at the second level.
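A quick numerical check of Proposition 2 can be made under the same simplifying assumption as in its proof: the RTE posterior equals the DE posterior plus a bounded multiple of ϵ. The densities and the discrepancy factor below are hypothetical; the point is only that the average gap 1 − β over random pairs of states shrinks roughly linearly with ϵ.

```python
import numpy as np

rng = np.random.default_rng(1)

def mu_de(x):
    return np.exp(-0.5 * x ** 2)    # hypothetical surrogate posterior

def C(x):
    return np.cos(x)                # bounded O(1) discrepancy, as in the proof

def beta(x, y, eps):
    # Second-level acceptance rate in the form of Equation (24),
    # with mu_RTE = mu_DE + C*eps as in Theorem 3.
    mu_rte = lambda z: mu_de(z) + C(z) * eps
    return min(1.0, mu_de(x) * mu_rte(y) / (mu_de(y) * mu_rte(x)))

pairs = rng.uniform(-1.0, 1.0, size=(500, 2))
gaps = [np.mean([1.0 - beta(x, y, eps) for x, y in pairs])
        for eps in (2.0 ** -2, 2.0 ** -4, 2.0 ** -6)]
```

The gaps decrease monotonically as ϵ decreases, consistent with \( |\beta - 1| \to 0 \) in the diffusion limit.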

4.2.2. Computational Cost Comparison

In this section, we discuss the computational cost of our method compared to the one-level MCMC method.
Proposition 3.
Let Cost 1 denote the cost of obtaining k accepted samples using the one-level MCMC method, and Cost 2 denote the cost of obtaining k accepted samples using the DE-assisted two-level MCMC method. Let r 1 denote the acceptance rate of the one-level MCMC method. Let α denote the first-level acceptance rate of the DE-assisted two-level MCMC method, and β denote the second-level acceptance rate of the method.
  • \( \mathrm{Cost}_1 \) and \( \mathrm{Cost}_2 \) are
\[ \mathrm{Cost}_1 = \frac{k}{r_1}\, C_R, \qquad \mathrm{Cost}_2 = \frac{k}{\alpha\beta}\, C_D + \frac{k}{\beta}\, C_R, \]
    where \( C_D = \mathcal{O}(N_x^{cd}) \) is the cost of solving the diffusion equation with \( N_x \) grid points per dimension in d dimensions, and c is the exponent of the linear algebra solver used. Similarly, \( C_R = \mathcal{O}(N_x^{cd} N_v^{c(d-1)}) \) is the cost of solving the radiative transfer equation with the same parameters, where \( N_v \) is the number of grid points in the velocity domain.
  • Considering \( C_R \gg C_D \), we have
\[ \frac{\mathrm{Cost}_2}{\mathrm{Cost}_1} \approx \frac{r_1}{\beta}. \]
    Thus, the cost saving of our method comes from \( \beta > r_1 \).
Proof. 
First, we examine the cost of evaluating the posterior distribution based on the diffusion equation compared to the posterior based on the radiative transfer equation. The cost of evaluating each posterior distribution is directly proportional to the cost of computing a solution to the corresponding equation, so we instead compare the costs of solving the diffusion equation and the radiative transfer equation. Considering \( C_D = \mathcal{O}(N_x^{cd}) \) and \( C_R = \mathcal{O}(N_x^{cd} N_v^{c(d-1)}) \), we see that the diffusion equation is cheaper to solve than the radiative transfer equation by a factor of \( N_v^{c(d-1)} \), determined by the number of grid points in the velocity domain.
Next, we compute the number of MCMC iterations required to obtain k accepted samples for each method. For the one-level MCMC method, we simply have
\[ N_1 r_1 = k, \]
where \( r_1 \) is the acceptance rate and \( N_1 \) is the total number of iterations. For the two-level method,
\[ N_2 r_2 = k, \]
where \( N_2 \) is the total number of iterations and \( r_2 \) is the overall acceptance rate. To be more specific, a single sample must pass both levels of evaluation in order to be accepted. Supposing that the acceptance rate at the first level is α and at the second level is β, the overall acceptance rate is \( r_2 = \alpha\beta \), which gives
\[ N_2 \alpha\beta = k. \]
Using the above results, the cost of obtaining k accepted samples using the one-level MCMC method is
\[ \mathrm{Cost}_1 = N_1 C_R = \frac{k}{r_1}\, C_R. \]
Similarly, the cost of obtaining k accepted samples using the two-level method is
\[ \mathrm{Cost}_2 = N_2 C_D + \frac{k}{\beta}\, C_R = \frac{k}{\alpha\beta}\, C_D + \frac{k}{\beta}\, C_R. \]
Considering \( C_D \ll C_R \), the term containing \( C_D \) may be dropped from Equation (29). Then we have
\[ \frac{\mathrm{Cost}_2}{\mathrm{Cost}_1} = \frac{r_1}{\beta}. \]
Then, for \( \beta \gg r_1 \), the cost of obtaining k samples using the DE-assisted two-level MCMC is much less than the cost of obtaining k samples using the one-level MCMC. □
In practice, β and r 1 depend on the sampling strategy used, the initial values of the MCMC parameters, and the step size, so we give only a theoretical asymptotic estimate of the cost comparison.
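Both estimates in Proposition 3 are easy to evaluate numerically. The sketch below uses illustrative assumptions: an optimally scaling solver (c = 1) and the grid sizes from the numerics section (dx = 0.05 on the unit square, so 20 points per dimension, and 16 velocity points). It first compares the per-solve costs \( C_D \) and \( C_R \), then evaluates the asymptotic ratio \( \mathrm{Cost}_2/\mathrm{Cost}_1 \approx r_1/\beta \) with the Example 1 acceptance rates reported in Table 4.

```python
# Per-solve cost model from Proposition 3, with c = 1 standing in for a
# hypothetical optimally-scaling linear algebra solver.
def cost_de(nx, d, c=1.0):
    return nx ** (c * d)

def cost_rte(nx, nv, d, c=1.0):
    return nx ** (c * d) * nv ** (c * (d - 1))

nx, nv, d = 20, 16, 2                                # N_x per dimension, N_v, d
solve_ratio = cost_rte(nx, nv, d) / cost_de(nx, d)   # = N_v^{c(d-1)}

# With C_D << C_R, Cost2/Cost1 ~ r1/beta; rates from Table 4 (Example 1).
rates = {"eps=1": {"beta": 0.6931, "r1": 0.6000},
         "eps=2^-6": {"beta": 0.8939, "r1": 0.7350}}
cost_ratio = {k: v["r1"] / v["beta"] for k, v in rates.items()}
```

Here each RTE solve is \( N_v^{c(d-1)} = 16 \) times more expensive than a DE solve, and both cost ratios fall below one, consistent with the savings reported in the numerics section.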

5. Numerics

We summarize our numerical evidence in this section. We first show the convergence in the forward setting, demonstrating that the solution to the RTE indeed converges to that of the DE. We then demonstrate, with two different media configurations, the convergence of the posterior distributions using the Metropolis–Hastings MCMC method. The two-level MCMC results will also be shown.
Throughout this section, the DE is computed using the standard finite element method, and the RTE solver is a preconditioned GMRES-based method with upwinding in the physical domain, designed in [29].

5.1. Forward Model Convergence

We first review the convergence in the forward setup. As discussed in Theorem 2, as ϵ → 0, the solution to the RTE converges to the solution to the DE, which drives the convergence of the albedo operator to the DtN map. We show this numerically below.
We first use a pseudo-2D example, with the y-direction assumed to be homogeneous. The RTE is then a degenerate case given by
\[ \cos\theta\, \partial_x f = \sigma(x)\, \mathcal{L} f. \]
Here, the media is set as
\[ \sigma(x) = 1 + 9\,\chi_{[0.05,\,0.15]}(x) + 19\,\chi_{[0.35,\,0.45]}(x) + 29\,\chi_{[0.75,\,0.85]}(x), \]
where \( \chi_{[a,b]} \) is the characteristic function of the interval [a, b]. Figure 1 plots the solution ρ as a function of x for the given σ for the RTE with \( \epsilon = 1, 2^{-3} \), and \( 2^{-6} \), as well as the solution of the DE. Numerically, we set dx = 0.05 and dv = 2π/16. As ϵ becomes smaller and smaller, the density ρ of the radiative transfer equation becomes closer and closer to the solution of the diffusion equation. On the right panel of Figure 1, we plot the error, measured in \( L^2(\mathrm{d}x) \), as a function of ϵ on a log-log scale. The values of the error are shown in Table 1.
We furthermore plot the solution f on the phase space. As shown in Theorem 2, in the diffusion limit f loses its velocity dependence, and the solution becomes flat in v. This is seen numerically as well in Figure 2, which shows f(x, v) for different ϵ.
Similar convergence is numerically evaluated in two dimensions where Ω = [ 0 , 1 ] 2 and σ is chosen to be
\[ \sigma(x) = 1 + h\,\chi\big( |x - c| < r \big), \]
where \( x = (x, y) \in [0, 1]^2 \), h = 10, and r = 0.4. χ is the characteristic function, and the center is c = (0.5, 0.5). Since a 3D object is hard to visualize, we only plot the difference in ρ between the radiative transfer and diffusion solutions for different ϵ in Figure 3. We observe that the error decreases as ϵ → 0, as plotted in Figure 4.

5.2. Reconstruction Example 1

In this example, we test the inverse problem in Ω = [ 0 , 1 ] 2 R 2 with the true media
\[ \sigma(x) = 1 + h\,\chi\big( |x - c| < r \big), \]
where r = 0.4, h = 10, and c = (0.5, 0.5). This represents a disc of radius r and height h in the middle of the unit square. Since the media configuration is uniquely determined by the two parameters r and h, the reconstructed posterior distribution is a distribution on the (r, h) domain. The convergence and accuracy of the method are independent of dimension, but we use an example that can be visualized for the reader. Moreover, in most applications the biological tissues to be reconstructed have two or three components with very different optical properties, so the model used here is a realistic representation of the physical scenario. To compute the forward RTE, we use the GMRES method, with dx = 0.05 and dv = 2π/16. The light sources are placed on the left boundary (x = 0) at every discrete grid point in y, so 20 experiments are performed. For all these experiments, the sensors are located at all grid points on the boundaries. For each MCMC sample (r, h), evaluating \( \mu_{\mathrm{RTE}}^{b}(r, h) \) requires running 20 forward RTE solves with the media configuration (r, h), with the boundary condition being a Kronecker delta function at each light source.
We briefly mention that, to visualize the data, we apply a kernel density estimator that smooths the samples with a Gaussian kernel.
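For concreteness, below is a minimal sketch of such a Gaussian-kernel density estimate over the (r, h) samples. This is an assumed illustration, not the paper's code; the bandwidths are hypothetical and are chosen per parameter, since r and h live on very different scales.

```python
import numpy as np

def gaussian_kde_2d(samples, grid_r, grid_h, bw_r=0.02, bw_h=0.2):
    # Place a Gaussian bump at every MCMC sample (r_k, h_k) and normalize,
    # giving a smooth surrogate for the posterior density on the grid.
    R, H = np.meshgrid(grid_r, grid_h, indexing="ij")
    dens = np.zeros_like(R)
    for r, h in samples:
        dens += np.exp(-0.5 * (((R - r) / bw_r) ** 2 + ((H - h) / bw_h) ** 2))
    cell = (grid_r[1] - grid_r[0]) * (grid_h[1] - grid_h[0])
    return dens / (dens.sum() * cell)

# Fake samples drawn over the prior's range (r in [0, 0.5], h in [8, 12]).
rng = np.random.default_rng(3)
samples = np.column_stack([rng.uniform(0.0, 0.5, 300),
                           rng.uniform(8.0, 12.0, 300)])
grid_r = np.linspace(0.0, 0.5, 51)
grid_h = np.linspace(8.0, 12.0, 81)
density = gaussian_kde_2d(samples, grid_r, grid_h)
```

The returned array is nonnegative and integrates to one over the grid, so it can be plotted directly as the smoothed posterior surfaces shown in the figures.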
We first consider the solution to this problem in the one-level MCMC framework. The prior distribution is taken to be uniform in r from 0 to 0.5 and uniform in h from 8 to 12, and the variance of the noise is taken to be \( 10^{-4} \). With merely \( 10^3 \) MCMC steps, one can already distinguish the distribution functions. As seen in Figure 5, the posterior distribution using the RTE model with ϵ = 1 is much more smeared out than the one with \( \epsilon = 2^{-6} \), and the posterior distribution using the RTE model with \( \epsilon = 2^{-6} \) is significantly closer to that using the DE model.
To quantify the difference between these three distributions, we compute the Hellinger distances, documented in Table 2, which numerically confirm Theorem 3.
We then study the two-level MCMC method with the same parameters. The Gaussian-kernel smoothed posterior distributions are shown in Figure 6.
We examined the acceptance rate β: the number of samples accepted at the second level divided by the number accepted at the first level. As suggested by Proposition 2, smaller ϵ should give a higher acceptance rate, and this is indeed observed numerically. The results are shown in Table 3.
We also expect that, for small ϵ, distributions computed using the one-level MCMC method should be similar to the ones computed using the DE-assisted method. To compare them, we compute the Hellinger distance between the distribution based on the RTE computed using one-level MCMC for \( \epsilon = 2^{-6} \), shown in Figure 5, and the distribution based on the RTE computed using the DE-assisted MCMC for \( \epsilon = 2^{-6} \), shown in Figure 6. We obtain
\[ d_{\mathrm{Hell}}\Big( \mu_{\mathrm{RTE}}^{b}(\sigma)\big|_{\text{1-level}},\ \mu_{\mathrm{RTE}}^{b}(\sigma)\big|_{\text{DE-assisted}} \Big) = 0.2289, \]
demonstrating that the distributions based on the methods are similar.
Finally, we report the computational cost savings of using the DE-assisted two-level MCMC method compared to the one-level MCMC method. The values of β and r 1 are documented in Table 4. From the table, we see that using the DE-assisted two-level MCMC method saves us roughly 20% of the RTE evaluations, for each value of ϵ .
We emphasize that β and \( r_1 \) depend strongly on the sampling strategy used, the initial values of the MCMC parameters, the step size, and the shape of the prior distribution. If the prior distribution is centered far away from the maximum likelihood point, a large number of samples is needed to give a faithful representation of the posterior distribution function, and the behavior of the underlying equation becomes crucial for the computational savings.

5.3. Reconstruction Example 2

In this example, we take a more complicated media σ :
\[ \sigma(x) = 1 + \sum_{i=1}^{5} h_i\, \chi_i\big( |x - c_i| < r_i \big), \]
where \( x = (x, y) \in [0, 1]^2 \), the centers \( c_i \) are given, and \( \chi_i \) are characteristic functions. The heights \( h_i \) and radii \( r_i \) are the parameters to be reconstructed, so there are ten unknown parameters. Numerically, we set dx = 0.05 and dv = 2π/16, and the experimental setup is the same as in Example 1: we again take \( 10^3 \) MCMC steps, use the GMRES method for the forward RTE, place light sources on the left boundary (x = 0) at every discrete grid point in y (so that 20 experiments are performed), and locate sensors at all grid points on the boundaries. For each MCMC sample, evaluating the RTE posterior requires running 20 forward RTE solves with the boundary condition being a Kronecker delta function at each light source.
The true parameters are set as
\[ (h_1, r_1) = (5, 0.19), \quad (h_2, r_2) = (1, 0.1), \quad (h_3, r_3) = (7, 0.09), \quad (h_4, r_4) = (4, 0.13), \quad (h_5, r_5) = (10, 0.04), \]
with the centers
\[ c_1 = (0.5, 0.5), \quad c_2 = (0.2, 0.35), \quad c_3 = (0.75, 0.2), \quad c_4 = (0.8, 0.85), \quad c_5 = (0.3, 0.8). \]
The true σ is depicted in Figure 7.
To test the one-level MCMC result, we feed the algorithm an initial guess that can be up to 40% away from the true value:
\[ h_i^{\mathrm{guess}} = (0.6 + 0.4\cdot\mathrm{rand}) \cdot h_i^{\mathrm{true}}, \]
and similarly for each \( r_i \). The prior is set as a uniform distribution, and the variance of the noise is again taken to be \( 10^{-4} \). In Figure 8, we plot the marginal posterior distribution of \( (r_1, r_4) \) for the DE and for the RTE with ϵ = 1 and \( \epsilon = 2^{-6} \). The posterior distribution of the RTE with small ϵ is markedly closer to that of the DE model, while the RTE model with large ϵ gives a very different result. In Table 5, we document the means and variances of the marginal distributions for \( r_1 \) and \( r_4 \).
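The perturbed initialization can be sketched as follows (a hypothetical reading of the formula, using the true \( (h_i, r_i) \) listed earlier): as written, each parameter is scaled by a uniform factor in [0.6, 1.0], i.e., underestimated by at most 40%.

```python
import numpy as np

rng = np.random.default_rng(4)

# True parameters from Example 2.
h_true = np.array([5.0, 1.0, 7.0, 4.0, 10.0])
r_true = np.array([0.19, 0.10, 0.09, 0.13, 0.04])

# h_guess = (0.6 + 0.4 * rand) * h_true, and similarly for r:
# each entry lands between 60% and 100% of the true value.
h_guess = (0.6 + 0.4 * rng.uniform(size=5)) * h_true
r_guess = (0.6 + 0.4 * rng.uniform(size=5)) * r_true
```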
We again compute the Hellinger distance between the marginal distributions for \( (r_1, r_4) \) of the posterior distributions as a function of ϵ; it is documented in Table 6. The results numerically confirm Theorem 3.
As in Example 1, we document the acceptance rate. As seen clearly in Table 7, the acceptance rate increases as ϵ shrinks to zero.
As before, we expect that, for small ϵ, the distributions computed using the one-level MCMC method and using our method should be similar. To compare them, we compute the Hellinger distance between the distribution based on the RTE computed using one-level MCMC for \( \epsilon = 2^{-6} \), shown in Figure 8, and the distribution based on the RTE computed using the DE-assisted MCMC for \( \epsilon = 2^{-6} \), shown in Figure 9. We obtain
\[ d_{\mathrm{Hell}}\Big( \mu_{\mathrm{RTE}}^{b}(\sigma)\big|_{\text{1-level}},\ \mu_{\mathrm{RTE}}^{b}(\sigma)\big|_{\text{DE-assisted}} \Big) = 0.1357, \]
again demonstrating that the distributions based on the methods are similar.
Finally, we report the computational cost savings of using the DE-assisted two-level MCMC method compared to the one-level MCMC method for Example 2. The values of β and r 1 for Example 2 are documented in Table 8. From the table, we see that using the DE-assisted two-level MCMC method saves us roughly 12% of the RTE evaluations, for both values of ϵ that we considered.
We again emphasize that β and r 1 highly depend on the sampling strategy used, the initial values of the MCMC parameters, and the step size.

6. Conclusions

In this paper, we solve the inverse problem for the RTE using Bayesian inference, which gives us a posterior distribution for the quantity of interest. Accessing concrete information about this distribution using MCMC can be difficult, because the RTE is fairly expensive to solve. However, in the strong scattering regime, with the Knudsen number ϵ going to zero, the posterior distribution based on the RTE converges to the posterior distribution based on the DE. With this knowledge, we employ a two-level MCMC technique in which the posterior based on the DE is used to create good samples for the posterior based on the RTE. In this way, we save time by rejecting poor samples using the DE-based posterior, which is fast to evaluate. This reduces the number of forward RTE solves required, improving the efficiency of the computation. We also prove that the second-level acceptance rate of this method is close to one in the diffusion limit, meaning that, for small ϵ, samples accepted at the first level have a high chance of being accepted at the second level as well.

Author Contributions

Conceptualization, Q.L. and K.N.; Methodology, Q.L. and K.N.; Software, Q.L. and K.N.; Validation, Q.L. and K.N.; Formal Analysis, Q.L. and K.N.; Investigation, Q.L. and K.N.; Resources, Q.L. and K.N.; Data Curation, Q.L. and K.N.; Writing-Original Draft Preparation, Q.L. and K.N.; Writing-Review & Editing, Q.L. and K.N.; Visualization, Q.L. and K.N.; Supervision, Q.L.; Project Administration, Q.L.; Funding Acquisition, Q.L.

Funding

This research was funded by NSF DMS 1619778, NSF DMS 1750488 and NSF TRIPODS 1740707.

Acknowledgments

The authors would like to thank Chi Zhang for her work running the MCMC code and help editing the draft.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Arridge, S.R.; Schotland, J.C. Optical tomography: Forward and inverse problems. Inverse Probl. 2009, 25, 123010.
  2. Bal, G. Inverse transport theory and applications. Inverse Probl. 2009, 25, 053001.
  3. Egger, H.; Schlottbom, M. Numerical methods for parameter identification in stationary radiative transfer. Comput. Optim. Appl. 2015, 62, 67–83.
  4. Ren, K. Recent developments in numerical techniques for transport-based medical imaging methods. Commun. Comput. Phys. 2010, 8, 1–50.
  5. Klose, A.D.; Netz, U.; Beuthan, J.; Hielscher, A.H. Optical tomography using the time-independent equation of radiative transfer—Part 1: Forward model. J. Quant. Spectrosc. Radiat. Transf. 2002, 72, 691–713.
  6. Klose, A.D.; Netz, U.; Beuthan, J.; Hielscher, A.H. Optical tomography using the time-independent equation of radiative transfer—Part 2: Inverse model. J. Quant. Spectrosc. Radiat. Transf. 2002, 72, 715–732.
  7. Mourant, J.R.; Freyer, J.; Hielscher, A.H.; Eick, A.; Shen, D.; Johnson, T. Mechanisms of light scattering from biological cells relevant to noninvasive optical-tissue diagnostics. Appl. Opt. 1998, 37, 3586–3593.
  8. Choulli, M.; Stefanov, P. Inverse scattering and inverse boundary value problems for the linear Boltzmann equation. Commun. Partial Differ. Equ. 1996, 21, 763–785.
  9. Wang, J.N. Stability estimates of an inverse problem for the stationary transport equation. Annales de l'I.H.P. Physique Théorique 1999, 70, 473–495.
  10. Sylvester, J.; Uhlmann, G. A global uniqueness theorem for an inverse boundary value problem. Ann. Math. 1987, 125, 153–169.
  11. Alessandrini, G. Stable determination of conductivity by boundary measurements. Appl. Anal. 1988, 27, 153–172.
  12. Hielscher, A.H.; Alcouffe, R.; Barbour, R. Comparison of finite-difference transport and diffusion calculations for photon migration in homogeneous and heterogeneous tissues. Phys. Med. Biol. 1998, 43, 1285–1302.
  13. Arridge, S.R. Optical tomography in medical imaging. Inverse Probl. 1999, 15, R41–R93.
  14. Chen, K.; Li, Q.; Wang, L. Stability of stationary inverse transport equation in diffusion scaling. Inverse Probl. 2018, 34, 025004.
  15. Lai, R.-Y.; Li, Q.; Uhlmann, G. Inverse problems for the stationary transport equation in the diffusion scaling. arXiv 2018, arXiv:1808.02071.
  16. Stuart, A.M. Inverse problems: A Bayesian perspective. Acta Numer. 2010, 19, 451–559.
  17. Dashti, M.; Stuart, A.M. The Bayesian approach to inverse problems. In Handbook of Uncertainty Quantification; Springer International Publishing: Basel, Switzerland, 2017.
  18. Bal, G.; Langmore, I.; Marzouk, Y. Bayesian inverse problems with Monte Carlo forward models. Inverse Probl. Imaging 2013, 7, 81–105.
  19. Giles, M.B. Multi-level Monte Carlo path simulation. Oper. Res. 2008, 56, 607–617.
  20. Heinrich, S. Multilevel Monte Carlo methods. In Large-Scale Scientific Computing; Birkhäuser: Basel, Switzerland, 2001.
  21. Christen, J.A.; Fox, C. Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat. 2005, 14, 795–810.
  22. Fox, C.; Nicholls, G. Sampling conductivity images via MCMC. In The Art and Science of Bayesian Image Analysis; Leeds University Press: Leeds, UK, 1997; pp. 91–100.
  23. Peherstorfer, B.; Willcox, K.; Gunzburger, M. Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Rev. 2018, 60, 550–591.
  24. Newton, K.; Li, Q.; Stuart, A.M. Diffuse optical tomography in the Bayesian framework. arXiv 2019, arXiv:1902.10317.
  25. Kullback, S. Information Theory and Statistics; Reprint of the second (1968) ed.; Dover Publications: Mineola, NY, USA, 1997.
  26. Bensoussan, A.; Lions, J.-L.; Papanicolaou, G.C. Boundary layers and homogenization of transport processes. Publ. Res. Inst. Math. Sci. 1979, 15, 53–157.
  27. Bardos, C.; Santos, R.; Sentis, R. Diffusion approximation and computation of the critical size. Trans. Am. Math. Soc. 1984, 284, 617–649.
  28. Wu, L.; Guo, Y. Geometric correction for diffusive expansion of steady neutron transport equation. Commun. Math. Phys. 2015, 336, 1473–1553.
  29. Li, Q.; Wang, L. Implicit asymptotic preserving method for linear transport equations. Commun. Comput. Phys. 2017, 22, 157–181.
Figure 1. The solution ρ as a function of x for different ϵ , and the error in the solution ρ as a function of ϵ .
Figure 2. The solution as a function of x and v for different ϵ .
Figure 3. Difference between the RTE and DE solutions in 2D for different ϵ .
Figure 4. \( L^2 \) norm of the error for the 2D solution ρ as a function of ϵ.
Figure 5. Multivariate kernel distribution representing the posterior distributions for the DE and the RTE with ϵ = 1 and \( \epsilon = 2^{-6} \), obtained using the one-level MCMC method.
Figure 6. Multivariate kernel distribution representing the posterior distribution obtained from the two-level MCMC method for ϵ = 1 and \( \epsilon = 2^{-6} \).
Figure 7. The true σ as a contour plot for Example 2.
Figure 8. Multivariate kernel distribution for \( r_1 \) and \( r_4 \) for the DE and the RTE with ϵ = 1 and \( \epsilon = 2^{-6} \), using one-level MCMC.
Figure 9. Multivariate kernel distribution for \( r_1 \) and \( r_4 \) using two-level MCMC with ϵ = 1 and \( \epsilon = 2^{-6} \).
Table 1. The values of the error in the solution ρ as a function of ϵ.

ϵ:      1       1/2     1/4     1/16    1/32    1/64
error:  0.2191  0.2004  0.1737  0.1418  0.1130  0.0990
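The observed rate of convergence can be estimated from the Table 1 data with a least-squares fit of log(error) against log(ϵ). This is illustrative post-processing, not part of the paper's pipeline; at these fixed grid resolutions the empirical slope sits well below the asymptotic \( \mathcal{O}(\epsilon) \) rate of Proposition 1.

```python
import numpy as np

# Error values copied from Table 1.
eps = np.array([1.0, 1/2, 1/4, 1/16, 1/32, 1/64])
err = np.array([0.2191, 0.2004, 0.1737, 0.1418, 0.1130, 0.0990])

# Least-squares slope of log(err) vs log(eps) estimates the observed order.
slope, _ = np.polyfit(np.log(eps), np.log(err), 1)
```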
Table 2. Hellinger distance \( d_{\mathrm{Hell}}(\mu_{\mathrm{DE}}^{b}(\sigma), \mu_{\mathrm{RTE}}^{b}(\sigma)) \), with \( \mu_{\mathrm{RTE}}^{b}(\sigma) \) computed with the RTE as the forward solver using ϵ = 1, \( 2^{-3} \), and \( 2^{-6} \), respectively.

ϵ:                                  1       2^{-3}  2^{-6}
d_Hell(μ_DE^b(σ), μ_RTE^b(σ)):      0.6418  0.5322  0.2219
Table 3. Acceptance rate β for Example 1.

ϵ:                1       2^{-3}  2^{-6}
acceptance rate:  0.6931  0.8736  0.8939
Table 4. Computational cost comparison.

         ϵ = 1   ϵ = 2^{-6}
β        0.6931  0.8939
r_1      0.6000  0.7350
r_1/β    0.8656  0.8222
Table 5. Means and variances of the marginal distributions for \( r_1 \) and \( r_4 \) for the DE, the RTE with ϵ = 1, and the RTE with \( \epsilon = 2^{-6} \), for two-level MCMC.

                  Mean(r_1)  Mean(r_4)  Var(r_1)  Var(r_4)
RTE, ϵ = 1        0.1483     0.1518     0.0468    0.0464
RTE, ϵ = 2^{-6}   0.1856     0.1278     0.042     0.0076
DE                0.1895     0.1273     0.0078    0.0038
Table 6. Hellinger distance for the marginal distributions for \( r_1 \) and \( r_4 \): \( d_{\mathrm{Hell}}(\mu_{\mathrm{DE}}^{b}(\sigma), \mu_{\mathrm{RTE}}^{b}(\sigma)) \), with \( \mu_{\mathrm{RTE}}^{b}(\sigma) \) computed with the RTE as the forward solver using ϵ = 1, \( 2^{-3} \), and \( 2^{-6} \), respectively.

ϵ:                                  1       2^{-3}  2^{-6}
d_Hell(μ_DE^b(σ), μ_RTE^b(σ)):      0.8693  0.6324  0.4440
Table 7. Acceptance rate β for Example 2.

ϵ:                1       2^{-3}  2^{-6}
acceptance rate:  0.7230  0.8969  0.9305
Table 8. Computational cost comparison.

         ϵ = 1   ϵ = 2^{-6}
β        0.7230  0.9305
r_1      0.6450  0.8180
r_1/β    0.8921  0.8791
