Diffusion Equation-Assisted Markov Chain Monte Carlo Methods for the Inverse Radiative Transfer Equation

Optical tomography is the process of reconstructing the optical properties of biological tissue using measurements of incoming and outgoing light intensity at the tissue boundary. Mathematically, light propagation is modeled by the radiative transfer equation (RTE), and optical tomography amounts to reconstructing the scattering coefficient in the RTE using the boundary measurements. In the strong scattering regime, the RTE is asymptotically equivalent to the diffusion equation (DE), and the inverse problem becomes reconstructing the diffusion coefficient using Dirichlet and Neumann data on the boundary. We study this problem in the Bayesian framework, meaning that we examine the posterior distribution of the scattering coefficient after the measurements have been taken. However, sampling from this distribution is computationally expensive, since to evaluate each Markov Chain Monte Carlo (MCMC) sample, one needs to run the RTE solvers multiple times. We therefore propose the DE-assisted two-level MCMC technique, in which bad samples are filtered out using DE solvers that are significantly cheaper than RTE solvers. This allows us to make sampling from the RTE posterior distribution computationally feasible.


Introduction
Optical tomography is a medical imaging technique in which near infrared light is sent into biological tissue, and the reflected and transmitted outgoing light at the surface of the tissue is measured [1][2][3][4]. Using information about the incoming and outgoing light, one can determine the properties of the tissue. The use of low-energy light makes optical tomography cheaper and less invasive than traditional methods such as X-ray imaging, although the reconstruction of the properties of the tissue can be more difficult than when high-energy photons are used. Optical imaging can be used to study and monitor many different kinds of tissue, including brain, breast, and joint imaging, as well as monitoring blood oxygenation [5][6][7].
The process of collecting information about the incoming and outgoing light and using it to reconstruct the tissue's properties is an inverse problem. There are two associated forward models that both map the incoming data (light intensity sent into the tissue) to the outgoing data (light intensity measured coming off the tissue). The first forward model corresponds to the radiative transfer equation (RTE) and is called the albedo operator. Using the albedo operator, the inverse problem may be solved to obtain the scattering coefficient in the RTE. The second forward model corresponds to the diffusion equation (DE) and is called the Dirichlet-to-Neumann (DtN) map. Using the DtN map, one may similarly solve the inverse problem and reconstruct the diffusion coefficient in the DE. Typically, RTE is used for high-energy photons and DE is used for low-energy photons. The energy of photons determines the "mean-free-path" of free transport, and then further determines the strength of scattering. Mathematically, it can be characterized by the Knudsen number, and one can show that in the small Knudsen number regime, the two equations are equivalent, as will be made clearer in Section 3. It was shown in [8] that the coefficients of the RTE are uniquely recoverable in 3D when the entire albedo operator is known, and it was shown in [9] that this reconstruction is Lipschitz stable. On the other hand, using the DtN map to reconstruct the corresponding coefficients of the DE is the famously ill-posed Calderón problem. The uniqueness of the reconstruction has been presented in [10], and the logarithmically ill-posed nature of the problem has been proved in [11].
These two forward models are related, as we will describe. The RTE describes light propagating through a material with some optical properties, here taken to be the scattering coefficient of the material. Let f (x, v) denote the distribution of particles at location x with direction v, where x ∈ Ω ⊂ R d , and v ∈ S d−1 , the unit sphere in R d , where d is the dimension of the problem. In other words, all particles are taken to move with constant unit speed. In the later parts of the paper, we use "velocity" and "direction" interchangeably when no ambiguity is present. For simplicity, we work with a version of the RTE in which the scattering coefficient σ(x) depends only on position. Then the RTE is where L is an integral operator describing photons colliding with the media and scattering off. Its explicit formulation is in Section 3.
The diffusion equation, on the other hand, is a simplified model from the RTE, accurate for low-energy photons in the high-scattering, low-absorption regime. Since the scattering is very strong, the distribution achieves equilibrium in the velocity domain, and the light intensity, also known as fluence, becomes a function only on physical space. Suppose that ρ(x) is the light intensity at position x, and a(x) is the diffusion coefficient (corresponding to 1 σ(x) from the RTE, as will be shown in Theorem 2). Then the DE is C∇ · (a(x)∇ρ(x)) = 0 , where C is a constant depending on dimension. In this case, the map from the Dirichlet data (light intensity or fluence injected into the tissue) to the Neumann data (light propagating out) is used to reconstruct the diffusion coefficient a(x). This map is known as the Dirichlet-to-Neumann (DtN) map.
There is a relationship between the two forward models RTE and DE. One would like to understand, physically and mathematically, why the stability of each model is different in the inverse setting. It turns out that if a(x) and σ(x) satisfy certain relations, the RTE and DE are asymptotically close in the near infrared case. This is made precise below in Theorem 2. Physically, in the forward setting, high-energy photons experience little scattering before exiting the domain, whereas low-energy photons scatter frequently in the tissue before they exit. As a result, the reconstruction of the tissue using high-energy photons is generally crisp, whereas low-energy photons provide more blurred images. We use the "Knudsen number" to quantify how much scattering a photon experiences in the material. In the low-energy regime, scattering increases, the Knudsen number shrinks to zero, and the RTE converges to the DE in the forward setting. On the inverse side, the inverse problem for the RTE converges to the inverse problem for the DE, meaning that the information carried in the albedo operator is almost the same as that in the DtN map, and therefore the reconstruction of the tissue properties converges in this limit as well. This has been numerically observed in [12,13], and further proved rigorously in [14,15].
A Bayesian solution to the inverse problem is seen as the posterior distribution for the quantity of interest (QoI), in our case the scattering coefficient in the RTE and the diffusion coefficient in the DE. Bayesian methods are particularly useful for inverse problems, because noise in the measurement as well as prior information about the QoI is taken into account naturally [16]. If we observe some noisy data in order to obtain information about the QoI, the solution of the inverse problem is the probability distribution of the QoI given this data, known as the posterior distribution [17]. Bayes' theorem allows us to determine this distribution given a guess for the probability distribution of the QoI (the prior distribution) and the probability distribution for the data given the QoI (the likelihood function, obtainable from the forward model). As such, there is a sharp distinction between this probabilistic view and other deterministic tools, and the Bayesian formulation gives one the ability to "regularize" an inverse problem with some prior knowledge.
While we theoretically have the posterior distribution in hand, accessing concrete information about it such as its mean and variance can be challenging. One way to obtain this information is by creating a list of samples from the distribution by some means, such as the Markov chain Monte Carlo (MCMC) method. Even this, however, is not always straightforward. One sample from the RTE represents an entirely new configuration of the media, meaning that one must re-compute the forward albedo map for each MCMC sample. This is especially expensive because the RTE is over phase space: if the domain is three-dimensional, then f is supported on (x, v) ∈ Ω × S d−1 , a five-dimensional space. The same process applies for the DE as well, but since the DE is only over physical space, these computations are much faster. Knowing that the RTE converges to the DE in the high scattering regime, one wishes to combine the two and speed up the computation.
There is a balance to be struck here, and the goal of this paper is to find a way to sample from the posterior distribution for σ in a way that combines the DE and RTE. To that end, we employ the two-level MCMC method [18], also known as the two-stage method. The two-level method uses two models, which we will call the low-resolution model and the high-resolution model. The high-resolution model gives rise to the desired target distribution-for us, the posterior distribution for the RTE. The low-resolution model gives rise to a distribution that approximates the target distribution, and is ideally fast to compute. It is used to filter out poor draws so we need not waste time evaluating the target distribution on them. We know the DE is fast to compute, and that when scattering is sufficiently strong, the posterior based on the DE is a good approximation to the posterior based on the RTE, so we use the posterior based on the DE as the low-resolution model. Then, we can use the DE to reduce the number of times we must solve the RTE by rejecting bad draws. This method combines the inverse problem for the DE and the inverse problem for the RTE to create a faster method for sampling from the inverse problem for the RTE in the diffusion limit.
There are similar methods to approach related problems, such as the multilevel Monte Carlo method for path simulation [19] or the multilevel Monte Carlo method for parametric integration [20]. These methods also combine low-resolution models with a high-resolution model. The algorithm our method relies on, introduced in [18], is similar to the methods proposed in [21,22]. In this view, our algorithm is a two-stage method that increases the number of MCMC iterations for a given computational cost. It uses a two-stage delayed acceptance, in which the candidate sample y has to be accepted with a low-resolution model before it is passed on to be either accepted or rejected by the high-resolution model. However, typically for these methods, a coarse-grid approximation is used for the low-resolution model [23]. No matter how coarse the grids are for the RTE, however, they are still on phase space and the dimensionality issue is still left open. In our method, however, a completely different model is used: the inverse diffusion equation is used as a low-resolution model for the inverse radiative transfer equation.
This paper is organized as follows. In Section 2, we provide some necessary background, including a discussion of Bayesian methods for inverse problems, as well as an introduction to Markov chain Monte Carlo methods. In Section 3, we discuss the diffusion limit of the radiative transfer equation in the forward setting, and go on to discuss the inverse setting convergence of the posterior distributions. In Section 4, we discuss the DE-assisted two-level MCMC method, and prove its convergence. We also discuss the dependence of the second-level acceptance rate on , the Knudsen number. In Section 5, we present our numerical evidence. In particular, we show the convergence of the two forward models, the convergence of the posterior distribution functions quantified using the Hellinger distance, and the improvement of the DE-assisted two-level MCMC over the standard MCMC.

Background
In this section, we present preliminaries for studying the inverse problem for the RTE in the diffusion limit. In particular, we will first review basic concepts of the Bayesian formulation, and then review the previous algorithms on which our algorithm is based.
These results will be used to study the convergence of the posterior distribution with the RTE or the DE as the forward model, which will be used to develop our numerical method that combines the two posterior distributions.

Bayesian Formulation for Inverse Problems
Bayesian inference is a technique for estimating the distribution of some quantity of interest when some measurements are available. It is based on Bayes' theorem. A given physical problem may be denoted as where G is the forward map that takes a parameter σ to the measurement b, with an added noise η. The forward problem is to find b = G(σ) for a given σ, while most problems in practice are naturally inverse problems, meaning that one conducts experiments and obtains data b, and tries to use it to reconstruct the quantity of interest σ. The Bayesian formulation is a method for retrieving information about σ using b. It requires knowledge of the following two probability distribution functions µ 0 and µ error ahead of time, -without knowledge of the measurement b, a priori σ obeys a certain law, and the noise is distributed as: In many cases, both distribution functions can be assumed to be normal, i.e., to have a Gaussian-type density function. Suppose that µ 0 is a Gaussian distribution concentrated at m 0 with covariance C 0 and the noise is concentrated at zero with covariance C error . If σ is finite dimensional then we may express the prior µ 0 and likelihood µ σ as follows: .
Analogous formulae are also available in the infinite dimensional setting; see [16,17] and the references therein.
With this, the posterior distribution of σ, under the condition that b is obtained in experiment, is given by Bayes' theorem, where is the normalization factor. In optical tomography, there are two fundamental models for describing light propagation: the radiative transfer equation (RTE), and the diffusion equation (DE). They rely on the albedo operator and the DtN map for the reconstruction. In the optically thick case (i.e., when photons scatter frequently), the two models are asymptotically close in some sense, Correspondingly, the posterior distributions given by the two models are close to each other [24], meaning: To be more precise, there are multiple ways to quantify the distance between two distribution functions. Typically, this distance is quantified by the Kullback-Leibler (KL) divergence or the Hellinger distance [25]. For any two distributions µ and µ supported on the same space, they are defined by the following, where λ is any pre-chosen distribution function. The Hellinger distance is invariant to the choice of λ. Again, these formulae have interpretations in the infinite dimensional setting; see the Appendix of [17]. In Theorem 3, we quantify the similarity between µ b DE (σ) and µ b RTE (σ) and find that the two distributions are apart in the diffusion limit. Therefore, since µ b DE is close to µ b RTE in the optically thick case, the main task in this paper is to use µ b DE to sample from µ b RTE .

The Markov Chain Monte Carlo Method
We first present the standard Metropolis-Hastings (MH) algorithm. Given a probability density µ called the target distribution and defined on X, the MH algorithm constructs a Markov chain on X that is stationary with respect to µ. The elements of the Markov chain are then regarded as samples from the distribution µ. More specifically, the MH algorithm starts with an initial guess x 0 , and draws new samples according to a proposal distribution q. By adjusting the acceptance rate using the target distribution µ, the MH algorithm accepts or rejects the draws so that the accepted samples form an empirical distribution that resembles the target distribution µ. We now present the MH algorithm, shown in Algorithm 1.
The transition kernel P of this algorithm is In order to demonstrate the convergence of MCMC in Theorem 1, it is necessary to examine the transition kernel, which is the probability that the next draw x k+1 is in the set A given the previous elements of the chain x 0 , . . . , x k . It may be written where the off-diagonal density of the kernel is and the probability that the process remains at x is The goal is to show that the target distribution µ is an invariant distribution of the Markov chain {x k } k≥0 in the sense that all measurable sets A satisfy This means that elements of the Markov chain generated by Algorithm 1 give a good representation of the distribution µ. To show that µ is the invariant distribution, one first needs p(x, y) to satisfy the detailed balance lemma. Lemma 1. The off-diagonal density p(x, y) satisfies the following equation known as "detailed balance", This lemma allows us to show the following theorem.
Theorem 1. The target distribution µ is an invariant distribution of the Markov chain x n with transition kernel P, i.e., For our problem, each new proposal y is a new configuration of the media σ(x). Thus, to compute α(x k , y), one must evaluate µ(y) and thus re-compute the forward map, which requires solving the RTE many times. As this is computationally prohibitive, we instead use the two-level MCMC method, described in the next section. The cost comparison is discussed further in Section 5.

Two-Level MCMC Method
The two-level MCMC method is a method to increase the efficiency of sampling from the target distribution µ. It requires two distributions: a target distribution µ, and a second distribution µ . Here, the target distribution µ calls for the evaluation of a "high-resolution" model, but if it can be approximated in some sense by µ which only calls for the evaluation of some "low-resolution" model, then µ can serve as a good filter to reject poor draws in the MCMC sampling. More specifically, the algorithm goes through two levels of evaluating a proposed sample. First, the sample is evaluated using the low-resolution model and is either accepted or rejected. If it is accepted, the algorithm goes on to evaluate the sample using the high-resolution model. This "pre-acceptance" stage filters out poor draws, allowing one to make more courageous proposals and not waste time evaluating the forward model on them. We present the two-level scheme below in Algorithm 2.
Algorithm 2 Two-level Metropolis Hastings 3. With probability α, pre-accept y and continue to 4. Otherwise, set x k+1 = x k and start over. 4. The second-level proposal is now y, effectively drawn from Then set With probability β(x k , y), accept and set Similarly to the MCMC method (Algorithm 1), in the two-level MCMC method, the draw of x k+1 merely depends on the evaluation of x k , and the previous draws x 1 , . . . x k−1 are irrelevant. The transition kernel that brings x k to x k+1 is denoted by P 2 , which we will discuss in more detail in Section 4.1.
In this paper, the desired high-resolution model will be the posterior distribution for the RTE. The low-resolution model will be the posterior distribution for the DE. Evaluating µ (y) only involves computing the DE, which is much faster since the DE is supported on the physical domain. Once the DE accepts the proposal y, it passes to the inverse problem for the RTE's posterior distribution µ(y). Thus, we compute the RTE forward model fewer times overall, which saves time.
In Section 4, we discuss the convergence of this method and the dependence of the second-level acceptance rate β on the Knudsen number , our limit parameter.

Diffusion Limit
In this section, we examine the diffusion limit of our problem in the forward and inverse case. We first study the convergence of the radiative transfer equation to the diffusion equation. Then, we explain how the inverse problems are solved using the forward map. Next, we discuss the convergence of the forward map for the radiative transfer equation to that of the diffusion equation. Finally, we discuss the convergence of the posterior distribution for the radiative transfer equation to the posterior for the diffusion equation.

Diffusion Limit of the Radiative Transfer Equation
The optical "thickness" of the material physically corresponds to the number of times a photon scatters between entering a medium and escaping. The physical quantity is termed the Knudsen number, which stands for the ratio of mean free path to the domain length. The mean free path is the average distance a photon travels before being scattered. When the Knudsen number is small, photons, on average, scatter many times before they are emitted, and the material is thus regarded as optically "thick". In this regime, the two mathematical models for light propagation carry the same information, namely, the radiative transfer equation and the diffusion equation are asymptotically converging.
The radiative transfer equation takes a statistical mechanics viewpoint, and describes the distribution of photons on the phase domain. Let f (x, v) denote the number of photons at position x ∈ Ω ⊂ R d , a bounded domain, moving in direction v ∈ S d−1 , the unit sphere in R d (i.e., the speed is normalized to be 1). This distribution satisfies the RTE, where the collision operator is In the equation, the term v · ∇ f on the left shows that the photons move with direction v, and the term on the right shows that the photons colliding with the media and being scattered. We have used the notation v to denote normalized integration over v, and dv is the normalized unit measure, meaning The equation has a unique solution when it is equipped with the incoming boundary condition, which is the analogue of a Dirichlet boundary condition for equations lacking velocity space. Let denote the collection of coordinates on the boundary x ∈ ∂Ω, so that the velocity v points in/out of the domain: ±v · n x > 0. Here, n x is the normal vector at x pointing out of Ω. The incoming boundary condition is imposed on Γ − , whereas Γ + represents the particles going out of Ω. For a unique solution to (7), boundary conditions must be imposed on Γ − , It concerns the case when the scattering coefficient, now seen as k(x, v, v ), may depend on the changing velocity of the particles during a collision, and also includes an absorption coefficient σ a (x, v) representing the photons being absorbed into the material and lost. A standard k which takes into account other kinds of scattering is given by the Henyey-Greenstein model. The absorption coefficient can be taken to be zero if scattering is sufficiently high, and this is the case we focus on.
The equation is asymptotically equivalent to the diffusion equation in the optically thick regime when the Knudsen number is small. We denote the Knudsen number by and rescale the problem by setting σ → σ/ . Then, as the Knudsen number becomes small, the scattering effect dominates. Equation (7) may then be written as In the small regime, it was conjectured in [26] and then proved in [27,28] that the equation is asymptotically equivalent to the diffusion equation. One can make the convergence explicit under the following assumptions. • the admissible media is bounded, meaning that there is a constant C * so that: • and the boundary conditions are bounded, meaning: We also term the set of admissible media: With the assumption, we have the following theorem.
where ξ(x) is defined by φ(x). C d is a constant depending on the dimension d and could be dropped out of Equation (11). In particular, with compatible boundary conditions at different orders, one approximates f through different forms: Here, the constant C A depends on C * , the upper bound in Assumption 1 for the admissible set.
The proof, which we omit here, relies on asymptotic expansion away from the boundary.

Convergence in the Inverse Setting
We examine the convergence in the inverse setting in this section. To describe light propagation in a given tissue in Ω, there are two models, the radiative transfer equation that gives a statistical description, and the diffusion equation that characterizes the macroscopic behavior. The two models are asymptotically equivalent, as discussed in the previous section.
In optical tomography, light with a known intensity is injected into the material, and detectors are placed on the tissue boundary to collect the light current emitted from the material. For the RTE, the mapping from the incoming data to the outgoing data is termed the albedo operator, defined by H RTE , where f satisfies (9). It may also be written: In practice, finitely many incoming data φ k are injected and finitely many measurements are taken at the boundary location l j per experiment. We define the map, determined by the to-be-reconstructed σ, from the known incoming data to the measured outgoing data as: or in a compact form: where η is a vector of JK length that contains the noise in the measurements. Clearly, the JK length vector b is the result of a forward map G acting on the quantity of interest σ, with a small perturbation due to the noise. We assume that the noise is distributed as a Gaussian, According to Bayes' theorem, one then has: For the DE model, we consider the forward map to be the map that takes the Dirichlet data to the Neumann outflow. It is termed the DtN map: where ρ satisfies (11). Another way to write it is: Again, in practice, finitely many incoming data ξ k are injected and finitely many measurements are taken at the boundary l j per experiment. We define the map, determined by the to-be-reconstructed media σ, from ξ k to the measured data as: where η is the same pollution in the measurement. Again, the vector b is the result of the forward map for the diffusion equation acting on the to-be-reconstructed σ, with the perturbation from the noise. Then, similarly, we have It is proved in [24] that the two forward maps converge, namely: Proposition 1. Under Assumption 1, the forward maps G RTE and G DE satisfy for all σ ∈ A.
Using this convergence, it was also proved that the posterior distributions are close in the diffusion limit, as in the following theorem.

Theorem 3. Under Assumption 1, the Hellinger distance between the two posterior distribution is bounded by
in the optically thick regime when → 0, namely, Similarly, the Kullback-Leibler divergence between the posterior distribution for the RTE and the posterior distribution for the DE is also O( ), The proof largely depends on the Lipschitz continuity of the Gaussian form in the likelihood function, and the convergence result in Proposition 1. We omit the details and refer interested readers to [24].
Using Theorem 3, when is sufficiently small, one may use the diffusion equation posterior to approximate the radiative transfer equation posterior. This approximation allows us to speed up the MCMC computation by setting µ b DE = µ as the low-resolution model in the first level. This filters out bad draws, passing better draws to µ b RTE = µ on the second level.

Algorithm
In this section, we discuss the DE-assisted two-level MCMC method and its convergence for our case, the inverse problem for the RTE in the diffusion limit. We present a result about the second-level acceptance rate of two-level MCMC and its dependence on and a result about the computational cost of our algorithm compared to the one-level MCMC method.

DE-Assisted Two-Level MCMC Method
In this section, we discuss our algorithm, the DE-assisted two-level MCMC method. It is shown below in Algorithm 3.

Algorithm 3 DE-assisted two-level Metropolis Hastings
3. With probability α, pre-accept y and continue to 4. Otherwise, set x k+1 = x k and start over. 4. The second-level proposal is now y, effectively drawn from Then set β(x k , y) := min 1, With probability β(x k , y), accept and set x k+1 = y. Otherwise, set x k+1 = x k .
The transition kernel P 2 may be written where p 2 (x, y) = q 2 (x, y)β(x, y) x = y 0 x = y is called the second-level off-diagonal density, and r 2 (x) = 1 − p 2 (x, y)dy. As before, to demonstrate the convergence of MCMC, we will examine the transition kernel, which is the probability that the next draw x k+1 is in the set A given the previous elements of the chain x 0 , . . . , x k . For the two-level MCMC method, one desires that the high-resolution target distribution µ b RTE is an invariant distribution of the Markov chain {x k } k≥0 . By definition, this is true if for all measurable sets A, This means that elements of the Markov chain generated by Algorithm 3 give a good representation of the distribution µ b RTE . To show that µ b RTE is the invariant distribution, we first need the following two lemmas.

Lemma 2.
The second-level acceptance rate may be written where q 2 is defined as in Algorithm 2.
Proof. From Lemma 3, for x k = y, we have Plugging this in to our definition of β, we find The importance of this lemma is that it equates β, which depends on q 2 , to a form that is computable. Note that q 2 has a complicated integral form and thus is numerically challenging to compute. This form of the second-stage acceptance rate β has been calculated in [18]. Later, we will discuss the dependence of β on through the inverse problem for the RTE's posterior distribution and show that when the RTE and DE are close, β is close to one.
We next discuss a property of β that we will need to show the convergence of the two-level MCMC method.

Lemma 3.
The second-level off-diagonal density satisfies the detailed balance equation (24),

Proof. Considering the form of β in Equation
, so dividing them gives , using Lemma 1. Simplifying gives This lemma gives rise to the following theorem.
Theorem 4. The second-level distribution µ b RTE is the invariant distribution of the Markov chain {x n } with second-level transition kernel P 2 (x, A), i.e., where the transition kernel p 2 (x, y) is defined as P 2 (x, y) = p 2 (x, y) + r 2 (x)δ x (y) .

Proof. Consider
using Lemma 3 and using the delta function to perform the integration over x in the second term. Then, from the definition of p 2 , This theorem demonstrates that the high-resolution target distribution, for us µ b RTE , is the invariant distribution of P 2 . Thus, the two-level MCMC method gives a list of elements {x k } that can be regarded as samples from µ b RTE . To generate samples from the posterior based on the RTE, using the one-level MCMC method, for each new proposal, the forward map must be computed. To do this, one injects K boundary data φ k and computes J outgoing data l j , meaning that the RTE is solved K times and each time the solution is evaluated at J locations. As this is computationally expensive, one looks for a cheaper way to sample from the distribution. From Theorem 3, we have that in the diffusion limit, the posterior based on the RTE is close to the posterior based on the DE. The diffusion equation is significantly faster to solve because it is only over physical space. Therefore, in the strong scattering regime, we can save time on the computation by using the DE-assisted two-level MCMC method, in which the DE is used as a surrogate model to filter out bad draws. This saves computational effort spent on evaluating bad samples using the high-resolution model. The diffusion equation posterior can be used to reject bad samples, so that we have to evaluate the RTE posterior fewer times overall.

Properties of DE-Assisted Two-Level MCMC
In this section, we first present our result concerning the acceptance rate of the DE-assisted MCMC method and its dependence on , and then present the computational cost of our method compared to the one-level MCMC method.

Acceptance Rate of DE-Assisted Two-Level MCMC
There are many ways to improve MCMC. Different sampling methods such as the Gibbs sampler or independence samplers can make the algorithm more efficient. There are also delayed acceptance/rejection methods, and adaptive methods in which the low-resolution model is improved at each stage using results from the high-resolution model [23]. However, this is not the focus of the current paper. Our method is a delayed acceptance method. It relies on the two-level MCMC algorithm and the diffusion limit to improve the efficiency of sampling from the posterior distribution for the RTE.
We emphasize that the DE-assisted two-level MCMC algorithm in Algorithm 3 has two acceptance rates, α and β. The first-level acceptance rate α depends only on the posterior based on the DE, so it can have no dependence. The second-level acceptance rate β, depends on through the posterior based on the RTE. As → 0, the posterior based on the RTE becomes closer to the posterior based on the DE, so we expect that the samples that pass the selection criterion have a high chance of being accepted in the second step as well. We quantify this in the following proposition. Proposition 2. In the diffusion limit, the acceptance rate β is high: as long as the proposal is reasonably close to the measured data, where γ is the variance of the noise and α is any constant between 0 and 1.
Then, the likelihood function for the inverse diffusion equation is In other words, Proposition 3. Let Cost 1 denote the cost of obtaining k accepted samples using the one-level MCMC method, and Cost 2 denote the cost of obtaining k accepted samples using the DE-assisted two-level MCMC method. Let r 1 denote the acceptance rate of the one-level MCMC method. Let α denote the first-level acceptance rate of the DE-assisted two-level MCMC method, and β denote the second-level acceptance rate of the method.
• Cost 1 and Cost 2 are Considering C R C D , we have Thus, the cost saving of our method comes from β > r 1 . ), we see that the diffusion equation is significantly cheaper to compute than the radiative transfer equation, depending on the number of grid points in the velocity domain.

Proof. First
Next, we compute the number of MCMC iterations required to obtain k accepted samples for each method. For the one-level MCMC method, we simply have where r 1 is the acceptance rate and N 1 is the total number of iterations considered. For the two-level method, where N 2 is the total number of iterations, and r 2 is the overall acceptance rate. To be more specific, a single sample must go through both levels of evaluation in order to be accepted. Supposing that the acceptance rate at the first level is α, and the acceptance rate at the second level is β, then, the overall acceptance rate is r 2 = αβ, which gives us Using the above results, we can compute the cost of obtaining k accepted samples using the one-level MCMC method, Similarly, the cost of obtaining k accepted samples using the two-level method is Considering C D C R , the term containing C D may be dropped out of Equation (29). Then we have Then for β r 1 , the cost of obtaining k samples using the DE-assisted two-level MCMC is much less than the cost of obtaining k samples using the one-level MCMC.
In practice, β and r 1 depend on the sampling strategy used, the initial values of the MCMC parameters, and the step size, so we give only a theoretical asymptotic estimate of the cost comparison.

Numerics
We summarize our numerical evidence in this section. We will first show the convergence in the forward setting, and demonstrate that the solution to the RTE indeed converges to that of the DE. We then demonstrate, with two different media configurations, the convergence of the posterior distribution using the Metropolis-Hastings MCMC method. The 2-level MCMC result will also be shown.
Throughout this section, the DE is computed using the standard finite element method, and the RTE solver is a preconditioned GMRES-based method with upwinding in the physical domain, designed in [29].

Forward Model Convergence
We first review the convergence in the forward setup. As discussed in Theorem 2, in the zero limit of , the solution to the RTE becomes the solution to the DE, which drives the convergence of the albedo operator to the DtN map. We show this numerically below.
We first use a pseudo-2D example, with the y-direction assumed to be homogeneous. The RTE is then a degenerate case given by Here, the media is set as where χ [a,b] is the characteristic function. Figure 1 contains a plot of the solution ρ as a function of x for the given σ for the RTE with = 1, 2 −3 , and 2 −6 , as well as the solution for the DE. Numerically, we set dx = 0.05 and dv = 2π/16. As becomes smaller and smaller, the density ρ of the radiative transfer equation becomes closer and closer to the solution of the diffusion equation. We plot the error on the right panel of Figure 1 on a log-log scale as a function of , measured in L 2 (dx). The values of the error are shown in Table 1.   We furthermore plot the solution f as a function on the phase space. As shown in Theorem 2, in the diffusion limit, f loses its velocity dependence, and the solution becomes flat in v. This is seen numerically as well in Figure 2, which shows f (x, v) for different .
Similar convergence is numerically evaluated in two dimensions where Ω = [0, 1] 2 and σ is chosen to be where x = (x, y) ∈ [0, 1] 2 , h = 10 and r = 0.4. χ is the characteristic function, and the center c = (0.5, 0.5) is given. It is hard to visualize a 3D object, and we only plot the difference in ρ between the radiative transfer and diffusion solutions for different in Figure 3. We observe that as → 0, the error decreases, as plotted in Figure 4.

Reconstruction Example 1
In this example, we test the inverse problem in Ω = [0, 1] 2 ⊂ R 2 with the true media where r = 0.4 and h = 10, and c = (0.5, 0.5). It represents a circle with radius r and height h in the middle of the unit square. Since the media configuration is uniquely determined by the two parameters r and h, the reconstructed posterior distribution then is a distribution function on the (r, h) domain. The convergence and the accuracy of the method is independent of dimension, but we need to use an example that can be visualized to the readers. Besides, in most application problems the bio-tissues to be reconstructed have two or three components that have very different optical properties, and the model used here is a realistic representation of the physical scenario. To compute the forward RTE, we use the GMRES method. We set dx = 0.05 and dv = 2π/16. The light sources are placed on the left boundary (x = 0) for all discrete grids in y-space. This means 20 experiments are being looked at. For all these experiments, the sensors are located at all grid points on the boundaries. For each MCMC sample (r, h), to evaluate µ b RTE (r, h), 20 forward RTE solvers are run with the media configuration (r, h) with the boundary condition being Kronecker delta functions at each light source.
We briefly mention that to visualize the data, we do apply the kernel density estimator that smooths out the solution with a Gaussian kernel.
We first consider the solution to this problem in the one-level MCMC framework. The prior distribution is taken to be uniform in r from 0 to 0.5, and uniform in h from 8 to 12, and the variance of the noise is taken to be 10 −4 . With merely 10 3 MCMC steps, one can already distinguish the distribution functions. As seen in Figure 5, the posterior distribution using the RTE model with = 1 is much more smeared out than the one with = 2 −6 , and the posterior distribution using the RTE model with = 2 −6 is significantly closer to that using the DE model.
To quantize the difference between these three distributions, we compute the Hellinger distances, as documented in Table 2, which numerically confirms Theorem 3.
We then study the two-level MCMC method with the same parameters. The Gaussian-kernel smoothed posterior distributions are shown in Figure 6.
We examined the acceptance rate β: the number of accepted particles on the second level versus the number of accepted particle on the first level. As suggested in Proposition 2, smaller should give a higher acceptance rate, and this is truly seen in numerics. The results are shown in Table 3.
with RTE as the forward solver using = 1, = 2 −3 and = 2 −6 respectively. Table 3. Acceptance rate β for Example 1. We also expect for small , distributions computed using the one-level MCMC should be similar to the ones computed using DE-assisted method. To compare them, we compute the Hellinger distance between the distribution based on the RTE computed using one-level MCMC for = 2 −6 , in Figure 5 and the distribution based on the RTE computed using the DE-assisted MCMC for = 2 −6 , shown in Figure 6. We obtain demonstrating that the distributions based on the methods are similar.
Finally, we report the computational cost savings of using the DE-assisted two-level MCMC method compared to the one-level MCMC method. The values of β and r 1 are documented in Table 4. From the table, we see that using the DE-assisted two-level MCMC method saves us roughly 20% of the RTE evaluations, for each value of . We emphasize that β and r 1 highly depend on the sampling strategy used, the initial values of the MCMC parameters, the step size and how the prior distribution looks. If the prior distribution is centered far away from the maximum likelihood point, a large amount of sample points are needed to give a fair presentation of the posterior distribution function, and the behavior of the underlying equation would be crucial in computational saving.

Reconstruction Example 2
In this example, we take a more complicated media σ: where x = (x, y) ∈ [0, 1] 2 and c i are given, and χ i are characteristic functions. The list of height h i and radii r i are parameters to be reconstructed, and thus we have ten unknown parameters. Numerically, we set dx = 0.05, dv = 2π/16, and we place the light sources at the left boundary as in Example 1. Again, we take 10 3 MCMC steps. As before, to compute the forward RTE, we use the GMRES method. The light sources are placed on the left boundary (x = 0) for all discrete grids in y-space. This means that 20 experiments are being looked at. For all these experiments, the sensors are located at all grid points on the boundaries. For each MCMC sample (r, h), to evaluate µ b RTE (r, h), 20 forward RTE solvers are run with the media configuration (r, h) with the boundary condition being Kronecker delta functions at each light source.
The true parameters are set as The true σ is depicted in Figure 7. To test the one-level MCMC result, we feed the algorithm an initial guess that could be 40% away from the true value: h guess i = (0.6 + 0.4 · rand) · h true i , and similarly for each r i . The prior is set as a uniform distribution, and the variance of the noise is again taken to be 10 −4 . In Figure 8, we plot the marginal posterior distribution of (r 1 , r 4 ) of the DE, and RTE with = 1, and = 2 −6 . The posterior distribution of RTE with small is strikingly closer to the DE model, while the RTE model with big gives a very different result. In Table 5 we document the means and variances of the marginal distributions for r 1 and r 4 .   We again compute the Hellinger distance between the two marginal distributions for (r 1 , r 4 ) of the posterior distributions with respect to , and it is documented in Table 6. The results numerically confirm Theorem 3. Table 6. Hellinger distance for the marginal distributions for r 1 and r 4 : d Hell (µ b DE (σ), µ b RTE (σ)) with µ b RTE (σ) computed with RTE as the forward solver using = 1, = 2 −3 and = 2 −6 respectively. As done in Example 1, we document the acceptance rate. As seen clearly in Table 7, the acceptance rate increases as shrinks to zero.  As before, we expect that the small distributions computed using the one-level MCMC case and using our method should be similar. To compare them, we compute the Hellinger distance between the distribution based on the RTE computed using one-level MCMC for = 2 −6 , in Figure 8 and the distribution based on the RTE computed using the DE-assisted MCMC for = 2 −6 , shown in Figure 9. We obtain d Hell µ b RTE (σ) 1-level , µ b RTE (σ) DE-assisted = 0.1357, again demonstrating that the distributions based on the methods are similar. Finally, we report the computational cost savings of using the DE-assisted two-level MCMC method compared to the one-level MCMC method for Example 2. The values of β and r 1 for Example 2 are documented in Table 8. From the table, we see that using the DE-assisted two-level MCMC method saves us roughly 12% of the RTE evaluations, for both values of that we considered. We again emphasize that β and r 1 highly depend on the sampling strategy used, the initial values of the MCMC parameters, and the step size.

Conclusions
In this paper, we solve the inverse problem for the RTE using Bayesian inference, which gives us a posterior distribution for the quantity of interest. Accessing concrete information about this distribution using MCMC can be difficult, because the RTE is fairly expensive to solve. However, in the strong scattering regime with the Knudsen number going to zero, the posterior distribution based on the RTE converges to the posterior distribution based on the DE. With this knowledge, we employ a two-level MCMC technique, in which the posterior based on the DE is used to create good samples for the posterior based on the RTE. In this way, we save time by rejecting poor samples using the posterior based on the DE distribution, which is fast to compute. This reduces the number of times we must solve the inverse problem for the RTE, which improves the efficiency of the computation. We also prove that the second-level acceptance rate of this method is close to one in the diffusion limit, meaning that samples accepted at the first level have a high chance of being accepted at the second level, if is small.