Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime

For the inverse problem in physical models, one measures the solution and infers the model parameters from the collected data. Oftentimes, these data are inadequate and render the inverse problem ill-posed. We study this ill-posedness in the context of optical imaging, a medical imaging technique that uses light to probe (bio-)tissue structure. Depending on the intensity of the light, the forward problem can be described by different types of equations. High-energy light scatters very little, and one uses the radiative transfer equation (RTE) as the model; low-energy light scatters frequently, so the diffusion equation (DE) serves as a good approximation. A multiscale approximation links the hyperbolic-type RTE with the parabolic-type DE. The inverse problems for the two equations have a multiscale passage as well, so one expects that as the energy of the photons diminishes, the inverse problem changes from well- to ill-posed. We study this stability deterioration using Bayesian inference. In particular, we use the Kullback–Leibler divergence between the prior distribution and the posterior distribution based on the RTE to prove that the information gain from the measurement vanishes as the energy of the photons decreases, so that the inverse problem is ill-posed in the diffusive regime. In the linearized setting, we also show that the mean square error of the posterior distribution increases as we approach the diffusive regime.


Introduction
Optical tomography is a well-defined inverse problem. In the lab, laser beams with high-energy photons are injected into bio-tissues to detect the interior optical properties [1]. This helps identify unhealthy bio-tissues for treatment. Mathematically, this amounts to reconstructing the optical coefficients, such as the scattering and the absorption coefficients, in the radiative transfer equation (RTE), a model equation that describes the propagation of photon particles [2]. In its simplest form, the unknown f(t, x, v) describes the number of photon particles at time t ∈ R+, location x ∈ Ω ⊆ R^d, traveling with velocity v ∈ S^{d−1}; the dimension of the problem is d = 2, 3. Since v is the travel direction of the particles, it belongs to the unit sphere. The two terms in the equation represent the transport and the scattering effects, respectively. The transport term characterizes the free streaming ẋ = v, while the scattering term Q[f] describes the way the photon particles interact with the media. When the temperature is fixed, Q is a linear operator, whereas if the laser beam heats up the tissue, Q reflects the nonlinear dependence on the temperature. This is the term that encodes the optical properties of the media. In the steady state, the ∂_t f term is dropped, and the equation balances the transport term against the scattering term. The equation is well-posed when equipped with a Dirichlet-type boundary condition on Γ− [3–6], where Γ± collect the coordinates on the physical boundary ∂Ω with the velocity pointing into or out of the domain, respectively. In practice, the laser light is shone into the domain, meaning that the incoming data φ are prescribed. One then detects the number of photons scattered out of the domain by measuring f|_{Γ+}. We term this map the albedo operator A_Q, where the subindex Q reflects the influence of Q. Therefore, optical tomography amounts to reconstructing the coefficients in Q using the information in the operator A_Q.
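For concreteness, the objects described above can be summarized as follows. This is one standard presentation written as a sketch, since Equations (1)–(3) are not reproduced here and the paper's exact scaling of Q may differ:

```latex
% One standard presentation of the steady-state RTE, boundary sets, and
% albedo operator (a sketch; the paper's exact scaling of Q may differ).
\begin{aligned}
  & v\cdot\nabla_x f(x,v) = Q[f](x,v), && (x,v)\in \Omega\times S^{d-1},\\
  & f\big|_{\Gamma_-} = \phi, && \Gamma_\pm = \{(x,v)\in\partial\Omega\times S^{d-1} :\ \pm\, v\cdot n(x) > 0\},\\
  & \mathcal{A}_Q:\ \phi \longmapsto f\big|_{\Gamma_+}, &&
\end{aligned}
```

with n(x) the unit outer normal at x ∈ ∂Ω.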
The general well-posedness theory of such an inverse problem was addressed in the pioneering papers [7,8]. Results on stability were established in [9–11]; see [12] for a review.
One key parameter in Equation (1) is ε, the Knudsen number. It describes the regime the system is in. By definition, the Knudsen number is the ratio of the mean free path to the domain length. Physically, if a photon has low energy (visible or near-infrared light), it travels only a short distance before being scattered, and the mean free path is short compared to the domain length, leading to a small ε. When this happens, one typically observes a diffused-light phenomenon, and the received images are blurred. This situation is termed the diffusion effect, and in this regime the RTE, either linear or nonlinear, can be asymptotically approximated by a diffusion equation that characterizes macroscopic quantities such as the density ρ(x) = ∫_{S^{d−1}} f(x, v) dv. Depending on the form of Q, the limiting equation for ρ is adjusted accordingly. Details regarding the diffusion limit can be found in [13,14].
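As a sketch of this limit in the linear, temperature-fixed case, suppose Q takes the classical form under the diffusive scaling (an assumption for illustration; the paper's scaling of (1) may differ), εv·∇_x f = σ_s(⟨f⟩ − f) − ε²σ_a f, with ⟨f⟩ the angular average. A formal expansion f = ρ + εf_1 + ⋯ then yields, at leading order,

```latex
% Formal diffusion limit under the assumed scaling
%   eps v . grad_x f = sigma_s ( <f> - f ) - eps^2 sigma_a f :
\nabla_x\cdot\!\Big(\frac{1}{d\,\sigma_s(x)}\,\nabla_x \rho\Big) \;-\; \sigma_a(x)\,\rho \;=\; 0
\quad\text{in } \Omega,
\qquad \rho(x)=\int_{S^{d-1}} f(x,v)\,\mathrm{d}v,
```

where the factor 1/d comes from averaging v ⊗ v over the unit sphere.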
Intuitively, as one decreases the photon energy, the received picture loses its crispness, and the reconstruction becomes more unstable: a perturbation observed in the measurement is enlarged in the reconstructed coefficients. This phenomenon has been numerically observed in [1,15,16] and proved rigorously in [17], in which the authors investigate the stability deterioration of the inverse problem as ε → 0. However, all these results depend heavily on a theoretical framework in which it is assumed that one has access to the full map A_Q. This amounts to sending in all kinds of incoming data φ and taking measurements over the entire Γ+.
These theoretical results are helpful in building the foundation for understanding the stability deterioration, but the setup is infeasible numerically or practically. In the lab, only a finite number of incoming configurations can be taken, and the detectors can measure outgoing photons at only a finite number of locations. Therefore, new theory needs to be developed to account for this realistic situation [18,19]. To put the problem in a mathematical framework, we denote by {φ_j}_{j=1}^J the incoming data and by {x_k}_{k=1}^K the locations of the detectors; the measurements are then the outgoing intensities at these locations. Equivalently, denoting by f_j the solution to (1) with the boundary condition in (2) being φ_j, the data are the values of f_j on Γ+ at the detector locations. With this finite number of data points, to find the coefficients in Q, one can either adopt PDE-constrained minimization algorithms [20–22] or employ Bayesian inference [23–25]. While this finite setting greatly affects the well-posedness argument [26–28], the physical intuition still holds true in the sense that the reconstruction is expected to become more and more unstable as ε diminishes.
In this paper, we quantify the stability deterioration in the Bayesian inference framework with a finite number of data points. To quantify "stability", we propose two measures: one is a global measure that evaluates the information gain by comparing the posterior and prior distributions, and the other is a local measure that characterizes the "flatness" of the posterior distribution around the MAP (maximum a posteriori) point. More particularly, at the global level, we measure the difference between the posterior and prior distributions using the KL (Kullback–Leibler) divergence. Since the posterior distribution of the coefficients takes both the prior knowledge and the measured data into account, this divergence essentially characterizes by how much the data drive the final guess away from the prior guess. At the local level, we estimate the Hessian of the posterior distribution around the MAP point, which essentially describes the uncertainty level of the optimizer. For both measures, we analyze their dependence on ε, and we reveal the stability relation between the two regimes, the transport regime and the diffusive regime, in a more rigorous way. We present our theory through the lens of both linear and nonlinear RTEs, but we shall mention that other multiscale kinetic models may exhibit such a relation in the inverse setting; see, for instance, [29–31].
The rest of the paper is organized as follows. In the next section, we recall the formulation of the Bayesian inference problem and demonstrate a general relation for the KL divergence between the prior and posterior distributions, through which we address the stability deterioration from the global point of view. In Section 3, we consider the inverse problem for the linear radiative transfer equation, and we show that the KL divergence is of order ε, which diminishes as we approach the diffusive regime. We extend the investigation to the nonlinear RTE in Section 4 and obtain similar results. In Section 5, we focus on the local viewpoint by estimating the second derivative of the parameter-to-measurement map, and we show that it decreases at the order of ε, indicating that the posterior distribution becomes flatter near the diffusive regime and is therefore less sensitive to the measurement data, rendering the inverse problem more unstable. We summarize our theoretical results and provide some numerical evidence in Section 6. A final conclusion is drawn in Section 7.
All the discussion in these sections serves the aim of the paper: to demonstrate, in the Bayesian inference framework, the stability deterioration of optical imaging when the impinging light uses low-energy photons.

Bayesian Formulation Basics
Bayesian inference is one of the most popular numerical methods for inferring unknown parameters. In this section, we give a quick overview of definitions to be used in this paper.
To start, we define the parameter-to-measurement map G : B_σ → R^q and denote by η the measurement error, so that d = G(σ) + η, where the measurement d ∈ R^q, the measurement error η ∈ R^q, and σ ∈ B ⊂ B_σ, the admissible set in a pre-defined Banach space B_σ. Note that B_σ can be a function space, specified in (10), and σ is a function in this space. Throughout the paper, we assume that η is an additive noise generated by a Gaussian distribution N(0, Γ_η), where Γ_η is a q × q matrix.
In Bayesian inference, one needs to prepare a prior distribution for σ. Denote by F the σ-algebra of B and by µ_0 the prior probability measure on B. According to Bayes' rule, the posterior distribution µ_post = µ(σ|d) is given through its Radon–Nikodym derivative with respect to dµ_0:

dµ_post/dµ_0 (σ) = (1/Z) exp(−B(σ)),

where Z is the normalization constant

Z = ∫_B exp(−B(σ)) dµ_0(σ),

and the mismatch B(σ) is weighted by Γ_η:

B(σ) = ½ |Γ_η^{−1/2} (G(σ) − d)|².

Suppose µ_1 and µ_2 are two distinct probability measures, with µ_1 absolutely continuous with respect to µ_2; then the KL divergence measures the distance between them:

D_KL(µ_1 ‖ µ_2) = ∫ log(dµ_1/dµ_2) dµ_1.

When this definition is used in Bayesian inference to quantify the relative gain through the measurement process, one obtains the relative entropy, and in the particular setting of (4), we have the following proposition.
Proposition 1. Assume σ ∈ B; then, the KL divergence between µ_post and µ_0 satisfies

D_KL(µ_post ‖ µ_0) ≤ C sup_{σ,σ′ ∈ B} |G(σ) − G(σ′)|,   (5)

for some positive constant C independent of B.
Proof. Since µ_0 and µ_post are mutually absolutely continuous, according to (4), the KL divergence can be bounded by the variation of B(σ) over B, where we use the Lipschitz continuity of e^x on a bounded interval. The constant C depends on the size of d and the boundedness of G. Since σ ∈ B, according to (3), ‖G(σ)‖_∞ < C_B, making C < ∞. The inequality (5) then follows by plugging in the expression of B(σ). In view of (5), given that Γ_η^{−1} and G(σ) + G(σ′) − 2d are at least bounded, the difference between µ_post and µ_0 is controlled by the difference between G(σ) and G(σ′). This means that if G is a slowly varying map over the whole admissible set B, then µ_post only slightly differs from µ_0, indicating that the information gain is small. In the following sections, we justify this property for the RTE in both linear and nonlinear settings in the ε → 0 regime.
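To make Proposition 1 concrete, the following toy computation (our illustration, not part of the paper's analysis; the scalar forward maps here are hypothetical stand-ins for G) approximates D_KL(µ_post ‖ µ_0) by quadrature for a standard-normal prior and Gaussian noise, and shows that a slowly varying forward map yields a negligible information gain:

```python
# Numerical illustration of Proposition 1 on a toy 1D problem (not the RTE):
# if the forward map G varies slowly over the admissible set, the posterior
# stays close to the prior and the KL divergence (information gain) is small.
import numpy as np

def kl_post_prior(G, sigma_true=0.5, noise_std=0.1):
    """D_KL(mu_post || mu_0) with a standard-normal prior on a scalar sigma."""
    grid = np.linspace(-10.0, 10.0, 40001)
    dx = grid[1] - grid[0]
    prior = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
    d = G(np.array([sigma_true]))[0]              # noiseless synthetic datum
    likelihood = np.exp(-(d - G(grid))**2 / (2 * noise_std**2))
    post = prior * likelihood
    post /= post.sum() * dx                       # normalize (the constant Z)
    mask = post > 1e-300                          # avoid log(0) where post underflows
    return float(np.sum(post[mask] * np.log(post[mask] / prior[mask])) * dx)

kl_slow = kl_post_prior(lambda s: 0.01 * s)   # nearly flat forward map
kl_fast = kl_post_prior(lambda s: 2.0 * s)    # strongly varying forward map
print(kl_slow, kl_fast)
```

With the slowly varying map the posterior is nearly identical to the prior, while the strongly varying map concentrates the posterior and produces a KL divergence larger by several orders of magnitude.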

Global View Example 1: Linear Radiative Transfer Equation
The first example we consider is the linear radiative transfer equation (6), where σ_a is the absorption coefficient and σ_s is the scattering coefficient in the collision operator. Both σ_s and σ_a are seen as functions supported on Ω. Γ− and Γ+ denote the incoming and outgoing boundaries, respectively; incoming boundary conditions are prescribed on Γ−.
In the following, we first establish the parameter-to-measurement map G. Then, according to Proposition 1, the core of the quantification lies in analyzing how fast G varies with respect to σ, for which we derive its Fréchet derivative and estimate its dependence on ε.

First Derivative of the Parameter-to-Solution Map
Let σ(x) = (σ_s(x), σ_a(x)) be the shorthand notation, x ∈ Ω, and assume that only a finite series of incoming data {φ_j}_{j=1}^J is generated in the experiments and that the outgoing data are collected only at K discrete boundary locations {x_k}_{k=1}^K; G is then defined in (8) as the collection of these measurements. We also define the parameter-to-solution map S, and G and S are related through composition with the measurement operator. This relation can be understood more precisely component-wise: writing G = {G_{jk}}_{j=1,...,J, k=1,...,K} and S = {S_{jk}}_{j=1,...,J, k=1,...,K}, each G_{jk} evaluates the outgoing data of S_{jk} at x_k. Consequently, finding the derivative of G in the σ̇ direction amounts to finding the derivative of the map S, where the subscript in f denotes the parameter used in obtaining the solution f, and σ̇ is an admissible variation of σ defined below. To abbreviate the notation, we denote by w the derivative of the parameter-to-solution map, defined in (9), and omit the subscripts j, k when the calculations are valid for all j and k.
We now specify the admissible set of σ, denoted B_σ and given in (10). Then, from [32] (see Proposition 3.1), we have that for σ ∈ B_σ, ‖G(σ)‖_{L∞(Ω)} ≤ C. We call a parameter σ̇ ∈ B_σ an admissible variation of σ if the perturbed parameter σ + tσ̇ ∈ B_σ for sufficiently small t.
Hereafter, assume that the boundary condition φ ∈ L ∞ (Γ − ). We cite the following two results from [4]. The first one indicates that G from (8) is well defined, and the second one provides an estimate of w defined in (9).

Proposition 2. Denote by A the transport operator in (6). For σ ∈ B_σ and admissible variation σ̇, w is the unique solution of an equation of the form Aw = source terms driven by σ̇ and f, where f ∈ F is the solution to (6) with parameter σ. Moreover, for ε = 1, there holds a uniform estimate on w.

Dependence on Knudsen Number
We discuss the dependence of the first derivative of the parameter-to-measurement map G on the Knudsen number ε and use it to build the asymptotic connection in the KL divergence between µ_0 and µ_post. The proofs are carried out by asymptotic analysis. First, we recall the diffusion limit of the RTE.
Theorem 1. As ε → 0, the solution to (6) with boundary condition (12) converges to its diffusion limit, where ξ(x) ∈ C²(∂Ω) and ρ_f solves the limiting diffusion equation. The proof is very similar to that in [32] (see the Appendix therein), but we still include the details here, as they will be used in proving the next proposition.
Proof. We decompose the solution as in (14), with f_0, f_1, and f_2 chosen appropriately. In order for f_2 to be well defined, the standard elliptic estimate gives the global boundedness of ρ, ∂_{x_i} ρ, and ∂_{x_i x_j} ρ. Since L^{−1} is a bounded operator on Null(L)^⊥, f_2 is uniformly bounded. Since L^{−1} and ∇_x commute, a similar analysis shows that ∂_{x_i} f_2 is uniformly bounded. Plugging the expansion (14) into (6), we obtain the equation for the remainder f_r; estimating f_r and using the boundedness of ∇_x ρ_f, we conclude the theorem.

Remark 1.
In our case, we suppress the boundary layer by proposing a special boundary condition for f as defined in (12). This result can be extended to more general boundary conditions, in which case boundary layer analysis is inevitable; see [13] for details.
The proposition below demonstrates the dependence of the first derivative of G on ε.

Proposition 3.
For every σ ∈ B_σ and admissible variation σ̇, the first derivative of G is of order ε.

Proof. Considering (11) with perturbation σ̇ := (σ̇_s, σ̇_a), we expand w and f as in (15), where we have used f_0 = ρ_f. To get an evolution equation for w_0, we look at the O(ε) level; combining (16) and (17) and taking the average in v leads to (18). From the zero boundary condition of w, we also have (19). Substituting (15) into (11) with boundary condition w_r|_{Γ−} = −w_1|_{Γ−}, and again using the maximum principle, ‖w_r‖_{L∞(Ω×S^{d−1})} ≤ O(1/ε). Here, the boundedness of w_0 and ∂_{x_i} w_0 again follows from the elliptic estimate for (18). Recalling the definition of the first derivative of the forward map, the first term vanishes due to the zero boundary condition (19), which concludes the proof.
Theorem 2. For the linear RTE problem, the KL divergence between µ_post and µ_0 vanishes as ε → 0, i.e., D_KL(µ_post ‖ µ_0) = O(ε).

Proof. From (5), the divergence is controlled by the variation of G over the admissible set; then, from Proposition 3 and the boundedness of G(σ), the result is immediate.

Global View Example 2: Nonlinear Radiative Transfer Equation
In the same vein, we extend our investigation to the nonlinear RTE in this section. Here, ε is the Knudsen number, defined as the ratio of the mean free path to a characteristic length of the problem [33]. The boundary conditions for f are prescribed on the incoming boundary Γ−.
In the following, we first recall the diffusion limit of (21); we then analyze the first derivative of the parameter-to-measurement map in the inverse setting and investigate its dependence on ε so as to build a connection between µ_post and µ_0.

First Derivative of the Parameter-to-Solution Map
For the nonlinear RTE problem, the inverse problem is again set up to recover the scattering coefficient σ by measuring the outgoing intensity flux; this defines the parameter-to-measurement map. More particularly, assume that only a finite series of incoming data {φ_j}_{j=1}^J is generated in the experiments and injected into the tissue, and the outgoing data are collected only at K discrete boundary locations {x_k}_{k=1}^K. We also define the parameter-to-solution maps S_1 and S_2, through which the forward map G is written. Consequently, finding the derivative of G amounts to finding the derivative of the map S_1, where we use the same notation to index the parameter of f as we did for the linear RTE. Since f is nonlinearly coupled to T, we also need to find the derivative of the map S_2, which is defined similarly. Here, σ ∈ B_σ is defined in (10), and σ̇ is an admissible variation of σ.
To abbreviate the notation, we denote by w_f and w_T the derivatives of S_1 and S_2, and omit the subscripts j, k when the calculations are valid for all j and k.

Proposition 4.
Let φ ∈ L∞(Γ−). Then, the operators S_1 and S_2, defined in (22) and (23), are Lipschitz continuous mappings.

Proof. We follow Theorem 3.3 in [4] and extend the proof to the nonlinear RTE. Let σ ∈ B_σ and σ̇ be admissible, and denote by (f, T) and (f̂, T̂) the corresponding solutions of (21). For sufficiently small t, we write the perturbed solutions; it is then straightforward to show that (f̂, T̂) satisfies (27). From Theorem 3.1 in [33], for T_B ∈ H^{1/2}(∂Ω) ∩ L∞(∂Ω) and γ such that ‖T_B‖_{L∞(∂Ω)} ≤ γ, there exists a unique solution (f̂, T̂) to (27) that satisfies (28), thanks to the maximum principle. As an immediate consequence, we have the following control of the first derivative of the parameter-to-solution map.

Proposition 5.
For σ ∈ B_σ and admissible variation σ̇, w_f and w_T are the unique solutions of the system (29), where (f, T) and (f̂, T̂) satisfy (21) with parameters σ and σ + tσ̇, respectively. Moreover, the bounds (30) hold, where C_T > 0 and C_f is a constant depending on C_T.
Proof. From the definition of w f and w T , (29) comes directly from (27). Then, the bounds (30) can be concluded from (28).

Dependence on Knudsen Number
To analyze the dependence of G on ε, we first recall the asymptotic result in the forward setting: let (f, T) solve (21); then, as ε → 0, the pair converges to (T_0(x)^4, T_0(x)), with T_0(x) solving the nonlinear diffusion equation (35), where C_d is a constant depending on the dimension, and ζ(x), determined from φ(x, v) and T_B(x) by solving a Milne problem, supplies the boundary condition.
This result is rigorously proved in [33], and we only sketch the main steps here.
Proof. Consider the asymptotic expansion in ε in the bulk area. At the leading order, O(ε^0), we obtain (32), which implies that f_0 is uniform in v. The next order, O(ε), leads to a pair of equations; taking the average in v of the first equation and subtracting it from the second one, we obtain (34). Using (32) in (34) implies (35), where C_d depends on the dimension.
To determine the boundary condition for T_0, we need to take into account the boundary-layer effect. More precisely, let Ω_0 be the interior of the domain, so that Ω \ Ω_0 is the boundary region. As before, we let y = (x̄ − x) · n/ε be the stretched variable, where x̄ ∈ ∂Ω and n is the unit outer normal direction at x̄; then, y ∈ [0, ∞). For x ∈ Ω \ Ω_0, let f_BL(y, v) = f(x, v) and T_BL(y) = T(x); they satisfy the Milne problem. Then, solving the Milne problem, the boundary condition for (35) is given by ζ(x). With this theorem in mind, we delve into the dependence of the derivatives of G on ε in the following proposition.

Proposition 6.
For every j, k, σ ∈ B_σ, and admissible variation σ̇, the first derivative of the forward map G_jk is of order ε.

Proof. The proof is carried out by asymptotic analysis. We write the expansions (36) and (37). Plugging (36) into (21), we have, as before, a cascade of equations whose solution readily gives f_0, f_1, and T_0. The remaining f_2 and T_2 are obtained by solving the rest of the system. Next, we plug both (36) and (37) into (29); at leading order, O(1), we obtain (42) and (43). At the next order, O(ε²), taking the average in v of (44) and adding the result gives the equation (46) for w_{T,0}. Here, the boundary condition for w_{T,0} is again determined by the boundary-layer asymptotic analysis, so we omit the details. Therefore, one solves (46) for w_{T,0} first and then uses (42) and (43) to get w_{f,0} and w_{f,1}. The remaining w_{f,2} and w_{T,2} can be obtained by solving the rest of the system, which gives the boundedness of w_{f,2}. Using the expansion for w_f in the form (24), we conclude the proposition.
The main theorem for the nonlinear case is now in order: for the nonlinear RTE problem, the KL divergence between µ_post and µ_0 also vanishes, at order ε, as ε → 0. With the result from Proposition 6, the proof is identical to that of Theorem 2.

Local Behavior around the MAP Point
The KL divergence between the prior and the posterior distributions is a global quantity: it characterizes the information gained from the measured data over the whole distribution. We are also concerned about the local behavior of the posterior distribution, especially around the maximum a posteriori (MAP) point, denoted by σ*. Suppose the posterior distribution around the MAP point is rather "flat"; then the probability is essentially unchanged in a fairly large area around σ*, meaning that all the configurations in this flat area can be approximately taken as the optimal point, and the reconstruction is insensitive to the data, demonstrating the instability.
This behavior can be characterized well if the problem is linear. Suppose G(σ) = Gσ; then, assuming the prior distribution is a Gaussian centered at σ_0 with covariance C_0, the posterior distribution is a Gaussian uniquely determined by its mean σ_post and covariance C_post:

C_post = (G^T Γ_η^{−1} G + C_0^{−1})^{−1},   σ_post = C_post (G^T Γ_η^{−1} d + C_0^{−1} σ_0).

The "flatness" of a Gaussian is characterized by its covariance matrix. Indeed, a quick derivation shows that

∫_B ‖σ_post − σ‖² dµ_post(σ) = tr(C_post).

Therefore, the less informative the forward map is, the smaller G is, and the bigger tr(C_post) gets, indicating a higher mean-square error. Geometrically, it means the Gaussian is flatter.
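The linear-Gaussian computation above can be checked numerically; the sketch below (a toy example with a hypothetical random matrix G, not the RTE map) scales G by a factor eps, mimicking the O(ε) derivative estimates, and watches tr(C_post) climb toward the prior trace tr(C_0):

```python
# Linear-Gaussian posterior for d = G sigma + eta (toy example):
# C_post = (G^T Gamma_eta^{-1} G + C_0^{-1})^{-1}.  As the forward map is
# scaled by eps (mimicking the O(eps) derivative of the RTE map),
# tr(C_post) increases toward tr(C_0): the posterior flattens.
import numpy as np

def posterior_trace(G, Gamma_eta, C0):
    prec = G.T @ np.linalg.inv(Gamma_eta) @ G + np.linalg.inv(C0)
    return np.trace(np.linalg.inv(prec))

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 3))        # hypothetical 6 measurements, 3 parameters
Gamma_eta = 0.01 * np.eye(6)           # noise covariance
C0 = np.eye(3)                         # prior covariance, tr(C0) = 3

traces = [posterior_trace(eps * G, Gamma_eta, C0) for eps in (1.0, 0.1, 0.01)]
print(traces)
```

The printed traces increase monotonically toward tr(C_0) = 3 as eps shrinks, reproducing the growth of the mean-square error described above.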
We would like to understand this local behavior around the MAP point. However, the forward map we have is nonlinear, so the argument above only serves as guidance. To start, we denote by A(σ) the negative logarithm of the posterior density. Then, the convexity of the posterior distribution is uniquely determined by the Hessian of A. For that, we quote the following: let σ* be admissible and σ̇ be an admissible variation; the Hessian of the posterior distribution is then expressed in terms of the forward map G.

Proof.
We begin by expanding out the expression of A; the remainder term has no G dependence. We then expand G around σ* for small t, where σ = σ* + tσ̇. Plugging this into the expression of A and collecting the second power of t concludes the proof.
This formula holds true for all admissible σ* and is thus valid at the MAP point as well. According to this proposition, to show the "flatness" of the distribution at the MAP point, we essentially need to show the smallness of both the first and the second derivatives of G_i at σ* for all i. All derivations below justify this statement for both the linear and nonlinear RTE.

Linear RTE
We begin with the linear RTE. The following proposition, due to [4], characterizes the second derivative of the parameter-to-solution map S, whose dependence on ε we will then compute.
Proposition 7. For any admissible σ* and admissible variations σ_1 and σ_2, H is the unique solution of the following equation, where f ∈ F is the solution to (6) with parameter σ*, and w^{(1,2)} are the solutions to (11) with parameters σ* + tσ_{1,2}, respectively. Moreover, a uniform bound on H holds.

Proof. Considering the second derivative of G in Equation (24), all f_{i,j} satisfy the RTE with different σ, as shown below, with the same incoming boundary condition g. Combining these equations, and using the fact that C is linear in σ, then dividing by t² and taking the limit as t → 0, gives the equation for H. The boundedness is seen from Theorem 3.7 in [4].
We next determine the dependence of the forward map's second derivative on ε.

Proposition 9. For every admissible σ* and admissible variation σ̇, the second derivative of the forward map G is of order ε.

Proof. We expand H as in the linear first-derivative analysis and use the equations for Lw^{(1)}_1 and Lw^{(2)}_1. Integrating over velocity, we obtain an equation for the leading-order term H_0, which is in closed form, since the w^{(i)}_1 are written in terms of the w^{(i)}_0. Upon averaging over velocity, we obtain the second derivative of the operator G with the [σ̇, σ̇] perturbation. The contribution from the H_0 term becomes zero due to its trivial boundary condition, and H is bounded from [4].
Directly from Propositions 3, 7 and 9, we have the following corollary: near the MAP point, both the first and second derivatives of the forward map, and hence the Hessian of the posterior distribution, are of order ε, so the posterior flattens as ε → 0.

Nonlinear RTE
We repeat the analysis for the nonlinear RTE, for the case when G has been linearized. Using similar notations, we let H_f and H_T denote the second derivatives, and we have the following proposition regarding their boundedness.

Proposition 10.
For σ* ∈ B_σ and admissible variations σ_1 and σ_2, (H_f, H_T) is the unique solution to the following system, where the w terms are the first-derivative solutions from Proposition 5; moreover, the corresponding bounds hold with C_T > 0 and C_f a constant depending on C_T.

Proof.
We let H_f and H_T be the second-order difference quotients of f and T, where each (f^{(i,j)}, T^{(i,j)}) satisfies the nonlinear RTE with a variety of σ shown below, and ρ = (1/|S^{d−1}|) ∫_{S^{d−1}} f dv. We begin by computing various combinations of T^{(i)}. First, we expand

(T^{(1,2)})^4 = (T^{(1)})^4 + (T^{(1)})^3 (4t w^{(2)}_T + 4t² H_T) + (T^{(1)})² (6t² (w^{(2)}_T)²) + O(t³),

and similarly for the directions σ_1 and σ_2 separately, (T^{(1)})^4 = (T + t w^{(1)}_T + ⋯)^4. Combining these, to derive the equations for H_f and H_T, we take (49), subtract (50) and (51), and add (52), using (53). To show the boundedness, we note that H_f and H_T solve the nonlinear RTE with a modified right-hand side containing only bounded terms, so we again use Theorem 3.1 from [33] to obtain the boundedness.

Proposition 11. For every j, k and functions σ*, σ̇ in the admissible set, the second derivative of the forward map is of order ε.

Proof. We start with arbitrary directions σ_1 and σ_2 and later choose the [σ̇, σ̇] direction. To find the ε-dependence of H_f and H_T, we expand w_T = w_{T,0} + ε w_{T,1} + ε² w_{T,2} + ⋯, and similarly for the other unknowns; subtracting the resulting equations order by order, as in the proof of Proposition 6, we obtain the claimed O(ε) bound.

In summary, the posterior distribution becomes more and more "flat" and thus carries no useful information when the diffusion effect is strong. Both measures are taken for the linear RTE and are extended, for the first time in the literature, to treat the nonlinear RTE as well.
Although the paper is completely theoretical, it gives guidance on conducting numerical experiments. The discussion in this paper essentially suggests that the problem is intrinsically ill-conditioned in the diffusion regime, and thus rules out all possible algorithms that could otherwise deliver a good numerical recovery.
There are some natural follow-up questions to be answered in the near future. One task is to clearly identify the number and the quality of the measurements needed for a unique reconstruction. Suppose the reconstructed function is represented by an N-dimensional vector; then, how many experiments and measurements exactly are needed for a unique reconstruction, and where should the sources and detectors be located? The answer to this question for EIT (electrical impedance tomography) was given in [26], but one needs to tune the process to fit the situation for the radiative transfer equation used in the current paper. Another question concerns numerical stability. While it is true that the information gain deteriorates as ε becomes small, there might be numerical approaches that gradually incorporate information from high-energy to low-energy photons. One possible strategy is to use a large number of experimental measurements conducted with low-energy photons to build a rough initial guess as a "warm start" before adding in information obtained at a high energy level. The warm start given by the low-energy results serves as a good initial guess toward final convergence. The process is parallel to Bayesian tempering, which has seen some good developments in the past few years [34,35], and to the inverse scattering problem that combines information from multiple frequencies [36]. Another approach is to use a hybrid imaging modality, such as photoacoustic tomography, where acoustic measurements generated by the photoacoustic effect are used to infer the optical properties of the media. In this case, an improved stability is anticipated [37,38].
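The warm-start strategy can be sketched on a toy scalar linear-Gaussian model (our illustration only; the sensitivities `eps` stand in for the O(ε) derivative of the true parameter-to-measurement map): low-energy data are assimilated first as a rough prior, and higher-energy data then sharpen the posterior:

```python
# Sequential assimilation mimicking the "warm start" idea on a toy scalar
# model d = eps * sigma + noise (not the RTE): low-energy (small eps, nearly
# flat map) data are used first, then high-energy data refine the estimate.
import numpy as np

def gaussian_update(mean, var, sensitivity, datum, noise_var):
    """Conjugate Gaussian update for the scalar model d = sensitivity*sigma + noise."""
    precision = 1.0 / var + sensitivity**2 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + sensitivity * datum / noise_var)
    return new_mean, new_var

sigma_true, noise_var = 0.5, 0.01
mean, var = 0.0, 1.0                 # prior
variances = []
for eps in (0.01, 0.1, 0.5, 1.0):    # low-energy first, then high-energy
    datum = eps * sigma_true         # noiseless synthetic datum
    mean, var = gaussian_update(mean, var, eps, datum, noise_var)
    variances.append(var)
print(mean, variances)
```

The small-eps stages barely reduce the variance (a rough warm start), while the eps ~ 1 stage collapses it, and the final mean sits near the truth; the same mechanism motivates tempering from high- to low-energy data in the full problem.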

Numerical Evidence
Here, we conduct two numerical tests to further support our theoretical findings. In particular, we consider the linear radiative transfer equation studied above. The data are prepared as follows. For the i-th experiment, we prescribe incoming data φ_i and measure the output intensity f at location j on Γ+, which is denoted by d_{i,j}. Here, φ_i and the detector locations are chosen randomly, and N_I = 44 experiments are conducted with N_J = 264 receivers for each experiment. These data pairs are kept fixed for the various choices of the Knudsen number ε for a fair comparison.
In the first test, we assume σ_s(x) takes a form parameterized by a scalar σ, where N(0, Σ) is a two-dimensional normal distribution with mean zero and a fixed covariance matrix Σ, and σ is the parameter to be inferred. We assume the ground truth σ = 0.5, and the ground-truth medium is plotted on the left of Figure 1. The inverse problem aims at inferring this ground truth from the prescribed data pairs. To illustrate the stability of the inverse problem, we show how, for different choices of σ away from the truth, the measurements differ from the true measurements. More precisely, we denote by d_p the perturbation in the measurement (58), where d_{i,j} is the measurement with a fixed σ for the i-th experiment and j-th receiver, and d^true_{i,j} is the measurement with the true σ = 0.5. In the right panel of Figure 1, we plot the difference in measurement data (58) versus the perturbation in σ for three different ε. It is clear that with smaller ε, the differences in measurement become more indistinguishable, which means that a smaller perturbation in the measurement can lead to a larger deviation in the reconstruction. This is consistent with our theory on the stability deterioration with decreasing ε. In the second test, we consider a different form of σ_s(x), in which the to-be-reconstructed parameters are σ_1 and σ_2. We choose σ_1 = 0.5 and σ_2 = 0.3 as the ground truth and display the medium in Figure 2. Similar to the previous case, we plot the difference in measurement with respect to σ_1 ∈ (0, 1) and σ_2 ∈ (0, 0.6). It is again evident that for smaller ε, d_p becomes flatter, leading to a deterioration in stability, as shown in Figure 2. In addition, we identify the data that are above 0.05 × max_{i,j}{d_{i,j}} in Figure 3. Here, the horizontal axis is the index of the experiment, and the vertical axis is the index of the receiver. It is seen that a larger ε gives a sparse dataset, whereas a small ε gives a dense dataset, meaning that all receivers receive some information about the source.
This indicates that the data are more spread out for smaller ε, which makes the inverse problem harder to solve. (Figure 3: visualization of measured data whose intensity values are above 0.05 × max_{i,j}{d_{i,j}}.)
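The flattening of d_p can also be reproduced in closed form on a caricature of the experiment above: a purely scattering two-stream slab (a 1D model with only two directions µ = ±1, solvable by hand; this is our illustrative stand-in, not the paper's 2D setup):

```python
# Closed-form caricature of the first numerical test: a purely scattering
# two-stream slab (mu = +/-1 on [0,1]) under the diffusive scaling
#   eps * mu * df/dx = sigma_s * (<f> - f),  f_+(0) = 1, f_-(1) = 0.
# Solving the 2x2 system by hand gives the transmitted intensity
#   f_+(1) = 2*eps / (2*eps + sigma_s),
# so the measurement perturbation |d(sigma) - d(sigma_true)| flattens as eps -> 0.
import numpy as np

def transmitted(sigma_s, eps):
    """Outgoing measurement f_+(1) of the two-stream slab (exact solution)."""
    return 2.0 * eps / (2.0 * eps + sigma_s)

sigma_true = 0.5
sigmas = np.linspace(0.3, 1.0, 8)      # candidate scattering coefficients
for eps in (1.0, 0.1, 0.01):
    d_p = np.abs(transmitted(sigmas, eps) - transmitted(sigma_true, eps))
    print(f"eps = {eps:5.2f}, max measurement perturbation = {d_p.max():.4f}")
```

The maximal perturbation shrinks with eps, mirroring the flattening of the d_p curves in Figures 1 and 2; note the sensitivity ∂d/∂σ_s = −2ε/(2ε + σ_s)² is O(ε), consistent with Proposition 3.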

Conclusions
In this paper, we examine the stability deterioration for the multiscale inverse radiative transfer equation (RTE) in the Bayesian framework. Even though the instability on the continuous level was discussed in the literature [31] and the multiscale convergence was shown in [16,17,32], the instability representation in the Bayesian framework was open. The current paper constitutes the first result that fills the gap. The results presented suggest that one should not use Bayesian inference for conducting optical tomography when the photon energy is low. The numerical algorithm, without fine tuning, would carry very limited use in reconstructing the ground-truth media.