Next Article in Journal
A Numerical Approach for Developing a Bearing-Bypass Design Criterion for Sizing Bolted Joints
Next Article in Special Issue
DIAS: A Data-Informed Active Subspace Regularization Framework for Inverse Problems
Previous Article in Journal
A Theoretical Survey of the UV–Visible Spectra of Axially and Peripherally Substituted Boron Subphthalocyanines
Previous Article in Special Issue
Direct Sampling for Recovering Sound Soft Scatterers from Point Source Measurements
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime

1
Department of Mathematics, University of Wisconsin-Madison, Madison, WI 53706, USA
2
Department of Mathematics, Diablo Valley College, Pleasant Hill, CA 94523, USA
3
School of Mathematics, University of Minnesota, Minneapolis, MN 55455, USA
*
Author to whom correspondence should be addressed.
Computation 2022, 10(2), 15; https://doi.org/10.3390/computation10020015
Submission received: 14 December 2021 / Revised: 7 January 2022 / Accepted: 13 January 2022 / Published: 19 January 2022
(This article belongs to the Special Issue Inverse Problems with Partial Data)

Abstract

:
For the inverse problem in physical models, one measures the solution and infers the model parameters using information from the collected data. Oftentimes, these data are inadequate and render the inverse problem ill-posed. We study the ill-posedness in the context of optical imaging, which is a medical imaging technique that uses light to probe (bio-)tissue structure. Depending on the intensity of the light, the forward problem can be described by different types of equations. High-energy light scatters very little, and one uses the radiative transfer equation (RTE) as the model; low-energy light scatters frequently, so the diffusion equation (DE) suffices to be a good approximation. A multiscale approximation links the hyperbolic-type RTE with the parabolic-type DE. The inverse problems for the two equations have a multiscale passage as well, so one expects that as the energy of the photons diminishes, the inverse problem changes from well- to ill-posed. We study this stability deterioration using the Bayesian inference. In particular, we use the Kullback–Leibler divergence between the prior distribution and the posterior distribution based on the RTE to prove that the information gain from the measurement vanishes as the energy of the photons decreases, so that the inverse problem is ill-posed in the diffusive regime. In the linearized setting, we also show that the mean square error of the posterior distribution increases as we approach the diffusive regime.

1. Introduction

Optical tomography is a well-defined inverse problem. In the lab, laser beams with high-energy photons are injected into bio-tissues to detect the interior optical property [1]. This helps identify unhealthy bio-tissues for treatment. Mathematically, this amounts to reconstructing the optical coefficients, such as the scattering and the absorption coefficients in the radiative transfer equation (RTE), which is a model equation that describes the propagation of photon particles [2]. The equation, in its simplest form, reads
t f + 1 ϵ v · x f = 1 ϵ 2 Q [ f ] .
The unknown f ( t , x , v ) describes the number of photon particles at time t R + , location x Ω R d , and traveling with velocity v S d 1 . The dimension of the problem is d = 2 , 3 . v is the travel direction of the particles and therefore belongs to a unit ball. The two terms in the equation represent the transport and the scattering effect, respectively. The transport term characterizes x ˙ = v , while the scattering term Q [ f ] describes the way the photon particles interact with the media. When the temperature is fixed, this operator is a linear operator, whereas if the laser beam heats up the tissue, Q reflects the nonlinear dependence on the temperature. This is the term that encodes the optical property of the media.
In the steady state, the t f term is dropped, and the equation balances the transport term and the scattering term. The equation is well-posed if equipped with Dirichlet-type boundary condition [3,4,5,6]:
f | Γ = ϕ ,
where Γ ± collects the coordinates on the physical boundary Ω with the velocity either pointing in or out of the domain:
Γ ± = { ( x , v ) : x Ω , ± v · n x > 0 } = x Ω Γ ± ( x ) , where Γ ± ( x ) = { v : ± v · n x > 0 } .
In practice, the laser light is shined into the domain, meaning that ϕ is prescribed. Then, one detects the number of photons scattered out of the domain by measuring f | Γ + . We term this operator the albedo operator:
A Q : f | Γ f | Γ + ,
where the subindex Q reflects the influence of Q . Therefore, optical tomography amounts to reconstructing the coefficients in Q using the information in the operator A Q . The general well-posedness theory of such an inverse problem was addressed in the pioneering papers [7,8]. The result on the stability was established in [9,10,11], and see [12] for a review.
One key parameter in Equation (1) is ϵ , which is the Knudsen number. It describes the regime the system is in. By definition, the Knudsen number is the ratio of the mean-free path and the domain length. Physically, if a photon has low energy (visible or near-infrared light), it travels only a short distance before being scattered, and the mean-free path is short compared to the domain length, leading to a small ϵ . When this happens, one typically observes a diffused light phenomenon, and the received images are blurred. The situation is termed the diffusion effect, and in this regime, RTE, either linear or nonlinear, can be asymptotically approximated by a diffusive equation that characterizes macroscopic quantities such as the density ρ ( x ) = f ( x , v ) d v . Depending on the form of Q , this limiting equation for ρ is accordingly adjusted. Details regarding the diffusion limit can be found in [13,14].
Intuitively, as one decreases the photon energy, the received picture loses its crisp, and the reconstruction becomes more unstable. The perturbation observed in the measurement is enlarged in the reconstructed coefficients. This phenomenon has been been numerically observed in [1,15,16] and proved rigorously in [17], in which the authors investigate the stability deterioration of the inverse problem as ϵ 0 . However, all these results heavily depend on the theoretical framework where it is assumed that one has the access to the full map A . This amounts to sending in all kinds of incoming data ϕ and taking the measurement over the entire Γ + .
These theoretical results are helpful in building the foundation for understanding the stability deterioration, but the setup is infeasible numerically or practically. In the lab, a finite number of incoming configurations can be taken, and the detectors can measure outgoing photons in a finite number of locations. Therefore, new theory needs to be developed to account for this real situation [18,19]. To put the problem in a mathematical framework, we denote { ϕ j } j = 1 J as the incoming data and { x k } k = 1 K as the location of the detectors; then, the measurements are
G = { G j k } j = 1 , k = 1 J , K , with G j k = v Γ + ( x k ) A Q [ ϕ j ] | v · n x k | d v .
Equivalently, denoting f j as the solution to (1) with the boundary condition in (2) being ϕ j , then:
G j k = v Γ + ( x k ) f j ( x k , v ) | v · n x k | d v .
With the finite amount of data points, to find the coefficients in Q , one can either adopt the PDE-constrained minimization algorithms [20,21,22] or employ the Bayesian inference [23,24,25]. While this finite setting greatly affects the well-posedness argument [26,27,28], the physical intuition still holds true in the sense that the reconstruction is expected to become more and more unstable as ϵ diminishes.
In this paper, we quantify the stability deterioration in the Bayesian inference framework with a finite number of data points. To quantify “stability”, we propose two measures: one is a global measure that evaluates the information gain by comparing the posterior and prior distributions, and the other is a local measure that characterizes the “flatness” of the posterior distribution around the MAP (maximum a posteriori) point. More particularly, at the global level, we measure the difference between the posterior and prior distributions using the KL (Kullback–Leibler) divergence. Since the posterior distribution of the coefficients takes both the prior knowledge and the measured data into account, this divergence essentially characterizes by how much the data are driving the final guess away from the prior guess. At the local level, we estimate the Hessian of the posterior distribution around the MAP point, which essentially describes the uncertainty level of the optimizer. For both measures, we analyze their dependence on ϵ , and we reveal the stability relation between two regimes, the transport regime and diffusive regime, in a more rigorous way. We will present our theory through the lens of both linear and nonlinear RTEs, but we shall mention that other multiscale kinetic models may have such a relation in the inverse setting; see for instance in [29,30,31].
The rest of the paper is organized as follows. In the next section, we recall some formulation in the Bayesian inference problem and demonstrate a general relation in the KL divergence between prior and posterior distribution, through which we will address the stability deterioration from the global point of view. In Section 3, we consider the inverse problem for the linear radiative transfer equation, and we show that the KL divergence is of order ϵ , which diminishes as we approach the diffusive regime. We extend the investigation to the nonlinear RTE in Section 4 and obtain similar results. In Section 5, we focus on the local viewpoint by estimating the second derivative of the parameter-to-measurement map, and we show that it decreases at the order of ϵ , indicating that the posterior distribution becomes flatter near the diffusive regime and therefore is less sensitive to the measurement data, rendering the inverse problem more unstable. We summarize our theoretical results and provide some numerical evidence in Section 6. A final conclusion of the paper is drawn in Section 7.
All the discussion in these sections finally achieve the following aim of the paper: to demonstrate, in the Bayesian inference framework, the stability deterioration of optical imaging when the impinging light uses low-energy photons.

2. Bayesian Formulation Basics

Bayesian inference is one of the most popular numerical methods for inferring unknown parameters. In this section, we give a quick overview of definitions to be used in this paper.
To start, we define the parameter to measurement map G : B σ R q and denote η the measurement error:
d = G ( σ ) + η ,
where the measurement d R q , the measurement error η R d , and σ B B σ , the admissible set in a pre-defined Banach space B σ :
B = { σ B σ : G ( σ ) L ( Ω ) < C B } .
Note that B σ can be a function space specified in (10), and σ is a function in this space. Throughout the paper, we assume that η is an additive noise generated by a Gaussian distribution N ( 0 , Γ η ) , and Γ η is a q × q dimensional matrix.
In Bayesian inference, one needs to prepare a prior distribution for σ . Denote F as the σ -algebra of B and μ 0 as the prior probability measure on B. According to the Bayes’ rule, the posterior distribution μ post = μ ( σ | d ) is given according to its Radon–Nikodym derivative with respect to d μ 0 :
d μ post d μ 0 1 Z exp 1 2 G ( σ ) d Γ η 2 ,
where Z is the normalization constant:
Z = B exp 1 2 G ( σ ) d Γ η 2 d μ 0 ( σ ) ,
and the mismatch is weighted by Γ η
G ( σ ) d Γ η 2 = ( G ( σ ) d ) T Γ η 1 ( G ( σ ) d ) .
Suppose μ 1 , 2 are two distinct probability measures; then, the KL divergence measures the distance between them:
D KL ( μ 1 | μ 2 ) = X log d μ 1 d μ 2 d μ 1 .
when this definition is used in Bayesian inference to quantify the relative gain through the measurement process, one defines the relative entropy, and in the particular setting of (4), we have the following proposition.
Proposition 1.
Assume σ B ; then, the KL divergence between μ post and μ 0 has the following estimate:
D K L ( μ 0 | μ post ) C B B ( G ( σ ) G ( σ ) ) Γ η 1 ( G ( σ ) + G ( σ ) 2 d ) d μ 0 ( σ ) d μ 0 ( σ )
for some positive constant C independent of B.
Proof. 
Since μ 0 and μ post are mutually absolutely continuous, according to (4), we have
D K L ( μ 0 | μ post ) = B log d μ 0 d μ post d μ 0 = B 1 2 G ( σ ) d Γ η 2 + log Z d μ 0 .
Denoting B ( σ ) = 1 2 G ( σ ) d Γ η 2 , we proceed with:
D K L ( μ 0 | μ post ) = B B ( σ ) + log B e B ( σ ) d μ 0 ( σ ) d μ 0 ( σ ) = B log e B ( σ ) + log B e B ( σ ) d μ 0 ( σ ) d μ 0 ( σ ) C B e B ( σ ) + B e B ( σ ) d μ 0 ( σ ) d μ 0 ( σ ) C B B | e B ( σ ) e B ( σ ) | d μ 0 ( σ ) d μ 0 ( σ ) C B B | B ( σ ) B ( σ ) | d μ 0 ( σ ) d μ 0 ( σ )
where we used the Lipschitz continuity of e x in a bounded interval. The constant C depends on the size of d and the boundedness of G . Since σ B , according to (3), G ( σ ) < C B , making C < . The inequality (5) holds if we plug in the expression of B ( σ ) to get:
B ( σ ) B ( σ ) = 1 2 [ G ( σ ) G ( σ ) ] Γ η 1 [ G ( σ ) + G ( σ ) 2 d ] .
In view of (5), given that Γ η 1 and G ( σ ) + G ( σ ) 2 d are at least bounded, the difference between μ post and μ 0 is controlled by the difference between G ( σ ) and G ( σ ) . This means if G is a slow-varying map over the whole admissible set B, then μ post only slightly differs from μ 0 , indicating that the information gain is small. In the following sections, we justify this property for RTE in both linear and nonlinear settings in the ϵ 0 regime.

3. Global View Example 1: Linear Radiative Transfer Equation

The first example we consider is the linear radiative transfer equation:
v · x f ( x , v ) = ϵ σ a ( x ) f ( x , v ) + 1 ϵ σ s ( x ) L f ( x , v ) , f | Γ = ϕ ,
where x Ω R d , v S d 1 , L f = 1 | S d 1 | S d 1 f d v f is the collision operator, σ a is the absorption coefficient, and σ s is the scattering coefficient. Both σ s and σ a here are seen as functions supported on Ω . The boundary Γ ± = { ( x , v ) Ω × S d 1 | ± v · n x > 0 } denotes the incoming ( Γ ) and outgoing ( Γ + ) boundaries respectively, and thus, incoming boundary conditions are considered here.
In the following, we first establish the the parameter-to-measurement map G . Then, according to Proposition 1, the core of the quantification lies in analyzing how fast G varies with respect to σ , for which we derive its Fréchet derivative and estimate its dependence on ϵ .

3.1. First Derivative of the Parameter-to-Solution Map

Let σ ( x ) = ( σ s ( x ) , σ a ( x ) ) be the short hand notation, x Ω , and assume that only a finite series of incoming data { ϕ j } j = 1 J are generated in the experiments, and the outgoing data are collected only at K discrete boundary locations { x k } k = 1 K ; then, G is defined as:
G : σ ( x ) Γ + ( x k ) f ( x k , v ; ϕ j ) v · n d v j = 1 , J , k = 1 , , K .
We also define the parameter-to-solution map S as:
S : σ ( x ) { f ( x k , v ; ϕ j ) } j = 1 , J , k = 1 , , K ,
then G and S are related via:
G ( σ ) = Γ + S ( σ ) v · n d v .
The above relation can be understood more precisely component-wise. We write G = { G j k } j = 1 , J , k = 1 , , K and S = { S j k } j = 1 , J , k = 1 , , K , then
G j k ( σ ) = Γ + ( x k ) f ( x k , v ; ϕ j ) v · n d v = Γ + ( x k ) S j k ( σ ) v · n d v .
Consequently, finding the derivative of G in the σ ^ direction amounts to finding the derivative of the map S ,
G j k ( σ ) σ ^ = lim t 0 1 t Γ + ( x k ) ( f σ + t σ ^ ( x k , v ; ϕ j ) f σ ( x k , v ; ϕ j ) ) v · n d v = Γ + ( x k ) S j k ( σ ) σ ^ v · n d v ,
where the subscript in f denotes the parameter used in obtaining the solution f, and σ ^ is an admissible variation of σ defined below. To abbreviate the notation, we denote
f = f σ ( x , v ; ϕ ) , f ^ = f σ + t σ ^ ( x , v ; ϕ ) , w = S ( σ ) σ ^
and omit the subscripts j, k when the calculations are valid for all j and k.
We now specify the set of σ as follows:
B σ = { σ C 1 ( Ω ) : σ > 0 , max { σ L ( Ω ) , σ 1 L ( Ω ) , ( σ 1 ) L ( Ω ) } < C 1 } .
Then, from [32] (see Proposition 3.1), we have that for σ B σ , G ( σ ) L ( Ω ) C . We then call a parameter σ ^ B σ an admissible variation of σ if the perturbed parameter σ + t σ ^ B σ for sufficiently small t.
Hereafter, assume that the boundary condition ϕ L ( Γ ) . We cite the following two results from [4]. The first one indicates that G from (8) is well defined, and the second one provides an estimate of w defined in (9).
Lemma 1.
Let ϕ L ( Γ ) ; then, the operator S is a Lipschitz continuous mapping from B σ to F , where F is defined as
F = { f : f L ( Ω × S d 1 ) , v · f L ( Ω × S d 1 ) } .
Proposition 2.
Denote
A f = v · x f , C σ f = ϵ σ a ( x ) f 1 ϵ σ s ( x ) L f .
For σ B σ and admissible variation σ ^ , w is the unique solution of the following equation:
A w + C σ w = C σ ^ f , w | Γ = 0 ,
where f F is the solution to (6) with parameter σ. Moreover, for ϵ = 1 , there holds
w L ( Ω ) C σ ^ L ( Ω ) ϕ L ( Γ ) .

3.2. Dependence on Knudsen Number

We discuss the dependence of the first derivative of the parameter-to-measurement map G on the Knudsen number ϵ and use it to build the asymptotic connection in the KL divergence between μ 0 and μ post . The proofs are carried out by asymptotic analysis. First, we recall the diffusion limit of the RTE.
Theorem 1.
Suppose f ( x , v ) satisfies Equation (6) with a smooth boundary condition
ϕ ( x , v ) = ξ ( x ) ϵ σ s v · x ρ f ( x ) ,
where ξ ( x ) C 2 ( Ω ) , and ρ f solves
C d · 1 σ s ρ f + σ a ρ f = 0 , ρ f ( x ) | Ω = ξ ( x ) .
Then, ϕ L ( Γ ) , and f ( x , v ) ρ f ( x ) as ϵ 0 . Moreover, we have
f ρ f L ( Ω × S d 1 ) C B σ ϵ .
The proof is very similar to that in [32] (see the Appendix therein), but we still include the details here as it will be used in proving the next proposition.
Proof. 
We decompose the solution as
f = f 0 + ϵ f 1 + ϵ 2 f 2 + ϵ 3 f r ,
where f 0 , f 1 , and f 2 are chosen as
f 0 = ρ f , f 1 = 1 σ s v · x ρ f , f 2 = L 1 v σ s · x v σ s · x ρ f + σ a σ s ρ f .
In order for f 2 to be well defined above, we need v σ s · x v σ s · x ρ f + σ a σ s ρ f to be in the range of L , which then leads to
v σ s · x v σ s · x ρ f v = σ a σ s ρ f .
Equipping it with the boundary condition ρ f | Ω = ξ ( x ) determines ρ f . Note that ξ ( x ) C 2 ( Ω ) , the standard elliptic estimate gives the global boundedness of ρ , x i ρ and x i x j ρ . Since L 1 is a bounded operator on Null ( L ) , f 2 is uniformly bounded. Since L 1 and x commute, similar analysis can be applied to show that x i f 2 is uniformly bounded. Plugging the expansion (14) into (6), we obtain the equation for f r :
v · x f r + ϵ σ a f r 1 ϵ σ s L f r = σ a f 2 1 ϵ v · x f 2 1 ϵ σ a f 1
with boundary condition f r | Γ = ϵ L 1 v σ s · x v σ s · x ρ f ( x ) + σ a σ s ρ f ( x ) . Consequently, f r satisfies RTE with O ( ϵ ) boundary condition and O ( 1 / ϵ ) source; therefore, from the maximum principle, f r L ( Ω × S d 1 ) O ( 1 / ϵ ) . Recalling (14), we have
f = ρ f ϵ 1 σ s v · x ρ f + O ( ϵ 2 ) .
Using the boundedness of x ρ f , we conclude the theorem. □
Remark 1.
In our case, we suppress the boundary layer by proposing a special boundary condition for f as defined in (12). This result can be extended to more general boundary conditions, in which case boundary layer analysis is inevitable, see [13] for details.
The proposition below demonstrates the dependence of the first derivative of G on ϵ .
Proposition 3.
For every σ B σ and admissible variation σ ^ , there holds
G ( σ ) σ ^ = O ( ϵ ) .
Proof. 
Considering (11) with perturbation σ ^ : = ( σ ^ s , σ ^ a ) , we expand w and f as
w = w 0 + ϵ w 1 + ϵ 2 w r , f = f 0 + ϵ f 1 + ϵ 2 f 2 + ϵ 3 f r .
Then, at O ( 1 / ϵ ) , we have σ s L w 0 = σ ^ s L f 0 = 0 , which indicates that
w 0 ( x , v ) = w 0 ( x ) .
At O ( 1 ) , we have v · w 0 σ s L w 1 = σ ^ s L f 1 , then using the form of f 1 = 1 σ s v · f 0 , we obtain σ s L w 1 = v · w 0 σ ^ s σ s v · f 0 , and thus
w 1 = 1 σ s v · w 0 + σ ^ s σ s 2 v · ρ f ,
where we have used f 0 = ρ f . To get an evolution equation for w 0 , we look at the O ( ϵ ) level and obtain: v · x w 1 + σ a w 0 σ s L w r = σ ^ a f 0 + σ ^ s L f r . Substituting (16) and (17), and taking the average in v leads to
v · x v σ s · x w 0 v σ a w 0 = σ ^ a ρ f + v · x σ ^ s σ s 2 v · x ρ f v .
From the zero boundary condition of w, we also have
w 0 | Ω = 0 .
Substituting (15) into (11), we get
v · x w r + ϵ σ a w r σ s ϵ L w r = σ a 1 ϵ w 0 + σ ^ s σ s 2 v · x ρ f v σ s · x w 0 1 ϵ v · x σ ^ s σ s 2 v · x ρ f v σ s · x w 0 σ ^ a ϵ f + σ ^ s ϵ L ( f 2 + ϵ f r ) ,
with boundary condition w r | Γ = ϵ w 1 | Γ . Again from the maximum principle, w r L ( Ω × S d 1 ) O ( 1 / ϵ ) . Here, the boundedness of w 0 , x i w 0 are again from the elliptic estimate of (18).
Now, recalling the definition of the first derivative of the forward map, we have
G ( σ ) σ ^ = Γ + w ( x , v ; ϕ ) v · n d v = Γ + w 0 ( x ; ϕ ) + ϵ w 1 ( x , v ; ϕ ) + ϵ 2 w r ( x , v ; ϕ ) v · n d v = O ( ϵ ) ,
which concludes the proof. Here, the first term vanishes due to the zero boundary condition (19). □
Theorem 2.
For a linear RTE problem, the KL divergence between μ post and μ 0 vanishes as ϵ 0 , i.e.,
D K L ( μ post | μ 0 ) = O ( ϵ ) .
Proof. 
From (5), we have that
D K L ( μ 0 | μ post ) B B ( G ( σ ) G ( σ ) ) Γ η 1 ( G ( σ ) + G ( σ ) 2 d ) d μ 0 ( σ ) d μ 0 ( σ ) = B B 0 1 G ( σ + s ( σ σ ) ) ( σ σ ) Γ η 1 ( G ( σ ) + G ( σ ) 2 d ) d s d μ 0 ( σ ) d μ 0 ( σ ) .
Then, from Proposition 3 and boundedness of G ( σ ) , the result is immediate. □

4. Global View Example 2: Nonlinear Radiative Transfer Equation

Along the same vein, we extend our investigation to the nonlinear RTE in this section:
ϵ v · x f ( x , v ) = σ ( x ) T ( x ) 4 f ( x , v ) , ϵ 2 T ( x ) = σ ( x ) T ( x ) 4 ρ ( x ) , f | Γ = ϕ ( x , v ) , T | Ω = T B ( x ) .
Here, x Ω R d , v S d 1 , f ( x , v ) is again the radiation intensity and T ( x ) is the temperature. ρ = 1 | S d 1 | S d 1 f d v is the total intensity. ϵ is the Knudsen number, which is defined to be the ratio of the mean free path to a characteristic length of the problem [33]. The boundary conditions for f are on the incoming boundary Γ .
In the following, we first recall the diffusion limit of (21); then, we analyze the first derivative of the parameter to the measurement map in the inverse setting and investigate its dependence on ϵ so as to build a connection between μ post and μ 0 .

4.1. First Derivative of the Parameter-to-Solution Map

For the nonlinear RTE problem, the inverse problem is again set up to recover the scattering coefficient σ by measuring the outgoing intensity flux. Therefore, the parameter-to-measurement map is defined as
G : σ ( x ) Γ + v · n f d v .
More particularly, assume that only a finite series of incoming data, { ϕ j } j = 1 J , are generated in the experiments and injected into the tissue, and the outgoing data are collected only at K discrete boundary locations { x k } k = 1 K ; then
G : σ ( x ) Γ + ( x k ) f ( x k , v ; ϕ j , T B , j ) v · n d v j = 1 , J , k = 1 , , K .
We also define the parameter-to-solution maps S 1 and S 2
S 1 : σ ( x ) { f ( x k , v ; ϕ j , T B , j ) } j = 1 , J , k = 1 , , K ,
S 2 : σ ( x ) { T ( x k ; ϕ j , T B , j ) } j = 1 , J , k = 1 , , K ;
then, the forward map G is written
G j k ( σ ) = Γ + ( x k ) f ( x k , v ; ϕ j , T B , j ) v · n d v = Γ + ( x k ) S 1 j k ( σ ) v · n d v .
Consequently, finding the derivative of G amounts to finding the derivative of the map S 1 , namely,
G j k ( σ ) σ ^ = lim t 0 1 t Γ + ( x k ) ( f σ + t σ ^ ( x k , v ; ϕ j , T B , j ) f σ ( x k , v ; ϕ j , T B , j ) ) v · n d v = Γ + ( x k ) S 1 j k ( σ ) σ ^ v · n d v ,
where we use the same notation to index the parameter of f as what we did for the linear RTE. Since f is nonlinearly coupled to T, we also need to find the derivative of the map S 2 , which is defined similarly as
S 2 j k ( σ ) σ ^ = lim t 0 1 t [ T σ + t σ ^ ( x ; ϕ , T B ) T σ ( x ; ϕ , T B ) ] .
Here, σ B σ defined in (10) and σ ^ is an admissible variation of σ .
To abbreviate the notation, we denote
f = f σ ( x , v ; ϕ , T B ) , f ^ = f σ + t σ ^ ( x , v ; ϕ , T B ) , w f = S 1 ( σ ) σ ^ ,
T = T σ ( x ; ϕ , T B ) , T ^ = T σ + t σ ^ ( x ; ϕ , T B ) , w T = S 2 ( σ ) σ ^ ,
and omit the subscript j, k when the calculations are valid for all j and k.
Proposition 4.
Let ϕ L ( Γ ) . Then, the operators S 1 and S 2 , defined in (22) and (23), are Lipschitz continuous mappings from L ( Ω ) × L ( Ω ) to F f × F T , where
F f = { f : f L ( Ω × S d 1 ) , v · f L ( Ω × S d 1 ) } , a n d F T = { T : T L ( Ω ) } .
Proof. 
We follow Theorem 3.3 in [4] and extend the proof to the nonlinear RTE. Let σ B σ and σ ^ be admissible, and denote ( f , T ) , ( f ^ , T ^ ) be corresponding solutions of (21):
ϵ v · x f = σ ( T 4 f ) , ϵ T = σ ( T 4 ρ ) , f | Γ = ϕ ( x , v ) , T | Ω = T B ( x ) ,
and
ϵ v · x f ^ = ( σ + t σ ^ ) ( ( T ^ ) 4 f ^ ) . ϵ T ^ = ( σ + t σ ^ ) ( ( T ^ ) 4 ρ ^ ) , f ^ | Γ = ϕ ( x , v ) , T ^ | Ω = T B ( x ) .
For sufficiently small t, we write the perturbed solutions
f ^ = f + t f ˜ , T ^ = T + t T ˜ ;
then, it is straightforward to show that ( f ˜ , T ˜ ) satisfy
ϵ v · x f ˜ = σ [ 4 T 3 T ˜ t f ˜ ] + σ ^ ( T 4 f ) + O ( t 2 ) , ϵ 2 T ˜ = σ [ 4 T 3 T ˜ ρ ˜ ] + σ ^ ( T 4 ρ ) + O ( t 2 ) , f ˜ | Γ = 0 , T ˜ | Ω = 0 .
From Theorem 3.1 in [33], for T B H 1 / 2 ( Ω ) L ( Ω ) , let γ be that T B L ( Ω ) γ , there exists a unique solution ( f ˜ , T ˜ ) to (27) that satisfies the following thanks to the maximum principle:
T ˜ L ( Ω × S d 1 ) = 1 t ( T T ^ ) L ( Ω × S d 1 ) γ , f ˜ L ( Ω × S d 1 ) = 1 t ( f f ^ ) L ( Ω × S d 1 ) B ( γ ) .
As an immediate consequence, we have the following control of the first derivative of the parameter-to-solution map.
Proposition 5.
For σ B σ and admissible variation σ ^ , w f and w T are the unique solutions of the following system:
ϵ v · x w f = σ 4 T 3 w T w f + σ ^ ( T 4 f ) , ϵ 2 w T = σ ( 4 T 3 w T w ρ ) + σ ^ ( T 4 ρ ) , w f | Γ = 0 , w T | Ω = 0 ,
where ( f , T ) and ( f ^ , T ^ ) satisfy (21) with parameter σ and σ + t σ ^ , respectively. Moreover, there holds:
w f L ( Ω ) C f , w T L ( Ω ) C T ,
where C T > 0 and C f is a constant depending on C T .
Proof. 
From the definition of w f and w T , (29) comes directly from (27). Then, the bounds (30) can be concluded from (28). □

4.2. Dependence on Knudsen Number

To analyze the dependence of G on ϵ , we first recall the asymptotic result in the forward setting.
Theorem 3.
Let ( f ( x , v ) , T ( x ) ) solve (21); then, as ϵ 0 , the pair converges to ( T 0 ( x ) 4 , T 0 ( x ) ) , with T 0 ( x ) solving the nonlinear diffusion equation,
C d · 1 σ ( x ) T 0 ( x ) 4 + T 0 = 0 , T 0 | Ω = ζ ( x ) ,
where C d is a constant depending on dimension. ζ ( x ) is determined from ϕ ( x , v ) and T B ( x ) by solving a Milne problem.
This result is rigorously proved in [33], and we only sketch the main steps here.
Proof. 
Consider the asymptotic expansion in ϵ in the bulk area:
f = f 0 + ϵ f 1 + ϵ 2 f 2 + , and T = T 0 + ϵ 2 T 2 + .
Then, at the leading order O ( ϵ 0 ) , we have
T 0 4 = f 0 = ρ 0 ,
which implies that f 0 is uniform in v. The next order O ( ϵ ) leads to
f 1 = 1 σ v · f 0 .
At O ( ϵ 2 ) , we have
v · x f 1 = σ ( 4 T 0 3 T 2 f 2 ) , T 0 = σ ( 4 T 0 3 T 2 ρ 2 ) .
Taking the average in v for the first equation and subtracting it from the second one, we have
1 | S d 1 | S d 1 v · x 1 σ v · f 0 d v + T 0 = 0 .
Using (32) in (34) implies that
C d · 1 σ T 0 4 + T 0 = 0 ,
where C d depends on the dimension.
To determine the boundary condition for T 0 , we need to take into account the boundary layer effect. More precisely, let Ω 0 be the interior of the domain so that Ω Ω 0 is the boundary area. As before, we let y = x x ˜ ϵ · n be the stretching variable, where x ˜ Ω and n is the unit outer normal direction at x ˜ ; then, y [ 0 , ) . For x Ω Ω 0 , let f BL ( y , v ) = f ( x , v ) , T BL ( y ) = T ( x ) , and they satisfy the Milne problem:
y f BL ( y , v ) v · n = σ T BL ( y ) 4 f BL ( y , v ) y y T BL ( y ) = σ T BL ( y ) 4 ρ BL ( y ) f BL ( 0 , v ) = ϕ ( x ˜ , v ) T BL ( 0 ) = T B ( x ˜ ) .
Then, solving the Milne problem, the boundary condition for (35) is given by
T 0 ( x ) | Ω = T BL ( ) .
With this theorem in mind, we delve into the dependence of the derivatives of G on ϵ in the following proposition.
Proposition 6.
For every j , k , σ B σ , and admissible variation σ ^ , there holds:
G j k ( σ ) σ ^ = O ( ϵ ) .
Proof. 
The proof is carried out by asymptotic analysis. We write the expansions:
f = f 0 + ϵ f 1 + ϵ 2 f 2 , T = T 0 + ϵ 2 T 2
w f = w f , 0 + ϵ w f , 1 + ϵ 2 w f , 2 , w T = w T , 0 + ϵ 2 w T , 2 .
Plugging (36) into (21), we have as before:
f 0 = T 0 4 = ρ 0 ;
f 1 = v σ · ρ 0 ;
C d · 1 σ T 0 4 + T 0 = 0 , T 0 | Ω = ζ ( x ) ,
and solving these readily gives f 0 , f 1 , and T 0 . The remaining f 2 and T 2 are obtained by solving the rest of the system:
ϵ 3 v · f 2 = σ [ ( T 0 + ϵ 2 T 2 ) 4 4 ϵ 2 T 0 3 T 2 T 0 4 ] , ϵ 4 T 2 = σ [ ( T 0 + ϵ 2 T 2 ) 4 4 ϵ 2 T 0 3 T 2 T 0 4 ] .
Next, we plug both (36) and (37) into (29), and at leading order O ( 1 ) :
w f , 0 = w ρ , 0 = 4 T 0 3 w T , 0 .
At O ( ϵ ) , we equate
v · x w f , 0 = σ w f , 1 σ ^ f 1 ,
which implies
w f , 1 = 1 σ v · x w f , 0 + σ ^ σ 2 v · x f 0 .
Then, the next order O ( ϵ 2 ) is:
v · x w f , 1 = σ ( 12 T 0 T 2 2 w T , 0 + 4 T 0 2 w T , 2 w f , 2 ) + σ ^ ( 4 T 0 3 T 2 f 2 )
w T , 0 = σ ( 12 T 0 2 T 2 w T , 0 + 4 T 0 3 w T , 2 w ρ 2 ) + σ ^ ( 4 T 0 3 T 2 ρ 2 ) ,
taking the average in v of (44) and adding the result to gives the equation for w T , 0
C d · 1 σ ( 4 T 0 3 w T , 0 ) + w T , 0 = C d x · σ ^ σ 2 ρ 0 , w T , 0 | Ω = 0 .
Here, the boundary condition for w T , 0 is again determined by the boundary layer asymptotic analysis, so we omit the details.
Therefore, from the above, one solves (46) for w T , 0 first and then uses (42) and (43) to get w f , 0 and w f , 1 . The remaining w f , 2 and w T , 2 can be obtained by solving the rest of the system:
ϵ 3 v · w f , 2 = σ [ 4 ( T 0 + ϵ 2 T 2 ) 3 ( w T , 0 + ϵ 2 w T , 2 ) 4 ϵ 2 T 0 3 w T , 2 12 ϵ 2 T 0 2 T 2 w T , 0 4 T 0 3 w T , 0 ] + σ ^ [ ( T 0 + ϵ 2 T 2 ) 4 T 0 4 4 ϵ 2 T 0 3 T 2 ] , ϵ 4 w T , 2 = σ [ 4 ( T 0 + ϵ 2 T 2 ) 3 ( w T , 0 + ϵ 2 w T , 2 ) 4 ϵ 2 T 0 3 w T , 2 12 ϵ 2 T 0 2 T 2 w T , 0 4 T 0 3 w T , 0 ] + σ ^ [ ( T 0 + ϵ 2 T 2 ) 4 T 0 4 4 ϵ 2 T 0 3 T 2 ] .
This gives the boundedness of w f , 2 by 1 ϵ . Using the expansion for w f in the form (24) of G ( σ ) σ ^ , we have
G j k ( σ ) σ ^ = Γ + ( x k ) w f ( x k , v ; ϕ j , T B ) v · n d v = Γ + ( x k ) w f , 0 ( x k ; ϕ j , T B ) + ϵ w f , 1 ( x k ; ϕ j , T B ) + ϵ 2 w f , 2 ( x k ; ϕ j , T B ) v · n d v = O ( ϵ ) ,
concluding the proposition. □
The main theorem for the nonlinear case is now in order.
Theorem 4.
For the nonlinear RTE problem, the KL divergence between μ post and μ 0 vanishes as ϵ 0 , i.e.,
D K L ( μ post | μ 0 ) = O ( ϵ ) .
With the result from Proposition 6, the proof is identical to that of Theorem 2.

5. Local Behavior around the MAP Point

The KL divergence between the prior and the posterior distribution is a global quantity: it characterizes the information gain from the measured data in the whole distribution. We are also concerned about the local behavior of the posterior distribution, especially around the maximum a posteriori (MAP) point, which is denoted by σ . Suppose the posterior distribution around the MAP point is rather “flat”; then, the probability is unchanged in a fairly large area around σ , meaning that all the configuration in this flat area can be approximately taken as the optimal point, and the reconstruction is insensitive to data, demonstrating the instability.
This behavior can be characterized well if the problem is linear. Suppose G ( σ ) = G σ ; then, assuming the prior distribution is a Gaussian centered at σ 0 with covariance C 0 , the posterior distribution is uniquely determined by σ post and C post by:
σ post = C post 1 ( G Γ η 1 d + C 0 1 σ 0 ) , C post = ( G Γ η 1 G + C 0 1 ) 1 .
The “flatness” of a Gaussian is characterized by its covariance matrix. Indeed, with a quick derivation, one can show that:
B σ post σ 2 d μ post = tr ( C post ) .
Therefore, the less informative the forward map is, the smaller G is, and then the bigger tr ( C post ) gets, indicating the higher mean-square error. Geometrically, it means the Gaussian is flatter.
We would like to understand this local behavior around the MAP point. However, the forward map we have is nonlinear, so the argument above only serves as a guidance. To start, we denote:
μ post = 1 Z exp G ( σ ) d Γ η 2 / 2 μ 0 exp { A } .
Then, the convexity of the posterior distribution is uniquely determined by the hessian of A. For that, we quote:
Proposition 7.
Let σ be admissible and σ ^ be an admissible variation. The Hessian of the posterior distribution is expressed in terms of the forward map G as:
A ( σ ) [ σ ^ , σ ^ ] = 1 2 i G i ( σ ) [ σ ^ ] Γ η 1 G i ( σ ) [ σ ^ ] + G i ( σ ) Γ η 1 G i ( σ ) [ σ ^ , σ ^ ] d i Γ η 1 G i ( σ ) [ σ ^ , σ ^ ] .
Proof. 
We begin by expanding out G :
A ( σ ) = 1 2 i = 1 J K 2 d i Γ η 1 G i ( σ ) + G i ( σ ) Γ η 1 G i ( σ ) + remainders ,
where the remainder term has no G dependence. Expand G around σ for small t:
G i ( σ ) = G i ( σ ) + t G i ( σ ) [ σ ^ ] + 1 2 t 2 G i ( σ ) [ σ ^ , σ ^ ] + O ( t 3 ) ,
where σ = σ + t σ ^ . Plug this in the expression of A and collect the second power of t; then, we have:
A ( σ ) [ σ ^ , σ ^ ] = 1 2 i G i ( σ ) [ σ ^ ] Γ η 1 G i ( σ ) [ σ ^ ] + G i ( σ ) Γ η 1 G i ( σ ) [ σ ^ , σ ^ ] d i Γ η 1 G i ( σ ) [ σ ^ , σ ^ ] ,
concluding the proof. □
This formula holds true for all σ that is admissible and thus is valid at MAP as well. According to this proposition, to show the “flatness” of the distribution at MAP, we essentially need to show the smallness of both G i ( σ ) and G i ( σ ) for all i. All derivations below are to justify this statement for both the linear and nonlinear RTE.

5.1. Linear RTE

We begin with the linear RTE. The following proposition, due to [4], characterizes the second derivative of the parameter-to-solution map S , which we will compute its dependence on ϵ .
Proposition 8.
Denote H = S ( σ ) [ σ 1 , σ 2 ] , A f = v · x , and C σ f = ϵ σ a ( x ) f 1 ϵ σ s ( x ) L f . Then, for any admissible σ and admissible variations σ 1 , σ 2 , H is the unique solution of the following equation:
A H + C σ H = C σ 1 w ( 2 ) C σ 2 w ( 1 ) , H | Γ = 0 ,
where f F is the solution to (6) with parameter σ and w ( 1 , 2 ) are the solutions to (11) with parameters σ + t σ 1 , 2 , respectively. Moreover, there holds:
H L ( Ω ) C ( σ 1 L ( Ω ) + σ 2 L ( Ω ) ) .
Proof. 
Considering the second derivative of G in Equation (24), all f i , j satisfy the RTE with different σ as shown below, and the same incoming boundary condition g,
A f ( 1 , 2 ) + C σ + t σ 1 + t σ 2 f ( 1 , 2 ) = 0 A f ( 1 ) + C σ + t σ 1 f ( 1 ) = 0 A f ( 2 ) + C σ + t σ 2 f ( 2 ) = 0 A f + C σ f = 0 ,
and combining these equations gives
A ( f ( 1 , 2 ) f ( 1 ) f ( 2 ) + f ) + C σ ( f ( 1 , 2 ) f ( 1 ) f ( 2 ) + f ) + t C σ 1 f ( 1 , 2 ) t C σ 1 f ( 1 ) + t C σ 2 f ( 1 , 2 ) t C σ 2 f ( 2 ) = 0 .
Here, we have used the fact that C is linear in σ . Dividing by t 2 and taking the limit as t 0 then gives
A H + C σ H = C σ 1 w ( 2 ) C σ 2 w ( 1 ) , H | Γ = 0 .
The boundedness is seen from Theorem 3.7 in [4]. □
We next determine the dependence of the forward map’s second derivative in terms of ϵ .
Proposition 9.
For every j , k , admissible σ B σ , and admissible variation σ ^ , there holds:
G j k ( σ ) [ σ ^ , σ ^ ] = O ( ϵ ) .
Proof. 
Considering (48) with fixed ϕ j and boundary measurement location x k , we expand
H = H 0 + ϵ H 1 + ϵ 2 H 2 w ( 1 ) = w 0 ( 1 ) + ϵ w 1 ( 1 ) + ϵ 2 w 2 ( 1 ) w ( 2 ) = w 0 ( 2 ) + ϵ w 1 ( 2 ) + ϵ 2 w 2 ( 2 ) .
Using (48), at O ( 1 / ϵ ) , we obtain
σ s L H 0 = σ s , 1 L w 0 ( 2 ) + σ s , 2 L w 0 ( 1 ) .
From Proposition 2, w 0 ( 1 ) and w 0 ( 2 ) have no velocity dependence, which gives L H 0 = 0 ; thus, H 0 has no velocity dependence. Additionally, from the zero boundary condition of H, we have H 0 | Ω = 0 .
At O ( 1 ) ,
v · x H 0 σ s L H 1 = σ s , 1 L w 1 ( 2 ) + σ s , 2 L w 1 ( 1 ) ,
and thus
L H 1 = 1 σ s v · x H 0 σ s , 1 σ s L w 1 ( 2 ) σ s , 2 σ s L w 1 ( 1 ) .
Since L w 1 ( 1 ) = w 1 ( 1 ) and L w 1 ( 2 ) = w 1 ( 2 ) , we have
H 1 = 1 σ s v · x H 0 σ s , 1 σ s w 1 ( 2 ) σ s , 2 σ s w 1 ( 1 ) .
Finally at O ( ϵ ) , we have
v · x H 1 + σ a H 0 σ s L H 2 = σ a , 1 w 0 ( 2 ) + σ s , 1 L w 2 ( 2 ) σ a , 2 w 0 ( 1 ) + σ s , 2 L w 2 ( 1 ) .
Integrating over velocity, we obtain an equation for the leading order term H 0 , which is in closed form, since w 1 ( i ) are written in terms of w 0 ( i ) (see [4])
· 1 σ s H 0 + σ a H 0 = σ a , 1 w 0 ( 2 ) σ a , 2 w 0 ( 1 ) σ s , 1 σ s 1 | S d 1 | S d 1 w 1 ( 2 ) d v σ s , 2 σ s 1 | S d 1 | S d 1 w 1 ( 1 ) d v .
Upon averaging over velocity, we obtain the second derivative of the operator G with the [ σ ^ , σ ^ ] perturbation,
G j k ( σ ) [ σ ^ , σ ^ ] = Γ + ( x k ) H ( x , v ; ϕ j ) v · n d v = Γ + ( x k ) ( H 0 ( x ; ϕ j ) + ϵ H 1 ( x , v ; ϕ j ) + ϵ 2 H 2 ( x , v ; ϕ j ) ) v · n d v = ϵ Γ + ( x k ) ( H 1 ( x , v ; ϕ j ) + O ( ϵ 2 ) ) v · n d v = O ( ϵ ) .
The contribution from the H 0 term becomes zero due to its trivial boundary condition, and H is bounded from [4]. □
Directly from Propositions 3, 7 and 9, we have the following corollary.
Corollary 1.
For σ B σ and admissible variation σ ^ , the diagonal elements A ( σ ) [ σ ^ , σ ^ ] of the Hessian of the posterior distribution satisfies:
A ( σ ) [ σ ^ , σ ^ ] = O ( ϵ ) .

5.2. Nonlinear RTE

We repeat the analysis for the nonlinear RTE, for the case when G has been linearized. Using similar notatinos, we let
H f = S 1 ( σ ) [ σ 1 , σ 2 ] , H T = S 2 ( σ ) [ σ 1 , σ 2 ]
and have the following proposition regarding the boundedness of H f and H T .
Proposition 10.
For σ B σ and admissible variations σ 1 , σ 2 , ( H f , H T ) are the unique solution to the following system:
ϵ v · x H f = σ H f ( 4 T 3 H T + 12 T 2 w T ( 1 ) w T ( 2 ) ) + σ 1 ( 4 T 3 w T ( 2 ) ) + σ 2 ( 4 T 3 w T ( 1 ) ) σ 1 w f ( 2 ) σ 2 w f ( 1 ) ϵ 2 H T = σ H ρ + σ ( 4 T 3 H T + 12 T 2 w T ( 1 ) w T ( 2 ) + σ 1 ( 4 T 3 w T ( 2 ) ) + σ 2 ( 4 T 3 w T ( 2 ) ) σ 1 w ρ ( 2 ) σ 2 w ρ ( 1 ) H f | Γ = 0 , H T | Ω = 0 ,
where w f ( 1 , 2 ) denotes the first derivative of the parameter-to-solution map for f in the σ 1 , 2 direction, and similarly for w T ( 1 , 2 ) , and H ρ = 1 | S d 1 | S d 1 H f d v . Moreover, there holds:
H f L ( Ω ) C f , H T L ( Ω ) C T ,
where C T > 0 and C f depends on C T .
Proof. 
We let H f = 1 t 2 f ( 1 , 2 ) f ( 1 ) f ( 2 ) + f , and H T = 1 t 2 ( T ( 1 , 2 ) T ( 1 ) T ( 2 ) + T ) . Each ( f ( i , j ) , T ( i , j ) ) satisfies the nonlinear RTE with a variety of σ shown below:
v · x f ( 1 , 2 ) = 1 ϵ ( σ + t σ 1 + t σ 2 ) ( T ( 1 , 2 ) ) 4 f ( 1 , 2 ) , T ( 1 , 2 ) = 1 ϵ 2 ( σ + t σ 1 + t σ 2 ) ( T ( 1 , 2 ) ) 4 ρ ( 1 , 2 ) ;
v · x f ( 1 ) = 1 ϵ ( σ + t σ 1 ) ( T ( 1 ) ) 4 f ( 1 ) , T ( 1 ) = 1 ϵ 2 ( σ + t σ 1 ) ( T ( 1 ) ) 4 ρ ( 1 ) ;
v · x f ( 2 ) = 1 ϵ ( σ + t σ 2 ) ( T ( 2 ) ) 4 f ( 2 ) , T ( 2 ) = 1 ϵ 2 ( σ + t σ 2 ) ( T ( 2 ) ) 4 ρ ( 2 ) ;
v · x f = 1 ϵ σ T 4 f , T = 1 ϵ 2 σ T 4 ρ , ,
where ρ = 1 | S d 1 | S d 1 f d v .
We begin by computing various combinations of T ( i ) . First, we recall:
w T ( 1 ) = lim t 0 1 t ( T ( 1 ) T ) , w T ( 2 ) = lim t 0 1 t ( T ( 2 ) T ) ,
and expand
( T ( 1 , 2 ) ) 4 = ( T ( 1 ) ) 4 + ( T ( 1 ) ) 3 ( 4 t w T ( 2 ) + 4 t 2 H T ) + ( T ( 1 ) ) 2 ( 6 t 2 ( w T ( 2 ) ) 2 ) + O ( t 3 ) = ( T + t w T ( 1 ) ) 4 + ( T + t w T ( 1 ) ) 3 ( 4 t w T ( 2 ) + 4 t 2 H T ) + ( T + t w T ( 1 ) ) 2 ( 6 t 2 ( w T ( 2 ) ) 2 ) + O ( t 3 ) .
Similarly for σ 1 and σ 2 respectively,
( T ( 1 ) ) 4 = ( T + t w T ( 1 ) ) 4 = T 4 + 4 t T 3 w T ( 1 ) + 6 t 2 T 2 w T ( 1 ) + O ( t 3 ) , ( T ( 2 ) ) 4 = T 4 + 4 t T 3 w T ( 2 ) + 6 t 2 T 2 w T ( 2 ) + O ( t 3 ) .
Combining these,
( T ( 1 , 2 ) ) 4 ( T ( 1 ) ) 4 = 4 t T 3 w T ( 2 ) + O ( t 2 ) , ( T ( 1 , 2 ) ) 4 ( T ( 2 ) ) 4 = 4 t T 3 w T ( 1 ) + O ( t 2 ) ,
so that
( T ( 1 , 2 ) ) 4 ( T ( 1 ) ) 4 ( T ( 2 ) ) 4 + T = 4 t 2 T 3 H T + 12 t 2 T 2 w T ( 1 ) w T ( 2 ) + O ( t 3 ) .
To derive the equations for H f and H T , we take (49), subtract (50) and (51), and add (52). Using (53),
ϵ v · x H f = σ H f ( 4 T 3 H T + 12 T 2 w T ( 1 ) w T ( 2 ) ) + σ 1 ( 4 T 3 w T ( 2 ) ) + σ 2 ( 4 T 3 w T ( 1 ) ) σ 1 w f ( 2 ) σ 2 w f ( 1 ) ϵ 2 H T = σ H ρ + σ ( 4 T 3 H T + 12 T 2 w T ( 1 ) w T ( 2 ) + σ 1 ( 4 T 3 w T ( 2 ) ) + σ 2 ( 4 T 3 w T ( 2 ) ) σ 1 w ρ ( 2 ) σ 2 w ρ ( 1 ) .
To show the boundedness, we note that H f , H T solve the nonlinear RTE with a modified right-hand side only containing bounded terms, so we again use Theorem 3.1 from [33] to obtain the boundedness:
H f L ( Ω ) C f , H T L ( Ω ) C T .
Proposition 11.
For every j , k and functions σ , σ ^ in the admissible set, the second derivative of the forward map is of order ϵ,
G j k ( σ ) [ σ ^ , σ ^ ] = O ( ϵ ) .
Proof. 
We start with arbitrary directions σ 1 and σ 2 and later choose the [ σ ^ , σ ^ ] direction. To find the ϵ dependence of H f and H T , we expand:
H f = H f , 0 + ϵ H f , 1 + ϵ 2 H f , 2 w f ( 1 ) = w f , 0 ( 1 ) + ϵ w f , 1 ( 1 ) + ϵ 2 w f , 2 ( 1 ) w f ( 2 ) = w f , 0 ( 2 ) + ϵ w f , 1 ( 2 ) + ϵ 2 w f , 2 ( 2 ) T = T 0 + ϵ 2 T 2 w T ( 1 ) = w T , 0 ( 1 ) + ϵ w T , 1 ( 1 ) + ϵ 2 w T , 2 ( 1 ) w T ( 2 ) = w T , 0 ( 2 ) + ϵ w T , 1 ( 2 ) + ϵ 2 w T , 2 ( 2 ) H T = H T , 0 + ϵ 2 H T , 2 ,
and plug this in to (54). At order O ( 1 ) , we obtain
0 = 12 σ T 0 2 w T , 0 ( 1 ) w T , 0 ( 2 ) + 4 σ T 0 3 H T , 0 + 4 σ 1 T 0 3 w T , 0 ( 2 ) + 4 σ 2 T 0 3 w T , 0 ( 1 ) σ H f , 0 σ 1 w f , 0 ( 2 ) σ 2 w f , 0 ( 1 ) 0 = 12 σ T 0 2 w T , 0 ( 1 ) w T , 0 ( 2 ) + 4 σ T 0 3 H T , 0 + 4 σ 1 T 0 3 w T , 0 ( 2 ) + 4 σ 2 T 0 3 w T , 0 ( 1 ) σ H ρ , 0 σ 1 w ρ , 0 ( 2 ) σ 2 w ρ , 0 ( 1 ) ,
subtracting these two equations, we obtain
H f , 0 = H ρ , 0 ,
so H f , 0 loses its velocity dependence. At O ( ϵ ) , we obtain
v · x H f , 0 = σ H f , 1 σ 1 w f , 1 ( 2 ) σ 2 w f , 1 ( 1 ) ,
so
H f , 1 = 1 σ v · x H f , 0 σ 1 σ w f , 1 ( 2 ) σ 2 σ w f , 1 ( 1 ) .
Finally, at O ( ϵ 2 ) , we obtain
v · x H f = 8 σ H T , 0 H T , 2 T 0 3 + 12 σ ( H T , 0 ) 2 T 0 2 T 2 + 12 σ 2 T 0 2 T 2 w T , 0 ( 1 ) + 4 σ 2 T 0 3 w T , 2 ( 1 ) + 12 σ 1 T 0 2 T 2 w T , 0 ( 2 ) + 24 σ T 0 T 2 w T , 0 ( 2 ) + 12 σ T 0 2 w T , 2 ( 1 ) w T , 0 ( 2 ) + 4 σ 1 T 0 3 w T , 2 ( 2 ) + 12 σ T 0 2 w T , 0 ( 1 ) w T , 2 ( 2 ) σ H f , 2 σ 1 w f , 2 ( 2 ) σ w f , 2 ( 1 )
and
H T , 0 = 8 σ H T , 0 H T , 2 T 0 3 + 12 σ ( H T , 0 ) 2 T 0 2 T 2 + 12 σ 2 T 0 2 T 2 w T , 0 ( 1 ) + 4 σ 2 T 0 3 w T , 2 ( 1 ) + 12 σ 1 T 0 2 T 2 w T , 0 ( 2 ) + 24 σ T 0 T 2 w T , 0 ( 2 ) + 12 σ T 0 2 w T , 2 ( 1 ) w T , 0 ( 2 ) + 4 σ 1 T 0 3 w T , 2 ( 2 ) + 12 σ T 0 2 w T , 0 ( 1 ) w T , 2 ( 2 ) σ 1 | S d 1 | S d 1 H f , 2 d v σ 1 1 | S d 1 | S d 1 w f , 2 ( 2 ) d v σ 2 1 | S d 1 | S d 1 w f , 2 ( 1 ) d v .
Integrating equation (55) over velocity, subtracting (56), and using Green’s theorem, we obtain
· 1 σ H f , 0 H T , 0 = 1 | S d 1 | S d 1 v · x σ 1 σ w f , 1 ( 2 ) σ 2 σ w f , 1 ( 1 ) d v .
Since H f , 0 has trivial boundary condition, its contribution drops out.
G j k ( σ ) [ σ ^ , σ ^ ] = Γ + ( x k ) ( H f , 0 ( x ; ϕ j ) + ϵ H f , 1 ( x , v ; ϕ j ) + ϵ 2 H f , 2 ( x , v ; ϕ j ) ) v · n d v = ϵ Γ + ( x k ) H f , 1 ( x , v ; ϕ j ) v · n d v + O ( ϵ 2 ) ;
therefore, with the boundedness of H from from Proposition 10, we have G j k ( σ ) [ σ ^ , σ ^ ] = O ( ϵ ) . □
Corollary 2.
For σ B σ and admissible variation σ ^ , the diagonal elements A ( σ ) [ σ ^ , σ ^ ] of the Hessian of the posterior distribution based on the nonlinear RTE satisfies:
A ( σ ) [ σ ^ , σ ^ ] = O ( ϵ ) .
This is a direct corollary from Propositions 3, 7, and 11.

6. Research Results and Discussion

As is well known, the radiative transfer equation, in the high-energy regime where the scattering is dominating, is well approximated by the diffusion equation. This asymptotic relation constitutes a major part in model reduction for the forward setting. On the contrary, it has an adverse effect for the inverse problem. In particular, the reconstruction becomes increasingly unstable as we approach the diffusive regime. This relation has been investigated numerically in [1,15,16] and analytically in [17], where a full data to measurement map, named the Albedo operator, is assumed to be given.
However, in practice, only partial data are available, and the previously obtained well-posedness results are no longer feasible. We focus our investigation in this paper for this scenario, using Bayesian inference as the basic tool. We have proposed two new measures to characterize the stability and its deterioration in the diffusion regime. One is the KL divergence between the posterior and prior distribution, which implies the overall information gain from the data. We show that the information gain is less in the small Knudsen number regime. The other is the Hessian of the posterior distribution, which is related to the mean square error of the posterior and quantifies the uncertainty of the optimizer; therefore, it serves as a local measure around the MAP. We find that the posterior distribution is more and more “flat” and thus carries no useful information when the diffusion effect is strong. Both measures are taken for the linear RTE and are extended to, for the first time in the literature, treat the nonlinear RTE as well.
Although the paper is completely theoretical, it gives guidance on conducting numerical experiments. The discussion in this paper essentially suggests that the problem is intrinsically bad in the diffusion regime and thus rules out all possible algorithms that could potentially deliver a good numerical recovery.
There are some natural follow-up questions that need to be answered in the near future. One task is to clearly identify the number and the quality of the measurements for a unique reconstruction. Suppose the reconstructed function is represented by an N dimensional vector; then, how many experiments and measurements exactly are needed for a unique reconstruction, and where should the source and detector be located? The answer to this question for EIT (electrical impedance tomography) was given in [26], but one needs to tune the process to fit the situation for the radiative transfer equation used in the current paper. Another question concerns the numerical stability. While it is true that the information gain deteriorates as ϵ becomes small, there might be numerical approaches that gradually incorporate information from high-energy to low-energy photons. One possible strategy is to use a large number of experimental measurements conducted with low-energy photons to build a rough initial guess as a “warm start” before adding in information obtained at a high-energy level. The warm start given by the low-energy level results serves as a good initial guess toward a final convergence. The process is parallel to Bayesian tempering, which saw some good developments in the past few years [34,35], and the inverse scattering problem that combines information from multiple frequencies [36]. Another approach is to use a hybrid image modality, such as photoacoustic tomography. where acoustic measurement generated by the photoacoustic effect is used to infer the optimal properties of the media. In this case, an improved stability is anticipated [37,38].

Numerical Evidence

Here, we conduct two numerical tests to further support our theoretical findings. In particular, we consider the following linear radiative transfer equation:
ϵ v · x f ( x , v ) = σ s ( x ) L f ( x , v ) , x [ 0 , 0.6 ] 2 , v S 1 , f | Γ = ϕ .
The data are prepared as follows. For the i-th experiment, we prescribe incoming data ϕ i and measure the output intensity f at location j on Γ + , which is denoted as d i , j . Here, ϕ and detect location j are chosen randomly, and N I = 44 number of experiments are conducted with N J = 264 number of receivers for each experiment. These data pairs { ϕ i , { d i , j } j = 1 N J } i = 1 N I are kept fixed for various choices of Knudsen number ϵ for a fair comparison.
In the first test, we assume σ s ( x ) has the form
σ s ( x ) = 1 + σ N ( ( 0.3 , 0.3 ) , Σ ) ,
where N ( 0 , Σ ) is a two-dimensional normal distribution with mean zero and a fixed covariance matrix Σ . The parameter σ is the parameter to be inferred. We assume the ground-truth σ = 0.5 , and the ground-truth medium is plotted on the left of Figure 1. The inverse problem aims at inferring this ground-truth from the prescribed data pairs. To illustrate the stability of the inverse problem, we show how, for different choices of σ away from the truth, the measurements differ from the true measurements. More precisely, we denote
d p ( σ ) = i , j | d i , j ( σ ) d i , j true | 2 1 / 2
as the perturbation in the measurement, where d i , j is the measurement with a fixed σ for the i-th experiment and j-th measurement, and d i , j true is the measurement with true σ = 0.5 . In the right panel of Figure 1, we plot the difference in measurement data (58) versus the perturbation in σ for three different ϵ . It is clear that with smaller ϵ , the difference in measurement becomes more indistinguishable, which means that a smaller perturbation in measurement can lead to larger deviation in reconstruction. This is in consistent with our theory on the stability deterioration with decreasing ϵ .
In the second test, we consider a different form of σ s ( x ) :
σ s ( x ) = 1 + σ 1 N ( [ 0.2 , 0.2 ] , Σ ) + σ 2 N ( [ 0.45 , 0.45 ] , Σ ) .
Here, the to-be-reconstructed parameters are σ 1 and σ 2 . We choose σ 1 = 0.5 and σ 2 = 0.3 to be the ground-truth and display the medium in Figure 2. Similar to the previous case, we plot the difference in measurement
d p ( σ 1 , σ 2 ) = i , j | d i , j ( σ 1 , σ 2 ) d i , j true | 2 1 / 2
with respect to σ 1 ( 0 , 1 ) and σ 2 ( 0 , 0.6 ) . It is again evident that for smaller ϵ , d p becomes flatter, leading to a deterioration in stability, as shown in Figure 2. In addition, we identify the data that are above 0.05 × max i , j { d i , j } in Figure 3. Here, the horizontal axis is the index for the experiment, and the vertical axis is the index for the receiver. It is seen that a larger ϵ gives a sparse dataset, whereas small ϵ gives a dense dataset, meaning that all receivers receive some information about the source. This indicates that the data are more spread out for smaller ϵ , which makes the inverse problem harder to solve.

7. Conclusions

In this paper, we examine the stability deterioration for the multiscale inverse radiative transfer equation (RTE) in the Bayesian framework. Even though the instability on the continuous level was discussed in the literature [31] and the multiscale convergence was shown in [16,17,32], the instability representation in the Bayesian framework was open. The current paper constitutes the first result that fills the gap. The results presented suggest that one should not use Bayesian inference for conducting optical tomography when the photon energy is low. The numerical algorithm, without fine tuning, would carry very limited use in reconstructing the ground-truth media.

Author Contributions

Conceptualization: Q.L.; methodology, Q.L. and L.W.; validation, Q.L. and L.W.; formal analysis, K.N. and L.W.; investigation, Q.L. and L.W.; resources, Q.L. and L.W.; writing—original draft preparation, K.N.; writing—review and editing, Q.L. and L.W.; visualization, Q.L.; supervision, Q.L.; project administration, Q.L.; funding acquisition, Q.L. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

Q.L. acknowledges support from Vilas Early Career award. The research is supported in part by NSF via grant DMS-1750488 and the Office of the Vice Chancellor for Research and Graduate Education at the University of Wisconsin Madison with funding from the Wisconsin Alumni Research Foundation. K.N. acknowledges support from NSF grant DMS-1750488. L.W. acknowledges support from NSF via grant DMS-1846854.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Arridge, S.R. Optical tomography in medical imaging. Inverse Probl. 1999, 15, R41–R93. [Google Scholar] [CrossRef] [Green Version]
  2. Case, K.M.; Zweifel, P.F. Linear Transport Theory; Addison-Wesley Publishing Company: Boston, MA, USA, 1967. [Google Scholar]
  3. Case, K.; Zweifel, P. Existence and uniqueness theorems for the neutron transport equation. J. Math. Phys. 1963, 4, 1376–1385. [Google Scholar] [CrossRef] [Green Version]
  4. Egger, H.; Schlottbom, M. Numerical methods for parameter identification in stationary radiative transfer. Comput. Optim. Appl. 2015, 62, 67–83. [Google Scholar] [CrossRef] [Green Version]
  5. Egger, H.; Schlottbom, M. A mixed variational framework for the radiative transfer equation. Math. Model. Methods Appl. Sci. 2012, 22, 1150014. [Google Scholar] [CrossRef]
  6. Dautray, R.; Lions, J.L. Mathematical Analysis and Numerical Methods for Science and Technology: Volume 1 Physical Origins and Classical Methods; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  7. Choulli, M.; Stefanov, P. An inverse boundary value problem for the stationary transport equation. Osaka J. Math. 1999, 36, 87–104. [Google Scholar]
  8. Stefanov, P.; Tamasan, A. Uniqueness and non-uniqueness in inverse radiative transfer. Proc. Am. Math. Soc. 2009, 137, 2335–2344. [Google Scholar] [CrossRef] [Green Version]
  9. Wang, J. Stability estimates of an inverse problem for the stationary transport equation. Ann. Inst. Henri Poincaré Phys. Théor. 1999, 70, 473–495. [Google Scholar]
  10. Bal, G.; Langmore, I.; Monard, F. Inverse transport with isotropic sources and angularly averaged measurements. Inverse Probl. Imaging 2008, 2, 23–42. [Google Scholar] [CrossRef]
  11. Bal, G.; Jollivet, A. Generalized stability estimates in inverse transport theory. Inverse Probl. Imaging 2018, 12, 59–90. [Google Scholar] [CrossRef] [Green Version]
  12. Bal, G. Inverse transport theory and applications. Inverse Probl. 2009, 25, 48. [Google Scholar] [CrossRef] [Green Version]
  13. Bardos, C.; Santos, R.; Sentis, R. Diffusion Approximation and Computation of the Critical Size. Trans. Am. Math. Soc. 1984, 284, 617–649. [Google Scholar] [CrossRef]
  14. Bensoussan, A.; Lions, J.L.; Papanicolaou, G. Boundary layers and homogenization of transport processes. Publ. Res. Inst. Math. Sci. 1979, 15, 53–157. [Google Scholar] [CrossRef] [Green Version]
  15. Hielscher, A.; Alcouffe, R.; Barbour, R. Comparison of finite-difference transport and diffusion calculations for photon migration in homogeneous and heterogeneous tissues. Phys. Med. Biol. 1998, 43, 1285–1302. [Google Scholar] [CrossRef]
  16. Chen, K.; Li, Q.; Wang, L. Stability of stationary inverse transport equation in diffusion scaling. Inverse Probl. 2018, 34, 025004. [Google Scholar] [CrossRef] [Green Version]
  17. Lai, R.Y.; Li, Q.; Uhlmann, G. Inverse Problems for the Stationary Transport Equation in the Diffusion Scaling. SIAM J. Appl. Math. 2019, 79, 2340–2358. [Google Scholar] [CrossRef]
  18. Nusken, N.; Reich, S.; Rozdeba, P.J. State and Parameter Estimation from Observed Signal Increments. Entropy 2019, 21, 505. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Jiang, S.W.; Harlim, J. Parameter Estimation with Data-Driven Nonparametric Likelihood Functions. Entropy 2019, 21, 559. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  20. Biegler, L.T.; Ghattas, O.; Heinkenschloss, M.; van Bloemen Waanders, B. Large-Scale PDE-Constrained Optimization: An Introduction. In Large-Scale PDE-Constrained Optimization; Biegler, L.T., Heinkenschloss, M., Ghattas, O., van Bloemen Waanders, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 3–13. [Google Scholar]
  21. Rees, T.; Dollar, H.S.; Wathen, A.J. Optimal Solvers for PDE-Constrained Optimization. SIAM J. Sci. Comput. 2010, 32, 271–298. [Google Scholar] [CrossRef] [Green Version]
  22. De los Reyes, J. Numerical PDE-Constrained Optimization; SpringerBriefs in Optimization; Springer International Publishing: Cham, Switzerland, 2015. [Google Scholar]
  23. Idier, J. Bayesian Approach to Inverse Problems; ISTE Wiley: Hoboken, NJ, USA, 2013. [Google Scholar]
  24. Stuart, A.M. Inverse problems: A Bayesian perspective. Acta Numer. 2010, 19, 451–559. [Google Scholar] [CrossRef] [Green Version]
  25. Dashti, M.; Stuart, A.M. The Bayesian Approach to Inverse Problems. In Handbook of Uncertainty Quantification; Ghanem, R., Higdon, D., Owhadi, H., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 311–428. [Google Scholar]
  26. Bui-Thanh, T.; Li, Q.; Zepeda-Núñez, L. Bridging and Improving Theoretical and Computational Electric Impedance Tomography via Data Completion. arXiv 2021, arXiv:2105.00554. [Google Scholar]
  27. Wang, J.; Li, M.; Cheng, J.; Guo, Z.; Li, D.; Wu, S. Exact reconstruction condition for angle-limited computed tomography of chemiluminescence. Appl. Opt. 2021, 60, 4273–4281. [Google Scholar] [CrossRef]
  28. Harris, I. Direct Sampling for Recovering Sound Soft Scatterers from Point Source Measurements. Computation 2021, 9, 120. [Google Scholar] [CrossRef]
  29. Li, Q.; Sun, W. Applications of kinetic tools to inverse transport problems. Inverse Probl. 2020, 36, 035011. [Google Scholar] [CrossRef] [Green Version]
  30. Hellmuth, K.; Klingenberg, C.; Li, Q.; Tang, M. Multiscale Convergence of the Inverse Problem for Chemotaxis in the Bayesian Setting. Computation 2021, 9, 119. [Google Scholar] [CrossRef]
  31. Li, Q.; Newton, K. Diffusion Equation-Assisted Markov Chain Monte Carlo Methods for the Inverse Radiative Transfer Equation. Entropy 2019, 21, 291. [Google Scholar] [CrossRef] [Green Version]
  32. Newton, K.; Li, Q.; Stuart, A.M. Diffusive optical tomography in the Bayesian framework. Multiscale Model. Simul. 2020, 18, 589–611. [Google Scholar] [CrossRef] [Green Version]
  33. Klar, A.; Schmeiser, C. Numerical passage from radiative heat transfer to nonlinear diffusion models. Math. Model. Methods Appl. Sci. 2001, 11, 749–767. [Google Scholar] [CrossRef]
  34. Chandra, R.; Azam, D.; Kapoor, A.; Müller, R.D. Surrogate-assisted Bayesian inversion for landscape and basin evolution models. Geosci. Model Dev. 2020, 13, 2959–2979. [Google Scholar] [CrossRef]
  35. Latz, J.; Madrigal-Cianci, J.P.; Nobile, F.; Tempone, R. Generalized parallel tempering on Bayesian inverse problems. Stat. Comput. 2021, 31, 1–26. [Google Scholar] [CrossRef]
  36. Bao, G.; Li, P.; Lin, J.; Triki, F. Inverse scattering problems with multi-frequencies. Inverse Probl. 2015, 31, 093001. [Google Scholar] [CrossRef] [Green Version]
  37. Rabanser, S.; Neumann, L.; Haltmeier, M. Stochastic proximal gradient algorithms for multi-source quantitative photoacoustic tomography. Entropy 2018, 20, 121. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Frederick, C.; Ren, K.; Vallélian, S. Image Reconstruction in Quantitative Photoacoustic Tomography with the Simplified P_2 Approximation. SIAM J. Imaging Sci. 2018, 11, 2847–2876. [Google Scholar] [CrossRef]
Figure 1. Test 1. (Left): ground truth (57) with σ = 0.5 . (Right): perturbation in measured data versus perturbation in σ .
Figure 1. Test 1. (Left): ground truth (57) with σ = 0.5 . (Right): perturbation in measured data versus perturbation in σ .
Computation 10 00015 g001
Figure 2. Test 2: The panel on the left shows the ground-truth medium with σ 1 = 0.5 and σ 2 = 0.3 in (59). The two plots on the right show, for different ϵ , the perturbation in the measured data versus the perturbation in σ 1 , 2 .
Figure 2. Test 2: The panel on the left shows the ground-truth medium with σ 1 = 0.5 and σ 2 = 0.3 in (59). The two plots on the right show, for different ϵ , the perturbation in the measured data versus the perturbation in σ 1 , 2 .
Computation 10 00015 g002
Figure 3. Test 2. Visualization of measured data whose intensity value is above 0.05 × max i , j { d i , j } .
Figure 3. Test 2. Visualization of measured data whose intensity value is above 0.05 × max i , j { d i , j } .
Computation 10 00015 g003
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, Q.; Newton, K.; Wang, L. Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime. Computation 2022, 10, 15. https://doi.org/10.3390/computation10020015

AMA Style

Li Q, Newton K, Wang L. Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime. Computation. 2022; 10(2):15. https://doi.org/10.3390/computation10020015

Chicago/Turabian Style

Li, Qin, Kit Newton, and Li Wang. 2022. "Bayesian Instability of Optical Imaging: Ill Conditioning of Inverse Linear and Nonlinear Radiative Transfer Equation in the Fluid Regime" Computation 10, no. 2: 15. https://doi.org/10.3390/computation10020015

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop