Multiscale convergence of the inverse problem for chemotaxis in the Bayesian setting

Chemotaxis describes the movement of an organism, such as single or multi-cellular organisms and bacteria, in response to a chemical stimulus. Two widely used models to describe the phenomenon are the celebrated Keller-Segel equation and a chemotaxis kinetic equation. These two equations describe the organism movement at the macro- and mesoscopic level respectively, and are asymptotically equivalent in the parabolic regime. How the organism responds to a chemical stimulus is embedded in the diffusion/advection coefficients of the Keller-Segel equation or the turning kernel of the chemotaxis kinetic equation. Experiments are conducted to measure the time dynamics of the organisms' population level movement when reacting to certain stimulation. From this one infers the chemotaxis response, which constitutes an inverse problem. \\ In this paper we discuss the relation between both the macro- and mesoscopic inverse problems, each of which is associated to two different forward models. The discussion is presented in the Bayesian framework, where the posterior distribution of the turning kernel of the organism population is sought after. We prove the asymptotic equivalence of the two posterior distributions.


Introduction
Chemotaxis is the phenomenon of organisms directing their movements upon certain chemical stimulation.Every motile organism exhibits some type of chemotaxis.Mathematically, there are two main-stream mathematical models used to describe this phenomenon: One at the macroscopic population level and the other at the mesoscopic level.
The most famous model in the first category is the Keller-Segel equation, introduced in Patlak [1953], Keller and Segel [1971a,b].The equation traces the evolution of bacteria density when chemical stimulation is introduced to the system: where ρ(x, t) is the cell density at location x at time t > 0. In this equation, both the advection term and the diffusion process integrate the external chemical density information, meaning both the diffusion matrix D[c](x, t) and the drift Both models above are empirical in nature.The coefficients, such as D, Γ and K that encode the way bacteria respond to the environment are typically unknown ahead of time.Since the chemoattractant concentration c depends on space and time, so do D, Γ and K.However, except for very few well studied bacteria, these quantities are not explicitly known and cannot be measured directly.One thus needs to design experiments and use measurable quantities to infer the information.This constitutes the inverse problem we study.One such experiment was reported in Giometto et al. [2015] where the authors studied phototaxis and use video recording of the seaweed motion (ρ in time) to infer D and Γ in (1).
There are various ways to conduct inverse problems, and in this paper, we take the viewpoint of Bayesian inference.This is to assume that the coefficients are not uniquely configured in reality, but rather follow a certain probability distribution.The measurements are taken to infer this probability.In the process of such inference, one nevertheless needs to incorporate the forward model.The two different forward models described above then lead to two distinctive posterior distributions as the inference.
One natural question is to understand the relation between the two resulting posterior distributions.In this article, we answer this question by asymptotic analysis.To be specific, we will show that the two models are asymptotically equivalent in the long time and large space regime, and (D, Γ) can be uniquely determined by a given K.As such, the associated two inverse problems are asymptotically equivalent too.The equivalence is characterized by the distance (we use both the Kullback-Leibler divergence and the Hellinger distance) between the two corresponding posterior distributions.We show that this distance vanishes asymptotically as the Knudsen number, a quantity that measures the mean free path between two subsequent tumbles, becomes arbitrarily small.
The rest of the paper is organized as follows: In section 2 we present the asymptotic relation between the two forward models.This can be seen as an adaption of the results in Chalub et al. [2004] to our setting.The analysis serves as the foundation to link the two inverse problems.In section 3 we formulate the Bayesian inverse problems corresponding to the scaled chemotaxis equation and the Keller Segel model as underlying models.The well-posedness and convergence of the two corresponding posterior distributions is shown in section 4. The results are summarized and discussed in section 5.
We should stress that both mathematical modeling of chemotaxis and Bayesian inference are active research areas.In formulating our problems, we select the most widely-accepted models and methods.
For modeling chemotaxis, the two models (1)-( 2) are the classical ones, and were derived from the study of a biased random walk Alt [1980], Patlak [1953].They assume the organisms passively depend on the environment.When bacteria actively respond and change the environment, a parabolic or elliptic equation for c can be added to describe PREPRINT such feedback to the environment Keller and Segel [1971a, 1970, 1971b].The coupled system consisting of equation ( 1) and a parabolic equation for c, where the chemo-attractant is assumed to be produced by the bacteria population, can exhibit blow-up solutions.Therefore, some particular form of D[c], Γ[c] are proposed to eliminate the unwanted behavior.These models include volume filling Kowalczyk [2005], quorum sensing models Horstmann and Winkler [2005], or the flux limited Keller Segel system Perthame et al. [2018].On the kinetic level, additional variables were introduced to describe the intracellular responses of the bacteria to the chemoattractant in the signalling pathway Erban and Othmer [2004], Si et al. [2012Si et al. [ , 2014]], Sun and Tang [2016].Asymptotic limits of these newer models sometimes reveal interesting phenomenon such as fractional diffusion Perthame et al. [2017].The asymptotic equivalence of the classical model to the Keller Segel model was extensively studied e.g. in Alt [1980], Othmer and Hillen [2002], Othmer et al. [1988], Chalub et al. [2004].In particular, the current paper heavily depends on the techniques shown in Chalub et al. [2004].
There is also a vast literature on inverse problems.For Bayesian inference perspective in scientific computing, interested readers are referred to monographs Stuart [2010], Dashti and Stuart [2015] and the references therein.In comparison, linking two or multiple inverse problems in different regimes are relatively rare.In Newton et al. [2020], the authors studied the asymptotic equivalence between the inverse kinetic radiative transport equations and its macroscopic counterpart, the diffusion equation.In Abdulle and Blasio [2020], the convergence of Bayesian posterior measures for a parametrized elliptic PDE forward model was shown in a similar fashion.
2 Asymptotic analysis for kinetic chemotaxis equations and the Keller-Segel model The two problems we will be using are chemotaxis kinetic equation and the Keller-Segel equation.We review these two models in this section and study their relation.It serves as a cornerstone for building the connection of the two associated inverse problems.
Throughout the paper, we assume the chemoattractant density c is one given and fixed function of (x, t) and is not produced or consumed by the bacteria.While this is an approximation, it is valid in many experiments where one has tight control over the matrix environment.We drop the dependence of K, D, Γ on the fixed c in the notation.
We claim, and will show below that the two equations ( 2) and ( 1) are asymptotically equivalent in the long time large space regime.Denote ε the scaling parameter, then in a parabolic scaling, the chemotaxis equation to be considered has the following form: (3) Formally, when ε → 0, the tumbling term dominates the equation and we expect, in the leading order: where K * can be viewed as the limiting operator as K ε .This means the limiting solution is almost in the null space of the limiting tumbling operator.Furthermore, due to the specific form of the tumbling operator, one can show that under certain conditions such null space is one dimensional, compare e.g.Chalub et al. [2004] Lemma 2 and following derivations.We thus formally write and denote f * = ρF .Conventionally we call F the local equilibrium.Due to the form of K, this is a function only of v.
Inserting this formula back into (3) and perform asymptotic expansion up to the second order, and following Chalub et al. [2004], we find that ρ satisfies the Keller-Segel equation: A rigorous proof of the convergence of a subsequence of f ε can be found in Chalub et al. [2004], theorem 3, where the authors discussed a nonlinear extension of the present model.

PREPRINT
From now on, we confine ourselves to kernels having the form of Remark 1.Because our aim is to compare the posterior distributions of K ε for the kinetic model (3) and the macroscopic model (4), this choice is reasonable.As shown in Chalub et al. [2004], higher order terms in ε would not affect the macroscopic equation.Therefore they would not be reconstructable by the macroscopic inverse problem.
In order to rigorously justify the above intuition on the convergence f ε → ρF and ensure the existence of solutions to equations ( 3), (4), we suppose (K 0 , K 1 ) to be an element of the admissible set for some preset constants C, α > 0. For any (K 0 , K 1 ) ∈ A it is straightforward to show that Remark 2. With (K 0 , K 1 ) assumed to be symmetric and antisymmetric, the local equilibrium F in (7) is explicit and simple.This is e.g. the case for one typical choice of the tumbling kernel: with antisymmetric φ", which represents a special case of the models extensively studied in Chalub et al. [2004].For better readability, we assume the symmetry properties of the tumbling kernel stated in (6) throughout the paper.We should mention, however, that it is possible to relax this assumptions on the tumbling kernel while maintaining the same macroscopic limit.In particular, if there exists one uniform velocity distribution F (v) > 0 that is positive, bounded and satisfies for all considered K 0 in the admissible set, then all statements and arguments provided in this paper still hold true.Note that by these requirements, assumption (A0) in Chalub et al.Chalub et al. [2004] is satisfied.
Suppose the initial data is smooth in the sense that f 0 ∈ C 1,+ c (R 3 × V ).".Then we have the following theorem on convergence which can be viewed as an adaption of the results in Chalub et al. [2004].Theorem 1. Suppose K ε has the form of (5) with (K 0 , K 1 ) ∈ A and suppose the initial condition f 0 ∈ C 1,+ c (R 3 × V ), then the solution f ε to the chemotaxis equation (3) satisfies the following: a) For sufficiently small ε, the solution f ε of equation (3) exists and is bounded in , where ρ satisfying the Keller-Segel equation (4) with coefficients Here θ and κ solve the cell problems: where c) The boundedness and the convergence is uniform in A.

Sketch of proof.
a) First of all, we have the maximum principle so that PREPRINT and following the same arguments as in Chalub et al. [2004], we integrate in time for for sufficiently small ε, we have: Calling the Grönwall lemma one obtains a bound on . Since the only role K i played is its boundedness by C, as in ( 11), the estimate we get is uniform in A and is independent of ε for ε small enough.
b) We show that f ε is a Cauchy sequence in ε.For the purpose, we call f ε and f ε the solutions of the chemotaxis equation ( 3) with the scaling being ε and ε.We also denote the difference fε,ε := f ε − f ε.Subtracting the two equations we have: with a trivial initial data fε,ε (x, 0, v) = 0.This is an equation with a source term S. Using the argument as in a), L ∞ boundedness of the time and spatial derivative ∂ t f ε, ∇ x f ε in S can be shown, meaning S is of order ε − ε.Running ( 11) again with this extra source term, we have Hence {f ε } is a Cauchy sequence, and thus converges to some . It remains to prove f = ρF almost everywhere in [0, T ] × R 3 × V with ρ satisfying the Keller-Segel equation (4) with D, Γ as given in equations ( 8)-( 9).This follows by arguments rather similar to those in Chalub et al. [2004], and is therefore omitted from here.Since only the boundedness of (K 0 , K 1 ) is seen in the proof, the convergence is uniform in A.

Bayesian inverse problem setup
Associated with the two forward models, there are two inverse problems.We describe the inverse problem setup and present them with the Bayesian inference formulation.
In the lab setup, it is assumed that the bacteria plate is large enough so that the boundary plays a negligible role.At the initial time, the bacteria cells are distributed on the plate.One then injects chemoattractants onto the plate through a controlled manner, so to have c(t, x) explicitly given, forcing K i , and (D, Γ) to be functions of (t, x, v) or (t, x) only.The bacteria density at location x at time t is then measured.
Measuring is usually done by taking high resolution photos of the plate at time t and counting the bacteria in a small neighbourhood of location x.Another possibility is taking a sample of the bacteria at location x and measuring the bacteria density of the sample by classical techniques like optical density OD 600 or flow cytometry, see e.g.Beal et al. [2020], Hammes F [2010].This however describes an invasive technique and thus allows measurements at only one time t.
The whole experiment is to take data of the following operator: if the dynamics of the bacteria is modeled by (3), and if the dynamics of the bacteria is modeled by (4).Noting that (D, Γ) are uniquely determined by (K 0 , K 1 ) by equations ( 8),( 9), we can equate A D ,Γ with A 0 K0,K1 .Although the more natural macroscopic inverse problem would be to recover the diffusion and drift coefficients D, Γ in (4), we choose to formulate the inverse problem for the tumbling kernel (K 0 , K 1 ).This allows us to compare the solution for both the kinetic and the macroscopic inverse problem.
Remark 3. In order to reasonably compare the solutions to the inverse problems, the solutions have to be of the same kind.We choose to reconstruct (K 0 , K 1 ) in both the kinetic and macroscopic inverse problem, see Figure 1 (left).The macroscopic inverse problem is thus also formulated for (K 0 , K 1 ) which (D, Γ) is a function of.Alternatively one could also reconstruct (D, Γ) from both models.In the kinetic setting this would mean to reconstruct (K chem 0 , K chem 1 ) and then transform to values of (D chem , Γ chem ) by equations (8),( 9), see Figure 1 (right).We do not choose this alternative, because the information on the tumbling kernel (K 0 , K 1 ) is microscopic and thus more detailed.Furthermore, with a fixed (K 0 , K 1 ), (D, Γ) can be uniquely determined, and thus the convergence can be viewed as a mere consequence, see also Remark 5.
chemotaxis Multiple experiments can be conducted using different initial profile, but the same controlled c(t, x) is used to ensure the to-be-reconstructed K i is unchanged from experiment to experiment.Denoting k ∈ [1 , • • • , K] the indices of the different initial data setups, and j = (j 1 , j the indices of the measuring time and location, with t j = t j1 being the measuring time, and χ j = χ j2 ∈ C c (R 3 ) being the spatial test function, then with (3) and (4) being the forward models, we take the measurements, respectively: where M j are the measuring operators with corresponding test functions (δ j , χ j ).One can think of χ j a compactly supported blob function concentrated at a certain location, meaning all the bacteria cells in a small neighborhood are counted towards this particular measurement, see Figure 2.This is a reasonable model when counting bacteria in a small neighbourhood or taking samples with a pipette.
Figure 2: Measurement of the bacteria density (blue) at two different measuring times t j , t j .The location of the test functions is indicated by the support in space of the test functions χ j , χ j .
PREPRINT Throughout the paper we assume the initial data and the measuring operators are controlled: Remark 4. The measurements G ε,chem jk (K 0 , K 1 ), G KS jk (K 0 , K 1 ) are formulated in a rather general form in equations ( 14),( 15) due to the freedom in the choice of the test function χ j ∈ C c (R 3 ).However, all subsequent derivations also hold true for the specific case of pointwise measurements with t j := t j1 and x j := x j2 .The measurements would then be G ε,chem jk ε (x j , t j , v) dv and G KS jk (K 0 , K 1 ) = ρ (k) (x j , t j ), which would correspond to measuring operators M j with test functions (δ tj 1 , δ xj 2 ).
Since measuring error is not avoidable in the measuring process, we assume it introduces additive error and collect the data of the form , where the noise η jk is assumed to be a random variable independently drawn from a Gaussian distribution N (0, γ 2 ) of known variance γ 2 > 0.
In the Bayesian form, the to-be-reconstructed parameter (K 0 , K 1 ) is assumed to be a random variable, and the goal is to reconstruct its distribution.Suppose a-priori we know that the parameter is drawn from the distribution µ 0 , then the Bayesian posterior distributions for (K 0 , K 1 ) should be using ( 3) as the forward model, and using ( 4) as the forward model.In the formula Z • is the normalization constant to ensure 1dµ y • (K 0 , K 1 ) = 1 and is the likelihood of observing the data y from a model with a tumbling kernel or diffusion and drift term derived by (K 0 , K 1 ).
In Section 4 we need to specify the conditions on µ 0 to ensure the well-definedness of µ y • .
Remark 5. Since the macroscopic model does not explicitly depend on (K 0 , K 1 ), the distribution of µ y KS (D, Γ) is of interest in the macroscopic description (4).There are two ways to derive it starting with a prior distribution on (K 0 , K 1 ): The natural way would be to transform the prior distribution to a prior on (D, Γ) by equations (8)-( 9) and then consider the inverse problem of reconstructing (D, Γ).This approach is displayed by the lower path in Figure 3. If, however, the posterior distribution µ y KS (K 0 , K 1 ) is calculated ahead of the transformation (as in our case), one could instead transform this posterior distribution directly to a distribution in the (D, Γ) space following the upper path in Figure 3. Naturally the question arises whether the two ways lead to the same posterior distribution.It turns out they do.Considering the second possibility, we see that the likelihood and thus the normalization constant only depend on (D, Γ), because we are in the macroscopic model.Hence, only the prior distribution is transformed just like it is the case for the first possibility.
Figure 3: Two ways to determine the posterior distribution µ y KS (D, Γ) from a prior µ 0 (K 0 , K 1 ) on the tumbling kernels. PREPRINT

Convergence of posterior distributions
One natural question arises: the two different forward models provide two different posterior distribution functions of (K 0 , K 1 ).Which distribution is the correct one, or rather, what is the relation between the two posterior distributions?
As discussed in section 2, the two forward models are asymptotically equivalent in the long time large space regime, so it is expected that the two posterior distribution converge as well.This suggests the amount of information given by the measurements is equally presented by the two forward models.However, this convergence result is not as straightforward as it may seem.One issue comes from the control of initial data and the measurement operator.For each initial data, the solution converges in L ∞ [0, T ]; , we now have a list of initial data, and the solutions are tested on a set of measuring operators, so we need a uniform convergence when tested on the dual space.Furthermore, to show the convergence of two distribution functions, a certain metric needs to be given on the probability function space, how does the convergence for one set of fixed (K 0 , K 1 ) translates to the convergence on the entire admissible set also needs to be taken care of.By choosing the admissible set ( 6), we formulated an assumption on the tumbling kernels (K 0 , K 1 ) ahead of time.With this a priori knowledge we showed the uniform boundedness and convergence of the solutions f ε to the chemotaxis equation (3) over the function set A in Theorem 1.This will play a crucial role in the convergence proof for the inverse problem.From here and on, we assume the prior distribution µ 0 is supported on A.
Before diving in to show the convergence, as an a priori estimate, we first show the well-posedness of the Bayesian posterior distributions in Lemma 1, following Stuart [2010], Dashti and Stuart [2015].
and the test functions χ j ∈ C c (R 3 ) satisfy (16) then the following properties of the posterior distributions hold true: a) The measurements G ε,chem and G KS are uniformly bounded on A (and uniformly in ε).b) For small enough ε, the measurements G ε,chem and G KS are Lipschitz continuous with respect to the tumbling kernels (K 0 , K 1 ) under the norm c) The posterior distributions are well-posed and absolutely continuous w.r.t. each other.
Proof.a) For every (j, k), we have: where we used the density conservation: Note that this bound is independent of ε.
b) For the chemotaxis model, we have for where f and tumbling kernels K ε = K 0 + εK 1 and Kε = K0 + ε K1 respectively.Their difference satisfies the scaled difference equation: PREPRINT Here, K denotes the tumbling operator with kernel Kε : This yields uniformly on A by Theorem 1 a).Additionally, c f can be chosen to be independent of k by inserting the uniform boundedness of f 16) into equation ( 12).The Grönwall Lemma thus gives with some coefficient L depending on T , C and C ρ .Inserting this in equation ( 19) results in the desired Lipschitz continuity.
We similarly study the Lipschitz continuity of the Keller-Segel measurements G KS jk (K 0 , K 1 ).The proof strategy is almost the same.With some computational effort, one can see: where (Γ, D), ( Γ, D) are the drift and diffusion terms derived by the collision operators defined by (K 0 , K 1 ) and ( K0 , K1 ) respectively by equations ( 8)-( 9).The constant c monotonously depends on the L 2 norms of ρ (k) and ∇ x ρ (k) which are bounded uniformly on A. By the linear relation between D and κ and Γ and θ, this directly translates to , with constant c depending only on V .Finally, the Lax-Milgram theorem shows the continuous dependence of are bounded away from zero and bounded uniformly on A (and in ε).Thus, also the normalization constants Z are.Part b) guarantees the measurability of the likelihoods.In total, this shows that the posterior distributions are well-defined and continuous with respect to each other.Since the likelihoods are continuous in y, well-posedness of the posterior distributions is given.
We are now ready to show the convergence of the two posterior measures.There are two quantities we use to measure the difference between two distributions: • Hellinger metric The two metrics both evaluate the distance between the two probability measures µ 1 and µ 2 that are either absolutely continuous with respect to each other or with respect to a third probability measure µ 0 .Both are frequently used for comparing two distribution functions e.g. in Machine Learning Ran and Hu [2015], Clim et al. [2018], Hamadouche et al. [2017], Cieslak et al. [2012], Ni et al. [2021], Goldenberg and Webb [2019] or inverse problem settings Newton et al. [2020], Abdulle and Blasio [2020].Even though the Kullback-Leibler divergence lacks the symmetry and triangle-inequality properties of a metric, it gained popularity due to its close connection to several information concepts such as the Shannon entropy or the Fisher information metric Kullback and Leibler [1951].Conversely, the Hellinger metric is a true metric.Although it does not have a demonstrative interpretation as the Kullback-Leibler divergence, its strength lies in the fact that convergence in the Hellinger metric implies convergence of the expectation of any polynomially bounded function with respect to either of the posterior distributions, as explained in Stuart [2010].In particular the mean, covariance and further moments of the distributions converge.
Before comparing the posterior measures, we need to have a look at the convergence of the measurements G • (K 0 , K 1 ).
Lemma 2. Assuming the initial and testing functions satisfy (16), the chemotaxis measurements G ε,chem converge to the Keller-Segel measurements G KS uniformly on A as ε → 0.
Proof.Theorem 1 shows the convergence of As a consequence, we have the convergence of the measurements: where we used the form F = 1 V .By the uniform convergence of f ε to ρF , this holds uniformly on A. Since initial data and measuring test functions that satisfy (16) we have the uniform convergence over (j, k) as well.
We can now proof the following theorem on the asymptotic equivalence of the two posterior measures describing the distribution of the tumbling kernels (K 0 , K 1 ) ∈ A if the dynamics of the bacteria is modelled by the kinetic (3) or macroscopic equation (4).
Theorem 2. Let the measurement of the macroscopic bacteria density be of the form (14) and (15) for a underlying kinetic chemotaxis model or a Keller Segel model respectively.The measuring test functions χ j ∈ C c (R 3 ) and initial data f 16).Given a prior distribution µ 0 on A and an additive centered Gaussian noise in the data, the posterior distribution for the tumbling kernel derived from the kinetic chemotaxis equation and the macroscopic Keller Segel equation as underlying models are asymptotically equivalent in the Kullback Leibler divergence With the above Lemmas one can proceed as in the proof in Newton et al. [2020].The integrand of the Kullback-Leibler divergence is by the definition of the normalization constants of order for the Lipschitz constant c < ∞ of exp(− |x| 2γ 2 ) and y − G ε,chem (K 0 , K 1 ) 2 − y − G KS (K 0 , K 1 ) 2 = tr 2y − G ε,chem (K 0 , K 1 ) − G KS (K 0 , K 1 ) T G ε,chem (K 0 , K 1 ) − G KS (K 0 , K 1 ) The first factor is bounded uniformly on A and in ε by Lemma 1 a) and Lemma 2 shows that the second factor converges to 0 uniformly on A.

Summary and Discussion
In this article, we considered bacterial movement in an environment with an attracting chemical substance that was not produced or consumed by the bacteria.The bacteria density was modelled to follow a chemotaxis equation ( 3) on the kinetic level and a Keller-Segel equation ( 4) on the macroscopic level.We studied the reconstruction of the tumbling coefficient using the measurement of the bacteria density at different time and location using different initial data.After adapting the results from Chalub et al. [2004] in the parabolic scaling, we study the equivalence between the reconstructions using the two different underlying models in the Bayesian framework.Assumptions on the prior information were made to guarantee the uniform convergence of the two forward models.This enabled us to show that the posterior distributions are properly defined and that convergence of the two posterior distributions holds true.The distance between two posterior distributions was measured in both the Kullback-Leibler divergence and the Hellinger metric.
The work presented here serves as a cornerstone of future research.On one hand, the study here can help design an efficient inversion solver.Most inversion solvers are composed of many iterations of forward solvers.Since kinetic chemotaxis equation lies on the phase space and is numerically much more expensive, the limiting Keller-Segel equation can serve as a good substitute for generating a good initial guess and speeding up the computation.On the other hand, the approach performed in this study is rather general, and with small modification, it also provides the foundation for explaining experiments, such as Giometto et al. [2015].

Figure 1 :
Figure 1: Two ways to compare the inverse problems: determining and comparing the tumbling kernels for both underlying chemotaxis and Keller Segel models (left) or determining the drift or diffusion coefficient for the Keller Segel model and the tumbling kernel for the chemotaxis model and calculating the corresponding drift and diffusion coefficients.