A Multi-Scale Approach to Modeling E. coli Chemotaxis

The degree to which we can understand the multi-scale organization of cellular life is tied to how well our models can represent this organization and the processes that drive its evolution. This paper uses Vivarium—an engine for composing heterogeneous computational biology models into integrated, multi-scale simulations. Vivarium’s approach is demonstrated by combining several sub-models of biophysical processes into a model of chemotactic E. coli that exchange molecules with their environment, express the genes required for chemotaxis, swim, grow, and divide. This model is developed incrementally, highlighting cross-compartment mechanisms that link E. coli to its environment, with models for: (1) metabolism and transport, with transport moving nutrients across the membrane boundary and metabolism converting them to useful metabolites, (2) transcription, translation, complexation, and degradation, with stochastic mechanisms that read real gene sequence data and consume base pairs and ATP to make proteins and complexes, and (3) the activity of flagella and chemoreceptors, which together support navigation in the environment.


Introduction
This supplement provides additional details about all of the process, composites, and experiments used in this paper. To support multi-scale development with many processes developed and combined by large communities of scientists, Vivarium applies principles of modularity to its software design by breaking up functionality into a collection of Python libraries. The library vivarium-core is the simulation engine, which provides the interface, combines processes into composites, and executes them in experiments. The core engine can be found at the following url: https://github.com/vivarium-collective/vivarium-core. Instructions on installation and setup can be found at the online documentation: https: //vivarium-core.readthedocs.io/en/latest/.
Process libraries are used to develop specific Vivarium projects with processes and composites that share the Vivarium interface. When released as a Python package, these libraries can be imported into other projects, re-configured, combined with other processes, and simulated in large experiments.
Two process libraries were used to develop the models described in this paper: • vivarium-chemotaxis includes chemotaxis-specific processes built for this paper and all of the paper's experiments: https://github.com/vivarium-collective/vivarium-chemotaxis.
• vivarium-cell provides several configurable process, which can be passed parameters and combined with other processes: https://github.com/vivarium-collective/ vivarium-cell.

Processes
This section provides an overview of each process used in the paper in the order they were introduced. This includes their mathematical representation, the Vivarium library they were developed in, their module path within the library, and configuration data (such as parameters) used in this paper. Config functions are special functions used to retrieve configuration data; they are typically placed within the directory of their associated process/composite. If config functions are listed as 'default', that means there is no separate function available at present and either the default parameters are used or parameters are passed in directly. There are several additional 'helper' processes not listed here, including derive globals, derive concentrations, derive counts, tree mass, meta division, divide volume, timeline, and nonspatial environment. For the curious readers, these are all available in the Vivarium repositories.

Multi-Body Physics
Multibody uses a wrapper around an open-source physics engine called pymunk [1]. This simulates agents as rigid bodies with a shape, location, orientation, length, width, mass, and motile forces that can be updated from within the agents; an example of these forces is described under Flagella Motor. The multi-body process handles their collisions and applies their motile forces, external viscous forces, and thermal jitter.

Multi-Body Physics name
Multibody module path cell.processes.multibody_physics repository vivarium-cell config function default

Diffusion
DiffusionField operates on 2D numpy arrays that represent molecular concentration fields. A diffusion operator spreads these concentrations towards a homogeneous state based on provided diffusion coefficients. Agents have a location, which can be updated by Multibody, associated with locations on the fields based on their center points. Agents update the fields with their own updaters, applying uptake and secretion directly to the concentrations fields at their given locations; this drives the fields away from homogeneity. DiffusionField updates the agents' received local environmental state within its update function.

Diffusion name
DiffusionField module path cell.processes.diffusion_field repository vivarium-cell config function default

Metabolism
Metabolism uses a wrapper around COBRA [2]-an established library for flux balance analysis (FBA). It can load large genome-scale models from the BiGG model database [6], which includes over 100 metabolic models of different species. FBA uses linear optimization to determine fluxes in the absence of kinetic parameters. It accomplishes this by using steady-state assumptions to declare Sv = 0 with a distribution of fluxes v across a stoichiometric network depicted by matrix S. Constraints are imposed on v, setting their upper and lower bounds. Then, an objective function is optimized-a typical objective function is the maximization of biomass based on known relative composition of required metabolites.
Metabolism augments the COBRA solver to make a type of dynamic FBA (dFBA) [12], which iterates on the solver in a piece-wise manner with updated constraints. Constraints come from external nutrient availability, regulation, and flux constraints coming through the process's flux bounds port. This flux bounds port can be used by other processes (in this paper, ConvenienceKinetics) to set constraints on metabolism. Additionally, rather than the standard use of the biomass objective to increase a single variable called biomass, Metabolism takes the constituent metabolites that make up the biomass objective and applies them to internal pools of metabolites (see Figure 5a). This allows other processes to utilize those metabolites.

Metabolism name
Metabolism module path cell.processes.metabolism repository vivarium-cell config function get_iAF1260b_config

Convenience Kinetics
The minimal form of Michaelis-Menten kinetics defines flux rates, v, as a function of substrate concentrations, [S], and enzyme concentrations, [E], with parameters for the catalytic rate of a single enzyme, k cat , and substrates' affinities, Convenience kinetics [8] is a generalized form of the Michaelis-Menten kinetic rate laws, which supports all possible kinetic relations in a single equation: with α i as the stoichiometric coefficient of substrate i, β j as the coefficient of product j, K m,j as the normalized concentration of j, and c k = [k] K m,k as the normalized concentration of competitor k. The advantage this provides for computational models lies in the products, , which can be evaluated in loops based on the number of cofactors and competitors in the reaction network.

Configuration Data: Glucose-Lactose Transport
ConvenienceKinetics accepts configuration data for reactions, which is the stoichiometry of the reactions, and kinetic parameters, which are the k cat and K M for those reactions. The config function get glucose lactose transport config returns the following data for the glucose exchange reaction, EX glc D e, and the lactose exchange reaction, EX lcts e. Individual states are annotated parentheses with (port id, variable name)-this is how the ConvenienceKinetics process sees states in its update function. Kinetic parameters that have a state correspond to the K M for that state.

ODE Expression
OdeExpression has equations for mRNA and protein production: M is the concentration of mRNA, k M is the transcription rate, and d M is the transcript degradation rate. P is the concentration of the protein translated from M , k P is the translation rate, d P is the protein degradation rate, and ξ is a transcriptional leak that generates small, spontaneous bursts of mRNA expression.

Configuration Data: LacY Expression
LacY expression was configured with the following parameters:

Template-Based Stochastic Gene Expression
Stochastic gene expression processes include Transcription, Translation, and Complexation. These load in several forms of configuration data, including gene sequences, protein sequences, transcription unit structure, binding affinities, and complexation stoichiometry. The processes simulate expression from DNA to protein complexes with stochastic simulations provided by an open-source Gillespie algorithm solver called arrow (https: //github.com/CovertLab/arrow), which was independently developed by R.K.S. The processes use arrows to simulate the binding and unbinding of ribosomes onto genes and RNAP onto RNA templates, as well as the complexation of the resulting proteins. The Gillespie algorithm is a stochastic approach to numerically solve discrete chemical systems with known reaction rates. At each iteration, the algorithm calculates the propensities for each reaction given a rate and the counts of the reactants, then selects one reaction to occur and determines the interval of time for the current reaction.
The propensity a total of any reaction to occur is defined as the following, with M total reactions and a as the reaction activity: To find τ , the time to the next reaction, rand 1 is generated from a uniform distribution between 0 and 1, and the following equation is applied: The reaction that occurs at time τ is found by generating another random number between 0 and 1, rand 2 , and identifying the smallest q such that: q j=1 a j > a total · rand 2 Once bound onto a template, a polymerase goes over provided curated sequences with a specialized polymerize function that reads the sequence at a provided elongation rate and pulls the required based pairs out of the available internal pools-if a base pair is absent, this stalls expression. Energy requirements for expression are based on ATP hydrolysis for each synthesized or degraded base pair [11], which is tracked and updated. The parameter polymerase occlusion (set in the paper to 50 base pairs for ribosomes and RNAP) determines how far down the template a polymerase must have moved before another one can bind to the promoter.

Configuration Data: Flagella Chromosome
FlagellaChromosome is a knowledge base that holds many different types of data about the flagellar genes: genome sequence, transcription unit structure, affinities, protein sequence, complexation, and reactions. Once instantiated as object flagella chromosome, these configuration data can be accessed and passed into the expression processes.
flagella chromosome.sequence contains the entirety of the E. coli genome's nucleotide sequence, loaded from the FASTA file: Escherichia coli str. K-12 substr. MG1655 genome assembly ASM584v2. flagella chromosome.genes contains the transcription unit structure of all flagellar genes, which were curated from EcoCyc. [5]. The following list shows all included operons followed by a list of their cluster of genes.

Flagella Motor
The flagella motor process is a combination of several published models. It models individual flagella as sub-compartments with rotational states and controls the number of flagellar sub-compartments based on the counts of flagellar complexes, which can be controlled by other processes, such as Translation. The total motile activity-a "run" or "tumble"-is a function of the total flagellar activity. The concentration of phosphorylated CheY, [CheY P ], is found with a steady-state solution of a differential equation from [7; 13]. The equation includes the activity of the chemoreceptor cluster P on from ReceptorCluster, the rates of phosphate transfer from CheA P to CheY k Y = 100µM −1 s −1 , CheZ's dephosphorylation rate of CheY P k Z = 30.0/[CheZ]s −1 , and rate constant γ = 0.1: The rotational states of each flagellum are modeled as a stochastic bistable system [10]. The transition rates between CCW and CW are functions of [CheY P ], and arise from thermal fluctuations that push the flagella over free energy barriers between their conformational states. The following equation finds the free energy barrier, ∆G, with parameters for the free energy barrier from CCW→CW g 0 = 40k B T and the free energy barrier for CW→CCW g 1 = 40k B T ; K D = 3.06µM is the binding constant of CheY P to FliM at the base of the flagellar motor: The switching rates of individual flagella, k CW →CCW and k CCW →CW , are given by the following, with ω as the characteristic motor switch time: With all of the individual flagella in a given rotational state, the veto model of motor activity determines the overall motile state of the cell [9]. Any flagella rotating CW are enough for a "tumble" state; if all are CCW, then the cell is in a "run". The generated thrust is a logarithmic function of the number of flagella and PMF. A run generates this thrust at an angle of 0; a tumble generates it at a random angle. In this paper, the environmental Multibody process reads these thrusts and angles and applies them at the back end of each corresponding agent's body.

Membrane Potential
This process is used to calculate the proton-motive force from internal and external concentrations of ions. The proton-motive force, ∆p, is comprised of two terms, ∆Ψ and ∆pH: The Goldman equation gives membrane potential, ∆Ψ: ] out with positively charged cations C, negatively charged anions A, and these ions' permeabilities P . Constant R is the ideal gas constant, T is temperature in Kelvin, and F is Faraday's constant.
The transmembrane pH difference, ∆pH, is given by −2.3 kT/e, with Boltzmann constant k, temperature T , and proton charge e. This paper assumed a constant ∆pH = −50 based on measurements for cells grown at a pH of 7.

Membrane Potential name
MembranePotential module path chemotaxis.processes.membrane_potential repository vivarium-chemotaxis config function default

Chemoreceptor Cluster
The model of chemoreceptor clusters comes from [3]. This Monod-Wyman-Changeux model includes receptor homodimers, r, with parameters for Tar (r = T ar) and Tsr (r = T sr). The total free energy is the sum of the individual free-energy difference by their relative counts: F = n T ar f T ar + n T sr f T sr . This determines the rate of receptor activation, P on : Receptor adaption comes from the changing methylation levels m based on the concentration of methylating enzyme CheR and demethylating enzyme CheB, cluster activity P on , methylation and demethylation rates k meth and k demeth , and adaptation rate k adapt : Parameters from [3] were used: n T ar = 6, n T sr = 12, K of f T ar = 0.02, K on T ar = 0.5, K of f T sr = 100.0, K on T sr = 10e6, k meth = 0.0625, k demeth = 0.0714.

Composites
The primary focus of the main text was the composition of the prior listed processes into compartments. Since much emphasis has been placed on that methodology, this supplementary section on composites is primarily to provide information on where to find the composite objects in order to simulate them. The topologies of each composite are accessible online at the provided module path. The following example of a topology for Lattice demonstrates the general form-an embedded python dictionary with keys at the top level for each process name, with subdictionaries that connect their ports to declared stores. Each process-in this case, multi-body and diffusionmaps its ports to a store using a path relative to the compartment with a python tuple. If the store is at the same compartment level as the process, the path is just ('store name',); to go up one level, a boundary store can use ('..', 'store name',), or to go down one level, ('path to', 'store name').

Experiments
All experiments described and plotted in this paper are available in the vivarium-chemotaxis github repository, in the file: chemotaxis/experiments/paper_experiments.py. There is a well-commented function for every figure shown in the paper, with in-line comments that outline how the processes, composites, and experiments are configured. To run these from the command line within the repository: $ python chemotaxis / experiments / paper_experiments . py [ exp_id ] exp_id corresponds to a figure number from the paper. These figure numbers are listed here, along with the processes/composites they use: • 3b -GrowthDivision and Lattice