Measuring Cellular Biomass Composition for Computational Biology Applications

Computational representations of metabolism are increasingly common in medical, environmental, and bioprocess applications. Cellular growth is often an important output of computational biology analyses, and therefore, accurate measurement of biomass constituents is critical for relevant model predictions. There is a distinct lack of detailed macromolecular measurement protocols, including comparisons to alternative assays and methodologies, as well as tools to convert the experimental data into biochemical reactions for computational biology applications. Herein is compiled a concise literature review regarding methods for five major cellular macromolecules (carbohydrate, DNA, lipid, protein, and RNA) with a step-by-step protocol for a select method provided for each macromolecule. Additionally, each method was tested on three different bacterial species, and recommendations for troubleshooting and testing new species are given. The macromolecular composition measurements were used to construct biomass synthesis reactions with appropriate quality control metrics such as elemental balancing for common computational biology methods, including flux balance analysis and elementary flux mode analysis. Finally, it was demonstrated that biomass composition can substantially affect fundamental model predictions. The effects of biomass composition on in silico predictions were quantified here for biomass yield on electron donor, biomass yield on electron acceptor, biomass yield on nitrogen, and biomass degree of reduction, as well as the calculation of growth associated maintenance energy; these parameters varied up to 7%, 70%, 35%, 12%, and 40%, respectively, between the reference biomass composition and ten test biomass compositions. 
The current work furthers the computational biology community by reviewing literature regarding a variety of common analytical measurements, developing detailed procedures, testing the methods in the laboratory, and applying the results to metabolic models, all in one publicly available resource.


Introduction
The in silico study of metabolism has largely transitioned from a specialty discipline to a mainstream biological approach due to improvements in software usability, increases in computational power, and the accumulation of omics databases. Cellular growth is an essential component of many of these computational biology studies [1][2][3]. Understanding the foundation of growth from the level of mass and energy fluxes remains critical for interpretation and integration of in silico metabolic models. Compiling a literature review in conjunction with laboratory-tested protocols with demonstrated application to metabolic models, all within a single source, serves as a useful resource for the computational biology community that should facilitate model building transparency and reproducibility.

Culture Conditions
E. coli cultures were grown at 37 °C with shaking at 150 rpm. Inoculum cultures were prepared in 8 mL of M9 + 10 g/L glucose in disposable culture tubes, inoculated with multiple isolated colonies from an agar plate streaked from a 20% glycerol freezer stock, and grown to OD600 < 0.6 (exponential phase). Cells were then centrifuged at 4000 rpm for 10 min and re-suspended to OD600 ~0.05 in 50 mL of fresh M9 + 1 g/L glucose in 250-mL baffled shake flasks. Cultures were grown to OD600 ~0.6 (exponential phase) and then harvested for analysis (collected in chilled 50-mL polypropylene centrifuge tubes on ice followed by centrifugation).
Synechococcus 7002 cultures were grown at 38 °C without shaking under continuous light. Inoculum cultures were prepared in 25 mL of A+ media in 250-mL non-baffled shake flasks, inoculated with multiple isolated colonies from an agar plate (transferred monthly for propagation), and grown to OD730 < 0.5. Cells were then centrifuged at 4000 rpm for 10 min and re-suspended in 25 or 50 mL of fresh A+ media to an OD730 ~0.1. Cultures were grown to OD730 0.4-0.5 and harvested for analysis.
A. acidocaldarius cultures were grown at 60 °C with shaking at 200 rpm. Inoculum cultures were prepared in 50 mL of BAM + 5 g/L glucose in 250-mL baffled shake flasks, inoculated with multiple isolated colonies from an agar plate streaked from a 20% glycerol freezer stock, and grown to OD600 < 0.6. Cells were then centrifuged at 4000 rpm for 10 min and re-suspended to OD600 ~0.1 in 50 mL of fresh BAM + 5 g/L glucose. Cultures were grown to OD600 ~0.6 and then harvested for analysis.

Dry Weight Determination
Optical density was correlated to biomass for each organism to determine the amount of dry weight used for macromolecular analyses. Because optical density can fluctuate at high cell concentrations due to shading effects, samples were diluted to an optical density reading below 0.3 to remain within the linear range of the spectrophotometer. The biomass to OD600 correlation for E. coli of 0.5 g/L cell dry weight per unit OD600 was obtained from Folsom, Parker, and Carlson [25], which used the same strain and was performed in the same laboratory using the methods below.
The biomass to OD730 correlation for Synechococcus 7002 was determined from biomass combined from 50-mL shake flask cultures. Cells were harvested by centrifugation (4000 rpm, 20 min, 4 °C), re-suspended in A+ media and centrifuged again, and a series of dilutions was prepared. Three milliliters of each dilution were aliquoted into pre-dried, pre-weighed aluminum pans, dried at 80 °C for 24 h, and weighed on a microbalance with accuracy to 0.001 mg (Mettler Toledo MT5). Pans were dried and weighed again to confirm stability. The correlation curve is provided in Appendix A (Figure A1a).
The biomass to OD600 correlation for A. acidocaldarius was determined from biomass grown in a batch fermentor aerated at 1 vessel volume per minute and agitated at 600 rpm. Cells were harvested by centrifugation (6000 rpm, 5 min, 4 °C), re-suspended in water and centrifuged again, and a series of dilutions was prepared in pre-weighed 50-mL polypropylene centrifuge tubes, which had been dried at 100 °C for one week before pre-weighing. Tubes were dried at 100 °C for one week and weighed on an analytical balance with accuracy to 0.1 mg. Tubes were dried and weighed again to confirm stability. The correlation curve is provided in Appendix A (Figure A1b).
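As a minimal sketch of how an optical density correlation translates into a sample volume, the following Python snippet estimates the culture volume that contains a target dry weight. The function names and example readings are hypothetical; only the 0.5 g/L per unit OD600 factor for E. coli comes from the text, and diluted readings are assumed to scale linearly with the dilution factor.

```python
def undiluted_od(measured_od, dilution_factor):
    """Back-calculate the culture OD from a reading taken on a diluted sample."""
    if measured_od < 0 or dilution_factor < 1:
        raise ValueError("OD must be non-negative and dilution factor >= 1")
    return measured_od * dilution_factor


def culture_volume_for_dry_weight(target_mg, od, g_per_l_per_od):
    """Return the culture volume (mL) expected to contain target_mg dry biomass."""
    if od <= 0 or g_per_l_per_od <= 0:
        raise ValueError("OD and correlation factor must be positive")
    biomass_mg_per_ml = od * g_per_l_per_od  # g/L is numerically equal to mg/mL
    return target_mg / biomass_mg_per_ml


# Hypothetical example: 1 mg dry weight from an E. coli culture at OD600 0.6,
# using the 0.5 g/L per unit OD600 correlation quoted in the text.
volume_ml = culture_volume_for_dry_weight(target_mg=1.0, od=0.6, g_per_l_per_od=0.5)
print(f"Harvest about {volume_ml:.2f} mL of culture")
```

For the other two organisms, the factor would be read from the slope of the corresponding correlation curve in Appendix A.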

Modeling Methods
A metabolic network model for A. acidocaldarius was constructed in CellNetAnalyzer [26,27] from the annotated genome [28] with the aid of MetaCyc, KEGG, BRENDA, and NCBI [29][30][31] databases. Reversible exchange reactions were defined for protons and water. Irreversible exchange reactions were defined to permit ammonium, sulfate, oxygen, and glucose or xylose uptake and carbon dioxide evolution, as well as secretion of possible byproducts, including acetate, lactate, ethanol, and formate. Macromolecular synthesis reactions were defined for nucleic acids, glycogen, lipid, and protein. Synthesis reactions utilized two phosphate bonds per nucleic acid monomer, one phosphate bond per glycogen monomer, and four phosphate bonds per protein monomer [32]. Nucleotide distributions were set based on percent GC content of the genome for DNA and nucleotide sequence of the rRNA genes for RNA. Fatty acid distribution was assigned based on literature values [33,34]. The amino acid distribution was set using the experimentally measured values in the current study. All reactions were balanced for elements, charge, and electrons. Thermodynamic considerations were built into the model via reaction reversibilities based on data from BRENDA [31]. Model simulations were performed with elementary flux mode analysis. Flux vectors v satisfying the stoichiometric matrix S at steady state (Sv = 0) subject to conservation of mass, specified irreversibilities, and indecomposability constraints were computed, resulting in the collection of minimal pathways through the network, called elementary flux modes (EFMs) [35]. EFMs were enumerated using EFMtool [36]. Analysis of resulting EFMs (e.g., biomass yield) was performed with MATLAB. Maintenance energy was fit to experimental glucose and oxygen yield data for A. acidocaldarius obtained from [37]. 
Both growth associated (dominant in fast-growing environmental conditions) and non-growth associated (dominant in slow-growing environmental conditions) maintenance terms were determined. The metabolic model with supporting details, CellNetAnalyzer metabolite and reaction input, an SBML file, and maintenance calculations can be found in the Supplementary Materials (Files S1, S2, and S3).
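The elemental balancing mentioned above can be illustrated with a short check. This is a sketch under stated assumptions, not the authors' CellNetAnalyzer workflow: the dictionary layout, the function name, and the example reaction (complete oxidation of glucose) are chosen here for illustration. Charge could be tracked the same way by adding it as an extra key in each formula.

```python
# Elemental formulas for the metabolites in the illustrative reaction.
FORMULAS = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "O2": {"O": 2},
    "CO2": {"C": 1, "O": 2},
    "H2O": {"H": 2, "O": 1},
}


def element_imbalance(stoichiometry, formulas):
    """Return the net count of each unbalanced element in a reaction.

    Substrates carry negative coefficients and products positive ones.
    An empty dictionary means the reaction is elementally balanced.
    """
    totals = {}
    for metabolite, coefficient in stoichiometry.items():
        for element, count in formulas[metabolite].items():
            totals[element] = totals.get(element, 0) + coefficient * count
    return {el: net for el, net in totals.items() if abs(net) > 1e-9}


# glucose + 6 O2 -> 6 CO2 + 6 H2O
balanced = {"glucose": -1, "O2": -6, "CO2": 6, "H2O": 6}
# Same reaction with one water missing, to show what a failure looks like.
unbalanced = {"glucose": -1, "O2": -6, "CO2": 6, "H2O": 5}

print(element_imbalance(balanced, FORMULAS))
print(element_imbalance(unbalanced, FORMULAS))
```

The same check applies to a biomass synthesis reaction once each macromolecule is assigned an elemental formula.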

Literature Review
Carbohydrates are common cellular energy storage molecules and constituents of cell walls. HPLC methods can be used to separate and quantify specific sugars [38,39]; however, methods for quantifying total carbohydrates were the focus of the current work. Chaplin [40] reviewed many colorimetric methods for carbohydrate quantification and detailed the advantages and disadvantages of each. The phenol sulfuric acid method [41,42] is widely used, and the L-cysteine and anthrone methods [40,43] are also frequently found in the literature. An issue with many methods is interference from other cellular constituents. For example, protein interferes with measured absorbance in the phenol sulfuric acid assay [40]. In the L-cysteine assay, pentose, heptose, and deoxy sugars contribute to absorbance, and absorbance stability varies among different carbohydrates. Pentoses also contribute to signal in the anthrone assay, but the absorbance fades rapidly and presents minimal interference. Different hexoses may also produce differential responses in the anthrone assay; for example, mannose produces 55% of the measured absorbance intensity of glucose [43]. Minimizing interference from pentoses is a key consideration when selecting assays, to avoid counting nucleotide sugars twice, once in the nucleic acid assay and once in the carbohydrate assay.
Glycogen is the most common form of carbohydrate storage for bacteria [44]. Glycogen content can indicate cellular responses to changing nutrient conditions; for instance, E. coli and Synechococcus 7002 have both been found to increase glycogen storage during nitrogen limitation [45,46]. Glycogen can be precipitated from cells with KOH, but alkalinity causes some degradation of glycogen. An alternative method using sodium sulfate to adsorb and co-precipitate glycogen has been developed for mosquitoes [47] and adapted to bacterial samples [48] and was selected for the current study. The anthrone assay was selected for quantification of hexoses due to minimal pentose interference. The method employs sulfuric acid to hydrolyze polysaccharides to glucose monomers. In the presence of anthrone, glucose monomers are converted to hydroxyaldehydes and dehydrated to hydroxymethylfurfurals [49], which form blue-green colored complexes with anthrone. The current study tested the hexose quantification assay on cell pellets, glycogen extracts, and the residue remaining after the glycogen extraction process. The sum of the glycogen extract and residue measurements was compared with the total cell pellet measurement to verify recovery of all cellular carbohydrates. Differentiation between glycogen and other cellular carbohydrates, such as cell wall sugars, can provide useful parameters for metabolic models.

• Anthrone reagent: (per reaction) mix 10 mg anthrone and 250 µL fresh absolute ethanol (anthrone will partially dissolve), add 75% sulfuric acid to a final volume of 5 mL, and stir until anthrone is completely dissolved [18]. Prepare fresh daily (within 24 h of use) and store at 4 °C.
(2) Seal tube with parafilm to prevent cap from popping open and heat for 10 min at 70 °C (VWR analog heat block).
(3) Add 1 mL methanol, and vortex in two 10-s rounds to co-precipitate sodium sulfate and glycogen.
(4) Centrifuge for 15 s at 10,000 rpm to pellet the precipitate (Eppendorf 5415D microcentrifuge) and decant the supernatant.
(5) Wash the precipitate with 1 mL methanol (add methanol, vortex, centrifuge, and decant).
(6) Re-suspend the pellet in 1 mL water, transfer to a clean glass test tube, and place on ice to chill.
(7) Add 5 mL ice-cold anthrone reagent (mixing is unnecessary).
(8) Chill on ice for 5 min, vortex briefly to homogenize the solution, and incubate in a boiling water bath for 10 min.
Notes: Use a neutral reaction (containing no glucose) as the blank. Perform a standard curve with each assay, and treat standards identically to samples with anthrone reagent. Different sources report slightly varying absorbance wavelengths and water bath incubation times; the most widely supported parameters were implemented in the current work.
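The standard curve recommended in the notes can be evaluated with an ordinary least-squares line. The sketch below is illustrative only: the function names are hypothetical, the absorbance values are invented to be exactly linear, and the 10-250 µg/mL limits are the linear range reported for this assay in the test results.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept, low, high):
    """Invert the standard curve and reject values outside its range."""
    concentration = (absorbance - intercept) / slope
    if not low <= concentration <= high:
        raise ValueError("reading falls outside the standard curve")
    return concentration


# Hypothetical blank-corrected glucose standards (ug/mL) and absorbances.
standards_ug_per_ml = [10, 50, 100, 150, 200, 250]
absorbances = [0.04, 0.20, 0.40, 0.60, 0.80, 1.00]

slope, intercept = fit_line(standards_ug_per_ml, absorbances)
sample_ug_per_ml = concentration_from_absorbance(0.50, slope, intercept, 10, 250)
print(f"Sample: {sample_ug_per_ml:.1f} ug/mL glucose equivalents")
```

Raising an error for out-of-range readings, rather than extrapolating, mirrors the practice of adjusting the biomass amount until all measurements fall within the standard curve.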

Quantification of Hexoses Excluding Glycogen
Collect the methanol decanted after the precipitation and wash steps (Steps 4-5) in an aluminum pan and evaporate in a fume hood. The methanol contains non-glycogen hexoses, which did not adsorb to and precipitate with sodium sulfate. Re-suspend in 1 mL water, transfer to a clean glass test tube, and place on ice to chill (Step 6); then perform the anthrone reaction as in Steps 7-9.

Quantification of Total Carbohydrate
Skip the glycogen precipitation and wash steps (Steps 1-5). Re-suspend the cell pellet in 1 mL water, transfer to a clean glass test tube, and place on ice to chill (Step 6). Then, perform the anthrone reaction as in Steps 7-9.

Test Results
Assay linearity was observed within 10-250 µg/mL glucose (Figure 1a). The sum of the glycogen extract and residue measurements was equivalent to the total carbohydrate measurement for E. coli, Synechococcus 7002, and A. acidocaldarius (Figure 1b, p > 0.05), indicating that glycogen can be accurately distinguished from other cellular carbohydrates. The glycogen mass percentage obtained for E. coli is similar to previously published values measured under carbon limitation (3.6%) [45] and in balanced growth (2.5%) [32]. Previous measurements for Synechococcus 7002 estimated 10-12% of dry biomass was carbohydrates under carbon- and light-limited chemostat conditions and 61% of dry biomass was carbohydrates under nitrogen-limited conditions [46]. The 17% of dry biomass value measured here falls close to the carbon- or light-limited conditions. No literature comparison was available for A. acidocaldarius. When testing the anthrone assay on glycogen extract, residue, and total carbohydrate samples, it was found that not all three measurements were captured within the standard curve for the entire set of samples; the residue represented a relatively small proportion of total cellular carbohydrates. Thus, different amounts of biomass were tested for each species to identify a quantity that would place all measurements within the standard curve. One milligram dry weight was found appropriate for E. coli and A. acidocaldarius, and 0.5 mg dry weight was used for Synechococcus 7002 (i.e., organisms with higher carbohydrate content require less biomass for the assay).
Additionally, the procedure outlined in Del Don et al. [48] prescribes washing the glycogen pellet with methanol until the pellet is white. However, in the current work, it was observed that glycogen pellets from cyanobacterial samples remained slightly blue after three successive methanol washes, most likely due to photosynthetic pigments. The anthrone assay was tested on glycogen extracts from Synechococcus 7002 samples after one, two, or three washes. The carbohydrate content was not significantly different among the three treatments (p > 0.05 for all pairwise t-tests), indicating that a single wash is sufficient (data not shown).
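The recovery comparison described in these test results (glycogen extract plus residue versus total carbohydrate) reduces to a simple closure calculation. The following sketch uses hypothetical function names and invented numbers; it does not replace the statistical comparison reported in Figure 1b.

```python
def carbohydrate_closure(glycogen_ug, residue_ug, total_ug):
    """Return (glycogen + residue) as a percentage of the total measurement."""
    if total_ug <= 0:
        raise ValueError("total carbohydrate must be positive")
    return 100.0 * (glycogen_ug + residue_ug) / total_ug


def mass_percent(analyte_ug, dry_weight_mg):
    """Express an analyte mass (ug) as a percentage of dry biomass (mg)."""
    if dry_weight_mg <= 0:
        raise ValueError("dry weight must be positive")
    return 100.0 * analyte_ug / (dry_weight_mg * 1000.0)


# Hypothetical glucose-equivalent masses from 1 mg dry weight samples.
closure = carbohydrate_closure(glycogen_ug=30.0, residue_ug=10.0, total_ug=40.0)
glycogen_percent = mass_percent(analyte_ug=30.0, dry_weight_mg=1.0)
print(f"Closure: {closure:.0f}%, glycogen: {glycogen_percent:.1f}% of dry weight")
```

A closure far from 100% would suggest that carbohydrate is being lost or double counted between the fractions.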

Literature Review
DNA represents a small but important component of cellular biomass, and its content changes with specific growth rate. For example, slower-growing cells contain more DNA on a cell mass basis than fast-growing cells [4,17]. De Mey et al. [50] provided a summary and comparison of methods for quantifying DNA and RNA. De Mey et al. tested different absorbance, colorimetric, and fluorescence-based assays on purified DNA solutions and reported accuracy and sensitivity for bacteria of differing GC contents [50]. UV absorbance is precise but requires a pure sample (for example, from kit extraction) to minimize interference from RNA and protein. Orcinol can be used to quantify DNA colorimetrically, but has differing sensitivities for different nucleic acids and is not as precise for mixtures of DNA and RNA. The diphenylamine assay [51] is a commonly used method but has lower sensitivity for low GC content and is not as precise as other mentioned methods [50]. The diphenylamine assay also seems to be sensitive to the purity and preparation of the reagents [18]. Fluorescence methods for DNA detection are becoming popular [50]. Hoechst 33258 is a DNA-binding dye (it binds the minor groove rather than intercalating) and is reported to be biased toward AT content [50]. However, cell lysate can be used due to low affinity of the dye for protein and RNA. Thiazole orange is another dye with good precision, but it requires a pure sample and is biased toward GC content [50]. Additional fluorescent dyes that require a pure sample for good quantification include PicoGreen and RiboGreen [52].
Considerations when selecting a DNA quantification method include the purity of the sample, interfering substances, and bias toward nucleotide content. Based on these considerations, the Hoechst fluorescent assay was selected for the current study. It is more quantitative than extraction kits and is safer and more precise than the classic diphenylamine method. It is recognized that AT nucleotide bias and the DNA standard used will influence the resulting estimation. Downs and Wilfinger [53] developed and validated an alkali lysis procedure with subsequent Hoechst quantification of DNA using rat pituitary cells. Downs and Wilfinger showed equivalent accuracy but greater precision than the diphenylamine assay [53]. Note: all solutions were prepared using nuclease-free water.

Assay
(1) Re-suspend cell pellet to 50 µL total volume in nuclease-free water in a 2-mL Eppendorf tube.
Notes: Perform a standard curve with each assay. Perform three reaction wells of each sample or standard for technical replicates.

Test Results
Downs and Wilfinger [53] reported using 0.1 µg/mL Hoechst. However, saturation of calf thymus standard DNA was observed with 0.1 µg/mL Hoechst in the current work (Figure 2a). More recent protocols [55] have suggested that 1 µg/mL Hoechst dye may be used to detect higher quantities of DNA (up to 10 µg) but may not be as sensitive for lower DNA quantities. Based on standard curves using 0.1 µg/mL and 1 µg/mL Hoechst, 1 µg/mL was selected for the current work due to its improved detection range (Figure 2a). The lowest standard concentration used in the assay was 0.25 µg/mL. Hoechst fluorescent response was determined to be linear up to 40 µg/mL DNA; however, a standard curve up to 10 µg/mL was sufficient to capture sample measurements. Additionally, calf thymus DNA standards were subjected to the lysis procedure to ensure that lysis does not cause loss of DNA. Standard curves showed equivalent fluorescent response regardless of whether the lysis procedure was performed, indicating that the lysis step did not influence DNA recovery (Figure A2a, Appendix A).
After initial testing, standards were not subjected to the lysis steps along with samples but were subjected only to the Hoechst treatment. Different quantities of biomass were also tested to ensure that DNA recovery was in the linear range of the assay (Figure 2b). DNA recovery from E. coli plateaued between 0.5 and 1 mg biomass, thus 0.4 mg was selected as an appropriate amount of starting material for E. coli and A. acidocaldarius. DNA recovery from Synechococcus 7002 plateaued around 1.2 mg, and 0.8-1 mg biomass was used for starting material. Organisms with higher DNA content (E. coli, A. acidocaldarius) required less biomass for the assay. Incubation time in the lysis solution also influenced DNA recovery, with longer incubation times resulting in increased DNA recovery (Figure 2c). Downs and Wilfinger [53] developed their assay on mammalian cells, which are more easily lysed in contrast to bacterial cells. Incubation of E. coli samples over a time series of 10, 30, 60, 120, and 180 min resulted in about twice the amount of DNA recovered. A similar result was observed for A. acidocaldarius. DNA recovery for Synechococcus 7002 samples increased nearly four-fold with longer incubation times. These results highlight the differing sensitivities of different cell types to assay conditions: E. coli and Synechococcus 7002 are both Gram-negative bacteria, but cyanobacteria are known to have thicker cell walls with more peptidoglycan [56]. A. acidocaldarius is a Gram-positive bacterium but is adapted to acidophilic environments and may be more sensitive to alkaline conditions. A lysis period of 180 min was selected as an adequate incubation time.
Impacts of sample treatment were also investigated with E. coli, including freezing of the sample and washing of the cell pellet prior to treatment. Fresh and frozen samples from the same culture were assayed and not found to be significantly different (Figure A2b). Downs and Wilfinger [53] reported washing samples with a cell wash solution (150 mM NaCl, 15 mM citrate, 3 mM EDTA, pH 7.0 with HCl) before performing the lysis step. In the current work, washing the sample with cell wash solution resulted in lower DNA recovery as compared to not washing (Figure A2b) and may indicate cell loss or lysis during the washing process. Extracellular DNA was not expected to comprise a significant proportion of total DNA in the planktonic, exponentially growing samples; however, this may not be the case for all samples, such as biofilm or natural environmental samples.
The DNA percentage of dry biomass obtained for E. coli is lower than the value of 3.1% reported in Neidhardt et al. [32], which could be due to differences in methods used or E. coli strains (K-12 vs. B/r), while the percentage obtained for Synechococcus 7002 is similar to the results reported in Vu et al. [46] measured with the diphenylamine method. Differences in growth rate or growth phase may contribute to differences in measured percentages. No literature comparison was available for A. acidocaldarius.

Literature Review
Lipids are essential to cellular membranes and are carbon and energy storage molecules. Measurement of total lipid is commonly performed gravimetrically, with an absorbance-based sulfo-phospho-vanillin assay [57], or using gas chromatography. A common gravimetric method is the Bligh and Dyer chloroform-methanol extraction [58]. Other solvents have been used to mitigate the hazards of chloroform, and a variety of modifications to the original Bligh and Dyer procedure exist [16]. There is debate regarding the performance of these different methods, and methods may vary depending on downstream applications. Much research has been done on lipid extraction from biofuel-producing organisms such as algae, and recent testing and comparison of methods have shown microwave extraction with GC analysis to provide optimal results [16]. However, cyanobacteria synthesize predominantly diacylglycerol lipids as opposed to the triacylglycerol lipids of algae [59]. Cyanobacterial lipids are also located throughout the cytoplasm in the thylakoid membranes rather than in granular pockets as in algae. Many lipids are also associated with protein and photosynthetic components through hydrogen bonding. Different methods have been tested for the cyanobacterium Synechocystis sp. PCC 6803, and the traditional Bligh and Dyer and Folch methods were found to produce optimal results [59]. Different cell disruption methods have also been tested for Synechocystis sp. PCC 6803, and microwave extraction and autoclaving were found to be the most efficient disruption methods [60]. The traditional Bligh and Dyer method was selected for the current study for analysis of total cell lipid [58].

• Cell pellet (10 mg dry biomass, fresh or frozen, washed with carbon-free media).
• Water.

Assay
(1) Re-suspend cell pellet to 0.6 mL total volume with water in a 15-mL polypropylene centrifuge tube.
(2) Sequentially add chloroform (0.75 mL) and methanol (1.5 mL) (vortexing between additions is not necessary).
Notes: Weights were measured with a Mettler Toledo MT5 microbalance with accuracy to 0.001 mg and were recorded as an average of three measurements. A blank reaction containing 0.6 mL water was also performed as a control.

Test Results
Typical protocols for this method recommend a minimum of 30 mg biomass [16]. However, 30 mg biomass requires a large culture volume. Smaller biomass quantities were tested, and the assay was observed to produce a linear response within 10-35 mg biomass (Figure 3a). Thus, 10 mg starting material was used in the current work.
An additional concern for photosynthetic organisms when selecting a method for lipid quantification is interference from chlorophyll, which is also extracted by the solvents. Previous work [61] suggested that DMSO will remove chlorophyll prior to lipid extraction. DMSO was tested on cyanobacterial samples in the current work by vortexing the cell pellet in 10 mL DMSO, subsequently washing with water (re-suspending, centrifuging, decanting) until the supernatant was colorless, and then following the chloroform-methanol extraction procedure. However, DMSO treatment appeared to remove all lipid signal, resulting in no mass recovered (data not shown); thus, it was recognized that results of this method for cyanobacterial samples will encompass chlorophyll and photosynthetic pigments as well as lipid. Autoclaving samples was also tested as an alternative method of cell disruption for all three species but was not found to significantly improve lipid recovery (Figure A3, Appendix A). Lipid percentages of dry biomass for all three species are shown in Figure 3b. The lipid percentage obtained for E. coli (6.7%) is lower than the value of 9.1% reported by Neidhardt et al. [32], which may be due to differences in methods or strains. The percentage measured for Synechococcus 7002 (9.0%) is comparable with the lipid and chlorophyll values previously measured by Vu et al. [46], who also used the Bligh and Dyer method (8.8%, 5.6%, and 3.8% for carbon-, light-, and nitrogen-limited conditions, respectively). The percentage obtained for A. acidocaldarius (3.4%) is similar to a previously published report of 3.6% [62], which used a 2:1 chloroform/methanol extraction method.
Differences between the current measurements and previously reported values may reflect differences in culturing conditions or the influence of specific procedural details.
Polypropylene centrifuge tubes were used for safety during centrifugation rather than glass tubes, but it was noted that polypropylene is not completely chemically resistant to chloroform and may cause leaching of compounds from the polymer into chloroform. This error was accounted for by performing a blank reaction (0.6 mL water). The mass of the blank was then subtracted from the mass of the biological sample to obtain the mass of lipid. Additionally, removal of the lower chloroform phase can be difficult to perform reproducibly. Glass Pasteur pipettes were used initially; however, micropipettes with 200-µL tips provided more control over phase removal and yielded the most reproducible results.
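The blank correction described above can be written out explicitly. In this sketch, the function name and all weights are hypothetical; the only elements taken from the text are the averaging of three weighings, the subtraction of the water blank, and the 10 mg of starting biomass.

```python
from statistics import mean


def lipid_mass_percent(sample_residue_mg, blank_residue_mg, dry_weight_mg):
    """Blank-corrected lipid content as a percentage of dry biomass.

    Each residue argument is a list of repeated weighings (mg) of the
    dried chloroform phase, already net of the pan tare weight.
    """
    if dry_weight_mg <= 0:
        raise ValueError("dry weight must be positive")
    lipid_mg = mean(sample_residue_mg) - mean(blank_residue_mg)
    if lipid_mg < 0:
        raise ValueError("blank exceeds sample; check the weighings")
    return 100.0 * lipid_mg / dry_weight_mg


# Hypothetical triplicate weighings for one sample and its water blank.
sample_weighings = [0.60, 0.61, 0.62]
blank_weighings = [0.10, 0.11, 0.12]
percent = lipid_mass_percent(sample_weighings, blank_weighings, dry_weight_mg=10.0)
print(f"Lipid: {percent:.1f}% of dry weight")
```

For cyanobacterial samples, the result would represent lipid plus co-extracted pigments, as noted in the test results.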

Literature Review
Protein is typically the largest fraction of bacterial biomass. Many methods have been reported for determining protein quantity, including UV absorption spectroscopy and dye-based assays such as Bradford, Lowry, BCA, and others, for which Noble et al. [63] and Noble and Bailey [64] provided thorough discussions. UV absorption depends on tyrosine and tryptophan residues and the molar extinction coefficient of the protein under examination, and it requires a highly purified sample. Dye-based assays are influenced by different amino acid distributions and are subject to different interfering compounds as well as variability between proteins. Bovine serum albumin is a commonly used protein standard, but its amino acid sequence may not be representative of the total cellular protein. Amino acid analysis, or hydrolysis of cellular protein followed by derivatization and identification of individual amino acids via HPLC, is an alternative to these methods and is often considered the gold standard for protein analysis [63,64].
Amino acid analysis was selected for the current study due to improved accuracy and less bias as opposed to UV absorbance or dye-based methods. Some amino acids, such as cysteine and tryptophan, degrade during hydrolysis; special hydrolysis conditions may be used to retain them [65], or their proportions may be estimated based on genome codon distribution. Amino acid analysis provides the experimental amino acid distribution in addition to total protein quantification, both of which serve as important parameters for metabolic modeling.
Notes: PVDF was selected as the filter membrane material due to its low protein binding capacity; materials with high protein binding such as nylon may affect amino acid recovery. A fluorescence detector or diode array detector can be used for detection. The current work employed an Agilent 1100 HPLC system equipped with autosampler and fluorescence detector (for A. acidocaldarius samples) or diode array detector (for E. coli and Synechococcus 7002 samples). A diode array detector was less sensitive than a fluorescence detector (limit of detection ~100 µM vs. ~2 µM). o-Phthalaldehyde (OPA) reagent with 3-mercaptopropionic acid as a stabilizing agent was used for detection of primary amino acids, and 9-fluorenylmethyl chloroformate (FMOC) reagent was used for secondary amino acids. OPA and FMOC reagents were replaced daily in amber vials and were used within 10 days upon opening an ampule. The flow rate was modified from 2 mL/min [66] to 1 mL/min to permit increased resolution of peaks (see gradient settings in Table 2). The injector program followed the steps described in Henderson et al. [66] but did not make use of the optional acetonitrile needle rinse. The integration parameters for collecting the data were set to a slope sensitivity of 1, peak width of 0.04, area reject of 1, and height reject of 0.4, with shoulders off. Amino acids were identified manually in the calibration table, and undesired peaks were discarded (derivatization byproduct peaks at the end of an injection).

Test Results
Upon preparing for HPLC analysis, the lyophilized material was re-suspended in 100 µL 0.1 M HCl per mg biomass hydrolyzed. Different dilutions of the re-suspension were measured to ensure adequate detection of both more abundant and less abundant amino acids. Peak identity was confirmed for each amino acid by testing individual solutions of each amino acid. A representative chromatogram is shown in Figure 4. An internal standard, α-aminobutyric acid, was used in samples and standards alike for peak area normalization across injections. Standard curves were constructed, resulting in linear regressions with fits of 0.99 or greater. The experimental amino acid distribution and total protein quantification for the three bacterial species are shown in Table 3. Since cysteine and tryptophan were degraded during hydrolysis and methionine was present in low quantities with high variability (likely oxidized during hydrolysis), the distribution of these three amino acids was calculated according to the percentage found in the protein-coding genes of the genome. Reasonable correlations were observed between the experimentally measured and genome-predicted distributions (Figure A4, Appendix A). Interestingly, leucine content was observed to be consistently over-predicted in the genome and under-measured in the laboratory among the three species tested, although no explanation has been linked to this observation in the literature.
Table 3. Amino acid distributions and protein quantification for E. coli, Synechococcus 7002, and A. acidocaldarius. Amino acid mass and mole percentages of total protein are reported as averages of three biological replicates; percent relative standard deviations are in parentheses. HPLC quantification was obtained with a diode array detector for E. coli and Synechococcus 7002 and a fluorescence detector for A. acidocaldarius. Cysteine, methionine, and tryptophan were factored in according to genome-based codon proportions. Asparagine and glutamine were converted to aspartate and glutamate, respectively, during derivatization with OPA. The final row includes the total protein mass percent of cell dry weight, averaged from three biological replicates with standard deviation in parentheses.
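The internal standard normalization and standard curve fitting described in the test results can be sketched in Python as follows. This is a minimal illustration rather than the processing used in the study: the function names are hypothetical, and all peak areas and concentrations are placeholder values, not measured data.

```python
def normalize_area(analyte_area, internal_std_area):
    """Divide an analyte peak area by the internal standard peak area
    from the same injection."""
    return analyte_area / internal_std_area


def fit_standard_curve(concentrations, responses):
    """Ordinary least-squares line through (concentration, response) points.

    Returns (slope, intercept, r_squared).
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(concentrations, responses))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(normalized_response, slope, intercept):
    """Invert the standard curve to get a concentration."""
    return (normalized_response - intercept) / slope


# Placeholder standards for one amino acid (illustrative values only).
std_conc = [10.0, 50.0, 100.0, 250.0]        # concentration units of choice
std_areas = [21.0, 99.0, 202.0, 498.0]       # analyte peak areas
istd_areas = [100.0, 98.0, 101.0, 99.0]      # internal standard peak areas

std_resp = [normalize_area(a, i) for a, i in zip(std_areas, istd_areas)]
slope, intercept, r2 = fit_standard_curve(std_conc, std_resp)

# Placeholder sample injection quantified against the fitted curve.
sample_conc = quantify(normalize_area(150.0, 100.0), slope, intercept)
print(f"R^2 = {r2:.4f}, sample concentration = {sample_conc:.1f}")
```

Normalizing each injection to its own internal standard peak before fitting keeps injection-to-injection variation out of the regression, which is the role the internal standard plays in the text.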

Literature Review
RNA is a major macromolecule class which contributes to ribosome assembly and cellular information processing. Methods used for quantifying RNA include UV absorbance, orcinol colorimetric reaction, and thiazole orange fluorescent dye [50]. UV absorbance is precise but requires a pure sample and is not feasible for a mixture of DNA and RNA. Orcinol is not as precise for RNA or for a mixture of DNA and RNA, and carbohydrates may also interfere. Thiazole orange has good precision, but fluorescence is biased toward GC content, and it is less sensitive for RNA than for DNA [50]. Additionally, kits are available for RNA extraction but focus predominantly on downstream applications such as PCR and RNAseq, and thus remain questionable as quantitative methods. Fluorescent dyes such as RiboGreen and PicoGreen have also been reported for quantifying extracted RNA [52] but are usually used in combination with kit extractions.
Major concerns in selecting an RNA quantification method include sample purity, accuracy, and bias of nucleotide content. Many studies have used the colorimetric orcinol reaction to quantify RNA after hot perchloric acid extraction. However, Benthin et al. [67] developed an alternative method with the bacterium Lactobacillus that utilizes alkali (KOH) lysis in combination with cold perchloric acid extraction, followed by UV absorbance. The method provided similar accuracy to the orcinol reaction but showed improved precision [67]. Benthin's KOH-UV method was selected for the current study as a more reliable and safe method, using cold rather than hot perchloric acid, and to eliminate interference from carbohydrates, which occurs in the orcinol reaction. The results can be quantified with UV absorbance using average nucleotide molar extinction coefficients, which eliminates the need to prepare a standard from a different source. This method has been used in metabolic modeling studies to quantify RNA percentage [68-70].
Notes: Quartz cuvettes are commonly used for measuring UV absorbance; however, disposable UV cuvettes can also be used (VWR 47727-024, rated to 220 nm and tested for chemical compatibility with concentrated hydrochloric acid). Linearity of the spectrophotometer within the range of sample absorbance should be confirmed by successively diluting the sample with 0.5 M HClO4 and confirming a linear fit to the resulting absorbance measurements. Absorbance at 280 nm can also be measured, and the A260/A280 ratio can be calculated to assess RNA purity.
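Quantification from UV absorbance with an average nucleotide extinction coefficient follows the Beer-Lambert law, which can be sketched as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the extinction coefficient, average nucleotide molar mass, and all sample values in the usage lines are placeholders that should be replaced with the values specified in the chosen protocol.

```python
def nucleotide_conc_mM(a260, ext_coeff_per_mM_cm, path_cm=1.0,
                       dilution_factor=1.0):
    """Beer-Lambert law: concentration = A / (epsilon * path length),
    corrected for any dilution made before the reading."""
    return a260 * dilution_factor / (ext_coeff_per_mM_cm * path_cm)


def rna_mass_percent(a260, extract_volume_mL, biomass_mg,
                     ext_coeff_per_mM_cm, nucleotide_mw_g_per_mol,
                     path_cm=1.0, dilution_factor=1.0):
    """RNA as a mass percentage of the dry biomass that was extracted."""
    conc_mM = nucleotide_conc_mM(a260, ext_coeff_per_mM_cm,
                                 path_cm, dilution_factor)
    # mmol/L * g/mol = mg/L; multiply by litres of extract to get mg RNA.
    rna_mg = conc_mM * nucleotide_mw_g_per_mol * extract_volume_mL / 1000.0
    return 100.0 * rna_mg / biomass_mg


def purity_ratio(a260, a280):
    """A260/A280 ratio used as a rough purity indicator."""
    return a260 / a280


# Placeholder inputs (not measured values and not protocol constants).
percent = rna_mass_percent(
    a260=0.5,
    extract_volume_mL=10.0,
    biomass_mg=2.0,
    ext_coeff_per_mM_cm=10.0,        # assumed placeholder, mM^-1 cm^-1
    nucleotide_mw_g_per_mol=340.0,   # assumed placeholder, g/mol
)
print(f"RNA = {percent:.1f}% of cell dry weight")
```

Keeping the extinction coefficient and molar mass as explicit arguments makes it clear which constants the result depends on when comparing values across studies.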

Test Results
Benthin et al. [67] recommended using a quantity of biomass corresponding to ~0.4 mg of RNA. Approximately 2 mg and 8 mg of biomass were used for E. coli and Synechococcus 7002, respectively, based on previous estimates of RNA content [32,46]. Correlation between biomass and RNA content was tested for Synechococcus 7002, and a linear response was observed within 2-8 mg biomass (Figure 5a). RNA mass percentages for all three species are shown in Figure 5b. The percentage obtained for E. coli is similar to the 20.5% value reported in Neidhardt et al. [32]. The percentage of dry biomass obtained for Synechococcus 7002 is higher than the 4.0% average value measured in Vu et al. [46] via the orcinol method but could reflect different growth states. No literature comparison was available for A. acidocaldarius.

Model Biomass Reaction
Experimentally measured biomass composition provides a species-relevant basis for representing cellular growth in computational models. The results of the macromolecular assays for E. coli, Synechococcus 7002, and A. acidocaldarius are summarized in Table 4. The mass percentages for the five assays do not necessarily sum to 100% of cell dry weight. The reduced mass recovery may be due to loss of biomass during centrifugation and transfer of material while performing the assays. Some bacteria may also possess other storage compounds that are not accounted for in these analyses, such as polyhydroxyalkanoates or polyphosphates. Ash weight typically accounts for 5-10% of cell dry weight [72], or perhaps even more for some organisms (e.g., 20-30% ash content has been measured in phytoplankton [73]). To adjust for losses during sample processing, measurements can be normalized to the total mass recovered such that the sum of biomass recovered from all measurements is 100% (Table 4). An in silico cellular growth reaction is a collection of macromolecular synthesis reactions scaled to account for biomass composition. The macromolecular synthesis reactions are constructed by accounting for the appropriate ratios of the monomers, polymerization energy requirements, and reaction byproducts. Macromolecular monomer distributions are either measured directly, such as the amino acid composition measured here, or can be estimated from appropriate omics datasets or the literature. DNA composition is typically estimated from GC content, and RNA composition may be estimated from rRNA-encoding genes; rRNA accounts for approximately 81% of cellular RNA [32]. Polymer lengths for the macromolecular synthesis reactions can be scaled to a convenient number of monomers, such as 10 or 100, with the appropriate polymerization energy requirements and byproducts. The polymerization energy error introduced with these scaled molecules is assumed minor.
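Two of the calculations in this paragraph, normalizing the assay results to the total recovered mass and estimating DNA monomer ratios from GC content, can be sketched as follows. The function names and the mass percentages are hypothetical placeholders (they are not the Table 4 values), and the GC-based estimate assumes double-stranded DNA with standard base pairing.

```python
def normalize_to_recovered_mass(mass_percent):
    """Rescale macromolecule mass percentages so that they sum to 100%."""
    total = sum(mass_percent.values())
    return {name: 100.0 * value / total
            for name, value in mass_percent.items()}


def dna_monomer_fractions(gc_content):
    """Mole fractions of the four deoxynucleotides from genomic GC content.

    Assumes double-stranded DNA, so G = C and A = T.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be a fraction between 0 and 1")
    at_each = (1.0 - gc_content) / 2.0
    gc_each = gc_content / 2.0
    return {"dAMP": at_each, "dTMP": at_each,
            "dGMP": gc_each, "dCMP": gc_each}


# Placeholder assay results in % of cell dry weight (illustrative only).
measured = {"carbohydrate": 10.0, "DNA": 3.0, "lipid": 8.0,
            "protein": 50.0, "RNA": 9.0}
normalized = normalize_to_recovered_mass(measured)
print(normalized)

# Placeholder GC content of 60%.
print(dna_monomer_fractions(0.60))
```

Normalization preserves the ratios between macromolecules, so it corrects for uniform processing losses but not for a constituent that was never assayed.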
Once formulas for individual macromolecules are calculated, model reactions can be quality control checked for balance of elemental formulas and degree of reduction to ensure adherence to the mass balance constraint required for stoichiometric modeling. Identification of imbalanced reactions can then be further investigated; often the issue can be traced to balancing of redox pairs or hydrolysis products, free protons, and water. Table 5 demonstrates the construction of a DNA macromolecule synthesis reaction for A. acidocaldarius, including the definition of monomer composition, polymerization energy requirements, and byproducts. The elemental and electron balances are included and validate conservation relationships [32]. The Supplementary Materials contain a workbook for the major biomass macromolecules that can be modified for different biomass measurements (File S4). The overall cell growth reaction has a form analogous to A carbohydrate + B DNA + C lipid + D protein + E RNA = 1 biomass, where A, B, C, D, and E are stoichiometric coefficients corresponding to the measured mass fraction. Some biomass reactions may also include additional constituents, such as chlorophyll, salts, and metabolite pools, including vitamins. The coefficients for the macromolecular constituents A-E are obtained by converting the experimental mass fraction measurements to molar coefficients, thereby yielding the appropriate stoichiometries. The following steps convert experimentally measured mass fractions of macromolecules to molar coefficients for use in the biomass reaction:
(1) Record mass fractions as g macromolecule per g cell dry weight (see Table 4).
(2) Tabulate the molar mass of each macromolecule representation. Multiply the macromolecular formula by the atomic mass of the respective elements, and sum over all elements to obtain g/mol macromolecule.
(3) Divide the mass fraction of the macromolecule by its molar mass to obtain mol macromolecule/g cell dry weight. The basis for cell dry weight normalization can be selected as desired; 1, 10, or 100 kg cell dry weight typically results in reasonably scaled coefficients for elementary flux mode and flux balance analyses. One kilogram cell dry weight often provides a convenient basis, as when inputs are scaled to a mM basis in FBA, the resulting output biomass scales to grams.
(4) Incorporate the molar coefficients into the biomass reaction. The stoichiometries can be multiplied by the macromolecular formulas and summed over all the macromolecules to obtain an overall formula for biomass, which allows model output to be analyzed in terms of carbon moles of biomass (Table 6).
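Steps (1) through (4) can be sketched in Python as shown below. This is a hedged illustration and not the File S4 workbook: the function names are hypothetical, and the macromolecule formulas and mass fractions are placeholders chosen only to show the arithmetic. The atomic masses are standard values rounded to three decimals.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}


def molar_mass(formula):
    """Step 2: sum of atomic mass times atom count over all elements."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in formula.items())


def molar_coefficient(mass_fraction, formula, basis_g_cdw=1000.0):
    """Step 3: mol macromolecule per chosen cell dry weight basis.

    With the default 1 kg basis the result is numerically equal to
    mmol macromolecule per g cell dry weight.
    """
    return mass_fraction * basis_g_cdw / molar_mass(formula)


def biomass_formula(coefficients, formulas):
    """Step 4: overall elemental formula of biomass on the chosen basis."""
    total = {}
    for name, coefficient in coefficients.items():
        for element, count in formulas[name].items():
            total[element] = total.get(element, 0.0) + coefficient * count
    return total


# Step 1: placeholder mass fractions (g per g cell dry weight) and
# placeholder macromolecule formulas; both are illustrative assumptions.
mass_fractions = {"carbohydrate": 0.25, "protein": 0.75}
formulas = {
    "carbohydrate": {"C": 6, "H": 10, "O": 5},
    "protein": {"C": 5, "H": 8, "N": 1, "O": 2},
}

coefficients = {name: molar_coefficient(mass_fractions[name], formulas[name])
                for name in mass_fractions}
overall = biomass_formula(coefficients, formulas)
print(coefficients)
print(overall)
```

Because each coefficient is mass divided by molar mass, multiplying the overall formula back through the atomic masses should return the chosen basis mass, which gives a quick check on the conversion.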
The Supplementary Materials detail the macromolecule and biomass calculations for each species, as well as demonstrate a quality control check for balancing mass, charge, electrons, and elemental composition (File S4). In addition to the macromolecular constituents that comprise a cell, metabolic models often account for maintenance energy requirements. Maintenance energy is an implicit energy consumption term accounting for a myriad of cellular processes, such as protein turnover and osmotic pressure maintenance. Maintenance energy is typically estimated by fitting the in silico model to experimental biomass-on-substrate yield data. For example, experiments correlating substrate consumption rate (for heterotrophs) or photon absorption rate (for photoautotrophs) with growth rate can be used to determine the yield [74,75]. For elementary flux mode analysis applications, a single maintenance energy term, set for a defined growth rate, can be added to the biomass reaction. For flux balance analysis applications, maintenance energy requirements can be broken down into growth and non-growth associated maintenance (GAM and NGAM) terms. The Supplementary Materials contain a genome-enabled model constructed for A. acidocaldarius (File S1). Calculations fitting maintenance energy to observed yield data for both glucose and oxygen consumption from Farrand et al. [37] for both EFMA and FBA applications are provided in MATLAB and Excel formats (Files S1, S2, and S3). The specific growth rate-dependent (µ, h⁻¹) maintenance energy requirement (qATP) for A. acidocaldarius was calculated to be qATP = 13.4µ + 4.2 mmol cellular energy per g biomass per hour, where GAM was 13.4 mmol cellular energy (phosphodiester bonds) per g biomass and NGAM was 4.2 mmol cellular energy per g biomass per hour. Using multiple datasets to fit the maintenance energy provides a metric of accuracy for the model, as the independent fits should yield similar results. The calculated maintenance terms for A. acidocaldarius were similar regardless of fitting with glucose or oxygen consumption data (Files S1, S2, and S3).
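The linear maintenance relationship reported for A. acidocaldarius can be evaluated, and its two parameters recovered from paired data, with a short sketch such as the one below. The GAM and NGAM defaults are the values stated in the text; the function names are hypothetical, and the straight-line fit is a generic illustration rather than the model-based fitting provided in Files S1-S3.

```python
def q_atp(mu_per_h, gam=13.4, ngam=4.2):
    """Maintenance energy demand in mmol cellular energy per g biomass per h.

    gam  : growth associated maintenance, mmol per g biomass
    ngam : non-growth associated maintenance, mmol per g biomass per h
    """
    return gam * mu_per_h + ngam


def fit_gam_ngam(growth_rates, energy_rates):
    """Least-squares line through (growth rate, energy rate) pairs.

    The slope is interpreted as GAM and the intercept as NGAM.
    """
    n = len(growth_rates)
    mean_mu = sum(growth_rates) / n
    mean_q = sum(energy_rates) / n
    sxx = sum((mu - mean_mu) ** 2 for mu in growth_rates)
    sxy = sum((mu - mean_mu) * (q - mean_q)
              for mu, q in zip(growth_rates, energy_rates))
    gam = sxy / sxx
    return gam, mean_q - gam * mean_mu


# Evaluate the reported relationship at a few illustrative growth rates.
rates = [0.05, 0.10, 0.20]
demands = [q_atp(mu) for mu in rates]
for mu, q in zip(rates, demands):
    print(f"mu = {mu:.2f} per h, qATP = {q:.2f}")

# Fitting the generated points returns the parameters used to create them.
print(fit_gam_ngam(rates, demands))
```

The slope and intercept separate the growth rate-dependent and growth rate-independent energy demands, which is why FBA formulations can carry them as distinct GAM and NGAM terms.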
Finally, the A. acidocaldarius model was used to quantify potential pitfalls associated with inaccurate biomass compositions. Ten different biologically relevant variations of biomass composition were generated and tested in addition to the experimentally measured composition (see File S5). The optimal in silico biomass yield on electron donor (glucose) and associated biomass yield on electron acceptor (oxygen) was determined for each biomass composition. A sampling of the data is presented as a function of the biomass degree of reduction in Figure 6 and Table 7. The data point at degree of reduction 4.03 represents the experimentally measured composition for A. acidocaldarius; this point is used as a reference. The in silico biomass per glucose and biomass per oxygen yields change nonlinearly relative to degree of reduction. The biomass per oxygen yields change up to 70% from the reference composition, demonstrating the strong influence biomass composition can have on simulation results (Figure 6, Table 7). Common modeling practices for determining maintenance energy parameters fit model output to experimental yield data, which can mask the effects of inaccurate biomass composition. GAM values for each biomass composition were also calculated (Figure 6, Table 7). The GAM values changed by up to 40% relative to the reference case, which represents a substantial change in specific energy generation-associated fluxes, such as ATPase. Furthermore, the biomass yield on nitrogen was calculated for each biomass composition. The biomass per nitrogen yields varied up to 35% for the considered biomass compositions (see File S5). This variation in nitrogen content would have substantial impact on predictions for nitrogen-limited culturing conditions, such as those commonly used in bioprocesses to induce accumulation of bioplastics or lipids (e.g., [76,77]). This analysis highlights the importance of accurate species- and condition-specific measurements for biomass composition.
Table 7), resulting in a range of biomass degrees of reduction. EFMA simulations of the A. acidocaldarius metabolic model with the different biomass compositions revealed substantial variation in biomass per electron donor yield (g biomass per mol glucose) and almost a doubling of oxygen required for biomass synthesis (g biomass per mol oxygen) as a function of biomass degree of reduction. Similar differences were also observed in growth associated maintenance (GAM, mmol cellular energy per g biomass) to fit the data for glucose consumption from Farrand et al. [37]. Fitted GAM values changed by as much as 40% between the experimentally determined biomass composition (degree of reduction 4.03) and the modulated biomass compositions (parenthetical percentages). (b) Biomass yield on nitrogen (calculated from the elemental composition) varied up to 35%, demonstrating the sensitivity of modeling results to biomass composition when considering nitrogen-limited conditions. Calculations, additional data, and further details are included in the Supplementary Materials (File S5).

Conclusions
Computational biology representations of metabolism often include cellular growth reactions necessitating knowledge of biomass composition for accurate predictions. The current work surveyed analytical methods for the five major macromolecules (carbohydrate, DNA, lipid, protein, and RNA), provided step-by-step procedures for a select method for each macromolecule, tested the methods on three different bacterial species, and demonstrated application of analytical measurements to a computational representation of cellular growth. The data include a quantitative analysis of potential pitfalls associated with inaccurate biomass representations. The literature survey included references to more in-depth reviews for each macromolecule for further exploration and also provided a rationale for the selected method. Table 8 provides a summary of the selected methods and their advantages and disadvantages. The three bacterial species used for testing (E. coli, Synechococcus 7002, and A. acidocaldarius) represent a range of physiological characteristics, including Gram-negative and Gram-positive, mesophilic and thermophilic, and neutrophilic and acidophilic, as well as chemoheterotrophic and photoautotrophic, which assessed the robustness of the methods. Testing of methods highlighted potential pitfalls and provided guidelines for troubleshooting when testing a new method or when applying a method to new organisms. Based on the current study, recommendations for verifying a new protocol or testing a new organism include ensuring that the test response is linear for both the amount of biomass used and the amount of reagent, testing the standard range, and confirming the effect of any sample pre-treatment steps on standards. It is also important to consider the organism being studied and the downstream application of the measurement (e.g., glycogen vs. total carbohydrate). 
The presented methods of experimental measurement and conversion to computational biology reactions need to be integrated with the maturing quality standards for model construction [78,79]. The predicted elemental composition of the synthesized biomass is a relevant metric for the quality of the overall reaction. Average elemental compositions have been measured for several common microorganisms, providing a convenient check [80]. The elemental composition is linked to the biomass degree of reduction, which is an energetic measure of biomass and a critical parameter for computational biology analysis of consortia simulations. The degree of reduction of biomass for an average cell is approximately 4.2 or 4.8 on an NH 4 + or N 2 basis, respectively [80]. These values can shift due to large quantities of cellular storage polymers, such as polysaccharides or polyhydroxyalkanoates. Additionally, biomass composition is known to shift with growth rate and culturing stress [45,81]; the provided approach can be used to create culturing condition-specific cellular growth reactions. Altogether, the current work serves as a useful resource for the broader computational biology community, which will enable more accurate representations of biomass synthesis and therefore more accurate metabolism simulations.
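As a quick check on a computed biomass formula, the degree of reduction can be calculated directly from the elemental composition. The sketch below assumes a formula containing only C, H, O, and N, uses the conventional electron weights (C = 4, H = 1, O = -2, and N = -3 on an ammonia/ammonium basis or 0 on an N2 basis), and reports the result per carbon mole. The function name is hypothetical, and the example formula CH1.8O0.5N0.2 is a commonly cited average biomass composition that is used here as an assumption, not a value taken from this study.

```python
def degree_of_reduction(formula, nitrogen_basis="NH3"):
    """Degree of reduction per carbon mole for a C, H, O, N formula.

    nitrogen_basis="NH3" treats ammonia/ammonium as the nitrogen
    reference state (N weighted -3); "N2" treats dinitrogen as the
    reference state (N weighted 0).
    """
    carbon = formula["C"]
    electrons = (4.0 * carbon
                 + formula.get("H", 0.0)
                 - 2.0 * formula.get("O", 0.0))
    if nitrogen_basis == "NH3":
        electrons -= 3.0 * formula.get("N", 0.0)
    elif nitrogen_basis != "N2":
        raise ValueError("nitrogen_basis must be 'NH3' or 'N2'")
    return electrons / carbon


# Commonly cited average biomass formula (assumption for illustration).
average_biomass = {"C": 1.0, "H": 1.8, "O": 0.5, "N": 0.2}
print(round(degree_of_reduction(average_biomass, "NH3"), 2))
print(round(degree_of_reduction(average_biomass, "N2"), 2))
```

For this assumed formula the two bases give values consistent with the approximate 4.2 and 4.8 figures quoted above, so the same function can flag a constructed biomass reaction whose elemental composition drifts far from typical values.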

Figure A3.
Autoclaving samples prior to chloroform-methanol lipid extraction did not significantly enhance lipid recovery. Solid columns represent unautoclaved samples and striped columns represent autoclaved samples. Lipid recovery was not significantly affected by autoclaving for A. acidocaldarius (p > 0.05, t-test), whereas lipid recovery was slightly lower for autoclaved E. coli samples (0.01 < p < 0.05, t-test). n = 9 and 3 for unautoclaved and autoclaved E. coli samples, respectively, and n = 6 and 3 for unautoclaved and autoclaved A. acidocaldarius samples, respectively.