Using Gene Essentiality and Synthetic Lethality Information to Correct Yeast and CHO Cell Genome-Scale Models

Essentiality (ES) and Synthetic Lethality (SL) information identify combination of genes whose deletion inhibits cell growth. This information is important for both identifying drug targets for tumor and pathogenic bacteria suppression and for flagging and avoiding gene deletions that are non-viable in biotechnology. In this study, we performed a comprehensive ES and SL analysis of two important eukaryotic models (S. cerevisiae and CHO cells) using a bilevel optimization approach introduced earlier. Information gleaned from this study is used to propose specific model changes to remedy inconsistent with data model predictions. Even for the highly curated Yeast 7.11 model we identified 50 changes (metabolic and GPR) leading to the correct prediction of an additional 28% of essential genes and 36% of synthetic lethals along with a 53% reduction in the erroneous identification of essential genes. Due to the paucity of mutant growth phenotype data only 12 changes were made for the CHO 1.2 model leading to an additional correctly predicted 11 essential and eight non-essential genes. Overall, we find that CHO 1.2 was 76% less accurate than the Yeast 7.11 metabolic model in predicting essential genes. Based on this analysis, 14 (single and double deletion) maximally informative experiments are suggested to improve the CHO cell model by using information from a mouse metabolic model. This analysis demonstrates the importance of single and multiple knockout phenotypes in assessing and improving model reconstructions. The advent of techniques such as CRISPR opens the door for the global assessment of eukaryotic models.


Introduction
Both budding yeast S. cerevisiae and Chinese Hamster Ovary (CHO) C. griseus are model cell lines for understanding metabolism in eukaryotes [1] as well as versatile bio-production hosts [2,3] for biofuels, biorenewables and proteins [4]. The earliest genome-scale metabolic model of yeast (iFF708 [5]) included 708 ORFs (10.7% of the total number of verified ORFs in yeast genome) and 1175 reactions with only two compartments (i.e., mitochondria and cytosol). Subsequent efforts improved this model not only by including additional ORFs, metabolites and metabolic pathways, but also by integration of enzyme-localization information for compartmentalization (e.g., including peroxisome, nucleus, golgi apparatus, vacuole and endoplasmic reticulum) [6]. The latest consensus model version (Yeast 7.11 [7]) consists of 2,218 metabolites and 910 genes partitioned in 14 distinct compartments. A detailed comparison of the development of yeast genome-scale models is reviewed in [8,9]. CHO cells have emerged as the preferred cell line for recombinant proteins [10]. It has been shown that 70% of therapeutics production [11] worldwide is carried out in CHO cells thereby garnering over $30 billion in sales. An important consideration is for genetic engineering to avoid knockouts of lethal gene sets while designing high yielding strains [12] of commercial importance.
Essentiality and SL analyses [13,14] have been used to systematically assess the validity/accuracy of genome-scale flux models [8]. Essentiality and SL analyses refer to identifying sets of gene deletions (single, double and higher order thereof) that render the strain nonviable. Essentiality analysis identifies the list of genes, each of which when deleted in silico, limits the biomass flux to lower than 10% of its theoretical maximum. Whereas, SL analysis identifies the list of in silico gene pairs (and higher order) whose removal constrains the biomass flux to lower than the aforesaid essentiality criterion. These analyses serve the dual purpose of model refinement (by comparing with available in vivo knockout information) and prediction for identifying genes (or combination of genes) whose knockouts could potentially be lethal. The latter is particularly useful in strain engineering applications for avoiding synthetic lethal gene deletions. In earlier efforts, these analyses were used extensively to curate metabolic models of well-studied organisms such as E. coli [15,16] and S. cerevisiae [8]. Model improvement using network-embedded thermodynamic flux variability analyses to ascertain the directionality of reactions have been used by Martinez et al. [17]. Other related efforts include Stanford et al. [18] and Soh et al. [19], all of which aim to integrate thermodynamic information to curate the yeast genome scale models. In this study, we evaluate the accuracy of the latest genome-scale consensus model first for S. cerevisiae (Yeast 7.11) and thereby update it to iSce926 and subsequently for C. griseus (CHO 1.2) with existing experimental measurements in terms of gene essentiality and synthetic lethality and propose a list of corrections and follow-up assessment of predicted gene deletions.
The proposed model modifications on Yeast 7.11 involve 50 literature-supported changes that improve the sensitivity, specificity of Yeast 7.11 by 2.66% and 20.4% and decrease the false viable rate (FVR) by 8.42% (see Appendix). They build upon the effort by Zomorrodi et al. [8] as they conserve four of earlier identified changes. ES and SL analyses are supplemented by auxotrophy information (for Precursor Identifier algorithm see Supplemental File S2) to help elucidate the cause (i.e., nutrient or biomass precursor deficiency) for lethality. For CHO 1.2, we identified eight instances where model and experiment does not match. Upon supplementing this mismatched set with another 11 cases of model and experiment discrepancies from the mouse model [20], we suggested 14 additional (single, double and higher) gene deletion experiments (see Supplemental File S1) for maximally resolving mutant growth phenotypes in CHO cell lines.

S. Cerevisiae Model Yeast 7.11 Curation
In vivo essentiality and synthetic lethality information were mostly obtained from gene deletion studies by Tong et al. [21]. Gene level essentiality analysis (see Table 1) and synthetic lethality predictions for Yeast 7.11 are shown in Figure 1. Resources used to assess in silico results included (i) the Saccharomyces Genome Database (SGD) [22], (ii) single gene deletion studies in minimal media on yeast strain S288C [23], (iii) viability analyses [24], (iv) data from The ORF Report [25][26][27] and analysis of protein encoded transmembrane segments in yeast [28] (See Supplemental File S3 for full gene lists and references of experimental evidence).   Table 1 depicts the number of essential reactions (~13% of total reactions present in the model), essential genes (~16.5% of total genes in the model) and catalogs the number synthetic lethals with up to five simultaneous gene deletions identified by the SL finder for Yeast 7.11. Figure 1 pictorially classifies the agreement of in silico lethality information with in vivo gene deletion information (see Supplemental File S4: Table 2). The vertical axis refers to in silico predictions while the horizontal axis refers to in vivo data. The boxes in the diagonal are instances of compliance between model and experiment whereas boxes off the diagonal signify various modes of disagreement between in silico and in vivo data. For example, box ESG for the Yeast 7.11 encompasses 30 genes which are essential (ES) in silico but are non-essential (G) in vivo whereas box SL2ES refers to gene pairs that have been found to form a synthetic lethal pair in silico (SL2), however at least one of them is essential (ES) in vivo. Overall, our current model iSce926 improves the sensitivity and specificity of Yeast 7.11 from 0.288 to 0.347 and 0.951 to 0.977 respectively and decreases the false viability rate (FVR) from 0.712 to 0.652 (see Appendix). Brief summaries of few of the proposed model modifications have been listed in Tables 2, 3 and 4. Comprehensive information on the exact cause of disagreement between model and experiment can be found in Supplemental File S3. In silico analysis also generates results for deletion combinations unexplored so far. For example, box SL2U (see Figure 1) contains 10 in silico lethal gene pairs for which double deletion experiments have been unverified (U) so far in the literature. These unverified results often reveal non-intuitive lethal gene sets to be avoided while designing overproduction strains. Conversely, predicted viable synthetic lethals are prime candidates to be tested experimentally to assess the functionality of all the pathways present in the model. Absence of isozymes and alternate pathways in the model may also lead to discrepancies between in silico results and in vivo data. For example, gene TYR1 (YBR166C) which is essential (ES) for tyrosine biosynthesis [29] according to the model, is found experimentally to be non-essential instead and found to form synthetic lethals (i.e., a ESSL2 discrepancy). There is a single pathway for the formation of tyrosine in Yeast 7.11; however, chitin synthase (CHS1) [30,31], which is not present in the existing metabolic model, can rescue this mutant phenotype. Another such example involves gene BAT2 (YJR148W) encoding branched-chain amino transferases in isoleucine, leucine and valine biosynthetic pathways [32]. BAT2 forms an in vivo synthetic lethal pair with its paralog BAT1 (YHR208W).
However, both have been identified as essential in silico. Reconciliation between model and experiment was achieved by adding the pretyrosine pathway [33] and allowing for the transport of α-keto-isovalerate across the mitochondrial membrane [34].
The candidate genes of lethal combination are non-metabolic and are involved in chromatid cohesion. [57]

HIS7-RSP5
RSP5 is involved in endocytosis signaling pathway, a non-metabolic function, hence unable to be captured in a metabolic model. The candidate genes of lethal combination are non-metabolic and are involved in chromatid cohesion. [57]

GNA1-CHL1
The lethality is owing to chromosome loss which is a non-metabolic phenomenon. [57]

8
FRS2 gene forms 5 lethal pairs: The candidate genes of lethal combination are non-metabolic and are involved in chromatid cohesion. [57] 9 TYS1 gene forms 2 lethal pairs: The candidate genes of lethal combination are nonmetabolic and are involved in mitosis. [57] 10 ARG7 gene forms 1 lethal quadruplet: The quadruplet association is not entirely metabolic hence cannot be captured by metabolic model. [54] No.

12
YAH1 gene forms 1 lethal pair: YAH1 has already been resolved as ESES MRE11-YAH1 double knockout strain will result in meiotic recombination disorder and will be lethal. This is a non-metabolic attribute of yeast. [57] Overall, we reconciled 50 growth discrepancies (for full list of model modifications and comparison of the performance of iSce926 and Yeast 7.11 see Supplemental File S1) between model and experiment (see Tables 2 and 3). Twelve ESG cases were identified that form ESSL2 inconsistencies in combination with other non-metabolic genes (see Table 4). For example, gene HEM13 whose deletion causes an ESG discrepancy has a non-metabolic function in chromatin assembly and interacts with RNA-polymerase II in transcription. It forms a synthetic lethal with CDC73 [57] (cell division cycle gene) due to the inability to form the pre-rRNA transcript upon simultaneous deletion of the two. We propose a possible interaction schematic (see Figure 2) explaining the cause for the lethal interaction based on information from [65][66][67]. However, it is in general beyond the purview of a metabolic model to resolve inconsistencies whenever non-metabolic genes are implicated in the interaction. Five separate classes of model modifications were introduced for Yeast 7.11 (see Table 2) including (a) addition of reactions, (b) removal of reactions, (c) GPR modifications, and (d) addition of GPR information for orphan reactions. We have also separately listed 12 ESG cases (see Table 3) where we have explained why they should have been ESES cases instead.

Addition of Reactions
A total of six reactions were added to Yeast 7.11 (see Supplemental File S1: Table 1). They generally fill in gaps in existing pathways by introducing in vivo verified reactions and the corresponding genes in the model. They also reconcile ESSL2 inconsistencies to SL2SL2 agreements. For example, restoring the BAT1-BAT2 in silico synthetic lethal pair by adding α-keto isovalerate transport between mitochondria and cytosol in the model (see Table 2). Another reconciliation of ESG case to GG case involved an iron (II) transporter reaction being added to the model (see Table 2). Yeast 7.11 contains a low-affinity iron (II) transporter gene FET4 causing ESG inconsistency. In vivo evidence [58] revealed the presence of a high-affinity iron (II) transporter (encoded by FET3) that can also transport iron (II) across the plasma membrane rendering the FET3-FET4 gene combination an in vivo synthetic lethal. Thus, adding the FET3 mediated transport reaction for iron (II) reconciles FET4 from ESG to GG and also identifies FET3-FET4 as an SL2SL2 match.

Removal of Reactions
Three reactions (see Supplemental File S1: Table 1) were removed from the model that restored GSL2 inconsistencies to SL2SL2 agreements without affecting any of the correct predictions. For example, removal of the orphan reaction pyrimidine-nucleoside phosphorylase that converts uridine to UMP renders the in silico double mutant ΔFUR1ΔURA3 lethal in agreement with in vivo data. The fact that the removed pyrimidine-nucleoside phosphorylase reaction lacked a GPR association possibly alludes to the putative nature of its inclusion into the model. The reactions (see Supplemental File S1: Table 1) were tagged using the IDs in the Yeast 7.11 model. One of suggested removals (i.e., r_1682 where S-adenosyl L-methionine and zymosterol is converted to ergosta-5, 7, 22, 24 (28)-tetraen-3beta-ol) involved the elimination of a lumped form of a reaction whose elementary steps (i.e., r_0986, r_0242, r_0243 and r_0244) are already present in the Yeast 7.11 model.

GPR Modifications
A total of 13 GPR modifications were made to the model that corrected for 15 inconsistencies. For example, α, α-trehalose phosphatase synthase (UDP-forming) and trehalose phosphate phosphatase catalyze the formation of α,α-trehalose phosphate from glucose-6-phosphate, which is converted to trehalose in the presence of UDP-glucose (see Figure 3). The GPR amendment from (TPS1 and TPS2 and TPS3) or (TPS1 and TPS2 and TSL1) to TPS1 or (TPS1 and TPS2 and TPS3 and TSL1) changes the deletion of TPS2 from ESG to GG and identifies only TPS1 as essential [38,39] in glucose or fructose media as elucidated in Bell et al. [40]. . GPR modification revealing that TPS1 gene is only essential for the associated reactions in minimal glucose medium. Old GPR is shown on the left and new GPR on the right.
In S. cerevisiae, most of the ATP formation occurs from glycolysis. A Δfba1 strain shows more than five-fold reduction in net ATP production in silico, which proportionately reduces the biomass flux (from 2.44 h −1 to 0.465 h −1 ). However, this is above the in silico viability threshold rendering the fba knockout non-essential. A possible reason for this contradiction could be the inability to capture the in vivo suppression of mitochondrial respiratory enzymes under accumulation of cytosolic fructose 1,6-bisphosphate due to short-term Crabtree effect [59]. In fact turning off MIR1 mediated phosphate/proton mitochondrial symporter (PIt2m) [60] makes fba1 mutant in silico lethal (reducing the net ATP production by ~35-fold) thereby resolving a GES case to an ESES case. The same observation also holds true for a tpi1 deletion in S. cerevisiae.
Phosphoribosyl diphosphate synthase catalyzes the essential reaction PRPPS. The inconsistency of this GPR was addressed in iAZ900 [8] and was modified to depict that any of the three PRS gene pairs (PRS1 and PRS3), (PRS2 and PRS5) or (PRS4 and PRS5) is capable of encoding the sub-units required for catalyzing the reaction. However, it was later shown [68] that any of the five viable pairs (see Figure 4) need to be present for growth with one subunit containing an NHR (non-homologous region) and the other without one. Both PRS1 and PRS5 contain NHR while the rest do not. Therefore, the current GPR (YHL011C and YKL181W) or (YOL061W and YBL068W) or (YOL061W and YER099C) was corrected accordingly to (YKL181W and YER099C) or (YKL181W and YHL011C) or (YKL181W and YBL068W) or (YER099C and YOL061W) or (YBL068W and YOL068W). This correction not only recapitulates all in vivo observations but also predicts one lethal pair and two lethal triplet mutants (i.e., Δprs1Δprs5 and Δprs1Δprs2Δprs4, Δprs2Δprs3Δprs4 respectively). Bifunctional carbamoyl phosphate synthetase catalyzes the first two enzymatic steps in the de novo biosynthesis of pyrimidines both of which undergo feedback inhibition by UTP (uridine tri-phosphate). The existing GPR identifies (CPA1, URA2) and (CPA2, URA2) as SL pairs. In fact, experimental evidence [43] and the Saccharomyces Genome Database [24] reveal that all three of CPA1, CPA2 and URA2 are essential (see Figure 5). Similar evidence was seen in another yeast strain Candida albicans [44]. This GPR modification from ((YJR019C and YOR303W) or YJL130C)) to (YJR019C and YOR303W and YJL130C) rectifies two SL2ES cases to ESES and three GES cases to ESES. Dolichyl phosphate mannose mannosyl transferase catalyzes the conversion of dolichyl mannosyl phosphate to mannan in the endoplasmic reticulum, which is then transported out to the cytosol and forms a biomass precursor. The existing GPR (see Figure 6) suggests that either one of PMT3, PMT4, PMT5 or both of PMT1 and PMT2 need to be present for the reaction to occur. However, recent experimental evidence [69,70] implies that PMT genes are classified in the following three sub-families (sub-family 1: PMT1 and PMT5), (sub-family 2: PMT2 and PMT3) and (sub-family 3: PMT4). Literature evidence [68] suggests that only the removal of PMT4 in combination with deletions in both sub-family members 1 and 2 is lethal. Therefore, the GPR is modified from ((PMT1 and PMT2) or PMT3 or PMT4 or PMT5) to ((PMT1 and PMT5) or (PMT2 and PMT3) or PMT4). This modification puts forth the following four lethal triplet gene deletions: Δpmt1Δpmt2Δpmt4, Δpmt5Δpmt2Δpmt4, Δpmt1Δpmt3Δpmt4 and Δpmt5Δpmt3Δpmt4.  . Gene-protein-reaction association for conversion of dolichyl-mannosyl phosphate to mannan. The genes (blue) code for the proteins (orange), which catalyze the reaction (gray). Old GPR is shown on the left and new GPR on the right.

Addition of GPR to Orphan Reactions
There are 15 instances where GPR associations were assigned to orphan reactions in the model (see Supplemental File S1: Table 1). This adds 10 new gene loci to the model and correctly identifies them as non-essential based on growth data in budding yeast [43,44] and other well-annotated organisms such as E. coli [35]. For example, it has been shown [35] that S. cerevisiae contains an E. coli ortholog of ubiC gene that encodes for chorismate-pyruvate lyase enzyme, which enables the addition of the corresponding GPR to the reaction that was already present in the model. Gene deletion studies on ubiC reveal that it is non-essential matching model predictions. In another example, assigning BIO6 and BIO8 genes in the GPR for putative KAPA synthetase as seen in the YJM627 and A364a strains of S. cerevisiae [53,54] correctly identifies the deletion of either BIO6 or BIO8 as non-essential.

MSL2 Gaps in the Model
This refers to missing (M) genes (i.e., YDL040C and YMR307W) from the model despite the presence of experimental data on their deletion phenotype. They are integral to nine in vivo SL2 cases (see Figure 7 and Supplemental File S5). YMR307W (GAS1) encodes for beta-1, 3-glucanosyltransferase [71] (belonging to the ERAD pathway [72]) a eukaryotic membrane protein embedded in the lipid bilayer that aids anchoring inositol associated glycophospholipids to the cell wall. On the other hand, YDL040C (NAT1) encodes for N-acetyltransferase and is primarily involved in cell wall integrity (CWI-MAPK) signaling pathway with a few ancillary functions such as cell-cycle, heat shock resistance, sporulation and telomeric silencing [73]. The CWI-MAPK signaling pathway [74,75]) involves a cohort of five genes (i.e., NAT1, NAT2, NAT3, NAT4 and NAT5) that are missing in Yeast 7.11. In the current effort we have successfully incorporated NAT1 and NAT2 and have correctly identified them as GG cases.

Figure 7.
In vivo lethal gene pairs absent in model (MSL2 case). The genes YDL040C and YMR307W are involved in nine in vivo SL pairs but they are not present in Yeast 7.11 reconstruction. The participating genes (in blue) are found to be non-essential in silico.

Model Predictions for Synthetic Lethals in S. Cerevisiae
Experimental gene deletion compilations such as the Keio collection [15] and Saccharomyces Genome Database [22,44] are available for single gene deletion mutants. Exhaustive information for growth deficiency for single gene deletions is available but for higher gene deletion combinations the task becomes prohibitive. For example, S. cerevisiae genome with 3,912 metabolic genes would require approximately 7.5 million double-knockout experiments to verify the viability of all double deletion mutants. Computational tools, such as the SL finder can identify in silico synthetic lethal combinations thus narrowing down the combinations to be tested in vivo. Supplementary Tables 3 and 4 (see Supplemental File S4) tabulate the list of gene pairs and higher order gene deletions in Yeast 7.11 that are in silico lethal but have not been tested yet in vivo. While the lethal effect of some of these deletions is straightforward, a number of cases reveal non-intuitive lethal deletion combinations from distal parts of metabolism that either hinder cofactor synthesis or transport to another compartment. Four such cases identified using the corrected Yeast 7.11 model are described next in more detail.

Proline Auxotrophy (Δpro1Δcar2 Double Mutant)
YDR300C (PRO1) is a gamma-gluatmyl kinase that initiates proline biosynthesis by catalyzing the conversion of cytosolic glutamate to L-gamma glutamyl-5-phosphate. In silico removal of PRO1 redirects the flux through the L-ornithine transaminase reaction to produce proline via L-glutamate 5-semialdehyde (LG5S) (see Figure 8). In silico removal of YLR438W (ΔCAR2) cannot catalyze the conversion of ornithine to LG5S. Therefore, the double deletion Δpro1Δcar1 strain cannot form LG5S and hence it is proline auxotrophic and thereby lethal.

Leucine Auxotrophy (Δleu4Δleu9 Double Mutant)
Δleu4Δleu9 double mutant is devoid of α-isopropylmalate synthase (IPMS) activity leading to leucine auxotrophy. YNL104C (LEU4) encodes IPMS that catalyzes 2-isopropylmalate formation from 3-methyl 2-oxobutanoate essential in leucine biosynthesis. YOR108W (LEU9) is alpha-isopropylmalate synthase II and can carry out a residual α-IPMS activity in a ΔLEU4 strain. Existing in vivo studies [76] suggest the single deletions of LEU4 or LEU9 are non-essential which makes the double mutant a candidate for a synthetic lethal pair as suggested by the metabolic model.

Arginine and Valine Auxotrophy (Δctp1Δmae1 Double Mutant)
YBR291C (CTP1) encodes the citrate-Pep antiporter from peroxisome and mitochondria to cytosol. YKL029C (MAE1) codes for the mitochondrial malic enzyme, which catalyzes oxidative decarboxylation of cytosolic S-malate to pyruvate (see Figure 9). There exist two alternate ways of providing mitochondrial L-glutamate (see Figure 9) required for valine and arginine biosynthesis. Pathway 1 uses Akg-citrate antiport to translocate mitochondrial Akg to cytosol reversibly, which is coupled to ctp1 catalyzed citrate-pep antiport. Pathway 2 uses S-malate-Akg antiport to synthesize mitochondrial Akg which is prevented in a Δmae1 strain. A Δctp1Δmae1 double mutant is thus both cytosolic and mitochondrial-Akg auxotrophic and thereby is unable to synthesize mitochondrial L-glutamate.

Figure 9.
Valine and Arginine auxotrophy due to simultaneous deletion of YBR291C and YKL029C gene. Paths 1 and 2 represent two alternate routes to regenerate mitochondrial glutamate and the yellow path shows the part that is common to both paths. Mitochondrial valine goes to valine biosynthesis and 2-acetyl ornithine goes to arginine production. SL gene pairs are marked with red crosses.

Disruption of Lipid Metabolism (Δitr1Δino1Δitr2 triple mutant)
Lipid (a biomass precursor) is an intermediate metabolite formed by 15 precursor molecules in appropriate biological ratios in Yeast 7.11. The in silico ΔYDR497CΔYOL103W (Δitr1Δitr2) double deletion strain cannot uptake myo-inositol thereby showing decrease in vegetative growth in corroboration with in vivo studies [77]. ΔYJL153C (Δino1) cannot convert cytosolic glucose-6-phosphate to myo-inositol-1-phosphate conversion according to the model (see Figure 10). Therefore, the Δitr1Δino1Δitr2 triple mutant is found to be unable to form cytosolic myo-inositol and thus lipids.
Single gene deletion studies [78,79] with inositol supplied in the media suggest that Δino1 is not lethal but minimal media [23] without inositol makes ino1 essential for viability. Note that in silico minimal media are supplemented with inositol (see Methods) in all calculations explaining why Δino1 is found to be non-essential in silico. Figure 10. Lipid metabolism disrupted due deletion of synthetic lethal gene triplet. YDR497C and YOL103W are isozymes coding for the protein catalyzing the myo-inositol transport into the cytosol from media. Reactions catalyzed by SL gene triplet are marked with red crosses.
The essentiality and synthetic analysis for the well-curated Yeast 7.11 model revealed a number of opportunities for model improvement and predictions for non-intuitive synthetic lethals to be tested experimentally for providing additional backing for the model and/or ways to remedy shortcomings. Subsequently, we switch our attention to a CHO cell model with significantly fewer mutant growth phenotypes tested.

C. Griseus Model CHO 1.2 Curation and Suggestion of Gene Knockout Experiments
The absence of a comprehensive single-gene knockout database for Chinese Hamster Ovary (CHO) cells (unlike yeast) makes the assessment CHO 1.2 genome-scale model [80] more difficult. Therefore, we supplemented limited experimental data with predicted lethal gene deletions based on the most recent mouse model [20] and gene knockout studies in mouse embryonic stem cells [81] that exhibited high degree of sequence similarity (functionality of the encoded protein is at least 70% conserved across all mammalian systems [81]) with the CHO cell genome. Any inconsistencies between mouse and CHO cell lethality was used as an opportunity to correct the CHO model (see Supplemental File S6). Eight GPR modifications were proposed for CHO 1.2 in order to address and reconcile five ESG cases to GG, three GES cases to ESES, three SL2ES cases to ESES and one ESSL2 case to SL2SL2. In addition, we propose a number of gene deletion experiments to verify non-intuitive synthetic lethal gene combinations. Reaction level essentiality analysis in silico revealed 90 essential reactions (see Table 5). Utilizing the GPR associations for these reactions, 57 essential genes were identified (see Supplemental File S4: Table 5) for growth under aerobic minimal essential media (media information in Supplemental File S3). A comparative analysis with existing experimental data, model modifications and suggested gene-deletion experiments have been listed in Table 6 (for full version of Table 6 see Supplemental File S1: Table 3).    Table 3 of Supplemental File S1 shows all eight GPR modifications in the CHO 1.2 model are based on in vivo gene deletion experiments [90] in CHO-K1 cell lines and embryonic stem cells of mouse [81]. For example, removal of glycogen synthase (GYS) reaction is lethal in silico as it blocks the pathway for synthesizing biomass precursor glycogen. The existing GPR (gys1 and gys2) suggests that deletion of either of the genes encoding the protein would be lethal in silico at the gene level. However, experiments on CHO K1 cell lines show that single deletion of these genes are not lethal in vivo. Furthermore, in vivo deletion experiments in related organisms with a conserved glycogen synthase activity (such as S. cerevisiae [54]) show that Δgys1Δgys2 double deletion is lethal. As a result, the existing GPR was changed from (gys1 and gys2) to (gys1 or gys2) to reconcile two ESG and two ESSL2 inconsistencies to GG and SL2SL2 respectively (see Figure 11(a)). Unlike the previous inconsistency, we find a contrary case for pgm2 mutant. The existing CHO 1.2 model suggests that either of pgm1 or pgm2 can encode for the in silico essential phosphoglucomutase (PGM) reaction required to synthesize biomass precursor glycogen (see Figure 11(b)). However, single gene deletion experiments in mice embryonic stem cells [91] show that deletion of pgm1 is non-lethal as an active pgm2 can compensate for loss of functional activity of pgm1. Deletion of pgm2, on the other hand is lethal, thus indicating that pgm2 is the major isoform primarily responsible for PGM activity. As a result, the current GPR for PGM was accordingly changed from pgm1 or pgm2 to pgm2 or (pgm1 and pgm2) that reconciles not only GES for pgm2 to ESES but also SL2ES for pgm1-pgm2 to ESES.
Similar to the pgm2 case deletion of pcyT1a results in a GES inconsistency. Either of pcyT1a or pcyT1b can encode for the in silico essential choline phosphate cytidylyltransferase (CPCT) reaction. Removal of CPCT blocks the production of phosphatidylcholine and sphingomyelin, which are biomass precursors, thus making the two genes a synthetic lethal pair. However, in vivo single gene deletion studies in embryonic stem cells [88] show that while ΔpcyT1a is lethal, ΔpcyT1b mutant strains are viable. This observation suggests that pcyT1a is sufficient to encode for CPCT activity, while pcyT1b is a minor isoform. The GPR for CPCT was modified accordingly to pcyT1a or (pcyT1a and pcyT1b) to resolve pcyT1a GES to ESES and pcyT1a-pcyT1b SL2ES to ESES (see Figure 11c). Reconciling gys1 and gys2 from essential genes to an SL2 (b). Shows how pgm2 can be identified as an essential gene (c). Shows how pcyT1a can be identified as an essential gene. Old GPR is shown on the left and new GPR on the right.

Suggested Single and Double Gene Deletion Experiments
Due to lack of sufficient experimental gene deletion data either in CHO or mouse embryonic stem cells, we have limited resources of confirming most of our in silico essential and synthetic lethal solutions. However, these identified sets could provide a blueprint for prioritizing future deletion experiments both for model curation as well as constructing high yielding phenotypes. In all, we identified 48 lethal reaction pairs and 44 SL gene pairs (see Supplemental File S4: Table 6) for CHO 1.2. Among these, we listed 10 non-intuitive combinations (see Supplemental File S1: Table 3) of gene deletions that can help improve the performance of the future CHO model reconstructions. For example, in silico deletion of qprT, encoding for quinolate phosphoribosyltransferase (QPRT) activity in NAD biosynthesis pathway, blocks the synthesis of biomass precursors NAD + , NADH, NADP + , and NADPH. Removal of QPRT prevents any fresh supply of nicotinate D-ribonucleotide (NDRT) to the nicotinamide regeneration cycle (see figure 12(a)), thereby preventing any of the intermediate metabolite flux in the cycle (e.g., NAD + ) to be diverted towards other pathways such as NADPH and biomass formation. The cycle functions as a futile cycle dissipating ATP. Production of all nicotinatmide-related cofactors (i.e., NAD + , NADH, NADP + and NADPH) are impaired resulting in zero biomass formation. Note that deletion of any of the genes (or gene pairs) encoding for the reactions in the nicotinamide regeration cycle (i.e., kynU, nmnAt and nadS) are in silico essential (pnp1 and pnp2 form a lethal pair) for directly impairing activity of the cycle.
Another suggestion involves synthetic lethality caused by in silico removal of slc14a1 and slc14a2 genes. These genes encode for transporter proteins for urea whose removal prevents the export of urea from cytosol out of the cell. As there are no other pathways in the current CHO model to consume urea, deletion of urea export reactions prevents the activity of the urea cycle where it is synthesized. As a result, the model cannot synthesize intermediate metabolite ornithine required for the production of biomass precursor putrescine (see Figure 12(b)). The deletion of these genes can confirm if urea production is coupled with biomass or there are additional pathways of urea metabolism (or alternate pathways for ornithine and putrescine production) missing in the current reconstruction.

SL2U Case ΔdhcR24ΔchoL4
Similarly, in silico removal of both dhcR24 and choL4 cause complete loss of sterol delta-reductase (DSR) activity (see Figure 12(c)). Removal of DSR activity shuts off the cholesterol biosynthesis pathways in silico. Cholesterol is a biomass precursor, so DSR deletion in silico is causal to lethality. Deletion of choL4 prevents zymosterol from being converted to dehydrocholesterol-cholesta 5, 7-dien 3-betaol (DC57D3B) (see Figure 12(c)) and ultimately to cholesterol in a cascade of four steps. However, cytosolic zymosterol can still be converted to ebp-encoded (emopamil binding protein-EBP) reaction forming N-cholesta 7, 2, 4-dien 3-betaol (NC724D3B) (see Figure 12(c)), which is subsequently converted to cholesterol. We note that EBP activity is common to both the routes (see Figure  12c), which makes ebp an essential gene for in silico cholesterol biosynthesis and subsequently biomass. In addition to manual inspection of in silico synthetic lethal suggestions for non-intuitive examples, we performed a node centrality analysis synthetic lethal landscape for CHO 1.2 (see Figure 13). Node centrality analysis is an important tool of querying complex networks (such as gene-association networks) to identify key nodes that have the maximal influence on the topology of the network [92]. In our case, we constructed the network of synthetic lethal gene pairs (see Figure 13) and ranked the genes based on the number of lethal pairs they were associated with. The graph shows that pc3, encoding for diacylglycerol choline phosphotransferase (DCP) activity in phosphatidyl choline (PTC) pathway, is associated with a maximum number of eight synthetic lethal gene pairs (see Figure 13). For example, pc3 forms a lethal pair with pc4 encoding for methylene-fatty-acyl-phospholipid synthase (MPS) reaction. In silico simultaneous gene deletion of pc3 and pc4 prevents synthesis of biomass precursor PTC synthesis (see Figure 14). Similarly, impA2 is associated with four in silico SL2 (See Figure 13). One such case is lethality due to deletion of impA2 and ugtLa2 genes, which prevents the synthesis of biomass precursor phosphatidyl myoinositol. Note that the results of node centrality analysis (see Supplemental File S4: Table 8) can be utilized to prioritize the construction of mutant strains. For example, while Δpc3 strain can be utilized to verify in silico-in vivo consistency for eight synthetic lethal pairs, ΔpbeF1 strain can be used to verify just a single case (pbeF1-prpS2).

Suggested Experiments for Higher Order Gene Deletions
We have further cataloged higher order lethal gene lists that show 20 lethal gene triplets and three lethal gene quadruplets. This is supplemented with information about the biomass precursor(s) each lethal mutant fails to synthesize (see Supplemental File S4: Tables 5-7). We hereby elucidate two higher order gene deletion experiments (see Table 6) to be tested in mouse embryonic stem cells or CHO-K1 cell lines based on the homology of gene functionality between mouse and CHO.

SL3U Case Δcox(N)ΔsdhDΔdhoDh
The complex GPR relationship for the cytochrome-C oxidase (CCO) reaction contributes to 18 in silico SL3 combinations (see Figure 15(a)). Unlike any other reaction in this model, the CCO activity is performed by a holoenzyme that is encoded by 20 different cox genes. 18 of these 20 cox genes constitute lethal gene triplets with a putative succinate dehydrogenase (sdhD) and a dihydroorotate dehydrogenase (dhoDh). In silico removal of cox(N), sdhD and dhoDh genes results in inability to regenerate cytosolic FAD (see Figure 15(b)). Consequently, FAD dependent sphinganine to sphingosine conversion is blocked that inhibits the synthesis of Ceramide 3-acyl sphingosine (C3AS). C3AS is required for the synthesis of biomass precursor sphingomyelin, which is blocked upon deletion of all the three genes thereby causing lethality.

SL4U Case ΔnplΔnanSΔst8Sia1Δst8Sia5
Likewise, in silico quadruple deletion of npl, nanS, st8Sia1 and st8Sia5 genes inhibit anapyruvate lyase (APL) and sialyltransferase (SIL) activities. CHO 1.2 contains four different pathways of synthesizing N-acetyl neuraminate. Loss of all APL and SIL activities shut off all four pathways that form biomass precursor N-acetyl neuraminate (see Figure 16) in silico and thus leads to lethality.

Methods
The Synthetic Lethality Finder protocol [93] was applied for (i) the Yeast 7.11 [7] genome-scale model of S. cerevisiae under aerobic condition in minimal media and (ii) for the CHO 1.2 [80] model under aerobic conditions and minimal essential media [94] to identify essential reactions (for double, triple and higher order) deletions. The SL finder [93] identifies reaction (or combination thereof) in the metabolic network whose removal would restrict the maximum biomass below a pre-specified cutoff. This is solved using a min-max mixed-integer linear problem (MILP) where the inner problem maximizes biomass subject to network stoichiometry and nutrient uptake constraints. The outer problem, at the same time, identifies reaction whose removal would minimize the cellular biomass below a specified cut-off. By iteratively increasing the number of reaction deletion k, one can identify lethal reaction combinations of higher-order. For example, setting k equal to one identifies all essential reactions. Accordingly, k = 2, 3… refer to synthetic lethal pairs, triplets, etc. Upon identification of lethal reaction combinations, we elucidated the lethal deletion combinations at the gene level by making use of GPR associations. For example, in yeast, the essential holoenzyme acetyl-CoA carboxylase involves two functional subunits encoded by genes ACC1 and BPL1. Deleting either of these genes prevents conversion of acetyl-CoA to malonyl-CoA therefore both genes are essential. In contrast, methylenetetrahydrofolate reductase is an essential reaction encoded by isozymes MET13 and MET12. Both these genes need to be knocked-out to prevent the formation of 5-methyltetrahydrofolate from 5, 10-methylenetetrahydrofolate. Thus, in this case the reaction is essential but the associated genes form a synthetic lethal pair.
The minimal medium for yeast comprised of potassium, sodium, iron (II), nitrogen (as ammonia), sulfur (as sulfate) and phosphorus (as inorganic phosphate). Glucose was used as the sole carbon source. Trace nutrients such as 4-aminobenzoate, biotin, myo-inositol, nicotinate, pantothenic acid and thiamin were included [95]. Glucose and oxygen uptake rates were set at 100 and 20 mmol gDW −1 h −1 , respectively (in accordance with the experimental study [96] which showed that oxygen uptake is about a fifth of glucose uptake on a molar basis for aerobic growth in yeast). The non-growth associated ATP maintenance was set at 1 mmol gDW −1 h −1 as proposed by Mo et al. [95]. Maximum biomass (FBA) calculations for CHO cells were carried out under minimal essential media [80] (used for CHO cell culture as shown in Riordan et al. [97], Stanners et al. [98]) with glucose, amino acids (Val, Lys, Leu, Thr, Met, Arg and His) and all four nucleotides along with nitrogen sources. 10% of the maximum theoretical biomass was chosen for both cases as the cutoff for growth [99]. We used the mixed integer program CPLEX solver in GAMS using an Intel Xeon X5675 Six-Core 3.06 GHz with 48GB of physical memory, for reaction level results and Python v3.4.1 for all calculations.
The effect of gene deletions on biomass component production was also systematically assessed using a Precursor Identifier algorithm (see Supplemental File S2). Once a lethal gene deletion combination is identified, the corresponding biomass component whose production is compromised is identified along with the inactivated production pathway. Python scripts were used to generate topology files for most networks drawn in this paper in DOT graphing language. Graphs were displayed using Omnigraffle 6.0 TM operating on the python scripts and post-processed. The updated S. cerevisiae model was generated using COBRA Toolbox v2.0 [100] in Excel and SMBL formats respectively (see Supplemental Files S7 and S8).

Conclusions
In this study we propose 50 model modifications for Yeast 7.11 and eight modifications for CHO 1.2 that improve their model consistencies in terms of essential and lethal gene predictions. In addition, we have also suggested non-intuitive gene deletion combinations for both yeast and CHO-K1 cell lines for experimental validation that can aid in future curation of the genome-scale models. Overall, the contribution of multi-gene deletion data to enhance the performance of genome-scale metabolic reconstructions has been demonstrated in this work. In addition, our analysis identifies several cases where the metabolism-only description of the current models fails to reconcile in silico-in vivo inconsistencies (ESG cases in Yeast) arising due to non-metabolic interactions. Incorporation of detailed information for gene transcription and translation in genome-scale models, as described in the ME-model formalisms (for E. coli [101] and T. maratima [102]), offer opportunities for reconciling these inconsistencies. In case of CHO 1.2 model, however, the major obstacle towards model curation remains to be the paucity of gene deletion information for CHO-K1 cell lines, or the related mouse embryonic stem cells. A comprehensive gene deletion databank, similar to that available for E. coli an S. cerevisiae, will greatly contribute towards improving the performance of the current model.

Author Contributions
CDM conceived and coordinated the study, participated in its design and helped to draft the manuscript. RC performed the simulations and drafted the manuscript. RC and AC performed data analysis. All authors read and approved the final manuscript.

Conflicts of Interest
The authors declare no conflict of interest.

Appendix
Description of the Abbreviations GPR-Gene Protein Reaction association ES-Essential genes G-Genes whose singular deletion does not affect cell viability SLx-Set of x genes whose simultaneous deletion is lethal. For example, SL3 refers to a set of 3 genes that constitute a lethal triplet ESES-Genes whose deletion reduces cellular growth below the viability threshold for both in silico and in vivo cases GG-Genes which when deleted in silico and knocked out in vivo does not decrease cell growth in both cases GES-This represents the list of genes which when deleted in silico does not affect cell growth. However, in vivo knockout of such genes, leads to cell death. SL2ES-A non-essential gene whose simultaneous deletion with another non-essential gene is lethal in silico, however, at least one of which is essential in vivo. ESSL2-This are those gene pairs for which a simultaneous knockout of both genes in vivo causes cell death. However, at least one of the genes of the pair when knocked out in silico causes cell death. SL2G-This is the list of gene pairs where simultaneous deletion of both genes in silico causes cell death. However, a simultaneous in vivo knockout of both genes is non-lethal for the cell. GSL2-This is the list of gene pairs where simultaneous deletion of both genes is non-lethal in silico. However, a simultaneous in vivo knockout of both genes is lethal for the cell. SL2SL2-Simultaneous removal of the genes of this list of gene pairs causes cell death in silico and in vivo both. However, a single gene deletion was not lethal in both in silico and in vivo. SL2U-Simultaneous deletion of the genes of this list of gene pairs causes in silico lethality. However, there exist no in vivo evidence for the same. This puts them to the in vivo untested (U) category.