A Genome-Scale Metabolic Model for the Human Pathogen Candida Parapsilosis and Early Identification of Putative Novel Antifungal Drug Targets

Candida parapsilosis is an emerging human pathogen whose incidence is rising worldwide, while an increasing number of clinical isolates display resistance to first-line antifungals, demanding alternative therapeutics. Genome-Scale Metabolic Models (GSMMs) have emerged as a powerful in silico tool for understanding pathogenesis due to their systems view of metabolism, but also to their drug target predictive capacity. This study presents the construction of the first validated GSMM for C. parapsilosis—iDC1003—comprising 1003 genes, 1804 reactions, and 1278 metabolites across four compartments and an intercompartment. In silico growth parameters, as well as predicted utilisation of several metabolites as sole carbon or nitrogen sources, were experimentally validated. Finally, iDC1003 was exploited as a platform for predicting 147 essential enzymes in mimicked host conditions, in which 56 are also predicted to be essential in C. albicans and C. glabrata. These promising drug targets include, besides those already used as targets for clinical antifungals, several others that seem to be entirely new and worthy of further scrutiny. The obtained results strengthen the notion that GSMMs are promising platforms for drug target discovery and guide the design of novel antifungal therapies.


Introduction
In a world of climate and social change, human susceptibility to microbial disease is increased. In particular, fungal infections have seen a significant rise in incidence worldwide since the 1980s, with Candida spp. accounting for the majority of cases [1]. Although Candida albicans is the most commonly isolated species from candidiasis patients, the 1990s saw a shift in incidence within the genus towards non-Candida albicans Candida species (NCAC) [2]. From these, Candida parapsilosis has seen one of the most significant increases, often surging as the second most common etiological agent of Candida spp. infections, subverting historical trends in species incidence and even outranking C. albicans in some European countries [3]. Non-geographically restricted and with a broad range of virulence factors, adding to C. parapsilosis' already complex pathogenicity, are both the rise in resistance to first-line antifungals and the intrinsically lower susceptibility to alternative therapies, such as azoles [4] and echinocandins [1], respectively. Thus, there is a strong need to develop new antifungal therapies and develop new research tools to understand

Model Construction
The herein described metabolic model reports on the yeast Candida parapsilosis with the taxonomic ID 5480. Model reconstruction was performed using merlin 4.0.5 [17], and further curation and validation were performed on OptFlux 3.0 [15] using the IBM CPLEX solver. Throughout the curation process, reactions were edited, manually added to, or removed from the model to correct gaps in the network using KEGG pathways, MetaCyc Database, and literature data as standards.

Enzyme and Reaction Annotation
The initial draft model construction comprised enzyme and subsequent reaction annotation. The genome sequence of the reference strain Candida parapsilosis CDC317 was obtained from NCBI's Assembly database, accession number ASM18276v2 [18], and the Taxonomy ID from NCBI [19], which is required by merlin to identify the organism under study throughout the reconstruction process univocally. The genome-wide functional annotation was processed by merlin based on taxonomy and frequency of similar sequences through remote Basic Local Alignment Search Tool (BLAST) [20] similarity searches to the UniProtKB/Swiss-Prot database [21]. Hit selection was performed as described elsewhere [22], and phylogenetic proximity was implemented as described in Tsui et al. 2008 [23]. Protein-reaction associations available in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [24] database were used to assemble the draft network.

Correcting Reaction Reversibility, Directionality, and Balance
The initial reversibility curation was automatically performed by merlin, which implements information from remote tools such as eQuilibrator [25], as described by Dias et al. [22]. Further curation was entirely manual and justified, resorting to information from Meta-Cyc [26] and existing literature. Unbalanced reactions were identified automatically, and balancing was performed manually and based on MetaCyc [26], ChEBI [27], Brenda [28], and existing literature. All the reactions manually edited during the curation process can be found in Supplementary file S1.

Compartmentalisation
Compartmentalisation was implemented using WoLF PSORT [29], a protein localisation predictor, on the already connected non-compartmentalised model to simplify issue solving. The constructed model includes four compartments: extracellular, cytoplasm, mitochondrion, and peroxisome and one intercompartment: the cytoplasmic membrane.
A compartmentalised model calls for the implementation of transport reactions to connect intercompartment pathways. Transport reactions were generated using genomic information, together with the public database TCDB [30], using merlin's integrated tool TranSyT [31]. Transport reactions across internal and external membranes for currency metabolites, such as H 2 O, CO 2 , and NH 3 , often carried by facilitated diffusion, were added to the model with no gene association.

Defining the Biomass Equation
The biomass equation encompasses the cells' major components and their relative numerical contributions-DNA, RNA, carbohydrates, lipids, and proteins-and acted as the objective function in the presented essentiality analysis. The content of each component was determined based on the literature. All the calculations were performed as described previously [32].
The reconstructed model also includes ATP requirements for both biomass production and cell maintenance-growth-associated maintenance (GAM) and non-growth-associated maintenance (NGAM), respectively. A GAM value of 25.65 mmol ATP/gDCW was considered for the biomass equation, calculated based on the ATP requirements for the biosynthesis of cell polymers, as shown in Mishra et al. [33], then adjusted for the considered biomass macromolecule composition.
Non-growth-associated ATP maintenance, the amount of ATP required by the cell repair and similar processes, was implemented in this model as an autonomous equation, thus forcing a basal ATP consumption-flux bounds inferred from Candida tropicalis [33]. The biomass equation's components and their relative content are shown in Supplementary file S1.
The theoretical ratio used in the S. cerevisiae iMM904 metabolic model for the phosphorusto-oxygen ratio was applied. Three generic reactions contributing to this ratio were automatically generated by merlin and were updated to replicate the same ratio as in the iMM904 model:

Model Simulations and Enzyme Essentiality Prediction
The model simulations were performed using the flux balance analysis (FBA) [34] formulation on OptFlux 3.0 [15] using the IBM CPLEX solver. The determination of critical essential genes or reactions was performed with the following rationale: a gene/reaction is considered essential if, when removed from the model, this leads to a value of biomass flux of less than 5% of the reference value calculated for the wild-type strain. The essentiality of a gene/reaction was assessed by setting the flux of the reactions corresponding to a particular gene to zero and simulating the optimal growth rate with FBA. If deletion of one gene/reaction leads to non-growth, that gene/reaction is defined as essential. The simulations for gene/reaction essentiality were performed in environmental conditions simulating the RPMI medium, which mimics human serum.

Aerobic Batch Cultivation
Precultures (100 mL) for aerobic batch experiments were grown in SMM in 500mL flasks at 30 ºC in orbital agitation (250 rpm). Cells were grown until the exponential phase and used to inoculate fresh media at an initial optical density at 600nm (OD 600nm ) of 0.3. Aerobic batch cultivations were incubated in SMM at 30 ºC with orbital agitation (250 rpm) for 10 h.

Cell Density, Dry Weight, and Metabolite Concentration Assessment
During C. parapsilosis cell cultivation in SMM medium, 4 mL samples were harvested from the cell culture every two h, aiming for the quantification of biomass and extracellular metabolites. Cell density was monitored by measuring the OD 600nm . For dry weight determination, culture samples were centrifuged at 13,000 rpm for 3 min and the pellets were lyophilised for 72 h at −80 • C and weighted. The supernatants were centrifuged once more for clarification and the concentrations of glucose, ethanol, glycerol, and acetic acid in the supernatants were determined by HPLC on an Aminex HPX-87 H Ion Exchange chromatography column, eluted with 0.0005 M H 2 SO 4 at a flow rate of 0.6 mL/min at room temperature. Concentrations were determined through the corresponding calibration curves. Samples from any batch cultivation were analysed in triplicate. The specific growth rate, specific glucose consumption rate, and the specific production rates of ethanol, glycerol and acetic acid were calculated during the exponential growth phase as indicated elsewhere [35].

Model Characteristics, Highlighting C. Parapsilosis Unique Metabolic Features
The herein described C. parapsilosis metabolic model, iDC1003, comprises 1003 genes associated with 1804 reactions-of which 358 are drains (exchange constraints set to mimic the environmental conditions) and 536 are transport reactions-and 1278 metabolites across four compartments (extracellular, cytoplasm, mitochondria, and peroxisome) and within an intercompartment, the plasma membrane. The model can be found in an SBML format in Supplementary file S2.
Manual curation assessed a total of 847 reactions, from which 83 were mass balanced, 162 were corrected regarding reversibility or directionality, and 625 were added or removed from the model or had their annotation corrected or completed, as detailed in Supplementary file S1.
The biomass equation (Table 1) encompasses the cell's major components, along with their respective and relative contributions-DNA, RNA, carbohydrate, protein, lipid, and cofactor content. The equation's composition in carbohydrate [36], lipid [33,37]-sterol [33], phospholipid [33], and fatty acid [38]-and protein [33] was inferred from literature data. On the other hand, for the composition of DNA, the whole genome sequence was used to estimate the amount of each deoxyribonucleotide, as described in [39], while mRNA, rRNA, and tRNA were used to estimate the total RNA in the cell, as described in [9]. Essential metabolites were included in the biomass composition to account for the essentiality of their synthesis pathways 39 qualitatively. The growth and non-growth ATP requirements were inferred from Candida tropicalis [33]. The iDC1003 model was compared to the well-characterised genome-scale metabolic models of C. glabrata [40], S. cerevisiae [41], and C. albicans [14] to highlight unique metabolic features of C. parapsilosis. Although iDC1003 uses standard identifiers for reactions (KEGG ID), it is not possible to assess how the reactions differ among the four models, as the remaining models do not use the same identifiers (except for C. albicans). Thus, a comparison across the existing models was carried out based on the proteins associated with an EC number. After intersecting the EC numbers present in each of the three models, 85% of the proteins with an associated EC number in the C. parapsilosis model were found to also be present in at least one of the other three models (Figure 1). However, the remaining 15% (89/528) are exclusive to this model and may represent unique metabolic features of C. parapsilosis. It is also interesting to observe that 41 EC numbers are shared exclusively by Candida species and not by S. cerevisiae. These 41 enzymatic activities may be related to unique features of this genus, eventually linked to its pathogenicity. The complete list of unique EC numbers can be found in Supplementary file S1. model were found to also be present in at least one of the other three models (Figure 1). However, the remaining 15% (89/528) are exclusive to this model and may represent unique metabolic features of C. parapsilosis. It is also interesting to observe that 41 EC numbers are shared exclusively by Candida species and not by S. cerevisiae. These 41 enzymatic activities may be related to unique features of this genus, eventually linked to its pathogenicity. The complete list of unique EC numbers can be found in Supplementary file S1. Occasionally, unique EC numbers might be related to outdated EC numbers or associated with other enzymes responsible for the same reactions in the different models. Nevertheless, specific cases stand out as potentially unique features of C. parapsilosis: -1. , which allows the subsequent incorporation of the latter into proteins such as apotransferrin and lactoferrin. Occasionally, unique EC numbers might be related to outdated EC numbers or associated with other enzymes responsible for the same reactions in the different models. Nevertheless, specific cases stand out as potentially unique features of C. parapsilosis:

Assessing the Model's Ability to Predict Carbon and Nitrogen Source Usage
Simulations were performed in SMM and compared to phenotypic growth data from existing literature to assess iDC1003's reliability in predicting biomass production from different sole carbon or nitrogen sources. Data related to C. parapsilosis strains, other than the reference CDC317 strain, were also considered in the analysis to increase the number of carbon and nitrogen sources accounted for. A total of 47 sole carbon sources and 17 sole nitrogen sources were evaluated. The C. parapsilosis model correctly predicted growth in 94% of the carbon sources tested and in 100% of the nitrogen sources ( Table 2). The model only failed for three carbon sources, 2-Keto-D-gluconic acid, L-arabinose, and ribitol, which the literature indicates that C. parapsilosis can use for growth. Interestingly, as far as it could be assessed, there is no experimental evidence of any enzymes behind these pathways existing in yeasts. It would be interesting to look deeper into these organisms' eventually unique 2-Keto-D-gluconic acid, L-arabinose, and ribitol assimilation pathways. Table 2. In silico predictions versus in vivo described data regarding C. parapsilosis' ability to grow in the presence of sole carbon and nitrogen sources. From the 62 different tested compounds, iDC1003 correctly predicted positive or null biomass production on 95%. A plus represents biomass production (+), a minus (−) no biomass production, and prediction disparities are highlighted in bold. Referenced data from Westerdijk fungal collection refer to strains CBS 1954 and CBS 604.

In Vivo
In

Assessing the Model's Ability to Quantitatively Predict Growth Parameters
Specific growth rate, glucose consumption rate, and metabolite production rates were experimentally determined to validate the model quantitatively due to the lack of literature data for C. parapsilosis. C. parapsilosis CDC 317 cells were cultivated in SMM medium, and growth was followed by regular measurements of the OD 600nm and the cell dry weight. Samples were harvested to assess glucose concentration as a function of time during the exponential growth phase. In these conditions, a glucose consumption rate of 2.098 +/− 0.404 mmol·gDCW −1 ·h −1 was determined, while no ethanol, glycerol, or acetate production could be detected.
A simulation of the system's behaviour in environmental conditions of SMM medium with a fixed glucose uptake flux of 2.098 mmol·g −1 dry weight·h −1 was performed to evaluate the model's ability to predict the specific growth rate. The remaining nutrient fluxes were left unconstrained, as the system in these conditions is glucose-limited. Considering the experimentally determined glucose consumption rate of 2.098 mmol·gDCW −1 ·h −1 , the model predicted a specific growth rate of 0.172 h −1 . Compared to the experimentally determined rate of 0.159 +/− 0.027 h −1 , the predicted growth rate is within the uncertainty interval of the experimentally determined parameter. Thus, there is no significant difference between both, which suggests iDC1003 can predict C. parapsilosis growth parameters (Table 3) quantitatively. Additionally, the formation of glycerol, acetic acid, and ethanol as byproducts was not predicted to occur, which agrees with the experimental data and the notion that C. parapsilosis is a Crabtree-negative yeast. Altogether, iDC1003 was proved to predict the main metabolic features of C. parapsilosis quantitatively.

Enzyme Essentiality Assessment: Looking for New Drug Targets
Identification of essential enzymes of a given pathogen should, in principle, lead to the identification of good drug targets since, if one enzyme is essential for its growth or survival, a compound capable of inhibiting it could potentially be used as a drug with pharmacological activity against that pathogen. The drug target will be ideal if it is essential, or at least essential under conditions that mimic the human host environment, while having no human homolog. Based on these principles, iDC1003 was used to identify potential new drug targets by determining enzyme essentiality. For that, a list of essential reactions was obtained through simulation of the system's behaviour in RPMI medium, which mimics the environmental conditions of human serum. A total of 147 enzymes were predicted as essential in iDC1003, excluding drains, transport reactions and those without an associated or incomplete EC number. The complete list of predicted essential enzymes can be found in Supplementary file S1.
Finally, we decided to intersect the potential drug targets obtained through the C. parapsilosis model with those resulting from the two existing models for pathogenic Candida species, C. albicans [14] and C. glabrata [40], to focus on essential enzymes that can be used as targets in the treatment of infections caused by all Candida species. After the intersection, 56 essential enzymes common to the three models were found (Table 4, Figure 2).
Genes 2022, 13, x FOR PEER REVIEW 9 of 14 be used as targets in the treatment of infections caused by all Candida species. After the intersection, 56 essential enzymes common to the three models were found (Table 4, Figure 2).

Figure 2.
Intersection of C. parapsilosis, C. albicans, and C. glabrata essential EC numbers in RPMI medium environmental conditions in the genome-scale metabolic models iDC1003, iRV781, and iNX804, respectively. Diagrams were obtained using Multiple List Comparator (www.molbiotools.com (accessed on 1 December 2021)).
Consistently, well-established antifungal drug targets were identified, including the well-known targets of azoles and echinocandins, Erg11 and Fks1, respectively. Additionally, some of the identified predicted drug targets are homologs of enzymes used Figure 2. Intersection of C. parapsilosis, C. albicans, and C. glabrata essential EC numbers in RPMI medium environmental conditions in the genome-scale metabolic models iDC1003, iRV781, and iNX804, respectively. Diagrams were obtained using Multiple List Comparator (www.molbiotools. com (accessed on 1 December 2021)). Table 4. Enzymes predicted to be essential in RPMI medium based on the screening of the genomescale metabolic models of C. parapsilosis, iDC1003, C. albicans, iRV781, and C. glabrata, iNX804. Consistently, well-established antifungal drug targets were identified, including the well-known targets of azoles and echinocandins, Erg11 and Fks1, respectively. Additionally, some of the identified predicted drug targets are homologs of enzymes used as drug targets against other pathogenic organisms, including Imh3, which is targeted by inosinic acid in Streptococcus pyogenes, Fol1, a target of sulfonamides, and Fas1, a target of ethionamide, used in the treatment against Mycobacterium tuberculosis. These confirmatory results illustrate the potential of the used approach in the quest for new drug targets. More interesting, however, is the identification of numerous potential targets, as identified herein, that are not currently targeted by any drug used in clinical practice. There are some new targets with tremendous potential, as these do not have an orthologous enzyme in humans. Although not an excluding factor, the absence of a human ortholog is a preferable attribute, as this translates into lower chances of host drug toxicity and may allow for greater freedom of drug design.

Gene
Among the identified potential new drug targets, Abz1/2, Erg4, and Ura1 emerge as enzymes without any human homolog or drug association.
Fungi rely on folate de novo biosynthesis given their inability to uptake folate from the environment [50]. FOL1, ABZ1, and ABZ2 encode key enzymes in the folate biosynthesis pathway (Figure 3). Furthermore, these enzymes have no human orthologs, as humans rely on diet-derived folate [50]. The dihydropteroate synthase encoded by FOL1 has been shown to be successfully inhibited by antifolates, such as sulfonamides, in a series of microorganisms [50,51]. However, antifolate therapy for Candida infections is not particularly effective considering current antifolate compounds [52]. In fact, for C. albicans, only sulfanilamide is used clinically, and it is restricted to topical use [53]. Given the efficacy of antifolates in treating infections by other etiologic agents, this might present the opportunity to design new effective antifungal compounds against Candida Fol1. On the other hand, no inhibitors of the para-aminobenzoate synthetase encoded by ABZ1 or of the 4-amino-4-deoxychorismate lyase encoded by ABZ2 are currently known, making these two enzymes fully novel putative drug targets worth further exploitation. The Ura1 and Erg4 proteins may also be interesting drug targets. Ura1 has no human ortholog and, although this protein is not the target of any known drug, Aro9, from the same pathway, is a known target of atovaquone used to treat Plasmodium falciparum infections. In turn, Erg4, also with no human ortholog, is involved in ergosterol biosynthesis, a pathway currently targeted by azole drugs.

Conclusions
The first validated global metabolic model for the human pathogen C. parapsilosis is presented in this study. The model was manually curated and validated experimentally and proved to be able to predict the main metabolic features of C. parapsilosis quantitatively. Furthermore, iDC1003 is robust in predicting the use of several carbon and nitrogen sources. The model shares 85% of the proteins with an associated EC number in other published yeast models. However, 15% of them are exclusive to this model and may represent some unique metabolic features of C. parapsilosis. Using iDC1003, several enzymes were predicted to be essential in RPMI medium, including some already known targets of antifungal agents and other antimicrobial agents used in clinical practice, illustrating the potential of the used approach in the quest for new drug targets. Several of the identified potential drug targets are not currently targeted by any drug used in clinical practice, and deserve further study. Among the identified potential new drug targets, Abz1/2, Erg4, and Ura1 stand out as enzymes without any human homolog or drug association. However, all other targets are worthy of scrutiny as, in fact, many of the drugs currently used in clinical practice have human orthologs. In these cases, the strategy may involve taking advantage of the structural differences between the proteins in the two organisms to design efficient compounds.
Despite the clear usefulness of these types of models, it is important to highlight that these reconstructions have limitations. Firstly, the basis of GSMMs is the genome's functional annotation. Depending on the stringency of the criteria imposed on hit selection, such a procedure may lead either to compromising rates of false positive or The Ura1 and Erg4 proteins may also be interesting drug targets. Ura1 has no human ortholog and, although this protein is not the target of any known drug, Aro9, from the same pathway, is a known target of atovaquone used to treat Plasmodium falciparum infections. In turn, Erg4, also with no human ortholog, is involved in ergosterol biosynthesis, a pathway currently targeted by azole drugs.

Conclusions
The first validated global metabolic model for the human pathogen C. parapsilosis is presented in this study. The model was manually curated and validated experimentally and proved to be able to predict the main metabolic features of C. parapsilosis quantitatively. Furthermore, iDC1003 is robust in predicting the use of several carbon and nitrogen sources. The model shares 85% of the proteins with an associated EC number in other published yeast models. However, 15% of them are exclusive to this model and may represent some unique metabolic features of C. parapsilosis. Using iDC1003, several enzymes were predicted to be essential in RPMI medium, including some already known targets of antifungal agents and other antimicrobial agents used in clinical practice, illustrating the potential of the used approach in the quest for new drug targets. Several of the identified potential drug targets are not currently targeted by any drug used in clinical practice, and deserve further study. Among the identified potential new drug targets, Abz1/2, Erg4, and Ura1 stand out as enzymes without any human homolog or drug association. However, all other targets are worthy of scrutiny as, in fact, many of the drugs currently used in clinical practice have human orthologs. In these cases, the strategy may involve taking advantage of the structural differences between the proteins in the two organisms to design efficient compounds.
Despite the clear usefulness of these types of models, it is important to highlight that these reconstructions have limitations. Firstly, the basis of GSMMs is the genome's functional annotation. Depending on the stringency of the criteria imposed on hit selection, such a procedure may lead either to compromising rates of false positive or negative annotations. Furthermore, such models do not encompass regulatory processes due to the high complexity of such an integration. Note how enzymatic activity can be regulated at different levels-from gene expression to post-translational modifications-and may result in given pathways being preferential in certain environmental conditions. The exclusion of such mechanisms, particularly regarding essentiality prediction, may result in predicted essential ECs that would otherwise not be metabolically relevant in the conditions of interest. Lastly, these simulations do not consider supra-metabolic factors, such as stress factors. Even so, and considering all these limitations, GSMMs allow for increasingly guided and reliable drug target discovery-as illustrated in this paper. Funding: This work was supported by "Fundação para a Ciência e a Tecnologia" (FCT) (Contract PTDC/BII-BIO/28216/2017 and AEM PhD grant to RV). Funding received from project LISBOA-01-0145-FEDER-022231-the BioData.pt Research Infrastructure is acknowledged. This work was further financed by national funds from FCT in the scope of the project UIDB/04565/2020 and UIDP/04565/2020 of the Research Unit Institute for Bioengineering and Biosciences-iBB, project UIDB/04469/2020 for the Centre of Biological Engineering-CEB, and the project LA/P/0140/2020 of the Associate Laboratory Institute for Health and Bioeconomy-i4HB.

Data Availability Statement:
All data generated in this study is available in the paper or as supplementary material.

Conflicts of Interest:
The authors declare no conflict of interest.