Modular Engineering of Biomass Degradation Pathways

Production of fuels and chemicals from renewable lignocellulosic feedstocks is a promising alternative to petroleum-derived compounds. Due to the complexity of lignocellulosic feedstocks, microbial conversion of all potential substrates will require substantial metabolic engineering. Non-model microbes offer desirable physiological traits, but also increase the difficulty of heterologous pathway engineering and optimization. The development of modular design principles that allow metabolic pathways to be used in a variety of novel microbes with minimal strain-specific optimization will enable the rapid construction of microbes for commercial production of biofuels and bioproducts. In this review, we discuss variability of lignocellulosic feedstocks, pathways for catabolism of lignocellulose-derived compounds, challenges to heterologous engineering of catabolic pathways, and opportunities to apply modular pathway design. Implementation of these approaches will simplify the process of modifying non-model microbes to convert diverse lignocellulosic feedstocks.


Introduction
Lignocellulosic biomass is the most abundant renewable feedstock for production of fuels and chemicals. The U.S. Department of Energy currently projects that the United States could produce more than one billion tons of biomass per year, offsetting approximately 30% of petrochemical consumption in the country [1]. Biofuels are the largest potential market for biomass-derived chemicals, with corn-based ethanol the major current market [2]. In contrast to starch-based fuels, lignocellulosic fuels and chemicals have the advantage of not directly competing with the food supply and have the potential for lower feedstock costs. However, using lignocellulose as a feedstock brings significant additional complexity.
Lignocellulose is a complex matrix of oligosaccharides and aromatic compounds, providing plants with strength, rigidity, and pathogen defense. It is composed of three major polymers: cellulose, hemicellulose, and lignin. The composition of each polymer varies depending on the tissue, species, and plant age. Cellulose makes up 35-50% of the lignocellulosic biomass and is a homopolymer of glucose. Hemicellulose can comprise 20-35% of the total dry cell weight and has an irregular structure of heteropolymers formed from several different five- and six-carbon monosaccharides, including xylose, arabinose, mannose, glucose, and galactose. Finally, lignin is a phenylpropanoid polymer that provides 10-25% of the total dry cell weight of wood. Lignin is randomly polymerized from three different monolignols: coniferyl alcohol, p-coumaryl alcohol, and sinapyl alcohol. The chains of cellulose are bundled together to form microfibrils, which are then covered in hemicellulose and lignin [3]. The cellulose, hemicellulose, and lignin fibers are then further packed together to form macrofibrils [4][5][6].
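To make the composition ranges above concrete, the following minimal Python sketch converts them into mass ranges for a given quantity of dry biomass and estimates an upper bound on glucose released from cellulose. The function names are illustrative, and the hydration factor (180.16/162.14, the mass ratio of glucose to an anhydroglucose unit) is an assumption added here rather than a value from the text; complete hydrolysis is assumed, so real yields would be lower.

```python
# Composition ranges (fraction of dry weight) as quoted in the text above.
COMPOSITION_RANGES = {
    "cellulose": (0.35, 0.50),
    "hemicellulose": (0.20, 0.35),
    "lignin": (0.10, 0.25),
}

# Assumption (not from the text): hydrolysis adds one water per anhydroglucose
# unit, so released glucose weighs 180.16 / 162.14 times the glucan mass.
GLUCAN_TO_GLUCOSE = 180.16 / 162.14


def polymer_mass_ranges(dry_mass_kg):
    """Return {polymer: (low_kg, high_kg)} for the given dry biomass mass."""
    return {
        name: (low * dry_mass_kg, high * dry_mass_kg)
        for name, (low, high) in COMPOSITION_RANGES.items()
    }


def max_glucose_from_cellulose(dry_mass_kg):
    """Upper-bound glucose range (kg), assuming complete cellulose hydrolysis."""
    low, high = polymer_mass_ranges(dry_mass_kg)["cellulose"]
    return (low * GLUCAN_TO_GLUCOSE, high * GLUCAN_TO_GLUCOSE)


if __name__ == "__main__":
    for polymer, (low, high) in polymer_mass_ranges(1000.0).items():
        print(f"{polymer}: {low:.0f}-{high:.0f} kg per tonne dry biomass")
    g_low, g_high = max_glucose_from_cellulose(1000.0)
    print(f"glucose (upper bound): {g_low:.0f}-{g_high:.0f} kg")
```

Because the ranges overlap between polymers, the low and high bounds should be read per polymer rather than summed; any real feedstock would need measured composition data.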
Figure 1. Process diagram of lignocellulosic biofuel production. Plant biomass is pretreated to loosen the cell wall structure for enzymatic hydrolysis and the release of fermentable sugars from structural polysaccharides, such as cellulose. Sugars are then fermented into biofuels or other chemical products. Pretreatment, enzymatic hydrolysis, and fermentation can theoretically be performed together via consolidated bioprocessing using specialized microbes. Lignocellulose biorefining generates a lignin-rich waste stream that can be depolymerized using a variety of techniques. Lignin depolymerization can alternatively occur prior to carbohydrate extraction using techniques such as reductive catalytic fractionation. Lignin depolymerization generates complex mixtures of lignin-derived monomers that can be microbially converted to value-added chemicals, thus increasing the value of these waste streams.
The second challenge is identifying organisms that readily metabolize the lignocellulosic monomers, can tolerate growth inhibitors released in the pretreatment process, are robust enough to withstand large-scale commercialization environments, and produce a single desired product. No wild-type microbe has all these desired characteristics [13], necessitating genetic engineering to introduce the missing capabilities. Most model microbes will metabolize glucose, but pathways for catabolism of pentose sugars and lignin-derived aromatic compounds are less common. Recent efforts have focused on engineering well-studied model organisms to broaden the range of potential substrates, such as Saccharomyces cerevisiae [14], Corynebacterium glutamicum [15], Escherichia coli [16], Zymomonas mobilis [17], Bacillus subtilis [18], and Pseudomonas putida [19].
Each combination of feedstock, deconstruction method, and product will impose different requirements on the microbes used for conversion. Even after substantial engineering, no single microbe will be optimal for all potential conditions. Instead, different organisms will be selected and engineered for particular uses. Current model organisms are well-suited for specific applications, such as conversion of glucose to high-value products by E. coli and S. cerevisiae, but new applications may require the identification and engineering of new organisms with desirable characteristics. Rather than engineering complex phenotypes into model organisms, it may be simpler to use strains that, for example, are natively capable of degrading unpretreated biomass, catabolizing lignin-derived aromatics, or tolerating biomass-derived inhibitors [20]. Since feedstock composition is highly variable and pathways for lignocellulose catabolism are not universal, rapid deployment of new microbes with wide substrate ranges, such as P. putida [21], Kluyveromyces marxianus [22], and Scheffersomyces stipitis [23], will require facile methods to engineer non-model microbes and allow the use of additional carbon sources.
Optimization of metabolic pathways in heterologous hosts is often difficult. Some basic challenges are consistent from pathway to pathway and organism to organism, such as methods for stable integration of new genetic material [24] or the development of transcriptional and translational control elements that function predictably in the new host [25]. However, the introduction of new metabolic pathways, such as those for catabolism of lignocellulosic sugars and aromatic compounds, can be a significant perturbation to the metabolic and regulatory networks of the host. For example, heterologous pathways may produce new metabolites that interact with native enzymes and transcription factors, or heterologous enzymes can divert native metabolites into nonproductive pathways [26,27]. Understanding the challenges involved in heterologous pathway expression will allow us to engineer new metabolic modules that minimize these impacts [28]. When deleterious interactions cannot be avoided, careful characterization will still allow the deliberate selection of the module with the least impact.
Development of a new metabolic module will require three main steps. First, the catabolic enzymes and any necessary transporters must be identified. Second, the enzymes must be carefully characterized in heterologous hosts to identify limitations in their use. Finally, the catabolic modules can be optimized to maximize portability and identify contraindications for use under specific circumstances. In the following sections, we will describe current efforts in these areas, focusing on applications to catabolism of sugars and aromatic compounds derived from lignocellulose.

Lignocellulose Composition
Lignocellulose composition differs between biomass sources, in both the relative proportions of the major polymers and the subunit composition of those polymers. Pathway engineering efforts will depend on the choice of host and feedstock to ensure that the engineered strain's catabolic capabilities match the feedstock composition. Some organisms may be easier to engineer for catabolism of a particular feedstock, so these factors should be considered early in the design process.

Cellulose Composition
Cellulose is a relatively predictable polymer, varying mainly in its degree of crystallinity. The products of cellulose depolymerization are typically glucose or cellobiose, a disaccharide composed of two glucose molecules linked by a β(1,4) bond. Cellobiose can be hydrolyzed into glucose monomers naturally by some microbes [29][30][31], by acid catalysis, or enzymatically. Alternatively, cellobiose has recently been fed directly to microbes and catabolized into glucose in vivo [19,32,33]. Hemicellulose and lignin composition are much more variable and more dependent on the deconstruction method, and are therefore discussed in more detail.

Hemicellulose Composition
Hemicellulose is a mixed polymer made up of four main polysaccharide types: xylans, mannans, β-glucans, and xyloglucans [34]. The structure of each polysaccharide varies in glycosidic linkages, side-chain types, localization, and distribution. Different plant species have varying ratios of each polysaccharide, contributing to the diversity of physical plant structures and to the challenge of engineering microbes to degrade plant biomass (Figure 2).
Figure 2. Major carbohydrate composition of potential sources of lignocellulosic feedstock: birch [35], pine [35], switchgrass [36], corn cob [37], and wheat straw [38]. In addition to the sugars highlighted here, total biomass also includes lignin and minerals [39].
Xylans are the main saccharide component of hemicellulose in secondary cell walls, making up 20-30% of the biomass of dicots such as herbaceous and hardwood plants, and up to 50% of the biomass of monocots such as grasses and cereals. Xylans can exist as heteropolymers with a β-(1,4)-D-xylopyranose backbone, containing branches of D-glucuronic acid, 4-O-methyl ether, L-arabinose, or oligosaccharides of L-arabinose, D-xylose, D- or L-galactose, or D-glucose [40].
Mannan saccharides occur in the greatest quantity in softwoods, in the form of either galactomannans or glucomannans. Galactomannans are built from a linear β-(1,4)-linked D-mannopyranose backbone, with galactose attached through α-(1,6) linkages. Glucomannans are composed of a glucose and mannose backbone, linked by β-(1,4) bonds, with one galactose side chain, and sometimes acetylated hydroxyl groups on the mannose backbone. Low levels of glucuronic acid are present in the cell wall, comprised of alternating mannose residues and glucuronic acid residues in the backbone, with xylose, galactose, and arabinose side chains [34].
Xyloglucans are found in the primary cell walls of both hardwoods and softwoods [41] and are composed of a main chain of α-D-xylopyranose and D-glucopyranose residues (in a 3:1 ratio) linked by β-(1,4) bonds, in a similar pattern to cellulose. Side chains of xyloglucan can be composed of mono-, di-, or trisaccharides of xylose, fructose, or galactose [42].
The carbohydrate components of lignocellulose have been fractionated into their respective monomer components through multiple chemical methods, including acid-catalyzed carbohydrate (CAH) conversion [43], diluted acid-catalyzed (DAH) conversion [44], acidified ionic liquid conversion [45,46], and γ-valerolactone (GVL)/H2SO4 conversion [47]. Mechanical, enzymatic, and pyrolytic techniques are also commonly used for the extraction of sugar monomers [48]. Each method has its own advantages and disadvantages, depending on the source of feedstock and the desired downstream use.

Lignin Composition
Lignin is a heterogeneous aromatic polymer constituting roughly 15-30% of the major sources of plant biomass [49]. Lignin chemical composition differs dramatically between plant feedstocks [50] (Table 1). Plants vary in the type and relative proportion of lignin monomers that constitute their lignin, which impacts the compounds available for lignin valorization. Hardwoods contain a greater proportion of lignin derived from sinapyl alcohol (S-lignin), whereas softwoods contain primarily coniferyl alcohol-based lignins (G-lignin) [51]. Grass substrates such as corn stover or switchgrass contain an additional p-coumaryl alcohol-based lignin (H-lignin) as well as ferulic acid (FA) crosslinks between lignin and hemicellulose [52]. These and some less prominent monomers, such as tricin in grasses [53], crosslink to form C-C and C-O bonds in varying proportions and types. The major difference between these lignin types is the number of methoxy groups on each aromatic ring: two for S-lignin, one for G-lignin, and none for H-lignin [69]. The S/G ratio is commonly used to describe native lignin in different biomass types and is a proxy for the degree of methoxylation and interunit bond proportions [70]. The S/G ratio correlates positively with the abundance of β-O-4 bonds, a relatively easily cleaved interunit bond (Table 1). Lignin methoxylation dictates what type of enzymatic conversions will be required for biological valorization, as dimethoxylated and monomethoxylated lignin monomers are degraded by different pathways [71]. High S/G and low S/G feedstocks will therefore likely require differently engineered production strains to maximize monomer conversion.
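The relationship between monomer composition, S/G ratio, and methoxylation described above can be sketched in a few lines of Python. The methoxy counts per unit (two for S, one for G, none for H) come from the text; the function names and the example abundances are hypothetical and chosen only for illustration, not taken from any measured feedstock.

```python
# Methoxy groups per aromatic ring for each lignin unit type (from the text).
METHOXY_GROUPS = {"S": 2, "G": 1, "H": 0}


def sg_ratio(abundance):
    """Return the S/G ratio from relative unit abundances (any consistent units)."""
    if abundance["G"] == 0:
        raise ValueError("S/G ratio is undefined when no G units are present")
    return abundance["S"] / abundance["G"]


def mean_methoxy_per_ring(abundance):
    """Return the abundance-weighted mean number of methoxy groups per ring."""
    total = sum(abundance.values())
    weighted = sum(METHOXY_GROUPS[unit] * amount for unit, amount in abundance.items())
    return weighted / total


if __name__ == "__main__":
    # Hypothetical relative abundances, for illustration only.
    s_rich = {"S": 60, "G": 40, "H": 0}
    g_rich = {"S": 0, "G": 95, "H": 5}
    for label, sample in (("S-rich", s_rich), ("G-rich", g_rich)):
        print(label, sg_ratio(sample), mean_methoxy_per_ring(sample))
```

A higher mean methoxylation implies that more O-demethylation capacity is needed in the production strain, which is one way to read the claim that high and low S/G feedstocks call for differently engineered hosts.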
Cellulosic fuel production involves a combination of pretreatment and enzymatic hydrolysis of the plant feedstock to extract sugars from the (hemi)cellulose, leaving behind a modified lignin-rich solid substrate [72]. Currently available pretreatments are numerous and cannot all be addressed here [43,48,73], but each yields chemically unique lignin feedstocks for biological valorization. Generally, pretreated lignins that retain high β-O-4 bond content, such as hardwood lignin pretreated with alkali [74], result in a greater proportion of metabolizable monomers after depolymerization [75]. However, there is no method that produces reproducible monomer mixtures from all potential feedstocks [76].
After pretreatment and carbohydrate extraction, lignin-rich feedstocks must be depolymerized by one of several methods, each of which introduces its own modifications affecting the final lignin monomer profile [48]. There is no "best" depolymerization method, and these techniques are actively being developed and tested on various pretreated biomass substrates [77,78]. The goal is to convert lignin into a relatively homogeneous, unmodified mixture of monomers that can readily be degraded via known biological decomposition pathways. Process conditions such as temperature [79], type of H-donating catalyst [75,80], or various chemical additives [81,82] can be used to alter final monomer composition and prevent condensation of reactive phenolics. However, all depolymerization techniques yield complex mixtures of lignin monomers and oligomers.

Pathways for Microbial Catabolism of Lignocellulose
Lignocellulosic biomass is ubiquitous in the environment and represents a substantial carbon source. As a result, microbes have independently evolved multiple pathways to metabolize lignocellulose-derived compounds. In most heterotrophic bacteria, sugar catabolism uses a network of core pathways. These pathways have been most extensively studied in E. coli, with recent advances in other industrially important bacteria such as C. glutamicum, B. subtilis, Streptomyces, and Lactococcus lactis [83]. Lignocellulosic sugars can enter this core network at different points depending on the specific catabolic pathway used, as described further in Section 3.1. To produce fuels and chemicals, sugars are typically converted into central metabolites and then reassembled into the desired products. As such, the point of entry into central metabolism affects the ease of conversion to the desired product.
Catabolism of lignin-derived aromatic compounds typically involves conversion to core aromatic intermediates, followed by oxidative ring opening and further degradation. Unlike for sugars, metabolic intermediates preceding central carbon metabolism are currently the desired end products, and engineering efforts focus on directing flux toward these metabolites [84]. In this review, we focus on two metabolites, cis,cis-muconic acid (MA) and 2-pyrone-4,6-dicarboxylate (PDC), with the potential to serve as valuable endpoint metabolites for lignin valorization. Lignin monomer decomposition pathways relevant to their conversion to MA and PDC are described below in Section 3.2.

Xylose
Xylose catabolism has been a primary engineering focus, as xylose is the second most abundant sugar in lignocellulose. Three distinct pathways for xylose catabolism are known: the xylose isomerase pathway, the oxo-reductive pathway, and the oxidative pathways (Dahms or Weimberg) (Figure 3). The xylose isomerase pathway is more common in bacteria, whereas the oxo-reductive pathway is found more commonly in fungi. The oxidative pathways are found mainly in archaea, but also in some bacterial and fungal strains [93].

Other Hemicellulosic Compounds
Pectin is a polysaccharide component of the primary plant cell wall that is found primarily in the non-woody components of terrestrial plants and in fruits and vegetables. Pectin is a heteropolysaccharide, the majority of which (~70%) is made up of an α-(1,4)-linked D-galacturonic acid (D-galUA) backbone. It can be naturally degraded by saprophytes, phytopathogens, and some species from animal gut microbiota [103]. The hydrolysis of pectin generates D-galacturonate, which can then be metabolized by the D-galUA catabolic pathway. D-galacturonate is first reduced to L-galactonate by GAR1, which is then dehydrated to 2-keto-3-deoxy-L-galactonate by the L-galactonate dehydratase LGD1. This intermediate is finally converted into pyruvate and L-glyceraldehyde by GAAC for entry into central metabolism [104].

Sugar Transport
Functional expression of the pentose and hexose metabolic genes is one key component of engineering microbes for flexible feedstock use. The other major module is the ability of the cell to transport the sugars into the cytosol. Multiple types of transporters mediate the selective transport of monosaccharides across bacterial membranes, including ATP binding cassette (ABC) transporters, proton symporters, uniporters, and the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS) (Figure 6).
Transmembrane sugar transporters can be specific to one particular substrate or adaptable to multiple substrates, and some organisms have multiple types of transporters for the same sugar. Of the lignocellulosic sugars, glucose and mannose are the main carbohydrates that use PTS transporters. Pentoses are commonly carried by ABC transporters, proton symporters, or uniporters. ABC transporters have a higher affinity for their substrates than proton symporters or uniporters, and thus are typical targets for engineering sugar transport [105].

Regulation of Sugar Catabolism
In natural environments, bacteria are exposed to a variety of carbon sources and have adapted to selectively use each substrate separately. When a preferred substrate is present, usually glucose, that substrate will prevent the expression of the metabolic genes for catabolism of secondary carbon sources. Carbon catabolite repression (CCR) is accomplished through two different types of mechanisms: inhibition of the transport of the secondary sugar (inducer exclusion), and inhibition of the transcription of the catabolic enzymes [106]. Inducer exclusion is the mechanism of action of the glucose-mediated repression of secondary catabolites in E. coli [107].
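The qualitative logic of CCR can be illustrated with a deliberately simplified Boolean sketch in Python: while the preferred substrate is present, catabolic pathways for all other sugars are treated as switched off. This is a caricature for illustration only; the function name and the all-or-nothing behavior are assumptions, and it does not distinguish inducer exclusion from transcriptional repression or capture partial repression.

```python
def expressed_pathways(available_sugars, preferred="glucose"):
    """Toy Boolean model of carbon catabolite repression.

    Returns the set of sugars whose catabolic pathways are treated as active:
    only the preferred sugar while it is present, otherwise every available sugar.
    """
    available = set(available_sugars)
    if preferred in available:
        return {preferred}
    return available


if __name__ == "__main__":
    # Mixed hydrolysate: secondary sugars are repressed while glucose remains.
    print(expressed_pathways(["glucose", "xylose", "arabinose"]))
    # After glucose is exhausted, the secondary pathways are derepressed.
    print(expressed_pathways(["xylose", "arabinose"]))
```

In this simplified picture, relieving CCR in an engineered strain corresponds to making the function return every available sugar regardless of whether the preferred one is present.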

Phenylpropanoids
Lignin is primarily composed of three randomly crosslinked phenylpropanoids differing in the degree of methoxylation on the aromatic ring [52]. Decomposition of these compounds is typically studied using the acid forms: p-coumaric acid (H-lignin), ferulic acid (G-lignin), and sinapic acid (S-lignin) (Figure 7). Decomposition of all three phenylpropanoids proceeds via conjugation of the γ-acid to Coenzyme A (CoA) and cleavage of one unit of acetyl-CoA [108,109]. Two cleavage pathways are

Aromatic acids are then funneled to protocatechuate (H- and G-lignin) or 3-O-methylgallate/gallate (S-lignin) prior to further metabolism into central carbon intermediates (Figure 7). For benzoate, this is achieved by a single hydroxylation step [113], but vanillate and syringate must be demethylated before conversion into lower pathway intermediates. There are several types of microbial O-demethylases that perform these reactions, differing in their cofactor requirements and substrate specificities. Vanillate can be demethylated by VanAB, a heterodimer composed of a 2Fe-2S oxidase component and a ferredoxin [114][115][116], or by the tetrahydrofolate (THF)-dependent LigM [117][118][119]. Similarly, syringate can be fully demethylated by a 2Fe-2S demethylase [120], or partially demethylated by a THF-dependent demethylase to 3-O-methylgallate, a precursor to PDC [121,122] (Figure 7).

Guaiacol
Guaiacol is not a building block of native lignin, but is a common component of depolymerized lignin streams [79] that can be converted to catechol with a single demethylation step. Guaiacol demethylases have not been described in most model guaiacol-degrading organisms. Recently, a P450 aryl-O-demethylase was shown to demethylate a wide range of substrates including guaiacol [123]. These enzymes can serve as useful parts for engineering production microbes [124].

Lower Catabolic Pathways for Biological Valorization
Bacteria naturally catabolize protocatechuate (PCA) and 3-O-methylgallate (3MGA) to provide carbon and energy using a variety of pathways. For engineering purposes, once lignin monomers are modified by funneling reactions, PCA and 3MGA can be converted to MA or PDC by one of several pathways initiated by conversion to catechol or by dioxygenase-catalyzed ring opening (Figure 8) [125]. PCA can be degraded into central carbon metabolites in one of three ways, depending on where initial ring cleavage takes place. PDC is produced as an intermediate when ring cleavage is initiated by a PCA 4,5-dioxygenase such as LigAB from Sphingobium sp. SYK-6 (Figure 8). 3MGA can be converted to PDC by a ring-opening 3MGA 3,4-dioxygenase (DesZ), which oxidizes and demethylates 3MGA to PDC [126]. For MA production, PCA must be converted to catechol by the PCA decarboxylase AroY, and the aroY gene is commonly engineered into MA-producing hosts [127][128][129]. Catechol decomposition also occurs through the β-ketoadipate pathway and converges with PCA breakdown at β-ketoadipate enol-lactone after catechol is first converted into MA [125].
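The funneling and lower-pathway reactions named in the text can be viewed as a small directed graph, and a breadth-first search then shows which target product (MA or PDC) is reachable from a given monomer. The sketch below is a simplified illustration in Python: the graph contains only the conversions mentioned in this review, each edge may stand for more than one enzymatic step, and the enzyme label on the catechol-to-MA edge (catechol 1,2-dioxygenase) is an assumption, since the text does not name that enzyme.

```python
from collections import deque

# substrate -> [(product, enzyme label)], limited to conversions named in the text.
REACTIONS = {
    "guaiacol": [("catechol", "P450 aryl-O-demethylase")],
    "vanillate": [("PCA", "VanAB or LigM")],
    "syringate": [("3MGA", "THF-dependent O-demethylase")],
    "PCA": [("PDC", "LigAB-initiated 4,5-cleavage"), ("catechol", "AroY")],
    "3MGA": [("PDC", "DesZ")],
    # Assumption: enzyme name is not given in the text.
    "catechol": [("MA", "catechol 1,2-dioxygenase (assumed)")],
}


def find_route(start, target):
    """Return the shortest list of (substrate, enzyme, product) steps, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, steps = queue.popleft()
        if compound == target:
            return steps
        for product, enzyme in REACTIONS.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, steps + [(compound, enzyme, product)]))
    return None


if __name__ == "__main__":
    for monomer in ("vanillate", "syringate", "guaiacol"):
        for product in ("MA", "PDC"):
            route = find_route(monomer, product)
            print(monomer, "->", product, ":", route if route else "no route in this graph")
```

In this reduced graph, vanillate can reach either product, whereas syringate reaches only PDC and guaiacol only MA; a missing route reflects the limited set of reactions encoded here, not a claim that no such pathway exists in nature.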

Transport of Aromatic Compounds
Membrane transport of lignin monomers can create a bottleneck in lignin valorization, and adequate expression of transporters with varying specificity must be maintained to support metabolic flux through these pathways. Inner and outer membrane transporters are necessary for many lignin-based aromatic acids such as ferulate, vanillate, and protocatechuate [127]. Several examples are known, primarily from transcriptome and proteome data for bacteria grown on these compounds, but few have been extensively characterized or manipulated for metabolic engineering purposes [130].

Challenges to Engineering Catabolic Pathways
Introducing new catabolic pathways into an organism is challenging, particularly when those pathways must carry substantial flux. Many challenges have been reported during efforts to engineer new microbial catabolic pathways, which we broadly classify into transport, low expression and/or low specific activity of the metabolic pathway enzymes, catabolite repression, redox balance, energy balance, cofactor demands, and unintentional interactions with the host's native pathways. These challenges will be described in detail below.

Transport
Uptake of a carbon source is the first step towards efficient carbon assimilation, and sometimes the most challenging one. Transporters can be very specific to one substrate, or promiscuous in their substrate permeability [130]. Due to this flexibility in substrate binding, it is not always clear which compounds a given strain can transport. Although natural promiscuity in substrate permeability may allow some initial success in heterologous pathway development, these pathways often suffer from limited consumption rates and product yields [105]. The correct assembly and integration of heterologous membrane proteins is difficult and can often lead to toxicity in the host, misfolded and inactive structures, very low expression levels, or overexpression and aggregation of the protein in the cytoplasm into inclusion bodies [131,132].

Redox & Energy Balance/Co-Factor Preference
A heterologous pathway will inevitably have its own redox, co-factor, and energy demand, potentially stealing resources away from normal cell processes. Additionally, the host organism may not produce the correct ratios of redox, energy, or co-factor compounds required for the functioning of a heterologous pathway. Accounting for these variances is an important aspect of the overall design.
Cellular metabolism relies on the reducing power of nicotinamide adenine dinucleotides in their unphosphorylated (NADH) and phosphorylated (NADPH) forms, with NADH used primarily for catabolic processes and NADPH primarily for anabolic reactions [133]. The cellular balance of these components varies between hosts and is dependent on several factors, including native catabolic pathways [134] and environmental conditions. During aerobic growth of many microbes, an excess of either reduced co-factor can be re-oxidized by molecular oxygen during respiration, maintaining the cellular redox balance [135]. A significant mismatch between a heterologous pathway and the normal metabolic state of the host can still be detrimental. For example, the oxo-reductive xylose pathway has two redox steps and requires two different redox co-factors, NADPH for xylose reductase and NAD+ for xylitol dehydrogenase. This mixed co-factor preference may be the cause of the build-up of xylitol often seen when the pathway is used [136]. During anaerobic growth, re-oxidation is more challenging, and pathways must be in near-perfect redox balance. Insertion of a heterologous pathway may disturb the balance between redox use and regeneration [135]. Some downstream product pathways of interest may also be redox heavy, so determining the best-fitting carbon assimilation pathway may also depend on the ultimate downstream goal.
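The co-factor mismatch in the oxo-reductive xylose pathway can be made concrete with simple bookkeeping. The following is a minimal Python sketch, not a metabolic model: it counts one turnover of each of the two redox steps named above and ignores every other reaction in the cell. The dictionary layout and the function name net_cofactor_balance are illustrative assumptions, and the NADH-preferring reductase variant is a hypothetical comparison case.

```python
from collections import Counter

# Co-factor changes per reaction turnover (negative = consumed, positive = produced).
# Only the two redox steps named in the text are included.
XYLOSE_OXO_REDUCTIVE = {
    "xylose reductase": {"NADPH": -1, "NADP+": +1},
    "xylitol dehydrogenase": {"NAD+": -1, "NADH": +1},
}

# Hypothetical comparison: a reductase variant that prefers NADH over NADPH.
NADH_PREFERRING_VARIANT = {
    "xylose reductase (NADH-preferring)": {"NADH": -1, "NAD+": +1},
    "xylitol dehydrogenase": {"NAD+": -1, "NADH": +1},
}


def net_cofactor_balance(pathway):
    """Sum co-factor changes over all steps, assuming one turnover per step."""
    total = Counter()
    for changes in pathway.values():
        for cofactor, delta in changes.items():
            total[cofactor] += delta
    return dict(total)


wild_type = net_cofactor_balance(XYLOSE_OXO_REDUCTIVE)
variant = net_cofactor_balance(NADH_PREFERRING_VARIANT)

# The mixed-preference pathway drains NADPH and accumulates NADH;
# the single-co-factor variant nets to zero for every co-factor.
print("mixed preference:", wild_type)
print("single co-factor:", variant)
```

In this toy accounting, the mixed-preference pathway leaves a net NADPH deficit and NADH surplus that the host must resolve elsewhere, whereas a pathway that uses one co-factor pair is internally balanced.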

Regulation
Carbon catabolite repression (CCR) is an important consideration in lignocellulosic biomass conversion, where the optimal commercial conditions will involve a mixed-sugar medium. Selective sequential utilization of the sugars is possible, but not optimal [16]. CCR is one of the most well-studied mechanisms of transcriptional inhibition of catabolic pathways. However, there are several bacteria that display CCR through unknown mechanisms [106], further complicating the analysis of inefficient or non-existent pathway function. Frequently, the use of heterologous catabolic pathways can successfully bypass the host's CCR, for example in S. cerevisiae [137] and P. putida [138].
In lignin catabolism, diverting aromatic carbon to MA or PDC also creates a requirement for a co-substrate such as glucose to support cell growth [139]. However, using glucose as a co-substrate can induce CCR. For example, it was shown in P. putida that feeding glucose reduces the expression of vanAB and pobA and slows MA production from lignin monomers [140].

Expression and Activity of Metabolic Pathway Enzymes
For a metabolic pathway to function effectively in a heterologous host, the enzymes must be expressed in active form in the new host. Enzymes that are soluble in one organism may not be properly folded or trafficked in a new host [141]. For example, enzymes for aromatic O-demethylation include 2Fe-2S Rieske-type and P450 monooxygenases (Section 3.2). Proper co-translational insertion of the necessary prosthetic group is required for functional expression and can be limiting in heterologous hosts [142]. Similarly, expression of an aromatic decarboxylase requires additional proteins to aid in insertion of a flavin-derived cofactor [139].
More generally, enzymes must be expressed at the appropriate level to balance pathway flux [143,144]. Overexpression wastes energy and can generate downstream toxic intermediates [145], while low expression can create bottlenecks and accumulate upstream potentially-toxic intermediates [146], or result in low product yields due to inefficient carbon flux through the length of the pathway [147].

Interactions with Native Metabolic Pathways
In addition to CCR, inefficient pathway function may be due to unintended interactions of a heterologous pathway with the host metabolism, and vice versa. Three major mechanisms for pathway-host interactions have been proposed, broadly categorized into novel metabolites interfering with host metabolic enzymes, host metabolites interfering with the heterologous pathway enzymes, and re-routing of heterologous pathway intermediates into unproductive host pathways [26]. For example, in the transfer of a mevalonate pathway for isoprenoid production into E. coli, accumulation of one pathway intermediate, HMG-CoA, caused growth inhibition that the authors attributed to inhibition of fatty acid biosynthesis by HMG-CoA [148]. Similarly, production of reactive oxygen species by uncoupled oxidative enzymes can impose severe stresses on a host [149].
Many unmodified microbial strains can use a variety of carbon sources, in part due to the promiscuous activity of pathway enzymes [150], but at low efficiencies. In the best case, these native pathways can serve as the starting point for rapid pathway improvement [151]. However, these native pathways can also be problematic, for example by diverting carbon into unproductive pathways. Promiscuous heterologous enzymes raise the same concern. For instance, the ring-cleaving dioxygenase LigAB from Sphingobium sp. SYK-6 can perform a variety of cleavage reactions on PCA, 3MGA, and gallic acid [152]. While this promiscuity increases the metabolic flexibility of the natural host, in an engineered strain it can decrease product yield. Less promiscuous homologs may be preferable for pathway engineering. Similarly, promiscuous activity of heterologous xylose catabolic pathway enzymes in Saccharomyces cerevisiae was shown to reroute galactose and galactitol towards galactitol and tagatose, resulting in a decrease in the desired product, ethanol. Additionally, the overexpression of a cellobiose pathway in the same recipient strain resulted in the extracellular accumulation of a glucose trisaccharide, again rerouting the feedstock carbon away from the target product [153].
In some situations, a strain may contain a native catabolic pathway that is undesirable. For example, most Pseudomonas strains natively use the PCA 3,4-cleavage pathway initiated by PcaHG. Taking advantage of an alternate catabolic pathway, such as the 4,5-cleavage pathway that produces PDC, requires first eliminating the native pathway [154,155].

Application of Modular Design for Carbon Catabolic Pathway Engineering
Pathway modularity can be assessed at multiple levels. The term is often used to refer to engineering pathways to work in arbitrary combination in a single host, allowing different pathways to be used together without substantial reengineering. An additional level of modularity involves engineering pathways to work in arbitrary host organisms, which we will distinguish by referring to it as portability. Both modularity and portability will be required to rapidly and predictably engineer non-model microbes for biomass degradation. If both are fully implemented, researchers will be able to select the optimal combination of pathways to introduce into a target non-model production strain to enable full conversion of any characterized biomass feedstock.

Characterization of Engineered Pathways in Diverse Hosts and Combinations
The first step in engineering modularity and portability is to understand the factors that currently limit these characteristics. At a minimum, careful characterization of engineered catabolic pathways in a variety of hosts and combinations will allow the deliberate selection of pathways that are likely to function together in the target organism. Comprehensive characterization may also identify commonalities between pathways that do and do not function efficiently in various organisms or combinations. Identifying these factors will improve our ability to predict optimal host/pathway pairings by relying not just on phylogeny [156] but also physiology and biochemical mechanism [157]. In addition, these limiting factors then become potential engineering targets to improve pathway modularity.
At the level of individual enzymes, tools have been developed to predict enzyme solubility in model hosts such as E. coli. Computational analysis of gene and protein sequences can help define the design space for any pathway-host pairing [158]. Finding the enzyme homolog with the highest solubility for the host of interest increases the potential for successful folding and optimal rates of activity [159]. These tools have been trained on data from model organism hosts, and extending them to additional hosts will aid in enzyme selection for novel pathway-host combinations.
Several tools have been built for the development and assessment of modular pathways. Initial work, such as the BioBrick framework [160,161], focused on assembly standards that could then lead to standard characterization [162][163][164]. This standardization of methods, materials, and parts seeks to enable the rapid development of modular parts through the division of labor and characterization across the bioengineering field. Much of the work has been done in model organisms such as Bacillus subtilis [165] and E. coli [166], with some recent development in non-model organisms such as the yeast Yarrowia lipolytica [167] and the bacterium Rhodococcus opacus [168]. Characterizing pathways in a variety of hosts and conditions can identify situations where pathways do and do not function reliably.
Recently, modular design of the EMP glycolytic pathway has been shown to be portable across P. putida KT2440, P. aeruginosa PAO1, and E. coli K-12 [169]. Here, the glycolytic pathway was split into two "GlucoBrick" modules, upper and lower catabolic blocks, constructed from standardized plasmids taken from the Standard European Vector Architecture [170]. The use of standardized vectors and the construction of promoter-less module derivatives allows for expression customization. Further use and characterization of the GlucoBrick modules across more diverse organisms would increase the reliability of EMP module datasheets, increasing the predictability of central metabolism engineering.
Combinatorial expression libraries have been designed and tested to understand the optimal expression profiles for various heterologous pathways, for example in xylose utilization via the oxo-reductive pathway in S. cerevisiae [171]. In this case, multiple genes from the downstream central metabolic PPP were also overexpressed, in order to support a theoretical increase in upstream carbon flux. Expression optima for each enzyme in the xylose oxo-reductive pathway (XOR) changed when expressed alone, with the additional PPP enzymes, and under aerobic or anaerobic conditions. These pathways are not natively modular and will require further engineering to enable modularity and portability.
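The size of a combinatorial expression library grows quickly with the number of genes and expression levels. The sketch below, written in Python, enumerates such a design space for the two redox enzymes of the xylose oxo-reductive pathway. The three promoter strength labels and the function name build_library are illustrative assumptions and are not taken from the cited study.

```python
from itertools import product

# Enzymes of the xylose oxo-reductive pathway named in the text.
GENES = ["xylose_reductase", "xylitol_dehydrogenase"]

# Hypothetical promoter strength classes, for illustration only.
PROMOTERS = ["weak", "medium", "strong"]


def build_library(genes, promoters):
    """Return every assignment of one promoter to each gene."""
    return [dict(zip(genes, combo)) for combo in product(promoters, repeat=len(genes))]


library = build_library(GENES, PROMOTERS)

# With g genes and p promoters, the library holds p ** g designs.
print(len(library), "designs")
for design in library[:3]:
    print(design)
```

Adding the downstream PPP genes to GENES would multiply the design count by the number of promoters for each added gene, which is one reason expression optima are usually searched by screening or selection rather than by exhaustive construction.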

Minimizing Interactions with the Host or Other Pathways
One approach to improve the modularity and portability of engineered pathways is to identify host-specific or pathway-pathway interactions and minimize or alleviate these interactions. Most prior studies have overcome these problems in a single organism. The next challenge will be to develop broad host range solutions and to identify in advance when a particular solution will be necessary.
In some cases, the factors limiting pathway modularity and portability can be inferred from previous biochemical and physiological studies. For example, introduction of a heterologous aromatic decarboxylase was initially unable to support sufficient pathway flux from PCA to MA. Addition of a previously-characterized enzyme involved in decarboxylase maturation was required to improve activity [139,172,173]. Similarly, an unbalanced pathway might lead to accumulation of a toxic intermediate, which can be alleviated by overexpressing a known rate-limiting enzyme [174]. Some compensatory mechanisms are also likely to be broadly-effective, such as chaperone overexpression to improve folding of heterologous proteins [175].
However, strategies that seem to be generalizable do not always turn out to be, often for reasons that we do not yet understand. For example, altering the co-factor preference of xylose reductase from NADPH to NADH by site-directed mutagenesis was successful in improving xylose fermentation for ethanol production in S. cerevisiae [176], presumably by restoring redox balance in the pathway. The same approach was used in another study, but resulted in minimal impact to ethanol fermentation though a significant improvement in biomass accumulation [171]. Using a multi-omics analysis in both studies could identify the physiological basis for the differing results in phenotype and could inform future pathway compositions and/or host modifications.
The consequences of pathway engineering can also be highly complex, as the primary biochemical disturbance can cause multiple physiological effects. Previous methods to untangle these effects and identify host-pathway interactions have used a combination of metabolic engineering to build heterologous catabolic pathways followed by systems analysis to understand why a pathway failed to function effectively. For example, a study in Saccharomyces cerevisiae was able to identify the source of metabolic burden upon overexpression of a heterologous P450 monooxygenase [142]. A lineage of P450 monooxygenases was evolved to generate a spectrum of enzyme activity, followed by introduction of each variant into S. cerevisiae. Global transcriptomics and principal component analysis across the variants were used to identify heme limitation as the primary cause of cellular stress. This stress could be alleviated through a targeted increase in heme production. A similar approach was used to identify the cellular mechanisms behind successful catabolism of xylose in evolved strains of S. cerevisiae [177]. A combination of transcriptomics, proteomics, and phosphoproteomics was used to identify two regulatory proteins involved in determining xylose-dependent growth or xylose-dependent metabolism.
Alternately, experimental evolution can be used to select for catabolic pathways with improved activity. Studying the resulting mutants can highlight both the factors that previously limited activity as well as specific strategies to overcome those limitations. For example, the implantation of two coumarate catabolic pathways into E. coli failed until laboratory evolution revealed a unique mechanism of interference. In one pathway, an intermediate was found to inhibit purine nucleotide biosynthesis [27].

Tolerating Host and Pathway Variability
When deleterious interactions cannot be eliminated, pathways can be engineered to function reliably despite variable conditions. In engineering non-biological systems, this goal is often achieved through control systems engineering, using feedback regulation to increase robustness to disturbances. Similarly, feedback control is ubiquitous in natural biological systems to increase robustness to environmental or physiological disturbances. However, few engineered biological examples have used control systems to improve robustness rather than performance.
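The robustness benefit of feedback regulation can be illustrated with a toy steady-state calculation. The Python sketch below compares an enzyme produced at a constant rate with one whose production is repressed by its own level. The rate law k / (1 + x / K), the parameter values, and the function names are assumptions chosen for illustration and do not describe any specific engineered circuit.

```python
import math


def open_loop_level(k, d):
    """Steady state of dx/dt = k - d*x (constant production, first-order loss)."""
    return k / d


def feedback_level(k, d, K):
    """Steady state of dx/dt = k / (1 + x/K) - d*x (negative autoregulation).

    Setting the derivative to zero gives x**2 / K + x - k/d = 0;
    the positive root is returned.
    """
    a, b, c = 1.0 / K, 1.0, -k / d
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


# Model host-to-host variability as a two-fold change in production capacity k.
d, K = 1.0, 1.0
k_host_a, k_host_b = 10.0, 20.0

open_fold = open_loop_level(k_host_b, d) / open_loop_level(k_host_a, d)
feedback_fold = feedback_level(k_host_b, d, K) / feedback_level(k_host_a, d, K)

print(f"open loop fold change: {open_fold:.2f}")
print(f"feedback fold change:  {feedback_fold:.2f}")
```

Under these assumptions, the open-loop enzyme level tracks the two-fold disturbance exactly, while the autoregulated level changes by less than two-fold, at the cost of a lower absolute expression level.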
One option for designing portable pathways is to bypass the host expression system altogether and place the heterologous pathway under the control of an orthogonal expression system. A cross-species expression system using a T7 RNA polymerase and a self-regulating circuit has been shown to work reliably in B. subtilis, P. putida, and E. coli [25]. Similarly, the reuse of native regulatory mechanisms for heterologous pathways has been shown to improve reliability [178].
When specific pathway limitations are identified and alleviated, these solutions can then be included in the pathway to improve portability. In several examples described in Section 5.2, introduction of additional enzymes alleviated deleterious interactions between host and pathway. These enzymes can then be included in the engineered construct to improve portability [28]. However, more research is required to develop new strategies to further improve modularity and portability.

Conclusions
Rational engineering of microbial cells to metabolize specific carbon substrates requires in-depth knowledge of metabolic pathways and the regulatory elements that govern them. Traditional approaches for pathway optimization have involved years of trial and error. These efforts have led to the successful development of methods and tools to engineer novel pathways and products in many model organisms, as described above. However, extending these bespoke approaches to new, non-model microbes would require substantial effort. The goal of modular design is to develop methods and devices that can be rapidly transferred into diverse, non-model organisms. By minimizing strain-specific optimization, modular design can simplify the process of onboarding new non-model microbes.
Catabolic pathway optimization has the inherent advantage that growth selections can be used to separate pathways with differing productivities. As such, it provides a tractable testbed for the development of modular engineering approaches. These efforts would begin with a high-throughput characterization of catabolic pathways for each desired substrate from a selection of evolutionarily-divergent host organisms, to capture a wide diversity of possible enzyme structures and pathway configurations. The pathways can be tested, individually or in combination, in a variety of potential hosts to replicate a realistic diversity of intracellular environments. When coupled with sequencing methods for tracking changes in strain and pathway abundance, catabolic pathway activity can easily be measured with high-throughput growth analysis. Some pathway/host pairings will fail or severely underperform for a variety of reasons, such as those described above.
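The pooled growth selection described above can be scored from sequencing counts taken before and after growth. The following Python sketch computes a log2 enrichment for each pathway/host design from its change in relative abundance. The design labels, the read counts, and the pseudocount of 1 are hypothetical values for illustration, not data from any study.

```python
import math


def log2_enrichment(counts_start, counts_end, pseudocount=1):
    """Log2 change in relative abundance of each design between two time points.

    A pseudocount is added to every design so that designs that drop
    to zero reads still yield a finite score.
    """
    n = len(counts_start)
    total_start = sum(counts_start.values()) + pseudocount * n
    total_end = sum(counts_end.values()) + pseudocount * n
    scores = {}
    for design, start in counts_start.items():
        freq_start = (start + pseudocount) / total_start
        freq_end = (counts_end.get(design, 0) + pseudocount) / total_end
        scores[design] = math.log2(freq_end / freq_start)
    return scores


# Hypothetical read counts for three pathway designs in one host.
before = {"pathway_A": 100, "pathway_B": 100, "pathway_C": 100}
after = {"pathway_A": 400, "pathway_B": 100, "pathway_C": 10}

scores = log2_enrichment(before, after)
for design, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{design}: {score:+.2f}")
```

Because the score is based on relative abundance, a design with unchanged raw counts (pathway_B here) still receives a negative score when a competitor expands, which matches the competitive nature of a pooled selection.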
To enable modular design, the causes of these pathway/host mispairings must be identified and either alleviated or highlighted as future selection criteria. In-depth analysis including metabolomics, transcriptomics and proteomics can illuminate these causes, by detailing regulatory and metabolic changes that result from introduction of the heterologous pathway. Experimental evolution can also be used with poorly-functioning pathways to select for improved variants. Characterizing the resulting mutants can help to identify factors that were initially limiting activity. Comparing successful, unsuccessful, and evolved pathways will help to explain why species differ in their ability to functionally express various heterologous pathways of interest.
Ultimately, this information can be used to design modular metabolic units that are highly active and portable across dissimilar microbes. Understanding the requirements for effective use of various heterologous pathways will allow selection of the best pathway for a particular host, based on its unique genetics and physiology. Eliminating the causes of deleterious host/pathway interactions, for example by introducing accessory enzymes that minimize imposed stresses, will broaden the utility of the engineered pathways. Improving our ability to reliably and predictably introduce new catabolic pathways into non-model microbes will help develop broader methods for rational, modular engineering of microbial metabolism.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.