Advances in Escherichia coli-Based Therapeutic Protein Expression: Mammalian Conversion, Continuous Manufacturing, and Cell-Free Production

Abstract: Therapeutic proteins treat many acute and chronic diseases that were until recently considered untreatable. However, their high development cost keeps them out of reach of most patients around the world. One plausible route to lower-cost manufacturing is to adopt newer technologies: using Escherichia coli to express larger molecules, including full-length antibodies, that are generally relegated to Chinese Hamster Ovary (CHO) cells; adopting continuous manufacturing; and converting the manufacturing to cell-free synthesis. The advantages of using E. coli include a shorter production cycle, little risk of viral contamination, cell host stability, and a highly reproducible post-translational modification.


Introduction
Therapeutic proteins represent a diverse class of drugs first made accessible as a recombinant DNA (rDNA) insulin in 1982 [1]. There are now 266 such approved proteins [2], comprising a wide range of products with unique mechanisms of action and size, ranging from peptides to monoclonal antibodies (Table 1).
It is noteworthy that the European Medicines Agency (EMA) lists peptides as proteins, unlike the US Food and Drug Administration (FDA) [3]. The E. coli-expressed products, including hormones, cytokines, enzymes, antibody fragments, and shorter-than-full-length antibodies, came into clinical use long before the use of Chinese Hamster Ovary (CHO) cells (Table 1). Some proteins still extracted from tissues can also be manufactured using E. coli. Examples include Alpha-1 antitrypsin, Antithrombin III, Botulinum toxin, C1 inhibitor, Fibrinogen, Heparin, Hirudin, Snake venom proteins, Streptokinase, Thrombin, and Urokinase. Figure 1 shows that most proteins expressed in E. coli are of lower molecular weight since most of the more complex and higher molecular weight monoclonal antibodies (mAbs) are expressed in CHO cells.
The FDA-approved therapeutic proteins expressed in E. coli mostly have a molecular weight of less than 32 kDa; it is difficult to produce proteins larger than 100 kDa in E. coli because doing so places an excessive load on the host cell, which prevents correct protein folding while maintaining adequate expression levels [4].
The value of E. coli will become more apparent when we start using it to express larger molecules such as the mAbs. The first study on the expression of full-length (FL) immunoglobulins (FL-IgGs) in E. coli was published in 2002. Later, in 2020, a modular system-based synthetic biology technique was applied to knock down gene expression using short regulatory ribonucleic acids (RNAs) with cetuximab as a target FL-IgG for enhancing expression, reaching up to 200 mg/L [5].
Combining E. coli with emerging technologies like bioinformatics, novel methods for genetic manipulation to force E. coli to secrete heterologous proteins, and managing posttranslational modification offers many new opportunities, including E. coli-based continuous manufacturing and cell-free synthesis, that can significantly reduce the cost of development and manufacturing, as well as enhance product safety.

Background
Recombinant DNA products involve the fusion of DNA from distinct species, followed by introducing the resulting hybrid DNA into a host cell, typically a bacterium or mammalian cell, to express the desired protein. The pioneering development of this molecular chimera dates back to 1972 [6,7], when researchers affiliated with the University of California, San Francisco, and Stanford University developed this technique. The United States patent for the invention was granted to Stanley Cohen from Stanford University and Herbert Boyer from the University of California, San Francisco (UCSF) in 1980. In 1976, Boyer played a crucial role in establishing Genentech, Inc. These patents have been licensed to more than 500 licensees and yielded royalties exceeding USD 250 million for Stanford and UCSF [8].
Since the FDA approved the first recombinant protein for therapeutic purposes in 1982, E. coli has remained a prominent organism for producing recombinant proteins despite the availability of many newer expression systems. The utilization of microbial expression systems, particularly E. coli, continues to offer a more straightforward and cost-effective approach for producing even heterologous recombinant proteins compared to mammalian cell culture and other systems. E. coli presents several notable benefits in genetic manipulation, growth conditions, high product yields, product purity, absence of viral contamination, and many more.

Advantages
Using E. coli as an expression system offers several advantages.

•
It is the most well-understood expression system. The genome of Escherichia coli strain K-12 MG1655, which is the most studied and best-characterized strain, has been fully sequenced and annotated. It was first completely sequenced in 1997, and the annotation and analysis have been continually updated since then as our understanding of genomics and the biology of E. coli has advanced [9]. This knowledge base is critical in its utilization as a robust expression system.

•
Numerous prokaryotic genes [10] are expressed in operons [11], where a single promoter leads to the synthesis of multiple proteins from a single mRNA molecule, which has a ribosome binding site (RBS) preceding the start AUG codon of each protein. This enables the simultaneous production [12] of subunits that assemble into complexes or the simultaneous expression of auxiliary components that may be necessary for the protein to attain its native shape.

•
Simpler scale-up compared to eukaryotic systems, including mechanical cell disruption, which is less variable than for eukaryotic cells, since the latter require gentler lysis to preserve more fragile organelles and structures [16].

•
Avoiding virus contamination risk. When proteins are synthesized in mammalian cell lines, the host cells possess multiple copies of endogenous retrovirus-like sequences, which subsequently generate retrovirus-like particles (RVLPs) together with the target protein. While RVLPs are commonly regarded as dysfunctional, certain instances have demonstrated their ability to infect cell lines that are not of rodent origin. Exogenous viral contamination resulting from raw materials or persons is also possible; however, such concerns are not relevant in the context of E. coli-based expression systems [17].

•
Low-cost growth medium, fast cellular proliferation, uncomplicated fermentation procedures, no viral contaminants in the final product, and high product yields [18].

Challenges
Choosing between these systems requires careful consideration of the specific protein and its intended application.

•
Proteins overexpressed in E. coli may form inclusion bodies, insoluble aggregates of misfolded proteins, which require specific solubilization and refolding steps and so add complexity to the purification process compared to eukaryotic cells [19]. Several tactics have been developed to address this frequently encountered challenge. Incorporating solubility-enhancing fusion tags, such as SUMO or maltose-binding protein (MBP), has proven to enhance the solubility of certain target proteins [20]. Additionally, co-expressing the protein of interest with molecular chaperones can help in its proper folding, making inclusion body formation less likely [21]. Fine-tuning expression conditions, such as adjusting the temperature or the IPTG concentration, can also modulate protein synthesis rates and improve solubility [22]. Even if inclusion bodies form, there is a workaround: the proteins can be solubilized with denaturants and then refolded, salvaging the protein for further use [23]. These adaptive strategies emphasize the versatility and adaptability of the E. coli expression system, showcasing the myriad tools researchers have at their disposal to optimize protein production.

•
E. coli lacks the machinery for many eukaryotic PTMs, such as glycosylation, which may affect protein stability, folding, and activity [24].

•
Unlike eukaryotic systems, E. coli produces endotoxin contamination from its lipopolysaccharide, which must be removed during purification [25].

•
The toxicity of overexpressed proteins to E. coli often forces the expression of only those fragments or domains of a toxic protein that retain essential functions [26]. One strategy involves using signal sequences attached to the protein's N-terminus, directing the protein's export to the periplasm, and decreasing cytoplasmic accumulation, thereby reducing potential toxicity [27].

•
Regulating the expression through weak promoters or controlled induction can temper any adverse impacts on the host cells. This requires codon optimization to enhance translation efficiency [28].

•
Expressing monoclonal antibodies (mAbs) in E. coli presents multiple challenges, stemming primarily from the intricacy of these proteins. One of the main hurdles is ensuring the proper folding of mAbs, especially since they possess multiple domains. E. coli often struggles to correctly fold such large eukaryotic proteins, especially when they have multiple disulfide bonds. Furthermore, bacteria lack the machinery for certain post-translational modifications like glycosylation, which are vital for the function of mAbs. This absence can compromise the mAb's efficacy [29].
The reducing environment of the E. coli cytoplasm also makes disulfide bond formation problematic, while protein degradation can occur if the expressed proteins are unstable or perceived as foreign. Several strategies can be employed to counter these challenges. One approach is the expression of single-chain variable fragments (scFvs), which comprise the variable regions of the mAb's heavy and light chains connected by a short peptide linker. Researchers can also leverage specialized E. coli strains designed for disulfide bond formation in the cytoplasm, such as SHuffle strains [30].
Directing mAbs or scFv expression to the periplasmic space of E. coli, which is more oxidizing than the cytoplasm, can also encourage proper disulfide bond formation. Adjustments in expression conditions, co-expression with molecular chaperones, and codon optimization for E. coli are additional strategies to improve yields [31]. The ability of bispecific antibodies (BsAbs) [32] to effectively target two entities concurrently enhances the practicality of antibody-based treatments. Genentech has successfully devised a periplasmic expression system in E. coli, known as the BsAb expression system. This system utilizes either the Knobs-into-Holes (KiH) [33] technology or Fc domain HC heterodimerization [34]. Genentech has made significant advancements in the production process of BsAbs, which include two distinct heavy chains (HCs) and two distinct light chains (LCs). These improvements have been achieved by utilizing either a two-culture or a coculture strategy in E. coli systems [35].

Bioinformatics Applications
Choosing whether E. coli is a suitable system and whether its adoption will fulfill the goal of reduced-cost manufacturing requires a systematic process (Figure 2) that commences by considering several factors, such as potential splice variants, signal sequences, transmembrane helices, and post-translational modifications observed in the native protein.
Protein databases, such as UniProt [36], are valuable initial bioinformatics resources, although they remain suggestive, not definitive. For example, knowing what can and cannot be produced in E. coli is critical; cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) is an obligate dimer and requires N-glycosylation of Asn78 and Asn110 for dimerization [37], and this PTM cannot be made in E. coli. This will need a synthetic biology method since the only eukaryotic-like PTM that E. coli can produce is disulfide bond formation in the periplasm [38].
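As a first-pass illustration of this kind of screening, the sketch below scans a protein sequence for the canonical N-linked glycosylation sequon (Asn-X-Ser/Thr, where X is any residue except Pro). The function name and the toy sequence are hypothetical and are not taken from the cited studies. A sequon only marks a candidate site: whether it is glycosylated in the native host, and whether that matters for function, still has to be checked against database annotation and the literature.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A zero-width lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glyc_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    toy = "MKNGSAANPTLLNNTSQ"
    print(find_n_glyc_sequons(toy))
```

A protein whose native form carries functionally required N-glycans at such positions is a candidate for a mammalian host or for an engineered system, rather than for unmodified E. coli.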
Bioinformatics methodologies, such as the software JPRED 4 [39], facilitate the investigation of domain boundaries and the prediction of regions of intrinsically disordered proteins (IDPs). Failure to express a construct that is insufficient in length and lacks a crucial component within a certain domain, such as a β-strand, should be anticipated. Conversely, attempting to express an excessively lengthy construct encompassing flexible regions susceptible to proteolysis will likely lead to either heterogeneity or the removal of a purification tag. Proteins characterized by a significant proportion of intrinsically disordered regions (IDRs) often pose challenges during production due to their inherent susceptibility to degradation. However, these may become structured upon interaction with other molecules, forming complexes such as acetyltransferase (ACTR) and nuclear coactivator binding domain (NCBD) [40] that can help as partners, making a stable and soluble protein.
Expanding the scope of the use of E. coli systems is now based on several well-defined prospects:

•
Exploiting the use of bioinformatics tools to determine the biophysical characteristics of the protein [41]. It is a complex process that involves various computational methods. These methods utilize algorithms and statistical models to analyze the protein's primary sequence, infer its three-dimensional structure, and predict its interactions and functions.
Sequence analysis involves comparing the amino acid sequence of a protein with known sequences in databases to identify conserved domains, motifs, or families [42]. Structure prediction includes methods like homology modeling, ab initio modeling, and threading to predict a protein's three-dimensional (3D) structure based on its sequence [43]. Functional prediction identifies the biological role of a protein by assessing its structural and sequential features, often in conjunction with known protein-protein interactions and pathway analyses [44]. Molecular dynamics simulations and related techniques are used to study the movement and interactions of proteins, providing insight into their behavior in the cellular environment [45]. Specific bioinformatics tools are designed to predict sites in proteins likely to undergo post-translational modifications (PTMs) such as phosphorylation or glycosylation [46]. Predicting how proteins interact with other proteins or ligands can be achieved through docking simulations and other modeling techniques [47].

•
Accurate delineation [48]. Identifying the boundaries of protein domains is essential for understanding the function and evolution of proteins [49]. Signal sequences are crucial for the targeting of proteins to specific cellular locations; identifying these sequences helps in understanding the transportation and localization of proteins [50]. Transmembrane regions anchor proteins in membranes, playing essential roles in cellular communication, signaling, and transport; accurate prediction of these regions aids in understanding membrane protein structure and function [51]. Identifying obligate oligomeric complexes is essential for understanding protein-protein interactions and the assembly of multi-protein complexes [52]. Identification of PTMs is vital for understanding protein regulation and signaling [53].
Optimization of genetic and translation variables encompasses various elements, including codon use, the characteristics and placement of the ribosome binding site, and disparities in translation rates between prokaryotes and eukaryotes [54].

Gene Cloning and Design
Gene cloning typically involves selecting a purification method, such as affinity chromatography, which utilizes the inherent characteristics of the protein. This can be achieved through immobilized ligand or substrate mimic chromatography, using compounds like Cibacron Blue F3GA [55] or cyclic peptide-based ligands [56]. Alternatively, a purification tag, such as a maltose-binding protein (MBP)-tag, glutathione-S-transferase (GST)-tag, or commonly a hexahistidine tag (his-tag), can be added to facilitate purification. Immobilized metal affinity chromatography (IMAC) [57] is commonly employed for his-tagged proteins. If a protein with similar characteristics is available, its attributes can be used to assess the feasibility of adding a tag to the N- or C-terminus. Alternatively, one might utilize structure prediction software such as Phyre 2 [58]. Although N-terminal histidine tags are highly valuable and extensively employed, they can introduce heterogeneity in the final product due to varying (phospho)gluconylation occurring at the N-terminus [59].
After the protein construct is designed, gene design aims for maximum expression, which depends heavily on cellular homeostasis, that is, on keeping a delicate balance within the cell. When a high-copy number plasmid is employed with a robust promoter, it consistently leads to a reduced protein yield [60]. This is attributed to the excessive allocation of cellular resources towards synthesizing plasmid DNA and mRNA. Consequently, the abundance of mRNA exceeds the capacity of the translation machinery, resulting in suboptimal protein production. The toxic effects of overexpressed recombinant proteins on E. coli cells can be anticipated so that these problems are avoided [61].
Transcriptome analysis can identify the genes in charge of the cellular stress response so that they can be removed. Fewer growth-essential genes are down-regulated when cell surface receptor (CSR) is blocked [62].
The prevailing strategy involves the integration of genes into the bacterial chromosome to circumvent the issue of plasmid loss during extensive fermentation processes. However, despite their drawbacks, plasmids can be employed in their original form because they are more expeditious and cost-effective. The selection of plasmids for protein production is determined by their copy number, which is contingent upon the plasmid's origin of replication, promoter, and selection marker. The optimization of cellular resources allocated to protein production is contingent upon achieving an appropriate equilibrium between plasmid copy number and promoter strength, with consideration for the specific media conditions.
The field of synthetic biology has witnessed notable progress in developing growth-decoupled recombinant protein production. This has been achieved using the co-expression of Gp2, a peptide generated from a bacteriophage, which acts as an inhibitor of RNA polymerase in Escherichia coli. This methodology facilitated the regulation of metabolic resources, ensuring their exclusive allocation towards synthesizing the intended protein.
In addition to the plasmid, the origin of the gene is a crucial factor. Traditionally, the gene has been obtained directly from the original organism, typically by using a cDNA library acquired via reverse transcription polymerase chain reaction (RT-PCR) from a pool of messenger RNA (mRNA) to circumvent the inclusion of introns. Although the process can be rapid, cost-effective, and efficient, it can also lead to challenges associated with disparities in translation initiation and codon utilization between prokaryotic and eukaryotic organisms.
Due to a significant decrease in pricing, the cost of synthesizing a gene artificially has become lower than the combined expenses of labor and materials involved in cloning a gene from a complementary DNA (cDNA) library. Synthetic genes can also alleviate the potentially harmful consequences of another dissimilarity between eukaryotes and prokaryotes, namely protein translation rates [63]. In prokaryotic organisms like E. coli, the transcription and translation rates are coupled [64]. Specifically, transcription occurs at a rate of 50 nucleotides per second, whereas translation occurs at 16 amino acids per second.
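The coupling can be checked with simple arithmetic. The sketch below reads both figures as per-second rates (an assumption about the units) and converts the transcription rate into codons made available per second, using the fact that one codon is three nucleotides. The function and variable names are illustrative only.

```python
NT_PER_CODON = 3  # one amino acid is encoded by three nucleotides


def codons_per_second(transcription_nt_per_s: float) -> float:
    """Convert a transcription rate (nt/s) into codons made available per second."""
    return transcription_nt_per_s / NT_PER_CODON


if __name__ == "__main__":
    transcription_rate = 50.0  # nt/s, figure quoted in the text
    translation_rate = 16.0    # aa/s, figure quoted in the text

    codon_supply = codons_per_second(transcription_rate)
    ratio = codon_supply / translation_rate
    print(f"codon supply: {codon_supply:.1f} codons/s, supply/demand ratio: {ratio:.2f}")
```

With these figures the polymerase produces roughly as many codons per second as a ribosome consumes, which is what allows the ribosome to follow closely behind the polymerase.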

Ribosomes
In 1987 [65], a modified ribosome system was developed to facilitate the production of proteins in E. coli through modifications made to the Shine-Dalgarno (SD) sequence of the mRNA and the corresponding anti-SD sequence of the 16S ribosomal RNA (rRNA). Other alternative ribosome systems can be utilized, including the orthogonal riboswitch system [66], the RiboTite system, and the Ribo-T system [67,68]. The riboswitch system facilitates the adjustable co-expression of several genes in a dose-dependent manner in response to small synthetic molecules. On the other hand, the RiboTite system, an extension of the riboswitch technology, has demonstrated the ability to synchronize protein translation rates with protein release. The Ribo-T system utilizes a modified hybrid rRNA that combines small and large subunit rRNA sequences. This modified rRNA is connected into a single translating unit using short RNA linkers that form covalent bonds between the subunits. The functionality of the orthogonal ribosome-mRNA system has been demonstrated in sustaining bacterial growth in the absence of wild-type ribosomes. Furthermore, a recent study has documented the development of an enhanced tethered version of this system [69].

•
The characteristics and location of the ribosome binding site (RBS) and the disparities in translation rates observed in prokaryotic and eukaryotic organisms [70]. The RBS plays a crucial role in translation initiation. Its sequence and its position relative to the initiation codon can influence translation efficiency. Customizing the RBS to the host organism might enhance the efficiency of translating the desired protein [71].

•
Correct use of the strain and media to optimize production, though with many limitations [72]. The optimization of production in E. coli through proper selection of the strain and media is a common strategy in biotechnology but comes with certain limitations.

•
Optimization in E. coli can vary widely depending on the protein or other manufactured product. Selecting the right strain of E. coli, determining the optimal temperature, and choosing the appropriate culture media are crucial considerations for recombinant protein expression.
The presence of secondary structural components in mRNA might obstruct ribosome binding, hindering translation and limiting the translational process in various ways [73]. Eukaryotic ribosomes bind the cap located at the 5′ terminus of the mRNA molecule. Subsequently, they traverse along the mRNA until they commence translation at the initial AUG codon, preceded by a Kozak sequence. In contrast, prokaryotic ribosomes engage with a specific region on the mRNA called the Shine-Dalgarno sequence or ribosome binding site (RBS). The RBS typically lies within 5-13 base pairs [74] upstream of the start AUG codon, with an ideal spacing of 5-6 base pairs [75]. These RBS sequences complement the 3′ end of the 16S ribosomal RNA. The nucleotide sequence AGGAGGU [76] is seen in Escherichia coli. The need for a separate RBS when producing eukaryotic proteins in E. coli has two consequences. First, an RBS must be present upstream of the start AUG codon. It may be supplied by the plasmid region outside the multi-cloning site. However, it is imperative to ensure that the spacing is appropriate and that no additional AUG trinucleotides are inadvertently introduced.
Second, it is essential that this specific nucleotide sequence does not occur inside the gene of interest. An internal RBS can have two outcomes: it can lead to the production of a second protein if there is an AUG codon at the appropriate distance from it, or it can cause translation stalling as a ribosome binds to this site and obstructs translation. Therefore, special consideration is given to the choice of codons for Gly-Gly pairs (excluding GGA-GGU), Arg-Arg pairs (excluding AGG-AGG), and sequences surrounding Glu (GAG), including Glu-Glu pairs (GAG-GAG). E. coli uses the AGG and GGA codons infrequently. Therefore, it is crucial to exercise caution while optimizing codons to prevent the occurrence of internal RBSs associated with sequences around glutamic acid (Q/K/E-E or E-V).
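The check for internal RBS-like sites can be sketched as a simple sequence scan. The example below is a minimal illustration, not a validated design tool: it looks for the exact core motif AGGAGG (taken from the AGGAGGU sequence quoted above) and reports any ATG that begins 5-13 nucleotides after the motif ends. Measuring the spacing from the end of the motif, the exact-match criterion, and all names are assumptions made for the example; real Shine-Dalgarno sites tolerate mismatches.

```python
def find_internal_rbs(cds: str, motif: str = "AGGAGG",
                      min_gap: int = 5, max_gap: int = 13) -> list[dict]:
    """Report Shine-Dalgarno-like motifs inside a coding sequence.

    Positions are 0-based. For each motif occurrence, list every ATG whose
    first base lies min_gap..max_gap nucleotides after the motif ends.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    start = seq.find(motif)
    while start != -1:
        end = start + len(motif)
        downstream_atg = [
            pos for pos in range(end + min_gap, end + max_gap + 1)
            if seq[pos:pos + 3] == "ATG"
        ]
        hits.append({"motif_start": start, "downstream_atg": downstream_atg})
        start = seq.find(motif, start + 1)  # allow overlapping occurrences
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing one internal motif followed by an ATG.
    toy = "ATGGCT" + "AGGAGG" + "TTTTTT" + "ATG" + "GCTTAA"
    for hit in find_internal_rbs(toy):
        print(hit)
```

A hit with a non-empty `downstream_atg` list flags a possible internal start site; a hit with an empty list still marks a place where a ribosome might bind and stall, so either kind is worth recoding with synonymous codons.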

Promoter
The significant functional sections close to the T7 promoter (PT7), the core region of the pET plasmid, include the −35/−10 region, translation initiation region (TIR), operator sequence, and replicon of the pET plasmid. These functional areas control the basal expression level before induction and the proper transcription rate following induction.
The T7 RNA polymerase (T7 RNAP) objective is attained by maximizing transcription or translation levels. The lacUV5 promoter (PlacUV5), a strong inducible promoter that is activated by the lactose analog isopropyl-beta-d-thiogalactopyranoside (IPTG), controls this process [77], and PlacUV5 is independent of recombinant product, which makes it leakier than Plac [78]. Three inducible promoters (ParaBAD [79], PrhaBAD, and Ptet) are appropriate for toxic-protein fermentation that lasts a long time. PrhaBAD and Ptet, however, more strictly control T7 RNAP transcription, giving additional expression possibilities for various recombinant products, especially dangerous proteins [80]. When the lac repressor gene (lacI) is altered, leaky expression is decreased by improving the repressor's inhibitory ability [81].

•
To create the promoter variant lac1G, the promoters lacUV5 and lac were recombined (G was substituted for A at position +1) [82]. The expression of T7 RNA polymerase (RNAP) is effectively regulated to prevent leakage by the presence of a mutant form of the Lac repressor protein (LacI), specifically the V192F variant. This mutant variant cannot bind to isopropyl β-D-1-thiogalactopyranoside (IPTG), hence preventing its activation. Consequently, the mutant LacI dynamically governs the levels of transcripts produced by T7 RNAP [83].

•
Building a T7 RNAP RBS library quickly involves using the base editor and CRISPR/Cas9 to screen potential expression hosts [84]. The ability of T7 RNA polymerase to bind to the PT7 promoter was impaired by a specific amino acid substitution (A102D), resulting in an alteration in the rate of RNA production. The T7 RNA polymerase (T7 RNAP) was also split into two segments and co-expressed with a light-responsive dimerization domain, exhibiting functional behavior upon exposure to blue light [85].

Codons
The expression level of the ColE1 plasmid replication-associated gene can be regulated by utilizing CRISPRi and the inducible promoter Ptet [86].
The distribution of codon usage is not uniform across the available codons, and the degree of codon usage bias varies significantly among organisms. Codon usage also varies substantially across microorganisms and is associated with the corresponding transfer RNA (tRNA) quantities [87].
mRNA that contains multiple rare codons can exhibit translation stalling and degradation [88]. Bioinformatic approaches can examine codon usage issues, e.g., the Graphical Codon Usage Analyzer [89]. One method to prevent this problem is to overexpress the rare tRNAs [90], such as from pLysSRARE [91,92]. The usual approach is to use synthetic genes that can be codon optimized for the expression host while avoiding internal RBS, internal restriction sites, and factors that influence mRNA structure and stability [93,94].
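A quick way to see whether a gene is likely to hit this problem is to count rare codons in frame. The sketch below is a minimal illustration, not a substitute for a codon usage analyzer: the rare-codon set is an assumption based on codons commonly described as rare in E. coli (only AGG and GGA are named in this text), and it should be replaced with values from a codon usage table for the actual host strain.

```python
# Assumed set of codons commonly described as rare in E. coli (DNA alphabet).
# Illustrative only; substitute values from a codon usage table for your strain.
RARE_CODONS = {"AGA", "AGG", "ATA", "CTA", "CGA", "CGG", "CCC", "GGA"}


def rare_codon_report(cds: str, rare: set[str] = RARE_CODONS) -> tuple[list[int], float]:
    """Return 1-based codon positions that are rare and the rare-codon fraction."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    positions = [index + 1 for index, codon in enumerate(codons) if codon in rare]
    fraction = len(positions) / len(codons) if codons else 0.0
    return positions, fraction


if __name__ == "__main__":
    # Hypothetical toy coding sequence: ATG AGG CTG AGA CCC TAA
    positions, fraction = rare_codon_report("ATGAGGCTGAGACCCTAA")
    print(positions, fraction)
```

Clusters of consecutive rare codons are usually of more concern than isolated ones, so the positions are reported alongside the overall fraction.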

Protein Folding
Translation rates in eukaryotes are comparatively slower, typically occurring at approximately three amino acids per second.The process of protein folding has co-evolved with translation rates, resulting in a situation where the translation rate [95] of a eukaryotic protein expressed in E. coli may exceed the folding rate.This poses a challenge, particularly for multi-domain proteins.However, this challenge can be addressed through various strategies, such as adjusting the translation rate, harmonizing codon usage [96], or intentionally inducing ribosome stalling by incorporating rarer codons at domain boundaries.
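The size of this mismatch can be illustrated with a back-of-the-envelope calculation. The sketch below compares how long a ribosome takes to synthesize one domain at the roughly three amino acids per second quoted here for eukaryotes and at the 16 amino acids per second quoted earlier for E. coli. The 240-residue domain length is a hypothetical value chosen for the example, and constant elongation rates are assumed.

```python
def translation_time_s(length_aa: int, rate_aa_per_s: float) -> float:
    """Time in seconds to translate a chain of length_aa at a constant rate."""
    if rate_aa_per_s <= 0:
        raise ValueError("rate must be positive")
    return length_aa / rate_aa_per_s


if __name__ == "__main__":
    E_COLI_RATE = 16.0     # aa/s, figure quoted in the text for E. coli
    EUKARYOTE_RATE = 3.0   # aa/s, figure quoted in the text for eukaryotes
    domain_length = 240    # residues; hypothetical domain size

    t_ecoli = translation_time_s(domain_length, E_COLI_RATE)
    t_euk = translation_time_s(domain_length, EUKARYOTE_RATE)
    print(f"E. coli: {t_ecoli:.0f} s, eukaryote: {t_euk:.0f} s, "
          f"ratio: {t_euk / t_ecoli:.1f}x")
```

Under these assumptions the domain emerges from the E. coli ribosome in about a fifth of the time it would have in its native host, which is the window that rate adjustment, codon harmonization, or deliberate stalling at domain boundaries tries to restore.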
When the host cell cannot handle the rate or volume of recombinant products being expressed, many proteins will misfold and cluster, eventually creating inclusion bodies (IBs) and obstructing the expression. The primary reasons for the formation of IBs are limited post-translational modification (PTM) capacity and folding efficiency, which are of the utmost importance for increasing the functional activity of recombinant products [97].
To ensure the proper folding and functionality of antibodies with disulfide linkages, it is necessary to expose the individual antibody chains to the oxidizing conditions present in the bacterial periplasm. In addition, it should be noted that the periplasmic space serves as a habitat for specific proteins known as chaperonins and disulfide isomerases, which play a crucial role in correctly folding newly synthesized proteins [98]. A leader sequence (PelB, OmpA, PhoA) drives the antibody to the oxidizing periplasm for periplasmic expression [99]. After being expressed, the antibody is extracted from the periplasmic region by osmotic shock. Yields obtained from shaking flask cultures have been documented to range from 0.1 mg/L to 100 mg/L, while using fermenters has demonstrated the potential to achieve yields as high as 2 g/L [100]. Utilizing specific E. coli strains that offer an oxidizing environment in the cytoplasm is an additional choice; such strains typically carry mutations of the enzymes glutathione oxidoreductase and thioredoxin reductase [101].

Solubilization
Despite careful selection of domain boundaries and solubilization tags, not all eukaryotic proteins can achieve proper folding in Escherichia coli, even though it possesses a diverse array of molecular chaperones (e.g., GroEL/ES, DnaK, Skp) and ten peptidyl-prolyl cis-trans isomerases. Issues related to protein folding can be attributed to various factors, including translation rates, the formation of disulfide bonds through oxidative folding, the presence of essential post-translational modifications that E. coli cannot perform, the existence of buried prosthetic groups that wild-type E. coli cannot synthesize, or, in rare cases, the involvement of specialized folding factors. For instance, when attempting to express a hyperthermophilic α-amylase from Pyrococcus furiosus (a hyperthermophilic archaeon) in E. coli, it was found crucial to co-express small heat shock proteins (sHSP) or chaperonins (HSP60) from the same P. furiosus to facilitate proper folding. The folding and assembly of multi-subunit proteins continue to pose challenges [103].
One effective approach for enhancing the solubility of recombinant products involves the use of peptide tags. Commonly used tag proteins include maltose-binding protein (MBP), glutathione S-transferase (GST), carbohydrate-binding module (CBM), thioredoxin, and NusA. Notably, the novel CBM66 promotes the solubilization of several recombinant products and raises production titer. The NEXT tag [104], low-molecular-weight protamine [105], and 6HFh8 [106] are a few examples of tags smaller than recombinant proteins.
Fusion proteins, which assist in purification, can incorporate solubilization tags. These tags, typically consisting of small, highly soluble, and stable proteins, facilitate the solubilization of the final product and of folding intermediates. If a eukaryotic protein carries more than one N-glycan per 100 amino acids, a solubilization tag becomes necessary to produce the protein in a soluble state in Escherichia coli. Solubilization tags, such as MBP (also an affinity purification tag), thioredoxins, Sumo, or Fh8, are employed to maintain an optimal equilibrium for attaining soluble proteins. It is crucial to strike a balance, as excessive solubilization may impede proper protein folding. This step almost always requires a lot of trial and error. Different low-molecular-weight protein tags have helped to solubilize and increase the yield of other recombinant proteins (RPs), requiring only fusion expression with the recombinant protein.
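The rule of thumb above (more than one N-glycan per 100 amino acids) can be turned into a quick screening step. The sketch below counts N-X-S/T sequons, where X is any residue except proline, as a proxy for N-glycan sites. Using sequons as a proxy is an assumption, since not every sequon is glycosylated in the native host, and the function names are hypothetical.

```python
import re

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons be counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycan_sequons(protein):
    """Return 1-based positions of asparagines in N-X-S/T sequons (X is not proline)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def sequon_density_per_100(protein):
    """Number of potential N-glycan sites per 100 amino acids."""
    if not protein:
        return 0.0
    return 100.0 * len(n_glycan_sequons(protein)) / len(protein)


def likely_needs_solubilization_tag(protein, threshold_per_100=1.0):
    """Apply the 'more than one N-glycan per 100 residues' rule of thumb from the text."""
    return sequon_density_per_100(protein) > threshold_per_100
```

A protein flagged by `likely_needs_solubilization_tag` would be a candidate for fusion to one of the tags listed above. The check is only a first filter and does not replace the trial and error that tag selection normally requires.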
Protein solubility changes abruptly when inclusion bodies form, leading to protein clumping. Proteins that are initially produced in a soluble form can also end up as insoluble inclusion bodies, primarily in the reducing environment of the cytoplasm, and must then be redissolved and refolded into an active functional structure. Insolubility or instability of proteins expressed in E. coli is managed by proper strain selection, targeting of the expression compartment, and post-translational modification [107]. Additionally, the protein can be localized to the periplasmic space by introducing an N-terminal periplasmic signal sequence. Although the process of native disulfide formation is often inflexible, the Sec secretion system and the periplasmic folding apparatus can be overwhelmed with relative ease.

Disulfide Bond
The predominant challenge encountered is the formation of native disulfide bonds. There are three distinct approaches to address this issue. First, the protein may form misfolded or unfolded aggregates, commonly referred to as inclusion bodies. Aggregates within inclusion bodies caused by the cytoplasmic production of antibody fragments (Fab and scFv) in E. coli frequently result in significant protein loss throughout the recovery process [108]. Removing cysteine residues from the recombinant antibody sequences is one way to enhance soluble expression. Another choice is periplasmic expression, although yields can be a problem. Misfolding, inadequate solubility, and host burden are other causes of IB formation.
It is possible to employ a genetically modified variant that eliminates the pathways responsible for reducing disulfide bonds inside the cytoplasm or introduces catalysts that facilitate oxidative folding [109]. Integrating the twin-arginine translocation (TAT) secretion system can facilitate the export of properly folded proteins to the periplasmic space. Overexpression of the TatABC membrane proteins improves TAT translocation when recombinant proteins fused to the TorA signal peptide use the TAT pathway [110].
Since disulfide bond (DSB) formation is an oxidative process, it occurs in the periplasm of E. coli rather than the cytoplasm, which is a reducing environment [111][112][113][114][115]. This calls for the localization and translocation of the protein to the correct location for the modification [116]. If a protein requires a disulfide bond, the periplasm is suitable because of its oxidizing properties. The reducing cytoplasmic milieu of a gor/trxB strain becomes oxidizing when the normal reduction pathway is blocked, which promotes the formation of DSBs [117]. Based on this idea, Novagen [118] created Origami, the first commercial DSB-forming E. coli strain. A host dubbed CyDisCo was also created to produce recombinant molecules with a high number of DSBs by overproducing the human disulfide bond isomerase and the sulfhydryl oxidase from yeast mitochondria [119]. Using different sulfhydryl oxidases, inversion, or the periplasmic transmembrane disulfide bond-forming enzyme DsbB are additional strategies [120].
The development begins by isolating and sequencing the variable regions of the light and heavy chains from hybridoma or B cells. These sequences are then cloned into a plasmid and introduced into E. coli. Additional requirements can involve adding constant regions for the heavy and light chains. The constructs include a signal peptide sequence for periplasmic localization and a tag for easier purification. E. coli is transformed with the plasmid containing the antibody sequences, and expression is induced with an inducer such as isopropyl-beta-d-thiogalactopyranoside (IPTG) if the promoter used is lac. Following induction, the protein, including antibodies, can be extracted from the periplasmic space, into which E. coli often secretes recombinant proteins, and further purified using an affinity column that targets the tag in the construct or other purification strategies based on the properties of the antibody. It is then tested for functionality using techniques such as ELISA, flow cytometry, or Western blotting to verify that the expressed antibody can bind its antigen.
Yeast mitochondrial sulfhydryl oxidase and human cell disulfide bond isomerase have been overexpressed to produce the CyDisCo host [121].

Post-Translational Modifications
PTMs, which include acetylation, glycosylation, and phosphorylation, influence the functional activity of recombinant products [122]. Glycosylation, in which functional activity increases through the joining of monosaccharides, oligosaccharides, or polysaccharides to proteins, is the most prevalent and complex PTM [123]. Because E. coli lacks a natural mechanism for glycosylation, it must be glycoengineered from the bottom up by adding Campylobacter jejuni-related genes [124]. The first N-glycosylation expression system enabled glycoprotein synthesis in this host. In the last two decades, a considerable number of N/O-glycoproteins derived from E. coli or cell-free extracts have been generated. These advancements encompass the identification and orthogonality of glycosyltransferase substrates, investigations into the roles of diverse glycosylases, and enhancements in the host environment, metabolic pathways, and culture conditions [125][126][127][128][129]. These methods are used to create several products, such as the recombinant vaccine exotoxin A, the therapeutic protein O-glycosylated interferon-alpha2b [130], and N-glycosylated mannose3-N-acetylglucosamine2 [131]. Serine recombinase expression increases, while the replication origin (oriC) is knocked down during late development stages, inhibiting host growth [132].
Recent research indicates that aglycosylated antibodies may have similar characteristics and functions, raising the possibility that glycosylation may not be required [133]. Despite minor biophysical changes in the melting temperatures (Tm) of glycosylated Fc (gFc) and aglycosylated Fc (aFc) [124], the orientation of the CH2 domain of aFc in solution was only very slightly affected, according to a small-angle X-ray scattering study. This observation is supported by the lack of discernible changes in the "drug-like" properties of E. coli-produced antibodies; one reported example is antibodies against COVID-19 made using E. coli, which are not glycosylated yet have proven as potent as other types of antibodies.
Methodologies in synthetic biology have been employed to facilitate the production of various PTMs within the cytoplasmic environment. One such example is mucin-type O-glycosylation in Escherichia coli. Combining oligosaccharide production pathways with ApNGT overexpression enables cytoplasmic N-glycosylation [134].
It is noteworthy that E. coli's cytoplasm harbors methionine aminopeptidase, an enzyme capable of eliminating the initiating methionine residue based on the nature of the subsequent amino acids. Specifically, amino acids such as serine, alanine, cysteine, proline, or glycine at position P1' are preferred, while proline at position P2' inhibits this process [135]. Furthermore, additional amino acids can be included in this list by implementing engineered systems. Additionally, this phenomenon is interconnected with the N-end rule, which governs the process of protein degradation inside a cellular environment. In Escherichia coli, proteins with an N-terminal residue of Arginine (Arg), Lysine (Lys), Leucine (Leu), Phenylalanine (Phe), Tyrosine (Tyr), or Tryptophan (Trp) are susceptible to fast degradation. However, the extent of degradation is contingent upon the specific characteristics of the N-terminal residue and the subsequent amino acids.
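The two rules in this paragraph can be written as a pair of simple checks, sketched below. The residue sets are taken directly from the text; the function names are hypothetical, and the checks ignore the context dependence noted above, so they should be read as a first-pass screen rather than a prediction.

```python
# Residues at P1' (the position after the initiator Met) that favour Met removal,
# and N-terminal residues associated with fast degradation, both as listed in the text.
METAP_FAVOURED_P1_PRIME = frozenset("SACPG")
N_END_DESTABILIZING = frozenset("RKLFYW")


def metap_removes_initiator_met(protein):
    """True if the initiator Met is expected to be removed under the simple rule above."""
    seq = protein.upper()
    if len(seq) < 3 or seq[0] != "M":
        raise ValueError("expected at least 3 residues starting with M")
    p1_prime, p2_prime = seq[1], seq[2]
    # A favoured residue at P1' permits removal unless proline occupies P2'.
    return p1_prime in METAP_FAVOURED_P1_PRIME and p2_prime != "P"


def has_destabilizing_n_terminus(mature_protein):
    """True if the first residue of the mature protein is one the N-end rule flags."""
    if not mature_protein:
        raise ValueError("empty sequence")
    return mature_protein[0].upper() in N_END_DESTABILIZING
```

The second check is most useful for the mature N-terminus that remains after a signal peptide or fusion tag has been cleaved, since that residue, rather than the initiator methionine, is then exposed.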

Strain and Media
Growth conditions and the composition of the culture medium influence recombinant protein production, and both can be optimized [136]. Different strains of E. coli are designed to overcome specific challenges in protein expression, such as protein toxicity, formation of inclusion bodies, or codon usage. For example, BL21(DE3) and its derivatives are common strains used for recombinant protein production [137]. Rosetta is another strain that helps express proteins whose genes contain codons rarely used in E. coli [138].
To obtain optimal expression, the temperature of the upstream process should be optimized: the optimal temperature for E. coli growth is usually 37 °C [139], but lower temperatures, such as 30 °C or 25 °C, may promote proper folding and solubility of the expressed proteins [140].
The choice of media depends on the required growth rate, the need for specific selection markers, or the type of protein being expressed. LB medium is the most commonly used for protein expression [141]. Specialized media (e.g., M9) can be customized for specific needs [142].
Six low-abundance tRNA genes have been integrated into the chromosome of the BL21(DE3) strain to enable their production in the presence of a ribosomal manipulator [143]. After developing an appropriate protein expression construct, the subsequent stage involves protein expression, in which the E. coli system offers notable advantages. Escherichia coli is a bacterial species that exhibits significant genetic diversity, with only approximately 20% of the genome being shared among all strains. Strains can be broadly classified into four sub-groupings, K-12 strains, B-strains, C strains, and W strains, which are distinguished by their original isolation. Recombinant protein manufacturing commonly uses K-12 and B-strains. The expression of specific proteins is notably strain-dependent, frequently for unexplained reasons. Consequently, testing any novel protein in both a K-12 and a B-strain has become customary. In a similar vein, there is a diverse range of media options that can be categorized into two main groups: rich media, characterized by the inclusion of yeast extract and/or other mixed sources of peptides such as tryptone, and chemically defined or minimal media, typically consisting of only 1-3 carbon sources and a single nitrogen source.

Fermentation Conditions
Moreover, it is possible to manipulate the fermentation conditions, including the temperature of the culture after induction. This factor can significantly impact the production of properly folded proteins. The effect is attributed to the alteration in relative hydrophobicity at different temperatures and the reduced protein translation rate. This adjustment is necessary to ensure the folding machinery can handle the protein load without being overwhelmed. Recombinant protein preparations are also designed to exclude lipopolysaccharides (LPS) as potential contaminants.

Purification
Protein purification methods may vary considerably depending on the host organism used for protein expression, including Escherichia coli (E. coli), a commonly used prokaryotic host, or various eukaryotic and recombinant cells.
The affinity tag removal following protein purification can be achieved through proteolysis, wherein enzymes with broad specificity, such as trypsin, are employed.This enzymatic process is utilized to eliminate an N-terminal tag and the C-peptide from insulin derivatives, depending on the intended application of the protein [144].
The tags can also be removed with more specific proteases such as Tobacco Etch Virus (TEV) protease (consensus site ENLYFQ↓G/S) and Factor Xa (consensus site IE/DGR). It is worth noting that a protease's efficacy may depend on its source; for example, recombinant bovine Factor Xa has a different specificity than recombinant human Factor Xa [145,146]. Consulting the MEROPS database helps in choosing a protease [147]. By virtue of their specificity, proteases frequently exclude one or more amino acids from the cleavage site. Because proteases cannot access buried cleavage sites, the cleavage site must be placed within a flexible linker region, typically rich in glycine or serine. Consequently, additional residues are added to the mature protein. Proteases can also degrade recombinant proteins, thereby diminishing yields and compromising stability.
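The consensus sites quoted above can be located in a fusion construct with a short pattern search, sketched below. The patterns encode only the consensus sequences given in the text; placing the Factor Xa cut after the arginine of IE/DGR is stated here as an assumption of the sketch, and the dictionary and function names are hypothetical. Real proteases tolerate variation around these sites, so a database such as MEROPS remains the reference.

```python
import re

# Each entry: (pattern, offset from match start to the cut position).
# TEV cuts between Q and G/S; Factor Xa is assumed to cut after the R of I(E/D)GR.
PROTEASE_SITES = {
    "TEV": (re.compile(r"ENLYFQ(?=[GS])"), 6),
    "FactorXa": (re.compile(r"I[ED]GR"), 4),
}


def find_cleavage_sites(protein, protease):
    """Return 0-based cut positions; the protein is split as seq[:cut] and seq[cut:]."""
    pattern, offset = PROTEASE_SITES[protease]
    return [m.start() + offset for m in pattern.finditer(protein.upper())]


def cleave(protein, protease):
    """Split a protein at every consensus site for the given protease."""
    seq = protein.upper()
    bounds = [0, *find_cleavage_sites(seq, protease), len(seq)]
    # Skip empty fragments, which arise when a site sits at the very end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]
```

For a hypothetical His-tagged construct, `cleave("MHHHHHHENLYFQGSTARGET", "TEV")` returns the tag fragment and `GSTARGET`, showing how the G of the TEV site remains on the released protein, whereas `cleave("MHHHHHHIEGRTARGET", "FactorXa")` releases `TARGET` with no extra N-terminal residue.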

Cell-Free Protein Synthesis System (CFPS)
Proteins can also be produced in a cell-free system, an in vitro process in which even difficult-to-express molecules can be produced by manipulating the reaction medium, which may not be possible when using cell culture [148]. This technology relies on engaging ribosomes in vitro, rather than in vivo in cells, to translate proteins of any type. CFPS systems provide a controlled environment for protein synthesis and offer several advantages, including rapid production, flexibility, and the ability to incorporate non-natural amino acids or post-translational modifications. The in vitro technology uses extracts of E. coli or other cell sources, such as rabbit reticulocytes, insect cells, or wheat germ, available from commercial sources, making cell-free synthesis a substantially less capital-intensive and practical choice [149]. The most widely used source remains E. coli [150].
Traditionally, cell-free protein synthesis has involved separate steps for transcription and translation. However, efforts have been made to develop systems that couple transcription and translation, resulting in more efficient and streamlined protein synthesis processes [151]. CFPS systems require an energy source, such as adenosine triphosphate (ATP), and amino acids for protein synthesis. ATP can be generated by adding an energy-regenerating system, such as creatine phosphate and creatine kinase. A mixture of natural or non-natural amino acids is supplemented to support protein production [152].
Hybrid cell-free systems combine the advantages of different cell extracts or components from various organisms to create tailored environments for protein synthesis. By leveraging the strengths of different sources, these hybrid systems can overcome limitations and expand the range of proteins that can be synthesized [153].
CFPS systems can be modified to incorporate non-natural amino acids into synthesized proteins, enabling the production of proteins with enhanced properties or novel functionalities. This is achieved by supplementing the reaction mixture with non-natural amino acids and using orthogonal translation systems that selectively incorporate these amino acids at specific codons [154].
Cell-free protein synthesis can be optimized by adjusting various parameters, such as reaction conditions, concentrations of components, and supplementation with specific factors. Optimization strategies include using different energy-regenerating systems, additives, or chaperones to enhance protein yield, folding, and functionality. Scale-up of CFPS can be achieved by increasing the reaction volume or employing high-throughput microscale platforms [155].
While CFPS primarily focuses on protein synthesis, efforts have been made to introduce post-translational modifications (PTMs) into cell-free systems. PTMs such as phosphorylation, glycosylation, and acetylation can be incorporated into proteins by adding the required enzymes or co-factors to the reaction mixture. This allows the production of proteins with functional PTMs for downstream applications [156].
Membrane proteins play crucial roles in various cellular processes and are challenging to produce using traditional expression systems. Cell-free protein synthesis offers a promising approach to producing membrane proteins, allowing for the incorporation of necessary membrane components and the use of specialized cell extracts [157].
DNA templates encoding the desired proteins are prepared and added to the cell extracts. These templates can be obtained through gene synthesis, PCR amplification, or isolation from natural sources. They typically contain a strong promoter sequence to drive transcription and a ribosome binding site (RBS) for efficient translation initiation [158].
Once the cell-free reaction mixture is prepared, it is incubated at an appropriate temperature, typically around 30-37 °C, to facilitate protein synthesis. The synthesized proteins can be harvested and purified using various techniques, including affinity chromatography, filtration, or precipitation [159].
Various platforms and technologies have been developed for cell-free protein synthesis, each offering unique advantages and capabilities. These include microfluidic, droplet-based, and cell-free systems integrated with nanomaterials or synthetic biology components [160].
The E. coli-based expression of proteins can also be adapted for novel technologies, particularly the cell-free synthesis system (CFPS) and continuous manufacturing (CM).
The FDA has not yet approved any biological drug manufactured using cell-free synthesis, but it is anticipated that biosimilars will adopt this technology since using a different technology is allowed in producing biosimilars; the cost factors will drive this decision.

Continuous Manufacturing (CM)
Continuous manufacturing (CM) of chemical and biological products has long been a goal for reducing the cost of manufacturing; however, Current Good Manufacturing Practice (cGMP) compliance issues held it back until March 2023, when the FDA released its first guideline on how to develop and adopt continuous manufacturing, particularly for biological products [161] (Figure 3). Continuous manufacturing requires a perfusion system, and it can be designed to use E. coli, which is a better choice than CHO cells because of its much shorter batch cycle, generally a few hours compared to weeks for CHO cells. In E. coli, the proteins can be directed to the cytoplasm or periplasm or secreted directly into the culture medium, offering several choices for routing the recombinant protein that exploit the features of each cellular compartment and of the protein produced.
The quest for a CM process has been in the works for several years [162]. While the FDA has yet to approve a biological product manufactured in a continuous system, it anticipates much interest. As a result, in March 2023, the FDA released its first guidance on CM [163], addressing the scientific and regulatory issues, including the eCTD filing structure, that arise during the design, installation, operation, and lifecycle management of CM for chemical and biological drugs. This guideline has opened the path to continuous systems over batch systems, which will significantly affect development and production costs and the stability of proteins and will allow a significant reduction in the size of the bioreactors. Figure 3 shows a manufacturing flow in a CM process for a therapeutic protein. The FDA also identifies other guidelines that control CM. For recombinant protein technology, the batch process is the industry standard. However, proteins can also be produced in a vessel from which the yield is continuously removed, provided the protein is secreted into the culture medium. This helps to improve the yield of labile proteins and prevents inconsistent PTMs while maintaining cells at higher viabilities and critical factors [164]. Besides the material costs, the reduced testing also adds significant savings.

Conclusions
The dawn of recombinant engineering came with E. coli expression systems. However, these were overshadowed by eukaryotic systems that proved more suitable for expressing more complex and larger recombinant proteins. The E. coli system is re-emerging as a cost-effective option for all proteins, including antibodies and, soon, full-length ones, since technologies have evolved that make it possible to engineer E. coli to express any protein, bringing down the production cost through a shorter manufacturing cycle and the avoidance of viral contamination, and providing the base material to bring CFPS to reality. E. coli can be designed for continuous manufacturing, a feat not possible a few years ago. Most limitations of the E. coli system have now been resolved because of deep knowledge about E. coli; many technologies, including the updated BL21(DE3) genome, novel engineering, AI-based molecular docking, sensitive analytics, accurate optimization, and the fusion of several engineering and mathematical tools; and many innovations that were not available when the reference product was developed, likely 25 years ago [165]. The suggestions made in this paper should encourage the developers of both new biologics and biosimilars to consider adopting these improved and more recent technologies to help

Figure 2. A systematic process of creating a plan to express a full-length antibody in E. coli. Arrow indicates optimized host follow on.

Figure 3. Continuous manufacturing system flow (as shown by the arrows) for therapeutic proteins as defined by the FDA. (Source: FDA.)