Enhanced Production of a Thermostable Carbonic Anhydrase in Escherichia coli by Using a Modified NEXT Tag

Carbonic anhydrase (CA) is an ultrafast enzyme that catalyzes the reversible conversion of carbon dioxide (CO2) to bicarbonate. CA is considered to be a green catalyst for enzyme-based CO2 capture and utilization. In particular, the CA of Thermovibrio ammonificans (taCA) has attracted increasing attention as a highly stable enzyme. However, the poor solubility and the low expression level in Escherichia coli have hampered further utilization of taCA. In a recent study, these limitations were partly resolved by using a small solubility-enhancing fusion tag named NEXT, which originates from the N-terminal extension of Hydrogenovibrio marinus CA. In this study, the NEXT tag was engineered by adding small peptides to the N terminus to further increase the production yield of NEXT-tagged taCA. The addition of ng3 peptide (His-Gly-Asn) originating from the N-terminal sequence of Neisseria gonorrhoeae CA improved the expression of NEXT-taCA, while the previously developed translation-enhancing element (TEE) and Ser-Lys-Ile-Lys (SKIK) tag were not effective. The expression test with all 16 codon combinations for the ng3 sequence revealed that the change in translation initiation rate brought about by the change in nucleotide sequence was not the primary determinant for the change in expression level. The modified ng3-NEXT tag may be applied to increase the production yields of various recombinant proteins.


Introduction
Carbonic anhydrase (CA, EC 4.2.1.1) is a ubiquitous (metallo)enzyme that plays important roles in various physiological processes, such as carbon dioxide (CO2) transport, CO2 metabolism, and pH homeostasis, by catalyzing the reversible hydration of CO2: CO2 + H2O ↔ HCO3− + H+ [1]. CAs are classified into eight families, including the recently discovered ι-CA family, that are evolutionarily distinct [2]. CAs are diffusion-limited enzymes with a kcat of up to 4.4 × 10⁶ s⁻¹ [3]. Due to the ultrafast kinetics, CA is considered to be a powerful biocatalyst for CO2 capture and utilization (CCU) technologies to respond to climate change [4]. By accelerating CO2/HCO3− interconversion, CA can supply a sufficient amount of the inorganic carbon species as a feedstock for formic acid or methanol production [5,6], in vitro carboxylation [7], microalgal cultivation [8], and mineral carbonation [9]. For industrial utilization of CA, however, the low stability of CA and the high cost of CA production should be overcome.
The CA of Thermovibrio ammonificans (taCA), which originated from a deep-sea hydrothermal vent, has been expressed and purified in an Escherichia coli host [10][11][12]. A study on the purified recombinant taCA has demonstrated that taCA is the most thermostable bacterial CA with a half-life of 77 days at 60 °C under an aqueous buffer condition, making this stable enzyme a promising candidate for enzyme-based CCU [10]. However, the production yield of taCA in E. coli was relatively low, and, more seriously, the purified taCA showed poor solubility under low salt conditions, resulting in protein aggregation and precipitation [10].
Solubility-enhancing proteins can be used as fusion tags not only for circumventing the poor solubility but also for improving the expression level of target proteins [13]. Recently, a small but powerful solubility enhancer, named the NEXT tag, was developed [14]. The NEXT tag is a protein that is 53 amino acids in length, has a molecular mass of 5.5 kDa, and originated from the N-terminal extension of Hydrogenovibrio marinus CA. The NEXT tag is an intrinsically disordered protein that can entropically exclude large particles around its point of attachment to a target protein by random movements, preventing protein aggregation [14][15][16]. Despite its small size, the ability of the NEXT tag to improve both the solubility and production yield of target proteins was found to be superior to that of conventional tags, such as maltose-binding protein (MBP) with a molecular mass of 40.4 kDa. By the N-terminal fusion of the NEXT tag, a high-level expression of soluble taCA was achieved and no precipitation of purified enzyme was observed. In addition, the NEXT tag only minimally affected the activity and stability of taCA, while the inherent properties of taCA were abnormally altered by fusion to the large MBP. The outstanding ability of the NEXT tag prompted us to engineer the NEXT tag for more efficient production of target proteins.
Protein engineering of the NEXT tag by directed evolution can be highly effective at finding improved variants; however, it is generally time-consuming and labor-intensive. The design and introduction of a point mutation for improving protein expression is virtually impossible due to our poor understanding of the influence of such a mutation on the overall expression level of the target protein. In addition, these approaches can add another level of complexity since any amino acid substitution can potentially affect the function of the NEXT tag. Because protein expression is controlled primarily in the N-terminal coding region as well as in the 5′ untranslated region [17], a change in amino acid sequence at the N terminus of the protein, as exemplified by the N-terminal fusion of small peptide tags, can drastically affect the expression level [18][19][20]. Based on this simple idea, herein, the N-terminal fusion of short peptides is examined to engineer the NEXT tag and improve the expression level of NEXT-tagged proteins, particularly NEXT-taCA. We show that the tripeptide His-Gly-Asn derived from the N-terminal sequence of a highly expressed CA of Neisseria gonorrhoeae is effective at improving the expression level of NEXT-tagged target proteins.

E. coli Strains and Plasmid Vector Construction
The strains, plasmids, and oligonucleotide primers used in this study are listed in Table 1. E. coli TOP10 (Thermo Fisher Scientific, Waltham, MA, USA) was used for the construction of plasmid vectors, and E. coli BL21(DE3) (Merck Millipore, Burlington, MA, USA) was used for recombinant protein expression. The cells were routinely cultivated in Luria-Bertani (LB) medium supplemented with appropriate antibiotics (50 µg/mL of ampicillin for recombinant strains and 10 µg/mL of streptomycin for wild-type E. coli TOP10) at 37 °C and 180 rpm in a shaking incubator (Jeiotech, Daejeon, Korea). The genes for modified NEXT tags were amplified by polymerase chain reaction (PCR) (Takara Bio, Shiga, Japan) using the primers listed in Table 1 and the previously constructed pET-NEXT-taCA [14] as the template. The PCR products were ligated into the pGEM-T Easy vector (Promega, Madison, WI, USA) and the insert sequences were confirmed by direct sequencing. The genes were subcloned into pET-NEXT-taCA and pET-NEXT-GFP treated with NdeI and NcoI restriction enzymes, substituting the original NEXT tag sequence with the modified ones. All of the recombinant proteins have a hexahistidine (His6) tag at the C terminus and they were expressed under the control of the T7lac promoter. The taCA gene was codon-optimized for E. coli with a codon adaptation index (CAI) of 0.88 [10]. The DNA and protein sequences of NEXT-taCA are shown in Table 2.
Table 1 (excerpt). pET-NEXT-taCA: expression plasmid carrying the NEXT-taCA fusion gene [14]. pET-NEXT-GFP: expression plasmid carrying the NEXT-GFP fusion gene [14]. In the primer sequences, restriction sites are underlined, and the regions that contain fusion peptides are in bold type.

Recombinant Protein Expression
Recombinant E. coli BL21(DE3) strains transformed with the constructed plasmids were incubated in LB medium supplemented with 50 µg/mL of ampicillin at 37 °C in the shaking incubator. The expression of recombinant protein was induced at an OD600 of 0.6-0.8 by adding 1 mM isopropyl-β-D-thiogalactopyranoside (Duchefa Biochemie, Haarlem, The Netherlands). In addition, a sufficient amount of zinc was supplemented as the enzyme cofactor by adding 0.1 mM ZnSO4 (Junsei, Tokyo, Japan). The cells were further cultivated at 37 °C for 12 h, followed by centrifugation at 4 °C and 4000× g for 10 min and resuspension in lysis buffer (50 mM sodium phosphate, 300 mM NaCl, and 10 mM imidazole; pH 8.0). The cell lysate was prepared by disrupting the cell suspension with an ultrasonic dismembrator (Sonics and Materials, Newtown, CT, USA) for 10 min at 20% amplitude (a 2-s pulse on and a 10-s pulse off) in ice water. The lysate was centrifuged at 4 °C and 10,000× g for 10 min, and the supernatant was designated the soluble fraction (S), while the pellet was designated the insoluble fraction (IS).
Table 2 (note). The sequence of the NEXT tag is in bold type and the (GGGGS)2 linker sequence is italicized. The His6-tag sequence is underlined along with the XhoI restriction site.

Recombinant Protein Purification
Recombinant protein was purified by immobilized metal affinity chromatography (IMAC) via the His6 tag. After cell lysis, the soluble fraction was mixed with Ni2+-nitrilotriacetic acid agarose beads (Qiagen, Hilden, Germany), and the recombinant protein was purified according to the manufacturer's instructions. The purified protein was thoroughly dialyzed against 20 mM sodium phosphate buffer supplemented with 300 mM NaCl (pH 7.5) at 4 °C.

CO2 Hydration Assay
CA activity was measured by CO2 hydration assay based on the time-course pH change estimated by using phenol red (Sigma-Aldrich, St. Louis, MO, USA) as a pH indicator [21]. An appropriate amount of diluted sample (usually 10 µL) was mixed with 600 µL of 20 mM Tris buffer (pH 8.3) supplemented with 100 µM phenol red in a disposable cuvette. Four hundred microliters of ice-cold CO2-saturated deionized water was added and mixed by thorough pipetting. The reaction was performed at 4 °C inside a UV-Visible spectrophotometer (Shimadzu, Kyoto, Japan) and the time-course absorbance change was monitored at 570 nm. The noncatalyzed reaction was measured by adding the same amount of blank buffer instead of enzyme solution. The time (t) required for the absorbance to decrease from 1.2 (corresponding to pH 7.5) to 0.18 (corresponding to pH 6.5) was obtained. The Wilbur-Anderson unit (WAU) was calculated as (t0 − t)/(t × 5), where t0 is the time for the noncatalyzed reaction [21,22].
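The unit calculation above can be illustrated with a minimal Python sketch. It applies the formula exactly as stated, (t0 − t)/(t × 5). The function name, the parameter name `scale` for the factor of 5, and the example timings are assumptions made for illustration and are not values from this study.

```python
def wilbur_anderson_units(t0: float, t: float, scale: float = 5.0) -> float:
    """Return WAU computed as (t0 - t) / (t * scale).

    t0: time for the noncatalyzed reaction (s)
    t:  time for the catalyzed reaction (s)
    scale: the factor of 5 that appears in the formula given in the text
    """
    if t0 <= 0 or t <= 0:
        raise ValueError("reaction times must be positive")
    return (t0 - t) / (t * scale)


# Hypothetical timings, for illustration only (not measured data).
example_wau = wilbur_anderson_units(t0=60.0, t=10.0)
```

A catalyzed time equal to the noncatalyzed time gives 0 WAU, so the value reflects only the acceleration contributed by the enzyme sample.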

Quantification of Purified Protein
The purified enzyme was denatured in denaturing buffer (6 M guanidine hydrochloride/20 mM sodium phosphate buffer; pH 7.5), and the absorbance of the denatured protein at 280 nm was measured in a quartz crystal cuvette (Hellma Analytics, Müllheim, Germany). The protein concentration of the purified sample was determined using the measured absorbance and the calculated extinction coefficient at 280 nm by ProtParam (http://web.expasy.org/protparam/) [23].
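The conversion from absorbance to concentration follows the Beer-Lambert law, c = A280/(ε × l). The sketch below shows that arithmetic only. The extinction coefficient, absorbance, and 1 cm path length used in the example are hypothetical placeholders; the actual coefficient would be the one ProtParam calculates for the protein sequence.

```python
def concentration_micromolar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Convert absorbance at 280 nm to molar concentration in micromolar.

    epsilon: molar extinction coefficient in M^-1 cm^-1 (e.g. from ProtParam)
    path_cm: optical path length of the cuvette in cm (assumed 1 cm here)
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    molar = a280 / (epsilon * path_cm)  # Beer-Lambert: A = epsilon * c * l
    return molar * 1e6


# Hypothetical inputs, for illustration only.
conc_um = concentration_micromolar(a280=0.5, epsilon=50_000.0)
```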

Thermostability Test
The concentration of purified enzyme was adjusted to 10 µM. The enzyme was incubated at 80 °C for 12 h and then immediately cooled on ice. The CO2 hydration activity of the incubated sample was measured and compared with the activity of the nonincubated sample. The relative activity of the incubated sample was calculated and is presented as residual activity (%).

Fluorescence Measurement
The lysates of green fluorescent protein (GFP)-expressing cells were used for the measurement. The GFP fluorescence was measured with excitation and emission wavelengths at 430 nm and 510 nm, respectively, using a microplate reader (Tecan, Männedorf, Switzerland).

Effects of N-Terminal Addition of Translation-Enhancing Element and SKIK Tag
The translation-enhancing element (TEE), also known as the downstream box, is a cis element downstream of the start codon that can enhance the translation initiation by complementarily binding to a region of 16S rRNA [26]. The TEE sequence consists of 5′-AATCACAAAGTG-3′, which corresponds to the amino acid sequence of Asn-His-Lys-Val. As the first trial, the TEE sequence was added to the N terminus of NEXT-taCA right after the start codon. We tested both the direct fusion of the TEE sequence to the NEXT tag (TEE1) and the replacement of the first four amino acids of the NEXT tag with the TEE sequence (TEE2). Unfortunately, the expression level of NEXT-taCA (~33.1 kDa) decreased by 31% (TEE1) and 43% (TEE2) after the fusion of the TEE sequence (Figure 1a).
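The correspondence between the TEE nucleotide sequence and the Asn-His-Lys-Val peptide can be checked with a short Python sketch. It builds the standard genetic code from the conventional TCAG ordering; the function and variable names are illustrative choices, not part of the original methods.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (5' to 3') into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


tee_peptide = translate("AATCACAAAGTG")  # TEE sequence quoted in the text
```

Here `tee_peptide` is the one-letter form of Asn-His-Lys-Val (NHKV).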
The Ser-Lys-Ile-Lys (SKIK) tag is a short peptide that was developed based on the most frequently found amino acid at each position in the four N-terminal amino acid positions excluding the first Met in highly expressed E. coli proteins [20]. Similar to the case of TEE fusion, we examined the effect of addition of the SKIK tag to the N terminus of NEXT-taCA by direct fusion (SKIK1) or by replacement of the N-terminal sequence of the NEXT tag (SKIK2). Again, both the constructs with the SKIK tag resulted in the decreased production of NEXT-taCA (Figure 1b). Although the TEE and SKIK sequences have been successfully used for improving the production of other recombinant proteins, they appeared to negatively affect the expression of NEXT-tagged proteins when combined with the NEXT tag, presumably by reducing the efficiency of the NEXT tag for gene expression.

Improved Production of NEXT-taCA by Fusion with a Peptide Derived from ngCA
The CA from N. gonorrhoeae (ngCA) is one of the most extensively studied bacterial CAs [27,28]. The ngCA is a highly soluble protein and its high-level expression in E. coli has been successfully demonstrated in previous studies [29,30]. Inspired by the high-level expression of soluble ngCA, the N-terminal sequences of ngCA were examined as fusion tags for improving the expression of NEXT-taCA.
At first, the peptides that consist of three (ng3), five (ng5), seven (ng7), and nine (ng9) N-terminal amino acids of ngCA excluding the first Met were tested as fusion tags. The nine amino acids with a sequence of His-Gly-Asn-His-Thr-His-Trp-Gly-Tyr (HGNHTHWGY) correspond to 27 nucleotides, which is long enough to cover the coding region in the ribosome docking site and thus affect the translation initiation rate [17]. The expression of the four variants along with the unmodified NEXT-taCA revealed that the ng3-NEXT-taCA, fused with the HGN sequence, showed the highest expression level, with ~36% higher production than the unmodified control as analyzed by densitometric quantification (Figure 2a). The catalytic activities for CO2 hydration were measured using the cell lysates, and the highest activity, which was ~55% higher than that of the unmodified control, was obtained when using the lysate of ng3-NEXT-taCA (Figure 2b). We further tested whether one amino acid (ng1, His) or two amino acids (ng2, His-Gly) of ngCA would be better than the ng3 sequence. The expression levels and the activities of cell lysates of both ng1- and ng2-fused NEXT-taCA were reduced compared with those of the ng3-NEXT-taCA (Figure 2c,d). These results demonstrate that, among the tested peptides, the addition of the HGN sequence to the N terminus was the most effective for improving the production yield of NEXT-taCA.
The catalytic activity of a cell lysate can be increased not only by an increased production of enzyme but also by an increase in the specific activity of the enzyme. Because it would be possible that the addition of HGN alters the structure and, in turn, the specific activity of NEXT-taCA, we purified the ng3-NEXT-taCA along with the original NEXT-taCA from the lysate by IMAC and measured the specific activity of the enzyme for CO2 hydration. The specific activity of NEXT-taCA was not altered by the fusion of the HGN sequence (Figure 3), implying that the increased enzymatic activity of the cell lysate was exclusively due to the improved enzyme production. In addition, the thermal stability of NEXT-taCA also was not significantly affected by the addition of the HGN sequence (Figure 3). Thus, it can be concluded that the modified NEXT tag with the additional HGN sequence successfully improved the expression level compared with the unmodified control with a minimal influence on the properties of the target protein.
To further test the applicability of the ng3-NEXT tag, we constructed ng3-NEXT-GFP and compared it with NEXT-GFP. The expression level of NEXT-GFP was successfully improved by the addition of ng3 peptide (Figure 4a). The GFP fluorescence intensity of the cell lysate of ng3-NEXT-GFP was also higher (18%) compared with that of NEXT-GFP (Figure 4b). This result suggests that the modified NEXT tag with the N-terminal ng3 sequence can be potentially used as a general tag for better expression of the target protein compared with the original NEXT tag.

Combinatorial Test of the Coding Sequence for the ng3 Peptide
The coding sequence for the ng3 peptide used in our experiment was 5′-CACGGCAAT-3′. Although the fusion of ng3 improved the production of NEXT-taCA, it was not clear whether the improved production was achieved in the context of the amino acid or the nucleotide sequence. To investigate the effect of using synonymous codons on the protein expression level and optimize the codon combination of the ng3 peptide, we designed all 16 possible codon combinations for the HGN sequence, comprising the original one (c0) and 15 synonymous variants (c1 to c15), and constructed the corresponding vectors for the expression of ng3-NEXT-taCA (Table 1).
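The count of 16 follows from the standard genetic code: His has two codons, Gly has four, and Asn has two, so 2 × 4 × 2 = 16. The Python sketch below enumerates them. The ordering of the output is arbitrary and does not reproduce the c0 to c15 labels, which are defined in Table 1 and are not assumed here.

```python
from itertools import product

# Synonymous codons in the standard genetic code for the three ng3 residues.
SYNONYMOUS_CODONS = {
    "H": ["CAT", "CAC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "N": ["AAT", "AAC"],
}


def codon_combinations(peptide: str) -> list[str]:
    """Return every coding sequence that encodes the given peptide."""
    choices = (SYNONYMOUS_CODONS[residue] for residue in peptide)
    return ["".join(codons) for codons in product(*choices)]


ng3_variants = codon_combinations("HGN")
original = "CACGGCAAT"  # coding sequence used in the study (c0)
```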
Because all of the variants with different codon combinations would preserve the encoded amino acid sequence and it is not likely that the synonymous codon substitutions would perturb the enzyme folding and structure, which in turn would alter the specific activity of the enzyme [31], it could be assumed that the CO2 hydration activity of the cell lysate would be proportional to the expression level of ng3-NEXT-taCA. The expression levels of the variants did not significantly deviate from that of the original ng3-NEXT-taCA, although some of them showed lower expression levels (Figure 5a). In addition, none of the codon combinations of ng3 clearly resulted in an improved expression of the enzyme compared with the original c0 (Figure 5a). Since it is generally accepted that protein expression is proportional to the rate of translation initiation [17], the translation initiation rates were predicted for the variants by using the UTR designer to compare them with the experimentally obtained expression levels (Table 3) [25]. Notably, the predicted rate for the original ng3-NEXT-taCA (70,038) was remarkably lower than that for NEXT-taCA (997,966) (Table 3), while ng3-NEXT-taCA showed a higher expression level compared with NEXT-taCA as previously shown (Figure 2). In addition, the analysis of the relation between the predicted rates of ng3-NEXT-taCA variants and the experimental results revealed that the two factors showed no linear correlation (Figure 5b). These results show that the change in translation initiation rate brought about by the change in codon combination (i.e., nucleotide sequence) was not the primary determinant for the change in expression level.
The improved expression of NEXT-taCA by the fusion of ng3 was likely to be achieved in the context of the amino acid sequence, which is similar to the case of the SKIK tag where two completely different codon combinations for the SKIK tag showed no significant difference in the expression level of the target protein [20].
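One common way to test for a linear relation between predicted initiation rates and measured expression levels is the Pearson correlation coefficient. The sketch below is an illustration of that statistic and is not necessarily the analysis behind Figure 5b. The two input lists are hypothetical placeholders, not the values from Table 3 or Figure 5.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical placeholder data (NOT from Table 3 or Figure 5):
# predicted initiation rates and relative expression levels for four variants.
predicted_rates = [1.0, 2.0, 3.0, 4.0]
relative_expression = [1.0, -1.0, -1.0, 1.0]
r = pearson_r(predicted_rates, relative_expression)
```

A coefficient near 0 indicates no linear relation, while values near +1 or −1 indicate a strong one.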

Conclusions
The fusion of tripeptide ng3 (HGN) to the N terminus of the NEXT tag was effective at improving the expression level without affecting the inherent properties of NEXT-taCA, while the previously developed sequences, such as TEE and SKIK, were ineffective. The amino acid sequence of ng3, and not the specific codon combination that would affect the translation initiation rate, appeared to be a primary determinant of the positive effect of ng3 addition on the protein expression level. In addition to the thermostable taCA that can be used for CO2 capture and utilization, we expect that the production yields of other industrially important proteins might be improved by using the ng3-NEXT tag in E. coli.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author.