Technical Data of Heterologous Expression and Purification of SARS-CoV-2 Proteases Using Escherichia coli System

SARS-CoV-2 expresses two essential proteases: the 3-chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), and the papain-like protease (PLpro). Both are considered viable drug targets for the inhibition of viral replication. Drug discovery assays for SARS-CoV-2 therefore require efficient methods for producing and purifying 3CLpro and PLpro of SARS-CoV-2, designated 3CLpro-CoV2 and PLpro-CoV2, respectively. This article presents the data collected in attempts to express the SARS-CoV-2 proteases under different conditions and to purify them by single-step chromatography. The data showed that the E. coli BL21(DE3) strain was sufficient to express 3CLpro-CoV2 in a fully soluble form. Nevertheless, the single affinity chromatography step was only applicable to 3CLpro-CoV2 expressed at 18 °C, with a yield of 92% and a purification fold of 49. Meanwhile, PLpro-CoV2 was successfully expressed in a fully soluble form in either the BL21(DE3) or the BL21-CodonPlus(DE3) strain. However, the single affinity chromatography step was only applicable to PLpro-CoV2 expressed in E. coli BL21-CodonPlus(DE3) at 18 or 37 °C, with yields of 86% (18 °C) and 83.36% (37 °C) and purification folds of 112 (18 °C) and 71 (37 °C). The findings provide a guide for optimizing the production of SARS-CoV-2 proteases in E. coli host cells.


Summary
Since its emergence in December 2019, a new coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global health issue [1–3]. The virus causes coronavirus disease 2019 (COVID-19) [1,2], with more than 4 million deaths reported worldwide at the time of writing [3]. The rapid and wide spread of SARS-CoV-2 has prompted a rush to find promising targets for COVID-19 therapeutic development [4]. For this purpose, the SARS-CoV-2 proteases have received considerable attention because of their critical functions during viral replication [5,6]. SARS-CoV-2 expresses two proteases, the 3-chymotrypsin-like protease (3CLpro), also called the main protease (Mpro), and the papain-like protease (PLpro), which mediate proteolytic processing during viral replication in the host cells. This processing occurs after the virus produces an 800 kDa polypeptide by translating its genetic material inside the host cells. 3CLpro cleaves the polypeptide at 11 positions to produce various essential structural and non-structural viral proteins [7]. PLpro plays a similar role to 3CLpro but acts at different cleavage sites. In addition, PLpro deubiquitinates and deISGylates host-cell proteins by removing ubiquitin and ISG15, respectively. This further helps the virus evade the host innate immune response [8]. Because of their critical roles, the SARS-CoV-2 proteases are viable therapeutic targets for COVID-19 treatments [9–12]. The three-dimensional structures of 3CLpro and PLpro of SARS-CoV-2 (designated 3CLpro-CoV2 and PLpro-CoV2, respectively) have been reported and are available at the Protein Data Bank (Figure 1), which makes it possible to identify promising druggable sites on these proteins. PLpro-CoV2 has a structure that resembles a right hand and comprises four distinct domains: the thumb and palm (in which the catalytic triad is situated), the fingers (which include the Zn-binding sites), and an independent N-terminal domain termed the Ubl domain [13]. Meanwhile, 3CLpro-CoV2 forms a dimer in which each protomer comprises three structural domains. The first two domains form a chymotrypsin-like fold that is responsible for catalysis (where the catalytic dyad is situated), and the third domain is responsible for enzyme dimerization [14].
Data 2021, 6, 99
Most drug discovery initiatives require the production of recombinant proteins, which is an indispensable step in the process. In particular, the screening step of the drug discovery pipeline requires the target protein in sufficient amount and quality. This is also a prerequisite for many fundamental (structural and functional) studies of the target protein, which in turn serve as a platform for drug design and development. However, the production of a target protein through recombinant technology is often challenged by issues with the expression level and the purification process [15]. Ideally, the production process should keep labor and costs to a minimum. Notably, obtaining a highly pure target protein through recombinant technology involves multiple and lengthy steps. These include, but are not limited to, gene cloning, transformation into the host cells, induction of expression, harvesting and lysis of the host cells, removal of the cell debris, and finally purification through one or more chromatography steps followed by determination of purity and yield [16]. Optimization at each step along this pipeline is therefore needed to obtain the most efficient production of the target protein.
Escherichia coli is a widely accepted host organism for producing various recombinant proteins because of its technical convenience and low cost. Nevertheless, the expression of foreign proteins in microbial systems, including E. coli, is challenging since these proteins place a metabolic load on the host. Producing large amounts of soluble heterologous protein is particularly difficult because foreign proteins have a tendency to misfold and form insoluble inclusion bodies in the host cytoplasm [17,18]. The production and purification of recombinant 3CLpro-CoV2 and PLpro-CoV2 have been widely reported; however, the majority of the reports used different techniques. As a result, it is difficult to draw firm conclusions about the most efficient strategy to employ in future screening and studies. So far, no study has detailed the technical data on the production and purification of these proteins. This paper, therefore, offers a comprehensive description of the data obtained from multiple expression conditions of the SARS-CoV-2 proteases (Table 1). The rest of the paper is arranged as follows: the data are described in Section 2, and the materials and methods used in this research are presented in Section 3.

Subject
Biological Sciences

Specific Subject Area
Biotechnology and biochemistry

Type of Data
Table, Figure

How Data Were Acquired
The expression system for 3CLpro-CoV2 or PLpro-CoV2 was transformed into E. coli BL21(DE3) or E. coli BL21-CodonPlus(DE3) strains. The expression of both proteases was obtained by isopropyl β-D-1-thiogalactopyranoside (IPTG) induction. The expression of target proteins was analysed using SDS-PAGE and observed using a Gel Doc XR+ imager (Bio-Rad, CA, USA). Purification profiles of both proteases from the selected conditions were obtained through purification by single-step Ni²⁺-NTA affinity chromatography, followed by quantification of protein amount and enzymatic activity.

Data Format
Raw (purification table), analyzed

Parameters for Data Collection
Concentration of IPTG for protein expression induction (mM); optical density at 600 nm (OD600); incubation temperature of protein expression (°C); incubation time of protein expression (h); volume of sample (mL); amount of protein (mg); total activity (U); specific activity (U/mg); yield (%) and purification fold.

Description of Data Collection
The data were collected along the production and purification workflow of 3CLpro-CoV2 and PLpro-CoV2 in two steps. The first step involved the over-expression of 3CLpro-CoV2 and PLpro-CoV2 in E. coli host cells under several conditions. The data collected included the expression and solubility observed by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE). The second step involved the purification of the proteins using Ni²⁺-NTA affinity chromatography. The purification performance data were collected as the amount of protein, activity, yield and purification fold.

Data Source Location
All experiments and data collection were performed at the Biotechnology Research Institute, Universiti Malaysia Sabah, Kota Kinabalu, Sabah, Malaysia.

Data Accessibility
With the article.

Data Description
The expression constructs for 3CLpro-CoV2 and PLpro-CoV2 comprised the genes encoding 3CLpro (Ser1-Gln306) and PLpro (Met1-Glu320), which were inserted into the pD451-SR and pET21a expression plasmids, respectively. Under this system, 3CLpro-CoV2 was expressed as a fusion with maltose-binding protein (MBP) at the N-terminus and a 6His-tag at the C-terminus, with a total theoretical size of 79 kDa. A linker sequence (LINGDGAGLEVLSAVLQ) located between MBP and 3CLpro-CoV2 also serves as the autocleavage site. The expressed 3CLpro-CoV2 was therefore expected to have a free N-terminus, with no MBP or linker fragment, owing to autocleavage during expression. Meanwhile, PLpro-CoV2 was expressed as a fusion with a 6His-tag at its C-terminus, with a theoretical size of 37 kDa. The primary structures of 3CLpro-CoV2 and PLpro-CoV2 are shown in Figure 2.
Figure 3 shows the SDS-PAGE expression profile of 3CLpro-CoV2 in E. coli BL21(DE3) cells under conditions 1 and 2. It shows that 3CLpro-CoV2 was expressed after IPTG induction under both conditions, as indicated by the band corresponding to 3CLpro-CoV2 in the induced lanes in comparison with the lane without IPTG. The apparent size of 3CLpro-CoV2 on the gel (Figure 3) was ~35 kDa, which is lower than the theoretical size of the fusion (79 kDa). This is due to the autocleavage of MBP (~44 kDa) by 3CLpro-CoV2 during expression.
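The size argument above is simple bookkeeping, and a minimal Python sketch can make it explicit. The two masses are the values quoted in the text; the function name is hypothetical, and treating the released ~44 kDa fragment as a single piece is an assumption made only for illustration.

```python
# Rough mass bookkeeping for the MBP-3CLpro-CoV2 fusion (values in kDa, quoted in the text).
FUSION_KDA = 79.0  # theoretical size of the full-length fusion
MBP_KDA = 44.0     # approximate size of the MBP fragment released by autocleavage


def mass_after_autocleavage(fusion_kda: float, released_kda: float) -> float:
    """Return the expected mass of the product left after a fragment is cleaved off."""
    if released_kda >= fusion_kda:
        raise ValueError("released fragment cannot be as large as the whole fusion")
    return fusion_kda - released_kda


expected_kda = mass_after_autocleavage(FUSION_KDA, MBP_KDA)
print(f"Expected 3CLpro-CoV2 band: ~{expected_kda:.0f} kDa")
```

The result agrees with the ~35 kDa band described for Figure 3.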

Table 2 (fragment): condition 7, E. coli BL21-CodonPlus(DE3), expressed in soluble f…

Only the soluble fractions were then taken forward to purification by Ni²⁺-NTA affinity chromatography to obtain pure proteins. Figure 5 shows the purified 3CLpro-CoV2 from expression conditions 1 and 2 after Ni²⁺-NTA chromatography. Protein contaminants in Figure 5b were essentially undetectable on the gel, whereas contaminants remained visible after the purification of 3CLpro-CoV2 expressed under condition 1 (Figure 5a). This may be because more native E. coli host proteins were expressed at 37 °C (condition 1) than at 18 °C (condition 2). The 3CLpro-CoV2 expressed under condition 1 was therefore considered to require further purification steps to reach a high purity level.

Figure 6 shows the purified PLpro-CoV2 from expression conditions 5, 6, 9 and 10 after Ni²⁺-NTA chromatography. Only conditions 9 and 10 resulted in undetectable contaminants after Ni²⁺-NTA chromatography (Figure 6c,d). In contrast, a considerable amount of protein contaminants remained visible after the purification of PLpro-CoV2 expressed under conditions 5 and 6 (Figure 6a,b). This may be because the E. coli BL21-CodonPlus(DE3) strain expressed less protein under the given conditions. This result also suggests that a low IPTG concentration (0.1 mM) with no ZnSO4 may be sufficient to produce PLpro-CoV2 in a fully soluble form and at high purity by single-step chromatography.

The purification profiles of 3CLpro-CoV2 and PLpro-CoV2 were calculated (Table 3). This calculation was only done for the proteins that could be purified by single-step Ni²⁺-NTA chromatography. Table 3 shows that under condition 2, with a single purification step, 3CLpro-CoV2 was obtained at a purification fold close to 50, with a specific activity of 1.04 U/mg. Meanwhile, PLpro-CoV2 from condition 10 was purified better than that from condition 9, with a purification fold of more than 100 and a specific activity comparable to that of condition 9. Notably, the measurable specific activity of these proteases (Table 3) indicated that the purified 3CLpro-CoV2 and PLpro-CoV2 were enzymatically active. Qualitatively, the active forms of both proteases were also demonstrated by changes in the assay cocktail, in which a yellow color formed in the presence of 3CLpro-CoV2 or PLpro-CoV2 (Figure 7). The yellow color arises from the release of the pNA moiety when the substrate is cleaved by the active protease.

Table 3. Purification profile of SARS-CoV-2 proteases.

Expression and Purification of 3CLpro-CoV2
The expression system of 3CLpro-CoV2 was obtained from Andrey Kovalevsky (Oak Ridge National Laboratory, Oak Ridge, TN, USA) as described in Kneller et al. [14]. In this system, the gene encoding 3CLpro-CoV2 was inserted into the pD451-SR plasmid, resulting in the expression system pD451-3CLpro. The pD451-3CLpro was transformed into E. coli strain BL21(DE3) [19]. The positive transformants were selected and cultured overnight in Luria Bertani (LB) medium containing 35 µg/mL kanamycin at 37 °C and 180 rpm. Approximately 2% of the bacterial suspension was transferred into a larger culture volume of LB medium containing the antibiotic and incubated at 37 °C and 180 rpm. Once the OD600 reached 0.8, protein expression was induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and the culture was incubated under two different conditions (refer to Table 4).
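The inoculation and induction steps above reduce to two dilution calculations. The following Python sketch is illustrative only: the culture volume and the IPTG stock concentration are hypothetical values that are not given in the text, and the function names are made up for this example.

```python
def inoculum_volume_ml(culture_ml: float, inoculum_fraction: float = 0.02) -> float:
    """Volume of overnight culture to transfer (about 2% v/v, as stated in the text)."""
    return culture_ml * inoculum_fraction


def stock_volume_ml(final_mm: float, culture_ml: float, stock_mm: float) -> float:
    """C1*V1 = C2*V2, ignoring the small volume contributed by the stock itself."""
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    return final_mm * culture_ml / stock_mm


def ready_to_induce(od600: float, threshold: float = 0.8) -> bool:
    """Induction is triggered once the culture reaches the OD600 given in the text."""
    return od600 >= threshold


CULTURE_ML = 1000.0     # hypothetical culture volume (not stated in the text)
IPTG_STOCK_MM = 1000.0  # hypothetical 1 M IPTG stock (not stated in the text)

print(f"Inoculum: {inoculum_volume_ml(CULTURE_ML):.1f} mL of overnight culture")
print(f"IPTG stock for 0.5 mM: {stock_volume_ml(0.5, CULTURE_ML, IPTG_STOCK_MM):.2f} mL")
print(f"Induce at OD600 = 0.65? {ready_to_induce(0.65)}")
```

Only the 2% inoculum, the 0.5 mM IPTG concentration and the OD600 threshold of 0.8 come from the protocol; the other numbers are there to make the arithmetic concrete.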

Expression and Purification of PLpro-CoV2
The expression system of PLpro-CoV2 was obtained from Prof. Shaun K. Olsen (University of South Carolina, USA) as described in Rut et al. [9]. Under this system, the gene encoding PLpro-CoV2 was inserted into the pET21a plasmid, resulting in the expression system pET21-PLpro. The pET21-PLpro was transformed into two E. coli strains, BL21(DE3) and BL21-CodonPlus(DE3). The positive transformants were selected and cultured overnight in LB medium supplemented with the respective antibiotics (100 µg/mL ampicillin for E. coli BL21(DE3); 100 µg/mL ampicillin with 25 µg/mL chloramphenicol for E. coli BL21-CodonPlus(DE3)) at 37 °C and 180 rpm. As for 3CLpro-CoV2, 2% of the bacterial suspension was transferred into a larger culture volume of LB medium containing the antibiotics and incubated at 37 °C and 180 rpm. The expression of recombinant PLpro-CoV2 was conducted under eight different conditions (refer to Table 4).

Cell Harvesting
The cells were harvested by centrifugation at 8000× g and 4 °C for 10 min and then washed to remove the remaining medium completely. The washed cells were suspended in lysis buffer and sonicated on ice. The cell debris from the sonication was removed by centrifugation at 35,000× g (Beckman Optima L-100K, Brea, CA, USA) for 30 min at 4 °C. The supernatant (soluble fraction) was then collected and used for the purification steps [20].
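The centrifugation steps above are specified as relative centrifugal force (× g), whereas many instruments are set in rpm. The Python sketch below uses the standard conversion RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The rotor radius is a hypothetical value, since the text does not name the rotors, and the function names are illustrative.

```python
import math

RCF_CONSTANT = 1.118e-5  # converts radius (cm) x rpm^2 into multiples of g


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius_cm positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


HYPOTHETICAL_RADIUS_CM = 10.0  # assumption: the rotor radius is not given in the text

for label, rcf in (("cell harvest", 8000.0), ("lysate clarification", 35000.0)):
    rpm = rpm_from_rcf(rcf, HYPOTHETICAL_RADIUS_CM)
    print(f"{label}: {rcf:.0f} x g is about {rpm:.0f} rpm at r = {HYPOTHETICAL_RADIUS_CM} cm")
```

The rpm values depend entirely on the assumed radius, so the actual rotor specification should be substituted before use.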

Purification of Recombinant Proteins
All recombinant proteins were purified by Ni²⁺-NTA affinity chromatography using an ÄKTA Pure liquid-chromatography system (GE Healthcare, Chicago, IL, USA). All purifications were run at the same flow rate. The 5 mL HisTrap HP column (GE Healthcare, USA) was first equilibrated with lysis buffer (20 mM Tris-HCl pH 8.0, 40 mM imidazole, 150 mM NaCl, 1 mM DTT). Before loading onto the column, the soluble fractions were filtered through a 0.22 µm filter. The filtered sample was loaded onto the column at a flow rate of 1 mL/min. The bound proteins were eluted with a linear gradient of increasing imidazole concentration (0-500 mM) in 20 mM Tris-HCl pH 8.0, 150 mM NaCl, 1 mM DTT.
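To illustrate the linear elution gradient described above, the following Python sketch computes the nominal imidazole concentration at a given point of the gradient and the loading time at 1 mL/min. The gradient length and the sample volume are hypothetical, since neither is stated in the text, and the function names are illustrative.

```python
def imidazole_mm(volume_ml: float, gradient_ml: float,
                 start_mm: float = 0.0, end_mm: float = 500.0) -> float:
    """Nominal imidazole concentration after volume_ml of a linear gradient."""
    if gradient_ml <= 0:
        raise ValueError("gradient_ml must be positive")
    # Clamp so that volumes before or after the gradient return the end points.
    fraction = min(max(volume_ml / gradient_ml, 0.0), 1.0)
    return start_mm + (end_mm - start_mm) * fraction


def loading_time_min(sample_ml: float, flow_ml_per_min: float = 1.0) -> float:
    """Time needed to load a sample at the stated flow rate (1 mL/min in the text)."""
    if flow_ml_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return sample_ml / flow_ml_per_min


GRADIENT_ML = 100.0  # hypothetical gradient length (not stated in the text)
SAMPLE_ML = 30.0     # hypothetical sample volume (not stated in the text)

for v in (0.0, 25.0, 50.0, 100.0):
    print(f"{v:5.1f} mL into the gradient: {imidazole_mm(v, GRADIENT_ML):.0f} mM imidazole")
print(f"Loading {SAMPLE_ML:.0f} mL takes {loading_time_min(SAMPLE_ML):.0f} min")
```

The concentration is the programmed value at the mixer; the concentration actually seen at the column outlet lags behind by the system and column volumes.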

SDS-PAGE
The expression, solubility and purity of the target proteins were confirmed using 15% SDS-PAGE [21]. The gel was stained with Coomassie stain and visualized using a Gel Doc XR+ imager (Bio-Rad, Hercules, CA, USA).

Purification Profiles
The protein concentration was determined at 280 nm using a NanoDrop 1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). For this purpose, extinction coefficients (ε) at 0.1% (1 mg/mL) of 1.65 and 0.70 were used for 3CLpro-CoV2 and PLpro-CoV2, respectively, which were calculated using ε = 1576 M⁻¹ cm⁻¹ for Tyr and 5225 M⁻¹ cm⁻¹ for Trp at 280 nm [22]. The total activity was calculated based on formula (1):

$$\text{Total activity (U)} = \text{Protein concentration}\left(\frac{\text{mg}}{\text{mL}}\right) \times \text{volume (mL)} \tag{1}$$

To obtain the unit activity, the activity of 3CLpro-CoV2 and PLpro-CoV2 was measured using Z-TSAVLQ-pNA and L-Pyroglutamyl-L-phenylalanyl-Leucine-pNA substrates, respectively, according to Cheng et al. [23] and Bala et al. [24]. One unit of activity is defined as the amount of enzyme required to produce 1 µmol of product in 1 min of reaction time. The specific activity was then calculated based on formula (2):

$$\text{Specific activity} = \frac{\text{Total units of desired protein}}{\text{mg of total protein}} \tag{2}$$

The purification yield was calculated based on formula (3):

$$\text{Yield (\%)} = \frac{\text{Total activity of the respective step (U)}}{\text{Initial total activity (U)}} \times 100 \tag{3}$$

Meanwhile, the purification (fold) was calculated based on formula (4):

$$\text{Purification (fold)} = \frac{\text{Specific activity of the respective step}\left(\frac{\text{U}}{\text{mg}}\right)}{\text{Initial specific activity}\left(\frac{\text{U}}{\text{mg}}\right)} \tag{4}$$
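As an illustration of how formulas (2) to (4) combine into a purification table, a minimal Python sketch follows. The class, the function names and all numerical inputs are hypothetical and are not taken from Table 3. The A280 conversion assumes a 1 cm path length and uses the 0.1% extinction coefficient of 1.65 quoted above for 3CLpro-CoV2.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One row of a purification table."""
    name: str
    total_protein_mg: float
    total_activity_u: float

    @property
    def specific_activity(self) -> float:
        """Formula (2): units of activity per mg of total protein."""
        return self.total_activity_u / self.total_protein_mg


def concentration_mg_per_ml(a280: float, ext_coeff_01pct: float, path_cm: float = 1.0) -> float:
    """Protein concentration from A280 using the 0.1% (1 mg/mL) extinction coefficient."""
    return a280 / (ext_coeff_01pct * path_cm)


def yield_percent(step: Step, initial: Step) -> float:
    """Formula (3): activity recovered relative to the starting material."""
    return step.total_activity_u / initial.total_activity_u * 100.0


def purification_fold(step: Step, initial: Step) -> float:
    """Formula (4): ratio of specific activities."""
    return step.specific_activity / initial.specific_activity


# Made-up example values, for illustration only (not the data in Table 3).
crude = Step("crude lysate", total_protein_mg=200.0, total_activity_u=10.0)
eluate = Step("Ni-NTA eluate", total_protein_mg=4.0, total_activity_u=8.0)

print(f"Concentration at A280 = 0.825: {concentration_mg_per_ml(0.825, 1.65):.2f} mg/mL")
print(f"Specific activity: {eluate.specific_activity:.2f} U/mg")
print(f"Yield: {yield_percent(eluate, crude):.1f} %")
print(f"Purification: {purification_fold(eluate, crude):.1f}-fold")
```

Taking total protein and total activity as direct inputs keeps the sketch independent of how each quantity was measured.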

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available at https://doi.org/10.5281/zenodo.5503693.