High Throughput Expression Screening of Arabinofuranosyltransferases from Mycobacteria

Studies on membrane proteins can help to develop new drug targets and treatments for a variety of diseases. However, membrane proteins remain among the most challenging targets in structural biology. This uphill endeavor can be even harder for membrane proteins from Mycobacterium species, which are notoriously difficult to express in heterologous systems. Arabinofuranosyltransferases are involved in mycobacterial cell wall synthesis and are thus potential targets for antituberculosis drugs. A set of 96 mycobacterial genes coding for arabinofuranosyltransferases was selected, of which 17 were successfully expressed in E. coli and purified by metal-affinity chromatography. We herein present an efficient high-throughput strategy to screen a large number of mycobacterial targets in microplates and to select the best conditions for large-scale protein production for functional and structural studies. This methodology can be applied to other targets, is cost- and time-effective, and can be implemented in common laboratories.


Introduction
Membrane proteins represent 20 to 30% of the open reading frames of all sequenced genomes [1,2] and perform essential cellular functions, such as transport, signal transduction and energy production [3]. They also play important roles in several diseases and, as a result, are attractive therapeutic targets, estimated to be the targets of more than 30% of all marketed drugs [4][5][6]. However, the biochemical and structural characterization of membrane proteins faces several bottlenecks, namely toxicity caused by excess mRNA of the target protein [7], toxicity caused by heterologous expression [8], membrane lipid composition [9,10], and detergent extraction and solubility [11,12], which ultimately result in low yields of membrane protein.
Many efforts have been devoted to the development of protocols to efficiently produce membrane proteins in Escherichia coli. An elegant approach to accelerate this process involves fusion to green fluorescent protein (GFP) to monitor the expression and purification processes [13,14]. A commonly used strategy consists of varying several parameters simultaneously, such as expression vectors with different tags and promoters, host strains, homologues or solubilizing detergents [15,16]. High-throughput (HTP) protein …
Processes 2021, 9, 629
… 1% (w/v) final concentration, and the plate containing the samples was incubated for 2 h at 4 °C with gentle agitation. To separate the insoluble cell debris, plates were centrifuged at 3200 × g for 20 min at 4 °C, and 250 µL of each supernatant was transferred to a 96-well filter plate containing a bed of 50 µL Ni-NTA agarose resin (HisPur™ Ni-NTA Spin Plate, Thermo Scientific™, Waltham, MA, USA), previously washed with double-distilled water and equilibrated with buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 10 mM imidazole, 0.1% DDM). Imidazole was added to each sample to a final concentration of 10 mM to avoid nonspecific binding of contaminants to the Ni-NTA resin. Plates were incubated for 15 min on a plate shaker at 4 °C. The plates were then centrifuged, and the flow-through fractions were collected and reloaded onto the resin bed for a second 15 min incubation. After the second incubation, plates were centrifuged to remove unbound proteins. The resin was washed three times with 250 µL of washing buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 60 mM imidazole, 0.1% DDM), and proteins were eluted with 250 µL of elution buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 300 mM imidazole, 0.05% DDM).
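Adding imidazole from a concentrated stock slightly dilutes each sample, so the volume to add follows from a simple mass balance. The sketch below assumes a hypothetical 1 M imidazole stock and an imidazole-free lysate; the stock concentration actually used is not stated in the text.

```python
def spike_volume_ul(sample_ul: float, final_mm: float, stock_mm: float,
                    initial_mm: float = 0.0) -> float:
    """Volume of stock (µL) that brings a sample to final_mm.

    Mass balance: initial*V + stock*v = final*(V + v)
    =>            v = V * (final - initial) / (stock - final)
    """
    if not stock_mm > final_mm >= initial_mm:
        raise ValueError("require stock > final >= initial concentration")
    return sample_ul * (final_mm - initial_mm) / (stock_mm - final_mm)


# Assumptions: 1 M (1000 mM) stock, imidazole-free lysate, 250 µL per well.
per_well = spike_volume_ul(250.0, 10.0, 1000.0)  # about 2.5 µL
print(f"{per_well:.2f} µL per well; {96 * per_well:.0f} µL for a full 96-well plate")
```

With a 1 M stock the dilution correction is only about 1%, so rounding to 2.5 µL per well is harmless; the formula matters more when spiking from dilute stocks.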
Eluted samples were analyzed by SDS-PAGE: 10% polyacrylamide gels were used for targets with molecular weights of 73-149 kDa (AftD, EmbA, EmbB and EmbC) and 12% polyacrylamide gels for targets of 47-75 kDa (AftA, AftB and AftC).

Large Scale Protein Expression and Purification
Starter cultures (50 mL) of each target in E. coli C41 cells were grown overnight in 250 mL flasks at 37 °C and 200 rpm in LB medium supplemented with 100 µg/mL ampicillin. The overnight cultures were used to inoculate 4 × 500 mL of 2xYT medium supplemented with 100 µg/mL ampicillin in 2.5 L Thomson's Ultra Yield™ Flasks (Oceanside, CA, USA) to an initial OD600 of approximately 0.05. Cells were grown at 37 °C and 200 rpm until the cultures reached an OD600 of 0.8 (2 to 2.5 h); cultures were then cooled to 22 °C, and gene expression was induced overnight (~16 h) with 0.25 mM IPTG. Cells were harvested the next day by centrifugation at 4472 × g for 15 min at 4 °C. OD600 was measured with an Ultrospec 10 Cell Density Meter. Cell pellets were resuspended and homogenized in lysis buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 20 mM MgSO4, 1 mM TCEP) supplemented with EDTA-free protease inhibitor cocktail (Thermo Scientific™, Waltham, MA, USA; catalog number 88266) and 25 U/mL Benzonase nuclease (Santa Cruz Biotechnology, Dallas, TX, USA; catalog number sc-391121). The cell suspension was passed twice through a cell disruptor at 15,000 psi (Constant Systems Ltd., Daventry, UK). Membranes were collected by ultracentrifugation at 197,215 × g for 30 min at 4 °C and manually homogenized with a Wheaton® glass homogenizer (DWK Life Sciences Limited, Stoke-on-Trent, UK) in 20 mM HEPES pH 7.5, 200 mM NaCl, to which DDM was added to a final concentration of 1% (w/v). Membranes were solubilized for 2 h at 4 °C with gentle agitation. The solubilized membrane fraction was collected by ultracentrifugation at 203,756 × g for 30 min at 4 °C. The supernatants were incubated with 2 mL of equilibrated Ni-NTA agarose resin for 1.5 h at 4 °C with gentle agitation; imidazole was added to each sample to a final concentration of 10 mM to prevent nonspecific binding of contaminants. After incubation, the sample was loaded onto a column for gravity-flow elution.
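Two quantities implicit in this protocol can be checked with simple arithmetic: the volume of overnight culture needed to start each 500 mL flask at OD600 ≈ 0.05, and the apparent doubling time implied by reaching OD600 0.8 in 2 to 2.5 h. The overnight OD600 of 4.0 below is a hypothetical value, and the doubling-time estimate assumes purely exponential growth with no lag phase, so it is an upper bound (any lag would imply faster growth).

```python
import math


def inoculum_volume_ml(medium_ml: float, od_target: float, od_overnight: float) -> float:
    """Overnight-culture volume so that (medium + inoculum) starts at od_target.

    od_overnight * v = od_target * (medium + v)
    =>  v = medium * od_target / (od_overnight - od_target)
    """
    if od_overnight <= od_target:
        raise ValueError("overnight culture must be denser than the target OD")
    return medium_ml * od_target / (od_overnight - od_target)


def apparent_doubling_time_min(od_start: float, od_end: float, hours: float) -> float:
    """Doubling time assuming purely exponential growth between the two OD readings."""
    return hours * 60.0 / math.log2(od_end / od_start)


# Hypothetical overnight OD600 of 4.0 (not reported in the text).
v = inoculum_volume_ml(500.0, 0.05, 4.0)
print(f"inoculum per 500 mL flask: {v:.2f} mL")
for h in (2.0, 2.5):
    # 0.05 -> 0.8 is a 16-fold increase, i.e. 4 doublings
    print(f"{h} h -> ~{apparent_doubling_time_min(0.05, 0.8, h):.1f} min per doubling")
```

The result, roughly 30 to 38 min per doubling, is within the range typically seen for E. coli in rich medium at 37 °C.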
The resin bed was washed with 10 column volumes (CV) of washing buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 0.1% DDM, 60 mM imidazole), and the proteins were eluted with 4 CV of elution buffer (20 mM HEPES pH 7.5, 200 mM NaCl, 0.05% DDM, 300 mM imidazole). Eluted samples were concentrated and injected onto a Superdex 200 column (Cytiva Europe GmbH, Freiburg, Germany) to assess protein dispersity. The collected fractions were analyzed by SDS-PAGE on 10% polyacrylamide gels for the 73-149 kDa targets (AftD, EmbA, EmbB and EmbC) and on 12% polyacrylamide gels for the 47-75 kDa targets (AftA, AftB and AftC).

Genomic Expansion and High-Throughput Cloning of Arabinofuranosyltransferases
A set of 96 target genes was assembled from the genomic sequences coding for seven AraTs from Mtb (AftA, AftB, AftC, AftD, EmbA, EmbB and EmbC). Each "seed" sequence was expanded into a "cluster" of homologous sequences from 14 different Mycobacterium genomes, coding for proteins likely to have a structure similar to that of the seed protein [28] (Table A1). Ligation-independent cloning (LIC) was performed as previously described by Bruni and Kloss [27]. Briefly, all sequences were amplified by PCR using genomic DNA available from ATCC® (Manassas, VA, USA) [https://www.lgcstandards-atcc.org (accessed on 26 March 2021)] and primer pairs compatible with the LIC-adapted expression vectors pNYCOMPS-N23 and pNYCOMPS-C23, which encode a decahistidine affinity tag and a Tobacco Etch Virus (TEV) protease cleavage site (ENLYFQS). Fifty-six targets were successfully cloned into pNYCOMPS-N23 and 40 into pNYCOMPS-C23. Previous screening experiments had shown no expression for any construct in the pNYCOMPS-C23 vector (data not shown); therefore, only the pNYCOMPS-N23 clones were used for the HTP expression screen.
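TEV protease recognizes the ENLYFQ(S/G) motif and cleaves between Q and the following residue, so the serine of ENLYFQS remains on the target after tag removal. The sketch below illustrates this on a hypothetical N-terminal tag (Met, ten histidines, a GS linker and the TEV site); it is not the actual pNYCOMPS-N23 tag sequence, and the target sequence is a placeholder.

```python
import re

# TEV site: E-N-L-Y-F-Q followed by S or G; cleavage occurs after the Q.
TEV_SITE = re.compile(r"ENLYFQ(?=[SG])")


def tev_cleave(protein: str) -> tuple[str, str]:
    """Split a sequence at the first TEV site into (tag_fragment, released_protein)."""
    m = TEV_SITE.search(protein)
    if m is None:
        raise ValueError("no TEV site (ENLYFQ followed by S/G) found")
    return protein[: m.end()], protein[m.end():]


# Hypothetical tag and placeholder target; NOT the real vector or AraT sequences.
tag = "M" + "H" * 10 + "GS" + "ENLYFQS"
target = "MAKLTG"
fragment, released = tev_cleave(tag + target)
print(fragment)  # His-tagged fragment ending in ...ENLYFQ
print(released)  # target with an extra N-terminal Ser: SMAKLTG
```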

Small Scale High-Throughput Expression of Arabinofuranosyltransferases
All 56 positive clones in pNYCOMPS-N23 were transformed into E. coli C41, C43 and BL21 (DE3) pLysS strains, and the resulting cultures were grown simultaneously in 24-well deep-well plates. Growth conditions (2xYT rich medium, induction with 0.25 mM IPTG and overnight post-induction expression at 22 °C) were chosen based on the results of previous experiments. This allowed a fast and reliable comparison between E. coli strains while leaving room for optimization after target selection.
Cell harvesting by centrifugation and cell lysis were performed in the 24-well deep-well plates, keeping downstream processing of the samples in HTP format. Membrane proteins were extracted by adding detergent directly to each well after cell lysis and incubating the plate at low temperature. The plate was then centrifuged again to clear the solubilized lysate of cell debris, and the solubilized lysate was transferred to a HisPur™ Ni-NTA Spin 96-well plate (Thermo Scientific™, Waltham, MA, USA) for affinity purification. In this step, adjustable-spacing multichannel pipettes were important for speed and reproducibility when transferring solutions from the 24-well plate to the 96-well plate; standard multichannel pipettes can also be used, albeit less efficiently. After a single Ni-NTA binding step, the amount of eluted target protein was too low to be detected by SDS-PAGE. Because the sample slowly flows through the filter plate by gravity during the incubation period, a second passage was deemed necessary to increase the contact time between the sample and the Ni-NTA resin, after which the eluted AraTs could be visualized on the gel. The full pipeline is summarized in Figure 1.
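The plate logistics of this screen follow from the numbers in the text: 56 clones in three host strains give 168 cultures, that is, seven 24-well plates, whose lysates fit on two 96-well filter plates. The well mapping below is a hypothetical interleaved layout (four 24-well plates per 96-well plate, each offset by one row and/or column), shown only to illustrate how a multichannel transfer can be tracked; the layout actually used is not described in the text.

```python
import math
import string

ROWS_24, COLS_24 = 4, 6   # 24-well plate: rows A-D, columns 1-6


def well_name(row: int, col: int) -> str:
    """Zero-based (row, col) -> plate coordinate such as 'A1'."""
    return f"{string.ascii_uppercase[row]}{col + 1}"


def map_24_to_96(plate24_index: int, row: int, col: int) -> tuple[int, str]:
    """Hypothetical layout: 24-well plate k fills offset k % 4 of 96-well plate k // 4."""
    if not (0 <= row < ROWS_24 and 0 <= col < COLS_24):
        raise ValueError("well outside a 4 x 6 plate")
    offset = plate24_index % 4
    plate96 = plate24_index // 4
    r96 = 2 * row + offset // 2   # every other row of the 8-row plate
    c96 = 2 * col + offset % 2    # every other column of the 12-column plate
    return plate96, well_name(r96, c96)


n_cultures = 56 * 3                       # clones x host strains
n_plates24 = math.ceil(n_cultures / 24)   # 24-well deep-well plates
n_plates96 = math.ceil(n_plates24 / 4)    # 96-well filter plates
print(n_plates24, n_plates96)
print(map_24_to_96(0, 0, 0), map_24_to_96(1, 0, 0), map_24_to_96(3, 3, 5))
```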
In total, 17 of the 96 distinct proteins were produced and purified, a success rate of 18% (Table 1). All three E. coli host strains produced target proteins: 16 in C41, 6 in C43 and 8 in BL21 (DE3) pLysS (Table A2); since these counts sum to more than 17, several targets were produced in more than one strain. AftB and EmbC proteins were not detected in any E. coli strain with this HTP method, suggesting that different, perhaps more tailored, conditions may be needed to produce them. A single His-tag purification step gave limited purity, as persistent host-cell contaminants were present across all targets (Figure 2; see Figures A1 and A2). Moreover, production yields for the target proteins studied here were low in all E. coli host strains. Nevertheless, we were still able to identify bands on SDS-PAGE that could correspond to our target proteins, based on their predicted molecular weight (MW) and taking into account the gel shifting of membrane proteins in denaturing gels [29]. Owing to this anomalous migration, membrane protein bands on SDS-PAGE most often appear ~20-30% below their predicted MW.
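Both numbers in this paragraph are easy to reproduce. The sketch computes the overall success rate and the apparent-MW window in which a membrane protein band would be expected if it runs 20 to 30% below its predicted MW; the 149 kDa example is simply the upper end of the size range quoted for the 10% gels, not a specific target.

```python
def success_rate(n_produced: int, n_targets: int) -> float:
    """Fraction of targets that were produced and purified."""
    return n_produced / n_targets


def apparent_mw_window(predicted_kda: float, shift_min: float = 0.20,
                       shift_max: float = 0.30) -> tuple[float, float]:
    """(low, high) apparent MW for a band running 20-30% below its predicted MW."""
    return predicted_kda * (1.0 - shift_max), predicted_kda * (1.0 - shift_min)


print(f"success rate: {success_rate(17, 96):.1%}")  # 17/96, reported as 18%
low, high = apparent_mw_window(149.0)
print(f"149 kDa target: look for a band at ~{low:.0f}-{high:.0f} kDa")
```

The shift varies from protein to protein, so the window is a guide for picking candidate bands rather than an identification.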

Validation of HTP Target Selection by Large-Scale Protein Production
Based on SDS-PAGE analysis, we selected one target from each cluster for large-scale production: AftA and AftC from M. neoaurum, AftD from M. abscessus 1948 F5/8, EmbA from M. marinum M and EmbB from M. vanbaalenii PYR-1, all produced in E. coli C41. Growth conditions were similar to those used in the HTP screen, although the cell lysis and membrane extraction steps were adapted to the larger cell mass. Most importantly, the incubation time of the solubilized membranes with Ni-NTA resin was increased to improve protein binding and purification yield. All chosen targets were successfully produced at large scale, validating the selection made from the HTP screen. Size exclusion chromatography (SEC) was performed after affinity chromatography to further purify each protein and as a tool for its preliminary biophysical characterization (Figure 3). Although all targets showed some aggregation in the presence of DDM, distinct protein populations could still be identified in most samples. SDS-PAGE analysis of the SEC elution fractions showed that the dominant bands correspond to the desired targets; however, contaminants were still present. AftA (Figure 3A), EmbA (Figure 3D) and EmbB (Figure 3E) showed the fewest contaminant proteins.
For EmbB, a second SEC step was performed (Figure 4), running each population separately. The high molecular weight EmbB population behaved as a stable, monodisperse population (Figure 4A,B), whereas the low molecular weight population split into the same two populations observed in the first SEC run (Figure 3E), suggesting that EmbB monomers exist in equilibrium with stable EmbB dimers.
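Assigning SEC peaks to monomer and dimer typically relies on a column calibration in which log10(MW) is approximately linear in the partition coefficient Kav = (Ve − V0)/(Vt − V0). The sketch below fits such a calibration by least squares and converts peak elution volumes to apparent MW. All volumes and standards are hypothetical, and for detergent-solubilized proteins the apparent MW includes the bound detergent micelle, so only relative comparisons (for example, whether two peaks differ by roughly a factor of two) are meaningful.

```python
import math


def kav(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient from elution (ve), void (v0) and total (vt) volumes."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards, v0, vt):
    """Least-squares fit of log10(MW) = a + b*Kav; standards = [(ve_ml, mw_kda), ...]."""
    xs = [kav(ve, v0, vt) for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def apparent_mw_kda(ve, a, b, v0, vt):
    """Apparent MW (kDa) for a peak at elution volume ve, using the fitted calibration."""
    return 10 ** (a + b * kav(ve, v0, vt))


# Hypothetical column volumes (mL) and calibration standards (ve_ml, mw_kda).
V0, VT = 8.0, 24.0
STANDARDS = [(10.0, 669.0), (11.5, 440.0), (13.5, 158.0), (15.0, 75.0), (16.0, 44.0)]
a, b = fit_calibration(STANDARDS, V0, VT)

# Hypothetical peak positions for a high- and a low-MW population.
mw_high = apparent_mw_kda(12.0, a, b, V0, VT)
mw_low = apparent_mw_kda(13.2, a, b, V0, VT)
print(f"~{mw_high:.0f} kDa vs ~{mw_low:.0f} kDa (ratio {mw_high / mw_low:.1f})")
```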

Discussion
The need to screen the expression of a large number of membrane protein targets, and to select optimal conditions for producing and purifying the desired targets, has led to the development of several HTP strategies. The strategy used in this study is not novel; it was intended to set up a protocol to search for the best candidates for functional and structural studies of AraTs from Mycobacteria. Embs are targets of ethambutol, whereas Afts are potential targets for new drugs to treat tuberculosis. More generally, the protocol described here can also be applied to evaluate the expression and purification of other membrane proteins.
The methodology involved the selection of 13-14 orthologous genes for each AraT subfamily (EmbA-C, AftA-D) from a variety of host genomes, cloning into vectors harboring a polyhistidine affinity tag at either the N- or C-terminus, transformation into three E. coli strains, membrane extraction and protein solubilization with DDM, and purification by Ni-NTA chromatography. With this simple combinatorial approach, we cloned 56 genes into pNYCOMPS-N23 and 40 into pNYCOMPS-C23, and produced 17 of the 96 chosen targets, a success rate of 18%. Such a rate is not surprising, given that membrane proteins are often difficult to express and purify [7,30]. Success rates for the heterologous expression of mycobacterial proteins in E. coli have previously been reported not to exceed 40% [30][31][32].
No expression was observed for AraTs cloned into pNYCOMPS-C23 (data not shown). The type and location of a fused affinity tag are well known to have a significant effect at all stages of protein production [33]; however, the impact of tag addition cannot be known a priori. This unpredictability is, in a sense, the foundation of HTP approaches: try as many conditions as is reasonably possible and assess what works before proceeding with further studies.
Regarding the host organism, M. smegmatis could be a viable alternative for the heterologous expression of mycobacterial proteins [31], yet we considered it unsuitable for an HTP approach because of its slower growth rate compared with E. coli and, above all, its waxy surface [34], which promotes clumping, film formation and adhesion of cells to surfaces, especially plastic, preventing optimal use of 96- or 24-well plates for cell growth. Instead, we used E. coli C41 and C43, which are suited for the overexpression of toxic and membrane proteins [35], and BL21 (DE3) pLysS for controlled expression [36]. The different expression levels observed among E. coli hosts suggest that the choice of strain plays a pivotal role in the number of well-expressed AraTs, as also reported for other target proteins [37]. Indeed, regulation of T7 RNA polymerase expression, either by mutations in its promoter (C41 and C43, the Walker strains) or by its natural inhibitor T7 lysozyme (T7Lys, pLysS strain), can significantly influence membrane protein overexpression yields [38].
Large-scale production (from 2 L of culture) of five targets, chosen based on the small-scale results, yielded purified proteins in milligram amounts, although the intensities of their respective SDS-PAGE bands differed between small- and large-scale experiments. The shorter incubation time with Ni-NTA agarose in the small-scale screen (2 × 15 min vs. 1.5 h) could account for this discrepancy. Moreover, the aeration rate, which depends on the size and shape of the growth vessel (24-well plate vs. 2.5 L flask), may also affect overexpression levels. In addition, switching from a nickel to a cobalt spin plate may increase the binding specificity for the target protein and thus further improve the small-scale screening results. We expect similar results for the scale-up production of other targets that showed expression in the HTP screen. Fusions with a GFP tag could be advantageous for monitoring the various steps of protein production by fluorescence, a very sensitive detection method [13]. However, this approach is not suitable for membrane proteins with a periplasmic C-terminus [13,20,39], which is the case for most of the AraTs studied here, so it was not considered.
Detergents are required to extract and purify target proteins, and their choice is a key parameter throughout the process. We chose DDM because it is a mild detergent and one of the most commonly used for this purpose [20,27,33,40]. The aggregation detected in the large-scale production experiments suggests that further detergent screens may be needed to select the best detergent for each individual target. We were able to separate two populations of EmbB from M. vanbaalenii PYR-1, likely corresponding to the monomer and the dimer, respectively. Despite the aggregation, all targets showed soluble populations in the SEC elution profiles, which represents a good starting point for optimization towards structural studies. Interestingly, the 3D structures of both oligomeric states of EmbB from M. smegmatis have already been determined by single-particle cryo-electron microscopy (cryo-EM) [41,42]. Notably, different detergents or solubilizing agents may be needed for structural studies: the cryo-EM structures of several AraTs, namely the EmbA-EmbB complex, M. tuberculosis EmbB [43], M. smegmatis EmbB [42] and AftD [44], have recently been determined using different solubilization agents, namely glyco-diosgenin (GDN) detergent, amphipols or nanodiscs. In contrast, the structure of EmbC solubilized in DDM has been determined by X-ray crystallography [43].
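Detergent concentrations in this protocol are given in % (w/v); converting them to molar units shows how far each buffer sits above the critical micelle concentration (CMC). The sketch uses the DDM molecular weight (510.62 g/mol) and an approximate CMC of 0.17 mM, a commonly quoted value in water; the effective CMC depends on buffer composition, so the multiples are only approximate.

```python
DDM_MW_G_PER_MOL = 510.62   # n-dodecyl-beta-D-maltoside
DDM_CMC_MM = 0.17           # approximate CMC in water (assumption; buffer-dependent)


def percent_wv_to_mm(percent_wv: float, mw_g_per_mol: float) -> float:
    """% (w/v) is g per 100 mL, i.e. 10 * percent g/L; divide by MW for mol/L, x1000 for mM."""
    return percent_wv * 10.0 / mw_g_per_mol * 1000.0


# DDM concentrations used in the protocol described above.
STEPS = [("solubilization", 1.0), ("binding/wash buffers", 0.1), ("elution buffer", 0.05)]
for step, pct in STEPS:
    mm = percent_wv_to_mm(pct, DDM_MW_G_PER_MOL)
    print(f"{step}: {pct}% DDM = {mm:.2f} mM, ~{mm / DDM_CMC_MM:.0f}x CMC")
```

Even the 0.05% elution buffer stays several-fold above this approximate CMC, which is consistent with keeping the protein in detergent micelles throughout purification.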
On the one hand, AftB and EmbC targets could not be produced with the HTP workflow under "standard" conditions. Other parameters must therefore be explored, such as growth medium, temperature, incubation time, detergent type, host strain or expression vector, which may lead to better success rates. On the other hand, AftA and AftC from M. neoaurum were expressed and are attractive targets for drug development [45][46][47] and structural elucidation, since their structures are not yet known.