Improved Production and Biophysical Analysis of Recombinant Silicatein-α

Silicatein-α is a hydrolase found in siliceous sea sponges with a unique ability to condense and hydrolyse silicon–oxygen bonds. The enzyme is thus of interest from the perspective of its unusual enzymology, and for potential applications in the sustainable synthesis of siloxane-containing compounds. However, research into this enzyme has previously been hindered by the tendency of silicatein-α towards aggregation and insolubility. Herein, we report the development of an improved method for the production of a trigger factor-silicatein fusion protein by switching the previous hexahistidine tag for a Strep-II tag, resulting in a 244-fold improvement in protein yield compared to previous methods. Light scattering and thermal denaturation analyses show that under the best storage conditions, although oligomerisation is never entirely abolished, these nanoscale aggregates of the Strep-tagged protein exhibit improved colloidal stability and solubility. Enzymatic assays show that the Strep-tagged protein retains catalytic competency, but exhibits lower activity compared to the His6-tagged protein. These results suggest that the hexahistidine tag is capable of non-specific catalysis through its imidazole side chains, highlighting the importance of careful consideration when selecting a purification tag. Overall, the Strep-tagged fusion protein reported here can be produced to a higher yield, exhibits greater stability, and allows the native catalytic properties of this protein to be assessed.


Introduction
The silicateins are a family of enzymes from marine demospongiae that are involved in biosilicification, the biogenic conversion of soluble silicates into inorganic silica [1,2]. Biochemically, the silicateins catalyse the hydrolysis and condensation of silicon–oxygen bonds, a capability unique to a small number of organisms [1,2]. Silicatein-α (Silα) is the most abundant of silicatein's three isoforms that form axial protein filaments within the silica spicules of Suberites domuncula [3]. Analysis of the protein's primary sequence shows a high degree of similarity to proteases of the cathepsin family [3,4]. One major difference between these two families of enzyme, however, is found in the catalytic triad. Both bear the classical Xaa-His-Asn motif, but whereas cathepsin L is a cysteine protease, with cysteine in the Xaa position, Silα bears serine instead.
For TF-Silα-Strep, a chemically synthesised codon-optimised DNA fragment encoding the fusion protein with a 5′ NdeI site and 3′ BamHI site was used as the starting point. The tf-silα-strep gene was digested with NdeI and BamHI and ligated into a pET11a vector. Successful molecular cloning of the completed pET11a-TF-Silα-Strep was confirmed by DNA sequencing. BL21(DE3) cells transformed with this vector were grown at 37 °C overnight in LB medium with 100 µg/mL ampicillin. The resulting culture (10 mL) was used to inoculate 800 mL of fresh LB medium, which in turn was grown at 37 °C to an optical density (OD600) of 0.6. IPTG was added to a final concentration of 1 mM and the culture shaken overnight at 16 °C. Cells were harvested by centrifugation (3500× g for 20 min at 4 °C), the medium discarded, and the cell pellet frozen at −20 °C prior to lysis. The frozen cell pellet was resuspended in the appropriate lysis buffer (Table 2). The resuspended cells were lysed by sonication at 4 °C for 8 cycles of 1 min pulse and 5 s rest at 50% amplitude. The lysate was centrifuged and analysed in the same manner as the His6-tagged proteins. The TF-Silα-Strep was isolated from the supernatant using 5 mL StrepTrap columns (with multiple columns connected in series where necessary for larger batches), with the protein eluted from the column using the lysis buffer supplemented with 1 mM EDTA and 2.5 mM desthiobiotin. In all cases, the concentration of the isolated protein was estimated by UV-vis absorbance at 280 nm using molar absorption coefficients computed by ProtParam [24,25], and the yield was quantified by multiplying the volume of solution by the concentration of the protein.
Biomolecules 2020, 10, 1209
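The yield calculation described above can be sketched as follows. This is a minimal illustration that assumes the Beer-Lambert law with a 1 cm path length; the absorbance, molar absorption coefficient, and pooled volume are hypothetical placeholders, not values from this work.

```python
def concentration_molar(a280, epsilon_m1_cm1, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon_m1_cm1 * path_cm)


def yield_nmol(conc_molar, volume_ml):
    """Amount of protein (nmol) = concentration (mol/L) x volume (L) x 1e9."""
    return conc_molar * (volume_ml / 1000.0) * 1e9


# Hypothetical example values (assumptions, not measured data)
A280 = 0.50                 # absorbance at 280 nm
EPSILON = 50_000.0          # M^-1 cm^-1, e.g. as computed by ProtParam
POOLED_VOLUME_ML = 5.0      # volume of pooled elution fractions

conc = concentration_molar(A280, EPSILON)
amount = yield_nmol(conc, POOLED_VOLUME_ML)
print(f"concentration = {conc * 1e6:.1f} uM, yield = {amount:.1f} nmol")
```

Dividing the resulting amount by the culture volume in litres gives the per-litre yields used for comparison later in the text.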

Circular Dichroism
All protein samples were exchanged into the relevant buffer (Table 3) by gel filtration using a PD-10 column, then diluted to 1 mg mL⁻¹ for analysis. Data were collected over a wavelength range of 192 to 260 nm at a scan speed of 0.5 nm s⁻¹. The temperature was maintained at 22 °C using a temperature-controlled chamber purged with N2. Baseline measurements were performed using buffer alone. The raw CD data were used to calculate molar ellipticity and mean residue molar ellipticity.
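The conversion from raw CD signal to molar and mean residue ellipticity can be sketched as below, using the standard relation [θ] = θ(mdeg) × M / (10 × l × c), with c in mg mL⁻¹ and l in cm. The molar mass, residue count, path length, and signal are hypothetical inputs chosen only for illustration; the path length used in this work is not stated here.

```python
def molar_ellipticity(theta_mdeg, molar_mass, conc_mg_ml, path_cm):
    """Molar ellipticity in deg cm^2 dmol^-1 from a signal in millidegrees."""
    return theta_mdeg * molar_mass / (10.0 * path_cm * conc_mg_ml)


def mean_residue_ellipticity(theta_mdeg, molar_mass, n_residues,
                             conc_mg_ml, path_cm):
    """Same conversion, using the mean residue weight M / (N - 1)."""
    mean_residue_weight = molar_mass / (n_residues - 1)
    return theta_mdeg * mean_residue_weight / (10.0 * path_cm * conc_mg_ml)


# Hypothetical inputs (assumptions for illustration only)
theta = -10.0        # baseline-subtracted signal at 222 nm, mdeg
mass = 75_000.0      # g/mol
residues = 676       # number of residues in the construct
conc = 1.0           # mg/mL, as used for the CD samples
path = 0.01          # cm, assumed cell path length

print(molar_ellipticity(theta, mass, conc, path))
print(mean_residue_ellipticity(theta, mass, residues, conc, path))
```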

Dynamic Light Scattering, Static Light Scattering and Intrinsic Fluorescence
All protein samples were exchanged into the relevant buffer by gel filtration using a PD-10 column, then diluted to 1 mg mL⁻¹ with the relevant buffer (Table 3). Each sample (8 µL) was loaded into the multi-micro cell array in triplicate. The data from individual experiments were inspected, any outliers discarded, and the remaining data averaged. All three samples were analysed within 2 h of buffer exchange. The temperature was increased from 18 to 90 °C at a rate of 2 °C min⁻¹. The SLS and fluorescence measurements were recorded every 30 s, while the DLS measurement was recorded at 20 °C.
DLS measurements (size distribution and polydispersity) were calculated with the UNcle software (version 2.0) correlation function. The hydrodynamic diameter (dh) amplitude maxima were then identified in R using the stat_peaks function from the ggpmisc package [26], and the dh for each protein calculated as the mean dh at maximum amplitude ± standard deviation across the three pH conditions. The estimated theoretical hydrodynamic diameters of folded proteins were calculated using the following equation [27]:
$$d_h = 2 \times 0.475\,N^{0.29}$$
where dh is the hydrodynamic diameter in nm, and N is the number of residues in the polypeptide chain. For expected hydrodynamic diameters of unfolded proteins, the following equation was used [27]:
$$d_h = 2 \times 0.221\,N^{0.57}$$
For SLS, the intensity of scattered light was measured at 266 nm and was used to calculate the aggregation temperature (Tagg) with UNcle software (version 2.0). The intrinsic fluorescence emission was measured at 473 nm and the first derivative of the barycentric mean (BCM) was used to calculate the melting temperature (Tm) using the same software.
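As a rough illustration of the size estimates, the sketch below assumes the empirical scaling relations of Wilkins et al. [27] for the hydrodynamic radius (4.75 N^0.29 Å for folded and 2.21 N^0.57 Å for unfolded chains), doubled and converted to nm to give a diameter. The chain length used is a hypothetical round number, not the exact length of either construct.

```python
def dh_folded_nm(n_residues):
    """Estimated hydrodynamic diameter (nm) of a folded, globular chain."""
    return 2.0 * 0.475 * n_residues ** 0.29


def dh_unfolded_nm(n_residues):
    """Estimated hydrodynamic diameter (nm) of a fully unfolded chain."""
    return 2.0 * 0.221 * n_residues ** 0.57


# Hypothetical chain length (assumption for illustration only)
N = 750
print(f"folded:   {dh_folded_nm(N):.1f} nm")
print(f"unfolded: {dh_unfolded_nm(N):.1f} nm")
```

For chains of several hundred residues these relations give diameters of a few nanometres when folded and roughly 20 nm when unfolded, which is the scale against which the measured DLS values are compared in the Results.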

Enzyme Activity Assays
Tetraethoxysilane (TEOS) assays were undertaken following the procedure published previously [15]. The TBDMS-ONp assays were based on the previously published procedure, but modified to compensate for the lower activity of the Strep-tagged protein.
In this case, a 1 mM stock of TBDMS-ONp was prepared in aqueous solution with 10% v/v 1,4-dioxane. A series of 2 mL working solutions at 10-fold the concentration of substrate required for each enzyme assay was then made by diluting the stock solution with the appropriate volume of 10% v/v aqueous 1,4-dioxane. An aliquot (20 µL) of each working solution was transferred into each well of a microtitre plate and 80 µL of the assay buffer (50 mM Tris, 100 mM NaCl at pH 8.5) was added. Separately, the enzyme was buffer exchanged into the same assay buffer from the purification buffer by overnight dialysis and adjusted to a concentration of 13.42 µM. To initiate the reaction, 100 µL of this enzyme solution was added and mixed (i.e., final assay volume of 200 µL and final enzyme concentration of 6.71 µM). The spectrometric data collection was then commenced and the UV-Vis absorbance at 405 nm was recorded every 5 min at ambient temperature (approximately 22 °C) for 1200 min with periodic shaking. Each time-course measurement was carried out in triplicate.
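The dilution arithmetic behind the stated final concentrations can be checked with a short sketch. The volumes and the enzyme concentration are taken from the protocol above; using the undiluted 1 mM stock as the most concentrated working solution is an assumption made only for the example.

```python
def diluted(conc, volume_added_ul, final_volume_ul):
    """Concentration after diluting an aliquot into the final assay volume."""
    return conc * volume_added_ul / final_volume_ul


# Volumes from the protocol: substrate + assay buffer + enzyme solution
FINAL_VOLUME_UL = 20.0 + 80.0 + 100.0

# Enzyme: 100 uL of a 13.42 uM solution
enzyme_final_um = diluted(13.42, 100.0, FINAL_VOLUME_UL)

# Substrate: 20 uL of a working solution; 1000 uM is an assumed example
substrate_final_um = diluted(1000.0, 20.0, FINAL_VOLUME_UL)

print(f"enzyme: {enzyme_final_um:.2f} uM, substrate: {substrate_final_um:.0f} uM")
```

The 20 µL aliquot in a 200 µL assay is the 10-fold dilution that the working solutions are prepared to compensate for.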
The concentration of the nitrophenoxide ion released during the reaction was determined from a calibration curve of known concentrations of 4-nitrophenol in the same reaction buffer. Control reactions were carried out where the enzyme solution was omitted and replaced with an equivalent volume of the assay buffer (50 mM Tris, 100 mM NaCl at pH 8.5). The concentration of the phenoxide product from the control reactions was then subtracted from that of each enzymatic reaction to obtain the net product concentration generated by enzyme catalysis. Graphs of net phenoxide concentration against time were then plotted. The initial velocities (V0) were obtained by linear regression of the initial part of the curve (typically the first 50 min of the assay, which is assumed to approximate a linear increase). The V0/[E] was then plotted against the corresponding substrate concentration in each assay reaction and fitted to the Michaelis-Menten model of enzyme kinetics within the OriginPro 2019b (build 9.6.5.169) software to obtain the KM and Vmax values. The kcat values were calculated from the Vmax and the enzyme concentration, using the formula kcat = Vmax/[E].

Design of Recombinant Protein Constructs
Three separate protein constructs were designed and the genes synthesised for: (i) mature (without its pro-peptide) Silα from S. domuncula fused to an N-terminal hexahistidine tag (His6-Silα); (ii) mature Silα fused at the N-terminal to TF, itself fused at the N-terminal to a His6 tag (His6-TF-Silα); and (iii) mature wild type Silα fused at the N-terminal to TF and a C-terminal Strep II tag (TF-Silα-Strep) (Figure 1). The two His6-tagged proteins were the same as those in the previous report [15], with the TF-fusion already having been shown to exhibit superior performance in terms of practical handling and stability. The Strep II tag was chosen for investigation as it was known to be unaffected by the choice of buffer (as long as the pH remains above 7), and the presence of a wide range of additives such as metal ions and chelating agents [16,28,29]. In all cases, gene fusions were cloned into a synthetic vector under the control of the T7lac promoter, and transformed into E. coli BL21(DE3). The proteins of interest were then overproduced according to standard procedures.

Optimisation of Silα Purification
The His6-Silα was first investigated, as this protein most closely resembled the wild type enzyme, but it had the poorest solubility [15]. Here, its solubility during cell lysis and subsequent chromatographic isolation was assessed with a range of buffer formulations (Table 1). Any buffer formulation(s) that gave appreciable amounts of soluble protein upon lysis (as evidenced by SDS-PAGE analysis of the soluble lysate fraction) were then used for the subsequent affinity chromatography to isolate the protein.
In general, these buffer formulations were chosen to investigate the effect on protein solubility of small molecule additives and pH. The addition of non-denaturing detergents such as Triton-X and CHAPS was investigated as these have been used in earlier reports on the isolation of His6-Silα [15,30]. The addition of the amino acids L-arginine and L-glutamic acid was also investigated as they have been shown to reduce protein aggregation and to increase thermal stability and solubility, both by increasing the surface tension of water to allow preferential hydration of proteins and by acting as weak surfactants [31–33]. The pH range investigated was based on findings that a higher pH may disrupt hydrophobic interactions between the protein units [13].
For His6-Silα it was found that the majority of the buffers tested did give soluble protein (Table 1, entries 4-6 and 8-13; Supplementary Figure S1), but only the buffer containing Tris and NaCl at pH 8.5 gave soluble purified protein fractions after IMAC (Table 1, entry 8). This result is consistent with previous reports that only the weakly coordinating buffer Tris would be compatible with IMAC [16]. In comparison, earlier work investigating the same buffer but at pH 7.5 gave no soluble protein [15] (Table 1, entry 7). The addition of Arg and Glu appeared to give no benefit in this case. These results show that pH had a greater effect on protein solubility than the presence of detergents or buffer additives. Even so, the isolated yield was poor with only approximately 0.2 mg (equivalent to 8 nmol) of purified His6-Silα obtained per litre of E. coli culture (Table 4).
For His6-TF-Silα, the best performing buffer identified above (Table 1, entry 8) was then compared to the conditions used in previous studies [13], where the use of 100 mM phosphate buffer at pH 8.0 during lysis and isolation gave only approximately 0.5 mg per litre of culture. In contrast, the new formulation exhibited higher levels of protein in the soluble fraction (Figure 2a) and also gave higher isolated yields. At pH 8.5, 14 mg (177 nmol) per litre of culture was achievable, which was a 28-fold molar improvement compared to the phosphate buffer at pH 7.0, and 22-fold superior to His6-Silα noted above.
Table 4. Comparative summary of protein production yields for the Silα fusion proteins. Columns: Protein; Buffer; pH; Isolated yield (nmol L⁻¹ of cell culture).
Figure 2. For (a), … Table 1 was used; for (b), the buffer mixtures stated in Table 2 were used (numbers below each image correspond to the buffer entries in Table 2).
However, despite this increased production yield, aggregates were regularly observed for both His6-tagged proteins upon visual inspection in the purified fractions immediately after isolation by IMAC, suggesting the solubility was still relatively poor. The estimated pI values of His6-Silα and His6-TF-Silα are 5.65 and 5.13, respectively (calculated using Compute pI/Mw [25,34]), so raising the pH of the buffer is expected to increase the electrostatic repulsion in both proteins and allow for better solubilization [35,36]. However, higher pH levels are incompatible with IMAC.
The TF-Silα-Strep construct was then evaluated as it had a calculated pI of 4.92, and thus was hypothesised to have an increased solubility at higher pH. In addition, SAC tolerates a greater range of buffer conditions compared to IMAC, making purification in increasingly basic buffers possible. Thus, an analogous buffer survey was performed for this protein (Table 2). Here, it was found that all the AMP-containing buffers at higher pH (Table 2, entries 5-9), previously incompatible with IMAC, now facilitated soluble protein production (Figure 2b). Subsequent purification by SAC yielded 115 mg (1.54 µmol) per litre of culture with the best AMP-containing formulation at pH 9.0 (Table 2, entry 6), representing a 244-fold molar improvement compared to His6-TF-Silα when the phosphate buffer was used previously at pH 7.0, and an 8.7-fold improvement compared to the best His6-TF-Silα result shown above (Table 4). The buffer consisting of Tris and NaCl at pH 8.5 also gave soluble protein (Table 2, entry 4; Figure 2b). In this case, a somewhat lower yield of 58 mg (777 nmol) per litre of culture was obtained. Nevertheless, in all cases when this Strep-tagged protein was purified at pH ≥ 8.5, no visible aggregates were ever observed in protein-containing fractions after chromatography and concentration.
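The fold-improvements quoted above follow directly from the molar yields, as the short check below illustrates. The yields are those stated in the text; the helper names are hypothetical, and the molar mass is simply back-calculated from the quoted mass and amount.

```python
def fold_improvement(new_nmol, reference_nmol):
    """Ratio of two molar yields (both per litre of culture)."""
    return new_nmol / reference_nmol


def implied_molar_mass_kda(mass_mg, amount_nmol):
    """Molar mass implied by a yield quoted as both mg and nmol."""
    # mg / nmol = 1e-3 g / 1e-9 mol = 1e6 g/mol = 1e3 kDa
    return mass_mg / amount_nmol * 1e3


# Molar yields per litre of culture, as quoted in the text (nmol)
yields_nmol = {
    "His6-Silα, Tris pH 8.5": 8.0,
    "His6-TF-Silα, Tris pH 8.5": 177.0,
    "TF-Silα-Strep, Tris pH 8.5": 777.0,
    "TF-Silα-Strep, AMP pH 9.0": 1540.0,
}

strep_vs_his = fold_improvement(yields_nmol["TF-Silα-Strep, AMP pH 9.0"],
                                yields_nmol["His6-TF-Silα, Tris pH 8.5"])
tf_vs_no_tf = fold_improvement(yields_nmol["His6-TF-Silα, Tris pH 8.5"],
                               yields_nmol["His6-Silα, Tris pH 8.5"])
print(f"{strep_vs_his:.1f}-fold, {tf_vs_no_tf:.0f}-fold")
print(f"{implied_molar_mass_kda(115.0, 1540.0):.1f} kDa")
```

Dividing 1540 nmol by 244 and 177 nmol by 28 both give roughly 6.3 nmol, so the two fold-changes quoted against the phosphate buffer are consistent with a common reference yield.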
These results therefore substantiate the hypothesis that greater solubility of the Silα constructs is achieved when the pH is raised significantly above the pI. These results may also indicate that the presence of the His6 tag negatively affects their solubility. If so, this result would be consistent with reports on other types of proteins bearing this tag [17,18,23]. Indeed, the His6-Silα exhibited extremely poor solubility and almost entirely precipitated from solution within hours, and was not further investigated.

Circular Dichroism Spectroscopy
To ascertain if the proteins were correctly folded after isolation, the CD spectra of both His6-TF-Silα and TF-Silα-Strep proteins were recorded. These analyses were performed at pH 7.0, 8.5, and 9.0 (Table 3). For pH 7.0, a phosphate buffer containing KCl was used as it has previously been used for lyophilisation of the protein [15]. The other two buffers were chosen from the best results in the protein production experiments above.
In all cases, the proteins showed clear secondary structural features (Figure 3). Strong signals corresponding to alpha helices (negative at 222 and 208 nm, and positive at 193 nm) were present, suggesting that the two protein constructs are mainly alpha-helical. All spectra have line shapes that are essentially identical to those previously recorded for His6-TF-Silα [15], apart from the large signal at 192 nm for the spectra measured at pH 9.0, which is due to the presence of the AMP buffer. The data indicate that the proteins are not denatured or disordered under any of the tested conditions [37].

Dynamic Light Scattering
His6-TF-Silα and TF-Silα-Strep were then analysed by light scattering to provide a quantitative analysis of protein aggregation. These analyses were performed using the same conditions as for the CD spectroscopy (Table 3).
Firstly, dynamic light scattering (DLS) analyses were performed to estimate the particle size distribution (expressed as hydrodynamic diameter, dh; Figure 4). For His6-TF-Silα, highly heterogeneous distributions were observed under all the tested conditions, whilst for the Strep-tagged protein only a single peak is seen in all cases. The major peaks for each protein sit at 89.4 ± 5.5 nm (mean ± standard deviation) for His6-TF-Silα and 29.5 ± 3.6 nm for TF-Silα-Strep, when averaged across the three conditions. TF and Silα are individually estimated to have dh of less than 10 and 4.6 nm, respectively [13,38]. Using empirical formulae relating polypeptide length and dh proposed by Wilkins et al. [27], the predicted dh of His6-TF-Silα and TF-Silα-Strep are respectively 6.5 nm and 6.4 nm when folded (assuming a globular structure), and 19.0 nm and 18.4 nm if unfolded. These calculated values are much smaller than those recorded by DLS, even when considering the fact that dh values from DLS are based on an assumption of globular structure and thus cannot provide a completely reliable reflection of the size of non-globular proteins such as TF [39].
These findings can thus only be explained by the formation of oligomeric structures by both proteins, and are consistent with the well-known propensity for Silα to self-assemble [13]. The DLS data also suggest that oligomer formation is more pronounced for His6-TF-Silα, with the presence of wide size distributions or multiple peaks suggesting a broad mix of oligomers [40].
Using the DLS data, the percentage polydispersity and polydispersity index (PDI) were calculated for each sample to quantify their homogeneity (Table 5) [41,42]. In comparing the two proteins, His6-TF-Silα gives consistently higher values across all tested conditions. For both proteins, results at pH 9.0 show the lowest percentage polydispersity and PDI, indicating lower aggregation and supporting the hypothesis of electrostatic repulsion between protein molecules. However, the percentage polydispersity for both proteins at pH 8.5 was higher compared to pH 7.0, which was unexpected since the SDS-PAGE analysis post-lysis suggested good solubility at the higher pH (Figure 2). One possible explanation for the lower observed polydispersity at pH 7.0 could be that aggregation and subsequent precipitation of the protein from solution had occurred prior to DLS analysis. As the light scattering measurements do not detect precipitated material, the observed (apparently lower) polydispersity would be based only on the residual protein that remained in solution.
The PDI of TF-Silα-Strep decreases with increasing pH, and is always below 0.35, indicating a high degree of homogeneity. Thus, whilst the percentage polydispersity of this protein suggests aggregation is occurring, the low PDI suggests that these aggregates are relatively uniform in size. In contrast, the PDI of His6-TF-Silα is never below 1. It is evident, therefore, that the His6-tagged variant is prone to form a wider range of aggregates under all the tested buffer conditions.

Temperature-Dependent Melting
To further complement the DLS data, the thermal stability of both proteins was assessed by determining their melting temperature (Tm). Both protein constructs were subjected to a thermal ramp in the aforementioned buffer conditions (Table 3), and protein denaturation was quantified by the change in peak wavelength of the intrinsic tryptophan fluorescence emission (expressed as change in the barycentric mean wavelength, BCM). Here, denaturation results in the exposure of the internal tryptophan residues to the more polar aqueous media and a corresponding bathochromic shift in emission wavelength. The Tm is then determined from the first derivative of the BCM as a function of temperature.
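The principle of this analysis can be sketched as follows, assuming the usual definition of the barycentric mean as the intensity-weighted mean emission wavelength and taking Tm as the temperature at which the first derivative of BCM is largest. The instrument software performs this analysis in practice; the function names and the synthetic sigmoidal trace below are hypothetical.

```python
import math


def barycentric_mean(wavelengths_nm, intensities):
    """Intensity-weighted mean emission wavelength (nm)."""
    weighted = sum(w * i for w, i in zip(wavelengths_nm, intensities))
    return weighted / sum(intensities)


def melting_temperature(temps_c, bcm_nm):
    """Temperature at the maximum of dBCM/dT (central differences)."""
    best_t, best_slope = None, float("-inf")
    for k in range(1, len(temps_c) - 1):
        slope = (bcm_nm[k + 1] - bcm_nm[k - 1]) / (temps_c[k + 1] - temps_c[k - 1])
        if slope > best_slope:
            best_t, best_slope = temps_c[k], slope
    return best_t


# Synthetic two-state unfolding trace centred at 50 degrees C (assumed data)
temps = [float(t) for t in range(20, 91)]
bcm = [340.0 + 10.0 / (1.0 + math.exp(-(t - 50.0) / 2.0)) for t in temps]
tm = melting_temperature(temps, bcm)
print(f"apparent Tm = {tm:.1f} C")
```

A heterogeneous sample gives a broad derivative with no dominant maximum, in which case the value returned by such a procedure is only an apparent Tm.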
For His6-TF-Silα (Figure 5a), measurements at all three conditions showed a gradual increase in BCM with temperature without any sharp transitions, rather than a classical sigmoidal line shape with a single inflection corresponding to the Tm. Nevertheless, an apparent Tm can be estimated based on the maxima in the first derivative (Figure 6). As the CD spectra displayed the expected features, the lack of sharp transitions is unlikely to be due to the protein already being unfolded or misfolded. A more likely explanation is that the protein sample consists of a heterogeneous population of aggregates (as evidenced by the DLS data), and the plot represents the ensemble average with no single Tm.
In contrast, the data for TF-Silα-Strep show BCM step transitions giving apparent Tm values of approximately 51.8, 49.0 and 45.8 °C at pH 7.0, 8.5 and 9.0, respectively. These transitions are overlaid with a gradual increase in BCM throughout the entire temperature range that is possibly due to subpopulations of aggregated protein dissociating with the rising temperature (Figure 5b).

Temperature-Dependent Aggregation
To deconvolute the two phenomena of protein unfolding and aggregation, static light scattering (SLS, at 266 nm) measurements were performed as a function of increasing temperature, as this allows quantification of the average molar mass of the particles. Here, the SLS intensity is proportional to the molar mass of the particles in solution, and can thus be used to infer aggregation. In this analysis, the SLS signal is plotted as a function of temperature and the aggregation temperature (Tagg) is assigned as the first point that exceeds two standard deviations above the baseline signal at the start of the experiment (at 20 °C).
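The threshold rule for assigning Tagg can be sketched as below. This is a minimal illustration: the number of initial points treated as the baseline is an assumption (the text specifies only that the baseline is taken at the start of the experiment), and the trace is invented for the example.

```python
from statistics import mean, stdev


def aggregation_temperature(temps_c, sls_counts, n_baseline=5):
    """First temperature whose SLS signal exceeds baseline mean + 2 SD.

    The first `n_baseline` points define the baseline (assumed window).
    Returns None if the threshold is never exceeded.
    """
    baseline = sls_counts[:n_baseline]
    threshold = mean(baseline) + 2.0 * stdev(baseline)
    for t, s in zip(temps_c[n_baseline:], sls_counts[n_baseline:]):
        if s > threshold:
            return t
    return None


# Invented example trace (assumed values, not data from this work)
temps = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0, 36.0]
signal = [100.0, 101.0, 99.0, 100.0, 100.0, 100.5, 101.0, 110.0, 150.0]
t_agg = aggregation_temperature(temps, signal)
print(f"Tagg = {t_agg} C")
```

Because the rule depends on the baseline noise, a sample that is already partly aggregated at the start of the ramp raises the baseline and can make the assigned transition weak or ambiguous.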
For His6-TF-Silα (Figure 5c) the data approximate a broad sigmoidal shape at pH 8.5 and 9.0, with apparent Tagg at 45.6 and 45.9 °C, respectively. The transition appeared to be sharper at pH 9.0, suggesting a more homogeneous population that aggregates over a narrower temperature range. Furthermore, the signal also plateaus at a higher level, indicating the formation of larger aggregates compared to pH 8.5. In the case of pH 7.0, above approximately 60 °C there is a sharp rise followed by fluctuating signal intensity, which suggests a large increase in particle size leading to macroscopic precipitation.
In the case of TF-Silα-Strep (Figure 5d), it can be seen that the signals at both pH 8.5 and pH 7.0 start at higher signal intensity than pH 9.0, suggesting protein particle sizes are already larger for those cases at the start of the experiment. This result is consistent with the DLS data (Figure 4b) that show a wider size distribution at pH 8.5 and 7.0. Only the data at pH 7.0 appeared to give a sigmoidal shape, with a Tagg at 52 °C. At pH 8.5, there is a gradual increase in SLS signal, overlaid with a very weak transition at an apparent Tagg of 45 °C. At pH 9.0, the data exhibited complex behaviour, with a gradual decrease until about 40 °C, an apparent transition at approximately 45 °C, followed by a gradual rise thereafter. The decrease in signal could be due to heat-induced increases in solubility and disaggregation. As the temperature increases towards 90 °C, at pH 9.0 this protein appears to form larger aggregates than those at pH 8.5, though the high signal intensity suggests that these particles are still stably suspended in solution. The proteins at pH 7.0 display the highest signal intensity at 90 °C and thus are forming the largest aggregates, though they apparently continue to remain suspended in the buffer solution under these conditions.
In comparing the Tm and Tagg, for His6-TF-Silα (Figure 6a) no clear correlations or trends were observed, as the data generally showed a lack of clear transitions or macroscopic precipitation that confounded further analysis (Figure 5). In contrast, TF-Silα-Strep exhibited good Tm and Tagg agreement at all the tested pH values (Figure 6b). This correlation shows that heat-induced unfolding and aggregation occur concurrently, and is consistent with the general theory that protein aggregation is caused by denatured or misfolded protein. Furthermore, Tm and Tagg decrease with increasing pH for TF-Silα-Strep, suggesting that the protein is more stable towards thermally induced denaturation at neutral pH. However, the DLS data show that the protein is more aggregated at neutral pH, as evidenced by a wider size distribution compared to pH 9.0 (Figure 4b). Taken together, these results suggest that, at pH 7.0 under ambient temperatures, the protein already presents as oligomers, which then undergo denaturation and further aggregation upon thermal treatment. Thus, in this case a basal level of aggregation already manifests under ambient conditions that is not due to the presence of denatured protein. This behaviour is consistent with the fact that Silα has many hydrophobic residues already present on the exterior [13], which will promote protein-protein interactions even with correctly folded proteins.
In comparing His6-TF-Silα and TF-Silα-Strep, the presence of the His6 tag appears to have a further detrimental effect on solubility. This observation is consistent with other reports suggesting that its presence negatively influences solubility through a variety of effects including increased hydrophobicity, the formation of dimers or promotion of protein misfolding [18,20,22,43]. Indeed, the His6-tagged protein is more predisposed to aggregation and forms macroscopic filament-like structures (that can even be seen by the naked eye) within minutes of isolation, even at pH 9.0. Nevertheless, the same general findings can be inferred, in that the His6-tagged protein presents as oligomers under ambient conditions, which undergo unfolding and further aggregation upon heating.
Previous studies by Murr and Morse [13] suggested that aggregation of wild type Silα (without trigger factor) is a twofold process. The protein molecules first associate by hydrophobic interactions to form oligomers, which are subsequently crosslinked by intermolecular disulfide bonds. These oligomers in turn associate with each other through further hydrophobic interactions to form higher order structures, which can be disrupted upon exposure to high pH (pH ≥ 9.0). If so, the use of high pH buffers for protein purification and storage would only prevent these higher order species but would not yield monomeric proteins.
Another potential contributor to aggregation could be the presence of C5 hydrogen bonding, which occurs through an overlap of the carbonyl np and amide σ* orbitals between small amino acid residues such as glycine and serine. The computational models previously reported [12,13] indicate the presence of a high number of both serine and glycine residues on the solvent-accessible surface of Silα, which may result in a significant contribution of this type of bonding towards protein-protein interactions.
Nevertheless, these results indicate that the optimal buffer conditions for thermal stability and minimisation of protein aggregation are Tris (50 mM) and NaCl (100 mM) at pH 8.5, which provides a robust starting point for further applications.

Enzymatic Activity
To complete the characterisation of the proteins, their catalytic activities towards Si-O bond hydrolysis were compared. Two hydrolytic activity assays were used, with TEOS and TBDMS-ONp as substrates [15,44].
The TEOS assays showed that both candidates catalysed the production of silica compared to the negative control (buffer only), but the activity of TF-Silα-Strep was approximately 6.6-fold lower than that of His6-TF-Silα (Figure 7). These assays suggest (but do not confirm) that the His6 tag may also be enhancing the catalytic activity of this protein towards silyl hydrolysis, which is consistent with reports that the tag can act as a catalyst for ester hydrolysis [19], and that polyhistidine groups can catalyse Si-O bond hydrolysis [45,46]. In contrast, in previous studies a range of proteins that do not contain a His6-tag did not display any catalysis towards the hydrolysis of Si-O bonds [15,47].
Since the His6-tag is not expected to depend on the tertiary structure of the protein for its catalytic activity, the relative contribution of protein folding (and thus the correctly constituted active site) was investigated by repeating the assays using equivalent amounts of heat-denatured proteins. In both cases, heating at 85 °C for 20 min prior to the assay resulted in a loss of activity compared to enzymes that were not heat-treated: 85% for His6-TF-Silα and 76% for TF-Silα-Strep. Thus, assuming the His6-tag is not buried within the bulk protein (either due to misfolding or aggregation) upon cooling prior to the assay, this result suggests that at least 10% of His6-TF-Silα's activity is due to the His6-tag. In comparison, the Strep-tagged protein displays only a trace level of activity after denaturation.
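One possible way to place these residual activities on a common scale is sketched below in Python. The snippet uses only the figures quoted above (85% and 76% loss on heating, and an approximately 6.6-fold lower activity for TF-Silα-Strep in the TEOS assay). Setting the activity of native His6-TF-Silα to 1, and the variable names, are illustrative assumptions and not necessarily the reasoning behind the estimate given in the text.

```python
# Activity of native His6-TF-Sil-alpha is taken as the reference (assumed normalisation).
his_native = 1.0
strep_native = his_native / 6.6   # TF-Sil-alpha-Strep is ~6.6-fold less active (TEOS assay)

# Fraction of activity lost after heating at 85 degC for 20 min.
his_loss, strep_loss = 0.85, 0.76

his_residual = his_native * (1.0 - his_loss)        # residual activity of denatured His6 protein
strep_residual = strep_native * (1.0 - strep_loss)  # residual activity of denatured Strep protein

# Residual His6-TF-Sil-alpha activity not matched by the denatured Strep-tagged protein.
excess = his_residual - strep_residual

print(f"His6 residual:  {his_residual:.2f}")
print(f"Strep residual: {strep_residual:.3f}")
print(f"Difference:     {excess:.2f}")
```

On this scale, the denatured Strep-tagged protein retains only a few percent of the reference activity, which is in line with its description as a trace level.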
Using TBDMS-ONp as the substrate, colorimetric assays were carried out and the Michaelis-Menten kinetic parameters were determined for the Strep-tagged protein and compared with those of the His6-tagged protein (Table 6, Supplementary Figure S2). In comparing the two proteins, the Strep-tagged protein displays a weaker affinity for the substrate, as evidenced by the higher KM. This result is consistent with the proposal that the His6-tag contributes to catalysis and would present a second binding site for the substrate, resulting in an apparently lower KM. Accordingly, the kcat of the Strep-tagged protein is also significantly lower, as the catalysis is now only attributable to the enzyme active site. Taken together, both assays suggest that the presence of the His6-tag on Silα contributes to non-active site mediated activity. Since the Strep-tag is not known to cause an equivalent effect, it is proposed that the fusion protein incorporating this tag gives a more accurate appraisal of the catalytic activity attributable to the enzyme itself, rather than the pendant tag. A possible alternative explanation is that the Strep-tag may be inhibiting catalytic activity. However, as the C-terminal location of this tag is distant from the active site, any possible inhibition would not be through direct steric hindrance.
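For readers less familiar with how KM and kcat are extracted from initial-rate data, a minimal Python sketch is given below. It generates noise-free rates from the Michaelis-Menten equation and recovers the parameters with a Hanes-Woolf linearisation (s/v plotted against s). All numerical values, units, and function names are hypothetical and are not those of Table 6, and the linearisation is an assumption chosen for simplicity; nonlinear regression is the more common choice for experimental data.

```python
def mm_rate(s, v_max, k_m):
    """Michaelis-Menten initial rate at substrate concentration s."""
    return v_max * s / (k_m + s)

def hanes_woolf_fit(conc, rates):
    """Least-squares line through (s, s/v); returns (v_max, k_m)."""
    xs = conc
    ys = [s / v for s, v in zip(conc, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals 1 / v_max
    intercept = mean_y - slope * mean_x    # equals k_m / v_max
    v_max = 1.0 / slope
    return v_max, intercept * v_max

# Hypothetical data: concentrations in mM, rates in mM per min.
substrate = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
true_v_max, true_k_m = 2.0, 0.3
rates = [mm_rate(s, true_v_max, true_k_m) for s in substrate]

v_max, k_m = hanes_woolf_fit(substrate, rates)
enzyme_conc = 0.01                 # assumed enzyme concentration in mM
k_cat = v_max / enzyme_conc        # turnover number, per min

print(f"Vmax = {v_max:.3f}, KM = {k_m:.3f}, kcat = {k_cat:.1f}")
```

Because kcat is obtained by dividing Vmax by the enzyme concentration, any catalysis arising outside the active site would be folded into the apparent kcat, which is the concern raised above for the His6-tagged protein.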

Conclusions
In summary, a series of biophysical investigations was carried out with the aim of developing improved variants and formulations of the silicatein enzyme, which can serve as a starting point for future enzymology studies and biotechnological applications. During this study, the solubility and stability of three different protein constructs (His6-Silα, His6-TF-Silα, and TF-Silα-Strep) in various buffers were investigated. As expected, the His6-Silα without the TF fusion to aid solubility exhibited a high degree of aggregation and insolubility. For the remaining two candidates, increasing the pH of the buffers during lysis and purification resulted in a dramatic increase in soluble protein yield. Subsequent light scattering and temperature ramp analyses in the optimised buffers found that His6-TF-Silα exhibited lower solubility and colloidal stability than the Strep-tagged variant. Though not conclusive, these results, together with others mentioned above, present an emerging picture whereby the presence of a His6 tag may be detrimental to the solubility of some proteins. However, despite being more macroscopically soluble (and less prone to precipitation), TF-Silα-Strep was still found to form oligomeric structures even under the best conditions reported here.
Assays of enzyme activity for both the TF-fused proteins show that the presence of a His6 tag may be contributing to a portion of the observed catalysis, though the TF-Silα-Strep protein still demonstrated unambiguous catalysis with respect to Si-O bond hydrolysis. This result illustrates that caution should be exercised when selecting purification tags, so that they do not unintentionally alter the activity of the recombinant protein.
Further work will be needed to engineer the interactions contributing to the oligomerisation of Silα, which other reports have suggested include both hydrophobic interactions and intermolecular disulfide bonds. Thus, the generation of monomeric formulations of Silα could be achieved by including additives that disrupt protein-protein interactions [48]. The intermolecular disulfide linkages can also be cleaved by the addition of reducing agents, though caution must be exercised, as previous sequence analyses have indicated that internal disulfide bonds that may be essential to the protein's structure are also present. Further experiments may also be carried out with an untagged protein, to exclude the possibility that the Strep-tag may be acting as an inhibitor of catalysis.