Four Challenges for Better Biocatalysts

Biocatalysis (the use of biological molecules or materials to catalyse chemical reactions) has considerable potential. The use of biological molecules as catalysts enables new and more specific syntheses. It also meets many of the core principles of “green chemistry”. While there have been some considerable successes in biocatalysis, the full potential has yet to be realised. This results, partly, from some key challenges in understanding the fundamental biochemistry of enzymes. This review summarises four of these challenges: the need to understand protein folding, the need for a quantitative understanding of the hydrophobic effect, the need to understand and quantify the effects of organic solvents on biomolecules and the need for a deep understanding of enzymatic catalysis. If these challenges were addressed, the number of successful biocatalysis projects would be likely to increase. Addressing them would enable accurate prediction of protein structures, and of the effects of changes in sequence or solution conditions on these structures. We would be better able to predict how substrates bind and are transformed into products, again leading to better enzyme engineering. Most significantly, it may enable the de novo design of enzymes to catalyse specific reactions.


Introduction
Biocatalysis can be defined as the use of biological molecules or materials to accelerate chemical processes. This includes the use of naturally occurring enzymes, recombinant enzymes, modified or engineered enzymes, groups of enzymes, cell extracts and whole cells. The use of these biological materials can offer significant advantages over "traditional" chemical catalysis. Typically, biological reactions take place at moderate temperatures, at atmospheric pressure and in aqueous solution. They can result in both economic and environmental advantages compared to many existing processes which occur at elevated temperatures and pressures, and in organic solvents. Bringing reactions to high temperatures and pressures requires substantial energy inputs, which are expensive and potentially environmentally damaging. The environmental risks associated with the disposal of organic solvents impose further costs on the chemical industry. The use of high temperatures, pressures and often flammable or toxic solvents results in health and safety concerns, which can be expensive to mitigate.
Enzymes are often highly specific in the reactions which they catalyse. For example, many enzymes recognise only one stereoisomer and produce only one product out of a range of possible stereoisomers. This is particularly attractive to those industries where the correct chirality is vital to the proper functioning of the product. For example, many pharmaceuticals are only active in one stereoisomer; indeed, other stereoisomers may have adverse effects. Thus, it can be seen that biocatalysis meets many of the 12 core principles of "green chemistry" (Table 1) [1].

Table 1. Biocatalysis and the twelve principles of green chemistry.

| Green Chemistry Principle | How Biocatalysts Can Address this Principle |
| --- | --- |
| Prevention of waste | Biocatalysts will help eliminate organic solvents, reducing the need to dispose of these environmentally damaging substances. |
| Atom economy | No effect if catalysing the same reaction already used. |
| Less hazardous materials | Biocatalysis will help eliminate the use of heavy metal catalysts or organic solvents. Natural redox reagents (e.g., NAD(P)+) have low toxicity. Cells and enzymes are biodegradable and thus unlikely to pose a long-term threat to the environment. |
| Safer chemicals | This depends on the choice of reaction and so biocatalysis can make little direct contribution. |
| Safer solvents and auxiliary chemicals | Biocatalysis will often use water as a solvent. |
| Energy efficiency | Biocatalysis is likely to operate in a relatively low temperature range (30-60 °C). They are unlikely to require high pressures. |
| Renewable feedstocks | Limited contribution. However, naturally occurring redox cofactors, etc., could be used and produced by fermentation of microbes. |
| Reduce derivatives, e.g., "blocking groups" | These are unlikely to be necessary in biocatalysis due to the site- and stereoselectivity of enzymes. |
| Catalysis | Enzymes offer impressive rate enhancements, often much greater than catalysts currently in use. |
| Products, etc., should be degradable | This depends on the choice of reaction and so biocatalysis can make little direct contribution. |
| Real time analysis to prevent pollution | This depends on the design of the process. However, there is no fundamental reason why it cannot be applied in biocatalytic processes. |
| Inherently safer chemistry | Biocatalysis allows the elimination of high temperatures and pressures. It moves reactions away from organic solvents towards working in aqueous solutions. All this is inherently safer. |

However, some of the attractive features of biocatalysts can also be their shortcomings. The high specificity of enzymes means that they often catalyse commercially interesting reactions at negligible rates, or not at all. Their ability to work at modest temperatures and pressures means that they are often denatured if exposed to conditions outside their normal range. Many enzymes are also denatured by even relatively small amounts of organic solvents. These can be issues where enzymes are to be used in part of a multi-step process with more traditional chemical steps. Considerable efforts have been made to alter or broaden the specificity of some enzymes. In addition, attempts have been made to improve the stability of proteins so that they will be more resistant to denaturation by temperature, pressure and organic solvents. There have been some notable successes in this field. For example, lipases (EC 3.1.1.3) are now widely used synthetically to catalyse the formation of ester and amide bonds [2][3][4]. Laccases (EC 1.10.3.2) are used to delignify wood pulps [5,6]. However, there have been many failures in the pursuit of better enzymes for biocatalysis. Ideally, we would adopt an engineering approach, in which enzymes were redesigned for novel functions or expanded operating ranges. However, an engineering approach requires a deep knowledge of the system being adapted. In reality, we often find that fundamental deficits in biochemical understanding limit our ability to engineer enzymes. These reflect our partial, albeit evolving, understanding of some key processes in protein biochemistry. This review focuses on four of these key areas and explains how solving some key biochemical problems would enable better biocatalysis in the future. There are, of course, other challenges which lie outside protein biochemistry. 
These include predicting the effects on the yield of recombinant proteins of scaling up from laboratory-scale to industrial-scale fermentations, and our inability to culture many micro-organisms in the laboratory [7,8]. The latter makes it difficult to study their biology and biochemistry and greatly hinders their use as cellular biocatalysts. Other challenges are societal and economic, such as public acceptance of the use of genetically modified organisms and the time and costs associated with enzyme engineering projects.

Challenge 1: Understand Protein Folding
It is a fundamental hypothesis in biochemistry that the primary sequence of a protein dictates its three-dimensional fold under a given set of physico-chemical conditions (temperature, pressure, pH, redox potential, etc.). This hypothesis was supported by the simple but pioneering experiments of Anfinsen, who demonstrated that denaturing ribonuclease (RNase; EC 3.1.27.5, since reclassified as EC 4.6.1.18) with urea caused it to lose its activity. Removing the denaturant by dialysis caused the protein to regain almost all of its enzymatic activity [9]. No energy input was required and no information was added to the system. Thus, the protein must fold in a thermodynamically spontaneous manner to a free energy minimum, using only the information encoded in its primary structure. This experiment has now been repeated with a wide range of other proteins, although in vivo, many proteins require chaperones to enable them to fold efficiently. Chaperones are probably required mainly because of the crowded and complex cellular environments in which proteins fold. Some chaperones provide an "Anfinsen cage", in which the nascent protein can fold [10]. The fundamental principle that the primary sequence contains all the information to encode the secondary and tertiary structures remains unchallenged.
However, we have been unable to decipher how this information is used to drive protein folding. Our understanding of the process of protein folding is far from complete and we are unable to predict with reasonable certainty the three-dimensional structure of a protein from just its primary sequence. In part, this results from the complexity of this process, which involves not just the polypeptide molecule but also hundreds or thousands of water molecules and inorganic ions. It is also important to consider that, in vivo, protein folding will be much faster than ribosomal synthesis. So, folding of part of the protein may occur before the C-terminal parts have been made. This is another reason why chaperones are required to deliver efficient folding in the cellular context. This matters for biocatalysis. Without the ability to predict folding from a sequence accurately, we cannot predict how structures might change under conditions of different pH values or ionic strengths. Critically, this means that we are unable to predict the consequences of changing residues in the sequence on the overall fold. Many enzyme engineering projects have involved altering residues in the active site, for example, to alter charges, increase hydrophobicity or create space for larger substrates to bind. Enzyme engineering is made possible by site-directed mutagenesis. It is much more chemically straightforward to alter DNA sequences in a reliable and predictable manner than protein sequences. Furthermore, the generation of a mutated form of the expression vector for a protein means that the reaction is only required once since the vector (normally a plasmid) can be readily replicated in a bacterial host. Although site-directed mutagenesis has been possible since the 1980s, modern techniques such as the QuikChange method mean that it is possible to generate large numbers of mutations in parallel with high reliability (often greater than 90% success on first attempt) [11][12][13].
In many cases, enzyme engineering does not have the predicted effects. There are many documented examples (and most likely many more which have not been published) where a single amino acid change at the active site results in an inactive protein (e.g., [14][15][16]). These drastic changes in the structure and/or stability of the protein cannot be predicted by homology modelling. As a result, they impose a considerable constraint on our ability to engineer proteins.
Some empirical "rules" have been derived based on observation of what works and what does not. Many years ago, Fersht proposed five rules of enzyme engineering, which all make sense from a protein biochemistry perspective. The first four are: where possible, delete only part of a side chain (e.g., Tyr→Phe) or choose isosteric changes (e.g., Asp→Asn), do not create unbalanced charges in the interior of the protein, delete only the minimum number of interactions and do not add new functional groups [17]. The final rule reflects our lack of understanding of these processes: it is to disobey the first four rules where appropriate. In the 30 years since these rules were first proposed, we have added thousands of protein structures to the protein databank and greatly improved the speed and processing power of computers. Ab initio methods do now exist for predicting protein structures from a sequence alone. While these are improving, they tend to be reasonably accurate for small proteins (<100 residues), but less accurate for larger ones [18].
Two key problems underpin our failure to predict protein structures accurately: the role of entropy in the process and the hydrophobic effect (see below). Protein folding involves a number of energetic changes. The unfolded polypeptide chain has considerable mobility and can populate many different conformations (high entropy). However, it makes few interactions with itself (little enthalpic stabilisation). In solution, it will bind many water molecules, constraining their motions (lowering their entropy, but with a favourable enthalpy of interaction). Thus, conversion to the folded state involves a number of energetically favourable (release of bound water and formation of intra-protein interactions) and unfavourable (constraint of the mobility of the polypeptide and breaking of interactions with water) processes. These normally balance out to be marginally favourable overall, with free energies of folding in the range 20-60 kJ mol⁻¹ (i.e., equivalent to a few hydrogen bonds) [19]. Mapping these interactions and predicting the entropic changes associated with them is not trivial. Entropy of complex systems is hard to predict computationally. Water molecules in crystal structures do not necessarily represent those which are likely to be present in solution. Predicting these accurately (and their subsequent behaviour during folding) is also not trivial. Nevertheless, accurate algorithms to predict protein structures from a sequence alone (and how changes in that sequence would affect that overall structure) would be highly beneficial to enzyme engineering for biocatalysis. It would greatly reduce the number of projects which begin with a nice hypothesis based on the structure of the active site, but end with a misfolded, inactive protein. It would also enable the rapid mining of proteins without experimentally determined structures for potentially useful folds and activities.
If the effects of an altered environment on structure could also be predicted, the redesign of proteins to resist high temperatures, pH extremes and organic solvents could also be improved.
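To put the marginal stability described above into numbers, the following minimal Python sketch converts a free energy of unfolding into equilibrium populations of folded and unfolded protein. It assumes a simple two-state (folded/unfolded) model at 25 °C; the two-state assumption, the temperature and the function name are illustrative choices and are not taken from the studies cited above.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

def two_state_populations(dg_unfold_kj_per_mol, temperature_k=298.15):
    """Return (fraction_folded, fraction_unfolded) for a two-state protein.

    dg_unfold_kj_per_mol is the free energy of unfolding (positive when the
    folded state is favoured), so K_unfold = [U]/[F] = exp(-dG / RT).
    """
    k_unfold = math.exp(-dg_unfold_kj_per_mol * 1000.0 / (R * temperature_k))
    unfolded = k_unfold / (1.0 + k_unfold)
    return 1.0 - unfolded, unfolded

# Stabilities spanning the range quoted in the text (temperature assumed: 25 °C).
for dg in (20.0, 40.0, 60.0):
    folded, unfolded = two_state_populations(dg)
    print(f"dG_unfold = {dg:.0f} kJ/mol: fraction unfolded = {unfolded:.2e}")
```

Under these assumptions, a protein stabilised by 20 kJ mol⁻¹ is only about 0.03% unfolded at equilibrium, yet a mutation that removes interactions worth a comparable amount of energy would be enough to leave roughly half of the molecules unfolded.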

Challenge 2: Understand the Hydrophobic Effect
Along with Anfinsen's experiment on protein folding, the hydrophobic effect forms part of any introductory course in biochemistry. It is well established that biological polymers in water fold such that hydrophobic parts are on the inside and hydrophilic ones are on the outside. This is driven, in part, by the association of non-polar parts of the molecules. However, the mechanisms of the hydrophobic effect are less well established. The hydrophobic effect is often confused with non-polar van der Waals interactions, which generally involve the same types of amino acid residues. These non-polar interactions involve the attractions and repulsions of electron clouds in side chains. In contrast, the hydrophobic effect is a thermodynamic one driven partly by interactions with water molecules. In water, non-polar molecules are generally surrounded by restrained "cages" of water molecules. This is entropically costly for the system, and there is little enthalpic payback in terms of strong interactions between the molecules and water. If the non-polar molecules can be grouped together, this reduces the water-exposed surface and, thus, the number of restrained water molecules, increasing the entropy of the system. Furthermore, since non-covalent interactions between the non-polar molecules can be formed, there will also be an enthalpic gain. Therefore, while non-polar interactions can and do occur in non-aqueous environments, the hydrophobic effect is a particular effect which only occurs in systems which include water. Like the energetic transactions which occur on protein folding, building a quantitative, accurate model of the behaviour of non-polar molecules interacting in an aqueous system is challenging. In particular, accounting for entropy in these interactions has proven difficult. However, in many systems, entropy is a key driver of the process and a major contributor to the overall free energy.
This matters in biocatalysis partly because the hydrophobic effect is so important in directing protein (and nucleic acid) folding. However, it also influences how ligands (e.g., substrates, inhibitors and activators) interact with enzymes. Accounting for the hydrophobic effect in molecular docking studies has been difficult. This difficulty partly explains why attempts to predict affinities of ligands bound to enzymes are often inaccurate by several orders of magnitude: while non-covalent interactions can be modelled and predicted, the hydrophobic effect has proved more troublesome to quantify. If this problem could be solved, then predictions of the effects on substrate binding following the engineering of active sites should become more accurate. This would reduce the number of times new variants of enzymes are made but fail to bind novel substrates with sufficient affinity to make them useful biocatalysts.
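The sensitivity of predicted affinities to small energetic errors follows from the standard relation between binding free energy and the dissociation constant, ΔG° = RT ln(Kd/c°). The sketch below is not a docking method. It only shows, assuming 25 °C and using hypothetical error values, how an error in a computed free energy propagates into a fold-error in Kd.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

def kd_fold_error(dg_error_kj_per_mol, temperature_k=298.15):
    """Factor by which a predicted Kd is wrong for a given free energy error."""
    return math.exp(abs(dg_error_kj_per_mol) * 1000.0 / (R * temperature_k))

def dg_error_for_fold(fold_error, temperature_k=298.15):
    """Free energy error (kJ/mol) that produces a given fold-error in Kd."""
    return R * temperature_k * math.log(fold_error) / 1000.0

# Hypothetical errors in a computed binding free energy.
for err in (2.0, 6.0, 12.0, 18.0):
    print(f"error of {err:4.1f} kJ/mol -> Kd wrong by a factor of {kd_fold_error(err):.3g}")
```

At 25 °C, an error of about 5.7 kJ mol⁻¹ (less than the strength of many single hydrogen bonds) already corresponds to a tenfold error in Kd, which is why an unquantified hydrophobic contribution can easily shift predictions by orders of magnitude.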

Challenge 3: Understand the Complex Effects of Organic Solvents on Biomolecules
The vast majority of enzymes function in an aqueous environment. However, industrial processes may introduce organic solvents or may involve compounds which are not highly soluble in water. Such compounds may need to be dissolved in solvents such as dimethylsulphoxide (DMSO), alcohols, ethers or hydrocarbons such as toluene, hexane and cyclohexane. These compounds can promote the denaturation of enzymes and the disruption of phospholipid bilayers, but the mechanisms are not always clear. They can also alter the specificity of enzymes, sometimes in ways which can be useful in biocatalysis and, in some cases, improve the stability of the protein [20]. However, predicting the qualitative and quantitative changes is not normally possible. Often these compounds have multiple effects arising from their hydrophobicity, their interaction with functional groups in proteins, their reduction of water activity and their chaotropicity. Furthermore, while understanding protein folding and the hydrophobic effect (see above) would largely benefit biocatalysis with enzymes, understanding the complex effects of organic solvents on biomolecules would also benefit biocatalysis with whole cells.
Part of the effect of organic solvents on biomolecules is linked to their hydrophobic properties. Hydrophobic parts of biomolecules tend to associate together on the interior of the molecule. Increasing the hydrophobicity of the external environment will make it more thermodynamically favourable for hydrophobic moieties to move to the exterior, resulting in rearrangement of the overall structure. Thus, a better understanding of the hydrophobic effect (Challenge 2) would also assist in predicting the effects of organic solvents. The hydrophobicity of molecules is often measured by the logarithm of their partition coefficients (logP). This is an empirical measure: the partition coefficient is the ratio of the equilibrium concentration of the compound in an organic layer (typically octanol) to that in the aqueous layer. Technically, it measures the polarity or lipophilicity of the compound rather than its hydrophobicity, which is a function of polarity and entropic effects on water.
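As a small worked illustration of the partition coefficient described above, the following Python sketch computes logP from two measured concentrations and converts a logP value back into the fraction of solute found in the organic layer. The function names and the example concentrations are hypothetical, and the calculation assumes a single neutral species at equilibrium.

```python
import math

def log_p(conc_organic, conc_aqueous):
    """log10 of the partition coefficient P = [solute]organic / [solute]aqueous.

    Both concentrations must be in the same units and measured at equilibrium.
    """
    if conc_organic <= 0 or conc_aqueous <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_organic / conc_aqueous)

def fraction_in_organic(logp, volume_ratio=1.0):
    """Fraction of solute in the organic layer.

    volume_ratio is V_organic / V_aqueous (1.0 for equal volumes).
    """
    p = 10.0 ** logp
    return p * volume_ratio / (1.0 + p * volume_ratio)

# Hypothetical measurement: 9.0 mM in octanol, 0.3 mM in water.
value = log_p(9.0, 0.3)
print(f"logP = {value:.2f}, fraction in octanol = {fraction_in_organic(value):.3f}")
```

A positive logP indicates a compound that prefers the organic layer. As the text notes, this captures lipophilicity only, so two solvents with similar logP values need not have the same effect on a protein.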
Organic solvents also reduce the water activity (a_w) of solutions. Water activity is defined as the ratio of the equilibrium partial pressure of water above a solution to that above pure water at the same temperature. It is a measure of the amount of "free" water in a system. Although some cells can live at water activities below 0.6, the vast majority of organisms tend to live at high water activities, typically in excess of 0.95 [21][22][23][24]. Hydrophobic substances, including those which are commonly used as solvents, induce stress in microbes, eliciting responses similar to those of cells under osmotic (water) stress [25][26][27].
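The definition of water activity above translates directly into a ratio of vapour pressures. The Python sketch below shows this, together with a rough estimate from Raoult's law in which water activity equals the mole fraction of water. The ideal-solution estimate is an assumption made purely for illustration: real mixtures of water and organic solvents can deviate substantially from it. The function names and the example measurement are hypothetical.

```python
def water_activity(p_water_over_solution, p_water_over_pure_water):
    """a_w = p / p0, with both vapour pressures measured at the same temperature."""
    if p_water_over_pure_water <= 0:
        raise ValueError("reference vapour pressure must be positive")
    return p_water_over_solution / p_water_over_pure_water

def ideal_water_activity(mol_water, mol_solute):
    """Raoult's law estimate: a_w equals the mole fraction of water.

    Valid only for ideal solutions; treat the result as a first approximation.
    """
    return mol_water / (mol_water + mol_solute)

# Hypothetical measurement at 25 °C, where pure water has p0 of about 3.17 kPa.
print(f"measured a_w = {water_activity(3.01, 3.17):.3f}")

# 1 mol of a non-volatile, non-dissociating solute in 1 kg (about 55.5 mol) of water.
print(f"ideal a_w    = {ideal_water_activity(55.5, 1.0):.3f}")
```

In the ideal approximation, pushing water activity below the 0.95 level mentioned above takes several moles of solute per kilogram of water, which gives a sense of the solvent concentrations at which this stress becomes relevant.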
In addition to hydrophobicity and effects on water activity, many organic solvents are also chaotropic. Chaotropes are compounds which increase the entropy of the system. This is thought to be primarily due to disruption of networks of water molecules, which reduces the entropic component of the hydrophobic effect. Since the exposure of non-polar moieties requires less ordering of water molecules in the presence of a chaotrope, the energetic penalty for doing so is reduced [28,29]. Some theories of chaotropicity suggest that competition for hydrogen bonding potential with key groups in the molecule is also important. This appears to be the case for urea, which disrupts local water structures and forms noncovalent interactions with polar groups within proteins [30]. Like hydrophobic compounds, chaotropes induce stress in cells, which is similar to that caused by low water activity [31]. Several attempts have been made to quantify chaotropicity. Some of these are indirect measures (e.g., the solution entropy in water [32,33]). Probably the most extensive scale is one based on the suppression of the melting point of agarose [34]. Like the partition coefficient, this is an empirical scale which measures the effects of compounds on a complex, macromolecular system. It does not account for the underlying physical chemistry of chaotropicity, which is poorly understood [35].
The lack of a deep, quantitative understanding of these interlocking phenomena means that it is hard to predict the effects of organic solvents on enzymes, groups of enzymes and whole cells. Addressing this problem would enable us to understand and predict the effects of organic solvents on proteins. It might enable estimation of the maximum tolerable concentrations of these solvents and also inform the engineering of proteins to become more resistant to organic solvents.

Challenge 4: Understand Catalysis
Enzymes can achieve impressive catalytic rate enhancements, up to 10¹⁷-fold in the case of OMP decarboxylase (EC 4.1.1.23) [36]. How they do this is not fully understood [37]. There is considerable literature devoted to elucidating the chemical mechanisms of enzyme catalysis. From this, it is clear that various factors contribute, including transition state stabilisation, stretching or bending bonds in the substrates, protonation or deprotonation to produce more reactive species and the provision of lower energy reaction pathways. However, a complete description of how catalysis is realised is available for few, if any, enzymes. In part, this results from our incomplete understanding of chemical reactions and the quantum mechanical effects which underpin them. A relatively underexplored aspect of enzymatic catalysis is the role of protein mobility. Compared with the overall average structure of a protein, mobility is much harder to measure experimentally. Yet, like all molecules, proteins are in constant motion, with bonds bending, stretching and rotating. These motions can give rise to much larger ones, for example conformational rearrangements following substrate binding or the movement of two domains relative to one another. It is now clear that these motions play key roles in catalysis and the regulation of enzyme activity [38]. However, identifying those motions which contribute to catalysis will be experimentally demanding. Similarly, modelling them accurately enough to predict their effects in novel systems will be challenging.
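The scale of such rate enhancements can be expressed as an energy. In the simplest transition state theory picture, and assuming the catalysed and uncatalysed reactions share the same pre-exponential factor, the ratio of rate constants equals exp(ΔΔG‡/RT), where ΔΔG‡ is the reduction in activation free energy. The Python sketch below applies this relation at an assumed 25 °C; it is a back-of-envelope estimate and not a description of any particular enzyme mechanism.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

def barrier_reduction_kj_per_mol(rate_enhancement, temperature_k=298.15):
    """Reduction in activation free energy implied by k_cat / k_uncat.

    Uses ddG = RT ln(k_cat / k_uncat), i.e. simple transition state theory
    with identical pre-exponential factors for both reactions (an assumption).
    """
    if rate_enhancement <= 0:
        raise ValueError("rate enhancement must be positive")
    return R * temperature_k * math.log(rate_enhancement) / 1000.0

for enhancement in (1e6, 1e12, 1e17):
    ddg = barrier_reduction_kj_per_mol(enhancement)
    print(f"{enhancement:.0e}-fold enhancement -> barrier lowered by {ddg:.1f} kJ/mol")
```

On this estimate, a 10¹⁷-fold enhancement corresponds to lowering the barrier by roughly 97 kJ mol⁻¹, which is more than the entire free energy of folding quoted under Challenge 1. A design error of only a few kJ mol⁻¹ in transition state stabilisation therefore changes the predicted rate by an order of magnitude.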
Site-directed mutagenesis and sophisticated computer simulations have together provided considerable insight, but gaps in our knowledge mean that it is hard to predict how enzymes might behave with novel substrates or how the alteration of residues might affect specificity and catalysis. Furthermore, these two approaches do not always agree. For example, biochemical and structural studies of enzymes from the GHMP (galactokinase, homoserine kinase, mevalonate kinase, phosphomevalonate kinase) kinase family suggest that catalysis is largely promoted by the abstraction of a proton from the substrate by an aspartate or glutamate residue acting as a base [39][40][41][42][43]. This active site base mechanism is commonly postulated for enzymes and appears chemically reasonable. The abstraction of the proton converts a relatively unreactive hydroxyl group into a highly nucleophilic alkoxide ion which then attacks the γ-phosphate of ATP. However, computer simulations employing quantum mechanics/molecular mechanics approaches suggest that this mechanism does not occur in any of the GHMP kinases studied to date. Instead, these simulations predict direct transfer of the phosphate group from ATP, most likely assisted by stabilisation of the transition state [44][45][46].
A greater understanding of catalysis would enable us to become better at predicting the effects of active site changes on activity and specificity. The biggest advance of all would be to reliably "reverse engineer" proteins [47,48]. That is, to identify a reaction which it is desirable to catalyse, design an active site to do this, along with a three-dimensional protein structure to contain it, and then derive a sequence of amino acids which would fold into that conformation, acting as a de novo, non-natural enzyme for that reaction. This would require a detailed understanding of the non-enzyme catalysed reaction complemented with realistic ideas about how to achieve catalytic rate enhancement using the functional groups found in proteins. The incorporation of non-natural amino acids could further expand the range of functional groups available [49].

Conclusions
Like all applied sciences, biocatalysis relies on basic science for its advances. Here, four limitations in our understanding of protein biochemistry are identified. These are all big problems with implications which stretch far beyond biocatalysis. While incremental progress is being made on all four, a paradigm-shifting breakthrough is probably required to address each of them. Addressing even one would reduce the failure rate in the search for effective biocatalysts. Addressing all four would be a major step towards the design of de novo enzyme-based catalysts for a wide range of reactions. This would usher in a new era of green(er) chemistry, greatly reducing the chemical industry's reliance on high temperatures, high pressures and organic solvents.