Solvents to Fragments to Drugs: MD Applications in Drug Design

Simulations of molecular dynamics (MD) are playing an increasingly important role in structure-based drug discovery (SBDD). Here we review the use of MD for proteins in aqueous solvation, organic/aqueous mixed solvents (MDmix) and with small ligands, to the classic SBDD problems: Binding mode and binding free energy predictions. The simulation of proteins in their condensed state reveals solvent structures and preferential interaction sites (hot spots) on the protein surface. The information provided by water and its cosolvents can be used very effectively to understand protein ligand recognition and to improve the predictive capability of well-established methods such as molecular docking. The application of MD simulations to the study of the association of proteins with drug-like compounds is currently only possible for specific cases, as it remains computationally very expensive and labor intensive. MDmix simulations on the other hand, can be used systematically to address some of the common tasks in SBDD. With the advent of new tools and faster computers we expect to see an increase in the application of mixed solvent MD simulations to a plethora of protein targets to identify new drug candidates.


Introduction
The first revolution in structural biology, in the early 1990's, increased the available structural information by 20-fold in a decade, creating a high expectation for computational methods that could turn this information into drug candidates. A large body of methods emerged, and some drugs owe their existence-at least in part-to them [1,2]. But it is obvious that the impact of structure-based drug design (SBDD) has not met these expectations. For instance, out of 66 clinical candidates, published by the Journal of Medicinal Chemistry in the 2016-2017 period, none originated as a virtual screening hit [3]. It is a fact that predicting binding affinities (K A = 1/K D = exp(−∆G BIND /RT)) is terribly difficult, and one of the main compounding factors is the solvent's effect. Contrary to many expectations, designing a good ligand is not a simple matter of finding a molecule that offers a good shape, or electrostatic and chemical complementarity to its protein target. Binding that occurs in the presence of solvent and related predictions will always fall short if this is not fully accounted for. Accurate predictions will unavoidably consider the protein and the ligand embedded in the solvent as part of a condensed state with a great number of configurational possibilities. Molecular dynamics (MD) is uniquely suited to simulate such systems by identifying true ensembles that can be related to macroscopic observables [4]. Here we will review how MD can be used to understand the behavior of water, the universal biological solvent, in relation to the protein surface, and to accurately predict its molecular association properties. We will then discuss how MD simulations of proteins in water and mixed solvents can be used to identify key interactions on their surface, and how these can be incorporated into computational docking, to identify better drug candidates.
We will start by showing that, far from being empty space, a protein's binding sites in the unbound state are occupied mainly by water (but also ions and metabolites) that does not behave as a homogeneous solvent. Rather, there are well-defined hydration spots and also regions where water density is much lower than in bulk solvents [5]. This determines binding in ways that were not initially expected. Solvation also affects the bound state and binding pathways, thus the gold standard for computational methods is to recapitulate the binding process of a ligand to its target by means of molecular simulations that consider the solvent explicitly. As the timescale of the binding/unbinding events has an exponential relationship with molecular size [6], observing binding on a 'computational microscope' [7] is greatly facilitated when the ligand has only a few atoms, particularly if it can be simulated at high concentrations. In Section 2, we will discuss applications and practical aspects of this approach (termed MD simulations with mixed solvents, or MDmix for short). The use of simple ligands as probes to elucidate interaction preferences of protein binding sites has a long history in SBDD. Except for the crucial difference of including explicit solvation in all the computational procedures, MDmix-type simulations can trace their roots to Goodford's GRID [8], Karplus' MCSS [9] or the more recent FTmap [10]. All such methods assume that the behavior of the probe is transferable to bigger molecules. Their documented ability to locate binding hot spots confirm that this is at least partially true. But binding free energy is clearly a non-additive property, [11,12] thus it becomes necessary to consider the actual molecules of interest to obtain quantitative predictions. Once considered a dream, major advances in the field of molecular dynamics (See [13][14][15] and references therein) have finally made it possible to directly simulate the binding and unbinding process of actual ligands to their targets. In Sections 3 and 4, we will review applications with small ligands (fragments) and actual drugs, respectively. We will conclude by discussing the practical limitations and future perspectives for the application of these methods in drug discovery.

Solvent Structure as a Predictor of Protein-Ligand Interaction Sites
Among the most relevant processes underlying the formation of protein-ligand complexes is the associated solvent reorganization at the contact surfaces, particularly that of the protein receptor. Water molecules bound to the ligand binding regions must either be displaced to allow direct protein-ligand contact [16][17][18][19] or be retained, bridging specific protein-ligand interactions, as is sometimes observed [20][21][22][23][24][25]. The thermodynamics of this solvent reorganization process is a key contribution to the complex formation free energy and thus to the ligand binding affinity. Initial attention was paid to the role of tightly bound or ordered waters, as revealed by X-ray structures [26], which after displacement by the incoming ligand, were proposed to contribute favorably to ligand affinity. [21,22]. This observation can be further extended to other proteins even if waters are not resolved in diffraction experiments due to lack of resolution, by the use of molecular dynamics in the explicit solvent.
Explicit solvent molecular dynamics allows studying the structure and dynamics of water molecules, which as a consequence of the shape and charge distribution of protein surfaces, are distributed inhomogeneously in the solvation shell, giving rise to space regions where the probability of finding water molecules is significantly higher (or lower) than that of the bulk solvent, and where rotational and translational motions of each molecule vary significantly. Wiesner et al. for example [27], found that confined waters can have residence times in the range of 1 ns to 106 ns, while for the more mobile waters residence times were only 10-50 ps. Further thermodynamic characterization of these surface waters can be achieved by means of the inhomogeneous fluid solvation theory (IFST), developed by Lazaridis et. al. [5], through the identification of the so-called water sites (WS) [28].
Water sites (sometimes also called hydration sites) are defined as confined space regions close to the protein surface, and internal cavities or packing effects, showing a high probability of finding a water molecule inside them (water finding probability, WFP). They can be evidenced by the presence of crystallographic water molecules, or from MD simulations as defined by their position (whose coordinates correspond to the center of mass of all oxygen atoms, from those water molecules that visit the site during the simulation timescale), their WFP, and their size (characterized by the R90 values, which describes in Angstrom the radius of the WS that contains a water molecule 90% of the time). WS are usually identified by applying a clustering algorithm to a collection of snapshots derived from MD simulations, and despite some special cases, good convergence is achieved in 20-50 ns [29].
In addition to their application as detailed descriptors of the solvent structure, the relevance of WS determination stems from their capacity to reveal key hydrophilic protein-ligand interaction sites, such as those established by ligand hydroxyl, carbonyl and carboxylate groups, among others. This is nicely exemplified by hydrophilic ligands such as carbohydrates, where several groups reported that the solvent structure in the receptor carbohydrate recognition domain, as revealed by the WS, mimics the framework of the sugar -OH groups, as shown in Figure 1. Moreover, detailed analysis of WS properties showed that those WS that are replaced by the incoming ligand-OH group tend to be those with higher WFP and establishing more interactions with the protein.
Molecules 2018, 23, x FOR PEER REVIEW 3 of 13 [27], found that confined waters can have residence times in the range of 1 ns to 106 ns, while for the more mobile waters residence times were only 10-50 ps. Further thermodynamic characterization of these surface waters can be achieved by means of the inhomogeneous fluid solvation theory (IFST), developed by Lazaridis et. al. [5], through the identification of the so-called water sites (WS) [28].
Water sites (sometimes also called hydration sites) are defined as confined space regions close to the protein surface, and internal cavities or packing effects, showing a high probability of finding a water molecule inside them (water finding probability, WFP). They can be evidenced by the presence of crystallographic water molecules, or from MD simulations as defined by their position (whose coordinates correspond to the center of mass of all oxygen atoms, from those water molecules that visit the site during the simulation timescale), their WFP, and their size (characterized by the R90 values, which describes in Angstrom the radius of the WS that contains a water molecule 90% of the time). WS are usually identified by applying a clustering algorithm to a collection of snapshots derived from MD simulations, and despite some special cases, good convergence is achieved in 20-50 ns [29].
In addition to their application as detailed descriptors of the solvent structure, the relevance of WS determination stems from their capacity to reveal key hydrophilic protein-ligand interaction sites, such as those established by ligand hydroxyl, carbonyl and carboxylate groups, among others. This is nicely exemplified by hydrophilic ligands such as carbohydrates, where several groups reported that the solvent structure in the receptor carbohydrate recognition domain, as revealed by the WS, mimics the framework of the sugar -OH groups, as shown in Figure 1. Moreover, detailed analysis of WS properties showed that those WS that are replaced by the incoming ligand-OH group tend to be those with higher WFP and establishing more interactions with the protein. More recently, the role WS as predictors of protein-ligand interactions was extended beyond the sugars, again showing that WS, particularly those with high probe finding probability (PFP), tend to be replaced by ligand hydrophilic groups that establish key interactions with the protein receptor, as shown in Figure 1B, for AMPc beta-lactamase.
Having established the tight relationship between WS and protein-ligand interactions, the next logical move was to apply this knowledge in the context of protein-ligand complex structure prediction (i.e., docking methods) and determination of ligand binding free energies. However, before moving to this topic we will present the use of other solvents as tools for the prediction of protein-ligand interactions. More recently, the role WS as predictors of protein-ligand interactions was extended beyond the sugars, again showing that WS, particularly those with high probe finding probability (PFP), tend to be replaced by ligand hydrophilic groups that establish key interactions with the protein receptor, as shown in Figure 1B, for AMPc beta-lactamase.

Mixed Solvents Simulations in Drug Design
Having established the tight relationship between WS and protein-ligand interactions, the next logical move was to apply this knowledge in the context of protein-ligand complex structure prediction (i.e., docking methods) and determination of ligand binding free energies. However, before moving to this topic we will present the use of other solvents as tools for the prediction of protein-ligand interactions.

Mixed Solvents Simulations in Drug Design
While water is the universal biological solvent, organic solvents are ubiquitous in laboratories. Some exceptional proteins remain active in neat organic solvents and have been explored as catalysts in industrial applications [30]. More frequently, the buffers used in chemical and structural biology contain small concentrations of organic solvents. Most proteins preserve their structure and function in the presence of 1-5% of DMSO and other common organic molecules [31]. This fact led to the independent observation by NMR and X-ray crystallography that solvents bind preferentially to the active sites of proteins [32,33]. Systematic studies on proteins crystals showed an increasing number of solvent interaction sites as the solvent concentration was increased, and some degree of selectivity for various solvents [34,35]. The most frequently occupied regions coincided with key interaction sites for the substrates, which agreed with the recently postulated notion of 'hot spots', i.e., regions on the protein surface that provide most of the binding affinity. [36] Interestingly, the same authors also showed that the computational methods available at the time, GRID [8] and MCSS [9,37] did a mediocre job at predicting binding sites due to the use of implicit solvation and neglecting entropic contributions [34,35]. While the possibility of detecting binding sites by crystallography or NMR with mixed solvents was enticing, the method had limited practical impact because proteins and their crystals rarely withstand high solvent concentrations. Retrospectively, it may seem surprising that it took more than 20 years to perform analogous experiments using molecular dynamics, but it wasn't until the late 2000's that MD simulations could routinely explore sufficiently long timescales to ensure meaningful results. In 2009 the Barril's lab published the first MD application of mixed solvents. In this work, the probe solvent was isopropanol to capture in a single molecule, the hydrophobic and hydrogen bond donor, and acceptor moieties that are common in drug-like molecules. The aim was to detect binding sites and quantify their potential to bind drug-like molecules [38]. This property, often referred to as 'druggability' (but note the parallelism with the term 'ligandability' [39]), is crucial to predicting the probabilities of successful development of a drug candidate tackling a particular site [40]. The authors noted that "in addition to a prediction for the (druggability of the) whole site, one also obtains a map of the interaction preferences". Independently and almost simultaneously, the MacKerell's lab described another mixed solvent approach that focused precisely on this application [41]. In this case, the solvents used were propane as an aliphatic probe, benzene as aromatic probe and water itself was used as a polar probe. Probe interaction maps (called FragMaps) showed an excellent correlation with the binding modes of existing ligands. Since then, a large number of contributions have emerged. Besides druggability [42][43][44] and binding site mapping [45,46], mixed solvents have also been used to predict water displaceability [47,48], to probe protein flexibility and the detection of more druggable conformations [49], or cryptic pockets [50][51][52], or used to re-score docking poses [53,54]. As the diverse implementations and applications of mixed-solvent MD have been extensively reviewed by Ghanakota and Carlson [55], we will place emphasis on the issue of convergence, which is essential for correct predictions.
Convergence of a mixed solvents MD is determined by three interrelated aspects that merit individual discussion: Simulation time, solvent concentration, and protein flexibility.
(1) Simulation time should be sufficient to observe multiple binding and unbinding events. Naturally, the accuracy of the predictions increases and variability decreases as the number of observations (N) increases. Ns as low as 5 are sufficient for qualitative applications but must reach hundreds to be truly quantitative [6]. The other factor determining the total simulation time is the residence time of the solvent (t 1/2 = ln 2/k off ; k off ∝exp(−∆G /RT) [56]. For barrierless dissociation (∆G BIND = −∆G TS ) t 1/2 depends on the binding free energy, which can increase almost linearly with the number of atoms [57]. Thus, simulation times should increase exponentially with the size of the solvent. But the pathways leading to and from particular binding sites may be hindered, particularly for large ligands, resulting in ∆G BIND << −∆G TS and, in consequence, much larger t 1/2 . Conventional recipes suggest running several replicas of 10-40 ns each, for a total timeframe of 50-100 ns. This is sufficient to ensure qualitative convergence of the published solvents on the surface of the protein. But direct counting of the number of binding/unbinding events or other forms of measuring convergence should always be used (Figure 2). (2) Solvent concentration increases sampling effectiveness. Not only due to the increase in effective on-rate (i.e., the number of binding events), but also because multiple binding sites can be sampled simultaneously. The behavior of the organic solvent should remain ideal (i.e., as in infinite dilution) to avoid artifacts caused by solvent-solvent interactions in the unbound state (e.g., inhomogeneous dilution and phase separation). Particular solutions to this problem include the introduction of repulsive terms between solvent molecules [41], or the use of amphiphilic molecules that are highly soluble and do not self-aggregate [59]. Additionally, protein dynamics should not be excessively perturbed by the solvent [60]. Considering that most solvents are denaturants at high concentrations, concentrations should be kept relatively low (<5%), as the protein could be artificially constrained, or simulation times could be much shorter than the denaturation time.
(3) Protein flexibility also determines convergence. Ideally, proteins should be allowed complete conformational freedom, but sampling the configurational space of regular proteins requires excessively long timeframes. Not only that, but it also complicates interpretation of results, as many hotspots are conformation-specific and not representative of the whole ensemble [61]. Constraining the mobility of protein atoms, on the other hand, is a straightforward way of increasing convergence. But this can lead to the overestimation of some hot spots and missing others. As a compromise, for many applications, it is useful and correct to use weak restraints that prevent conformational drift but allow sampling of the local conformational space [61]. Contrarily, if the goal is to induce conformational changes in the protein, such as the opening of cryptic pockets, simulations should be extended to the µs scale [50][51][52].

Small Ligands and Fragment Screening
Midway between solvent-sized and drug-like molecules, we find the so-called fragments. Fragment-based drug discovery initial hits are small molecules (roughly 10 to 20 non-hydrogen atoms) that are then grown and optimized to become standard drugs (30-40 atoms) [62]. Considering the industrial interest and the small size of these molecules, the use of MD as a screening technique raises considerable interest. In this approach, each compound in the virtual screening collection would be considered a probe that would be subjected to long MD simulations in the presence of the target protein. Probes that bind would then be considered fragment hits.
At present, molecular docking is the tool of choice for virtual fragment screening. Pioneering work by the Shoichet's Lab in this area led to the conclusion that although virtual fragment screening is adequate, with hit rates of 14.5% [63] and correct pose prediction, it mostly finds low specificity molecules. The effectiveness of this method for screening and de novo design are well documented in the literature [64][65][66][67][68]. Docking is particularly well suited for fragment screening since the molecules used as fragments are small and not very flexible (less than three rotatable bonds). Nevertheless, if the binding site is not known, it can lead to many false positives. Consensus strategies, like the ones used in FTMap [10], have been used to identify new binding sites. However, (2) Solvent concentration increases sampling effectiveness. Not only due to the increase in effective on-rate (i.e., the number of binding events), but also because multiple binding sites can be sampled simultaneously. The behavior of the organic solvent should remain ideal (i.e., as in infinite dilution) to avoid artifacts caused by solvent-solvent interactions in the unbound state (e.g., inhomogeneous dilution and phase separation). Particular solutions to this problem include the introduction of repulsive terms between solvent molecules [41], or the use of amphiphilic molecules that are highly soluble and do not self-aggregate [59]. Additionally, protein dynamics should not be excessively perturbed by the solvent [60]. Considering that most solvents are denaturants at high concentrations, concentrations should be kept relatively low (<5%), as the protein could be artificially constrained, or simulation times could be much shorter than the denaturation time.
(3) Protein flexibility also determines convergence. Ideally, proteins should be allowed complete conformational freedom, but sampling the configurational space of regular proteins requires excessively long timeframes. Not only that, but it also complicates interpretation of results, as many hotspots are conformation-specific and not representative of the whole ensemble [61]. Constraining the mobility of protein atoms, on the other hand, is a straightforward way of increasing convergence. But this can lead to the overestimation of some hot spots and missing others. As a compromise, for many applications, it is useful and correct to use weak restraints that prevent conformational drift but allow sampling of the local conformational space [61]. Contrarily, if the goal is to induce conformational changes in the protein, such as the opening of cryptic pockets, simulations should be extended to the µs scale [50][51][52].

Small Ligands and Fragment Screening
Midway between solvent-sized and drug-like molecules, we find the so-called fragments. Fragment-based drug discovery initial hits are small molecules (roughly 10 to 20 non-hydrogen atoms) that are then grown and optimized to become standard drugs (30-40 atoms) [62]. Considering the industrial interest and the small size of these molecules, the use of MD as a screening technique raises considerable interest. In this approach, each compound in the virtual screening collection would be considered a probe that would be subjected to long MD simulations in the presence of the target protein. Probes that bind would then be considered fragment hits.
At present, molecular docking is the tool of choice for virtual fragment screening. Pioneering work by the Shoichet's Lab in this area led to the conclusion that although virtual fragment screening is adequate, with hit rates of 14.5% [63] and correct pose prediction, it mostly finds low specificity molecules. The effectiveness of this method for screening and de novo design are well documented in the literature [64][65][66][67][68]. Docking is particularly well suited for fragment screening since the molecules used as fragments are small and not very flexible (less than three rotatable bonds). Nevertheless, if the binding site is not known, it can lead to many false positives. Consensus strategies, like the ones used in FTMap [10], have been used to identify new binding sites. However, in shallow interfaces, as seen in many protein-protein interactions (PPI) sites, the lack of proper treatment for the receptor flexibility can be a drawback for these strategies [69].
MD is an essential tool to include receptor flexibility and therefore to compute the binding free energy [1,70]. Both Free Energy Perturbation (FEP) and kinetic parameter estimation methods have been used for fragment discovery, while FEP has been successful for rescoring [71,72] as well as predicting absolute free energies (but not routinely due to high computational cost) [73][74][75]. On the other hand, recent works have focused on the determination of the binding kinetics of small molecules and fragments from MD simulations [76][77][78][79]. Many methods rely on an intelligent design of the analysis strategy to predict the kinetic binding parameters k off , mainly using Markov state models [80]. Although most of the reports use molecular simulations to characterize the binding kinetics of known fragments/small molecules [81][82][83], there are some reports on fragment-based screening from "first principles" using molecular simulations [84]. The De Fabritiis' Lab [85] recently presented a proof of concept of fragment-based screening using MD. They screened a library of 129 fragments (6 to 16 heavy atoms) using short simulations (100 ns), applying a bias and analyzing the trajectories with Markov state models (MSMs). Although the authors found promising fragments binding (8 mM) to the receptor surface (CXCL12), the computational expense is still prohibitive (380,000 GPU hours).
Work at Shaw D.E. Research sets the bar for quantitative prediction for fragment-based drug discovery [6]. They explored the binding thermodynamics and kinetics of 7 molecules of 4 to 10 heavy atoms to FKBP protein. After hundreds of direct observations of binding and unbinding events, they computed the k on , k off and binding affinities. They showed a perfect agreement with FEP simulations, demonstrating that when convergence is ensured, direct simulation of the binding equilibrium by molecular dynamics, can be a quantitative tool. Unfortunately, the RMSE of the computed binding free energy with experimental values was 2.1 kcal/mol, which illustrates the challenges that still lie ahead.
There is significant scope for cross-fertilization between mixed solvent MD and fragment-based drug discovery that has not been extensively explored. For instance, fragments often bind to multiple binding sites on the protein surface [86] which could potentially be identified by cosolvent MD. Fragments can also induce opening of new cavities (cryptic pockets). Gervasio's research on an exciting tool to address this topic, which combines co-solvent MD and advanced sampling (SWISH), helped to discover cryptic pockets [50,51]. Specifically, simulations on NPC2, p38α, LfrR, and hPNMT were performed, and due to the combined nature of SWISH and CoSolvent, MD was able to find all the cryptic pockets. Once in the binding site, the information derived from the cosolvent MD simulations could potentially be used to predict binding modes and affinities, or to guide the fragment evolution process. Work in this area has been done by MacKerell's Group with the SILCS methodology. They used the information derived from cosolvent MD to derive so-called FragMaps [41]. These grids were used to rank different ligands and to determine the free energy of binding.

Molecular Dynamics Simulations of Drugs or Drug-Like Compounds
Molecular Dynamics simulations could be used to study the free energy of binding of a drug or a drug-like molecule (30-40 heavy atoms) to a protein. This would require the sampling of several binding and unbinding events and therefore unbiased MD runs of at least hundreds of microseconds. Direct observation of drugs binding to their target has been an outstanding achievement of MD applications. Unbiased simulations have revealed the binding pathways of dasatinib to Src kinase [87] and alprenolol binding to the β2 adrenergic G protein-coupled receptor [76]. However, due to the long timescale involved in the dissociation of a drug from its target, direct observation of several unbinding events is not possible. Massive short unbiased simulations in conjunction with Markok State analysis has been used to study benzamidine binding to trypsin [80]. On the other hand, biased simulations can be used to study the Potential of Mean Force of drug binding. Cavalli published the first study of its kind, showing that it was possible to discern active from inactive compounds of the beta-hydroxyacyl-ACP dehydratase of Plasmodium falciparum using steered MD [88]. Since then, a large variety of biasing potentials have been investigated and applied to the problem [89]. Even so, the problem remains computationally prohibitive. For instance, the study of a single inhibitor of p38 MAP kinase, that is a fragment of Doramapimod (BIRB 796) and dissociates 4 orders of magnitude faster than the parent compound, took 6.8 µs of production simulations and a total CPU time of 2.5 million core-hours [90]. In addition, identification of the reaction coordinate is often a trial and error process that takes considerable human time and is difficult to automatize [89]. Intriguingly the initial steps of the dissociation may already provide useful information [91], but full reconstruction of the process and quantitative binding affinity estimates remain a major challenge that is only applicable to particularly relevant protein-drug pairs.
For higher throughput applications, docking is widely used to predict protein-ligand interactions and has become extremely useful for virtual screening of huge collections of small molecules [92][93][94].
Most popular docking methods show that success rates are highly system-dependent, with an overall good performance for pose prediction with binding free energy errors of 2-3 kcal/mol for small drug-like molecules and in the absence of significant receptor conformational adjustment [95]. However, it is well known that better results can be obtained by adjusting the docking protocol using previous knowledge for a particular system, such as binding sites or crucial molecular interactions [96,97].
The term "biased docking" (or "guided docking") refers to the use of additional, experimental or in silico, information to influence the outcome of a docking experiment, e.g., the use of chemical information to favor a certain orientation and conformation of a ligand inside the binding site. The source of this information can be either the protein target structure or its known ligands. A protein-derived bias extracts the information directly from the protein surface and its available molecular interactions and generates a chemically complementary representation of the surface with more weight on particularly important residues, e.g., those confirmed to be essential for the activity by point mutations. As we discussed before, the use of probe atoms, functional groups, small molecules (e.g., mixed solvents) or molecular fragments is another approach to detect important interaction sites or hotspots without involving actual ligands. In this way, a protein-derived pharmacophore is obtained and defines energetically favorable binding site locations for docked compounds. A currently common technique for obtaining these hotspots is to run molecular dynamics simulations with small probes (see Mixed Solvents section). The hotspots can then be used to adjust the docking protocol, e.g., by adding a restriction towards the formation of a given protein-probe interaction. Recently, Arcon et al. showed that determination of water and/or ethanol sites derived from molecular dynamics simulations in mixed solvents allowed identification of over 79% of all protein-ligand interactions, especially those that were most important for the binding [54]. They also stated how this knowledge could be used to improve docking. On the other hand, a ligand-derived bias extracts the information from the known ligands for a particular protein target, for example, a particular substructure such as the core of a congeneric series. Several protein-ligand complex structures are available and the conserved interactions of the co-crystallized ligands (ligand-derived pharmacophore) can be inferred and used to improve docking accuracy [97,98].
The improved performance of knowledge-based biased docking is highlighted by the different options available in the most common docking programs. For example, Glide [99] and GOLD [100] allow hydrogen bonds and substructure-based constraints, while Glide also permits metal restraints to enforce coordination geometries. On the other hand, rDock [94] and MOE [101] are able to constrain generated poses to satisfy pharmacophores, and thus bias the results towards important interactions, and also perform knowledge-based template guided (or tethered) docking. DOCK6 [102] has a conformational search option to bias the sampling towards poses in accordance with a defined set of known ligand structures. AutoDock [92] and DOCK3 [103] were subjected, by us [29] and others [104,105], to implementations considering the energy accounted from water displacement through inhomogeneous solvation theory for guiding the docking. Lopez et al. have proposed a scheme to add a bias to AutoDock [29] that has been recently implemented for performing biased docking with AutoDock4 (AutoDock-Bias, in preparation Arcon et al., 2018). The versatile definition of the different types of biases in AutoDock-Bias accounts for all of the above cases. It allows guided docking towards pharmacophoric interactions in a straightforward way for hydrogen bond and hydrophobic/aromatic interactions. Furthermore, it allows researchers to get ideal interaction patterns for a specific protein structure, thus easily defining interactor locations. In addition, the capability of modifying any specific energy map and assigning any bias potential strength permits the precise localization of any desired atom (e.g., metal) or group (e.g., substructure core of a congeneric ligand series or for fragment growth) in a defined region space relative to the target protein. Finally, the specific energy map modification may also be used as an anchor for covalent docking studies. Since we addressed the problem of incorporating single target information, in the present discussion, we omitted potentials used for docking scoring functions [106][107][108] generally derived for diverse protein-ligand complexes.
In summary, mixed solvents simulations can lead to the identification of hot spots that can then be used in biased docking. The bias may affect the conformational search and/or scoring of the obtained poses.

Conclusions and Perspectives
Simulation of molecular dynamics in an explicit solvent are needed for accurate drug design. As the thermodynamics of the solvent reorganization upon drug binding is a key contribution to the complex formation free energy and thus to the ligand binding affinity. Therefore, accurate predictions have to consider the protein and the ligand embedded in the solvent, as part of a condensed state and have to account for a great number of configurational possibilities. On the other hand, explicit water MD allows studying the structure and dynamics of water molecules, and therefore the identification of water sites, that are relevant for their capacity to reveal key hydrophilic protein-ligand interaction sites. Water provides useful information for drug design, like guiding thermodynamic integration computations for compound optimization by allowing researchers to predict where it is favorable to grow the compound by displacing waters [109,110]. Another recent use of water molecules is to design specific inhibitors between a protein family, like the bromodomain proteins where structural water position determines drug selectivity [111]. The MD application of mixed solvents allows researchers to detect binding sites and quantify their potential to bind drug-like molecules. In turn, the identified hot spots can then be used as a bias in docking simulations to better identify drug candidates.
Mixed solvent MD with a cosolvent of no more than 10 heavy atoms is feasible and as we have described in this review, can clearly contribute to drug design, but has not yet been fully exploited. With the advent of new web services and user-friendly software, good algorithms to analyze the simulations and faster computers, we expect to see an increase in the application of these techniques to a plethora of protein targets. Docking simulations have not increased accuracy for drug-protein conformational predictions in the last decade, but most probably will get better in the near future, with the increased use of knowledge-based algorithms. MDMix will also help to obtain more accurate binding free energy estimations, but much effort in the community is needed in order to derive new algorithms that are not only able to estimate the free energy contribution of drug-protein interactions, but also the free energy of protein and drug desolvation.