Molecular Dynamics Simulations in Drug Discovery and Pharmaceutical Development

Molecular dynamics (MD) simulations have become increasingly useful in the modern drug development process. In this review, we give a broad overview of the current application possibilities of MD in drug discovery and pharmaceutical development. Starting from the target validation step of the drug development process, we give several examples of how MD studies can give important insights into the dynamics and function of identified drug targets such as sirtuins, RAS proteins, or intrinsically disordered proteins. The role of MD in antibody design is also reviewed. In the lead discovery and lead optimization phases, MD facilitates the evaluation of the binding energetics and kinetics of the ligand-receptor interactions, therefore guiding the choice of the best candidate molecules for further development. The importance of considering the biological lipid bilayer environment in the MD simulations of membrane proteins is also discussed, using G-protein coupled receptors and ion channels as well as the drug-metabolizing cytochrome P450 enzymes as relevant examples. Lastly, we discuss the emerging role of MD simulations in facilitating the pharmaceutical formulation development of drugs and candidate drugs. Specifically, we look at how MD can be used in studying the crystalline and amorphous solids, the stability of amorphous drug or drug-polymer formulations, and drug solubility. Moreover, since nanoparticle drug formulations are of great interest in the field of drug delivery research, different applications of nano-particle simulations are also briefly summarized using multiple recent studies as examples. In the future, the role of MD simulations in facilitating the drug development process is likely to grow substantially with the increasing computer power and advancements in the development of force fields and enhanced MD methodologies.


Introduction
The purpose of a computer simulation is to gain insight into the behavior of an actual physical system or process. To achieve that specific objective, a model system is developed that represents or emulates the given physical system. A suitable algorithm subsequently generates a time series or an ensemble of states ("observations") for the model system. Finally, an analysis is conducted by calculating various system properties from these states (certain properties may also be monitored during a simulation). In the present context, the word "simulation" usually refers to the process of generating states by numerically solving a set of differential equations for the selected degrees of freedom (state variables) of the given model system. Many computed system properties are measured by experiment as well so that an explanation of the observed experimental data is immediately available based on the model system and the simulation results. More importantly, one can also observe behavior that is inaccessible to experiment and test "what-if" scenarios (e.g., mutation studies). As a result, simulation techniques have become invaluable tools for modern research as they complement experimental approaches. With the continuing advance of computing power, such tools will only further increase in importance.

Classical Molecular Dynamics Simulations
Molecular dynamics (MD) is one such simulation technique [1,2]. It aims at deriving statements about the structural, dynamical, and thermodynamical properties of a molecular system. The latter is typically a biomolecule (solute) such as a protein, an enzyme, or a collection of lipids forming a membrane, immersed in an aqueous solvent (water or electrolyte). In the case of proteins and enzymes, the experimental protein structure as deposited in the Protein Data Bank (PDB) [3] serves as a starting point for MD simulations. If no structure is available, one must resort to modeling (predicting) the structure for which several techniques are available, such as homology or comparative modeling [4]. In atomistic "all-atom" MD (AAMD), the model system consists of a collection of interacting particles represented as atoms, describing both solute and solvent, placed inside a sufficiently large simulation box, where their movements are described by Newton's laws of motions. An algorithm such as velocity-Verlet or leap-frog [2] is employed to advance over the course of many time steps the state of the model system as a function of time (see Figure 1 for a schematic view of a basic MD algorithm). A single state consists of the combined values of the atoms' positions and velocities (or momenta). To advance the state, forces acting on particles are computed from a model or empirical potential energy ("force field"), a function of particle positions, which includes all types of "non-bonded" interactions, such as electrostatic and Lennard-Jones forces, but also various types of "bonded" potentials for preserving the structural integrity of the given biomolecular system. The latter include harmonic potentials for maintaining bonds, bond angles, and "improper" dihedrals as well as terms for dihedrals. Some force fields include the Morse potential for a more realistic representation of bonds, while others may account for explicit electronic polarization effects. An extensive discussion of force fields is given by Monticelli and Tiele-Processes 2021, 9,71 3 of 60 man [5]. Of note, various force fields have been developed for different types of molecules. In the context of our review, force fields for proteins [5], biological lipids [6], and small molecules [7] are of particular relevance (see Table 1 for examples of current commonly used force fields). Force fields are also employed to compute energies in molecular mechanics (MM) applications [8]. Such simulations are usually conducted under conditions of constant temperature and pressure to mimic laboratory conditions for which special algorithms are available. In addition, to emulate a very large molecular system, several techniques for artificially extending the size of the model system have been developed. The most common one is the periodic boundary condition (PBC). Here, an infinite number of replicas of the central simulation box surrounds the central box. Due to the long-range character of electrostatic interactions, special techniques such as Ewald-based methods are required to include the interactions between the particles in the central box and their replicas. In the course of a simulation, successive states at regular time intervals are stored in a trajectory for later analysis. Typically, in AAMD, the time step is 1-2 fs (1 fs = 10-15 s) and the system size is in the order of tens of thousands of atoms (including solvent). A larger number of software packages for MD of biomolecules are available, for example, GROMACS [9], AMBER [10], NAMD [11], and CHARMM [12].
Processes 2021, 9, x FOR PEER REVIEW 3 of 63 polarization effects. An extensive discussion of force fields is given by Monticelli and Tieleman [5]. Of note, various force fields have been developed for different types of molecules. In the context of our review, force fields for proteins [5], biological lipids [6], and small molecules [7] are of particular relevance (see Table 1 for examples of current commonly used force fields). Force fields are also employed to compute energies in molecular mechanics (MM) applications [8]. Such simulations are usually conducted under conditions of constant temperature and pressure to mimic laboratory conditions for which special algorithms are available. In addition, to emulate a very large molecular system, several techniques for artificially extending the size of the model system have been developed. The most common one is the periodic boundary condition (PBC). Here, an infinite number of replicas of the central simulation box surrounds the central box. Due to the long-range character of electrostatic interactions, special techniques such as Ewald-based methods are required to include the interactions between the particles in the central box and their replicas. In the course of a simulation, successive states at regular time intervals are stored in a trajectory for later analysis. Typically, in AAMD, the time step is 1-2 fs (1 fs = 10-15 s) and the system size is in the order of tens of thousands of atoms (including solvent). A larger number of software packages for MD of biomolecules are available, for example, GROMACS [9], AMBER [10], NAMD [11], and CHARMM [12]. Basic molecular dynamics simulation algorithm. Each particle moves according to Newton's second law or the equation of motion, F = ma (where F is the force exerted on the particle, m is its mass, and a is its acceleration under a potential field), such that the particles in the system are captured in the trajectory [1,13]. r-position; v-velocity; t-time. Basic molecular dynamics simulation algorithm. Each particle moves according to Newton's second law or the equation of motion, F = ma (where F is the force exerted on the particle, m is its mass, and a is its acceleration under a potential field), such that the particles in the system are captured in the trajectory [1,13]. r-position; v-velocity; t-time. Table 1. Examples of commonly used force fields in molecular dynamics simulations.
It is worthwhile to stress that MD is meant to explore the configuration space (the set of all values of positions and momenta). In the widely used ligand-protein docking one also needs to sample the configuration space using a similar force field as in MD in order to optimize the location and binding mode (a "pose") of the ligand on the surface of a given protein (receptor) [48]. However, the sampled configuration space is frequently restricted to a specific region of the protein, while also only specific portions of the protein and/or ligand are allowed to fluctuate (e.g., only the protein's side chain are displaceable, while the main chain is kept rigid). The outcome of such docking studies is a small set of possible complexes, typically ranked according to the force field employed. Dynamical information is not obtained from ligand-protein docking studies, but some measure for the affinity of the ligand for the protein may be obtained. In fact, the lack of a proper description of systems' true dynamics is one of the biggest caveats of docking [49]. Therefore, frequently after selecting one of the complexes (usually the highest-ranked complex), an MD simulation is conducted to explore the complex in much greater detail [50].

Faster and Longer without Atomistic Details: Coarse-Grained Simulations
While the advancements of the "traditional" AAMD have been very impressive, the limited length (nm) and time scales (ns to µs) attainable in AAMD still poses a severe limitation when studying molecular processes that occur on much longer time scales (e.g., protein (un)folding, protein-membrane association) and/or require much larger length scales (e.g., very large protein tertiary structure or protein-membrane systems). In standard AAMD, atoms serve as particles. In coarse-grained MD (CGMD), 4 to 6 atoms (or more) are grouped together to form "beads" that serve as the particles of the model system. A similar procedure may be applied to the solvent as well [51]. For instance, in the Martini force field [52], initially designed for studying biological membranes modeled as lipid bilayers at the coarse-grained level, four water molecules form a single water bead. As a Processes 2021, 9,71 5 of 60 result of such coarse-graining techniques, the number of particles is significantly reduced in comparison to AAMD of the same system, while the interaction potential energy surface is also much smoother so that a much longer time and larger length scales are attainable by CGMD, but at the expense of atomistic details. It otherwise relies on similar algorithms and associated techniques such as PBC in AAMD, with time steps ranging between 20 to 60 fs. The Martini force field, one of the first force fields for CGMD, has seen important applications and developments since its first installment and is now also employed for the analysis of lipid membrane properties, protein-lipid interactions, oligomerization of membrane proteins, the self-assembly of soluble peptides and proteins, the prediction of protein conformational changes, binding and pore-formation in membranes [53,54]. A recent and far more elaborate overview of applications according to the Martini model is also given by Marrink and Tieleman [55]. There exist also more extreme versions of coarse-graining. It is quite possible and even desirable in some applications to completely remove the atomistic details of the solvent and replace it with a structureless polarizable continuum. This approach is commonly employed for the prediction of acid dissociation constants of titrating sites in proteins as well as for the calculation of affinities of ligands for proteins [56].

Zooming into the Details of Chemical Reactions: Quantum Mechanics Simulations
While MD is extremely useful for the investigation of molecular properties at a longer time and larger length scales, it cannot be used to study the details of chemical reactions in the active site of enzymes, simply because current MD techniques cannot properly handle bond-breaking events. The MM force fields commonly in use for MD enforce links between specific particles to ensure that they remain together to prevent a complete destabilization of the molecular structure. Consequently, changes in the electronic structure are not included in MD simulations. That, however, can be accounted for with quantum mechanics (QM). A typical QM calculation solves the (time-independent) wave equation to estimate single point energies, bond distances and angles, partial charges, spectroscopic properties, various thermodynamic properties, interaction parameters (for instance for use in classical force fields employed in MD), and so forth, thus providing a very detailed description of the given chemical system. This can be completed for any given atom configuration. Because QM is computationally rather demanding, it is not possible and actually not even necessary to treat the full molecular system (i.e., protein and solvent) at the quantum level [57]. Commonly, the enzyme system is divided into a quantum motif (QM part) or region that includes the active site of a given enzyme plus a relevant portion of the substrate or ligand, while the rest of the system including the solvent is considered as part of a classical environment (MM part) adhering to a force field like those in AAMD. The QM part should include all relevant residues and chemical groups that are expected to play a role in the (proposed) reaction mechanism. Failing to do so may result in erroneous statements regarding the reaction mechanism and the binding mode of the substrate. It is also useful to repeat the calculations with different sizes of the QM motif, excluding a particular residue to discover the role of that residue in the mechanism. Numerous applications can be found in the literature on the use of QM/MM for studying the details of enzyme-catalyzed chemical reactions [58]. Relevant to such studies is the identification of the correct protonation states of titrating sites, especially of those residues or chemical groups directly involved in the reaction [59][60][61].
In this review article, we aim to give a broad overview of the current and emerging applications of MD in drug discovery and design-related processes as well as in pharmaceutical formulation/product development. The review introduces the basic concepts of MD, describes the principles of various commonly used methods and approaches in relevant contexts, as well as illustrates many practical aspects and limitations of MD with respect to the particular tasks at hand. However, detailed theoretical accounts of various methodologies are beyond the scope of the review. We hope that this piece of work could serve as an introductory synopsis to many scientists that have no (or little) experience in MD and wish to combine their experimental research with appropriate computations (either in collaboration with computational scientists or if attracted to the method, by themselves). In the following paragraphs, we will discuss how MD can help explore the dynamics and function of different drug targets or evaluate drug-target interactions. We will also look at the intricacies of simulating different biological systems, such as membrane proteins or intrinsically disordered proteins. Moreover, the growing role of MD in the field of computational pharmaceutics is analyzed in light of several examples. Many focused reviews have been recently published on the application of MD in drug discovery [44,45,[62][63][64][65][66]. Some recent reviews can also be found on drug formulation and drug delivery aspects, e.g., [67][68][69][70].

Exploring the Dynamics and Function of Drug Targets
Contemporary drug discovery processes start commonly with identifying and validating a biologically relevant target that can be modulated with drug molecules to prevent or cure a disease or alleviate symptoms of sickness ( Figure 2). Drug targets are often different proteins, such as receptors or enzymes, but may also be DNA or RNA molecules. Protein conformation is one of the biggest approximations in ligand design because proteins are dynamic and can undergo various conformational changes. Even minor conformational changes that involve movements of residue side chains can affect the complementarity between the ligand and the binding site of a protein. Since protein flexibility crucially affects the range of possible target conformational states for ligand binding, MD simulations can provide important information on the dynamic character of the target with regard to drug design. MD and wish to combine their experimental research with appropriate computations (either in collaboration with computational scientists or if attracted to the method, by themselves). In the following paragraphs, we will discuss how MD can help explore the dynamics and function of different drug targets or evaluate drug-target interactions. We will also look at the intricacies of simulating different biological systems, such as membrane proteins or intrinsically disordered proteins. Moreover, the growing role of MD in the field of computational pharmaceutics is analyzed in light of several examples. Many focused reviews have been recently published on the application of MD in drug discovery [44][45][62][63][64][65][66]. Some recent reviews can also be found on drug formulation and drug delivery aspects, e.g. [67][68][69][70].

Exploring the Dynamics and Function of Drug Targets
Contemporary drug discovery processes start commonly with identifying and validating a biologically relevant target that can be modulated with drug molecules to prevent or cure a disease or alleviate symptoms of sickness ( Figure 2). Drug targets are often different proteins, such as receptors or enzymes, but may also be DNA or RNA molecules. Protein conformation is one of the biggest approximations in ligand design because proteins are dynamic and can undergo various conformational changes. Even minor conformational changes that involve movements of residue side chains can affect the complementarity between the ligand and the binding site of a protein. Since protein flexibility crucially affects the range of possible target conformational states for ligand binding, MD simulations can provide important information on the dynamic character of the target with regard to drug design.   The following paragraphs present studies of selected target proteins or protein groups that highlight the flexible nature of proteins. MD simulations have been necessary for studying these proteins' dynamics and function and understanding how they bind with small molecules or other proteins. Specifically, sirtuins have several ligand-binding sites, and understanding their dynamics may give insights into the development of selective inhibitors or activators. Additionally, one of the 'Holy Grails' of cancer drug discovery is direct inhibition of oncogenic RAS proteins [71]. RAS proteins are small, dynamic, and difficult to drug. Therefore, a proper understanding of their dynamics may provide the necessary key for drugging them. The other two examples include a group of highly flexible, intrinsically disordered proteins, and a group of immunological proteins that are currently being actively investigated and developed as biological drugs, i.e., antibodies. Both protein groups play an important role in many therapeutically significant proteinprotein interactions. Especially MD simulations of proteins belonging to the former group need improved methodologies to obtain meaningful results.

Ligand-Binding Dynamics of Sirtuins-In Search of Therapeutic Regulators
Sirtuins are a family of proteins that function as nicotinamide adenine dinucleotide (NAD + )-dependent deacetylases. In mammals, there exist seven sirtuin proteins (SIRT1-7) that have histone and non-histone targets. Sirtuins have diverged tissue distribution and subcellular localization; nuclear (SIRT1/6/7), mitochondrial (SIRT3/4/5), and cytosolic (SIRT2) [72]. Sirtuins have an important role in gene silencing and expression, in cell metabolism, and tumorigenesis [73]. The function of sirtuins depends on the physiological conditions and tissues, and they can either stimulate or prevent signaling pathways [74].
The structure of sirtuins contains a conserved catalytic core of about 275 amino acids. The catalytic core is formed by a typical Rossmann fold domain that binds NAD + and a zinc-binding domain ( Figure 3). There are four flexible loops connecting these two domains. These loops form a tunnel in which the substrate interacts with NAD + in a deacetylation reaction. The binding site of NAD + is divided into three pockets A-C: The ADP ribose moiety binds in the A and B pocket (residues Asn286, Glu288, Arg97, Ala85, Gln167, His187 in SIRT2) and nicotinamide moiety in the C pocket (residues Ala33, Gly34, Thr37, Phe44, and Asp43 in SIRT2). Inhibitors often bind into the C-pocket. In addition, sirtuins have N-and C-terminal extensions of variable length that can affect the enzymes' function as these terminals are targets to posttranslational modifications. Over two decades, several compounds have been developed to bind into the pockets A-C and to regulate the function of sirtuins (reviewed in e.g., [75]). However, there is still no approved sirtuin regulator/inhibitor in the clinic. One of the challenges has been the flexible nature of the substrate and cofactor-binding sites.
For many years, there was no crystal structure of human sirtuins with any cocrystallized regulator available. The binding site of putative regulators in sirtuins is mostly formed by flexible loops and, thus, the conformational state of these enzymes crucially affects the prediction of binding modes of small-molecule ligands. Therefore, MD simulations have been applied to generate different sirtuin conformers to explore the possible ligand-binding conformations. MD was first used for SIRT2 in the year 2008 to generate multiple conformations for an ensemble docking approach [76]. In another study, a set of short (from 400 ps to 10 ns) MD runs were performed for SIRT2 with GROMACS 3.3.1 employing the GROMOS96 force field [77]. The compounds under investigation were then docked using Gold 3.2 [78] to each SIRT2 snapshot structure in the ensemble. Several binding modes were observed for the SIRT2 inhibitors with an N-(3-phenylpropenoyl)glycine tryptamide backbone. Interestingly, the MD simulations suggested an extended binding area of C pocket for the SIRT2 inhibitors, which later was experimentally observed for a highly selective SIRT2 inhibitor, SirReal2 [79]. The structural domains forming the catalytic core of all sirtuins. The large, nucleotide-binding Rossmann fold domain comprises a six-stranded beta-sheet sandwiched between several alpha-helices, whereas the smaller zinc-binding domain is formed by four cysteine residues coordinating the Zn 2+ cation (PDB IDs 3GLT and 6QCD). The binding pockets of the substrate (green), co-factor (pockets A and B, both in brown), and inhibitor (pocket C, blue) in SIRT3 are surrounded by flexible loops. For many years, there was no crystal structure of human sirtuins with any co-crystallized regulator available. The binding site of putative regulators in sirtuins is mostly formed by flexible loops and, thus, the conformational state of these enzymes crucially affects the prediction of binding modes of small-molecule ligands. Therefore, MD simulations have been applied to generate different sirtuin conformers to explore the possible ligand-binding conformations. MD was first used for SIRT2 in the year 2008 to generate multiple conformations for an ensemble docking approach [76]. In another study, a set of short (from 400 ps to 10 ns) MD runs were performed for SIRT2 with GROMACS 3.3.1 employing the GROMOS96 force field [77]. The compounds under investigation were then docked using Gold 3.2 [78] to each SIRT2 snapshot structure in the ensemble. Several binding modes were observed for the SIRT2 inhibitors with an N-(3-phenylpropenoyl)glycine tryptamide backbone. Interestingly, the MD simulations suggested an extended binding area of C pocket for the SIRT2 inhibitors, which later was experimentally observed for a highly selective SIRT2 inhibitor, SirReal2 [79].
Thereafter, several MD studies of SIRT2 have been carried out by Sakkiah and coworkers [80][81][82][83] using GROMACS with either GROMOS96 or AMBER03 force field [84]. In one study, they performed 5-ns MD simulations for the SIRT2-sirtinol complex and the apo structure of SIRT2 to identify alternative binding pocket conformations for the enzyme [80]. The SIRT2 conformations from the MD runs were clustered and three representative structures were selected for receptor-based pharmacophore modeling in order to screen for novel SIRT2 inhibitors. Their other study on the conformational changes of the SIRT2 substrate and inhibitor binding sites revealed the dynamic role of Phe96 in bind- Thereafter, several MD studies of SIRT2 have been carried out by Sakkiah and coworkers [80][81][82][83] using GROMACS with either GROMOS96 or AMBER03 force field [84]. In one study, they performed 5-ns MD simulations for the SIRT2-sirtinol complex and the apo structure of SIRT2 to identify alternative binding pocket conformations for the enzyme [80]. The SIRT2 conformations from the MD runs were clustered and three representative structures were selected for receptor-based pharmacophore modeling in order to screen for novel SIRT2 inhibitors. Their other study on the conformational changes of the SIRT2 substrate and inhibitor binding sites revealed the dynamic role of Phe96 in binding the inhibitor in SIRT2 [81]. They further investigated the role of the inhibitor-binding residues by simulating the docked complexes of five known inhibitors with SIRT2 for 20 ns [82]. The stability of the complexes was evaluated by monitoring the root-mean-square deviation (RMSD) of the protein Cα atoms and the root-mean-square fluctuation (RMSF) of the protein residues as well as the inhibitor-protein hydrogen bonding interactions along the simulation trajectory. All compounds had established favorable binding interactions and stable binding poses after about 10-ns simulation. The detailed analysis of the interactions revealed that hydrogen bonds formed between the inhibitors and especially Arg97 and Gln167 are important for the inhibition of SIRT2 activity. Further MD simulations by the same research group suggested crucial conformational differences between the wild-type SIRT2 and two mutants, thus providing insight into the design of more potent SIRT2 inhibitors [83].
In an attempt to understand more about the role of C-and N-terminals of sirtuins, MD simulations were carried out to explore the terminal regions' effect on the enzymatic activity of SIRT2 [85]. The results of the simulations performed with GROMACS 4.5.5 and the all-atom AMBER ff99SB-ILDN force field [86] suggest that the C-terminal region of SIRT2 can partially occlude the NAD + binding pocket or stabilize the NAD + in a non-productive state, thus functioning as an autoinhibitory region.
MD studies have also been applied to study the dynamics of other sirtuins than SIRT2. For example, SIRT1 simulations have been utilized to screen for novel inhibitors and to study the interactions between ligands and the protein [87,88]. In another study, the binding modes of 2-anilinobenzamide derivatives in SIRT1/2/3 were investigated by 30-ns MD simulations with the AMBER 11.0 simulation package using the ff03.r1 [84] and ff99SB [14] force fields to explain the isoform selectivity of the compounds [89]. According to the RMSD of the ligand-bound structures, the dynamic stability was achieved at 20 ns. Furthermore, Sinha and coworkers [90] studied the stability of sirtuin-inhibitor complexes with the high-speed simulation software Desmond [91] employing the OPLS2005 [17] force field. After carrying out a virtual screening of plant-derived compounds against SIRT1-7 by docking and scoring, they identified sulforaphane, kaempferol, and apigenin as hits. Multiple 20-ns MD simulations were performed for the top-scored poses of these compounds in the respective sirtuin structures. The results suggested that sulforaphane bound stably in SIRT1 and SIRT5, kaempferol in SIRT3 and apigenin in SIRT6, and the critical ligand-protein interactions remained throughout the simulations. In another recent study [92], natural compounds were identified as potential SIRT1 activators. Stable binding of mulberrin, quinine, quinidine, gartanin, and nicotinamide in an allosteric SIRT1 site (i.e., other than the catalytic site) was demonstrated in 50-ns MD simulations with AMBER. These ligands did not disrupt the functionally important hydrogen bonds between Arg234 (allosteric site) and Asp475, His473, or Val459 (catalytic site) of SIRT1, which was concluded to be consistent with their experimentally shown ability to activate the enzyme.
There is only little experimental information on the structure of SIRT7. A recently published study reported a homology model of SIRT7 [93]. Three 1-µs long MD simulations with GROMACS (AMBER03 force field) were used to investigate the stability and dynamics of the model structure. The simulations showed that the model of SIRT7 in complex with NAD + and acetyl-lysine was stable, suggesting that the model was built properly. In addition, the simulations suggest that the N-terminus of SIRT7 may play an important role in assisting to hold the substrate in the active site during the catalytic reaction.
Additionally, the co-factor binding to sirtuins has been studied with MD simulations. Desmond together with the OPLS3 force field [25] was employed to simulate SIRT1 and SIRT3 in complex with either NAD + or NADH and an acetyl-lysine substrate for 250-500 ns [94]. The simulations suggested an opening of the enzyme structure and an alternative binding site for the dihydropyridyl moiety of NADH instead of the C pocket. Furthermore, two persistent hydrogen bonds were observed between NAD + and Ile347/230 and Asp-348/231 in SIRT1/3 whereas the corresponding bonds were missing if NADH was bound.
Our case of sirtuins exemplifies that protein flexibility is important to take into consideration in designing ligands that target dynamic binding sites such as that in sirtuins. Although current flexible docking approaches can take into consideration some features of the target protein flexibility, MD is clearly a useful method for studying not only protein flexibility affecting ligand-binding but also larger structural arrangements in dynamic drug targets. Our next case further highlights how understanding protein dynamics can be a key to facilitate the design of potent therapeutic compounds against highly dynamic drug targets such as RAS proteins.

RAS-Uncovering the Conformational Dynamics of a Challenging Anticancer Drug Target
More than 150 members belong to the Ras superfamily of small guanosine triphosphatases (GTPases) [95]. The subgroup of the Ras family comprises 36 proteins, including the oncogenes HRAS, KRAS, and NRAS. As KRAS may undergo alternative splicing, these three RAS genes encode four proteins: HRAS, KRAS4A, KRAS4B, and NRAS (referred to as RAS from now on). RAS proteins share high sequence similarity and mainly differ in their membrane anchoring hypervariable region (HVR) [96]. GTPases, including RAS proteins, have a built-in switch mechanism that allows the protein to adopt an active conformation, capable of binding effector proteins when GTP is bound [97]. Thus, RAS activity is controlled by the bound nucleotide, GDP, or GTP. The nucleotide exchange from GDP to GTP, i.e., RAS activation is facilitated by guanine exchange factors (GEFs) [98,99]. Inactivation of RAS occurs upon GTP hydrolysis to GDP. RAS bears low intrinsic hydrolytic activity, which is greatly enhanced by GTPase activating proteins (GAPs) [98,99]. For a more comprehensive picture of RAS biology, the reader is referred to review [100].
Hyperactivation in RAS signaling is related to diseases, especially to cancer [100]. Altogether, RAS mutations appear in around 17% of all human cancers, whereas KRAS mutations contribute to the majority (69%) [101]. KRAS mutations are frequently observed in solid tumors of the pancreas, colon, and lung; NRAS in melanomas; HRAS in head and neck as well as in bladder carcinomas. RAS isoforms display different mutation preferences; for instance, KRAS mutations appear most often in Gly12 (81%) whereas NRAS in Gln61 (62%). Not only are RAS isoforms different [102,103], but also oncogenic mutants of the same isoform even at the same position are dissimilar (e.g., KRAS G12D vs. KRAS G12R) [104][105][106][107][108]. Due to their major role in cancer, RAS proteins are highly attractable drug targets and, thus, the focus of major drug discovery efforts [71].
RAS related research has been intense already for several decades since the discovery of these oncogenes in the early 1980s [109]. However, these small GTPases are difficult-todrug and previously, were considered as undruggable targets [71]. The reason for this is that, first, there is no obvious druggable small-molecule binding pocket in RAS. A pocket beneath the switch-II was revealed only after G12C-targeting covalent compounds were disclosed [110]. Of note, the existence of this pocket (as druggable) was not evident from the available crystal structures at the time. Since then, potential pockets on the effector protein binding interface have been described [111][112][113][114]. Second, RAS proteins are active at the membrane, which plays a significant role in their function [115]. Third, a major obstacle in drugging RAS is its highly dynamic character ( Figure 4). These difficulties related to targeting RAS, together with the discrepancy among the oncogenic mutants, clearly highlight the need for a better understanding of these oncoproteins. To this end, mastering RAS conformational dynamics comes into play.  [116]. (B) Fully open switch-I conformation that exemplifies the high intrinsic capability of the switch mobility [117]. (C) RAS appears in a closed conformation (also known as state 2) in complex with effector proteins; here demonstrated by a complex of HRAS with RBD (RAS binding domain) of Raf kinase [118]. (D) A closed conformation of RAS is also observed in a GAP-RAS complex [119]. (E) GDP is displaced from RAS upon binding of GEF that binds to an open switch-I RAS conformation and introduces an α-helix into the nucleotide-binding site [120]. (F) Superimposed RAS structures from (A-E) highlight the switch dynamics. Oncogenic mutation hotspot locations (Cα-atoms) are indicated with small spheres. (G) NMR data-driven model of RAS on a lipid nanodisc [121]. In all (A-G): Switch-I (residues 30-40) region is highlighted with red and switch-II (58-72) with blue; Mg 2+ -ion is displayed as a magenta sphere; the bound nucleotide is shown in sticks.
In order to adopt all potential binding conformations with various effector proteins and with its regulators, RAS needs to be highly dynamic, especially on the switch regions ( Figure 4). Even though recent crystal structures have demonstrated multiple conformational states of RAS, in all crystal structures of KRAS, the switch regions, when not disordered, are stabilized by crystal contacts [122]. Therefore, the structural data is unable to fully describe the dynamics of the switches. To capture the dynamics of these cryptic switches, which is beyond the currently available experimental methods, MD simulations can be utilized [123].
Recently, we conducted a comprehensive analysis of KRAS G domain (excluding the HVR) dynamics by classical all-atom MD simulations, including wild-type KRAS and all G12 missense mutants (G12A, G12C, G12D, G12R, G12S, G12V) bound to both nucleotides GDP and GTP [124]. Perhaps unsurprisingly, with an aggregate of 170 μs simulation data, we observed highly dynamic behavior in the switch regions. Interestingly, principal component analysis (PCA) not only highlighted differences among GDP and GTP bound systems but also revealed unique profiles for each mutant. Finally, by utilizing Markov state modeling (MSM), as reviewed in [125], we were able to capture discrepancy among GTPbound G12D, G12R, G12V, and wild-type KRAS in their long-timescale dynamics. Each system populated MSM derived metastable states differently. We anticipated that the shift in protein dynamics, occurring especially in switch regions, may lead to modulated KRAS mediated signal transduction among the mutants, as was demonstrated before between wild-type HRAS and G12V [126]. Indeed, it was later confirmed that G12R mutant, for  [116]. (B) Fully open switch-I conformation that exemplifies the high intrinsic capability of the switch mobility [117]. (C) RAS appears in a closed conformation (also known as state 2) in complex with effector proteins; here demonstrated by a complex of HRAS with RBD (RAS binding domain) of Raf kinase [118]. (D) A closed conformation of RAS is also observed in a GAP-RAS complex [119]. (E) GDP is displaced from RAS upon binding of GEF that binds to an open switch-I RAS conformation and introduces an α-helix into the nucleotide-binding site [120]. (F) Superimposed RAS structures from (A-E) highlight the switch dynamics. Oncogenic mutation hotspot locations (Cα-atoms) are indicated with small spheres. (G) NMR data-driven model of RAS on a lipid nanodisc [121]. In all (A-G): Switch-I (residues 30-40) region is highlighted with red and switch-II (58-72) with blue; Mg 2+ -ion is displayed as a magenta sphere; the bound nucleotide is shown in sticks.
In order to adopt all potential binding conformations with various effector proteins and with its regulators, RAS needs to be highly dynamic, especially on the switch regions ( Figure 4). Even though recent crystal structures have demonstrated multiple conformational states of RAS, in all crystal structures of KRAS, the switch regions, when not disordered, are stabilized by crystal contacts [122]. Therefore, the structural data is unable to fully describe the dynamics of the switches. To capture the dynamics of these cryptic switches, which is beyond the currently available experimental methods, MD simulations can be utilized [123].
Recently, we conducted a comprehensive analysis of KRAS G domain (excluding the HVR) dynamics by classical all-atom MD simulations, including wild-type KRAS and all G12 missense mutants (G12A, G12C, G12D, G12R, G12S, G12V) bound to both nucleotides GDP and GTP [124]. Perhaps unsurprisingly, with an aggregate of 170 µs simulation data, we observed highly dynamic behavior in the switch regions. Interestingly, principal component analysis (PCA) not only highlighted differences among GDP and GTP bound systems but also revealed unique profiles for each mutant. Finally, by utilizing Markov state modeling (MSM), as reviewed in [125], we were able to capture discrepancy among GTPbound G12D, G12R, G12V, and wild-type KRAS in their long-timescale dynamics. Each system populated MSM derived metastable states differently. We anticipated that the shift in protein dynamics, occurring especially in switch regions, may lead to modulated KRAS mediated signal transduction among the mutants, as was demonstrated before between wild-type HRAS and G12V [126]. Indeed, it was later confirmed that G12R mutant, for which MSM revealed a unique profile, exhibits different effector protein binding compared to G12D; it is defective for interaction with PI3Kα [108]. Our results demonstrated that a G12 mutation located in the P-loop ( Figure 4F) is capable of shifting KRAS' dynamics in distant regions of the protein that are responsible for effector protein binding [124]. How is this possible as no evident direct interactions from this position to switches were observed in the simulations? A putative explanation is that the shift in the dynamics will occur via a hydrophobic interaction network. Through this network, mutant KRAS may alter the protein dynamics in an allosteric manner, inflicting changes in distant sites of the protein. In this interaction network, hydrophobic hubs (displaying more than three hydrophobic interactions) include the residues V14 and A146, which when mutated are associated with altered RAS dynamics [106,127]. In another study, MD simulations revealed highly dynamic characteristics for the switches even when there exists a covalent inhibitor beneath the switch-II [128]. Importantly, the inhibitor AMG 510 appeared extremely stable regardless of the switch movements. These open conformations and high flexibility of the switches that were revealed by the simulations are not observed in the crystal structure (most likely due to the crystal packing effects) [129].
On top of the convoluted switch dynamics, another layer of complexity in KRAS dynamics exists: RAS rotational dynamics on the membrane ( Figure 4G). On artificial lipid nanodiscs, monomeric KRAS was shown to exist in different configurations, named occluded and exposed, where the effector protein interaction-surface-forming switch regions are occluded or exposed to RAS effectors [121]. The rotational dynamics of KRAS were captured in single 20 µs MD simulations of KRAS mutants G12D, G12V, and Q61H, where three distinct major orientations of RAS were observed [130,131]. Recently, impressive 1.45 ms MD simulations (290 individual simulations of 5 µs) were conducted to further study dynamic orientations of wild-type KRAS4B at the membrane with different lipid compositions [39]. Still an unclear issue, however, is the relevant oligomerization state of KRAS at the membrane. It may exist in monomeric, dimeric, trimeric, or a higher oligomeric state [132][133][134][135][136]. Overall, small GTPase proteins' rotational and translational dynamics on the membrane, which has a substantial influence on RAS oligomerization, is not well understood [137]. Importantly, this should not be neglected as RAS dimerization may have an important role in oncogenicity and drug therapy efficacy [138]. Approaches to obtain a better understanding of RAS signaling on the membrane are ongoing, and for instance, a structural model of RAS-RAF signalosome was recently reported by Shaw et al. [139]. For further information on RAS-related MD simulations and dynamics, the reader is referred to reviews [122,140,141].
Taken together, MD simulations have uncovered RAS conformational dynamics at the atomic level and provided further insights into RAS biology. Conformational states derived from MD simulations, which are not captured by experimental methods, could be utilized in RAS targeted drug discovery. Finally, understanding RAS dynamics may be a guide in deciphering mutant RAS functionality and point out potential vulnerabilities of these oncoproteins at the atomic level, which could be targeted directly or indirectly.
In the next section, we will be introducing drug targets of the future, a group of extremely flexible proteins whose dynamics may be investigated with MD simulations to complement the data derived with various experimental approaches.

Extremely Challenging Targets-Intrinsically Disordered Proteins
Intrinsically disordered proteins (IDPs) are the type of proteins that in physiological conditions lack a stable tertiary structural arrangement and instead exist as conformational ensembles. Apart from fully disordered proteins, many folded proteins contain intrinsically disordered regions (IDPRs). IDPs are associated with multifunctional roles as they can undergo conformational arrangements to different folds, depending on the particular binding partner [142]. The formation of 'fuzzy' complexes, where the disorder is maintained upon binding to the partner, has also been shown to be important in many biological processes [143]. The structural characteristics and the individual state of conformation in such proteins are dependent upon the type of amino acids forming the disordered segments [142,144,145]. There are many web-servers and computational tools available for accurate prediction of IDPs and their molecular functions (comprehensively reviewed in [146]). In addition, there are databases that store experimentally determined IDPs and IDPRs, e.g., DisProt [147].
The abundance of IDPs in the human genome, their importance in biological functions such as cellular signaling, and their involvement in various pathological conditions make them an attractive target for drug design [148]. Although the lack of a defined tertiary structure poses a challenge for rational structure-based drug design strategies, there is evidence that small molecules and peptides can be designed to target the protein-protein interactions of IDPs with their ordered binding partners (reviewed in [148,149]). Experimental techniques such as NMR spectroscopy or small-angle X-ray scattering (SAXS) can be used to characterize the conformational ensembles of IDPs although the quantitative data on the dynamic processes can be difficult to interpret regarding the unambiguously identified motions [150]. On the other hand, MD simulations suit perfectly for describing atomic motions and, thus, MD-based approaches can be valuable in studying the conformational dynamics of IDPs and their binding mechanisms to the partner proteins. However, there are some challenges that need to be addressed [149].
First, the extremely large number of degrees of freedom present in IDPs poses a challenge for proper sampling of the conformational space of the protein. Therefore, MD algorithms with enhanced sampling techniques such as replica exchange molecular dynamics (REMD) [151] (see Figure 5) are often used to efficiently overcome energy barriers in the IDP conformational landscape.
Processes 2021, 9, x FOR PEER REVIEW 13 of 63 biological processes [143]. The structural characteristics and the individual state of conformation in such proteins are dependent upon the type of amino acids forming the disordered segments [142,144,145]. There are many web-servers and computational tools available for accurate prediction of IDPs and their molecular functions (comprehensively reviewed in [146]). In addition, there are databases that store experimentally determined IDPs and IDPRs, e.g., DisProt [147]. The abundance of IDPs in the human genome, their importance in biological functions such as cellular signaling, and their involvement in various pathological conditions make them an attractive target for drug design [148]. Although the lack of a defined tertiary structure poses a challenge for rational structure-based drug design strategies, there is evidence that small molecules and peptides can be designed to target the protein-protein interactions of IDPs with their ordered binding partners (reviewed in [148,149]). Experimental techniques such as NMR spectroscopy or small-angle X-ray scattering (SAXS) can be used to characterize the conformational ensembles of IDPs although the quantitative data on the dynamic processes can be difficult to interpret regarding the unambiguously identified motions [150]. On the other hand, MD simulations suit perfectly for describing atomic motions and, thus, MD-based approaches can be valuable in studying the conformational dynamics of IDPs and their binding mechanisms to the partner proteins. However, there are some challenges that need to be addressed [149].
First, the extremely large number of degrees of freedom present in IDPs poses a challenge for proper sampling of the conformational space of the protein. Therefore, MD algorithms with enhanced sampling techniques such as replica exchange molecular dynamics (REMD) [151] (see Figure 5) are often used to efficiently overcome energy barriers in the IDP conformational landscape.  [152,153]. Several replicas of the same system are simulated in parallel at different temperatures. At specified intervals, replicas with neighboring temperatures are exchanged with the Boltzmann weighted Metropolis criterion. After a successful temperature exchange, velocities are rescaled to those expected in the new temperature condition. Successful swapping between replicas happens when there is some overlap between the potential energy of neighboring replicas over the course of a simulation.  [152,153]. Several replicas of the same system are simulated in parallel at different temperatures. At specified intervals, replicas with neighboring temperatures are exchanged with the Boltzmann weighted Metropolis criterion. After a successful temperature exchange, velocities are rescaled to those expected in the new temperature condition. Successful swapping between replicas happens when there is some overlap between the potential energy of neighboring replicas over the course of a simulation. Second, when simulating IDPs, it is crucial is to pay attention to the selection of the applied force field since most of the available protein force fields have been developed for proteins that have stable folded structures. Thus, general protein force fields tend to overpopulate certain secondary structures or compact, collapsed structures over the extended ones [154]. Experimental structural data have been used to guide the development of novel or updated force fields for both all-atom and coarse-grained simulations, such as FF14IDPSFF [155], FF14IDP [156], FF99IDP [157], A99SB-ILDN [158], CHARMM36 IDPSFF [159], and AWSEM-IDP [160]. Third, the applied solvent model needs to be carefully selected as in IDPs, the solvation effect plays a greater role. Novel solvent models such as TIP4P-D [161] facilitate the protein-water interactions and disfavor collapsed protein structures. However, depending on the studied IDP, such models may underestimate the transient tendency of an IDP/IDPR for folded structures, in which case a traditional water model could give better results [162]. Table 2 lists some examples of MD simulation studies of biologically relevant IDPs employing different force fields and solvent models. From the studies listed in Table 2, it is worth discussing in more detail how Jin et al. [163] mapped the conformational ensemble of the unbound disordered dimerization domain of the transcription factor c-Myc using REMD. Subsequent (conventional) MD simulations of the inhibitor-bound c-Myc domain gave the authors valuable insight into the constantly changing binding interactions and binding sites of the inhibitor at c-Myc (the conformational ensembles looked as if there were 'ligand clouds' around 'protein clouds'). The obtained information was later utilized when Yu et al. [172] selected representative conformations of these different c-Myc binding sites to carry out a successful structurebased virtual screening study, which produced four active compounds that block c-Myc function in the cell.
The above example supports the suggested in silico protocol of Bhattacharya and Lin [149] for designing small molecule drugs for IDPs:

1.
Generate the conformational ensemble for the unbound target protein using enhancedsampling MD.

2.
Identify highly populated conformations in the ensemble and extract the representative ones by clustering.

3.
Detect possible ligand-binding pockets in all the representative conformations of the IDP.

4.
Select the most druggable pockets using criteria such as hydrophobicity, plasticity, and allosteric coupling to functional sites.

5.
Carry out conventional virtual screening in each selected pocket and protein conformation exhibiting that pocket. 6.
Select the virtual hits that show good predicted binding affinity in many IDP conformations.
This general protocol could also work well with other types of targets with flexible binding sites.
Of note, MD generated IDP ensembles are frequently compared with experimental data and vice versa [173,174]. Naturally, since MD and the experimental techniques have their own limitations but produce complementary information, they can be used synergistically to characterize the conformational ensembles of IDPs [150,[175][176][177][178]. For example, Shrestha et al. [170] recently reported how their enhanced simulation protocol was able to produce a structural ensemble of the disordered N terminal of c-Src kinase that agreed with the NMR and SAXS data, without reweighting or biasing the simulations.
In the next section, we will briefly look at a fast-growing branch of drug discovery that can also benefit from MD simulations: the design of monoclonal antibodies and antibody fragments.

Molecular Dynamics Simulations in Antibody Design
Antibodies have become an attractive class of biotherapeutics because of their low toxicity and innate ability to recognize and selectively bind to a wide variety of targets [179]. Experimental methods for the discovery of high-affinity antibodies, such as phage display, have proven to be extremely effective in antibody development, but also time-consuming and laborious [180]. Therefore, computational methods, such as MD simulations, have emerged to complement and expand the knowledge obtained from experimental antibody research.
In contrast to many other computational methods that ignore the structural plasticity of molecules, MD simulations can describe the dynamic details of molecular interactions. In antibody design, this can be utilized to identify the so-called "hot-spot" regions at the antibody-antigen interface [181]. As an example of this, Sinha et al. [181] in their work based on anti-lysozyme antibody structures could demonstrate that MD was able to reveal salt-bridges and other functionally important electrostatic forces even though these interactions were left undetected in the static crystal structure of the studied antigenantibody complex. Naturally, the discovery of functionally important residues is the key to designing mutations in antibodies with improved affinities and selectivity to their targets.
Because MD simulations can reproduce molecular fluctuations at atomic resolution, they are also being employed to improve the scoring of antibody-antigen complex prediction methods. In this regard, MD simulations were used to refine the interactions in docked antigen-antibody complexes to obtain average values for the Generalized Born/Surface Area (GBSA) score that was used for re-ranking the docking results. The averaged GBSA scores derived from the simulations further improved the accuracy of predicting the antigen-antibody complexes [182]. Similarly, conformational ensembles generated with MD simulations can be used in rigid docking calculations, also called ensemble docking [183,184]. Even minor changes in protein conformation can affect docking results. Therefore, the advantage of this approach is that it incorporates backbone flexibility in the docking procedure and can be considered to mimick the conformational selection in antigen-antibody binding [183].
MD simulations can also provide a solution for designing antibodies with enhanced stability [184]. In a recent study, Bekker et al. [185] applied MD simulations to predict the thermal stability of single-domain antibodies and comparing them with experimental data. This allowed them to pinpoint the key residues contributing to the instability and to successfully design a mutated antibody with improved thermal stability [185,186]. Additionally, a major issue encountered in antibody-based therapeutics is that antibodies are prone to aggregate in the high concentration formulations needed for treatment [187]. Spatial aggregation propensity (SAP) is a measure directed to solve this issue. SAP quantifies the hydrophobic patches exposed on the antibody surface, averaged over the snapshot structures obtained from MD simulations with explicit solvent. Identification of regions prone to aggregation can be used to predict targeted mutations to design antibodies with better stability.
Overall, MD simulations can reveal details on antigen-antibody interactions that are not (easily) obtainable with any other methods. MD provides a useful tool to complement both experimental and other computational methods for designing novel antibodies with improved affinity, specificity, and stability.

Simulating Drug-Target Interactions
In the drug development process, lead compound discovery and subsequent optimization of the identified compounds to candidate drugs follow the target validation ( Figure 2). During this phase, the aim is to discover and design compounds that have a good binding affinity and selectivity to the target. Docking and scoring are common tools for fast estimation of favorable ligand binding poses and binding energies. However, the scoring functions of current docking tools have many limitations that result in inaccurate binding affinity predictions [49,188]. With MD, we can tackle these limitations and facilitate a more accurate evaluation of the compound binding affinity as the simulations take into account the effect of water and the dynamics of the binding partners [49]. In this section, we will go through some principal methods that utilize MD in the estimation of ligand binding energies and binding kinetics. In the end, we also discuss how the challenge of docking extremely flexible peptides has been tackled with hybrid approaches combining molecular docking and MD simulations.

Binding Energy Estimation
Molecular recognition is critical to many fundamental biological processes [189]. Binding between two interacting molecules in the cell initiates several biological processes, and the significance of these specific interactions can be judged through the free energy of binding (∆Gbind). In general, the change in free energy (∆G) describes the thermodynamics and the kinetic properties of a system (for our purpose, a ligand binding to a protein in a solvent). ∆G can be described as an amount of energy released or required during a chemical process. The final ∆G value at the state of equilibrium is negative if the chemical process (for our purpose, ligand binding to its target) is spontaneous, while ∆G is positive if the chemical process is non-spontaneous. Conventionally, ∆G is given by the change in Gibbs function: where ∆H represents the change in enthalpy of the system, T represents the temperature in Kelvin, and ∆S represents the change in the entropy of the system. Moreover, ∆G of a system can be used as an overall measure to determine the stability of a given system (for our purpose, the stability of a ligand-receptor complex, or ligand binding affinity) [190]. In case of a ligand-binding event, the standard binding free energy ∆G • bind is related to the binding constant Kb by the relationship: where R is the universal gas constant (unit: cal·K −1 ·mol −1 ), T is the temperature in Kelvin, and ∆G • has been measured at standard conditions: 1 atm pressure, room temperature (298.15 K), and 1 M protein and ligand concentrations. On the other hand, K b is a ratio of the kinetic rate constants for association and dissociation (k on and k off , respectively) and the inverse of the dissociation constant, K d .
Since the free energy is a state function, binding affinity (or K d ) of a ligand can be obtained from the difference between the two thermodynamic equilibrium states, i.e., the unbound (initial) and the bound (final), without knowing the exact pathway connecting the two states [190]. Computationally, the estimation of ∆G bind requires extensive MD simulations to generate the unbound and the bound equilibrium states. These equilibrium states can also be defined as macrostates, while the macrostates can be further sub-divided into an ensemble of discrete microstates, comprising various intermediate stages of the ligand-receptor probing process ( Figure 6). during a chemical process. The final ∆G value at the state of equilibrium is negative if the chemical process (for our purpose, ligand binding to its target) is spontaneous, while ∆G is positive if the chemical process is non-spontaneous. Conventionally, ΔG is given by the change in Gibbs function: where ΔH represents the change in enthalpy of the system, T represents the temperature in Kelvin, and ΔS represents the change in the entropy of the system. Moreover, ∆G of a system can be used as an overall measure to determine the stability of a given system (for our purpose, the stability of a ligand-receptor complex, or ligand binding affinity) [190]. In case of a ligand-binding event, the standard binding free energy ΔG°bind is related to the binding constant Kb by the relationship: where R is the universal gas constant (unit: cal·K −1 ·mol −1 ), T is the temperature in Kelvin, and ΔG° has been measured at standard conditions: 1 atm pressure, room temperature (298.15 K), and 1 M protein and ligand concentrations. On the other hand, Kb is a ratio of the kinetic rate constants for association and dissociation (kon and koff, respectively) and the inverse of the dissociation constant, Kd.
Since the free energy is a state function, binding affinity (or Kd) of a ligand can be obtained from the difference between the two thermodynamic equilibrium states, i.e., the unbound (initial) and the bound (final), without knowing the exact pathway connecting the two states [190]. Computationally, the estimation of ΔGbind requires extensive MD simulations to generate the unbound and the bound equilibrium states. These equilibrium states can also be defined as macrostates, while the macrostates can be further sub-divided into an ensemble of discrete microstates, comprising various intermediate stages of the ligand-receptor probing process ( Figure 6). Assessment of ∆G bind for a series of ligands against a specific target protein can help identify the most promising compounds with higher binding affinities than the rest of the compounds. Therefore, drug design and docking-based virtual screening processes are often followed up by ∆G bind calculations [191] for which several computational methods have been developed. Such methods include rigorous thermodynamic pathway approaches to obtain relative (and recently also absolute) binding free energies of a series ligands [192], as well as less rigorous and computationally faster end-point methods that only sample the macrostates. The accuracy of these methods varies and is inversely correlated with the computational requirements. The most rigorous (alchemical i.e., non-physical) pathway approaches such as free energy perturbation (FEP) methods [192,193], Bennet's Acceptance Ratio (BAR) [194], and thermodynamic integration (TI) [195] can usually be applied only to a few compounds due to the high computational cost. Somewhat faster approximations include methods that sample the possible physical microstates of the thermodynamic pathway, e.g., the Weighted Histogram Analysis Method (WHAM) [196] applied to predict the potential of mean force (PMF) using enhanced MD techniques such as steered molecular dynamics (SMD) [197][198][199] (Figure 7) and umbrella sampling [200], as well as the multistate BAR estimator (MBAR) [201] and Jarzynski's non-equilibrium method [202]. For ligandprotein systems, the change in the PMF upon (un)binding is often assumed to be equivalent with the binding free energy profile along the chosen reaction coordinate [203]; there are, however, different theoretical approaches to obtain the free energy from the PMF [203][204][205]. Computationally less expensive end-point methods include the linear interaction energy (LIE) method [206] and the well-established Molecular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) and Molecular Mechanics Generalized Born Surface Area (MM-GBSA) [207,208] approaches. In the next paragraphs, we focus on the application and limitations of the MM-PB(GB)SA methods that are commonly used for the evaluation of binding free energies of hit compounds from virtual screening experiments (post-docking analysis), e.g., to re-rank the docked poses of the top ligands [209]. often followed up by ΔGbind calculations [191] for which several computational m have been developed. Such methods include rigorous thermodynamic pathw proaches to obtain relative (and recently also absolute) binding free energies of ligands [192], as well as less rigorous and computationally faster end-point meth only sample the macrostates. The accuracy of these methods varies and is inversel lated with the computational requirements. The most rigorous (alchemical i.e., no ical) pathway approaches such as free energy perturbation (FEP) methods [192,19 net's Acceptance Ratio (BAR) [194], and thermodynamic integration (TI) [195] can be applied only to a few compounds due to the high computational cost. Somewh approximations include methods that sample the possible physical microstates of t modynamic pathway, e.g., the Weighted Histogram Analysis Method (WHAM) [ plied to predict the potential of mean force (PMF) using enhanced MD techniques steered molecular dynamics (SMD) [197][198][199] (Figure 7) and umbrella sampling [ well as the multistate BAR estimator (MBAR) [201] and Jarzynski's non-equi method [202]. For ligand-protein systems, the change in the PMF upon (un)bindin ten assumed to be equivalent with the binding free energy profile along the chos tion coordinate [203]; there are, however, different theoretical approaches to ob free energy from the PMF [203][204][205]. Computationally less expensive end-point m include the linear interaction energy (LIE) method [206] and the well-established ular Mechanics Poisson-Boltzmann Surface Area (MM-PBSA) and Molecular Me Generalized Born Surface Area (MM-GBSA) [207,208] approaches. In the next para we focus on the application and limitations of the MM-PB(GB)SA methods that a monly used for the evaluation of binding free energies of hit compounds from screening experiments (post-docking analysis), e.g., to re-rank the docked poses of ligands [209]. e 7. Steered molecular dynamics (SMD). A continuous external force is used to pull the ligand out of the target prot g site along the reaction coordinate ('collective variable', CV). The free energy of binding is calculated from nship between the force used and the displacement of the ligand. The plot of average potential energy versus d ent is called the potential of mean force (PMF) [210]. Enhanced MD simulations (e.g., as umbrella sampling) c ith the weighted histogram analysis method (WHAM) are used to determine the PMF along the reaction coordin The binding free energy estimation based on MM-PB(GB)SA method emplo the unbound and bound states (i.e., end-points) of the system. The binding free en a ligand to a protein is obtained as: The free energy (G) of each individual system (ligand/protein/complex) is ca Figure 7. Steered molecular dynamics (SMD). A continuous external force is used to pull the ligand out of the target protein binding site along the reaction coordinate ('collective variable', CV). The free energy of binding is calculated from the relationship between the force used and the displacement of the ligand. The plot of average potential energy versus displacement is called the potential of mean force (PMF) [210]. Enhanced MD simulations (e.g., as umbrella sampling) coupled with the weighted histogram analysis method (WHAM) are used to determine the PMF along the reaction coordinate.
The binding free energy estimation based on MM-PB(GB)SA method employs only the unbound and bound states (i.e., end-points) of the system. The binding free energy of a ligand to a protein is obtained as: 19 of 60 The free energy (G) of each individual system (ligand/protein/complex) is calculated from the total molecular mechanics' energy (E MM ) of the system in the gas phase (includes internal bonded and non-bonded electrostatic and van der Waals energies), the solvation free energy (G solv ) with both polar (computed using implicit solvation models, either Poisson-Boltzmann or generalized Born continuum solvent models), and non-polar contributions (assumed to be proportional to the solvent-accessible surface area, SASA), as well as the entropy (S; includes translational, rotational and vibrational contributions) at a specified temperature T: The entropy is estimated from a normal-mode analysis of harmonic frequencies calculated at the molecular mechanics level, which is computationally costly (please, see e.g., [211] for a more detailed description of the theoretical aspects of the methodology).
MM-PB(GB)SA calculations include two conventional approaches: (i) the threetrajectory scheme, and (ii) the single trajectory scheme [212][213][214][215]. The former relies on the snapshots from three MD trajectories, including apoprotein, free ligand, and the ligand-protein complex, which makes it comparatively accurate but computationally expensive [214,216]. Single trajectory strategy, however, includes only one MD simulation for the ligand-protein complex, reducing the computational time significantly [214][215][216][217]. The latter approach, however, neglects any explicit structural relaxation of the protein and ligand upon binding. Besides the strategy chosen, other factors also contribute towards MM-PB(GB)SA calculations, which include: (i) simulation time, (ii) choice of the force field, (iii) solvent model, (iv) solute dielectric constant, and (v) the net charge of the system. It has been reported that several short MD simulations run independently provide better ∆G bind predictions than a single long MD trajectory [213,214,218,219]. On the other hand, charged ligands can hinder the process of making accurate predictions [216,220,221]. Additionally, for obtaining absolute ∆G bind with the MM-PB(GB)SA calculations, the inclusion of conformational entropy (T∆S) remains a challenge. Incorporation of T∆S in the calculation does not guarantee a better accuracy in the final energies due to insufficient conformational sampling [212,213,218,220]. A sufficient number of MD snapshots may lead to a reliable estimate of absolute ∆G bind , however, it is computationally expensive [214,216]. Therefore, relative ∆G bind is generally sufficient to rank compounds against the target protein in structure-based drug design [214] and has a better prediction accuracy than the absolute ∆G bind calculations with this method.
The MM-PB(GB)SA approach has been applied in a number of studies, including the development of anticancer compounds [222][223][224], antibacterial [225][226][227], antiviral [228][229][230][231][232] and antiparasitic drugs [233][234][235], as well as antipsychotics [236,237]. These methods are also often applied to understand and analyze the binding mode and key interactions of small molecules [238,239] as well as peptide [240,241] and protein ligands [242] at their targets. Several comparative studies differentiating between the prediction accuracy of ∆G bind energies of protein-ligand complexes by MM-GBSA and MM-PBSA have been reported [243][244][245][246]. One such study indicated hot-spots in Ras-Raf and Ras-RalGDS protein-protein complexes involved in allosteric activation [244]. In another such study, the prediction of binding affinities of HIV-1 protease inhibitors helped rationalize drug resistance as a result of binding site mutations [245]. MM-PB(GB)SA calculations do not rely only on the parameters employed but also the receptor structures used [216,218,233,247,248]. Specifically, these approaches are believed to be dependent on the system. Hou et al. [214] reported accurate relative ∆G bind for 59 ligands against six different protein targets using the MM-GBSA approach when compared with the MM-PBSA method. Oehme et al. [220] reported that MM-GBSA has better accuracy than MM-PBSA for calculating ∆G bind energies of ligand-HIV protease systems. Hence, to date, no study concludes on which approach is better; instead, the literature suggests that the applied method should be selected on a case-by-case basis.
In conclusion, MD-based binding free energy calculations have a significant impact on the hit identification stage as they can improve the accuracy of the ranking of the hit compounds. As we learned from Equation (2), binding free energy is also related to binding kinetics, i.e., association and dissociation of a ligand to/from its target. We will now look at how MD is used in analyzing the drug binding and unbinding events to facilitate the design of compounds with favorable binding kinetics.

Drug-Protein Binding Kinetics Estimation
There is increasing evidence of the importance of drug-protein binding kinetics to develop improved drugs [249][250][251][252]. The binding kinetics of drug-protein complexes is characterized by the association rate constant (k on ; unit: M −1 ·s −1 ) and by the dissociation rate constant (k off ; unit: s −1 ) [253]. It is also common to refer to the inverse of k off , the residence time (τ). Recent works have shown that residence times can often be better correlated with the in vivo efficacy of drugs than binding affinities [250,251,254,255]. Drugs with increased residence time interact longer with the target protein, displaying longer physiological effects. Drugs with increased on-rates may bind faster to the target protein than endogenous ligands, being effective competitors. Such drugs would also be less available in the free form, leading to less non-specific binding and fewer side effects. Therefore, modulation of k on and k off values may help in the design of effective and safe drugs.
Computational methods are useful to reveal molecular details about drug-protein binding, providing information on how to modulate binding kinetics. However, obtaining binding and unbinding events from conventional MD simulations is challenging, because they are often infrequent events. One of the ways of sampling such infrequent events is combining MD simulations with enhanced sampling methods [253,[255][256][257][258].
This section provides information on how enhanced sampling methods can be used to help in the design of drugs with modified k on or k off values. This is not a comprehensive list of all available enhanced sampling methods. The reader is referred to up-to-date information in the toolbox KBbox [259], which provides a compilation of tutorials and publications of computational methods to predict binding kinetics.
Most computational work done so far is retrospective, aiming at reproducing data for binding kinetics available in the literature. One exception is a recent study that used elABMD, a method to predict relative residence times that combines adiabatic bias MD with an electrostatics-like collective variable [260]. A Spearman coefficient of 0.94 was achieved for a set of 6 glycogen synthase kinase 3 beta inhibitors. Another exception is the study of k off values of inhibitors of the human flavoprotein D-amino acid oxidase using scaled MD [261]. The k off values of 5 compounds were determined by scaled MD and, afterward, experimentally. The correct ranking was obtained for 4 out of 5 compounds.
Methods with low computational cost allow the estimation of relative kinetic rates for a set of drugs. One of these methods is smoothed potential or scaled MD [262], which aims at estimating relative residence times. Scaled MD enhances the sampling of unbinding events by smoothing the potential energy of the system. The method was used to predict relative residence times for 4 78-kDa glucose-regulated protein binders (correlation coefficient, R, of 0.85) [262], 4 adenosine A2A receptor binders (R of 0.95) [262], 7 glucokinase activators (R of 0.92 after removal of one outlier) [263], and 7 HSP90 inhibitors (coefficient of determination, R 2 , of 0.89 after removal of one outlier) [264]. Scaled MD was also used to understand why the residence time of the drug etoposide is longer for the human type II topoisomerase (TopoII) α, compared to TopoIIβ [265].
τRAMD [266] is another method that aims at computing relative residence times for a set of drugs. τRAMD is based on random acceleration molecular dynamics (RAMD) [267,268], a method where an additional force of constant magnitude and random orientation is applied to the drug to facilitate dissociation. τRAMD was used to predict relative residence times for 70 HSP90 inhibitors (R 2 of 0.86 after removal of outliers) [266]. Analysis of inhibitor-protein contacts in the trajectories using machine learning improved residence time estimates and detected that interactions between a halogen or a methyl group in the loop-binder compounds with Phe138 lead to a longer residence time [269]. τRAMD was also recently applied to estimate relative residence times of ligands dissociating from different mutants of T4 lysozyme [270].
Nonequilibrium targeted MD (TMD) [271][272][273] is another method that can be used to compute relative residence times for a set of drugs. In this method, an additional force is applied to the drug to facilitate dissociation, as in τRAMD, but in this case, the magnitude of the force is changed to obtain constant velocity of unbinding. The mean work calculated from TMD simulations was compared to k off values for a set of 26 HSP90 inhibitors [274], resulting in R 2 from 0.45 to 0.80 for inhibitors grouped according to binding mode and scaffold. Moreover, the net charge of inhibitors was identified as a factor that could lead to longer residence times.
Brownian dynamics (BD) simulations [275] are suitable for studying drug-protein binding. In this method, the drug and protein are treated as rigid bodies, with no internal motions, and the solvent is represented implicitly, leading to low computational cost. A combination of BD with MD simulations was used to study the binding pathways of two inhibitors to H1N1 neuraminidase [276]. The computed association rates were ranked correctly, and the formation of a salt bridge between the inhibitors' carboxyl group and Arg368 was revealed as a key step for binding.
Many enhanced sampling methods have moderate to high computational costs, preventing fast computation of residence times for a large set of drugs. However, such methods can reveal molecular details that may help in the design of drugs with modified binding kinetic rates. Methods that depend on the definition of one or more progress coordinates to describe the infrequent event and enhance the sampling, such as metadynamics [277] and weighted ensemble MD (WEMD) [278,279], have been used to reveal mechanistic details of drug-protein association and dissociation. Metadynamics was used to distinguish cyclin-dependent kinase 8 inhibitors with short or long residence times [280] and to study association or dissociation events for relevant pharmacological targets such as HIV reverse transcriptase [281], Src kinase [282,283], p38 MAP kinase [284], the M3 muscarinic receptor [285], the adenosine A2A receptor [285,286] and the β2-adrenergic receptor [287]. WEMD was used to study ligand dissociation from epoxide hydrolase [288] and from proteins used as model systems to study binding kinetics [289,290].
SMD has also been used to uncover molecular mechanisms of ligand association and dissociation. SMD suits for analyzing fast and slow (un)binding events of various ligands (see for example refs [291][292][293]). While applying biasing force restraints on explicit-solvent models allows for faster approximate calculations in SMD simulations, this leads to a non-equilibrium trajectory and could change the underlying physical processes too drastically [294]. A recent work [295] used interaction-energy fingerprints obtained from SMD to predict k off values for 37 HIV protease inhibitors (R 2 of 0. 75), showing that interactions with residues Asp25, Ile47, and Ile50, located in the active site or in the flap region, are important to modulate k off values. SMD was also used in combination with RAMD to study dissociation from B-RAF kinase [296].
Another enhanced sampling method used for binding kinetics studies is REMD, although with a usually high computational cost. A modification of REMD which employed replica exchange in two dimensions was used to study the binding of an inhibitor to Src kinase [297]. Many binding and unbinding events were sampled. Transient encounter complexes, which are not visible in experimental data but may reveal information to help in the design of drugs with improved kinetic rates, were characterized.
In summary, there are many examples of works showcasing how computational methods can be used to help in the design of drugs with improved binding kinetics ( Figure 8). On one hand, methods with low computational cost, such as τRAMD and scaled MD, can be used to rank tens to hundreds of ligands by residence time. This information can help in the prospective selection of candidate drugs with short or long residence times.
On the other hand, methods with high computational cost, such as WEMD, metadynamics, and REMD, can reveal information about transient states and ligand-protein interactions that can be modified to modulate k on or k off values. Therefore, computational methods can be applied to provide mechanistic insights in drug-protein association and dissociation events, helping in the design of drugs with improved kinetic rate constants. ics, and REMD, can reveal information about transient states and ligand-protein interactions that can be modified to modulate kon or koff values. Therefore, computational methods can be applied to provide mechanistic insights in drug-protein association and dissociation events, helping in the design of drugs with improved kinetic rate constants. In the next section, we will look at ways to tackle the flexibility of peptides in peptideprotein docking to get more accurate binding poses.

Peptide-Protein Docking: Tackling the Flexibility Challenge with Molecular Dynamics Simulations
Peptides represent a unique class of compounds that differ in their biochemical and therapeutic characteristics from small molecules, proteins, and antibodies [304]. Peptideprotein interactions are essential e.g., for cellular signaling, localization, immune system, and apoptotic pathways. Consequently, these molecules have been attracting attention from pharmaceutical and biological communities, with nearly 20 new peptide-based clinical trials annually [305]. In the computational field, much progress has been achieved in small-molecule docking, but these methods are often not suited for peptides given their size and flexibility [306]. The inherent flexibility of peptides poses a great obstacle toward determining the 3D structure of the complexes they form. In this section, we briefly review the use of MD simulations to tackle the peptide flexibility challenge, its applications, advantages, and fallbacks in the context of peptide-protein interactions.
Computational methods that aim to study protein-peptide interactions can be classified into three categories: (1) template-based docking that uses known structures to build an initial model of the target complex; (2) local docking that performs a search around a user-defined region, and; (3) global docking that concomitantly searches for the binding site and a binding pose. Each approach has its own strategies to deal with the main challenges in protein-peptide docking, namely the modeling of conformational changes, in particular on the peptide side, the integration of experimental or theoretical data into the In the next section, we will look at ways to tackle the flexibility of peptides in peptideprotein docking to get more accurate binding poses.

Peptide-Protein Docking: Tackling the Flexibility Challenge with Molecular Dynamics Simulations
Peptides represent a unique class of compounds that differ in their biochemical and therapeutic characteristics from small molecules, proteins, and antibodies [304]. Peptideprotein interactions are essential e.g., for cellular signaling, localization, immune system, and apoptotic pathways. Consequently, these molecules have been attracting attention from pharmaceutical and biological communities, with nearly 20 new peptide-based clinical trials annually [305]. In the computational field, much progress has been achieved in small-molecule docking, but these methods are often not suited for peptides given their size and flexibility [306]. The inherent flexibility of peptides poses a great obstacle toward determining the 3D structure of the complexes they form. In this section, we briefly review the use of MD simulations to tackle the peptide flexibility challenge, its applications, advantages, and fallbacks in the context of peptide-protein interactions.
Computational methods that aim to study protein-peptide interactions can be classified into three categories: (1) template-based docking that uses known structures to build an initial model of the target complex; (2) local docking that performs a search around a user-defined region, and; (3) global docking that concomitantly searches for the binding site and a binding pose. Each approach has its own strategies to deal with the main challenges in protein-peptide docking, namely the modeling of conformational changes, in particular on the peptide side, the integration of experimental or theoretical data into the modeling process, and the selection of the best models out of the plethora of generated models (scoring). The application of flexible refinement in combination with molecular docking software to address the flexibility challenge has been investigated among others by using an enhanced sampling MD method with models obtained from ClusPro Peptidock [307], by combining the default pepATTRACT [308] method with an additional round of MD refinement, and by using MD to generate an ensemble of peptide conformations as input for HADDOCK [309]. A detailed description of several protein-peptide docking software is available elsewhere [310].
The use of MD simulations can provide an improved sampling of protein-peptide conformations; however, it has been observed that for large-scale predictions, heavy computational resources would be needed. A possible approach to circumvent this is to apply enhanced sampling MD methods such as SMD [311], Hamiltonian replica exchange [312], and Gaussian accelerated MD (GaMD) [313] to study protein-peptide binding. GaMD has been successfully used, in combination with the global peptide docking software ClusPro Peptidock, to refine docking poses and explore the binding mechanism of three systems. Initial models of the complexes were obtained by using the standard ClusPro Peptidock webserver protocol from the unbound forms of the peptides and proteins, excluding the PDB entries that contained the bound form of the target systems from the fragment search. For each system, the top-scoring protein-peptide complex was selected and refined using GaMD (described in detail in Supplementary Material of Wang, Alekseenko, Kozakov, and Miao [313]). The selected initial docking models had backbone RMSD of 3.3, 3.5, and 4.8 Å, for peptides 1, 2, and 3 respectively, with no significant conformational changes compared to the starting model observed for peptides 1 and 3. After GaMD refinement, the minimal backbone RMSD from the bound form observed throughout the MD simulations for peptides 1 and 2 was lower than 1 Å, whereas peptide 3 showed high fluctuations, temporarily sampling the near-native conformation (1 Å ≤ i-RMSD ≤ 2 Å). Cluster analysis of the conformations with the lowest free energies yielded backbone RMSDs of 0.94 Å, 0.61 Å, and 2.72 Å for peptides 1, 2, and 3, respectively, adequately capturing conformational changes and noticeably refining the initial docking poses. Despite these promising results, the number of systems tested remains very limited.
HADDOCK, the High Ambiguity-Driven biomolecular DOCKing software has been optimized for protein-peptide docking, taking full advantage of its capacity to handle molecular ensembles and allow for flexible refinement. The protocol starts from three peptide conformations: extended, alpha-helical, and polyproline-II, which, together, cover about 80% of the observed protein-bound peptide conformations in the experimentally determined structures in the PDB [314]. The default HADDOCK protein-protein protocol consists of three stages: (1) From randomized ligand orientations around the protein receptor, rigid-body energy minimization is performed, (2) the top-ranking models are subsequently refined using semi-flexible simulated annealing [315] in torsion angle space, and finally, (3) the resulting docking models are refined with a short restrained MD in explicit solvent. In the case of protein-peptide docking, to allow for more flexibility and potential conformational changes, the length of the semi-flexible refinement is doubled, and the peptides are treated as fully flexible, which increases the conformational sampling. This protocol was tested on peptiDB [316], with 103 non-redundant peptide-protein complexes, including 47 unbound cases. Since HADDOCK is an information-driven method, the binding region was defined in a broad manner, consisting of a binding interface on the protein receptor which was~3 times larger than the true interface. By doing this, the focus is on the challenge of identifying the correct binding conformation of the peptide. The protocol generated 79.4% high quality (interface-RMSD ≤ 1 Å) models for boundunbound (i.e., experimental peptide-bound protein/unbound peptide) and 69.4% for unbound-unbound (unbound protein/unbound peptide) docking, the latter representing the most realistic scenario [317]. This ensemble docking approach that includes flexible MD refinement steps is shown to be a valuable strategy to tackle the flexibility challenge in protein-peptide docking. The HADDOCK developers have also provided guides on how to generate peptide ensembles using MD simulations instead of the three common conformation approaches and how these can be used in HADDOCK, both through its webserver and a local installation [304,309].
The flexible peptide-protein docking protocol pepATTRACT combines fast coarsegrained ab initio docking with an atomistic refinement of the most favorable docked models. Combination of the ATTRACT docking protocol with an MD refinement using AMBER 14 [318] led to a 50% increase in the top-10 success rate when compared to a no-refinement scenario [308]. An ensemble of 80 peptide-protein complexes from the peptiDB benchmark was selected for testing. This ensemble includes experimental data and has been used in other docking protocols [317]. The same strategy as previously published for HAD-DOCK was followed by, using the three main peptide conformations for the docking with some knowledge of the binding site. For each complex, the 1000 best initial rigid-body docking solutions, ranked by the ATTRACT score, were selected for flexible interface refinement [319] and subsequent MD with AMBER 14 using a generalized Born implicit solvent model. The final complexes were clustered by the fraction of common contacts [320] and ranked by the top-four members of each cluster [317] as done in HADDOCK. Interestingly, the authors also considered a "worst-case" scenario in which no knowledge of the binding site is assumed, and a global docking is performed. This is often the case faced by most researchers. After clustering, 97% of the successful cases (56 out of the 80 complexes considered) had at least one cluster with an average top-four interface-RMSD ≤ 2 Å. The AMBER MD refinement step was shown to yield an average improvement of 0.44 Å of the interface-RMSD. There was a striking increase in comparison to the use of only the default interface refinement, which only improved the model's interface-RMSD by 0.10 Å.
In conclusion, several peptide-protein docking software are continuously being developed and optimized further. While addressing the innate flexibility of peptide-protein complexes remains challenging, researchers are taking advantage of MD simulations for enhanced sampling and refinement. This combination of docking and MD can greatly improve the prediction of peptide-protein complexes and is a powerful asset in the field of peptide therapeutics and drug design.
In our biological context, the cellular environment is of course more complex than just water, ions, and proteins. In the next section, we specifically look at target proteins that are associated with biological membranes. We will discuss how MD is utilized to study their structure, function, and ligand binding properties as well as what the benefits are if we can include the membrane structure explicitly in the simulations.

Molecular Dynamics Simulations of Membrane Proteins
Most current drug targets are membrane-associated, either integral or peripheral proteins [321,322]. Integral membrane proteins are either polytopic proteins (spanning through the membrane several times) or bitopic proteins (single-pass proteins, spanning through the membrane only once), whereas peripheral proteins are associated with only one side of the membrane. Unfortunately, the structure determination of membrane proteins has been particularly challenging [321]. This has hampered reliable computational studies of several important drug targets. However, recent advances in crystallography and especially in cryogenic electron microscopy (cryo-EM) have resulted in a significant increase of atomistic resolution structures of large membrane-bound macromolecules, thus providing valuable information on these targets for computer-aided drug design.
MD simulations of membrane proteins have often been carried out without the membrane as it is computationally more costly to include a membrane model in the simulation system, especially if the lipid bilayer is represented in atomic detail [323]. However, biological membranes can play significant roles in modulating drug action (e.g., by affecting the substrate selection or receptor function), not only at polytopic integral membrane protein targets but also at bitopic and peripheral membrane protein targets [324]. There are also drugs whose mode of action is to directly target (disrupt) biological membranes [325], and thus, the inclusion of membranes in MD simulations may be crucial for gaining realistic insights. In this section, we focus on MD simulations of particularly important drug targets that are polytopic integral membrane proteins, i.e., G-protein coupled receptors (GPCRs) and ion channels. We also look at what MD simulations can tell about the function of cytochrome P450 (CYP) enzymes that are bitopic membrane proteins playing a significant role in drug metabolism.

Molecular Dynamics Simulations of GPCRs and Ion Channels
With membrane proteins representing around 25% of the total proteome and their experimentally proven importance in various pathophysiological processes, they are highly regarded as targets in small molecule drug discovery. Indeed, approximately one-third of all currently marketed small molecule drugs target membrane proteins to alter downstream cellular signaling pathways. Rhodopsin-like GPCRs, nuclear receptors, ligandand voltage-gated ion channels are ranked as the top four of all druggable targets by FDA-approved drugs [326,327].
The most targeted class of transmembrane (TM) proteins are the GPCRs. These consist of seven transmembrane-spanning α-helical domains with a characteristic dynamic network connecting the ligand-binding site to the cytosolic Gα signaling subunit. Despite the significant functional importance of GPCRs, spectroscopic techniques remain challenging for these membrane proteins because of their distinct hydrophobic surfaces, high flexibility, and relatively low expression. However, recent advances in the structural biology of GPCRs, with X-ray crystallography, cryo-EM, and X-ray free-electron lasers (XFELs) have resulted in novel overall structural insights and understanding of the structural basis for the induced receptor response upon ligand binding and unbinding [328][329][330]. Because of the highly dynamic nature of GPCRs with multiple, functionally distinct conformational states [331,332], combined spectroscopic and computational studies allow for a detailed atomic analysis of the TM domain dynamics mediating downstream signals [333]. The most recent example is the in-depth characterized structure of GPR52 coupled to its downstream heterotrimeric Gs protein in a previously unknown self-activation state [334]. Additionally, a potential allosteric ligand-bound state was suggested as a potential drug target for Huntington's disease.
Interestingly, class A GPCRs contain a 'placeholder' formed by the N-terminus and the extracellular loop 2 (ECL2) [335][336][337]. While adopting an open conformation in the absence of the ligand, the extracellular loop plays a critical role in lipophilic ligand recognition, access, and selectivity for the orthosteric site [338,339]. In the same class, a consensus amino acid network was described through a comprehensive analysis of X-ray structures by Venkatakrishnan et al. [340]. While the GPCR family allows binding ligands of diverse chemical properties, most ligand-contacting residues are preserved in the TM helices TM3, TM6, and TM7 that form a scaffold for the ligand-binding pocket. This observation explains drug cross-reactivity and allows drug repurposing with tailored ligand specificity based on topologically equivalent positions in the TM helices from structural homologs [341][342][343]. In contrast, class B GPCRs are frequently targeted by highly flexible peptide ligands at a solvent-accessible orthosteric binding pocket [344]. While consensus scaffolds generally guide GPCR ligand recognition, different ligands induce various conformational states of the receptor with a specific downstream signal [345,346].
These observed characteristic dynamics have spurred the use and advanced development of atomic level MD simulation approaches to supplement the increasing amount of static structural information from spectroscopic techniques. The increase in computational resources has enabled the analysis of conformational dynamics up to the microseconds range [347,348]. To date, because of the high structural homology between various TM protein classes even with distant homologs [349], the rational design of novel therapeutic candidates has been facilitated by comparative modeling combined with docking approaches and machine learning algorithms [350][351][352][353]. While currently repurposed ligands generally target primary orthosteric sites [343,354], long unbiased MD simulations have also facilitated the identification of allosteric regions of membrane proteins that may potentiate or inhibit downstream signal transduction [355,356].
Early simulations of membrane proteins were performed without membrane embedding, but today accurate membrane modeling has allowed for the simulation of realistic cellular membrane environments [357,358]. In vitro data on membrane thickness, curvature, fluidity [359], and surface tension [360] has enabled accurate modeling of ligand binding and the corresponding downstream activation of the signaling proteins [361,362]. Furthermore, multiscale MD simulations combining atomistic and coarse-grained MD can also give structural and functional insights into important protein-lipid interactions, protein oligomerization, clustering, and crowding [363]. For example, cholesterol-GPCR interactions and cholesterol's general influence on membranes can modulate GPCR structure and dimerization, thus affecting e.g., receptor-crosstalk and drug efficacy [364].
Multiple dedicated web servers have been developed to facilitate the highly complex set-up of membrane protein simulations. For example, CG dynamics can benefit from the user-friendly CHARM-GUI [365], the VMD Membrane plugin [366], and the recently developed MERMAID (employing the Martini forcefield) [367]. Alternatively, parameterized phospholipids for coarse-grained simulation of membrane-protein dynamics are available in the SIRAH2.0 forcefield [368] recently implemented in the GROMACS and AMBER simulation packages. Besides these, LipidWrapper and BUMPy allow for geometric transformations towards different curved shapes of lipid bilayers [369,370]. The development of PACKMOL-Memgen for building membrane protein-lipid bilayer systems has enabled accurate atomistic MD simulations with complex lipid mixtures at desired ratios, allowing asymmetric leaflet compositions [371]. While parameters for phosphoinositides and lysophospholipids have recently been added to the AmberTools19 PACKMOL-Memgen release, the range of lipids in the new Lipid17 force field, which is the update of the AMBER Lipid14 forcefield [35], can be further extended. Moreover, various enhanced sampling and biasing simulation methods such as REMD [372], metadynamics [373], milestoning [374], and umbrella sampling [359,375] have been explored successfully for membrane-associated simulations. Another useful resource is the MemProtMD database, which, as of now, contains 3500 resolved membrane protein structures that have been inserted into simulated lipid bilayers using coarse-grained self-assembly MD simulations [376].
Besides gaining general insights into ligand binding modes [377] or quantifying ligand binding energies [378] at membrane-embedded TM receptors, ligand migration from the membrane bilayer to the central GPCR channel has been recently successfully simulated [379]. For example, it has become evident that membrane partitioning characteristics of drug molecules such as quantitative bilayer distribution, permeation depth, and their overall conformational features in the lipid bilayer are crucial for understanding their target GPCR binding kinetics [380,381]. Simulations of membrane permeability of small molecular compounds are useful for predicting drug absorption through cellular membranes, although challenging [382]. Recently, the physics-based PerMM tool and database were developed to facilitate the analysis of passive membrane permeability coefficients and translocation pathways of bioactive molecules [383]. The availability of such advanced computational tools may offer an alternative to the common in vitro membrane diffusion models that include Parallel Artificial Membrane Permeability Assay (PAMPA), phospholipid vesicle-based permeation assay (PVPA), Permeapad ® , and the artificial membrane insert system (AMI-system) [384]. From the available data in reported studies, it can be inferred that membrane-embedding significantly affects TM receptor structure and dynamics, ligand diffusion and binding as well as downstream signaling.
MD simulations of ion channels have been used, for example, to explore the mechanism of ion permeation by external stimulants or to investigate the structural and functional effects of disease-causing mutations in the context of genetic disorders such as cystic fibrosis [385,386]. Such studies can give important insights to aid in the drug discovery process. For example, the key conformational changes in the channel pore could be targeted with small molecular compounds [385]. For ion channels (or any other target proteins) that lack a known experimental structure, MD can help select and validate the best homology models for further structural investigations [387,388]. In addition to evaluating the stability of ligand-binding at ion channels (e.g., [387,389]), simulations of agonists, partial agonists, and antagonists in the ligand-binding sites of the ion channel may give an understanding of the structural details crucial for the channel opening (activation) or closing (inhibition). For example, Feng et al. found that an agonist had larger conformational effects on the channel pore profile of human TRPV1 (transient receptor potential vanilloid type 1) than an antagonist [387]. On the other hand, simulations of an ionotropic glutamate receptor GluR2 (iGluR2) suggest that since partial agonists show more flexibility at the binding pocket, they cannot open the channel pore as efficiently as full agonists [390]. Interestingly, water exchange within the iGluR2 binding pocket was shown to be more pronounced when a partial agonist was bound than when a full agonist was present. Simulations of the serotonin 3A receptor (5-HT 3A R) with granisetron showed how the binding of this antagonist causes conformational changes in the receptor, thus leading to constricting the channel pore [391]. Similar to Feng et al.'s TRPV1 simulations with agonists and antagonists [387], MD simulation of the cryo-EM structure of rabbit TRPV5 with econazole showed different flexibilities of the inhibitor in the four binding sites of the homotetrameric channel structure [392]. The varying dynamic behavior of the inhibitor in each monomer was shown to be correlated with the direct lipid contacts of econazole at the binding site. The simulation data also suggests how econazole affects the pore profile to inhibit Ca 2+ flow through the channel. Wang et al., in their study on allosteric regulation of P2X receptors (ATP-gated cation channels), used metadynamics to determine favorable binding modes of an allosteric regulator AF-353 at human P2X3 receptor (hP2X3) [393]. Moreover, after a short MD refinement of the crystal structure of hP2X3 complexed with another allosteric modulator AF-219, they were able to observe new, stable ligand-receptor hydrogen bonds that were not seen in the original crystal complex.
As in the case of GPCRs, the phospholipid bilayer plays an important role in ligand-ion channel interactions. MD simulations can give insight into the access pathways of ligands to their site of action; e.g., MD was used to elucidate how capsaicin accesses its intracellular TRPV1 binding site by first flipping from the extracellular to the intracellular leaflet of the bilayer [394]. Moreover, MD simulations can efficiently facilitate the identification of structurally and functionally important lipid interaction sites on ion channels and other integral membrane proteins [395]. For example, the crucial role of cholesterol on the structural integrity of nicotinic acetylcholine receptor (nAChR) was demonstrated by MD simulations; in the absence of cholesterol the receptor structure collapses, while the cholesterol molecules present both in the surface sites and buried pockets maintain the original receptor conformation and support the contacts between the pore and the agonist-binding domain [396]. Likewise, MD simulations of the GABA type A receptor (GABA A R) with cholesterol suggest that cholesterol may promote pore opening by binding in the intersubunit sites of the receptor in a wedge-like manner [397]. Insight into such sites may aid in the design of (lipid-like) drug molecules that can modulate the function of therapeutic integral membrane proteins such as ion channels and GPCRs.
We will now look at the key enzymes responsible for drug metabolism and how their membrane interactions significantly affect their ligand-binding and functional properties.

Drug Metabolizing Cytochrome P450 Enzymes
Cytochrome P450 enzymes (CYPs) play an important role in drug metabolism, steroid biosynthesis, and xenobiotic degradation. Human CYPs are bitopic integral membrane proteins with a single TM α-helix that is connected to the globular catalytic domain via a linker region varying in length from 25 to 28 residues. The knowledge of orientation and interactions of CYPs in the membrane and therefore mechanism of substrate entrance and exit after metabolism is scarce due to experimental limitations. The most X-ray structures of CYPs have been resolved after truncating the TM helix residues, or even by mutating residues in the globular domain region that interacts with the membrane. Therefore, significant information on protein-membrane interactions and function has been lost. To date, full-length structures of only fungal (Saccharomyces cerevisiae, Candida albicans, and Candida glabrata) lanosterol 14-alpha demethylases (CYP51) have been reported [398][399][400]. However, there are a number of experiments performed (employing e.g., epitope analysis, mutagenesis, tryptophan fluorescence scanning, and atomic force microscopy) to find the important globular domain and membrane interactions and distance of the globular domain above the membrane [401][402][403][404]. The TM-helix orientation can be measured experimentally using solid-state NMR spectroscopy [405,406]. Furthermore, the orientation of CYPs' globular domain has been determined by measuring the heme-tilt angle by rotational diffusion and nanodisc experiments [407].
As these experiments have been performed for only certain CYPs and there is a very low sequence identity between different CYP subfamilies, there is little understanding of how sequence differences between CYP isoforms will influence the membrane orientation, interactions of the globular domain with the membrane, and subsequent selection of substrates for metabolism by individual CYPs. As most of the substrates are hydrophobic in nature, it is assumed that the ligand tunnel opening in the membrane would be the favorable route of entry into the catalytic site of CYPs [408,409]. To understand such fine differences, computational modeling, and simulations at various time and length scales have played a significant role in understanding the mechanistic behavior and dynamics of CYPs in their native environment.
Computational methods such as the Orientation of Protein in Membranes (OPM) database (http://opm.phar.umich.edu/) [410] and the Protein Data Bank of Transmembrane Proteins (http://pdbtm.enzim.hu/) [411] can be used to predict the position, insertion, and orientation of membrane proteins in the biological membranes. Some research groups have taken the OPM predicted orientation of the globular domain above the membrane as the starting point and further optimized it using all-atom molecular dynamics (AAMD) simulations in the physiological conditions [412,413]. However, large domain motions or orientational changes of the globular domain are unattainable by AAMD and require coarse-grained (CG) simulations. The CG simulations help us explore large conformational spaces by decreasing the number of degrees of freedom and allowing the extension of simulations beyond the microsecond scale. Different approaches utilizing multiscale simulations, CGMD, and AAMD have been used to study the orientations of several CYPs (CYP2C9, CYP3A4, and human CYP51) in a lipid bilayer [414][415][416] We have optimized a multiscale simulation protocol to study CYP-membrane interactions [417]. For this, CYP3A4 was used as the test case for the insertion and orientation of a CYP enzyme in a POPC (1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine) bilayer. We then extended the same approach to other major drug-metabolizing CYPs such as CYP2C9, CYP2C19, CYP1A1, CYP1A2, as well as to the human steroidogenic enzymes CYP17 and CYP19 that are [417][418][419]. From the results of our simulations, we have been able to identify differences in the orientations or the distance of the globular domain with respect to the membrane, and important residues at the membrane-protein interface that result in such differences.
Through our optimized multiscale simulation protocol, we identified that the two highly similar CYPs: CYP2C9 and CYP2C19 that share 94% sequence identity, can have not only distinct substrate specificities but also different membrane protein orientations and interactions [419]. Such differences were due to the primary sequence variation in the linker, beta-strand1, the B-C loop, helices F, F'-G', G regions and their turns. These regions are the flexible substrate recognition sites (SRS1, 2, and 3), located at the entrance to the active site [420]. Furthermore, using our multiscale simulation approach, we identified the effect of mutations in the TM-helix of CYP17 on the interactions and orientation [418]. The original TM-helix sequence was modified to increase the expression of CYP17 in E. coli. The difference in the heme-tilt angles in wildtype and mutant (mt) TM-helix corresponded to the experimental heme-tilt angle measured by Ivan Lenov for mtCYP17 (64 ± 4 • ) [421], Ohta et al. [404] (47 • to 63 • ), and simulation results by Cui et al. [422]. Furthermore, mutations in the TM-helix of CYP 17A1, especially W2A and E3L, led to a gradual drifting of the TM-helix out of the hydrophobic core of the membrane. This instability of the TM-helix could affect the interactions with the allosteric redox partner, cytochrome b5, required for CYP 17A1's lyase activity. Truncation of the TM-helix residues in CYP17 has been shown to decrease its catalytic activity 2.4-fold, which could be due to the loss of hydrophobic interactions between CYP17 and its redox partner [423]. Such hydrophobic interactions between TM helices of CYP2B4 and cytochrome b5 have been reported by dynamic nuclear polarization (DNP) NMR spectroscopy under magic angle spinning (MAS) [424]. Therefore, the unstable TM-helix lying parallel to the membrane plane in mtCYP17 could result in the loss of inter-TM helix interactions between mtCYP17 and its redox partner, leading to decreased catalytic activity, or affect membrane anchoring and thereby lead to reduced activity.
Besides understanding the CYP-membrane interactions and orientations, it remains unexplained how the CYPs select different substrates and what route substrates take for entry into the catalytic site, and how the corresponding soluble products exit the site. The study of ligand entrance and exit is beyond the scope of classical MD due to the limited time and length scale one can reach in such simulations. Therefore, enhanced sampling techniques such as RAMD [268,425] and SMD [408,[426][427][428] are used. In RAMD simulations, an artificial force with random direction is applied to the center of mass of the ligand while in SMD, a force is applied to the ligand in a specified direction (Figure 7). Different ligand access/exit tunnels have been observed opening in soluble vs membranebound CYPs [425,429]. In mammalian CYPs, the 2a tunnel that opens into the membrane can be used by hydrophobic substrates, whereas tunnel 2c that opens in the solvent may be preferred by hydrophilic products to egress [430]. RAMD simulations of CYP2C9 in complex with flurbiprofen and warfarin and their products by Cojocaru et al. [425] suggest that depending on the physicochemical properties of the ligands, different tunnels were preferred. RAMD simulation of CYP51 by Yu et al. showed, on the other hand, that tunnel 2f is the predominant ligand egress tunnel that leads to the membrane [431].
To conclude, computational prediction, modeling, and simulation have enabled us to a great extent to understand the structure, dynamics, and function of CYPs in general and in particular mammalian CYPs in their native membrane-like environment. However, there is still a need for improving computational algorithms, developing compatible force fields, and facilitating the inclusion of complex membrane systems in MD simulations to help understand fully complex biological questions relating to drug metabolism by CYP enzymes. In the future, elucidating the intricacies of the membrane and ligand interactions of CYPs may, for example, aid in predicting the metabolism of novel drug candidates or understanding the adverse effects of drugs and their metabolites. Moreover, this may give important insights into anticancer drug development that targets CYP enzymes [432].
We have now seen how MD can be used to support the first steps of the drug development process from target identification and validation to lead discovery and optimization ( Figure 2). In the last part of this review, we focus on the next phase, the preclinical research and, more specifically, pharmaceutical development or (pre)formulation studies. In this phase, it is important to find suitable formulations (e.g., a tablet or solution) that then will be used in the preclinical toxicity assays and later in clinical trials. If the candidate drug is successful in the trials (safe and effective) and will later obtain the marketing authorization, its formulation needs also to be up-scalable and suitable for industrial manufacturing. Since the solubility of the drug is important for absorption, it is of uttermost importance to ensure the solubility of the solid-state drug formulations. The stability of the chemical compounds is also an important aspect to be considered. To achieve these and any other required properties that are important for the manufacturing process or the final dosage form, drug formulations include many excipients, e.g., wetting agents, preservatives, antioxidants, plasticizers, lubricants. It may also be beneficial to load the drug to different delivery vehicles such as nanoparticles. In the following paragraphs, we get a glimpse of how MD can help in drug formulation and solubility studies, as well as in the characterization and design of nanoparticles for drug delivery and therapeutic purposes.

MD Studies of Crystalline and Amorphous Solids
Pharmaceutical formulations are often crystalline or solid-state in nature, which makes the understanding of such systems critical in pharmaceutical development and manufacturing. The crystalline systems include everything from polymorphic forms, salts, and co-crystals to hydrates [433]. Additionally, recent research has suggested that formulations containing the active pharmaceutical ingredient (API) in an amorphous state are an attractive option for poorly soluble drugs [434].
Many physical phenomena of interest such as dissolution, phase changes, nucleation, and crystal growth happen on large time scales which necessitates the use of molecular mechanics force fields instead of more accurate but more expensive quantum mechanicsbased approaches. Force fields typically used in drug discovery such as AMBER, OPLS, and CHARMM can be applied to crystalline systems [435], and other force fields that are developed with crystals in mind also exist [436,437].
Understanding the possible phase changes between different polymorphic forms of the same crystalline drugs is critical because different forms can have radically different behavior during dissolution, storage, and manufacturing. The transition between the β and α forms of DL-norleucine is a typical example of a solid-solid phase change simulated with MD [438]. It was possible to discover different transition pathways and their energy barriers and to determine that the transition proceeded by cooperative movement between the bilayers in the crystal.
Another type of phase change that is often encountered, is dehydration of a hydrate crystal to a lower hydrated form or to the anhydrous state [439,440]. This process has been captured for example for naproxen using MD as the dihydrate form of the sodium salt dehydrated to the monohydrate form during the simulation [441]. Dehydrated products are often of low crystallinity, which makes structure determination difficult with traditional single-crystal X-ray diffraction methods. Solving the molecular structure of such products requires, therefore, X-ray powder diffraction (XRD). Here, MD can assist by simulating the dehydration process; the XRD pattern of the dehydrated state found in the simulation can then be compared with an experimental powder pattern to solve the structure of the new phase [442,443]. Using MD-based free-energy perturbation methods, the free energy of inclusion of a solvated molecule can be calculated, which is relevant for understanding the stability of a hydrate [444].
The morphology of the crystalline phase of an API is very important for its behavior during processing because the shape of the microcrystalline particles influences the powder's flowability [445]. Several studies have targeted the crystal growth of organic molecules using MD and have been able to predict the morphology of experimental crystals with good accuracy [446]. The growth of a crystal is often modified using additives. MD simulations can be used to quantify the particular additive's effect on the growth rates of each crystal face [447].
At the macroscopic level, crystals of drug-like molecules often take days to grow but at the nanoscale, the growth of a single crystal layer is much faster and accessible with MD. Such events are typically simulated using enhanced sampling methods such as metadynamics, where a collective variable describing the crystallinity is used to construct a biased potential that enables escaping from local minima (Figure 9). The collective variable is modeled by calculating the orientation of a crystalline molecule with its neighbors. This allows for distinguishing between molecules in the crystal and molecules in the solution. By simulating several different faces of a crystal, the rates of the different ways a molecule can attach and detach to the crystal surface can be computed. The rates are dependent on the MD simulation and therefore parameters such as temperature, solvent composition, Processes 2021, 9, 71 31 of 60 and level of supersaturation can be varied. These rates can then be used as an input in a kinetic Monte Carlo simulation to predict the morphology of the crystal [448].
Processes 2021, 9, x FOR PEER REVIEW 31 of 63 Figure 9. Collective variable-based enhanced-sampling methods restrain the system along one or more reaction coordinates (so-called collective variables) by introducing an additional bias potential. (A,B) Metadynamics [277]. (A) After the simulation has reached a local minimum and is sampling around it, a biased potential is employed to enhance the sampling in that part of the potential energy surface to be able to push the system away from the local minimum to new regions. (B) In the end, the biased potential becomes larger than the free energy barriers, allowing the system to move freely from one minimum to the other. (C) Umbrella sampling [200]. The reaction coordinate is divided into separate overlapping windows and the biasing potential is applied to each window to efficiently sample the conformational space by multiple simulations along the reaction profile.
Another fundamental problem for crystalline formulations is the nucleation of a crystal from a solution, where changes in the experimental conditions can produce different polymorphic forms of the same API. Here, MD can be used to simulate the nucleation by preparing a simulation box consisting of a supersaturated solution of the API. MD simulations have shown that nucleation sometimes happens by a two-step process where density fluctuations of the solute produce amorphous clusters that can potentially turn crystalline and form the initial crystal nucleus [449]. This contrasts with the classical nucleation theory that assumes that a crystal nucleus forms directly from the solute [450]. For example, the nucleation of DL-norleucine crystals from an octanol solution has been studied by Ectors et al. [451] and they also found that pre-nucleation clusters were formed. The norleucine molecules formed bilayers of enantiopure domains that the authors theorize could transform into the racemic crystal structure.
Nucleation can also happen from a pure amorphous phase that can be created using methods such as ball milling or melting followed by quench cooling [452]. Understanding nucleation at the molecular level is very difficult using experimental techniques because of the small time and length scales involved. Using metadynamics, it is possible to simulate the nucleation event at the molecular level. Giberti et al. [453] were able to simulate the transition of urea from the amorphous phase to the known most stable crystalline polymorph and additionally found a new metastable polymorph.
Coamorphous systems have shown promise in drug delivery due to their enhanced stability compared with the pure amorphous phases [454]. However, MD simulations of such systems are still rare [455]. In the next paragraph, we will look at how MD simulations can facilitate the characterization of drug-polymer formulations that are commonly used for increasing the physical stability of amorphous solid drugs (e.g., preventing nucleation and subsequent crystallization).

Amorphous Drug-Polymer Formulations
In the field of amorphous drug formulations, the main challenge is the stability of the amorphous solid dispersions (ASD) [456,457]. Once the solid drug is amorphized, via, e.g., spray-drying, the molecules will tend to recrystallize [458], and the physical stability is further reduced if the compound is stored under humid conditions. The water acts as a plasticizer and will lead to time-limited stability of such drugs [459,460]. One way to extend the amorphous stage is the addition of polymers to the ASD. If the polymer is chosen Figure 9. Collective variable-based enhanced-sampling methods restrain the system along one or more reaction coordinates (so-called collective variables) by introducing an additional bias potential. (A,B) Metadynamics [277]. (A) After the simulation has reached a local minimum and is sampling around it, a biased potential is employed to enhance the sampling in that part of the potential energy surface to be able to push the system away from the local minimum to new regions. (B) In the end, the biased potential becomes larger than the free energy barriers, allowing the system to move freely from one minimum to the other. (C) Umbrella sampling [200]. The reaction coordinate is divided into separate overlapping windows and the biasing potential is applied to each window to efficiently sample the conformational space by multiple simulations along the reaction profile.
Another fundamental problem for crystalline formulations is the nucleation of a crystal from a solution, where changes in the experimental conditions can produce different polymorphic forms of the same API. Here, MD can be used to simulate the nucleation by preparing a simulation box consisting of a supersaturated solution of the API. MD simulations have shown that nucleation sometimes happens by a two-step process where density fluctuations of the solute produce amorphous clusters that can potentially turn crystalline and form the initial crystal nucleus [449]. This contrasts with the classical nucleation theory that assumes that a crystal nucleus forms directly from the solute [450]. For example, the nucleation of DL-norleucine crystals from an octanol solution has been studied by Ectors et al. [451] and they also found that pre-nucleation clusters were formed. The norleucine molecules formed bilayers of enantiopure domains that the authors theorize could transform into the racemic crystal structure.
Nucleation can also happen from a pure amorphous phase that can be created using methods such as ball milling or melting followed by quench cooling [452]. Understanding nucleation at the molecular level is very difficult using experimental techniques because of the small time and length scales involved. Using metadynamics, it is possible to simulate the nucleation event at the molecular level. Giberti et al. [453] were able to simulate the transition of urea from the amorphous phase to the known most stable crystalline polymorph and additionally found a new metastable polymorph.
Coamorphous systems have shown promise in drug delivery due to their enhanced stability compared with the pure amorphous phases [454]. However, MD simulations of such systems are still rare [455]. In the next paragraph, we will look at how MD simulations can facilitate the characterization of drug-polymer formulations that are commonly used for increasing the physical stability of amorphous solid drugs (e.g., preventing nucleation and subsequent crystallization).

Amorphous Drug-Polymer Formulations
In the field of amorphous drug formulations, the main challenge is the stability of the amorphous solid dispersions (ASD) [456,457]. Once the solid drug is amorphized, via, e.g., spray-drying, the molecules will tend to recrystallize [458], and the physical stability is further reduced if the compound is stored under humid conditions. The water acts as a plasticizer and will lead to time-limited stability of such drugs [459,460]. One way to extend the amorphous stage is the addition of polymers to the ASD. If the polymer is chosen properly, the drugs will have a high affinity to it and that will slow down the drug aggregation and crystallization process [460,461].
Solubility/miscibility and mobility are two of the several factors contributing to the stability of an amorphous formulation. For example, the formation of a hydrogen bond network between the drug and the polymer often plays a significant role in reducing the tendency of the amorphous drug to crystallize [462,463]. Experimentally, the molecular dynamics and crystallization kinetics of amorphous formulations are usually studied with broadband dielectric spectroscopy (BDS) and differential scanning calorimetry (DSC) [464][465][466], whereas the molecular interactions such as hydrogen bond distributions can be studied with Fourier-transform infrared spectroscopy (FTIR) and high-resolution solid-state NMR techniques, although with limited insights due to e.g., poor resolution [462,467].
The past decade has seen an increased interest in computational studies of drugpolymer formulations, Bradley D. Anderson's group being among the pioneers in employing MD simulations in the field [467][468][469][470][471][472][473][474]. Complementing the experimental methods, MD simulations are particularly useful for providing atomic-level structural insights into drug-polymer-solvent interactions and distributions and for exploring and predicting the thermodynamic and kinetic bulk properties of such formulations [472,[475][476][477]. For example, the glass transition temperature (T g ), an important parameter for the stability of amorphous formulations, can be determined dilatometrically from a series of MD simulations performed at different temperatures. Usually, the obtained T g values agree reasonably with the experiments, although they may be higher due to the faster cooling rates in the simulations [472]. The contribution of various groups of a polymer to the mobility of a drug molecule can be studied by analyzing the atomic fluctuations of the molecules along the simulation trajectories. Another important property to assess for ASD is the miscibility of the drug and the polymer. Despite the concerns about the applicability of the Flory-Huggins (FH) theory [478], a significant number of studies are based on the calculation of chi (χ), the FH interaction parameter [479,480]. In the majority of studies, chi represents how much molecules of type A prefer to interact with molecules of type B rather than both A and B interacting with themselves (applied to a binary system). As the next step, the miscibility can be used to estimate solubility parameters [470]. Importantly, MD allows us to study the formation of the hydrogen bonds between the drug molecules and polymers [473,481]. That, along with radial distribution function and other analysis routines, are powerful tools to quantify the nature of the drug-polymer interactions, for instance, whether hydrogen bonds are formed or whether it is the hydrophobicity that plays the major role in the specific placement of the drug molecules.
The frequent problem with MD simulations in this field is that the polymers are relatively long for the all-atomistic simulations [482]. In some works, researchers represent the polymer by a chain with a lower number of monomers [481]. In others, where atomistic details can be considered negligible, a coarse-graining is applied to get a faster qualitative analysis of the polymeric excipients [483].

Drug Solubility
According to the biopharmaceutical classification proposed by Amidon et al., active pharmaceutical ingredients can be divided into four groups [484] that define to which extent two main properties of importance for oral absorption, namely permeability and solubility, impact the bioavailability of a specific drug molecule. Absorption is a result of complex, multiphase molecular interactions between the molecules of interest, membrane components, and the environment at the interface between the gastrointestinal (GI) fluid and the cells. A number of computational studies are focused on the permeability issue, targeting for example the passage of small molecular drugs [485,486] or peptides [487][488][489] through the cell membrane, and the effect of permeability enhancers on the membrane permeability [490]. Poor aqueous solubility has emerged as the main problem in the field, with as much as 70-90% of contemporary drug molecules having too low solubility to allow absorption after oral intake [491]. Low drug solubility, therefore, frequently causes erratic exposure and clinical failure after oral dosing [492]. To improve aqueous solubility, a number of advanced strategies have been developed such as the use of cosolvents, surfactants, self-emulsifying drug delivery systems, mesoporous carriers, lipidbased and amorphous formulations [493]. All these techniques have their own pros and cons, but their optimization for a particular poorly water-soluble drug requires a thorough analysis. Not only may the experiments be expensive, but also the molecular level details and mechanisms on the scale of nanoseconds and nanometers often cannot be observed in the wet lab. Thus, computer simulations appear to be a useful tool, capable to predict the interactions of the drug molecules with the formulations and surrounding media of interest, e.g., GI fluids for orally delivered drugs. Apart from MD simulations, there are also other computational methods available; these can be generally classified as machine-learning-based methods requiring training on experimental data [494,495] and empirical methods exploiting the datasets to analyze the contributions of specific descriptors (functional groups) to solubility [496,497]. Nevertheless, classical MD not only allows one to determine the best parameters for the solubility improvement but also sheds light on the underlying mechanisms [471,498,499], thus, providing new ideas for further formulation development [500].
In MD studies, solubility is explored through calculations of the change in Gibbs free energy of the simulation box while the system is being transferred from one state to another. This is visualized in a simplified example provided in Figure 10, where a simulation box filled with only water molecules is compared to a similar box complemented with a hydrophobic molecule or a hydrophilic molecule. As compared to the purely aqueous system, the second box is going to have extra free energy, as the higher repulsion between the insoluble (I) molecule will make the solvent molecules more packed out of the cut-off radius of the I, which essentially moves the system further from the energy minima. In the third system, on the contrary, the stronger attraction will lead to the minimization of the Gibbs free energy, along with more stable placement of the molecules and lower entropy. Figure 10. Schematic representation of three similar simulation boxes. Inner circles represent the regions of strong repulsion (overlapping electronic clouds), outer stand for van der Waals attraction (except for the red, insoluble I molecule). On the left panel, a purely aqueous system is presented; in the middle, a hydrophobic molecule is put into an aqueous environment; the right panel demonstrates the system with a hydrophilic molecule.
The absolute value of solubility is not trivial to observe as it is difficult to compute the residual chemical potential of the solid form [501,502]. On the other hand, if the experimental value of solubility for one compound is known in the chosen media, it can serve as a reference for comparison in simulations: the reference system is simulated, its free energy is calculated and the same is repeated for a similar system with another compound. In the same way, the impact of various solvents can be estimated from a set of simulations where the reference system is compared with other simulation boxes containing the same molecule but different solvents [503]. The solubility of a compound can be evaluated in any solvent of interest based on the experimentally found solubility of the compound in another solvent, and the relevant free energies of solvation computed with MD. For instance, Chebil et al. [504] reproduced the solubility of quercetin in chloroform, water, acetonitrile, acetone, and tert-amyl alcohol within 0.5 log unit using the equation: Here c2 and c1 are solubilities in two different media (unknown and known, respectively), ∆G solv is the computed free energies of solvation in these media, R is the gas constant, and T is the temperature in Kelvin.
A similar approach is used for the calculation of excess solubility-the solubility in the actual solution relative to an ideal solution [505]. Nevertheless, several important assumptions need to be taken, the most important being that the amount of the solvent should not influence the solubility of the drug molecule. In practice, this is implemented by introducing just one molecule of the solute to the box with solvent (usually the periodic boundary condition is applied), which represents the infinite dilution [503].
Several computational approaches exist to calculate relative solubility values for such systems. The firsthand choice is to perform free energy calculations, for example, using free energy perturbation (FEP) [193,506], thermodynamic integration (TI) [195], or Bennett's acceptance ratio [194]. The underlying concept in them is based on the change of the energy of the system while the solute molecule is introduced to the solvent. For example, the solute molecule can be gradually inserted in (or deleted from) the box. This is implemented by reducing the strength of the Lennard-Jones and electrostatic interactions between the molecule of interest and the solvent molecules. The gap between the two states (the presence and absence of the solute molecule in the solvent) can then be subdivided into more configurations, so that the difference between the adjacent ones would be small, thus improving the accuracy of the method.
The Widom insertion method is an alternative approach where the solute particle is randomly inserted into various places across the box and the averaged energy of the interaction is used to calculate the excess chemical potential [507]. The main disadvantage of the method is the significant overlaps between the atoms, which makes it less suitable for relatively big solute molecules in the dense solvents. On the other hand, the method has shown a good agreement with experimental data for smaller molecules. Further, solubility can also be calculated through Grand Canonical Ensemble MD simulations [508,509]. The partition coefficient is yet another important value that can be observed from MD. It represents the difference in solubility of the solute in water and organic phases. Calculating partition coefficients can be as valuable as the solubility calculations, as it allows to evaluate the affinity of the solute molecule to the media at the interface with water [510]. For more details and methods, the reader is recommended to read the review on MD simulations for solubility calculations by Hossain et al. [511].

Nanoparticle Simulations
Recent advances in nanotechnology and applications of nanomedicines for site-specific drug delivery have increased the interest in nanoparticles (NPs) for drug discovery and pharmaceutical applications [69,512,513]. Currently, a number of different NPs with their unique physical properties are being investigated for pharmaceutical applications, including but not limited to, liposomes [514], inorganic or polymer-based nanoparticles [515][516][517], and carbon nanotubes [518]. The typical size of various nanoparticles can range between 1 and 100 nm. Due to this smaller length scale, molecular simulation is particularly suitable and extensively used in a number of studies for investigating different NPs' molecular structure [514][515][516][517], drug loading capacity [519,520], drug release mechanism [521][522][523], interaction with cell membrane, and membrane penetration process [524][525][526][527][528][529][530]. A number of excellent review articles are already available in the literature on the application of MD on various aspects of NP-based drug delivery systems [512,[531][532][533][534]. However, in this section, we will provide a review of recent articles where MD simulations have been applied to study some of the key structural and functional properties of different NPs that are necessary for the NPs to become successful in pharmaceutical applications.

Structural Characteristics
Structural characteristics of lipid-based NPs composed of stearic acid, oleic acid, and sodium dodecyl sulfate (SDS) have been studied by coarse-grained (CG) MD simulations [514]. Formation of vesicle-like structures with water entrapped inside the lipid droplet was observed. Such lipid-based NPs were proposed to be suitable for the delivery of both hydrophilic and hydrophobic drugs. MD simulations were used by Connell et al. to study the adsorption of gonadotropin-releasing hormone I (GnRH-I) and a cysteine-tagged modification of GnRH-I (cys-GnRH-I) to modeled silica surfaces [516]. The simulations revealed that a specific orientation of the peptides at the silica surface was the reason behind different immunological responses of two different peptide-silica NP systems. The simulations also showed that when the peptides were conjugated with bovine serum albumin, they formed more interaction points with the surrounding molecules which resulted in improved immunological responses. In another study, Monti et al. investigated the stability, aggregation, and sintering of gold (Au)-NPs functionalized with cysteine-based peptides using MD simulations with reactive force fields (ReaxFF) [517]. Their molecular level analysis suggested that the Au-NPs functionalized with peptides are physically stable and the peptide chains adsorbed to the gold surface hindered sintering of the particles. The stability of fluticasone NPs was studied by Ahmed and coworkers [515] with MD simulations. Fluticasone is a poorly water-soluble corticosteroid and six different polymers were used to prepare fluticasone nano polymeric particles. The MD simulations and calculated MM-GBSA binding free energies confirmed the experimental results, showing better binding of the drug to a copolymer as compared to the individual respective polymers.

Drug Loading and Release
MD has been employed to investigate the loading of cyclosporin A (CyA) by lipidbased NPs composed of medium-chain fatty acids and poly (ethylene glycol) molecules at various compositions [519]. The simulations showed that the increase of drug loading did not significantly change the NP nanostructure and drug exposure to the aqueous environment. The CyA molecules prefer to localize at the NP surface. However, if the particle is overloaded with CyA (increasing the number of drug molecules from 1 to 10), a submerged localization of CyA inside the NP was observed. The drug loading capacity of chitosan NPs was investigated by Mousavi et al. for donepezil and rivastigmine molecules [520]. Their simulation showed that in the presence of ions, the overall drug loading capacity of rivastigmine is relatively lower compared to donepezil in chitosan NPs.
Thota et al. [522] performed CGMD simulations to investigate the loading and release of ibuprofen in the amphiphilic peptide (AF) 6 H 5 K 15 (FA32) and its derivatives (F 12 H 5 K 15 and F 16 H 5 K 15 ). The simulations showed that the FA32 formed quasi-spherical NPs and the derivatives of FA32 formed nanofibers, respectively, when loaded with ibuprofen. It was also found that the release of ibuprofen was slower from the NPs than the nanofibers.
MD simulations can be employed to investigate the interaction of NPs with drug compounds as a function of the NPs' various functional and structural properties, i.e. surface modification with coating polymers, morphology, and size. For example, PEGylation of NPs, a process where NPs are coated with polyethylene glycol (PEG) polymers, can enhance the loading of hydrophobic drug molecules [535]. However, the binding affinities of such NPs and drug compounds should then be evaluated properly to better understand the release capabilities from such NPs. MD with ReaxFF has also been used to explore the structure, dynamics and drug release of Au-NPs functionalized with chitosan and loaded with gentamicin [521]. The simulations revealed that the loading of the drug is dependent on the size of the carrier and chitosan polymer chain length. The simulations also showed a slower release mechanism by a delayed gentamicin migration towards the solvent.
The size and internal microstructure of the NPs are also important in terms of drug release from the NPs. Depending on the binding affinity of the drug compounds with the NP constituent components, the drug can be loaded in the core or corona of the NPs. The release of drug compounds will highly depend on the location of the drug loaded at the NPs. Free energy calculations using SMD can play a vital role in understanding drug components' location inside the NP and their release mechanism. Such free energy calculations by Vukovic et al. [536] revealed the exact location of the hydrophobic bexarotene and a highly charged drug, human vasoactive intestinal peptide, inside a PEGylated NP. In a recent study by Kabedev et al. [537], it was shown that an NP composed of intestinal fluid components loaded with a hydrophobic drug, danazol, requires significantly less energy to release the danazol molecules from the NP surface to the membrane center than into the water phase. Unrestrained CGMD was employed to investigate the mixed polymeric-surfactant NP structure by Wu et al. [538]. The authors observed that the particle microstructure was affected by the different polymer-surfactant ratios, which subsequently affected the particles' drug loading ability.
MD simulations have also been used to evaluate the effect of the system pH and drug concentration required to achieve an optimal drug release and loading. Chitosan NPs were evaluated as carriers for the controlled release of cytarabine at pH 6.4 and 7.4 [523]. The release of cytarabine was found to be higher at pH 6.4 than 7.4, which agreed well with the experimental results. Koochaki et al. [539] investigated the self-assembly behavior of dual-responsive block copolymer and the resulting micelles' loading ability of doxorubicin (DOX) using both AAMD and CGMD simulations. The results of CGMD simulations showed that DOX can be efficiently loaded in the polymeric micelles. However, the loading efficiency was slightly decreased with the increasing drug concentration, mainly due to the decrease in particle size.
Overall, MD simulations can reveal important information on the size and internal molecular structure of NPs as well as on the system parameters such as pH and overall drug concentration, which can be utilized to achieve optimal NP design characteristics for desired drug loading and release.

Diffusion through Cell Protecting Layers
MD simulations were used by Yu and colleagues to better understand the diffusion behavior of lipid-based NPs in a biological hydrogel mimicking a mucus layer [540,541]. The simulations revealed that lipid-based NPs with moderate rigidity have higher diffusivity through the mucus layer compared to their soft and hard counterparts. NPs with moderate rigidity can deform into an ellipsoidal shape and diffuse through the mucus layer with rotational motions. MD simulations also captured the temperature-dependent diffusion mechanism where lipid-based NPs show higher diffusivity near the phase transition temperature or the temperature at which the liposome membrane transforms from a solid-like gel to liquid crystalline phase [541]. MD simulations were also employed to study NP diffusion through endothelial glycocalyx [542]. It was estimated that the force required to indent the glycocalyx surface increases significantly with the increase in NP size. The results also suggested that an improved diffusivity of the glycocalyx layer will be possible with positively charged NPs.

Direct Membrane Penetration
One of the attractive functional properties of NPs (more specifically cationic NPs) is their capability of direct translocation into the membrane without a significant toxic effect at lower concentrations [524]. However, the direct translocation mechanism is not clearly understood, and the knowledge of the molecular insertion mechanism(s) would be beneficial in designing NPs for efficient cellular uptake. MD simulations have shown a significant promise in modeling the direct insertion process. For example, SMD combined with umbrella sampling has been applied in a number of studies to understand the effect of particle size and shape [525], surface charge [525,526], and properties of ligands used as particle coatings [527,543] on cellular uptake of the NPs.
Regular or unrestrained MD has also been applied to investigate NP translocation through the membrane by taking some additional measures in the simulations such as using multiple NPs [528], applying an external electric [529] or magnetic field [530], or applying gradients of ionic concentration across the cell membrane [524]. In one such study, CGMD simulations were performed to investigate the interaction of multiple peptide-NP conjugates with the cell membrane [528]. The simulations showed that some NPs form a transmembrane pore and then the other NPs can permeate the membrane through that pore. Direct translocation of cationic Au-NPs was investigated by applying an external electric field using CGMD simulations [529,544]. The simulation results suggested that translocation of NPs can occur under a weak external electric field without permanently damaging the membrane. The effect of the magnetic field for the permeation of NPs through the blood-brain barrier endothelial cell membrane was studied by Pedram et al. [530]. They used iron oxide as the core and gold as the shell of the NPs in all-atom MD simulations. The results showed that the NPs of size between 5-45 nm can cross the cell membrane with the application of very week magnetic force (<5000 pN). Lin and Alexander-Katz showed that ionic concentration gradient across the cell membrane can induce the penetration of Au-NPs using CGMD simulations [524]. After the insertion process, the pore was selfresealed and the ionic concentration across the membrane was equilibrated. This study suggested that the gradient of different properties, such as pH or osmotic pressure across the membrane can be important for the direct permeation of NP.

Interactions with Viruses and Biomacromolecules
In addition to functioning as a drug delivery system, NPs can also interact with a relevant receptor and act like a drug itself [534]. In a recent study, it was found that Au-NPs coated with long and flexible ligands mimicking heparan sulfate proteoglycan (HSPG) molecules can associate with different classes of viruses and induce irreversible viral deformation [545]. To better understand the finding, all-atom MD simulations were performed to investigate the interaction of different NPs and viruses at physiological conditions. NPs were initially placed close to the virus capsid and the simulation showed that multivalent binding develops between the negative sulfonate group of the NP ligands and the positively charged L1 protein of the virus. The total binding energy and force applied on the L1 protein by each NP was calculated to be about −34 kcal/mol and 189 pN, respectively. Such force was able to deform the L1 protein, disturb the arrangement of L1 proteins of the virus, and ultimately destabilized the virus [545].
MD simulations have also been used to investigate the inhibition process of amyloid β (Aβ) fibrillation by using Au-NPs [546]. The simulations showed that Au-NPs can influence the conformational change of the peptides and reduce inter-peptide interactions which can in turn prevent the β-Sheet formation of Aβ peptides and prolong the progress of Aβ aggregation. The interaction of NPs covered with different types of ligands (i.e., positively and negatively charged, peptide, and synthetic inhibitor ligands) with Aβ40 peptides and their fibrils were also investigated using MD [547]. The simulations revealed that the peptide-NPs and negatively charged-NPs attached to the free peptides and fibril tips, respectively, which can potentially restrict peptide fibrillization and fibril growth. Positively charged NPs, on the other hand, attached to the peptide and fibril surface and deformed it [547].
Altogether, this section identifies a number of studies where MD simulations were used to investigate various structural characteristics and functionalities of NPs. MD simulations were also applied to improve the understanding of NPs' interaction with membranes, viruses, and macromolecules. The molecular-level description obtained using MD simulations might facilitate the successful NP-based product development in drug discovery and pharmaceutical applications.

Conclusions
Over the past 50 years, MD simulation techniques have established their relevance in modern drug discovery and development processes. With the increasing computer power and development of new force fields and enhanced sampling methods, it is now possible to simulate even big and complex (bio)molecular systems in time scales that can give important insights into real-life molecular events and interactions. Our example cases of sirtuins and RAS show how MD simulations can be successfully used to uncover the conformational dynamics of highly flexible target proteins, thus providing valuable insights for drug discovery and design. Moreover, MD-generated conformational ensembles can be helpful in ligand design against extremely flexible intrinsically disordered proteins. Enhanced MD approaches and force field development are key in tackling such challenging targets. MD simulations can also reveal important details of antibody-antigen interactions and help design novel antibody therapeutics with improved properties.
MD-based binding free energy calculations are widely used in the hit identification phase to improve the accuracy of ranking the putative hits. In addition, the design of drugs with improved binding kinetics benefits from various enhanced MD techniques that probe the drug residence times at the target site. Similarly, MD simulations can facilitate peptide docking by enhancing the conformational sampling and refinement of peptide-protein complexes in the development of peptide therapeutics.
MD simulations of membrane proteins embedded in a relevant cell membrane model have been challenging due to the big size of the simulation system, but especially coarsegraining has helped to reduce the computational cost. Inclusion of the membrane in MD simulations has revealed important insights into lipid-protein interactions, the function of membrane proteins, as well as the ligand entry and exit into the target binding site.
Apart from the first steps of the drug discovery process, MD simulations have also shown to be useful in pharmaceutical development and formulations studies. For example, crystalline and amorphous drugs, drug-polymer formulations, or drug-loaded nanoparticles can be studied by MD simulations to complement experimental studies. Molecular-level insights into such systems can help in improving the solubility, stability, or other properties of drug formulations.
In conclusion, MD simulations can serve as a powerful tool in facilitating the early phases of the modern drug discovery and development process. In the future, MD simulations will likely gain even more importance due to the theoretical and technological advancements in the field.