The NMR2 Method to Determine Rapidly the Structure of the Binding Pocket of a Protein–Ligand Complex with High Accuracy

Structural characterization of complexes is crucial for a better understanding of biological processes and structure-based drug design. However, many protein–ligand structures are not solvable by X-ray crystallography, for example those with low affinity binders or dynamic binding sites. Such complexes are usually targeted by solution-state NMR spectroscopy. Unfortunately, structure calculation by NMR is very time consuming since all atoms in the complex need to be assigned to their respective chemical shifts. To circumvent this problem, we recently developed the Nuclear Magnetic Resonance Molecular Replacement (NMR2) method. NMR2 very quickly provides the complex structure of a binding pocket as measured by solution-state NMR. NMR2 circumvents the assignment of the protein by using previously determined structures and therefore speeds up the whole process from a couple of months to a couple of days. Here, we recall the main aspects of the method, show how to apply it, discuss its advantages over other methods and outline its limitations and future directions.


Structure-Based Drug Design
Most biological processes rely on highly specific protein-protein or protein-ligand inter-molecular interactions. Understanding and manipulating these interactions is the ultimate goal of drug design. Drug research, as we know it today, dates back nearly 100 years with advances in chemistry, including Avogadro's atomic hypothesis, the benzene theory, and the ability to isolate and purify active ingredients from pharmaceutical plants [1]. The finding of active components was initially a serendipitous accident; a famous example is the discovery of penicillin by Alexander Fleming. However, while the ligand was found to have a specific effect, the target receptor(s) remained unknown. The search for the target was very time consuming, and the idea of rational design emerged as a possible solution to speed up the process. Drug research based on the structure-activity relationship, where a molecule is designed to specifically inhibit or promote an interaction, was augmented when X-ray crystallography started to be used to derive protein structures; the first protein structure determined by X-ray crystallography was that of myoglobin, work that led to a Nobel Prize in 1962. Nowadays, drug discovery commonly starts by screening large libraries of molecules or fragments against a carefully selected drug target, with identified binders further optimized by molecular refinement or fragment-based design approaches. This approach was enabled by advances in biology (e.g., biochemistry, molecular biology and genomics) that drive the search for better drug targets. Further, progress in chemistry and bioinformatics allowed for the synthesis and screening of enormous compound libraries. These methods, however, are very error prone and require validation, preferentially by a complex structure at atomic resolution. To obtain atomic-level structures, X-ray crystallography is still the most widely used method, followed by NMR spectroscopy and cryo-electron microscopy. The latter method is developing quickly (Nobel Prize in Chemistry 2017) and shows great potential in drug discovery for large systems. Of particular interest are methods that combine several approaches, including the Nuclear Magnetic Resonance Molecular Replacement (NMR2) method (Figure 1).

The NMR2 Method
NMR spectroscopy is often the only method able to determine the structures of complexes whose ligands are involved in very dynamic interactions or are in fast exchange. Unfortunately, NMR is rather slow in structure determination, since all atoms must be assigned to their respective chemical shifts, which requires long measurements and intensive analysis. However, most often only the binding site, rather than the whole protein, is of interest. In those cases, NMR2 represents a good alternative. NMR2 utilizes exact spatial information provided by solution-state NMR to locate and refine the binding pocket of the complex structure using an independent starting model of the receptor (e.g., the X-ray structure of a homolog), and performs this analysis without the need for protein resonance assignment. NMR2 has successfully determined several structures of complexes very accurately (within 1 Å) with only a few days of measurement and calculation time.

The NMR2 Protocol
To successfully use NMR2, the following steps are required (Figure 2):
(i) Sample preparation for NMR measurements. Uniform 13C,15N labeling or selective labeling schemes (e.g., isoleucine, leucine, and valine methyl labeling) can be used for the protein [2]. This can be achieved by recombinant expression, e.g., in E. coli [3]. Only one of the two molecules in the complex should be isotopically labeled. For strong binders, i.e., low µM and higher affinity (koff < ΔCS and koff < σ, where koff represents the dissociation rate, ΔCS the chemical shift difference of the bound and free states, and σ the cross-relaxation rate), an equimolar ligand to protein ratio is optimal; whereas for weak binders, i.e., high µM and lower affinity (koff > ΔCS), an excess of ligand is required to saturate the receptor as much as possible. This can be monitored by so-called chemical shift mapping experiments, where the ligand is titrated into the protein sample and binding is detected through perturbation of the backbone NH chemical shifts of the receptor in 1H,15N-HSQC or TROSY experiments [4-7]. Knowing the affinity of the small molecule for its receptor, the protein saturation can be calculated from the total protein and ligand concentrations. Here, PL, L, P and KD are the concentration of the complex, the concentration of the ligand, the concentration of the protein, and the affinity of the ligand for the protein, respectively. The subscript 'tot' stands for total concentration.
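The saturation formula itself is not reproduced in this copy of the text. As a minimal sketch, assuming a simple 1:1 binding equilibrium, the complex concentration is the physical root of the mass-action quadratic, [PL] = ((Ptot + Ltot + KD) - sqrt((Ptot + Ltot + KD)^2 - 4 Ptot Ltot)) / 2. The function names below are illustrative and not part of any NMR2 software.

```python
import math

def complex_concentration(p_tot, l_tot, kd):
    """Concentration of the complex [PL] for 1:1 binding.

    All three arguments must use the same concentration unit (e.g., µM).
    Assumes a single binding site and no competing equilibria.
    """
    s = p_tot + l_tot + kd
    # Smaller root of [PL]^2 - s*[PL] + p_tot*l_tot = 0 (the physical one).
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / 2.0

def protein_saturation(p_tot, l_tot, kd):
    """Fraction of protein that carries a ligand, [PL] / [P]tot."""
    return complex_concentration(p_tot, l_tot, kd) / p_tot

# Example: 100 µM protein, KD = 50 µM, equimolar versus five-fold ligand excess.
equimolar = protein_saturation(100.0, 100.0, 50.0)
excess = protein_saturation(100.0, 500.0, 50.0)
print(f"saturation at 1:1 = {equimolar:.2f}, at 5:1 = {excess:.2f}")
```

For a weak binder, the same calculation shows how much ligand excess is needed before the receptor is close to saturation.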
Magnetochemistry 2018, 4, 12
(ii) Recording experiments to assign the ligand. Usually standard NMR spectra are sufficient to assign the compound in the bound state, e.g., any combination of 13C 1D, 1D DEPT-90, and 1D DEPT-135 spectra [8], 2D 13C,1H-HMQC [9], 2D 13C,1H-HMBC, 2D 1H,1H-DQF COSY [10,11], F1,F2-15N,13C-filtered 1H,1H-TOCSY, or 2D F1,F2-15N,13C-filtered 1H,1H-NOESY spectra [12-21].
(iii) Measurement of the intra-ligand and ligand-protein inter-molecular distances. All distance restraints for NMR2 are derived from NOE (nuclear Overhauser enhancement) cross-peaks of F1-15N,13C-filtered 1H,1H-NOESY spectra [16-21]. These experiments suppress the intra-molecular NOE peaks from the receptor and render the spectra easier to interpret. In theory, any moiety of the receptor can be analyzed, but to reduce the ambiguity of possible options, the NOEs should be assigned to methyls, amides, or aromatics with respect to their chemical shifts. Focusing only on distinct groups of resonances in the receptor helps to minimize the computational time of the structure calculation. Using methyl groups has so far been successful for all complexes. In addition, the NOESY mixing times have to be chosen carefully. The optimal mixing times for the NOE build-ups depend on the correlation time of the complex. Too short a mixing time does not allow for enough transfer of magnetization, and inter-molecular NOE peaks stay weak or below the noise level. Too long a mixing time increases spin diffusion and leads to large signal intensities, but these require heavy calculations to translate into meaningful distances. In general, NOESY mixing times between 40 and 150 ms are reasonable for a 15-20 kDa protein exhibiting a correlation time of approximately 10 ns.
The slope of the linear growth of the NOE build-up curve contains the information about inter-proton distances. Under the assumption of an isolated spin-pair system, the inter-molecular NOE cross-peak intensity, ΔMij(t), is determined by the following quantities: ρi, the auto-relaxation rate of proton i; ΔMii(0), the initial magnetization; σij, the cross-relaxation rate; rij, the proton(i)-proton(j) distance; µ0, the permeability of free space; ħ, the reduced Planck constant; γH, the gyromagnetic ratio of the proton; and τc, the rotational correlation time of the protein-ligand complex [22,23].
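The displayed equations are not reproduced in this copy of the text. For orientation, the standard two-spin solution of the Solomon equations, written with the symbols defined above, is sketched below. The auxiliary symbols λ±, b, J(ω) and ω_H (the proton Larmor frequency in rad/s) are assumptions introduced here, and the original equation numbering cannot be recovered, so no numbers are attached.

```latex
% Cross-peak build-up for an isolated spin pair (i, j)
\frac{\Delta M_{ij}(t)}{\Delta M_{ii}(0)}
  = -\frac{\sigma_{ij}}{\lambda_{+}-\lambda_{-}}
    \left( e^{-\lambda_{-} t} - e^{-\lambda_{+} t} \right),
\qquad
\lambda_{\pm} = \frac{\rho_i+\rho_j}{2}
  \pm \sqrt{\left(\frac{\rho_i-\rho_j}{2}\right)^{2} + \sigma_{ij}^{2}}

% Cross-relaxation rate for a rigid, isotropically tumbling proton pair
\sigma_{ij} = \frac{b^{2}}{r_{ij}^{6}}
  \left[\, 6\,J(2\omega_H) - J(0) \,\right],
\qquad
b = \frac{1}{2}\,\frac{\mu_0}{4\pi}\,\hbar\,\gamma_H^{2},
\qquad
J(\omega) = \frac{2}{5}\,\frac{\tau_c}{1+(\omega\tau_c)^{2}}
```

For short mixing times the build-up reduces to ΔMij(t)/ΔMii(0) ≈ -σij t, which is the linear regime referred to above; for a slowly tumbling complex σij is negative, so the cross-peak grows with positive sign.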
From Equations (1) and (2), we can derive σij, given that fitting the decays of the ligand diagonal peaks provides the auto-relaxation rates and the initial magnetization. If the auto-relaxation rates of the protein groups are missing, because the protein diagonal peaks are suppressed in the F1-15N,13C-filtered 1H,1H-NOESY, the median of the other groups is a good estimate. The fits can be made using general software such as MATLAB, Python, or R, or using the previously published eNORA software, which contains an applet for fitting NOE build-up curves [24,25]. The influence of a slightly incorrect auto-relaxation rate on the inter-proton distance, rij, is negligible. However, the initial magnetization is crucial because it directly multiplies the cross-relaxation rate. After fitting all build-up and decay curves, we obtain a set of intra-ligand and inter-molecular protein-ligand cross-relaxation rates that need to be converted into distances.
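As an illustration of such a fit in Python, the sketch below estimates σij from a normalized build-up curve by a one-dimensional least-squares search. It assumes the standard isolated spin-pair model, auto-relaxation rates already known from the diagonal decays, and intensities already divided by ΔMii(0). The function names, the search bracket, and the synthetic data are hypothetical, and this is not the eNORA implementation.

```python
import math

def buildup(t, sigma, rho_i, rho_j):
    """Normalized cross-peak intensity dM_ij(t)/dM_ii(0) for an isolated spin pair."""
    mean = 0.5 * (rho_i + rho_j)
    root = math.sqrt((0.5 * (rho_i - rho_j)) ** 2 + sigma ** 2)
    if root == 0.0:
        return 0.0
    lam_plus, lam_minus = mean + root, mean - root
    return -sigma / (lam_plus - lam_minus) * (
        math.exp(-lam_minus * t) - math.exp(-lam_plus * t)
    )

def fit_sigma(times, intensities, rho_i, rho_j, lo=-5.0, hi=0.0, tol=1e-10):
    """Least-squares estimate of sigma (s^-1) by golden-section search on [lo, hi].

    The bracket assumes a negative cross-relaxation rate (slow tumbling) and a
    single minimum of the squared error inside the bracket.
    """
    def sse(s):
        return sum((buildup(t, s, rho_i, rho_j) - y) ** 2
                   for t, y in zip(times, intensities))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Synthetic, noise-free example: mixing times in seconds, rates in s^-1.
mixing_times = [0.04, 0.06, 0.08, 0.10, 0.15]
true_sigma, rho_i, rho_j = -0.14, 3.0, 5.0
data = [buildup(t, true_sigma, rho_i, rho_j) for t in mixing_times]
fitted = fit_sigma(mixing_times, data, rho_i, rho_j)
print(f"fitted sigma = {fitted:.4f} s^-1")
```

With real data, noise and spin diffusion at long mixing times can distort the fit, so restricting the fit to the shorter mixing times is a common precaution.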
To convert cross-relaxation rates into distances, the following has to be kept in mind. For a strong binder (slow exchange regime on the NMR time scale), the correlation time of the complex is the same as that of the protein, since the influence of the small molecule on the tumbling of the protein can be neglected. In this case, Equations (1)-(6) can be readily used.
In the case of a weak binder (fast exchange regime on the NMR time scale), the effective cross-relaxation rate is the population average over the free and bound states of the ligand [26,27]. Since the correlation time of the free ligand is on the order of picoseconds, the free-state term can be neglected and, as mentioned above, the correlation time of the complex can be approximated by the correlation time of the protein. Consequently, the effective cross-relaxation rate is defined by the bound population of the ligand and the correlation time of the protein. Finally, the correlation time of the protein can be determined by standard 15N relaxation experiments and used to convert the cross-relaxation rates to distances using Equations (4)-(6) [28].
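The conversion described above can be sketched in Python. The code assumes the standard rigid, isotropic-tumbling expression σ = (b²/r⁶)[6J(2ω) - J(0)] with b = ½(µ0/4π)ħγH² and J(ω) = (2/5)τc/(1 + ω²τc²), and, for a weak binder, the approximation σeff ≈ pbound·σbound. The function names, the 600 MHz spectrometer frequency, and the example values are illustrative assumptions.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T m A^-1
HBAR = 1.054571817e-34       # J s
GAMMA_H = 2.6752218744e8     # rad s^-1 T^-1

def spectral_density(omega, tau_c):
    """J(omega) for isotropic rigid tumbling, including the 2/5 factor."""
    return 0.4 * tau_c / (1.0 + (omega * tau_c) ** 2)

def dipolar_factor(tau_c, larmor_hz):
    """b^2 [6 J(2 omega) - J(0)] in m^6 s^-1; negative for slow tumbling."""
    omega = 2.0 * math.pi * larmor_hz
    b = 0.5 * MU0_OVER_4PI * HBAR * GAMMA_H ** 2
    return b ** 2 * (6.0 * spectral_density(2.0 * omega, tau_c)
                     - spectral_density(0.0, tau_c))

def sigma_from_distance(r_m, tau_c, larmor_hz):
    """Cross-relaxation rate (s^-1) for a proton pair separated by r_m metres."""
    return dipolar_factor(tau_c, larmor_hz) / r_m ** 6

def distance_from_sigma(sigma_eff, tau_c, larmor_hz, bound_fraction=1.0):
    """Distance (m) from a measured rate; bound_fraction < 1 for weak binders."""
    sigma_bound = sigma_eff / bound_fraction
    ratio = dipolar_factor(tau_c, larmor_hz) / sigma_bound
    if ratio <= 0.0:
        raise ValueError("sign of sigma is inconsistent with this tumbling regime")
    return ratio ** (1.0 / 6.0)

# Example: tau_c = 10 ns at 600 MHz, measured rate -0.07 s^-1, ligand 50% bound.
r = distance_from_sigma(-0.07, 10e-9, 600e6, bound_fraction=0.5)
print(f"distance = {r * 1e10:.2f} Angstrom")
```

Because the distance scales with the sixth root of the rate, moderate errors in the bound fraction or in τc translate into comparatively small distance errors.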
A second way to derive distances from cross-relaxation rates is to use known fixed intra-molecular distances within the ligand (e.g., between protons in an aromatic ring) to calibrate the cross-relaxation rates and derive all other distances using Equation (4). As a rule of thumb, the intra-ligand distances should be slightly shorter than the inter-molecular distances: the median of all distances should be around 4.0-4.2 Å, while the median of the inter-molecular distances is around 4.4 Å.
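Because the cross-relaxation rate scales as r⁻⁶ for a fixed correlation time, this calibration reduces to r = rref·(σref/σ)^(1/6). The short Python sketch below assumes that the reference pair and the pair of interest share the same effective correlation time; the reference distance of 2.5 Å for neighbouring aromatic protons and the rates are example values, not data from the text.

```python
def calibrated_distance(sigma, sigma_ref, r_ref):
    """Distance from a cross-relaxation rate, calibrated on a known reference pair.

    sigma and sigma_ref must have the same sign and unit; the result has the
    unit of r_ref. Assumes both pairs experience the same correlation time.
    """
    if sigma == 0.0 or sigma_ref / sigma <= 0.0:
        raise ValueError("rates must be non-zero and of the same sign")
    return r_ref * (sigma_ref / sigma) ** (1.0 / 6.0)

# Example: a rate 10 times weaker than the reference pair at 2.5 Angstrom.
r = calibrated_distance(sigma=-0.1, sigma_ref=-1.0, r_ref=2.5)
print(f"calibrated distance = {r:.2f} Angstrom")
```

One attraction of this route is that no separate measurement of the correlation time is needed, at the price of relying on a rigid reference pair within the ligand.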
(iv) Choosing the input structure. As an input structure, the protein in its apo form, the protein with another bound ligand, or a homolog can be used to derive a starting model of the receptor. Either X-ray or NMR structures can be provided. In the current state of the program, the user should prepare the following input files: a CYANA-regularized protein PDB file; a ligand CYANA library file, which can be generated with the program cylib [29]; and a sequence file containing the amino acid residues of the protein, followed by sufficient linker residues (long enough for the ligand to access the entire protein surface) and the ligand residue name as defined in the ligand library file. All these files are needed to produce the starting structure of the complex, in which the protein structure is identical to the chosen receptor and the ligand is randomly positioned in space but attached to the protein by the linker. Further details can be found in the CYANA manual.
(v) Running NMR2. The NMR2 program screens all possible assignment moieties (usually methyl groups) of the protein and calculates the complex structures for all options. However, it is crucial to reduce the number of options in order to complete the calculations in a reasonable amount of time. This is achieved primarily by using only a fraction of the inter-molecular distances in the first calculation cycle, where only around 3-4 methyl groups of the protein are taken into account. The use of an input structure, the previously derived network of inter-molecular distances, and the use of triangle or tetrangle smoothing to rule out most of the false assignment possibilities are equally important for a manageable calculation time. As of now, NMR2 is a CYANA-based program and calculates all structures using the standard simulated annealing protocol [30]. The results are scored with respect to the target function, which represents a measure of how well the calculated structure fulfills the data. CYANA is the most widely used NMR structure calculation program; it is based solely on experimental data and on the repulsive part of the van der Waals potential, which models the atom radii. No other force field is used, and therefore the electrostatic potential of the molecules is not modelled. Nonetheless, if specific interactions are known or determined by experiments, they can be added following the program syntax [30]. Only the best structures are kept for the next calculation cycle, where more methyl groups with their respective inter-molecular distances are included. The calculation is finished when all experimental data have been used.
(vi) Analyzing the results. The final complex structures have to be analyzed carefully to detect potential errors. NMR2 requires a definition of the receptor flexibility; however, if there are no restraints on backbone and side chain atoms, the protein will move freely to fulfill the distance restraints, which could potentially yield false positives. Another source of false positives arises when the ligand finds its binding site at the N- or C-terminus of the protein or where the protein atom density is lowest. There, the ligand can freely adopt its position and orientation to fulfill the distance restraints because little or no steric inter-molecular interaction is present. One should keep in mind that this happens only if the protein contains methyl groups at these sites.
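The pruning idea in step (v) can be illustrated with a toy Python example. If one ligand proton shows NOE distances d1 and d2 to two unassigned methyl peaks, the two methyl groups assigned to those peaks can be at most d1 + d2 (and at least |d1 - d2|) apart in the input structure, so many assignments can be discarded before any structure calculation. This is only a sketch of a triangle-inequality filter under those assumptions, not the NMR2 or CYANA implementation; all names, coordinates, and the tolerance are hypothetical.

```python
import math
from itertools import combinations, permutations

def is_consistent(assignment, methyl_xyz, peak_distances, tol):
    """Check every pair of assigned peaks against the triangle inequality."""
    for peak_p, peak_q in combinations(sorted(assignment), 2):
        separation = math.dist(methyl_xyz[assignment[peak_p]],
                               methyl_xyz[assignment[peak_q]])
        # Only ligand protons that see both peaks constrain this pair.
        shared = set(peak_distances[peak_p]) & set(peak_distances[peak_q])
        for proton in shared:
            d_p = peak_distances[peak_p][proton]
            d_q = peak_distances[peak_q][proton]
            if separation > d_p + d_q + tol or separation < abs(d_p - d_q) - tol:
                return False
    return True

def feasible_assignments(methyl_xyz, peak_distances, tol=0.5):
    """Enumerate peak-to-methyl assignments that survive the triangle filter.

    methyl_xyz:     methyl name -> (x, y, z) in Angstrom from the input structure
    peak_distances: peak label -> {ligand proton: NOE distance in Angstrom}
    """
    peaks = sorted(peak_distances)
    kept = []
    for methyls in permutations(sorted(methyl_xyz), len(peaks)):
        assignment = dict(zip(peaks, methyls))
        if is_consistent(assignment, methyl_xyz, peak_distances, tol):
            kept.append(assignment)
    return kept

# Toy input: three methyl groups on a line, two peaks seen by ligand proton H1.
methyl_xyz = {"A": (0.0, 0.0, 0.0), "B": (6.0, 0.0, 0.0), "C": (20.0, 0.0, 0.0)}
peak_distances = {"X": {"H1": 3.5}, "Y": {"H1": 3.5}}
survivors = feasible_assignments(methyl_xyz, peak_distances)
print(survivors)
```

In this toy case the distant methyl group C can never be paired with A or B, so four of the six possible assignments are removed without any simulated annealing.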
Finally, the quantity and quality of inter-molecular distances are critical. While theoretical considerations indicate that as few as six distances should, in principle, be sufficient, in practice we observe that ~12-15 distances are the minimum needed to calculate an NMR2 structure (vide infra).
The NMR spectra should have a high signal-to-noise ratio as well as good resolution. They should also be free from water suppression artifacts, e.g., those introduced by the 'W5' pulse-train water suppression or by excitation sculpting, which strongly modify nearby peak intensities [31].

Current Applications of NMR2
NMR2 has been successfully applied to calculate complex structures containing ligands in fast and slow exchange (Figure 3) [32,33]. Structures containing strong binders, where the ligand is a peptidomimetic (MDMX-comp2, Figure 3a) or a small compound (HDM2-pip, HDM2-nutlin complex, Figure 3b,c), have been determined with an accuracy relative to the reference structures of 0.9-1.5 Å. To date, the receptors have been up to 32.1 kDa in size, exemplified by ABL kinase-dasatinib (in silico data), where the NMR2-derived structure has a root-mean-square deviation (RMSD) of 1.1 Å to the previously published complex (Figure 3d). For ligands in fast exchange, two structures have been determined so far (MDMX-SJ212 and HDM2-#845, Figure 3e,f). The NMR2-derived SJ212-MDMX structure is consistent with the previously published complex structure, with an RMSD of 1.35 Å. HDM2-#845 represents a new complex, for which no previous structure existed. A thorough validation of the NMR2 structure was performed, showing the correctness of the structure with 3D 15N,13C-resolved 1H,1H-NOESY and F1-13C,15N-filtered 3D N-resolved 1H,1H-NOESY-HSQC spectra, Saturation Transfer Difference (STD) experiments, and chemical shift perturbations [33].
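For readers who want to reproduce this kind of comparison, a coordinate RMSD can be computed as in the following Python sketch. It assumes that both structures have already been superimposed on the receptor and that the compared atoms are listed in the same order; the text does not state which atoms or which superposition were used for the values above, so the function and the coordinates are purely illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical ligand atoms (Angstrom) in a calculated and a reference structure.
calculated = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(f"RMSD = {rmsd(calculated, reference):.2f} Angstrom")
```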
Figure 3. All complex structures so far solved by NMR2. They consist of four high-affinity ligands (a-d) and two low-affinity ones (e, HDMX-SJ212; f, HDM2-#845), all of which are consistent with previously published structures: (a) 3fea [34] with an RMSD of 1.1 Å, (b) 5c5a [32] with an RMSD of 0.9 Å, (c) 2lzg [35] with an RMSD of 1.5 Å, (d) 2gqg [36] with an RMSD of 1.1 Å (in silico data), and (e) 2n0w [37] with an RMSD of 1.35 Å; (f) represents a complex with a ligand having a new scaffold, for which no other structural data were known, and therefore it is compared to the complex structure with nutlin (5c5a). The NMR2-derived structures are shown in orange and the reference structures in green.
How many distances are needed to successfully run NMR2 depends on the complex structure: how large and well defined the binding pocket is, how flexible the ligand is, etc. For the previously published complexes, the ligands in slow exchange contained 16-23 inter-molecular restraints, or 29 in silico restraints (for ABL kinase-dasatinib), all of which comprise distances between methyl groups of the receptor and the ligand protons. For the weakly binding ligands, 14 and 21 inter-molecular distances between the ligand and the receptor were collected, with most distances involving methyl groups of the receptor. However, in the case of HDM2-#845, one distance involved either an amide or an aromatic group of the protein.
Choosing the right input structure for NMR2 is not very critical. In the example of HDMX in complex with cmpd2, it was shown that the input structure can be the apo-protein, a structure with another ligand, or a homolog. The input structures can be determined by either NMR or X-ray [32]. Remarkably, NMR2 also succeeded in finding the right complex structure of the binding site using an apo-protein as the input structure in which the ligand binding site was closed by one receptor helix. This case was very challenging, since the receptor undergoes an allosteric conformational change upon ligand binding, which moves the helix away from the binding site. During the NMR2 calculations, enough flexibility was given to the loops, while the helices and β-sheets were constrained by hydrogen bonds; this finally yielded an NMR2 structure with an RMSD of 1.8 Å to the previously published structure.

NMR2 versus Other Methods for Rapid Structure Calculations of Protein-Ligand Complexes
Most complex structures are analyzed by X-ray crystallography due to its speed and high degree of automation. However, weak binders often do not crystallize well. Furthermore, X-ray structures do not contain information on dynamics, and crystal packing can lead to artifacts. The latter is demonstrated in the case of HDM2-nutlin, where the NMR2 structure differs from previously published structures (PDB: 4hg7, 4e3j) that contain crystal packing artifacts, but matches the artifact-free structure, 5c5a, very well [32]. In cases involving weak binders, NMR spectroscopy is currently the best method to provide high resolution structural data. Recently, approaches have been proposed to derive structures and/or dynamics of protein-ligand complexes by NMR more efficiently than with the traditional structure calculation protocol, including the use of ambiguous restraints [38,39] (such as ambiguous NOEs), chemical shift perturbations [40-44], or saturation transfer experiments [45,46] in combination with computational methods such as docking and scoring [47-49]. Here, we describe the advantages and disadvantages of NMR2 relative to some of the most commonly used techniques to quickly determine complex structures by NMR. The methods can be divided into two main classes, depending on whether the data are derived from chemical shift perturbation (CSP) or from NOEs.
Methods using CSP usually record a 1H,15N-HSQC spectrum of the apo-receptor, where each peak corresponds to one amino acid of the backbone. The ligand is then titrated in, and residues in close proximity to the interaction site are perturbed. These shifts are remarkably large when caused by ring currents produced by aromatic moieties in the ligand. While CSP is difficult to interpret quantitatively, progress in simulations and in correlating shifts with secondary and tertiary structure has made it possible to translate chemical shifts into structural restraints [50-52]. This made a more quantitative interpretation of CSP possible [6,44,53-58]. One example of quantitative CSP is the J-surface-based method: it uses the finding that most drugs have aromatic rings involved in binding (95% in one major drug design database [43]) and that the chemical shift difference due to the ring current shift can be converted into a distance [59]. This information is used to construct a so-called J-surface, built from spheres whose radii are these distances, on which the ligand could be located. The intersection of the spheres from all of the shifted protons represents the ring location. Because of the complexity of the dependence of the chemical shifts on the structure of the complex, the structure prediction initially requires a spatial sampling and scoring step to define the ligand binding site (the high density region of the J-surface). This is subsequently followed by an optimization of the ligand binding mode based on experimental restraints.
An advantage of CSP-based methods over NMR 2 is that, in many cases, CSP is detectable even when no inter-nuclear NOEs are observable [53].Poor solubility, low affinity, conformational variation of the ligand or few protons in the ligand are the most common difficulties that limit the detection of NOEs.
The disadvantages of CSP are that the protein backbone resonances have to be known for the free and the bound state, which can be very time consuming or sometimes not possible.The latter can occur when the protein undergoes chemical exchange in the intermediate regime, which leads to severe intensity loss of the amide resonances, like in the case of the apo-HDMX.Furthermore, chemical shifts are generally measured for the protein backbone atoms, but usually side chains (such as methyl groups) are primarily involved in binding of the ligand.Note, the CSP method works also on shifts on side chain atoms; however, this would require resonance assignment of the whole protein and is therefore usually not performed.Additionally, CSP works best for weak binders in fast exchange with the receptor (usually K D weaker than 1 µM) so that the resonances can be followed during a titration.Additionally, CSP will not be treated differently for ligands with slight chemical modifications.This is a clear drawback since often already small chemical modification in the ligand can induce a change in its orientation.Finally, the CSP-based methods also use a docking scoring protocol that relies on force fields or scoring function.The most popular program used in NMR is CSP-HADDOCK [40,47,60], which can make use of a large set of additional experimental restraints such as residual dipolar couplings or pseudocontact shifts [41,61,62].Other docking programs are BiGGER [63], AutoDockFilter [64], SAMPLEX [65], and LIGDOCK [47].
The second class of protein-ligand structure determination methods involves the usage of NOEs or spin diffusion as experimental restraints. Example methods include SOS-NMR [45,47], NOE matching [49], INPHARMA [48,66,67], CORCEMA [46,68,69], or NMR2 [32,33]. Except for NMR2, these methods require a docking step prior to the experimentally based scoring of the found poses and eventually perform an experimentally based refinement step. For example, the NOE matching method generates trial ligand binding poses (e.g., from docking), uses them to back predict the 3D 13C-edited-13C,15N-filtered HSQC-NOESY spectrum and scores each complex with respect to how well its back predicted spectrum matches the measured data. This method has the same advantages as NMR2: there is no need for protein resonance assignment and one sample is sufficient for these studies. Similarly, as is the case for NMR2, NOE matching is applicable for ligands in fast and slow exchange. One limiting factor is the strong dependence on the input binding poses. The true binding poses have to be sampled in the first place in order to be found by the program.
SOS-NMR (structural information using Overhauser effects and selective labeling) utilizes STD NMR on many ligand-protein complexes where the receptor is labeled specifically on certain amino acid types while the rest of the receptor is deuterated. With this approach, STD shows the contacts to the specific amino acid types in the receptor and the NOEs provide the respective distances. SOS-NMR gives the amino acid composition of the ligand binding site and, if an input structure of the receptor is available, leads to the 3D structure of the complex. The advantages are that no protein resonance assignment is necessary, only a very small amount of protein (less than 1 mg) is needed, and it is applicable to high molecular weight targets since only the free ligand is detected. The disadvantages are that many samples with specific labeling schemes are required, which may be tedious, and that a prior docking step of the ligand into the binding site is needed, for example with DOCK [70,71].
CORCEMA [46,68,69] and INPHARMA [48,67,72,73] are methods that back predict intra-ligand, intra-protein and protein-ligand NOEs or spin diffusion using the full relaxation matrix formalism. They are powerful tools that can also handle systems undergoing multistate conformational exchange and chemical exchange between the free and bound states. Protein resonance assignment is not required but input structures of the complex should be provided as well as the exchange rates and the correlation time of the complex. The INPHARMA method additionally requires two ligands that compete for the same binding site. As for the other methods, the back predicted data are compared to the experimental data to assess the quality of the docking poses.
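The full relaxation matrix formalism mentioned above can be sketched in a few lines. The example below is a toy model only: it assumes the slow-tumbling limit with cross-relaxation rates proportional to r^-6, uses illustrative values for the rate constant `k` and the leakage term `leak`, and omits the chemical and conformational exchange terms that CORCEMA and INPHARMA include. The function names are hypothetical and do not correspond to the API of either program.

```python
import numpy as np

def relaxation_matrix(coords, k=1000.0, leak=0.5):
    """Toy relaxation matrix R (s^-1) for protons at `coords` (Å).

    Assumptions: sigma_ij = -k / r_ij**6 (slow tumbling) and
    rho_i = sum_j k / r_ij**6 + leak. `k` and `leak` are illustrative.
    """
    coords = np.asarray(coords, dtype=float)
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(r, np.inf)            # exclude self-interaction
    rates = k / r**6                       # pairwise dipolar rates
    R = -rates                             # off-diagonal: cross-relaxation
    np.fill_diagonal(R, rates.sum(axis=1) + leak)  # diagonal: auto-relaxation
    return R

def noe_matrix(R, tau_m):
    """Normalized NOESY intensities A(tau_m) = exp(-R * tau_m) for symmetric R."""
    w, v = np.linalg.eigh(R)
    return (v * np.exp(-w * tau_m)) @ v.T

# Three protons on a line, 2.5 Å apart: A and C are 5 Å from each other
coords = [[0.0, 0.0, 0.0], [2.5, 0.0, 0.0], [5.0, 0.0, 0.0]]
R = relaxation_matrix(coords)
tau_m = 0.3                                # mixing time in seconds
A = noe_matrix(R, tau_m)
isolated_pair = -R[0, 2] * tau_m           # initial-rate estimate for the A-C pair alone
print(f"A-C cross peak, full matrix: {A[0, 2]:.3f}; isolated pair: {isolated_pair:.3f}")
```

In this toy system the A-C cross peak from the full matrix is much larger than the isolated-pair estimate because magnetization is relayed through the middle proton. This is the spin diffusion effect that such back prediction methods account for explicitly.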
To summarize, NMR2 is currently a purely NOE-based method and requires at least ~12–15 inter-molecular NOEs. This is a limiting factor, especially for low-affinity binders, which may lack enough or sufficiently strong inter-molecular NOEs. Furthermore, NMR2 is not applicable to completely unknown complexes or protein families, since it requires an input structure. NMR2 is applicable to most exchange regimes, the only limit being the case of severe exchange broadening. The main advantages are that it does not need any protein resonance assignment, relies on simple and interpretable NMR experiments, requires only one sample, and performs standard NMR structure calculations instead of relying on docking poses. It provides the full complex structure of the binding site with high accuracy, since the distance restraints are based on accurate NOEs [24,74,75]. Additionally, it is applicable to weak and strong binders in fast or slow exchange, and the method is fast.

Conclusions and Outlook
X-ray crystallography molecular replacement [76] is the prime method used to establish structure-activity relationships of relevant small molecules [77]. Such an approach was not feasible by NMR, as NMR structure determination relies on the assignment of the protein resonances, which can be extremely long and tedious [28]. In recent decades, various methods have been developed in order to derive protein-ligand complex structures faster than with the classical NMR structure calculation protocol, but these methods mostly rely on a preliminary docking step rather than on experimentally driven calculations. Moreover, partial resonance assignments of the receptor are sometimes required [39–45,47,49,72]. A complex structure calculation method that is based on defined and accurate NOEs [78–80] but also bypasses the long and tedious protein assignment step was missing. The NMR molecular replacement method (NMR2), a new molecular replacement-like approach in NMR, allows for the fast determination of protein-ligand complex structures and therefore fills an important gap in structural biology. NMR2 yielded the structures of ligand (peptide and small molecule)/protein complexes with an accuracy of 1 Å. It requires the measurement of a few accurate inter-molecular distances and only a model of the protein receptor. It is a highly efficient way to determine protein/ligand complex structures, without the need to perform the tedious protein resonance assignment, and structures can be calculated within a couple of days. The method was demonstrated on several different complexes with strong or weak binders and will potentially compete with X-ray crystallography for rapid complex structure determination. Furthermore, the development of specific methyl labelling schemes and automatic methyl resonance assignment methods has opened an avenue toward the study of large molecular complexes [81–83]. Since our method strongly relies on sharp methyl NMR signals, the path to structure-based drug design on large systems, where classical NMR methods are limited, is wide open. We foresee great potential for our NMR Molecular Replacement method in drug discovery research, where structural information is the gold standard for rational design of new active molecules.

Figure 1. Nuclear Magnetic Resonance Molecular Replacement (NMR2) derives the complex structure of the binding site within a few days without protein resonance assignment and using only standard 2D NMR experiments.


Figure 2. Overview of the NMR2 method. The following steps are required for NMR2 to determine the complex structure of the binding pocket: (i) Sample preparation for NMR measurements; (ii) Recording experiments to assign the ligand; (iii) Measurement of the ligand intra- and ligand-protein inter-molecular distances; (iv) Choosing the input structure; (v) Running NMR2; (vi) Analyzing the results.


Figure 3. All complex structures so far solved by NMR2. They consist of four high-affinity ligands (a–d) and two low-affinity ones (e,f), all of which are consistent with previously published structures: (a) 3fea [34] with an RMSD of 1.1 Å, (b) 5c5a [32] with an RMSD of 0.9 Å, (c) 2lzg [35] with an RMSD of 1.5 Å, (d) 2gqg [36] with an RMSD of 1.1 Å (in silico data), and (e) 2n0w [37] with an RMSD of 1.35 Å; (f) represents a complex with a ligand having a new scaffold, where no other structural data were known, and therefore it is compared to the complex structure with nutlin (5c5a). The NMR2-derived structures are shown in orange and the reference structures in green.