Lead Generation and Optimization Based on Protein-Ligand Complementarity

This work proposes a computational procedure for structure-based lead generation and optimization, which relies on the complementarity of the protein-ligand interactions. This procedure takes as input the known structure of a protein-ligand complex. Retaining the positions of the ligand heavy atoms in the protein binding site it designs structurally similar compounds considering all possible combinations of atomic species (N, C, O, CH3, NH, etc). Compounds are ranked based on a score which incorporates energetic contributions evaluated using molecular mechanics force fields. This procedure was used to design new inhibitor molecules for three serine/threonine protein kinases (p38 MAP kinase, p42 MAP kinase (ERK2), and c-Jun N-terminal kinase 3 (JNK3)). For each enzyme, the calculations produce a set of potential inhibitors whose scores are in agreement with IC50 data and Ki values. Furthermore, the native ligands for each protein target, scored within the five top-ranking compounds predicted by our method, one of the top-ranking compounds predicted to inhibit JNK3 was synthesized and his inhibitory activity confirmed against ATP hydrolysis. Our computational procedure is therefore deemed to be a useful tool for generating chemically diverse molecules active against known target proteins.


Introduction
When a high-resolution structure of a ta rget protein is known, computational structure-based drug design is an efficient and effective methodology for the identification and fur ther optimization of hit compounds in order to generate lead co mpounds. Several studies [1][2][3] have reported the succe ssful identification of hi t molecules by in sil ico scr eening of large compound datab ases using software packages such as DOCK [4], G OLD [5,6] and eHiTS [7]. In many c ases, how ever, iden tified hit s exhibit w eak biological a ctivity and poor Adsorption, Distribution, Meta bolism, Ex cretion and Toxicity (ADMET) properties, m aking them uns uitable scaffolds for further optim ization. Consequently, new strategies have to be developed in order to derive molecules with better biological properties from such hits.
Standard hit co mpound optimization approaches involve the addit ion, r eplacement or rem oval of chemical gr oups within the hit molecule. H owever, enhancing the biological activity of the hit often requires a more drastic modification of the core molecular s keleton [8]. De novo design [9-1 3] and scaffold hop ping techniques [14][15][16][17][18][19][20][21][22][23][24][25][26] are ex amples of methods that invo lve su ch modifications. The general assumption underlying these methods is that compounds with similar geometries will interact in a similar manner with the ta rget protein and t herefore, w ill show sim ilar or improved i nhibitory activity. This assumption is based on the lock-and-key model for p rotein-ligand interactions [27] and most of the methods are based in making changes in the n ative moiety of t he ligand scaffolds and their geometries.
In this p aper, w e report th e application of a co mputational method for structure-based ligand optimization to t hree serine /threonine protein kinase s ystems: p38 MA P kinase, p42 MAP Kinase (Erk2), and c-Jun N-terminal kin ase 3 (JNK3). Our method uses as st arting p oint, th e atomic coordinates of the p rotein-ligand complex, determined by X-ray Crystallography, NMR, o r modeling techniques [28]. A large nu mber of compound candidates are g enerated by replacing the atoms of the native ligand with different substituents, keeping fixed the ato mic positions of its core skeleton. Each new co mpound replaces t he n ative l igand in the protein active site co mplementarity and the ne w protein-ligand co mplex i s scored in or der to rank lead candidates. The top rank ing co mpounds ar e selected for furthe r analy sis. The innovati on of our approach l ies in t he wa y in which the librar y o f compound candidates is generated as w ell as in the r apid identification of the ch emical groups (and combinations t hereof) t hat is likely to enh ance biolog ical ac tivity. Our method i dentifies th e native ligands a mong the top f ive h it compounds for t he three serine/threon ine Kinases analyzed here . In addition, the fi ve top r anking co mpounds e xhibit significant IC50 lev els against ATPase activity [29,30]. In the JNK 3 s ystems, a co mpound fro m the 10 top ranking cand idates w as chem ically synthesized and IC50 measurements showed inhibitory activity against ATP hydrolysis. These results suggest that our method can be useful in the identification and generation of lead compounds as drug candidates.

Results and Discussion
Our a pproach was tested using t hree serine/threonine pro tein kinases as t argets. The X-ray protein-ligand c omplex stru ctures used in t his study were: p38 MAP [1,2]thiazine-3-carboxamide (in h ouse cod e Z1208) c omplex obtained from the Protein Data Bank (PDB) [31] (see Table 1). In preparing the input structures for our calculations, the FPH and Z1208 were modified as mentioned in Experimental section ( Figure 1).

Redesign of the FPH ligand
For the FPH-p38 MAP kinase complex, 230 candidate molecules were obtained by our calculations after the various filters have been app lied. Our results show that the preferred atoms for position 5 in the 6-membered ring (see Figure 1c) were either -O-or -N=, to enable hydrogen bond formation with the Nζ of Ly s53 (numbering as in the PDB entry 1oz1) (see Figure 2a). The sam e hydrogen bond i s observed in the or iginal structures of th e p38-F PH co mplex [30]. In order t o maintain this hy drogen bond, bu t also satisfy the bond orders of t he 5-and 6-condens ed membered rings, speci fic combinations of the ato mic species containing single and double bonds are required. For exa mple, if an -O-group is assigned at position 5 (see Table 2), other positions in the molecule could be occupied by either sp2 atoms (ex. compound 1, 5 and 8), or the combination of two sp3 atoms and six sp2 atoms (ex. c ompound 3, 4). If a n -N = group is assigned at position 5, the o ther positio ns i n th e molecule could be occupied by one sp3 atom and seven sp2 ato ms ( for example, co mpounds 2, 6 ). Detailed analysis of the p38-FPH binding site , revealed t hat the co ndensed FPH ring formed hy drophobic interactions with Val38, Leu171, and the aromatic ring of Tyr35 (see Figure 2). These interactions are maintained by so me of the co mpounds descr ibed earl ier in this p aragraph. In su mmary, our method assigned a hydrogen bond acceptor groups to position 5 and hydrophobic groups to the other positions in the condensed rings thereby satisfying binding features observed in the original crystal structures.  We find that the scores computed by our method for the four compounds 9, 22, 24, and 35, correlate well with the experimental IC50 values (see Table 3). In addition, the suggested top-scoring compound (compound 9), also features the lowest IC50 among the six compounds. In this candidate, the geometry of native FPH (see Figure 1) was modified by replacing a hydroxyl group with a h ydrogen atom, as mentioned below (Section 3.2). Having introduced this modification, we might expect that the binding mode of the modified co mpound would differ somewhat from th e native lig and. However, i n the current version of ou r method such alternat ive bi nding modes are not con sidered. Wi thout  We also find that the scores of low ranking compounds tend to be inconsistent with their IC 50 values. For example, compounds 15 and 19a, whose ranks are 153 and 156 respectively, have IC 50 values of 3,100 nM and 1,800 nM (see Table 3). Ho wever, despite t he discrepan cy in these valu es, our calculations indicate the correct binding trend since the scores of both compounds, are positive, which is indicativ e of unfavo rable bi nding energ ies, in agr eement wi th th eir experimental IC 50 valu es. Detailed analysis of the generated structures suggests that this unfavorable score can be explained by the lack of hydrogen bond capability of position 5 in these compounds, as stated above (see Figure 2).

Redesign of the 33A scaffold to optimize ERK2 binding
The ATP binding sites in ERK2 and JNK3 exhibit different chemical compositions and in particular different ratios of h ydrophilic versus h ydrophobic residues (se e Figures 3a,b). Consequ ently, ligan d 33A displays differe nt binding or ientations i n ERK2 and J NK3, with th e chlorobenzene moieties oriented i n opposite directions. An alysis of the ERK2 complex stru cture re vealed contacts between positions 3 a nd 5 of the a romatic ring of the ligand with hy drophobic groups of t he protein (C α in Gly32, Cγ2 in Val37 and Cδ in Lys52) (see Figure 3a). Hydrophobic groups were hence the preferred  Both the proteins and the ligand are display ed using stick models, with the ligand shown using thicker bonds . Co mparison of pan els (a) and (b) illus trate th e difference in orientation of 33A in the two structurally aligned proteins.
Our cal culations y ielded 23 di fferent sub stituents for th e A3 3 ri ng scaffold (Fi gure 1c), wit h benzene moiety having the top score for the ligands-ERK2 complexes (see Table 4). These results are in agreement with the inhibitory activity (lowest K i value) previously reported [29] and can b e explained b y the hydrophobic in teractions between the aromatic ring and th e active site protein residues. The substitution of a -CH residue by -N= decreases the hydrophobic interactions and may explain the lower score v alues of the other designed compounds. The exception to this rule is the 5 th ranking compound where the addition of -N= residue to position 4 in th e aromatic ring increased the repulsive energy (d ecreasing th e ov erall score) due to the proxi mity of t his substituent t o the carbonyl oxygen of the residue Ala33. This effect is enhanced by the fact that our software is using a fixed g eometry approx imation. Two approache s ar e currently bein g devel oped to i mprove th is methodology: (a) consideration of different co mpound conform ations, and (b) relaxati on of the protein-ligand complex in order to relieve any residual strain.

Redesign of the 33A scaffold to optimize JNK3 binding
Analysis of the 3 D-structure of the 33A-JNK3 co mplex reve aled tha t th e ring scaffold to b e modified interacts with Lys68, Pro69 and Ile70 (see Figure 3b). The ring positions 2 and 3 are close to the carbonyl oxygen atoms of the protein residue Pro69 and Ile70, while the rest of the ring atoms are surrounded by hy drophobic residu es. Therefore, positi ons 2 and 3 were a ssigned p ositively charged substituents, whereas hydrophobic substituents were assigned for the remaining ring positions.
Subject to these constraints, our procedure sampled a to tal of 23 substit utions for the 6-membered 33A ring (Figure 1c). H ere too, our results revealed benzene moiety to be the best ranking compound in agree ment with th e experimental data [29], wit h a similar ration ale as for the E RK2. Our second "best c ompound" con tains an -O-at ri ng pos ition 2 a nd an -NH -group at ring position 3. The somewhat lower score of this co mpound is due to close polar con tact with backb one ato ms of th e protein. This score is driven by the stabilizing energy from the proximity between ligand -NH -to the carbonyl oxygen O of Pro69 (3.23 Å), and the repulsive energy for the interaction between the ligand -O-at position 2 with the carbonyl oxygen of Ile70 (4.46 Å).
The co mpounds with 2-, 3-and 4-p yridine moieties w ere ranked at 3 rd, 4th and 5th positi ons, respectively. With respect t o the calculated "best compound", th ese scor es can be ration alized in th e same way as i n the case of E RK2 case, by a d ecrease i n hydrophobic in teractions. How ever, th e authors of the experimental paper [29] found th at compounds with 3-and 4-pyridine moieties do not bind to JNK3, which underscores the difficulty in discrim inating between co mpounds that bind from those that do not bind, based on our computed scores alone.

Redesign of the Z1208 scaffold bound to JNK3.
The cry stal structure of the native Z1208 with JNK3 complex sugg ests that the lig and is tigh tly bound to the ATP binding pocket of the protein (Figure 4). One water molecule appears in the binding site forming hydrogen bonds with the protein backbone (carbonyl o xygen of the G lu147 and the amide nitrog en of Met149) and the h ydroxyl group of th e condensed ring in the native Z1208 (Figure 5a). Analysis of JNK3-ligand complexes in the PDB show that hydrogen bond interactions between the ligand and the backbone atoms of Glu147 and/or Met149 are common, but th e water molecule is no t present in all the structures. Therefore for the purpos e of this study the water molecule w as rem oved. Further more, in the binding site , th e 3-chloro-4-fluoro-phenyl moiety of the n ative Z 1208 li gand is pos itioned near Met1 46 of JN K3 and surrounded by other h ydrophobic residues (Figure 4).
According to S capin et al. [32], JNK 3 complexes with i midazole-pyrimidines (PD B-codes: 1q mn and 1qmq) that have h alogen-phenyl moieties, fea ture d ifferent confor mations of t he M et146 si de chain th an in the complex with AMP-PNP (PDB-code 1jnk). We found that the stru cture of J NK3 protein in the complex with Z1208_original, superimposed well onto t hose o f the JN K3 proteins in 1q mn and 1 qmq. In addi tion the Met146 adopts a similar conformation on all three complexes. Our calculations generated 64 designed compounds. The to p 10 rankin g compounds are displayed in Table 5. Co mpounds with the best and the second best sco res contain one -NH-group in different positions of the 5-membered ring. The add ition of a seco nd substit uent in th e sam e ring of the molecule (-NH-and -N= ), produced only a sl ightly higher sc ore (compound 8 in Table 5). Visual inspection of the modeled ligand-protein active site reveals that oxygen atoms in Met149 and Asp150 make contacts in the 5-membered ring, and those atoms are engaged in repulsive interactions when the negatively charged -N= groups are introduced (F igure 5a). Hence replacements which combine sulfur or carbon atoms with the -NH-group is preferable to those of two -N= groups. We see that the native compound is ranked 4th in this list, but its score is only slightly worse than the pred icted best-scoring ligand. Unfortunately, the chemical s ynthesis of all three best ranking compounds was very d ifficult with current techniques, d emonstrating that th e f ilters used in th e pre-calculation steps w ere not suffic ient to remove all the unsy nthesizable compounds and h ence further improvements of the various filters used in our procedure are required. Nevertheless, it is q uite encouraging that the native ligand was the top ranking synthesizable compound.

Discussion
Additional validation of our method by synthesis of one molecule An additional validation method step was c arried out by synthesizing in the laboratory one of the top-ranking newly designed inhibitors. The chosen compound was Z1208-8, ranked 8th in Table 5, and its inhibitory activity was measured against ATP hydrolysis by JNK3. This co mpounds was selected because it was easier to synthesize compared to other higher ranked compounds in the same list.
The biological assay showed that Z1208-8 was a ble to inhibit ATP binding to JNK3 with an I C 50 62.9 μM, representing a six fold poorer activity than the native ligand (Z1208, 9.6 μM). The a nalysis of the 3D structure of th e X1 208-8/JNK3 co mplex indicated th at th e two major contributi ons t o the overall energy score are: 1) the NH group in the 5-membered ring forming a hydrogen bond with the carbonyl oxygen of Asp150, which would favor binding, and 2) a close contact of the -N= group in the 5-membered ring with th e b ackbone oxygen of Met149 (Figure 5b), which would disfavor bin ding. The second contri bution is higher in Z1208-8 higher th an in th e nati ve co mpounds, suggestin g an explanation to the higher IC 50 value observed in the new inhibitor.
Our method is thus capable of d esigning new compounds with inhibitor activity against the target enzyme, but their activity can be lower than that of the starting hit compound. We expect however, that several cy cles of rati onal design using our method and experim ental an alysis of th e top rankin g candidates should lead not only to compounds with different che mical properties from those o f the original molecule, but also to those with a strong or stronger inhibitory activity.
This p aper has shown that our computational method is effective in a scaffol d-based redesign of kinase inhibitors FPH, 33A and Z1208. For a defined scaffold and keeping fixed the "geometry" of its core skeleton, our method was capable of sampling a large number of different substituents providing a set of compounds with potential inhibitory activity against the protein targets. Compounds were ranked based on an energy function, and in all the cases native inhibitors were identified in the 5 top ranking compounds. Validation of our method was p erformed by comparing the scores of designed molecules with e mpirical d ata on their inhib itory activ ity (IC 50 and Ki valu es). In the JNK3 inhibit or desig n study, one of t he top ra nked co mpounds was s ynthesized and it s inhibitory ac tivity was confirmed experimentally.
Future dev elopments w ill address the followin g out standing lim itations of o ur method: 1) more effective filters t o rem ove co mpounds difficu lt or impossible to sy nthesize, 2) i mprove th e scoring function to enh ance co mpound r anking accuracy, and 3) tak e into account protein and ligand conformational flexibility and different ligand poses in the protein active site. With the introduction of these improvements, our computational approach holds the promise of becoming a useful tool for lead optimization.

Lead optimization procedure
The lead/hit optimization procedure used in this study was previously reported by Ogata et al [33] and is only briefly summarized here. The first step consists in extracting the atomic coordinates of the ligand's heavy atoms (referred to as 'geometry') from a high resolution structure of the protein-ligand complex. The geometry is then divided into fragments which are grouped into three partial structures types: rings, linkers (defined as the fragments that connect rings), and terminals (defined as other types of frag ments). In addit ion, all th e atoms in the g eometry are classified according to their bond order types (sp3, sp2, etc ) an d atomic species (CH 3 , CH 2 , CH, NH 2 , NH etc). For exa mple, consider th e geometry X···Y··· Z, i n w hich X, Y , and Z represent the at oms in the g eometry and '···' is a gene ric representation of th e bonds conn ecting the atoms. Repl acing Y with a " = CH-" generates a chemically incomplete compound X=CH-Z for which "=" and "-" indicate a double and a single bond, respectively. Then, X should b e assigned to an atomic species linked through a double bond to Y (ex. O= or CH2=). Similarly, Z should be assigned to an atomic species capable of linking to Y through a single bond (ex. -CH 3 or -NH 2 ). By assembling all possible combinations of these atomic species, four compounds are obtained: O=CH-CH 3 , O=CH-NH 2 , CH 2 =CH-CH 3 , and CH 2 =CH-NH 2 . For the work presented in this paper, eighteen atomic species were used (see Table 6). After assigning all possible combinations of atomic species to the native ligand's core coordinates, all the partial structures ar e considered and bon d or der requ irements are satisfied thereby gener ating the hit compound database. Compounds in the newly generated database have similar core geometries as the native ligand and each atomic position satisfies different chemically meaningful combinations.
In a second s tep, co mpounds fro m the n ewly generated database a re s ubjected to two filters. Th e Rishton non leadlikeness filter (to remove undesira ble functional groups, see Figure 6) [34] and th e Lipinski's rule of five (co mpounds w ith more th an fiv e hy drogen-bond donors, more th an 10 hydrogen-bond acce ptors, molecular mass gr eater th an 500 Da, logP v alues gr eater t han 5, or more than 10 rota table b onds are not desira ble for orally acti ve drug s) [35]. F rom the re maining list of compounds, m olecules with ring(s) and c ondensed ring structures were selecte d b ecause known hit s for the three targe t kinase proteins contain such structures. These molecules were tr eated as th e final list of lead candidates and ranked b ased on a s coring function, Score, evaluated for the protein-ligand complex. The scoring function comprises four empirical energy terms: where, E v is the van d eer Waals in teraction energ y, E e t he electrost atic in teraction energy , E h the hydrogen bond energy, and E s the solvation energy.
E v and E e were obt ained using the A MBER force field with the GAFF parameter set. [36] E h was defined as: where r H is the distance between the hydrogen and the heavy atom (H … X, set at 2.0 Å for this study). E s was computed for the bound and unbound states of the ligand, protein and the complex as: where A i and σ i are t he solv ent-accessible s urface area and th e p roportionality fa ctor for th e solvent-accessible surface area of atom i, respectively [37]. The free energy of protein G and compound G were calculated in th e same manner. This method has been d esigned to provide a ranked list of compounds with b etter " drug-type" properties ( more stable, druglikeness an d synthesizable co mpounds) th an other approaches. Figure 6. Nonleadlikness filter. The subs tituent ty pes were extr acted fro m Rish ton work [34]. The electrophilic functional groups shown here are the most common protein-reactive covalent-acting false positives in biochemical assays. Compounds with substituents shown in this figure were removed from our results.

Application to serine/threonine protein kinases
Our a pproach was tested using t hree serine/threonine pro tein kinases as t argets. The X-ray protein-ligand co mplex structures used in th is st udy were:  Table 2). The three structures display different ligand bind ing modes, and feature d ifferences in th e electrost atic po tentials at th e ATP-binding site [29,30,[38][39][40]. In addition, inhibitory activity against the target proteins has been reported for series of compounds. These compounds were derived by small modifications (changing or adding substituents) of the n ative ligand structures and atom types [29,30,38]. We used this data to validate the results of our c alculations, which involves potential co mpound ca ndidates with larger struct ural and c hemical differences than the original authors considered in their study.
In pre paring the inpu t structures for our calculations, t he fol lowing steps were performed (see Figure 1): 1) all water molecules were removed from the original complexes' PDB files; 2) In FPH, the hydroxyl group attached to the 5-and 6-membered condensed ring was replaced by a hy drogen atom because t he modified compound has a larg er nu mber of si milar co mpounds with experimentally demonstrated inhibit ory a ctivity than t he native co mpound; 3) In FPH a nd Z12 08, all th e fluorin e atoms were replaced by chlorine atoms as this replacement made the chemical synthesis easier and; 4) the thioamide group in Z1208 was replaced by an amide group for the same reason.
In JNK3, two ligands were used to create the input structure: th e native Z1208 and a derivative of Z1208 that acts as an ATP hydrolysis inhibitor. The experimental data used for this analysis were the in-house X-ra y cry stal structur e (2.1 Å resolution a nd R-factor= 24.6%, see Figure 4) and the I C50 values of 22.8 μM a nd 9.6 μM for nativ e Z1208 and Z1208-derived ATP h ydrolysis inhibitor respectively.

Inhibition assay
To measure the inhib itory activity of the Z1208, we used the following assay s ystem: ad enosine triphosphate (A TP), ph osphoenolpyruvate (P EP), nicotinamide aden ine dinucleotide (NADH ), a nd a solution mixture of pyruvate k inase and L-lactate d ehydrogenase (PK-L-LDH) were purchas ed fro m Roche Di agnostics. Oth er reagen ts were purch ased fro m Si gma-Aldrich. JN K3 was express ed and purified by the method of Xie et al. [41]. After a purification step, JNK3 was activated by GST-fused MKK7 and further purified against a glu tathione-fixed co lumn. Inhi bitory ac tivity was estimated b y detecting th e inhib ition of ATP hydrolysis react ion monitored by the coupled reaction o f NADH oxidation; a slightly modified method of Xie et al. Experimental conditions were: 100 nM JNK3 in 50 mM Hepes, pH 7.6, 10 mM MgCl 2 , 1 mM NADH, 90 mg/mL PK, 30 mg/mL L-LDH, 2 mM PEP, 200 mM ATP, and each concentration of compound under 1% DMSO. The conversion of NADH was measured by kinetic monitoring with SpectraMax 190 (Molecular Devices).

Scheme 1.
Steps of the chemical synthesis of Z1208-8.