PETRA: Drug Engineering via Rigidity Analysis

Rational drug design aims to develop pharmaceutical agents that impart maximal therapeutic benefits via their interaction with their intended biological targets. In the past several decades, advances in computational tools that inform wet-lab techniques have aided the development of a wide variety of new medicines with high efficacies. Nonetheless, drug development remains a time and cost intensive process. In this work, we have developed a computational pipeline for assessing how individual atoms contribute to a ligand’s effect on the structural stability of a biological target. Our approach takes as input a protein-ligand resolved PDB structure file and systematically generates all possible ligand variants. We assess how the atomic-level edits to the ligand alter the drug’s effect via a graph theoretic rigidity analysis approach. We demonstrate, via four case studies of common drugs, the utility of our pipeline and corroborate our analyses with known biophysical properties of the medicines, as reported in the literature.


Introduction
The drug discovery and development process has advanced greatly over the last few decades. This is in part due to both an increased understanding of the biochemistry and biophysics of protein-ligand interactions and to the improvement in wet-lab methods of in vitro screening of novel drugs. Wet-lab methods offer the option to scan through thousands of compounds using techniques, such as high-throughput screening [1] or deep mutational scanning [2]. With this increased understanding of protein-ligand interactions, many novel approaches to drug design have been created both in silico and in vitro. These approaches include exploring enzyme flexibility [3], maximizing ligand affinity and selectivity [4,5], the use of templates [6], and the use of homology modeling [7]. A diverse variety of computational approaches [8], accessible either as libraries or via servers, are available. They include LEA3D [9], LigPlot+ [10], iScreen [11], e-LEA3D [12], iSmart [13], and PharmMapper [14]. These approaches employ a variety of techniques to better understand the effects of a ligand on a protein. Among them are fragment-based modeling approaches, which have been on the rise for the last decade [15]. Pharmacophore methods, which focus on the chemical activity of the ligand and the interactions between the proteins and ligand, have been helpful in aiding the drug discovery process. However, these methods need to still be improved in efficiency and efficacy in order to be used with high success in the drug design process [16]. Approaches similar to pharmacophore-based techniques include QSAR (quantitative structural-activity relationship), which reasons that structurally similar compounds show similar biological activity. Chemical and physical interactions, such as hydrophobic interactions, are used as correlation markers for biological activity [17]. Web services are also available, including e-LEA3D, which allows a user to in silico create new ligands based on a customizable fitness score. The score can be based on molecular properties or a protein-ligand docking score [12].
There still remains a high percentage of late stage failures in clinical development that is attributed to safety and efficacy concerns [18]. The safety and efficacy of a drug is partly determined in the early stages of drug development, where computational efforts can be used to select the best drug candidates with respect to certain parameters [17]. However, Lange et al. believe "early drug research efforts are too reductionist" in terms of delivering parameters [19]. Consequently, the current knowledge of protein-ligand interactions still needs to be improved in order to lower the cost and time of drug discovery and development.
In this work, we motivate and demonstrate the use of rigidity analysis to assess the role that each atom of a ligand has on the stability of the protein-ligand complex. Not only is our approach quick-needing, at most, hours of compute time-compared to wet lab experiments, but it also assesses the role of each atom in the engineered drug, which provides a level of detail that is not easily accessible via most other methods.

Related Work
Previously, we have relied on rigidity analysis to gain insights about a variety of structural and biochemical properties of biomolecules. Rigidity analysis is a fast combinatorial approach for identifying rigid regions, or clusters, of atoms in a protein [20,21]. Atoms and their chemical interactions are used to construct a mechanical model, from which a graph is generated, on which pebble game algorithms [22] are used to analyze the rigidity of the associated graph and biomolecule ( Figure 1). A biomolecule may be composed of many small rigid clusters, one or more larger ones, or a combination of both large and small rigid clusters. In our previous work, we correlated the size and distribution of rigid clusters to a protein's stability [23] and compared the rigidity properties of the wild type and mutants to infer the effects of amino acid substitutions [24,25].
In this work, we systematically engineer in silico all the possible variants of the ligand in a protein-drug complex, as well as measure the rigidity of all generated complexes as a way to infer how individual atoms in the ligand contribute to the drug's effect on the protein.

Rigidity Distance
To quantitatively compare a protein mutant to its wild type, we previously developed the Rigidity Distance (RD) metric [26]. To assess the effects of deleting, in silico, an atom from the ligand, we compare the count and sizes of the different rigid clusters in the wild type of the protein-ligand complex and the protein-ligand mutant, via our RD WT→Variant metric: where WT refers to the non-mutated protein-drug complex, Var refers to a variant of the protein-drug complex in which the ligand is altered in silico, and LRC is the size of the Largest Rigid Cluster (in atoms). Each summation term of the RD WT→Variant metric calculates the difference in the count of a specific cluster size, i, of the wild type and variant, and weights that difference by i. The weighting is introduced so that differences among larger clusters of rigid atoms have a greater impact on the RD score, which we found to be the best approach in correlating the RD metric to experimental data [27].
The use of the RD metric permits us to quantitatively assess the extent that each atom in the ligand has on the protein with which it is in complex.

Methods
For this work, we relied on our recently developed computational pipeline, Protein-Ligand complex Engineering Through Rigidity Analysis (PETRA). PETRA is a multi-step system that in silico engineers variants of ligand in a protein-drug complex and analyzes each variant using freely available rigidity analysis software [20]. The input to PETRA is either a protein-complex structure file from the RCSB protein data bank [28], or custom user-supplied PDB-formatted and CIF files for a protein-ligand complex. PETRA generates all possible complex variants in which atoms are removed systematically from the ligand. PETRA performs rigidity analysis of the wild type protein-drug complex and all of the variants. The results of the rigidity analyses are analyzed to infer how each atom in the ligand affects the stability of the protein-drug complex ( Figure 2).

Generating Ligand Variants
To generate ligand variants, PETRA employs a depth-first traversal of a graph representation of the drug compound ( Figure 3). Each node in the graph represents a non-hydrogen atom in the ligand. Any ligand atoms that engage in hydrogen bonds or hydrophobic interactions with the protein have their corresponding node marked as a root node for future steps. Our approach for in silico generating variants of a ligand leverages a depth-first search of a tree representation of a drug, in which nodes-representing atoms-are successively pruned away. We have chosen to always retain an atom of the ligand that engages in a stabilizing interaction with the protein because it is these interactions that give rise to the specificity and catalytic properties of a drug. Modifying atoms that engage in those interactions would negate the biochemical reasons why the drug was engineered in the first place. During the pruning process, rings are condensed into single nodes because generating all possible substructures of a ring (for example, removing a single atom from a benzene ring) is not only biologically infeasible but also would result in a cycle in the graph representation of the ligand compound. From this graph, multiple tree structures are formed, each of which is rooted at any node marked as root in the graph. When a ligand binds multiple ways to the protein via hydrogen bonds or hydrophobic interactions, multiple trees are generated, where each root node represents the atom in the ligand that interacts biochemically with the protein. The tree structure allows for enumeration of all possible sub-trees that contain the root ( Figure 3). This is done through a depth-first traversal, with each leaf returning all possible substructures in a set to prevent any duplicate substructures across multiple trees. Note that the ligand modifications process is done without regard of whether the atomic modifications are able to be realized via a synthesis process using existing wet-lab techniques and without regard whether the protein and ligand would still bind. However, each ligand variant retains at least one atom (the root) that engages in a stabilizing interaction with the protein target, and this is done to preserve the atom that is responsible for a biochemical interaction.

Analysis and Visualizations
To better understand the effect of removing an atom from a ligand that is in complex with a protein, we developed two rigidity-based metrics along with visualization of them. The metrics are the average change in rigidity distance (RD) when a single atom is cut from a ligand, and the average rigidity distance (RD) when a pairwise set of atoms is contained in the ligand variant. See Section 2 for an overview of how we and others have used rigidity analysis to explore how point mutations affect the structural stability of a protein.

SingleCut: the Role of Single Atoms
To better understand how a single atom contributes to the effect that a ligand has on the rigidity of a protein, we calculate the change in rigidity distance for all ligand variants with a specific atom removed from the drug. Note that the removal of atom x from a ligand may be the 1st, 2nd, . . . all the way up to the last atom to be removed, so our metric reasons about a set of ligands. When the change in rigidity distance is high, this indicates that the singular atom in the ligand plays an important role in how the drug affects the stability of the protein. This metric is calculated via the following: where α' is the set of all atoms cut from a given ligand, j is the set of cut atom(s) of the current cut, starting from 1 st cut, going to the n th cut, m is the number of times atom i is cut from a ligand variant, and γ' is the change in rigidity distance (RD) of a protein when an atom is cut, calculated by RD before cut − RD after cut . We use a box plot ( Figure 4) to visualize SingleCut.
Cut Atom   (1). The x-axis represents a singular cut atom and y-axis values are the average change in rigidity distance of the protein when the singular atom is cut from a ligand variant. ‡ designates involvement in a hydrophobic interaction.

PairCut: The Role of Pairs of Atoms
To reason how pairs of atoms contribute to the effect that the ligand has on the protein, we take the average rigidity distance of the ligand when a pairwise set of atoms is contained within the ligand. We use this metric to acquire a better understanding of the effect of each atom, in combination with another one, and how their combined presence contributes to the ligand's effect on the rigidity of the protein target. If the pairwise atoms that are contained in the ligand have an average rigidity distance that is relatively low in magnitude compared to the rest of the pair-wise atoms, the atoms contained play an important role in the ligand's (drug's) effect on the rigidity of the protein. The PairCut metric lends itself to the use of a heat map ( Figure 5). The PairCut metric is provided in the following equation: where β is the set of all atoms in a given ligand, α is the set of all atoms not cut at a point in time from a given ligand variant, γ is the rigidity distance of a ligand variant that contains atom i and atom j , λ is any combination of atoms contained in a ligand variant (not including atom i or atom j ), and |{(i, j, λ)|{i, j, λ} ⊆ α}| is the number of ligand variants containing atom i and atom j . Note that rings are treated as singular "atoms" because they are excised as a unit. This equation sums the rigidity distance of each pairwise set of atoms, atom i and atom j , and then divides that sum by the number of ligand variants containing atom i and atom j in order to find the average rigidity distance for each pairwise set of atoms.

Protein-Ligand Interactions
To aid in understanding which atoms of the ligand interact with the protein target, we relied on the third-party software OpenEye [29] (see example interaction map shown in Figure 6).

Data Set, Run-Times
Thirty-two protein-ligand complexes were analyzed using PETRA (Table 1). Run-time and data metrics representative of the range of protein-ligand complexes are shown in Table 2. An analysis of PDB 2PJ6, for example, created 7989 ligand variants, generating 5.7 GB of data, and needed 6 hours of compute time on a 6-core 3.7 GHz processor with 16GB in a docker container, enabling 6 ligand variants generated and analyzed concurrently. As can be seen, even large PDB files had reasonable run-times.  Table 2. Run-time and data metrics for a representative sample of the protein-ligand complexes we analyzed. These were run on a 6-core 3.7 GHz processor with 16 GB in a docker container, with 6 ligand variants generated then analyzed concurrently.

Case Studies and Discussion
We focused on analyzing protein-ligand complexes of well-researched drugs for which details about specific atoms and functional groups are available in the literature. These include Ibuprofen, Zidovudine, Aspirin, and Ciprofloxacin. Here, we present several case studies to demonstrate the utility of PETRA and its use in helping to better understand the roles of individual atoms in the effects that drugs have on protein targets.

Case Study 1: Ciprofloxacin
The ligand in PDB file 4KRA, Ciprofloxacin (Figure 7), was first approved as an antibiotic drug in 1987 and is a member of a quinolone group of antibiotics [30]. The structural design of quinolones uses quinine, which features two cyclohexenes. Before Ciprofloxacin was developed, additions to the quinolone group included a fluorine atom and a six-sided ring. Ciprofloxacin retains the design of the previous attempts but added a cyclopropyl group. Using PETRA, we assessed the effects that each addition to the basic foundation of the design quinolone group has on the rigidity on the protein-ligand complex (Figure 8). When atom O3 is cut from Ciprofloxacin, it induces the largest change, RD = 51, indicating that atom's significant role in the effect that the drug has on the protein target. Indeed, the 03 atom is needed for gyrase binding and bacterial transport [30]. The two other oxygen atoms O1 and O2 (which are bonded to C3) are involved in gyrase binding and bacterial transport, and they have a small positive effect on the median rigidity distance. The important addition that distinguishes Ciprofloxacin is that the cyclopropyl group can have the second largest change of 47 in rigidity distance. Biochemically, this cyclopropyl was designed to increase the potency of the drug. Thus, the use of PETRA, and it revealing the importance of the O3 atom, appears to be corroborated by biochemical experiments performed in wet-lab settings.

Case Study 2: Human Serum Albumin
Human serum albumin (HSA) is a highly abundant lipid binding protein found in blood. HSA has many functions including transportation of medicine through the body [31]. Myristic Acid (MYR) and Decanoic Acid (DKA) are two fatty acid ligands that are known to bind to HSA [32]. For an analysis of two ligands and their effect on HSA, we analyzed PDB files 3B9L and 1TF0. In wet-lab studies [32], Decanoic Acid does not induce a conformational changes in HSA, perhaps due to its short length. MYR, however, has more molecules that bind to HSA and, indeed, it induces a conformational change. In the heat maps generated from the PairCut analysis using PETRA (Figure 9), MYR has a larger overall change in RD, meaning the wildtype HSA has more of a change in its rigid cluster when atoms are removed from the ligand. Future studies will assess whether a high change in RD correlates whether a ligand will or will not produce a conformational change in the protein.
Average RD per atom for 3B9L.MYR Average RD per atom for 1TF0.DKA Salicylic acid (SAL, commonly known as aspirin, Figure 10) also binds to HSA. The hydroxyl group (O 2 ) participates in a hydrogen bond with one of the pocket it binds to [32]. Our visualizations highlight the effect of O 2 on HSA as it is the only atom that has a noticeable effect as determined via the SingleCut metric. In Figure 11, the lines represent the box plots with minimal quartiles; nonetheless, only when O 2 is cut, does the RD change. It has been shown that Zidovudine (AZT or AZZ) binds differently to HSA depending on whether it is in complex with just MYR or MYR and SAL [32]. SAL out-competes AZT in its ability to bind, meaning that AZT has more molecules that bind with HSA when only with MYR. Figure 12 represents the PairCut metrics for two PDB files, containing HSA in complex with different ligands. PDB file 3B9L is for HSA in complex with AZT and MYR, while 3B9M is HSA in complex with AZT and MYR and SAL. The average RD overall shown is 2 units lower for the HSA-AZT-MYR complex than the HSA-AZT-MYR-SAL complex. Although this difference in RD metrics is small, it is in agreement with wet-lab experiments reported in the literature.

AZZ.SAL
Average RD per atom for 3B9M.AZZ 18

Case Study 3: Ibuprofen
Ibuprofen (CIF ID: IBP) is a chiral nonsteroidal anti-inflammatory drug (NSAID) [33] and is a common household item. We used PETRA to analyze five different protein-ligand complexes with ibuprofen as the ligand ( Figure 13).
The aromatic ring had the greatest average rigidity distance when it was removed from the ligand in all 5 of the complexes run except for PDB ID 6U4X (Figure 13e), which is equine serum albumin. This could be interpreted that the aromatic ring has the greatest impact on the rigidity of the protein. Indeed, Gonzalez and Fisher [34] have found that the aromatic ring greatly affects the positioning of Ibuprofen in a human adipocyte lipid-binding protein named FABP4 (PDB 3P6H, Figure 13a). Thus, the aromatic ring can increase the ligand's ability to bind to the protein in a hydrophobic environment. The second protein PDB 3IB2 (Figure 13b) demonstrates this, in which also ibuprofen binds in a cleft with six hydrophobic residues [35]. The third protein PDB 4JTR (Figure 13c), which is a COX-2 mutant, also binds ibuprofen in a hydrophobic environment [36].
For the fourth protein (PDB 4RS0, an AKR1C2 complex shown in Figure 13d), the manuscript is not yet available (to be published by Yosaatmadja et al.). Nonetheless, the SingleCut metric plot identifies the aromatic rings as contributing significantly to the ligand's effect, in corroboration with our other analyses, revealing that the aromatic ring has a large effect on the ligand's binding ability.
For the case of PDB 6U4X, atom C5, which is among one of the first atoms removed by PETRA due to its location on the periphery of the drug, has the largest rigidity distance change when it is removed. This could be a sign that C5 plays an important role in how ibuprofen binds to the equine serum albumin. The manuscript accompanying 6U4X similarly has not yet been published (Czub et al.); therefore, the significance of C5 is not yet corroborated by wet-lab work. Nonetheless, the SingleCut metric box plot for 6U4X (Figure 13e)

Case Study 4: Ibuprofen Variants
A case in which PETRA is not able to predict the importance of an atom's contribution to the ligand's binding affinity is with two complexes which include variants of ibuprofen. PDB files 3P6D and 3P6E contain ligands that are based of the structure of ibuprofen. The only difference between the two is the attaching of a methyl group to the aromatic ring [34]. This methyl group causes a 100-fold increase in the binding affinity of the ligand. The methyl group is represented as C1 in Figure 14b, which shows minimal effect on the RD metric when removed from the ligand. Furthermore, the PairCut heat maps in Figure 14b indicate that the average RD is higher in 3P6E, which is without the methyl group, which is opposite of what would be expected considering the 100-fold increase.

Conclusions and Future Work
Pharmaceutic drug engineering efforts are expensive and time-consuming. Although a variety of software tools are available for exploring and aiding the drug design process, none employ the use of rigidity analysis to explore how each atom of a ligand contributes to the drug's effect on the structural stability of a protein target. We developed a compute pipeline, PETRA, that for a PDB file of a protein-ligand complex, generates in silico all possible ligand variants. Each variant is assessed to help infer the role of individual atoms of the ligand. We have shown via case studies and visualizations that PETRA can provide insights about the roles of the atoms in the ligand, and this is corroborated by wet-lab studies reported in the literature.
In ongoing work, we are continuing to assess the predictions output by PETRA against the efficacy rates of drugs as determined via clinical trials and in wet-lab assay experiments. The next several steps are envisioned or are ongoing, including the use of machine learning methods to develop accurate predictions of the effects of atoms in a ligand. PETRA is being prepared for public release via an open source Docker container that manages dependencies and custom wrapper software for file input and output options. The source code for PETRA is available upon request. Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.