Virtual Screening: A Fast Tool for Drug Design

Computational screening of databases has become increasingly popular in pharmaceutical research. Virtual screening uses computer-based methods to discover new ligands on the basis of biological structures. It is divided into structure-based screening (docking) and screening that uses active compounds as templates (ligand-based virtual screening). Ligand-based screening techniques mainly compare the molecular similarity of compounds with known and unknown moieties, regardless of the algorithm used. Docking is a computational tool of structure-based drug design that predicts protein-ligand interaction geometries and binding affinities. In this review we provide an overview of established ligand-based virtual screening and docking methods, together with the databases, filters, scores and applications used in recent pharmaceutical research.


Introduction
For those engaged in drug design, such as medicinal and computational chemists, the research phase can be broken down into two main tasks: identification of new compounds showing some activity against a target biological receptor, and the progressive optimization of these leads to yield a compound with improved potency and physicochemical properties in vitro and, eventually, improved efficacy, pharmacokinetic, and toxicological profiles in vivo. Identification of leads is driven either by random screening or by a directed design approach, and traditionally both strategies have been of equal importance, depending on the problem in hand. The directed approach needs a rational starting point for medicinal chemists and molecular modeling scientists to exploit. Examples include the design of analogs of a drug known to be active against a target receptor and mimics of the natural substrate of an enzyme. Increasingly, the three-dimensional structures of many biological targets are being revealed by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, opening the way to the design of novel molecules that directly exploit the structural characteristics of the receptor-binding site. In recent years, this approach of structure-based design has had a major impact on the rational design and optimization of new lead compounds in those cases where the receptor structure is well characterized [1][2][3].
The practice of testing large numbers of molecules for activity in a model system that is representative of the human disease, known as screening, is well established in the pharmaceutical industry. High-throughput screening technology allows thousands to millions of molecules to be tested for activity against a new target system as part of the drug discovery process [4][5]. Virtual screening, sometimes also called in silico screening, is a new branch of medicinal chemistry that represents a fast and cost-effective tool for computationally screening databases in search of novel drug leads. The roots of virtual screening go back to structure-based drug design and molecular modeling [6].

Concept of Virtual Screening
Virtual screening uses computer-based methods to discover new ligands on the basis of biological structure [7]. The basic goal of virtual screening is to reduce the enormous virtual chemical space of small organic molecules that could be synthesized and/or screened against a specific target protein to a manageable number of compounds that exhibit the highest chance of leading to a drug candidate [8]. In theory, the applicability of virtual screening is limited only by which properties of a compound can be calculated computationally and by the perceived relevance of those properties to the problem in hand. On a practical level, further considerations include the timescale for calculation of the properties, which may be considerable for a database of, say, one million compounds, and the software and hardware required to yield a timely answer. Many drug candidates fail in clinical trials for reasons unrelated to potency against the intended drug target. Pharmacokinetic and toxicity issues are blamed for more than half of the failures in clinical trials. Therefore, the first part of virtual screening evaluates the drug-likeness of small molecules, largely independently of their intended drug target [1].
The term virtual screening has been used to describe a process of computationally analyzing large compound collections in order to prioritize compounds for synthesis or assay [7]. A broad range of computational techniques can be applied to the problem. In our work we have focused on explicit receptor-ligand molecular docking as a means of yielding the most detailed model of the way in which a given ligand will bind to a receptor, and hence the most informative basis on which to assess which ligands are useful candidates for synthesis or assay. Although the underlying methods of virtual screening have been in use in various guises for several years, it is worth noting the recent impact on molecular modeling made by the increased availability of high-performance computing platforms. Affordable multiprocessor workstations and PC (personal computer) clusters have enabled the modeler to employ computationally demanding algorithms on a routine basis. This change is particularly relevant in the case of virtual screening, where, as in the work described in this paper, computationally intensive methods such as molecular docking must be applied to very large databases of chemical structures [6].

Drug likeness screening [18-23]
Many drug candidates fail in clinical trials for reasons unrelated to potency against the intended drug target. Pharmacokinetic and toxicity issues are blamed for more than half of all failures in clinical trials.
Therefore, the first part of virtual screening evaluates the drug-likeness of small molecules; drug-like molecules exhibit favorable absorption, distribution, metabolism, excretion, and toxicological (ADMET) parameters. Drug-likeness is currently assessed using the following types of method:

□ Simple counting method
□ Functional group filter
□ Pharmacophore filter

Counting scheme [17][18][19][20][21][22]
Database collections of known drugs are typically used to extract knowledge about the structural properties of potential drug molecules. Properties such as molecular weight, lipophilicity, and charge are profiled to extract simple counting rules for the relevant description of ADMET-related parameters.
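A counting scheme of this kind can be sketched in a few lines of Python. The sketch below is only illustrative: the `Compound` record, the property names, and the threshold values (chosen in the style of Lipinski's rule of five) are assumptions made for the example and are not taken from the cited studies; the property values are assumed to have been calculated beforehand.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record holding precomputed properties of one molecule."""
    name: str
    mol_weight: float        # g/mol
    logp: float              # calculated lipophilicity
    h_bond_donors: int
    h_bond_acceptors: int


# Illustrative upper limits (rule-of-five style); an assumption of this sketch.
LIMITS = {
    "mol_weight": 500.0,
    "logp": 5.0,
    "h_bond_donors": 5,
    "h_bond_acceptors": 10,
}


def count_violations(compound: Compound) -> int:
    """Count how many properties exceed their upper limit."""
    return sum(getattr(compound, prop) > limit for prop, limit in LIMITS.items())


def passes_counting_filter(compound: Compound, max_violations: int = 1) -> bool:
    """Keep a compound if it breaks at most `max_violations` counting rules."""
    return count_violations(compound) <= max_violations
```

Because each rule is a simple comparison on a precomputed property, such a filter can be applied to very large databases before any more expensive step.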

Functional group filters [23]
Reactive, toxic, or otherwise unsuitable compounds, such as natural product derivatives, are removed using specific filters. Typical reactive functional groups include, for example, reactive alkyl halides, peroxides, and carbazides. Unsuitable leads may include crown ethers, disulfides, and aliphatic methylene chains seven or more units long, and unsuitable natural products may include quinones, polyenes, and cycloheximide derivatives. Screening out compounds that contain atom groups associated with toxicity provides a practical and fast way to reduce a large database. Structure-based methods for assessing the toxicity of compounds may provide a better description of toxicity.

Topological drug classification [23]
It is generally assumed that compounds that are structurally similar to known drugs may themselves exhibit drug-like properties, such as oral bioavailability, low toxicity, membrane permeability, and metabolic stability. Artificial neural networks and decision trees serve as very fast filter tools in virtual screening approaches. Data are also collected to find structural motifs and pharmacophore features of small molecules that characterize drugs. Analysis of virtual libraries according to the presence or absence of drug-like frameworks, side chains, or structural motifs can be used for virtual screening.

Pharmacophore points filter [22]
A simple pharmacophore filter has been introduced recently. It is based on the assumption that drug-like molecules should contain at least two distinct pharmacophore groups. Four functional motifs have been identified that guarantee the hydrogen-bonding capability that is essential for the specific interaction of a drug molecule with its biological target. These motifs can be combined into functional groups that are also referred to here as pharmacophore points; they include amine, amide, alcohol, ketone, sulfone, sulfonamide, carboxylic acid, carbamate, guanidine, amidine, urea, and ester.
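The pharmacophore points filter can be illustrated with a minimal Python sketch. It assumes that the functional groups of each molecule have already been perceived by some other tool and are supplied as plain text labels, and it interprets "two distinct pharmacophore groups" as two different group types; both choices are assumptions of this example rather than details of the published filter.

```python
# Pharmacophore points as listed in the text above.
PHARMACOPHORE_POINTS = frozenset({
    "amine", "amide", "alcohol", "ketone", "sulfone", "sulfonamide",
    "carboxylic acid", "carbamate", "guanidine", "amidine", "urea", "ester",
})


def pharmacophore_point_count(groups):
    """Number of distinct pharmacophore point types among the group labels."""
    return len(set(groups) & PHARMACOPHORE_POINTS)


def is_drug_like(groups, minimum=2):
    """Apply the filter: at least `minimum` distinct pharmacophore points."""
    return pharmacophore_point_count(groups) >= minimum
```

For example, a molecule labelled `["amine", "ester", "phenyl"]` passes this sketch, whereas one labelled `["amine", "amine"]` does not.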

Pharmacophore Based Virtual Screening [24-25]
It is the process of matching atoms or functional groups, and the geometric relations between them, to the pharmacophore in the query. Examples of programs that perform pharmacophore-based searches are 3D search UNITY, MACCS-3D and ROCS. Usually pharmacophore-based searches are done in two steps. First, the software checks whether the compound has the atom types or functional groups required by the pharmacophore; then it checks whether the spatial arrangement of these elements matches the query. Flexible 3D searches identify a higher number of hits than rigid searches do. However, flexible searches are more time-consuming than rigid ones. There are two main approaches for including conformational flexibility in the search: one is to generate a user-defined number of representative conformations for each molecule when the database is created; the other is to generate conformations during the search.
ROCS uses shape-based superposition to identify compounds that have similar shapes. Pharmacophore models provide powerful filter tools for virtual screening even in cases where the protein structure is not available. Pharmacophore filters are much faster than docking approaches and therefore greatly reduce the number of compounds subjected to the more expensive docking applications. Another interesting aspect of pharmacophores in virtual screening is 3D-pharmacophore diversity.
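The two-step search described above (feature check first, geometry check second) can be sketched as follows. This is a simplified illustration for a single rigid conformation, not the algorithm of any of the programs named; the data layout (a dictionary mapping a feature type to a list of 3D points), the function names, and the restriction to one query point per feature type are assumptions of the example.

```python
import math
from itertools import product


def has_required_features(features, query_types):
    """Step 1: every feature type required by the query occurs in the molecule."""
    return all(features.get(ftype) for ftype in query_types)


def matches_geometry(features, query_types, distance_ranges):
    """Step 2: some assignment of feature points satisfies all distance ranges.

    `distance_ranges` maps a pair of feature types to (min, max) in angstroms.
    """
    for points in product(*(features[ftype] for ftype in query_types)):
        placed = dict(zip(query_types, points))
        if all(low <= math.dist(placed[a], placed[b]) <= high
               for (a, b), (low, high) in distance_ranges.items()):
            return True
    return False


def pharmacophore_hit(features, query_types, distance_ranges):
    """Run the cheap feature check before the more expensive geometric check."""
    return (has_required_features(features, query_types)
            and matches_geometry(features, query_types, distance_ranges))
```

A flexible search in the sense of the text would call `pharmacophore_hit` once per stored or generated conformation and report a hit if any conformation matches.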

Introduction to pharmacophores
The first definition of the pharmacophore, formulated by Paul Ehrlich, was "a molecular framework that carries (phoros) the essential feature responsible for a drug's (pharmacon) biological activity" [26]. This definition was slightly modified by Peter Gund to "a set of structural features in a molecule that is recognized at a receptor site and is responsible for that molecule's biological activity" [27].
Note that, in addition to the distances that describe the 3D relations among pharmacophore points, angles, dihedrals, and exclusion volumes are also used.
Pharmacophore hypotheses for searching can be generated using structural information from active inhibitors, ligands, or from the protein active site itself [28][29].

2-D Pharmacophore searching
Searching 2D databases is of great importance for accelerating drug discovery, and different strategies are pursued to search a 2D database to identify compounds of interest. A substructure search identifies larger molecules that contain the user-defined query, irrespective of the environment in which the query substructure occurs [30]. Biochemical data obtainable from these compounds can be used for generating structure-activity relationships (SAR) even before synthetic plans are made for lead optimization [31]. In contrast, superstructure searches are used to find smaller molecules that are embedded in the query. One problem that can arise from substructure searches is that the number of compounds identified can reach into the thousands. One solution to this problem is ranking the compounds based on the similarity between the compounds in the database and the query [32][33]. Beyond structural similarity, activity similarity has also been the subject of several studies. Similarity searches can be combined with substructure searches to limit the number of compounds selected. Flexible searches are used to identify compounds that differ from the query structure in user-specified ways.
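Ranking database compounds by their similarity to the query can be sketched with the Tanimoto coefficient, a commonly used similarity measure for 2D fingerprints. The text does not specify which measure the cited studies used, so the choice of Tanimoto, the representation of a fingerprint as a set of "on" bit positions, and the function names are assumptions of this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as zero
    return len(fp_a & fp_b) / union


def rank_by_similarity(query_fp, database):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping only the top of the ranked list, or only the compounds above a similarity threshold, limits the number of substructure hits that must be inspected.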

3-D Pharmacophore searching [34-35]

Ligand based pharmacophore generation
Ligand-based pharmacophores are generally used when a crystallographic, solution, or modeled structure of the protein cannot be obtained. When a set of active compounds is known and it is hypothesized that all the compounds bind in a similar way to the protein, then common groups should interact with the same protein residues. Thus, a pharmacophore capturing these common features should be able to identify, from a database, novel compounds that bind to the same site of the protein as the known compounds do. The process of deriving a pharmacophore, known as pharmacophore mapping, consists of three steps: (1) identifying common binding elements that are responsible for the biological activity; (2) generating potential conformations that the active compounds may adopt; and (3) determining the 3D relationships between pharmacophore elements in each conformation generated.

Manual pharmacophore generation
Manual pharmacophore generation is used when there is an easy way to identify the common features in a set of active compounds and/or there is experimental evidence that certain functional groups should be present in the ligand for good activity. An example is the development of a pharmacophore model for dopamine-transporter (DAT) inhibitors. Pharmacophores should also have some flexibility built in, thus justifying the use of distance ranges.
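The use of distance ranges can be illustrated by deriving them from a set of conformations: for each pair of pharmacophore elements, the shortest and longest distance observed over all conformations define the allowed range. The following Python sketch is only an illustration of that idea; the data layout (one dictionary of element name to 3D coordinates per conformation), the element names, and the optional `tolerance` padding are assumptions of the example.

```python
import math
from itertools import combinations


def distance_ranges(conformations, tolerance=0.0):
    """Derive (min, max) distance ranges for every pair of pharmacophore elements.

    `conformations` is a non-empty list of dicts sharing the same element names.
    `tolerance` (angstroms) widens each range to build in extra flexibility.
    """
    names = sorted(conformations[0])
    ranges = {}
    for a, b in combinations(names, 2):
        distances = [math.dist(conf[a], conf[b]) for conf in conformations]
        ranges[(a, b)] = (max(0.0, min(distances) - tolerance),
                          max(distances) + tolerance)
    return ranges
```

With two conformations in which a nitrogen and a ring centroid lie 5 and 6 angstroms apart, the derived range for that pair is 5 to 6 angstroms, or 4.5 to 6.5 angstroms with a tolerance of 0.5.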

Automatic pharmacophore generation [36]
Pharmacophore generation through conformational analysis and manual alignment is a very time-consuming task, especially when the list of active ligands is large and the elements of the pharmacophore model are not obvious.
There are several programs, such as HipHop, HypoGen, DISCO, GASP, flo, APEX, and ROCS, that can automatically generate potential pharmacophores from a list of known inhibitors. The performance of these programs in automated pharmacophore generation varies depending on the training set.
All of these programs use algorithms that identify the common pharmacophore features in the training set molecules, and they use scoring functions to rank the identified pharmacophores.

Receptor based pharmacophore generation
If the 3D structure of the receptor is known, a pharmacophore model can be derived based on the receptor active site [37]. Biochemical data are used to identify the key residues that are important for substrate and/or inhibitor binding. This information can be used for building pharmacophores targeting the region defined by the key residues, or for choosing among pharmacophores generated by an automated program.
This can greatly improve the chance of finding small molecules that inhibit the protein, because the search is focused on a region of the binding site that is crucial for binding substrates and inhibitors.

Structure Based Virtual Screening (Docking)
High-throughput docking and scoring techniques can be applied to computationally screen a database of hundreds of thousands of compounds against a target protein. Computational methods that predict the 3D structure of a protein-ligand complex are often referred to as molecular docking approaches [38].
Virtual screening as a computational task can be trivially run in parallel because the protein-ligand docking events are completely independent of each other. Although docking was initially developed as a specialist modeling tool run on computer workstations, nowadays inexpensive Linux clusters or distributed computing over networked PCs can be used for virtual screening.
This increases the in silico throughput into the realm of 100,000 compounds per day on a Linux cluster, thereby reaching the speed of today's high-throughput screening. The energy function that evaluates the binding free energy between protein and ligand sometimes employs rather heuristic terms; therefore, such functions are more broadly referred to as scoring functions. The necessary steps include protein structure preparation, ligand database preparation, docking calculation, and post-processing. The protein has to be prepared only once for the virtual screening experiment unless different protein conformations are considered. The receptor site needs to be determined and charges have to be assigned. The protein site has to be modeled as accurately as possible. Determining protein surface atoms and site points, as well as assigning interaction data such as marking hydrogen-bond donors/acceptors, is sometimes included internally in the docking software (e.g., in FlexX) and sometimes done separately (e.g., in DOCK) [39][40][41][42][43][44][45].
Because of the large number of molecules, manual steps in the preparation of the ligand database obviously have to be avoided. Starting typically from 2D structures, bond types have to be checked, protonation states must be determined, charges must be assigned, and solvent molecules removed. 3D coordinates can be generated using a program such as CONCORD or CORINA.
"Scoring" [48] refers to the fact that any docking procedure must evaluate and rank the configurations generated by the search process. The scoring scheme most closely related to experiment, the "ab initio" calculation of binding free energies, is not easily accessible to computation. Scoring is actually composed of three different aspects relevant to docking and design:
1. Ranking of the conformations generated by the docking search for one ligand interacting with a given protein; this aspect is essential to detect the binding mode best approximating the experimentally observed situation.
2. Ranking different ligands with respect to their binding to one protein, that is, prioritizing ligands according to their affinity; this aspect is essential in virtual screening.
3. Ranking one or different ligands with respect to their binding affinity to different proteins; this aspect is essential for the consideration of selectivity and specificity of ligands [43][44][45].
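The first two ranking aspects can be made concrete with a short Python sketch: the poses of each ligand are ranked to select its best binding mode, and the ligands are then ranked against each other by the score of that best pose. The data layout, the function names, and the convention that a lower (more negative) score means better predicted binding are assumptions of this example; real scoring functions differ in sign and scale.

```python
def best_pose(poses):
    """Aspect 1: pick the best-scoring pose of one ligand.

    `poses` is a list of (pose_id, score); lower scores are assumed better.
    """
    return min(poses, key=lambda pose: pose[1])


def rank_ligands(docking_results):
    """Aspect 2: rank ligands by the score of their best pose, best first.

    `docking_results` maps a ligand name to its list of (pose_id, score).
    """
    best = {ligand: best_pose(poses) for ligand, poses in docking_results.items()}
    return sorted(best.items(), key=lambda item: item[1][1])
```

The third aspect would repeat `rank_ligands` for several proteins and compare the scores of one ligand across targets; it is omitted here to keep the sketch short.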

Types of docking
The following types of docking have been reported [46].

Flexible protein-ligand docking
Flexible protein-ligand docking is extremely important for the discovery of new drugs. Through the exhaustive integration and fine-tuning of different variables, the latest programs and algorithms are capable of predicting the behavior of chemical compounds and protein molecules in order to help researchers find more efficient drug leads. This method significantly reduces the cost and time required, and minimizes the non-specific interaction of drug molecules with proteins; this aspect is essential for the consideration of selectivity and specificity [47].

Flexible protein-protein docking
Protein-protein interactions are also extremely important, since they are responsible for many necessary biological functions.Prediction of such interactions is extremely important to the complete understanding of human physiology.
Association of two biological macromolecules is a fundamental biological phenomenon and an unsolved theoretical problem.In recent years, several groups have developed a variety of tools in an attempt to solve the so called protein-protein docking problem, that is, the prediction of the geometry of a complex from the atom coordinates of its uncomplexed constituents.

Hydrophobic docking
In view of the higher occurrence of hydrophobic groups at contact sites, their contribution results in more intermolecular atom-atom contacts per unit area for correct matches than for false positive fits.The hydrophobic groups are also potentially less flexible at the surface.Thus, from a practical point of view, a partial representation of the molecules based on hydrophobic groups should improve the quality of the results in finding molecular recognition sites, as compared to full representation [48].

Special aspects of docking
Proteins are inherently dynamic systems [47]. The averaged data provided by a crystal structure may not be an adequate representation of the flexible structural characteristics of a protein, unless the system is very rigid. A comparison of experimental protein structures in the ligand-free and in the complexed state frequently reveals conformational changes induced by or associated with ligand binding. The spectrum of phenomena ranges from side-chain rotations to loop rearrangements and movements of entire domains. In the simplest treatment, the protein itself remains fixed, but a certain amount of overlap between the protein and the ligand is allowed, either through an adapted geometric representation or through a tolerant scoring function, emulating some "plasticity" of the receptor.
An alternative that can account, in principle, for an arbitrary degree of protein flexibility is the use of protein structure ensembles. The ensembles can be assembled from multiple crystal structures of a given protein, from NMR structure determination, or from trajectories of molecular dynamics simulations. In addition, a rotamer library can be used to create a minimal set of new conformations [48].
Three different ways to use protein ensembles for docking can be distinguished. The first and most straightforward way is to carry out docking sequentially with each member of the ensemble using rigid-receptor docking [49][50][51][52].
Another way is to use a weighted-average representation of the ensemble. Knegtel et al. followed this approach by generating composite grids that were used for scoring within the DOCK program [53]. Recently, it has also been tested with AutoDock [55]. Broughton has developed another method by combining statistical analysis of a conformational ensemble from short MD simulations with grid-based docking protocols [55].
The third and most sophisticated approach to handling protein ensembles is implemented in FlexE, a variant of the FlexX program [56].
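The first two ways of using an ensemble can be sketched in Python. The first function keeps the best result obtained by docking one ligand against each ensemble member separately; the second combines per-structure scoring grids into a single composite grid by a plain weighted mean. Both are simplified illustrations: the flat-list grid layout, the function names, the lower-is-better score convention, and the simple weighted mean (which is not necessarily the averaging scheme used in the cited work) are assumptions of this example.

```python
def best_over_ensemble(scores_per_structure):
    """Sequential docking: keep the ensemble member giving the best score.

    `scores_per_structure` maps a structure name to the docking score of one
    ligand against that rigid structure; lower scores are assumed better.
    """
    return min(scores_per_structure.items(), key=lambda item: item[1])


def composite_grid(grids, weights=None):
    """Weighted average of several scoring grids of identical shape.

    Each grid is a flat list of values at the same grid points.
    """
    if weights is None:
        weights = [1.0] * len(grids)
    total = sum(weights)
    n_points = len(grids[0])
    return [sum(w * grid[i] for w, grid in zip(weights, grids)) / total
            for i in range(n_points)]
```

The composite grid requires only one docking run per ligand, whereas sequential docking multiplies the cost by the number of ensemble members.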

Assessment of docking method
Docking methods are usually assessed by their ability to reproduce the binding mode of experimentally resolved protein-ligand complexes: the ligand is removed from the complex, a search area is defined around the actual binding site, the ligand is redocked into the protein, and the achieved binding mode is compared with the experimental position, usually in terms of a root mean square deviation (RMSD). If the RMSD is below 2 Å, the prediction is generally considered successful. The obvious goal is that such a "near native" solution is ranked best amongst the set of ligand poses generated. Virtually every introduction of a new docking method has been accompanied by such a test. The number of complexes used varies as much as the reported success rates, which are between 10% and 100%. Clearly, success rates of 100% are a consequence of the limited test set size rather than a reflection of the quality of the docking method [57][58][59][60][61].
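The RMSD criterion used in such redocking tests can be computed as in the following Python sketch. It assumes that the docked pose and the crystallographic ligand list the same atoms in the same order and share the coordinate frame of the protein, so that no superposition is performed; ligand symmetry is ignored. The function names are chosen for this example only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def is_near_native(pose, crystal, threshold=2.0):
    """Apply the usual success criterion: RMSD below 2 angstroms."""
    return rmsd(pose, crystal) < threshold
```

A docking method's success rate on a test set is then the fraction of complexes for which the top-ranked pose satisfies `is_near_native`.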

Docking and QSAR
As long as the problem of accurate binding free energy prediction on the basis of a given complex geometry has not been resolved, computational methods establishing quantitative structure-activity relationships (QSARs) to estimate relative binding affinity differences within a set of ligands remain a pragmatic alternative. Both classical and 3D QSAR methods have been developed as ligand-based approaches [62]. They rely exclusively on ligand information and try to correlate experimental binding data with features described by a set of relevant descriptors. In 3D QSAR methods such as CoMFA (Comparative Molecular Field Analysis), these descriptors are essentially virtual interaction energies calculated using an appropriate probe atom placed at the intersections of a regularly spaced grid surrounding the molecule. They can be interpreted as a surrogate representation of the binding site.
Essential for the success of all 3D QSAR approaches is an appropriate alignment of the ligands; their relative spatial superposition must reflect the differences in binding geometry experienced at the binding site of the structurally unknown protein [61]. These methods are also applied if the receptor structure is known. This results in "receptor-based 3D QSAR," a combination of a ligand-based 3D QSAR approach with information extracted from receptor structures. If the receptor structure is known, the ligand alignment can be obtained by docking. A variety of studies show that models generated with docking alignments can be of higher relevance than traditional CoMFA models based on ligand alignment [63][64].
Another concept for combining docking with QSAR has recently been proposed by Vieth & Cummins in their DoMCoSAR approach [65]. DoMCoSAR is used for statistically determining the docking mode that is consistent with a structure-activity relationship, based on the explicit assumption that all molecules exhibit the same binding mode. In a first step, all molecules of a chemical series with a common substructure are docked in an unbiased way to the protein-binding site, and the results are clustered to establish the most favorable docking modes for the common substructure. Subsequently, constrained docking is performed by forcing all molecules to align with the common substructure in the major docking modes. In the final stage, interaction-energy-based descriptors are calculated for all major docking modes. QSAR models are then derived to determine the docking mode that is statistically significant and most consistent with a given structure-activity relationship.

Docking and homology modeling
Often, the crystal structure of the therapeutic target is not available, but the three-dimensional structure of a homologous protein will have been determined.
Depending on the degree of homology between the two proteins, it may be useful to model-build the structure of the unknown protein based on the known structure.
So, in the absence of an experimental protein structure, a homology model may be used for docking and structure-based design. Comparative modeling based on homologous proteins of known structure can generate such a model. Obviously, it is most reliable in the regions of highest homology between the templates and the target protein. Using this method, the overall skeleton of the target protein can frequently be obtained with sufficient accuracy, but the structural details of the binding site are often not clear. In fact, members of a homologous protein family may show considerable differences in the binding region. An approach developed especially for the purpose of docking ligands into approximate protein models generated by homology modeling is the DragHome method [72], in which the binding site is analyzed in terms of putative ligand interaction sites and translated, using Gaussian functions, into a functional binding-site description represented by physicochemical properties. Similarly, ligands are translated into a description based on Gaussian functions, and the docking is computed by optimizing the overlap between the two functional descriptions. The use of soft Gaussian functions to describe protein-ligand interactions is one possibility for taking into account the limited accuracy of modeled structures for the purpose of docking. The method for generating and optimizing ligand orientations relative to the binding-site representation was adapted from the ligand alignment program SEAL [66][67]. For a set of different ligands, the generated solutions are analyzed with respect to the mutual ligand alignment. The alignment is then used to generate 3-D QSAR models, which in turn can be interpreted with respect to the surrounding protein model.
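The idea of scoring a placement by the overlap of Gaussian functions can be illustrated generically. The Python sketch below uses the closed-form overlap integral of two spherical Gaussians, exp(-alpha*|r-A|^2) and exp(-beta*|r-B|^2), and sums it over site and ligand points that carry the same property label. It is not the functional form of DragHome or SEAL: the exponents, the property labels, and the data layout are assumptions made for the example.

```python
import math


def gaussian_overlap(center_a, center_b, alpha=0.5, beta=0.5):
    """Overlap integral of two spherical Gaussians centred at A and B.

    Equals (pi / (alpha + beta))**1.5 * exp(-alpha*beta/(alpha+beta) * |A-B|^2).
    """
    dist_sq = sum((p - q) ** 2 for p, q in zip(center_a, center_b))
    total = alpha + beta
    return (math.pi / total) ** 1.5 * math.exp(-alpha * beta / total * dist_sq)


def overlap_score(site_points, ligand_points):
    """Sum the overlaps of all site/ligand point pairs with the same property.

    Both arguments are lists of (property_label, (x, y, z)); a site point is
    labelled with the ligand property it favours (an assumption of this sketch).
    """
    return sum(gaussian_overlap(site_xyz, lig_xyz)
               for site_prop, site_xyz in site_points
               for lig_prop, lig_xyz in ligand_points
               if site_prop == lig_prop)
```

Because the overlap decays smoothly with distance rather than switching on and off at a hard cutoff, small errors in a modeled binding site change the score only gradually, which is the motivation given in the text for soft Gaussian functions.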
The docking calculation is typically done for one ligand at a time. Depending on optimization and sampling parameters, as well as on the flexibility of the compound, typically between a few seconds and a few minutes of CPU time are needed to dock a ligand. Because the individual docking events are independent of each other, they can be run on parallel hardware. Task schedulers that distribute ligand docking over the available CPUs are used in many docking programs.
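Because each ligand is docked independently, distributing the work is straightforward. The following Python sketch uses the standard library's `concurrent.futures` to illustrate the pattern; the `dock` function is a hypothetical placeholder that returns an arbitrary score, standing in for a call to a real docking program, and the thread pool is used only to keep the example self-contained (a real, CPU-bound screen would typically use separate processes or a cluster scheduler).

```python
from concurrent.futures import ThreadPoolExecutor


def dock(ligand):
    """Hypothetical placeholder for one independent docking calculation.

    Returns (ligand, score); the score here is arbitrary and only serves to
    make the example runnable.
    """
    return ligand, -0.1 * len(ligand)


def screen(ligands, max_workers=4):
    """Dock all ligands concurrently and return results sorted best-first.

    Lower scores are assumed to be better in this sketch.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(dock, ligands))
    return sorted(results, key=lambda result: result[1])
```

Since no docking job depends on another, the same pattern scales from a multiprocessor workstation to a cluster by changing only how the jobs are dispatched.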

Research in virtual screening
Maxwell D. Cummings et al. [68] studied the performance of several commercially available docking programs and compared them in the context of virtual screening. Five different protein targets were used, each with several known ligands. For many of the known ligands, crystal structures of the relevant protein-ligand complexes were available. For a given docking method, hit rates were improved over what would be expected for random selection for most protein targets. However, the ability to prioritize known ligands on the basis of docked poses that resemble known crystal structures is both method- and target-dependent.
Jean-Christophe Mozziconacci et al. [69] exploited available structural information about the cyclooxygenase enzyme for the virtual screening of large chemical libraries; a docking protocol was tuned and validated. The screening accuracy was assessed using a series of known inhibitors and a set of diverse, a priori inactive compounds that was seeded with known active ligands.
Andreas Evers et al. [70] studied the homology modeling of the alpha 1A receptor based on the X-ray structure [...] throughput docking into the active site of X-ray structures for AANAT, and in total 241 compounds were tested as inhibitors.
Cavasotto CN et al. [77] developed a ligand-steered homology modeling approach (where information about existing ligands is used explicitly to shape and optimize the binding site), followed by docking-based virtual screening. Top-scoring compounds identified virtually were tested experimentally in an MCH-R1 (melanin-concentrating hormone receptor 1) competitive binding assay, and six novel chemotypes were identified as low-micromolar-affinity antagonist "hits".
Jie-Fei Cheng et al. [78] conducted virtual docking studies using GLIDE with a modified LXRβ ligand-binding domain (LBD) on an internal compound collection, followed by gene profiling with the ArrayPlate mRNA assay. A total of 69 compounds out of 1308 compounds selected by the virtual screen were found to upregulate LXRα and certain LXR-regulated genes (hit rate: 5.3%).
[Table fragment: focused assessment of thrombin; factor Xa; estrogen receptor; PRO_LEADS; 10000 ChemBridge compounds [16] [78]]

Future prospective
Virtual screening, especially structure-based virtual screening, has emerged as a reliable, cost-effective and time-saving technique for the discovery of lead compounds. This review focuses on the generation and use of virtual compound libraries, and also on studies in which chemical feature-based pharmacophore models and docking are used in combination with in silico screening. These procedures are generally used to obtain hits (or leads) that are more likely to give successful clinical candidates. Virtual screening of virtual libraries (VSVL) is a rapidly changing area of research. Great efforts are being made to produce better algorithms, selection methods and infrastructure. It is a fact that these tools remain quite daunting for the majority of scientists working at the bench. The routine use of these methods is not simply a matter of education and training. The authors are confident that the synergy of these technologies will bring great benefit to the industry, with more efficient production of higher-quality clinical candidates. The future is bright. The future is virtual.
V. Vyas et al.
[...] corporate databases by virtual screening using well-validated pharmacophore models will yield a significant improvement in lead structure determination.