SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction

Computational prediction of Protein-Ligand Interaction (PLI) is an important step in the modern drug discovery pipeline, as it reduces the cost, time, and resources required to screen novel therapeutics. Deep Neural Networks (DNN) have recently shown excellent performance in PLI prediction. However, their performance depends strongly on the protein and ligand features used as model input. Moreover, in current models it is not trivial to decipher how protein features relate to the underlying principles that govern PLI. In this work, we developed a DNN framework named SSnet that predicts PLI from the secondary structure information of proteins, extracted as the curvature and torsion of the protein backbone. We demonstrate the performance of SSnet by comparing it against a variety of popular Machine Learning (ML) and non-ML models using several metrics. We visualize the intermediate layers of SSnet to show a potential latent space for proteins and, in particular, to extract the structural elements of a protein that the model finds influential for ligand binding, which is one of the key features of SSnet. We observed that SSnet learns information about locations in a protein where a ligand can bind, including binding sites, allosteric sites and cryptic sites, regardless of the conformation used. We further observed that SSnet is not biased towards any specific molecular interaction and extracts the protein fold information critical for PLI prediction. Our work forms an important gateway to the general exploration of secondary structure-based Deep Learning (DL), which is not confined to protein-ligand interactions and can therefore benefit protein research broadly, while being readily accessible to de novo drug designers as a standalone package.
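Curvature and torsion are the standard Frenet-Serret invariants of a space curve: for a backbone curve r, curvature is |r' x r''| / |r'|^3 and torsion is ((r' x r'') . r''') / |r' x r''|^2. As a rough illustration, assuming the backbone is represented by its C-alpha coordinates and that plain central finite differences are acceptable, they can be estimated as below. SSnet's own preprocessing may fit or smooth the backbone differently and is not reproduced here; the function name `curvature_torsion` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _combo(coeffs: Sequence[float], pts: Sequence[Vec]) -> Vec:
    """Component-wise linear combination sum_k coeffs[k] * pts[k]."""
    return tuple(sum(c * p[j] for c, p in zip(coeffs, pts)) for j in range(3))


def curvature_torsion(ca: Sequence[Vec]) -> List[Tuple[float, float]]:
    """Estimate (curvature, torsion) at each interior residue of a backbone.

    The residue index is used as the curve parameter (unit spacing);
    curvature and torsion do not depend on the parametrisation.
    Values are returned for residues 2 .. len(ca) - 3.
    """
    out = []
    for i in range(2, len(ca) - 2):
        window = ca[i - 2:i + 3]                      # residues i-2 .. i+2
        d1 = _combo([0, -0.5, 0, 0.5, 0], window)     # central 1st derivative
        d2 = _combo([0, 1, -2, 1, 0], window)         # central 2nd derivative
        d3 = _combo([-0.5, 1, 0, -1, 0.5], window)    # central 3rd derivative
        c = _cross(d1, d2)
        c2 = _dot(c, c)
        speed = math.sqrt(_dot(d1, d1))
        kappa = math.sqrt(c2) / speed ** 3
        # Torsion is undefined where the backbone is locally straight;
        # the threshold assumes coordinates in Angstrom-scale units.
        tau = _dot(c, d3) / c2 if c2 > 1e-12 else 0.0
        out.append((kappa, tau))
    return out
```

As a sanity check, on an ideal right-handed helix (cos t, sin t, t), whose exact curvature and torsion are both 1/2, the estimates come out very close to 0.5, and a straight line gives zero for both.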


Introduction to the Hilbert Space-Filling Curve
Space-filling curves are a class of curves in mathematical analysis that map a 1-dimensional line segment continuously onto a 2-dimensional region, so that every point of the region lies on the curve. Hilbert's space-filling curve is one example. It is defined recursively: with each recursion the curve covers more of the space, filling it completely in the limit of infinitely many recursions.
A first-order Hilbert curve divides a given square into four quadrants whose center points are joined consecutively, without revisiting a point or self-intersecting (similar to the path in the classic Snake game). A second-order curve further subdivides each quadrant into four sub-quadrants whose centers become the new points. The points are first connected within each sub-quadrant, and the sub-quadrants are then rotated so that the end point of each sub-quadrant can be connected to its neighbor.
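The recursive construction can be sketched with the widely used iterative index-to-coordinate conversion for the Hilbert curve (often called `d2xy`). This is an illustrative sketch, not the plot_hill.py script; the function names `rotate_quadrant`, `hilbert_d2xy` and `hilbert_path` are hypothetical. For a grid of side n = 2^order, it returns the (x, y) cell visited at step d of the curve, and the flip and swap in `rotate_quadrant` are the sub-quadrant rotations described above.

```python
from typing import List, Tuple


def rotate_quadrant(size: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a sub-quadrant so that its end points join its neighbours."""
    if ry == 0:
        if rx == 1:
            # 180-degree rotation; combined with the swap below this is a
            # reflection across the sub-quadrant's anti-diagonal.
            x = size - 1 - x
            y = size - 1 - y
        # Swap x and y: reflection across the main diagonal.
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map step d (0 <= d < n*n) of a Hilbert curve on an n x n grid to (x, y).

    n must be a power of two (n = 2**order).
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rotate_quadrant(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_path(order: int) -> List[Tuple[int, int]]:
    """All cells of a Hilbert curve of the given order, in visiting order."""
    n = 2 ** order
    return [hilbert_d2xy(n, d) for d in range(n * n)]
```

For example, `hilbert_path(1)` gives the first-order "U" `[(0, 0), (0, 1), (1, 1), (1, 0)]`; at every order, consecutive cells are grid neighbours and no cell is visited twice.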
This process is repeated recursively; in the limit, the curve passes through every point of the 2D space. In the present study, we repeat the recursion until the number of generated points is greater than or equal to the number of compounds to be mapped. The Python script plot_hill.py, available at https://github.com/nischal-karki/chem-hilbert-web-host/, was used to generate 2D images of the required dimensions for each of the datasets.
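Each recursion multiplies the number of points by four, so an order-k curve has 4^k points on a 2^k x 2^k grid. A minimal sketch of the stopping rule follows, assuming it means choosing the smallest order that gives every compound its own pixel. The function names `hilbert_order` and `image_side` are hypothetical and are not taken from plot_hill.py.

```python
def hilbert_order(n_items: int) -> int:
    """Smallest curve order whose 4**order grid points can hold n_items."""
    if n_items < 1:
        raise ValueError("need at least one item to map")
    order = 0
    # Integer loop avoids floating-point rounding in log4(n_items).
    while 4 ** order < n_items:
        order += 1
    return order


def image_side(n_items: int) -> int:
    """Side length (in pixels) of the square image for n_items compounds."""
    return 2 ** hilbert_order(n_items)
```

For instance, 1000 compounds need order 5 (4^4 = 256 is too small, 4^5 = 1024 suffices), giving a 32 x 32 image with 24 unused cells.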

SSnet and smina scores
Top scores for both smina and SSnet. SSnet scores are probabilities, in the range 0 to 1, of a drug binding with an IC50 less than 10 nM. Smina values are given in kcal/mol. Values represent the mean of 3 replicas. The reported deviation is the average absolute deviation from the mean of the 3 smina replicas.
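The reported smina value and its deviation can be read as the mean E_mean = (1/3) * sum(E_i) and the mean absolute deviation (1/3) * sum(|E_i - E_mean|) over the 3 replicas. A minimal sketch of that arithmetic follows; the function name `replica_summary` and the replica scores in the example are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def replica_summary(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and mean absolute deviation of replica docking scores (kcal/mol)."""
    if not scores:
        raise ValueError("at least one replica score is required")
    m = mean(scores)
    # Average of the absolute deviations from the replica mean.
    mad = sum(abs(s - m) for s in scores) / len(scores)
    return m, mad
```

With hypothetical replica scores of -9.1, -8.7 and -9.5 kcal/mol, the mean is -9.1 kcal/mol and the deviation is (0 + 0.4 + 0.4) / 3, or about 0.27 kcal/mol.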

Top scores with SSnet
Top scores for SSnet, ACE2 (open) - ACE2:S1 (closed).
SSnet scores with and without zinc for known RAAS-interacting molecules
Table S4 shows the difference in SSnet scores for known RAAS-interacting molecules in the presence and absence of zinc. The table indicates that the zinc cation raises the predicted binding probabilities of these compounds, suggesting positive cooperativity between zinc and these molecules in binding to the ACE2 receptor.
Top scores for SSnet, ACE2 (open) - ACE2:S1 (closed), with zinc.
Smina and SSnet scores for top-scoring compounds according to smina without restriction on number of atoms
Table S7 shows the top-scoring compounds ranked by smina score when no cutoff on the number of atoms is applied. Note that the top-binding compounds show a large deviation in their smina energies. A significant proportion of the compounds in this table are macrolides with antibiotic function.
SSnet scores are probabilities, in the range 0 to 1, of a drug binding with an IC50 less than 10 nM. Smina values are given in kcal/mol. Values represent the mean of 3 replicas. The reported deviation is the average absolute deviation from the mean of the 3 smina replicas.
Figure S1: ACE2:S1 complex with the box used for docking. Blue: ACE2 receptor. Orange: S1 subunit of the spike protein. Green: box used for smina docking.
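Table S7 differs from the other rankings only in that no cutoff on the number of atoms is applied. A minimal sketch of such a ranking follows, assuming each compound record carries its atom count, replica-averaged smina score and SSnet score. The `DockedCompound` fields, the `max_atoms` parameter and any cutoff value are hypothetical, since the cutoff used for the other tables is not stated here. Lower (more negative) smina scores indicate stronger predicted binding, so compounds are sorted in ascending order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DockedCompound:
    name: str
    n_atoms: int        # number of atoms in the ligand
    smina_mean: float   # mean smina score over replicas, kcal/mol (lower is better)
    ssnet_score: float  # SSnet probability in [0, 1]


def rank_by_smina(compounds: List[DockedCompound],
                  max_atoms: Optional[int] = None,
                  top: int = 10) -> List[DockedCompound]:
    """Rank compounds by mean smina score, optionally dropping large molecules.

    max_atoms=None corresponds to "no restriction on number of atoms".
    """
    pool = [c for c in compounds
            if max_atoms is None or c.n_atoms <= max_atoms]
    return sorted(pool, key=lambda c: c.smina_mean)[:top]
```

Without a cutoff, large molecules such as macrolides can dominate the top of the ranking; passing a `max_atoms` value removes them before sorting.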

Defective 3D structures
The