Decrypting the interface residues of the protein complexes provides insight into the functions of the proteins and, hence, the overall cellular machinery. Computational methods have been devised in the past to predict the interface residues using amino acid sequence information, but all these methods have been majorly applied to predict for prokaryotic protein complexes. Since the composition and rate of evolution of the primary sequence is different between prokaryotes and eukaryotes, it is important to develop a method specifically for eukaryotic complexes. Here, we report a new hybrid pipeline for predicting the protein-protein interaction interfaces in a pairwise manner from the amino acid sequence information of the interacting proteins. It is based on the framework of Co-evolution, machine learning (Random Forest), and Network Analysis named CoRNeA trained specifically on eukaryotic protein complexes. We use Co-evolution, physicochemical properties, and contact potential as major group of features to train the Random Forest classifier. We also incorporate the intra-contact information of the individual proteins to eliminate false positives from the predictions keeping in mind that the amino acid sequence of a protein also holds information for its own folding and not only the interface propensities. Our prediction on example datasets shows that CoRNeA not only enhances the prediction of true interface residues but also reduces false positive rates significantly.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited