Special Issue "Computational Analysis for Protein Structure and Interaction"

A special issue of Molecules (ISSN 1420-3049). This special issue belongs to the section "Bioorganic Chemistry".

Deadline for manuscript submissions: 31 December 2018

Special Issue Editor

Guest Editor
Prof. Dr. Quan Zou

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Interests: bioinformatics; molecular computing; sequence alignment; systems biology

Special Issue Information

Dear Colleagues,

Protein structure analysis is a hot topic and a key issue in organic chemistry and molecular biology research. Several essential protein molecules have been reconstructed with cryo-electron microscopy (Cryo-EM), and their structures have been published in Nature and Science. Computational structure analysis and prediction is a key step in 3D structure reconstruction. Machine learning techniques have long been employed for protein secondary and tertiary structure prediction, and they seemed to have reached a bottleneck. However, the development of the Cryo-EM technique brings new challenges and requirements to computer science. Additionally, deep learning, a branch of machine learning, appears to be a powerful tool for these problems. Therefore, there is considerable and increasing interest in developing computational methods for protein structure analysis and prediction. Moreover, new structural techniques could also facilitate protein–protein interaction research.

The Guest Editor looks forward to collecting a set of recent advances in the related topics, to providing a platform for researchers, and to bridging the gap between computer science researchers and structural chemistry researchers.

Prof. Dr. Quan Zou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Molecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • protein structure prediction
  • protein–protein interaction network
  • Cryo-EM molecule particles boxing
  • Cryo-EM image process
  • machine learning
  • protein disorder region
  • docking
  • protein inter-residue contacts prediction

Published Papers (35 papers)

Research

Open Access Article: Prediction of GluN2B-CT1290-1310/DAPK1 Interaction by Protein–Peptide Docking and Molecular Dynamics Simulation
Molecules 2018, 23(11), 3018; https://doi.org/10.3390/molecules23113018 (registering DOI)
Received: 20 August 2018 / Revised: 4 November 2018 / Accepted: 6 November 2018 / Published: 19 November 2018
Abstract
The interaction of death-associated protein kinase 1 (DAPK1) with the C-terminus of the 2B subunit (GluN2B) of the N-methyl-D-aspartate receptor (NMDAR) plays a critical role in the pathophysiology of depression and is considered a potential target for the structure-based discovery of new antidepressants. However, the 3D structure of C-terminus residues 1290–1310 of GluN2B (GluN2B-CT1290-1310) remains elusive, and the interaction between GluN2B-CT1290-1310 and DAPK1 is unknown. In this study, the mechanism of interaction between DAPK1 and GluN2B-CT1290-1310 was predicted by computational simulation methods including protein–peptide docking and molecular dynamics (MD) simulation. Based on the equilibrated MD trajectory, the total binding free energy between GluN2B-CT1290-1310 and DAPK1 was computed by the molecular mechanics/generalized Born surface area (MM/GBSA) approach. The simulation results showed that hydrophobic, van der Waals, and electrostatic interactions are responsible for the binding of GluN2B-CT1290-1310/DAPK1. Moreover, through per-residue free energy decomposition and in silico alanine scanning analysis, hotspot residues at the GluN2B-CT1290-1310/DAPK1 interface were identified. In conclusion, this work predicted the binding mode and quantitatively characterized the protein–peptide interface, which will aid in the discovery of novel drugs targeting the GluN2B-CT1290-1310 and DAPK1 interface.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open Access Article: Novel Transforming Growth Factor-Beta Receptor 1 Antagonists through a Pharmacophore-Based Virtual Screening Approach
Molecules 2018, 23(11), 2824; https://doi.org/10.3390/molecules23112824
Received: 17 October 2018 / Revised: 27 October 2018 / Accepted: 29 October 2018 / Published: 31 October 2018
Abstract
As new drugs for the treatment of malignant tumors, transforming growth factor-beta receptor 1 (TGFβR1) antagonists have attracted wide attention. Based on the crystal structure of the TGFβR1-BMS22 complex, the pharmacophore model A02, with two hydrogen bond acceptors (HBAs) and four hydrophobic (HYD) properties, was constructed. From the common features of active ligands reported in the literature, pharmacophore model B10 was also generated, which has two aromatic ring centers (RAs) and two HYD properties. The two models have high sensitivity and specificity to the training set, and they are highly consistent in spatial structure. Combining the two pharmacophore models, two novel skeleton structures with potential activity were selected by virtual screening from the DruglikeDiverse, MiniMaybridge, and ZINC Drug-Like databases. Four compounds (YXY01–YXY04) with potential anti-TGFβR1 activity were designed based on the new skeleton structures. Based on Lipinski’s rules, the absorption, distribution, metabolism, excretion, and toxicity (ADMET) profile, and the toxicological properties predicted in the study, YXY01–YXY03, which have the novel skeleton, good drug-like properties, and potential activity, were finally identified; they may be safer than BMS22 and may be valuable for further research.
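The abstract above filters candidate compounds with Lipinski's rules. As a rough illustration only, the following Python sketch checks the four classic rule-of-five thresholds on descriptor values supplied by the caller. The `Descriptors` class, the function names, and the "at most one violation" cutoff are assumptions made for this example; the abstract does not say which software or cutoff the authors used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen bond donors
    h_bond_acceptors: int   # number of hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return a description of each rule-of-five threshold that is exceeded."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.log_p > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound with at most `max_violations` violations (assumed cutoff)."""
    return len(lipinski_violations(d)) <= max_violations
```

The descriptor values themselves would come from a cheminformatics toolkit; they are passed in here so that the sketch stays free of external dependencies.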

Open Access Article: A Central Edge Selection Based Overlapping Community Detection Algorithm for the Detection of Overlapping Structures in Protein–Protein Interaction Networks
Molecules 2018, 23(10), 2633; https://doi.org/10.3390/molecules23102633
Received: 9 September 2018 / Revised: 8 October 2018 / Accepted: 9 October 2018 / Published: 13 October 2018
Abstract
Overlapping structures of protein–protein interaction networks are very prevalent in different biological processes and reflect the mechanism by which common functional components are shared. The overlapping community detection (OCD) algorithm based on central node selection (CNS) is a traditional and accepted algorithm for OCD in networks. CNS consists mainly of central node selection and a clustering procedure. However, the original CNS does not consider the influence among the nodes or the importance of the division of the edges in networks. In this paper, an OCD algorithm based on central edge selection (CES) is proposed for detecting overlapping communities in protein–protein interaction (PPI) networks. Unlike the traditional CNS algorithms for OCD, the proposed algorithm uses community magnetic interference (CMI) to obtain more reasonable central edges in the process of CES, and employs a new distance between a non-central edge and the set of central edges to assign each non-central edge to the correct cluster during the clustering procedure. In addition, the proposed CES improves the overlapping nodes pruning (ONP) strategy to make the division more precise. The experimental results on three benchmark networks and three biological PPI networks, of Mus musculus, Escherichia coli, and Saccharomyces cerevisiae, show that the CES algorithm performs well.

Open Access Article: In-Silico Prediction and Modeling of the Quorum Sensing LuxS Protein and Inhibition of AI-2 Biosynthesis in Aeromonas hydrophila
Molecules 2018, 23(10), 2627; https://doi.org/10.3390/molecules23102627
Received: 17 September 2018 / Revised: 10 October 2018 / Accepted: 10 October 2018 / Published: 12 October 2018
Abstract
luxS is conserved in several bacterial species, including A. hydrophila, which causes infections in prawn, fish, and shrimp, and is consequently a great risk to the aquaculture industry and public health. luxS plays a critical role in the biosynthesis of autoinducer-2 (AI-2), which performs wide-ranging functions in bacterial communication, especially in quorum sensing (QS). Predicting the 3D structure of the QS-associated LuxS protein is thus essential to better understand and control A. hydrophila pathogenicity. Here, we predicted the structure of A. hydrophila LuxS and characterized it structurally and functionally with in silico methods. The predicted structure of LuxS provides a framework for developing more complete structural and functional insights and will aid the mitigation of A. hydrophila infection and the development of novel drugs to control infections. In addition to modeling, a suitable inhibitor was identified by high-throughput screening (HTS) against the drug-like subset of the ZINC database, and the inhibitor molecule (−)-Dimethyl 2,3-O-isopropylidene-l-tartrate was selected based on the best drug score. Molecular docking studies were performed to find the best binding affinity of LuxS homologs or the predicted model of the LuxS protein for ligand selection. Remarkably, this inhibitor molecule forms favorable interactions with amino acid residues LYS 23, VAL 35, ILE 76, and SER 90, which were found to play an essential role in the inhibition mechanism. These predictions suggest that the proposed inhibitor molecule may be considered a drug candidate against AI-2 biosynthesis in A. hydrophila. Therefore, the (−)-Dimethyl 2,3-O-isopropylidene-l-tartrate inhibitor molecule was studied to confirm its potency in inhibiting AI-2 biosynthesis. The results show that the inhibitor molecule had better efficacy in AI-2 inhibition at a concentration of 40 μM, which was further validated at the protein expression level using Western blotting. The AI-2 bioluminescence assay showed that the decreased amount of AI-2 biosynthesis and the downregulation of the LuxS protein play an important role in AI-2 inhibition. Lastly, these experiments were conducted with the supplementation of antibiotics via cocktail therapy of the AI-2 inhibitor plus OXY antibiotics, in order to determine the possibility of novel cocktail drug treatments of A. hydrophila infection.

Open Access Article: An Algorithm for Computing Side Chain Conformational Variations of a Protein Tunnel/Channel
Molecules 2018, 23(10), 2459; https://doi.org/10.3390/molecules23102459
Received: 24 August 2018 / Revised: 21 September 2018 / Accepted: 22 September 2018 / Published: 26 September 2018
Abstract
In this paper, a novel method to compute side chain conformational variations for a protein molecule tunnel (or channel) is proposed. From the conformational variations, we compute the flexibly deformed shapes of the initial tunnel, and present a way to compute the maximum size of the ligand that can pass through the deformed tunnel. Using two types of graphs, corresponding to amino acids and their side chain rotamers, the suggested algorithm classifies the amino acids and rotamers that may collide. Based on the divide-and-conquer technique, local side chain conformations are computed first, and then a global conformation is generated by combining them. With the exception of certain cases, experimental results show that the algorithm finds up to 327,680 valid side chain conformations from 128–1233 conformation candidates within three seconds.

Open Access Article: A New Method for Recognizing Cytokines Based on Feature Combination and a Support Vector Machine Classifier
Molecules 2018, 23(8), 2008; https://doi.org/10.3390/molecules23082008
Received: 19 June 2018 / Revised: 31 July 2018 / Accepted: 7 August 2018 / Published: 11 August 2018
Abstract
Research on cytokine recognition is of great significance in the medical field because cytokines benefit the diagnosis and treatment of diseases, but the current methods for cytokine recognition have many shortcomings, such as low sensitivity and low F-score. Therefore, this paper proposes a new method based on feature combination. The features are extracted from amino acid composition, physicochemical properties, secondary structures, and evolutionary information. The classifier used in this paper is a support vector machine (SVM). Experiments show that our method is better than other methods in terms of accuracy, sensitivity, specificity, F-score, and Matthews correlation coefficient.
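For readers unfamiliar with the evaluation measures named in this abstract, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are choices made for this illustration, not details taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall, true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F-score (F1): harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f_score = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    # Matthews correlation coefficient, defined as 0.0 here when undefined.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }
```

The Matthews correlation coefficient is often reported alongside accuracy because it stays informative when the positive and negative classes are very different in size.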

Open Access Article: Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods
Molecules 2018, 23(8), 2000; https://doi.org/10.3390/molecules23082000
Received: 13 July 2018 / Revised: 30 July 2018 / Accepted: 8 August 2018 / Published: 10 August 2018
Abstract
Accurate identification of phage virion proteins is not only a key step for understanding their function but also helpful for further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods for identifying phage virion proteins are time-consuming and costly, there is an urgent need for machine learning methods that identify phage virion proteins accurately and efficiently. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) with incremental feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.
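The g-gap dipeptide composition mentioned in this abstract can be illustrated with a short Python sketch. It counts residue pairs separated by `gap` intervening positions and returns a 400-dimensional frequency vector. The function name, the skipping of non-standard residues, and normalising by the number of counted pairs are assumptions of this example; the authors' exact feature definition may differ.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the vector layout.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence: str, gap: int) -> List[float]:
    """Frequency of each residue pair (x, y) with `gap` residues between x and y."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:          # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]
```

With `gap=0` this reduces to the ordinary adjacent dipeptide composition. Feature sets for several gap values could be concatenated into one vector before feature selection, which is one plausible reading of "mixing multiple sets" in the abstract.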

Open Access Article: NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features
Molecules 2018, 23(7), 1667; https://doi.org/10.3390/molecules23071667
Received: 22 May 2018 / Revised: 28 June 2018 / Accepted: 28 June 2018 / Published: 9 July 2018
Abstract
Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine serves to reveal biological changes associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desired. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank-sum test. A random forest classifier was then trained using these features to build the predictor. The final NTyroSite predictor achieved an area under the receiver operating characteristic curve (AUC) of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. Meanwhile, for a better understanding of our prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/.
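The AUC reported in this abstract has a simple pairwise interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. This is the same quantity as the normalised Mann-Whitney U statistic underlying the Wilcoxon rank-sum test. The sketch below computes it directly from that definition; the function name `roc_auc` is illustrative, and the quadratic-time loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))
```

In a k-fold cross-validation setting, the scores would be the held-out predictions collected across folds, or the AUC would be computed per fold and averaged; the abstract does not state which variant was used.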

Open Access Article: Feature Selection via Swarm Intelligence for Determining Protein Essentiality
Molecules 2018, 23(7), 1569; https://doi.org/10.3390/molecules23071569
Received: 25 May 2018 / Revised: 22 June 2018 / Accepted: 25 June 2018 / Published: 28 June 2018
Abstract
Protein essentiality is fundamental to comprehending the function and evolution of genes. The prediction of protein essentiality is pivotal in identifying disease genes and potential drug targets. Since experimental methods require substantial investments of time and funds, it is of great value to predict protein essentiality with high accuracy using computational methods. In this study, we present a novel feature selection method named Elite Search mechanism-based Flower Pollination Algorithm (ESFPA) to determine protein essentiality. Unlike other protein essentiality prediction methods, ESFPA uses an improved swarm intelligence–based algorithm for feature selection and selects optimal features for protein essentiality prediction. The first step is to collect numerous features that are highly predictive of essentiality. The second step is to develop a feature selection strategy based on a swarm intelligence algorithm to obtain the optimal feature subset. Furthermore, an elite search mechanism is adopted to further improve the quality of the feature subset. Subsequently, a hybrid classifier is applied to evaluate the essentiality of each protein. Finally, the experimental results show that our method is competitive with some well-known feature selection methods. The proposed method aims to provide a new perspective for protein essentiality determination.

Open Access Article: Regularized Multi-View Subspace Clustering for Common Modules Across Cancer Stages
Molecules 2018, 23(5), 1016; https://doi.org/10.3390/molecules23051016
Received: 4 April 2018 / Revised: 23 April 2018 / Accepted: 23 April 2018 / Published: 26 April 2018
Abstract
Discovering the common modules that are co-expressed across various stages can lead to an improved understanding of the underlying molecular mechanisms of cancers. There is a shortage of efficient tools for integrative analysis of gene expression and protein interaction networks for discovering common modules associated with cancer progression. To address this issue, we propose a novel regularized multi-view subspace clustering (rMV-spc) algorithm to obtain a representation matrix for each stage and a joint representation matrix that balances the agreement across various stages. To handle the heterogeneity of the data, the protein interaction network is incorporated into the objective of rMV-spc via regularization. Based on the interior point algorithm, we solve the optimization problem to obtain the common modules. By using artificial networks, we demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy. Furthermore, rMV-spc discovers common modules in breast cancer networks based on the breast cancer data, and these modules serve as biomarkers to predict stages of breast cancer. The proposed model and algorithm effectively integrate heterogeneous data for dynamic modules.

Open Access Article: Computational Prediction and Analysis of Associations between Small Molecules and Binding-Associated S-Nitrosylation Sites
Molecules 2018, 23(4), 954; https://doi.org/10.3390/molecules23040954
Received: 24 February 2018 / Revised: 30 March 2018 / Accepted: 9 April 2018 / Published: 19 April 2018
Abstract
Interactions between drugs and proteins occupy a central position during the process of drug discovery and development. Numerous methods have recently been developed for identifying drug–target interactions, but few have been devoted to finding interactions between post-translationally modified proteins and drugs. We present a machine learning-based method for identifying associations between small molecules and binding-associated S-nitrosylated (SNO-) proteins. Specifically, small molecules were encoded by molecular fingerprints, SNO-proteins were encoded by an information entropy-based method, and a random forest was used to train a classifier. Ten-fold and leave-one-out cross-validation achieved areas under the receiver operating characteristic curve of 0.7235 and 0.7490, respectively. Computational analysis of similarity suggested that SNO-proteins associated with the same drug shared statistically significant similarity, and vice versa. This method and finding are useful for identifying drug–SNO associations and further facilitate the discovery and development of SNO-associated drugs.

Open Access Article: Prediction of Protein-Protein Interactions from Amino Acid Sequences Based on Continuous and Discrete Wavelet Transform Features
Molecules 2018, 23(4), 823; https://doi.org/10.3390/molecules23040823
Received: 6 March 2018 / Revised: 25 March 2018 / Accepted: 29 March 2018 / Published: 4 April 2018
Cited by 2
Abstract
Protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of cells; thus, detecting PPIs is one of the most important issues in current molecular biology. Although much effort has been devoted to using high-throughput techniques to identify protein-protein interactions, the experimental methods are both time-consuming and costly, and they yield high rates of false positive and false negative results. Moreover, most of the proposed computational methods are limited by their need for information about protein homology or the interaction marks of the protein partners. In this paper, we report a computational method that uses only the information from protein sequences. The main improvements come from a novel protein sequence representation that combines the continuous and discrete wavelet transforms and from adopting a weighted sparse representation-based classifier (WSRC). The proposed method was used to predict PPIs from three different datasets: yeast, human and H. pylori. In addition, we employed the prediction model trained on the yeast PPI dataset to predict the PPIs of six datasets of other species. To further evaluate the performance of the prediction model, we compared WSRC with the state-of-the-art support vector machine classifier. When predicting PPIs on the yeast, human and H. pylori datasets, we obtained high average prediction accuracies of 97.38%, 98.92% and 93.93%, respectively. In the cross-species experiments, most of the prediction accuracies are over 94%. These promising results show that the proposed method is indeed capable of obtaining higher performance in PPI detection.

Open Access Article: Representation Learning for Class C G Protein-Coupled Receptors Classification
Molecules 2018, 23(3), 690; https://doi.org/10.3390/molecules23030690
Received: 27 February 2018 / Revised: 14 March 2018 / Accepted: 15 March 2018 / Published: 19 March 2018
Abstract
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The complete tertiary structure, including both extracellular and transmembrane domains, has not been determined for any member of the class C GPCRs. An alternative to working on GPCR structural models is to investigate their functionality through the analysis of their primary structure. For this, sequence representation is a key factor in GPCR classification, where feature engineering is usually carried out. In this paper, we propose the use of representation learning to acquire the features that best represent the class C GPCR sequences and, at the same time, to obtain a classification model automatically. Deep learning methods in conjunction with amino acid physicochemical property indices are used for this purpose. Experimental results assessed by the classification accuracy, Matthews correlation coefficient and the balanced error rate show that using a hydrophobicity index and a restricted Boltzmann machine (RBM) can achieve performance (accuracy of 92.9%) similar to that reported in the literature. As a second proposal, we combine two or more physicochemical property indices, instead of only one, as the input for a deep architecture in order to add information from the sequences. Experimental results show that combining three hydrophobicity-related indices improves the classification performance of an RBM (accuracy of 94.1%) beyond that reported in the literature for class C GPCRs without using feature selection methods.
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle RPiRLS: Quantitative Predictions of RNA Interacting with Any Protein of Known Sequence
Molecules 2018, 23(3), 540; https://doi.org/10.3390/molecules23030540
Received: 6 February 2018 / Revised: 24 February 2018 / Accepted: 25 February 2018 / Published: 28 February 2018
PDF Full-text (1149 KB) | HTML Full-text | XML Full-text
Abstract
RNA-protein interactions (RPIs) have critical roles in numerous fundamental biological processes, such as post-transcriptional gene regulation, viral assembly, cellular defence and protein synthesis. As the amount of available experimental RNA-protein binding data has increased rapidly due to high-throughput sequencing methods, it is now possible to measure and understand RNA-protein interactions by computational methods. In this study, we integrate a sequence-based derived kernel with regularized least squares to perform prediction. The derived kernel exploits the contextual information around an amino acid or a nucleic acid as well as the repetitive conserved motif information. We propose a novel machine learning method, called RPiRLS, to predict the interaction between any RNA and protein of known sequences. For the RPiRLS classifier, each protein sequence comprises up to 20 diverse amino acids, but for the RPiRLS-7G classifier, each protein sequence is represented using a 7-letter reduced alphabet based on physicochemical properties. We evaluated both methods on a number of benchmark data sets and compared their performances with two recently developed state-of-the-art methods, RPI-Pred and IPMiner. On the non-redundant benchmark test sets extracted from the PRIDB, the RPiRLS method outperformed RPI-Pred and IPMiner in terms of accuracy, specificity and sensitivity. Further, RPiRLS achieved an accuracy of 92% on the prediction of lncRNA-protein interactions. The proposed method can also be extended to construct RNA-protein interaction networks. The RPiRLS web server is freely available at http://bmc.med.stu.edu.cn/RPiRLS. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Structural Dynamics of DPP-4 and Its Influence on the Projection of Bioactive Ligands
Molecules 2018, 23(2), 490; https://doi.org/10.3390/molecules23020490
Received: 19 December 2017 / Revised: 1 February 2018 / Accepted: 6 February 2018 / Published: 23 February 2018
Cited by 1 | PDF Full-text (3168 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Dipeptidyl peptidase-4 (DPP-4) is a target to treat type II diabetes mellitus. Therefore, it is important to understand the structural aspects of this enzyme and its interaction with drug candidates. This study involved molecular dynamics simulations, normal mode analysis, binding site detection and analysis of molecular interactions to understand the protein dynamics. We identified some DPP-4 functional motions contributing to the exposure of the binding sites and twist movements revealing how the two enzyme chains are interconnected in their bioactive form, which are defined as chains A (residues 40–767) and B (residues 40–767). By understanding the enzyme structure, its motions and the regions of its binding sites, it will be possible to contribute to the design of new DPP-4 inhibitors as drug candidates to treat diabetes. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Designability of Aromatic Interaction Networks at E. coli Bacterioferritin B-Type Channels
Molecules 2017, 22(12), 2184; https://doi.org/10.3390/molecules22122184
Received: 31 October 2017 / Revised: 1 December 2017 / Accepted: 6 December 2017 / Published: 8 December 2017
PDF Full-text (3988 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
The bacterioferritin from E. coli (BFR), a maxi-ferritin made of 24 subunits, has been utilized as a model to study the fundamentals of protein folding and self-assembly. Through structural and computational analyses, two amino acid residues at the B-site interface of BFR were chosen to investigate the role they play in the self-assembly of nano-cage formation, and the possibility of building aromatic interaction networks at B-type protein–protein interfaces. Three mutants were designed, expressed, purified, and characterized using transmission electron microscopy, size exclusion chromatography, native gel electrophoresis, and temperature-dependent circular dichroism spectroscopy. All of the mutants fold into α-helical structures and possess lowered thermostability. The double mutant D132W/N34W was 12 °C less stable than the wild type, and was also the only mutant for which cage-like nanostructures could not be detected in the dried, surface-immobilized conditions of transmission electron microscopy. Two mutants—N34W and D132W/N34W—only formed dimers in solution, while mutant D132W favored the 24-mer even more robustly than the wild type, suggesting that we were successful in designing proteins with enhanced assembly properties. This investigation into the structure of this important class of proteins could help to understand the self-assembly of proteins in general. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Identification of DNA–protein Binding Sites through Multi-Scale Local Average Blocks on Sequence Information
Molecules 2017, 22(12), 2079; https://doi.org/10.3390/molecules22122079
Received: 31 October 2017 / Revised: 22 November 2017 / Accepted: 24 November 2017 / Published: 28 November 2017
Cited by 2 | PDF Full-text (1548 KB) | HTML Full-text | XML Full-text
Abstract
DNA–protein interactions play pivotal roles in diverse biological processes and are essential for cell metabolism, and identifying them computationally is a prudent way to reduce the cost of in vitro and in vivo experiments. A variety of state-of-the-art methods have been developed to improve the accuracy of DNA–protein binding site prediction. Nevertheless, structure-based approaches are limited when 3D information is unavailable, and predictive accuracy can still be improved. In this paper, we propose a competitive method, the Multi-scale Local Average Blocks (MLAB) algorithm, to address this issue. Unlike structure-based approaches, MLAB not only extracts local evolutionary information from primary sequences but also uses predicted solvent accessibility. Moreover, the predictors of DNA–protein binding sites are built with an ensemble weighted sparse representation model with random under-sampling. To evaluate the performance of MLAB, we conduct comprehensive experiments on DNA–protein binding site prediction. MLAB achieves an MCC of 0.392, 0.315, 0.439 and 0.245 on the PDNA-543, PDNA-41, PDNA-316 and PDNA-52 datasets, respectively. These results show that MLAB compares favorably with other leading methods. The MCC of our method is higher by at least 0.053, 0.015 and 0.064 on the PDNA-543, PDNA-41 and PDNA-316 datasets, respectively. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information
Molecules 2017, 22(12), 2056; https://doi.org/10.3390/molecules22122056
Received: 12 October 2017 / Revised: 19 November 2017 / Accepted: 20 November 2017 / Published: 25 November 2017
Cited by 13 | PDF Full-text (1888 KB) | HTML Full-text | XML Full-text
Abstract
Interactions between drugs and target proteins provide important information for drug discovery. Currently, experiments have identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. First, we calculate drug-drug linear neighborhood similarity in the feature spaces by considering how to reconstruct data points from their neighbors. Then, we take the similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets by using the drug-drug linear neighborhood similarity and known drug-target interactions. The experiments show that LPLNI can use known drug-target interactions alone to make high-accuracy predictions on four benchmark datasets. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) performs better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross-validation and a case study. Full article
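The propagation step described in this abstract can be sketched in general terms as follows. This is a generic label propagation iteration, not the authors' implementation: it assumes the drug-drug similarity matrix `W` is already given and row-normalized (the article derives it from linear neighborhood reconstruction, which is not reproduced here), and the function name, the update rule `F = alpha * W * F + (1 - alpha) * Y`, the value of `alpha` and the toy matrices are all assumptions made for illustration.

```python
def propagate_labels(W, Y, alpha=0.5, iterations=100):
    """Generic label propagation: repeat F <- alpha * W @ F + (1 - alpha) * Y.

    W: n x n row-normalized drug-drug similarity matrix (assumed given).
    Y: n x m binary matrix of known drug-target interactions.
    Returns an n x m matrix of interaction scores.
    """
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]  # start from the known interactions
    for _ in range(iterations):
        F = [
            [
                alpha * sum(W[i][k] * F[k][j] for k in range(n))
                + (1 - alpha) * Y[i][j]
                for j in range(m)
            ]
            for i in range(n)
        ]
    return F


# Toy example: 3 drugs, 2 targets. Drug 1 has no known interactions
# but is much more similar to drug 0 than to drug 2.
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.5, 0.5, 0.0],
]
Y = [
    [1, 0],
    [0, 0],
    [0, 1],
]
F = propagate_labels(W, Y)
print([round(score, 3) for score in F[1]])
```

In this toy case drug 1 receives a higher score for target 0 than for target 1, because most of its similarity mass points at drug 0, which is known to interact with target 0.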
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Predict the Relationship between Gene and Large Yellow Croaker’s Economic Traits
Molecules 2017, 22(11), 1978; https://doi.org/10.3390/molecules22111978
Received: 21 October 2017 / Revised: 5 November 2017 / Accepted: 6 November 2017 / Published: 16 November 2017
PDF Full-text (1888 KB) | HTML Full-text | XML Full-text
Abstract
The importance of a gene’s impact on traits is well appreciated. Gene expression affects the growth, immunity, reproduction and environmental resistance of some fish, and in turn the economic performance of fish-related business. Studying the connection between genes and traits can help elucidate the growth of fishes. Thus far, no curated database of large yellow croaker (Larimichthys crocea) genes exists. Genes related to the growth efficiency of fish will have a huge impact on research. For example, the protein encoded by the IFIH1 gene is associated with the immune system's response to viral infection, which affects the survival rate of large yellow croakers. Thus, we collected data from the published literature and combined them with a biological genetic database related to the large yellow croaker. Based on the data, we can predict new gene–trait associations which have not yet been discovered. This work will contribute to research on the growth of large yellow croakers. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine
Molecules 2017, 22(11), 1891; https://doi.org/10.3390/molecules22111891
Received: 20 September 2017 / Accepted: 26 October 2017 / Published: 3 November 2017
Cited by 2 | PDF Full-text (2023 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Glycation is a non-enzymatic process occurring inside or outside the host body by attaching a sugar molecule to a protein or lipid molecule. It is an important form of post-translational modification (PTM), which impairs the function and changes the characteristics of the proteins so that the identification of the glycation sites may provide some useful guidelines to understand various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. Firstly, we used multiple informative features to encode the peptides. These features included the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Secondly, the distribution of distinctive features of the residues surrounding the glycation and non-glycation sites was statistically analysed. Thirdly, based on the distribution of these features, we developed a new predictor by using different optimal window sizes for different properties and a two-step feature selection method, which utilized the maximum relevance minimum redundancy method followed by a greedy feature selection procedure. The performance of Glypre was measured with a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews’s correlation coefficient (MCC) of 0.52 by 10-fold cross-validation. The detailed analysis results showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of the Glypre are available in the Supplementary File. Full article
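One of the encodings listed in this abstract, the composition of k-spaced amino acid pairs, can be sketched as below. This is a minimal illustration of the general encoding, not the Glypre source code: the function name `cksaap`, the normalization by the number of pair windows, the skipping of non-standard residues and the example peptide are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ..., "YY").
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k):
    """Frequencies of residue pairs separated by k positions.

    Returns a 400-dimensional list ordered like PAIRS. Pairs containing
    a non-standard residue (for example a padding symbol) are not counted.
    """
    n_windows = len(peptide) - k - 1
    if n_windows <= 0:
        raise ValueError("peptide is too short for this value of k")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_windows):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[p] / n_windows for p in PAIRS]


# Hypothetical 5-residue peptide; with k = 1 the pairs are A.A, K.K, A.A.
features = cksaap("AKAKA", k=1)
print(features[PAIRS.index("AA")], features[PAIRS.index("KK")])
```

Feature vectors for several values of k are typically concatenated, so a window encoded with k = 0 to 4 would yield 2000 features before any feature selection.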
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
Molecules 2017, 22(10), 1732; https://doi.org/10.3390/molecules22101732
Received: 30 August 2017 / Revised: 11 October 2017 / Accepted: 11 October 2017 / Published: 17 October 2017
Cited by 11 | PDF Full-text (612 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language “ProLan” to the protein function language “GOLan”, and build a neural machine translation model based on recurrent neural networks to translate “ProLan” into “GOLan”. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our method on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose for the first time a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to protein function prediction. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Systematic Identification of Machine-Learning Models Aimed to Classify Critical Residues for Protein Function from Protein Structure
Molecules 2017, 22(10), 1673; https://doi.org/10.3390/molecules22101673
Received: 14 August 2017 / Revised: 24 September 2017 / Accepted: 24 September 2017 / Published: 9 October 2017
PDF Full-text (824 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Protein structure and protein function should be related, yet the nature of this relationship remains unsolved. Mapping the critical residues for protein function with protein structure features represents an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, here we introduce an index to quantify the protein-function criticality of a residue based on experimental data, and a strategy aimed to optimize both descriptors of protein structure (physicochemical and centrality descriptors) and machine learning algorithms, to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is considered as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues, eight models rendered accurate and non-overlapping classification of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Molecular Dynamic Simulation of Space and Earth-Grown Crystal Structures of Thermostable T1 Lipase Geobacillus zalihae Revealed a Better Structure
Molecules 2017, 22(10), 1574; https://doi.org/10.3390/molecules22101574
Received: 21 August 2017 / Accepted: 16 September 2017 / Published: 25 September 2017
PDF Full-text (3639 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
The reduced sedimentation and convection of a microgravity environment provide well-suited conditions for growing high-quality protein crystals. Thermostable T1 lipase derived from the bacterium Geobacillus zalihae has been crystallized using the counter-diffusion method under space and earth conditions. A preliminary study of both structures using the YASARA molecular modeling program showed differences in the number of hydrogen bonds, ionic interactions, and conformation. The space-grown crystal structure contains more hydrogen bonds than the earth-grown crystal structure. A molecular dynamics simulation study was used to provide insight into the fluctuations and conformational changes of both T1 lipase structures. The analysis of root mean square deviation (RMSD), radius of gyration, and root mean square fluctuation (RMSF) showed that the space-grown structure is more stable than the earth-grown structure. The space-grown structure also showed more hydrogen bonds and ion interactions than the earth-grown structure. Further analysis also revealed that the space-grown structure has long-lived interactions; hence, it is considered the more stable structure. This study describes the conformational dynamics of T1 lipase crystal structures grown under space and earth conditions. Full article
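Two of the trajectory measures named in this abstract, RMSD and radius of gyration, can be computed as in the sketch below. It is an illustration only and not the analysis pipeline used in the article: it assumes the two coordinate sets are already superimposed (no alignment step is performed) and that all atoms have equal mass, and the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate sets.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords):
    """Radius of gyration about the geometric center (equal masses assumed)."""
    n = len(coords)
    center = [sum(atom[d] for atom in coords) / n for d in range(3)]
    squared = sum((atom[d] - center[d]) ** 2 for atom in coords for d in range(3))
    return math.sqrt(squared / n)


# Made-up two-atom "structure" and a copy shifted by one unit along z.
reference = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
frame_rmsd = rmsd(reference, frame)
reference_rg = radius_of_gyration(reference)
print(frame_rmsd, reference_rg)
```

In a simulation analysis these quantities are evaluated for every frame against a reference structure; a lower and flatter RMSD curve is usually read as a sign of greater structural stability.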
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods
Molecules 2017, 22(10), 1602; https://doi.org/10.3390/molecules22101602
Received: 15 August 2017 / Revised: 19 September 2017 / Accepted: 20 September 2017 / Published: 22 September 2017
Cited by 2 | PDF Full-text (807 KB) | HTML Full-text | XML Full-text
Abstract
DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. The classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained from a combination of the three feature extraction methods show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Overall, mixed features are superior to single features. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Integrative Pathway Analysis of Genes and Metabolites Reveals Metabolism Abnormal Subpathway Regions and Modules in Esophageal Squamous Cell Carcinoma
Molecules 2017, 22(10), 1599; https://doi.org/10.3390/molecules22101599
Received: 22 August 2017 / Revised: 20 September 2017 / Accepted: 20 September 2017 / Published: 22 September 2017
PDF Full-text (2928 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Aberrant metabolism is one of the main driving forces in the initiation and development of esophageal squamous cell carcinoma (ESCC). Both genes and metabolites play important roles in metabolic pathways. Integrative pathway analysis of both genes and metabolites will thus help to interpret the underlying biological phenomena. Here, we performed integrative pathway analysis of gene and metabolite profiles by analyzing six gene expression profiles and seven metabolite profiles of ESCC. Multiple known and novel subpathways associated with ESCC, such as ‘beta-Alanine metabolism’, were identified via the cooperative use of differential genes, differential metabolites, and their positional importance information in pathways. Furthermore, a global ESCC-Related Metabolic (ERM) network was constructed and 31 modules were identified on the basis of clustering analysis in the ERM network. We found that the three modules located in the center regions of the ERM network, especially the core region of Module_1, primarily consisted of aldehyde dehydrogenase (ALDH) superfamily members, which contribute to the development of ESCC. For Module_4, pyruvate and the genes and metabolites in its adjacent region were clustered together and formed a core region within the module. Several prognostic genes, including GPT, ALDH1B1, ABAT, WBSCR22 and MDH1, appeared in the three center modules of the network, suggesting that they are potential prognostic markers in ESCC. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle EPuL: An Enhanced Positive-Unlabeled Learning Algorithm for the Prediction of Pupylation Sites
Molecules 2017, 22(9), 1463; https://doi.org/10.3390/molecules22091463
Received: 23 July 2017 / Revised: 29 August 2017 / Accepted: 30 August 2017 / Published: 5 September 2017
Cited by 3 | PDF Full-text (665 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Protein pupylation is a type of post-translational modification that plays a crucial role in the cellular function of prokaryotic organisms. To gain better insight into the mechanisms underlying pupylation, an initial but important step is to identify pupylation sites. To date, several computational methods have been established for the prediction of pupylation sites, which usually design the negative samples artificially from verified pupylation proteins to train the classifiers. However, if this process is not done properly, it can affect the performance of the final predictor dramatically. In this work, unlike previous computational methods, we propose an enhanced positive-unlabeled learning algorithm (EPuL) for the pupylation site prediction problem, which uses only positive and unlabeled samples. First, we separate the training dataset into the positive dataset and the unlabeled dataset, which contains the remaining non-annotated lysine residues. Then, the EPuL algorithm is used to select a reliable initial negative dataset and then iteratively pick out the non-pupylation sites. The performance of the proposed method was measured with an accuracy of 90.24%, an Area Under Curve (AUC) of 0.93 and an MCC of 0.81 by 10-fold cross-validation. A user-friendly web server for predicting pupylation sites was developed and made freely available at http://59.73.198.144:8080/EPuL. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Detection of Interactions between Proteins by Using Legendre Moments Descriptor to Extract Discriminatory Information Embedded in PSSM
Molecules 2017, 22(8), 1366; https://doi.org/10.3390/molecules22081366
Received: 24 July 2017 / Accepted: 15 August 2017 / Published: 18 August 2017
Cited by 3 | PDF Full-text (978 KB) | HTML Full-text | XML Full-text
Abstract
Protein-protein interactions (PPIs) play a very large part in most cellular processes. Although a great deal of research has been devoted to detecting PPIs through high-throughput technologies, these methods are clearly expensive and cumbersome. Compared with the traditional experimental methods, computational methods have attracted much attention because of their good performance in detecting PPIs. In our work, a novel computational method named PCVM-LM is proposed, which combines the probabilistic classification vector machine (PCVM) model and Legendre moments (LMs) to predict PPIs from amino acid sequences. The improvement mainly comes from using the LMs to extract discriminatory information embedded in the position-specific scoring matrix (PSSM), combined with the PCVM classifier to implement prediction. The proposed method was evaluated on Yeast and Helicobacter pylori datasets with five-fold cross-validation experiments. The experimental results show that the proposed method achieves high average accuracies of 96.37% and 93.48%, respectively, which are much better than those of other well-known methods. To further evaluate the proposed method, we also compared it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on the same datasets. The comparison results clearly show that our method is better than the SVM-based method and other existing methods. The promising experimental results show the reliability and effectiveness of the proposed method, which can be a useful decision support tool for protein research. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open AccessArticle Predicting and Interpreting the Structure of Type IV Pilus of Electricigens by Molecular Dynamics Simulations
Molecules 2017, 22(8), 1342; https://doi.org/10.3390/molecules22081342
Received: 30 June 2017 / Revised: 7 August 2017 / Accepted: 10 August 2017 / Published: 12 August 2017
Cited by 1 | PDF Full-text (4254 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Nanowires that transfer electrons to extracellular acceptors are important in organic matter degradation and nutrient cycling in the environment. Geobacter pili, which belong to the Type IV pili, are regarded as nanowire-like biological structures. However, determining the structure of pili remains challenging due to the insolubility of monomers, the presence of surface appendages, the heterogeneity of the assembly, and the low resolution of electron microscopy techniques. Our previous study provided a method to predict structures for Type IV pili. In this work, we improved on our previous method using molecular dynamics simulations to optimize the structures of Neisseria gonorrhoeae (GC), Neisseria meningitidis and Geobacter uraniireducens pili. Comparison between the predicted structures for GC and Neisseria meningitidis pili and their native structures revealed that the proposed method could predict Type IV pilus structures successfully. According to the predicted structures, the structural basis for conductivity in G. uraniireducens pili was attributed to the three N-terminal aromatic amino acids. The aromatics were interspersed within the regions of charged amino acids, which may influence the configuration of the aromatic contacts and the rate of electron transfer. These results will supplement experimental research into the mechanism of long-range electron transport along pili of electricigens. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open Access | Article | Neighbor Affinity-Based Core-Attachment Method to Detect Protein Complexes in Dynamic PPI Networks
Molecules 2017, 22(7), 1223; https://doi.org/10.3390/molecules22071223
Received: 28 June 2017 / Revised: 14 July 2017 / Accepted: 18 July 2017 / Published: 24 July 2017
Cited by 2 | PDF Full-text (5444 KB) | HTML Full-text | XML Full-text
Abstract
Protein complexes play significant roles in cellular processes. Identifying protein complexes from protein-protein interaction (PPI) networks is an effective strategy to understand biological processes and cellular functions. A number of methods have recently been proposed to detect protein complexes. However, most methods predict protein complexes from static PPI networks and usually overlook the inherent dynamics and topological properties of protein complexes. In this paper, we propose a novel method, called NABCAM (Neighbor Affinity-Based Core-Attachment Method), to identify protein complexes from dynamic PPI networks. Firstly, the centrality score of every protein is calculated. The proteins with the highest centrality scores are regarded as the seed proteins. Secondly, the seed proteins are expanded to complex cores by calculating the similarity values between the seed proteins and their neighboring proteins. Thirdly, the attachments are appended to their corresponding protein complex cores by comparing the affinity among neighbors inside the core against that outside the core. Finally, filtering processes are carried out to obtain the final clustering result. The results on the DIP database show that the NABCAM algorithm can predict protein complexes effectively in comparison with other state-of-the-art methods. Moreover, many protein complexes predicted by our method are biologically significant. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
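The seed, core, and attachment steps described in the abstract above can be sketched on a single static toy network as follows. This is only a simplified illustration, not the NABCAM algorithm itself: the abstract does not specify the centrality, similarity, or affinity measures, so degree centrality, Jaccard similarity of closed neighborhoods, and a simple inside-versus-outside neighbor count are stand-in assumptions, and the dynamic-network handling and final filtering step are omitted. All names are hypothetical.

```python
def neighbor_similarity(graph, u, v):
    """Jaccard similarity of closed neighborhoods (a stand-in measure)."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def detect_complex(graph, sim_threshold=0.5):
    """Return (core, attachments) grown from a single seed protein."""
    # Step 1: seed = protein with the highest degree (assumed centrality score).
    # Sorting first makes tie-breaking deterministic.
    seed = max(sorted(graph), key=lambda node: len(graph[node]))
    # Step 2: expand the seed to a core with sufficiently similar neighbors.
    core = {seed} | {
        v for v in graph[seed]
        if neighbor_similarity(graph, seed, v) >= sim_threshold
    }
    # Step 3: attach outside proteins that have more neighbors inside the
    # core than outside it (assumed affinity comparison).
    attachments = set()
    for node in set(graph) - core:
        inside = len(graph[node] & core)
        outside = len(graph[node] - core)
        if inside > outside:
            attachments.add(node)
    return core, attachments


# Toy undirected network stored as adjacency sets (made-up data).
toy_network = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D", "E"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A", "B", "F"},
    "F": {"E"},
}
core, attachments = detect_complex(toy_network, sim_threshold=0.6)
```

On this toy network the densely connected proteins A to D form the core, and E is attached because two of its three neighbors lie inside the core.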

Open Access | Article | Prediction of Drug–Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures
Molecules 2017, 22(7), 1119; https://doi.org/10.3390/molecules22071119
Received: 27 May 2017 / Revised: 27 June 2017 / Accepted: 3 July 2017 / Published: 5 July 2017
Cited by 9 | PDF Full-text (798 KB) | HTML Full-text | XML Full-text
Abstract
Knowledge of drug–target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for determining DTI have unavoidable shortcomings, including being time-consuming and expensive. This motivates us to develop an effective computational method to predict DTI based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). To evaluate the prediction capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. To further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method. This makes it a useful tool for predicting DTI, as well as for other bioinformatics tasks. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)
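The abstract above does not spell out how the bi-gram probabilities are computed. A minimal sketch, assuming the commonly used definition in which entry (m, n) sums the product of column m at position i and column n at position i + 1 over the sequence, might look like this. The function name and the toy matrix are hypothetical, and the PCA and RVM stages are not shown.

```python
def bigram_features(pssm):
    """Flattened bi-gram matrix of a PSSM given as a list of rows.

    Assumed definition: B[m][n] = sum_i pssm[i][m] * pssm[i + 1][n].
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Walk over consecutive sequence positions (row i and row i + 1).
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row[m] * next_row[n]
    # Flatten row by row into a fixed-length feature vector.
    return [value for bigram_row in bigram for value in bigram_row]


# Toy 3-column matrix standing in for a real 20-column PSSM (made-up data).
toy_pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
features = bigram_features(toy_pssm)
```

With a real PSSM of 20 columns, one per standard amino acid, this yields a 400-dimensional vector whose length does not depend on the protein length, which is the kind of input a dimensionality-reduction step such as PCA would then compress.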

Open Access | Article | High-Performance Prediction of Human Estrogen Receptor Agonists Based on Chemical Structures
Molecules 2017, 22(4), 675; https://doi.org/10.3390/molecules22040675
Received: 16 March 2017 / Revised: 16 April 2017 / Accepted: 19 April 2017 / Published: 23 April 2017
Cited by 3 | PDF Full-text (2839 KB) | HTML Full-text | XML Full-text
Abstract
Many agonists for the estrogen receptor are known to disrupt endocrine functioning. We have developed a computational model that predicts agonists for the estrogen receptor ligand-binding domain in an assay system. Our model was entered into the Tox21 Data Challenge 2014, a computational toxicology competition organized by the National Center for Advancing Translational Sciences. This competition aims to find high-performance predictive models for various adverse-outcome pathways, including the estrogen receptor. Our predictive model, which is based on the random forest method, delivered the best performance in its competition category. In the current study, the predictive performance of the random forest models was improved by strictly adjusting the hyperparameters to avoid overfitting. The random forest models were optimized from 4000 descriptors simultaneously applied to 10,000 activity assay results for the estrogen receptor ligand-binding domain, which have been measured and compiled by Tox21. Owing to the correlation between our model’s and the challenge’s results, we consider that our model currently possesses the highest predictive power on agonist activity of the estrogen receptor ligand-binding domain. Furthermore, analysis of the optimized model revealed some important features of the agonists, such as the number of hydroxyl groups in the molecules. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Review

Open Access | Review | Machine Learning Approaches for Protein–Protein Interaction Hot Spot Prediction: Progress and Comparative Assessment
Molecules 2018, 23(10), 2535; https://doi.org/10.3390/molecules23102535
Received: 30 August 2018 / Revised: 27 September 2018 / Accepted: 2 October 2018 / Published: 4 October 2018
PDF Full-text (1027 KB) | HTML Full-text | XML Full-text
Abstract
Hot spots are the subset of interface residues that account for most of the binding free energy, and they play essential roles in the stability of protein binding. Effectively identifying which specific interface residues of protein–protein complexes form the hot spots is critical for understanding the principles of protein interactions, and it has broad application prospects in protein design and drug development. Experimental methods like alanine scanning mutagenesis are labor-intensive and time-consuming. At present, the experimentally measured hot spots are very limited. Hence, the use of computational approaches to predicting hot spots is becoming increasingly important. Here, we describe the basic concepts and recent advances of machine learning applications in inferring the protein–protein interaction hot spots, and assess the performance of widely used features, machine learning algorithms, and existing state-of-the-art approaches. We also discuss the challenges and future directions in the prediction of hot spots. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open Access | Review | Prediction Methods of Herbal Compounds in Chinese Medicinal Herbs
Molecules 2018, 23(9), 2303; https://doi.org/10.3390/molecules23092303
Received: 12 August 2018 / Revised: 6 September 2018 / Accepted: 7 September 2018 / Published: 10 September 2018
PDF Full-text (749 KB) | HTML Full-text | XML Full-text
Abstract
Chinese herbal medicine has recently gained worldwide attention. The curative mechanism of Chinese herbal medicine is compared with that of western medicine at the molecular level. The treatment mechanism of most Chinese herbal medicines is still not clear. How do we integrate Chinese herbal medicine compounds with modern medicine? Methods for predicting the drug-likeness of Chinese herbal medicine compounds are particularly important. A growing number of compounds from Chinese herbal sources are now widely used as drug-like compound candidates. An important way for pharmaceutical companies to develop drugs is to discover potentially active compounds in Chinese herbs. The methods for predicting the drug-like properties of Chinese herbal compounds include the virtual screening method, the pharmacophore model method, and the machine learning method. In this paper, we focus on the prediction methods for the medicinal properties of Chinese herbal medicines. We analyze the advantages and disadvantages of these three methods, and then introduce the specific steps of the virtual screening method. Finally, we present the prospects for the joint application of the various methods. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Open Access | Review | Recent Advances in Conotoxin Classification by Using Machine Learning Methods
Molecules 2017, 22(7), 1057; https://doi.org/10.3390/molecules22071057
Received: 17 May 2017 / Revised: 12 June 2017 / Accepted: 19 June 2017 / Published: 25 June 2017
Cited by 11 | PDF Full-text (1485 KB) | HTML Full-text | XML Full-text
Abstract
Conotoxins are invaluable disulfide-rich small peptides that target ion channels and neuronal receptors. Conotoxins have been demonstrated as potent pharmaceuticals in the treatment of a series of diseases, such as Alzheimer’s disease, Parkinson’s disease, and epilepsy. In addition, conotoxins are ideal molecular templates for the development of new drug lead compounds and play important roles in neurobiological research. Thus, the accurate identification of conotoxin types will provide key clues for biological research and clinical medicine. Generally, conotoxin types are confirmed when their sequence, structure, and function are experimentally validated. However, it is time-consuming and costly to acquire the structure and function information by using biochemical experiments. Therefore, it is important to develop computational tools for efficiently and effectively recognizing conotoxin types based on sequence information. In this work, we review the current progress in computational identification of conotoxins in the following aspects: (i) construction of benchmark datasets; (ii) strategies for extracting sequence features; (iii) feature selection techniques; (iv) machine learning methods for classifying conotoxins; (v) the results obtained by these methods and the published tools; and (vi) future perspectives on conotoxin classification. The paper provides the basis for in-depth study of conotoxins and drug therapy research. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)

Other

Open Access | Technical Note | 3DCONS-DB: A Database of Position-Specific Scoring Matrices in Protein Structures
Molecules 2017, 22(12), 2230; https://doi.org/10.3390/molecules22122230
Received: 31 October 2017 / Revised: 11 December 2017 / Accepted: 13 December 2017 / Published: 15 December 2017
Cited by 1 | PDF Full-text (355 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Many studies have used position-specific scoring matrix (PSSM) profiles to characterize residues in protein structures and to predict a broad range of protein features. Moreover, PSSM profiles of Protein Data Bank (PDB) entries have been recalculated in many works for different purposes. Although the computational cost of calculating a single PSSM profile is affordable, many statistical studies and machine learning-based methods use thousands of profiles to achieve their goals, which leads to a substantial increase in computational cost. In this work, we present a new database compiling PSSM profiles for the proteins of the PDB. Currently, the database contains 333,532 protein chain profiles involving 123,135 different PDB entries. Full article
(This article belongs to the Special Issue Computational Analysis for Protein Structure and Interaction)