1. Introduction
A critical step in innovative drug development is determining the interactions among drugs and targets, which is the forerunner of drug design [
1,
2]. Drugs play an important role in the human body by interacting with their targets, of which proteins are an essential target. By inhibiting or enhancing the function of the target protein, the drug achieves the goal of treating the disease. Although the advent of high-throughput sequencing methods has provided technical support for determining DTIs and extensive efforts has been made by drug developers, few new drugs are still approved by the Food and Drug Administration (FDA) for marketing each year [
3,
4,
5,
6]. The main reason is that the identification of DTIs by wet experimental approaches alone consumes a lot of time and money, and the scale of identification is small. With the development of computational biology, this situation can be greatly alleviated. Computer-aided prediction of DTIs can be executed rapidly at scale, providing reliable candidate drug targets for biological experiments and theoretical support for new drug development [
2,
4,
7,
8,
9].
To date, computer-aided prediction-based models for DTIs have been devised by numerous researchers, and they can be roughly classified into two groups: the approach based on network and the approach based on machine learning [
10,
11,
12]. The approach based on network approach typically characterizes the association among targets and drugs as a heterogeneous network, and predict DTI by evaluating network topology node similarity. For instance, the SDTBNI model designed by Wu et al. [
13] predicts DTIs by DTI networks, drug and entity substructure linkages in unknown network space. Chu et al. [
14] proposed a new DTI prediction method called the DTI-CDF model. This method can not only extract similarity features between drugs, but also extract similarity features between target proteins from heterogeneity graph, which greatly improves the prediction performance. Zhang et al. [
15] designed the prediction method of DTIs according to LPLNI, which makes use of the data of neighborhood re-construction. Chu et al. [
16] facilitate multi-label classification by introducing the community detection method DTI-MLCD for DTI prediction. The method performs significantly better than other machine learning methods and other existing methods in the updated gold standard dataset. Zong et al. [
17] used DeepWalk combined with target-target and drug-drug similarities to accurately predict DTIs with the support of Linked Tripartite Network (LTN) and biomedical related data. The approach based on machine learning mainly uses computer to extract data features and combine with classifier to implement DTIs prediction [
18,
19]. For example, Peng et al. [
20] used a semi-supervised inference way to predict DTIs by combining a PCA-based convex optimization algorithm with information about drug targets.
On the basis of the hypothesis that drugs with chemical similarity have similar bioactivity, the DTIs prediction method of target protein information combined with drug structure information has achieved excellent results. Therefore, this paper designs the machine learning approach for predicting DTIs according to this hypothesis. Specifically, we first extracted the protein sequence features information using the Pseudo Position-Specific Score Matrix (PsePSSM) method, then fused them with drug molecular fingerprint descriptors and finally accurately predicted DTIs with interactions by the feature-weighted rotation forest classifier (FwRF) classifier. We tested the performance of RoFDT in datasets including Enzyme, GPCR, Ion Channel and Nuclear Receptor, and compared them with other feature approaches, classifier approaches and previous methods. The superior results demonstrate that the proposed model has excellent ability to identify DTIs. The frame diagram of rofdt is shown in
Figure 1.
2. Materials and Methods
2.1. Gold Standard Datasets
In the present study, we validated the performance of RoFDT on four gold standard datasets, including Enzyme, GPCR, Ion Channel and Nuclear Receptor. These data were collected from SuperTarget & Matador [
21], KEGG BRITE [
22], BRENDA [
23] and DrugBank [
24] databases by Yamanishi et al. [
25]. In these four datasets, the number of DTIs pairs (drug, target) they contain is (445, 664), (210, 204), (233, 95) and (54, 26), respectively, and the number of DTIs with interaction (positive sample) is 2926, 1476, 635 and 90, respectively [
26]. We describe the network of DTIs by a bipartite diagram, where targets or drugs are presented by nodes and their associations are represented by edges. To construct the balanced dataset, we use the random strategy to select the same number of negative samples as positive samples.
2.2. Drug Molecular Descriptor
Drug molecular fingerprinting is widely used to characterize drug compounds because it can directly represent the association between molecular properties and structure and does not need their three-dimensional structural information. Drug molecule fingerprinting manages molecular substructures with dictionary strategy. For a particular molecule, the corresponding position of its dictionary is set to 1 when it has a certain substructure and 0 otherwise. Thus, the fingerprint descriptor of a given drug molecule can be constructed. We used molecular fingerprints from PubChem in this study, and the fingerprints property is “PUBCHEM_CACTVS_SUBGRAPHKEYS”. The compound molecule is decomposed into 881 substructures, so its fingerprint feature descriptor is also 881-dimensional.
2.3. Target Protein Descriptor
In this study, PSSM [
27] was used to generate descriptors of protein sequences. PSSM
S(i,j) can be characterized as
, which is an L × 20 matrix, in which the length of sequence is L and the types of amino acids are 20. Therefore, the formula of
S(i,j) is described as shown below:
where
indicates the probability that the
residue of the protein is mutated into the
amino acid during evolution.
We obtained PSSM through Position-Specific Iterated BLAST (PSI-BLAST) according to SwansProt dataset [
28,
29]. PSI-BLAST will calculate the vector indicating the mutational conservatism of 20 different amino acids. To obtain broad and high homologous protein, the parameter
e-value and iterations are set to 0.001 and 3, respectively.
2.4. Protein Feature Extraction
For better compatibility with the PSSM matrix, we extracted the potential features of proteins using the PsePSSM designed by Chou et al. [
30], which can be denoted as below:
where
is the score calculated by PSI-BLAST, which score can be positive or negative. The probability of the appropriate mutation in the protein sequence higher than unexpectedly expected is indicated by a positive number, otherwise a negative number. However, since protein sequences of different lengths yield different rows of substrates, we thus need to convert them to a uniform pattern using the following equation:
and:
where
represents the average score when protein residue P evolves into a
j-type amino acid. To prevent protein P from losing its sequence information during evolution, we improved the equation by constructing pseudo-amino acids, which are described as follows:
where
is the correlation factor of
j-type amino acids.
2.5. Classification Prediction
In our study, we classify and predict DTIs feature descriptors by FwRF. This classifier has the advantage of increasing the effective feature weights and removing the noise information, which can effectively improve the prediction accuracy. FwRF uses the
statistical method to obtain the weights of different features, and its formula is as follows:
where
is the number of
categories with the value
, and its statistics are as follows:
is the expectation of
and
, and it can be denoted as below:
where
is the total sample size. In feature
, the sample size whose value is
is recorded as
, and in class
, the sample size whose value is
is recorded as
.
FwRF first calculates the weights of the features using the statistical method, then sorts them in descending order and removes the low-weight features depending on the parameters, and finally uses the newly obtained feature set for classification prediction.
Rotation forest (RF) [
31,
32] is a widespread classifier. Given a dataset
containing
training samples, where
is the data and
is the label, the data
consist of
features, thus forming a matrix of
. The decision tree in RF is presented as
, and there are
in total. The execution steps of RF are as follows.
a. The feature set is grouped into -independent parts of the number by the appropriate parameter.
b. The new matrix of the training set is formed using the corresponding feature columns of , and 3/4 of the features are selected from it forms matrix with bootstrap.
c. The coefficient matrix
is obtained through the feature transformation
, and the coefficient matrix
is rotated to generate the rotation matrix
, which is described as follows:
In the classification prediction stage, the classifier
calculates the confidence level
of the test sample
using the following formula and discriminates it as the class with the highest confidence value:
3. Results
3.1. Evaluation Indicator
To better evaluate the RoFDT performance, we used the general evaluation standard of machine learning in this study, which can be denoted as below:
where
TP,
TN,
FP and
FN, respectively, represent True Positive, True Negative, False Positive and False Negative. Moreover, the receiver operating characteristic (ROC) curve [
33,
34,
35] and area under the ROC curve (AUC) were also calculated to reflect the performance of RoFDT.
3.2. Parameter Evaluation
To maximize the RoFDT performance, the grid search approach is employed to verify the FwRF and PsePSSM parameters. When data features are extracted using the PsePSSM algorithm, the information content can be adjusted by changing the parameters in Equation (5) to obtain different feature values. We investigate the effect of different parameters of PsePSSM on the subsequent classification effect in this experiment in order to select the best combination of parameters. The effect of different parameters of the classifier on its prediction accuracy in the enzyme dataset is shown in
Figure 2. It can be seen from the figure that the RF classifier achieves the highest accuracy when the feature subset
, the feature selection ratio
and the number of decision trees
are set to 16, 0.8 and 21, respectively. Therefore, we apply them as the optimal parameters in the model.
3.3. Prediction Performance Evaluation
After optimizing the parameters of RoFDT, we evaluate its performance in the Enzyme, GPCR, Ion Channel and Nuclear Receptor datasets using the five-fold cross-validation (5FCV) method.
Table 1,
Table 2,
Table 3 and
Table 4 summarizes the outcomes obtained by RoFDT on the gold standard datasets. In these datasets, RoFDT achieved 91.68%, 88.11%, 84.72% and 78.33% prediction accuracy, and its standard variance was 0.84%, 1.01%, 1.94% and 5.34%, respectively. In terms of sensitivity, RoFDT achieved scores of 90.84%, 90.30%, 84.73% and 81.97% in the four datasets with standard variances of 1.68%, 1.61%, 3.45% and 7.85%, respectively. RoFDT achieved 83.39%, 79.02%, 74.06%, 65.56% and 91.72%, 88.27%, 85.57% and 75.31% in the MCC and AUC evaluation metrics, which combine to show predictive performance, respectively. These excellent results show that RoFDT has good ability to predict DTIs with strong robustness.
Figure 3,
Figure 4,
Figure 5 and
Figure 6 show the ROC curves acquired by RoFDT in the gold standard dataset, respectively.
3.4. Comparison of Different Feature Models
To estimate the influence of the PsePSSM algorithm on the RoFDT model, we compare it with the Local Phase Quantization (LPQ) algorithm model on four gold standard datasets in this part of the experiment. The LPQ algorithm originally described in the article for texture description by Ojansivu and Heikkila [
36] and is according to the blur invariance property of the Fourier phase spectrum [
37,
38,
39].
Table 5 lists the 5FCV outcomes produced by LPQ combined with FwRF on gold standard datasets. As observed in
Table 5, RoFDT has gained the optimal outcomes in all evaluation indicators. Detailed 5FCV outcomes on four gold standard datasets are aggregated in
Tables S1–S4 of Supplementary Materials. For a fair comparison, FwRF was set with the same hyperparameters in the experiment. From the experimental outcomes, it can be seen that PsePSSM combined with FwRF can effectively promote the model performance.
3.5. Classifier Model Comparison
To investigate further the influence of various classifiers on the RoFDT performance, we compare it with the SVM classifier model. The parameters of the SVM were refined, and its hyperparameters
and
were optimized to 0.6 and 0.5, respectively. The optimization outcomes of SVM parameters are shown in detail in
Table S9 of the Supplementary Materials. As can be seen in
Table 6, RoFDT achieved higher scores in all four gold standard datasets compared to the SVM model. Specifically, RoFDT achieved optimal results in the four gold standard datasets for accuracy, MCC, sensitivity and AUC, but was only slightly less precision than the SVM model in the Ion Channel and Enzyme datasets. Detailed 5FCV experimental outcomes on gold standard datasets are shown in
Tables S5–S8 of Supplementary Materials. The experimental results of comparing different classifier models show that the FwRF classifier used by RoFDT can be better compatible with it, which helps to increase the model prediction accuracy.
3.6. Comparison with Previous Models
Using the powerful computing power of computer to predict DTIs on a large scale has become increasingly important in the field of new drug research and development. Numerous researchers have constructed different computational models to solve this problem. To further evaluating RoFDT’s capabilities, we compare it with these excellent models. Among these excellent models, we chose the model that is also implemented in the four datasets and evaluated using 5FCV. The AUCs generated by these models are listed in
Table 7. As seen in table, RoFDT performed well overall, achieving the best results on Enzyme and the second highest outcomes on Ion Channel and GPCR. However, constrained by the sample size of Nuclear Receptor, RoFDT is not sufficiently trained and performs generally in it.
3.7. Case Studies
To verify the power of RoFDT to predict unknown DTIs, all known DTI pairs are used to train RoFDT and predict in its unknown space. We validate the top 10 DTIs with the highest prediction scores in SuperTarget [
21] and the drug target pairs validated in the SuperTarget database do not contain the data used for training. SuperTarget is a drug target database with a collection of 332,828 DTIs. The outcomes of the case studies are listed in
Table 8, where 7 of the top 10 with the best prediction scores were validated by this database. The case study reveals that RoFDT has the capability to competitively predict unknown DTIs. It is interesting to note that although the remaining three pairs of DTIs are not substantiated by the current database, there is a possibility that their relationship will be proved as the study progresses.
4. Discussion
In the present study, we propose a reliable DTI prediction approach RoFDT by combining protein sequence and drug molecular structure. We first transformed the protein sequence information numerically by PSSM based on its sequence information, and extracted its hidden features using PsePSSM. The drug structure is then encoded as the digital descriptor based on molecular fingerprinting techniques. Finally, the performance of RoFDT was verified using FwRF on four benchmark datasets, and its prediction results were confirmed by the authoritative databases. All these exceptional outcomes show that RoFDT is a valid approach for predicting DTIs and can provide new insights for potential drug discovery.
RoFDT exhibits competitive advantages over previous DTI prediction models. The reason for this is that RoFDT considers that protein sequences provide rich information support for DTI prediction, and its PSSM descriptors are well compatible with PsePSSM feature extraction method to extract its potential features to the maximum extent. In addition, the molecular fingerprint descriptors of drug structures can faithfully represent different drug substructure properties, and thus, have a high characterization capability. Under the above circumstances, RoFDT was able to predict DTI more accurately and provide a more reliable theoretical basis for drug development.
However, RoFDT still has some limitations. For example, the utilization of protein sequence information by RoFDT relies mainly on PSSM, and its richer description needs to be further explored. Second, although the feature extraction method used by RoFDT has achieved better results, it still requires more manual experience to support, and the automation process needs to be better improved. Finally, RoFDT requires more data for training and is not very sensitive to newly discovered drug targets. In future research, we intend to explore more intelligent feature characterization methods to overcome the above-mentioned shortcomings and further enhance the RoFDT performance.
5. Conclusions
As a pioneering step in drug development, the reliable prediction and identification of DTIs plays an essential element in innovative drug research. In the present study, we combined protein sequence and drug molecular structure to design a computational model for DTIs prediction. The proposed model achieves excellent results in the gold standard datasets including Enzyme, GPCR, Ion Channel and Nuclear Receptor. The model also exhibits strong powerful in comparison with extraction algorithm models, classifier models, and previous methods. In addition, 7 of the top 10 DTIs predicted by the proposed model have been verified by relevant database. These outcomes suggest that the RoFDT model can be employed as a stable and dependable tool to provide valuable target candidates for innovative drug research.
Supplementary Materials
The following supporting information can be downloaded at:
https://www.mdpi.com/article/10.3390/biology11050741/s1, Table S1: 5FCV results of FwRF combined with LPQ on Enzyme dataset, Table S2: 5FCV results of FwRF combined with LPQ on Ion Channel dataset, Table S3: 5FCV results of FwRF combined with LPQ on GPCR dataset, Table S4: 5FCV results of FwRF combined with LPQ on Nuclear Receptor dataset, Table S5: 5FCV results of the SVM classifier model on enzyme dataset, Table S6: 5FCV results of SVM classifier model on Ion Channel dataset, Table S7: 5FCV results of SVM classifier model on GPCR dataset, Table S8: 5FCV results of SVM classifier model on Nuclear Receptor dataset, Table S9: The results of SVM parameter optimization using grid search method on Enzyme dataset, Figure S1: The effect of different PsePSSM Parameters on classifier performance, Figure S2: The effect of different feature selection ratio on classifier performance.
Author Contributions
Conceptualization, L.W. (Lei Wang), L.W. (Leon Wong) and Y.W.; methodology, X.S., B.Z. and Y.L.; writing—original draft preparation, Y.W. and L.W. (Lei Wang), writing—review and editing, L.W. (Leon Wong) and Z.Y.; investigation, L.W. (Lei Wang); supervision, B.Z. and Y.L.; project administration, Z.Y.; funding acquisition, L.W. (Lei Wang) and Z.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported in part by the National Natural Science Foundation of China, under Grants 62172355 and 61702444, in part by the Tianshan youth—Excellent Youth, under Grant 2019Q029, in part by the West Light Foundation of The Chinese Academy of Sciences, under Grant 2018-XBQNXZ-B-008, and in part by the Qingtan scholar talent project of Zaozhuang University.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors would like to thank all anonymous reviewers for their constructive advices.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Xia, Z.; Wu, L.-Y.; Zhou, X.; Wong, S.T.C. Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst. Biol. 2010, 4, S6. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, Y.-C.; Yang, Z.-X.; Wang, Y.; Deng, N.-Y. Computationally Probing Drug-Protein Interactions via Support Vector Machine. Lett. Drug Des. Discov. 2010, 7, 370–378. [Google Scholar] [CrossRef]
- Landry, Y.; Gies, J.-P. Drugs and their molecular targets: An updated overview. Fundam. Clin. Pharmacol. 2008, 22, 1–18. [Google Scholar] [CrossRef] [PubMed]
- Li, Q.; Lai, L. Prediction of potential drug targets based on simple sequence properties. BMC Bioinform. 2007, 8, 353. [Google Scholar] [CrossRef] [Green Version]
- Van de Waterbeemd, H.; Gifford, E. ADMET in silico modelling: Towards prediction paradise? Nat. Rev. Drug Discov. 2003, 2, 192–204. [Google Scholar] [CrossRef]
- Wang, L.; You, Z.-H.; Huang, D.-S.; Li, J.-Q. MGRCDA: Metagraph Recommendation Method for Predicting CircRNA-Disease Association. IEEE Trans. Cybern. 2021, 1–9. [Google Scholar] [CrossRef]
- Kuruvilla, F.G.; Shamji, A.F.; Sternson, S.M.; Hergenrother, P.J.; Schreiber, S.L. Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays. Nature 2002, 416, 653–657. [Google Scholar] [CrossRef]
- Haggarty, S.J.; Koeller, K.M.; Wong, J.C.; Butcher, R.A.; Schreiber, S.L. Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays. Chem. Biol. 2003, 10, 383–396. [Google Scholar] [CrossRef] [Green Version]
- Wang, L.; Yan, X.; You, Z.-H.; Zhou, X.; Li, H.-Y.; Huang, Y.-A. SGANRDA: Semi-supervised generative adversarial networks for predicting circRNA–disease associations. Brief. Bioinform. 2021, 22, bbab028. [Google Scholar] [CrossRef]
- Chen, X.; Yan, C.C.; Zhang, X.; Zhang, X.; Dai, F.; Yin, J.; Zhang, Y. Drug-target interaction prediction: Databases, web servers and computational models. Brief. Bioinform. 2016, 17, 696. [Google Scholar] [CrossRef] [Green Version]
- Alguwaizani, S.; Park, B.; Zhou, X.; Huang, D.-S.; Han, K. Predicting interactions between virus and host proteins using repeat patterns and composition of amino acids. J. Healthc. Eng. 2018, 2018, 1–9. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, L.; You, Z.-H.; Li, J.-Q.; Huang, Y.-A. IMS-CDA: Prediction of CircRNA-Disease Associations From the Integration of Multisource Similarity Information With Deep Stacked Autoencoder Model. IEEE Trans. Cybern. 2020, 51, 5522–5531. [Google Scholar] [CrossRef] [PubMed]
- Wu, Z.; Cheng, F.; Li, J.; Li, W.; Liu, G.; Tang, Y. SDTNBI: An integrated network and chemoinformatics tool for systematic prediction of drug–target interactions and drug repositioning. Brief. Bioinform. 2017, 18, 333–347. [Google Scholar] [CrossRef] [PubMed]
- Chu, Y.; Kaushik, A.C.; Wang, X.; Wang, W.; Zhang, Y.; Shan, X.; Salahub, D.R.; Xiong, Y.; Wei, D.-Q. DTI-CDF: A cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Brief. Bioinform. 2021, 22, 451–462. [Google Scholar] [CrossRef]
- Zhang, W.; Chen, Y.; Li, D. Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information. Molecules 2017, 22, 2056. [Google Scholar] [CrossRef] [Green Version]
- Chu, Y.; Shan, X.; Chen, T.; Jiang, M.; Wang, Y.; Wang, Q.; Salahub, D.R.; Xiong, Y.; Wei, D.-Q. DTI-MLCD: Predicting drug-target interactions using multi-label learning with community detection method. Brief. Bioinform. 2021, 22, bbaa205. [Google Scholar] [CrossRef]
- Zong, N.; Kim, H.; Ngo, V.; Harismendy, O. Deep Mining Heterogeneous Networks of Biomedical Linked Data to Predict Novel Drug-Target Associations. Bioinformatics 2017, 33, 2337–2344. [Google Scholar] [CrossRef]
- Huang, D.-S.; Zhang, L.; Han, K.; Deng, S.; Yang, K.; Zhang, H. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression. Curr. Protein Pept. Sci. 2014, 15, 553–560. [Google Scholar] [CrossRef]
- Xia, J.-F.; Han, K.; Huang, D.-S. Sequence-Based Prediction of Protein-Protein Interactions by Means of Rotation Forest and Autocorrelation Descriptor. Protein Pept. Lett. 2010, 17, 137–145. [Google Scholar] [CrossRef]
- Peng, L.; Liao, B.; Zhu, W.; Li, Z.; Li, K. Predicting Drug-Target Interactions With Multi-Information Fusion. IEEE J. Biomed. Health Inform. 2017, 21, 561–572. [Google Scholar] [CrossRef]
- Gunther, S.; Kuhn, M.; Dunkel, M.; Campillos, M.; Senger, C.; Petsalaki, E.; Ahmed, J.; Urdiales, E.G.; Gewiess, A.; Jensen, L.J.; et al. SuperTarget and Matador: Resources for exploring drug-target relationships. Nucleic Acids Res. 2008, 36, D919–D922. [Google Scholar] [CrossRef] [PubMed]
- Kanehisa, M.; Goto, S.; Hattori, M.; Aoki-Kinoshita, K.F.; Itoh, M.; Kawashima, S.; Katayama, T.; Araki, M.; Hirakawa, M. From genomics to chemical genomics: New developments in KEGG. Nucleic Acids Res. 2006, 34, D354–D357. [Google Scholar] [CrossRef] [PubMed]
- Schomburg, I.; Chang, A.; Ebeling, C.; Gremse, M.; Heldt, C.; Huhn, G.; Schomburg, D. BRENDA, the enzyme database: Updates and major new developments. Nucleic Acids Res. 2004, 32, D431–D433. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wishart, D.S.; Knox, C.; Guo, A.C.; Cheng, D.; Shrivastava, S.; Tzur, D.; Gautam, B.; Hassanali, M. DrugBank: A knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008, 36, D901–D906. [Google Scholar] [CrossRef]
- Yamanishi, Y.; Araki, M.; Gutteridge, A.; Honda, W.; Kanehisa, M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics 2008, 24, I232–I240. [Google Scholar] [CrossRef]
- Wang, L.; You, Z.H.; Chen, X.; Yan, X.; Liu, G.; Zhang, W. RFDT: A Rotation Forest-based Predictor for Predicting Drug-Target Interactions Using Drug Structure and Protein Sequence Information. Curr. Protein Pept. Sci. 2018, 19, 445–454. [Google Scholar] [CrossRef]
- Gribskov, M.; McLachlan, A.D.; Eisenberg, D. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. USA 1987, 84, 4355–4358. [Google Scholar] [CrossRef] [Green Version]
- Altschul, S.F.; Madden, T.L.; Schaffer, A.A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [Google Scholar] [CrossRef] [Green Version]
- Wang, L.; You, Z.-H.; Xia, S.-X.; Liu, F.; Chen, X.; Yan, X.; Zhou, Y. Advancing the prediction accuracy of protein-protein interactions by utilizing evolutionary information from position-specific scoring matrix and ensemble classifier. J. Theor. Biol. 2017, 418, 105–110. [Google Scholar] [CrossRef]
- Chou, K.C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins Struct. Funct. Genet. 2001, 43, 246–255. [Google Scholar] [CrossRef]
- Rodriguez, J.J.; Kuncheva, L.I. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; You, Z.-H.; Yan, X.; Xia, S.-X.; Liu, F.; Li, L.-P.; Zhang, W.; Zhou, Y. Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions. Sci. Rep. 2018, 8, 12874. [Google Scholar] [CrossRef] [PubMed]
- Zweig, M.H.; Campbell, G. Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clin. Chem. 1993, 39, 561–577. [Google Scholar] [CrossRef]
- Wang, L.; Wang, H.-F.; Liu, S.-R.; Yan, X.; Song, K.-J. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci. Rep. 2019, 9, 9848. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Wang, B.; Huang, D.-S.; Jiang, C. A new strategy for protein interface identification using manifold learning method. IEEE Trans. Nanobiosci. 2014, 13, 118–123. [Google Scholar] [CrossRef] [PubMed]
- Ojansivu, V.; Heikkila, J. Blur insensitive texture classification using local phase quantization. Image Signal Process. 2008, 5099, 236–243. [Google Scholar]
- Wang, H.; Song, A.; Li, B.; Xu, B.; Li, Y. Psychophysiological classification and experiment study for spontaneous EEG based on two novel mental tasks. Technol. Health Care 2015, 23, S249–S262. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.; Olson, E.B. A General Purpose Feature Extractor for Light Detection and Ranging Data. Sensors 2010, 10, 10356–10375. [Google Scholar] [CrossRef] [Green Version]
- Li, Y.; Olson, E.B. Structure Tensors for General Purpose LIDAR Feature Extraction. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 1869–1874. [Google Scholar]
- Chen, H.; Zhang, Z. A Semi-Supervised Method for Drug-Target Interaction Prediction with Consistency in Networks. PLoS ONE 2013, 8, e62975. [Google Scholar] [CrossRef] [Green Version]
- Gonen, M. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 2012, 28, 2304–2310. [Google Scholar] [CrossRef]
- Öztürk, H.; Ozkirimli, E.; Özgür, A. A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction. BMC Bioinform. 2016, 17, 128. [Google Scholar] [CrossRef] [PubMed] [Green Version]
| Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).