A Review and Experimental Analysis of Supervised Learning Systems and Methods for Protein–Protein Interaction Detection
Abstract
1. Introduction
2. Motivation and Key Contributions
2.1. Motivation
2.2. Why This Survey—What Is Different and Better
- Triangulated evidence, not summaries: Unlike prior reviews that stop at collating reported metrics, we triangulate literature-derived aggregates with observational (qualitative, deployment-oriented) comparisons and with our own experiments on benchmark datasets. This layered design lets readers reconcile “headline” metrics with operational constraints (compute budget, data regime, interpretability needs).
- Critical commentary tied to real-world use: Beyond accuracy tables, we analyze where techniques succeed or fail in practice—for example, noting when instance-based methods bottleneck at inference, when kernel choices drive SVM variance, and when deep stacks trade performance for compute. We also provide concrete deployment guidance (e.g., WKNN vs. KNN calibration and when to favor each under feature noise or curation quality), which typical surveys omit.
- A methods-first taxonomy with pointers to sections (Figure 1): We organize the space by methodology (from ELM/CNN/GNN/DNN to probabilistic and margin-based learners) and explicitly link taxonomy nodes to manuscript sections, improving navigability for readers who need technique-specific depth quickly.
- Clear, prescriptive takeaways: We convert the comparative analyses into strategic recommendations—what to use for high-accuracy scenarios, real-time triage, quick prototyping, or when interpretability dominates—so researchers can immediately map method to constraint.
2.3. Key Contributions
- Three-Layered Evaluation (Empirical + Experimental): We integrate (a) Comparative Quantitative results (averaged accuracy, F1, and time across studies), (b) Comparative Observational assessments (scalability, interpretability, dataset fit, efficiency), and (c) our Experimental Evaluations, offering a balanced, multi-angle evidence base that goes beyond single-view meta-analysis.
- Critical Commentary and Insights for Supervised PPI Methods: For each family—ELM, CNNs, GNNs, DNNs, Naïve Bayes, Probabilistic Decision Trees, SVM/LS-SVM, KNN/WKNN—we articulate why and when performance differs (e.g., CNNs capture sequence motifs but miss long-range structure; GNNs exploit topology and interfaces; ELMs deliver ultra-fast training but can trade off specificity), anchoring commentary in both literature trends and experimental behavior.
- Deployment-Aware Guidance: We translate findings into practical rules of thumb: prefer GNNs/DNNs for maximal accuracy on large or structure-rich data; use ELM for large-scale, real-time screening; apply Naïve Bayes for rapid prototyping; treat KNN/WKNN cautiously at scale due to inference cost, with calibration tips when local distances are reliable.
- Evidence-Backed Rankings with Nuanced Trade-offs: Our synthesis highlights GNNs and DNNs as accuracy leaders, while ELM and Naïve Bayes dominate efficiency; we also surface cases where classical variants (e.g., LS-SVM) stabilize performance under noise. This enables principled model selection rather than one-size-fits-all claims.
- What Readers Gain vs. Existing Surveys: Readers receive (a) triangulated results (not just curated citations), (b) qualitative, operations-focused analysis aligned to real datasets and resource envelopes, (c) section-indexed taxonomy for fast lookup, and (d) prescriptive recommendations by scenario; these are elements that are typically missing or treated separately in prior work.
- Bridging Theory and Practice: By combining theoretical underpinnings, literature-level aggregation, and hands-on experiments, the survey clarifies accuracy–efficiency–interpretability trade-offs and accelerates method-to-use-case mapping in biomedical pipelines.
- End-user takeaway: For sequence-only settings with tight compute, start with CNNs (or ELM for ultra-fast triage). When structure or topology is available, or when interfaces matter, favor GNNs. Where peak accuracy is paramount, and where resources permit, use DNNs—ideally paired with graph reasoning. These prescriptions emerge from the combined empirical and experimental evidence synthesized in this work.

3. Review Methodology and a Standardized Evaluation Scale (Scale 1–4)
- The total number of papers initially retrieved from the specified databases is 146.
- The number of papers after removing duplicates is 128.
- The number of papers excluded based on title/abstract/content screening is 58.
- The final number of papers included in the survey is: (1) 70 papers, whose proposed methods are discussed; (2) 30 papers that are only referenced.
- Accuracy scores were assigned based on reported quantitative metrics (e.g., AUROC, F1-score, MCC) and robustness across datasets.
- Scalability scores were determined using dataset size, computational complexity, and evidence of large-scale applicability.
- Efficiency scores were based on reported training cost, runtime behavior, and model complexity.
- Interpretability scores were assigned through qualitative assessment of model transparency, availability of explanation mechanisms, and biological interpretability.
- Reported empirical results from the literature.
- Computational characteristics of each method.
- Observational (qualitative) analysis.
- Sequence-based features (e.g., amino acid composition, k-mer frequencies, evolutionary profiles);
- Physicochemical properties (e.g., hydrophobicity, charge, polarity);
- Structural features (e.g., 3D conformations, residue contact maps);
- Network/topological features (e.g., node degree, graph connectivity in PPI networks);
- Embedding-based features (e.g., protein language model embeddings, learned representations).
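To make the sequence-based entries in the list above concrete, here is a minimal Python sketch of amino acid composition and k-mer frequency descriptors, concatenated into a pair-level vector. The function names and the choice of concatenation are illustrative assumptions, not the feature pipeline of any surveyed paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def kmer_frequencies(seq, k=2):
    """Normalized counts of overlapping k-mers over the standard alphabet (20**k values)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(seq_a, seq_b, k=2):
    """Hypothetical pair descriptor: concatenate both proteins' feature vectors."""
    return (amino_acid_composition(seq_a) + kmer_frequencies(seq_a, k)
            + amino_acid_composition(seq_b) + kmer_frequencies(seq_b, k))
```

Concatenation makes the descriptor order-dependent, so a symmetric combination (for example, an element-wise sum) may be preferable when the pair is unordered.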
4. Neural Network-Based Learning Category
4.1. Graph Neural Networks (GNNs) Technique
4.1.1. Components and Rationale of GNNs for PPI Prediction
4.1.2. Mathematical Formulation of GNNs
4.1.3. Featuring and Evaluating Research Papers That Have Utilized GNNs for PPI Detection
4.2. Deep Neural Networks (DNNs) Technique
4.2.1. Components and Rationale of DNNs for PPI Prediction
4.2.2. Mathematical Formulation of DNNs
- The input to each layer is the output of the previous layer, or the input vector for the first hidden layer.
- The activation function is commonly the ReLU (Rectified Linear Unit), defined as $\mathrm{ReLU}(z) = \max(0, z)$.
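The layer-wise computation described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes each layer applies an affine map followed by ReLU; the names `dense_layer` and `forward` are hypothetical, and a real PPI classifier would normally end with a sigmoid or softmax output layer.

```python
def relu(z):
    """Element-wise ReLU: max(0, z)."""
    return [max(0.0, v) for v in z]


def dense_layer(a_prev, weights, biases, activation=relu):
    """One fully connected layer.

    a_prev  : output of the previous layer (or the input vector for the first layer)
    weights : one row of weights per unit in this layer
    biases  : one bias per unit in this layer
    """
    z = [sum(w * a for w, a in zip(row, a_prev)) + b
         for row, b in zip(weights, biases)]
    return activation(z)


def forward(x, layers):
    """Feed x through a stack of (weights, biases) hidden layers."""
    a = x
    for weights, biases in layers:
        a = dense_layer(a, weights, biases)
    return a
```

For example, `dense_layer([1.0, -2.0], [[1.0, 1.0], [0.5, 0.0]], [0.0, 0.25])` gives pre-activations of -1.0 and 0.75, so the ReLU output is `[0.0, 0.75]`.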
4.2.3. Featuring and Evaluating Research Papers That Have Utilized DNNs for PPI Detection
4.3. Convolutional Neural Networks (CNNs) Technique
4.3.1. Components and Rationale of CNNs for PPI Prediction
4.3.2. Mathematical Formulation of CNNs
4.3.3. Featuring and Evaluating Research Papers That Have Utilized CNNs for PPI Detection
4.4. Extreme Learning Machine (ELM) Technique
4.4.1. Components and Rationale of ELM for PPI Prediction
4.4.2. Mathematical Formulation of ELM
- $w_i$: Random weights between the input and hidden layer.
- $b_i$: Random biases for the hidden layer nodes.
- g(⋅): Activation function (e.g., sigmoid, ReLU, or radial basis function).
- $H$: Hidden layer output matrix.
- $\beta$: Output weights between the hidden layer and the output layer.
- $T$: Target matrix derived from $t_i$.
- $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$, computed as:
- $H^{\dagger} = (H^{T}H)^{-1}H^{T}$ if $H^{T}H$ is nonsingular;
- $H^{\dagger} = H^{T}(HH^{T})^{-1}$ if $HH^{T}$ is nonsingular.
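The closed-form training step described by these definitions can be sketched with NumPy as follows. This is a minimal illustration, assuming a sigmoid activation and standard normal random weights; the function names and the default of 50 hidden nodes are illustrative choices, not the configuration of any surveyed ELM variant.

```python
import numpy as np


def _hidden_output(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation g."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def train_elm(X, T, n_hidden=50, seed=0):
    """Fit an ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.normal(size=n_hidden)                # random hidden biases, never updated
    H = _hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T                 # output weights via the Moore-Penrose inverse
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Network output H @ beta; threshold at 0.5 for a 0/1 interaction label."""
    return _hidden_output(X, W, b) @ beta
```

Because no iterative optimization is involved, the cost is dominated by a single pseudo-inverse, which is what gives ELM its short training time.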
4.4.3. Featuring and Evaluating Research Papers That Have Utilized ELM for PPI Detection
| Comparative Insights Across Neural Network-Based Learning PPI Detection Families: ELM, CNN, GNN, vs. generic DNN/Transformers |
| ELM vs. CNN: For sequence-only PPI pipelines, ELM variants (e.g., PCA-EELM, WELM-SURF) train in seconds and handle class imbalance cheaply, but they rely on hand-crafted k-mer/physicochemical features and plateau on cross-species transfer; CNNs learn discriminative local motifs from raw sequences (e.g., residual/ensemble CNNs) and typically surpass ELMs on accuracy, albeit with higher compute and careful regularization. CNN vs. GNN: CNNs excel when only primary sequence is available and labeling is noisy, but they miss 3D geometry and long-range contacts; GNNs built on residue/atom graphs integrate sequence and structure, capturing spatial dependencies and boosting generalization on structure-rich benchmarks. GNN vs. generic DNN/Transformers: Modern DNN stacks using protein language models (PLMs) provide powerful per-protein embeddings for pairwise classification, yet they can conflate evolutionary signal with data leakage and struggle without explicit interface cues; GNNs complement PLMs by message passing over contact graphs or PPI networks, improving interface-aware predictions and multi-category interaction types. PLMs generate context-aware protein embeddings, which capture evolutionary, structural, and functional information without manual feature engineering. In PLM-interact frameworks: (1) each protein is encoded independently using a PLM; (2) the resulting embeddings are combined (e.g., concatenation, similarity scoring); (3) a downstream classifier predicts interaction probability. PLM-based models significantly improve performance due to rich representations. However, they may suffer from data leakage risks and lack of explicit structural reasoning, which motivates hybrid approaches. ELM vs. DNN (deployment): If you need rapid screening over millions of candidate pairs or on edge devices in labs, ELMs offer excellent latency/throughput and transparent calibration but require curated features and careful negative sampling; DNNs/PLMs are state-of-the-art in accuracy and cross-domain transfer, yet demand GPUs, larger training sets, and vigilance against train–test homology bias. End-to-end takeaway (real-world PPI detection): For sequence-only, scarce-compute settings, choose CNNs (or ELMs for ultrafast triage); for structure-available or interface-critical tasks, choose GNNs; and for maximal accuracy under ample compute, use PLM-based DNNs, ideally hybrid PLM→GNN models that join learned embeddings with graph reasoning. |
5. Margin-Based Discriminative Category
5.1. Support Vector Machine (SVM) Technique
5.1.1. Components and Rationale of SVM for PPI Prediction
5.1.2. Mathematical Formulation of SVM
- $x_i$: feature vector representing a protein pair.
- $y_i \in \{+1, -1\}$: label indicating whether the pair interacts (+1) or does not interact (−1).
- w: normal vector to the hyperplane.
- b: bias term.
- $\xi_i$: slack variable for each data point, allowing for soft margin classification.
- C > 0: regularization parameter controlling the trade-off between maximizing the margin and minimizing classification error.
- $\alpha_i$: Lagrange multipliers from the dual optimization problem.
- $K(x_i, x)$: kernel function applied to the new data point $x$ and the support vectors $x_i$.
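To illustrate how the dual quantities above are used at prediction time, the following sketch evaluates the standard kernel SVM decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$ for a new protein-pair vector. It assumes an RBF kernel and takes the support vectors, multipliers, and bias as given; solving the dual problem itself is not shown, and the function names are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def svm_predict(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Return +1 (interacting) or -1 (non-interacting) from the sign of f(x)."""
    return 1 if svm_decision(x, support_vectors, labels, alphas, b, kernel) >= 0 else -1
```

Only training points with nonzero $\alpha_i$ enter the sum, which is why a sparse set of support vectors keeps inference cheap.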
5.1.3. Featuring and Evaluating Research Papers That Have Utilized SVM for PPI Detection
5.2. Least Squares SVM (LS-SVM) Technique
5.2.1. Components and Rationale of LS-SVM for PPI Prediction
5.2.2. Mathematical Formulation of LS-SVM
- Ωij = K(xi, xj): kernel matrix.
- K(⋅,⋅): kernel function (e.g., RBF, polynomial).
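Using the kernel matrix $\Omega_{ij} = K(x_i, x_j)$ defined above, LS-SVM training reduces to a single linear system. The sketch below assumes the common formulation $\begin{bmatrix} 0 & \mathbf{1}^{T} \\ \mathbf{1} & \Omega + I/\gamma \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$ with ±1 targets and an RBF kernel; the regularization symbol $\gamma$, the kernel width `sigma`, and the function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel_matrix(A, B, sigma=1.0):
    """Pairwise RBF kernel values between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM linear system for the bias b and the multipliers alpha."""
    n = X.shape[0]
    omega = rbf_kernel_matrix(X, X, sigma)       # Omega_ij = K(x_i, x_j)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                               # constraint row: sum(alpha) = 0
    A[1:, 0] = 1.0                               # bias column
    A[1:, 1:] = omega + np.eye(n) / gamma        # regularized kernel block
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]             # b, alpha


def predict_lssvm(X_new, X_train, alpha, b, sigma=1.0):
    """Sign of sum_i alpha_i * K(x, x_i) + b for each row of X_new."""
    return np.sign(rbf_kernel_matrix(X_new, X_train, sigma) @ alpha + b)
```

Every training point generally receives a nonzero multiplier, so the model is dense, in contrast to the sparse support-vector set of a standard SVM.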
5.2.3. Featuring and Evaluating Research Papers That Have Utilized LS-SVM for PPI Detection
| Comparative Insights Across Margin-Based Families: SVM vs. Least Squares SVM |
| Optimization and model footprint: For PPI detection with heterogeneous descriptors (e.g., PSSM, contact potentials, structure cues), classic SVM’s hinge loss maximizes a geometric margin and yields sparse solutions (few support vectors), so inference on proteome-scale screens is lean; LS-SVM’s least-squares reformulation solves linear systems quickly but produces denser decision functions, trading faster retraining for heavier prediction cost. This makes SVM preferable when you must score billions of pairs or deploy on constrained hardware, while LS-SVM is advantageous when gold standards or features change frequently and you must refit often. Noise, imbalance, and calibration: Under real assay noise and severe class imbalance (true interactions are rare), SVM’s margin objective is slightly more outlier-tolerant, whereas LS-SVM’s quadratic loss amplifies the influence of mislabeled positives; with reweighting/SMOTE, both can recover minority recall, but SVM typically preserves precision better at the same recall. For probabilistic scoring and automated hyperparameter selection, LS-SVM integrates naturally with evidence/Bayesian formulations, while standard SVM usually needs post-hoc calibration (Platt/isotonic) and broader C–γ searches; so LS-SVM gives smoother probability estimates out of the box, whereas SVM gives sharper decision boundaries that generalize more stably under label noise. Feature–kernel interplay and deployment roles: With string, spectrum, or RBF kernels over sequence/structure features, both families reach similar peak accuracy on moderate-sized PPI sets; the practical gap emerges in how they get there: SVM’s sparsity curbs latency at scale, while LS-SVM’s linear-algebra training accelerates iteration on new organisms or feature sets. In production pipelines, a high-throughput pattern is to use LS-SVM for rapid model refreshes and calibrated candidate scoring, then freeze a classic SVM for the final, sparse reranker that meets throughput/SLA targets; invert that choice only if retraining frequency dominates runtime cost. Overall, choose LS-SVM when you value swift, well-calibrated retraining and SVM when you need durable margins, fewer support vectors, and predictable speed at deployment. |
6. Probabilistic Learning Category
6.1. Naïve Bayes Technique
6.1.1. Components and Rationale of Naïve Bayes for PPI Prediction
6.1.2. Mathematical Formulation of Naïve Bayes
- This linear combination of log probabilities simplifies computation and is widely implemented in practice.
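The log-probability sum mentioned above can be sketched as follows: the class score is $\log P(c) + \sum_j \log P(x_j \mid c)$, and the predicted class is the one with the highest score. The sketch assumes categorical features with precomputed likelihood tables; the table layout and function names are hypothetical.

```python
import math


def nb_log_scores(features, priors, likelihoods):
    """Unnormalized log posterior per class: log P(c) + sum_j log P(x_j | c).

    priors      : {class: P(class)}
    likelihoods : {class: [ {value: P(value | class)} for each feature j ]}
    """
    scores = {}
    for c, prior in priors.items():
        scores[c] = math.log(prior) + sum(
            math.log(likelihoods[c][j][x]) for j, x in enumerate(features))
    return scores


def nb_predict(features, priors, likelihoods):
    """Return the class with the largest log score."""
    scores = nb_log_scores(features, priors, likelihoods)
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and in practice smoothed likelihoods are needed so that no table entry is exactly zero.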
6.1.3. Featuring and Evaluating Research Papers That Have Utilized Naïve Bayes for PPI Detection
6.2. Probabilistic Decision Tree Technique
6.2.1. Components and Rationale of Probabilistic Decision Tree for PPI Prediction
6.2.2. Mathematical Formulation of Probabilistic Decision Tree
- Count(c, L): The number of samples of class c reaching the leaf node L.
- Count(L): The total number of samples reaching L.
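With these two counts, the class probability at a leaf is the ratio Count(c, L)/Count(L). A minimal sketch follows; the optional `smoothing` argument (Laplace-style) is an added assumption that is not part of the definitions above, and setting it to 0 reproduces the plain ratio.

```python
from collections import Counter


def leaf_class_probabilities(leaf_labels, classes=(0, 1), smoothing=0.0):
    """Estimate P(c | L) from the labels of the training samples reaching leaf L.

    leaf_labels : class labels of the samples that reach the leaf
    smoothing   : pseudo-count added to every class (0.0 gives Count(c, L) / Count(L))
    """
    counts = Counter(leaf_labels)                           # Count(c, L) for each class c
    total = len(leaf_labels) + smoothing * len(classes)     # Count(L) plus pseudo-counts
    if total == 0:
        raise ValueError("leaf is empty and no smoothing was requested")
    return {c: (counts[c] + smoothing) / total for c in classes}
```

For a leaf reached by three interacting and one non-interacting pair, the unsmoothed estimate is 0.75 for class 1 and 0.25 for class 0.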
6.2.3. Featuring and Evaluating Papers That Utilized Probabilistic Decision Tree for PPI Detection
| Comparative Insights Across Probabilistic-Based PPI Detection Families: Naïve Bayes vs. Probabilistic Decision Trees |
| Naïve Bayes (NB) and Probabilistic Decision Trees (PDTs) are both probabilistic learners used in protein–protein interaction (PPI) detection, but differ in how they handle feature dependencies. NB assumes independence among features, enabling rapid computation and scalability for large proteomic datasets. However, this assumption often fails in biological data where features like residue position and solvent accessibility are correlated, reducing NB’s predictive power. PDTs overcome this by modeling conditional dependencies through hierarchical decision paths, leading to more accurate interaction predictions when feature correlations exist. NB’s interpretability lies in its clear feature likelihoods, offering quick insights into individual contributions, whereas PDTs produce rule-based explanations that biologists can easily interpret and validate experimentally. Thus, PDTs provide deeper interpretability, while NB favors computational simplicity and speed. Under class imbalance and noisy labeling—common in PPI datasets—PDTs handle skewed data better through resampling or boosting, whereas NB requires careful probability calibration to avoid bias. NB excels in high-throughput screening due to its linear complexity and minimal training cost, making it suitable for early-stage candidate filtering. PDTs, although more resource-intensive, deliver stronger reliability in final-stage validation where nuanced feature relationships matter. In large-scale deployment pipelines, NB acts effectively as a first-pass filter, while PDTs function as a second-tier model for probabilistic refinement. Overall, NB offers scalability and simplicity, but PDTs achieve greater robustness, interpretability, and accuracy in complex, real-world PPI detection scenarios. |
7. Instance-Based Learning Category
7.1. K-Nearest Neighbor (KNN) Technique
7.1.1. Components and Rationale of KNN for PPI Prediction
7.1.2. Mathematical Formulation of KNN
- (1) Input: A query protein pair (q1, q2) with its feature vectors, and a dataset of proteins represented as feature vectors, as shown in Equation (31): $D = \{(x_i, y_i)\}_{i=1}^{N}$, where
  - $x_i$ is the feature vector of protein i,
  - $y_i$ is the label indicating interaction (1)/non-interaction (0).
- (2) Distance Metric: Compute the distances d(xi, q1) and d(xi, q2) between the query proteins and the proteins in the dataset. Common distance metrics include the Euclidean distance (Equation (32)): $d(x, q) = \sqrt{\sum_{m}(x_m - q_m)^2}$.
- (3) Similarity Between Protein Pairs: Define a similarity measure for the pair (q1, q2). This can be the average distance dpair, as shown in Equation (33).
- (4) Finding the Neighbors: Identify the k-nearest neighbors of (q1, q2) in the dataset based on dpair.
- (5) Classification Rule: Compute the majority class among the k-nearest neighbors (Equation (35)): $\hat{y} = \arg\max_{c \in \{0,1\}} \sum_{j \in N_k} \mathbb{1}(y_j = c)$, where $N_k$ is the set of the k nearest neighbors and $\mathbb{1}(y_j = c)$ is an indicator function that equals 1 if yj = c, and 0 otherwise.
- (6) Output:
  - If $\hat{y} = 1$, the proteins (q1, q2) are predicted to interact.
  - If $\hat{y} = 0$, they are not predicted to interact.
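The six steps above can be sketched in plain Python. The sketch assumes that the pair distance for a dataset entry is the mean of its Euclidean distances to the two query proteins, which is one reading of "average distance"; the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict_pair(q1, q2, dataset, k=3):
    """Majority vote among the k dataset entries closest to the query pair.

    dataset : list of (feature_vector, label), label 1 = interact, 0 = no interaction
    """
    # Steps 2-3: average the distances to both query proteins (assumed form of d_pair).
    scored = [((euclidean(x, q1) + euclidean(x, q2)) / 2.0, y) for x, y in dataset]
    # Step 4: keep the k entries with the smallest pair distance.
    neighbors = sorted(scored, key=lambda item: item[0])[:k]
    # Steps 5-6: unweighted majority vote over the neighbor labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Choosing an odd k avoids ties in the binary case; every query scans the whole dataset, which is the inference cost discussed for instance-based methods.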
7.1.3. Featuring and Evaluating Research Papers That Have Utilized KNN for PPI Detection
7.2. Weighted K-Nearest Neighbor (WKNN) Technique
7.2.1. Components and Rationale of WKNN for PPI Prediction
7.2.2. Mathematical Formulation of WKNN
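- Distance Computation (e.g., Euclidean) is as shown in Equation (36): $d(x, x_i) = \sqrt{\sum_{m}(x_m - x_{i,m})^2}$.
- Select the k nearest neighbors based on distance.
- Weight Assignment (e.g., Inverse Distance) is as shown in Equation (37): $w_i = 1/d(x, x_i)$.
- Weighted Voting: $\hat{y} = \arg\max_{c} \sum_{i \in N_k} w_i\,\mathbb{1}(y_i = c)$, where $N_k$ is the set of the k nearest neighbors.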
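The four steps above can be sketched as follows, assuming inverse-distance weights $w_i = 1/d_i$. The small constant `eps`, which guards against division by zero for exact matches, and the returned weight share are illustrative additions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def wknn_predict(query, dataset, k=3, eps=1e-9):
    """Inverse-distance weighted vote among the k nearest neighbors.

    dataset : list of (feature_vector, label)
    Returns (predicted_label, share of the total neighbor weight behind that label).
    """
    scored = sorted(((euclidean(x, query), y) for x, y in dataset),
                    key=lambda item: item[0])[:k]
    weights = {}
    for dist, label in scored:
        # Closer neighbors contribute more to their class total.
        weights[label] = weights.get(label, 0.0) + 1.0 / (dist + eps)
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())
```

With neighbors at distances 0.5 (label 1), 1.5 (label 0), and 1.7 (label 0), an unweighted vote would pick label 0, whereas the weighted vote picks label 1 because the single close neighbor outweighs the two distant ones.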
7.2.3. Featuring and Evaluating Research Papers That Have Utilized WKNN for PPI Detection
| Comparative Insights Across Instance-Based Families: KNN vs. Weighted KNN |
| Local evidence vs. local emphasis: Plain KNN treats each of the k neighbors equally, which is attractive when PPI features (e.g., PSSM, interface propensities, structural cues) cluster cleanly, but it blurs decision boundaries when interacting/non-interacting pairs intermix; WKNN fixes this by weighting closer neighbors (e.g., 1/d, 1/d², or Gaussian), so residues/pairs that are truly proximal in feature space dominate the vote and typically lift edge-level AUC. In practice, WKNN better reflects biological intuition that “very similar” pairs are more informative than merely “somewhat similar,” yielding higher precision around dense interface manifolds, while vanilla KNN is more stable when distances are noisy or poorly scaled (where aggressive weighting can over-trust spurious near hits). Both remain highly interpretable (KNN via a simple neighbor list, WKNN via a ranked, weight-annotated neighbor list), but WKNN’s weights provide a clearer rationale for why borderline calls tip toward “interact.” However, both methods share poor raw scalability because inference scans many points; WKNN adds weighting overhead to the same O(N) lookup, so neither is ideal for proteome-scale screening without indexing or approximate search. Real-world data issues and deployment patterns: Under high dimensionality and hubness common to PPI descriptors, KNN’s uniform vote can dilute minority signals, whereas WKNN partially counters this by amplifying dense local pockets of true positives, which improves recall in imbalanced settings, but may also magnify labeling noise near decision boundaries. With careful metric learning/standardization, WKNN typically outperforms KNN on moderate-size PPI sets, yet both degrade on very large cohorts unless you add ANN indices (e.g., HNSW/IVF) and cache neighbor graphs; here, KNN retains a slight latency edge because its uniform vote avoids per-neighbor weighting arithmetic. For calibration, both models yield empirical probabilities from weighted (or unweighted) neighbor proportions; WKNN’s continuous weights often produce smoother probability curves for threshold selection in pipelines that must trade precision for wet-lab cost. In deployment, a pragmatic split is to use WKNN when local distances are trustworthy (good feature scaling, curated negatives) and you need sharper ranking of candidates for experimental validation, but to prefer KNN when feature noise is high, interpretability must be ultra-simple, or you are constrained to lighter inference paths, even though both will require ANN + pruning to be production-viable at proteome scale. |
8. Comparative Quantitative Analysis
Comprehensive Analysis and Strategic Recommendations Based on the Comparative Evaluation Above
- GNNs: GNNs excel at modeling topological and relational protein interaction features and are particularly effective for complex and large-scale PPI networks due to their message-passing architecture and ability to aggregate neighbor features.
- DNNs: DNNs are highly effective at learning non-linear feature representations from protein sequences or structure-based embeddings. Their multi-layered architecture allows deep abstraction but at a higher computational cost.
- ELM: Offers 94% accuracy and 92% F1-score with very low computational time. ELM uses a single hidden layer with random weights and analytically determined output weights, which eliminates iterative training and makes it suitable for real-time and large-scale PPI analysis.
- Naïve Bayes: It is computationally the most efficient due to its assumption of feature independence and probabilistic framework, making it ideal for rapid prototyping and data exploration.
- KNN and WKNN: Despite respectable accuracy (KNN: 90%, WKNN: 92%) and F1-scores (91% and 92%, respectively), both methods are instance-based and suffer from high computation times, especially during inference. This makes them less suitable for large-scale or time-sensitive PPI prediction.
- Probabilistic Decision Tree: Shows moderate accuracy (88%) and F1-score (86%). However, it is prone to overfitting due to hierarchical splits unless regularized or combined with ensemble techniques.
- Naïve Bayes: Although efficient, its performance suffers due to the oversimplified assumption of feature independence, which is rarely valid in protein data where features are interdependent.
- (1) For High-Accuracy Applications:
  - Use GNNs or DNNs to capture complex interdependencies in protein interaction networks.
  - SVM and LS-SVM (Accuracy: 91% and 93%, F1-score: 86% and 91%) are also reliable, especially when kernels like RBF are well-optimized.
- (2) For Large-Scale, Real-Time Analysis:
  - Deploy ELM for fast, accurate modeling. Ideal for high-throughput experiments or when time is a constraint.
- (3) For Quick Prototyping and Feature Screening:
  - Utilize Naïve Bayes for its simplicity and interpretability, with the caveat of using proper feature engineering to mitigate independence assumptions.
- (4) To Improve Generalization and Interpretability:
  - Combine Probabilistic Decision Trees with ensemble techniques (e.g., Bagging, Random Forests) to reduce overfitting while retaining interpretability.
- (5) KNN and WKNN Optimization:
  - Apply KD-Trees or PCA-based dimensionality reduction to reduce inference costs.
  - Use for small to medium-sized datasets where computational time is less critical.
9. Comparative Observational Analysis
- CNNs:
  - Strengths: Automatically extracts local motifs; effective for structured biological data; avoids manual feature engineering.
  - Weaknesses: Requires high-quality feature representations; computationally expensive.
  - Use Case: Ideal for PPI prediction based on protein sequences or 3D structures.
  - Literature Support: Gao et al. and Zhang et al. demonstrate robust CNN models with AUC > 0.90 in PPI tasks.
- GNNs:
  - Strengths: Excels in modeling topological dependencies; suitable for complex networks.
  - Weaknesses: Computationally more intensive; requires graph construction and domain knowledge.
  - Use Case: Best for protein interaction networks with known graph structures.
  - Literature Support: GNN models achieve the highest average accuracy (97%) and F1-score (96%) across large-scale datasets.
- DNNs:
  - Strengths: Strong modeling power for non-linear, high-dimensional data.
  - Weaknesses: Black-box nature; computationally intensive; overfitting risk on small datasets.
  - Use Case: Recommended when large labeled datasets with complex features are available.
  - Literature Support: DNNs provide 96% accuracy and 92% F1-score in high-throughput tasks.
- LS-SVM:
  - Strengths: Improved computational performance over standard SVM; supports Bayesian inference.
  - Weaknesses: Lower interpretability compared to decision trees; performance varies with data structure.
  - Use Case: Appropriate when a balance between accuracy and computational efficiency is needed.
  - Literature Support: Zhang et al. show LS-SVM achieves an F1-score of 0.84 with Bayesian optimization.
- WKNN:
  - Strengths: Enhanced accuracy using weighted distance voting; intuitive and effective for local similarity.
  - Weaknesses: Inference is slow; unsuitable for large-scale datasets.
  - Use Case: Best used in small to medium datasets where accuracy is prioritized over speed.
  - Literature Support: Used successfully in a miRNA-mRNA-PPI network for addiction analysis with strong clustering performance.
- Shallow models (e.g., ELM, Naïve Bayes, KNN): These models do not rely on deep layered architectures and thus have limited representational capacity but offer fast training and inference.
- Moderate-depth models (e.g., CNNs): Increasing the number of convolutional layers improves the ability to capture local sequence motifs; however, excessive depth may lead to overfitting and increased computational cost.
- Deep architectures (e.g., DNNs, GNNs): Increasing the number of layers enables learning hierarchical and high-level representations. For example, DNNs use multiple hidden layers to model complex nonlinear relationships, while GNNs use stacked layers to capture multi-hop dependencies in PPI networks. However, very deep models may suffer from:
  - Overfitting;
  - Vanishing gradients;
  - Increased computational cost.
- Key distinction across models:
  - CNN depth → improves local feature extraction.
  - GNN depth → improves topological/contextual learning.
  - DNN depth → improves global nonlinear representation learning.
10. Experimental Procedures and Evaluations
10.1. Approach for Choosing a Representative Algorithm for Each Technique
10.2. Datasets
- Database of Interacting Proteins (DIP) [87]: This dataset originates from the Saccharomyces cerevisiae core subset in DIP, which consists of 5221 proteins and 24,918 interactions derived from 18,229 experiments. DIP serves as a comprehensive biological archive, aggregating experimentally validated PPIs from diverse sources.
- Human Protein Reference Database (HPRD) [22]: Designed to advance research in human biology, HPRD provides curated, high-quality information on human PPIs and other protein-related data. The database includes 30,047 proteins and 41,327 interactions, offering a valuable resource for understanding human proteomics.
- STRING [93]: STRING delivers an extensive and critical analysis of PPIs, integrating both physical and functional associations. It is a widely used resource for studying the interplay between proteins at molecular/functional levels.
- Positive Sample Construction: Positive PPI pairs are obtained from the curated databases DIP, HPRD, and STRING. Only experimentally validated interactions (or STRING associations with a high-confidence score ≥ 0.7) are included.
- Negative Sample Construction: Negative samples are generated by random pairing of proteins not reported to interact. We apply biological plausibility constraints (e.g., avoiding pairs that share the same subcellular localization, when applicable).
- Train/Test Splitting Strategy: We ensure no overlap between training and testing protein pairs. We avoid information leakage through shared proteins when required.
- Redundancy Control: Sequence similarity filtering (e.g., CD-HIT or threshold-based filtering) is applied where reported. Redundant protein pairs are removed to prevent overestimated performance.
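The negative sampling and leakage-aware splitting steps above can be sketched as follows. This is a simplified illustration with hypothetical function names: it draws random pairs that are absent from the positive set and splits at the protein level so that no protein appears on both sides. Localization constraints and sequence-similarity filtering are not modeled here.

```python
import random


def sample_negatives(proteins, positives, n, seed=0):
    """Randomly pair proteins that are not reported to interact."""
    rng = random.Random(seed)
    known = {frozenset(p) for p in positives}
    # Cap n so the loop always terminates when few non-interacting pairs exist.
    max_pairs = len(proteins) * (len(proteins) - 1) // 2 - len(known)
    n = min(n, max_pairs)
    negatives = set()
    while len(negatives) < n:
        a, b = rng.sample(proteins, 2)
        pair = frozenset((a, b))
        if pair not in known:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]


def split_by_protein(pairs, test_fraction=0.2, seed=0):
    """Protein-level split: train and test pairs share no protein.

    Pairs that straddle the two protein sets are dropped.
    """
    rng = random.Random(seed)
    proteins = sorted({p for pair in pairs for p in pair})
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [p for p in pairs if not (set(p) & test_proteins)]
    test = [p for p in pairs if set(p) <= test_proteins]
    return train, test
```

Dropping the straddling pairs costs data but removes the shared-protein leakage path; a plain pair-level split keeps more data at the price of that leakage.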
10.3. Evaluation Setup
- (a) Preprocessing and Training Parameters: Table 14 presents the preprocessing and training parameters.
- (b) Evaluation Metrics:
  - Sensitivity: This metric measures the proportion of true interacting protein pairs correctly identified by the model. A higher value reflects better detection of actual interactions. Equation (38) shows the sensitivity formula: Sensitivity = TP/(TP + FN).
  - Specificity: This metric indicates the proportion of true non-interacting pairs correctly recognized, reflecting the model’s ability to avoid false positives (see Equation (39)): Specificity = TN/(TN + FP).
  - Precision: Precision is the ratio of correctly predicted interacting pairs to all predicted positives, indicating the accuracy of positive predictions (Equation (40)): Precision = TP/(TP + FP).
- (c) Experimental Design and Statistical Rationale:
  - Cross-Validation: Conducted 5-fold cross-validation.
  - Hyperparameter Tuning: Employed grid search to identify optimal hyperparameters.
  - Experiment Repetition: Repeated the experiment five times to ensure statistical robustness, reporting both the mean and standard deviation.
  - To enable fair comparison across methods, we align evaluation protocols across datasets by standardizing negative sampling, split strategies, and evaluation metrics wherever possible.
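The three metrics and the mean/standard-deviation reporting described above can be sketched in plain Python. The function names are hypothetical, labels are assumed to be 1 for interacting and 0 for non-interacting pairs, and undefined ratios (zero denominators) are returned as 0.0 by convention.

```python
import statistics


def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP) for binary labels, 1 = interact, 0 = no interaction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def evaluate(y_true, y_pred):
    """Sensitivity, specificity, and precision as defined in the list above."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def summarize_runs(values):
    """Mean and sample standard deviation of a metric over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)
```

For instance, `summarize_runs` can be applied to the five sensitivity values obtained from the five repetitions to produce the reported mean and standard deviation.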
| Technique | Preprocessing | Training Parameters |
|---|---|---|
| CNNs [66] | Encode protein sequences using one-hot encoding or amino acid embeddings; normalize input features | Number of convolutional layers: 2–3; Filter size: 3 × 3 or 5 × 5; Batch size: 32; Learning rate: 0.001; Dropout rate: 0.5; Epochs: 50–100 |
| GNNs [30] | Encode node features using sequence-based embeddings or graph Laplacians | Number of graph convolution layers: 2; Hidden units per layer: 64 or 128; Learning rate: 0.001; Batch size: 32–64 |
| DNNs [55] | Normalize or standardize sequence-derived features; apply dimensionality reduction (PCA) | Number of hidden layers: 3–5; Units per layer: 128–512; Learning rate: 0.001 |
| Probabilistic Decision Tree [89] | Discretize continuous variables if necessary | Tree depth: shallow; Splitting criterion: information gain |
| Naïve Bayes [85] | Transform features into probability distributions | Distribution type: Gaussian for continuous features |
| KNN [94] | Normalize | Number of neighbors (k): 5; Distance metric: Euclidean |
| SVM [80] | Scale features between 0 and 1 | Kernel type: RBF or linear; C (regularization): 1.0; Gamma (RBF kernel): 1/n_features |
| ELM [72] | Normalize input features | Number of hidden nodes: 500; Activation function: sigmoid |
10.4. Evaluating and Ranking the Various Techniques
10.5. Discussion of the Experimental Results
11. Future Perspectives and Improvements in Supervised Learning for PPI Detection
- Extreme Learning Machine (ELM):
- ➣
- Future Directions: Introduce adaptive randomization and ensemble ELM frameworks to overcome instability in weight initialization. Integrate feature selection modules to improve robustness against noisy biological data.
- ➣
- Improvements Needed: Enhance generalization ability across different species; integrate with domain adaptation for cross-species transfer learning.
- Convolutional Neural Networks (CNNs):
- ➣
- Future Directions: Extend CNNs with 3D convolutions or graph-convolutional layers to capture protein structural dynamics. Use attention mechanisms to focus on biologically meaningful sequence regions.
- ➣
- Improvements Needed: Improve interpretability and reduce overfitting in small-sample PPI datasets through self-supervised pretraining and regularization.
- Graph Neural Networks (GNNs):
  - Future Directions: Develop heterogeneous GNNs that can jointly model PPI networks, gene co-expression, and ontological similarity. Explore temporal GNNs for modeling dynamic PPI behavior.
  - Improvements Needed: Reduce computational overhead and enhance interpretability using sparsity constraints or explainable subgraph discovery.
- Deep Neural Networks (DNNs):
  - Future Directions: Fuse DNNs with multi-modal data (e.g., text-mined literature, gene ontology) to learn richer interaction contexts. Implement meta-learning to support few-shot learning in rare protein classes.
  - Improvements Needed: Integrate uncertainty quantification to measure confidence in predictions, and apply transfer learning across datasets and organisms.
- Naïve Bayes Technique:
  - Future Directions: Use Bayesian network extensions to relax independence assumptions and incorporate feature dependencies. Combine with graphical models for structured prediction.
  - Improvements Needed: Improve performance on complex datasets through feature selection and hybrid ensemble strategies.
- Probabilistic Decision Tree:
  - Future Directions: Incorporate Bayesian optimization and Monte Carlo dropout to estimate uncertainty in predictions. Develop forest-based probabilistic trees for improved stability.
  - Improvements Needed: Improve resistance to overfitting and integrate biologically informed splitting criteria.
- Support Vector Machine (SVM):
  - Future Directions: Explore kernel learning with biological priors and hybrid kernel architectures that incorporate protein domain knowledge. Use quantum-enhanced SVMs for high-dimensional PPI representations.
  - Improvements Needed: Address scalability with approximate SVM solvers and integrate automatic kernel selection strategies.
- Least Squares SVM (LS-SVM):
  - Future Directions: Combine LS-SVM with deep feature embeddings and kernel approximations for real-time interaction prediction. Use sparse LS-SVMs to reduce computation and improve interpretability.
  - Improvements Needed: Enhance robustness to noise by integrating ensemble learning and dimensionality reduction methods.
- K-Nearest Neighbor (KNN):
  - Future Directions: Improve KNN through learned similarity metrics using Siamese or triplet networks. Embed KNN into attention-based memory networks for scalable few-shot learning.
  - Improvements Needed: Address scalability issues with spatial indexing or approximate nearest neighbor search (e.g., KD-trees or locality-sensitive hashing) and reduce sensitivity to noisy features.
- Weighted K-Nearest Neighbor (WKNN):
  - Future Directions: Refine weight functions using learned kernels or information-theoretic weighting. Apply adaptive weighting strategies based on protein ontology or structural similarity.
  - Improvements Needed: Combine with clustering and dimensionality reduction to improve computational efficiency and prediction accuracy on large-scale PPI networks.
- Cross-Cutting Future Trends:
  - Integration of Domain Knowledge: Incorporating gene ontology, protein structure databases, and pathway annotations can significantly improve model performance.
  - Explainability and Trust: Future models must provide biologically meaningful rationales for predictions through post hoc explanation (e.g., SHAP, LIME) or inherently interpretable architectures.
  - Data Quality and Imbalance: Addressing label noise and class imbalance via robust loss functions, semi-supervised learning, and active learning will remain key.
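
The KNN and WKNN entries in the list above differ only in how neighbor votes are counted. The following minimal sketch, assuming inverse-distance weights (one common choice, not necessarily the weight function used in the reviewed studies) and hypothetical two-dimensional toy data, illustrates how weighting can change a prediction when the k nearest neighbors include distant points.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=5, weighted=False, eps=1e-9):
    """Predict a label from the k nearest training points.

    train: list of (feature_vector, label) tuples.
    weighted=False: each neighbor casts one vote (plain KNN).
    weighted=True: each vote is scaled by 1 / (distance + eps) (WKNN).
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = defaultdict(float)
    for vec, label in neighbors:
        dist = euclidean(vec, query)
        votes[label] += 1.0 / (dist + eps) if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical toy data: label 1 = interacting pair, 0 = non-interacting.
train = [
    ([0.10, 0.10], 1),
    ([0.15, 0.05], 1),
    ([0.90, 0.90], 0),
    ([0.80, 0.95], 0),
    ([0.95, 0.80], 0),
]
query = [0.2, 0.2]

plain_label = knn_predict(train, query, k=5, weighted=False)
weighted_label = knn_predict(train, query, k=5, weighted=True)
```

With k = 5, plain KNN follows the three distant non-interacting points, whereas the inverse-distance vote favors the two close interacting points. The same sketch also shows the scalability limit noted above: every query sorts the full training set, which is what indexing or approximate search would replace.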
12. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Murakami, Y.; Tripathi, L.P.; Prathipati, P.; Mizuguchi, K. Network analysis and in silico prediction of protein–protein interactions with applications in drug discovery. Curr. Opin. Struct. Biol. 2017, 44, 134–142.
- Rao, V.S.; Srinivas, K.; Sujini, G.N.; Kumar, G.N.S. Protein-protein interaction detection: Methods and analysis. Int. J. Proteom. 2014, 2014, 147648.
- Barabási, A.L.; Albert, R. Emergence of scaling in random networks. Science 1999, 286, 509–512.
- Barabási, A.L.; Oltvai, Z.N. Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 2004, 5, 101–113.
- Schwikowski, B.; Uetz, P.; Fields, S. A network of protein-protein interactions in yeast. Nat. Biotechnol. 2000, 18, 1257–1261.
- Yook, S.-H.; Oltvai, Z.N.; Barabási, A.-L. Functional and topological characterization of protein interaction networks. Proteomics 2004, 4, 928–942.
- Satuluri, V.; Parthasarathy, S.; Ucar, D. Markov clustering of protein interaction networks with improved balance and scalability. In Proceedings of the 1st ACM International Conference on Bioinformatics and Computational Biology, New York, NY, USA, 2–4 August 2010; pp. 247–256.
- Rual, J.F.; Venkatesan, K.; Hao, T.; Hirozane-Kishikawa, T.; Dricot, A.; Li, N.; Berriz, G.F.; Gibbons, F.D.; Dreze, M.; Ayivi-Guedehoussou, N.; et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437, 1173–1178.
- Tong, A.H.; Drees, B.; Nardelli, G.; Bader, G.D.; Brannetti, B.; Castagnoli, L.; Evangelista, M.; Ferracuti, S.; Nelson, B.; Paoluzi, S.; et al. A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295, 321–324.
- Ravasz, E.; Somera, A.L.; Mongru, D.A.; Oltvai, Z.N.; Barabási, A.L. Hierarchical organization of modularity in metabolic networks. Science 2002, 297, 1551–1555.
- Rives, A.W.; Galitski, T. Modular organization of cellular networks. Proc. Natl. Acad. Sci. USA 2003, 100, 1128–1133.
- Tanay, A.; Sharan, R.; Kupiec, M.; Shamir, R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proc. Natl. Acad. Sci. USA 2004, 101, 2981–2986.
- Giot, L.; Bader, J.S.; Brouwer, C.; Chaudhuri, A.; Kuang, B.; Li, Y.; Hao, Y.L.; Ooi, C.E.; Godwin, B.; Vitols, E.; et al. A protein interaction map of Drosophila melanogaster. Science 2003, 302, 1727–1736.
- Pellegrini, M.; Marcotte, E.M.; Thompson, M.J.; Eisenberg, D.; Yeates, T.O. Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 1999, 96, 4285–4288.
- Taha, K. Determining Semantically Related Significant Genes. IEEE/ACM Trans. Comput. Biol. Bioinform. 2014, 11, 1119–1130.
- Taha, K. Determining the semantic similarities among Gene Ontology terms. IEEE J. Biomed. Health Inform. 2013, 17, 512–525.
- Halperin, I.; Wolfson, H.; Nussinov, R. Correlated mutations: Advances and limitations. A study on fusion proteins and on the Cohesin-Dockerin families. Proteins 2006, 63, 832–845.
- Daraselia, N.; Yuryev, A.; Egorov, S.; Novichkova, S.; Nikitin, A.; Mazo, I. Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics 2004, 20, 604–611.
- Jang, H.; Lim, J.; Lim, J.H.; Park, S.J.; Lee, K.C.; Park, S.H. Finding the evidence for protein-protein interactions from PubMed abstracts. Bioinformatics 2006, 22, e220–e226.
- Bader, G.D.; Betel, D.; Hogue, C.W. BIND: The Biomolecular Interaction Network Database. Nucleic Acids Res. 2003, 31, 248–250.
- Salwinski, L.; Miller, C.; Smith, A.; Pettit, F.; Bowie, J.; Eisenberg, D. The database of interacting proteins: 2004 update. Nucleic Acids Res. 2004, 32, D449–D451.
- Del Toro, N.; Shrivastava, A.; Ragueneau, E.; Meldal, B.; Combe, C.; Barrera, E.; Perfetto, L.; How, K.; Ratan, P.; Shirodkar, G.; et al. The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014, 42, D358–D363.
- Mishra, G.R.; Suresh, M.; Kumaran, K.; Kannabiran, N.; Suresh, S.; Bala, P.; Shivakumar, K.; Anuradha, N.; Reddy, R.; Raghavan, T.M.; et al. Human protein reference database. Nucleic Acids Res. 2006, 34, D411–D414.
- Taha, K.; Homouz, D.; Al Muhairi, H.; Al Mahmoud, Z. GRank: A middleware search engine for ranking genes by relevance to given genes. BMC Bioinform. 2013, 14, 25.
- Cecchini, V.; Nguyen, T.-P.; Pfau, T.; Landtsheer, S.D.; Sauter, T. An Efficient Machine Learning Method to Solve Imbalanced Data in Metabolic Disease Prediction. In Proceedings of the 11th International Conference on Knowledge and Systems Engineering (KSE), Da Nang, Vietnam, 24–26 October 2019; pp. 1–5.
- Zhang, Z.; Zhang, Q.; Xiao, J.; Ding, S.; Li, Z. MFC-PPI: Protein–protein interaction prediction with multimodal feature fusion and contrastive learning. J. Supercomput. 2025, 81, 579.
- Arteaga, D.; Chervov, N.; Poptsova, M. Multimodal graph, surface, and language-based model for protein protein interaction prediction. Sci. Rep. 2026, 16, 4772.
- Yao, Y.; Chen, H.; Wang, J.; Wang, Y. Generative and Contrastive Self-Supervised Learning for Virulence Factor Identification Based on Protein–Protein Interaction Networks. Microorganisms 2025, 13, 1635.
- Zhang, F.; Chang, S.; Wang, B.; Zhang, X. DSSGNN-PPI: A Protein–Protein Interactions prediction model based on Double Structure and Sequence graph neural networks. Comput. Biol. Med. 2024, 177, 108669.
- Bi, X.; Ma, W.; Jiang, H.; Lu, W.; Nie, J.; Wei, Z.; Zhang, S. Protein interaction pattern recognition using heterogeneous semantics mining and hierarchical graph representation. Pattern Recognit. 2026, 172, 112563.
- Park, S.; Kim, D.; Lee, H.; Hong, C.H.; Son, S.J.; Roh, H.W.; Kim, D.; Nam, Y.; Lee, D.G.; Shin, H.; et al. Plasma protein-based identification of neuroimage-driven subtypes in mild cognitive impairment via protein-protein interaction aware explainable graph propagational network. Comput. Biol. Med. 2024, 183, 109303.
- Daou, L.; Hanna, E. Predicting protein complexes in protein interaction networks using Mapper and graph convolution networks. Comput. Struct. Biotechnol. J. 2024, 23, 3595–3609.
- Wang, C.; Yang, C.; Yang, W.; Song, L.; Shi, C. Full-Atom Protein-Protein Interaction Prediction via Atomic Equivariant Attention Network. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM ‘25); Association for Computing Machinery: New York, NY, USA; pp. 2967–2976.
- Li, Z.; Zhang, Y.; Zhou, P. Temporal Protein Complex Identification Based on Dynamic Heterogeneous Protein Information Network Representation Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2024, 21, 1154–1164.
- Chiaranaipanich, J.; Achakulvisut, T. PPIM-Struct: Leveraging Structural Embeddings for Predicting Protein-Protein Interaction Modulator Interactions. In Proceedings of the 8th International Conference on Computational Biology and Bioinformatics (ICCBB ‘24), New York, NY, USA, 28–30 November 2024; pp. 20–24.
- Chen, S.; Tang, Z.; You, L.; Chen, C.Y.-C. Accurate Protein–Protein Interaction Prediction: Based on Multiview Heterogeneous Graph Autoencoders and Random Masking. IEEE Trans. Neural Netw. Learn. Syst. 2025, 1–14.
- Yang, J.; Lu, Z.; Zhao, W.; Jiang, X. Predicting Protein-Peptide Binding Residues via Gated Fusion Mechanism and Domain-Guided Feature Optimization. In Proceedings of the 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Wuhan, China, 15–18 December 2025; pp. 208–213.
- Zhang, L.; Sun, Y. GDTGO: Advancing Protein Function Prediction via Graph Convolutional Network and Iterative Optimization. In Proceedings of the 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Wuhan, China, 15–18 December 2025; pp. 7743–7749.
- Mao, D.; Sun, Y. A Local-Global Multi-View Diffusion Variational Graph Auto-Encoder for lncRNA-Protein Interaction Prediction. IEEE J. Biomed. Health Inform. 2025, 30, 3219–3232.
- Zhang, J.; Feng, H.; Wei, H.; Zhu, Z. CFPLM: Improve Protein-RNA Interaction Prediction with a Collaborative Framework Powered by Language Models. In Proceedings of the 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Wuhan, China, 15–18 December 2025; pp. 230–235.
- Jha, K.; Saha, S.; Singh, H. Prediction of protein–protein interaction using graph neural networks. Sci. Rep. 2022, 12, 8360.
- Koca, M.B.; Nourani, E.; Abbasoğlu, F.; Karadeniz, İ.; Sevilgen, F.E. Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. Comput. Biol. Chem. 2022, 101, 107755.
- Xiao, Z.; Deng, Y. Graph embedding-based novel protein interaction prediction via higher-order graph convolutional network. PLoS ONE 2020, 15, e0238915.
- Li, G. DeepGCNs: Making GCNs Go as Deep as CNNs. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 6923–6939.
- Voytetskiy, A.; Herbert, A.; Poptsova, M. Graph Neural Networks for Z-DNA prediction in Genomes. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; pp. 3173–3178.
- Zhu, J.; Zheng, Z.; Yang, M.; Fung, G.P.C.; Huang, C. Protein Complexes Detection Based on Semi-Supervised Network Embedding Model. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 797–803.
- Dey, L.; Chakraborty, S. Supervised learning approaches for predicting Ebola-Human Protein-Protein interactions. Gene 2025, 942, 149228.
- Gainza, P.; Bunker, R.D.; Townson, S.A.; Castle, J.C. Machine learning to predict de novo protein-protein interactions. Trends Biotechnol. 2025, 43, 3056–3070.
- Samarasinghe, S.; Minh-Thai, T.N.; Sorthiya, K.; Kulasiri, D. Neurons and neural networks to model proteins and protein networks. Biosystems 2025, 258, 105613.
- Zhang, Z.; Wang, Z.; Zhao, J.; Wang, J.; Wang, C. Multimodal Contrastive Learning for Protein–Protein Interaction Inhibitor Prediction. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Lisbon, Portugal, 3–6 December 2024; pp. 1327–1333.
- Shao, J.; Chen, J.; Liu, B. ProFun-SOM: Protein Function Prediction for Specific Ontology Based on Multiple Sequence Alignment Reconstruction. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 8060–8071.
- Qiu, Y. DrugProtKGE: Weakly Supervised Knowledge Graph Embedding for Highly-Effective Drug-Protein Interaction Representation. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Turkey, 5–8 December 2023; pp. 1386–1393.
- Kumar, K.; Karim, S.M.S.; Kumar, M.; Singh, R.R. Prediction of transient and permanent protein interactions using AI methods. Bioinformation 2023, 19, 749–753.
- Li, Z.; Yu, Y. Protein secondary structure prediction using cascaded convolutional and recurrent neural networks. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, USA, 9–15 July 2016; pp. 2560–2567.
- Li, X.; Han, P.; Wang, G. SDNN-PPI: Self-attention with deep neural network effect on protein-protein interaction prediction. BMC Genom. 2022, 23, 474.
- Wang, J.; Wang, X.; Chen, W. Prediction of protein interactions based on CT-DNN. In Proceedings of the 2022 9th International Conference on Biomedical and Bioinformatics Engineering (ICBBE ‘22), Kyoto, Japan, 10–13 November 2022; pp. 81–87.
- Tran, H.-N.; Nguyen, P.-X.-Q.; Peng, X.; Wang, J. An integration of deep learning with feature fusion for protein-protein interaction prediction. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Las Vegas, NV, USA, 6–8 December 2022; pp. 77–80.
- Vyas, R.; Bapat, S.; Goel, P.; Karthikeyan, M.; Tambe, S.S.; Kulkarni, B.D. Application of Genetic Programming (GP) Formalism for Building Disease Predictive Models from Protein-Protein Interactions (PPI) Data. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 15, 27–37.
- Bakar, S.; Zomaya, A.; Taheri, J. FIS-PNN: A hybrid computational method for protein-protein interaction prediction. In Proceedings of the ACS International Conference on Computer Systems and Applications (AICCSA), Sharm El-Sheikh, Egypt, 27–30 May 2013; pp. 196–203.
- Goodacre, N.; Edwards, N.; Danielsen, M.; Uetz, P.; Wu, C. Predicting nsSNPs that Disrupt Protein-Protein Interactions Using Docking. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 1082–1093.
- Han, Y.; Zhang, S.-W.; Shi, M.-H.; Zhang, Q.-Q.; Li, Y.; Cui, X. Predicting protein-protein interaction with interpretable bilinear attention network. Comput. Methods Programs Biomed. 2025, 265, 108756.
- Tang, T.; Zhang, X. Co-training based prediction of multi-label protein–protein interactions. Comput. Biol. Med. 2024, 177, 108623.
- Bi, X.; Ma, W.; Jiang, H.; Lu, W.; Wei, Z.; Zhang, S. SSPPI: Cross-Modality Enhanced Protein–Protein Interaction Prediction From Sequence and Structure Perspectives. IEEE Trans. Neural Netw. Learn. Syst. 2026, 37, 22–36.
- Paul, S.; Karłowski, W.; Girgis, H. Generating Novel Protein Sequences with Self-Supervised Learning. In Proceedings of the 2025 Intelligent Methods, Systems, and Applications (IMSA), Giza, Egypt, 12–13 July 2025; pp. 668–673.
- Gao, H.; Chen, C.; Li, S.; Wang, C.; Zhou, W.; Yu, B. Prediction of protein-protein interactions based on ensemble residual convolutional neural network. Comput. Biol. Med. 2023, 152, 106471.
- Hu, X.; Feng, C.; Zhou, Y.; Harrison, A.; Chen, M. DeepTrio: A ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 2022, 38, 694–702.
- Xie, Z.; Deng, X.; Shu, K. Prediction of Protein-Protein Interaction Sites Using Convolutional Neural Network and Improved Data Sets. Int. J. Mol. Sci. 2020, 21, 467.
- Cai, K.; Zhu, Y. A Method for Identifying Essential Proteins Based on Deep Convolutional Neural Network Architecture with Particle Swarm Optimization. In Proceedings of the 2022 Asia Conference on Advanced Robotics, Automation, and Control Engineering (ARACE), Qingdao, China, 26–28 August 2022; pp. 7–12.
- Zhang, H. Deep Residual Convolutional Neural Network for Protein-Protein Interaction Extraction. IEEE Access 2019, 7, 89354–89365.
- Yuan, X.; Deng, H.; Hu, J. Deep Transfer Learning Based PPI Prediction for Protein Complex Detection. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), Melbourne, Australia, 17–20 October 2021; pp. 321–326.
- Dutta, P.; Saha, S. Ensembling of Gene Clusters Utilizing Deep Learning and Protein-Protein Interaction Information. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 2005–2016.
- Debby, D.W.; Ran, W.; Hong, Y. Fast prediction of protein-protein interaction sites based on Extreme Learning Machines. Neurocomputing 2014, 128, 258–266.
- You, Z.; Ming, Z.; Huang, H.; Peng, X. A novel method to predict protein-protein interactions based on the information of protein sequence. In Proceedings of the 2012 IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia, 23–25 November 2012; pp. 210–215.
- Sikandar, M. Analysis for Disease Gene Association Using Machine Learning. IEEE Access 2020, 8, 160616–160626.
- You, Z.; Zhou, M.; Luo, X.; Li, S. Highly Efficient Framework for Predicting Interactions Between Proteins. IEEE Trans. Cybern. 2017, 47, 731–743.
- Wang, R.; Ma, H.; Wang, C. An Ensemble Learning Framework for Detecting Protein Complexes From PPI Networks. Front. Genet. 2022, 13, 839949.
- Choppara, P.; Lokesh, P. Quantum Machine Learning for Prediction of Compound-Protein Interactions in Drug Discovery. In Proceedings of the 2024 12th International Conference on Intelligent Systems and Embedded Design (ISED), Rourkela, India, 20–22 December 2024.
- Chen, Y.; Liu, F.; Manderick, B. BioLMiner System: Interaction Normalization Task and Interaction Pair Task in the BioCreative II.5 Challenge. IEEE/ACM Trans. Comput. Biol. Bioinform. 2010, 7, 428–441.
- Lin, X.; Zhang, X. Prediction and Analysis of Hot Region in Protein-Protein Interactions. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Shenzhen, China, 15–18 December 2016; pp. 1598–1603.
- Chen, H. Hyperparameter Estimation in SVM with GPU Acceleration for Prediction of Protein-Protein Interactions. In Proceedings of the IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 2197–2204.
- Chatrabgoun, O.; Daneshkhah, A.; Esmaeilbeigi, M.; Safa, N.; Alenezi, A.; Rahman, N. Predicting Primary Sequence-Based Protein-Protein Interactions Using a Mercer Series Representation of Nonlinear Support Vector Machine. IEEE Access 2022, 10, 124345–124354.
- Sunggawa, M.; Bustamam, A.; Sarwinda, D.; Tampubolon, P.P.; Mangunwardoyo, W. Prediction of Protein-Protein Interactions Between HIV-1 and Human Using Support Vector Machine Combined with Multivariate Mutual Information. In Proceedings of the International Conference on Biomedical Engineering (IBIOMED), Yogyakarta, Indonesia, 6–8 November 2020; pp. 77–81.
- Qi, J.; Zhang, X.; Li, B. Protein Interaction Hot Spots Prediction Using LS-SVM within the Bayesian Interpretation. In Advanced Data Mining and Applications; Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8347, pp. 431–442.
- An, J.-Y.; You, Z.-H.; Meng, F.-R.; Xu, S.-J.; Wang, Y. RVMAB: Using the Relevance Vector Machine Model Combined with Average Blocks to Predict the Interactions of Proteins from Protein Sequences. Int. J. Mol. Sci. 2016, 17, 757.
- Kong, H.; Kim, I.; Zhang, B.-T. AutoTarget: Disease-Associated druggable target identification via node representation learning in PPI networks. Curr. Res. Biotechnol. 2024, 8, 100260.
- Xu, B.; Lin, H.; Yang, Z.; Wagholikar, K.; Liu, H. Classifying protein complexes from candidate subgraphs using fuzzy machine learning model. In Proceedings of the 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops, Philadelphia, PA, USA, 4–7 October 2012; pp. 640–647.
- Metipatil, P.; Bhuvaneshwari, P.; Basha, S.M.; Patil, S.S. An Efficient Framework for Classifying Cancer Diseases Using Ensemble Machine Learning Over Cancer Gene Expression and Sequence-Based Protein Interactions. In Proceedings of the 2023 2nd International Conference for Innovation in Technology (INOCON), Bangalore, India, 3–5 March 2023; pp. 1–8.
- Hu, J.; Li, Z.; Zhang, X.; Chen, N. Prediction of Hot Spots in Protein-Protein Interaction by Nine-Pipeline & Ensemble Learning Strategy. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 2223–2230.
- Zaki, N.; Alashwal, H. Improving the Detection of Protein Complexes by Predicting Novel Missing Interactome Links in the Protein-Protein Interaction Network. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2018, 2018, 5041–5044.
- Yang, X. CETSA Feature-Based Clustering for Protein Outlier Discovery by Protein-to-Protein Interaction Prediction. In Proceedings of the 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Glasgow, Scotland, UK, 11–15 July 2022; pp. 1659–1662.
- Yue, Y.; Li, S.; Wang, L.; Liu, H.; Tong, H.H.Y.; He, S. MpbPPI: A multi-task pre-training-based equivariant approach for the prediction of the effect of amino acid mutations on protein-protein interactions. Brief. Bioinform. 2023, 24, bbad310.
- Dey, L.; Mukhopadhyay, A. A Classification-Based Approach to Prediction of Dengue Virus and Human Protein-Protein Interactions Using Amino Acid Composition and Conjoint Triad Features. In Proceedings of the IEEE Region 10 Symposium (TENSYMP), Kolkata, India, 7–9 June 2019; pp. 373–378.
- Szklarczyk, D.; Gable, A.L.; Lyon, D.; Junge, A.; Wyder, S.; Huerta-Cepas, J.; Simonovic, M.; Doncheva, N.T.; Morris, J.H.; Bork, P.; et al. STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 2019, 47, D607–D613.
- Ambert, A.; Cohen, A. k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents. IEEE/ACM Trans. Comput. Biol. Bioinform. 2012, 9, 305–310.
- Koskinen, P.; Törönen, P.; Nokso-Koivisto, J.; Holm, L. PANNZER: High-throughput functional annotation of uncharacterized proteins in an error-prone environment. Bioinformatics 2015, 31, 1544–1552.
- Wang, T.; Chen, X.; Zeng, K. Molecular mechanism and candidate biomarkers of morphine for analgesia and addiction effects. Ann. Transl. Med. 2022, 10, 89.
- Sinaga, K.; Yang, M. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727.
- Tsimperidis, I.; Yoo, P.D.; Taha, K.; Mylonas, A.; Katos, V. R2BN: An Adaptive Model for Keystroke-Dynamics-Based Educational Level Classification. IEEE Trans. Cybern. 2020, 50, 525–535.
- Zitnik, M.; Leskovec, J. Predicting multicellular function through multi-layer tissue networks. Bioinformatics 2017, 33, i190–i198.
- Wang, S.; Sun, S.; Li, Z.; Zhang, R.; Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 2017, 13, e1005324.
- Li, Y.; Wang, S.; Umarov, R.; Xie, B.; Fan, M.; Li, L.; Gao, X. DEEPre: Sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018, 34, 760–769.
- Domingos, P.; Pazzani, M. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Mach. Learn. 1997, 29, 103–130.
- Murthy, S.K. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Min. Knowl. Discov. 1998, 2, 345–389.
- Suykens, J.A.K.; Vandewalle, J. Least Squares Support Vector Machine Classifiers. Neural Process. Lett. 1999, 9, 293–300.
- Chen, X.-W.; Jeong, J.C. Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 2009, 25, 585–591.
- Huang, G.; Zhu, Q.; Siew, C. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.

| Criterion | Scale Description (1–4) |
|---|---|
| Scalability | 4: Demonstrated scalability to large interactomes/large candidate spaces, e.g., trained/evaluated on tens of thousands of proteins and/or >10⁵ interactions, with clear strategies to manage the O(n²) pair explosion (e.g., candidate pruning, efficient negative sampling). Supports multi-species and network-scale use cases. 3: Scales to moderate PPI datasets (typically up to ~10⁴–10⁵ labeled pairs) with stable runtime; may rely on GPUs but remains feasible for repeated experiments; limited evidence for full interactome-level reconstruction. 2: Works mainly on small-to-mid curated datasets; scaling claims are limited or only shown with restricted candidate sets; runtime/memory increase noticeably with dataset growth. 1: No scalability evidence; method becomes impractical beyond small benchmarks (e.g., pair enumeration or heavy feature computation without optimization). |
| Interpretability | 4: Provides intrinsic or biologically grounded explanations (e.g., residue-/motif-level contributions, interaction-region rationale) and/or validated XAI outputs; explanations are reproducible and clearly reported. 3: Offers partial interpretability (e.g., attention maps, feature importance, saliency/IG/Grad-CAM-style analyses) with some biological discussion, but limited validation or consistency checks. 2: Minimal interpretability: only high-level model introspection (e.g., “attention weights shown”) or generic feature importances; explanations not linked to clear biological hypotheses. 1: Pure black-box; no explanation mechanism or analysis beyond reporting predictive scores. |
| Accuracy | 4: Strong performance across multiple benchmarks and robust settings, preferably including: AUPRC (recommended for imbalanced PPI data), AUROC, MCC/F1, and generalization tests (cross-species/cold-start splits). Addresses data leakage/redundancy and reports results on realistic evaluation protocols; ideally supports network-level capability. 3: High accuracy on standard benchmarks with appropriate metrics reported (AUROC plus AUPRC/MCC/F1), but limited external validation, limited cold-start tests, or weaker leakage controls. 2: Moderate accuracy; results depend strongly on dataset choice/split; limited metrics (e.g., accuracy-only or AUROC-only) in imbalanced settings. 1: Low or unstable accuracy; inconsistent across datasets; weak evaluation design (single split, insufficient negatives, unclear protocols). |
| Efficiency | 4: Fast training + fast inference suitable for high-throughput screening; reports compute cost (e.g., runtime/epochs/GPU usage) and shows efficient deployment (batch inference, lightweight architecture, or amortized embeddings). Handles large candidate sets without excessive memory/time. 3: Practical compute on modern hardware (often GPU); training is moderate; inference feasible for typical experimental pipelines, but may be heavy for interactome-wide scoring without optimization. 2: Computationally demanding (large models, expensive feature pipelines, long training); inference may be slow; limited reporting of compute or only feasible at small scale. 1: Impractical compute requirements for routine use (very long training, high memory footprint, or very slow inference); efficiency not demonstrated or clearly prohibitive. |
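
The Accuracy criterion above favors MCC and F1 (alongside AUPRC) over accuracy alone for imbalanced PPI data. The following minimal sketch shows why, using hypothetical predictions (10 interacting and 90 non-interacting pairs) that are not drawn from any reviewed study; the helper names are illustrative only.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = interacting."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp, fp, tn, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced benchmark: 10 positives followed by 90 negatives.
y_true = [1] * 10 + [0] * 90
# A classifier that finds only 2 of the 10 interactions and raises 1 false alarm.
y_pred = [1] * 2 + [0] * 8 + [1] * 1 + [0] * 89

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
f1 = f1_score(*counts)
m = mcc(*counts)
```

Here accuracy is 0.91 even though the classifier misses 8 of 10 interactions, while F1 (about 0.31) and MCC (about 0.33) expose the weakness. This is the reason the rubric treats accuracy-only reporting as a mark of weaker evaluation in imbalanced settings.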

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [26] | 4—validated on SHS148k and uses lightweight fused features. | 2—fusion/contrastive modules are not inherently explainable. | 3—reported to outperform SOTA under Random/BFS/DFS splits. | 2—contrastive learning and multimodal extraction add overhead. | SHS27k; SHS148k |
| [27] | 2—multimodal surface + structure + transformer increases preprocessing cost. | 2—attention can be inspected but explanations are limited. | 3—improves over graph-only baselines on PINDER. | 1—MaSIF surface features and cross-modal transformer are compute-heavy. | PINDER |
| [28] | 3—graph SSL scales to large PPI networks and handles imbalance without heavy resampling. | 1—learned embeddings are not directly interpretable (no explicit explanation module). | 3—outperforms baselines on naturally imbalanced VF datasets. | 2—SSL adds training cost but inference is efficient. | STRING PPI networks + VFDB labels (S. enterica LT2; |
| [29] | 2—multilevel graphs can scale, but structural graph construction adds cost. | 1—GNN-based embeddings provide limited human interpretability. | 3—shows clear gains over baselines in reported evaluations. | 2—two-stage modeling and graph attention raise compute vs. single-stream models. | SHS27k; SHS148k; Yeast |
| [30] | 3—designed for large-scale PPI spaces using hierarchical KG representations. | 2—semantics are explicit (GO categories), but predictions are not fully explainable. | 4—achieves state-of-the-art on multiple datasets. | 3—emphasizes efficiency for large-scale prediction compared with heavy 3D methods. | DIP S. cerevisiae; STRING S. cerevisiae; |
| [31] | 2—applied at cohort scale, though the model is simple. | 4—propagation parameters are directly interpretable. | 3—improved subtype classification vs. baselines. | 3—lightweight propagation + simple training. | BICWALZS; PPI network context |
| [32] | 2—tested on large STRING but GAN + Mapper steps add complexity. | 2—Mapper/TDA aids qualitative understanding, but clusters are not full explanations. | 3—improves recall and matching metrics vs. prior prediction methods. | 1—GAN training and TDA mapping can be computationally intensive. | Gavin; Krogan; MIPS; STRING; Gold standards: |
| [33] | 1—full-atom equivariant attention is expensive at scale. | 2—attention offers inspection but not full explanations. | 4—significantly outperforms SOTA in 3 tasks. | 1—atom-level message passing is compute-heavy. | H-PPI; M2H-PPI; AM-PPI; |
| [34] | 2—Benchmarks are moderate in size or require 3D structures | 2—includes attention/masking or ontology guidance that can provide some insights. | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime | DIP, MIPS, Gavin, Krogan, BioGRID |
| [35] | 2—relies on curated structural data; scaling depends on structure availability | 2—includes attention/masking or ontology guidance that can provide some insights | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime. | PDB |
| [36] | 3—uses large-scale STRING-derived benchmarks | 2—includes attention/masking or ontology guidance that can provide some insights | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime | PDB, STRING, SYS30k, SYS60k, SHS148k. |
| [37] | 2—relies on curated structural data; scaling depends on structure availability | 2—includes attention/masking or ontology guidance that can provide some insights | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime | PDB, TS125, TS639 |
| [38] | 2—relies on curated structural data; scaling depends on structure availability | 2—includes attention/masking or ontology guidance that can provide some insights | 2—Task differs from PPI detection; accuracy not directly comparable | 3—designed with efficiency considerations and/or reports favorable runtime | PDB, DeepFRI |
| [39] | 4—uses large-scale STRING-derived benchmarks | 2—includes attention/masking or ontology guidance that can provide some insights | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime | STRING |
| [40] | 2—relies on curated structural data; scaling depends on structure availability | 2—includes attention/masking or ontology guidance that can provide some insights | 3—reported to outperform strong baselines on benchmarks | 3—designed with efficiency considerations and/or reports favorable runtime | PDB |
| [42] | 3—designed for multi-disease gene prioritization using protein interaction/topology features. | 2—combines multiple biological/network features, offering moderate interpretability. | 3—achieved precision up to 93.8% and F-measure up to 92.9% | 3—incorporates DELM for fast execution compared to classical classifiers. | DisGeNET disease-gene association dataset |
| [43] | 4—LELM is tailored for large PPI datasets using low-rank approximation. | 2—Kernel ELM and LRA introduce complexity, but input features are biologically meaningful | 3—superior accuracy over state-of-the-art methods in human PPI detection | 4—employs K-ELM with low-rank approximations for efficient training & execution | PPI dataset with 73,110 protein pairs |
| [44] | 2—Deep GCN framework scales reasonably but may be limited by graph size and model depth | 1—uses residual and dense connections with dilated convolutions, making it difficult to interpret | 3—demonstrates high predictive performance through structural modeling of PPI networks | 2—Deep architecture may incur higher computational costs than traditional models | Not explicitly named; applied to protein interaction networks |
| [45] | 3—Semi-supervised GCN model handles large PPI networks | 2—GCN layers are partially interpretable, though deeper layers obscure reasoning | 3—detects PPIs and protein complexes with good precision-recall balance | 3—Efficient graph representation enables scalable semi-supervised learning | large-scale PPI networks |
| [46] | 3—utilizes ELM with local sequence descriptors, scalable to large yeast PPI datasets | 2—combines multiple descriptors and ELM, which may reduce transparency | 3—89.09% accuracy on Saccharomyces cerevisiae dataset | 4—ELM’s analytical training provides rapid model convergence compared to SVM. | Saccharomyces cerevisiae PPI dataset |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [47] | 3—multi-model pipeline trained on curated interaction data and evaluated across datasets. | 2—includes some interpretable classical models, but deep model decisions are less transparent. | 3—reports strong predictive results for target classification/prioritization. | 3—classifiers are relatively lightweight to train/infer after feature preparation. | EbolaInt dataset; comparisons/overlaps with other virus-related datasets |
| [48] | 2—discusses scalable approaches but does not present a single large-scale implemented system. | 3—review-style synthesis provides clear, human-readable insights and design guidance. | 2—summarizes literature results rather than reporting a new, directly comparable benchmark. | 2—focuses on methods; computational costs vary by approach (no unified runtime study). | protein sequences, structural data, and curated PPI databases. |
| [49] | 2—validated on a modest-sized network case study (core mammalian cell-cycle control system). | 2—protein-to-neuron mapping aids understanding, but the learned dynamics are neural-network-based | 3—demonstrates close reproduction of reference dynamics generated by established mechanistic models. | 2—training is manageable for small/medium networks; efficiency depends on architecture and data generation. | Time-series data generated from an ODE model for 12 core proteins; binary data |
| [50] | 2—leverages large-scale self-supervised pretraining (≈1.6M molecules) but uses a multi-stage deep pipeline. | 1—multimodal deep encoders/adapters and fusion are hard to explain at decision level. | 4—reports near-ceiling performance on PPII identification (very high AUC/F1) and strong potency prediction. | 2—heavy pretraining/training, but inference is practical once trained; uses GPU training. | GuacaMol; pdCSM-PPI benchmark with positives, iPPI-DB, 2P2I-DB v2. |
| [51] | 3—MSA-based representation + SOM supports scaling across many proteins and functions. | 3—SOM/prototype structure enables inspection of clusters and representative patterns. | 3—reports improved function prediction on multiple benchmarks. | 3—SOM-based learning and fixed representations can be computationally efficient once MSAs are prepared. | CAFA3; SwissProt; NetGO2 (benchmarks used in evaluation). |
| [52] | 3—evaluated on sizable real-world knowledge graphs; approach designed for re-usability in compute-restricted settings. | 2—embedding-based representations offer partial interpretability but remain largely latent. | 3—reports consistent improvements vs. strong embedding baselines (including large average F1 gains on multi-interaction triples). | 3—uses a shallow (single ENC-SCO) architecture and modest training regime for downstream probing. | ChemProt and DrugProt knowledge graphs from BioCreative |
| [53] | 2—CETSA PPI prediction using decision trees; not tested on large datasets | 3—uses decision trees, which are transparent and easy to interpret | 2—good matching of predicted PPI scores, but lacks extensive benchmarking | 2—Decision trees are efficient, but iterative clustering adds computational overhead | CETSA (HCT116, HEK293T) and Bioplex database |
| [54] | 4—applies node representation learning to over 19,000 proteins across 23,000+ diseases | 2—Embedding methods like node2vec are less transparent but biologically validated | 3—achieved 0.90 recall and 0.79 F1 score in drug target prediction | 3—Embedding and Naïve Bayes classification ensure fast prediction on large-scale graphs | STRING, TTD, DisGeNET |
| [55] | 2—tested on small datasets with up to 493 protein complexes; may not scale well | 2—Fuzzy logic enhances uncertainty handling but reduces model clarity | 3—GAFNB outperformed Naïve Bayes, demonstrating robustness to noisy data | Moderate—Genetic algorithm and fuzzy modeling increase computational demands | MIPS and TAP-MS datasets |
| [56] | 3—Framework integrates gene expression and PPI data for multiple cancers | 2—Ensemble models (SVM + NB) reduce explainability | 3—Achieved 15–20% improvement over baseline classifiers on cancer datasets | 3—Ensemble method balances training time with strong classification performance | Microarray cancer datasets and sequence PPI data |
| [57] | 3—ensemble of nine pipelines with stacking, allowing large hotspot prediction | 1—Complex model with stacking & multiple algorithms limits transparency | 3—Achieved accuracy of 0.8462 using ensemble voting and stacking | 2—Stacking ensemble increases computational time despite performance gain | SpotOn and extracted features from HotPoint |
| [58] | 4—designed for multi-task learning with flexible pre-training on large complexes | 1—Deep geometric learning models and equivariant encoding are complex | 3—outperformed existing models on four benchmark datasets for PPI prediction | 3—Multi-task pre-training accelerates downstream prediction with good generalization | Four benchmark PPI mutation datasets |
| [59] | 3—capable of predicting 411 new dengue-human PPIs using SVM | 3—Traditional models like SVM, NB, and KNN offer good model transparency | 3—SVM achieved superior accuracy in dengue-human PPI prediction. | 3—Supervised learning models ensure efficient processing with cross-validation. | DenvInt database of 692 dengue-human interactions |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [61] | 2—requires sequence embeddings plus AlphaFold2 structure graphs and GearNet/BAN processing, which is heavy. | 3—bilinear attention weight maps can be mapped to specific residues to highlight significant interaction sites. | 4—reported to outperform other state-of-the-art methods across four datasets. | 2—deep multimodal pipeline increases compute and memory compared to simpler baselines. | Yeast, Yeast_10, Multi-species, Multi-interaction type (STRING). |
| [62] | 3—uses 1D CNN + GCN and emphasizes reduced complexity versus attention-heavy models; designed for large multi-label datasets. | 2—uses explicit nominal physicochemical features (more interpretable than latent-only encodings); lacks residue attention explanations | 3—consistent Micro-F1 gains (3.81–32.40%) compared with state-of-the-art baselines under BFS/DFS/Random splits. | 3—nominal features are lower-dimensional and the model avoids heavy attention mechanisms, reducing computation/data needs. | STRING-derived multi-label PPI type datasets; SHS27K, SYS60K |
| [63] | 2—multiple attention-based modules across two modalities plus cross-protein fusion increases training/inference cost. | 3—attention mechanisms can indicate important residues, though interpretability is lower than for rule-based models | 4—performance surpasses existing state-of-the-art methods on four benchmark datasets. | 1—attention and graph modules add overhead compared to lightweight CNN-only approaches. | Yeast, Multi-species, Multi-class, SKEMPI-derived ∆∆G regression. |
| [64] | 3—trained on hundreds of thousands of UniProt sequences & large ORF corpora using AE/VAE architectures. | 1—latent representations and reconstruction error offer limited biological/interaction-level explanations. | 1—focuses on sequence generation/reconstruction rather than interaction prediction accuracy. | 2—autoencoder/conv-VAE training is generally efficient, but large-scale data impose compute costs | UniProt protein sequences, plus bacterial ORFs from GTDB genomes. |
| [65] | 2—designed for moderate-sized literature datasets with kNN-based filtering; | 3—based on intuitive metrics like information gain and k-nearest neighbors, which are easy to trace and understand | 3—outperformed all others in the BioCreative II.5 challenge for document classification tasks | 2—achieves good performance, but protein mention normalization steps can be time-consuming | BioCreative II.5 ACT: 619 training articles, 599 test articles |
| [66] | 2—handles biological literature efficiently but limited by feature complexity and external tools (e.g., NLP pipelines) | 2—combines SVMs and CRFs with custom feature engineering; interpretable but complex in design | 3—ranked among the top-performing systems in the BioCreative II.5 challenge on two tasks | 2—uses NLP tools and multiple ML models, which may introduce overhead | BioCreative II.5 INT and IPT tasks |
| [67] | 2—suitable for moderate datasets, but does not scale well to large protein structures | 2—ensemble learning approach is explainable, but its component models are not | 3—outperformed several existing techniques in hot spot prediction | 2—Ensemble SVM system incurs computational cost in feature selection | Protein hot regions and hot spots |
| [68] | 3—uses GPU acceleration to make SVMs scalable to high-dimensional & larger datasets | 2—SVM models with RBF kernels are difficult to interpret. | 3—demonstrated superior accuracy on five public PPI datasets | 4—achieves significant training time reduction via GPU acceleration | Five public PPI datasets (not named) |
| [69] | 2—suitable for standard PPI datasets but lacks scalability enhancements | 2—Residue & spatial profiles are useful but require domain knowledge for interpretation | 3—reported strong recall rates and general performance across dataset types | 3—ELM implementation achieves faster training compared to SVM | 563 non-redundant protein chains from PDB |
| [70] | 2—suitable for domain-specific applications such as HIV-1–human PPI data | 2—uses multivariate mutual information features, which are meaningful but complex | 3—achieved average accuracy between 83.5% and 84.9% across MMI types | 2—SVM performance is acceptable but not highly optimized | NCBI HIV-1–human PPI dataset |
| [71] | 3—uses Mercer series to approximate SVM kernel, reducing computational load | 2—the Mercer series low-rank approximation is complex but improves transparency over black-box models | 3—matches kernel-based SVM accuracy with reduced computation | 3—dramatically lowers SVM training time via low-rank approximation | S. cerevisiae PPI dataset |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [76] | 3—Ensemble framework scales reasonably to medium–large PPI networks, but feature extraction (65 features) and heuristic search increase computational load. | 2—Structural modularity is interpretable, but ensemble VotingRegressor reduces transparency of individual decision contributions. | 4—outperforms 12 state-of-the-art methods; integrates biological + topological features and ensemble learning for robust detection. | 2—Weighted network construction and core mining are computationally intensive. | Gavin, Krogan core, DIP, MIPS PPI networks; Standard protein complexes 1 & 2 |
| [77] | 2—Quantum models are promising but limited by quantum hardware scalability and computational resources. | 2—Quantum kernel space and quantum feature mapping reduce model transparency compared to classical SVM. | 4—achieves highest AUC (0.923 human; 0.927 C. elegans) outperforming classical SVM, KNN, and RF. | 3—Quantum parallelism improves kernel efficiency; speed depends on hardware constraints. | Human CPI dataset; C. elegans CPI dataset (1434 compounds) |
| [78] | 3—BioLMiner system processed large-scale literature using SVM-models for interaction extraction. | 2—integrates SVMs and CRFs with hybrid recognition models, reducing transparency | 3—top-performing systems in the BioCreative II.5 challenge on interaction | 2—Multi-stage architecture increases complexity but supports interaction | BioCreative II.5 challenge datasets (INT and IPT tasks) |
| [79] | 2—focuses on prediction of hot regions using SVM ensemble; dataset is limited to 16 protein complexes | 2—Ensemble SVM enhances performance but reduces interpretability of individual predictions | 3—demonstrated superior accuracy over baseline methods in identifying hot regions and residues | 3—Ensemble learning with mRMR feature selection ensures effective computation despite complexity | ASEdb dataset with 16 protein complexes |
| [80] | 4—GPU-accelerated SVM framework supports efficient training on high-dimensional datasets | 2—Kernel-based SVMs provide strong performance but reduce model transparency | 3—achieved faster and more accurate hyperparameter estimation | 4—utilized GPU parallelization for kernel matrix, reducing training time | classification & PPI datasets, including host–pathogen PPI |
| [81] | 3—Mercer series-based low-rank kernel approximation enables scalable SVM training on sequence-based PPI data. | 2—Kernel transformations introduce complexity, but Mercer series approximation aids understanding | 3—maintained high prediction performance with reduced computational requirements | 3—significantly reduced SVM training time using Hilbert–Schmidt SVD for kernel approximation | S. cerevisiae protein interaction dataset |
| [82] | 2—applied SVM with multivariate mutual information to moderate-sized HIV1-human dataset | 3—use of interpretable MMI descriptors with SVM supports transparent reasoning. | 2—up to 84.90% accuracy, but SVM models lag behind ensemble methods | 3—fast feature extraction and classification with computational resources | NCBI HIV-1 and protein interaction dataset |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [83] | 2—applied to 17 PP complexes with limited training samples; scalability to large datasets is not demonstrated. | 3—LS-SVM with Bayesian inference provides structured learning and interpretable feature selection | 3—achieved an F1-score of 0.84 in cross-validation and 0.58 in independent testing. | 3—Bayesian inference eliminates cross-validation overhead, enabling faster hyperparameter tuning. | ASEdb, BID, Cho’s dataset (17 complexes), 158 labeled residues (65 hot spots, 93 non-hot spots) |
| [84] | 4—tested on large-scale and cross-species datasets (Yeast, H. pylori, C. elegans), demonstrating excellent scalability. | 2—uses PCA for dimension reduction and RVM; decision paths are less interpretable. | 4—achieved accuracy of 92.98% (yeast) and 95.58% (H. pylori), outperforming SVM. | 3—PCA and RVM significantly reduce feature noise and computational cost compared to traditional classifiers | Yeast, H. pylori, C. elegans, M. musculus, H. sapiens, E. coli (11,188 yeast, 2916 H. pylori) |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [85] | 4—AutoTarget embedded over 19,000 proteins and associated them with over 23,000 diseases, which supports large analysis | 2—uses node embeddings and Naïve Bayes, which reduces transparency. | 3—has a recall of 0.90 and an F1 score of 0.79, validated with case studies and clustering. | 3—Node2vec+ and Naïve Bayes combination ensures scalability and efficiency | STRING, Therapeutic Target |
| [86] | 2—applied to two datasets with 493 protein complexes; scalability not demonstrated for very large networks. | 2—Fuzzy modeling handles uncertainty, but fuzzy feature matrices reduce transparency | 3—GAFNB outperformed standard Naïve Bayes, effectively handling noisy data. | 2—Genetic algorithm and fuzzy logic increase computational complexity. | MIPS and TAP-MS datasets |
| [87] | 3—classifies cancer across multiple datasets using ensemble learning on sequence and gene expression data. | 2—SVM and NB ensemble improve prediction but reduce transparency. | 3—outperformed baseline classifiers with 15–20% improvement across metrics. | 3—Principal Component Analysis & ensemble methods enable fast classification | Kaggle cancer gene expression |
| [88] | 3—utilizes a nine-pipeline ensemble with SMOTE and PCA, capable of handling large feature sets and sample imbalance. | 1—Stacking and ensemble pipelines involve multiple models, limiting transparency and interpretability. | 3—achieved high prediction accuracy using ensemble stacking of models like XGBoost | 2—Ensemble setup provides strong results but requires more computational resources and tuning. | SpotOn; features derived with the protr package |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [25] | 3—integrated large-scale PPI and miRNA-target networks using BioGRID and HPRD datasets. | 2—Gradient Boosting offers limited interpretability, but enhances understanding. | 3—achieved F1-score of 0.82 with precision (0.91) and recall (0.77). | 3—SMOTE balancing with GBC, with manageable computational cost. | BioGRID, HPRD (9455 genes) |
| [89] | 3—applied to yeast datasets with tens of thousands of interactions, showing good scalability in PPI detection | 2—boosted decision trees offer some transparency; use of topological features aids interpretability. | 4—precision of 0.994 and recall of 1.0 in predicting 6531 interactions, with 22/37 links validated | 3—used a 20-tree ensemble with efficient feature extraction; handled large datasets effectively. | Yeast PPI networks (Collins; STRING) |
| [90] | 2—focused on two cell lines and a limited number of protein pairs; scalability not demonstrated beyond this scope. | 3—Decision tree model and iterative clustering provide clear and interpretable results. | 3—MAE of 0.0698 and matching histograms show close alignment with ground truth. | 2—Multiple validation folds and clustering increase runtime despite good accuracy. | HCT116, HEK293T; and Bioplex PPI |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [91] | 3—designed to scale across multiple mutation datasets with varying sizes and mutation types. | 2—uses graph neural networks and multi-task learning, which offer limited transparency. | 3—outperforms baseline models across all tested datasets in terms of Pearson’s correlation coefficient. | 2—uses GBT decoders and geometric encoders; efficiency depends on pretraining and mutant generation method. | AB-Bind (S645, M1101), SKEMPI (S1131), SKEMPI2 (S4169) |
| [92] | 3—applies to large host–pathogen datasets using sequence-derived features. | 3—KNN, SVM, and Naïve Bayes with understandable feature spaces (AAC, conjoint triad). | 2—SVM performs best, but KNN provides useful comparisons; accuracy varies. | 3—Simple sequence-based features ensure fast training and inference. | DenvInt, HPRD (negative set), human proteins |
| [93] | 4—over 5000 organisms and 24.6 million proteins with large-scale integration. | 2—aggregates from multiple channels and algorithms; network-level interpretations are complex | 3—combines curated and predicted interactions with benchmarked confidence scores. | 3—optimized for fast access and large-network visualization; web/API and Cytoscape tools | STRING database (experimental, predicted) |
| [94] | 2—applies to document sets, but depends on cross-validation costs. | 3—clear feature representation and distance-based similarity in document classification | 3—Outperformed other BioCreative II.5 submissions in precision-recall AUC. | 2—Custom IG-weighted KNN and normalization steps increase computation time. | BioCreative II.5 corpus (FEBS publications) |

| Ref. | Scalability | Interpretability | Accuracy | Efficiency | Datasets Used |
|---|---|---|---|---|---|
| [95] | 3—designed for large-scale functional annotation across whole proteomes; scalable with batch processing and integration into annotation pipelines. | 2—The system’s layered structure and regression-based scoring offer interpretability, but statistical modeling and score combinations add complexity. | 3—outperformed competing tools in free-text and GO annotation tasks, and showed superior results in CAFA evaluations and benchmarks like NOSELF | 3—optimized sequence filtering, clustering, and sparse regression models reduce redundant processing and improve annotation speed. | UniProtKB, SwissProt datasets (for training/evaluation); NOSELF and NOCLOUD filtered versions for benchmarking. |
| [96] | 2—handles multiple GEO datasets but is limited to predefined morphine-related studies; less suitable for continuous or large-scale data streams. | 3—uses WKNN clustering and network-based visualization (mRNA–miRNA–PPI), which enhances biological interpretation and clarity of molecular interactions. | 3—identified common DEGs for addiction and analgesia; robust clustering and pathway enrichment (e.g., GO/KEGG); identified miR-129 and Fos as candidate markers. | 2—PCA, DEG screening, and enrichment analyses are computationally standard; integration of multiple networks is more resource-intensive. | GEO datasets: GSE62346, GSE50382, GSE9525, GSE7762, GSE78280, GSE15774 (covering both analgesic and addiction effects). |

| Technique | Avg. Accuracy (%) | Avg. F1-Score (%) | Computational Time | Notes |
|---|---|---|---|---|
| Graph Neural Networks | 97 | 96 | Moderate | High accuracy; suitable for large-scale datasets; moderate computational time. |
| Deep Neural Networks | 96 | 92 | High | High accuracy; computationally intensive. |
| Convolutional Neural Networks | 93 | 89 | High | Accuracy varies with dataset; computationally intensive. |
| Extreme Learning Machine | 94 | 92 | Low | Fast training; good accuracy; efficient for large datasets. |
| Support Vector Machine | 91 | 86 | Moderate | Accuracy depends on kernel and features; moderate computation time |
| Least Squares SVM | 93 | 91 | Moderate | Comparable accuracy to SVM; computational time varies with implementation. |
| K-Nearest Neighbor | 90 | 91 | High | Simple implementation; computationally intensive during inference. |
| Weighted K-Nearest Neighbor | 92 | 92 | High | Similar to KNN with weighting; computationally intensive. |
| Naïve Bayes | 81 | 75 | Very Low | Fast and simple; lower accuracy compared to other methods. |
| Probabilistic Decision Tree | 88 | 86 | Moderate | Balanced accuracy and computational time. |
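As an illustration of how the averages above can be used, the following sketch shortlists techniques under a compute budget. The accuracy, F1, and time labels are copied from the table; the ordinal ranking of the time labels (`COST_RANK`) and the helper `shortlist` are assumptions introduced here for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    technique: str
    accuracy: int  # average accuracy (%), from the table above
    f1: int        # average F1-score (%), from the table above
    time: str      # qualitative computational-time label


TABLE = [
    Row("Graph Neural Networks", 97, 96, "Moderate"),
    Row("Deep Neural Networks", 96, 92, "High"),
    Row("Convolutional Neural Networks", 93, 89, "High"),
    Row("Extreme Learning Machine", 94, 92, "Low"),
    Row("Support Vector Machine", 91, 86, "Moderate"),
    Row("Least Squares SVM", 93, 91, "Moderate"),
    Row("K-Nearest Neighbor", 90, 91, "High"),
    Row("Weighted K-Nearest Neighbor", 92, 92, "High"),
    Row("Naïve Bayes", 81, 75, "Very Low"),
    Row("Probabilistic Decision Tree", 88, 86, "Moderate"),
]

# Assumption: the qualitative labels are treated as an ordinal scale.
COST_RANK = {"Very Low": 0, "Low": 1, "Moderate": 2, "High": 3}


def shortlist(max_time, rows=TABLE):
    """Return rows within the time budget, best accuracy first (F1 breaks ties)."""
    budget = COST_RANK[max_time]
    eligible = [r for r in rows if COST_RANK[r.time] <= budget]
    return sorted(eligible, key=lambda r: (-r.accuracy, -r.f1))


for row in shortlist("Low"):
    print(row.technique, row.accuracy, row.f1)
```

Under a "Low" budget only ELM and Naïve Bayes remain; relaxing the budget to "Moderate" puts GNNs at the top of the list.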

| Technique | Dataset Suitability | Scalability | Interpretability | Efficiency | Strengths | Limitations |
|---|---|---|---|---|---|---|
| Naïve Bayes | Best for small to moderately sized datasets—works well with simple, weakly correlated features. | Highly scalable—computational complexity grows linearly with data size. | High—provides probabilistic outputs that are easy to interpret. | Extremely efficient—requires minimal computation for both training and prediction. | Fast training, interpretable model, suitable for early-stage analysis and filtering. | Assumes independence between features, which is unrealistic for biological data and lowers accuracy. |
| KNN | Suitable for small to moderate datasets—performance drops with large datasets due to memory and time constraints. | Poor scalability—requires full dataset scan at inference time, increasing with dataset size. | High—prediction is based on simple distance comparison; easy to explain. | Low—high computational cost during inference phase due to distance calculations. | Simple implementation with no training required; supports diverse distance metrics | Inefficient on large datasets; sensitive to noise and irrelevant features. |
| WKNN | Best for moderate datasets where small distance variations carry biological meaning. | Poor scalability—still requires evaluating all training samples during inference. | Moderate to High—provides interpretability through weighted influence of neighbors | Low—adds complexity over KNN by applying distance-based weighting. | Enhances KNN by emphasizing closer, relevant neighbors; improves prediction accuracy. | Suffers from the same scalability issues as KNN; performance degrades on large-scale datasets. |
| SVM | Effective for moderate, high-dimensional datasets—particularly binary classification tasks. | Limited scalability—training has quadratic time complexity; may struggle with large data | Low—complex kernel decisions are hard to interpret. | Moderate—depends on kernel and number of support vectors. | High accuracy; robust generalization in high-dimensional and non-linear spaces. | Requires hyperparameter tuning and kernel selection; performance bottlenecks in very large datasets. |
| LS-SVM | Suitable for structured, high-dimensional data—performs well when features are optimized. | Moderate—linear equation-based formulation helps with moderate-sized datasets. | Moderate—Bayesian variants allow some interpretability, though not as intuitive as trees | Moderate to High—training is efficient due to least squares formulation. | Achieves accuracy similar to SVM with reduced computational burden and fast convergence. | Trade-off between transparency and speed; may not outperform deep models for complex features |
| ELM | Best for real-time, large-scale applications—fast training makes it suitable for big datasets. | Limited scalability—fixed random weights can reduce generalization if not tuned. | Low—less transparent due to randomly assigned hidden layer weights. | Very High—avoids iterative weight updates; uses analytical solution for output layer. | Extremely fast model training and inference; ideal for time-sensitive & streaming data tasks | Less flexible than backpropagation-based models; vulnerable to noise and variance in input data |
| DNNs | Suitable for large, complex datasets—excels in discovering non-linear and hierarchical patterns. | High—highly scalable with GPU acceleration and parallelization. | Low—multi-layered black-box models hinder explainability. | Low to Moderate—requires long training times and substantial computation resources. | Learns intricate protein interaction representations; supports feature learning | Prone to overfitting on small datasets; needs large labeled data and hyperparameter tuning. |
| CNNs | Best for sequence-based and spatial data—captures motifs and local dependencies in proteins. | Moderate to High—pooling layers reduce data size; GPU-friendly design. | Low to Moderate—convolutional layers are harder to interpret than decision trees. | Low to Moderate—heavy training phase but manageable inference time with optimized hardware. | Automatically learns spatial features from protein sequences; avoids manual feature engineering. | Needs high-quality, well-structured inputs; training can be resource-intensive. |
| GNNs | Ideal for graph-structured protein data—models both local and global interaction patterns. | High—capable of scaling to large interaction networks using message-passing architectures | Moderate—some explainability via node feature tracing, though still complex | Moderate—graph construction and training can be demanding but manageable | Captures structural and functional relationships; well-suited for complex networks. | Requires prior graph knowledge or construction; complex implementation and tuning. |
| Probabilistic Decision Tree | Effective for moderately sized datasets with uncertainty in labels—handles noise well. | Moderate—scalability depends on tree depth and dataset structure. | High—decision paths are transparent and easily explainable. | Moderate—model training and prediction are reasonably fast. | Probabilistic outputs offer soft classifications; interpretable and intuitive. | Can overfit; not ideal for complex dependencies. |

| Dataset Size | Recommended Methods |
|---|---|
| Small | Naïve Bayes (simplicity), SVM (accuracy), WKNN (accuracy with small sets) |
| Moderate | ELM (fast and accurate), CNN (spatial patterns), LS-SVM (robust with moderate cost) |
| Large | GNN (scalability with relational data), DNN (deep abstraction), ELM (real-time use) |

Use Case Recommendations

| Use Case | Best Techniques |
|---|---|
| High Interpretability | Probabilistic Decision Tree, Naïve Bayes |
| Accuracy-Critical Tasks | GNN, DNN, LS-SVM, SVM |
| Real-Time Applications | ELM, Naïve Bayes |
| Topological Dependencies | GNN |
| Protein Sequence Modeling | CNN, DNN |
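
The two recommendation tables above can be combined into a simple lookup. The sketch below is only an illustration: the dictionary keys, the `shortlist` helper, and the rule of intersecting the two lists (falling back to the use-case list when they share nothing) are assumptions introduced here, not part of the survey's methodology.

```python
# Recommendations transcribed from the two tables above.
# "Naive Bayes" is spelled in ASCII here purely for convenience.
BY_DATASET_SIZE = {
    "small": ["Naive Bayes", "SVM", "WKNN"],
    "moderate": ["ELM", "CNN", "LS-SVM"],
    "large": ["GNN", "DNN", "ELM"],
}

BY_USE_CASE = {
    "high interpretability": ["Probabilistic Decision Tree", "Naive Bayes"],
    "accuracy-critical": ["GNN", "DNN", "LS-SVM", "SVM"],
    "real-time": ["ELM", "Naive Bayes"],
    "topological dependencies": ["GNN"],
    "protein sequence modeling": ["CNN", "DNN"],
}


def shortlist(dataset_size, use_case):
    """Return methods recommended for both constraints.

    Assumed policy (hypothetical): intersect the two lists, and fall back
    to the use-case list when the intersection is empty.
    """
    size_methods = BY_DATASET_SIZE[dataset_size]
    case_methods = BY_USE_CASE[use_case]
    both = [m for m in case_methods if m in size_methods]
    return both if both else case_methods


print(shortlist("large", "real-time"))
print(shortlist("small", "high interpretability"))
```

When the two tables disagree (for example, a small dataset with topological dependencies), the fallback simply returns the use-case recommendation; a real selection procedure would weigh the trade-offs discussed in the observational comparison.
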

| Technique | Selected Papers | Dataset | Sensitivity | Technique Rank |
|---|---|---|---|---|
| Graph Neural Network (GNN) | [30] | DIP | 90.23 | 1 |
| | | HPRD | 89.47 | |
| | | STRG | 88.65 | |
| Convolutional Neural Network (CNN) | [66] | DIP | 88.71 | 3 |
| | | HPRD | 87.89 | |
| | | STRG | 86.95 | |
| Extreme Learning Machine (ELM) | [72] | DIP | 86.16 | 7 |
| | | HPRD | 85.56 | |
| | | STRG | 84.93 | |
| Support Vector Machine (SVM) | [80] | DIP | 85.78 | 9 |
| | | HPRD | 85.13 | |
| | | STRG | 84.14 | |
| Least Squares SVM (LS-SVM) | [83] | DIP | 87.61 | 5 |
| | | HPRD | 86.94 | |
| | | STRG | 86.01 | |
| Probabilistic Decision Tree | [89] | DIP | 88.10 | 4 |
| | | HPRD | 87.57 | |
| | | STRG | 86.32 | |
| K-Nearest Neighbor (KNN) | [94] | DIP | 85.17 | 10 |
| | | HPRD | 84.33 | |
| | | STRG | 83.52 | |
| Weighted KNN (WKNN) | [95] | DIP | 86.43 | 8 |
| | | HPRD | 85.51 | |
| | | STRG | 84.65 | |
| Deep Neural Network (DNN) | [55] | DIP | 89.34 | 2 |
| | | HPRD | 88.71 | |
| | | STRG | 87.82 | |
| Naïve Bayes-Based | [85] | DIP | 87.52 | 6 |
| | | HPRD | 86.43 | |
| | | STRG | 85.44 | |
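
The table above does not state how the Technique Rank column is derived. One reading that reproduces the listed order is to rank techniques by their mean score across the three datasets; this is an assumption, and the names `SCORES` and `rank_by_mean` are introduced here only for illustration.

```python
# Scores transcribed from the table above, in (DIP, HPRD, STRG) order.
SCORES = {
    "GNN": (90.23, 89.47, 88.65),
    "CNN": (88.71, 87.89, 86.95),
    "ELM": (86.16, 85.56, 84.93),
    "SVM": (85.78, 85.13, 84.14),
    "LS-SVM": (87.61, 86.94, 86.01),
    "Probabilistic Decision Tree": (88.10, 87.57, 86.32),
    "KNN": (85.17, 84.33, 83.52),
    "WKNN": (86.43, 85.51, 84.65),
    "DNN": (89.34, 88.71, 87.82),
    "Naive Bayes": (87.52, 86.43, 85.44),
}


def rank_by_mean(scores):
    """Rank techniques by mean score across datasets (1 = best).

    Assumed ranking rule; the source does not specify one.
    """
    means = {name: sum(vals) / len(vals) for name, vals in scores.items()}
    ordered = sorted(means, key=means.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranks = rank_by_mean(SCORES)
for name, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
    print(rank, name)
```

Under this rule ELM places ahead of WKNN even though WKNN scores higher on DIP, because ELM's HPRD and STRG scores lift its mean slightly above WKNN's.
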

| Technique | Selected Papers | Dataset | Specificity | Technique Rank |
|---|---|---|---|---|
| Graph Neural Network (GNN) | [30] | DIP | 89.12 | 1 |
| | | HPRD | 88.33 | |
| | | STRG | 86.77 | |
| Convolutional Neural Network (CNN) | [66] | DIP | 87.44 | 3 |
| | | HPRD | 86.22 | |
| | | STRG | 84.91 | |
| Extreme Learning Machine (ELM) | [72] | DIP | 82.84 | 10 |
| | | HPRD | 82.26 | |
| | | STRG | 80.89 | |
| Support Vector Machine (SVM) | [80] | DIP | 84.32 | 8 |
| | | HPRD | 83.08 | |
| | | STRG | 81.84 | |
| Least Squares SVM (LS-SVM) | [83] | DIP | 85.23 | 6 |
| | | HPRD | 84.14 | |
| | | STRG | 82.48 | |
| Probabilistic Decision Tree | [89] | DIP | 85.78 | 5 |
| | | HPRD | 85.16 | |
| | | STRG | 82.60 | |
| K-Nearest Neighbor (KNN) | [94] | DIP | 83.95 | 9 |
| | | HPRD | 82.43 | |
| | | STRG | 81.38 | |
| Weighted KNN (WKNN) | [95] | DIP | 84.74 | 7 |
| | | HPRD | 83.11 | |
| | | STRG | 82.06 | |
| Deep Neural Network (DNN) | [55] | DIP | 88.01 | 2 |
| | | HPRD | 87.10 | |
| | | STRG | 85.45 | |
| Naïve Bayes-Based | [85] | DIP | 87.20 | 4 |
| | | HPRD | 86.61 | |
| | | STRG | 83.04 | |

| Technique | Selected Papers | Dataset | Precision | Technique Rank |
|---|---|---|---|---|
| Graph Neural Network (GNN) | [30] | DIP | 86.81 | 1 |
| | | HPRD | 85.74 | |
| | | STRG | 84.33 | |
| Convolutional Neural Network (CNN) | [66] | DIP | 85.43 | 3 |
| | | HPRD | 84.16 | |
| | | STRG | 82.92 | |
| Extreme Learning Machine (ELM) | [72] | DIP | 82.72 | 7 |
| | | HPRD | 82.07 | |
| | | STRG | 81.37 | |
| Support Vector Machine (SVM) | [80] | DIP | 82.85 | 9 |
| | | HPRD | 82.23 | |
| | | STRG | 80.46 | |
| Least Squares SVM (LS-SVM) | [83] | DIP | 83.64 | 6 |
| | | HPRD | 82.98 | |
| | | STRG | 81.29 | |
| Probabilistic Decision Tree | [89] | DIP | 83.78 | 5 |
| | | HPRD | 82.55 | |
| | | STRG | 80.96 | |
| K-Nearest Neighbor (KNN) | [94] | DIP | 82.57 | 10 |
| | | HPRD | 81.35 | |
| | | STRG | 79.73 | |
| Weighted KNN (WKNN) | [95] | DIP | 83.01 | 8 |
| | | HPRD | 81.97 | |
| | | STRG | 80.82 | |
| Deep Neural Network (DNN) | [55] | DIP | 85.96 | 2 |
| | | HPRD | 84.89 | |
| | | STRG | 83.48 | |
| Naïve Bayes-Based | [85] | DIP | 84.24 | 4 |
| | | HPRD | 83.12 | |
| | | STRG | 81.72 | |
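
Sensitivity, specificity, and precision, the three metrics compared in this section, are all ratios of confusion-matrix counts. The sketch below computes them with the standard definitions; the label vectors are hypothetical toy data, not drawn from DIP, HPRD, or STRG, and the function names are introduced here for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = interacting, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of true interactions that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-interactions correctly rejected: TN / (TN + FP)."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Fraction of predicted interactions that are real: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical labels for ten protein pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), precision(tp, fp))
```

A method with many false positives is penalized in both specificity and precision, while false negatives lower only sensitivity, which is why the three rankings need not agree.
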

| Technique | Summary of Findings | Literature Correlation |
|---|---|---|
| GNN [30] | Top performer across all three metrics: GNN achieved the highest scores in sensitivity, specificity, and precision. This aligns with the literature, which reports GNNs’ superior ability to capture complex topological dependencies in PPI networks by leveraging graph structure and message passing. Their capacity to model both local and global protein interactions reduces false negatives and false positives, contributing to balanced and strong results across all metrics. | Zitnik and Leskovec [99] demonstrated that GNNs can successfully integrate heterogeneous topological information to infer functional protein associations. The ability to propagate context-aware signals across network layers aligns with GNN’s superior performance in modeling PPI relationships. |
| DNN [55] | Second-best overall: DNN consistently ranked second in sensitivity, specificity, and precision. DNNs excel at learning non-linear relationships and rich hierarchical representations from large-scale sequence data. The high precision indicates confident predictions with fewer false positives, and the strong sensitivity suggests robust detection of true interactions, making DNNs ideal for complex bioinformatics datasets. | Wang et al. [100] showed that deeper layers significantly improve the model’s capacity to detect complex structural and sequential dependencies. Their findings underscore DNNs’ strength in capturing rich and abstract features necessary for accurate interaction modeling in large datasets. |
| CNN [66] | High performance with emphasis on sensitivity and precision: CNN scored well in sensitivity and precision, ranking third, and achieved strong specificity. These results are consistent with CNN’s capacity to learn local sequence motifs and spatial patterns important for interaction prediction. CNNs are often preferred when the feature space is based on protein sequences or structural motifs. | Li et al. [101] demonstrated the use of CNNs for learning sequence motifs in enzyme classification. The convolutional layers effectively extracted spatial features that represent biological patterns. This corroborates CNN’s applicability to PPI tasks, where local sequence or structural features play a critical role in determining interaction likelihood. |
| Naïve Bayes [85] | Surprisingly strong in specificity and precision: NB ranked fourth in both specificity and precision, but sixth in sensitivity. Its assumption of feature independence simplifies modeling and often performs well in high-dimensional but sparse biological datasets. While not as expressive as DNNs or GNNs, it is computationally efficient and still competitive when data preprocessing and feature engineering are well-optimized. | Domingos and Pazzani [102] emphasized Naïve Bayes’ efficiency but also its limitations for tasks requiring sophisticated relationships between features. |
| Prob. Decision Tree [89] | Well-balanced performance: The technique delivered solid sensitivity, specificity, and precision. Its strength lies in its interpretability and flexibility, particularly when modeling uncertain biological data. The literature indicates that probabilistic trees manage data noise and uncertainty better than standard decision trees, explaining their respectable performance. | Murthy [103] discusses its effectiveness for simpler problems but acknowledges limitations for high-dimensional data. |
| LS-SVM [83] | Moderate across the board: LS-SVM showed better sensitivity and precision compared to standard SVM, with decent specificity. The least squares formulation simplifies the optimization problem, reducing computational cost while preserving performance. It is more stable in noisy datasets, as often encountered in PPI prediction. | Suykens and Vandewalle [104] emphasized LS-SVM’s applicability to classification problems with noisy or high-dimensional data, a context fitting PPI datasets where standard SVM may underperform. |
| WKNN [95] | Incremental improvements over KNN: WKNN outperformed KNN in sensitivity (86.43 vs. 85.17), specificity (84.74 vs. 83.95), and precision (83.01 vs. 82.57), reflecting its advantage in using distance-weighted voting. This aligns with the literature that shows WKNN’s ability to reduce the influence of noisy or distant neighbors, leading to better precision and general performance in high-dimensional biological datasets. | Chen et al. [105] incorporated a distance-weighted voting scheme into KNN. Their integrative approach reduced the impact of misleading neighbors by assigning higher influence to closer instances, which significantly improved prediction precision. This finding supports WKNN’s effectiveness in handling complexity in PPI datasets. |
| ELM [72] | Moderate sensitivity and precision, but lowest specificity: ELM had decent sensitivity (86.16) and precision (82.72), but its specificity (82.84) was the lowest of all ten methods. Although ELM is known for fast training due to random hidden weights and analytical output layer computation, this architecture may generalize less effectively on complex data, which explains the relatively higher false positive rate. | Huang et al. [106] showed that while ELM is fast, it lacks the adaptability of more advanced methods for intricate data like PPI detection. |
| SVM [80] | Lower-tier performance in this comparison: SVM had lower sensitivity, specificity, and precision compared to LS-SVM and the deep neural methods (GNN, DNN, and CNN). Though historically strong in bioinformatics, SVM’s limitations on large, complex, and noisy datasets reduce its competitiveness when compared to more flexible deep learning models. Kernel choice and parameter tuning significantly impact its performance. | Cortes and Vapnik [107] demonstrated SVM’s robustness in classification tasks, but modern ensemble methods often surpass it in large-scale datasets. |
| KNN [94] | Lowest performing method overall: KNN ranked lowest in sensitivity and precision and second-lowest in specificity, ahead of only ELM. The literature attributes this to its poor scalability and high sensitivity to irrelevant features and noise, which is especially problematic in PPI datasets with high dimensionality and heterogeneity. Despite its simplicity and interpretability, its performance does not scale well without enhancements such as feature selection or dimensionality reduction. | Cover and Hart [108] identified KNN’s simplicity but noted its shortcomings in noisy and high-dimensional environments. |
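
The WKNN row above contrasts distance-weighted voting with plain majority voting. The sketch below illustrates that difference on toy data. Inverse-distance weighting is one common choice and is an assumption here; the exact scheme used in [95] or [105] is not specified in this section, and the feature vectors are hypothetical.

```python
import math
from collections import Counter, defaultdict


def k_nearest(train, query, k):
    """Return the k training items closest to the query (Euclidean distance)."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_predict(train, query, k):
    """Plain KNN: every one of the k neighbors casts one equal vote."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def wknn_predict(train, query, k, eps=1e-9):
    """Weighted KNN: each neighbor votes with weight 1 / (distance + eps).

    Inverse-distance weighting is an assumed scheme for illustration.
    """
    weights = defaultdict(float)
    for features, label in k_nearest(train, query, k):
        weights[label] += 1.0 / (math.dist(features, query) + eps)
    return max(weights, key=weights.get)


# Hypothetical 2-D feature vectors; label 1 = interacting, 0 = non-interacting.
train = [
    ((0.0, 0.0), 1),  # one very close interacting pair
    ((3.0, 0.0), 0),  # two distant non-interacting pairs
    ((0.0, 3.0), 0),
]
query = (0.5, 0.5)

print(knn_predict(train, query, k=3))
print(wknn_predict(train, query, k=3))
```

With `k=3`, the two distant neighbors outvote the close one under plain KNN, whereas the weighted vote follows the nearest neighbor. This is the mechanism by which distance weighting reduces the influence of distant or misleading neighbors.
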
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Taha, K. A Review and Experimental Analysis of Supervised Learning Systems and Methods for Protein–Protein Interaction Detection. Int. J. Mol. Sci. 2026, 27, 4094. https://doi.org/10.3390/ijms27094094

