LPBERT: A Protein–Protein Interaction Prediction Method Based on a Pre-Trained Language Model
Abstract
1. Introduction
- First, we introduced ProteinBERT to the field of PPI prediction to obtain rich embedding representations of protein sequences.
- Second, with ProteinBERT at its core, we designed the Local Convolutional Recurrent Neural Network (LCR) module and the Global Convolutional Transformer encoder (GCT) module to process the extracted embeddings, and combined them into the LPBERT framework, which reached 98.93% accuracy on the H. sapiens dataset and 97.94% on the S. cerevisiae dataset.
- Third, we evaluated LPBERT on multiple PPI benchmark datasets and ran ablation experiments to verify the contribution of the LCR and GCT modules to overall model performance. Together, these experiments characterize the strengths of LPBERT and its potential application value for PPI prediction.
2. Materials and Methods
2.1. Datasets
2.2. Model Architecture
2.3. Sequence Encoding
2.4. Deep Feature Extraction
- CNN: Extracts deep features from the embedding representation and reduces the data dimension.
- BN (batch normalization): Normalizes the output of the previous layer to improve the generalizability of the model [37].
- Max pooling: Retains the most salient features and further reduces the data dimension.
- BiLSTM: A bidirectional variant of the Recurrent Neural Network (RNN) that captures contextual dependencies in both directions and strengthens the model's ability to represent sequential data [38].
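The four stages listed above can be sketched as a single data flow. The following is a minimal NumPy illustration, not the authors' implementation: the layer sizes, the random weights, the ReLU after the convolution, and the omission of learnable BN scale and shift parameters are all assumptions made only to show how the tensor shapes change from stage to stage.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over the length axis, followed by ReLU (assumed).
    x: (batch, length, d_in), w: (kernel, d_in, d_out), b: (d_out,)"""
    k = w.shape[0]
    # Gather sliding windows: (batch, length - k + 1, kernel, d_in)
    windows = np.stack([x[:, i:i + k, :] for i in range(x.shape[1] - k + 1)], axis=1)
    return np.maximum(np.einsum("blkd,kdo->blo", windows, w) + b, 0.0)

def batch_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over batch and length."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool1d(x, pool):
    """Non-overlapping max pooling along the length axis."""
    b, l, c = x.shape
    l_out = l // pool
    return x[:, :l_out * pool, :].reshape(b, l_out, pool, c).max(axis=2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(x, w, u, b):
    """Single-direction LSTM that returns the hidden state at every step.
    x: (batch, steps, d_in), w: (d_in, 4h), u: (h, 4h), b: (4h,)"""
    h = np.zeros((x.shape[0], u.shape[0]))
    c = np.zeros_like(h)
    outputs = []
    for t in range(x.shape[1]):
        gates = x[:, t, :] @ w + h @ u + b
        i, f, g, o = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1)

def bilstm(x, fwd, bwd):
    """Run one LSTM left-to-right and one right-to-left, then concatenate."""
    forward = lstm(x, *fwd)
    backward = lstm(x[:, ::-1, :], *bwd)[:, ::-1, :]
    return np.concatenate([forward, backward], axis=-1)

def init(*shape):
    """Small random weights; a stand-in for trained parameters."""
    return rng.normal(scale=0.1, size=shape)

# Hypothetical sizes, chosen only for illustration.
batch, length, d_emb, kernel, channels, pool, hidden = 2, 20, 8, 3, 16, 2, 4

x = rng.normal(size=(batch, length, d_emb))  # stand-in for sequence embeddings
feat = conv1d_relu(x, init(kernel, d_emb, channels), np.zeros(channels))  # CNN
feat = batch_norm(feat)                                                   # BN
feat = max_pool1d(feat, pool)                                             # Max pooling
fwd = (init(channels, 4 * hidden), init(hidden, 4 * hidden), np.zeros(4 * hidden))
bwd = (init(channels, 4 * hidden), init(hidden, 4 * hidden), np.zeros(4 * hidden))
out = bilstm(feat, fwd, bwd)                                              # BiLSTM
print(out.shape)  # (2, 9, 8)
```

In this sketch the convolution shortens the length axis from 20 to 18, pooling halves it to 9, and the BiLSTM doubles the hidden width to 8 because it concatenates the forward and backward states.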
2.5. Classifier
2.6. Hyperparameter Optimization
2.7. Evaluation Metrics
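The result tables in this article report accuracy, precision, recall, specificity, F1-score, and MCC. As a minimal sketch, these metrics can be computed from confusion-matrix counts using their standard definitions. The function name and the example counts below are hypothetical and are not taken from the article; the tables report percentages, so the returned fractions would be multiplied by 100.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives. Undefined ratios (zero denominator) return 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }

# Hypothetical counts for a balanced test set of 100 positives and 100 negatives.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}")
```

On a balanced dataset, accuracy equals the mean of recall and specificity, which is a quick consistency check when reading the tables.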
3. Results
3.1. Implementation Details
3.2. Analysis of Sequence Length Parameters
3.3. Comparative Experiment
3.4. Ablation Experiment
3.5. Sequence Similarity Analysis
3.6. Cross-Species Generalization Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, Y.; Zhang, Y.; Dong, Y.; Akakuru, O.U.; Yao, X.; Yi, J.; Li, X.; Wang, L.; Lou, X.; Zhu, B.; et al. Ablation of gap junction protein improves the efficiency of nanozyme-mediated catalytic/starvation/mild-temperature photothermal therapy. Adv. Mater. 2023, 35, 2210464.
- Wang, H.; Zhao, S.; Xia, X.; Liu, J.; Sun, F.; Kong, B. Interaction of the extracellular protease from Staphylococcus xylosus with meat proteins elucidated via spectroscopic and molecular docking. Food Chem. X 2024, 21, 101204.
- Essandoh, K.; Teuber, J.P.; Brody, M.J. Regulation of cardiomyocyte intracellular trafficking and signal transduction by protein palmitoylation. Biochem. Soc. Trans. 2024, 52, 41–53.
- Sun, X.; Xie, Y.; Xu, K.; Li, J. Regulatory networks of the F-box protein FBX206 and OVATE family proteins modulate brassinosteroid biosynthesis to regulate grain size and yield in rice. J. Exp. Bot. 2024, 75, 789–801.
- Huang, J.; Ecker, G.F. A structure-based view on ABC-transporter linked to multidrug resistance. Molecules 2023, 28, 495.
- Hoogstraten, C.A.; Schirris, T.J.; Russel, F.G. Unlocking mitochondrial drug targets: The importance of mitochondrial transport proteins. Acta Physiol. 2024, 240, e14150.
- Sato, T.; Hanada, M.; Bodrug, S.; Irie, S.; Iwama, N.; Boise, L.H.; Thompson, C.B.; Golemis, E.; Fong, L.; Wang, H.G. Interactions among members of the Bcl-2 protein family analyzed with a yeast two-hybrid system. Proc. Natl. Acad. Sci. USA 1994, 91, 9238–9242.
- Çağlayan, E.; Turan, K. An in silico prediction of interaction models of influenza A virus PA and human C14orf166 protein from yeast-two-hybrid screening data. Proteins Struct. Funct. Bioinform. 2023, 91, 1235–1244.
- Free, R.B.; Hazelwood, L.A.; Sibley, D.R. Identifying novel protein-protein interactions using co-immunoprecipitation and mass spectroscopy. Curr. Protoc. Neurosci. 2009, 46, 5–28.
- Floyd, B.M.; Marcotte, E.M. Protein sequencing, one molecule at a time. Annu. Rev. Biophys. 2022, 51, 181–200.
- Chen, M.; Ju, C.J.T.; Zhou, G.; Chen, X.; Zhang, T.; Chang, K.W.; Zaniolo, C.; Wang, W. Multifaceted protein–protein interaction prediction based on Siamese residual RCNN. Bioinformatics 2019, 35, i305–i314.
- Xu, W.; Gao, Y.; Wang, Y.; Guan, J. Protein–protein interaction prediction based on ordinal regression and recurrent convolutional neural networks. BMC Bioinform. 2021, 22, 485.
- Hu, X.; Feng, C.; Zhou, Y.; Harrison, A.; Chen, M. DeepTrio: A ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks. Bioinformatics 2022, 38, 694–702.
- Szymborski, J.; Emad, A. RAPPPID: Towards generalizable protein interaction prediction with AWD-LSTM twin networks. Bioinformatics 2022, 38, 3958–3967.
- Kudo, T. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv 2018, arXiv:1808.06226.
- Merity, S.; Keskar, N.S.; Socher, R. Regularizing and optimizing LSTM language models. arXiv 2017, arXiv:1708.02182.
- Chen, W.; Wang, S.; Song, T.; Li, X.; Han, P.; Gao, C. DCSE: Double-Channel-Siamese-Ensemble model for protein protein interaction prediction. BMC Genom. 2022, 23, 555.
- Asim, M.N.; Ibrahim, M.A.; Malik, M.I.; Dengel, A.; Ahmed, S. ADH-PPI: An attention-based deep hybrid model for protein-protein interaction prediction. iScience 2022, 25, 105169.
- Wang, M.; Lai, J.; Jia, J.; Xu, F.; Zhou, H.; Yu, B. ECA-PHV: Predicting human-virus protein-protein interactions through an interpretable model of effective channel attention mechanism. Chemom. Intell. Lab. Syst. 2024, 247, 105103.
- Song, B.; Luo, X.; Luo, X.; Liu, Y.; Niu, Z.; Zeng, X. Learning spatial structures of proteins improves protein–protein interaction prediction. Briefings Bioinform. 2022, 23, bbab558.
- Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Hu, W.; Ohue, M. SpatialPPI: Three-dimensional space protein-protein interaction prediction with AlphaFold Multimer. Comput. Struct. Biotechnol. J. 2024, 23, 1214–1225.
- Evans, R.; O’Neill, M.; Pritzel, A.; Antropova, N.; Senior, A.; Green, T.; Žídek, A.; Bates, R.; Blackwell, S.; Yim, J.; et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.
- Rao, R.; Bhattacharya, N.; Thomas, N.; Duan, Y.; Chen, P.; Canny, J.; Abbeel, P.; Song, Y. Evaluating protein transfer learning with TAPE. Adv. Neural Inf. Process. Syst. 2019, 32, 9689–9701.
- Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7112–7127.
- Rives, A.; Meier, J.; Sercu, T.; Goyal, S.; Lin, Z.; Liu, J.; Guo, D.; Ott, M.; Zitnick, C.L.; Ma, J.; et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021, 118, e2016239118.
- Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110.
- Dang, T.H.; Vu, T.A. xCAPT5: Protein–protein interaction prediction using deep and wide multi-kernel pooling convolutional neural networks with protein language model. BMC Bioinform. 2024, 25, 106.
- Elnaggar, A.; Ding, W.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Severini, S.; Matthes, F.; Rost, B. CodeTrans: Towards cracking the language of silicon’s code through self-supervised deep learning and high performance computing. arXiv 2021, arXiv:2104.02443.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Liu, T.; Gao, H.; Ren, X.; Xu, G.; Liu, B.; Wu, N.; Luo, H.; Wang, Y.; Tu, T.; Yao, B.; et al. Protein–protein interaction and site prediction using transfer learning. Briefings Bioinform. 2023, 24, bbad376.
- Vaswani, A. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Boutet, E.; Lieberherr, D.; Tognolli, M.; Schneider, M.; Bansal, P.; Bridge, A.J.; Poux, S.; Bougueleret, L.; Xenarios, I. UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: How to use the entry view. In Plant Bioinformatics: Methods and Protocols; Humana Press: New York, NY, USA, 2016; pp. 23–54.
- Meier, J.; Rao, R.; Verkuil, R.; Liu, J.; Sercu, T.; Rives, A. Language models enable zero-shot prediction of the effects of mutations on protein function. Adv. Neural Inf. Process. Syst. 2021, 34, 29287–29303.
- Mikolov, T. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
- Kim, H.J.; Hong, S.E.; Cha, K.J. seq2vec: Analyzing sequential data using multi-rank embedding vectors. Electron. Commer. Res. Appl. 2020, 43, 101003.
- Bjorck, N.; Gomes, C.P.; Selman, B.; Weinberger, K.Q. Understanding batch normalization. Adv. Neural Inf. Process. Syst. 2018, 31.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3285–3292.
- Oughtred, R.; Rust, J.; Chang, C.; Breitkreutz, B.J.; Stark, C.; Willems, A.; Boucher, L.; Leung, G.; Kolas, N.; Zhang, F.; et al. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021, 30, 187–200.
- Yao, Y.; Du, X.; Diao, Y.; Zhu, H. An integration of deep learning with feature embedding for protein–protein interaction prediction. PeerJ 2019, 7, e7126.
- Xie, S.; Xie, X.; Zhao, X.; Liu, F.; Wang, Y.; Ping, J.; Ji, Z. HNSPPI: A hybrid computational model combing network and sequence information for predicting protein–protein interaction. Briefings Bioinform. 2023, 24, bbad261.
- Li, X.; Han, P.; Wang, G.; Chen, W.; Wang, S.; Song, T. SDNN-PPI: Self-attention with deep neural network effect on protein-protein interaction prediction. BMC Genom. 2022, 23, 474.
- Tran, H.N.; Xuan, Q.N.P.; Nguyen, T.T. DeepCF-PPI: Improved prediction of protein-protein interactions by combining learned and handcrafted features based on attention mechanisms. Appl. Intell. 2023, 53, 17887–17902.
- Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. In Proteins: Structure, Function, and Bioinformatics; Wiley Online Library: Hoboken, NJ, USA, 2021; Volume 89, pp. 1607–1617.
- Yang, X.; Yang, S.; Lian, X.; Wuchty, S.; Zhang, Z. Transfer learning via multi-scale convolutional neural layers for human–virus protein–protein interaction prediction. Bioinformatics 2021, 37, 4771–4778.



| Dataset | Positive Samples | Negative Samples | 
|---|---|---|
| H. sapiens | 31,164 | 31,164 | 
| S. cerevisiae | 13,462 | 13,462 | 
| Guo | 5594 | 5594 | 
| Pan | 27,593 | 34,298 | 
| Multi-species | 32,959 | 32,959 | 
| Human–virus | 8929 | 8929 | 

| Hyperparameter | Value Range | Best Value | Selection |
|---|---|---|---|
| Learning rate | [, ] | 0.0006 | 0.0006 |
| Optimizer | [0, 1] | 0.0581 | Adam |
| Head1 | [2, 8] | 2.9361 | 4 |
| Head2 | [2, 8] | 2.9360 | 4 |
| FNN1 | [64, 512] | 489.92 | 512 |
| FNN2 | [64, 512] | 391.93 | 512 |
| Dropout rate | [0, 0.3] | 0.1124 | 0.1 |

| Species | Length | Accuracy | Precision | Recall | Specificity | MCC |
|---|---|---|---|---|---|---|
| H. sapiens | 500 | 98.50 ± 0.13 | 98.59 ± 0.20 | 98.41 ± 0.09 | 98.59 ± 0.20 | 97.00 ± 0.27 |
| H. sapiens | 1000 | 98.76 ± 0.11 | 99.05 ± 0.21 | 98.47 ± 0.13 | 99.05 ± 0.22 | 97.53 ± 0.22 |
| H. sapiens | 1500 | 98.93 ± 0.13 | 99.23 ± 0.19 | 98.62 ± 0.13 | 99.23 ± 0.19 | 97.85 ± 0.26 |
| S. cerevisiae | 500 | 97.50 ± 0.26 | 98.15 ± 0.55 | 96.83 ± 0.54 | 98.17 ± 0.56 | 95.01 ± 0.53 |
| S. cerevisiae | 1000 | 97.71 ± 0.20 | 98.64 ± 0.31 | 96.76 ± 0.60 | 98.67 ± 0.31 | 95.45 ± 0.38 |
| S. cerevisiae | 1500 | 97.94 ± 0.16 | 98.60 ± 0.42 | 97.27 ± 0.46 | 98.61 ± 0.43 | 95.89 ± 0.31 |

| Method | Accuracy | Precision | Recall | Specificity | F1-Score | MCC |
|---|---|---|---|---|---|---|
| PIPR | 94.50 ± 0.22 | 95.81 ± 0.28 | 93.08 ± 0.20 | 95.93 ± 0.28 | 94.42 ± 0.22 | 89.04 ± 0.43 | 
| RAPPPID | 94.16 ± 2.04 | 95.10 ± 2.30 | 93.15 ± 2.23 | 95.17 ± 2.34 | 94.10 ± 2.06 | 88.36 ± 4.08 | 
| DeepTrio | 95.69 ± 0.33 | 98.57 ± 0.36 | 92.73 ± 0.37 | 98.65 ± 0.34 | 95.56 ± 0.34 | 91.54 ± 0.65 | 
| LPBERT | 98.93 ± 0.13 | 99.23 ± 0.19 | 98.62 ± 0.13 | 99.23 ± 0.19 | 98.93 ± 0.13 | 97.85 ± 0.26 | 

| Method | Accuracy | Precision | Recall | Specificity | F1-Score | MCC |
|---|---|---|---|---|---|---|
| PIPR | 90.29 ± 1.08 | 92.01 ± 1.84 | 88.31 ± 0.32 | 92.29 ± 1.90 | 90.11 ± 1.02 | 80.68 ± 2.21 | 
| RAPPPID | 91.25 ± 0.27 | 93.85 ± 0.51 | 88.28 ± 0.26 | 94.21 ± 0.51 | 90.98 ± 0.26 | 82.64 ± 0.57 | 
| DeepTrio | 92.77 ± 0.63 | 93.70 ± 2.01 | 91.77 ± 1.35 | 93.76 ± 2.21 | 92.70 ± 0.55 | 85.60 ± 1.27 | 
| LPBERT | 97.94 ± 0.16 | 98.60 ± 0.42 | 97.27 ± 0.46 | 98.61 ± 0.43 | 97.93 ± 0.16 | 95.89 ± 0.31 | 

| Dataset | Method | Sequence Encoding | Accuracy | MCC |
|---|---|---|---|---|
| S. cerevisiae (DeepFE, 5-fold) | DeepFE-PPI | Word2Vec | 94.78 ± 0.61 | 89.62 ± 1.23 |
| S. cerevisiae (DeepFE, 5-fold) | SDNN-PPI | AAC + CT + AC | 95.48 ± 0.37 | 91.02 ± 0.74 |
| S. cerevisiae (DeepFE, 5-fold) | LPBERT | ProteinBERT | 94.83 ± 0.16 | 89.69 ± 0.30 |
| S. cerevisiae (DeepCF, 5-fold) | DeepCF-PPI | Word2Vec + AAC + PseAAC + APAAC + QSO + DPC | 95.6 ± 0.57 | 91.4 ± 1.13 |
| S. cerevisiae (DeepCF, 5-fold) | LPBERT | ProteinBERT | 95.04 ± 0.48 | 90.1 ± 0.96 |
| Human (10-fold) | HNSPPI | Seq2Vec | 94.92 ± 0.19 | NA |
| Human (10-fold) | LPBERT | ProteinBERT | 98.49 ± 0.36 | 96.99 ± 0.71 |

| Dataset | Method | Accuracy | Precision | Recall | Specificity | F1-Score | MCC |
|---|---|---|---|---|---|---|---|
| Bio H. sapiens | MPB-PPI | 98.18 ± 0.05 | 97.32 ± 0.19 | 99.10 ± 0.22 | 97.27 ± 0.21 | 98.20 ± 0.05 | 96.39 ± 0.09 |
| Bio H. sapiens | LPBERT | 99.31 ± 0.08 | 99.30 ± 0.13 | 99.32 ± 0.08 | 99.30 ± 0.13 | 99.31 ± 0.08 | 98.62 ± 0.16 |
| S. cerevisiae core (DeepFE) | MPB-PPI | 92.85 ± 0.31 | 93.95 ± 1.25 | 91.51 ± 1.43 | 94.17 ± 1.31 | 92.69 ± 0.39 | 85.77 ± 0.63 |
| S. cerevisiae core (DeepFE) | LPBERT | 94.83 ± 0.16 | 96.06 ± 0.54 | 93.50 ± 0.64 | 96.17 ± 0.45 | 94.76 ± 0.20 | 89.69 ± 0.30 |
| Multi-species | MPB-PPI | 98.33 | 99.30 | 97.36 | 99.31 | 98.32 | 96.69 |
| Multi-species | LPBERT | 98.87 ± 0.12 | 99.56 ± 0.13 | 98.19 ± 0.12 | 99.56 ± 0.14 | 98.87 ± 0.11 | 97.76 ± 0.24 |
| Pan | xCAPT5 | 99.77 ± 0.02 | 99.75 ± 0.03 | 99.75 ± 0.02 | 99.80 ± 0.02 | 99.62 ± 0.06 | 99.55 ± 0.03 |
| Pan | LPBERT | 98.73 ± 0.09 | 98.65 ± 0.32 | 98.51 ± 0.14 | 98.92 ± 0.25 | 98.58 ± 0.11 | 97.44 ± 0.18 |

| Experiment | Accuracy | Precision | Recall | Specificity | F1-Score | MCC |
|---|---|---|---|---|---|---|
| rm GCT | 97.47 ± 0.01 | 97.48 ± 0.15 | 97.47 ± 0.15 | 97.48 ± 0.16 | 97.47 ± 0.01 | 94.95 ± 0.01 | 
| rm LCR | 98.73 ± 0.10 | 99.15 ± 0.06 | 98.32 ± 0.23 | 99.15 ± 0.06 | 98.73 ± 0.10 | 97.47 ± 0.19 | 
| rp GCT with Trans | 98.73 ± 0.01 | 98.91 ± 0.16 | 98.55 ± 0.17 | 98.91 ± 0.16 | 98.73 ± 0.01 | 97.46 ± 0.02 | 
| rp LCR with CNNs | 98.8 ± 0.07 | 99.01 ± 0.20 | 98.57 ± 0.16 | 99.02 ± 0.20 | 98.8 ± 0.07 | 97.59 ± 0.15 | 
| GCT (rm Trans) | 98.89 ± 0.02 | 99.09 ± 0.07 | 98.69 ± 0.04 | 99.09 ± 0.07 | 98.89 ± 0.02 | 97.78 ± 0.03 | 
| LCR (rm BiL) | 98.88 ± 0.10 | 99.05 ± 0.21 | 98.71 ± 0.02 | 99.05 ± 0.21 | 98.88 ± 0.10 | 97.75 ± 0.20 | 
| rm GCT (Trans) + LCR (BiL) | 98.85 ± 0.16 | 99.14 ± 0.20 | 98.56 ± 0.22 | 99.15 ± 0.19 | 98.85 ± 0.17 | 97.71 ± 0.33 | 
| LCR (GRU) | 98.74 ± 0.09 | 98.80 ± 0.16 | 98.68 ± 0.07 | 98.80 ± 0.16 | 98.74 ± 0.09 | 97.48 ± 0.17 | 
| LCR (BiGRU) | 98.64 ± 0.10 | 99.09 ± 0.28 | 98.19 ± 0.24 | 99.10 ± 0.28 | 98.64 ± 0.10 | 97.29 ± 0.21 | 
| LCR (LSTM) | 98.63 ± 0.12 | 98.76 ± 0.37 | 98.49 ± 0.17 | 98.77 ± 0.38 | 98.63 ± 0.12 | 97.25 ± 0.24 | 
| LPBERT (Ours) | 98.93 ± 0.13 | 99.23 ± 0.19 | 98.62 ± 0.13 | 99.23 ± 0.19 | 98.93 ± 0.13 | 97.85 ± 0.26 | 

| Dataset | Method | Encoding Parameters | Accuracy | MCC |
|---|---|---|---|---|
| BioGRID H. sapiens (5-fold) | LPBERT (with TAPE) | ∼38 M | 97.66 ± 0.23 | 95.32 ± 0.47 |
| BioGRID H. sapiens (5-fold) | LPBERT (with ProtBert) | ∼420 M | 99.75 ± 0.06 | 99.49 ± 0.11 |
| BioGRID H. sapiens (5-fold) | LPBERT (with ESM-2) | ∼650 M | 99.80 ± 0.02 | 99.61 ± 0.05 |
| BioGRID H. sapiens (5-fold) | LPBERT (with ProtT5) | ∼3 B | 99.82 ± 0.02 | 99.65 ± 0.04 |
| BioGRID H. sapiens (5-fold) | LPBERT (Ours) | ∼16 M | 98.93 ± 0.13 | 97.85 ± 0.26 |

| Similarity | Training–Validation Samples | Test Samples | Accuracy | Precision | F1-Score |
|---|---|---|---|---|---|
| Any | 12,354 | 2968 | 93.19 | 98.48 | 93.40 | 
| ≤90% | 12,308 | 2833 | 94.25 | 98.50 | 94.65 | 
| ≤80% | 13,216 | 2178 | 94.67 | 97.96 | 91.40 | 
| ≤70% | 13,224 | 2373 | 95.41 | 98.49 | 95.31 | 
| ≤60% | 12,918 | 2360 | 94.66 | 98.21 | 94.58 | 
| ≤50% | 12,414 | 2440 | 94.84 | 98.24 | 94.66 | 

| Dataset | Positive Samples | Negative Samples | Accuracy | Precision | F1-Score |
|---|---|---|---|---|---|
| BioGRID S. cerevisiae | 13,462 | 13,462 | 93.05 ± 0.21 | 98.96 ± 0.06 | 92.61 ± 0.25 | 
| S. cerevisiae core | 5271 | 5266 | 51.81 ± 0.55 | 51.10 ± 0.32 | 63.93 ± 0.63 | 
| Guo | 5594 | 5594 | 51.82 ± 0.07 | 51.12 ± 0.04 | 63.28 ± 0.13 | 
| Multi-species | 32,959 | 32,959 | 47.79 ± 0.36 | 48.74 ± 0.21 | 62.05 ± 0.34 | 
| Human–virus | 8929 | 8929 | 86.48 ± 1.07 | 93.84 ± 1.16 | 85.22 ± 1.53 | 
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hu, A.; Kuang, L.; Yang, D. LPBERT: A Protein–Protein Interaction Prediction Method Based on a Pre-Trained Language Model. Appl. Sci. 2025, 15, 3283. https://doi.org/10.3390/app15063283