Recent Advances and Application of Machine Learning for Protein–Protein Interaction Prediction in Rice: Challenges and Future Perspectives
Abstract
1. Introduction
2. Data Sources and Feature Engineering for PPI Prediction
2.1. Data Sources
2.2. Feature Selection
2.2.1. Sequence-Based Features
2.2.2. Structure-Based Features
2.2.3. Function-Based Features
| Feature Type | Description | Advantages | Limitations | Typical Use in Rice PPI Modeling | References |
|---|---|---|---|---|---|
| Sequence-Based | Derived from primary amino acid sequence (e.g., AAC, CKSAAP, PSSM) | | | Used in SVM and RF models for rice phosphorylation and PPI predictions | [49,55,56,57] |
| Structure-Based | Based on 3D conformation: interface residues, solvent accessibility, dynamics | | | Emerging in rice using AlphaFold2-based models; potential for DL integration | [28,54,58,59,60] |
| Function-Based | Biological annotations (GO terms, domains, co-expression, pathway membership) | | | Used in GNN/DLNet models for network-based rice PPI inference | [7,53,61,62,63,64] |
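
As a rough illustration of the sequence-based features named in the table, the sketch below computes amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP) for a single gap size. The function names and the choice to concatenate the two per-protein vectors into one pair vector are assumptions made for this example, not a description of any cited model.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)  # non-standard residues are ignored
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k=0):
    """Composition of residue pairs separated by k positions (400 values)."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = Counter(seq[i] + seq[i + k + 1] for i in range(len(seq) - k - 1))
    total = sum(counts[p] for p in pairs)
    return [counts[p] / total if total else 0.0 for p in pairs]


def pair_features(seq_a, seq_b):
    """Hypothetical pair encoding: concatenate the AAC vectors of both proteins."""
    return aac(seq_a) + aac(seq_b)
```

Because concatenation is order-dependent, a classifier trained on such vectors would see (A, B) and (B, A) as different inputs; presenting both orders during training is one common workaround.
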
2.3. Evaluation Metrics
| Metric | Definition | Advantages | Limitations | References |
|---|---|---|---|---|
| Accuracy | Ratio of correctly predicted instances (TP + TN) to total predictions | Simple to compute and interpret; provides a general overview | Misleading in imbalanced datasets where negative class dominates | [65,73] |
| Precision | TP/(TP + FP): proportion of positive predictions that are correct | Highlights model’s ability to avoid false positives | Ignores false negatives; not sufficient alone in imbalanced settings | [66,74] |
| Recall (Sensitivity) | TP/(TP + FN): proportion of actual positives correctly identified | Important for identifying all true interactions; useful in biological discovery | Can be high even when precision is low; may lead to many false positives | [67,75] |
| F1-Score | Harmonic mean of precision and recall | Balances precision and recall; useful when class distribution is skewed | Does not consider true negatives; sensitive to threshold choice | [68,76] |
| AUC-ROC | Area under the receiver operating characteristic curve | Measures discrimination capability of model across all thresholds | Can be misleading in highly imbalanced datasets; less focused on the positive class | [69,77] |
| PR-AUC | Area under the precision–recall curve | Better reflects performance on imbalanced datasets; focuses on positive class | Baseline equals the positive-class prevalence, so values are not directly comparable across datasets; less intuitive to interpret than ROC curves | [42,43] |
| Matthews Correlation Coefficient (MCC) | Correlation coefficient between observed and predicted binary classifications | Takes into account all elements of the confusion matrix; robust in imbalanced datasets | Less commonly used and harder to interpret; sensitive to dataset size | [70,78] |
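
The threshold-based metrics in the table can all be derived from the four confusion-matrix counts. The following minimal sketch shows this; the function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1, and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells; it is undefined when any marginal sum is zero,
    # in which case this sketch reports 0.0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


# A classifier that labels every pair as non-interacting on a 1% positive set:
trivial = binary_metrics(tp=0, fp=0, tn=990, fn=10)
```

In this example `trivial["accuracy"]` is 0.99 while recall, F1, and MCC are all 0, which is the imbalance problem noted in the Accuracy row.
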
3. Machine Learning (ML) Methods for PPI Prediction
3.1. Traditional ML Methods
3.2. Deep Learning Approaches
3.3. Applications and Case Studies
3.3.1. Identification of Candidate Genes for Agronomic Traits
3.3.2. Understanding Plant–Pathogen Interactions
3.3.3. Elucidating Salt Tolerance Mechanisms
3.3.4. Network-Based Functional Annotation of Uncharacterized Proteins
3.3.5. Precision Breeding and Genome Editing Target Prioritization
4. Conclusions and Perspectives
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, C.; Naing, N.N.Z.N.; Zhang, C.; Li, J.; Zhu, Q.; Lee, D.; Chen, L. Protein-Protein Interaction Networks in Rice under Drought Stress: Insights from Proteomics and Bioinformatics Analysis. Comput. Mol. Biol. 2024, 14, 191–201.
- Wu, J.; Liu, X.; Ge, F.; Li, F.; Liu, N. Tolerance mechanism of rice (Oryza sativa L.) seedings towards polycyclic aromatic hydrocarbons toxicity: The activation of SPX-mediated signal transduction to maintain P homeostasis. Environ. Pollut. 2024, 341, 123009.
- Razalli, I.I.; Abdullah-Zawawi, M.-R.; Zainal Abidin, R.A.; Harun, S.; Che Othman, M.H.; Ismail, I.; Zainal, Z. Identification and validation of hub genes associated with biotic and abiotic stresses by modular gene co-expression analysis in Oryza sativa L. Sci. Rep. 2025, 15, 8465.
- Ontoy, J.C.; Ham, J.H. Mapping and omics integration: Towards precise rice disease resistance breeding. Plants 2024, 13, 1205.
- Singh, S.; Praveen, A.; Dudha, N.; Bhadrecha, P. Integrating physiological and multi-omics methods to elucidate heat stress tolerance for sustainable rice production. Physiol. Mol. Biol. Plants 2024, 30, 1185–1208.
- Krishna, V.A.; Singh, A.; Lal, J.P. Genome-Assisted Breeding and Genome-Wide Association Studies for Rice Improvement. In Climate-Smart Rice Breeding; Springer: Berlin/Heidelberg, Germany, 2024; p. 125.
- Wang, C.; Han, B. Twenty years of rice genomics research: From sequencing and functional genomics to quantitative genomics. Mol. Plant. 2022, 15, 593–619.
- Usman, B.; Derakhshani, B.; Jung, K.-H. Recent molecular aspects and integrated omics strategies for understanding the abiotic stress tolerance of rice. Plants 2023, 12, 2019.
- Wimalagunasekara, S.S.; Weeraman, J.W.; Tirimanne, S.; Fernando, P.C. Protein-protein interaction (PPI) network analysis reveals important hub proteins and sub-network modules for root development in rice (Oryza sativa). J. Genet. Eng. Biotechnol. 2023, 21, 69.
- Zhou, H.; Hwarari, D.; Zhang, Y.; Mo, X.; Luo, Y.; Ma, H. Proteomic analysis reveals salicylic acid as a pivotal signal molecule in rice response to blast disease infection. Plants 2022, 11, 1702.
- Laine, E.; Freiberger, M.I. Toward a comprehensive profiling of alternative splicing proteoform structures, interactions and functions. Curr. Opin. Struct. Biol. 2025, 90, 102979.
- Zhang, L.; Huang, R.; Mao, D.; Zeng, J.; Fang, P.; He, Q.; Shu, F.; Deng, H.; Zhang, W.; Sun, P. Proteomes and ubiquitylomes reveal the regulation mechanism of cold tolerance mediated by OsGRF4 in rice. Front. Plant Sci. 2025, 16, 1531399.
- González-Avendaño, M.; López, J.; Vergara-Jaque, A.; Cerda, O. The power of computational proteomics platforms to decipher protein-protein interactions. Curr. Opin. Struct. Biol. 2024, 88, 102882.
- Sharma, M.; Sidhu, A.K.; Samota, M.K.; Gupta, M.; Koli, P.; Choudhary, M. Post-translational modifications in histones and their role in abiotic stress tolerance in plants. Proteomes 2023, 11, 38.
- Liu, C.; Törnkvist, A.; Charova, S.; Stael, S.; Moschou, P.N. Proteolytic proteoforms: Elusive components of hormonal pathways? Trends Plant Sci. 2020, 25, 325–328.
- Kosová, K.; Vítámvás, P.; Prášil, I.T.; Klíma, M.; Renaut, J. Plant proteoforms under environmental stress: Functional proteins arising from a single gene. Front. Plant Sci. 2021, 12, 793113.
- Martínez-Esteso, M.J.; Morante-Carriel, J.; Samper-Herrero, A.; Martínez-Márquez, A.; Sellés-Marchart, S.; Nájera, H.; Bru-Martínez, R. Proteomics: An Essential Tool to Study Plant-Specialized Metabolism. Biomolecules 2024, 14, 1539.
- Cuadrado, A.F.; Van Damme, D. Unlocking protein–protein interactions in plants: A comprehensive review of established and emerging techniques. J. Exp. Bot. 2024, 75, 5220–5236.
- Poluri, K.M.; Gulati, K.; Sarkar, S. Experimental methods for determination of protein–protein interactions. In Protein-Protein Interactions: Principles and Techniques; Springer: Singapore, 2021; Volume I, pp. 197–264.
- Melicher, P.; Dvořák, P.; Šamaj, J.; Takáč, T. Protein-protein interactions in plant antioxidant defense. Front. Plant Sci. 2022, 13, 1035573.
- Tang, T.; Zhang, X.; Liu, Y.; Peng, H.; Zheng, B.; Yin, Y.; Zeng, X. Machine learning on protein–protein interaction prediction: Models, challenges and trends. Brief. Bioinform. 2023, 24, bbad076.
- Kiouri, D.P.; Batsis, G.C.; Chasapis, C.T. Structure-Based Approaches for Protein–Protein Interaction Prediction Using Machine Learning and Deep Learning. Biomolecules 2025, 15, 141.
- Wu, J.; Liu, B.; Zhang, J.; Wang, Z.; Li, J. DL-PPI: A method on prediction of sequenced protein–protein interaction based on deep learning. BMC Bioinform. 2023, 24, 473.
- Hong, X.; Lv, J.; Li, Z.; Xiong, Y.; Zhang, J.; Chen, H.-F. Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions. Int. J. Biol. Macromol. 2023, 243, 125233.
- Nogueira-Rodríguez, A.; Glez-Peña, D.; Vieira, C.P.; Vieira, J.; López-Fernández, H. PPI prediction from sequences via transfer learning on balanced but yet biased datasets: An open problem. In Proceedings of the International Conference on Practical Applications of Computational Biology & Bioinformatics, Salamanca, Spain, 26–28 June 2024; Springer: Cham, Switzerland, 2024; pp. 31–40.
- Raj, S.S.; Chandra, S.V. Significance of sequence features in classification of protein–protein interactions using machine learning. Protein J. 2024, 43, 72–83.
- Zheng, C.; Liu, Y.; Sun, F.; Zhao, L.; Zhang, L. Predicting protein–protein interactions between rice and blast fungus using structure-based approaches. Front. Plant Sci. 2021, 12, 690124.
- Sun, F.; Deng, Y.; Ma, X.; Liu, Y.; Zhao, L.; Yu, S.; Zhang, L. Structure-based prediction of protein-protein interaction network in rice. Genet. Mol. Biol. 2024, 47, e20230068.
- Sharma, N.K.; Anand, A.; Budhlakoti, N.; Mishra, D.C.; Jha, G.K. Artificial Intelligence and Machine Learning for Rice Improvement. In Climate-Smart Rice Breeding; Springer: Berlin/Heidelberg, Germany, 2024; pp. 273–300.
- Bhuiyan, M.M.R.; Rahaman, M.M.; Aziz, M.M.; Islam, M.R.; Das, K. Predictive analytics in plant biotechnology: Using data science to drive crop resilience and productivity. J. Environ. Agric. Stud. 2023, 4, 77–83.
- Murmu, S.; Chaurasia, H.; Rao, A.; Rai, A.; Jaiswal, S.; Bharadwaj, A.; Yadav, R.; Archak, S. PlantPathoPPI: An Ensemble-based Machine Learning Architecture for Prediction of Protein-Protein Interactions between Plants and Pathogens. J. Mol. Biol. 2025, 437, 169093.
- Gupta, C.; Ramegowda, V.; Basu, S.; Pereira, A. Using network-based machine learning to predict transcription factors involved in drought resistance. Front. Genet. 2021, 12, 652189.
- Chi, L.; Ma, J.; Wan, Y.; Deng, Y.; Wu, Y.; Cen, X.; Zhou, X.; Zhao, X.; Wang, Y.; Ji, Z. HGNNPIP: A Hybrid Graph Neural Network framework for Protein-protein Interaction Prediction. bioRxiv 2023. bioRxiv:2023.12.10.571021.
- Xie, S.; Xie, X.; Zhao, X.; Liu, F.; Wang, Y.; Ping, J.; Ji, Z. HNSPPI: A hybrid computational model combing network and sequence information for predicting protein–protein interaction. Brief. Bioinform. 2023, 24, bbad261.
- Taha, K. Employing Machine Learning Techniques to Detect Protein-Protein Interaction: A Survey, Experimental, and Comparative Evaluations. bioRxiv 2023. bioRxiv:2023.08.22.554321.
- Chatr-Aryamontri, A.; Oughtred, R.; Boucher, L.; Rust, J.; Chang, C.; Kolas, N.K.; O’Donnell, L.; Oster, S.; Theesfeld, C.; Sellam, A. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 2017, 45, D369–D379.
- De Silva, M.R.P.; Weeraman, J.W.J.K.; Piyatissa, S.; Fernando, P.C. Prediction of new candidate proteins and analysis of sub-modules and protein hubs associated with seed development in rice (Oryza sativa) using an ensemble network-based systems biology approach. BMC Plant Biol. 2025, 25, 604.
- Abdullah-Zawawi, M.-R.; Govender, N.; Muhammad, N.A.N.; Mohd-Assaad, N.; Zainal, Z.; Mohamed-Hussein, Z.-A. Genome-wide analysis of sulfur-encoding biosynthetic genes in rice (Oryza sativa L.) with Arabidopsis as the sulfur-dependent model plant. Sci. Rep. 2022, 12, 13829.
- Wang, L.; Jia, Y.; Osakina, A.; Olsen, K.M.; Huang, Y.; Jia, M.H.; Ponniah, S.; Pedrozo, R.; Nicolli, C.; Edwards, J.D. Receptor-ligand interactions in plant innate immunity revealed by AlphaFold protein structure prediction. bioRxiv 2024. bioRxiv:2024.06.12.598632.
- Javaid, T.; Bhattarai, M.; Venkataraghavan, A.; Held, M.; Faik, A. Specific protein interactions between rice members of the GT43 and GT47 families form various central cores of putative xylan synthase complexes. Plant J. 2024, 118, 856–878.
- Woo, D.U.; Lee, Y.; Min, C.W.; Kim, S.T.; Kang, Y.J. RiceProteomeDB (RPDB): A user-friendly database for proteomics data storage, retrieval, and analysis. Sci. Rep. 2024, 14, 3671.
- Hu, L.; Wang, X.; Huang, Y.-A.; Hu, P.; You, Z.-H. A novel network-based algorithm for predicting protein-protein interactions using gene ontology. Front. Microbiol. 2021, 12, 735329.
- Wang, S.; Dong, K.; Liang, D.; Zhang, Y.; Li, X.; Song, T. MIPPIS: Protein–protein interaction site prediction network with multi-information fusion. BMC Bioinform. 2024, 25, 345.
- Zheng, J.; Yang, X.; Huang, Y.; Yang, S.; Wuchty, S.; Zhang, Z. Deep learning-assisted prediction of protein–protein interactions in Arabidopsis thaliana. Plant J. 2023, 114, 984–994.
- Szklarczyk, D.; Gable, A.L.; Nastou, K.C.; Lyon, D.; Kirsch, R.; Pyysalo, S.; Doncheva, N.T.; Legeay, M.; Fang, T.; Bork, P. The STRING database in 2021: Customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021, 49, D605–D612.
- Oughtred, R.; Rust, J.; Chang, C.; Breitkreutz, B.J.; Stark, C.; Willems, A.; Boucher, L.; Leung, G.; Kolas, N.; Zhang, F. The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci. 2021, 30, 187–200.
- Zhu, L.; Zhang, H.; Cao, D.; Xu, Y.; Li, L.; Ning, Z.; Zhu, L. Drought stress-related gene identification in rice by random walk with restart on multiplex biological networks. Agriculture 2022, 13, 53.
- Li, M.; Shi, W.; Zhang, F.; Zeng, M.; Li, Y. A deep learning framework for predicting protein functions with co-occurrence of GO terms. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 20, 833–842.
- Lin, S.; Song, Q.; Tao, H.; Wang, W.; Wan, W.; Huang, J.; Xu, C.; Chebii, V.; Kitony, J.; Que, S. Rice_Phospho 1.0: A new rice-specific SVM predictor for protein phosphorylation sites. Sci. Rep. 2015, 5, 11940.
- Pedrozo, R.; Osakina, A.; Huang, Y.; Nicolli, C.P.; Wang, L.; Jia, Y. Status on genetic resistance to rice blast disease in the post-genomic era. Plants 2025, 14, 807.
- Ceasar, S.A.; Ebeed, H.T. The present state and impact of AI-driven computational tools for predicting plant protein structures. Protein Pept. Lett. 2024, 31, 749–758.
- Alborzi, S.Z.; Ahmed Nacer, A.; Najjar, H.; Ritchie, D.W.; Devignes, M.-D. PPIDomainMiner: Inferring domain-domain interactions from multiple sources of protein-protein interactions. PLoS Comput. Biol. 2021, 17, e1008844.
- Kumar, R.; Khatri, A.; Acharya, V. Deep learning uncovers distinct behavior of rice network to pathogens response. iScience 2022, 25, 104546.
- Hu, J.; Li, Z.; Rao, B.; Thafar, M.A.; Arif, M. Improving protein-protein interaction prediction using protein language model and protein network features. Anal. Biochem. 2024, 693, 115550.
- Karan, B.; Mahapatra, S.; Sahu, S.S. Prediction of protein interactions in rice and blast fungus using Machine Learning. In Proceedings of the 2019 International Conference on Information Technology (ICIT), Bhubaneswar, India, 19–21 December 2019; pp. 33–36.
- Li, L.-P.; Zhang, B.; Cheng, L. Cpiela: Computational prediction of plant protein–protein interactions by ensemble learning approach from protein sequences and evolutionary information. Front. Genet. 2022, 13, 857839.
- Wang, L.; Li, F.-l.; Ma, X.-y.; Cang, Y.; Bai, F. PPI-Miner: A structure and sequence motif co-driven protein–protein interaction mining and modeling computational method. J. Chem. Inf. Model. 2022, 62, 6160–6171.
- Baranwal, M.; Magner, A.; Saldinger, J.; Turali-Emre, E.S.; Elvati, P.; Kozarekar, S.; VanEpps, J.S.; Kotov, N.A.; Violi, A.; Hero, A.O. Struct2Graph: A graph attention network for structure based predictions of protein–protein interactions. BMC Bioinform. 2022, 23, 370.
- Bertoline, L.M.; Lima, A.N.; Krieger, J.E.; Teixeira, S.K. Before and after AlphaFold2: An overview of protein structure prediction. Front. Bioinform. 2023, 3, 1120370.
- Park, S.; Myung, S.; Baek, M. Advancing protein structure prediction beyond AlphaFold2. Curr. Opin. Struct. Biol. 2025, 90, 102985.
- Hamilton, J.P.; Li, C.; Buell, C.R. The rice genome annotation project: An updated database for mining the rice genome. Nucleic Acids Res. 2025, 53, D1614–D1622.
- Liu, J.; Ju, J.; Shen, T.; Guan, X. A framework for prediction of hierarchical protein function based on PPI network and semantic similarity. J. Biomol. Struct. Dyn. 2025, 1–10.
- Xue, X.; Zhang, W.; Fan, A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS ONE 2023, 18, e0284274.
- Zhang, Y.-H.; Zeng, T.; Chen, L.; Huang, T.; Cai, Y.-D. Determining protein–protein functional associations by functional rules based on gene ontology and KEGG pathway. Biochim. Biophys. Acta (BBA)-Proteins Proteom. 2021, 1869, 140621.
- Ma, W.; Bao, W.; Cao, Y.; Yang, B.; Chen, Y. Prediction of Protein-Protein Interaction Based on Deep Learning Feature Representation and Random Forest. In Proceedings of the Intelligent Computing Theories and Application, Shenzhen, China, 12–15 August 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 654–662.
- Lakshmi, P.; Manikandan, P.; Ramyachitra, D. An Improved Bagging of Machine Learning Algorithms to Predict Motif Structures from Protein-Protein Interaction Networks. IEEE Access 2025, 13, 45077–45088.
- Göktepe, Y.E. Protein-protein interaction prediction using enhanced features with spaced conjoint triad and amino acid pairwise distance. PeerJ Comput. Sci. 2025, 11, e2748.
- Zheng, K.; Sun, M. Rice Quality Identification Based on Transfer Learning. In Proceedings of the 2024 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, ON, Canada, 26–28 August 2024; pp. 385–389.
- Paul, D.; Patua, R.; Saha, S.; Halder, A.K.; Basu, S. GSPPI: GraphSAGE-Based Prediction of Protein-Protein Interactions Using Graphlet Features. In Proceedings of the International Conference on Data, Electronics and Computing, Aizawl, India, 15–16 December 2023; pp. 101–112.
- Emmanuel, J.; Isewon, I.; Olasehinde, G.; Oyelade, J. Current Trend and Performance Evaluation of Machine Learning Methods for Predicting Host-Pathogen Protein-Protein Interactions. In Proceedings of the 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG), Omu-Aran, Nigeria, 2–4 April 2024; pp. 1–14.
- Dou, L.; Yang, F.; Xu, L.; Zou, Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief. Bioinform. 2021, 22, bbab089.
- Lee, M. Recent advances in deep learning for protein-protein interaction analysis: A comprehensive review. Molecules 2023, 28, 5169.
- Djeddi, W.E.; Yahia, S.B.; Diallo, G. Optimizing Global Network Alignment with a Genetic Algorithm: Leveraging Pre-trained Embeddings for Protein Sequences and Gene Ontology Terms. IEEE Trans. Comput. Biol. Bioinform. 2025, 22, 136–150.
- Villikudathil, A.T.; Jayachandran, K.; Radhakrishnan, E. k-Nearest Neighbour machine method for predicting resistance gene against Magnaporthe oryzae in rice using proteomic markers. J. Proteins Proteom. 2024, 15, 601–610.
- Shakibania, T.; Arabfard, M.; Najafi, A. A predictive approach for host-pathogen interactions using deep learning and protein sequences. VirusDisease 2024, 35, 434–445.
- Inzamam-Ul-Hossain, M.; Islam, M.R. Prediction of essential proteins using genetic algorithm as a feature selection technique. IEEE Access 2024, 12, 126200–126220.
- Naha, S.; Kaur, S.; Bhattacharya, R.; Cheemanapalli, S.; Iyyappan, Y. ANPS: Machine learning based server for identification of anti-nutritional proteins in plants. Funct. Integr. Genom. 2024, 24, 201.
- ul Qamar, M.T.; Noor, F.; Guo, Y.-X.; Zhu, X.-T.; Chen, L.-L. Deep-HPI-pred: An R-Shiny applet for network-based classification and prediction of Host-Pathogen protein-protein interactions. Comput. Struct. Biotechnol. J. 2024, 23, 316–329.
- Yan, Y.; Wang, H.; Bi, Y.; Song, F. Rice E3 ubiquitin ligases: From key modulators of host immunity to potential breeding applications. Plant Commun. 2024, 5.
- Liu, S.; Liu, Y.; Zhao, J.; Cai, S.; Qian, H.; Zuo, K.; Zhao, L.; Zhang, L. A computational interactome for prioritizing genes associated with complex agronomic traits in rice (Oryza sativa). Plant J. 2017, 90, 177–188.
- Wei, Z.-S.; Yang, J.-Y.; Shen, H.-B.; Yu, D.-J. A cascade random forests algorithm for predicting protein-protein interaction sites. IEEE Trans. Nanobiosci. 2015, 14, 746–760.
- Guo, J.; Li, H.; Chang, J.-W.; Lei, Y.; Li, S.; Chen, L.-L. Prediction and characterization of protein–protein interaction network in Xanthomonas oryzae pv. oryzae PXO99A. Res. Microbiol. 2013, 164, 1035–1044.
- Karan, B.; Mahapatra, S.; Sahu, S.S.; Pandey, D.M.; Chakravarty, S. Computational models for prediction of protein–protein interaction in rice and Magnaporthe grisea. Front. Plant Sci. 2023, 13, 1046209.
- Pan, J.; Wang, S.; Yu, C.; Li, L.; You, Z.; Sun, Y. A Novel Ensemble Learning-Based Computational Method to Predict Protein-Protein Interactions from Protein Primary Sequences. Biology 2022, 11, 775.
- Pan, J.; Li, L.P.; Yu, C.Q.; You, Z.H.; Guan, Y.J.; Ren, Z.H. Sequence-Based Prediction of Plant Protein-Protein Interactions by Combining Discrete Sine Transformation with Rotation Forest. Evol. Bioinform. 2021, 17, 11769343211050067.
- Pan, J.; Li, L.-P.; Yu, C.-Q.; You, Z.-H.; Ren, Z.-H.; Tang, J.-Y. FWHT-RF: A Novel Computational Approach to Predict Plant Protein-Protein Interactions via an Ensemble Learning Method. Sci. Program. 2021, 1607946.
- Ma, S.; Song, Q.; Tao, H.; Harrison, A.; Wang, S.; Liu, W.; Lin, S.; Zhang, Z.; Ai, Y.; He, H. Prediction of protein-protein interactions between fungus (Magnaporthe grisea) and rice (Oryza sativa L.). Brief. Bioinform. 2019, 20, 448–456.
- Wang, X.; Wu, Y.J.; Wang, R.J.; Wei, Y.Y.; Gui, Y.M. Gray BP neural network based prediction of rice protein interaction network. Clust. Comput. 2019, 22, 4165–4171.
- Humphreys, I.R. Deep Learning and Coevolution Reveal Proteome-Wide Protein-Protein Interactions. Ph.D. Thesis, University of Washington, Seattle, WA, USA, 2024.
- Du, X.; Sun, S.; Hu, C.; Yao, Y.; Yan, Y.; Zhang, Y. DeepPPI: Boosting prediction of protein–protein interactions with deep neural networks. J. Chem. Inf. Model. 2017, 57, 1499–1510.
- Wang, X.; Yan, R.; Chen, Y.-Z.; Wang, Y. Computational identification of ubiquitination sites in Arabidopsis thaliana using convolutional neural networks. Plant Mol. Biol. 2021, 105, 601–610.
- Din, N.M.U.; Assad, A.; Dar, R.A.; Rasool, M.; Sabha, S.U.; Majeed, T.; Islam, Z.U.; Gulzar, W.; Yaseen, A. RiceNet: A deep convolutional neural network approach for classification of rice varieties. Expert Syst. Appl. 2024, 235, 121214.
- Wang, L.; Wang, H.-F.; Liu, S.-R.; Yan, X.; Song, K.-J. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci. Rep. 2019, 9, 9848.
- Pan, J.; Li, L.-P.; You, Z.-H.; Yu, C.-Q.; Ren, Z.-H.; Guan, Y.-J. Prediction of protein–protein interactions in Arabidopsis, maize, and rice by combining deep neural network with discrete hilbert transform. Front. Genet. 2021, 12, 745228.
- Pan, J.; You, Z.-H.; Li, L.-P.; Huang, W.-Z.; Guo, J.-X.; Yu, C.-Q.; Wang, L.-P.; Zhao, Z.-Y. Dwppi: A deep learning approach for predicting protein–protein interactions in plants based on multi-source information with a large-scale biological network. Front. Bioeng. Biotechnol. 2022, 10, 807522.
- Zhang, K.; Wang, C.; Sun, L.; Zheng, J. Prediction of gene co-expression from chromatin contacts with graph attention network. Bioinformatics 2022, 38, 4457–4465.
- Kumar, R.; Acharya, V. Deep learning based protocol to construct an immune-related gene network of host-pathogen interactions in plants. STAR Protoc. 2023, 4, 101934.
- Zhou, K.; Lei, C.; Zheng, J.; Huang, Y.; Zhang, Z. Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein–protein interactions. Plant Methods 2023, 19, 141.
- Chen, W.; Wang, S.; Song, T.; Li, X.; Han, P.; Gao, C. DCSE: Double-Channel-Siamese-Ensemble model for protein protein interaction prediction. BMC Genom. 2022, 23, 555.
- Chen, S.; Zhang, H.; Gao, S.; He, K.; Yu, T.; Gao, S.; Wang, J.; Li, H. Unveiling Salt Tolerance Mechanisms in Plants: Integrating the KANMB Machine Learning Model With Metabolomic and Transcriptomic Analysis. Adv. Sci. 2025, 2417560.
- Pradhan, U.K.; Mahapatra, A.; Naha, S.; Gupta, A.; Parsad, R.; Gahlaut, V.; Rath, S.N.; Meher, P.K. ASPTF: A computational tool to predict abiotic stress-responsive transcription factors in plants by employing machine learning algorithms. Biochim. Biophys. Acta (BBA)-Gen. Subj. 2024, 1868, 130597.
- Liu, X.; Xia, D.; Luo, J.; Li, M.; Chen, L.; Chen, Y.; Huang, J.; Li, Y.; Xu, H.; Yuan, Y.; et al. Global Protein Interactome Mapping in Rice Using Barcode-Indexed PCR Coupled with HiFi Long-Read Sequencing. Adv. Sci. 2025, 12, e2416243.
- Smet, D.; Opdebeeck, H.; Vandepoele, K. Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice. Front. Plant Sci. 2023, 14, 1212073.


| Data Source | Description | Data Coverage | Key Insights | References |
|---|---|---|---|---|
| STRING | A database of known and predicted protein–protein interactions, primarily derived from experimental data, computational methods, and text mining. | Limited coverage for rice compared to model organisms | Provides a solid ground truth for known PPIs in various species. Offers a global perspective on protein interactions. | [45] |
| BioGRID | A comprehensive database of biologically relevant PPIs for multiple species, including rice. | Limited for rice but includes experimentally validated data | Contains experimentally validated PPIs and is useful for high-quality, ground-truth data. | [46] |
| RicePPINet | A rice-specific PPI database compiled by manually curating data from published studies. | Over 8000 rice-specific interactions | Focused on rice, offers insights into the rice-specific interactome and its biological relevance. | [47] |
| Arabidopsis Homology | Inferred interactions from Arabidopsis that are conserved in rice based on evolutionary relationships between species. | 40% of Arabidopsis PPIs detected in rice | Helps expand the rice PPI dataset through homology, especially in conserved pathways like ABA signaling. | [38] |
| AlphaFold Predictions | AlphaFold’s protein structure predictions for nearly the entire rice proteome. | Nearly complete rice proteome | Predicts potential binding interfaces and protein structures that assist in identifying PPIs. Useful for uncovering interactions in drought-responsive complexes. | [39] |
| RiceFREND | A co-expression network resource for rice, integrating transcriptomic data to identify potential functional linkages. | Focused on gene expression relationships | Provides functional context by linking co-expressed genes that may interact with each other. | [40] |
| Proteomic Datasets (MS) | Mass spectrometry-derived proteomic data that reveal direct evidence of protein interactions. | Varies, includes condition-specific interactions | Useful for identifying condition-specific PPIs, such as those during pathogen infection or stress responses. | [41] |
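
The homology-based row above rests on interolog transfer: if two Arabidopsis proteins interact and each has a rice ortholog, the rice pair becomes a candidate interaction. A minimal sketch of that mapping is given below; the function name, the input shapes, and the identifiers in the usage lines are hypothetical placeholders, and the ortholog map is assumed to come from an external orthology analysis.

```python
def transfer_interologs(source_ppis, orthologs):
    """Map source-species PPIs onto a target species through orthologs.

    source_ppis: iterable of (protein_a, protein_b) pairs in the source species.
    orthologs:   dict mapping a source protein ID to a set of target protein IDs.
    Returns a set of unordered target pairs, stored as sorted tuples.
    """
    predicted = set()
    for a, b in source_ppis:
        # A pair is transferred only when both partners have orthologs.
        for target_a in orthologs.get(a, ()):
            for target_b in orthologs.get(b, ()):
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


# Placeholder identifiers, for illustration only.
arabidopsis_ppis = [("At_A", "At_B"), ("At_B", "At_C")]
ortholog_map = {"At_A": {"Os_1"}, "At_B": {"Os_2", "Os_3"}}
candidates = transfer_interologs(arabidopsis_ppis, ortholog_map)
```

Pairs obtained this way are predictions rather than observations, so they are usually treated as lower-confidence than experimentally validated entries.
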
| ML Model | Database | No. of PPIs (pos/neg) | Features | Performance | Limitations | Unique Aspects | References |
|---|---|---|---|---|---|---|---|
| k-NN | NCBI, UniProt | ~8000 | AAC and dipeptide | ACC: 90%, AUC: 0.9 | k-NN is a simple model and does not capture complex patterns | Predicts rice blast disease-resistant versus susceptible genes from their encoded protein sequences | [74] |
| SVM | Custom rice–M. grisea PPI set (interolog/domain inferred) | 59,430 (pos only; negatives sampled) | AAC, CT | ACC ≈ 89% (CT features: 89%; AAC features: 88%) | Large predicted set but few true positives for training; limited experimental validation | Integrates interolog-, domain-, GO-, and phylogeny-based models to generate dataset; applies ML on host–pathogen PPIs | [83] |
| Rotation Forest | PRIN (predicted Rice Interactome Network) | 9600 (4800/4800) | PSSM + Discrete Hilbert Transform (DHT) | ACC 94.24%, MCC 0.8914 | Random negative pairs may include true PPIs; ignores structure/GOs | First to apply Discrete Hilbert Transform on PSSM for rice PPI prediction; achieves high ACC with only sequence data | [84] |
| Rotation Forest (ensemble of RFs) | Plant PPI sets (Arabidopsis, maize, rice; from DIPOS/PRIN) | 9600 (4800/4800) | PSSM + local optimal-oriented pattern (LOOP) | ACC 94.02% (RF: 90.90%, SVM: 88.95%) | Balanced dataset but possible false negatives; not cross-species tested | Novel use of LOOP descriptor on PSSM with Rotation Forest; high AUC (≈0.96) | [56] |
| Rotation Forest | PRIN, agriGO | ~9600 (various) | PSSM + Discrete Sine Transform (DST) | ACC 88.82% (rice) | Lower accuracy on rice vs. other plants; depends on dimensionality reduction (SVD) | Introduced DST on PSSM for plant PPIs; shows efficacy of signal-processing features | [85] |
| Rotation Forest | PRIN | ~9600 (various) | PSSM + Fast Walsh–Hadamard Transform (FWHT) | ACC 94.42% (rice) | Computationally intensive FWHT; relies on high-quality PSSMs | Applies FWHT to extract features from PSSM for ensemble classification; very high accuracy on rice data | [86] |
| RF | HPID | 2018 (structure-matched pairs) | Structural docking scores, compatibility | N/A (focus on network discovery) | Relies on availability of structural templates; no standard ML metrics reported | First 3D-structure-based PPI predictor for rice–pathogen; built an RF classifier on docking features | [27] |
| SVM | Predicted Rice–M. grisea PPIs (interolog/domain) | 532 (pos only; negatives sampled) | AAC, CT (sequence composition) | Jackknife ACC 93.85% | Very small dataset (532); no independent test set beyond 22 pairs; potential overfitting | Combined interolog and domain inference to generate positive PPIs, then SVM to classify; enriched predicted network with pathogen effectors | [87] |
| Gray BPNN | Not specified | 1356 | AAC (sequence feature) | ACC = 92.78% | Difficulty handling large-scale datasets | Demonstrated the feasibility of a neural network for rice PPI prediction at low computational cost | [88] |
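
Several of the models above rely on amino acid composition (AAC) and dipeptide composition as sequence features. The following is a minimal Python sketch of how such features could be computed, not the implementation used in any of the cited studies. The function names and the choice to encode a protein pair by concatenating the two per-protein vectors are illustrative assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols (e.g. X) are ignored
    return [c / total if total else 0.0 for c in counts]

def dipeptide_composition(seq):
    """Fraction of each of the 400 ordered residue pairs among overlapping dipeptides."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # skip pairs containing non-standard symbols
            counts[dipeptide] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in pairs]

def pair_features(seq_a, seq_b):
    """Assumed pair encoding: concatenate AAC + dipeptide vectors of both proteins."""
    return (aac(seq_a) + dipeptide_composition(seq_a)
            + aac(seq_b) + dipeptide_composition(seq_b))
```

Under these assumptions each protein pair is described by 2 × (20 + 400) = 840 values, which can be passed to a classifier such as k-NN or SVM.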

| DL Model(s) | Database | No. of PPIs | Features | Performance | Limitations | Unique Aspects | References |
|---|---|---|---|---|---|---|---|
| Ensemble of Siamese RCNN (sequence), Domain2vec MLP, GO2vec MLP + logistic regression | Arabidopsis PPI dataset (BioGRID) with curated negatives | Not specified | Sequence embeddings (word2vec → RCNN); domain embedding (domain2vec); GO term embedding (GO2vec) | Cross-species (Arabidopsis → rice) AUC not reported here; described as better than ML methods, although overall performance still needs improvement | Requires high-quality GO/domain annotations; cross-species performance still limited | Multi-view ensemble (sequence + domain + GO); Siamese RCNN captures pairwise sequence interaction; provides web server for Arabidopsis → rice PPI prediction | [44] |
| Pre-trained Transformer (ESM-1b) + MLP (ESMAraPPI) | Arabidopsis PPI (BioGRID) with strict train/test splits | Not specified | Protein language model embeddings (ESM-1b) for each protein sequence | AUPR ~0.810 on strict independent set (no rice test reported) | Focused on Arabidopsis only; requires large pretrained model; no cross-species evaluation | First use of large pre-trained protein transformer (ESM-1b) for plant PPI; shows strong extrapolation (unseen proteins); outperforms other pLMs and baselines | [98] |
| DeepWalk graph embedding + 4-mer word2vec + DNN classifier (DWPPI) | PRIN (rice) and PPIM (maize) databases | Rice: 103,028 (positives) | Sequence (word2vec on 4-mer tokens) + network-behavior (DeepWalk embedding of PPI graph) | Rice AUC ≈ 0.9213 | Requires existing large PPI network for embedding; performance may drop on sparse networks | Multi-source fusion (sequence + network) for plant PPIs; large-scale (100K+ PPIs) datasets; case studies validated top predictions against literature | [95] |
| DCSE: Siamese CNN + BiGRU ensemble (double-channel) | Human PPI (STRING/HPRD) | ~30,000 | NLP-based sequence encoding (skip-gram) + CNN, BiGRU | Acc 93.0%, Precision 90.9%, Recall 94.5%, F1 92.7%, MCC 0.860 | Human-specific training; no plant evaluation; uses large one-hot embeddings | Novel siamese-ensemble architecture (parallel CNN and CNN + BiGRU); robust to imbalanced data | [99] |
| DLNet | STRING, PRIN, IntAct | ~20,000 | Sequence similarity | Precision: 90%, Recall: 84% | High false positives | Uses both the forest model and graph-embedded deep-forward network (GEDFN) | [53] |
| DNN combined with Discrete Hilbert Transform (DHT) | PRIN Rice PPI set (4800 pos, 4800 neg) | 4800 | PSSM (via PSI-BLAST) + DHT of PSSM (followed by SVD) | Rice AUC ≈ 0.9440 (Acc 82.6%, F1 ≈ 85%, MCC 0.676) | Relies on sequence alignments (PSI-BLAST) for PSSMs; uses negative sampling (non-validated negatives); only sequence features | Innovative use of DHT on PSSM to capture evolutionary info; strong cross-plant evaluation | [94] |
| DeepPPI: Fully connected DNN | Yeast PPI (DIP) positives vs. sampled negatives | ~6600 | PSSM (evolutionary profile) + other sequence descriptors | Accuracy ≈ 65.8% (vs. 64.2% by SVM) | Moderate accuracy; only yeast data; no plant test | One of the first DNN models for PPI; showed slight improvement over SVM | [90] |
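
Some of the sequence-embedding approaches above (for example, word2vec on 4-mer tokens) first split each protein sequence into short k-mer "words". The sketch below shows one way this tokenization could be done. The function name, the default overlapping window (`stride=1`), and the example sequence are assumptions for illustration and are not taken from the cited studies.

```python
def kmer_tokens(seq, k=4, stride=1):
    """Split a protein sequence into k-mer tokens.

    stride=1 gives overlapping k-mers; stride=k gives non-overlapping ones.
    Trailing residues that do not fill a complete k-mer are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

# Hypothetical example sequence, tokenized into overlapping 4-mers.
tokens = kmer_tokens("MKVLAAG")
```

The resulting token lists can then serve as "sentences" for a word-embedding model, with each protein represented, for instance, by the average of its token vectors.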

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Merumba, S.B.; Ahmed, H.O.; Fu, D.; Yang, P. Recent Advances and Application of Machine Learning for Protein–Protein Interaction Prediction in Rice: Challenges and Future Perspectives. Proteomes 2025, 13, 54. https://doi.org/10.3390/proteomes13040054

