Protein Language Models Expose Viral Immune Mimicry
Abstract
1. Introduction
2. Materials and Methods
2.1. Protein Datasets
2.2. Pretrained Deep Language Models (ESM, T5)
2.3. Human-Virus Model Training and Implementation
2.4. Finding and Analyzing Model Mistakes
2.5. Dimensionality Reduction of Features
2.6. Model Performance
- (i) Precision = TP/(TP + FP)
- (ii) Accuracy = (TP + TN)/(TP + TN + FP + FN)
- (iii) Recall = TP/(TP + FN)
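The three formulas above can be sketched in Python. This is a minimal illustration, not the authors' code. The `Confusion` class and `tally` helper are hypothetical names. Treating label 1 as the positive class (here, arbitrarily, "virus") is also an assumption, because this section does not say which class the paper counts as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; which class is "positive" is a labeling choice."""

    tp: int
    tn: int
    fp: int
    fn: int

    def precision(self) -> float:
        # TP / (TP + FP): share of predicted positives that are correct.
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def accuracy(self) -> float:
        # (TP + TN) / (TP + TN + FP + FN): share of all predictions that are correct.
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def recall(self) -> float:
        # TP / (TP + FN): share of actual positives that are recovered.
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0


def tally(y_true, y_pred, positive=1) -> Confusion:
    """Count TP, TN, FP and FN from paired true/predicted labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp=tp, tn=tn, fp=fp, fn=fn)


# Hypothetical toy labels: 1 = virus protein, 0 = human protein.
cm = tally([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 1, 0, 1])
print(cm, cm.precision(), cm.accuracy(), cm.recall())
# precision 2/3, accuracy 0.625, recall 0.5
```

Returning 0.0 when a denominator is empty is a convention chosen for this sketch. Other tools may warn or return NaN instead.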
2.7. Immunogenicity Datasets and Scores
3. Results
3.1. Human Virus Models
3.2. Insights from Error Analysis Models
3.3. Analysis of Viral Errors
3.4. Clustering of Latent Structure Embeddings
3.5. Immunogenicity Analysis
3.6. V4H Mistakes Expose Traces of Host Sequences Within Viruses
3.7. Among the V4H Mistakes Are Proteins That Support Immune Escape Through Mimicry
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AUC | area under the curve |
FN | false negative |
FP | false positive |
GBT | gradient boosting decision trees |
LoRA | low-rank adaptation |
LR | logistic regression |
MHC | major histocompatibility complex |
ML | machine learning |
PLM | protein language model |
ROC | receiver operating characteristic |
RT | reverse transcriptase |
TN | true negative |
TP | true positive |
UKB | UK Biobank |
Model ᵃ | AUC (%) | Accuracy (%) | Precision (%) | Recall (%) | Log-Loss |
---|---|---|---|---|---|
BL Length | 61.97 | 78.5 | 78.5 | 78.5 | 0.52 |
AA n-grams | 91.5 | 88.49 | 88.49 | 88.49 | 0.28 |
ESM2 8M | 98.09 | 94.72 | 92.15 | 92.33 | 0.2 |
ESM2 35M | 98.69 | 95.83 | 93.81 | 93.92 | 0.18 |
ESM2 150M | 99.26 | 96.99 | 95.54 | 95.48 | 0.12 |
ESM2 650M | 99.67 | 97.86 | 96.85 | 96.68 | 0.09 |
Linear-T5 | 99.56 | 97.57 | 97.57 | 97.57 | 0.06 |
Tree-T5 | 99.65 | 97.7 | 97.7 | 97.7 | 0.06 |
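The table above also reports AUC and log-loss, which the formulas in Section 2.6 do not define. The following pure-Python sketch shows both for reference. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting one half. Log-loss is the mean binary cross-entropy using natural logarithms. The function names and the clipping constant `eps` are assumptions of this sketch. In practice, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.log_loss` compute these quantities, but this section does not state which implementation the authors used.

```python
import math


def roc_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half (Mann-Whitney U form). Quadratic in the number of
    examples, so this is for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy (natural log) of predicted P(class 1)."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) cannot occur
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


y = [1, 1, 0, 0]
p = [0.9, 0.3, 0.4, 0.2]
print(roc_auc(y, p))  # 0.75: the 0.3 positive is outscored by the 0.4 negative
print(log_loss(y, p))
```

Lower log-loss is better. A predictor that always outputs 0.5 scores ln 2 ≈ 0.69, which gives a rough point of comparison for the Log-Loss column.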
Features | Mistake Rate (%) | Number of Proteins | Lift ᵃ |
---|---|---|---|
“Adaptive immune” KW | 60.5 | 46 | 15.5 |
Endogenous retrovirus | 30 | 40 | 7.7 |
Oncogene KW | 19.3 | 393 | 4.9 |
Sequence length <170 | 12.1 | 4539 | 3.1 |
Virus | 9.4 | 6699 | 2.4 |
Name “putative” | 8.7 | 1050 | 2.2 |
Few KW (<8) | 8.8 | 3326 | 2.2 |
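The footnote that defined lift did not survive extraction. The reported numbers are consistent with lift being a group's mistake rate divided by a single dataset-wide mistake rate. Dividing each row's mistake rate by its lift gives roughly 3.9% for every row (for example, 60.5 / 15.5 ≈ 3.9). The sketch below encodes that reading, which is an inference and not the authors' stated definition. `mistake_rate`, `lift` and `FEATURE_TABLE` are hypothetical names.

```python
def mistake_rate(errors: int, total: int) -> float:
    """Fraction of proteins in a group that the classifier misclassified."""
    if total <= 0:
        raise ValueError("a group must contain at least one protein")
    return errors / total


def lift(group_errors: int, group_total: int, ref_errors: int, ref_total: int) -> float:
    """Group mistake rate divided by the reference (e.g., whole-dataset) mistake rate."""
    ref_rate = mistake_rate(ref_errors, ref_total)
    if ref_rate == 0:
        raise ValueError("reference set has no mistakes, so lift is undefined")
    return mistake_rate(group_errors, group_total) / ref_rate


# (mistake rate %, lift) pairs copied from the feature table above.
FEATURE_TABLE = {
    '"Adaptive immune" KW': (60.5, 15.5),
    "Endogenous retrovirus": (30.0, 7.7),
    "Oncogene KW": (19.3, 4.9),
    "Sequence length <170": (12.1, 3.1),
    "Virus": (9.4, 2.4),
    'Name "putative"': (8.7, 2.2),
    "Few KW (<8)": (8.8, 2.2),
}

# If lift = group rate / reference rate, then rate / lift recovers the reference rate.
implied = {name: rate / lft for name, (rate, lft) in FEATURE_TABLE.items()}
for name, ref in implied.items():
    print(f"{name:<25} implied reference mistake rate ~ {ref:.2f}%")
```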
Baltimore Class | Genome | No. of Families | Representative Species ᵃ | Mistake Rate (%) | No. of Proteins | Lift ᵇ |
---|---|---|---|---|---|---|
VII | dsDNA-RT | 1 | HBV-C | 34.2 | 108 | 3.6 |
VI | ssRNA-RT | 1 | FeLV | 19.5 | 666 | 2 |
II | ssDNA | 3 | HPV B19 | 13.1 | 129 | 1.3 |
I | dsDNA | 13 | HHV-4 | 8.2 | 4421 | 0.8 |
IV, V | ssRNA | 28 | VSIV | 8.1 | 1017 | 0.8 |
III | dsRNA | 4 | RV-B | 0.8 | 358 | 0.1 |
Viral Family | Baltimore Class | Main Disease ᵃ | Mistake Rate (%) | Support (No. of Proteins) | Lift ᵇ |
---|---|---|---|---|---|
Hepeviridae | IV | Hepatitis | 44.4 | 9 | 4.7 |
Hepadnaviridae | VII | Hepatitis | 34.3 | 108 | 3.6 |
Circoviridae | II | CNS infection | 33.3 | 27 | 3.5 |
Polyomaviridae | I | Cancer | 30.7 | 62 | 3.2 |
Picornaviridae | IV | Nose/Throat | 28.6 | 7 | 3.0 |
Retroviridae | VI | Cancer/AIDS | 19.5 | 666 | 2.1 |
Polydnaviriformidae | I | N.A. | 18.4 | 49 | 1.9 |
Arteriviridae | IV | N.A. | 18.2 | 22 | 1.9 |
Papillomaviridae | I | Cancer | 14.2 | 520 | 1.5 |
Caliciviridae | IV | Intestines | 13.8 | 29 | 1.5 |
Paramyxoviridae | V | Mumps | 12.9 | 124 | 1.4 |
Anelloviridae | II | Immune Suppression | 11.5 | 52 | 1.2 |
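The viral-family table appears to report lift against a different reference than the feature table. Dividing mistake rate by lift gives about 9.2% to 9.7% for every family row, which matches the 9.4% viral mistake rate in the feature table. The same holds within rounding for the Baltimore-class table (e.g., 34.2 / 3.6 ≈ 9.5). For example, 34.3% of the 108 Hepadnaviridae proteins is about 37 misclassified proteins, and 34.3 / 9.4 ≈ 3.6, the reported lift. The Hepeviridae row (44.4%, lift 4.7) rests on only 4 of 9 proteins, so its rate is much less stable than the Retroviridae row with 666 proteins.

Below is a sketch of the per-group aggregation. It assumes one (group, misclassified) record per viral protein. `group_lifts` and the toy families are hypothetical.

```python
from collections import defaultdict


def group_lifts(records):
    """Mistake rate, support and lift for each group.

    records: iterable of (group, misclassified) pairs, one per protein.
    Lift is computed against the pooled mistake rate of all records passed in.
    Returns (baseline_rate, rows sorted by lift, highest first).
    """
    support = defaultdict(int)
    errors = defaultdict(int)
    for group, wrong in records:
        support[group] += 1
        errors[group] += int(bool(wrong))
    total = sum(support.values())
    if total == 0:
        raise ValueError("no records")
    baseline = sum(errors.values()) / total
    if baseline == 0:
        raise ValueError("no misclassifications, so lift is undefined")
    rows = []
    for group, n in support.items():
        rate = errors[group] / n
        rows.append({"group": group, "mistake_rate_pct": 100 * rate,
                     "support": n, "lift": rate / baseline})
    rows.sort(key=lambda r: r["lift"], reverse=True)
    return baseline, rows


# Hypothetical toy data: 10 proteins from two made-up families.
toy = [("FamA", True)] * 2 + [("FamA", False)] * 2 + [("FamB", True)] + [("FamB", False)] * 5
baseline, rows = group_lifts(toy)
print(f"baseline {baseline:.0%}")
for r in rows:
    print(r)
```

Pooling only the viral proteins gives the viral reference rate. Passing human and viral proteins together would instead give the dataset-wide reference that the feature table's lifts imply.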
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ofer, D.; Linial, M. Protein Language Models Expose Viral Immune Mimicry. Viruses 2025, 17, 1199. https://doi.org/10.3390/v17091199