Interrelational Proteomic Sequence Features Enhance Predictive Modeling: Application to COVID-19 Severity
Abstract
1. Introduction
2. Materials and Methods
2.1. Implementation and Usage
2.1.1. Protein Features
- Sequences: Basic statistical properties of amino acid chains, such as average length or length variance. For MSA, this category computes statistics on the matches in the alignment.
- Amino-acid Types: Percentages of the different chemical classes of amino acids composing the chain (e.g., uncharged, nonpolar aliphatic, positively charged).
- Domains: Properties related to functional domains as defined by Pfam [40]. Both Pfam domains and their corresponding clans are considered.
- Secondary Structure: Percentage of amino acids located in each type of protein secondary structure (α-helix, β-strand, hydrogen-bonded turn). This information is retrieved from UniProt [41].
- Tertiary Structure: Properties regarding annotated structural data according to the Protein Data Bank (PDB) [42]. Among others, structural contacts are computed here. A contact is counted when any pair of atoms from two different amino acids, separated by more than five residues in the sequence, is close enough that no solvent molecule can occupy the space between them (a similar definition is used in [43]).
- Ontological Terms: Features related to shared terms in the three ontologies defined by Gene Ontology (GO) [44]: biological process (BP), molecular function (MF), and cellular component (CC). Metrics in this category cannot be computed for MSA: because GO terms are not associated with a specific location along the sequences, alignment metrics cannot be derived from them.
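The contact definition given for the tertiary-structure features can be illustrated with a minimal sketch. It is not the tool's implementation: the input layout (a list of residues with atom coordinates), the function name `count_contacts`, and above all the numeric distance `CUTOFF` are assumptions made for illustration, since the text defines the threshold only qualitatively (no solvent molecule fits between the atoms).

```python
from itertools import combinations
from math import dist

# Hypothetical input layout: each residue is (sequence_index, [(x, y, z), ...])
# with atom coordinates in angstroms.
# CUTOFF is an illustrative placeholder, not a value taken from the text.
CUTOFF = 4.0
MIN_SEPARATION = 5  # residues must be MORE than five positions apart


def count_contacts(residues, cutoff=CUTOFF, min_separation=MIN_SEPARATION):
    """Count residue pairs that have at least one atom pair closer than `cutoff`."""
    contacts = 0
    for (i, atoms_i), (j, atoms_j) in combinations(residues, 2):
        if abs(i - j) <= min_separation:
            continue  # too close along the chain to count as a contact
        if any(dist(a, b) < cutoff for a in atoms_i for b in atoms_j):
            contacts += 1
    return contacts


# Toy coordinates, invented purely to exercise the function.
toy_residues = [
    (1, [(0.0, 0.0, 0.0)]),
    (3, [(1.0, 0.0, 0.0)]),
    (10, [(0.0, 0.0, 3.0)]),
    (20, [(50.0, 0.0, 0.0)]),
]
print(count_contacts(toy_residues))
```

Residues 1 and 3 are spatially adjacent but are skipped because they are only two positions apart in the sequence; the sequence-separation filter is what keeps trivially local neighbours out of the count.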
2.1.2. Web Interface
2.1.3. Apache Web Server
2.1.4. Metrics Calculation (Perl Module)
2.1.5. Protein Feature Database
- UniProt_Feats: This table includes all annotation details retrieved from UniProt [41]. It also serves as the central table, linking each protein to its corresponding entries in the Pfam and PDB tables, which helps integrate both structural and functional information.
- Pfam_Feats: This table contains information on functional domains obtained from the Pfam database [40]. It includes domain names, their positions along the protein sequence, and their associated families or clans.
- PDB_Feats: This table stores the tertiary structure information provided by the Protein Data Bank [42], such as chain identifiers, the sequence regions to which they map, and several additional descriptors commonly used to describe protein structure.
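The relationship between the three tables can be sketched with an in-memory SQLite database. Only the table names come from the description above; every column name, the choice of SQLite, the helper `shared_clans`, and the sample rows (`PROT_A`, `PROT_B`, `ClanA`, and so on) are hypothetical placeholders, and the actual schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Central table: one row per protein (column names are assumptions).
    CREATE TABLE UniProt_Feats (
        accession TEXT PRIMARY KEY,
        sequence  TEXT
    );
    -- Functional domains, linked back to the central table.
    CREATE TABLE Pfam_Feats (
        accession   TEXT REFERENCES UniProt_Feats(accession),
        domain_name TEXT,
        clan        TEXT,
        start_pos   INTEGER,
        end_pos     INTEGER
    );
    -- Tertiary-structure entries, also linked to the central table.
    CREATE TABLE PDB_Feats (
        accession TEXT REFERENCES UniProt_Feats(accession),
        pdb_id    TEXT,
        chain_id  TEXT,
        start_pos INTEGER,
        end_pos   INTEGER
    );
    """
)

# Made-up rows, used only to exercise the query below.
conn.executemany(
    "INSERT INTO UniProt_Feats VALUES (?, ?)",
    [("PROT_A", "MSTNQ"), ("PROT_B", "MKRAV")],
)
conn.executemany(
    "INSERT INTO Pfam_Feats VALUES (?, ?, ?, ?, ?)",
    [
        ("PROT_A", "DomX", "ClanA", 1, 3),
        ("PROT_A", "DomY", "ClanB", 4, 5),
        ("PROT_B", "DomZ", "ClanA", 2, 4),
        ("PROT_B", "DomW", None, 5, 5),  # domain without a clan
    ],
)


def shared_clans(conn, acc_a, acc_b):
    """Count distinct clans annotated on both proteins."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT a.clan)
        FROM Pfam_Feats AS a
        JOIN Pfam_Feats AS b ON a.clan = b.clan
        WHERE a.accession = ? AND b.accession = ?
        """,
        (acc_a, acc_b),
    ).fetchone()
    return row[0]


print(shared_clans(conn, "PROT_A", "PROT_B"))
```

Keying both satellite tables on the same protein identifier is what would let a pairwise feature, such as the number of shared clans, be answered with a single join.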
2.2. Case Study: COVID-19 Severity Classification
2.2.1. Data Acquisition
2.2.2. Differential Expression Analysis
2.2.3. Retrieval of Canonical Proteins
2.2.4. INPROF Feature Extraction
2.2.5. Feature Selection
2.2.6. Classification Model
3. Results
- Percentage of polar uncharged amino acids in protein sequences (SEQ_PL).
- Number of Pfam clans shared between each protein pair (SEQ_CK).
- Variance of the length in the protein sequences (SEQ_VA).
- Maximum length in the protein sequences (SEQ_MX).
- Number of Pfam domains shared between each protein pair (SEQ_CT).
- Percentage of amino acids in protein sequences with atypical or unknown secondary structures (SEQ_SU).
- Minimum length in protein sequences (SEQ_MN).
- Percentage of amino acids in protein sequences that are included in any Pfam clan (SEQ_PC).
- Percentage of matches in the alignment between pairs of amino acids included in β-strand secondary structures (MSA_TD).
- Percentage of gaps in the alignment, a measure of how similar the protein sequences are (MSA_GP).
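Several of the listed features are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the tool's code: the members of the polar uncharged class (S, T, C, N, Q) follow a common textbook grouping that the text does not spell out, the variance is taken as the population variance, the percentage is pooled over all residues rather than averaged per sequence, and the function names are hypothetical.

```python
from statistics import pvariance

# Assumed class membership; the text names the class but not its members.
POLAR_UNCHARGED = set("STCNQ")


def sequence_features(sequences):
    """Length statistics and polar-uncharged percentage for a set of sequences."""
    lengths = [len(s) for s in sequences]
    polar = sum(1 for s in sequences for aa in s if aa in POLAR_UNCHARGED)
    return {
        "SEQ_MN": min(lengths),
        "SEQ_MX": max(lengths),
        "SEQ_VA": pvariance(lengths),  # population variance (assumption)
        "SEQ_PL": 100.0 * polar / sum(lengths),  # pooled over all residues
    }


def gap_percentage(alignment):
    """MSA_GP-style value: share of '-' cells among all alignment cells."""
    cells = sum(len(row) for row in alignment)
    gaps = sum(row.count("-") for row in alignment)
    return 100.0 * gaps / cells


# Toy inputs invented for the example.
feats = sequence_features(["STAAAA", "NQ"])
print(feats)
print(gap_percentage(["AC-T", "A--T"]))
```

A high gap percentage means the aligner had to insert many gaps to line the sequences up, which is why it can serve as a rough inverse indicator of sequence similarity.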
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bendel, A.M.; Faure, A.J.; Klein, D.; Shimada, K.; Lyautey, R.; Schiffelholz, N.; Kempf, G.; Cavadini, S.; Lehner, B.; Diss, G. The genetic architecture of protein interaction affinity and specificity. Nat. Commun. 2024, 15, 8868.
- Baryshev, A.; La Fleur, A.; Groves, B.; Michel, C.; Baker, D.; Ljubetič, A.; Seelig, G. Massively parallel measurement of protein–protein interactions by sequencing using MP3-seq. Nat. Chem. Biol. 2024, 20, 1514–1523.
- Vander Meersche, Y.; Cretin, G.; Gheeraert, A.; Gelly, J.C.; Galochkina, T. ATLAS: Protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Res. 2024, 52, D384–D392.
- Hon, J.; Marusiak, M.; Martinek, T.; Kunka, A.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of Soluble Protein Expression in Escherichia coli. Bioinformatics 2021, 37, 23–28.
- Margelevicius, M. GTalign: Spatial index-driven protein structure alignment, superposition, and search. Nat. Commun. 2024, 15, 7305.
- Orlando, G.; Raimondi, D.; Kagami, L.P.; Vranken, W.F. ShiftCrypt: A web server to understand and biophysically align proteins through their NMR chemical-shift values. Nucleic Acids Res. 2020, 48, W36–W40.
- Penev, P.I.; McCann, H.M.; Meade, C.D.; Alvarez-Carreño, C.; Maddala, A.; Bernier, C.R.; Chivukula, V.L.; Ahmad, M.; Gulen, B.; Sharma, A.; et al. ProteoVision: Web server for advanced visualization of ribosomal proteins. Nucleic Acids Res. 2021, 49, W578–W588.
- Rozewicki, J.; Li, S.; Amada, K.M.; Standley, D.M.; Katoh, K. MAFFT-DASH: Integrated protein sequence and structural alignment. Nucleic Acids Res. 2019, 47, W5–W10.
- Gu, X.; Myung, Y.; Rodrigues, C.H.M.; Ascher, D.B. EFG-CS: Predicting chemical shifts from amino acid sequences with protein structure prediction using machine learning and deep learning models. Protein Sci. 2024, 33, e5096.
- Tian, H.; Jiang, X.; Tao, P. PASSer: Prediction of allosteric sites server. Mach. Learn. Sci. Technol. 2021, 2, 035015.
- McGuffin, L.J.; Adiyaman, R.; Maghrabi, A.H.; Shuid, A.N.; Brackenridge, D.A.; Nealon, J.O.; Philomina, L.S. IntFOLD: An integrated web resource for high performance protein structure and function prediction. Nucleic Acids Res. 2019, 47, W408–W413.
- Waterhouse, A.; Bertoni, M.; Bienert, S.; Studer, G.; Tauriello, G.; Gumienny, R.; Heer, F.; Beer, T.; Rempfer, C.; Bordoli, L.; et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 2018, 46, W296–W303.
- Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
- Guevara-Barrientos, D.; Kaundal, R. ProFeatX: A parallelized protein feature extraction suite for machine learning. Comput. Struct. Biotechnol. J. 2022, 21, 796–801.
- Manfredi, M.; Vazzana, G.; Savojardo, C.; Martelli, P.L.; Casadio, R. AlphaFold2 and ESMFold: A large-scale pairwise model comparison of human enzymes upon Pfam functional annotation. Comput. Struct. Biotechnol. J. 2025, 27, 461–466.
- Kolberg, L.; Raudvere, U.; Kuzmin, I.; Adler, P.; Vilo, J.; Peterson, H. g:Profiler—Interoperable web service for functional enrichment analysis and gene identifier mapping (2023 update). Nucleic Acids Res. 2023, 51, W207–W212.
- Ge, S.X.; Jung, D.; Yao, R. ShinyGO: A graphical gene-set enrichment tool for animals and plants. Bioinformatics 2020, 36, 2628–2629.
- Zhou, Y.; Zhou, B.; Pache, L.; Chang, M.; Khodabakhshi, A.H.; Tanaseichuk, O.; Benner, C.; Chanda, S.K. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 2019, 10, 1523.
- Hu, B.; Tan, C.; Xu, Y.; Gao, Z.; Xia, J.; Wu, L.; Li, S.Z. ProtGO: Function-guided protein modeling for unified representation learning. In Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS 2024), Vancouver, BC, Canada, 9–15 December 2024; Neural Information Processing Systems Foundation, Inc.: Vancouver, BC, Canada, 2024.
- Zhang, Y.; Lang, M.; Jiang, J.; Gao, Z.; Xu, F.; Litfin, T.; Chen, K.; Singh, J.; Huang, X.; Song, G.; et al. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res. 2024, 52, e3.
- Zhang, C.; Wang, Q.; Li, Y.; Teng, A.; Hu, G.; Wuyun, Q.; Zheng, W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024, 14, 1531.
- Javed, F.; Ahmed, J.; Hayat, M. ML-RBF: Predict protein subcellular locations in a multi-label system using evolutionary features. Chemom. Intell. Lab. Syst. 2020, 203, 104055.
- Kapli, P.; Yang, Z.; Telford, M.J. Phylogenetic tree building in the genomic age. Nat. Rev. Genet. 2020, 21, 428–444.
- Zou, Y.; Zhang, Z.; Zeng, Y.; Hu, H.; Hao, Y.; Huang, S.; Li, B. Common Methods for Phylogenetic Tree Construction and Their Implementation in R. Bioengineering 2024, 11, 480.
- Yugandhar, K.; Gupta, S.; Yu, H. Inferring protein–protein interaction networks from mass spectrometry-based proteomic approaches: A mini-review. Comput. Struct. Biotechnol. J. 2019, 17, 805–811.
- Liu, Z.; Qian, W.; Cai, W.; Song, W.; Wang, W.; Maharjan, D.T.; Cheng, W.; Chen, J.; Wang, H.; Xu, D.; et al. Inferring the effects of protein variants on protein–protein interactions with interpretable transformer representations. Research 2023, 6, 0219.
- Ortuño, F.M.; Valenzuela, O.; Pomares, H.; Rojas, F.; Florido, J.P.; Urquiza, J.M.; Rojas, I. Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques. Nucleic Acids Res. 2013, 41, e26.
- Ortuño, F.M.; Valenzuela, O.; Prieto, B.; Saez-Lara, M.J.; Torres, C.; Pomares, H.; Rojas, I. Comparing different machine learning and mathematical regression models to evaluate multiple sequence alignments. Neurocomputing 2015, 164, 123–136.
- Hu, J.; Li, Z.; Rao, B.; Thafar, M.A.; Arif, M. Improving protein–protein interaction prediction using protein language model and protein network features. Anal. Biochem. 2024, 693, 115550.
- Mutti, G.; Ocaña-Pallarès, E.; Gabaldón, T. Newly developed structure-based methods do not outperform standard sequence-based methods for large-scale phylogenomics. Mol. Biol. Evol. 2025, 42, msaf149.
- Kroll, A.; Niebuhr, N.; Butler, G.; Lercher, M.J. SPOT: A machine learning model that predicts specific substrates for transport proteins. PLoS Biol. 2024, 22, e3002807.
- Li, W.; Shen, J.; Zhuang, A.; Wang, R.; Li, Q.; Rabata, A.; Zhang, Y.; Cao, D. Palmitoylation: An emerging therapeutic target bridging physiology and disease. Cell. Mol. Biol. Lett. 2025, 30, 98.
- Castro-Pearson, S.; Samorodnitsky, S.; Yang, K.; Lotfi-Emran, S.; Ingraham, N.E.; Bramante, C.; Jones, E.K.; Greising, S.; Yu, M.; Steffen, B.T.; et al. Development of a proteomic signature associated with severe disease for patients with COVID-19 using data from five multicenter, randomized, controlled, and prospective studies. Sci. Rep. 2023, 13, 20315.
- Redondo-Calvo, F.J.; Rabanal-Ruíz, Y.; Verdugo-Moreno, G.; Bejarano, N.; Bodoque-Villar, R.; Durán-Prado, M.; Illescas, S.; Chicano-Gálvez, E.; Gómez-Romero, F.J.; Martínez-Alarcón, J.; et al. Longitudinal assessment of nasopharyngeal biomarkers post-COVID-19: Unveiling persistent markers and severity correlations. J. Proteome Res. 2024, 23, 5064–5084.
- Harriott, N.C.; Ryan, A.L. Proteomic profiling identifies biomarkers of COVID-19 severity. Heliyon 2024, 10, e23320.
- Gabarre, P.; Dumas, G.; Dupont, T.; Darmon, M.; Azoulay, E.; Zafrani, L. Acute kidney injury in critically ill patients with COVID-19. Intensive Care Med. 2020, 46, 1339–1348.
- Quiroga, S.A.; Hernández, C.; Castañeda, S.; Jimenez, P.; Vega, L.; Gomez, M.; Martinez, D.; Ballesteros, N.; Muñoz, M.; Cifuentes, C.; et al. Contrasting SARS-CoV-2 RNA copies and clinical symptoms in a large cohort of Colombian patients during the first wave of the COVID-19 pandemic. Ann. Clin. Microbiol. Antimicrob. 2021, 20, 39.
- Choudhary, S.; Sreenivasulu, K.; Mitra, P.; Misra, S.; Sharma, P. Role of genetic variants and gene expression in the susceptibility and severity of COVID-19. Ann. Lab. Med. 2021, 41, 129–138.
- Bajo-Morales, J.; Castillo-Secilla, D.; Herrera, L.J.; Caba, O.; Prados, J.C.; Rojas, I. Predicting COVID-19 Severity Integrating RNA-Seq Data Using Machine Learning Techniques. Curr. Bioinform. 2023, 18, 221–231.
- Paysan-Lafosse, T.; Andreeva, A.; Blum, M.; Chuguransky, S.R.; Grego, T.; Pinto, B.L.; Salazar, G.A.; Bileschi, M.L.; Llinares-López, F.; Meng-Papaxanthos, L.; et al. The Pfam protein families database: Embracing AI/ML. Nucleic Acids Res. 2025, 53, D523–D534.
- UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 2025, 53, D609–D617.
- Burley, S.K.; Piehl, D.W.; Vallat, B.; Zardecki, C. RCSB Protein Data Bank: Supporting research and education worldwide through explorations of experimentally determined and computationally predicted atomic level 3D biostructures. IUCrJ 2024, 11, 279–286.
- Kagaya, Y.; Flannery, S.T.; Jain, A.; Kihara, D. ContactPFP: Protein function prediction using predicted contact information. Front. Bioinform. 2022, 2, 896295.
- The Gene Ontology Consortium; Aleksander, S.A.; Balhoff, J.; Carbon, S.; Cherry, J.M.; Drabkin, H.J.; Ebert, D.; Feuermann, M.; Gaudet, P.; Harris, N.L.; et al. The Gene Ontology knowledgebase in 2023. Genetics 2023, 224, iyad031.
- Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; et al. Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23, 2947–2948.
- Notredame, C.; Higgins, D.G.; Heringa, J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000, 302, 205–217.
- Edgar, R.C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32, 1792–1797.
- Clough, E.; Barrett, T.; Wilhite, S.E.; Ledoux, P.; Evangelista, C.; Kim, I.F.; Tomashevsky, M.; Marshall, K.A.; Phillippy, K.H.; Sherman, P.M.; et al. NCBI GEO: Archive for gene expression and epigenomics data sets: 23-year update. Nucleic Acids Res. 2024, 52, D138–D144.
- Mick, E.; Kamm, J.; Pisco, A.O.; Ratnasiri, K.; Babik, J.M.; Castañeda, G.; DeRisi, J.L.; Detweiler, A.M.; Hao, S.L.; Kangelaris, K.N.; et al. Upper airway gene expression reveals suppressed immune responses to SARS-CoV-2 compared with other respiratory viruses. Nat. Commun. 2020, 11, 5854.
- Lieberman, N.A.P.; Peddu, V.; Xie, H.; Shrestha, L.; Huang, M.L.; Mears, M.C.; Cajimat, M.N.; Bente, D.A.; Shi, P.Y.; Bovier, F.; et al. In vivo antiviral host transcriptional response to SARS-CoV-2 by viral load, sex, and age. PLoS Biol. 2020, 18, e3000849.
- Jain, R.; Ramaswamy, S.; Harilal, D.; Uddin, M.; Loney, T.; Nowotny, N.; Alsuwaidi, H.; Varghese, R.; Deesi, Z.; Alkhajeh, A.; et al. Host transcriptomic profiling of COVID-19 patients with mild, moderate, and severe clinical outcomes. Comput. Struct. Biotechnol. J. 2020, 19, 153–160.
- Castillo-Secilla, D.; Gálvez, J.M.; Carrillo-Perez, F.; Verona-Almeida, M.; Redondo-Sánchez, D.; Ortuño, F.M.; Herrera, L.J.; Rojas, I. KnowSeq R-Bioc package: The automatic smart gene expression tool for retrieving relevant biological knowledge. Comput. Biol. Med. 2021, 133, 104387.
- Dallah, D.; Sulieman, H.; Zaatreh, A.A.; Kamalov, F. Empirical evaluation of the relative range for detecting outliers. Entropy 2025, 27, 731.
- Lazar, C.; Meganck, S.; Taminau, J.; Steenhoff, D.; Coletta, A.; Molter, C.; Weiss-Solís, D.Y.; Duque, R.; Bersini, H.; Nowé, A. Batch effect removal methods for microarray gene expression data integration: A survey. Brief. Bioinform. 2013, 14, 469–490.
- Howe, K.L.; Achuthan, P.; Allen, J.; Allen, J.; Alvarez-Jarreta, J.; Amode, M.R.; Armean, I.M.; Azov, A.G.; Bennett, R.; Bhai, J.; et al. Ensembl 2021. Nucleic Acids Res. 2021, 49, D884–D891.
- Skrodzki, M.; van Geffen, H.; Chaves-de-Plaza, N.F.; Höllt, T.; Eisemann, E.; Hildebrandt, K. Accelerating hyperbolic t-SNE. IEEE Trans. Vis. Comput. Graph. 2024, 30, 4403–4415.
- McInnes, L.; Healy, J.; Saul, N.; Großberger, L. UMAP: Uniform manifold approximation and projection. J. Open Source Softw. 2018, 3, 861.
- Krijthe, J.H. Rtsne: T-Distributed Stochastic Neighbor Embedding Using Barnes–Hut Implementation; R package, version 0.17; 2015. Available online: https://github.com/jkrijthe/Rtsne (accessed on 12 November 2025).
- De Jay, N.; Papillon-Cavanagh, S.; Olsen, C.; El-Hachem, N.; Bontempi, G.; Haibe-Kains, B. mRMRe: An R package for parallelized Minimum Redundancy Maximum Relevance feature selection. Bioinformatics 2013, 29, 2365–2368.
- Mittal, M.; Gujjar, P.J.; Guru Prasad, M.S.; Devadas, R.M.; Ambreen, L.; Kumar, V. Dimensionality reduction using UMAP and t-SNE technique. In Proceedings of the Second International Conference on Advances in Information Technology (ICAIT 2024), Chikkamagaluru, India, 24–27 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–5.
- Vitting-Seerup, K. Most protein domains exist as variants with distinct functions across cells, tissues and diseases. NAR Genom. Bioinform. 2023, 5, lqad084.
- Zhou, Y.; Liu, Y.; Gupta, S.; Paramo, M.I.; Hou, Y.; Mao, C.; Luo, Y.; Judd, J.; Wierbowski, S.; Bertolotti, M.; et al. A comprehensive SARS-CoV-2–human protein–protein interactome reveals COVID-19 pathobiology and potential host therapeutic targets. Nat. Biotechnol. 2023, 41, 128–139.
- Savojardo, C.; Babbi, G.; Martelli, P.L.; Casadio, R. Mapping OMIM disease-related variations on protein domains reveals an association among variation type, Pfam models, and disease classes. Front. Mol. Biosci. 2021, 8, 617016.
- Kim, D.K.; Weller, B.; Lin, C.W.; Sheykhkarimli, D.; Knapp, J.J.; Dugied, G.; Zanzoni, A.; Pons, C.; Tofaute, M.J.; Maseko, S.B.; et al. A proteome-scale map of the SARS-CoV-2–human contactome. Nat. Biotechnol. 2023, 41, 140–149.
- Dyakov, I.N.; Mavletova, D.A.; Chernyshova, I.N.; Snegireva, N.A.; Gavrilova, M.V.; Bushkova, K.K.; Dyachkova, M.S.; Alekseeva, M.G.; Danilenko, V.N. FN3 protein fragment containing two type III fibronectin domains from B. longum GT15 binds to human tumor necrosis factor alpha in vitro. Anaerobe 2020, 65, 102247.
- Walls, A.C.; Park, Y.J.; Tortorici, M.A.; Wall, A.; McGuire, A.T.; Veesler, D. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 2020, 183, 1735.
- Arai, M.; Suetaka, S.; Ooka, K. Dynamics and interactions of intrinsically disordered proteins. Curr. Opin. Struct. Biol. 2024, 84, 102734.
- Shah, P.S.; Beesabathuni, N.S.; Fishburn, A.T.; Kenaston, M.W.; Minami, S.A.; Pham, O.H.; Tucker, I. Systems biology of virus–host protein interactions: From hypothesis generation to mechanisms of replication and pathogenesis. Annu. Rev. Virol. 2022, 9, 397–415.






| Feature Category | Sequence-Based | MSA-Based | Total |
|---|---|---|---|
| Sequence Statistics | 5 | 3 | 8 |
| Amino Acid Type | 5 | 5 | 10 |
| Functional Domains | 6 | 2 | 8 |
| Secondary Structure | 4 | 5 | 9 |
| Tertiary Structure | 4 | 2 | 6 |
| Ontological Terms | 5 | 0 | 5 |
| Total | 29 | 17 | 46 |

| Split | # | Accuracy | Sensitivity | Specificity | F1-Score |
|---|---|---|---|---|---|
| Training (80%), 5-fold CV | 2 | 0.937 ± 0.021 | 0.761 ± 0.181 | 0.893 ± 0.085 | 0.515 ± 0.471 |
| Training (80%), 5-fold CV | 8 | 0.929 ± 0.052 | 0.789 ± 0.168 | 0.904 ± 0.073 | 0.677 ± 0.383 |
| Training (80%), 5-fold CV | 10 | 0.936 ± 0.054 | 0.855 ± 0.049 | 0.926 ± 0.029 | 0.854 ± 0.060 |
| Training (80%), 5-fold CV | 14 | 0.976 ± 0.022 | 0.900 ± 0.091 | 0.954 ± 0.050 | 0.928 ± 0.065 |
| Test (20%) | 2 | 0.910 | 0.654 | 0.878 | 0.648 |
| Test (20%) | 8 | 0.879 | 0.642 | 0.867 | 0.642 |
| Test (20%) | 10 | 0.879 | 0.642 | 0.867 | 0.642 |
| Test (20%) | 14 | 0.910 | 0.654 | 0.878 | 0.648 |
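
The four reported metrics can all be derived from a confusion matrix, as in the following sketch. The tables do not state how the metrics were aggregated across severity classes, so the one-vs-rest macro averaging used here is an assumption, and the function name `macro_metrics` and the toy matrix are hypothetical.

```python
def macro_metrics(confusion):
    """Accuracy plus macro-averaged sensitivity, specificity, and F1.

    confusion[i][j] = number of samples of true class i predicted as class j.
    Averaging one-vs-rest values over classes is an assumption of this sketch.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total

    sens, spec, f1 = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / n,
        "specificity": sum(spec) / n,
        "f1": sum(f1) / n,
    }


# Toy two-class matrix, invented for the example.
scores = macro_metrics([[8, 2], [1, 9]])
print(scores)
```

In a cross-validation setting, a function like this would be applied once per fold, with the mean and standard deviation over folds giving values in the "mean ± SD" form shown for the training split.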
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
El-Awadi, R.; Gomez, O.D.; Castillo-Secilla, D.; Torres, C.; Herrera, L.J.; Rojas, I.; Ortuño, F.M. Interrelational Proteomic Sequence Features Enhance Predictive Modeling: Application to COVID-19 Severity. Biomedicines 2026, 14, 378. https://doi.org/10.3390/biomedicines14020378

