MAKA-Map: Real-Valued Distance Prediction for Protein Folding Mechanisms via a Hybrid Neural Framework Integrating the Mamba and Kolmogorov–Arnold Networks
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
2.2. Feature Selection
2.2.1. MSA-Derived Feature Representation
- Co-evolutionary signals: Computed with CCMpred [22] to capture the co-evolutionary coupling strength between all residue pairs in a multiple sequence alignment (MSA), reflecting their potential correlated mutations over evolutionary time;
- FreeContact feature: Residue-residue interaction strength estimates produced by FreeContact [23], a contact prediction tool based on co-evolutionary information, which aid in modeling possible physical contacts;
- Position-specific scoring matrix (PSSM): Derived from the MSA, it scores how likely each amino acid is to occur at each position, capturing evolutionary conservation.
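As a rough illustration of the PSSM idea in the list above, the following sketch derives per-position amino acid frequencies from a toy MSA and converts them to log-odds scores. It is not the pipeline used in this work: the function names, the pseudocount, and the uniform background distribution are all assumptions made for the example.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def position_frequencies(msa, pseudocount=1.0):
    """Return an (L, 20) matrix of per-column amino acid frequencies."""
    length = len(msa[0])
    # The pseudocount (an assumption) keeps unseen residues at non-zero frequency.
    counts = np.full((length, len(AMINO_ACIDS)), pseudocount, dtype=float)
    for seq in msa:
        for pos, aa in enumerate(seq):
            idx = AA_INDEX.get(aa)  # gaps and unknown symbols are skipped
            if idx is not None:
                counts[pos, idx] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def log_odds_profile(freqs, background=None):
    """PSSM-style log-odds scores against a background distribution."""
    if background is None:
        # Uniform background is an assumption; real tools use empirical frequencies.
        background = np.full(freqs.shape[1], 1.0 / freqs.shape[1])
    return np.log2(freqs / background)


toy_msa = ["ACDA", "ACEA", "AC-A", "GCDA"]  # hypothetical aligned sequences
freqs = position_frequencies(toy_msa)
pssm = log_odds_profile(freqs)
print(pssm.shape)  # (4, 20): one row per alignment column
```

A fully conserved column, such as the second column of the toy alignment, receives a positive score for the conserved residue and negative scores for all others.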
2.2.2. Feature-Space Independence Analysis
2.3. Architecture of MAKA-Map
2.4. Model Training and Prediction Process
2.5. Performance Evaluation
2.6. Experimental Environment
3. Results
3.1. Comparison with Existing Real-Value Distance Structural Prediction Methods
- Preserving residue-pair consistency throughout the encoding process to prevent structural information loss typically caused by conventional convolutional operations;
- Incorporating the Mamba module to enhance medium- and long-range residue interaction modeling, which is particularly beneficial for structurally complex targets;
- Integrating the Kolmogorov–Arnold Network (KAN) to improve local structural modeling through adaptive nonlinear transformations, thereby enhancing the representation of complex spatial topologies.
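To make the phrase "adaptive nonlinear transformations" more concrete, here is a minimal forward-pass sketch of a KAN-style layer, in which every input-output edge carries its own learnable univariate function and each output sums those functions over the inputs. This is not the MAKA-Map implementation: the class name and hyperparameters are hypothetical, and Gaussian radial basis functions are swapped in for the B-splines commonly used in KANs to keep the example short.

```python
import numpy as np


class RBFKANLayer:
    """KAN-style layer: out[j] = sum_i phi_ij(x[i]), with phi_ij a learnable 1-D function.

    Each phi_ij is a weighted sum of fixed Gaussian bumps (an RBF stand-in for B-splines).
    """

    def __init__(self, in_dim, out_dim, num_basis=8, x_min=-2.0, x_max=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = np.linspace(x_min, x_max, num_basis)
        self.width = (x_max - x_min) / (num_basis - 1)
        # One coefficient vector per (input, output) edge.
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, num_basis))

    def forward(self, x):
        # x: (batch, in_dim) -> basis activations: (batch, in_dim, num_basis)
        basis = np.exp(-(((x[..., None] - self.centers) / self.width) ** 2))
        # Sum over inputs i and basis functions k for every output j.
        return np.einsum("bik,ijk->bj", basis, self.coef)


layer = RBFKANLayer(in_dim=3, out_dim=2)
x = np.linspace(-1.0, 1.0, 6).reshape(2, 3)
y = layer.forward(x)
print(y.shape)  # (2, 2)
```

In a trainable version the coefficients would be optimized by gradient descent, which is what lets each edge adapt its own nonlinearity to the data.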
3.2. Analysis of the Effect of Long-Range Residues on Prediction
3.3. Training Stability Under Five-Fold Cross-Validation
3.4. Ablation Experiments
- CFP combination: comprises three classical feature types: co-evolutionary coupling scores (CCMpred), contact probability predictions (FreeContact), and position-specific scoring matrices (PSSM);
- AF combination: extends the CFP features with contact potentials and Shannon entropy, for a total of five feature types.
- Baseline (Base): a residual backbone network without additional modules;
- KBL: the Kolmogorov–Arnold Network (KAN) module integrated on top of the Baseline;
- MBL: the Mamba module added to the Baseline architecture;
- MAKA: the full model integrating both KAN and Mamba modules.
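The two feature combinations and four architecture variants listed above define a grid of eight ablation settings. The sketch below simply enumerates that grid; the dictionary layout and the `kan`/`mamba` flag names are assumptions made for illustration, not the authors' configuration format.

```python
from itertools import product

# Feature combinations as described in the text.
FEATURE_SETS = {
    "CFP": ["CCMpred", "FreeContact", "PSSM"],
    "AF": ["CCMpred", "FreeContact", "PSSM", "contact potentials", "Shannon entropy"],
}

# Architecture variants; the flag names are hypothetical.
ARCHITECTURES = {
    "Base": {"kan": False, "mamba": False},
    "KBL": {"kan": True, "mamba": False},
    "MBL": {"kan": False, "mamba": True},
    "MAKA": {"kan": True, "mamba": True},
}


def ablation_grid():
    """Return one configuration dict per (feature set, architecture) pair."""
    grid = []
    for (feat_name, feats), (arch_name, flags) in product(
        FEATURE_SETS.items(), ARCHITECTURES.items()
    ):
        grid.append({"name": f"{feat_name}+{arch_name}", "features": feats, **flags})
    return grid


configs = ablation_grid()
for cfg in configs:
    print(cfg["name"], len(cfg["features"]), cfg["kan"], cfg["mamba"])
```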
3.5. Structural Relevance of MAKA-Map Predictions on CASP Targets
3.5.1. Qualitative Analysis of Predicted Distance Maps on Representative CASP Targets
- Sequence length and scale diversity. The three targets span noticeably different sequence lengths, including a short protein (T1084), a medium-length protein (T1078), and a longer protein (T1041). Variations in protein length are known to influence contact map sparsity, the relative abundance of long-range interactions, and the overall difficulty of distance prediction. Considering multiple length scales therefore helps to assess the general applicability of the proposed method.
- Diversity in contact map topology. The selected proteins exhibit clearly different contact map characteristics. In T1084, most contacts are concentrated near the main diagonal, which is typical for shorter or more regularly folded proteins dominated by local interactions. T1078, on the other hand, shows more evident block-like and repetitive off-diagonal patterns, suggesting more complex interactions between distant sequence segments or multiple structural units. As a longer protein, T1041 presents a comparatively sparser contact map with a higher proportion of long-range and cross-segment interactions, a pattern commonly observed in proteins with more complex structural organization. Together, these targets cover a range of contact distributions from predominantly local to strongly long-range interactions.
- Variation in prediction difficulty. As illustrated in Figure 9, the complexity of off-diagonal regions and the proportion of long-range contacts differ markedly among the three targets. For instance, T1041 contains more extensive interaction regions far from the diagonal, which generally corresponds to a more challenging prediction task. Consistent performance across such varying levels of difficulty provides evidence for the stability of the proposed approach.
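The notions of near-diagonal versus long-range contacts used above can be quantified directly from a distance matrix. The sketch below thresholds distances into a contact map and counts contacts by sequence separation. The 8 Å cutoff and the 6-11 / 12-23 / ≥24 separation bins follow a common convention in contact prediction and are assumptions here, as are the function names and the toy data.

```python
import numpy as np


def contact_map(dist, threshold=8.0):
    """Boolean contact map: True where the inter-residue distance is below threshold (in Å)."""
    return dist < threshold


def contact_range_counts(contacts):
    """Count contacts in the upper triangle, binned by sequence separation |i - j|."""
    n = contacts.shape[0]
    i, j = np.triu_indices(n, k=1)
    sep = j - i
    is_contact = contacts[i, j]
    return {
        "short (6-11)": int(np.sum(is_contact & (sep >= 6) & (sep < 12))),
        "medium (12-23)": int(np.sum(is_contact & (sep >= 12) & (sep < 24))),
        "long (>=24)": int(np.sum(is_contact & (sep >= 24))),
    }


# Toy example: 30 residues on a straight line, 3.8 Å apart (no non-local contacts),
# plus one artificial long-range contact between residues 0 and 29.
n = 30
idx = np.arange(n)
dist = 3.8 * np.abs(idx[:, None] - idx[None, :]).astype(float)
dist[0, 29] = dist[29, 0] = 5.0

counts = contact_range_counts(contact_map(dist))
print(counts)
```

A target dominated by local interactions would show most of its contacts outside the long-range bin, whereas a target such as the longer protein discussed above would be expected to show a larger long-range share.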
3.5.2. Three-Dimensional Structure Reconstruction Using MAKA-Map Predicted Distances
3.6. Comparison with AlphaFold3
3.7. Effect of MSA Depth on Prediction Reliability
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Wu, H.; Liu, J.; Jiang, T.; Zou, Q.; Qi, S.; Cui, Z.; Tiwari, P.; Ding, Y. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism. Neural Netw. 2024, 169, 623–636.
2. Zhou, C.; Li, Z.; Song, J.; Xiang, W. TransVAE-DTA: Transformer and variational autoencoder network for drug-target binding affinity prediction. Comput. Methods Programs Biomed. 2024, 244, 108003.
3. Jiang, M.; Li, Z.; Zhang, S.; Wang, S.; Wang, X.; Yuan, Q.; Wei, Z. Drug-target affinity prediction using graph neural network and contact maps. RSC Adv. 2020, 10, 20701–20712.
4. Shah, P.M.; Zhu, H.; Lu, Z.; Wang, K.; Tang, J.; Li, M. DeepDTAGen: A multitask deep learning framework for drug-target affinity prediction and target-aware drugs generation. Nat. Commun. 2025, 16, 5021.
5. Wang, J.; Xiao, Y.; Shang, X.; Peng, J. Predicting drug-target binding affinity with cross-scale graph contrastive learning. Briefings Bioinform. 2024, 25, bbad516.
6. Kumar, R.; Romano, J.D.; Ritchie, M.D. CASTER-DTA: Equivariant graph neural networks for predicting drug-target affinity. Briefings Bioinform. 2025, 26, bbaf554.
7. Berman, H.M.; Battistuz, T.; Bhat, T.N.; Bluhm, W.F.; Bourne, P.E.; Burkhardt, K.; Feng, Z.; Gilliland, G.L.; Iype, L.; Jain, S.; et al. The Protein Data Bank. Acta Crystallogr. Sect. D 2002, 58, 899–907.
8. Zhang, C.; Zheng, W.; Mortuza, S.M.; Li, Y.; Zhang, Y. DeepMSA: Constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics 2020, 36, 2105–2112.
9. Li, Y.; Hu, J.; Zhang, C.; Yu, D.J.; Zhang, Y. ResPRE: High-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2019, 35, 4647–4655.
10. Madani, M.; Behzadi, M.M.; Song, D.; Ilies, H.T.; Tarakanova, A. Improved inter-residue contact prediction via a hybrid generative model and dynamic loss function. Comput. Struct. Biotechnol. J. 2022, 20, 6138–6148.
11. Zhao, C.; Wang, S. AttCON: With better MSAs and attention mechanism for accurate protein contact map prediction. Comput. Biol. Med. 2024, 169, 107822.
12. Adhikari, B. A fully open-source framework for deep learning protein real-valued distances. Sci. Rep. 2020, 10, 13374.
13. Yang, J.; Anishchenko, I.; Park, H.; Peng, Z.; Ovchinnikov, S.; Baker, D. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl. Acad. Sci. USA 2020, 117, 1496–1503.
14. Rahman, J.; Newton, M.A.H.; Ben Islam, M.K.; Sattar, A. Enhancing protein inter-residue real distance prediction by scrutinising deep learning models. Sci. Rep. 2022, 12, 787.
15. Zhang, Y.; Zhong, S.; Xu, S.; Wang, Z.; Xin, C.; Ni, F.; Yan, F.; Lu, X.; Sun, S.; Wang, H.; et al. MF-ProtDisMap: Protein real-valued distance prediction with fusion of sequence and coevolutionary features. Int. J. Biol. Macromol. 2025, 328, 147637.
16. Liao, W.; Zhu, Y.; Wang, X.; Pan, C.; Wang, Y.; Ma, L. LightM-UNet: Mamba Assists in Lightweight UNet for Medical Image Segmentation. arXiv 2024, arXiv:2403.05246.
17. Li, C.; Liu, X.; Li, W.; Wang, C.; Liu, H.; Liu, Y.; Chen, Z.; Yuan, Y. U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation. Proc. AAAI Conf. Artif. Intell. 2025, 39, 4652–4660.
18. Jones, D.T.; Kandathil, S.M. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018, 34, 3308–3315.
19. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIII. Proteins Struct. Funct. Bioinform. 2019, 87, 1011–1020.
20. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins Struct. Funct. Bioinform. 2021, 89, 1607–1617.
21. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XV. Proteins Struct. Funct. Bioinform. 2023, 91, 1539–1549.
22. Seemayer, S.; Gruber, M.; Soeding, J. CCMpred-fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 2014, 30, 3128–3130.
23. Kajan, L.; Hopf, T.A.; Kalas, M.; Marks, D.S.; Rost, B. FreeContact: Fast and free software for protein contact prediction from residue co-evolution. BMC Bioinform. 2014, 15, 85.
24. Jones, D.T.; Singh, T.; Kosciolek, T.; Tetchner, S. MetaPSICOV: Combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 2015, 31, 999–1006.
25. Buchan, D.W.A.; Jones, D.T. Improved protein contact predictions with the MetaPSICOV2 server in CASP12. Proteins Struct. Funct. Bioinform. 2018, 86, 78–83.
26. Mariani, V.; Biasini, M.; Barbato, A.; Schwede, T. lDDT: A local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 2013, 29, 2722–2728.
27. Bhattacharya, S.; Bhattacharya, D. Evaluating the significance of contact maps in low-homology protein modeling using contact-assisted threading. Sci. Rep. 2020, 10, 2908.
28. Adhikari, B.; Shrestha, B.; Bernardini, M.; Hou, J.; Lea, J. DISTEVAL: A web server for evaluating predicted protein distances. BMC Bioinform. 2021, 22, 8.
29. Bernardini, M. DISTFOLD: Distance-guided Protein Folding. Master’s Thesis, University of Missouri, St. Louis, MO, USA, 2021.
30. Cheng, J.; Randall, A.; Sweredoski, M.; Baldi, P. SCRATCH: A protein structure and structural feature prediction server. Nucleic Acids Res. 2005, 33, W72–W76.
31. Brünger, A.T.; Adams, P.D.; Clore, G.M.; DeLano, W.L.; Gros, P.; Grosse-Kunstleve, R.W.; Jiang, J.S.; Kuszewski, J.; Nilges, M.; Pannu, N.S.; et al. Crystallography & NMR System: A New Software Suite for Macromolecular Structure Determination. Acta Crystallogr. Sect. D 1998, 54, 905–921.
32. Brunger, A.T. Version 1.2 of the Crystallography and NMR system. Nat. Protoc. 2007, 2, 2728–2733.
33. Sathyapriya, R.; Duarte, J.M.; Stehr, H.; Filippis, I.; Lappe, M. Defining an Essence of Structure Determining Residue Contacts in Proteins. PLoS Comput. Biol. 2009, 5, e1000584.
34. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
35. Haghani, M.; Bhattacharya, D.; Murali, T.M. NEFFy: A Versatile Tool for Computing the Number of Effective Sequences. Bioinformatics 2025, btaf222.

| Dataset | Predictor | Precision (L/5) | Precision (L/2) | Precision (L) | MAE (L/5) | MAE (L/2) | MAE (L) | MCC |
|---|---|---|---|---|---|---|---|---|
| CASP13 | trRosetta | 0.7221 | 0.5802 | 0.4533 | 3.3817 | 3.7976 | 4.4342 | 0.3899 |
| CASP13 | pdnet | 0.7537 | 0.6117 | 0.4860 | 2.1573 | 2.7109 | 3.2811 | 0.4080 |
| CASP13 | MF-ProtDisMap | 0.8185 | 0.7285 | 0.5665 | 1.5932 | 1.9652 | 2.3629 | 0.4653 |
| CASP13 | sdp | 0.7831 | 0.6861 | 0.5642 | 1.6960 | 1.9302 | 2.2350 | 0.4419 |
| CASP13 | MAKA | 0.8653 | 0.7601 | 0.5899 | 1.3689 | 1.6656 | 2.1019 | 0.5262 |
| CASP14 | trRosetta | 0.7271 | 0.6026 | 0.4945 | 2.9052 | 3.2271 | 3.7977 | 0.3734 |
| CASP14 | pdnet | 0.7981 | 0.6962 | 0.5693 | 2.1241 | 2.5884 | 2.7834 | 0.3997 |
| CASP14 | MF-ProtDisMap | 0.8077 | 0.7153 | 0.6190 | 1.6984 | 1.8987 | 2.4531 | 0.4814 |
| CASP14 | sdp | 0.8342 | 0.7386 | 0.6131 | 1.5891 | 1.9932 | 2.1275 | 0.4972 |
| CASP14 | MAKA | 0.8544 | 0.7525 | 0.6393 | 1.5812 | 1.8603 | 1.9755 | 0.5197 |
| CASP15 | trRosetta | 0.7788 | 0.6718 | 0.5179 | 3.2162 | 3.3895 | 3.6178 | 0.3735 |
| CASP15 | pdnet | 0.7895 | 0.6558 | 0.4958 | 2.2984 | 2.6933 | 2.9189 | 0.3790 |
| CASP15 | MF-ProtDisMap | 0.8079 | 0.6979 | 0.5815 | 2.4698 | 2.5203 | 2.7987 | 0.5012 |
| CASP15 | sdp | 0.8452 | 0.7403 | 0.5839 | 2.3012 | 2.4961 | 2.6157 | 0.5305 |
| CASP15 | MAKA | 0.8277 | 0.7424 | 0.5935 | 2.2186 | 2.4938 | 2.6145 | 0.5250 |

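
The L/5, L/2, and L columns in the table above refer to evaluating only the top-ranked residue pairs, where L is the sequence length. One common way to compute such scores from a predicted real-valued distance map is sketched below. The exact protocol used for the reported numbers is not reproduced here: the 8 Å contact threshold, the minimum sequence separation of 24, the choice to compute MAE over the same top-ranked pairs, and the function names are all assumptions.

```python
import numpy as np


def top_pairs(pred, k_div, min_sep=24):
    """Indices of the L // k_div pairs with the smallest predicted distance and |i - j| >= min_sep."""
    n = pred.shape[0]
    i, j = np.triu_indices(n, k=min_sep)  # upper-triangle pairs with j - i >= min_sep
    order = np.argsort(pred[i, j], kind="stable")[: max(1, n // k_div)]
    return i[order], j[order]


def precision_and_mae(pred, true, k_div, min_sep=24, threshold=8.0):
    """Top-L/k precision (true distance below threshold) and MAE over the same pairs."""
    i, j = top_pairs(pred, k_div, min_sep)
    precision = float(np.mean(true[i, j] < threshold))
    mae = float(np.mean(np.abs(pred[i, j] - true[i, j])))
    return precision, mae


# Toy data: 50 residues, all pairs 20 Å apart except two long-range contacts;
# every prediction is off by exactly 1 Å.
n = 50
true = np.full((n, n), 20.0)
pred = true + 1.0
true[0, 30] = true[30, 0] = 5.0
pred[0, 30] = pred[30, 0] = 6.0
true[1, 40] = true[40, 1] = 6.0
pred[1, 40] = pred[40, 1] = 5.0

print(precision_and_mae(pred, true, k_div=5))   # top L/5 = 10 pairs
print(precision_and_mae(pred, true, k_div=25))  # top L/25 = 2 pairs
```

In the toy data only two of the ten top-ranked L/5 pairs are true contacts, so precision drops as more pairs are included, which mirrors the decline from L/5 to L seen in the table.
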
| Dataset | Predictor | Precision (L/5) | Precision (L/2) | Precision (L) | lDDT |
|---|---|---|---|---|---|
| CASP13 | trRosetta | 0.6083 | 0.5006 | 0.4013 | 0.4000 |
| CASP13 | pdnet | 0.6212 | 0.5543 | 0.4834 | 0.4058 |
| CASP13 | MF-ProtDisMap | 0.7458 | 0.6358 | 0.5263 | 0.4513 |
| CASP13 | sdp | 0.7211 | 0.6008 | 0.5069 | 0.3884 |
| CASP13 | MAKA | 0.7608 | 0.6775 | 0.5541 | 0.4836 |
| CASP14 | trRosetta | 0.6226 | 0.5018 | 0.4067 | 0.4329 |
| CASP14 | pdnet | 0.7326 | 0.5933 | 0.4496 | 0.4188 |
| CASP14 | MF-ProtDisMap | 0.7411 | 0.6077 | 0.4662 | 0.4243 |
| CASP14 | sdp | 0.7665 | 0.6249 | 0.4822 | 0.3686 |
| CASP14 | MAKA | 0.7750 | 0.6594 | 0.5168 | 0.4976 |
| CASP15 | trRosetta | 0.7272 | 0.5906 | 0.4522 | 0.4510 |
| CASP15 | pdnet | 0.7133 | 0.5663 | 0.4235 | 0.3620 |
| CASP15 | MF-ProtDisMap | 0.7369 | 0.5815 | 0.4588 | 0.4445 |
| CASP15 | sdp | 0.7988 | 0.6560 | 0.5005 | 0.3360 |
| CASP15 | MAKA | 0.7992 | 0.6671 | 0.5023 | 0.4544 |

| Dataset | Configuration | Precision | | | MAE | | | Pearson r | | |
|---|---|---|---|---|---|---|---|---|---|---|
| CASP14 | CFP+Base | 0.7889 | 0.6988 | 0.5677 | 2.0337 | 2.3759 | 2.7694 | 0.2049 | 0.2939 | 0.3424 |
| CASP14 | CFP+KBL | 0.8030 | 0.7071 | 0.5729 | 1.8083 | 2.5569 | 2.8268 | 0.2145 | 0.3217 | 0.3577 |
| CASP14 | CFP+MBL | 0.8146 | 0.7223 | 0.5816 | 1.9004 | 2.4208 | 2.6969 | 0.2208 | 0.3138 | 0.3423 |
| CASP14 | CFP+MAKA | 0.8235 | 0.7196 | 0.6127 | 1.7245 | 2.0277 | 2.1543 | 0.2201 | 0.3309 | 0.3777 |
| CASP14 | AF+Base | 0.8272 | 0.7232 | 0.5876 | 1.8649 | 2.1788 | 2.5398 | 0.2146 | 0.3076 | 0.3584 |
| CASP14 | AF+KBL | 0.8322 | 0.7318 | 0.5937 | 1.6590 | 2.3449 | 2.2516 | 0.2245 | 0.3369 | 0.3747 |
| CASP14 | AF+MBL | 0.8443 | 0.7492 | 0.6027 | 1.7435 | 2.2209 | 1.8742 | 0.2313 | 0.3283 | 0.3585 |
| CASP14 | AF+MAKA | 0.8544 | 0.7525 | 0.6393 | 1.5812 | 1.8603 | 1.9755 | 0.2305 | 0.3465 | 0.3954 |
| CASP15 | CFP+Base | 0.7500 | 0.6496 | 0.5150 | 4.0664 | 4.6219 | 4.9578 | 0.1487 | 0.2426 | 0.3707 |
| CASP15 | CFP+KBL | 0.7632 | 0.6824 | 0.5371 | 3.5064 | 3.5935 | 3.8642 | 0.1597 | 0.2759 | 0.3602 |
| CASP15 | CFP+MBL | 0.7925 | 0.6938 | 0.5564 | 3.7056 | 3.9139 | 3.5815 | 0.2276 | 0.3434 | 0.3777 |
| CASP15 | CFP+MAKA | 0.7988 | 0.7168 | 0.5727 | 2.4183 | 2.7182 | 2.8498 | 0.2587 | 0.3574 | 0.3936 |
| CASP15 | AF+Base | 0.7782 | 0.6732 | 0.5346 | 3.7306 | 4.2403 | 4.5475 | 0.1557 | 0.2538 | 0.3881 |
| CASP15 | AF+KBL | 0.7910 | 0.7082 | 0.5575 | 3.2169 | 3.2968 | 3.5451 | 0.1671 | 0.2889 | 0.3769 |
| CASP15 | AF+MBL | 0.8213 | 0.7190 | 0.5760 | 3.3996 | 3.5898 | 3.2858 | 0.2381 | 0.3596 | 0.3956 |
| CASP15 | AF+MAKA | 0.8277 | 0.7424 | 0.5935 | 2.2186 | 2.4938 | 2.6145 | 0.2708 | 0.3740 | 0.4121 |

| Test Set | Best TM-Score | Best RMSD (Å) | Mean TM-Score | Mean RMSD (Å) |
|---|---|---|---|---|
| CASP13 | 0.547 | 3.44 | 0.528 | 4.94 |
| CASP14 | 0.586 | 4.49 | 0.567 | 5.80 |
| CASP15 | 0.573 | 5.25 | 0.551 | 7.07 |

| Neff/L Range | #Targets | Precision L/5 | lDDT (Seq ≥ 24) |
|---|---|---|---|
| <1 | 8 | | |
| 1–2 | 11 | | |
| 2–4 | 7 | | |
| >4 | 17 | | |

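
The Neff/L ranges in the table above bin targets by the number of effective sequences in their MSA, normalized by sequence length. Several definitions of Neff are in use; the sketch below implements one widely used variant, in which each sequence is down-weighted by the number of sequences sharing at least 80% identity with it. The 80% threshold, the function name, and the toy alignment are assumptions, and this is not necessarily the definition used to build the table.

```python
import numpy as np


def neff(msa, identity_threshold=0.8):
    """Number of effective sequences: sum over sequences of 1 / (size of its identity cluster)."""
    arr = np.array([list(seq) for seq in msa])
    # Pairwise fraction of identical columns, shape (N, N); the diagonal is 1.0.
    identity = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)
    neighbors = (identity >= identity_threshold).sum(axis=1)  # includes the sequence itself
    return float(np.sum(1.0 / neighbors))


toy_msa = [
    "ACDEFGHIKL",
    "ACDEFGHIKL",  # identical to the first sequence
    "ACDEFGHIKV",  # 90% identical to the first two
    "MNPQRSTVWY",  # unrelated
]
value = neff(toy_msa)
length = len(toy_msa[0])
print(value, value / length)  # Neff and Neff/L
```

Here the three near-identical sequences together contribute one effective sequence and the unrelated one contributes another, so the toy alignment has Neff = 2 despite containing four sequences.
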
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Dong, B.; Hua, Y.; Hou, C.; Xu, D.; Wang, G. MAKA-Map: Real-Valued Distance Prediction for Protein Folding Mechanisms via a Hybrid Neural Framework Integrating the Mamba and Kolmogorov–Arnold Networks. Biomolecules 2026, 16, 194. https://doi.org/10.3390/biom16020194

