Overcoming the Curse of Dimensionality with Synolitic AI
Abstract
1. Introduction
- Robustness to heterogeneity. They must detect diverse and hidden patterns that may lead to the same clinical outcome, recognizing that compensatory mechanisms can produce similar disease manifestations via different pathways.
- Data efficiency. They must be able to learn from small sample sizes, given the scarcity and cost of clinical data.
- Interpretability. They must go beyond simple classification and offer insightful, explainable reasoning and, ideally, a test procedure to verify the conclusion, especially for tasks such as early diagnosis and risk stratification.
2. Materials and Methods
2.1. Datasets
- Binary class labels;
- No missing values;
- All features are numerical or binary;
- The number of samples exceeds the number of features.
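The four selection criteria above can be expressed as an automated check. The sketch below is illustrative, assuming the dataset arrives as a pandas DataFrame with a designated label column; the function name `satisfies_selection_criteria` is ours, not from the paper.

```python
import pandas as pd

def satisfies_selection_criteria(df: pd.DataFrame, label_col: str) -> bool:
    """Check a tabular dataset against the four selection criteria:
    binary class labels, no missing values, numeric/binary features,
    and more samples than features."""
    features = df.drop(columns=[label_col])
    binary_labels = df[label_col].nunique() == 2
    no_missing = not df.isna().any().any()
    all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in features.dtypes)
    enough_samples = len(df) > features.shape[1]
    return binary_labels and no_missing and all_numeric and enough_samples
```

A dataset failing any one criterion (e.g., a single missing value) would be rejected by this filter.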
2.2. Models
2.2.1. Pipeline Architecture
1. SGNN Graph Construction. We generate sample-specific graphs from selected tabular datasets using the SGNN methodology, which relies on ensembles of pairwise classifiers trained with class labels for each dataset and produces a unique graph structure for every data point [16].
2. GNN Training. We train two Graph Neural Networks—GCN and GATv2—on the resulting graphs to perform classification. For each model and task, we use training parameters selected individually via hyperparameter optimization with the Optuna framework [38].
3. Training Strategies. We evaluate two training regimes: (i) training on the concatenation of all datasets as a form of task-agnostic pretraining or foundation-model setting, and (ii) individual training on each dataset separately. Exploring these settings allows us to examine generalization across datasets.
4. Comparison with Classical Models. We compare our graph-based approaches with a classical XGBoost classifier [34] trained on the same datasets to assess the relevance and added value of GNNs in this classification context.
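The graph-construction step can be sketched as follows. This is a minimal illustration of the SGNN idea, not the authors' implementation: it fits a single logistic regression per feature pair (the paper uses ensembles of pairwise classifiers), and for clarity it predicts on the training samples themselves, whereas a leakage-free pipeline would use out-of-fold predictions. The function name `build_synolitic_graphs` is hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

def build_synolitic_graphs(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Sketch of SGNN graph construction: one pairwise classifier per
    feature pair; its predicted probability of the positive class for a
    sample becomes that sample's edge weight between the two feature
    nodes. Returns an array of shape (n_samples, n_features, n_features),
    i.e., one weighted adjacency matrix per data point."""
    n_samples, n_features = X.shape
    graphs = np.zeros((n_samples, n_features, n_features))
    for i, j in combinations(range(n_features), 2):
        clf = LogisticRegression().fit(X[:, [i, j]], y)
        w = clf.predict_proba(X[:, [i, j]])[:, 1]  # one weight per sample
        graphs[:, i, j] = graphs[:, j, i] = w
    return graphs
```

Because each edge weight is a class probability, weights near 0.5 carry little discriminative information — which is what the sparsification criterion in Section 2.2.3 exploits.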
2.2.2. Node Features
- $x_i$ is the original scalar feature of node $i$;
- $d_i$ is the normalized node degree, i.e., the normalized number of edges connected to this node: $d_i = k_i / (N - 1)$, where $k_i$ is the number of edges incident to node $i$ and $N$ is the number of nodes;
- $s_i$ is the normalized node strength: $s_i = \frac{1}{N-1} \sum_{j \neq i} w_{ij}$, where $w_{ij}$ is an edge weight between nodes $i$ and $j$;
- $c_i$ is the closeness centrality of node $i$, calculated as the reciprocal of the sum of the lengths of the shortest paths between the node and all other nodes in the graph;
- $b_i$ is the betweenness centrality of node $i$ computed with edge weights.
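The node feature vector can be assembled as in the sketch below, which computes the original feature, normalized degree, normalized strength, and closeness centrality in plain NumPy (betweenness centrality, which the paper also uses, is omitted here for brevity; it is typically computed with a library such as NetworkX). The function name and the convention that edge weights double as path lengths are our assumptions.

```python
import numpy as np

def node_features(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Per-node feature vectors for one sample graph.
    W: symmetric edge-weight matrix with zero diagonal (zeros = no edge).
    x: original scalar feature of each node.
    Returns an (n, 4) array: [x_i, degree d_i, strength s_i, closeness c_i]."""
    n = W.shape[0]
    degree = (W > 0).sum(axis=1) / (n - 1)    # normalized number of edges
    strength = W.sum(axis=1) / (n - 1)        # normalized sum of edge weights
    # All-pairs shortest paths via Floyd-Warshall, weights as distances.
    dist = np.where(W > 0, W, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        dist = np.minimum(dist, dist[:, [k]] + dist[[k], :])
    closeness = 1.0 / dist.sum(axis=1)        # reciprocal of path-length sum
    return np.column_stack([x, degree, strength, closeness])
```

On a complete unit-weight graph every node has degree and strength 1, and closeness $1/(N-1)$, which makes a convenient sanity check.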
2.2.3. Graph Sparsification
1. Threshold-based sparsification: Retains a fraction p of the most significant edges, ranked by the criterion $|w_{ij} - 0.5|$, where $w_{ij}$ is the edge weight. This approach allows control over graph sparsity while preserving connections with the greatest deviation from the neutral value 0.5.
2. Minimum connected sparsification: Employs binary search to determine the maximum threshold such that the graph remains connected. The method finds the minimal edge set that ensures graph connectivity, thereby optimizing the trade-off between sparsity and structural integrity.
3. No sparsification: Baseline configuration that preserves the original graph structure.
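The first two strategies can be sketched as follows, assuming a symmetric weight matrix with entries in [0, 1] and a zero diagonal. Function names are ours, and the binary-search depth of 30 iterations is an illustrative choice.

```python
import numpy as np

def threshold_sparsify(W: np.ndarray, p: float) -> np.ndarray:
    """Keep the fraction p of edges whose weights deviate most from the
    neutral value 0.5; all other edges are zeroed out."""
    iu = np.triu_indices_from(W, k=1)
    dev = np.abs(W[iu] - 0.5)
    k = max(1, int(round(p * dev.size)))
    cutoff = np.sort(dev)[::-1][k - 1]              # k-th largest deviation
    out = np.where(np.abs(W - 0.5) >= cutoff, W, 0.0)
    np.fill_diagonal(out, 0.0)
    return out

def is_connected(W: np.ndarray) -> bool:
    """Reachability test (DFS) on the non-zero adjacency pattern."""
    n = W.shape[0]
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in np.flatnonzero(W[u] > 0):
            if v not in seen:
                seen.add(int(v))
                stack.append(int(v))
    return len(seen) == n

def min_connected_sparsify(W: np.ndarray, iters: int = 30) -> np.ndarray:
    """Binary search for the largest deviation threshold t such that
    keeping only edges with |w - 0.5| >= t leaves the graph connected."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if is_connected(np.where(np.abs(W - 0.5) >= mid, W, 0.0)):
            lo = mid                                 # still connected: go sparser
        else:
            hi = mid
    return np.where(np.abs(W - 0.5) >= lo, W, 0.0)
```

Because the SGNN edge weights are class probabilities, pruning edges near 0.5 removes the least discriminative pairwise classifiers first.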
3. Results
3.1. Foundation Model Task
3.2. Separate Datasets Task
3.3. Visualization of Sparsification Strategies
3.4. Testing the Universality of the Pipeline
3.5. Dealing with the Curse of Dimensionality
3.6. Robustness to Correlated Features
3.7. Evaluation on Ovarian Cancer Proteomics Data
3.8. Performance Analysis
4. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large Language Models: A Survey. arXiv 2025, arXiv:2402.06196.
- Villalobos, P.; Ho, A.; Sevilla, J.; Besiroglu, T.; Heim, L.; Hobbhahn, M. Position: Will we run out of data? Limits of LLM scaling based on human-generated data. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; Proceedings of Machine Learning Research (PMLR); JMLR.org: Brookline, MA, USA, 2024; Volume 235, pp. 49523–49544.
- Jones, N. The AI revolution is running out of data. What can researchers do? Nature 2024, 636, 290–292.
- Shumailov, I.; Shumaylov, Z.; Zhao, Y.; Papernot, N.; Anderson, R.; Gal, Y. AI models collapse when trained on recursively generated data. Nature 2024, 631, 755–759.
- Nurk, S.; Koren, S.; Rhie, A.; Rautiainen, M.; Bzikadze, A.V.; Mikheenko, A.; Vollger, M.R.; Altemose, N.; Uralsky, L.; Gershman, A.; et al. The complete sequence of a human genome. Science 2022, 376, 44–53.
- Zou, J.; Huss, M.; Abid, A.; Mohammadi, P.; Torkamani, A.; Telenti, A. A primer on deep learning in genomics. Nat. Genet. 2019, 51, 12–18.
- Messner, C.B.; Demichev, V.; Wendisch, D.; Michalick, L.; White, M.; Freiwald, A.; Textoris-Taube, K.; Vernardis, S.I.; Egger, A.S.; Kreidl, M.; et al. Ultra-High-Throughput Clinical Proteomics Reveals Classifiers of COVID-19 Infection. Cell Syst. 2020, 11, 11–24.e4.
- A focus on single-cell omics. Nat. Rev. Genet. 2023, 24, 485.
- Schübeler, D. Function and information content of DNA methylation. Nature 2015, 517, 321–326.
- Sviridov, I.; Egorov, K. Conditional Electrocardiogram Generation Using Hierarchical Variational Autoencoders. arXiv 2025, arXiv:2503.13469.
- Wang, L.; Yin, Y.; Glampson, B.; Peach, R.; Barahona, M.; Delaney, B.C.; Mayer, E.K. Transformer-based deep learning model for the diagnosis of suspected lung cancer in primary care based on electronic health record data. EBioMedicine 2024, 110, 105442.
- Rahnenführer, J.; De Bin, R.; Benner, A.; Ambrogi, F.; Lusa, L.; Boulesteix, A.L.; Migliavacca, E.; Binder, H.; Michiels, S.; Sauerbrei, W.; et al. Statistical analysis of high-dimensional biomedical data: A gentle introduction to analytical goals, common approaches and challenges. BMC Med. 2023, 21, 182.
- Yang, X.; Huang, K.; Yang, D.; Zhao, W.; Zhou, X. Biomedical Big Data Technologies, Applications, and Challenges for Precision Medicine: A Review. Glob. Chall. 2024, 8, 2300163.
- Berisha, V.; Krantsevich, C.; Hahn, P.R.; Hahn, S.; Dasarathy, G.; Turaga, P.; Liss, J. Digital medicine and the curse of dimensionality. Npj Digit. Med. 2021, 4, 153.
- Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; de Las Casas, D.; Hendricks, L.A.; Welbl, J.; Clark, A.; et al. Training Compute-Optimal Large Language Models. arXiv 2022, arXiv:2203.15556.
- Krivonosov, M.; Nazarenko, T.; Ushakov, V.; Vlasenko, D.; Zakharov, D.; Chen, S.; Blyus, O.; Zaikin, A. Analysis of Multidimensional Clinical and Physiological Data with Synolitical Graph Neural Networks. Technologies 2025, 13, 13.
- Teo, Z.L.; Thirunavukarasu, A.J.; Elangovan, K.; Cheng, H.; Moova, P.; Soetikno, B.; Nielsen, C.; Pollreisz, A.; Ting, D.S.J.; Morris, R.J.T.; et al. Generative artificial intelligence in medicine. Nat. Med. 2025, 31, 3270–3282.
- Rajpurkar, P.; Chen, E.; Banerjee, O.; Topol, E.J. AI in health and medicine. Nat. Med. 2022, 28, 31–38.
- Corso, G.; Stark, H.; Jegelka, S.; Jaakkola, T.; Barzilay, R. Graph neural networks. Nat. Rev. Methods Primers 2024, 4, 17.
- Whitwell, H.J.; Bacalini, M.G.; Blyuss, O.; Chen, S.; Garagnani, P.; Gordleeva, S.Y.; Jalan, S.; Ivanchenko, M.; Kanakov, O.; Kustikova, V.; et al. The Human Body as a Super Network: Digital Methods to Analyze the Propagation of Aging. Front. Aging Neurosci. 2020, 12, 136.
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24.
- Zanin, M.; Alcazar, J.M.; Carbajosa, J.V.; Paez, M.G.; Papo, D.; Sousa, P.; Menasalvas, E.; Boccaletti, S. Parenclitic networks: Uncovering new functions in biological data. Sci. Rep. 2014, 4, 5112.
- Zanin, M.; Papo, D.; Sousa, P.; Menasalvas, E.; Nicchi, A.; Kubik, E.; Boccaletti, S. Combining complex networks and data mining: Why and how. Phys. Rep. 2016, 635, 1–44.
- Whitwell, H.J.; Blyuss, O.; Menon, U.; Timms, J.F.; Zaikin, A. Parenclitic networks for predicting ovarian cancer. Oncotarget 2018, 9, 22717–22726.
- Nazarenko, T.; Whitwell, H.J.; Blyuss, O.; Zaikin, A. Parenclitic and Synolytic Networks Revisited. Front. Genet. 2021, 12, 733783.
- Krivonosov, M.; Nazarenko, T.; Bacalini, M.G.; Vedunova, M.; Franceschi, C.; Zaikin, A.; Ivanchenko, M. Age-related trajectories of DNA methylation network markers: A parenclitic network approach to a family-based cohort of patients with Down Syndrome. Chaos Solitons Fractals 2022, 165, 112863.
- Demichev, V.; Tober-Lau, P.; Lemke, O.; Nazarenko, T.; Thibeault, C.; Whitwell, H.; Röhl, A.; Freiwald, A.; Szyrwiel, L.; Ludwig, D.; et al. A time-resolved proteomic and prognostic map of COVID-19. Cell Syst. 2021, 12, 780–794.e7.
- Demichev, V.; Tober-Lau, P.; Nazarenko, T.; Lemke, O.; Kaur Aulakh, S.; Whitwell, H.J.; Röhl, A.; Freiwald, A.; Mittermaier, M.; Szyrwiel, L.; et al. A proteomic survival predictor for COVID-19 patients in intensive care. PLoS Digit. Health 2022, 1, e0000007.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907.
- Brody, S.; Alon, U.; Yahav, E. How Attentive are Graph Attention Networks? arXiv 2022, arXiv:2105.14491.
- Mirkes, E.M.; Allohibi, J.; Gorban, A. Fractional Norms and Quasinorms Do Not Help to Overcome the Curse of Dimensionality. Entropy 2020, 22, 1105.
- DeLong, E.R.; DeLong, D.M.; Clarke-Pearson, D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics 1988, 44, 837–845.
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.
- PyTorch Geometric Documentation. Available online: https://pytorch-geometric.readthedocs.io/en/latest/ (accessed on 1 September 2025).
- XGBoost Documentation. Available online: https://xgboost.readthedocs.io/en/stable/ (accessed on 1 September 2025).
- scikit-learn: Machine Learning in Python. Available online: https://scikit-learn.org/stable/index.html (accessed on 1 September 2025).
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. arXiv 2019, arXiv:1907.10902.
- Altman, N.; Krzywinski, M. The curse(s) of dimensionality. Nat. Methods 2018, 15, 399–400.
- Zaikin, A.; Sviridov, I.; Oganezova, J.G.; Menon, U.; Gentry-Maharaj, A.; Timms, J.F.; Blyuss, O. Synolitic Graph Neural Networks of High-Dimensional Proteomic Data Enhance Early Detection of Ovarian Cancer. Cancers 2025, 17, 3972.
- Ding, A.; Qin, Y.; Wang, B.; Guo, L.; Jia, L.; Cheng, X. Evolvable graph neural network for system-level incremental fault diagnosis of train transmission systems. Mech. Syst. Signal Process. 2024, 210, 111175.
- Sun, L.; Ye, J.; Peng, H.; Wang, F.; Yu, P.S. Self-supervised continual graph learning in adaptive riemannian spaces. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 4633–4642.





| Parameter | Value | Parameter | Value |
|---|---|---|---|
| GNN Model Parameters: | | | |
| Activation function | Leaky ReLU | Hidden layer size | 128 |
| Number of GNN layers | 2 | Dropout rate | 0.3 |
| Residual connections | True | Use edge encoder | True |
| Edge encoder hidden size | 32 | Number of edge encoder layers | 2 |
| Classifier MLP hidden size | 32 | Number of classifier MLP layers | 2 |
| GATv2 Specific Parameters: | | | |
| Number of attention heads | 3 | Concatenate head outputs | True |
| Training Configuration: | | | |
| Learning rate | | Batch size | 512 |
| Maximum epochs | 256 | Early stopping patience | 128 |
| Learning rate patience | 32 | Cross-validation folds | 5 |
| Weight decay | | LR reduction factor | 0.5 |
| Optuna Hyperparameter Optimization: | | | |
| Number of trials | 8 | Startup trials | 1 |
| Warmup steps | 4 | | |
| XGBoost Baseline Configuration: | | | |
| Maximum depth | 6 | Learning rate | 0.1 |
| Number of estimators | 100 | Subsample ratio | 0.8 |
| Column sampling ratio | 0.8 | | |
| SVM Configuration: | | | |
| Kernel | RBF | Regularization parameter C | 1.0 |
| Class weight | balanced | Probability estimates | True |
| Model | Sparsify | ROC-AUC (Node Feat. = False) | ROC-AUC (Node Feat. = True) |
|---|---|---|---|
| GCN | None | 86.64 ± 0.94 | 91.20 ± 0.47 * |
| GCN | | 83.30 ± 1.82 | 91.15 ± 0.30 * |
| GCN | | 85.10 ± 0.28 | 91.07 ± 0.46 * |
| GCN | Min conn. | 84.81 ± 0.51 | 91.22 ± 0.50 * |
| GATv2 | None | 88.84 ± 1.88 | 92.22 ± 0.49 * |
| GATv2 | | 89.08 ± 0.89 | 91.72 ± 0.62 * |
| GATv2 | | 88.67 ± 1.55 | 92.20 ± 0.65 * |
| GATv2 | Min conn. | 88.79 ± 1.05 | 91.66 ± 0.60 * |
| XGBoost | None | 86.05 ± 0.55 | |
| Model | Sparsify | Macro ROC-AUC (Node Feat. = False) | Macro ROC-AUC (Node Feat. = True) |
|---|---|---|---|
| GCN | None | 73.62 ± 3.91 | 77.67 ± 2.62 |
| GCN | | 72.08 ± 2.85 | 75.01 ± 3.14 |
| GCN | | 74.49 ± 2.19 | 77.70 ± 1.55 |
| GCN | Min conn. | 74.43 ± 1.55 | 77.12 ± 1.89 |
| GATv2 | None | 78.82 ± 2.60 | 81.53 ± 2.02 |
| GATv2 | | 77.19 ± 1.46 | 80.16 ± 2.76 |
| GATv2 | | 79.87 ± 2.75 | 82.46 ± 2.60 |
| GATv2 | Min conn. | 78.80 ± 1.73 | 83.12 ± 1.08 |
| XGBoost | None | 80.28 ± 2.33 | |
| Configuration | ROC-AUC |
|---|---|
| Synolitic graph only | 63.89 ± 4.61 |
| Sparsified at maximum threshold while remaining connected | 67.52 ± 2.23 |
| With additional node features | 81.99 ± 2.05 |
| Model | Sparsify | ROC-AUC (Node Feat. = False) | ROC-AUC (Node Feat. = True) |
|---|---|---|---|
| GCN | None | 86.64/85.78 (−0.87) | 91.20/91.30 (+0.10) |
| GCN | | 83.30/83.88 (+0.58) | 91.15/91.24 (+0.08) |
| GCN | | 85.10/84.85 (−0.25) | 91.07/91.36 (+0.29) |
| GCN | Min conn. | 84.81/84.71 (−0.10) | 91.22/91.42 (+0.20) |
| GATv2 | None | 88.84/89.03 (+0.20) | 92.22/91.95 (−0.27) |
| GATv2 | | 89.08/88.55 (−0.53) | 91.72/91.44 (−0.28) |
| GATv2 | | 88.67/89.01 (+0.34) | 92.20/91.84 (−0.36) |
| GATv2 | Min conn. | 88.79/88.68 (−0.11) | 91.66/91.76 (+0.09) |
| XGBoost | None | 86.05/85.07 (−0.98) | |
| Method | Build Time (s) | Train Time (min) | Inference Time (s) | Peak Memory (GB) |
|---|---|---|---|---|
| GAT | 77.79 ± 0.46 | 21.83 ± 0.98 | 97.76 ± 23.52 | 2.8 ± 0.6 |
| + node features | 178.11 ± 10.29 | 28.13 ± 2.97 | 140.46 ± 28.33 | 3.01 ± 0.65 |
| + | 103.64 ± 0.76 | 19.46 ± 0.31 | 103.26 ± 19.27 | 0.87 ± 0.17 |
| GCN | 77.79 ± 0.46 | 18.03 ± 2.23 | 91.72 ± 17.59 | 0.41 ± 0.01 |
| + node features | 178.11 ± 10.29 | 23.40 ± 1.89 | 111.06 ± 22.1 | 0.51 ± 0.01 |
| + | 103.64 ± 0.76 | 19.55 ± 2.93 | 94.9 ± 31.26 | 0.14 ± 0.0 |
| XGBoost | – | 3.76 ± 0.02 | 2.5 ± 0.0 | – |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zaikin, A.; Sviridov, I.; Sosedka, A.; Linich, A.; Nasyrov, R.; Mirkes, E.M.; Tyukina, T. Overcoming the Curse of Dimensionality with Synolitic AI. Technologies 2026, 14, 84. https://doi.org/10.3390/technologies14020084

