Toward Graph-Based Decoding of Tumor Evolution: Spatial Inference of Copy Number Variations
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Information
2.2. Construction of Pseudo-Single-Cell Resolution for Visium HD
- Nuclei Segmentation: The accompanying high-resolution H&E histological images were segmented to delineate nuclear boundaries using Napari.
- Spatial Aggregation: The SpatialData framework [21] was utilized to establish a common coordinate system, aligning the 8 µm Visium HD bins with the segmented nuclear geometries. Bins spatially overlapping with these nuclear boundaries were then computationally aggregated to generate image-segmented pseudo-single-cell expression profiles.
- Validation: The biological validity of these pseudo-cells was confirmed through expert pathologist annotations provided within the dataset. These annotations, based on H&E staining and characteristic marker gene expression, served as the ground truth to benchmark the spatial domains identified by SCOIGET.
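The spatial aggregation step above can be illustrated with a minimal sketch. It is not the authors' SpatialData pipeline: it assumes each nucleus is approximated by an axis-aligned bounding box rather than a segmented polygon, and the function name `aggregate_bins_to_nuclei`, the gene names, and the coordinates are hypothetical.

```python
from collections import defaultdict

def aggregate_bins_to_nuclei(bins, nuclei, bin_size=8.0):
    """Sum bin-level counts into pseudo-cells for bins that overlap a nucleus box.

    bins:   list of (x, y, {gene: count}); (x, y) is the bin's lower-left corner in µm
    nuclei: dict nucleus_id -> (xmin, ymin, xmax, ymax), a bounding box in µm (assumption)
    """
    profiles = {nid: defaultdict(int) for nid in nuclei}
    for x, y, counts in bins:
        for nid, (xmin, ymin, xmax, ymax) in nuclei.items():
            # Axis-aligned overlap test between the square bin and the nucleus box.
            overlaps = (x < xmax and x + bin_size > xmin
                        and y < ymax and y + bin_size > ymin)
            if overlaps:
                for gene, c in counts.items():
                    profiles[nid][gene] += c
    return {nid: dict(p) for nid, p in profiles.items()}

# Toy data: three 8 µm bins and two nuclei (all values are illustrative).
bins = [
    (0.0, 0.0, {"EPCAM": 3, "CD3E": 0}),
    (8.0, 0.0, {"EPCAM": 2, "CD3E": 1}),
    (40.0, 40.0, {"EPCAM": 0, "CD3E": 5}),
]
nuclei = {"n1": (2.0, 2.0, 12.0, 6.0), "n2": (41.0, 41.0, 45.0, 45.0)}
pseudo_cells = aggregate_bins_to_nuclei(bins, nuclei)
print(pseudo_cells)
```

In this sketch a bin that overlaps two nuclei contributes its counts to both; a real pipeline would need an explicit rule for such shared bins.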
2.3. Data Preprocessing
2.4. Sample Integration
2.5. Gene Annotation and Binning
2.6. Spatial Graph Construction
2.6.1. Initial Graph Construction
2.6.2. Refined Graph Construction
2.7. SCOIGET Framework
- Input Data: SCOIGET utilizes three main components: (1) Node Features, where each spot or cell is represented by either binned gene expression data (feat) or pseudo-copy number profiles (norm_x) depending on the training stage; (2) Spatial Graph Structure, which encodes spatial relationships in an adjacency matrix (graph_neigh) where nodes represent spots or cells and edges indicate spatial proximity based on tissue architecture; and (3) Edge Attributes, which quantify the similarity between neighboring spots by calculating gene expression distances and normalizing them using softmax.
- Model Architecture: SCOIGET’s model comprises three primary components: an Encoder, a Decoder, and a Copy Number Encoder (CNEncoder). The Encoder utilizes three Graph Attention Network (GAT) layers [29] with multiple attention heads and ReLU activations to learn latent representations that capture complex spatial–transcriptomic–genomic interactions. The Decoder reconstructs the original input features from these latent representations through fully connected layers, including an intermediate layer with 128 units and ReLU activations to preserve intricate patterns. The CNEncoder estimates CNVs from the reconstructed features by employing a Hidden Markov Model (HMM) [30], which models genomic bins with discrete states and Gaussian emission probabilities. This encoder integrates spatial smoothing to enhance CNV localization and includes a regularization loss term to prevent overfitting.
- Copy Number Estimation and Refinement: The CNEncoder in SCOIGET estimates CNVs by identifying regions with consistent copy number states through a Hidden Markov Model (HMM). This process involves predicting hidden states that correspond to different copy number levels and applying spatial smoothing to reduce noise and improve CNV localization. The final CNV estimates are normalized and rescaled to ensure consistency across samples. Detailed implementation and parameter settings of the HMM are presented in Appendix A.3.
- Training Procedure: The model undergoes two phases of training. In the first phase, it uses binned gene expression data and a spatial graph constructed from the original features. The model is trained without a validation set to learn initial latent representations and reconstruct the input features. Following this phase, the CNEncoder estimates pseudo-copy numbers using an HMM, generating initial CNV predictions. In the second phase, the pseudo-copy numbers are incorporated into the node features, and a new spatial graph is created. The model is retrained on this updated graph, utilizing a validation set to monitor performance and prevent overfitting, with early stopping based on validation loss. The CNEncoder re-estimates the copy numbers, refining the CNV profiles with the updated model.
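As an illustration of the edge attributes described above, the following sketch turns expression distances between neighboring spots into softmax-normalized weights. It assumes Euclidean distance and a softmax over negative distances (so that more similar neighbors receive larger weights); the text does not state the sign convention, and the function name `softmax_edge_weights` is hypothetical.

```python
import math

def softmax_edge_weights(features, neighbors):
    """For each node, apply a softmax over negative distances to its neighbors."""
    weights = {}
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated node: no edges to weight
        # Assumption: closer in expression space means a larger logit.
        logits = [-math.dist(features[i], features[j]) for j in nbrs]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            weights[(i, j)] = e / total
    return weights

# Toy example: spots 0 and 1 have similar expression, spot 2 differs.
features = {0: [1.0, 0.0], 1: [1.0, 0.1], 2: [0.0, 3.0]}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
w = softmax_edge_weights(features, neighbors)
print(w[(0, 1)], w[(0, 2)])
```

The weights on the outgoing edges of each node sum to one, which makes them usable as edge probabilities.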
2.8. Loss Function
- Reconstruction Loss: This measures the discrepancy between the original and reconstructed features (the mean squared error described in Appendix A.2), encouraging the model to retain essential information in the latent space.
- KL Divergence Loss: This regularizes the latent space by encouraging the learned distribution to stay close to a prior distribution, which prevents overfitting and ensures meaningful latent representations. The term is computed from the mean and covariance of the latent distribution for each sample and from the dimensionality of the latent space.
- Regularization Loss: L2 regularization on the reconstructed features helps prevent overfitting and encourages smooth predictions; a weighting coefficient controls the contribution of this term.
- Spatial Smoothing Loss: This enforces spatial consistency in the predicted copy numbers by minimizing the discrepancy between the copy numbers of connected nodes in the graph, normalized by the number of edges.
- Total Loss: The total loss is a weighted sum of the reconstruction loss, KL divergence loss, regularization loss, and spatial smoothing loss.
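For orientation, one plausible way to write these loss terms is sketched below. This is a reconstruction under stated assumptions, not the authors' exact formulation: it assumes a standard normal prior for the KL term, squared differences for the spatial smoothing term, and hypothetical symbols throughout ($x_i$ and $\hat{x}_i$ for original and reconstructed features, $\mu_i$ and $\Sigma_i$ for the latent mean and covariance of sample $i$, $d$ for the latent dimensionality, $c_i$ for predicted copy numbers, $E$ for the edge set, and $\lambda$, $\alpha$, $\beta$, $\gamma$ for weights).

```latex
\begin{align}
\mathcal{L}_{\mathrm{recon}} &= \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - \hat{x}_i \rVert_2^2 \\
\mathcal{L}_{\mathrm{KL}} &= \frac{1}{2N}\sum_{i=1}^{N}
  \Big( \operatorname{tr}(\Sigma_i) + \mu_i^{\top}\mu_i - d - \log\det\Sigma_i \Big) \\
\mathcal{L}_{\mathrm{reg}} &= \lambda\,\frac{1}{N}\sum_{i=1}^{N} \lVert \hat{x}_i \rVert_2^2 \\
\mathcal{L}_{\mathrm{smooth}} &= \frac{1}{|E|}\sum_{(i,j)\in E} \lVert c_i - c_j \rVert_2^2 \\
\mathcal{L}_{\mathrm{total}} &= \alpha\,\mathcal{L}_{\mathrm{recon}} + \beta\,\mathcal{L}_{\mathrm{KL}}
  + \mathcal{L}_{\mathrm{reg}} + \gamma\,\mathcal{L}_{\mathrm{smooth}}
\end{align}
```

The KL expression is the closed form for the divergence between a Gaussian $\mathcal{N}(\mu_i, \Sigma_i)$ and $\mathcal{N}(0, I)$; with a different prior the expression would change accordingly.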
2.9. Model Implementation and Training
2.10. Baseline Methods
2.11. Evaluation Metrics
2.12. Simulation Study Design
2.13. Spatial Domain Identification
2.14. Tumor Evolution Pattern Inference
2.15. Survival Analysis
3. Results
3.1. Overview of SCOIGET
3.2. Validation of SCOIGET on Simulated Data
3.3. SCOIGET Integrates Spatial Omics Features Within a Unified Framework and Accurately Detects Copy Number Features Spatially
3.4. Inferring Clonal Evolution in Colorectal Cancer Progression Through Spatial CNV Analysis
3.5. SCOIGET Reveals Tumor Evolution Patterns in Prostate Cancer
3.6. Unveiling Colorectal Cancer Heterogeneity at Subcellular Resolution
4. Discussion
4.1. Comparison with Existing Methods
- Input Requirements: CalicoST fundamentally requires allele-specific SNP counts derived from raw BAM files. This makes it incompatible with probe-based technologies (e.g., Visium CytAssist for FFPE), which “do not sequence SNPs”, as well as with many publicly available datasets that provide only gene expression matrices. Because SCOIGET operates directly on standard gene expression matrices, it is applicable to a far broader range of datasets.
- Computational Speed: CalicoST’s statistical optimization model is computationally intensive, with a reported “runtime between 2 and 8 h”. In contrast, SCOIGET’s deep-learning framework is highly scalable, completing its analysis in approximately 35 min on a comparable dataset.
4.2. Clinical Implications and Future Directions
4.3. Limitations of the Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| CNV | Copy Number Variation |
| CRC | Colorectal Cancer |
| EMT | Epithelial–Mesenchymal Transition |
| GAT | Graph Attention Network |
| GG | Gleason Grade |
| GNN | Graph Neural Network |
| HMM | Hidden Markov Model |
| HMRF | Hidden Markov Random Field |
| HLA | Human Leukocyte Antigen |
| k-NN | k-Nearest Neighbors |
| MSE | Mean Squared Error |
| MSS | Microsatellite Stable |
| NGS | Next-Generation Sequencing |
| OS | Overall Survival |
| PAGA | Partition-based Graph Abstraction |
| PCa | Prostate Cancer |
| PCA | Principal Component Analysis |
| PyG | PyTorch Geometric |
| SCOIGET | Spatial Copy Number Inference by Graph on Evolution of Tumor |
| scRNA-seq | Single-cell RNA sequencing |
| ST | Spatial Transcriptomics |
| TCGA | The Cancer Genome Atlas |
| TVA | Tubular Villous Adenoma |
| VAE | Variational Autoencoder |
| WES | Whole-Exome Sequencing |
Appendix A
Appendix A.1. Details in Dataset Construction
- Validation Dataset (Figure 3)
  - Source: HTAN-WUSTL atlas [18].
  - Samples: CRC Visium samples HT260C1, HT112C1 (U1, U2), and HT225C1 (U1–U5).
  - WES Data: Whole-exome sequencing data were included to construct the validation for copy number analysis.
  - Sample Description:
    - HT260C1: Tumor sample.
    - HT112C1 (U1, U2): Two sections of the same tumor.
    - HT225C1 (U1–U5): Multiple regions sampled from one tumor.
- Case Study 1 (Figure 4)
  - Source: 2023 Cell publication [19].
  - Samples: Four CRC Visium samples from the same patient (PAT71397).
  - Stages:
    - G1 Stage (AD): Sample 6723_4, identified as a tubular villous adenoma (TVA).
    - G2 Stage (IIIB): Samples 6723_1, 6723_2, 6723_3, classified as microsatellite stable (MSS) stage IIIB tumors.
  - Specimen Location: Cecum region.
- Case Study 2 (Figure 5)
  - Source: 2022 Nature study [3].
  - Samples: Four prostate cancer Visium samples (H1_2, H1_4, H1_5, H2_1) from a single patient.
- Case Study 3 (Figure 6)
  - Source: Publicly available 10× Genomics and Zenodo repository [20].
  - Samples: Three CRC Visium HD samples (p1, p2, p5).
  - Processing:
    - Samples p1 and p5 used 8 μm × 8 μm bins.
    - Sample p2 underwent StarDist cell segmentation on H&E-stained images, supplemented by pathologist-provided annotations.
| Figure | Platform | Cancer Type | Source Atlas | Patient ID | Location | Type | Grade | Stage | Gender | Age |
|---|---|---|---|---|---|---|---|---|---|---|
| Figure 3 | 10× Visium | CRC | HTAN_WUSTL | HT260C1 | / | Metastasis | / | / | / | / |
| | 10× Visium | CRC | HTAN_WUSTL | HT112C1 | / | Metastasis | / | / | / | / |
| | 10× Visium | CRC | HTAN_WUSTL | HT112C1 | / | Metastasis | / | / | / | / |
| | 10× Visium | CRC | HTAN_WUSTL | HT230C1 | / | / | / | / | / | / |
| | 10× Visium | CRC | HTAN_WUSTL | HT225C1 | / | / | / | / | / | / |
| Figure 4 | 10× Visium | CRC | CRC_ST | PAT71397 | Cecum | MSS | G2 | IIIB | M | 61 |
| | 10× Visium | CRC | CRC_ST | PAT71397 | Cecum | MSS | G2 | IIIB | M | 61 |
| | 10× Visium | CRC | CRC_ST | PAT71397 | Cecum | MSS | G2 | IIIB | M | 61 |
| | 10× Visium | CRC | CRC_ST | PAT71397 | Cecum | TA/TVA | G1 | AD | M | 61 |
| Figure 5 | 10× Visium | PCa | PCa_ST | H1_2 | / | / | / | / | / | / |
| | 10× Visium | PCa | PCa_ST | H1_4 | / | / | / | / | / | / |
| | 10× Visium | PCa | PCa_ST | H1_5 | / | / | / | / | / | / |
| | 10× Visium | PCa | PCa_ST | H2_1 | / | / | / | / | / | / |
| Figure 6 | Visium HD | CRC | Visium HD | p2 | Sigmoid | / | / | / | M | 60 |
| | Visium HD | CRC | Visium HD | p1 | Sigmoid | / | / | IIA | F | 72 |
| | Visium HD | CRC | Visium HD | p5 | Sigmoid | / | / | IVA | F | 58 |
Appendix A.2. Details of the SCOIGET Algorithm
- Input Data
  - Node Features: Each spot or cell is represented by either binned gene expression features (feat) or pseudo-copy number profiles (norm_x), depending on the training stage.
  - Spatial Graph Structure: The spatial relationships between spots were encoded in an adjacency matrix (graph_neigh), where nodes represent spots or cells, and edges represent spatial proximity based on the tissue’s architecture. This graph structure preserves the spatial context of the data.
  - Edge Attributes: Edge weights and probabilities were computed to quantify the similarity between neighboring spots. These attributes were calculated based on distances in the gene expression feature space and transformed using softmax normalization.
- Model Architecture
  - Encoder: The encoder employs three GAT layers to learn latent representations of the input data. Each layer attends to neighboring nodes in the spatial graph, allowing the model to focus on relevant spatial relationships and capture both local and global patterns. The encoder incorporates multiple attention heads (e.g., eight heads per layer), and ReLU activations are used for nonlinearity. The final layer aggregates the features and outputs the mean (z_mean) and variance (z_var) of the latent representations, with variance constrained to ensure numerical stability.
  - Decoder: The decoder reconstructs the original input features from the latent representations. It uses fully connected layers, including an intermediate layer with 128 units, and a final output layer matching the input dimension. ReLU activations introduce nonlinearity, ensuring the model captures complex patterns during reconstruction.
  - Copy Number Encoder (CNEncoder): The CNEncoder estimates CNVs from the reconstructed features using a Hidden Markov Model (HMM). This model identifies regions with consistent copy number states, iteratively predicting hidden states that correspond to different copy number levels. The model’s output is adjusted by spatial smoothing, where neighboring spots’ states are averaged, using the edge_index parameter to encode spatial relationships. This spatial smoothing reduces noise and improves the localization of CNVs. The final CNV estimates are normalized, rescaled, and adjusted to ensure consistency across samples. Additionally, a regularization loss term helps prevent overfitting.
- Training Procedure
  - First Training Phase: The initial training uses the binned gene expression data and spatial graph constructed from the original features. The model is trained without a validation set to learn initial latent representations and reconstruct the input features. The loss function combines reconstruction loss (mean squared error between the original and reconstructed features) and a Kullback–Leibler (KL) divergence term to regularize the latent space. After training, the CNEncoder estimates pseudo copy numbers using the HMM. These estimates serve as initial CNV predictions.
  - Second Training Phase: The pseudo copy numbers obtained from the first phase are incorporated into the node features. A new spatial graph is constructed using these updated features. The model is retrained on the new graph, this time employing a validation set to monitor performance and prevent overfitting. Early stopping criteria may be used based on validation loss. The CNEncoder re-estimates the copy numbers, refining the CNV profiles with the updated model.
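The spatial smoothing performed by the CNEncoder, in which neighboring spots’ states are averaged using edge_index, can be sketched as follows. This is a simplified illustration rather than the SCOIGET implementation: it assumes that edge_index follows the two-row (source, target) layout used by PyTorch Geometric, that each spot is included in its own average, and the function name `smooth_states` is hypothetical.

```python
def smooth_states(states, edge_index):
    """Average each spot's per-bin states with those of its graph neighbours.

    states:     list of per-spot lists (one copy number state per genomic bin)
    edge_index: pair of lists (sources, targets) describing directed edges
    """
    n = len(states)
    n_bins = len(states[0])
    # Assumption: each spot's own value is part of its neighbourhood average.
    sums = [[float(v) for v in s] for s in states]
    counts = [1] * n
    for src, dst in zip(*edge_index):
        # Message from src to dst: add src's states to dst's running sum.
        for b in range(n_bins):
            sums[dst][b] += states[src][b]
        counts[dst] += 1
    return [[v / counts[i] for v in sums[i]] for i in range(n)]

# Toy example: three spots on a chain 0-1-2, three genomic bins each.
states = [[2, 2, 3], [2, 2, 3], [2, 1, 1]]
edge_index = ([0, 1, 1, 2], [1, 0, 2, 1])  # undirected edges stored as directed pairs
smoothed = smooth_states(states, edge_index)
print(smoothed)
```

Spot 1 sits between an amplified and a deleted neighbour, so its smoothed values move toward the mean of all three spots, whereas spot 0 is unchanged because its only neighbour carries identical states.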
Appendix A.3. Copy Number Estimation and Refinement
- Hidden Markov Model for CNV Detection
  - States: Discrete copy number states, such as deletion, normal, and amplification.
  - Observations: Reconstructed gene expression levels per bin.
  - Initial State Probabilities: Probabilities for starting in each state.
  - Transition Probabilities: Probabilities of transitioning between states in adjacent bins.
  - Emission Probabilities: Likelihood of observing a particular expression level given a state, modeled as a Gaussian distribution for each state.
- Estimation Process
  - Initialization: The initial state, transition, and emission parameters are initialized based on biological expectations, favoring state persistence.
  - Expectation-Maximization (EM) Algorithm: The E-step computes posterior probabilities of the states given the observations using the forward-backward algorithm. The M-step updates the HMM parameters to maximize the expected log-likelihood of the data.
  - Viterbi Decoding: Determines the most probable sequence of hidden states, assigning a copy number state to each bin.
- Normalization and Adjustment
  - State Mapping: Map hidden states to copy number values for deletion, normal, and amplification.
  - Copy Number Adjustment:
  - Spot Normalization:
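The Viterbi decoding step can be illustrated with a small log-space implementation for a three-state HMM with Gaussian emissions. This is a sketch only: the EM parameter updates are omitted, and the emission means, standard deviations, transition probabilities, and observations below are hypothetical values chosen to favor state persistence, not the settings used by SCOIGET.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def viterbi(obs, log_pi, log_A, means, stds):
    """Most probable state path for a Gaussian-emission HMM (log-space Viterbi)."""
    k = len(log_pi)
    score = [log_pi[s] + gaussian_logpdf(obs[0], means[s], stds[s]) for s in range(k)]
    back = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(k):
            # Best predecessor state for arriving in state s at this bin.
            best = max(range(k), key=lambda r: prev[r] + log_A[r][s])
            ptr.append(best)
            score.append(prev[best] + log_A[best][s]
                         + gaussian_logpdf(x, means[s], stds[s]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# States: 0 = deletion, 1 = normal, 2 = amplification (hypothetical parameters).
means, stds = [-1.0, 0.0, 1.0], [0.3, 0.3, 0.3]
log_pi = [math.log(1 / 3)] * 3
stay, switch = 0.98, 0.01  # transitions favour staying in the same state
log_A = [[math.log(stay if r == s else switch) for s in range(3)] for r in range(3)]

obs = [0.05, -0.1, 0.0, 0.9, 1.1, 1.0, 0.1, 0.0, -0.9, -1.0, -1.1]
path = viterbi(obs, log_pi, log_A, means, stds)
print(path)  # [1, 1, 1, 2, 2, 2, 1, 1, 0, 0, 0]
```

Each decoded state would then be mapped to a copy number value, as described under State Mapping.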
Appendix A.4. Integration into the GNN
- Regularization
Appendix A.5. Evaluation Metrics
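- Mean Squared Error (MSE): Measures the average squared difference between predicted and true copy number values (WES baseline): $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$, where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of values compared.
- Cosine Similarity: Measures the similarity between two vectors, treating CNV profiles as high-dimensional vectors: $\cos(\mathbf{y}, \hat{\mathbf{y}}) = \frac{\mathbf{y}\cdot\hat{\mathbf{y}}}{\lVert\mathbf{y}\rVert\,\lVert\hat{\mathbf{y}}\rVert}$.
- Euclidean Distance: Quantifies the straight-line distance between predicted and true CNV values: $d_{\mathrm{E}} = \sqrt{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$.
- Manhattan Distance: Calculates the absolute distance between predicted and true CNV values: $d_{\mathrm{M}} = \sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$.
- Silhouette Score: Evaluates clustering performance by measuring intra-cluster cohesion and inter-cluster separation: $s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$, where $a(i)$ is the average distance from point $i$ to all other points within the same cluster, and $b(i)$ is the average distance from point $i$ to all points in the nearest neighboring cluster. The silhouette score ranges from −1 to 1, with higher values (closer to 1) indicating well-defined and well-separated clusters.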
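The metrics above can be computed with a few lines of standard-library Python. The sketch below uses the textbook definitions; the example vectors and cluster labels are invented for illustration, and the silhouette function assumes at least two clusters and assigns a score of 0 to singleton clusters (a common convention, not something stated in the text).

```python
import math

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        def mean_dist(cluster, skip_self):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == cluster and not (skip_self and j == i)]
            return sum(d) / len(d) if d else None
        a = mean_dist(labels[i], True)  # cohesion within the point's own cluster
        if a is None:                   # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # Separation: mean distance to the nearest other cluster.
        b = min(mean_dist(c, False) for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Illustrative copy number vectors (true vs. predicted) for four genomic bins.
true_cn = [2.0, 2.0, 3.0, 1.0]
pred_cn = [2.0, 3.0, 3.0, 1.0]
print(mse(true_cn, pred_cn), euclidean(true_cn, pred_cn), manhattan(true_cn, pred_cn))
print(cosine_similarity(true_cn, pred_cn))

# Two well-separated one-dimensional clusters.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
sil = silhouette_score(points, labels)
print(sil)
```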
Appendix B
Appendix B.1. Implementation Details for Baseline Methods
| Method | Software/Environment | Key Parameters and Settings |
|---|---|---|
| InferCNV [11] | R package v1.14.2 | cutoff = 0.1 (optimized for 10× Genomics sparse data), denoise = TRUE, HMM = TRUE, cluster_by_groups = TRUE. |
| CopyKAT [12] | R package v1.1.0 | KS.cut = 0.1 (segmentation sensitivity), ngene.chr = 5, win.size = 25, distance = “euclidean”, genome = “hg20”. |
| SCEVAN [13] | R package v1.0.3 | beta_vega = 0.5 (segmentation granularity), SUBCLONES = TRUE, FIXED_NORMAL_CELLS = FALSE, SCEVANsignatures = TRUE, organism = “human”. |
| CopyVAE [14] | Python ≥ 3.8 | dim_model = 32, layer_encode = 2, layer_decode = 2, epoch = 50, patience = 5. |
Appendix B.2. Model Stability Analysis
| Metric | Mean | Standard Deviation (SD) | Min | Max |
|---|---|---|---|---|
| Mean Squared Error | 0.0320 | 0.0002 | 0.0316 | 0.0323 |
| Cosine Similarity | 0.6281 | 0.0008 | 0.6270 | 0.6293 |
References
1. Gerstung, M.; Jolly, C.; Leshchiner, I.; Dentro, S.C.; Gonzalez, S.; Rosebrock, D.; Mitchell, T.J.; Rubanova, Y.; Anur, P.; Yu, K.; et al. The Evolutionary History of 2658 Cancers. Nature 2020, 578, 122–128.
2. Vitale, I.; Shema, E.; Loi, S.; Galluzzi, L. Intratumoral Heterogeneity in Cancer Progression and Response to Immunotherapy. Nat. Med. 2021, 27, 212–224.
3. Erickson, A.; He, M.; Berglund, E.; Marklund, M.; Mirzazadeh, R.; Schultz, N.; Kvastad, L.; Andersson, A.; Bergenstråhle, L.; Bergenstråhle, J.; et al. Spatially Resolved Clonal Copy Number Alterations in Benign and Malignant Tissue. Nature 2022, 608, 360–367.
4. Steele, C.D.; Abbasi, A.; Islam, S.M.A.; Bowes, A.L.; Khandekar, A.; Haase, K.; Hames-Fathi, S.; Ajayi, D.; Verfaillie, A.; Dhami, P.; et al. Signatures of Copy Number Alterations in Human Cancer. Nature 2022, 606, 984–991.
5. Becchi, T. A Pan-Cancer Landscape of Pathogenic Somatic Copy Number Variations. J. Biomed. Inform. 2023, 147, 104529.
6. Zhao, L.; Liu, H.; Yuan, X.; Gao, K.; Duan, J. Comparative Study of Whole Exome Sequencing-Based Copy Number Variation Detection Tools. BMC Bioinform. 2020, 21, 97.
7. Mandiracioglu, B.; Ozden, F.; Kaynar, G.; Yilmaz, M.A.; Alkan, C.; Cicek, A.E. ECOLE: Learning to Call Copy Number Variants on Whole Exome Sequencing Data. Nat. Commun. 2024, 15, 132.
8. Coutelier, M.; Holtgrewe, M.; Jäger, M.; Flöttman, R.; Mensah, M.A.; Spielmann, M.; Krawitz, P.; Horn, D.; Beule, D.; Mundlos, S. Combining Callers Improves the Detection of Copy Number Variants from Whole-Genome Sequencing. Eur. J. Hum. Genet. 2022, 30, 178–186.
9. Zhao, T.; Chiang, Z.D.; Morriss, J.W.; LaFave, L.M.; Murray, E.M.; Del Priore, I.; Meli, K.; Lareau, C.A.; Nadaf, N.M.; Li, J.; et al. Spatial Genomics Enables Multi-Modal Study of Clonal Heterogeneity in Tissues. Nature 2022, 601, 85–91.
10. Shao, X.; Lv, N.; Liao, J.; Long, J.; Xue, R.; Ai, N.; Xu, D.; Fan, X. Copy Number Variation Is Highly Correlated with Differential Gene Expression: A Pan-Cancer Study. BMC Med. Genet. 2019, 20, 175.
11. inferCNV of the Trinity CTAT Project. Available online: https://github.com/broadinstitute/inferCNV/wiki (accessed on 23 December 2024).
12. Gao, R.; Bai, S.; Henderson, Y.C.; Lin, Y.; Schalck, A.; Yan, Y.; Kumar, T.; Hu, M.; Sei, E.; Davis, A.; et al. Delineating Copy Number and Clonal Substructure in Human Tumors from Single-Cell Transcriptomes. Nat. Biotechnol. 2021, 39, 599–608.
13. De Falco, A.; Caruso, F.; Su, X.-D.; Iavarone, A.; Ceccarelli, M. A Variational Algorithm to Detect the Clonal Copy Number Substructure of Tumors from scRNA-Seq Data. Nat. Commun. 2023, 14, 1074.
14. Kurt, S.; Chen, M.; Toosi, H.; Chen, X.; Engblom, C.; Mold, J.; Hartman, J.; Lagergren, J. CopyVAE: A Variational Autoencoder-Based Approach for Copy Number Variation Inference Using Single-Cell Transcriptomics. Bioinformatics 2024, 40, btae284.
15. Wang, S.; Zhou, X.; Kong, Y.; Lu, H. Superresolved Spatial Transcriptomics Transferred from a Histological Context. Appl. Intell. 2023, 53, 31033–31045.
16. Xiao, X.; Kong, Y.; Li, R.; Wang, Z.; Lu, H. Transformer with Convolution and Graph-Node Co-Embedding: An Accurate and Interpretable Vision Backbone for Predicting Gene Expressions from Local Histopathological Image. Med. Image Anal. 2024, 91, 103040.
17. Ma, C.; Balaban, M.; Liu, J.; Chen, S.; Wilson, M.J.; Sun, C.H.; Ding, L.; Raphael, B.J. Inferring Allele-Specific Copy Number Aberrations and Tumor Phylogeography from Spatially Resolved Transcriptomics. Nat. Methods 2024, 21, 2239–2247.
18. Mo, C.-K.; Liu, J.; Chen, S.; Storrs, E.; Targino da Costa, A.L.N.; Houston, A.; Wendl, M.C.; Jayasinghe, R.G.; Iglesia, M.D.; Ma, C.; et al. Tumour Evolution and Microenvironment Interactions in 2D and 3D Space. Nature 2024, 634, 1178–1186.
19. Heiser, C.N.; Simmons, A.J.; Revetta, F.; McKinley, E.T.; Ramirez-Solano, M.A.; Wang, J.; Kaur, H.; Shao, J.; Ayers, G.D.; Wang, Y.; et al. Molecular Cartography Uncovers Evolutionary and Microenvironmental Dynamics in Sporadic Colorectal Tumors. Cell 2023, 186, 5620–5637.e16.
20. Kiessling, P.; El-Heliebi, A.; Ishaque, N. Visium HD Human Colorectal Cancer (FFPE) Data Release Pathologist Annotation. 2024. Available online: https://zenodo.org/records/11402686 (accessed on 20 November 2025).
21. Marconato, L.; Palla, G.; Yamauchi, K.A.; Virshup, I.; Heidari, E.; Treis, T.; Vierdag, W.-M.; Toth, M.; Stockhaus, S.; Shrestha, R.B.; et al. SpatialData: An Open and Universal Data Framework for Spatial Omics. Nat. Methods 2025, 22, 58–62.
22. Ståhl, P.L.; Salmén, F.; Vickovic, S.; Lundmark, A.; Navarro, J.F.; Magnusson, J.; Giacomello, S.; Asp, M.; Westholm, J.O.; Huss, M.; et al. Visualization and Analysis of Gene Expression in Tissue Sections by Spatial Transcriptomics. Science 2016, 353, 78–82.
23. Nagendran, M.; Sapida, J.; Arthur, J.; Yin, Y.; Tuncer, S.D.; Anaparthy, N.; Gupta, A.; Serra, M.; Patterson, D.; Tentori, A. 1457 Visium HD Enables Spatially Resolved, Single-Cell Scale Resolution Mapping of FFPE Human Breast Cancer Tissue. J. ImmunoTher. Cancer 2023, 11.
24. Wolf, F.A.; Angerer, P.; Theis, F.J. SCANPY: Large-Scale Single-Cell Gene Expression Data Analysis. Genome Biol. 2018, 19, 15.
25. Ren, Y.; Cheng, Z.; Li, L.; Zhang, Y.; Dai, F.; Deng, L.; Wu, Y.; Gu, J.; Lin, Q.; Wang, X.; et al. BMAP: A Comprehensive and Reproducible Biomedical Data Analysis Platform. bioRxiv 2024.
26. Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.; Raychaudhuri, S. Fast, Sensitive and Accurate Integration of Single-Cell Data with Harmony. Nat. Methods 2019, 16, 1289–1296.
27. Cover, T.; Hart, P. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
28. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The Graph Neural Network Model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
29. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903.
30. Baum, L.E.; Petrie, T. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Ann. Math. Stat. 1966, 37, 1554–1563.
31. Fey, M.; Lenssen, J.E. Fast Graph Representation Learning with PyTorch Geometric. arXiv 2019, arXiv:1903.02428.
32. Traag, V.A.; Waltman, L.; van Eck, N.J. From Louvain to Leiden: Guaranteeing Well-Connected Communities. Sci. Rep. 2019, 9, 5233.
33. Wolf, F.A.; Hamey, F.K.; Plass, M.; Solana, J.; Dahlin, J.S.; Göttgens, B.; Rajewsky, N.; Simon, L.; Theis, F.J. PAGA: Graph Abstraction Reconciles Clustering with Trajectory Inference through a Topology Preserving Map of Single Cells. Genome Biol. 2019, 20, 59.
34. Tang, Z.; Kang, B.; Li, C.; Chen, T.; Zhang, Z. GEPIA2: An Enhanced Web Server for Large-Scale Expression Profiling and Interactive Analysis. Nucleic Acids Res. 2019, 47, W556–W560.
35. Hu, X.; Zhu, H.; Chen, B.; He, X.; Shen, Y.; Zhang, X.; Xu, Y.; Xu, X. The Oncogenic Role of Tubulin Alpha-1c Chain in Human Tumours. BMC Cancer 2022, 22, 498.
36. Jiang, Y.; Zhu, C.; Huang, H.; Huang, G.; Fu, B.; Xi, X. TUBA1C Is a Potential New Prognostic Biomarker and Promotes Bladder Urothelial Carcinoma Progression by Regulating the Cell Cycle. BMC Cancer 2023, 23, 716.
37. Lai, P.M.; Chan, K.M. Roles of Histone H2A Variants in Cancer Development, Prognosis, and Treatment. Int. J. Mol. Sci. 2024, 25, 3144.
38. Dong, M.; Chen, J.; Deng, Y.; Zhang, D.; Dong, L.; Sun, D. H2AFZ Is a Prognostic Biomarker Correlated to TP53 Mutation and Immune Infiltration in Hepatocellular Carcinoma. Front. Oncol. 2021, 11, 701736.






Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, Y.; Yang, Y.; Kong, Y.; Zhong, B.; Nakai, K.; Lu, H. Toward Graph-Based Decoding of Tumor Evolution: Spatial Inference of Copy Number Variations. Diagnostics 2025, 15, 3169. https://doi.org/10.3390/diagnostics15243169

