# SINFONIA: Scalable Identification of Spatially Variable Genes for Deciphering Spatial Domains

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Construction of Spatial Neighbor Graph

#### 2.2. Calculation of Spatial Autocorrelation Statistics
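
As a minimal sketch of what spatial autocorrelation statistics measure, the block below computes Moran's I and Geary's C, the two classical statistics by Moran and Geary listed in the references. It assumes a dense, symmetric neighbor-weight matrix `w` with a zero diagonal and a single gene's expression vector `x`. The function names and the toy chain graph are illustrative assumptions, not SINFONIA's actual implementation.

```python
from typing import Sequence


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Moran's I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    w_sum = sum(sum(row) for row in w)  # W: sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


def gearys_c(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Geary's C = ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(x)
    mean = sum(x) / n
    w_sum = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) / (2 * w_sum)) * num / den


# Toy example (hypothetical data): four spots on a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, 0.0, 0.0]    # similar values sit next to each other
alternating = [1.0, 0.0, 1.0, 0.0]  # neighbors always differ

print(morans_i(clustered, chain), gearys_c(clustered, chain))
print(morans_i(alternating, chain), gearys_c(alternating, chain))
```

Positive Moran's I and Geary's C below 1 indicate that neighboring spots have similar expression. Negative Moran's I and Geary's C above 1 indicate the opposite.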

#### 2.3. Identification of Spatially Variable Genes

#### 2.4. Implementation and Usage of SINFONIA

#### 2.5. Data Collection

#### 2.6. Performance Evaluation

##### 2.6.1. Spatial Clustering

##### 2.6.2. Domain Resolution

##### 2.6.3. Latent Representation

##### 2.6.4. Spot Visualization

##### 2.6.5. Computational Efficiency

#### 2.7. Baseline Methods

## 3. Results

#### 3.1. SINFONIA Enables Accurate Spatial Clustering

#### 3.2. SINFONIA Effectively Characterizes Spatial Patterns

#### 3.3. SINFONIA Facilitates Interpretable Spot Visualization

#### 3.4. SINFONIA Is Robust and Computationally Efficient

#### 3.5. SINFONIA Improves the Performance of Other Spatial Embedding Methods

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Svensson, V.; Teichmann, S.A.; Stegle, O. SpatialDE: Identification of spatially variable genes. Nat. Methods **2018**, 15, 343–346.
2. Hao, Y.; Hao, S.; Andersen-Nissen, E.; Mauck, W.M., III; Zheng, S.; Butler, A.; Lee, M.J.; Wilk, A.J.; Darby, C.; Zager, M.; et al. Integrated analysis of multimodal single-cell data. Cell **2021**, 184, 3573–3587.e29.
3. Palla, G.; Spitzer, H.; Klein, M.; Fischer, D.; Schaar, A.C.; Kuemmerle, L.B.; Rybakov, S.; Ibarra, I.L.; Holmberg, O.; Virshup, I.; et al. Squidpy: A scalable framework for spatial omics analysis. Nat. Methods **2022**, 19, 171–178.
4. Abdelaal, T.; Mourragui, S.; Mahfouz, A.; Reinders, M.J.T. SpaGE: Spatial Gene Enhancement using scRNA-seq. Nucleic Acids Res. **2020**, 48, E107.
5. Hu, J.; Li, X.; Coleman, K.; Schroeder, A.; Ma, N.; Irwin, D.J.; Lee, E.B.; Shinohara, R.T.; Li, M. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods **2021**, 18, 1342–1351.
6. Li, K.; Yan, C.; Li, C.; Chen, L.; Zhao, J.; Zhang, Z.; Bao, S.; Sun, J.; Zhou, M. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. Mol. Ther. Nucleic Acids **2022**, 27, 404–411.
7. Lu, L.; Welch, J.D. PyLiger: Scalable single-cell multi-omic data integration in Python. Bioinformatics **2022**, 38, 2946–2948.
8. Wolf, F.A.; Angerer, P.; Theis, F.J. SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. **2018**, 19, 15.
9. Gayoso, A.; Lopez, R.; Xing, G.; Boyeau, P.; Valiollah Pour Amiri, V.; Hong, J.; Wu, K.; Jayasuriya, M.; Mehlman, E.; Langevin, M.; et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. **2022**, 40, 163–166.
10. Bae, S.; Choi, H.; Lee, D.S. Discovery of molecular features underlying the morphological landscape by integrating spatial transcriptomic data with deep features of tissue images. Nucleic Acids Res. **2021**, 49, e55.
11. BinTayyash, N.; Georgaka, S.; John, S.T.; Ahmed, S.; Boukouvalas, A.; Hensman, J.; Rattray, M. Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics **2021**, 37, 3788–3795.
12. Hao, M.; Hua, K.; Zhang, X. SOMDE: A scalable method for identifying spatially variable genes with self-organizing map. Bioinformatics **2021**, 37, 4392–4398.
13. Zhu, Q.; Shah, S.; Dries, R.; Cai, L.; Yuan, G.C. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat. Biotechnol. **2018**, 36, 1183–1190.
14. Dong, K.; Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. **2022**, 13, 1739.
15. Chen, S.; Zhang, B.; Chen, X.; Zhang, X.; Jiang, R. stPlus: A reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics **2021**, 37, i299–i307.
16. Zeng, Z.; Li, Y.; Li, Y.; Luo, Y. Statistical and machine learning methods for spatially resolved transcriptomics data analysis. Genome Biol. **2022**, 23, 83.
17. Moran, P.A. Notes on continuous stochastic phenomena. Biometrika **1950**, 37, 17–23.
18. Geary, R.C. The Contiguity Ratio and Statistical Mapping. Inc. Stat. **1954**, 5, 115–146.
19. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods **2020**, 17, 261–272.
20. Lahnemann, D.; Koster, J.; Szczurek, E.; McCarthy, D.J.; Hicks, S.C.; Robinson, M.D.; Vallejos, C.A.; Campbell, K.R.; Beerenwinkel, N.; Mahfouz, A.; et al. Eleven grand challenges in single-cell data science. Genome Biol. **2020**, 21, 31.
21. Pardo, B.; Spangler, A.; Weber, L.M.; Page, S.C.; Hicks, S.C.; Jaffe, A.E.; Martinowich, K.; Maynard, K.R.; Collado-Torres, L. spatialLIBD: An R/Bioconductor package to visualize spatially-resolved transcriptomics data. BMC Genom. **2022**, 23, 434.
22. Maynard, K.R.; Collado-Torres, L.; Weber, L.M.; Uytingco, C.; Barry, B.K.; Williams, S.R.; Catallini, J.L., II; Tran, M.N.; Besich, Z.; Tippani, M.; et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. **2021**, 24, 425–436.
23. 10XGenomics. Visium Spatial Gene Expression Reagent Kits User Guide. 2021. Available online: https://www.10xgenomics.com/support/spatial-gene-expression-fresh-frozen/documentation/steps/library-construction/visium-spatial-gene-expression-reagent-kits-user-guide (accessed on 17 July 2022).
24. Sunkin, S.M.; Ng, L.; Lau, C.; Dolbeare, T.; Gilbert, T.L.; Thompson, C.L.; Hawrylycz, M.; Dang, C. Allen Brain Atlas: An integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res. **2013**, 41, D996–D1008.
25. Gracia Villacampa, E.; Larsson, L.; Mirzazadeh, R.; Kvastad, L.; Andersson, A.; Mollbrink, A.; Kokaraki, G.; Monteil, V.; Schultz, N.; Appelberg, K.S.; et al. Genome-wide spatial expression profiling in formalin-fixed tissues. Cell Genom. **2021**, 1, 100065.
26. Stickels, R.R.; Murray, E.; Kumar, P.; Li, J.; Marshall, J.L.; Di Bella, D.J.; Arlotta, P.; Macosko, E.Z.; Chen, F. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. **2021**, 39, 313–319.
27. Chen, A.; Liao, S.; Cheng, M.; Ma, K.; Wu, L.; Lai, Y.; Qiu, X.; Yang, J.; Xu, J.; Hao, S.; et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell **2022**, 185, 1777–1792.e1721.
28. Stuart, T.; Butler, A.; Hoffman, P.; Hafemeister, C.; Papalexi, E.; Mauck, W.M., 3rd; Hao, Y.; Stoeckius, M.; Smibert, P.; Satija, R. Comprehensive Integration of Single-Cell Data. Cell **2019**, 177, 1888–1902.e21.
29. Chen, H.; Lareau, C.; Andreani, T.; Vinyard, M.E.; Garcia, S.P.; Clement, K.; Andrade-Navarro, M.A.; Buenrostro, J.D.; Pinello, L. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biol. **2019**, 20, 241.
30. Danese, A.; Richter, M.L.; Chaichoompu, K.; Fischer, D.S.; Theis, F.J.; Colome-Tatche, M. EpiScanpy: Integrated single-cell epigenomic analysis. Nat. Commun. **2021**, 12, 5228.
31. Chen, S.; Wang, R.; Long, W.; Jiang, R. ASTER: Accurately estimating the number of cell types in single-cell chromatin accessibility data. Bioinformatics **2023**, 39, btac842.
32. Chen, S.; Yan, G.; Zhang, W.; Li, J.; Jiang, R.; Lin, Z. RA3 is a reference-guided approach for epigenetic characterization of single cells. Nat. Commun. **2021**, 12, 2177.
33. Vinh, N.X.; Epps, J.; Bailey, J. Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 1073–1080.
34. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. **1985**, 2, 193–218.
35. Rosenberg, A.; Hirschberg, J. V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure. 2007, pp. 410–420. Available online: https://aclanthology.org/D07-1043.pdf (accessed on 17 July 2022).
36. Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. **2003**, 3, 583–617.
37. Romano, S.; Vinh, N.X.; Bailey, J.; Verspoor, K. Adjusting for chance clustering comparison measures. J. Mach. Learn. Res. **2016**, 17, 1–32.
38. Cao, Z.-J.; Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. **2022**, 40, 1458–1466.
39. Abdelaal, T.; Michielsen, L.; Cats, D.; Hoogduin, D.; Mei, H.; Reinders, M.J.T.; Mahfouz, A. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol. **2019**, 20, 194.
40. Ma, W.; Su, K.; Wu, H. Evaluation of some aspects in supervised cell type identification for single-cell RNA-seq: Classifier, feature selection, and reference construction. Genome Biol. **2021**, 22, 264.
41. Korsunsky, I.; Millard, N.; Fan, J.; Slowikowski, K.; Zhang, F.; Wei, K.; Baglaenko, Y.; Brenner, M.; Loh, P.R.; Raychaudhuri, S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods **2019**, 16, 1289–1296.
42. Shang, L.; Zhou, X. Spatially Aware Dimension Reduction for Spatial Transcriptomics. Nat. Commun. **2022**, 13, 7203.
43. Mangul, S.; Martin, L.S.; Eskin, E.; Blekhman, R. Improving the usability and archival stability of bioinformatics software. Genome Biol. **2019**, 20, 47.

**Figure 2.** Louvain clustering results with (**A**) default resolution and (**B**) searched resolution (given a specified number of clusters) on 12 DLPFC datasets. Leiden clustering results with (**C**) default resolution and (**D**) searched resolution on 12 DLPFC datasets.
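
The "searched resolution" setting in the caption above can be pictured as a one-dimensional search over the clustering resolution parameter until the requested number of clusters is reached. The sketch below uses bisection and assumes the cluster count is non-decreasing in the resolution, which holds only approximately for Louvain and Leiden in practice. `search_resolution` and the stand-in `toy_count_clusters` are hypothetical names introduced for illustration, and the bounds and step limit are arbitrary. The search procedure actually used for the figure may differ.

```python
from typing import Callable, Optional


def search_resolution(
    count_clusters: Callable[[float], int],
    target: int,
    low: float = 0.0,
    high: float = 4.0,
    max_steps: int = 30,
) -> Optional[float]:
    """Bisect the resolution until count_clusters(resolution) equals target.

    Returns None if no matching resolution is found within max_steps.
    """
    for _ in range(max_steps):
        mid = (low + high) / 2.0
        k = count_clusters(mid)
        if k == target:
            return mid
        if k < target:
            low = mid   # too few clusters: raise the resolution
        else:
            high = mid  # too many clusters: lower the resolution
    return None


def toy_count_clusters(resolution: float) -> int:
    """Stand-in for running a clustering algorithm and counting its labels."""
    return int(resolution * 5) + 1


resolution = search_resolution(toy_count_clusters, target=7)
print(resolution, toy_count_clusters(resolution))
```

In a real pipeline, `count_clusters` would run the clustering at the given resolution and return the number of distinct labels, so each bisection step costs one clustering run.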

**Figure 3.** Louvain clustering results with (**A**) default resolution and (**B**) searched resolution on the MBC dataset. Leiden clustering results with (**C**) default resolution and (**D**) searched resolution on the MBC dataset.

**Figure 4.** (**A**) Performance of MAP and MCVA on the 12 DLPFC datasets. (**B**) Performance of iLISImd and iLISIm on the 12 DLPFC datasets. (**C**) Performance of MAP, MCVA, iLISImd and iLISIm on the MBC dataset.

**Figure 5.** Visualization of datasets of (**A**) a mouse hippocampus and (**B**) a mouse olfactory bulb with tissue regions annotated from the Allen Brain Atlas and spatial domains detected by different methods.

**Figure 6.** (**A**) p-values of two-sided Wilcoxon signed-rank tests on the performance of different numbers of SVGs identified by SINFONIA. (**B**) Computation time of different methods on 12 DLPFC datasets. (**C**) Computation time of different implementations for calculating spatial autocorrelation statistics on 12 DLPFC datasets.

**Figure 7.** Performance improvement of STAGATE using SINFONIA on (**A**) the 12 DLPFC datasets and (**B**) the MBC dataset. Performance was evaluated in terms of spatial clustering, domain resolution and prediction ability.

| Dataset | # of Spots | # of Genes | # of Domains | Sparsity | Protocol | Species | Reference |
|---|---|---|---|---|---|---|---|
| DLPFC_151507 | 4221 | 33,538 | 7 | 0.958 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151508 | 4381 | 33,538 | 7 | 0.964 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151509 | 4788 | 33,538 | 7 | 0.957 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151510 | 4595 | 33,538 | 7 | 0.959 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151669 | 3636 | 33,538 | 5 | 0.946 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151670 | 3484 | 33,538 | 5 | 0.950 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151671 | 4093 | 33,538 | 5 | 0.945 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151672 | 3888 | 33,538 | 5 | 0.947 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151673 | 3611 | 33,538 | 7 | 0.934 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151674 | 3635 | 33,538 | 7 | 0.920 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151675 | 3566 | 33,538 | 7 | 0.946 | 10X Visium | Homo sapiens | [22] |
| DLPFC_151676 | 3431 | 33,538 | 7 | 0.942 | 10X Visium | Homo sapiens | [22] |
| Brain coronal | 2800 | 32,285 | 15 | 0.870 | 10X Visium | Mus musculus | [23] |
| Hippocampus | 53,208 | 23,264 | - | 0.982 | Slide-seqV2 | Mus musculus | [26] |
| Olfactory bulb | 19,527 | 27,106 | - | 0.987 | Stereo-seq | Mus musculus | [27] |
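
The table does not define its Sparsity column. A common convention, assumed here, is the fraction of zero entries in the spots × genes count matrix. The sketch below shows that calculation on a small made-up matrix. The function name `sparsity` and the toy counts are illustrative and are not taken from the datasets above.

```python
from typing import Sequence


def sparsity(counts: Sequence[Sequence[int]]) -> float:
    """Fraction of zero entries in a spots x genes count matrix (assumed definition)."""
    total = sum(len(row) for row in counts)
    zeros = sum(1 for row in counts for value in row if value == 0)
    return zeros / total


# Hypothetical 2 spots x 3 genes matrix: 4 of the 6 entries are zero.
toy_counts = [
    [0, 0, 3],
    [0, 1, 0],
]
print(round(sparsity(toy_counts), 3))
```

Under this definition, a sparsity of 0.958 would mean that about 95.8% of the spot-gene entries are zero.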

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Jiang, R.; Li, Z.; Jia, Y.; Li, S.; Chen, S.
SINFONIA: Scalable Identification of Spatially Variable Genes for Deciphering Spatial Domains. *Cells* **2023**, *12*, 604.
https://doi.org/10.3390/cells12040604
