# SMCKAT, a Sequential Multi-Dimensional CNV Kernel-Based Association Test

## Abstract

## 1. Introduction

## 2. Method and Materials

#### 2.1. Pair CNV Group Kernel

#### 2.2. Whole Genome CNV Group Kernel

#### 2.3. Kernel-Based Association Test

#### 2.4. Common and Rare CNV Data

## 3. Simulation Studies

#### Simulation Results

## 4. Real Data Application Results

#### 4.1. CNV Analysis on Rhabdomyosarcoma Data Set

#### 4.2. CNV Analysis on Cytogenetic Bands in RMS

#### 4.3. CNV Analysis on Autism Data Set

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- National Human Genome Research Institute. Genetics vs. Genomics Fact Sheet; National Human Genome Research Institute: Bethesda, MD, USA, 2018. Available online: https://www.genome.gov/about-genomics/fact-sheets/Genetics-vs-Genomics (accessed on 20 November 2021).
- Frazer, K.A.; Murray, S.S.; Schork, N.J.; Topol, E.J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. **2009**, 10, 241–251.
- Edwards, D.; Forster, J.W.; Chagné, D.; Batley, J. What Are SNPs? In Association Mapping in Plants; Springer: Berlin/Heidelberg, Germany, 2007; pp. 41–52.
- Schrider, D.R.; Hahn, M.W. Gene copy-number polymorphism in nature. Proc. R. Soc. B Biol. Sci. **2010**, 277, 3213–3221.
- Monlong, J.; Cossette, P.; Meloche, C.; Rouleau, G.; Girard, S.L.; Bourque, G. Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res. **2018**, 46, 7236–7249.
- Zhan, X.; Girirajan, S.; Zhao, N.; Wu, M.C.; Ghosh, D. A novel copy number variants kernel association test with application to autism spectrum disorders studies. Bioinformatics **2016**, 32, 3603–3610.
- Brucker, A.; Lu, W.; West, R.M.; Yu, Q.Y.; Hsiao, C.K.; Hsiao, T.H.; Lin, C.H.; Magnusson, P.K.; Sullivan, P.F.; Szatkiewicz, J.P.; et al. Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis. PLoS Comput. Biol. **2020**, 16, e1007797.
- Esfahani, N.M.; Catchpoole, D.; Khan, J.; Kennedy, P.J. MCKAT, a multi-dimensional copy number variant kernel association test. BMC Bioinform. **2021**.
- Liu, D.; Ghosh, D.; Lin, X. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinform. **2008**, 9, 292.
- Wu, M.C.; Kraft, P.; Epstein, M.P.; Taylor, D.M.; Chanock, S.J.; Hunter, D.J.; Lin, X. Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet. **2010**, 86, 929–942.
- Davies, R.B. The distribution of a linear combination of χ² random variables. J. R. Stat. Soc. Ser. C Appl. Stat. **1980**, 29, 323–333.
- Shern, J.F.; Chen, L.; Chmielecki, J.; Wei, J.S.; Patidar, R.; Rosenberg, M.; Ambrogio, L.; Auclair, D.; Wang, J.; Song, Y.K.; et al. Comprehensive genomic analysis of Rhabdomyosarcoma reveals a landscape of alterations affecting a common genetic axis in fusion-positive and fusion-negative tumors. Cancer Discov. **2014**, 4, 216–231.
- Girirajan, S.; Brkanac, Z.; Coe, B.P.; Baker, C.; Vives, L.; Vu, T.H.; Shafer, N.; Bernier, R.; Ferrero, G.B.; Silengo, M.; et al. Relative burden of large CNVs on a range of neurodevelopmental phenotypes. PLoS Genet. **2011**, 7, e1002334.
- El Demellawy, D.; McGowan-Jordan, J.; De Nanassy, J.; Chernetsova, E.; Nasr, A. Update on molecular findings in rhabdomyosarcoma. Pathology **2017**, 49, 238–246.
- Sun, X.; Guo, W.; Shen, J.K.; Mankin, H.J.; Hornicek, F.J.; Duan, Z. Rhabdomyosarcoma: Advances in molecular and cellular biology. Sarcoma **2015**, 2015, 232010.
- Nishimura, R.; Takita, J.; Sato-Otsubo, A.; Kato, M.; Koh, K.; Hanada, R.; Tanaka, Y.; Kato, K.; Maeda, D.; Fukayama, M.; et al. Characterization of genetic lesions in Rhabdomyosarcoma using a high-density single nucleotide polymorphism array. Cancer Sci. **2013**, 104, 856–864.

**Figure 1.** Generating the CNV profile ${R}_{i}$, where CNVs are sorted with respect to their chromosomal position. A, B, …, and F are arbitrary CNVs at the $m^{\text{th}}$, $(m+1)^{\text{th}}$, …, and $(m+n)^{\text{th}}$ positions, and ${G}_{i}$ is a group of CNVs of size $n$.
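As a rough illustration of the profile construction described in the Figure 1 caption, the sketch below sorts CNVs by chromosomal position and cuts the sorted profile into groups of size $n$. The `CNV` record, its fields, and the choice of consecutive non-overlapping groups are assumptions made for this example; the caption does not state whether groups overlap or how a CNV is encoded.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    """Hypothetical CNV record; field names are illustrative assumptions."""
    chrom: int
    start: int
    end: int
    kind: str  # e.g. "gain" or "loss" (assumed encoding)


def build_profile(cnvs: List[CNV]) -> List[CNV]:
    """Sort CNVs by chromosome, then by start and end position."""
    return sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end))


def split_into_groups(profile: List[CNV], n: int) -> List[List[CNV]]:
    """Cut a sorted profile into consecutive groups of size n.

    Assumption: groups do not overlap. The last group may hold fewer
    than n CNVs when the profile length is not a multiple of n.
    """
    if n < 1:
        raise ValueError("group size n must be at least 1")
    return [profile[k:k + n] for k in range(0, len(profile), n)]


if __name__ == "__main__":
    cnvs = [
        CNV(8, 500, 900, "gain"),
        CNV(2, 100, 200, "loss"),
        CNV(8, 100, 300, "loss"),
        CNV(2, 50, 80, "gain"),
        CNV(11, 10, 20, "gain"),
    ]
    for group in split_into_groups(build_profile(cnvs), n=2):
        print([(c.chrom, c.start) for c in group])
```

With a sliding-window grouping instead, only the step of the `range` call would change.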

**Figure 2.** Aligning CNVs within two CNV groups of size $n$, ${G}_{i}$ and ${G}_{j}$, to generate $n$ CNV pairs.

**Figure 4.** Aligning ${G}_{z}^{i}$ to the group with the highest similarity among ${G}_{z-1}^{j}$, ${G}_{z}^{j}$ and ${G}_{z+1}^{j}$, i.e., the best group-to-group correspondence.
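The alignment steps named in the Figure 2 and Figure 4 captions can be sketched as follows: CNVs in two groups are paired position by position, and group $z$ of subject $i$ is matched to whichever of groups $z-1$, $z$ and $z+1$ of subject $j$ scores highest. The similarity used here, `toy_similarity`, is a placeholder invented for this example (the fraction of aligned pairs with identical labels), not the kernel of the paper, and CNVs are reduced to plain string labels for brevity.

```python
from typing import Callable, List, Sequence, Tuple

Group = Sequence[str]  # assumption: each CNV is reduced to a label


def pair_cnvs(group_a: Group, group_b: Group) -> List[Tuple[str, str]]:
    """Align two groups position by position into CNV pairs.

    zip stops at the shorter group, so unequal sizes are truncated.
    """
    return list(zip(group_a, group_b))


def toy_similarity(group_a: Group, group_b: Group) -> float:
    """Placeholder score: fraction of aligned pairs with equal labels.

    This stands in for the pair CNV group kernel, which is not
    reproduced here.
    """
    pairs = pair_cnvs(group_a, group_b)
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def best_neighbor_alignment(
    groups_i: Sequence[Group],
    groups_j: Sequence[Group],
    z: int,
    similarity: Callable[[Group, Group], float] = toy_similarity,
) -> Tuple[int, float]:
    """Return (index, score) of the best match for groups_i[z]
    among groups_j[z-1], groups_j[z] and groups_j[z+1].

    Candidates are tried in the order z, z-1, z+1, so a tie keeps
    the same-index group (a choice made for this sketch).
    """
    candidates = [k for k in (z, z - 1, z + 1) if 0 <= k < len(groups_j)]
    if not candidates:
        raise ValueError("no neighboring group available in groups_j")
    scored = [(k, similarity(groups_i[z], groups_j[k])) for k in candidates]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    groups_i = [["gain", "loss"], ["loss", "loss"], ["gain", "gain"]]
    groups_j = [["loss", "loss"], ["gain", "loss"], ["gain", "gain"]]
    for z in range(len(groups_i)):
        print(z, best_neighbor_alignment(groups_i, groups_j, z))
```

Passing a different `similarity` callable is enough to swap the placeholder for a real kernel without touching the neighbor search.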

**Figure 6.** p-value-based QQ-plots of MCKAT and CKAT under the first (**a**) and second (**b**) simulation scenarios.

**Figure 8.** Empirical power of SMCKAT, MCKAT and CKAT under the second simulation scenario, common CNV data.

**Table 1.** p-values from testing the association between CNV sequential order and RMS subtype for different CNV group sizes. $n$ is the group size and (#) denotes the total number of CNVs on the chromosome.

Chr. | #CNV | n = 1 | n = 2 | n = 3 | n = 4 | n = 5 | n = 6 |
---|---|---|---|---|---|---|---|
2 | 5584 | $2.45\times {10}^{-2}$ | $5.10\times {10}^{-2}$ | $8.31\times {10}^{-2}$ | $3.49\times {10}^{-3}$ | $4.25\times {10}^{-3}$ | $3.21\times {10}^{-2}$ |
8 | 5365 | $2.61\times {10}^{-5}$ | $7.37\times {10}^{-6}$ | $1.13\times {10}^{-6}$ | $7.63\times {10}^{-7}$ | $4.99\times {10}^{-8}$ | 0 |
11 | 3449 | $2.03\times {10}^{-2}$ | $8.26\times {10}^{-3}$ | $2.93\times {10}^{-3}$ | $1.54\times {10}^{-3}$ | $5.82\times {10}^{-4}$ | $1.20\times {10}^{-4}$ |
13 | 2462 | $1.80\times {10}^{-3}$ | $3.56\times {10}^{-3}$ | $4.86\times {10}^{-3}$ | $6.06\times {10}^{-3}$ | $7.89\times {10}^{-3}$ | $6.23\times {10}^{-2}$ |

**Table 2.** p-values from testing the association between RMS subtype and CNVs in the chromosome 8 cytogenetic bands with SMCKAT, MCKAT and CKAT. (*) denotes a significant association between RMS subtype and CNVs; (#) denotes the total number of CNVs on the band.

Arm | Band | Start | Stop | #CNVs | SMCKAT | MCKAT | CKAT |
---|---|---|---|---|---|---|---|
p | 23.3 | 1 | 2,300,000 | 113 | $9.6\times {10}^{-2}$ | $3.4\times {10}^{-4}$ * | $4.917\times {10}^{-1}$ |
p | 23.2 | 2,300,001 | 6,300,000 | 85 | $3.0\times {10}^{-2}$ | $2.0\times {10}^{-2}$ | $3.939\times {10}^{-1}$ |
p | 23.1 | 6,300,001 | 12,800,000 | 304 | $1.8\times {10}^{-4}$ * | $4.7\times {10}^{-8}$ * | $4.755\times {10}^{-1}$ |
p | 22.0 | 12,800,001 | 19,200,000 | 101 | $2.8\times {10}^{-2}$ | $8.2\times {10}^{-3}$ | $4.327\times {10}^{-1}$ |
p | 21.3 | 19,200,001 | 23,500,000 | 102 | $1.1\times {10}^{-1}$ | $2.5\times {10}^{-2}$ | $4.237\times {10}^{-1}$ |
p | 21.2 | 23,500,001 | 27,500,000 | 82 | $3.4\times {10}^{-2}$ | $3.6\times {10}^{-2}$ | $4.717\times {10}^{-1}$ |
p | 21.1 | 27,500,001 | 29,000,000 | 50 | $2.5\times {10}^{-2}$ | $1.6\times {10}^{-2}$ | $4.948\times {10}^{-1}$ |
p | 12.0 | 29,000,001 | 36,700,000 | 190 | $1.3\times {10}^{-6}$ * | $3.7\times {10}^{-5}$ * | $4.658\times {10}^{-1}$ |
p | 11.23 | 36,700,001 | 38,500,000 | 48 | 1.0 | $3.7\times {10}^{-3}$ | $3.916\times {10}^{-1}$ |
p | 11.22 | 38,500,001 | 39,900,000 | 57 | $9.3\times {10}^{-2}$ | $8.4\times {10}^{-3}$ | $4.613\times {10}^{-1}$ |
p | 11.21 | 39,900,001 | 43,200,000 | 147 | $4.4\times {10}^{-3}$ | $1.0\times {10}^{-4}$ * | $3.655\times {10}^{-1}$ |
p | 11.1 | 43,200,001 | 45,200,000 | 72 | $8.8\times {10}^{-2}$ | $2.8\times {10}^{-2}$ | $4.584\times {10}^{-1}$ |
q | 11.1 | 45,200,001 | 47,200,000 | 41 | 1.0 | $2.1\times {10}^{-2}$ | $4.436\times {10}^{-1}$ |
q | 11.21 | 47,200,001 | 51,300,000 | 200 | $4.4\times {10}^{-3}$ | $8.4\times {10}^{-5}$ * | $4.064\times {10}^{-1}$ |
q | 11.22 | 51,300,001 | 51,700,000 | 6 | $9.3\times {10}^{-1}$ | $4.7\times {10}^{-2}$ | $4.200\times {10}^{-1}$ |
q | 11.23 | 51,700,001 | 54,600,000 | 61 | 1.0 | $6.1\times {10}^{-2}$ | $4.657\times {10}^{-1}$ |
q | 12.1 | 54,600,001 | 60,600,000 | 177 | $9.1\times {10}^{-3}$ | $7.0\times {10}^{-4}$ * | $4.505\times {10}^{-1}$ |
q | 12.2 | 60,600,001 | 61,300,000 | 18 | 1.0 | $3.3\times {10}^{-2}$ | $4.502\times {10}^{-1}$ |
q | 12.3 | 61,300,001 | 65,100,000 | 134 | $4.9\times {10}^{-2}$ | $1.1\times {10}^{-2}$ | $4.110\times {10}^{-1}$ |
q | 13.1 | 65,100,001 | 67,100,000 | 71 | $4.4\times {10}^{-2}$ | $5.8\times {10}^{-3}$ | $4.427\times {10}^{-1}$ |
q | 13.2 | 67,100,001 | 69,600,000 | 54 | $5.8\times {10}^{-2}$ | $4.3\times {10}^{-3}$ | $4.659\times {10}^{-1}$ |
q | 13.3 | 69,600,001 | 72,000,000 | 62 | $1.4\times {10}^{-2}$ | $1.8\times {10}^{-3}$ | $3.762\times {10}^{-1}$ |
q | 21.11 | 72,000,001 | 74,600,000 | 144 | $4.8\times {10}^{-1}$ | $8.4\times {10}^{-3}$ | $3.325\times {10}^{-1}$ |
q | 21.12 | 74,600,001 | 74,700,000 | 1 | 1.0 | 1.0 | 1.0 |
q | 21.13 | 74,700,001 | 83,500,000 | 308 | $1.0\times {10}^{-2}$ | $2.6\times {10}^{-3}$ | $4.927\times {10}^{-1}$ |
q | 21.2 | 83,500,001 | 85,900,000 | 56 | $4.8\times {10}^{-2}$ | $2.9\times {10}^{-2}$ | $4.189\times {10}^{-1}$ |
q | 21.3 | 85,900,001 | 92,300,000 | 185 | $4.7\times {10}^{-3}$ | $1.0\times {10}^{-4}$ * | $4.215\times {10}^{-1}$ |
q | 22.1 | 92,300,001 | 97,900,000 | 182 | $1.7\times {10}^{-2}$ | $1.0\times {10}^{-2}$ | $3.072\times {10}^{-1}$ |
q | 22.2 | 97,900,001 | 100,500,000 | 103 | $4.5\times {10}^{-2}$ | $3.9\times {10}^{-3}$ | $4.395\times {10}^{-1}$ |
q | 22.3 | 100,500,001 | 105,100,000 | 162 | $1.2\times {10}^{-2}$ | $4.6\times {10}^{-3}$ | $4.458\times {10}^{-1}$ |
q | 23.1 | 105,100,001 | 109,500,000 | 135 | $2.8\times {10}^{-3}$ | $2.5\times {10}^{-3}$ | $4.017\times {10}^{-1}$ |
q | 23.2 | 109,500,001 | 111,100,000 | 33 | $9.8\times {10}^{-1}$ | $8.0\times {10}^{-1}$ | $3.005\times {10}^{-1}$ |
q | 23.3 | 111,100,001 | 116,700,000 | 185 | $1.1\times {10}^{-2}$ | $2.3\times {10}^{-3}$ | $4.419\times {10}^{-1}$ |
q | 24.11 | 116,700,001 | 118,300,000 | 53 | $4.6\times {10}^{-2}$ | $2.6\times {10}^{-2}$ | $4.705\times {10}^{-1}$ |
q | 24.12 | 118,300,001 | 121,500,000 | 109 | $2.5\times {10}^{-3}$ | $2.2\times {10}^{-3}$ | $4.068\times {10}^{-1}$ |
q | 24.13 | 121,500,001 | 126,300,000 | 151 | $2.2\times {10}^{-2}$ | $6.0\times {10}^{-3}$ | $4.856\times {10}^{-1}$ |
q | 24.21 | 126,300,001 | 130,400,000 | 208 | $5.0\times {10}^{-2}$ | $1.9\times {10}^{-2}$ | $3.922\times {10}^{-1}$ |
q | 24.22 | 130,400,001 | 135,400,000 | 155 | $5.5\times {10}^{-2}$ | $1.5\times {10}^{-2}$ | $4.638\times {10}^{-1}$ |
q | 24.23 | 135,400,001 | 138,900,000 | 162 | $2.8\times {10}^{-1}$ | $7.7\times {10}^{-3}$ | $4.512\times {10}^{-1}$ |
q | 24.3 | 138,900,001 | 145,138,636 | 354 | $8.8\times {10}^{-3}$ | $2.5\times {10}^{-8}$ * | $4.277\times {10}^{-1}$ |

**Table 3.** p-values from testing the association between CNV sequential order and ASD status for different CNV group sizes. $n$ is the group size.

n | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
p-value | 0 | $7.91\times {10}^{-9}$ | $3.09\times {10}^{-6}$ | $3.62\times {10}^{-4}$ | $4.89\times {10}^{-3}$ | $1.03\times {10}^{-1}$ |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Maus Esfahani, N.; Catchpoole, D.; Kennedy, P.J.
SMCKAT, a Sequential Multi-Dimensional CNV Kernel-Based Association Test. *Life* **2021**, *11*, 1302.
https://doi.org/10.3390/life11121302
