Article

Global Autozygosity Is Associated with Cancer Risk, Mutational Signature and Prognosis

1 Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
2 School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
3 Department of Computer Science, University of South Carolina, Columbia, SC 29208, USA
4 Department of Biostatistics, Vanderbilt University, Nashville, TN 37232, USA
5 Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
* Author to whom correspondence should be addressed.
Cancers 2020, 12(12), 3646; https://doi.org/10.3390/cancers12123646
Submission received: 3 October 2020 / Revised: 25 November 2020 / Accepted: 1 December 2020 / Published: 4 December 2020
(This article belongs to the Special Issue The Biomarkers for the Diagnosis and Prognosis in Cancer)


Simple Summary

Global autozygosity in the form of runs of homozygosity is associated with various diseases. Heterozygosity ratio, an alternative measure of global autozygosity, is used to assess cancer risk in this study. Our analysis shows strong and consistent associations between heterozygosity ratios and various cancer types. Further analysis reveals the heterozygosity ratio’s potential connections to mutational signatures and cancer prognosis.

Abstract

Global autozygosity quantifies the genome-wide levels of homozygous and heterozygous variants. It is a signature of non-random mating, although it can also be driven by other factors, and it has been used to assess risk in various diseases. However, the association between global autozygosity and cancer risk has not been studied. Using data from 4057 cancer subjects and 1668 healthy controls, we found strong associations between global autozygosity and risk in ten different cancer types. For example, the heterozygosity ratio was significantly associated with breast invasive carcinoma in Blacks and with skin cutaneous melanoma in Caucasian males. We also discovered eleven associations between global autozygosity and mutational signatures, which may explain part of the etiology. Furthermore, four significant associations for the heterozygosity ratio were revealed in disease-specific survival analyses. This study demonstrates that global autozygosity is effective for cancer risk assessment.


1. Introduction

The human genome comprises approximately three billion base pairs. Single nucleotide polymorphisms (SNPs) can affect the risk of many diseases, as shown by numerous genome-wide association studies (GWAS). According to the GWAS Catalog (May 2020), 4424 unique SNPs have been found to influence cancer risk at p < 10⁻⁵. While a SNP describes the allelic state at a single genomic position, global heterozygosity and homozygosity describe zygosity across the whole genome. An individual is heterozygous at a SNP when its two alleles differ and homozygous when both alleles are the same. Global homozygosity is often measured as Runs of Homozygosity (ROH) [1], which are contiguous genomic segments containing no (or very few) heterozygous SNPs. Associations have been reported between ROH and many phenotypes, including height [2], schizophrenia [3], and Alzheimer's disease [4]. ROH can be calculated with common genomic toolboxes such as PLINK [5] and BCFtools [6]. The summary statistic used for ROH varies: some studies [2,3,4] used the number of ROH segments detected above a minimum threshold, whereas another [7] used the median ROH length. In either case, ROH computation depends on many factors, such as the density and quality of the SNP data, the number of heterozygous SNPs tolerated within an ROH, and the size of the sliding window. ROH has been shown to be highly sensitive to SNP density (the genome-wide coverage of SNPs on an array), and genotyping arrays with different SNP densities may produce very different ROH results [7]. These inconsistencies in ROH computation can lead to contradictory results in ROH association analyses [8,9,10].
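To make this parameter dependence concrete, here is a minimal, hypothetical sketch of ROH calling along one chromosome. It is not PLINK's or BCFtools' algorithm: the greedy scan, the function name `call_roh`, the 0/1/2 genotype coding, and the thresholds `min_snps` and `max_het` are all illustrative assumptions. Runs are reported in SNP-index units; a real caller would work in physical positions.

```python
from typing import List, Sequence, Tuple

# Hypothetical genotype coding (an assumption for this sketch):
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous non-reference.
HET = 1


def call_roh(genotypes: Sequence[int], min_snps: int = 5,
             max_het: int = 0) -> List[Tuple[int, int]]:
    """Greedy left-to-right scan for runs of homozygosity.

    Returns half-open (start, end) SNP-index intervals that contain at
    most `max_het` heterozygous calls and at least `min_snps` SNPs.
    """
    runs: List[Tuple[int, int]] = []
    start = 0   # first SNP of the current candidate run
    hets = 0    # heterozygous calls accepted so far in this run
    for i, g in enumerate(genotypes):
        if g != HET:
            continue
        hets += 1
        if hets > max_het:
            # This het breaks the run: keep the run if it is long enough,
            # then restart scanning just after the breaking site.
            if i - start >= min_snps:
                runs.append((start, i))
            start, hets = i + 1, 0
    # Close the final run at the end of the chromosome.
    if len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


genotypes = [0, 2, 0, 2, 0, 1, 0, 0, 2, 0, 0, 1, 2, 0]
print(call_roh(genotypes, min_snps=5, max_het=0))       # [(0, 5), (6, 11)]
print(call_roh(genotypes, min_snps=5, max_het=1))       # [(0, 11)]
print(call_roh(genotypes[::2], min_snps=5, max_het=0))  # [(0, 7)]
```

Tolerating one heterozygous call merges the two strict runs into one, and thinning the same sequence to every other SNP drops both heterozygous sites, so the whole chromosome becomes a single run. Both effects mirror the sensitivity to tolerated heterozygotes and to SNP density described above.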
An alternative measure of autozygosity is the Heterozygosity Ratio (HR), the ratio between the number of heterozygous SNPs and the number of non-reference homozygous SNPs. HR was originally proposed as a quality control parameter for SNP data because it has a theoretically expected value of 2 [11]. A subsequent study showed that the observed average HR depends on race: only individuals of African ancestry have a ratio close to 2, whereas the HRs of other races are substantially lower in empirical data [12]. Unlike ROH, HR requires no adjustable parameters, and it has been shown to be more robust than ROH to variable SNP density [7]. ROH has previously been tested for association with cancer risk, and most of those studies found no association [13,14]; HR's association with cancer risk has not been evaluated. In this study, we focused on global HR measures and their associations with cancer risk, cancer prognosis, and the mutagenesis process.
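The expected value of 2 can be illustrated with a short calculation under assumptions we state explicitly, which may differ from the argument given in [11]: Hardy-Weinberg genotype frequencies at every site, and a neutral, constant-size population in which the density of non-reference allele frequencies p is proportional to 1/p. The expected numbers of heterozygous and homozygous non-reference sites are then proportional to ∫₀¹ 2p(1 − p)(1/p) dp = 1 and ∫₀¹ p²(1/p) dp = 1/2, giving a ratio of 2. The Python sketch below checks this numerically with a midpoint rule; `expected_hr` is a hypothetical helper.

```python
def expected_hr(n_bins: int = 10_000) -> float:
    """Midpoint-rule estimate of E[# het sites] / E[# hom non-ref sites].

    Assumes Hardy-Weinberg genotype frequencies at each site and a neutral
    site-frequency density proportional to 1/p for the non-reference
    allele frequency p (an idealized model, not fitted to any data).
    """
    width = 1.0 / n_bins
    het = 0.0
    hom_nonref = 0.0
    for k in range(n_bins):
        p = (k + 0.5) * width          # midpoint of the k-th frequency bin
        density = 1.0 / p              # neutral frequency spectrum ~ 1/p
        het += 2 * p * (1 - p) * density * width   # P(het) = 2p(1 - p)
        hom_nonref += p * p * density * width      # P(hom non-ref) = p^2
    return het / hom_nonref


print(round(expected_hr(), 6))  # 2.0
```

Real populations depart from these idealized assumptions (for example, through changes in population size), which is one possible reason empirical HR values differ from 2.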

2. Results

2.1. HRNonRef vs. HRMinor vs. ROH

HR has traditionally been computed as the number of heterozygous SNPs divided by the number of non-reference homozygous SNPs, which we define as HRNonRef. Because the human reference genome was constructed from a small number of individuals, it has a limited sample size and poor representation of global populations. As a result, in many cases the reference allele is not the major allele of the population. By comparing The Cancer Genome Atlas (TCGA) SNP data with the reference alleles in the GRCh38 genome, we estimated that around 8% of the variants in the human reference genome may not represent the major allele of the population. This inconsistency between reference and major alleles could affect HR computation, a possibility that has not been previously considered. To investigate this potential difference, we define HRNonRef and HRMinor: HRNonRef follows the traditional HR definition based on reference alleles, whereas HRMinor uses major alleles defined by the study cohort. HRNonRef and HRMinor were computed for all TCGA and International Genome Sample Resource (IGSR) SNP datasets. HRNonRef and HRMinor were highly correlated within each of the Caucasian, Black, and Asian groups (Figure 1), although the correlation dropped slightly after imputation, most likely because of the additional noise introduced by imputation. The main difference between HRNonRef and HRMinor is scale, with HRMinor approximately 2 to 3 times HRNonRef. Given this similarity, we used HRNonRef for all subsequent analyses. Furthermore, the scatter plot of HRNonRef versus HRMinor (Figure S1) resembles a plot of principal component 1 versus principal component 2 from a principal component analysis of ancestry informative markers, implying that the differences between the two HR definitions are strongly associated with race.
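A minimal sketch of how the share of SNPs whose reference allele is not the cohort major allele could be estimated. The genotype coding (non-reference allele counts, `None` for missing), the restriction to biallelic SNPs, and the tie rule are assumptions; this is not the authors' pipeline, and `fraction_ref_not_major` is a hypothetical helper.

```python
from typing import Optional, Sequence

# Each row is one biallelic SNP; each entry is a sample's count of the
# non-reference allele (0, 1 or 2), or None for a missing call.
Site = Sequence[Optional[int]]


def fraction_ref_not_major(sites: Sequence[Site]) -> float:
    """Fraction of SNPs whose reference allele is the cohort's minor allele."""
    flagged = 0
    usable = 0
    for site in sites:
        calls = [g for g in site if g is not None]
        if not calls:
            continue                  # no genotype information at this SNP
        usable += 1
        nonref_freq = sum(calls) / (2 * len(calls))
        if nonref_freq > 0.5:         # ties (exactly 0.5) count as reference-major
            flagged += 1
    return flagged / usable if usable else 0.0


cohort = [
    [0, 0, 1, 0],        # non-ref frequency 1/8: reference is major
    [2, 2, 1, None],     # 5/6: reference is minor
    [1, 1, 1, 1],        # 1/2: tie, treated as reference-major
    [None, None],        # skipped: no calls
    [2, 1, 2, 0],        # 5/8: reference is minor
]
print(fraction_ref_not_major(cohort))  # 0.5
```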
A previous study [7] showed that HR is more robust than ROH because it is insensitive to SNP density. We verified this result by comparing HR and ROH computed before and after imputation. As shown in Figure 2, imputation produced ROH outliers that severely reduced the correlation between ROH values computed before and after imputation. We further evaluated race and sex differences in HR and ROH. Differences by race in HRNonRef and ROH are shown in Figure 3A: the average HR is highest in Blacks, followed by Caucasians and Asians, consistent with previous publications [7,12]. Sex differences in HR had not been studied previously. For HRNonRef, females have a substantially higher HR than males in all races (Figure 3A). For ROH, females have higher ROH than males in Caucasians (Figure 3A) and Asians (Figure 3B), whereas males have higher ROH than females in Blacks (Figure 3C). However, these sex differences in HR and ROH are primarily driven by the sex chromosomes; after Chromosomes X and Y are removed from the computation of HR and ROH, the differences are substantially reduced (Figure S2). The violin plots also show that HRNonRef has less variation than ROH. The median HRNonRef of cancer subjects is visibly higher than that of normal subjects, which underlies the significant associations with cancer risk. Furthermore, we found no significant association between HRNonRef or ROH and age, in concordance with the conventional genetic concept that germline variants should not be affected by age. Because of the strong differences in HRNonRef and ROH by sex and race, and the varying prevalence of cancer across these groups, all subsequent analyses were stratified by sex and race.

2.2. Global Autozygosity and Cancer Risk

We evaluated the association of HRNonRef and ROH with cancer risk using logistic regression, with cancer cases from TCGA and healthy controls from IGSR. To avoid selection bias, HRNonRef and ROH were computed from SNPs present in both TCGA and IGSR. Races (Caucasian, Black, and Asian) and sexes were tested separately, and tests were limited to groups with more than 100 cases. Ten cancer types met these criteria and were tested. For Caucasians, logistic regression showed that HRNonRef is significantly positively associated with cancer risk in all ten cancer types (Table 1). The strongest association was for skin cutaneous melanoma in males (p = 5.28 × 10⁻¹²), followed by ovarian cancer in females (p = 3.34 × 10⁻¹¹). For Blacks, only breast invasive carcinoma met the requirement of more than 100 cases. HRNonRef was significantly positively associated with breast invasive carcinoma (p = 4.89 × 10⁻²⁸), a result more extreme than that for Caucasians. For Asians, only liver hepatocellular carcinoma met the sample size requirement, and HRNonRef was significantly positively associated with it (p = 0.001). These consistently positive associations between HRNonRef and cancer risk suggest that individuals with a more heterozygous genome are at higher risk for multiple cancer types. Receiver operating characteristic (ROC) curves show that the statistically significant cancer, race, and sex groups had areas under the curve (AUC) between 0.54 and 0.88, with breast cancer in Black females being the most predictive group (Figure S3).
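With a single predictor and a positive coefficient, the fitted logistic probability is an increasing function of HRNonRef, so the model's AUC equals the AUC obtained by ranking subjects on HRNonRef itself. That AUC can be sketched as the Mann-Whitney probability that a random case scores higher than a random control, with ties counted as one half. The study computed AUC with `auc_roc` from the mltools R package; this independent Python helper and its toy values are illustrative assumptions, and we have not checked it against mltools' handling of ties.

```python
from typing import Sequence


def auc_mann_whitney(case_scores: Sequence[float],
                     control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as one half."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical standardized HR values, for illustration only.
cases = [0.3, 0.9, 1.4]
controls = [-1.2, -0.4, 0.3]
print(auc_mann_whitney(cases, controls))  # 8.5 / 9, about 0.944
```

The pairwise loop is O(n·m), which is adequate for a sketch; a rank-based formula gives the same value in O((n + m) log(n + m)).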
For Caucasians, ROH was significantly associated with cancer risk in nine of the ten cancer types (Table 2). However, the directions of association were mixed, with four negative and five positive associations. There were also several differences by sex. In head and neck squamous cell carcinoma, ROH was not significantly associated with cancer risk in females but was nominally significant in males (p = 0.02). In lung adenocarcinoma, ROH was significantly associated with cancer risk in females (p = 0.007) but not in males. Similarly, in skin cutaneous melanoma, ROH was nominally significantly associated with cancer risk in females (p = 0.01) but not in males. For Blacks, ROH was borderline associated with breast invasive carcinoma (p = 0.07). For Asians, ROH was significantly associated with liver hepatocellular carcinoma risk in males (p = 0.03). Overall, the associations between HRNonRef and cancer risk are stronger and more consistent than those of ROH. The inconsistent directions of association and the discrepancies between sexes in the ROH results may reflect the instability of ROH measurement from incomplete genotyping data.
In addition to the cancer risk analyses by race, sex, and cancer type, we conducted a meta-analysis across all eligible cancer types to estimate the overall effect. The meta-analyses used the results of the cancer risk analyses with the same inclusion criteria. A random effects model was used because the heterogeneity test was significant, indicating heterogeneity across cancer types. The meta-analysis showed a significant association of HRNonRef with cancer (p = 3.04 × 10⁻¹⁹) (Figure 4A). For ROH, the random effects model was not significant (p = 0.1) (Figure 4B). The meta-analysis thus further demonstrates the robustness of HRNonRef over ROH.
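As a sketch of how per-stratum estimates can be pooled under a random effects model, the function below uses the closed-form DerSimonian-Laird estimator of the between-stratum variance (tau²). This is a swapped-in technique: the study used `rma` from the metafor R package, whose default estimator is REML, so pooled values would generally differ slightly. Cochran's Q, which the function also returns, is the statistic behind the heterogeneity test mentioned above. Inputs are assumed to be log odds ratios per SD with their standard errors; the helper name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float],
                      std_errors: Sequence[float]
                      ) -> Tuple[float, float, float, float, float]:
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    effects: per-stratum estimates (e.g., log odds ratio per SD of HR).
    std_errors: their standard errors.
    Returns (pooled, pooled_se, tau2, cochran_q, two_sided_p).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need >= 2 effects with matching standard errors")
    w = [1.0 / se ** 2 for se in std_errors]              # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-stratum variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))                # two-sided normal p-value
    return pooled, pooled_se, tau2, q, p


# Two strongly discordant strata: tau^2 is large and the pooled SE widens.
print(dersimonian_laird([0.0, 2.0], [0.1, 0.1]))
```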

2.3. Mutational Signatures and Survival Analysis

Somatic mutation is one of the most important aspects of cancer. Somatic mutations arise from mutational processes triggered either by endogenous errors in DNA replication and repair or by exogenous mutagens. Because a substitution cannot be distinguished from its reverse complement on the opposite strand (e.g., C>T on one strand is G>A on the other), single-base substitutions collapse into six distinct classes, conventionally named by the pyrimidine reference base (C>A, C>G, C>T, T>A, T>C, T>G). Including the 5′ and 3′ neighboring bases turns each substitution into a three-nucleotide motif, expanding the six substitution classes into a 96-motif catalog (6 × 4 × 4) on which mutational signatures are defined [15]. The profile of mutational motifs in a cancer patient can be modeled as a combination of distinct mutational signatures. A mutational signature is conceived as the footprint of a mutational process in the nuclear genome, represented as the relative frequencies of the motifs in the mutational catalog [16,17,18,19,20].
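The 96-motif catalog and the strand collapsing described above can be sketched as follows. The `5'base[REF>ALT]3'base` labels follow the usual COSMIC style; the function names are hypothetical, and the input format (a reference 3-mer on the + strand plus the alternate base) is an assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BASES = "ACGT"
# The six substitution classes, named by the pyrimidine (C or T) reference base.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_catalog() -> List[str]:
    """All 6 x 4 x 4 = 96 motifs, written as 5'base[REF>ALT]3'base."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS for five in BASES for three in BASES]


def classify(context: str, alt: str) -> str:
    """Map one substitution to its motif.

    context: reference 3-mer (5' neighbor, mutated base, 3' neighbor), + strand.
    alt: the alternate base on the + strand.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    if alt == context[1]:
        raise ValueError("alt base equals reference base")
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def motif_counts(mutations: Iterable[Tuple[str, str]]) -> Counter:
    """Count a patient's mutations per motif; unseen motifs stay at 0."""
    counts = Counter({m: 0 for m in motif_catalog()})
    counts.update(classify(ctx, alt) for ctx, alt in mutations)
    return counts


patient = [("ACA", "T"), ("TGT", "A"), ("AGC", "T")]
counts = motif_counts(patient)
print(len(counts), counts["A[C>T]A"], counts["G[C>A]T"])  # 96 2 1
```

The second mutation (`TGT` with G>A) lands in the same motif as the first because it is the same event described on the opposite strand.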
Both mutational signatures and global autozygosity represent genome-wide patterns, the former at the somatic level and HR and ROH at the underlying germline level. Because HRNonRef is strongly associated with cancer risk, we hypothesized that HRNonRef may also be related to mutational signatures. Using TCGA somatic mutation data, we fit each patient's mutational profile to the established COSMIC reference mutational signatures, as described in the Methods section. Linear regression models were used to describe the associations of mutational signatures with HRNonRef and ROH. A false discovery rate (FDR) < 0.05 was used as the significance threshold, and datasets with more than 100 samples were included in the analyses. Eleven significant associations were identified, seven for HRNonRef and four for ROH (Table 3). All 11 significant results were from Caucasian subjects. Five of the seven significant HRNonRef associations were from the ovarian cancer dataset: SBS9 (FDR = 0.001), SBS18 (FDR = 0.001), SBS5 (FDR = 0.007), SBS7c (FDR = 0.007), and SBS22 (FDR = 0.03). SBS9 has been attributed to mutations introduced during replication by polymerase eta; SBS18 has been proposed to result from damage by reactive oxygen species; the etiology of SBS5 is currently unknown; SBS7c is related to ultraviolet light damage and is possibly the consequence of translesion DNA synthesis by enzymes with a propensity to insert T; and SBS22 is related to aristolochic acid exposure. The other two significant associations with HRNonRef were SBS44 (related to defective DNA mismatch repair, FDR = 0.02) in female skin cutaneous melanoma and SBS36 (related to defective base excision repair, FDR = 0.045) in prostate adenocarcinoma. The most significant association was between ROH and SBS44 (FDR = 4.62 × 10⁻⁸) in females with head and neck squamous cell carcinoma. The other three significant associations with ROH were SBS36 (FDR = 3.02 × 10⁻⁵) in prostate adenocarcinoma, SBS42 (related to haloalkane exposure, FDR = 0.0002) in male lung squamous cell carcinoma, and SBS7b (related to ultraviolet light exposure, FDR = 0.02) in male lung squamous cell carcinoma.
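For reference, the FDR adjustment (R `p.adjust` with method "fdr", which is an alias for Benjamini-Hochberg) can be sketched in Python as below. The helper name and the example p-values are illustrative, not values from Table 3.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvals)
print([round(q, 4) for q in fdr])   # [0.02, 0.04, 0.04, 0.02]
print([q < 0.05 for q in fdr])      # [True, True, True, True]
```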
Next, we performed survival analyses to examine whether HRNonRef and ROH have prognostic value. Disease-specific survival analyses using Cox proportional hazards models identified no significant results for ROH in any of the groups tested. For HRNonRef, four race- and sex-specific significant results were found (Figure 5): liver hepatocellular carcinoma in Asian males (p = 5.96 × 10⁻⁵), lung adenocarcinoma in Caucasian males (p = 0.03), lung adenocarcinoma in Caucasian females (p = 0.01), and skin cutaneous melanoma in Caucasian males (p = 0.02). In all four, lower HRNonRef was associated with better prognosis.

3. Discussion

A single SNP can have a severe impact, as shown by Mendelian diseases. Multiple SNPs together can explain a portion of a disease's variation in the population but have not fully accounted for its heritability. This is the well-known missing heritability problem [21]. One proposed explanation is that a person's susceptibility to disease may be polygenic, depending on many low-effect variants [22]. Global autozygosity extends the polygenic idea by measuring the genome as a whole. Global autozygosity has been established as a risk factor for diseases such as schizophrenia and Alzheimer's disease; however, its connection to cancer had not been examined previously.
Cancer risk analysis showed that both HRNonRef and ROH are closely associated with cancer risk. However, with the same sample size, HRNonRef showed stronger associations than ROH. Eight of the 14 significant HRNonRef associations reached p ≤ 10⁻⁸, beyond the conventional GWAS significance level of 5 × 10⁻⁸. We stress, however, that because we did not carry out on the order of one million independent tests, GWAS-level significance is not required for multiple testing correction here. For ROH, the most significant association was at p = 0.0009. Furthermore, the associations between HRNonRef and cancer risk are more consistent than those of ROH: all 14 significant HRNonRef associations are positive, whereas eight of the 14 significant ROH associations are positive and six are negative. These results further illustrate the robustness of HR as a characterization of genome variability.
The literature [23,24] has shown that a single SNP can increase cancer risk. Our results for global autozygosity suggest that genome-wide characteristics can also affect cancer risk. However, the etiology linking global autozygosity to cancer risk is not well understood, and it is even harder to study than the etiology of single SNPs because global autozygosity lacks precise targets. Nonetheless, we performed additional analyses to assess the associations between global autozygosity and mutational signatures. Mutational signatures are constructed from somatic mutations and can therefore reflect the history of mutagenesis. A previous study has shown links between germline variants and somatic mutations [25]. For example, germline variants in RBFOX1 increased the incidence of SF3B1 somatic mutation eight-fold via functional alterations in RNA splicing, and 19p13.3 variants were associated with a four-fold increased likelihood of somatic mutations in PTEN. It is therefore reasonable to hypothesize that global autozygosity and mutational signatures are connected. Our analyses found eleven significant associations between global autozygosity and various mutational signatures after FDR correction. These results suggest that global autozygosity is related to some mutational processes; for example, it might affect the risk of errors during DNA repair after exposure to carcinogens such as ultraviolet light and haloalkanes. Disease-specific survival analysis also identified four significant associations for HRNonRef, suggesting that global autozygosity may also be associated with prognosis.
One limitation of HRNonRef is that it requires genome-wide measurement. Compared with biomarkers based on a few SNPs or genes, HRNonRef is more expensive and time-consuming to compute. However, in previous work [7], we showed that HRNonRef computed from a random subset of SNPs can be a robust representation of the true HRNonRef. Furthermore, the price of whole-genome genotyping has dropped below USD 100 per subject, well within the range of acceptable cost. Further cost reduction can be achieved by estimating HRNonRef from a subset of SNPs, although the criteria for selecting the subset that best estimates HRNonRef require additional study.
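A minimal sketch of estimating HRNonRef from a uniform random subset of SNPs. The uniform sampling scheme, the helper names, and the synthetic genotypes are assumptions; as noted above, the best criteria for choosing the subset remain open.

```python
import random
from typing import Optional, Sequence


def heterozygosity_ratio(genotypes: Sequence[Optional[int]]) -> float:
    """HR for one individual: heterozygous calls / homozygous non-reference calls.

    Genotypes are non-reference allele counts (0, 1, 2), or None if missing.
    """
    het = sum(1 for g in genotypes if g == 1)
    hom_nonref = sum(1 for g in genotypes if g == 2)
    return het / hom_nonref if hom_nonref else float("nan")


def subset_hr(genotypes: Sequence[Optional[int]], fraction: float,
              seed: int = 0) -> float:
    """HR on a uniform random subset of SNPs (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(fraction * len(genotypes)))
    return heterozygosity_ratio(rng.sample(list(genotypes), k))


# Synthetic individual whose full-genome HR is exactly 2 (illustration only).
person = [1, 1, 2, 0] * 25_000
print(heterozygosity_ratio(person))                # 2.0
print(round(subset_hr(person, 0.05, seed=1), 2))   # close to 2
```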

4. Materials and Methods

4.1. Genotyping Data Acquisition and Imputation

Germline SNP data for 4833 subjects with 12 cancer types, generated on the Affymetrix Genome-Wide Human SNP Array 6.0 (934,968 SNPs), were obtained from The Cancer Genome Atlas (TCGA). All SNP data used in this analysis were derived from blood samples. Additional SNP imputation was performed with a HapMap Phase 3.0 reference using the Michigan Imputation Server [26]. Imputed SNPs with R² > 0.8 were retained for further analysis. After imputation, each cancer type contained 10–16 million SNPs, and the total SNP count was 164,497,868. SNP data (with imputation) for 1668 subjects from the International Genome Sample Resource (IGSR), formerly known as the 1000 Genomes Project, were also downloaded.

4.2. Somatic Mutation Data Acquisition and Mutational Signature Computation

Somatic mutation data for 10,179 patients with 33 cancer types were downloaded from the Genomic Data Commons, the gateway to TCGA data. Cancer type abbreviations, full names, and detailed sample sizes are available in Table S1. The probability matrix for the 49 established COSMIC reference mutational signatures (v3) was downloaded from Synapse (https://www.synapse.org/#!Synapse:syn11738319) (Table S2). We defined a catalog of 96 three-nucleotide motifs centered on the mutation (one upstream base, the mutated base, and one downstream base) and derived frequency tables of this motif catalog for each patient, forming a 96-by-10,179 matrix of motif counts. We used a function from the R package MutationalPatterns [27] to fit each patient's motif frequency table to the reference mutational signatures, constraining the coefficients (i.e., the contribution of each signature to each patient) to be non-negative. The estimated coefficients formed a non-negative matrix with one row per reference signature and one column per patient.
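The non-negative fitting step can be sketched as a non-negative least squares (NNLS) problem: find contributions x ≥ 0 minimizing ||Sx − m||², where S holds the reference signatures as columns and m is one patient's motif count vector. The sketch below uses cyclic coordinate descent, a swapped-in solver chosen for brevity; it is not the routine used inside MutationalPatterns, although any exact NNLS solver targets the same convex objective. The function name and the toy matrix are hypothetical.

```python
from typing import List, Sequence


def fit_signatures_nnls(signatures: Sequence[Sequence[float]],
                        profile: Sequence[float],
                        n_sweeps: int = 1000, tol: float = 1e-12) -> List[float]:
    """Non-negative least squares by cyclic coordinate descent.

    signatures: motifs x signatures matrix (columns are reference signatures).
    profile: one patient's motif counts, in the same motif order.
    Returns non-negative contributions x minimizing ||S x - profile||^2.
    """
    n_motifs, n_sigs = len(profile), len(signatures[0])
    col_sq = [sum(signatures[r][j] ** 2 for r in range(n_motifs))
              for j in range(n_sigs)]
    x = [0.0] * n_sigs
    residual = list(profile)                 # profile - S x, with x = 0
    for _ in range(n_sweeps):
        largest_step = 0.0
        for j in range(n_sigs):
            if col_sq[j] == 0.0:
                continue                     # an all-zero signature cannot contribute
            # Exact minimizer along coordinate j, clipped at zero.
            corr = sum(signatures[r][j] * residual[r] for r in range(n_motifs))
            new_xj = max(0.0, x[j] + corr / col_sq[j])
            step = new_xj - x[j]
            if step != 0.0:
                for r in range(n_motifs):
                    residual[r] -= step * signatures[r][j]
                x[j] = new_xj
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return x


# Toy example: 3 motifs, 2 signatures (columns); patient = 2*sig1 + 3*sig2.
S = [[0.5, 0.0],
     [0.5, 0.5],
     [0.0, 0.5]]
print([round(v, 6) for v in fit_signatures_nnls(S, [1.0, 2.5, 1.5])])  # [2.0, 3.0]
```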

4.3. HR and ROH Computation

Two types of HR were computed, which we denote HRNonRef and HRMinor. HRNonRef is computed as N_het / N_HomNonRef, where N_het is the number of heterozygous SNPs and N_HomNonRef is the number of homozygous non-reference SNPs. These definitions are consistent with previous studies of HR [7,11,12]. To study the potential effect of the reference allele differing from the major allele in a cohort, we also defined HRMinor as N_het / N_HomMinor, where N_HomMinor is the number of SNPs homozygous for the minor allele as defined in the patient cohort. ROHs were computed using PLINK [5], and the median ROH length was used for subsequent analyses.
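The two definitions can be sketched per sample from a SNP-by-sample genotype matrix. The coding (non-reference allele counts, `None` for missing), the use of each SNP's observed cohort frequency to define the minor allele, and the tie rule at a frequency of exactly 0.5 are our assumptions; `hr_nonref_and_minor` is a hypothetical helper, not the authors' code.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Rows are biallelic SNPs, columns are samples; entries are non-reference
# allele counts (0, 1, 2) or None for missing calls.
Matrix = Sequence[Sequence[Optional[int]]]


def hr_nonref_and_minor(matrix: Matrix) -> Tuple[List[float], List[float]]:
    """Per-sample HRNonRef = Nhet / NHomNonRef and HRMinor = Nhet / NHomMinor."""
    n_samples = len(matrix[0])
    het = [0] * n_samples
    hom_nonref = [0] * n_samples
    hom_minor = [0] * n_samples
    for site in matrix:
        calls = [g for g in site if g is not None]
        if not calls:
            continue
        nonref_freq = sum(calls) / (2 * len(calls))
        # Genotype code that is homozygous for the cohort minor allele.
        # Assumption: at a 0.5 tie the non-reference allele is called minor.
        minor_hom_code = 2 if nonref_freq <= 0.5 else 0
        for s, g in enumerate(site):
            if g is None:
                continue
            if g == 1:
                het[s] += 1
            elif g == 2:
                hom_nonref[s] += 1
            if g == minor_hom_code:
                hom_minor[s] += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else math.nan

    return ([ratio(h, d) for h, d in zip(het, hom_nonref)],
            [ratio(h, d) for h, d in zip(het, hom_minor)])


cohort = [
    [0, 1, 2],   # non-ref frequency 0.5 -> non-ref treated as minor
    [2, 2, 0],   # 4/6 -> reference allele is minor
    [1, 1, 2],   # 4/6 -> reference allele is minor
    [0, 0, 1],   # 1/6 -> non-ref allele is minor
    [2, 1, 0],   # 0.5 -> non-ref treated as minor
]
hr_nonref, hr_minor = hr_nonref_and_minor(cohort)
print(hr_nonref)  # [0.5, 3.0, 0.5]
print(hr_minor)   # [1.0, nan, 0.5]
```

In this tiny example the second sample carries no genotype homozygous for the cohort minor allele, so its HRMinor is undefined (NaN); with genome-wide data such zero denominators are unlikely.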

4.4. Statistical Analyses

All statistical analyses were conducted using 64-bit R 4.0.2. Both Spearman's and Pearson's correlation coefficients were used to compare HRNonRef and HRMinor. Linear regression (R glm function with family = gaussian) was used to evaluate the association between age and HRNonRef. Logistic regression (R glm function with family = binomial) was used to evaluate the association between HR or ROH and cancer risk. Cancer cases from TCGA were matched with non-cancer controls from IGSR by race and sex. To avoid potential bias, HR and ROH were computed from the set of SNPs common to TCGA and IGSR. HRNonRef and ROH were expressed in units of one standard deviation. The R function coxph was used to evaluate the survival predictability of HRNonRef and ROH, with both continuous and dichotomized HRNonRef models. For the dichotomized model, the R package maxstat was used to find the optimal dichotomization threshold, and this threshold was used to split HRNonRef into high and low groups for the Kaplan–Meier curves. The associations between mutational signatures and HRNonRef or ROH were tested by linear regression (R glm function with family = gaussian). The R function p.adjust with method = "fdr" was used to correct for multiple testing. ROC curves were drawn with the ggplot2 package, and AUCs were computed with the auc_roc function from the mltools R package.
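The per-SD logistic model can be sketched in pure Python with Newton-Raphson, which maximizes the same binomial likelihood that R's glm(family = binomial) fits by iteratively reweighted least squares. Standardizing the predictor over cases and controls combined is our assumption (the text states only that the unit was one standard deviation); the helper names and toy values are hypothetical, and the sketch does not handle complete separation.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_per_sd(x: Sequence[float], y: Sequence[int],
                    max_iter: int = 100, tol: float = 1e-10
                    ) -> Tuple[float, float, float]:
    """Logistic regression of y (1 = case, 0 = control) on x scaled to SD units.

    Returns (odds ratio per SD, Wald SE of the log odds ratio, two-sided p).
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]

    def score_and_info(b0: float, b1: float):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for zi, yi in zip(z, y):
            p = _sigmoid(b0 + b1 * zi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * zi
            i00 += w
            i01 += w * zi
            i11 += w * zi * zi
        return g0, g1, i00, i01, i11

    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_info(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: inverse Fisher information times the score.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    _, _, i00, i01, i11 = score_and_info(b0, b1)
    se1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))   # Var(b1) from inverse information
    p_value = math.erfc(abs(b1 / se1) / math.sqrt(2.0))
    return math.exp(b1), se1, p_value


hr = [0.57, 0.58, 0.60, 0.61, 0.66, 0.69, 0.70, 0.72]   # hypothetical values
status = [0, 1, 0, 0, 1, 0, 1, 1]
print(logistic_per_sd(hr, status))
```

Because the predictor is standardized first, rescaling HR (for example, multiplying every value by 10) leaves the odds ratio per SD unchanged.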
Meta-analyses across all eligible cancer types and groups were conducted for HRNonRef and ROH, based on the results of the cancer risk analyses (Tables 1 and 2). The R function rma from the metafor package was used for the meta-analyses. A random effects model was used because the heterogeneity test across cancer types was significant (p < 0.05).

5. Conclusions

Our analyses of global autozygosity show that HRNonRef is a more robust measurement than ROH. More importantly, our study demonstrates connections between global autozygosity and cancer risk. We identified strong associations between HRNonRef and cancer risk. Even though the majority of the subjects were Caucasian, strong associations were also identified in minority groups, such as breast invasive carcinoma risk in Black women and liver hepatocellular carcinoma risk in Asian men. Further evidence came from exploring the associations of global autozygosity with mutational signatures and cancer prognosis. These results show that global autozygosity can be used for reliable cancer risk assessment.

Supplementary Materials

The following are available online at https://www.mdpi.com/2072-6694/12/12/3646/s1, Figure S1: Scatter plot between HRNonRef and HRMinor, Figure S2: Comparison of HRNonRef and ROH between sex across all three races tested, Figure S3: ROC curves based on the cancer risk analysis in Table 1, Table S1: Cancer names and sample size, Table S2: List of mutational signatures.

Author Contributions

L.J. conducted data analysis and wrote the manuscript. Y.G. supervised the project and wrote the manuscript. F.G., J.T. provided funding and edited the manuscript. S.N. provided cancer research strategy and edited the manuscript. F.Y., H.K. provided statistical analysis support. D.C.S. provided genetics research strategy and edited the manuscript. S.L. contributed significantly during the revision of the manuscript by providing additional analysis. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Cancer Center Support Grant P30CA118100 and R01ES030993-01A1 from the National Cancer Institute of USA.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ceballos, F.C.; Joshi, P.K.; Clark, D.W.; Ramsay, M.; Wilson, J.F. Runs of homozygosity: Windows into population history and trait architecture. Nat. Rev. Genet. 2018, 19, 220–234. [Google Scholar] [CrossRef] [PubMed]
  2. Joshi, P.K.; Esko, T.; Mattsson, H.; Eklund, N.; Gandin, I.; Nutile, T.; Jackson, A.U.; Schurmann, C.; Smith, A.V.; Zhang, W.; et al. Directional dominance on stature and cognition in diverse human populations. Nature 2015, 523, 459–462. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Lencz, T.; Lambert, C.; DeRosse, P.; Burdick, K.E.; Morgan, T.V.; Kane, J.M.; Kucherlapati, R.; Malhotra, A.K. Runs of homozygosity reveal highly penetrant recessive loci in schizophrenia. Proc. Natl. Acad. Sci. USA 2007, 104, 19942–19947. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Ghani, M.; Reitz, C.; Cheng, R.; Vardarajan, B.N.; Jun, G.; Sato, C.; Naj, A.; Rajbhandary, R.; Wang, L.S.; Valladares, O.; et al. Association of long runs of homozygosity with alzheimer disease among african american individuals. JAMA Neurol. 2015, 72, 1313–1323. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M.A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P.I.; Daly, M.J.; et al. Plink: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007, 81, 559–575. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  6. Narasimhan, V.; Danecek, P.; Scally, A.; Xue, Y.; Tyler-Smith, C.; Durbin, R. Bcftools/roh: A hidden markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 2016, 32, 1749–1751. [Google Scholar] [CrossRef] [Green Version]
  7. Samuels, D.C.; Wang, J.; Ye, F.; He, J.; Levinson, R.T.; Sheng, Q.H.; Zhao, S.L.; Capra, J.A.; Shyr, Y.; Zheng, W.; et al. Heterozygosity ratio, a robust global genomic measure of autozygosity and its association with height and disease risk. Genetics 2016, 204, 893–904. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  8. Vine, A.E.; McQuillin, A.; Bass, N.J.; Pereira, A.; Kandaswamy, R.; Robinson, M.; Lawrence, J.; Anjorin, A.; Sklar, P.; Gurling, H.M.D.; et al. No evidence for excess runs of homozygosity in bipolar disorder. Psychiatr. Genet. 2009, 19, 165–170. [Google Scholar] [CrossRef] [PubMed]
  9. Sims, R.; Dwyer, S.; Harold, D.; Gerrish, A.; Hollingworth, P.; Chapman, J.; Jones, N.; Abraham, R.; Ivanov, D.; Pahwa, J.S.; et al. No evidence that extended tracts of homozygosity are associated with alzheimer’s disease. Am. J. Med. Genet. B 2011, 156B, 764–771. [Google Scholar] [CrossRef] [PubMed]
  10. Heron, E.A.; Cormican, P.; Donohoe, G.; O’Neill, F.A.; Kendler, K.S.; Riley, B.P.; Gill, M.; Corvin, A.P.; Morris, D.W.; Wellcome Trust Case, C. No evidence that runs of homozygosity are associated with schizophrenia in an irish genome-wide association dataset. Schizophr. Res. 2014, 154, 79–82. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Guo, Y.; Ye, F.; Sheng, Q.H.; Clark, T.; Samuels, D.C. Three-stage quality control strategies for DNA re-sequencing data. Brief. Bioinform. 2014, 15, 879–889. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Wang, J.; Raskin, L.; Samuels, D.C.; Shyr, Y.; Guo, Y. Genome measures used for quality control are dependent on gene function and ancestry. Bioinformatics 2015, 31, 318–323. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Thomsen, H.; Chen, B.; Figlioli, G.; Elisei, R.; Romei, C.; Cipollini, M.; Cristaudo, A.; Bambi, F.; Hoffmann, P.; Herms, S.; et al. Runs of homozygosity and inbreeding in thyroid cancer. BMC Cancer 2016, 16, 227. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  14. Loveday, C.; Sud, A.; Litchfield, K.; Levy, M.; Holroyd, A.; Broderick, P.; Kote-Jarai, Z.; Dunning, A.M.; Muir, K.; Peto, J.; et al. Runs of homozygosity and testicular cancer risk. Andrology 2019, 7, 555–564. [Google Scholar] [CrossRef] [PubMed]
  15. Bergstrom, E.N.; Huang, M.N.; Mahto, U.; Barnes, M.; Stratton, M.R.; Rozen, S.G.; Alexandrov, L.B. Sigprofilermatrixgenerator: A tool for visualizing and exploring patterns of small mutational events. BMC Genom. 2019, 20, 685. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Petljak, M.; Alexandrov, L.B.; Brammeld, J.S.; Price, S.; Wedge, D.C.; Grossmann, S.; Dawson, K.J.; Ju, Y.S.; Iorio, F.; Tubio, J.M.C.; et al. Characterizing mutational signatures in human cancer cell lines reveals episodic apobec mutagenesis. Cell 2019, 176, 1282–1294.e20. [Google Scholar] [CrossRef] [Green Version]
  17. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Aparicio, S.A.; Behjati, S.; Biankin, A.V.; Bignell, G.R.; Bolli, N.; Borg, A.; Borresen-Dale, A.L.; et al. Signatures of mutational processes in human cancer. Nature 2013, 500, 415–421.
  18. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Campbell, P.J.; Stratton, M.R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013, 3, 246–259.
  19. Letouze, E.; Shinde, J.; Renault, V.; Couchy, G.; Blanc, J.F.; Tubacher, E.; Bayard, Q.; Bacq, D.; Meyer, V.; Semhoun, J.; et al. Mutational signatures reveal the dynamic interplay of risk factors and cellular processes during liver tumorigenesis. Nat. Commun. 2017, 8, 1315.
  20. Polak, P.; Kim, J.; Braunstein, L.Z.; Karlic, R.; Haradhavala, N.J.; Tiao, G.; Rosebrock, D.; Livitz, D.; Kubler, K.; Mouw, K.W.; et al. A mutational signature reveals alterations underlying deficient homologous recombination repair in breast cancer. Nat. Genet. 2017, 49, 1476–1486.
  21. Manolio, T.A.; Collins, F.S.; Cox, N.J.; Goldstein, D.B.; Hindorff, L.A.; Hunter, D.J.; McCarthy, M.I.; Ramos, E.M.; Cardon, L.R.; Chakravarti, A.; et al. Finding the missing heritability of complex diseases. Nature 2009, 461, 747–753.
  22. Dudbridge, F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013, 9, e1003348.
  23. Goode, L.L.; Chenevix-Trench, G.; Song, H.; Ramus, S.J.; Notaridou, M.; Lawrenson, K.; Widschwendter, M.; Vierkant, R.A.; Larson, M.C.; Kjaer, S.K.; et al. A genome-wide association study identifies susceptibility loci for ovarian cancer at 2q31 and 8q24. Nat. Genet. 2010, 42, 874–879.
  24. Eeles, R.A.; Kote-Jarai, Z.; Al Olama, A.A.; Giles, G.G.; Guy, M.; Severi, G.; Muir, K.; Hopper, J.L.; Henderson, B.E.; Haiman, C.A.; et al. Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat. Genet. 2009, 41, 1116–1121.
  25. Carter, H.; Marty, R.; Hofree, M.; Gross, A.M.; Jensen, J.; Fisch, K.M.; Wu, X.Y.; DeBoever, C.; Van Nostrand, E.L.; Song, Y.; et al. Interaction landscape of inherited polymorphisms with somatic events in cancer. Cancer Discov. 2017, 7, 410–423.
  26. Das, S.; Forer, L.; Schonherr, S.; Sidore, C.; Locke, A.E.; Kwong, A.; Vrieze, S.I.; Chew, E.Y.; Levy, S.; McGue, M.; et al. Next-generation genotype imputation service and methods. Nat. Genet. 2016, 48, 1284–1287.
  27. Blokzijl, F.; Janssen, R.; van Boxtel, R.; Cuppen, E. MutationalPatterns: Comprehensive genome-wide analysis of mutational processes. Genome Med. 2018, 10, 33.
Figure 1. Scatter plots of HRNonRef vs. HRMinor. (A,B): Caucasian; (C,D): Black; (E,F): Asian. (A,C,E): Heterozygosity Ratio (HR) computed from the original Single Nucleotide Polymorphism (SNP) data without imputation. (B,D,F): HR computed from the SNP data after imputation. Each data point represents one individual in the cohort. All panels show strong correlations between HRNonRef and HRMinor, with HRMinor two to three times higher than HRNonRef.
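HRNonRef and HRMinor are both heterozygosity ratios that differ only in which homozygous genotypes form the denominator. The Python sketch below assumes definitions that this section does not spell out: HRNonRef = heterozygous calls / non-reference homozygous calls, and HRMinor = heterozygous calls / minor-allele homozygous calls. Genotypes are coded as non-reference allele counts (0, 1, 2), and `alt_freqs` is a hypothetical input giving the population frequency of the non-reference allele. Under these assumptions, at sites where the reference allele is the minor one, the common homozygous non-reference genotype counts toward the HRNonRef denominator while only the rarer homozygous reference genotype counts toward the HRMinor denominator, which is consistent with HRMinor exceeding HRNonRef in Figure 1.

```python
from __future__ import annotations

from typing import Sequence


def heterozygosity_ratios(genotypes: Sequence[int], alt_freqs: Sequence[float]) -> tuple[float, float]:
    """Return (HRNonRef, HRMinor) for one individual (assumed definitions, see text).

    genotypes[i]: count of non-reference alleles (0, 1 or 2) at SNP i.
    alt_freqs[i]: population frequency of the non-reference allele at SNP i
                  (hypothetical input, used only to decide which allele is minor).
    """
    het = hom_nonref = hom_minor = 0
    for g, f in zip(genotypes, alt_freqs):
        if g == 1:
            het += 1
            continue
        if g == 2:
            hom_nonref += 1
        # The non-reference allele is minor when its frequency is below 0.5;
        # otherwise the reference allele is minor and 0/0 calls are minor homozygotes.
        alt_is_minor = f < 0.5
        if (g == 2 and alt_is_minor) or (g == 0 and not alt_is_minor):
            hom_minor += 1
    if hom_nonref == 0 or hom_minor == 0:
        raise ValueError("no homozygous genotypes of the required kind")
    return het / hom_nonref, het / hom_minor
```

For instance, with genotypes `[1, 1, 1, 1, 2, 2, 0, 2]` and `alt_freqs` `[0.3, 0.4, 0.2, 0.6, 0.3, 0.7, 0.8, 0.9]`, the function returns HRNonRef = 4/3 and HRMinor = 2.0.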
Figure 2. Scatter plots of HRNonRef and Runs of Homozygosity (ROH) before vs. after imputation. (A,B): Caucasian; (C,D): Black; (E,F): Asian. (A,C,E): HRNonRef; (B,D,F): ROH. The correlation between pre- and post-imputation values is weaker for ROH than for HRNonRef because imputation produced ROH outliers. This shows that HR is less sensitive to SNP density than ROH.
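Why ROH depends on SNP density can be seen with a deliberately simplified run caller. The sketch below sums the lengths of runs of consecutive homozygous SNPs on one chromosome, keeping only runs that meet hypothetical minimum SNP-count and length thresholds. It is not the ROH procedure used in this study, which this section does not describe; production tools such as PLINK's `--homozyg` also tolerate occasional heterozygous or missing calls and apply density criteria.

```python
from typing import Sequence


def total_roh_length(
    positions: Sequence[int],
    genotypes: Sequence[int],
    min_snps: int = 3,
    min_length: int = 500_000,
) -> int:
    """Sum the lengths (bp) of runs of consecutive homozygous SNPs on one chromosome.

    positions: sorted base-pair positions; genotypes: non-reference allele counts (0/1/2).
    A run is kept only if it contains >= min_snps SNPs and spans >= min_length bp.
    The thresholds are hypothetical, and no heterozygous or missing calls are tolerated.
    """
    total = 0
    run_start = None  # index of the first SNP in the current homozygous run
    # Append a heterozygous sentinel so the final run is closed inside the loop.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            n_snps = i - run_start
            length = positions[i - 1] - positions[run_start]
            if n_snps >= min_snps and length >= min_length:
                total += length
            run_start = None
    return total
```

For example, six homozygous SNPs at positions 1,000 to 1,500,000 yield a single 1,499,000 bp run. Adding one imputed heterozygous call at position 600,000 splits it into a 299,000 bp piece and a two-SNP piece, both below the thresholds, so the total falls to 0, whereas a count ratio such as HRNonRef changes by only one heterozygous call.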
Figure 3. Comparison of HRNonRef and ROH between sexes across all three races tested. The computation of HRNonRef and ROH includes sex chromosomes X and Y; for the equivalent of this figure without chromosomes X and Y, please see Figure S2. (A) Violin and boxplots of ROH separated by sex for Caucasians. (B) Violin and boxplots of ROH separated by sex for Asians. (C) Violin and boxplots of ROH separated by sex for Blacks. (D) Violin and boxplots of HRNonRef separated by sex for Caucasians. (E) Violin and boxplots of HRNonRef separated by sex for Asians. (F) Violin and boxplots of HRNonRef separated by sex for Blacks. Females generally have higher HRNonRef than males, and the sex difference is substantially more visible for HRNonRef than for ROH.
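One plausible contributor to the sex difference when X and Y are included, not established in this section, is hemizygosity: outside the pseudoautosomal regions males carry a single X and a single Y, so genotype callers typically report those sites as homozygous, which inflates the HRNonRef denominator. The sketch below assumes per-variant `(chromosome, genotype)` records and recognizes sex chromosomes through the hypothetical labels in `SEX_CHROMS`; it computes HRNonRef with and without the sex chromosomes, mirroring the comparison between Figure 3 and Figure S2.

```python
from typing import Iterable, Tuple

# Hypothetical chromosome labels; adjust to the naming used in the genotype files.
SEX_CHROMS = {"X", "Y", "chrX", "chrY"}


def hr_nonref(calls: Iterable[Tuple[str, int]], include_sex_chroms: bool = True) -> float:
    """HRNonRef = heterozygous calls / non-reference homozygous calls.

    calls: (chromosome, genotype) pairs, genotype = non-reference allele count (0/1/2).
    """
    het = hom_nonref = 0
    for chrom, g in calls:
        if not include_sex_chroms and chrom in SEX_CHROMS:
            continue  # autosome-only variant of the measure, as in Figure S2
        if g == 1:
            het += 1
        elif g == 2:
            hom_nonref += 1
    return het / hom_nonref
```

For a toy male with two heterozygous and two homozygous non-reference autosomal calls plus two homozygous X calls, the ratio is 0.5 with the sex chromosomes and 1.0 without them.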
Figure 4. Meta-analysis of cancer risk. (A) Meta-analysis of the HRNonRef cancer risk results (Table 1). (B) Meta-analysis of the ROH cancer risk results (Table 2). A heterogeneity p < 0.05 indicates significant heterogeneity across cancer datasets; therefore, a random-effects model was used for the meta-analysis. HRNonRef behaved consistently across multiple cancer types, yielding a significant meta-analysis p-value, whereas ROH was not significant.
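The caption states that a random-effects model was used but not which estimator. The sketch below assumes the common DerSimonian–Laird estimator, pooling per-dataset log odds ratios with inverse-variance weights inflated by the between-dataset variance tau². It also reports Cochran's Q and I²; the heterogeneity p-value would come from comparing Q with a chi-square distribution on k − 1 degrees of freedom, omitted here to keep the sketch dependency-free. The helper `se_from_ci` back-calculates standard errors from the 95% CIs in Tables 1 and 2, which is only approximate because those intervals are slightly asymmetric.

```python
import math
from typing import Sequence, Tuple


def se_from_ci(lower: float, upper: float, z: float = 1.959964) -> float:
    """Approximate a standard error from a 95% CI (assumes a symmetric Wald interval)."""
    return (upper - lower) / (2 * z)


def dersimonian_laird(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float, float, float, float, float]:
    """Random-effects meta-analysis with the DerSimonian-Laird estimator of tau^2.

    Requires at least two datasets. Returns (pooled, pooled_se, p_value, Q, I2, tau2).
    """
    k = len(estimates)
    w = [1.0 / se**2 for se in std_errors]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-dataset variance, truncated at 0
    w_re = [1.0 / (se**2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2))  # two-sided z-test
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, pooled_se, p_value, q, i2, tau2
```

In use, each table row would supply one estimate (the LogOR) and one standard error, for example `se_from_ci(1.6228, 2.1920)` for the first row of Table 1.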
Figure 5. Disease-specific survival analysis results presented as Kaplan–Meier plots. The R package maxstat was used to identify the optimal cut point for dichotomization. Two p-values are presented in the figure: one from a Cox proportional hazards regression model fitted to the continuous (undichotomized) data, and one from a Cox proportional hazards regression model fitted to the dichotomized data.
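maxstat chooses the cut point that maximizes a standardized two-sample statistic over candidate splits of a continuous predictor. The sketch below reproduces only that core idea with the log-rank statistic: it scans candidate cut points of a predictor such as HRNonRef, requires the lower group to hold between 10% and 90% of subjects (bounds assumed to match maxstat's default `minprop`/`maxprop`), and returns the split with the largest log-rank chi-square. It does not compute maxstat's multiplicity-adjusted p-value or fit the Cox models reported in Figure 5.

```python
from typing import Optional, Sequence, Tuple


def logrank_chi2(times: Sequence[float], events: Sequence[int], group: Sequence[bool]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    events: 1 = disease-specific death, 0 = censored; group: True marks group 1.
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = variance = 0.0
    for t in event_times:
        # Subjects still at risk at time t, flagged if they have an event exactly at t.
        at_risk = [(g, bool(e) and tt == t) for tt, e, g in zip(times, events, group) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _ in at_risk if g)
        d = sum(1 for _, died in at_risk if died)
        d1 = sum(1 for g, died in at_risk if g and died)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e**2 / variance if variance > 0 else 0.0


def best_cutpoint(
    x: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    min_prop: float = 0.1,
    max_prop: float = 0.9,
) -> Tuple[Optional[float], float]:
    """Return (cut, chi2) for the split x <= cut vs. x > cut with the largest log-rank statistic."""
    n = len(x)
    best: Tuple[Optional[float], float] = (None, -1.0)
    for cut in sorted(set(x)):
        group = [xi > cut for xi in x]
        n_low = n - sum(group)
        if not (min_prop * n <= n_low <= max_prop * n):
            continue  # skip splits that leave too few subjects in either group
        stat = logrank_chi2(times, events, group)
        if stat > best[1]:
            best = (cut, stat)
    return best
```

Because the cut point is chosen to maximize the statistic, a naive p-value computed afterwards on the dichotomized data tends to be optimistic; maxstat's adjusted p-values exist to address this.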
Table 1. Logistic regression results for the association between HRNonRef and cancer risk.
| LogOR¹ (95% CI) | p | Cancer² | Race | Gender | Cases | Controls³ |
|---|---|---|---|---|---|---|
| 1.8973 (1.6228 to 2.1920) | 4.89 × 10⁻²⁸ | BRCA | Black | female | 163 | 342 |
| 1.1487 (0.8858 to 1.4336) | 5.28 × 10⁻¹² | SKCM | Caucasian | male | 274 | 240 |
| 0.9630 (0.7327 to 1.2104) | 3.34 × 10⁻¹¹ | OV | Caucasian | female | 369 | 263 |
| 1.4445 (1.0980 to 1.8303) | 8.83 × 10⁻¹¹ | STAD | Caucasian | male | 153 | 240 |
| 1.1018 (0.8318 to 1.3940) | 1.17 × 10⁻¹⁰ | BRCA | Caucasian | female | 653 | 263 |
| 1.1224 (0.8393 to 1.4289) | 3.80 × 10⁻¹⁰ | HNSC | Caucasian | male | 300 | 240 |
| 1.2666 (0.9249 to 1.6396) | 5.68 × 10⁻⁹ | LUAD | Caucasian | female | 165 | 263 |
| 0.9607 (0.6844 to 1.2734) | 7.93 × 10⁻⁸ | HNSC | Caucasian | female | 120 | 263 |
| 0.9216 (0.6325 to 1.2450) | 7.44 × 10⁻⁷ | PRAD | Caucasian | male | 127 | 240 |
| 0.8458 (0.5814 to 1.1452) | 7.70 × 10⁻⁷ | COAD | Caucasian | male | 101 | 240 |
| 0.6438 (0.4208 to 0.8882) | 6.00 × 10⁻⁶ | SKCM | Caucasian | female | 168 | 263 |
| 0.4797 (0.2712 to 0.7050) | 2.71 × 10⁻⁴ | LUAD | Caucasian | male | 134 | 240 |
| 0.4689 (0.2634 to 0.6977) | 3.82 × 10⁻⁴ | LUSC | Caucasian | male | 166 | 240 |
| 0.4903 (0.2585 to 0.7537) | 1.12 × 10⁻³ | LIHC | Asian | male | 118 | 244 |

¹ Log odds ratio; unit = per standard deviation.
² Cancer abbreviations: BRCA, Breast Invasive Carcinoma; SKCM, Skin Cutaneous Melanoma; OV, Ovarian Serous Cystadenocarcinoma; STAD, Stomach Adenocarcinoma; HNSC, Head and Neck Squamous Cell Carcinoma; LUAD, Lung Adenocarcinoma; PRAD, Prostate Adenocarcinoma; COAD, Colon Adenocarcinoma; LUSC, Lung Squamous Cell Carcinoma; LIHC, Liver Hepatocellular Carcinoma.
³ Matched normal controls were taken from the International Genome Sample Resource (IGSR).
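Each row of Table 1 reports a log odds ratio per standard deviation, i.e. the slope of a logistic regression of case status on the standardized predictor. A minimal sketch of such a fit is below, assuming a single predictor without covariates (this section does not list any adjustment variables). It uses Newton–Raphson with Wald confidence intervals; the asymmetric intervals in Tables 1 and 2 suggest profile-likelihood intervals were reported instead, so Wald limits from this sketch would differ slightly.

```python
import math
from typing import Sequence, Tuple


def _score_and_information(z, y, b0, b1):
    """Score vector (g0, g1) and Fisher information entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for zi, yi in zip(z, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * zi
        h00 += w
        h01 += w * zi
        h11 += w * zi * zi
    return g0, g1, h00, h01, h11


def log_or_per_sd(x: Sequence[float], y: Sequence[int], iters: int = 25) -> Tuple[float, float, float, float]:
    """Fit logit P(case) = b0 + b1 * z by Newton-Raphson, where z is x standardized.

    y: 1 = cancer case, 0 = control. Assumes cases and controls are not perfectly
    separable (otherwise the MLE does not exist). Returns (b1, lower, upper, p) with
    a Wald 95% CI and Wald p-value; b1 is the log odds ratio per SD of x.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_information(z, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_information(z, y, b0, b1)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # [I^{-1}] entry for b1
    p = math.erfc(abs(b1 / se) / math.sqrt(2))
    return b1, b1 - 1.959964 * se, b1 + 1.959964 * se, p
```

A log odds ratio converts to an odds ratio by exponentiation: for the first row of Table 1, exp(1.8973) ≈ 6.67, i.e. roughly 6.7-fold higher odds of BRCA among Black females per one-standard-deviation increase in HRNonRef.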
Table 2. Logistic regression results for the association between ROH and cancer risk.
| LogOR¹ (95% CI) | p | Cancer² | Race | Gender | Cases | Controls³ |
|---|---|---|---|---|---|---|
| 15.6383 (7.9048 to 23.5950) | 0.0010 | COAD | Caucasian | male | 101 | 240 |
| −0.2304 (−0.3658 to −0.0969) | 0.0048 | OV | Caucasian | female | 369 | 263 |
| 5.3577 (2.2553 to 8.5218) | 0.0048 | PRAD | Caucasian | male | 127 | 240 |
| −0.2839 (−0.4606 to −0.1132) | 0.0071 | LUAD | Caucasian | female | 165 | 263 |
| −0.1797 (−0.2989 to −0.0623) | 0.0122 | BRCA | Caucasian | female | 653 | 263 |
| −0.2557 (−0.4285 to −0.0882) | 0.0133 | SKCM | Caucasian | female | 168 | 263 |
| 8.1083 (2.2387 to 14.1885) | 0.0253 | HNSC | Caucasian | male | 300 | 240 |
| 0.2548 (0.0708 to 0.4519) | 0.0269 | LIHC | Asian | male | 118 | 244 |
| 7.1487 (1.0826 to 13.0103) | 0.0422 | LUSC | Caucasian | male | 166 | 240 |
| 5.0918 (0.6647 to 9.4886) | 0.0483 | STAD | Caucasian | male | 153 | 240 |
| −0.2022 (−0.3918 to −0.0282) | 0.0670 | BRCA | Black | female | 163 | 342 |
| −0.0873 (−0.2877 to 0.0971) | 0.4534 | HNSC | Caucasian | female | 120 | 263 |
| 0.7471 (−0.0236 to 2.9058) | 0.5483 | SKCM | Caucasian | male | 274 | 240 |
| 0.0625 (−0.1152 to 0.2397) | 0.5614 | LUAD | Caucasian | male | 134 | 240 |

¹ Log odds ratio; unit = per standard deviation.
² Cancer abbreviations: COAD, Colon Adenocarcinoma; OV, Ovarian Serous Cystadenocarcinoma; PRAD, Prostate Adenocarcinoma; LUAD, Lung Adenocarcinoma; BRCA, Breast Invasive Carcinoma; SKCM, Skin Cutaneous Melanoma; HNSC, Head and Neck Squamous Cell Carcinoma; LIHC, Liver Hepatocellular Carcinoma; LUSC, Lung Squamous Cell Carcinoma; STAD, Stomach Adenocarcinoma.
³ Matched normal controls were taken from IGSR.
Table 3. Association between global autozygosity and mutational signatures.
| Effect | Std. error¹ | Adjusted p | Signature | Cases | Predictor | Gender | Race | Cancer² |
|---|---|---|---|---|---|---|---|---|
| 0.2638 | 0.0387 | 4.62 × 10⁻⁸ | SBS44 | 116 | ROH | female | Caucasian | HNSC |
| 0.3571 | 0.0679 | 3.02 × 10⁻⁵ | SBS36 | 125 | ROH | male | Caucasian | PRAD |
| 0.8407 | 0.1970 | 0.0013 | SBS9 | 277 | HRNonRef | female | Caucasian | OV |
| 1.3245 | 0.3223 | 0.0013 | SBS18 | 277 | HRNonRef | female | Caucasian | OV |
| 1.0001 | 0.2359 | 0.0018 | SBS42 | 164 | ROH | male | Caucasian | LUSC |
| 0.6597 | 0.1891 | 0.0070 | SBS5 | 277 | HRNonRef | female | Caucasian | OV |
| 0.8939 | 0.2564 | 0.0070 | SBS7c | 277 | HRNonRef | female | Caucasian | OV |
| 2.9086 | 0.7533 | 0.0172 | SBS7b | 132 | ROH | male | Caucasian | LUAD |
| −0.0885 | 0.0234 | 0.0206 | SBS44 | 166 | HRNonRef | female | Caucasian | SKCM |
| 0.9714 | 0.3227 | 0.0280 | SBS22 | 277 | HRNonRef | female | Caucasian | OV |
| 0.2439 | 0.0718 | 0.0453 | SBS36 | 125 | HRNonRef | male | Caucasian | PRAD |

¹ Standard error.
² Cancer abbreviations: HNSC, Head and Neck Squamous Cell Carcinoma; PRAD, Prostate Adenocarcinoma; OV, Ovarian Serous Cystadenocarcinoma; LUSC, Lung Squamous Cell Carcinoma; LUAD, Lung Adenocarcinoma; SKCM, Skin Cutaneous Melanoma.
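Table 3 reports adjusted p-values without naming the correction. The sketch below implements the Benjamini–Hochberg false discovery rate procedure, a common choice when many signature-by-predictor tests are run; it is an assumption for illustration, not a statement of the method the authors used. Tied adjusted values, such as the paired 0.0013 and 0.0070 entries, are a typical by-product of the running minimum in step-up procedures like this one, though that alone does not identify the method.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # step up from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p-values `[0.01, 0.04, 0.03, 0.20]`, the adjusted values are 0.04, 0.0533, 0.0533 and 0.20, showing how two different raw p-values can share one adjusted value.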
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Jiang, L.; Guo, F.; Tang, J.; Leng, S.; Ness, S.; Ye, F.; Kang, H.; Samuels, D.C.; Guo, Y. Global Autozygosity Is Associated with Cancer Risk, Mutational Signature and Prognosis. Cancers 2020, 12, 3646. https://doi.org/10.3390/cancers12123646

AMA Style

Jiang L, Guo F, Tang J, Leng S, Ness S, Ye F, Kang H, Samuels DC, Guo Y. Global Autozygosity Is Associated with Cancer Risk, Mutational Signature and Prognosis. Cancers. 2020; 12(12):3646. https://doi.org/10.3390/cancers12123646

Chicago/Turabian Style

Jiang, Limin, Fei Guo, Jijun Tang, Shuguan Leng, Scott Ness, Fei Ye, Huining Kang, David C. Samuels, and Yan Guo. 2020. "Global Autozygosity Is Associated with Cancer Risk, Mutational Signature and Prognosis" Cancers 12, no. 12: 3646. https://doi.org/10.3390/cancers12123646

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
