Population Substructure Has Implications in Validating Next-Generation Cancer Genomics Studies with TCGA
Abstract
1. Introduction
2. Results
3. Discussion
4. Materials and Methods
4.1. Tissue Procurement
4.2. RNA Purification and Sequencing
4.3. TCGA Cohort
4.4. File Pre-Processing
4.5. Data Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
TCGA | The Cancer Genome Atlas |
UIHC | University of Iowa Hospitals and Clinics |
RNA | ribonucleic acid |
NHW | non-Hispanic white |
NHB | non-Hispanic black |
Cohort | UIHC | UIHC | TCGA | TCGA |
---|---|---|---|---|
Cancer | Ovarian | Endometrial | Ovarian | Endometrial |
Histological type | High-grade serous | Endometrioid | High-grade serous | Endometrioid |
Samples (n) | 50 | 62 | 351 | 395 |
Age, mean (years) | 59 | 61 | 59 | 65 |
Race * | | | | |
White | 48 | 57 | 302 | 288 |
Black | 1 | 0 | 25 | 61 |
Asian | 0 | 0 | 10 | 17 |
Pacific Islander | 0 | 1 | 1 | 7 |
American Indian | 0 | 0 | 2 | 3 |
Unknown | 1 | 4 | 12 | 20 |
Ethnicity * | | | | |
Hispanic | 0 | 0 | 8 | 9 |
Non-Hispanic | 49 | 58 | 201 | 275 |
Unknown | 1 | 4 | 142 | 111 |
Stage | | | | |
I | 0 | 44 | 1 | 281 |
II | 0 | 4 | 20 | 34 |
III | 34 | 11 | 274 | 66 |
IV | 13 | 3 | 53 | 14 |
Unknown | 3 | 0 | 1 | 1 |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Miller, M.D.; Devor, E.J.; Salinas, E.A.; Newtson, A.M.; Goodheart, M.J.; Leslie, K.K.; Gonzalez-Bosquet, J. Population Substructure Has Implications in Validating Next-Generation Cancer Genomics Studies with TCGA. Int. J. Mol. Sci. 2019, 20, 1192. https://doi.org/10.3390/ijms20051192