Novel Somatic Copy Number Alteration Identified for Cervical Cancer in the Mexican American Population

Cervical cancer affects millions of Americans, but the rate of cervical cancer in Mexican Americans is approximately twice that in non-Mexican Americans. The etiologies of cervical cancer are still not fully understood. A number of somatic mutations, including several copy number alterations (CNAs), have been identified in the pathogenesis of cervical carcinomas in non-Mexican Americans. Thus, the purpose of this study was to investigate CNAs in association with cervical cancer in the Mexican American population. We conducted a pilot study of genome-wide CNA analysis using 2.5 million markers in four diagnostic groups: reference (n = 125), low grade dysplasia (cervical intraepithelial neoplasia (CIN)-I, n = 4), high grade dysplasia (CIN-II and -III, n = 5) and invasive carcinoma (squamous cell carcinoma (SCC), n = 5), followed by data analyses using Partek. We observed a statistically-significant difference of CNA burden between the case and reference groups for CNAs of different sizes (>100 kb, 10–100 kb and 1–10 kb), including both deletions and amplifications; e.g., a statistically-significant difference of >100 kb deletions was observed between the reference (6.6%) and pre-cancer and cancer (91.3%) groups. Recurrent aberrations of 98 CNA regions were also identified in cases only. However, none of the CNAs had an impact on cancer progression. A total of 32 of the CNA regions identified contained tumor suppressor genes and oncogenes. Moreover, pathway analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) revealed endometrial cancer and estrogen signaling pathways associated with this cancer (p < 0.05). This is the first report of CNAs identified for cervical cancer in the U.S. Latino population using high density markers. We are aware of the small sample size in the study. Thus, additional studies with a larger sample are needed to confirm the current findings.


Introduction
Based on the U.S. Centers for Disease Control and Prevention (CDC) 2009 report, 12,357 women were diagnosed with cervical cancer, and 3909 women died from cervical cancer in the United States. This large number of patients with cervical cancer is a critical issue, and cervical cancer is responsible for 10%-15% of cancer-related deaths in females globally [1]. Cervical cancer is the second most common malignant tumor in women worldwide. Among these tumors, approximately 80% are squamous cell carcinomas (SCCs), and 5%-20% are adenocarcinomas (AdCAs) [2,3]. Cervical SCCs develop from a premalignant disease known as cervical intraepithelial neoplasia (CIN), graded 1-3 with increasing atypical features. The five-year overall survival for cervical cancer is only 66%.
In addition, increasing evidence suggests that infection with high-risk subtypes of human papillomavirus (HPV) (e.g., HPV-16 and HPV-18) is the most common cause and the primary initiator of premalignant lesions [4]. However, only a small proportion of women infected with oncogenic HPV subtypes develop cervical cancer, which suggests that HPV infection alone is insufficient to cause cancer and that other host factors may be linked to the development of invasive cervical cancer [5], such as genetic variation, including polymorphisms, insertions or deletions in the host genome [6,7]. Increasing evidence demonstrates a consistent relationship between certain genetic variants (such as the tumor protein 53 (TP53) Arg72Pro polymorphism) and cervical cancer, most likely modulated by the presence of high-risk HPV during progression from squamous intraepithelial lesions (SIL) to cervical cancer.
Moreover, Mexican American women in Texas have among the highest rates of cervical cancer incidence and mortality in the country [8], with twice the frequency of their non-Mexican American counterparts. The annual death rate from cervical cancer for Mexican Americans is 24.2 out of 100,000 [9]. However, there is a lack of studies on somatic mutation identification for cervical cancer in this population. Only four studies have reported on the genetic basis of cervical cancer in Mexican American women based on a PubMed search (19 November 2015). The Mexican American population was estimated at 50.5 million in the 2010 census, making Mexican Americans the largest minority group in the U.S., as well as a rapidly growing segment of the U.S. population.
Recent advances in genome studies have led to the discovery of one important type of variation that can be assessed with recent technology: copy number changes. These are usually termed copy number alterations (CNAs) when they are somatic changes in a cancer study, and copy number variations (CNVs) when they are germline changes, usually in a non-cancer study, such as the CNVs identified for neuropsychiatric disorders in our recent study [10]. These CNAs or CNVs are by definition chromosomal regions with sizes of 1 kb to several Mb, which vary across individuals with regard to the number of copies of a chromosomal segment. CNVs refer to structural variations of the DNA that include insertions, deletions and duplications. Studies have found that CNVs cover as much as 14% of the human genome [11], and that the de novo rate of CNVs is as much as 10-1000-fold higher than that of single nucleotide polymorphisms (SNPs) [11,12]. Furthermore, CNVs have been shown to account for more genomic differences between individuals than SNPs [13,14]. Therefore, CNVs may contribute a sizeable amount of disease phenotypic variation in each individual of a population [15].
Analyses integrating mutation information with data on rearrangements and CNAs have revealed a higher-order organization of the seemingly random genetic events that lead to cancer [16]. Interestingly, genes in regions subject to copy number changes appear to be organized along functional ontological terms related to cancer [16]. Studies have also implicated a number of somatic mutations, including TP53, phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha (PIK3CA), phosphatase and tensin homolog (PTEN), serine/threonine kinase 11 (STK11) and V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) [17,18], and several CNAs in the pathogenesis of cervical carcinomas [19,20] in non-Mexican Americans.
In this study, we carried out a genome-wide survey of potential somatic CNAs, including amplifications and deletions, in apparently normal tissues (n = 2), low grade dysplasia (n = 4), high grade dysplasia (n = 5) and invasive carcinoma (n = 5), together with blood samples of female subjects from the HapMap data (n = 125, serving as a reference group). We are aware of the limited number of cases and the lack of a control group. Thus, in the future, a larger study with a control sample and more cases is needed. We genotyped 2.5 million markers and analyzed somatic CNAs in a total of 14 tissues using the Illumina HumanOmni2.5-8 BeadChip Kit at tissue-level resolution. We mapped genomic changes (1) between peripheral blood samples of the reference subjects and cervical tissues from the cases (cervical dysplasia and invasive carcinoma) and (2) among four diagnostic groups (normal, low grade, high grade and invasive carcinoma). Because two normal cervical tissue samples provide insufficient statistical power, we excluded these two samples from further analysis. We expect that this study: (i) will provide an estimate of the prevalence of somatic CNAs by identifying specific patterns, genes and/or biological pathways associated with different stages of cervical dysplasia in Mexican Americans; (ii) will investigate the genomic context of these somatic CNAs; and (iii) will evaluate whether the burden of somatic mutations predicts tumor progression.

Materials
A total of 14 tissues (low grade dysplasia, high grade dysplasia and invasive carcinoma) from cases and 125 female subjects, serving as a reference group, were used for this study. The demographic information is shown in Table 1. The cases were categorized into three groups, including low grade dysplasia (CIN-I, n = 4), high grade dysplasia (CIN-II and III, n = 5) and squamous cell carcinoma (SCC, n = 5) groups. For a broader definition of the case group, we also divided the cases into two groups, pre-cancer (CIN-I, -II and -III) and cancer (SCC). All of the case subjects in the current study were from the Mexican American population recruited from the outpatient clinics at the University Medical Center (UMC) and Texas Tech University Health Sciences Center (TTUHSC)-El Paso. All cases of cervical cancer were diagnosed as SCC by histopathological examinations, whereas healthy women had no abnormal cytological findings in the Pap smear tests of the uterine cervix. The Illumina HumanOmni2.5-8 BeadChip data from a total of 125 blood samples of female subjects from the HapMap were used as a reference in the current study.

Tissue Specimens
Sixteen cervical tissue samples were obtained from the Department of Pathology, TTUHSC-El Paso. We excluded two normal tissues due to an insufficient tissue sample size. All case subjects were HPV positive, from the Mexican American population and had signed Institutional Review Board-approved written informed consent forms prior to enrolling in the study. The procedures were approved by the Institutional Review Boards of TTUHSC (IRB #E13107), and the study was performed in accordance with the Helsinki Declaration of 1975.

Microdissection
Paraffin-embedded tissues were first sectioned into 10-µm slices, which were hematoxylin-eosin stained for the selection of the appropriate tissue area. The corresponding selected areas of each tissue sample were then collected under microscopic observation with a 30-gauge needle (Becton-Dickinson, Franklin Lakes, NJ, USA).
Genomic DNA of micro-dissected tissue was extracted by proteinase K digestion followed by standard phenol-chloroform extraction. The QIAamp Formalin-Fixed, Paraffin-Embedded (FFPE) Tissue Kit from Qiagen (Valencia, CA, USA), which is widely used for extracting DNA from FFPE sections, was used. The experiment was performed according to the manufacturer's handbook. The total amount of DNA was spectrophotometrically determined by measuring the absorbance at 260 nm (A260), and DNA purity was assessed by detecting the A260/A280 ratio using the Varioskan Flash (Thermo Scientific, Rockford, IL, USA) according to the manufacturer's instructions.

High Density Genotyping
Genomic DNA from the case group was genotyped with the Illumina HumanOmni2.5-8 BeadChip (Illumina, San Diego, CA, USA). This DNA chip provides over 2.5 million markers at a median spacing of 1.2 kb and full support of CNV or CNA applications. Genomic annotations were based on National Center for Biotechnology Information (NCBI) Human Genome Build 37 (University of California, Santa Cruz Genome Browser Release 19), and genotyping experiments were performed at the Genomics Core at the TTUHSC-El Paso.
Due to our limited number of control samples of cervical tissues, we used publicly available HapMap data with genotypes, including CNV data of the same DNA chip, the Illumina HumanOmni2.5-8 BeadChip Kit, in the 125 female subjects as a reference group. The HapMap data with genotypes were downloaded from [21].
In addition, no genotype data are available for the Mexican American population in the HapMap data, so we selected these 125 female subjects from the admixed populations (Utah residents with Northern and Western European ancestry, CEU; Yoruba, YRI; Han Chinese, CHB; and Japanese, JPT), since combinations of reference panels with multi-ethnic populations had better accuracy than those containing only single ethnic samples [22] for genetic study and genetic imputation analysis.
The raw genotyping signal data were processed by the Illumina GenomeStudio software Genotyping Module Version 3.2.33 (Illumina) and converted to allele-specific intensity values. The genotype call rate threshold was set at ≥95% for all samples, and a total of 139 subjects, including case (n = 14) and reference (n = 125) subjects, passed the quality control. A Partek customized report of the normalized genotype data, composed of those 139 samples, including low grade (CIN-I = 4), high grade (CIN-II and CIN-III = 5), SCC (n = 5) and 125 females from the reference group (Table 1), was transferred to Partek® Genomics Suite® software, Version 6.6, Copyright 2014, Partek Inc. (Saint Louis, MO, USA), for downstream analysis.

CNA Detection
Unpaired copy number analysis was performed in the Partek Genomics Suite by comparing allele intensities to a reference baseline of 125 female HapMap admixed samples, using an analysis strategy similar to that of a previous publication [23]. The genomic segmentation algorithm was applied to find break points and to detect amplifications (gains) and deletions (losses). The following stringent parameters were used to identify CNAs and CNA regions, as in a previous study [24]: (1) each segment must contain a minimum of 10 consecutive filtered probe sets; (2) a p value threshold of 0.001 when compared to the neighboring adjacent regions; and (3) a signal-to-noise threshold of 0.5 and a diploid copy number range of 1.7 to 2.3.
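The three filters listed above can be illustrated with a minimal Python sketch. It is not the Partek genomic segmentation algorithm: the `Segment` record, its field names and the `classify_segment` helper are hypothetical, segments are assumed to have been produced by an earlier segmentation step, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all names below are illustrative only.
MIN_MARKERS = 10            # minimum consecutive filtered probe sets per segment
MAX_P_VALUE = 0.001         # p-value threshold versus neighboring regions
MIN_SIGNAL_TO_NOISE = 0.5   # signal-to-noise threshold
DIPLOID_RANGE = (1.7, 2.3)  # copy numbers inside this range count as unchanged


@dataclass
class Segment:
    """Hypothetical record for one segment from a prior segmentation step."""
    chrom: str
    start: int
    end: int
    n_markers: int
    p_value: float
    signal_to_noise: float
    copy_number: float


def classify_segment(seg: Segment) -> str:
    """Return 'amplification', 'deletion' or 'unchanged' for one segment."""
    passes_filters = (
        seg.n_markers >= MIN_MARKERS
        and seg.p_value <= MAX_P_VALUE  # inclusive bound is an assumption
        and seg.signal_to_noise >= MIN_SIGNAL_TO_NOISE
    )
    if not passes_filters:
        return "unchanged"
    low, high = DIPLOID_RANGE
    if seg.copy_number > high:
        return "amplification"
    if seg.copy_number < low:
        return "deletion"
    return "unchanged"
```

A segment that fails any filter is reported as unchanged in this sketch, which is one conservative way to read the "stringent parameters" described above.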

Pathway Analysis
We further examined whether these CNAs in the various genes have an impact on gene functions. Gene ontology (GO) analysis was also performed using the Partek® Genomics Suite® software, Version 6.6, Copyright 2014, Partek Inc., to investigate whether there was enrichment for cancer-associated CNAs (p < 0.05) in genes from any ontology categories.
Disease association of individual CNA frequencies with patients, as a group, compared to a reference group, was assessed using 2 × 2 or 2 × 3 contingency tables, two-tailed χ² tests or Fisher exact tests. Statistical analyses were performed using the SPSS statistical software package (Version 10, IBM, Chicago, IL, USA). Differences with two-tailed probability values of p ≤ 0.05 were taken as statistically significant. p-values for tests of CNA association were conservatively corrected for multiple testing using the Bonferroni method.
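As an illustration of the kind of test described above, the following Python sketch computes a Pearson χ² test on a 2 × 2 contingency table and applies a Bonferroni correction. The analyses in this study were run in SPSS; the function names `chi_square_2x2` and `bonferroni` are hypothetical, no continuity correction is applied, and the example counts in the test are invented rather than taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, two-tailed p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With small expected cell counts, as is likely with 14 cases, a Fisher exact test would normally be preferred over the χ² approximation, which is consistent with the text listing both.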

Overall CNA Patterns
We processed genotype and intensity data for all 2.5 million probes of the Illumina HumanOmni2.5-8 BeadChip Kit for the 14 cases and 125 reference subjects. A number of CNAs (deletions or losses) were present in many chromosomal regions (Figure 1). A total of 2220 CNAs >1000 bp were identified, including 725 (32.7%) amplifications (gains) and 1495 (67.3%) deletions (losses), mainly on the 22 autosomal chromosomes.
The chromosomal locations of the copy number amplifications and deletions of the 22 autosomes and X chromosomes are shown by karyograms (Figure 1). Most of the amplifications were found in the cases at the short arms of chromosomes 1, 5, 8, 16, 19 and 20, as well as the centromeres of chromosomes 14, 15 and 21. Most of the deletions in the cases were observed at the short arms of chromosomes 1, 4, 5, 7, 11, 12, 16, 17 and 19, as well as the centromere of chromosome 19. We observed a statistically-significant difference of CNA burden between case and reference groups (Table 2) for different sizes of CNAs (>100 kb, 10-100 kb and 1-10 kb). For example, statistically-significant differences of >100 kb, 10-100 kb and 1-10 kb deletions were observed between reference (6.7%, 2.5%, 0.8%), pre-cancer (92.5%, 81.5%, 87.7%) and cancer (89.0%, 76.7%, 86.6%) groups, respectively (Table 2).
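The size categories used for the burden comparison can be sketched as a simple binning step. This is only an illustration: the helper names `size_category` and `count_by_size` are hypothetical, the input is assumed to be a list of `(start, end, kind)` tuples, and the handling of the exact 10 kb and 100 kb boundaries is an assumption, since the text does not say which bin they fall in.

```python
from collections import Counter


def size_category(length_bp):
    """Map a CNA length in base pairs to one of the three size bins.

    Returns None for CNAs of 1000 bp or less, which are not counted.
    Boundary handling (10 kb, 100 kb) is an assumption.
    """
    if length_bp > 100_000:
        return ">100 kb"
    if length_bp >= 10_000:
        return "10-100 kb"
    if length_bp > 1_000:
        return "1-10 kb"
    return None


def count_by_size(cnas):
    """Count CNAs per (kind, size bin); cnas is a list of (start, end, kind)."""
    counts = Counter()
    for start, end, kind in cnas:
        category = size_category(end - start)
        if category is not None:
            counts[(kind, category)] += 1
    return counts
```

Counts produced this way per diagnostic group could then be compared between groups with a contingency-table test.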

Principal Component Analysis (PCA)
To characterize aberration profiles in different diagnostic groups (low grade dysplasia, high grade dysplasia, SCC and reference), we performed a PCA (Figure 2). There was a great variability in the three diagnostic groups (reference, pre-cancer and cancer) and four diagnostic groups (reference, low grade, high grade and SCC). These results indicate that there is a distinct difference in the patterns of the CNAs between case and reference groups based on the genomic profiles of these subjects.

Overall, there was a clear separation between the reference and the case group when examined by PCA clustering.

Genome-Wide CNA and Cancer Progression
Increasing evidence demonstrates that more numerical genomic alterations are associated with progression from precursor lesions to invasive cancer [25]. Thus, we conducted CNA genomic analysis in different diagnostic groups to understand the complexities of the genomic architecture of these highly heterogeneous groups of cervical cancer and to assess whether CNA burden impacts the prognosis of cervical cancer in this Mexican American population. We analyzed CNAs among two (pre-cancer and cancer; Table 2B) and three diagnostic groups (low grade, high grade and SCC; Table 2A) in cases only. There was no statistically-significant difference of CNA burden among diagnostic groups in cases in any size of CNA, although there were slightly higher frequencies of amplifications in the >100 kb, 10-100 kb and 1-10 kb categories in the cancer group (11.0%, 23.3% and 13.4%, respectively) as compared to the pre-cancer group (7.5%, 18.5% and 12.3%, respectively) (Table 2) using 205 two-tailed χ² tests with the SPSS statistical software package.

Recurrent Aberrations Identified in Cases Only
We are interested in recurrent CNAs because the chromosome regions most likely to harbor disease-critical genes are those that show alterations recurring among individuals with cancer or other diseases [26,27]. In this context, we define a recurrent CNA region as a set of contiguous genes (a region) that shows a high enough probability (or evidence) of being altered (e.g., deletion or insertion) in two or more samples, as previously described [28]. Recurrent aberrations in a total of 98 CNA regions containing 849 genes/loci were identified only in the case group (pre-cancer and cancer subjects), with at least two cases carrying each of these CNAs. None of these CNAs were observed in the reference group (Table 3). Among these 98 recurrent CNA regions, there were 545 deletions (84%) and 104 amplifications (16%) observed in cases only using statistical analyses described in the Methods section.
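The selection rule described above (altered in two or more cases, never in the reference group) can be sketched as a set operation. This is a simplified illustration, not the procedure of [28]: the helper `recurrent_case_only_regions` is hypothetical, each sample is assumed to be already summarized as a set of altered region labels, and the region labels in any example are placeholders.

```python
from collections import Counter


def recurrent_case_only_regions(case_calls, reference_calls, min_cases=2):
    """Return {region: number of cases} for case-only recurrent regions.

    case_calls and reference_calls map a sample ID to the set of region
    labels altered in that sample. A region is kept if it is altered in
    at least min_cases cases and in no reference sample.
    """
    case_counts = Counter()
    for regions in case_calls.values():
        case_counts.update(set(regions))  # count each region once per sample

    seen_in_reference = set()
    for regions in reference_calls.values():
        seen_in_reference.update(regions)

    return {
        region: n
        for region, n in case_counts.items()
        if n >= min_cases and region not in seen_in_reference
    }
```

Ranking the returned regions by their case counts would give a list comparable in spirit to a "top regions" table.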
Table 3 shows the top 20 chromosome regions with a high number of recurrent somatic CNAs (>100 kb) in the cases with at least nine subjects carrying these deletions in each CNA region; none of these CNAs were observed in the reference group. Some of the CNAs contain tumor suppressor genes (e.g., axis inhibition protein 1 (AXIN1) and tuberous sclerosis complex 2 (TSC2)).
Eight exceptionally large CNA regions with deletions of between 517 kb and 1591 kb were detected in cases on the short arms of chromosomes 1, 16 and 19.
In addition to recurrent deletions identified in the cases (Table 3), two recurrent CNA regions with a 220-kb amplification on 21q11.2 and a 113.5-kb deletion on 7p22.2 were observed in only CIN-III and SCC.

Pathway Analysis
Recent studies have incorporated protein networks into the results of genome-wide CNA data using networks or GO analysis to discover disease-associated and/or enriched pathways [30-33]. Therefore, we conducted GO analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis.
The results of the functional enrichment of KEGG pathway analysis in the identified CNAs >100 kb that occurred in two or more cases are shown in Table 6. Fourteen pathways were discovered, including the insulin signaling pathway and the endometrial cancer and estrogen signaling pathways, with enrichment p values lower than 0.05. A GO analysis was performed by selecting CNAs in the genes to see if any GO categories were overrepresented among CNAs identified in cases [34]. Many biological processes were undisturbed at the molecular level, while others were frequently affected across multiple cases, as shown in Table 7, which lists the most commonly-affected functions with p-values lower than 0.05 and more than four genes in each function category.
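One common way to compute an enrichment p-value for a pathway or GO category is a one-sided hypergeometric test, sketched below in Python. The text does not state which test Partek applied, so this is an assumption for illustration only; the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for over-representation.

    n_background: genes in the background set
    n_in_pathway: background genes annotated to the pathway
    n_selected:   genes located in the CNA regions of interest
    n_overlap:    selected genes that are annotated to the pathway

    Returns P(X >= n_overlap) when n_selected genes are drawn at random
    without replacement from the background.
    """
    upper = min(n_in_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

When many pathways are tested at once, the resulting p-values would usually also be corrected for multiple testing.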

Discussion
It has been demonstrated that somatic structural alterations (e.g., amplifications or deletions) of human chromosomes represent a common class of mutations, which may cause gene disruption (e.g., deletion or rearrangement), gene activation (e.g., CNAs, gain or amplification) or the formation of novel oncogenic gene products (gene fusions). Many of these events actively drive carcinogenesis [35,36]. In our initial cervical cancer cohort, some CNA patterns identified were previously found to be correlated with cervical cancer.
We screened whole genomes with an array of 2.5 million markers to discover any recurrent copy number alterations in cases (pre-cancer and SCC). We identified a total of 98 CNA regions >100 kb in the case group, including low grade dysplasia, high grade dysplasia and SCC. These CNAs occurred in two or more cases, including two large deletions (1591 kb and 1123 kb) (Table 3). Of the top 20 CNA regions with >100 kb deletions in cases only, six CNAs occurred on 16p13, a region that has been reported in cervical cancer [37], and mutation analysis of the AXIN1 gene located at 16p13 was reported to be involved in the Wnt pathway in cervical carcinomas [38]. The CNA containing the SET and MYND domain-containing protein 4 (SMYD4) gene on 17p13.3 was also reported previously in this cancer [37]. The SMYD4 gene has been demonstrated to be a potential tumor suppressor that plays a critical role in breast carcinogenesis, at least partly through inhibiting the expression of platelet-derived growth factor receptor-alpha (PDGFRA), and this gene could be a novel target for improving the treatment of breast cancer [39].
A number of previous studies demonstrate that numerical chromosomal aberrations increase with progression from precursor lesions to invasive cancer [25,37]; however, we did not observe this feature in our current study sample. This might be due to the small sample size or to other unknown or not yet identified factors, such as ethnicity, since most previous study populations are non-Hispanic populations.
Among the two recurrent CNA regions identified in CIN-III and SCC, the 220-kb amplification on 21q11.2 is consistent with previous findings, where the amplification had been identified in breast cancer subjects with tamoxifen resistance [40].
The recurrent CNA regions we identified contain a number of tumor suppressor genes, oncogenes and cancer-related genes. A CNA that contains the interferon-induced transmembrane protein 1 (IFITM1) gene at 11p15.5 was identified in eight cases. This gene has been reported to be involved in cervical carcinogenesis [41]. A total of nine cases with the CNA containing the fucosyltransferase 3 (FUT3) gene at 19p13.3 were observed in the current study, and this gene is associated with breast cancer [42]. A CNA region that includes the naked cuticle 2 (NKD2) gene at 5p15.35 was detected in six cases. This gene was found to suppress breast cancer proliferation by inhibiting Wnt signaling [43]. The ovarian cancer-associated gene 2 protein (OVCA2) at 17p13.3 was repeatedly reported to be associated with ovarian cancer, and we observed five cases carrying the CNA at this gene region. A cluster of three tumor suppressor genes, cadherin 1 (CDH1), death-associated protein kinase (DAPK) and hypermethylated in cancer 1 (HIC1), displayed a significantly increased frequency of promoter methylation with progressively more severe cervical neoplasia. In addition, the Hes family BHLH Transcription Factor 5 (HES5) gene was reported to be associated with cervical carcinoma cells using immunocytochemistry, Western blot and methyl thiazolyl tetrazolium assays [44], and the CNAs on 1p36.33-1p36.32 containing the HES5 gene were also identified in the current study. Our newly-identified CNA containing this gene on 7p22.2 was also observed in the recurrent pre-cancer and cancer subjects (Table 3).
Moreover, the deletion burden we observed was similar to that reported for other cancers: most cancer studies identified more deletions than amplifications, which supports our findings. A recent study using the TCGA data identified nine regions of deletion that were unique to estrogen receptor positive (ER+) post menopause tumors in patients with breast cancer [45], including a deletion in 7p22.3, where our newly-identified deletion in cases only was located, and it contains a known tumor suppressor gene.
To analyze the possible effect of genomic alterations, to further capture cancer-causing gene information and to see whether any GO categories are overrepresented among CNA regions, we searched the KEGG pathway database and GO categories and identified a number of pathways and functions where CNAs occur in the SCC group, but not in the reference group (Tables 6 and 7). Furthermore, our results were partly in agreement with previous reports about cervical cancer. For example, using the functional enrichment of KEGG pathway analysis, we discovered the insulin signaling pathway in the case group, and it has been shown that the HPV 16 E6 oncoprotein interferes with the insulin signaling pathway by binding to tuberin [46]. As is already known, high-risk HPV infection is a causal agent for cervical cancer. In addition, two pathways of interest observed in the current study are the endometrial cancer and estrogen signaling pathways. The NF-κB signaling pathway was also identified in our cancer group (SCC), but not in the reference group, which supports a previous study where bisphenol A (BPA) stimulated cervical cancer migration via IKK-β/NF-κB signals [47]. The estrogen signaling pathway was also shown in our KEGG analysis. A previous study was performed to evaluate the potential of miRNAs as novel markers for the post-therapeutic monitoring of cervical SCC patients. A regulatory network of differentially-expressed serum miRNAs was identified, and a number of target genes were predicted in the estrogen-mediated signal pathways [48]. Our findings and those of others support that cervical cancer is a hormone-associated gynecological disease. In addition, HPV infection has been associated with the deregulation of the PI3K-Akt-mTOR pathway in invasive cervical carcinomas [49]. The PI3K-Akt signaling pathway was also found in our current study, and 14 genes in this pathway were found in two or more SCC cases (Table 6).
Moreover, signal transducer activity with an enrichment score of 4.2 and a p value of 0.01 containing 11 genes was also identified in a previous report on cervical cancer [50] using GO analysis. Signal transduction was identified in our current study in cervical cancer (SCC), but not in the reference group; a recent study also supports that a signal transduction cascade and mitogen-activated protein kinase kinase kinase 3 (MEKK3) serve as key integration points and are important factors in regulating cellular responses to environmental stress. This signal transduction and MEKK3 not only link diverse extracellular stimuli to subsequent signaling molecules, but they also amplify the initiating signals to ultimately activate effector molecules and induce cell proliferation, differentiation and survival of cervical cancer [51].
These pathways and functions were identified as overlapping biological themes, and these data may provide useful information on the molecular mechanisms for cervical cancer, its prognosis and treatment responses.
There are a number of novelties in this study: (1) this is the first report of CNAs in the relatively ethnically homogeneous group of Mexican Americans using high density mapping (2.5 million markers); previous studies using lower density markers may have underestimated the genetic changes that take place in cervical cancer; (2) newly-identified CNAs in cases, together with results of the in silico analysis using KEGG and GO function, provide insight into how multiple CNAs might contribute to cervical cancer development. We also are aware of some limitations in our study. (1) Our small sample size (14 cases and 125 references) is a major limitation for this type of study. Due to the small number of samples, the nature of genome-wide alteration of copy number may not be fully explained in pre-cancer and SCC. Future confirmation studies using an independent sample will provide an opportunity to more accurately dissect the genetic complexity of somatic CNAs for cervical cancer. (2) There might be a bias in the CNAs identified in cervical cancer for the Mexican Americans, since we used the 1000 Genome admixed populations, not Mexican Americans, as the reference; thus, we currently are recruiting more subjects with cervical cancer from the same population and plan to validate the findings in more samples. (3) Other biomarkers, such as RNA-Seq or DNA methylation profiles and sequencing data using next generation sequencing technologies (such as target gene sequencing in these CNA regions), will provide an opportunity for in-depth molecular profiling of the fundamental biological processes of cervical cancer. We expect that with future validation and confirmation, newly-discovered CNAs can be used in cancer classification, diagnosis, prognosis and treatment responses.

Conclusions
Using high density SNP array analysis, we have shown extensive genome-wide CNA changes in pre-cancer and cancer groups as compared to the genome CNA profile in the reference group. Our results demonstrated that the recurrent alterations of CNAs occurred in cases of pre-cancer and SCC. Some of the somatic genomic gains and losses in cervical pre-cancer and cancer were in concordance with the results from previous studies. To our knowledge, no previous studies have applied genome-wide copy number analysis using such high density markers for cervical cancers in the Mexican American population; however, validation and confirmation in a larger sample size are needed.

Figure 1. Karyogram view of detected amplified and deleted regions across autosomes. Amplifications are shown at the right side of the chromosomes, and deletions are shown at the left side of the chromosomes. The length of the horizontal bar corresponds to the number of samples observed at the respective cytobands.

Figure 2. Plot of principal component analysis (PCA) and hierarchical clustering of copy number variation (CNV) or CNA datasets. (A) PCA scatter plot of three diagnostic groups (pre-cancer, cancer and reference). Each point represents a specific sample. Points are colored by group status, with blue representing pre-cancer (CIN-I, -II and -III), red representing invasive cancer and green representing references. (B) PCA scatter plot of four diagnostic groups. Each point represents a specific sample.

Table 1. Clinical demographics with diagnosis, age for cases and reference subjects.

Table 2. Copy number alteration (CNA) burden (deletion and amplification) among different diagnostic groups.

Table 3. Top 20 CNA chromosome regions with recurrent somatic CNAs (deletions >100 kb) in the cases; not observed in the reference group.

Table 6. Functional enrichment of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis in the identified CNAs (>100 kb and occurring in two or more patients with SCC).

Table 7. Gene ontology (GO) categories of CNAs (>100 kb and occurring in two or more patients with SCC).