Abstract
Human papillomavirus (HPV) infection is a primary driver of cervical cancer. Integration of HPV into the human genome causes persistent expression of viral oncogenes E6 and E7, which promote carcinogenesis and disrupt host genomic function. However, the impact of integration on host gene expression remains incompletely understood. We used multimodal RNA sequencing, combining total RNA-seq and Cap Analysis of Gene Expression (CAGE), to clarify virus–host interactions after HPV integration. HPV-derived transcripts were detected in 17 of 20 clinical samples. In most specimens, transcription start sites (TSSs) showed predominant early promoter usage, and transcript patterns differed according to whether RNA from the E4 region was detectable. Notably, high RNA expression of the E4 region and the presence of viral-human chimeric RNAs were mutually exclusive. Chimeric RNAs were identified in 13 of 17 samples, revealing 16 viral integration sites (ISs). CAGE data revealed two patterns of TSS upregulation centered on the ISs: a two-sided pattern (7 of 16 ISs, 43.8%) and a one-sided pattern (5 of 16 ISs, 31.3%). Total RNA-seq showed upregulation of 12 putative cancer-related genes near ISs, including MAGI1-AS1, HAS3, CASC8, BIRC2, and MMP12. These findings indicate that HPV integration drives transcriptional activation near ISs, enhancing expression of adjacent cancer-related genes. Our study deepens understanding of HPV-induced carcinogenesis and informs precision medicine strategies for cervical cancer.
1. Introduction
Cervical cancer is mainly caused by high-risk human papillomavirus (HR-HPV) infection [,,]. In HPV-infected cells, HPV-derived E6 and E7 oncoproteins inactivate p53 and pRb tumor suppressor proteins, which eventually cause resistance to apoptosis and promote cell proliferation [,,]. Continuous expression of E6 and E7 oncoproteins is important for cervical cancer progression [,]. HPV integration into the human genome is one of the most influential factors that induces the continuous expression of E6 and E7 oncoproteins.
Traditionally, HPV integration has been studied using techniques such as in situ hybridization and PCR amplification of viral-host fusion sequences or transcripts []. More recently, the advent of next-generation sequencing (NGS) technologies—particularly whole genome sequencing and RNA sequencing (RNA-seq)—has enabled more comprehensive identification of integration sites (ISs) []. Furthermore, the recent development of NGS has facilitated the gradual elucidation of genetic variations associated with cervical carcinogenesis [,]. For example, HPV integration not only induces the continuous expression of E6 and E7 oncoproteins but also triggers various genetic alterations, including oncogenic amplification, chromosomal rearrangements, and chromosomal instability [,,,]. However, direct evaluation of transcriptional activation surrounding the integration sites has remained challenging.
Cap analysis of gene expression (CAGE) is a transcriptome profiling method that determines the 5′-terminal sequences of RNAs, allowing the detection of promoters and the quantitative measurement of their activities []. Moreover, CAGE enables the detection of enhancer elements by identifying their transcriptional direction [,]. In our previous report on CAGE, we revealed that HPV-derived transcription start sites (TSSs) were altered with cervical cancer progression. TSSs within the early viral promoter became dominant in cervical cancer and high-grade cervical intraepithelial neoplasia (CIN) []. However, CAGE cannot detect the integration of HPV DNA into the human genome or the associated genetic alterations. In contrast, paired-end total RNA-seq can detect transcriptionally active HPV DNA integration sites by identifying viral-human chimeric RNAs []. Hence, the combination of CAGE and total RNA-seq may enable us to assess the presence of activated HPV DNA integration sites, the expression of HPV-derived transcriptomes, and TSS upregulation around HPV DNA integration sites.
This study aimed to perform an integrative analysis involving CAGE and total RNA-seq to reveal the genomic status as well as TSS upregulation around integration sites caused by HPV integration into the human genome.
2. Materials and Methods
2.1. Patients and Clinical Samples
This study was conducted in accordance with the principles of the Declaration of Helsinki. HPV-infected cervical cancer tissue samples were obtained from biopsy or surgical samples. The diagnosis was confirmed by experienced pathologists based on a pathological examination performed at the University of Tokyo Hospital. All experimental procedures were approved by the Institutional Review Board of the University of Tokyo (approval number: G0637), and formal informed consent was obtained from each patient for the use of clinical samples.
2.2. HPV Genotyping
DNA was extracted from cervical cancer tissues using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. HPV genotyping assays were performed using multiplex polymerase chain reaction (PCR), which is a rapid, high-throughput genotyping procedure that allows the simultaneous detection of 16 types of genital HPV [], or PCR with PGMY primers followed by reverse line blot hybridization, which allows the detection of 31 types of HPV [].
2.3. Single-Strand CAGE and Total RNA-Seq
Cervical cancer tissues were homogenized in a gentleMACS M Tube (Miltenyi Biotec, Bergisch Gladbach, Germany) with gentleMACS Dissociators (Miltenyi Biotec). Total RNA was isolated using the miRNeasy Mini Kit (QIAGEN). The RNA-seq library was generated using the TruSeq Stranded Total RNA Library Prep kit with Ribo-Zero Human/Mouse/Rat (Illumina, San Diego, CA, USA) using 0.5 μg of total RNA, according to the manufacturer’s protocol. The CAGE library was prepared using 4 μg of total RNA, as previously reported []. The cDNA libraries were sequenced using a 50-base paired-end sequence for total RNA-seq and a 50-base single-read sequence for CAGE on an Illumina HiSeq 2500 sequencer (Illumina). Sequencing and base calling were performed according to the HiSeq 2500 System Guide using the TruSeq SBS kit v3-HS (Illumina). After sequencing, raw sequence data were processed using CASAVA-1.8.4, version RTA 1.17.20.0.
2.4. Mapping, Quantifying, and Visualizing HPV Transcriptomes
The total RNA-seq and CAGE-seq reads were mapped to the reference genomes of HPV16, HPV18, HPV33, HPV45, HPV52, and HPV58 retrieved from the papillomavirus genome database PaVE (https://pave.niaid.nih.gov/) using TopHat2 v. 2.0.6 [] and STAR v. 2.7.2b [], respectively. The mapping results were visualized using IGV.js v2.7.0 [,] and implemented on the NGS analysis platform Maser []. Based on the mapping results, the expression levels of the reference genomes were estimated according to the PaVE-annotated genes using CuffLinks v 2.0.2 suites [].
2.5. Identification of Viral-Human Chimeric RNA and HPV ISs
Viral-human chimeric RNA was identified as previously reported [] with slight modifications. Briefly, de novo transcriptome assemblies were performed on each RNA-seq dataset using Trinity r2013-02-24 [,]. The resultant contigs were aligned to the HPV reference genomes using BLASTN (BLAST+ version 2.10.0) with the optional settings of E-value < 1 × 10⁻⁶ and percent identity ≥ 98%. Next, the contigs that aligned to the HPV genomes were aligned to the hg38 reference genome (Human G+T database) using the National Center for Biotechnology Information (NCBI) Web BLAST interface according to the criteria described above. Finally, contigs that aligned with both the HPV and human reference genomes and whose aligned regions did not overlap were defined as viral-human chimeric RNAs.
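The final filtering step can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `Hit` record, its field names, and the function names are hypothetical, and the sketch assumes that the non-overlap condition refers to the aligned intervals on the contig (query coordinates, with start ≤ end). Only the E-value and percent-identity cutoffs are taken from the text.

```python
from dataclasses import dataclass

E_VALUE_MAX = 1e-6    # E-value cutoff stated in the text
IDENTITY_MIN = 98.0   # percent identity cutoff stated in the text


@dataclass(frozen=True)
class Hit:
    """One BLASTN hit reduced to the fields needed here (hypothetical layout)."""
    contig: str
    q_start: int      # 1-based start on the contig (query), assumed <= q_end
    q_end: int        # 1-based inclusive end on the contig (query)
    identity: float   # percent identity
    evalue: float


def passes_filters(hit: Hit) -> bool:
    """Apply the E-value and percent-identity cutoffs."""
    return hit.evalue < E_VALUE_MAX and hit.identity >= IDENTITY_MIN


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one base on the contig."""
    return max(a.q_start, b.q_start) <= min(a.q_end, b.q_end)


def chimeric_contigs(hpv_hits, human_hits):
    """Return sorted contig IDs that have a passing HPV hit and a passing
    human hit whose contig intervals do not overlap."""
    human_by_contig = {}
    for h in filter(passes_filters, human_hits):
        human_by_contig.setdefault(h.contig, []).append(h)

    result = set()
    for v in filter(passes_filters, hpv_hits):
        if any(not overlaps(v, h) for h in human_by_contig.get(v.contig, [])):
            result.add(v.contig)
    return sorted(result)
```

A contig whose HPV and human alignments cover the same bases is rejected here, since such a contig more likely reflects sequence similarity than a true viral-human junction.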
In each chimeric RNA, the 5′ end of the region where the chimeric RNA aligned with the hg38 reference genome was considered the IS. ISs located within 500 kb of each other were bundled, and the 5′ end position in each bundle was defined as the representative HPV IS.
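The bundling rule can be illustrated with a short sketch. It assumes single-linkage chaining along each chromosome (a site joins the current bundle if it lies within 500 kb of the previous site in sorted order) and takes the smallest genomic coordinate as the 5′ end of the bundle. Neither detail is specified in the text, and the function name is hypothetical.

```python
from collections import defaultdict

BUNDLE_DISTANCE = 500_000  # 500 kb, as stated in the text


def bundle_sites(sites, max_gap=BUNDLE_DISTANCE):
    """Group (chrom, pos) sites into bundles per chromosome.

    Returns a list of (chrom, representative_pos, member_positions), where the
    representative is the smallest coordinate in the bundle (an assumption).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)

    bundles = []
    for chrom in sorted(by_chrom):
        positions = sorted(set(by_chrom[chrom]))
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)      # close enough: extend the bundle
            else:
                bundles.append((chrom, current[0], current))
                current = [pos]          # too far: start a new bundle
        bundles.append((chrom, current[0], current))
    return bundles
```

With chaining, a bundle can span more than 500 kb in total if consecutive sites are each within 500 kb of their neighbor; anchoring every member to the first site instead would be an equally plausible reading.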
2.6. Human Gene Expression Analysis of Total RNA-Seq
The expression levels of human genes were estimated to investigate the transcriptional activity downstream of the IS. Total RNA-seq reads were mapped to the hg38 reference genome using TopHat2 v 2.0.6 [], and expression levels were calculated based on the NCBI reference sequences (RefSeq) using CuffLinks v 2.0.2 suites. Since topologically associating domains (TADs) are typically up to 1 Mb in size, we analyzed gene expression levels within 500 kb upstream and downstream of the ISs. Genes located within 500 kb of an IS were defined as “IS-neighboring genes.”
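A minimal sketch of the IS-neighboring gene selection is given below. It assumes that distance is measured from the IS to the nearest edge of the annotated gene interval (zero if the IS lies inside the gene); the text does not state whether gene bodies or gene start positions were used. The function names and the tuple layout are hypothetical.

```python
NEIGHBOR_DISTANCE = 500_000  # 500 kb, as stated in the text


def distance_to_interval(pos, start, end):
    """Distance in bp from a position to a closed interval (0 if inside)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def neighboring_genes(is_chrom, is_pos, genes, max_dist=NEIGHBOR_DISTANCE):
    """Return names of genes within max_dist of the IS, nearest first.

    genes: iterable of (name, chrom, start, end) with start <= end.
    """
    hits = []
    for name, chrom, start, end in genes:
        if chrom != is_chrom:
            continue
        d = distance_to_interval(is_pos, start, end)
        if d <= max_dist:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]
```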
2.7. Quantification of CAGE Reads at the HPV ISs
To investigate transcriptional upregulation around the ISs, CAGE-seq reads were mapped to the hg38 reference genome using STAR v.2.7.2b []. From the mapping results (BAM files), the number of reads located within 1 kb and 10 kb windows centered on each IS was counted. For each read, the mapping start position (the POS field, i.e., the 4th field of each alignment record in the BAM file) was compared with the genomic position of the IS. Read counts were obtained separately for the 5′ (upstream) and 3′ (downstream) sides, and further divided into sense and antisense strands, as well as their combined totals. To assess whether an IS was associated with transcriptional upregulation, the CAGE read counts were compared among the 20 cases. The case harboring the IS was judged to show upregulation if it had the highest read count in the cohort and that count was more than twice the 75th percentile of the cohort.
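The counting and decision rule can be sketched as below. Several points are assumptions rather than details from the text: the `flank` parameter (the distance counted on each side of the IS), the use of genomic `+`/`-` strands in place of sense/antisense, the inclusion of the IS-harboring case in the cohort percentile, and the interpolation method used for the 75th percentile. Function names are hypothetical.

```python
import statistics


def count_reads_by_side(read_starts, is_pos, flank):
    """Count read start positions within `flank` bp on each side of the IS.

    read_starts: iterable of (pos, strand), strand being '+' or '-'.
    Returns counts keyed by (side, strand); '5p' means pos < IS, '3p' means pos >= IS.
    """
    counts = {("5p", "+"): 0, ("5p", "-"): 0, ("3p", "+"): 0, ("3p", "-"): 0}
    for pos, strand in read_starts:
        if is_pos - flank <= pos < is_pos:
            counts[("5p", strand)] += 1
        elif is_pos <= pos <= is_pos + flank:
            counts[("3p", strand)] += 1
    return counts


def is_upregulated(case_id, counts_by_case, fold=2.0):
    """True if the case has the highest count in the cohort and that count
    exceeds `fold` times the cohort 75th percentile."""
    values = list(counts_by_case.values())
    # 'inclusive' treats the cohort as the full population (an assumption).
    q75 = statistics.quantiles(values, n=4, method="inclusive")[2]
    own = counts_by_case[case_id]
    return own == max(values) and own > fold * q75
```

Applying `is_upregulated` separately to the 5′ and 3′ totals gives one flag per side for each IS.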
3. Results
3.1. Transcriptome Analysis of HPV-Derived Transcripts Using CAGE and Total RNA-Seq
Twenty HPV-positive cervical cancer specimens were analyzed using CAGE and total RNA-seq. The clinicopathological characteristics and corresponding HPV genotypes are summarized in Table 1. Sequence data were mapped to hg38 and the corresponding HPV references. The mapping rates of the corresponding HPV references are summarized in Table 1 and Figure S1. Of the 20 cervical cancer cases analyzed, five datasets (Cx02_HPV33, Cx05_HPV52, Cx09_HPV16, Cx15_HPV16, and Cx20_HPV45) had mapping rates to the HPV genome below 0.001%. For Cx09 and Cx15—both small tumor samples—macroscopic sampling for RNA extraction (to preserve tissue for pathology) may have introduced sampling error and led to low HPV detection. The absence of HPV-derived reads in Cx20 remains unexplained, though sampling error cannot be ruled out. Cx02 (HPV33) and Cx05 (HPV52) showed low expression of type-specific gene products but strong HPV16 transcript levels, indicating HPV16 as the primary oncogenic driver. Because our aim was to analyze HPV-derived gene expression, we excluded Cx09, Cx15, and Cx20 from downstream analyses. Detailed sequencing metrics, including total reads, uniquely mapped reads, overall mapping rate, and RNA integrity number, are provided in Table S1.

Table 1.
Clinicopathological characteristics and summary of the sequences.
We visualized HPV-derived TSS activities at the single-nucleotide level. As described in our previous paper, HPV-derived CAGE TSS patterns can be classified into two types: the early promoter-dominant type (Type A), which signifies activated expression of oncogenes E6 and E7, and the late promoter-dominant type (Type B), characterized by the expression of late genes such as L1 and L2, often detected in less severe cases of CIN []. Except for one HPV16-positive case (Cx08), all cases were of the early promoter-dominant type (Figure 1A,C). Similarly, HPV-derived transcription patterns identified using total RNA-seq were classified into two types according to the expression pattern of detectable E4 RNA regions: an E6/E7 dominant pattern (Type I) and a pattern with strong expression of E4 RNA regions (Type II). Among HPV16-positive cases, five exhibited the E6/E7 dominant pattern, whereas three showed strong expression of detectable E4 RNA regions (Figure 1B). In contrast, all HPV18-positive cases showed E6/E7 dominant total RNA-seq patterns, and the expression of E4 RNA regions was reduced (Figure 1D).

Figure 1.
Expression of the HPV-derived transcriptome. CAGE sequence data (A) and RNA sequence data (B) were mapped to the HPV16 genome; CAGE sequence data (C) and RNA sequence data (D) were mapped to the HPV18 genome. Mapped data were visualized using IGV. CAGE, cap analysis of gene expression.
3.2. Identification of Chimeric RNA
To investigate HPV DNA integration, we detected viral-human chimeric RNAs using an assembly-based approach []. A total of 93 chimeric RNAs were identified from 13 samples, including five of eight HPV16-positive cases (63%), all seven HPV18-positive cases (100%), and one HPV52-positive case (Table S2). CAGE patterns, total RNA-seq patterns, and the presence of chimeric RNA in HPV16 and HPV18 cases are summarized in Table 2. All five HPV16-positive cases with detectable chimeric RNAs exhibited a dominant E6/E7 expression pattern in RNA-seq, whereas the three HPV16-positive cases without chimeric RNA detection showed strong RNA expression of the E4 region. Six of the seven HPV18-positive cases lacked detectable E4 and E5 RNA regions, suggesting that HPV integration may have occurred upstream of these loci. Conversely, three of the five HPV16-positive cases with chimeric RNAs retained detectable E4 and E5 RNA regions, indicating that integration did not occur upstream of these regions, or that both integrated and episomal HPV genomes coexist. According to recent findings [], most splicing variants derived from either the early or late promoters terminate at the early polyadenylation site (pAE) located around position 4215, and these transcripts typically include the E4 and E5 regions. Therefore, the absence of detectable E4 and E5 RNA regions may suggest a disruption or loss of the pAE site. This is also consistent with our observation that strong expression of the E4 RNA region and the presence of chimeric RNAs were mutually exclusive in HPV16-positive cases, suggesting that transcription termination via the viral pAE site may not be functioning properly in the presence of viral integration.

Table 2.
Summary of the RNA seq and CAGE patterns and chimeric RNAs.
Subsequently, we focused on the structures of 93 chimeric RNAs. Among them, host-derived RNA was located upstream of HPV-derived RNA in 10 chimeric RNAs, whereas the other 83 chimeric RNAs started from the HPV TSS (Table 2 and Table S2). Among 83 chimeric RNAs with an HPV sequence upstream of the human sequence, 72 started from the long control region of the HPV early promoter. A representative case (Cx10) is shown in Figure 2A. In this case, six chimeric RNAs starting from the HPV early promoter were identified, and CAGE peaks corresponding to the HPV early promoter were identified. However, the remaining 11 chimeric RNAs began from other starting sites (one from nt1377 in Cx02, one from nt6033 in Cx19, four from nt5725 in Cx17, and five from nt6613 in Cx18). Although these 11 chimeric RNAs start from four specific sites on the HPV genome, no corresponding CAGE peaks were observed (Figure 2B and Figure S2), suggesting that these chimeric RNAs may lack 5′ cap structures. However, because CAGE may fail to capture a fraction of capped transcripts due to technical limitations such as library preparation efficiency or RNA structural features, the absence of CAGE peaks does not definitively prove that these RNAs were originally uncapped.

Figure 2.
Structures of chimeric RNA. Examples of the chimeric RNA identified using the assembly method are shown. (A) Chimeric RNAs starting from the early promoter, (B) Chimeric RNAs originating from a specific region of the HPV genome other than the early promoter. Arrows shown in orange indicate the inserted HPV genome direction. RNA-seq and CAGE-mapped data were visualized using IGV.
3.3. Transcriptome Upregulation of the Chimeric Sites
Some integration sites have been reported to acquire super-enhancer-like structures []. We subsequently investigated the upregulation of human TSSs at the ISs and the expression levels of their neighboring genes. For this analysis, ISs located within 500 kb of each other were bundled into one IS, and 16 ISs were identified in 13 samples. First, we compared CAGE read counts within 1 kb and 10 kb windows from each integration site. In the 1 kb window, almost no reads were detected (median = 0, IQR = 0–1, Figure S3A), and therefore this window was excluded from further analyses. In contrast, in the 10 kb window, increased CAGE signals were consistently observed. To investigate the upregulation of human TSSs at the ISs, CAGE reads within a 10 kb window centered on each IS were counted for both 5′ and 3′ sides, and compared within the cohort (Figure 3 and Figure S3B). When analyzed separately on the 5′ and 3′ sides, as well as by strand orientation, the increase in CAGE reads was not restricted to a single strand but detected on both strands (Figure S3B). We found that seven of the 16 ISs (IS_01, IS_02, IS_03, IS_04, IS_06, IS_10, and IS_12) exhibited a two-sided pattern of TSS upregulation, with local upregulation of CAGE reads on both sides of the IS (Figure 3). On the other hand, five ISs showed a one-sided pattern of TSS upregulation (IS_05, IS_08, IS_11, IS_13, and IS_14), with increased CAGE read counts observed only on the downstream (3′) side in all cases. The remaining four ISs did not show TSS upregulation associated with HPV integration (IS_07, IS_09, IS_15, and IS_16).
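The classification above, and the percentages quoted for each pattern, can be reproduced with a short sketch. The per-IS flags are transcribed from the preceding paragraph; the function names are hypothetical. Half-up rounding is used explicitly because Python's built-in rounding is half-to-even and would print 31.2 rather than 31.3 for 5 of 16.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def classify_is(up_5p: bool, up_3p: bool) -> str:
    """Classify an IS from its 5′-side and 3′-side upregulation flags."""
    if up_5p and up_3p:
        return "two-sided"
    if up_5p or up_3p:
        return "one-sided"
    return "none"


def percent(n: int, total: int) -> Decimal:
    """Percentage rounded half-up to one decimal place."""
    value = Decimal(n) * 100 / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Flags as (5′ upregulated, 3′ upregulated), taken from the text above.
TWO_SIDED = ["IS_01", "IS_02", "IS_03", "IS_04", "IS_06", "IS_10", "IS_12"]
ONE_SIDED_3P = ["IS_05", "IS_08", "IS_11", "IS_13", "IS_14"]
NOT_UPREGULATED = ["IS_07", "IS_09", "IS_15", "IS_16"]

flags = {name: (True, True) for name in TWO_SIDED}
flags.update({name: (False, True) for name in ONE_SIDED_3P})
flags.update({name: (False, False) for name in NOT_UPREGULATED})

tally = Counter(classify_is(*f) for f in flags.values())
for label in ("two-sided", "one-sided", "none"):
    print(label, tally[label], f"{percent(tally[label], len(flags))}%")
```

The two-sided and one-sided shares come out as 43.8% and 31.3% of the 16 ISs, matching the values reported in the Abstract.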

Figure 3.
Upregulation of human transcription start sites at HPV integration sites. A heatmap summarizing the upregulation of human transcription start sites (TSSs) located within 10 kb regions upstream (5′, green) and downstream (3′, red) of HPV integration sites (ISs). For each direction, the total number of TSSs was calculated, and values were scaled relative to the sample with the highest count in that direction (set as 1). Color intensity corresponds to the relative TSS activation level. Separate color scales are shown for upstream (green) and downstream (red) regions. HPV-ISs bordered by a thick black square indicate the presence of HPV integration in the corresponding cases.
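The per-direction scaling described in the caption amounts to dividing each sample's count by the largest count in that direction. A minimal sketch follows; the function name and the handling of an all-zero direction are assumptions.

```python
def scale_to_max(counts):
    """Scale per-sample counts so that the highest count maps to 1.

    counts: dict mapping sample ID to a count for one IS and one direction.
    If every count is zero, all scaled values are returned as 0.0.
    """
    top = max(counts.values())
    if top == 0:
        return {sample: 0.0 for sample in counts}
    return {sample: value / top for sample, value in counts.items()}
```

Because upstream and downstream counts are scaled independently, color intensities are comparable across samples within one direction but not between the green and red scales.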
Neighboring genes were defined as those located within 500 kb of the detected ISs. If the expression level of the target genes in the sample with integration at the corresponding region was more than twice that of the other samples, the gene was identified as being upregulated upon HPV integration. The expression levels of the neighboring genes at each IS are summarized in Table S3. Among the 12 activated ISs, seven led to the upregulation of at least one neighboring gene (Table S3), including PPARG for IS_02 in Cx02, MAGI1-AS1 for IS_03 in Cx17, C6orf99 for IS_04 in Cx07, CCAT2, POU5F1B, CASC8, and CASC11 for IS_05 in Cx19, BIRC2, MMP10, and MMP12 for IS_08 in Cx11, ABHD17C for IS_11 in Cx14, and HAS3 for IS_13 in Cx05 (Table S3 and Figure 4). All upregulated genes around the ISs were possibly cancer-related. While HPV frequently integrates into transcriptionally active regions, some TSSs—including those of PPARG, MAGI1-AS1, C6orf99, CCAT2, POU5F1B, and CASC8—were upregulated exclusively in samples with integration at that locus, supporting the possibility that integration contributed to transcriptional activation in these cases.
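The gene-level criterion can be sketched as follows, under one reading of the rule: the expression in the sample carrying the integration must exceed twice the highest expression among all other samples. The text does not say whether the comparison is against the maximum, the mean, or each other sample individually, so this choice and the function name are assumptions.

```python
def gene_upregulated(expr_by_sample, integrated_sample, fold=2.0):
    """True if expression in the integrated sample is more than `fold` times
    the highest expression among the remaining samples."""
    own = expr_by_sample[integrated_sample]
    others = [v for s, v in expr_by_sample.items() if s != integrated_sample]
    if not others:
        return False  # nothing to compare against
    return own > fold * max(others)
```

Comparing against the maximum is the strictest interpretation; a gene that passes it would also pass a comparison against the mean or median of the other samples.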

Figure 4.
Activation of cancer-related genes around HPV integration sites. Comparison of gene-expression levels around HPV integration sites (ISs) among samples. Cases with the corresponding HPV-IS are marked in red. The 12 genes with increased expression in the cases where each HPV-IS was identified are highlighted.
Among the 17 tumors analyzed, Cx02, Cx06, and Cx14 each harbored two distinct HPV ISs of the same genotype. The transcriptional activity of integrated HPV varies depending on the integration site, and clones harboring transcriptionally active integration sites are known to expand preferentially []. Furthermore, previous studies have suggested that, in tumors with multiple integration events, only one integration site is typically transcriptionally dominant, while the others may be transcriptionally silent []. Among the three cases with two ISs, two cases (Cx02 and Cx14) exhibited differences in IS activity. In Cx02, which harbored IS_02 and IS_12, only PPARG, located near IS_02, was transcriptionally activated. Similarly, in Cx14, which harbored IS_11 and IS_15, only ABHD17C, located near IS_11, was activated, suggesting that IS_02 and IS_11 are the dominant ISs in Cx02 and Cx14, respectively.
4. Discussion
In this study, we conducted multimodal RNA-seq analysis (CAGE and total RNA-seq) and characterized the transcriptional alterations induced by HPV integration. In particular, we identified two types of TSS upregulation: one in which CAGE read counts increased on both sides of the integration site (two-sided pattern), and another in which increased counts were observed on only one side (one-sided pattern). We also confirmed the enhanced expression of cancer-related genes around ISs.
Consistent with the findings of our previous study, we confirmed that in almost all cancer specimens, HPV-derived TSSs exhibited the early promoter-dominant type. Moreover, the patterns of HPV-derived transcripts analyzed using total RNA-seq were classified based on the RNA expression of the E4 region into two types (Type I and Type II). Integration frequently occurs within the E1/E2 region of the HPV genome [,,], and subsequent loss of downstream viral gene expression often leads to the disappearance of the E4 RNA region. In this study, we showed that high expression of the E4 region and the presence of viral-human chimeric RNAs were mutually exclusive. This inverse relationship between the RNA expression of the E4 region and the presence of viral-human chimeric RNAs may be associated with episomal loss in cells with HPV integration [].
A combination of total RNA-seq and CAGE revealed that although the HPV early promoter was the most activated promoter in almost all cervical cancer samples, regardless of the HPV type, the RNA expression of the E4 region and the presence of chimeric RNA differed between HPV16 and HPV18. In HPV16-positive cases, three of eight cases were negative for chimeric RNA and positive for the RNA expression of the E4 region. Even among HPV16-positive cases with HPV integration, expression of the E4 region was detected at low levels in 60% of the cases (three of five). In contrast, all HPV18-positive cases were positive for chimeric RNAs and almost negative for the RNA expression of the E4 region. A previous study demonstrated that the viral genome integration rate is higher in HPV18-positive cells than in HPV16-positive ones []. Additionally, we had previously demonstrated a low detection rate of the E1^E4 RNA region in HPV18-positive cervical precancerous lesions [], suggesting that HPV18 integration occurs at an early stage of cervical carcinogenesis. Our results reflect the different genomic statuses of HPV16 and HPV18 in cervical cancer cells; specifically, HPV genome integration is observed at a higher frequency in HPV18 than in HPV16.
Some integration sites have been reported to acquire super-enhancer-like structures []. In a previous study, a super-enhancer-like region was identified based on the accumulation of BRD4 and H3K27ac. While it is difficult to define super-enhancers solely by CAGE analysis, several studies have reported that active enhancers can be predicted by their characteristic bidirectional TSS activity [,]. In our study, TSS upregulation on both sides of the IS, potentially reflecting enhancer-like activity, was observed in seven of 16 ISs. In contrast, a one-sided pattern of TSS upregulation was observed in five ISs, whereas no TSS upregulation was observed in the remaining four ISs. These findings suggest that HPV integration does not necessarily form structures resembling active enhancers, and its effects may vary depending on the integration site and case. Notably, a one-sided pattern of TSS upregulation was also associated with the upregulation of nearby genes via HPV integration. These results indicate that although some integration sites do not generate active enhancers, HPV integration can upregulate nearby cancer-related genes. Therefore, focusing on such integration sites may help elucidate tumorigenic mechanisms and identify novel therapeutic targets.
By combining total RNA-seq and CAGE, we identified 12 HPV integration-associated upregulated genes: PPARG, MAGI1-AS1, C6orf99, CCAT2, POU5F1B, CASC8, CASC11, BIRC2, MMP10, MMP12, ABHD17C, and HAS3. Consistent with previous reports that HPV integration perturbs host gene expression within the same topologically associating domain (TAD) [], most activated genes mapped directly to IS or lay immediately adjacent, forming continuous regions of transcriptional upregulation. These patterns support the notion that TAD architecture constrains HPV-mediated transcriptional activation. Three genes, POU5F1B [], MMP12 [], and CASC8 [], have been previously reported as HPV integration targets. The nine novel candidates include PPARG, a nuclear receptor involved in differentiation and metabolism and implicated in cancer progression [,]; MMP10, which promotes tumor invasion via extracellular matrix degradation []; and the long noncoding RNAs CASC11 and CCAT2, both regulators of Wnt/β-catenin and MYC signaling pathways [,,,]. We also identified BIRC2, an apoptosis inhibitor and therapeutic target in various cancers []; MAGI1-AS1, an antisense RNA to the tumor suppressor MAGI1 that may promote tumorigenesis by suppressing MAGI1 function []; C6orf99, a suggested breast cancer biomarker []; ABHD17C, which participates in protein palmitoylation affecting membrane-associated oncogenic signaling []; and HAS3, which synthesizes hyaluronan to shape the tumor microenvironment and enhance cancer cell migration []. These observations reinforce the idea that HPV integration preferentially occurs near genes with key roles in cancer biology, some of which may represent novel therapeutic targets.
This study has some limitations. First, in our study, we did not perform statistical analyses due to the small sample size. However, this is the first study to demonstrate TSS upregulation around HPV ISs by combining total RNA-seq and CAGE to simultaneously identify active HPV ISs, upregulated neighboring genes, and TSS upregulation at the ISs. Further research is warranted to confirm the association between TSS upregulation in the human genome and the upregulation of neighboring genes. Second, this study analyzed tumor tissues that were pathologically diagnosed as cancer, but whether the dissected cancer tissues used for RNA extraction contained precancerous lesions was not verified. The CAGE TSS shift is more important in precancerous lesions because it reflects cellular differentiation and the steps involved in carcinogenesis []. Further research focusing specifically on precancerous lesions is needed to elucidate the alterations in HPV genome status, TSS upregulation, and cervical cancer development. Third, we did not further verify any identified integration sites or mapped CAGE sites by other methods. Fourth, although our study provided detailed structural information on integrated viral transcripts based on total RNA-seq data, we did not perform RACE analysis. Since 3′ or 5′ RACE is necessary to accurately determine the origin of polyadenylated ends and to identify splicing variants [], the absence of such experiments represents a limitation in the structural characterization of the integrated transcripts.
5. Conclusions
By using multimodal RNA-seq analysis combining total RNA-seq and CAGE, we revealed transcriptional upregulation around HPV integration sites and increased expression of adjacent cancer-related genes. This study characterized TSS upregulation at HPV integration sites and showed the genomic and transcriptomic profiles of individual cervical cancer cases. The combination of total RNA-seq and CAGE may help elucidate HPV-induced carcinogenesis in individual patients and contribute to precision medicine for patients with cervical cancer.
Supplementary Materials
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/v17101344/s1, Table S1: The number of sequenced reads and mapping rate of RNA sequencing; Table S2: Mapping information of chimeric RNA; Table S3: List of integration sites and gene expression around integration sites; Figure S1: Mapping rate against HPV genome; Figure S2: Structure of chimeric RNAs originating from HPV genome other than early promoter regions; Figure S3: (A) CAGE read counts around HPV integration sites in 1 kb and 10 kb, (B) CAGE read counts within 10 kb windows from integration sites by 5′/3′ position and sense/antisense orientation.
Author Contributions
Conceptualization, A.T. and K.I.; methodology, S.K.; formal analysis, K.T., S.K. and H.I.; investigation, K.T., A.T., D.Y., A.Q.D., Y.Y., M.M. and K.S.; data curation, K.T., S.K., H.I., M.H. and K.I.; writing—original draft preparation, K.T., S.K. and A.T.; writing—review and editing, K.N., D.Y., A.Q.D., Y.Y., H.I., M.M., K.S., M.H., K.K., K.I., Y.H. and Y.O.; visualization, K.T., S.K. and H.I.; supervision, K.N., K.K., Y.H. and Y.O.; project administration, A.T. and K.I.; funding acquisition, A.T. The work reported in the paper has been performed by the authors, unless clearly specified in the text. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by a grant to A.T. from AMED (grant numbers: 22wm0325014h0003 and 25wm0325057h0003) and JSPS KAKENHI (grant number: 23K08839). This research was also partially supported by Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research (BINDS)) from AMED under Grant Number JP18am0101102.
Institutional Review Board Statement
All experimental procedures were approved by the Institutional Review Board of the University of Tokyo (approval number: G0637, approved on 5 November 2012).
Informed Consent Statement
Formal informed consent was obtained from each patient for the use of clinical samples.
Data Availability Statement
The sequence data used in this study has been deposited at the Japanese Genotype-phenotype Archive (JGA, https://www.ddbj.nig.ac.jp/jga (accessed on 25 July 2025)), which is hosted by the Bioinformation and DDBJ Center, under accession number JGAS000822.
Acknowledgments
We thank Terufumi Yokoyama for HPV typing. We thank all patients who participated in this research and the staff involved in their care, as well as CGA at RIKEN for preparation and sequencing of CAGE libraries.
Conflicts of Interest
Sonoko Kinjo was affiliated with an academic institution at the time the research was conducted and is currently employed by Varinos Inc. The company was not involved in this study, and there are no conflicts of interest to declare. The other authors declare no conflicts of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
BINDS | Basis for Supporting Innovative Drug Discovery and Life Science Research |
CAGE | cap analysis of gene expression |
CIN | cervical intraepithelial neoplasia |
HPV | human papillomavirus |
IGV | Integrative Genomics Viewer |
IS | integration site |
NCBI | National Center for Biotechnology Information |
NGS | next-generation sequencing |
ORF | open reading frame |
PCR | polymerase chain reaction |
RNA-seq | RNA sequencing |
RIN | RNA integrity number |
TAD | topologically associating domain |
TSS | transcription start site |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).