Mutational Signature and Integrative Genomic Analysis of Human Papillomavirus-Associated Penile Squamous Cell Carcinomas from Latin American Patients

Simple Summary DNA sequencing has been crucial to comprehending cancer mutational patterns, leading to the identification of driver genes and altered signaling pathways. Thus, identifying new pathogenic variants and their impact on tumor onset, progression, and treatment response has fueled tumor biology research. Here, we present novel findings addressing the first whole-exome sequencing (WES) of human papillomavirus (HPV)-associated penile squamous cell carcinoma (PSCC) from Latin Americans and its association with pathogenesis. We also compared the molecular profile of the tumors to that of three previous studies from populations with different genetic and socioeconomic backgrounds, the majority of which was HPV-negative. We describe the most altered genes and the main pathogenic variants found in the Latin Americans, ten of which are exclusive to our study sample. The data allowed us to identify molecular pathways and druggable targets with potential treatment value for this still-neglected HPV-associated carcinoma. Abstract High-throughput DNA sequencing has allowed for the identification of genomic alterations and their impact on tumor development, progression, and therapeutic responses. In PSCC, for which the incidence has progressively increased worldwide, there are still limited data on the molecular mechanisms involved in the disease pathogenesis. In this study, we characterized the mutational signature of 30 human papillomavirus (HPV)-associated PSCC cases from Latin Americans, using whole-exome sequencing. Copy number variations (CNVs) were also identified and compared to previous array-generated data. Enrichment analyses were performed to reveal disrupted pathways and to identify alterations mapped to HPV integration sites (HPVis) and miRNA–mRNA hybridization regions. Among the most frequently mutated genes were NOTCH1, TERT, TTN, FAT1, TP53, CDKN2A, RYR2, CASP8, FBXW7, HMCN2, and ITGA8. Of note, 92% of these altered genes were localized at HPVis. We also found mutations in ten novel genes (KMT2C, SMARCA4, PTPRB, AJUBA, CR1, KMT2D, NBEA, FAM135B, GTF2I, and CIC), thus increasing our understanding of the potential HPV-disrupted pathways. Therefore, our study reveals innovative targets with potential therapeutic benefits for HPV-associated PSCCs. The CNV analysis by sequencing (CNV-seq) revealed five cancer-associated genes as the most frequent with gains (NOTCH1, MYC, NUMA1, PLAG1, and RAD21), while 30% of the tumors showed SMARCA4 with loss. Additionally, four cancer-associated genes (CARD11, CSMD3, KDR, and TLX3) carried untranslated regions (UTRs) variants, which may impact gene regulation by affecting the miRNAs hybridization regions. Altogether, these data contribute to the characterization of the mutational spectrum and its impact on cellular signaling pathways in PSCC, thus reinforcing the pivotal role of HPV infection in the molecular pathogenesis of these tumors.


Introduction
Several genes are frequently mutated in various cancers, yet the altered gene spectra vary greatly [1]. The advancement of high-throughput DNA sequencing has been crucial to comprehending cancer mutational patterns, leading to the identification of driver genes and altered signaling pathways. This knowledge enables the association of mutational profiles with specific tumors and their clinical-histopathological parameters. Thus, identifying new pathogenic variants and their impact on tumor onset, progression, and treatment response has fueled tumor biology research [2]. Here, our goal was to characterize the mutational profile of HPV-positive PSCC and its association with pathogenesis.
Penile cancer (PeCa) is one of the most neglected malignancies; it is described as a rare cancer in high-income nations, but it has higher incidences in developing countries. This can be partly attributed to social and economic issues, such as poor genital hygiene, lack of childhood circumcision, phimosis, and chronic inflammation. In addition, human papillomavirus (HPV) infection, often linked to unprotected sex with multiple partners, has been identified as a risk factor for penile carcinogenesis [3]. In some regions of Brazil, PeCa may account for up to 10% of male cancers [4][5][6]. Notably, the PeCa incidence is progressively increasing even in developed countries, along with HPV infections and oropharynx cancer [7][8][9].
Our previous studies corroborate the role of HPV in PeCa, showing a high frequency of cytobands with copy number alterations (CNAs) harboring HPVis, as well as the impact of miRNA/mRNA changes on HPV-related molecular pathways [10][11][12]. However, the molecular characterization of these tumors, especially upon HPV infection, is still limited. Furthermore, patients' treatments disregard HPV status, though infection contributes to PeCa classification [13]. Surgical resection and systemic chemotherapy are the primary treatment approaches. Nevertheless, they result in poor outcomes, low survival, and a high risk of comorbidities and psychosocial issues [3]. Information about the somatic mutation landscape of PeCa is limited to patients from Europe, Asia, and North America [14][15][16], who are mostly HPV-negative. Further investigation is needed to unravel HPV-associated molecular alterations and identify potential targets for more efficient personalized treatments.
In the present work, we report unique data on the mutational signature of HPVassociated PSCC from Latin American patients. We described novel genes and pathogenic variants that may contribute to a better understanding of the potential HPV-disrupted pathways. This study reveals innovative targets with potential therapeutic benefit for this carcinoma.

Materials and Methods
Supplementary Figure S1 shows the flowchart representing the study design.

Study Population
Thirty patients with PSCC underwent chemotherapy-naive surgical excision at the Aldenora Bello Cancer Hospital (São Luís, Maranhão, Brazil). All patients submitted written informed consent, which was approved by the Research Ethics Committee on Humans from the Federal University of Maranhão and by the National Research Ethics Commission (CONEP, Brazil; CAAE: 46371515.5.0000.5087). Clinical and histopathological variables were extracted from the patients' medical records.
At diagnosis, the mean age of the patients was 60.4 ± 16.4 years old, ranging from 31 to 85 years old. Most patients presented a low level of education (83.3% had no studies or incomplete primary education, and only one had an undergraduate degree), 43.3% self-declared themselves to be a current smoker or declared to have been a smoker in the past 5 years, and 36.7% declared that they drink alcohol regularly. Additionally, 62.5% of the patients presented phimosis. All tumors were SCC, presenting an average size of 4.8 ± 2.7 cm. The tumors were localized in the glans, corpus cavernosum, and corpus spongios in 20% of the cases. The most frequent histological subtypes were condylomatous and usual, which were present in 46.7% and 40% of the cases, respectively. Tumor grades G2 and G3 were the most frequent, present in 66.7% and 26.7% of the patients, respectively. Partial and total penectomy were performed in 63% and 30% of the patients, respectively, while lymphatic and perineural invasion were performed in 18.5% and 33.3%, respectively. Table 1 and Supplementary Table S1 present the detailed patients' clinical-histopathological information.

Whole-Exome Sequencing
Genomic DNA was isolated from 30 surgically fresh tumor tissues that were macrodissected by an experienced pathologist and presented at least 80% of tumor cells. Twenty of them were paired with nontumor tissue located roughly two centimeters away from the primary tumor.
DNA from tumor and nontumor tissues was extracted by using DNeasy ® Tissue Kit (QIAGEN Group, Germantown, MD, USA), according to the manufacturer's protocol. DNA library was prepared by using 100 ng input gDNA of each tumor and their matched adjacent nontumor samples. DNA was fragmented by using a Covaris ME220 ultrasonicator and quantified on Qubit (Invitrogen, Waltham, MA, USA) according to the manufacturer's instructions. Library preparation was performed by using the Illumina ® TruSeq ® Exome Kit (Illumina, San Diego, CA, USA) and IDT custom probe enrichment (Integrated DNA Technologies, Coralville, IA, USA), according to the manufacturer's guidelines. The size, quantification, and quality control of the DNA library were evaluated on TapeStation (Agilent Technologies, Santa Clara, CA, USA). The captured DNA was subjected to high-performance sequencing, using Illumina NovaSeq 6000 platform to generate 2 × 150-bp paired-end reads.

Data Processing
To explore the spectrum of somatic variants, three pipelines were developed by VarStation Bioinformatics (https://varsomics.com/varstation/ (last accessed on 10 May 2022)), following the steps described below. The quality control of the raw sequence reads and the presence of adapters were evaluated by FASTQC v0.11.9 (https://qubeshub.org/resources/ fastqc (accessed on 22 March 2021)) and MultiQC v.1.10.1 [17]. Regions with low quality and containing adapters were trimmed, and the resulting sequences that were smaller than 50 base pairs were excluded by using Cutadapt v2.10 [18]. The FASTQ files were aligned to the reference human genome (hg38), using BWA-MEM v.0.7.13 [19]. The aligned BAM files were ordinated according to genomic annotation, PCR duplicates were removed, and the outputs were realigned by using the Indel Realignment tool of the Genome Analysis Toolkit (GATK) [20]. The recalibration was carried out by using the BaseRecalibrator tool from GATK. Moreover, the GATK Crosscheck Fingerprints tool was used to identify possible exchange between the tumor/nontumor sample pairs. The high-performance sequencing produced 73.78 ± 17.4 million and 75.35 ± 18.2 million of paired-reads for the tumors and nontumor samples, respectively, both with 99.68% ± 0.1% of DNA mapping on target. The mean coverage was 294× and 300× for tumor and nontumor tissues, with 93% ± 6% and 94% ± 8% of target DNA covered at least 100×, respectively (Supplementary Figure S2). All sequenced samples presented > 95% of quality score ≥ Q30 (mean equals to 97% ± 0.5% for tumor and 96.7% ± 0.7% for nontumor) (Supplementary Table S2).

Variant Calling
The somatic and germline variant calling was performed by using the best practice of the GATK4 [21]. The HaplotypeCaller [22] was used for germline variants, and the Mutect2 tool [23] was used for somatic variants.
The normal panel was built with the 20 adjacent nontumor samples, which were paired to their respective tumor samples (somatic paired samples) to identify single-nucleotide variants (SNVs) and nucleotide(s) deletion coupled with a nucleotide(s) insertion (Indels). In addition, the 10 tumor samples without corresponding nontumor tissue (Somatic single samples) followed the tumor-only pipeline. For comparison purposes, the somatic paired samples were also analyzed as tumor-only, totaling 30 samples. The comparison of VCFs files obtained from these different analyses was performed by using vcfeval [24], which revealed 82% matching between somatic paired and somatic single samples, while there was no matching between somatic paired and germline samples, thus confirming the usefulness of the pipeline for somatic variants through 20 paired samples.
Following the analysis of the 20 paired tumors, only the altered genes occurring in at least 15% of the cases were further analyzed in the 10 single tumors. In addition, we looked at the variants, especially the ones classified as pathogenic or likely pathogenic according to ClinVar (http://www.ncbi.nlm.nih.gov/clinvar (accessed on 22 March 2021)), following the somatic variant interpretation/reporting by the American College of Medical Genetics and Genomics (ACMG), Association for Molecular Pathology (AMP), and College of American Pathologists (CAP) recommendations [28,29]. All variants were searched in the COSMIC v9, released 28 May 2021 (https://cancer.sanger.ac.uk/cosmic (accessed on 1 June 2021)), and in dbSNP (https://www.ncbi.nlm.nih.gov/snp/ (accessed on 1 June 2021)). Our data were compared to variants described for penile tumors from European [14], Asian [15], and North American [16] populations containing mostly HPV-negative.

Copy Number Variation by Sequencing (CNV-seq)
The 20 paired samples and the 30 single tumor samples (20 paired plus 10 unpaired) were also used for calling up somatic CNVs, using high-throughput sequencing. We followed the Best Practices guide GATK4, using the somatic pipeline for CNV-seq [21]. The normal panel was built with the same samples as for SNV/Indels. Variants were annotated with Funcotator v1.6.20190124 s and AnnotSV v2.5 [30]. We considered cutoffs for calling gain and deletion as CN > 3 and CN < 1, respectively. In addition, the specified regions should have over 10 Kb and should be present in at least 10% of samples [31].

UTR-Perturbing Variants
We used the miRabel platform [32] to identify miRNAs that regulate genes with variants in UTRs. Prediction tools available in miRabel, such as MiRanda, Pita, SVmicro, and TargetScan, were used. Only miRNA-mRNA interactions with a score ≤ 0.05 were considered. The miRNA-mRNA interactions identified in miRabel were then used in the STarMir program [33] to identify the seed regions. The construction of the design and nucleic acid fold (STarMir) is obtained by using the Mfold package [34] and Sfold containing the Srna module [35]. Sfold applies a two-step model for hybridization between mRNA and miRNA. In this model, the hybridization of the miRNA-target occurs at an accessible target site, and then the hybrid elongates to form the complete miRNA-target duplex. The minimum free energy of hybridization was obtained from the RNAhybrid tool [36,37]. Only interactions with LogitProb values ≥ 0.5 were considered.

Integrated Genomic and Molecular Characterization of HPV-Associated PSCC
The thirty sequenced tumor samples were analyzed by qPCR for alterations in gene copy number and/or gene expression, while 18 tumors were evaluated for CNA by aCGH [10]. The data were cross-referenced to produce an overview of the main genes with SNV/Indels, CNV-seq, CNA, and gene expression changes (GECs) in HPV-associated PSCC in the Latin Americans.

Pathway Analysis
The Integrative Onco Genomics (IntOGen) platform [38] was used to identify variants of cancer-related genes in both paired and unpaired tumors, which were confirmed by using COSMIC [39]. The STRING v.11.5 database [40] was used to predict protein-protein interactions (PPI) based on direct (physical) and indirect (functional) associations (score > 0.9). The KEGG pathways (Kyoto Encyclopedia of Genes and Genomes) were used for functional analysis through the enrichment of STRING clusters (FDR < 0.05, with p-values corrected by the Bonferroni method), which generate sets of functionally associated proteins at various levels of hierarchy, ranging from small groups of 5 proteins to large sets of up to 200 proteins [41].

High Frequency of HPV Infection in PSCC
All patients (n = 30) were tested for HPV infection by nested-PCR and direct DNA sequencing, revealing 100% of positivity for the virus (Table 1). Genotyping was possible in all but five samples, which yielded insufficient DNA. High-risk HPV subtypes were detected in 96% of the tumors, of which 29.2% presented multiple infections. The HPV16 was the most frequent (75%).
We also highlight the occurrence of 10 (50%) variants in TERT (telomerase catalytic subunit reverse transcriptase), of which one occurs in exonic region (c.2713G > A; p.Val905Met) and nine (45% of tumors) in the upstream region of the gene. All TERT promoter (TERTp) variants were G > A transitions, with 66.7% of them being located at -146bp (rs1561215364). This variant was also found in 50% of the single tumors. The second most frequent (10% of paired tumors) was at -124bp (rs1242535815).

Novel Genes Altered Exclusively in PSCC from Latin Americans
We compared the top altered genes in Latin Americans (all of them HPV-positive) with those described in three other populations (European, Asian, and North American) (in which most of the tumors are HPV-negative) (Supplementary Table S6). These populations present some commonly altered genes (Figure 3), of which only two, NOTCH1 and CASP8, were found in the four populations. Latin Americans, North Americans, and Europeans share an alteration in CDKN2A and FBXW7, while Latin Americans, North Americans, and Asians have in common changes in TP53, FAT1, and TTN genes. The highest frequency of exclusive altered genes occurred in the Latin American population (88%), followed by North American (63%), European (60%), and Asian (18%). Among the exclusive genes in the Latin American population, we identified 10 genes related to cancer as novel altered genes potentially associated with HPV-associated PSCC: KMT2C, SMARCA4, PTPRB, AJUBA, CR1 (in 20% of the tumors, each), KMT2D, CIC, NBEA, FAM135B, and GTF2I (in 15% of the tumors, each). Table 2 shows the variants detected in cancer-associated genes found exclusively in Latin Americans. Interestingly, all loci are in HPVis, according to HPVbase (http://crdd.osdd.net/servers/hpvbase/index.html (accessed on 17 March 2022)) [42].

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting ≥ 3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure 2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

UTRs Variants and Their Biological Consequences in miRNA-mRNA Interactions
Among all altered genes, 5.5% have variants in their UTRs (Figure 1). To investigate the biological consequences of these variants, we identified miRNAs that may regulate these genes, as well as their seed regions (miRNA × mRNA interacting regions) that may be impacted by the variants. As a result, 14 genes were found to be regulated by miRNAs (miRabel score ≤ 0.05), four of which are cancer-associated genes (CARD11, CSMD3, KDR, and TLX3). The variants are located within the gene hybridization sites (Supplementary Table S7). The CSMD3 gene (regulated by miR-1237-3p, miR-224-5p, and miR-132-3p) presented the highest number of hybridization regions affected by the c.-19T > C variant (sites for miR-1237-5p). In addition, the 'c.-19T > C' variant has not yet been described in the dbSNP and COSMIC databases.     the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting  3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting  3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting  3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting  3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Copy Number Variation in HPV-Associated PSCC
The CNV-seq analyses revealed 558 genes presenting  3 copies, of which 42 are associated with cancers. NOTCH1, PTCH1, TSC1, NUP214, SET, ABL1, TNC, FANCC, PLAG1, MYC, RAD21, and NUMA1 were detected in 10% of the tumors. When comparing the CNV-seq data with our previous data on CNAs detected by aCGH [10], we found 80 genes in common presenting gene copy gain. Among them, five cancer-associated genes were confirmed by both analyses as the most frequent genes with gains: NOTCH1, MYC, NUMA1, PLAG1, and RAD21. Deletions were less frequent in both aCGH and CNV-seq analysis; however, we highlight that 30% of the tumors showed SMARCA4 loss ( Figure  2C). Table 3 presents the main altered cytobands and the most frequent genes with gain, all of them localized at HPVis for subtype 16.

Spectrum of Somatic Variants in HPV-Associated PSCC
Finally, to generate a spectrum of somatic variants in HPV-associated PSCC from Latin Americans, we crossed WES data with genomic alterations identified in our previous study [10], in which tumors were characterized for GEC and CNA. The profile of the most frequently altered genes in both studies is represented in Figure 4.

Molecular Pathways in HPV-Associated PSCC
We predicted the main pathways (highest confidence) by using the most frequently altered genes in the HPV-associated PSCC (Figure 4). Supplementary Table S8 lists the top 20 KEGG pathways identified and ranked by FDR value. The HPV infection pathway (hsa05165) is one of the top five, with PIK3CA, RB1, TP53, EGFR, NOTCH1, TERT, CASP8, PTEN, ITGA8, and AKT2 supporting it. The network of gene interactions supporting the top pathways for HPV-associated PSCC in Latin Americans is visualized in Figure 5.
dbSNP and COSMIC databases.

Spectrum of Somatic Variants in HPV-Associated PSCC
Finally, to generate a spectrum of somatic variants in HPV-associated PSCC from Latin Americans, we crossed WES data with genomic alterations identified in our previous study [10], in which tumors were characterized for GEC and CNA. The profile of the most frequently altered genes in both studies is represented in Figure 4.

Molecular Pathways in HPV-Associated PSCC
We predicted the main pathways (highest confidence) by using the most frequently altered genes in the HPV-associated PSCC (Figure 4). Supplementary Table S8 lists the top 20 KEGG pathways identified and ranked by FDR value. The HPV infection pathway (hsa05165) is one of the top five, with PIK3CA, RB1, TP53, EGFR, NOTCH1, TERT, CASP8, PTEN, ITGA8, and AKT2 supporting it. The network of gene interactions supporting the top pathways for HPV-associated PSCC in Latin Americans is visualized in Figure 5.

Discussion
Our study revealed the first mutational PSCC signature of Latin Americans, for whom we performed a high-throughput WES. Our study population is from the state of Maranhão in Northeast Brazil, where HPV prevalence has been reported to be among the highest in the world [5]. We also compared our findings with previous WES studies of penile tumors, in which mutational landscapes were described in Europeans [14], Asians [15], and North Americans [16], all of which had a similar sample size (30 ± 3.5). This

Discussion
Our study revealed the first mutational PSCC signature of Latin Americans, for whom we performed a high-throughput WES. Our study population is from the state of Maranhão in Northeast Brazil, where HPV prevalence has been reported to be among the highest in the world [5]. We also compared our findings with previous WES studies of penile tumors, in which mutational landscapes were described in Europeans [14], Asians [15], and North Americans [16], all of which had a similar sample size (30 ± 3.5). This comparison is particularly interesting because, in addition to the differences in genetic backgrounds and HPV status (mostly HPV-negative in the previous studies), the clinical and social characteristics of patients from developed countries contrast with those from Brazil.
HPV is the second most prevalent cancer-causing viral agent worldwide and is related to several types of tumors, such as cervical, oropharyngeal, and anogenital, including penile [44]. However, while HPV DNA is identified in over 90% of cervical carcinoma cells [7], there is substantial variation in viral positivity among penile tumors, mainly due to differences in molecular testing. Our study population is a subset from 55 previously investigated PSCC cases [10], with the highest HPV positivity among other PeCa populations [45]. This higher HPV positivity is likely a result of two more sensitive detection/genotyping methods (DNA sequencing in addition to nested-PCR able to detect 40 HPV subtypes), improving the accuracy compared to the use of a single method. As result, in this present study, 100% of the tumors were HPV positive, of which 96% were high risk.
The high-coverage WES enabled us to detect variants in exonic and intronic regions, splicing sites, UTR, and ncRNA, in addition to the major CNV-seq. Additionally, these alterations were integrated with CNAs and gene expression variations detected in 18/30 cases [10,11]. Among the altered genes found in at least 15% of tumors, 18 (NOTCH, FAT1, TP53, CDKN2A, FBXW7, CASP8, SMARCA4, PTPRB, KMT2C, CR1, AJUBA, KMT2D, MTOR, NBEA, LRP1B, FAM135B, GTF2I, and CIC) had already been linked to different types of cancer [38]. Considering the number of variants per gene size, CDKN2A, TP53, CASP8, and FBXW7 were the hotspots. Schwaederle et al. [46] proposed CDKN2A, NOTCH1, TP53, and FBXW7 as "squamousness" genes in head and neck, lung, cutaneous, gastrointestinal, and gynecologic cancers. Interestingly, all of those genes were altered at a high frequency in our cases and were classified as PSCC. Therefore, our data strengthen the importance of translational research that focuses on potential molecular targets that are common to squamous cell cancers, thereby mitigating the difficulties associated with conducting effective clinical trials due to the low prevalence of PeCa worldwide.
CDKN2A (p16) expression is a predictive marker for HPV infection [47], and as such, practically all HPV-associated cancers overexpress p16. HPV-unrelated tumors are often more aggressive and p16-negative, which is usually inactivated by homozygous deletion and promoter hypermethylation [48,49]. In the present study, CDKN2A had the highest frequency (35% of cases) of tumor-shared pathogenic variants, with all being stop-gain (p.R80X, p.R58X, and p.W110X). Previously, the nonsense variant was also reported, but in only two p16-negative penile tumors [16]. Similarly, other authors attributed lossof-function variants to explain the negativity of p16 in 5% of HPV-associated cervical cancer [50]. Hence, considering the high frequency of CDKN2A nonsense variants in highrisk HPV advanced PSCC, we reinforce that p16 immunoexpression may be ineffective as an HPV marker in a significant subset of PeCa. Our results corroborate those of Zito Marino et al. [51], who highlighted the importance of using at least two methods to avoid the misclassification of HPV penile tumors in clinical practice.
By comparing our mutational signature to those reported before [14][15][16], we identified a common list of altered genes. Latin Americans' signature shares gene alterations with North Americans and Europeans (CDKN2A and FBXW7) and with North Americans and Asians (TP53, FAT1, and TTN). These genes have been described as being involved in the establishment and progression of other tumors [52][53][54] and may also be potential drivers of this rare carcinoma. NOTCH1 and CASP8 were found to be frequently altered in all PeCa WES studies (LA = 33% and 17%; NA = 41% and 24%, respectively; EU = 12.5% each; AS = 13.3% each). Feber's group [14] proposed two mutational signatures based on the HPV status: an HPV-associated APOBEC mutation signature and an NpCpG mutation signature in HPV-negative disease, whereas Wang and collaborators [15] identified the Notch, RTK-RAS, and hippo pathways as the most disrupted pathways. Chahoud et al. [16] highlighted a molecular similarity between PSCC and head and neck squamous cell carcinoma (HNSC), specifically involving the Notch pathway, corroborating our findings of NOTCH1 variants in 50% of cases, as well as those of Wang et al. [15].
It is worth noting that whole-genome sequencing was performed on a panel of HPVnegative and epithelial-like PSCC cell lines, in which the exomic variants were highly consistent with their corresponding cancer tissues [55]. Similar to our study, TP53, CDKN2A, and NOTCH1 were among the most commonly altered genes. Furthermore, the authors observed amplification of MYC, PLAG1, and EGFR, revealing MAPK, Jak-STAT, TGF-β, Notch, and apoptosis signaling pathways as major players in the PeCa. Altogether, these results point out that SNV/Indels in these genes may be related to penile carcinogenesis regardless of HPV infection. On the other hand, we have previously suggested that downregulation of TP53 and RB1 mRNAs in PSCC [10] occurs through the upregulation of 13 miRNAs mapped at HPVis [12]. We also reported the impact of HPVis-localized CNAs on miRNA/mRNA interactions, which may alter critical pathways, such as P53 signaling, hippo and TGF-β signaling, proteoglycans in cancer, and viral carcinogenesis [11].
Although studies have described potential targets for PeCa treatment (see [45] for a review), most of them are frequently altered in other cancers; only a few clinical trials are available, with discouraging endpoints. Furthermore, despite the association between prognosis and HPV status [47], treatment is not chosen based on HPV infection. Thus, cisplatin-based chemotherapy remains the most used treatment, although there is no consensus on a standard regimen and, unfortunately, the majority of patients has a poor response to treatment [56].
In terms of targeted therapies, the mitogen-activated protein kinase (MAPK) pathway has received considerable attention; however, most MAPK inhibitors induce chemoresistance [59]. Furthermore, some of these inhibitors are highly selective, such as Vemurafenib, a drug with specificity for BRAF-V600E tumors [31], which were never reported in PeCa. On the other hand, although the SMARCA4 loss is not directly targetable, Januario and collaborators [60] demonstrated that inhibiting histone methyltransferase EZH2 was effective in SMARCA4-mutated cancer cell lines derived from various tumor types. Clinical trials have also demonstrated that Tazemetostat, an oral EZH2-specific inhibitor, has a favorable safety profile and antitumor activity [61][62][63][64]. Studies have also indicated that combining tumor immunotherapy with epigenetic modifiers, particularly EZH2, could enhance treatment efficacy (see Reference [65] for a review). Additionally, given the efficacy of CDK4/6 inhibitors in certain types of cancers harboring SMARCA4 loss [66], we highlight that the deletion and nonsense variants in CDKN2A (p16) were found in 30% and 35% of our tumor samples, respectively. Hence, considering that p16 normally inhibits CDK4 and CDK6 [67], the use of tyrosine kinase inhibitors in tumors with SMARCA4 and CDKN2A deficiency is a rational potential therapeutic option for this set of PSCCs.
We identified noncoding variants in the TERT promoter (TERTp) in 45% of tumors, with G250A (rs1561215364) and G228A (rs1242535815) accounting for 66.7% and 22.2%, respectively. Both G > A transitions were described as the most frequent in a large core of various cancers, 73% of which overexpressed TERT [68]. In contrast to a recent study [69] that linked TERTp alterations to non-HPV-related PSCC, we emphasize that all tumors with these alterations are HPV positive in the current study. TERT is normally silenced in nontumor cells but is reactivated via regulatory mechanisms that maintain the telomere length in human cancer cells, allowing them to survive. Interestingly, GABP (GA-binding protein) was identified as a critical transcription factor specifically recruited for TERT activation via the G > A mutant promoter [70]. Hence, the high frequency of these variants in PSCC provides opportunities for the development of anticancer strategies and applications of TERT as a potential biomarker for the diagnosis and prognosis of neoplasms [71]. Additionally, we identified an exonic p. (Val905Met) variant and a novel upstream (TERT:GG > AA at 1295127-1295128) variant, making TERT and NOTCH1 the most frequently altered genes in HPV-associated PSCC (50% each).
The WES also allowed us to identify CNV-seq in 42 cancer-associated genes with cn > 3, including NOTCH1, MYC, NUMA1, PLAG1, and RAD21 amplified in 10% of tumors. These findings confirm our previous aCGH data, showing that all alterations are mapped at high-risk HPVis. Moreover, we were able to identify genes presenting variants at miRNA binding sites (seed bindings in UTR regions). MiRNAs normally interact with the target gene/mRNA via their seed sequence (5 end) or the untranslated region (3 UTR) of the gene [72]. Base pairing has also been reported in the 5 UTR region and in the coding region [73]. Herein, we found 14 genes with UTR variants, of which four cancerassociated genes harbor variants in 3 UTR (KDR and CARD11) and 5 UTR (CSMD3 and TLX3). These genes have been described as oncogenes, coding for vascular endothelial growth factor receptor 2 [57,74,75], promoting cell survival by activating NF-kb [76,77], activating downstream MAPK/ERK signaling, inhibiting p53 and activating cMyc [78,79], and promoting cell growth in hematological neoplasms by activating LINC00478/miR-125b [80]. The UTR variants can alter the binding site of miRNAs, impairing their regulatory capacity; they may also generate a new binding site for other miRNAs, resulting in an imbalance of cellular homeostasis. A limited number of studies have described variants in UTRs in cancer [81][82][83], and our findings form the basis for future studies into the biological and functional impact of somatic variants in these regions on penile carcinogenesis.
Finally, the spectrum of genomic alterations obtained across an integrative analysis of the copy number and gene expression alterations in the analyzed penile tumors allowed us to predict the main pathways involved in penile carcinogenesis. As previously discussed, although other authors have proposed a link between HPV infection and penile carcinogenesis, the variety of tissue collection methods and techniques used to detect the virus allowed false-negative results. Thus, the association of specific markers or cellular pathways with viral infection still remains controversial. We used two highly sensitive molecular methods for virus detection/genotyping, which allowed us to identify the virus precisely in the tumor samples. Nevertheless, our methods did not detect active infection, which was a limitation of our study, as well as the absence of HPV-negative cases.
The enrichment analysis revealed the HPV infection pathway (hsa05165) among the top five, supported by the interaction network of the genes PIK3CA, RB1, TP53, EGFR, NOTCH1, TERT, CASP8, PTEN, ITGA8, and AKT2. This corroborates with our previous study in which HPV-mediated viral carcinogenesis was also supported for both gene and miRNAs CNAs [11]. Furthermore, most of the altered genes identified by WES, presenting SNV/Indels or CNV-seq, as well as the majority of altered cytobands identified by aCGH, lie within HPVis, reinforcing the pivotal role of HPV infection in genomic alterations through integration of the virus into the host genome.

Conclusions
The data from the first whole-exome sequencing of penile tumors from Latin Americans that are characterized by HPV infection and an advanced disease revealed that most of the alterations are located at HPVis. NOTCH1, TERT, TTN, FAT1, TP53, CDKN2A, RYR2, CASP8, FBXW7, HMCN2, and ITGA8 were the most frequently altered genes with SNV/Indels. In addition, ten novel genes (KMT2C, SMARCA4, PTPRB, AJUBA, CR1, KMT2D, NBEA, FAM135B, GTF2I, and CIC) were found in the Latin American population only, suggesting innovative targets with a potential therapeutic benefit for HPV-associated PSCC. We highlighted potential tyrosine kinase inhibitors since a significant proportion of tumors have SMARCA4 and CDKN2A deficiency. In addition, NOTCH1, MYC, NUMA1, PLAG1, and RAD21 were found to be amplified in 10% of tumors, thus supporting the aCGH findings. We also described, for the first time, UTRs variants (KDR, CARD11, CSMD3, and TLX3), impacting the miRNAs' binding sites in penile tumors. All of these genomic alterations allowed us to predict important molecular pathways and biological markers involved in penile carcinogenesis, thereby improving our knowledge and confirming previous studies. The HPV molecular pathway was highly supported by our findings. Additionally, our data strengthen the importance of translational research that focuses on potential molecular targets common to squamous cell cancers, thereby mitigating the difficulties associated with conducting effective clinical trials due to the low prevalence of penile tumors worldwide. We further highlight the importance of collaborative research involving the integration of data from different populations, whose genetic, socioeconomic, and cultural characteristics, in addition to HPV status, must have a substantial impact on the initiation, development, and treatment response of penile cancer.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/cancers14143514/s1. Figure S1: Flowchart representing the study design. Fresh penile squamous cell carcinomas were collected from 30 Latin American patients. Tumors were screened for human papillomavirus (HPV) infection and were genotyped in most cases (5 cases yielded insufficient DNA). Whole-exome sequencing (WES) was performed in 20 paired tumors and adjacent normal tissues and 10 single tumors. Sequencing results were used for identifying somatic alterations. Furthermore, enrichment for miRNA-mRNA interactive regions (miR-mRNAir) and HPV integration sites (HPVis) was performed. The copy number variants (CNVs) were contrasted to previous CNA results of the same cases based on aCGH, with validation of selected genes by qPCR and RT-qPCR 10 . The mutated genes identified in this study were compared to those described by three previous studies reporting WES data from PeCa patients from Europe 14 , Asia 15 , and North America 16 . Figure S2: Depth and coverage of (A) HPV-associated penile squamous cell carcinoma samples (n = 30) and (B) nontumor adjacent penile tissues (n = 20). Table S1: Clinical-histopathological characterization of the Latin Americans' patients with HPVassociated penile squamous cell carcinomas (n = 30). Table S2: Quality control of the whole-exome sequencing performed in 30 HPV-associated penile squamous cell carcinomas. Table S3: Cancerassociated genes presenting alterations in HPV-associated penile squamous cell carcinomas from Latin Americans. Table S4: Top 84 genes with somatic variants identified by WES in HPV-associated penile squamous cell carcinomas from Latin Americans, and the normalized number of variants based on gene size (bp). Cancer-associated genes are highlighted in bold. Table S5: Variants detected in the most altered genes in Latin Americans with HPV-associated penile squamous cell carcinomas. Table S6: Main altered genes described in three previous studies in which whole-exome sequencing was performed in penile squamous cell carcinomas from Europeans [14], Asians [15], and North Americans [16], compared to Latin Americans (this study). Table S7: Genes with variants in untranslated regions (UTRs), and the miRNAs that binding by seed region. Table S8: The top twenty KEGG pathways identified from a spectrum of major genes with somatic variants in HPV-associated penile squamous cell carcinomas from Latin Americans, ranked by FDR value. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to patient confidentiality.