Genetic Characterization in High-Risk Individuals from a Low-Resource City of Peru

Simple Summary
Genetic testing should be accessible to all individuals regardless of where they live. Resources and health care facilities are unevenly distributed across geographic regions, not only between high-income and low/middle-income countries but also within countries (e.g., rural vs. urban areas). An early age of onset helps to identify patients who are affected by inherited syndromes and carry a pathogenic germline variant associated with cancer predisposition. Most hereditary cancer mutations confer susceptibility to cancers in multiple organs. This study identified seven different hereditary cancer syndromes in a high-risk population in a low-resource city, allowing appropriate genetic counselling and clinical management for these individuals and their relatives.

Abstract
Background: Genetic testing for hereditary cancers is inconsistently applied within the healthcare systems of Latin America. Consequently, the prevalence and spectrum of cancer-predisposing germline variants in Peru are poorly characterized.
Purpose: To determine the spectrum and prevalence of cancer-predisposing germline variants and variants of uncertain significance (VUS) in high-risk individuals from a low-resource city in Peru.
Methods: Individuals meeting clinical criteria for hereditary cancer syndromes, and unaffected individuals with a family history of cancer, were included in the study. Samples from 84 individuals were analysed with a high-throughput DNA sequencing assay targeting a panel of 94 cancer predisposition genes. The pathogenicity of detected germline variants was classified according to the American College of Medical Genetics and Genomics (ACMG) criteria. All pathogenic variants were validated by cycling temperature capillary electrophoresis (CTCE) or, for a variant not suited to CTCE, by allele-specific real-time PCR.
Results: We identified eight different pathogenic variants in 19 of 84 individuals (23%): 24% (10/42) of unaffected individuals with a family history of cancer and 21% (9/42) of individuals with a cancer diagnosis. The pathogenic variants were located in eight genes; the number of carriers per gene or gene combination was RET (3), BRCA1 (3), SBDS (2), SBDS/MLH1 (4), MLH1 (4), TP53 (1), FANCD2 (1), and DDB2/FANCG (1). Among the cancer cases, all colon cancer cases carried pathogenic variants in the MLH1 and SBDS genes, and 20% (2/10) of thyroid cancer cases carried the RET c.1900T>C variant. One patient with endometrial cancer (1/3) was double heterozygous for pathogenic variants in DDB2 and FANCG, and one breast cancer patient (1/14) carried a pathogenic variant in TP53. Each individual carried at least 17 VUS, for a total of 1926 VUS across the study population.
Conclusion: We describe the first genetic characterization of a population in a low-resource setting where genetic testing is not yet implemented. We identified multiple pathogenic germline variants in clinically actionable predisposition genes, which has an impact on genetic counselling and clinical management for carriers and their relatives. We also found a high number of VUS, which may include population-specific variants whose clinical significance remains to be determined.


Introduction
Clinical genetic testing for cancer-risk assessment has become widespread over the last two decades, with evidence-based testing guidelines for hereditary breast cancer (BC), Lynch syndrome (LS), Li-Fraumeni (LF) syndrome, familial adenomatous polyposis, hereditary diffuse gastric cancer, and a few other conditions [1].
Often, an early age of onset helps to identify patients who are affected by inherited syndromes and carry cancer-predisposing germline variants. It is also known that most hereditary cancer mutations confer susceptibility to cancer in multiple organs [1]. Some ethnic groups are at greater risk of developing certain cancers; for example, individuals of Ashkenazi Jewish descent are at increased risk of early-onset breast and ovarian cancer [2,3].
In a rural and low-income population from Northern Peru, we previously identified a greater proportion of cancers with a young age of onset and a distinct profile of the most frequent cancers (e.g., submaxillary gland, stomach, and endometrial cancer) [4]. There is therefore a need to identify patients who are affected by inherited syndromes and carry a cancer-predisposing germline variant. However, genetic testing is not widely available in the Peruvian health-care system, and no study has so far assessed the prevalence and mutational spectrum of germline variants in high-risk individuals from rural and low-income populations.
Targeted multigene next generation sequencing (NGS) panels have made genetic testing considerably more accessible owing to their versatility and low cost [5]. Given the immediate implications for clinical management that a genetic diagnosis may have [5], these data are urgently needed to inform genetic counseling. To that end, a relatively large panel of 94 cancer predisposition genes was sequenced in 84 high-risk Peruvian individuals. Sociodemographic data were also collected from the study population.

Study Population
The study population (n = 101) was selected from three regional hospitals that care for the rural and low-income population of Chimbote, in Northern Peru.
The selection criteria for the 101 individuals were as follows:
• Early onset of cancer (cancer diagnosis before 55 years of age) (n = 51)
The participants completed a survey that contained questions about sociodemographic characteristics, family history, lifestyle habits, and social determinants of health (quality of housing and health insurance coverage).

Next Generation Sequencing (NGS)
Genomic DNA was isolated from peripheral blood samples using the DNeasy Blood & Tissue Kit (Qiagen, Germantown, MD, USA) according to the manufacturer's protocol. Whole-genome pre-enrichment libraries were prepared by the Oslo University Hospital Genomics Core Facility (oslo.genomics.no, accessed on 12 October 2022) using the Illumina Nextera DNA Flex Pre-Enrichment Library Prep kit and captured using the Illumina TruSight Cancer panel, which targets a total of 94 protein-coding genes implicated in cancer risk. One hundred nanograms of genomic DNA was used as input, and libraries were prepared following the manufacturer's instructions. The resulting libraries were sequenced paired-end (2 × 149 bp) on a NextSeq 500 using a Mid Output Kit v2.5 from Illumina (San Diego, CA, USA).

Variant Detection and Interpretation
Small germline variants (single nucleotide variants (SNVs) and short insertions and deletions (indels)) were identified with the Illumina DRAGEN Bio-IT Platform (software version 01.011.565.3.6.3). We ran the DNA analysis pipeline, which includes quality checks of the raw sequencing data, read mapping to the GRCh37 reference genome, and variant calling. The targeted regions of the Illumina TruSight Cancer panel were extended by 250 bp on both ends; otherwise, the pipeline was run with default settings. Several quality filters were applied automatically to the called variants, and variants failing them were flagged accordingly. Variants with low sequencing depth (less than 2) were flagged, as were SNVs with variant quality below 10.41 and indels with variant quality below 7.83. The pipeline also reported multiple quality metrics that allowed us to inspect the quality of the sequencing data; these are provided in Section 3.
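The target padding and call flagging described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DRAGEN's implementation: the `Interval`/`Call` records, the function names, and the flag labels `low_depth`/`low_quality` are hypothetical; only the 250 bp padding and the depth (< 2) and quality (SNV < 10.41, indel < 7.83) thresholds come from the text.

```python
from dataclasses import dataclass, field
from typing import List

PAD_BP = 250                               # padding added to each end of a target
MIN_DEPTH = 2                              # depth below this is flagged
MIN_QUAL = {"SNV": 10.41, "indel": 7.83}   # variant-quality thresholds by type

@dataclass
class Interval:
    chrom: str
    start: int  # 0-based, half-open (BED-style) coordinates
    end: int

@dataclass
class Call:
    chrom: str
    pos: int
    kind: str     # "SNV" or "indel"
    depth: int
    qual: float
    flags: List[str] = field(default_factory=list)

def pad_targets(targets: List[Interval], pad: int = PAD_BP) -> List[Interval]:
    """Extend each target by `pad` bp on both ends (start clamped at 0;
    clamping the end to the chromosome length is omitted here)."""
    return [Interval(t.chrom, max(0, t.start - pad), t.end + pad) for t in targets]

def flag_calls(calls: List[Call]) -> List[Call]:
    """Annotate, but keep, calls that fail the depth or quality filters."""
    for c in calls:
        if c.depth < MIN_DEPTH:
            c.flags.append("low_depth")       # hypothetical flag name
        if c.qual < MIN_QUAL[c.kind]:
            c.flags.append("low_quality")     # hypothetical flag name
    return calls

# Toy example with made-up positions, not study data.
targets = pad_targets([Interval("1", 100, 500), Interval("1", 10_000, 10_300)])
calls = flag_calls([
    Call("1", 1_000, "SNV", depth=1, qual=35.0),
    Call("1", 2_000, "indel", depth=50, qual=5.2),
    Call("1", 3_000, "SNV", depth=80, qual=10.41),
])
for c in calls:
    print(c.pos, c.kind, c.flags or "PASS")
```

Flagging instead of deleting mirrors the text: failing calls stay in the output and can still be inspected during interpretation.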
We used the Cancer Predisposition Sequencing Reporter (CPSR) v0.6.1 for variant interpretation [6]. In brief, CPSR classifies the clinical significance of variants according to a standard five-tier scheme: benign (B), likely benign (LB), variant of uncertain significance (VUS), likely pathogenic (LP), and pathogenic (P). For variants with existing submissions in ClinVar, CPSR assigns the consensus classification reported in ClinVar. For novel variants (i.e., not present in ClinVar), CPSR assigns a classification based on a comprehensive implementation of the American College of Medical Genetics and Genomics (ACMG) guidelines for variant interpretation [7]. Because the DNA samples originated from Peruvian families, we used the admixed American (AMR) sub-population of gnomAD as the reference source of population allele frequencies.
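The decision order described above (an existing ClinVar classification first, ACMG-based classification only for novel variants) can be sketched as follows. CPSR's own scoring is not reproduced here; as a stand-in, the sketch applies the original combining rules of the 2015 ACMG/AMP guideline (Richards et al.) to a set of evidence codes. The function names and the representation of evidence as code strings such as `"PVS1"` are assumptions for illustration. Later refinements of the guideline re-weight some criteria, so real classifiers can disagree with this sketch.

```python
from collections import Counter
from typing import Iterable, Optional

# Evidence-code prefixes mapped to strength levels (ACMG/AMP 2015).
_LEVELS = (("PVS", "pvs"), ("PS", "ps"), ("PM", "pm"), ("PP", "pp"),
           ("BA", "ba"), ("BS", "bs"), ("BP", "bp"))

def acmg_combine(evidence: Iterable[str]) -> str:
    """Combine evidence codes (e.g. 'PVS1', 'PM2', 'BP4') into P/LP/VUS/LB/B
    using the Richards et al. (2015) combining rules."""
    n = Counter()
    for code in set(evidence):
        for prefix, level in _LEVELS:
            if code.startswith(prefix):
                n[level] += 1
                break
    pvs, ps, pm, pp = n["pvs"], n["ps"], n["pm"], n["pp"]
    ba, bs, bp = n["ba"], n["bs"], n["bp"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm in (1, 2))
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = "P" if pathogenic else ("LP" if likely_pathogenic else None)
    benign_call = "B" if benign else ("LB" if likely_benign else None)
    if path_call and benign_call:
        return "VUS"  # contradictory evidence defaults to uncertain
    return path_call or benign_call or "VUS"

def classify(clinvar_consensus: Optional[str], evidence: Iterable[str]) -> str:
    """Prefer an existing ClinVar classification; otherwise combine evidence."""
    if clinvar_consensus is not None:
        return clinvar_consensus
    return acmg_combine(evidence)

print(classify(None, ["PVS1", "PM2"]))       # novel variant, evidence combined
print(classify("Pathogenic", ["BA1"]))       # ClinVar classification wins
```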

Variant Validation by Cycling Temperature Capillary Electrophoresis (CTCE)
The pathogenic variants identified in this study were validated by cycling temperature capillary electrophoresis (CTCE) or by allele-specific real-time PCR. CTCE separates alleles by cooperative melting equilibrium while the temperature is cycled around the melting temperature in a capillary system [8]. Heterozygous samples display two peaks when separated by CTCE. This approach has been described previously and used extensively to detect somatic mutations and single nucleotide polymorphisms (SNPs) [9][10][11]. Amplicons were designed with the variant melting profile tool (https://hyperbrowser.uio.no/hb/?toolid=hb_variant_melting_profiles/, accessed on 12 October 2022) [10]. A variant not suited for CTCE analysis was instead verified by allele-specific real-time PCR. Samples were assayed in two separate reactions, each with primers specific for one of the interrogated alleles. A cycle threshold (Ct) below 25 was used as the cutoff for scoring amplification of a specific allele. A blank and a template lacking the variant were used as controls. Primer sequences, PCR reaction conditions and electrophoresis settings are available upon request.
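The scoring rule for the allele-specific real-time PCR (two reactions per sample, one per allele, with amplification scored when Ct < 25) reduces to a small decision function. This is a hedged sketch: the function names, the use of `None` for "no amplification", and the genotype labels are assumptions; the control check encodes the blank and the variant-negative template mentioned in the text.

```python
from typing import Optional

CT_CUTOFF = 25.0  # amplification is scored when Ct is below this value

def amplified(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """True if the reaction reached threshold before the cutoff cycle.
    `None` stands for no amplification (no Ct reported)."""
    return ct is not None and ct < cutoff

def call_genotype(ct_ref: Optional[float], ct_alt: Optional[float]) -> str:
    """Combine the reference- and variant-specific reactions into a call."""
    ref, alt = amplified(ct_ref), amplified(ct_alt)
    if ref and alt:
        return "heterozygous"
    if alt:
        return "homozygous variant"
    if ref:
        return "homozygous reference"
    return "no call"

def controls_ok(blank_ref: Optional[float], blank_alt: Optional[float],
                wildtype_alt: Optional[float]) -> bool:
    """The blank must not amplify in either reaction, and the variant-negative
    template must not amplify in the variant-specific reaction."""
    return not (amplified(blank_ref) or amplified(blank_alt) or amplified(wildtype_alt))

# Toy Ct values, not study data.
if controls_ok(blank_ref=None, blank_alt=None, wildtype_alt=33.0):
    print(call_genotype(ct_ref=22.1, ct_alt=23.4))
```

Checking the controls before calling any sample keeps a contaminated run from producing spurious heterozygous calls.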

Clinical Characteristics of the Individuals Tested by Gene Panel
Of the 101 individuals, 84 were analyzed by gene panel testing, while 17 were excluded because their DNA was of insufficient quality for NGS. Among the 42 individuals with cancer, the most common cancer sites of origin were breast (33%, 14/42), followed by thyroid (24%, 10/42) and colon (12%, 5/42) (Figure 1). The median age at first cancer diagnosis was 35 years (range 4–69 years). Half of the individuals (42/84) were unaffected, with a family history of cancer. Most participants were female (86%, 72/84); 14% (12/84) were male.

NGS Data Analysis
The gene panel sequencing generated on average 4.4 million reads per sample. Marked duplicates accounted for 14% to 29% of the total reads per sample, and the proportion of reads mapping to the reference genome with a mapping quality (MAPQ) of at least 40 ranged from 93% to almost 97%. The mean depth of coverage of the targeted regions ranged from 169× to 455×, and the percentage of targeted regions with a depth of at least 25× ranged from 89% to 93% across individuals. On average, 1344 variants were called per sample. A more comprehensive summary of the metrics from the primary analysis of the sequencing data is given in Supplementary Table S1.
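The per-sample metrics reported above (duplicate fraction, fraction of reads with MAPQ ≥ 40, mean target depth, and fraction of the target at ≥ 25×) are simple ratios. The sketch below computes them from hypothetical inputs: one `(is_duplicate, mapq)` tuple per read and the per-base depth over the targeted regions. In the study these values came from the pipeline's own reports; the input format, the per-base definition of target coverage, and the function name `qc_summary` are assumptions.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

def qc_summary(reads: Sequence[Tuple[bool, Optional[int]]],
               target_depths: Sequence[int],
               min_mapq: int = 40, min_depth: int = 25) -> dict:
    """Per-sample QC ratios.

    reads: one (is_duplicate, mapq) tuple per read; mapq is None if unmapped.
    target_depths: read depth at each base of the targeted regions.
    """
    n_reads = len(reads)
    n_dup = sum(1 for is_dup, _ in reads if is_dup)
    n_hq = sum(1 for _, mapq in reads if mapq is not None and mapq >= min_mapq)
    return {
        "duplicate_fraction": n_dup / n_reads,
        "mapq_fraction": n_hq / n_reads,
        "mean_depth": mean(target_depths),
        "fraction_bases_at_min_depth":
            sum(d >= min_depth for d in target_depths) / len(target_depths),
    }

# Toy input, not study data.
summary = qc_summary(
    reads=[(False, 60), (True, 60), (False, 10), (False, None)],
    target_depths=[30, 20, 25, 45],
)
print(summary)
```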

Pathogenic Germline Findings
We identified a total of eight pathogenic variants in 19 of 84 individuals (23% of all sequenced samples) belonging to ten different families. Pathogenic variants were identified in 24% (10/42) of unaffected individuals with a family history of cancer and in 21% (9/42) of individuals with a cancer diagnosis.
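The carrier proportions quoted above are ratios rounded to whole percentages. The snippet below recomputes them from the counts given in the text, a quick way to check that each reported percentage matches its numerator and denominator. The helper name `pct` is illustrative; it rounds half up using exact fractions because Python's built-in `round()` rounds exact halves to the nearest even number.

```python
import math
from fractions import Fraction

def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded half up to a whole number, using exact fractions."""
    value = Fraction(100 * numerator, denominator)
    return math.floor(value + Fraction(1, 2))  # floor(x + 1/2) for x >= 0

# Counts as reported in the text: (carriers, individuals tested).
groups = {
    "all sequenced individuals": (19, 84),
    "unaffected, family history of cancer": (10, 42),
    "cancer diagnosis": (9, 42),
}
for name, (carriers, total) in groups.items():
    print(f"{name}: {carriers}/{total} = {pct(carriers, total)}%")
# all sequenced individuals: 19/84 = 23%
# unaffected, family history of cancer: 10/42 = 24%
# cancer diagnosis: 9/42 = 21%

# The two subgroups partition the cohort, so their counts must add up.
sub = [groups["unaffected, family history of cancer"], groups["cancer diagnosis"]]
assert sum(c for c, _ in sub) == 19 and sum(n for _, n in sub) == 84
```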
All pathogenic variants were confirmed by CTCE or allele-specific real-time PCR, showing 100% concordance. Five individuals were double heterozygous for pathogenic variants: MLH1 and SBDS variants were found in three related family members and in one unrelated individual, and DDB2 and FANCG variants were identified in a patient diagnosed with endometrial cancer at 45 years of age (see pedigrees in Figure 3).
The NGS results revealed that each individual carried an average of 23 VUS (range 17-34) in the set of 94 cancer susceptibility genes.
Figure 2. Flowchart of the study population submitted to NGS and results from the study.

Genotype and Phenotype Correlation
The TP53 variant (ENST00000269305.4: c.375G>A) is a silent (synonymous) variant that has been reported as pathogenic because of its spliceogenic position [12]. It was identified in a patient with breast cancer and is associated with the LF syndrome phenotype. A frameshift variant in DDB2 and a missense variant in FANCG, both pathogenic, were identified together in a patient with endometrial cancer; this phenotype contrasts with those reported for these variants (xeroderma pigmentosum and Fanconi anemia, respectively). The RET pathogenic variant (ENST00000355710.3: c.1900T>C) was identified in three related individuals with a family history of thyroid cancer, providing a diagnosis of multiple endocrine neoplasia type 2 (MEN2), which is characterized by the development of medullary thyroid carcinoma [13]. Interestingly, the BRCA1 splice site variant (ENST00000471181.2: c.4357+1G>A) was identified in three related individuals, all of whom had a family history of colon, stomach, thyroid, and pancreatic cancer but no breast or ovarian cancer in the family (Supplementary Figure S1). A pathogenic in-frame deletion in MLH1 (ENST00000231790.2: c.1852_1854del) was identified in 8 individuals, providing an LS diagnosis. Half of these individuals (4/8) also carried the SBDS (ENST00000246868.2: c.258+2T>C) variant, while two individuals without a cancer diagnosis carried only the heterozygous SBDS c.258+2T>C variant. Biallelic pathogenic variants in SBDS have been associated with Shwachman-Diamond syndrome 1 (SDS1), which has a variety of clinical features, including exocrine pancreatic insufficiency and hematological dysfunction [14,15]. The FANCD2 (ENST00000287647.3: c.848dup) pathogenic variant, which has been associated with Fanconi anemia, was identified in an individual with a family history of breast, liver, lung, and colon cancer (Table 2).
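Two of the variant descriptions above can be checked directly from their HGVS c. notation. A coding deletion or duplication shifts the reading frame only if its length is not a multiple of three, and a substitution at intronic offset +1/+2 or -1/-2 hits a canonical splice site. The sketch below applies these two rules to the variants named in this section. It is a simplified, hypothetical parser (`rough_consequence` and its labels are illustrative): it ignores UTR positions, delins and other complex events, and it cannot predict splicing effects of exonic changes, so it labels TP53 c.375G>A only as a coding substitution even though that variant has been reported as spliceogenic [12].

```python
import re

# Simplified HGVS c. grammar (assumption): numeric positions with an optional
# intronic offset, e.g. "4357+1"; UTR ("-"/"*") positions and delins are not handled.
_POS = r"(\d+)([+-]\d+)?"
_SUB = re.compile(rf"^c\.{_POS}([ACGT])>([ACGT])$")
_DELDUP = re.compile(rf"^c\.{_POS}(?:_{_POS})?(del|dup)[ACGT]*$")

def rough_consequence(hgvs_c: str) -> str:
    """Very rough consequence class from an HGVS c. string."""
    m = _SUB.match(hgvs_c)
    if m:
        offset = m.group(2)
        if offset is None:
            # Missense vs synonymous vs nonsense needs the codon context.
            return "coding substitution"
        return "canonical splice site" if abs(int(offset)) <= 2 else "intronic substitution"
    m = _DELDUP.match(hgvs_c)
    if m:
        start, start_off, end, end_off, kind = m.groups()
        if start_off or end_off:
            return f"intronic or boundary {kind}"
        length = 1 if end is None else int(end) - int(start) + 1
        frame = "in-frame" if length % 3 == 0 else "frameshift"
        return f"{frame} {'deletion' if kind == 'del' else 'duplication'}"
    return "not parsed"

for v in ["c.375G>A", "c.1900T>C", "c.4357+1G>A",
          "c.1852_1854del", "c.258+2T>C", "c.848dup"]:
    print(v, "->", rough_consequence(v))
```

A "canonical splice site" or "frameshift" label says nothing by itself about pathogenicity; that assessment comes from the classification described in the methods.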

Discussion
This study has, for the first time, allowed the identification of seven different hereditary cancer syndromes in a high-risk population in a low-resource city where genetic testing is not available. Importantly, we could provide information on which genes may have been causative for cancer in the patients and their relatives. This is likely to have a direct impact on genetic counselling and clinical management for carriers and their relatives, depending on the availability of counselling services.
Social and economic factors are known to have a greater influence on health than clinical care [16,17]. In this study, most individuals (90%) reported having public health insurance, whereas only 6.5% had private oncological insurance. According to the Northern Peru Cancer Registry (IREN Norte), 22,250 cases of neoplasms were recorded between 2007 and 2021, approximately 6% of which came from Chimbote cancer hospitals [18]. Chimbote is the largest city and port of Ancash (a department and region in Northern Peru) and had an estimated population of 371,012 inhabitants in 2015 [19]. Interestingly, half of the cancer cases reported by IREN Norte were diagnosed in people aged 59 years or younger. Patients in Chimbote with a suspected cancer diagnosis need to travel to larger cities such as Lima (the capital, 428 km/266 miles away) or Trujillo (132 km/82 miles away) to obtain an accurate diagnosis and start treatment in larger public hospitals. In this study, patients reported that referral to a cancer hospital took up to 3 years. These social needs contribute to health inequities and higher health care costs [16,17]. Our results indicate an urgent need to implement genetic testing and counselling in public hospitals and centres in low-resource cities, so that cancer patients can receive an early diagnosis and personalized treatment.
A lack of genetic knowledge prevents effective prevention of hereditary cancer syndromes. In this study, we provided genetic information for 19 individuals/families who will benefit from personalized cancer medicine. The rate of pathogenic variants we observed (23%) is within the range reported for other populations [20][21][22]. We also identified a high number of VUS, whose clinical significance still needs to be classified. More genetic data from previously uncharacterized populations are needed to interpret genetic findings and their cancer associations, and thereby to provide proper genetic counselling.
Carriers of RET c.1900T>C (p.Cys634Arg) should follow the recommendations of the American Thyroid Association (ATA) regarding surveillance and management. These include screening, surgery, and therapy, as well as consideration of the reproductive implications for family members [23]. Carriers of pathogenic variants in BRCA1 have a high risk of developing breast cancer (approximately >60%) and ovarian cancer (39–58%), and around 5% develop pancreatic cancer. Management and surveillance should follow the National Comprehensive Cancer Network (NCCN) guidelines. The management guidelines for LS have recently been updated; they are based on gene- and sex-specific risks, resulting in a good prognosis for the most commonly associated cancers [24,25]. In our cohort, four LS carriers also carried the SBDS c.258+2T>C variant, which has not previously been associated with LS but could affect the risk for these patients. The relationship between SBDS and MLH1 variants in LS needs to be understood to facilitate personalized medicine for these carriers. SBDS pathogenic variants have also been associated with the autosomal recessive Shwachman-Diamond syndrome 1, an inherited bone marrow failure syndrome characterized by enhanced cancer predisposition [26]. LF syndrome is generally caused by pathogenic germline variants in TP53, which are identified in ~70% of families meeting the classic LF diagnostic criteria [27], and well-established clinical and surveillance recommendations exist.
Pathogenic variants in FANCG and FANCD2 account for approximately 8% and 4% of Fanconi anemia cases, respectively, and have been shown to act in an autosomal recessive manner [28]. Monoallelic pathogenic variants in FANCD1/BRCA2, FANCS/BRCA1, FANCJ/BRIP1, FANCM, FANCN/PALB2, and FANCO/RAD51C have been linked to familial breast and ovarian cancer [29]. DDB2 (damage-specific DNA-binding protein 2, also known as the p48 subunit) is ubiquitously expressed in human tissues, albeit at different levels [30], and has recently been associated with many cancers, including prostate, colorectal, skin, ovarian, and head and neck cancers, suggesting a critical role for DDB2 in tumor suppression [30]. The involvement of DDB2, FANCG, and FANCD2 in cancer risk needs to be further understood to facilitate personalized medicine for carriers.

Conclusions
Improving the understanding of the genetics of inherited cancer in low- and middle-income countries such as Peru is crucial to identify the significant number of individuals carrying pathogenic variants in clinically actionable genes. The results of this study had a significant impact on patients and their relatives, because they enabled genetic counselling and personalized management decisions.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14225603/s1, Figure S1: Family pedigree in which the proband carried the variant BRCA1 (ENST00000471181.2) c.4357+1G>A; Table S1: Summary of various quality metrics related to the primary analysis of the 84 analysed NGS samples.