High-Throughput Sequencing of Gastric Cancer Patients: Unravelling Genetic Predispositions Towards an Early-Onset Subtype

Background: Gastric cancer is the fourth most common cause of cancer-related death. Currently, it is broadly accepted that the molecular complexity and heterogeneity of gastric cancer, both inter- and intra-tumor, display important barriers for finding specific biomarkers for the early detection and diagnosis of this malignancy. Early-onset gastric cancer is not as prevalent as conventional gastric carcinoma, but it is a preferable model for studying the genetic background, as young patients are less exposed to environmental factors, which influence cancer development. Aim: The main objective of this study was to reveal age-dependent genotypic characteristics of gastric cancer subtypes, as well as conduct mutation profiling for the most frequent alterations in gastric cancer development, using targeted next-generation sequencing technology. Patients and methods: The study group included 53 patients, consisting of 18 patients with conventional gastric cancer and 35 with an early-onset subtype. The DNA of all index cases was used for next-generation sequencing, employing a panel of 94 genes and 284 single nucleotide polymorphisms (SNPs) (TruSight Cancer Panel, Illumina), which is characteristic for common and rare types of cancer. Results: From among the 53 samples processed for sequencing, we were able to identify seven candidate genes (STK11, RET, FANCM, SLX4, WRN, MEN1, and KIT) and nine variants among them: one splice_acceptor, four synonymous, and four missense variants. These were selected for the age-dependent differentiation of gastric cancer subtypes. We found four variants with C-Score ≥ 10, as 10% of the most deleterious substitutions: rs1800862 (RET), rs10138997 (FANCM), rs2230009 (WRN), and rs2959656 (MEN1). We identified 36 different variants, among 24 different genes, which were the most frequent genetic alterations among study subjects. We found 16 different variants among the genes that were present in 100% of the total cohort: SDHB (rs2746462), ALK (rs1670283), XPC (rs2958057), RECQL4 (rs4925828; rs11342077, rs398010167; rs2721190), DDB2 (rs326212), MEN1 (rs540012), AIP (rs4930199), ATM (rs659243), HNF1A (rs1169305), BRCA2 (rs206075; rs169547), ERCC5 (rs9514066; rs9514067), and FANCI (rs7183618). Conclusions: The technology of next-generation sequencing is a useful tool for studying the development and progression of gastric carcinoma in a high-throughput way. Our study revealed that early-onset gastric cancer has a different mutation frequency profile in certain genes compared to conventional subtype.


DNA Extraction
The tissues were collected as previously described by Sitarz et al. (2008) [20]. They were stored in liquid nitrogen. The genomic DNA extraction from formalin-fixed paraffin-embedded (FFPE) tissues was performed using the QIAamp DNA Mini kit (Qiagen, Venlo, the Netherlands) or the Puregene DNA Isolation kit (Gentra, MN, USA), in accordance with the manufacturer's instructions. Isolated DNA was kept at −80 • C. DNA quantity assessment was conducted with the Quantus™ Fluorometer with the QuantiFluor ® dsDNA System, according to the protocol (Promega, Madison, WI, USA). The DNA quality was checked using the Agilent 2200 TapeStation and Genomic DNA ScreenTape (Agilent Technologies, Santa Clara, CA, USA). DNA samples were tested for high, middle, and low integrity. The DNA Integrity Number (DIN) algorithm was used to assess nucleic acid fragmentation. We selected samples with a DIN of at least 5 (where the scale is 1-10) for qPCR assessment. Further evaluation of the usefulness of the samples for sequencing was done with the Infinium HD FFPE QC Assay Protocol (Illumina, San Diego, CA, USA). Real-Time PCR by the CFX96 Touch TM Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA) was performed to check the amplification status in each sample. The average quantification cycle (Cq) values for the quality control template_standard (QCT_ST) were subtracted from the average Cq value for each sample to compute the Delta Cq value for each sample. All samples with a Delta Cq value below 5 were selected for library preparation.

Library Preparation for Next-Generation Sequencing
Libraries for sequencing experiments were prepared using the Trusight Rapid Capture Preparation Kit (Illumina, San Diego, CA, USA), according to the manufacturer's protocol. DNA samples were standardized to 50 ng and underwent a tagmentation step. The libraries generated from input genomic DNA were amplified and adapter-tagged multiple short library fragments of 220-350 base pairs long. The libraries for each sample were combined to obtain proper complexity for a single run. Biotin-labeled probes were used to target regions of interest. Further steps of hybridization were conducted with custom-synthesized oligos that captured genes with the assigned predisposition towards multiple cancer types. The sequencing panel, the TruSight Cancer Panel (Illumina, San Diego, CA, USA), targeted 94 genes and 284 single nucleotide polymorphisms (SNPs) linked to common and rare cancers. The complete list of target genes and SNPs is available on the website of Illumina. Standard pools were enriched with regions of interest through streptavidin-coated beads that bound to biotinylated probes. Then, DNA fragments were eluted from the beads and another round of hybridization and enrichment was performed. Final amplified post-capture library concentrations were assessed with the Quantus™ Fluorometer, according to the manufacturer's protocol. Libraries were quantified by qPCR using the KAPA Library Quantification kit (KAPA Biosystems, Boston, MA, USA). The size of the obtained library fragments was evaluated with the Agilent 2200 TapeStation and High Sensitivity D1000 ScreenTape System (Agilent Technologies, Santa Clara, CA, USA). Post-capture enriched libraries were sequenced on the MiSeq Sequencing Platform (Illumina, San Diego, CA, USA) with the manufacturer's Cancers 2020, 12, 1981 4 of 14 workflow. The concentration of loaded libraries amounted to 10 pM. The sequencing experiment was performed with the MiSeq Reagent Kit v2 (300 cycles).

Data Processing
Results from each sample were mapped to the human reference genome GRCh37, also known as Human Genome version 19 (hg19), using the Burrows-Wheeler Aligner (BWA-mem, version 0.7.5). Readings with a low mapping quality score, unmapped readings, and duplicates were filtered out with Samtools (version 0.1.19) [21]. The local realignment of readings around indels (insertion or deletion) and detection of systematic errors in base quality scores were performed with the Genome Analysis Toolkit (GATK) [22]. Readings mapped outside the target region were removed. Variant calling for germline SNPs and indels was performed with the GATK HaplotypeCaller tool [23]. The callsets of SNPs and small insertions and deletions were separated for further filtering. The hard filters applied to variant callsets were, for SNPs, QD < 2.0, MQ < 40.0, FS > 60.0, HaplotypeScore > 13.0, QRankSum < −12.5, and ReadPosRankSum < −8.0, and for INDELs, QD < 2.0, ReadPosRankSum < −20.0, and FS > 200.0. Filtered variants were concatenated into one record (VCF file) and the discovered variants were then annotated with SnpEff (version 4.2) using GEMINI (GEnome MINIng 0.18.3) and loaded into the SQLite database [24].

In Silico Estimation of the Detected Variants
We attempted to provide an appropriate explanation for variants, which were prioritized as statistically significant for the age-dependent diversification of GC. There are several online annotations that provide variant interpretation, such as CADD and DANN scoring, FATHMM-XF, PROVEAN, and SIFT predictions, which are useful tools for estimating variants with a potentially high risk of GC development. Nevertheless, they also have several limitations. Firstly, factors of annotations differ in various aspects, from constitutions to functions. Secondly, each tool has a unique metric, which is hard to compare with the others. Thirdly, combined annotations might only deliver overlapping importance. In our study, we used the CADD scoring system, which is a framework employed for estimating the relative pathogenicity of human genetic variants, by incorporating multiple, different annotations into a single quantitative score.

Age-Dependent Genotypic and Phenotypic Characteristics of Gastric Cancer Subtypes
In the study cohort of 53 patients with GC, which included 18 cases with CGC and 35 with EOGC, we identified seven candidate genes for discriminating these two age-associated subtypes of GC (Table 1). Among them, we detected nine variants, which passed our selection criteria, including one splice_acceptor, four synonymous, and four missense variants. The frequency of each variant is presented in Table 1, separating EOGC and CGC groups, including the genotype, gene and chromosome location, type of alteration, and p-value (indicating the variant diversification among groups).
Three of the described variants were assigned to patients with the EOGC subtype. The missense variant (rs1799939) was found in the RET gene. The alteration frequency distribution among groups was, respectively, 46.2% of cases heterozygous for this variant in EOGC, 3 NA: not applicable. a * Sequence with deletion: GAGGTAGGCACGTGCTAGGGGGGGCCCTGGGGCGCCCCCTCCC GGGCACTCCCTGAGGGCTGCACGGCACCGCCAC/G. b * Reference sequence: GAGGTAGGCACGTGCTAG GGGGGGCCCTGGGGCGCCCCCTCCCGGGCACTCCCTGAGGGCTGCACGGCACCGCCAC/GAGGTAGGCA CGTGCTAGGGGGGGCCCTGGGGCGCCCCCTCCCGGGCACTCCCTGAGGGCTGCACGGCACCGCCAC.

In Silico Estimation
In our study, we used different scoring and prediction systems for the relative assessment of the variant pathogenicity of a particular variant. The results are shown in Table 2. We defined variants with C-Score ≥10 (top 10% in the ranking for pathogenicity) as 10% of the most deleterious substitutions. This criterion was fulfilled by the following four variants: rs1800862, rs10138997, rs2230009, and rs2959656. In practice, we cannot define pathogenicity as "deleterious" by only employing the C-score, as functional and/or clinical evidence is mandatory to confirm pathogenicity. The gnomAD database displayed variants with different frequencies among populations. The variants with CADD score higher than 10 were distributed respectively: rs1800862 (MAF = 0.04899), rs10138997 (MAF = 0.05851), rs2230009 (MAF = 0.05857), and rs2959656 (MAF = 0.9960).
We defined variants with C-Score ≥10 (top 10% in the ranking for pathogenicity) as 10% of the most deleterious substitutions. This criterion was fulfilled by the following four variants: rs1800862, rs10138997, rs2230009, and rs2959656. In practice, we cannot define pathogenicity as "deleterious" by only employing the C-score, as functional and/or clinical evidence is mandatory to confirm pathogenicity. The gnomAD database displayed variants with different frequencies among populations. The variants with CADD score higher than 10 were distributed respectively: rs1800862 (MAF = 0.04899), rs10138997 (MAF = 0.05851), rs2230009 (MAF = 0.05857), and rs2959656 (MAF = 0.9960).

High-Throughput Mutation Profiling Identifies the Most Frequent Mutations in EOGC and CGC Samples
In our cohort of 53 patients with GC, we identified 36 different variants, among 24 various genes, which are presented in Figure 1. Our selection criterion was the frequency of chosen variants among the analyzed group (both EOGC and CGC). The variants were selected based on the number of homozygous reference variants in analyzed cases. We determined the range of 0-5 numbers of homozygous reference variants in analyzed cases. We found 16 different variants among genes which were present in 100% of the total cohort, as heterozygous or alternative homozygous variants: SDHB (rs2746462), ALK (rs1670283), XPC (rs2958057), RECQL4 (rs4925828; rs11342077, rs398010167; rs2721190), DDB2 (rs326212), MEN1 (rs540012), AIP (rs4930199), ATM (rs659243), HNF1A (rs1169305), BRCA2 (rs206075; rs169547), ERCC5 (rs9514066; rs9514067), and FANCI (rs7183618). Six variants with one or two homozygous reference variants were detected in several genes: ALK (rs4358080; rs2293564), APC (rs459552), NSD1 (rs28580074), EGFR (rs1140475), AIP (rs641081) and XPC (rs2228001), WRN (rs1800389), RET (rs1800861), MEN1 (rs2959656), RHBDF2 (rs3744045), and ALK (rs2246745). Two variants in genes-RET (rs1800858) and PMS2 (rs2228006) were present among the samples, with three homozygous reference variants. Two variants were detected with four references-in genes BRIP1 (rs4986765) and TP53 (rs1042522). Genes PMS2 (rs1805319), FANCE (rs4713867), RET (rs1800860), and CDH1 (rs1801552)-each with a variant showing five references among the analyzed samples.  Among the thirty-six variants identified in our patient cohort, it is expected that sixteen of them (fifteen missense and one frameshift) will result in amino acid substitution (Table 3). Nine variants according to the CADD scoring system, with C-Score ≥ 10, were described as 10% of the most deleterious substitutions and thus most likely affect gene or protein function. Twenty of the detected variants were synonymous substitutions in which the produced amino acid sequence was not modified.  Nine of the mentioned variants with C-Score ≥ 10 were sequences of nine different genes: AIP, BRCA2, ERCC5, APC, XPC, RET, MEN1, RHBDF2, and PMS2. These genes are known to have roles in different processes and syndromes, such as DNA damage responses, DNA repair, multiple endocrine neoplasia type 1 syndrome, the aryl hydrocarbon receptor pathway, the Wnt signaling pathway, and ERK signaling. All of the variants are listed in Table 3.

Discussion
GC is one of the most common cancers and one of the most frequent causes of cancer-related deaths [25]. The prognosis and 5-year survival rate of patients with GC are still very poor. NGS is a useful tool for revealing the mutation profiling, importance of early and late development, and genotypic and phenotypic classification of GC. This technology helps to identify specific biomarkers, tumor suppressor genes, and carcinogens, as well as facilitate the understanding of mechanisms and affected pathways of GC tumorigenesis [26]. Therefore, this might be applied in the future for early diagnosis or personal treatment by either determining the particular GC biomarker or a specific drug-resistant gene [27]. Figure 2 summarizes the conclusions that can be drawn from the obtained results in the context of applying NGS studies as a useful tool for studying GC development in a high-throughput way.

Discussion
GC is one of the most common cancers and one of the most frequent causes of cancer-related deaths [25]. The prognosis and 5-year survival rate of patients with GC are still very poor. NGS is a useful tool for revealing the mutation profiling, importance of early and late development, and genotypic and phenotypic classification of GC. This technology helps to identify specific biomarkers, tumor suppressor genes, and carcinogens, as well as facilitate the understanding of mechanisms and affected pathways of GC tumorigenesis [26]. Therefore, this might be applied in the future for early diagnosis or personal treatment by either determining the particular GC biomarker or a specific drugresistant gene [27]. Figure 2 summarizes the conclusions that can be drawn from the obtained results in the context of applying NGS studies as a useful tool for studying GC development in a highthroughput way. We used the high-throughput sequencing of GC patients to compare and characterize agedependent genotypic and phenotypic characteristics of GC subtypes. Based on the results obtained in this study, we identified potential candidate genes for distinguishing between EOGC and CGC. We were able to identify seven candidate genes, as well as nine variants, among them, which were statistically significantly different in these two subgroups. Variants, including rs1799939 (CADD = 8.26, MAF = 0.1847, p = 0.010), rs2959656 (CADD = 11.00, MAF = 0.9960, p = 0.041), and rs55986963 (CADD = 7.36, MAF = 0.03027, p = 0.052), were predominantly detected in EOGC patients. Interestingly, variant rs2959656 was found in 100% of cases of EOGC as homozygous. According to the NCBI ClinVar database, variant rs1799939 in the RET gene is associated with multiple endocrine We used the high-throughput sequencing of GC patients to compare and characterize age-dependent genotypic and phenotypic characteristics of GC subtypes. Based on the results obtained in this study, we identified potential candidate genes for distinguishing between EOGC and CGC. We were able to identify seven candidate genes, as well as nine variants, among them, which were statistically significantly different in these two subgroups. Variants, including rs1799939 (CADD = 8.26, MAF = 0.1847, p = 0.010), rs2959656 (CADD = 11.00, MAF = 0.9960, p = 0.041), and rs55986963 (CADD = 7.36, MAF = 0.03027, p = 0.052), were predominantly detected in EOGC patients. Interestingly, variant rs2959656 was found in 100% of cases of EOGC as homozygous. According to the NCBI ClinVar database, variant rs1799939 in the RET gene is associated with multiple endocrine neoplasia, hereditary cancer-predisposing syndrome, renal dysplasia, and pheochromocytoma. Activation of the RET proto-oncogene might constitute one of the molecular drivers of gastric inflammatory and neoplastic diseases [3]. Variant rs2959656 was detected in the MEN1 gene. Multiple Endocrine Neoplasia type 1 (MEN1) is a rare hereditary endocrine cancer syndrome, mainly leading to tumors of the parathyroid glands, endocrine gastroenteropancreatic tract, and the anterior pituitary [28]. Other endocrine and non-endocrine neoplasms are also observed, including GC [29]. Variant rs55986963 in the KIT gene, according to the NCBI ClinVar database, is associated with gastrointestinal stromal tumors (GISTs), mastocytosis, and partial albinism. Capelli et al. (2016) [30] also investigated exons 9, 11, 13, and 17 of the KIT gene by direct sequencing and showed that KIT mutations were observed in 53.8% of patients with gastric GISTs. Besides, KIT deletions in exon 11, mostly those involving codons 557, 558, and 559, were primarily associated with the more aggressive gastric GIST phenotype and a higher probability of death or relapse.
Variants that were most frequently detected among CGC included long deletion in the STK11 gene  [31]. The variant rs2230009 (WRN) is associated with Werner syndrome (NCBI ClinVar). Patients with Werner syndrome present with an increased incidence of cancer, indicating that the lack of a proper WRN function affects tumorigenesis [32].
Our study of a cohort of 53 patients with GC allowed us to identify 36 different variants, among 24 various genes, which turned out to be the most probable genetic alterations, causing the development of GC. Importantly, we displayed 16 different variants that were detected in 100% of the total cohort. DNA damage appears to be the underlying cause of cancer [33] and deficiencies in DNA repair genes induce the development of different types of cancer [34]. If DNA repair is defective, DNA damage accumulates, increasing the number of mutations that occur due to error-prone translational synthesis, which consequently leads to the initiation of cancer development. Alterations in DNA double-strand break repair and DNA damage-response genes were detected in our study, such as variants rs2958057 (XPC); rs4925828, rs11342077, rs398010167 and rs2721190 (RECQL4); rs206075 and rs169547 (BRCA2); rs659243 (ATM); rs326212 (DDB2); rs9514066 and rs9514067 (ERCC5). It has been shown in different studies that the response to DNA damage plays an important role in the pathobiology of GCs [35][36][37]. The FA pathway constitutes a part of the DNA-damage network, including breast cancer-susceptibility proteins BRCA1 and BRCA2. The pathway is activated by ataxia telangiectasia and Rad3-related (ATR) kinase; however, the underlying mechanism remains unclear.
A new study has demonstrated that the major switch activating the pathway is the ATR-dependent phosphorylation of FANCI [38]. Variant rs2746462 (SDHB) is related to paraganglioma and gastric stromal cell sarcoma, as well as the hereditary cancer-predisposing syndrome. Miettinen and Lasota (2014) described that approximately half of the patients with GISTs present SDH subunit gene mutations, mostly germline, with both alleles being inactivated in the tumor cells [39]. Variant rs1670283 (ALK) is associated with the hereditary cancer-predisposing syndrome and neuroblastoma 3 (NCBI ClinVar). Variant rs1169305 (HNF1A) is assigned to maturity-onset diabetes of young type 3 (MODY3) and might also result in hepatic adenomas (NCBI ClinVar). Variant rs4930199 (AIP) is associated with the hereditary cancer-predisposing syndrome and familial isolated pituitary adenomas.
Six of the variants with one and two homozygous reference variants were displayed in genes: ALK, APC, NSD1, EGFR, AIP and XPC, WRN, RET, MEN1, RHBDF2, and ALK. Two variants in genes-RET and PMS2-were detected with three homozygous reference variants. For BRIP1 and TP53 genes, two variants were detected with four references. Genes PMS2, FANCE, RET, and CDH1-each with variants showing five references among the analyzed group. Variant rs459552 (APC) is associated with familial adenomatous polyposis 1 and hereditary cancer-predisposing syndrome. The APC mutation is involved in the carcinogenesis of the intestinal type of GC and related to the LOH pathway in GC [40]. Variant rs28580074 (NSD1) is prevalent in Weaver syndrome and Sotos syndrome according to NCBI ClinVar. Variant rs1140475 is present in EGFR. EGFR, HER2, and MET signaling is important, especially in proximal non-diffuse tumors, and constitutes the logical targets for molecular therapy [41]. Variant rs3744045 in the RHBDF2 gene is associated with Howel-Evans syndrome. RHBDF2 seems to regulate oncogenic and non-canonical TGFB1 signaling. GC-associated fibroblasts increase their motility, via the expression of rhomboid 5 homolog 2, and the ability to induce the invasiveness of GC cells [42]. Variant rs2228006 (PMS2) is common for hereditary nonpolyposis colorectal cancer type 4 and mismatch repair cancer syndrome. BRIP1 and variant rs4986765 are related to familial cancer of the breast, neoplasms of the ovary, hereditary cancer-predisposing syndrome, and Fanconi anemia, which is also associated with the variant rs4713867 (FANCE) gene. The TP53 alteration (rs1042522) is related to Li-Fraumeni syndrome and hereditary cancer-predisposing syndrome. Besides, p53 alterations are displayed quite early in the development of GC, even being present in the non-neoplastic mucosa, and their incidence increases with the progression of GC [43]. Even though our study was limited due to a low number of samples, it provides a significant insight into genetic alterations that occur in GC, primarily EOGC and CGC. Further studies with larger groups of patients are needed, since the differences between EOGC and CGC cases may possibly be more significant.

Conclusions
We were able to compare the age-dependent genotypic and phenotypic profile of GC patients, focusing on the most frequent alterations in GC development, both EOGC and CGC, using high-throughput sequencing technology. We found variants among several genes, which might be considered for future studies on the early detection and diagnosis of GC. We displayed the processes and syndromes involved in EOGC and CGC development. Mutation prediction tools enable us to estimate potential candidate biomarkers with a pathogenic impact on disease development. This study primarily placed emphasis on EOGC development, which is less exposed to environmental factors and constitutes a good model for further studies on the genetic background of GC development.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/7/1981/s1: Table S1: A table containing the clinical characteristics of the studied group, Table S2: A table with in

Conflicts of Interest:
The authors declare that they have no conflicts of interest.