Next-Generation Sequencing in Lung Cancer Patients: A Comparative Approach in NSCLC and SCLC Mutational Landscapes

Background: Lung cancer remains one of the most diagnosed malignancies, being the second most diagnosed cancer, while still being the leading cause of cancer-related deaths. Late diagnosis remains a problem, alongside the high mutational burden encountered in lung cancer. Methods: We assessed the genetic profile of cancer genes in lung cancer using The Cancer Genome Atlas (TCGA) datasets for mutations and validated the results in a separate cohort of 32 lung cancer patients using tumor tissue and whole blood samples for next-generation sequencing (NGS) experiments. Another separate cohort of 32 patients was analyzed to validate some of the molecular alterations depicted in the NGS experiment. Results: In the TCGA analysis, we identified the most commonly mutated genes in each lung cancer dataset, with differences among the three histotypes analyzed. NGS analysis revealed TP53, CSF1R, PIK3CA, FLT3, ERBB4, and KDR as being the genes most frequently mutated. We validated the c.1621A>C mutation in KIT. The correlation analysis indicated negative correlation between adenocarcinoma and altered PIK3CA (r = −0.50918; p = 0.0029). TCGA survival analysis indicated that NRAS and IDH2 (LUAD), STK11 and TP53 (LUSC), and T53 (SCLC) alterations are correlated with the survival of patients. Conclusions: The study revealed differences in the mutational landscape of lung cancer histotypes.


Introduction
GLOBOCAN 2020 indicates lung cancer as being the second most diagnosed cancer after breast cancer, while still being the leading cause of cancer-related deaths [1], despite the decline recorded in the last decade regarding lung cancer incidence and mortality rates. Based on the histological aspect, lung cancers can be divided into two main types-non-small-cell lung cancer (NSCLC) and small-cell lung cancer (SCLC)-the first being responsible for more than 80% of lung cancers, while the latter accounts for about 15% of bronchogenic carcinomas. High mortality rates in lung cancer are not type-specific, with the survival rate being directly linked to the stage at diagnosis. More precisely, the 5-year survival rate is about 14% in stage III and 1% in stage IV NSCLC, and 8% in stage III and 2% in stage IV SCLC [2]. Still, the late diagnosis is not the only cause for high mortality in lung cancer. Another cause is the fact that lung cancer treatments appear to be less effective than treatments for other types of cancer (i.e., breast cancer) [3]. Nonetheless, the mutational burden of lung tumors appears to be the most critical aspect that influences response to therapy and, thus, survival rates of lung cancer patients. One important element to consider in lung cancer is the higher mutational burden of the patients with a smoking history compared to the age-related lung cancer patients [4]. Regarding the somatic mutations identified in lung cancer tumors, it is well known that the number of driver mutations in cancer is correlated to the number of somatic mutations. Therefore, an increase in the number of somatic mutations would be succeeded by an increase in the driver mutations number. In this context, the high mutational burden of lung cancer tumors hinders the attempts to identify a single driver mutation that can be druggable for therapeutic purposes. Therefore, most cancer patients display a high mutational burden that appears during a process of accumulation of driver mutations that confer to the tumor cell a selective advantage. This way, the targeted therapies available in clinical practice for oncologists to treat lung cancer are restricted to a few options [2].
However, the generous amounts of data regarding genetic alterations (including mutation detection and copy number alterations) that are now available using NGS technology favor the detection of new molecular biomarkers for early detection and new targetable molecules in precision medicine for a personalized treatment that would improve the lung cancer patient's outcome. The impact of the NGS technologies is remarkable as it offers data support to understand the biological processes underlying oncogenesis and favors personalizing patient care in clinical practice. Nowadays, using NGS, the research community is making substantial progress in detecting new causative mutations in lung cancer tumors, to identify new biomarkers to favor early diagnosis, and these elements guide clinicians in selecting a properly targeted therapy for lung cancer patients based on their molecular profile [5][6][7][8].
This study hypothesized that the molecular alterations in lung cancer patients are associated with the morphological/histological classification of this pathology. We hypothesized that each histological type is characterized by a set of specific mutations/alterations in the cancer genes panel used for the targeted sequencing experiment and that we might be able to discriminate between histological types based on the mutational profile of each histotype. We aimed to characterize the genetic alterations encountered in patients included in the datasets for lung cancer (LUAD, LUSC, and SCLC) in the TCGA database and to unravel specific altered pathways for each histotype. Another objective was to assess the mutational profile of lung cancer patients using a 46-gene panel in a targeted sequencing experiment and to identify commonly mutated genes in patients with the same lung cancer subtype. Another focus of this study was to identify correlations between genetic alterations and clinicopathological characteristics of the patients in the study (Figure 1).

TCGA Datasets and Statistical Analysis
For the TCGA (The Cancer Genome Atlas) data, analysis began with the download and visualization of the available data from the CBioPortal webpage (v 4.0.1). The datasets downloaded included the most comprehensive datasets for the two subtypes of lung cancer (NSCLC and SCLC) in three separate datasets for lung adenocarcinoma (LUAD) (586 tumor samples), lung squamous cell carcinoma (LUSC) (511 tumor samples), and smallcell lung cancer (SCLC) (120 tumor samples).
The statistical analyses were performed using GraphPad Prism v.6.0. The Pearson test was used for determining statistical associations between most altered genes identified as carrying also pathogenic mutations; altered genes and tumor stage; tumor size; histological type; lymph node involvement; metastatic spread; days to event (death).

Patient Cohorts
For this study, the first cohort of patients consisted of a total of 32, of which 16 samples were non-small-cell lung carcinoma (adenocarcinoma and squamous cell carcinoma) and 16 were small-cell lung carcinoma. In this cohort, we used primary lung tumor biopsies and whole blood samples from 25 males and 7 females. The sampling was performed from patients undergoing biopsies for diagnostic (tumor and adjacent normal tissue) for stage III and IV pulmonary cancer. The patients were enrolled in the study between December 2016 and January 2018. The second cohort also included 32 patients-16 NSCLC and 16 SCLC (9 females and 23 males)-diagnosed with pulmonary cancer between February 2018 and May 2019. All patients were provided with detailed information regarding the present study and gave their informed consent. This research was approved by the Ethical Committee of Leon Daniello Pulmonology Hospital, Cluj-Napoca, Romania no. 264/26.06.2018 and the Ethical Committee of Iuliu Hatieganu University of Medicine and Pharmacy no. 438/24.11.2016. The clinicopathological characteristics of these patients are listed in Table 1 and the individual characteristics in Table 2.

TCGA Datasets and Statistical Analysis
For the TCGA (The Cancer Genome Atlas) data, analysis began with the download and visualization of the available data from the CBioPortal webpage (v 4.0.1). The datasets downloaded included the most comprehensive datasets for the two subtypes of lung cancer (NSCLC and SCLC) in three separate datasets for lung adenocarcinoma (LUAD) (586 tumor samples), lung squamous cell carcinoma (LUSC) (511 tumor samples), and small-cell lung cancer (SCLC) (120 tumor samples).
The statistical analyses were performed using GraphPad Prism v.6.0. The Pearson test was used for determining statistical associations between most altered genes identified as carrying also pathogenic mutations; altered genes and tumor stage; tumor size; histological type; lymph node involvement; metastatic spread; days to event (death).

Patient Cohorts
For this study, the first cohort of patients consisted of a total of 32, of which 16 samples were non-small-cell lung carcinoma (adenocarcinoma and squamous cell carcinoma) and 16 were small-cell lung carcinoma. In this cohort, we used primary lung tumor biopsies and whole blood samples from 25 males and 7 females. The sampling was performed from patients undergoing biopsies for diagnostic (tumor and adjacent normal tissue) for stage III and IV pulmonary cancer. The patients were enrolled in the study between December 2016 and January 2018. The second cohort also included 32 patients-16 NSCLC and 16 SCLC (9 females and 23 males)-diagnosed with pulmonary cancer between February 2018 and May 2019. All patients were provided with detailed information regarding the present study and gave their informed consent. This research was approved by the Ethical Committee of Leon Daniello Pulmonology Hospital, Cluj-Napoca, Romania no. 264/26.06.2018 and the Ethical Committee of Iuliu Hatieganu University of Medicine and Pharmacy no. 438/24.11.2016. The clinicopathological characteristics of these patients are listed in Table 1 and the individual characteristics in Table 2.  P8  F  58  T3N2M0  IIIB  adenocarcinoma  yes  yes  116   P12  M  67  T3N2M0  IIIB  SCLC  yes  yes  41   P13  M  66  T3N3M0  IIIB  adenocarcinoma  no  yes  518   P15  M  58  T4N2M0  IIIB  squamous cell carcinoma  no  yes  311   P16  F  76  T4N3M0  IIIC  squamous cell carcinoma  yes  yes  584   P19  F  67  T3N3M1  IV  adenocarcinoma  no  no  871   P20  F  81  T4N2M0  IIIB  adenocarcinoma  no  yes  32   P22  M  58  T2N2M0  IIIB  SCLC  yes  yes  261   P23  M  60  T3N2M0  IIIB  SCLC  no  yes  336   P24  M  60  T4N2M1  IV  SCLC  no  yes  111   P31  M  61  T4N2M1  IV  squamous cell carcinoma  no  yes   P34  M  62  T3N2M1  IV  adenocarcinoma

Sample Processing and DNA Extraction
Lung tissue biopsy samples obtained from the patients described in the previous section were processed following the steps required using the Trizol Reagent (Ambion) according to the manufacturer's instruction for DNA extraction. Therefore, after the suspension of the tissue sample in the Trizol reagent, the sample was snap-frozen in a liquid nitrogen bath in order to proceed with the homogenization step using a polytron. The following steps included the addition of 160 µL of chloroform. The separation of the aqueous phase for the DNA precipitation was performed in the tube using pure ethanol. After the two washing steps with 75% ethanol, DNA is resuspended using 40 µL of nucleasefree water. DNA extraction from 200 µL whole blood matched samples was performed using the PureLink™ Genomic DNA Mini Kit (Thermo Fisher Scientific, Waltham, MA, USA) according to the manufacturer's instructions. The concentration and quality of the gDNA were assessed using a NanoDrop-1000 spectrophotometer.

Sequencing Protocol and Data Analysis
The targeted sequencing protocol was performed using the Ion Ampliseq Cancer Panel. Signal processing, base calling, and alignment to the hg19 reference genome operated on the Torrent Suite 5.6 (Life Technologies). The reference genome was used for the comparison and it represents the normal sequence, as previously described [9][10][11]. For the downstream analyses of variant calling and data trimming alignment, the Ion Reporter 5.6 software was used. The same software was used for annotations; we performed the transfer of the VCF files for each sample using the following parameters: coverage ≥ 100 for tissue samples and ≥50 for blood samples and p-value ≤ 0.05. For the clinical assessment of the mutations (benign/likely benign/neutral/likely pathogenic/pathogenic), the COSMIC (Catalogue of Somatic Mutations in Cancer) (https://cancer.sanger.ac.uk/ cosmic (accessed on 12 November 2021)) and the ClinVar databases were interrogated (https://www.ncbi.nlm.nih.gov/clinvar/ (accessed on 12 November 2021)).

Mutation Validation-SNP Genotyping Assay
Using TaqMan™ SNP Genotyping Assay for c.1621A>C in KIT (rs3822214) and c.215C>G in TP53 (rs1042522) mutations, we experimented validating the two mutations following the next protocol: after the preparation of the 20x assay dilutions, the PCR mix consisted of 2.5 µL of TaqMan MasterMix, 0.25 µL of the assay (rs1042522/rs3822214) and 1 µL of DNA (20 ng/µL) and 1.25 µL of ultrapure H 2 O. The PCR program for the genotyping assay consists of the following incubation steps: 95 • C for 10 min, followed by 40 cycles of 95 • C for 15 s and 60 • C for 1 min. For this experiment, all samples were analyzed in duplicate. All the reactions were performed on a quantitative real-time PCR machine ViiA7 (Thermo Fisher Scientific, Waltham, MA, USA). Additionally, the analysis was performed with the PCR instrument incorporated software.

Genetic Alterations in the TCGA Datasets
The TCGA data analysis began with downloading the mutation and can file for each dataset of lung cancer type: LUAD, LUSC, and SCLC. After selecting the top 30 genes with frequent alterations (with known status of cancer gene), we combined the results obtained for each dataset and the removal of the duplicate retrieved 54 genes that were selected as the most mutated cancer genes in lung cancer TCGA datasets and were used for the analysis of the genetic alterations.
Using the TCGA datasets for lung cancer patients, we also took a look at the genetic alterations identified in each dataset separating the frequencies for men and women.
As illustrated in Figure 3, in LUAD patients, TP53 was altered in 22.46% of women and 20.42% of men, while in LUSC, TP53 was altered in 30% of women and 28.84% of men. The highest frequency of TP53 alterations was encountered in women (84.09%) and men (86.84%) in the SCLC dataset. This is also the case for RB1, which was altered in 81.82% of women and 67.11% of men in SCLC, while in LUAD and LUSC, the frequencies were less than 10% of the men/women. LRP1B was encountered altered in 18.33% of men and 15.94% of women in LUAD, and31.54% of men and 27.69% of women in LUSC; in SCLC, the alteration frequencies in both men and women were about 40%. The clearest differences in terms of alteration frequencies in men and women were observed in the fact that in LUAD, ATRX, KRAS, and EGFR appeared to be altered with higher frequency in women (5.8%, 21.38%, and 12.68%, respectively), than in men (0.42%, 18.33%, and 8.33%, respectively). In LUSC, visible differences between men and women were observed in ATRX (alterations being more frequent in women-8.46% compared to 2.43% in men) and CDKN2A (alterations being more frequent in men-35.31% compared to 28.46% in women). In SCLC, it can be observed that cancer gene alterations appear to be more frequent in women, the most consistent differences being observed in PTRD, CREBBP, GRM3, ATRX, NOTCH3, and PDE4DIP, while ATM appears to be altered in 5.26% of men, and no alterations were depicted in women.

Clinicopathological Characteristics of the Cohort
The main characteristics of the patients included in the study cohort are listed in Table 1. The patients included in the sequencing experiment were between 53 and 81 years, with a mean age of 62 years. Males represented a majority with 25 cases (78.2%), while women represented a minority with 7 cases (21.8%). The cohort of 32 patients used in the mutation validation experiment were between 42 and 81 years, with a mean age of 62 years. Males represented approximately 72% (23 cases), while women represented about 28% (9 cases) of the total. Each cohort included 16 NSCLC samples (8 lung adenocarcinomas and 8 squamous cell carcinomas of the lung) and 16 SCLC samples. For this study, we only selected patients with stage III and IV lung cancer at diagnosis. The pTNM classification is detailed in Table 1. Table 2 details the individual clinicopathological

Clinicopathological Characteristics of the Cohort
The main characteristics of the patients included in the study cohort are listed in Table 1. The patients included in the sequencing experiment were between 53 and 81 years, with a mean age of 62 years. Males represented a majority with 25 cases (78.2%), while women represented a minority with 7 cases (21.8%). The cohort of 32 patients used in the mutation validation experiment were between 42 and 81 years, with a mean age of 62 years. Males represented approximately 72% (23 cases), while women represented about 28% (9 cases) of the total. Each cohort included 16 NSCLC samples (8 lung adenocarcinomas and 8 squamous cell carcinomas of the lung) and 16 SCLC samples. For this study, we only selected patients with stage III and IV lung cancer at diagnosis. The pTNM classification is detailed in Table 1. Table 2

Mutations Identified in the Experimental Cohort
When we sequence a patient's genome and compare it to the reference genome, the reference represents the normal sequence. Therefore, any identified variation could be considered responsible for the malignant phenotype of lung cancer patients. After establishing the parameters for filtering the significant mutations in the targeted sequencing experiment (p > 0.05, coverage > 100× or >50×), we proceeded to the filtering out of the synonymous mutations, as they do not affect the phenotype of the transformed cell. The

Mutations Identified in the Experimental Cohort
When we sequence a patient's genome and compare it to the reference genome, the reference represents the normal sequence. Therefore, any identified variation could be considered responsible for the malignant phenotype of lung cancer patients. After establishing the parameters for filtering the significant mutations in the targeted sequencing experiment (p > 0.05, coverage > 100× or >50×), we proceeded to the filtering out of the synonymous mutations, as they do not affect the phenotype of the transformed cell. The selected mutations were classified as benign/likely benign/neutral/likely pathogenic/pathogenic according to the FATHMM score and according to the ClinVar interpretation. Moreover, according to The International HapMap Consortium, minor allele frequency (MAF) is widely used in population genetics studies because it provides information to differentiate between common and rare variants in the population); a variant is considered rare if the MAF ≤ 0.05, and low if MAFs < 0.1 [12,13]. Table 3 lists all the pathogenic mutations detected after filtering out the 32 tumor biopsy samples and matched whole blood samples. As can be observed in Supplementary File S1, the majority of the pathogenic alterations that were further analyzed in our study have a MAF of 0 and very few a MAF of 0.1, indicating that these alterations might be linked to lung cancer pathogenesis (Supplementary File S1). Figure 5 illustrates all the altered genes in the analysis, alongside the number of different mutations encountered in each gene and the number of patients harboring at least one mutation in the gene for both tumor biopsy and whole blood samples. According to the data generated in the sequencing experiment, the most altered genes in the tumor tissue samples were TP53, CSF1R, PIK3CA, FLT3, and KDR, while in the matched blood samples, KDR was encountered with frequent alterations. As illustrated, CSF1R (72%), ERBB4 (53%), FLT3 (87%), KDR (84%), PIK3CA (59%), and TP53 (90%) appeared to be altered in a higher number of tumor tissue biopsy samples, while frequently altered genes in blood samples were FLT3 (93.7%), KDR (84.3%), CSF1R (75%), TP53 (75%), and ERBB4 (59.3%). Regarding c.*37delA and c.*35insT, alterations in the CSF1R were detected in the majority of the samples analyzed, their categorization is still unknown, but is most likely benign. Pathogenic mutations were detected in the following genes: ABL1, CDKN2A, CTNNB1, ERBB2, IDH2, KIT, NRAS, PIK3CA, PTEN, RB1, RET, SMAD4, STK11, and TP53. A particular case is the c.215C>G (p.Pro72Arg) mutation in TP53. This alteration was found in 65% of the tumor tissue samples and 75% of the matched blood samples from patients in the cohort and is associated with drug response.
As they were detected in a higher number of patients (c.215C>G in TP53 (rs1042522), c.1621A>C in KIT (rs3822214), and c.3196G>A in PIK3CA), these alterations were taken into consideration for subsequent studies, their classification being either pathogenic or being associated with drug response. Moreover, there are many genes with only one detected mutation that appears to be present in only one patient. Given the high mutational burden of the patients in the experimental cohort and the reduced size of the cohort analyzed, further studies are required to establish a common mutational pattern for each histological type of lung cancer.
When assessing the number of different mutations identified and the number of patients with altered genes separately on men and women, it was revealed that in women, the number of mutated genes appeared to be lower than in men. As expected, in the tumor tissue samples, TP53 was the gene with frequent alterations in both men and women, followed by KDR, FLT3, PIK3CA, ERBB4, and CSF1R. One observation is that KIT, STK11, and RB1 were found mutated only in men, and alterations in these genes were encountered in a large number of patients. In the matched blood samples, KDR was encountered with frequent alterations, followed by CSF1R, FLT3, FGFR3, and PIK3CA, in both men and women. Moreover, SMAD4, MET, and KIT were identified as altered only in blood samples from men ( Figure 6). Nevertheless, these observations need further studies for statistical validation regarding the differences in the mutational profiles of men vs. women. When the mutation analysis was performed separately on NSCLC and SCLC samples to compare the similarities/differences, similarities were observed in the high number of alterations depicted in TP53, PIK3CA, and KDR; and in the high number of patients harboring mutations in TP53, PIK3CA, KDR, FLT3, and CSF1R in tumor samples. In the blood samples, KDR had a higher number of alterations and FLT3, KDR, CSF1R, TP53, ERBB4, and PIK3CA were altered in a higher number of patients. When assessing the differences, it was observed that in SCLC patients, RB1 was depicted with a higher number of different mutations and this gene was altered in a higher number of patients (in tumors), while in blood samples, RET, ATM, and ABL1 appeared altered only in the SCLC group.
Moreover, in the SCLC group, the patients appeared to have more different genes mutated than in the NSCLC group (Figure 7). in blood samples were FLT3 (93.7%), KDR (84.3%), CSF1R (75%), TP53 (75%), and ERBB4 (59.3%). Regarding c.*37delA and c.*35insT, alterations in the CSF1R were detected in the majority of the samples analyzed, their categorization is still unknown, but is most likely benign. Pathogenic mutations were detected in the following genes: ABL1, CDKN2A, CTNNB1, ERBB2, IDH2, KIT, NRAS, PIK3CA, PTEN, RB1, RET, SMAD4, STK11, and TP53. A particular case is the c.215C>G (p.Pro72Arg) mutation in TP53. This alteration was found in 65% of the tumor tissue samples and 75% of the matched blood samples from patients in the cohort and is associated with drug response.

Mutation Validation
Two of the variants that were detected in several patients with lung cancer were tested for further validation, as they were classified as pathogenic or associated with drug response according to ClinVar. In this respect, we used the TaqMan SNP Genotyping assay for the following gene coding variants: for c.1621A>C in KIT (rs3822214) and c.215C>G in TP53 (rs1042522). This experiment properly validated the c.1621A>C in KIT (rs3822214) assay, as the proportion of patients (6 out of 31) with a positive identification was similar to the proportion in the sequencing experiment. The poor DNA quality did not allow the identification of a result for one sample. The assay for TP53 (rs1042522) was not properly validated, as the proportion of patients (3 out of 27) with a positive identification (11%) was different from the proportion in the sequencing experiment (21 out of 32) (65%). Poor DNA quality in five samples hampered the possibility of identifying a result.

Statistical Analysis
Overall patient survival was assessed for the three TCGA datasets (LUAD, LUSC, SCLC) for the 13 genes identified as harboring pathogenic mutations in the sequencing experiment. The survival analysis revealed that in the TCGA LUAD datasets, the presence of NRAS, IDH2, and CDKN2A alterations is correlated with the survival of patients, the latter gene reaching a borderline statistical significance. For LUSC datasets, TP53, STK11, and CDKN2A alterations appear to be correlated with the survival of patients, the latter gene reaching a borderline statistical significance. In the SCLC dataset, only STK11 appeared to reach statistical significance in the survival analysis (Table 4). These results

Mutation Validation
Two of the variants that were detected in several patients with lung cancer were tested for further validation, as they were classified as pathogenic or associated with drug response according to ClinVar. In this respect, we used the TaqMan SNP Genotyping assay for the following gene coding variants: for c.1621A>C in KIT (rs3822214) and c.215C>G in TP53 (rs1042522). This experiment properly validated the c.1621A>C in KIT (rs3822214) assay, as the proportion of patients (6 out of 31) with a positive identification was similar to the proportion in the sequencing experiment. The poor DNA quality did not allow the identification of a result for one sample. The assay for TP53 (rs1042522) was not properly validated, as the proportion of patients (3 out of 27) with a positive identification (11%) was different from the proportion in the sequencing experiment (21 out of 32) (65%). Poor DNA quality in five samples hampered the possibility of identifying a result.

Statistical Analysis
Overall patient survival was assessed for the three TCGA datasets (LUAD, LUSC, SCLC) for the 13 genes identified as harboring pathogenic mutations in the sequencing experiment. The survival analysis revealed that in the TCGA LUAD datasets, the presence of NRAS, IDH2, and CDKN2A alterations is correlated with the survival of patients, the latter gene reaching a borderline statistical significance. For LUSC datasets, TP53, STK11, and CDKN2A alterations appear to be correlated with the survival of patients, the latter gene reaching a borderline statistical significance. In the SCLC dataset, only STK11 appeared to reach statistical significance in the survival analysis (Table 4). These results suggest that overall survival of patients with the above mentioned altered genes for each dataset is reduced compared to the unaltered group. Pearson correlation analysis is represented in Table 5. For this analysis, we selected the genes altered in a higher number of patients with at least one pathogenic mutation to assess the correlation with the clinicopathological information of the patients (tumor stage, tumor size, histological type, lymph node involvement, metastatic spread, days to event (death)). For the correlation analysis, we selected the genes based on the pathogenicity score of the alterations obtained by interrogating the two mentioned mutation databases (ClinVar and COSMIC). This parameter was chosen given its essential role in evaluating lung cancer biology. Survival data (days to event) were available for 26 out of 32 patients from the training cohort and for 30 out of 32 from the validation cohort. Significant values for different correlations are bolded in Table 5. As expected, the presence of metastases was correlated with stage (r = 0.96862; p = 0.000). A strong negative correlation was identified between adenocarcinoma and altered PIK3CA (r = −0.50918; p = 0.0029). Regarding the smoking status, the analysis revealed a moderate positive correlation between former smokers and altered PIK3CA and (r = 0.364698; p = 0.0401), and a moderate negative correlation between never smokers and altered PIK3CA (r = −0.3647; p = 0.0401); a moderate negative correlation between former smokers and adenocarcinoma (r = −0.30949; p = 0.013); and a moderate positive correlation between never smokers and adenocarcinoma (r = 0.309492; p = 0.013) ( Table 5).

Discussion
In this study, we aimed to identify the genetic alterations in cancer genes using three different lung cancer datasets (LUAD, LUSC, and SCLC), to identify common mutational patterns specific for each histotype, to address the differences between men and women, and to compare the results in the TCGA datasets with the genetic alterations observed in an NGS experiment performed on a cohort of 32 lung cancer patients. Another focus of this paper was to assess the impact of pathological alterations on the clinical outcome of the patients in our cohorts and whether these alterations can be correlated with different clinicopathological characteristics, such as histological type, stage, lymph node involvement, metastasis, smoking status, etc.
In lung cancer studies, implementing NGS approaches using brain metastases samples from patients with metastatic NSCLC reported TP53 as being the most frequently altered gene in the samples, followed by KRAS, FGFR3, CDKN2A, and VHL. The authors also identified various targetable mutations that may be used for targeted approaches to improve patients' outcomes. Still, the high mutational burden reported in this study questions the ease of applying targeted therapies with high efficiency in metastatic NSCLC [14].
When we assessed the genetic alterations identified in each lung cancer TCGA dataset separating the cancer genes' alteration frequencies observed in men and women, we identified several genes with visible differences in terms of genetic alterations between men and women. One notable difference was observed in RB1, which was altered in 81.82% of women and 67.11% of men in SCLC, while in LUAD and LUSC, the frequencies were less than 10% of the men/women. We also identified that ATRX, KRAS, and EGFR appeared to be altered with higher frequency in women (5.8%, 21.38%, and 12.68%, respectively), than in men (0.42%, 18.33%, and 8.33%, respectively) in the LUAD datasets. In LUSC, visible differences between men and women were observed in ATRX (women-8.46%, compared to men-2.43%) and CDKN2A (men-35.31%, compared to women-28.46%). We also observed that in the SCLC dataset, cancer gene alterations appeared to be more frequent in women, with one exception-ATM, which appears to be altered only in men (Figure 3).
In our experiments, TP53 appeared as the most frequently altered gene in the tissue samples analyzed (more than 90%), while in blood samples, FLT3 was the most frequently altered gene (more than 90%). FLT3 harbors germline mutations that appear to be present in the majority of lung cancer patients analyzed in this study. According to the lung cancer literature, SCLC specimens encounter the highest frequency of TP53 alterations [15,16]. In the samples analyzed in our study, SCLC samples were identified as harboring TP53 alterations (more than 93% of the samples) with a frequency that is slightly higher than in the NSCLC specimens analyzed (87% of the samples) in the tumor tissue, but this tendency was not observed in the blood samples, TP53 being altered in 68% of SCLC and 81% of NSCLC samples (Figure 7). The majority of these mutations were catalogued as pathogenic, while one mutation was associated with drug response (c.215C>G) ( Table 3). This variation in TP53 was encountered in more than 65% of the tumor tissue samples and was selected for validation using the SNP genotyping assay in a second cohort of matched lung cancer patients, but was not properly validated in the second cohort, as it was detected in only 11% of the analyzed samples. Frequent abnormalities of TP53 in lung cancer patients are associated with worse overall survival and resistance to therapy [17][18][19].
Other frequently mutated genes in the tumor tissue samples used in our lung cancer study were FLT3 (altered in 87% of the samples), KDR (84%), CSF1R (72%), PIK3CA (59%), and ERBB4 (53%). Similar proportions were also observed in the mutational profile of the blood samples: FLT3 (altered in 93% of the samples), KDR (84%), CSF1R (75%), and ERBB4 (59%). The variations identified in CSF1R were not classified in the COSMIC or ClinVar databases, but as identified in most of the samples analyzed (both tissue and blood), are considered germline mutations and appear to be most likely benign. FLT3 alterations were studied mostly in acute myeloid leukemia, and are particularly involved in cell differentiation, survival, and proliferation, in a mechanism involving PI3K, RAS, and STAT5 [20]. Moreover, FLT3 is a druggable target, as first-and next-generation inhibitors were already developed for precise treatments and in order to overcome acquired resistance [21][22][23]. In our study, two of the FLT3 variations detected are known to be neutral, and one has no classification yet. KDR is a tyrosine kinase gene that may function as a proto-oncogene, as mutations in KDR appear to modulate angiogenesis, being involved in cell proliferation, migration, and survival. While most of the identified mutations appear to be neutral, the others are not classified yet; in a previous study involving c.1416A>T (p.Gln472His) variation with a frequency of 33% in patients, it was suggested as functioning as a proto-oncogene in LUAD [24]. In our cohort, this mutation appeared as germline, having a frequency of about 47% of the patients (in both tissue and blood samples), yet the FATHMM score indicates it as neutral, while it was not classified yet in the ClinVar mutation databases. PIK3CA was identified in this cohort as harboring seven different mutations, most of them being labeled as pathogenic. In total, at least one PIK3CA mutation was identified in 19 tumor samples from patients (59% frequency), and 7 tumor samples from patients (20%) had at least one pathogenic mutation in PIK3CA. Other studies reported PIK3CA as harboring pathogenic mutations in 4% of the cohort of 186 NSCLC patients [25]. ERBB4 appeared to be altered in 17 tumor samples (53%) and 8 blood samples (25%), but this gene harbored one mutation classified as neutral and another one with no classification ( Figure 5). Our study revealed that KIT, STK11, and RB1 were found mutated only in men's tumor samples, while SMAD4, MET, and KIT were identified as altered only in blood samples from men; alterations in these genes were encountered in a large number of patients, indicating there are some differences in terms of mutations in the group of women vs. men ( Figure 6). When comparing the mutations identified in the NSCLC and SCLC groups, we observed that RB1 was depicted with a higher number of different mutations in tumor samples and this gene was altered in a higher number of patients in the SCLC group ( Figure 7); this result was consistent with the TCGA data analysis (Figure 2).
Other genes harboring at least one pathogenic mutation were ABL1, CDKN2A, CTNNB1, ERBB2, IDH1, KIT, NRAS, PTEN, RB1, RET, SMAD4, and STK11 ( Figure 8). All these genes harbor one pathogenic mutation identified in only one patient, with few exceptions-SMAD4 and NRAS have two pathogenic mutations, and each mutation is encountered in only one patient; KIT was identified with one pathogenic mutation encountered in four different patients. The correlation analysis indicated a strong negative correlation between adenocarcinoma and altered PIK3CA, while a moderate correlation was noticed between time (days) to event (death) and former smokers (positive correlation) and never smokers (negative correlation). As indicated, this experiment revealed multiple genes with pathogenic alterations in lung cancer (adenocarcinoma, squamous cell carcinoma, and small-cell carcinoma), suggesting a high mutational burden for this pathology. Moreover, exposure to different carcinogens (tobacco smoke, asbestos or radon exposure, radioactive ores, etc.) might cause differences in the mutational profile of these patients. Another aspect that should be taken into consideration is that some differences may be endemic for a specific population. Besides the high mutational burden encountered in different lung cancer histotypes (that is reflected also in the reduced efficiency of targeted therapies in lung cancer), other limitations include the morphological and genetic heterogeneity previously reported by other studies [26][27][28]. noticed between time (days) to event (death) and former smokers (positive correlation) and never smokers (negative correlation). As indicated, this experiment revealed multiple genes with pathogenic alterations in lung cancer (adenocarcinoma, squamous cell carcinoma, and small-cell carcinoma), suggesting a high mutational burden for this pathology. Moreover, exposure to different carcinogens (tobacco smoke, asbestos or radon exposure, radioactive ores, etc.) might cause differences in the mutational profile of these patients. Another aspect that should be taken into consideration is that some differences may be endemic for a specific population. Besides the high mutational burden encountered in different lung cancer histotypes (that is reflected also in the reduced efficiency of targeted therapies in lung cancer), other limitations include the morphological and genetic heterogeneity previously reported by other studies [26][27][28].

Conclusions
Our study revealed TP53, CSF1R, PIK3CA, FLT3, and KDR as genes with most variations detected in lung cancer specimens, while CSF1R, ERBB4, FLT3, KDR, PIK3CA, and TP53 appeared to be mutated in a higher number of patients. We also managed to underline differences in the mutational profile of patients in NSCLC vs. SCLC and men vs. women. These results partially overlapped with the mutational landscape identified analyzing lung cancer TCGA datasets. Furthermore, in our cohort, the presence of PIK3CA alterations was correlated with adenocarcinoma. Moreover, the mutation validation properly validated only one out of two variants selected. All these results indicate that in lung cancer tumor samples, the mutational burden is high, and even if differences in the mutational landscape of patients with LUAD, LUSC, and SCLC subtypes of lung cancer were observed, a specific mutational pattern was not established for each histological type. This situation is characteristic of malignancies with a high mutation burden as a result of the accumulation of numerous driver mutations and this limits the targeted

Conclusions
Our study revealed TP53, CSF1R, PIK3CA, FLT3, and KDR as genes with most variations detected in lung cancer specimens, while CSF1R, ERBB4, FLT3, KDR, PIK3CA, and TP53 appeared to be mutated in a higher number of patients. We also managed to underline differences in the mutational profile of patients in NSCLC vs. SCLC and men vs. women. These results partially overlapped with the mutational landscape identified analyzing lung cancer TCGA datasets. Furthermore, in our cohort, the presence of PIK3CA alterations was correlated with adenocarcinoma. Moreover, the mutation validation properly validated only one out of two variants selected. All these results indicate that in lung cancer tumor samples, the mutational burden is high, and even if differences in the mutational landscape of patients with LUAD, LUSC, and SCLC subtypes of lung cancer were observed, a specific mutational pattern was not established for each histological type. This situation is characteristic of malignancies with a high mutation burden as a result of the accumulation of numerous driver mutations and this limits the targeted approach in clinical practice. All these results and observations should be further investigated in larger cohorts to establish accurate mutational patterns specific for each lung cancer histotype.