Identification of Genetic Variants Associated with Sex-Specific Lung-Cancer Risk

Simple Summary: The incidence of lung cancer differs between men and women, suggesting the potential role of sex-specific influences in susceptibility to this cancer. While behavioural differences, such as smoking rates, may account for much of the risk, another possibility is that X chromosome susceptibility genes may have an effect. Therefore, in this study, we tested specifically for the influence of X chromosome single-nucleotide polymorphisms (SNPs) in male lung cancer cases, and found 24 that were significantly associated with male, but not female, lung cancer cases. Examining these in detail, we observed these SNPs resided in blocks near the annotated genes DMD, PTCHD1-AS, and AL008633.1. We also observed that DMD was differentially expressed in lung cancer subtypes curated in the Cancer Genome Atlas database. Examining this gene further, we found that expression and mutation of DMD may have effects on immune function. This work defines potential targets for sex-specific lung cancer prevention.

Abstract: Background: The incidence of lung cancer differs between men and women, suggesting the potential role of sex-specific influences in susceptibility to this cancer. While behavioural differences may account for some of the risk, another possibility is that X chromosome susceptibility genes may have an effect. Little is known about genetic variants on the X chromosome that contribute to sex-specific lung-cancer risk, so we investigated this in a previously characterized cohort. Methods: We conducted a genetic association reanalysis of 518 lung cancer patients and 844 controls to test for lung cancer susceptibility variants on the X chromosome. Annotated gene expression, co-expression analysis, pathway, and immune infiltration analyses were also performed. Results: 24 SNPs were identified as significantly associated with male, but not female, lung cancer cases. These resided in blocks near the annotated genes DMD, PTCHD1-AS, and AL008633.1. Of these, DMD was differentially expressed in lung cancer cases curated in The Cancer Genome Atlas. A functional enrichment and a KEGG pathway analysis of co-expressed genes revealed that differences in immune function could play a role in sex-specific susceptibility. Conclusions: Our analyses identified potential genetic variants associated with sex-specific lung cancer risk. Integrating GWAS and RNA-sequencing data revealed potential targets for lung cancer prevention.


Introduction
Consecutive epidemiological studies have found that the estimated rate of new lung and bronchus cancer cases is higher in males than in females [1][2][3], suggesting that gender differences contribute to the incidence of lung cancer. Tobacco is a carcinogen that increases the risk of lung cancer. However, it remains controversial whether smoking raises lung cancer susceptibility more in women than in men, relative to non-smokers [4,5]. Beyond smoking exposure, we hypothesized that there could be a genetic effect on increased susceptibility in men. Many lung cancer susceptibility loci have been identified by genome-wide association studies (GWAS) [6]. X-linked genetic variants could affect susceptibility

Data Accession and Categorization
The GWAS dataset in this analysis was downloaded with appropriate approvals from the dbGaP database (phs000093.v2.p2). A total of 513 lung cancer cases and 834 controls were retrieved based on the phenotype document. The samples were subgrouped by gender, resulting in 54 male SCLC cases, 27 female SCLC cases, 259 male NSCLC cases, and 173 female NSCLC cases. The age and family history groups in the 313 male lung cancer patients were defined based on the phenotype files in the dbGaP dataset. We defined "younger age" as 64 or less (code 0-1) and "older age" as 65 or more (code 2-3). Twenty male patients had an incomplete family history, so only 293 patients were included in the analysis of peak SNPs by family history. The genetic data were imputed using fcGENE [13]. As part of our quality control procedure, we excluded samples and SNPs that met any of the following criteria: a. SNPs with >5% heterozygous genotypes across all male samples; b. male samples with >5% heterozygosity across all SNPs; c. SNPs in the pseudo-autosomal region (PAR); d. designated female samples that were homozygous for >90% of the SNPs.
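The four quality control criteria can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the data layout (two-letter genotype strings per sample, a per-SNP PAR flag), the function names, and the choice to evaluate sample-level criteria on the unfiltered SNP set are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Genotype = Optional[str]  # two-letter call such as "AG"; None marks a missing call


def het_fraction(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls whose two alleles differ."""
    called = [g for g in calls if g is not None]
    if not called:
        return 0.0
    return sum(g[0] != g[1] for g in called) / len(called)


def x_chromosome_qc(
    genotypes: Dict[str, List[Genotype]],  # hypothetical layout: sample ID -> calls per SNP
    sex: Dict[str, str],                   # sample ID -> "M" or "F"
    in_par: List[bool],                    # True where the SNP lies in the PAR
    max_male_het: float = 0.05,
    max_female_hom: float = 0.90,
) -> Tuple[List[int], List[str]]:
    """Return (indices of SNPs kept, IDs of samples kept)."""
    males = [s for s in genotypes if sex[s] == "M"]

    keep_snps = []
    for j, par in enumerate(in_par):
        if par:  # criterion c: drop PAR SNPs
            continue
        # criterion a: drop SNPs that look heterozygous in too many males
        if het_fraction([genotypes[s][j] for s in males]) > max_male_het:
            continue
        keep_snps.append(j)

    keep_samples = []
    for s, calls in genotypes.items():
        het = het_fraction(calls)
        if sex[s] == "M" and het > max_male_het:  # criterion b
            continue
        if sex[s] == "F" and (1.0 - het) > max_female_hom:  # criterion d
            continue
        keep_samples.append(s)
    return keep_snps, keep_samples
```

Criteria a and b rest on the fact that males carry a single X chromosome, so apparent heterozygous calls outside the PAR most likely indicate genotyping errors or mislabelled sex; criterion d applies the converse check to samples labelled as female.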

Association Analysis
The case-control association test on each subgroup, linkage disequilibrium (LD) analysis, haplotype analysis, and SNP annotation were conducted using PLINK v1.07 with default settings. All significant SNPs were annotated using information from dbSNP (GRCh38.p12, https://www.ncbi.nlm.nih.gov/snp/) (accessed on 12 March 2021). Population-specific haplotype frequencies were analysed and visualized with LDhap (https://ldlink.nci.nih.gov/?tab=ldhap) (accessed on 25 April 2021) with reference to the "British in England and Scotland" and "Utah residents from North and West Europe" datasets.

GO/KEGG Pathway Enrichment Analysis
We used the clusterProfiler package in R for the gene cluster analysis [16]. The merged set of genes positively co-expressed with DMD in LUAD and LUSC was used for the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and the Gene Ontology (GO) enrichment analysis, the latter covering biological process (BP), cellular component (CC), and molecular function (MF) terms. p-values < 0.05 were considered to indicate significantly enriched pathways.
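Over-representation analyses of this kind are commonly based on a hypergeometric test, which asks how likely it is to see at least the observed overlap between a query gene list and a pathway by chance. The following Python sketch shows that calculation only. It does not reproduce clusterProfiler (which also applies multiple-testing adjustment), and the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(
    n_universe: int,  # genes in the background set
    n_in_term: int,   # background genes annotated to the term or pathway
    n_query: int,     # genes in the query list (e.g., co-expressed genes)
    n_overlap: int,   # query genes annotated to the term or pathway
) -> float:
    """P(overlap >= n_overlap) when n_query genes are drawn without replacement."""
    upper = min(n_in_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 background genes, 4 in the pathway,
# 3 query genes, 2 of which fall in the pathway.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A term would be called enriched under the threshold stated above when this p-value falls below 0.05.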

Statistical Analysis
Fisher's exact test was used to calculate the significance of SNP genotype associations, and the Kaplan-Meier method was used to estimate the impact of gene expression on survival. p-values less than 0.05 were considered statistically significant, except for the GWAS SNP associations, for which a correction was made for the testing of all X chromosome SNPs.
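As an illustration of the two ideas in this paragraph, the sketch below implements a two-sided Fisher's exact test for a 2 x 2 table of allele counts and a per-test significance threshold. The correction method is not specified in the text, so the Bonferroni rule used here is an assumption, and all counts and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows could be cases/controls and columns risk/other allele.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold under a Bonferroni correction (assumed method)."""
    return alpha / n_tests
```

With this rule, a SNP would be reported as significant only if its p-value fell below `bonferroni_threshold(0.05, n_snps)`, where `n_snps` is the number of X chromosome SNPs tested.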

Identification of Sex-Specific SNPs Associated with Lung Cancer Susceptibility
To find potential X chromosome lung cancer susceptibility genes, we compared male lung cancer cases with male controls using data from a previously characterised cohort derived from the Environment and Genetics in Lung Cancer Etiology (EAGLE) Study [17] and the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial [18]. Access to these data was approved via dbGaP (phs000093.v2.p2). We identified a total of 24 significantly associated SNPs (Figure 1A and Table 1); all of these were outside the pseudoautosomal region (PAR). The most strongly associated alleles over-represented in male lung cancer cases were the C alleles of rs145211462 and rs62587743, suggesting that these alleles contributed significantly to lung cancer susceptibility in males (Fisher's exact test, Table 2). The genes located nearest to these SNPs are AL008633.1 and DMD. We tested whether the peak SNPs were associated with cancer in females. As shown in Figure 1A, no SNPs were significantly associated when only female cases and female controls were compared. Since males carry a single X chromosome and are therefore effectively homozygous at X chromosome SNPs, we also considered homozygosity at these SNPs in females by excluding heterozygotes from the analyses. As shown in Table 3, there was no significant difference in females who were homozygous for the SNP alleles (Fisher's exact test), suggesting that the alleles of these SNPs contributed to susceptibility specifically in males. This does not exclude potential gene dosage effects. Next, we compared the p-values of the 24 significant SNPs in males with those in other comparisons, including male lung cancer versus female lung cancer, smokers with lung cancer versus smokers without lung cancer, and non-smokers with lung cancer versus non-smokers without lung cancer. 
As shown in Figure 1B, the identified SNPs that contributed specifically to lung cancer susceptibility in males were not associated with smoking behaviour, which is a known cancer predisposition risk. Since males inherit X-linked alleles from their mothers, we reasoned that the X-linked male lung cancer risk SNPs would not be associated with disease in men with a family history of lung cancer. This was found to be the case (Table S1). We also asked whether these SNPs were associated with a later age of cancer onset. Some of the identified SNPs were weakly associated with a later age of diagnosis (Table 4), but these results should be confirmed in a larger cohort.
Together, these results further supported the argument that X-linked cancer susceptibility genes contribute to lung cancer in males, regardless of smoking status and family history of lung cancer.

Interactions between X Chromosome SNPs in Male Lung Cancer Risk
Next, we performed a haplotype-trait association analysis on pairs of SNPs from each peak. Significant associations between SNPs in different peaks were found, suggesting that these SNPs defined chromosome regions that were associated with male-specific lung cancer susceptibility. SNP-SNP interaction analyses of the genotypes at the peak SNPs were also performed. Based on the results of the Chi-squared test of the risk alleles of these SNPs, we observed that some of the risk alleles may have an additive effect on lung cancer risk. The odds of lung cancer associated with these risk alleles were compared between men and women. In the two-by-two combination analysis, the risk allele combinations that were significant for male cancer were not detected in female cases (Table 5). Similarly, most three-by-three risk allele combinations contributing to the risk of male lung cancer were not found in female lung cancer cases (Table 6). This further supports the notion that the X-linked SNPs were associated with the risk of lung cancer in males.
Note to Tables 5 and 6: The risk alleles of representative SNPs in each peak were retrieved, and the number of samples carrying each combination of these risk alleles was calculated. Each combination was labelled as "X risk allele in peak number_ allele in peak number". None of these combinations were identified in females.
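The combination analysis described above can be sketched as follows: count how many cases and controls carry every risk allele in a combination, then compute an odds ratio and a Chi-squared statistic from the resulting 2 x 2 table. This is a simplified illustration rather than the procedure used for Tables 5 and 6. It assumes one allele per SNP per male sample, and the SNP labels, data layout, and function names are hypothetical.

```python
from math import erfc, sqrt
from typing import Dict, List, Tuple

Sample = Dict[str, str]  # hypothetical layout: SNP label -> allele carried


def carries(sample: Sample, combo: Dict[str, str]) -> bool:
    """True if the sample carries every risk allele in the combination."""
    return all(sample.get(snp) == allele for snp, allele in combo.items())


def combination_test(
    cases: List[Sample], controls: List[Sample], combo: Dict[str, str]
) -> Tuple[float, float, float]:
    """Return (odds ratio, Chi-squared statistic, p-value with 1 d.f.)."""
    a = sum(carries(s, combo) for s in cases)     # cases with the combination
    b = len(cases) - a                            # cases without it
    c = sum(carries(s, combo) for s in controls)  # controls with the combination
    d = len(controls) - c                         # controls without it

    if b * c == 0:
        odds_ratio = float("nan") if a * d == 0 else float("inf")
    else:
        odds_ratio = (a * d) / (b * c)

    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom
    # Upper tail of the Chi-squared distribution with one degree of freedom
    p_value = erfc(sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value
```

Running the same function on female samples (restricted, for example, to homozygotes) would give the corresponding comparison between the sexes.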

Effect of Sex-Specific Lung Cancer Risk SNPs on DMD Expression
The gene with the most annotated SNPs in this study was DMD, a very large gene that encodes the muscle protein dystrophin. PTCHD1-AS and AL008633.1, the other two genes located near associated SNPs, were either not detected or not included in the relevant databases. Therefore, we focused on investigating the potential effect of DMD expression on lung cancer. We observed a haplotype pattern in these SNPs (Figure 2A), and their genomic positions were close to SNPs identified as pathogenic in cancer and Duchenne muscular dystrophy (Figure 2B). The mutation profile of the exons of the DMD gene in 3163 lung cancer samples was analysed using data from the cBioPortal for Cancer Genomics. A total of 14% of samples harboured DMD mutations, ranging from 3.75% to 27.59% in different cohorts (Figure 2C). We next checked the gene expression of DMD in the TCGA-Lung adenocarcinoma (LUAD) and TCGA-Lung squamous cell carcinoma (LUSC) cohorts. The results showed that the mRNA expression levels of DMD were significantly decreased in the lung cancer tissues compared with the control tissues (Figure 2D). The differential expression of DMD between cancer and corresponding control tissues was also investigated across cancer types (Figure 2E), revealing that 55% (18 out of 33) of cancer types had abnormal DMD expression.
The impact of DMD on lung cancer survival was investigated, but its expression was not associated with either overall or disease-free survival in the lung cancer cohorts studied (Figure S1). We further analysed the effect of the differential expression of DMD on 1424 lung cancer patients in 13 microarray datasets (Table 7). DMD expression was associated with lung cancer survival in 4 out of 13 unified cohorts (31%). The significant gene probes from the different microarrays were 203881_s_at (GSE31210, p = 0.00004, relapse-free survival of adenocarcinoma), A_24_P185854 (GSE13213, p = 0.00047, overall survival of adenocarcinoma), 203881_s_at (GSE31210, p = 0.00199061, overall survival of adenocarcinoma), 207660_at (GSE31210, p = 0.00427137, relapse-free survival of adenocarcinoma), 203881_s_at (jacob-00182-UM, p = 0.0116482, overall survival of adenocarcinoma), and 234752_x_at (GSE8894, p = 0.0379615, relapse-free survival of non-small cell lung cancer). This result suggested that DMD expression may play a minor role in lung cancer survival.
Genes that were co-expressed with DMD were identified by UALCAN online analysis [14]. A total of 5 genes in the TCGA-LUAD dataset and 180 genes in the TCGA-LUSC dataset with Pearson correlation coefficients greater than or equal to 0.4 were retrieved. No gene in either dataset was negatively co-expressed with DMD (Pearson correlation coefficient < −0.3). We merged the positively co-expressed genes and performed in silico analyses to explore the effects of DMD expression, as affected by X chromosome susceptibility SNPs, in NSCLC. The enriched GO terms for the genes co-expressed with DMD included "extracellular matrix organization" and "response to tumor necrosis factor" (Figure 3A), while the KEGG analysis implicated the NF-kappa B signaling pathway (Figure 3B).
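The co-expression filter described above amounts to computing a correlation coefficient between DMD and every other gene across samples and keeping those above a cut-off. The sketch below shows this with a Pearson coefficient and the 0.4 threshold from the text. It is not the UALCAN implementation, and the expression values, gene labels, and function names are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def positively_coexpressed(
    expression: Dict[str, List[float]],  # hypothetical layout: gene -> values per sample
    target: str = "DMD",
    min_r: float = 0.4,
) -> Dict[str, float]:
    """Genes whose correlation with the target is at least min_r."""
    ref = expression[target]
    result = {}
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson_r(ref, values)
        if r >= min_r:
            result[gene] = r
    return result
```

Applying the function to each cohort separately and taking the union of the returned genes would correspond to the merging step described in the text.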

DMD Could Affect CD4+ T Cell Infiltration in LUSC
Copy number variation (CNV) has been observed in many studies to participate in the occurrence and development of cancer, and the number and complexity of CNVs are associated with the prognosis of many cancer types. Somatic copy-number alterations (SCNAs) affect a large fraction of the cancer genome and can potentially activate an oncogene or inactivate a tumor suppressor gene. SCNAs can be further divided into focal SCNAs (shorter than one chromosome arm) and arm-level SCNAs (chromosome-arm length or longer) [19,20]. The SCNA subtypes, including deep deletion, arm-level deletion, diploid/normal, arm-level gain, and high amplification, can be defined by GISTIC 2.0 [20]. Studies of the correlation of gene mutation with immune infiltration levels in cancer have facilitated the understanding of the interaction between malignant cells and the host immune system [21]. Therefore, we investigated the correlation of SCNA and tumor infiltration levels in LUAD and LUSC. As indicated in Figure 4, more tumor-infiltrating cells were associated with DMD somatic copy-number alterations in LUSC than in LUAD. Of note, significant arm-level DMD deletion occurred in LUSC samples with CD4+ T cell infiltration (Figure 4), supporting the hypothesis that decreased expression of DMD caused by mutation may affect CD4+ T cell infiltration in LUSC.

Discussion
Genome-wide association testing is an important approach for the identification of genetic factors associated with complex genetic diseases such as lung cancer [6]. However, previous lung cancer GWA studies did not specifically test for potential susceptibility SNPs on the X chromosome. In this study, we performed an X chromosome-wide association study to identify susceptibility loci for lung cancer risk. We identified 24 significant SNPs on the X chromosome that were associated with lung cancer in male patients. Based on the genome annotation, these SNPs mapped near the genes DMD, PTCHD1-AS, and AL008633.1.
Previous studies of sex-specific differences in lung cancer risk have focused on tobacco-derived carcinogens, sex hormones, and carcinogen metabolism [22]. However, the intrinsic influence of genetic variants on sex-specific lung cancer risk should not be neglected. In the present study, we identified genetic variants on the X chromosome that were associated with lung cancer risk regardless of smoking, suggesting that some male individuals who bear risk alleles of these X-linked genes are more susceptible to lung cancer. The synergistic interaction of SNPs could be associated with cancer susceptibility [23,24]. In this study, we found that interactions between SNPs in different regions increase lung cancer risk. Further studies of genes in these regions could identify novel targets for lung cancer prevention.
DMD is a very large gene (greater than 2 Mb), and its mutations are known to be pathogenic in causing Duchenne and Becker muscular dystrophy. Recently, increasing evidence has suggested a role for DMD abnormality in cancer development. Leanne et al. summarized DMD mutations in major cancer types, including soft tissue sarcomas, tumours of the nervous system, carcinomas, and haematological malignancies [25]. Our study revealed that genetic variation in DMD (either as germline variants or as somatic mutations) could be associated with sex-specific risk of lung cancer. Consistent with previous findings, abnormal DMD expression was found in lung cancer compared with control tissues. However, the contribution of DMD to lung cancer susceptibility remains unclear. The pathway analysis of DMD co-expressed genes identified "response to tumour necrosis factor" among the GO terms and the NF-kappa B signalling pathway among the KEGG pathways. Moreover, an association of the levels of immune infiltrates with DMD mutation was observed, suggesting that DMD may affect tumour development through abnormal immune processes. Altogether, DMD could be a molecular target for the prevention of some cases of male-specific lung cancer.
There are some limitations to this study. First, the dataset we used does not provide protein data, making a direct SNP-protein association analysis impossible. Second, the data were derived from subjects of European descent. Further investigation in other ethnic groups is needed. Third, it could be interesting to test whether these or other X-linked SNPs affected other cancer types. Fourth, collecting further DNA samples for X chromosome sequencing could validate our SNPs or provide novel SNPs associated with sex-specific lung cancer risk.

Conclusions
In this study, we performed analyses of GWAS data to identify sex-specific SNPs, located on chromosome X, associated with lung cancer. Based on gene annotation, expression analysis, co-expression analysis, and functional analyses, our findings support the hypothesis that DMD is abnormally expressed in cancer tissue and that DMD-induced immune dysregulation may be responsible for the etiology of lung cancer. Further biomolecular experiments are needed to understand the interaction of these SNPs with DMD. Finally, it is well known that some simple single-gene diseases are more common in males due to inherited mutations in X-linked genes; our results may provide a paradigm for inherited X-linked variants contributing to susceptibility to other cancers and to other common complex genetic diseases.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13246379/s1. Figure S1: Impact of DMD on lung cancer overall or disease-free survival. Table S1: Genotype of peak SNPs by family history.

Institutional Review Board Statement:
We obtained ethical approval from the UWA Human Research Ethics Committee (HREC, 2020/ET000284) to analyse the data.

Informed Consent Statement:
Not applicable for studies involving public anonymous data that was generated under appropriate ethics approval by the original investigators' IRB.

Data Availability Statement:
The GWAS dataset in this analysis was downloaded with appropriate approvals from the dbGaP database (phs000093.v2.p2). Other databases used in this study are listed in the Materials and Methods section.