A Novel apaQTL-SNP for the Modification of Non-Small-Cell Lung Cancer Susceptibility across Histological Subtypes

Simple Summary Lung cancer is one of the major public health problems in the world. Genetic variation plays an important role in the development of lung cancer. In this study, we screened apaQTL-SNPs that may influence lung cancer development based on a publicly available database. Then we performed a two-stage case-control population susceptibility study. We found a novel apaQTL-SNP, rs10138506, which might affect lung adenocarcinoma (LUAD) risk by modulating the 3′UTR length of CHURC1. At the same time, CHURC1 could function as a suppressor gene in LUAD with APA regulation. Further experiments indicated that CHURC1 has two poly(A) sites (proximal and distal) and different genotypes of rs1127968 which in perfect LD with rs10138506 can mediate changes in the lengths of the 3′UTR of the CHURC1 isoforms by choosing different poly(A) sites. The results of this study may be helpful in providing more genetic basis data for the screening of high-risk populations of lung cancer. Abstract Background: Alternative polyadenylation (APA) events may be modulated by single nucleotide polymorphisms (SNPs). Therefore, this study aims to evaluate the association between APA quantitative trait loci (apaQTLs)-related SNPs (apaQTL-SNPs) and non-small-cell lung cancer (NSCLC) risk. Methods: APA-related genes associated with NSCLC (LUAD and LUSC) were first identified, and the respective apaQTL-SNPs of those genes were selected. Then, a two-phase case-control study was performed to evaluate the association between candidate apaQTL-SNPs and NSCLC risk. Results: A total of 7 LUAD- and 21 LUSC-associated apaQTL-SNPs were selected. In the first phase, the apaQTL-SNP rs10138506 was significantly associated with LUAD risk (p < 0.05), whereas the other two apaQTL-SNPs (rs1130698 and rs1130719) were significantly associated with LUSC risk (p < 0.05). In the second phase, the variant G allele of rs10138506 was still significantly associated with an increased risk of LUAD (OR = 1.42, 95%CI = 1.02–1.98, p = 0.038). Functional annotation indicated that the variant G allele of rs10138506 was significantly associated with a higher PDUI value of CHURC1. Meanwhile, 3′RACE experiments verified the presence of two poly(A) sites (proximal and distal) in CHURC1, while qRT-PCR results indicated that different genotypes of rs1127968 which, in perfect LD with rs10138506, can mediate changes in the lengths of the 3′UTR of CHURC1 isoforms. Conclusion: The variant G allele of rs10138506 in CHURC1 was correlated with a longer 3′UTR of CHURC1 mRNA and an increased LUAD risk. Further studies should evaluate the interaction between rs10138506 and different 3′UTR lengths of CHURC1 that regulate LUAD development.


Introduction
Lung cancer is the second-most frequently diagnosed cancer and is the first leading cause of cancer-related deaths worldwide [1]. Non-small-cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancer cases, with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) as the most common histological types [2]. In recent years, and although the NSCLC risk is much higher among elderly people, its incidence still shows increasing trends in women and younger adults, especially for LUAD [3]. Additionally, the survival rates of patients with NSCLC remain very low; for instance, the 5-year relative survival rate in the world is 26%, which is higher than that in China (<20%) [4][5][6]. Thus, further exploring the NSCLC etiology might help identify high-risk individuals and reduce its incidence in the general population.
NSCLC is associated with multiple environmental factors, with tobacco smoking as one of its main known factors [7]. Genetic factors also play a vital role in NSCLC development [8], and single nucleotide polymorphism (SNP) is the most common type among them [9]. Although genome-wide association studies (GWASs) have identified numerous SNPs associated with NSCLC risk, their biological functions remain to be elucidated if located in non-coding regions [10,11]. Based on the results of the ENCODE project, >80% of the human genome has been reported to be biochemically active, especially outside the well-studied coding regions [12]. Therefore, exploring the underlying biological functions of SNPs in non-coding regions may provide additional insights in elucidating the etiology and molecular genetic mechanisms of NSCLC.
Polyadenylation is an important post-transcriptional regulation mechanism of gene expression [13]. Additionally, alternative polyadenylation (APA) events commonly exist in approximately 70% of human genes [14]. Recently, Xiang et al. systematically explored the roles of APA events in different cancer types and identified that they were frequently detected in NSCLC (both LUAD and LUSC) and other cancers [15]. APA events indicate that pre-mRNA is cleaved at different poly(A) sites by recognizing different polyadenylation signals (PAS), resulting in the production of diverse mRNA isoforms with longer or shorter lengths of 3 untranslated regions (3 UTRs) [16]. Poly(A) sites are mainly located in the 3 UTR and can be divided into distal and proximal regions, i.e., near and far from the 3 end, respectively [17]. Generally, the percentage of distal poly(A) site usage index (PDUI) values are used to quantify APA events and are calculated by dividing the number of transcripts at the distal poly(A) sites based on the total number of transcripts with distal and proximal poly(A) sites. The PDUI value ranges from 0 to 1, and a PDUI close to 1 indicates that the gene tends to use the distal poly(A) site, which may cleave and express mRNA isoforms with a longer 3 UTR. A PDUI of close to 0 represents that this gene tends to utilize the proximal poly(A) site, which may cleave and express mRNA isoforms with a shorter 3 UTR in turn [18].
Increasing evidence has shown that copious oncogenes in multiple cancer types are accompanied by 3 UTR shortening [19], which is also common in the majority of NSCLCrelated oncogenes [20]. For example, the oncogene CSNK1D commonly uses the proximal poly(A) site in LUAD tumor tissues, resulting in a lower PDUI value (shorter 3 UTR) than that in the adjacent normal lung tissues [15]. Different 3 UTR lengths mediated by APA events could affect mRNA stability, transcription and translation [21]. Thus, it is biologically plausible that APA events may modify the expression of certain corresponding genes by affecting the transcripts with different 3 UTR lengths and further influence LUAD and LUSC development. Further studies exploring the APA event regulators and the underlying regulation mechanisms might provide new clues in elucidating the etiology of LUAD and LUSC.
Recent studies have proposed that some SNPs can act as regulators of APA events [22]. Certain SNPs have been associated with the use of different poly(A) sites, which might cause APA dysfunctions. Furthermore, abnormal APA events might regulate gene expression by generating mRNA isoforms with different 3 UTR lengths. Similar to the eQTL analysis, APA quantitative trait loci (apaQTL) analysis can indicate the relationship between different SNP genotypes and PDUI values in certain genes. The proposed novel apaQTL mechanism may provide a novel insight to the interpretation of potential SNP functions in non-coding regions [23]. Recently, by integrating the SNP genotype and APA data (quantified by PDUI values), Yang et al. systematically identified apaQTL-SNPs in 32 cancer types and developed an online database known as SNP2APA [24]. However, apaQTL-SNPs from SNP2APA were obtained by bioinformatics analysis. Further studies with large samples are needed to investigate whether and how apaQTL-SNPs identified by SNP2APA affect NSCLC risk.
In this study, APA-related genes associated with NSCLC (LUAD and LUSC) were firstly obtained from published data by Xiang et al. [15]. Secondly, the respective apaQTL-SNPs for LUAD and LUSC were systematically screened based on these genes. Thirdly, a two-phase case-control study (phase I: 4107 patients with NSCLC and 3710 healthy controls; phase II: 779 patients with NSCLC and 667 healthy controls) was performed to evaluate the association between candidate apaQTL-SNPs and NSCLC (LUAD and LUSC) susceptibility.

Study Population
To explore the association between candidate apaQTL-SNPs and NSCLC risk, a two-phase case-control study (phase I and II) was performed. The study participants in phase I were obtained based on the Female Lung Cancer Consortium in Asia (FLCCA) GWAS from the database of Genotypes and Phenotypes (dbGAP), with accession number phs000716.v1.p1. The FLCCA GWAS consisted of 5510 never-smoking female patients with lung cancer and 4544 controls [25]. Among them, 4107 with NSCLC (including 3453 with LUAD and 654 with LUSC) and 3710 controls obtained from 14 studies conducted in mainland China, South Korea, Japan, Singapore, as well as Taiwan, China and Hong Kong, China were available for the present study.
In phase II, 779 patients with NSCLC (576 LUAD and 203 LUSC) and 667 cancerfree healthy controls were included to further validate the association between candidate apaQTL-SNPs and NSCLC risk. Among the samples in this phase, 626 patients with NSCLC and 667 healthy controls were previously described [26]. An additional 153 patients, newly diagnosed with NSCLC, were continuously recruited from two hospitals (the Sixth People's Hospital of Nantong and Jiangdu People's Hospital of Yangzhou) in Jiangsu from 2020 to 2022. All patients were pathologically confirmed. Written informed consent was obtained from all subjects, and the study was reviewed and approved by the ethics committee of Nantong University (approval no. 2022-2, February 2022).

Identification of APA-Related Genes Associated with LUAD and LUSC
APA-related genes associated with LUAD and LUSC were obtained based on the published study by Xiang et al. [15], in which the authors first downloaded the RNA-seq data from The Cancer Genome Atlas (TCGA) database and used the DaPars algorithm to calculate PDUI values of the corresponding genes for quantifying APA events. They further systematically assessed the correlation between PDUI values and expression levels of the corresponding genes using Spearman's correlation (Rs), and APA-related genes were defined based on the threshold of |Rs| >0.3 and a false discovery rate (FDR) of <0.05.

Selection of Respective apaQTL-SNPs in APA-Related LUAD and LUSC Genes
Respective apaQTL-SNPs in APA-related LUAD and LUSC genes were downloaded from the SNP2APA online website (http://gong_lab.hzau.edu.cn/SNP2APA/ accessed on 21 June 2021) [24], and the authors obtained the SNP genotype data from TCGA database, whereas PDUI values of the corresponding genes were obtained from TC3A database. Then, linear regression of MatrixEQTL was used to calculate the absolute value of the correlation coefficient (r) to evaluate the pairwise association between SNPs and PDUI values. Moreover, SNPs with |r| ≥0.3 and P FDR < 0.05 were defined as apaQTL-SNPs.
After obtaining apaQTL-SNPs in APA-related LUAD and LUSC genes, the candidate apaQTL-SNPs were then filtered out with a minor allele frequency (MAF) of ≥0.01 in the East Asian population (including the Chinese Han population, Southern Han Chinese population and Japanese population). Finally, the linkage disequilibrium (LD) analysis with an r 2 threshold of 0.80 was performed to further filter above the apaQTL-SNPs (Figure 1).

Genotyping of FLCCA GWAS in Phase I
In phase I, genotyping of FLCCA GWAS was performed using Illumina SNP microarrays.

Genotyping of Candidate apaQTL-SNPs in Phase II
For the samples in phase II, genomic DNA was extracted from the peripheral blood using a DNA extraction kit (Qiagen, Valencia, CA, USA). Then, genotyping was performed using the TaqMan allele discrimination method, and the Applied Biosystems™ QuantStu-dio™ 5 FAST Real-Time Polymerase Chain Reaction (PCR) system (Applied Biosystems, Foster City, CA, USA) was used to perform PCR reactions and read the fluorescent signal. Genotyping results were displayed on the allelic discrimination system based on FAM and VIC fluorescence intensity.

3 -Rapid Amplification of cDNA Ends (3 RACE) Experiments
Total RNA was extracted from A549 cell lines using a TRIzol reagent (Invitrogen). Furthermore, the first-strand cDNA was synthesized from 1 µg of the total RNA using the HiScript 1st Strand cDNA Synthesis Kit (Vazyme R111-02). Then, 3 RACE was used to determine poly(A) sites of Churchill Domain Containing 1 (CHURC1) with a 3 RACE kit (Jingrui, Guangzhou, China). The sequences of gene-specific forward primers (GSP) were designed for CHURC1 as follows: CHURC1_F1: GGTAATACCTGCCTGGAGAATGGATC CHURC1_F2: ACAAGGATCTGGCAAGGTTAGGAAG PCR products of 3 RACE were then separated through agarose gel electrophoresis and gel-purified. After confirming the quality with agarose gel electrophoresis, the 3 ends were sequenced using an Applied Biosystems 3730× l DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Finally, the AlignX function of Vector NTI Advance 11.0 (Invitrogen, Carlsbad, CA, USA) was used for sequence alignment.

Vectors Construction and qRT-PCR Assay
The psiCHECK2-CHURC1-3 UTR-rs1127968-wid and the psiCHECK2-CHURC1-3 UTR-rs1127968-mut were constructed by NKY GENEREADER (Guangzhou, China). Then, qRT-PCR assays were performed to evaluate the relative expression levels of the longer and shorter 3 UTRs of CHURC1 under rs1127968 wid/mut genotypes. The qRT-PCR was conducted using a SYBR Premix Ex Taq kit (Takara) on a 7500 Real-Time PCR system (Applied Biosystems, Foster City, CA, USA). The sequences of the target longer and shorter 3 UTRs of CHURC1 were as follows:

Statistical Analysis
The two-sided chi-squared test and Student's t-test were used to statistically evaluate differences in demographic characteristics and smoking status between the case and the control groups at each study phase. Logistic regression analyses were used to assess the association between the candidate apaQTL-SNPs and NSCLC risk, and the odds ratios (ORs) and 95% confidence intervals (95% CIs) were obtained after adjusting for gender, age and smoking status as appropriate. Kaplan-Meier survival curves were plotted using the "survival" package in R software. A log-rank test was used to compare the difference in survival rate between the two groups. The "Coxph" function in the "survival" package was used for univariate COX regression analysis. p < 0.05 was defined as the boundary of statistical significance. Analysis were performed using Stata version 15.0 (StataCorp, College Station, TX, USA), SPSS version 20.0 (IBM Corp., Armonk, NY, USA) and R version 3.6.2 software (R Foundation for Statistical Computing, Vienna, Austria).

Characteristics of the Study Subjects
In the present study, the baseline characteristics of patients with NSCLC and cancerfree healthy controls are shown in Table 1. Briefly, a total of 4107 patients with NSCLC (3453 with LUAD and 654 with LUSC) and 3710 healthy controls were enrolled at phase I, and 779 patients with NSCLC (576 with LUAD and 203 with LUSC) and 667 healthy controls were enrolled at phase II. In phase II, the 779 NSCLC cases (including 511 males and 268 females) and 667 healthy controls (including 443 males and 224 females) were used to validate the results. As shown in Table 1, in phase I, 49.38% of patients in the case group were aged ≤60 years, making them younger than those in the control group, and 50.62% of patients in the case group were aged >60 years, making them older than those in the control group. In phase II, the age of patients with NSCLC was comparable to that of the control group (p = 0.358), and the gender ratio was comparable between patients with NSCLC and the controls (p = 0.781). Additionally, the proportion of smokers in the case group was higher than that in the control group in phase II (p < 0.001).

Identification of APA-Related Genes Associated with LUAD and LUSC
In published studies [15], 518 APA-related genes associated with LUAD, and 489 APArelated genes associated with LUSC, were obtained. Among the 518 APA-related genes associated with LUAD, the PDUI values of 233 genes were negatively correlated with the corresponding gene expression (R S < −0.3, P FDR < 0.05), and the PDUI values of 285 genes were positively correlated with the corresponding gene expression (R S > 0.3, P FDR < 0.05). Among the 489 APA-related genes associated with LUSC, 259 genes had PDUI values negatively correlated with the corresponding gene expression (R S < −0.3, P FDR < 0.05), and 230 genes had PDUI values positively correlated with (R S > 0.3, P FDR < 0.05).

Evaluation of the Association between apaQTL-SNP (rs10138506) and LUAD Risk in Phase II
To further evaluate the association between the identified apaQTL-SNP (rs10138506) and LUAD risk, genotyping was performed on 576 patients with LUAD and 667 healthy controls from the case-control study of phase II. As shown in Table 4, the variant G allele of rs10138506 was significantly associated with an increased LUAD risk (additive model: OR = 1.42, 95% CI = 1.02-1.98, p = 0.038).

Evaluation of the Association between apaQTL-SNPs (rs1130698 and rs1130719) and LUSC Risk in Phase II
To further evaluate the association between the identified apaQTL-SNPs (rs1130698 and rs1130719) and LUSC risk, genotyping was performed on 203 patients with LUSC and 667 healthy controls from the case-control study in phase II. However, no statistical significance was observed between the two apaQTL-SNPs and LUSC risk (both p > 0.05) ( Table 4).

APA Analysis for rs10138506
We further evaluated the relationship between the identified apaQTL-SNP rs10138506 and PDUI values of CHURC1 in LUAD. As shown in Figure 2, the apaQTL-SNP rs10138506 was significantly correlated with the PDUI values of CHURC1 (p = 3.33 × 10 −14 ). In detail, compared with those of the wild-type homozygote AA genotype, the PDUI values of the heterozygous AG genotype in rs10138506 were significantly higher, whereas those of the variant homozygous GG genotype were even higher.

CHURC1 Expression
Based on TCGA database, the CHURC1 expression between LUAD tumor tissues (n = 526) and adjacent non-tumor tissues (n = 108) was significantly different (p < 0.001), and lower CHURC1 expression levels were found in tumor tissues. Furthermore, CHURC1 expression levels in 57 LUAD paired samples were also analyzed, and the results showed that CHURC1 expression in LUAD tumor tissues was significantly lower than that in the adjacent non-tumor tissues (p = 0.002) ( Figure 3A,B).

Survival Analysis
To explore whether the expression and PDUI value of CHURC1 was related to the survival rate, we downloaded the TCGA dataset and the APA-related genes dataset [15] to perform survival analysis. The results showed that the survival time of high CHURC1 expression in LUAD was significantly higher than that of low CHURC1 expression (Logrank p = 0.049). (Figure 4A). Further analysis of the correlation between the PDUI value of CHURC1 and survival time showed that the survival time of the low PDUI value of CHURC1 in LUAD was higher than that of the high PDUI value (log-rank p = 0.056) ( Figure 4B).

Analysis of Poly(A) Sites of CHURC1 Based on the 3 RACE Experiment
Bioinformatic analysis indicated that different genotypes of apaQTL-SNP rs10138506 may affect the usage of different poly(A) sites in CHURC1, resulting in changes in PDUI values. Then, the predicted poly(A) sites of CHURC1 were identified in the NCBI database (https://www.ncbi.nlm.nih.gov/ accessed on 16 February 2022), and CHURC1 3 RACE was performed and verified using the mRNA from the A549 cell line. These results indicated that CHURC1 has two poly(A) sites that may produce two mRNA transcripts with different 3 UTR lengths. The distance between the proximal and distal poly(A) sites was approximately 2.7 kb. Meanwhile, SNP rs1127968 was found with a perfect high LD with rs10138506 (r 2 = 1), which was located in the 3 UTR of CHURC1 and was approximately 0.2 kb away from the poly(A) site1 ( Figure 5A). Meanwhile, the APA regulation for rs1127968 was the same as that for rs10138506 ( Figure 5B). For the shorter CHURC1 mRNA isoform, the adopted PAS signal AAUACA is located at 39 bp upstream of the poly(A) site1, and the band of the PCR product (~420 bp) was consistent with the designed sequences ( Figure 5C). For the longer CHURC1 mRNA isoform, the adopted PAS signal UUUAAA was located at 115 bp upstream of the poly(A) site2, whereas the band of the PCR product (~560 bp) was consistent with the designed sequences ( Figure 5D). Finally, PCR product sequences were aligned to the transcript reference sequences from the NCBI database. Results showed that the 3 UTR transcript sequences with different CHURC1 lengths matched with the target region.

The Expression of CHURC1 Isoforms under Different Genotypes of apaQTL-SNP rs10138506
We then performed qRT-PCR to evaluate whether different genotypes of apaQTL-SNP rs10138506 may affect the expression of different lengths of CHURC1 3 UTR isoforms. As rs1127968 had a perfect high LD (r 2 = 1) with rs10138506 and was approximately 0.2 kb away from the poly(A) site1, we further explored the association between the different genotypes of rs1127968 and the expression of CHURC1 isoforms. The results showed that, with the expression level of the shorter 3 UTR of the CHURC1 mRNA isoform as a reference compared with the rs1127968 wild G allele, the expression level of the longer 3 UTR of the CHURC1 mRNA isoform was 1.85 fold higher under the rs1127968 mutant C allele. This indicates that different genotypes of rs1127968 may mediate changes in the lengths of the 3 UTR of the CHURC1 isoforms.

Discussion
Based on the present two-phase case-control study, the relationship between apaQTL-SNPs and NSCLC risk was systematically appraised. These results showed that the variant G allele of apaQTL-SNP rs10138506 in CHURC1 was significantly associated with an increased LUAD risk. Furthermore, the variant G allele of rs10138506 was significantly associated with higher PDUI values of CHURC1. This study may shed light on the etiology and underlying molecular genetic mechanisms of LUAD.
CHURC1, a transcriptional activator controlling the fibroblast growth factor [27], is a cancer susceptibility gene, affecting cell proliferation [28]. In our present study, the CHURC1 expression level in LUAD tumor tissues was lower than that in the adjacent non-tumor tissues based on TCGA data, indicating that CHURC1 may function as a tumorsuppressor gene in LUAD development. Furthermore, transcription factor SIN3A may bind at the CHURC1 promoter in human A549 LUAD cell lines [29]. Additionally, reduced SIN3A might strongly promote A549 cell invasion [30]. Therefore, CHURC1 may exert tumor-suppressive effects through certain transcription factors (e.g., SIN3A) and further affect LUAD development. We also analyzed the protein expression of CHURC1 in LUAD based on a published database [31]. The results show that CHURC1 had no statistically significant expression difference between LUAD tumor tissues and adjacent non-tumor tissues (p = 0.33). Although the mRNA expression of CHURC1 between LUAD tumor tissues and adjacent non-tumor tissues was significantly different (p < 0.001), the protein expression of CHURC1 had no statistically significant difference. Considering the concordance rate of the differential expression of mRNAs and proteins in tissues is relatively low, only about 40% [32], in subsequent studies, screening the genes related to LUAD from the perspectives of the consistency of both mRNA and protein levels are warranted.
At present, relevant studies have shown that the risk factors for lung cancer include exposure to tobacco smoke [33], indoor air pollution caused by cooking smoke and coal combustion [34], outdoor air pollution (PM2.5, et al.) [35], infection and genetic factors [36]. Among them, although environmental risk factors (such as smoking and indoor and outdoor air pollution) have a great impact on the risk of lung cancer, genetic factors explain about 20% of the heritability of lung cancer [37]. In terms of genetic factors, many scholars have explored the impact of genetic variation on LUAD. Dai et al. found that 19 susceptible loci were significantly associated with the risk of non-small-cell lung cancer through a large prospective cohort study of Chinese people, including two new SNP loci for LUAD and one new SNP loci for LUSC [8]. The risk loci found in the study explained the genetic basis of lung cancer, and people with a high genetic risk have a significantly higher risk of lung cancer than people with a low genetic risk.
Studies have found that some tumor-suppressor genes tend to produce mRNAs with longer 3 UTRs in tumor tissues. Begik et al. performed an APA analysis of gene expression data from various cancer types and found that PNRC1 acts as a tumor-suppressor gene, which tends to produce mRNAs with longer 3 UTRs in tumor tissues [38]. Additionally, ZBTB4 is significantly downregulated in breast cancer tissues with longer 3 UTRs [39]. Therefore, the role of identified apaQTL-SNP rs10138506 in regulating the 3 UTR length was explored by mediating the CHURC1 expression. Furthermore, SNP2APA data results showed that the PDUI value of CHURC1 increased under the variant G allele of rs10138506. Meanwhile, 3 RACE experiments verified the presence of two poly(A) sites (proximal and distal) in CHURC1, and the SNP rs1127968 located 0.2 kb downstream of APA site1 was in a perfect high LD with rs10138506 (r 2 = 1.0). Additionally, qRT-PCR assay indicated that CHURC1 tends to use the distal poly(A) site under the variant C allele of rs1127968, which may express transcripts with longer 3 UTRs, and the potential mechanism underlying the alteration of 3 UTR length that regulates CHURC1 expression levels remains to be further explored.
The 3 UTR contains various regulatory elements, including miRNA binding sites, which play a significant role in the expression level and translation ability of their target genes. Changes in the 3 UTR length of mRNAs mediated by APA events may affect the existence of miRNA binding sites, which may in turn affect the target genes [40,41]. Several potentially conserved binding sites in the 3 UTR of CHURC1 were further identified using Targetscan (http://www.targetscan.org/vert_71/ accessed on 18 April 2022). The results showed that both miR-4262 and miR-181b-5p were identified as potential CHURC1 regulators. Previous studies reported that miR-181b-5p may play a regulatory role in LUAD, which was down-regulated in LUAD [42]. Furthermore, miR-4262 was significantly down-regulated in gastric cancer tissues, and its lower-level expression was associated with a poorer prognosis and lower survival rate in patients with gastric cancer [43]. Taken together, we hypothesized that the variant G allele of rs10138506 could influence the APA events through its high LD SNP rs1127968, thereby making the 3 UTR expression longer in CHURC1 mRNA. Changes in the 3 UTR length of mRNA yield the enrichment of miRNA binding in the 3 UTR of CHURC1 and may, therefore, inhibit the CHURC1 expression in LUAD and elevate the risk of developing LUAD ( Figure 6).
This study has several merits. Firstly, the well-reported NSCLC-related APA genes were integrated with the SNP2APA apaQTL-SNP database, which may contribute to the precise screening of candidate NSCLC-related apaQTL-SNPs. Secondly, although LUAD and LUSC are the two main histological NSCLC subtypes, they not only have different histological origins, but also present different molecular oncogenic mechanisms. Thus, apaQTL-SNPs associated with both LUAD and LUSC were separately screened, and the association between screened apaQTL-SNPs and the risk of two different NSCLC pathological types were evaluated. In addition, the conclusion that apaQTL-SNP rs10138506 may modify NSCLC susceptibility across histological subtypes has enhanced the rigor of our study strategy. Thirdly, a two-phase case-control study with a large sample size was conducted to evaluate the association between apaQTL-SNPs and NSCLC risk, which may increase the stability and reliability of the results. The results of this study may be helpful in providing more genetic basis data for the screening of high-risk populations of lung cancer, and as an important reference in establishing a feasible lung-cancer-screening program for precision prevention and individualized intervention. However, certain limitations unavoidably exist in this study. The samples in the first phase of GWAS screening were all females. Although samples in the second phase of the validation included both males and females, we cannot exclude the possibility that gender bias in the first phase may have affected our results, and further studies with strict inclusion criteria should be paid attention.

Conclusions
This study concluded that the variant G allele of the apaQTL-SNP rs10138506 in CHURC1 could correlate with longer 3 UTR lengths of CHURC1 mRNA and increase LUAD risk. Therefore, further investigation is warranted to explore the potential biological effects of rs10138506 and CHURC1 in LUAD development.
Institutional Review Board Statement: The study was reviewed and approved by the ethics committee of Nantong University (Approval No. 2022-2, February 2022).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declared no potential conflict of interest.