A Significant Increasing Risk Association between Cigarette Smoking and XPA and XPC Genes Polymorphisms

Cigarette smoking (CS) is a major cause of various serious diseases owing to the chemicals in tobacco. Evidence suggests that CS is linked with the DNA damage repair system, as it can affect genomic stability by inducing genetic changes in the genes involved in the repair system, specifically the nucleotide excision repair (NER) pathway, thereby affecting the function and/or regulation of these genes. Single nucleotide polymorphisms (SNPs), along with CS, can affect the function of the NER pathway and could therefore lead to different diseases. This study explored the association of four SNPs in each of the XPA and XPC genes with CS in the Saudi population. The TaqMan genotyping assay was used for 220 healthy non-smokers (control) and 201 healthy smokers to evaluate four SNPs in the XPA gene, namely rs10817938, rs1800975, rs3176751, and rs3176752, and four SNPs in the XPC gene, namely rs1870134, rs2228000, rs2228001, and rs2607775. In the XPA gene, SNP rs3176751 showed a high-risk association with CS-induced diseases for all clinical parameters, including CS duration, CS intensity, gender, and age of smokers. On the other hand, SNP rs1800975 showed a statistically significant low-risk association with all clinical parameters. In addition, rs10817938 showed a high-risk association only with long-term smokers and a low-risk association only with younger smokers. A low-risk association was found for SNP rs3176752 in older smokers. In the XPC gene, SNP rs2228001 showed a low-risk association only with female smokers. SNP rs2607775 revealed a statistically significant low-risk association with CS-induced diseases for all parameters except male smokers. However, SNPs rs2228000 and rs1870134 showed no association with CS. Overall, the study results demonstrated possible significant associations (both risk-increasing and protective) between CS and SNP polymorphisms in DNA repair genes, such as XPA and XPC, except for the rs2228000 and rs1870134 polymorphisms.


Introduction
In the 1930s, epidemiologists used case-control surveys to investigate the relationship between lung cancer and smoking. The first study was published in 1939 by Müller and showed that the incidence of lung cancer was higher in smokers than in nonsmokers [1,2]. Over the years, the number of smokers in the world has increased, and many serious diseases have been attributed to smoking, such as cardiovascular diseases [3], obstruction of the respiratory tract, age-related macular degeneration, and the risk of developing diabetes, which is 30-40% higher for smokers than nonsmokers. Additionally, many studies have shown that different types of cancer are related to CS, such as lung, oral cavity, larynx, tongue, bladder, esophagus, colon, and rectum cancers [4]. According to the World Health Organization, smoking was found to be linked

Ethics Statement and Samples Collection
Ethical approval was reviewed and granted by the Research Ethics Committee of the College of Applied Medical Sciences at King Saud University (KSU) in Riyadh, Saudi Arabia (Approval Number: CAMS 13/3536). As described in our previous work [6], smokers were divided into two groups based on daily quantity of cigarette consumption (≥10 and <10 cigarettes/day). All volunteer participants in the current study signed written informed consent. Clinical data on smoking history, allergic symptoms and diseases, number of cigarettes smoked daily, and body mass index (BMI) were obtained through a self-completed questionnaire. The control group consisted of non-smokers, and former smokers were excluded from the control group. Additionally, as described in our previous study [22], all samples were collected from self-reported healthy smokers and non-smokers (controls) who had signed informed consent forms confirming their participation in the present study. We excluded any potential participants who self-reported conditions such as metabolic disorders, inflammatory diseases, autoimmune diseases, cancer, or blood diseases. The blood samples (3 mL) were collected via venipuncture in EDTA-containing tubes. A total of 421 participants were divided into 220 healthy nonsmokers (control) and 201 healthy smokers. Baseline characteristics of cases and controls were recorded (age; number of cigarettes per day, with ≤10 sticks per day classed as moderate smokers and >10 sticks per day as heavy smokers; and period of smoking). In addition, it was confirmed that participants had not taken any drugs and had no disease. Samples from participants who did not meet the criteria were excluded from the study.

DNA Extraction
Genomic DNA was extracted from 200 µL of EDTA-anticoagulated peripheral blood using the Qiagen QIAamp® DNA Mini Kit, as per the manufacturer's instructions. The DNA concentration was determined using a NanoDrop 8000 (Thermo Fisher Scientific, Waltham, MA, USA). The purity of each DNA sample was then determined by calculating the A260/A280 and A260/A230 absorbance ratios. The samples were considered contamination-free when the ratios were 1.7-2.0. The DNA samples were preserved at −20 °C.
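The purity criterion above can be expressed as a simple check. The following is a minimal sketch, assuming the absorbance readings are available as plain numbers; the function name and the example readings are hypothetical and are not taken from the study.

```python
def is_contamination_free(a260, a280, a230, low=1.7, high=2.0):
    """Return True when both purity ratios fall within [low, high].

    The 1.7-2.0 window is the acceptance range stated in the text;
    the function name and arguments are illustrative assumptions.
    """
    ratio_260_280 = a260 / a280  # commonly read as a protein-contamination indicator
    ratio_260_230 = a260 / a230  # commonly read as a salt/solvent carry-over indicator
    return low <= ratio_260_280 <= high and low <= ratio_260_230 <= high


# Hypothetical readings: ratios of about 1.82 and 1.92, so the sample passes.
print(is_contamination_free(a260=1.0, a280=0.55, a230=0.52))
```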

SNP Selection and TaqMan Genotyping Assay
A TaqMan assay was carried out for all 421 samples to examine the polymorphism variation of the XPA and XPC genes. The DNA samples extracted from blood were diluted to a final concentration of 10 ng/µL. Four SNPs in the XPA gene (rs10817938, rs1800975, rs3176751, and rs3176752) and four SNPs in the XPC gene (rs2607775, rs2228000, rs2228001, and rs1870134) were evaluated by the genotyping assay. These SNPs were chosen on the basis of a previous review and were selected from the NCBI database (http://www.ncbi.nlm.nih.gov/snp, accessed on 1 February 2023). Each SNP was located in a promoter, intron, or exon region (Tables S1 and S2), which may lead to a change in the regulation of the protein or a modification in protein folding or function. As described in our previous works [23-26], a total of 8 µL of the final reaction mix and 2 µL of DNA (10 ng/µL) were distributed in an optical reaction plate. The SNP reaction mix contained 5.3 µL of TaqMan® genotyping master mix (Applied Biosystems, Foster City, CA, USA), 2.5 µL of nuclease-free water, and 0.2 µL of SNP. A negative control (no DNA) was included in each plate. PCR amplification was carried out under the following conditions: a primary denaturation step at 95 °C for 7 min, followed by 40 cycles of 95 °C for 30 s, 60 °C for 1 min, and 72 °C for 30 s. The reaction was completed with a final extension at 72 °C for 5 min. The reaction was performed on a QuantStudio™ 7 Flex Real-Time PCR System (Applied Biosystems) with an endpoint reading of the genotypes.
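The per-well volumes stated above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys, function names, and the optional `overage` parameter are assumptions introduced here, while the volumes and the DNA concentration come from the text.

```python
import math

# Per-well volumes in µL, as stated in the text.
MIX_UL = {"genotyping_master_mix": 5.3, "nuclease_free_water": 2.5, "snp": 0.2}
DNA_UL = 2.0          # µL of template per well
DNA_NG_PER_UL = 10.0  # ng/µL after dilution


def per_well_summary():
    """Total the reaction mix, the full reaction volume, and the DNA input."""
    mix_total = sum(MIX_UL.values())
    return {
        "mix_ul": mix_total,                 # expected: 8 µL
        "reaction_ul": mix_total + DNA_UL,   # expected: 10 µL
        "dna_ng": DNA_UL * DNA_NG_PER_UL,    # expected: 20 ng
    }


def batch_volumes(n_wells, overage=0.0):
    """Scale the mix to n_wells; `overage` is a hypothetical pipetting margin."""
    factor = n_wells * (1.0 + overage)
    return {name: vol * factor for name, vol in MIX_UL.items()}


summary = per_well_summary()
print({k: round(v, 2) for k, v in summary.items()})
```

For a full 96-well plate, `batch_volumes(96)` scales each component proportionally; any overage margin would be a laboratory choice that the text does not specify.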

Statistical Analysis
Statistical analysis was carried out using the Statistical Package for Social Sciences (SPSS) version 26.0 (IBM-SPSS, Armonk, NY, USA). The computed genotypic and allelic frequencies of each SNP were checked for deviation from Hardy-Weinberg equilibrium. Genetic comparisons were performed with the aid of the χ² test and allelic odds ratios (ORs). The chi-square test was used to compare proportions between cases and controls and across different groupings (based on duration of smoking, frequency of smoking, gender, and age for the SNPs and alleles). Odds ratios and 95% confidence intervals were calculated using the online OR calculator (MedCalc, https://www.medcalc.org/calc/odds_ratio.php, accessed on 1 February 2023). In addition, Fisher's exact test (two-tailed) was applied. Results were expressed as the mean and the standard deviation for age. The proportions of SNPs and alleles were reported as numbers and percentages. An independent-sample t-test was performed to compare the mean age between smokers and non-smokers. p values of less than 0.05 were considered statistically significant. An OR greater than one indicates a high-risk association, and an OR less than one indicates a low-risk association.
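To make the odds-ratio step concrete, the following Python sketch computes an OR and a 95% confidence interval from a 2×2 table using the standard log (Woolf) method. It is an illustration under stated assumptions, not the study's own pipeline: the function names and the counts in the usage line are hypothetical, and the online calculator used here may differ in details such as the handling of zero cells.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and CI for a 2x2 table (log/Woolf method).

    a: cases with the variant      b: cases with the reference
    c: controls with the variant   d: controls with the reference
    All four counts must be non-zero for this simple version.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def risk_direction(odds_ratio):
    """Label an OR using the convention stated in the text (significance aside)."""
    if odds_ratio > 1:
        return "high-risk"
    if odds_ratio < 1:
        return "low-risk"
    return "neutral"


# Hypothetical counts, not taken from the study's tables.
or_value, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(round(or_value, 3), round(lo, 3), round(hi, 3), risk_direction(or_value))
```

In this made-up example the interval spans one, so the labelled direction alone would not be reported as significant; the label only becomes meaningful alongside the p value.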

Baseline and Clinical Characteristics of Participants
A total of 220 healthy Saudi non-smokers and 201 healthy smokers were included in the study. Participants were further classified into different groups based on age, gender, period of smoking, and the number of cigarettes smoked per day. Table 1 presents the baseline and clinical characteristics of all participants. These parameters were used to study the association between SNPs in the tested genes and the risk of CS-induced disease.

Global Genotyping Analysis of XPA and XPC among Smokers and Non-Smokers
To evaluate the association of SNPs in the XPA and XPC genes with the effects of CS on induced diseases, we evaluated the rs10817938, rs1800975, rs3176751, rs3176752, rs1870134, rs2228000, rs2228001, and rs2607775 variants and CS in 421 participants. The distributions of genotype and allele frequencies in the smoker and non-smoker groups are summarized in Table 2. For SNP rs10817938 (T/C) of the XPA gene, no association was found with the risk of smoking-induced diseases. For SNP rs1800975 of the XPA gene, the genotype distribution was as follows: 96.4% TT, 3.1% TC, and 0.5% CC in smokers, while it was 55.1% TT, 6.5% TC, and 38.4% CC in non-smokers. The TC, CC, and TC+CC genotypes of rs1800975 decreased the risk of developing diseases related to smoking by approximately 72.6%, 99. SNP rs3176751 exhibited significant differences between smokers and non-smokers, with a higher risk of smoking-induced diseases. The genotype distribution was as follows: 58% GG and 66.7% CG+GG in smokers, while it was 19.6% GG and 31.1% CG+GG in non-smokers when compared to the CC reference genotype (GG: OR = 6.105, CI = [3.869-9.631], p < 0.001; CG+GG: OR = 2.147, CI = [1.511-3.050], p < 0.001). The C allele was used as a reference. The G allele was found to be more frequent in smokers (62.3%) than in the non-smoker group (25.3%). The G allele showed a significant high-risk association with smoking-induced diseases among smokers, as shown in Table 2 (OR = 4.869, CI = [3.618-6.555], p < 0.001). Additionally, SNP rs3176752 of the XPA gene did not show any association with CS. The genotype distribution of the GG, GT, and TT variants was estimated to be 97%, 3%, and 0%, respectively, in smokers, and 96%, 4%, and 0%, respectively, in the controls.
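As a rough consistency check, the allelic OR for rs3176751 can be approximated from the rounded G-allele frequencies quoted above (62.3% in smokers, 25.3% in non-smokers). The sketch below is illustrative; the function name is hypothetical, and because it works from rounded percentages rather than the underlying counts, it is only expected to land close to the reported value of 4.869.

```python
def allele_odds_ratio(freq_cases, freq_controls):
    """Odds ratio of carrying an allele, from its frequency in each group."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# G allele of rs3176751: 62.3% in smokers vs. 25.3% in non-smokers (from the text).
g_allele_or = allele_odds_ratio(0.623, 0.253)
print(round(g_allele_or, 2))  # about 4.88, close to the reported 4.869
```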
We evaluated the association between SNP variations in the XPC gene (rs1870134, rs2228000, rs2228001, and rs2607775) and CS in 421 participants. The data for three SNPs, rs1870134, rs2228000, and rs2228001, did not show any association with an increase or decrease in the risk of smoking-induced diseases in the Saudi population. However, SNP rs2607775 exhibited significant differences between smokers and non-smokers, presenting a high risk of smoking-induced diseases for the CC genotype and the C allele. The genotype distribution was as follows: 20.1% CC in smokers and 8.8% CC in non-smokers (CC: OR = 2.810, CI = [1.509-5.233], p = 0.001); the G allele was used as a reference. The C allele was found to be more frequent in smokers (39.2%) than in the non-smoker group (28.7%). The C allele showed a significant high-risk association with smoking-induced diseases among smokers, as shown in Table 3.
The smokers and non-smokers were divided into different groups based on CS duration, daily CS average, gender, and age. The associations of the four SNPs in XPA and the four SNPs in XPC with clinical characteristics were evaluated. Tables 3 and 4 compare genotype and allele frequencies for each SNP in the XPA and XPC genes based on different clinical characteristics.

Estimation for SNPs XPA (rs10817938, rs1800975, rs3176751, and rs3176752)
To evaluate the relationships between different XPA rs10817938 genotypes and CS in controls and smokers, we stratified the study population by smoking duration (short-term smokers ≤ five years and long-term smokers > five years), frequency of smoking (≤10 times and >10 times), gender (males and females), and the average age of smokers (≤28 years and >28 years). The genotype and allele distributions of the rs10817938 variant in smokers and controls, by clinical characteristic, are described in Table 3A-D.
XPA rs10817938 showed an association only with the duration of smoking (≤ five years and > five years) for the TC+CC genotype. The genotype frequencies were 37.8% and 35.7% in short-term and long-term smokers, respectively. In both groups, only the TC+CC genotype showed a low-risk association (short-term: OR = 0.402, CI = [0.254-0.637], p < 0.001; long-term: OR = 0.380, CI = [0.239-0.603], p < 0.001).

SNP rs1800975 showed a significant association with all clinical parameters. With regard to smoking duration (more or less than five years), the genotype and allele distributions are presented in Table 3A. For a smoking period of more than five years, the results showed a highly significant low-risk association for the TC genotype compared to the TT genotype. In Table 3C, males showed a significant low-risk association with smoking. The homozygous CC variant in males exhibited a significant low-risk association with smokers (OR = 0.008; CI = [0.001-0.059]; p < 0.001), along with the C allele (OR = 0.019; CI = [0.008-0.018]; p < 0.001). Additionally, the C allele presented a 0.168-fold protective effect in female smokers (OR = 0.168; CI = [0.050-0.565]; p < 0.004). The SNP showed a low-risk association with age for both those under and above 28 years, as shown in Table 3D.
SNP rs3176751 showed a significant high-risk association with all clinical parameters tested in this study. The genotype and allele distributions are shown in Table 3A-D. For example, analysis of the GG genotype and the G allele showed that both short-term smokers (≤ five years) and long-term smokers (> five years) had a significant high-risk association when compared to non-smokers (short-term smokers: GG: OR = 7.866, CI = [4.4012-14.056], p < 0.001; G: OR = 6.150, CI = [4.197-9.013], p < 0.001; long-term smokers: GG: OR = 5.691, CI = [3.207-10.099], p < 0.001; G: OR = 4.529, CI = [3.111-6.593], p < 0.001). The moderate and heavy smokers both displayed a significant high-risk relationship when compared to non-smokers (Table 3B). The GG genotype presented a significant high-risk association in moderate smokers (OR = 7.692; CI = [4.148-14.262]; p < 0.001). The G allele presented a 6.015-fold high-risk effect in moderate smokers (OR = 6.015; CI = [4.004-9.035]; p < 0.001).
Heavy smokers showed a significant high-risk association with the GG genotype and the G allele (GG: OR = 6.230, CI = [3.575-10.858], p < 0.001; G: OR = 4.909, CI = [3.415-7.058], p < 0.001). The CG+GG combination showed a significant high-risk association in both moderate and heavy smokers (OR = 2.294, CI = [1.466-3.590] and OR = 2.180, CI = [1.438-3.305], respectively; p < 0.001). In addition, there was a significant high-risk association for both genders, male and female, as well as for the age of subjects, for the GG genotype and the G allele, as shown in Table 3C.
SNP rs3176752 showed no significant association with clinical parameters, including CS duration, daily CS average, gender, and younger smokers, as shown in Table 3A-D.

Estimation for SNPs XPC (rs1870134, rs2228000, rs2228001, and rs2607775)
The genotype and allele distributions of XPC SNPs rs1870134 and rs2228000 were estimated in order to investigate the link between clinical parameters and the risk of smoking-induced diseases. These SNPs showed no significant association with any of the clinical parameters. The comparison of allele and genotype frequencies between subjects across the four clinical characteristics did not show any correlation, as the p values were not statistically significant.
The analysis of SNP rs2228001 did not show a significant association with the risk of smoking-induced disease, considering genotype and allele frequencies and statistical values, except in females. In female smokers, the CC genotype and the C allele had a high-risk association with smoking-induced diseases (CC: OR = 18.765, CI = [1.012-347.841], p = 0.044; C: OR = 2.810, CI = [1.246-6.377], p = 0.013). The CC genotype of SNP rs2607775 presented a high-risk association with all clinical parameters. Based on the comparison of C allele and CC genotype frequencies, short-term and long-term smokers both showed a significant high-risk association with smoking duration when compared to non-smokers (Table 4A). Similarly, the CC genotype and the C allele displayed a significant high-risk relationship in moderate smokers when compared to non-smokers (see Table 4B). Additionally, the SNP showed a significant high-risk association with both male and female smokers (Table 4C). Lastly, there was a significant high-risk association for age above 28 between smokers and non-smokers (Table 4D).

Observed and Expected Counts
The null hypothesis is that the observed genotype and allele counts do not differ from those expected under equilibrium. Therefore, if the p value is <0.05, the null hypothesis is rejected, and the genotypes/alleles are considered to be in disequilibrium (Table 5).
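The comparison of observed and expected counts can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test against Hardy-Weinberg proportions. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the study; the function name and the genotype counts in the usage lines are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP.

    Returns (expected_counts, chi2, p_value). Assumes both alleles are
    present; expected counts of zero are skipped to avoid division by zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return expected, chi2, p_value


# Hypothetical counts that match equilibrium exactly (chi2 = 0, p = 1).
print(hwe_chi_square(25, 50, 25))
# Hypothetical counts with no heterozygotes: a strong deviation (p < 0.05).
print(hwe_chi_square(50, 0, 50))
```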
condition caused by smoking in the Saudi population; this study could indicate the early effects of smoking-induced disease arising from genetic variations in several genes affected by CS exposure. This shows the importance of evaluating the role of CS in causing disease or cancer by examining associations with genetic variation in a number of genes in different pathways.
The present study aimed to investigate XPA and XPC gene polymorphisms in the NER pathway, an important part of the system that repairs DNA damage caused by CS, in smokers versus non-smokers of the Saudi population, in order to detect a genetic marker that could help predict disease and thus reduce the risks caused by CS among healthy individuals. In this work, four SNPs (rs10817938, rs1800975, rs3176751, and rs3176752) distributed in different regions of the XPA gene and four SNPs (rs1870134, rs2228000, rs2228001, and rs2607775) distributed in different regions of the XPC gene were selected. Many studies have suggested that XPA and XPC polymorphisms have a significant effect on the risk of cancer and disease and that they could serve as biomarkers [35,36].
Furthermore, associations between XPA and XPC SNPs and clinical characteristics, including CS duration, daily CS average, gender, and age, were evaluated. The XPA and XPC SNPs appeared to be significantly affected by CS, resulting in genetic changes in DNA repair system genes. XPA polymorphisms have also been related to the risk of many types of cancer [21,37]. In this study, SNP rs10817938 showed a significant low-risk association only with the duration of smoking. SNPs rs1800975 and rs3176751 showed significant low-risk and high-risk associations, respectively, with regard to all clinical parameters. However, rs3176752 showed no significant association for any parameter. The XPA rs3176751 polymorphism might increase the influence of these clinical parameters on disease caused by CS. CS contains chemical carcinogens that are known to produce genetic mutations, which may not be repaired by the NER pathway because such mutations may not be recognized by XPA carrying the rs3176751 variant. The results for SNP rs10817938 in the Saudi population are inconsistent with a study that confirmed the association of XPA polymorphisms with oral squamous cell carcinoma (OSCC) risk in the Han Chinese population. That study demonstrated a significant high-risk association between CS and the CC homozygous genotype of rs10817938 in OSCC (p < 0.01, OR = 3.60) [38]. For rs1800975, the results of this study are similar to those of a prior study demonstrating that this variant was associated with a significantly reduced risk of lung cancer [39].
Although the percentage of female smokers in this study was very low (13.1%) compared to male smokers (59.9%), it was intriguing to find a significant high-risk difference in the genotypic and allelic distribution of XPA rs3176751 in female smokers, suggesting a possible interference of CS in disease development among women, as reported previously for innate immune genes in acute respiratory distress syndrome [40] and human papillomavirus [41]. The results of the study indicate that gender may have a significant role in the association between the XPA rs3176751 polymorphism and the risk of cancer or other diseases.
A comparison of smokers, stratified by clinical characteristics, with controls revealed no associations between SNPs rs1870134 and rs2228000 in the XPC gene and smoking. These results do not match previous studies, which showed that XPC rs1870134 was correlated with a decreased risk of hepatocellular and prostate cancers [42,43]. Additionally, the SNP rs2228000 CT/TT genotype showed a protective effect against gastric cancer that was significant only among subjects older than 58 years in a Southern Chinese population [44]. However, there were significant high-risk associations between the rs2228001 polymorphism and female smokers. SNP rs2607775 showed a significant high-risk association with all clinical parameters.
Finally, this work offers various strengths and benefits. One of its strengths is that this study determined polymorphisms in the XPA and XPC genes in three categories of SNP site, including 3′ UTR, 5′ UTR, and exon variants. Second, all samples were obtained from the same region of Riyadh and not from different regions of Saudi Arabia, and they were carefully monitored and stored according to protocol. However, due to the social traditions of our community, this study was limited with regard to the adequacy of samples from female smokers.

Conclusions
Overall, the present study results demonstrated possible significant associations between CS and SNP polymorphisms in DNA repair genes, such as XPA and XPC, and the effects of these polymorphisms can be a key factor in the development of CS-induced disease. The exact mechanism by which smoking influences genetic changes that cause cancer or disease remains unclear. Therefore, future studies are required to investigate the expression of the XPA and XPC genes and the link between polymorphisms and the rate of CS. Additionally, we suggest examining the oxidatively generated guanine lesion 8-oxoguanine to evaluate the oxidative DNA damage caused by CS. A further investigation comparing our results with other previously studied populations involving different ethnicities and CS habits may help define the effects of CS on different genes involved in DNA repair systems. The SNP polymorphisms identified here as associated with CS-induced disease could be used as biomarkers.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14071349/s1, Table S1: Data for XPA genes SNPs; Table S2: Data for XPC genes SNPs.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement: All data generated or analyzed during this study are included in this published article.

Acknowledgments:
The authors extend their appreciation to the Researchers Supporting Project number (RSP2023R191), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest:
The authors declare no conflict of interest.