Associations between Biomarkers of Exposure and Lung Cancer Risk among Exclusive Cigarette Smokers in the Golestan Cohort Study

Biomarkers of tobacco exposure are known to be associated with disease risk, but previous studies are limited in number and restricted to certain regions. We conducted a nested case–control study examining baseline biomarker levels and subsequent lung cancer incidence among current male exclusive cigarette smokers in the Golestan Cohort Study in Iran. For 28 cases and 52 matched controls, we calculated geometric mean biomarker concentrations, the correlation of biomarker levels among controls, and adjusted odds ratios (ORs) for lung cancer incidence by biomarker concentration, accounting for demographic characteristics, smoking quantity and duration, and opium use. Lung cancer cases had higher average levels of most biomarkers, including total nicotine equivalents (TNE-2), 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol (NNAL), and 3-hydroxyfluorene (3-FLU). Many biomarkers correlated highly with one another, including TNE-2 with NNAL and N-Acetyl-S-(2-cyanoethyl)-L-cysteine (2CYEMA), and N-Acetyl-S-(4-hydroxy-2-buten-1-yl)-L-cysteine (t4HBEMA) with N-Acetyl-S-(3-hydroxypropyl-1-methyl)-L-cysteine (3HMPMA) and N-Acetyl-S-(4-hydroxy-2-methyl-2-buten-1-yl)-L-cysteine (4HMBEMA). Lung cancer risk increased with concentration for several biomarkers, including TNE-2 (OR = 2.22, 95% CI = 1.03, 4.78) and NNN (OR = 2.44, 95% CI = 1.13, 5.27), and estimates were significant after further adjustment for demographic and smoking characteristics for 2CYEMA (OR = 2.17, 95% CI = 1.03, 4.55), N-Acetyl-S-(2-carbamoylethyl)-L-cysteine (2CAEMA) (OR = 2.14, 95% CI = 1.01, 4.55), and N-Acetyl-S-(2-hydroxypropyl)-L-cysteine (2HPMA) (OR = 2.85, 95% CI = 1.04, 7.81). Estimates were not significant with adjustment for opium use. Concentrations of many biomarkers were higher at baseline for participants who subsequently developed lung cancer than among the matched controls. Odds of lung cancer increased with the concentration of several biomarkers; for some, the association persisted after adjustment for smoking exposure, but none persisted after adjustment for opium use.

Several prospective cohort studies have been used to examine the relationship between tobacco smoke constituents and cancer risk using biomarker-of-exposure data. For example, Church et al. [8] conducted a nested case-control study of cigarette smokers using data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) in the U.S. They found that baseline serum NNAL was significantly associated with subsequent incidence of lung cancer even after adjustment for demographic characteristics, smoking duration, and levels of cotinine, the principal metabolite of nicotine, and r-1,t-2,3,c-4-tetrahydroxy-1,2,3,4-tetrahydrophenanthrene (PheT), a PAH metabolite. Researchers have also analyzed baseline biomarker data for cancer cases and controls who smoked in the Shanghai Cohort Study [9]. They found that cotinine, NNAL, and PheT were significantly associated with lung cancer risk even after adjusting for the number of cigarettes smoked per day (CPD), number of years of smoking, and levels of the other two biomarkers [10]. They also found a similar relationship between NNN and esophageal cancer [11]. They did not find an association between lung cancer risk and the metabolites of five VOCs, including acrolein and benzene, after adjustment for cotinine levels [12]. Another analysis of the Shanghai Cohort Study and Singapore Chinese Health Study also found associations of cotinine and NNAL levels with lung cancer incidence, adjusting for smoking history and levels of the other biomarker [13]. A related prospective study among people who have never smoked in their lifetime in the Shanghai cohort focused on three PAHs including PheT and metabolites of four VOCs [14]. This study also found associations with lung cancer risk for the PAHs but not the VOCs. The use of biomarker data in these studies, in addition to self-reported behavioral information, provided quantitative measures of exposure to cigarette smoke and its constituents.
This study examines urinary biomarker levels and lung cancer risk utilizing the resources provided by the Golestan Cohort Study (GCS), a population-based prospective cohort study conducted in the Golestan Province of northeastern Iran [15]. Prior analyses of this cohort have examined the use of tobacco products such as cigarettes, waterpipe, and nass (a type of smokeless tobacco used in the region that also includes ash and lime) among study participants and analyzed data on biomarkers of exposure from users of these products [16,17]. In the current analysis, we examine associations between baseline concentrations of nicotine, TSNAs, and PAH and VOC metabolites and the subsequent lung cancer incidence among exclusive cigarette smokers. In doing so, we provide additional information about these associations using a cohort study from a region of the world in which these relationships have not been previously studied.

Materials and Methods
Slightly more than 50,000 residents of the Golestan Province aged 40 to 75 years were recruited for the GCS between 2004 and 2008 [15]. Approximately 20% of the participants lived in the city of Gonbad Kavus and the remainder consisted of eligible residents from 326 rural villages. The primary purpose of the study was to investigate the relationship between risk factors and chronic disease in this population, in particular esophageal cancer [15]. Study participants were interviewed by a trained interviewer using a structured questionnaire. Participants provided information on topics such as demographic characteristics, socioeconomic status, and medical history, as well as use of tobacco products, alcohol, and opium in the study's general questionnaire [18]. Participants were asked about their use of cigarettes, nass, and waterpipes and pipes, including frequency, the amount of use, and ages at which they began and ended their use. This study used information reported at baseline. Participants were also asked to provide baseline biospecimens including a spot urine sample at the time of recruitment. The GCS and its study protocols were approved by the institutional review boards of Tehran University of Medical Sciences, the International Agency for Research on Cancer (IARC), and the National Cancer Institute [15]. The involvement of the Centers for Disease Control and Prevention (CDC) laboratories did not constitute engagement in human subject research.
GCS participants were monitored after the study enrollment through annual telephone interviews and home visits through the end of 2017 [18]. In the event of a participant's cancer diagnosis or death, a study staff member was sent to the home of the individual to collect detailed information and the participant's medical records were obtained from their medical center. The collected information was independently evaluated by at least two physicians to produce a cancer determination based on the 10th Revision of the International Statistical Classification of Diseases and Related Health Problems (lung cancer: C34). All cancer diagnoses were further confirmed through linkage to the Golestan Cancer Registry which is included in the IARC's Cancer Incidence in Five Continents series [19].
This analysis used the resources provided by the GCS to perform a nested case-control study examining the relationship between biomarkers of exposure and lung cancer risk. Up until 1 January 2018, a total of 116 probable cases of lung cancer accrued in the cohort. We used risk-set sampling to randomly select two controls for each case, matched by tobacco use, age, sex, place of residence (urban, living in Gonbad Kavus/rural, living in a village), enrollment period, and duration of the follow-up in the cohort. For the duration of the follow-up, the cases were matched with controls from the same enrollment period who did not have lung cancer at the time of the case's lung cancer diagnosis. The analysis was limited to current exclusive cigarette smokers who reported daily or non-daily smoking, had smoked at least once a week for at least six months, and had never used waterpipe tobacco or nass at the time of enrollment. Participants could not have been diagnosed with lung cancer and were free of other cancers at baseline. A previous study of mortality risks among tobacco users in the GCS found that the exclusion of cases occurring during the first two years of the follow-up did not affect estimates [20]. A total of 31 confirmed cases and 58 controls were identified from this group. Three study participants with extremes in hydration (i.e., urinary creatinine levels outside the range of 10-370 mg/dL) were excluded from the analysis [21], as were six participants who were no longer matched to an eligible case or control, resulting in the inclusion of 28 cases and 52 controls in the analysis. Figure S1 in the Supplementary Material graphically presents the selection of participants. The mean follow-up time between the biospecimen collection and cancer diagnosis for the cases was 5.1 years, with a range of 0 to 11 years.
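The risk-set sampling step described above can be illustrated with a minimal sketch. This is not the study's actual selection code: the `Participant` fields, the single `stratum` matching key, and the toy cohort are hypothetical, and the sketch assumes that a participant is "at risk" for a given case if their follow-up extends beyond the case's time of diagnosis.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: int
    stratum: tuple          # hypothetical matching key (age group, residence, enrollment period)
    followup_years: float   # time from enrollment to diagnosis or end of follow-up
    is_case: bool


def risk_set_sample(cohort, n_controls=2, seed=0):
    """For each case, draw up to n_controls at random from its risk set."""
    rng = random.Random(seed)
    matches = {}
    for case in cohort:
        if not case.is_case:
            continue
        # Risk set: same matching stratum and still under follow-up (and
        # undiagnosed) at the case's diagnosis time. A later case is eligible
        # to serve as a control for an earlier case.
        risk_set = [
            p for p in cohort
            if p.pid != case.pid
            and p.stratum == case.stratum
            and p.followup_years > case.followup_years
        ]
        k = min(n_controls, len(risk_set))
        matches[case.pid] = rng.sample(risk_set, k)
    return matches


# Toy cohort with made-up values, for illustration only.
rural = ("50-59", "rural", "2005")
urban = ("50-59", "urban", "2005")
cohort = [
    Participant(1, rural, 4.0, True),
    Participant(2, rural, 9.5, False),
    Participant(3, rural, 7.0, True),    # later case, still at risk at year 4
    Participant(4, rural, 2.0, False),   # left follow-up before year 4
    Participant(5, urban, 10.0, False),  # different stratum
    Participant(6, rural, 11.0, False),
]
matches = risk_set_sample(cohort)
```

Sampling from the set of participants still at risk at each case's diagnosis time is what allows the matched sets to be analyzed with conditional logistic regression.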
The measurement of biomarkers in the baseline urine samples was conducted at the Division of Laboratory Sciences of the National Center for Environmental Health at the CDC, and detailed information about the assay methodology has been presented previously [16]. Panels consisted of tobacco alkaloids (7 nicotine metabolites and 2 minor alkaloids), TSNAs (4 compounds), PAH metabolites (7 compounds), and VOC metabolites (19 compounds). The results for all measured biomarkers are presented as supplementary material (Tables S1-S3), and a group of biomarkers was selected for presentation in the main body as representatives of different classes of harmful or potentially harmful constituents because of their health impact. For example, the TSNAs NNAL and NNN were selected as powerful carcinogens [5,6]. Metabolites of the VOCs acrylonitrile and 1,3-butadiene were selected due to their substantial cancer risks and a metabolite of acrolein was selected due to its respiratory effects [22]. Of all the biomarkers evaluated in this study, we present results for two TSNAs (NNAL and NNN), two PAH metabolites (1-hydroxypyrene (1-PYR) and 3-hydroxyfluorene (3-FLU)), and ten VOC metabolites (N-acetyl-S- and N-acetyl-S-(4-hydroxy-2-methyl-2-buten-1-yl)-L-cysteine (4HMBEMA)-isoprene). Nicotine itself is not the main cause of cancer from tobacco use [23], but nicotine and its metabolites such as cotinine are often used as measures of smoking exposure [24]. As such, total nicotine equivalents (TNE-2, the molar sum of cotinine and trans-3′-hydroxycotinine) were used as a measure of overall nicotine exposure in this analysis [25].
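As a worked illustration of the TNE-2 definition and of the creatinine-based hydration criterion mentioned earlier, the following sketch computes a creatinine-corrected molar sum. The function names, the input units (ng/mL for metabolites, mg/dL for creatinine), the rounded molar masses, and the example values are assumptions made for illustration; they are not taken from the assay documentation.

```python
# Approximate molar masses in g/mol (rounded values, assumed for illustration).
COTININE_G_PER_MOL = 176.22
HYDROXYCOTININE_G_PER_MOL = 192.22

# Creatinine range outside which samples were excluded for extreme hydration.
CREATININE_MIN_MG_DL = 10.0
CREATININE_MAX_MG_DL = 370.0


def hydration_ok(creatinine_mg_dl):
    """True if urinary creatinine lies within the 10-370 mg/dL range."""
    return CREATININE_MIN_MG_DL <= creatinine_mg_dl <= CREATININE_MAX_MG_DL


def tne2_nmol_per_mg_creatinine(cotinine_ng_ml, hydroxycotinine_ng_ml,
                                creatinine_mg_dl):
    """Molar sum of cotinine and trans-3'-hydroxycotinine per mg creatinine."""
    if not hydration_ok(creatinine_mg_dl):
        raise ValueError("creatinine outside 10-370 mg/dL")
    # ng/mL divided by g/mol gives nmol/mL.
    molar_sum_nmol_ml = (cotinine_ng_ml / COTININE_G_PER_MOL
                         + hydroxycotinine_ng_ml / HYDROXYCOTININE_G_PER_MOL)
    # mg/dL divided by 100 gives mg/mL.
    creatinine_mg_ml = creatinine_mg_dl / 100.0
    return molar_sum_nmol_ml / creatinine_mg_ml


# Hypothetical sample: 10 nmol/mL cotinine, 20 nmol/mL hydroxycotinine,
# 1 mg/mL creatinine.
example_tne2 = tne2_nmol_per_mg_creatinine(1762.2, 3844.4, 100.0)
```

Summing on a molar basis rather than a mass basis keeps the two metabolites comparable, since each molecule of either metabolite derives from one molecule of nicotine.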
Ever regular use of a substance including cigarettes, opium, and alcohol was defined as having ever used at least once a week for at least six months. Socioeconomic status was assessed using quartiles of a previously developed wealth score based on characteristics such as occupation and ownership of property, vehicles, and appliances [26]. Other control variables included education (none, 1-8 years, and 9+ years), ethnicity (Turkmen and non-Turkmen), regular opium use (never used and ever used), regular alcohol use (never used and ever used), and body mass index based on measured height and weight (<25, 25-29, 30+ kg/m²). Opium is used in various forms in the region and can be ingested, injected, or smoked [18].
The distribution of demographic and substance use characteristics, mean age at smoking initiation, mean CPD, and geometric mean biomarker concentrations were calculated for the cases and controls. Biomarker concentrations were adjusted by dividing by urinary creatinine as a measure of hydration. The means and 95% confidence intervals of the log-transformed values were calculated and exponentiated to obtain geometric means and confidence intervals. Biomarker levels below the limit of detection were replaced by the limit of detection divided by the square root of 2 [27]. For most biomarkers, fewer than 3 participants had values lower than the limit of detection, with the exception of 18 participants for NNN and 9 for PHGA. Four participants did not have information for NNN and NNAL. P-values were calculated using two-sided chi-squared tests for differences of frequencies and two-sided t-tests for differences of means. P-values less than 0.05 were considered statistically significant. Multiple comparisons of biomarker concentrations were controlled for with Benjamini and Hochberg's False Discovery Rate procedure [28]. An adjusted p-value, 0.05 × (the unadjusted p-value rank/the total number of biomarker comparisons), was calculated and the null hypothesis of no difference between log-transformed means was rejected if the unadjusted p-value was less than or equal to the adjusted p-value. The correlation between biomarker values for controls was calculated as the Pearson correlation coefficients of creatinine-corrected and log-transformed biomarker values. Associations between log-transformed biomarker concentrations and lung cancer incidence were analyzed using conditional logistic regression analysis.
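The descriptive calculations above (substitution of values below the limit of detection, geometric means with confidence intervals, and the Benjamini and Hochberg procedure) can be sketched as follows. This is a minimal illustration rather than the study's analysis code: the function names and example inputs are hypothetical, and the confidence interval uses a normal quantile as a simplifying assumption, whereas a t quantile would give slightly wider intervals in small samples.

```python
import math
from statistics import NormalDist, mean, stdev


def substitute_lod(value, lod):
    """Replace a value below the limit of detection with LOD / sqrt(2)."""
    return lod / math.sqrt(2) if value < lod else value


def geometric_mean_ci(values, level=0.95):
    """Geometric mean and CI from the mean and CI of the log-transformed values."""
    logs = [math.log(v) for v in values]
    m = mean(logs)
    se = stdev(logs) / math.sqrt(len(logs))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    return math.exp(m), math.exp(m - z * se), math.exp(m + z * se)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a reject/keep flag per hypothesis, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is <= alpha * k / m.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject that hypothesis and every hypothesis with a smaller p-value.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


# Hypothetical inputs, for illustration only.
gm, gm_low, gm_high = geometric_mean_ci([1.0, 10.0, 100.0])
flags = benjamini_hochberg([0.001, 0.03, 0.035, 0.5])
```

In the hypothetical p-value list, the second-ranked value (0.03) exceeds its own threshold (0.025) but is still rejected, because the third-ranked value (0.035) falls below its threshold (0.0375). This step-up behavior is what distinguishes the procedure from comparing each p-value to its own threshold in isolation.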
The conditional logistic regression analysis accounted for the matching and adjusted for the log-transformed creatinine values as a covariable, and then sequentially adjusted for the demographic characteristics of education and ethnicity; demographic characteristics and the regular use of opium; demographic characteristics, CPD, and years of cigarette smoking; and demographic characteristics, opium use, smoking quantity, and duration. The results represent the change in lung cancer risk associated with one log-unit change of the relevant biomarker.

Results
Table 1 presents demographic and substance use characteristics of study participants. All participants were male and mostly from rural areas. Most participants were middle-aged, with 79% being between 40 and 59 years of age at enrollment. Among the cases, 68% regularly used opium, whereas only 33% of the controls did. The cases smoked on average 16.6 CPD (SD = 7.0) and the controls 12.6 CPD (SD = 9.2; p = 0.045).
Table 2 presents geometric mean biomarker concentrations for participants. The cases had higher levels of all but two of the biomarkers. For example, mean TNE-2 was 57.3 nmol/mg creatinine (95% CI = 46.0, 71.3 nmol/mg) among the cases and 13.5 nmol/mg (95% CI = 6.9, 26.3 nmol/mg) among the controls. Mean NNAL was 0.280 ng/mg (95% CI = 0.213, 0.368 ng/mg) among the cases and 0.140 ng/mg (95% CI = 0.099, 0.199 ng/mg) among the controls, and mean 3-FLU was 1.88 ng/mg (95% CI = 1.45, 2.44 ng/mg) among the cases and 1.04 ng/mg (95% CI = 0.73, 1.48 ng/mg) among the controls.
Table 2 notes: (2) Adjusted p-value calculated as α × k/m, where α is the significance level (0.05), k is the unadjusted p-value rank, and m is the total number of biomarker comparisons tested (15). (3) The null hypothesis was rejected for 2COEMA as the lowest-ranked biomarker for which the p-value ≤ adjusted p-value and for all higher-ranked biomarkers.
Table 3 presents the correlation between biomarkers among the controls. All correlation coefficients were statistically significant with the exception of those for 2CAEMA with NNN and 4HMBEMA and all those involving PHGA. Many of the biomarkers measured were highly correlated with each other, with particularly high correlation coefficients for TNE-2 with 2CYEMA at 0.926 and NNAL at 0.907. Levels of several of the VOC metabolites including 3HPMA, t4HBEMA, 3HMPMA, and 4HMBEMA were also highly correlated, with the correlation coefficients for t4HBEMA being 0.943 for 3HMPMA and 0.919 for 4HMBEMA.
Table 4 presents results from the conditional logistic regression analysis of lung cancer incidence by biomarker concentration. The risk of lung cancer increased significantly with each log-unit change of TNE-2 (OR = 2.22, 95% CI = 1.03, 4.78), NNN (OR = 2.44, 95% CI = 1.13, 5.27), 2CAEMA (OR = 2.00, 95% CI = 1.03, 3.88), 2CYEMA (OR = 2.17, 95% CI = 1.03, 4.58), 3HPMA (OR = 2.19, 95% CI = 1.03, 4.66), MADA (OR = 3.63, 95% CI = 1.00, 13.10), and 2HPMA (OR = 2.72, 95% CI = 1.16, 6.36), and these estimates were significant with additional adjustment for demographic characteristics for these biomarkers and 1-PYR (OR = 2.47, 95% CI = 1.06, 5.77). Estimates were also significant for 2CAEMA (OR = 2.14, 95% CI = 1.01, 4.55), 2CYEMA (OR = 2.17, 95% CI = 1.03, 4.55), and 2HPMA (OR = 2.85, 95% CI = 1.04, 7.81) with additional adjustment for smoking quantity and duration. None of the estimates were significant with adjustment for demographic characteristics and occurrence of opium use, or adjustment for demographics, opium use, and smoking exposure.
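To help interpret odds ratios reported per log-unit of biomarker concentration, the sketch below recovers the regression coefficient, an approximate standard error, and a Wald p-value from a published OR and its 95% confidence interval, and rescales the OR to a doubling of concentration. The function names are hypothetical, the interval is assumed to be a symmetric Wald interval on the log-odds scale, and the rescaling assumes that the log transformation was the natural logarithm, which the text does not state.

```python
import math
from statistics import NormalDist


def wald_from_or(or_estimate, ci_low, ci_high, level=0.95):
    """Approximate coefficient, standard error, z statistic, and two-sided p-value."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    beta = math.log(or_estimate)
    # A Wald interval spans 2 * z_crit standard errors on the log-odds scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, z, p


def or_per_doubling(or_per_log_unit):
    """Rescale an OR per natural-log unit to an OR per doubling of concentration."""
    return or_per_log_unit ** math.log(2)


# Matched estimate for TNE-2 reported above: OR = 2.22, 95% CI = 1.03, 4.78.
beta, se, z, p = wald_from_or(2.22, 1.03, 4.78)
tne2_or_doubling = or_per_doubling(2.22)
```

Because the published values are rounded, the recovered standard error and p-value are approximate; they serve only as a consistency check against the reported significance.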

Discussion
In this study of male exclusive cigarette smokers in Iran, the majority of whom were from rural areas, concentrations of most tobacco-related biomarkers including TNE-2, NNAL, and 3-FLU were higher among the lung cancer cases than controls. Many of the biomarkers were highly correlated with one another, including TNE-2 with NNAL and 2CYEMA and t4HBEMA with 3HMPMA and 4HMBEMA, as nicotine, NNAL, and the VOC parent compounds are found in substantial quantities in cigarette smoke [29]. Odds of lung cancer increased with higher concentrations of several biomarkers including TNE-2, NNN, 2CYEMA, and 3HPMA, and estimates for the VOC metabolites 2CYEMA, 2CAEMA, and 2HPMA were statistically significant after further adjustment for demographic characteristics, smoking quantity, and duration. None of the estimates were significant when the occurrence of regular opium use was adjusted for in the regression model. This analysis provides additional evidence and information concerning the relationship between tobacco biomarker concentrations and lung cancer incidence, using a cohort study in the Middle East, an area that has not been previously studied for this purpose. Previous studies utilized data from the PLCO, Shanghai, and Singapore cohorts [8,9,12-14] and found that TSNAs such as NNAL and PAH metabolites such as PheT were associated with increased lung cancer risk among smokers. The results in the current study are generally consistent with these previous findings in that lung cancer cases among smokers had higher geometric mean concentrations of NNAL and the PAH metabolites 1-PYR and 3-FLU than the controls, and these results thus provide additional evidence from a cohort study in a population with different ethnic and genetic backgrounds. This study has also found associations between VOC metabolites such as 2CYEMA and 2HPMA and lung cancer incidence even after adjustment for smoking exposure.
There are some differences between these results and those of the previous studies, especially for the TSNAs and PAH metabolites. Some of these biomarkers have previously been found to be associated with lung cancer risk even after adjusting for smoking history and levels of other biomarkers. For example, previous studies have found significant associations between NNAL and lung cancer risk among smokers in the PLCO cohort [8], the Shanghai cohort [10], and the Singapore cohort [13] after controlling for smoking. The estimated mean NNAL concentration for the cases in this study was double that of the controls. In the conditional logistic regression analysis, the matched odds ratio for the NNAL concentration was 1.46, the odds ratio adjusted for demographic characteristics was 1.44, and the odds ratio adjusted for demographics and smoking exposure was 1.35, although these results were not statistically significant. These observed differences may be partially due to the limited sample size in this analysis. This study included 28 lung cancer cases and 52 controls, whereas there were 100 cases and controls in the PLCO analysis [8], 476 cases and controls in the Shanghai cohort analysis [10], and 91 cases and 93 controls in the Singapore cohort analysis [13]. In contrast to the result for NNAL, the odds ratio for NNN was significant in the matched analysis, although not in the matched analyses that further adjusted for factors including smoking exposure. A previous study of the Shanghai cohort found an association between NNN and esophageal cancer risk [11], but this result for NNN and lung cancer is somewhat unexpected. Previous users of nass were excluded from the analysis, and the NNN in smokeless tobacco has been identified as a powerful carcinogen, particularly in the oral cavity [6]. Any potential association between NNN and lung cancer risk could be further investigated.
One particularly important aspect of these results is the prevalence of regular opium use, especially among the cases, and the lack of significant results in the regression analyses adjusted for opium use. Opium use has been found to be associated with the incidence of lung cancer and cancer overall in the Golestan study [18] and has been classified as a lung carcinogen by the IARC [30]. A previous study regarding biomarkers of exposure among opiate and tobacco users in the GCS found that dual users of opiates and tobacco had higher concentrations of 39 biomarkers including those for TSNAs, PAHs, and VOCs than either exclusive smokers or exclusive opiate users [17]. The analysis also found that opiate use contributed a larger portion of PAH concentrations than the nicotine dose did. Sample sizes were too limited to perform separate analyses for opium users and non-users in this study. However, a previous study of GCS smokers found that hazard ratio estimates for overall and cancer mortality were slightly higher among opium users compared to non-users [20]. The presence of another exposure (opium use) that contributes to the biomarker concentrations and lung cancer risk complicates and potentially confounds any associations between smoking and lung cancer in this study. Further studies could examine the relative contributions of opium and tobacco use to lung cancer risk among dual users and explore potential interactive effects between these two exposures.
Other characteristics of the present study's population could also account for some of the differences between these results and those of previous studies. Particular characteristics of nicotine and other biomarker metabolism among study participants could affect results given that certain genetic alleles have been linked to nicotine metabolism and lung cancer risk in regions such as China and Japan [31]. NNAL concentrations have been found to be lower among cigarette smokers in the GCS than among U.S. smokers in the National Health and Nutrition Examination Survey [16]. Such differences between this group and other populations could also result from the cigarette types used or their smoking histories. Large quantities of cigarettes in Iran are imported or smuggled from other countries [32] and the nicotine content in domestic and imported cigarettes has been found to be similar as measured by high performance liquid chromatography [33]. Both the cases and controls reported beginning to smoke on average in their late 20s in this group, whereas most smoking initiation in the U.S. occurs in the teens or early 20s [34].
Associations between biomarkers of exposure and lung cancer risk, controlling for smoking exposure, could have various interpretations. They could suggest that variations in concentrations of specific biomarkers have particular effects on cancer incidence even with adjustment for the effects of smoking generally. They could also suggest that there is a misclassification of the smoking status or there is residual confounding due to insufficient characterization of smoking exposure, or that sources other than smoking contribute to the biomarker concentrations.
This study has certain limitations. As noted, the sample size of lung cancer cases and thus matched controls was limited, reducing power to detect statistically significant results. Many of the biomarkers in the same classes such as TSNAs and PAH and VOC metabolites were highly correlated with one another, making it difficult to isolate and identify the portion of increased lung cancer risk that was due to particular biomarkers. Biospecimens were collected at baseline and biomarker concentrations may not reflect the total exposure to constituents over time. However, a subset of GCS participants provided a follow-up urine sample after an average of five years, and biomarker concentrations among continuing smokers were highly correlated [16]. Finally, all of the participants in this analysis were male given that smoking is predominantly practiced by males in the region of study.
Future research could expand on this analysis in this and other cohorts, ideally with a larger sample size. Research of this nature on the association between biomarkers of tobacco exposure and disease risk is informative for public health organizations and government agencies as it provides information on the contribution of specific constituents and toxicants regarding the health effects of tobacco products. Such results can thus be used to inform tobacco control efforts as well as government policies and regulations concerning tobacco products.

Conclusions
Levels of several biomarkers were higher in lung cancer cases than among the controls in a study of male exclusive cigarette smokers in the Golestan Cohort Study in Iran. Some of these biomarkers were associated with lung cancer risk after adjusting for smoking duration and quantity but not after adjustment for opium use.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijerph18147349/s1: Figure S1: Flowchart of Sample Inclusion and Exclusion Criteria; Table S1: Geometric mean (with 95% CI) of creatinine-corrected biomarker concentrations among current exclusive cigarette smokers (n = 80) in the Golestan Cohort Study; Table S2: Correlation between biomarker levels among exclusive current cigarette controls in the Golestan Cohort Study (n = 52); and Table S3: Risk of lung cancer by biomarkers among current exclusive cigarette users (n = 80).

Institutional Review Board Statement:
The Golestan Cohort Study and its study protocols were approved by the institutional review boards of Tehran University of Medical Sciences, the International Agency for Research on Cancer (IARC), and the National Cancer Institute (ID# 07CN120).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available upon reasonable request submitted through the study portal: https://dceg2.cancer.gov/gemshare/ (accessed on 27 May 2021).