If Smoking Were Eliminated, Which US Counties Would Still Have High Rates of Smoking-Related Cancers?

Objective: To characterize the county variability of the impact of smoking elimination on rates of smoking-related cancers and to explore whether common environmental indices predicted which metropolitan counties would experience high rates of smoking-related cancers even after smoking was eliminated. Methods: Surveillance, Epidemiology, and End Results Program (SEER) and Environmental Protection Agency (EPA) data were obtained. County level cancer rates for 257 metropolitan SEER counties, including the observed rates and those predicted after eliminating smoking, were derived via multilevel regression modeling and age standardized to the 2016 SEER population. Associations between the EPA’s Environmental Quality Index (EQI) scores and “Low Benefit” counties (counties in the top 20% of post-smoking elimination incidence rates) were explored via logistic regression. Results: Reductions in smoking-related cancer incidence ranged from 58.4% to 3.2%. The overall EQI (OR = 1.96, 95% CI [1.34, 2.86]) and the air quality index (OR = 5.99, 95% CI [3.20, 11.22]) scores predicted higher odds of being a “Low Benefit” county. Conclusions: Substantial inequities in the post-smoking elimination cancer rates were observed; air pollution appears to be a primary explanation for this. Cancer prevention in metropolitan counties with high levels of air pollution should prioritize pollution control at least as much as tobacco control.


Introduction
Smoking is the single most important modifiable cause of cancer [1], and the substantial declines in smoking rates in the U.S. over the past few decades have resulted in impressive reductions in the incidence of cancers of the lungs and other smoking-related types [2]. In a previous paper [3], we confirmed these findings using multilevel regression methods to estimate the association between county level smoking prevalence and cancer incidence data for 612 of the approximately 3100 counties in the U.S., which provided data for the Surveillance, Epidemiology, and End Results (SEER) program for 12 types of cancer that are known to be caused by tobacco.
We found that, if smoking was entirely eliminated, the incidence of these 12 types of cancer would decrease by about 40%, which would translate to a 16.3% reduction in all types of cancer combined [3]. This finding is in good agreement with other authors using different methods [1]. Our previous paper went beyond confirming this substantial role of tobacco in cancer, however. The county data and multilevel regression methods allowed us to estimate the county-by-county variability in tobacco's contribution, and to observe that not all counties would benefit equally from smoking elimination [3].
Some counties would see only modest reductions in the incidence of smoking-related cancers, according to this modeling approach. These "low benefit" counties tended to be in metropolitan areas, which suggested that perhaps carcinogenic exposures such as air pollution or occupational agents might be "offsetting" the potential benefit that would be expected if smoking could be eliminated. The five counties with the highest predicted cancer rates in 2016 after eliminating smoking were all in the metropolitan areas of large cities: Jefferson County KY (Louisville); Wayne and Macomb counties MI (Detroit); Campbell County KY (Cincinnati); and Jefferson Parish LA (New Orleans). If smoking was eliminated, we predicted that these five counties would see only an approximate 8% reduction in their rates of smoking-related cancers, far less than the overall average of about 40% after total smoking elimination.
In the current paper, we investigated the question of why some counties would retain high rates of smoking-related cancers if smoking was eliminated. Specifically, we tested hypotheses that these low-benefit counties would have higher exposures to environmental carcinogens. Likely candidates included PM 2.5 and other carcinogenic components of urban air pollution.

Cancer Data
Cancer incidence data for 2016 were obtained from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) [4]. The program includes 18 cancer registries from across the United States. Sixteen registries were included in this analysis (Alaska and Hawaii were excluded); these 16 registries cover approximately 20% of the 2016 US population. SEER data contain cancer incidence information, as well as patient demographics. Population data are also provided by the SEER program. As the SEER incidence data provide county of residence information, the county was the unit of analysis (as it was in our previous analysis). Because our previous study found that the "low benefit" counties were all in metropolitan areas, the present analyses were restricted to the 257 SEER counties classified as metropolitan by the US Department of Agriculture (Rural-Urban Continuum Codes 1, 2, and 3) [5]. These constitute 42% of the 612 counties in the previous SEER analyses. Counties identified as metropolitan, regardless of population size, were grouped into a single "metropolitan" category.
In line with our previous study [3], we chose to study the 12 cancer types which are deemed to be caused by smoking according to the U.S. Centers for Disease Control and Prevention [6]. These 12 are:
While these cancers are associated with smoking, not all incidence of these cancer types is caused by smoking. Previous research indicated [3] that the county level incidence rates of these types of cancer would drop by approximately 40%, on average, if smoking had been eliminated twenty years prior.

Smoking, Environmental, and Demographic Data
Independent variables were available at either the individual level or county level. Individual level variables for each case include sex and age, examined in 5-year categories (20-24, ..., 80-84). As in the original analysis, race effects were not modeled (and all races combined were included in the dataset) because of the small numbers of non-whites in many of the SEER counties.
Reliable individual smoking data are not available from SEER. Therefore, county level smoking prevalence estimates were used instead. We used the age-standardized calendar year- and sex-specific smoking prevalence obtained from the Institute for Health Metrics and Evaluation [7]. These smoking prevalence estimates were based on Behavioral Risk Factor Surveillance System (BRFSS) data, which were modeled to generate estimates of county level smoking prevalence for the US between 1996 and 2012 for ages 18 and over. Estimates for counties that had limited data for a given year were derived via spatial and temporal smoothing techniques which included county and state-level covariates. The smoking variable was defined as "prevalence of current daily cigarette smoking". Smoking prevalence estimates for 1996 were used as they allowed for a lag of 20 years for the analysis of the 2016 SEER cancer incidence data.
We hypothesized that environmental exposures might explain why some counties would expect low benefits in cancer reduction from smoking elimination. Candidate carcinogenic exposures included PM 2.5 as well as other pollutants in air, water, and land. County level PM 2.5 data were gathered from the Center for Air, Climate, and Energy Solutions (CACES). These estimates were derived using spatially decomposed v1 empirical models, as described in [8]. The earliest available year for these data was 1999; therefore, this variable was lagged 17 years when used to model the 2016 SEER incidence data. For other environmental exposures, we used the U.S. Environmental Protection Agency's Environmental Quality Index (EQI) [9]. The EQI was designed to measure overall environmental quality at the county level, with the goal of improving the current understanding of the relationship between environmental conditions and human health. This objective seemed to fit well with our goal of investigating the role of environmental exposures in explaining inequalities in the benefit of smoking elimination.
The EQI is composed of indices representing five environmental domains: air, water, land, built environment, and sociodemographic. The earliest available data represent the period 2000-2005, and these were used for the primary analyses to maximize the latency for the 2016 cancer incidence data. The most recent data, 2006-2010, were used in a sensitivity analysis investigating the impact of the choice of latency on the results.
Each of the five environmental quality scales was constructed by combining information from a large number of environmental exposure variables using principal component analysis [10]. The five resulting scales or sub-indices are unitless numbers which can be considered county rankings of environmental quality. The overall EQI is a combined score including all five domains. We hypothesized a priori that the air quality index (AQI) would be the most important of the five in explaining variations in cancer rates because of the well-established contribution of air pollutants to cancer risk [11,12]. The AQI combines data on particulates (both PM 2.5 and PM 10) and the other EPA criteria air pollutants including nitrogen dioxide, sulfur dioxide, ozone, and carbon monoxide, as well as hazardous air pollutants (HAPs) including a number of carcinogens such as 1,2-dibromo-3-chloropropane, benzidine, carbon tetrachloride, chloroprene, ethylene dibromide, formaldehyde, trichloroethylene, and vinyl chloride [10].
The other four indices are water, representing overall water quality and chemical contamination of surface and drinking water; land, which includes measures of land use, pesticide use, presence of industrial facilities, and radon exposure; sociodemographic, which represents socioeconomic factors including poverty and crime; and built environment, which includes data on pedestrian and highway safety, access to various businesses including food, healthcare, and recreation, and the quality of housing stock.

Modeling Actual and Simulated Cancer Rates
Statistical analyses were performed using multilevel mixed-effects regression models in STATA/MP 16.0 [13]. We modeled cancer incidence rates by using observed cancer counts as the dependent variable and population as the offset. The observed counts were modeled assuming a negative binomial distribution, based on the presence of overdispersion; incidence rate ratios and 95% confidence intervals were generated. The fixed-effect part of the model included age, sex, and county level age-adjusted and sex-stratified daily smoking prevalence, lagged 20 years. Smoking prevalence was modeled as a fixed effect because we assumed the effect of smoking prevalence would be constant across counties. County-specific intercepts were included in the random component. Only metropolitan counties were included in the model. After fitting the model, we used STATA's "predict" command to generate two predicted counts of cancers: one derived using the actual smoking prevalence values, the second assuming smoking was completely eliminated (daily smoking prevalence = 0). Values were then converted to the expected county rates using county population data. The fixed- and random-effects model components were used to create the predicted values. To allow for comparison across counties, each county's rate was age standardized to the sex-specific SEER population distribution for 2016.
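The prediction and standardization steps described above can be illustrated with a minimal Python sketch. This is not the authors' STATA code: it assumes a simple log-link count model with population as the offset, and every coefficient, population, and weight below is a hypothetical value chosen only for illustration.

```python
import math

def predicted_count(population, intercept, age_effect, smoking_coef,
                    smoking_prev, county_effect=0.0):
    """Expected cases from a log-link count model with population as the offset."""
    log_rate = (intercept + age_effect
                + smoking_coef * smoking_prev + county_effect)
    return population * math.exp(log_rate)

def age_standardized_rate(stratum_rates, standard_pop):
    """Direct standardization: weight stratum rates by a standard population."""
    total = sum(standard_pop)
    return sum(r * w for r, w in zip(stratum_rates, standard_pop)) / total

# Hypothetical county with two age strata (all values invented).
POPULATIONS = [50_000, 20_000]      # county population per age stratum
AGE_EFFECTS = [0.0, 1.5]            # log rate ratios per age stratum
STANDARD_POP = [70, 30]             # standard population weights
INTERCEPT = math.log(100 / 100_000)
SMOKING_COEF = 0.02                 # per percentage point of prevalence
COUNTY_EFFECT = 0.1                 # county-specific random intercept

def county_rate(smoking_prev):
    """Age-standardized rate per 100,000 at a given smoking prevalence."""
    rates = []
    for pop, effect in zip(POPULATIONS, AGE_EFFECTS):
        cases = predicted_count(pop, INTERCEPT, effect, SMOKING_COEF,
                                smoking_prev, COUNTY_EFFECT)
        rates.append(cases / pop * 100_000)
    return age_standardized_rate(rates, STANDARD_POP)

observed = county_rate(22.0)    # prediction at the actual prevalence
eliminated = county_rate(0.0)   # prediction with prevalence set to zero
pct_reduction = 100 * (observed - eliminated) / observed
print(f"{observed:.1f} -> {eliminated:.1f} per 100,000 "
      f"({pct_reduction:.1f}% reduction)")
```

In this sketch the county-specific intercept is retained in both predictions, so a county with a large positive intercept keeps a high rate even when the smoking term is set to zero.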

Analyzing Environmental Predictors of Low-Benefit Counties
Counties which the model predicted would still have high rates of the 12 types of smoking-related cancers, even after smoking was eliminated, were labelled "low benefit" counties (Figure 1). Specifically, this term was assigned to those SEER metropolitan counties in the top 20% of the distribution of cancer incidence rates after eliminating smoking (the remaining counties were referred to as "high benefit" counties). This 20% cut point corresponded to an incidence rate in 2016 greater than 235/100,000 after smoking was eliminated (this cut point represented 19.8% of the metropolitan SEER counties, and those counties included 22.0% of the entire SEER population in 2016). T-tests were used to evaluate differences in demographic variables, PM 2.5, and EQI scores by "low benefit" status. Logistic regression was used to examine associations between being a low-benefit county and environmental quality indices. Bivariate and multivariate analyses were performed. Akaike's information criterion (AIC) was used to compare the model fit.
Finally, to determine whether differences in environmental variables by low-benefit status were due to residual confounding by county level smoking status, we conducted a sensitivity analysis to examine differences among counties near the mean of smoking prevalence (within ±0.67 standard deviations of the mean). This reduced the variance in county level smoking prevalence and allowed us to compare results produced using all metropolitan counties to this subset with similar smoking prevalence rates (Appendix A).
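The two county selections described in this section, the top-20% "low benefit" flag and the restriction to counties near the mean smoking prevalence, can be sketched as follows. The function names and the ten-county dataset are hypothetical, and the handling of ties at the cut point is an assumption. For a roughly normal distribution, ±0.67 standard deviations spans about the middle half of the values.

```python
from statistics import mean, stdev

def flag_low_benefit(post_rates, top_fraction=0.20):
    """Flag counties whose post-elimination rate is in the top `top_fraction`.

    Ties at the cutoff are all flagged (an assumption of this sketch).
    """
    ranked = sorted(post_rates.values(), reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]
    return {county: rate >= cutoff for county, rate in post_rates.items()}

def near_mean_subset(prevalence, width_sd=0.67):
    """Keep counties whose prevalence lies within `width_sd` SDs of the mean."""
    values = list(prevalence.values())
    center, spread = mean(values), stdev(values)
    return [county for county, p in prevalence.items()
            if abs(p - center) <= width_sd * spread]

# Hypothetical post-elimination rates per 100,000 for ten counties.
post_rates = {"A": 150.0, "B": 162.0, "C": 171.0, "D": 183.0, "E": 190.0,
              "F": 204.0, "G": 215.0, "H": 228.0, "I": 241.0, "J": 259.0}
# Hypothetical 20-year lagged smoking prevalence (%) for the same counties.
prevalence = {"A": 15.0, "B": 18.0, "C": 20.0, "D": 21.0, "E": 22.0,
              "F": 22.0, "G": 23.0, "H": 24.0, "I": 26.0, "J": 29.0}

flags = flag_low_benefit(post_rates)
low_benefit = [county for county, is_low in flags.items() if is_low]
restricted = near_mean_subset(prevalence)
print(low_benefit)
print(restricted)
```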

Results
There were 136,158 cases of smoking-related cancers identified in the 257 metropolitan SEER counties in 2016. The population in these counties was 58,129,876, which represents 90.9% of the population in the full set of 607 counties included in the 16 SEER registries.
The base model used to generate the 2016 county cancer incidence rates showed the anticipated strong associations between cancer incidence and age, gender, and smoking prevalence (lagged 20 years) (Table 1). Males had higher rates of smoking-related cancers than females (RR = 1.42, 95% CI 1.39-1.46), and the rates rose steadily across the five-year age groups. Likewise, the rate of cancers increased steadily across the categories of smoking prevalence, and the well-described strong contribution of smoking to the rates of these 12 cancer types is clearly evident.

Descriptive Statistics for All Metropolitan Counties
After fitting a model including age, gender, and smoking prevalence (similar to the model in Table 1 except that the smoking prevalence lagged 20 years was coded as a continuous value rather than in six levels), the average predicted cancer incidence rate for all 257 metropolitan counties was 275.1 per 100,000 (Table 2). After setting all county smoking prevalences to zero, the model predicted an average incidence rate of 201.9, an average reduction of 25.0%. The model predicted considerable variability from county to county in the "post-smoking elimination" cancer rates, however, and the percent reduction in cancer incidence rates after smoking elimination ranged from a high of 58.4% to a low of 3.2%; post-smoking elimination incidence rates ranged from 137.0/100,000 to 292.0/100,000 (Figure 1).
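As a reading aid for the averages above, the short sketch below computes the percent reduction implied by the two mean rates (275.1 and 201.9 per 100,000) and shows, with an invented two-county example, that the mean of county-level percent reductions need not equal the percent reduction between mean rates. Whether the reported 25.0% is a mean of county-level reductions is an assumption here, not something the text states explicitly.

```python
def percent_reduction(before, after):
    """Percent decrease from `before` to `after`."""
    return 100.0 * (before - after) / before

# Reduction implied by the two mean rates reported in the text.
reduction_of_means = percent_reduction(275.1, 201.9)

# Hypothetical two-county example (values invented for illustration).
observed = [300.0, 200.0]
eliminated = [150.0, 190.0]

county_reductions = [percent_reduction(b, a)
                     for b, a in zip(observed, eliminated)]
mean_of_reductions = sum(county_reductions) / len(county_reductions)
reduction_of_toy_means = percent_reduction(sum(observed) / 2,
                                           sum(eliminated) / 2)

print(round(reduction_of_means, 1))
print(mean_of_reductions, reduction_of_toy_means)
```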

Comparing Low-Benefit and High-Benefit Counties
After simulating the complete elimination of smoking, some counties were predicted to benefit only modestly (Figure 1 and Table 2). The one-fifth of metropolitan counties (n = 51) with the highest predicted cancer rates post-smoking elimination (the "low benefit" counties) would have cancer incidence rates only eight percent lower than the actual mean rate for all SEER metropolitan counties in 2016. What distinguished these counties from other metropolitan counties?
The concentrations of PM 2.5, lagged 17 years, did not distinguish low- from high-benefit counties (p = 0.32), nor was there an important difference in county population size, which might have been a proxy for urban environmental hazards. However, the overall EQI was more than twice as high (poorer environmental quality) in the low-benefit counties (0.98 versus 0.45, p < 0.001).
The five sub-indices that comprise the EQI were studied separately for differences between low- and high-benefit counties, and the Air Quality Index showed the most striking difference (1.24 vs. 0.62, p < 0.001), indicating substantially poorer air quality in the low-benefit counties (Table 2). The only other notable difference was in the Built Environment Index (0.66 vs. 0.01, p < 0.01), in which the low-benefit counties actually had a higher, meaning better, score.
Because the regression model including county smoking prevalence was used to predict cancer rates under the hypothetical condition that smoking had been eliminated (essentially "controlling for" the effect of smoking), one might expect to find no difference in smoking prevalence between low- and high-benefit counties. In fact, the low-benefit counties had modestly higher 20-year lagged smoking prevalence (23.1% vs. 21.5%, p = 0.03). This modest residual difference in smoking prevalence between low- and high-benefit counties probably represents confounding effects of other factors that were correlated with county smoking prevalence.
Bivariate logistic regression modeling found that both EQI and air quality index scores predicted higher odds of being a low-benefit county (Part a of Table 3). A one standard deviation increase in the EQI was associated with nearly double the odds of being a low-benefit county (OR = 1.96, 95% CI [1.34, 2.86]); a one unit increase in the air quality index indicated a six-fold increase in the odds of being a low-benefit county (OR = 5.99, 95% CI [3.20, 11.22]). The Built Environment Quality Index was also associated with increased odds of being a low-benefit county (OR = 2.70, 95% CI [1.68, 4.32]); however, of these two sub-indices, AQI produced the better fitting model (AIC = 219.4 versus 237.2). When the five sub-indices were included together in a single model, only the air quality index remained an important predictor of low-benefit counties (adjusted OR = 4.43, 95% CI [2.14, 9.19]) (Part b of Table 3).
Two additional analyses were conducted to evaluate potential limitations in the available data. First, it is well known that the epidemiologic assessment of environmental carcinogens should take into account long latencies between exposure and disease. The EQI datasets cover two time periods, either 2000-2005, which was used above, or 2006-2010. Thus, the maximum available latency was 12 to 17 years. Using these data, the odds ratio associating the AQI with being a low-benefit county in 2016 was 5.99 (Table 3a). When the more recent AQI score was used, this odds ratio decreased substantially to 1.90 (results not shown). This reduction may suggest that the longer latency was more appropriate, as expected. However, caution is warranted in this interpretation because the Environmental Protection Agency changed the makeup of the EQI measure between the two time periods, making direct comparisons difficult.
The second sensitivity analysis was conducted to investigate further the residual difference in smoking prevalence between low- and high-benefit counties (Table 2). We were concerned that this residual difference might be a proxy for important unmeasured exposures. We therefore repeated the analyses in Table 2 using a restricted dataset of only the metropolitan counties whose smoking prevalence was very close to the mean for all metropolitan counties. Specifically, we limited the dataset to the 133 metropolitan counties with smoking prevalence within 0.67 standard deviations of the mean (Appendix A). As in the full analysis (Table 2), the AQI remained higher (poorer air quality) in the low-benefit counties, and the magnitude of the difference with high-benefit counties actually increased slightly. Thus, we conclude that the findings in Tables 2 and 3 on the importance of air quality were not due to residual confounding by county smoking prevalence.
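For readers less familiar with logistic regression output, the sketch below shows how an odds ratio and its 95% confidence interval relate to a model coefficient and standard error, using the bivariate AQI result (OR = 5.99, 95% CI [3.20, 11.22]) as a worked example. It assumes a Wald-type interval, which the text does not state explicitly, and the standard error is back-calculated from the reported interval rather than taken from the paper. The AIC difference uses the two values reported for the AQI and Built Environment models.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic coefficient and its SE."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the SE of the log odds ratio from a Wald-type CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Bivariate AQI model: reported OR and CI.
se_aqi = se_from_ci(3.20, 11.22)
or_aqi, ci_low, ci_high = odds_ratio_ci(math.log(5.99), se_aqi)
print(f"SE(log OR) ~ {se_aqi:.2f}; OR = {or_aqi:.2f}, "
      f"95% CI [{ci_low:.2f}, {ci_high:.2f}]")

# AIC comparison: the lower value indicates the better-fitting model.
aic_aqi, aic_built = 219.4, 237.2
delta_aic = aic_built - aic_aqi
print(f"AIC difference favoring the AQI model: {delta_aic:.1f}")
```

The reconstructed interval matches the reported one to within rounding, which is consistent with (though it does not prove) a symmetric interval on the log-odds scale.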

Discussion
These analyses suggest that metropolitan counties with poorer air quality would benefit less from tobacco control, and retain higher smoking-related cancer incidence rates, than counties with cleaner air. There are, however, several limitations. We lacked individual smoking data and were constrained to using county averages. Smoking data indicated county prevalence; no measure of smoking intensity or duration was available. However, as noted earlier, despite this limitation, the resulting estimate of the mean contribution of smoking to the incidence of the smoking-related cancers was in good agreement with other authors [1,3]. Another limitation was that the EQI data were not available with a 20-year latency, which would have been preferable.
There are both strengths and limitations to the AQI being an integrated measure of many different pollutants, considering not only particulate matter but also many organic and inorganic pollutants including carcinogens. From a policy perspective, it may be useful to have an overall measure of air pollution, albeit with the disadvantage that the contributions of individual pollutants are not identifiable. We were able to separately investigate the role of PM 2.5, and the finding that the AQI was more strongly associated with being a low-benefit county than PM 2.5 points to the likelihood that there are other important carcinogens in urban air besides particulate matter. In future work, we will use more refined measures of air toxics to pursue this hypothesis.
The finding that counties with poorer air quality would benefit less from tobacco control could be understood as simply another way to say that air pollution causes some of the same types of cancers as tobacco, which is now well accepted [11,12]. For example, very recent work by Swanton and colleagues has identified a potentially powerful mechanism through which PM 2.5 can contribute to non-small cell lung cancer (NSCLC) in non-smokers, a disease with a high frequency of EGFR mutations (EGFRm). The authors found that PM promotes precursor lung epithelial cells initiated by EGFRm [14].
What remains controversial is the question of the relative importance of environmental exposures versus "lifestyle" factors in cancer prevention [3,15-17]. By simulating the pattern of county cancer rates in a world without tobacco, we find that there would be substantial inequities in the remaining cancer rates, and that urban air pollution, in all its considerable complexity, appears to be a primary explanation for this.

Conclusions
Often, when the priorities for cancer prevention are discussed, the overall or average contributions of tobacco, diet, occupational exposures, air pollution, etc., are compared, with the conclusion that smoking is by far the most important. While this argument appears logical, it overlooks the fact that there will be significant inequities in these contributions depending on environmental conditions. More specifically, we conclude that cancer prevention in metropolitan counties with high levels of air pollution should prioritize pollution control at least as much as tobacco control.

Conflicts of Interest:
The authors have no conflicts to declare.

Appendix A
Table A1. Incidence rates of 12 smoking-related types of cancer in 133 SEER metropolitan counties, 2016, as observed and assuming smoking elimination, and covariates restricted to counties with smoking prevalence close to the mean.